Expert Systems With Applications 83 (2017) 187–205
Deep learning networks for stock market analysis and prediction:
Methodology, data representations, and case studies
Eunsuk Chong a,∗, Chulwoo Han b, Frank C. Park a
a Robotics Laboratory, Seoul National University, Seoul 08826, Korea
b Durham University Business School, Mill Hill Lane, Durham DH1 3LB, UK
∗ Corresponding author. E-mail addresses: bear3498@snu.ac.kr (E. Chong), chulwoo.han@durham.ac.uk (C. Han), fcp@snu.ac.kr (F.C. Park).
http://dx.doi.org/10.1016/j.eswa.2017.04.030
Article history:
Received 15 November 2016
Revised 15 April 2017
Accepted 16 April 2017
Available online 22 April 2017
Keywords:
Stock market prediction
Deep learning
Multilayer neural network
Covariance estimation
We offer a systematic analysis of the use of deep learning networks for stock market analysis and pre-
diction. Its ability to extract features from a large set of raw data without relying on prior knowledge
of predictors makes deep learning potentially attractive for stock market prediction at high frequencies.
Deep learning algorithms vary considerably in the choice of network structure, activation function, and
other model parameters, and their performance is known to depend heavily on the method of data repre-
sentation. Our study provides a comprehensive and objective assessment of both the advan-
tages and drawbacks of deep learning algorithms for stock market analysis and prediction. Using high-
frequency intraday stock returns as input data, we examine the effects of three unsupervised feature
extraction methods—principal component analysis, autoencoder, and the restricted Boltzmann machine—
on the network’s overall ability to predict future market behavior. Empirical results suggest that deep
neural networks can extract additional information from the residuals of the autoregressive model and
improve prediction performance; the same cannot be said when the autoregressive model is applied to
the residuals of the network. Covariance estimation is also noticeably improved when the predictive net-
work is applied to covariance-based market structure analysis. Our study offers practical insights and
potentially useful directions for further investigation into how deep learning networks can be effectively
used for stock market analysis and prediction.
© 2017 Elsevier Ltd. All rights reserved.
1. Introduction
Research on the predictability of stock markets has a long his-
tory in financial economics ( e.g., Ang & Bekaert, 2007; Bacchetta,
Mertens, & Van Wincoop, 2009; Bondt & Thaler, 1985; Bradley,
1950; Campbell & Hamao, 1992; Campbell & Thompson, 2008;
Campbell, 2012; Granger & Morgenstern, 1970 ). While opinions dif-
fer on the efficiency of markets, many widely accepted empirical
studies show that financial markets are to some extent predictable
( Bollerslev, Marrone, Xu, & Zhou, 2014; Ferreira & Santa-Clara,
2011; Kim, Shamsuddin, & Lim, 2011; Phan, Sharma, & Narayan,
2015 ). Among methods for stock return prediction, econometric or
statistical methods based on the analysis of past market move-
ments have been the most widely adopted ( Agrawal, Chourasia, &
Mittra, 2013 ). These approaches employ various linear and nonlin-
ear methods to predict stock returns, e.g., autoregressive models
and artificial neural networks (ANN) ( Adebiyi, Adewumi, & Ayo,
2014; Armano, Marchesi, & Murru, 2005; Atsalakis & Valavanis,
2009; Bogullu, Dagli, & Enke, 2002; Cao, Leggio, & Schniederjans,
2005; Chen, Leung, & Daouk, 2003; Enke & Mehdiyev, 2014; Gure-
sen, Kayakutlu, & Daim, 2011a; Kara, Boyacioglu, & Baykan, 2011;
Kazem, Sharifi, Hussain, Saberi, & Hussain, 2013; Khashei & Bi-
jari, 2011; Kim & Enke, 2016a; 2016b; Monfared & Enke, 2014;
Rather, Agarwal, & Sastry, 2015; Thawornwong & Enke, 2004; Tic-
knor, 2013; Tsai & Hsiao, 2010; Wang, Wang, Zhang, & Guo, 2011;
Yeh, Huang, & Lee, 2011; Zhu, Wang, Xu, & Li, 2008 ). While there
is uniform agreement that stock returns behave nonlinearly, many
empirical studies show that for the most part nonlinear models do
not necessarily outperform linear models: e.g., Lee, Sehwan, and
Jongdae (2007) , Lee, Chi, Yoo, and Jin (2008) , Agrawal et al. (2013) ,
and Adebiyi et al. (2014) propose linear models that outperform
or perform as well as nonlinear models, whereas Thawornwong
and Enke (2004) , Cao et al. (2005) , Enke and Mehdiyev (2013) ,
and Rather et al. (2015) find nonlinear models outperform linear
models. Table 1 provides a summary of recent works relevant to
our research. For more exhaustive and detailed reviews, we refer
Table 1
A summary of recent studies on stock market prediction.

Authors (Year) | Data type (Num. of input features × lagged times) | Target output | Num. of samples (Training:Validation:Test) | Sampling period (Frequency) | Method | Performance measure
Enke and Mehdiyev (2013) | US S&P 500 index (20 × 1) | Stock price | 361 | Jan-1980 to Jan-2010 (daily) | Feature selection + fuzzy clustering + fuzzy NN | RMSE
Niaki and Hoseinzade (2013) | Korea KOSPI200 index (27 × 1) | Market direction (up or down) | 3650 (8:1:1) | 1-Mar-1994 to 30-Jun-2008 (daily) | Feature selection + ANN | statistical tests
Cervelló-Royo et al. (2015) | US Dow Jones index (1 × 10) | Market trend (bull/bear-flag) | 91,307* | 22-May-2000 to 29-Nov-2013 (15-min) | Template matching | trading simulation
Patel, Shah, Thakkar, and Kotecha (2015) | India CNX and BSE indices (10 × 1) | Stock price | 2393* | Jan-2003 to Dec-2012 (daily) | SVR + {ANN, RF, SVR} | MAPE, MAE, rRMSE, MSE
T.-l. Chen and Chen (2016) | Taiwan TAIEX^a and US NASDAQ^b indices (27 × 20) | Market trend (bull-flag) | 3412*^a, 3818^b (7:0:1) | 7-Jan-1989 to 24-Mar-2004 (daily) | Dimension reduction + template matching | trading simulation
Chiang, Enke, Wu, and Wang (2016) | World 22 stock market indices ({3∼5} × 1) | Trading signal (stock price) | 756 (2:0:1) | Jan-2008 to Dec-2010 (daily) | Particle swarm optimization + ANN | trading simulation
Chourmouziadis and Chatzoglou (2016) | Greece ASE general index (8 × 1) | Portfolio composition (cash:stock) | 3907* | 15-Nov-1996 to 5-Jun-2012 (daily) | Fuzzy system | trading simulation
Qiu, Song, and Akagi (2016) | Japan Nikkei 225 index (71 × 1) | Stock return | 237 (7:0:3) | Nov-1993 to Jul-2013 (monthly) | ANN + {genetic algorithm, simulated annealing} | MSE
Arévalo, Niño, Hernández, and Sandoval (2016) | US Apple stock (3 × {2∼15} + 2) | Stock price | 19,109 (17:0:3) | 2-Sep-2008 to 7-Nov-2008 (1-minute) | Deep NN | MSE, directional accuracy
Zhong and Enke (2017) | US SPDR S&P 500 ETF (SPY) (60 × 1) | Market direction (up or down) | 2518 (14:3:3) | 1-Jun-2003 to 31-May-2013 (daily) | Dimension reduction + ANN | trading simulation, statistical tests
Our study | Korea KOSPI 38 stock returns (38 × 10) | Stock return | 73,041 (3:1:1) | 4-Jan-2010 to 30-Dec-2014 (5-minute) | Data representation + deep NN | NMSE, RMSE, MAE, MI

NN: neural network, SVR: support vector regression, RF: random forest, rRMSE: relative RMSE, NMSE: normalized MSE, MI: mutual information.
* In some studies the number of samples is not explicitly provided. We have calculated the number of samples based on each country's business days.
the reader to Atsalakis and Valavanis (2009) , Guresen, Kayakutlu,
and Daim (2011b) , and Cavalcante, Brasileiro, Souza, Nobrega, and
Oliveira (2016) .
Recently there has been a resurgence of interest in artificial
neural networks, owing in large part to their spectacular successes in image
classification, natural language processing, and various time-series
problems ( Cireşan, Meier, Masci, & Schmidhuber, 2012; Hinton &
Salakhutdinov, 2006; Lee, Pham, Largman, & Ng, 2009 ). Underlying
this progress is the development of a feature learning framework,
known as deep learning ( LeCun, Bengio, & Hinton, 2015 ), whose
basic structure is best described as a multi-layer neural network,
and whose success can be attributed to a combination of increased
computational power, availability of large datasets, and more so-
phisticated algorithms ( Bengio, Lamblin, Popovici, Larochelle et al.,
2007; Deng & Yu, 2014; Hinton, Osindero, & Teh, 2006; Salakhut-
dinov & Hinton, 2009; Srivastava, Hinton, Krizhevsky, Sutskever, &
Salakhutdinov, 2014 ).
There has been growing interest in whether deep learning can be effectively applied to problems in finance, but the literature (at least in the public domain) still remains limited.¹ With the increasing availability of high-frequency trading data and the less-than-satisfactory performance of existing models, comprehensive studies that objectively examine the suitability of deep learning to stock market prediction and analysis seem opportune. The ability to extract abstract features from data, and to identify hidden nonlinear relationships without relying on econometric assumptions and human expertise, makes deep learning particularly attractive as an alternative to existing models and approaches.

¹ There exist a few studies that apply deep learning to the identification of the relationship between past news events and stock market movements ( Ding, Zhang, Liu, & Duan, 2015; Yoshihara, Fujikawa, Seki, & Uehara, 2014 ), but, to our knowledge, there is no study that applies deep learning to extract information from the stock return time series.
ANNs require a careful selection of the input variables and net-
work parameters such as the learning rate, number of hidden lay-
ers, and number of nodes in each layer in order to achieve satisfac-
tory results ( Hussain, Knowles, Lisboa, & El-Deredy, 2008 ). It is also
important to reduce dimensionality to improve learning efficiency.
On the other hand, deep learning automatically extracts features
from data and requires minimal human intervention during fea-
ture selection. Therefore, our approach does not require expertise
on predictors such as macroeconomic variables and enables us to
use a large set of raw-level data as input. Because it ignores factors that
are known to predict returns, our approach may not outperform
existing models based on carefully chosen predictors. However,
considering the fast growth of deep learning algorithms, we believe
our research will serve as a milestone for future research in this
direction. Chourmouziadis and Chatzoglou
(2016) also predict that deep learning will play a key role in finan-
cial time series forecasting.
We conjecture that, due to correlation, a stock's past returns affect
not only its own future returns but also the future returns of other
stocks, and we use 380-dimensional lagged stock returns (38 stocks
and 10 lagged returns) as input data, letting deep learning extract
the features. This large input dataset makes deep learning a particularly
suitable choice for our research.
We test our model on high-frequency data from the Korean
stock market. Previous studies have predominantly focused on re-
turn prediction at a low frequency, and high-frequency return
prediction studies have been rare. However, market microstruc-
ture noise can cause temporal market inefficiency and we ex-
pect that many profit opportunities can be found at a high fre-
quency. Also, high-frequency trading allows more trading opportu-
nities and makes it possible to achieve statistical arbitrage. Another
advantage of using high-frequency data is that we can get a large
dataset, which is essential to overcome data-snooping and over-
fitting problems inevitable in neural networks or any other nonlinear
models. Whilst it should be possible to train the network within
five minutes given a reasonably sized training set, we believe
daily training should be enough.
Put together, the aim of the paper is to assess the potential of
deep feature learning as a tool for stock return prediction, and
more broadly for financial market prediction. Viewed as multi-
layer neural networks, deep learning algorithms vary considerably
in the choice of network structure, activation function, and other
model parameters; their performance is also known to depend
heavily on the method of data representation. In this study, we
offer a systematic and comprehensive analysis of the use of deep
learning. In particular, we use stock returns as input data to a stan-
dard deep neural network, and examine the effects of three un-
supervised feature extraction methods—principal component anal-
ysis, autoencoder, and the restricted Boltzmann machine—on the
network’s overall ability to predict future market behavior. The
network’s performance is compared against a standard autoregres-
sive model and ANN model for high-frequency intraday data taken
from the Korean KOSPI stock market.
The empirical analysis suggests that the predictive performance
of deep learning networks is mixed and depends on a wide range
of both environmental and user-determined factors. On the other
hand, deep feature learning is found to be particularly effective
when used to complement an autoregressive model, considerably
improving stock return prediction. Moreover, applying this predic-
tion model to covariance-based market structure analysis is also
shown to improve covariance estimation effectively. Beyond these
findings, the main contributions of our study are to demonstrate
how deep feature learning-based stock return prediction models
can be constructed and evaluated, and to shed further light on fu-
ture research directions.
The remainder of the paper is organized as follows.
Section 2 describes the framework for stock return prediction
adopted in our research, with a brief review of deep neural net-
works and data representation methods. Section 3 describes the
sample data construction process using stock returns from the
KOSPI stock market. A simple experiment to show evidence of pre-
dictability is also demonstrated in this section. Section 4 proposes
several data representations that are used as inputs to the deep
neural network, and assesses their predictive power via a stock
market trend prediction test. Section 5 is devoted to construction
and evaluation of the deep neural network; here we compare our
model with a standard autoregressive model, and also test a hybrid
model that merges the two models. In Section 6 , we apply the
results of return predictions to covariance structure analysis, and
show that incorporating return predictions improves estimation of
correlations between stocks. Concluding remarks, with suggestions
for future research, are given in Section 7.
2. Deep feature learning for stock return prediction
2.1. The framework
For each stock, we seek a predictor function f in order to predict r_{t+1}, the stock return at time t + 1, given the features (representation) u_t extracted from the information available at time t. We assume that r_{t+1} can be decomposed into two parts: the predicted output r̂_{t+1} = f(u_t), and the unpredictable part γ, which we regard as Gaussian noise:

    r_{t+1} = r̂_{t+1} + γ,   γ ∼ N(0, β),   (1)

where N(0, β) denotes a normal distribution with zero mean and variance β. The representation u_t can be either a linear or a nonlinear transformation of the raw level information R_t. Denoting the transformation function by φ, we have

    u_t = φ(R_t),   (2)

and

    r̂_{t+1} = f ∘ φ(R_t).   (3)

For our research, we define the raw level information as the past returns of the stocks in our sample. If there are M stocks in the sample and g lagged returns are chosen, R_t will have the form

    R_t = [r_{1,t}, ..., r_{1,t−g+1}, ..., r_{M,t}, ..., r_{M,t−g+1}]^T ∈ R^{Mg},   (4)

where r_{i,t} denotes the return on stock i at time t. In the remainder of this section, we illustrate the construction of the predictor function f using a deep neural network, and the transformation function φ using different data representation methods.
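To make the construction of the raw level input concrete, the sketch below assembles R_t of Eq. (4) from a matrix of past returns. This is our own NumPy illustration (the array layout and function name are assumptions, not code from the paper).

```python
import numpy as np

def build_raw_input(returns, t, g):
    """Stack the g most recent returns of every stock into R_t (Eq. (4)).

    returns : array of shape (T, M), where returns[t, i] = r_{i,t}
    t       : current time index (requires t >= g - 1)
    g       : number of lagged returns per stock
    Returns a vector of length M * g ordered as
    [r_{1,t}, ..., r_{1,t-g+1}, ..., r_{M,t}, ..., r_{M,t-g+1}].
    """
    window = returns[t - g + 1 : t + 1]   # shape (g, M), oldest first
    window = window[::-1]                 # newest first, as in Eq. (4)
    return window.T.reshape(-1)           # stock-major flattening

# toy usage: M = 3 stocks, T = 20 time steps, g = 10 lags
rng = np.random.default_rng(0)
R_t = build_raw_input(rng.standard_normal((20, 3)), t=19, g=10)
print(R_t.shape)  # (30,)
```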
2.2. Deep neural network
A neural network specifies the nonlinear relationship between two variables h_l and h_{l+1} through a network function, which typically has the form

    h_{l+1} = δ(W h_l + b),   (5)

where δ is called an activation function, and the matrix W and vector b are model parameters. The variables h_l and h_{l+1} are said to form a layer; when there is only one layer between the variables, their relationship is called a single-layer neural network. Multi-layer neural networks augmented with advanced learning methods are generally referred to as deep neural networks (DNN). A DNN for the predictor function, y = f(u), can be constructed by serially stacking the network functions as follows:

    h_1 = δ_1(W_1 u + b_1),
    h_2 = δ_2(W_2 h_1 + b_2),
    ...
    y = δ_L(W_L h_{L−1} + b_L),

where L is the number of layers.

Given a dataset {u^n, τ^n}_{n=1}^N of inputs and targets, and an error function E(y^n, τ^n) that measures the difference between the output y^n = f(u^n) and the target τ^n, the model parameters for the entire network, θ = {W_1, b_1, ..., W_L, b_L}, can be chosen so as to minimize the sum of the errors:

    J = min_θ Σ_{n=1}^{N} E(y^n, τ^n).   (6)

Given an appropriate choice of E(·), its gradient can be obtained analytically through error backpropagation ( Bishop, 2006 ). In this case, the minimization problem in (6) can be solved by the usual gradient descent method. A typical choice for the objective function that we adopt in this paper has the form:

    J = (1/N) Σ_{n=1}^{N} ‖y^n − τ^n‖² + λ · Σ_{l=1}^{L} ‖W_l‖_2,   (7)

where ‖·‖ and ‖·‖_2 respectively denote the Euclidean norm and the matrix L_2 norm. The second term is a "regularizer" added to avoid overfitting, while λ is a user-defined coefficient.
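As an illustration of Eqs. (5)–(7), a minimal NumPy sketch of the stacked network and the regularized objective follows; the sigmoid activation, layer sizes, and random data are assumptions made for the example, not the configuration used in the paper.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(u, weights, biases):
    """Serially stacked network functions h_{l+1} = delta(W h_l + b), Eq. (5)."""
    h = u
    for W, b in zip(weights, biases):
        h = sigmoid(W @ h + b)
    return h

def objective(U, T, weights, biases, lam):
    """Mean squared error plus an L2 penalty on the weight matrices, Eq. (7)."""
    errs = [np.sum((forward(u, weights, biases) - t) ** 2) for u, t in zip(U, T)]
    reg = sum(np.linalg.norm(W, 2) for W in weights)  # matrix L2 (spectral) norm
    return np.mean(errs) + lam * reg

# toy example: 380-dimensional input, two hidden layers, scalar output
rng = np.random.default_rng(1)
sizes = [380, 100, 50, 1]
weights = [0.1 * rng.standard_normal((m, n)) for n, m in zip(sizes[:-1], sizes[1:])]
biases = [np.zeros(m) for m in sizes[1:]]
U = rng.standard_normal((5, 380))
T = rng.standard_normal((5, 1))
print(objective(U, T, weights, biases, lam=1e-3))
```

In practice the objective would be minimized by gradient descent with gradients obtained through backpropagation, as described above.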
2.3. Data representation methods
Transforming (representing) raw data before inputting it into a machine learning task often enhances the performance of the task. It is generally accepted that the performance of a machine learning algorithm depends heavily on the choice of the data representation method ( Bengio, Courville, & Vincent, 2013; Längkvist, Karlsson, & Loutfi, 2014 ). There are various data representation methods such as zero-to-one scaling, standardization, log-scaling, and principal component analysis (PCA) ( Atsalakis & Valavanis, 2009 ). More recently, nonlinear methods have become popular: Zhong and Enke (2017) compare PCA and two of its nonlinear variants, fuzzy robust PCA and kernel-based PCA. In this paper, we consider three unsupervised data representation methods: PCA, the autoencoder (AE), and the restricted Boltzmann machine (RBM). PCA involves a linear transformation while the other two are nonlinear transformations. These representation methods are widely used in, e.g., image data classification ( Coates, Lee, & Ng, 2010 ).
For raw data x and a transformation function φ, u = φ(x) is called a representation of x, while each element of u is called a feature. In some cases we can also define a reverse map, ψ : u → x, and retrieve x from u; in this case the retrieved value, x_rec = ψ(u), is called a reconstruction of x. The functions φ and ψ can be learned from x and x_rec by minimizing the difference (reconstruction error) between them. We now briefly describe each method below.
(i) Principal Component Analysis, PCA
In PCA, the representation u ∈ R^d is generated from the raw data x ∈ R^D via a linear transformation u = φ(x) = W x + b, where W ∈ R^{d×D}, W W^T = I, and b ∈ R^d. The rows of W are the eigenvectors of the first d largest eigenvalues, and b is usually set to −W E[x] so that E[u] = 0. Reconstruction of the original data from the representation is given by x_rec = ψ(u) = W^T (u − b).
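A compact sketch of the PCA representation and reconstruction described above, using NumPy's eigendecomposition (our own illustration; the paper does not prescribe an implementation):

```python
import numpy as np

def fit_pca(X, d):
    """Return (W, b) such that u = W x + b, with E[u] = 0 and W W^T = I.

    X : data matrix of shape (N, D), one raw observation per row
    d : number of principal components to keep
    """
    mean = X.mean(axis=0)
    cov = np.cov(X, rowvar=False)
    eigval, eigvec = np.linalg.eigh(cov)   # eigenvalues in ascending order
    W = eigvec[:, ::-1][:, :d].T           # rows = eigenvectors of the d largest eigenvalues
    b = -W @ mean                          # so that E[u] = 0
    return W, b

def pca_transform(x, W, b):
    return W @ x + b

def pca_reconstruct(u, W, b):
    return W.T @ (u - b)                   # x_rec = psi(u) = W^T (u - b)

rng = np.random.default_rng(2)
X = rng.standard_normal((1000, 380))
W, b = fit_pca(X, d=200)
u = pca_transform(X[0], W, b)
print(np.linalg.norm(X[0] - pca_reconstruct(u, W, b)))  # reconstruction error
```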
(ii) Autoencoder, AE
AE is a neural network model characterized by a structure in which the model parameters are calibrated by minimizing the reconstruction error. Let h_l = δ_l(W_l h_{l−1} + b_l) be the network function of the l-th layer with input h_{l−1} and output h_l. Although δ_l can differ across layers, a sigmoid function, δ(z) = 1/(1 + exp(−z)) (exp(z) is applied to each element of z), is typically used for all layers, which we also adopt in our research. Regarding h_l as a function of the input, the representation of x can be written as u = φ(x) = h_L ∘ ... ∘ h_1(x) for an L-layer AE (h_0 = x). Then the reconstruction of the data can be similarly defined: x_rec = ψ(u) = h_{2L} ∘ ... ∘ h_{L+1}(u), and the model can be calibrated by minimizing the reconstruction error over a training dataset, {x^n}_{n=1}^N. We adopt the following learning criterion:

    min_θ (1/N) Σ_{n=1}^{N} ‖x^n − ψ ∘ φ(x^n)‖²,   (8)

where θ = {W_i, b_i}, i = 1, ..., 2L. W_{L+i} is often set as the transpose of W_{L−i+1}, i = 1, ..., L, in which case only W_i, i = 1, ..., L, need to be estimated. In this paper, we consider a single-layer AE and estimate both W_1 and W_2.
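The single-layer AE and the learning criterion (8) can be sketched as follows. This is a schematic NumPy illustration only: the weight shapes and initialization are assumptions, and the minimization of the criterion (by backpropagation and gradient descent, as in Section 2.2) is omitted.

```python
import numpy as np

def sigmoid(z):
    # exp(z) is applied to each element of z
    return 1.0 / (1.0 + np.exp(-z))

def ae_encode(x, W1, b1):
    """Representation u = phi(x) of a single-layer AE."""
    return sigmoid(W1 @ x + b1)

def ae_decode(u, W2, b2):
    """Reconstruction x_rec = psi(u)."""
    return sigmoid(W2 @ u + b2)

def reconstruction_loss(X, W1, b1, W2, b2):
    """Learning criterion (8): mean squared reconstruction error over the data."""
    errs = [np.sum((x - ae_decode(ae_encode(x, W1, b1), W2, b2)) ** 2) for x in X]
    return np.mean(errs)

# toy setup: D = 380 raw inputs, d = 400 hidden features
rng = np.random.default_rng(3)
D, d = 380, 400
W1, b1 = 0.01 * rng.standard_normal((d, D)), np.zeros(d)
W2, b2 = 0.01 * rng.standard_normal((D, d)), np.zeros(D)
X = rng.standard_normal((10, D))
print(reconstruction_loss(X, W1, b1, W2, b2))
```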
(iii) Restricted Boltzmann Machine, RBM
RBM ( Hinton, 2002 ) has the same network structure as a single-layer autoencoder, but it uses a different learning method. RBM treats the input and output variables, x and u, as random, and defines an energy function, E(x, u), from which the joint probability density function of x and u is determined from the formula

    p(x, u) = exp(−E(x, u)) / Z,   (9)

where Z = Σ_{x,u} exp(−E(x, u)) is the partition function. In most cases, u is assumed to be a d-dimensional binary variable, i.e., u ∈ {0, 1}^d, and x is assumed to be either binary or real-valued. When x is a real-valued variable, the energy function has the following form ( Cho, Ilin, & Raiko, 2011 ):

    E(x, u) = (1/2)(x − b)^T Σ^{−1} (x − b) − c^T u − u^T W Σ^{−1/2} x,   (10)

where Σ, W, b, c are model parameters. We set Σ to be the identity matrix; this makes learning simpler with little performance sacrifice ( Taylor, Hinton, & Roweis, 2006 ). From Eqs. (9) and (10), the conditional distributions are obtained as follows:

    p(u_j = 1 | x) = δ(c_j + W(j, :) x),   j = 1, ..., d,   (11)
    p(x_i | u) = N(b_i + W(:, i)^T u, 1),   i = 1, ..., D,   (12)

where δ(·) is the sigmoid function, and W(j, :) and W(:, i) are the j-th row and the i-th column of W, respectively. This type of RBM is denoted the Gaussian–Bernoulli RBM. The input data is then represented and reconstructed in a probabilistic way using the conditional distributions. Given an input dataset {x^n}_{n=1}^N, maximum log-likelihood learning is formulated as the following optimization:

    max_θ L = Σ_{n=1}^{N} log p(x^n; θ),   (13)

where θ = {W, b, c} are the model parameters, and u is marginalized out ( i.e., integrated out via expectation). This problem can be solved via standard gradient descent. However, due to the computationally intractable partition function Z, an analytic formula for the gradient is usually unavailable. The model parameters are instead estimated using a learning method called the contrastive divergence (CD) method ( Carreira-Perpinan & Hinton, 2005 ); we refer the reader to Hinton (2002) for details on learning with RBM.
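For illustration, one contrastive-divergence (CD-1) update for the Gaussian–Bernoulli RBM described above might look as follows; the learning rate, initialization, and single Gibbs step are our assumptions, not the settings used in the paper.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cd1_update(X, W, b, c, lr=1e-3, rng=np.random.default_rng(4)):
    """One CD-1 step for a Gaussian-Bernoulli RBM with unit-variance visibles.

    X : mini-batch of shape (N, D); W : (D, d); b : (D,); c : (d,)
    Note: W here is stored as the transpose of the W in Eq. (10), so that
    p(u_j = 1 | x) = sigmoid(c_j + W[:, j] . x) and p(x_i | u) = N(b_i + W[i, :] . u, 1).
    """
    # positive phase
    pu = sigmoid(X @ W + c)                       # (N, d) hidden activation probabilities
    u = (rng.random(pu.shape) < pu).astype(float) # sampled binary hidden units
    # negative phase: one Gibbs step (mean-field reconstruction of the visibles)
    x_neg = b + u @ W.T
    pu_neg = sigmoid(x_neg @ W + c)
    # approximate gradient ascent on the log-likelihood
    W += lr * (X.T @ pu - x_neg.T @ pu_neg) / len(X)
    b += lr * (X - x_neg).mean(axis=0)
    c += lr * (pu - pu_neg).mean(axis=0)
    return W, b, c

rng = np.random.default_rng(5)
D, d = 380, 400
W, b, c = 0.01 * rng.standard_normal((D, d)), np.zeros(D), np.zeros(d)
X = rng.standard_normal((32, D))
W, b, c = cd1_update(X, W, b, c)
```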
3. Data specification
We construct a deep neural network using stock returns from
the KOSPI market, the major stock market in South Korea. We
first choose the fifty largest stocks in terms of market capitaliza-
tion at the beginning of the sample period, and keep only the
stocks which have a price record over the entire sample period.
This leaves 38 stocks in the sample, which are listed in Table 2 .
The stock prices are collected every five minutes during the trad-
ing hours of the sample period (04-Jan-2010 to 30-Dec-2014), and
five-minute logarithmic returns are calculated using the formula
r_t = ln(S_t / S_{t−Δt}), where S_t is the stock price at time t, and Δt is five minutes. We only consider intraday prediction, i.e., the first ten five-minute returns ( i.e., lagged returns with g = 10 ) each day are used only to construct the raw level input R_t, and not included in the target data. The sample contains a total of 1239 trading days and 73,041 five-minute returns (excluding the first ten returns each day) for each stock.
The training set consists of the first 80% of the sample (from 04-Jan-2010 to 24-Dec-2013), which contains 58,421 (N_1) stock returns, while the remaining 20% (from 26-Dec-2013 to 30-Dec-2014) with 14,620 (N_2) returns is used as the test set:

    Training set: {R_t^n, r_{i,t+1}^n}_{n=1}^{N_1},   i = 1, ..., M,
    Test set: {R_t^n, r_{i,t+1}^n}_{n=1}^{N_2},   i = 1, ..., M.

To avoid over-fitting during training, the last 20% of the training set is further separated as a validation set.
Fig. 1. Mean and variance of stock returns in each group in the test set. The upper graph displays the mean returns of the stocks in each group defined by the mean of the
past returns. The lower graph displays the variance of the returns in each group defined by the variance of the past returns. x -axis represents the stock ID. The details of the
grouping method can be found in Section 3.1 .
Fig. 2. Up/down prediction accuracy of RawData. Upper graph: prediction accuracies in each dataset, and the reference accuracies. Lower graph: The difference between the
test set accuracy and the reference accuracy. x -axis represents the stock ID.
All stock returns are normalized using the training set mean and standard deviation, i.e., for the mean μ_i and the standard deviation σ_i of r_{i,t} over the training set, the normalized return is defined as (r_{i,t} − μ_i)/σ_i. Henceforth, for notational convenience we will use r_{i,t} to denote the normalized return.
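As a hedged sketch of the preprocessing described in this section, the five-minute log returns and the training-set normalization can be computed as follows (the pandas DataFrame `prices_5min`, the sample counts, and the function names are illustrative assumptions):

```python
import numpy as np
import pandas as pd

def five_minute_log_returns(prices):
    """r_t = ln(S_t / S_{t-Δt}) for a DataFrame of 5-minute prices (rows = time, columns = stocks)."""
    return np.log(prices / prices.shift(1)).dropna()

def normalize_with_training_stats(returns, n_train):
    """z-score each stock's returns using the mean/std estimated on the training portion only."""
    mu = returns.iloc[:n_train].mean()
    sigma = returns.iloc[:n_train].std()
    return (returns - mu) / sigma

# hypothetical usage with a 5-minute price DataFrame `prices_5min`
# returns = five_minute_log_returns(prices_5min)
# returns = normalize_with_training_stats(returns, n_train=58_421)
```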
At each time t , we use ten lagged returns of the stocks in the
sample to construct the raw level input:
    R_t = [r_{1,t}, ..., r_{1,t−9}, ..., r_{38,t}, ..., r_{38,t−9}]^T.
3.1. Evidence of predictability in the Korean stock market
As a motivating example, we carry out a simple experiment to see whether past returns have predictive power for future returns. We first divide the returns of each stock into two groups according to the mean or variance of ten lagged returns: if the mean of the lagged returns, M(10), is greater than some threshold η, the return is assigned to one group; otherwise, it is assigned to the other group. Similarly, by comparing the variance of the lagged returns, V(10), with a second threshold, the returns are divided into two groups. The classification is carried out in the training set, and the thresholds (η for the mean, and its counterpart for the variance) are chosen for each stock so that the two groups have the same size. These thresholds are then applied to the test set to classify the returns, and the mean and variance of each group are calculated.
Fig. 3. Up/down prediction accuracy of PCA200 and PCA380. Upper graph: prediction accuracies in each dataset, and the reference accuracies. Lower graph: The difference between the test set accuracy and the reference accuracy. x-axis represents the stock ID.
The results are shown in Table 3 and Fig. 1. From the first graph in the figure, it can be clearly seen that the average return of the M(10) > η group is lower than that of the other group for all stocks except AMORE (36). The difference between the two groups is very large for some stocks, e.g., SAMSUNG SDI (10), HHI (12), and IBK (22). More specifically, the mean difference between the two groups is significant at the 99% confidence level for all stocks except AMORE, which is significant at the 95% level. Similar results are obtained for the variance-based classification. The variance difference is significant for all stocks at the 99% confidence level. The difference
between the two groups may not be large enough for many stocks
to make a profit after transaction costs are accounted for. Never-
theless, this example suggests that past returns do have predictive
power to a certain degree, which can be further exploited by deep
feature learning.
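The grouping experiment can be reproduced along the lines of the sketch below; the use of Welch's t-test from SciPy as the significance test and the variable names are our assumptions rather than the authors' exact procedure.

```python
import numpy as np
from scipy import stats

def group_test(returns, n_train, g=10):
    """Split one stock's returns by the mean of the g lagged returns and compare group means.

    returns : 1-D array of normalized 5-minute returns for a single stock
    n_train : number of observations treated as the training set
    The threshold eta is the training-set median of M(g), so the two groups
    have equal size in the training set; it is then applied to the test set.
    """
    lagged_mean = np.array([returns[t - g:t].mean() for t in range(g, len(returns))])
    future = returns[g:]                                 # target return aligned with M(g)
    eta = np.median(lagged_mean[: n_train - g])          # threshold chosen on the training set
    test_mask = np.arange(len(future)) >= n_train - g    # test-set portion
    high = future[test_mask & (lagged_mean > eta)]
    low = future[test_mask & (lagged_mean <= eta)]
    tstat, pval = stats.ttest_ind(high, low, equal_var=False)
    return high.mean(), low.mean(), pval

rng = np.random.default_rng(6)
print(group_test(rng.standard_normal(5000), n_train=4000))
```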
Fig. 4. Up/down prediction accuracy of RBM400 and RBM800. Upper graph: prediction accuracies in each dataset, and the reference accuracies. Lower graph: The difference
between the test set accuracy and the reference accuracy. x -axis represents the stock ID.
4. Market representation and trend prediction
In this section, we compare the representation methods de-
scribed in Section 2.3 , and analyze their effects on stock return
prediction. Each representation method receives the raw level market movements R_t ∈ R^380 as the input, and produces the representation output (features), u_t = φ(R_t). Several sizes of u_t are considered: 200 and 380 for PCA, and 400 and 800 for both RBM and AE. These representations are respectively denoted by PCA200, PCA380, RBM400, RBM800, AE400, and AE800 throughout the paper. For comparison, the raw level data R_t, denoted by RawData, is also considered as a representation. Therefore, there are a total of 7 representations.
4.1. Stock market trend prediction via logistic regression
Before training the DNN using the features obtained from the rep-
resentation methods, we perform a test to assess the predictive
power of the features. The test is designed to check whether the
features can predict the up/down movement of the future return.
Fig. 5. Up/down prediction accuracy of AE400 and AE800. Upper graph: prediction accuracies in each dataset, and the reference accuracies. Lower graph: The difference
between the test set accuracy and the reference accuracy. x -axis represents the stock ID.
This up/down prediction problem is essentially a two-class classification problem, which can be solved by, among others, the logistic regression method. Let y ∈ {−1, 1} denote the class label, with y = 1 (−1) representing an up (down) movement of r_{t+1}. The probability of y conditional on the features u_t is defined as

    P(y | u_t; w, b) = 1 / (1 + exp(−(w^T u_t + b) y)),   (14)

where w and b are model parameters. We label the returns r_{t+1} in the training set according to the rule: y = 1 if r_{t+1} > ε, and y = −1 if r_{t+1} < −ε, for some threshold ε (we set ε to 0.02). The regression parameters are then estimated from the input-target pairs in the training set, {u_t^n, y^n}_{n=1}^{N_1}, by the maximum likelihood estimation:

    max_{w,b} L = Σ_{n=1}^{N_1} log [ 1 / (1 + exp(−(w^T u_t^n + b) y^n)) ].   (15)

For each representation, we compute the classification accuracy, defined as the percentage of correct predictions of ups and downs, and compare it with a reference accuracy. The reference accuracy is defined as the percentage of the occurrences of ups
(15)