Deep Learning for Multivariate
Financial Time Series
Gilberto Batres-Estrada
June 4, 2015
Abstract
Deep learning is a framework for training and modelling neural networks,
which have recently surpassed conventional methods in many learning
tasks, most prominently image and voice recognition.
This thesis uses deep learning algorithms to forecast financial data. The
deep learning framework is used to train a deep neural network, a DBN
coupled to an MLP, which is used to choose stocks to form portfolios. The
portfolios achieve better returns than the median of the stocks forming the
list. The stocks forming the S&P 500 are included in the study.
The results obtained from the deep neural network are compared to
benchmarks from a logistic regression network, a multilayer perceptron and
a naive benchmark, and are found to be better and more stable than the
benchmarks. The findings support the view that deep learning methods
will find their way into finance due to their reliability and good performance.
Keywords: Back-Propagation Algorithm, Neural networks, Deep Belief Net-
works, Multilayer Perceptron, Deep Learning, Contrastive Divergence, Greedy
Layer-wise Pre-training.
Acknowledgements
I would like to thank Söderberg & Partners, my supervisor Peng Zhou at
Söderberg & Partners, my supervisor Jonas Hallgren and examiner Filip
Lindskog at KTH Royal Institute of Technology for their support and
guidance during the course of this interesting project.
Stockholm, May 2015
Gilberto Batres-Estrada
Contents
1 Introduction
  1.1 Background
  1.2 Literature Survey
2 Neural Networks
  2.1 Single Layer Neural Network
    2.1.1 Artificial Neurons
    2.1.2 Activation Function
    2.1.3 Single-Layer Feedforward Networks
    2.1.4 The Perceptron
    2.1.5 The Perceptron As a Classifier
  2.2 Multilayer Neural Networks
    2.2.1 The Multilayer Perceptron
    2.2.2 Function Approximation with MLP
    2.2.3 Regression and Classification
    2.2.4 Deep Architectures
  2.3 Deep Belief Networks
    2.3.1 Boltzmann Machines
    2.3.2 Restricted Boltzmann Machines
    2.3.3 Deep Belief Networks
    2.3.4 Model for Financial Application
3 Training Neural Networks
  3.1 Back-Propagation Algorithm
    3.1.1 Steepest Descent
    3.1.2 The Delta Rule
      Case 1 Output Layer
      Case 2 Hidden Layer
      Summary
    3.1.3 Forward and Backward Phase
      Forward Phase
      Backward Phase
    3.1.4 Computation of δ for Known Activation Functions
    3.1.5 Choosing Learning Rate
    3.1.6 Stopping Criteria
      Early-Stopping
    3.1.7 Heuristics For The Back-Propagation Algorithm
  3.2 Batch and On-Line Learning
    3.2.1 Batch Learning
    3.2.2 The Use of Batches
    3.2.3 On-Line Learning
    3.2.4 Generalization
    3.2.5 Example: Regression with Neural Networks
  3.3 Training Restricted Boltzmann Machines
    3.3.1 Contrastive Divergence
  3.4 Training Deep Belief Networks
    3.4.1 Implementation
4 Financial Model
  4.1 The Model
    4.1.1 Input Data and Financial Model
5 Experiments and Results
  5.1 Experiments
  5.2 Benchmarks
  5.3 Results
    5.3.1 Summary of Results
6 Discussion
Appendices
A Appendix
  A.1 Statistical Physics
    A.1.1 Logistic Belief Networks
    A.1.2 Gibbs Sampling
    A.1.3 Back-Propagation: Regression
  A.2 Miscellaneous
Chapter 1
Introduction
Deep learning is gaining a lot of popularity in the machine learning
community, and especially big technology companies such as Google Inc.,
Microsoft and Facebook are investing in this area of research. Deep
learning is a set of learning algorithms designed to train so-called
artificial neural networks, an area of research in machine learning and
artificial intelligence (AI). Neural networks are hard to train if they
become too complex, e.g., networks with many layers; see Chapter 2 on
neural networks. Deep learning is a framework that facilitates the training
of deep neural networks with many hidden layers, which was not feasible
before its invention.
The main task of this master's thesis is to use methods of deep learning
to compose portfolios. This is done by picking stocks according to a
function learned by a deep neural network. This function takes values in
the discrete set {0, 1}, representing a class label. The prediction task
is thus performed as a classification task, assigning a class or label to
a stock depending on the past history of the stock, see Chapter 4.
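The labelling step described above can be sketched in a few lines. The
exact labelling rule is defined by the financial model in Chapter 4; the
cross-sectional median threshold below is an assumption made purely for
illustration, not the thesis' rule.

```python
import numpy as np

def label_stocks(returns):
    """Assign a binary class label in {0, 1} to each stock for one period.

    Illustrative rule (an assumption, not the thesis' definition):
    a stock is labelled 1 if its return exceeds the cross-sectional
    median of all stocks' returns for the period, else 0.
    """
    returns = np.asarray(returns, dtype=float)
    median = np.median(returns)
    return (returns > median).astype(int)

# Four hypothetical per-period stock returns:
labels = label_stocks([0.02, -0.01, 0.05, 0.00])
# stocks with above-median returns receive label 1
```

A classifier trained on past price history then predicts this label, and
stocks predicted as class 1 are the candidates for the portfolio.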
We begin this work by presenting the background, in which the formulation
of the task to be solved is stated in detail. We then continue with a
short survey of the literature studied in order to understand the area of
deep learning. We have tried to distinguish between the theory of neural
network architecture and the theory of how to train such networks: theory
and architecture are given in Chapter 2, and the training of neural
networks is presented in Chapter 3. In Chapter 4 the financial model is
presented along with the assumptions made. In Chapter 5 we describe the
experiments performed and show their results. The thesis concludes with
Chapter 6, comprising a discussion of the results and reflections on model
choice and new paths to be taken in this area of research.
1.1 Background
Our artificial intelligence system is constructed around neural networks.
Neural networks, and in particular shallow networks, have been studied
over the years as predictors of movements in financial markets, and there
are plenty of articles on the subject; see for instance (Kuo et al.,
2014), which contains a long reference list on the subject. We will use a
type of neural network called a deep belief network (DBN), a stochastic
learning machine, which we connect to a multilayer perceptron (MLP).
The theory describing these networks is presented in Chapter 2 on neural
networks. Predicting how financial markets evolve over time is an
important subject of research and also a complex task, because stock
prices move in a seemingly random way. Many factors could be behind this
stochastic behaviour, most of them complex and difficult to predict, but
one that is certainly part of the explanation is human behaviour and
psychology.
We rely on past experience and try to model the future empirically with
the help of price history from the financial markets. In particular,
historical returns are considered to be a good representation of future
returns (Hult, Lindskog et al., 2012). Appropriate transformations of
historical samples will produce samples of the random returns that
determine the future portfolio values. If we consider returns to be
identically distributed random variables, then we can assume that the
mechanism that produced the returns in the past is the same mechanism
behind the returns produced in the future (Hult, Lindskog et al., 2012).
We gather data from the financial markets and present it to our learning
algorithm. The assumptions made are presented in Chapter 4 on the
financial model. We chose to study the S&P 500.
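The transformation from a price history to return samples can be sketched
as follows. This is a minimal illustration of one common transformation
(log returns); the exact input transformation used in this work is
specified in Chapter 4.

```python
import numpy as np

def log_returns(prices):
    """Transform a price series into log-return samples.

    Under the identical-distribution assumption above, these
    historical returns are treated as samples of the random
    returns that determine future portfolio values.
    """
    prices = np.asarray(prices, dtype=float)
    # One return per consecutive pair of prices: log(p_{t+1} / p_t)
    return np.diff(np.log(prices))

# Four hypothetical closing prices yield three return samples:
r = log_returns([100.0, 102.0, 101.0, 103.0])
```

Such return series, rather than raw prices, are what is presented to the
learning algorithm.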
1.2 Literature Survey
This work is based, besides computer experiments, on literature studies of
both standard books in machine learning and papers on the subject of deep
learning. An introduction to artificial neural networks can be found in
The Elements of Statistical Learning by Hastie, Tibshirani and Friedman
(Hastie and Friedman, 2009). Simon Haykin goes deeper into the theory of
neural networks in his book Neural Networks and Learning Machines, where
he also introduces some theory on deep learning (Haykin, 2009). Research
on deep learning is focused mostly on tasks in artificial intelligence,
intending to make machines perform better on tasks such as vision
recognition, speech recognition, etc. In many papers, e.g. (Hinton et al., 2006),