
Advanced Algorithmic Trading.pdf

I Introduction
Introduction To Advanced Algorithmic Trading
The Hunt for Alpha
Why Time Series Analysis, Bayesian Statistics and Machine Learning?
Bayesian Statistics
Time Series Analysis
Machine Learning
How Is The Book Laid Out?
Required Technical Background
Mathematics
Programming
How Does This Book Differ From "Successful Algorithmic Trading"?
Software Installation
Installing Python
Installing R
QSTrader Backtesting Simulation Software
Alternatives
Where to Get Help
II Bayesian Statistics
Introduction to Bayesian Statistics
What is Bayesian Statistics?
Frequentist vs Bayesian Examples
Applying Bayes' Rule for Bayesian Inference
Coin-Flipping Example
Bayesian Inference of a Binomial Proportion
The Bayesian Approach
Assumptions of the Approach
Recalling Bayes' Rule
The Likelihood Function
Bernoulli Distribution
Bernoulli Likelihood Function
Multiple Flips of the Coin
Quantifying our Prior Beliefs
Beta Distribution
Why Is A Beta Prior Conjugate to the Bernoulli Likelihood?
Multiple Ways to Specify a Beta Prior
Using Bayes' Rule to Calculate a Posterior
Markov Chain Monte Carlo
Bayesian Inference Goals
Why Markov Chain Monte Carlo?
Markov Chain Monte Carlo Algorithms
The Metropolis Algorithm
Introducing PyMC3
Inferring a Binomial Proportion with Markov Chain Monte Carlo
Inferring a Binomial Proportion with Conjugate Priors Recap
Inferring a Binomial Proportion with PyMC3
Bibliographic Note
Bayesian Linear Regression
Frequentist Linear Regression
Bayesian Linear Regression
Bayesian Linear Regression with PyMC3
What are Generalised Linear Models?
Simulating Data and Fitting the Model with PyMC3
Bibliographic Note
Full Code
Bayesian Stochastic Volatility Model
Stochastic Volatility
Bayesian Stochastic Volatility
PyMC3 Implementation
Obtaining the Price History
Model Specification in PyMC3
Fitting the Model with NUTS
Full Code
III Time Series Analysis
Introduction to Time Series Analysis
What is Time Series Analysis?
How Can We Apply Time Series Analysis in Quantitative Finance?
Time Series Analysis Software
Time Series Analysis Roadmap
How Does This Relate to Other Statistical Tools?
Serial Correlation
Expectation, Variance and Covariance
Example: Sample Covariance in R
Correlation
Example: Sample Correlation in R
Stationarity in Time Series
Serial Correlation
The Correlogram
Example 1 - Fixed Linear Trend
Example 2 - Repeated Sequence
Next Steps
Random Walks and White Noise Models
Time Series Modelling Process
Backward Shift and Difference Operators
White Noise
Second-Order Properties
Correlogram
Random Walk
Second-Order Properties
Correlogram
Fitting Random Walk Models to Financial Data
Autoregressive Moving Average Models
How Will We Proceed?
Strictly Stationary
Akaike Information Criterion
Autoregressive (AR) Models of order p
Rationale
Stationarity for Autoregressive Processes
Second Order Properties
Simulations and Correlograms
Financial Data
Moving Average (MA) Models of order q
Rationale
Definition
Second Order Properties
Simulations and Correlograms
Financial Data
Next Steps
Autoregressive Moving Average (ARMA) Models of order p, q
Bayesian Information Criterion
Ljung-Box Test
Rationale
Definition
Simulations and Correlograms
Choosing the Best ARMA(p,q) Model
Financial Data
Next Steps
Autoregressive Integrated Moving Average and Conditional Heteroskedastic Models
Quick Recap
Autoregressive Integrated Moving Average (ARIMA) Models of order p, d, q
Rationale
Definitions
Simulation, Correlogram and Model Fitting
Financial Data and Prediction
Next Steps
Volatility
Conditional Heteroskedasticity
Autoregressive Conditional Heteroskedastic Models
ARCH Definition
Why Does This Model Volatility?
When Is It Appropriate To Apply ARCH(1)?
ARCH(p) Models
Generalised Autoregressive Conditional Heteroskedastic Models
GARCH Definition
Simulations, Correlograms and Model Fittings
Financial Data
Next Steps
Cointegrated Time Series
Mean Reversion Trading Strategies
Cointegration
Unit Root Tests
Augmented Dickey-Fuller Test
Phillips-Perron Test
Phillips-Ouliaris Test
Difficulties with Unit Root Tests
Simulated Cointegrated Time Series with R
Cointegrated Augmented Dickey Fuller Test
CADF on Simulated Data
CADF on Financial Data
EWA and EWC
RDS-A and RDS-B
Full Code
Johansen Test
Johansen Test on Simulated Data
Johansen Test on Financial Data
Full Code
State Space Models and the Kalman Filter
Linear State-Space Model
The Kalman Filter
A Bayesian Approach
Prediction
Dynamic Hedge Ratio Between ETF Pairs Using the Kalman Filter
Linear Regression via the Kalman Filter
Applying the Kalman Filter to a Pair of ETFs
TLT and ETF
Scatterplot of ETF Prices
Time-Varying Slope and Intercept
Next Steps
Bibliographic Note
Full Code
Hidden Markov Models
Markov Models
Markov Model Mathematical Specification
Hidden Markov Models
Hidden Markov Model Mathematical Specification
Filtering of Hidden Markov Models
Regime Detection with Hidden Markov Models
Market Regimes
Simulated Data
Financial Data
Next Steps
Bibliographic Note
Full Code
IV Statistical Machine Learning
Introduction to Machine Learning
What is Machine Learning?
Machine Learning Domains
Supervised Learning
Unsupervised Learning
Reinforcement Learning
Machine Learning Techniques
Linear Regression
Linear Classification
Tree-Based Methods
Support Vector Machines
Artificial Neural Networks and Deep Learning
Bayesian Networks
Clustering
Dimensionality Reduction
Machine Learning Applications
Forecasting and Prediction
Natural Language Processing
Factor Models
Image Classification
Model Accuracy
Parametric and Non-Parametric Models
Statistical Framework for Machine Learning Domains
Supervised Learning
Mathematical Framework
Classification
Regression
Financial Example
Training
Linear Regression
Linear Regression
Probabilistic Interpretation
Basis Function Expansion
Maximum Likelihood Estimation
Likelihood and Negative Log Likelihood
Ordinary Least Squares
Simulated Data Example with Scikit-Learn
Full Code
Bibliographic Note
Tree-Based Methods
Decision Trees - Mathematical Overview
Decision Trees for Regression
Creating a Regression Tree and Making Predictions
Pruning The Tree
Decision Trees for Classification
Classification Error Rate/Hit Rate
Gini Index
Cross-Entropy/Deviance
Advantages and Disadvantages of Decision Trees
Advantages
Disadvantages
Ensemble Methods
The Bootstrap
Bootstrap Aggregation
Random Forests
Boosting
Python Scikit-Learn Implementation
Bibliographic Note
Full Code
Support Vector Machines
Motivation for Support Vector Machines
Advantages and Disadvantages of SVMs
Advantages
Disadvantages
Linear Separating Hyperplanes
Classification
Deriving the Classifier
Constructing the Maximal Margin Classifier
Support Vector Classifiers
Support Vector Machines
Bibliographic Notes
Model Selection and Cross-Validation
Bias-Variance Trade-Off
Machine Learning Models
Model Selection
The Bias-Variance Tradeoff
Cross-Validation
Overview of Cross-Validation
Forecasting Example
Validation Set Approach
k-Fold Cross Validation
Python Implementation
k-Fold Cross Validation
Full Python Code
Unsupervised Learning
High Dimensional Data
Mathematical Overview of Unsupervised Learning
Unsupervised Learning Algorithms
Dimensionality Reduction
Clustering
Bibliographic Note
Clustering Methods
K-Means Clustering
The Algorithm
Issues
Simulated Data
OHLC Clustering
Bibliographic Note
Full Code
Natural Language Processing
Overview
Supervised Document Classification
Preparing a Dataset for Classification
Vectorisation
Term-Frequency Inverse Document-Frequency
Training the Support Vector Machine
Performance Metrics
Full Code
V Quantitative Trading Techniques
Introduction to QSTrader
Motivation for QSTrader
Design Considerations
Installation
Introductory Portfolio Strategies
Motivation
The Trading Strategies
Data
Python QSTrader Implementation
MonthlyLiquidateRebalanceStrategy
LiquidateRebalancePositionSizer
Backtest Interface
Strategy Results
Transaction Costs
US Equities/Bonds 60/40 ETF Portfolio
"Strategic" Weight ETF Portfolio
Equal Weight ETF Portfolio
Full Code
ARIMA+GARCH Trading Strategy on Stock Market Indexes Using R
Strategy Overview
Strategy Implementation
Strategy Results
Full Code
Cointegration-Based Pairs Trading using QSTrader
The Hypothesis
Cointegration Tests in R
The Trading Strategy
Data
Python QSTrader Implementation
Strategy Results
Transaction Costs
Tearsheet
Full Code
Kalman Filter-Based Pairs Trading using QSTrader
The Trading Strategy
Data
Python QSTrader Implementation
Strategy Results
Next Steps
Full Code
Supervised Learning for Intraday Returns Prediction using QSTrader
Prediction Goals with Machine Learning
Class Imbalance
Building a Prediction Model on Historical Data
QSTrader Strategy Object
QSTrader Backtest Script
Results
Next Steps
Full Code
Sentiment Analysis via Sentdex Vendor Sentiment Data with QSTrader
Sentiment Analysis
Sentdex API and Sample File
The Trading Strategy
Data
Python Implementation
Sentiment Handling with QSTrader
Sentiment Analysis Strategy Code
Strategy Results
Transaction Costs
Sentiment on S&P500 Tech Stocks
Sentiment on S&P500 Energy Stocks
Sentiment on S&P500 Defence Stocks
Full Code
Market Regime Detection with Hidden Markov Models using QSTrader
Regime Detection with Hidden Markov Models
The Trading Strategy
Data
Python Implementation
Returns Calculation with QSTrader
Regime Detection Implementation
Strategy Results
Transaction Costs
No Regime Detection Filter
HMM Regime Detection Filter
Full Code
Strategy Decay
Calculating the Annualised Rolling Sharpe Ratio
Python QSTrader Implementation
Strategy Results
Kalman Filter Pairs Trade
Aluminum Smelting Cointegration Strategy
Sentdex Sentiment Analysis Strategy