Quant Finance (Machine Learning Trading) — Statistical Arbitrage


This provides a nice refresher on the possibilities of using ML in quant finance. It also uses Python (Pandas) for processing and gives a neat introduction.


A couple of books on quant finance were recommended by multiple sources, especially by several users on Reddit. The first one is a nice hands-on book that lets you build on the Udacity course.

Competitions/ Platforms

  • Numerai: I participated in this portal back when they had just started, and even received some bitcoin as part of the reward. Since then they have created their own cryptocurrency and made several changes to the overall platform. Think ML for finance, Kaggle style.
  • Quantopian: Another popular platform, but more traditional in nature and with a much simpler scheme. It has plenty of data sources, a nice research environment and a thriving community on its forums.
  • Websim: Another simulation platform, by WorldQuant. It has a nice payout scheme if you do well, with revenue sharing and fixed stipends.
  • Cost is extremely prohibitive.
  • Openly available data is ubiquitous and has low signal power.
  • Price data points are non-stationary, non-IID and non-normal, which violates the assumptions of several Machine Learning algorithms.
  • Instead of sampling data at fixed time intervals, we can sample it each time a fixed amount of volume has traded. These samples are called volume bars. This has a dual advantage:
  • Returns sampled on volume bars have better statistical properties (closer to IID and Gaussian).
  • It also takes the volume aspect of information into account. We capture more information because sampling is denser during periods of higher activity.
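
The volume bar idea above can be sketched in a few lines of Pandas. This is only a minimal illustration under my own assumptions: the `volume_bars` function, the `price` and `volume` column names and the sample ticks are all hypothetical, and trades are never split across bars, so each bar holds only roughly `bar_volume` units.

```python
import pandas as pd


def volume_bars(trades: pd.DataFrame, bar_volume: float) -> pd.DataFrame:
    """Aggregate trades into bars of roughly `bar_volume` traded units.

    `trades` is assumed to be in time order with 'price' and 'volume' columns.
    """
    # Volume traded before each trade decides which bar the trade belongs to.
    volume_before = trades["volume"].cumsum() - trades["volume"]
    bar_id = (volume_before // bar_volume).astype(int).rename("bar")

    # Build one OHLCV row per bar.
    return trades.groupby(bar_id).agg(
        open=("price", "first"),
        high=("price", "max"),
        low=("price", "min"),
        close=("price", "last"),
        volume=("volume", "sum"),
    )


# Hypothetical tick data: six trades, sampled into bars of 100 units each.
trades = pd.DataFrame(
    {
        "price": [10.0, 11.0, 12.0, 13.0, 14.0, 15.0],
        "volume": [50, 50, 100, 30, 30, 40],
    }
)
bars = volume_bars(trades, bar_volume=100)
print(bars)
```

A busy period produces many bars and a quiet one produces few, which is where the higher sampling during higher activity comes from.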

Signal Generation & Processing

This is the area where the majority of ML applications are being explored. Everything involved in converting data sets into useful signals belongs here.

  • Feature selection, not back-testing, is where the edge is. Use simple models to understand and interpret the top features/predictors.
  • Model ML problems as classification rather than regression. Prefer simple models over complex approaches: Occam’s Razor.
  • Follow a research-driven approach (EDA, summaries) rather than a back-testing-heavy one.
  • Split Data into Train, Validation & Test.
  • Normalise the values.
  • Plot the error values after each epoch. This helps in understanding whether the model is overfitting, how well it generalises, etc.
  • Order book “pictures” can be used with transfer learning to train models that predict the next set of movements.
  • LSTMs: Any paper on time series will have some relevant material for financial data sets. The pre-processing steps can be replicated for financial data sets as well.
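
The split and normalise steps from the list above might look like the sketch below. It rests on assumptions of my own: the split is chronological (no shuffling) so that later data never leaks into training, and the scaling statistics come from the training set only. The function names and the 60/20/20 proportions are illustrative.

```python
import numpy as np
import pandas as pd


def chronological_split(features: pd.DataFrame, train_frac=0.6, val_frac=0.2):
    """Split time-ordered rows into train, validation and test sets."""
    n = len(features)
    train_end = int(round(n * train_frac))
    val_end = train_end + int(round(n * val_frac))
    return (
        features.iloc[:train_end],
        features.iloc[train_end:val_end],
        features.iloc[val_end:],
    )


def normalise(train: pd.DataFrame, *others: pd.DataFrame):
    """Z-score every set using the mean and std of the training set only."""
    mean = train.mean()
    # Guard against constant columns, which would otherwise divide by zero.
    std = train.std(ddof=0).replace(0, 1.0)
    return [(df - mean) / std for df in (train, *others)]


# Hypothetical feature matrix with ten time-ordered rows.
features = pd.DataFrame({"a": np.arange(10.0), "b": np.arange(10.0) * 2})
train, val, test = chronological_split(features)
train_n, val_n, test_n = normalise(train, val, test)
```

Reusing the training statistics for the validation and test sets keeps information from the later periods out of the model.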

Portfolio Allocation & Risk Management

Besides signal generation, portfolio allocation also involves several risk management principles. I have seen firms employ strict risk management constraints, such as not more than 2% liquidity in one stock.

  • Extreme returns (primarily negative ones) have a higher probability than a normal distribution would suggest. This is important from a risk management point of view and is referred to as fat tails in returns.
  • This can result in greater drawdowns in live markets than in back-tests.
  • A large and diverse portfolio can bring the excess kurtosis close to zero, but only if the assets move independently.
  • This assumption of independence can be dangerous. Fat tails coupled with highly correlated asset movements resulted in a failure of risk management during the 2008 crisis.
  • The covariance between assets is constantly changing. Assets become highly correlated during negative stock movements.
  • Returns are typically known to have high kurtosis and a long negative tail, so the probability of large drawdowns is greater in real scenarios than in back-tests.
  • Bonferroni Test: The p-value threshold for significance is adjusted downwards as we carry out more back-tests. This ensures that the new max Sharpe >> old max Sharpe before we treat it as an actual signal.
  • Transaction costs are critical to check for. They are often not accounted for in back-tests and modelling.
  • Execution Strategy:
  • You can’t execute at midpoint prices, so you need to include the bid and ask prices.
  • Trade execution needs to include the volume aspect as well.
  • At the minimum you need the last trade price with volume. The best is to have order book depth.
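
The Bonferroni adjustment mentioned in the list above can be sketched as follows. This is a rough illustration, not a full test procedure: it assumes IID, roughly normal returns so that the t-statistic of a Sharpe ratio is approximately the annualised Sharpe times the square root of the number of years, and the function names are my own.

```python
import math


def sharpe_p_value(annual_sharpe: float, years: float) -> float:
    """One-sided p-value for 'true Sharpe > 0' under a normal approximation."""
    # Assumption: t-stat ~ annualised Sharpe * sqrt(years of data).
    t_stat = annual_sharpe * math.sqrt(years)
    # Upper-tail probability of a standard normal variable.
    return 0.5 * math.erfc(t_stat / math.sqrt(2))


def bonferroni_significant(p_value: float, n_trials: int, alpha: float = 0.05) -> bool:
    """Compare the p-value against alpha divided by the number of back-tests run."""
    return p_value < alpha / n_trials


# A Sharpe of 1.0 over 4 years looks significant if it was the only strategy tried...
p = sharpe_p_value(annual_sharpe=1.0, years=4.0)
single_try = bonferroni_significant(p, n_trials=1)
# ...but not if it was the best of 10 back-tested variations.
best_of_ten = bonferroni_significant(p, n_trials=10)
print(round(p, 4), single_try, best_of_ten)
```

The more variations you back-test, the higher the Sharpe ratio the best one needs before it clears the adjusted threshold.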


