Quant Finance (Machine Learning Trading) — Statistical Arbitrage



Competitions / Platforms

  • Numerai: I participated on this platform back when it had just started, and even received some bitcoin as part of the reward. Since then, Numerai has created its own cryptocurrency and made several changes to the overall platform. Think of it as ML for finance, Kaggle style.
  • Quantopian: Another popular platform, more traditional in nature and with a much simpler scheme. It has plenty of data sources, a nice research environment and a thriving community on its forums.
  • Websim: A simulation platform by WorldQuant. It has an attractive payout scheme, with revenue sharing and fixed stipends if you do well.

Data

  • Access to quality data is the biggest barrier to entry.
  • The cost of quality data is extremely prohibitive.
  • Openly available data is ubiquitous and has low signal power.
  • Price data is non-stationary, non-IID and non-normal, which violates the assumptions of several machine learning algorithms.
  • Instead of sampling data at fixed time intervals, we sample it each time a fixed amount of volume has traded. These samples are called volume bars, and they have a dual advantage:
      • Returns computed on volume bars have better statistical properties (closer to IID and Gaussian) than returns on time bars.
      • Sampling also captures the information carried by volume: more bars are formed during periods of high activity, so we sample more often exactly when more information is arriving.
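
Volume-bar sampling can be sketched in a few lines of plain Python. This is a minimal sketch that assumes tick data arrives as parallel lists of trade prices and trade sizes. The `Bar` type, the `volume_bars` function and the threshold of 10 are hypothetical illustrations, not the author's code. For simplicity, this version closes a bar on the tick that crosses the threshold rather than splitting that tick across two bars.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Bar:
    open: float
    high: float
    low: float
    close: float
    volume: float


def volume_bars(prices: Sequence[float], volumes: Sequence[float], threshold: float) -> List[Bar]:
    """Group trade ticks into OHLCV bars, closing a bar once its cumulative volume reaches `threshold`."""
    if threshold <= 0:
        raise ValueError("threshold must be positive")
    bars: List[Bar] = []
    bar_prices: List[float] = []
    bar_volume = 0.0
    for price, volume in zip(prices, volumes):
        bar_prices.append(price)
        bar_volume += volume
        if bar_volume >= threshold:
            # Busy periods reach the threshold quickly and therefore produce more bars.
            bars.append(Bar(bar_prices[0], max(bar_prices), min(bar_prices), bar_prices[-1], bar_volume))
            bar_prices, bar_volume = [], 0.0
    # Trailing ticks that have not reached the threshold form an incomplete bar and are dropped.
    return bars


if __name__ == "__main__":
    # Toy ticks, made up for illustration only.
    prices = [100.0, 101.0, 99.0, 102.0, 103.0, 101.0]
    volumes = [5.0, 3.0, 4.0, 10.0, 2.0, 6.0]
    for bar in volume_bars(prices, volumes, threshold=10.0):
        print(bar)
```

With these toy ticks the example prints two bars. The first covers ticks 1 to 3 (volume 12) and the second covers tick 4 (volume 10). The last two ticks carry only 8 units of volume and are dropped as an incomplete bar.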

Signal Generation & Processing

  • Overfitting is the biggest challenge with ML models. Coupled with relentless backtesting, it can produce lots of spurious results.
  • The edge lies in feature selection, not backtesting. Use simple models to understand and interpret the top features/predictors.
  • Frame ML problems as classification rather than regression, and prefer simple models over complex ones (Occam’s Razor).
  • Follow a research-driven approach (EDA, summary statistics) rather than a backtesting-heavy one.
  • Split the data into train, validation and test sets.
  • Normalise the values.
  • Plot the training and validation error after each epoch. This helps reveal whether the model is overfitting and how well it generalises.
  • Order book “pictures” can be used to train models via transfer learning to predict the next set of price movements.
  • LSTMs: Most papers on time series have something relevant for financial data sets, and their pre-processing steps can be replicated for financial data as well.
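
The preprocessing steps above (classification labels, train/validation/test split, normalisation) can be sketched in plain Python. This sketch assumes a chronological split with no shuffling, which is standard for time series but not stated in the notes. It also fits the normalisation statistics on the training set only, so that no future information leaks into earlier data. The function names, the 60/20/20 split and the toy prices are all hypothetical.

```python
import statistics
from typing import List, Sequence, Tuple


def direction_labels(prices: Sequence[float]) -> List[int]:
    """Classification target: 1 if the next price is higher than the current one, else 0."""
    return [1 if nxt > cur else 0 for cur, nxt in zip(prices[:-1], prices[1:])]


def chronological_split(rows: Sequence, train_frac: float = 0.6, val_frac: float = 0.2) -> Tuple[list, list, list]:
    """Split in time order without shuffling, so validation and test data always come after training data."""
    n = len(rows)
    train_end = int(n * train_frac)
    val_end = train_end + int(n * val_frac)
    return list(rows[:train_end]), list(rows[train_end:val_end]), list(rows[val_end:])


def fit_zscore(train: Sequence[float]) -> Tuple[float, float]:
    """Estimate normalisation parameters on the training set only, to avoid look-ahead leakage."""
    mean = statistics.fmean(train)
    std = statistics.pstdev(train)
    if std == 0:
        raise ValueError("training feature has zero variance")
    return mean, std


def apply_zscore(values: Sequence[float], mean: float, std: float) -> List[float]:
    """Normalise any split using the training-set mean and standard deviation."""
    return [(v - mean) / std for v in values]


if __name__ == "__main__":
    # Toy prices, made up for illustration only.
    prices = [100.0, 101.0, 100.5, 102.0, 103.0, 102.5, 104.0, 105.0, 104.5, 106.0, 107.0]
    # Feature at time t: the return from t-1 to t. Label: direction of the move from t to t+1.
    returns = [b / a - 1.0 for a, b in zip(prices[:-1], prices[1:])]
    features, labels = returns[:-1], direction_labels(prices)[1:]
    X_train, X_val, X_test = chronological_split(features)
    y_train, y_val, y_test = chronological_split(labels)
    mean, std = fit_zscore(X_train)
    X_train, X_val, X_test = (apply_zscore(x, mean, std) for x in (X_train, X_val, X_test))
    print(len(X_train), len(X_val), len(X_test), y_train)
```

The per-epoch error curve would then be computed on `X_val`/`y_val` by whatever model you train. The model itself is deliberately left out here.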

Portfolio Allocation & Risk Management

  • Use the Kelly Criterion and portfolio allocation theory to distribute funds across signals. Both assume a known mean and variance of returns, which is a huge assumption.
  • Extreme returns (primarily negative) are more probable than a normal distribution predicts. This is referred to as fat tails in returns, and it is important from a risk management point of view.
  • Fat tails can result in larger drawdowns in live markets than in backtesting.
  • A large and diverse portfolio can bring the excess kurtosis close to zero, provided the assets' returns are independent.
  • This assumption of independence can be dangerous. Fat tails coupled with highly correlated asset movements led to the failure of risk management during the 2008 crisis.
  • The covariance between assets is constantly changing, and correlations rise sharply during negative stock movements.
  • Sharpe Ratio: the best single criterion for evaluating stocks, under the assumption that returns are normally distributed.
  • Returns are typically known to have high kurtosis and a long negative tail, so the probability of large drawdowns is greater in live trading than in backtests.
  • Bonferroni Correction: the p-value threshold for significance should be adjusted (divided by the number of tests) as we carry out more backtests. This ensures that a new maximum Sharpe ratio must be much greater than the previous maximum before we treat it as an actual signal.
  • Transaction costs are critical and must be checked for. They are often not accounted for in backtests and modelling.
  • Execution Strategy:
      • You can’t execute at midpoint prices, so the model needs to use the actual execution price, including the bid-ask spread.
      • Trade execution needs to account for volume as well, i.e. how much can actually trade at a given price.
      • At a minimum you need the last trade price with its volume. Ideally you have order book depth.
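
Several of the checks in the list above can be sketched as small helper functions in plain Python. This is a minimal sketch under stated assumptions, not the author's method, and all function names are hypothetical. The Kelly fraction uses the continuous-time/Gaussian approximation f* = μ/σ² for a single asset with known mean excess return μ and variance σ². The transaction-cost model is a simple proportional one. The order-book walk assumes a market buy against ask levels given as (price, size) pairs, best price first.

```python
import math
import statistics
from typing import List, Sequence, Tuple


def kelly_fraction(mean_excess_return: float, variance: float) -> float:
    """Kelly leverage f* = mu / sigma^2 (continuous/Gaussian approximation).

    Assumes the mean and variance of excess returns are known, which in practice they are not.
    """
    return mean_excess_return / variance


def sharpe_ratio(excess_returns: Sequence[float], periods_per_year: int = 252) -> float:
    """Annualised Sharpe ratio from per-period excess returns, using the sample standard deviation."""
    return statistics.fmean(excess_returns) / statistics.stdev(excess_returns) * math.sqrt(periods_per_year)


def excess_kurtosis(returns: Sequence[float]) -> float:
    """Moment-based excess kurtosis: 0 for a normal distribution, positive for fat tails."""
    mu = statistics.fmean(returns)
    m2 = statistics.fmean([(r - mu) ** 2 for r in returns])
    m4 = statistics.fmean([(r - mu) ** 4 for r in returns])
    return m4 / m2 ** 2 - 3.0


def bonferroni_alpha(alpha: float, n_backtests: int) -> float:
    """Per-backtest significance level that keeps the family-wise error rate at or below alpha."""
    return alpha / n_backtests


def net_of_costs(gross_returns: Sequence[float], turnover: Sequence[float], cost_rate: float) -> List[float]:
    """Subtract proportional costs: cost_rate times the fraction of capital traded each period."""
    return [g - cost_rate * t for g, t in zip(gross_returns, turnover)]


def average_fill_price(ask_levels: Sequence[Tuple[float, float]], quantity: float) -> float:
    """Average price of a market buy that walks the ask side of the book, best level first."""
    remaining, cost = quantity, 0.0
    for price, size in ask_levels:
        take = min(remaining, size)
        cost += take * price
        remaining -= take
        if remaining <= 0:
            return cost / quantity
    raise ValueError("not enough depth in the book to fill the order")
```

For example, `bonferroni_alpha(0.05, 100)` means each of 100 backtests must clear p < 0.0005. Buying 20 units against asks of 10 @ 100.5 and 20 @ 101.0 fills at an average of 100.75, which is worse than both the best ask and any midpoint below it.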

Startups & Analytics

Abhinav Unnam
