Cricket Analytics Starter Kit — Data Science Projects — Statistical Arbitrage

Cricket has a huge following in the subcontinent, with the IPL last valued at USD 5.3 billion. This bat-and-ball game, played largely in Commonwealth nations, is not just entertaining to watch but also has a growing analytical use case. So much so that there are now courses and classes on cricket analytics!

To me, cricket is a simple game. Keep it simple and just go out and play.

Shane Warne

Given the discrete nature of the game and the growth of the IPL, demand for cricket analytics as an edge is growing, both for on-field performance and for ancillary services.


  • The same index works in the opposite direction for bowlers, combining economy and wickets.
  • The idea is to reduce a player's contribution or effectiveness to a single number so that players can be compared.
  • The paper and some of the resources are mentioned towards the end.
  • The tools can be used separately to explore batsmen and bowlers. The data, though, currently covers only IPL matches from the last few years (possibly 2017 onwards) and may have a few missing values.
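The index itself is only outlined above, so purely to illustrate the idea of collapsing a record into one comparable number (this is not the method from the paper mentioned towards the end), here is a minimal sketch. It assumes a batting index of batting average times strike rate divided by 100, where higher is better, and a bowling index of bowling average times economy rate, where lower is better. The record types, function names and player figures are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class BattingRecord:
    runs: int
    balls_faced: int
    dismissals: int


@dataclass
class BowlingRecord:
    runs_conceded: int
    legal_balls: int
    wickets: int


def batting_index(r: BattingRecord) -> float:
    """Illustrative index (assumption): average * strike rate / 100. Higher is better."""
    if r.balls_faced == 0:
        return 0.0
    average = r.runs / max(r.dismissals, 1)  # treat an unbeaten record as one dismissal
    strike_rate = 100 * r.runs / r.balls_faced  # runs per 100 balls
    return average * strike_rate / 100


def bowling_index(r: BowlingRecord) -> float:
    """Illustrative index (assumption): average * economy rate. Lower is better."""
    if r.legal_balls == 0:
        return float("inf")  # no balls bowled, nothing to rate
    average = r.runs_conceded / max(r.wickets, 1)  # treat a wicketless record as one wicket
    economy = 6 * r.runs_conceded / r.legal_balls  # runs per six-ball over
    return average * economy


# Made-up records, for illustration only.
batters = {
    "Batter A": BattingRecord(runs=450, balls_faced=300, dismissals=10),
    "Batter B": BattingRecord(runs=500, balls_faced=400, dismissals=8),
}
bowlers = {
    "Bowler X": BowlingRecord(runs_conceded=300, legal_balls=240, wickets=12),
    "Bowler Y": BowlingRecord(runs_conceded=280, legal_balls=240, wickets=8),
}

# Batters ranked descending (higher is better), bowlers ascending (lower is better).
batting_table = sorted(batters, key=lambda n: batting_index(batters[n]), reverse=True)
bowling_table = sorted(bowlers, key=lambda n: bowling_index(bowlers[n]))
print(batting_table, bowling_table)
```

Multiplying average by strike rate rewards batters who both survive and score quickly; a real index would also need to adjust for match situation, which this sketch ignores.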

Given that my previous startup experience revolved around trying to cash in on this growing niche, I wanted to recap and document what I learned about the ecosystem in general. Broadly, as mentioned above, the opportunity lies in two directions: fan engagement and performance analysis.

Fan engagement largely involves fantasy gaming sites, IPL teams and celebrity involvement, primarily from Bollywood. Fan-engagement data is largely private, so most of this post is about performance analysis data.


The only open-source dataset available was Cricsheet. Unfortunately, it stopped updating after July 2017, but White Ball Analytics recently released an updated version.

Courses for Cricket Analytics

There are only a few books on the subject, a couple of them by Tinniam V Ganesh.


Data & Processing

This free historical dataset is limited to a ball-by-ball event catalogue. In my experience, though, a couple of paid vendors offer much richer data, including sensor information. Access to a greater diversity of data should make a broader range of analytics possible, beyond the obvious metrics.
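Most analysis on a ball-by-ball catalogue starts by aggregating deliveries into per-player totals. Below is a minimal sketch, assuming a simplified, hypothetical Delivery record; the extras labels ("wides", "noballs", "byes", "legbyes") loosely follow how Cricsheet names extras, but the field layout is an assumption to check against the real files. It applies the usual scoring conventions: a wide is not a ball faced, neither wides nor no-balls are legal deliveries, byes and leg byes are not charged to the bowler, and run outs are not credited to the bowler.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional

# Dismissal kinds credited to the bowler (run outs and retirements are not).
BOWLER_WICKETS = {"bowled", "caught", "caught and bowled", "lbw", "stumped", "hit wicket"}


@dataclass
class Delivery:
    """Hypothetical, simplified ball-by-ball record."""
    batter: str
    bowler: str
    batter_runs: int = 0
    extras: int = 0
    extra_type: Optional[str] = None  # "wides", "noballs", "byes", "legbyes" or None
    player_out: Optional[str] = None
    wicket_kind: Optional[str] = None


def aggregate(deliveries):
    bat = defaultdict(lambda: {"runs": 0, "balls": 0, "outs": 0})
    bowl = defaultdict(lambda: {"runs": 0, "balls": 0, "wickets": 0})
    for d in deliveries:
        # A wide is not a ball faced; a no-ball is.
        if d.extra_type != "wides":
            bat[d.batter]["balls"] += 1
        bat[d.batter]["runs"] += d.batter_runs

        # Only legal deliveries count towards the bowler's balls.
        if d.extra_type not in ("wides", "noballs"):
            bowl[d.bowler]["balls"] += 1
        # Wides and no-balls are charged to the bowler; byes and leg byes are not.
        charged = d.extras if d.extra_type in ("wides", "noballs") else 0
        bowl[d.bowler]["runs"] += d.batter_runs + charged

        # The dismissed player may be the non-striker (e.g. a run out).
        if d.player_out is not None and d.wicket_kind != "retired hurt":
            bat[d.player_out]["outs"] += 1
        if d.wicket_kind in BOWLER_WICKETS:
            bowl[d.bowler]["wickets"] += 1
    return dict(bat), dict(bowl)


# Made-up over fragment, for illustration only.
deliveries = [
    Delivery("Batter A", "Bowler X", batter_runs=4),
    Delivery("Batter A", "Bowler X", extras=1, extra_type="wides"),
    Delivery("Batter A", "Bowler X", batter_runs=1, extras=1, extra_type="noballs"),
    Delivery("Batter A", "Bowler X", extras=2, extra_type="legbyes"),
    Delivery("Batter B", "Bowler X", player_out="Batter A", wicket_kind="run out"),
]
bat, bowl = aggregate(deliveries)
print(bat, bowl)
```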

Paid Historical Data

FYI, one such vendor supplies data to several IPL teams, but its minimum quote is steep for analytics startups: it can cost over USD 3,000, which is pricey from a subcontinent point of view.

Paid Streaming Data

Streaming data, or a live feed, is what fantasy sites use to run their games and update scores. This kind of service involves repeatedly calling a specified API that returns updated match information ball by ball.
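Consuming such a feed usually means polling the endpoint on a fixed interval and forwarding only deliveries not yet seen. Here is a minimal sketch, assuming a hypothetical fetch callable that returns the feed's current list of ball events as dicts with innings, over and ball keys. Real providers each have their own schema and authentication, so fetch is injected rather than hard-coded as an HTTP call, which also lets the loop be tested offline.

```python
import time
from typing import Callable, Dict, Iterable, List, Tuple

# Hypothetical delivery identifier. Real feeds often need an extra sequence
# number, because a wide or no-ball is re-bowled with the same ball number.
BallKey = Tuple[int, int, int]  # (innings, over, ball)


def poll_feed(
    fetch: Callable[[], Iterable[Dict]],
    on_new_ball: Callable[[Dict], None],
    interval_s: float = 2.0,
    max_polls: int = 10,
) -> List[BallKey]:
    """Poll fetch() repeatedly and pass each unseen ball event to on_new_ball."""
    seen = set()
    order: List[BallKey] = []
    for i in range(max_polls):
        events = sorted(fetch(), key=lambda e: (e["innings"], e["over"], e["ball"]))
        for event in events:
            key = (event["innings"], event["over"], event["ball"])
            if key not in seen:
                seen.add(key)
                order.append(key)
                on_new_ball(event)
        if i < max_polls - 1:
            time.sleep(interval_s)  # fixed interval; a real client would also back off on errors
    return order


# Two simulated snapshots of a feed: the second repeats ball 1 and adds ball 2.
snapshots = iter([
    [{"innings": 1, "over": 0, "ball": 1, "runs": 0}],
    [{"innings": 1, "over": 0, "ball": 1, "runs": 0},
     {"innings": 1, "over": 0, "ball": 2, "runs": 4}],
])
received: List[Dict] = []
keys = poll_feed(lambda: next(snapshots), received.append, interval_s=0, max_polls=2)
print(keys)
```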

Stack & Resources

A typical fantasy game offers a simple platform for choosing 11 players. Based on the points those players earn, the top fantasy teams are declared winners and become eligible for prizes.
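The scoring loop behind such a game is simple. Here is a sketch under stated assumptions: each entry is exactly 11 distinct players plus a captain whose points are doubled (a common convention on fantasy platforms, but the multiplier here is an assumption), and per-player match points have already been computed. The function names, users and point values are hypothetical.

```python
from typing import Dict, List, Tuple

TEAM_SIZE = 11
CAPTAIN_MULTIPLIER = 2.0  # assumption: the captain's points are doubled


def team_score(players: List[str], captain: str, points: Dict[str, float]) -> float:
    """Sum the players' match points, applying the captain multiplier."""
    if len(players) != TEAM_SIZE or len(set(players)) != TEAM_SIZE:
        raise ValueError(f"a team needs {TEAM_SIZE} distinct players")
    if captain not in players:
        raise ValueError("captain must be one of the selected players")
    total = sum(points.get(p, 0.0) for p in players)  # players with no entry score 0
    return total + (CAPTAIN_MULTIPLIER - 1) * points.get(captain, 0.0)


def leaderboard(
    entries: Dict[str, Tuple[List[str], str]],
    points: Dict[str, float],
    top_n: int = 3,
) -> List[Tuple[str, float]]:
    """Score every entry and return the top_n (user, score) pairs, best first."""
    scores = {user: team_score(team, cap, points) for user, (team, cap) in entries.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:top_n]


# Made-up player pool and points, for illustration only.
pool = [f"P{i}" for i in range(1, 14)]
points = {p: float(i) for i, p in enumerate(pool, start=1)}
entries = {
    "alice": (pool[:11], "P11"),
    "bob": (pool[2:13], "P13"),
}
print(leaderboard(entries, points))
```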

However, building any sophisticated or offbeat fantasy game or analytics on top of the streaming data posed several challenges:

  • Ball updates typically arrived with a delay of 5 seconds, which in rare cases stretched to 15 seconds or more. This highly volatile delay made building a live analytical engine difficult.
  • Data quality in streaming services has its own challenges, with frequent errors that are only corrected later.
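Both problems can be handled defensively on the consumer side. One possible approach, sketched under assumptions: each update carries a unique delivery key, the time the ball was bowled and the time the update was received (all field names hypothetical), and updates for the same key arrive in order. A re-sent ball overwrites the earlier version, so a correction replaces the erroneous value, and the first-arrival delays are summarised so a live engine can decide how long to wait before acting on a ball.

```python
import statistics
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class BallUpdate:
    key: str            # hypothetical unique delivery id, e.g. "1.0.3"
    runs: int
    bowled_at: float    # seconds: when the ball was actually bowled
    received_at: float  # seconds: when the feed delivered this update


@dataclass
class BallLedger:
    balls: Dict[str, BallUpdate] = field(default_factory=dict)
    delays: List[float] = field(default_factory=list)
    corrections: int = 0

    def apply(self, update: BallUpdate) -> None:
        if update.key in self.balls:
            self.corrections += 1  # ball re-sent, usually to fix an earlier error
        else:
            # Latency is measured only on first arrival of each ball.
            self.delays.append(update.received_at - update.bowled_at)
        self.balls[update.key] = update  # latest version wins (assumes in-order updates)

    def total_runs(self) -> int:
        return sum(b.runs for b in self.balls.values())

    def delay_summary(self) -> Dict[str, float]:
        if not self.delays:
            return {}
        return {"median": statistics.median(self.delays), "max": max(self.delays)}


# Made-up updates: ball "1.0.2" is first reported as 4 runs, later corrected to 6.
ledger = BallLedger()
for u in [
    BallUpdate("1.0.1", 0, bowled_at=100.0, received_at=105.0),
    BallUpdate("1.0.2", 4, bowled_at=130.0, received_at=135.0),
    BallUpdate("1.0.3", 1, bowled_at=160.0, received_at=176.0),
    BallUpdate("1.0.2", 6, bowled_at=130.0, received_at=150.0),
]:
    ledger.apply(u)
print(ledger.total_runs(), ledger.corrections, ledger.delay_summary())
```

Tracking the maximum alongside the median matters here: with a delay that is usually around 5 seconds but occasionally 15 or more, the median alone would hide exactly the cases that break a live engine.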

There are no known streaming service providers for fan-engagement numbers.

Use Cases & Stakeholders

  • Fans: Analytical reports can be a source of engaging news and an alternative medium for fans to ponder, along the lines of FiveThirtyEight.
  • League Teams: IPL franchises and other T20 leagues are ripe customers for such analytics. Analytics is already used, but it is largely driven by video analysts who are mostly ex-cricketers without a statistical background, so the same old domain knowledge keeps getting recirculated.
  • Media/ Agencies: Fan engagement numbers and even player performance forecasts can be incredibly useful for advertising agencies and celebrity management firms, which could then price their associated players better. Firms looking to advertise could also assess their marketing spend more scientifically.

Landscape & Opportunities

Fantasy and streaming services are the two primary fan touchpoints, with Dream11 and HotStar going head to head in terms of their future goals.

  • Cricbuzz and Cricinfo dominate the content landscape. They have the largest volume of visits but suffer from poor engagement time, and their offering has no direct monetisation.
  • Dream11 has the numbers in terms of a paying user base, and a very fast-growing one, but poor engagement given the static nature of its game. The next logical step is some form of streaming.
  • HotStar has the best of both worlds: as an official streaming partner it has high engagement numbers, and given its recent foray into fantasy, it might eat into Dream11’s pie.

Given these dynamics, it looks like an open fight between Dream11 and HotStar, with both Cricbuzz and Cricinfo looking like potential acquisitions.

Lastly, I built an alternative way to analyse player performance based on this paper: The Best Batsmen and Bowlers in One-Day Cricket.


Originally published at
