Cricket Analytics Starter Kit — Data Science Projects — Statistical Arbitrage

Abhinav Unnam
Apr 3, 2021

Cricket has a crazy following in the subcontinent, with the IPL last valued at 5.3 billion USD. This game of bat and ball, largely prevalent in Commonwealth nations, is not just interesting to watch but also has a growing analytical use case. So much so that there now seem to be courses and classes on cricket analytics!

To me, cricket is a simple game. Keep it simple and just go out and play

Shane Warne

With the discrete nature of the game and the growth of IPL, the need for cricket analytics as an edge both for on-field performance and other ancillary services is growing.


  • The tools rely on a D/L-based index which combines strike rate and runs for batsmen.
  • The same index works in the opposite direction and combines economy and wickets for bowlers.
  • The idea is to reduce contribution and effectiveness to a single number so that players can be compared.
  • The paper and some of the resources are mentioned towards the end.
  • The tools can be used separately to explore batsmen and bowlers. The data, though, is currently only from IPL matches of the last few years. It might be from 2017 onwards and might have a few missing values.
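To make the "single number" idea concrete, here is a minimal sketch of a composite index. It is not the D/L-based index from the paper; the formulas, class names and function names below are illustrative assumptions that only show how strike rate and runs (or economy and wickets) can be folded into one comparable figure.

```python
from dataclasses import dataclass


@dataclass
class BattingLine:
    runs: int
    balls_faced: int


@dataclass
class BowlingLine:
    runs_conceded: int
    balls_bowled: int
    wickets: int


def batting_index(line: BattingLine) -> float:
    """Toy index (assumption): runs scaled by strike rate, higher is better."""
    if line.balls_faced == 0:
        return 0.0
    strike_rate = 100.0 * line.runs / line.balls_faced
    return line.runs * strike_rate / 100.0


def bowling_index(line: BowlingLine) -> float:
    """Toy index (assumption): economy discounted by wickets, lower is better."""
    if line.balls_bowled == 0:
        return float("inf")
    economy = 6.0 * line.runs_conceded / line.balls_bowled
    return economy / (1 + line.wickets)


quick_fifty = batting_index(BattingLine(runs=50, balls_faced=25))
tidy_spell = bowling_index(BowlingLine(runs_conceded=24, balls_bowled=24, wickets=2))
```

Note how the two indices point in opposite directions: a batsman wants a high number, a bowler a low one.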

Given that my previous startup experience was around trying to cash in on this growing niche, I wanted to recap and document my learnings about the ecosystem in general. Broadly, as mentioned above, the opportunity lies in two directions: fan engagement and performance analysis.

The fan engagement aspect largely involves the fantasy gaming sites, IPL teams and other celebrity imports, primarily from Bollywood. Fan engagement data is mostly private, so the larger part of this post is about performance analysis data.


Before you can play around with the data, the first question is where to get it. Cricket is similar to baseball, which has a whole branch of analytics called Sabermetrics, but cricket analytics is still at a very early stage.

The only open-source data set available was at Cricsheet. Unfortunately, it stopped updating from July 2017 onwards, but White Ball Analytics recently released an updated version.

Courses for Cricket Analytics

There are a couple of Sabermetrics courses online that should give you an idea of how to get started with cricket analytics: how to define KPIs and how to think about performance analysis in general.

There are only a few books on the subject, a couple of them by Tinniam V Ganesh.


These are some blogs you can refer to for an idea of the work already done, the approaches that were taken, and the challenges with the analysis and otherwise.

Data & Processing

Good analytics depends above all on the quality and breadth of the data available. However, the only freely available data sets are maintained by Irish and English gentlemen, which is ironic given that India is home to the IPL.

This free historical dataset is limited to a ball-by-ball events catalogue. In my experience, there are a couple of paid vendors with much richer data, including sensor information. Access to a greater diversity of data should make it possible to do a broader range of analytics beyond the obvious metrics.
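As a rough illustration of what a ball-by-ball catalogue allows, the sketch below rolls individual deliveries up into runs, balls faced and strike rate per batsman. The records and field names (`batsman`, `runs_off_bat`, `wide`) are made up for the example and are not the schema of any particular data set.

```python
from collections import defaultdict

# Hypothetical ball-by-ball records; the field names are assumptions.
balls = [
    {"batsman": "A", "runs_off_bat": 4, "wide": False},
    {"batsman": "A", "runs_off_bat": 1, "wide": False},
    {"batsman": "B", "runs_off_bat": 0, "wide": False},
    {"batsman": "A", "runs_off_bat": 0, "wide": True},
    {"batsman": "A", "runs_off_bat": 6, "wide": False},
    {"batsman": "B", "runs_off_bat": 2, "wide": False},
]


def batting_summary(events):
    """Aggregate deliveries into runs, balls faced and strike rate per batsman."""
    totals = defaultdict(lambda: {"runs": 0, "balls": 0})
    for event in events:
        line = totals[event["batsman"]]
        line["runs"] += event["runs_off_bat"]
        # A wide does not count as a ball faced by the batsman.
        if not event["wide"]:
            line["balls"] += 1
    summary = {}
    for name, line in totals.items():
        rate = 100.0 * line["runs"] / line["balls"] if line["balls"] else 0.0
        summary[name] = {**line, "strike_rate": rate}
    return summary


summary = batting_summary(balls)
```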

Paid Historical Data

Source: Agaram Infotech

FYI, the above vendor supplies data to several IPL teams, but their minimum quote is pretty steep for analytics startups. It can cost over 3,000 USD, which is quite pricey from a subcontinent point of view.

Paid Streaming Data

Source: Cricket API

Streaming data, or a live feed, is used by fantasy sites to run their games and update scores. This kind of service involves hitting a specified API endpoint, which returns updated match info ball by ball.
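Consuming such a feed usually comes down to a polling loop. The sketch below is a generic version under stated assumptions: `fetch` stands in for whatever call the vendor's API exposes, and the `ball_id` field is hypothetical. No real vendor endpoint is used here.

```python
import time
from typing import Callable, Dict, Iterator, List


def poll_new_balls(
    fetch: Callable[[], List[Dict]],
    interval_s: float,
    max_polls: int,
) -> Iterator[Dict]:
    """Call `fetch` repeatedly and yield only deliveries not seen before."""
    seen = set()
    for attempt in range(max_polls):
        for ball in fetch():
            if ball["ball_id"] not in seen:
                seen.add(ball["ball_id"])
                yield ball
        if attempt < max_polls - 1:
            time.sleep(interval_s)


# Stand-in for a live feed: each poll returns the match state so far.
snapshots = iter([
    [{"ball_id": "0.1", "runs": 1}],
    [{"ball_id": "0.1", "runs": 1}, {"ball_id": "0.2", "runs": 4}],
    [{"ball_id": "0.1", "runs": 1}, {"ball_id": "0.2", "runs": 4},
     {"ball_id": "0.3", "runs": 0}],
])
new_balls = list(poll_new_balls(lambda: next(snapshots), interval_s=0, max_polls=3))
```

Passing the fetch function in as a parameter keeps the loop testable without network access.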

Stack & Resources

It can be inferred from the two courses mentioned that SQL for data storage and R for basic statistical analysis are more than enough for standalone reporting.
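A standalone report of this kind can be little more than one table and one aggregate query. The sketch below uses Python's built-in `sqlite3` module rather than R to stay self-contained; the `deliveries` schema is an assumption, and every row is treated as a legal delivery for simplicity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema: one row per legal delivery.
conn.execute(
    "CREATE TABLE deliveries ("
    "match_id INTEGER, bowler TEXT, runs_conceded INTEGER, wicket INTEGER)"
)
rows = [
    (1, "X", 1, 0),
    (1, "X", 4, 0),
    (1, "X", 0, 1),
    (1, "Y", 6, 0),
    (1, "Y", 0, 0),
]
conn.executemany("INSERT INTO deliveries VALUES (?, ?, ?, ?)", rows)

# Economy is runs conceded per six balls; best economy first.
report = conn.execute(
    """
    SELECT bowler,
           SUM(wicket) AS wickets,
           6.0 * SUM(runs_conceded) / COUNT(*) AS economy
    FROM deliveries
    GROUP BY bowler
    ORDER BY economy
    """
).fetchall()
conn.close()
```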

The typical fantasy game has a simple platform for choosing a team of 11 players. Based on the points earned, the top fantasy teams are deemed winners and become eligible for prizes.
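The scoring step can be sketched in a few lines. The point values and field names below are invented for illustration and do not reflect any real platform's rules; teams are cut down to two players each to keep the example short.

```python
# Hypothetical scoring rules, not those of any real fantasy platform.
POINTS_PER_RUN = 1
POINTS_PER_WICKET = 25


def player_points(stats):
    """Convert one player's match stats into fantasy points."""
    return stats["runs"] * POINTS_PER_RUN + stats["wickets"] * POINTS_PER_WICKET


def rank_teams(teams, match_stats):
    """Return (team, points) pairs sorted from highest to lowest score."""
    scores = {
        name: sum(player_points(match_stats[p]) for p in players)
        for name, players in teams.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


match_stats = {
    "P1": {"runs": 40, "wickets": 0},
    "P2": {"runs": 5, "wickets": 2},
    "P3": {"runs": 70, "wickets": 0},
    "P4": {"runs": 0, "wickets": 0},
}
# A real team would have 11 players; two are enough to show the idea.
teams = {"alpha": ["P1", "P2"], "beta": ["P3", "P4"]}
leaderboard = rank_teams(teams, match_stats)
```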

However, building any sophisticated or offbeat fantasy game or analytics over the streaming data had several challenges:

  • The ball update typically had a delay of 5 seconds, which in rare cases would extend to 15 seconds or more. This delay was incredibly volatile and made building a live analytical engine difficult.
  • Data quality in streaming services has its own challenges, with frequent errors which would only later be corrected.
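Both problems can at least be measured and contained on the consumer side. The sketch below assumes each update carries a `ball_id`, a `bowled_at` time and a `received_at` time in seconds (all hypothetical fields): a later record for the same ball overwrites the earlier one, and the delay is taken from the first arrival of each ball.

```python
from statistics import mean

# Hypothetical feed updates; the third record corrects the first.
updates = [
    {"ball_id": "0.1", "runs": 4, "bowled_at": 0.0, "received_at": 5.0},
    {"ball_id": "0.2", "runs": 1, "bowled_at": 30.0, "received_at": 45.0},
    {"ball_id": "0.1", "runs": 6, "bowled_at": 0.0, "received_at": 60.0},
]


def apply_updates(events):
    """Keep the latest record per ball so corrections replace earlier errors."""
    latest = {}
    for event in events:
        latest[event["ball_id"]] = event
    return latest


def delay_stats(events):
    """Mean and max delay in seconds, using the first arrival of each ball."""
    first_delay = {}
    for event in events:
        if event["ball_id"] not in first_delay:
            first_delay[event["ball_id"]] = event["received_at"] - event["bowled_at"]
    delays = list(first_delay.values())
    return {"mean": mean(delays), "max": max(delays)}


state = apply_updates(updates)
total_runs = sum(ball["runs"] for ball in state.values())
stats = delay_stats(updates)
```

Any number derived from the live state, such as `total_runs` here, has to be treated as provisional until the corrections stop arriving.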

There are no known streaming service providers for fan-based engagement numbers.

Use Cases & Stakeholders

The whole point of carrying out this analysis is to put it to some use. The numbers crunched can be consumed by:

  • Fans: Analytical reports can be a source of engaging news and an alternative medium for fans to ponder. This is something along the lines of FiveThirtyEight.
  • League Teams: IPL franchises and other T20 leagues are ripe customers for such analytics. Though analytics is already prevalent, it is largely driven by video analysts who are, or were, largely ex-cricketers. They have no statistical background, which results in the same old domain knowledge being circulated.
  • Media/Agencies: Fan engagement numbers and even player performance forecasts can be incredibly useful for advertising agencies and celebrity management firms. Management firms can better price their associated players, and firms looking to advertise can make a more scientific assessment of their marketing spend.

Landscape & Opportunities

Despite the growth in tech in recent times, the majority of stakeholders who run the show (BCCI, IPL teams) have been very slow to adopt it and less willing to bet on newer possibilities. It has to be mentioned, though, that both HotStar and Dream11 have made some serious strategic moves backed by sound technical expertise.

Fantasy and streaming services are the two primary fan endpoints, with Dream11 and HotStar going head to head in terms of their future goals.

  • Cricbuzz and Cricinfo dominate the content landscape. They have the largest volume of visits but suffer from poor engagement time and from the fact that their offering has no direct monetisation.
  • Dream11 has the numbers in terms of a paying and very fast-growing user base, but poor engagement numbers given the static nature of its game. The next logical step is to go for some sort of streaming.
  • HotStar has the best of both worlds: as the official streaming partner it has high engagement numbers, and given its recent foray into fantasy, it might eat into Dream11’s pie.

Given the interesting dynamics, it looks like an open fight between Dream11 and HotStar, with both Cricbuzz and Cricinfo looking like potential acquisitions.

Lastly, I built an alternative means to analyse player performance based on this paper: The Best Batsmen And Bowlers in One Day Cricket
