Beating the Bookies

With the 2020-21 Premier League season over, I have been working on a way to apply my predictive AI skills to the problem of football match prediction. This is what I've learned.

Baselines

To evaluate any model we build, we first need a baseline to compare it against. A baseline is simply a reference level of prediction accuracy that tells us how much the model actually contributes. Random guessing is a common baseline, so we will examine the accuracy of two baseline strategies: guessing randomly between away wins, home wins and draws (Strategy 1), and guessing randomly between away and home wins only (Strategy 2).

The above graph compares the prediction accuracy of the two potential baseline strategies over the course of the 20-21 Premier League season. The faint red lines show the simulated outcomes for Strategy 1, while the faint blue lines show the simulated outcomes for Strategy 2. The darker lines show the average accuracy for their respective strategies over the course of the season. Using these simulations, we can also estimate the probability of achieving any given accuracy by the end of the season.

Accuracy	Strategy 1 odds	Strategy 2 odds
≥30%	1 in 0.008	1 in 0.0000...
≥40%	1 in 273	1 in 0.04
≥50%	1 in 999999...	1 in 61
≥60%	1 in 999999...	1 in 999999...
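For anyone wanting to reproduce this kind of baseline, here is a minimal sketch of how such a simulation might be run. It is not the code behind the graph or the table: the function names `simulate_season` and `chance_of_reaching` are hypothetical, the season results are random placeholders rather than the real 20-21 outcomes, and I am assuming the quantity of interest is the chance of finishing at or above a given accuracy.

```python
import random

OUTCOMES_3 = ["H", "D", "A"]  # Strategy 1: home win, draw or away win
OUTCOMES_2 = ["H", "A"]       # Strategy 2: home or away win only (a draw is never guessed)


def simulate_season(results, choices, rng):
    """Accuracy of one season of uniformly random guesses drawn from `choices`."""
    correct = sum(rng.choice(choices) == actual for actual in results)
    return correct / len(results)


def chance_of_reaching(results, choices, threshold, n_sims=1_000, seed=0):
    """Estimate the probability that random guessing ends at or above `threshold`."""
    rng = random.Random(seed)
    hits = sum(
        simulate_season(results, choices, rng) >= threshold
        for _ in range(n_sims)
    )
    return hits / n_sims


# Placeholder results for a 380-match season (NOT the real 2020-21 data).
placeholder_rng = random.Random(42)
results = [placeholder_rng.choice(OUTCOMES_3) for _ in range(380)]

for threshold in (0.3, 0.4, 0.5):
    p1 = chance_of_reaching(results, OUTCOMES_3, threshold)
    p2 = chance_of_reaching(results, OUTCOMES_2, threshold)
    print(f"at least {threshold:.0%}: strategy 1 ~ {p1:.3f}, strategy 2 ~ {p2:.3f}")
```

With real results in place of the placeholders, raising `n_sims` gives steadier estimates, and taking the reciprocal of each probability turns it into "1 in N" odds.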

The odds only get longer from there. So now that we know how good random guessing would be, exactly how good are the bookies?

The Opposition

So, how good are the bookies at predicting the results of Premier League matches? The graph below shows the accuracy of a number of popular odds makers over the course of the 20-21 Premier League season.

On average, the bookies accurately predicted around 51.57% of games during the season. VC Bet and Bet&Win were tied for the most accurate, correctly predicting 51.84% of all games. Meanwhile, Interwetten and Pinnacle were tied for the least accurate, with a seasonal accuracy of 51.32%. There is only one correctly predicted game between the two most accurate bookies and the average, and another game between the average and the two least accurate. What's also interesting is that prediction accuracy differs significantly from team to team, with the bookies faring better at predicting the outcomes of matches involving some teams than others.
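As a rough illustration of how figures like these can be computed, the sketch below scores one bookie overall and per team. It rests on an assumption the post does not spell out: that a bookie's "prediction" is simply the outcome with the shortest decimal odds. The match records, team names and function names are all made up for the example.

```python
from collections import defaultdict

# Hypothetical match records: teams, one bookie's decimal odds, actual result.
matches = [
    {"home": "Team A", "away": "Team B", "odds": {"H": 1.50, "D": 4.20, "A": 6.50}, "result": "H"},
    {"home": "Team C", "away": "Team A", "odds": {"H": 2.90, "D": 3.30, "A": 2.45}, "result": "D"},
    {"home": "Team B", "away": "Team C", "odds": {"H": 2.10, "D": 3.40, "A": 3.50}, "result": "H"},
    {"home": "Team A", "away": "Team C", "odds": {"H": 1.80, "D": 3.60, "A": 4.50}, "result": "A"},
]


def bookie_pick(odds):
    """Assumed rule: the bookie 'predicts' the outcome with the shortest odds."""
    return min(odds, key=odds.get)


def overall_accuracy(matches):
    """Share of matches where the bookie's favourite was the actual result."""
    correct = sum(bookie_pick(m["odds"]) == m["result"] for m in matches)
    return correct / len(matches)


def accuracy_by_team(matches):
    """Accuracy over the matches each team was involved in, home or away."""
    tally = defaultdict(lambda: [0, 0])  # team -> [correct, played]
    for m in matches:
        hit = bookie_pick(m["odds"]) == m["result"]
        for team in (m["home"], m["away"]):
            tally[team][0] += hit
            tally[team][1] += 1
    return {team: correct / played for team, (correct, played) in tally.items()}


print(overall_accuracy(matches))
print(accuracy_by_team(matches))
```

Over a 380-match Premier League season, each match is worth 1/380, or about 0.26 percentage points, which is why a single game separates 51.84% (197 of 380) from the average and the average from 51.32% (195 of 380).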

Sheffield United games were the easiest to predict, with an average accuracy of 81.57%. Meanwhile, Brighton games were the hardest to predict, with an average accuracy of only 31.57%. But enough about the bookies, let's get onto our team.

The Model

Our model is a neural network trained on a dataset that combines data from FBref, football-data, and GitHub user ewenme. So how well does it perform? Pretty well. Below is a comparison of the model's accuracy with that of the average bookie over the course of the 20-21 season.

At first glance, this performance may seem somewhat underwhelming, but the model does indeed outperform the average bookie. The model even outperforms the two most accurate bookies, VC Bet and Bet&Win. Specifically, the model correctly guessed the outcome of 52.11% of games to their 51.84%. Fine margins, but we now have a model that outperforms the bookies. How would we have performed over the course of the season if we had actually bet with it?

Betting with Models

It was here that I became aware of just how important a betting strategy is to your performance when gambling. You can't just start betting on games and hope to make a profit. You need a strategy. I decided to simulate a couple of strategies combined with the model to see how they would perform over the season. Below is a brief description of each of the strategies I chose.

Strategy	Description
Strategy 1	Bet 10% of the bank on the predicted outcome
Strategy 2	Bet 10% of the bank, but only when the model is >70% sure of the outcome
Strategy 3	'Expected value' betting strategy as outlined by David Sumpter in his book 'Soccermatics'
Strategy 4	A more conservative version of the above strategy
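Strategies 1 and 2 are simple enough to sketch in a few lines. The version below is an illustration under my own assumptions rather than the exact simulation: odds are taken to be decimal odds, the bank starts at £100, the stake is recalculated from the current bank before every bet, and the function name `run_fixed_fraction` and the three example bets are hypothetical.

```python
def run_fixed_fraction(bets, bank=100.0, fraction=0.10, min_confidence=0.0):
    """Stake `fraction` of the current bank on each qualifying bet.

    Each bet is a tuple: (model_confidence, decimal_odds, won).
    """
    for confidence, decimal_odds, won in bets:
        if confidence <= min_confidence:
            continue  # Strategy 2 skips predictions the model is not sure enough about
        stake = bank * fraction
        if won:
            bank += stake * (decimal_odds - 1)  # profit on a winning bet
        else:
            bank -= stake  # the whole stake is lost
    return bank


# Made-up bets: (model confidence, decimal odds, did the bet win?)
example_bets = [(0.80, 1.40, True), (0.55, 2.10, False), (0.75, 1.60, False)]

print(run_fixed_fraction(example_bets))                       # Strategy 1: every bet
print(run_fixed_fraction(example_bets, min_confidence=0.70))  # Strategy 2: >70% only
```

Because the stake is a fraction of the current bank, the bank can shrink towards zero but never actually reach it.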

I'll quickly explain what strategies 3 and 4 actually are before I continue. David Sumpter outlines both in his book Soccermatics, and they revolve around the use of 'expected values'. The expected value of a bet is what you would expect to get from it on average, given the estimated probabilities and the possible reward and loss. The calculation follows this formula:

Expected value from bet = (Probability of being right * potential reward) + (Probability of being wrong * potential loss)
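As a small sketch of the formula, the function below computes the expected value of a single bet. Two things are my assumptions rather than anything stated above: odds are decimal odds, and the potential loss enters as a negative number (minus the stake), which the formula needs in order to balance wins against losses. The name `expected_value` is hypothetical.

```python
def expected_value(p_win, decimal_odds, stake=1.0):
    """Expected profit of one bet under the formula above."""
    potential_reward = stake * (decimal_odds - 1)  # profit if the bet wins
    potential_loss = -stake                        # negative: the stake is lost
    return p_win * potential_reward + (1 - p_win) * potential_loss


# A 50% chance at decimal odds of 2.5 with a 1-unit stake:
# 0.5 * 1.5 + 0.5 * (-1) = 0.25
print(expected_value(0.5, 2.5))
```

Note that the result scales with the stake, so any threshold applied to this expected value only makes sense for a particular stake size.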

For Strategy 3, this value simply has to be greater than 1, while for Strategy 4 it has to be greater than 4. Now that we have covered the strategies, let's see how they performed. The graph below showcases the performance of each of the strategies over the course of the season.

Yeah, not great. All strategies bombed pretty hard. Even the most accurate strategy, Strategy 2, which accurately predicted 55% of games, ended the season with only £15.90 in the bank. Meanwhile, our most financially successful strategy, Strategy 1, ended the season with only £53.35 left in the bank.

These results showcase the fundamental importance of a betting strategy when gambling. You can't simply start betting on games without one. With this in mind, I'll now introduce Matthew Benham.

The Man with a Plan

Matthew Benham is many things: a physics graduate, the owner of Brentford FC, and a professional gambler. He's also very secretive, particularly about the betting strategy that made him a millionaire. However, I was able to glean an overview from a YouTube video about Brentford FC. The strategy essentially works like this: I compare my model's predicted outcome with the bookies' predicted outcome and only place a bet when they differ. Going against the bookies is risky, but it also means betting at longer odds, which increases the potential rewards. This was the problem with our previously most successful strategy, Strategy 1: because our model agreed with the bookies most of the time, the risk was low, and so were the rewards (sometimes with odds of almost 1 to 1). Benham's strategy solves this problem by pitting our model directly against the bookies. The graph below shows how Benham's strategy performed compared with Strategy 1.
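The selection rule described above can be sketched roughly as follows. This is only my reading of it, not Benham's actual method. I am assuming the bookies' predicted outcome is the one with the shortest decimal odds, that the model's pick is its most probable outcome, and that each bet stakes 10% of the bank as in Strategy 1. The matches, probabilities and function names are invented for the example.

```python
def favourite(odds):
    """Assumed rule: the bookies' predicted outcome has the shortest odds."""
    return min(odds, key=odds.get)


def disagreement_bets(matches):
    """Keep only the matches where the model's pick differs from the favourite."""
    picks = []
    for m in matches:
        model_pick = max(m["model_probs"], key=m["model_probs"].get)
        if model_pick != favourite(m["odds"]):
            picks.append((m, model_pick))
    return picks


def settle(picks, bank=100.0, fraction=0.10):
    """Stake a fixed fraction of the bank on each selected bet (assumed staking)."""
    for m, pick in picks:
        stake = bank * fraction
        if pick == m["result"]:
            bank += stake * (m["odds"][pick] - 1)
        else:
            bank -= stake
    return bank


# Made-up matches: bookie decimal odds, model probabilities, actual result.
matches = [
    {"odds": {"H": 1.5, "D": 4.0, "A": 6.0},
     "model_probs": {"H": 0.60, "D": 0.25, "A": 0.15}, "result": "H"},
    {"odds": {"H": 2.2, "D": 3.3, "A": 3.4},
     "model_probs": {"H": 0.35, "D": 0.20, "A": 0.45}, "result": "A"},
    {"odds": {"H": 3.0, "D": 3.2, "A": 2.4},
     "model_probs": {"H": 0.50, "D": 0.20, "A": 0.30}, "result": "D"},
]

picks = disagreement_bets(matches)
print(len(picks), "bets placed, final bank:", round(settle(picks), 2))
```

In this toy example the first match is skipped because the model and the bookies agree, and the single win at long odds more than covers the single loss.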

Wow. Benham's strategy turned an initial investment of £100 into a final bank of £581.79. What's also interesting is how rarely it is right. There is a mantra within professional sports gambling that a professional gambler has to win somewhere between 55% and 57% of their bets to be successful, yet Benham's strategy is only right 40% of the time. It turns out that you just have to win more when you are right than you lose when you are wrong. The graph below shows the importance of this fact when considering any betting strategy.

In this graph, the green bars show the average winnings when the respective strategy wins, while the red bars show the average amount lost when that strategy loses. The blue bars show the expected return on any bet placed by that strategy, while the black bars show the maximum and minimum winnings and losses. As we can see, Benham's strategy outperforms all the other strategies in both maximum winnings and expected return per bet.
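The point that the size of the reward matters as much as the hit rate can be made concrete with a little arithmetic. The sketch below assumes decimal odds and equal stakes on every bet. The odds of 3.0 and 1.7 are illustrative numbers of my own, not values from the simulations, and the function names are hypothetical.

```python
def break_even_win_rate(decimal_odds):
    """Win rate at which bets placed at these odds neither gain nor lose on average."""
    return 1 / decimal_odds


def profit_per_unit(win_rate, decimal_odds):
    """Average profit per unit staked, assuming every bet is at the same odds."""
    return win_rate * decimal_odds - 1


# Winning 40% of bets is profitable at long enough odds...
print(profit_per_unit(0.40, 3.0))   # positive
# ...while winning 55% of bets still loses money at short odds.
print(profit_per_unit(0.55, 1.7))   # negative
# A 40% win rate breaks even at decimal odds of 2.5.
print(break_even_win_rate(2.5))
```

In other words, a strategy that is right 40% of the time comes out ahead as long as its average odds are longer than 2.5.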

Now What?

I am tempted to test out this strategy in the upcoming 21-22 season, even if the actual betting is only simulated. If so, I'll make a follow-up post to this one detailing my progress.