Anyone who has visited Ken Pomeroy’s site kenpom.com or read Dean Oliver’s “Basketball on Paper” is familiar with the concept of efficiencies. Put simply, an efficiency is the number of points scored per possession. College basketball can be modeled effectively by comparing a team's expected offensive efficiency (points scored per possession) against its opponent's expected defensive efficiency (points allowed per possession). Combine that with the number of possessions per team, and you can predict the final score.
Home score = (home team's offensive efficiency vs. visitor's defensive efficiency) × number of possessions
Visitor score = (visitor's offensive efficiency vs. home team's defensive efficiency) × number of possessions
A couple of clarifications to these formulas are needed right away. First, we write “vs” between an offensive and a defensive efficiency, and there are different ways to combine the two. Suppose Team A's defense allows 0.80 points per possession and its opponent's offense scores 1.05 points per possession. What should we expect when they play? The first thought might be to average the two numbers, giving about 0.93. That would be wrong, though. A defense that allows only 0.80 points per possession would likely be the best in the league, and presumably it earned that number against a variety of opponents whose combined offensive efficiency is close to the league average (about 1.02 points per possession; this varies from year to year). Against an offense that is only slightly better than average, we would expect far less than the 1.05 that offense usually produces, probably somewhere between 0.80 and 0.85 points per possession.
We can bring the league average into the equation. (This is simplified for now: it assumes a team plays both good and bad opponents, so that over time its schedule averages out. We will get more accurate later on.) Add the offense's efficiency to the opposing defense's efficiency, then subtract the league average. In this case the expected efficiency for the matchup is 0.80 + 1.05 (the opponent's average offensive efficiency) − 1.02 (the league average) = 0.83. In other words, we expect this opponent, which scores slightly better than most teams in the league, to also score slightly more against Team A than Team A's defense usually allows. The same calculation works from the other side of the matchup, with Team A's offense against the opponent's defense.
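The additive matchup rule and the two score formulas above can be sketched in a few lines of Python. This is a minimal illustration, assuming the simplified (not yet schedule-adjusted) additive model; the function names and the 70-possession usage example are hypothetical, not taken from the original program.

```python
def expected_efficiency(offense: float, defense: float, league_avg: float) -> float:
    """Expected points per possession when an offense meets a defense.

    Additive model: start from the offense's usual efficiency, then shift it
    by how far the defense sits from the league average.
    """
    return offense + defense - league_avg


def predict_game(home_off: float, home_def: float,
                 away_off: float, away_def: float,
                 league_avg: float, possessions: float) -> tuple[float, float]:
    """Return (home score, visitor score) for an assumed possession count."""
    home = expected_efficiency(home_off, away_def, league_avg) * possessions
    away = expected_efficiency(away_off, home_def, league_avg) * possessions
    return home, away


if __name__ == "__main__":
    # The example from the text: a 0.80 defense against a 1.05 offense.
    print(f"{expected_efficiency(1.05, 0.80, 1.02):.2f}")  # 0.83
    # Hypothetical efficiencies and a hypothetical 70-possession game.
    home, away = predict_game(1.10, 0.95, 1.00, 1.00, 1.00, 70)
    print(f"{home:.1f} - {away:.1f}")  # 77.0 - 66.5
```

Both teams share one possession count here, which matches the formulas above; in practice the two teams' possession estimates for a game are usually close but not identical.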
The other clarification concerns the number of possessions. Possessions are not listed in a typical box score, but they can be estimated by counting the events that end a possession: made shots (including trips to the free-throw line), defensive rebounds, and turnovers.
Another flaw in what we have proposed is that teams face opponents of different strengths over the course of a season. Some teams play a much tougher schedule than others, which tends to lower their raw offensive efficiency and raise the points they allow per possession. Fortunately, we can account for this by looking at the quality of the opponents a team has faced and adjusting its efficiencies accordingly. I won’t go into much detail on how this is done here. In short, to calculate Team A’s adjusted offensive efficiency, we look at the defense of every team Team A has already played and compare it with the national average. If Team A has played a weaker-than-average schedule, we lower its adjusted offensive efficiency; if it has played a tougher one, we give it a bump. See http://kenpom.com/blog/ratings-methodology-update/ for a full description.
Betting on College Basketball Using Adjusted Efficiencies
Now that we have an idea of what adjusted efficiencies are, I wanted to explore whether they could be used to beat the spread and/or the totals (over/under) in college basketball. We learned in an earlier post that most college basketball lines are set extremely close to these adjusted efficiencies. Still, there may be a discrepancy large enough to exploit. So I designed an experiment to find out.
For this experiment I am using college basketball data from the 2003/2004 through the 2014/2015 seasons. Calculating adjusted efficiencies requires a decent sample of games each season, so I only place simulated bets on games from January through the end of each season. I don’t use preseason rankings or any other prediction-based inputs; I want the model to be driven purely by real data that resets each season. The first two months of each season are therefore excluded from the simulated bets and used only as data for calculating adjusted efficiencies.
I would have liked to compare against the adjusted efficiencies published on kenpom.com. However, the data on that site changes constantly as the season progresses, and there is no way to go back and view the adjusted efficiencies as they stood at a specific point in time. Since I want a purely predictive model, I needed another approach: I calculated my own adjusted efficiencies, in a way as similar to Ken Pomeroy’s as I could. First I calculated the raw offensive and defensive efficiencies, along with the estimated possessions for each game, using:
Possessions = (field goal attempts − offensive rebounds) + turnovers + (0.475 × free throw attempts)
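The possession estimate is simple arithmetic on box-score totals, and a game's raw efficiency is then points divided by that estimate. A minimal Python sketch follows; the function names and the box-score numbers in the example are hypothetical, and the 0.475 free-throw weight is the one given above.

```python
def estimate_possessions(fga: int, orb: int, tov: int, fta: int,
                         ft_weight: float = 0.475) -> float:
    """Possessions = (FGA - offensive rebounds) + turnovers + ft_weight * FTA."""
    return (fga - orb) + tov + ft_weight * fta


def raw_efficiency(points: int, possessions: float) -> float:
    """Points per possession for a single game."""
    return points / possessions


if __name__ == "__main__":
    # Hypothetical box score: 60 FGA, 10 offensive rebounds, 12 turnovers, 20 FTA.
    poss = estimate_possessions(fga=60, orb=10, tov=12, fta=20)
    print(f"{poss:.1f}")                      # 71.5
    print(f"{raw_efficiency(75, poss):.3f}")  # 1.049
```

The free-throw weight is a parameter so it can be changed without touching the rest of the formula.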
Then I adjusted for competition as explained above. For each game, I looked at Team A and every opponent (call each one Team B) it had played earlier in that season. I calculated Team A’s average offensive efficiency and adjusted it using each Team B’s defensive efficiency up to that point in the season, relative to the national average. So if Team A averaged 1.05 points per possession (more than the league average), but its opponents’ adjusted defensive efficiencies also averaged 1.05 points allowed per possession (also more than the league average), I would set Team A’s expected offensive efficiency to the league average (1.02). Adjusted defensive efficiencies were calculated the same way. Note that only games against Division I opponents were included in these calculations.
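The competition adjustment can be sketched as follows. This is a simplified, single-pass version under stated assumptions, not the author’s C# implementation or KenPom’s method: it reuses the additive correction from earlier, corrects each game by the opponent’s raw season-to-date efficiency (which includes the game being corrected), weights every game equally, and leaves the Division I filter to the caller. The `TeamGame` record and the function names are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class TeamGame:
    """One team's side of a game; each game is stored twice, once per team."""
    team: str
    opponent: str
    points: float       # points scored by `team`
    possessions: float  # estimated possessions for `team`


def adjust(raw_eff: float, opp_eff: float, league_avg: float) -> float:
    """Remove the opponent's deviation from the league average (additive form)."""
    return raw_eff - (opp_eff - league_avg)


def adjusted_efficiencies(games: list[TeamGame]) -> dict[str, tuple[float, float]]:
    """Season-to-date (adjusted offense, adjusted defense) for each team."""
    scored, allowed = defaultdict(float), defaultdict(float)
    off_poss, def_poss = defaultdict(float), defaultdict(float)
    for g in games:
        scored[g.team] += g.points
        off_poss[g.team] += g.possessions
        allowed[g.opponent] += g.points
        def_poss[g.opponent] += g.possessions

    raw_off = {t: scored[t] / off_poss[t] for t in off_poss}
    raw_def = {t: allowed[t] / def_poss[t] for t in def_poss}
    league_avg = sum(scored.values()) / sum(off_poss.values())

    off_adj, def_adj = defaultdict(list), defaultdict(list)
    for g in games:
        eff = g.points / g.possessions
        # g.team's offense is judged against g.opponent's defense...
        off_adj[g.team].append(adjust(eff, raw_def[g.opponent], league_avg))
        # ...and g.opponent's defense against g.team's offense.
        def_adj[g.opponent].append(adjust(eff, raw_off[g.team], league_avg))

    return {
        t: (sum(off_adj[t]) / len(off_adj[t]), sum(def_adj[t]) / len(def_adj[t]))
        for t in off_adj
        if t in def_adj
    }


if __name__ == "__main__":
    # The example from the text: 1.05 offense against defenses allowing 1.05.
    print(f"{adjust(1.05, 1.05, 1.02):.2f}")  # 1.02
```

A fuller version would iterate, recomputing the opponents’ adjusted numbers until the ratings settle, which is closer to looking at every game the opponents played; the single pass keeps the sketch short.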
Before analyzing any results, I cross-checked my numbers against kenpom.com for some late-season games each season, since that is when my model’s values should be closest to his. They were not exact matches; his model likely weighs other factors, such as favoring recent games and possibly accounting for where each game was played. I have yet to find the exact formula he uses, but the values I checked were reasonably close: on average, each adjusted efficiency was within 2% of his. Not exact, but close enough for now.
Nerd Speak: For this experiment I wrote a C# program to build my model. I load the raw data from CSV files and store it in a SQL database. For each game, I query the database for the home and visiting teams and load every Division I game played up to that point in the season. I then calculate the adjusted efficiencies, looking not only at every game the home and visiting teams played but also at every game their opponents played, in order to determine proper weights for my adjusted efficiency model. For each game I output a predicted score for both teams into an Excel spreadsheet. I use some simple Excel functions to evaluate how the model did and to visually cross-check that the results seem realistic.
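The walk-forward structure described above (predict each game only from games played before it, reset every season, bet only from January on) can be sketched in Python. This is not the author’s C# code: the `Game` record, the `predict` callback, the assumption that every season ends by April, and the CSV output (which Excel can open) are hypothetical stand-ins, and the database query is replaced by an in-memory filter.

```python
import csv
import io
from dataclasses import dataclass
from datetime import date
from typing import Callable, TextIO


@dataclass
class Game:
    season: int   # hypothetical convention: 2015 means the 2014/2015 season
    played: date
    home: str
    away: str


# A predictor sees the game to predict plus only the games played before it.
Predictor = Callable[[Game, list[Game]], tuple[float, float]]


def in_betting_window(g: Game) -> bool:
    """Bet January through April only; November/December games are data only."""
    return g.played.month <= 4


def walk_forward(games: list[Game], predict: Predictor) -> list[tuple[Game, float, float]]:
    ordered = sorted(games, key=lambda g: g.played)
    rows = []
    for g in ordered:
        if not in_betting_window(g):
            continue
        # Same season, strictly earlier date: no look-ahead, no same-day games.
        prior = [p for p in ordered if p.season == g.season and p.played < g.played]
        home_pred, away_pred = predict(g, prior)
        rows.append((g, home_pred, away_pred))
    return rows


def write_predictions(rows: list[tuple[Game, float, float]], out: TextIO) -> None:
    writer = csv.writer(out)
    writer.writerow(["date", "home", "away", "pred_home", "pred_away"])
    for g, home_pred, away_pred in rows:
        writer.writerow([g.played.isoformat(), g.home, g.away,
                         f"{home_pred:.1f}", f"{away_pred:.1f}"])


if __name__ == "__main__":
    sample = [
        Game(2015, date(2014, 12, 1), "A", "B"),
        Game(2015, date(2015, 1, 5), "A", "C"),
        Game(2015, date(2015, 1, 5), "B", "C"),
        Game(2015, date(2015, 1, 10), "B", "A"),
    ]
    # Placeholder predictor: reports how many prior games it was allowed to see.
    rows = walk_forward(sample, lambda g, prior: (float(len(prior)), 0.0))
    buf = io.StringIO()
    write_predictions(rows, buf)
    print(buf.getvalue())
```

Passing the predictor in as a function keeps the no-look-ahead bookkeeping separate from the efficiency math, so a different model can be swapped in without touching the loop.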
Adjusted Efficiency Betting Results
21,584 games were used in my analysis. There were more eligible games in the January–April window of these seasons, but I could only evaluate games for which I could find betting lines. I purchased a historical data set, which was mostly complete but had some holes. My first approach included every game in this data set, evaluated against the closing spread and the closing total for each game. Here were the results:
Spread: Wins: 10617 Losses: 10523 Win %: 0.492 (computed over all 21,584 games)
Total: Wins: 10523 Losses: 11061 Win %: 0.488
Unfortunately, these results show no advantage; at the standard -110 price, a bettor must win about 52.4% of wagers just to break even. My next step was to figure out why. Perhaps the problem is that I am betting on every game, regardless of how far my prediction differs from the line. To test this hypothesis, I only considered games where my predicted margin differed from the Vegas spread by 5 points or more, or where my predicted total differed from the Vegas total by 8 points or more. Let’s look at the results.
Spread (difference of 5+ points): Wins: 1149 Losses: 1187 Win %: 49.18%
Total (difference of 8+ points): Wins: 1337 Losses: 1450 Win %: 47.97%
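The grading rules used above (take a side whenever the prediction disagrees with the closing line, optionally only when it disagrees by at least 5 points on the spread or 8 on the total) can be sketched in Python. This is a hypothetical reconstruction, not the author’s Excel formulas: the sign convention for the spread, the function names, and the treatment of pushes are assumptions.

```python
from typing import Optional

# At -110 you risk 110 to win 100, so the break-even win rate is 110 / 210.
BREAK_EVEN_AT_MINUS_110 = 110 / 210


def _grade(edge: float, result: float, min_edge: float) -> Optional[str]:
    """edge > 0 means the model prefers the home side (or the over).

    Returns None when no bet is placed, otherwise "win", "loss" or "push"
    depending on where the final result landed relative to the line.
    """
    if edge == 0 or abs(edge) < min_edge:
        return None
    if result == 0:
        return "push"
    return "win" if (edge > 0) == (result > 0) else "loss"


def grade_spread(pred_margin: float, home_line: float,
                 actual_margin: float, min_edge: float = 0.0) -> Optional[str]:
    """Margins are home minus away; home_line -5.5 means home favored by 5.5."""
    return _grade(pred_margin + home_line, actual_margin + home_line, min_edge)


def grade_total(pred_total: float, line: float,
                actual_total: float, min_edge: float = 0.0) -> Optional[str]:
    """Bet the over when the predicted total is above the line, else the under."""
    return _grade(pred_total - line, actual_total - line, min_edge)


def win_pct(wins: int, losses: int) -> float:
    """Share of decided bets won (pushes excluded)."""
    return wins / (wins + losses)
```

For example, `grade_spread(12, -5.5, 7, min_edge=5)` bets the home side (predicted cover of 6.5 points) and returns `"win"`, while `win_pct(1149, 1187)` is about 0.492, well short of the roughly 0.524 needed to break even at -110.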
Again, not the results I was secretly hoping for. Used the way I have used them, adjusted efficiencies seem to offer no advantage in predicting college basketball spreads or totals. However, the experiment did give me some evidence that my model is fairly accurate at predicting Vegas spreads: in 89.2% of the games, my predicted score differential was within 5 points of the spread. Considering that my model does not account for injuries, other day-to-day lineup changes, or any perceived “hot streaks” that may move the line one way or another, I would say it is a fairly good prediction model, just one that is better at predicting Vegas spreads than beating them.
In this experiment we showed there is no easy button for beating college basketball spreads. We can’t simply plug in kenpom efficiencies and expect to break the bookies in Vegas. However, this won’t be the last we see of efficiencies. We can break them down into the four factors and look at how teams’ effective field goal percentage, offensive rebounding, turnovers, and free-throw rate match up against their opponents. We will also tap into some machine learning approaches to dig deeper into how to beat the college basketball spread. More to come.