Betting College Basketball with Adjusted Efficiencies


Anyone who has visited Ken Pomeroy’s site or read Dean Oliver’s “Basketball on Paper” is familiar with the concept of efficiencies: simply put, the number of points scored per possession.  College basketball can be modeled effectively by comparing one team’s expected offensive efficiency (points scored per possession) against its opponent’s expected defensive efficiency (points allowed per possession).  Combine that with the number of possessions per team, and you can predict the final score.

home score = (home team’s offensive efficiency vs visitor’s defensive efficiency) * # of possessions
visitor score = (visitor’s offensive efficiency vs home team’s defensive efficiency) * # of possessions

A couple of clarifications to the above formulas are already needed.  First, we say “vs” between an offensive and a defensive efficiency, and there are different ways to calculate this.  If team A allows 0.8 points per possession and its opponent scores 1.05 points per possession, what do we expect to happen when they play?  The first thought might be to average the two numbers.  This would be incorrect, though.  Allowing 0.8 points per possession would likely be the best defense in the league, and that number presumably comes from playing a variety of opponents whose combined offensive efficiency is around the league average (~1.02 points per possession, varying from year to year).  Against an offense that scores only slightly better than average, we would not expect anywhere near the 1.05 that offense usually produces; we would expect somewhere between 0.8 and 0.85 points per possession.

We can bring the league average into the equation.  (This is simplified for now, assuming a team plays both good and bad opponents that average out over time; we will get more accurate later on.)  Add the offense’s expected efficiency to the opposing defense’s efficiency, then subtract the league average.  In this case the expected efficiency when the opponent has the ball is 0.8 + 1.05 (the opponent’s average offensive efficiency) – 1.02 (the league average), which equals 0.83.  An opponent that scores slightly more than most teams in the league should likewise score slightly more than usual against team A.  The same calculation applies in the other direction, with team A’s offense against the opponent’s defense.
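The additive “vs” calculation and the score formula above can be sketched in a few lines of Python.  This is only an illustration of the arithmetic in this post: the function names are my own, and the 68 possessions in the usage example is a made-up figure.

```python
def matchup_efficiency(offense_ppp, defense_ppp, league_avg_ppp):
    """Additive combination: offense + opposing defense - league average."""
    return offense_ppp + defense_ppp - league_avg_ppp


def predict_score(offense_ppp, defense_ppp, league_avg_ppp, possessions):
    """Expected points = matchup efficiency * expected possessions."""
    return matchup_efficiency(offense_ppp, defense_ppp, league_avg_ppp) * possessions


# The example from the text: a 1.05 offense against a 0.80 defense,
# with a league average of 1.02 points per possession.
print(round(matchup_efficiency(1.05, 0.80, 1.02), 2))  # 0.83

# 68 possessions is a hypothetical pace used only for illustration.
print(round(predict_score(1.05, 0.80, 1.02, 68), 1))  # 56.4
```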

The other clarification is the number of possessions, which is not a stat provided in a typical box score.  It can, however, be estimated by counting made shots (including trips to the free throw line), defensive rebounds, and turnovers.

Another flaw in what we have proposed is that teams play opponents of different strengths over the course of the season.  Some teams will have a much tougher schedule than others, which tends to lead to lower raw efficiencies.  Fortunately we have ways of accommodating this: we can look at the quality of the opponents a team has faced and adjust our predicted efficiencies accordingly.  I won’t go into too much detail here on how this is done.  The short of it is that to calculate team A’s adjusted offensive efficiency, we look at the defense of every team that team A has already played and compare it with the national average.  If team A has played a tougher than average schedule, we give it a bump in adjusted offensive efficiency; otherwise we reduce expectations.  See Ken Pomeroy’s site for a full description.

Betting on college basketball using adjusted efficiencies.

So now that we have an idea of what adjusted efficiencies are, I wanted to explore whether they could be used to beat the spread and/or totals bets in college basketball.  We learned in an earlier post that most college basketball lines are set extremely close to what these adjusted efficiencies predict.  However, maybe there is a large enough discrepancy to exploit.  So I designed an experiment to find out.

For this experiment I am using college basketball data collected from the 2003/2004 through the 2014/2015 seasons.  In order to calculate adjusted efficiencies I need a decent sampling of game data each season.  For that reason I am only betting on games from January through the end of each season.  I don’t include any preseason rankings or other prediction-based approaches; I want this to be fueled by real data that resets each season.  So I exclude the first two months of each season from my simulated bets and use them only as data for calculating adjusted efficiencies.

I would have liked to compare with the adjusted efficiencies directly from Ken Pomeroy’s site.  However, the data presented on that site is constantly changing as the season progresses.  There is no way to go back and view the adjusted efficiencies at a specific point in time.  I want a purely predictive model, so I needed a new approach.  To solve this problem I decided to calculate my own adjusted efficiencies in a way as similar to Ken Pomeroy’s as I can.  For this I calculated the raw offensive and defensive efficiencies, along with the possessions for each game, estimated by:

(Field goal attempts – offensive rebounds) + turnovers + (0.475 * free throw attempts)
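As a rough sketch, the possession estimate above and the raw efficiency built on it look like this in Python.  The function names and the box score numbers are hypothetical; only the formula and the 0.475 free throw weight come from this post.

```python
def estimate_possessions(fga, orb, tov, fta, ft_weight=0.475):
    """(FGA - offensive rebounds) + turnovers + 0.475 * FTA."""
    return (fga - orb) + tov + ft_weight * fta


def raw_efficiency(points, possessions):
    """Points scored (or allowed) per possession."""
    return points / possessions


# Hypothetical box score: 55 FGA, 10 offensive rebounds, 12 turnovers, 20 FTA.
possessions = estimate_possessions(fga=55, orb=10, tov=12, fta=20)
print(possessions)  # 66.5

# 70 points on those possessions.
print(round(raw_efficiency(70, possessions), 3))  # 1.053
```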

These were then adjusted for competition as explained above.  For each game, I looked at team A and every team B it had played earlier that season, calculated team A’s average offensive efficiency, and adjusted it by comparing each team B’s defensive efficiency up to that point in the season against the national average.  So if team A averaged 1.05 points per possession (more than the league average), but its opponents’ adjusted defensive efficiency also allowed 1.05 points per possession (also more than the league average), I would adjust team A’s expected offensive efficiency to the league average (1.02).  Similar calculations were made for the adjusted defensive efficiency.  Note that only games against Division I opponents were included in these calculations.
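Here is a minimal sketch of that adjustment in Python.  It assumes a simple additive, single-pass correction (shift the raw number by how far the opponents’ defenses sit from the league average), which reproduces the 1.05 to 1.02 example above.  It is not necessarily the exact weighting my program or Ken Pomeroy uses, and the function name and opponent values are made up.

```python
from statistics import mean


def adjusted_offense(raw_off_ppp, opponent_def_ppp, league_avg_ppp):
    """Shift raw offensive efficiency by the opponents' defensive bias.

    opponent_def_ppp holds the defensive efficiency (points allowed per
    possession) of each opponent already played this season.
    """
    schedule_bias = mean(opponent_def_ppp) - league_avg_ppp
    return raw_off_ppp - schedule_bias


# Team A scores 1.05, but its opponents allow 1.05 on average (hypothetical
# individual values), against a league average of 1.02.
print(round(adjusted_offense(1.05, [1.00, 1.05, 1.10], 1.02), 2))  # 1.02
```

A team that faced stingier than average defenses would get a negative schedule bias and therefore a bump upward.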

Before analyzing any results, I cross-checked my numbers against some of the late-season games each season, as these should be where my model is closest to Ken Pomeroy’s predictions.  They were not exact matches, as his model likely weighs other factors, such as favoring recent games and possibly considering the site of each game played.  I have yet to find the exact formula for his calculations; however, the values I cross-checked were reasonably close.  On average each adjusted efficiency was within 2% of his model’s: not exact, but close enough for now.

Nerd Speak: For this experiment I wrote a C# program to create my model.  I load the raw data from CSV files and store it in a SQL database.  For each game, I query the database for the home and visiting teams and load every Division I game played up until that point in the season.  I then calculate the adjusted efficiencies, looking not only at every game the home and visiting teams played but also at each game all of their opponents played, in order to determine proper weights for my adjusted efficiency model.  For each game I output a predicted score for both teams into an Excel spreadsheet.  I use some simple functions in Excel to evaluate how the model did, and visually cross-check that my results seem realistic.
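The key step in that pipeline is only using games played before the game being predicted.  My program is in C#, but the idea can be sketched in Python with an in-memory SQLite database.  The table, column names, and rows here are hypothetical and are not my actual schema or data.

```python
import sqlite3

# Hypothetical schema: one row per team per game.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE team_games ("
    "game_date TEXT, season INTEGER, team TEXT, points INTEGER, possessions REAL)"
)
conn.executemany(
    "INSERT INTO team_games VALUES (?, ?, ?, ?, ?)",
    [
        ("2014-11-20", 2015, "Team A", 70, 65.0),
        ("2014-12-15", 2015, "Team A", 80, 70.0),
        ("2015-01-10", 2015, "Team A", 60, 66.0),
    ],
)


def raw_offense_before(conn, team, season, cutoff_date):
    """Points per possession using only games played before cutoff_date."""
    points, possessions = conn.execute(
        "SELECT SUM(points), SUM(possessions) FROM team_games "
        "WHERE team = ? AND season = ? AND game_date < ?",
        (team, season, cutoff_date),
    ).fetchone()
    if not possessions:
        return None  # no earlier games this season
    return points / possessions


# Predicting the 2015-01-10 game uses only the two earlier games.
print(round(raw_offense_before(conn, "Team A", 2015, "2015-01-10"), 3))  # 1.111
```

Storing dates as ISO strings (YYYY-MM-DD) keeps the `<` comparison chronological.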

Adjusted Efficiency Betting Results

21,584 games were used for my analysis.  While there were more applicable games in the January-April time frame for these college basketball seasons, I could only evaluate games I could find betting lines for.  I had purchased a historical data set, which was mostly complete but had some holes.  My first approach included every game in this data set, evaluated against the closing spread and closing total for each game.  Here are the results:

Spread bets:
Wins: 10617  Losses: 10523  Win %: 49.2%

Total bets:
Wins: 10523  Losses: 11061  Win %: 48.8%

Unfortunately, these results did not show any advantage.  My next step was to try to work out why.  Perhaps it is because I am betting on every game, regardless of how far my prediction differs from the line and whatever advantage that difference might signal.  To test this hypothesis, I decided to only consider games where my predicted margin differed from the Vegas spread by 5 points or more, and where my predicted total differed from the Vegas total by 8 points or more.  Let’s look at the results.

Spread bets:
Wins: 1149  Losses: 1187  Win %: 49.18%

Total bets:
Wins: 1337  Losses: 1450  Win %: 47.97%

Again, not the results I was secretly hoping for.  There seems to be no advantage in using adjusted efficiencies the way I have to predict college basketball spreads or totals.  However, it did give me some evidence that my model is fairly accurate at predicting Vegas spreads: in 89.2% of games, my predicted score differential was within 5 points of the spread.  Considering my model does not account for injuries, other day-to-day lineup adjustments, or any perceived “hot streaks” that may influence the line one way or another, I would say it is a fairly good prediction model, but one that is better at predicting Vegas spreads than beating them.
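For anyone who wants to replicate the filtered test, here is a hedged Python sketch of the 5-point filter and the win rate calculation.  The function names are my own.  The break-even function assumes standard -110 pricing, which is an assumption for illustration and not something taken from my data set.

```python
def pick_spread_side(predicted_margin, vegas_margin, threshold=5.0):
    """Return 'home', 'away', or None (no bet).

    Both margins are home points minus away points, so a home spread
    of -8 corresponds to a vegas_margin of +8.
    """
    edge = predicted_margin - vegas_margin
    if edge >= threshold:
        return "home"
    if edge <= -threshold:
        return "away"
    return None


def win_rate(wins, losses):
    """Fraction of decided bets that won."""
    return wins / (wins + losses)


def break_even_rate(american_odds=-110):
    """Win rate needed to break even at the given American odds."""
    if american_odds < 0:
        return -american_odds / (-american_odds + 100)
    return 100 / (american_odds + 100)


print(pick_spread_side(predicted_margin=10, vegas_margin=4))  # home
print(round(win_rate(1149, 1187) * 100, 2))  # 49.19
print(round(win_rate(1337, 1450) * 100, 2))  # 47.97
print(round(break_even_rate(-110) * 100, 2))  # 52.38
```

Under that -110 assumption a bettor needs to win a bit more than 52% of bets just to break even, so win rates under 50% are well short of profitable.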

In this experiment we showed there is no easy button for beating college basketball spreads.  We can’t simply plug in kenpom efficiencies and hope to break the bookies in Vegas.  However, this won’t be the last we see of efficiencies.  We can break them down into the four factors and look at how teams’ effective field goal percentage, offensive rebounding, turnovers, and free-throw rate match up against their opponents.  We will also tap into some machine learning approaches to try to dig deeper into understanding how to beat the college basketball spread.  More to come.



College Basketball Databases and APIs


In order to find competitive advantages to beat spreads and totals, we are going to need to either watch hundreds of games or find another way to understand how teams play, and how they play against each other.  The way in this case is to find some data, a task easier said than done, as anyone who has tried can attest.  While the big 3 professional sports have a wide variety of good options, both paid and free, men’s college basketball data is a lot tougher to home in on.  Let’s take a look at what is out there.

Maybe the holy grail of sports statistics.  However, it comes at a cost.  How much, you ask?  Well, if you have to ask you probably can’t afford it.  From what I can ascertain from Redditors who have called to inquire, the price is well into the 5 figures.  If you have that kind of money lying around, you may already be an uber-successful sports bettor and likely already have the database or API you need.  The fact that prices aren’t listed on the website is probably a large enough red flag that this is going to be prohibitive for any hobbyist or beginning sports bettor.

They provide an API-based approach to query the data you need.  They subdivide their packages by sport so you can buy only the data you need.  They offer a vast array of data, including box scores, player profiles, and game summaries.  Props to them for being upfront about their pricing, but this will be the biggest barrier to entry.  The cost for the most basic API is $950 a month for college basketball data.  Alternatively, the cost for historic data feeds is around $3,000.  They do offer a demo program to get your feet wet with modified (not real) data, so you can write code against their API and not pay until you are ready.  Unfortunately the cost is still a little prohibitive for most casual bettors.

Finally, a resource with a very reasonable $15 a year fee.  They provide some nice breakdowns of both team and individual statistics.  What is intriguing to me is that they have a way to measure the percentage of shots at the rim vs 2 pt. jump shots, something not readily available in typical box scores.  It’s unclear how this is determined or whether these are estimates based on some other data, but it is interesting nonetheless.  This is a source that seems worth exploring further in a future blog post.

This site seems to be the cheapest of the APIs I have found.  They provide a free trial to develop your code until you are ready to upgrade to the $499 monthly fee.  Fortunately they only charge this during active months of the season, with other months being billed at $79 each (although you can probably cancel and renew again the next year).  Definitely worth exploring if you have the money, but I imagine most do not or will not throw $3,000 a year at this, so let’s look at what’s left.

A great site for up-to-date rankings of teams and their efficiencies.  It is not a feed or API-based service; it’s just HTML pages that you can sort by various statistics, some adjusted for the competition each team faces.  At $20 a year it’s very affordable and provides in-depth detail derived from box scores since the 2002/2003 season.  The one caveat is that the data is not necessarily static: efficiencies get adjusted in real time, and occasionally prior seasons’ data can change when the algorithms are revised to better predict efficiency.  Not a major deal, just something to be aware of.

Web Scraping

The “free” way of obtaining data.  Is it legal?  Is it not?  I am not a lawyer, and I offer no advice or recommendation other than recognizing it’s an option that is readily available.  For programmers, tools such as cURL, R, or any number of other programming/scripting languages can be used.  For those less technically inclined, Microsoft Excel provides a “Web Query” operation (available in the Data tab) that can automatically draw refreshable data from HTML tables found on the web.  We may go into more depth on these options later on, particularly R, as it seems to be the way of the future for statistical work and has a lot of packages that can parse data out of HTML.

In the meantime, if you choose to scrape the web, read the terms of service and be respectful.  If sites get hit too often, you will likely get your IP blocked, if not worse.  More than likely, if you make some queries with a reasonable wait between requests in a semi-random pattern, nobody will blink an eye.  If you take someone’s server down, that’s another story.
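As a rough illustration of “a reasonable wait between requests, in a semi-random pattern,” here is a small Python sketch using only the standard library.  The 5 to 15 second range, the User-Agent string, and the function names are my own placeholder choices, not a recommendation from any site; check each site’s terms of service for what it actually allows.

```python
import random
import time
import urllib.request


def fetch(url, timeout=30):
    """Download one page, identifying the script in the User-Agent header."""
    request = urllib.request.Request(
        url, headers={"User-Agent": "hobby-stats-script (placeholder contact)"}
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read()


def polite_fetch_all(urls, min_wait=5.0, max_wait=15.0,
                     fetch_fn=fetch, sleep_fn=time.sleep):
    """Fetch pages one at a time with a random pause between requests."""
    pages = []
    for index, url in enumerate(urls):
        if index > 0:
            # Random wait so requests do not arrive in a fixed rhythm.
            sleep_fn(random.uniform(min_wait, max_wait))
        pages.append(fetch_fn(url))
    return pages
```

Passing `fetch_fn` and `sleep_fn` as parameters is just a convenience so the loop can be tried out with stand-ins before pointing it at a real site.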

The most notable site you will likely find keeps very detailed statistics for all divisions of college basketball.  However, the way the data is organized means you will have to make a lot of requests to get a season’s worth of data.  ESPN is another alternative with a similar data structure.

Some of these options are more viable than others, but at the end of the day it comes down to weighing price vs technical ability vs who has the data you want and what you are willing to put in.  At the very least I recommend checking out the two inexpensive sites above, at $15 and $20 a year.  Combined they are a $35 a year investment and can provide some great insight until you are ready to invest more.

Setting the lines

I have spent the last seven Marches in Vegas watching basketball with sides of booze and betting.  New teams every year, with a lot of the same powerhouses making their annual cameos.  One constant I can’t avoid hearing year after year is how well the lines are set: how great a job the bookmakers do, and what inside knowledge they must have when a game lands within a point or two of the spread.


With 48 games on the opening weekend alone (not counting play-in games), odds are that some of them are bound to finish close.  One of my first realizations was that these lines are not magic numbers pulled out of a hat at some soon-to-be-demolished casino north of the Strip, but a mixture of simple math and ripping off one man’s work.

In the regular season there are around 350 D-1 teams, and it would take a small army to watch all or even most of the games played.  I can assure you that nobody is doing this.  Lines aren’t created from expert analysis by someone who has watched hundreds of games; they are created from some simple math using the two teams’ expected efficiencies, adjusting for home or away and any possible injuries.  Fortunately for the bookmakers, they don’t even have to do the simple math, as one man does the dirty work for them.  Let’s take a look at a few examples; we will use games played tomorrow to minimize biases.


The predictions are provided by kenpom.  A subscription is required for full access to the site, but let’s take a look at the predicted scores for these three games.

Butler at home is predicted to win 81-73, which translates to a -8 predicted spread.
Xavier at home is predicted to win 79-77, which translates to a -2 predicted spread.
St. John’s at home is predicted to win 80-71, which translates to a -9 predicted spread.
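The translation from a predicted score to a predicted spread (and total) is simple enough to write down.  Here is a small Python sketch using the three predicted scores above; the function name is my own, and the sign convention (negative means the home team is favored) matches the spreads listed.

```python
def predicted_lines(home_score, away_score):
    """Return (spread, total) from the home team's point of view."""
    spread = away_score - home_score  # negative when the home team is favored
    total = home_score + away_score
    return spread, total


games = [("Butler", 81, 73), ("Xavier", 79, 77), ("St. John's", 80, 71)]
for home_team, home_score, away_score in games:
    spread, total = predicted_lines(home_score, away_score)
    print(f"{home_team}: spread {spread:+d}, total {total}")
# Butler: spread -8, total 154
# Xavier: spread -2, total 156
# St. John's: spread -9, total 151
```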

Do you see where I am going with this?  In the first three examples I could find, we can predict the spread to within 1 point.  In the past I have analyzed the difference between the spread and kenpom’s predictions, and it averages slightly less than 2 points.  Game point total predictions can be made in a similar manner with equally convincing results.  These numbers aren’t being conceived from thin air.  Each team’s score is simply the expected pace (based on season-long averages for each team and their opponents) times that team’s expected offensive efficiency vs the other team’s expected defensive efficiency: pace times team A’s offense vs team B’s defense for one score, and pace times team B’s offense vs team A’s defense for the other.  An expected efficiency is just the number of points scored or allowed per possession.  For a full explanation of how these numbers are generated, please read kenpom’s site or Dean Oliver’s book, as their work is based on these concepts.

The point of this article is not to get bogged down in the exact math behind these predictions; we will elaborate on that in the future.  The point is to understand that these spreads are predictable with great accuracy.  We will use these predictions as a baseline for developing algorithms that attempt to do better at predicting basketball.  More to come.