In order to find competitive advantages to beat spreads and totals, we are going to need to either watch hundreds of games or find another way to understand how teams play, and how they play against each other. The way in this case is to find some data, a task easier said than done, as anyone who has tried it can attest. While the big three professional sports have a wide variety of good options, both paid and free, men’s college basketball is a lot tougher to home in on. Let’s take a look at what is out there.
Maybe the holy grail of sports statistics. However, it comes at a cost. How much, you ask? Well, if you have to ask, you probably can’t afford it. From what I can ascertain from Redditors who have called to inquire, the price is well into the five figures. If you have that kind of money lying around, you may already be an uber-successful sports bettor and likely already have the database or API you need. The fact that prices aren’t listed on the website is probably a big enough red flag that this is going to be prohibitive for any hobbyist or beginning sports bettor.
They provide an API-based approach to querying the data you need. They subdivide their packages by sport, so you can buy only the data you need. They offer a vast array of data, including box scores, player profiles, and game summaries. Props to them for being upfront about their pricing, but the price will be the biggest barrier to entry. The most basic API costs $950 a month for college basketball data. Alternatively, historic data feeds cost around $3,000. They do offer a demo program to get your feet wet with modified (not real) data, so you can write code against their API and not pay until you are ready. Unfortunately, the cost is still a little prohibitive for most casual bettors.
Finally, a resource with a very reasonable fee of $15 a year. They provide some nice breakdowns of both team and individual statistics. What is intriguing to me is that they have a way to measure the percentage of shots taken at the rim versus two-point jump shots, something not readily available in typical box scores. It’s unclear how this is determined, or whether these are estimates based on some other data, but it is interesting nonetheless. This is a source that seems worth exploring further in a future blog post.
This site seems to be the cheapest of the APIs I have found. They provide a free trial so you can develop your code until you are ready to upgrade to the $499 monthly fee. Fortunately, they only charge this during the active months of the season, with the other months billed at $79 each (although you can probably cancel and renew again the next year). Definitely worth exploring if you have the money, but I imagine most do not or will not throw $3,000 a year at this, so let’s look at what’s left.
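To see where the roughly $3,000 a year figure comes from, here is a quick back-of-the-envelope sketch. The $499 and $79 rates are the ones quoted above; the number of active months is my own assumption (five, roughly November through March), and the `annual_cost` helper is just an illustrative name, so treat the result as an estimate rather than the provider's actual bill.

```python
def annual_cost(active_rate, offseason_rate, active_months):
    """Total yearly cost when in-season and off-season months are billed differently."""
    offseason_months = 12 - active_months
    return active_rate * active_months + offseason_rate * offseason_months


# ASSUMPTION: five in-season months; the provider's real billing calendar may differ.
full_year = annual_cost(499, 79, active_months=5)

# Cancelling in the off-season means paying only for the active months.
season_only = annual_cost(499, 0, active_months=5)

print(full_year, season_only)
```

Under that five-month assumption, staying subscribed all year lands just above $3,000, while cancelling and renewing each season trims it to about $2,500.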
A great site for up-to-date rankings of teams and their efficiencies. It is not a feed or API-based service; it’s just HTML pages that you can sort by various statistics, some of them adjusted for the competition each team faces. At $20 a year it’s very affordable, and it provides in-depth detail derived from box scores going back to the 2002/2003 season. The one caveat is that the data is not necessarily static: efficiencies get adjusted in real time, and occasionally prior seasons’ data can change when the algorithms are revised to better predict efficiency. Not a major deal, just something to be aware of.
The “free” way of obtaining data. Is it legal? Is it not? I am not a lawyer, and I offer no advice or recommendation other than recognizing that it’s an option that is readily available. For programmers, tools such as cURL, R, or any number of other programming/scripting languages can be used. For those less technically inclined, Microsoft Excel provides a “Web Query” operation (available in the Data tab) that can automatically draw refreshable data from various HTML tables found on the web. We may go into more depth on these various options later on, particularly R, as it seems to be the way of the future for statistical work and has a lot of packages that can parse data out of HTML.
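To give a feel for what pulling data out of an HTML table involves, here is a minimal sketch in Python using only the standard library. It is an illustration, not a recommendation of any particular tool: the `TableParser` class and `parse_table` function are names I made up, and the sample table holds placeholder teams and numbers, not real statistics. Real pages tend to be messier (nested tables, merged cells), so expect to adapt it.

```python
from html.parser import HTMLParser


class TableParser(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []      # finished rows, each a list of cell strings
        self._row = None    # row currently being built, or None
        self._cell = None   # text fragments of the current cell, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only keep text that appears inside a cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def parse_table(html):
    """Return all table rows found in the given HTML string."""
    parser = TableParser()
    parser.feed(html)
    return parser.rows


# Placeholder data for illustration only; these are not real teams or stats.
SAMPLE_HTML = """
<table>
  <tr><th>Team</th><th>Games</th><th>Points</th></tr>
  <tr><td>Team A</td><td>30</td><td>2250</td></tr>
  <tr><td>Team B</td><td>31</td><td>2139</td></tr>
</table>
"""

rows = parse_table(SAMPLE_HTML)
header, body = rows[0], rows[1:]
for team, games, points in body:
    # Points per game, rounded to one decimal place.
    print(team, round(int(points) / int(games), 1))
```

Once the rows are plain lists of strings, they can be written to a CSV file or loaded into whatever analysis tool you prefer.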
In the meantime, if you choose to scrape the web, read the terms of service and be respectful. If sites get hit too often, you will likely get your IP blocked, if not worse. More than likely, if you make some queries with a reasonable wait between requests, in a semi-random pattern, nobody will blink an eye. If you take someone’s server down, that’s another story.
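A rough sketch of what "a reasonable wait between requests, in a semi-random pattern" could look like in Python is below. Everything here is an assumption on my part: the function names are made up, the 5 to 10 second wait is an arbitrary starting point rather than a known-safe value, and checking robots.txt is not a substitute for reading a site's terms of service. The `fetch` and `sleep` arguments are passed in so the loop can be tried out without touching a real server.

```python
import random
import time
import urllib.request
from urllib import robotparser


def allowed_by_robots(robots_txt, user_agent, url):
    """Check a URL against the text of a site's robots.txt file."""
    rp = robotparser.RobotFileParser()
    rp.parse(robots_txt.splitlines())
    return rp.can_fetch(user_agent, url)


def jittered_wait(base_seconds, jitter_seconds, rng=random):
    """Fixed base delay plus a random extra, so requests are not evenly spaced."""
    return base_seconds + rng.uniform(0, jitter_seconds)


def fetch_politely(urls, fetch, base_seconds=5.0, jitter_seconds=5.0,
                   sleep=time.sleep, rng=random):
    """Fetch each URL in turn, pausing a semi-random time between requests."""
    pages = {}
    for i, url in enumerate(urls):
        if i > 0:  # no need to wait before the very first request
            sleep(jittered_wait(base_seconds, jitter_seconds, rng))
        pages[url] = fetch(url)
    return pages


def http_get(url):
    """Plain HTTP GET using the standard library; pass this as `fetch`."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        return resp.read().decode("utf-8", errors="replace")
```

In practice you would call something like `fetch_politely(list_of_urls, http_get)` after confirming with `allowed_by_robots` that the pages are not off limits, and lengthen the waits if you are pulling a whole season’s worth of pages.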
The most notable site you will likely find is ncaa.org, which keeps very detailed statistics for all divisions of college basketball. However, the way the data is organized means you will have to make a lot of requests to get a season’s worth of data. ESPN is another alternative with a similar data structure.
Some of these options are more viable than others, but at the end of the day it comes down to weighing price against technical ability, who has the data you want, and how much effort you are willing to put in. At the very least, I recommend checking out the rankings at kenpom.com and hoop-math.com. Combined, they are a $35-a-year investment and can provide some great insight until you are ready to invest more.