Calculating Home Court Advantage in College Basketball

Home court advantage is a term often thrown around. It is a common theme across all sports, though its influence varies from one to the next. From a logical standpoint, some aspects make sense: not having to travel long distances (or cross time zones, for that matter), having engaged fans rooting you on, and the comfort of playing in a place you have often played before. Books like Scorecasting suggest other influences, particularly a bias among referees to make calls that favor the home team. Whatever the cause, I want to explore the reach of home court advantage in college basketball. While this has been done many times before, I want to take this opportunity to use a powerful tool to measure true home court advantage more accurately.

One of the hurdles in calculating home court advantage in college basketball is the way teams schedule opponents. Schools in the power conferences typically schedule a large percentage of exhibition-type games against lesser opponents. These games are almost never part of a home-and-away series. If we calculated home court advantage using these games, we would get a lopsided result, because the home team almost always wins, and by a large margin. For my calculation I want to restrict the data set to teams who play each other home and away in the same season. Fortunately, conference play provides just this data. The challenge lies in separating the games where two opponents play a home and away from all the others. We will need some tools to assist us.

A recurring question when evaluating data is which tool is best for the job. Excel or its OpenOffice equivalent is often a good choice for tabular data, but so is MySQL, or insert your favorite programming language. I often find myself wanting to write queries against data in a delimited text file (CSV), but I don’t want to lay out a database schema, connect to a database, and perform inserts in order to do so. It’s time-prohibitive and tedious. One tool I have found particularly helpful is a package called “Q - Text as Data”. It is a simple command line utility that runs on Windows and Linux and lets you query CSVs as if they were database tables, using the column headers as the field names.

Calculating home court advantage in college basketball using Q

Back to our experiment of identifying games with a home and away within a single season. Let’s see how Q can help us. I am starting with the following data set, which includes all games from the 2005 through 2015 seasons. You can download it here: 2005-2015-scores. Let’s use some SQL via Q to filter this file down to the games we care about. I will show the commands and then offer some explanation below.

q -H -d, "SELECT AVG(a.teamscore - a.oppscore)
FROM 2005-2015.csv a
INNER JOIN (SELECT teamname, opponent, datestr, seasonyear, site, teamscore, oppscore
            FROM 2005-2015.csv
            WHERE (site = 'H' OR site = 'A')
            GROUP BY teamname, opponent, seasonyear
            HAVING COUNT(*) = 2) b
ON a.teamname = b.teamname AND a.opponent = b.opponent AND a.seasonyear = b.seasonyear AND a.site = 'H'"

From the output of the above, we see that teams win by 3.53 points per game in the home leg of the home and away. Conversely, the visiting disadvantage can be calculated as follows:

q -H -d, "SELECT AVG(a.teamscore - a.oppscore)
FROM 2005-2015.csv a
INNER JOIN (SELECT teamname, opponent, datestr, seasonyear, site, teamscore, oppscore
            FROM 2005-2015.csv
            WHERE (site = 'H' OR site = 'A')
            GROUP BY teamname, opponent, seasonyear
            HAVING COUNT(*) = 2) b
ON a.teamname = b.teamname AND a.opponent = b.opponent AND a.seasonyear = b.seasonyear AND a.site = 'A'"

The output this time is -3.518, the average margin in points per game (ppg) for the visitors. Home teams win by 3.53 ppg while visiting teams lose by 3.518 ppg, so the value of playing at home versus away is a swing of about 7 points (3.53 + 3.518 = 7.048, rounded to the nearest whole number). That is our calculated home court advantage in college basketball.

OK, so what did we just do? “q” is the name of the program being run, and we are passing a few parameters to it.
-H tells it to use the first row of the CSV as the header, which provides the SQL column names.
-d, tells it that the text file we are passing in is comma delimited (the default delimiter is a space).
The third argument is the SQL to run, explained below.

Let’s take a look at what this SQL is doing, from the inside out. In the INNER JOIN subquery we group by teamname, opponent, and seasonyear, which isolates the results for each pairing of teams within a season. We want only games where the site is ‘H’ or ‘A’; neutral, semi-home, and semi-away games are identified differently, so this rules out sites where there is no true home court advantage. We keep only groups having exactly two games, where one game is home and one is away. We do this so that any additional meetings between the two teams, most likely in a tournament, are not included. Next we join these group results back to the original rows, which filters the original data set down to only the games we care about. From there we simply take the average score differential for the home games and for the away games to come up with our calculated home court advantage.
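
For readers without Q installed, the same filtering logic can be sketched with Python’s built-in sqlite3 module. This is only a minimal sketch, not the Q command itself: the table name games, the helper names build_db and average_margin, and the sample rows are all made up for illustration, and the outer site condition is written as a.site = 'H' or 'A' on the assumption that this is what the query filters on.

```python
import sqlite3

# Hypothetical sample rows (NOT from the real data set). Columns mirror the
# ones the query references: teamname, opponent, seasonyear, site,
# teamscore, oppscore. Each game appears once per team, as in the CSV.
ROWS = [
    # Home-and-away pair within one season: should be kept.
    ("Duke", "UNC", 2015, "H", 80, 70),
    ("UNC", "Duke", 2015, "A", 70, 80),
    ("Duke", "UNC", 2015, "A", 65, 72),
    ("UNC", "Duke", 2015, "H", 72, 65),
    # Third meeting at a neutral site: dropped by the site filter.
    ("Duke", "UNC", 2015, "N", 60, 61),
    ("UNC", "Duke", 2015, "N", 61, 60),
    # One-off home game with no return leg: dropped by HAVING COUNT(*) = 2.
    ("Duke", "Elon", 2015, "H", 95, 60),
    ("Elon", "Duke", 2015, "A", 60, 95),
]

QUERY = """
SELECT AVG(a.teamscore - a.oppscore)
FROM games a
INNER JOIN (
    SELECT teamname, opponent, seasonyear
    FROM games
    WHERE site IN ('H', 'A')
    GROUP BY teamname, opponent, seasonyear
    HAVING COUNT(*) = 2
) b
ON a.teamname = b.teamname
AND a.opponent = b.opponent
AND a.seasonyear = b.seasonyear
AND a.site = ?
"""


def build_db(rows):
    """Load the sample rows into an in-memory SQLite table named games."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE games (teamname TEXT, opponent TEXT, seasonyear INTEGER,"
        " site TEXT, teamscore INTEGER, oppscore INTEGER)"
    )
    conn.executemany("INSERT INTO games VALUES (?, ?, ?, ?, ?, ?)", rows)
    return conn


def average_margin(conn, site):
    """Average score differential for one leg ('H' or 'A') of the pairs."""
    return conn.execute(QUERY, (site,)).fetchone()[0]


conn = build_db(ROWS)
home = average_margin(conn, "H")
away = average_margin(conn, "A")
print("home margin:", home)
print("away margin:", away)
print("swing:", home - away)
```

On the sample rows only the two Duke and UNC home-and-away games survive the filter; the neutral-site meeting and the one-off Elon game drop out before the averages are taken.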

So there it is: we calculated a home court advantage of ~3.5 points per game and a visitor’s disadvantage of ~-3.5 points per game, giving roughly a 7 point swing for non-neutral sites. That is your calculated home court advantage in college basketball, courtesy of Q, which can give you an advantage in your analytics arsenal.


Easiest Fantasy League I Ever Won

I am going to take a quick aside from college basketball to tell you about the easiest fantasy league I ever won, and how you could have too. Most people are familiar with the standard formats of fantasy football, baseball, or basketball. Those who follow hockey or soccer also have regular fantasy leagues. One format in particular is less popular, but it provided perhaps the easiest opportunity to win I have ever had. Enter NFL Playoff Challenge.

The brief synopsis is that you play for four weeks, following the NFL playoffs. Each week you choose a lineup consisting of a QB, 2 WRs, 2 RBs, a TE, a K, and a DEF. There is no salary cap and no draft, and you can choose a new lineup each week. Scoring follows standard fantasy football formats (non-PPR) with one exception: the multiplier. Each week you start the same player at the same position, you gain a multiplier on that player’s score. For example, if you had started Aaron Rodgers in the wildcard round, then once the Packers won he would have scored 2X points in the divisional round, 3X points in the NFC Championship, and, had they made it all the way, 4X points in the Super Bowl. One thing to note is that you can select players who are not playing in the wildcard round, and they will automatically advance to a 2X multiplier in the next round (although they will not net any points in the wildcard round). If a team is eliminated, you can choose a new player the next week, but the multiplier will be reset.

NFL Fantasy Playoff Challenge Strategy

Let’s look at some simple strategy, assuming we know nothing about the teams playing. Assign each team a 50% chance to win each game. A team playing in the wildcard round has to win 3 games to reach the Super Bowl, which gives it a 12.5% probability. Note that we don’t care whether the team wins the last game, only that it gets there: there are no games beyond it, so advancing is no longer a concern, other than that the winning team will likely yield more points. A team with an opening round bye only needs to win 2 games to reach the Super Bowl (a 25% probability), giving it twice the odds of a team playing in the wildcard round. So choosing a player from a team with a bye gives us twice the chance of reaching the 4X multiplier we want. The goal is to maximize our points, so what would you rather have?

Wildcard player:
1X + 2X + 3X + 4X = 10X, the max points possible if he reaches the Super Bowl, but with half the odds of doing so.

First round bye player:
2X + 3X + 4X = 9X, 90% of the max points possible, but with twice the odds of the wildcard player.

Under these assumptions, it seems obvious that it is in our best interest to pick players on teams with a first round bye. Yet few people will do so. With names like Antonio Brown and Le’Veon Bell available in the wildcard round, it’s hard to pass them up, even though they are less likely to advance to the final week, where the coveted 4X multiplier comes into play.
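
The trade-off above can be put into rough numbers. The sketch below uses the same 50% coin-flip assumption, plus two extra simplifying assumptions: the player scores the same raw points in every game he plays, and an eliminated player is not replaced. The function names are hypothetical and exist only for this illustration.

```python
def reach_final_probability(first_week, win_prob=0.5, last_week=4):
    """Chance that a team whose first game is in first_week plays in last_week."""
    return win_prob ** (last_week - first_week)


def expected_multiplier_total(first_week, win_prob=0.5, last_week=4):
    """Expected sum of weekly multipliers for a player held from week 1 on.

    A wildcard player (first_week=1) starts at 1X. A bye player
    (first_week=2) sits out week 1 and plays his first game at 2X, so in
    both cases the multiplier equals the week number.
    """
    total = 0.0
    p_still_playing = 1.0  # probability the team has not been eliminated yet
    for week in range(first_week, last_week + 1):
        total += p_still_playing * week
        p_still_playing *= win_prob  # must win this week to play the next one
    return total


wildcard_odds = reach_final_probability(first_week=1)
bye_odds = reach_final_probability(first_week=2)
wildcard_value = expected_multiplier_total(first_week=1)
bye_value = expected_multiplier_total(first_week=2)

print("wildcard: reach final", wildcard_odds, "expected multiplier", wildcard_value)
print("bye:      reach final", bye_odds, "expected multiplier", bye_value)
```

Under these assumptions the bye player is worth an expected 4.5 games of raw points against 3.25 for the wildcard player, even though his maximum is lower.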

Thus far we have assumed all teams are created equal, which we know is not the case. This year in particular, all the favorites won their wildcard games; if we had assumed that going in, we could have justified taking the big-name players on those teams. The problem, though, is picking which of the teams that advanced to the divisional round would make it to the Super Bowl. On the NFC side we had Atlanta, Dallas, Seattle, and Green Bay, any one of which had a legitimate shot at winning. How do we know which one to pick players from? Do we guess, or pick a sampling from various teams and hope a more balanced approach pays off?

No, absolutely not! While the NFC was a crapshoot, the AFC this year was a different story. Enter the Houston Texans and Oakland Raiders: one a team with no business being in the playoffs, with a QB who threw more picks than touchdowns; the other an offensive powerhouse that lost its starting and backup QBs. Who does the winner get to play? None other than Tom Brady and the Patriots in the divisional round. Chalk this up as a free win for New England, boosting their odds of reaching the Super Bowl to at least 50%, assuming they are the favorites to win the AFC Championship since they will host it at home. See where I am going with this? While the NFC is going to be a toss-up, I can fill my NFL Playoff Challenge team with Patriots, who have a better than 50% chance of advancing to the final game and thus receiving the 4X multiplier.

Therein lies the secret of my fantasy playoff strategy this year: load up on Patriots. Had New England lost to Pittsburgh, I would have had no chance of winning. However, I weighed that risk against the riskier scenario of having to pick which NFC team would advance, and the choice became simple. I filled 6 of the 8 positions with Patriots. I could have chosen all 8, but it is hard to predict whether a #2 WR or RB is worth it even with the multiplier, given that they get fewer opportunities than their #1 counterparts. Lucking out with Julio Jones as my other WR was an added bonus (I originally had Jordy Nelson, but after he got hurt I switched to Julio). My other RB slot was reserved for E. Elliott, who didn’t advance, but having a player from each of the 1 and 2 seeded teams in the NFC gave me a good shot at getting one into the Super Bowl for the 4X bonus. While I was trailing for the first 3 weeks of the playoffs, I can assure you my fantasy league opponents were horrified to see 6 of my 8 players with a 4X point multiplier next to their names, as well as Julio with a 3X. Although they did talk some smack after a dud of a first half by New England, Tom Brady answered and turned it around. Check cashed, easiest fantasy league I ever won!


March Madness Prop Bets

After a thrilling Super Bowl weekend, the thought crossed my mind: why don’t other sports offer more prop bets, particularly college basketball’s March Madness? There are a lot of similarities. Oftentimes your team is not the one competing, so another reason to root for something can be refreshing. Sure, there is the argument that it is an amateur sport and we shouldn’t be betting on kids, and the fear that some player may exploit it for profit. However, how fun would it be to have readily available prop bets for March Madness? You are throwing money down in your office pool, but once the opening weekend is done and you are all but eliminated, wouldn’t it be fun to double down and have a little more action? March Madness prop bets could be just the ticket.

Let’s take a look at some possible examples:

What will be higher throughout March Madness?
+150 Grayson Allen Trips
-200 12 seed vs 5 seed upsets

Number of school names mispronounced in the opening round?
-300 Over 2.5
+200 Under 2.5

Will Donald Trump fill out a bracket?
-700 No
+500 Yes

Number of games decided by one point throughout March Madness?
-110 Over 3.5
-110 Under 3.5

Which conference will have the most wins?
Pac 12  2/1
Big 10  3/2
WCC    2/1

Will the National Champion be a repeat winner?
-200 Yes
+150 No

Will a 16 seed knock off a 1 seed?
Yes  20/1
No   1/20

What will be higher?
The number of games Gonzaga wins
The number of Kentucky players who declare for the draft

Other common types of March Madness prop bets are which seed will end up winning the tournament, or how many 1 seeds will reach the Final Four. However, I particularly enjoy the offbeat comparison-style bets: a “would you rather” of atypical scenarios, such as the Gonzaga example above.

So sure, you can find some prop bets at your favorite offshore casino. However, I would like to see a little more variety of prop bets come March in Las Vegas. While 68 teams competing for one title is probably more than enough action for most, sometimes you want to put a few bucks down on something stupid and have a little fun. What’s the harm in that? Any interesting prop bets you would like to see?