Calculating Home Court Advantage in College Basketball

Home court advantage is a term that gets thrown around a lot.  It shows up in every sport, with varying degrees of influence.  Some of the usual explanations make logical sense: not having to travel long distances (or cross time zones, for that matter), having engaged fans rooting you on, and the comfort of playing in a familiar place.  Books like Scorecasting suggest other influences, particularly a bias among referees toward calls that favor the home team.  Whatever the cause, I want to explore the size of home court advantage in college basketball.  This has been done many times before, but I want to take the opportunity to use a powerful tool that helps measure true home court advantage more accurately.

One of the hurdles in calculating home court advantage in college basketball is the way teams schedule opponents.  Schools in the power conferences typically schedule a large percentage of exhibition-type games against lesser opponents.  These games are almost never part of a home and away setup.  If we calculated home court advantage using these games, we would get a lopsided result, because the home team almost always wins, and by a large margin.  For my calculation I want to restrict the data set to teams that play each other both home and away in the same season.  Fortunately, conference play provides just this data.  The challenge lies in separating the games where two opponents play a home and away from all the others.  We will need some tools to assist us.

A recurring question when evaluating data is which tool is best for the job.  Excel or its Open Office equivalents are often a good choice for tabular data, but so is MySQL, or your favorite programming language.  I often find myself wanting to write queries against data in a delimited text file (csv) without having to lay out a database schema, connect to a database, and perform inserts first.  That is tedious and takes too much time.  One tool I have found particularly helpful is a package called “Q – Text as Data“.  It is a simple command line utility that runs on Windows and Linux and lets you query csv files with SQL as if they were database tables, using the column headers as the field names.

Calculating home court advantage in college basketball using Q

Back to our experiment of identifying games that are part of a home and away within a single season.  Let's see how Q can help us.  I am starting with the following data set, which includes all games from the 2005 to 2015 seasons.  You can download it here:  2005-2015-scores.  Let's use some SQL via Q to filter this file down to the games we care about.  I will show the commands and then offer some explanation below.

q -H -d, \
"SELECT AVG(a.teamscore - a.oppscore)
FROM 2005-2015.csv a
INNER JOIN (SELECT teamname, opponent, datestr, seasonyear, site, teamscore, oppscore
FROM 2005-2015.csv
WHERE (site = 'H' OR site = 'A')
GROUP BY teamname, opponent, seasonyear
HAVING COUNT(*) = 2) b
ON a.teamname = b.teamname AND a.opponent = b.opponent AND a.seasonyear = b.seasonyear AND a.site = 'H'"

The output shows that teams win by 3.53 points per game in the home leg of the home and away.  Conversely, the visiting disadvantage can be calculated as follows:

q -H -d, \
"SELECT AVG(a.teamscore - a.oppscore)
FROM 2005-2015.csv a
INNER JOIN (SELECT teamname, opponent, datestr, seasonyear, site, teamscore, oppscore
FROM 2005-2015.csv
WHERE (site = 'H' OR site = 'A')
GROUP BY teamname, opponent, seasonyear
HAVING COUNT(*) = 2) b
ON a.teamname = b.teamname AND a.opponent = b.opponent AND a.seasonyear = b.seasonyear AND a.site = 'A'"

The output this time is -3.518, meaning visitors lose by about 3.5 points per game.  Home teams win by a margin of 3.53 ppg while visiting teams lose by 3.518 ppg, so the value of playing at home versus away is a swing of 7 points (rounded to the nearest whole number).  That is our calculated home court advantage in college basketball.
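The swing arithmetic is easy to check by hand, but here is a tiny sketch of it in Python, using the two averages reported by the queries.  The variable names are mine and are not part of Q.

```python
# Averages reported by the two q queries in this post.
home_margin = 3.53     # average scoring margin in the home leg
away_margin = -3.518   # average scoring margin in the away leg

# The swing is the gap between facing the same opponent at home and away.
swing = home_margin - away_margin

print(f"swing: {swing:.3f} points, about {round(swing)} when rounded")
```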

OK, so what did we just do?  “q” is the name of the program being run, and we are passing it a few parameters:
-H tells it to treat the first row of the csv as a header, so those values become the column names in our SQL.
-d, tells it that the text file we are passing in is comma delimited (the default delimiter is a space).
The third argument is the SQL to run, explained below.

Let's take a look at what this SQL is doing, from the inside out.  Inside the INNER JOIN we group by teamname, opponent, and seasonyear, which isolates the results for each combination of teams within a season.  We want only games where the site is ‘H’ or ‘A’.  Neutral, semi-home, and semi-away games are identified differently, so this rules out sites where there is no true home court advantage.  We keep only the groups with exactly two games, which should be one home game and one away game.  This leaves out matchups where the opponents met additional times, most likely in a tournament.  One caveat: the query only counts the games and does not check that the pair is one of each, so a matchup with two home games and no away game would also pass the filter.  Next we join these group results back to the original rows, which returns the original data set filtered down to the games we care about.  From there we simply take the average score differential for the home games and for the away games to come up with our calculated home court advantage.
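If you want to convince yourself the filter behaves as described, here is a minimal sketch that runs the same query shape against a handful of made-up rows.  It uses Python's built-in sqlite3 module as a stand-in for Q.  The table name games, the sample teams and scores, and the helper functions load_games and average_margin are all hypothetical.  I am also assuming, as in this sample, that every game appears twice in the file, once from each team's point of view.

```python
import csv
import io
import sqlite3

# Made-up rows using the same column names as the queries in this post.
# Alpha and Beta play a true home and away; Alpha and Gamma meet only once;
# Alpha and Delta meet at a neutral site ('N').
SAMPLE_CSV = """teamname,opponent,seasonyear,site,teamscore,oppscore
Alpha,Beta,2015,H,70,60
Alpha,Beta,2015,A,62,66
Beta,Alpha,2015,A,60,70
Beta,Alpha,2015,H,66,62
Alpha,Gamma,2015,H,90,50
Gamma,Alpha,2015,A,50,90
Alpha,Delta,2015,N,75,70
Delta,Alpha,2015,N,70,75
"""

# Same shape as the q query: find the pairs with exactly two H/A games in a
# season, then average the margin over their games at the requested site.
QUERY = """
SELECT AVG(a.teamscore - a.oppscore)
FROM games a
INNER JOIN (
    SELECT teamname, opponent, seasonyear
    FROM games
    WHERE site = 'H' OR site = 'A'
    GROUP BY teamname, opponent, seasonyear
    HAVING COUNT(*) = 2
) b
ON a.teamname = b.teamname
   AND a.opponent = b.opponent
   AND a.seasonyear = b.seasonyear
   AND a.site = ?
"""


def load_games(csv_text):
    """Load CSV text (with a header row) into an in-memory SQLite table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE games (teamname TEXT, opponent TEXT, seasonyear INTEGER,"
        " site TEXT, teamscore INTEGER, oppscore INTEGER)"
    )
    rows = [
        (r["teamname"], r["opponent"], int(r["seasonyear"]), r["site"],
         int(r["teamscore"]), int(r["oppscore"]))
        for r in csv.DictReader(io.StringIO(csv_text))
    ]
    conn.executemany("INSERT INTO games VALUES (?, ?, ?, ?, ?, ?)", rows)
    return conn


def average_margin(conn, site):
    """Average scoring margin at `site` ('H' or 'A') for home-and-away pairs."""
    return conn.execute(QUERY, (site,)).fetchone()[0]


conn = load_games(SAMPLE_CSV)
home = average_margin(conn, "H")
away = average_margin(conn, "A")
print("home margin:", home)
print("away margin:", away)
```

Only the Alpha and Beta games survive the filter here.  The one-off Gamma game and the neutral-site Delta game are dropped, which is the behavior we want from the real data set.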

So there it is.  We calculated a home court advantage of about 3.5 points per game and a visitor's disadvantage of about 3.5 points per game, giving roughly a 7 point swing for non-neutral sites.  That is your calculated home court advantage in college basketball, courtesy of Q, a tool that can give you an advantage of your own in your analytics arsenal.
