Computer rankings via Google Prediction API

By: HokieV on August 5, 2012, 10:40 AM | 11 comments

For those that follow me on Twitter, you may have noticed this Tweet where I commented that I had an idea for a computer rankings system that uses the Google Prediction API.

Well, it turns out that that it wasn't all that difficult to implement. Using data and results from the last 8 college football season as training data, I used Google Prediction to rank last year's teams. Here is resulting top 10:

  1) Alabama                          0.9914
  2) Louisiana State                  0.9785
  3) Stanford                         0.9744
  4) Oregon                           0.9701
  5) Oklahoma State                   0.9701
  6) Wisconsin                        0.9700
  7) Boise State                      0.9528
  8) Oklahoma                         0.9274
  9) Houston                          0.9270
 10) Arkansas                         0.9142

It had Virginia Tech at 18th. This is a link to an Excel spreadsheet of the full 2011 rankings (with the teams' AP & Coaches poll rankings noted). 2011-rankings.xls

How does this work? The first thing I did was feed the results of the last 8 seasons into Google Prediction. For each game & team, I gave it the final score differential as well as a number "features". These features include things like points scored per game vs the opponent's points allowed per game, offensive yards per game vs the opponent's defensive yards allowed per game, the turnover margins of each team, and the winning percentage of each team. (This is a simplified explanation.)

After feeding Google all of that data and it training my model, it was then possible to use that model to predict the final score differential of a match-up between any two teams. To make a prediction, I send it the "features" of any two teams and it spits back a score differential.

In order to come up with rankings, I had Google predict the outcome of over 14 thousand games - as if every team played every other team twice (once at home, once on the road) - and the results column is the winning percentage based on that prediction.

The results are not perfect, but they're definitely not terrible, either. I was actually pretty surprised to see just how decent the results actually were. I think I've spent 10 hours working on this, and I've managed to put together a computer ranking system that's just as good as any out there. Thanks Google!

What's the future of this thing? I have a ton of code clean-up to do (this is a total mash up for shell scripts and perl scripts). I'd love to add more to the data model (annual football revenue, for one). I think 4 weeks into the 2012 season I'll start generating rankings, as well use it to predict outcomes of games.

Forums:

Football Forum

DISCLAIMER: Forum topics may not have been written or edited by The Key Play staff.