Contenders Season 1 is over and the clock has run out for any comebacks. Here are the standings for our analysts by week, followed by the overall season weight we would assign each analyst if we were to make future Contenders predictions, the statistical significance of not achieving a higher rank in the competition, and a general measure of average performance.
With that, how did our predictions perform? First, below is a comparison of our predictions rounded into 10%-wide confidence buckets (for instance, all games where we had the favorite between 50% and 60% go into one bucket). The predictions are averaged for each bucket, and the actual results are averaged as well. A model is better if either a) its “lift” is higher (there is strong segmentation of the actuals) or b) its predictions are well calibrated, meaning actual win rates match the predicted probabilities. For instance, if you always knew the outcome but quoted 60% each time, you would have significant lift but very poor calibration. Alternatively, you could have no lift at all, like a coin flip, yet be very reasonably calibrated (the chance that a team wins a game, absent any knowledge, is exactly 50%). The size of the bubbles represents how many games are in each bin.
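The bucketing described above can be sketched as follows. This is an illustrative, stdlib-only version (the function name and data are hypothetical, not our actual pipeline): predictions for the favorite are grouped into fixed 10% bins, and each bin reports the mean predicted probability, the mean actual win rate, and the game count that sizes the bubble.

```python
from statistics import mean

def calibration_bins(preds, outcomes, width=0.10):
    """Group favorite-win probabilities into fixed-width bins and compare
    the mean predicted probability with the mean actual win rate.

    preds    -- list of predicted probabilities for the favorite (0.0-1.0)
    outcomes -- list of 1/0 flags, 1 if the favorite actually won
    """
    bins = {}
    for p, won in zip(preds, outcomes):
        # Lower edge of the bucket, folding p == 1.0 into the top bucket
        lo = min(int(p / width) * width, 1.0 - width)
        bins.setdefault(round(lo, 2), []).append((p, won))
    rows = []
    for lo in sorted(bins):
        pairs = bins[lo]
        rows.append({
            "bucket": (lo, round(lo + width, 2)),
            "mean_predicted": mean(p for p, _ in pairs),
            "mean_actual": mean(w for _, w in pairs),
            "n_games": len(pairs),  # bubble size in the chart
        })
    return rows
```

Plotting `mean_predicted` against `mean_actual` per bucket gives the calibration/lift chart: points on the diagonal indicate good calibration, while a wide spread of `mean_actual` across buckets indicates lift.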
The lift chart below compares the predictions posted on the website before each game against the actual outcomes. Our aggregate predictions had very strong lift but some weaknesses in calibration.
As can be seen, the favorite generally won more often than our aggregate prediction suggested.
Next is a demonstration that uses all of the data we have to test the predictive power of our analysts (in the previous example, many predictions had only a few games available for calculating analyst weights). For the next graph, I calculated weight based on all games excluding the game being predicted, which is a reasonable estimate of the long-term potential of our group’s predictions.
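The leave-one-out idea above can be sketched roughly like this. The weighting scheme shown (inverse Brier score) is an assumption for illustration only, since the post does not specify how analyst weights are computed; the point is that each game's aggregate prediction uses weights derived from every *other* game.

```python
from statistics import mean

def loo_aggregate(analyst_preds, outcomes):
    """Leave-one-out aggregate predictions.

    analyst_preds -- dict mapping analyst name to a list of predicted
                     favorite-win probabilities, one per game
    outcomes      -- list of 1/0 flags, 1 if the favorite won

    For each game, every analyst is weighted by accuracy on all OTHER
    games (here, inverse Brier score -- a stand-in for the real scheme),
    then a weighted-average prediction is formed for that game.
    """
    n_games = len(outcomes)
    aggregates = []
    for g in range(n_games):
        weights = {}
        for name, preds in analyst_preds.items():
            # Brier score over every game except the one being predicted
            brier = mean((preds[i] - outcomes[i]) ** 2
                         for i in range(n_games) if i != g)
            weights[name] = 1.0 / (brier + 1e-9)  # lower error -> higher weight
        total = sum(weights.values())
        aggregates.append(sum(w * analyst_preds[name][g]
                              for name, w in weights.items()) / total)
    return aggregates
```

Because each game is excluded from its own weight calculation, the aggregate is evaluated out-of-sample, which is what makes it a fair estimate of longer-term performance.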
As can be seen, this improves the calibration of the predictions to a reasonable degree, although the aggregate predictions were still biased slightly towards conservatism.
Now I will show each analyst’s own prediction-vs-actual plot, which reveals the relative strength of that analyst’s predictions over the season. The big story is that Sideshow and Barroi were excellent at determining which team was favored in a match but could have been more confident in the margins on their odds. Following them in 3rd and 4th place, I (Eden) and Yiska were excellent at calibrating our predictions to actuals but less astute at determining the favorite. Those below us typically either were less able to identify the favorite or did a worse job of calibrating their predictions.
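The two skills contrasted above, picking the right favorite versus stating well-calibrated odds, can be separated with two simple per-analyst measures. This is an illustrative sketch (the function and metrics are my framing, not necessarily how the rankings were computed): favorite-pick accuracy captures discrimination, and the mean Brier score captures calibration-style error.

```python
from statistics import mean

def analyst_summary(preds, outcomes):
    """Summarize one analyst's season with two complementary measures.

    preds    -- predicted probabilities that team A wins each game
    outcomes -- 1 if team A actually won, else 0

    favorite_accuracy -- how often the analyst's favorite (p > 0.5) won
    brier_score       -- mean squared error of the stated probabilities
                         (lower is better; 0.25 matches coin-flipping)
    """
    picked_right = mean(1 if (p > 0.5) == bool(won) else 0
                        for p, won in zip(preds, outcomes))
    brier = mean((p - won) ** 2 for p, won in zip(preds, outcomes))
    return {"favorite_accuracy": picked_right, "brier_score": brier}
```

An analyst like Sideshow or Barroi would score high on `favorite_accuracy` with room to improve on `brier_score`, while a well-calibrated but less decisive analyst would show the reverse pattern.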