Stalking plus-minus

I have always had a huge soft spot for top-down metrics in soccer. I wrote why at length elsewhere, but briefly, their central promise is to rate players in a completely assumption-free fashion. This is in stark contrast with mainstream player analytics, which measure players’ contributions in terms of individual actions that they perform or assist. There are good reasons for the proliferation of bottom-up metrics, chief among them feasibility. Because goals are so rare in football and players are substituted infrequently, it is very difficult to tease apart the contributions of individual players without additional assumptions. In fact, every attempt at a plus-minus that has been written up in public consists at least in part of teeth-gnashing and garment-rending brought about by this central problem. Conversely, player ratings based on individual actions are easier to build, at least from a technical standpoint. What is less appreciated is that they come at the huge price of accepting, largely on faith, that these individual actions ultimately matter. Because of that, until +/- is conclusively proven dead, it is always going to merit another look.

This post contains no original research. I have merely collected my notes and links relating to +/- in soccer. Of course, the seminal contributions were made in basketball and ice hockey, but I do not review them here. I focus on adjusted +/-, that is, models in which each player’s contribution is computed while controlling for all other players involved, usually in one massive regression. The dependent variable in this regression is called the target; in the classical approach, the target is the goal difference.
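
For concreteness, one common way to write the adjusted +/- regression is sketched below. The notation is my own shorthand rather than that of any particular model listed here: a match is cut into segments i between lineup changes, y_i is the target (for example, goal difference from the home team's perspective) in segment i, P is the number of players, and β_j is the rating of player j.

```latex
y_i = \sum_{j=1}^{P} x_{ij}\,\beta_j + \varepsilon_i,
\qquad
x_{ij} =
\begin{cases}
+1 & \text{if player } j \text{ is on the pitch for the home team in segment } i,\\
-1 & \text{if player } j \text{ is on the pitch for the away team in segment } i,\\
\phantom{+}0 & \text{otherwise.}
\end{cases}
```

Individual models differ in how they define segments, whether they add a home-advantage term, and how they scale the target, so this should be read as a generic template only.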

prehistory

Jörg Seidel starts the Goalimpact project, by far the best-established +/- model in soccer. The design of Goalimpact is not public, but it is known to target goal difference and, in the current version, to incorporate player age curves.

2013

Dan Altman builds player Shapley values [archive.org link; see also a 2016 write-up and a high-level pitch], which are a close cousin of +/-. The target measure is the expected goal difference.

2014

Howard Hamilton describes his attempt at a classical +/-. Howard’s post is very valuable as a clear demonstration of the key technical issue with +/-, namely the (near-)singularity of the regression matrix, and how to overcome it with ridge regression.
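
The singularity problem is easy to reproduce on a toy example. The sketch below is not Howard's code: the data, the `solve` and `ridge` helpers and the penalty value are all made up for illustration, and it sticks to the standard library to stay self-contained. Two players who always share the pitch produce identical columns, so ordinary least squares has no unique solution, whereas a ridge penalty makes the system solvable.

```python
def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):  # back substitution
        known = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - known) / m[r][r]
    return x


def ridge(X, y, lam):
    """Solve the ridge normal equations (X'X + lam * I) beta = X'y."""
    p = len(X[0])
    gram = [
        [sum(row[i] * row[j] for row in X) + (lam if i == j else 0.0) for j in range(p)]
        for i in range(p)
    ]
    xty = [sum(row[i] * target for row, target in zip(X, y)) for i in range(p)]
    return solve(gram, xty)


# Hypothetical data: one row per segment, one column per player.
# +1 = on the pitch for the home team, -1 = for the away team, 0 = absent.
# Players A and B (columns 0 and 1) always play together; C (column 2) is away.
X = [
    [1, 1, -1],
    [1, 1, -1],
    [1, 1, 0],
    [1, 1, 0],
]
y = [1, 0, 2, 1]  # made-up goal difference per segment, home perspective

ratings = ridge(X, y, lam=1.0)
```

With `lam=0.0` the call raises `ValueError`, because the columns for A and B are identical and the normal equations are singular. With any positive `lam` the system has a unique solution, and the two inseparable players receive the same rating: the penalty cannot tell them apart, it only splits the credit evenly.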

Łukasz Szczepański experiments with +/- and concludes (in an unpublished report) that the classical approach is unworkable in soccer.

Matthias Kullowatz posts unadjusted +/- ratings for the 2014 season of the MLS.

2015

Martin Eastwood publishes his Bayesian take on the problem. The key idea is to inject extra information into the system by creating a prior on each player coefficient, presumably using on-the-ball event data.

2016

Will Gürpınar-Morgan tackles +/- head-on, using non-shot (i.e. play-level) expected goal difference as the target, regularising the estimates via ridge regression, and investigating the precision of the estimates by bootstrapping.

2017

Tarak Kharrat and colleagues develop three new +/- ratings, targeting actual goals, expected goals and expected points. The player ratings are regularised with ridge regression and the observations are weighted by time elapsed until the present to emphasise current player ability. As befits an academic work, the paper is a good source of references to previous +/- efforts.
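
One simple way to implement this kind of recency weighting is exponential decay with a half-life. This is an assumption for illustration and not necessarily the weighting function used in the paper; the function name `recency_weight` and the one-year half-life are hypothetical.

```python
def recency_weight(days_ago, half_life_days=365.0):
    """Weight of an observation that ended `days_ago` days before the present."""
    if days_ago < 0:
        raise ValueError("days_ago must be non-negative")
    # The weight halves every `half_life_days` days.
    return 0.5 ** (days_ago / half_life_days)


# Observations from today, one year ago and two years ago.
weights = [recency_weight(d) for d in (0, 365, 730)]
```

In a weighted ridge regression, each observation's squared error would be multiplied by its weight before the penalty is added, so that older matches pull less on the current ratings.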

2018

Steven Schultze and Christian-Matthias Wellbrock present an idiosyncratic +/- model in the Journal of Sports Analytics. Unlike almost all other models considered here, theirs does not estimate the ratings simultaneously for all players. Instead, bookmakers’ odds are used to obtain expected match outcomes, and calculations are done for each player separately. An additional novelty is valuing game-changing goals more than those that are not relevant to the game outcome.

2019

Lars Magnus Hvattum creates a series of videos about +/-, which I have yet to watch; and Garry Gelade models Lars’ ratings using on-the-ball event data, thus providing perhaps the first formal connection between top-down ratings and individual player actions.