The popularity of expected goals in football has grown hugely in recent years. It has represented a big step for data analytics in football and has even broken into some mainstream analysis on Sky Sports and Match of the Day. There are some simple ways to expand on its current usage.
It’s typically used as a per-90 figure for players. The number is calculated by dividing the expected goals accumulated across all games by the number of 90-minute periods played. For example, Gabriel Jesus leads the Premier League this year in non-penalty expected goals (npxG) per 90 with a value of 0.83. This means that, based on the chances he has accumulated so far this season, we would have expected Jesus to score 0.83 goals per 90 minutes.
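The per-90 calculation can be sketched in a few lines. The totals below are hypothetical, chosen only so the result lands on Jesus's 0.83 figure, not his actual season numbers:

```python
def per_90(total_npxg: float, minutes_played: int) -> float:
    """Scale a season total to a per-90-minutes rate."""
    return total_npxg / (minutes_played / 90)

# Hypothetical totals: 9.96 npxG over 1080 minutes (twelve full 90s).
rate = per_90(9.96, 1080)
print(round(rate, 2))  # → 0.83
```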
Questions begin to arise when we think about what this means for future performance, or for the goal-scoring quality of a player. Would we expect Jesus to score 0.83 goals per game in the near future? Does this mean that he’s currently the best goalscorer in the league? Sample sizes in football are usually small and the picture is constantly shifting, which can make it difficult to draw strong conclusions from data. Expected goals per 90 can play a part in answering these questions, but there are plenty of other factors to consider.
What makes up the 0.83 npxG per 90 for Gabriel Jesus? Is he consistently racking up ~0.83 per game? Are there outlying games where he performed brilliantly, masking a number of poor games? Breaking the expected goals number down to a game-by-game level with a simple boxplot can give us more insight. The following graph shows the distribution of expected goals per 90 on a game-by-game basis in the Premier League this season, giving an idea of what makes up the per-90 figure.
This graph ranks the Premier League leaders in npxG per 90 from top to bottom. The orange section for each player contains the middle 50% of per-game npxG per 90 values. The whiskers on the boxplot stretch to the maximum and minimum games, excluding any outliers. The point at which the colour shifts from dark orange to light orange represents the median value. Each dot represents a single game.
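For readers unfamiliar with how those boxplot elements are computed, here is a minimal sketch using NumPy and the standard 1.5×IQR outlier rule. The per-game values are made up, with one inflated game included to show how an outlier gets excluded from the whiskers:

```python
import numpy as np

# Hypothetical per-game npxG values for one player,
# including one inflated game (1.8).
games = np.array([0.3, 0.4, 0.5, 0.6, 0.6, 0.7, 0.8, 1.8])

q1, median, q3 = np.percentile(games, [25, 50, 75])
iqr = q3 - q1
low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr

# Points beyond the fences are plotted as individual outlier dots;
# the whiskers stretch only to the most extreme non-outlying games.
outliers = games[(games < low_fence) | (games > high_fence)]
whisker_low = games[games >= low_fence].min()
whisker_high = games[games <= high_fence].max()

print(median, whisker_high, outliers)  # the 1.8 game is flagged as an outlier
```

Note how a single huge game sits outside the whiskers entirely, which is exactly why the boxplot view is useful for spotting averages propped up by one-off performances.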
I’ve filtered out any games with fewer than 30 minutes played to avoid large skews in the data. It would be better if every data point contained equal minutes played, but this still gives some indication of what makes up the per-90 figures.
We can see straight away that Jesus is very consistent. He seems to get very good chances in nearly every game that he plays. Tammy Abraham’s distribution also seems very encouraging. He has been a regular starter for Chelsea this year and is consistently getting high quality goal scoring chances across games.
In contrast, Dominic Calvert-Lewin has a very large range of values. His middle 50% of games stretches significantly lower than those of the players around him. His overall npxG per 90 is being dragged up by four very good games, including one very large value of 1.8 (Matchweek 7 vs Manchester City). This leads me to be a bit more skeptical about Calvert-Lewin’s npxG per 90. Is it realistic to think that he can maintain 0.58 npxG per 90? Or is this number reliant on some performances that are not likely to be reproducible? There isn’t an obvious answer to this question, but examining the distribution can offer more insight for small samples. A modest fall-off from 0.58 npxG per 90 could still leave a decent number, too.
There are plenty of other aspects to consider: team strength, team style, manager changes (Ancelotti vs Silva in this case), the player’s age, the typical opposition strength faced by each player, and so on. Calvert-Lewin has certainly improved this year, but I would still like to see an overall improvement in consistency before drawing strong conclusions. Overall, I think we should always be asking whether the sample of games is representative enough to make conclusions about player quality.
We can also look at the distributions for creative players. The following graph ranks the Premier League leaders in xG assisted per 90 from top to bottom.
Kevin De Bruyne shows an unrivalled level of consistency, as well as the highest overall xG assisted per 90.
The same logic can be applied at a team level. The following graph shows the breakdown of expected goals per game by Premier League teams this year. Teams are ranked from top to bottom in terms of expected goals per 90.
The distributions here can also offer more insight. Newcastle very seldom create a lot of chances, but they have had the odd game where they created plenty. Manchester City completely annihilated Watford in Matchweek 6. Leicester and Spurs have ranged from very dangerous in front of goal to completely toothless throughout the season. The same approach can be applied to expected goals against, or to expected goal difference by game:
Wolves are incredibly consistent at being a little bit better than nearly every team that they play.
The data for this analysis came from FBref, which has a lot of useful statistics on European football. https://fbref.com/en/comps/9/Premier-League-Stats
They currently round their expected goal values to the nearest 0.1, so this analysis could offer additional insight with more precise values. I’ve added some Tableau dashboards with all of the graphs, where you can filter on different Premier League teams and players (they may not work well on mobile). I’ll add more players to these dashboards in time. https://public.tableau.com/profile/eoin.o.brien#!/vizhome/DistributionofxGandxAper90byplayer/Dashboard1
There isn’t anything too complicated about this analysis, but it’s not something that I see considered very often. Hopefully this can give people a push to think more deeply about averages, what they can tell us and the additional context that we can add.