Soccer Analytics Part 1: Mildly Interesting Statistics of 2013

In the summer term of 2014, I participated in a seminar on Soccer Analytics. Imagine the fun of combining your research with your hobby! Anyway, I tried to use the seminar to improve my skills in scraping data from the web. Although my R code is the worst case of spaghetti code, I managed to get every match played in the first divisions of 177 countries and in every continental competition from … and …. The data set comprises the results of 33,884 games among 2,398 teams worldwide. Enough to go wild on analyses! This first part mostly deals with semi-interesting overall and country-specific statistics. Future posts will deal with such exciting questions as “Who was the best team in the world?” and “If A beats B and B beats C, does that mean A beats C?”

Most Common Scores and Goals per Game

A first question I am going to answer is “What are the most frequent results in soccer?” I did some quick googling, and all I could find were common results in specific leagues. I am sure someone has done a worldwide analysis before, but I couldn’t find one. So I did some counting in my data set. The figure below shows the 20 most common results of the 2012/2013 season in 177 countries.

[Figure: the 20 most common results of the 2012/2013 season]
So 1:0 and 1:1 are the most common results. Not very exciting, but I guess that was to be expected. Together, the two results account for around 23% of all games. Adding the next group of scores (0:0, 2:1, 0:1, 2:0, 1:2) brings the total to 61%. Interestingly, but also as expected, home-win results are more frequent than their away-win counterparts (e.g. 2:1 vs. 1:2). I will deal with home field advantage a bit later. The next figure shows the distribution of goals per game.
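The counting behind these percentages is simple. Here is a minimal Python sketch of it (the author's R code is not shown), assuming each result is stored as a (home_goals, away_goals) tuple. The ten results are made up for illustration, and the function names are hypothetical.

```python
from collections import Counter

# Made-up toy sample of (home_goals, away_goals) results, for illustration only.
results = [(1, 0), (1, 1), (1, 0), (2, 1), (0, 0),
           (1, 1), (1, 0), (0, 1), (2, 0), (1, 2)]


def most_common_results(results, n):
    """Return the n most frequent scorelines as ("h:a", share of all games)."""
    counts = Counter(results)
    total = len(results)
    return [(f"{h}:{a}", count / total) for (h, a), count in counts.most_common(n)]


def cumulative_share(results, scorelines):
    """Fraction of all games that ended in any of the given scorelines."""
    counts = Counter(results)
    return sum(counts[s] for s in scorelines) / len(results)


print(most_common_results(results, 2))              # [('1:0', 0.3), ('1:1', 0.2)]
print(cumulative_share(results, [(1, 0), (1, 1)]))  # 0.5
```

On real data, passing the seven scorelines named above to `cumulative_share` gives the kind of "top group covers X%" figure quoted in the text.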
[Figure: distribution of goals per game]
Around 73% of all games end with three or fewer goals, and the mean number of goals per game is 2.62. But let us focus a bit more on the extreme values of this distribution: 0.2% of all games end with more than 10 goals, and there are even two games in which 20 or more goals were scored:

  • March 23: Popua-Lotoha’apai United 0:20 [Tonga]
  • June 12: Police-Friend Development 19:3 [Laos]
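The summary numbers above (share of games with three or fewer goals, the mean, the tail beyond 10 or 20 goals) all come from one distribution of total goals per game. Here is a hedged Python sketch, again assuming (home_goals, away_goals) tuples. The eight games are invented, with two blowouts added so the tail is visible, and the helper names are hypothetical.

```python
from collections import Counter

# Made-up toy sample of (home_goals, away_goals); the last two mimic blowouts.
results = [(1, 0), (1, 1), (2, 1), (0, 0), (3, 2), (2, 0), (0, 20), (19, 3)]


def goals_distribution(results):
    """Map each total-goals value to the fraction of games with that total."""
    totals = Counter(h + a for h, a in results)
    n = len(results)
    return {goals: count / n for goals, count in sorted(totals.items())}


def mean_goals(results):
    """Sample mean of goals per game, i.e. the expected value of the distribution."""
    return sum(h + a for h, a in results) / len(results)


def share_of_games(results, predicate):
    """Fraction of games whose total goal count satisfies the predicate."""
    return sum(1 for h, a in results if predicate(h + a)) / len(results)


print(mean_goals(results))                        # 6.875
print(share_of_games(results, lambda g: g <= 3))  # 0.625
print(share_of_games(results, lambda g: g >= 20)) # 0.25
```

The toy mean is inflated by the two blowouts. This is also why the real mean of 2.62 sits above the typical 1:0 or 1:1 result.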

The Zero-Zero Coefficient

Yes, it is by far the most annoying result in soccer, especially when you witness it live in the stadium. Years ago, a friend and I discussed the idea that there should be something like a “zero-zero” coefficient for each team. If two teams with high zero-zero coefficients play each other, you had better not be in the stadium! Although my data set is large, it is not big enough to derive a meaningful zero-zero coefficient for each team. However, I can at least state which countries have the highest and the lowest fraction of goalless games, i.e. a high and a low zero-zero coefficient, respectively.
Top 3 countries with the highest share of 0:0 games
  1. Kenya [240 games 21.3% goalless]
  2. Benin [182 games 19.8% goalless]
  3. Guinea [132 games 19.7% goalless]
Interestingly, all three are African countries. Even more strikingly, 8 of the top 10 countries are African.
Top 3 countries with the lowest share of 0:0 games
  1. French Polynesia [75 games 0% goalless]
  2. American Samoa [45 games 0% goalless]
  3. Cook Islands [40 games 0% goalless]
Again we see a strong “continental trend”: all three countries are in Oceania, and so are the next two on the list, the Solomon Islands and New Zealand.
The continental trend observed in both lists will be explored a bit more thoroughly in the next section.
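A country-level zero-zero coefficient, as used in the two lists above, is just the fraction of a country's games that ended 0:0. Here is a minimal Python sketch, assuming games are stored as (country, home_goals, away_goals) tuples. The countries "A", "B", "C" and their results are made up. The `min_games` cutoff is a hypothetical addition of mine, not something the post states it used.

```python
from collections import defaultdict

# Made-up toy records (country, home_goals, away_goals), for illustration only.
games = [
    ("A", 0, 0), ("A", 0, 0), ("A", 1, 0), ("A", 2, 1),
    ("B", 1, 1), ("B", 3, 2), ("B", 0, 0), ("B", 2, 2),
    ("C", 4, 1), ("C", 2, 3),
]


def goalless_share(games, min_games=1):
    """Per-country fraction of 0:0 games (a country-level 'zero-zero coefficient').

    Countries with fewer than min_games games are skipped, because a share
    based on a handful of games says little.
    """
    played = defaultdict(int)
    goalless = defaultdict(int)
    for country, home, away in games:
        played[country] += 1
        if home == 0 and away == 0:
            goalless[country] += 1
    return {c: goalless[c] / n for c, n in played.items() if n >= min_games}


def rank(shares, top=3, highest=True):
    """Countries sorted by share, highest (or lowest) first."""
    return sorted(shares, key=shares.get, reverse=highest)[:top]


shares = goalless_share(games)
print(shares)                       # {'A': 0.5, 'B': 0.25, 'C': 0.0}
print(rank(shares))                 # ['A', 'B', 'C']
print(rank(shares, highest=False))  # ['C', 'B', 'A']
```

The same small-sample problem that rules out a per-team coefficient also affects small leagues such as the 40-game Cook Islands. A minimum-games cutoff is one simple way to keep such cases out of a ranking.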

Mean Goals per Game in Different Countries

We saw in the last section that there seems to be a continental trend when it comes to goalless games. The question is whether this holds more generally: can we find continental trends in goals per game? Looking at the following map, the answer is yes.
[Map: mean goals per game by country]
One can clearly see that the mean number of goals per game is generally much lower in Africa than in the rest of the world. A bit surprising, at least for me, are the low values in South America. One explanation might be that on both continents the focus lies more on good defense, or, as a German would say, their motto is “Die Null muss stehen!” (“The zero must stand!”, i.e. keep a clean sheet). In Asia and North America, scoring goals has a higher priority; that is, a 4:3 is always better than a 1:0. Goalkeepers might argue against that reasoning, though. As in the last section, let’s look at the extreme ends.
Top 3 countries with highest mean goals per game
  1. Tonga 6.625
  2. Cook Islands 6.275
  3. Laos 5.41
Top 3 countries with lowest mean goals per game
  1. Haiti 1.66
  2. Benin 1.67
  3. Burkina Faso 1.71
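Both top-3 lists come from the same per-country average. Here is a hedged Python sketch of that computation, assuming (country, home_goals, away_goals) tuples. The countries "X", "Y", "Z" and their scores are invented, and `extremes` is a hypothetical helper.

```python
from collections import defaultdict

# Made-up toy records (country, home_goals, away_goals), for illustration only.
games = [
    ("X", 3, 3), ("X", 5, 1),
    ("Y", 1, 0), ("Y", 0, 0), ("Y", 1, 1),
    ("Z", 2, 1), ("Z", 1, 1),
]


def mean_goals_by_country(games):
    """Average total goals per game for each country."""
    goals = defaultdict(int)
    played = defaultdict(int)
    for country, home, away in games:
        goals[country] += home + away
        played[country] += 1
    return {c: goals[c] / played[c] for c in played}


def extremes(means, k=3):
    """Return (k highest, k lowest) countries by mean goals per game."""
    ordered = sorted(means, key=means.get, reverse=True)
    return ordered[:k], ordered[::-1][:k]


means = mean_goals_by_country(games)
print(means)               # {'X': 6.0, 'Y': 1.0, 'Z': 2.5}
print(extremes(means, 1))  # (['X'], ['Y'])
```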

Home/Away Wins

Home field advantage plays a big role in soccer. There is a lot of research on the topic, which can be found by simply googling “home field advantage soccer”, so I will refrain from philosophizing about it in depth. I was just curious how the fractions of home and away wins differ around the world. The first plot shows the home win fraction vs. the away win fraction. I wanted to know whether there are countries where away wins outnumber home wins. And indeed, there are quite a few!

[Figure: home win fraction vs. away win fraction by country]

In total, there are 19 countries where away wins outnumbered home wins:
Liberia, Swaziland, Bahrain, Latvia, Liechtenstein, Lithuania, Malta, San Marino, Syria, Cambodia, Cook Islands, Fiji, French Guiana, Saint Helena, Sao Tome and Principe, Solomon Islands, Somalia, Tonga, Trinidad and Tobago. So what do these countries have in common that could explain this phenomenon (besides perhaps being small countries)?
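The plot and the list of 19 countries boil down to two fractions per country and a comparison. Here is a minimal Python sketch, assuming (country, home_goals, away_goals) tuples and that draws count in the denominator (the post does not say how the fractions were normalized). The countries "P" and "Q" and their results are made up.

```python
from collections import defaultdict

# Made-up toy records (country, home_goals, away_goals), for illustration only.
games = [
    ("P", 2, 1), ("P", 1, 0), ("P", 0, 1), ("P", 1, 1),
    ("Q", 0, 2), ("Q", 1, 3), ("Q", 2, 0), ("Q", 0, 0),
]


def home_away_fractions(games):
    """Per-country (home win fraction, away win fraction); draws count in the denominator."""
    home_wins = defaultdict(int)
    away_wins = defaultdict(int)
    played = defaultdict(int)
    for country, home, away in games:
        played[country] += 1
        if home > away:
            home_wins[country] += 1
        elif away > home:
            away_wins[country] += 1
    return {c: (home_wins[c] / n, away_wins[c] / n) for c, n in played.items()}


def more_away_than_home(fractions):
    """Countries where away wins outnumber home wins, sorted alphabetically."""
    return sorted(c for c, (home, away) in fractions.items() if away > home)


fractions = home_away_fractions(games)
print(fractions)                       # {'P': (0.5, 0.25), 'Q': (0.25, 0.5)}
print(more_away_than_home(fractions))  # ['Q']
```

Since both fractions share the same denominator, comparing them is equivalent to comparing raw win counts. The choice of normalization only matters for where each country lands on the plot.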


What distinguishes soccer from many other sports is the draw. Draws can be a pain in the ass if you try to come up with sophisticated ranking methods in soccer (more on that in a future post). The following map shows the fraction of draws around the world.
[Map: fraction of draws by country]
The top three in this category are Morocco, Kenya, and Egypt, where around 40% of games end in a draw.
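The draw map and its top three use one more per-country fraction: games where both teams scored the same number of goals. Here is a hedged Python sketch, assuming (country, home_goals, away_goals) tuples. The countries "R" and "S" and their results are invented, and the function names are hypothetical.

```python
from collections import Counter

# Made-up toy records (country, home_goals, away_goals), for illustration only.
games = [
    ("R", 0, 0), ("R", 1, 1), ("R", 2, 1), ("R", 0, 1), ("R", 2, 2),
    ("S", 1, 0), ("S", 3, 1), ("S", 1, 1), ("S", 0, 2),
]


def draw_fractions(games):
    """Fraction of games per country that ended level (any score with home == away)."""
    played = Counter()
    draws = Counter()
    for country, home, away in games:
        played[country] += 1
        if home == away:
            draws[country] += 1
    return {c: draws[c] / played[c] for c in played}


def top_draw_countries(games, k=3):
    """The k countries with the highest draw fraction, paired with that fraction."""
    shares = draw_fractions(games)
    return sorted(shares.items(), key=lambda item: item[1], reverse=True)[:k]


print(top_draw_countries(games, k=2))  # [('R', 0.6), ('S', 0.25)]
```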

What do we learn?

There are some notable differences at the continental level when it comes to soccer results. Soccer played in Africa in particular seems to deviate strongly from the rest of the world: it has a high frequency of 0:0, a low average number of goals per game and, in general, a high fraction of games that end in a draw. So Africa seems to be a paradise for defense enthusiasts!

As I said, these are just some mildly interesting statistics from the data set. If anyone has suggestions on what to add, feel free to tell me in the comments.

Oh yeah, and I guess it is quite obvious that I like xkcd.

Posted in: Soccer Analytics

Written by Dmathlete

3 Responses to Soccer Analytics Part 1: Mildly Interesting Statistics of 2013

  1. Anonymous says:

    Well, for me the definition of the 0:0 coefficient immediately raises a number of questions:
    1) Were international matches analyzed, or are these accumulated figures computed from league data?
    2) If the latter applies in 1), the next step would be to relate the 0:0 coefficient to the proportion of foreign players in each league in order to make more predictive, country-dependent statements. Are there already any approaches to this?
    3) Also interesting would be how the 0:0 coefficient develops over time. For example, I would much rather watch a Wolfsburg vs. Augsburg game today than ten years ago (with which I have, so to speak, already established an explicit link between the attractiveness of a game and the 0:0 coefficient, which of course would need to be examined more closely …)

    Best regards,
    Dipl.-Fussb. Exp. Ulf K.

  2. dmathlete says:

    1) International matches were not included. These are league games only.
    2) Good point! I am currently thinking about actually developing the 0:0 coefficient, though really only as a function of the team. I have also already thought about including a "time" variable because, as you rightly say, Augsburg-Wolfsburg has become considerably more attractive.

  3. Anonymous says:

    It is also not really clear yet what the 0:0 coefficient is actually supposed to express…
    In my opinion, it would have to be a vector-valued, team-dependent measure that depends on team strength, opponent, etc. One could then introduce a kind of bilinear form which, applied to the two teams' vectors, yields a probability for a certain number of goals…
