So far in this series we have seen why population doesn’t matter much for national soccer success, and then ranked the countries by their calculated average soccer ability. To conclude this series, I’ll present three things that do matter. The epistemic status of these ranges from “makes sense” to “wild speculation”. In other words: if you agree with me – I’ll take the praise; if you don’t – this is all just meant to stimulate discussion.
The hot new release in macroeconomics literature (yes, that’s a thing) is Hive Mind by Garrett Jones. Jones argues that a nation’s average IQ matters much more for its prosperity than an individual’s IQ matters for his own income. The argument is that a modern economy requires collective intelligence, allowing all participants to learn, cooperate and benefit from each other. I propose that collective soccer ability can similarly help explain why average skill trumps population size in determining national outcomes.
Weightlifting and sprinting can probably be practiced in isolation, but a soccer game requires dealing with 21 other players at all times. Soccer players improve only as far as the level of their teammates and opponents allows.
As we saw in part 1, small changes in average skill create a large difference in the number of extreme performers. Let’s say that Lilliputians are a tiny bit better at soccer, on average, than Blefuscans are. At 5 years old, a Lilliputian of exceptional talent will play against the kids on his block, who are at almost the same level in both countries. At age 8, his small town will have more decent players to challenge him than a similar town in Blefuscu. By the time he’s 12 and playing for a top youth team, the Lilliputian will find many more opponents high enough on the bell curve to hone his skills against, creating a large gap in soccer level.
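To put rough numbers on the Lilliput example, here is a minimal Python sketch. It assumes, purely for illustration, that soccer ability is normally distributed with the same spread in both countries and that Lilliput’s average sits a hypothetical 0.1 standard deviations above Blefuscu’s; neither figure comes from the data in this series.

```python
import math


def share_above(cutoff, mean, sd=1.0):
    """Fraction of a normally distributed population scoring above `cutoff`."""
    z = (cutoff - mean) / sd
    return 0.5 * math.erfc(z / math.sqrt(2))


# Hypothetical gap: Lilliput's average is 0.1 standard deviations above Blefuscu's.
GAP = 0.1

ratios = {}
for cutoff in (1, 2, 3, 4):  # cutoffs in standard deviations above Blefuscu's mean
    ratios[cutoff] = share_above(cutoff, GAP) / share_above(cutoff, 0.0)
    print(f"above {cutoff} SD: Lilliput has {ratios[cutoff]:.2f}x as many players")
```

Under these assumptions the ratio grows from roughly 1.16 at one standard deviation to roughly 1.5 at four, so the same tiny average gap matters most among exactly the players a future star needs to train against.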
In a real-life example, the 1987-born cohort of Barcelona FC’s legendary youth academy had Lionel Messi, Cesc Fàbregas and Gerard Piqué playing together since they were 12 (joined by Pedro a few years later). Looking through the biographies of top players, it’s almost impossible to find any that weren’t playing against world class competition by age 14 at the latest (many start competing seriously around age 7-8). In contrast, even if China produced 20 children with enormous soccer potential in a generation, and even if they devoted themselves to soccer, it would be hard for them to improve as fast. The level of competition in their home locales is likely too low, while the size of China and the lack of specialized infrastructure make it hard to concentrate them in a few super-academies.
I often hear discussions about whether some great American athlete such as Adrian Peterson could have been great at soccer. Had he spent his elementary school years in Palestine, Texas, a city of 18,712 that is very unlikely to contain even one other soccer player of similar talent, he could not have been.
This is the section where you’ll learn something about math; I’m not sure you’ll learn anything about soccer.
Instead of correlating soccer level with a hundred different country variables and finding spurious jelly beans, I restricted myself to testing three hypotheses regarding the effect of the national economy on soccer:
- GDP per capita will increase soccer level because richer countries can afford better sports infrastructure.
- Government spending as % of GDP will increase soccer level because there’s no private profit in youth sports but it can be pushed by a big government.
- Unemployment rate will increase soccer level, because if it’s hard to find a job you may as well play soccer.
With data from Wikipedia and the IMF, I ran a regression with all three variables and the regions of the world.
2 out of 3 on what were pretty wild guesses! I’m ready to accept my PhD in macroeconomics. Richer countries and those with bigger governments do in fact get better at soccer; unemployment showed an effect in the predicted direction, but it wasn’t significant.
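For readers curious how such a regression and its significance tests work mechanically, here is a sketch using NumPy’s least-squares solver. The data are entirely synthetic: standardized stand-ins for GDP per capita, government spending and unemployment, plus three hypothetical regions coded as dummy variables with one region as the baseline. It is not the Wikipedia/IMF dataset or the code behind the results above.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 600

# Synthetic, standardized predictors (hypothetical, not the IMF/Wikipedia data).
gdp = rng.normal(size=n)
gov = rng.normal(size=n)
unemp = rng.normal(size=n)
region = rng.integers(0, 3, size=n)  # three hypothetical regions; region 0 is the baseline

# "True" effects used only to generate the synthetic outcome (unemployment has none).
true_effects = {"const": 0.0, "gdp": 0.5, "gov": 0.3, "unemp": 0.0,
                "region1": -0.4, "region2": 0.2}
y = (true_effects["gdp"] * gdp + true_effects["gov"] * gov
     + true_effects["unemp"] * unemp
     + true_effects["region1"] * (region == 1)
     + true_effects["region2"] * (region == 2)
     + rng.normal(scale=0.1, size=n))

# Columns in the same order as true_effects: intercept, economics, region dummies.
X = np.column_stack([np.ones(n), gdp, gov, unemp,
                     region == 1, region == 2]).astype(float)
coef, *_ = np.linalg.lstsq(X, y, rcond=None)

# Classical OLS standard errors: sigma^2 * (X'X)^-1, with sigma^2 estimated from residuals.
resid = y - X @ coef
sigma2 = resid @ resid / (n - X.shape[1])
se = np.sqrt(np.diag(sigma2 * np.linalg.inv(X.T @ X)))

estimates = dict(zip(true_effects, coef))
t_stats = dict(zip(true_effects, coef / se))
for name in true_effects:
    print(f"{name:8s} estimate={estimates[name]:+.3f}  t={t_stats[name]:+.1f}")
```

A t-statistic with an absolute value above roughly 2 is the usual rule of thumb for significance at the 5% level in a sample this large.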
Here’s the stats-geeky part of today’s show: it seemed interesting to combine the two significant variables into one – government spending per capita. That variable by itself has a strong positive correlation with soccer level, but when I added it to the regression it showed as insignificant, and suddenly so did GDP per capita! What happened to turn a good variable bad?
When you add a variable that is a product of two others to a regression, it measures the effect of the interaction between the two, not a separate factor. A high coefficient means that the effect of the first variable (i.e. GDP) increases when the second variable (Gov. spending) is high and decreases when it is low, and vice versa. This doesn’t happen here: richer countries are better by the same amount regardless of the size of their government.
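A quick way to see what the product term measures is to fit it on synthetic data where the truth is purely additive, mirroring the claim that richer countries are better by the same amount regardless of government size. In this hypothetical NumPy sketch, x1 and x2 are standardized stand-ins for GDP per capita and government spending share; with an interaction term in the model, the effect of x1 becomes b1 + b3 * x2.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 1000
x1 = rng.normal(size=n)  # synthetic stand-in for GDP per capita (standardized)
x2 = rng.normal(size=n)  # synthetic stand-in for government spending share (standardized)

# Purely additive truth: the effect of x1 on y does not depend on x2.
y = 2.0 * x1 + 1.0 * x2 + rng.normal(scale=0.1, size=n)

# Design matrix with intercept, both main effects and their product (the interaction term).
X = np.column_stack([np.ones(n), x1, x2, x1 * x2])
b0, b1, b2, b3 = np.linalg.lstsq(X, y, rcond=None)[0]


def marginal_effect_x1(x2_value):
    """Slope of y with respect to x1 at a given x2: b1 + b3 * x2."""
    return b1 + b3 * x2_value


print(f"interaction coefficient b3 = {b3:.3f}")
print(f"effect of x1 at low x2:  {marginal_effect_x1(-2):.3f}")
print(f"effect of x1 at high x2: {marginal_effect_x1(2):.3f}")
```

Because the synthetic data contain no interaction, b3 comes out close to zero and the marginal effect of x1 barely changes between low and high x2.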
Why did GDP per capita become insignificant? There aren’t huge differences in government spending between countries: for the majority it’s between 25% and 42% of GDP. GDP numbers, however, are all over the map: a quarter of the countries are in the $300-$2,000 range and another quarter are between $20,000 and $150,000. A Liechtensteiner produces more in a day than a Malawian does in a year. As a result, the variation in government spending per capita depends almost wholly on GDP per capita, the more volatile factor. The correlation between the two is 0.95.
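The mechanism can be reproduced with a small simulation. The distributions are assumptions chosen only to resemble the ranges quoted above: GDP per capita drawn log-uniformly between $300 and $150,000, and government spending share drawn uniformly between 25% and 42%, independently of GDP. This is not the real country data, so the exact correlation will differ from 0.95.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


rng = random.Random(0)
n = 1000

# Hypothetical GDP per capita: log-uniform between $300 and $150,000.
gdp = [math.exp(rng.uniform(math.log(300), math.log(150_000))) for _ in range(n)]
# Hypothetical government spending share: uniform between 25% and 42%, independent of GDP.
share = [rng.uniform(0.25, 0.42) for _ in range(n)]
# Government spending per capita is the product of the two.
gov_pc = [g * s for g, s in zip(gdp, share)]

r_gdp = pearson(gdp, gov_pc)
r_share = pearson(share, gov_pc)
print(f"corr(GDP per capita, spending per capita) = {r_gdp:.3f}")
print(f"corr(spending share, spending per capita) = {r_share:.3f}")
```

In the simulation, spending per capita tracks GDP per capita almost perfectly while being only weakly correlated with the spending share itself, because GDP varies over nearly three orders of magnitude and the share varies by less than a factor of two.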
A linear regression struggles with two predictors that correlate this closely, because it can’t tell which one actually affects the result. Example: sex and drugs could both correlate with rock n’ roll. However, everyone who does sex also does drugs. If you regressed rock n’ roll on both sex and drugs, you wouldn’t know whether both sex and drugs caused rock n’ roll or whether, for example, drugs caused extreme rock n’ roll and sex dampened the effect a bit. Since sex and drugs go together, either one or both could be the cause.
A problem with linear regression is that it’s just so… linear. It doesn’t handle other types of relationships well, such as threshold effects. Perhaps rock n’ roll only kicks in above a certain dose of drugs? A fun way to explore other effects is with a decision tree model: each node is a yes/no question, and the path along the nodes from the root leads to a prediction of the category or value. Here’s a tree generated with a machine learning package in R; the numbers in the bottom nodes represent the best guess of the soccer level (which mostly ranges between -1,000 and +1,000):
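One standard way to quantify the problem is the variance inflation factor, VIF = 1 / (1 - R²), where R² comes from regressing one predictor on the others. In the simplified case of exactly two predictors, R² is just their squared correlation; the sketch below assumes that case, although the actual regression also included other variables.

```python
import math


def vif_two_predictors(r):
    """Variance inflation factor for either predictor when a model has exactly two.

    With two predictors, R^2 of one regressed on the other equals r**2.
    """
    return 1.0 / (1.0 - r * r)


vif = vif_two_predictors(0.95)
se_inflation = math.sqrt(vif)  # standard errors scale with the square root of the VIF
print(f"VIF = {vif:.2f}, standard errors inflated by {se_inflation:.2f}x")
```

At a correlation of 0.95, the sampling variance of each coefficient is about ten times what it would be with uncorrelated predictors and its standard error about 3.2 times larger, which is more than enough to push a real effect below the significance threshold.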
For each of the six extreme nodes I looked at the countries that fit the macroeconomic profile and tried to find commonalities between them. Being rich is good for soccer, especially if your pockets are full of a currency that isn’t Euros. If you do pay in Euros, you want to play soccer on a sunny Mediterranean beach. On the other side: being broke, scorching hot or having now or recently been under the thumb of communist dictators makes you bad at soccer. Basically if you’re wealthy, free and the weather is nice everything seems to be going for you, including soccer. If you live in a generally sucky place, soccer is no solace. Life just ain’t fair.
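The R package behind the tree isn’t named in the text (a commenter below asks about rpart), and the code below is not the author’s. It is a hypothetical pure-Python sketch of the single step a CART-style regression tree repeats at every node: try each threshold on one variable and keep the split that minimizes the summed squared error, predicting the mean on each side. Growing a full tree means applying this recursively over all variables, with stopping rules this sketch leaves out.

```python
def best_split(xs, ys):
    """Find the single threshold on xs that minimizes the summed squared error of ys.

    Returns (threshold, mean_left, mean_right), where points with x < threshold
    go left and the rest go right.
    """
    pairs = sorted(zip(xs, ys))
    best = None
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # cannot split between two equal x values
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        mean_left = sum(left) / len(left)
        mean_right = sum(right) / len(right)
        sse = (sum((y - mean_left) ** 2 for y in left)
               + sum((y - mean_right) ** 2 for y in right))
        if best is None or sse < best[0]:
            threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
            best = (sse, threshold, mean_left, mean_right)
    if best is None:
        raise ValueError("need at least two distinct x values")
    return best[1:]


# Hypothetical threshold effect: the response jumps once x passes 3.
xs = [1, 2, 3, 4, 5, 6]
ys = [0, 0, 0, 10, 10, 10]
print(best_split(xs, ys))  # (3.5, 0.0, 10.0)
```

A linear fit to the same six points would spread the jump across the whole range, while the split recovers the threshold exactly.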
A commenter on my last post claimed that “talent” is a myth and that soccer is all about hard work. I refrained from asking him if this view extends to other sports, perhaps LeBron James’ diligence made him 6’8″ with a 40+ inch vertical jump? Endurance, agility, speed, strength, quickness, balance and accuracy are all critical in soccer and are to significant degrees inborn. The most naturally gifted players train as fanatically as everyone else. The only consolation for those who rail against congenital advantages has been that, unlike in most other sports, short people are better at soccer. If only that were true.
Average male height correlates with national soccer ability at 0.32, higher than the 0.2 for both GDP and government spending, and as high as any single factor is likely to get for such a noisy measure of such a complex phenomenon.
Everyone’s favorite exception, Lionel Messi, does in fact prove the rule more than he refutes it. Messi is 170 cm tall (5’7″), barely an inch shorter than the median Argentinian. His teammates on the Argentine national team average 181.1 cm, a full 3 inches above their compatriots. Like the Little Corporal (who was shorter than Messi but taller than the average Frenchman of the time), Messi only appears short next to other soccer players. The players at the 2014 World Cup averaged 181.3 cm. The tallest in the tournament were the Germans, who ended up lifting the cup to their full 185.4 cm (6’1″) stature.
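The height conversions above are easy to check. This small sketch uses the exact definition of 2.54 cm per inch and rounds to the nearest whole inch, which is an assumption about how the original figures were rounded.

```python
CM_PER_INCH = 2.54  # exact by definition


def cm_to_feet_inches(cm):
    """Convert a height in cm to (feet, inches), rounded to the nearest whole inch."""
    total_inches = round(cm / CM_PER_INCH)
    return divmod(total_inches, 12)


heights_cm = {
    "Messi": 170,
    "Argentina squad average": 181.1,
    "2014 World Cup average": 181.3,
    "Germany 2014 average": 185.4,
}
for label, cm in heights_cm.items():
    feet, inches = cm_to_feet_inches(cm)
    print(f"{label}: {cm} cm is about {feet}'{inches}\"")

# "A full 3 inches" expressed in centimeters
print(f"3 inches = {3 * CM_PER_INCH:.2f} cm")
```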
Messi’s own career was in danger when he was diagnosed with growth hormone deficiency and the Argentine clubs refused to pay for the treatment. Barcelona paid for two years of human growth hormone (HGH) treatment for Leo, which literally made him tall enough to play soccer, along with possibly increasing his strength, improving his motor development and reducing his body fat. Perhaps what contributes to soccer ability isn’t height but rather HGH with all its other benefits, and height is just a measurable proxy. I couldn’t find data on international variation in HGH levels, but it definitely offers a huge boost to athletic performance (and as a result is a popular doping agent in various sports).
Soccer is a sport for short people, for values of “short” equal to “actually tall, but not freakishly tall”.
So what can China do to lift a World Cup? The easiest way is to bribe Sepp Blatter, of course. Barring that, other options are on the table: China can send the most promising players out of kindergarten to a few strong youth academies built in subtropical beachfront towns with money that isn’t denominated in Euros. Or, it may be easier to just pump them all full of Chinese HGH and set them loose on an unsuspecting soccer world.
Next post is up, my apologies to anyone who thought that this is a soccer blog ;-)
8 thoughts on “The Rich, the Tall and the Bees – 3 Soccer Stories”
From a mathematical pov your ideas might be really interesting, but as a European mildly interested in football I would recommend some reconsideration, especially of the conclusions offered in ‘The Rich’:
‘Being rich is good for soccer,…’
While this might be true to some extent, it can’t be more than a rule of thumb. If you look at the recent FIFA Top 20 Ranking, you will see that among those teams are Colombia, Uruguay, Romania, Croatia and Bosnia and Herzegovina, nations that I don’t really view as rich countries, and even Spain and Portugal have seen better times. And let’s not forget that Brazilian football had its golden era in the 1950s and 1960s, when Brazil’s economy wasn’t really thriving.
On the other hand, nations such as Japan or the states on the Arabian Peninsula never had a realistic chance to win the World Championship.
But yeah, being rich *is* good for football, as long as you don’t overestimate this factor I will agree with you.
But your sentence goes on:
‘especially if your pockets are full of a currency that isn’t Euros. If you do pay in Euros, you want to play soccer on a sunny Mediterranean beach.’
No, simply no. Let’s have another look at the FIFA Top 20 Ranking. Here we can see that 7 of these 20 teams pay in euros, only 2 of which border the Mediterranean (Italy and Spain), while 3 border the North Sea (Belgium, the Netherlands and Germany), Portugal borders the Atlantic and Austria is landlocked.
Actually, if you really want to be successful in football, make sure that your pockets aren’t full of dollars (unless you are female), riyals, rials, dinars, dirhams or yen. Paying in euros, on the other hand, is totally fine.
Btw, your tree on macroeconomic indicators lists France as a Northern European country; I wouldn’t agree with that either.
‘On the other side: being broke, scorching hot or having now or recently been under the thumb of communist dictators makes you bad at soccer.’
Well, sure, as always in life being broke is a really big issue if you want to succeed, no surprise here, so I totally agree with you on that point.
And while I learned during the last World Championship tournament that Brazil can be really filthy hot, I even agree with the point about the heat: you really need to get rid of your body heat while playing football, otherwise you’re going to collapse at one point or another.
But the point about having been under a communist dictator’s thumb is simply not true in this form. The FIFA Top 20 Ranking again:
From this list, the former communist states are:
– Czech Republic (as part of the Czechoslovakian Socialist Republic)
– Croatia (as part of the Socialist Federal Republic of Yugoslavia)
– Bosnia and Herzegovina (ditto)
Hungary had its best team in the 1950s, when it was communist; Russia has a quite decent team, Poland too, and every now and then Bulgaria has a run.
You don’t even have to be free to be successful: fascist Italy, e.g., won the World Championship twice in a row, Brazil was ruled by a military junta when it won the tournament in 1970, and so was Argentina when it won the Championship in 1978.
What you didn’t take into account while developing your ideas is tradition.
Take a look at the teams that participated in the 1908 Olympic Football Tournament:
Bohemia is nowadays known as the Czech Republic, and Great Britain was split up into England, Scotland, Wales and Northern Ireland.
Now the Olympics 1912:
At this point, the only teams from the most recent FIFA Top 20 Ranking still missing are Brazil and Colombia, if you count countries that were once part of bigger entities, such as Croatia, Wales or England. And while Brazil entered at the first-ever World Championship tournament in 1930, Colombia is the exception that didn’t gain any importance until after WW2. But this is, of course, still more time to develop tradition and grow structures than any nation outside Europe or South America could dream of.
Anyway, let’s have a look at some further facts:
FIFA World Ranking Leaders: you find the graph if you scroll down the page with the FIFA World Ranking. Each and every leader since the ranking’s introduction in 1993 participated in at least one important international tournament by 1930, and if you scroll even further down you can see that the same is true even for the teams in 2nd and 3rd place.
And now take this link and scroll down to ‘Results’:
You can see that even here the teams in places 1 through 4 all participated in at least one tournament by 1930, with 3 exceptions:
– Poland, 3rd in 1974 and 1982 (communist at that time, scnr), but they qualified for and participated in the 1938 Championship
– Bulgaria, 4th in 1994 (a former communist state, btw.), was invited to the 1930 Championship, but the players couldn’t afford the voyage
– South Korea, 4th in 2002, on their home turf. Their football association wasn’t founded and recognized until 1948, but on the other hand Korea had been occupied by Japan from 1910 to 1945, so kudos to them, they joined as soon as possible. And they went to the 1948 Olympic Games, the first international football tournament after WW2. And they had more than 50 years to develop tradition and whatever comes with it before their biggest success to date.
So, yeah, those exceptions even seem to prove my point.
The only question left unanswered is: what happened to Finland, Norway, Luxembourg and Egypt?
– Finland never qualified for either the European or the World Championships. They prefer winter sports, and they are quite good at them. No surprise, since Finland is almost at the North Pole.
– Norway is also very much into winter sports, but they have qualified for a World Cup from time to time; once they even reached #2 in the FIFA Ranking.
– Luxembourg. Do they even have enough young people to build a football team?
– Egypt is the most successful African national team; football is very important in Egypt.
So, once again, point proven.
Blastmeister, you blew me away!
To clarify, the goal of my blog is to use math to see some things that no one has thought of before. I don’t know if anyone has thought to look not at absolute rankings but at rankings adjusted for normalized population, or whether anyone has correlated soccer success with government spending or height etc. I picked those variables because they are original, not because I think I can reduce soccer to an equation. This is exploration! When I write “full of a currency that isn’t Euros” I mean to say “hmm, it’s curious that countries outside Europe seem to do slightly better when adjusting for population based on this super simplistic model, I wonder if there’s anything to it” and not “Euros make you suck at soccer, QED”.
Bottom line: I know something about teasing cool nuggets from data using statistics, I don’t know that much about soccer per se. 99% of the rest of this blog will not be about soccer, but about putting a number on something else :)
I leave the actual soccer theory to my educated readers, you obviously know a lot more and I’m always happy to learn!
Don’t worry, as I said above, I get that you took a mathematical pov, and I also get that your ideas are really interesting in this regard. While I must admit that I’m not a mathematician, which could cause plenty of misunderstanding in its own right, I *did* read your statements with lots of interest and even fascination. You can tell by the fact that I commented on the last part of your series. And if I were a smiley-using guy I would have used tons of them in my comments. Well, maybe I should have, because since English isn’t really my mother tongue, there’s a good chance that my answer to your ideas sounded much stronger than intended.
And since I’m kind of an amateur historian my intention was more like ‘Explaining why some football national teams suck more than others? Challenge accepted!’ I must admit that I’m not much of a football expert really; in this regard I’m the black sheep of my family. I just remembered all those hours when my brothers tried to teach me something about football history, and I had to fact-check a lot.
Concerning the Euro thing: I understood that you meant that it is preferable, not mandatory, not to be paid in euros to be really good at football, but I don’t understand how you could go so strongly against your own data. You know, if I read your chart correctly, 2 of the 3 nations that outperform expectations in their football affinity more than any other nation in the world are Germany and the Netherlands, 2 nations that pay their bills in euros but see the Mediterranean only on their holidays. Even Portugal is outstanding, and the only Euro member from the Mediterranean better than those 3 nations is Spain, while all 3 do better than Italy and Greece. And if you count France as a Mediterranean nation, although its most successful teams play mostly nearer to the Atlantic or the Alps, you can see that even Belgium is higher above expectations than France is.
For me this information doesn’t lead to the conclusion that you have a better chance to excel at football if you don’t get paid in euros, unless you play on a sunny Mediterranean beach. (But maybe I’m reading your chart wrong.) And I don’t get in which regard countries outside Europe do better when adjusting for population. Didn’t you just write that the correlation between population and football ranking is -0.002, meaning that there is no correlation? Did I get something wrong? This alone would have been reason enough to look for a better explanation. But I’m willing to learn if I misunderstood something here.
And in this case history gives imo a better explanation than (your) math :) (<– Smiley!) But btw, why wouldn't you use tradition, maybe constructed out of years since the founding of the national football association and international matches played or something along those lines, as a new variable for your regressions? Just saying.
Well, I wish you all the best, and hope I didn't cause any new misunderstandings in this post. Maybe I will visit this blog again; this time I just followed a link from Slate Star Codex. You see, I'm really not that much into sports.
Could you post the code you used to create that decision tree with rpart?