The Walking Dead in The Walking Dead

My better half and I have been binge-watching The Walking Dead recently, and I figured the show would make a nice topic for a blog post, involving some form of data analysis, of course. I am currently very interested in text analysis, so I tried to hunt down the episode transcripts. I actually found them here. I was super psyched to find them in the format <character>: <text>, which made it easy to put together a dataset of every line spoken by every character throughout the show. I wrote some code for it (it can be found on my GitHub), but was quite disappointed when I realized that only the first season was in this format. For seasons 2-7 there is no way to recover who said what. As a result, the “coolest” thing I could do was count the number of lines of each character.
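
The original code is in R and lives on GitHub; what follows is only a minimal Python sketch of the same idea, assuming each transcript line looks like `<character>: <text>` and that everything before the first colon is the speaker. The function name `count_lines_per_character` and the sample lines are hypothetical and made up for illustration.

```python
import re
from collections import Counter

# Hypothetical pattern: the speaker is everything before the first colon.
LINE_PATTERN = re.compile(r"^\s*([^:]+?)\s*:\s*(.+)$")

def count_lines_per_character(transcript_lines):
    """Count spoken lines per character in a '<character>: <text>' transcript.

    Lines that do not match the pattern (scene directions, blank lines)
    are skipped.
    """
    counts = Counter()
    for raw in transcript_lines:
        match = LINE_PATTERN.match(raw)
        if match:
            counts[match.group(1)] += 1
    return counts

# Invented sample lines, not taken from the real transcripts.
example = [
    "Rick: We need to move.",
    "Shane: Rick, listen to me.",
    "[Walkers approach]",
    "Rick: Go!",
]
print(count_lines_per_character(example).most_common())
# [('Rick', 2), ('Shane', 1)]
```

This approach only works for seasons where the speaker prefix exists, which is exactly why seasons 2-7 could not be counted.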

Bloc Voting in the Eurovision Song Contest

The Eurovision Song Contest is one of the biggest, if not THE biggest, international TV song competitions in the world. Hundreds of millions of people tune in every year for stunning voices, weird performances and unforgettable singers. Remember the Epic Sax Guy? He is back again this year!

The winner of the contest is selected by a positional voting system. The voting system has changed a lot over the years, but the general approach has always been the same: each country awards points to its ten favorite songs, which cannot include its own. The voting is sometimes considered controversial, since there seems to be a lot of regional bloc voting going on. Be it Norway and Sweden exchanging 12 points or former Soviet countries voting for Russia, anecdotal evidence is all over the place.
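
To make the positional system concrete, here is a minimal Python sketch assuming the 12, 10, 8, 7, 6, 5, 4, 3, 2, 1 scale used since 1975 (earlier contests used other systems, and this ignores the later jury/televote split). The function names `tally` and `mutual_twelves` and the three-country contest are hypothetical; `mutual_twelves` is only a naive illustration of the "exchanging 12 points" pattern, not the analysis done in the post.

```python
from collections import defaultdict

# Points for places 1-10 under the scale used since 1975 (assumption).
POINTS = [12, 10, 8, 7, 6, 5, 4, 3, 2, 1]

def tally(rankings):
    """Sum points; rankings maps each voter to its ordered favourite entries."""
    totals = defaultdict(int)
    for voter, ranked in rankings.items():
        if voter in ranked:
            raise ValueError(f"{voter} cannot vote for its own song")
        # zip stops after ten places, so only the top ten receive points.
        for country, points in zip(ranked, POINTS):
            totals[country] += points
    return dict(totals)

def mutual_twelves(rankings):
    """Return unordered pairs of countries that gave each other 12 points."""
    top = {voter: ranked[0] for voter, ranked in rankings.items() if ranked}
    return {frozenset((a, b)) for a, b in top.items() if top.get(b) == a}

# Made-up three-country contest, not real voting data.
rankings = {
    "Norway": ["Sweden", "Denmark"],
    "Sweden": ["Norway", "Denmark"],
    "Denmark": ["Sweden", "Norway"],
}
print(tally(rankings))           # {'Sweden': 24, 'Denmark': 20, 'Norway': 22}
print(mutual_twelves(rankings))  # {frozenset({'Norway', 'Sweden'})}
```

Counting such reciprocal top votes across many years is one simple way to turn anecdotes about bloc voting into something measurable.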

In this post, I want to explore how much evidence the voting data from all contests provides for this. You will see that there is actually quite a lot! I scraped the voting data for 1958-2016 from eurovisionworld using R. The code can be found on GitHub.

The Myth of Club 27

The term “Club 27” refers to the observation that famous musicians seem to die at a higher rate at the age of 27. Jimi Hendrix, Janis Joplin, Kurt Cobain and Amy Winehouse, to name just a few, are members of this questionable club. The media goes wild whenever a new famous person joins this mysterious club. But is there a (statistical) truth behind this? Do musicians really die at a higher rate at the age of 27?

VIP RIP: The high number of dead celebrities in 2016

Many people would agree that 2016 was a bad year. The VIP death toll in particular seems extraordinarily high.
With the recent deaths of British singer George Michael and of Carrie Fisher, Princess Leia herself, the year even seems to be going out with a bang. Using data on celebrity deaths, I want to test whether the death rate really was higher or whether we just perceived it that way.

The data for this post comes from Wikipedia’s lists of deaths by year. Since 2004, the monthly lists have all shared the same structure, so I wrote a simple scraping function in R with the rvest package. The code is attached at the end of this post.

Angelique Kerber, No. 1 in women’s tennis… for weeks already!

This Monday, September 12th, will be a historic day for German women’s tennis. Angelique Kerber will be the first German player since Steffi Graf in 1996 to be ranked number one in the WTA rankings.

Having won the Australian Open at the beginning of this year, reached the final of Wimbledon and then won the US Open, one could definitely say that she finally deserves it. I would even go a step further and say it has been overdue for a few weeks! To “prove” this claim, I grabbed all WTA matches from 1968 (yeah, I know Angelique wasn’t even alive then) until 29 August 2016 from here and here, and built my own women’s tennis ranking with the power of Google’s PageRank.
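
The post does not spell out here how matches become a PageRank graph, so the following is only a minimal Python sketch under common assumptions: every match is a link from the loser to the winner (a loss acts as a "vote" for the opponent), repeated wins add edge weight, the damping factor is 0.85, and players who never lost spread their score evenly. The function name `tennis_pagerank` and the players A-D are hypothetical.

```python
from collections import defaultdict

def tennis_pagerank(matches, damping=0.85, iterations=100):
    """PageRank by power iteration over a win graph.

    matches: iterable of (winner, loser) pairs; each one adds a weighted
    link from the loser to the winner.
    """
    out_weight = defaultdict(float)                   # total losses per player
    links = defaultdict(lambda: defaultdict(float))   # loser -> winner -> weight
    players = set()
    for winner, loser in matches:
        links[loser][winner] += 1.0
        out_weight[loser] += 1.0
        players.update((winner, loser))

    n = len(players)
    rank = {p: 1.0 / n for p in players}
    for _ in range(iterations):
        # Players without losses have no outgoing links; spread their rank evenly.
        dangling = sum(rank[p] for p in players if out_weight[p] == 0)
        new_rank = {p: (1.0 - damping) / n + damping * dangling / n for p in players}
        for loser, winners in links.items():
            share = damping * rank[loser] / out_weight[loser]
            for winner, weight in winners.items():
                new_rank[winner] += share * weight
        rank = new_rank
    return rank

# Made-up results between hypothetical players, not real WTA data.
matches = [("A", "B"), ("B", "A"), ("A", "C"), ("B", "C"), ("A", "D")]
ranks = tennis_pagerank(matches)
print(max(ranks, key=ranks.get))  # A
```

Unlike the official WTA points, this kind of ranking rewards beating strong opponents more than beating weak ones, which is what makes it an interesting alternative. A real version would likely also weight recent matches more heavily.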
