The Walking Dead in The Walking Dead

My better half and I have been binge-watching The Walking Dead recently, and I figured the show would make a nice topic for a blog post, involving some form of data analysis, of course. I am currently very interested in text analysis, so I tried to hunt down the episode transcripts. I actually found them here. I was super psyched to find them in the format <character>: <text>, which would make it easy to put together a dataset of all lines of all characters throughout the show. I wrote some code for it (it can be found on my GitHub), but was quite disappointed when I realized that only the first season was in this format. For seasons 2-7 there is no way to recover who said what. This severely limited the analysis: the “coolest” thing I could do was count the number of lines of each character:
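Counting lines per character from the `<character>: <text>` format can be sketched in a few lines. This is a minimal Python sketch, not the R code from my GitHub. It assumes every spoken line starts with the speaker's name followed by a colon and that lines without a colon (such as stage directions) should be skipped. The function name `count_lines_per_character` and the sample transcript are hypothetical.

```python
from collections import Counter


def count_lines_per_character(transcript: str) -> Counter:
    """Count spoken lines per character in a transcript whose lines
    follow the (assumed) pattern '<character>: <text>'."""
    counts = Counter()
    for raw in transcript.splitlines():
        line = raw.strip()
        # Skip empty lines and lines without a speaker prefix
        # (e.g. stage directions), since they cannot be attributed.
        if ":" not in line:
            continue
        # Split only on the first colon so colons inside the text survive.
        speaker, text = line.split(":", 1)
        speaker = speaker.strip()
        if speaker and text.strip():
            counts[speaker] += 1
    return counts


# Hypothetical toy transcript with placeholder text, not real episode lines.
sample = """RICK: (line 1)
SHANE: (line 2)
RICK: (line 3)
[stage direction]
"""

print(count_lines_per_character(sample).most_common())
```

The naive "split on the first colon" rule would also count any non-dialogue line that happens to contain a colon. Real transcripts would likely need extra cleaning on top of this.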

Continue reading

Blockvoting in the Eurovision Song Contest

The Eurovision Song Contest is one of the biggest, if not THE biggest, international TV song competitions in the world. Hundreds of millions of people tune in every year for stunning voices, weird performances and unforgettable singers. Remember the Epic Sax Guy? He is back again this year!

The winner of the contest is selected by a positional voting system. The voting system has changed a lot over the years, but the general approach has always been the same: each country awards points to its 10 favorite songs, which cannot include its own entry. The voting is sometimes considered controversial, since there seems to be a lot of regional bloc voting going on. Be it Norway and Sweden exchanging 12 points or former Soviet countries voting for Russia, anecdotal evidence is all over the place.
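To make the positional voting concrete, here is a minimal Python sketch of how a tally works. It assumes the 12, 10, 8, 7, 6, 5, 4, 3, 2, 1 point scale familiar from many contest years; as noted above, earlier years used other systems. The function names `tally` and `points_given` and the country codes "A", "B", "C" are hypothetical placeholders, not real contest results.

```python
from collections import defaultdict

# Points for ranks 1..10 under the assumed 12-10-8-...-1 scheme.
POINTS = [12, 10, 8, 7, 6, 5, 4, 3, 2, 1]


def tally(rankings: dict) -> dict:
    """rankings maps each voting country to its ordered list of favorites
    (best first, at most 10 entries). Returns total points per country."""
    totals = defaultdict(int)
    for voter, ranked in rankings.items():
        if voter in ranked:
            raise ValueError(f"{voter} cannot vote for its own song")
        if len(ranked) > len(POINTS) or len(set(ranked)) != len(ranked):
            raise ValueError(f"{voter} must rank at most 10 distinct songs")
        # The top-ranked song gets 12, the next 10, and so on.
        for points, country in zip(POINTS, ranked):
            totals[country] += points
    return dict(totals)


def points_given(rankings: dict, voter: str, recipient: str) -> int:
    """Points one country awarded another (0 if not in its ranking)."""
    ranked = rankings.get(voter, [])
    return POINTS[ranked.index(recipient)] if recipient in ranked else 0


# Hypothetical toy vote with shortened rankings.
rankings = {"A": ["B", "C"], "B": ["A", "C"], "C": ["B", "A"]}
print(tally(rankings))
print(points_given(rankings, "A", "B"), points_given(rankings, "B", "A"))
```

Pairwise lookups like `points_given` in both directions are what an "exchanging 12 points" pattern looks like in the raw data.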

In this post, I want to explore how much evidence for bloc voting the voting data from all ESCs provides. You will see that there is actually quite a lot! The voting data from 1958-2016 was scraped from eurovisionworld using R. The code can be found on GitHub.

Continue reading

Speeding up the Bradley Terry Model in R

I am currently developing my first R package, which has frequently confronted me with the question: “How can I speed up my code?”

I did some “research” and read a lot of general articles about speeding up code, as well as a few posts specifically about speeding up R code. While I mostly got the main points, I always found the example use cases slightly contrived. So I decided to write a little something about one of my own use cases, which touches on many points that I think are important when trying to speed up your code. The method we want to implement in this post is the so-called Bradley-Terry model. If you do not care about the theoretical part, you can jump directly to the implementation section.
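For readers new to the Bradley-Terry model: each item i has a positive strength p_i, and the probability that i beats j is p_i / (p_i + p_j). Below is a minimal Python/NumPy sketch, not the R implementation from the post. It fits the strengths with the classic fixed-point (MM) update p_i ← W_i / Σ_j n_ij / (p_i + p_j), where W_i is the number of wins of i and n_ij the number of comparisons between i and j. This update is my choice of fitting method and may differ from the post's. The update is vectorized over all pairs, in the spirit of avoiding element-wise loops. It assumes every item wins at least once and the comparison graph is connected, so the maximum likelihood estimate exists. The function name `fit_bradley_terry` and the toy win matrix are hypothetical.

```python
import numpy as np


def fit_bradley_terry(wins, n_iter: int = 1000, tol: float = 1e-10) -> np.ndarray:
    """Fit Bradley-Terry strengths with the MM / fixed-point update.

    wins[i, j] = number of times item i beat item j (diagonal must be 0).
    Returns strengths normalized to sum to 1.
    """
    wins = np.asarray(wins, dtype=float)
    n_games = wins + wins.T            # n_ij: comparisons between i and j
    total_wins = wins.sum(axis=1)      # W_i: total wins of item i
    p = np.ones(wins.shape[0])
    for _ in range(n_iter):
        # denom_i = sum_j n_ij / (p_i + p_j), computed for all i at once;
        # diagonal terms vanish because n_ii = 0.
        denom = (n_games / (p[:, None] + p[None, :])).sum(axis=1)
        p_new = total_wins / denom
        p_new /= p_new.sum()           # the model is scale-invariant; fix scale
        if np.max(np.abs(p_new - p)) < tol:
            return p_new
        p = p_new
    return p


# Hypothetical toy data: 3 items, wins[i, j] = wins of i over j.
wins = np.array([[0, 2, 3],
                 [1, 0, 2],
                 [0, 1, 0]])
strengths = fit_bradley_terry(wins)
print(strengths)
# Estimated probability that item 0 beats item 1:
print(strengths[0] / (strengths[0] + strengths[1]))
```

The whole update is a few array operations per iteration instead of a double loop over pairs. That is the kind of rewrite that usually pays off in R as well.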

Continue reading

The Myth of Club 27

The term Club 27 refers to the phenomenon that famous musicians supposedly die at a higher rate at the age of 27. Jimi Hendrix, Janis Joplin, Kurt Cobain and Amy Winehouse, to name just a few, are members of this questionable club. The media goes wild whenever another famous person joins this mysterious club. But is there a (statistical) truth behind this? Do musicians really die at a higher rate at the age of 27?

Continue reading

VIP RIP: The high number of dead celebrities in 2016

Many people would agree that 2016 was a bad year. The VIP death toll in particular seems extraordinarily high this year.
With the recent deaths of British singer George Michael and of Carrie Fisher (Princess Leia), the year even seems to be going out with a bang. Using data on celebrity deaths, I want to test whether the death rate really was higher or whether we just perceived it that way.

The data for this post comes from Wikipedia’s lists of deaths by year. Since 2004, the monthly lists have all had the same structure, so I wrote a simple scraping function in R with the rvest package. The code is attached at the end of this post.

Continue reading
