The Walking Dead in The Walking Dead

My better half and I have been binge-watching The Walking Dead recently, and I figured the show would be a nice topic for a blog post. Of course, it had to involve some form of data analysis. I am currently very much interested in text analysis, so I tried to hunt down the episode transcripts. I actually found them here. I was super psyched to find them in the format <character>: <text>, which would make it easy to put together a dataset of all lines of all characters throughout the show. I wrote some code for it (can be found on my GitHub), but then realized that only the first season was in this format. For seasons 2-7 there is no chance to recover who said what. I was rather disappointed, because the “coolest” thing I could do was count the number of lines of each character.
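To give an idea of what that counting looks like, here is a minimal sketch in Python. It is not the code from my repository: the sample text, the regular expression and the function name `count_lines` are made up for illustration, and it assumes that every spoken line starts with the character name followed by a colon, while stage directions sit in square brackets.

```python
import re
from collections import Counter

# Hypothetical excerpt in the "<character>: <text>" format.
# These lines are invented for illustration, not taken from the transcripts.
SAMPLE = """\
Rick: Is anybody out there?
[Rick walks towards the car]
Morgan: Stay where you are.
Rick: I am not looking for trouble.
Rick: I just need some water.
"""

# Assumption: a spoken line is "<name>: <text>"; bracketed stage
# directions and blank lines do not match and are skipped.
LINE_PATTERN = re.compile(r"^([^:\[\]]+):\s*(.+)$")


def count_lines(transcript):
    """Count how many spoken lines each character has."""
    counts = Counter()
    for raw in transcript.splitlines():
        match = LINE_PATTERN.match(raw.strip())
        if match:
            counts[match.group(1).strip()] += 1
    return counts


line_counts = count_lines(SAMPLE)
for character, n in line_counts.most_common():
    print(character, n)
```

Real transcripts are messier than this (colons inside stage directions, inconsistent spelling of names), so the pattern would likely need some tuning.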


Bloc Voting in the Eurovision Song Contest

The Eurovision Song Contest is one of, if not THE biggest international TV song competition in the world. Hundreds of millions of people tune in every year, for stunning voices, weird performances and unforgettable singers. Remember the Epic Sax Guy? He is back again this year!

The winner of the contest is selected by a positional voting system. The voting system has changed a lot over the years, but the general approach has always been the same. Each country awards points to its 10 favorite songs, which cannot include its own song. The voting is sometimes considered a bit controversial, since one gets the impression that there is a lot of regional bloc voting going on. Be it Norway and Sweden exchanging 12 points or former Soviet countries voting for Russia, anecdotal evidence is all over the place.
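For readers unfamiliar with positional voting, the following Python sketch shows the basic mechanics. It assumes the 12, 10, 8, 7, ..., 1 point scale (the scale behind the famous "12 points"; other scales were used in earlier years), and the ballots, `award_points` and `tally` are hypothetical names and data for illustration only, not real voting results.

```python
from collections import defaultdict

# Assumption: points for ranks 1 to 10 on the 12-10-8-7-...-1 scale.
POINT_SCALE = [12, 10, 8, 7, 6, 5, 4, 3, 2, 1]


def award_points(voter, ranking):
    """Map one country's ranked favorites to points."""
    if voter in ranking:
        raise ValueError(f"{voter} cannot vote for its own song")
    # Only the first ten entries of the ranking receive points.
    return dict(zip(ranking[:10], POINT_SCALE))


def tally(ballots):
    """Sum the points over all voting countries."""
    totals = defaultdict(int)
    for voter, ranking in ballots.items():
        for country, points in award_points(voter, ranking).items():
            totals[country] += points
    return dict(totals)


# Made-up ballots, each listing a country's favorites in order.
ballots = {
    "Norway": ["Sweden", "Denmark", "Finland"],
    "Sweden": ["Norway", "Denmark"],
    "Denmark": ["Sweden", "Norway"],
}
totals = tally(ballots)
print(sorted(totals.items(), key=lambda item: -item[1]))
```

In this toy example Norway and Sweden hand each other top marks, which is exactly the kind of pattern the anecdotes are about.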

In this post, I want to explore how much evidence of bloc voting the voting data from all ESCs provide. You will see that there is actually quite a lot! The voting data from 1958-2016 were scraped from eurovisionworld, and all of the work was done in R. The code can be found on GitHub.


Speeding up the Bradley Terry Model in R

I am currently developing my first R package, which has repeatedly confronted me with the question: “How can I speed up my code?”

I did some “research” and read a lot of general articles about speeding up code, but also a few posts specifically about speeding up R code. While I mostly got the main points, I always found the example use cases slightly contrived. So I decided to write a little something about one of my own use cases, which includes many points that I think are important when trying to speed up your code. The method we want to implement in this post is the so-called Bradley-Terry model. If you do not care about the theoretical part, you can jump directly to the implementation section.


The Lord of the Rings: The Three Networks

This post is inspired by the Star Wars social network.

I created the interaction networks of Lord of the Rings characters for all three movies based on the scripts I found online. The networks capture the storyline of all movies surprisingly well and might be a nice gimmick for all Lord of the Rings enthusiasts. Code and network files can be found on GitHub.

I also created interactive versions of the networks where you can drag, click and hover and generally play around a bit. The links to those versions are below the respective plots.

For those who are interested in technical details of the data extraction and analysis, head down to the Making of section. But let’s start with some visualizations of the networks for the three movies.
