My better half and I have been binge-watching The Walking Dead recently, and I figured the show would make a nice topic for a blog post, involving some form of data analysis, of course. I am currently very interested in text analysis, so I tried to hunt down the episode transcripts. I actually found them here. I was super psyched to find them in the format <character>: <text>, which would make it easy to put together a dataset of every line spoken by every character throughout the show. I wrote some code for it (it can be found on my GitHub), but was quite disappointed when I realized that only the first season was in this format. For seasons 2 to 7 there is no way to recover who said what. That was a letdown, because the "coolest" thing I could do with the first season alone was count the number of lines of each character:
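For a rough idea of what that parsing and counting involves, here is a minimal Python sketch. It is not the code from my GitHub. It assumes each transcript is plain text with one `<character>: <text>` line per utterance, and it simply skips anything else, such as bracketed stage directions. The regex, the function names and the sample dialogue are all made up for illustration.

```python
import re
from collections import Counter

# Hypothetical pattern: a speaker name at the start of a line, a colon,
# then the spoken text. The lazy group stops at the first colon.
LINE_RE = re.compile(r"^\s*([^:\n]+?)\s*:\s*(.+)$")


def parse_transcript(raw: str) -> list[tuple[str, str]]:
    """Split raw transcript text into (character, text) pairs.

    Lines that do not match '<character>: <text>' (stage directions,
    blank lines) are skipped.
    """
    pairs = []
    for line in raw.splitlines():
        match = LINE_RE.match(line)
        if match:
            pairs.append((match.group(1), match.group(2)))
    return pairs


def lines_per_character(pairs: list[tuple[str, str]]) -> Counter:
    """Count how many lines each character speaks."""
    return Counter(character for character, _ in pairs)


# Made-up sample in the assumed transcript format.
sample = """Rick: Where's Lori?
Shane: She's with Carl.
[Walkers approach]
Rick: We need to move.
"""

counts = lines_per_character(parse_transcript(sample))
print(counts.most_common())  # [('Rick', 2), ('Shane', 1)]
```

A `Counter` keeps the tally and `most_common()` sorts it, so the same two functions would scale from this toy sample to a whole season's worth of transcripts.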

