VIP RIP: The high number of dead celebrities in 2016

Many people would agree that 2016 was a bad year. The VIP death toll in particular seems extraordinarily high.
With the recent deaths of British singer George Michael and Carrie Fisher, who played Princess Leia, the year even seems to be ending with a bang. Using data on celebrity deaths, I want to test whether the death rate really was higher or whether we just perceived it that way.

The data for this post comes from Wikipedia’s lists of deaths by year. The monthly lists have had the same structure since 2004, so I wrote a simple scraping function in R with the rvest package. The code is attached at the end of this post.


[Figure: death toll]

2016 has seen 6483 important people pass away (so far!). This is, in fact, the highest number in the scraped Wikipedia lists (2004 to 2016), closely followed by 2014 with 6470 dead VIPs. The lower numbers for 2004 to around 2012 are most likely due to missing data, since Wikipedia might not have tracked every death back then. Although 2016 has the highest death toll, it doesn’t seem to be out of the ordinary.

The lists of notable deaths on Wikipedia, however, include far more than “just” actresses/actors and singers, the professions that presumably have seen an increased death toll.

[Figure: death toll, singers]

If we look only at singers who died in 2016, we can observe a fair increase in deaths. David Bowie, Prince and George Michael have left earth together with 223 other singers. The second highest number is 2013 with 191 dead singers.

[Figure: death toll, actors]

The same can be observed for actresses and actors: 130 of them passed away in 2016. Before this year, the number had been quite stable at around 100 since 2009.
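
The post does not show how singers and actors were picked out of the scraped list, so the following is only a sketch of one possible approach. It assumes a data frame with `name` and `year` columns like the `df.res` built by the code at the end of this post, and it assumes that each entry's text mentions the profession. The rows are made-up placeholders rather than real data, and `count_by_year` is a hypothetical helper.

```r
# Toy stand-in for the scraped data; these rows are placeholders, not real entries.
deaths <- data.frame(
  name = c("Person A, 70, British singer and songwriter",
           "Person B, 60, American actress",
           "Person C, 81, German politician",
           "Person D, 55, Canadian singer"),
  year = c(2015, 2016, 2016, 2016),
  stringsAsFactors = FALSE
)

# Hypothetical helper: count entries per year whose text matches a pattern.
count_by_year <- function(df, pattern) {
  hits <- grepl(pattern, df$name, ignore.case = TRUE)
  table(df$year[hits])
}

singers <- count_by_year(deaths, "singer")
actors  <- count_by_year(deaths, "actor|actress")
print(singers)
print(actors)
```

Plain substring matching like this is crude: it would count anyone whose description merely contains the word, so the real numbers depend on how the filter is defined.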

So the perception is supported by the data: 2016 has seen an increase in deaths of famous actresses/actors and singers. Let’s hope the list will not be extended in the remaining few days.

R Code

The following code was used to extract the list of deaths by month and by year. The names of people are stored in <ul></ul> elements, starting with the third such element on each page. When the whole page is scraped, each day of a month is stored in a separate list element. Simply looping over the days of the month then gives the names of all people who died on that specific day.

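library(rvest)    # read_html(), html_nodes(), html_text(), %>%
library(stringr)  # str_split_fixed(), str_replace_all()
library(dplyr)    # bind_rows()

month.names=format(ISOdate(2016,1:12,1),"%B")
years=2004:2016
month.days=c(31,28,31,30,31,30,31,31,30,31,30,31)
df.res=data.frame()
for(y in years){
  k=0
  for(m in month.names){
    print(m)
    k=k+1
    # February has 29 days in leap years (every fourth year within 2004-2016)
    n.days=month.days[k]+(k==2 && y%%4==0)
    url=paste0("https://en.wikipedia.org/wiki/Deaths_in_",m,"_",y)
    # all <ul> elements on the page; the per-day lists start at the third one
    list.month<-
      read_html(url) %>%
      html_nodes("ul")
    for(i in 1:n.days){
      # skip the days of December 2016 that were not yet listed when the data was scraped
      if(m=="December" && y==2016 && i>28){
        next
      }
      # split the day's text into one entry per line and strip footnote markers like ".[12]"
      list.day<-list.month%>%
        .[2+i] %>%
        html_text() %>%
        str_split_fixed("\n",n=Inf) %>%
        c() %>%
        str_replace_all("\\.\\[[0-9]*\\]","")
      df=data.frame(name=list.day,month=m,year=y,day=i,stringsAsFactors=FALSE)
      df.res=bind_rows(df.res,df)
    }
  }
}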
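
To see how the per-day parsing step behaves without hitting Wikipedia, here is a small sketch that applies the same footnote clean-up to a hand-written HTML snippet. The snippet and its placeholder names are made up and only imitate the layout described above (two leading <ul> elements, then one <ul> per day). `parse_day` is a hypothetical helper, not part of rvest. Unlike the code above, it selects the <li> nodes directly instead of splitting the text on line breaks.

```r
library(rvest)
library(stringr)

# Hand-written stand-in for a month page: two leading <ul> blocks,
# then one <ul> per day. All names are placeholders, not real entries.
html <- '<html><body>
<ul><li>navigation</li></ul>
<ul><li>more navigation</li></ul>
<ul><li>Person A, 70, British singer.[1]</li><li>Person B, 60, American actress.[2]</li></ul>
<ul><li>Person C, 81, German politician.[3]</li></ul>
</body></html>'

lists <- html_nodes(read_html(html), "ul")

# Hypothetical helper: return the cleaned entries for day i,
# skipping the first two <ul> elements as the code above does.
parse_day <- function(lists, i) {
  entries <- html_text(html_nodes(lists[[2 + i]], "li"))
  # strip trailing footnote markers such as ".[1]"
  str_replace_all(entries, "\\.\\[[0-9]*\\]", "")
}

day1 <- parse_day(lists, 1)
day2 <- parse_day(lists, 2)
print(day1)
print(day2)
```

Selecting <li> nodes is a design choice for this sketch: it avoids empty strings that can appear when text is split on line breaks, but it assumes that every person sits in their own list item.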

Posted in: Data Analysis

Written by Dmathlete
