Sentiment Analysis of Red Hot Chili Peppers Lyrics Using Tidytext

Last week, I finished been reading ‘Scar Tissue’, Anthony Kiedis’ autobiography. The book details his life and the many years he has been involved with the RHCP. Keidis has lived a life worth telling in the memoir. Constant recollection of his life journeys are spilled into a 400+ page book that does not dissapoint. After completing the tell all story I decided to take a data perspective on the RHCP. After their succesful album, ‘Blood Sugar Sex Magik’, lead guitarist John Frusciante left the band due to the overwhelming popularity and among other issues. Replacing Frusciante with Dave Navarro in 1992 the RHCP created ‘One Hot Minute’. Although the album went platinum it was not as successful as the earlier title. Frusciante re-joined the RHCP in 1998 and they released ‘Californication’. The RHCP style in ‘One Hot Minute’ vs ‘Blood Sugar Sex Magik’ is stated to contain darker subject matter, which is credited to the addition of Navarro. Creating a sentiment analysis, we will compare the albums lyrics.

Getting the Data by scraping RHCP lyrics

To gather the lyrical data we will need to scrape the lyrics using rvest.

library(knitr)
library(rvest)        
library(tidyr)         
library(tidytext)      
library(wordcloud)
library(XML)      
library(tidyverse)

poe <- ('https://genius.com/Red-hot-chili-peppers-the-power-of-equality-lyrics')
poe_html <- read_html(poe)
  poe_lyrics <- poe_html %>%
    html_nodes("p") %>%
    html_text()
poe_lyric_df <- data.frame(line = 1:1, text = poe_lyrics)
poe_lyric_df$text <- as.character(poe_lyric_df$text)
  
poe <- poe_lyric_df %>%
  unnest_tokens(word, text)
  
  blood_sugar <- blood_sugar %>%
  anti_join(stop_words) %>%
  filter(!grepl('[0-9]', word), word != 'verse', word != 'hook', word != 'song', word != 'album', word != 'anthony')

To extract the lyrics we can use the format above for each url lyric or use purrr for writing a function by album using map_chr function to transform the input into a list or data frame (This is by far the most efficient route).

Now we have the lyrics for ‘Blood Sugar Sex Magik’ and can transfer them using the tidy text format.

Now we can do the same for ‘One Hot Minute’

Once we have the lyrics for ‘One Hot Minute’ we transfer them using the same tidy text format.

one_minute <- one_minute %>%
  anti_join(stop_words) %>%
  filter(!grepl('[0-9]', word), word != 'verse', word != 'chorus', word != 'song', word != 'album', word != 'red', word != 'hot', 
  word != 'peppers', word != 'chili', word != 'https', word != 'lyrics', word != 'genius.com')
  
one_minute <- bind_rows(mutate(warped, album = "One Hot Minute", song = "Warped"),
                       mutate(aeroplane, album = "One Hot Minute", song = "Aeroplane"),
                       mutate(deep_kick, album = "One Hot Minute", song = "Deep Kick"),
                       mutate(my_friends, album = "One Hot Minute", song = "My Friends"),
                       mutate(coffee_shop, album = "One Hot Minute", song = "Coffee Shop"),
                       mutate(pea, album = "One Hot Minute", song = "Pea"),
                       mutate(one_big_mob, album = "One Hot Minute", song = "One Big Mob"),
                       mutate(walkabout, album = "One Hot Minute", song = "Walkabout"),
                       mutate(tearjerker, album = "One Hot Minute", song = "Tearjerker"),
                       mutate(one_hot_m, album = "One Hot Minute", song = "One Hot Minute"),
                       mutate(falling_into_grace, album = "One Hot Minute", song = "Falling Into Grace"),
                       mutate(shallow, album = "One Hot Minute", song = "Shallow"),
                       mutate(transcending, album = "One Hot Minute", song = "Transcending")) %>%
                       unnest_tokens(word, text)

Frequency of Lyrics between albums

  library(stringr)

frequency <- bind_rows(mutate(one_minute, album = "One Hot Minute"), 
                       mutate(blood_sugar, album = "Blood Sugar Sex Magik")) %>% 
  mutate(word = str_extract(word, "[a-z']+")) %>%
  count(album, word) %>%
  group_by(album) %>%
  mutate(proportion = n / sum(n)) %>% 
  select(-n) %>% 
  spread(album, proportion) %>% 
  gather(album, proportion, `One Hot Minute`) %>%
  na.omit()

library(scales)

ggplot(frequency, aes(x = proportion, y = `Blood Sugar Sex Magik`, color = abs(`Blood Sugar Sex Magik` - proportion))) +
  geom_abline(color = "gray40", lty = 2) +
  geom_jitter(alpha = 0.1, size = 2.5, width = 0.3, height = 0.3) +
  geom_text(aes(label = word), check_overlap = TRUE, vjust = 1.5) +
  scale_x_log10(labels = percent_format()) +
  scale_y_log10(labels = percent_format()) +
  scale_color_gradient(limits = c(0, 0.001), low = "darkslategray4", high = "gray75") +
  facet_wrap(~album, ncol = 2) +
  theme(legend.position="none") +
  labs(y = "Blood Sugar Sex Magik", x = NULL)

frequency Figure 1.1 Comparing word frequences between RHCP albums ‘One Hot Minute’ & ‘Blood Sugar Sex Magik’.

Words that are close to the line have similar frequencies in both albums. Some words landed here unintentionally. For instance, rick and kiedis are most likely not lyrics but appear from scraping the web page (Rick Rubin was the producer on both albums while Anthony Kiedis is the lead singer). It is interesting to see ‘funky’ appearing near the middle of the line while ‘love’ appearing at the high end of the frequency.

We can now quantify how similar and different these sets of word frequencies are using a correlation test. How correlated are the word frequencies between ‘One Hot Minute’ & ‘Blood Sugar Sex Magik’?

cor.test(data = frequency[frequency$album == "One Hot Minute",],
         ~ proportion + `Blood Sugar Sex Magik`)
## 
##  Pearson's product-moment correlation
## 
## data:  proportion and Blood Sugar Sex Magik
## t = 5.2814, df = 246, p-value = 2.82e-07
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.2026112 0.4267278
## sample estimates:
##       cor 
## 0.3191241

The correlation between words in ‘One Hot Minute’ & ‘Blood Sugar Sex Magik’ is .31, not a strong indication of similar lyrics. This could be due to the addition of Navarro or the RHCP trying a different lyrical tone.

Combine both albums and add sentiment analysis

Using an inner_join statement we can get a good grasp of the sentiment by grabbing positive and negative words. Lets find the net sentiment between the two albums.

tidy <- bind_rows(blood_sugar, one_minute)

afinn <- tidy %>% 
  inner_join(get_sentiments("afinn")) %>% 
  group_by(album) %>% 
  summarise(sentiment = sum(score)) %>% 
  mutate(method = "AFINN")

ggplot(afinn, aes(album, sentiment, fill = album)) +
  geom_col(show.legend = FALSE) +
  facet_wrap(~album, ncol = 2, scales = "free_x")

sentiment Figure 1.2 Displays ‘One Hot Minute’ has a higher negative sentiment between the two album lyrics.

We can now examine the top words used between each album

tidy %>%
  group_by(album) %>%
  count(word, sort = TRUE) %>%
  filter(n > 13) %>%
  mutate(word = reorder(word, n)) %>%
  ggplot(aes(word, n, fill = album)) +
  facet_wrap(~ album, scales = "free_y") +
  geom_col(show.legend = FALSE) +
  labs(y = "Most Common Used Words") +
  coord_flip()

common Figure 1.3 Examines the most common words used between the albums

We can now reshape this chart into a wordcloud

Most Common Word Clouds

  tidy %>%
  anti_join(stop_words) %>%
  count(word) %>%
  with(wordcloud(word, n, max.words = 100))

cloud

Lets distinguish between positive and negative words.

library(reshape2)

tidy %>%
  inner_join(get_sentiments("bing")) %>%
  count(word, sentiment, sort = TRUE) %>%
  acast(word ~ sentiment, value.var = "n", fill = 0) %>%
  comparison.cloud(colors = c("#F8766D", "#00BFC4"),
                   max.words = 100)

wordcloud

Summary

The data appear to support the notion that when John Frusciante left the RHCP momentarily their lyrics were darker and ‘negative’ according the word lexicons. The analysis was motivated by the tidytext and rvest packages. Further analysis could look into the RHCP later albums with Frusciante and possibly other Navarro band lyrics. The Red Hot Chili Peppers revolutionized funk rock in America and anyone interested in their journey should read ‘Scar Tissue’.