Some unusual science for the day: Using modern computer science to understand just where Rock n' Roll came from.

Originally shared by Yonatan Zunger

Some unusual science for the day: Using modern computer science to understand just where Rock n' Roll came from. Normally, people who talk about the history of music break it down into genres which have a lot to do with marketing, country of origin, and so on, and talk about individual bands as historical influences on each other. But these boundaries can be quite arbitrary: for example, "gospel" and "rock" are considered very far apart, but if you go back to the 1950s, rock was so influenced by gospel that it was hard to tell them apart at times.

So these researchers tried something else. They analyzed the songs which topped the US charts from 1960 to 2010, about 17,000 in all. For each song, they examined features not of the marketing around the music, but of the music itself: instrumentation, chord changes, timbre, types of harmony. They then used a technique called "k-means" to find the natural clusters into which the songs fell by these measures, and found thirteen natural groupings. To understand these groupings better, they adapted a technique from molecular genetics which is used to understand the functions of genes: they took existing song tags and did a mathematical analysis to see which tags were most strongly associated with each cluster. (For example, if one cluster had songs tagged "R&B" far more often than the other clusters did, it's a good sign that this tag describes the cluster.)
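To make the two steps concrete, here's a minimal sketch in Python. It is not the researchers' code: the songs, the two numeric features, and the tags are all invented toy data, and the "lift" score (how much more often a tag shows up in a cluster than in the whole sample) is just a simple stand-in for whatever association statistic they actually used. The function names `kmeans` and `tag_lift` are my own.

```python
import random
from collections import Counter

def kmeans(points, k, iters=20, seed=0):
    """Plain k-means; returns one cluster label per point."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k of the points themselves
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid
        # (squared Euclidean distance).
        labels = [
            min(range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels

def tag_lift(labels, tags_per_song):
    """For each cluster: tag frequency inside the cluster / frequency overall."""
    overall = Counter(t for tags in tags_per_song for t in tags)
    n = len(labels)
    lift = {}
    for c in set(labels):
        idx = [i for i, lab in enumerate(labels) if lab == c]
        inside = Counter(t for i in idx for t in tags_per_song[i])
        lift[c] = {t: (inside[t] / len(idx)) / (overall[t] / n) for t in inside}
    return lift

# Toy data, invented for illustration: two made-up features per song.
songs = [(0.90, 0.10), (0.80, 0.25), (0.85, 0.15),
         (0.10, 0.90), (0.20, 0.75), (0.15, 0.85)]
tags = [{"rock"}, {"rock", "pop"}, {"rock"},
        {"rap"}, {"rap", "pop"}, {"rap"}]

labels = kmeans(songs, k=2)
lift = tag_lift(labels, tags)
for cluster, scores in sorted(lift.items()):
    # Print the tag that is most over-represented in each cluster.
    print(cluster, max(scores, key=scores.get))
```

A lift well above 1 means the tag is much more common inside the cluster than outside it, which is the "R&B shows up far more often here" signal described above. The real analysis used far more features and thirteen clusters rather than two.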

They came up with 13 clusters -- what you might call the "purely musical" genres of the music, since they're based entirely on the songs' musical qualities, not on the politics or marketing around them. These ranged from cluster #2 (hip hop / rap / gangsta rap / old school) to #9 (classic rock / country / rock / singer-songwriter) to #8 (dance / new wave / pop / electronic). 

The image you see a bit of below is the history of the popularity of these genres over time, with 1960 at the bottom of the graph and 2010 at the top. You can see the sudden rise of rap (leftmost column), the gradual vanishing of jazz and the blues from the charts (the dwindling figure center-right), and the coming and going of hard rock (the dark blue bubbly thing at the center).

Interestingly, they have answered one important historical question, about the significance of the British Invasion: apparently no, this was not the key catalyst of the revolution in American music; the revolution was already well underway before the Beatles arrived in 1964. (Which shouldn't really surprise people too much, given that America is where rock came from.)

If you look at the bottom of the image, you'll notice a tree structure which the summary on the arXiv blog doesn't talk about; you'll have to read the article itself for that. It's basically a genetic tree of these genres of music. This is constructed using the same techniques of "genetic relatedness" which are used to create modern evolutionary trees of species, only instead of being based on DNA snippets, they're based on those underlying musical features like chord changes which were the basis of the clustering. So you can see (for example) that hip-hop comes from a completely different ancestry than all the other observed genres, while pairs like country and classic rock are close relatives.
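If you want a feel for how a tree like that can fall out of feature data, here's a rough sketch. I'm assuming average-linkage agglomerative clustering, which is one simple way to build such a tree and may not be the method the article used. The centroid numbers are invented so that the toy result matches the relationships described above, and `build_tree` is a hypothetical helper name.

```python
from itertools import combinations

def dist(a, b):
    """Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def avg_link(g, h, centroids):
    """Average distance between every member of group g and every member of h."""
    total = sum(dist(centroids[a], centroids[b]) for a in g for b in h)
    return total / (len(g) * len(h))

def build_tree(centroids):
    """Repeatedly merge the two closest groups; returns a nested tuple of names."""
    # Each entry is (subtree, list of genre names under that subtree).
    groups = [(name, [name]) for name in centroids]
    while len(groups) > 1:
        i, j = min(
            combinations(range(len(groups)), 2),
            key=lambda ij: avg_link(groups[ij[0]][1], groups[ij[1]][1], centroids),
        )
        merged = ((groups[i][0], groups[j][0]), groups[i][1] + groups[j][1])
        groups = [g for n, g in enumerate(groups) if n not in (i, j)] + [merged]
    return groups[0][0]

# Invented "average feature" vectors per genre cluster, for illustration only.
centroids = {
    "country": (0.80, 0.20),
    "classic_rock": (0.75, 0.25),
    "dance": (0.50, 0.50),
    "hip_hop": (0.05, 0.95),
}

tree = build_tree(centroids)
print(tree)  # ('hip_hop', ('dance', ('country', 'classic_rock')))
```

The nesting reads like a family tree: country and classic rock pair up first because their feature vectors are nearly identical, and hip-hop joins last because it sits far from everything else in this toy feature space.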

Why is this interesting? Apart from the obvious fun of studying music history using the methods of molecular biology, it shows the ways in which these techniques can be used to describe a whole host of things. To make this work, you need three things: a large sample of items to classify (here, songs); for each item, a large collection of features to measure (a few hundred at least; in this case, things like chord changes and instrumentation); and, if you want to be able to describe the function of these features, functional labels (here, song tags) for at least a good collection of the items you want to classify. Then you can do a "genetic analysis," grouping them into families, observing family trees, and (if you have additional data, like the year of release in this case) understanding things like the evolution of these groups over time or space.
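The "evolution over time" part is the easiest piece to sketch: once every item has a cluster label and a year, you just count. The snippet below is a toy illustration with invented years and labels, not the study's data, and `cluster_share_by_year` is a name I made up.

```python
from collections import Counter, defaultdict

def cluster_share_by_year(years, labels):
    """Fraction of each year's items that fall in each cluster."""
    counts = defaultdict(Counter)
    for year, label in zip(years, labels):
        counts[year][label] += 1
    return {
        year: {label: n / sum(c.values()) for label, n in c.items()}
        for year, c in sorted(counts.items())
    }

# Invented toy data: one (year, cluster label) pair per song.
years = [1960, 1960, 1960, 1990, 1990, 1990, 1990]
labels = ["blues", "blues", "rock", "rock", "rap", "rap", "rap"]

shares = cluster_share_by_year(years, labels)
for year, share in shares.items():
    print(year, share)
```

Plot those per-year shares as stacked widths and you get something like the rising and dwindling columns in the image.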

What's marvelous is that you can do this sort of analysis with all sorts of things. Do it on news articles, with the features being words, and you'll discover that they cluster into stories, which in turn cluster into subjects. (Why? Because you'll see, say, a bunch of stories with the word "Brezhnev" which also include references to the USSR, and these come and go over time, and at later times start to also include stories about "Andropov," "Chernenko," and "Gorbachev." Depending on how finely you slice these, you can either see the life of a politician, or the history of the Soviet Union.) Do it on a city's road network, with features involving the number of cars on each chunk of the road at a given time, and you'll discover... well, I'm not sure what you'll discover. I don't know if anyone's ever done that analysis. But you could do it and find out.
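For the news example, the "features are words" idea can be sketched in a few lines. This assumes a plain bag-of-words count and cosine similarity, which is a common starting point rather than anything the post prescribes. The three "articles" are invented one-liners.

```python
from collections import Counter
from math import sqrt

def bag_of_words(text):
    """Word-count feature vector for one article."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two word-count vectors (0 = no shared words)."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

# Invented toy articles, for illustration only.
articles = [
    "brezhnev meets ussr officials in moscow",
    "ussr officials say gorbachev will visit moscow",
    "local team wins the cup final",
]
vectors = [bag_of_words(a) for a in articles]

print(cosine(vectors[0], vectors[1]))  # the two USSR stories share words
print(cosine(vectors[0], vectors[2]))  # no words in common
```

Articles that share words like "USSR" score as similar even when the leader's name changes, which is why a clustering step run on these vectors can group stories into subjects that persist over time.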

This is the real magic of data analysis: it gives you new ways to stare at what seem like hopelessly complex piles of data, and see meaningful patterns.

