Monday, November 24, 2014
Review of Dataclysm: Who We Are (when we think no one’s looking) by Christian Rudder (Fourth Estate, 2014)
The book then proceeds by discussing social media data and what they might reveal about human behaviour and society. Crucially, however, there is no systematic discussion of big data per se, or of its forms and characteristics, no discussion of data analytics, and only a cursory discussion of the many ethical, social and political implications of such data. Nor is there any discussion of statistics, of statistical tests performed on the data presented, or of data mining, machine learning, pattern recognition, profiling, prediction, etc. The irony here is that Rudder’s company, OkCupid, employs these techniques to process and match potential partners, yet he never explains how this is achieved.
Instead, the entire analysis is rooted in the empiricist form of data science, rather than data-driven science, and never proceeds beyond description. As such, the analyses of gender and race he presents are based on a ‘letting the data speak for themselves’ approach and constitute armchair interpretation. He barely engages with the vast academic literature on the quantitative analysis of race and gender, which has been conducted for several decades using large data sets such as the census or public administration data. Rudder has access to an enormous set of very interesting data that could be used to conduct some fascinating sociological and psychological analysis. Instead, what we get is a series of descriptive statistics and banal revelations, most of which are already well established.
The result is a book that hints at the potential of big data and data science but undersells it substantially. In my view, it also underestimates the readership level of its potential audience by never progressing beyond the mathematics and data visualisations used in junior school. In contrast, books such as The Signal and the Noise by Nate Silver provide a much wider and deeper discussion. This is a shame, as Rudder is an engaging writer and he has privileged access to extremely rich social data that could be used to conduct some wonderful and sophisticated social science research. Such rich research and its policy implications are barely hinted at.