My plea for a more responsible attitude towards Big Data is based on the following two theses:
1. Data is power. Or, more precisely: data – information – knowledge – power;
2. Following the late Stan Lee’s wisdom: ‘With great power comes great responsibility’.
Many Big Data champions claim that data mining allows us to develop objective predictive models - models that predict future behaviour by learning from past observations (i.e. data). My PhD argues the opposite: all predictive models are subjective. Even if one could claim that the data are unbiased, which they hardly ever are, the process of analysing those data is always subjective: the focus of the analysis and the choice of algorithm(s) are decided by someone with a specific goal in mind, and are therefore inherently selective. The following example illustrates how such biased algorithmic models can have severely detrimental effects on human lives. It also demonstrates why awareness and transparency about subjectivity and personal bias in data mining matter:
In 2012, a Silicon Valley start-up developed a piece of predictive policing software called PredPol. The software uses historical crime data to calculate, in real time, where crimes are most likely to occur. Many police departments in the United States purchased the software to help meet demands for a more efficient police force. By spending more time patrolling the areas where crimes were predicted, police officers were able to discourage criminal activities such as burglaries. However, because police patrolled those areas more frequently and more thoroughly, many minor crimes, such as selling small quantities of drugs, which would typically go unrecorded, now ended up on the record too. This led to many more arrests for victimless crimes in impoverished neighbourhoods, where the majority of the population is from a Black or Hispanic background. Similar minor crimes committed by college students went unpunished, since they occurred in other, unpatrolled, neighbourhoods.
The UCLA anthropology professor who founded PredPol, Jeffrey Brantingham, made sure that his predictive policing model is blind to race and ethnicity: the program does not focus on individuals but targets geography instead (O’Neil 2016). The input to the model consists only of the type, location, and date and time of each crime. However, even though the model is colour-blind and nominally unbiased, geographic and cultural segregation in cities - especially in the United States - turned it into a system that criminalises and punishes poverty. PredPol unwittingly created a system in which the less fortunate and marginalised were more likely to fall into a cycle of incarceration and re-incarceration.
The Spatial Information Lab of Columbia University and the Justice Mapping Center in New York developed an alternative approach to studying criminal justice data. Their researchers investigated the same crime data sets used for predictive policing, but instead of looking at the location of each crime, they looked at the home addresses of incarcerated citizens, prison admission rates, and the public cost of incarceration. Studying five US cities, the researchers coined the term ‘Million Dollar Blocks’: single urban blocks where public expenditure on sending residents into the prison system exceeds one million dollars a year. Mapping these urban blocks helped the researchers identify urban ‘hot spots’ - sources of mass incarceration - and demonstrate the importance of geography as a root cause of imprisonment. The researchers argue that the public money spent on mass incarceration could be redirected to improve and uplift civic infrastructures, such as education and other community resources, in order to address the causes and not just the symptoms of crime, and ultimately to break the cycle of incarceration.
The Million Dollar Blocks initiative demonstrates how the same data can be used to form an entirely different narrative when approached from a different angle. While data do hold power, the individual or team that analyses the data shapes the narrative and decides how agency and power are exercised, and therefore carries a great responsibility.
This is the first article of a series I will be publishing as summaries of my PhD dissertation.
© Million Dollar Blocks - Columbia University's Center for Spatial Research