2017 Data for Good Exchange: With Great Data Comes Great Responsibility

Data, and the analysis of it, holds enormous power to change minds, alter policies, and challenge societal assumptions. And, for data scientists, that implies a personal and professional obligation to fully understand the influence of their work and a responsibility to ensure its ethical application.

In other words, With Great Data Comes Great Responsibility. That was the dominant theme of the day-long Data for Good Exchange (D4GX) conference at Bloomberg’s Global Headquarters in New York on Sunday, September 24th. The event brought together hundreds of data scientists from across academia, research, the non-profit sector, and business.

In the opening keynote address, John Kahan, an executive at Microsoft who’s been leading a high-profile analysis of data related to Sudden Infant Death Syndrome (SIDS), revealed that new efforts are underway to analyze the DNA of babies who suffered unexplained deaths. He also said the first sample to be analyzed will be that of his son Aaron, whose death in 2003 sparked his effort as an advocate to eradicate this ‘silent killer.’

In the second keynote of the morning, Sarah Tofte, director of research and implementation at the non-profit Everytown for Gun Safety, used data to demonstrate how judges in Rhode Island and Colorado proved to be lax in applying laws requiring people convicted of domestic violence crimes to relinquish their firearms.

The keynotes opened a day in which researchers presented papers on a wide range of topics, all of them big, from tracking opioid overdoses and the incarceration rates in U.S. jails and prisons to the viability of the world’s food supply in the next decade and beyond.

Sara Menker, CEO and Founder of Gro Intelligence, a firm that builds analytics software for the global agriculture industry, opened her remarks by asking, “How do we feed 9 billion people by 2020?” She then explained how she and her team determined that the world is likely to experience a food shortage in the next decade, far sooner than the shortage most governments expect in 2050.

Menker estimates that, within 10 years, the difference between the world’s food supply and demand could be as many as 214 trillion calories, which she quantified as 359 billion Big Macs. “This tipping point is where the world’s ability to produce food to meet growing demand will break,” she said.

A paper presented by Katherine Keith, a researcher at the University of Massachusetts at Amherst, detailed the use of natural language processing (NLP) to automatically scrape news articles from Google News in order to track and update a database on the number of people killed by police officers in the U.S.

Keith’s team found that different standards in how state and local governments gather data on police-related fatalities had led federal agencies to miss as many as 2,000 deaths every year, and that the data is riddled with errors. Keith said the team is also investigating how the technique may be extended to other domains, such as tracking victims of humanitarian disasters.

In a conversation with Noel Hidalgo of Beta NYC and Brianna Vecchione, a Senior Fellow at Microsoft, Manhattan Borough President Gale Brewer discussed how harnessing data from New York City’s 311 service and information phone system has led to a better understanding of neighborhood problems throughout the city’s most densely populated borough. “Everyone votes nationally, but they complain locally,” Brewer said.

A panel discussion on the use of social media data for social good explored the moral and ethical implications of how businesses can use data both to make more money by better targeting consumers and to effect positive social change. During this session, The Governance Lab at NYU Tandon School of Engineering shared findings from its new report, “The Potential of Social Media Intelligence to Improve People’s Lives: Social Media Data for Good,” detailing how social media data and analytical expertise can be used to develop solutions to pressing public problems.

John Akred, CTO of consulting firm Silicon Valley Data Science, drew applause when he said that large companies have a unique perch from which they can observe the ebb and flow of societal trends, and that this data can be leveraged for both social and commercial good. “We need to teach companies that the analysis of data for social good is an obligation, not a luxury.”

On the same panel, Natalia Adler, a data, research, and policy manager at UNICEF (The United Nations Children’s Fund), said that companies also struggle with the notion of donating their data for use outside their business. “Corporations are often willing to donate money to certain causes, but when it comes to donating data, they simply don’t know how to do it.”

Ultimately, this year’s Data for Good Exchange stimulated thinking, conversation, and action, including the launch of a community-driven initiative to develop a ‘Hippocratic Oath’-like code of ethics on sharing data – by data scientists, for data scientists – called Community Principles on Ethical Data Sharing (or CPEDS). And it gave the community of professionals who collect, analyze, and apply powerful data sets the opportunity to remember the great responsibility that comes with their work.