Why We Need Community-Driven Tech Ethics: Two Years of the Global Data Ethics Project

Maureen “Mo” Johnson is an advocate for ethics centered on a framework of human rights, public policy, and cooperative infrastructure. You can find her @moridesamoped or on LinkedIn. With contributions from Lilian Huang and Abhijeet Chavan.

As more data scientists and technologists become aware of the unintended consequences of their algorithmic work, there has been a commensurate demand for better education, training and community-building around ethical data science tools, advocacy and methodologies. Data practitioners themselves are voicing emerging ethical concerns in industry, in civic tech organizations, and anywhere data is being used, analyzed and applied. We are on the ground, writing the code. Many of us are aware of the impact that our work, including its unintended consequences, is having on civil society.

Today’s Data for Good Exchange (D4GX) 2019 conference in New York City marks the two-year anniversary of a joint initiative by BrightHive, Bloomberg and Data for Democracy that was announced at Data for Good Exchange 2017. When the initiative began, data ethics was not mainstream, but it is an essential part of the conversation now. This is, in large part, due to collaborative grassroots efforts, including our own, which invigorated distributed networks of concerned volunteers from different disciplines around the world.

In 2017, Data for Democracy (D4D) was a growing Slack channel made up of a distributed network of more than 800 data scientists and data practitioners. Most were from North America, Europe, Australia and New Zealand, and they congregated to do data for good projects. The community model was participatory: if you proposed a project, you led it and collaboratively built a team of volunteers to address a pressing data problem. Because of this participatory approach and its intentional inclusion of data practitioners regardless of expertise, D4D was an ideal community-based forum in which to generate a broad set of ethical principles. This was an essential component of the approach designed by Natalie Evans Harris, COO of BrightHive: “By data scientists, for data scientists,” where the term data scientist was defined as broadly and inclusively as possible. The intent of the D4D community model was to include people who might not even consider themselves data scientists. We sought to welcome those who also wished to become data scientists, as well as those who were data and tech adjacent. Most importantly, the ethics project also welcomed those who studied and thought critically about technology’s impact on society.

The initiative was originally named Community Principles in Ethical Data Sharing (CPEDS), later the Data Oath (or, colloquially, the Code of Ethics or ethics pledge), and today it is known as the Global Data Ethics Project (GDEP). This evolution of the project name reflects the complexity of the community-based, multi-stakeholder work that took place over the past two-plus years. The D4GX conferences have served as a regular assembly point for the community to gather in person. Between them, individual working groups spent months on research, meetings and in-depth discussions, building the principles from the ground up with digital tools including GitHub, online discussions, teleconferences and social media. This ensured that people who could not be physically present at a D4GX conference could still contribute their significant insights and perspectives to the ethical discussion.

At D4GX 2017, Lilian Huang of the University of Chicago and BrightHive’s Natalie Evans Harris organized the initial call for working groups, simultaneously announcing it on social media and in the D4D Slack channel. Volunteers formed seven working groups, focused on different aspects of data ethics, and led by the following moderators: 

  • Data Ownership and Provenance: David Morar
  • Bias and Mitigation: Maria Filippelli
  • Responsible Communications: Maureen “Mo” Johnson 
  • Privacy and Security: Erin Stein
  • Transparency and Openness: Abhijeet Chavan
  • Questions and Answers: Lilian Huang
  • Thought Diversity: Margeaux Spring

This call for moderators is what galvanized me to join the effort as a volunteer. Because of my background in public health policy, data analysis and social science research, I chose to moderate the Responsible Communications working group. This was driven, in part, by a fundamental belief that any data project should both include and partner with the communities impacted by data work. Over the next several months, these teams of volunteers from across the country, representing nonprofits, research institutions and industry, and with expertise in different disciplines, met regularly for in-depth discussions. They gathered essential readings, compiled case studies, and created seven sets of recommendations.

Alongside the other moderators, I worked with community members to present these recommendations to a larger audience at D4GX San Francisco in February 2018. There, over 2,000 data practitioners, ethicists, advocates and academics participated in person at the conference or virtually through the Slack channel to discuss, consolidate and develop the ethical principles. They developed a 5-point framework around “Fairness, Openness, Reliability, Trust and Social Benefit” (aka FORTS), and 10 Global Data Ethics Principles for data practitioners to apply to their work with data.

At the event, Omidyar Network, a philanthropic institution, proposed that the extra-institutional principles be adapted into a “Data Oath” for data practitioners. This framing suggested that the ethical principles could be used to signify both commitment to and affiliation with an ethical data science community. With support, the project grew and I became the community lead in mid-2018, helping facilitate conversations about the ethical use of data, current events in technology, privacy, fairness, bias and other salient issues.

Community Building is Key in Tech Ethics

As the ethics project has expanded to meet the expectations and participation of many stakeholders, my own role in and understanding of the data and tech ethics community has also changed and grown. For me, it has been important to contextualize the Global Data Ethics Project within the evolving culture of data science and the tech industry. In response to an emergent need, we worked together to form an inclusive and equitable data science culture that foregrounds ethical thinking. It was important to define priorities for overall ethical behavior so that these guidelines would be both general enough to be relevant to a data scientist’s approach to any work and specific enough to shape concrete actions. In other words, the community-generated ethical principles are meant to guide a data scientist in being a thoughtful, responsible, and ethical agent who can work productively with other data scientists and data stakeholders.

Over the past year, the community has tripled in size and made efforts to reach out to and include others. The conversation around ethical data science and technology has also generated efforts beyond the core principles: many community members have participated in and led initiatives to address the impact of technology on society. In addition, because of our efforts and many others’, the discussion and application of ethics in the use of data and technology has become a visible and essential part of the tech ecosystem. Ethics in tech is part of the conversation now, and it is here to stay.

It is also essential to practice our ethical aspirations while building community: we must continue to build inclusion and equity as core values in the tech ethics community. More importantly, we must listen to the voices that have been speaking about these issues for a long time.

I believe that community work is, in and of itself, a tool of empowerment and collective action that can address emerging ethical issues in the development of new technologies. Questions of how to minimize algorithmic bias, preserve citizens’ privacy, diminish surveillance, regain informed consent and address other urgent issues demand multiple voices and perspectives, both inside and outside the technology industry. We cannot solve ethics in tech with one toolkit or one set of principles. It is collective action and context-specific efforts that drive solutions for ethical issues in technology. In fact, most community members have been attentive to larger social efforts and movements to center ethics in the development and application of technology.

Abhijeet Chavan (left) at the Legal Services Corporation hackathon in New Orleans in January 2018, one of several events that led to a community-created set of principles for data and AI ethics in legal services.

Community Members Apply This Work in Other Contexts

One of the working group moderators, Abhijeet Chavan, started an initiative to discuss artificial intelligence (AI) and data ethics in legal services. The objectives of the initiative were to engage the legal services community to learn about new data-driven, algorithm-powered technologies; to discuss and identify their impact on legal due process and ethics; and to develop a set of community principles and guidelines to protect and promote the community’s professional values. He conducted a series of community discussion events at conferences and hackathons to collect ideas and concerns from legal professionals. This led to the creation of the Legal Services Community Principles and Guidelines for Due Process and Ethics in the Age of AI.

Other community members have applied the principles to data for good projects at D4D and other civic tech organizations, such as Civic Data Alliance, led by Matthew Gotth-Olsen and Margeaux Spring, one of the original CPEDS moderators. These efforts focus on the oversight of volunteer-driven data work and the unintended consequences of using humanitarian data and data from vulnerable populations. Principles such as trust and putting people before data are essential when partnering with other communities, especially those who have been ignored, neglected or exploited by algorithms deployed by technology firms that do not foreground ethical thinking.

In addition, affiliated projects such as the Data Practices community, which includes many of the same members as the ethics project, have incorporated the ethical principles into their work. The goal of Data Practices was to start an “Agile for Data” movement, similar to the Agile movement in software development, that could help offer direction across the ecosystem. Data Practices courseware is developed and maintained collaboratively by a community of experts, led by Patrick McGarry, to raise the level of data literacy across the entire data ecosystem, from novice to expert practitioners.

Professors who teach future industry data scientists are developing pedagogies focused on ethical technology; this paves the way for training data scientists who center their work on ethical considerations. In 2018, Casey Fiesler created a list of ethical data science courses. Omidyar Network, in collaboration with Mozilla, Schmidt Futures and Craig Newmark Philanthropies, launched the Responsible Computer Science Challenge, which supports the conceptualization, development, and piloting of curricula that integrate ethics with undergraduate computer science training. One GDEP community member, Giulio Valentino Dalla Riva of the University of Canterbury in New Zealand, created The Trustworthy Data Scientist, a course built around the ethical principles generated in our community-based effort.

At conferences, on Twitter, and in the classroom, professors discuss ways to drive ethical thinking in technology training. Lilian Huang, currently a graduate student at the University of Chicago, said she believes that “there is strong interest, within academia and beyond, in building ethical frameworks that support and structure how we think about data and algorithms.” Newer institutions such as the AI Now Institute and Data & Society are combining academic research with practice and policy advocacy.

Industry players have also joined the conversation on ethical technology, in part because of community-generated, tech-worker-led efforts. Members of the GDEP community have focused on working within their respective companies to drive changes within existing programs, aligning themselves with Human Resources, Compliance and Diversity & Inclusion initiatives already underway. Success has often depended on the company’s existing values, and whether or not a company was willing to allocate the time, personnel and resources to the application of ethical principles to its technology and business practices.

Last year, Salesforce hired Paula Goldman as its first-ever Chief Ethical and Humane Use Officer. While at Omidyar Network, she was a key advocate for a data oath for data practitioners and part of the larger code of ethics effort. Natalie Evans Harris is using the GDEP principles and framework to shape the governance agreements for all of BrightHive’s data trusts. Each member of the trust signs an amendment which includes a base set of principles, plus additional ones that are unique to their data.

Our Ethical Futures

Looking back on this two-year anniversary, I feel that the ethics project established a starting point for me and many other volunteers, many of whom I now consider close friends, to join a movement to imagine a world built on consensus-based, inclusive and participatory ethical values. Many of us connected because we felt there was a need for a collaborative, extra-institutional place to share perspectives, initiatives and tools. We had been involved in local efforts (at our companies, in city government, at our civic tech organizations) and wanted to share challenges, support each other and build from the ground up. We wanted ethics to be a key part of the conversation about technology. I’m grateful that I am now part of this community.

I believe that we need to continue doing the work to expand the conversation about technology beyond technologists, and include social scientists, community organizers, historians, students and communities most affected by the unintended consequences of technology and policy initiatives. We must also make efforts to actively include, listen to and collaborate with initiatives outside of the English-speaking world. This requires tools for digital cooperation, empathy, and the practice of ceding power and elevating others’ voices.

One of GDEP’s principles is this: “Ensure that all data practitioners take responsibility for exercising ethical imagination in their work, including considering the implication of what came before and what may come after, and actively working to increase benefit and prevent harm to others.” Building on that principle and incorporating ideas from speculative fiction, some community members and I built a writing prompt generator called the Ethical Imagination Folded Futures Story Project. We hope this project seeds the idea that the ways we build technologies are intimately connected to how we imagine our collective future. Perhaps we can use this vision to inform and guide our present-day behavior. For me, this means deeply acknowledging and integrating human rights frameworks and long-standing civil rights and social justice movements into ethical thinking.

In the Folded Futures Story Project, we also wrote: “We want to see groups of people collaborating to save the world rather than a lone hero. This means building new narratives that place less emphasis on the work of individuals or even small groups, and instead place greater emphasis on world-building, and global systems that form and evolve over generations to meet and respect societal needs.” I feel that our shared responsibility is our biggest strength as we enact change in the tech ecosystem, and the world in which it is embedded. We are many voices with a diversity of opinions, experiences and perspectives about ethics in data and tech. Our collaborative disagreements and consensus-building push the edges of our ethical thinking, and any leadership I bring to the table is rooted in the plurality of our collective voices.

Ultimately, the GDEP community effort is one of many grassroots initiatives in the global conversation about ethics in data, algorithms and artificial intelligence technologies. The act of creating and sustaining a group of engaged data practitioners provided lessons in distributed community building. It is important to consider that a community of practice could open the possibility for radical, inclusive affiliation built around a commitment to an ethical and collaborative technology practice, not limited to direct mandates, principles or checklists. At the very least, we have learned about collaborative processes that are useful for establishing a consensus on ethical norms. In particular, we have learned that as technology evolves, our understanding of ethics must evolve as well.

Acknowledgements: Many community members read this post and offered thoughtful feedback and edits. This work, like most work done in the world, was done with the support of many.