Bryon Jacob, the CTO and co-founder of Austin-based data.world, just knows there is so much more we could be doing with data. But even in 2017, he says, finding good data is tough. That’s not all that limits us as we try to better understand the data we collect: It’s hard to collaborate on big data projects. Preparing big data sets for analysis takes an inordinate amount of effort. Plus, private corporations often don’t want to share their data, and even when they do, it’s not always easy for them.
Jacob, a member of the Program Committee for this year’s Data for Good Exchange, to be held in New York on Sunday, September 24, is working on all of these problems. “Just reading through the papers, there are a lot of submissions that are focused around this area that I’m extremely interested in and which hasn’t gotten a lot of air time,” says Jacob. “How do people collaborate on data?”
Data.world, a collaboration platform for data projects, is one of the tools Jacob is using to tackle these issues, which he expects to be prominent at the conference. “If you log in to our open data platform, which anyone can do, it looks like a social network,” he explains. “You’ll see profiles of people, the datasets they want to share and the projects they are working on.”
In addition to enabling collaboration on individual projects, data.world helps researchers and others capture the metadata that’s created when they prep their data for analytics. “Eighty percent of every data project is finding the data, cleaning it, understanding it, and getting it ready,” he says. “That work is done on a project-by-project basis and thrown away.” Capturing that metadata means that the next person to use that open data set will, in effect, get a head start.
Jacob also wants private companies to use and provide open data more effectively. “It’s hard to get the flywheel moving,” he admits, “and most companies don’t immediately see why they should open up their data.”
The hurdles, he says, are classic: The risks seem high and the returns uncertain. Jacob says companies worry about legal risk and technical costs and, as a result, do nothing. He hopes data.world will lower the cost and effort required for a company to share its data with a larger community of trusted collaborators, by making it easy to publish data openly while also providing a space to work on it privately and securely.
But that’s only half the battle. As Jacob puts it, the returns for companies that choose to make their data available don’t translate readily into traditional corporate decision-making. Still, by allowing others to look at their data with a fresh set of eyes, Jacob says, companies can get insights that they’d never discover on their own.
There can also be a public relations benefit for companies that open up their data, but Jacob says that’s generally not enough. “I think companies aren’t opposed to making data more open, but they’re not incented enough,” he says. “We have to reduce the cost and friction of doing it.”
Jacob is well versed in the benefits of getting different groups, public or private, to collaborate. The teams that tend to get the most value from data.world, he says, are those that combine experts with different perspectives: an academic, say, partnering with a mathematician and a subject matter expert, enabling the trio to attack knotty problems from a variety of angles. In that, he sees a parallel with the Data for Good Exchange. “I don’t know of any similar gathering that’s led by and explicitly targets both public sector and private sector organizations,” he says. “That’s the missing piece for getting people to communicate and work together.”
Register now for the 2017 Data for Good Exchange. We encourage you to register early and invite colleagues who may wish to attend.