Bridging the Data Science Gap
DataKind connects data scientists with social organizations to maximize their impact.
Data scientists want to contribute to the public good. Social organizations often boast large caches of data but have neither the resources nor the skills to glean insights from them. In the worst-case scenario, the information becomes data exhaust, lost to neglect, lack of space, or outdated formats. Jake Porway, Data Without Borders founder and The New York Times data scientist, explored how to bridge this gap during the second Big Data for the Public Good seminar, hosted by Code for America and sponsored by Greenplum, a division of EMC.
Code for America founder Jennifer Pahlka opened the seminar with an appeal to the data practitioners in the room to volunteer for social organizations and civic coding projects. She pointed to hackathons, such as the ones organized during the nationwide event Code Across America, as examples of the emergence of a new kind of “third place”, referencing sociologist Ray Oldenburg’s theory that the health of a civic society depends upon shared public spaces that are neither home nor work. Hackathons, civic action networks like the recently announced Code for America Brigade, and social organizations are all tangible third spaces where data scientists can connect with community while contributing to the public good.
These principles are core to the DataKind mission. “Anytime there’s a process, there’s data,” Porway emphasized to the audience. Yet much of what is generated is lost, particularly in the third world, where a great amount of information goes unrecorded. In some cases, the social organizations that often operate on shoestring budgets may not even appreciate the value of what they’re losing. Meanwhile, many data scientists working in the private sector want to contribute their skills for the social good in their off-time. “On the one hand, we have a group of people who are really good at looking at data, really good at analyzing things, but don’t have a lot of social outputs for it,” Porway said. “On the other hand, we have social organizations that are surrounded by data and are trying to do really good things for the world but don’t have anybody to look at it.”
To facilitate these connections, DataKind connects “expert data scientists with social organizations to maximize their impact” through collaborations with organizations in need, fellowships, and weekend data dives. To emphasize the vast disconnect between social organizations and the field of data science, Porway pointed to work DataKind did with the New York Civil Liberties Union to analyze and visualize “stop-and-frisk” incidents recorded by the New York Police Department in 2010 to determine whether there was a trend of racial profiling. Displaying the resulting maps, Porway said to the data scientists in the room, “I know what you’re thinking: it’s just a map. But what’s easy for you guys to do is transformative for social organizations.”
Such collaborations can also bolster the job market for data scientists: Porway noted that as an increasing number of organizations recognize the value of deep data dives, organizations such as United Nations Global Pulse are hiring teams of dedicated researchers.
During his talk and the lively question-and-answer session that followed, Porway acknowledged the challenges ahead: in some cases, organizations may resist opening their data to outsiders, fearing that some internal information could be used against them rather than to serve the organization’s mission. Porway stated that such non-profits need to be convinced that the information released will be leveraged to serve the greater good.
He warned that when “data and skills are silo’d from one another”—when organizations and those who can analyze data operate separately—the results often lack focus and discourage civic engagement. Pointing to the Obama Administration’s mandate for government agencies to release open data through Data.gov, Porway said that such data dumps are “like giving crude oil to people…open data is not useable data.” In a case like Data.gov, Porway explained, the issue is that the government agencies often have no idea who would want their data sets and how they could be used. This is a problem that can be addressed by engaging government agencies, social organizations, and data scientists in an ongoing dialogue. “By bridging these communities, you’re starting to make that data useable,” he said.
Looking ahead, Porway expressed a pragmatic but optimistic view of the future. He’s excited by the ever-increasing amount of clean and accessible global data generated by mobile devices, and advancements in sentiment analysis for audio and video. But the most fundamental cultural and civic shift is the network of “transformative communities” emerging from the breakdown of silos separating government data, social organizations, journalists, and data scientists. In Porway’s view, the top-down world of “big silos” is being replaced by “a bottom-up world” where “select groups within those communities are coming together for a common goal and sharing across those boundaries to do more.”
