Computational Journalism

What is computational journalism?

Ultimately, interactions among journalists, software developers, computer scientists and other scholars over the next few years will answer that question. Faculty and students at the DeWitt Wallace Center are collaborating with colleagues across the country to advance this new field and to explore and develop computational tools to support accountability journalism. These collaborations have resulted in the following publications, projects, and conferences.

 

  • New Coursework: Spring 2012 CPS 296.1, Project in Computational Journalism

Duke Computer Science professor Jun Yang taught a course aimed at developing a computational journalism project. The result: “a Web-based interface that would allow journalists to effortlessly harness the power of mTurk (as well as social networking sites like Facebook) to split up massive public record dumps into individual document[s].” For a description of how the project unfolded, see “Analyzing documents with the help of the crowd.”

 

  • Reporters’ Lab

Conceived by then-Knight Professor Sarah Cohen, the Reporters’ Lab is a research project launched in February 2011 to find ways to “make reporting cheaper without cheapening the reporting,” as Harvard’s Nieman Journalism Lab put it.

Although Cohen has since moved on to the computer-assisted reporting desk at The New York Times, her charge for the lab remains the same: to narrow the power gap between citizens and powerful institutions by arming reporters with the latest tools and techniques to mine sources for watchdog stories.

By serving as a bridge between open-source developers, investigative journalists and researchers from a variety of disciplines, the lab tests existing reporting tools, helps identify technological shortfalls in the newsgathering process and commissions or encourages the creation of time-saving tools for newsrooms of all sizes.

The lab has already had some success. Before the lab was formally launched, Duke partnered with Fernanda Viégas and Martin Wattenberg, now visualization experts at Google, to create TimeFlow, an investigative timeline and chronology tool. The open-source program is available for free and has helped journalists at ProPublica tell the story of the tragic aftermath of a roadside bomb in Iraq. It even helped a Barron’s reporter piece together a story of Russian corruption.

To tackle the prevalence of unsearchable audio and video, the lab’s then-lead developer built the Video Notebook, an open-source video and audio annotation and analysis tool currently in beta. And in a span of about 30 hours, the lab worked with hackers, journalists and community members at a San Francisco hackathon to cobble together an application called Haystax, which can help users quickly gather information from clunky online databases for public records stories.

Cohen now serves as the lab’s founding director, and the project is run by Managing Editor Tyler Dukes.

 

Sarah Cohen, Knight Professor of the Practice of Journalism and Public Policy, and James T. Hamilton, Director of the DeWitt Wallace Center for Media & Democracy, together with Fred Turner of Stanford University, authored an article exploring “how computer scientists can empower journalists, democracy’s watchdogs, in the production of news in the public interest.”

 

  • June 2010 CASBS Workshop: Tracking, Transcribing, and Tagging Government: Building Digital Records for Computational Social Science

Innovations in technology, algorithms, and data availability are leading to the development of a new field, computational social science. Across disciplines and organizations, researchers are devising new ways to take advantage of amounts of data that were once formidable to work with. In June 2010 the Center for Advanced Study in the Behavioral Sciences (CASBS) at Stanford brought together a working group composed primarily of computer scientists and political scientists for a week-long meeting titled Tracking, Transcribing, and Tagging Government: Building Digital Records for Computational Social Science.

The workshop was distinctive in three ways. First, the sessions focused research attention on data describing government activity (at all levels – global, federal, state, and local) in all parts of the policymaking process (e.g., decision-making and implementation by all branches of government, input from citizens, coverage by media). Second, the presentations generally addressed the challenges of turning unstructured data in multiple formats into large structured data sets suitable for analyses such as text mining and network analysis. Third, the workshop brought together researchers from multiple fields (computer science, political science, history, classics, and journalism) to talk about how to create digital records that would facilitate research in social science and the humanities. The workshop sessions also revealed how the tools developed to analyze government records could be adapted for use by journalists and others interested in holding public and private institutions accountable.

The workshop was organized by James T. Hamilton, Director of Duke’s DeWitt Wallace Center for Media and Democracy, and Frank R. Baumgartner, Richard J. Richardson Distinguished Professor of Political Science at the University of North Carolina at Chapel Hill.

Workshop Materials:

Workshop Description

Workshop Agenda and Session Abstracts

Workshop Participants

Presentations and Readings

 

  • July 2009 CASBS Workshop: Developing the Field of Computational Journalism

Final Report: Accountability Through Algorithm: Developing the Field of Computational Journalism

(Spanish version)

This report describes how computational approaches, such as the development of a suite of open source reporting tools, can make it easier for reporters and citizens to hold government accountable. Drawing from discussions held at a workshop hosted by the Center for Advanced Study in the Behavioral Sciences (CASBS) at Stanford in July 2009, James T. Hamilton and Fred Turner lay out the roles that foundations, government agencies, academic research centers, nonprofits, open source developers, journalists, and readers can play in the evolution of this new field.

Conference materials:

Agenda and discussion topics

Participant Bios

Readings and Links

Research Exercises

Press Release

 

  • Jimmy Shedlick: 2010 Computation & Journalism Summer Internship

During the summer of 2010, Relevance, Inc. and the DeWitt Wallace Center enlisted Duke undergraduate Jimmy Shedlick (’11) through a DukeEngage internship to join a team of journalists and software developers in creating open source software tools to enhance the practice of watchdog journalism. Through his internship, Shedlick worked on a news aggregator designed to help journalists discover stories more easily. Shedlick’s work was profiled in Duke Today.

 

Related links from the field of Computational Journalism:

Institute for Quantitative Social Science, Harvard University

Workshop on Computer-Based Text Coding, August 15-17, 2007, Penn State, Department of Political Science

Studying Society in a Digital World, April 23-25, 2009, Center for Information Technology Policy, Princeton University

Networks and Network Analysis for the Humanities, August 15-27, 2010, An NEH Institute for Advanced Topics in Digital Humanities

The Transparent Text Symposium, September 21-22, 2009, IBM Center for Social Software, Cambridge, Massachusetts

Open Government: Defining, Designing, and Sustaining Transparency, January 21-22, 2010, Center for Information Technology Policy, Princeton University

2008 Georgia Tech Symposium on Computation and Journalism

Deep Throat Meets Data Mining

Tracking Toxics When the Data Are Polluted

The Golden Age of Computer-Assisted Reporting Is at Hand

Using Data Visualization as a Reporting Tool Can Reveal Story’s Shape