About Data Commons

Why Data Commons

Publicly available data from open sources (census.gov, cdc.gov, data.gov, etc.) are vital resources for students and researchers in a variety of disciplines. Unfortunately, processing these datasets is often tedious and cumbersome. Each organization follows its own practices for codifying datasets, so combining data from different sources requires mapping common entities (city, county, etc.) and reconciling different types of keys and identifiers. This work is time-consuming and has to be redone again and again. Our goal with Data Commons is to address this problem.

Data Commons synthesizes a single graph from these different data sources. It links references to the same entity (such as a city, county, or organization) across different datasets to a single node on the graph, so that users can access data about a particular entity aggregated from different sources without cleaning or joining the data themselves. We hope the data contained within Data Commons will be useful to students, researchers, and enthusiasts across different disciplines.
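The idea of linking differently keyed records to one node can be illustrated with a minimal Python sketch. This is not the Data Commons implementation or API: the source names, key fields, node ID, mapping table, and all values below are hypothetical placeholders chosen only to show the concept.

```python
from collections import defaultdict

# Hypothetical records from two sources that describe the same county
# but identify it with different keys. All values are placeholders.
source_a = [{"area_code": "A-001", "population": 1000}]
source_b = [{"county_name": "Example County", "median_age": 40}]

# Hypothetical lookup from (key field, key value) to one shared node ID.
# Building such a mapping is the entity-resolution step described above.
KEY_TO_NODE = {
    ("area_code", "A-001"): "node/example-county",
    ("county_name", "Example County"): "node/example-county",
}


def build_graph(sources):
    """Merge records from several sources into one node-keyed dictionary.

    `sources` is a list of (source_name, key_field, records) tuples.
    Each property is stored with the name of the source it came from.
    """
    graph = defaultdict(dict)
    for source_name, key_field, records in sources:
        for record in records:
            node_id = KEY_TO_NODE[(key_field, record[key_field])]
            for prop, value in record.items():
                if prop != key_field:
                    graph[node_id][prop] = {"value": value, "source": source_name}
    return dict(graph)


graph = build_graph([
    ("source_a", "area_code", source_a),
    ("source_b", "county_name", source_b),
])

# Both properties are now reachable from the single shared node.
node = graph["node/example-county"]
print(node["population"]["value"], node["median_age"]["value"])
```

Once the keys are resolved to a shared node, a reader of the merged structure no longer needs to know which identifier scheme each source used.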

Who can use it?

Anyone can access Data Commons through the tools available on datacommons.org. Students, researchers and developers can use the REST and Python APIs, both of which are free for educational, academic and journalistic research purposes. At this point, these APIs do not require an API key.

The data may also be accessed via the SPARQL query interface, which does require an API key.


Collaborations

Data Commons has benefited greatly from many collaborations. In addition to help from the US Department of Commerce (notably the Census Bureau), we have received help from our many academic collaborators, including UC San Francisco, Stanford University, UC Berkeley and Harvard.

We are looking for more collaborators, both for adding new data to Data Commons and for building new and interesting applications of Data Commons. Contact us if you are interested in working with us.

Advisory Board

We are fortunate to have the counsel of our Advisory Board, which includes:

  • Gary King, Director of the Institute for Quantitative Social Science at Harvard University.
  • Arun Majumdar, Director, Precourt Institute for Energy.
  • Sendhil Mullainathan, Roman Family University Professor of Computation and Behavioral Science at Chicago Booth.
  • Alfred Spector, Former head of research at Google.
  • Hal Varian, Chief Economist, Google.
