by Joseph Rickert
Data Science is all about getting access to interesting data, and it is really nice when some kind soul not only points out an interesting data set but also makes it easy for you to access it. Below is a list of 17 R packages that appeared on CRAN between May 1st and August 8th that, in one way or another, provide access to publicly available data.
bigQueryR: Provides an interface to Google's BigQuery. The vignette shows how to use it.
blscrapeR: Provides an API wrapper for Bureau of Labor Statistics data sets. There is a vignette showing how to access inflation and price data, one for accessing Wages and Benefits data, and one for mapping BLS data.
dataone: The dataone R package enables R scripts to search, download and upload science data and metadata from/to the DataONE Federation. The website describes DataOne as "a community driven project providing access to data across multiple member repositories, supporting enhanced search and discovery of Earth and environmental data". The package comes with several vignettes, including this overview.
eechidna: Provides the data from the 2013 Australian Federal Election and tools to analyze it. There are several nicely done vignettes. The following plot, which shows election results by polling place, comes from the vignette on plotting polling stations.
There are also vignettes on census and election data, shapefiles and mapping Australia's Electorates.
GetHFData: Provides functions to download and aggregate high frequency trading data for Brazilian instruments directly from the Bovespa ftp site. There is a vignette to get you started. The following plot showing unemployment data by state comes from the vignette on Census data.
googleAnalyticsR: Provides an interface to the Google Analytics Reporting API. There is a vignette.
googleway: Provides functions to retrieve data from 6 Google Maps APIs. The vignette shows how.
gutenbergr: Search and download public domain works in the Project Gutenberg collection. The vignette shows you how to search and download public domain texts.
ie2miscdata: Contains a collection of USGS environmental and water resources data sets. There is a vignette showing how to create plots from the data.
macleish: Provides functions to access data from the Ada & Archibald MacLeish field station in Whately, MA. The vignette shows how to obtain weather data.
muckrock: Contains public domain information on requests made by MuckRock through the US Freedom of Information Act.
osi: Provides a connector to the Open Source Initiative API, which offers machine-readable data about open source software licenses.
pewdata: Provides for reproducible, programmatic retrieval of survey data sets from the Pew Research Center. The vignette shows how to set up and use the package. Look here for an interesting poll about what Americans know about science.
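Since most of the packages above point to a vignette as the place to start, here is a minimal sketch of how you might list the vignettes a package ships from within R. The helper `list_vignettes()` and its `install_if_missing` argument are hypothetical names made up for this illustration; it relies only on `requireNamespace()` from base R and on `vignette()` and `install.packages()` from utils.

```r
# Hypothetical helper (not part of any package listed above): return a data
# frame describing the vignettes that an installed package ships.
list_vignettes <- function(pkg, install_if_missing = FALSE) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    if (!install_if_missing) {
      stop("Package '", pkg, "' is not installed.")
    }
    # Assumption: a CRAN mirror is configured and reachable.
    utils::install.packages(pkg)
  }
  # vignette(package = ) returns an object whose $results matrix has one row
  # per vignette, with columns Package, LibPath, Item and Title.
  res <- utils::vignette(package = pkg)$results
  data.frame(
    package  = res[, "Package"],
    vignette = res[, "Item"],
    title    = res[, "Title"],
    stringsAsFactors = FALSE
  )
}

# "grid" ships with R, so this call needs no download.
print(list_vignettes("grid"))
```

To try it on one of the packages in the list, something like `list_vignettes("eechidna", install_if_missing = TRUE)` should work, assuming the package is still available from your CRAN mirror. Once you know a vignette's name, `vignette("name", package = "pkg")` opens it, and `browseVignettes("pkg")` shows the same listing in a browser.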
For more packages that provide APIs to data sets, have a look at the CRAN Task View on Web Technologies and Services. For a list of interesting data sets out there in the wild, see the MRAN Data Sources page.
Editor's note: This is Joe's last post to Revolutions as a member of the Microsoft team: he is heading on for further adventures in the world of R. We want to thank Joe for his many contributions to the blog over the past 6 years, and please join us in wishing him well!