Great R packages for data import, wrangling & visualization

Datetime: 2016-08-23 02:41:51    Topic: R Program, Data Visualization, DataBase

One of the great things about R is the thousands of packages users have written to solve specific problems in various disciplines -- analyzing everything from weather or financial data to the human genome -- not to mention analyzing computer security-breach data.

Some tasks are common to almost all users, though, regardless of subject area: data import, data wrangling and data visualization. The table below shows my favorite go-to packages for each of these three tasks (plus a few miscellaneous ones tossed in). The package names in the table are clickable if you want more information. To find out more about a package once you've installed it, type help(package = "packagename") in your R console (of course substituting the actual package name).
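As a rough sketch of that lookup, the snippet below inspects a package without opening the help viewer. It uses the stats package only because it ships with every R installation; the choice of package and the variable names are assumptions for illustration.

```r
# Interactive equivalent from the text (opens the package index):
# help(package = "stats")

# Non-interactive look at the same package: its metadata and exported functions.
desc <- packageDescription("stats")
desc$Package
desc$Version

exported <- getNamespaceExports("stats")
head(sort(exported))
```

Swap "stats" for any installed package name to explore it the same way.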

My favorite R packages for data visualization and munging

Package | Category | Description | Sample Use | Author
devtools | package development, package installation | While devtools is aimed at helping you create your own R packages, it's also essential if you want to easily install other packages from GitHub. Install it! Requires Rtools on Windows and Xcode on a Mac. On CRAN. | install_github("rstudio/leaflet") | Hadley Wickham & others
installr | misc | Windows only: Update your installed version of R from within R. On CRAN. | updateR() | Tal Galili & others
reinstallr | misc | Seeks to find packages that had previously been installed on your system and need to be re-installed after upgrading R. On GitHub. | reinstallr() | Calli Gross
readxl | data import | Fast way to read Excel files in R, without dependencies such as Java. CRAN. | read_excel("my-spreadsheet.xls", sheet = 1) | Hadley Wickham
googlesheets | data import, data export | Easily read data into R from Google Sheets. CRAN. | mysheet <- gs_title("Google Spreadsheet Title"); mydata <- gs_read(mysheet, ws = "WorksheetTitle") | Jennifer Bryan
RMySQL | data import | Read data from a MySQL database into R. There are similar packages for other databases. CRAN. | con <- dbConnect(RMySQL::MySQL(), group = "my-db"); myresults <- dbSendQuery(con, "SELECT * FROM mytable") | Jeroen Ooms & others
MonetDBLite | data import & storage | Speedy SQL database that installs and runs as an R package. CRAN. | See Anthony Damico's tutorial | Hannes Muehleisen, Anthony Damico & others
readr | data import | Base R handles most of these functions; but if you have huge files, this is a speedy and standardized way to read tabular files such as CSVs into R data frames, as well as plain text files into character strings with read_file. CRAN. | read_csv("myfile.csv") | Hadley Wickham
rio | data import, data export | rio has a good idea: Pull a lot of separate data-reading packages into one, so you just need to remember 2 functions: import and export. CRAN. | import("myfile") | Thomas J. Leeper & others
psych | data analysis | No, I'm not using the functions that analyze personality data; but I do regularly use the describe and describeBy functions to summarize data sets, as well as read.clipboard to get data I've copied into R. CRAN. | describe(mydf) | William Revelle
sqldf | data wrangling, data analysis | Do you know a great SQL query you'd use if your R data frame were in a SQL database? Run SQL queries on your data frame with sqldf. CRAN. | sqldf("select * from mydf where mycol > 4") | G. Grothendieck
jsonlite | data import, data wrangling | Parse JSON within R or turn R data frames into JSON. CRAN. | myjson <- toJSON(mydf, pretty=TRUE); mydf2 <- fromJSON(myjson) | Jeroen Ooms & others
XML | data import, data wrangling | Many functions for elegantly dealing with XML and HTML, such as readHTMLTable. CRAN. | mytables <- readHTMLTable(myurl) | Duncan Temple Lang
httr | data import, data wrangling | An R interface to HTTP protocols; useful for pulling data from APIs. See the httr quickstart guide. CRAN. | r <- GET(""); content(r, "text") | Hadley Wickham
quantmod | data import, data visualization, data analysis | Even if you're not interested in analyzing and graphing financial investment data, quantmod has easy-to-use functions for importing economic as well as financial data from sources like the Federal Reserve. CRAN. | getSymbols("AITINO", src="FRED") | Jeffrey A. Ryan
rvest | data import, web scraping | Web scraping: Extract data from HTML pages. Inspired by Python's Beautiful Soup. Works well with Selectorgadget. CRAN. | See the package vignette | Hadley Wickham
dplyr | data wrangling, data analysis | The essential data-munging R package when working with data frames. Especially useful for operating on data by categories. CRAN. | See the intro vignette | Hadley Wickham
plyr | data wrangling | While dplyr is my go-to package for wrangling data frames, the older plyr package still comes in handy when working with other types of R data such as lists. CRAN. | llply(mylist, myfunction) | Hadley Wickham
reshape2 | data wrangling | Change data row and column formats from "wide" to "long"; turn variables into column names or column names into variables and more. The tidyr package is a newer, more focused option, but I still use reshape2. CRAN. | See my tutorial | Hadley Wickham
tidyr | data wrangling | While I still prefer reshape2 for general re-arranging, tidyr won me over with specialized functions like fill (fill in missing values in a column from the data above) and replace_na. CRAN. | See examples in this blog post. | Hadley Wickham
validate | data wrangling | Intuitive data validation based on rules you can define, save and re-use. CRAN. | See the introductory vignette. | Mark van der Loo & Edwin de Jonge
data.table | data wrangling, data analysis | Popular package for heavy-duty data wrangling. While I typically prefer dplyr, data.table has many fans for its speed with large data sets. CRAN. | Useful tutorial | Matt Dowle & others
stringr | data wrangling | Numerous functions for text manipulation. Some are similar to existing base R functions but in a more standard format, including working with regular expressions. Some of my favorites: str_pad and str_trim. CRAN. | str_pad(myzipcodevector, 5, "left", "0") | Hadley Wickham
lubridate | data wrangling | Everything you ever wanted to do with date arithmetic, although understanding & using available functionality can be somewhat complex. CRAN. | mdy("05/06/2015") + months(1). More examples in the package vignette | Garrett Grolemund, Hadley Wickham & others
zoo | data wrangling, data analysis | Robust package with a slew of functions for dealing with time series data; I like the handy rollmean function with its align = "right" and fill = NA options for calculating moving averages. CRAN. | rollmean(mydf, 7) | Achim Zeileis & others
editR | data display | Interactive editor for R Markdown documents. Note that R Markdown Notebooks are another useful way to generate Markdown interactively. editR is on GitHub. | editR("path/to/myfile.Rmd") | Simon Garnier
knitr | data display | Add R to a markdown document and easily generate reports in HTML, Word and other formats. A must-have if you're interested in reproducible research and automating the journey from data analysis to report creation. CRAN. | See the Minimal Examples page. | Yihui Xie & others
listviewer | data display, data wrangling | Elegant way to view complex nested lists within R. GitHub timelyportfolio/listviewer. | jsonedit(mylist) | Kent Russell
DT | data display | Create a sortable, searchable table in one line of code with this R interface to the jQuery DataTables plug-in. GitHub rstudio/DT. | datatable(mydf) | RStudio
ggplot2 | data visualization | Powerful, flexible and well-thought-out dataviz package following 'grammar of graphics' syntax to create static graphics, but be prepared for a steep learning curve. CRAN. | qplot(factor(myfactor), data=mydf, geom="bar", fill=factor(myfactor)). See my searchable ggplot2 cheat sheet and time-saving code snippets. | Hadley Wickham
dygraphs | data visualization | Create HTML/JavaScript graphs of time series - one-line command if your data is an xts object. CRAN. | dygraph(myxtsobject) | JJ Allaire & RStudio
googleVis | data visualization | Tap into the Google Charts API using R. CRAN. | mychart <- gvisColumnChart(mydata). Numerous examples here | Markus Gesmann & others
metricsgraphics | data visualization | R interface to the metricsgraphics JavaScript library for bare-bones line, scatterplot and bar charts. GitHub hrbrmstr/metricsgraphics. | See package intro | Bob Rudis
RColorBrewer | data visualization | Not a designer? RColorBrewer helps you select color palettes for your visualizations. CRAN. | See Jennifer Bryan's tutorial | Erich Neuwirth
leaflet | mapping | Map data using the Leaflet JavaScript library within R. GitHub rstudio/leaflet. | See my tutorial | RStudio
choroplethr | mapping | Easy ways to map data with built-in state, county, zip code and country geographic info; you can also import your own shape files. Recent update improved earlier issues with projections. CRAN. | data(df_pop_state). Free email course by pkg author | Ari Lamstein
tmap | mapping | Not the most polished-looking maps for publication or presentation, but this new package offers a very easy way to read in shape files and join data files with geographic info, as well as do some exploratory mapping. CRAN. | See the package vignette or my mapping in R tutorial | Martijn Tennekes
fitbitScraper | misc | Import Fitbit data from your account into R. CRAN. | cookie <- login(email="", password=""); df <- get_daily_data(cookie, what="steps", "2015-01-01", "2015-05-18") | Cory Nissen
rga | Web analytics | Use Google Analytics with R. GitHub skardhamar/rga. | See package README file and my tutorial | Bror Skardhamar
RSiteCatalyst | Web analytics | Use Adobe Analytics with R. GitHub randyzwitch/RSiteCatalyst. | See intro video | Randy Zwitch
roxygen2 | package development | Useful tools for documenting functions within R packages. CRAN. | See this short, easy-to-read blog post on writing R packages | Hadley Wickham & others
shiny | data visualization | Turn R data into interactive Web applications. I've seen some nice (if sometimes sluggish) apps and it's got many enthusiasts. CRAN. | See the tutorial | RStudio
flexdashboard | data visualization | If Shiny is too complex and involved for your needs, this package offers a simpler (if somewhat less robust) solution based on R Markdown. CRAN. | More info in Using flexdashboard | JJ Allaire, RStudio & others
openxlsx | misc | If you need to write to an Excel file as well as read, this package is easy to use. CRAN. | write.xlsx(mydf, "myfile.xlsx") | Alexander Walker
gmodels | data wrangling, data analysis | There are several functions for modeling data here, but the one I use, CrossTable, simply creates cross-tabs with loads of options -- totals, proportions and several statistical tests. CRAN. | CrossTable(myxvector, myyvector, prop.t=FALSE, prop.chisq = FALSE) | Gregory R. Warnes
car | data wrangling | car's recode function makes it easy to bin continuous numerical data into categories or factors. While base R's cut accomplishes the same task, I find recode's syntax to be more intuitive - just remember to put the entire recoding formula within double quotation marks. CRAN. | recode(x, "1:3='Low'; 4:7='Mid'; 8:hi='High'") | John Fox & others
rcdimple | data visualization | R interface to the dimple JavaScript library with numerous customization options. Good choice for JavaScript bar charts, among others. GitHub timelyportfolio/rcdimple. | dimple(mtcars, mpg ~ cyl, type = "bar") | Kent Russell
foreach | data wrangling | Efficient - and intuitive if you come from another programming language - for loops in R. CRAN. | foreach(i=1:3) %do% sqrt(i). Also see The Wonders of foreach | Revolution Analytics, Steve Weston
downloader | data acquisition | Wrapper for base R download function that eases dealing with files over https (although R 3.2.2 solves some of these issues as well). CRAN. | download("", "", mode = "wb") | NA
scales | data wrangling | While this package has many more sophisticated ways to help you format data for graphing, it's worth a download just for the comma(), percent() and dollar() functions. CRAN. | comma(mynumvec) | Hadley Wickham
plotly | data visualization | R interface to the Plotly JavaScript library that was open-sourced in late 2015. Graphs have a distinctive look and a promo for the Plotly site, which may not be for everyone, but it's full-featured, relatively easy to learn (especially if you know ggplot2) and includes a ggplotly() function for graphs created with ggplot2. CRAN. | d <- diamonds[sample(nrow(diamonds), 1000), ]; plot_ly(d, x = ~carat, y = ~price, text = ~paste("Clarity: ", clarity), mode = "markers", color = ~carat, size = ~carat) | Carson Sievert & others
profvis | programming | Is your R code sluggish? This package gives you a visual representation of your code line by line so you can find the speed bottlenecks. CRAN. | profvis({ your code here }) | Winston Chang & others
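The car entry in the table notes that base R's cut can do the same binning as recode. Here is a minimal sketch of that base R route, assuming whole-number scores from 1 to 10. The vector x and the break points are illustrative and not taken from the original table.

```r
# Illustrative data: whole-number scores (an assumption for this sketch).
x <- c(1, 3, 4, 7, 8, 10)

# cut() uses right-closed intervals by default: (0,3], (3,7], (7,Inf].
# For whole numbers that matches the bins 1:3, 4:7 and 8 and up.
bins <- cut(x, breaks = c(0, 3, 7, Inf), labels = c("Low", "Mid", "High"))

# Count how many values landed in each category.
table(bins)
```

The result is a factor. With non-integer input, a value such as 3.5 would land in "Mid" here, so adjust the breaks if your data is not whole numbers.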

A few important points for newbies:

To install a package from CRAN, use the command install.packages("packagename") -- of course substituting the actual package name for packagename and putting it in quotation marks. Package names, like pretty much everything else in R, are case sensitive.
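One common pattern, sketched here under the assumption that you only want to install a package when it isn't already available, is to wrap install.packages in a small check. The helper name install_if_missing is hypothetical and not part of R.

```r
# Hypothetical helper: install a CRAN package only if it is not already installed.
install_if_missing <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg)  # needs an internet connection and a CRAN mirror
  }
  # Report (invisibly) whether the package is now available.
  invisible(requireNamespace(pkg, quietly = TRUE))
}

# "stats" ships with R, so this call returns TRUE without installing anything.
available <- install_if_missing("stats")
available

# Typical use for a CRAN package from the table:
# install_if_missing("dplyr")
```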

To install from GitHub, it's easiest to use the install_github function from the devtools package, using the format devtools::install_github("githubaccountname/packagename"). That means you first want to install the devtools package on your system with install.packages("devtools"). Note that devtools sometimes needs some extra non-R software on your system -- more specifically, an Rtools download for Windows or Xcode for OS X. There's more information about devtools here.
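The two steps above can be combined into one sketch. The helper names github_spec and install_from_github are hypothetical; only install.packages and devtools::install_github come from the text. The actual install call is left commented out because it needs a network connection.

```r
# Hypothetical helper: build the "account/package" string install_github expects.
github_spec <- function(account, pkg) {
  paste(account, pkg, sep = "/")
}

# Hypothetical helper: make sure devtools is present, then install from GitHub.
install_from_github <- function(account, pkg) {
  if (!requireNamespace("devtools", quietly = TRUE)) {
    install.packages("devtools")
  }
  devtools::install_github(github_spec(account, pkg))
}

github_spec("rstudio", "leaflet")

# Requires internet access (plus Rtools or Xcode for packages with compiled code):
# install_from_github("rstudio", "leaflet")
```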

In order to use a package's function during your R session, you need to do one of two things. One option is to load it into your R session with library("packagename") or require("packagename"). The other is to call the function including the package name, like this: packagename::functionname().
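A short sketch of both options, using the tools package because it ships with R but is not attached by default. The file name and the deliberately nonexistent package name are made up for illustration.

```r
# Option 1: attach the package, then call its functions by bare name.
library("tools")
ext_attached <- file_ext("report.Rmd")

# Option 2: skip the attach step and qualify the call with the package name.
ext_qualified <- tools::file_ext("report.Rmd")

ext_attached
ext_qualified

# require() differs from library() in that it returns FALSE (with a warning)
# instead of stopping with an error when the package is missing.
# "notARealPackage123" is a made-up name used only to show that behavior.
found <- suppressWarnings(
  require("notARealPackage123", character.only = TRUE, quietly = TRUE)
)
found
```

The :: form is handy when you need a single function from a package, or when two attached packages export functions with the same name.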

Want to learn more about handling data with R? See 4 data wrangling tasks in R for advanced beginners.

