(This article was first published on R Tricks – Data Science Riot!, and kindly contributed to R-bloggers)
The most difficult thing about working with Bureau of Labor Statistics (BLS) data is gaining a clear understanding of what data are available and what they represent. Some of the more popular data sets can be found on the BLS Databases, Tables & Calculations website. The examples selected below do not cover every series or database.
The first step in analyzing any of these data in R is to install the blscrapeR package from CRAN.
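A minimal setup sketch for that step. It assumes the package is still available from CRAN under the name blscrapeR and installs it only if it is not already present.

```r
# Install blscrapeR from CRAN only when it is not already installed,
# then attach it for the examples that follow.
if (!requireNamespace("blscrapeR", quietly = TRUE)) {
  install.packages("blscrapeR")
}
library(blscrapeR)
```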
Current Population Survey (CPS)
The CPS includes median weekly earnings by occupation, among other things.
For example, we can use blscrapeR to pull data from the API for the median weekly earnings for Database Administrators and Software Developers.
    library(blscrapeR)

    # Median Usual Weekly Earnings by Occupation, Unadjusted Second Quartile.
    # In current dollars
    df <- bls_api(c("LEU0254530800", "LEU0254530600"), startyear = 2000, endyear = 2015)

    # Plot
    library(ggplot2)
    ggplot(df, aes(x=date, y=value, color=seriesID)) +
      geom_line() +
      labs(title = "Median Weekly Earnings by Occupation") +
      theme(legend.position="top") +
      scale_color_discrete(name="Occupation",
                           breaks=c("LEU0254530800", "LEU0254530600"),
                           labels=c("Database Admins.", "Software Devs."))
Occupational Employment Statistics (OES)
The OES contains wage data similar to that found in the CPS, but often with finer resolution for specific geographic areas. Unlike the CPS, the OES is an annual survey and does not keep time series data.
For example, we may want to compare the average hourly wage of Computer and Information Systems Managers in Orlando, FL to that of their counterparts in San Jose, CA. Notice below that the API returns values only for 2015.
    # Computer and Information Systems Managers in Orlando, FL and San Jose, CA.
    # Orlando: "OEUM003674000000011302103"
    # San Jose: "OEUM004194000000011302108"
    library(blscrapeR)
    df <- bls_api(c("OEUM003674000000011302103", "OEUM004194000000011302108"))
    head(df)

    ##   year period periodName value footnotes                  seriesID       date
    ## 1 2015    A01     Annual 68.03           OEUM003674000000011302103 2015-01-31
    ## 2 2015    A01     Annual 85.86           OEUM004194000000011302108 2015-01-31
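OES series IDs pack several codes into one 25-character string, so it can help to split them apart before comparing two series. The sketch below does this by character position. The field layout (prefix, seasonal code, area type, area, industry, occupation, data type) is an assumption based on my reading of the BLS series ID format documentation, and parse_oes_id() is a hypothetical helper, not part of blscrapeR.

```r
# Split OES series IDs into fields by character position.
# ASSUMPTION: positions follow the BLS series ID format for the "OE" prefix.
# parse_oes_id() is a hypothetical helper, not a blscrapeR function.
parse_oes_id <- function(id) {
  stopifnot(is.character(id), all(nchar(id) == 25))
  data.frame(
    seriesID   = id,
    prefix     = substr(id, 1, 2),    # survey prefix
    seasonal   = substr(id, 3, 3),    # seasonal adjustment code
    area_type  = substr(id, 4, 4),    # area type code
    area       = substr(id, 5, 11),   # area code
    industry   = substr(id, 12, 17),  # industry code
    occupation = substr(id, 18, 23),  # occupation code
    datatype   = substr(id, 24, 25),  # data type code
    stringsAsFactors = FALSE
  )
}

ids <- c("OEUM003674000000011302103", "OEUM004194000000011302108")
parsed <- parse_oes_id(ids)
parsed[, c("area", "occupation", "datatype")]
```

Under this layout the two IDs share the same occupation code but end in different data type codes (03 and 08), so it may be worth confirming against the BLS documentation that both series report the same wage measure before comparing them.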
Another OES example would be to grab the most recent annual mean wage for All Occupations in All Industries in the United States.

    library(blscrapeR)
    df <- bls_api("OEUN000000000000000000004")
    df

    ##   year period periodName value footnotes                  seriesID       date
    ## 1 2015    A01     Annual 48320           OEUN000000000000000000004 2015-01-31
Employer Cost for Employee Compensation
This data set includes time series data on how much employers pay for employee benefits as a total cost and as a percent of employee wages and salaries.
For example, if we want to see the total cost of benefits per hour worked, and what percentage of total compensation that represents, we could run the following script.
    library(blscrapeR)
    library(dplyr)
    library(tidyr)
    df <- bls_api(c("CMU1030000000000D", "CMU1030000000000P"))

    # Spread series ids and rename columns to human readable format.
    df.sp <- spread(df, seriesID, value) %>%
      rename("hourly_cost"=CMU1030000000000D, "pct_of_wages"=CMU1030000000000P)

    # Percentages are reported as whole-number percents (e.g. 31.2).
    # Convert them to proportions to avoid confusion.
    df.sp$pct_of_wages <- df.sp$pct_of_wages*0.01
    df.sp

    ##   year period  periodName footnotes       date hourly_cost pct_of_wages
    ## 1 2014    Q01 1st Quarter           2014-01-31        9.97        0.312
    ## 2 2014    Q02 2nd Quarter           2014-02-28       10.00        0.313
    ## 3 2014    Q03 3rd Quarter           2014-03-31       10.07        0.313
    ## 4 2014    Q04 4th Quarter           2014-04-30       10.49        0.316
    ## 5 2015    Q01 1st Quarter           2015-01-31       10.61        0.317
    ## 6 2015    Q02 2nd Quarter           2015-02-28       10.47        0.315
    ## 7 2015    Q03 3rd Quarter           2015-03-31       10.48        0.314
    ## 8 2015    Q04 4th Quarter           2015-04-30       10.52        0.313
    ## 9 2016    Q01 1st Quarter           2016-01-31       10.70        0.315
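In the output above, the date column places Q02 in February and Q03 in March, which suggests the quarter number is being read as a month number. If quarter-end dates are needed for plotting or joining, one option is to rebuild them from the year and period columns. The sketch below uses base R only. quarter_end_date() is a hypothetical helper, and it assumes period codes of the form "Q01" through "Q04".

```r
# Map a year and a BLS quarterly period code (e.g. "Q02") to the last
# calendar day of that quarter. quarter_end_date() is a hypothetical helper.
quarter_end_date <- function(year, period) {
  year <- as.integer(year)
  q <- as.integer(sub("^Q0?", "", period))
  stopifnot(all(q %in% 1:4))
  end_month  <- q * 3
  # Take the first day of the month after the quarter ends, minus one day.
  next_year  <- ifelse(end_month == 12, year + 1L, year)
  next_month <- ifelse(end_month == 12, 1, end_month + 1)
  as.Date(sprintf("%d-%02d-01", as.integer(next_year), as.integer(next_month))) - 1
}

ends <- quarter_end_date(c(2014, 2014, 2014, 2014), c("Q01", "Q02", "Q03", "Q04"))
ends
```

With a data frame shaped like df.sp, the same helper could be applied as df.sp$date <- quarter_end_date(df.sp$year, df.sp$period), assuming those columns keep the names shown in the output.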
National Compensation Survey-Benefits
This survey includes data on how many Americans have access to certain benefits. For example, we can see the percentage of workers who have access to paid vacation days and the percentage who have access to health insurance through their employers.
    library(blscrapeR)
    library(dplyr)
    library(tidyr)
    df <- bls_api(c("NBU10500000000000033030", "NBU11500000000000028178"))

    # Spread series ids and rename columns to human readable format.
    df.sp <- spread(df, seriesID, value) %>%
      rename("pct_paid_vacation"=NBU10500000000000033030,
             "pct_health_ins"=NBU11500000000000028178)

    # Value data are in whole numbers but represent percentages. Fix this to avoid confusion.
    df.sp$pct_paid_vacation <- df.sp$pct_paid_vacation*0.01
    df.sp$pct_health_ins <- df.sp$pct_health_ins*0.01
    df.sp

    ##   year period periodName footnotes       date pct_paid_vacation pct_health_ins
    ## 1 2013    A01     Annual           2013-01-31              0.74           0.72
    ## 2 2014    A01     Annual           2014-01-31              0.74           0.72
    ## 3 2015    A01     Annual           2015-01-31              0.74           0.72
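When several percentage columns need the same rescaling, the conversion can be applied to all of them at once instead of one line per column. The sketch below is a base R illustration on a small stand-in data frame. Its values (74 and 72) are the whole-number equivalents of the proportions printed above, and selecting columns by the "pct_" prefix is an assumption tied to the naming used in this post.

```r
# Stand-in for the spread data before rescaling; values are the whole-number
# percentages implied by the output shown in this post.
df.sp <- data.frame(
  year              = 2013:2015,
  pct_paid_vacation = c(74, 74, 74),
  pct_health_ins    = c(72, 72, 72)
)

# Convert every column whose name starts with "pct_" from percent to proportion.
pct_cols <- grep("^pct_", names(df.sp), value = TRUE)
df.sp[pct_cols] <- lapply(df.sp[pct_cols], function(x) x / 100)
df.sp
```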
For more on blscrapeR, you can visit the package's vignettes directory.