Calculate Wages and Benefits in R with blscrapeR

Datetime: 2016-08-23 01:44:35 | Topic: DataBase, R Program

(This article was first published on R Tricks – Data Science Riot!, and kindly contributed to R-bloggers)

The most difficult thing about working with BLS data is gaining a clear understanding of what data are available and what they represent. Some of the more popular data sets can be found on the BLS Databases, Tables & Calculations website. The examples below are a selection and do not cover every series or database.

Install blscrapeR

The first step in analyzing any of these data in R is to install the blscrapeR package from CRAN.

install.packages('blscrapeR')

Current Population Survey (CPS)

The CPS includes median weekly earnings by occupation, among other things.

For example, we can use blscrapeR to pull the median weekly earnings for Database Administrators and Software Developers from the BLS API.

library(blscrapeR)
# Median Usual Weekly Earnings by Occupation, Unadjusted Second Quartile.
# In current dollars
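# The public BLS API limits how many years a single request can span; a free
# registration key raises that limit, so a long range may require one.
df <- bls_api(c("LEU0254530800", "LEU0254530600"),
              startyear = 2000, endyear = 2015)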
# Plot
library(ggplot2)
ggplot(df, aes(x=date, y=value, color=seriesID)) +
    geom_line() +
    labs(title = "Median Weekly Earnings by Occupation") +
    theme(legend.position="top") +
    scale_color_discrete(name="Occupation",
        breaks=c("LEU0254530800", "LEU0254530600"),
        labels=c("Database Admins.", "Software Devs."))
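An alternative to passing `breaks` and `labels` to `scale_color_discrete()` is to attach readable names to the data before plotting, so the same labels carry over to tables and other plots. The following is a minimal base-R sketch. It assumes the data frame returned by `bls_api()` has a `seriesID` column, as in the outputs shown in this post. `occupation_labels` and `label_series()` are hypothetical names, not part of blscrapeR.

```r
# Hypothetical lookup table: CPS series IDs used above -> readable labels.
occupation_labels <- c(
  LEU0254530800 = "Database Admins.",
  LEU0254530600 = "Software Devs."
)

# Add an 'occupation' column to a bls_api()-style data frame.
# Assumes a 'seriesID' column; IDs missing from 'labels' become NA.
label_series <- function(df, labels) {
  df$occupation <- unname(labels[as.character(df$seriesID)])
  df
}

# Usage with the CPS data frame from above (requires ggplot2):
# df <- label_series(df, occupation_labels)
# ggplot(df, aes(x = date, y = value, color = occupation)) + geom_line()
```

With the labels stored in the data, the legend picks them up automatically and no `breaks` vector has to be kept in sync with the series IDs.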

Occupational Employment Statistics (OES)

The OES contains wage data similar to those in the CPS, but often with finer resolution for certain geographic areas. Unlike the CPS, the OES is an annual survey and does not provide time series data.

For example, we may want to compare the average hourly wage of Computer and Information Systems Managers in Orlando, FL, with that in San Jose, CA. Notice below that the survey returns values only for 2015.

# Computer and Information Systems Managers in Orlando, FL and San Jose, CA.
# Orlando: "OEUM003674000000011302103"
# San Jose: "OEUM004194000000011302108"
library(blscrapeR)
df <- bls_api(c("OEUM003674000000011302103", "OEUM004194000000011302108"))
head(df)

Output:

##   year period periodName value footnotes                  seriesID         date
## 1 2015    A01     Annual 68.03           OEUM003674000000011302103   2015-01-31
## 2 2015    A01     Annual 85.86           OEUM004194000000011302108   2015-01-31
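With only one observation per area, the comparison reduces to simple arithmetic on the `value` column. The sketch below rebuilds the two rows from the output above and computes the San Jose premium. It also computes an annual figure using the 2,080-hour full-time year (52 weeks of 40 hours) that OES uses to annualize hourly means. It assumes, as the example states, that `value` is an hourly mean wage.

```r
# Hourly mean wages copied from the OES output above (2015).
oes <- data.frame(
  area   = c("Orlando, FL", "San Jose, CA"),
  hourly = c(68.03, 85.86)
)

# Annualize using a full-time year of 2,080 hours (52 weeks x 40 hours).
oes$annual <- oes$hourly * 2080

# San Jose premium over Orlando, in dollars per hour and as a ratio.
premium_dollars <- diff(oes$hourly)
premium_ratio   <- oes$hourly[2] / oes$hourly[1]

print(oes)
round(premium_dollars, 2)  # 17.83
round(premium_ratio, 3)    # 1.262
```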

Another OES example is to grab the most recent annual mean wage for All Occupations in All Industries in the United States.

library(blscrapeR)
df <- bls_api("OEUN000000000000000000004")
df

Output:

##   year period periodName value footnotes                  seriesID        date
## 1 2015    A01     Annual 48320           OEUN000000000000000000004   2015-01-31
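Dividing the annual mean by the same 2,080-hour year gives an approximate hourly equivalent. That puts the national figure on the same scale as the two metro-area hourly means from the previous example. This is a rough sketch: it compares one occupation against the all-occupation national mean. The result shows how far above the overall average those managers earn, not a like-for-like occupational comparison.

```r
# Annual mean wage, all occupations, United States (2015), from the output above.
national_annual <- 48320

# Approximate hourly equivalent using a 2,080-hour full-time year.
national_hourly <- national_annual / 2080
round(national_hourly, 2)  # 23.23

# Hourly means for Computer and Information Systems Managers (previous example).
managers <- c(Orlando = 68.03, San_Jose = 85.86)

# Multiple of the national all-occupation mean for each metro figure.
round(managers / national_hourly, 2)  # Orlando 2.93, San_Jose 3.70
```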

Employer Costs for Employee Compensation

This data set includes time series data on how much employers pay for employee benefits, both as a dollar cost per hour worked and as a percentage of total compensation.

For example, to see the total cost of benefits per hour worked, along with the share of total compensation that cost represents, we could run the following script.

library(blscrapeR)
library(dplyr)
library(tidyr)
df <- bls_api(c("CMU1030000000000D", "CMU1030000000000P"))

# Spread series ids and rename columns to human readable format.
df.sp <- spread(df, seriesID, value) %>%
    rename("hourly_cost"=CMU1030000000000D, "pct_of_wages"=CMU1030000000000P)

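# Percentages are returned as percent values (e.g. 31.2) rather than proportions.
# Convert to proportions to avoid confusion. This column holds benefits as a
# share of total compensation.
df.sp$pct_of_wages <- df.sp$pct_of_wages*0.01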
df.sp

Output:

##   year period  periodName footnotes       date hourly_cost pct_of_wages
## 1 2014    Q01 1st Quarter           2014-01-31        9.97        0.312
## 2 2014    Q02 2nd Quarter           2014-02-28       10.00        0.313
## 3 2014    Q03 3rd Quarter           2014-03-31       10.07        0.313
## 4 2014    Q04 4th Quarter           2014-04-30       10.49        0.316
## 5 2015    Q01 1st Quarter           2015-01-31       10.61        0.317
## 6 2015    Q02 2nd Quarter           2015-02-28       10.47        0.315
## 7 2015    Q03 3rd Quarter           2015-03-31       10.48        0.314
## 8 2015    Q04 4th Quarter           2015-04-30       10.52        0.313
## 9 2016    Q01 1st Quarter           2016-01-31       10.70        0.315
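Two observations about this output. First, the `date` column appears to map Q01 through Q04 to the ends of January through April. For quarterly series, it is therefore safer to build dates from `year` and `period`. Second, because `pct_of_wages` is the benefit cost's share of total compensation, dividing `hourly_cost` by it gives an approximate total compensation per hour. The result is approximate because the share is rounded to three decimals. The sketch below rebuilds the table from the output above. `quarter_end()` is a hypothetical helper, not a blscrapeR function.

```r
# ECEC values copied from the output above.
ecec <- data.frame(
  year         = c(2014L, 2014L, 2014L, 2014L, 2015L, 2015L, 2015L, 2015L, 2016L),
  period       = c("Q01", "Q02", "Q03", "Q04", "Q01", "Q02", "Q03", "Q04", "Q01"),
  hourly_cost  = c(9.97, 10.00, 10.07, 10.49, 10.61, 10.47, 10.48, 10.52, 10.70),
  pct_of_wages = c(0.312, 0.313, 0.313, 0.316, 0.317, 0.315, 0.314, 0.313, 0.315)
)

# Hypothetical helper: last day of the quarter given a year and a "Q0n" period.
quarter_end <- function(year, period) {
  q <- as.integer(sub("^Q", "", period))          # "Q02" -> 2
  next_year  <- as.integer(year) + (q == 4L)      # Q4 rolls into next year
  next_month <- ifelse(q == 4L, 1L, 3L * q + 1L)  # month after the quarter ends
  # First day of the following month, minus one day.
  as.Date(sprintf("%d-%02d-01", next_year, next_month)) - 1
}

ecec$quarter_end <- quarter_end(ecec$year, ecec$period)

# Benefits = share * total compensation, so total ~= benefits / share.
ecec$implied_total_comp <- ecec$hourly_cost / ecec$pct_of_wages
ecec$implied_wages      <- ecec$implied_total_comp - ecec$hourly_cost

ecec[, c("year", "period", "quarter_end", "implied_total_comp")]
```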

National Compensation Survey-Benefits

This survey includes data on how many Americans have access to certain benefits. For example, we can see the percentage of workers with access to paid vacation and the percentage with access to health insurance through their employers.

library(blscrapeR)
library(dplyr)
library(tidyr)
df <- bls_api(c("NBU10500000000000033030", "NBU11500000000000028178"))

# Spread series ids and rename columns to human readable format.
df.sp <- spread(df, seriesID, value) %>%
    rename("pct_paid_vacation"=NBU10500000000000033030, "pct_health_ins"=NBU11500000000000028178)

# Value data are in whole numbers but represent percentages. Fix this to avoid confusion.
df.sp$pct_paid_vacation <- df.sp$pct_paid_vacation*0.01
df.sp$pct_health_ins <- df.sp$pct_health_ins*0.01
df.sp

Output:

##   year period periodName footnotes       date pct_paid_vacation   pct_health_ins
## 1 2013    A01     Annual           2013-01-31              0.74             0.72
## 2 2014    A01     Annual           2014-01-31              0.74             0.72
## 3 2015    A01     Annual           2015-01-31              0.74             0.72
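The two `*0.01` lines in the benefits example repeat the same conversion for each column. When a request returns more percentage series, a small helper can convert any set of columns at once. The following is a minimal base-R sketch. `to_proportion()` is a hypothetical name, and the inputs are the whole-number percentages implied by the output above (74 and 72).

```r
# Hypothetical helper: convert whole-number percentage columns to proportions.
to_proportion <- function(df, cols) {
  df[cols] <- lapply(df[cols], function(x) x / 100)
  df
}

# Raw NCS values as whole-number percentages (74 and 72 per the output above).
ncs <- data.frame(
  year              = c(2013L, 2014L, 2015L),
  pct_paid_vacation = c(74, 74, 74),
  pct_health_ins    = c(72, 72, 72)
)

ncs <- to_proportion(ncs, c("pct_paid_vacation", "pct_health_ins"))
ncs
```

With dplyr 1.0 or later, `mutate(across(c(pct_paid_vacation, pct_health_ins), ~ .x / 100))` does the same inside a pipeline.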

For more on blscrapeR, you can visit the package's vignettes directory.