Handling required and missing R packages in Microsoft R Services

Datetime:2016-08-23 01:46:00          Topic: R Program           Share

(This article was first published on R – TomazTsql , and kindly contributed toR-bloggers)

I have seen several time, that execute R code using procedure sp_execute_external_script was not valid due to missing library or library dependencies.

Problem is – in general – not solved out of the box. But can be solved using and maintaining a list of installed libraries used by Microsoft R services or by simply create a “pre-code” R code to do a check for you.

In both cases, user will end up with additional code, but it might be a good check if you are installing library in production and you run such check prior to running any relevant R code.

Let’s start with simple R code:

USE WideWorldImporters;
GO

EXECUTE sp_execute_external_script    
           @language = N'R'
          ,@script=N'library(Hmisc) 
                    df <- data.frame(rcorr(as.matrix(sp_RStats_query), 
                            type="pearson")$P)
                    OutputDataSet<-df'
          ,@input_data_1 = N'SELECT 
                     SupplierID
                    ,UnitPackageID
                    ,OuterPackageID
                FROM [Warehouse].[StockItems]'
          ,@input_data_1_name = N'sp_RStats_query'
    WITH RESULT SETS ((SupplierID NVARCHAR(200)
                    ,UnitPackageID NVARCHAR(200)
                    ,OuterPackageID NVARCHAR(200)));

This code will in my case return error message, that R Service is missing a specific library in order to execute R code – in my case library Hmisc.

So the first step to solve this issue is to declare variable for R script and parametrize  @script parameter for procedure sp_execute_external_script.

DECLARE @OutScript NVARCHAR(MAX) 
SET @OutScript =N'library(Hmisc) 
                    df <- data.frame(rcorr(as.matrix(sp_RStats_query), 
                     type="pearson")$P)
                    OutputDataSet<-df'

EXECUTE sp_execute_external_script    
           @language = N'R'
          ,@script= @OutScript
          ,@input_data_1 = N'SELECT 
                     SupplierID
                    ,UnitPackageID
                    ,OuterPackageID
                FROM [Warehouse].[StockItems]'
          ,@input_data_1_name = N'sp_RStats_query'
    WITH RESULT SETS ((
                         SupplierID    NVARCHAR(200)
                        ,UnitPackageID NVARCHAR(200)
                        ,OuterPackageID NVARCHAR(200)
                    ));

Now we need to do a string search for following patterns:

  • library (c(package1,package2))
  • library (package1),  library(package2)
  • library(package1)

All these reserved R words denote the need for particular library/libraries installation.

DECLARE @OutScript NVARCHAR(MAX) 
SET @OutScript =N'library(Hmisc)
                  library(test123) 
                    df <- data.frame(rcorr(as.matrix(sp_RStats_query), 
                      type="pearson")$P)
                    OutputDataSet<-df'

/*
***************************************************************
START: CODE FOR CHECKING AND INSTALLING MISSING R LIBRARIES
***************************************************************
*/


DECLARE @Tally TABLE (num TINYINT,R_Code NVARCHAR(MAX))
INSERT INTO @Tally VALUES (1,@OutScript)
DECLARE @libstatement NVARCHAR(MAX)
DECLARE @cmdstatement NVARCHAR(MAX)

;WITH CTE_R(num,R_Code, libname)
AS
(
SELECT
      1 AS num,
      RIGHT(R_Code, LEN(R_Code) - CHARINDEX(')', R_Code, 0)) AS  R_Code, 
      substring(R_Code, CHARINDEX('library(', R_Code, 0) + 0, CHARINDEX(')',
 R_Code, 0) - CHARINDEX('library(', R_Code, 0) + 1) AS libname
FROM @Tally
WHERE  
        CHARINDEX('(', R_Code, 0) > 0 
    AND CHARINDEX('library(',R_Code,0) > 0

UNION ALL

SELECT
     1 AS num,
     RIGHT(R_Code, LEN(R_Code) - CHARINDEX(')', R_Code, 0)) AS  R_Code,
     substring(R_Code, CHARINDEX('library(', R_Code, 0) + 0, CHARINDEX(')', 
R_Code, 0) - CHARINDEX('library(', R_Code, 0) + 1) AS libname
FROM CTE_R
WHERE 
    CHARINDEX('(', R_Code, 0) > 0 
AND CHARINDEX('library(',R_Code,0) > 0

)
, fin AS
(
SELECT TOP 1 stuff((SELECT ' install.packages(''''' + 
REPLACE(REPLACE(REPLACE(c1.libname,'library',''),')',''),'(','') + '''''
           , dependencies = T)'
              FROM CTE_R AS c1 
              WHERE 
                    c1.num = c2.num
              FOR XML PATH (''),TYPE).value('.','NVARCHAR(MAX)'),1,1,'') AS lib_stat
              FROM CTE_R AS c2
)
SELECT 
       [email protected]

= lib_stat FROM fin SET @cmdstatement = 'EXEC xp_cmdshell ''"C:\Program Files\Microsoft SQL Server\ MSSQL13.MSSQLSERVER\R_SERVICES\bin\R.EXE" cmd -e ' + @libstatement + '''' EXEC SP_EXECUTESQL @cmdstatement /* *************************************************************** END: CODE FOR CHECKING AND INSTALLING MISSING R LIBRARIES *************************************************************** */ EXECUTE sp_execute_external_script                @language = N'R'           ,@script= @OutScript           ,@input_data_1 = N'SELECT                      SupplierID                     ,UnitPackageID                     ,OuterPackageID                 FROM [Warehouse].[StockItems]'           ,@input_data_1_name = N'sp_RStats_query'     WITH RESULT SETS ((                          SupplierID    NVARCHAR(200)                         ,UnitPackageID NVARCHAR(200)                         ,OuterPackageID NVARCHAR(200)                     ));

Result in this case will be successful with correct R results and sp_execute_external_script will not return error for missing libraries.

I added a “fake” library called test123 for testing purposes if all the libraries will be installed successfully.

At the end the script generated xp_cmdshell command (in one line):

EXEC xp_cmdshell '"C:\Program Files\Microsoft SQL Server\
MSSQL13.MSSQLSERVER\R_SERVICES\bin\R.EXE" cmd 
-e install.packages(''Hmisc'') install.packages(''test123'')'

You might also experience the following error in the output of xp_cmdshell command:

In this case, go to the following location and enable write permission for this folder and subfolders.

For the end, you will always find more elegant and easy way to write R code that will inadvertently check the installation (as well as version and dependencies for library) with following R code:

if(!is.element("Hmisc", installed.packages()))
{install.packages("Hmisc", dependencies = T)
}else{library("Hmisc")}

So the original code can simply be changed to:

USE WideWorldImporters;
GO

EXECUTE sp_execute_external_script    
           @language = N'R'
          ,@script=N'if(!is.element("Hmisc", installed.packages()))
                      {install.packages("Hmisc", dependencies = T)
                        }else{library("Hmisc")}
                    df <- data.frame(rcorr(as.matrix(sp_RStats_query), 
 type="pearson")$P)
                    OutputDataSet<-df'
          ,@input_data_1 = N'SELECT 
                     SupplierID
                    ,UnitPackageID
                    ,OuterPackageID
                FROM [Warehouse].[StockItems]'
          ,@input_data_1_name = N'sp_RStats_query'
    WITH RESULT SETS ((SupplierID NVARCHAR(200)
                    ,UnitPackageID NVARCHAR(200)
                    ,OuterPackageID NVARCHAR(200)));

But unfortunately, one can not always count on the consistence of developer or data scientists or author of the code, that they will always add a simple check for library installation.

Code is available at GitHub .

Happy R-TSQLing!





About List