Obtain the partial effect of x-variables from a VarPro analysis.

partialpro(object, xvar.names, nvar,
  target, learner, newdata, method = c("unsupv", "rnd", "auto"),
  verbose = FALSE, papply = mclapply, ...)

Arguments

object

VarPro object obtained from previous call to varpro.

xvar.names

Names of the x-variables to be used.

nvar

Number of variables to be used. Default is all.

target

For classification, an integer or character value specifying the class that the partial effect is computed for (defaults to the last class).

learner

Optional function for passing a user's personally constructed learner (see details below).

newdata

Optional data frame containing x-data used for the analysis. If present, partial effects are obtained for these data points. Otherwise the default action is to use the training data.

method

Isolation forest method used for unlimited virtual twins. Choices are "unsupv" (unsupervised analysis, the default), "rnd" (pure random splitting) and "auto" (auto-encoder, a type of multivariate forest). See isopro for details.

verbose

Print verbose output?

papply

Use mclapply or lapply.

...

Additional hidden options: "cut", "nsmp", "nvirtual", "nmin", "alpha", "df", "sampsize", "ntree", "nodesize", "mse.tolerance".

Details

Obtain partial effects for requested variables from a VarPro analysis. Some variables will have been filtered in the VarPro analysis and in such cases their partial effects cannot be obtained (as filtered variables are considered to be noisy variables).

Partial effects are obtained using predicted values from the random forest object built during the VarPro analysis. Unlimited Virtual Twins (UVT) are used to restrict the prediction call to partial plot data (i.e. the artificially created data used for the partial effects estimation) that meet an isolation forest criteria specified by the internal option cut, a small percentile value which filters unlikely to occur partial data. Isolation forests are constructed via the function isopro.

For regression, the partial effect is on the scale of the response; for survival it is either mortality (default) or RMST (the latter when the user specifies this in the original varpro call: see examples below); for classification it is the log-odds for the specified target class label (see examples below).

Partial effects are estimated locally using polynomial linear models fit to the predicted values from the learner. Degrees of freedom of the local polynomial model is set by option df (the default is 2 corresponding to a quadratic model).

By default, the random forest built during the VarPro analysis is used for prediction, however users can provide their own predictor using option learner. The user's learner should be trained using the training data used in the varpro analysis. learner then takes the form of a function whose input is a data frame containing feature values and whose output is the predicted value from the user's learner. For regression and survival, the output should be a vector. For classification, the output should be matrix whose columns are the predicted probability for each class label given in the order of the original class outcome. If the input is missing, the function should return the predicted values for the original training data. See the examples below which illustrate using an external: (1) random forest learner (which can sometimes be better than the random forests learner built during the VarPro analysis); (2) gradient tree boosting learner; and (3) BART (Bayesian Adaptive Regression Trees).

Author

Min Lu and Hemant Ishwaran

References

Ishwaran H. (2025). Multivariate Statistics: Classical Foundations and Modern Machine Learning, CRC (Chapman and Hall), in press.

Examples

##------------------------------------------------------------------
##
## Boston housing
##
##------------------------------------------------------------------

library(mlbench)
data(BostonHousing)
par(mfrow=c(2,3))
plot((oo.boston<-partialpro(varpro(medv~.,BostonHousing),nvar=6)))
# \donttest{

##------------------------------------------------------------------
##
## Boston housing using newdata option
##
##
##------------------------------------------------------------------

library(mlbench)
data(BostonHousing)
o <- varpro(medv~.,BostonHousing)
par(mfrow=c(2,3))
plot(partialpro(o,nvar=3))
## same but using newdata (set to first 6 cases of the training data)
plot(partialpro(o,newdata=o$x[1:6,],nvar=3))

##------------------------------------------------------------------
##
## Boston housing with externally constructed rf learner
##
##------------------------------------------------------------------

## varpro analysis
library(mlbench)
data(BostonHousing)
o <- varpro(medv~.,BostonHousing)

## default partial pro call
pro <- partialpro(o, nvar=3)

## partial pro call using built in rf learner
mypro <- partialpro(o, nvar=3, learner=rf.learner(o))

## compare the two
par(mfrow=c(2,3))
plot(pro)
plot(mypro, ylab="external rf learner")

##------------------------------------------------------------------
##
## Boston housing:  tree gradient boosting learner, bart learner
##
##------------------------------------------------------------------

if (library("gbm", logical.return=TRUE) &&
    library("BART", logical.return=TRUE)) {

## varpro analysis
library(parallel)
library(mlbench)
data(BostonHousing)
o <- varpro(medv~.,BostonHousing)

## default partial pro call
pro <- partialpro(o, nvar=3)

## partial pro call using built in gradient boosting learner
## mypro <- partialpro(o, nvar=3, learner=gbm.learner(o, n.trees=1000, n.cores=detectCores()))

## The only way to pass check-as-cran
mypro <- partialpro(o, nvar=3, learner=gbm.learner(o, n.trees=1000, n.cores=2))

## partial pro call using built in bart learner
## mypro2 <- partialpro(o, nvar=3, learner=bart.learner(o, mc.cores=detectCores()))

## The only way to pass check-as-cran
mypro2 <- partialpro(o, nvar=3, learner=bart.learner(o, mc.cores=2))

## compare the learners
par(mfrow=c(3,3))
plot(pro)
plot(mypro, ylab="external boosting learner")
plot(mypro2, ylab="external bart learner")

}

##------------------------------------------------------------------
##
## peak vo2 with 5 year rmst
##
##------------------------------------------------------------------

data(peakVO2, package = "randomForestSRC")
par(mfrow=c(2,3))
plot((oo.peak<-partialpro(varpro(Surv(ttodead,died)~.,peakVO2,rmst=5),nvar=6)))

##------------------------------------------------------------------
##
## veteran data set with celltype as a factor
##
##------------------------------------------------------------------

data(veteran, package = "randomForestSRC")
dta <- veteran
dta$celltype <- factor(dta$celltype)
par(mfrow=c(2,3))
plot((oo.veteran<-partialpro(varpro(Surv(time, status)~., dta), nvar=6)))

##------------------------------------------------------------------
##
## iris: classification analysis showing partial effects for all classes
##
##------------------------------------------------------------------

o.iris <- varpro(Species~.,iris)
yl <- paste("log-odds", levels(iris$Species))
par(mfrow=c(3,2))
plot((oo.iris.1 <- partialpro(o.iris, target=1, nvar=2)),ylab=yl[1])
plot((oo.iris.2 <- partialpro(o.iris, target=2, nvar=2)),ylab=yl[2])
plot((oo.iris.3 <- partialpro(o.iris, target=3, nvar=2)),ylab=yl[3])


##------------------------------------------------------------------
##
## iowa housing data
##
##------------------------------------------------------------------

## quickly impute the data; log transform the outcome
data(housing, package = "randomForestSRC")
housing <- randomForestSRC::impute(SalePrice~., housing, splitrule="random", nimpute=1)
dta <- data.frame(data.matrix(housing))
dta$y <- log(housing$SalePrice)
dta$SalePrice <- NULL

## partial effects analysis
o.housing <- varpro(y~., dta, nvar=Inf)
oo.housing <- partialpro(o.housing,nvar=15)
par(mfrow=c(3,5))
plot(oo.housing)

# }