Individual Variable Priority: A Model-Independent Local Gradient Method for Variable Importance

ivarpro(object,
        cut = seq(.05, 1, length=21),
        nmin = 20, nmax = 150,
        y.external = NULL,
        noise.na = TRUE,
        papply = mclapply,
        max.rules.tree = 150,
        max.tree = 150)

Arguments

object

VarPro object from a previous call to varpro, or a rfsrc object.

cut

Sequence of \(\lambda\) values used to relax the constraint region in the local linear regression model. Calibrated so that cut = 1 corresponds to one standard deviation of the release coordinate.

nmin

Minimum number of observations required for fitting a local linear model.

nmax

Maximum number of observations allowed for fitting a local linear model.

y.external

Optional user-supplied response vector to use as the dependent variable in the local linear regression. Must match the dimension and type expected for the outcome family.

noise.na

Logical. If TRUE (default), gradients for noisy or non-signal variables are set to NA; if FALSE, they are set to zero.

papply

Apply method; either mclapply (parallel, from the parallel package) or lapply (serial).

max.rules.tree

Optional. Maximum number of rules per tree. If unspecified, the value from the VarPro object is used.

max.tree

Optional. Maximum number of trees used for rule extraction. If unspecified, the value from the VarPro object is used.
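
The practical difference between the two noise.na settings shows up when gradients are summarized. The base R sketch below assumes the returned gradients can be treated as a table with one row per observation and one column per variable (consistent with how the examples build their plotting data). The object g and its values are hypothetical stand-ins, not output from ivarpro.

```r
## hypothetical stand-in for a table of estimated gradients
## (rows = observations, columns = variables); values are made up
g <- data.frame(x1 = c(0.0, 6.8, 7.1, NA),
                x2 = c(14.9, 7.2, NA, 0.1),
                x3 = NA_real_)

## emulate the zero-filled convention (noise.na = FALSE) after the fact
g0 <- g
g0[is.na(g0)] <- 0

## mean absolute gradient per variable under each convention
mean.abs.na   <- colMeans(abs(g), na.rm = TRUE)  ## x3 is NaN: no estimate at all
mean.abs.zero <- colMeans(abs(g0))               ## x3 is 0; other means shrink
rbind(mean.abs.na, mean.abs.zero)
```

Keeping NA makes it explicit that no gradient was estimated, whereas zeros pull averages toward zero, so the choice matters mainly for downstream summaries.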

Details

Understanding individual-level variable importance is critical in applications where personalized decisions are required. Traditional variable importance methods focus on average (population-level) effects and often fail to capture heterogeneity across individuals. In many real-world problems, it is not sufficient to determine whether a variable is important on average; we must also understand how it affects individual predictions.

The VarPro framework identifies feature-space regions through rule-based splitting and computes importance using only observed data. This avoids biases introduced by permutation or synthetic data, leading to robust, population-level importance estimates. However, VarPro does not directly capture individual-level effects.

To address this limitation, individual variable priority (iVarPro) extends VarPro by estimating the local gradient of each feature, quantifying how small changes in a variable influence an individual's predicted outcome. These gradients serve as natural measures of sensitivity and provide an interpretable notion of individualized importance.

iVarPro leverages the release region concept from VarPro. A region \(R\) is first defined using VarPro rules. Because the data within \(R\) alone often provide too small a sample for stable gradient estimation, iVarPro releases \(R\) along a coordinate \(s\): the constraint on \(s\) is removed while all other constraints are held fixed. This yields additional variation specifically in the \(s\)-direction, which is precisely what is needed to compute directional derivatives.
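
The effect of releasing a coordinate can be illustrated in base R. This is a conceptual sketch only, not the varpro implementation: the box-shaped region, its bounds, and the helper in.box are all hypothetical choices made for illustration.

```r
## Conceptual sketch only: not how varpro builds its regions.
set.seed(1)
n <- 500
x <- data.frame(x1 = runif(n), x2 = runif(n), x3 = runif(n))

## hypothetical helper: is v inside [lo, hi]?
in.box <- function(v, lo, hi) v >= lo & v <= hi

## hypothetical rule-defined region R: a box constraint on every coordinate
R <- in.box(x$x1, .3, .7) & in.box(x$x2, .3, .7) & in.box(x$x3, .3, .7)

## release coordinate s = x1: drop its constraint, keep all the others
R.released <- in.box(x$x2, .3, .7) & in.box(x$x3, .3, .7)

## sample size and spread of x1 before and after the release
n.region   <- c(region = sum(R), released = sum(R.released))
spread.x1  <- c(region = diff(range(x$x1[R])),
                released = diff(range(x$x1[R.released])))
n.region
spread.x1
```

The released region contains every point of the original region, plus points that differ from it only in the released coordinate, so both the sample size and the spread of x1 can only increase.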

Local gradients are then estimated via linear regression on the expanded region. The parameter cut controls the amount of constraint relaxation. A value of cut = 1 corresponds to one standard deviation of the release coordinate, calibrated automatically from the data.
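
As a rough illustration of the idea, the base R sketch below fits a linear model on a partially released region and reads off the slope in the release direction. It is a conceptual sketch under stated assumptions, not the varpro implementation: the box region, the rule "widen the constraint by lambda times the standard deviation of the release coordinate", and the handling of nmin are illustrative guesses at the mechanism described above.

```r
## Conceptual sketch only: not the varpro implementation.
set.seed(1)
n <- 2000
d <- data.frame(x1 = runif(n), x2 = runif(n))
d$y <- 6 * d$x1 * d$x2 + rnorm(n, sd = .1)  ## true gradient in x1 is 6 * x2

## hypothetical region R: a small box centred at (0.5, 0.5)
half <- .05
keep.x2 <- abs(d$x2 - .5) <= half           ## constraint that stays fixed

## relax the x1 constraint by lambda * sd(x1) for a grid of lambda values
lambda <- c(.25, .5, 1)
nmin <- 20                                  ## skip fits with too few observations
slope <- sapply(lambda, function(l) {
  pt <- keep.x2 & abs(d$x1 - .5) <= half + l * sd(d$x1)
  if (sum(pt) < nmin) return(NA_real_)
  unname(coef(lm(y ~ x1, data = d[pt, ]))["x1"])
})
data.frame(lambda = lambda, slope = slope)
```

Since x2 stays close to 0.5 inside the region, the fitted slopes should be near the true gradient 6 * 0.5 = 3. Larger values of lambda use more data and give more stable slopes, at the cost of averaging over a wider stretch of the release coordinate.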

The flexibility of this framework makes it suitable for quantifying individual-level importance in regression, classification, and survival settings.

Author

Min Lu and Hemant Ishwaran

References

Lu, M. and Ishwaran, H. (2025). Individual variable priority: a model-independent local gradient method for variable importance.

Examples

# \donttest{
## ------------------------------------------------------------
##
## synthetic regression example 
##
## ------------------------------------------------------------

## true regression function
true.function <- function(which.simulation) {
  if (which.simulation == 1) {
    function(x1,x2) {1*(x2<=.25) +
      15*x2*(x1<=.5 & x2>.25) + (7*x1+7*x2)*(x1>.5 & x2>.25)}
  }
  else if (which.simulation == 2) {
    function(x1,x2) {r=x1^2+x2^2;5*r*(r<=.5)}
  }
  else {
    function(x1,x2) {6*x1*x2}
  }
}

## simulation function
simfunction <- function(n = 1000, true.function, d = 20, sd = 1) {
  d <- max(2, d)
  X <- matrix(runif(n * d, 0, 1), ncol = d)
  dta <- data.frame(list(x = X, y = true.function(X[, 1], X[, 2]) + rnorm(n, sd = sd)))
  colnames(dta)[1:d] <- paste("x", 1:d, sep = "")
  dta
}

## iVarPro importance plot
ivarpro.plot <- function(dta, release=1, combined.range=TRUE,
                     cex=1.0, cex.title=1.0, sc=5.0, gscale=30, title=NULL) {
  x1 <- dta[,"x1"]
  x2 <- dta[,"x2"]
  x1n <- expression(x^{(1)})
  x2n <- expression(x^{(2)})
  if (release==1) {
    if (is.null(title)) title <- bquote("iVarPro Estimated Gradient " ~ x^{(1)})
    cex.pt <- dta[,"Importance.x1"]
  }
  else {
    if (is.null(title)) title <- bquote("iVarPro Estimated Gradient " ~ x^{(2)})
    cex.pt <- dta[,"Importance.x2"]
  }
  if (combined.range) {
    cex.pt <- cex.pt / max(dta[, c("Importance.x1", "Importance.x2")],na.rm=TRUE)
  }
  rng <- range(c(x1,x2))
  par(mar=c(4,5,5,1),mgp=c(2.25,1.0,0))
  par(bg="white")
  gscalev <- gscale
  gscale <- paste0("gray",gscale)
  plot(x1,x2,xlab=x1n,ylab=x2n,
       ylim=rng,xlim=rng,
       col = "#FFA500", pch = 19,
       cex=(sc*cex.pt),cex.axis=cex,cex.lab=cex,
       panel.first = rect(par("usr")[1], par("usr")[3], par("usr")[2], par("usr")[4], 
                          col = gscale, border = NA))
  abline(a=0,b=1,lty=2,col= if (gscalev<50) "white" else "black")
  mtext(title,cex=cex.title,line=.5)
}

## simulate the data
which.simulation <- 1
df <- simfunction(n = 500, true.function(which.simulation))

## varpro analysis (requires the varpro package)
library(varpro)
o <- varpro(y~., df)

## canonical ivarpro analysis
imp1 <- ivarpro(o)

## ivarpro analysis with custom lambda
imp2 <- ivarpro(o, cut = seq(.05, .75, length=21))

## build data for plotting the results
df.imp1 <- data.frame(Importance = imp1, df[,c("x1","x2")])
df.imp2 <- data.frame(Importance = imp2, df[,c("x1","x2")])

## plot the results
par(mfrow=c(2,2))
ivarpro.plot(df.imp1,1)
ivarpro.plot(df.imp1,2)
ivarpro.plot(df.imp2,1)
ivarpro.plot(df.imp2,2)

# }
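
For reference when reading the plots, the gradients of the noise-free regression function used with which.simulation == 1 can be approximated numerically. This is a base R sketch: the function f is copied from the example above, while num.grad and the step size h are hypothetical additions, and the evaluation points are chosen away from the region boundaries, where the function is discontinuous.

```r
## noise-free regression function for which.simulation == 1
f <- function(x1, x2) {
  1 * (x2 <= .25) +
    15 * x2 * (x1 <= .5 & x2 > .25) +
    (7 * x1 + 7 * x2) * (x1 > .5 & x2 > .25)
}

## hypothetical helper: central finite differences with a small step h
num.grad <- function(f, x1, x2, h = 1e-4) {
  c(x1 = (f(x1 + h, x2) - f(x1 - h, x2)) / (2 * h),
    x2 = (f(x1, x2 + h) - f(x1, x2 - h)) / (2 * h))
}

## one point from the interior of each of the three regions
true.grad <- rbind(lower       = num.grad(f, .50, .10),
                   upper.left  = num.grad(f, .25, .60),
                   upper.right = num.grad(f, .75, .60))
round(true.grad, 3)
```

The function is constant where x2 <= .25, depends only on x2 (slope 15) in the upper-left region, and has slope 7 in both coordinates in the upper-right region, which is the pattern the estimated gradients are meant to recover.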