Individual Variable Priority (iVarPro) computes case-specific (individual-level) variable importance scores. For each observation in the data and for each predictor identified by the VarPro analysis, iVarPro returns a local gradient-based priority measure that quantifies how sensitive that case's prediction is to changes in that variable.
ivarpro(object,
adaptive = TRUE,
cut = NULL,
cut.max = 1,
ncut = 51,
nmin = 20, nmax = 150,
y.external = NULL,
noise.na = TRUE,
max.rules.tree = NULL,
max.tree = NULL,
use.loo = TRUE,
use.abs = FALSE,
path.store.membership = TRUE,
keep.data = TRUE)
object
A varpro object from a previous call to
varpro, or an rfsrc object.
adaptive
Logical. If FALSE and cut is not
supplied, the cut grid is constructed as seq(0, cut.max,
length.out = ncut). If TRUE (default) and cut is not
supplied, a data-adaptive upper bound for the neighborhood scale is
computed from the sample size using a simple bandwidth-style
rule-of-thumb, and cut is constructed as a sequence from 0 to
this data-adaptive maximum (subject to cut.max). This provides
a convenient way to automatically sharpen the local neighborhood for
case-specific gradients when the sample size is moderate to large.
cut
Optional user-supplied sequence of \(\lambda\) values
used to relax the constraint region in the local linear regression
model. For continuous release variables, each value in cut is
calibrated so that cut = 1 corresponds to one standard
deviation of the release coordinate. If cut is supplied, it is
used as-is and the arguments cut.max, ncut, and
adaptive are ignored. For binary or one-hot encoded release
variables, the full released region is used and cut does not
control neighborhood size.
cut.max
Maximum value of the \(\lambda\) grid used to define
the local neighborhood for continuous release variables when
cut is not supplied. By default, cut is constructed as
seq(0, cut.max, length.out = ncut) (or up to a data-adaptive
value if adaptive = TRUE). Smaller values of cut.max
yield more local, sharper case-specific gradients, while larger values
yield smoother, more global behavior.
ncut
Length of the cut grid when cut is not
supplied. The grid is constructed as seq(0, cut.max, length.out
= ncut) (or up to an adaptively chosen maximum if adaptive =
TRUE).
nmin
Minimum number of observations required for fitting a local linear model.
nmax
Maximum number of observations allowed for fitting a local
linear model. Internally, nmax is capped at 10% of the sample
size.
y.external
Optional user-supplied response vector or matrix to use as the dependent variable in the local linear regression. Must have the same number of rows as the feature matrix and match the dimension and type expected for the outcome family.
noise.na
Logical. If TRUE (default), gradients for noisy
or non-signal variables are set to NA; if FALSE, they
are set to zero.
max.rules.tree
Maximum number of rules per tree. If
unspecified, the value from the varpro object is used, while
for rfsrc objects, a default value is used.
max.tree
Maximum number of trees used to extract rules. If
unspecified, the value from the varpro object is used, while
for rfsrc objects, a default value is used.
use.loo
Logical. If TRUE (default), leave-one-out
cross-validation is used to select the best neighborhood size (i.e.,
the best value in cut) for each rule and release variable. If
FALSE, the neighborhood is chosen to use the largest available
sample that satisfies nmin and nmax.
use.abs
Logical. If TRUE, the absolute value of the gradient is used for
individual importance. Default is FALSE, which uses the signed gradient.
path.store.membership
Store the rule membership indices (OOB
case IDs) in the returned object for later ladder/band calculations?
Setting FALSE can substantially reduce memory usage when the
number of rules is large, but disables ladder-based bands in
partial.ivarpro() and prevents ivarpro_band() from being
used. Default is TRUE.
keep.data
Logical. If TRUE (default), the x and y data are saved for use in
downstream plots.
Understanding individual-level (case-specific) variable importance matters in applications where decisions are made at the level of a single person, unit, or record. A predictor may have only a modest average effect, yet be highly influential for certain cases, or the direction of its effect may differ across individuals.
The VarPro framework summarizes population-level importance by defining feature-space regions using rule-based splitting and computing importance using only observed data. iVarPro (Lu and Ishwaran, 2025) extends this idea to the individual level by quantifying how sensitive each case's prediction is to small changes in a predictor identified by the VarPro rule set.
For each VarPro rule, iVarPro considers the corresponding rule-defined region and then releases the rule along the rule's release coordinate. Intuitively, releasing a region means keeping the other rule constraints in place while allowing additional variation in the released variable, which provides the information needed to estimate a local directional effect. A simple local linear regression is then fit on this released region, and the resulting slope is used as a local, gradient-based priority score. Case-specific scores are obtained by aggregating the relevant rule-level gradients over the rules that apply to each case.
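The release-and-fit step described above can be sketched in base R. This is a minimal illustration under stated assumptions, not the package's internal code: the simulated data, the rule region in.rule, the choice of x1 as the release variable, the window scale lambda, and the use of the overall standard deviation of x1 to scale the window are all hypothetical.

```r
## Illustrative sketch only (base R); not the ivarpro implementation.
set.seed(1)
n   <- 500
dat <- data.frame(x1 = runif(n), x2 = runif(n))
dat$y <- 3 * dat$x1 + 0.5 * dat$x2 + rnorm(n, sd = 0.1)

## hypothetical rule: x1 in [0.4, 0.5] and x2 > 0.5
in.rule <- with(dat, x1 >= 0.4 & x1 <= 0.5 & x2 > 0.5)

## release x1: keep the x2 constraint, but widen the x1 window by
## lambda standard deviations of x1 (assumed scaling)
lambda   <- 0.5
width    <- lambda * sd(dat$x1)
released <- with(dat, x2 > 0.5 & x1 >= 0.4 - width & x1 <= 0.5 + width)

## local linear regression on the released region;
## the slope on x1 plays the role of the rule-level gradient
fit         <- lm(y ~ x1, data = dat[released, ])
local.slope <- unname(coef(fit)["x1"])

c(n.rule = sum(in.rule), n.released = sum(released), slope = local.slope)
```

Because the simulated response is linear in x1 with coefficient 3, the estimated slope should land near 3. Cases that fall under several rules would then combine the corresponding rule-level slopes.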
Neighborhood size and cut.max.
For continuous release variables, the size of the local neighborhood
used for slope estimation is controlled by cut (constructed from
cut.max and ncut when not supplied). Smaller neighborhoods
produce more local behavior and can better reflect sharp changes, while
larger neighborhoods produce smoother, more global behavior. When
use.loo = TRUE, the neighborhood size is chosen in a data-driven
way using a leave-one-out criterion; when use.loo = FALSE, the
choice is based on meeting the requested sample-size bounds
nmin and nmax. When adaptive = TRUE and cut is
not supplied, an additional sample-size based rule is used to limit the
maximum neighborhood scale (subject to cut.max).
For binary or one-hot encoded release variables, iVarPro interprets the
local effect as a scaled finite difference between the two levels (0 and
1), conditional on the other rule constraints; in this case cut
does not control the neighborhood along the binary coordinate.
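As a rough illustration of data-driven neighborhood selection for a continuous release variable, the base R sketch below builds the default-style cut grid, discards candidate neighborhoods that violate the sample-size bounds, and picks the cut value with the smallest leave-one-out error. The criterion shown (the PRESS identity for linear least squares) is a standard technique substituted here for illustration; the rule interval [a, b], the simulated data, and the exact form of the 10% cap on nmax are assumptions, and the package's internal criterion may differ.

```r
## Illustrative sketch only (base R); the package's internal criterion may differ.
set.seed(2)
n <- 1000
x <- runif(n)
y <- sin(4 * x) + rnorm(n, sd = 0.2)

cut.max <- 1; ncut <- 51; nmin <- 20
nmax <- min(150, floor(0.1 * n))        # assumed form of the 10% cap
cut  <- seq(0, cut.max, length.out = ncut)

## hypothetical rule interval for the release variable
a <- 0.48; b <- 0.52

## leave-one-out mean squared error of a linear fit, using the
## PRESS identity: LOO residual = residual / (1 - leverage)
loo.mse <- function(idx) {
  fit <- lm(y ~ x, data = data.frame(x = x[idx], y = y[idx]))
  mean((residuals(fit) / (1 - hatvalues(fit)))^2)
}

## score every candidate neighborhood; NA marks sizes outside [nmin, nmax]
score <- sapply(cut, function(lambda) {
  idx <- which(x >= a - lambda * sd(x) & x <= b + lambda * sd(x))
  if (length(idx) < nmin || length(idx) > nmax) return(NA_real_)
  loo.mse(idx)
})

best.cut <- cut[which.min(score)]       # which.min() skips NA entries
best.cut
```

With use.loo = FALSE, the analogous choice in this sketch would be the largest cut value whose neighborhood still satisfies the nmin and nmax bounds, that is, max(cut[!is.na(score)]).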
Cut.max ladder (neighborhood sensitivity).
Because the choice of neighborhood scale can affect the estimated local
gradients, iVarPro also records a ladder of rule-level gradient
estimates across the candidate neighborhood sizes defined by the
cut grid. These ladder values can be summarized (e.g., ranges or
quantiles) and used to visualize how sensitive case-specific gradients
are to the neighborhood choice, without repeatedly refitting iVarPro for
many different cut.max values. Ladder-based case summaries require
rule membership information; keep path.store.membership = TRUE (the
default) to enable ladder bands and related summaries, or set it to
FALSE to reduce memory usage when the number of rules is very large. See
examples below.
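The kind of ladder summary mentioned above (ranges or quantiles across candidate neighborhood sizes) can be sketched on a mock ladder matrix. Everything below is simulated for illustration: the matrix is not output from ivarpro(), and the object names simply mirror the documented components.

```r
## Illustrative sketch only: a mock ladder, not output from ivarpro().
set.seed(3)
R <- 6                                    # number of retained rules (mock)
cut.ladder <- seq(0.2, 0.8, by = 0.2)     # mock interior cut values
L <- length(cut.ladder)
rule.imp.ladder <- matrix(rnorm(R * L, mean = 1, sd = 0.3), nrow = R, ncol = L)

## per-rule range of the gradient across the ladder
ladder.range <- t(apply(rule.imp.ladder, 1, range, na.rm = TRUE))
colnames(ladder.range) <- c("lower", "upper")

## per-rule interquartile band across the ladder
ladder.q <- t(apply(rule.imp.ladder, 1, quantile,
                    probs = c(0.25, 0.75), na.rm = TRUE))

## band width: a simple measure of sensitivity to the neighborhood choice
band.width <- ladder.range[, "upper"] - ladder.range[, "lower"]
cbind(ladder.range, ladder.q, width = band.width)
```

A wide band for a rule suggests that its gradient estimate depends strongly on the neighborhood scale; a narrow band suggests the estimate is stable across the cut grid.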
Settings that are currently handled. The flexibility of this framework makes it suitable for quantifying case-specific variable importance in regression, classification, and survival settings. Currently, multivariate forests are not handled.
For univariate outcomes (and two-class classification treated as a single
score), a numeric data.frame of dimension \(n \times p\) containing
case-specific (individual-level) variable priority values, where \(n\) is
the number of observations and \(p\) is the number of predictors in
object$xvar.names.
Each row corresponds to a case (observation) in the original data.
Each column corresponds to a predictor variable in
object$xvar.names.
The entry in row \(i\) and column \(j\) is the iVarPro importance score
for variable \(j\) for case \(i\), measuring the local sensitivity of
that case's prediction to changes in that variable. Predictors that are never
used as release variables in the VarPro rule set may appear with constant
NA values (when noise.na = TRUE) or constant zero values (when
noise.na = FALSE).
Ladder/path information.
The returned object carries an attribute "ivarpro.path" containing
additional information used for ladder-based summaries and plotting. In
particular:
cut
The full cut grid used to evaluate candidate local
neighborhoods.
cut.ladder
The interior values of cut (excluding the
endpoints) used for the cut.max ladder path.
rule.imp.ladder
A numeric matrix of dimension \(R \times L\) storing rule-level gradients selected under each ladder truncation, where \(R\) is the number of retained rules and \(L\) is the length of cut.ladder.
rule.variable
Integer vector of length \(R\) giving the release-variable index for each retained rule.
oobMembership
Optional list (length \(R\)) giving the OOB
membership indices for each retained rule; included only when
path.store.membership = TRUE.
Additional tuning flags and rule metadata (e.g., use.loo,
adaptive, and tree/branch identifiers) may also be included for
diagnostics.
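The attribute layout described above can be explored with attr(). The sketch below uses a small mock object that mimics the documented structure so that it runs without fitting a forest; the values are made up, and the assumption that rule.variable indexes the columns of the returned data.frame is a guess based on the description, not something confirmed by the package.

```r
## Mock object mimicking the documented structure; not produced by ivarpro().
imp <- data.frame(x1 = c(0.8, 1.1, NA), x2 = c(-0.2, 0.1, 0.3))
attr(imp, "ivarpro.path") <- list(
  cut             = seq(0, 1, length.out = 5),
  cut.ladder      = c(0.25, 0.5, 0.75),
  rule.imp.ladder = matrix(0, nrow = 2, ncol = 3),
  rule.variable   = c(1L, 2L)
)

## retrieve the ladder/path information
path <- attr(imp, "ivarpro.path")
names(path)

## the ladder matrix has one row per rule and one column per ladder cut
dim(path$rule.imp.ladder)

## map each retained rule to a variable name
## (assumes rule.variable indexes the columns of the returned data.frame)
rule.names <- colnames(imp)[path$rule.variable]
rule.names
```

With a real result, the same attr(imp, "ivarpro.path") call applies to the object returned by ivarpro().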
Lu, M. and Ishwaran, H. (2025). Individual variable priority: a model-independent local gradient method for variable importance. Artificial Intelligence Review, 58:407.
# \donttest{
## ------------------------------------------------------------
##
## survival example with shap-like plot
##
## ------------------------------------------------------------
data(peakVO2, package = "randomForestSRC")
o <- varpro(Surv(ttodead, died)~., peakVO2, ntree = 50)
## default (adaptive) analysis
imp1 <- ivarpro(o)
shap.ivarpro(imp1)
## non-adaptive analysis
imp2 <- ivarpro(o, adaptive = FALSE)
shap.ivarpro(imp2)
## non-adaptive using a small cut.max
imp3 <- ivarpro(o, cut.max = 0.5, adaptive = FALSE)
shap.ivarpro(imp3)
## ------------------------------------------------------------
##
## synthetic regression example with partial plot
##
## ------------------------------------------------------------
## true regression function
true.function <- function(which.simulation) {
  if (which.simulation == 1) {
    function(x1, x2) {
      1 * (x2 <= .25) +
        15 * x2 * (x1 <= .5 & x2 > .25) +
        (7 * x1 + 7 * x2) * (x1 > .5 & x2 > .25)
    }
  } else if (which.simulation == 2) {
    function(x1, x2) { r <- x1^2 + x2^2; 5 * r * (r <= .5) }
  } else {
    function(x1, x2) { 6 * x1 * x2 }
  }
}
## simulation function
simfunction <- function(n = 1000, true.function, d = 20, sd = 1) {
  d <- max(2, d)
  X <- matrix(runif(n * d, 0, 1), ncol = d)
  dta <- data.frame(list(
    x = X,
    y = true.function(X[, 1], X[, 2]) + rnorm(n, sd = sd)
  ))
  colnames(dta)[1:d] <- paste("x", 1:d, sep = "")
  dta
}
## simulate the data
which.simulation <- 1
df <- simfunction(n = 500, true.function(which.simulation))
## varpro analysis
vp <- varpro(y ~ ., df)
## ivarpro analysis
imp <- ivarpro(vp)
## partial plot of x2
partial.ivarpro(imp, var="x2")
## partial plot of x2 without ladder band
partial.ivarpro(imp, var="x2", ladder=FALSE)
## optional: use only a subset of ladder cuts
partial.ivarpro(imp, var="x2", ladder=TRUE, ladder.cuts=1:8)
## partial plot with color/size using x1 (color) and y (size)
partial.ivarpro(imp, var="x2", col.var="x1", size.var="y")
## ------------------------------------------------------------
##
## survival example with partial plot
##
## ------------------------------------------------------------
data(peakVO2, package = "randomForestSRC")
## varpro/importance call
i.pv <- ivarpro(varpro(Surv(ttodead, died)~., peakVO2))
## partial plot of peak vo2
## color displays interval (a measure of exercise time)
## size displays "y" which is predicted mortality in survival
partial.ivarpro(i.pv, var="peak.vo2", col.var="interval", size.var="y")
## same but using beta blockers for color
partial.ivarpro(i.pv, var="peak.vo2", col.var="betablok", size.var="y")
# }