Estimates the proportional change in variance (PCV) sequentially by fitting intermediate (partially adjusted) models, adding each predictor one at a time. The step-specific PCV is the proportional change in between-stratum variance when a predictor is added, given the variables already in the model. Because the steps are sequential, the result is order-dependent: it reflects each variable's marginal, model-dependent change, not an order-invariant “unique” contribution.
Usage
stepwise_pcv(
data,
outcome,
vars,
engine = "lme4",
family = "gaussian",
sampling_weights = NULL
)
Arguments
- data
Data frame with observations. Ensure `make_strata()` was run first so the `stratum` variable exists.
- outcome
Character string; the dependent variable.
- vars
Character vector; predictors (strata groupings & covariates) to add sequentially to the model.
- engine
Modeling engine ("lme4", "brms", "wemix", or "ordinal"). Default is "lme4"; switches to "wemix" automatically when sampling_weights is supplied, and to "ordinal" for an ordinal family or ordered-factor outcome.
- family
Error distribution and link function. Default is "gaussian".
- sampling_weights
Optional name of a sampling-weight column for design-weighted stepwise fits; see
fit_maihda. The weight column joins the complete-case filter so every step uses the same analytic sample.
Value
A data.frame (class maihda_stepwise) showing the sequential
models, the between-stratum variance at each step, and both the step-specific
and total PCV. For a binary (binomial/Bernoulli) outcome it also carries
the discriminatory-accuracy trajectory: AUC (the C-statistic of each
step's model – step 0 is the strata-only discriminatory accuracy),
Step_AUC and Total_AUC (the absolute change in AUC,
delta-AUC, versus the previous step and versus the null), and MOR (the
Median Odds Ratio, logit link only). These columns are absent for non-binary
outcomes.
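As a rough illustration of how the Step_PCV and Total_PCV columns relate to the Variance column, the sketch below recomputes them from the variances printed in the Examples section. The formulas are assumptions inferred from that output (Total_PCV relative to the null model, Step_PCV relative to the previous step), not taken from the package source, and the variable names are hypothetical.

```r
# Between-stratum variances copied from the printed example (steps 0 to 3).
variance <- c(23.24, 22.90, 7.564e-14, 0)
n <- length(variance)

# Assumed definitions (inferred from the example output, not from the source):
#   Total_PCV[k] = (V[0]   - V[k]) / V[0]
#   Step_PCV[k]  = (V[k-1] - V[k]) / V[k-1]
total_pcv <- (variance[1] - variance) / variance[1]
step_pcv  <- c(0, (variance[-n] - variance[-1]) / variance[-n])

data.frame(
  Step      = 0:(n - 1),
  Variance  = variance,
  Step_PCV  = round(step_pcv, 5),
  Total_PCV = round(total_pcv, 5)
)
```

Because the printed variances are rounded, the recomputed values agree with the example table only approximately (about 0.0146 at step 1).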
Details
All models are fit on the complete cases for `outcome`, `stratum`, and all variables in `vars` so that each sequential variance comparison uses the same analytic sample.
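The complete-case rule can be sketched in base R as follows. This is a minimal illustration on toy data, assuming the filter behaves like `complete.cases()` over the listed columns; the data frame and its values are made up and the package's internal implementation may differ.

```r
# Toy data; the values are illustrative only.
toy <- data.frame(
  health_outcome = c(10, 12, NA, 9, 11),
  stratum        = c("a", "a", "b", "b", "c"),
  gender         = c("f", "m", "f", NA, "m"),
  age            = c(30, 41, 52, 38, NA)
)
outcome <- "health_outcome"
vars    <- c("gender", "age")

# Keep rows with no missing value in the outcome, the stratum, or any predictor,
# so that every sequential model is fit to the same rows.
needed   <- c(outcome, "stratum", vars)
analytic <- toy[complete.cases(toy[, needed]), ]

nrow(analytic)  # 2 rows remain: rows 3, 4, and 5 each have a missing value
```

Filtering once up front, rather than letting each model drop its own missing rows, is what keeps the variance comparisons between steps like-for-like.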
For a binary outcome the table additionally tracks discriminatory accuracy
(Merlo et al. 2016): AUC is each model's C-statistic and Step_AUC /
Total_AUC are its absolute change (delta-AUC), in contrast to the
proportional Step_PCV / Total_PCV. The MOR is reported
for the logit link (NA otherwise) and is a monotone transform of the
between-stratum variance already in Variance. For a design-weighted fit
(sampling_weights) the AUC is the design-weighted (population) C-statistic.
The function reuses maihda_discriminatory_accuracy on each step's fitted model, so
no additional models are fit. Note that adding a stratum-defining dimension
(one already encoded by the strata) typically leaves the AUC essentially unchanged:
it re-partitions the between-stratum variance (so the PCV and MOR move) but not the
per-stratum predicted ranking the rank-based AUC depends on. The AUC trajectory is
therefore most informative for individual-level covariates that vary within
strata.
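Two of the points above can be illustrated with a short sketch: the MOR as a monotone function of the between-stratum variance, and the AUC as a quantity that depends only on the ranking of predictions. The helper names `mor_from_variance` and `auc_rank` are hypothetical and are not part of the package; the MOR expression is the standard logit-scale formula, and the AUC uses the Mann-Whitney rank form.

```r
# Median Odds Ratio from a between-stratum variance on the logit scale:
#   MOR = exp(sqrt(2 * variance) * qnorm(0.75))
mor_from_variance <- function(variance) {
  exp(sqrt(2 * variance) * qnorm(0.75))
}

# Rank-based AUC (Mann-Whitney form); tied scores receive average ranks.
auc_rank <- function(y, score) {
  r  <- rank(score)
  n1 <- sum(y == 1)
  n0 <- sum(y == 0)
  (sum(r[y == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)
}

# MOR equals 1 when the variance is zero and grows with the variance.
mor_from_variance(c(0, 0.25, 1))

# Toy binary outcome and predicted probabilities (made-up values).
y <- c(0, 0, 1, 0, 1, 1)
p <- c(0.10, 0.35, 0.30, 0.20, 0.80, 0.60)

auc_rank(y, p)          # AUC on the probability scale
auc_rank(y, qlogis(p))  # same ranking on the logit scale, so the same AUC
```

The second pair of calls shows why a change that leaves the ordering of predictions intact cannot move the AUC, even when the variance (and hence the PCV and MOR) changes.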
References
Merlo, J., Wagner, P., Ghith, N., & Leckie, G. (2016). An original stepwise multilevel logistic regression analysis of discriminatory accuracy: the case of neighbourhoods and health. PLOS ONE, 11(4), e0153778.
Examples
# \donttest{
strata_result <- make_strata(maihda_sim_data, c("gender", "race"))
stepwise_pcv(strata_result$data, "health_outcome", c("gender", "race", "age"))
#>   Step      Model        Added_Variable  Variance Step_PCV Total_PCV
#>      0 Null Model None (Intercept only) 2.324e+01  0.00000   0.00000
#>      1    Model 1                gender 2.290e+01  0.01457   0.01457
#>      2    Model 2                  race 7.564e-14  1.00000   1.00000
#>      3    Model 3                   age 0.000e+00  1.00000   1.00000
# }
