Calculate Proportional Change in Between-Stratum Variance (PCV)

Calculates the proportional change in between-stratum variance (PCV) between two MAIHDA models. The PCV measures the relative change in between-stratum variance when moving from model1 to model2: PCV = (Var_model1 - Var_model2) / Var_model1. The function and result object retain the historical "pvc" naming; "PVC" and "PCV" refer to the same quantity.

Usage

calculate_pvc(
  model1,
  model2,
  bootstrap = FALSE,
  n_boot = 1000,
  conf_level = 0.95
)

Arguments

model1: A maihda_model object from fit_maihda(). This is the reference model (typically a simpler or baseline model).
model2: A maihda_model object from fit_maihda(). This is the comparison model (typically a more complex model with additional predictors).
bootstrap: Logical indicating whether to compute bootstrap confidence intervals for the PCV. Default is FALSE.
n_boot: Number of bootstrap samples if bootstrap = TRUE. Default is 1000.
conf_level: Confidence level for bootstrap intervals. Default is 0.95.

Value

A list containing:

pvc: The estimated proportional change in variance
var_model1: Between-stratum variance from model1
var_model2: Between-stratum variance from model2
ci_lower: Lower bound of confidence interval (if bootstrap = TRUE)
ci_upper: Upper bound of confidence interval (if bootstrap = TRUE)
bootstrap: Logical indicating if bootstrap was used

Details

The PCV is the proportional change in between-stratum variance when moving from model1 to model2: a positive value means model2 has lower between-stratum variance than model1, a negative value means it has higher. It can be read as the share of model1's between-stratum variance explained by model2 only in the canonical nested case, where model2 adds fixed-effect predictors to model1 on the same outcome, analytic sample and strata. The function does not require nesting, so for non-nested models the PCV is simply a model-dependent difference in variance, not an explained proportion.
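As a small worked illustration of the formula and its sign, the sketch below computes the PCV from two variance estimates. The helper pcv_from_variances() is hypothetical and is not part of the package, and the variance values are made up for illustration.

```r
# Hypothetical helper (not part of the package): PCV from two variance estimates.
pcv_from_variances <- function(var_model1, var_model2) {
  stopifnot(is.numeric(var_model1), is.numeric(var_model2), var_model1 > 0)
  (var_model1 - var_model2) / var_model1
}

# Made-up variances, for illustration only.
# model2 has lower between-stratum variance: positive PCV.
pcv_from_variances(0.50, 0.10)
#> [1] 0.8

# model2 has higher between-stratum variance: negative PCV.
pcv_from_variances(0.50, 0.60)
#> [1] -0.2
```

Whether the first value can be described as "80% explained" depends on the nesting conditions described above, not on the arithmetic.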

REML vs ML. By default, lmer fits Gaussian models by REML, and REML between-stratum variance estimates are not comparable across models with different fixed effects. That is exactly the situation in the canonical null-vs-adjusted PCV, where the adjusted model adds the dimensions' main effects. calculate_pvc() therefore refits any REML lmer model with maximum likelihood (refitML) before reading the variances, and before the parametric bootstrap so that the interval matches the point estimate. This mirrors maihda_ic and anova() on lme4 models. Using REML estimates here would bias the PCV, because doing so overstates the residual between-stratum variance of the adjusted model. GLMM fits (glmer) and the brms/wemix/ordinal engines are already on the maximum-likelihood scale and are unaffected. Single-model VPC/ICC summaries keep their REML fit, since they involve no cross-model comparison and are not subject to this pitfall.
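The refit-then-compare step can be sketched directly with lme4. This is a minimal illustration, not the package's internal code: it assumes lme4 is installed, uses lme4's built-in sleepstudy data as a stand-in (with Subject playing the role of the stratum), and between_var() is a hypothetical helper.

```r
library(lme4)

# Stand-in models: lmer() uses REML unless told otherwise.
m_null_reml <- lmer(Reaction ~ 1 + (1 | Subject), data = sleepstudy)
m_adj_reml  <- lmer(Reaction ~ Days + (1 | Subject), data = sleepstudy)

# Hypothetical helper: random-intercept variance for one grouping factor.
between_var <- function(fit, group = "Subject") {
  vc <- as.data.frame(VarCorr(fit))
  vc$vcov[vc$grp == group & is.na(vc$var2)][1]
}

# Refit with maximum likelihood before comparing variances
# across models with different fixed effects.
m_null_ml <- refitML(m_null_reml)
m_adj_ml  <- refitML(m_adj_reml)

isREML(m_null_reml)  # TRUE: original fit used REML
isREML(m_null_ml)    # FALSE: refit uses ML

# PCV on the ML scale.
pcv_ml <- (between_var(m_null_ml) - between_var(m_adj_ml)) /
  between_var(m_null_ml)
pcv_ml
```

The sleepstudy models only demonstrate the mechanics; the resulting value has no MAIHDA interpretation.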

When bootstrap = TRUE, the function uses a parametric bootstrap: it simulates new responses from model2 and, for each simulated response, refits both models with lme4::refit() to obtain a confidence interval for the PCV estimate. For negative-binomial models (glmer.nb), refit() holds the dispersion parameter theta fixed at its original estimate, so the interval is conditional on the estimated theta.
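The simulate-and-refit loop described above might look roughly like the following. This is a sketch under stated assumptions rather than the function's actual implementation: it again uses lme4's sleepstudy data as a stand-in, between_var() is a hypothetical helper, the number of replicates is kept small for speed, and the percentile interval is an assumed choice, since the interval type is not specified here.

```r
library(lme4)

set.seed(1)

# Stand-in ML fits; Subject plays the role of the stratum.
fit1 <- lmer(Reaction ~ 1 + (1 | Subject), data = sleepstudy, REML = FALSE)
fit2 <- lmer(Reaction ~ Days + (1 | Subject), data = sleepstudy, REML = FALSE)

# Hypothetical helper: between-group (random-intercept) variance.
between_var <- function(fit) {
  vc <- as.data.frame(VarCorr(fit))
  vc$vcov[vc$grp == "Subject"][1]
}

n_boot <- 50  # small for illustration; the function's default is 1000

# Simulate responses from model2: one column per bootstrap replicate.
sims <- simulate(fit2, nsim = n_boot)

# Refit both models to each simulated response and recompute the PCV.
boot_pcv <- vapply(sims, function(y_star) {
  v1 <- between_var(refit(fit1, newresp = y_star))
  v2 <- between_var(refit(fit2, newresp = y_star))
  (v1 - v2) / v1
}, numeric(1))

# Assumed percentile interval at conf_level = 0.95.
ci <- quantile(boot_pcv, probs = c(0.025, 0.975), na.rm = TRUE)
ci
```

Because every replicate refits two mixed models, run time grows linearly with n_boot.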

Examples

# \donttest{
# Create strata and fit two models
strata_result <- make_strata(maihda_sim_data, c("gender", "race"))
model1 <- fit_maihda(health_outcome ~ age + (1 | stratum), data = strata_result$data)
model2 <- fit_maihda(health_outcome ~ age + gender + (1 | stratum), data = strata_result$data)

# Calculate PVC without bootstrap
pvc_result <- calculate_pvc(model1, model2)
print(pvc_result$pvc)
#> [1] 0.007185334

# Calculate PVC with bootstrap CI
# pvc_boot <- calculate_pvc(model1, model2, bootstrap = TRUE, n_boot = 500)
# print(pvc_boot)
# }