Fits a multilevel model for MAIHDA (Multilevel Analysis of Individual
Heterogeneity and Discriminatory Accuracy) using lme4, brms, WeMix (for
design-weighted (survey) data), or – for an ordered-factor outcome – a
cumulative link mixed model via ordinal::clmm().
Usage
fit_maihda(
formula,
data,
engine = "lme4",
family = "gaussian",
autobin = TRUE,
context = NULL,
sampling_weights = NULL,
id = NULL,
time = NULL,
time_degree = 1,
interactions = FALSE,
...
)Arguments
- formula
A formula specifying the model. Can include a random effect for stratum (e.g.,
outcome ~ fixed_vars + (1 | stratum)) or can directly specify the intersection variables to be used for forming strata (e.g.,outcome ~ fixed_vars + (1 | var1:var2:var3)). If variables other than "stratum" are provided in the random effect,make_stratawill be called internally to compute the strata and the formula will be updated.- data
A data frame containing the variables in the formula.
- engine
Character string specifying which engine to use: "lme4" (default), "brms", "wemix" (design-weighted pseudo-maximum-likelihood via
WeMix::mix(); requiressampling_weights), or "ordinal" (cumulative link mixed model viaordinal::clmm(); requires an ordinal family). Whensampling_weightsis supplied andengineis left at its default, the engine switches to "wemix" automatically (with a message); likewise an ordinal family (or an auto-detected ordered-factor outcome) switches the default engine to "ordinal".- family
Character string, family object, or family function specifying the model family. Common options: "gaussian", "binomial", "poisson", "negbinomial". Default is "gaussian".
family = "negbinomial"fits an overdispersed count model with the dispersion parameter theta estimated from the data: lme4 vialme4::glmer.nb()and brms via itsshapeparameter (log link only; not supported by the wemix engine). A fixed-thetaMASS::negative.binomial(theta)family object is also accepted withengine = "lme4"and is fitted withglmer(), honouring the supplied theta.family = "ordinal"(alias"cumulative"; ormaihda_cumulative("probit")/brms::cumulative()for a non-logit link) fits a cumulative (proportional-odds) model for an ordered-factor outcome:ordinal::clmm()under the automatic "ordinal" engine,brms::cumulative()underengine = "brms". The VPC/ICC lives on the latent scale (level-1 variance \(\pi^2/3\) logit / 1 probit, as for binomial models) and response-scale predictions are expected category scores (categories scored 1..K in order). An ordered-factor outcome with 3+ levels under the default family selects this model automatically, with a warning. The logit and probit links are supported;sampling_weights,context, and lme4-styleweights/subset/offsetarguments are not available on the clmm path. If the outcome variable appears to be binary and the default family is used, the function will automatically switch to "binomial", recode two-level responses to 0/1 forglmer(), and issue a warning. When a two-level non-0/1 response is recoded (on either the auto-detected or an explicitfamily = "binomial"path), the mapping follows the usual convention – the first level becomes 0 (reference) and the second becomes 1 (the modeled event), where "first/second" means alphabetical order for a character outcome and the declared order for a factor. The chosen mapping is reported via amessage()and stored on the result as$response_recoding; set the factor levels (or supply a 0/1 outcome) to control which level is the event. Although any valid family object is accepted for fitting, the MAIHDA variance summaries (summary.maihda_model, VPC/ICC, PCV) are only defined forgaussian("identity"), the binomial/Bernoulli families with a logit or probit link,poisson("log"), and the negative binomial with a log link (level-1 variancelog(1 + 1/mu + 1/theta); Nakagawa, Johnson & Schielzeth 2017). Other families (for exampleGamma(link = "log")) will fit, butsummary()and the VPC/PCV helpers will stop with an "not implemented" error because no level-1 variance is defined for them.- autobin
Logical indicating whether numeric variables used only for automatic strata creation should be binned by
make_strata. Default is TRUE.- context
Optional character vector naming one or more higher-level context columns in
data(e.g."school","hospital","region"). Each enters the model as a crossed intercept-only random effect alongside the intersectional stratum effect –outcome ~ covars + (1 | stratum) + (1 | context)– giving the contextual cross-classified MAIHDA of the literature (individuals cross-classified by stratum and place/institution).summary.maihda_modelthen partitions the unexplained variance into between-stratum vs. between-context vs. residual, and the headline VPC/ICC remains the between-stratum share (now net of the context). A context variable may not be a stratum dimension or"stratum"itself, and may not already appear as a fixed-effect term (its variance would then be absorbed by the fixed part). A context with few levels (say < 10) weakly identifies its variance and often yields a singular lme4 fit; thebrmsengine handles this better. Writing the random effect directly in the formula (... + (1 | school)) fits the same model but is summarised generically as "Other random effects"; onlycontext =activates the labelled contextual partition. Not supported by thewemixengine.- sampling_weights
Optional single character string naming a numeric column of
dataholding individual sampling (survey/design) weights, for a design-weighted MAIHDA on complex-survey data (e.g. NHANES, PISA). Sampling weights are not the same thing as lme4'sweights=(precision weights, which rescale the residual variance), so combiningsampling_weightswithengine = "lme4"is an error. Two engines support them:engine = "wemix"(chosen automatically whenengineis left at its default): weighted pseudo-maximum-likelihood viaWeMix::mix()(Rabe-Hesketh & Skrondal 2006), the estimator used for NAEP/PISA analysis. The individual weights enter at level 1 unchanged and the level-2 (stratum) weights are 1, because intersectional strata are exhaustive population cells included with certainty. Supportsgaussian(identity)andbinomial(logit)models with the canonical single(1 | stratum)random intercept. Fixed-effect standard errors are design-consistent (sandwich); the VPC/PCV are reported as point estimates (no bootstrap – seesummary.maihda_model).engine = "brms": the weights enter the model as likelihood weights (y | weights(w)), normalized to mean 1, giving a pseudo-posterior: point estimates target the population-weighted estimand but credible intervals are not design-based – interpret them cautiously. Full design-based (replicate-weight / linearised) variances for the variance components are not computed.
Rows with a missing or non-positive sampling weight are dropped with a warning. The column names
.maihda_sw(brms likelihood weight) and.maihda_l2wt(WeMix level-2 weight) are reserved for the design-weighted engines: they are written into the analytic data internally, so a model that suppliessampling_weightsmay not also reference a variable of either name in its formula (fitting would error). DefaultNULL(unweighted).- id
Optional single character string naming a person/unit identifier column for a longitudinal (growth-curve) MAIHDA on long-format data (one row per measurement occasion). Supplied together with
time, it makes the model a 3-level growth curve – occasions within individuals (id) within intersectional strata – with a random intercept and slope ontimeat both the individual and stratum levels. The growth random effects are added automatically: write the strata shorthand(1 | var1:var2)(or(1 | stratum)) only, not the slopes. The between-stratum variance (and hence the VPC) then becomes a function of time;summary.maihda_modelreports the time-varying VPC. Longitudinal fits are supported byengine = "lme4"/"brms"only (notwemix/ordinal), and are incompatible withcontextandsampling_weights. DefaultNULL(cross-sectional). See Bell, Evans, Holman & Leckie (2024).- time
Optional single character string naming a numeric measurement-time column (e.g. wave 0, 1, 2, ... or age), required for a longitudinal MAIHDA; see
id. DefaultNULL.- time_degree
Polynomial degree of the growth curve when
timeis supplied: 1 (default) linear, 2 quadratic, etc. The brms engine supports degree 1 only.- interactions
Opt-in per-stratum interaction diagnostic (
maihda_interactions), attached as theinteractionsslot and shown byprint().FALSE(default) skips it;TRUEcomputes it with the diagnostic's default correction (adjust = "BH"); or pass ap.adjustmethod name, including"none"for the uncorrected view. It is meaningful only on an adjusted model (the dimensions' main effects in the fixed part); on a null modelmaihda_interactionswarns. This is the single-fit parallel to the default-oninteractionsofmaihda.- ...
Additional arguments passed to
lmer/glmer(lme4),brm(brms), orWeMix::mix()(wemix; e.g.nQuad,fast). The lme4-styleweights/subset/offsetarguments are not supported by the wemix engine.
Value
A maihda_model object containing:
- model
The fitted model object (lme4, brms, WeMix, or ordinal::clmm)
- engine
The engine used ("lme4", "brms", "wemix", or "ordinal")
- sampling_weights
The sampling-weight column name when supplied, NULL otherwise
- formula
The model formula
- data
The data used for fitting
- family
The family used
- strata_info
The strata information from make_strata() if available, NULL otherwise
- context_vars
The context variable name(s) when
contextwas supplied, NULL otherwise- interactions
The
maihda_interactionsdiagnostic wheninteractionsis notFALSE, NULL otherwise- response_recoding
For a recoded two-level outcome, a data frame mapping each original level to its 0/1 value and role (reference/event); NULL when no recoding occurred
- diagnostics
Fit-quality diagnostics (singular fit / convergence) for lme4 models, surfaced by the print and summary methods
Examples
# \donttest{
# Standard approach: manually create strata first
strata_result <- make_strata(maihda_sim_data, vars = c("gender", "race", "education"))
model <- fit_maihda(health_outcome ~ age + (1 | stratum),
data = strata_result$data,
engine = "lme4")
# Simplified approach: specify stratifying variables directly in the grouping structure
# The function internally calls make_strata() to create intersectionals
model2 <- fit_maihda(health_outcome ~ age + (1 | gender:race:education),
data = maihda_sim_data,
engine = "lme4")
# Contextual cross-classified MAIHDA: strata crossed with a higher-level
# context (here country) -- the literature's cross-classified MAIHDA.
data(maihda_country_data)
model3 <- fit_maihda(math ~ 1 + (1 | gender:ses),
data = maihda_country_data,
context = "country")
summary(model3) # between-stratum vs. between-country vs. residual
#> MAIHDA Model Summary
#> ====================
#>
#> Variance Partition Coefficient (VPC/ICC):
#> Estimate: 0.1032
#>
#> Variance Components:
#> component variance sd proportion
#> Between-stratum (random) 915.2 30.25 0.1032
#> Context: country 1137.1 33.72 0.1283
#> Within-stratum (residual) 6813.0 82.54 0.7685
#> Total 8865.3 94.16 1.0000
#>
#> Contextual Cross-Classified Partition (stratum x context):
#> Between-stratum (intersectional) variance: 915.2323 (share 10.3%)
#> Context 'country' variance: 1137.1067 (share 12.8%)
#> Note: the headline VPC/ICC is the between-stratum share conditional on
#> the context random effect(s). The context share is the between-context
#> component of the unexplained variance.
#>
#> Fixed Effects:
#> term estimate
#> (Intercept) 492.3
#>
#> Stratum Estimates (first 10):
#> stratum stratum_id label random_effect se lower_95 upper_95
#> 1 1 male × Medium 7.404 9.689 -11.59 26.396
#> 2 2 female × Medium -7.027 9.740 -26.12 12.063
#> 3 3 female × High 30.825 9.729 11.76 49.894
#> 4 4 male × Low -28.600 9.741 -47.69 -9.508
#> 5 5 female × Low -37.651 9.724 -56.71 -18.592
#> 6 6 male × High 35.048 9.713 16.01 54.085
# }
