A simulated cross-sectional dataset built to showcase **Bayesian (brms) MAIHDA for sparse intersections** – the regime where many intersectional strata each hold only a handful of individuals. In that regime the maximum-likelihood (lme4) estimate of the *interaction* between-stratum variance collapses to a singular fit that carries no uncertainty estimate, so the additive-vs-interaction split is both unstable and falsely precise. Weakly-informative priors (`engine = "brms"`) regularise the variance off the boundary and return a calibrated credible interval.
Format
A data frame with 240 rows and 6 variables:
- `gender`: Strata dimension (Women/Men).
- `ethnicity`: Strata dimension (White/Black/Asian).
- `education`: Strata dimension (Low/High).
- `age_group`: Strata dimension (Young/Mid/Older).
- `y`: A continuous (Gaussian) outcome. True between-stratum VPC 0.26, of which 40% is the intersectional interaction.
- `event`: A binary outcome (No/Yes), ~46% "Yes". Its latent-scale between-stratum VPC is 0.31, again 40% interaction.
The exact generative truth is also attached as `attr(maihda_sparse_data, "truth")` (additive/interaction variances, shares, and VPCs for each outcome).
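The VPC and the interaction share quoted for each outcome multiply together, which gives the fraction of *total* variance attributable to each component. The following is a minimal base-R sketch of that arithmetic; `split_vpc` is a hypothetical helper written for illustration, not a function of the package, and it uses only the figures quoted above.

```r
# Hypothetical helper (not part of the package): split a between-stratum VPC
# into additive and interaction parts, each as a share of TOTAL variance.
split_vpc <- function(vpc, interaction_share) {
  stopifnot(vpc >= 0, vpc <= 1,
            interaction_share >= 0, interaction_share <= 1)
  c(
    additive    = vpc * (1 - interaction_share),
    interaction = vpc * interaction_share,
    within      = 1 - vpc
  )
}

# Figures quoted for the two outcomes.
gaussian <- split_vpc(vpc = 0.26, interaction_share = 0.40)
binary   <- split_vpc(vpc = 0.31, interaction_share = 0.40)

round(gaussian, 3)  # additive 0.156, interaction 0.104, within 0.740
round(binary, 3)    # additive 0.186, interaction 0.124, within 0.690
```

On this reading the interaction is about 10% of total variance for `y` and about 12% of latent-scale variance for `event`, which is the quantity a sparse-cell fit has to recover.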
Details
The data carry a **known, non-trivial interaction** so the vignette can claim *recovery* rather than merely report numbers: 4 dimensions form 36 intersectional strata with deliberately skewed sizes (median 6 individuals, 12 of 36 cells below 5, two singletons), and the true interaction accounts for **40% of between-stratum variance** on both outcomes. On the binary outcome a genuine 40% interaction is read by lme4 as roughly 3%, a shortfall that is purely a small-cell artifact.
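The stratum count and the sparsity figures above can be checked in base R. The sketch below first crosses the levels listed under Format to confirm the 36 strata, then defines `cell_size_summary`, a hypothetical helper (not part of the package) that tabulates the occupied strata of any data frame.

```r
# Levels as listed under Format.
dims <- list(
  gender    = c("Women", "Men"),
  ethnicity = c("White", "Black", "Asian"),
  education = c("Low", "High"),
  age_group = c("Young", "Mid", "Older")
)

# Every combination of levels is one intersectional stratum.
strata <- expand.grid(dims, stringsAsFactors = FALSE)
nrow(strata)  # 2 * 3 * 2 * 3 = 36

# Hypothetical helper: size summary over the occupied strata of `data`.
cell_size_summary <- function(data, dim_names) {
  # drop = TRUE keeps only level combinations that actually occur.
  n <- as.vector(table(interaction(data[dim_names], drop = TRUE)))
  c(
    strata     = length(n),
    median     = median(n),
    below_5    = sum(n < 5),
    singletons = sum(n == 1)
  )
}

# With the package loaded:
# cell_size_summary(maihda_sparse_data, names(dims))
```

If the data match the description above, the commented call should report 36 strata, a median of 6, 12 cells below 5, and 2 singletons.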
Note
A purely illustrative dataset. The dimension labels are arbitrary and the interaction is constructed, not estimated from any real population – its only purpose is to make the sparse-cell behaviour of the ML and Bayesian estimators visible against a known answer.
Examples
data(maihda_sparse_data)
attr(maihda_sparse_data, "truth")$gaussian$interaction_share # 0.40
#> [1] 0.4
# ML over-shrinks the interaction under sparse cells (a singular fit):
# m_lme4 <- maihda(y ~ 1 + (1 | gender:ethnicity:education:age_group),
#                  data = maihda_sparse_data, decomposition = "crossed-dimensions")
#
# Weakly-informative priors regularise it and report honest uncertainty:
# m_brms <- maihda(y ~ 1 + (1 | gender:ethnicity:education:age_group),
#                  data = maihda_sparse_data, decomposition = "crossed-dimensions",
#                  engine = "brms",
#                  prior = brms::set_prior("normal(0, 0.5)", class = "sd"))
# See vignette("bayesian_sparse_maihda").
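
The `normal(0, 0.5)` prior in the commented call is placed on a standard-deviation parameter. Assuming brms bounds `sd` parameters below at zero, so that the prior acts as a half-normal with scale 0.5, the base-R sketch below shows which between-stratum standard deviations that prior treats as plausible. `half_normal_quantile` is a hypothetical helper, not part of brms or of this package.

```r
# Hypothetical helper: quantile of a half-normal distribution,
# i.e. of |Z| where Z ~ Normal(0, scale).
half_normal_quantile <- function(p, scale) {
  stopifnot(all(p >= 0), all(p < 1), scale > 0)
  scale * qnorm((1 + p) / 2)
}

# Prior median and upper 95% bound for the between-stratum SD.
prior_q <- half_normal_quantile(c(0.50, 0.95), scale = 0.5)
round(prior_q, 2)  # 0.34 0.98
```

Under that assumption the prior puts half its mass on standard deviations below about 0.34 and 95% below about 0.98, so it discourages very large variance components without ruling out moderate ones.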
