MAIHDA 0.2.0
New Features
-
Removed the
"risk_vs_effect"plot type (and the internalplot_risk_vs_effect()). The quadrant view of each stratum’s mean prediction against its stratum random effect largely duplicated what"effect_decomp"and"predicted"already show more directly, so it has been dropped fromplot()for bothmaihda_modelandmaihda_analysisobjects, fromtype = "all", and from the Shiny app’s plot menu. -
Removed the
"ternary"plot type and thecompute_maihda_ternary_data(),maihda_ternary_plot(), andplot_maihda_ternary()functions. The ternary diagnostic normalised each stratum’s additive, intersection-specific, and uncertainty signals to sum to 1, which discards effect magnitude – the quantity MAIHDA exists to measure – and puts two effect components and an estimation-error term on a single triangle. Its content is covered more directly by"effect_decomp"(the additive-vs-intersectional split with magnitudes intact) and the"predicted"/"upset"views (uncertainty shown as intervals), so it has been dropped fromplot()for bothmaihda_modelandmaihda_analysisobjects, fromtype = "all", and from the Shiny app, and the optionalggterndependency removed. -
Added the
"upset"plot type andmaihda_upset_size(). An UpSet-style alternative to"predicted"that replaces the long intersectional axis labels with a category matrix (an intersection-size bar, a dot matrix encoding each stratum’s level on every dimension, and a predicted-value panel, all sharing one column order). Binary 0/1 dimensions collapse to a present/absent row; multi-level factors get one row per level. Aquantityargument switches the bottom panel between the predicted value (fixed + random) and the stratum random effect (interaction).maihda_upset_size()returns content-scaled figure dimensions. theme_maihda()now is set as standard theme for ggplot objects-
The
predicted(and longitudinaltrajectories) plot can now keep the most extreme strata when it truncates, instead of the first by stratum order. When there are more strata than then_stratacap, the view dropped strata in stratum order – effectively arbitrary with respect to how far a subgroup sits from the population mean, so the most striking strata could be the ones omitted.plot()gains an opt-inselectargument:"order"(default, unchanged) keeps the first n_strata in stratum order;"deviation"keeps the n_strata furthest from the reference line (largest|predicted - reference|, so the most extreme strata in both directions, not just one tail). For a longitudinaltrajectoriesplot it keeps the strata whose trajectories swing furthest from the population curve (peak|random deviation|over the time grid). Selection and display are kept separate:selectchanges which strata appear, but the x-axis stays in stratum order. It composes with the flag-aware cap (flagged strata are always kept;selectgoverns the fill) and the caption names the rule used (“the 12 strata furthest from the reference, of 200”). -
The BLUP plots can now show only the flagged strata, and never hide a flagged stratum behind the display cap. When there are many strata the
"predicted"view caps the number drawn (n_strata, default 50) and kept them in stratum order, so a stratum flagged bymaihda_interactions()that fell past the cap was dropped from the figure entirely – the highlight could not reach it.plot()gainsonly_flagged:TRUErestricts the"predicted"and"obs_vs_shrunken"views to the strata carrying a credibly non-zero interaction (and turns the highlight on with the stored diagnostic if it was off), with a caption naming the screen (e.g. “Showing the 7 flagged strata (95% interval, BH-adjusted) of 200 total”) and a graceful captioned empty panel when none are flagged. Independently, whenever interactions are highlighted then_stratacap on"predicted"is now flag-aware: every flagged stratum is kept and the remaining slots are filled in stratum order, so the signal the highlight exists to surface is never silently truncated away. The across-strata reference line is still computed from the full set, so filtering never shifts it."effect_decomp"deliberately ignoresonly_flagged(its waterfall exists to show each flagged stratum’s place in the full distribution) and says so; the framing stays on “flagged”, not “significant”, consistent with the diagnostic’s exploratory, partial-pooling reading. -
The interaction diagnostic now defaults to FDR control and gains an equivalence (ROPE) reading.
maihda_interactions()now defaults toadjust = "BH"(Benjamini-Hochberg) rather than"none": fitting and flagging every stratum in one call is a screening question, so the flags should control the false-discovery rate by default. Passadjust = "none"(orinteractions = "none"tomaihda()/fit_maihda()) for the uncorrected, per-stratum individual-testing view. A newropeargument adds an equivalence / smallest-interaction-of-interest reading (Schuirmann 1987; Kruschke 2018): supply a region of practical equivalence (a half-widthdforc(-d, d), orc(lower, upper), on the link scale) and each stratum gains adecisionof"relevant"(interval entirely outside the region),"negligible"(entirely inside), or"inconclusive"(straddling a bound), reported byprint(). The documentation now also keeps two ideas the literature distinguishes apart – partial pooling regularises magnitude/sign (Gelman, Hill & Yajima 2012) while whether to correct depends on the inferential structure of the claim (Rubin 2021) – rather than treating shrinkage as itself a multiplicity correction.
Bug Fixes
compare_maihda(ic = TRUE)could show information criteria on incompatible scales without a warning. The comparability check only warned when the models differed in outcome, weights, family/link, analytic sample, or strata, then appended whichever information-criterion columns existed. Two same-family models fitted on different engines – e.g. a Gaussianlme4fit (reporting AIC/BIC) and a Gaussianbrmsfit (reporting WAIC/LOOIC) – agree on all the checked aspects, so no warning fired, yet the appended table placed the likelihood AIC/BIC and the Bayesian WAIC/LOOIC side by side. Those are different scales and are not comparable to each other (asmaihda_ic()’s documentation already notes).compare_maihda()now emits an explicit warning whenever the appended criteria span both the likelihood (AIC/BIC) and the Bayesian (WAIC/LOOIC) scales, so a cross-engine comparison can no longer be read as if the four numbers were on one ruler. An all-lme4/all-ordinal(AIC/BIC only) or all-brms(WAIC/LOOIC only) comparison is unaffected.A fixed interaction among the stratum dimensions (e.g.
Gender * Race) silently corrupted the decomposition. R expandsGender * RacetoGender + Race + Gender:Race, but the workflow only looked for the dimensions’ additive main effects when deciding whether the supplied formula was the fully-specified adjusted model. It found bothGenderandRace, treated the fit as the adjusted model, and derived the null by removing only those main effects – leaving the fixedGender:Raceinteraction in both the null and the adjusted formulas. That fixed cell-means term duplicates the intersectional(1 | Gender:Race)random intercept: it absorbs the between-stratum variance into fixed effects, pinning the stratum variance at a singular boundary so the PCV came backNULL/invalid (and every per-group PCV came backNAincompare_maihda_groups()/maihda(group = )).maihda()andcompare_maihda_groups()now reject such a formula up front with an actionable error – the MAIHDA adjusted model is defined to carry only the dimensions’ additive main effects, with the intersection estimated by the stratum random effect (writeGender + Race, and usedecomposition = "crossed-dimensions"ormaihda_interactions()to quantify the interaction).maihda_interactions()likewise now warns when handed a bare model that carries such a fixed dimension interaction (its stratum random effects are no longer the pure interaction the diagnostic reports). Detection is robust to variable order within the interaction and to non-syntactic names, and a legitimate covariate-by-dimension interaction (e.g.age * Gender) is left untouched.Discriminatory accuracy silently dropped the AUC/MOR for a
brmsaggregated-binomial (y | trials(n)) fit. The model layer fits such responses, andsummary()attaches the discriminatory accuracy automatically for any binomial/Bernoulli fit, butmaihda_discriminatory_accuracy()only recognised the aggregated structure forlme4cbind(success, failure)fits. Abrmsy | trials(n)fit therefore reachedmaihda_auc()with the per-row success counts as the response, errored on the non-0/1 values, and the summary quietly omitted the DA. It now detects abrmsaggregated binomial via the existing trial-count extraction path (maihda_brms_trial_counts(), which parses thetrials()addition term off the stored formula –brmsexposes noweights.brmsfit) and computes the same count-weighted C-statistic thelme4cbind()path uses. The rows are ranked by the per-trial probabilitypredict_maihda()returns (now normalised to a probability on both engines – see below), so the AUC ranks by probability and not by a count that confounds the probability with the number of trials;n_case/n_controlare the total successes / failures. Bernoulli andlme4fits are unaffected.calculate_pvc()(and thereforemaihda(),stepwise_pcv(), andcompare_maihda_groups()) could silently report a REML-based PCV for a singular multi-random-effect model. The ML refit (lme4::refitML(), needed because REML between-stratum variances are not comparable across models with different fixed effects) was skipped whenever the fit was globally singular – but a fit can be singular because a non-stratum random effect (e.g. an extra(1 | site)grouping factor) is on the boundary while the stratum variance is comfortably nonzero. In that case the package returned the REML PCV instead of the intended ML PCV. The ML refit is now skipped only when the stratum variance itself is on the boundary (effectively zero, judged onlme4::isSingular()’s own relative tolerance), or whenrefitML()fails; a non-stratum boundary no longer suppresses it. The boundary skip for a zero-variance stratum is preserved, so the zero-variance guard incalculate_pvc()is not masked.Aggregated-binomial stratum predictions were row-weighted instead of trial-weighted. For a
cbind(success, failure)(ory | trials(n)) fit the rows of an intersectional stratum can carry very different numbers of binomial trials, but the per-stratum prediction means averaged the fitted probabilities / linear predictors with unit weights, so a 5-trial row counted as much as a 200-trial row. The unit weighting is correct for the observed stratum summaries (which sum successes over summed trials – the trials are already in the denominator), but wrong for the model predictions. Prediction aggregation now uses a dedicatedmaihda_prediction_weights()that weights each row by its binomial trial count – read fromstats::weights(model, type = "prior")for anlme4cbind()fit, or parsed from the formula’strials()addition term for abrmsy | trials(n)fit (which exposesmodel.frame.brmsfitbut noweights.brmsfit, so the prior-weights route is unavailable and the counts would otherwise fall back to unit weights) – while the observed numerator/denominator path is unchanged. This corrects the predicted stratum means and thew_sumweights feedingmaihda_table(), the predicted-strata / effect-decomposition plots, and the prediction-deviation panels; unweighted and non-binomial fits are unaffected (the weights reduce exactly to the previous values).brmscumulative (ordinal) stratum predictions were on the wrong response scale. The stratum-level prediction helper applied the scalar inverse link for theengine = "brms",family = "ordinal"response scale, returning a single cumulative probability in[0, 1]rather than the expected category scoresum_k k * P(Y = k)in[1, K]that the rest of the package documents and computes – the individualpredict_maihda()path already collapses thefitted()category-probability array to that score, and theclmm(engine = "ordinal") stratum path already used it. This silently mis-scaled (and mis-ranked)maihda_table()and the stratum plots for a Bayesian cumulative model. The brms stratum helper now maps the latent location to the expected category score with the same shared cumulative helpers (using the posterior-mean thresholds), so the brms andclmmcumulative paths agree and match the documented response scale; the link scale is unchanged.maihda_interactions()returned a meaningless scalar diagnostic for a longitudinal fit. A direct call on a longitudinalmaihda()analysis (or a bare longitudinal model) fell through to the crossed-dimensions branch and reported a single per-stratum BLUP – which drops the random slope and so describes a cross-section, not the trajectory. It now errors with a pointer to the trajectory tools, matching the automatic-attachment path that already skips longitudinal objects.The Bayesian
pdcolumn was mislabelled.maihda_interactions()reportedpd = P(BLUP > 0)under the heading “probability of direction”, so a strongly negative interaction (whose direction is near-certain) showedpdnear 0 rather than near 1.pdis now the conventional probability of directionmax(P(BLUP > 0), P(BLUP < 0))(in[0.5, 1], à labayestestR::p_direction), with the sign reported separately in the existingdirectioncolumn.predict_maihda()silently accepted an unseen stratum on thewemixandordinalpaths. Whennewdataalready carried astratumcolumn, the preparation step returned early and skipped the unseen-stratum check, so a misspelled or genuinely new stratum flowed through to the WeMix/clmmlinear-predictor helpers, which map a missing random effect to 0 – yielding a fixed-only prediction that looked valid and contradicted both the documented contract (unseen strata are an error, as fortype = "strata") andlme4’s default behaviour.predict_maihda()now rejects an unseen stratum by default for every engine and prediction type, whether the stratum is supplied directly or rebuilt from the grouping variables. A newallow_new_levelsargument (defaultFALSE) opts into the previous behaviour explicitly: fortype = "individual"it returns a population-average (fixed-effects-only) prediction for unseen strata, dropping the stratum random effect (forwarded asallow.new.levelsto lme4 andallow_new_levelsto brms). Stratum-level predictions have no random effect to report for an unseen stratum, so they remain an error regardless.predict_maihda(scale = "response")returned expected counts, not probabilities, for abrmsaggregated-binomial (y | trials(n)) fit.brms’sfitted()/posterior_epred()return the expected number of successes (trials * p) for a binomial fit with atrials()term, whereaslme4’stype = "response"for acbind(success, failure)fit returns the per-trial probability. The samepredict_maihda(scale = "response")call therefore produced two different scales depending on the engine, so anyone ranking or comparing predicted risks acrossbrmsstrata was comparing expected counts (which confound the probability with the per-row number of trials) rather than probabilities. Thebrmsresponse-scale prediction is now normalised by the per-row trial counts (evaluated on the predictionnewdata), so both engines return a per-trial probability in[0, 1]; the discriminatory-accuracy AUC, the Shiny app’s absolute-prediction panel, and any downstream ranking now read a true probability. Bernoulli and non-binomial fits are unaffected (notrials()term -> no normalisation).predict_maihda(allow_new_levels = TRUE)did not give the documented zero-effect prediction for an unseen stratum on thebrmsengine. The contract is to drop the stratum random effect (treat it as zero) for a stratum the model never saw. Thebrmspath merely forwardedallow_new_levels = TRUE, butbrms’s defaultsample_new_levels = "uncertainty"draws a new-stratum effect from the estimated random-effects distribution rather than zeroing it, so the prediction silently carried a random per-call group effect. Unseen-stratum rows are now split off and predicted with anre_formulathat excludes the stratum term, while seen-stratum rows keep their estimated effect (a blanket exclusion would have dropped the seen strata’s effects too). Any other random effect the unseen row participates in – a contextual(1 | school)intercept fromfit_maihda(context = ), a longitudinal(time | id)growth term – is kept, exactly aslme4’sallow.new.levelszeroes only the unseen level’s effect and retains seen ones; for the usual single-stratum model the excludingre_formulaisNA, the fixed-effects-only population average.lme4and thewemix/ordinalpaths already behaved this way and are unchanged, so the engines now agree on the same model.The crossed-dimensions decomposition reported
NaNadditive / interaction shares when there was no between-stratum variance. The shares split the between-strata variance, so for a degenerate fit whose additive and interaction variances are both estimated at exactly zero they are0 / 0 = NaN, which leaked throughsummary()and the comparison/tidier outputs.maihda_cc_partition()now returnsNA(a defined “undefined”) for the shares whenbetween == 0, and for the VPC when the total variance is0;print()showsNA (no between-strata variance to split). Non-degenerate fits are unchanged.A formula
offset()term was silently dropped fromwemixandordinalpredictions.WeMix::mix()andordinal::clmm()both honour anoffset(.)term in the model formula when fitting, but the manual linear-predictor helpers (nopredict.clmmexists, and WeMix’s ownpredict()offers no fixed-only/scale control) rebuilt the linear predictor asX %*% beta (+ u)from the design matrix only – andmodel.matrix()never produces a column for an offset term, so the offset was omitted. Every link-scale prediction (and the response-scale value derived from it) was therefore off by exactly the per-row offset:predict_maihda()for both engines, and the ordinalplot_prediction_deviation_panels()probabilities, which rebuild theclmmlocation independently. The helpers now evaluate the formula’s offset on the prediction data (stats::model.offset()of the rebuilt model frame) and add it back, including for an offset-only null model whose location is otherwise identically zero. Fits with no offset are unaffected.maihda()andcompare_maihda_groups()chose the engine from the raw outcome, beforesubset/weights. The wrappers passengineexplicitly to every sub-fit, so they pre-selectedengine = "ordinal"from an ordered-factor outcome – but they checked the raw column, whilefit_maihda()detects the family on the analytic sample (aftersubsetand weight filtering). An ordered 3-level outcome subset to two observed levels is a binary analytic sample: a directfit_maihda()fits it asbinomial/lme4, but the wrappers pinnedengine = "ordinal"and then errored (maihda()) or failed every group (compare_maihda_groups()) with an engine/family contradiction. The ordinal pre-check now runs on the same analytic samplefit_maihda()uses (the forwardedsubset/weights are resolved against the data mask first), so the engine choice can no longer contradict the resolved family. Relatedly,maihda()now forwards the evaluatedsubset/weights(not the raw expressions) to its derived null/adjusted fits, mirroringcompare_maihda_groups(): a response-referencingsubset(e.g.subset = y %in% c("lo", "mid")) was otherwise re-evaluated by each derived fit against$original_data, whose response has already been recoded to 0/1, selecting zero rows.Auto-binning a numeric stratum dimension could silently overwrite a user column. When
make_strata()(or the(1 | a:b)shorthand) auto-bins a numeric dimensionv, the adjusted-model and prediction machinery add an internal factor column named.maihda_dim_<v>(referenced by the adjusted / crossed-dimensions formulae and rebuilt for predictionnewdata). Neither writer checked whether the user’s data already carried a column of that name, so an existing.maihda_dim_<v>variable was clobbered – and, because that column then enters the model, the fit or prediction changed silently. The.maihda_dim_prefix is now reserved:make_strata()rejects a pre-existing.maihda_dim_<v>column for any dimension it is about to auto-bin, with an actionable rename hint (or setautobin = FALSE), mirroring the existing reserved-weight-column guard. The internal name is now generated through one shared helper so the fit-time and predict-time writers cannot drift, and predicting on a fitted model’s own data (which legitimately carries the package’s copy) is unaffected.maihda_table()could show REML variance rows alongside an ML-based PCV without saying so.calculate_pvc()refits a Gaussianlme4fit with maximum likelihood before differencing the between-stratum variances (REML variances are not comparable across models with different fixed effects), but the table’s Between-stratum variance / SD and VPC/ICC rows are each model’s own (REML) estimate – the quantitiessummary()reports. The two are on different variance bases by design, soPCV != (var_null - var_adj) / var_nullread off the table, which looks like an inconsistency.maihda_table()now attaches amodels_note(shown byprint()) explaining this whenever the PCV’s variance basis actually differs from the displayed rows; it stays silent for already-ML engines (glmer/brms/wemix/ordinal) and for a boundary null wherecalculate_pvc()keeps the REML fit. The displayed numbers are unchanged, so the table still agrees exactly withsummary().
Improvements
-
Console output is now colour-coded (via
cli). The print methods across the package –maihda()analyses, model/summary, PCV, discriminatory accuracy, the interaction diagnostic, group comparison, tables, information criteria, and the rest – bold section titles, accent the headline values, and dim secondary notes, reusing the plot accent colour so the console and the figures agree. The palette is deliberately valence-neutral: colour marks emphasis (a finding to look at, a neutral conclusion, a de-emphasised note), never good-vs-bad – so, e.g., anegligibleequivalence result is shown in a neutral colour, not green. It degrades to plain text automatically wherever ANSI is unsupported (knitr/vignettes,R CMD check,testthat,NO_COLOR), so rendered and captured output is byte-for-byte unchanged.clijoinsImports. -
compare_maihda_groups()(andmaihda(group = )) gain avar_between_adjusted_mlcolumn: the adjusted model’s actual between-stratum variance, read straight off the adjusted fit on the maximum-likelihood scale the PCV is computed on. The existingvar_between_adjustedcolumn is, by design, a derived coherence quantity (var_between * (1 - pcv), on the REMLvar_between/vpcscale so the table satisfiespcv = (var_between - var_between_adjusted) / var_betweenexactly) and is not the adjusted fit’s own variance; the documentation now states this explicitly andvar_between_adjusted_mlreports the literal adjusted variance for anyone who needs it. The two differ only by the small REML-vs-ML gap in the null variance.
MAIHDA 0.1.11
CRAN release: 2026-06-18
New Features
-
maihda()now reports the intersectional interactions by default. “Which strata genuinely interact” is the scientific payoff of MAIHDA, so it no longer needs a separate call:maihda()computesmaihda_interactions()on the adjusted (or crossed-dimensions) model, stores it as theinteractionsslot, and surfaces a one-line headline inprint()(flagged count, the strongest stratum, and – crucially – the multiplicity stance actually used, so an uncorrected scan is never silently read as corrected). The computation is essentially free (it reads the stratum estimates the summary already holds; no refit), and it is skipped for a longitudinal decomposition (whose interaction is a trajectory, not a scalar). Control it with the newinteractionsargument:TRUE(default) computes it with the diagnostic’s own default correction;FALSEskips it; or pass ap.adjustmethod name to choose the correction.fit_maihda()gains the same argument as an opt-in (interactions = FALSEby default, since a single fit is often a null model where the diagnostic does not apply).plot(..., highlight_interactions = TRUE)now reuses the stored diagnostic, so the plot highlights and the printed headline can no longer disagree. - Added
broom-styletidy()andglance()methods formaihda_model,maihda_summary, andmaihda_analysis, so MAIHDA results drop straight into tidy data forggplot2,gt/flextabletables, and other downstream tooling.tidy()returns the stratum (intersection) random-effect estimates by default (with the human-readable intersectional label), or the variance-components table (component = "variance") or fixed effects (component = "fixed"), all in broom’sestimate/std.error/conf.low/conf.highshape;tidy()on amaihda()analysis takeswhich = c("null", "adjusted").glance()returns the MAIHDA headline as a one-row tibble – the VPC/ICC (with its bootstrap/posterior interval), and for amaihda()analysis the PCV, the adjusted-model AUC, the additive/interaction shares, AUC and MOR for a binary outcome, plusn_strata/nobs/engine/family– a row no generic mixed-model tool emits, since the PCV needs the null+adjusted pair. The layout is uniform across thelme4,brms,wemix, andordinalengines. The generics come from the lightweightgenericspackage (the same onesbroom/broom.mixedre-export), sotidy()/glance()dispatch whether the user hasbroom,generics, or justMAIHDAloaded, with no hardbroomdependency; the raw fixed-effect/per-row tidyingbroom.mixedalready does on the underlying fit is intentionally not duplicated. - Added longitudinal (growth-curve) MAIHDA – the life-course extension of Bell, Evans, Holman & Leckie (2024, Soc Sci Med 351:116955) – via new
id,time, andtime_degreearguments onfit_maihda()andmaihda(). Supplyingid(the person/unit measured repeatedly) andtime(a numeric measurement-time column) fits a 3-level growth curve – occasions within individuals within intersectional strata – with a random intercept and slope on time at both the individual and stratum levels (outcome ~ time + (time | id) + (time | stratum); the growth slopes are added automatically, so you write only the strata shorthand(1 | var1:var2)). The between-stratum variance, and therefore the VPC, is then a function of time (VarS(t) = a(t)' Sigma_s a(t)):summary()reports the baseline (reference-time) VPC, the full time-varying VPC trajectory, and the stratum/individual intercept-slope-covariance blocks, and a new plot type"vpc_trajectory"draws the VPC-over-time curve (with"trajectories"for the predicted per-stratum mean trajectories;plot(type = "vpc")/"all"route there for a longitudinal fit).maihda(decomposition = "longitudinal")(selected automatically whenid/timeare supplied) fits a null and an adjusted growth model – the adjusted model adding the dimensions’ main effects and theirdim:timeinteractions – and reports the PCV separately for the baseline (intercept) and the slope variance: the additive-vs-multiplicative split of the intersectional trajectory inequality (the paper’s “partly multiplicative but mostly additive” finding), returned as amaihda_long_pcvobject withpcv_intercept,pcv_slope, and a time-specificpcv_t(and a"pcv_trajectory"plot). All families with a defined level-1 variance are supported (gaussian/binomial/poisson/negbinomial, latent scale for non-Gaussian) onengine = "lme4"(any degree) andengine = "brms"(linear growth; posterior credible bands on the VPC trajectory).predict_maihda(type = "strata")returns each stratum’s deviation at the baseline (reference) time plus its raw intercept and slope (a stratum is now a trajectory; the baseline deviation, evaluated atref_time = min(time), is the longitudinal analogue of a cross-sectional stratum BLUP and differs from the raw time-0 intercept whenever time is not zero-referenced). The intercept-only guards elsewhere are untouched – a longitudinal fit is tagged and routed to the time-varying path, while every other model still rejects random slopes – andextract_between_variance()/calculate_pvc()reject a longitudinal model with a pointer to the time-varying decomposition. Design-weighted, contextual,wemix/ordinal, and group-comparison longitudinal models are out of scope (each errors). A new bundled datasetmaihda_long_data(600 persons x 5 waves, gender x ethnicity x education strata, with constructed mostly-additive trajectory differences) demonstrates it. - Added ordinal (cumulative) MAIHDA for ordered-factor outcomes – the multicategorical extension the MAIHDA tutorials flag as priority future work.
family = "ordinal"(alias"cumulative"; probit via the new exportedmaihda_cumulative("probit")orbrms::cumulative("probit")) fits a cumulative (proportional-odds) model: a newengine = "ordinal"wrapsordinal::clmm()for the frequentist path (selected automatically for an ordinal family under the default engine, with a message) andengine = "brms"usesbrms::cumulative(). An ordered-factor outcome with 3+ levels under the all-default family/engine selects the model automatically (with a warning; a 2-level ordered factor still takes the binomial auto-detect path), and an unordered factor response is coerced to ordered in its declared level order with a message. The VPC/ICC lives on the latent scale – level-1 variance pi^2/3 (logit) or 1 (probit), the same latent treatment as binomial models, so cumulative VPCs are comparable to logistic ones – andsummary()gains athresholdsslot (the cut points that take the intercept’s place, with Hessian SEs) printed alongside the location fixed effects. Becausepredict.clmmdoes not exist, predictions are built from the stored coefficients: the link scale is the latent location eta = x’beta + u, and the response scale is the expected category score sum_k k*P(Y = k) (the quantity the plots label “Average Expected Category Score”), with P(Y <= k) = g^-1(alpha_k - eta) differenced by pure, unit-tested helpers shared with the brms path (whosefitted()probability arraypredict_maihda()now collapses to the same score). The two-modelmaihda()decomposition and PCV,stepwise_pcv(),compare_maihda_groups()(engine handshakes mirror thesampling_weightsprecedent), the stratum plots,obs_vs_shrunken(observed mean category score),risk_vs_effect,effect_decomp(exact on the latent scale), and the deviation panels all work;maihda_mor()now also accepts cumulative-logit fits, returning the median cumulative odds ratio. Explicit limits with targeted errors: logit/probit links only, the canonical single(1 | stratum)structure (nocontext/crossed-dimensions – use brms), nosampling_weightson the clmm path, no parametric bootstrap (clmm has no simulate/refit; brms provides credible intervals), AUC/maihda_vpc_response()stay binomial-only, and the ternary diagnostic is not yet supported.ordinaljoinsSuggests. - Added the negative-binomial family for overdispersed count outcomes:
family = "negbinomial"onfit_maihda()(and thereforemaihda(),stepwise_pcv(), andcompare_maihda_groups()). The dispersion parameter theta is estimated from the data –lme4::glmer.nb()under the default engine, theshapeparameter underengine = "brms"– and a fixed-thetaMASS::negative.binomial(theta)family object is also accepted with lme4, honouring the supplied theta. The VPC/ICC uses the lognormal latent-scale level-1 variancelog(1 + 1/mu + 1/theta)(Nakagawa, Johnson & Schielzeth 2017, J R Soc Interface), the negative-binomial analogue of the Stryhn et al. (2006) Poisson approximation already used by the package (it reduces to it as theta grows); for brms theshapedraws are propagated into the VPC credible interval. The two-modelmaihda()decomposition,calculate_pvc(), parametric-bootstrap intervals (vialme4::refit(), which holds theta fixed at its estimate – the interval is conditional on theta, as documented), prediction, and the stratum plots (routed to the count branch) all work; the log link is required, and the wemix engine,maihda_discriminatory_accuracy(), andmaihda_vpc_response()reject the family with targeted messages. Internally, engine-specific family labels are now canonicalised so the family/link comparability checks incalculate_pvc()andcompare_maihda()no longer depend on raw label strings: two fits whose theta is estimated (lme4’s theta-embedding"Negative Binomial(<theta>)", brms’sshape) compare equal, since each label otherwise carried its own theta estimate and could never match. A fixed, user-specified theta (MASS::negative.binomial(theta)) is treated differently – it is part of the model specification, so it stays in the comparability key and two fits with different fixed thetas are (correctly) rejected as incomparable. - Added design-weighted MAIHDA (survey/sampling weights) via a new
sampling_weightsargument onfit_maihda(),maihda(),stepwise_pcv(), andcompare_maihda_groups(). Sampling weights are not the same thing as lme4’sweights=(precision weights, which rescale the residual variance), so feeding survey weights to lmer/glmer maximises the wrong objective and gives invalid population estimates; supplyingsampling_weightswithengine = "lme4"is therefore an error, and supplying it with the default engine switches (with a message) to the newengine = "wemix": weighted pseudo-maximum-likelihood viaWeMix::mix()(Rabe-Hesketh & Skrondal 2006), the estimator built for NAEP/PISA-style survey analysis. The individual weights enter at level 1 unchanged and the level-2 (stratum) weights are 1, because intersectional strata are exhaustive population cells sampled with certainty. The wemix engine supports the canonical MAIHDA structure –gaussian(identity)orbinomial(logit)with the single(1 | stratum)intercept – and flows through the whole toolkit:summary()reports the design-weighted VPC/ICC (latent-scale pi^2/3 level-1 variance for logistic fits, matching the other engines) and design-consistent (sandwich) fixed-effect standard errors; stratum estimates carry analytic conditional SEs (the design-weighted analogue of lme4’scondVar, reducing to it at unit weights);predict_maihda(), the stratum plots (aggregated with the sampling weights, so stratum summaries are population-representative),calculate_pvc()(which now also refuses to compare fits with different sampling weights), themaihda()two-model decomposition,stepwise_pcv(), andcompare_maihda_groups()all work. For a binomial fit,maihda_discriminatory_accuracy()computes the design-weighted AUC (each observation contributes its weight as case/control mass) and flags it in the print method. Alternativelyengine = "brms"acceptssampling_weightsas likelihood weights (y | weights(w), normalized to mean 1), giving a pseudo-posterior: point estimates are design-consistent but credible intervals are not design-based (a message says so). Limitations are explicit rather than silent: no parametric bootstrap for wemix (a design-based interval would need replicate weights – a possible future extension), and no crossed random effects, socontext =anddecomposition = "crossed-dimensions"require lme4/brms. Rows with missing or non-positive weights are dropped with a warning. A unit-weight wemix fit reproduces the lme4 ML fit to numerical precision.WeMixjoinsSuggests. - Added the contextual cross-classified MAIHDA – the “cross-classified MAIHDA” of the literature (e.g. patients cross-classified by intersectional stratum and hospital, or students by stratum and school) – via a new
contextargument onfit_maihda()andmaihda().context = "school"(one or more column names) enters each context as a crossed random intercept alongside the intersectional stratum effect,outcome ~ covars + (1 | stratum) + (1 | school), and the summaries then partition the unexplained variance into between-stratum vs. between-context vs. residual: the headline VPC/ICC becomes the between-stratum share net of the context, each context gets its ownContext: <name>row in the variance-components table, and a new$contextsummary element carries the per-context variances and shares (with parametric-bootstrap intervals for lme4 viabootstrap = TRUE, and posterior credible intervals for brms). Inmaihda()the context random intercept is carried by both the null and the adjusted model, so the PCV is computed with the context partialled out;contextalso composes withdecomposition = "crossed-dimensions"(the context variance then enters the single fit and its VPC denominator). A new plot type"context_vpc"(on bothmaihda_modelandmaihda_analysis) bars the stratum vs. context variances with their shares, andplot(type = "vpc")renders the contextual split automatically.contextcannot be combined withgroup(a stratified per-level comparison vs. a joint crossed model are different designs; supplying both errors), a context variable may not be a stratum dimension or appear as a fixed effect, and a context with few levels weakly identifies its variance (considerengine = "brms"). A manually written... + (1 | school)still fits and is summarised generically as “Other random effects”; onlycontext =activates the labelled partition. -
Renamed the
decompositionvalue"cross-classified"to"crossed-dimensions"inmaihda()andcompare_maihda_groups()(and in the Shiny app’s decomposition toggle), because that mode crosses the stratum dimensions’ main effects as random intercepts – whereas “cross-classified MAIHDA” in the literature means the contextual stratum-by-place model now fitted viacontext(see above). The old value is accepted as a deprecated alias that warns and maps to"crossed-dimensions". Note for code that inspects results: amaihda_analysisfrom this mode now hasmode = "crossed-dimensions"(was"cross-classified"), andcompare_maihda_groups()results carryattr(, "decomposition") == "crossed-dimensions"; printed output and plot titles say “crossed-dimensions” accordingly. - Added
maihda(), a single high-level entry point that runs the standard two-model MAIHDA workflow in one call. It fits the null model (the formula you supply) and the adjusted model (the same model plus the additive main effects of the stratum dimensions), summarises the null model’s VPC/ICC, and reports the PCV – the proportional change in between-stratum variance from null to adjusted (the additive share of the intersectional inequality). When agroupis supplied it also runs this decomposition within each group (thecompare_maihda_groups()result gains apcvcolumn). It returns one consistentmaihda_analysisobject with newmodel_adjusted/summary_adjusted/pcvslots (alongside the unchanged null-modelmodel/summary), andprint,summary, andplotmethods;plot()routes the VPC/shrinkage views to the null model and the additive-vs-intersectional views (effect_decomp,risk_vs_effect,ternary) to the adjusted model, and gainstype = "group_pcv".maihda()is intrinsically a decomposition and has no single-model mode – usefit_maihda()for a single fit. A numeric dimension auto-binned for the strata enters the adjusted model as its reconstructed tertile factor (not a linear term). Because it cannot decompose without an intersection,maihda()errors (rather than returning a half-result) when the stratum-defining variables are unidentifiable (e.g. a hand-builtstratumcolumn) or define only one dimension; the shorthand(1 | var1:var2)andmake_strata()paths both record the dimensions and decompose normally. - Added the
maihda_country_datadataset (OECD PISA 2018, accessed via thelearningtowerpackage): 3,600 students across six countries with gender x socioeconomic-status strata and mathematics-achievement outcomes. It showcasescompare_maihda_groups()/maihda(group = "country"), since intersectional inequality (VPC/ICC) genuinely differs across the countries. - Added the
maihda_sparse_datadataset and a new vignette, “Bayesian MAIHDA for sparse intersections”, showing whyengine = "brms"is the safer estimator when intersectional cells are small. The (simulated) data carry a known interaction – 40% of the between-stratum variance, on a Gaussian outcomeyand a binary outcomeevent– across 36 strata with deliberately skewed sizes (median 6, half the cells below 5). Under that sparsity the maximum-likelihood (lme4) crossed-dimensions fit is singular and over-shrinks the interaction (to ~23% Gaussian, ~3% binary – i.e. a real interaction read as “fully additive”), with no uncertainty attached; a weakly-informative prior on the random-effect SDs regularises the variance off the boundary and returns a calibrated credible interval that covers the truth. The vignette also documents the precompute-and-cache pattern for shipping brms results in a build (Stan cannot run on CRAN’s / pkgdown’s builders). - Added
maihda_table(), a one-call publication-ready results export that assembles the two canonical MAIHDA write-up deliverables (cf. Evans et al. 2024’s tutorial) from a fittedmaihda()analysis (or a singlefit_maihda()model): (a) a model-results table contrasting the null (Model 1) and adjusted (Model 2) fits – intercept, between-stratum variance and SD, VPC/ICC, the PCV, and, for a binary outcome, the AUC and Median Odds Ratio – and (b) a ranked-strata table ordering every intersectional stratum by its model-predicted outcome (their Table 4), with the predicted value’s conditional interval, the stratum size, and the stratum random effect. It introduces no new estimator – the model-results table reuses the quantities fromsummary()and the ranked-strata table reusesplot(type = "predicted")’s stratum predictions, so the table agrees exactly withsummary()/plot(). The$modelsdata frame is numeric and export-ready (statistics in rows, an estimate +*_lower/*_upperinterval columns per model: the VPC bootstrap/posterior interval and the bootstrap PCV interval are carried, other rows are point estimates), and theprint()method renders the familiar “estimate [low, high]” layout plus the top/bottom strata (n_strata). It adapts to every fit type: a crossed-dimensions analysis gets “Additive share”/“Interaction share” rows instead of the PCV, a contextual cross-classified analysis (context =) gets a “Context share (VPC)” row, an ordinal fit leaves the intercept rowNA(its category thresholds are reported bysummary(), not the table), andwhich = "adjusted"ranks the strata by the adjusted rather than the null model. Works across the lme4, brms, wemix, and ordinal engines. -
summary()of a binomial/Bernoulli MAIHDA model – and thereforemaihda(), whose summaries it builds – now reports the discriminatory accuracy automatically: the AUC / C-statistic and Median Odds Ratio (the “DA” in MAIHDA), as a newdiscriminatory_accuracyslot shown by theprint()methods, so the strata’s discriminatory accuracy sits alongside the VPC without a separate call. Formaihda(), the headlineprint()shows the null model’s AUC/MOR with the adjusted model’s AUC for comparison. The response-scale (probability-scale) VPC is attached on request viasummary(model, response_vpc = TRUE, seed = )ormaihda(..., response_vpc = TRUE, seed = )(it is simulation-based, hence opt-in and seeded). Both are computed from the already-fitted model with no refit, and are skipped for non-binomial outcomes and for the cross-classified fit (whose single-stratum between-variance the MOR/response-VPC require is not defined across crossed random effects). The standalonemaihda_discriminatory_accuracy(),maihda_auc(),maihda_mor(), andmaihda_vpc_response()are unchanged. - Added
maihda_ic(), which surfaces relative-fit information criteria for choosing between model structures (different covariate sets, strata definitions, or families) – the question the VPC/PCV do not answer. It reportsAIC/BICfor the likelihood engines (lme4, ordinalclmm) and the BayesianWAIC/LOOICfor brms, takes one or moremaihda_models (or amaihda_analysis, which expands into its null and adjusted rows), and adds adeltacolumn (gap from the best model on the primary criterion). Crucially it handles the REML pitfall:lmerfits Gaussian models by REML, whose AIC/BIC are not comparable across models with different fixed effects (the canonical null-vs-adjusted MAIHDA case), so when more than one model is supplied any REML fit is refitted with maximum likelihood vialme4::refitML()before the criteria are read – matching whatanova()does onlme4models – and theestimatorcolumn records it. Design-weighted (wemix) pseudo-likelihood fits reportNA(no standard AIC/BIC is defined).compare_maihda()now appends these criteria alongside the VPC/ICC by default (ic = TRUE); setic = FALSEfor the lean VPC-only table. -
stepwise_pcv()now also reports the discriminatory-accuracy trajectory for a binary (binomial/Bernoulli) outcome, alongside the between-stratum-variance / PCV trajectory it already produced – the stepwise discriminatory-accuracy analysis of Merlo et al. (2016). Each step gains anAUC(that model’s C-statistic; step 0 is the strata-only discriminatory accuracy),Step_AUCandTotal_AUC(the absolute change in AUC – delta-AUC – versus the previous step and versus the null, in contrast to the proportionalStep_PCV/Total_PCV), andMOR(the Median Odds Ratio, logit link only) column. No extra models are fit: the columns are read off each step’s already-fitted model viamaihda_discriminatory_accuracy(), so a design-weighted stepwise (sampling_weights) reports the design-weighted AUC, and the columns are simply absent for non-binary outcomes (the gaussian/poisson/ordinal table is unchanged). A newprint()method for themaihda_stepwisetable notes the proportional-vs-absolute distinction. - Added
maihda_interactions(), a diagnostic that flags which strata carry a credibly non-zero intersectional interaction – the heart of “where is there genuine intersectionality”. It reads each stratum’s interaction BLUP (the stratum random effect of the adjusted / crossed-dimensions model, i.e. the departure from the additive main-effects prediction) and flags the strata whose effect is credibly different from zero, returning a rankedmaihda_interactionstable (flagged strata first, by interaction magnitude). For the frequentist engines (lme4/wemix/ordinal) it uses the BLUP’s conditional standard error with an optional multiplicity correction (adjust, default"none"; the docs steer to FDR"BH"for screening many strata, with the full set ofp.adjustmethods available for a reviewer who needs a specific one); forbrmsit uses the exact posterior tail – a credible interval and the probability of directionpd = P(BLUP > 0)– rather than a normal approximation, andadjustis inert (the Bayesian answer is multiplicity-free). Passing amaihda()analysis uses the right (adjusted) model automatically; passing a bare null model is caught with a warning, since its stratum effects mix the additive and interaction parts. The help explains why a correction is optional on already-shrunken BLUP estimates (Gelman, Hill & Yajima 2012; Gelman & Carlin 2014). -
plot()gains ahighlight_interactionsargument (on both amaihda_modeland amaihda()analysis) that focuses and stars themaihda_interactions()-flagged strata on the BLUP-based views (effect_decomp,predicted,obs_vs_shrunken). PassTRUE(flags computed with defaults), a multiple-testing method such as"BH", or a precomputedmaihda_interactionsobject to reuse a specificconf_level/adjust; for an analysis the flags are computed once from the adjusted model and reused across views. Ineffect_decomp, labels follow the selected flagged set, so a BH screen labels only strata that survive the BH adjustment.FALSE(default) leaves every plot unchanged.
Improvements
-
maihda_mor()now requires the logit link, not merely the binomial family. The Median Odds Ratio is an odds-ratio-scale quantity derived from the logistic latent variance, so applying itsexp(sqrt(2 * V_A) * qnorm(0.75))formula to abinomial(link = "probit")fit returned a number that was not on the model’s scale.maihda_mor()now errors for a non-logit binomial link, andmaihda_discriminatory_accuracy()reports the (link-agnostic) AUC withmor = NAfor such fits, recording the link and explaining theNAin itsprint()method. - Clarified that
maihda_vpc_response()collapses the fixed part to its mean linear predictor before simulating, so for an adjusted (covariate) model the response-scale VPC is a conditional-at-mean estimate (evaluated at the average covariate profile), not one marginalised over the covariate distribution. It is exact for the canonical null/strata-only model; the documentation now states this and recommends reading it from the null model. - Updated the “MAIHDA for binary outcomes” vignette, which still claimed the package shipped no AUC/MOR helper and defined a local one. It now uses the exported
maihda_discriminatory_accuracy(),maihda_auc(), andmaihda_mor(), and points tomaihda_vpc_response()for the response-scale VPC. - Added the missing
shinycssloadersdependency to the README’s interactive-dashboard list (it is inDESCRIPTIONSuggests and used byrun_maihda_app()). - Plotting is now unified under the base
plot()generic.compare_maihda()andcompute_maihda_ternary_data()now return classed objects, soplot()dispatches automatically:-
plot()on acompare_maihda()result (wasplot_comparison()) -
plot()on acompare_maihda_groups()result, withtype = "vpc"/“components” (wasplot_group_comparison()) -
plot()on acompute_maihda_ternary_data()result (wasplot_maihda_ternary())
-
- The old
plot_comparison(),plot_group_comparison(), andplot_maihda_ternary()functions still work but are deprecated and emit a one-time warning pointing toplot(). -
fit_maihda()now records lme4 fit-quality diagnostics (singular fit and convergence warnings) on the model object;print()on the model andsummary()surface a “Fit diagnostics” note so a boundary/non-converged fit – which makes the VPC/PCV unreliable – is no longer silent. -
compare_maihda_groups()now raises a single aggregated warning naming any group whose lme4 fit was singular or failed to converge (per-group fits on small strata are where this is most likely), so an unreliable per-group VPC – often pinned at 0 by a boundary fit – is no longer silent. - The
"risk_vs_effect"plot no longer frames the outcome axis as “risk”. A higher predicted value is not universally “bad” (it depends on the outcome), so the axis and title now read as a neutral “mean predicted value/probability”, with a note that the direction depends on the outcome. - Clarified the documentation of
plot_prediction_deviation_panels()to match the implementation: the labelled points use a per-type metric – deviation from the mean prediction for Gaussian/Poisson (and the ordinal expected-score mode), the absolute deviance residual for binomial, and surprise for the ordinal surprise mode – and are not statistical outliers or model-misfit “deviants”. - Clarified that the per-group VPC/ICC in
compare_maihda_groups()is the between-stratum share of variance, which can differ across groups because of the residual variance as well as the between-stratum variance. The documentation now points to thevar_betweencolumn for comparing the absolute amount of intersectional variation, and notes that overlap of separate per-group intervals is not a valid test of whether two groups’ VPCs differ. -
plot()on acompare_maihda_groups()result gainstype = "between_variance"(andplot()on amaihda(group = )analysis gainstype = "group_between_variance"), which bars the absolute between-stratum variance by group – the magnitude of intersectional variation that the VPC share cannot convey. All group plots now name any groups omitted because their VPC was not estimable instead of dropping them silently. - Clarified the PCV documentation (
calculate_pvc(),stepwise_pcv(), and the print method): the PCV is a model-dependent change in between-stratum variance and equals variance “explained” only when the second model nests the first; the stepwise PCV is order-dependent and not a variable’s unique contribution. The vignette and Shiny app no longer describe PCV as variance causally “explained” by main effects or treat a negative PCV as evidence of hidden structural inequality. - Corrected the
summary()VPC/ICC documentation: the denominator is the total unexplained variance, which includes the variance of any additional random effects (not just between-stratum + residual), and a note on the weighted-Gaussian level-1 variance was added. - Documented which families the MAIHDA variance summaries support (
gaussian("identity"), binomial/Bernoulli with logit/probit,poisson("log")); other families such asGamma(link = "log")will fit butsummary()/VPC/PCV will stop with a clear “not implemented” error rather than returning an invalid number. - README clarifications: a note that the CRAN release can lag this repository (so the newest features may require the GitHub development version), and the interactive-dashboard dependency list now includes
future,promises, andhaven(for SPSS/Stata uploads). - Completed the PCV wording cleanup in the remaining vignette and Shiny-app text (the prior pass missed several spots), and softened over-strong app labels: the quadrant plot is “Mean Prediction vs. Stratum Effect” (not “Risk vs. Intersectional Effect”), the cumulative-PCV chart is “Change in Between-Stratum Variance” (not “Variance Explained”), and “deviant strata” is now “most extreme strata”.
- Removed the stale checked-in rendered vignette HTML (
vignettes/*.html); these are build artifacts generated from the.Rmdsources and had drifted out of sync with the corrected text. They are now git-ignored and added to.Rbuildignore, so a locally rendered HTML is never shipped in the tarball and R CMD build regeneratesinst/docfrom the.Rmd. - Fixed an invalid documented URL flagged by
R CMD check --as-cran: themaihda_country_data@sourcelinked tohttps://www.oecd.org/pisa/data/, which returns HTTP 403. It now links to the CRAN page of thelearningtowerpackage (the reproducible access route used to build the dataset), keeping the OECD PISA 2018 attribution in the text. - The
data-raw/maihda_health_data.Rregeneration script no longer callsinstall.packages("NHANES")as a side effect; like the country-data script it now stops with a clear message asking the developer to install the dependency.
Performance
-
make_strata()(and the prediction-time stratum lookup) now matches rows to strata with a single vectorised, collision-safe key match instead of an O(rows x strata x variables) row-by-row scan, so it scales to large survey datasets. Behaviour is unchanged, including the correct handling of values that contain the stratum-label separator.
Bug Fixes
-
calculate_pvc()andcompare_maihda()now apply their analytic-sample identity checks to design-weighted (wemix) fits. AWeMixResultsobject exposes nonobs()/model.frame(), so the row-count, row-identity and outcome-fingerprint guards previously fell through to a silent pass: two wemix fits sharing a formula,nand strata but fit to completely different outcome values compared as if they were the same analytic sample (calculate_pvc()returned a PCV,compare_maihda()reported VPCs, neither warned). The guards now fall back to the wrapper’s stored analytic$data(the rows the engine actually fit), so a mismatched outcome is caught –calculate_pvc()errors andcompare_maihda()warns – exactly as for the lme4 path. Genuinely matched wemix fits are unaffected. - The longitudinal PCV baseline is now the between-stratum variance at the observed baseline time (
ref_time = min(time)), not the raw time-0 random-intercept variance.maihda_longitudinal_pcv()(and themaihda_long_pcvprint method) evaluatedSigma[1, 1]directly, which is the variance attime = 0– correct only when time is centred so the baseline is 0. For a model whose time does not start at zero (e.g. waves 10:12) this reported the PCV of an extrapolated time-0 variance, which is meaningless and can even be negative; the baseline PCV is nowa(t)'Sigma a(t)evaluated atref_time, matching howsummary()reports the baseline VPC. The slope-variance PCV is unchanged (it is invariant to where time is zeroed). - Cross-sectional, single-value-per-stratum summaries are now refused for longitudinal (growth-curve) models rather than silently returning a misleading scalar. A growth model’s stratum estimand is a trajectory (random intercept + slope), so the scalar BLUP that
maihda_table()’s ranked-strata table andplot(type = "predicted" / "obs_vs_shrunken" / "risk_vs_effect" / "effect_decomp" / "prediction_deviation" / "ternary")build (which adds only the random intercept and drops the slope) misrepresents it. These paths now error – or, formaihda_table(), omit the ranking with an explanatory note – and point to the trajectory tools (predict(type = "strata"),plot(type = "trajectories"),plot(type = "vpc_trajectory")).summary()’s stratum-estimates print is relabelled “baseline (intercept) deviations” for a longitudinal fit, with the same pointer. -
maihda_ic()now reports the analytic sample sizenfor a design-weighted (wemix) fit instead ofNA.WeMixResultshas nonobs()method, so the IC table fell back toNA; it now usesnrow(model$data)(the rows the fit used), matching thenobsthatglance()already reports for the same model. -
Corrected the PCV for Gaussian
lmermodels (the REML pitfall). The proportional change in between-stratum variance compares the stratum variance of a null and an adjusted model that differ in their fixed effects (the adjusted model adds the dimensions’ additive main effects).lmerfits Gaussian models by REML, and a REML variance estimate is not comparable across models with different fixed effects, so the PCV was biased – it overstated the adjusted model’s residual between-stratum variance and therefore understated the additive share (the PCV).calculate_pvc()– and hencemaihda()’s PCV,stepwise_pcv(), and the per-group PCV incompare_maihda_groups()– now refits any REMLlmermodel with maximum likelihood (lme4::refitML()) before reading the variances (and before the parametric bootstrap, so the interval matches), exactly asmaihda_ic()already does for AIC/BIC and asanova()does onlme4models. In simulations with a known 60% additive share (40% interaction), the reported PCV moved from ~50% (biased, REML) to ~58–60% (ML), recovering the truth.glmer(GLMM) fits, the brms/wemix/ordinal engines, single-model VPC/ICC summaries (which correctly stay REML, being comparison-free), and singular/boundary fits are unaffected. Incompare_maihda_groups(),var_between_adjustedis now reported asvar_between * (1 - pcv)so it shares the REML scale of the VPC’svar_betweenand the table stays internally coherent. - VPC/ICC is no longer reported for a Gaussian model fit with a non-identity link (e.g.
gaussian(link = "log")). The residual variance is on the response scale while the between-stratum variance is on the link scale, sosummary(),compare_maihda(), andcalculate_pvc()now raise a clear error instead of silently returning an invalid variance partition. - Binary-outcome auto-detection now keys off the analytic model frame – after applying covariate transformations (e.g.
log(x)), dropping rows with missing values, dropping rows with a missing prior weight, and applying anysubset=(including negative/character row indices) – instead of the raw outcome column. An outcome that is only 0/1 once excluded rows are removed is now correctly fit withfamily = "binomial". - Data-masked engine arguments forwarded through
...(e.g.weights=,subset=,offset=) now work at any nesting depth, including through themaihda()andcompare_maihda_groups()wrappers. Arguments are captured as quosures and resolved with the data mask (rlang::eval_tidy), fixing the previous “object not found” / “..1 used in an incorrect context” failures (a directfit_maihda()call worked, but the wrappers did not). - A binary outcome is now recoded to 0/1 in a way that no longer breaks a
subset=expression referencing the original response labels (e.g.subset = y %in% c("no", "yes")): the subset is evaluated against the original response before recoding. -
compare_maihda_groups()now slices forwardedweights=/subset=/offset=to each group’s rows before fitting, not just for the row-count guard. An external (non-column)weightsvector previously failed every group with a length mismatch, and an externalsubsetvector could be recycled onto the wrong rows of later groups; both now align correctly per group. - The Gaussian VPC/ICC now accounts for prior
weights. With weights the per-observation residual variance issigma^2 / w_i, so the level-1 variance reported is the mean conditional residual variancemean(sigma^2 / w_i)rather than a singlesigma^2; unweighted models are unchanged. -
plot_effect_decomposition()now uses the stratum random effect (BLUP) itself as the intersectional component instead of (full prediction - fixed prediction). With additional random effects such as(1 | site)the latter wrongly absorbed those effects into the stratum component. - The stratum-level “surprise” in
plot_prediction_deviation_panels()(ordinal, surprise mode) is now the average per-observation surprise,mean(-log(p))(log loss), instead of-log(mean(p)), which could change stratum rankings. - The Shiny app (
run_maihda_app()) now also auto-detects a numeric 0/1 outcome and fits it as binomial under the default family, matching the core API, instead of silently fitting a linear probability model. -
compare_maihda_groups()now warns when groups end up with different populated strata even undershared_strata = TRUE, since their VPCs are then estimated over different stratum support and are not strictly directly comparable. -
plot_prediction_deviation_panels()now plots Poisson/count models on the response (expected-count) scale with count labels, rather than routing them through the Gaussian link-scale branch. -
compare_maihda_groups(min_group_n = ...)now guards the analytic sample size (the rows the model actually fits) rather than the raw group row count, so a group with enough raw rows but a tiny usable sample is skipped instead of being fit on a handful of observations. -
predict_maihda(type = "strata")now respectsnewdata: it returns only the strata present innewdata(and errors on a stratum the model never saw, astype = "individual"does) instead of always returning every training stratum. Withnewdata = NULLit still returns all strata. -
compare_maihda_groups()now counts populated strata for the pre-fit guard on the analytic model frame, not the raw subgroup. A group with two raw strata but only one stratum left after missing-row removal is now cleanly skipped as VPC-undefined instead of failing during fitting with “grouping factors must have > 1 sampled level”. -
n_bootfor bootstrap intervals must now be at least 10 (the minimum number of successful refits an interval requires); an unusably smalln_bootfails immediately with a clear message instead of only erroring after the bootstrap runs. -
calculate_pvc()andcompare_maihda()now detect differing priorweights: previously two models fit to the same rows/outcome/strata but with different weights compared as if equivalent (PCV returned silently, no warning), even though weights change the variance estimates.calculate_pvc()now errors andcompare_maihda()warns; an unweighted fit and an explicit unit-weight fit are still treated as equal. - When a two-level non-0/1 outcome is recoded to 0/1, the chosen mapping (which level becomes the modeled event = 1) is now reported via a
message()and stored on the model as$response_recoding. Previously the mapping followed alphabetical (character) or declared (factor) level order silently, with no signal at all whenfamily = "binomial"was passed explicitly, so the modeled event could be inverted unnoticed. The 0/1 assignment rule is unchanged (it matches baseglm). - Stratum-level plot aggregations (
plot(type = "predicted"),"risk_vs_effect","obs_vs_shrunken","effect_decomp") now honour priorweights, collapsing per-stratum predictions with a prior-weight-weighted mean (and weighting the reference lines by the summed weights). This makes the plots consistent with the weighted Gaussian VPC for survey/weighted fits; unweighted models are unaffected, and aggregated-binomial (cbind) fits, whoseweights(type = "prior")are the trials, are left unweighted to avoid double-counting. - The Shiny app’s “Compute Bootstrap CIs” control now actually produces the VPC/ICC bootstrap intervals it (and the vignette) advertise. The bootstrap was previously applied only to the PCV, so the headline VPC/ICC was shown as a point estimate with no interval. The interval is now computed in the background worker (keeping the UI responsive) and displayed alongside the VPC/ICC in the Model Summary and Interactive Explorer tabs; the PCV interval in the PCV Results tab is unchanged.
- The Shiny app (
run_maihda_app()) no longer aborts the whole analysis when the baseline (null) model has zero or negative between-stratum variance (a singular / no-between-variation fit), which makescalculate_pvc()error by design. The fitted model, VPC/ICC, summaries, and plots are now shown as usual and the PCV is reported as unavailable (with the underlying reason) instead of failing the entire workflow. -
compare_maihda_groups()now accepts a bare family function (e.g.family = stats::gaussian) – one of the documented forms (“as infit_maihda”), alongside a family name ("gaussian") and a family object (gaussian()). The per-group fits already handled it; only the family metadata recorded on the result did not, so the call fit every group and then failed with “object of type ‘closure’ is not subsettable”. The family function is now resolved to a family object up front, exactly asfit_maihda()does.
Diagnostics
-
brmsfits now record MCMC convergence diagnostics (maximum Rhat and the number of divergent transitions) alongside the engine, surfaced in the “Fit diagnostics” block ofprint()/summary()so a non-converged or divergent Bayesian fit is no longer silent. - Bootstrap VPC/PCV intervals now report Monte Carlo error:
summary()andcalculate_pvc()print the number of successful bootstrap draws and the Monte Carlo standard error so the precision of an interval can be judged.
MAIHDA 0.1.10
New Features
- Added
compare_maihda_groups()to compare intersectional inequality (VPC/ICC and between-/within-stratum variance) across levels of a higher-level grouping variable such as country, region, or survey wave. It fits a stratified MAIHDA model per group, by default using shared/global strata so VPCs are directly comparable, with optional per-group bootstrap confidence intervals. - Added
plot_group_comparison()to visualize the result either as a VPC-by-group forest plot or as stacked variance-composition bars.
Bug Fixes
- Fixed parametric-bootstrap confidence intervals for VPC (
summary()) and PVC (calculate_pvc()): failedrefit()iterations were silently recorded as0instead of being dropped, biasing intervals toward zero and suppressing the high-failure-rate warning. Failed iterations are now excluded correctly. - Corrected the Poisson VPC residual-variance approximation to
log(1 + 1/mu)(Stryhn et al. 2006); the previous1/mulinearization biased the VPC downward for low-count outcomes. -
plot_prediction_deviation_panels()no longer draws zero-width “95% CI” bars when the underlying model does not supply standard errors (e.g.lme4::merMod); intervals are omitted instead of collapsed.
MAIHDA 0.1.8
CRAN release: 2026-05-16
General Updates & New Features
- Added
plot_prediction_deviation_panels()function for visualizing predicted values and identifying deviant cases. - Added
plot_risk_vs_effect()function to create a quadrant scatterplot comparing overall marginal predicted risk against pure intersectional effects. - Added
plot_effect_decomposition()function to visually decompose the total deviation from the overall mean into additive and intersectional components. - Replaced the redundant “caterpillar” plot with the “predicted” plot in
plot()and the interactive dashboard. - Added automatic tertile binning (via an
autobinparameter) for numeric grouping variables with more than 10 unique values inmake_strata(). - Updated the interactive Shiny Dashboard (
run_maihda_app()) to include the new visualizations and a toggle for auto-binning continuous strata variables. - Added detection for binomial data.
fit_maihda()will now automatically detect binomial outcomes and switch to the appropriate family.
Bug Fixes
-
VPC/ICC Calculation Fix: Corrected the residual variance estimation for binomial and ordinal models. The package now accurately applies the theoretical level-1 variance ( for
"logit"links and for"probit"links) internally when summarizing models or bootstrapping the variance partition coefficient, avoiding deflated VPC/ICC metrics.
MAIHDA 0.1.7
CRAN release: 2026-04-05
General Updates & New Features
- Added
stepwise_pcv()function to sequentially estimate proportional change in variance (PCV) by adding predictors one-by-one. - Added a fully-featured interactive Shiny Dashboard (via
run_maihda_app()) for visual data exploration, model fitting, and performance visualization. - Improved bootstrap methods for more efficient confidence interval estimation.
- Added missing documentation block for the
maihda_sim_datadataset to resolveR CMD checkwarnings. - Updated test suite setup:
tests/testthat.Rwas modified to correctly usetest_check("MAIHDA")instead ofshinytest2. - Added
importFrom(stats, as.formula)for thestepwise_pcvfunction to prevent undefined warnings. - Updated
introduction.Rmdvignette: added standard CRAN installation instructions, and improved text clarity.
MAIHDA 0.1.0
CRAN release: 2026-04-03
Initial Release
- Initial CRAN submission
- Added
make_strata()function for creating intersectional strata - Added
fit_maihda()function for fitting multilevel models with lme4 (default) or brms engines - Added
summary()function for variance partition and stratum estimates - Added
predict_maihda()function for individual and stratum-level predictions - Added
plot()function with three plot types:- Caterpillar plots of stratum random effects
- Variance partition coefficient visualization
- Observed vs. shrunken estimates comparison
- Added
compare_maihda()function for comparing models with bootstrap confidence intervals - Added comprehensive documentation and vignettes
- Added unit tests for core functionality
Bug Fixes and Improvements
- Enhanced
make_strata()to properly handle missing values (NA) in input variables:- Observations with missing values in any stratum variable are now assigned NA stratum
- Missing values are no longer included as valid stratum categories
- Added comprehensive tests for missing value handling
