Creates strata (intersectional categories) by crossing multiple categorical variables in a data frame.
Arguments
- data
A data frame containing the variables to create strata from.
- vars
Character vector of variable names to use for creating strata.
- sep
Separator to use between variable values when creating stratum labels. Default is "_".
- min_n
Minimum number of observations a stratum needs in order to be kept. Observations in strata with fewer than min_n observations are coded as NA. Default is 1.
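A minimal base-R sketch shows how sep and min_n could shape the stratum labels. This is not make_strata()'s actual source. The helper sketch_strata_labels() is hypothetical, and the sketch assumes that labels are built by pasting each observation's values in the order given by vars. It does not handle missing values (see Details).

```r
# Hypothetical helper, NOT the package's implementation.
# Builds one label per observation by pasting the values of `vars`
# with `sep`, then sets labels of strata smaller than `min_n` to NA.
sketch_strata_labels <- function(data, vars, sep = "_", min_n = 1) {
  # Row-wise paste, e.g. "Female" + "Asian" -> "Female_Asian"
  labels <- do.call(paste, c(unname(as.list(data[vars])), sep = sep))
  # Count observations per label and find the undersized strata
  counts <- table(labels)
  too_small <- labels %in% names(counts)[counts < min_n]
  labels[too_small] <- NA_character_
  labels
}

df <- data.frame(
  gender = c("Female", "Female", "Male"),
  race   = c("Asian", "Asian", "Black"),
  stringsAsFactors = FALSE
)
sketch_strata_labels(df, c("gender", "race"), min_n = 2)
```

With min_n = 2, the combination observed only once (Male_Black) is labelled NA. The two Female_Asian observations keep their label.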
Value
A list with two elements:
- data
The original data frame with an added 'stratum' column. The strata_info data frame is also attached to it as an attribute for use by fit_maihda().
- strata_info
A data frame with one row per stratum, giving the stratum ID, its label, the number of observations (n), and the value of each variable that defines it.
Details
If any of the variables in vars is missing (NA) for an observation, that observation is assigned stratum = NA rather than being placed in a stratum that includes the missing value.
The strata_info data frame is also attached as an attribute of the returned data, which lets fit_maihda() automatically pick up stratum labels for use in plots and summaries.
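The NA rule and the attached attribute can be sketched in the same way. This is again a hypothetical helper, sketch_make_strata(), and not the package's code. It rests on three assumptions. First, strata are numbered in sorted label order, as the example output below suggests. Second, the summary is stored under an attribute named "strata_info", although the attribute name make_strata() actually uses may differ. Third, strata_info is simplified to the stratum, label and n columns.

```r
# Hypothetical helper, NOT the package's implementation.
sketch_make_strata <- function(data, vars, sep = "_") {
  # Rows with an NA in any of `vars` get no label (and so stratum = NA)
  complete <- stats::complete.cases(data[vars])
  labels <- rep(NA_character_, nrow(data))
  labels[complete] <- do.call(
    paste,
    c(unname(as.list(data[complete, vars, drop = FALSE])), sep = sep)
  )
  # Sorted unique labels define the stratum IDs 1, 2, ...
  levels_sorted <- sort(unique(labels[complete]))
  data$stratum <- match(labels, levels_sorted)  # NA labels stay NA
  strata_info <- data.frame(
    stratum = seq_along(levels_sorted),
    label = levels_sorted,
    n = as.integer(table(factor(labels[complete], levels = levels_sorted))),
    stringsAsFactors = FALSE
  )
  # Attach the summary so downstream code can read it with attr()
  attr(data, "strata_info") <- strata_info
  list(data = data, strata_info = strata_info)
}

df <- data.frame(
  gender = c("Female", "Male", "Female", NA),
  race   = c("Asian", "Black", "Asian", "White"),
  stringsAsFactors = FALSE
)
res <- sketch_make_strata(df, c("gender", "race"))
res$data$stratum
attr(res$data, "strata_info")
```

The fourth row has a missing gender, so it gets stratum = NA and is not counted in any stratum. It does not form a "NA_White" stratum.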
Examples
# Create strata from gender and race variables
result <- make_strata(maihda_sim_data, vars = c("gender", "race"))
print(result$strata_info)
#>   stratum           label   n gender     race
#> 1       1    Female_Asian   7 Female    Asian
#> 2       2    Female_Black  50 Female    Black
#> 3       3 Female_Hispanic  32 Female Hispanic
#> 4       4    Female_White 150 Female    White
#> 5       5      Male_Asian  13   Male    Asian
#> 6       6      Male_Black  44   Male    Black
#> 7       7   Male_Hispanic  52   Male Hispanic
#> 8       8      Male_White 152   Male    White
