Adiposity SDS (Sex-Stratified)
Source: vignettes/articles/adiposity_sds_strat.Rmd
Scope
Sex-stratified SDS (z-scores) for anthropometric measures using separate male/female references, with NA policies, optional extreme screening, and flexible variable mapping.
When to use this
- You need sex-specific standardization before modeling or flagging outliers.
- Your study has both men and women and you want SDS relative to sex-appropriate reference means/SDs.
- You want built-in options for missing data handling and plausibility screening.
What you need (rich guide)
- Required: sex (coded M/F or 1/2 after mapping) and at least one anthropometric variable present in both ref$M and ref$F.
- Reference list: ref$M and ref$F must cover identical variable names, each with mean and sd > 0.
- Units: use the same units in your data as in the reference (e.g., BMI kg/m^2, waist cm, height cm).
- Optional: allow_partial = TRUE to skip missing variables instead of erroring.
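The reference requirements above can be checked before calling the function. The following is a minimal sketch, not part of HealthMarkers: check_ref() is a hypothetical helper, and it assumes each reference entry is a named numeric vector with mean and sd elements, as in the examples in this article.

```r
# Hypothetical helper (not part of HealthMarkers): sanity-check a reference list
check_ref <- function(ref) {
  stopifnot(is.list(ref), all(c("M", "F") %in% names(ref)))
  # Both sexes must cover the same variable names
  if (!setequal(names(ref$M), names(ref$F))) {
    stop("ref$M and ref$F must cover identical variable names.")
  }
  for (sx in c("M", "F")) {
    for (v in names(ref[[sx]])) {
      r <- ref[[sx]][[v]]
      # Each entry needs a finite mean and a strictly positive, finite sd
      ok <- is.numeric(r) &&
        all(c("mean", "sd") %in% names(r)) &&
        all(is.finite(r[c("mean", "sd")])) &&
        r[["sd"]] > 0
      if (!isTRUE(ok)) {
        stop(sprintf("ref$%s$%s needs a finite mean and sd > 0.", sx, v))
      }
    }
  }
  invisible(TRUE)
}

ref_ok <- list(
  M = list(BMI = c(mean = 24.5, sd = 3.8)),
  F = list(BMI = c(mean = 22.1, sd = 4.2))
)
check_ref(ref_ok)
```

Running a check like this first gives a clearer error than a failure deep inside the standardization call.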
Load packages and data
Use a small slice of the packaged simulated data. Replace
sim_small with your data.
library(HealthMarkers)
library(dplyr)
sim_path <- system.file("extdata", "simulated_hm_data.rds", package = "HealthMarkers")
sim <- readRDS(sim_path)
# Normalize sex labels to M/F and drop rows whose sex cannot be recoded
sim_small <- sim %>%
  mutate(sex = case_when(
    sex %in% c("M", "F") ~ as.character(sex),
    sex %in% c("male", "Male", "m") ~ "M",
    sex %in% c("female", "Female", "f") ~ "F",
    sex %in% c(1, "1") ~ "M",
    sex %in% c(2, "2") ~ "F",
    TRUE ~ NA_character_
  )) %>%
  filter(!is.na(sex)) %>%
  slice_head(n = 200)
Quick start (self-contained, always runs)
adiposity_sds_strat() returns a tibble of only
the <var>_SDS columns — not the original
data. Use dplyr::bind_cols() to rejoin.
# Hard-coded reference values (typical population means/SDs)
ref <- list(
  M = list(BMI = c(mean = 24.5, sd = 3.8), waist = c(mean = 88, sd = 12)),
  F = list(BMI = c(mean = 22.1, sd = 4.2), waist = c(mean = 76, sd = 11))
)
df_demo <- data.frame(
  sex   = c("M", "F", "M", "F"),
  BMI   = c(25.2, 21.8, 27.1, 19.5),
  waist = c(85, 72, 95, 68)
)
col_map <- list(sex = "sex", vars = list(BMI = "BMI", waist = "waist"))
sds_out <- adiposity_sds_strat(df_demo, col_map = col_map, ref = ref)
sds_out
#> # A tibble: 4 × 2
#>   BMI_SDS waist_SDS
#>     <dbl>     <dbl>
#> 1  0.184     -0.25
#> 2 -0.0714    -0.364
#> 3  0.684      0.583
#> 4 -0.619     -0.727
Spot-check row 1 (male): BMI_SDS = (25.2 − 24.5) / 3.8 = 0.1842. Row 2 (female): BMI_SDS = (21.8 − 22.1) / 4.2 = −0.0714.
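The same spot-check can be run for every row in base R. This is a sketch of the arithmetic only, not the package's implementation; manual_sds() is a hypothetical helper, and the data and reference values are repeated from the quick start so the block runs on its own.

```r
ref <- list(
  M = list(BMI = c(mean = 24.5, sd = 3.8), waist = c(mean = 88, sd = 12)),
  F = list(BMI = c(mean = 22.1, sd = 4.2), waist = c(mean = 76, sd = 11))
)
df_demo <- data.frame(
  sex   = c("M", "F", "M", "F"),
  BMI   = c(25.2, 21.8, 27.1, 19.5),
  waist = c(85, 72, 95, 68)
)

# Hypothetical helper: look up each row's sex-specific mean/sd, then standardize
manual_sds <- function(x, sex, var, ref) {
  sex   <- as.character(sex)
  mu    <- vapply(sex, function(s) ref[[s]][[var]][["mean"]], numeric(1))
  sigma <- vapply(sex, function(s) ref[[s]][[var]][["sd"]], numeric(1))
  unname((x - mu) / sigma)
}

manual <- data.frame(
  BMI_SDS   = manual_sds(df_demo$BMI, df_demo$sex, "BMI", ref),
  waist_SDS = manual_sds(df_demo$waist, df_demo$sex, "waist", ref)
)
round(manual, 3)
```

The rounded values should agree with the tibble printed above, up to display precision.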
Walkthrough with simulated data (build sex-specific refs)
# Build sex-specific references from the demo slice
ref_sex <- sim_small |>
  group_by(sex) |>
  summarise(
    BMI = list(c(mean = mean(BMI, na.rm = TRUE), sd = sd(BMI, na.rm = TRUE))),
    waist = list(c(mean = mean(waist, na.rm = TRUE), sd = sd(waist, na.rm = TRUE))),
    height = list(c(mean = mean(height, na.rm = TRUE), sd = sd(height, na.rm = TRUE)))
  )
if (!all(c("M", "F") %in% ref_sex$sex)) {
  stop("Demo data slice must contain both M and F; normalize or increase sample.")
}
# which() yields the single row index per sex; `[[` needs a length-one index
i_m <- which(ref_sex$sex == "M")
i_f <- which(ref_sex$sex == "F")
ref <- list(
  M = list(
    BMI = ref_sex$BMI[[i_m]],
    waist = ref_sex$waist[[i_m]],
    height = ref_sex$height[[i_m]]
  ),
  F = list(
    BMI = ref_sex$BMI[[i_f]],
    waist = ref_sex$waist[[i_f]],
    height = ref_sex$height[[i_f]]
  )
)
asds_strat <- adiposity_sds_strat(
  data = sim_small,
  col_map = col_map,
  ref = ref,
  na_action = "keep",
  check_extreme = FALSE,
  extreme_action = "cap",
  allow_partial = FALSE,
  prefix = "",
  verbose = FALSE
)
new_cols <- setdiff(names(asds_strat), names(sim_small))
asds_strat %>%
  slice_head(n = 5) %>%
  select(all_of(new_cols))
Interpretation: each <var>_SDS is standardized against the sex-specific mean/SD; values near 0 are average for that sex, positive values are above the sex-specific mean, and negative values are below it.
Missing data and extremes
Compare row handling and capping.
demo <- sim_small[1:8, c("sex", "BMI", "waist", "height")]
demo$BMI[3] <- NA      # missing
demo$waist[5] <- 300   # extreme high waist
demo$height[2] <- 20   # extreme low height
demo_keep <- adiposity_sds_strat(
  data = demo,
  col_map = col_map,
  ref = ref,
  na_action = "keep",
  check_extreme = TRUE,
  extreme_action = "cap",
  allow_partial = FALSE,
  prefix = ""
)
demo_omit <- adiposity_sds_strat(
  data = demo,
  col_map = col_map,
  ref = ref,
  na_action = "omit",
  check_extreme = TRUE,
  extreme_action = "cap",
  allow_partial = FALSE,
  prefix = ""
)
list(
  keep_rows = nrow(demo_keep),
  omit_rows = nrow(demo_omit),
  capped_preview = demo_keep %>% select(ends_with("_SDS")) %>% slice_head(n = 3)
)
Column prefix
Use prefix to namespace output columns when combining
multiple calls on the same data frame:
# Same data as quick-start but prefix with "z_"
sds_prefixed <- adiposity_sds_strat(
  df_demo, col_map = col_map, ref = ref, prefix = "z_"
)
names(sds_prefixed) # "z_BMI_SDS" "z_waist_SDS"
#> [1] "z_BMI_SDS" "z_waist_SDS"
Verbose diagnostics
Set verbose = TRUE to emit three structured messages per call:
- Preparing inputs: start-of-function signal.
- Column map: confirms which data column each mapped variable resolved to. Example:
  adiposity_sds_strat(): column map: BMI -> 'BMI', waist -> 'waist', sex -> 'sex', ...
- Results summary: shows how many rows computed successfully (non-NA) per output column. Example:
  adiposity_sds_strat(): results: BMI_SDS 28/30, waist_SDS 30/30, ...
verbose = TRUE emits at the "inform" level; options(healthmarkers.verbose = "inform") must also be active:
old_opt <- options(healthmarkers.verbose = "inform")
invisible(adiposity_sds_strat(df_demo, col_map = col_map, ref = ref, verbose = TRUE))
#> adiposity_sds_strat(): preparing inputs (2 vars, 4 rows)
#> adiposity_sds_strat(): column map: BMI -> 'BMI', waist -> 'waist'
#> adiposity_sds_strat(): results: BMI_SDS 4/4, waist_SDS 4/4
options(old_opt)
Reset with options(healthmarkers.verbose = NULL) or options(healthmarkers.verbose = "none").
Expectations
- col_map$sex must be present; sex must map to M/F (or 1/2) or the call errors.
- ref$M and ref$F must list identical variables with finite mean and sd > 0.
- All mapped columns must exist unless allow_partial = TRUE, in which case missing variables are skipped with an info message.
- na_action controls whether rows with missing inputs are kept, dropped, or raise an error; missing values propagate to SDS when kept.
- check_extreme + extreme_action screen raw inputs before standardizing; SDS outputs themselves are not capped.
- Numeric coercion happens with warnings if NAs are introduced; align your data units with the reference units.
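To make the order of operations concrete, the sketch below screens raw values first and standardizes afterwards. It illustrates the idea only: the bounds, the cap_raw() helper, and the reading of "cap" as clamping to the nearest bound are assumptions for illustration, not the package's defaults or implementation.

```r
# Hypothetical plausibility bounds; NOT the package's defaults
bounds <- list(waist = c(40, 200))

# Hypothetical helper: clamp a raw vector into [lo, hi]; NA stays NA
cap_raw <- function(x, lo, hi) pmin(pmax(x, lo), hi)

waist_raw    <- c(85, 300, NA, 72)
waist_capped <- cap_raw(waist_raw, bounds$waist[1], bounds$waist[2])
waist_capped

# Standardize the capped raw values; the resulting SDS is not clamped itself
ref_m     <- c(mean = 88, sd = 12)
waist_sds <- (waist_capped - ref_m[["mean"]]) / ref_m[["sd"]]
waist_sds

# Row handling in the spirit of na_action: "keep" retains every row and the NA
# carries through to the SDS, while "omit" drops rows with any missing input
demo_na <- data.frame(BMI = c(25, NA, 22), waist = c(85, 90, 70))
nrow(demo_na)                             # rows under "keep"
nrow(demo_na[complete.cases(demo_na), ])  # rows under "omit"
```

Because only the raw input is capped, a capped waist of 200 cm still yields a large SDS here; the SDS scale itself has no ceiling.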
Common pitfalls
- Sex not normalized to M/F (or 1/2) leads to an error; recode before calling.
- References built from a small or biased subset will give unstable SDS; use appropriate population means/SDs.
- Unit mismatches (e.g., inches vs cm, lb vs kg) will distort SDS; keep units consistent with the reference.
- Forgetting allow_partial = TRUE when some variables are absent will error; enable it to compute what’s available.
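The first and third pitfalls can be addressed in a short preprocessing step. The sketch below uses a hypothetical raw data frame with mixed sex codes and imperial units; the column names height_in and weight_lb are assumptions for illustration.

```r
# Hypothetical raw data with mixed sex codes and imperial units
raw <- data.frame(
  sex       = c("male", "F", "2", "m"),
  height_in = c(70, 64, 66, 72),
  weight_lb = c(180, 140, 150, 200),
  stringsAsFactors = FALSE
)

# Recode sex to M/F; anything unrecognized becomes NA
sex_chr <- tolower(trimws(as.character(raw$sex)))
raw$sex <- ifelse(sex_chr %in% c("m", "male", "1"), "M",
           ifelse(sex_chr %in% c("f", "female", "2"), "F", NA_character_))

# Convert to metric: 1 in = 2.54 cm, 1 lb = 0.45359237 kg
raw$height <- raw$height_in * 2.54
raw$weight <- raw$weight_lb * 0.45359237

# BMI in kg/m^2, matching the reference units used in this article
raw$BMI <- raw$weight / (raw$height / 100)^2
raw[, c("sex", "height", "weight", "BMI")]
```

Converting once, up front, keeps every later call aligned with the reference units.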
Validation notes
- Within each sex, SDS should center near 0 with SD near 1 if references match the data population.
- Spot-check a variable: (value - sex_mean)/sex_sd should match the corresponding <var>_SDS.
- Use check_extreme = TRUE with tailored extreme_rules to catch implausible anthropometrics before standardizing.
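The first validation note can be checked directly. The sketch below uses simulated values (hypothetical, for illustration only) and standardizes each sex against its own sample mean and SD in base R, so the within-sex mean and SD come out as 0 and 1 up to floating-point error. With an external reference, expect values near, not exactly at, 0 and 1.

```r
set.seed(1)
# Simulated data (hypothetical values, for illustration only)
d <- data.frame(
  sex = rep(c("M", "F"), each = 100),
  BMI = c(rnorm(100, mean = 25, sd = 4), rnorm(100, mean = 23, sd = 4))
)

# ave() applies the function within each sex and returns a row-aligned vector
d$BMI_SDS <- ave(d$BMI, d$sex, FUN = function(x) (x - mean(x)) / sd(x))

# Within-sex summaries of the SDS
sds_mean <- tapply(d$BMI_SDS, d$sex, mean)
sds_sd   <- tapply(d$BMI_SDS, d$sex, sd)
round(rbind(mean = sds_mean, sd = sds_sd), 6)
```

A within-sex mean far from 0 or an SD far from 1 on real data suggests that the reference population or the units do not match the study sample.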
See also
- adiposity_sds() for pooled-sex SDS.
- obesity_indices() and obesity_metrics() for related adiposity markers.