library(tidyverse)
library(lubridate)
# Raw demographics
raw_demographics <- tribble(
~pt_id, ~site, ~rand_date, ~age, ~gender, ~ethnic, ~weight_kg, ~height_cm, ~trt_grp,
"0001", "001", "15Jan2024", 54, "Male", "Caucasian", 92.3, 172, "A",
"0002", "001", "15Jan2024", 62, "Female", "Caucasian", 78.1, 165, "B",
"0003", "001", "16Jan2024", 48, "Female", "Asian", 88.5, 162, "P",
"0004", "002", "17Jan2024", 71, "Male", "African American", 95.0, 178, "A",
"0005", "002", "17Jan2024", 39, "Female", "Caucasian", 70.2, 160, "B",
"0006", "002", "18Jan2024", 58, "Male", "Hispanic", 88.0, 175, "P",
"0007", "003", "18Jan2024", 45, "Female", "Caucasian", 82.3, 168, "A",
"0008", "003", "19Jan2024", 67, "Male", "Asian", 97.1, 170, "B",
"0009", "003", "19Jan2024", 52, "Female", "African American", 76.4, 163, "P",
"0010", "003", "20Jan2024", 44, "Male", "Caucasian", 91.5, 180, "A"
)
# Raw lab data
raw_labs <- tribble(
~pt_id, ~site, ~lab_date, ~test_name, ~result, ~unit, ~visit_num, ~visit_name,
"0001", "001", "15Jan2024", "HbA1c", 8.2, "%", 1, "Screening",
"0001", "001", "15Jan2024", "Glucose", 9.1, "mmol/L", 1, "Screening",
"0001", "001", "15Apr2024", "HbA1c", 7.6, "%", 2, "Week 13",
"0001", "001", "15Jul2024", "HbA1c", 7.1, "%", 3, "Week 26",
"0001", "001", "15Jan2025", "HbA1c", 6.8, "%", 4, "Week 52",
"0002", "001", "15Jan2024", "HbA1c", 9.0, "%", 1, "Screening",
"0002", "001", "15Jan2024", "Glucose", 10.2, "mmol/L", 1, "Screening",
"0002", "001", "15Apr2024", "HbA1c", 8.3, "%", 2, "Week 13",
"0002", "001", "15Jul2024", "HbA1c", 7.8, "%", 3, "Week 26",
"0002", "001", "15Jan2025", "HbA1c", 7.2, "%", 4, "Week 52",
"0003", "001", "16Jan2024", "HbA1c", 7.8, "%", 1, "Screening",
"0003", "001", "16Jan2024", "Glucose", 8.9, "mmol/L", 1, "Screening",
"0003", "001", "16Apr2024", "HbA1c", 7.9, "%", 2, "Week 13",
"0003", "001", "16Jul2024", "HbA1c", 7.7, "%", 3, "Week 26",
"0003", "001", "16Jan2025", "HbA1c", 7.5, "%", 4, "Week 52"
)
glimpse(raw_demographics)
glimpse(raw_labs)
2 Module 2: SDTM Deep Dive
Clinical Data Science for Pharma: CDISC From Scratch
2.1 What We Do in This Module
In Module 1 we learned what SDTM is. In this module we actually build it.
By the end of this module you will:
- Understand the raw data you start with (simulated EDC export)
- Know the rules for mapping raw data to SDTM
- Build a DM domain (Demographics) in both SAS and R
- Build an LB domain (Laboratory) in both SAS and R
- Understand what a Study Data Reviewer’s Guide (SDRG) is
We will use our running trial: GLPX-001.
2.2 Step 1: The Raw Data
Before SDTM exists, you have raw data exported from the EDC system. This data is often messy, inconsistent, and not submission-ready.
Here is what a raw demographics export might look like from GLPX-001:
| pt_id | site | rand_date | age | gender | ethnic | weight_kg | height_cm | trt_grp |
|---|---|---|---|---|---|---|---|---|
| 0001 | 001 | 15Jan2024 | 54 | Male | Caucasian | 92.3 | 172 | A |
| 0002 | 001 | 15Jan2024 | 62 | Female | Caucasian | 78.1 | 165 | B |
| 0003 | 001 | 16Jan2024 | 48 | Female | Asian | 88.5 | 162 | P |
| 0004 | 002 | 17Jan2024 | 71 | Male | African American | 95.0 | 178 | A |
| 0005 | 002 | 17Jan2024 | 39 | Female | Caucasian | 70.2 | 160 | B |
Problems with this raw data:
- `pt_id` alone is not a safe study-wide identifier: in many EDC exports subject numbers restart at each site, so site 001 and site 002 could both have a `0001`
- `gender` uses “Male/Female”; SDTM requires M/F
- `ethnic` uses free text; SDTM requires controlled terminology
- `rand_date` is in a SAS-style date format (15Jan2024); SDTM requires ISO 8601
- `trt_grp` uses A/B/P codes; SDTM needs the full arm description
- No `STUDYID`, no `DOMAIN` column
Mapping is the process of converting this raw data to SDTM. This is the core job of an SDTM programmer.
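To make the first problem concrete, here is a minimal sketch in base R. The toy data frame `raw` and its values are hypothetical (they are not the GLPX-001 export); it only illustrates how a subject number reused at two sites becomes unique once it is combined with the study and site identifiers.

```r
# Hypothetical toy export: two sites that each issued subject number "0001"
raw <- data.frame(
  site  = c("001", "001", "002"),
  pt_id = c("0001", "0002", "0001"),
  stringsAsFactors = FALSE
)

# pt_id alone repeats across sites
dup_pt_id <- anyDuplicated(raw$pt_id) > 0

# Combining study, site, and subject number gives a study-wide unique key
raw$USUBJID <- paste("GLPX-001", raw$site, raw$pt_id, sep = "-")
dup_usubjid <- anyDuplicated(raw$USUBJID) > 0

print(raw)
cat("pt_id duplicated:", dup_pt_id, "| USUBJID duplicated:", dup_usubjid, "\n")
```

The same `paste()` rule is what the DM mapping later in this module uses to construct `USUBJID`.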
2.3 Step 2: Create the Simulated Raw Data
Before mapping, we need our raw data to exist. Let us create it in both SAS and R so we have something to work with throughout the module.
2.3.1 Create raw data in SAS
/*=============================================================
GLPX-001 | Module 2 | Create simulated raw demographics data
=============================================================*/
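/* Assumes the RAW libref has already been assigned with a LIBNAME statement */
data raw.demographics;
/* Declare lengths first: list input defaults character variables to 8 bytes,
which would truncate "15Jan2024", "Caucasian", and "African American" */
length pt_id $4 site $3 rand_date $9 gender $6 ethnic $20 trt_grp $1;
infile datalines delimiter='|' missover;
input pt_id $ site $ rand_date $ age gender $ ethnic $
weight_kg height_cm trt_grp $;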
datalines;
0001|001|15Jan2024|54|Male|Caucasian|92.3|172|A
0002|001|15Jan2024|62|Female|Caucasian|78.1|165|B
0003|001|16Jan2024|48|Female|Asian|88.5|162|P
0004|002|17Jan2024|71|Male|African American|95.0|178|A
0005|002|17Jan2024|39|Female|Caucasian|70.2|160|B
0006|002|18Jan2024|58|Male|Hispanic|88.0|175|P
0007|003|18Jan2024|45|Female|Caucasian|82.3|168|A
0008|003|19Jan2024|67|Male|Asian|97.1|170|B
0009|003|19Jan2024|52|Female|African American|76.4|163|P
0010|003|20Jan2024|44|Male|Caucasian|91.5|180|A
;
run;
/* Raw lab data */
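data raw.labs;
/* Declare lengths first so lab_date ("15Jan2024") and visit_name ("Screening")
are not truncated to the 8-byte list-input default */
length pt_id $4 site $3 lab_date $9 test_name $10 unit $8 visit_name $12;
infile datalines delimiter='|' missover;
input pt_id $ site $ lab_date $ test_name $ result unit $
visit_num visit_name $;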
datalines;
0001|001|15Jan2024|HbA1c|8.2|%|1|Screening
0001|001|15Jan2024|Glucose|9.1|mmol/L|1|Screening
0001|001|15Apr2024|HbA1c|7.6|%|2|Week 13
0001|001|15Jul2024|HbA1c|7.1|%|3|Week 26
0001|001|15Jan2025|HbA1c|6.8|%|4|Week 52
0002|001|15Jan2024|HbA1c|9.0|%|1|Screening
0002|001|15Jan2024|Glucose|10.2|mmol/L|1|Screening
0002|001|15Apr2024|HbA1c|8.3|%|2|Week 13
0002|001|15Jul2024|HbA1c|7.8|%|3|Week 26
0002|001|15Jan2025|HbA1c|7.2|%|4|Week 52
0003|001|16Jan2024|HbA1c|7.8|%|1|Screening
0003|001|16Jan2024|Glucose|8.9|mmol/L|1|Screening
0003|001|16Apr2024|HbA1c|7.9|%|2|Week 13
0003|001|16Jul2024|HbA1c|7.7|%|3|Week 26
0003|001|16Jan2025|HbA1c|7.5|%|4|Week 52
;
run;
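2.3.2 Create raw data in R
The R version of the same data is the pair of `tribble()` calls at the top of this page, which create `raw_demographics` and `raw_labs`.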
2.4 Step 3: Build the DM Domain
The Demographics (DM) domain is always built first because every other domain needs USUBJID — which is constructed here.
2.4.1 The mapping rules for DM
| SDTM Variable | Source | Rule |
|---|---|---|
| `STUDYID` | Hardcoded | “GLPX-001” |
| `DOMAIN` | Hardcoded | “DM” |
| `USUBJID` | Constructed | STUDYID + “-” + site + “-” + pt_id |
| `SUBJID` | pt_id | As-is |
| `SITEID` | site | As-is |
| `AGE` | age | As-is (numeric) |
| `AGEU` | Hardcoded | “YEARS” |
| `SEX` | gender | Male → M, Female → F |
| `RACE` | ethnic | Map to CDISC controlled terminology |
| `RFSTDTC` | rand_date | Convert to ISO 8601 (YYYY-MM-DD) |
| `ARMCD` | trt_grp | A → DRUG1, B → DRUG2, P → PLACEBO |
| `ARM` | trt_grp | A → “Drug 1mg”, B → “Drug 2mg”, P → “Placebo” |
| `ACTARMCD` | trt_grp | Same as ARMCD (actual = planned here) |
| `ACTARM` | trt_grp | Same as ARM |
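One way to keep code-to-value rules like the `ARMCD` and `ARM` rows in a single place is a named lookup vector. The sketch below uses base R only; the code `"X"` is a hypothetical unexpected value added to show how an unmapped code surfaces as `NA` instead of being silently passed through.

```r
# Lookup vectors taken from the ARMCD and ARM rules in the mapping table
armcd_map <- c(A = "DRUG1", B = "DRUG2", P = "PLACEBO")
arm_map   <- c(A = "Drug 1mg", B = "Drug 2mg", P = "Placebo")

# "X" is a hypothetical unexpected code, included to show what happens to it
trt_grp <- c("A", "B", "P", "X")

ARMCD <- unname(armcd_map[trt_grp])   # codes missing from the lookup become NA
ARM   <- unname(arm_map[trt_grp])

# List any raw codes the lookup did not cover so they can be queried
unmapped <- unique(trt_grp[is.na(ARMCD)])

print(data.frame(trt_grp, ARMCD, ARM))
cat("Unmapped trt_grp codes:", unmapped, "\n")
```

Whether an unmapped code should stop the program or only be reported is a study-level decision; this sketch only reports it.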
2.4.2 Build DM in SAS
/*=============================================================
GLPX-001 | Module 2 | Build SDTM DM domain
=============================================================*/
/* Step 1: Define study-level constants */
%let studyid = GLPX-001;
/* Step 2: Map raw demographics to SDTM DM */
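data sdtm.dm;
/* Declare lengths before SET. Otherwise the first assignment fixes each length:
RACE would be cut to 5 characters ("BLACK"), ARMCD to 5 ("PLACE"), and ETHNIC,
which reuses the raw variable name, would keep the raw length and truncate
"NOT HISPANIC OR LATINO". */
length USUBJID $20 RACE $41 ETHNIC $22 ARMCD $8 ARM $20;
set raw.demographics;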
/* Study and domain */
STUDYID = "&studyid";
DOMAIN = "DM";
/* Unique subject identifier */
USUBJID = cats(STUDYID, "-", site, "-", pt_id);
SUBJID = pt_id;
SITEID = site;
/* Age */
AGE = age;
AGEU = "YEARS";
/* Sex: map to CDISC controlled terminology */
if upcase(gender) = "MALE" then SEX = "M";
else if upcase(gender) = "FEMALE" then SEX = "F";
else SEX = "U";
/* Race: map to CDISC controlled terminology */
select (upcase(ethnic));
when ("CAUCASIAN") RACE = "WHITE";
when ("ASIAN") RACE = "ASIAN";
when ("AFRICAN AMERICAN") RACE = "BLACK OR AFRICAN AMERICAN";
when ("HISPANIC") RACE = "WHITE"; /* Hispanic is ETHNIC not RACE */
otherwise RACE = "UNKNOWN";
end;
/* Ethnicity (separate from race in CDISC) */
if upcase(ethnic) = "HISPANIC" then ETHNIC = "HISPANIC OR LATINO";
else ETHNIC = "NOT HISPANIC OR LATINO";
/* Randomisation date: convert SAS date literal to ISO 8601 */
_rand_date = input(rand_date, date9.);
RFSTDTC = put(_rand_date, yymmdd10.); /* Produces YYYY-MM-DD */
/* Treatment arm */
select (trt_grp);
when ("A") do; ARMCD = "DRUG1"; ARM = "Drug 1mg"; end;
when ("B") do; ARMCD = "DRUG2"; ARM = "Drug 2mg"; end;
when ("P") do; ARMCD = "PLACEBO"; ARM = "Placebo"; end;
otherwise do; ARMCD = ""; ARM = ""; end;
end;
/* Actual arm = planned arm (no switches in this trial) */
ACTARMCD = ARMCD;
ACTARM = ARM;
/* Keep only SDTM variables — drop raw source variables */
keep STUDYID DOMAIN USUBJID SUBJID SITEID
AGE AGEU SEX RACE ETHNIC RFSTDTC
ARMCD ARM ACTARMCD ACTARM;
run;
/* Step 3: Sort by USUBJID (required for SDTM submission) */
proc sort data=sdtm.dm;
by USUBJID;
run;
/* Step 4: Quick check */
proc print data=sdtm.dm (obs=5);
title "SDTM DM Domain — First 5 Records";
run;
2.4.3 Build DM in R
library(tidyverse)
library(lubridate)
# Helper: parse SAS-style dates like "15Jan2024" to Date
parse_sas_date <- function(x) {
as.Date(x, format = "%d%b%Y")
}
# Helper: format Date to ISO 8601 character
to_iso8601 <- function(x) {
format(x, "%Y-%m-%d")
}
# Map race to CDISC controlled terminology
map_race <- function(ethnic) {
case_when(
str_to_upper(ethnic) == "CAUCASIAN" ~ "WHITE",
str_to_upper(ethnic) == "ASIAN" ~ "ASIAN",
str_to_upper(ethnic) == "AFRICAN AMERICAN" ~ "BLACK OR AFRICAN AMERICAN",
str_to_upper(ethnic) == "HISPANIC" ~ "WHITE",
TRUE ~ "UNKNOWN"
)
}
# Map ethnicity to CDISC controlled terminology
map_ethnic <- function(ethnic) {
case_when(
str_to_upper(ethnic) == "HISPANIC" ~ "HISPANIC OR LATINO",
TRUE ~ "NOT HISPANIC OR LATINO"
)
}
# Build DM domain
sdtm_dm <- raw_demographics |>
mutate(
STUDYID = "GLPX-001",
DOMAIN = "DM",
USUBJID = paste(STUDYID, site, pt_id, sep = "-"),
SUBJID = pt_id,
SITEID = site,
AGE = age,
AGEU = "YEARS",
SEX = case_when(
str_to_upper(gender) == "MALE" ~ "M",
str_to_upper(gender) == "FEMALE" ~ "F",
TRUE ~ "U"
),
RACE = map_race(ethnic),
ETHNIC = map_ethnic(ethnic),
RFSTDTC = to_iso8601(parse_sas_date(rand_date)),
ARMCD = case_when(
trt_grp == "A" ~ "DRUG1",
trt_grp == "B" ~ "DRUG2",
trt_grp == "P" ~ "PLACEBO"
),
ARM = case_when(
trt_grp == "A" ~ "Drug 1mg",
trt_grp == "B" ~ "Drug 2mg",
trt_grp == "P" ~ "Placebo"
),
ACTARMCD = ARMCD,
ACTARM = ARM
) |>
# Keep only SDTM variables
select(STUDYID, DOMAIN, USUBJID, SUBJID, SITEID,
AGE, AGEU, SEX, RACE, ETHNIC, RFSTDTC,
ARMCD, ARM, ACTARMCD, ACTARM) |>
# Sort by USUBJID
arrange(USUBJID)
# Preview
print(sdtm_dm)
2.4.4 What the DM domain looks like
After mapping, the first 3 rows of sdtm.dm should look like:
| STUDYID | DOMAIN | USUBJID | AGE | SEX | RACE | ETHNIC | RFSTDTC | ARMCD | ARM |
|---|---|---|---|---|---|---|---|---|---|
| GLPX-001 | DM | GLPX-001-001-0001 | 54 | M | WHITE | NOT HISPANIC OR LATINO | 2024-01-15 | DRUG1 | Drug 1mg |
| GLPX-001 | DM | GLPX-001-001-0002 | 62 | F | WHITE | NOT HISPANIC OR LATINO | 2024-01-15 | DRUG2 | Drug 2mg |
| GLPX-001 | DM | GLPX-001-001-0003 | 48 | F | ASIAN | NOT HISPANIC OR LATINO | 2024-01-16 | PLACEBO | Placebo |
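HISPANIC is an ethnicity in CDISC, not a race. The RACE variable uses categories like WHITE, ASIAN, BLACK OR AFRICAN AMERICAN. ETHNIC is a separate variable. Coding Hispanic subjects as RACE = WHITE is a simplification for this simulated data; in a real study, RACE must come from what was actually collected. Always check the CDISC Controlled Terminology list when you are unsure.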
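A quick way to review a race and ethnicity mapping is to list each distinct raw value next to the two SDTM values it produces. The following is a sketch in base R that restates the DM mapping rules as lookups; the `review` table is an illustrative name, not part of the course code.

```r
# Raw `ethnic` values as they appear in the simulated demographics export
ethnic <- c("Caucasian", "Caucasian", "Asian", "African American", "Caucasian",
            "Hispanic", "Caucasian", "Asian", "African American", "Caucasian")

key <- toupper(ethnic)

# Same mapping rules as the DM build, written as base R lookups
race_map <- c("CAUCASIAN" = "WHITE", "ASIAN" = "ASIAN",
              "AFRICAN AMERICAN" = "BLACK OR AFRICAN AMERICAN",
              "HISPANIC" = "WHITE")
RACE <- unname(race_map[key])
RACE[is.na(RACE)] <- "UNKNOWN"
ETHNIC <- ifelse(key == "HISPANIC", "HISPANIC OR LATINO", "NOT HISPANIC OR LATINO")

# One row per distinct raw value with the two SDTM values it produced
review <- unique(data.frame(ethnic, RACE, ETHNIC))
print(review, row.names = FALSE)
```

A table like this makes it easy to spot a raw value that fell through to UNKNOWN or that was assigned a race the source data does not support.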
2.5 Step 4: Build the LB Domain
Laboratory (LB) is one of the most complex SDTM domains because subjects have many lab results at many timepoints.
2.5.1 The mapping rules for LB
| SDTM Variable | Source | Rule |
|---|---|---|
| `STUDYID` | Hardcoded | “GLPX-001” |
| `DOMAIN` | Hardcoded | “LB” |
| `USUBJID` | Constructed | Same rule as DM |
| `LBSEQ` | Derived | Sequence number within subject |
| `LBTESTCD` | test_name | HbA1c → HBA1C, Glucose → GLUC |
| `LBTEST` | test_name | Full label |
| `LBORRES` | result | As character string (original result) |
| `LBORRESU` | unit | Original unit |
| `LBSTRESC` | result | Standardised result (character) |
| `LBSTRESN` | result | Standardised result (numeric) |
| `LBSTRESU` | unit | Standardised unit |
| `LBDTC` | lab_date | ISO 8601 |
| `VISITNUM` | visit_num | As-is |
| `VISIT` | visit_name | As-is |
- `LBORRES`: the result exactly as reported (character). Could be “8.2” or “>10” or “POSITIVE”
- `LBSTRESC`: the result after standardisation (character), with units converted to a standard
- `LBSTRESN`: the numeric version of `LBSTRESC` (only populated for numeric results)
In our simple trial all units are already standard so they will be the same. In real trials they often differ (e.g., glucose reported in mg/dL at some sites, mmol/L at others — standardise to one unit in LBSTRESC/LBSTRESN).
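The mixed-unit case can be sketched in base R as follows. The two glucose values are hypothetical, the conversion factor is an assumption derived from a glucose molar mass of about 180.16 g/mol, and rounding to one decimal place is an illustrative choice, not a CDISC rule.

```r
# Hypothetical glucose results reported in two different units
labs <- data.frame(
  LBORRES  = c("9.1", "164"),
  LBORRESU = c("mmol/L", "mg/dL"),
  stringsAsFactors = FALSE
)

# Assumed conversion factor: a molar mass of about 180.16 g/mol
# means 1 mmol/L of glucose is roughly 18.016 mg/dL
mgdl_per_mmoll <- 180.16 / 10

orres_num <- as.numeric(labs$LBORRES)

# Standardise everything to mmol/L; the original columns stay untouched
labs$LBSTRESN <- ifelse(labs$LBORRESU == "mg/dL",
                        round(orres_num / mgdl_per_mmoll, 1),
                        orres_num)
labs$LBSTRESC <- as.character(labs$LBSTRESN)
labs$LBSTRESU <- "mmol/L"

print(labs)
```

`LBORRES` and `LBORRESU` still hold exactly what the site reported, while the three standardised variables share one unit and can be analysed together.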
2.5.2 Build LB in SAS
/*=============================================================
GLPX-001 | Module 2 | Build SDTM LB domain
=============================================================*/
data sdtm.lb_pre;
set raw.labs;
STUDYID = "&studyid";
DOMAIN = "LB";
USUBJID = cats(STUDYID, "-", site, "-", pt_id);
/* Test codes and labels */
select (upcase(test_name));
when ("HBA1C") do;
LBTESTCD = "HBA1C";
LBTEST = "Hemoglobin A1C";
end;
when ("GLUCOSE") do;
LBTESTCD = "GLUC";
LBTEST = "Glucose";
end;
otherwise do;
LBTESTCD = upcase(test_name);
LBTEST = test_name;
end;
end;
/* Original result and unit */
LBORRES = strip(put(result, best12.)); /* Convert numeric to character; strip() removes the leading blanks that best12. adds */
LBORRESU = unit;
/* Standardised result (same as original here — units already standard) */
LBSTRESC = LBORRES;
LBSTRESN = result;
LBSTRESU = unit;
/* Date */
_lab_date = input(lab_date, date9.);
LBDTC = put(_lab_date, yymmdd10.);
/* Visit */
VISITNUM = visit_num;
VISIT = visit_name;
keep STUDYID DOMAIN USUBJID LBTESTCD LBTEST
LBORRES LBORRESU LBSTRESC LBSTRESN LBSTRESU
LBDTC VISITNUM VISIT;
run;
/* Add sequence number within subject */
proc sort data=sdtm.lb_pre;
by USUBJID LBDTC LBTESTCD;
run;
data sdtm.lb;
set sdtm.lb_pre;
by USUBJID;
if first.USUBJID then LBSEQ = 0;
LBSEQ + 1;
run;
proc print data=sdtm.lb (obs=8);
title "SDTM LB Domain — First 8 Records";
run;
2.5.3 Build LB in R
# Test code mapping
map_testcd <- function(test_name) {
case_when(
str_to_upper(test_name) == "HBA1C" ~ "HBA1C",
str_to_upper(test_name) == "GLUCOSE" ~ "GLUC",
TRUE ~ str_to_upper(test_name)
)
}
map_test <- function(test_name) {
case_when(
str_to_upper(test_name) == "HBA1C" ~ "Hemoglobin A1C",
str_to_upper(test_name) == "GLUCOSE" ~ "Glucose",
TRUE ~ test_name
)
}
# Build LB domain
sdtm_lb <- raw_labs |>
mutate(
STUDYID = "GLPX-001",
DOMAIN = "LB",
USUBJID = paste(STUDYID, site, pt_id, sep = "-"),
LBTESTCD = map_testcd(test_name),
LBTEST = map_test(test_name),
LBORRES = as.character(result), # Original result as character
LBORRESU = unit,
LBSTRESC = as.character(result), # Standardised (same here)
LBSTRESN = result, # Numeric standardised
LBSTRESU = unit,
LBDTC = to_iso8601(parse_sas_date(lab_date)),
VISITNUM = visit_num,
VISIT = visit_name
) |>
# Sort and add sequence number within subject
arrange(USUBJID, LBDTC, LBTESTCD) |>
group_by(USUBJID) |>
mutate(LBSEQ = row_number()) |>
ungroup() |>
select(STUDYID, DOMAIN, USUBJID, LBSEQ,
LBTESTCD, LBTEST,
LBORRES, LBORRESU, LBSTRESC, LBSTRESN, LBSTRESU,
LBDTC, VISITNUM, VISIT)
print(sdtm_lb, n = 8)
2.5.4 What the LB domain looks like
| USUBJID | LBSEQ | LBTESTCD | LBORRES | LBSTRESU | LBDTC | VISIT |
|---|---|---|---|---|---|---|
| GLPX-001-001-0001 | 1 | GLUC | 9.1 | mmol/L | 2024-01-15 | Screening |
| GLPX-001-001-0001 | 2 | HBA1C | 8.2 | % | 2024-01-15 | Screening |
| GLPX-001-001-0001 | 3 | HBA1C | 7.6 | % | 2024-04-15 | Week 13 |
| GLPX-001-001-0001 | 4 | HBA1C | 7.1 | % | 2024-07-15 | Week 26 |
| GLPX-001-001-0001 | 5 | HBA1C | 6.8 | % | 2025-01-15 | Week 52 |
2.6 Step 5: Validate Your SDTM
After building SDTM datasets you must check them. Regulators run their own checks — so you run yours first. Key things to verify:
2.6.1 Checks to always run
1. Is USUBJID consistent across domains?
Every subject in LB must also be in DM. No orphan records.
/* SAS: Check all LB subjects exist in DM */
proc sql;
select distinct lb.USUBJID
from sdtm.lb as lb
left join sdtm.dm as dm on lb.USUBJID = dm.USUBJID
where dm.USUBJID is null;
quit;
/* Should return 0 rows */
# R: Check all LB subjects exist in DM
orphan_subjects <- sdtm_lb |>
anti_join(sdtm_dm, by = "USUBJID") |>
distinct(USUBJID)
if (nrow(orphan_subjects) == 0) {
cat("✅ All LB subjects found in DM\n")
} else {
cat("❌ Orphan subjects in LB:\n")
print(orphan_subjects)
}
2. Are dates in ISO 8601 format?
/* SAS: Check date format */
data _null_;
set sdtm.lb;
if not prxmatch('/^\d{4}-\d{2}-\d{2}$/', LBDTC) then
put "BAD DATE: " USUBJID= LBDTC=;
run;
# R: Check date format
bad_dates <- sdtm_lb |>
filter(!str_detect(LBDTC, "^\\d{4}-\\d{2}-\\d{2}$"))
if (nrow(bad_dates) == 0) {
cat("✅ All dates in ISO 8601 format\n")
} else {
cat("❌ Bad dates found:\n")
print(bad_dates |> select(USUBJID, LBDTC))
}
3. Are controlled terminology values valid?
# R: Check SEX values
valid_sex <- c("M", "F", "U", "UNDIFFERENTIATED")
bad_sex <- sdtm_dm |>
filter(!SEX %in% valid_sex)
if (nrow(bad_sex) == 0) {
cat("✅ All SEX values valid\n")
} else {
cat("❌ Invalid SEX values:\n")
print(bad_sex |> select(USUBJID, SEX))
}
# Check RACE values
valid_race <- c("WHITE", "BLACK OR AFRICAN AMERICAN", "ASIAN",
"AMERICAN INDIAN OR ALASKA NATIVE",
"NATIVE HAWAIIAN OR OTHER PACIFIC ISLANDER",
"MULTIPLE", "UNKNOWN", "NOT REPORTED")
bad_race <- sdtm_dm |>
filter(!RACE %in% valid_race)
if (nrow(bad_race) == 0) {
cat("✅ All RACE values valid\n")
} else {
cat("❌ Invalid RACE values:\n")
print(bad_race |> select(USUBJID, RACE))
}
4. Is LBSEQ unique within subject?
duplicates <- sdtm_lb |>
group_by(USUBJID, LBSEQ) |>
filter(n() > 1)
if (nrow(duplicates) == 0) {
cat("✅ LBSEQ is unique within subject\n")
} else {
cat("❌ Duplicate LBSEQ values found\n")
print(duplicates)
}
3 Exercise 2.1: Build the DM domain
4 Exercise 2.2: Build the LB domain
4.1 Step 6: Save Your SDTM Datasets
4.1.1 Save in SAS (as .xpt transport files for submission)
/* Save as SAS transport format (.xpt) — required for FDA submission */
libname xptout xport "/path/to/sdtm/dm.xpt";
proc copy in=sdtm out=xptout;
select dm;
run;
libname xptout xport "/path/to/sdtm/lb.xpt";
proc copy in=sdtm out=xptout;
select lb;
run;
4.1.2 Save in R (as .xpt using haven)
library(haven)
# Save as SAS transport format (v5 xpt — FDA requirement)
write_xpt(sdtm_dm, "data/sdtm/dm.xpt", version = 5, name = "DM")
write_xpt(sdtm_lb, "data/sdtm/lb.xpt", version = 5, name = "LB")
cat("SDTM datasets saved to data/sdtm/\n")
The FDA requires submission datasets in SAS Version 5 transport format (.xpt). This is an old but stable format that most statistical software can read. The haven package in R writes this format with write_xpt() when version = 5 is specified.
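Because the Version 5 transport format is old, it has tight limits, including variable names of at most 8 characters and character values of at most 200 bytes. The helper below is a hypothetical pre-export check written in base R (`check_xpt_v5` is not a haven function); it covers only these two limits and is a sketch, not a full conformance check.

```r
# Check a data frame against two SAS V5 transport limits
check_xpt_v5 <- function(df, max_name = 8, max_char_bytes = 200) {
  problems <- character(0)
  nm <- names(df)

  # Variable names: at most 8 characters, letters/digits/underscore,
  # and not starting with a digit
  bad_names <- nm[nchar(nm) > max_name | !grepl("^[A-Za-z_][A-Za-z0-9_]*$", nm)]
  if (length(bad_names) > 0) {
    problems <- c(problems, paste("Invalid variable name:", bad_names))
  }

  # Character values: at most 200 bytes each
  for (v in nm[vapply(df, is.character, logical(1))]) {
    vals <- df[[v]][!is.na(df[[v]])]
    if (length(vals) > 0 && max(nchar(vals, type = "bytes")) > max_char_bytes) {
      problems <- c(problems, paste("Value longer than 200 bytes in:", v))
    }
  }
  problems
}

# Hypothetical example: one compliant name and one that is too long
demo <- data.frame(USUBJID = "GLPX-001-001-0001",
                   TREATMENT_GROUP = "A",
                   stringsAsFactors = FALSE)
print(check_xpt_v5(demo))
```

Running a check like this on `sdtm_dm` and `sdtm_lb` before `write_xpt()` catches naming problems early; an empty result means neither limit was exceeded.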
4.2 The Study Data Reviewer’s Guide (SDRG)
When you submit SDTM to the FDA, you also submit a Study Data Reviewer’s Guide (SDRG), a document (usually provided as a PDF) that explains:
- What datasets are included and why
- Any deviations from the SDTM Implementation Guide
- Custom domains (if any)
- Key decisions made during mapping
- Where to find important variables
The SDRG is what the FDA reviewer reads first before looking at any data. A clear SDRG makes review faster and reduces questions from regulators.
Example SDRG entry for our DM domain:
DM domain: The ARMCD variable was derived from the raw EDC field `trt_grp` using the mapping A=DRUG1, B=DRUG2, P=PLACEBO. Subjects with `trt_grp` = “P” received matching placebo. RFSTDTC was derived from the randomisation date. Hispanic subjects were coded as RACE=WHITE and ETHNIC=HISPANIC OR LATINO per CDISC controlled terminology.
4.3 Module 2 Summary
4.4 Your Tasks Before Module 3
Answers:
LBORRES is character because original results can be non-numeric: “POSITIVE”, “>10.0”, “< 0.5”. Making it character preserves exactly what was reported.
For the unit conversion: you would standardise to one unit (e.g., mmol/L) in LBSTRESC, LBSTRESN, and LBSTRESU, while keeping the original values in LBORRES and LBORRESU. The conversion for glucose is mmol/L = mg/dL ÷ 18.02 (approximately; glucose has a molar mass of about 180.16 g/mol).
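The first answer can be illustrated with a short base R sketch. The result values are hypothetical; the point is that the character column keeps every reported value, while the numeric column is only populated where a plain number exists.

```r
# Hypothetical original results, including values that are not plain numbers
LBORRES <- c("8.2", ">10.0", "POSITIVE", "< 0.5")

# as.numeric() returns NA (with a warning) for anything that is not a number,
# which matches the rule that LBSTRESN is only populated for numeric results
LBSTRESN <- suppressWarnings(as.numeric(LBORRES))

print(data.frame(LBORRES, LBSTRESN))
```

How values such as “>10.0” should be represented in LBSTRESC is a study-level convention that this sketch does not address.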
4.5 What’s Next
In Module 3 we build ADaM datasets from the SDTM we just created. We will build ADSL (one row per subject, all baseline variables) and ADLB (with baseline, change from baseline, and analysis flags). This is where the real analytical work begins.
This course is open source and free forever. Found an error or want to contribute? Open an issue or pull request on GitHub.