install.packages(c(
  "tidyverse",        # data manipulation and plotting
  "haven",            # read SAS datasets (.sas7bdat, .xpt)
  "admiral",          # ADaM dataset building (pharmaverse)
  "pharmaversesdtm",  # example SDTM data
  "gt",               # publication-quality tables
  "rtables",          # clinical-style tables
  "ggplot2",          # figures
  "lubridate"         # date handling
))
Introduction to clinical trial data analysis
Foundations of CDISC data standards
Course Overview
Welcome
This free course is built for biologists, computational biologists, and data analysts who want to transition into clinical data science in the pharmaceutical industry. It was created from the perspective of someone making that transition firsthand, translating CDISC standards from abstract documentation into practical, working understanding. Every concept is explained through the questions that naturally arise when moving from discovery research to regulated clinical trial data, with a focus on clarity, intuition, and real‑world relevance rather than textbook formalism.
By the end of this course you will be able to:
- Understand how clinical trial data flows from collection to regulatory submission
- Work with CDISC standards (SDTM and ADaM) in both SAS and R
- Produce industry-standard Tables, Listings, and Figures (TLFs)
- Read and interpret a Statistical Analysis Plan (SAP)
- Understand the roles and workflow inside a pharma company
What this course assumes: Some statistical background (you know what a mean, a p-value, and a regression are). No SAS experience required. No prior clinical trial experience required.
The Big Picture: What Happens to Clinical Trial Data?
Before touching any code or standard, we need to understand why all of this exists. Let us walk through a simplified story.
A drug enters a clinical trial
A pharmaceutical company is testing a new GLP-1 drug for type 2 diabetes. They recruit 2,000 patients across 50 hospitals in 10 countries. Over 2 years, they collect:
- Demographics (age, sex, weight, ethnicity)
- Lab results (HbA1c, fasting glucose, lipids) every 3 months
Design: A Phase III randomised, double-blind, placebo-controlled trial of a fictional GLP-1 receptor agonist in adults with type 2 diabetes.
Population: Adults 18–75 years, HbA1c 7.5–10%, BMI 27–40 kg/m²
Arms: Drug 1 mg (n=300), Drug 2 mg (n=300), Placebo (n=300)
Duration: 52 weeks
Primary endpoint: Change from baseline in HbA1c at week 52
Key secondary endpoints: Change in body weight, fasting glucose, and blood pressure
We chose a diabetes/obesity trial deliberately; it mirrors the kind of work done at several pharma companies on GLP-1 programs.
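The primary endpoint is plain arithmetic: each subject's week 52 HbA1c minus their baseline HbA1c. A minimal base-R sketch of that calculation follows. The subject IDs and HbA1c values are invented for illustration and are not course data; the column names AVAL, BASE, and CHG follow ADaM naming conventions. The real analysis of such an endpoint would follow the SAP, so treat this only as the arithmetic behind the endpoint.

```r
# Illustrative values only: invented for this sketch, not data from the course.
hba1c <- data.frame(
  USUBJID = c("GLPX-001-0001", "GLPX-001-0002",
              "GLPX-001-0003", "GLPX-001-0004"),  # hypothetical subject IDs
  ARM     = c("Placebo", "Placebo", "Drug 2 mg", "Drug 2 mg"),
  BASE    = c(8.4, 9.1, 8.8, 7.9),  # HbA1c (%) at baseline
  AVAL    = c(8.2, 9.0, 7.3, 6.8)   # HbA1c (%) at week 52
)

# Change from baseline: week 52 value minus baseline value.
# A negative change means HbA1c went down.
hba1c$CHG <- hba1c$AVAL - hba1c$BASE

# Mean change from baseline within each arm
mean_chg <- tapply(hba1c$CHG, hba1c$ARM, mean)
print(round(mean_chg, 2))
```

A more negative mean change in a treatment arm than in the placebo arm is the pattern the trial is designed to detect.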
Setting Up Your Environment
You need three things: SAS, R/Quarto, and a way to run both.
SAS OnDemand for Academics (free)
SAS is the dominant tool for SDTM/ADaM programming at most pharma companies. SAS OnDemand for Academics is a free cloud-based version.
- Go to https://welcome.oda.sas.com
- Click Register; use your university email if you have one
- Once logged in, launch SAS Studio (the browser-based IDE)
- You do not need to install anything locally
SAS Studio runs in the browser. You write code on the left and results appear on the right. It looks different from RStudio, but the logic is similar.
R and RStudio
You likely already have this. If not:
- Install R from https://cran.r-project.org
- Install RStudio from https://posit.co/downloads
Key R packages we will use throughout the course are installed by the install.packages() call at the top of this page.
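To avoid reinstalling packages you already have, a minimal base-R sketch like the following reports which of the course packages are still missing. The helper name missing_packages is hypothetical and introduced here only for illustration; the package names are the ones from the course's install.packages() call.

```r
# Packages from the course's install.packages() call
course_pkgs <- c("tidyverse", "haven", "admiral", "pharmaversesdtm",
                 "gt", "rtables", "ggplot2", "lubridate")

# Hypothetical helper: returns the names in `pkgs` that are not installed
missing_packages <- function(pkgs) {
  setdiff(pkgs, rownames(installed.packages()))
}

to_install <- missing_packages(course_pkgs)
if (length(to_install) == 0) {
  cat("All course packages are installed\n")
} else {
  cat("Still missing:", paste(to_install, collapse = ", "), "\n")
  # Uncomment to install only what is missing:
  # install.packages(to_install)
}
```

The install step is left commented out so the check has no side effects until you opt in.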
Quarto
This course is written in Quarto. To render these files locally:
- Install Quarto from https://quarto.org/docs/get-started
- In RStudio: open any .qmd file and click Render
- Or from the terminal:
quarto render module-00-orientation.qmd
Verify your R setup
Run this to confirm everything is working:
library(tidyverse)
Warning: package 'tidyverse' was built under R version 4.4.3
Warning: package 'ggplot2' was built under R version 4.4.3
Warning: package 'tibble' was built under R version 4.4.3
Warning: package 'tidyr' was built under R version 4.4.1
Warning: package 'readr' was built under R version 4.4.1
Warning: package 'purrr' was built under R version 4.4.3
Warning: package 'dplyr' was built under R version 4.4.1
Warning: package 'stringr' was built under R version 4.4.1
Warning: package 'forcats' was built under R version 4.4.1
Warning: package 'lubridate' was built under R version 4.4.3
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr 1.1.4 ✔ readr 2.1.5
✔ forcats 1.0.0 ✔ stringr 1.5.1
✔ ggplot2 3.5.2 ✔ tibble 3.3.0
✔ lubridate 1.9.4 ✔ tidyr 1.3.1
✔ purrr 1.0.4
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag() masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(haven)
Warning: package 'haven' was built under R version 4.4.3
# If this runs without errors, you are ready
cat("R version:", paste0(R.version$major, ".", R.version$minor), "\n")
R version: 4.4.0
cat("tidyverse loaded successfully\n")
tidyverse loaded successfully
cat("haven loaded successfully\n")
haven loaded successfully
Verify your SAS setup
Paste this into SAS Studio and click Run:
/* Module 0: SAS verification */
%put SAS Version: &sysvlong;
%put Hello from SAS, setup is working;

data verify;
  name = "GLPX-001";
  status = "Ready";
  put name= status=;
run;
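Both %put and the PUT statement write to the SAS log, so check the Log tab. If it shows the version line, the greeting, and name=GLPX-001 status=Ready without errors, SAS is ready.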
GitHub Repo Structure
The course lives at: https://github.com/sufyansuleman/cdisc-from-scratch
Here is how the repository is organised:
cdisc-from-scratch/
│
├── README.md ← Course overview and how to use it
│
├── modules/
│ ├── module-00-orientation.qmd ← This file
│ ├── module-01-cdisc-overview.qmd
│ ├── module-02-sdtm.qmd
│ ├── module-03-adam.qmd
│ ├── module-04-tlfs.qmd
│ ├── module-05-full-pipeline.qmd
│ └── module-06-industry-context.qmd
│
├── data/
│ ├── raw/ ← Simulated raw EDC data (GLPX-001)
│ ├── sdtm/ ← SDTM datasets we build
│ └── adam/ ← ADaM datasets we build
│
├── sas/
│ └── macros/ ← Reusable SAS macros
│
├── r/
│ └── functions/ ← Reusable R functions
│
├── outputs/
│ └── tlfs/ ← Generated tables and figures
│
└── _quarto.yml ← Quarto project config (renders as a website)
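Once you have a local copy of the repository, a short base-R sketch like this one can confirm that the folders in the tree above are present. The helper name missing_dirs is hypothetical, and the path "cdisc-from-scratch" assumes you run the code from the folder that contains your clone. Git does not track empty folders, so output folders such as data/sdtm may legitimately be absent until you generate files in them.

```r
# Folders taken from the repository tree above
expected_dirs <- c("modules", "data/raw", "data/sdtm", "data/adam",
                   "sas/macros", "r/functions", "outputs/tlfs")

# Hypothetical helper: returns the expected folders that do not exist under `root`
missing_dirs <- function(root, dirs = expected_dirs) {
  dirs[!dir.exists(file.path(root, dirs))]
}

# Assumes the clone sits in the current working directory
missing_dirs("cdisc-from-scratch")
```

An empty result means every expected folder was found.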
Your First Task
Before Module 1, do these three things:
What’s Next
In Module 1 we go deep on CDISC standards: what SDTM and ADaM actually look like, what the rules are, and why they are designed the way they are. We will look at real SDTM domain examples and start to understand the logic before writing any code.
This course is open source and free forever. Found an error or want to contribute? Open an issue or pull request on GitHub.