Prerequisites (knowledge of topic)
1. Course “Introduction to Biostatistics”, required.
2. Basic knowledge of statistics, such as distributions (e.g., normal, t, F, chi-square), mean/standard deviation calculations, and linear regression (I will review some of these concepts in the first class to bring all students to the same level of understanding).
3. R/SAS knowledge. It would be greatly helpful if students have some basic knowledge of R or SAS. The course will be taught in R since it is free and open source, but SAS code will be available.
Laptop required for practice with the latest version of R installed
R is required. Please download and install the latest version of R/RStudio from http://r-project.org.
1. This «Advanced Biostatistics» course is based on the book "Clinical Trial Data Analysis Using R and SAS", co-authored by Din Chen, Karl E. Peace and Pinggao Zhang, published in the Chapman and Hall/CRC Biostatistics Series in 2017 (hereafter referred to as CTDA in this document).
2. This class aims to provide a thorough presentation of biostatistical analyses, with detailed step-by-step illustrations of their implementation using R and SAS. Examples are based on the authors' actual experience in many areas of biostatistical clinical drug development. Once the application is understood, the biostatistical methods appropriate for analyzing the data are identified. Analysis code is then developed using appropriate R/SAS packages and functions. Analysis code development and results are presented in a stepwise fashion. This stepwise approach should enable students to follow the logic and gain an understanding of the analysis methods and the R/SAS implementation, so that they may use R/SAS to analyze their own biostatistical data.
3. Students are encouraged to bring their own research data to be used in the class as further examples.
Topics To Be Covered
1. R basics: Introduction to R, with Monte-Carlo simulation in clinical trial applications.
2. Treatment comparisons with continuous/categorical endpoints: We start with simple two-treatment comparisons using the t-test, extend the analysis to multiple-treatment comparisons (analysis of variance), and then to analysis of covariance with clinical covariates, using the "lm" function in R for continuous endpoints and "glm" for categorical endpoints.
3. Longitudinal clinical trials: We will illustrate longitudinal trials using the R "lattice" graphics package and analyze them using linear mixed models for continuous endpoints (R function "lmer" from the "lme4" package), and generalized linear mixed models and GEE for categorical endpoints (R function "glmmPQL" from the "MASS" package and "gee" from the "gee" package).
4. Meta-analysis in clinical trials: Both the fixed-effect model and the random-effect model will be discussed, with both categorical and continuous endpoints, using the powerful graphical features in the R “meta” package.
5. Bayesian analysis in clinical trials using MCMC simulations with MCMCpack.
Day 1-Monday (Chapters 1 and 2, CTDA):
Introduction to R and Review of Biostatistics
Morning Session: R Basics
Introduction to the R system, Monte-Carlo simulation in clinical trials
Afternoon Session: R for basic biostatistical analysis
Clinical trial simulations and data generation
Data distribution plotting, summary statistics and simple regression
Summary and Discussions
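As a flavor of the Day 1 material, the following is a minimal sketch of a Monte-Carlo clinical trial simulation in base R. It is not taken from CTDA; the sample size, effect size, standard deviation, and the helper name simulate_trial are illustrative assumptions only.

```r
# Illustrative sketch only: all numeric settings below are assumptions,
# not values from the course textbook.
set.seed(123)

n_per_arm <- 50    # assumed number of patients per arm
true_diff <- -5    # assumed true treatment effect (e.g., mmHg change)
sd_resp   <- 10    # assumed common standard deviation

# Hypothetical helper: simulate one two-arm trial and return the
# two-sided p-value of a (Welch) two-sample t-test.
simulate_trial <- function(n, diff, sd) {
  placebo <- rnorm(n, mean = 0, sd = sd)
  drug    <- rnorm(n, mean = diff, sd = sd)
  t.test(drug, placebo)$p.value
}

# Repeat the trial many times; the share of p-values below 0.05
# is a Monte-Carlo estimate of the power of this design.
p_values  <- replicate(1000, simulate_trial(n_per_arm, true_diff, sd_resp))
power_est <- mean(p_values < 0.05)
power_est
```

Changing n_per_arm or true_diff and rerunning shows how the estimated power responds to the design, which is the kind of exploration the simulation session is meant to support.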
Day 2-Tuesday (Chapters 3 and 4, CTDA):
Morning Session (Chapter 3): Treatment Comparison
Data from Clinical Trials: Diastolic Blood Pressure and Data on Duodenal Ulcer Healing
Statistical Models for Treatment Comparisons:
Models for Continuous Endpoints
One-Way Analysis of Variance (ANOVA)
Multi-Way ANOVA: Factorial Design
Multivariate Analysis of Variance (MANOVA)
Models for Categorical Endpoints: Pearson's χ²-test
R Step-by-Step Illustration on Data Analysis
Afternoon Session (Chapter 4): Treatment Comparisons with Covariates
Data from Clinical Trials
Diastolic Blood Pressure
Clinical Trials for Betablockers
Clinical Trial on Familial Adenomatous Polyposis
Statistical Models Incorporating Covariates
ANCOVA Models for Continuous Endpoints
Logistic Regression for Binary/Binomial Endpoints
Poisson Regression for Clinical Endpoint with Counts
Data Analysis in R
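The covariate-adjusted models listed for Day 2 can be sketched in base R as follows. The data are simulated purely for illustration (they are not the CTDA trial data), and the variable names and effect sizes are assumptions.

```r
# Illustrative sketch only: simulated data, not the CTDA datasets.
set.seed(2024)
n   <- 100
trt <- factor(rep(c("Placebo", "Drug"), each = n / 2),
              levels = c("Placebo", "Drug"))
age <- round(runif(n, min = 40, max = 70))

# Assumed continuous endpoint: change in blood pressure.
dbp_change <- -2 - 5 * (trt == "Drug") + 0.1 * (age - 55) + rnorm(n, sd = 4)

# Assumed binary endpoint: healed (1) or not healed (0).
healed <- rbinom(n, size = 1, prob = plogis(-0.5 + 1 * (trt == "Drug")))

dat <- data.frame(trt, age, dbp_change, healed)

# ANCOVA: treatment effect on a continuous endpoint, adjusted for age.
fit_ancova <- lm(dbp_change ~ trt + age, data = dat)
summary(fit_ancova)$coefficients

# Logistic regression: treatment effect on a binary endpoint, adjusted for age.
fit_logit <- glm(healed ~ trt + age, family = binomial, data = dat)
exp(coef(fit_logit))  # coefficients on the odds-ratio scale
```

A Poisson regression for count endpoints follows the same pattern with family = poisson in the glm call.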
Day 3-Wednesday (Chapter 6, CTDA):
Longitudinal Data Analysis
Morning Session: Data and Longitudinal Modelling
Longitudinal Data Structure
Diastolic Blood Pressure Data
Clinical Trial on Duodenal Ulcer Healing
Longitudinal Statistical Models
Linear Mixed Models
Generalized Linear Mixed Models
Afternoon Session: Step-by-Step Data Analysis using R
Analysis of Diastolic Blood Pressure Data
Data Graphics and Response Feature Analysis
Longitudinal Modeling with R package «nlme»
Analysis of Cimetidine Duodenal Ulcer Trial
Fit Logistic Regression to Binomial Data
Fit Generalized Linear Mixed Model
Summary and Discussions
Day 4-Thursday (Chapter 8, CTDA):
Morning Session: Fixed-effect and Random-effect Models
Statistical Models for Meta-Analysis
Clinical Hypotheses and Effect Size
Fixed-Effects Meta-Analysis Model: The Weighted-Average
Random-Effects Meta-Analysis Model: DerSimonian-Laird
Meta-Data Analysis in R package “metafor”
Afternoon Session: Meta-Regression
Statistical Models for Meta-Regression
Meta-Regression in R package “metafor”
Summary and Discussions
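For orientation, the two meta-analysis models named above can be written in standard textbook form. The notation is an assumption made here and may differ from CTDA: K studies, with study-level effect estimates \hat{\theta}_i and within-study variances \hat{\sigma}_i^2.

```latex
% Fixed-effects (inverse-variance weighted average) estimate
\hat{\theta}_{FE} = \frac{\sum_{i=1}^{K} w_i \hat{\theta}_i}{\sum_{i=1}^{K} w_i},
\qquad w_i = \frac{1}{\hat{\sigma}_i^2},
\qquad \operatorname{Var}(\hat{\theta}_{FE}) = \frac{1}{\sum_{i=1}^{K} w_i}

% DerSimonian-Laird estimate of the between-study variance
Q = \sum_{i=1}^{K} w_i \left(\hat{\theta}_i - \hat{\theta}_{FE}\right)^2,
\qquad
\hat{\tau}^2 = \max\left\{0,\;
  \frac{Q - (K - 1)}{\sum_{i} w_i - \sum_{i} w_i^2 \big/ \sum_{i} w_i}\right\}

% Random-effects estimate with updated weights
\hat{\theta}_{RE} = \frac{\sum_{i=1}^{K} w_i^{*} \hat{\theta}_i}{\sum_{i=1}^{K} w_i^{*}},
\qquad w_i^{*} = \frac{1}{\hat{\sigma}_i^2 + \hat{\tau}^2}
```

When \hat{\tau}^2 = 0, the random-effects weights reduce to the fixed-effects weights and the two estimates coincide.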
Day 5-Friday (Chapter 9, CTDA):
Morning Session: Bayesian Models
From Prior Distribution to Posterior Distributions for Some Standard Distributions
Normal Distribution with Known Variance
Normal Distribution with Unknown Variance
Simulation from the Posterior Distribution
Afternoon Session: R Packages for Bayesian Models
R Packages using WinBUGS, R2WinBUGS, BRugs, rbugs, MCMCpack
Bayesian Data Analysis
Blood Pressure Data: Bayesian Linear Regression
Binomial Data: Bayesian Logistic Regression
Count Data: Bayesian Poisson Regression
Summary and Discussions
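As a reminder of the simplest case in the morning session, the standard conjugate result for a normal mean with known variance is sketched below. The notation (prior mean \mu_0, prior variance \tau_0^2) is an assumption made here and may differ from CTDA.

```latex
% Data model and prior
y_1, \dots, y_n \mid \mu \sim N(\mu, \sigma^2) \ \text{with } \sigma^2 \text{ known},
\qquad \mu \sim N(\mu_0, \tau_0^2)

% Posterior: precisions add, and the posterior mean is a
% precision-weighted average of the prior mean and the sample mean
\mu \mid y_1, \dots, y_n \sim N(\mu_n, \tau_n^2),
\qquad
\frac{1}{\tau_n^2} = \frac{1}{\tau_0^2} + \frac{n}{\sigma^2},
\qquad
\mu_n = \frac{\mu_0 / \tau_0^2 + n \bar{y} / \sigma^2}{1 / \tau_0^2 + n / \sigma^2}
```

As n grows, the posterior mean \mu_n moves from the prior mean toward the sample mean \bar{y}, and the posterior variance shrinks.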
Clinical Trial Data Analysis Using R and SAS (2017)
by Din Chen, Karl E. Peace and Pinggao Zhang
Chapman and Hall/CRC Biostatistics Series
«Students are recommended to get this textbook or ebook from the library»
Supplementary / voluntary
Mandatory readings before course start
• Install the latest version of R
• (20%) Class participation
• (80%) Take Home Project (due 3 weeks after class)
• Take home open book project (see below for details)
Write a data analysis report using the longitudinal models learned in class and the data provided (see “Data Sources” below). The report should include at least 5 sections:
o (15%) Section 1 is a literature review to formulate the research questions and objectives;
o (30%) Section 2 discusses the statistical model development;
o (30%) Section 3 presents the data analysis that supports the research questions, with conclusions;
o (20%) Section 4 covers the discussion and future research;
o (5%) Section 5 is the appendix, which includes all the references and analysis code.
• Suggested research objectives:
o Present a general overview of measuring change over an 8-month period in individual perceptions of mood and social adjustment by women who recently (1 month previous) underwent breast cancer surgery (intra-individual change).
o Present the longitudinal modeling that measures change across all subjects (inter-individual change).
o Demonstrate the addition of age and type of surgical treatment as possible predictors that may account for any change in the individual growth trajectories (i.e., intercept and slope) of mood and social adjustment.
• Data Sources:
o Excel file “hkcancer.xlsx” with variables explained at sheet “readme”
o A study of whether 405 Hong Kong Chinese women who underwent breast cancer surgery exhibited evidence of change in their “mood” and “social adjustment” at 1, 4, and 8 months post-surgery.
o Byrne, B.M., Lam, W. W. T. and Fielding, R. (2008) Measuring pattern of change in personality assessments: an annotated application of latent growth curve modelling. Journal of Personality Assessment, 90:1-11.
o Data are from this reference
o PDF file attached