Chapter 1
Introduction
In clinical research, during the planning stage of a clinical study, the follow-
ing questions are of particular interest to the investigators: (i) how many
subjects are needed in order to have a desired power for detecting a clin-
ically meaningful difference (e.g., an 80% chance of correctly detecting a
clinically meaningful difference), and (ii) what is the trade-off between cost-
effectiveness and power if only a small number of subjects are available for
the study due to limited budget and/or some medical considerations. To
address these questions, a statistical evaluation for sample size calculation
is often performed based on some statistical inference of the primary study
endpoint with certain assurance. In clinical research, sample size calcula-
tion plays an important role for assuring validity, accuracy, reliability, and
integrity of the intended clinical study.
For a given study, sample size calculation is usually performed based
on some statistical criteria controlling type I and/or type II errors. For
example, we may choose sample size in such a way that there is a desired
precision at a fixed confidence level (i.e., fixed type I error). This approach
is referred to as precision analysis for sample size calculation. The method
of precision analysis is simple and easy to perform and yet it may have a
small chance of correctly detecting a true difference. As an alternative, the
method of pre-study power analysis is usually conducted to estimate sample
size. The concept of the pre-study power analysis is to select required sam-
ple size for achieving a desired power for detecting a clinically/scientifically
meaningful difference at a fixed type I error rate. In clinical research, the
pre-study power analysis is probably the most commonly used method for
sample size calculation. In this book, we will focus on sample size calcula-
tion based on power analysis for various situations in clinical research.
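To make the two approaches concrete, the following is a minimal sketch in Python; it is not taken from the source. It assumes a normally distributed endpoint with a known standard deviation, a one-sample confidence interval for the precision analysis, and a two-sided two-sample z-test with equal allocation for the power analysis. The function names and the numerical inputs are hypothetical and chosen only for illustration.

```python
import math
from statistics import NormalDist

# Standard normal distribution used for all quantiles below.
_STD_NORMAL = NormalDist()


def n_precision(sigma, max_error, alpha=0.05):
    """Precision analysis (illustrative assumption: one mean, known sigma).

    Smallest n such that the (1 - alpha) confidence interval for the mean
    has half-width no larger than max_error.
    """
    z = _STD_NORMAL.inv_cdf(1 - alpha / 2)
    return math.ceil((z * sigma / max_error) ** 2)


def n_power_per_group(sigma, delta, alpha=0.05, power=0.80):
    """Power analysis (illustrative assumption: two-sided two-sample z-test).

    Smallest n per group that gives the desired power for detecting a
    difference delta at type I error rate alpha, with equal allocation
    and a common known sigma.
    """
    z_alpha = _STD_NORMAL.inv_cdf(1 - alpha / 2)
    z_beta = _STD_NORMAL.inv_cdf(power)
    return math.ceil(2 * (sigma * (z_alpha + z_beta) / delta) ** 2)


if __name__ == "__main__":
    # Hypothetical inputs: sigma = 10, clinically meaningful difference = 5.
    print(n_power_per_group(sigma=10, delta=5))   # 63 per group
    # Hypothetical inputs: sigma = 10, maximum error of 2 units.
    print(n_precision(sigma=10, max_error=2))     # 97 subjects
```

Under these assumptions, the two criteria answer different questions: the precision-based sample size controls only the width of the confidence interval, while the power-based sample size also controls the type II error for a stated difference.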
In clinical research, to provide an accurate and reliable sample size
calculation, an appropriate statistical test for the hypotheses of interest
must be derived under the study design. The hypotheses should be
established to reflect the study objectives under the study design. In
practice, it is not uncommon to observe discrepancies among study objective
(hypotheses), study design, statistical analysis (test statistic), and sample
size calculation. These discrepancies can certainly distort the validity and
integrity of the intended clinical trial.
In the next section, the regulatory requirements regarding the role of
sample size calculation in clinical research are discussed. In Section 1.2,
we provide some basic considerations for sample size calculation. These basic
considerations include study objectives, design, hypotheses, primary study
endpoint, and clinically meaningful difference. The concepts of type I and
type II errors and procedures for sample size calculation based on precision
analysis, power analysis, probability assessment, and reproducibility
probability are given in Section 1.3. The aim and structure of the book are
given in the last section.
1.1 Regulatory Requirement
As indicated in Chow and Liu (1998), the process of drug research and de-
velopment is a lengthy and costly process. This lengthy and costly process
is necessary not only to demonstrate the efficacy and safety of the
drug product under investigation, but also to ensure that the study drug
product possesses good drug characteristics such as identity, strength, quality,
purity, and stability after it is approved by the regulatory authority. This
lengthy process includes drug discovery, formulation, animal study,
laboratory development, clinical development, and regulatory submission.
Within this process, clinical development plays an important role
because all of its tests are conducted on humans.
For approval of a drug product under investigation, the United States Food
and Drug Administration (FDA) requires that at least two adequate and
well-controlled clinical studies be conducted for providing substantial evi-
dence regarding the efficacy and safety of the drug product (FDA, 1988a).
However, the following scientific/statistical questions are raised: (i) what is
the definition of an adequate and well-controlled clinical study? (ii) what
evidence is considered substantial? (iii) why do we need at least two stud-
ies? (iv) will a single large trial be sufficient to provide substantial evidence
for approval? and (v) if a single large trial can provide substantial evidence
for approval, how large is considered large? In what follows, we will address
these questions.
Table 1.1.1: Characteristics of an Adequate and Well-Controlled Study

Criteria                   Characteristics
-------------------------  ----------------------------------------------
Objectives                 Clear statement of investigation's purpose
Methods of analysis        Summary of proposed or actual methods of
                           analysis
Design                     Valid comparison with a control to provide a
                           quantitative assessment of drug effect
Selection of subjects      Adequate assurance of the disease or
                           conditions under study
Assignment of subjects     Minimization of bias and assurance of
                           comparability of groups
Participants of studies    Minimization of bias on the part of subjects,
                           observers, and analysis
Assessment of responses    Well-defined and reliable
Assessment of the effect   Requirement of appropriate statistical
                           methods
1.1.1 Adequate and Well-Controlled Clinical Trials
Section 314.126 of 21 CFR (Code of Federal Regulations) provides the
definition of an adequate and well-controlled study, which is summarized in
Table 1.1.1.
As can be seen from Table 1.1.1, an adequate and well-controlled
study is judged by eight characteristics specified in the CFR. These
characteristics include study objectives, methods of analysis, design, selection
of subjects, assignment of subjects, participants of studies, assessment of
responses, and assessment of the effect. For study objectives, it is required
that the study objectives be clearly stated in the study protocol such that
they can be formulated into statistical hypotheses. Under the hypotheses,
appropriate statistical methods should be described in the study protocol.
A clinical study is not considered adequate and well-controlled if the em-
ployed study design is not valid. A valid study design allows a quantitative
assessment of drug effect with a valid comparison with a control. The selec-
tion of a sufficient number of subjects with the disease or conditions under
study is one of the keys to the integrity of an adequate and well-controlled
study. In an adequate and well-controlled clinical study, subjects should
be randomly assigned to treatment groups to minimize potential bias by
ensuring comparability between treatment groups with respect to demo-
graphic variables such as age, gender, race, height and weight, and other
patient characteristics or prognostic factors such as medical history and
disease severity. An adequate and well-controlled study requires that the
primary study endpoint or response variable be well-defined and
assessed with a certain degree of accuracy and reliability. To achieve this
goal, statistical inferences on the drug effect should be obtained based on
the responses of the primary study endpoint observed from a sufficient
number of subjects using appropriate statistical methods derived under the
study design and objectives.
1.1.2 Substantial Evidence
The substantial evidence as required in the Kefauver-Harris Amendments
to the Federal Food, Drug, and Cosmetic Act in 1962 is defined as the
evidence consisting of adequate and well-controlled investigations, including
clinical investigations, by experts qualified by scientific training and expe-
rience to evaluate the effectiveness of the drug involved, on the basis of
which it could fairly and responsibly be concluded by such experts that the
drug will have the effect it purports to have under the conditions of use
prescribed, recommended, or suggested in the labeling or proposed label-
ing thereof. Based on this amendment, the FDA requests that reports of
adequate and well-controlled investigations provide the primary basis for
determining whether there is substantial evidence to support the claims of
new drugs and antibiotics.
1.1.3 Why at Least Two Studies?
As indicated earlier, the FDA requires at least two adequate and well-
controlled clinical trials be conducted for providing substantial evidence
regarding the effectiveness and safety of the test drug under investigation
for regulatory review and approval. In practice, it is prudent to plan for
more than one trial in the phase III study for any one or a combination
of the following reasons: (i) lack of pharmacological rationale, (ii) a new
pharmacological principle, (iii) phase I and phase II data are limited or
unconvincing, (iv) a therapeutic area with a history of failed studies or
failures to confirm seemingly convincing results, (v) a need to demonstrate
efficacy and/or tolerability in different sub-populations, with different co-
medication or other interventions, relative to different competitors, and (vi)
any other needs to address additional questions in the phase III program.
Shao and Chow (2002) and Chow, Shao and Hu (2002) pointed out
that the purpose of requiring at least two clinical studies is not only to
assure the reproducibility but also to provide valuable information
regarding generalizability. Reproducibility refers to whether the clinical
results are reproducible from location (e.g., study site) to location within
the same region or from region to region, while generalizability refers
to whether the clinical results can be generalized to other similar
patient populations within the same region or from region to region. When
the sponsor of a newly developed or approved drug product is interested in
getting the drug product into the marketplace from one region (e.g., where
the drug product is developed and approved) to another region, it is a con-
cern that differences in ethnic factors could alter the efficacy and safety of
the drug product in the new region. As a result, it is recommended that a
bridging study be conducted to generate a limited amount of clinical data
in the new region in order to extrapolate the clinical data between the two
regions (ICH, 1998a).
In practice, it is often of interest to determine whether a clinical trial
that produced positive clinical results provides substantial evidence to as-
sure reproducibility and generalizability of the clinical results. In this chap-
ter, the reproducibility of a positive clinical result is studied by evaluating
the probability of observing a positive result in a future clinical study with
the same study protocol, given that a positive clinical result has been ob-
served. The generalizability of clinical results observed from a clinical trial
will be evaluated by means of a sensitivity analysis with respect to changes
in mean and standard deviation of the primary clinical endpoints of the
study.
1.1.4 Substantial Evidence with a Single Trial
Although the FDA requires that at least two adequate and well-controlled
clinical trials be conducted for providing substantial evidence regarding
the effectiveness of the drug product under investigation, a single trial may
be accepted for regulatory approval under certain circumstances. In 1997,
the Food and Drug Administration Modernization Act (FDAMA) was enacted. It
includes a provision (Section 115 of FDAMA) to allow data from one adequate
and well-controlled clinical investigation and confirmatory evidence to
establish effectiveness for risk/benefit assessment of drug and biological
candidates for approval under certain circumstances. This provision
essentially codified an FDA policy that had existed for several years but
whose application had been limited to some biological products approved by
the Center for Biologics Evaluation and Research (CBER) of the FDA and a few
pharmaceuticals, especially orphan drugs such as zidovudine and lamotrigine.
As can be seen from Table 1.1.2, a relatively strong significant result
observed from a single clinical trial (say, a p-value of less than 0.001)
would have about a 90% chance of being reproduced in future clinical trials.
Consequently, a single clinical trial is sufficient to provide substantial
evidence for demonstration of efficacy and safety of the medication under
study. However, in 1998, the FDA published a guidance that shed light
on this approach, even though the FDA has recognized that advances in the
science and practice of drug development may permit an expanded role
for the single controlled trial in contemporary clinical development (FDA,
1998b).

Table 1.1.2: Estimated Reproducibility Probability Based on
Results from a Single Trial

t-statistic   p-value   Reproducibility
-----------   -------   ---------------
1.96          0.050     0.500
2.05          0.040     0.536
2.17          0.030     0.583
2.33          0.020     0.644
2.58          0.010     0.732
2.81          0.005     0.802
3.30          0.001     0.901
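The source does not state at this point how the reproducibility probabilities in Table 1.1.2 are computed. As a hedged illustration only, the sketch below assumes one common approach: treat the observed test statistic as the true standardized effect and evaluate the power of a two-sided test at the 5% level in a second, identical trial, using a normal approximation. The function name is hypothetical.

```python
from statistics import NormalDist


def reproducibility_probability(t_stat, alpha=0.05):
    """Estimated probability that a second identical trial is significant.

    Illustrative assumption (not confirmed by the source): the observed
    statistic is plugged in as the true standardized effect, and the test
    statistic of the future trial is normal with unit variance.
    """
    std_normal = NormalDist()
    z = std_normal.inv_cdf(1 - alpha / 2)
    t = abs(t_stat)
    # Probability of rejecting in either tail of the two-sided test.
    return std_normal.cdf(t - z) + std_normal.cdf(-t - z)


if __name__ == "__main__":
    for t in (1.96, 2.05, 2.17, 2.33, 2.58, 2.81, 3.30):
        print(f"{t:4.2f}  {reproducibility_probability(t):.3f}")
```

Under this assumption the first six rows of Table 1.1.2 are reproduced to about three decimals. The last row comes out near 0.91 rather than 0.901, so the tabulated value may rest on a slightly different calculation.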
1.1.5 Sample Size
As the primary objective of most clinical trials is to demonstrate the ef-
fectiveness and safety of drug products under investigation, sample size
calculation plays an important role at the planning stage to ensure that
there are sufficient subjects for providing accurate and reliable
assessment of the drug products with certain statistical assurance. In practice,
hypotheses regarding medical or scientific questions of the study drug are
usually formulated based on the primary study objectives. The hypotheses
are then evaluated using appropriate statistical tests under a valid study
design to ensure that the test results are accurate and reliable with certain
statistical assurance. It should be noted that a valid sample size calculation
can only be done based on appropriate statistical tests for the hypotheses
which can reflect the study objectives under a valid study design. It is then
suggested that the hypotheses be clearly stated when performing a sample
size calculation. Each type of hypothesis has a different requirement
for sample size in order to achieve a desired statistical assurance (e.g., 80%
power or 95% assurance in precision).
Basically, sample size calculation can be classified into sample size es-
timation/determination, sample size justification, sample size adjustment,
and sample size re-estimation. Sample size estimation/determination
refers to the calculation of required sample size for achieving some desired
statistical assurance of accuracy and reliability such as an 80% power, while
sample size justification is to provide statistical justification for a selected
sample size, which is often a small number due to budget constraints and/or
some medical considerations. In most clinical trials, sample size must be
adjusted for some factors such as dropouts or covariates in order to
yield a sufficient number of evaluable subjects for a valid statistical
assessment of the study medicine. This type of sample size calculation is known
as sample size adjustment. In many clinical trials, it may be desirable to
conduct interim analyses (planned or unplanned) during the conduct of the
trial. For clinical trials with planned or unplanned interim analyses, it is
suggested that sample size be adjusted for controlling an overall type I error
rate at the nominal significance level (e.g., 5%). In addition, when conducting
interim analyses, it is also desirable to perform sample size re-estimation
based on cumulative information observed up to a specific time point to
determine whether the selected sample size is sufficient to achieve a desired
power at the end of the study. Sample size re-estimation may be performed
in a blinded or unblinded fashion depending upon whether the process of
sample size re-estimation will introduce bias to clinical evaluation of sub-
jects beyond the time point at which the interim analysis or sample size
re-estimation is performed. In this book, however, our emphasis will be
placed on sample size estimation/determination. The concept can be easily
applied to (i) sample size justification for a selected sample size, (ii) sample
size adjustment with respect to some factors such as dropouts or covari-
ates, and (iii) sample size re-estimation in clinical trials with planned or
unplanned interim analyses.
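As a rough illustration of sample size justification and sample size adjustment, the following Python sketch may help; it is not taken from the source. It assumes a two-sided two-sample z-test with equal allocation and a known common standard deviation for the justification step, and the common convention of inflating the number of evaluable subjects by 1/(1 - dropout rate) for the adjustment step. The function names and inputs are hypothetical.

```python
import math
from statistics import NormalDist


def achieved_power(n_per_group, sigma, delta, alpha=0.05):
    """Sample size justification: approximate power for a fixed n per group.

    Illustrative assumption: two-sided two-sample z-test, equal allocation,
    known common sigma; the negligible opposite-tail rejection is ignored.
    """
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha / 2)
    standardized_shift = abs(delta) / (sigma * math.sqrt(2 / n_per_group))
    return std_normal.cdf(standardized_shift - z_alpha)


def adjust_for_dropout(n_evaluable, dropout_rate):
    """Sample size adjustment: subjects to enroll so that about n_evaluable
    remain, assuming a constant expected dropout rate (a convention assumed
    here, not a formula given in the source)."""
    if not 0 <= dropout_rate < 1:
        raise ValueError("dropout_rate must be in [0, 1)")
    return math.ceil(n_evaluable / (1 - dropout_rate))


if __name__ == "__main__":
    # Hypothetical inputs: only 20 subjects per group are affordable.
    print(round(achieved_power(20, sigma=10, delta=5), 2))
    # Hypothetical inputs: 63 evaluable subjects needed, 20% expected dropout.
    print(adjust_for_dropout(63, 0.20))
```

Sample size re-estimation at an interim analysis is not covered by this sketch, since it depends on the interim design and on whether the re-estimation is blinded.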
1.2 Basic Considerations
In clinical research, sample size calculation may be performed based on pre-
cision analysis, power analysis, probability assessment, or other statistical
inferences. To provide an accurate and reliable sample size calculation, it is
suggested that an appropriate statistical test for the hypotheses of interest
be derived under the study design. The hypotheses should be established to
reflect the study objectives and should be able to address statistical/medical
questions of interest under the study design. As a result, a typical procedure
for sample size calculation is to determine or estimate sample size based on
an appropriate statistical method or test, which is derived under the
hypotheses and the study design, for testing the hypotheses in order to achieve
a certain degree of statistical assurance (e.g., 95% assurance in precision or
80% power) on the effect of the test drug under investigation. As indicated earlier, in
practice it is not uncommon to observe discrepancies among study objective
(hypotheses), study design, statistical analysis (test statistic), and sample
size calculation. These discrepancies certainly have an impact on sample