Hoeffding’s inequality
In probability theory, Hoeffding’s inequality provides
an upper bound on the probability that the sum of bounded
independent random variables deviates from its expected
value by more than a certain amount. Hoeffding’s
inequality was proved by Wassily Hoeffding in 1963.[1]
Hoeffding’s inequality is a special case of the Azuma–
Hoeffding inequality, and it is more general than the
Bernstein inequality, proved by Sergei Bernstein in 1923.
They are also special cases of McDiarmid’s inequality.
For example, taking ε = \sqrt{\ln n / n} gives:

P\left(|H(n) - pn| \le \sqrt{n \ln n}\right) \ge 1 - 2\exp(-2 \ln n) = 1 - 2/n^2,

where H(n) is the number of heads in n tosses of a coin that shows heads with probability p (see the special case of Bernoulli random variables below).
1 General case
Let X1, ..., Xn be independent random variables bounded
by the interval [0, 1]: 0 ≤ Xᵢ ≤ 1. We define the empirical
mean of these variables by
\bar{X} = \frac{1}{n}(X_1 + \cdots + X_n).
One of the inequalities in Theorem 1 of Hoeffding (1963) states:
P\left(\bar{X} - \mathrm{E}\left[\bar{X}\right] \ge t\right) \le e^{-2nt^2}.
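For a sense of scale (an illustrative numeric instance, not taken from the source), with n = 100 samples and t = 0.1 the bound evaluates to

e^{-2nt^2} = e^{-2 \cdot 100 \cdot 0.1^2} = e^{-2} \approx 0.135,

so the empirical mean exceeds its expectation by more than 0.1 with probability at most roughly 13.5%.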
Theorem 2 of Hoeffding (1963) is a generalization of
the above inequality when it is known that Xi are strictly
bounded by the intervals [ai, bi]:
P\left(\bar{X} - \mathrm{E}\left[\bar{X}\right] \ge t\right) \le \exp\left(-\frac{2n^2 t^2}{\sum_{i=1}^{n}(b_i - a_i)^2}\right),

P\left(\left|\bar{X} - \mathrm{E}\left[\bar{X}\right]\right| \ge t\right) \le 2\exp\left(-\frac{2n^2 t^2}{\sum_{i=1}^{n}(b_i - a_i)^2}\right),
which are valid for positive values of t. Here \mathrm{E}\left[\bar{X}\right] is the expected value of \bar{X}. The inequalities can also be stated in terms of the sum

S_n = X_1 + \cdots + X_n

of the random variables:
P\left(S_n - \mathrm{E}\left[S_n\right] \ge t\right) \le \exp\left(-\frac{2t^2}{\sum_{i=1}^{n}(b_i - a_i)^2}\right),

P\left(\left|S_n - \mathrm{E}\left[S_n\right]\right| \ge t\right) \le 2\exp\left(-\frac{2t^2}{\sum_{i=1}^{n}(b_i - a_i)^2}\right).
Note that the inequalities also hold when the Xᵢ have been obtained using sampling without replacement; in this case the random variables are no longer independent. A proof of this statement can be found in Hoeffding’s paper. For slightly better bounds in the case of sampling without replacement, see for instance the paper by Serfling (1974).
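To see the general-case bound in action, here is a minimal Monte Carlo sketch in Python (the uniform distribution, the sample size n, and the threshold t are illustrative choices, not values from the article) comparing the observed frequency of the event S_n − E[S_n] ≥ t with the bound exp(−2t²/Σᵢ(bᵢ − aᵢ)²):

import numpy as np

rng = np.random.default_rng(0)

# Illustrative choices: n i.i.d. uniform variables on [a, b], threshold t for the sum.
n = 50
a, b = 0.0, 1.0
t = 5.0
trials = 200_000

samples = rng.uniform(a, b, size=(trials, n))
s_n = samples.sum(axis=1)

# Empirical estimate of P(S_n - E[S_n] >= t); E[S_n] = n (a + b) / 2 for uniforms.
deviation_freq = np.mean(s_n - n * (a + b) / 2 >= t)

# Hoeffding bound: exp(-2 t^2 / sum_i (b_i - a_i)^2); here b_i - a_i = b - a for all i.
hoeffding_bound = np.exp(-2 * t**2 / (n * (b - a) ** 2))

print("empirical P(S_n - E[S_n] >= t):", round(float(deviation_freq), 5))
print("Hoeffding bound:               ", round(float(hoeffding_bound), 5))

As expected, the simulated frequency stays below the bound, although the bound is quite loose for this particular distribution, as is typical of distribution-free tail inequalities.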
2 Special case of Bernoulli random variables
Hoeffding’s inequality can be applied to the important
special case of identically distributed Bernoulli random
variables, and this is how the inequality is often used in
combinatorics and computer science. We consider a coin
that shows heads with probability p and tails with probability 1 − p. We toss the coin n times. The expected number of times the coin comes up heads is pn. Furthermore, the probability that the coin comes up heads at
most k times can be exactly quantified by the following
expression:
P(H(n) \le k) = \sum_{i=0}^{k} \binom{n}{i} p^i (1-p)^{n-i},
where H(n) is the number of heads in n coin tosses.
When k = (p − ε)n for some ε > 0, Hoeffding’s inequality bounds this probability by a term that is exponentially small in ε²n:
P\left(H(n) \le (p - \varepsilon)n\right) \le \exp\left(-2\varepsilon^2 n\right).
Similarly, when k = (p + ε)n for some ε > 0, Hoeffding’s
inequality bounds the probability that we see at least εn
more tosses that show heads than we would expect:
P\left(H(n) \ge (p + \varepsilon)n\right) \le \exp\left(-2\varepsilon^2 n\right).
Hence Hoeffding’s inequality implies that the number of heads that we see is concentrated around its mean, with an exponentially small tail:
P\left((p - \varepsilon)n \le H(n) \le (p + \varepsilon)n\right) \ge 1 - 2\exp\left(-2\varepsilon^2 n\right).
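As a sanity check on these bounds, the following Python sketch (with arbitrary illustrative values of n, p and ε) computes the exact binomial tail P(H(n) ≤ (p − ε)n) and the corresponding Hoeffding bound exp(−2ε²n):

from math import comb, exp, floor

# Illustrative parameters (not from the article).
n, p, eps = 100, 0.5, 0.1

# Exact tail: P(H(n) <= (p - eps) n) for H(n) ~ Binomial(n, p).
k = floor((p - eps) * n)
exact_tail = sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

# Hoeffding bound: exp(-2 eps^2 n).
bound = exp(-2 * eps**2 * n)

print("exact tail     :", round(exact_tail, 5))
print("Hoeffding bound:", round(bound, 5))

With these values the exact tail is about 0.028 while the bound is exp(−2) ≈ 0.135, so the inequality holds with room to spare.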
3 Proof
In this section, we give a proof of Hoeffding’s inequality.[2] The proof uses Hoeffding’s lemma:
Suppose X is a real random variable with mean zero such that P(X \in [a, b]) = 1. Then

\mathrm{E}\left[e^{sX}\right] \le \exp\left(\tfrac{1}{8} s^2 (b - a)^2\right).
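Hoeffding’s lemma can also be checked numerically; the following Python sketch (the centered uniform distribution and the value of s are arbitrary illustrative choices) estimates E[e^{sX}] by simulation and compares it with exp(s²(b − a)²/8):

import numpy as np

rng = np.random.default_rng(1)

# A mean-zero random variable supported on [a, b] = [-1, 1] (uniform, centered);
# s is an arbitrary illustrative value.
a, b, s = -1.0, 1.0, 2.0
x = rng.uniform(a, b, size=1_000_000)

mgf_estimate = np.mean(np.exp(s * x))          # Monte Carlo estimate of E[e^{sX}]
lemma_bound = np.exp(s**2 * (b - a) ** 2 / 8)  # Hoeffding's lemma bound

print("E[e^(sX)] estimate:", round(float(mgf_estimate), 4))
print("lemma bound       :", round(float(lemma_bound), 4))

Here the estimate is roughly 1.81 while the bound is exp(2) ≈ 7.39, consistent with the lemma.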
Using this lemma, we can prove Hoeffding’s inequality.
Suppose X1, ..., Xn are n independent random variables
such that
P(X_i \in [a_i, b_i]) = 1, \qquad 1 \le i \le n.

Let

S_n = X_1 + \cdots + X_n.
Then for s, t ≥ 0, Markov’s inequality and the independence of the Xᵢ imply:

\begin{aligned}
P\left(S_n - \mathrm{E}\left[S_n\right] \ge t\right)
  &= P\left(e^{s(S_n - \mathrm{E}[S_n])} \ge e^{st}\right) \\
  &\le e^{-st}\, \mathrm{E}\left[e^{s(S_n - \mathrm{E}[S_n])}\right] \\
  &= e^{-st} \prod_{i=1}^{n} \mathrm{E}\left[e^{s(X_i - \mathrm{E}[X_i])}\right] \\
  &\le e^{-st} \prod_{i=1}^{n} e^{\frac{s^2 (b_i - a_i)^2}{8}} \\
  &= \exp\left(-st + \tfrac{1}{8} s^2 \sum_{i=1}^{n} (b_i - a_i)^2\right).
\end{aligned}
To get the best possible upper bound, we find the minimum of the right-hand side of the last inequality as a function of s. Define

g : \mathbb{R}^{+} \to \mathbb{R}, \qquad g(s) = -st + \tfrac{1}{8} s^2 \sum_{i=1}^{n} (b_i - a_i)^2.

Note that g is a quadratic function and achieves its minimum at

s = \frac{4t}{\sum_{i=1}^{n} (b_i - a_i)^2}.
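For completeness (this substitution step is left implicit in the text), plugging this value of s back into g(s) gives

g\left(\frac{4t}{\sum_{i=1}^{n}(b_i - a_i)^2}\right)
  = -\frac{4t^2}{\sum_{i=1}^{n}(b_i - a_i)^2}
    + \frac{1}{8} \cdot \frac{16t^2}{\left(\sum_{i=1}^{n}(b_i - a_i)^2\right)^2} \sum_{i=1}^{n}(b_i - a_i)^2
  = -\frac{2t^2}{\sum_{i=1}^{n}(b_i - a_i)^2}.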
Thus we get

P\left(S_n - \mathrm{E}\left[S_n\right] \ge t\right) \le \exp\left(-\frac{2t^2}{\sum_{i=1}^{n}(b_i - a_i)^2}\right).
4 Usage

4.1 Confidence intervals

Hoeffding’s inequality is useful to analyse the number of samples needed to obtain a confidence interval, by solving the inequality in Theorem 1:

P\left(\bar{X} - \mathrm{E}\left[\bar{X}\right] \ge t\right) \le e^{-2nt^2}.

The inequality states that the probability that the estimated and true values differ by more than t is bounded by e^{-2nt^2}. Symmetrically, the inequality is also valid for the other side of the difference:

P\left(-\bar{X} + \mathrm{E}\left[\bar{X}\right] \ge t\right) \le e^{-2nt^2}.

By adding them both up, we can obtain a two-sided variant of this inequality:

P\left(\left|\bar{X} - \mathrm{E}\left[\bar{X}\right]\right| \ge t\right) \le 2e^{-2nt^2}.

This probability can be interpreted as the level of significance α (probability of making an error) for a confidence interval around \mathrm{E}\left[\bar{X}\right] of size 2t:

\alpha = P\left(\bar{X} \notin \left[\mathrm{E}\left[\bar{X}\right] - t,\; \mathrm{E}\left[\bar{X}\right] + t\right]\right) \le 2e^{-2nt^2}.

Solving the above for n gives us the following:

n \ge \frac{\log(2/\alpha)}{2t^2}.

Therefore, we require at least \log(2/\alpha)/(2t^2) samples to acquire a (1 − α)-confidence interval \mathrm{E}\left[\bar{X}\right] \pm t. Hence, the cost of acquiring the confidence interval is sublinear in terms of confidence level and quadratic in terms of precision.

Note that this inequality is the most conservative of the three in Theorem 1, and there are more efficient methods of estimating a confidence interval.
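As a usage sketch, the sample-size requirement just derived can be computed directly; in this minimal Python example the precision t and significance level α are arbitrary illustrative values:

from math import ceil, log

def hoeffding_sample_size(t: float, alpha: float) -> int:
    """Smallest n with 2*exp(-2*n*t^2) <= alpha, i.e. n >= log(2/alpha) / (2 t^2),
    for variables bounded in [0, 1]."""
    return ceil(log(2 / alpha) / (2 * t**2))

# Example: a (1 - alpha)-confidence interval of half-width t = 0.05
# at significance level alpha = 0.05 (illustrative values) requires n = 738.
print(hoeffding_sample_size(t=0.05, alpha=0.05))

Halving t quadruples the required n, while tightening α enters only logarithmically, matching the remark above that the cost is sublinear in the confidence level and quadratic in the precision.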
5 See also

• Concentration inequality – a summary of tail bounds on random variables.
• Hoeffding’s lemma

6 Notes

[1] Hoeffding (1963)

[2] Nowak (2009); for a more intuitive proof, see this note.
7 References
Serfling, Robert J. (1974). “Probability Inequalities for the Sum in Sampling without Replacement”. The Annals of Statistics 2 (1): 39–48. doi:10.1214/aos/1176342611. MR 0420967.

Hoeffding, Wassily (1963). “Probability inequalities for sums of bounded random variables”. Journal of the American Statistical Association 58 (301): 13–30. doi:10.1080/01621459.1963.10500830. JSTOR 2282952. MR 0144363.

Nowak, Robert (2009). “Lecture 7: Chernoff’s Bound and Hoeffding’s Inequality” (PDF). ECE 901 (Summer '09): Statistical Learning Theory Lecture Notes. University of Wisconsin–Madison. Retrieved May 16, 2014.