Capacity of Multi-antenna Gaussian Channels
I. Emre Telatar
Lucent Technologies, Bell Laboratories, Murray Hill, NJ, USA (telatar@lucent.com)
Abstract
We investigate the use of multiple transmitting and/or receiving antennas
for single user communications over the additive Gaussian channel with and
without fading. We derive formulas for the capacities and error exponents of
such channels, and describe computational procedures to evaluate such formulas.
We show that the potential gains of such multi-antenna systems over
single-antenna systems are rather large under independence assumptions for the
fades and noises at different receiving antennas.
Introduction
We will consider a single user Gaussian channel with multiple transmitting and/or
receiving antennas. We will denote the number of transmitting antennas by $t$ and the
number of receiving antennas by $r$. We will exclusively deal with a linear model in
which the received vector $y \in \mathbb{C}^r$ depends on the transmitted vector $x \in \mathbb{C}^t$ via
\[ y = Hx + n, \]
where $H$ is an $r \times t$ complex matrix and $n$ is zero-mean complex Gaussian noise with
independent, equal variance real and imaginary parts. We assume $\mathrm{E}[nn^\dagger] = I_r$, that
is, the noises corrupting the different receivers are independent. The transmitter is
constrained in its total power to $P$,
\[ \mathrm{E}[x^\dagger x] \le P. \]
Equivalently, since $x^\dagger x = \operatorname{tr}(xx^\dagger)$, and expectation and trace commute,
\[ \operatorname{tr}\big(\mathrm{E}[xx^\dagger]\big) \le P. \]
This second form of the power constraint will prove more useful in the upcoming
discussion.
We will consider several scenarios for the matrix H:
1. $H$ is deterministic.
2. $H$ is a random matrix (for which we shall use the notation $H$), chosen according
to a probability distribution, and each use of the channel corresponds to an
independent realization of $H$.
3. $H$ is a random matrix, but is fixed once it is chosen.
The main focus of this paper is on the last two of these cases. The first case is
included so as to expose the techniques used in the later cases in a more familiar
context.
In the cases when $H$ is random, we will assume that its entries form an
i.i.d. Gaussian collection with zero-mean, independent real and imaginary parts, each
with variance $1/2$. Equivalently, each entry of $H$ has uniform phase and Rayleigh
magnitude. This choice models a Rayleigh fading environment with enough separation
within the receiving antennas and the transmitting antennas such that the fades for
each transmitting-receiving antenna pair are independent. In all cases, we will assume
that the realization of $H$ is known to the receiver, or, equivalently, the channel output
consists of the pair $(y, H)$, and the distribution of $H$ is known at the transmitter.
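As a concrete numerical illustration of this model (not part of the original development), the following Python sketch draws one realization of the fading channel and forms $y = Hx + n$; it assumes numpy, and the helper name sample_channel and the choice of an i.i.d. input are ours.

\begin{verbatim}
import numpy as np

def sample_channel(t, r, P, rng=np.random.default_rng(0)):
    """One use of y = Hx + n.  Entries of H and n are CN(0,1): real and
    imaginary parts independent N(0, 1/2), i.e. uniform phase and Rayleigh
    magnitude for H, and E[nn^dagger] = I_r for the noise."""
    H = (rng.standard_normal((r, t)) + 1j * rng.standard_normal((r, t))) / np.sqrt(2)
    n = (rng.standard_normal(r) + 1j * rng.standard_normal(r)) / np.sqrt(2)
    # An example input meeting the power constraint E[x^dagger x] <= P:
    # i.i.d. CN(0, P/t) entries, so that E[x^dagger x] = P.
    x = np.sqrt(P / t) * (rng.standard_normal(t) + 1j * rng.standard_normal(t)) / np.sqrt(2)
    y = H @ x + n
    return H, x, y
\end{verbatim}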
Preliminaries
A complex random vector $x \in \mathbb{C}^n$ is said to be Gaussian if the real random vector
$\hat{x} \in \mathbb{R}^{2n}$ consisting of its real and imaginary parts, $\hat{x} = \begin{bmatrix} \operatorname{Re}(x) \\ \operatorname{Im}(x) \end{bmatrix}$, is Gaussian. Thus,
to specify the distribution of a complex Gaussian random vector $x$, it is necessary to
specify the expectation and covariance of $\hat{x}$, namely,
\[ \mathrm{E}[\hat{x}] \in \mathbb{R}^{2n} \quad\text{and}\quad \mathrm{E}\big[(\hat{x} - \mathrm{E}[\hat{x}])(\hat{x} - \mathrm{E}[\hat{x}])^\dagger\big] \in \mathbb{R}^{2n \times 2n}. \]
We will say that a complex Gaussian random vector $x$ is circularly symmetric if the
covariance of the corresponding $\hat{x}$ has the structure
\[ \mathrm{E}\big[(\hat{x} - \mathrm{E}[\hat{x}])(\hat{x} - \mathrm{E}[\hat{x}])^\dagger\big]
   = \frac{1}{2}\begin{bmatrix} \operatorname{Re}(Q) & -\operatorname{Im}(Q) \\ \operatorname{Im}(Q) & \operatorname{Re}(Q) \end{bmatrix} \]
for some Hermitian non-negative definite $Q \in \mathbb{C}^{n \times n}$. Note that the real part of a
Hermitian matrix is symmetric and the imaginary part of a Hermitian matrix is
anti-symmetric, and thus the matrix appearing above is real and symmetric. In this
case $\mathrm{E}\big[(x - \mathrm{E}[x])(x - \mathrm{E}[x])^\dagger\big] = Q$, and thus, a circularly symmetric complex Gaussian
random vector $x$ is specified by prescribing $\mathrm{E}[x]$ and $\mathrm{E}\big[(x - \mathrm{E}[x])(x - \mathrm{E}[x])^\dagger\big]$.
For any $z \in \mathbb{C}^n$ and $A \in \mathbb{C}^{n \times m}$ define
\[ \hat{z} = \begin{bmatrix} \operatorname{Re}(z) \\ \operatorname{Im}(z) \end{bmatrix}
   \quad\text{and}\quad
   \hat{A} = \begin{bmatrix} \operatorname{Re}(A) & -\operatorname{Im}(A) \\ \operatorname{Im}(A) & \operatorname{Re}(A) \end{bmatrix}. \]

Lemma 1. The mappings $z \mapsto \hat{z}$ and $A \mapsto \hat{A}$ have the following properties:
\begin{align*}
C = AB &\iff \hat{C} = \hat{A}\hat{B} \tag{a}\\
C = A + B &\iff \hat{C} = \hat{A} + \hat{B} \tag{b}\\
C = A^\dagger &\iff \hat{C} = \hat{A}^\dagger \tag{c}\\
C = A^{-1} &\iff \hat{C} = \hat{A}^{-1} \tag{d}\\
\det\hat{A} &= |\det A|^2 = \det(AA^\dagger) \tag{e}\\
z = x + y &\iff \hat{z} = \hat{x} + \hat{y} \tag{f}\\
y = Ax &\iff \hat{y} = \hat{A}\hat{x} \tag{g}\\
\operatorname{Re}(x^\dagger y) &= \hat{x}^\dagger\hat{y} \tag{h}
\end{align*}
Proof. The properties (a), (b) and (c) are immediate. (d) follows from (a) and
the fact that $\hat{I}_n = I_{2n}$. (e) follows from
\[ \det\hat{A}
   = \det\left( \begin{bmatrix} I & iI \\ 0 & I \end{bmatrix} \hat{A} \begin{bmatrix} I & -iI \\ 0 & I \end{bmatrix} \right)
   = \det\begin{bmatrix} A & 0 \\ \operatorname{Im}(A) & \overline{A} \end{bmatrix}
   = \det(A)\,\overline{\det(A)}. \]
(f), (g) and (h) are immediate.
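The identities of Lemma 1 are easy to check numerically. The sketch below (our own illustration, assuming numpy) verifies (a), (e) and (h) for randomly drawn matrices and vectors.

\begin{verbatim}
import numpy as np

def hat(A):
    # The real 2n x 2m image of a complex n x m matrix A.
    return np.block([[A.real, -A.imag], [A.imag, A.real]])

rng = np.random.default_rng(1)
n, m = 3, 4
A = rng.standard_normal((n, m)) + 1j * rng.standard_normal((n, m))
B = rng.standard_normal((m, n)) + 1j * rng.standard_normal((m, n))
x = rng.standard_normal(n) + 1j * rng.standard_normal(n)
y = rng.standard_normal(n) + 1j * rng.standard_normal(n)

# (a): hat(AB) = hat(A) hat(B)
assert np.allclose(hat(A @ B), hat(A) @ hat(B))
# (e): det(hat(S)) = |det(S)|^2 for a square S
S = A @ B
assert np.allclose(np.linalg.det(hat(S)), abs(np.linalg.det(S)) ** 2)
# (h): Re(x^dagger y) = hat(x)^T hat(y), with hat(x) = [Re x; Im x]
xh = np.concatenate([x.real, x.imag])
yh = np.concatenate([y.real, y.imag])
assert np.allclose(np.vdot(x, y).real, xh @ yh)
\end{verbatim}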
Corollary 1. $U \in \mathbb{C}^{n \times n}$ is unitary if and only if $\hat{U} \in \mathbb{R}^{2n \times 2n}$ is orthonormal.
Proof. $U^\dagger U = I_n \iff \hat{U}^\dagger\hat{U} = \hat{I}_n = I_{2n}$.
Corollary 2. If $Q \in \mathbb{C}^{n \times n}$ is non-negative definite then so is $\hat{Q} \in \mathbb{R}^{2n \times 2n}$.
Proof. Given $x = (x_1, \dots, x_{2n})^\dagger \in \mathbb{R}^{2n}$, let $z = (x_1 + jx_{n+1}, \dots, x_n + jx_{2n})^\dagger \in \mathbb{C}^n$, so
that $x = \hat{z}$. Then by (g) and (h),
\[ x^\dagger\hat{Q}x = \operatorname{Re}(z^\dagger Qz) = z^\dagger Qz \ge 0. \]
The probability density (with respect to the standard Lebesgue measure on $\mathbb{C}^n$)
of a circularly symmetric complex Gaussian with mean $\mu$ and covariance $Q$ is given
by
\begin{align*}
\gamma_{\mu,Q}(x) &= \det(\pi\hat{Q})^{-1/2} \exp\big(-(\hat{x}-\hat{\mu})^\dagger \hat{Q}^{-1}(\hat{x}-\hat{\mu})\big) \\
&= \det(\pi Q)^{-1} \exp\big(-(x-\mu)^\dagger Q^{-1}(x-\mu)\big),
\end{align*}
where the second equality follows from (d)--(h). The differential entropy of a complex
Gaussian $x$ with covariance $Q$ is given by
\begin{align*}
\mathcal{H}(\gamma_Q) &= \mathrm{E}_{\gamma_Q}[-\log\gamma_Q(x)] \\
&= \log\det(\pi Q) + (\log e)\,\mathrm{E}[x^\dagger Q^{-1}x] \\
&= \log\det(\pi Q) + (\log e)\,\operatorname{tr}\big(\mathrm{E}[xx^\dagger]Q^{-1}\big) \\
&= \log\det(\pi Q) + (\log e)\,\operatorname{tr}(I) \\
&= \log\det(\pi eQ).
\end{align*}
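The entropy formula can be checked by Monte Carlo (a sketch of our own, assuming numpy; it is not part of the paper): estimate $\mathrm{E}[-\log\gamma_Q(x)]$ from samples of a circularly symmetric complex Gaussian with covariance $Q$ and compare with $\log\det(\pi eQ)$, both in nats.

\begin{verbatim}
import numpy as np

rng = np.random.default_rng(2)
n, N = 3, 200_000

# A random Hermitian non-negative definite covariance Q.
A = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
Q = A @ A.conj().T + np.eye(n)

# Samples x = L w with Q = L L^dagger and w having i.i.d. CN(0,1) entries,
# so that E[x x^dagger] = Q.
L = np.linalg.cholesky(Q)
w = (rng.standard_normal((N, n)) + 1j * rng.standard_normal((N, n))) / np.sqrt(2)
x = w @ L.T                      # each row is one sample L w

# -log gamma_Q(x) = log det(pi Q) + x^dagger Q^{-1} x  (natural logarithms)
Qinv = np.linalg.inv(Q)
quad = np.einsum('ij,jk,ik->i', x.conj(), Qinv, x).real
entropy_mc = np.log(np.linalg.det(np.pi * Q).real) + quad.mean()
entropy_formula = np.log(np.linalg.det(np.pi * np.e * Q).real)
print(entropy_mc, entropy_formula)   # agree to within Monte Carlo error
\end{verbatim}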
For us, the importance of the circularly symmetric complex Gaussians is due to the
following lemma: circularly symmetric complex Gaussians are entropy maximizers.
Lemma 2. Suppose the complex random vector $x \in \mathbb{C}^n$ is zero-mean and satisfies
$\mathrm{E}[xx^\dagger] = Q$, i.e., $\mathrm{E}[x_i x_j^*] = Q_{ij}$, $1 \le i, j \le n$. Then the entropy of $x$ satisfies
$\mathcal{H}(x) \le \log\det(\pi eQ)$ with equality if and only if $x$ is a circularly symmetric complex
Gaussian with
\[ \mathrm{E}[xx^\dagger] = Q. \]

Proof. Let $p$ be any density function satisfying $\int_{\mathbb{C}^n} p(x)\,x_i x_j^*\,dx = Q_{ij}$, $1 \le i, j \le n$. Let
\[ \gamma_Q(x) = \det(\pi Q)^{-1}\exp(-x^\dagger Q^{-1}x). \]
Observe that $\int_{\mathbb{C}^n} \gamma_Q(x)\,x_i x_j^*\,dx = Q_{ij}$, and that $\log\gamma_Q(x)$ is a linear combination of
the terms $x_i x_j^*$. Thus $\mathrm{E}_{\gamma_Q}[\log\gamma_Q(x)] = \mathrm{E}_p[\log\gamma_Q(x)]$. Then,
\begin{align*}
\mathcal{H}(p) - \mathcal{H}(\gamma_Q)
&= -\int_{\mathbb{C}^n} p(x)\log p(x)\,dx + \int_{\mathbb{C}^n} \gamma_Q(x)\log\gamma_Q(x)\,dx \\
&= -\int_{\mathbb{C}^n} p(x)\log p(x)\,dx + \int_{\mathbb{C}^n} p(x)\log\gamma_Q(x)\,dx \\
&= \int_{\mathbb{C}^n} p(x)\log\frac{\gamma_Q(x)}{p(x)}\,dx \\
&\le 0,
\end{align*}
with equality only if $p = \gamma_Q$. Thus $\mathcal{H}(p) \le \mathcal{H}(\gamma_Q)$.
Lemma 3. If $x \in \mathbb{C}^n$ is a circularly symmetric complex Gaussian then so is $y = Ax$
for any $A \in \mathbb{C}^{m \times n}$.
Proof. We may assume $x$ is zero-mean. Let $Q = \mathrm{E}[xx^\dagger]$. Then $y$ is zero-mean,
$\hat{y} = \hat{A}\hat{x}$, and
\[ \mathrm{E}[\hat{y}\hat{y}^\dagger] = \hat{A}\,\mathrm{E}[\hat{x}\hat{x}^\dagger]\,\hat{A}^\dagger = \tfrac{1}{2}\hat{A}\hat{Q}\hat{A}^\dagger = \tfrac{1}{2}\hat{K} \]
where $K = AQA^\dagger$.
Lemma 4. If $x$ and $y$ are independent circularly symmetric complex Gaussians, then
$z = x + y$ is a circularly symmetric complex Gaussian.
Proof. Let $A = \mathrm{E}[xx^\dagger]$ and $B = \mathrm{E}[yy^\dagger]$. Then $\mathrm{E}[\hat{z}\hat{z}^\dagger] = \tfrac{1}{2}\hat{C}$ with $C = A + B$.
The Gaussian channel with fixed transfer function
We will start by reminding ourselves of the case of deterministic $H$. The results of this
section can be inferred from [ , Ch. ].
Capacity
We will first derive an expression for the capacity $C(H, P)$ of this channel. To that
end, we will maximize the average mutual information $I(x; y)$ between the input and
the output of the channel over the choice of the distribution of $x$.
By the singular value decomposition theorem, any matrix $H \in \mathbb{C}^{r \times t}$ can be written
as
\[ H = UDV^\dagger \]
where $U \in \mathbb{C}^{r \times r}$ and $V \in \mathbb{C}^{t \times t}$ are unitary, and $D \in \mathbb{R}^{r \times t}$ is non-negative and diagonal.
In fact, the diagonal entries of $D$ are the non-negative square roots of the eigenvalues
of $HH^\dagger$, the columns of $U$ are the eigenvectors of $HH^\dagger$ and the columns of $V$ are the
eigenvectors of $H^\dagger H$. Thus, we can write the channel equation $y = Hx + n$ as
\[ y = UDV^\dagger x + n. \]
Let $\tilde{y} = U^\dagger y$, $\tilde{x} = V^\dagger x$, $\tilde{n} = U^\dagger n$. Note that $U$ and $V$ are invertible, $\tilde{n}$ has the same
distribution as $n$ and $\mathrm{E}[\tilde{x}^\dagger\tilde{x}] = \mathrm{E}[x^\dagger x]$. Thus, the original channel is equivalent to
the channel
\[ \tilde{y} = D\tilde{x} + \tilde{n} \]
where $\tilde{n}$ is zero-mean, Gaussian, with independent, identically distributed real and
imaginary parts and $\mathrm{E}[\tilde{n}\tilde{n}^\dagger] = I_r$. Since $H$ is of rank at most $\min\{r, t\}$, at most
$\min\{r, t\}$ of its singular values are non-zero. Denoting these by $\lambda_i^{1/2}$, $i = 1, \dots, \min\{r, t\}$,
we can write the equivalent channel component-wise, to get
\[ \tilde{y}_i = \lambda_i^{1/2}\tilde{x}_i + \tilde{n}_i, \qquad 1 \le i \le \min\{r, t\}, \]
and the rest of the components of $\tilde{y}$ (if any) are equal to the corresponding components
of $\tilde{n}$. We thus see that $\tilde{y}_i$ for $i > \min\{t, r\}$ is independent of the transmitted signal and
that $\tilde{x}_i$ for $i > \min\{t, r\}$ don't play any role. To maximize the mutual information,
we need to choose $\{\tilde{x}_i : 1 \le i \le \min\{r, t\}\}$ to be independent, with each $\tilde{x}_i$ having
independent Gaussian, zero-mean real and imaginary parts. The variances need to
be chosen via ``water-filling'' as
\[ \mathrm{E}\big[\operatorname{Re}(\tilde{x}_i)^2\big] = \mathrm{E}\big[\operatorname{Im}(\tilde{x}_i)^2\big] = \tfrac{1}{2}\big(\mu - \lambda_i^{-1}\big)^+, \]
where $\mu$ is chosen to meet the power constraint. Here, $a^+$ denotes $\max\{0, a\}$. The
power $P$ and the maximal mutual information can thus be parametrized as
\[ P(\mu) = \sum_i \big(\mu - \lambda_i^{-1}\big)^+, \qquad C(\mu) = \sum_i \big(\ln(\mu\lambda_i)\big)^+. \]
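The water-filling parametrization above is easy to evaluate numerically. The sketch below (our own helper, assuming numpy; the names waterfill and capacity_fixed_H are not from the paper) finds $\mu$ for a given total power $P$ by bisection and returns the corresponding capacity in nats.

\begin{verbatim}
import numpy as np

def waterfill(lam, P, tol=1e-12):
    """Water-fill total power P over positive channel eigenvalues lam
    (the lambda_i, i.e. squared singular values of H).  Returns
    (mu, powers, capacity) with powers_i = (mu - 1/lambda_i)^+ summing
    to P and capacity = sum_i (ln(mu*lambda_i))^+ in nats."""
    lam = np.asarray([l for l in lam if l > 0], dtype=float)
    lo, hi = 0.0, P + 1.0 / lam.min()        # the water level mu lies in [lo, hi]
    while hi - lo > tol:
        mu = 0.5 * (lo + hi)
        if np.sum(np.maximum(mu - 1.0 / lam, 0.0)) > P:
            hi = mu
        else:
            lo = mu
    mu = 0.5 * (lo + hi)
    powers = np.maximum(mu - 1.0 / lam, 0.0)
    capacity = np.sum(np.maximum(np.log(mu * lam), 0.0))
    return mu, powers, capacity

def capacity_fixed_H(H, P):
    """C(H, P) for a deterministic H and total transmit power P, in nats."""
    lam = np.linalg.svd(H, compute_uv=False) ** 2
    return waterfill(lam, P)[2]
\end{verbatim}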
Remark (Reciprocity). Since the non-zero eigenvalues of $H^\dagger H$ are the same as those
of $HH^\dagger$, we see that the capacities of the channels corresponding to $H$ and $H^\dagger$ are the
same.
Example 1. Take $H_{ij} = 1$ for all $i, j$. We can write $H$ as
\[ H = \begin{bmatrix} \sqrt{1/r} \\ \vdots \\ \sqrt{1/r} \end{bmatrix} \sqrt{rt}\,
       \begin{bmatrix} \sqrt{1/t} & \dots & \sqrt{1/t} \end{bmatrix} \]
and we thus see that in the singular value decomposition of $H$ the diagonal matrix
$D$ will have only one non-zero entry, $\sqrt{rt}$. We also see that the first column of $U$ is
$\big(\sqrt{1/r}, \dots, \sqrt{1/r}\big)^\dagger$ and the first column of $V$ is $\big(\sqrt{1/t}, \dots, \sqrt{1/t}\big)^\dagger$. Thus,
\[ C = \log(1 + rtP). \]
The $x = V\tilde{x}$ that achieves this capacity satisfies $\mathrm{E}[x_i x_j^*] = P/t$ for all $i, j$, i.e., the
transmitters are all sending the same signal. Note that, even though each transmitter
is sending a power of $P/t$, since their signals add coherently at the receiver, the power
received at each receiver is $Pt$. Since each receiver sees the same signal and the noises
at the receivers are uncorrelated, the overall signal-to-noise ratio is $Prt$.
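Example 1 can be checked numerically (again our own sketch, assuming numpy, working in nats): the all-ones $H$ has a single non-zero singular value $\sqrt{rt}$, so water-filling puts all the power on one mode and the capacity reduces to $\log(1 + rtP)$; this agrees with the capacity_fixed_H helper sketched above.

\begin{verbatim}
import numpy as np

r, t, P = 4, 3, 2.0
H = np.ones((r, t))

# Singular values of the all-ones matrix: one non-zero value, sqrt(rt).
s = np.linalg.svd(H, compute_uv=False)
assert np.allclose(s, [np.sqrt(r * t)] + [0.0] * (min(r, t) - 1))

# With a single eigenvalue lambda = rt, all power goes on that mode and
# C = ln(1 + rt * P) nats.
print(np.log(1.0 + r * t * P))
\end{verbatim}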
Example 2. Take $r = t = n$ and $H = I_n$. Then
\[ C = n\log(1 + P/n). \]
For $x$ that achieves this capacity $\mathrm{E}[x_i x_j^*] = \delta_{ij}P/n$, i.e., the components of $x$ are i.i.d.
However, it is incorrect to infer from this conclusion that to achieve capacity one has
to do independent coding for each transmitter. It is true that the capacity of this
channel can be achieved by splitting the incoming data stream into $t$ streams, coding
and modulating these streams separately, and then sending the $t$ modulated signals
over the different transmitters. But, suppose $Nt$ bits are going to be transmitted,
and we will either separate them into $t$ groups of $N$ bits each and use each group to
select one of $2^N$ signals for each transmitter, or, we will use all $Nt$ bits to select
one of $2^{Nt}$ signal vectors. The second of these alternatives will yield a probability of
error much smaller than the first, at the expense of much greater complexity. Indeed,
the log of the error probability in the two cases will differ by a factor of $t$. See the
error exponents of parallel channels in [ , pp. ].
Alternative Derivation of the Capacity
The mutual information $I(x; y)$ can be written as
\[ I(x; y) = \mathcal{H}(y) - \mathcal{H}(y|x) = \mathcal{H}(y) - \mathcal{H}(n), \]
and thus maximizing $I(x; y)$ is equivalent to maximizing $\mathcal{H}(y)$. Note that if $x$ satisfies
$\mathrm{E}[x^\dagger x] \le P$, so does $x - \mathrm{E}[x]$, so we can restrict our attention to zero-mean $x$.
Furthermore, if $x$ is zero-mean with covariance $\mathrm{E}[xx^\dagger] = Q$, then $y$ is zero-mean
with covariance $\mathrm{E}[yy^\dagger] = HQH^\dagger + I_r$, and by Lemma 2 among such $y$ the entropy
is largest when $y$ is a circularly symmetric complex Gaussian, which is the case when
$x$ is a circularly symmetric complex Gaussian (Lemmas 3 and 4). So, we can further
restrict our attention to circularly symmetric complex Gaussian $x$. In this case the
mutual information is given by
\[ I(x; y) = \log\det(I_r + HQH^\dagger) = \log\det(I_t + QH^\dagger H), \]
where the second equality follows from the determinant identity $\det(I + AB) = \det(I + BA)$,
and it only remains to choose $Q$ to maximize this quantity subject to the
constraints $\operatorname{tr}(Q) \le P$ and that $Q$ is non-negative definite. The quantity $\log\det(I + HQH^\dagger)$
will occur frequently enough in this document that we will let
\[ \Psi(Q, H) = \log\det(I + HQH^\dagger) \]
denote it. Since $H^\dagger H$ is Hermitian it can be diagonalized, $H^\dagger H = U^\dagger\Lambda U$, with
unitary $U$ and non-negative diagonal $\Lambda = \operatorname{diag}(\lambda_1, \dots, \lambda_t)$. Applying the determinant
identity again we see that
\[ \det(I_r + HQH^\dagger) = \det(I_t + \Lambda^{1/2}UQU^\dagger\Lambda^{1/2}). \]
Observe that $\tilde{Q} = UQU^\dagger$ is non-negative definite when and only when $Q$ is, and that
$\operatorname{tr}(\tilde{Q}) = \operatorname{tr}(Q)$; thus the maximization over $Q$ can be carried equally well over $\tilde{Q}$.
Note also that for any non-negative definite matrix $A$, $\det(A) \le \prod_i A_{ii}$; thus
\[ \det(I_t + \Lambda^{1/2}\tilde{Q}\Lambda^{1/2}) \le \prod_i \big(1 + \tilde{Q}_{ii}\lambda_i\big), \]
with equality when $\tilde{Q}$ is diagonal. Thus we see that the maximizing $\tilde{Q}$ is diagonal,
and the optimal diagonal entries can be found via ``water-filling'' to be
\[ \tilde{Q}_{ii} = \big(\mu - \lambda_i^{-1}\big)^+, \qquad i = 1, \dots, t, \]
where $\mu$ is chosen to satisfy $\sum_i \tilde{Q}_{ii} = P$. The corresponding maximum mutual
information is given by
\[ \sum_i \big(\log(\mu\lambda_i)\big)^+ \]
as before.
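The derivation above translates directly into a procedure for the optimal input covariance: diagonalize $H^\dagger H$, water-fill over its eigenvalues, and rotate back. The following sketch (our own, assuming numpy, reusing the bisection idea from the earlier water-filling helper) constructs the maximizing $Q$ and checks it against random alternatives with the same trace.

\begin{verbatim}
import numpy as np

def optimal_Q(H, P, tol=1e-12):
    """Capacity-achieving covariance Q with tr(Q) <= P for y = Hx + n."""
    lam, U = np.linalg.eigh(H.conj().T @ H)      # H^dagger H = U diag(lam) U^dagger
    lam = np.maximum(lam, 0.0)
    pos = lam > 1e-12
    lo, hi = 0.0, P + 1.0 / lam[pos].min()
    while hi - lo > tol:                         # bisect for the water level mu
        mu = 0.5 * (lo + hi)
        if np.sum(np.maximum(mu - 1.0 / lam[pos], 0.0)) > P:
            hi = mu
        else:
            lo = mu
    q = np.zeros_like(lam)
    q[pos] = np.maximum(0.5 * (lo + hi) - 1.0 / lam[pos], 0.0)
    return U @ np.diag(q) @ U.conj().T           # rotate the diagonal solution back

def Psi(Q, H):
    """Psi(Q, H) = log det(I + H Q H^dagger), in nats."""
    return np.log(np.linalg.det(np.eye(H.shape[0]) + H @ Q @ H.conj().T)).real

rng = np.random.default_rng(3)
H = (rng.standard_normal((4, 3)) + 1j * rng.standard_normal((4, 3))) / np.sqrt(2)
P = 5.0
Q = optimal_Q(H, P)
assert abs(np.trace(Q).real - P) < 1e-6
for _ in range(100):                             # no trace-P alternative does better
    A = rng.standard_normal((3, 3)) + 1j * rng.standard_normal((3, 3))
    R = A @ A.conj().T
    R *= P / np.trace(R).real
    assert Psi(Q, H) >= Psi(R, H) - 1e-9
\end{verbatim}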
Error Exponents
Knowing the capacity of a channel is not always sufficient. One may be interested
in knowing how hard it is to get close to this capacity. Error exponents provide a
partial answer to this question by giving an upper bound to the probability of error
achievable by block codes of a given length $n$ and rate $R$. The upper bound is known
as the random coding bound and is given by
\[ P(\text{error}) \le \exp\big(-nE_r(R)\big), \]
where the random coding exponent $E_r(R)$ is given by
\[ E_r(R) = \max_{0 \le \rho \le 1} E_0(\rho) - \rho R, \]
where, in turn, $E_0(\rho)$ is given by the supremum, over all input distributions $q_x$ satisfying
the energy constraint, of
\[ E_0(\rho, q_x) = -\log \int \left[ \int q_x(x)\, p(y|x)^{1/(1+\rho)}\, dx \right]^{1+\rho} dy. \]
In our case $p(y|x) = \det(\pi I_r)^{-1}\exp\big(-(y - Hx)^\dagger(y - Hx)\big)$. If we choose $q_x$ as the Gaussian
distribution $\gamma_Q$ we get after some algebra
\[ E_0(\rho, \gamma_Q) = \rho\log\det\big(I_r + (1+\rho)^{-1}HQH^\dagger\big) = \rho\,\Psi\big((1+\rho)^{-1}Q, H\big). \]
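For a fixed $H$ and a fixed Gaussian input $\gamma_Q$, the random coding exponent then reduces to a one-dimensional maximization over $\rho$, which can be carried out on a grid. The sketch below (our own illustration, assuming numpy; the choice $Q = (P/t)I$ is just an example input covariance, not necessarily the optimizing one) evaluates $E_r(R)$ in nats.

\begin{verbatim}
import numpy as np

def E0(rho, Q, H):
    """E_0(rho, gamma_Q) = rho * ln det(I_r + H Q H^dagger / (1 + rho))."""
    M = np.eye(H.shape[0]) + H @ Q @ H.conj().T / (1.0 + rho)
    return rho * np.log(np.linalg.det(M)).real

def random_coding_exponent(R, Q, H, grid=2001):
    """max over 0 <= rho <= 1 of E_0(rho, gamma_Q) - rho * R, for this fixed Q."""
    rhos = np.linspace(0.0, 1.0, grid)
    return max(E0(rho, Q, H) - rho * R for rho in rhos)

rng = np.random.default_rng(4)
r, t, P = 4, 3, 5.0
H = (rng.standard_normal((r, t)) + 1j * rng.standard_normal((r, t))) / np.sqrt(2)
Q = (P / t) * np.eye(t)

C = np.log(np.linalg.det(np.eye(r) + H @ Q @ H.conj().T)).real    # Psi(Q, H)
for R in (0.25 * C, 0.5 * C, 0.9 * C):
    print(R, random_coding_exponent(R, Q, H))  # exponent shrinks as R approaches C
\end{verbatim}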