ISO/IEC JTC 1/SC 29/WG 1 N 2412
Date: 2005-12-03
ISO/IEC JTC 1/SC 29/WG 1
(ITU-T SG 16)
Coding of Still Pictures
JBIG: Joint Bi-level Image Experts Group
JPEG: Joint Photographic Experts Group
TITLE: The JPEG-2000 Still Image Compression Standard (Last Revised: 2005-12-03)
SOURCE: Michael D. Adams, Assistant Professor, Dept. of Electrical and Computer Engineering, University of Victoria, P. O. Box 3055 STN CSC, Victoria, BC, V8W 3P6, CANADA. E-mail: mdadams@ece.uvic.ca. Web: www.ece.uvic.ca/~mdadams
PROJECT: JPEG 2000
STATUS:
REQUESTED ACTION: None
DISTRIBUTION: Public
Contact:
ISO/IEC JTC 1/SC 29/WG 1 Convener—Dr. Daniel T. Lee
Yahoo! Asia, Sunning Plaza, Rm 2802, 10 Hysan Avenue, Causeway Bay, Hong Kong
Yahoo! Inc, 701 First Avenue, Sunnyvale, California 94089, USA
Tel: +1 408 349 7051/+852 2882 3898, Fax: +1 253 830 0372, E-mail: dlee@yahoo-inc.com
Copyright © 2002–2005 Michael D. Adams
The JPEG-2000 Still Image Compression Standard
(Last Revised: 2005-12-03)
Michael D. Adams
Dept. of Electrical and Computer Engineering, University of Victoria
P. O. Box 3055 STN CSC, Victoria, BC, V8W 3P6, CANADA
E-mail: mdadams@ece.uvic.ca Web: www.ece.uvic.ca/~mdadams
Abstract—JPEG 2000, a new international standard for still image com-
pression, is discussed at length. A high-level introduction to the JPEG-2000
standard is given, followed by a detailed technical description of the JPEG-
2000 Part-1 codec.
Keywords—JPEG 2000, still image compression/coding, standards.
I. INTRODUCTION
DIGITAL IMAGERY is pervasive in our world today. Con-
sequently, standards for the efficient representation and
interchange of digital images are essential. To date, some of
the most successful still image compression standards have re-
sulted from the ongoing work of the Joint Photographic Experts
Group (JPEG). This group operates under the auspices of Joint
Technical Committee 1, Subcommittee 29, Working Group 1
(JTC 1/SC 29/WG 1), a collaborative effort between the In-
ternational Organization for Standardization (ISO) and Interna-
tional Telecommunication Union Standardization Sector (ITU-
T). Both the JPEG [1–3] and JPEG-LS [4–6] standards were
born from the work of the JPEG committee. For the last few years, the JPEG committee has been working towards the establishment of a new standard known as JPEG 2000 (i.e., ISO/IEC 15444). The fruits of these labors are now being realized, as several parts of this multipart standard have recently been ratified, including JPEG-2000 Part 1 (i.e., ISO/IEC 15444-1 [7]).
In this paper, we provide a detailed technical description of
the JPEG-2000 Part-1 codec, in addition to a brief overview of
the JPEG-2000 standard. This exposition is intended to serve as
a reader-friendly starting point for those interested in learning
about JPEG 2000. Although many details are included in our
presentation, some details are necessarily omitted. The reader
should, therefore, refer to the standard [7] before attempting
an implementation. The JPEG-2000 codec realization in the
JasPer software [8–10] (developed by the author of this paper)
may also serve as a practical guide for implementors. (See Ap-
pendix A for more information about JasPer.) The reader may
also find [11–13] to be useful sources of information on the
JPEG-2000 standard.
The remainder of this paper is structured as follows. Section II begins with an overview of the JPEG-2000 standard. This is followed, in Section III, by a detailed description of the JPEG-2000 Part-1 codec. Finally, we conclude with some closing remarks in Section IV. Throughout our presentation, a basic understanding of image coding is assumed.
II. JPEG 2000
The JPEG-2000 standard supports lossy and lossless com-
pression of single-component (e.g., grayscale) and multi-
component (e.g., color) imagery. In addition to this basic com-
pression functionality, however, numerous other features are
provided, including: 1) progressive recovery of an image by fi-
delity or resolution; 2) region of interest coding, whereby differ-
ent parts of an image can be coded with differing fidelity; 3) ran-
dom access to particular regions of an image without needing to
decode the entire code stream; 4) a flexible file format with pro-
visions for specifying opacity information and image sequences;
and 5) good error resilience. Due to its excellent coding per-
formance and many attractive features, JPEG 2000 has a very
large potential application base. Some possible application ar-
eas include: image archiving, Internet, web browsing, document
imaging, digital photography, medical imaging, remote sensing,
and desktop publishing.
A. Why JPEG 2000?
Work on the JPEG-2000 standard commenced with an initial
call for contributions [14] in March 1997. The purpose of having
a new standard was twofold. First, it would address a number
of weaknesses in the existing JPEG standard. Second, it would
provide a number of new features not available in the JPEG standard. The preceding points led to several key objectives for the new standard, namely that it should: 1) allow efficient lossy and lossless compression within a single unified coding framework, 2) provide superior image quality, both objectively and subjectively, at low bit rates, 3) support additional features such as rate and resolution scalability, region of interest coding, and a more flexible file format, and 4) avoid excessive computational and memory complexity. Undoubtedly, much of the success of the original JPEG standard can be attributed to its royalty-free nature.
Consequently, considerable effort has been made to ensure that
a minimally-compliant JPEG-2000 codec can be implemented
free of royalties¹.
¹ Whether these efforts ultimately prove successful remains to be seen, however, as there are still some unresolved intellectual property issues at the time of this writing.
Note: This document is a revised version of the JPEG-2000 tutorial that I wrote which appeared in the JPEG working group document WG1N1734. The original tutorial contained numerous inaccuracies, some of which were introduced by changes in the evolving draft standard while others were due to typographical errors. Hopefully, most of these inaccuracies have been corrected in this revised document. In any case, this document will probably continue to evolve over time. Subsequent versions of this document will be made available from my home page (the URL for which is provided with my contact information).
B. Structure of the Standard
The JPEG-2000 standard is comprised of numerous parts, with the parts listed in Table I being defined at the time of this writing. For convenience, we will refer to the codec defined in Part 1 (i.e., [7]) of the standard as the baseline codec. The baseline codec is simply the core (or minimal functionality) JPEG-
2000 coding system. Part 2 (i.e., [15]) describes extensions to
the baseline codec that are useful for certain “niche” applica-
tions, while Part 3 (i.e., [16]) defines extensions for intraframe-
style video compression. Part 5 (i.e., [17]) provides two refer-
ence software implementations of the Part-1 codec, and Part 4
(i.e., [18]) provides a methodology for testing implementations
for compliance with the standard. In this paper, we will, for the
most part, limit our discussion to the baseline codec. Some of
the extensions included in Part 2 will also be discussed briefly.
Unless otherwise indicated, our exposition considers only the
baseline system.
For the most part, the JPEG-2000 standard is written from the
point of view of the decoder. That is, the decoder is defined quite
precisely with many details being normative in nature (i.e., re-
quired for compliance), while many parts of the encoder are less
rigidly specified. Obviously, implementors must make a very
clear distinction between normative and informative clauses in
the standard. For the purposes of our discussion, however, we
will only make such distinctions when absolutely necessary.
III. JPEG-2000 CODEC
Having briefly introduced the JPEG-2000 standard, we are
now in a position to begin examining the JPEG-2000 codec in
detail. The codec is based on wavelet/subband coding techniques [21, 22]. It handles both lossy and lossless compression using the same transform-based framework, and borrows heavily from the embedded block coding with optimized truncation (EBCOT) scheme [23–25]. In order to facilitate both lossy and lossless coding in an efficient manner, reversible integer-to-integer [26–28] and nonreversible real-to-real transforms are employed. To code transform data, the codec
makes use of bit-plane coding techniques. For entropy coding,
a context-based adaptive binary arithmetic coder [29] is used—
more specifically, the MQ coder from the JBIG2 standard [30].
Two levels of syntax are employed to represent the coded image:
a code stream and file format syntax. The code stream syntax is
similar in spirit to that used in the JPEG standard.
The remainder of Section III is structured as follows. First, Sections III-A to III-C discuss the source image model and how an image is internally represented by the codec. Next, Section III-D examines the basic structure of the codec. This is followed, in Sections III-E to III-M, by a detailed explanation of the coding engine itself. Next, Sections III-N and III-O explain the syntax used to represent a coded image. Finally, Section III-P briefly describes some of the extensions included in Part 2 of the standard.
A. Source Image Model
Before examining the internals of the codec, it is important to
understand the image model that it employs. From the codec’s
point of view, an image is comprised of one or more components (up to a limit of $2^{14}$), as shown in Fig. 1(a). As illustrated in Fig. 1(b), each component consists of a rectangular array of samples. The sample values for each component are integer valued, and can be either signed or unsigned with a precision from 1 to 38 bits/sample. The signedness and precision of the sample data are specified on a per-component basis.
Fig. 1. Source image model. (a) An image with N components. (b) Individual component.
All of the components are associated with the same spatial extent in the source image, but represent different spectral or auxiliary information. For example, an RGB color image has three
components with one component representing each of the red,
green, and blue color planes. In the simple case of a grayscale
image, there is only one component, corresponding to the lu-
minance plane. The various components of an image need not
be sampled at the same resolution. Consequently, the compo-
nents themselves can have different sizes. For example, when
color images are represented in a luminance-chrominance color
space, the luminance information is often more finely sampled
than the chrominance data.
B. Reference Grid
Given an image, the codec describes the geometry of the various components in terms of a rectangular grid called the reference grid. The reference grid has the general form shown in Fig. 2. The grid is of size $\mathrm{Xsiz} \times \mathrm{Ysiz}$ with the origin located at its top-left corner. The region with its top-left corner at $(\mathrm{XOsiz}, \mathrm{YOsiz})$ and bottom-right corner at $(\mathrm{Xsiz}-1, \mathrm{Ysiz}-1)$ is called the image area, and corresponds to the picture data to be represented. The width and height of the reference grid cannot exceed $2^{32}-1$ units, imposing an upper bound on the size of an image that can be handled by the codec.
All of the components are mapped onto the image area of
the reference grid. Since components need not be sampled at
the full resolution of the reference grid, additional information
is required in order to establish this mapping. For each com-
ponent, we indicate the horizontal and vertical sampling period
in units of the reference grid, denoted as XRsiz and YRsiz, re-
spectively. These two parameters uniquely specify a (rectangu-
lar) sampling grid consisting of all points whose horizontal and
vertical positions are integer multiples of XRsiz and YRsiz, respectively. All such points that fall within the image area constitute samples of the component in question. Thus, in terms of its own coordinate system, a component will have the size
$$\left( \left\lceil \frac{\mathrm{Xsiz}}{\mathrm{XRsiz}} \right\rceil - \left\lceil \frac{\mathrm{XOsiz}}{\mathrm{XRsiz}} \right\rceil \right) \times \left( \left\lceil \frac{\mathrm{Ysiz}}{\mathrm{YRsiz}} \right\rceil - \left\lceil \frac{\mathrm{YOsiz}}{\mathrm{YRsiz}} \right\rceil \right)$$
and its top-left sample will correspond to the point $\left( \left\lceil \mathrm{XOsiz}/\mathrm{XRsiz} \right\rceil, \left\lceil \mathrm{YOsiz}/\mathrm{YRsiz} \right\rceil \right)$. Note that the reference grid also imposes a particular alignment of samples from the various components relative to one another.

TABLE I
PARTS OF THE STANDARD

| Part | Title | Purpose |
|------|-------|---------|
| 1 | Core coding system | Specifies the core (or minimal functionality) JPEG-2000 codec. |
| 2 | Extensions | Specifies additional functionalities that are useful in some applications but need not be supported by all codecs. |
| 3 | Motion JPEG 2000 | Specifies extensions to JPEG-2000 for intraframe-style video compression. |
| 4 | Conformance testing | Specifies the procedure to be employed for compliance testing. |
| 5 | Reference software | Provides sample software implementations of the standard to serve as a guide for implementors. |
| 6 | Compound image file format | Defines a file format for compound documents. |
| 8 | Secure JPEG 2000 | Defines mechanisms for conditional access, integrity/authentication, and intellectual property rights protection. |
| 9 | Interactivity tools, APIs and protocols | Specifies a client-server protocol for efficiently communicating JPEG-2000 image data over networks. |
| 10 | 3D and floating-point data | Provides extensions for handling 3D (e.g., volumetric) and floating-point data. |
| 11 | Wireless | Provides channel coding and error protection tools for wireless applications. |
| 12 | ISO base media file format | Defines a common media file format used by Motion JPEG 2000 and MPEG 4. |
| 13 | Entry-level JPEG 2000 encoder | Specifies an entry-level JPEG-2000 encoder. |

Document: Part 1 [7]; Part 2 [15]; Part 3 [16]; Part 4 [18]; Part 5 [17]; further parts [19], [20].
This part of the standard is still under development at the time of this writing.

Fig. 2. Reference grid.
From the diagram, the size of the image area is $(\mathrm{Xsiz}-\mathrm{XOsiz}) \times (\mathrm{Ysiz}-\mathrm{YOsiz})$. For a given image, many combinations of the Xsiz, Ysiz, XOsiz, and YOsiz parameters can be
chosen to obtain an image area with the same size. Thus, one
might wonder why the XOsiz and YOsiz parameters are not
fixed at zero while the Xsiz and Ysiz parameters are set to the
size of the image. As it turns out, there are subtle implications
to changing the XOsiz and YOsiz parameters (while keeping the
size of the image area constant). Such changes affect codec be-
havior in several important ways, as will be described later. This
behavior allows a number of basic operations to be performed
more efficiently on coded images, such as cropping, horizon-
tal/vertical flipping, and rotation by an integer multiple of 90
degrees.
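The component geometry implied by the reference grid can be sketched as follows. This is a minimal illustration only: the function names `ceil_div` and `component_geometry` are hypothetical (they come from neither the standard nor JasPer), and the parameters simply mirror the Xsiz, Ysiz, XOsiz, YOsiz, XRsiz, and YRsiz quantities discussed above.

```python
def ceil_div(a: int, b: int) -> int:
    """Ceiling of a / b for integers, assuming b > 0."""
    return -(-a // b)


def component_geometry(Xsiz, Ysiz, XOsiz, YOsiz, XRsiz, YRsiz):
    """Origin and size of one component in its own coordinate system.

    (XRsiz, YRsiz) are the sampling periods of the component in units of
    the reference grid.
    """
    x0 = ceil_div(XOsiz, XRsiz)
    y0 = ceil_div(YOsiz, YRsiz)
    x1 = ceil_div(Xsiz, XRsiz)
    y1 = ceil_div(Ysiz, YRsiz)
    return {"origin": (x0, y0), "width": x1 - x0, "height": y1 - y0}


# Two placements of a 15 x 15 image area on the reference grid, for a
# component subsampled by 2 in each direction.
at_origin = component_geometry(15, 15, 0, 0, 2, 2)
shifted = component_geometry(16, 16, 1, 1, 2, 2)
print(at_origin, shifted)
```

In this example the image area has the same size in both cases, yet the subsampled component is 8 samples wide in the first case and 7 in the second, which is one concrete way in which the choice of XOsiz and YOsiz influences codec behavior.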
C. Tiling
In some situations, an image may be quite large in comparison to the amount of memory available to the codec. Consequently, it is not always feasible to code the entire image as a single atomic unit. To solve this problem, the codec allows an image to be broken into smaller pieces, each of which is independently coded. More specifically, an image is partitioned into one or more disjoint rectangular regions called tiles. As shown in Fig. 3, this partitioning is performed with respect to the reference grid by overlaying the reference grid with a rectangular tiling grid having horizontal and vertical spacings of XTsiz and YTsiz, respectively. The origin of the tiling grid is aligned with the point $(\mathrm{XTOsiz}, \mathrm{YTOsiz})$. Tiles have a nominal size of $\mathrm{XTsiz} \times \mathrm{YTsiz}$, but those bordering on the edges of the image area may have a size which differs from the nominal size. The tiles are numbered in raster scan order (starting at zero).
Fig. 3. Tiling on the reference grid.
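The tiling just described can be sketched in a few lines. The sketch below is illustrative rather than normative: the helper names are hypothetical, the clamping of tile bounds to the image area reflects my reading of the standard, and it is assumed that the tiling origin does not lie beyond the image origin and that the first tile overlaps the image area. The second helper maps a tile to the coordinate system of a component by taking the ceiling of each coordinate divided by the sampling period of that component.

```python
def ceil_div(a: int, b: int) -> int:
    """Ceiling of a / b for integers, assuming b > 0."""
    return -(-a // b)


def tile_bounds(Xsiz, Ysiz, XOsiz, YOsiz, XTsiz, YTsiz, XTOsiz, YTOsiz):
    """Return (tx0, ty0, tx1, ty1) for every tile, in raster scan order.

    Each tile covers reference grid points tx0 <= x < tx1, ty0 <= y < ty1.
    """
    num_x = ceil_div(Xsiz - XTOsiz, XTsiz)  # tiles per row
    num_y = ceil_div(Ysiz - YTOsiz, YTsiz)  # tiles per column
    tiles = []
    for q in range(num_y):
        for p in range(num_x):
            # Clamp the nominal tile rectangle to the image area.
            tx0 = max(XTOsiz + p * XTsiz, XOsiz)
            ty0 = max(YTOsiz + q * YTsiz, YOsiz)
            tx1 = min(XTOsiz + (p + 1) * XTsiz, Xsiz)
            ty1 = min(YTOsiz + (q + 1) * YTsiz, Ysiz)
            tiles.append((tx0, ty0, tx1, ty1))
    return tiles


def to_component(tile, XRsiz, YRsiz):
    """Map tile bounds from the reference grid to a component's coordinates."""
    tx0, ty0, tx1, ty1 = tile
    return (ceil_div(tx0, XRsiz), ceil_div(ty0, YRsiz),
            ceil_div(tx1, XRsiz), ceil_div(ty1, YRsiz))


tiles = tile_bounds(Xsiz=10, Ysiz=7, XOsiz=2, YOsiz=1,
                    XTsiz=4, YTsiz=4, XTOsiz=0, YTOsiz=0)
print(len(tiles), tiles[0], to_component(tiles[0], 2, 2))
```

Here the tiles along the left, top, right, and bottom edges come out smaller than the nominal 4 by 4 size, as expected for tiles bordering the image area.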
By mapping the position of each tile from the reference grid to the coordinate systems of the individual components, a partitioning of the components themselves is obtained. For example, suppose that a tile has an upper left corner and lower right corner with coordinates $(\mathrm{tx}_0, \mathrm{ty}_0)$ and $(\mathrm{tx}_1-1, \mathrm{ty}_1-1)$, respectively. Then, in the coordinate space of a particular component, the tile would have an upper left corner and lower right corner with coordinates $(\mathrm{tcx}_0, \mathrm{tcy}_0)$ and $(\mathrm{tcx}_1-1, \mathrm{tcy}_1-1)$, respectively, where
$$(\mathrm{tcx}_0, \mathrm{tcy}_0) = \left( \left\lceil \mathrm{tx}_0/\mathrm{XRsiz} \right\rceil, \left\lceil \mathrm{ty}_0/\mathrm{YRsiz} \right\rceil \right) \tag{1a}$$
$$(\mathrm{tcx}_1, \mathrm{tcy}_1) = \left( \left\lceil \mathrm{tx}_1/\mathrm{XRsiz} \right\rceil, \left\lceil \mathrm{ty}_1/\mathrm{YRsiz} \right\rceil \right) \tag{1b}$$
These equations correspond to the illustration in Fig. 4. The portion of a component that corresponds to a single tile is referred to as a tile-component. Although the tiling grid is regular with respect to the reference grid, it is important to note that the grid may not necessarily be regular with respect to the coordinate systems of the components.
Fig. 4. Tile-component coordinate system.
D. Codec Structure
The general structure of the codec is shown in Fig. 5 with the form of the encoder given by Fig. 5(a) and the decoder given by Fig. 5(b). From these diagrams, the key processes associated with the codec can be identified: 1) preprocessing/postprocessing, 2) intercomponent transform, 3) intracomponent transform, 4) quantization/dequantization, 5) tier-1 coding, 6) tier-2 coding, and 7) rate control. The decoder structure essentially mirrors that of the encoder. That is, with the exception of rate control, there is a one-to-one correspondence between functional blocks in the encoder and decoder. Each functional block in the decoder either exactly or approximately inverts the effects of its corresponding block in the encoder. Since tiles are coded independently of one another, the input image is (conceptually, at least) processed one tile at a time. In the sections that follow, each of the above processes is examined in more detail.
E. Preprocessing/Postprocessing
The codec expects its input sample data to have a nominal dynamic range that is approximately centered about zero. The preprocessing stage of the encoder simply ensures that this expectation is met. Suppose that a particular component has P bits/sample. The samples may be either signed or unsigned, leading to a nominal dynamic range of $[-2^{P-1}, 2^{P-1}-1]$ or $[0, 2^{P}-1]$, respectively. If the sample values are unsigned, the nominal dynamic range is clearly not centered about zero. Thus, the nominal dynamic range of the samples is adjusted by subtracting a bias of $2^{P-1}$ from each of the sample values. If the sample values for a component are signed, the nominal dynamic range is already centered about zero, and no processing is required. By ensuring that the nominal dynamic range is centered about zero, a number of simplifying assumptions could be made in the design of the codec (e.g., with respect to context modeling, numerical overflow, etc.).
The postprocessing stage of the decoder essentially undoes the effects of preprocessing in the encoder. If the sample values for a component are unsigned, the original nominal dynamic range is restored. Lastly, in the case of lossy coding, clipping is performed to ensure that the sample values do not exceed the allowable range.
F. Intercomponent Transform
In the encoder, the preprocessing stage is followed by the for-
ward intercomponent transform stage. Here, an intercomponent
transform can be applied to the tile-component data. Such a
transform operates on all of the components together, and serves
to reduce the correlation between components, leading to im-
proved coding efficiency.
Only two intercomponent transforms are defined in the base-
line JPEG-2000 codec: the irreversible color transform (ICT)
and reversible color transform (RCT). The ICT is nonreversible
and real-to-real in nature, while the RCT is reversible and
integer-to-integer. Both of these transforms essentially map im-
age data from the RGB to YCrCb color space. The transforms
are defined to operate on the first three components of an image,
with the assumption that components 0, 1, and 2 correspond
to the red, green, and blue color planes. Due to the nature of
these transforms, the components on which they operate must
be sampled at the same resolution (i.e., have the same size). As
a consequence of the above facts, the ICT and RCT can only be
employed when the image being coded has at least three com-
ponents, and the first three components are sampled at the same
resolution. The ICT may only be used in the case of lossy cod-
ing, while the RCT can be used in either the lossy or lossless
case. Even if a transform can be legally employed, it is not
necessary to do so. That is, the decision to use a multicompo-
nent transform is left at the discretion of the encoder. After the
intercomponent transform stage in the encoder, data from each
component is treated independently.
The ICT is nothing more than the classic RGB to YCrCb color
space transform. The forward transform is defined as
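$$\begin{bmatrix} V_0(x,y) \\ V_1(x,y) \\ V_2(x,y) \end{bmatrix} = \begin{bmatrix} 0.299 & 0.587 & 0.114 \\ -0.16875 & -0.33126 & 0.5 \\ 0.5 & -0.41869 & -0.08131 \end{bmatrix} \begin{bmatrix} U_0(x,y) \\ U_1(x,y) \\ U_2(x,y) \end{bmatrix} \tag{2}$$
where $U_0(x,y)$, $U_1(x,y)$, and $U_2(x,y)$ are the input components corresponding to the red, green, and blue color planes, respectively, and $V_0(x,y)$, $V_1(x,y)$, and $V_2(x,y)$ are the output components corresponding to the Y, Cb, and Cr planes, respectively. (The second row of the matrix weights the blue plane by 0.5 and the third row weights the red plane by 0.5, so $V_1$ is the blue-difference component and $V_2$ is the red-difference component.) The inverse transform can be shown to be
$$\begin{bmatrix} U_0(x,y) \\ U_1(x,y) \\ U_2(x,y) \end{bmatrix} = \begin{bmatrix} 1 & 0 & 1.402 \\ 1 & -0.34413 & -0.71414 \\ 1 & 1.772 & 0 \end{bmatrix} \begin{bmatrix} V_0(x,y) \\ V_1(x,y) \\ V_2(x,y) \end{bmatrix} \tag{3}$$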
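As a quick numerical check of the forward and inverse ICT matrices, the following sketch applies both to a single pixel using plain Python. The function names are hypothetical and the coefficients are the rounded values given in (2) and (3), so the round trip is only approximate, which is consistent with the nonreversible nature of the ICT.

```python
# Rows produce V0, V1, V2 from (U0, U1, U2) = (red, green, blue).
ICT_FORWARD = (
    (0.299, 0.587, 0.114),
    (-0.16875, -0.33126, 0.5),
    (0.5, -0.41869, -0.08131),
)

# Rows produce U0, U1, U2 from (V0, V1, V2).
ICT_INVERSE = (
    (1.0, 0.0, 1.402),
    (1.0, -0.34413, -0.71414),
    (1.0, 1.772, 0.0),
)


def apply3(matrix, triple):
    """Multiply a 3x3 matrix (tuple of rows) by a 3-element vector."""
    return tuple(sum(m * t for m, t in zip(row, triple)) for row in matrix)


def ict_forward(u):
    return apply3(ICT_FORWARD, u)


def ict_inverse(v):
    return apply3(ICT_INVERSE, v)


pixel = (100.0, 150.0, 200.0)
restored = ict_inverse(ict_forward(pixel))
max_error = max(abs(a - b) for a, b in zip(pixel, restored))
print(restored, max_error)
```

The residual error comes solely from the rounding of the published coefficients and from floating-point arithmetic; a real codec would additionally quantize the transformed data in the lossy case.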
The RCT is simply a reversible integer-to-integer approximation to the ICT (similar to that proposed in [28]). The forward transform is given by
$$V_0(x,y) = \left\lfloor \tfrac{1}{4} \left( U_0(x,y) + 2U_1(x,y) + U_2(x,y) \right) \right\rfloor \tag{4a}$$
$$V_1(x,y) = U_2(x,y) - U_1(x,y) \tag{4b}$$
$$V_2(x,y) = U_0(x,y) - U_1(x,y) \tag{4c}$$
where $U_0(x,y)$, $U_1(x,y)$, $U_2(x,y)$, $V_0(x,y)$, $V_1(x,y)$, and $V_2(x,y)$ are defined as above. The inverse transform can be shown to be
$$U_1(x,y) = V_0(x,y) - \left\lfloor \tfrac{1}{4} \left( V_1(x,y) + V_2(x,y) \right) \right\rfloor \tag{5a}$$
$$U_0(x,y) = V_2(x,y) + U_1(x,y) \tag{5b}$$
$$U_2(x,y) = V_1(x,y) + U_1(x,y) \tag{5c}$$
Fig. 5. Codec structure. The structure of the (a) encoder and (b) decoder. [Block diagram. (a) Original Image, Preprocessing, Forward Intercomponent Transform, Forward Intracomponent Transform, Quantization, Tier-1 Encoder, Tier-2 Encoder, Coded Image, with a Rate Control block. (b) Coded Image, Tier-2 Decoder, Tier-1 Decoder, Dequantization, Inverse Intracomponent Transform, Inverse Intercomponent Transform, Postprocessing, Reconstructed Image.]
The inverse intercomponent transform stage in the decoder
essentially undoes the effects of the forward intercomponent
transform stage in the encoder. If a multicomponent transform
was applied during encoding, its inverse is applied here. Unless
the transform is reversible, however, the inversion may only be
approximate due to the effects of finite-precision arithmetic.
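To make the reversibility of the RCT concrete, the sketch below chains the level shift of the preprocessing stage with the forward RCT, and then inverts both steps. The function names are hypothetical, unsigned 8-bit samples are assumed, and Python's `//` operator is used because it rounds toward negative infinity, matching the floor operation in the transform.

```python
def level_shift(sample: int, precision: int) -> int:
    """Preprocessing for unsigned data: subtract a bias of 2^(P-1)."""
    return sample - (1 << (precision - 1))


def undo_level_shift(sample: int, precision: int) -> int:
    """Postprocessing: restore the bias and clip to [0, 2^P - 1]."""
    value = sample + (1 << (precision - 1))
    return max(0, min((1 << precision) - 1, value))


def rct_forward(u0: int, u1: int, u2: int):
    """Forward RCT on (red, green, blue) integer samples."""
    v0 = (u0 + 2 * u1 + u2) // 4
    v1 = u2 - u1
    v2 = u0 - u1
    return v0, v1, v2


def rct_inverse(v0: int, v1: int, v2: int):
    """Inverse RCT; exact because floor((u0 + 2*u1 + u2) / 4)
    equals u1 + floor((v1 + v2) / 4)."""
    u1 = v0 - (v1 + v2) // 4
    u0 = v2 + u1
    u2 = v1 + u1
    return u0, u1, u2


def round_trip(rgb, precision=8):
    shifted = [level_shift(c, precision) for c in rgb]
    restored = rct_inverse(*rct_forward(*shifted))
    return tuple(undo_level_shift(c, precision) for c in restored)


print(round_trip((255, 0, 0)))
```

The clipping in `undo_level_shift` has no effect in this lossless round trip; it only matters when lossy coding has perturbed the sample values.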
G. Intracomponent Transform
Following the intercomponent transform stage in the encoder
is the intracomponent transform stage. In this stage, transforms
that operate on individual components can be applied. The par-
ticular type of operator employed for this purpose is the wavelet
transform. Through the application of the wavelet transform,
a component is split into numerous frequency bands (i.e., sub-
bands). Due to the statistical properties of these subband signals,
the transformed data can usually be coded more efficiently than
the original untransformed data.
Both reversible integer-to-integer [26, 27, 31–33] and non-
reversible real-to-real wavelet transforms are employed by the
baseline codec. The basic building block for such transforms
is the 1-D 2-channel perfect-reconstruction (PR) uniformly-
maximally-decimated (UMD) filter bank (FB) which has the
general form shown in Fig. 6. Here, we focus on the lifting
realization of the UMDFB [34, 35], as it can be used to imple-
ment the reversible integer-to-integer and nonreversible real-to-
real wavelet transforms employed by the baseline codec. In fact, for this reason, it is likely that this realization strategy will be employed by many codec implementations. The analysis side of the UMDFB, depicted in Fig. 6(a), is associated with the forward transform, while the synthesis side, depicted in Fig. 6(b), is associated with the inverse transform. In the diagram, the $\{A_i(z)\}_{i=0}^{\lambda-1}$, $\{Q_i(x)\}_{i=0}^{\lambda-1}$, and $\{s_i\}_{i=0}^{1}$ denote filter transfer functions, quantization operators, and (scalar) gains, respectively. To obtain integer-to-integer mappings, the $\{Q_i(x)\}_{i=0}^{\lambda-1}$ are selected such that they always yield integer values, and the $\{s_i\}_{i=0}^{1}$ are chosen as integers. For real-to-real mappings, the $\{Q_i(x)\}_{i=0}^{\lambda-1}$ are simply chosen as the identity, and the $\{s_i\}_{i=0}^{1}$ are selected from the real numbers. To facilitate filtering at signal boundaries, symmetric extension [36–38] is employed. Since an image is a 2-D signal, clearly we need a 2-D UMDFB. By applying the 1-D UMDFB in both the horizontal and vertical directions, a 2-D UMDFB is effectively obtained. The wavelet transform is then calculated by recursively applying the 2-D UMDFB to the lowpass subband signal obtained at each level in the decomposition.
Fig. 6. Lifting realization of a 1-D 2-channel PR UMDFB. (a) Analysis side. (b) Synthesis side.
Fig. 7. Subband structure.
Suppose that a $(R-1)$-level wavelet transform is to be employed. To compute the forward transform, we apply the analysis side of the 2-D UMDFB to the tile-component data in an iterative manner, resulting in a number of subband signals being produced. Each application of the analysis side of the 2-D UMDFB yields four subbands: 1) horizontally and vertically lowpass (LL), 2) horizontally lowpass and vertically highpass (LH), 3) horizontally highpass and vertically lowpass (HL), and 4) horizontally and vertically highpass (HH). A $(R-1)$-level wavelet decomposition is associated with $R$ resolution levels, numbered from 0 to $R-1$, with 0 and $R-1$ corresponding to the coarsest and finest resolutions, respectively. Each subband of the decomposition is identified by its orientation (e.g., LL, LH, HL, HH) and its corresponding resolution level (e.g., $0, 1, \ldots, R-1$). The input tile-component signal is considered to be the $\mathrm{LL}_{R-1}$ band. At each resolution level (except the lowest) the LL band is further decomposed. For example, the $\mathrm{LL}_{R-1}$ band is decomposed to yield the $\mathrm{LL}_{R-2}$, $\mathrm{LH}_{R-2}$, $\mathrm{HL}_{R-2}$, and $\mathrm{HH}_{R-2}$ bands. Then, at the next level, the $\mathrm{LL}_{R-2}$ band is decomposed, and so on. This process repeats until the $\mathrm{LL}_0$ band is obtained, and results in the subband structure illustrated in Fig. 7. In the degenerate case where no transform is applied, $R = 1$, and we effectively have only one subband (i.e., the $\mathrm{LL}_0$ band).
As described above, the wavelet decomposition can be associated with data at $R$ different resolutions. Suppose that the top-left and bottom-right samples of a tile-component have coordinates $(\mathrm{tcx}_0, \mathrm{tcy}_0)$ and $(\mathrm{tcx}_1-1, \mathrm{tcy}_1-1)$, respectively. This being the case, the top-left and bottom-right samples of the tile-component at resolution $r$ have coordinates $(\mathrm{trx}_0, \mathrm{try}_0)$ and $(\mathrm{trx}_1-1, \mathrm{try}_1-1)$, respectively, given by
$$(\mathrm{trx}_0, \mathrm{try}_0) = \left( \left\lceil \frac{\mathrm{tcx}_0}{2^{R-r-1}} \right\rceil, \left\lceil \frac{\mathrm{tcy}_0}{2^{R-r-1}} \right\rceil \right) \tag{6a}$$
$$(\mathrm{trx}_1, \mathrm{try}_1) = \left( \left\lceil \frac{\mathrm{tcx}_1}{2^{R-r-1}} \right\rceil, \left\lceil \frac{\mathrm{tcy}_1}{2^{R-r-1}} \right\rceil \right) \tag{6b}$$
where $r$ is the particular resolution of interest. Thus, the tile-component signal at a particular resolution has the size $(\mathrm{trx}_1 - \mathrm{trx}_0) \times (\mathrm{try}_1 - \mathrm{try}_0)$.
Not only are the coordinate systems of the resolution levels important, but so too are the coordinate systems for the various subbands. Suppose that we denote the coordinates of the upper left and lower right samples in a subband as $(\mathrm{tbx}_0, \mathrm{tby}_0)$ and $(\mathrm{tbx}_1-1, \mathrm{tby}_1-1)$, respectively. These quantities are computed as
$$(\mathrm{tbx}_0, \mathrm{tby}_0) = \begin{cases} \left( \left\lceil \frac{\mathrm{tcx}_0}{2^{R-r-1}} \right\rceil, \left\lceil \frac{\mathrm{tcy}_0}{2^{R-r-1}} \right\rceil \right) & \text{for LL band} \\ \left( \left\lceil \frac{\mathrm{tcx}_0}{2^{R-r-1}} - \frac{1}{2} \right\rceil, \left\lceil \frac{\mathrm{tcy}_0}{2^{R-r-1}} \right\rceil \right) & \text{for HL band} \\ \left( \left\lceil \frac{\mathrm{tcx}_0}{2^{R-r-1}} \right\rceil, \left\lceil \frac{\mathrm{tcy}_0}{2^{R-r-1}} - \frac{1}{2} \right\rceil \right) & \text{for LH band} \\ \left( \left\lceil \frac{\mathrm{tcx}_0}{2^{R-r-1}} - \frac{1}{2} \right\rceil, \left\lceil \frac{\mathrm{tcy}_0}{2^{R-r-1}} - \frac{1}{2} \right\rceil \right) & \text{for HH band} \end{cases} \tag{7a}$$
$$(\mathrm{tbx}_1, \mathrm{tby}_1) = \begin{cases} \left( \left\lceil \frac{\mathrm{tcx}_1}{2^{R-r-1}} \right\rceil, \left\lceil \frac{\mathrm{tcy}_1}{2^{R-r-1}} \right\rceil \right) & \text{for LL band} \\ \left( \left\lceil \frac{\mathrm{tcx}_1}{2^{R-r-1}} - \frac{1}{2} \right\rceil, \left\lceil \frac{\mathrm{tcy}_1}{2^{R-r-1}} \right\rceil \right) & \text{for HL band} \\ \left( \left\lceil \frac{\mathrm{tcx}_1}{2^{R-r-1}} \right\rceil, \left\lceil \frac{\mathrm{tcy}_1}{2^{R-r-1}} - \frac{1}{2} \right\rceil \right) & \text{for LH band} \\ \left( \left\lceil \frac{\mathrm{tcx}_1}{2^{R-r-1}} - \frac{1}{2} \right\rceil, \left\lceil \frac{\mathrm{tcy}_1}{2^{R-r-1}} - \frac{1}{2} \right\rceil \right) & \text{for HH band} \end{cases} \tag{7b}$$
where $r$ is the resolution level to which the band belongs, $R$ is the number of resolution levels, and $\mathrm{tcx}_0$, $\mathrm{tcy}_0$, $\mathrm{tcx}_1$, and $\mathrm{tcy}_1$ are as defined in (1). Thus, a particular band has the size $(\mathrm{tbx}_1 - \mathrm{tbx}_0) \times (\mathrm{tby}_1 - \mathrm{tby}_0)$. From the above equations, we can also see that $(\mathrm{tbx}_0, \mathrm{tby}_0) = (\mathrm{trx}_0, \mathrm{try}_0)$ and $(\mathrm{tbx}_1, \mathrm{tby}_1) = (\mathrm{trx}_1, \mathrm{try}_1)$ for the $\mathrm{LL}_r$ band, as one would expect. (This should be the case since the $\mathrm{LL}_r$ band is equivalent to a reduced resolution version of the original data.) As will be seen, the coordinate systems for the various resolutions and subbands of a tile-component play an important role in codec behavior.
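The resolution-level coordinates in (6) reduce to ceiling divisions by a power of two, as the following sketch shows. The function names are hypothetical, and only the resolution-level (equivalently, LL band) case is illustrated; the highpass bands in (7) additionally involve the half-sample offset.

```python
def ceil_div(a: int, b: int) -> int:
    """Ceiling of a / b for integers, assuming b > 0."""
    return -(-a // b)


def resolution_bounds(tcx0, tcy0, tcx1, tcy1, R, r):
    """Bounds (trx0, try0, trx1, try1) of a tile-component at resolution r.

    R is the number of resolution levels; r runs from 0 (coarsest) to
    R - 1 (finest).
    """
    if not 0 <= r < R:
        raise ValueError("r must satisfy 0 <= r < R")
    scale = 1 << (R - r - 1)  # 2^(R - r - 1)
    return (ceil_div(tcx0, scale), ceil_div(tcy0, scale),
            ceil_div(tcx1, scale), ceil_div(tcy1, scale))


# A tile-component spanning x in [3, 131) and y in [5, 69), with R = 3.
for r in range(3):
    x0, y0, x1, y1 = resolution_bounds(3, 5, 131, 69, R=3, r=r)
    print(r, (x0, y0, x1, y1), "size", (x1 - x0, y1 - y0))
```

Note that the size at each resolution depends on the absolute coordinates of the tile-component, not just on its width and height, which is why the placement of the image on the reference grid matters.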
By examining (1), (6), and (7), we observe that the coordi-
nates of the top-left sample for a particular subband, denoted
(tbx0;tby0), are partially determined by the XOsiz and YOsiz
parameters of the reference grid. At each level of the decompo-
sition, the parity (i.e., oddness/evenness) of tbx0 and tby0 affects
the outcome of the downsampling process (since downsampling
is shift variant). In this way, the XOsiz and YOsiz parameters
have a subtle, yet important, effect on the transform calculation.
Having described the general transform framework, we now
describe the two specific wavelet transforms supported by the
baseline codec: the 5/3 and 9/7 transforms. The 5/3 transform
is reversible, integer-to-integer, and nonlinear. This transform
was proposed in [26], and is simply an approximation to a linear
wavelet transform proposed in [39]. The 5/3 transform has an
underlying 1-D UMDFB with the parameters:
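$$\lambda = 2, \quad A_0(z) = -\tfrac{1}{2}(z+1), \quad A_1(z) = \tfrac{1}{4}(1+z^{-1}), \quad Q_0(x) = -\lfloor -x \rfloor, \quad Q_1(x) = \left\lfloor x + \tfrac{1}{2} \right\rfloor, \quad s_0 = s_1 = 1. \tag{8}$$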
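A 1-D version of the reversible 5/3 transform can be sketched with two lifting steps: a predict step that forms the highpass samples from the odd-indexed samples, followed by an update step that forms the lowpass samples from the even-indexed samples. The code below is a minimal illustration under stated assumptions, not the normative procedure: the function names are hypothetical, the first sample is assumed to lie at an even coordinate, and whole-sample symmetric extension is used at the boundaries.

```python
def _reflect(i: int, n: int) -> int:
    """Index into a length-n signal under whole-sample symmetric extension."""
    if n == 1:
        return 0
    period = 2 * (n - 1)
    i %= period
    return i if i < n else period - i


def forward_53(x):
    """Split integer samples x into (lowpass, highpass) subband samples."""
    n = len(x)
    if n == 1:
        return [x[0]], []

    def xe(i):
        return x[_reflect(i, n)]

    def d(k):
        # Predict step: odd sample minus floor of the mean of its neighbors.
        return xe(2 * k + 1) - (xe(2 * k) + xe(2 * k + 2)) // 2

    # Update step: even sample plus a rounded quarter of adjacent details.
    low = [x[2 * k] + (d(k - 1) + d(k) + 2) // 4 for k in range((n + 1) // 2)]
    high = [d(k) for k in range(n // 2)]
    return low, high


def inverse_53(low, high):
    """Exactly invert forward_53 by undoing the lifting steps in reverse."""
    n = len(low) + len(high)
    if n == 1:
        return [low[0]]

    def d(k):
        # Reflection preserves parity, so this always lands on a detail sample.
        return high[(_reflect(2 * k + 1, n) - 1) // 2]

    x = [0] * n
    for k in range(len(low)):
        x[2 * k] = low[k] - (d(k - 1) + d(k) + 2) // 4
    for k in range(len(high)):
        left = x[_reflect(2 * k, n)]
        right = x[_reflect(2 * k + 2, n)]
        x[2 * k + 1] = high[k] + (left + right) // 2
    return x


low, high = forward_53([1, 2, 3, 4, 5])
print(low, high, inverse_53(low, high))
```

Because each lifting step adds an integer-valued function of the other channel, it can be undone exactly by subtracting the same quantity, which is what makes the transform reversible despite the rounding. A linear ramp such as the one above produces all-zero highpass samples.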
The 9/7 transform is nonreversible and real-to-real. This trans-
form, proposed in [22], is also employed in the FBI fingerprint
compression standard [40] (although the normalizations differ).
The 9/7 transform has an underlying 1-D UMDFB with the pa-