H.264 and MPEG-4 Video Compression
Video Coding for Next-generation Multimedia
Iain E. G. Richardson
The Robert Gordon University, Aberdeen, UK
Copyright © 2003
John Wiley & Sons Ltd, The Atrium, Southern Gate, Chichester,
West Sussex PO19 8SQ, England
Telephone (+44) 1243 779777
Email (for orders and customer service enquiries): cs-books@wiley.co.uk
Visit our Home Page on www.wileyeurope.com or www.wiley.com
All Rights Reserved. No part of this publication may be reproduced, stored in a retrieval system
or transmitted in any form or by any means, electronic, mechanical, photocopying, recording,
scanning or otherwise, except under the terms of the Copyright, Designs and Patents Act 1988
or under the terms of a licence issued by the Copyright Licensing Agency Ltd, 90 Tottenham
Court Road, London W1T 4LP, UK, without the permission in writing of the Publisher.
Requests to the Publisher should be addressed to the Permissions Department, John Wiley &
Sons Ltd, The Atrium, Southern Gate, Chichester, West Sussex PO19 8SQ, England, or emailed
to permreq@wiley.co.uk, or faxed to (+44) 1243 770620.
This publication is designed to provide accurate and authoritative information in regard to
the subject matter covered. It is sold on the understanding that the Publisher is not engaged
in rendering professional services. If professional advice or other expert assistance is
required, the services of a competent professional should be sought.
Other Wiley Editorial Offices
John Wiley & Sons Inc., 111 River Street, Hoboken, NJ 07030, USA
Jossey-Bass, 989 Market Street, San Francisco, CA 94103-1741, USA
Wiley-VCH Verlag GmbH, Boschstr. 12, D-69469 Weinheim, Germany
John Wiley & Sons Australia Ltd, 33 Park Road, Milton, Queensland 4064, Australia
John Wiley & Sons (Asia) Pte Ltd, 2 Clementi Loop #02-01, Jin Xing Distripark, Singapore 129809
John Wiley & Sons Canada Ltd, 22 Worcester Road, Etobicoke, Ontario, Canada M9W 1L1
Wiley also publishes its books in a variety of electronic formats. Some content that appears
in print may not be available in electronic books.
British Library Cataloguing in Publication Data
A catalogue record for this book is available from the British Library
ISBN 0-470-84837-5
Typeset in 10/12pt Times roman by TechBooks, New Delhi, India
Printed and bound in Great Britain by Antony Rowe, Chippenham, Wiltshire
This book is printed on acid-free paper responsibly manufactured from sustainable forestry
in which at least two trees are planted for each one used for paper production.
To Phyllis
Contents
About the Author
Foreword
Preface
Glossary
1 Introduction
1.1 The Scene
1.2 Video Compression
1.3 MPEG-4 and H.264
1.4 This Book
1.5 References
2 Video Formats and Quality
2.1 Introduction
2.2 Natural Video Scenes
2.3 Capture
2.3.1 Spatial Sampling
2.3.2 Temporal Sampling
2.3.3 Frames and Fields
2.4 Colour Spaces
2.4.1 RGB
2.4.2 YCbCr
2.4.3 YCbCr Sampling Formats
2.5 Video Formats
2.6 Quality
2.6.1 Subjective Quality Measurement
2.6.2 Objective Quality Measurement
2.7 Conclusions
2.8 References
3 Video Coding Concepts
3.1 Introduction
3.2 Video CODEC
3.3 Temporal Model
3.3.1 Prediction from the Previous Video Frame
3.3.2 Changes due to Motion
3.3.3 Block-based Motion Estimation and Compensation
3.3.4 Motion Compensated Prediction of a Macroblock
3.3.5 Motion Compensation Block Size
3.3.6 Sub-pixel Motion Compensation
3.3.7 Region-based Motion Compensation
3.4 Image model
3.4.1 Predictive Image Coding
3.4.2 Transform Coding
3.4.3 Quantisation
3.4.4 Reordering and Zero Encoding
3.5 Entropy Coder
3.5.1 Predictive Coding
3.5.2 Variable-length Coding
3.5.3 Arithmetic Coding
3.6 The Hybrid DPCM/DCT Video CODEC Model
3.7 Conclusions
3.8 References
4 The MPEG-4 and H.264 Standards
4.1 Introduction
4.2 Developing the Standards
4.2.1 ISO MPEG
4.2.2 ITU-T VCEG
4.2.3 JVT
4.2.4 Development History
4.2.5 Deciding the Content of the Standards
4.3 Using the Standards
4.3.1 What the Standards Cover
4.3.2 Decoding the Standards
4.3.3 Conforming to the Standards
4.4 Overview of MPEG-4 Visual/Part 2
4.5 Overview of H.264 / MPEG-4 Part 10
4.6 Comparison of MPEG-4 Visual and H.264
4.7 Related Standards
4.7.1 JPEG and JPEG2000
4.7.2 MPEG-1 and MPEG-2
4.7.3 H.261 and H.263
4.7.4 Other Parts of MPEG-4
4.8 Conclusions
4.9 References
5 MPEG-4 Visual
5.1 Introduction
5.2 Overview of MPEG-4 Visual (Natural Video Coding)
5.2.1 Features
5.2.2 Tools, Objects, Profiles and Levels
5.2.3 Video Objects
5.3 Coding Rectangular Frames
5.3.1 Input and Output Video Format
5.3.2 The Simple Profile
5.3.3 The Advanced Simple Profile
5.3.4 The Advanced Real Time Simple Profile
5.4 Coding Arbitrary-shaped Regions
5.4.1 The Core Profile
5.4.2 The Main Profile
5.4.3 The Advanced Coding Efficiency Profile
5.4.4 The N-bit Profile
5.5 Scalable Video Coding
5.5.1 Spatial Scalability
5.5.2 Temporal Scalability
5.5.3 Fine Granular Scalability
5.5.4 The Simple Scalable Profile
5.5.5 The Core Scalable Profile
5.5.6 The Fine Granular Scalability Profile
5.6 Texture Coding
5.6.1 The Scalable Texture Profile
5.6.2 The Advanced Scalable Texture Profile
5.7 Coding Studio-quality Video
5.7.1 The Simple Studio Profile
5.7.2 The Core Studio Profile
5.8 Coding Synthetic Visual Scenes
5.8.1 Animated 2D and 3D Mesh Coding
5.8.2 Face and Body Animation
5.9 Conclusions
5.10 References
6 H.264/MPEG-4 Part 10
6.1 Introduction
6.1.1 Terminology
6.2 The H.264 CODEC
6.3 H.264 structure
6.3.1 Profiles and Levels
6.3.2 Video Format
6.3.3 Coded Data Format
6.3.4 Reference Pictures
6.3.5 Slices
6.3.6 Macroblocks
6.4 The Baseline Profile
6.4.1 Overview
6.4.2 Reference Picture Management
6.4.3 Slices
6.4.4 Macroblock Prediction
6.4.5 Inter Prediction
6.4.6 Intra Prediction
6.4.7 Deblocking Filter
6.4.8 Transform and Quantisation
6.4.9 4 × 4 Luma DC Coefficient Transform and Quantisation (16 × 16 Intra-mode Only)
6.4.10 2 × 2 Chroma DC Coefficient Transform and Quantisation
6.4.11 The Complete Transform, Quantisation, Rescaling and Inverse Transform Process
6.4.12 Reordering
6.4.13 Entropy Coding
6.5 The Main Profile
6.5.1 B Slices
6.5.2 Weighted Prediction
6.5.3 Interlaced Video
6.5.4 Context-based Adaptive Binary Arithmetic Coding (CABAC)
6.6 The Extended Profile
6.6.1 SP and SI slices
6.6.2 Data Partitioned Slices
6.7 Transport of H.264
6.8 Conclusions
6.9 References
7 Design and Performance
7.1 Introduction
7.2 Functional Design
7.2.1 Segmentation
7.2.2 Motion Estimation
7.2.3 DCT/IDCT
7.2.4 Wavelet Transform
7.2.5 Quantise/Rescale
7.2.6 Entropy Coding
7.3 Input and Output
7.3.1 Interfacing
7.3.2 Pre-processing
7.3.3 Post-processing
7.4 Performance
7.4.1 Criteria
7.4.2 Subjective Performance
7.4.3 Rate–distortion Performance