ISO_IEC_11172-2 MPEG-1 Standard.pdf

ISO CD 11172-2 CODING OF MOVING PICTURES AND ASSOCIATED AUDIO -- FOR DIGITAL STORAGE MEDIA AT UP TO ABOUT 1.5 Mbit/s -- Part 2 Video

CONTENTS

FOREWORD

INTRODUCTION - PART 2: VIDEO
    I.1 Purpose
        I.1.1 Coding Parameters
    I.2 Overview of the Algorithm
        I.2.1 Temporal Processing
        I.2.2 Motion Representation - Macroblocks
        I.2.3 Spatial Redundancy Reduction
    I.3 Encoding
    I.4 Decoding
    I.5 Structure of the Coded Bitstream
    I.6 Features Supported by the Algorithm
        I.6.1 Random Access
        I.6.2 Fast Search
        I.6.3 Reverse Playback
        I.6.4 Error Robustness
        I.6.5 Editing

1 GENERAL NORMATIVE ELEMENTS
    1.1 Scope
    1.2 References

2 TECHNICAL NORMATIVE ELEMENTS
    2.1 Definitions
    2.2 Symbols and Abbreviations
        2.2.1 Arithmetic Operators
        2.2.2 Logical Operators
        2.2.3 Relational Operators
        2.2.4 Bitwise Operators
        2.2.5 Assignment
        2.2.6 Mnemonics
        2.2.7 Constants
    2.3 Method of Describing Bit Stream Syntax
    2.4 Requirements
        2.4.1 Coding Structure and Parameters
        2.4.2 Specification of the Coded Video Bitstream Syntax
        2.4.3 Semantics for the Video Bitstream Syntax
        2.4.4 The Video Decoding Process

2-ANNEX A (normative) 8 by 8 Inverse Discrete Cosine Transform
2-ANNEX B (normative) Variable Length Code Tables
    2-B.1 Macroblock Addressing
    2-B.2 Macroblock Type
    2-B.3 Macroblock Pattern
    2-B.4 Motion Vectors
    2-B.5 DCT Coefficients
2-ANNEX C (normative) Video Buffering Verifier
    2-C.1 Video Buffering Verifier
2-ANNEX D (informative) Guide to Encoding Video
    2-D.1 OVERVIEW
    2-D.2 INTRODUCTION
        2-D.2.1 Video Concepts
        2-D.2.2 MPEG Video Compression Techniques
        2-D.2.3 Bitstream Hierarchy
        2-D.2.4 Decoder Overview
    2-D.3 PREPROCESSING
        2-D.3.1 Conversion from CCIR 601 Video to MPEG SIF
        2-D.3.2 Conversion from Film
    2-D.4 MODEL DECODER
        2-D.4.1 Need for a Decoder Model
        2-D.4.2 Decoder Model
        2-D.4.3 Buffer Size and Delay
    2-D.5 MPEG VIDEO BITSTREAM SYNTAX
        2-D.5.1 Sequence
        2-D.5.2 Group of Pictures
        2-D.5.3 Picture
        2-D.5.4 Slice
        2-D.5.5 Macroblock
        2-D.5.6 Block
    2-D.6 CODING MPEG VIDEO
        2-D.6.1 Rate Control
        2-D.6.2 Motion Estimation and Compensation
            2-D.6.2.1 Motion Compensation
            2-D.6.2.2 Motion Estimation
            2-D.6.2.3 Coding of Motion Vectors
        2-D.6.3 Coding I-Pictures
            2-D.6.3.1 Slices in I-Pictures
            2-D.6.3.2 Macroblocks in I-Pictures
            2-D.6.3.3 DCT Transform
            2-D.6.3.4 Quantization
            2-D.6.3.5 Coding of Quantized Coefficients
        2-D.6.4 Coding P-Pictures
            2-D.6.4.1 Slices in P-Pictures
            2-D.6.4.2 Macroblocks in P-Pictures
            2-D.6.4.3 Selection of Macroblock Type
            2-D.6.4.4 DCT Transform
            2-D.6.4.5 Quantization of P-Pictures
            2-D.6.4.6 Coding of Quantized Coefficients
        2-D.6.5 Coding B-Pictures
            2-D.6.5.1 Slices in B-Pictures
            2-D.6.5.2 Macroblocks in B-Pictures
            2-D.6.5.4 Selection of Macroblock Type
            2-D.6.5.5 DCT Transform
            2-D.6.5.6 Quantization
            2-D.6.5.7 Coding
        2-D.6.6 Coding D-Pictures
    2-D.7 DECODING MPEG VIDEO
        2-D.7.1 Decoding a Sequence
    2-D.8 POST PROCESSING
        2-D.8.1 Editing
        2-D.8.2 Resampling
            2-D.8.2.1 Conversion of MPEG SIF to CCIR 601 Format
            2-D.8.2.2 Temporal Resampling
2-ANNEX E (informative) Bibliography

FOREWORD

This standard is a committee draft that was submitted for approval to ISO-IEC/JTC1 SC29 on 23 November 1991. It was prepared by SC29/WG11, also known as MPEG (Moving Pictures Expert Group). MPEG was formed in 1988 to establish a standard for the coded representation of moving pictures and associated audio stored on digital storage media.

This standard is published in four Parts.

Part 1 - systems - specifies the system coding layer of the standard. It defines a multiplexed structure for combining audio and video data and means of representing the timing information needed to replay synchronized sequences in real-time.

Part 2 - video - specifies the coded representation of video data and the decoding process required to reconstruct pictures.

Part 3 - audio - specifies the coded representation of audio data.

Part 4 - conformance testing - is still in preparation. It will specify the procedures for determining the characteristics of coded bit streams and for testing compliance with the requirements stated in Parts 1, 2 and 3.

In Part 1 of this standard all annexes are informative and contain no normative requirements.

In Part 2 of this standard 2-Annex A, 2-Annex B and 2-Annex C contain normative requirements and are an integral part of this standard. 2-Annex D and 2-Annex E are informative and contain no normative requirements.

In Part 3 of this standard 3-Annex A and 3-Annex B contain normative requirements and are an integral part of this standard. All other annexes are informative and contain no normative requirements.

INTRODUCTION - PART 2: VIDEO

I.1 Purpose

This Part of this standard was developed in response to the growing need for a common format for representing compressed video on various digital storage media such as CDs, DATs, Winchester disks and optical drives. The standard specifies a coded representation that can be used for compressing video sequences to bitrates around 1.5 Mbit/s.
The use of this standard means that motion video can be manipulated as a form of computer data and can be transmitted and received over existing and future networks. The coded representation can be used with both 625-line and 525-line television and provides flexibility for use with workstation and personal computer displays.

The standard was developed to operate principally from storage media offering a continuous transfer rate of about 1.5 Mbit/s. Nevertheless it can be used more widely than this because the approach taken is generic.

I.1.1 Coding Parameters

The intention in developing this standard has been to define a source coding algorithm with a large degree of flexibility that can be used in many different applications. To achieve this goal, a number of the parameters defining the characteristics of coded bitstreams and decoders are contained in the bitstream itself. This allows, for example, the algorithm to be used for pictures with a variety of sizes and aspect ratios and on channels or devices operating at a wide range of bitrates.

Because of the large range of the characteristics of bitstreams that can be represented by this standard, a sub-set of these coding parameters, known as the "Constrained Parameters bitstream", has been defined. The aim in defining the constrained parameters is to offer guidance about a widely useful range of parameters. Conforming to this set of constraints is not a requirement of this standard. A flag in the bitstream indicates whether or not it is a Constrained Parameters bitstream.

Summary of the Constrained Parameters:

    Horizontal picture size           Less than or equal to 768 pels
    Vertical picture size             Less than or equal to 576 lines
    Picture area                      Less than or equal to 396 macroblocks
    Pixel rate                        Less than or equal to 396x25 macroblocks per second
    Picture rate                      Less than or equal to 30 Hz
    Motion vector range               Less than +/- 64 pels (using half-pel vectors) [vector scale code <= 4]
    Input buffer size (in VBV model)  Less than or equal to 327 680 bits
    Bitrate                           Less than or equal to 1 856 000 bits/second (constant bitrate)

I.2 Overview of the Algorithm

The coded representation defined in this standard achieves a high compression ratio while preserving good picture quality. The algorithm is not lossless, as the exact pel values are not preserved during coding. The choice of the techniques is based on the need to balance a high picture quality and compression ratio with the requirement to allow random access to the coded bitstream.

Obtaining good picture quality at the bitrates of interest demands very high compression, which is not achievable with intraframe coding alone. The need for random access, however, is best satisfied with pure intraframe coding. This requires a careful balance between intra- and interframe coding and between recursive and non-recursive temporal redundancy reduction.

A number of techniques are used to achieve high compression. The first, which is almost independent of this standard, is to select an appropriate spatial resolution for the signal. The algorithm then uses block-based motion compensation to reduce the temporal redundancy. Motion compensation is used both for causal prediction of the current picture from a previous picture, and for non-causal, interpolative prediction from past and future pictures. Motion vectors are defined for each 16-pel by 16-line region of the image. The difference signal, the prediction error, is further compressed using the discrete cosine transform (DCT) to remove spatial correlation before it is quantized in an irreversible process that discards the less important information.
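The limits of the Constrained Parameters bitstream summarized in I.1.1 can be checked mechanically. The helper below is an illustrative sketch, not something defined by the standard; the motion vector range limit is omitted for brevity.

```python
# Illustrative only -- checks a parameter set against the Constrained
# Parameters limits listed in the summary table (motion vector range omitted).

def is_constrained(width, height, picture_rate, bitrate, vbv_buffer_bits):
    mb_cols = (width + 15) // 16                  # macroblocks per row
    mb_rows = (height + 15) // 16                 # macroblock rows
    mb_per_picture = mb_cols * mb_rows
    return (width <= 768                          # pels
            and height <= 576                     # lines
            and mb_per_picture <= 396             # picture area
            and mb_per_picture * picture_rate <= 396 * 25   # pixel rate
            and picture_rate <= 30                # Hz
            and vbv_buffer_bits <= 327_680        # VBV input buffer
            and bitrate <= 1_856_000)             # bits/second, constant

# SIF 352x288 at 25 Hz gives exactly 396 macroblocks per picture and
# 396x25 macroblocks per second, so it sits exactly on the limits:
print(is_constrained(352, 288, 25, 1_150_000, 327_680))   # True
```

Note that 352x288 at 25 Hz and 352x240 at 30 Hz (the common SIF formats) both fall just inside these limits, which is what the constrained set was chosen to accommodate.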
Finally, the motion vectors are combined with the residual DCT information, and transmitted using variable length codes.

I.2.1 Temporal Processing

Because of the conflicting requirements of random access and highly efficient compression, three main picture types are defined. Intra pictures (I-Pictures) are coded without reference to other pictures. They provide access points to the coded sequence where decoding can begin, but are coded with only moderate compression. Predicted pictures (P-Pictures) are coded more efficiently using motion compensated prediction from a past intra or predicted picture and are generally used as a reference for further prediction. Bidirectionally-predicted pictures (B-Pictures) provide the highest degree of compression but require both past and future reference pictures for motion compensation. Bidirectionally-predicted pictures are never used as references for prediction.

The organisation of the three picture types in a sequence is very flexible. The choice is left to the encoder and will depend on the requirements of the application. Figure I.1 illustrates the relationship between the three different picture types.
Figure I.1 Example of temporal picture structure, M=3 (pictures 1 to 9 in display order: I B B B P B B B P, with forward prediction between references and bidirectional prediction for B-Pictures)

The fourth picture type defined in the standard, the D-picture, is provided to allow a simple, but limited quality, fast-forward mode.

I.2.2 Motion Representation - Macroblocks

The choice of 16 by 16 macroblocks for the motion-compensation unit is a result of the trade-off between the coding gain provided by using motion information and the overhead needed to store it. Each macroblock can be one of a number of different types. For example, intra-coded, forwards-prediction-coded, backwards-prediction-coded, and bidirectionally-prediction-coded macroblocks are permitted in bidirectionally-predicted pictures. Depending on the type of the macroblock, motion vector information and other side information is stored with the compressed prediction error signal in each macroblock. The motion vectors are encoded differentially with respect to the last transmitted motion vector, using variable length codes. The maximum length of the vectors that may be represented can be programmed, on a picture-by-picture basis, so that the most demanding applications can be met without compromising the performance of the system in more normal situations.

It is the responsibility of the encoder to calculate appropriate motion vectors. The standard does not specify how this should be done.

I.2.3 Spatial Redundancy Reduction

Both original pictures and prediction error signals have high spatial redundancy. This standard uses a block-based DCT method with visually weighted quantization and run-length coding. After motion compensated prediction or interpolation, the residual picture is split into 8 by 8 blocks. These are transformed into the DCT domain where they are weighted before being quantized. After quantization many of the coefficients are zero in value and so two-dimensional run-length and variable length coding is used to encode the remaining coefficients efficiently.

I.3 Encoding

The standard does not specify an encoding process.
It specifies the syntax and semantics of the bitstream and the signal processing in the decoder. As a result, many options are left open to encoders to trade off cost and speed against picture quality and coding efficiency. This clause is a brief description of the functions that need to be performed by an encoder. Figure I.2 shows the main functional blocks.
Figure I.2 Simplified Video Encoder Block Diagram (frame re-order, motion estimator, DCT, quantizer, variable length coder, inverse quantizer and inverse DCT, framestores and predictor, multiplexer, rate-control regulator, output buffer)

The input video signal must be digitized and represented as a luminance and two colour difference signals (Y, Cr, Cb). This may be followed by preprocessing and format conversion to select an appropriate window, resolution and input format. The standard requires that the colour difference signals (Cr and Cb) are subsampled with respect to the luminance by 2:1 in both vertical and horizontal directions and are reformatted, if necessary, as a non-interlaced signal.

The encoder must choose which picture type to use for each picture. Having defined the picture types, the encoder estimates motion vectors for each 16 by 16 macroblock in the picture. Typically in P-Pictures one vector is needed for each macroblock, and in B-Pictures one or two vectors are needed.

If B-Pictures are used, some reordering of the picture sequence is necessary before encoding. Because B-Pictures are coded using bidirectional motion compensated prediction, they can only be decoded after the subsequent reference picture (an I or P-Picture) has been decoded. Therefore the pictures are reordered by the encoder so that they arrive at the decoder in the order required for decoding. The correct display order is recovered by the decoder.

The basic unit of coding within a picture is the macroblock. Within each picture, macroblocks are encoded in sequence, left to right, top to bottom. Each macroblock consists of six component 8 by 8 blocks: four blocks of luminance, one block of Cb chrominance, and one block of Cr chrominance. See Figure I.3. Note that the picture area covered by the four blocks of luminance is the same as the area covered by each of the chrominance blocks. This is due to subsampling of the chrominance information to match the sensitivity of the human visual system.
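The picture reordering described above, where each reference picture is transmitted before the B-Pictures that depend on it, can be sketched as follows. This is an illustration of the idea, not a normative procedure.

```python
# Illustrative only -- reorder a display-order pattern of picture types
# into the order in which an encoder transmits them: each reference
# picture (I or P) is sent before the B-Pictures that use it as a
# future reference.

def coding_order(display):
    out, pending_b = [], []
    for pic in display:
        if pic in ('I', 'P'):        # reference picture
            out.append(pic)          # send the reference first...
            out.extend(pending_b)    # ...then the B-Pictures awaiting it
            pending_b = []
        else:                        # 'B'
            pending_b.append(pic)
    return out + pending_b           # any trailing B-Pictures

print(''.join(coding_order(list("IBBPBBP"))))   # IPBBPBB
```

The decoder inverts this: it holds each newly decoded reference picture back until the B-Pictures that precede it in display order have been decoded and displayed.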
Figure I.3 Macroblock Structure (blocks 0-3: luminance; block 4: Cb; block 5: Cr)

Firstly, for a given macroblock, the coding mode is chosen. It depends on the picture type, the effectiveness of motion compensated prediction in that local region, and the nature of the signal within the block.

Secondly, depending on the coding mode, a motion compensated prediction of the contents of the block based on past and/or future reference pictures is formed. This prediction is subtracted from the actual data in the current macroblock to form an error signal.

Thirdly, this error signal is separated into 8 by 8 blocks (4 luminance and 2 chrominance blocks in each macroblock) and a discrete cosine transform is performed on each block. The resulting 8 by 8 block of DCT coefficients is quantized and the two-dimensional block is scanned in a zig-zag order to convert it into a one-dimensional string of quantized DCT coefficients.

Fourthly, the side information for the macroblock (type, vectors etc.) and the quantized coefficient data are encoded. For maximum efficiency, a number of variable length code tables are defined for the different data elements. Run-length coding is used for the quantized coefficient data.

A consequence of using different picture types and variable length coding is that the overall data rate is variable. In applications that involve a fixed-rate channel, a FIFO buffer may be used to match the encoder output to the channel. The status of this buffer may be monitored to control the number of bits generated by the encoder. Controlling the quantization process is the most direct way of controlling the bitrate. The standard specifies an abstract model of the buffering system (the Video Buffering Verifier) in order to constrain the maximum variability in the number of bits that are used for a given picture. This ensures that a bitstream can be decoded with a buffer of known size.

At this stage, the coded representation of the picture has been generated.
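The zig-zag scan and run-length pairing from the third and fourth steps above can be sketched as follows. The (run, level) pairs shown are a simplification for illustration; the actual codes assigned to such pairs are the variable length code tables of 2-Annex B.

```python
# Illustrative only -- zig-zag scan of an 8 by 8 block of quantized DCT
# coefficients, followed by grouping into (run-of-zeros, level) pairs.

def zigzag_order(n=8):
    # Walk diagonals of constant i+j, alternating direction each diagonal.
    order = []
    for s in range(2 * n - 1):
        diag = [(i, s - i) for i in range(n) if 0 <= s - i < n]
        order.extend(diag if s % 2 else diag[::-1])
    return order

def run_level_pairs(block):
    pairs, run = [], 0
    for i, j in zigzag_order():
        v = block[i][j]
        if v == 0:
            run += 1                 # count zeros preceding the next level
        else:
            pairs.append((run, v))
            run = 0
    return pairs                     # trailing zeros are ended by an EOB code

# A sparse quantized block: a large DC term and a few low-frequency levels.
block = [[0] * 8 for _ in range(8)]
block[0][0], block[0][1], block[1][0], block[2][2] = 12, 3, -2, 1
print(run_level_pairs(block))        # [(0, 12), (0, 3), (0, -2), (9, 1)]
```

Because quantization leaves most high-frequency coefficients at zero, the scan concentrates the non-zero levels at the start of the string and the block is represented by only a few pairs.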
The final step in the encoder is to regenerate I-Pictures and P-Pictures by decoding the data so that they can be used as reference pictures for subsequent encoding. The quantized coefficients are dequantized and an inverse 8 by 8 DCT is performed on each block. The error signal produced is then added back to the prediction signal and clipped to the required range to give a decoded reference picture.

I.4 Decoding

Decoding is the inverse of the encoding operation. It is considerably simpler than encoding as there is no need to perform motion estimation and there are many fewer options. The decoding process is defined by this standard. The description that follows is a very brief overview of one possible way of decoding a bitstream. Other decoders with different architectures are possible. Figure I.4 shows the main functional blocks.
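The reconstruction step shared by the encoder's local decoder and the decoder proper (add the inverse-transformed error to the prediction, then clip) can be sketched as below. The 0 to 255 sample range is an assumption made for illustration.

```python
# Illustrative only -- reconstruct a decoded block by adding the error
# signal to the motion compensated prediction and clipping each sample
# to an assumed 8-bit range.

def reconstruct(prediction, error):
    return [[min(255, max(0, p + e))
             for p, e in zip(prow, erow)]
            for prow, erow in zip(prediction, error)]

pred = [[128, 200], [10, 60]]
err  = [[100, -300], [-20, 5]]
print(reconstruct(pred, err))   # [[228, 0], [0, 65]]
```

Performing the identical clipping in both encoder and decoder keeps the two reference framestores in step, so prediction drift does not accumulate.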
Figure I.4 Basic Video Decoder Block Diagram (input buffer, demultiplexer, inverse quantizer, inverse DCT, framestores and predictors; side information: quantizer stepsize, motion vectors)

For fixed-rate applications the channel fills a FIFO buffer at a constant rate with the coded bitstream. The decoder reads this buffer and decodes the data elements in the bitstream according to the defined syntax.

As the decoder reads the stream, it identifies the start of a coded picture and then the type of the picture. It decodes each macroblock in the picture in turn. The macroblock type and the motion vectors, if present, are used to construct a prediction of the current macroblock based on past and future reference pictures that have been stored in the decoder. The coefficient data is decoded and inverse quantized. Each 8 by 8 block of coefficient data is transformed by an inverse DCT (specified in 2-Annex A), and the result is added to the prediction signal and clipped to the defined range.

After all the macroblocks in the picture have been processed, the picture has been reconstructed. If it is an I-picture or a P-picture it is a reference picture for subsequent pictures and is stored, replacing the oldest stored reference picture. Before the pictures are displayed they may need to be reordered from the coding order to their natural display order. After reordering the pictures are available, in digital form, for post-processing and display in any manner that the application chooses.

I.5 Structure of the Coded Bitstream

This standard specifies a syntax for a compressed bitstream. This syntax contains six layers, each of which either supports a signal processing or a system function:

    Layers of the syntax       Function
    Sequence layer             Random access unit: context
    Group of pictures layer    Random access unit: video
    Picture layer              Primary coding unit
    Slice layer                Resynchronization unit
    Macroblock layer           Motion compensation unit
    Block layer                DCT unit

I.6 Features Supported by the Algorithm