Introduction
0.1 Extension of ISO/IEC 11172-3 Audio Coding to Lower Sampling Frequencies
0.2 Low bitrate coding of multichannel audio
0.2.1 Universal multichannel audio system
0.2.2 Representation of multichannel audio
0.2.2.1 The 3/2-stereo plus LFE format
0.2.2.2 Compatibility
0.2.2.3 Multilingual capability
0.2.3 Basic Parameters of the Multichannel Audio Coding System
0.2.3.1 Compatibility with ISO/IEC 11172-3
0.2.3.2 Audio Input/Output Format
0.2.3.3 Composite Coding Modes
0.2.3.4 Encoder and Decoder Parameters
Section 1: General
1.1 Scope
1.2 Normative References
Section 2: Technical elements
2.1 Definitions
2.2 Symbols and abbreviations
2.2.1 Arithmetic operators
2.2.2 Logical operators
2.2.3 Relational operators
2.2.4 Bitwise operators
2.2.5 Assignment
2.2.6 Mnemonics
2.2.7 Constants
2.3 Method of describing bit stream syntax
2.4 Requirements for Extension of ISO/IEC 11172-3 Audio Coding to Lower Sampling Frequencies
2.4.1 Specification of the Coded Audio Bit stream Syntax
2.4.1.1 Layer I, II
2.4.1.2 Layer III
2.4.2 Semantics for the Audio Bit stream Syntax
2.4.2.1 Audio Sequence General
2.4.2.2 Audio Frame
2.4.2.3 Header
2.4.2.4 Error Check
2.4.2.5 Audio Data Layer I
2.4.2.6 Audio Data Layer II
2.4.2.7 Audio Data Layer III
2.4.2.8 Ancillary Data
2.4.3 The Audio Decoding Process
2.4.3.1 Audio Decoding Layer I, II
2.4.3.2 Audio Decoding Layer III
2.5 Requirements for low bitrate coding of multichannel audio
2.5.1 Specification of the Coded Audio Bit stream Syntax
2.5.1.1 Audio Sequence
2.5.1.2 Audio Frame Layer I
2.5.1.3 Audio Frame Layer II, III
2.5.1.4 MC_extension
2.5.1.5 MPEG1 Header
2.5.1.6 MPEG1 Error Check
2.5.1.7 MPEG1 Audio Data
2.5.1.8 MC Header
2.5.1.9 MC Error Check
2.5.1.10 MC Composite Status Information Layer I, II
2.5.1.11 MC Composite Status Information Layer III
2.5.1.12 MC Audio Data, Layer I and Layer II
2.5.1.13 MC Audio Data, Layer III
2.5.1.14 ML Audio Data, Layer I and Layer II
2.5.1.15 ML Header, Layer III
2.5.1.16 ML Main Data, Layer III
2.5.1.17 MPEG1 Ancillary Data
2.5.1.18 Ext_frame
2.5.1.19 Ext_header
2.5.1.20 Ext_ancillary_data
2.5.2 Semantics for the audio bit stream syntax
2.5.2.1 Audio Sequence General
2.5.2.2 Audio Frame Layer I
2.5.2.3 Audio Frame Layer II, III
2.5.2.4 MC_extension
2.5.2.5 MPEG1 Header
2.5.2.6 MPEG1 Error Check
2.5.2.7 MPEG1 Audio Data
2.5.2.8 MC Header
2.5.2.9 MC Error Check
2.5.2.10 MC Composite Status Info Layer I, II
2.5.2.11 MC Composite Status Information Layer III
2.5.2.12 MC Audio Data Layer I, II
2.5.2.13 MC Audio Data Layer III
2.5.2.14 ML Audio Data Layer I and Layer II
2.5.2.15 ML Audio Data Layer III
2.5.2.16 MPEG-1 Ancillary Data
2.5.2.17 Extension frame
2.5.2.18 Extension header
2.5.3 The Audio Decoding Process
2.5.3.1 General
2.5.3.2 Composite Coding Modes
2.5.3.2.1 Transmission Channel Switching
2.5.3.2.2 Dynamic Crosstalk
2.5.3.2.3 MC_Prediction
2.5.3.3 Requantisation Procedure
2.5.3.4 Decoding of Scalefactors
2.5.3.5 Decoding of Low Frequency Enhancement Channel
2.5.3.6 De-normalisation procedure
2.5.3.7 Synthesis Subband Filter
2.5.3.8 Layer III Decoding
2.5.3.8.1 Layer III Segment Lists
2.5.3.8.2 Decoding Process for Layer III
2.5.3.8.3 Decoding of LFE for Layer III
2.5.3.8.4 Decoding of ML Data for Layer III
Annex A
Diagrams
Annex B
Tables
Table B.1. Possible quantisation per subband, Layer II
Table B.2. Layer III scalefactor bands
Table B.3. Low-pass filter description
Annex C
The encoding process
C.1 Extension to lower sampling frequencies
C.1.1 Lower sampling frequencies, Layer I.
C.1.2 Lower sampling frequencies, Layer II.
C.1.3 Lower sampling frequencies, Layer III
C.2 Multichannel extension
C.2.1 Multichannel extension Layer I, II
C.2.1.1 The filterbank
C.2.1.2 Calculation of scalefactors
C.2.1.3 Psychoacoustic models
C.2.1.4 Predistortion
C.2.1.5 Matrixing
C.2.1.6 Dynamic transmission channel switching
C.2.1.7 Dynamic Crosstalk
C.2.1.8 Adaptive Multichannel Prediction
C.2.1.9 Phantom coding of centre channel
C.2.1.10 Bit Allocation
C.2.1.11 Multilingual
C.2.1.12 Formatting
C.2.2 Multichannel extension Layer III
C.2.2.1 Psychoacoustic models
C.2.2.2 The filterbank
C.2.2.3 Segment list processing
C.2.2.4 Dynamic transmission channel switching
C.2.2.5 Matrixing
C.2.2.6 Adaptive multichannel prediction
C.2.2.7 Quantization and coding
C.2.2.8 Multilingual extensions
Annex D
Psychoacoustic models
D.1 Psychoacoustic Model 1 for Lower Sampling Frequencies
Step 1 Calculation of spectrum
Step 2 Determination of the sound pressure level
Step 3 Considering the threshold in quiet
Step 4 Finding of tonal and non-tonal components
Step 5 Decimation of tonal and non-tonal masking components
Step 6 Calculation of individual masking thresholds
Step 7 Calculation of the global masking threshold LTg
Step 8 Determination of the minimum masking threshold
Step 9 Calculation of the signal-to-mask-ratio
D.2 Psychoacoustic Model 2 for Lower Sampling Frequencies
Annex E
List of patent holders
INTERNATIONAL ORGANISATION FOR STANDARDISATION
ORGANISATION INTERNATIONALE DE NORMALISATION
ISO/IEC JTC1/SC29/WG11
CODING OF MOVING PICTURES AND ASSOCIATED AUDIO

ISO/IEC JTC1/SC29/WG11 NO803
11 November 1994

Information Technology - Generic Coding of Moving Pictures and Associated Audio: Audio
ISO/IEC 13818-3 International Standard
ISO/IEC 13818-3:1994(E) ©ISO/IEC

Contents
Foreword
Introduction
0.1 Extension of ISO/IEC 11172-3 Audio Coding to Lower Sampling Frequencies
0.2 Low bitrate coding of multichannel audio
Section 1: General
1.1 Scope
1.2 Normative References
Section 2: Technical elements
2.1 Definitions
2.2 Symbols and abbreviations
2.3 Method of describing bit stream syntax
2.4 Requirements for Extension of ISO/IEC 11172-3 Audio Coding to Lower Sampling Frequencies
2.5 Requirements for low bitrate coding of multichannel audio
Annexes
A. Diagrams
B. Tables
C. The encoding process
D. Psychoacoustic models
E. List of patent holders

© ISO/IEC 1994
All rights reserved. No part of this publication may be reproduced or utilized in any form or by any means, electronic or mechanical, including photocopying and microfilm, without permission in writing from the publisher.
Printed in Switzerland.
ISO/IEC Copyright Office • Case Postale 56 • CH-1211 Genève 20 • Switzerland
Foreword
Introduction

This Recommendation | International Standard was prepared by SC29/WG11, also known as MPEG (Moving Picture Experts Group). MPEG was formed in 1988 to establish a standard for the coded representation of moving pictures and associated audio stored on digital storage media.

This Recommendation | International Standard is published in three parts:
- Part 1 (systems) specifies the system coding layer of the standard. It defines a multiplexed structure for combining audio and video data, and a means of representing the timing information needed to replay synchronised sequences in real time.
- Part 2 (video) specifies the coded representation of video data and the decoding process required to reconstruct pictures.
- Part 3 (audio) specifies the coded representation of audio data and the decoding process required to decode audio signals.

0.1 Extension of ISO/IEC 11172-3 Audio Coding to Lower Sampling Frequencies

To achieve better audio quality at very low bit rates (below 64 kbit/s per audio channel), particularly in comparison with the performance of CCITT Recommendation G.722, three additional sampling frequencies are provided for ISO/IEC 11172-3 Layers I, II and III: 16 kHz, 22,05 kHz and 24 kHz. These allow audio bandwidths of approximately 7,5 kHz, 10,3 kHz and 11,25 kHz, respectively.

The syntax, semantics and coding techniques of ISO/IEC 11172-3 are retained, except for new definitions of the sampling frequency field, the bitrate index field and the bit allocation tables. These new definitions apply when the ID bit in the ISO/IEC 11172-3 header equals zero. To obtain the best audio performance, the parameters of the psychoacoustic model used in the encoder must be adapted accordingly.

With these sampling frequencies, the duration of one audio frame is:

Sampling frequency   Layer I       Layer II      Layer III
16 kHz               24 ms         72 ms         36 ms
22,05 kHz            17,41... ms   52,24... ms   26,12... ms
24 kHz               16 ms         48 ms         24 ms

0.2 Low bitrate coding of multichannel audio

0.2.1 Universal multichannel audio system

A standard for low bit rate coding of mono or stereo audio signals was established by MPEG-1 Audio in ISO/IEC 11172-3. It is applicable to carrying high quality digital audio signals, with or without associated picture information, on storage media or transmission channels with limited capacity. The ISO/IEC 11172-3 audio coding standard can be used with both MPEG-1 and MPEG-2 Video as long as only two-channel stereo is required. MPEG-2 Audio (ISO/IEC 13818-3) provides the extension to 3/2 multichannel audio with an optional low frequency enhancement (LFE) channel.

Multichannel audio systems provide enhanced stereophonic performance compared to conventional two-channel audio systems. It is recognised that improved presentation performance is desirable not only for applications with accompanying pictures but also for audio-only applications. A universal and compatible multichannel audio system appears very attractive to manufacturers, producers and consumers. Such a system is applicable to satellite or terrestrial television broadcasting, to digital audio broadcasting (terrestrial and satellite) and to other non-broadcasting media, e.g.:

CATV   Cable TV Distribution
CDAD   Cable Digital Audio Distribution
ENG    Electronic News Gathering (including Satellite News Gathering)
IPC    Interpersonal Communications (video conference, videophone, etc.)
ISM    Interactive Storage Media (optical disks, etc.)
NDB    Network Database Services (via ATM, etc.)
DSM    Digital Storage Media (digital VTR, etc.)
EC     Electronic Cinema
HTT    Home Television Theatre
ISDN   Integrated Services Digital Network

This document describes an audio subband coding system called ISO/MPEG-Audio Multichannel, which can be used to transfer high quality digital multichannel and/or multilingual audio information on storage media or transmission channels with limited capacity. One of its basic features is backwards compatibility with ISO/IEC 11172-3 coded mono, stereo or dual-channel audio programmes. It is designed for use in the various applications considered by the ISO/MPEG audio group and by the specialist groups TG 10/1, 10/2 and 10/3 of the ITU-R (previously CCIR).

0.2.2 Representation of multichannel audio

0.2.2.1 The 3/2-stereo plus LFE format

For stereophonic presentation, specialist groups of ITU-R, SMPTE and EBU recommend an additional centre loudspeaker channel C and two surround loudspeaker channels LS and RS, augmenting the front left and right loudspeaker channels L and R. This reference audio format is referred to as "3/2-stereo" (3 front / 2 surround loudspeaker channels) and requires the transmission of five appropriately formatted audio signals.

For applications with accompanying pictures (e.g. HDTV), the three front loudspeaker channels ensure sufficient directional stability and clarity of the picture-related frontal images, in line with common cinema practice. The dominant benefit is the "stable centre", which is guaranteed for any listener position and is important for most dialogue.

For audio-only applications, the 3/2-stereo format has also been found to be an improvement over two-channel stereophony. Adding one pair of surround loudspeaker channels improves the realism of the auditory ambience.
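The frame durations tabulated in 0.1 are simply the number of PCM samples per frame divided by the sampling frequency. The sample counts below are inferred from that table (for example 24 ms × 16 kHz = 384 samples for Layer I) rather than quoted from this section; note that the 576-sample Layer III frame applies to these lower sampling frequencies, whereas ISO/IEC 11172-3 Layer III frames carry 1152 samples. The helper name is hypothetical.

```python
# Minimal sketch: frame duration = PCM samples per frame / sampling frequency.
# Samples per frame are inferred from the 0.1 table (e.g. 24 ms x 16 kHz = 384).
# Layer III uses 576 samples per frame at these lower sampling frequencies.
SAMPLES_PER_FRAME = {"I": 384, "II": 1152, "III": 576}


def frame_duration_ms(layer: str, fs_hz: float) -> float:
    """Duration of one audio frame in milliseconds (hypothetical helper)."""
    return 1000.0 * SAMPLES_PER_FRAME[layer] / fs_hz


if __name__ == "__main__":
    for fs in (16_000, 22_050, 24_000):
        row = ", ".join(
            f"Layer {layer}: {frame_duration_ms(layer, fs):.2f} ms"
            for layer in ("I", "II", "III")
        )
        print(f"{fs / 1000:g} kHz -> {row}")
```

The printed values for 22,05 kHz (17.41, 52.24 and 26.12 ms) are the truncated figures shown as "17,41..." and so on in the table.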
A low frequency enhancement channel (called the LFE channel in this document) can optionally be added to any of these configurations. Its purpose is to enable listeners to extend the low frequency content of the reproduced programme in terms of both frequency and level. In this respect it corresponds to the LFE channel proposed by the film industry for its digital sound systems. The LFE channel should not be used for the entire low frequency content of the multichannel sound presentation: it is optional at the receiver and should therefore carry only low frequency sound effects, which may be of high level. The LFE channel is not included in any dematrixing operation in the decoder. The sampling frequency of the LFE channel is the sampling frequency of the main channels divided by 96, which provides 12 LFE samples per audio frame. The LFE channel can handle signals in the range from 15 Hz to 120 Hz.

0.2.2.2 Compatibility

Downwards compatibility. A hierarchy of audio formats providing fewer loudspeaker channels and reduced presentation performance (down to 2/0-stereo or even mono), together with a corresponding set of downward mixing equations, is recommended in ITU-R Recommendation 775: "Multichannel stereophonic audio system with and without accompanying picture", November 1992. Alternative lower level audio formats, which may be used where economic or channel capacity constraints apply, are 3/1, 3/0, 2/2, 2/1, 2/0 and 1/0. The corresponding loudspeaker arrangements are 3/2, 3/1, 3/0, 2/2, 2/1, 2/0 and 1/0.

Backwards compatibility. For several applications, the intention is to extend the existing 2/0-stereo sound system by transmitting additional audio channels (centre, surround) without resorting to simulcast operation. Providing backwards compatibility with existing receivers implies the use of compatibility matrices: the previous-generation decoder must reproduce the two conventional basic stereo signals Lo/Ro, while the multichannel decoder produces the complete 3/2-stereo presentation L'/C'/R'/LS'/RS' from the basic stereo signals and the extension signals.

It is recognised that backwards compatibility may not be required for all applications of MPEG-2 Audio. Therefore, non-backwards compatible (NBC) audio coding systems, free of the constraints of backwards compatibility, are being evaluated for optional use with the standard.

0.2.2.3 Multilingual capability

Particularly for HDTV applications, both multichannel stereo performance and bilingual programmes or multilingual commentaries are required. This standard provides alternative audio channel configurations within the five-channel sound system, for example a bilingual 2/0-stereo programme, or a single 2/0 or 3/0 stereo programme plus accompanying services (e.g. "clean dialogue" for the hard-of-hearing, commentary for the visually impaired, multilingual commentary). An important configuration is the reproduction of commentary dialogue (e.g. via the centre loudspeaker) together with the common music/effects stereo downmix, as in documentary films or sports reports.

0.2.3 Basic Parameters of the Multichannel Audio Coding System

The transmission of the five audio signals of a 3/2 sound system requires five transmission channels (although, in the context of bitrate-reduced signals, these are not necessarily independent). So that two of the transmitted signals can provide a stereo service on their own, the source sound signals are generally combined in a linear matrix prior to encoding. These combined signals (and their transmission channels) are denoted T0, T1, T2, T3 and T4.

0.2.3.1 Compatibility with ISO/IEC 11172-3

Backwards and forwards compatibility with an ISO/IEC 11172-3 decoder is provided.
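The LFE figures given in 0.2.2.1 (a sampling frequency of one ninety-sixth of the main rate, 12 samples per frame, a 15 Hz to 120 Hz range) can be checked with a few lines of arithmetic. The 1152-sample frame length is an assumption consistent with those figures (1152 / 96 = 12), and the Nyquist comparison against 120 Hz is an illustrative check added here, not a requirement quoted from the standard. All names are hypothetical.

```python
# Sketch of the LFE arithmetic in 0.2.2.1 (names are illustrative only).
LFE_DECIMATION = 96    # LFE sampling frequency = main sampling frequency / 96
FRAME_SAMPLES = 1152   # assumption: main-channel samples per frame (1152 / 96 = 12)
LFE_UPPER_HZ = 120     # upper edge of the stated 15-120 Hz LFE range


def lfe_sampling_rate_hz(fs_hz: float) -> float:
    """Sampling frequency of the LFE channel for a given main sampling frequency."""
    return fs_hz / LFE_DECIMATION


def lfe_samples_per_frame(frame_samples: int = FRAME_SAMPLES) -> int:
    """Number of LFE samples carried per audio frame."""
    return frame_samples // LFE_DECIMATION


if __name__ == "__main__":
    for fs in (32_000, 44_100, 48_000):
        rate = lfe_sampling_rate_hz(fs)
        nyquist = rate / 2
        # The decimated rate should still place the 120 Hz band edge below Nyquist.
        print(f"fs={fs} Hz: LFE rate {rate:.3f} Hz, Nyquist {nyquist:.1f} Hz, "
              f"{lfe_samples_per_frame()} samples/frame, ok={nyquist > LFE_UPPER_HZ}")
```

Even at 32 kHz the LFE rate is about 333 Hz, so its Nyquist frequency (about 167 Hz) stays above the 120 Hz upper limit.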
For a multichannel audio bit stream, backwards compatibility means that an ISO/IEC 11172-3 audio decoder properly decodes the basic stereo information. The basic stereo information consists either of a left and a right channel that together constitute an appropriate downmix of the audio information in all channels or, optionally, of only the left and right channels of the multichannel audio configuration. Appropriate downmix equations are given by the equation pairs (1) and (2), (3) and (4), and (5) and (6):

$$L_o = L + x \cdot C + y \cdot \mathrm{LS} \tag{1}$$
$$R_o = R + x \cdot C + z \cdot \mathrm{RS} \tag{2}$$

or

$$L_o = L \tag{3}$$
$$R_o = R \tag{4}$$

or

$$L_o = L + x \cdot C - y \cdot \mathrm{jS} \tag{5}$$
$$R_o = R + x \cdot C + y \cdot \mathrm{jS} \tag{6}$$

where jS is derived from LS and RS by calculating the mono component, limiting the bandwidth to the range 100-7000 Hz, applying half Dolby®¹ B-type encoding, and shifting the phase by 90 degrees (Prologic®¹ surround matrixing). Compatibility with existing surround sound decoders when equations (5) and (6) are used had not been verified at the time of printing of this Recommendation | International Standard.

Forwards compatibility means that an MPEG-2 multichannel audio decoder is able to properly decode an ISO/IEC 11172-3 audio bit stream.

¹ Dolby and Prologic are registered trademarks of Dolby Laboratories Licensing Corp.
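Equation pairs (1)/(2) and (3)/(4) are plain per-sample linear combinations, so a compatibility downmix can be sketched directly. This section does not give values for x, y and z, so the coefficients in the usage lines are placeholders, and no scaling or clipping protection is applied; a real encoder would need to guard against overload. Equations (5) and (6) are left out because deriving jS involves band limiting, B-type encoding and a 90 degree phase shift that go beyond a short sketch. Function names are hypothetical.

```python
from typing import List, Sequence, Tuple

Signal = Sequence[float]


def _check_lengths(*signals: Signal) -> None:
    """zip() would silently truncate, so insist on equal-length channels."""
    if len({len(s) for s in signals}) != 1:
        raise ValueError("all channel signals must have the same length")


def downmix_eq_1_2(L: Signal, R: Signal, C: Signal, LS: Signal, RS: Signal,
                   x: float, y: float, z: float) -> Tuple[List[float], List[float]]:
    """Lo/Ro per equations (1) and (2): front + weighted centre + weighted surround."""
    _check_lengths(L, R, C, LS, RS)
    Lo = [l + x * c + y * ls for l, c, ls in zip(L, C, LS)]
    Ro = [r + x * c + z * rs for r, c, rs in zip(R, C, RS)]
    return Lo, Ro


def downmix_eq_3_4(L: Signal, R: Signal) -> Tuple[List[float], List[float]]:
    """Lo/Ro per equations (3) and (4): the basic stereo is just L and R."""
    _check_lengths(L, R)
    return list(L), list(R)


if __name__ == "__main__":
    # Placeholder coefficients; this section does not specify x, y or z.
    Lo, Ro = downmix_eq_1_2([1.0], [0.0], [0.5], [0.2], [0.4], x=0.5, y=0.5, z=0.5)
    print([round(v, 6) for v in Lo], [round(v, 6) for v in Ro])  # [1.35] [0.45]
```

Keeping the two equation pairs as separate functions mirrors the "or" between them: an encoder selects one compatibility matrix, it does not combine them.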
The following combinations are possible:

Basic Lo, Ro stereo   Multichannel extension
Layer II              Layer II mc
Layer III             Layer III mc
Layer I               Layer II mc

This document describes the combinations of the basic Lo, Ro stereo of Layers I, II and III with the multichannel extension of Layer II mc and Layer III mc.

The ISO/MPEG-Audio Multichannel system provides full compatibility with ISO/IEC 11172-3. This compatibility is achieved by coding the basic stereo information in conformance with ISO/IEC 11172-3, and by carrying the multichannel extension in the ancillary data field of the ISO/IEC 11172-3 audio frame and in an optional extension bit stream. The complete ISO/IEC 11172-3 frame contains four different types of information:
- Header information within the first 32 bits of the ISO/IEC 11172-3 audio frame.
- Cyclic Redundancy Check (CRC), consisting of 16 bits, immediately after the header information (optional).
- Audio data; for Layer II this consists of bit allocation (BAL), scalefactor select information (SCFSI), scalefactors (SCF) and the subband samples.
- Ancillary data. Because of the large number of different applications that will use ISO/IEC 11172-3, the length and usage of this field are not specified.

The variable length of the ancillary data field makes it possible to pack the complete extension information of the channels T2/T3/T4 into the first part of the ancillary data field. If the MC encoder does not use all of the ancillary data field for the multichannel extension information, the remaining part of the field can be used for other ancillary data. The bit rate required for the multichannel extension information may vary from frame to frame, depending on the sound signals. The overall bit rate may be increased above that provided for in ISO/IEC 11172-3 by the use of an optional extension bit stream.
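The frame layout just described can be turned into a simple bit budget. The sketch below, under stated assumptions, packs multichannel extension bits into the start of the ancillary field of a 1152-sample frame and treats any remainder as belonging in the optional extension bit stream; this is a simplification, not the normative syntax. It also checks that the maximum total bit rates in the table that follows are consistent with the ISO/IEC 11172-3 maxima (448, 384 and 320 kbit/s for Layers I, II and III) plus an extension frame of at most 2047 bytes per 1152 samples. That 2047-byte limit (an 11-bit byte count) is an assumption not stated in this section; it is used here only because it reproduces the published figures. All names are hypothetical.

```python
FRAME_SAMPLES = 1152                                  # samples per Layer II / MC frame
MPEG1_MAX_KBITS = {"I": 448, "II": 384, "III": 320}   # ISO/IEC 11172-3 maximum bit rates
EXT_MAX_BYTES = 2047                                  # assumption: 11-bit byte count per extension frame
HEADER_BITS, CRC_BITS = 32, 16                        # header and optional CRC sizes


def frame_bits(bitrate_bps: int, fs_hz: int) -> int:
    """Bits in one 1152-sample frame at a constant bit rate (padding ignored)."""
    return bitrate_bps * FRAME_SAMPLES // fs_hz


def split_ancillary(total_bits: int, audio_bits: int, mc_bits: int, crc: bool = True):
    """Pack MC extension bits into the start of the ancillary field.

    Returns (mc_bits_in_frame, bits_left_for_other_ancillary_data, mc_bits_overflow);
    the overflow is what would have to travel in the optional extension bit stream.
    """
    ancillary = total_bits - HEADER_BITS - (CRC_BITS if crc else 0) - audio_bits
    if ancillary < 0:
        raise ValueError("audio data does not fit into the frame")
    in_frame = min(mc_bits, ancillary)
    return in_frame, ancillary - in_frame, mc_bits - in_frame


def max_total_kbits(layer: str, fs_hz: int) -> int:
    """ISO/IEC 11172-3 maximum plus one full (assumed 2047-byte) extension frame."""
    ext_kbits = EXT_MAX_BYTES * 8 * fs_hz / FRAME_SAMPLES / 1000
    return MPEG1_MAX_KBITS[layer] + round(ext_kbits)


if __name__ == "__main__":
    # Hypothetical Layer II frame: 384 kbit/s at 48 kHz with 7000 bits of audio data.
    total = frame_bits(384_000, 48_000)          # 9216 bits
    print(split_ancillary(total, 7000, 1500))    # (1500, 668, 0)
    print(split_ancillary(total, 7000, 3000))    # (2168, 0, 832)
    for fs in (32_000, 44_100, 48_000):
        print(fs, [max_total_kbits(layer, fs) for layer in ("I", "II", "III")])
```

The same 1152-sample extension frame is used for every layer in this fit; the 64 kbit/s steps between Layers I, II and III in the table come entirely from the ISO/IEC 11172-3 maxima.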
The maximum bit rate, including the extension bit stream, is given by the following table:

Sampling frequency   Layer I       Layer II      Layer III
32 kHz               903 kbit/s    839 kbit/s    775 kbit/s
44.1 kHz             1075 kbit/s   1011 kbit/s   947 kbit/s
48 kHz               1130 kbit/s   1066 kbit/s   1002 kbit/s

0.2.3.2 Audio Input/Output Format

Sampling frequencies: 48, 44.1 or 32 kHz
Quantisation: up to 24 bits/sample PCM resolution

The following combinations of audio channels can be applied as inputs to the audio encoder: