VP9 Bitstream superframe and
uncompressed header
DRAFT
Rev 1.0, 2015.12.08
1. Introduction
1.1 Frames, Tiles and Blocks
1.2 Overview of Compressed Frame Format
2. Method of specifying syntax in tabular form
3. Superframe
3.1 Syntax
3.2 Semantics
4. Uncompressed Header
4.1 Syntax
4.2 Semantics
4.3 Bit depth, Color space and chroma subsampling
4.3.1 Syntax
4.3.2 Semantics
4.4 Frame Buffers
4.5 Frame Size
4.5.1 Syntax
4.5.2 Semantics
4.6 Interpolation Filters
4.6.1 Syntax
4.6.2 Semantics
4.7 Loop Filter
4.7.1 Syntax
4.7.2 Semantics
4.8 Quantization
4.8.1 Syntax
4.8.2 Semantics
4.9 Segmentation
4.9.1 Syntax
4.9.2 Semantics
4.10 Tiles
4.10.1 Syntax
4.10.2 Semantics
Appendix A dc_qlookup and ac_qlookup tables
1. Introduction
This document describes part of the VP9 bitstream format. It covers the high-level format of
frames and superframes, as well as the full format of the frame uncompressed header and the
superframe index.
1.1 Frames, Tiles and Blocks
VP9 is based on decomposition of video image frames into rectangular pixel blocks of different
sizes, prediction of such blocks using previously reconstructed blocks, and transform coding of
the residual signal.
There are only two frame types in VP9. Intra frames or key frames are decoded without
reference to any other frame in a sequence. Inter frames are encoded with reference to
previously encoded frames or reference buffers.
Video image blocks are grouped into tiles. Based on the layout of the tiles within image frames,
tiles can be categorized into column tiles and row tiles. Column tiles partition the image frame
vertically into columns, while row tiles partition it horizontally into rows. Column tiles are
independently coded and decodable units of the video frame: each column tile can be decoded
completely independently. Row tiles are interdependent. There has to be at least one tile per
frame. Tiles are then broken into 64x64 superblocks that are coded in raster order within the
video frame.
1.2 Overview of Compressed Frame Format
Every frame begins with an uncompressed header, followed by a compressed header. Besides
the headers, the main body of a compressed video frame contains the compressed per-block
data for one or more tiles.

The uncompressed header contains the bitstream profile, frame type (intra or inter), colorspace
descriptor, YUV chroma subsampling, YUV range, frame size, motion compensation
interpolation filter type, frame buffers to be refreshed, loop filter parameters, quantization
parameters, segmentation and tiling information. There is also frame context information, along
with binary flags indicating the use of error resilient mode, parallel decoding mode, high-precision
motion vector mode, and intra-only mode. See section 4 for details.
The compressed header contains the frame transform mode, the probabilities used to decode
the transform size for each block within the frame, the probabilities used to decode transform
coefficients, the probabilities used to decode modes and motion vectors, and so on.
VP9 supports consolidating multiple compressed video frames into one single chunk, called a
“superframe”. The superframe index is stored in the last up to 34 bytes of a chunk. The
enclosed frames can be located by parsing the superframe index. See section 3 for details.
2. Method of specifying syntax in tabular form
The syntax tables specify a superset of the syntax of all allowed bitstreams. Additional
constraints on the syntax may be specified, either directly or indirectly, in other clauses. Note
that the methods for identifying and handling errors and other such situations are not specified
in this document.
The functions presented here are used in the syntactical description. These functions are
expressed in terms of the value of a bitstream pointer that indicates the position of the next bit to
be read by the decoding process from the bitstream.
read_bits(n) reads the next n bits from the bitstream and advances the bitstream pointer
by n bit positions. When n is equal to 0, read_bits(n) is specified to return a value equal
to 0 and to not advance the bitstream pointer.
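The following C sketch shows one possible implementation of read_bits(n) over an in-memory
buffer; the BitReader structure and its field names are illustrative only and are not part of the
bitstream specification.

#include <stdint.h>
#include <stddef.h>

typedef struct {
    const uint8_t *data;  /* start of the bitstream */
    size_t bit_pos;       /* position of the next bit to be read */
} BitReader;

/* read_bits(n): return the next n bits, most significant bit first,
 * and advance the bitstream pointer by n bit positions. */
static uint32_t read_bits(BitReader *br, int n) {
    uint32_t value = 0;
    for (int i = 0; i < n; ++i) {
        uint8_t byte = br->data[br->bit_pos >> 3];
        int bit = (byte >> (7 - (br->bit_pos & 7))) & 1;
        value = (value << 1) | bit;
        br->bit_pos++;
    }
    return value;  /* when n == 0, returns 0 without advancing */
}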
The following descriptors specify the parsing process of each syntax element:
f(n): fixed-pattern bit string using n bits written (from left to right) with the left bit first. The
parsing process for this descriptor is specified by the return value of the function
read_bits(n).
u(n): unsigned integer using n bits. When n is "v" in the syntax table, the number of bits
varies in a manner dependent on the value of other syntax elements. The parsing
process for this descriptor is specified by the return value of the function read_bits(n)
interpreted as a binary representation of an unsigned integer with most significant bit
written first.
s(n): signed integer using n bits for the value and 1 bit for the sign. When n is "v" in the
syntax table, the number of bits varies in a manner dependent on the value of other
syntax elements. The parsing process for this descriptor is specified below:

s(n) {                                              Descriptor
    value                                           u(n)
    sign                                            u(1)
    return sign ? -value : value
}
u_bytes(n): unsigned integer using n bytes with the least significant byte first. When n is "v"
in the syntax table, the number of bytes varies in a manner dependent on the value of
other syntax elements. The parsing process for this descriptor is specified below:

u_bytes(n) {                                        Descriptor
    value = 0
    for (i = 0; i < n; ++i) {
        byte                                        u(8)
        value |= byte << (i * 8)
    }
    return value
}
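As a further illustration, s(n) and u_bytes(n) could be layered on top of the illustrative
BitReader and read_bits helpers from the sketch above (again, these function names are
examples, not spec-defined APIs):

/* Builds on the BitReader/read_bits sketch above. */
static int32_t read_s(BitReader *br, int n) {
    uint32_t value = read_bits(br, n);  /* magnitude, u(n) */
    uint32_t sign = read_bits(br, 1);   /* sign flag, u(1) */
    return sign ? -(int32_t)value : (int32_t)value;
}

static uint32_t read_u_bytes(BitReader *br, int n) {
    uint32_t value = 0;
    for (int i = 0; i < n; ++i) {
        uint32_t byte = read_bits(br, 8);   /* u(8) */
        value |= byte << (i * 8);           /* least significant byte first */
    }
    return value;
}

Note that the sign of s(n) is carried in a trailing bit rather than as two's complement.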
3. Superframe
A superframe is the mechanism VP9 uses to consolidate multiple compressed video frames into
one single chunk. The superframe index is stored in the last up to 34 bytes of a chunk. To
determine whether a chunk of data has a superframe index, check the last byte in the chunk for
the 3 marker bits (0b110xxxxx).

If the marker bits are set, apply masks to the byte:
mask 2 bits (0bxxx11xxx) and add 1 to get the size in bytes of each frame size field (1-4)
mask 3 bits (0bxxxxx111) and add 1 to get the number of frame sizes encoded in the index (1-8)

Calculate the total size of the index as follows:
2 + number of frame sizes encoded * number of bytes holding each frame size
The 2 accounts for the marker byte, which is duplicated at the start of the index and at
the end of the index.

Go to the first byte of the superframe index and check that it matches the last byte of the
superframe index.

No valid frame can have 0b110xxxxx in its last byte. The encoder ensures this by appending a
zero byte to the frame data in that case.

The actual frames are concatenated one after another. To find the second piece of compressed
data in a chunk, read the index to get the first frame size, and offset from the start of the data
by that many bytes.
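The following C sketch shows the detection and parsing steps described above over an
in-memory chunk. The function parse_superframe_index and its output parameters are
illustrative names, not part of the specification or any library API; error handling is minimal.

#include <stdint.h>
#include <stddef.h>

/* Returns 1 and fills frame_sizes/num_frames if the chunk ends in a
 * superframe index, otherwise returns 0. */
static int parse_superframe_index(const uint8_t *chunk, size_t chunk_size,
                                  uint32_t frame_sizes[8], int *num_frames) {
    if (chunk_size == 0)
        return 0;
    uint8_t marker = chunk[chunk_size - 1];
    if ((marker & 0xE0) != 0xC0)        /* check 0b110xxxxx in the last byte */
        return 0;                       /* no superframe index present */

    int bytes_per_size = ((marker >> 3) & 0x3) + 1;   /* 0bxxx11xxx, 1..4 */
    int frames = (marker & 0x7) + 1;                  /* 0bxxxxx111, 1..8 */
    size_t index_size = 2 + (size_t)frames * bytes_per_size;
    if (chunk_size < index_size)
        return 0;

    const uint8_t *index = chunk + chunk_size - index_size;
    if (index[0] != marker)             /* marker byte is duplicated at the start */
        return 0;

    /* Each frame size is stored with the least significant byte first. */
    for (int i = 0; i < frames; ++i) {
        uint32_t size = 0;
        for (int b = 0; b < bytes_per_size; ++b)
            size |= (uint32_t)index[1 + i * bytes_per_size + b] << (b * 8);
        frame_sizes[i] = size;
    }
    *num_frames = frames;
    return 1;
}

The i-th enclosed frame then starts at the sum of the preceding frame sizes, counted from the
start of the chunk.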
3.1 Syntax
superframe_index() {                                Descriptor
    SUPERFRAME_MARKER /* equal to 0b110 */          f(3)
    frame_size_length_minus_one                     u(2)
    num_frames_minus_one                            u(3)
    for (i = 0; i <= num_frames_minus_one; ++i)
        frame_sizes[i]                              u_bytes(v)
    SUPERFRAME_MARKER /* equal to 0b110 */          f(3)
    frame_size_length_minus_one                     u(2)
    num_frames_minus_one                            u(3)
}
3.2 Semantics
frame_size_length_minus_one plus one specifies the number of bytes used to store each frame size.
num_frames_minus_one plus one specifies the number of frames contained in this superframe.
frame_sizes[i] specifies the size in bytes of the i-th frame. Every frame_sizes[i] occupies
frame_size_length_minus_one + 1 bytes.
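As a worked example (values chosen for illustration, not taken from any particular bitstream): a
superframe holding three frames whose sizes each fit in two bytes would use
frame_size_length_minus_one = 1 and num_frames_minus_one = 2. The marker byte is then
0b110_01_010 = 0xCA, the index occupies 2 + 3 * 2 = 8 bytes, and the chunk ends with 0xCA,
three 2-byte little-endian frame sizes, and a second 0xCA.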
4. Uncompressed Header
The uncompressed header contains the bitstream profile, frame type (intra or inter), colorspace
descriptor, YUV chroma subsampling, YUV range, frame size, motion compensation
interpolation filter type, frame buffers to be refreshed, loop filter parameters, quantization
parameters, segmentation and tiling information. There is also frame context information, along
with binary flags indicating the use of error resilient mode, parallel decoding mode, high-precision
motion vector mode, and intra-only mode.
4.1 Syntax
uncompressed_header() {                                 Descriptor
    FRAME_MARKER /* equal to 0b10 */                    f(2)
    version                                             u(1)
    high                                                u(1)
    profile = (high << 1) + version
    if (profile == 3)
        RESERVED_ZERO /* equal to 0 */                  f(1)
    show_existing_frame                                 u(1)
    if (show_existing_frame) {
        index_of_frame_to_show                          u(3)
        return
    }
    frame_type                                          u(1)
    show_frame                                          u(1)
    error_resilient_mode                                u(1)
    if (frame_type == KEY_FRAME /* 0 */) {
        SYNC_CODE /* equal to 0x498342 */               u(24)
        bitdepth_colorspace_sampling(profile)
        refresh_frame_flags = 0xFF
        frame_size()
    } else {
        if (!show_frame)
            intra_only                                  u(1)
        else
            intra_only = false
        if (!error_resilient_mode)
            reset_frame_context                         u(2)
        else
            reset_frame_context = 0
        if (intra_only) {
            SYNC_CODE /* equal to 0x498342 */           u(24)
            if (profile > 0)
                bitdepth_colorspace_sampling(profile)
            refresh_frame_flags                         u(REF_FRAMES = 8)
            frame_size()
        } else {
            refresh_frame_flags                         u(REF_FRAMES = 8)
            for (i = 0; i < REFS_PER_FRAME; ++i) {
                frame_refs[i]                           u(3)
                ref_frame_sign_biases[i]                u(1)
            }
            frame_size_from_refs()
            high_precision_mv                           u(1)
            interp_filter()
        }
    }
    if (!error_resilient_mode) {
        refresh_frame_context                           u(1)
        frame_parallel_decoding_mode                    u(1)
    } else {
        refresh_frame_context = 0
        frame_parallel_decoding_mode = 1
    }
    frame_context_index                                 u(2)
    loop_filter()
    quantization()
    segmentation()
    tiles(width)
    compressed_header_size_in_bytes                     u(16)
}
4.2 Semantics
profile VP9 supports four profiles:
Profile    Bit Depth    SRGB Colorspace Support    Chroma Subsampling
0          8            No                         YUV 4:2:0
1          8            Yes                        YUV 4:2:2, YUV 4:4:0 or YUV 4:4:4¹
2          10 or 12     No                         YUV 4:2:0
3          10 or 12     Yes                        YUV 4:2:2, YUV 4:4:0 or YUV 4:4:4¹

¹ If colorspace is SRGB, then chroma subsampling is YUV 4:4:4.
show_existing_frame if set to 1, indicates that the frame indexed by index_of_frame_to_show is to
be displayed; otherwise decoding proceeds with further processing of the frame.
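The following C sketch illustrates the first few reads of uncompressed_header(), reusing the
illustrative BitReader and read_bits helpers from section 2's example; parse_frame_start is an
example name, not part of the specification.

/* Reads FRAME_MARKER, profile and show_existing_frame per section 4.1. */
static int parse_frame_start(BitReader *br) {
    uint32_t frame_marker = read_bits(br, 2);     /* FRAME_MARKER, must be 0b10 */
    if (frame_marker != 2)
        return -1;
    uint32_t version = read_bits(br, 1);
    uint32_t high = read_bits(br, 1);
    int profile = (int)((high << 1) + version);   /* 0..3, see the table above */
    if (profile == 3 && read_bits(br, 1) != 0)    /* RESERVED_ZERO must be 0 */
        return -1;
    uint32_t show_existing_frame = read_bits(br, 1);
    if (show_existing_frame) {
        uint32_t index_of_frame_to_show = read_bits(br, 3);
        (void)index_of_frame_to_show;             /* caller displays that buffer */
        return profile;                           /* no further header parsing */
    }
    /* frame_type, show_frame, error_resilient_mode, ... would follow here. */
    return profile;
}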