WO1998053613A1 - Apparatus, method and computer readable medium for scalable coding of video information

Info

Publication number: WO1998053613A1
Application number: PCT/US1998/008193
Authority: WO
Inventors: Marshall A. Robers; Mark R. Banham; Aggelos K. Katsaggelos
Original assignee: Motorola Inc.
Priority date: 1997-05-20
Filing date: 1998-04-21
Publication date: 1998-11-26

Abstract

A scalable video coding method incorporating scan-based coding (104, 106) of DCT coefficients of both INTRA and INTER macroblocks, which defines motion compensation (102) from a predetermined base-layer to eliminate drift between decoder and encoder. This method also includes the use of scan-adaptive VLCs (108) to improve compression efficiency. The method permits the encoding of video sequences at similar quality and rates to the non-scalable H.263 standard, with minor departures from that standard, resulting in the generation of progressive bitstreams for use in many different applications requiring scalability.

Description

APPARATUS, METHOD AND COMPUTER READABLE MEDIUM FOR SCALABLE CODING OF VIDEO INFORMATION

Field of the Invention

This invention relates to video compression and coding techniques, and more specifically, to an apparatus, method and computer readable medium for scalable coding of video information.

Background of the Invention

Many applications requiring the transmission and/or storage of digital video information are limited by the available bandwidth of the system. A variety of applications such as surveillance, public safety, and video database browsing can thus benefit from the ability to transmit or decode a low resolution rendition of a high quality video scene. This low resolution rendition, however, is not always sufficient to meet the needs of end users. Often a high quality video sequence is needed to gain more information from the source. The ability to create both a low resolution video sequence and a higher resolution sequence from a single bitstream can be very useful for the applications mentioned.

Rendering multiple levels of quality from a single bitstream addresses the needs of limited encoding complexity and reduced overall disk storage space, and permits novel functionalities such as streaming video at different levels of quality depending on available network bandwidth.

Currently, there does not exist a very efficient coding method for digital video data with multiple qualities extractable from a single encoded bitstream, which can leverage the technology in existing standardized video codecs. An apparatus, method, and computer readable medium designed to efficiently perform scalability utilizing the platform of existing standardized video codecs would solve many problems for applications needing scalable video.

Brief Description of the Drawings

FIG. 1 is a flow chart illustrating one preferred embodiment of steps of a method in accordance with the present invention.

FIG. 2 is a diagram illustrating spectral scan parameters and quantization scan parameters of one preferred embodiment of a method in accordance with the present invention.

FIG. 3 is a block diagram of one preferred embodiment of an apparatus for scalable coding of a plurality of video frames in accordance with the present invention.

FIG. 4 is a diagrammatic representation of one preferred embodiment of a computer readable medium for scalable coding of video information in accordance with the present invention.

FIG. 5 is another preferred embodiment of a flow chart for a method for scalable coding of video information, the video information having a plurality of video frames, in accordance with the present invention.

Detailed Description of a Preferred Embodiment

This invention involves scalable encoding and decoding of 8 x 8 blocks of discrete cosine transform (DCT) coefficients for both INTRA and INTER coded blocks. INTRA coded blocks are those blocks of video data which do not utilize any temporal prediction from prior frames in the video sequence. INTER coded blocks have a prediction from a prior frame, and a prediction error which is coded with the DCT. This method can be applied within the structure of the ITU-T H.263 standard for video coding at low bitrates. The present invention uses a type of scalability known as SNR (signal-to-noise-ratio) scalability (to differentiate it from spatial and temporal scalabilities which involve changes in spatial and temporal resolution). The novelty of the present invention is found at the block level of the H.263 syntax, where it defines multiple scans, or layers, of refinement for the DCT coefficients of the displaced frame difference (DFD) INTER block, or INTRA block being coded. This scalable method allows flexibility in defining the scans, and both the number of scans and the content of each scan can be varied.

Video coding at low bitrates requires a compression technique which utilizes the temporal redundancy of a video sequence (i.e., the strong correlation of consecutive frames). Most video coding schemes include a block matching technique for motion estimation and compensation. The task of block matching becomes more difficult within the context of a scalable video coder because motion compensation requires the use of the previous reconstructed frame. An encoder using this methodology explicitly has a decoder in its coding loop. A decoder may or may not decode all layers of quality of a scalably encoded previous reconstructed frame. It is, thus, necessary to guarantee that the previous reconstructed frame used for prediction in the encoder is the same for all possible subsets of the overall compressed stream. For this reason, motion compensation within the encoder (i.e., determination of the DFD) of the present invention is based on the previous reconstructed frame found in the minimum subset of the compressed scalable bitstream. This minimum subset is called the base-layer, and it is determined by the expected minimum bandwidth channel for a specific application. Using the base-layer for the encoder's motion compensation guarantees that the motion compensation process can be exactly duplicated in the decoder.
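By way of illustration only, the base-layer prediction loop described above may be sketched as follows. The sketch assumes a toy one-dimensional signal in place of motion compensated 8 x 8 blocks, and the names BASE_STEP, split_layers, encode, and decode are hypothetical and do not appear in this disclosure. It shows only that a reference built from the base-layer alone is identical in the encoder and in any decoder, whether or not the decoder also uses the enhancement data.

```python
# Toy 1-D model: each "frame" is a list of integer samples.
BASE_STEP = 8  # hypothetical coarse quantizer step for the base-layer


def split_layers(residual):
    """Split a prediction residual into a coarse base part and a refinement."""
    base = [BASE_STEP * round(r / BASE_STEP) for r in residual]
    enh = [r - b for r, b in zip(residual, base)]
    return base, enh


def encode(frames):
    """Return per-frame (base, enh) layers and the encoder's references."""
    ref = [0] * len(frames[0])
    stream, refs = [], []
    for frame in frames:
        residual = [x - p for x, p in zip(frame, ref)]
        base, enh = split_layers(residual)
        stream.append((base, enh))
        # The next prediction is formed from the base-layer only.
        ref = [p + b for p, b in zip(ref, base)]
        refs.append(ref)
    return stream, refs


def decode(stream, use_enhancement):
    """Decode with or without the enhancement layer."""
    ref = [0] * len(stream[0][0])
    shown, refs = [], []
    for base, enh in stream:
        out = [p + b for p, b in zip(ref, base)]
        ref = out[:]  # the reference never includes enhancement data
        if use_enhancement:
            out = [o + e for o, e in zip(out, enh)]
        shown.append(out)
        refs.append(ref)
    return shown, refs
```

In this sketch the enhancement data improves only the displayed frame, so a decoder that drops it still predicts from the same reference as the encoder.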

FIG. 1, numeral 100, is an overall block diagram of a preferred embodiment of a method for scalable encoding. The encoding process includes a determination of a target number of bits to spend on a macroblock which will be scalably encoded (102). The parameters specifying how the data in that block shall be partitioned are computed in step (104). These parameters include a spectral scan parameter and a quantization scan parameter for each scan. Multiple scans of coefficients are generated in step (106), and encoded using variable length codes in step (108). Finally, the lowest resolution scan, or base-layer, is extracted in the encoder for use in prediction of the next frame (110).

This invention defines a partitioning approach for DCT coefficients of video frames. Still image compression using the "progressive" mode of the JPEG standard is related to this partitioning approach. In JPEG, blocks of "still images" are compressed by breaking up the DCT data into predetermined groups of coefficients. In this invention, however, the partitioning approach is applied adaptively to DCT coefficients represented by the block layer of the syntax of a video bitstream. The partitioning approach involves specifying a set of scans, which are subsets of the set of DCT coefficients associated with a block of video data. These scans are then encoded separately, permitting a decoder to extract one, some, or all of the scans associated with the DCT data to produce video of varying qualities. The application and design of this method for video compression requires significant departure from the application of scalable DCT coding to still images. The methods for defining the DCT coefficient scans in this invention are given next, and can be seen graphically in FIG. 2, numeral 200.

Spectral scan selection involves transmitting a subset of an 8 x 8 block of DCT coefficients in a particular scan. In spectral scan selection, some of the 64 DCT coefficients are sent in their entirety (i.e., all bits of magnitude precision), and no information is sent about the other DCT coefficients. The DCT tends to decorrelate a block of values so that the majority of the data required for perceptually lossless compression is contained in the low frequency coefficients. Therefore, appropriate use of spectral scan selection for video involves transmitting low frequency DCT coefficients in the first scans and higher frequency DCT coefficients in subsequent scans. A graphical representation of a typical scan definition for a single 8 x 8 block of DCT coefficients using spectral scan selection can be found in FIG. 2, numeral 202. In this figure, the 64 coefficients are ordered from top to bottom, and the significant bits of each coefficient (Most Significant Bit (MSB) to Least Significant Bit (LSB)) are ordered from left to right.
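Spectral scan selection may be sketched as follows, purely as an illustration. The sketch assumes that the 64 coefficients of a block have already been arranged from lowest to highest frequency, and the function names spectral_scans and reconstruct, as well as the example band boundaries, are hypothetical.

```python
def spectral_scans(coeffs, boundaries):
    """Split frequency-ordered coefficients into consecutive bands.

    boundaries lists the index at which each new scan begins, e.g.
    [6, 21] yields scans covering 0..5, 6..20, and 21..63.
    """
    edges = [0, *boundaries, len(coeffs)]
    return [coeffs[lo:hi] for lo, hi in zip(edges, edges[1:])]


def reconstruct(scans, total=64):
    """Rebuild a block from the leading scans received.

    Coefficients belonging to scans that were not received are set to
    zero, since no information about them has been sent.
    """
    received = [c for scan in scans for c in scan]
    return received + [0] * (total - len(received))
```

A decoder that receives only the first scan in this sketch recovers the low frequency coefficients at full precision and treats all remaining coefficients as zero.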

A second method for partitioning a block of DCT coefficients is bit plane coding. In this scheme, the coefficients are refined in precision (i.e., magnitude) in the various scans. Thus, a base-layer constructed using bit plane coding would contain the most significant bits for all 64 DCT coefficients. Subsequent scans, which contain less significant bits than the base-layer, would then refine the magnitudes of the DCT coefficients. The enhancement scans only contain useful information if accompanied by all previous scans; i.e., the LSB contains useful information only if all other bits are known. The adjustment of the precision of these coefficients is equivalent to varying the quantization of each coefficient. The bit plane coding of coefficients is controlled by a scan quantization parameter. A graphical representation of a typical scan definition for a single 8 x 8 block of DCT coefficients using bit plane coding is seen in FIG. 2, numeral 204.
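The following is a minimal sketch of bit plane partitioning, offered as an illustration under stated assumptions rather than as the coding used in any embodiment. It assumes integer quantized coefficients in sign-magnitude form with the signs carried alongside the base-layer; the names bit_plane_scans, refine, and lsb_counts are hypothetical.

```python
def bit_plane_scans(coeffs, lsb_counts):
    """Split coefficients into a base-layer plus refinement scans.

    lsb_counts[k] is the number of magnitude bits carried by enhancement
    scan k + 1, listed from more significant to less significant. The
    base-layer carries all higher-order magnitude bits.
    """
    shift = sum(lsb_counts)
    signs = [-1 if c < 0 else 1 for c in coeffs]
    mags = [abs(c) for c in coeffs]
    scans = [[m >> shift for m in mags]]
    for n in lsb_counts:
        shift -= n
        scans.append([(m >> shift) & ((1 << n) - 1) for m in mags])
    return signs, scans


def refine(signs, scans, lsb_counts):
    """Rebuild coefficients from the base-layer and the scans received.

    scans must be a leading subset: a refinement scan is only usable
    when every scan before it is also present.
    """
    total = sum(lsb_counts)
    mags = list(scans[0])
    used = 0
    for n, scan in zip(lsb_counts, scans[1:]):
        mags = [(m << n) | s for m, s in zip(mags, scan)]
        used += n
    # Bits that were not received are treated as zero.
    return [sg * (m << (total - used)) for sg, m in zip(signs, mags)]
```

Dropping the refinement scans in this sketch has the same effect as quantizing every coefficient with a step size larger by a power of two, which is the equivalence noted above.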

A third and final approach for the present scan definition involves combining spectral scan selection and bit plane coding. This scheme offers the user increased control over exactly which coefficient information is contained in each scan. With this hybrid of both approaches, one can define the base-layer as the most significant bits of the lower frequency DCT coefficients. Subsequent scans would refine those coefficients included in the base-layer and begin to include the higher frequency coefficients. The final scan would transmit the least significant bits of the high frequency coefficients. A graphical representation of a typical scan definition for a single 8 x 8 block of DCT coefficients using the combined mode of both spectral scan selection and bit plane coding can be found in FIG. 2, numeral 206.

The flexibility incorporated into the scan definition permits the use of efficient VLCs. Within the H.263 standard, for example, each significant (i.e., nonzero) DCT coefficient is coded using a 3-D VLC determined by the relative frequency of occurrence of each symbol. Each 3-D code corresponds to a specific combination of three different parameters: (1) the run: number of preceding non-significant coefficients, (2) the level: the quantized index corresponding to the value of the significant coefficient, and (3) a binary value called 'last' which tells if the current coefficient is the last significant coefficient in the block. This invention uses this 3-D VLC coding method within the context of scalable video coding. In order to improve the compression efficiency, scan-dependent VLC tables may be used. More specifically, the relative frequency of each symbol in the 3-D VLC is dependent on the scan definition. Scan-dependent VLC tables take advantage of the dependency between each symbol's rate of occurrence and the scan used. The importance of scan-dependent VLC tables can be understood by considering a scan which contains only the LSB for a group of DCT coefficients. For this scan, the allowed values for the level can be reduced to a binary value instead of a range of values, thus improving the efficiency of that code.
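The (run, level, last) symbols described above may be formed as in the following sketch, which is illustrative only. It stops at symbol formation and does not reproduce any codeword table of the H.263 standard; the function name run_level_last is hypothetical.

```python
def run_level_last(coeffs):
    """Turn a scanned coefficient list into (run, level, last) symbols.

    run   : number of zero coefficients preceding the nonzero one
    level : value of the nonzero coefficient
    last  : 1 for the final nonzero coefficient of the block, else 0
    """
    events, run = [], 0
    for c in coeffs:
        if c == 0:
            run += 1
        else:
            events.append([run, c, 0])
            run = 0
    if events:
        events[-1][2] = 1  # mark the last significant coefficient
    return [tuple(e) for e in events]
```

Applied to a scan that carries only one LSB per coefficient, every level produced by this sketch equals 1, so a table designed for that scan would not need to spend codewords on other level values.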

When designing a video transmission scheme for real-time communication channels, practical limits are set on the allowable bandwidth of the encoded video subsets. Thus, the partitioning of the DFD and INTRA block data using both spectral scan selection and bit plane coding must be adaptive so the bitrate constraints can be met. This invention provides a method for defining the scan parameters in order to obtain the desired bitrates, given a predetermined rate control system to adjust the overall DCT quantization stepsize and the coded framerate.

The overall DCT quantization stepsize and the coded framerate are adjusted based on the desired bitrate for all scans combined. The approach for selecting and modifying both the overall DCT quantization stepsize and the coded framerate can be any standard procedure based on buffer management. The adjustments to the frame rate, and the quantization step sizes assume the existence of a channel which can transmit at a constant rate. In other words, the input buffer is assumed to empty at a constant rate. The coded framerate is regulated by a procedure which is executed every time that a frame is coded. This type of rate control is a common part of most existing motion compensated block-DCT based video codecs.

In order to partition a block of DCT coefficients after selection of the coded frame and quantization of those coefficients, this invention divides the total incoming bits into subsets of specified sizes. The basic idea of the method is to change the boundaries of the scans based on the target bitrates for each of the scans. This method uses maximum predetermined bitrates for each scan. The modification of the scan parameters can be executed at any macroblock boundary, or any time the overall DCT quantization stepsize can be adjusted within the syntax of the video bitstream.

In order to dynamically modify the scan parameters, they must first be explicitly specified. The dynamic approach of this invention parameterizes the boundaries between each scan. This method can be used for any number of scans; here, an example is provided based on a video sequence with three scans per block of DCT coefficients (see Table 1). Note that Scan 3 contains the uncoded LSBs from all DCT coefficients. This division into three subsets yields three parameters (A, B, and X) which the method dynamically adjusts.
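One way in which three scans could be parameterized by A, B, and X is sketched below. The roles given to the parameters here are an assumption made for illustration and are not taken from Table 1: X is assumed to be the number of low frequency coefficients placed in Scan 1, A and B are assumed to be the numbers of magnitude LSBs withheld from Scan 1 and Scan 2 respectively, and Scan 3 is assumed to carry the withheld LSBs. The names hybrid_scans and rebuild are hypothetical.

```python
def hybrid_scans(coeffs, A, B, X):
    """Hypothetical three-scan split of frequency-ordered coefficients.

    Scan 1: first X coefficients, without their A lowest magnitude bits.
    Scan 2: remaining coefficients, without their B lowest magnitude bits.
    Scan 3: the low-order bits left out of Scans 1 and 2.
    Signs are returned separately (assumed sent with the base-layer).
    """
    drops = [A] * X + [B] * (len(coeffs) - X)
    signs = [-1 if c < 0 else 1 for c in coeffs]
    high = [abs(c) >> n for c, n in zip(coeffs, drops)]
    low = [abs(c) & ((1 << n) - 1) for c, n in zip(coeffs, drops)]
    return signs, high[:X], high[X:], low


def rebuild(signs, A, B, scan1, scan2=None, scan3=None, total=64):
    """Rebuild a block from Scan 1 and any further scans received."""
    X = len(scan1)
    drops = [A] * X + [B] * (total - X)
    rest = list(scan2) if scan2 is not None else [0] * (total - X)
    high = list(scan1) + rest
    low = scan3 if scan3 is not None else [0] * total
    return [s * ((h << n) | l)
            for s, h, n, l in zip(signs, high, drops, low)]
```

Under these assumptions, raising X or lowering A moves bits into Scan 1, while raising A or B moves bits into Scan 3, which is the kind of boundary movement the dynamic adjustment relies upon.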

Table 1: Example Parameterized Coefficient Scan Definitions

This partitioning scheme changes the scan parameters based on the number of bits spent on each scan during the previous frame. In other words, buffers are maintained for each scan which hold the bits used for representing the previous frame. As each macroblock line in the new frame is coded, bits are added to the appropriate buffers and the bits spent on that macroblock line in the previous frame are removed. The number of bits in these scan buffers at the end of each macroblock line can be used to calculate the error from the target bits for each scan. This is defined as Target Bit Error (TBE):

TBE(j) = Bits_In_Buffer(j) - Target_Bits_Per_Frame(j),

where the argument j is used to indicate the current scan number. The target number of bits per frame depends on the coded framerate, and is set by the predetermined rate control common to existing motion compensated block-DCT based video codecs.
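The per-scan buffer and the Target Bit Error may be sketched as follows. This is an illustration under the assumption that each buffer holds one entry per macroblock line, so that adding a line of the new frame displaces the same line of the previous frame; the class name ScanBuffer and its members are hypothetical.

```python
from collections import deque


class ScanBuffer:
    """Sliding window of the bits spent on one scan over the last frame."""

    def __init__(self, lines_per_frame, target_bits_per_frame):
        # One entry per macroblock line; maxlen makes the window slide.
        self.lines = deque(maxlen=lines_per_frame)
        self.target = target_bits_per_frame

    def add_line(self, bits):
        """Record the bits spent on the macroblock line just coded.

        Appending to a full deque discards the oldest entry, i.e. the
        bits spent on the same macroblock line of the previous frame.
        """
        self.lines.append(bits)

    def tbe(self):
        """TBE(j) = Bits_In_Buffer(j) - Target_Bits_Per_Frame(j)."""
        return sum(self.lines) - self.target
```

A positive value returned by tbe() in this sketch indicates that the scan has recently spent more than its target, and a negative value indicates that it has spent less.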

Each TBE is normalized based on the assumption that exceeding the target bitrate by a fixed number of bits requires more significant and immediate correction for a scan with a smaller target bitrate. This normalization produces a Normalized Target Bit Error (NTBE) for each scan. Here,

NTBE(j) = TBE(j) / Target_Bits_Per_Frame(j).
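The effect of the normalization can be seen in a short numerical sketch. The function name ntbe and the bit counts used below are hypothetical values chosen only for illustration.

```python
def ntbe(bits_in_buffer, target_bits_per_frame):
    """Normalized Target Bit Error for one scan."""
    tbe = bits_in_buffer - target_bits_per_frame
    return tbe / target_bits_per_frame


# The same 500-bit overshoot on two scans with different targets:
small_scan = ntbe(2500, 2000)     # overshoot is 25% of the target
large_scan = ntbe(10500, 10000)   # overshoot is 5% of the target
```

Although both scans in this example exceed their targets by the same number of bits, the scan with the smaller target has the larger normalized error and would therefore be corrected first.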

Finally, the NTBEs are compared to determine if the scan parameters need to be adjusted. This is done by calculating three scan differences (Δ(i,j)) by comparing the NTBEs for each scan. The definition of the scan differences for the example case with 3 scans is:

Δ(1,2) = NTBE(1) - NTBE(2);

Δ(1,3) = NTBE(1) - NTBE(3);

Δ(2,3) = NTBE(2) - NTBE(3).

These Δ(i,j) values are compared to predetermined thresholds (T(i,j)) which depend on the maximum allowable deviation from the desired scan bitrates. If a threshold is exceeded, the appropriate scan parameter is adjusted (see Table 2). These scan adjustments must result in a feasible solution for bitstream encoding, and one preferred embodiment is described next. The amount by which A, B, and X are incremented or decremented is chosen to be proportional, by a predetermined proportionality constant, to the integer division of Δ(i,j) by T(i,j). The magnitude of the scan adjustments is also limited. These limitations prevent the scan parameters from oscillating rapidly and do not pose difficulty for meeting imposed bitrate constraints.
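The scan differences and the size of an adjustment may be sketched as follows, as an illustration only. The mapping from each Δ(i,j) to a particular scan parameter is given by Table 2 and is not reproduced or assumed here; the names scan_differences, adjustment_step, gain, and max_step are hypothetical, with gain standing in for the proportionality constant and max_step for the limit on the adjustment magnitude.

```python
def scan_differences(ntbe):
    """Return {(i, j): NTBE(i) - NTBE(j)} for every pair with i < j.

    ntbe maps a scan number to that scan's normalized target bit error.
    """
    scans = sorted(ntbe)
    return {(i, j): ntbe[i] - ntbe[j]
            for a, i in enumerate(scans) for j in scans[a + 1:]}


def adjustment_step(delta, threshold, gain=1, max_step=1):
    """Signed, limited step derived from one scan difference.

    The step is gain times the integer part of delta / threshold, so it
    is zero while |delta| stays below the threshold, and it is clipped
    to the range [-max_step, +max_step].
    """
    step = gain * int(delta / threshold)  # int() truncates toward zero
    return max(-max_step, min(max_step, step))
```

For example, with a threshold of 0.15 a difference of 0.35 gives an unclipped step of 2 in this sketch, which a limit of 1 reduces to a single increment.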

Table 2: Dynamic Adjustment of Scan Parameters

The decoder must know of any adjustments to the scan parameters. One preferred embodiment of the coding of the scan parameters is to encode changes in these parameters only within the bit field of a Group of Blocks (GOB) header, which is part of the syntax of H.263 within which this preferred embodiment is implemented. The number of bits required for these parameters is minimal since the magnitude of the scan adjustments has been limited. The values of the thresholds, T(i,j), seen in Table 2, are set to 0.15 for all cases. A, B, and X are changed proportionally to the amount that Δ(i,j) exceeds T(i,j) for each case.

The scan bit precision parameters, referred to here as the quantization scan parameters, A and B, are limited to take on the values 0, 1, and 2, and each is permitted to change only by -1, 0, or +1 at each valid change point. A field of 2 bits is needed to transmit the absolute value of each of these parameters at each GOB header. The spectral scan parameter, X, is permitted to change by the values -7, -6, -5, -4, -3, -2, -1, 0, 1, 2, 3, 4, 5, 6, 7 at each valid change point, and is limited to lie within the range [5,35]. A field of 5 bits is coded at each GOB header to transmit the absolute value of the spectral scan parameter. The scan parameters are limited in terms of possible values in order to prevent rapid changes in bitrate within a video frame, and to reduce the number of bits needing to be transmitted in each encoded frame. A decoder can read the values of the scan parameters at each GOB header, and adjust the scan definitions before decoding the plurality of scans associated with each block of DCT coefficients. The scan parameters, along with the motion vectors and all administrative information, are transmitted with the base layer.
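The field widths stated above (2 bits each for A and B, 5 bits for X) may be exercised with the following sketch. The ordering of the fields and the storage of X as an offset from 5 are assumptions made for illustration and are not specified in this disclosure or in the H.263 syntax; the names pack_scan_params and unpack_scan_params are hypothetical.

```python
def pack_scan_params(A, B, X):
    """Pack A (2 bits), B (2 bits), and X (5 bits) into a 9-bit field.

    X lies in [5, 35], i.e. 31 distinct values, so it is stored here as
    X - 5 (an assumed convention) in order to fit in 5 bits.
    """
    if A not in (0, 1, 2) or B not in (0, 1, 2):
        raise ValueError("A and B must be 0, 1, or 2")
    if not 5 <= X <= 35:
        raise ValueError("X must lie in [5, 35]")
    return (A << 7) | (B << 5) | (X - 5)


def unpack_scan_params(field):
    """Recover (A, B, X) from a field produced by pack_scan_params."""
    A = (field >> 7) & 0b11
    B = (field >> 5) & 0b11
    X = (field & 0b11111) + 5
    return A, B, X
```

A decoder following this sketch would call unpack_scan_params on the field read at each GOB header and update its scan definitions before decoding the scans of the blocks that follow.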

FIG. 3, numeral 300, is a block diagram of one preferred embodiment of an apparatus for scalable coding of a plurality of video frames. The apparatus comprises a memory unit (302), and a scalable partitioning video processor/ASIC (application specific integrated circuit) (304) coupled to the memory. The scalable partitioning video processor/ASIC (304) initiates a program by sending a control signal (306) to the memory unit (302). The scalable partitioning video processor/ASIC (304) is responsive to a set of program instructions stored in the memory unit (302), which, when operably coupled to the memory unit (302), determines a plurality of scan parameters (312) for a corresponding plurality of bit rates. The scalable partitioning video processor/ASIC (304) is used to transform a video frame of the plurality of video frames into blocks, typically 8x8, of DCT coefficients (308). The scalable partitioning video processor/ASIC (304) is further responsive to partition the DCT coefficients of each block into a plurality of scans (310), each scan of the plurality of scans having a spectral scan parameter and a quantization scan parameter of the plurality of scan parameters; and the scalable partitioning video processor/ASIC is further responsive to encode each scan of the plurality of scans using predetermined variable length codewords (314) and to output coded scan coefficients (318), and, where selected, to further change the scan parameters at predetermined locations in a video frame according to a predetermined rate control scheme (316) in order to effectively reach a target coded bitrate associated with each scan.

FIG. 4, numeral 400, is a diagram of one preferred embodiment of executable instructions and output parameters of a computer readable medium for scalable coding of a plurality of video frames. The computer readable medium (401) stores the plurality of executable instructions (402), the plurality of executable program instructions responsive, when executed, to determine a plurality of scan parameters (404) for a corresponding plurality of bit rates. The executable program instructions also transform a video frame of the plurality of video frames into blocks, typically 8x8, of DCT coefficients (406). The executable program instructions partition the DCT coefficients into a plurality of scans, each scan of the plurality of scans having a spectral scan parameter (408) and a quantization scan parameter (410) of the plurality of scan parameters, and encode each scan of the plurality of scans by selecting predetermined variable length codewords (412), which are typically stored in the medium. The plurality of executable instructions signal a change (414) in the spectral scan parameter and the quantization scan parameter of each of the plurality of scan parameters at predetermined locations in a video frame in order to effectively reach a target coded bitrate associated with each scan.

FIG. 5, numeral 500, is another preferred embodiment of a flow chart for a method for scalable coding of video information, the video information having a plurality of video frames, in accordance with the present invention. The method includes: (a) determining a plurality of scan parameters for a corresponding plurality of bit rates (502); (b) transforming a video frame of the plurality of video frames into transform information (504); (c) partitioning the transform information into a plurality of scans, each scan of the plurality of scans having a spectral scan parameter and a quantization scan parameter of the plurality of scan parameters (506); and (d) encoding each scan of the plurality of scans (508). Typically, the transform information is a discrete cosine transform value. In one embodiment, encoding step (d) utilizes a plurality of variable length codes.

Where selected, each spectral scan parameter and each quantization scan parameter of the plurality of scan parameters is altered according to a predetermined adjustment scheme at a plurality of predetermined points in a video frame of the plurality of video frames to achieve each bit rate of the plurality of bitrates (510). The plurality of scans generally includes a first scan having a first spectral scan parameter and a first quantization scan parameter of the plurality of scan parameters, the first spectral scan parameter and the first quantization scan parameter corresponding to a lowest bit rate of the plurality of bit rates. In one embodiment, the first scan of the plurality of scans is used as a basis for motion compensation (512).

From the foregoing, it will be observed that numerous variations and modifications may be effected without departing from the spirit and scope of the novel concept of the invention. It is to be understood that no limitation with respect to the specific methods and apparatus illustrated herein is intended or should be inferred. It is, of course, intended to cover by the appended claims all such modifications as fall within the scope of the claims.

What is claimed is:

Claims

1. A method for scalable coding of video information, the video information having a plurality of video frames, the method comprising:

1A) determining a plurality of scan parameters for a corresponding plurality of bit rates;

1B) transforming a video frame of the plurality of video frames into transform information;

1C) partitioning the transform information into a plurality of scans, each scan of the plurality of scans having a spectral scan parameter and a quantization scan parameter of the plurality of scan parameters; and

1D) encoding each scan of the plurality of scans.

2. The method of claim 1 wherein at least one of 2A-2C:

2A) the transform information is a discrete cosine transform value;

2B) encoding step 1D utilizes a plurality of variable length codes; and

2C) each spectral scan parameter and each quantization scan parameter of the plurality of scan parameters is altered according to a predetermined adjustment scheme at a plurality of predetermined points in a video frame of the plurality of video frames to achieve each bit rate of the plurality of bitrates.

3. The method of claim 1 wherein the plurality of scans includes a first scan having a first spectral scan parameter and a first quantization scan parameter of the plurality of scan parameters, the first spectral scan parameter and the first quantization scan parameter corresponding to a lowest bit rate of the plurality of bit rates, and where selected, at least one of 3A-3C:

3A) further comprising:

(e) utilizing the first scan of the plurality of scans as a basis for motion compensation;

3B) wherein the first scan is intracoded; and

3C) wherein the first scan is intercoded.

4. An apparatus for scalable coding of video information, the video information having a plurality of video frames, the apparatus comprising: a memory unit having a stored set of program instructions; and a scalable partitioning video processor/application specific integrated circuit coupled to the memory unit, the scalable partitioning video processor/application specific integrated circuit responsive to the set of program instructions, when operably coupled, to determine a plurality of scan parameters for a corresponding plurality of bit rates; to transform a video frame of the plurality of video frames into transform information; the scalable partitioning video processor/application specific integrated circuit further responsive to partition the transform information into a plurality of scans, each scan of the plurality of scans having a spectral scan parameter and a quantization scan parameter of the plurality of scan parameters; and the scalable partitioning video processor/application specific integrated circuit is further responsive to encode each scan of the plurality of scans.

5. The apparatus of claim 4 wherein at least one of 5A-5C:

5A) the scalable partitioning video processor/application specific integrated circuit is a video codec;

5B) the scalable partitioning video processor/application specific integrated circuit is a microprocessor; and

5C) the scalable partitioning video processor/application specific integrated circuit is a digital signal processor.

6. The apparatus of claim 4 wherein at least one of 6A-6C:

6A) the transform information is a discrete cosine transform value;

6B) the scalable partitioning video processor/application specific integrated circuit is further responsive to encode each scan utilizing a variable length code; and

6C) each spectral scan parameter and each quantization scan parameter of the plurality of scan parameters is altered according to a predetermined adjustment scheme at a plurality of predetermined points in a video frame of the plurality of video frames to achieve each bit rate of the plurality of bitrates.

7. The apparatus of claim 4 wherein the plurality of scans includes a first scan having a first spectral scan parameter and a first quantization scan parameter of the plurality of scan parameters, the first spectral scan parameter and the first quantization scan parameter corresponding to a lowest bit rate of the plurality of bit rates, and where selected, at least one of 7A-7C:

7A) wherein the scalable partitioning video processor/application specific integrated circuit is further responsive to utilize the first scan of the plurality of scans as a basis for motion compensation;

7B) wherein the first scan is intracoded; and

7C) wherein the first scan is intercoded.

8. A computer readable medium for scalable coding of video information, the video information having a plurality of video frames, the computer readable medium storing a plurality of executable instructions, the plurality of executable program instructions responsive, when executed, to determine a plurality of scan parameters for a corresponding plurality of bit rates; to transform a video frame of the plurality of video frames into transform information; to partition the transform information into a plurality of scans, each scan of the plurality of scans having a spectral scan parameter and a quantization scan parameter of the plurality of scan parameters; and to encode each scan of the plurality of scans.

9. The computer readable medium of claim 8 wherein at least one of 9A-9C:

9A) the transform information is a discrete cosine transform value;

9B) the program instructions utilize a variable length code to encode each scan of the plurality of scans; and

9C) each spectral scan parameter and each quantization scan parameter of the plurality of scan parameters is altered according to a predetermined adjustment scheme at a plurality of predetermined points in a video frame of the plurality of video frames to achieve each bit rate of the plurality of bitrates.

10. The computer readable medium of claim 8 wherein the plurality of scans includes a first scan having a first spectral scan parameter and a first quantization scan parameter of the plurality of scan parameters, the first spectral scan parameter and the first quantization scan parameter corresponding to a lowest bit rate of the plurality of bit rates, and where selected, at least one of 10A-10C:

10A) wherein the program instructions utilize the first scan of the plurality of scans as a basis for motion compensation;

10B) wherein the first scan is intracoded; and

10C) wherein the first scan is intercoded.