WO2004105399A1 - Method and apparatus for video compression

Info

Publication number: WO2004105399A1
Application number: PCT/NO2004/000121
Authority: WO
Inventors: Tom-Ivar Johansen; Gisle BJØNTEGAARD
Original assignee: Tandberg Telecom As
Priority date: 2003-05-22
Filing date: 2004-04-29
Publication date: 2004-12-02
Also published as: US20040233993A1; CN100559883C; US20100166059A1; JP2007502595A; JP4773966B2; US7684489B2; CN1795681A; EP1625753A1; NO319007B1; NO20032319D0

Abstract

The invention relates to handling various picture resolutions in an extended version of the compression standard H.264/AVC or other similar standards. The present invention provides an extension of the standard to include formats such as 4:2:2 and 4:4:4. The method is based on the way chrominance is already treated in H.264/AVC.

Description

Method and apparatus for video compression

Field of the invention

The invention relates to handling various picture resolutions in an extended version of the compression standard H.264/AVC or other similar standards.

Background of the invention

Real-time transmission of moving pictures is employed in several applications, e.g. video conferencing, net meetings, TV broadcasting and video telephony.

However, representing moving pictures requires large amounts of information, as digital video is typically described by representing each pixel in a picture with 8 bits (1 byte) or more. Such uncompressed video data results in large bit volumes and cannot be transferred over conventional communication networks and transmission lines in real time due to limited bandwidth.

Thus, enabling real-time video transmission requires a large degree of data compression. Data compression may, however, compromise picture quality. Therefore, great efforts have been made to develop compression techniques allowing real-time transmission of high quality video over bandwidth-limited data connections.

In video compression systems, the main goal is to represent the video information with as little capacity as possible. Capacity is measured in bits, either as a constant value or as bits per time unit. In both cases, the main goal is to reduce the number of bits.

The most common video coding methods are described in the MPEG* and H.26* standards. The video data undergo four main processes before transmission, namely prediction, transformation, quantization and entropy coding.

The prediction process significantly reduces the number of bits required for each picture in a video sequence to be transferred. It takes advantage of the similarity of parts of the sequence with other parts of the sequence. Since the predictor part is known to both encoder and decoder, only the difference has to be transferred. This difference typically requires much less capacity for its representation. The prediction is mainly based on vectors representing movements. The prediction process is typically performed on square block sizes (e.g. 16x16 pixels).

Note that in some cases, as in H.264/AVC, pixels are predicted from adjacent pixels in the same picture rather than from pixels of preceding pictures. This is referred to as intra prediction, as opposed to inter prediction. The pixels marked in bold in figure 4 are such nearby pixels. In H.264/AVC, there are many different modes for such prediction, both for luminance blocks and chrominance blocks. One of the prediction modes is called DC-prediction. It predicts all pixels in a block to have the same value. Taking into account the characteristics of the particular transform used for residual coding, this means that only the DC coefficient of the residual block data is changed compared to transformation of the block data without prediction. All AC coefficients are unchanged. For this reason the prediction mode is named DC-prediction.
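The statement that DC-prediction changes only the DC coefficient can be checked numerically. The following is a minimal sketch, assuming the 4x4 forward core transform matrix of H.264/AVC with its scaling and quantization stage left out; the helper names, the sample block and the predicted value are illustrative assumptions and not part of the publication.

```python
# Forward 4x4 core transform matrix of H.264/AVC (scaling stage omitted).
CF = [
    [1, 1, 1, 1],
    [2, 1, -1, -2],
    [1, -1, -1, 1],
    [1, -2, 2, -1],
]

def matmul(a, b):
    """Plain matrix product of two lists of lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(a):
    return [list(row) for row in zip(*a)]

def core_transform(block):
    """Two-dimensional transform: CF * block * CF^T."""
    return matmul(matmul(CF, block), transpose(CF))

# Illustrative 4x4 pixel block and an assumed DC-predicted value.
block = [
    [52, 55, 61, 66],
    [63, 59, 55, 90],
    [62, 59, 68, 113],
    [63, 58, 71, 122],
]
dc_pred = 67

# Residual after DC-prediction: the same value is subtracted everywhere.
residual = [[p - dc_pred for p in row] for row in block]

plain = core_transform(block)
predicted = core_transform(residual)

# Coefficient-wise difference between the two transforms.
diff = [[plain[i][j] - predicted[i][j] for j in range(4)] for i in range(4)]
```

In this sketch `diff` is zero everywhere except at position (0, 0), where it equals 16 times the predicted value, because every row of the matrix other than the first sums to zero.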

The residual represented as a block of data (e.g. 4x4 pixels) still contains internal correlation. A well-known method of taking advantage of this is to perform a two-dimensional block transform. In H.263 an 8x8 Discrete Cosine Transform (DCT) is used, whereas H.264 uses a 4x4 integer type transform. This transforms 4x4 pixels into 4x4 transform coefficients, which can usually be represented by fewer bits than the pixel representation. Transforming a 4x4 array of pixels with internal correlation will probably result in a 4x4 block of transform coefficients with far fewer non-zero values than the original 4x4 pixel block.
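As an illustration of how a correlated block collapses into a few coefficients, the sketch below applies the 4x4 integer core transform of H.264/AVC (without scaling or quantization) to a smooth horizontal ramp. The input block and the function names are assumptions chosen for the example.

```python
# Forward 4x4 core transform matrix of H.264/AVC (scaling stage omitted).
CF = [
    [1, 1, 1, 1],
    [2, 1, -1, -2],
    [1, -1, -1, 1],
    [1, -2, 2, -1],
]

def transform_4x4(block):
    """Compute CF * block * CF^T for a 4x4 block given as a list of rows."""
    # First stage: CF * block (transforms the columns).
    tmp = [[sum(CF[i][k] * block[k][j] for k in range(4)) for j in range(4)]
           for i in range(4)]
    # Second stage: tmp * CF^T (transforms the rows).
    return [[sum(tmp[i][k] * CF[j][k] for k in range(4)) for j in range(4)]
            for i in range(4)]

# Strongly correlated input: every row is the same horizontal ramp.
ramp = [[10, 12, 14, 16] for _ in range(4)]

coefficients = transform_4x4(ramp)
nonzero_pixels = sum(1 for row in ramp for v in row if v != 0)
nonzero_coefficients = sum(1 for row in coefficients for v in row if v != 0)
```

For this particular input all 16 pixel values are non-zero, while only three transform coefficients are, all of them in the first row.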

A macro block is a part of the picture consisting of several sub blocks for luminance (luma) as well as for chrominance (chroma).

There are typically two chrominance components (Cr, Cb) with half the resolution both horizontally and vertically compared with luminance. This is in contrast to, for instance, RGB (red, green, blue), which is typically the representation used in the camera sensor and the monitor display. In figure 1, the macro block consists of 16x16 luminance pixels and two chrominance components with 8x8 pixels each. Each of the components is further broken down into 4x4 blocks, which are represented by the small squares. For coding purposes, both luma and chroma 4x4 blocks are grouped together in 8x8 sub blocks and designated Y0-Y3 and Cr, Cb. The chroma part of this format is in some contexts denoted as 4:2:0, and is shown to the left in figure 2. The abbreviation is not very self-explanatory. It means that the chrominance has half the resolution of luminance horizontally as well as vertically. For the conventional video format CIF, this means that a luminance frame has 352x288 pixels whereas each of the chrominance components has 176x144 pixels.

In an alternative format, denoted 4:2:2 and shown in the middle part of figure 2, chrominance has half of the luminance resolution in the horizontal direction and the same resolution as luminance in the vertical direction. This format is typically used for high quality interlaced TV signals, where the interlace structure causes some challenges for use of half chrominance resolution vertically. In yet another format, denoted 4:4:4 and shown to the right in figure 2, the luminance and chrominance signals have the same resolution both in the horizontal and vertical direction. One typical area of application is graphics material where colors are used in a way such that it is desirable to have the same resolution for chrominance as for luminance.
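The relationship between the three formats and the chrominance dimensions can be summarized in a few lines. This is a small sketch; the table and function names are illustrative assumptions, and the divisors simply restate the resolutions described above.

```python
# (horizontal divisor, vertical divisor) of chrominance relative to luminance.
CHROMA_SUBSAMPLING = {
    "4:2:0": (2, 2),  # half resolution horizontally and vertically
    "4:2:2": (2, 1),  # half resolution horizontally only
    "4:4:4": (1, 1),  # same resolution as luminance
}

def chroma_size(luma_width, luma_height, fmt):
    """Return (width, height) of one chrominance component."""
    div_x, div_y = CHROMA_SUBSAMPLING[fmt]
    return luma_width // div_x, luma_height // div_y

# CIF frame in 4:2:0, as in the text: 352x288 luminance pixels.
cif_chroma = chroma_size(352, 288, "4:2:0")

# Chrominance pixels per 16x16 macro block for each format.
macroblock_chroma = {fmt: chroma_size(16, 16, fmt) for fmt in CHROMA_SUBSAMPLING}
```

For a 16x16 macro block this gives 8x8, 8x16 and 16x16 chrominance pixels (width x height) for 4:2:0, 4:2:2 and 4:4:4 respectively.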

From the patent literature there are examples disclosing video encoding/decoding and methods of compression. In particular the patent US 6,256,347 B1 (Yu et al.) should be mentioned, which discloses an image processor that receives prediction error values from decompressed MPEG coded digital video signals in the form of pixel blocks containing luminance and chrominance data in a 4:2:2 or 4:2:0 format and recompresses the pixel blocks to a predetermined resolution. Luminance and chrominance data are processed with different compression laws during recompression. Luminance data are recompressed to an average of six bits per pixel, whereas chrominance data are recompressed to an average of four bits per pixel. Thus this patent discloses a method for bit compression of data in the 4:2:2 and 4:2:0 formats, and hence is not a general method applicable to a plurality of formats.

Further it should be mentioned that US 2003/0043921 A1 (Dufour et al.) discloses a method for video encoding applied to an input signal IS which includes a sequence of frames represented by a luminance matrix and two chrominance matrices.

As mentioned, most video coding standards are mainly designed for 4:2:0. MPEG2 professional profile covers 4:2:2 using a special chrominance block arrangement. The same is true for H.263. Generally this means that each format needs a special solution.

Summary of the invention

It is an object of the present invention to provide a unified solution to coding/decoding of different video formats like 4:2:0, 4:2:2 and 4:4:4.

In particular, the present invention provides a method of video coding for transforming a first mxn macro block of residual chrominance pixel values of moving pictures by a first integer-transform function generating a corresponding second mxn macro block of integer-transform coefficients, then further transforming DC values of the integer-transform coefficients by a second integer-transform function generating a third block of integer-transformed DC coefficients, wherein the method further includes the steps of generating the second mxn macro block of integer-transform coefficients by utilizing a kxk integer-transform function on each of the kxk sub-blocks of the first mxn macro block, wherein n and m are multiples of k, and then generating the third block of coefficients by utilizing a second ixj integer-transform function on the DC values, resulting in an (m/k)x(n/k) third block of integer-transformed DC coefficients.

The present invention also provides a method of video decoding, being an inverted version of the method of video coding.

Brief description of the drawings

In order to make the invention more readily understandable, the discussion that follows will refer to the accompanying drawings.

Figure 1 shows how a macro block in the 4:2:0 format, with 16x16 luma pixels and two chroma components of 8x8 pixels each, is divided into 4x4 blocks, which in turn are arranged in subgroups of four 4x4 blocks. It also shows how DC coefficients are extracted from each of the 4 chroma blocks to form separate chroma DC elements consisting of 2x2 blocks,

Figure 2 shows one component of chroma pixels in a macro block of different picture formats,

Figure 3 shows a second level transform of DC values for different formats,

Figure 4 indicates the basis of a DC prediction of an 8x16 block.

Detailed description of the present invention

The present invention provides an extension of the H.264/AVC video coding standard to include formats like the above-described 4:2:2 and 4:4:4. The method is based on the way chrominance is already treated in H.264/AVC. A macroblock consists of a part of the picture with 16x16 luminance pixels and two chrominance components with 8x8 pixels each. This is illustrated in the part of figure 2 marked 4:2:0.

The description is mainly related to the encoding process. However, this has implications for how decoding must be performed. This means, for instance, that if transformation is performed on two levels at the encoder, the decoder must perform inverse transformation on two levels. The word "coding" is often used as a short expression covering the whole process of encoding and decoding. The invention covers the whole coding process, which is defined to contain both encoding and decoding.

The first aspect of the present invention is related to the coding describing the residual signal. In H.264/AVC the chrominance residual signal is described with two-level transforms. The 4:2:0 box in figure 2 indicates that the 8x8 pixel chrominance block is divided into 4x4 pixel sub-blocks. The residual signal in each of the 4x4 sub-blocks undergoes a 4x4 transformation resulting in one DC coefficient and 15 AC coefficients. The DC coefficient represents the average value over the 4x4 block.

According to the first aspect of the present invention, the 4x4 block size of the first transform of the chrominance residual signal is maintained. The number of such sub-blocks will then be different for the different picture formats. In a general notation, a kxk transform is used on a macro block of mxn (m in the horizontal direction, n in the vertical direction) chrominance pixels.

The DC coefficients of the 4x4 blocks then undergo a further transformation, which is a 2x2 transform in the 4:2:0 case, as indicated in figure 3. In the general case, an ixj transform is used for the DC coefficients, where i and j have values such that i x k = horizontal number of chrominance pixels in a macroblock and j x k = vertical number of chrominance pixels in a macroblock. The transform type is preferably chosen to be a two-dimensional Hadamard transform.
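The two transform levels described above can be sketched as follows. This is an illustration under stated assumptions, not the normative procedure: the first level uses the H.264/AVC 4x4 core transform matrix without scaling, the second level uses unnormalized Hadamard matrices of order 2 and 4, and all names (`two_level_transform`, `HADAMARD`, and so on) are hypothetical.

```python
K = 4  # first-level transform size (k in the text)

# Forward 4x4 core transform matrix of H.264/AVC (scaling stage omitted).
CF = [[1, 1, 1, 1], [2, 1, -1, -2], [1, -1, -1, 1], [1, -2, 2, -1]]

# Unnormalized Hadamard matrices; only orders 2 and 4 are covered here.
HADAMARD = {
    2: [[1, 1], [1, -1]],
    4: [[1, 1, 1, 1], [1, 1, -1, -1], [1, -1, -1, 1], [1, -1, 1, -1]],
}

def matmul(a, b):
    return [[sum(a[r][x] * b[x][c] for x in range(len(b)))
             for c in range(len(b[0]))] for r in range(len(a))]

def transpose(a):
    return [list(row) for row in zip(*a)]

def two_level_transform(residual):
    """residual: n rows of m chrominance residual values (m, n multiples of K).

    Returns (coefficients, dc_transformed), where dc_transformed has
    j = n/K rows and i = m/K columns.
    """
    n, m = len(residual), len(residual[0])
    j, i = n // K, m // K
    coefficients = [[0] * m for _ in range(n)]
    dc = [[0] * i for _ in range(j)]
    for by in range(j):
        for bx in range(i):
            # Cut out one KxK sub-block and apply the first-level transform.
            sub = [row[bx * K:(bx + 1) * K]
                   for row in residual[by * K:(by + 1) * K]]
            t = matmul(matmul(CF, sub), transpose(CF))
            for y in range(K):
                coefficients[by * K + y][bx * K:(bx + 1) * K] = t[y]
            dc[by][bx] = t[0][0]  # DC coefficient of this sub-block
    # Second level: Hadamard transform over the j x i array of DC values.
    dc_transformed = matmul(matmul(HADAMARD[j], dc), transpose(HADAMARD[i]))
    return coefficients, dc_transformed

# 4:2:2 example: 8 pixels wide, 16 pixels high, constant residual of 1.
flat_422 = [[1] * 8 for _ in range(16)]
_, dc_422 = two_level_transform(flat_422)
```

For the constant 4:2:2 input, the second-level result has 4 rows and 2 columns, and all of the signal ends up in its single top-left entry.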

The present invention also relates to the intra prediction part of the coding. In a preferred embodiment of the invention, DC-prediction for the 4:2:2 format is provided. DC-prediction predicts one value for a whole block. In this case we want to predict one value for all the pixels in an 8x16 block from the neighboring, already coded and decoded pixels. This is indicated in figure 4, where the 8x16 block shall be predicted from the 24 neighboring pixels in bold.

A natural prediction would be to take the average of all 24 bold pixels:

Prediction = Sum(24 neighboring pixels) / 24

However, it is desirable to avoid the division by 24. Therefore we use the following definition:

Prediction = (2 x Sum(8 pixels above) + Sum(16 pixels to the left)) / 32

In this way, the division by 32 can easily be implemented with a shift operation.
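A minimal sketch of this 8x16 DC-prediction with the division replaced by a right shift is given below. The function name and sample values are illustrative, and no rounding offset is added because the text does not specify one.

```python
def dc_predict_8x16(above, left):
    """DC-prediction for an 8 wide, 16 high block.

    above: the 8 neighboring pixels above the block
    left:  the 16 neighboring pixels to the left of the block
    """
    if len(above) != 8 or len(left) != 16:
        raise ValueError("expected 8 pixels above and 16 pixels to the left")
    # (2 * Sum(above) + Sum(left)) / 32, with the division done as a shift by 5.
    return (2 * sum(above) + sum(left)) >> 5

uniform = dc_predict_8x16([100] * 8, [100] * 16)
mixed = dc_predict_8x16([120] * 8, [60] * 16)

# Plain average over all 24 neighbors, for comparison (requires a division by 24).
plain_average = (sum([120] * 8) + sum([60] * 16)) // 24
```

With 120 above and 60 to the left, the shift-based definition yields 90 whereas the plain average yields 80: doubling the sum of the 8 upper pixels gives the upper and left edges equal weight.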

To take advantage of the shift operation in the general case, the DC-prediction has to be executed on rectangular blocks of size 2^q x 2^r, where q and r are integers and q > r. Here q is defined to represent a first dimension of the block and r is defined to represent a second dimension of the block. The first dimension may represent the vertical size and the second dimension may represent the horizontal size of the block, or vice versa. DC prediction of the block is formed as:

Prediction = (Sum(neighboring pixels to the first dimension) + 2^(q-r) x Sum(neighboring pixels to the second dimension)) / 2^(q+1)

It follows from the discussion above that m=2^q and n=2^r.
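The general form can be sketched in the same way, with both the multiplication by 2^(q-r) and the division by 2^(q+1) expressed as shifts. The function and argument names are hypothetical; as an assumption beyond the text, the sketch also accepts q equal to r, in which case the expression reduces to the plain average of the neighbors.

```python
def dc_predict(first_dim_neighbors, second_dim_neighbors):
    """General DC-prediction for a block with sides 2**q and 2**r.

    first_dim_neighbors:  the 2**q neighboring pixels along the longer side
    second_dim_neighbors: the 2**r neighboring pixels along the shorter side
    """
    q = len(first_dim_neighbors).bit_length() - 1
    r = len(second_dim_neighbors).bit_length() - 1
    if len(first_dim_neighbors) != 1 << q or len(second_dim_neighbors) != 1 << r:
        raise ValueError("neighbor counts must be powers of two")
    if q < r:
        raise ValueError("the first dimension must be the longer one")
    # (Sum(first) + 2^(q-r) * Sum(second)) / 2^(q+1), using shifts only.
    weighted = sum(first_dim_neighbors) + (sum(second_dim_neighbors) << (q - r))
    return weighted >> (q + 1)

# 8x16 case from the text: 16 pixels to the left (q = 4), 8 pixels above (r = 3).
pred_8x16 = dc_predict([60] * 16, [120] * 8)

# Square 8x8 case (q = r = 3), included as an assumption.
pred_8x8 = dc_predict([10] * 8, [30] * 8)
```

For q = 4 and r = 3 the expression reduces to the 8x16 definition given earlier, with a divisor of 32.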

With the present invention, the first level transform is kept unchanged in the sense that the chrominance pixels of a macroblock are divided into 4x4 subblocks, as indicated in 4:2:2 and 4:4:4 of figure 2, and each subblock undergoes a 4x4 transform. The second level transform of DC coefficients will be of size 2x4 and 4x4, respectively, for the two higher formats, as depicted in figure 3. Hence the main difference between coding the different formats is the second order residual chrominance transform.

Note that the scope of the present invention is not limited to H.264/AVC. It could advantageously also be utilized in connection with other video coding standards, e.g. SIP.

Claims

Patent claims

1. A method of video coding for transforming a first mxn macro block of residual chrominance pixel values of moving pictures by a first integer-transform function generating a corresponding second mxn macro block of integer-transform coefficients, then further transforming DC values of the integer-transform coefficients by a second integer-transform function generating a third block of integer-transformed DC coefficients, characterized in

generating the second mxn macro block of integer-transform coefficients by utilizing a kxk integer-transform function on each kxk sub-blocks of the first mxn macro block, wherein n and m is a multiple of k,

generating the third block of coefficients by utilizing a second ixj integer-transform function on the DC values resulting in a (m/k)x(n/k) third block of integer-transformed DC coefficients.

2. A method according to claim 1, characterized in that k=4, m=8 and n=8.

3. A method according to claim 1, characterized in that k=4, m=16 and n=8.

4. A method according to claim 1, characterized in that k=4, m=16 and n=16.

5. A method according to one of the preceding claims, characterized in that i=m/k and j=n/k.

6. A method according to one of the preceding claims, characterized in that the video coding is implemented according to the H.264 standard.

7. A method according to one of the preceding claims, characterized in that the integer-transform function is a Hadamard transform.

8. A method according to claim 1, characterized in that it further includes the following steps:

predicting one DC value associated with the first mxn macro block by means of m above-lying neighboring pixels (alnp) and n left-lying neighboring pixels (llnp) according to the following expression:

(2x(sum of m alnp) + (sum of n llnp))/2xm

9. A method according to claim 8, characterized in that m=8 and n=16 and that the division in the expression is executed by a shift operation.

10. A method of video decoding for transforming a first block of integer-transformed DC coefficients by a first inverse integer-transform function generating a number of DC values of a first mxn macro block of integer-transform coefficients which in turn are transformed by a second inverse integer-transform function generating a second mxn macro block of residual chrominance pixel values of moving pictures, characterized in

generating the number of DC values of the first mxn macro block of integer-transform coefficients by utilizing a first ixj inverse integer-transform function on the first block of integer-transformed DC coefficients,

generating the second mxn macro block of residual chrominance pixel values by utilizing a kxk inverse integer-transform function on each kxk sub-blocks of the first mxn macro block of integer-transform coefficients, wherein n and m is a multiple of k, and the first block of integer-transformed DC coefficients is of the size (m/k)x(n/k).

11. A method according to claim 10, characterized in that k=4, m=8 and n=8.

12. A method according to claim 10, characterized in that k=4, m=16 and n=8.

13. A method according to claim 10, characterized in that k=4, m=16 and n=16.

14. A method according to one of the preceding claims, characterized in that i=m/k and j=n/k.

15. A method according to one of the claims 10 - 14, characterized in that the video coding is implemented according to the H.264 standard.

16. A method according to one of the claims 10 - 15, characterized in that the integer-transform function is a Hadamard transform.