WO2000040031A1

WO2000040031A1 - Method and device for encoding a video signal

Info

Publication number: WO2000040031A1
Application number: PCT/EP1999/010199
Authority: WO
Inventors: Françoise Groliere; Eric Barrau
Original assignee: Koninklijke Philips Electronics N.V.
Priority date: 1998-12-29
Filing date: 1999-12-17
Publication date: 2000-07-06
Also published as: KR20010041441A; JP2002534863A; EP1057343A1

Abstract

In video communication, the quality and delay of the transmission depend on the bit rate control strategy, which must fit the number of bits generated for a given setting of the coder to the available bandwidth. In order to give appropriate guidelines to the coder, a pre-analysis step, based on approximate predictions (an empirical law for the motion information and statistics for the content of the picture), is carried out to predict this generated number of bits. It is followed by a decision step that adjusts the quantization step size and the rate of the successive pictures.

Description

Method and device for encoding a video signal.

The present invention relates to a method of encoding the successive pictures of a video signal, comprising the steps of subdividing each successive picture into a plurality of sub-pictures, transforming each sub-picture into coefficients, quantizing said coefficients with an applied step size, coding said quantized coefficients, and controlling the step size in conformity with a target value for the number of bits for encoding each successive picture. The invention also relates to a corresponding device. This invention is particularly adapted to real-time communication at low bit rate according to the so-called H.263 recommendation.

In low bit rate applications (videophony, videoconferencing), a system for image compression such as proposed in the H.263 standard is recommended. A video encoder according to said standard is for instance described in the book "Motion estimation algorithms for video compression", B. Furht et al., Kluwer Academic Publishers, 1997, chapter 2, pp.30-35. An H.263 video encoder such as illustrated in Fig.1 is based on a motion-compensated prediction from a previous image to the current one, followed by an orthogonal transformation (such as the DCT), which reduces the spatial redundancy of the pictures by decorrelating the picture elements and concentrating the energy into a few low-order coefficients, then a quantization, and an encoding operation for encoding the prediction error thus transformed and quantized. At least the first picture is a reference one, encoded without temporal prediction (i.e. according to an "intra mode", or I mode), and from time to time one further picture in every n pictures may also be coded according to said intra mode. The other type of picture, the P picture, is temporally predicted from earlier pictures.

As shown in Fig.1, the successive pictures P regularly arrive at the input of the encoding device (input video). A coding branch including a discrete cosine transform circuit 12, a quantization circuit 13 and a variable-length encoder 14 (DCT, Q, VLC respectively) processes these pictures (the circuit 12 in fact receives the difference between the input pictures and the predicted ones, available at the output of a subtracter 25) and sends the resulting coded variable bit rate bitstream to a buffer 15, the output of which is the constant bit rate output bitstream of the H.263 video encoder. Said output bitstream is also sent to a bitrate control circuit 30 for buffer regulation. A prediction branch is provided and comprises in series an inverse quantization circuit 21 (Q^-1), an inverse DCT circuit 22 (DCT^-1), an adder 23 (delivering the reconstructed previous picture RPP), a temporal prediction circuit 24 and the subtracter 25. The temporal prediction circuit 24 delivers a predicted picture PP, sent to the subtracter 25, and motion vectors MV, sent to the variable-length encoder 14; the prediction is based on a block-matching search carried out between the current picture CP, available at the output of a picture skipping circuit 26, and the reconstructed one RPP, available at the output of the adder 23. The output of the temporal prediction circuit 24 is also sent back to the other input of the adder 23 for the reconstruction of the previous picture RPP used for the temporal prediction.

The H.263 standard defines a hierarchical bitstream syntax with four layers: picture level, group of blocks (GOB) level, macroblock (MB) level, and block level (8 x 8 picture elements, or pixels), the block being the elementary unit over which the DCT operates. A macroblock includes four luminance blocks (covering a 16 x 16 area in a picture) and two chrominance blocks. The motion estimation and compensation implemented on the reconstructed previous picture RPP in the circuit 24 of the prediction branch operate on macroblocks.

A feedback connection 31 between the buffer 15 and the quantization circuit 13 makes it possible to obtain a finer or a coarser quantization. The coarseness of the quantization is defined by a quantization parameter for the first three layers (blocks, macroblocks, GOBs) and a fixed quantization matrix which sets the relative coarseness of quantization for each DCT coefficient. The picture skipping circuit 26 provided at the input of the encoder may also be used to reduce the bit rate (while keeping an acceptable picture quality). The number of skipped pictures is variable and depends on the output buffer fullness. The feedback connection 31 provided for buffer regulation is therefore related not only to quantization step size variations but also to picture skipping (and also to an intra/inter selection, which is controlled by a circuit 41 actuating or not a first switch 42 and a second switch 43).

This feedback connection gives guidelines to the encoder, for which the problem of bit rate control can be formulated as follows: given a predetermined bit rate and an input picture, which encoder setting has to be chosen? One possibility is a constant picture rate approach, in which the pictures are grabbed periodically and the quality is adapted to the complexity of each successive picture (in order to maintain the targeted bit rate) and is therefore highly variable from one picture to another. The other is a constant quality approach, in which the pictures are processed with a fixed quantization step but only when the encoder has finished processing the previous picture, i.e. at a highly variable picture rate adapted to the complexities of the pictures.

It is therefore an object of the invention to propose an improved encoding method in which a trade-off between picture rate and quality is sought, while also taking into account the fact that the delay between the input pictures and the displayed ones has to be well controlled (as constant as possible) in order to ensure that any scene is rendered as regularly as possible.

To this end the invention relates to a method such as described in the preamble of the description and in which said controlling step comprises a pre-analysis sub-step, based on an estimation of the number of bits respectively used for coding motion information between previous and current pictures and for coding said coefficients, and a decision sub-step, provided for adjusting the quantizing step size and the rate of the successive pictures.

The particular aspects of the invention will now be explained with reference to the embodiment described hereinafter and considered in connection with the accompanying drawings, in which:

Fig.1 shows a basic video compression scheme according to the H.263 standard;

Fig.2 shows the general structure of an encoder according to the invention;

Fig.3 shows how the scheme of Fig.2 works.

In a coding chain such as shown in Fig.1, the starting point of the feedback control carried out by means of the output buffer 15 is to fit the number of bits which will be used to code each successive current picture to the number of bits available on the transmission channel. When studying the bit generation during the coding operation of each picture, it appears that the largest share of the bits to be transmitted is related to the content of the picture. Bits are used for coding information at the picture level, GOB level, macroblock level and block level, but the most expensive parts in terms of number of bits generated are the macroblock and block levels, which concern motion estimation and DCT coefficient coding and depend entirely on the complexity of the current picture. In such a coding chain, the quality of the communication depends on a variable, quantizer-dependent part of the generated bits.

According to the invention, a pre-analysis of the current image is provided to roughly predict how many bits will be generated according to the encoder's setting. Said pre-analysis, described hereunder in a more detailed manner, predicts the number of bits generated for each possible quantization step of the encoder (pre-analysis step). It is followed by a decision step in which, after having compared said number of bits to the desired one, a setting for the encoder is found. If the corresponding quantization step is in accordance with a previously set quality range, the picture is coded with it (which means that a transmission is not authorized when the quality is too bad, i.e. when the quantizer step size is too large). Otherwise the worst authorized quantization step is chosen and a decreased picture rate is computed (and applied by means of the picture skipping circuit 26), in order to meet the bandwidth requirements. The computed setting is then used for the encoding process of the picture.

After the encoding operation of each GOB, the bit rate control allowed by the feedback connection for buffer regulation checks whether a discrepancy between predicted and desired numbers of bits has appeared during said encoding operation and, if necessary, the setting of the encoder is modified by changing the quantization step between two consecutive GOBs (the only authorized changes are plus or minus one).

The generated bits can be split into two parts, a first one corresponding to the headers and a second one corresponding to the real content of the current picture. The computation of the first part is easy, but that of the second one is more complex.

In order to transmit the content of a picture, two kinds of information are indeed needed : (a) the information of motion between the current picture and the previous one, and (b) the chrominance and luminance variations between a current macroblock and the corresponding one in the previous picture.

A first pre-analysis sub-step, related to the motion information, is based on an approximate prediction of the number of bits needed to code the motion vectors. More precisely, it has been found that an empirical law linking the mean motion of the complete picture and the number of bits needed to code all the motion data could be established. This law can be expressed in the form of the following equation (1):

$$PNB = 4 \cdot N \cdot \log\left(\frac{1000 \cdot Mean\_mv}{N}\right) \qquad (1)$$

where:

N = number of macroblocks having a non null motion vector;

Mean_mv = (sum of motion vectors)/(number of macroblocks per picture);

PNB = estimated number of bits used to code the motion information of a picture (related to the mean motion of the whole picture).

This law has the advantage of being simple and fast.
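As an illustration only, equation (1) can be sketched in Python as follows. The function name, the input format (one `(dx, dy)` pair per macroblock), the reading of "sum of motion vectors" as a sum of |dx| + |dy|, and the base-10 logarithm are all assumptions, since the text does not specify them.

```python
import math


def predict_motion_bits(motion_vectors, log=math.log10):
    """Estimate the bits needed to code a picture's motion data with equation (1).

    motion_vectors: one (dx, dy) pair per macroblock of the picture.
    log: logarithm function; the base is not stated in the text (base 10 assumed).
    """
    total_macroblocks = len(motion_vectors)
    # N: number of macroblocks having a non-null motion vector.
    n = sum(1 for dx, dy in motion_vectors if (dx, dy) != (0, 0))
    if n == 0:
        # Assumed convention: no motion to code, and the logarithm is undefined.
        return 0.0
    # Assumption: "sum of motion vectors" is taken as the sum of |dx| + |dy|.
    mean_mv = sum(abs(dx) + abs(dy) for dx, dy in motion_vectors) / total_macroblocks
    return 4 * n * log(1000 * mean_mv / n)
```

With four macroblocks of which two move, for instance `predict_motion_bits([(1, 0), (0, 0), (0, 2), (0, 0)])`, the sketch evaluates 4 x 2 x log(1000 x 0.75 / 2).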

A second pre-analysis sub-step, related to the number of bits needed to code the DCT coefficients, is based on a prediction done at the macroblock level. For each quantization step, statistics have been gathered on the number of bits used versus the sum of absolute differences (SAD) between the luminance data of the current and reference macroblocks. The SAD is the correlation measure between the original macroblock in the current picture and the displaced macroblock in the previous reconstructed picture, according to the relation (2):

$$SAD_N(x, y) = \sum_{i=1}^{N} \sum_{j=1}^{N} \left| \mathrm{current}(i, j) - \mathrm{previous}(i - x, j - y) \right| \qquad (2)$$

with x, y = displacement coordinates and N = size of the block, generally 8 or 16. It appears that the number of bits does not vary linearly with the SAD.
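A minimal sketch of relation (2), assuming pictures stored as lists of rows of luminance values. The `top` and `left` block-position parameters are hypothetical additions needed to locate the block inside a full picture, and x is taken to shift the first (row) index as the relation is written.

```python
def sad(current, previous, top, left, x, y, n=16):
    """Sum of absolute differences of relation (2) for one n x n block.

    current, previous: luminance pictures as lists of rows.
    top, left: position of the block in the current picture (hypothetical parameters).
    x, y: displacement applied when reading the previous reconstructed picture.
    The caller must keep the displaced block inside the previous picture.
    """
    total = 0
    for i in range(n):
        for j in range(n):
            cur = current[top + i][left + j]
            ref = previous[top + i - x][left + j - y]
            total += abs(cur - ref)
    return total
```

A block-matching search would evaluate this function over a window of (x, y) candidates and keep the displacement with the smallest SAD.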

This could be due to the coding mode of the DCT coefficients in the H.263 standard, which uses variable length coding for the more frequent coefficients and fixed length coding for the others. This could explain why, for each quantization step, three different areas are observed: one for small SAD, one for medium SAD, and one for large SAD.

This leads to different predictions depending on the SAD area: a compromise has been made for three regions which work over the total range of the quantization step (1 to 31). Third-order polynomial approximation laws have been computed for each quantization step for SAD < 500, 500 < SAD < 1000 and 1000 < SAD < 1500, a fixed value being chosen for SAD > 1500. Said compromise between accuracy and computing complexity seems to be rather good.
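The piecewise law described above could be sketched as follows for a single quantization step. The coefficient values and the saturation value are placeholders invented for illustration (the text does not publish the fitted laws), and assigning the boundary values 500, 1000 and 1500 to the upper region is an assumption.

```python
# Hypothetical coefficients (c0, c1, c2, c3) for ONE quantization step.
# These are placeholders only, not the laws fitted by the inventors.
EXAMPLE_LAWS = {
    "small": (0.0, 0.10, 0.0, 0.0),    # SAD < 500
    "medium": (10.0, 0.08, 0.0, 0.0),  # 500 <= SAD < 1000
    "large": (30.0, 0.06, 0.0, 0.0),   # 1000 <= SAD < 1500
}
EXAMPLE_SATURATION_BITS = 120.0        # fixed value for SAD >= 1500 (placeholder)


def predict_dct_bits(sad_value, laws=EXAMPLE_LAWS, saturation=EXAMPLE_SATURATION_BITS):
    """Predict the bits needed to code one macroblock's DCT coefficients."""
    if sad_value >= 1500:
        return saturation
    if sad_value < 500:
        region = "small"
    elif sad_value < 1000:
        region = "medium"
    else:
        region = "large"
    c0, c1, c2, c3 = laws[region]
    # Third-order polynomial c0 + c1*s + c2*s^2 + c3*s^3, evaluated with Horner's scheme.
    return c0 + sad_value * (c1 + sad_value * (c2 + sad_value * c3))
```

In a complete encoder one such table would exist per quantization step (1 to 31), and the picture-level prediction would be the sum of the per-macroblock predictions.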

In practice, the motion estimation of the whole picture is done at the very beginning of the coding. The SAD of each macroblock being known, the predicted number of bits is then computed for each quantization step, in order to determine which quantization step gives the prediction closest to the targeted number of bits. If the computed quantization step is too high with respect to the minimal quality, the quantization step is set to the maximum allowed.
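The selection just described can be sketched as below, assuming the picture-level predictions are already available as a mapping from quantization step to predicted bits. The function and parameter names are hypothetical.

```python
def choose_quantizer(predicted_bits, target_bits, q_max_allowed):
    """Pick the quantization step whose prediction is closest to the target.

    predicted_bits: dict mapping each quantization step to the predicted
        number of bits for the whole picture (hypothetical input format).
    q_max_allowed: coarsest step still meeting the minimal quality.
    """
    closest = min(predicted_bits, key=lambda q: abs(predicted_bits[q] - target_bits))
    # A step beyond the quality limit is replaced by the maximum allowed step.
    return min(closest, q_max_allowed)
```

For example, with predictions `{1: 9000, 2: 6000, 3: 4000, 4: 3000}` and a target of 4200 bits, step 3 is closest; if only steps up to 2 are allowed, step 2 is returned instead.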

Knowing the number of bits sent on the line, the time to grab the next picture is then computed. In the encoder according to the invention, shown in Fig.2, these pre-analysis sub-steps, referenced with the single reference 201, are followed by a decision sub-step 202.

The reference "coder 203" designates the association of all the elements of Fig.1, except the buffer 15 and the bitrate control circuit 30.

With respect to the basic scheme of Fig.1, Fig.2 illustrates at what level the above-described pre-analysis acts in the coding chain, and Fig.3 shows how the scheme of Fig.2 (i.e. the bit rate control according to the invention) works. Said regulation is here implemented by carrying out a set of software instructions controlling computation steps, test steps, or similar steps, as now described. The first step 31 computes the authorized target number of bits T_b for the input picture, taking into account the bandwidth BW, the fullness of the output buffer FOB and the target frame rate TFR, according to a relation (3) of the following type:

$$T_b = \frac{\text{bandwidth} - \text{buffer fullness}}{\text{target frame rate}} = \frac{BW - FOB}{TFR} \qquad (3)$$

The second step comprises a computing operation 321, provided for computing the predicted number of bits P_n needed for coding the actual picture with the smallest quantizer (or quantization) step which appears to be compatible with the bandwidth, the quality and the frame rate. These numbers T_b and P_n are then compared (test operation 322): if P_n is greater than T_b (output Y), the coding step of Fig.2 will be done with said quantization step, while, if P_n is smaller than T_b (output N), 1 is added to the quantization step, the operation 321 is repeated, and the test operation 322 is repeated with the modified value of P_n.

When P_n is greater than T_b, a quality test 33 is carried out: if the quantizer is under a predetermined quality threshold (Q_{n-1} < Q_max), the frame rate FR is equal to the target frame rate (connection 331), while, if it is not the case, a new smaller frame rate is computed (connection 332) according to the predicted number of bits for the minimal authorized quality. In both situations, a coding step 341 is then carried out for coding each group of blocks (GOB).
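The picture-level decision of Fig.3 might be sketched as follows. This is a rough illustration under several assumptions: relation (3) is taken to be (BW - FOB) / TFR; the quantizer is coarsened while the predicted bits exceed the target (on the reasoning that a coarser step yields fewer bits, which fixes a comparison direction for the loop); and the reduced frame rate is obtained by inverting relation (3) with the prediction at the coarsest allowed step. The `predict` callable and all names are hypothetical.

```python
def target_bits(bandwidth, buffer_fullness, target_frame_rate):
    """Target number of bits T_b for the picture, assuming relation (3)."""
    return (bandwidth - buffer_fullness) / target_frame_rate


def decide(predict, bandwidth, buffer_fullness, target_frame_rate, q_min=1, q_max=31):
    """Return (quantizer, frame_rate) for the current picture.

    predict(q): predicted bits P_n for the picture at quantizer q
        (hypothetical callable standing in for the pre-analysis).
    q_max: coarsest quantizer still giving an acceptable quality.
    """
    t_b = target_bits(bandwidth, buffer_fullness, target_frame_rate)
    q = q_min
    # Assumption: coarsen the quantizer while the prediction exceeds the target.
    while q < q_max and predict(q) > t_b:
        q += 1
    if predict(q) <= t_b:
        # Quality test passed: keep the target frame rate.
        return q, target_frame_rate
    # Quality limit reached: keep q_max and lower the frame rate so that the
    # predicted picture size fits (assumed inversion of relation (3)).
    frame_rate = (bandwidth - buffer_fullness) / predict(q_max)
    return q_max, frame_rate
```

With `predict = lambda q: 12000 / q`, a bandwidth of 64000, a buffer fullness of 4000 and a target of 10 pictures per second, the target is 6000 bits and the sketch settles on quantizer 2 at the full frame rate.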

The last step checks for each new GOB (test 342 NEW GOB ?) whether the coding prediction is in accordance (test PRED OK ?) with the actual coding (sub-step 343). If so, the next coding step continues with the same quantizer (connection 344); if not, 1 is added to or subtracted from the computed quantizer (sub-step 345, NEW QUANT = +/-1) in order to reduce the drift.
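The GOB-level correction can be illustrated by the sketch below. How the PRED OK test is evaluated is not detailed in the text, so the relative `tolerance` parameter is a hypothetical choice, as are the function and argument names.

```python
def adjust_quantizer(q, predicted_so_far, actual_so_far, tolerance=0.1, q_min=1, q_max=31):
    """Return the quantizer for the next GOB, changed by at most one step.

    predicted_so_far, actual_so_far: predicted and actually generated bits
        for the GOBs coded so far in the picture.
    tolerance: relative drift accepted before reacting (hypothetical parameter).
    """
    drift = actual_so_far - predicted_so_far
    if abs(drift) <= tolerance * predicted_so_far:
        return q                      # prediction OK: keep the same quantizer
    if drift > 0:
        return min(q + 1, q_max)      # too many bits spent: coarser quantization
    return max(q - 1, q_min)          # fewer bits than predicted: finer quantization
```

Limiting the change to one step per GOB matches the constraint stated in the description that only changes of plus or minus one are authorized between two consecutive GOBs.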

Claims

1. A method of coding the successive pictures of a video signal, comprising the steps of:

- subdividing each successive picture into a plurality of sub-pictures;

- transforming each sub-picture into coefficients;

- quantizing said coefficients with an applied step size;

- coding said quantized coefficients;

- controlling the step size in conformity with a target value for the number of bits for encoding each successive picture, said controlling step comprising a pre-analysis sub-step, based on an estimation of the number of bits respectively used for coding motion information between previous and current pictures and for coding said coefficients, and a decision sub-step, provided for adjusting the quantizing step size and the rate of the successive pictures.

2. A device for encoding the successive pictures of a video signal, comprising:

- means for dividing each successive picture into a plurality of sub-pictures;

- an encoder for encoding successively said sub-pictures, or groups of sub-pictures, said encoder including a picture transformer for transforming each sub-picture into coefficients and a quantizer for quantizing the coefficients with an applied step size;

- control means for controlling the quantization step size in conformity with a target value for the number of bits for encoding the applied picture; said device also comprising a pre-analysis stage, provided for estimating the number of bits respectively used for coding motion information between previous and current pictures and for coding said coefficients, and a decision stage, provided for adjusting the quantization step size and the rate of successive pictures.