WO2001049038A1

WO2001049038A1 - Method, device and computer programme generation for prediction in encoding an image divided into image blocks

Info

Publication number: WO2001049038A1
Application number: PCT/DE2000/004095
Authority: WO
Inventors: Ralf Buschmann
Original assignee: Siemens Aktiengesellschaft
Priority date: 1999-12-23
Filing date: 2000-11-21
Publication date: 2001-07-05

Abstract

A method for prediction in encoding an image, divided into image blocks, is disclosed, in which at least one image block, in the vicinity of a current image block, is used for prediction, whereby the at least one image block in the vicinity is fixed by a subsequent image block, which follows in the display sequence.

Description

Method, arrangement and computer program product for predicting the coding of an image divided into image blocks

The invention relates to a method, an arrangement and a computer program product for prediction in the coding of an image divided into image blocks.

A method for image compression with the associated arrangement is known from [1]. The known method serves as a coding method in the MPEG standard and is essentially based on the hybrid DCT (Discrete Cosine Transformation) with motion compensation. A similar method is used for video telephony at n x 64 kbit/s (CCITT Recommendations H.261, H.263), for TV contribution (CCIR Recommendation 723) at 34 or 45 Mbit/s, and for multimedia applications at 1.2 Mbit/s (ISO MPEG-1). The hybrid DCT consists of a temporal processing stage, which exploits the correlation between successive images, and a spatial processing stage, which exploits the correlation within an image.

Spatial processing (intra-frame coding) essentially corresponds to classic DCT coding. The image is broken down into blocks of 8x8 pixels, each of which is transformed into the frequency domain using the DCT. The result is a matrix of 8x8 coefficients, which approximately reflect the two-dimensional spatial frequencies in the transformed image block. The coefficient with frequency 0 (DC component) represents the average gray value of the image block.
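The relationship between the DC coefficient and the average gray value can be checked with a small sketch. It assumes an orthonormal 2D DCT-II, which is one common normalization and is not prescribed by the text; the function name dct_2d and the test block are illustrative only.

```python
import math

N = 8  # block size named in the text


def dct_2d(block):
    """Orthonormal 2D DCT-II of an N x N block given as a list of lists."""
    def scale(k):
        return math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)

    coeffs = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            total = 0.0
            for x in range(N):
                for y in range(N):
                    total += (block[x][y]
                              * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                              * math.cos((2 * y + 1) * v * math.pi / (2 * N)))
            coeffs[u][v] = scale(u) * scale(v) * total
    return coeffs


# Hypothetical smooth test block (a gentle brightness ramp).
block = [[100 + 2 * x + y for y in range(N)] for x in range(N)]
coeffs = dct_2d(block)
mean = sum(map(sum, block)) / (N * N)

# With this normalization the DC coefficient equals N times the block mean.
print(coeffs[0][0], N * mean)
```

For such a smooth block almost all of the energy sits in the DC coefficient and the lowest frequencies, which is the behaviour the subsequent coding steps rely on.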

After the transformation, the data is initially expanded. For natural images, however, the energy is concentrated around the DC component (DC value), while the highest-frequency coefficients are usually almost zero.

In a next step, the coefficients are spectrally weighted, so that the amplitude accuracy of the high-frequency coefficients is reduced. This exploits the properties of the human eye, which resolves high spatial frequencies less accurately than low ones.

The data reduction takes the form of an adaptive quantization, by means of which the amplitude accuracy of the coefficients is further reduced or by which the small amplitudes are set to zero. The degree of quantization depends on the fill level of the output buffer: when the buffer is empty, fine quantization takes place so that more data is generated, while when the buffer is full, quantization is coarser, which reduces the amount of data to be transferred.
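The buffer-controlled quantization can be sketched as follows. The linear mapping from fill level to step size and the step limits are assumptions chosen for illustration; the text only states that an empty buffer leads to fine and a full buffer to coarse quantization.

```python
def quantizer_step(buffer_fill, min_step=2, max_step=62):
    """Map a buffer fill level in [0, 1] to a quantizer step size.

    Hypothetical linear rule: empty buffer -> fine step, full buffer -> coarse step.
    """
    fill = min(max(buffer_fill, 0.0), 1.0)
    return min_step + (max_step - min_step) * fill


def quantize(coeffs, step):
    """Uniformly quantize coefficients; small amplitudes collapse to zero."""
    return [[int(round(c / step)) for c in row] for row in coeffs]


coeffs = [[100, 7, -3]]
fine = quantize(coeffs, quantizer_step(0.0))    # buffer empty
coarse = quantize(coeffs, quantizer_step(1.0))  # buffer full
```

With the coarse step the two small coefficients become zero, so fewer values remain to be transmitted.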

After the quantization, the block is scanned diagonally ("zigzag" scanning), followed by entropy coding, which brings about a further reduction in data. Two effects are exploited for this:

1.) The statistics of the amplitude values (high amplitude values occur less frequently than small ones), so that long codewords are assigned to the rare events and short codewords to the frequent events (variable-length coding, VLC). This results in a lower data rate on average than with a fixed word length coding. The variable rate of the VLC is then smoothed in the buffer memory.

2.) The fact that, from a certain value onward, in most cases only zeros follow.

Instead of all of these zeros, only an EOB code (End Of Block) is transmitted, which leads to a significant coding gain in the compression of the image data. Instead of the original 512 bits, for example, only 46 bits then have to be transmitted for this block, which corresponds to a compression factor of over 11.
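A minimal sketch of zigzag scanning with an end-of-block marker follows. The helper names and the EOB placeholder are illustrative; real codecs use run-length and variable-length codes rather than a Python list.

```python
EOB = "EOB"  # placeholder symbol for the end-of-block code


def zigzag_order(n=8):
    """Return the (row, col) positions of an n x n block in zigzag order."""
    return sorted(
        ((r, c) for r in range(n) for c in range(n)),
        # Sort by anti-diagonal; alternate the direction on each diagonal.
        key=lambda rc: (rc[0] + rc[1],
                        rc[0] if (rc[0] + rc[1]) % 2 else rc[1]),
    )


def scan_with_eob(block):
    """Zigzag-scan a quantized block and replace the trailing zeros by EOB."""
    seq = [block[r][c] for r, c in zigzag_order(len(block))]
    last = max((i for i, v in enumerate(seq) if v != 0), default=-1)
    return seq[: last + 1] + [EOB]


block = [[0] * 8 for _ in range(8)]
block[0][0], block[0][1], block[1][0] = 50, 4, -2
print(scan_with_eob(block))  # [50, 4, -2, 'EOB']
```

Here 64 coefficients shrink to three values plus the marker, which illustrates where the coding gain described above comes from.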

Another compression gain is obtained by the temporal prediction (interframe coding). A lower data rate is required for coding differential images than for the original images, because the amplitude values are much lower.

However, the temporal differences are small only if the movements in the picture are also small. If, on the other hand, the movements in the picture are large, large differences arise, which in turn are difficult to code. For this reason, the picture-to-picture movement is measured (motion estimation) and compensated before the difference is formed (motion compensation). The motion information is transmitted with the image information; usually only one motion vector per macroblock (e.g. four 8x8 image blocks) is used.

Even smaller amplitude values of the difference images are obtained if a motion-compensated bidirectional prediction is used instead of the prediction used.

In a motion-compensated hybrid coder, it is not the image signal itself that is transformed, but the temporal difference signal after the motion compensation. For this reason, the coder also has a temporal recursion loop, because the predictor must calculate the prediction value from the values of the already transmitted (coded) images. An identical temporal recursion loop is located in the decoder, so that the encoder and decoder are completely synchronized. In the MPEG-2 coding method, there are three main ways in which images can be processed:

I-pictures: No temporal prediction is used for the I-pictures, i.e. the picture values are directly transformed and encoded. I-pictures are used in order to be able to start the decoding process again without knowing the past, or to achieve resynchronization after transmission errors.

P-pictures: With the P-pictures, a temporal prediction is performed; the DCT is applied to the temporal prediction error.

B-pictures: With the B-pictures, the temporal bidirectional prediction error is calculated and then transformed. The bidirectional prediction works adaptively in principle, i.e. forward prediction, backward prediction or interpolation is permitted.

In MPEG-2 coding, a picture sequence is divided into so-called GOPs (Group Of Pictures); the n pictures from one I-picture to the next form a GOP. The distance between the P-pictures is denoted by m, with m-1 B-pictures lying between the P-pictures. The MPEG syntax, however, leaves the choice of m and n to the user. m = 1 means that no B-pictures are used, and n = 1 means that only I-pictures are encoded.
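The roles of n and m can be illustrated with a short sketch that lists the picture types of one GOP in display order. The function name gop_pattern is illustrative, and the sketch ignores the reordering an encoder applies for transmission.

```python
def gop_pattern(n, m):
    """Picture types of one GOP in display order.

    n: number of pictures from one I-picture to the next
    m: distance between P-pictures (m - 1 B-pictures in between)
    """
    types = []
    for k in range(n):
        if k == 0:
            types.append("I")
        elif k % m == 0:
            types.append("P")
        else:
            types.append("B")
    return "".join(types)


print(gop_pattern(12, 3))  # IBBPBBPBBPBB
print(gop_pattern(4, 1))   # IPPP  (m = 1: no B-pictures)
print(gop_pattern(1, 1))   # I     (n = 1: only I-pictures)
```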

[2] discloses a method for estimating motion in the context of a method for block-based image coding. It is assumed that a digitized image has pixels which are combined in blocks of 8x8 pixels or 16x16 pixels. If necessary, an image block can also comprise several image blocks.

In the case of a sequence of images, the following is done for an image to be encoded, taking into account the image blocks of this image:

For the image block for which a motion estimation is to be carried out, a value of an error measure is determined in a temporally preceding image, starting from the image block located at the same relative position in the previous image (= previous image block). For this purpose, the sum of the absolute differences between the coding information assigned to the pixels of the image block and of the previous image block is preferably determined.

Coding information here means brightness information (luminance value) and/or color information (chrominance value), each of which is assigned to a pixel. In a search space of a predeterminable size and shape around the starting position in the previous image, a value of the error measure is determined in each case for an area of the same size as the previous image block, shifted by one pixel or half a pixel.

For a search space of size n × n pixels, this yields n² (error) values. The shifted previous image block in the temporally preceding image for which the error measure yields a minimum error value is determined. For this image block, it is assumed that it best matches the image block of the image to be encoded for which the motion estimation is to be performed.

The result of the motion estimation is a motion vector, which describes the displacement between the image block in the image to be encoded and the selected image block in the previous image.

Compression of the image data is achieved in that the motion vector and the error signal are encoded.

In particular, the motion estimation is carried out for each image block of an image.
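The search described above can be sketched as a full search that minimizes the sum of absolute differences. This is a simplified illustration with integer-pixel shifts only; the function names, the search range and the test pattern are assumptions and do not reproduce the hierarchical method of [2].

```python
def sad(cur, ref, bx, by, dx, dy, size):
    """Sum of absolute differences between the current block at (bx, by)
    and the reference block shifted by (dx, dy)."""
    total = 0
    for y in range(size):
        for x in range(size):
            total += abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
    return total


def full_search(cur, ref, bx, by, size=8, search=2):
    """Test every shift in the search space; return (min_error, (dx, dy))."""
    h, w = len(ref), len(ref[0])
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            # Skip shifts that would leave the reference image.
            if not (0 <= by + dy and by + dy + size <= h
                    and 0 <= bx + dx and bx + dx + size <= w):
                continue
            err = sad(cur, ref, bx, by, dx, dy, size)
            if best is None or err < best[0]:
                best = (err, (dx, dy))
    return best


# Hypothetical test data: the current image equals the reference
# image displaced by 2 pixels horizontally and 1 pixel vertically.
ref = [[(7 * x + 13 * y) % 31 for x in range(16)] for y in range(16)]
cur = [[ref[(y + 1) % 16][(x + 2) % 16] for x in range(16)] for y in range(16)]
print(full_search(cur, ref, bx=4, by=4))  # (0, (2, 1))
```

The returned pair corresponds to the minimum error value and the motion vector that would be encoded together with the error signal.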

An object-based image compression method is based on a decomposition of the image into objects with arbitrary boundaries. The individual objects are encoded separately in different "Video Object Planes", transmitted and reassembled in a receiver (decoder). As described above, in conventional coding methods, the entire picture is divided into square picture blocks. This principle is also adopted in object-based methods, in that the object to be coded is divided into square blocks and a motion estimation with motion compensation is carried out separately for each block. The coding of edge blocks is problematic since the edge of the object generally does not match the block edges.

There are different search strategies for the motion estimation. A so-called "block matching method" is used for block-based image compression methods. It is based on comparing the picture block to be coded with picture blocks of the same size in a reference picture. One of the reference picture blocks is in the same position as the picture block to be coded; the other reference picture blocks are shifted in relation to it. With a large search area in the horizontal and vertical directions, there are numerous search positions, so that a correspondingly large number of block comparisons must be carried out for a complete search. As a criterion for a good match between the block to be encoded and the reference block, the sum of the absolute differences of the individual pixels is generally used. Such a search procedure is also often combined with so-called rate constraints according to [2] in order to make the motion vectors robust against incorrect estimates. This improves the picture quality.

In all previously standardized video coding methods (H.261, H.263, MPEG-1, MPEG-2 and MPEG-4), the image currently to be coded is temporally predicted block by block (block size 16x16 or 8x8 pixels) from the previous image with the aid of displacement information (displacement = motion estimate; displacement vector = motion vector).

The quality of the prediction determines the data rate for the transmission of the image and thus the image quality at the receiver (decoder). The blocks are usually processed from left to right within a block row and from top to bottom within an image, and are transmitted to the decoder in that order. This processing creates a local neighborhood for an image block currently to be transmitted, which at best consists of the three adjacent picture blocks above and the picture block to the left, all of which have already been transmitted.
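The conventional neighborhood described here can be written down directly. The sketch below assumes blocks are addressed by (row, column) indices, which is a convention chosen for illustration.

```python
def causal_neighbors(row, col, rows, cols):
    """Blocks already transmitted when block (row, col) is processed
    row by row from left to right: three blocks above and one to the left."""
    candidates = [
        (row - 1, col - 1), (row - 1, col), (row - 1, col + 1),  # row above
        (row, col - 1),                                          # left block
    ]
    return [(r, c) for r, c in candidates if 0 <= r < rows and 0 <= c < cols]


print(causal_neighbors(1, 1, rows=2, cols=3))
# [(0, 0), (0, 1), (0, 2), (1, 0)]
```

The block to the right, (1, 2) in this example, is never part of this neighborhood because it has not yet been transmitted.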

It is disadvantageous here that only insufficient information is taken into account for the current image block due to the neighborhood described. This results in large errors, which must be corrected accordingly, which manifests itself in the high data rate or reduced image quality.

The object of the invention is to carry out a prediction which delivers significantly improved results with regard to data rate and performance and thus enables either a lower data rate for the same image quality or a higher image quality for the same data rate. This object is achieved according to the features of the independent claims.

To solve the problem, a method for prediction in the coding of an image subdivided into image blocks is specified, in which at least one image block in the neighborhood is used for prediction for a current image block, the at least one image block in the neighborhood being determined by an image block that follows in the display order.

The image data transmission, that is to say the transmission of the digital image divided into image blocks, preferably takes place from an encoder to a decoder. The image data are transmitted as a data stream, the data stream being unpacked and, in particular, displayed on the decoder side. Image compression with prediction is expediently carried out at the encoder.

The display order refers in particular to the order in which the image blocks are displayed at the decoder, i.e. at the receiver after decoding.

It should be noted here that the prediction can take place with or without subsequent error correction. Error correction (error compensation) compensates for errors associated with prediction (differences between image blocks).

A further development is that the prediction is an intra-prediction. This means that the prediction is carried out based on a causal neighborhood (intra-prediction), i.e. by means of image blocks that follow in the display order. For example, row-by-row coding of the image blocks of an image results in a display order from left to right (per row) until the picture has been processed. A prediction of a current image block (or a part thereof) based on an image block to its right fulfills the above-mentioned condition.

Another development is that prediction is a motion-compensated prediction (inter-coding, movement estimation). Image blocks, in particular, are predicted using a previous image (and the image blocks contained therein) (motion estimation, explanations see above).

It is thus possible to code the current image block, and in particular to carry out the prediction, in the context of at least one image block that follows in the display order. For this purpose, the current block is preferably buffered and the subsequent block is coded first. The current image block is then coded on the basis of the image block that has just been transmitted but follows in the display order.
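The buffering step can be pictured as a reordering of the block sequence. The following sketch is only one possible reading: it assumes that each deferred block simply swaps places with its immediate successor, and the block labels are hypothetical.

```python
def coding_order(display_order, deferred):
    """Derive a coding order from the display order.

    Each block in `deferred` is buffered and coded after its immediate
    successor, so the successor is available for its prediction.
    """
    order = list(display_order)
    i = 0
    while i < len(order) - 1:
        if order[i] in deferred:
            order[i], order[i + 1] = order[i + 1], order[i]
            i += 2  # skip past the pair that was just swapped
        else:
            i += 1
    return order


print(coding_order(["A", "B", "C"], deferred={"B"}))  # ['A', 'C', 'B']
```

The decoder would have to apply the inverse reordering before display, which is why the deviation from the display order needs to be signalled or agreed upon.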

When considering the neighborhood or the context for the current image block, a motion vector for the current image block is determined as part of the motion estimation. The motion vector is predicted on the basis of the motion vectors of the blocks in the local neighborhood, so that the smallest possible difference to the motion vectors from the neighborhood results in a low data transmission rate. A large difference in the motion vector would mean that a correspondingly large amount of data has to be transmitted, so that the required data rate increases or the image quality decreases accordingly.
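A sketch of differential motion vector coding follows. The component-wise median used as predictor is an assumption borrowed from common practice; the text does not specify how the neighboring vectors are combined.

```python
from statistics import median_low


def predict_mv(neighbor_mvs):
    """Predict a motion vector as the component-wise median of the
    neighbors' vectors (assumed predictor, not taken from the text)."""
    xs = [mv[0] for mv in neighbor_mvs]
    ys = [mv[1] for mv in neighbor_mvs]
    return (median_low(xs), median_low(ys))


def mv_residual(mv, neighbor_mvs):
    """Difference between the estimated vector and its prediction;
    only this difference would need to be transmitted."""
    px, py = predict_mv(neighbor_mvs)
    return (mv[0] - px, mv[1] - py)


# Hypothetical vectors of five neighbors, including the block that
# follows the current block in the display order.
neighbors = [(2, 0), (3, 1), (2, 1), (8, -4), (3, 0)]
print(predict_mv(neighbors))           # (3, 0)
print(mv_residual((3, 1), neighbors))  # (0, 1)
```

A larger neighborhood gives the predictor more evidence, so the residual tends to be smaller; this is the motivation for including subsequent blocks.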

A further development consists in that in particular several picture blocks from the local neighborhood are used to predict the current picture block. Thus, both image blocks following the display sequence for the current image block and previous image blocks can be used for the prediction of the current image block.

One embodiment consists in that the current image block is subdivided into a first part and a second part, it being possible to use different blocks of the local neighborhood, including at least one of the image blocks following in the display order, to predict the first and the second part.

An example of an intra-prediction is that an edge divides the current image block into a left sub-block and a right sub-block. The edge in this case is a texture edge, i.e. the texture in the left sub-block can be better predicted from the predecessor block and the texture in the right sub-block can be better predicted from the successor block.

In the temporal prediction from the previous picture, the edge is expediently an edge of a moving object. It is particularly important that the image areas on the left and right in the block are caused by different shifts in the image signal in the previous block, i.e. they each have different displacement vectors.

This means that a certain sequence must be followed for the creation of the data stream, in particular the image data are processed in rows, from left to right, in blocks.

It is therefore advantageous that the first part is a left part of the current image block and the second part is a right part of the current image block. With this display order, the prediction of the left part of the current image block refers in particular to the preceding left image block, whereas the right part of the current image block is decoded using, among others, the right image block.

It is also an embodiment that the subdivision of the current image block is determined analytically. If, for example, an edge is involved, it is advantageous to describe this edge functionally. An approximate description of the course of the edge, for example as a straight line, saves a significant data rate.
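The saving from an analytical edge description can be illustrated by generating the block partition from just two parameters. The parameterization below (edge position in the top and bottom row) is a hypothetical choice for a straight line and is not taken from the text.

```python
def split_mask(size, x_top, x_bottom):
    """Partition a size x size block along a straight edge.

    The edge runs from column x_top in the first row to column x_bottom
    in the last row. Returns a mask with 0 = left part, 1 = right part.
    """
    mask = []
    for y in range(size):
        t = y / (size - 1) if size > 1 else 0.0
        x_edge = x_top + t * (x_bottom - x_top)  # edge position in this row
        mask.append([0 if x < x_edge else 1 for x in range(size)])
    return mask


for row in split_mask(4, 1, 3):
    print(row)
```

Two numbers replace one bit per pixel (64 bits for an 8x8 block), at the cost of approximating the true edge by a line.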

It should be noted here that the analytical description of the edge or the subdivision of the current image block can also be described using a specific syntax selected for the coding. By means of certain coding patterns, the course of object edges within the current image block can be determined. Such coding has the advantage that it can be extremely memory efficient.

Another embodiment consists in that a decoding of an image divided into image blocks, which has been encoded according to the above method, is carried out, the current image block being predicted on the basis of at least one image block lying in the neighborhood.

It is also a further development that the information about this subdivision, in the above example the edge, can be suitably implemented on the decoder side, that is to say is correctly recognized accordingly. For this purpose, the decoder can expediently be expanded to include a semantics of the coding by dividing the currently received image data block. Such coding can be coupled with corresponding information for restoring the left sub-block and the right sub-block of the current image block. If different environments (contexts, expertise) are taken into account for the right or left sub-block, this can be communicated to the decoder by suitable coding on the side of the encoder. Such coding, as is customary in communication protocols, can be carried out by different masking of code words.

Also to solve the problem, an arrangement for prediction in the coding of an image divided into image blocks is specified, in which a processor unit is provided which is set up in such a way that at least one image block lying in the neighborhood is used for the prediction for a current image block, the at least one image block lying in the neighborhood being determined by an image block following in the display order.

Furthermore, to solve the problem, a computer program product for prediction in the coding of an image subdivided into image blocks is specified which, when loaded into a program memory of a processor unit, enables the following steps to be carried out: for a current picture block, at least one picture block lying in the neighborhood is used for prediction, wherein the at least one picture block lying in the neighborhood can be determined by a picture block following in the display order.

The arrangement and the computer program product are particularly suitable for carrying out the method according to the invention or one of its developments explained above.

Exemplary embodiments of the invention are illustrated and explained below with reference to the drawings. The figures show:

Fig. 1 a sketch showing a coding/display order of picture blocks;

Fig. 2 a case distinction into different block modes;

Fig. 3 a sketch with an image encoder and an image decoder;

Fig. 4 a processor unit.

Fig. 1 shows a sketch that illustrates a coding or display order of image blocks.

In Fig. 1, two rows 111 and 112 are shown with image blocks 101 to 106. Different block modes I to V for block 105 are described below. In this case, an edge 113 optionally runs through the image blocks 102 and 105, left sub-blocks 107 and 109 and right sub-blocks 108 and 110 being formed in each case. There are the following options for block modes (combination modes):

Intra-prediction is understood to mean that a prediction is made only from the currently decoded image signal, i.e. the local environment within the current image is taken into account in the prediction. In inter-prediction, by contrast, the (temporally) preceding image signal (image) is also taken into account.

Block mode I:

The image block 105 is not divided into two parts by an edge; it is predicted from surrounding blocks (of the current image), in particular on the basis of the subsequent block 106. The prediction takes place in particular in the signal domain or in the DCT domain.

Block mode II:

The image block 105 is divided by the edge 113 into the left sub-block 109 and the right sub-block 110. Sub-blocks 109 and 110 are predicted individually: for sub-block 109, a prediction results from blocks or sub-blocks 101, 107 and 104. For sub-block 110, a prediction results from blocks or sub-blocks 106, 108 and 103. In this case, the edge describes a texture edge, for example a transition between the colors black and white.

Block mode III:

An inter-prediction (without edge) is to be carried out for the image block 105. For this purpose, a motion estimation is carried out for block 105; the resulting displacement vector (motion vector) is predicted on the basis of the displacement vectors of the surrounding blocks 101, 102, 103, 104 and 106. If necessary, the residual error image is encoded. This results in a saving in the bit rate required for the displacement vector.

Block mode IV:

Analogously to block mode II, an inter-prediction is now made taking edge 113 into account. The edge is preferably an object edge of a moving object, i.e. the sub-blocks 109 and 110 have different displacements. Accordingly, the displacement vector for sub-block 109 is determined from the image blocks or sub-blocks 101, 107 and 104, the displacement vector for sub-block 110 is determined from the image blocks or sub-blocks 108, 103 and 106, and both are encoded. Preferably, the displacement vector for sub-block 109 can be set equal to the displacement vector for block 104 and, accordingly, the displacement vector for sub-block 110 can be set equal to the displacement vector for block 106. The advantage of this block mode is above all the saving in data rate due to a lower residual error.
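Motion compensation with one displacement vector per sub-block might look like the following sketch. It assumes the partition is available as a 0/1 mask, uses integer-pixel vectors and omits boundary checks; all names and test values are illustrative.

```python
def predict_block_two_vectors(ref, bx, by, mask, mv_left, mv_right):
    """Predict a block from the previous image `ref` using one
    displacement vector per sub-block.

    mask[y][x] == 0 selects the left sub-block (vector mv_left),
    mask[y][x] == 1 selects the right sub-block (vector mv_right).
    Vectors are (dx, dy); no bounds checking is done in this sketch.
    """
    size = len(mask)
    pred = []
    for y in range(size):
        row = []
        for x in range(size):
            dx, dy = mv_right if mask[y][x] else mv_left
            row.append(ref[by + y + dy][bx + x + dx])
        pred.append(row)
    return pred


# Hypothetical data: the left half is static, the right half has moved.
ref = [[10 * y + x for x in range(8)] for y in range(8)]
mask = [[0, 0, 1, 1] for _ in range(4)]
pred = predict_block_two_vectors(ref, 2, 2, mask,
                                 mv_left=(0, 0), mv_right=(1, -1))
print(pred[0])  # [22, 23, 15, 16]
```

In the preferred variant of the text, mv_left would be copied from the left neighbor and mv_right from the right neighbor, so that no vector has to be transmitted for either sub-block.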

Block mode V:

Combinations of intra-prediction and inter-prediction based on the individual sub-blocks are also possible. For example, block mode II can be used for sub-block 109 and block mode III for sub-block 110.

It should be noted here that the block modes II, IV and V are expected to offer the greatest savings in data rate.

Fig. 2 shows a sketch with a case distinction and the possibilities for combining the block modes explained for Fig. 1.

In step 201, a decision is made as to whether an edge (113 in Fig. 1) is present. If this is not the case, a branch is made to point 202 and a decision is made as to whether an inter-prediction 203 or an intra-prediction 204 should be carried out. If an edge is present, the edge 113 divides the image block into a left sub-block 205 (see reference number 109 in Fig. 1) and a right sub-block 206 (see reference number 110 in Fig. 1). The left sub-block 205 and the right sub-block 206 can now each be either intra-predicted or inter-predicted (cf. blocks 207 to 210).
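The case distinction can be condensed into a small lookup that maps the decisions to block modes I to V as defined for Fig. 1. The function signature and the string labels are illustrative assumptions.

```python
def block_mode(has_edge, whole="intra", left="intra", right="intra"):
    """Map the case distinction to the block modes I to V.

    has_edge: whether an edge divides the block (step 201)
    whole:    prediction type for an undivided block
    left, right: prediction types of the two sub-blocks
    """
    if not has_edge:
        return "I" if whole == "intra" else "III"
    if left == "intra" and right == "intra":
        return "II"
    if left == "inter" and right == "inter":
        return "IV"
    return "V"  # mixed intra/inter prediction of the sub-blocks


print(block_mode(False, whole="inter"))                # III
print(block_mode(True, left="intra", right="inter"))   # V
```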

Fig. 3 shows a sketch of an arrangement for carrying out a block-based image coding method.

A video data stream to be encoded with temporally successive digitized images is fed to an image coding unit 1201. The digitized images are divided into macro blocks 1202, each macro block having 16x16 pixels. The macro block 1202 comprises four picture blocks 1203, 1204, 1205 and 1206, each picture block containing 8x8 picture elements to which luminance values (brightness values) are assigned. Furthermore, each macro block 1202 comprises two chrominance blocks 1207 and 1208 with the chrominance values assigned to the pixels (color information, color saturation).

The block of an image contains a luminance value (= brightness), a first chrominance value (= color) and a second chrominance value (= color saturation). The luminance value, first chrominance value and second chrominance value are referred to as color values.

The image blocks are fed to a transformation coding unit 1209. In the case of differential image coding, values to be coded from image blocks of temporally preceding images are subtracted from the image blocks currently to be coded; only the difference information 1210 is supplied to the transformation coding unit (Discrete Cosine Transformation, DCT) 1209. For this purpose, the current macroblock 1202 is communicated to a motion estimation unit 1229 via a connection 1234. In the transformation coding unit 1209, spectral coefficients 1211 are formed for the picture blocks or difference picture blocks to be coded and fed to a quantization unit 1212.

Quantized spectral coefficients 1213 are fed to both a scan unit 1214 and an inverse quantization unit 1215 on a backward path. After a scanning process, e.g. a "zigzag" scanning method, entropy coding is carried out on the scanned spectral coefficients 1232 in an entropy coding unit 1216 provided for this purpose. The entropy-coded spectral coefficients are transmitted as coded image data 1217 via a channel, preferably a line or a radio link, to a decoder.

In the inverse quantization unit 1215 there is an inverse quantization of the quantized spectral coefficients 1213. Spectral coefficients 1218 obtained in this way are supplied to an inverse transformation coding unit 1219 (inverse discrete cosine transformation, IDCT). Reconstructed coding values (also differential coding values) 1220 are supplied to an adder 1221 in the differential image mode. The adder 1221 also receives coding values of an image block, which result from a temporally previous image after motion compensation has already been carried out. Reconstructed image blocks 1222 are formed with the adder 1221 and stored in an image memory 1223.

Chrominance values 1224 of the reconstructed image blocks 1222 are supplied from the image memory 1223 to a motion compensation unit 1225. For brightness values 1226, an interpolation is carried out in an interpolation unit 1227 provided for this purpose. The interpolation preferably doubles the number of brightness values contained in the respective image block. All brightness values 1228 are supplied to both the motion compensation unit 1225 and the motion estimation unit 1229. The motion estimation unit 1229 also receives the picture blocks of the macroblock to be coded in each case (16x16 picture elements) via the connection 1234.

In the motion estimation unit 1229, the motion estimation is carried out taking into account the interpolated brightness values ("motion estimation on a half-pixel basis"). During the motion estimation, absolute differences of the individual brightness values are preferably determined between the macroblock 1202 currently to be coded and the reconstructed macroblock from the previous image.
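The interpolation for half-pixel motion estimation can be sketched with bilinear averaging. Bilinear interpolation is an assumption here; the text only states that the number of brightness values is preferably doubled.

```python
def half_pel_upsample(img):
    """Insert bilinearly interpolated samples between all pixels.

    An h x w image becomes (2h - 1) x (2w - 1): even positions hold the
    original samples, odd positions the averages of their neighbors.
    """
    h, w = len(img), len(img[0])
    out = [[0.0] * (2 * w - 1) for _ in range(2 * h - 1)]
    for y in range(2 * h - 1):
        for x in range(2 * w - 1):
            y0, x0 = y // 2, x // 2
            y1 = min(y0 + y % 2, h - 1)
            x1 = min(x0 + x % 2, w - 1)
            out[y][x] = (img[y0][x0] + img[y0][x1]
                         + img[y1][x0] + img[y1][x1]) / 4
    return out


for row in half_pel_upsample([[0, 4], [8, 12]]):
    print(row)
```

A block search on the upsampled image with a step of one sample then corresponds to a search with half-pixel accuracy on the original image.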

The result of the motion estimation is a motion vector 1230, by means of which a local displacement of the selected macroblock from the temporally previous image to the macroblock 1202 to be coded is expressed.

Both brightness information and chrominance information relating to the macroblock determined by the motion estimation unit 1229 are shifted by the motion vector 1230 and subtracted from the coding values of the macroblock 1202 (see data path 1231).

A processor unit PRZE is shown in Fig. 4. The processor unit PRZE comprises a processor CPU, a memory MEM and an input/output interface IOS, which is used in different ways via an interface IFC: via a graphics interface, an output is made visible on a monitor MON and/or output on a printer PRT. An input is made using a mouse MAS or a keyboard TAST. The processor unit PRZE also has a data bus BUS, which connects the memory MEM, the processor CPU and the input/output interface IOS. Furthermore, additional components can be connected to the data bus BUS, for example additional memory, data storage (hard disk) or a scanner.

Bibliography :

[1] J. De Lameillieure, R. Schäfer: "MPEG-2 image coding for digital television", Fernseh- und Kino-Technik, vol. 48, no. 3/1994, pp. 99-107.

[2] M. Bierling: "Displacement Estimation by Hierarchical Block Matching", SPIE, Vol. 1001, Visual Communications and Image Processing '88, pp. 942-951, 1988.

Claims

1. A method for prediction in the coding of an image subdivided into image blocks, in which at least one image block lying in the neighborhood is used for prediction for a current image block, the at least one image block lying in the neighborhood being determined by an image block following in the display order.

2. The method of claim 1, wherein the prediction is an intra-prediction.

3. The method of claim 1, wherein the prediction is a motion-compensated prediction.

4. The method according to any one of the preceding claims, wherein the current image block is subdivided into a first part and a second part, wherein for the first part the prediction is carried out based on at least one image block preceding in the display order and for the second part the prediction is additionally carried out based on the subsequent image blocks.

5. The method of claim 4, wherein the first part is a left part of the current image block and the second part is a right part of the current image block.

6. The method according to claim 4 or 5, in which the division of the current image block is predetermined by an edge.

7. The method according to claim 6, wherein the edge can be determined analytically.

8. A method for prediction in the decoding of an image divided into image blocks, which was encoded according to one of claims 1 to 7, in which the current image block is predicted on the basis of the at least one image block lying in the neighborhood.

9. A method for prediction in the decoding of an image divided into image blocks, which was encoded according to one of claims 4 to 7, in which the first part of the current image block is predicted from the previous image block and in which the second part of the current image block is predicted on the basis of the subsequent image blocks.

10. The method according to claim 9, wherein information about the subdivision of the current picture block is evaluated at the decoder and entered into the current picture block.

11. Arrangement for prediction in the coding of an image divided into image blocks, in which a processor unit is provided which is set up in such a way that, for a current image block, at least one image block lying in the neighborhood is used for prediction, the at least one image block lying in the neighborhood being determined by an image block following in the display order.

12. Computer program product for prediction in the coding of an image subdivided into image blocks, which, when loaded into a program memory of a processor unit, enables the following steps to be carried out: for a current image block, at least one image block lying in the neighborhood is used for prediction, the at least one image block lying in the neighborhood being determinable by an image block following in the display order.