EP1782634A1

EP1782634A1 - Method and device for coding and decoding

Info

Publication number: EP1782634A1
Application number: EP05764634A
Authority: EP
Inventors: Peter Amon; Andreas Hutter; Benoit Timmermann
Original assignee: Siemens AG
Current assignee: Siemens AG
Priority date: 2004-08-27
Filing date: 2005-07-29
Publication date: 2007-05-09
Also published as: KR101240441B1; WO2006024584A1; US8290058B2; KR20070046880A; JP5300921B2; US20080095241A1; JP2008511226A; CN101010961B; DE102004041664A1; CN101010961A; JP2011172297A

Abstract

The invention relates to a method for video coding of image sequences, in which images of the image sequence are coded in a scaled manner such that the resulting video data contains information that allows the images to be represented in a plurality of different levels of image resolution and/or image quality (e.g. depending on the data rate), the resolution being defined by the number of pixels per represented image. Coding is block-based: to describe any movement of parts of the images contained in the image sequence, at least one block structure describing the movement is produced, which is divided, starting from a block, into partial blocks, some of which are in turn divided into successively finer sub-blocks. According to the invention, a first block structure is produced temporarily for at least one first resolution level and a second block structure for a second resolution level, the first resolution level having a lower pixel count and/or image quality than the second resolution level. The second block structure is compared with the first block structure such that differences in the block structure are determined and, on the basis of properties of the structure differences, a modified second block structure is produced whose structure represents a subset of the second block structure. Subsequently, the modified second block structure and the second block structure are compared on the basis of at least one value proportional to the image quality, and the bit sequence is coded on the basis of the block structure whose value is directly proportional to the better quality.

Description

METHOD AND DEVICE FOR CODING AND DECODING

The invention relates to a method for video coding according to the preamble of claim 1, a method for decoding according to the preamble of claim 22, a coder for video coding according to the preamble of claim 23 and a decoding device according to the preamble of claim 24.

Digital video data is usually compressed for storage or transmission in order to significantly reduce the enormous volume of data. The compression takes place both by eliminating the signal redundancy contained in the video data and by eliminating the irrelevant signal parts which are imperceptible to the human eye. This is generally achieved by a hybrid coding method in which the image to be coded is first predicted temporally and the remaining prediction error is then transformed into the frequency domain, for example by a discrete cosine transform, quantized there and encoded by a variable-length code. The motion information and the quantized spectral coefficients are finally transmitted.
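The transform and quantization steps of such a hybrid coder can be illustrated with a minimal sketch. It assumes a one-dimensional prediction error of four samples, an orthonormal DCT-II and a uniform quantizer; the function names `dct_ii` and `quantize` and the step size are illustrative assumptions and are not taken from the application.

```python
import math

def dct_ii(samples):
    """Orthonormal 1-D DCT-II of a list of samples (illustrative, not optimized)."""
    n = len(samples)
    coeffs = []
    for k in range(n):
        acc = sum(samples[i] * math.cos(math.pi * (i + 0.5) * k / n) for i in range(n))
        scale = math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n)
        coeffs.append(scale * acc)
    return coeffs

def quantize(coeffs, step):
    """Uniform quantization: divide by the step size and round to an integer index."""
    return [round(c / step) for c in coeffs]

# Hypothetical prediction error (current samples minus predicted samples).
prediction_error = [4, 4, 4, 4]
levels = quantize(dct_ii(prediction_error), step=2)
```

For a constant prediction error only the first (DC) coefficient is non-zero after quantization, which is why an accurate prediction leaves little data to be entropy coded.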

The better this prediction of the next image information to be transmitted is, the smaller the prediction error remaining after the prediction and the lower the data rate that must subsequently be used for coding this error. An essential task in the compression of video data is thus to obtain the most accurate possible prediction of the picture to be coded from the previously transmitted picture information. The prediction of an image has hitherto been effected by initially dividing the image, for example, into regular sections, typically square blocks of size 8 × 8 or 16 × 16 pixels, and then determining a prediction for each of these picture blocks by motion compensation from the image information already known in the receiver. (However, blocks of different size may also result.) Such an approach can be taken from FIG. Two basic cases of prediction can be distinguished:

- Uni-directional prediction: The motion compensation takes place here exclusively on the basis of the previously transmitted image and leads to so-called "P-frames".
- Bi-directional prediction: The prediction of the image is done by superimposing two images, one of which precedes the image in time and the other follows it in time, which leads to so-called "B-frames". It should be noted here that both reference pictures have already been transmitted.
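The two prediction cases can be contrasted in a short sketch. It assumes that the reference blocks are already motion compensated and that the superposition in the bi-directional case is a plain average; both are simplifying assumptions, and all names and sample values are hypothetical.

```python
def predict_p(prev_block):
    """Uni-directional prediction: use the block of the previously transmitted image."""
    return list(prev_block)

def predict_b(prev_block, next_block):
    """Bi-directional prediction: superimpose (here: average) two reference blocks."""
    return [(a + b) / 2 for a, b in zip(prev_block, next_block)]

def residual(current, prediction):
    """Prediction error that remains to be coded."""
    return [c - p for c, p in zip(current, prediction)]

# Hypothetical sample rows of the preceding, following and current image.
prev_block = [10, 20, 30, 40]
next_block = [14, 24, 34, 44]
current = [12, 22, 32, 42]

error_p = residual(current, predict_p(prev_block))
error_b = residual(current, predict_b(prev_block, next_block))
```

In this constructed example the bi-directional prediction error vanishes, while the uni-directional one does not; real sequences will of course behave less ideally.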

Corresponding to these two possible cases of prediction, motion-compensated temporal filtering (MCTF) yields five directional modes in the method of MSRA [1], as can be seen in FIG.

MCTF-based scalable video coding is used to ensure good video quality for a very wide range of possible bit rates as well as temporal and spatial resolution levels. However, the MCTF algorithms known today show unacceptable results for reduced bit rates, which is due to the fact that too little texture (block information) is available in relation to the motion information (block structures and motion vectors) of a video defined by an image sequence. A scalable form of motion information is therefore required in order to achieve an optimum relationship between texture and motion data at any bit rate and resolution. For this purpose, a solution of MSRA (Microsoft Research Asia) is known from [1], which represents the current state of the MCTF algorithms.

The MSRA solution proposes representing movements in layers, i.e. resolving them into successively refined structures. The MSRA method thereby generally improves the quality of images at low bit rates.

However, this solution has the disadvantage that it leads to some shifts in the reconstructed image, which are attributable to an offset between the motion information and the textures.

An improvement on this is known from the German patent application with the file number 10 2004 038 110.0.

In the method described there, the complete motion vector field generated, in particular according to MSRA (temporary block structures MV_QCIF, MV_CIF and MV_4CIF), which is defined on the encoder side, is not transmitted completely; rather, only the most significant part of this motion vector field is transmitted. The most significant part is generated by a kind of refinement of the block structures: on the basis of structural properties, only parts of the structural differences between successive block structures are determined and used to produce refined block structures.

The problem here is that not every visual quality achieved by a refined block structure and its associated texture represents an increase in quality compared with the visual quality that can be achieved by means of a corresponding base structure and its associated texture.

The object underlying the invention is to provide a method for coding and decoding, as well as an encoder and decoder, which enable better embedding of refined structures.

This object is achieved, on the basis of the method of coding according to the preamble of claim 1, by its characterizing features. Furthermore, this object is achieved by the method for decoding according to the preamble of claim 22, the encoder according to the preamble of claim 23 and the decoder according to the preamble of claim 24 by their characterizing features.

In the method according to the invention for video coding of image sequences, images of the image sequence are coded in a scaled manner such that the resulting video data contains information which allows the images to be represented in a plurality of different levels of image resolution, defined by the number of pixels per image representation, and/or of image quality (e.g. depending on the data rate). The coding is block-based such that, to describe any movement of parts of the images contained in the image sequence, at least one block structure describing the movement is generated, which is designed such that, starting from a block, it is divided into partial blocks, some of which are divided into sub-blocks that successively subdivide the partial blocks more finely. A first block structure is generated temporarily for at least a first resolution level and a second block structure for a second resolution level, wherein the first resolution level has a lower number of pixels and/or image quality than the second resolution level. Furthermore, the second block structure is compared with the first block structure such that differences in the block structure are determined, so that a modified second block structure is generated on the basis of properties of the structure differences such that its structure represents a subset of the second block structure. Subsequently, the modified second block structure and the second block structure are compared on the basis of at least one value proportional to a quality of the image, and the coding of the bit sequence is based on that block structure whose value is directly proportional to the better quality.

By this procedure, the difference between texture information is minimized and, moreover, this information can be coded with minimal effort. In addition, the offset disappears for the cases where, for example, the finest motion vector field has been selected, so that an improvement in image quality is ensured even at lower bit rates and lower resolutions.

Furthermore, the comparisons according to the invention ensure above all that a step-by-step and optimal adaptation between the motion estimation and the embedding of residual-error images is achieved. The method is also distinguished by its particular efficiency.

For the difference determination, added sub-blocks are preferably detected; alternatively or additionally, properties of the sub-blocks are detected.

If the block size of the subblocks is detected as a subblock property, one obtains a very good indicator in practice of the degree of fineness of the block structures produced.

If only that partial block of the first block structure which corresponds to the partial block of the second block structure is used for the difference determination, the differences in the texture information can be further reduced.

In this case, preferably only those sub-blocks of the second block structure are taken over into the modified second block structure whose block size reaches a definable threshold value. This ensures that a complete block structure, i.e. a complete motion vector field, does not have to be transmitted, but only the most significant part of the structure. This leads firstly to a reduction of the information to be transmitted and, in spite of this reduction, to an elimination or reduction of the offset, so that artifacts in the encoded image are reduced or eliminated. In practice, the use of a definable threshold value is of great advantage since, for example, optimal values determined by simulation or experimental tests can be set here, from which very good results can be expected on the basis of the simulations or experiments.

In this case, the threshold value is preferably defined such that it indicates a ratio of the block size of a sub-block of the second block structure to a block size contained in the region of the first block structure used for comparison, namely the block size of the smallest sub-block of that region.

Furthermore, a development provides that the adopted sub-blocks can be non-dyadic.

A further improvement of the results with respect to the representation of the decoded image can be achieved if the modified second block structure of the second resolution level is used as the first block structure of a third resolution level, the second resolution level having a lower pixel count and/or image quality than the third resolution level. Thus, possible further block structures of higher resolution levels are used for generating modified block structures, the modified second block structure of the respectively preceding resolution level being used for the comparison according to the invention.

It is also advantageous for a decoding that the coding takes place in such a way that subblocks not taken over into the second modified block structure are respectively identified.

For this purpose, it is preferably provided that the identification takes place by the use of a direction mode, which is referred to in particular as "not_refind".

In a further development of the invention, a bit stream is generated during the encoding of the bit sequence in such a way that it represents a scalable texture. This is preferably achieved in that the bit stream is realized by a number of bit planes, which is varied in particular at least as a function of the comparison result and of a bit rate to be realized for a transmission. This achieves adapted SNR scalability.

In addition, if the number of bit planes is varied depending on the resolution level, a fine granularity of the SNR scalability is ensured.
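How a number of bit planes yields a scalable texture can be illustrated with a minimal sketch. It assumes non-negative integer coefficient magnitudes (sign handling is omitted) and transmits the most significant plane first; the function names and the coefficient values are hypothetical.

```python
def to_bit_planes(coeffs, num_planes):
    """Split non-negative integer coefficients into bit planes, most significant first."""
    return [[(c >> p) & 1 for c in coeffs] for p in range(num_planes - 1, -1, -1)]

def from_bit_planes(planes, num_planes):
    """Rebuild coefficients from the received planes; planes not received count as zero."""
    coeffs = [0] * len(planes[0])
    for idx, plane in enumerate(planes):
        shift = num_planes - 1 - idx
        coeffs = [c | (bit << shift) for c, bit in zip(coeffs, plane)]
    return coeffs

coeffs = [13, 6, 1, 0]                   # hypothetical 4-bit magnitudes
planes = to_bit_planes(coeffs, 4)
coarse = from_bit_planes(planes[:2], 4)  # only the two most significant planes
full = from_bit_planes(planes, 4)        # all four planes
```

Truncating the list of planes gives a coarser reconstruction, and each additional plane halves the maximum error, which is the mechanism behind SNR scalability by bit planes.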

It is also advantageous if, in the case of direct proportionality of the value of the modified second block structure, at least a first part of the bit planes representing the second block structure is updated. This ensures that the corresponding second modified block structure is available on the decoder side. In this case, the update can be carried out, for example, such that the transmission of a second part takes place or alternatively that the first part is modified by a second part of bit planes.

The updating is preferably carried out in such a way that those regions of a texture associated with the second block structure are refined, which are defined by the modified second block structure, so that in the final result a good image quality is available even for different spatio-temporal resolutions or bit rates without being subject to drift, which arises due to an offset between motion vector fields and residual error blocks, which do not make use of the refinement of the block structures.

Additional support for the finer granularity is achieved if, at a high bit rate, a second number of bit planes exceeding the number is transmitted.

The object on which the invention is based is also achieved by the method for decoding a coded image sequence, in which a scaled representation of the image sequence is generated by taking into account the information contained in the image sequence for updating motion information, produced in particular according to the method described above, and a bit stream representing a scalable texture.

The coder according to the invention, which has means for carrying out the method, and a corresponding decoder, which has means for decoding a coded picture sequence generated according to the method, also contribute to achieving the object. For this purpose, the decoder preferably has means for detecting first signals which indicate parts of the bit stream representing scalable textures, and additionally means for detecting second signals which indicate regions to be updated, the signals each being designed in particular as syntax elements. As a result, the improvements in the quality of the representation achieved by the method according to the invention can be realized on the decoder side.

If the decoder has means for determining those bit planes at which an update leads to improvements in a representation of the coded image sequence and, alternatively or additionally, means for determining the bit plane at which the update of a texture is to take place, the refined or scalable representation of the image sequence can be reconstructed precisely.

If the decoder has means for updating a texture which are configured in such a way that consideration of updated motion information takes place, the elimination of the offset achieved by the inventive method for encoding can be ensured in the scalable representation of the image sequence generated on the decoder side.

In this case, the decoder is preferably characterized by updating means which are configured in such a way that an updated texture is formed from an existing texture, the updated texture information being formed from the texture information assigned to the texture and texture update information, and the updating means being designed such that the texture information is at least partially replaced by the texture update information.

Further details and advantages of the invention are explained on the basis of an exemplary embodiment of the invention with reference to Figures 1 to 7. The figures show:

FIG. 1 shows the model of a motion estimation for generating scalable motion information,

FIG. 2 shows the directional modes necessary for this,

FIG. 3 shows the subblock sizes used here,

FIG. 4 shows the schematic representation of block structures produced according to the invention,

FIG. 5 schematically shows the decision according to the invention about updates,

FIG. 6 schematically shows the generation according to the invention of an updated bitstream.

FIG. 1 schematically shows the MSRA solution known from the prior art, which is explained for a better understanding of the invention, since it is used at least in part in the described embodiment.

According to MSRA, the mentioned multilayer motion estimation is performed in each temporal layer. The motion estimation is realized at a fixed spatial resolution with different macroblock sizes, so that the resulting motion vector field adapts to the decoded resolution. For example, if the original resolution level is a CIF-encoded format and the decoded resolution level is a QCIF format, the motion estimation is performed at the resolution level of the CIF format, i.e. at CIF resolution, with a block size of 32 x 32 as a base and a macroblock size of 8 x 8 as the smallest block size. If, on the other hand, the decoded format is the CIF format, the size of the macroblocks is scaled down by a factor of 2, as can be seen from FIG.

As can furthermore be seen in FIG. 1, the original motion vectors are transmitted in the lower branch of the processing shown there for the decoding of the block present in QCIF format, while for each higher layer, for example the one used for decoding the CIF block, only the difference information with respect to the motion vectors is used. In this case, a single motion vector of a lower layer can be used to predict a plurality of vectors of the higher layer if the block is split up into smaller sub-blocks.
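The use of a single lower-layer vector as predictor for several higher-layer vectors can be sketched as follows. The sketch assumes that all vectors are expressed at the same spatial resolution, so no scaling of the predictor is needed; the function names and vector values are hypothetical.

```python
def encode_layer(base_vector, refined_vectors):
    """Return only the differences between the sub-block vectors and the lower-layer predictor."""
    bx, by = base_vector
    return [(x - bx, y - by) for x, y in refined_vectors]

def decode_layer(base_vector, differences):
    """Recover the higher-layer vectors from the predictor and the transmitted differences."""
    bx, by = base_vector
    return [(bx + dx, by + dy) for dx, dy in differences]

base_vector = (4, -2)                                  # vector of the lower-layer block
refined_vectors = [(4, -2), (5, -2), (4, -1), (6, -3)]  # vectors of its four sub-blocks
differences = encode_layer(base_vector, refined_vectors)
recovered = decode_layer(base_vector, differences)
```

The differences are typically small in magnitude, so they cost fewer bits than the full vectors would.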

Different modes indicate the direction of the motion compensation, as already mentioned and illustrated in FIG. 2, while FIG. 3 shows that the block structures are coded in the MSRA method in the same way as in the MPEG-4 AVC (Advanced Video Coding) standard [2].

To select which block structure and which direction of motion compensation are to be encoded, the MSRA approach provides for the use of a so-called cost function defined for this purpose, which is known by the term "rate distortion optimization".
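A cost function of this kind is commonly written as a Lagrangian cost J = D + λ·R, where D is the distortion, R the rate and λ a weighting factor. The following sketch assumes this common form; the application itself does not state the formula, and the candidate values below are invented for illustration.

```python
def rd_cost(distortion, rate_bits, lam):
    """Lagrangian rate-distortion cost J = D + lambda * R (assumed form)."""
    return distortion + lam * rate_bits

def choose_mode(candidates, lam):
    """Pick the candidate (block structure / direction mode) with the lowest cost."""
    return min(candidates, key=lambda c: rd_cost(c["distortion"], c["rate_bits"], lam))

# Hypothetical candidates: a coarse partition is cheap, a fine one is more accurate.
candidates = [
    {"name": "16x16", "distortion": 120.0, "rate_bits": 10},
    {"name": "8x8", "distortion": 80.0, "rate_bits": 40},
]

best_low_lambda = choose_mode(candidates, lam=0.5)["name"]
best_high_lambda = choose_mode(candidates, lam=2.0)["name"]
```

With a small λ (rate is cheap) the finer partition wins, with a large λ (rate is expensive) the coarser one wins, which mirrors the trade-off between motion data and texture data discussed in the text.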

In the multilayer representation of the motion according to MSRA, different motion descriptions, which are adapted to different spatial resolutions, are generated for the same temporal layer (frame rate). In this case, the motion estimation which belongs to the higher resolutions is regarded as enriching information (enhancement layer / information) on the basis of a detection of the coarse motion information. Since the residual error block produced by the coarse motion vector field contains a great deal of energy, only the residual error block generated after the finest motion compensation is transmitted. This leads, especially when the coarse motion information is selected, to very strong artifacts in the reconstructed residual error image, even when the bit rate is high.

FIG. 4 shows how temporary block structures generated according to the invention lead, using the method according to the invention, to block structures which are ultimately to be transmitted.

Three temporary block structures MV_QCIF, MV_CIF and MV_4CIF can be seen. According to the invention, each of these block structures is assigned to a resolution level, the resolution level designating the format of the resolution with which a video signal encoded by the method according to the invention, which consists of image sequences, can be represented.

For the present embodiment, these are the Common Intermediate Format (CIF), the QCIF and the 4CIF format.

In this case, QCIF represents a first resolution level, that is to say the lowest resolution level selected for the method according to the invention, so that a first block structure MV_QCIF is assigned to it, while CIF represents a second resolution level, for which a second block structure MV_CIF is produced according to the invention.

The block structures are generated in the context of a motion estimation algorithm, for example using the already mentioned MCTF and/or MSRA method. It can also be seen that the temporary block structures MV_QCIF, MV_CIF and MV_4CIF have successively refined sub-block structures, which are characterized in that increasingly fine sub-blocks are added, starting from the partial blocks MB_QCIF, MB_CIF and MB_4CIF defined for each temporary block structure MV_QCIF, MV_CIF and MV_4CIF respectively.

Furthermore, it can be seen from the representation that the temporary block structures MV_QCIF, MV_CIF and MV_4CIF have the same spatial resolution, i.e. this remains constant despite the number of pixels increasing from resolution level to resolution level.

FIG. 4 also shows the block structures MV'_QCIF, MV'_CIF and MV'_4CIF that are to be transmitted or are ultimately transmitted, for example for a streaming application. They are generated from the temporary block structures MV_QCIF, MV_CIF and MV_4CIF using the method according to the invention: a block structure belonging to a higher resolution level is compared in each case with a block structure belonging to the next lower resolution level and, as a result, a modified block structure belonging to the considered resolution level is generated, whose sub-block structures contain only a subset of the temporary block structure belonging to the same resolution level. This is not a proper subset, which would exclude the case that the sub-block structure of the modified block structure is identical to the sub-block structure of the corresponding temporary block structure; since this special case can also occur with the method according to the invention, it is merely a (simple) subset as known, for example, from mathematics.

This algorithm according to the invention is explained in more detail below. The method starts with the generation of a block structure belonging to the lowest resolution level. According to the invention, the modified block structure MV'_QCIF results directly from this first block structure MV_QCIF, since no comparison with a previous block structure can be made in this case. The directly resulting modified block structure MV'_QCIF therefore has the same sub-block structure as the first block structure MV_QCIF.

According to the invention, in a further step to the next higher resolution level, in this case CIF, a second block structure MV_CIF is generated. It can be seen that additional sub-blocks have been added to the second block structure MV_CIF, which lead to a finer sub-block structure than that of the first block structure MV_QCIF. The sub-blocks or sub-block structures that have been added are shown with dash-dotted lines in the figure.

According to the invention, a comparison is therefore carried out in a next step, in which the added sub-blocks are checked as to whether they have a block size that is more than four times smaller than the smallest block size of the corresponding subarea of the first block structure.

If this is the case, the corresponding sub-block structure is included in a modified second block structure MV'_CIF, whereas in cases where the sub-block to be examined represents less refinement, the sub-block structure is not adopted in the modified second block structure to be transmitted.

In order to explain this better, two of the sub-blocks contained in the second block structure MV_CIF shown in the figure have been picked out by way of example, namely a first sub-block SB1 and a second sub-block SB2.

The first sub-block SB1 is located in a first partial block MB1_CIF of the second block structure MV_CIF. Accordingly, according to the invention, the first partial block MB1_QCIF of the first block structure MV_QCIF, which corresponds to the first partial block MB1_CIF of the second block structure MV_CIF, is examined to find the smallest sub-block size occurring there. In the present example, this minimum block size is defined by a first minimum sub-block MIN_SB1. As can be seen, the size of the first sub-block corresponds to the size of the first minimum sub-block, so there is no refinement in this case. Accordingly, according to the invention, the sub-block structure underlying the first sub-block is not adopted in the modified second block structure MV'_CIF to be transmitted, so that in the illustration according to FIG. 4 the modified second block structure MV'_CIF lacks the dash-dotted grid at the corresponding position.

In the comparison, a second sub-block SB2, among others, is also examined. Since the second sub-block SB2 is contained in a fourth partial block MB4_CIF of the second block structure MV_CIF, a search is made for a minimum sub-block size in a fourth partial block MB4_QCIF of the first block structure MV_QCIF. This is given by a second minimum sub-block MIN_SB2, which in this case exactly divides the fourth partial block MB4_QCIF of the first block structure MV_QCIF. As can be seen, in this case the size of the second sub-block SB2 is one-eighth of the size of the second minimum sub-block MIN_SB2, so that there is even an eightfold refinement compared to the first block structure MV_QCIF. According to the invention, therefore, the sub-block structure defining the second sub-block is also adopted in the modified second block structure MV'_CIF. The same is done for all blocks of the second block structure MV_CIF, as can be seen in the illustration according to FIG. 4 from the dashed structures of the modified second block structure MV'_CIF.
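The comparison for the two sub-blocks SB1 and SB2 can be sketched as follows. The sketch assumes that block size is measured as an area in pixels and uses invented areas that reproduce the ratios described (SB1 equal to MIN_SB1, SB2 one-eighth of MIN_SB2); the function names are hypothetical, and the label "not_refind" is the direction mode named earlier in the description for sub-blocks that are not adopted.

```python
def adopt_sub_block(sub_block_area, min_base_area, threshold=4):
    """True if the sub-block is more than `threshold` times smaller than the
    smallest sub-block in the corresponding region of the first block structure."""
    return min_base_area / sub_block_area > threshold

def build_modified_structure(sub_blocks, threshold=4):
    """Label each added sub-block as adopted ("refined") or skipped ("not_refind")."""
    result = {}
    for name, (area, min_base_area) in sub_blocks.items():
        adopted = adopt_sub_block(area, min_base_area, threshold)
        result[name] = "refined" if adopted else "not_refind"
    return result

# Hypothetical areas: (area of the sub-block, area of the minimum sub-block in MV_QCIF).
sub_blocks = {
    "SB1": (64, 64),    # same size as MIN_SB1: no refinement
    "SB2": (32, 256),   # one-eighth of MIN_SB2: eightfold refinement
}
modified = build_modified_structure(sub_blocks)
```

A sub-block that is exactly four times smaller is not adopted under this reading of "more than four times smaller"; whether the boundary case is included is a design choice that the threshold parameter leaves open.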

As can be seen from a comparison of the second block structure MV_CIF and the modified second block structure MV'_CIF, not all sub-block structures of the second block structure MV_CIF have been adopted. In order for an image sequence encoded in this way to be displayed correctly, an identification of those sub-blocks which were not incorporated into the modified block structures is encoded during the encoding of the block structures which are to be transmitted. The method according to the invention is applied in the same way to further resolution levels. For example, according to the present exemplary embodiment, a block structure MV_4CIF is also generated for the 4CIF format. According to the invention, this is again used as a second block structure, while the first block structure is given by the preceding second block structure MV_CIF. The modified second block structure MV'_4CIF resulting from the comparison of the two block structures has again been refined in the representation of FIG. 4 only by a part of the added sub-block structures, which are dotted in the illustration.

Alternatively or additionally, instead of a temporary block structure, an already generated block structure to be transmitted, i.e. a modified second block structure, can be used as the first block structure.

According to the invention, it is not necessary to produce block structures to be transmitted for all resolution levels coded in the image sequence, but, for example, only for some of the resolutions mentioned, i.e. for example only for CIF in the case that QCIF, CIF and 4CIF have been used, or only for CIF in the case that QCIF and CIF have been used. Rather, in practice it is sufficient to apply this to the middle resolution levels among all existing resolution levels, since the best performance is obtained for a middle resolution level, because in this case multiple up- and down-sampling of the block structures and the motion vectors can be avoided. In this case, the data rates for the motion information for the various spatial resolution levels are set by a parameter, so that an optimum ratio of the data rate for motion information to that for texture information results at each resolution level.

In this case, the invention is not restricted to the exemplary embodiment explained with reference to FIG. 4, but encompasses all implementations which come within the scope of expert knowledge and which comprise the core according to the invention:

Not transmitting completely the complete motion vector field generated, in particular according to MSRA (temporary block structures MV_QCIF, MV_CIF and MV_4CIF), which is defined or present on the encoder side, but rather only the most significant part of this motion vector field.

An essential advantage of the algorithm according to the invention is the improvement of the image quality even at low bit rates as well as at low resolutions.

FIG. 5 now shows which method steps are taken as a basis for the signaling explained above or also for the bitstream generation, as explained below.

According to the selective refinement method according to the invention described above, the novel block mode proposed according to the invention indicates whether a block structure has to be split up further for a currently considered motion vector field. On the basis of these block modes, it is therefore possible to locate the regions in which a current residual error block differs from a previous residual error block associated with a lower layer.

The blocks associated with these regions are then compared with the blocks located at the same positions within the preceding residual error block, and the difference is encoded. After this information has been stored on the encoder side, it is necessary to achieve the best possible, i.e. optimum, coordination between the motion information and the texture for the respective bit rate.

As a rule, a bit stream is generated for this purpose before the transmission, so that all the information available on the encoder side can be used optimally.

In order to achieve this, as shown in FIG. 5, for example, a comparison is carried out in the sense of an evaluation in which it is determined whether a motion vector field (block structure) must be refined or not.

This is advantageous because it can occur in practice that the visible quality achieved with the base motion vector field (block structure) MVFIELD1 and the corresponding texture 1, the latter represented by a value of x% of texture 1, is better than the result achieved when this motion vector field has been refined into a modified block structure MVFIELD2 and the texture has been correspondingly refined into texture'1 (defined by y% of texture 1 plus the refinement). Here y is smaller than x at the same bit rate. It can be seen from the corresponding decision procedure shown schematically that, in the event that a refinement appears necessary, the part of the information which relates to the texture information must be adapted accordingly. However, this also raises the problem of which part of the texture information is assigned to the refinement information.
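The decision between the base and the refined alternative can be sketched as a comparison of a quality value for the two reconstructions. The sketch assumes PSNR as the quality value and assumes that both reconstructions were produced at the same bit rate; the description only requires a value proportional to the image quality, so PSNR, the function names and the sample data are illustrative assumptions.

```python
import math

def psnr(original, reconstructed, peak=255):
    """Peak signal-to-noise ratio in dB between two equally long sample lists."""
    mse = sum((o - r) ** 2 for o, r in zip(original, reconstructed)) / len(original)
    return math.inf if mse == 0 else 10 * math.log10(peak ** 2 / mse)

def should_refine(original, recon_base, recon_refined):
    """Refine only if the refined motion field plus its texture gives the better quality."""
    return psnr(original, recon_refined) > psnr(original, recon_base)

original = [100, 110, 120, 130]
recon_base = [101, 110, 119, 130]       # MVFIELD1 with x% of texture 1 (hypothetical)
recon_refined = [104, 106, 124, 126]    # MVFIELD2 with y% of texture 1 plus refinement

decision = should_refine(original, recon_base, recon_refined)
```

In this constructed case the base alternative reconstructs the samples more accurately, so the refinement would not be selected; with a different bit budget the outcome can reverse.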

As explained above, this is made possible, first, by suitable signaling, which allows the decoder to locate those regions in the residual error blocks which can and should be refined. This allows the decoder to recognize and take into account the inventive procedure described above, in which the refinement of the motion information has been adapted such that the embedding of the residual error block is made possible, i.e. the part of the refinement of a residual error block is represented by a few additional blocks.

Second, this requires a suitable encoding which, for the sake of compression efficiency, is performed in such a way that the refinement blocks are encoded with a block-based transformation (IT, DCT, etc.). These blocks then represent the difference between the residual error blocks generated on the basis of the refined motion vector fields and the residual error blocks which have not been generated on the basis of refined motion vector fields, and they have a certain number of bit planes, for example N bit planes.
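The representation of a refinement block as a difference with a number of bit planes can be sketched as follows. The sketch is an illustration only and makes simplifying assumptions: the block-based transformation (IT, DCT) is omitted, the sign is kept separately from the magnitude, and all function names are introduced here.

```python
def residual_difference(refined_residual, base_residual):
    """Sample-wise difference: refined residual minus unrefined residual."""
    return [
        [r - b for r, b in zip(r_row, b_row)]
        for r_row, b_row in zip(refined_residual, base_residual)
    ]

def to_bit_planes(block, num_planes):
    """Split the magnitudes of a block into bit planes, most significant first."""
    signs = [[-1 if v < 0 else 1 for v in row] for row in block]
    planes = [
        [[(abs(v) >> p) & 1 for v in row] for row in block]
        for p in range(num_planes - 1, -1, -1)
    ]
    return signs, planes

def from_bit_planes(signs, planes, num_planes):
    """Rebuild a block from the first len(planes) bit planes (fewer = coarser)."""
    rows, cols = len(signs), len(signs[0])
    magnitude = [[0] * cols for _ in range(rows)]
    for i, plane in enumerate(planes):
        weight = 1 << (num_planes - 1 - i)
        for y in range(rows):
            for x in range(cols):
                magnitude[y][x] += plane[y][x] * weight
    return [
        [s * m for s, m in zip(s_row, m_row)]
        for s_row, m_row in zip(signs, magnitude)
    ]

diff = residual_difference([[5, -3], [0, 7]], [[2, 1], [0, -1]])
signs, planes = to_bit_planes(diff, num_planes=4)
full = from_bit_planes(signs, planes, num_planes=4)
coarse = from_bit_planes(signs, planes[:2], num_planes=4)
```

Decoding all four planes restores the difference exactly, while decoding only the first two planes yields a coarser approximation, which is the property that bit-plane coding offers for SNR scalability.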

Finally, this also requires a suitable organization of the bit stream to be generated for the transmission, as shown in FIG.

The goal of this bitstream generation according to the invention is to ensure good image quality for various spatial/temporal resolution levels or bit rates, without drift, which can be caused by an offset between a motion vector field and a residual error block. The steps by which this is achieved according to the invention are therefore shown schematically.

In this case, the illustrated embodiment starts from an initialization state in which a specific number of motion vector fields with corresponding residual error blocks have been generated on the encoder side. For example, a first motion vector field MVF1 and a first refined motion vector field MVF1' for a QCIF resolution, the first refined motion vector field MVF1' and (not shown) a second motion vector field for a CIF resolution, as well as the second motion vector field and a third motion vector field for a 4CIF resolution.

The encoding or decoding for such a scenario at a QCIF resolution proceeds in the method according to the invention as follows: assuming that a large range of bit rates must be decodable for the QCIF resolution, it is necessary in a first step to transmit the first motion vector field MVF1 and the first corresponding residual error block. The higher the bit rate, the higher the number of bit planes BTPL1 ... BTPLN+M which represent the residual error block. Furthermore, the number is limited by the decision, explained in the introduction, about a refinement of the blocks.

According to the illustrated example, the number of bit planes is limited to a number N. If, according to the evaluation according to the invention, the decision is made that a refinement is required, the first motion vector field MVF1 is refined in such a way that the refined motion vector field MVF1' is generated. In such a case it is therefore necessary that the first vector field MVF1 is updated in order to prevent an offset between the motion vector fields and the respective textures.

An algorithm proposed here according to the invention can also be taken from the illustration and proceeds as follows.

If the above-mentioned evaluation of the movement information reveals that an update of the movement information is necessary, a certain number of bit planes BTPL1... BTPLN has usually already been transmitted. Up to a certain limit value BTPLn, the bit planes which represent the non-refined residual error blocks (BTPL1-BTPLn) need not be modified. On reaching this limit BTPLn, on the other hand, the next following bit planes BTPLn... BTPLN are updated according to the exemplary embodiment.

The update starts at the bit plane BTPLn, which represents the last bit plane of the unrefined residual error blocks, and extends to the bit plane BTPLN, which has already been transmitted.

The update is carried out in such a way that the regions which belong to the refined parts REFINEMENT are updated so that they match the subsequent motion vector field, i.e., in the illustrated embodiment, the first refined vector field MVF1'.

In this case, according to the invention, in the case of a higher bit rate, the number of bit planes BTPLN + 1 to BTPLN + M which exceeds the already transmitted bit plane number BTPLN can additionally be transmitted. This concept is repeated for each spatial resolution and / or quality level and thereby enables finer granularity of a signal-to-noise scalability (SNR scalability). According to the initial scenario, an encoding or decoding takes place in a CIF resolution stage as follows.
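The bit-plane handling described above can be summarized in a short sketch. It is an illustration under stated assumptions, not the claimed algorithm itself: the function names and the labels "base", "updated" and "extra" are introduced here, and because the text names BTPLn in both the unmodified and the updated range, the sketch treats BTPLn as the first updated plane.

```python
def plan_bit_planes(n, N, M, refine, high_bit_rate):
    """List (plane_index, kind) pairs in transmission order.

    Planes 1..n-1 stay unmodified ("base"), planes n..N are updated when a
    refinement was decided ("updated"), and planes N+1..N+M are added only
    at a high bit rate ("extra").
    """
    if not 1 <= n <= N:
        raise ValueError("expected 1 <= n <= N")
    plan = []
    for k in range(1, N + 1):
        kind = "updated" if (refine and k >= n) else "base"
        plan.append((k, kind))
    if high_bit_rate:
        plan.extend((k, "extra") for k in range(N + 1, N + M + 1))
    return plan

def update_plane(base_plane, refined_plane, refinement_mask):
    """Replace bits only inside the REFINEMENT regions marked by the mask."""
    return [
        [r if m else b for b, r, m in zip(b_row, r_row, m_row)]
        for b_row, r_row, m_row in zip(base_plane, refined_plane, refinement_mask)
    ]

plan = plan_bit_planes(n=3, N=5, M=2, refine=True, high_bit_rate=True)
updated = update_plane(
    [[0, 1], [1, 0]],
    [[1, 1], [0, 0]],
    [[True, False], [False, True]],
)
```

Bits outside the mask are left as they were, so the parts of the texture that do not belong to refined regions stay consistent with what has already been transmitted.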

Since, in accordance with the invention, SNR and spatial scalability are also to be combined here, if, for example, it is necessary to decode a (video) bitstream at CIF resolution and this is done at a lower bit rate, the first modified motion vector field MVF1' is upscaled from the QCIF resolution to the CIF resolution. In addition, for example, an inverse wavelet transformation or an interpolation is performed in order to achieve a higher spatial resolution of the texture TEXTUR1, TEXTURE '1.

It should be noted that at a very low bit rate the update of the texture TEXTUR1 to the texture TEXTURE '1 is not necessary (for example, if less than n bit planes are necessary to decode the CIF resolution). Overall, this achieves spatial scalability.
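The upscaling of a motion vector field from QCIF to CIF can be illustrated as follows. The text does not specify how the upscaling is performed; the sketch assumes that block positions, block sizes and motion vectors are all multiplied by the resolution ratio, which is 2 per dimension between QCIF (176x144) and CIF (352x288). The class and function names are introduced here for illustration.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class MotionBlock:
    """One block of a motion vector field (hypothetical layout)."""
    x: int
    y: int
    width: int
    height: int
    mv_x: int
    mv_y: int

def upscale_motion_field(blocks, factor=2):
    """Scale block geometry and motion vectors to the next resolution level."""
    return [
        MotionBlock(
            b.x * factor,
            b.y * factor,
            b.width * factor,
            b.height * factor,
            b.mv_x * factor,
            b.mv_y * factor,
        )
        for b in blocks
    ]

qcif_field = [MotionBlock(0, 0, 16, 16, 1, -2), MotionBlock(16, 0, 8, 8, 0, 3)]
cif_field = upscale_motion_field(qcif_field)
```

The same call with the CIF field as input would give a field for 4CIF, since the ratio between CIF and 4CIF is again 2 per dimension.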

The SNR scalability in CIF resolution is achieved by coding the bit planes of the difference between the original refined CIF residual error block and a refined QCIF bit plane that has been brought to CIF resolution by interpolation or by an inverse wavelet transformation. If the decision on refinement is positive at the CIF resolution, the same strategy is followed as explained above for QCIF. The same applies to a scaling from CIF to 4CIF.
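The formation of the CIF enhancement signal can be sketched as follows. This is a simplified illustration: nearest-neighbour repetition stands in for the interpolation or inverse wavelet transformation mentioned above, and the function names are assumptions introduced here.

```python
def upsample_nearest(block, factor=2):
    """Enlarge a 2-D block by repeating each sample factor x factor times."""
    out = []
    for row in block:
        wide = [v for v in row for _ in range(factor)]
        for _ in range(factor):
            out.append(list(wide))
    return out

def cif_enhancement(cif_residual, qcif_residual):
    """Difference between the CIF residual and the upsampled QCIF residual."""
    prediction = upsample_nearest(qcif_residual)
    return [
        [c - p for c, p in zip(c_row, p_row)]
        for c_row, p_row in zip(cif_residual, prediction)
    ]

qcif = [[1, 2]]
cif = [[1, 2, 2, 3], [0, 1, 2, 2]]
prediction = upsample_nearest(qcif)
enhancement = cif_enhancement(cif, qcif)
```

The bit planes of such an enhancement signal would then be coded in the same way as for the QCIF level.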

However, the invention is not limited to the exemplary embodiment described. Rather, the following also applies:

1. The SNR scalability is generated by a bit-plane representation of the texture information according to the example described above, but is not limited thereto, since it can also be achieved by alternative scalable texture representations.

2. The maximum number of bit planes that occur before refinement (BTPLN) may differ for each spatial resolution.

3. In addition, more than one update can take place within a spatial resolution level if more than two layers of the motion information are used for this spatial resolution level.

Irrespective of this, very good coordination between the motion information and the texture is always achieved for a wide range of bit rates as well as spatial/temporal resolutions, without degradation of image quality, since the method according to the invention achieves a good distribution of the information and thus also eliminates the offset between motion information and textures.

Bibliography

[1] Jizheng Xu, Ruiqin Xiong, Bo Feng, Gary Sullivan, Ming-Chieh Lee, Feng Wu, Shipeng Li, "3D subband video coding using barbell lifting", ISO/IEC JTC1/SC29/WG11 MPEG 68th meeting, M10569/S05, Munich, March 2004.

[2] German patent application, file number 10 2004 038 110.0

Claims

1. A method for video coding of image sequences, in which images of the image sequence are encoded in a scaled manner such that the resulting video data contain information which ensures a representation of the images in a plurality of differing levels of a resolution of the images (QCIF, CIF, 4CIF), defined by the number of pixels per image representation, and/or of image qualities, wherein the coding is block-based such that, for a description of any movement of parts of the images contained in the image sequence, at least one block structure (MV_QCIF, MV_CIF, MV_4CIF) describing the movement is generated, which is designed such that, starting from a block, it is subdivided into partial blocks (MB1_QCIF..MB4_QCIF, MB1_CIF..MB4_CIF, MB1_4CIF..MB4_4CIF), some of which are successively subdivided more finely into sub-blocks, with the following steps:
a) temporarily, a first block structure (MV_QCIF; MV_CIF) is generated for at least a first resolution level and a second block structure (MV_CIF; MV_4CIF) is generated for a second resolution level, wherein the first resolution level has a lower number of pixels and/or image quality than the second resolution level,
b) the second block structure (MV_CIF; MV_4CIF) is compared with the first block structure (MV_QCIF; MV_CIF) in such a way that differences in the block structure are determined,
c) on the basis of properties of the structure differences, a modified second block structure (MV'_CIF; MV'_4CIF) is generated in such a way that its structure represents a subset of the second block structure (MV_CIF; MV_4CIF),
d) the modified second block structure (MV'_CIF; MV'_4CIF) and the second block structure (MV_CIF; MV_4CIF) are compared on the basis of at least one value proportional to a quality of the image,
e) the block structure whose value is directly proportional to a better quality is taken as the basis for the coding of the bit sequence.

2. The method according to claim 1, characterized in that added sub-blocks are detected for the difference determination.

3. The method according to claim 1 or 2, characterized in that sub-block properties are detected for the difference determination.

4. The method according to claim 3, characterized in that the block size of the sub-blocks is detected as sub-block property.

5. Method according to one of the preceding claims, characterized in that, for the difference determination, only that partial block (MB1_QCIF..MB4_QCIF; MB1_CIF..MB4_CIF) of the first block structure (MV_QCIF; MV_CIF) is used which corresponds to the partial block (MB1_CIF..MB4_CIF; MB1_4CIF..MB4_4CIF) of the second block structure (MV_CIF; MV_4CIF).

6. The method according to any one of the preceding claims, characterized in that the generation of the modified second block structure is based on a threshold value decision.

7. Method according to one of claims 3 to 6, characterized in that only those sub-blocks of the second block structure (MV_CIF; MV_4CIF) are taken over into the modified second block structure (MV'_CIF; MV'_4CIF) whose block size reaches a definable threshold value.

8. Method according to claim 7, characterized in that the threshold value is defined such that it indicates a ratio of the block size of a sub-block of the second block structure (MV_CIF; MV_4CIF) to a block size contained in the area of the first block structure (MV_QCIF; MV_CIF) used for the comparison, namely the block size assigned to the smallest sub-block of that area.

9. The method according to any one of the preceding claims, characterized in that the sub-blocks taken over can be divided non-dyadically.

10. The method according to any one of the preceding claims, characterized in that the modified second block structure (MV'_4CIF) of the second resolution level is used as the first block structure (MV_CIF) of a third resolution level, wherein the second resolution level has a lower number of pixels and/or image quality than the third resolution level.

11. The method according to any one of claims 7 to 10, characterized in that the coding is carried out such that sub-blocks not taken over into the modified second block structure (MV'_CIF; MV'_4CIF) are each marked.

12. The method according to any one of claims 9 to 11, characterized in that the coding is carried out such that non-dyadically divided sub-blocks are each marked.

13. The method according to the preceding claim, characterized in that the marking is effected by using a direction mode, designated in particular as "not_refined".

14. A method, in particular according to one of the preceding claims, characterized in that, in the context of coding the bit sequence, a bit stream is generated such that it represents a scalable texture in connection with an update of motion information, this preferably being done in that the bit stream is realized by texture resolution stages and is varied in particular at least as a function of the comparison result and of a bit rate to be realized for a transmission.

15. The method according to claim 14, characterized in that the texture-resolution stages are realized as a number of bit planes.

16. The method according to claim 15, characterized in that the number of bit planes (BTPL1 ... BTPLN) is varied depending on the resolution level.

17. The method according to any one of claims 15 to 16, characterized in that, in the case of direct proportionality of the value of the modified second block structure (MV'_CIF; MV'_4CIF), at least a first part of the bit planes (BTPLn ... BTPLN) representing the texture is updated.

18. The method according to claim 17, characterized in that the updating is carried out such that the transmission of a second part (BTPLn '... BTPLN') takes place.

19. The method according to claim 17, characterized in that the updating is performed such that the first part (BTPLn ... BTPLN) is modified by a second part of bit-planes (BTPLn '... BTPLN').

20. The method according to any one of claims 16 to 19, characterized in that the update is carried out in such a way that those regions (REFINEMENT) of a texture (TEXTUR1) associated with the second block structure are refined, which are defined by the modified second block structure (MV'_CIF;MV'_4CIF).

21. The method according to any one of claims 14 to 19, characterized in that, at a high bit rate, a second number (BTPLN ... BTPLN+M) of bit planes going beyond the number (BTPL1 ... BTPLN) is transmitted.

22. Method for decoding, in particular, a coded bit sequence generated according to the method according to one of claims 1 to 20, characterized in that a scaled representation of the image sequence is generated taking into account information for updating motion information, contained in the image sequence and generated by a method in particular according to one of the preceding claims, as well as a bit stream representing a scalable texture.

23. Encoder for generating a coded image sequence, characterized by means for performing the method according to one of claims 1 to 20.

24. Decoder, characterized by means for decoding a coded bit sequence generated according to the method of any one of claims 1 to 20.

25. A decoder according to claim 23, characterized by means for detecting first signals indicating parts of the bit stream that represent scalable textures, wherein the signals are designed in particular as syntax elements.

26. Decoder according to one of the preceding claims, characterized by means for detecting second signals indicating regions to be updated, wherein the signals are designed in particular as syntax elements.

27. Decoder according to one of the preceding claims, characterized by means for determining that bit plane (BTPLn) at which an update leads to improvements of a representation of the coded image sequence.

28. Decoder according to one of the preceding claims, characterized by means for determining that bit plane (BTPLN) at which the updating of a texture is to take place.

29. Decoder according to one of the preceding claims, characterized by means for updating a texture, which are designed such that updated motion information is taken into account.

30. Decoder according to one of the preceding claims, characterized by updating means which are designed such that an updated texture is formed from an existing texture in such a way that the updated texture information is formed from the texture information associated with the texture and from texture update information.

31. Decoder according to claim 30, characterized in that the updating means are designed such that the texture information is at least partially replaced by the texture update information.