EP1800490A1

EP1800490A1 - Device and method for generating a coded video sequence while using an inter-layer movement data prediction

Info

Publication number: EP1800490A1
Application number: EP05791756A
Authority: EP
Inventors: Heiko Schwarz; Detlev Marpe; Thomas Wiegand
Original assignee: Fraunhofer Gesellschaft zur Forderung der Angewandten Forschung eV
Current assignee: Fraunhofer Gesellschaft zur Forderung der Angewandten Forschung eV
Priority date: 2004-10-15
Filing date: 2005-09-21
Publication date: 2007-06-27
Also published as: WO2006042611A1; JP2008517498A

Abstract

In scalable video coding with motion compensation (1006, 1014) both in a base layer (1002) and in an enhancement layer, a prediction (1014, 1016) of the motion data of the enhancement layer (1004) is carried out using the motion data of the base layer (1002), in order to obtain a scalability concept that, on the one hand, offers maximum flexibility for computing the motion data of the different layers and, on the other hand, permits a low bit rate.

Description

Apparatus and method for generating a coded video sequence using an interlayer motion data prediction


The present invention relates to video coding systems and in particular to scalable video coding systems that can be used in conjunction with the video coding standard H.264/AVC or with new MPEG video coding systems.

The H.264/AVC standard is the result of a video standardization project by the ITU-T Video Coding Experts Group (VCEG) and the ISO/IEC Moving Picture Experts Group (MPEG). The main objectives of this standardization project are to create a clear video coding concept with very good compression behavior and, at the same time, to generate a network-friendly video representation that covers both applications with a "conversational character", such as video telephony, and non-conversational applications (storage, broadcasting, streaming).

In addition to the above-cited standard ISO/IEC 14496-10, there is also a large number of publications relating to the standard. By way of example only, reference is made to "The Emerging H.264/AVC Standard", Ralf Schäfer, Thomas Wiegand and Heiko Schwarz, EBU Technical Review, January 2003. In addition, the publication "Overview of the H.264/AVC Video Coding Standard", Thomas Wiegand, Gary J. Sullivan, Gisle Bjontegaard and Ajay Luthra, IEEE Transactions on Circuits and Systems for Video Technology, July 2003, as well as the publication "Context-based Adaptive Binary Arithmetic Coding in the H.264/AVC Video Compression Standard", Detlev Marpe, Heiko Schwarz and Thomas Wiegand, IEEE Transactions on Circuits and Systems for Video Technology, September 2003, provide a detailed overview of various aspects of the video coding standard.

For better understanding, however, an overview of the video coder / decoder algorithm is given below with reference to FIGS. 9 to 11.

FIG. 9 shows the complete structure of a video coder, which generally consists of two different stages. Generally speaking, the first stage, which in principle operates in a video-related manner, generates output data which are subsequently subjected to entropy coding by a second stage, designated 80 in FIG. 9. These data are data 81a, quantized transform coefficients 81b and motion data 81c, which are supplied to the entropy coder 80 in order to generate a coded video signal at the output of the entropy coder 80.

In detail, the input video signal is divided into macroblocks, with each macroblock having 16x16 pixels. Then the assignment of the macroblocks to slice groups and slices is selected, whereafter each macroblock of each slice is processed by the network of operation blocks as shown in the figure. It should be noted that efficient parallel processing of macroblocks is possible if there are different slices in one video picture. The assignment of the macroblocks to slice groups and slices is performed by means of a coder control block 82 in Fig. 8. There are several slice types, defined as follows:

I-slice: The I-slice is a slice in which all macroblocks of the slice are coded using an intra-prediction.

P-slice: In addition to the coding types of the I-slice, certain macroblocks of the P-slice may also be coded using inter prediction with at least one motion-compensated prediction signal per prediction block.

B-slice: In addition to the coding types available in the P-slice, certain macroblocks of the B-slice may also be coded using inter prediction with two motion-compensated prediction signals per prediction block.

The above three types of coding are very similar to those in previous standards, except for the use of reference pictures, as described below. The following two coding types for slices are new in the standard H.264 / AVC:

SP-slice: The SP-slice is also referred to as a switching P-slice, which is coded in such a way that efficient switching between different precoded pictures becomes possible.

SI-slice: The SI-slice is also referred to as a switching I-slice, which permits an exact match of a macroblock in an SP-slice for direct random access and for error recovery purposes.

Overall, slices are a sequence of macroblocks that are processed in the order of a raster scan, unless the property of flexible macroblock ordering (FMO = Flexible Macroblock Ordering), which is likewise defined in the standard, is used. A picture can be divided into one or more slices, as shown in FIG. 11. A picture is therefore a collection of one or more slices. Slices are independent of one another in the sense that their syntax elements can be parsed from the bit stream and the values of the samples in the region of the picture represented by the slice can be correctly decoded without requiring data from other slices, provided that the reference pictures used are identical in both the encoder and the decoder. However, certain information from other slices may be needed to apply the deblocking filter across slice boundaries.

The FMO property modifies the way in which pictures are partitioned into slices and macroblocks by using the concept of slice groups. Each slice group is a set of macroblocks defined by a macroblock-to-slice-group map, which is specified by the content of a picture parameter set and by certain information in the slice headers. This macroblock-to-slice-group map consists of a slice group identification number for each macroblock in the picture, specifying to which slice group the associated macroblock belongs. Each slice group can be partitioned into one or more slices, so that a slice is a sequence of macroblocks within the same slice group that is processed in the order of a raster scan within the set of macroblocks of that particular slice group.
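The macroblock-to-slice-group map described above can be illustrated with a minimal sketch. The function name `slice_groups_from_map` and the eight-macroblock checkerboard map are hypothetical and chosen only for illustration; the sketch does not reproduce the bit stream syntax of the standard.

```python
from collections import defaultdict


def slice_groups_from_map(mb_to_group):
    """Collect macroblock addresses per slice group.

    mb_to_group[i] is the slice group identification number of the
    macroblock with raster-scan address i. Iterating in address order
    keeps the raster-scan order inside every slice group.
    """
    groups = defaultdict(list)
    for mb_addr, group_id in enumerate(mb_to_group):
        groups[group_id].append(mb_addr)
    return dict(groups)


# Hypothetical picture of 4x2 macroblocks with a checkerboard-like map.
mb_map = [0, 1, 0, 1,
          1, 0, 1, 0]
groups = slice_groups_from_map(mb_map)
```

Each resulting list could then be cut into one or more slices without changing the order of the macroblocks inside the group.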

Each macroblock may be transmitted in one of several coding types, depending on the slice coding type. In all slice coding types, the following types of intra coding are supported, which are called Intra_4x4 and Intra_16x16; in addition, a chroma prediction mode and an I_PCM prediction mode are supported.

The Intra_4x4 mode is based on predicting each 4x4 luma block separately and is well suited for coding parts of a picture with pronounced detail. The Intra_16x16 mode, on the other hand, predicts the entire 16x16 luma block and is more suitable for coding "smooth" areas of a picture.

In addition to these two luma prediction types, a separate chroma prediction is performed. As an alternative to Intra_4x4 and Intra_16x16, the I_PCM coding type allows the coder to simply skip the prediction as well as the transform coding and instead directly transmit the values of the coded samples. The I_PCM mode serves the following purposes: it allows the encoder to represent the values of the samples precisely; it provides a way to represent the values of very anomalous picture content accurately without increasing the amount of data; and it makes it possible to specify a hard limit on the number of bits a coder needs for macroblock handling without the coding efficiency suffering.

In contrast to earlier video coding standards (namely H.263+ and MPEG-4 Visual), where intra prediction was performed in the transform domain, intra prediction in H.264/AVC always takes place in the spatial domain, by referring to adjacent samples of previously coded blocks located to the left of or above the block to be predicted (FIG. 10). In certain environments where transmission errors occur, this can cause error propagation, the error propagation arising from motion compensation in inter-coded macroblocks. Therefore, a constrained intra coding mode can be signaled which allows prediction only from intra-coded adjacent macroblocks.

When the Intra_4x4 mode is used, each 4x4 block is predicted from spatially adjacent samples. The sixteen samples of the 4x4 block are predicted using previously decoded samples in adjacent blocks. One of 9 prediction modes can be used for each 4x4 block. In addition to DC prediction (where one value is used to predict the entire 4x4 block), 8 directional prediction modes are specified. These modes are suitable for predicting directional structures in a picture, such as edges at different angles.
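The DC mode mentioned above, in which a single value predicts the entire 4x4 block, can be sketched as follows. This is an illustrative sketch only: it assumes that both the four samples above and the four samples to the left of the block are available, and it uses a rounded integer mean of these eight samples. The function name `predict_dc_4x4` is hypothetical.

```python
def predict_dc_4x4(above, left):
    """Predict a 4x4 block with one DC value.

    above: the four previously decoded samples directly above the block
    left:  the four previously decoded samples directly left of the block
    The DC value is the mean of the eight neighbours, rounded to the
    nearest integer by adding 4 before the shift by 3 (division by 8).
    """
    if len(above) != 4 or len(left) != 4:
        raise ValueError("expected four samples above and four to the left")
    dc = (sum(above) + sum(left) + 4) >> 3
    return [[dc] * 4 for _ in range(4)]


block = predict_dc_4x4([10, 10, 10, 10], [20, 20, 20, 20])
```

The directional modes would replace the single mean by copies or weighted combinations of neighbouring samples along a given direction; they are not shown here.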

In addition to the intra macroblock coding types, various predictive or motion-compensated coding types are specified as P macroblock types. Each P macroblock type corresponds to a specific partitioning of the macroblock into the block shapes that are used for motion-compensated prediction. Partitions with luma block sizes of 16x16, 16x8, 8x16 and 8x8 samples are supported by the syntax. In the case of partitions of 8x8 samples, an additional syntax element is transmitted for each 8x8 partition. This syntax element specifies whether the corresponding 8x8 partition is further divided into partitions of 8x4, 4x8 or 4x4 luma samples and the corresponding chroma samples.

The prediction signal for each predictively coded MxN luma block is obtained by displacing a region of the corresponding reference picture, the displacement being specified by a translational motion vector and a picture reference index. Thus, if a macroblock is coded using four 8x8 partitions, and each 8x8 partition is further divided into four 4x4 partitions, a maximum of 16 motion vectors can be transmitted for a single P macroblock within the so-called motion field.

The quantization parameter QP is used to determine the quantization of the transform coefficients in H.264/AVC. The parameter can take 52 values. These values are arranged so that an increase of the quantization parameter by 1 increases the quantizer step size by about 12%. This means that an increase of the quantization parameter by 6 increases the quantizer step size by exactly a factor of 2. It should be noted that a change in the step size by about 12% also means, roughly, a reduction of the bit rate by about 12%. The quantized transform coefficients of a block are generally scanned in a zigzag path and further processed using entropy coding methods. The 2x2 DC coefficients of the chroma component are scanned in raster-scan order, and all inverse transform operations within H.264/AVC can be implemented using only additions and shift operations on 16-bit integer values.
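The relationship between the quantization parameter and the quantizer step size stated above can be checked numerically. The sketch below assumes a purely geometric progression in which the step size doubles every 6 parameter steps; the function name `step_size_ratio` is hypothetical, and the sketch gives only the ratio between two step sizes, not the absolute step size values of the standard.

```python
def step_size_ratio(delta_qp):
    """Ratio of quantizer step sizes for a QP increase of delta_qp.

    Assumes the step size doubles exactly every 6 QP steps and grows
    geometrically in between, so one step corresponds to 2 ** (1 / 6).
    """
    return 2.0 ** (delta_qp / 6.0)


ratio_one_step = step_size_ratio(1)   # roughly 1.12, i.e. about 12% larger
ratio_six_steps = step_size_ratio(6)  # a factor of 2
```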

Referring to FIG. 9, the input signal is first divided picture by picture in a video sequence, each picture being divided into macroblocks of 16x16 pixels. Each picture is then applied to a subtractor 84, which subtracts from the original picture the picture provided by a decoder 85 that is contained in the encoder. The subtraction result, that is to say the residual signals in the spatial domain, is then transformed, scaled and quantized (block 86) in order to obtain the quantized transform coefficients on line 81b. In order to generate the subtraction signal that is fed to the subtractor 84, the quantized transform coefficients are first rescaled and inverse transformed (block 87) and then fed to an adder 88, whose output feeds the deblocking filter 89. At the output of the deblocking filter, the output video signal, as a decoder would decode it, can be monitored, for example for control purposes (output 90).

Using the decoded output signal at the output 90, a motion estimation is then performed in a block 91. For the motion estimation in block 91, a picture of the original input video signal is also supplied, as can be seen in FIG. 9. The standard allows two different motion estimations, namely a forward motion estimation and a backward motion estimation. In forward motion estimation, the motion of the current picture is estimated with respect to the previous picture. In backward motion estimation, on the other hand, the motion of the current picture is estimated using a future picture. The results of the motion estimation (block 91) are supplied to a motion compensation block 92, which performs a motion-compensated inter prediction, in particular when a switch 93 is set to the inter prediction mode, as is the case in FIG. 9. If, on the other hand, the switch 93 is set to intra-frame prediction, an intra-frame prediction is performed using a block 490. The motion data are not needed for this, since no motion compensation is performed for an intra-frame prediction.

The motion estimation block 91 generates motion data or motion fields. These motion data or motion fields, which consist of motion vectors, are transmitted to the decoder so that a corresponding inverse prediction, i.e. a reconstruction using the transform coefficients and the motion data, can be performed. It should be noted that in the case of forward prediction, the motion vector can be calculated from the immediately preceding picture or from several preceding pictures. It should also be noted that in the case of backward prediction, a current picture can be calculated using the immediately adjacent future picture and, of course, also using further future pictures.

A disadvantage of the video coder concept shown in FIG. 9 is that it does not offer a simple scalability option. As known in the art, the term "scalability" refers to a coder/decoder concept in which the coder provides a scaled data stream. The scaled data stream comprises a base scaling layer and one or more enhancement scaling layers. The base scaling layer comprises a representation of the signal to be coded, generally with lower quality but also with a lower data rate. The enhancement scaling layer contains a further representation of the video signal which, typically together with the representation in the base scaling layer, provides a representation with improved quality compared with the base scaling layer. The enhancement scaling layer, of course, has its own bit requirement, so that the number of bits used to represent the signal to be coded increases with each enhancement layer.

Depending on its configuration or on what is possible, a decoder decodes either only the base scaling layer, providing a comparatively poor representation of the image signal represented by the coded signal, or further layers as well. With each "addition" of a further scaling layer, the decoder can gradually improve the quality of the signal (at the expense of the bit rate).

Depending on the implementation and on the transmission channel from an encoder to a decoder, at least the base scaling layer is always transmitted, since the bit rate of the base scaling layer is typically so low that even a severely limited transmission channel will suffice. If the transmission channel does not allow more bandwidth for the application, then only the base scaling layer is transmitted, but no enhancement scaling layer. As a result, the decoder can only produce a low-quality representation of the image signal. Compared to the unscaled case, where the data rate would have been so high that transmission would not have been possible at all, the low-quality representation is an advantage. If the transmission channel permits the transmission of one or more enhancement layers, the coder also transmits one or more enhancement layers to the decoder, so that the latter can gradually increase the quality of the output video signal as required.

With regard to the coding of video sequences, two different scalabilities can be distinguished. One is temporal scalability, in which, for example, not all video frames of a video sequence are transmitted; instead, to reduce the data rate, only every second picture, every third picture, every fourth picture, etc., is transmitted.

The other is SNR scalability (SNR = signal-to-noise ratio), in which each scaling layer, i.e. both the base scaling layer and the first, second, third, ... enhancement scaling layer, contains all temporal information, but with a different quality. Thus, the base scaling layer has a low data rate but also a low signal-to-noise ratio, and this signal-to-noise ratio can then be improved step by step by adding one enhancement scaling layer at a time.

The coder concept illustrated in FIG. 9 is problematic in that it is based on the fact that only residual values are generated by the subtractor 84 and then further processed. These residual values are calculated on the basis of prediction algorithms in the arrangement shown in FIG. 9, which forms a closed loop using blocks 86, 87, 88, 89, 93, 94 and 84, with quantization entering the closed loop in blocks 86, 87.

If a simple SNR scalability were now implemented in such a way that, for example, each predicted residual signal is first quantized with a coarse quantizer step size and then quantized step by step with finer quantizer step sizes using enhancement layers, this would have the following consequences. Due to the inverse quantization and the prediction, in particular with regard to the motion estimation (block 91) and the motion compensation (block 92), which take place using the original picture on the one hand and the quantized picture on the other hand, the quantizer step sizes "drift apart" both in the encoder and in the decoder. As a result, generating the enhancement scaling layers on the encoder side becomes very problematic. Furthermore, processing the enhancement scaling layers on the decoder side becomes impossible, at least with respect to the elements defined in the H.264/AVC standard. The reason for this is the closed loop in the video encoder shown in FIG. 9, in which the quantization is contained.

The standardization document JVT-I 032, titled "SNR-Scalable Extension of H.264/AVC", by Heiko Schwarz, Detlev Marpe and Thomas Wiegand, presented at the ninth JVT meeting from 2 to 5 December 2003 in San Diego, introduces a scalable extension to H.264/AVC that includes scalability both in terms of time and in terms of the signal-to-noise ratio (with the same or different temporal accuracy). For this purpose, a lifting representation of temporal subband decompositions is introduced, which allows the use of known methods for motion-compensated prediction.

Wavelet-based video coding algorithms that use lifting implementations for wavelet analysis and wavelet synthesis are described in J.-R. Ohm, "Complexity and delay analysis of MCTF interframe wavelet structures", ISO/IEC JTC1/WG11 Doc. M8520, July 2002. Comments on scalability can also be found in D. Taubman, "Successive refinement of video: fundamental issues, past efforts and new directions", Proc. of SPIE (VCIP'03), vol. 5150, pp. 649-663, 2003. According to the invention, however, an encoder/decoder concept is achieved which, on the one hand, offers scalability and, on the other hand, can build on standard-compliant elements, in particular for motion compensation.

Before an encoder/decoder structure is discussed in detail with reference to FIG. 3, a basic lifting scheme on the coder side and an inverse lifting scheme on the decoder side will first be described. Detailed explanations of the background of the combination of lifting schemes and wavelet transforms can be found in W. Sweldens, "A custom design construction of biorthogonal wavelets", J. Appl. Comp. Harm. Anal., no. 2, pp. 186-200, 1996, and in I. Daubechies and W. Sweldens, "Factoring wavelet transforms into lifting steps", J. Fourier Anal. Appl., vol. 4 (no. 3), pp. 247-269, 1998. In general, the lifting scheme consists of three steps: the polyphase decomposition step, the prediction step and the update step.

The decomposition step comprises dividing the input-side data stream into an identical first copy for a lower branch 40a and an identical copy for an upper branch 40b. Furthermore, the copy in the upper branch 40b is delayed by one time step ($z^{-1}$), so that a sample $s_{2k+1}$ with an odd index passes through its decimator or downsampler (42a or 42b) at the same time as a sample $s_{2k}$ with an even index. The decimators 42a and 42b reduce the number of samples in the upper and lower branches 40b, 40a by eliminating every second sample.

The second area II, which relates to the prediction step, comprises a prediction operator 43 and a subtractor 44. The third area, that is to say the update step, comprises an update operator 45 as well as an adder 46. On the output side, there are also two normalizers 47, 48 for normalizing the high-pass signal $h_k$ (normalizer 47) and the low-pass signal $l_k$ (normalizer 48).

Specifically, the polyphase decomposition separates the even and odd samples of a given signal $s[k]$. Since the correlation structure typically exhibits a local characteristic, the even and odd polyphase components are highly correlated. Therefore, in a subsequent step, a prediction (P) of the odd samples is performed using the even samples. The corresponding prediction operator (P) for each odd sample $s_{odd}[k] = s[2k+1]$ is a linear combination of the adjacent even samples $s_{even}[k] = s[2k]$, i.e.

$$P(s_{even})[k] = \sum_{i} p_i \, s_{even}[k+i].$$

As a result of the prediction step, the odd samples are replaced by their corresponding prediction residual values

$$h[k] = s_{odd}[k] - P(s_{even})[k].$$

It should be pointed out that the prediction step is equivalent to performing the high-pass filtering of a two-channel filter bank, as described in I. Daubechies and W. Sweldens, "Factoring wavelet transforms into lifting steps", J. Fourier Anal. Appl., vol. 4 (no. 3), pp. 247-269, 1998. In the third step of the lifting scheme, low-pass filtering is performed by updating the even samples $s_{even}[k]$ with a linear combination of the prediction residuals $h[k]$. The corresponding update operator U is given by

$$U(h)[k] = \sum_{i} u_i \, h[k+i].$$

By replacing the even samples with

$$l[k] = s_{even}[k] + U(h)[k],$$

the given signal $s[k]$ can finally be represented by $l[k]$ and $h[k]$, each of which, however, has half the sample rate. Since both the update step and the prediction step are completely invertible, the corresponding transform can be interpreted as a critically sampled perfect reconstruction filter bank. In fact, it can be shown that any biorthogonal family of wavelet filters can be realized by a sequence of one or more prediction steps and one or more update steps. As has been stated, the normalizers 47 and 48 are supplied with suitably chosen scaling factors $F_l$ and $F_h$ for normalizing the low-pass and high-pass components.

The inverse lifting scheme, which corresponds to the synthesis filter bank, is shown on the right-hand side of FIG. 4. It simply consists of applying the prediction and update operators in reverse order and with opposite signs, followed by the reconstruction using the even and odd polyphase components. In detail, the decoder shown on the right in FIG. 4 thus again comprises a first decoder area I, a second decoder area II and a third decoder area III. The first decoder area undoes the effect of the update operator 45. This is done by supplying the high-pass signal, normalized back by a further normalizer 50, to the update operator 45. The output signal of the decoder-side update operator 45 is then fed to a subtractor 52, in contrast to the adder 46 on the coder side. The output signal of the predictor 43 is treated accordingly: it is not fed to a subtractor, as on the coder side, but to an adder 53. An upsampling of the signal by a factor of 2 then takes place in each branch (blocks 54a, 54b). Then the upper branch is shifted one sample ahead, which is equivalent to delaying the lower branch, and the data streams of the upper and lower branches are added in an adder 55 to obtain the reconstructed signal $s_k$ at the output of the synthesis filter bank.

Different wavelets can be implemented by means of the predictor 43 and the updater 45. If the so-called Haar wavelet is to be implemented, the prediction operator and the update operator are given by

$$P_{Haar}(s_{even})[k] = s[2k], \qquad U_{Haar}(h)[k] = \tfrac{1}{2}\, h[k],$$

such that

$$h[k] = s[2k+1] - s[2k] \quad \text{and} \quad l[k] = s[2k] + \tfrac{1}{2}\, h[k] = \tfrac{1}{2}\left(s[2k] + s[2k+1]\right)$$

correspond to the non-normalized high-pass and low-pass (analysis) output signals of the Haar filter, respectively.
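The three lifting steps and their inversion can be illustrated for the Haar case with a minimal sketch. It works on a one-dimensional list of samples with an even length, omits the normalizers 47 and 48, and uses the hypothetical function names `haar_analysis` and `haar_synthesis`; it is intended only to show that applying the operators in reverse order with opposite signs restores the input.

```python
def haar_analysis(s):
    """Polyphase split, prediction step and update step for the Haar wavelet."""
    if len(s) % 2:
        raise ValueError("expects an even number of samples")
    even, odd = s[0::2], s[1::2]                      # polyphase decomposition
    h = [o - e for e, o in zip(even, odd)]            # h[k] = s[2k+1] - s[2k]
    l = [e + 0.5 * hk for e, hk in zip(even, h)]      # l[k] = s[2k] + h[k] / 2
    return l, h


def haar_synthesis(l, h):
    """Undo the update step, then the prediction step, then interleave."""
    even = [lk - 0.5 * hk for lk, hk in zip(l, h)]    # inverse update
    odd = [hk + e for e, hk in zip(even, h)]          # inverse prediction
    s = []
    for e, o in zip(even, odd):
        s.extend([e, o])
    return s


signal = [4, 6, 10, 2]
low, high = haar_analysis(signal)
restored = haar_synthesis(low, high)
```

For this input the low-pass values are the pairwise means and the high-pass values are the pairwise differences, and `restored` equals the original signal.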

In the case of the 5/3 biorthogonal spline wavelet, the low-pass and high-pass analysis filters have 5 and 3 filter taps, respectively, and the corresponding scaling function is a B-spline of order 2. In coding applications for still images (such as JPEG 2000), this wavelet is used for a subband coding scheme. In a lifting environment, the corresponding prediction and update operators of the 5/3 transform are given as follows:

$$P_{5/3}(s_{even})[k] = \tfrac{1}{2}\left(s[2k] + s[2k+2]\right), \qquad U_{5/3}(h)[k] = \tfrac{1}{4}\left(h[k] + h[k-1]\right).$$
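A lifting implementation of the 5/3 wavelet can be sketched in the same way as for the Haar wavelet. The sketch below assumes the commonly used 5/3 lifting operators, in which each odd sample is predicted by the mean of its two even neighbours and each even sample is updated by a quarter of the sum of the two neighbouring residuals. The boundary handling (repeating the edge sample) and the function names are assumptions made for illustration, and normalization is omitted.

```python
def lift_53_analysis(s):
    """5/3 lifting analysis on an even-length list of samples."""
    if len(s) < 2 or len(s) % 2:
        raise ValueError("expects an even number of samples")
    even, odd = s[0::2], s[1::2]
    n = len(odd)
    # Prediction step: residual against the mean of the two even neighbours.
    h = [odd[k] - 0.5 * (even[k] + even[min(k + 1, n - 1)]) for k in range(n)]
    # Update step: add a quarter of the two neighbouring residuals.
    l = [even[k] + 0.25 * (h[k] + h[max(k - 1, 0)]) for k in range(n)]
    return l, h


def lift_53_synthesis(l, h):
    """Inverse 5/3 lifting: same operators, reverse order, opposite signs."""
    n = len(h)
    even = [l[k] - 0.25 * (h[k] + h[max(k - 1, 0)]) for k in range(n)]
    odd = [h[k] + 0.5 * (even[k] + even[min(k + 1, n - 1)]) for k in range(n)]
    s = []
    for e, o in zip(even, odd):
        s.extend([e, o])
    return s


ramp = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
low, high = lift_53_analysis(ramp)
restored = lift_53_synthesis(low, high)
```

On a linear ramp the interior residuals vanish, because the mean of the two even neighbours predicts the odd sample exactly; only the last residual is non-zero as a consequence of the assumed boundary handling.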

FIG. 3 shows a block diagram of an encoder/decoder structure with, by way of example, four filter levels both on the encoder side and on the decoder side. From FIG. 3 it can be seen that the first, second, third and fourth filter levels are identical with respect to the encoder. The filter levels with respect to the decoder are also identical. On the coder side, each filter level comprises as central elements a backward predictor $M_{i0}$ 60 and a forward predictor $M_{i1}$ 61. The backward predictor 60 corresponds in principle to the predictor 43 of FIG. 4, while the forward predictor 61 corresponds to the updater of FIG. 4.

In contrast to FIG. 4, it should be noted that FIG. 4 relates to a stream of samples in which one sample has an odd index 2k+1 while another sample has an even index 2k. The notation in FIG. 3, however, refers to a group of images rather than to a group of samples, as has already been set forth with reference to FIG. 1. If an image has, for example, a number of samples or pixels, this image is fed in as a whole. Then the next image is fed in, and so on. Thus, there are no longer odd and even samples, but odd and even images. According to the invention, the lifting scheme described for odd and even samples is applied to odd and even images, each of which has a plurality of samples. The sample-wise predictor 43 of FIG. 4 now becomes the backward motion compensation prediction 60, while the sample-wise updater 45 becomes the image-wise forward motion compensation prediction 61.

It should be noted that the motion filters, which consist of motion vectors and represent the coefficients for the blocks 60 and 61, are calculated in each case for two images related to one another and are transmitted as side information from the encoder to the decoder. A significant advantage of the inventive concept, however, is the fact that the elements 91, 92, as described with reference to FIG. 9 and standardized in H.264/AVC, can readily be used to calculate both the motion fields $M_{i0}$ and the motion fields $M_{i1}$. For the inventive concept, therefore, no new predictor/updater has to be used; instead, the algorithm for motion compensation in the forward direction or in the backward direction that already exists in the video standard and has been examined and tested for functionality and efficiency can be used.

In particular, the general structure of the filter bank shown in FIG. 3 shows a temporal decomposition of the video signal with a group of 16 images which are fed in at an input 64. The decomposition is a dyadic temporal decomposition of the video signal; in the embodiment shown in FIG. 3, with four levels, $2^4 = 16$ images, i.e. a group size of 16 images, are needed to arrive at the representation with the lowest temporal resolution, that is to say at the signals at the output 28a and the output 28b. Therefore, if 16 images are grouped, this results in a delay of 16 images, which makes the four-level concept shown in FIG. 3 rather problematic for interactive applications. If interactive applications are targeted, it is therefore preferable to form smaller groups of images, for example groups of four or eight images. The delay is then reduced accordingly, so that use for interactive applications also becomes possible. In cases where interactivity is not required, for example for storage purposes, the number of images in a group, i.e. the group size, can be increased correspondingly, for example to 32, 64, etc. images.

Thus, an iterative application of the Haar-based motion-compensated lifting scheme is used, which consists of a backward motion compensation prediction ($M_{i0}$), as in H.264/AVC, and which further comprises an update step including a forward motion compensation ($M_{i1}$). Both the prediction step and the update step use the motion compensation process as set out in H.264/AVC. Furthermore, not only the motion compensation is used, but also the deblocking filter designated by reference numeral 89.

The second filter level again comprises downsamplers 66a, 66b, a subtractor 69, a backward predictor 67, a forward predictor 68, an adder 70 and a further processing device, in order to output the first and the second high-pass image of the second level at an output of the further processing device, while the first and the second low-pass image of the second level are output at the output of the adder 70.

The coder in FIG. 3 additionally comprises a third level and a fourth level, a group of 16 images being fed into the input 64 of the fourth level. At a high-pass output 72 of the fourth level, which is also referred to as HP4, eight high-pass images are output which have been quantized with a quantization parameter Q and further processed accordingly. Correspondingly, eight low-pass images are output at a low-pass output 73 of the fourth filter level and are fed into an input 74 of the third filter level. The third level in turn operates to generate four high-pass images at a high-pass output 75, which is also referred to as HP3, and to generate four low-pass images at a low-pass output 76, which are fed into the input 10 of the second filter level and decomposed. It should be particularly pointed out that the group of images processed by a filter level need not necessarily be video images originating from an original video sequence, but may also be low-pass images that have been output at a low-pass output of the next higher filter level.

It should also be pointed out that the coder concept shown in FIG. 3 can readily be reduced to a group of eight images if the fourth filter level is simply omitted and the group of images is fed into the input 74. Likewise, the concept shown in FIG. 3 can readily be extended to a group of thirty-two images by adding a fifth filter level, in which case sixteen high-pass images are output at a high-pass output of the fifth filter level and the sixteen low-pass images at the output of the fifth filter level are fed into the input 64 of the fourth filter level.
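The number of images produced by each filter level of such a dyadic decomposition follows directly from the group size. The sketch below assumes a group size that is a power of two; the function name `highpass_counts` is hypothetical.

```python
def highpass_counts(group_size):
    """High-pass images output per filter level, highest level first.

    Each level halves the number of low-pass images passed on to the
    next lower level, until a single low-pass image remains.
    """
    if group_size < 2 or group_size & (group_size - 1):
        raise ValueError("group size must be a power of two and at least 2")
    counts = []
    remaining = group_size
    while remaining > 1:
        remaining //= 2
        counts.append(remaining)
    return counts


counts_16 = highpass_counts(16)  # four levels
counts_8 = highpass_counts(8)    # fourth level omitted
counts_32 = highpass_counts(32)  # fifth level added
```

In every case, one low-pass image of the lowest level is transmitted in addition to the high-pass images, so that the total number of transmitted images equals the group size.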

On the decoder side, the tree-like concept of the coder side is also used, but no longer, as on the coder side, from the higher level to the lower level; instead, on the decoder side, it proceeds from the lower level to the higher level. For this purpose, the data stream is received from a transmission medium, which is schematically referred to as Network Abstraction Layer 100, and the received bit stream is first subjected to inverse further processing using the inverse further processing devices, in order to obtain a reconstructed version of the first high-pass image of the first level at the output of the device 30a and a reconstructed version of the low-pass image of the first level at the output of block 30b of Fig. 3. Then, analogously to the right-hand half of Fig. 4, the forward motion compensation prediction is first undone by means of the predictor 61, in order then to subtract the output signal of the predictor 61 from the reconstructed version of the low-pass signal (subtractor 101).

The output signal of the subtractor 101 is fed to a backward compensation predictor 60 to produce a prediction result which is added in an adder 102 to the reconstructed version of the high-pass image. Then both signals, that is to say the signals in the lower branch 103a and the upper branch 103b, are brought to twice the sampling rate using the upsamplers 104a, 104b, after which the signal on the upper branch is delayed, depending on the implementation. It should be noted that the upsampling by the upsamplers 104a, 104b is done simply by inserting a number of zeros equal to the number of samples of an image. The delay by one image, caused by the element shown as z^-1 in the upper branch 103b with respect to the lower branch 103a, has the effect that the addition by an adder 106 makes the two low-pass images of the second level available one after the other at the output of the adder 106.

The reconstructed versions of the first and the second low-pass image of the second level are then fed into the decoder-side inverse filter of the second level and combined there, again by the identical implementation of the inverse filter bank, with the transmitted high-pass images of the second level in order to obtain a sequence of four low-pass images of the third level at an output 108 of the second level. The four low-pass images of the third level are combined in an inverse filter level of the third level with the transmitted high-pass images of the third level in order to obtain eight low-pass images of the fourth level in successive format at an output 110 of the inverse filter of the third level. These eight low-pass images are then combined, in an inverse filter of the fourth level, with the eight high-pass images of the fourth level received from the transmission medium 100 via the input HP4, as discussed on the basis of the first level, in order to obtain a reconstructed group of 16 images at an output 112 of the inverse filter of the fourth level.

Thus, in each stage of the analysis filter bank, two images, that is, either original images or images representing low-pass signals generated at a next higher level, are decomposed into a low-pass signal and a high-pass signal. The low-pass signal can be regarded as a representation of the similarities of the input images, while the high-pass signal can be regarded as a representation of the differences between the input images. In the corresponding stage of the synthesis filter bank, the two input images are reconstructed using the low-pass signal and the high-pass signal. Since the inverse operations of the analysis step are performed in the synthesis step, the analysis/synthesis filter bank guarantees perfect reconstruction (without quantization, of course).
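The perfect-reconstruction property of one such analysis/synthesis stage can be illustrated with a minimal sketch. It assumes the simplest case, a Haar-type lifting step without motion compensation, operating on images given as flat lists of integer samples; the function names and the integer rounding in the update step are assumptions made for illustration and are not taken from the description.

```python
def haar_analysis(first, second):
    """Decompose two equally sized images into a low-pass and a high-pass image."""
    # Prediction step: the high-pass image holds what the first image
    # fails to predict about the second image (their difference).
    high = [b - a for a, b in zip(first, second)]
    # Update step: the low-pass image is the first image plus half of the
    # high-pass image; the floor shift keeps everything in integers.
    low = [a + (h >> 1) for a, h in zip(first, high)]
    return low, high


def haar_synthesis(low, high):
    """Invert the analysis stage: undo the update step, then the prediction step."""
    first = [l - (h >> 1) for l, h in zip(low, high)]
    second = [h + a for a, h in zip(first, high)]
    return first, second


image_a = [10, 12, 200, 7]
image_b = [11, 12, 190, 90]
low, high = haar_analysis(image_a, image_b)
restored_a, restored_b = haar_synthesis(low, high)
```

Because the synthesis stage applies exactly the inverse of each lifting step in reverse order, `restored_a` and `restored_b` equal the inputs as long as `low` and `high` are not quantized in between.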

The only losses occur due to the quantization in the further processing devices, e.g. 26a, 26b, 18. If quantization is very fine, a good signal-to-noise ratio is achieved. If, however, quantization is very coarse, a relatively poor signal-to-noise ratio is achieved, but at a low bit rate, i.e. with low bandwidth demand.

Even without SNR scalability, at least a temporal scaling control could already be implemented with the concept illustrated in Fig. 3. For this purpose, a temporal scaling controller 120 is used, which is designed to receive on the input side the high-pass and low-pass outputs, or the outputs of the further processing devices (26a, 26b, 18, ...), in order to generate from these partial data streams TP1, HP1, HP2, HP3, HP4 a scaled data stream which contains, in a base scaling layer, the further processed versions of the first low-pass image and the first high-pass image. The further processed version of the second high-pass image could then be accommodated in a first enhancement scaling layer. The further processed versions of the third-level high-pass images could be accommodated in a second enhancement scaling layer, while the further processed versions of the fourth-level high-pass images are placed in a third enhancement scaling layer. Thus, on the basis of the base scaling layer alone, a decoder could already generate a temporally low-quality sequence of low-pass images of the lower level, that is, two first-level low-pass images per group of images. With the addition of each enhancement scaling layer, the number of reconstructed images per group can be doubled each time. The functionality of the decoder is typically controlled by a scaling controller, which is designed to detect how many scaling layers are contained in the data stream or how many scaling layers are to be considered by the decoder during decoding.
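The bookkeeping behind this temporal scalability can be sketched as follows. The sketch assumes a group size that is a power of two and a dyadic decomposition as in Fig. 3; both function names are hypothetical and serve only to illustrate the counts given in the text.

```python
def high_pass_images_per_level(group_size):
    """Return (filter level, number of high-pass images) from the top level down to level 1.

    Assumes group_size is a power of two, e.g. 8, 16 or 32.
    """
    counts = []
    level = group_size.bit_length() - 1  # 16 images -> 4 filter levels
    remaining = group_size
    while remaining > 1:
        remaining //= 2  # each level halves the number of low-pass images
        counts.append((level, remaining))
        level -= 1
    return counts


def reconstructable_images(enhancement_layers):
    """Images per group a decoder can rebuild from the base scaling layer
    plus the given number of temporal enhancement scaling layers."""
    return 2 * 2 ** enhancement_layers


levels = high_pass_images_per_level(16)
images = [reconstructable_images(n) for n in range(4)]
```

For a group of 16 images this gives eight high-pass images at the fourth level, four at the third, two at the second and one at the first, and a decoder that adds the three enhancement scaling layers one by one reconstructs 2, 4, 8 and finally 16 images per group.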

The JVT document JVT-J035, entitled "SNR-Scalable Extension of H.264/AVC", by Heiko Schwarz, Detlev Marpe and Thomas Wiegand, presented at the tenth JVT meeting in Waikoloa, Hawaii, 8 to 12 December 2003, shows an SNR-scalable extension of the temporal decomposition scheme illustrated in Figs. 3 and 4. In particular, a temporal scaling layer is split into individual "SNR scaling sublayers", an SNR base layer being obtained by quantizing a particular temporal scaling layer with a first, coarser quantizer step size. Then, inter alia, an inverse quantization is performed, and the result signal of the inverse quantization is subtracted from the original signal to obtain a difference signal, which is then quantized with a finer quantizer step size to obtain the second scaling layer. The second scaling layer is in turn requantized with the finer quantizer step size, and the signal obtained after requantization is subtracted from the original signal to obtain a further difference signal which, again after quantization with a still finer quantizer step size, represents a second SNR scaling layer or SNR enhancement layer.
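The layering by successive requantization can be sketched in a few lines. The sketch assumes a plain uniform scalar quantizer acting directly on sample values, without the transform and entropy coding of an actual coder; the helper names and the step sizes 16 and 4 are illustrative assumptions, not values from JVT-J035.

```python
def quantize(values, step):
    """Uniform scalar quantizer: map each value to an integer index."""
    return [round(v / step) for v in values]


def dequantize(indices, step):
    """Inverse quantization: map indices back to reconstruction values."""
    return [i * step for i in indices]


def snr_layers(signal, coarse_step, fine_step):
    """Split a signal into an SNR base layer and one SNR enhancement layer."""
    base = quantize(signal, coarse_step)
    base_reconstruction = dequantize(base, coarse_step)
    # The enhancement layer codes what the coarse layer got wrong.
    difference = [s - r for s, r in zip(signal, base_reconstruction)]
    enhancement = quantize(difference, fine_step)
    return base, enhancement


def reconstruct(base, enhancement, coarse_step, fine_step):
    """Decoder side: add the dequantized layers."""
    coarse = dequantize(base, coarse_step)
    fine = dequantize(enhancement, fine_step)
    return [c + f for c, f in zip(coarse, fine)]


signal = [37, -14, 5, 120]
base, enhancement = snr_layers(signal, coarse_step=16, fine_step=4)
base_only = dequantize(base, 16)
both_layers = reconstruct(base, enhancement, 16, 4)
```

A decoder that receives only `base` is left with an error of up to half the coarse step size, whereas adding `enhancement` bounds the error by half the fine step size.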

It has thus been found that the scalability schemes described above, which are based on motion-compensated temporal filtering (MCTF), already provide a high degree of flexibility in terms of temporal scalability and SNR scalability. However, there is still the problem that the bit rate of several scaling layers together is significantly higher than the bit rate that can be achieved if images of the highest quality are coded without scalability. Because of the side information for the different scaling layers, scalable encoders may never reach the bit rate of the non-scaled case. However, the bit rate of a data stream with multiple scaling layers should come at least as close as possible to the bit rate of the non-scaled case.

Furthermore, the scalability concept should provide a high degree of flexibility for all types of scalability, i.e. a high degree of flexibility in temporal terms, in spatial terms and in terms of SNR.

This high flexibility is particularly important where images of low resolution would already be sufficient, but a higher temporal resolution would be desirable. Such a situation arises, for example, when images contain rapid changes, as in videos of team sports, where many people move at the same time in addition to the ball.

Another disadvantage of existing scalability concepts is that they use identical motion data for all scaling layers, which either restricts the flexibility of the scalability or results in a non-optimal motion prediction and thus in an increasing residual signal of the motion prediction.

On the other hand, a completely separate motion data transmission for two different scaling layers leads to considerable overhead, since the proportion of the motion data in the total bit stream certainly becomes noticeable, particularly when relatively low SNR scaling layers are considered, in which quantization is relatively coarse. A flexible scalability concept in which different motion data in different scaling layers are possible at all is therefore bought at the price of an additional bit rate, which is particularly disadvantageous given that all efforts are directed at reducing the bit rate. Moreover, the additional bits for the transmission of the motion data are particularly prominent in the low scaling layers in comparison with the bits for the motion prediction residual values. This is particularly unpleasant precisely there, since in the low scaling layers the aim is to achieve a reasonably acceptable quality, that is, to use at least a reasonably sensible quantization parameter, and at the same time to maintain a low bit rate.

The object of the present invention is to provide a scalable video coding system concept which provides a low data rate while still showing flexibility.

This object is achieved by a device for generating a coded video sequence according to claim 1, a method for generating a coded video sequence according to claim 15, a device for decoding a coded video sequence according to claim 16, a method for decoding a coded video sequence according to claim 21, a computer program according to claim 22 or a computer-readable medium according to claim 23. The present invention is based on the finding that further data rate savings, with simultaneous flexibility with regard to different SNR or spatial scaling layers, are achieved by using the base motion data in the calculation of the enhancement motion data within the scope of an enhancement motion compensation for the enhancement scaling layer. Thus, according to the invention, the calculation of the enhancement motion data does not proceed as if there were no motion data of the base layer; rather, the motion data of the base layer are included in the calculation.

Here, according to preferred embodiments of the present invention, an adaptive concept is used, i.e. the base motion data can be taken into account in different ways for different blocks of an image, and, of course, an enhancement motion data prediction with the base motion data as a predictor can be dispensed with altogether for a block if it turns out that the prediction does not yield any data rate reduction. Whether an enhancement motion data prediction using the base motion data has been made at all, and of what kind it was, is transmitted in the bit stream by signaling information associated with a block and is thus communicated to the decoder. The decoder is therefore able, for the reconstruction of the motion data for a block, to resort to the base motion data already reconstructed in the decoder; the fact that they are to be resorted to at all, and in what way, is signaled by signaling information transmitted block by block in the bit stream.

Depending on the implementation, the base motion data may be taken into account in the actual calculation of the enhancement motion data as they are subsequently used by the enhancement motion compensator. However, according to the present invention, it is also preferred to calculate the enhancement motion data independently of the base motion data and to use the base motion data only in the post-processing of the enhancement motion data, in order to obtain the enhancement motion data that are actually transmitted to the enhancement image coder. According to the invention, an independent calculation of enhancement motion data is therefore carried out in the interest of high flexibility, these enhancement motion data calculated independently of the base motion data being used for the motion prediction on the coder side, while the base motion data are used only for the purpose of calculating a residual signal of any kind in order to reduce the bits necessary for transmitting the enhancement motion vectors.
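The second variant, in which the base motion data serve only as a predictor during post-processing, can be sketched as follows. The sketch assumes that a motion vector is a pair of integers and that the "residual signal" is a simple component-wise difference; both function names are hypothetical, and the description leaves the exact kind of residual open.

```python
def motion_vector_residual(enhancement_mv, base_mv):
    """Coder side: only the deviation from the base motion vector is transmitted."""
    return (enhancement_mv[0] - base_mv[0], enhancement_mv[1] - base_mv[1])


def reconstruct_motion_vector(residual, base_mv):
    """Decoder side: add the residual to the already reconstructed base vector."""
    return (residual[0] + base_mv[0], residual[1] + base_mv[1])


base_mv = (6, -2)          # base motion vector, already at enhancement resolution
enhancement_mv = (7, -2)   # estimated independently for the enhancement layer
residual = motion_vector_residual(enhancement_mv, base_mv)
decoded_mv = reconstruct_motion_vector(residual, base_mv)
```

The coder still uses `enhancement_mv` unchanged for its own motion prediction, so flexibility is retained; only the transmitted representation shrinks when the two layers move similarly.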

In a preferred embodiment of the present invention, the interlayer motion data prediction is supplemented by an interlayer residual value prediction in order to exploit redundancies between the different scaling layers as fully as possible for data rate reduction purposes, also in the residual values of the motion-compensated prediction.

In a preferred embodiment of the present invention, a bit rate reduction is achieved not only with a motion-compensated prediction performed within a scaling layer but also with an interlayer prediction of the residual images after the motion-compensated prediction from a lower layer, for example the base layer, to a higher layer, such as the enhancement layer. It has been found that, within the same temporal scaling layer, the residual values of the individual scaling layers considered here after the motion-compensated prediction, which are preferably scaled with regard to resolution or with regard to signal-to-noise ratio (SNR), also exhibit correlations among one another. According to the invention, these correlations are advantageously exploited by providing an interlayer predictor on the coder side for the enhancement scaling layer, which corresponds to an interlayer combiner on the decoder side. Preferably, this interlayer predictor is designed adaptively, for example in order to decide for each macroblock whether an interlayer prediction is worthwhile or whether the prediction would rather lead to a bit rate increase. The latter is the case when, with respect to a subsequent entropy coder, the prediction residual signal becomes larger than the original motion compensation residual signal of the enhancement layer. However, this situation will not occur in many cases, so that the interlayer predictor is activated and results in a significant bit rate reduction.

Preferred embodiments of the present invention will be explained in detail below with reference to the accompanying drawings, in which:

Fig. 1a shows a preferred embodiment of an encoder according to the invention;

Fig. 1b shows a more detailed representation of a base image coder of Fig. 1a;

Fig. 1c shows an explanation of the functionality of an interlayer prediction flag;

Fig. 1d is a description of a motion data flag;

Fig. 1e shows a preferred implementation of the enhancement motion compensator 1014 of Fig. 1a;

Fig. 1f shows a preferred implementation of the enhancement motion data determiner 1078 of Fig. 2;

Fig. 1g shows an overview representation of three preferred embodiments for calculating the enhancement motion data and for enhancement motion data processing for purposes of signaling and, if appropriate, residual data transmission;

Fig. 2 shows a preferred embodiment of a decoder according to the invention;

Fig. 3 is a block diagram of a four-level decoder;

Fig. 4 is a block diagram illustrating the lifting decomposition of a temporal subband filter bank;

Fig. 5a is an illustration of the functionality of the lifting scheme shown in Fig. 4;

Fig. 5b is an illustration of two preferred lifting schemes with unidirectional prediction (Haar wavelet) and bidirectional prediction (5/3 transform);

Fig. 5c shows a preferred embodiment of the prediction and update operators with motion compensation and reference indices for an arbitrary choice of the two images to be processed by the lifting scheme;

Fig. 5d shows a representation of the intra mode, in which original image information can be entered into high-pass images macroblock by macroblock;

Fig. 6a is a schematic illustration for signaling a macroblock mode;

Fig. 6b shows a schematic representation of the upsampling of motion data in the case of spatial scalability in accordance with a preferred embodiment of the present invention;

Fig. 6c is a schematic representation of the data stream syntax for motion vector differences;

Fig. 6d shows a schematic representation of a residual value syntax extension according to a preferred embodiment of the present invention;

Fig. 7 shows an overview diagram illustrating the temporal shift of a group of, for example, 8 images;

Fig. 8 shows a preferred temporal placement of low-pass images for a group of 16 images;

Fig. 9 shows an overview block diagram illustrating the basic coding structure for an encoder according to the standard H.264/AVC for a macroblock;

Fig. 10 shows a context arrangement consisting of two adjacent pixel elements A and B to the left of and above a current syntax element C, respectively; and

Fig. 11 is an illustration of the division of an image into slices.

Fig. 1a shows a preferred embodiment of an apparatus for generating a coded video sequence which comprises a base scaling layer and an enhancement scaling layer. An original video sequence with a group of 8, 16 or a different number of images is fed in via an input 1000. On the output side, the coded video sequence contains the base scaling layer 1002 and the enhancement scaling layer 1004. The enhancement scaling layer 1004 and the base scaling layer 1002 may be supplied to a bit stream multiplexer which produces a single scalable bit stream on the output side. However, a separate transmission of the two scaling layers is also possible depending on the implementation and makes sense in certain cases. Fig. 1a shows an encoder for generating two scaling layers, i.e. the base scaling layer and one enhancement scaling layer. In order to obtain an encoder which, if required, generates one or more further enhancement layers, the functionality of the enhancement scaling layer is to be repeated, a higher enhancement scaling layer always being supplied with data by the next lower enhancement scaling layer, just as, in Fig. 1a, the enhancement scaling layer 1004 shown is supplied with data by the base scaling layer 1002.

Before different types of scaling, such as SNR scalability, spatial scalability or a combined scalability of spatial and SNR scalability, are discussed in detail, the general principle of the present invention is presented first. First of all, the encoder includes a base motion compensator or base motion estimator 1006 for calculating base motion data which indicate how a macroblock in a current image has moved in relation to another image in a group of images which the base motion compensator 1006 receives on the input side. Techniques for calculating motion data, in particular for calculating a motion vector for a macroblock, which is in principle a region of pixels in a digital video image, are known. Preferably, the motion compensation calculation is used as standardized in the video coding standard H.264/AVC. Here, a macroblock of a later image is considered and it is determined how the macroblock has "moved" compared with an earlier image. This motion (in the x and y directions) is indicated by a two-dimensional motion vector, which is calculated by block 1006 for each macroblock and supplied via a motion data line 1008 to a base image coder 1010. Then, it is calculated for the next image how a macroblock has moved from the previous image to the next image. In one implementation, this new motion vector, which as it were specifies the movement from the second to the third image, can again be transmitted as a two-dimensional absolute vector with the movement between the first and the third image. For reasons of efficiency, however, it is preferred to transmit only a motion vector difference, that is, the difference between the motion vector of a macroblock from the second to the third image and the motion vector of the macroblock from the first to the second image. Alternative references, or motion vector differences relating not to immediately preceding images but to images further back, can also be used.
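The transmission of motion vector differences instead of absolute vectors can be illustrated by a minimal sketch. It assumes one motion vector per image for a given macroblock and a zero vector as predictor for the first one; the actual H.264/AVC predictor derivation is considerably more elaborate, and the function names are hypothetical.

```python
def to_differences(motion_vectors):
    """Replace each vector by its difference to the vector of the preceding image."""
    differences = []
    previous = (0, 0)  # assumed predictor for the very first vector
    for x, y in motion_vectors:
        differences.append((x - previous[0], y - previous[1]))
        previous = (x, y)
    return differences


def from_differences(differences):
    """Decoder side: accumulate the differences to recover the absolute vectors."""
    vectors = []
    previous = (0, 0)
    for dx, dy in differences:
        previous = (previous[0] + dx, previous[1] + dy)
        vectors.append(previous)
    return vectors


# Motion of one macroblock: image 1 -> 2, image 2 -> 3, image 3 -> 4.
vectors = [(4, -2), (5, -2), (5, -1)]
differences = to_differences(vectors)
recovered = from_differences(differences)
```

With steady motion the differences cluster around zero, which is what makes them cheaper to entropy code than the absolute vectors.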

The motion data calculated by block 1006 are then fed to a base motion predictor 1012, which is configured to calculate a base sequence of residual error images using the motion data and the group of images. The base motion predictor thus performs the motion compensation that has, as it were, been prepared by the motion compensator or motion estimator. This base sequence of residual error images is then supplied to the base image coder. The base image coder is configured to provide the base scaling layer 1002 at its output.

The encoder according to the invention further comprises an enhancement motion compensator or enhancement motion estimator 1014 for determining enhancement motion data. These enhancement motion data are then fed to an enhancement motion predictor 1016, which on the output side generates an enhancement sequence of residual error images and feeds it to a downstream interlayer predictor 1018. The enhancement motion predictor thus carries out the motion compensation that has, as it were, been prepared by the motion compensator or motion estimator.

The interlayer predictor is designed to calculate enhancement prediction residual error images on the output side. Depending on the implementation, the interlayer predictor uses, in addition to the data it receives from block 1016, i.e. in addition to the enhancement sequence of residual error images, the base sequence of residual error images as provided by block 1012 via a dashed bypass line 1020. Alternatively, however, block 1018 may also use an interpolated sequence of residual error images which is provided at the output of block 1012 and interpolated by an interpolator 1022. Again alternatively, the interlayer predictor may also use a reconstructed base sequence of residual error images as provided at an output 1024 of the base image coder 1010. As can be seen from Fig. 1a, this reconstructed base sequence of residual error images may be interpolated (1022) or not interpolated (1020). Generally speaking, the interlayer predictor thus operates using the base sequence of residual error images, the information at the interlayer predictor input 1026 being derived from the base sequence of residual error images at the output of block 1012, e.g. by reconstruction or interpolation.

Downstream of the interlayer predictor 1018 is an enhancement image coder 1028, which is designed to code the enhancement prediction residual error images in order to obtain the coded enhancement scaling layer 1004. In a preferred embodiment of the present invention, the interlayer predictor is configured to subtract, macroblock by macroblock and image by image, the signal at its input 1026 from the corresponding signal which the interlayer predictor 1018 receives from the enhancement motion predictor 1016. The result signal obtained in this subtraction then represents a macroblock of an image of the enhancement prediction residual error images.

In a preferred embodiment of the present invention, the interlayer predictor is designed adaptively. For each macroblock, an interlayer prediction flag 1030 is provided, which indicates to the interlayer predictor that it is to perform a prediction, or which indicates in its other state that no prediction is to be performed and that the corresponding macroblock at the output of the enhancement motion predictor 1016 is to be supplied to the enhancement image coder 1028 without further prediction. This adaptive implementation has the advantage that an interlayer prediction is performed only where it is worthwhile, i.e. where the prediction residual signal leads to a lower output data rate compared with the case in which no interlayer prediction has been performed and the output data of the enhancement motion predictor 1016 have been coded directly.
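The per-macroblock decision can be sketched as follows. The sketch assumes that a macroblock of residual values is a flat list of integers and uses the sum of absolute values as a stand-in for the bits an entropy coder would spend; this cost measure and all function names are assumptions for illustration, since the description does not fix a particular decision criterion.

```python
def estimated_cost(block):
    """Hypothetical proxy for the coded size of a block of residual values."""
    return sum(abs(value) for value in block)


def interlayer_predict(enhancement_residual, base_residual):
    """Coder side: return (interlayer prediction flag, block handed to the image coder)."""
    predicted = [e - b for e, b in zip(enhancement_residual, base_residual)]
    if estimated_cost(predicted) < estimated_cost(enhancement_residual):
        return True, predicted  # prediction pays off, flag is set
    return False, list(enhancement_residual)  # code the block directly


def interlayer_combine(flag, payload, base_residual):
    """Decoder side: undo the prediction only where the flag is set."""
    if flag:
        return [p + b for p, b in zip(payload, base_residual)]
    return list(payload)


enhancement_block = [5, -3, 8, 0]
correlated_base = [4, -3, 7, 1]
unrelated_base = [-6, 4, -7, 2]

flag_a, payload_a = interlayer_predict(enhancement_block, correlated_base)
flag_b, payload_b = interlayer_predict(enhancement_block, unrelated_base)
```

With the correlated base block the flag is set and only the small difference is coded, while with the unrelated base block the flag stays cleared and the enhancement residual is passed on unchanged; in both cases `interlayer_combine` recovers the enhancement block.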

In the case of spatial scalability, a decimator 1032 is provided between the enhancement scaling layer and the base scaling layer, which is designed to convert the video sequence at its input, which has a particular spatial resolution, into a video sequence at its output which has a lower resolution. If pure SNR scalability is intended, i.e. if the image coders 1010 and 1028 for the two scaling layers operate with different quantization parameters 1034 and 1036, respectively, the decimator 1032 is not provided. This is shown schematically in Fig. 1a by the bypass line 1038.

In the case of spatial scalability, the interpolator 1022 must furthermore be provided. In the case of pure SNR scalability, however, the interpolator 1022 is not provided. Instead, the bypass line 1020 is used, as shown in Fig. 1a.

In one implementation, the enhancement motion compensator 1014 is configured to calculate its own motion field in full, or to use the motion field calculated by the base motion compensator 1006 either directly (bypass line 1040) or after upsampling by an upsampler 1042. In the case of spatial scalability, the upsampler 1042 must be provided in order to upsample a motion vector of the base motion data to the higher resolution, i.e., for example, to scale it. If, for example, the enhancement resolution is twice as high and twice as wide as the base resolution, a macroblock (16 × 16 luminance samples) in the enhancement layer covers an image area which corresponds to a sub-macroblock (8 × 8 luminance samples) in the base layer.

In order to be able to use the base motion vector for the macroblock of the enhancement scaling layer, the base motion vector is therefore doubled in its x component and its y component, that is, scaled by a factor of 2. This will be described in more detail below with reference to Fig. 6b.
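For the dyadic case described here, the upsampling of the motion data reduces to simple arithmetic, as the following minimal sketch shows. It assumes integer motion vectors and macroblock positions counted in whole macroblocks; the names are hypothetical.

```python
MACROBLOCK_SIZE = 16  # luminance samples per macroblock side


def upsample_motion_vector(base_mv, factor=2):
    """Scale a base-layer motion vector to the enhancement-layer resolution."""
    return (base_mv[0] * factor, base_mv[1] * factor)


def covered_base_region(mb_column, mb_row, factor=2):
    """Return (x, y, size) of the base-layer area, in luminance samples,
    that corresponds to one enhancement-layer macroblock."""
    size = MACROBLOCK_SIZE // factor  # 8 x 8 sub-macroblock for factor 2
    return (mb_column * size, mb_row * size, size)


scaled_mv = upsample_motion_vector((3, -5))
region = covered_base_region(mb_column=3, mb_row=1)
```

The enhancement macroblock in column 3, row 1 thus maps to the 8 × 8 base-layer area starting at sample (24, 8), and the base vector (3, -5) becomes (6, -10) at the doubled resolution.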

On the other hand, if there is only SNR scalability, then the motion field is the same for all scaling layers. It therefore only has to be calculated once and can be used by any higher scaling layer directly as calculated by the lower scaling layer.

For the interlayer prediction, the signal at the output of the base motion predictor 1012 may be used. Alternatively, however, the reconstructed signal on line 1024 may also be used. The selection of which of these two signals is used for the prediction is made by a switch 1044. The signal on line 1024 differs from the signal at the output of block 1012 in that it has already undergone quantization. This means that the signal on line 1024 has a quantization error compared with the signal at the output of block 1012. The alternative of using the signal on line 1024 for the interlayer prediction is particularly advantageous when SNR scalability is used either alone or in conjunction with spatial scalability, since the quantization error made by the base image coder 1010 is then, so to speak, "taken along" into the higher scaling layer: the output signal of block 1018 then contains the quantization error made by the first scaling layer, which is then quantized by the enhancement image coder with a typically finer quantizer step size, or a changed quantization parameter 2 at the input 1036, and written into the enhancement scaling layer 1004.

Analogously to the interlayer prediction flag 1030, a motion data flag 1048 is also fed into the image coder, so that corresponding information is contained in the enhancement scaling layer 1004, which is then used by the decoder discussed with reference to Fig. 2.

If pure spatial scalability is used, the output signal of the base motion predictor 1012, i.e. the base sequence of residual error images, can also be used instead of the signal on line 1024, i.e. instead of the reconstructed sequence of base residual error images.

Depending on the implementation, the switch 1044 can be controlled manually or on the basis of a prediction utility function.

It should be noted at this point that preferably all the predictions, that is to say the motion prediction, the enhancement motion data prediction and the interlayer residual value prediction, are carried out adaptively. This means that motion prediction residual values need not necessarily be present for each macroblock or sub-macroblock in an image of the base sequence of residual error images, for example. Thus, despite being referred to as a "residual error image", an image of the base sequence of residual error images may also contain non-predicted macroblocks or sub-macroblocks. This situation will occur when it turns out, for example, that a new object appears in an image. Here, a motion-compensated prediction would make no sense, since the prediction residual signal would be greater than the original signal in the image. In the enhancement motion prediction in block 1016, in such a case, both the prediction operator and, if applicable, the update operator would thus be deactivated for this block (e.g. macroblock or sub-macroblock).

Nevertheless, for reasons of clarity, reference is made, for example, to a base sequence of residual error images, although perhaps only a single residual error image of the base sequence of residual error images contains a single block in which motion prediction residual signals are actually included. In typical applications, however, each residual error image will actually have a large number of blocks with motion prediction residual data.

For the purposes of the present invention, the same applies to the enhancement sequence of residual error images. The situation in the enhancement layer will thus be similar to the situation in the base layer. For the purposes of the present invention, an enhancement sequence of residual error images is therefore already a sequence of images in which, in the extreme case, only a single block of a single "residual error image" has motion prediction residual values, while in all other blocks of that image, and even in all other "residual error images", there are actually no residual errors, since the motion-compensated prediction and, if applicable, the motion-compensated update have been deactivated for all these images or blocks.

For the purposes of the present invention, this also applies to the interlayer predictor, which calculates enhancement prediction residual error images. Typically, the enhancement prediction residual error images will be present as a sequence. However, the interlayer predictor is also preferably designed adaptively. If, for example, it has turned out that a residual data prediction from the base layer to the enhancement layer was worthwhile for only a single block of a single "residual error image", while the interlayer residual data prediction has been deactivated for all other blocks of that image and possibly even for all other images of the sequence of enhancement prediction residual error images, this sequence will nevertheless be referred to as enhancement prediction residual error images. In this connection, it should be noted that the interlayer predictor can only predict residual data if motion compensation residual values have already been calculated in a corresponding block of a residual error image in the base layer, and if a motion-compensated prediction has also been made for a block corresponding to this block (e.g. at the same x, y position) in a residual error image of the enhancement sequence, so that residual error values due to a motion-compensated prediction are present in this block in the enhancement layer. Only when actual motion compensation prediction residual values exist in both blocks under consideration does the interlayer predictor preferably become active, in order to use a block of residual error values in an image of the base layer as a predictor for a block of residual error values in an image of the enhancement layer and then to transmit only the residual values of this prediction, i.e. enhancement prediction residual error data in this block of the image under consideration, to the enhancement image coder.

With reference to FIG. 1b, a more detailed representation of the base image coder 1010, of the enhancement image coder 1028, or of an arbitrary image coder is discussed below. On the input side, the image coder receives the group of residual error images and supplies them macroblock by macroblock to a transformation 1050. The transformed macroblocks are then scaled in a block 1052 and quantized using a quantization parameter 1034, 1036. At the output of block 1052, the quantization parameter used, i.e. the quantization step size used for a macroblock, and the quantization indices for the spectral values of the macroblock are output. This information is then supplied to an entropy coding stage, not shown in FIG. 1b, which comprises a Huffman coder or, preferably, an arithmetic coder that operates according to the known CABAC concept of H.264/AVC. However, the output of block 1052 is also applied to block 1054, which performs inverse scaling and requantization in order to convert the quantization indices, together with the quantization parameter, back into numerical values, which are then fed to an inverse transformation in block 1036 in order to obtain a reconstructed group of residual error images. Compared with the original group of residual error images at the input of the transformation block 1050, this reconstructed group now contains a quantization error that depends on the quantization parameter or the quantization step size. Depending on the control of the switch 1044, either the one signal or the other signal is then supplied to the interpolator 1022 or directly to the inter-layer predictor 1018 in order to carry out the residual value prediction according to the invention.
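
Purely as an illustration of the scaling/quantization in block 1052 and the inverse operation in block 1054, the following Python sketch quantizes transform coefficients with a uniform step size and reconstructs them again. The function names and the plain uniform quantizer are assumptions made for this example; the mapping from a quantization parameter to a step size used in H.264/AVC is not reproduced here.

```python
import numpy as np

def quantize(coeffs, step):
    """Map coefficients to integer quantization indices (assumed uniform quantizer)."""
    return np.round(coeffs / step).astype(int)

def dequantize(indices, step):
    """Convert quantization indices back to numerical values."""
    return indices * step

coeffs = np.array([10.2, -3.7, 0.4, 25.4])
step = 2.0
indices = quantize(coeffs, step)
reconstructed = dequantize(indices, step)
# The reconstruction differs from the input by a quantization error
# that depends on the step size (at most step / 2 per coefficient).
error = np.abs(reconstructed - coeffs)
```

A coarser step size yields fewer distinct indices and a larger reconstruction error, which is the dependence on the quantization step size described above.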

FIG. 1c shows a simple implementation of the inter-layer prediction flag 1030. If the inter-layer prediction flag is set, the inter-layer predictor 1018 is activated. If, on the other hand, the flag is not set, the inter-layer predictor is deactivated, so that a simulcast operation is carried out for this macroblock or for a sub-macroblock subordinate to this macroblock. The reason for this could be that the coding gain of the prediction is actually a coding loss, i.e. that transmitting the corresponding macroblock at the output of block 1016 yields a better coding gain in the subsequent entropy coding than using prediction residual values.
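
The block-wise behaviour of the inter-layer predictor can be sketched as follows. This is a minimal illustration under stated assumptions: the function name and its boolean arguments are hypothetical, and the base layer block is assumed to have already been interpolated to the enhancement layer resolution where necessary.

```python
import numpy as np

def interlayer_predict_block(enh_res, base_res, flag_set,
                             base_has_residual, enh_has_residual):
    """Return the block that is passed on to the enhancement image coder."""
    if flag_set and base_has_residual and enh_has_residual:
        # Predictor active: only the difference to the base layer block is coded.
        return enh_res - base_res
    # Predictor inactive: the enhancement residual block is coded as is (simulcast).
    return enh_res.copy()

enh = np.array([[5, 3], [2, 1]])
base = np.array([[4, 3], [1, 0]])
predicted = interlayer_predict_block(enh, base, True, True, True)
simulcast = interlayer_predict_block(enh, base, False, True, True)
```

An encoder would typically set the flag per block by comparing the coding cost of both variants; that decision is outside the scope of this sketch.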

A simple implementation of the motion data flag 1048 is shown in FIG. If the flag is set, motion data of the enhancement layer are derived from the up-sampled motion data of the base layer. In the case of SNR scalability, the up-sampler 1053 is not necessary. Here, with the flag 1048 set, the motion data of the enhancement layer can be derived directly from the base motion data. It should be pointed out that this motion data "derivation" can consist in directly adopting the motion data, or in a true prediction in which block 1014 subtracts the motion vectors obtained from the base layer from the corresponding motion vectors calculated by block 1014 for the enhancement layer, for example, in order to obtain motion data prediction values. The motion data of the enhancement layer (if no prediction of any kind has been made) or the residual values of the prediction (if a true prediction has been made) are supplied via an output shown in FIG. 1a to the enhancement image coder 1028, so that they are ultimately contained in the enhancement scaling layer bit stream 1004. If, on the other hand, the motion data are taken over completely from the base scaling layer, with or without scaling, no enhancement motion data need to be written into the enhancement scaling layer bit stream 1004. It is sufficient merely to signal this fact by means of the motion data flag 1048 in the enhancement scaling layer bit stream.

FIG. 2 shows an apparatus for decoding a coded video sequence that comprises the base scaling layer 1002 and the enhancement scaling layer 1004. The enhancement scaling layer 1004 and the base scaling layer 1002 may originate from a bit stream demultiplexer which demultiplexes a scalable bit stream containing both scaling layers in order to extract both the base scaling layer 1002 and the enhancement scaling layer 1004 from the common bit stream. The base scaling layer 1002 is supplied to a base image decoder 1060, which is designed to decode the base scaling layer in order to obtain a decoded base sequence of residual error images and base motion data, which are present on an output line 1062. The output signals on line 1062 are then fed to a base motion combiner 1064, which reverses the base motion prediction introduced in block 1012 of the encoder in order to output the decoded images of the first scaling layer on the output side. The decoder according to the invention further comprises an enhancement image decoder 1066 for decoding the enhancement scaling layer 1004 in order to obtain enhancement prediction residual error images on an output line 1068. The output line 1068 further carries motion data information, such as the motion data flag 1070 or, if enhancement motion data or enhancement motion data residual values were actually present in the enhancement scaling layer 1004, these enhancement motion data. The decoded base sequence on line 1062 is then either interpolated by an interpolator 1070 or supplied unchanged (line 1072) to an inter-layer combiner 1074 in order to reverse the inter-layer prediction carried out by the inter-layer predictor 1018 of FIG. 1a. The inter-layer combiner is thus designed to combine the enhancement prediction residual error images with information about the decoded base sequence on line 1062, whether interpolated (1070) or not (1072), in order to obtain an enhancement sequence of residual error images. This sequence is finally supplied to an enhancement motion combiner 1076 which, like the base motion combiner 1064, reverses the motion compensation carried out in the enhancement layer. The enhancement motion combiner 1076 is coupled to a motion data determiner 1078 in order to provide the motion data for the motion combination in block 1076. The motion data may actually be complete enhancement motion data for the enhancement layer, provided by the enhancement image decoder at the output 1068. Alternatively, the enhancement motion data may also be motion data residual values. In both cases, the corresponding data are supplied to the motion data determiner 1078 via an enhancement motion data line 1080. If, however, the motion data flag 1070 signals that no separate enhancement motion data have been transmitted for the enhancement layer, the necessary motion data are fetched from the base layer via a line 1082, either directly (line 1084) or after up-sampling by an up-sampler 1086, depending on the scalability used.

In the case of an inter-layer prediction of INTRA blocks, that is to say blocks without motion data residual values, reference is made to FIG. Furthermore, a corresponding connection between the enhancement motion combiner 1076 and the base motion combiner 1064 is provided, which comprises an interpolator 1090 in the case of spatial scalability, or a bypass line if only SNR scalability has been used. In the case of an optional intra-block prediction between two layers, only a prediction residual signal for this intra macroblock is transmitted in the enhancement layer, which is indicated by corresponding signaling information in the bit stream. In this case, in addition to the functionality set out below, the enhancement motion combiner also performs a summation for this one macroblock, that is to say a combination of the macroblock residual values and the macroblock values from the lower scaling layer, and then subjects the macroblock obtained to the actual inverse motion compensation processing.

With reference to FIGS. 3 to 5d, a preferred embodiment of the base motion predictor 1012 or the enhancement motion predictor 1016, and of the respective inverse element, that is, the enhancement motion combiner 1076 or the base motion combiner 1064, is discussed below.

In principle, any motion-compensated prediction algorithm may be used, including the motion compensation algorithm described at 92 in FIG. The conventional motion compensation algorithm also follows the system shown in FIG. 1, but with the update operator U, which is represented by the reference symbol 45 in FIG., deactivated. As a result, a group of images is converted into an original image and residual images, or prediction residual signals or residual error images, that depend on it to a certain extent. If, however, the known motion compensation scheme is extended such that the update operator, as shown in FIG. 4, is active and is calculated, for example, as illustrated with reference to FIGS. 5a to 5d, the normal motion-compensated prediction calculation becomes so-called MCTF processing, also referred to as motion-compensated temporal filtering. The update operation turns the normal image or INTRA image of conventional motion compensation into a low-pass image, since the original image is combined with the prediction residual signal weighted by the update operator.

In the preferred embodiment of the present invention, as already described with reference to FIGS. 1a and 2, such MCTF processing is performed for each scaling layer, the MCTF processing preferably taking place as described with reference to FIGS. 3 to 5d and FIGS. 7 to 8.

The preferred embodiment of the motion-compensated prediction filter is described below with reference to FIG. 4 and the subsequent FIGS. 5a to 5d. As already stated, the motion-compensated temporal filter (MCTF) consists of a general lifting scheme with three steps, namely the polyphase decomposition, the prediction and the update. FIG. 4 shows the corresponding analysis/synthesis filter bank structure. On the analysis side, the odd samples of a given signal are predicted by a linear combination of the even samples using the prediction operator P, and a high-pass signal h is formed from the prediction residual values. A corresponding low-pass signal l is formed by adding a linear combination of the prediction residual values h to the even samples of the input signal s using the update operator U. The relationship between the variables h and l shown in FIG. 4 in equation form, and the basic embodiments of the operators P and U, are shown in FIG. 5a.

Since both the prediction step and the update step are completely invertible, the corresponding transformation can be regarded as a critically sampled perfect reconstruction filter bank. The synthesis filter bank applies the prediction operator and the update operator in reverse order and with inverted signs in the summation process, using the even and the odd polyphase components. For a normalization of the high-pass and low-pass components, corresponding scaling factors F_l and F_h are used. These scaling factors do not necessarily have to be used, but they can be employed when quantizer step sizes are selected during encoding.
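
The invertibility of the lifting steps can be illustrated with a short Python sketch. It assumes the usual Haar lifting steps, h[k] = s[2k+1] − s[2k] and l[k] = s[2k] + h[k]/2, applied to a one-dimensional signal of even length without motion compensation and without the scaling factors; the function names are chosen for this example only.

```python
import numpy as np

def haar_analysis(s):
    """Polyphase split followed by a prediction and an update step."""
    even, odd = s[0::2], s[1::2]
    h = odd - even        # prediction: odd samples predicted from even samples
    l = even + 0.5 * h    # update: even samples corrected by the residual
    return l, h

def haar_synthesis(l, h):
    """Same operators in reverse order with inverted signs."""
    even = l - 0.5 * h    # undo the update step
    odd = h + even        # undo the prediction step
    s = np.empty(even.size + odd.size, dtype=float)
    s[0::2], s[1::2] = even, odd
    return s

signal = np.array([4.0, 6.0, 10.0, 2.0])
low, high = haar_analysis(signal)
restored = haar_synthesis(low, high)
```

Because the synthesis side subtracts exactly what the analysis side added, the reconstruction is exact regardless of how the prediction and update operators are chosen, which is the property that the motion-compensated variants rely on.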

Let f[x, k] denote a video signal with the spatial coordinates x = (x, y)^T, where k is the time coordinate. The prediction operator P and the update operator U for the temporal decomposition using the lifting representation of the Haar wavelet are then given as shown on the left in FIG. 5b. For the 5/3 transformation, the corresponding operators are as shown on the right in FIG. 5b. The extension to motion-compensated temporal filtering is achieved by modifying the prediction operator and the update operator as shown in FIG. 5c. Particular reference is made to the reference indices r > 0, which permit general image-adaptive motion-compensated filtering. These reference indices ensure that, in the scenario illustrated in FIG. 4, it is not always only two temporally successive images that are decomposed into a high-pass image and a low-pass image, but that, for example, a first image can be filtered in a motion-compensated manner with a third image of a sequence. Alternatively, an appropriate choice of reference indices allows, for example, one and the same image of a sequence to serve as the basis for the motion vectors. This means that, for a sequence of eight images, for example, the reference indices allow all motion vectors to be related to, say, the fourth image of this sequence, so that processing these eight images through the filter scheme of FIG. 4 ultimately results in a single low-pass image and seven high-pass images (enhancement images), and all motion vectors, with one enhancement image being assigned to each motion vector, relate to the same image of the original sequence.

Thus, if one and the same image of a sequence is used as a reference for the filtering of several further images, this results in a temporal resolution scaling that does not obey the factor 2, which can be advantageous for certain applications. The same image, for example the fourth image of the sequence of eight images, is then always fed into the lower branch of the analysis filter bank in FIG. 4. The low-pass image is the same for each filtering operation, namely the single low-pass image of the sequence of images that is ultimately desired. If the update parameter is zero, the base image is simply "passed through" the lower branch. In contrast, the high-pass image always depends on the corresponding other image of the original sequence and on the prediction operator, the motion vector associated with this input image being used in the prediction. In this case, it can therefore be said that the low-pass image finally obtained is associated with a particular image of the original sequence of images, and that each high-pass image is also associated with an image of the original sequence, the high-pass image corresponding exactly to the deviations of the original image of the sequence (after motion compensation) from the selected base image of the sequence (which is fed into the lower branch of the analysis filter bank of FIG. 4). If each update parameter M_01, M_11, M_21 and M_31 is equal to zero, the image fed into the lower branch 73 of the fourth level is simply "looped through". The low-pass image TP1 is, so to speak, fed into the filter bank "repeatedly", while the other images, controlled by the reference indices, are introduced one after another into the input 64 of FIG. 3.

As can be seen from the above equations, the prediction and update operators for motion-compensated filtering provide different predictions for the two different wavelets. If the Haar wavelet is used, a unidirectional motion-compensated prediction is obtained. If, on the other hand, the 5/3 spline wavelet is used, the two operators specify a bidirectional motion-compensated prediction.

Since bidirectional motion-compensated prediction generally reduces the energy of the prediction residual but increases the motion vector rate compared with unidirectional prediction, it is desirable to switch dynamically between unidirectional and bidirectional prediction. This means that it is possible to switch back and forth between a lifting representation of the Haar wavelet and of the 5/3 spline wavelet depending on an image-dependent control signal. The concept according to the invention, which does not use a closed feedback loop for temporal filtering, readily permits this macroblock-wise switching between two wavelets, which in turn benefits flexibility and, in particular, a data rate saving that can be optimally adapted to the signal.

To represent the motion fields or, more generally, the prediction data fields M_P and M_U, the existing syntax of the B slices in H.264/AVC can advantageously be used.

By cascading the pairwise image decomposition stages, a dyadic tree structure is obtained which decomposes a group of 2^n images into 2^n − 1 residual images and a single low-pass (or intra) image, as shown with reference to FIG. 7 for a group of eight images. In particular, FIG. 7 shows the first-level high-pass image HP1 at the output 22 of the first-level filter and the first-level low-pass image at the output 24 of the first-level filter. The two low-pass images TP2 at the output 16 of the second-level filter and the two high-pass images obtained from the second level are shown in FIG. 7 as second-level images. The third-level low-pass images are present at the output 76 of the third-level filter, while the third-level high-pass images are present at the output 75 in further processed form. The group of eight images could originally comprise eight video images, in which case the decoder of FIG. 3 would be used without a fourth filter level. If, on the other hand, the group of eight images is a group of eight low-pass images, as present at the output 73 of the fourth-level filter, the MCTF decomposition according to the invention can be used as the base motion predictor or enhancement motion predictor, or as the base motion combiner or enhancement motion combiner.

Generally speaking, this decomposition thus turns a group of 2^n images into (2^{n+1} − 2) motion field descriptions, (2^n − 1) residual images and a single low-pass (or INTRA) image.
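
The counts stated above can be checked by walking through the decomposition stages. The following sketch assumes that every pairwise decomposition contributes one residual image and two motion field descriptions (one for the prediction step and one for the update step); the function name is hypothetical.

```python
def decompose_group(n):
    """Count the outputs of a dyadic decomposition of a group of 2**n images."""
    low_pass = 2 ** n
    residuals = 0
    motion_fields = 0
    while low_pass > 1:
        pairs = low_pass // 2
        residuals += pairs          # one high-pass (residual) image per pair
        motion_fields += 2 * pairs  # assumed: one field for prediction, one for update
        low_pass = pairs            # the low-pass images enter the next stage
    return residuals, motion_fields, low_pass

counts_for_eight = decompose_group(3)
```

For a group of eight images this gives seven residual images, fourteen motion field descriptions and one low-pass image, in agreement with the closed-form expressions.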

Both the base motion compensator and the enhancement motion compensator are preferably controlled by a base control parameter and an enhancement control parameter, respectively, in order to calculate an optimum combination of a quantization parameter (1034 or 1036) and motion information, determined as a function of a specific rate. The procedure follows the methodology below in order to obtain an optimum ratio with respect to a certain maximum bit rate. It has been found that for low bit rates, i.e. for relatively coarse quantization parameters, the motion vectors carry more weight than for higher scaling layers, in which relatively fine quantization parameters are used. Therefore, for coarse quantization and thus a lower bit rate, less motion data is calculated than for higher scaling layers. In higher scaling layers, it is thus preferable to go into sub-macroblock modes in order to calculate a relatively large amount of motion data for good quality and for an optimum situation at the high bit rate, whereas at a low bit rate the motion data carry proportionally more weight relative to the residual data than in the case of a higher scaling layer. This is explained below.

Let images A and B be given, which are either original images or images representing low-pass signals generated in a previous analysis stage. Furthermore, the corresponding arrays of luma samples a[] and b[] are provided. The motion description M_{i0} is estimated macroblock by macroblock in the following way:

For all possible macroblock and sub-macroblock partitions of a macroblock i within image B, the associated motion vectors

$$ \mathbf{m}_i = [m_x, m_y]^T $$

are determined by minimizing the Lagrangian functional

$$ \mathbf{m}_i = \arg\min_{\mathbf{m} \in S} \left\{ D_{SAD}(i, \mathbf{m}) + \lambda \cdot R(i, \mathbf{m}) \right\} $$

the distortion term being given as follows:

$$ D_{SAD}(i, \mathbf{m}) = \sum_{(x,y) \in P} \left| b[x, y] - a[x - m_x, y - m_y] \right| $$

Here, S specifies the motion vector search area within the reference image A. P is the area covered by the macroblock partition or sub-macroblock partition under consideration. R(i, m) specifies the number of bits needed to transmit all components of the motion vector m, and λ is a fixed Lagrangian multiplier.

The motion search first proceeds through all integer-sample-accurate motion vectors in the given search area S. Then, using the best integer motion vector, the eight surrounding half-sample-accurate motion vectors are tested. Finally, using the best half-sample-accurate motion vector, the eight surrounding quarter-sample-accurate motion vectors are tested. For the half-sample and quarter-sample motion vector refinement, the term a[x − m_x, y − m_y] is interpreted as an interpolation operator.
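
The first stage of this search, the integer-sample search, can be sketched as follows. The sketch is illustrative only: the function names are hypothetical, the rate term is modelled by the length of a signed Exp-Golomb code per vector component (an assumption, since the text only states that R counts the bits for the motion vector), and the half-sample and quarter-sample refinement, which would require an interpolation filter, is omitted.

```python
import numpy as np

def se_golomb_bits(v):
    """Length in bits of a signed Exp-Golomb code for v (assumed rate model)."""
    code_num = 2 * abs(v) - (1 if v > 0 else 0)
    return 2 * ((code_num + 1).bit_length() - 1) + 1

def integer_motion_search(a, b, x0, y0, size, search_range, lam):
    """Return m = (mx, my) minimizing D_SAD + lam * R for one block of image b."""
    block = b[y0:y0 + size, x0:x0 + size].astype(int)
    best_cost, best_mv = None, None
    for my in range(-search_range, search_range + 1):
        for mx in range(-search_range, search_range + 1):
            xs, ys = x0 - mx, y0 - my  # reference samples a[x - mx, y - my]
            if xs < 0 or ys < 0 or xs + size > a.shape[1] or ys + size > a.shape[0]:
                continue  # candidate points outside the reference image
            ref = a[ys:ys + size, xs:xs + size].astype(int)
            distortion = int(np.abs(block - ref).sum())
            rate = se_golomb_bits(mx) + se_golomb_bits(my)
            cost = distortion + lam * rate
            if best_cost is None or cost < best_cost:
                best_cost, best_mv = cost, (mx, my)
    return best_mv

rng = np.random.default_rng(0)
image_a = rng.integers(0, 256, size=(32, 32))
# Image b is image a shifted by 2 samples in x and 1 sample in y.
image_b = np.roll(image_a, shift=(1, 2), axis=(0, 1))
found = integer_motion_search(image_a, image_b, 8, 8, 8, 3, 1.0)
```

Larger values of the multiplier penalize long vectors more strongly, so the search trades distortion against motion vector rate in the manner described by the Lagrangian functional.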

The mode decision for the macroblock mode and the sub-macroblock mode basically follows the same approach. From a given set of possible macroblock or sub-macroblock modes S_mode, the mode p_i that minimizes the following Lagrangian functional is selected:

$$ p_i = \arg\min_{p \in S_{mode}} \left\{ D_{SAD}(i, p) + \lambda \cdot R(i, p) \right\} $$

The distortion term is given as follows:

$$ D_{SAD}(i, p) = \sum_{(x,y) \in P} \left| b[x, y] - a[x - m_x[p, x, y], y - m_y[p, x, y]] \right| $$

where P specifies the macroblock or sub-macroblock area, and where m[p, x, y] is the motion vector associated with the macroblock or sub-macroblock mode p and with the partition or sub-macroblock partition that contains the luma position (x, y).

The rate term R(i, p) represents the number of bits associated with the choice of coding mode p. For the motion-compensated coding modes, it includes the bits for the macroblock mode (if applicable), the sub-macroblock mode(s) (if applicable) and the motion vector(s). For the intra mode, it includes the bits for the macroblock mode and the arrays of quantized luma and chroma transform coefficient levels.

The set of possible sub-macroblock modes is given by

{P_8x8, P_8x4, P_4x8, P_4x4}.

The set of possible macroblock modes is given by

{P_16x16, P_16x8, P_8x16, P_8x8, INTRA},

where the INTRA mode is only used if a motion field description M_{i0} used for the prediction step is estimated.

The Lagrangian multiplier λ is set as a function of the base layer quantization parameter for the high-pass image or images QP_Hi of the decomposition stage for which the motion field is estimated, according to the following equation:

$$ \lambda = 0.33 \cdot 2^{(QP_{Hi}/3 - 4)} $$
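
Reading the equation above as λ = 0.33 · 2^(QP_Hi/3 − 4), the dependence of the multiplier on the quantization parameter can be made concrete with a few values. The function name is chosen for this example only.

```python
def lagrange_multiplier(qp_high_pass):
    """Lagrangian multiplier as a function of the high-pass quantization parameter."""
    return 0.33 * 2.0 ** (qp_high_pass / 3.0 - 4.0)

lam_12 = lagrange_multiplier(12)
lam_15 = lagrange_multiplier(15)
lam_30 = lagrange_multiplier(30)
```

The multiplier doubles whenever the quantization parameter increases by 3, so coarser quantization puts more weight on the rate term and favours motion descriptions that cost fewer bits.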

According to the invention, the decomposition scheme shown in FIG. 8 is used, which is assumed to enable a reasonable compromise between temporal scalability and coding efficiency. The sequence of original images is treated as a sequence of input images A, B, A, B, A, B, ..., A, B. This scheme thus provides a stage with optimal temporal scalability (equal spacing between the low-pass images). The sequences of low-pass images, which are used as input signals in all subsequent decomposition stages, are treated as sequences of input images B, A, A, B, B, A, ..., A, B, whereby the distances between the low-pass images that are decomposed in the subsequent two-channel analysis scheme are kept small, as can be seen in FIG. 8.

Preferred implementations of both the motion data inter-layer prediction and the residual data inter-layer prediction are discussed below with reference to FIGS. 6a to 6d. In order to achieve spatial or SNR scalability, motion data and texture data from a lower scaling layer are in principle used for prediction purposes for a higher scaling layer. In particular in the case of spatial scalability, up-sampling of the motion data will be necessary before they can be used as a prediction for the decoding of spatial enhancement layers. The motion prediction data of a base layer representation are transmitted using a subset of the existing B-slice syntax of AVC. To encode the motion field of an enhancement layer, two additional macroblock modes are preferably introduced.

The first macroblock mode is "Base_Layer_Mode" and the second mode is "Qpel_Refinement_Mode". To signal these two additional macroblock modes, two flags, BLFlag and QrefFlag, are added to the macroblock layer syntax in front of the mb_mode syntax element, as shown in FIG. 6a. The first flag, BLFlag 1098, thus signals the base layer mode, while the other flag 1100 signals the Qpel refinement mode. If such a flag is set, it has the value 1, and the data stream is as shown in FIG. 6a. Thus, if the flag 1098 has the value 1, the flag 1100 and the macroblock mode syntax element 1102 play no further role. If, on the other hand, the flag 1098 has the value zero, it is not set, and the flag 1100 comes into play which, if set, in turn bypasses the element 1102. If, however, both flags 1098 and 1100 have the value zero, i.e. neither is set, the macroblock mode is evaluated in the syntax element 1102.
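
The evaluation order of the two flags and the mb_mode syntax element can be sketched as a small decision function. The function name and the use of None for an absent flag are assumptions made for this illustration; the actual bit stream parsing is not shown.

```python
def select_macroblock_mode(bl_flag, qref_flag, mb_mode):
    """Decide which macroblock mode applies, in the order BLFlag, QrefFlag, mb_mode."""
    if bl_flag == 1:
        # Flag 1098 set: flag 1100 and syntax element 1102 play no further role.
        return "Base_Layer_Mode"
    if qref_flag == 1:
        # Flag 1100 set: syntax element 1102 is bypassed.
        return "Qpel_Refinement_Mode"
    # Neither flag set (or flag 1100 absent): mb_mode is evaluated.
    return mb_mode

mode_a = select_macroblock_mode(1, 0, "16x8")
mode_b = select_macroblock_mode(0, 1, "16x8")
mode_c = select_macroblock_mode(0, None, "16x8")
```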

Thus, if BLFlag = 1, the base layer mode is used, and no further information is transmitted for the corresponding macroblock. This macroblock mode indicates that the motion prediction information, including the macroblock partitioning, of the corresponding macroblock of the base layer is used directly for the enhancement layer. It should be noted that here and throughout the application, the term "base layer" denotes the next lower layer with respect to the layer currently under consideration, i.e. the enhancement layer. If the base layer represents a layer with half the spatial resolution, the motion vector field, i.e. the field of motion vectors including the macroblock partitioning, is scaled accordingly, as shown in FIG. 6b. In this case, the current macroblock covers the same region as an 8x8 sub-macroblock of the base layer motion field. Thus, if the corresponding base layer macroblock is coded in Direct, 16x16, 16x8 or 8x16 mode, or if the corresponding base layer sub-macroblock is coded in 8x8 mode or in Direct8x8 mode, the 16x16 mode is used for the current macroblock. If, on the other hand, the base layer sub-macroblock is coded in 8x4, 4x8 or 4x4 mode, the macroblock mode for the current macroblock is 16x8, 8x16 or 8x8 (with all sub-macroblock modes = 8x8), respectively. If the base layer macroblock is an INTRA macroblock, the current macroblock is set to INTRA_BASE, which means that it is a macroblock with a prediction from the base layer. For the macroblock partitions of the current macroblock, the same reference indices are used as for the corresponding macroblock/sub-macroblock partitions of the base layer block. The associated motion vectors are multiplied by a factor of 2. This factor applies to the situation shown in FIG. 6b, in which a base layer 1102 comprises half the area or number of pixels of the enhancement layer 1104. If the ratio of the spatial resolution of the base layer to the spatial resolution of the enhancement layer is not equal to 1/2, correspondingly different scaling factors are used for the motion vectors.
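
For the case of a base layer with half the spatial resolution, the derivation of the macroblock mode and the scaling of the motion vectors can be sketched as follows. The mode names are written as plain strings, the pairing of 8x4, 4x8 and 4x4 with 16x8, 8x16 and 8x8 follows the order given in the text, and the function names are hypothetical.

```python
def derive_enhancement_mode(base_mb_mode, base_sub_mode=None):
    """Derive the current macroblock mode from the co-located base layer block."""
    if base_mb_mode == "INTRA":
        return "INTRA_BASE"
    if base_mb_mode in ("Direct", "16x16", "16x8", "8x16"):
        return "16x16"
    # Base macroblock coded in 8x8 mode: the co-located 8x8 sub-macroblock decides.
    sub_mode_map = {
        "8x8": "16x16",
        "Direct8x8": "16x16",
        "8x4": "16x8",
        "4x8": "8x16",
        "4x4": "8x8",
    }
    return sub_mode_map[base_sub_mode]

def scale_motion_vector(mv, factor=2):
    """Scale a base layer motion vector (mx, my) to the enhancement layer resolution."""
    return (mv[0] * factor, mv[1] * factor)

mode = derive_enhancement_mode("8x8", "8x4")
scaled = scale_motion_vector((3, -5))
```

The mapping simply doubles every partition dimension, which is why an 8x8 sub-macroblock of the base layer corresponds to a full 16x16 macroblock of the enhancement layer.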

If, on the other hand, the flag 1098 equals zero and the flag 1100 equals 1, the macroblock mode Qpel_Refinement_Mode is signaled. The flag 1100 is preferably present only if the base layer represents a layer with half the spatial resolution of the current layer. Otherwise, the macroblock mode (Qpel_Refinement_Mode) is not contained in the set of possible macroblock modes. This macroblock mode is similar to the base layer mode. The macroblock partitioning as well as the reference indices and the motion vectors are derived as in the base layer mode. For each motion vector, however, an additional quarter-sample motion vector refinement of −1, 0 or +1 is transmitted for each motion vector component and added to the derived motion vector.
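
A minimal sketch of the refinement step, assuming that motion vectors are stored as integer pairs in quarter-sample units and that the base layer has half the spatial resolution (scaling factor 2); the function name is hypothetical.

```python
def qpel_refine(base_mv, refinement, factor=2):
    """Derive the enhancement vector from the base vector plus a quarter-sample refinement."""
    if any(r not in (-1, 0, 1) for r in refinement):
        raise ValueError("each refinement component must be -1, 0 or +1")
    # Scale the base layer vector, then add the transmitted refinement per component.
    return tuple(factor * c + r for c, r in zip(base_mv, refinement))

refined = qpel_refine((3, -2), (1, 0))
```

Since only three values per component have to be signaled, the refinement is cheap to transmit while restoring quarter-sample accuracy after the scaled vector has lost it.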

If the flag 1098 = zero and the flag 1100 = zero, or if the flag 1100 is not present, the macroblock mode and the corresponding reference indices and motion vector differences are specified as usual. This means that the complete set of motion data for the enhancement layer is transmitted in the same way as for the base layer. According to the invention, however, it is also possible here to use the base layer motion vector as a predictor for the current enhancement layer motion vector (instead of the spatial motion vector predictor). Let list X (where X is 0 or 1) denote the reference index list of the motion vector under consideration. If all of the following conditions are true, a flag MvPrdFlag is transmitted for each motion vector difference, as shown in FIG. 6c:

the base layer macroblock comprising the current macroblock / sub-macroblock partitions is not encoded in an INTRA macroblock mode;

the base-layer macroblock / sub-macroblock partitioning covering the upper left sample of the present macroblock / sub-macroblock partitioning uses the List X or a bi-prediction;

the list X reference index of the base-layer macroblock / sub-macroblock partitioning, which includes the upper left sample of the current macroblock / sub-macroblock partitioning, is equal to the list X-reference index of the current macroblock / sub-macroblock partitioning.

If the flag 1106 of FIG. 6c is not present, or if this flag 1106 = zero, the spatial motion vector predictor is specified as in the AVC standard. Otherwise, if the flag 1106 is present and = 1, the corresponding base layer vector is used as the motion vector predictor. In this case, the list X motion vector (where X = 0 or 1) of the current macroblock/sub-macroblock partition is obtained by adding the transmitted list X motion vector difference to the possibly scaled list X motion vector of the base layer macroblock/sub-macroblock partition.
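
The conditions for transmitting MvPrdFlag and the resulting reconstruction of a motion vector can be sketched as follows. The function and parameter names are hypothetical, motion vectors are modelled as integer pairs, and the spatial predictor of the AVC standard is passed in as a precomputed value rather than derived here.

```python
def mv_prd_flag_present(base_is_intra, base_uses_list_x, base_ref_idx, cur_ref_idx):
    """Check the three conditions under which MvPrdFlag is transmitted."""
    return (not base_is_intra) and base_uses_list_x and base_ref_idx == cur_ref_idx

def reconstruct_motion_vector(mvd, spatial_pred, base_mv, mv_prd_flag, factor=2):
    """Add the transmitted difference to the selected predictor."""
    if mv_prd_flag == 1:
        # Base layer vector, scaled if the layers differ in resolution.
        pred = (base_mv[0] * factor, base_mv[1] * factor)
    else:
        # Flag absent (None) or zero: spatial predictor as in the standard.
        pred = spatial_pred
    return (pred[0] + mvd[0], pred[1] + mvd[1])

with_base = reconstruct_motion_vector((1, -1), (4, 4), (3, 2), 1)
with_spatial = reconstruct_motion_vector((1, -1), (4, 4), (3, 2), None)
```

For SNR scalability, where both layers share the same resolution, the same sketch applies with a scaling factor of 1.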

The flags 1098, 1100 and 1106 thus together represent one way of implementing the motion data flag 1048 shown generally in FIG. 1a or, more generally, a motion data control signal 1048. Of course, various other signaling options exist for this purpose, and a fixed agreement between the transmitter and the receiver can also be used, which allows a reduction in signaling information.

In summary, a detailed implementation of the enhancement motion compensator 1014 of FIG. 1a and of the enhancement motion data determiner 1078 of FIG. 2 is set out below with reference to FIGS. 1e, 1f and 1g.

Referring to FIG. 1e, it can be seen that the enhancement motion compensator 1014 basically has to do two things. First, it has to calculate the enhancement motion data, i.e. typically the entire motion vectors, and feed them to the enhancement motion predictor 1016, so that the latter can use these vectors in uncoded form in order to obtain the enhancement sequence of residual error images, which in the prior art is typically done adaptively and block by block. A separate matter, however, is the enhancement motion data processing, i.e. how the motion data used for the motion-compensated prediction are compressed as much as possible and written into a bit stream. In order for something to be written into the bit stream, corresponding data must be provided to the enhancement image coder 1028, as shown in FIG. 1e. The enhancement motion data processing means 1014b thus has the task of reducing as far as possible the redundancy with respect to the base layer that is contained in the enhancement motion data determined by the enhancement motion data calculation means 1014a.

According to the invention, the base motion data or the up-sampled base motion data can be used by the enhancement motion data calculation means 1014a in order to calculate the enhancement motion data actually to be used, or can be used only for the enhancement motion data processing, i.e. for enhancement motion data compression, while playing no role in the calculation of the enhancement motion data. While the two possibilities 1.) and 2.) of FIG. 1g show embodiments in which the base motion data or the up-sampled base motion data are already used in the enhancement motion data calculation, embodiment 3.) of FIG. 1g shows a case in which information about the base motion data is not used to calculate the enhancement motion data, but only for coding or for obtaining residual data.

FIG. 1f shows the decoder-side implementation of the enhancement motion data determiner 1078, which comprises a block-by-block control module 1078a that receives the signaling information from the bit stream or from the enhancement image decoder 1066. The enhancement motion data determiner 1078 further comprises an enhancement motion data reconstruction means 1078b which actually determines the motion vectors of the enhancement motion data field, either solely using the decoded base motion data or the decoded up-sampled base motion data, or by combining information about the decoded base motion data with the residual data extracted from the enhancement scaling layer 1004 by the enhancement image decoder 1066. These motion vectors can then be used by the enhancement motion combiner 1076, which may be designed as a conventional combiner, in order to reverse the encoder-side motion-compensated prediction.

In the following, the various exemplary embodiments are discussed, as illustrated in overview in Fig. 1g. As has already been explained with reference to Fig. 6a, the BLFlag 1098 signals a complete takeover of the upscaled base motion data for the expansion motion prediction. In this case, the device 1014a is designed to take over the base motion data completely or, where the layers have different resolutions, to take over the base motion data in upscaled form, and to transmit them to the device 1016. No information about motion fields or motion vectors is transmitted to the expansion image coder, however. Instead, only a separate flag 1098 is transmitted for each block, be it a macroblock or a sub-macroblock.

On the decoder side, this means that the device 1078a of Fig. 1f decodes the flag 1098 for a block and, if the flag is active, the decoded base motion data available from the base layer, or the decoded upsampled base motion data, are used to compute the expansion motion data, which are then provided to block 1076. The device 1078 needs no motion vector residual data in this case.
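
The decoder-side behaviour of this first embodiment can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of the invention: the function names, the representation of a motion vector as an integer pair, and the idea that upscaling simply multiplies each component by the resolution ratio are all hypothetical.

```python
from typing import Optional, Tuple

MotionVector = Tuple[int, int]  # hypothetical representation: (dx, dy)


def upscale_motion_vector(base_mv: MotionVector, ratio: int) -> MotionVector:
    """Scale a base-layer vector to the expansion-layer resolution.

    Assumption: scaling multiplies each component by the resolution ratio.
    """
    return (base_mv[0] * ratio, base_mv[1] * ratio)


def inherit_motion(bl_flag: bool, base_mv: MotionVector,
                   ratio: int = 1) -> Optional[MotionVector]:
    """Return the expansion motion vector of one block when flag 1098 is set.

    When the flag is not set, None is returned so that the caller can fall
    back to another mode (not modelled in this sketch).
    """
    if bl_flag:
        return upscale_motion_vector(base_mv, ratio)
    return None
```

With a ratio of 2, a base vector (3, -2) would be taken over as (6, -4); no residual data is read for such a block.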

In the second embodiment of the present invention, which is signaled by the flag QrefFlag 1100, the base motion vector is integrated into the expansion motion data calculation performed by the device 1014a. As shown in section 2.) of Fig. 1g and described above, the motion data, i.e. the motion vectors m, are computed by searching for the minimum of the expression

(D + λ R)

In the distortion term D, the difference between a block of a current image B and a block of a preceding and/or later image shifted by a specific potential motion vector is entered. The quantization parameter of the expansion image coder, designated 1036, enters into the factor λ. The term R indicates the number of bits used to encode a potential motion vector.

Typically, a search over different potential motion vectors is now carried out, the distortion term D and the rate term R being calculated for each new motion vector, taking into account the expansion quantization parameter 1036, which is preferably fixed but may also vary. The sum term described is evaluated for the different potential motion vectors, and the motion vector that yields the minimum of the sum is then used.

According to the invention, the base motion vector of the corresponding block from the base scaling layer is now also integrated into this iterative search. If it fulfills the search criterion, then again only the flag 1100 has to be transmitted, and no residual values or anything else have to be transmitted for this block. Thus, when the base motion vector satisfies the criterion (minimum of the above expression) for a block, the device 1014a takes this base motion vector and passes it to the device 1016. Only the flag 1100, however, is transmitted to the expansion image coder.

On the decoder side, this means that the device 1078a, when decoding the flag 1100, drives the device 1078b to determine the motion vector from the base motion data for that block since the expansion-picture decoder had not transmitted any residual data.
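
The search of the second embodiment can be sketched as follows. This is a simplified illustration, assuming that the caller supplies the distortion and rate functions and that choosing the base motion vector costs only the bits of the flag; the names `search_motion_vector`, `rate_bits` and `flag_bits` are hypothetical and do not appear in the description.

```python
from typing import Callable, Iterable, Tuple

MotionVector = Tuple[int, int]


def search_motion_vector(
    candidates: Iterable[MotionVector],
    base_mv: MotionVector,
    distortion: Callable[[MotionVector], float],
    rate_bits: Callable[[MotionVector], float],
    lam: float,
    flag_bits: float = 1.0,
) -> Tuple[MotionVector, bool]:
    """Minimise D + lambda * R over the candidates and the base vector.

    Returns (chosen vector, flag); the flag plays the role of flag 1100.
    Assumption: selecting the base vector costs only `flag_bits` of rate.
    """
    best_mv = base_mv
    best_cost = distortion(base_mv) + lam * flag_bits
    use_base = True
    for mv in candidates:
        cost = distortion(mv) + lam * rate_bits(mv)
        if cost < best_cost:  # ties keep the base vector, which is cheaper to signal
            best_mv, best_cost, use_base = mv, cost, False
    return best_mv, use_base
```

When the returned flag is true, only the flag would be written for the block; otherwise the chosen vector has to be coded in the usual way.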

In a variation of the second embodiment, not only the base motion vector but also a multiplicity of (slightly) modified base motion vectors derived from it are integrated into the iterative search. Depending on the implementation, each component of the motion vector can be independently incremented or decremented by one increment, or left unchanged. This increment may correspond to a particular granularity of a motion vector, e.g. a full resolution step, a half resolution step or a quarter resolution step. If such a modified base motion vector fulfills the search criterion, the change, i.e. the increment +1, 0 or -1, is also transmitted, as a kind of "residual data", in addition to the flag 1100. A decoder, activated by the flag 1100, will then look for the increment in the data stream, retrieve the base motion vector or the upsampled base motion vector, and combine the increment with the corresponding base motion vector in block 1078b to obtain the motion vector for the corresponding block in the enhancement layer.
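
The candidate set of this variation and the decoder-side combination can be sketched as follows. The sketch assumes integer vector components and a caller-chosen step size; the function names are hypothetical.

```python
from itertools import product
from typing import List, Tuple

MotionVector = Tuple[int, int]
Increment = Tuple[int, int]  # each component is -1, 0 or +1


def modified_candidates(base_mv: MotionVector,
                        step: int = 1) -> List[Tuple[MotionVector, Increment]]:
    """List the base vector and its eight neighbours with their increments."""
    return [((base_mv[0] + ix * step, base_mv[1] + iy * step), (ix, iy))
            for ix, iy in product((-1, 0, 1), repeat=2)]


def reconstruct_motion_vector(base_mv: MotionVector, increment: Increment,
                              step: int = 1) -> MotionVector:
    """Decoder side: combine the transmitted increment with the base vector."""
    return (base_mv[0] + increment[0] * step,
            base_mv[1] + increment[1] * step)
```

Each of the nine candidates could be fed into the same cost minimization as the unmodified base vector; only the winning increment needs to be transmitted alongside the flag.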

In the third embodiment, which is signaled by the flag 1106, the motion vectors can in principle be determined in any manner. For full flexibility, the device 1014a may determine the expansion motion data, e.g. in accordance with the minimization task mentioned in connection with the second embodiment. The determined motion vector is then used for the encoder-side motion-compensated prediction without taking information from the base layer into account. The expansion motion data processing 1014b, however, is in this case designed to include the base motion vectors in the motion vector processing for redundancy reduction, i.e. before the actual arithmetic coding.

Thus, according to the standard H.264/AVC, motion vector differences are transmitted, the differences being determined between nearby blocks within a picture. In an implementation, the difference may be formed with respect to several nearby blocks in order to select the smallest difference. According to the invention, the base motion vector for the corresponding block in a picture is now included in this search for the most favorable predictor for the motion vector difference. If it meets the criterion that, as the predictor, it yields the smallest residual error value, this is signaled by the flag 1106 and only the residual error value is transmitted to block 1028. If the base motion vector does not satisfy this criterion, the flag 1106 is not set, and a spatial motion vector difference computation is carried out.

For simpler coder implementations, however, instead of the iterative search, the base motion vector or an upsampled version of it may serve as the predictor, either always or for adaptively determined blocks.
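
The predictor selection of the third embodiment can be sketched as follows. The sketch makes two assumptions that are not taken from the description: a single spatial predictor is given per block, and the cost of a difference is the sum of the absolute values of its components. All function names are hypothetical.

```python
from typing import Tuple

MotionVector = Tuple[int, int]


def _difference(a: MotionVector, b: MotionVector) -> MotionVector:
    return (a[0] - b[0], a[1] - b[1])


def _magnitude(d: MotionVector) -> int:
    # Assumed cost function: sum of absolute component values.
    return abs(d[0]) + abs(d[1])


def encode_mvd(mv: MotionVector, spatial_pred: MotionVector,
               base_mv: MotionVector) -> Tuple[bool, MotionVector]:
    """Return (flag, difference); the flag plays the role of flag 1106."""
    spatial_diff = _difference(mv, spatial_pred)
    base_diff = _difference(mv, base_mv)
    if _magnitude(base_diff) < _magnitude(spatial_diff):
        return True, base_diff
    return False, spatial_diff


def decode_mvd(flag: bool, mvd: MotionVector, spatial_pred: MotionVector,
               base_mv: MotionVector) -> MotionVector:
    """Add the transmitted difference to the predictor selected by the flag."""
    pred = base_mv if flag else spatial_pred
    return (pred[0] + mvd[0], pred[1] + mvd[1])
```

In the simpler variant, the encoder would skip the comparison and always return the difference with respect to the base vector.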

According to the invention, an inter-layer prediction of residual data is also performed. This is explained below. If the motion information changes from one layer to the next, it may or may not be advantageous to predict the residual information or, in the case of an MCTF decomposition, the high-pass information of the expansion layer from the base layer. If the motion vectors for a block of the current layer are similar to the motion vectors of the corresponding base layer, or macroblock-wise to the corresponding motion vectors of the corresponding base layer, it is likely that the coding efficiency can be increased if the base-layer residual signal (high-pass signal) is used as a prediction for the expansion residual signal (expansion high-pass signal), so that only the difference between the expansion residual signal and the base-layer reconstruction (line 1024 of Fig. 1a) is coded. If the motion vectors are dissimilar, however, it is very unlikely that a prediction of the residual signal will improve the coding efficiency. Consequently, an adaptive approach is used for predicting the residual signal or high-pass signal. The adaptive decision, i.e. whether the inter-layer predictor 1018 is active or not, may be made by actually calculating the benefit on the basis of the difference signal, or on the basis of an estimate of how different a motion vector of the base scaling layer for a macroblock is from that of the corresponding macroblock in the expansion scaling layer. If the difference is smaller than a certain threshold value, the inter-layer predictor is activated via the control line 1030. If the difference is greater than the threshold value, the inter-layer predictor is deactivated for this macroblock.
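
The estimate-based variant of this decision can be sketched as follows. The measure of the motion vector difference (sum of absolute component differences) and the handling of the case where the difference equals the threshold are assumptions; the description only distinguishes "smaller" and "greater".

```python
from typing import Tuple

MotionVector = Tuple[int, int]


def residual_prediction_enabled(enh_mv: MotionVector, base_mv: MotionVector,
                                threshold: int) -> bool:
    """Activate the inter-layer predictor when the vectors are similar.

    Assumption: a difference equal to the threshold deactivates prediction.
    """
    difference = abs(enh_mv[0] - base_mv[0]) + abs(enh_mv[1] - base_mv[1])
    return difference < threshold
```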

A flag ResPrdFlag 1108 is transmitted. If this flag 1108 = 1, the reconstructed residual signal X of the base layer is used as a prediction for the residual signal of the present macroblock of the enhancement layer, encoding only an approximation of the difference between the current residual signal of the enhancement layer and its base-layer reconstruction. Otherwise, the flag 1108 is absent or equal to zero. Here, the residual signal of the current macroblock in the enhancement layer is then coded without prediction from the base layer.
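
The effect of the flag on the residual signal of one macroblock can be sketched as follows. The sketch deliberately leaves out quantization and entropy coding, and treats the residual as a flat list of integers; these simplifications and the function names are assumptions.

```python
from typing import List, Sequence


def encode_residual(enh_residual: Sequence[int], base_residual: Sequence[int],
                    res_prd_flag: bool) -> List[int]:
    """Return the samples to be coded for one macroblock.

    With the flag set, only the difference to the reconstructed base-layer
    residual is kept; otherwise the residual is passed through unchanged.
    """
    if res_prd_flag:
        return [e - b for e, b in zip(enh_residual, base_residual)]
    return list(enh_residual)


def decode_residual(coded: Sequence[int], base_residual: Sequence[int],
                    res_prd_flag: bool) -> List[int]:
    """Invert encode_residual on the decoder side."""
    if res_prd_flag:
        return [c + b for c, b in zip(coded, base_residual)]
    return list(coded)
```

When the two residuals are similar, the values to be coded are small, which is where the gain in coding efficiency would come from.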

If the base layer represents a layer with half the spatial resolution of the enhancement layer, the residual signal is upsampled using an interpolation filter before the upsampled residual signal of the base layer is used as the prediction signal. This filter is an interpolation filter with six taps, so that neighbouring values are used to interpolate a value at the high spatial resolution of the enhancement layer that was not present in the base layer because of its low resolution, in order to obtain as good an interpolation result as possible. If, however, values at the edge of a transform block are interpolated, so that the interpolation filter would use values of another transform block, it is preferred not to do so, but instead to synthesize the values required by the interpolation filter outside the block under consideration, so that the interpolation takes place with as few artifacts as possible.
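
A one-dimensional version of such an upsampling step can be sketched as follows. The description only states that the filter has six taps; the tap values used here are those of the H.264/AVC half-sample filter and are an assumption, as is the choice to synthesize values outside the block by repeating the edge sample.

```python
from typing import List, Sequence

# Assumption: tap values of the H.264/AVC half-sample filter (sum = 32).
TAPS = (1, -5, 20, 20, -5, 1)


def upsample_row(block_row: Sequence[int]) -> List[int]:
    """Double the resolution of one row of a transform block.

    Samples outside the block are synthesized by repeating the edge sample,
    so no value of a neighbouring transform block is read.
    """
    n = len(block_row)

    def sample(i: int) -> int:
        return block_row[min(max(i, 0), n - 1)]

    out: List[int] = []
    for i in range(n):
        out.append(block_row[i])  # existing sample is copied unchanged
        acc = sum(t * sample(i - 2 + k) for k, t in enumerate(TAPS))
        out.append((acc + 16) >> 5)  # divide by 32 with rounding
    return out
```

A two-dimensional upsampling could apply the same routine first to the rows and then to the columns of a block.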

On the basis of a so-called core experiment, it was found that the inter-layer prediction of motion and residual values significantly improves the coding efficiency of the AVC-based MCTF approach. PSNR gains of more than 1 dB were obtained for certain test points. Especially at very low bit rates for each spatial resolution (except the base layer), the improvement in the reconstruction quality was clearly visible.

Depending on the circumstances, the method according to the invention can be implemented in hardware or in software. The implementation can be carried out on a digital storage medium, in particular a floppy disk or CD with electronically readable control signals, which can cooperate with a programmable computer system such that the method is executed. In general, the invention thus also consists in a computer program product with a program code stored on a machine-readable carrier for carrying out the method according to the invention when the computer program product runs on a computer. In other words, the invention can thus be realized as a computer program with a program code for carrying out the method when the computer program runs on a computer. The present invention further relates to a computer-readable medium on which a scalable data stream having a first scaling layer and a second scaling layer, together with the associated control characters for the various decoder-side devices, is stored. Thus, the computer-readable medium may be a data carrier, or the Internet, over which a data stream is transmitted from a provider to a recipient.

Claims

1. An apparatus for generating a coded video sequence comprising a base scaling layer (1002) and an expansion scaling layer (1004), comprising:

a base motion compensator (1006) for computing base motion data indicating how a block in a current frame has moved relative to another frame in a group of frames;

a basic motion predictor (1012) for calculating a basic sequence of residual error images using the basic motion data;

a base image coder (1010) configured to generate a coded first scaling layer from the base sequence of residual error images;

an expansion motion compensator (1014) for determining extension motion data, the expansion motion compensator configured to adaptively and blockwise determine expansion motion data using the base motion data and to provide block wise signaling information;

an expansion motion predictor (1016) for computing an extension sequence of residual error images using the expansion motion data; and

an extension image encoder (1028) for encoding information about the extension sequence of residual error images and for encoding the block-wise signaling information to obtain a coded extension scaling layer.

2. Device according to claim 1, wherein the basic motion compensator is designed to calculate the basic motion data for images that have a lower spatial resolution than the images on the basis of which the expansion motion compensator determines the expansion motion data,

further comprising an upsampler (1042) for scaling the basic motion data according to a difference in the spatial resolution of the group of images, and

wherein the expansion motion compensator (1014) is adapted to calculate the extension motion data based on the scaled base motion data.

3. The apparatus of claim 2, wherein the expansion motion compensator (1014) is adapted to take over the scaled base motion data for a block as expansion motion data and to supply a takeover signal (1098) to the expansion image encoder (1028) for that block.

4. The apparatus of claim 2, wherein the expansion motion compensator (1014) is adapted to use the scaled base motion data as a predictor for a block of expansion motion data in order to calculate an expansion motion data residual signal, and to provide the expansion image encoder (1028) with the expansion motion data residual signal together with prediction signaling.

5. Device according to claim 1 or 2, in which the base image encoder (1010) is adapted to quantize with a base quantization parameter (1034), in which the basic motion compensator is designed to calculate the basic motion data dependent on a base control parameter (1034) which may depend on the base quantization parameter;

wherein the extension image encoder (1028) is adapted to quantize with an expansion quantization parameter (1036), and

wherein the expansion motion compensator (1014) is adapted to calculate the expansion motion data in dependence on an expansion control parameter (1036) that may depend on the expansion quantization parameter and that differs from the base control parameter for the basic image coder.

6. Device according to claim 5, in which the expansion motion compensator is designed to use the basic motion data as a predictor for the expansion motion data and to supply an expansion motion data residual signal, together with block-wise signaling, to the expansion image coder (1028).

7. The apparatus of claim 5, wherein the expansion motion compensator (1014) is adapted to perform, in determining a motion vector for a macroblock, a search among a number of potential motion vectors according to a search criterion, wherein the expansion motion compensator (1014) is designed to also use in the search a motion vector already determined for the corresponding block of the base layer and, if the search criterion is satisfied by the motion vector of the base layer, to take over the motion vector of the base layer and to supply related information (1100) to the expansion image coder (1028).

8. The apparatus of any one of claims 5 to 7, wherein said expansion motion compensator (1014) is further adapted to also consider an incrementally changed motion vector of the base layer and, if the search criterion is satisfied by the incrementally changed motion vector of the base layer, to supply the incremental change of the motion vector to the expansion image coder (1028) for a block together with signaling (1100) for the block.

9. Device according to one of the preceding claims, wherein the expansion motion compensator (1014) is configured to determine motion vectors for blocks of an image, and further to post-process the motion vectors in order to determine motion vector differences between two motion vectors and to supply them to the extension image coder (1028), and

wherein the expansion motion compensator (1014) is further configured to use, depending on a cost function, instead of a difference between motion vectors for two blocks of the same image, a difference between a motion vector of the block of an image from the enhancement layer and a modified or unmodified motion vector of a corresponding block of an image from the base layer, and to supply this difference to the expansion image coder (1028) along with signaling (1106) for the block.

10. The apparatus of claim 9, wherein the expansion motion compensator (1014) is configured to use the magnitude of a difference as the cost function.

11. Device according to one of the preceding claims, further comprising an interlayer predictor (1018), which is designed to calculate expansion prediction residual error images using the extension sequence of residual error images and information about the basic sequence of residual error images.

12. Device according to claim 11,

wherein the basic image coder (1010) is adapted to perform quantization with a basic quantization parameter (1034),

wherein the expansion image coder (1028) is configured to perform quantization with an expansion quantization parameter (1036), the expansion quantization parameter (1036) being able to result in a finer quantization than the basic quantization parameter (1034),

wherein the basic image coder (1010) is adapted to reconstruct the basic sequence of residual error images quantized by the first quantization parameter to obtain a reconstructed basic sequence, and

in which the interlayer predictor (1026) is designed to calculate the expansion prediction residual error images using the extension sequence of residual error images and the reconstructed basic sequence of residual error images as the information about the basic sequence of residual error images.

13. Device according to claim 11 or 12, which furthermore has the following features:

a decimator (1032) for decimating a resolution of the group of images, wherein the decimator (1032) is adapted to provide the basic motion compensator (1006) with a group of images at a base resolution that is lower than an expansion resolution of a group of images provided to the expansion motion compensator (1014); and

an interpolator (1022) for spatially interpolating the basic sequence of residual error images or a reconstructed basic sequence of residual error images, to obtain an interpolated basic sequence of residual error images which can be supplied to the inter-layer predictor (1018) as the information (1026) about the basic sequence of residual error images.

14. A method of generating a coded video sequence comprising a base scale layer (1002) and an extension scale layer (1004), comprising the steps of:

Calculating (1006) basic motion data indicating how a block in a current image has moved relative to another image in a group of images;

Calculating (1012) a basic sequence of residual error images using the basic motion data;

Performing a basic image encoding (1010) to generate a coded first scaling layer from the base sequence of residual error images;

Determining (1014) expansion motion data, wherein the expansion motion data are determined adaptively and block by block using the base motion data, and wherein block-wise signaling information is provided;

Calculating (1016) an extension sequence of residual error images using the expansion motion data; and

Performing extension image coding (1028) by encoding information about the extension sequence of residual error images and encoding the block-wise signaling information to obtain a coded extension scaling layer.

15. Device for decoding a coded video sequence with a base scaling layer (1002) and an extension scaling layer (1004), with the following features:

a base image decoder (1060) for decoding the base scaling layer to obtain a decoded basic sequence of residual error images and basic motion data;

a basic motion combiner (1064) adapted to obtain, using the basic motion data and the decoded sequence of residual error images, a sequence of images of the base scaling layer;

an expansion-picture decoder (1066) for decoding the extension scaling layer in order to obtain information about an extension sequence of residual error images and information about extension movement data;

extension movement data calculation means (1078) for calculating the extension movement data by evaluating the information about the extension movement data and using basic movement data based on the extracted movement data evaluation information; and

an expansion motion combiner (1076) configured to obtain a sequence of images of the enhancement scaling layer using the extension sequence of residual error images and the expansion motion data.

16. The device according to claim 15,

in which the expansion-picture decoder (1066) is designed to supply a movement data transfer signal from the extension scaling layer,

wherein an upsampler (1086) is further provided in order to convert the base motion data from a base scaling layer resolution to an extension scaling layer resolution, and

wherein the extension movement data calculation means (1078) is designed to deliver, depending on the movement data transfer signal (1098), the converted basic movement data as extension movement data.

17. The apparatus of claim 15, wherein the enhancement picture decoder (1066) is adapted to provide, from the enhancement scaling layer, prediction signaling (1100, 1106) and an expansion motion data residual signal,

wherein the extension motion data calculator (1078) is adapted to combine, depending on the prediction signaling (1100, 1106), the expansion motion data residual signal with the base motion data or with base motion data converted in their resolution, to obtain the expansion motion data.

18. An apparatus according to claim 15, wherein said expansion-picture decoder (1066) is adapted to provide, from said expansion scaling layer, difference prediction signaling (1106) and an expansion motion data residual signal in the form of motion vector differences for blocks, and

wherein the extension motion data calculator (1078) is adapted to combine, depending on the difference prediction signaling (1106), the motion vector difference with a base motion vector for a corresponding block in order to compute a motion vector for a block.

19. Device according to one of claims 15 to 18, further comprising an inter-layer combiner (1074) for combining expansion prediction residual error data contained in the extension layer with the decoded basic sequence of residual error images or an interpolated basic sequence of residual error images, to obtain the extension sequence of residual error images.

20. A method of decoding a coded video sequence having a base scaling layer (1002) and an expansion scaling layer (1004), with the following steps:

Decoding (1060) the base scaling layer to obtain a decoded basic sequence of residual error images and basic motion data;

Performing a base-motion combination (1064) using the base motion data and the decoded sequence of residual error images such that a sequence of images of the base scaling layer is obtained;

Decoding (1066) the extension scaling layer to obtain information about an extension sequence of residual error images and information about expansion motion data;

Calculating (1078) the extension movement data by evaluating the information about the extension movement data and using basic movement data based on the extracted movement data information; and

Performing an expansion-motion combination (1076) to obtain a sequence of expansion-scale layer images using the extension sequence of residual error images and the extension motion data.

21. Computer program for carrying out a method according to claim 15 or 20, when the method runs on a computer.

22. A computer-readable medium having a coded video sequence that comprises a base scaling layer (1002) and an expansion scaling layer (1004), wherein the coded video sequence is arranged such that, when decoded in a decoding device according to claim 15, it results in a decoded first scaling layer and a decoded second scaling layer.