WO2006042612A1 - Apparatus and method for generating a coded video sequence and for decoding a coded video sequence using an inter-layer residual value prediction
- Publication number
- WO2006042612A1 (PCT/EP2005/010227)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- motion
- images
- sequence
- basic
- residual error
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/60—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using transform coding
- H04N19/61—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using transform coding in combination with predictive coding
- H04N19/615—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using transform coding in combination with predictive coding using motion compensated temporal filtering [MCTF]
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/10—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
- H04N19/169—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding
- H04N19/17—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object
- H04N19/176—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object the region being a block, e.g. a macroblock
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/30—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using hierarchical techniques, e.g. scalability
- H04N19/31—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using hierarchical techniques, e.g. scalability in the temporal domain
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/30—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using hierarchical techniques, e.g. scalability
- H04N19/36—Scalability techniques involving formatting the layers as a function of picture distortion after decoding, e.g. signal-to-noise [SNR] scalability
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/46—Embedding additional information in the video signal during the compression process
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/50—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
- H04N19/503—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding involving temporal prediction
- H04N19/51—Motion estimation or motion compensation
- H04N19/513—Processing of motion vectors
- H04N19/517—Processing of motion vectors by encoding
- H04N19/52—Processing of motion vectors by encoding by predictive encoding
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/60—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using transform coding
- H04N19/61—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using transform coding in combination with predictive coding
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/60—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using transform coding
- H04N19/63—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using transform coding using sub-band based transform, e.g. wavelets
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/10—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
- H04N19/102—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
- H04N19/13—Adaptive entropy coding, e.g. adaptive variable length coding [AVLC] or context adaptive binary arithmetic coding [CABAC]
Definitions
- the present invention relates to video coding systems and, more particularly, to scalable video coding systems that can be used in conjunction with the video coding standard H.264/AVC or with new MPEG video coding systems.
- the H.264/AVC standard is the result of a video standardization project of the ITU-T Video Coding Experts Group (VCEG) and the ISO/IEC Moving Picture Experts Group (MPEG).
- VCEG: Video Coding Experts Group
- MPEG: Moving Picture Experts Group
- the main objectives of this standardization project are to create a clear video coding concept with very good compression behavior and, at the same time, to generate a network-friendly video representation that covers both applications with a "conversational character", such as video telephony, and non-conversational applications (storage, broadcasting, streaming).
- Fig. 9 shows a complete structure of a video encoder, which generally consists of two different stages.
- the first stage, which in principle operates on the video data, generates output data that is finally subjected to entropy coding by a second stage, which is denoted 80 in FIG. 9.
- the data are data 81a, quantized transformation coefficients 81b and motion data 81c, where these data 81a, 81b, 81c are fed to the entropy coder 80 in order to generate a coded video signal at the output of the entropy coder 80.
- the input video signal is divided into macroblocks, with each macroblock having 16x16 pixels. Then the assignment of the macroblocks to slice groups and slices is selected, whereafter each macroblock of each slice is processed by the network of operation blocks as shown in FIG. It should be noted that efficient parallel processing of macroblocks is possible if there are different slices in one video picture.
- the assignment of the macroblocks to slice groups and slices is performed by means of a coder control block (Coder Control) 82 in FIG. There are different slice types, which are defined as follows:
- I slice: the I slice is a slice in which all macroblocks of the slice are coded using an intra-prediction.
- P slice: certain macroblocks of the P slice can also be coded using an inter-prediction with at least one motion-compensated prediction signal per prediction block.
- B slice: certain macroblocks of the B slice may also be coded using an inter-prediction with two motion-compensated prediction signals per prediction block.
- SP slice: also referred to as a switching-P slice, which is coded to allow efficient switching between different precoded images.
- SI slice: also called a switching-I slice, which allows an exact match of a macroblock in an SP slice for direct access and error recovery purposes.
- slices are a sequence of macroblocks that are processed in the order of a raster scan, unless the flexible macroblock ordering (FMO) feature of the standard is also used.
- An image may be split into one or more slices, as shown in FIG.
- An image is therefore a collection of one or more slices.
- Slices are independent of one another in the sense that their syntax elements can be parsed from the bit stream and the sample values in the area of the image represented by the slice can be correctly decoded without the need for data from other slices, provided that the reference images used are identical in the encoder and in the decoder. However, certain information from other slices may be needed to apply the deblocking filter across slice boundaries.
- the FMO property modifies the way in which images are partitioned into slices and macroblocks by using the concept of slice groups.
- Each slice group is a set of macroblocks defined by a macro-block-to-slice group map specified by the content of an image parameter set and by specific information of slice headers.
- This macroblock-to-slice group map consists of a slice group identification number for each macroblock in the image, specifying to which slice group the associated macroblock belongs.
- Each slice group can be partitioned into one or more slices, so that a slice has a sequence of macroblocks within the same slice group that is processed in the order of a raster scan within the set of macroblocks of a particular slice group.
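As an illustration of the macroblock-to-slice-group map described above, the following minimal Python sketch builds a map and collects the macroblocks of each slice group in raster-scan order. The checkerboard pattern, the picture size and the function names are assumptions chosen for the example; they are not taken from the patent or from the standard's syntax.

```python
from collections import defaultdict

def checkerboard_map(mb_cols, mb_rows):
    """Hypothetical map: one slice group id per macroblock, in raster order."""
    return [(x + y) % 2 for y in range(mb_rows) for x in range(mb_cols)]

def macroblocks_per_group(mb_map):
    """Collect macroblock addresses per slice group, keeping raster-scan order."""
    groups = defaultdict(list)
    for address, group_id in enumerate(mb_map):
        groups[group_id].append(address)
    return dict(groups)

mb_map = checkerboard_map(4, 2)          # picture of 4 x 2 macroblocks
groups = macroblocks_per_group(mb_map)   # slices would be cut from these lists
```

Each list in `groups` could then be partitioned into one or more slices, each being a run of consecutive entries of that list.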
- Each macroblock may be transmitted in one of several coding types, depending on the slice coding type.
- the following types of intra-coding are supported, referred to as Intra_4x4 and Intra_16x16; in addition, a chroma prediction mode and an I_PCM prediction mode are supported.
- the Intra_4x4 mode is based on the prediction of each 4x4 chroma block separately and is well suited for encoding parts of an image with significant detail.
- the Intra_16x16 mode performs a prediction of the entire 16x16 chroma block and is more suitable for encoding "soft" areas of an image.
- the I_PCM coding type allows the encoder to simply skip the prediction as well as the transform coding and instead directly transmit the values of the coded samples.
- the I_PCM mode serves the following purposes: It allows the encoder to represent the values of the samples precisely. It provides a way to accurately represent the values of very anomalous image content without data expansion. It also makes it possible to place a hard limit on the number of bits that a coder must handle for a macroblock without harming coding efficiency.
- intra-prediction in H.264/AVC is always performed in the spatial domain by referring to adjacent sample values of previously coded blocks that lie to the left of or above the block to be predicted (FIG. 10). In certain environments where transmission errors occur, this can cause error propagation, which arises from motion compensation in inter-coded macroblocks. Therefore, a constrained intra-coding mode can be signaled, which allows prediction only from intra-coded neighboring macroblocks.
- each 4x4 block is predicted from spatially neighboring samples. Sixteen samples of the 4x4 block are predicted using previously decoded samples in adjacent blocks. One of 9 prediction modes can be used for each 4x4 block. In addition to the DC prediction (where a value is used to predict the entire 4x4 block), 8 directional prediction modes are specified. These modes are suitable for predicting directional structures in an image, such as edges at different angles.
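The prediction of a 4x4 block from neighboring samples can be sketched as follows. This is a minimal illustration that assumes both the row above and the column to the left are available; the function names are hypothetical, and only the DC mode and two of the eight directional modes (vertical and horizontal) are shown.

```python
def predict_dc(above, left):
    """DC mode: one value, the rounded mean of the 8 neighbors, fills the block."""
    dc = (sum(above) + sum(left) + 4) // 8
    return [[dc] * 4 for _ in range(4)]

def predict_vertical(above):
    """Vertical mode: every row repeats the four samples above the block."""
    return [list(above) for _ in range(4)]

def predict_horizontal(left):
    """Horizontal mode: every row is filled with the sample to its left."""
    return [[left[y]] * 4 for y in range(4)]

above = [10, 12, 14, 16]   # previously decoded samples above the block
left = [8, 8, 8, 8]        # previously decoded samples left of the block
dc_block = predict_dc(above, left)
```

An encoder would typically try several modes and keep the one whose prediction leaves the smallest residual to code.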
- P-macroblock types: in addition to the intra-macroblock coding types, various predictive or motion-compensated coding types are specified as P-macroblock types. Each P-macroblock type corresponds to a specific partitioning of the macroblock into the block shapes used for motion-compensated prediction. Partitions with luma block sizes of 16x16, 16x8, 8x16 or 8x8 samples are supported by the syntax. In the case of partitions of 8x8 samples, an additional syntax element is transmitted for each 8x8 partition. This syntax element specifies whether the corresponding 8x8 partition is further divided into partitions of 8x4, 4x8 or 4x4 luma samples and corresponding chroma samples.
- the prediction signal for each predictive-coded MxM luma block is obtained by shifting a region of the corresponding reference image specified by a translation motion vector and an image reference index.
- if a macroblock is encoded using four 8x8 partitions and each 8x8 partition is further divided into four 4x4 partitions, a maximum of 16 motion vectors may be transmitted for a single P-macroblock as part of the so-called motion field.
- the quantization parameter slice QP is used to determine the quantization of the transform coefficients in H.264/AVC.
- the parameter can take on 52 values. These values are arranged so that an increase of the quantization parameter by 1 means an increase of the quantizer step size by about 12%. This means that an increase of the quantization parameter by 6 results in an increase of the quantizer step size by exactly a factor of 2. It should be noted that a change in the step size by about 12% also means, approximately, a reduction of the bit rate by about 12%.
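The relation between the quantization parameter and the quantizer step size can be checked numerically. The sketch below assumes an exactly exponential law, with the step size doubling for every 6 increments, which is consistent with the "about 12%" and "factor of 2" figures above; the function name is illustrative.

```python
def step_size_ratio(delta_qp):
    """Factor by which the quantizer step size grows when QP rises by delta_qp.

    Assumes an exactly exponential law: doubling for every 6 QP increments.
    """
    return 2.0 ** (delta_qp / 6.0)

per_step = step_size_ratio(1)     # about 1.12, i.e. roughly 12% per increment
per_six = step_size_ratio(6)      # 2.0
full_range = step_size_ratio(51)  # largest versus smallest of the 52 values
```

Under this assumption the 52 values span a step-size range of a factor of roughly 362.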
- the quantized transform coefficients of a block are generally scanned along a zigzag path and further processed using entropy coding methods.
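A zigzag path over a 4x4 block can be generated by walking the anti-diagonals in alternating directions. The following sketch shows one way to do this; the helper names are illustrative and the scan shown is the conventional frame zigzag, not a scan table quoted from the patent.

```python
def zigzag_order(n=4):
    """Return (row, col) positions of an n x n block in zigzag order."""
    order = []
    for d in range(2 * n - 1):
        # cells on anti-diagonal d, listed from top-right to bottom-left
        cells = [(r, d - r) for r in range(n) if 0 <= d - r < n]
        # even diagonals are traversed from bottom-left to top-right
        if d % 2 == 0:
            cells.reverse()
        order.extend(cells)
    return order

def zigzag_scan(block):
    """Flatten a square block of coefficients along the zigzag path."""
    return [block[r][c] for r, c in zigzag_order(len(block))]

block = [[4 * r + c for c in range(4)] for r in range(4)]  # raster indices 0..15
scanned = zigzag_scan(block)
```

The scan places low-frequency coefficients first, so the trailing part of the list tends to consist of zeros after quantization, which helps the entropy coder.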
- the 2x2 DC coefficients of the chroma component are scanned in raster-scan order. All inverse transform operations within H.264/AVC can be implemented using only additions and shift operations on 16-bit integer values.
- the input signal is first divided, image by image in a video sequence, into macroblocks of 16x16 pixels. Thereafter, each image is supplied to a subtractor 84, which subtracts from the original image the image provided by a decoder 85 that is included in the encoder.
- the subtraction result, that is to say the residual signals in the spatial domain, is now transformed, scaled and quantized (block 86) in order to obtain the quantized transformation coefficients on the line 81b.
- the quantized transform coefficients are first rescaled and inverse transformed (block 87) and then fed to an adder 88, whose output feeds the deblocking filter 89. At the output of the deblocking filter, the output video signal, as a decoder would decode it, can be monitored, for example for control purposes (output 90).
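The closed loop formed by the subtractor 84, the quantization in block 86, the rescaling in block 87 and the adder 88 can be sketched in a few lines. This is a deliberately simplified illustration: the transform and the deblocking filter are omitted, a uniform scalar quantizer is assumed, and the function names and sample values are made up for the example.

```python
def encode_block(original, prediction, step):
    """Subtract the prediction and quantize the residual (transform omitted)."""
    return [round((o - p) / step) for o, p in zip(original, prediction)]

def reconstruct_block(levels, prediction, step):
    """Rescale the levels and add the prediction, as the decoder would."""
    return [p + level * step for p, level in zip(prediction, levels)]

original = [100, 105, 99, 91]
prediction = [96, 96, 96, 96]
levels = encode_block(original, prediction, step=4)
reconstructed = reconstruct_block(levels, prediction, step=4)
```

Because the encoder runs the same reconstruction as the decoder, both sides base the next prediction on identical reference samples, even though the reconstruction differs from the original by up to half a quantizer step.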
- a motion estimation is then performed in a block 91.
- for this purpose, an image of the original input video signal is supplied to the block 91.
- the standard allows two different motion estimations, namely a forward motion estimation and a backward motion estimation.
- in the forward motion estimation, the motion of the current picture with respect to the previous picture is estimated.
- in the backward motion estimation, the motion of the current picture is estimated using the future picture.
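Motion estimation for one block can be illustrated with an exhaustive block-matching search that minimizes the sum of absolute differences (SAD). Block matching with SAD is an assumption made for this sketch, since the text does not prescribe a search method; the function names and the tiny test images are hypothetical. Passing the previous picture as `ref` corresponds to the forward estimation described above, passing the future picture to the backward estimation.

```python
def sad(cur, ref, bx, by, dx, dy, size):
    """Sum of absolute differences between a block and its displaced match."""
    return sum(
        abs(cur[by + y][bx + x] - ref[by + y + dy][bx + x + dx])
        for y in range(size)
        for x in range(size)
    )

def estimate_motion(cur, ref, bx, by, size, search):
    """Full search in a +/- search window; returns (dx, dy, cost)."""
    height, width = len(ref), len(ref[0])
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            inside = (0 <= by + dy and by + dy + size <= height
                      and 0 <= bx + dx and bx + dx + size <= width)
            if not inside:
                continue  # candidate block would leave the reference picture
            cost = sad(cur, ref, bx, by, dx, dy, size)
            if best is None or cost < best[2]:
                best = (dx, dy, cost)
    return best

ref = [[8 * y + x for x in range(8)] for y in range(8)]
cur = [[ref[(y + 1) % 8][(x + 2) % 8] for x in range(8)] for y in range(8)]
motion = estimate_motion(cur, ref, bx=2, by=2, size=2, search=2)
```

The resulting vector per block is what the text calls motion data; the collection of all vectors of a picture forms the motion field.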
- the results of the motion estimation (block 91) are fed to a motion compensation block 92, which performs a motion-compensated inter-prediction, in particular when a switch 93 is switched to the inter-prediction mode, as shown in FIG.
- if, on the other hand, the switch 93 is set to intra-frame prediction, an intra-frame prediction is performed using a block 490. The motion data is not needed for this purpose, since no motion compensation is performed for an intra-frame prediction.
- the motion estimation block 91 generates motion data or motion fields, which consist of motion vectors and are transmitted to the decoder, so that a corresponding inverse prediction, i.e. reconstruction, can be performed using the transform coefficients and the motion data.
- a current image can be calculated using the immediately adjacent future image and of course also using further future images.
- a disadvantage of the video coder concept shown in FIG. 9 is that it does not offer a simple scalability option.
- the term "scalability" is understood to mean a coder / decoder concept in which the coder provides a scaled data stream.
- the scaled data stream includes a base scaling layer and one or more expansion layers.
- the base scaling layer comprises a representation of the signal to be coded, generally with lower quality, but also with a lower data rate.
- the enhancement scaling layer contains a further representation of the video signal which, typically together with the representation of the video signal in the base scaling layer, provides a representation of improved quality.
- the enhancement scaling layer, of course, has its own bit requirement, so that the number of bits for representing the signal to be coded increases with each enhancement layer.
- a decoder will either decode only the base scaling layer, which provides a comparatively poor representation of the picture signal represented by the coded signal,
- or, by adding further scaling layers one at a time, the decoder can gradually improve the quality of the signal (at the expense of the bit rate).
- at least the base scaling layer is always transmitted, since the bit rate of the base scaling layer is typically so low that even a transmission channel of limited capacity will be sufficient. If the transmission channel allows no more bandwidth for the application, then only the base scaling layer, but no enhancement scaling layer, is transmitted. As a result, the decoder can only produce a low-quality representation of the image signal.
- the low-quality representation is an advantage. If the transmission channel permits the transmission of one or more extension layers, then the coder also transmits one or more extension layers to the decoder, so that the latter can gradually increase the quality of the output video signal as required.
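The decision of how many scaling layers to transmit over a given channel can be sketched as a simple greedy selection. The bit rates below are hypothetical figures in kbit/s, and the function name is illustrative; the only rule taken from the text is that the base scaling layer is always transmitted.

```python
def select_layers(layer_bitrates, channel_bitrate):
    """Return how many layers to send, base layer first.

    The base layer is always sent; an enhancement layer is added only
    if the accumulated bit rate still fits into the channel.
    """
    sent = 1
    used = layer_bitrates[0]
    for rate in layer_bitrates[1:]:
        if used + rate > channel_bitrate:
            break
        used += rate
        sent += 1
    return sent

bitrates = [100, 200, 400]  # base, first and second enhancement layer
```

With these figures, a channel of 350 kbit/s carries the base layer and one enhancement layer, while a channel of 1000 kbit/s carries all three.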
- one kind of scaling is temporal scaling, in which, for example, not all video frames of a video sequence are transmitted; instead, to reduce the data rate, only every second image, every third image, every fourth image, etc. is transmitted.
- the other kind of scaling is SNR (signal-to-noise ratio) scalability, in which each scaling layer, that is both the base scaling layer and the first, second, third ... enhancement scaling layers, comprises all temporal information, but with a different quality.
- SNR: signal-to-noise ratio
- the base scaling layer would thus have a low data rate but also a low signal-to-noise ratio; this signal-to-noise ratio can then be improved step by step by adding one enhancement scaling layer at a time.
- the coder concept illustrated in FIG. 9 is problematic in that only residual values are generated by the subtractor 84 and then further processed. These residual values are calculated on the basis of prediction algorithms in the arrangement shown in FIG. 9, which forms a closed loop using blocks 86, 87, 88, 89, 93, 94 and 84, with a quantization inside the closed loop. If a simple SNR scalability were implemented such that, for example, each predicted residual signal is first quantized with a coarse quantizer step size and then quantized step by step in extension layers with finer quantizer step sizes, this would have the following consequences.
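The layered quantization referred to above, a coarse quantizer for the base layer followed by finer quantizers for the remaining error, can be sketched as follows. The sketch covers only the quantization of one residual signal and does not model the prediction loop of FIG. 9; the step sizes, sample values and function names are assumptions for illustration.

```python
import math

def quantize(value, step):
    """Uniform quantizer with rounding to the nearest level."""
    return math.floor(value / step + 0.5)

def encode_layers(residual, steps):
    """Quantize with each step size in turn, coarse first, refining the rest."""
    layers, approx = [], [0] * len(residual)
    for step in steps:
        levels = [quantize(r - a, step) for r, a in zip(residual, approx)]
        approx = [a + level * step for a, level in zip(approx, levels)]
        layers.append(levels)
    return layers

def decode_layers(layers, steps):
    """Sum the rescaled levels of however many layers were received."""
    approx = [0] * len(layers[0])
    for levels, step in zip(layers, steps):
        approx = [a + level * step for a, level in zip(approx, levels)]
    return approx

residual = [13, -7, 2]
steps = [8, 2]  # base layer step size, then one finer refinement
layers = encode_layers(residual, steps)
base_only = decode_layers(layers[:1], steps)
refined = decode_layers(layers, steps)
```

A decoder that receives only the base layer reconstructs `base_only`; adding the refinement layer reduces the error from at most half the coarse step to at most half the fine step.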
- Wavelet-based video coding algorithms using lifting implementations for wavelet analysis and wavelet synthesis are described in J.-R. Ohm, "Complexity and delay analysis of MCTF interframe wavelet structures", ISO/IEC JTC1/WG11 Doc. M8520, June 2002. Remarks on scalability can also be found in D. Taubman, "Successive refinement of video: fundamental issues, past efforts and new directions", Proc. of SPIE (VCIP '03), vol. 5150, pp. 649-663, 2003, although significant changes to coder structures are necessary for this.
- an encoder/decoder concept is achieved which, on the one hand, offers scalability and, on the other hand, can build on standard-compliant elements, in particular, for example, for motion compensation.
- the decomposition step comprises a division of the input-side data stream into an identical first copy for a lower branch 40a and an identical copy for an upper branch 40b. Further, the identical copy of the upper branch 40b is delayed by one time step (z^-1), so that a sample s[2k+1] with an odd index passes through its respective decimator or downsampler 42a, 42b at the same time as a sample s[2k] with an even index.
- the decimator 42a or 42b reduces the number of samples in the upper and lower branches 40b, 40a by eliminating each second sample value.
- the second area II which relates to the prediction step, comprises a prediction operator 43 and a subtracter 44.
- the third area III, that is to say the updating step, comprises an update operator 45 and an adder 46.
- On the output side there are also two normalizers 47, 48: normalizer 47 normalizes the high-pass signal h[k], and normalizer 48 normalizes the low-pass signal l[k].
- the polyphase decomposition results in the even and odd samples of a given signal s[k] being separated. Since the correlation structure typically exhibits a local characteristic, the even and odd polyphase components are highly correlated. Therefore, in a subsequent step, a prediction is carried out.
- the prediction step is equivalent to carrying out a high-pass filter of a two-channel filter bank, as described in I. Daubechies and W. Sweldens, "Factoring wavelet transforms into lifting steps", J. Fourier Anal. Appl., vol. 4 (no. 3), pp. 247-269, 1998.
- the given signal s[k] can be represented by l[k] and h[k], where each of these signals has half the sample rate. Since both the update step and the prediction step are completely invertible, the corresponding transformation can be interpreted as a critically sampled perfect reconstruction filter bank. In fact, it can be shown that any biorthogonal family of wavelet filters can be realized by a sequence of one or more prediction steps and one or more update steps.
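The three lifting steps (polyphase decomposition, prediction, update) and their inversion can be written compactly. The sketch below uses the Haar operators as an example, leaves out the normalizers 47 and 48, and assumes a signal of even length; the function names are illustrative.

```python
def analyze(signal, predict, update):
    """Lifting analysis: split, predict the odd samples, update the even ones."""
    even, odd = signal[0::2], signal[1::2]
    high = [o - p for o, p in zip(odd, predict(even))]   # high-pass h[k]
    low = [e + u for e, u in zip(even, update(high))]    # low-pass l[k]
    return low, high

def synthesize(low, high, predict, update):
    """Inverse lifting: undo the update, undo the prediction, interleave."""
    even = [l - u for l, u in zip(low, update(high))]
    odd = [h + p for h, p in zip(high, predict(even))]
    signal = [0] * (len(even) + len(odd))
    signal[0::2], signal[1::2] = even, odd
    return signal

def haar_predict(even):
    """Haar prediction: each odd sample is predicted by its even neighbor."""
    return list(even)

def haar_update(high):
    """Haar update: half of the high-pass sample is added to the even sample."""
    return [h / 2 for h in high]

s = [4, 6, 10, 2]
low, high = analyze(s, haar_predict, haar_update)
restored = synthesize(low, high, haar_predict, haar_update)
```

The reconstruction is exact for any choice of `predict` and `update`, because the synthesis subtracts and adds exactly the values that the analysis added and subtracted.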
- the normalizers 47 and 48 are supplied with suitably chosen scaling factors F_l and F_h.
- the inverse-lifting scheme which corresponds to the synthesis filter bank, is shown in FIG. 4, on the right-hand side.
- the decoder shown on the right in FIG. 4 thus again comprises a first decoder area I, a second decoder area II and a third decoder area III.
- the first decoder area undoes the effect of the update operator 45. This is done by supplying the high-pass signal, renormalized by a further normalizer 50, to the update operator 45.
- the output signal of the decoder-side update operator 45 is then supplied to a subtractor 52, in contrast to the adder 46 in FIG. 4.
- the output signal of the predictor 43 is treated correspondingly: it is not fed to a subtractor, as on the coder side, but is now supplied to an adder 53.
- an upsampling of the signal takes place in each branch by a factor of 2 (blocks 54a, 54b).
- the upper branch is shifted one sample ahead, which is equivalent to delaying the lower branch, and the data streams on the upper and lower branches are then added in an adder 55 to obtain the reconstructed signal s[k] at the output of the synthesis filter bank.
- the low-pass filter and the high-pass analysis filter of this wavelet have 5 and 3 filter taps respectively, the corresponding scaling function being a B-spline of order 2.
- this wavelet is used for a temporal subband coding scheme.
- the corresponding prediction and update operators of the 5/3 transform are given as follows
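For orientation, the prediction and update operators of the 5/3 transform are commonly stated in the lifting literature in the form below. This assumes the convention h[k] = s[2k+1] - P(s_even)[k] and l[k] = s[2k] + U(h)[k] and leaves out the normalization factors; it is an assumption that the notation matches the patent's own formulas.

```latex
\begin{aligned}
P_{5/3}(s_{\mathrm{even}})[k] &= \tfrac{1}{2}\bigl(s[2k] + s[2k+2]\bigr) \\
U_{5/3}(h)[k] &= \tfrac{1}{4}\bigl(h[k] + h[k-1]\bigr)
\end{aligned}
```

Inserting these operators gives a high-pass output that depends on three input samples and a low-pass output that depends on five, in line with the tap counts stated above.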
- Fig. 3 shows a block diagram of an encoder/decoder structure with, by way of example, four filter levels on both the encoder side and the decoder side. From Fig. 3 it can be seen that the first, second, third and fourth filter levels are identical with respect to the encoder. The filter levels with respect to the decoder are also identical.
- each filter level comprises as central elements a backward predictor Mi0 and a forward predictor Mi1 61.
- the backward predictor 60 corresponds in principle to the predictor 43 of FIG. 4, while the forward predictor 61 corresponds to the updater of FIG. 4.
- FIG. 4 relates to a stream of samples in which one sample has an odd-numbered index 2k + 1, while another sample has an even-numbered index 2k.
- the notation in FIG. 3 relates, as has already been explained with reference to FIG. 1, to a group of images instead of to a group of samples. If an image has, for example, a number of samples or pixels, this image is fed in as a whole. Then the next image is fed in, etc. Thus, there are no longer odd and even samples, but odd and even images.
- the lifting scheme described for odd-numbered and even-numbered sample values is applied to odd-numbered and even-numbered images, each of which has a plurality of sample values. The sample-wise predictor 43 of FIG. 4 now becomes the image-wise backward motion compensation prediction 60, while the sample-wise updater 45 becomes the image-wise forward motion compensation prediction 61.
- the motion filters, which consist of motion vectors and represent the coefficients for the blocks 60 and 61, are calculated in each case for two images related to one another and are transmitted as side information from the encoder to the decoder.
- a significant advantage of the concept according to the invention is the fact that the elements 91, 92, as described with reference to FIG. 9 and standardized in the standard H.264/AVC, can readily be used to calculate both the motion fields Mi0 and the motion fields Mi1.
- no new predictor/updater has to be used; instead, the algorithm for motion compensation in the forward direction or in the backward direction that already exists in the video standard, and that has been examined and tested for functionality and efficiency, can be used.
- the general structure of the filter bank shown in FIG. 3 performs a temporal decomposition of the video signal with a group of 16 images, which are fed in at an input 64.
- the delay is reduced accordingly, so that the use for interactive applications is possible.
- the group size can be increased accordingly, for example to 32, 64, etc. images.
- an interactive application of the Haar-based motion-compensated lifting scheme, which consists of a backward motion compensation prediction (Mi0), as in H.264/AVC, and which further comprises an updating step comprising a forward motion compensation (Mi1).
- Both the prediction step and the update step use the motion compensation process as specified in H.264/AVC.
- the deblocking filter designated by reference numeral 89 in FIG.
- the second filter level again comprises downsamplers 66a, 66b, a subtracter 69, a backward predictor 67, a forward predictor 68, an adder 70 and a further processing device, in order to output the first and the second high-pass image at an output of the further processing device.
- the coder in FIG. 3 additionally comprises a third plane and a fourth plane, wherein a group of 16 images is fed into the input 64 of the fourth plane.
- at a high-pass output 72 of the fourth level, which is also designated HP4, eight high-pass images are output, quantized with a quantization parameter Q and correspondingly further processed.
- at a low-pass output 73 of the fourth filter level, eight low-pass images are output, which are fed into an input 74 of the third filter level.
- this level is in turn operative to generate four high-pass images at a high-pass output 75, also designated HP3, and to generate at a low-pass output 76 four low-pass images, which are fed into the input of the second filter level and decomposed there.
- the group of images processed by a filter level does not necessarily have to consist of video images that originate from an original video sequence, but may also consist of low-pass images that have been output at a low-pass output of the next higher filter level.
- the encoder concept shown in FIG. 3 for sixteen images can easily be reduced to eight images if the fourth filter level is simply omitted and the group of images is fed into the input 74.
- the concept shown in FIG. 3 can likewise easily be extended to a group of thirty-two images by adding a fifth filter level; sixteen high-pass images are then output at a high-pass output of the fifth filter level, and the sixteen low-pass images at the output of the fifth filter level are fed into the input 64 of the fourth filter level.
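The number of images at each output of the dyadic filter hierarchy follows directly from halving the group at every level. The sketch below computes these counts for a given group size; it assumes that the group size is a power of two, and it uses the output names HP4 ... HP1 and TP1 that appear in the description of the scaled data stream.

```python
def decompose_counts(group_size):
    """Number of images at each output of a dyadic temporal decomposition."""
    if group_size < 2 or group_size & (group_size - 1):
        raise ValueError("group size must be a power of two, at least 2")
    levels = group_size.bit_length() - 1   # 16 images -> 4 filter levels
    outputs, images = {}, group_size
    for level in range(levels, 0, -1):
        images //= 2                        # each level halves the image count
        outputs[f"HP{level}"] = images      # high-pass images of this level
    outputs["TP1"] = images                 # remaining low-pass image(s)
    return outputs

counts_16 = decompose_counts(16)
counts_32 = decompose_counts(32)
```

For a group of 16 images this yields eight, four, two and one high-pass images plus one low-pass image, 16 images in total, so the decomposition is critically sampled in time.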
- the output of the subtractor 101 is fed to a backward compensation predictor 60 to produce a prediction result, which is added in an adder 102 to the reconstructed version of the high-pass image. Then both signals, that is to say the signals in the lower branch 103a and in the upper branch 103b, are brought to twice the sampling rate using the upsamplers 104a, 104b, after which the signal on the upper branch is delayed, depending on the implementation.
- the upsampling by the upsamplers 104a, 104b is performed simply by inserting a number of zeros equal to the number of samples for an image.
- The delay by one image, produced by the element labeled z^-1 in the upper branch 103b relative to the lower branch 103a, causes the addition by an adder 106 to yield, at the output of the adder 106, the two low-pass images of the second level in succession.
- the reconstructed versions of the first and second low-pass images of the second level are then fed to the decoder-side inverse filter of the second level and combined there with the transmitted high-pass images of the second level, again by the identical implementation of the inverse filter bank, in order to obtain a sequence of four low-pass images of the third level at an output 108 of the second level.
- the four low-pass images of the third level are combined in an inverse filter level of the third level with the transmitted high-pass images of the third level to produce, at an output 110 of the third-level inverse filter, eight low-pass images of the fourth level in consecutive format. These eight low-pass images are then combined in an inverse filter of the fourth level with the eight high-pass images of the fourth level received from the transmission medium 100 via the input HP4, as described for the first level, in order to obtain a reconstructed group of 16 images at an output 112 of the fourth-level inverse filter.
- each stage of the analysis filter bank two images, that is, either original images or images representing low-pass signals and generated at a next higher level, are thus decomposed into a low-pass signal and a high-pass signal.
- the low-pass signal can be regarded as representing the similarities of the input images
- the high-pass signal can be regarded as representing the differences between the input images.
- the two input images are reconstructed using the low-pass signal and the high-pass signal. Since the inverse operations of the analysis step are carried out in the synthesis step, the analysis / synthesis filter bank (without quantization, of course) guarantees a perfect reconstruction.
- a time-scaling controller 120 is used which is designed to receive on the input side the high-pass or low-pass outputs or the outputs of the further processing devices (26a, 26b, 18), in order to produce from these partial data streams TP1, HP1, HP2, HP3, HP4 a scaled data stream having the further-processed version of the first low-pass image and the first high-pass image in a base scaling layer. In a first extension scaling layer, the further-processed version of the second high-pass image could then be accommodated.
- the further-processed versions of the high-pass images of the third level could then be accommodated in a second extension scaling layer, while the further-processed versions of the high-pass images of the fourth level are placed in a third extension scaling layer.
- the functionality of the decoder is typically controlled by a scaling controller, which is designed to detect how many scaling layers are contained in the data stream or how many scaling layers are to be taken into account by the decoder during decoding.
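The assignment of temporal subbands to scaling layers described above can be sketched as follows. This is a minimal illustration assuming a group of 16 images and four filter levels; the function names `layer_layout` and `images_after_decoding` are hypothetical and do not appear in the source.

```python
def layer_layout(levels=4):
    """Map each scaling layer to the subbands it carries and their image counts.

    Assumes the layout described in the text: the base layer holds the first
    low-pass image (LP1) and the first-level high-pass image (HP1); each
    extension layer adds the high-pass images of the next level.
    """
    layers = [("base", {"LP1": 1, "HP1": 1})]
    for level in range(2, levels + 1):
        # Level k contributes 2**(k-1) high-pass images (2, 4, 8 for k = 2, 3, 4).
        layers.append((f"extension {level - 1}", {f"HP{level}": 2 ** (level - 1)}))
    return layers


def images_after_decoding(num_layers):
    """Number of images a decoder can reconstruct from the first num_layers layers.

    Each decoded layer doubles the temporal resolution: base only -> 2 images,
    base plus three extension layers -> 16 images.
    """
    return 2 ** num_layers


for name, subbands in layer_layout():
    print(name, subbands)
print(images_after_decoding(4))
```

A scaling controller on the decoder side would, under these assumptions, pick `num_layers` according to the temporal resolution it needs.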
- the document JVT-J035, entitled "SNR-Scalable Extension of H.264/AVC", by Heiko Schwarz, Detlev Marpe and Thomas Wiegand, presented at the tenth JVT meeting in Waikoloa, Hawaii, 8-12 December 2003, shows an SNR-scalable extension of the temporal decomposition scheme illustrated in Figures 3 and 4.
- a temporal scaling layer is split into individual "SNR scaling sublayers". An SNR base layer is obtained by quantizing a certain temporal scaling layer with a first, coarser quantizer step size. Then, among other things, an inverse quantization is performed, and the result signal of the inverse quantization is subtracted from the original signal to obtain a difference signal
- this difference signal is then quantized with a finer quantizer step size to obtain the second scaling layer
- the second scaling layer is in turn requantized with the finer quantizer step size, and the signal obtained after requantization is subtracted from the original signal in order to obtain a further difference signal which, after being quantized again, now with a finer quantizer step size, represents a second SNR scaling layer or SNR enhancement layer.
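The layered quantization described above can be illustrated with a short sketch. It assumes plain uniform scalar quantization with rounding and two layers only; the names `quantize`, `dequantize`, `snr_layers` and `reconstruct` are hypothetical, and the actual scheme in JVT-J035 operates on transform coefficients rather than on raw samples.

```python
def quantize(values, step):
    """Uniform scalar quantization: map each value to an integer index."""
    return [round(v / step) for v in values]


def dequantize(indices, step):
    """Inverse quantization: map indices back to reconstruction values."""
    return [i * step for i in indices]


def snr_layers(signal, coarse_step, fine_step):
    """Split a signal into an SNR base layer and one SNR enhancement layer."""
    base_idx = quantize(signal, coarse_step)
    base_rec = dequantize(base_idx, coarse_step)
    # Difference between the original and the coarsely reconstructed signal.
    diff = [s - r for s, r in zip(signal, base_rec)]
    enh_idx = quantize(diff, fine_step)
    return base_idx, enh_idx


def reconstruct(base_idx, enh_idx, coarse_step, fine_step, use_enhancement=True):
    """Decode the base layer alone or the base layer refined by the enhancement."""
    rec = dequantize(base_idx, coarse_step)
    if use_enhancement:
        refinement = dequantize(enh_idx, fine_step)
        rec = [r + d for r, d in zip(rec, refinement)]
    return rec


signal = [10, -7, 3, 22]
base_idx, enh_idx = snr_layers(signal, coarse_step=8, fine_step=2)
coarse = reconstruct(base_idx, enh_idx, 8, 2, use_enhancement=False)
fine = reconstruct(base_idx, enh_idx, 8, 2)
print(coarse, fine)
```

With both layers decoded, the remaining error is bounded by half the finer step size; with the base layer alone it is bounded by half the coarser step size.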
- the scalability concept should provide a high degree of flexibility for all scalability types, i.e. a high degree of flexibility in terms of time, in terms of space and in terms of SNR.
- the high flexibility is particularly important where low-resolution images are sufficient, but a higher temporal resolution would be desirable. Such a situation arises, for example, when there are rapid changes in the pictures, as in videos of team sports, where many people move at the same time in addition to the ball.
- the object of the present invention is to provide a flexible coding / decoding concept which, in spite of the fact that it is a scalable concept, provides the lowest possible bit rate.
- this object is achieved by a device for generating a coded video sequence according to claim 1, a method for generating a coded video sequence according to claim 15, a device for decoding a coded video sequence according to claim 16, a method for decoding a coded video sequence according to claim 26, a computer program according to claim 27 or a computer-readable medium according to claim 28.
- the present invention is based on the finding that bit rate reductions can be achieved not only with a motion-compensated prediction carried out within a scaling layer, but that further bit rate reductions are obtained at constant image quality when an inter-layer prediction of the residual images after the motion-compensated prediction is performed from a lower layer, for example the base layer, to a higher layer, such as the extension layer.
- the residual values of the individual scaling layers considered here, which are preferably scaled with respect to resolution or signal-to-noise ratio (SNR), also exhibit correlations with one another after the motion-compensated prediction.
- these correlations are advantageously utilized by providing an interlayer predictor on the coder side for the extension scaling layer which corresponds to an interlayer combiner on the decoder side.
- this interlayer predictor is adaptively designed, for example, to decide for each macroblock whether an interlayer prediction is worthwhile, or whether the prediction would rather lead to a bitrate increase.
- a prediction of the movement data of the extension layer is also carried out.
- the motion fields in different scaling layers also have correlations with one another which, according to the invention, are advantageously exploited for bit rate reduction by providing a motion data predictor.
- the prediction can be carried out in such a way that no separate motion data are calculated for the enhancement layer but, if appropriate, the motion data of the base layer are adopted after upsampling. This can cause the motion-compensation residual signal in the extension layer to become larger than in the case in which motion data are calculated separately for the extension layer.
- this disadvantage is not significant if the saving due to the movement data saved for the expansion layer during the transmission is greater than the increase in bit rate caused by possibly greater residual values.
- a separate motion field can also be calculated for the enhancement layer, with the motion field of the base layer being integrated into the calculation or being used as a predictor to transmit only motion field residual values.
- This implementation has the advantage that the motion data correlation of the two scaling layers is fully utilized and that the residual values of the motion data after the motion data prediction are as small as possible.
- a disadvantage of this concept is the fact that additional residual motion data must be transmitted.
- an additional SNR scalability is used. This means that quantization is carried out in the base layer with a coarser quantization parameter than in the enhancement layer.
- the residual values of the basic motion prediction which are quantized with the coarser quantizer step size and reconstructed again are used here as prediction signals for the interlayer predictor.
- different quantization parameters also result in different motion fields if a motion data calculation that depends on the quantization parameter is used.
- with spatial scalability, i.e. if the base scaling layer has a coarser spatial resolution than the extension scaling layer, it is preferable to interpolate the residual values of the base motion prediction, that is, to convert them from the low spatial resolution to the higher spatial resolution of the enhancement scaling layer, and then feed them to the interlayer predictor.
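To make the interpolation step concrete, the following sketch brings a block of base-layer residual values to twice the spatial resolution. The source does not specify the interpolation filter, so simple sample replication is assumed here purely for illustration; `upsample_residual` is a hypothetical name.

```python
def upsample_residual(block, factor=2):
    """Upsample a 2-D block of residual values by sample replication.

    Assumption: nearest-neighbour replication stands in for whatever
    interpolation filter an actual implementation would use.
    """
    upsampled = []
    for row in block:
        # Repeat each sample horizontally ...
        wide_row = [value for value in row for _ in range(factor)]
        # ... and repeat the widened row vertically.
        for _ in range(factor):
            upsampled.append(list(wide_row))
    return upsampled


base_residual = [[1, -2],
                 [0, 3]]
print(upsample_residual(base_residual))
```

The upsampled block then has the same dimensions as the co-located block of the enhancement layer and can be fed to the interlayer predictor.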
- a motion data prediction is performed which, in turn, can consist either in the complete adoption of the motion data of the lower scaling layer (after scaling), or in using the upsampled motion vectors of the lower scaling layer to predict the motion vectors of the higher scaling layer, so that only the motion data residual values are transmitted, which require a lower data rate than non-predicted motion data.
- a combined scalability is used, in that the base scaling layer and the extension scaling layer differ both in spatial resolution and in the quantization parameter used, that is to say the quantizer step size used.
- starting from a predefined quantization parameter for the base scaling layer, a combination of quantization parameter, distortion and bit requirement for the motion data of the base layer is determined on the basis of a Lagrangian optimization.
- the residual values obtained after a motion-compensated prediction and the base motion data used for it are then used for the prediction of the corresponding data of a higher scaling layer, where, in turn starting from a finer quantization parameter for the higher scaling layer, an optimum combination of bit requirement for the motion data, quantization parameter and distortion can be calculated for the extension motion data.
- Fig. 1a shows a preferred embodiment of an encoder according to the invention.
- FIG. 1b is a more detailed illustration of a basic picture coder of FIG. 1a;
- FIG. 1 c shows an explanation of the functionality of an interlayer prediction flag;
- Fig. 1d is a description of a motion data flag
- Fig. 1e shows a preferred implementation of the expansion motion compensator 1014 of Fig. 1a;
- Fig. 1f shows a preferred implementation of the expansion motion data determiner 1078 of Fig. 2;
- Fig. 1g is an overview of three preferred
- Fig. 2 shows a preferred embodiment of a decoder according to the invention
- Fig. 3 is a block diagram of a four-level decoder
- Fig. 4 is a block diagram illustrating the lifting decomposition of a temporal subband filter bank
- Fig. 5a is an illustration of the functionality of the lifting scheme shown in Fig. 4;
- Fig. 5b shows a representation of two preferred lifting rules with unidirectional prediction (Haar wavelet) and bidirectional prediction (5/3 transformation);
- Fig. 5c shows a preferred embodiment of the prediction and update operators with motion compensation and reference indices for arbitrary selection of the two images to be processed by the lifting scheme
- Fig. 5d shows a representation of the intra mode, in which original image information can be entered into high-pass images macroblock by macroblock
- Fig. 6a shows a schematic representation for signaling a macroblock mode
- Fig. 6b shows a schematic representation of the upsampling of motion data with spatial scalability in accordance with a preferred embodiment of the present invention
- Fig. 6c is a schematic representation of the data stream syntax for motion vector differences
- FIG. 6 d is a schematic representation of a residual value syntax extension according to a preferred embodiment of the present invention.
- FIG. 7 shows an overview diagram to illustrate the time shift of a group of, for example, 8 images
- FIG. 8 shows a preferred temporal placement of low-pass images for a group of 16 images
- FIG. 9 shows an overview block diagram for illustrating the basic coding structure for a coder according to the standard H.264 / AVC for a macroblock
- Fig. 10 is a context arrangement consisting of two adjacent pixel elements A and B to the left and above, respectively, of a current syntax element C;
- FIG. 11 is an illustration of the division of an image into slices.
- Figure 1a shows a preferred embodiment of an apparatus for generating a coded video sequence having a base scaling layer and an extension scaling layer.
- An original video sequence with a group of 8, 16 or other number of pictures is fed via an input 1000.
- the coded video sequence contains the base scaling layer 1002 and the extension scaling layer 1004.
- the extension scaling layer 1004 and the base scaling layer 1002 can be supplied to a bitstream multiplexer, which generates a single scalable bitstream on its output side.
- FIG. 1a shows an encoder for generating two scaling layers, i.e. the base scaling layer and an extension scaling layer.
- in order to obtain an encoder which, if necessary, generates one or more further extension layers, the functionality of the extension scaling layer is to be repeated, whereby a higher extension scaling layer is always supplied with data by the next lower extension scaling layer, just as the extension scaling layer 1004 shown in FIG. 1a is supplied with data by the base scaling layer 1002.
- the encoder includes a basic motion compensator or base motion estimator 1006 for computing basic motion data indicating how a macroblock in a current image has moved relative to another image in a group of images that the basic motion compensator 1006 receives on its input side.
- the motion compensation calculation is used, as standardized in the video coding standard H.264 / AVC.
- a macroblock of a later image is considered and it is determined how the macroblock has "moved" in comparison to an earlier image.
- This movement (in the x and y directions) is indicated by a two-dimensional motion vector, which is calculated by block 1006 for each macroblock and supplied to a basic image coder 1010 via a motion data line 1008.
- for the next image, it is then calculated how a macroblock has moved from the previous image to the next image.
- this new motion vector which to a certain extent indicates the movement from the second to the third image, can be transmitted again as a two-dimensional vector.
- it is preferred to transmit only a motion vector difference, i.e. the difference between the motion vector of a macroblock from the second to the third image and the motion vector of the macroblock from the first to the second image.
- Alternatively, references or motion vector differences relating not to immediately preceding images but to images further back can also be used.
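The transmission of motion vector differences instead of full vectors can be sketched as follows. The sketch assumes that the first vector of a macroblock is sent as a difference against (0, 0) and that each later vector is predicted from the immediately preceding one; both function names are hypothetical.

```python
def encode_mv_differences(vectors):
    """Turn a macroblock's motion vectors over successive images into differences."""
    differences = []
    previous = (0, 0)  # assumption: the first vector is coded against zero
    for mv in vectors:
        differences.append((mv[0] - previous[0], mv[1] - previous[1]))
        previous = mv
    return differences


def decode_mv_differences(differences):
    """Recover the motion vectors by accumulating the transmitted differences."""
    vectors = []
    previous = (0, 0)
    for dx, dy in differences:
        previous = (previous[0] + dx, previous[1] + dy)
        vectors.append(previous)
    return vectors


motion = [(4, -2), (5, -2), (5, -1)]
diffs = encode_mv_differences(motion)
print(diffs)
print(decode_mv_differences(diffs))
```

When the motion of a macroblock changes slowly from image to image, the differences are small, which is what makes them cheaper to code than the vectors themselves.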
- the motion data computed by block 1006 are then fed to a basic motion predictor 1012 configured to compute a basic sequence of residual error images using the motion data and the group of images.
- the basic motion predictor thus performs the motion compensation that has, in a sense, been prepared by the motion compensator or motion estimator.
- This basic sequence of residual error images is then supplied to the basic image coder.
- the basic image coder is configured to provide the base scaling layer 1002 at its output.
- the encoder according to the invention further comprises an expansion-motion compensator or expansion-motion estimator 1014 for determining extension motion data.
- These extension motion data are then supplied to an expansion motion predictor 1016, which generates on its output side an extension sequence of residual error images and feeds it to a downstream interlayer predictor 1018.
- the expansion motion predictor thus performs the motion compensation that has, in a sense, been prepared by the motion compensator or motion estimator.
- the interlayer predictor is designed to calculate extension prediction residual error images on the output side.
- the interlayer predictor uses the basic sequence of residual error images as provided by block 1012 via the dashed detour line 1020. Alternatively, however, block 1018 may also use an interpolated sequence of residual error images provided at the output of block 1012 and interpolated by an interpolator 1022. As a further alternative, the interlayer predictor may also use a reconstructed basic sequence of residual error images as provided at an output 1024 of the basic image coder 1010.
- as can be seen from the figure, this reconstructed basic sequence of residual error images can be interpolated (1022) or not interpolated (1020).
- the interlayer predictor thus operates using the basic sequence of residual error images, the information at the interlayer predictor input 1026 being derived, for example, by reconstruction or interpolation of the basic sequence of residual error images at the output of block 1012.
- an expansion image coder 1028 is provided, which is configured to encode the extension prediction residual error images to obtain the coded extension scaling layer 1004.
- the interlayer predictor is configured to subtract, macroblock by macroblock and image by image, the signal at its input 1026 from the corresponding signal that the interlayer predictor 1018 receives from the expansion motion predictor 1016.
- the result signal obtained in this subtraction then represents a macroblock of an image of the extension prediction residual error images.
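The subtraction performed by the interlayer predictor can be sketched for a single macroblock as follows. The block size, the list-of-lists representation and the name `interlayer_predict` are assumptions made for illustration; the base residual is taken to be already at the resolution of the extension layer.

```python
def interlayer_predict(extension_residual, base_residual):
    """Subtract the base-layer residual block from the extension-layer residual block.

    Both arguments are 2-D blocks of motion-compensated prediction residuals
    of equal size; the result is one block of the extension prediction
    residual error image.
    """
    return [
        [e - b for e, b in zip(ext_row, base_row)]
        for ext_row, base_row in zip(extension_residual, base_residual)
    ]


extension_block = [[5, -3],
                   [2, 0]]
base_block = [[4, -2],
              [2, 1]]
print(interlayer_predict(extension_block, base_block))
```

Where the residuals of the two layers are correlated, the result has smaller magnitudes than the extension residual itself and is therefore cheaper to code.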
- the interlayer predictor is adaptively designed.
- an interlayer prediction flag 1030 is provided, which indicates to the interlayer predictor that it is to perform a prediction, or which indicates in its other state that no prediction is to be performed, but that the corresponding macroblock at the output of the expansion motion predictor 1016 is to be supplied to the expansion image coder 1028 without further prediction.
- This adaptive implementation has the advantage that an interlayer prediction is carried out only where it is meaningful, i.e. where the prediction residual signal leads to a lower output bit rate than in the case in which no interlayer prediction has been performed and the output data of the expansion motion predictor 1016 have been coded directly.
- a decimator 1032 is provided between the extension scaling layer and the base scaling layer, which is designed to convert the video sequence at its input, which has a certain spatial resolution, to a video sequence at its output which has a lower resolution. If pure SNR scalability is provided, i.e. if the image coders 1010 and 1028 work with different quantization parameters 1034 and 1036 for the two scaling layers, the decimator 1032 is not provided. This is schematically represented in FIG. 1a by the detour line 1038.
- in the case of spatial scalability, the interpolator 1022 must furthermore be provided. In contrast, in the case of pure SNR scalability, the interpolator 1022 is not provided. Instead, the detour line 1020 is taken, as shown in Fig. 1a.
- the expansion motion compensator 1014 is configured to calculate its own motion field completely, or to use the motion field calculated by the base motion compensator 1006 either directly (detour line 1040) or after upsampling by an upsampler 1042.
- the upsampler 1042 must be provided in order to upsample a motion vector of the basic motion data to the higher resolution, i.e., for example, to scale it.
- when the enhancement layer has twice the resolution of the base layer in each dimension, a macroblock (16x16 luminance samples) in the enhancement layer covers an image area which corresponds to a sub-macroblock (8x8 luminance samples) in the base layer.
- the base motion vector is therefore doubled in its x-component and its y-component, i.e. scaled by a factor of 2. This is described in more detail below with reference to FIG. 6b.
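The scaling of base-layer motion data for an enhancement layer of twice the resolution can be sketched as follows. The function names and the convention of addressing macroblocks by integer grid coordinates are assumptions for illustration only.

```python
def upsample_motion_vector(mv, factor=2):
    """Scale both components of a base-layer motion vector to the higher resolution."""
    return (mv[0] * factor, mv[1] * factor)


def base_region_for_macroblock(mb_x, mb_y, mb_size=16, factor=2):
    """Return (x, y, width, height) of the base-layer area in luminance samples
    that corresponds to the enhancement-layer macroblock at grid position
    (mb_x, mb_y)."""
    size_in_base = mb_size // factor  # a 16x16 macroblock maps to an 8x8 sub-macroblock
    return (mb_x * size_in_base, mb_y * size_in_base, size_in_base, size_in_base)


print(upsample_motion_vector((3, -1)))
print(base_region_for_macroblock(2, 1))
```

With `factor=1` the same helpers describe pure SNR scalability, where the base motion vector is used unchanged.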
- the motion field is the same for all scaling layers. It therefore only has to be calculated once and can be used by each higher scaling layer directly as it has been calculated by the lower scaling layer.
- for the interlayer prediction, either the signal at the output of the basic motion predictor 1012 or the reconstructed signal on line 1024 can be used. The selection of which of these two signals to use for prediction is made by a switch 1044.
- the signal on line 1024 differs from the signal at the output of block 1012 in that it already has a quantization. This means that the signal on line 1024 has a quantization error compared to the signal at the output of block 1012.
- the alternative of using the signal on line 1024 for interlayer prediction is particularly advantageous if SNR scalability is used either alone or in conjunction with spatial scalability, since the quantization error made by the basic image coder 1010 is then, to a certain extent, carried into the higher scaling layer: the output signal of block 1018 then contains the quantization error made by the first scaling layer, which is quantized by the enhancement image coder with a typically finer quantizer step size, or quantization parameter 2 at the input 1036, and written into the extension scaling layer 1004.
- a motion data flag 1048 is also fed into the image coder so that corresponding information is contained in the enhancement scaling layer 1004 in order to be used by the decoder, which is discussed with reference to FIG. 2.
- the output signal of the basic motion predictor 1012 ie the basic sequence of residual error images, can also be used.
- control of this switch can be performed manually or on the basis of a prediction utility function.
- an extension sequence of residual error images is already a sequence of images in which, in extreme cases, only a single block of a single "residual error image" has motion prediction residual values, while in all other blocks of this image and even in all other "residual error images" there are actually no residual errors, since for all these images/blocks the motion-compensated prediction and possibly the motion-compensated update have been disabled.
- the interlayer predictor which calculates extension prediction residual error images.
- the extension prediction residual error images will be present in a sequence.
- the interlayer predictor is preferably designed to be adaptive.
- a residual data prediction from the base layer to the extension layer was only meaningful for a single block of a single "residual error image", while for all other blocks of this image and possibly even for all other images of the sequence of
- this sequence will still be referred to as extension prediction residual error images, in which context it should be noted that the interlayer predictor can only predict residual data if motion compensation residual values have already been calculated in a corresponding block of a residual error image in the base layer, and if for a block corresponding to this block (e.g. at the same x, y position) in a residual error image of the extension sequence a motion-
- the interlayer predictor preferably becomes active in order to use a block of residual error values in an image of the base layer as a predictor for a block of residual error values in an image of the extension layer, and then to transfer only the residual values of this prediction, i.e. the extension prediction residual error data in this block of the image under consideration, to the extension image coder.
- the image coder receives the group of residual error images and supplies them macroblock by macroblock to a transformation 1050.
- the transformed macroblocks are then scaled in a block 1052 and quantized using a quantization parameter 1034, 1036, ....
- the quantization parameter used, that is to say the quantization step size used for a macroblock, and the quantization indices for the spectral values of the macroblock are then output. This information is then fed to an entropy coding stage, which is not shown in the figure.
- the output of device 1052 is also applied to a block 1054, which performs inverse scaling and requantization to convert the quantization indices, together with the quantization parameter, back to numerical values. These are then fed to an inverse transform in block 1056 to obtain a reconstructed group of residual error images which, compared to the original group of residual error images at the input of the transformation block 1050, now has a quantization error that depends on the quantization parameter or the quantization step size.
- either the one signal or the other signal is now supplied to the interpolator 1022 or already to the interlayer predictor 1018 in order to carry out the residual value prediction according to the invention.
- a simple implementation of the interlayer prediction flag 1030 is shown in FIG. 1c. If the interlayer prediction flag is set, the interlayer predictor 1018 is activated. If, on the other hand, the flag is not set, the interlayer predictor is deactivated, so that a simulcast operation is executed for this macroblock or for a sub-macroblock subordinate to this macroblock. The reason for this could be that the coding gain due to the prediction is actually a coding loss, i.e. that transmitting the corresponding macroblock at the output of block 1016 yields a better coding gain in the subsequent entropy coding than using prediction residual values.
- a simple implementation of the motion data flag 1048 is shown in FIG. 1d. If the flag is set, the motion data of the enhancement layer are derived from the upsampled motion data of the base layer. In the case of SNR scalability, the upsampler 1042 is not necessary.
- here, the motion data of the extension layer can be derived directly from the base motion data. It should be pointed out that this motion data "derivation" can consist of directly adopting the motion data or of a true prediction, in which block 1014 subtracts the motion vectors obtained from the base layer from the corresponding motion vectors calculated by block 1014 for the extension scaling layer, for example, in order to obtain motion data prediction values.
- the motion data of the enhancement layer (if no prediction of any kind has been made) or the residual values of the prediction (if a true prediction has been made) are supplied to the expansion image coder 1028 via an output shown in the figure, so that they are ultimately contained in the extension scaling layer bitstream 1004. If, on the other hand, the motion data are adopted completely from the base scaling layer, with or without scaling, no extension motion data need to be written to the extension scaling layer bitstream 1004. It suffices to signal this fact by the motion data flag 1048 in the extension scaling layer bitstream.
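A per-macroblock decision for the interlayer prediction flag could look like the following sketch. The source only states that prediction is used where it lowers the output bit rate; the sum of absolute values used here is an assumed stand-in for the bit cost after entropy coding, and `choose_interlayer_flag` is a hypothetical name.

```python
def choose_interlayer_flag(extension_block, base_block):
    """Decide whether interlayer prediction pays off for one (flattened) macroblock.

    Returns (flag, data_to_code). The cost measure is an assumption: the sum
    of absolute residual values approximates the bits needed to code a block.
    """
    predicted = [e - b for e, b in zip(extension_block, base_block)]
    cost_with_prediction = sum(abs(v) for v in predicted)
    cost_without_prediction = sum(abs(v) for v in extension_block)
    if cost_with_prediction < cost_without_prediction:
        return True, predicted
    # Prediction would be a coding loss: code the extension residual directly.
    return False, list(extension_block)


print(choose_interlayer_flag([5, -3, 2, 0], [4, -2, 2, 1]))
print(choose_interlayer_flag([1, 0, 0, 0], [4, -2, 2, 1]))
```

In the second call the base residual is a poor predictor, so the flag stays unset and the block is coded as in simulcast operation.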
- FIG. 2 shows an apparatus for decoding a coded video sequence comprising the base scaling layer 1002 and the extension scaling layer 1004.
- the extension scaling layer 1004 and the base scaling layer 1002 may come from a bitstream demultiplexer that demultiplexes a scalable bitstream containing both scaling layers in order to extract both the base scaling layer 1002 and the extension scaling layer 1004 from the common bitstream.
- the base scaling layer 1002 is applied to a basic image decoder 1060, which is configured to decode the base scaling layer to obtain a decoded basic sequence of residual error images and basic motion data, which are present on an output line 1062.
- the output signals on line 1062 are then applied to a basic motion combiner 1064.
- the decoder of the present invention further includes an expansion image decoder 1066 for decoding the enhancement scaling layer 1004 to obtain extension prediction residual error images on an output line 1068.
- the output line 1068 further carries motion data information, such as the motion data flag 1070 or, if extension motion data or extension motion data residual values were actually present in the extension scaling layer 1004, these extension motion data.
- the decoded basic sequence on the line 1062 is now either interpolated by an interpolator 1070 or supplied unmodified (line 1072) to an interlayer combiner 1074 in order to reverse the interlayer prediction made by the interlayer predictor 1018 on the encoder side.
- the interlayer combiner is thus designed to combine the extension prediction residual error images with information about the decoded basic sequence on the line 1062, whether interpolated (1070) or not (1072), in order to obtain an extension sequence of residual error images, which is finally fed to an expansion motion combiner 1076, which, like the basic motion combiner 1064, reverses the motion compensation carried out in the extension layer.
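The combination performed on the decoder side mirrors the encoder-side subtraction and can be sketched as follows. The name `interlayer_combine` and the `prediction_used` argument (standing in for the signalled interlayer prediction flag) are assumptions; quantization is left out, so the round trip shown is exact.

```python
def interlayer_combine(prediction_residual, base_residual, prediction_used=True):
    """Reverse the interlayer prediction for one 2-D block.

    If prediction was used at the encoder, the decoded base residual is added
    back; otherwise the transmitted block already is the extension residual.
    """
    if not prediction_used:
        return [list(row) for row in prediction_residual]
    return [
        [p + b for p, b in zip(pred_row, base_row)]
        for pred_row, base_row in zip(prediction_residual, base_residual)
    ]


base_block = [[4, -2], [2, 1]]
transmitted = [[1, -1], [0, -1]]  # extension residual minus base residual
print(interlayer_combine(transmitted, base_block))
```

The output block is then what the expansion motion combiner would receive in order to undo the motion compensation.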
- the expansion motion combiner 1076 is coupled to a motion data determiner 1078 to provide the motion data for the motion combination in block 1076.
- the motion data may actually be full extension motion data for the extension layer, supplied by the expansion image decoder at the output 1068.
- the extension motion data may also be motion data residual values.
- the corresponding data are supplied via an extension motion data line 1080 to the motion data determiner 1078. If, however, the motion data flag 1070 indicates that no additional extension motion data have been transmitted for the extension layer, the necessary motion data are fetched via a line 1082 from the base layer, depending on the scalability used, either directly (line 1084) or after upsampling by an upsampler 1087.
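The three cases handled by the motion data determiner can be sketched for a single motion vector as follows. The function signature is hypothetical; in particular, how the decoder learns whether the transmitted data are full vectors or residual values is represented here by a plain boolean argument.

```python
def determine_extension_motion(base_mv, derive_from_base, extension_data=None,
                               extension_is_residual=False, scale=1):
    """Determine the motion vector to use in the extension layer.

    base_mv: motion vector decoded from the base layer.
    derive_from_base: state of the motion data flag (True: nothing transmitted).
    extension_data: full vector or residual transmitted in the extension layer.
    scale: 2 for spatial scalability with doubled resolution, 1 for SNR scalability.
    """
    scaled_base = (base_mv[0] * scale, base_mv[1] * scale)
    if derive_from_base:
        return scaled_base
    if extension_data is None:
        raise ValueError("extension motion data expected but not present")
    if extension_is_residual:
        # The scaled base vector serves as predictor; add the transmitted residual.
        return (scaled_base[0] + extension_data[0], scaled_base[1] + extension_data[1])
    return extension_data


print(determine_extension_motion((3, -1), True, scale=2))
print(determine_extension_motion((3, -1), False, (1, 0), extension_is_residual=True, scale=2))
print(determine_extension_motion((3, -1), False, (9, 9)))
```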
- a corresponding connection is further provided on the decoder side between the expansion motion combiner 1076 and the basic motion combiner 1064, which has an interpolator 1090 in the case of spatial scalability, or a detour line if only SNR scalability has been used.
- for the extension layer, only a prediction residual signal is transmitted for this intra-macroblock, which is indicated by corresponding signaling information in the bitstream.
- the expansion motion combiner will also carry out a summation for this one macroblock, i.e. a combination of the macroblock residual values and the macroblock values from the lower scaling layer, and will then supply the obtained macroblock to the actual inverse motion compensation processing.
- a preferred embodiment of the basic motion predictor 1012 or the expansion motion predictor 1016, or of the inverse element, that is, the expansion motion combiner 1076 or the basic motion combiner 1064, will be described below.
- any motion compensation prediction algorithm may be used, including the motion compensation algorithm shown at 92 in FIG.
- the conventional motion compensation algorithm also obeys the scheme shown in FIG. 1; however, the update operator U, which is represented by the reference numeral 45 in FIG. 4, is deactivated. As a result, a group of images is converted into an original image and residual images, prediction residual signals or residual error images that depend on it to a certain extent.
- if an extension is implemented such that the update operator shown in FIG. 4 is active and is calculated, for example, as shown with reference to FIGS. 5a to 5d, the normal motion-compensated prediction calculation becomes so-called MCTF processing, which is also referred to as motion-compensated temporal filtering.
- through the update operation, the normal image or intra image of conventional motion compensation becomes a low-pass image, since the original image is combined with the prediction residual signal weighted by the update operator.
- motion-compensated temporal filter consists of a general three-step lifting scheme, namely, polyphase decomposition, prediction, and update.
- Fig. 4 shows the corresponding analysis / synthesis filter bank structure.
- the odd samples of a given signal are predicted by a linear combination of the even samples using the prediction operator P, and the resulting prediction residual values form a high-pass signal h.
- a corresponding low-pass signal l is formed by adding a linear combination of the prediction residual values h to the even samples of the input signal s using the update operator U.
- the equational relationship between the variables h and l shown in FIG. 4 and the basic embodiments of the operators P and U is shown in FIG. 5a.
- the synthesis filter bank comprises the application of the prediction operator and the update operator in reverse order with inverted signs in the summation process, wherein the even and the odd polyphase components are used.
- corresponding scaling factors F_l and F_h are used for normalization of the high-pass / low-pass components. These scaling factors do not necessarily have to be used, but they can be used if quantizer step sizes are selected during encoding.
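The three lifting steps and their inversion can be illustrated with a minimal sketch. It assumes the Haar operators in their usual lifting form (prediction: the even sample itself; update: half of the residual), ignores motion compensation and the scaling factors, and uses plain numbers in place of pictures. FIG. 5a is not reproduced here, so these operator choices and the function names are assumptions.

```python
def haar_analysis(s, update=True):
    """One Haar lifting stage on an even-length sequence s -> (low, high)."""
    even, odd = s[0::2], s[1::2]                        # polyphase decomposition
    high = [o - e for e, o in zip(even, odd)]           # prediction step: h = odd - P(even)
    if update:
        low = [e + h / 2 for e, h in zip(even, high)]   # update step: l = even + U(h)
    else:
        low = list(even)                                # update deactivated: even samples pass through
    return low, high


def haar_synthesis(low, high, update=True):
    """Invert haar_analysis: same operators, reverse order, inverted signs."""
    even = [l - h / 2 for l, h in zip(low, high)] if update else list(low)
    odd = [h + e for e, h in zip(even, high)]
    s = []
    for e, o in zip(even, odd):
        s.extend([e, o])
    return s
```

With the update step switched off, the low-pass output is just the even samples, which corresponds to the conventional prediction case in which the base image is passed through unchanged.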
- not only can two temporally successive images be decomposed into a high-pass image and a low-pass image; for example, a first image can also be filtered in a motion-compensated manner with a third image of a sequence.
- the appropriate choice of reference indices allows, for example, one and the same image of a sequence to serve as the basis for the motion vectors. This means that the reference indices, for example in a sequence of eight images, allow all motion vectors to refer to one and the same image of the sequence.
- the final result of processing these eight images by the filter scheme in FIG. 4 is a single low-pass image, seven high-pass images (enhancement images), and all the motion vectors, where an enhancement image is associated with each motion vector and all motion vectors refer to the same image of the original sequence.
- the same picture namely, for example, the fourth picture of the sequence from eight pictures
- the low-pass image is the same for each filtering, namely the ultimately desired single low-pass image of the sequence of images. If the update parameter is zero, the base image is simply "passed through" by the lower branch.
- the high-pass image is always dependent on the corresponding other image of the original sequence and the prediction operator, the motion vector associated with this input image being used in the prediction.
- the low-pass image finally obtained is associated with a specific image of the original sequence of images, and each high-pass image is also associated with an image of the original sequence, representing exactly the deviations of that original image (using motion compensation) from the selected basic picture of the sequence (which is fed into the lower branch of the analysis filter bank of FIG. 4). If each update parameter $M_{01}$, $M_{11}$, $M_{21}$ and $M_{31}$ equals zero, the image fed into the lower branch 73 of the fourth level is simply "passed through" to the bottom.
- the low-pass image TP1 is, so to speak, fed "repeatedly" into the filter bank, while the other images, controlled by the reference indices, are gradually fed into the input 64 of Fig. 3.
- the prediction and update operators for the motion-compensated filtering provide different predictions for the two different wavelets.
- with the Haar wavelet, a unidirectional motion-compensated prediction is achieved.
- if the 5/3 spline wavelet is used, the two operators specify a bidirectional motion-compensated prediction.
- bidirectional compensated prediction generally reduces the energy of the prediction residual, but increases the motion vector rate compared to unidirectional prediction.
- it is desirable to switch dynamically between unidirectional and bidirectional prediction, which means that it is possible to switch back and forth between a lifting representation of the Haar wavelet and of the 5/3 spline wavelet depending on a picture-dependent control signal.
- the concept according to the invention, which does not use a closed feedback loop for temporal filtering, readily allows this macroblock-wise switching back and forth between the two wavelets, which in turn benefits flexibility and, in particular, data rate savings, and can be carried out in an optimally signal-adapted manner.
- FIG. 7 shows the high-pass image HP1 of the first level at the output 22 of the first-level filter and the low-pass image of the first level at the output 24 of the first-level filter.
- the two low-pass images TP2 at the output 16 of the second-level filter and the two high-pass images obtained from the second level are shown in FIG. 7 as second-level images.
- the low-pass images of the third level are present at the output 76 of the third-level filter, while the high-pass images of the third level are present at the output 75 in further processed form.
- the group of eight images could originally comprise video images, in which case the decoder of FIG. 3 would be used without a fourth filter level.
- the MCTF decomposition according to the invention can be used as basic motion predictor, extension motion predictor, basic motion combiner or expansion motion combiner. Generally speaking, this decomposition thus converts a group of $2^n$ images into $(2^{n+1}-2)$ motion field descriptions, $(2^n-1)$ residual images, and a single low-pass (or INTRA) image.
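As a quick plausibility check of these counts, the following sketch walks through a dyadic decomposition level by level. It assumes that every high-pass picture comes with two motion field descriptions (one for the prediction step, one for the update step); the function name is illustrative only.

```python
def mctf_counts(n):
    """Count the outputs of an n-level dyadic decomposition of 2**n pictures."""
    pictures = 2 ** n
    residual_images = 0
    motion_fields = 0
    for _ in range(n):
        pairs = pictures // 2         # each pair gives one high-pass and one low-pass picture
        residual_images += pairs
        motion_fields += 2 * pairs    # assumption: one field for prediction, one for update
        pictures = pairs              # the low-pass pictures feed the next level
    return {
        "motion_fields": motion_fields,
        "residual_images": residual_images,
        "low_pass_images": pictures,
    }
```

For a group of eight pictures (n = 3) this yields 14 motion field descriptions, 7 residual images and 1 low-pass image.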
- Both the basic motion compensator and the expansion motion compensator are preferably controlled by a base control parameter or an extension control parameter, respectively, in order to determine an optimum combination of a quantization parameter (1034 or 1036), which depends on a specific rate, and motion information. This is done according to the following methodology in order to obtain an optimal ratio with respect to a certain maximum bit rate.
- pictures A and B are given, which are either original pictures or pictures representing low-pass signals generated in a previous analysis stage. Further, the respective arrays of luma samples a[] and b[] are provided.
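- the motion description $M_{i0}$ is estimated in a macroblock-wise fashion as follows:
- $m_i = [m_x, m_y]^T = \arg\min_{m \in S} \{ D_{SAD}(i, m) + \lambda \cdot R(i, m) \}$, with $D_{SAD}(i, m) = \sum_{(x,y) \in P} | b[x, y] - a[x - m_x, y - m_y] |$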
- S specifies the motion vector search area within the reference picture A.
- P is the area which is swept by the subject macroblock division or sub-macroblock division.
- R (i, m) specifies the number of bits needed to transmit all the components of the motion vector m, where λ is a fixed Lagrangian multiplier.
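The Lagrangian cost built from S, P, R and λ can be sketched for integer displacements as follows. Pictures are plain nested lists indexed as `picture[y][x]`, the area P is a square block, and `mv_bits` is a purely hypothetical rate model standing in for R(i, m); none of these names come from the source.

```python
def sad(a, b, x0, y0, size, mx, my):
    """Sum of absolute differences between the block of b at (x0, y0)
    and the block of reference picture a displaced by (mx, my)."""
    return sum(abs(b[y][x] - a[y - my][x - mx])
               for y in range(y0, y0 + size)
               for x in range(x0, x0 + size))


def mv_bits(mx, my):
    """Hypothetical rate model: longer vectors cost more bits."""
    return 2 * (abs(mx) + abs(my)) + 2


def integer_search(a, b, x0, y0, size, search_range, lam):
    """Return (best_vector, best_cost) minimizing SAD + lam * bits over the search area."""
    height, width = len(a), len(a[0])
    best_mv, best_cost = None, None
    for my in range(-search_range, search_range + 1):
        for mx in range(-search_range, search_range + 1):
            # skip displacements that would read outside the reference picture
            if x0 - mx < 0 or x0 - mx + size > width:
                continue
            if y0 - my < 0 or y0 - my + size > height:
                continue
            cost = sad(a, b, x0, y0, size, mx, my) + lam * mv_bits(mx, my)
            if best_cost is None or cost < best_cost:
                best_mv, best_cost = (mx, my), cost
    return best_mv, best_cost
```

A larger λ biases the search towards short vectors that are cheap to transmit, even if they leave a somewhat larger residual.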
- the motion search initially proceeds over all integer-sample-accurate motion vectors in the given search area S. Then, using the best integer motion vector, the 8 surrounding half-sample-accurate motion vectors are tested. Finally, using the best half-sample-accurate motion vector, the 8 surrounding quarter-sample-accurate motion vectors are tested. For the half- and quarter-sample-accurate motion vector enhancement, the term
- the mode decision for the macroblock mode and the sub-macroblock mode basically follows the same approach. From a given set of possible macroblock or sub-macroblock modes $S_{mode}$, the mode $p_i$ which minimizes the following Lagrange functional is selected:
- $p_i = \arg\min_{p \in S_{mode}} \{ D_{SAD}(i, p) + \lambda \cdot R(i, p) \}$, with $D_{SAD}(i, p) = \sum_{(x,y) \in P} | b[x, y] - a[x - m_x[p, x, y], y - m_y[p, x, y]] |$
- P specifies the macroblock or sub-macroblock area
- m [p, x, y] is the motion vector that is associated with the macroblock or sub-macroblock mode p and with the partition or sub-macroblock partition covering the luma position (x, y).
- the rate term R (i, p) represents the number of bits associated with the choice of the coding mode p.
- for the motion-compensated coding modes, it includes the bits for the macroblock mode (if applicable), the sub-macroblock mode(s) (if applicable) and the motion vector(s).
- for the intra mode, it includes the bits for the macroblock mode and the arrays of quantized luma and chroma transform coefficient levels.
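The mode decision can be sketched as a minimum search over precomputed (distortion, bits) pairs. The mode names and the numbers in the usage note are made up for illustration; in a real encoder the distortion and rate of each mode would come from actually evaluating that mode.

```python
def choose_mode(candidates, lam):
    """candidates maps a mode name to (distortion, bits); return the mode with minimal D + lam * R."""
    return min(candidates, key=lambda p: candidates[p][0] + lam * candidates[p][1])
```

For example, with the hypothetical candidates `{"16x16": (120, 10), "8x8": (80, 40), "intra": (60, 200)}`, a multiplier of 1 selects `"8x8"`, while a multiplier of 4 makes the cheaper `"16x16"` mode win despite its higher distortion.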
- the set of possible sub-macroblock modes is through
- the INTRA mode is only used if a motion field description $M_{i0}$ used for the prediction step is estimated.
- the Lagrangian multiplier λ is set in accordance with the base-layer quantization parameter $QP_{Hi}$ for the high-pass image(s) of the decomposition stage for which the motion field is estimated, according to the following equation:
- the decomposition scheme shown in FIG. 8 is used, from which it is assumed that a reasonable compromise between temporal scalability and coding efficiency is possible.
- the sequence of the original images is treated as a sequence of input images A, B, A, B, A, B... A, B.
- this scheme provides a stage with optimal temporal scalability (equal distance between the low-pass images).
- the sequence of low-pass pictures used as the input signal in all subsequent decomposition stages is treated as a sequence of input pictures B, A, A, B, B, A... A, B, so that the distances between the low-pass images which are decomposed in the following two-channel analysis scheme are kept small, as can be seen in FIG.
- motion data interlayer prediction and the residual data interlayer prediction will be discussed.
- motion data and texture data from a lower scaling layer are used for prediction purposes for a higher scaling layer.
- an upsampling of the motion data will be necessary before they can be used as a prediction for the decoding of spatial enhancement layers.
- the motion prediction data of a base-layer representation is transmitted by AVC using a subset of the existing B-slice syntax.
- two additional macroblock modes are preferably introduced.
- the first macroblock mode is "Base_Layer_Mode" and the second mode is "Qpel_Refinement_Mode".
- two flags, namely the BLFlag and the QrefFlag, are added to the macroblock layer syntax before the syntax element mb_mode, as shown in FIG.
- the first flag BLFlag 1098 signals the base-layer mode, while the other flag 1100 signals the Qpel refinement mode. If such a flag is set, it has the value 1, and the data stream is as shown in Fig. 6a. Thus, if the flag 1098 has the value 1, the flag 1100 and the macroblock mode 1102 syntax element play no further role.
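The flag ordering described above can be sketched as a small parser. It assumes that the flags have already been entropy-decoded into a list of 0/1 values and that a syntax element which "plays no further role" is simply not read; both points, as well as the dictionary layout, are assumptions and not taken from Fig. 6a.

```python
def parse_mb_header(flags):
    """Read BLFlag, then QrefFlag, then decide whether mb_mode follows."""
    it = iter(flags)
    header = {"BLFlag": next(it), "QrefFlag": 0, "mb_mode_follows": False}
    if header["BLFlag"] == 1:
        return header                    # base-layer mode: nothing else is needed
    header["QrefFlag"] = next(it)
    if header["QrefFlag"] == 1:
        return header                    # Qpel refinement mode: mb_mode is not needed
    header["mb_mode_follows"] = True     # neither flag set: regular mb_mode signaling
    return header
```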
- the base layer mode is used and no further information is used for the corresponding macroblock.
- This macroblock mode indicates that the motion prediction information, including the macroblock partitioning of the corresponding macroblock of the base layer, is used directly for the extension layer.
- the term "base layer" is intended to denote the next lower layer with respect to the layer currently being considered, i.e. the extension layer. If the base layer represents a layer with half the spatial resolution, the motion vector field, i.e. the field of motion vectors including the macroblock partitioning, is scaled accordingly, as shown in Fig. 6b. In this case the current macroblock covers the same region as an 8x8 sub-macroblock of the base frame.
- the same reference indices are used as for the corresponding macroblock / sub-macroblock partitionings of the base layer block.
- the associated motion vectors are multiplied by a factor of 2. This factor applies to the situation shown in FIG. 6b, in which a base layer 1102 comprises half the area or number in pixels as the enhancement layer 1104. If the ratio of the spatial resolution of the base layer to the spatial resolution of the enhancement layer is not equal to 1 / 2, corresponding scaling factors are used for the motion vectors.
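A minimal sketch of this scaling for one base-layer partition is given below. The dictionary layout (`x`, `y`, `w`, `h`, `mv`, `ref_idx`) is a hypothetical representation chosen for the example; the factor of 2 corresponds to the half-resolution case and would have to be replaced for other resolution ratios.

```python
def upscale_partition(part, factor=2):
    """Scale a base-layer partition and its motion vector to the enhancement layer."""
    return {
        "x": part["x"] * factor,
        "y": part["y"] * factor,
        "w": part["w"] * factor,
        "h": part["h"] * factor,
        "mv": (part["mv"][0] * factor, part["mv"][1] * factor),
        "ref_idx": part["ref_idx"],      # reference index is taken over unchanged
    }
```

An 8x8 sub-macroblock of the base layer thus becomes a 16x16 macroblock of the enhancement layer, with the same reference index and a motion vector of twice the length.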
- the macroblock mode Qpel_Refinement_Mode is signaled.
- the flag 1100 is preferably present only if the base layer represents a layer with half the spatial resolution of the current layer. Otherwise, the macroblock mode (Qpel_Refinement_Mode) is not included in the set of possible macroblock modes.
- This macroblock mode is similar to the base layer mode.
- the macroblock partitioning as well as the reference indices and the motion vectors are derived as in the base layer mode. However, for each motion vector there is an additional quarter-sample motion vector refinement of -1, 0 or +1 for each motion vector component, which is additionally transmitted and added to the derived motion vector.
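The derivation in the refinement mode can be sketched as follows, assuming that motion vectors are stored as integer pairs in quarter-sample units and that the base layer has half the resolution (factor 2). The function name and the error handling are illustrative.

```python
def qpel_refine(base_mv, refinement, factor=2):
    """Scale the base-layer vector and add a per-component refinement of -1, 0 or +1."""
    if any(r not in (-1, 0, 1) for r in refinement):
        raise ValueError("refinement components must be -1, 0 or +1")
    return tuple(factor * c + r for c, r in zip(base_mv, refinement))
```

With a base-layer vector of (3, -1) and a transmitted refinement of (+1, -1), the enhancement-layer vector becomes (7, -3).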
- the macroblock mode and the corresponding reference indices and motion vector differences are specified as usual. This means that the complete set of movement data for the extension layer is transmitted in the same way as for the base layer. According to the invention, however, it is also possible here to use the base-layer motion vector as a predictor for the current extension-layer motion vector (instead of the spatial motion vector predictor). So should the list X
- the base layer macroblock comprising the current macroblock / sub-macroblock partitions is not encoded in an INTRA macroblock mode
- the base-layer macroblock / sub-macroblock partitioning covering the upper left sample of the present macroblock / sub-macroblock partitioning uses the List X or a bi-prediction;
- the list X reference index of the base-layer macroblock / sub-macroblock partitioning that covers the upper left sample of the current macroblock / sub-macroblock partitioning is equal to the list X reference index of the current macroblock / sub-macroblock partitioning.
- the flags 1098, 1100 and 1106 thus together represent a possibility of implementing the movement data flag 1048 shown generally in FIG. 1a or, in general, a movement data control signal 1048.
- different other possibilities of signaling exist for this purpose, although, of course, a fixed agreement between the transmitter and the receiver can also be used which permits a reduction in signaling information.
- the expansion motion compensator 1014 has to do two things in principle. First, it has to calculate the extension movement data, that is to say typically the entire motion vectors, and supply them to the expansion movement predictor 1016, so that the latter can use these vectors in uncoded form to obtain the extension sequence of residual error images, which in the state of the art is typically performed adaptively and block by block. Another matter, however, is the expansion-motion data processing, i.e. how the motion data used for motion-compensated prediction are compressed as much as possible and written into a bit stream. For something to be written into the bit stream, corresponding data must be provided to the expansion image coder 1028, as illustrated with reference to Figure 1e.
- the expansion movement data processing means 1014b thus has the task of reducing as much as possible the redundancy with respect to the base layer contained in the expansion movement data which the expansion movement data calculation means 1014a has detected.
- the basic movement data or the up-sampled basic movement data can either be used by the extension movement data calculation device 1014a to calculate the extension movement data actually to be used, or can be used only for extension movement data processing, i.e. for extension motion data compression, while playing no role in the calculation of the expansion movement data.
- the two possibilities 1.) and 2.) of FIG. 1g show exemplary embodiments in which the base movement data or the up-sampled base movement data are already used in the expansion movement data calculation, while embodiment 3.) of FIG. 1b shows a case in which information about the basic motion data is not used to calculate the expansion motion data but only for coding or for obtaining residual data.
- FIG. 5F shows the decoder-side implementation of the expansion motion data determination means 1078 having a block-by-block control module 1078a containing the signaling information from the bitstream and expansion-picture decoder 1066, respectively.
- the extension motion data determiner 1078 further includes an extension motion data reconstruction device 1078b which, either using the decoded base motion data or decoded up-sampled base motion data alone, or by combining information about the decoded base motion data with residual data extracted from the enhancement scaling layer 1004 by the enhancement-picture decoder 1066, actually determines the motion vectors of the enhancement motion data field, which can then be used by the enhancement-motion combiner 1076, which may be a conventional combiner, in order to undo the encoder-side motion-compensated prediction.
- the BLFlag 1098 signals a complete transfer of the scaled-up basic movement data for the expansion-movement prediction.
- the device 1014a is designed to take over the basic movement data completely or, in the case of different resolutions of the different layers, to take over the basic movement data in an upscaled form and transmit them to the device 1016.
- the augmentation image coder does not communicate information about motion vectors or motion vectors. Instead, a separate flag 1098 is transmitted only for each block, be it a macroblock or a sub-macroblock.
- the basic motion vector is integrated into the expansion motion data calculation performed by the device 1014a.
- the motion data computation, i.e. the computation of the motion vectors m, takes place by seeking the minimum of the expression (D + λR).
- Distortion term D is the difference between a block of a current image B and a block of a preceding and / or later image shifted by a certain potential motion vector.
- the quantization parameter of the expansion-picture coder, which is denoted by 1036 in FIG. 1a, enters into the factor λ.
- the expression R provides information about the number of bits used to encode a potential motion vector.
- a search is now made among different potential motion vectors, with the distortion term D and the rate term R calculated for each new motion vector, and with the expansion quantization parameter 1036 preferably being fixed, although it may also vary.
- the sum term described is evaluated for different potential motion vectors, after which the motion vector is taken, which yields the minimum result of the sum.
- the base motion vector of the corresponding block from the base layer is now also integrated into this iterative search. If it fulfills the search criterion, then again only the flag 1100 must be transmitted, and no residual values or anything else must be transmitted for this block.
- the device 1014a uses this base motion vector and transmits it to the device 1016. However, only the flag 1100 is transmitted to the extension image coder.
- the components of the motion vector are independently increased or decreased by one increment, or left unchanged.
- This increment may represent a particular granularity of a motion vector, e.g. a full-resolution step, a half-resolution step, or a quarter-resolution step. If such a modified basic motion vector fulfills the search criterion, then in addition to the flag 1100, the change, that is to say the increment, i.e. +1, 0 or -1, is transmitted, so to speak, as "residual data".
- a decoder, activated by the flag 1100, will then search for the increment in the data stream, recover the base motion vector or the upsampled base motion vector, and combine the increment with the corresponding base motion vector in block 1078b in order to obtain the motion vector for the corresponding block in the enhancement layer.
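The encoder-side choice of the increment and the decoder-side recombination can be sketched as follows. The sketch only looks at the nine vectors obtained by changing each component of the (upsampled) base motion vector by -1, 0 or +1, and takes the cost function as a parameter; in the text the modified base vector competes with all other candidates of the search, so restricting the search to nine candidates is a simplification. All names are illustrative.

```python
from itertools import product


def best_increment(base_mv, cost):
    """Return the increment (dx, dy) in {-1, 0, 1}^2 whose modified vector has minimal cost."""
    return min(product((-1, 0, 1), repeat=2),
               key=lambda inc: cost((base_mv[0] + inc[0], base_mv[1] + inc[1])))


def reconstruct_mv(base_mv, increment):
    """Decoder side: combine the base motion vector with the transmitted increment."""
    return (base_mv[0] + increment[0], base_mv[1] + increment[1])
```

Only the flag and the two small increment values need to be transmitted in this case, since the decoder already has the base motion vector.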
- the determination of the motion vectors can in principle be arbitrary.
- the device 1014a may determine the expansion motion data, e.g. in accordance with the minimization task mentioned in connection with the preceding embodiment.
- the determined motion vector is then used for the encoder-side motion-compensated prediction, without taking into account information from the base layer.
- the enhancement motion data processing means 1014a is, however, designed in this case to involve the basic motion vectors in the motion vector processing for redundancy reduction, that is, before the actual arithmetic coding.
- a transmission of motion vector differences is undertaken, differences between nearby blocks within a picture being determined.
- the difference between different nearby blocks may be formed to select the smallest difference.
- the basic motion vector for the corresponding block is now included in an image. If it meets the criterion that it supplies the smallest residual error value as the predictor, this is signaled by flag 1106 and only the residual error value is transferred to block 1028. If the base motion vector does not satisfy this criterion, flag 1106 is not set and a spatial motion vector difference computation is made.
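The predictor choice can be sketched as follows. The size of the motion vector difference is measured here by the sum of the absolute components as a stand-in for the actual coding cost, and ties are resolved in favour of the spatial predictor; both choices are assumptions made for the example.

```python
def choose_predictor(mv, spatial_pred, base_pred):
    """Return (flag, difference): flag is 1 if the base-layer vector is used as predictor."""
    def diff(pred):
        return (mv[0] - pred[0], mv[1] - pred[1])

    def size(d):
        return abs(d[0]) + abs(d[1])     # assumed proxy for the bits needed

    d_base, d_spatial = diff(base_pred), diff(spatial_pred)
    if size(d_base) < size(d_spatial):
        return 1, d_base                 # flag set: base-layer predictor
    return 0, d_spatial                  # flag not set: spatial predictor
```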
- the basic motion vector or an upsampled version of the same can be used as the predictor either always or for adaptively determined blocks.
- an interlayer prediction of residual data is also performed. This will be explained below. If the motion information is changed from one layer to the next, it may be convenient or not convenient to predict residual information or, in the case of MCTF decomposition, high-pass information of the enhancement layer from the base layer. If the motion vectors for a block of the current layer are similar to the motion vectors of the corresponding base layer or macroblockwise corresponding motion vectors of the corresponding base layer, it is likely that the coding efficiency can be increased if the co
- the base-layer residual signal (high-pass signal) is used as a prediction for the extension residual signal (extension high-pass signal), whereby only the difference between the extension residual signal and the base-layer reconstruction (line 1024 of FIG. 1a) is coded.
- This adaptive approach, that is, whether the interlayer predictor 1018 is active or not, may be implemented by actually computing the benefit based on the difference signal, or may be based on an estimate of how different a motion vector of a base scaling layer for a macroblock is from that of a corresponding macroblock in the extension scaling layer. If the difference is smaller than a specific threshold value, then the inter-layer predictor is activated via the control line 1030. On the other hand, if the difference is greater than a specific threshold value, then the inter-layer predictor for this macroblock is deactivated.
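The threshold variant of this decision can be sketched as follows. The difference measure (sum of absolute component differences against the upsampled base-layer vector) is an assumption, as is the choice to leave the predictor off when the difference exactly equals the threshold, a case the text leaves open.

```python
def residual_prediction_active(enh_mv, upsampled_base_mv, threshold):
    """Activate inter-layer residual prediction only for similar motion vectors."""
    difference = (abs(enh_mv[0] - upsampled_base_mv[0])
                  + abs(enh_mv[1] - upsampled_base_mv[1]))
    return difference < threshold
```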
- the residual signal is upsampled using an interpolation filter before the upsampled residual signal of the base layer is used as the prediction signal.
- This filter is an interpolation filter with six taps, so that values from the neighborhood are used to interpolate a value at the high spatial resolution of the enhancement layer which was not present in the base layer due to its low resolution, in order to obtain the best possible interpolation result.
- if the interpolation filter would therefore use values of another transform block for interpolation, it is preferable not to do exactly that, but to synthesize the values of the interpolation filter outside the considered block, so that the interpolation takes place with as few artifacts as possible.
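A one-dimensional sketch of such an upsampling by a factor of 2 is given below. The text only states that the filter has six taps; the coefficients (1, -5, 20, 20, -5, 1)/32 used here are those of the H.264/AVC half-sample luma filter and are an assumption, as is the choice to synthesize the missing values outside the block by repeating its border samples.

```python
TAPS = (1, -5, 20, 20, -5, 1)   # assumed coefficients, sum is 32


def upsample_row(row):
    """Double the resolution of one row of residual values of a single block."""
    n = len(row)

    def sample(i):
        # values outside the block are synthesized by repeating the border sample
        return row[min(max(i, 0), n - 1)]

    out = []
    for i in range(n):
        out.append(row[i])                                   # existing sample is kept
        acc = sum(t * sample(i - 2 + k) for k, t in enumerate(TAPS))
        out.append((acc + 16) >> 5)                          # round and divide by 32
    return out
```

Because the filter never reads samples of a neighbouring transform block, a constant block stays constant after upsampling regardless of what the adjacent blocks contain.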
- the method according to the invention can be implemented in hardware or in software.
- the implementation can take place on a digital storage medium, in particular a floppy disk or CD with electronically readable control signals, which can cooperate with a programmable computer system such that the method is carried out.
- the invention thus also exists in a computer program product with a program code stored on a machine-readable carrier for carrying out the method according to the invention, when the computer program product runs on a computer.
- the invention can thus be realized as a computer program with a program code for carrying out the method when the computer program runs on a computer.
- the present invention further relates to a computer-readable medium on which a scalable data stream with a first scaling layer and a second scaling layer is stored together with the associated control characters for the various decoder-side devices.
- the computer-readable medium may be a data carrier, or the Internet, on which a data stream is transmitted from a provider to a recipient.
Priority Applications (5)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
MX2007004408A MX2007004408A (es) | 2004-10-15 | 2005-09-21 | Metodo y aparato para generar una secuencia de video codificada y para descodificar una secuencia de video codificada al usar prediccion de valor residual de capa intermedia. |
EP05784915A EP1800488A1 (de) | 2004-10-15 | 2005-09-21 | Vorrichtung und verfahren zum erzeugen einer codierten videosequenz und zum decodieren einer codierten videosequenz unter verwendung einer zwischen-schicht-restwerte-praediktion |
JP2007536022A JP5122288B2 (ja) | 2004-10-15 | 2005-09-21 | 中間レイヤ残余値予測を用いて符号化されたビデオシーケンスを生成および符号化されたビデオシーケンスを復号化するための装置および方法 |
BRPI0516348A BRPI0516348B1 (pt) | 2004-10-15 | 2005-09-21 | equipamento e método para a geração de uma seqüência de vídeo codificado e para decodificação de uma seqüência de vídeo codificado usando uma predição de valor residual de camada intermediária |
BR122018016188A BR122018016188B1 (pt) | 2004-10-15 | 2005-09-21 | equipamento e método para a geração de uma seqüência de vídeo codificado e para decodificação de uma seqüência de vídeo codificado usando uma predição de valor residual de camada intermediária |
Applications Claiming Priority (4)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US61945704P | 2004-10-15 | 2004-10-15 | |
US60/619,457 | 2004-10-15 | ||
DE102004059978.5 | 2004-12-13 | ||
DE102004059978A DE102004059978B4 (de) | 2004-10-15 | 2004-12-13 | Vorrichtung und Verfahren zum Erzeugen einer codierten Videosequenz und zum Decodieren einer codierten Videosequenz unter Verwendung einer Zwischen-Schicht-Restwerte-Prädiktion sowie ein Computerprogramm und ein computerlesbares Medium |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2006042612A1 true WO2006042612A1 (de) | 2006-04-27 |
Family
ID=35431439
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/EP2005/010227 WO2006042612A1 (de) | 2004-10-15 | 2005-09-21 | Vorrichtung und verfahren zum erzeugen einer codierten videosequenz und zum decodieren einer codierten videosequenz unter verwendung einer zwischen-schicht-restwerte-praediktion |
Country Status (3)
Country | Link |
---|---|
EP (1) | EP1800488A1 (de) |
JP (1) | JP5122288B2 (de) |
WO (1) | WO2006042612A1 (de) |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2006075240A1 (en) * | 2005-01-12 | 2006-07-20 | Nokia Corporation | Method and system for inter-layer prediction mode coding in scalable video coding |
WO2006129184A1 (en) * | 2005-06-03 | 2006-12-07 | Nokia Corporation | Residual prediction mode in scalable video coding |
WO2008138546A2 (de) * | 2007-05-16 | 2008-11-20 | Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. | Qualitätsskalierbares videosignal, verfahren zu dessen erzeugung, codierer und decodierer |
US10764604B2 (en) | 2011-09-22 | 2020-09-01 | Sun Patent Trust | Moving picture encoding method, moving picture encoding apparatus, moving picture decoding method, and moving picture decoding apparatus |
Families Citing this family (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
EP2524515B1 (de) | 2010-01-11 | 2018-05-30 | Telefonaktiebolaget LM Ericsson (publ) | Verfahren zur videoqualitätsschätzung |
RU2621621C2 (ru) * | 2012-07-18 | 2017-06-06 | Сони Корпорейшн | Способ и устройство обработки изображения |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
EP0753970A2 (de) * | 1995-07-14 | 1997-01-15 | Sharp Kabushiki Kaisha | Hierarchischer Bildkodierer und -dekodierer |
DE10121259A1 (de) * | 2001-01-08 | 2002-07-18 | Siemens Ag | Optimale SNR-skalierbare Videocodierung |
US20030165274A1 (en) * | 1997-07-08 | 2003-09-04 | Haskell Barin Geoffry | Generalized scalability for video coder based on video objects |
Family Cites Families (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
NL9200499A (nl) * | 1992-03-17 | 1993-10-18 | Nederland Ptt | Systeem omvattende ten minste een encoder voor het coderen van een digitaal signaal en ten minste een decoder voor het decoderen van een gecodeerd digitaal signaal, alsmede encoder en decoder voor toepassing in het systeem. |
JP3263807B2 (ja) * | 1996-09-09 | 2002-03-11 | ソニー株式会社 | 画像符号化装置および画像符号化方法 |
-
2005
- 2005-09-21 JP JP2007536022A patent/JP5122288B2/ja active Active
- 2005-09-21 EP EP05784915A patent/EP1800488A1/de not_active Withdrawn
- 2005-09-21 WO PCT/EP2005/010227 patent/WO2006042612A1/de active Application Filing
Non-Patent Citations (5)
Title |
---|
FENG WU ET AL: "DCT-prediction based progressive fine granularity scalable coding", IMAGE PROCESSING, 2000. PROCEEDINGS. 2000 INTERNATIONAL CONFERENCE ON SEPTEMBER 10-13, 2000, PISCATAWAY, NJ, USA,IEEE, vol. 3, 10 September 2000 (2000-09-10), pages 556 - 559, XP010529527, ISBN: 0-7803-6297-7 * |
LILIENFIELD G ET AL: "Scalable high-definition video coding", PROCEEDINGS OF THE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING. (ICIP). WASHINGTON, OCT. 23 - 26, 1995, LOS ALAMITOS, IEEE COMP. SOC. PRESS, US, vol. VOL. 3, 23 October 1995 (1995-10-23), pages 567 - 570, XP010197032, ISBN: 0-7803-3122-2 * |
SANGEUN HAN ET AL: "Robust and efficient scalable video coding with leaky prediction", PROCEEDINGS 2002 INTERNATIONAL CONFERENCE ON IMAGE PROCESSING. ICIP 2002. ROCHESTER, NY, SEPT. 22 - 25, 2002, INTERNATIONAL CONFERENCE ON IMAGE PROCESSING, NEW YORK, NY : IEEE, US, vol. VOL. 2 OF 3, 22 September 2002 (2002-09-22), pages 41 - 44, XP010607903, ISBN: 0-7803-7622-6 * |
SCHWARZ H; MARPE D; WIEGAND T: "SVC Core Experiment 2.1: Inter-layer prediction of motion and residual data", INTERNATIONAL ORGANISATION FOR STANDARDISATION ISO/IEC JTC 1/SC 29/WG 11 CODING OF MOVING PICTURES AND AUDIO, no. M11043, 23 July 2004 (2004-07-23), Redmond, Washington US, pages 1 - 6, XP002360488 * |
WOODS J W ET AL: "A RESOLUTION AND FRAME-RATE SCALABLE SUBBAND/WAVELET VIDEO CODER", IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, IEEE SERVICE CENTER, PISCATAWAY, NJ, US, vol. 11, no. 9, September 2001 (2001-09-01), pages 1035 - 1044, XP001082208, ISSN: 1051-8215 * |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2006075240A1 (en) * | 2005-01-12 | 2006-07-20 | Nokia Corporation | Method and system for inter-layer prediction mode coding in scalable video coding |
WO2006129184A1 (en) * | 2005-06-03 | 2006-12-07 | Nokia Corporation | Residual prediction mode in scalable video coding |
WO2008138546A2 (de) * | 2007-05-16 | 2008-11-20 | Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. | Qualitätsskalierbares videosignal, verfahren zu dessen erzeugung, codierer und decodierer |
WO2008138546A3 (de) * | 2007-05-16 | 2009-06-04 | Fraunhofer Ges Forschung | Qualitätsskalierbares videosignal, verfahren zu dessen erzeugung, codierer und decodierer |
US10764604B2 (en) | 2011-09-22 | 2020-09-01 | Sun Patent Trust | Moving picture encoding method, moving picture encoding apparatus, moving picture decoding method, and moving picture decoding apparatus |
Also Published As
Publication number | Publication date |
---|---|
JP2008517499A (ja) | 2008-05-22 |
JP5122288B2 (ja) | 2013-01-16 |
EP1800488A1 (de) | 2007-06-27 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
DE102004059993B4 (de) | | Apparatus and method for generating a coded video sequence using inter-layer motion data prediction, and computer program and computer-readable medium |
DE60031230T2 (de) | | Scalable video coding system and method |
DE60027955T2 (de) | | Method and apparatus for context-based inter/intra coding mode selection |
DE69633129T2 (de) | | Wavelet tree image coder with overlapping image blocks |
CN1926876B (zh) | | Method for coding and decoding an image sequence encoded with spatial and temporal scalability |
WO2006056531A1 (de) | | Method for transcoding and transcoding device |
DE60317670T2 (de) | | Method and apparatus for 3D subband video coding |
DE69915843T2 (de) | | Subband coding/decoding |
WO2006042612A1 (de) | | Apparatus and method for generating a coded video sequence and for decoding a coded video sequence using inter-layer residual value prediction |
EP1800490A1 (de) | | Apparatus and method for generating a coded video sequence using inter-layer motion data prediction |
DE10022520A1 (de) | | Method for spatially scalable moving image coding |
EP1737240A2 (de) | | Method for scalable image coding or decoding |
DE102004063902B4 (de) | | Computer program comprising a method for processing a group of images and a method for processing a base image and one or more enhancement images |
DE102004011422B4 (de) | | Apparatus and method for processing a group of images, and apparatus and method for processing a base image and one or more enhancement images |
DE102004011421B4 (de) | | Apparatus and method for generating a scaled data stream |
DE10340407A1 (de) | | Apparatus and method for coding a group of successive images, and apparatus and method for decoding a coded image signal |
DE10121259C2 (de) | | Optimal SNR-scalable video coding |
Sawada et al. | | Subband-based scalable coding schemes with motion-compensated prediction |
Shahid et al. | | An adaptive scan of high frequency subbands for dyadic intra frame in MPEG4-AVC/H.264 scalable video coding |
EP1157557A1 (de) | | Method and arrangement for transforming an image region |
Ganguly et al. | | Fast mode decision algorithm for intra only scalable video coding using combined subband/DCT coding |
WO2008006806A2 (de) | | Scalable video coding |
DE10243568A1 (de) | | Method for scalable video coding of a video image signal, and an associated codec |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| | AK | Designated states | Kind code of ref document: A1; Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BW BY BZ CA CH CN CO CR CU CZ DE DK DM DZ EC EE EG ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KM KP KR KZ LC LK LR LS LT LU LV LY MA MD MG MK MN MW MX MZ NA NG NI NO NZ OM PG PH PL PT RO RU SC SD SE SG SK SL SM SY TJ TM TN TR TT TZ UA UG US UZ VC VN YU ZA ZM ZW |
| | AL | Designated countries for regional patents | Kind code of ref document: A1; Designated state(s): BW GH GM KE LS MW MZ NA SD SL SZ TZ UG ZM ZW AM AZ BY KG KZ MD RU TJ TM AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HU IE IS IT LT LU LV MC NL PL PT RO SE SI SK TR BF BJ CF CG CI CM GA GN GQ GW ML MR NE SN TD TG |
| | DPE1 | Request for preliminary examination filed after expiration of 19th month from priority date (PCT application filed from 20040101) | |
| | 121 | EP: the EPO has been informed by WIPO that EP was designated in this application | |
| | WWE | WIPO information: entry into national phase | Ref document number: 2005784915; Country of ref document: EP |
| | WWE | WIPO information: entry into national phase | Ref document number: 1265/KOLNP/2007; Country of ref document: IN |
| | WWE | WIPO information: entry into national phase | Ref document number: MX/a/2007/004408; Country of ref document: MX |
| | WWE | WIPO information: entry into national phase | Ref document number: 2007536022; Country of ref document: JP; Ref document number: 200580035281.3; Country of ref document: CN |
| | NENP | Non-entry into the national phase | Ref country code: DE |
| | WWP | WIPO information: published in national office | Ref document number: 2005784915; Country of ref document: EP |
| | ENP | Entry into the national phase | Ref document number: PI0516348; Country of ref document: BR |