WO2007042063A1 - Video codec supporting quality scalability - Google Patents

Video codec supporting quality scalability

Info

Publication number
WO2007042063A1
Authority
WO
WIPO (PCT)
Prior art keywords
information
refinement
motion
quality level
motion information
Prior art date
Application number
PCT/EP2005/010972
Other languages
English (en)
Inventor
Heiko Schwarz
Thomas Wiegand
Detlev Marpe
Martin Winken
Original Assignee
Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V.
Priority date
Filing date
Publication date
Application filed by Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. filed Critical Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V.
Priority to PCT/EP2005/010972
Publication of WO2007042063A1

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/102 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
    • H04N19/13 Adaptive entropy coding, e.g. adaptive variable length coding [AVLC] or context adaptive binary arithmetic coding [CABAC]
    • H04N19/30 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using hierarchical techniques, e.g. scalability
    • H04N19/33 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using hierarchical techniques, e.g. scalability in the spatial domain
    • H04N19/46 Embedding additional information in the video signal during the compression process
    • H04N19/60 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using transform coding
    • H04N19/61 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using transform coding in combination with predictive coding
    • H04N19/615 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using transform coding in combination with predictive coding using motion compensated temporal filtering [MCTF]
    • H04N19/63 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using transform coding using sub-band based transform, e.g. wavelets
    • H04N19/70 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals characterised by syntax aspects related to video coding, e.g. related to compression standards

Definitions

  • The present invention relates to a video codec supporting quality or SNR scalability.
  • JVT Joint Video Team
  • MPEG Moving Pictures Experts Group
  • VCEG ITU-T Video Coding Experts Group
  • The scalable extension of H.264/MPEG4-AVC, as described in J. Reichel, H. Schwarz, and M. Wien, eds., "Joint Scalable Video Model JSVM-3," Joint Video Team, Doc. JVT-P202, Poznan, Poland, July 2005, supports temporal, spatial, and SNR scalable coding of video sequences or any combination thereof.
  • H.264/MPEG4-AVC, as described in ITU-T Rec. H.264 & ISO/IEC 14496-10 AVC, "Advanced Video Coding for Generic Audiovisual Services," version 3, 2005, specifies a hybrid video codec in which macroblock prediction signals are either generated by motion-compensated prediction or intra-prediction, and both predictions are followed by residual coding.
  • H.264/MPEG4-AVC coding without the scalability extension is referred to as single-layer H.264/MPEG4-AVC coding.
  • Rate-distortion performance comparable to single-layer H.264/MPEG4-AVC means that the same visual reproduction quality is typically achieved at about 10% higher bit-rate.
  • Scalability is considered as a functionality for removal of parts of the bit-stream while achieving an R-D performance at any supported spatial, temporal, or SNR resolution that is comparable to single-layer H.264/MPEG4-AVC coding at that particular resolution.
  • the basic design of the scalable video coding (SVC) can be classified as layered video codec. In each layer, the basic concepts of motion-compensated prediction and intra prediction are employed as in H.264/MPEG4-AVC. However, additional inter-layer prediction mechanisms have been integrated in order to exploit the redundancy between several spatial or SNR layers.
  • SNR scalability is basically achieved by residual quantization, while for spatial scalability, a combination of motion-compensated prediction and oversampled pyramid decomposition is employed. The temporal scalability approach of H.264/MPEG4-AVC is maintained.
  • Fig. 12 shows a typical coder structure 900 with two spatial layers 902a, 902b.
  • an independent hierarchical motion-compensated prediction structure 904a, b with layer- specific motion parameters 906a b is employed.
  • the redundancy between consecutive layers 902a, b is exploited by inter-layer prediction concepts 908 that include prediction mechanisms for motion parameters 906a, b as well as texture data 910a, b.
  • a base representation 912a, b of the input pictures 914a, b of each layer 902a, b is obtained by transform coding 916a, b similar to that of H.264/MPEG4-AVC, the corresponding NAL units (NAL - Network Abstraction Layer) contain motion information and texture data; the NAL units of the base representation of the lowest layer, i.e. 912a, are compatible with single-layer H.264/MPEG4-AVC.
  • the reconstruction quality of the base representations can be improved by an additional coding 918a, b of so-called progressive refinement slices; the corresponding NAL units can be arbitrarily truncated in order to support fine granular quality scalability (FGS) or flexible bit-rate adaptation.
  • FGS fine granular quality scalability
  • The resulting bit-streams output by the base layer coding 916a, b and the progressive SNR refinement texture coding 918a, b of the respective layers 902a, b are multiplexed by a multiplexer 920 in order to result in the scalable bit-stream 922.
  • This bit-stream 922 is scaleable in time, space and SNR quality.
  • One disadvantage of the above-described scalable extension of the video coding standard H.264/MPEG4-AVC is that the distortion/rate performance of the refinement layers, defined by the base layer bit-stream 912a or 912b plus the respective refinement layer bit-streams output by blocks 918a, b up to a specific respective refinement layer, is not optimal for all rate/distortion operating points.
  • the basic idea underlying the present invention is that refinement layers showing an improved rate/distortion performance may be achieved by accompanying residual information refinement information with refined motion information.
  • At low bit-rates, coarse motion information involving fewer bits for representing the motion information in the bit-stream leads to good rate/distortion performance.
  • At higher bit-rates, however, merely refining the encoding of the texture information relative to the coarse motion information does not lead to optimal rate/distortion performance for representing the respective refinement layer. Rather, more bits should be spent on the motion information, in order to refine the motion information and achieve a better rate/distortion performance at higher or maximum bit-rates.
  • a significantly improved coding efficiency is achieved.
  • Fig. 1 a block diagram of a video encoder according to an embodiment of the present invention
  • Fig. 2 a schematic illustrating the hierarchical prediction structure that may be used in the encoder of Fig. 1;
  • Fig. 3 a schematic illustrating the function of the key pictures as re-synchronization points between encoder and decoder
  • Fig. 4 a graph illustrating an example for the coding efficiency of SNR scalable coding strategies
  • Fig. 5 a graph showing a comparison of the FGS concept of Fig. 1 with the FGS concept of Fig. 12 for an example scene;
  • Fig. 6 a graph showing a comparison of the FGS concept of Fig. 1 with the FGS concept of Fig. 12 for another scene;
  • Fig. 7 a schematic illustrating the SNR refinement coding according to Fig. 12;
  • FIG. 8 a schematic illustrating the refinement coding used in Fig. 1 according to an embodiment of the present application
  • Fig. 9 a schematic illustrating another embodiment for the refinement coding used in Fig. 1;
  • FIG. 10a and 10b schematics showing embodiments for possible refinements of motion information
  • Fig. 11 a flow chart illustrating the most pertinent steps performed in a decoder for decoding the bit-stream of the encoder of Fig. 1 according to an embodiment of the present application.
  • Fig. 12 a conventional coder structure for scalable video coding.
  • Fig. 1 shows an embodiment of a video encoder of the present application.
  • the video encoder of Fig. 1 supports two spatial layers.
  • the encoder of Fig. 1 which is generally indicated by 100 comprises two layer portions or layers 102a and 102b, among which layer 102b is dedicated for generating that part of the desired scalable bit-stream concerning a coarser spatial resolution, while the other layer 102a is dedicated for supplementing the bit- stream output by layer 102b with information concerning a higher resolution representation of an input video signal 104.
  • The video signal 104 to be encoded by encoder 100 is input directly into layer 102a, whereas encoder 100 comprises a spatial decimator 106 for spatially decimating the video signal 104 before inputting the resulting spatially decimated video signal 108 into layer 102b.
  • The decimation performed in the spatial decimator 106 comprises, for example, decimating the number of pixels of each picture 104a of the original video signal 104 by a factor of 4 by means of discarding every second pixel in column and row direction.
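The decimation step just described can be sketched as follows (a minimal illustration in Python; the function name and the list-of-rows picture representation are illustrative, not part of the patent):

```python
def spatially_decimate(picture):
    # Keep every second pixel in row and column direction, as described
    # for the spatial decimator 106: the pixel count drops by a factor of 4.
    return [row[::2] for row in picture[::2]]

# A 4x4 test picture shrinks to a 2x2 picture.
pic = [[10 * r + c for c in range(4)] for r in range(4)]
low = spatially_decimate(pic)
```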
  • the low-resolution layer 102b comprises a motion-compensated prediction block 110b, a base layer coding block 112b and a refinement coding block 114b.
  • the prediction block 110b performs a motion-compensated prediction on pictures 108a of the decimated video signal 108 in order to predict pictures 108a of the decimated video signal 108 from other reference pictures 108a of the decimated video signal 108.
  • The prediction block 110b generates, for a specific picture 108a, motion information that indicates how this picture may be predicted from other pictures of the video signal 108, i.e., from reference pictures.
  • the motion information may comprise pairs of motion vectors and associated reference picture indices, each pair indicating, for example, how a specific part or macroblock of the current picture is predicted from an indexed reference picture by displacing the respective reference picture by the respective motion vector.
  • Each macroblock may be assigned one or more pairs of motion vectors and reference picture indices.
  • some of the macroblocks of a picture may be intra predicted, i.e., predicted by use of the information of the current picture.
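The motion information described above can be sketched as a simple data structure (hypothetical names; the patent only specifies per-macroblock pairs of motion vector and reference picture index, and displacement-based prediction):

```python
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class MacroblockMotion:
    # Each entry: ((dx, dy) motion vector, reference picture index).
    pairs: List[Tuple[Tuple[int, int], int]] = field(default_factory=list)
    intra: bool = False  # intra-coded macroblocks use no reference picture

def predict_pixel(reference_pictures, pair, x, y):
    # Displace the indexed reference picture by the motion vector and
    # read out the predictor for position (x, y).
    (dx, dy), ref_idx = pair
    return reference_pictures[ref_idx][y + dy][x + dx]

# One reference picture; a macroblock predicted with vector (1, 0).
refs = [[[0, 1, 2], [3, 4, 5], [6, 7, 8]]]
mb = MacroblockMotion(pairs=[((1, 0), 0)])
```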
  • the prediction block 110b may perform a hierarchical motion- compensated prediction on the decimated video signal 108.
  • the prediction block 110b outputs the motion information 116b as well as the prediction residuals or the video texture information 118b representing the differences between the predictors and the actual decimated pictures 108a.
  • The determination of the motion information 116b and texture information 118b in prediction block 110b is performed such that the resulting encoding of this information by means of the subsequent base layer coding 112b results in a base-representation bit-stream with, preferably, optimum rate/distortion performance.
  • The prediction block 110b also determines, in cooperation with the refinement coding block 114b, refined motion information along with corresponding refined residual texture information; this will be described in more detail below.
  • The base layer coding block 112b receives the first motion information 116b and texture information 118b from block 110b and encodes this information into a base-representation bit-stream 120b.
  • The encoding performed by block 112b comprises a transformation and a quantization of the texture information 118b.
  • The quantization used by block 112b is relatively coarse.
  • The refinement coding block 114b supplements the bit-stream 120b with additional bit-streams for various refinement layers containing information for refining the coarsely quantized transform coefficients representing the texture information in bit-stream 120b.
  • The refinement coding block 114b is not only able to refine the transform coefficients representing the texture information 118b relative to the base representation of bit-stream 120b or a lower refinement layer bit-stream 122b previously output by coding block 114b. Rather, as will be described in more detail below, refinement coding block 114b, in cooperation with the prediction block 110b, is further able to decide that a specific refinement layer bit-stream 122b should be accompanied by refined motion information 116b. However, this functionality will be described later, after the description of Fig. 6.
  • The refinement of the residual texture information relative to the base representation 120b or the formerly output lower refinement layer bit-stream 122b comprises, for example, the encoding of the current quantization error of the transform coefficients representing the texture information 118b with a finer quantization precision.
  • Both bit-streams 120b and 122b are multiplexed by a multiplexer 124 comprised by encoder 100 in order to insert both bit-streams into the final scaleable bit-stream 126 representing the output of encoder 100.
  • Layer 102a operates substantially the same as layer 102b. Accordingly, layer 102a comprises a motion-compensation prediction block 110a, a base layer coding block 112a, and a refinement coding block 114a. In conformity with layer 102b, the prediction block 110a receives the video signal 104 and performs a motion-compensated prediction thereon in order to obtain motion information 116a and texture information 118a. The firstly output motion and texture information 116a and 118a are received by coding block 112a, which encodes this information to obtain the base representation bit-stream 120a.
  • The refinement coding block 114a codes refinements of the quantization error manifesting itself in the base representation 120a by comparing a transform coefficient in bit-stream 120a with the actual transform coefficient resulting from the original texture information 118a and, accordingly, outputs refinement-layer bit-streams 122a for various refinement layers. Moreover, as mentioned above, for the various refinement layers, the refinement coding means 114a may use refinement motion information, which is also generated by prediction block 110a.
  • layer 102a involves inter-layer prediction. That is, as will be described in more detail below, the prediction block 110a uses information derivable from layer 102b, such as residual texture information, motion information or a reconstructed video signal as derived from one or more of the bit-streams 120b and 122b, in order to pre-predict the higher resolution pictures 104a of the video signal 104, thereafter performing the motion-compensated prediction on the pre-prediction residuals as mentioned above with respect to prediction block 110b relative the decimated video signal 108.
  • The mode of operation of encoder 100 is described in more detail below, firstly, however, neglecting the usage of refined motion information in the refinement coding process in blocks 114a and 114b.
  • the following detailed description of the encoder 100 represents an embodiment where this encoder is designed to represent a scaleable extension of the video coding standard H.264/MPEG4-AVC.
  • A hierarchical prediction structure as illustrated in Fig. 2 is employed in each layer 102a, b.
  • the first picture 202 of a video sequence 204 is coded as IDR (intra) picture; so-called key pictures 202, 206 are coded in regular intervals.
  • A key picture 202, 206 and all pictures that are temporally located between it and the previous key picture are considered to build a group of pictures 208 (GOP).
  • GOP group of pictures 208
  • the key pictures 202, 206 are either intra-coded or inter-coded by using previous key pictures as reference for motion-compensated prediction.
  • The remaining pictures of a GOP are hierarchically predicted as shown in Fig. 2. It is obvious that this hierarchical prediction structure provides temporal scalability; but it turned out that it also offers the possibility to efficiently integrate spatial and SNR scalability.
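The dyadic hierarchy of Fig. 2 can be sketched as follows (illustrative Python, not from the patent): each picture of a GOP is assigned a temporal level, and dropping all pictures above a given level halves the frame rate per dropped level, which is the temporal scalability referred to above.

```python
def temporal_levels(gop_size):
    # Key pictures (level 0) bound the GOP; every further level halves
    # the temporal distance, yielding the hierarchical structure of Fig. 2.
    levels = {0: 0, gop_size: 0}
    level, step = 1, gop_size // 2
    while step >= 1:
        for i in range(step, gop_size, 2 * step):
            levels[i] = level
        level, step = level + 1, step // 2
    return levels

lv = temporal_levels(8)  # GOP of 8 pictures plus the bounding key picture
```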
  • The hierarchical picture coding can be extended to motion-compensated temporal filtering (MCTF).
  • MCTF motion-compensated temporal filtering
  • Motion-compensated update operations using the prediction residuals (dashed arrows in Fig. 2) are introduced in addition to the motion-compensated prediction (continuous arrows in Fig. 2).
  • A detailed description of how H.264/MPEG4-AVC is extended towards MCTF can be found in J. Reichel, H. Schwarz, and M. Wien, eds., "Scalable Video Coding - Working Draft 3," Joint Video Team, Doc. JVT-P201, Poznan, Poland, July 2005, and J. Reichel, H. Schwarz, and M. Wien, eds., "Joint Scalable Video Model JSVM-3," Joint Video Team, Doc. JVT-P202, Poznan, Poland, July 2005.
  • Temporal scalability can thus be provided by using a hierarchical prediction structure similar to that depicted in Fig. 2. This can be achieved with single-layer H.264/MPEG4-AVC and does not require any changes of the standard. For spatial and SNR scalability, additional tools have to be added to single-layer H.264/MPEG4-AVC. All three scalability types can be combined in order to generate a bit-stream 126 that supports a large degree of combined scalability.
  • CGS coarse-grain scalability
  • FGS fine- granular scalability
  • An additional macroblock mode that utilizes motion information of the lower resolution layer has been introduced. If this macroblock mode is selected, the macroblock partitioning is copied from the co-located macroblock of the corresponding base layer. For the macroblock partitions and sub-macroblock partitions, the same reference picture indices and the same motion vectors as for the corresponding macroblock partition or sub-macroblock partition of the base macroblock are used. Neither reference indices nor motion vector differences are transmitted. Additionally, the design of SVC includes the possibility to use a motion vector of the lower layer as motion vector predictor for the conventional motion-compensated macroblock modes.
  • an additional flag is transmitted for each inter-coded macroblock, which signals the application of residual signal prediction from the lower resolution layer. If the flag is true, the base layer residual signal is used as prediction for the residual signal of the current layer, so that only the corresponding difference signal is coded.
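The residual-prediction flag described above can be sketched like this (a minimal illustration; the function name is hypothetical and coefficient blocks are modelled as flat lists):

```python
def enhancement_residual(current, base, residual_prediction_flag):
    # If the per-macroblock flag is true, the base layer residual serves
    # as prediction, so only the difference signal is coded in the
    # current layer; otherwise the residual is coded directly.
    if residual_prediction_flag:
        return [c - b for c, b in zip(current, base)]
    return list(current)

diff = enhancement_residual([7, -2, 0], [5, -2, 1], True)
```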
  • Inter-layer intra prediction Furthermore, an additional intra macroblock mode, in which the intra prediction signal is formed by the reconstruction signal of the lower layer, is introduced.
  • For this inter-layer intra prediction, it is generally required that the lower layer is completely decoded, including the computationally complex operations of motion-compensated prediction and deblocking.
  • This problem can be circumvented when the inter-layer intra prediction is restricted to those parts of the lower layer picture that are coded with intra macroblocks.
  • This restriction enables single motion compensation loop decoding and is mandatory in the current SVC Working Draft: J. Reichel, H. Schwarz, and M. Wien, eds., "Scalable Video Coding - Working Draft 3," Joint Video Team, Doc. JVT-P201, Poznan, Poland, July 2005, and J. Reichel, H. Schwarz, and M. Wien, eds., "Joint Scalable Video Model JSVM-3," Joint Video Team, Doc. JVT-P202, Poznan, Poland, July 2005.
  • Each NAL unit for a PR slice represents a refinement signal that corresponds to a bisection of the quantization step size (QP increase of 6).
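The correspondence between a QP change of 6 and a bisection of the quantization step size follows the H.264/MPEG4-AVC convention that the step size doubles for every increase of QP by 6; a sketch (the base step value is an approximation for illustration, not taken from the patent):

```python
def quant_step(qp, step_at_qp0=0.625):
    # H.264-style relation: the quantization step size doubles for every
    # increase of the quantization parameter QP by 6, so a PR slice coded
    # at QP - 6 halves (bisects) the step size of the layer below.
    return step_at_qp0 * 2 ** (qp / 6)
```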
  • These signals are represented/generated by blocks 114a, b in a way that only a single inverse transform has to be performed for each transform block at the decoder side.
  • the progressive refinement NAL units can be truncated at an arbitrary point, so that the quality of the SNR base layer can be improved in a fine granular way. Therefore, the coding order of transform coefficient levels has been modified.
  • The transform coefficient blocks are scanned in several passes, and in each pass only a few coding symbols for a transform coefficient block are coded.
  • the CABAC entropy coding as specified in H.264/MPEG4-AVC is re-used.
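The modified, truncation-friendly coding order can be sketched like this (illustrative only; real PR slices interleave several symbol types and entropy-code them with CABAC, which is not modelled here):

```python
def cyclic_scan(blocks):
    # Emit one not-yet-coded coefficient of every transform block per pass,
    # so that truncating the symbol stream at any point leaves all blocks
    # refined to roughly the same degree (fine granular scalability).
    order = []
    for pass_idx in range(max(len(b) for b in blocks)):
        for block_idx, block in enumerate(blocks):
            if pass_idx < len(block):
                order.append((block_idx, block[pass_idx]))
    return order

symbols = cyclic_scan([[5, 1], [3], [2, -1]])
```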
  • Fig. 3 demonstrates the amount of bit-stream data for five pictures 250a to 250e of a video sequence, the bit-stream data for each picture 250a to 250e in scalable data-stream 126 being divided into an SNR base layer part (lower half) and an SNR enhancement layer part (upper half).
  • The reference pictures 250a, 250e including PR slices are used for motion-compensated prediction (dashed arrows in Fig. 3).
  • The motion-compensated prediction signal for key pictures 250a, e is generated by using only the base layer representation (lower part) of the reference key pictures, without FGS enhancements (line-dot arrow).
  • The non-key pictures 250b, c, d are predicted by using the highest quality reference that is available (SNR enhancement layer), since the reconstruction quality is highly dependent on the quality of the reference pictures 250a, e, and the drift is limited by the hierarchical prediction structure.
  • In Fig. 4, a comparison of the coding efficiency of coarse-grain and fine-grain SNR scalable coding is illustrated for an example sequence.
  • the base layer has been always coded in compliance with H.264/MPEG4-AVC. Only the first picture of a sequence was intra-coded, and a GOP size of 16 pictures has been selected. No motion-compensated update steps have been employed. All encoder runs have been performed with a rate- distortion optimised encoder control as described in T. Wiegand, et. al, "Rate-Constrained Coder Control and Comparison of Video Coding Standards," IEEE Trans. CSVT, vol. 13, pp. 688-703, July 2003.
  • the difference between the quantization parameters of the lowest and highest SNR layers was set to 12, which approximately corresponds to a factor of 4 in bit-rate.
  • The dashed curve represents CGS runs with adaptive selection of the inter-layer prediction tools and quantization parameter differences of 6.
  • a corresponding CGS run for which all inter-layer prediction tools have always been used is represented by the continuous curve connecting the triangles.
  • For this run, the same motion parameters as in the base layer, optimised for the lowest rate point, have been chosen.
  • A comparison of these two curves shows that the coding efficiency for the CGS enhancement layers can always be improved when the inter-layer prediction tools are adaptively selected, i.e. especially when the motion parameters for the enhancement layer are adaptively refined, as described further below.
  • the continuous curve connecting the circles represents an FGS coding run that is comparable with the CGS run of the triangle fitted curve.
  • the QP difference between successive layers is equal to 6 and no motion refinements are used.
  • The rate for decoding the FGS bit-stream can be arbitrarily chosen inside the supported interval.
  • the difference between the points of the green curve and the FGS layer end points of the red curve, which are marked by black circles, results from the fact that when using the FGS functionality the key pictures are only predicted from the base representation of previous key pictures, while in the CGS run the highest quality reference is always used for motion-compensated prediction.
  • the coding efficiency at high bit-rates can be improved by more than 1 dB compared to a simple scheme, when an adaptive switching of the inter- layer prediction mechanisms is allowed.
  • In the simple scheme described here before, the motion data of all layers are identical and the reconstructed base layer signal is always used as predictor for the signal of the current layer.
  • The improved coding efficiency of the adaptive concept, which allows an adaptive switching of the inter-layer prediction techniques, is mainly a result of the possibility to refine the motion information of the base layer for the coding of the enhancement or refinement layers.
  • the trade-off between motion and texture data can be optimised for each coarse-grain SNR layer.
  • the current residual (or intra) signal is predicted by the reconstructed base layer residual (or intra) signal, and thus multiple inverse transforms for each block are required at the decoder side.
  • Whereas the coding symbols of the refinement pictures are otherwise coded in a common macroblock-by-macroblock scan, the coding symbols for progressive refinement slices are coded via so-called cyclic block coding.
  • The coding symbols are coded in several scans, where in each scan only a few coding symbols (e.g. only one significant transform coefficient level) are coded for each transform block. This modified coding order allows the truncation of the corresponding NAL units without generating disturbing coding artefacts.
  • The new approach of adaptive motion information refinement for SNR scalable video coding gives the video encoder 100 the choice to select a better trade-off, in the rate-distortion (RD) sense, between the bit rate for coding of the residual 118 and the motion data 116.
  • RD rate-distortion
  • Using the same motion information as the SNR base layer, and thus transmitting only a refinement of the residual data, is the first option.
  • The conventional coder of Fig. 12 supports only this mode for FGS.
  • This option or coding mode of coding block 114a, b is illustrated in Fig. 7.
  • Fig. 7 illustrates the process of refinement coding block 114a, b for the case that the coding block 114a, 114b decides for each refinement layer to reuse the motion information of the base representation.
  • Fig. 7 shows an exemplary part 300 of the video sequence input into prediction block 110a, b.
  • The motion information m0 created by prediction block 110a, b is illustrated by arrow 304.
  • The information entering base layer coding block 112a, b comprises the base-layer motion information m0 and the residual texture information residual0.
  • Residual information residual0, in combination with the predictor for picture 302 derived from motion information m0, represents the reconstruction of the base-representation information derivable from the encoder bit-stream 120a, b, wherein it is emphasized that "residual0" shall indicate the residual information 118a, b after passing the base layer coding 112a, b and the corresponding quantization.
  • The refinement coding block 114a, b decides to reuse the motion information m0 as input into the base layer coding 112a, b.
  • Coding block 114a, b merely generates additional information residual1 that refines the coarsely quantized residual information residual0 to a higher precision.
  • residual1 forms the first refinement layer, and a combination of predictor (m0), residual0 and residual1 represents the reconstruction of the first refinement layer.
  • For the second refinement layer, the coding block 114a, b generates additional information residual2 for finer quantizing the residual information as represented by residual1 and residual0. It is emphasized with respect to Fig. 7 that the quantization refinement performed with increasing refinement layer number is directly performed in the transform domain, not in the spatial domain, so that at the decoder side, as mentioned above, merely one re-transformation is necessary.
  • Further, in Fig. 7, m0 and residual0 are output by block 112a, b, while refinement coding block 114a, b consecutively outputs residual1, residual2, residual3, ...
  • Multiplexer 124 multiplexes them in the order of m0, residual0, residual1, residual2, residual3, ... into bit-stream 126, for example.
  • The transform coefficients may be arranged such that a sudden truncation of the resulting multiplexed scalable bit-stream 126 results in optimum coding efficiency.
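The transform-domain refinement of Fig. 7 can be sketched as follows (illustrative; coefficient blocks modelled as flat lists). Because the refinement layers add up in the transform domain, the decoder accumulates them first and then applies a single inverse transform to the sum:

```python
def sum_refinement_layers(layers):
    # layers[0] is residual0 (base representation), layers[1:] are
    # residual1, residual2, ...; they are accumulated coefficient-wise in
    # the transform domain, so only one inverse transform is needed for
    # each transform block at the decoder side.
    return [sum(layer[i] for layer in layers) for i in range(len(layers[0]))]

coeffs = sum_refinement_layers([[4, 0, -2], [1, 1, 0], [0, -1, 1]])
```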
  • The other option for block 114a, b is the transmission of new motion information together with a new residual.
  • In this case, the base-layer bit-stream 120a, b is formed by encoding the motion information m0 and residual0, as shown at 306.
  • Refinement coding block 114a, b decides to use new motion information m1 for the first refinement, as shown at 312. Different examples for refinement motion information are described below.
  • The decision to replace or refine the base-layer motion information m0 with motion information m1 (Fig. 8) is conducted by 114a, b on a macroblock basis, for example.
  • The decision may be performed such that the rate/distortion performance for encoding the reconstruction 314 of the first refinement layer is increased relative to Fig. 7, or even optimized.
  • The rate/distortion performance may be higher in case of reusing the motion information as shown in Fig. 7. In this case, the decision is negative for the option of Fig. 8 and positive for the option of Fig. 7.
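The per-macroblock decision just described can be sketched with a Lagrangian cost comparison (an assumption for illustration; the patent only states that the decision optimises rate/distortion performance, not that a Lagrangian of the form J = D + lambda * R is used):

```python
def choose_motion_option(d_reuse, r_reuse, d_refine, r_refine, lam):
    # Compare Fig. 7 (reuse base-layer motion m0) against Fig. 8
    # (transmit refined motion m1) per macroblock: pick the option with
    # the smaller Lagrangian cost J = D + lambda * R.
    j_reuse = d_reuse + lam * r_reuse
    j_refine = d_refine + lam * r_refine
    return "refine" if j_refine < j_reuse else "reuse"
```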
  • Concurrently, prediction block 110a, b generates corresponding residual texture information 118a, b. This is then encoded by refinement coding block 114a, b along with the new motion information m1 to yield the first refinement or enhancement layer bit-stream 122a, b, with the encoding of the residual information resulting in residual1, and with the quantization during encoding of the residual being performed with a smaller quantization step size than used for quantizing or encoding residual0 in base layer coding block 112. Similarly, as shown at 316, in building the second refinement layer, coding block 114 again uses new motion information m2. It is emphasized that this does not have to be the case. Rather, the coding block 114 could reuse the motion information m1, thereby deciding between the options of Fig. 7 and Fig. 8 on a refinement layer basis.
  • Fig. 9 represents an alternative embodiment for the case of Fig. 8, i.e. for the case that the coding block 114 decides to transmit new motion information together with new residual.
  • residuala is equal to a finer quantized version of a difference between the actual picture 302 and the sum of the predictive picture derived from the motion information ma plus the residual information of the quantization levels available, i.e. residual0 plus ... plus residuala-1.
  • the refinement coding block 114a, b disregards the residual information available so far and encodes a new residual to the new motion information m1, m2, ... by use of the quantization level associated with the current refinement layer.
  • coding block 114a, b combines the new motion information m1 with new residual information residual1 so that picture 302 of the first refinement layer is derivable from the first refinement layer bit-stream 122a, b independently of the base layer bit-stream 120a, b, i.e. without reuse of residual0 in bit-stream 120a, b.
  • refinement coding block 114a, b forms the second refinement layer bit-stream 122a, b by combining new motion information m2 with new residual information residual2 being the result of encoding the residual between the actual picture 302 and the predictive picture obtained from the new motion information m2.
  • the new residual data may or may not be predicted from the SNR subordinate layer.
  • the new motion information of a current SNR layer may or may not be coded independently of the motion information of the SNR subordinate layer.
  • the new motion and residual data can be predicted from the SNR subordinate layer to achieve a better RD-performance.
  • the motion information mi of a base layer or subordinate refinement layer may be refined by defining a new motion information mi+1 for the current refinement layer, by further partitioning the current picture or by increasing the resolution with which motion vectors are defined for the current picture.
  • the motion information of the subordinate layer comprises just one motion vector 400 with corresponding reference index referencing picture 402, for a specific partition 404 of the current picture.
  • the new motion information mi+1 comprises pairs of motion vectors 406a to 406d and corresponding reference indices, each associated with one of four sub-partitions 404a to 404d of the current partition 404.
  • the actual motion information refinement information coded into the current refinement layer bit-stream by coding block 114 may either define the motion vectors 406a to 406d anew, independently of the motion vector 400, or may define motion vectors 406a to 406d by merely coding the differences of these vectors to vector 400, respectively.
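The differential coding option just described — deriving the four sub-partition vectors from the base vector 400 plus transmitted differences — can be sketched in a few lines. The function name and the tuple representation of motion vectors are assumptions of this sketch, not the patent's syntax.

```python
def refine_partition_motion(base_mv, sub_mv_diffs):
    """Derive per-sub-partition motion vectors of the refinement layer
    from a single base-layer vector (400 in Fig. 10a): each refined
    vector is coded merely as a difference to the base vector, as
    described for vectors 406a to 406d."""
    return [(base_mv[0] + dx, base_mv[1] + dy) for dx, dy in sub_mv_diffs]
```

Coding only the differences keeps the refinement cheap when the sub-partition motion is close to the base-layer motion, which is the common case.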
  • Another possibility to refine motion information is illustrated in Fig. 10b.
  • the motion information is refined in that relative to the former motion information mi an additional motion hypothesis is specified in a new motion information mi+1 of the current refinement layer.
  • the motion information of the subordinate refinement layer or base-layer is illustrated to contain merely one motion vector 400 along with the corresponding reference index to the reference picture 402 for one partition 404.
  • the new motion information mi+1 contains for that partition 404 another motion vector 410 belonging to and originating from a picture 412 that is different from reference picture 402. Therefore, the number of motion hypotheses used for partition 404 in case of the new motion information is increased.
  • the new motion information mi+1 may incorporate information as to how partition 404 is to be predicted from both motion hypotheses, such as by averaging both predictions. However, conventions at encoder and decoder may be used in order to avoid the transmission of such information.
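The combination of the two hypotheses by averaging can be sketched as a weighted average of the two motion-compensated prediction blocks. This is an illustration only; the function name is assumed, and the default weight of 0.5 (plain averaging) stands in for a weight that, as the text notes, may either be transmitted or fixed by convention.

```python
import numpy as np

def two_hypothesis_prediction(pred_base, pred_extra, weight=0.5):
    """Combine the prediction obtained from the subordinate layer's
    motion hypothesis (vector 400 / picture 402) with the additional
    hypothesis of the refinement layer (vector 410 / picture 412)
    by weighted averaging of the two prediction blocks."""
    return weight * pred_base + (1.0 - weight) * pred_extra
```

With `weight=1.0` the additional hypothesis is effectively switched off and the subordinate layer's prediction is reproduced unchanged.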
  • the motion information may be refined by merely amending the motion vector.
  • the reason that a "wrong" motion vector may lead to a better R/D performance for a low refinement layer or for the base-layer is that the encoding of a larger motion vector leads to more bits being necessary in order to encode the motion information, for example. Therefore, in such cases, the base layer motion information is likely to comprise motion vectors that are shorter than they should be.
  • the bits may, in a rate-distortion sense, better be spent for a finer, more accurate motion information so that for higher refinement layers the motion information is likely to change to indicate a longer motion vector.
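The trade-off argued in the two bullets above is the usual Lagrangian rate-distortion decision: minimize J = D + λ·R, where a large λ (low target rate) penalizes the extra bits of a long, accurate motion vector and a small λ favors it. A minimal sketch, with an assumed candidate representation of (name, distortion, rate in bits):

```python
def choose_motion_info(candidates, lam):
    """Pick the motion information minimizing the Lagrangian cost
    J = D + lam * R.  At low rates (large lam) the bit cost of a long
    motion vector dominates, so a shorter, less accurate vector wins;
    at high rates the more accurate vector takes over."""
    return min(candidates, key=lambda c: c[1] + lam * c[2])[0]
```

The same cost function can drive the per-macroblock decision between reusing (Fig. 7) and refining or replacing (Fig. 8/9) the subordinate layer's motion information.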
  • the selected macroblock prediction mode (Direct, 16x16, 16x8, 8x16 or 8x8)
  • in case of the 8x8 mode, the selected sub-macroblock prediction mode (8x8, 8x4, 4x8 or 4x4)
  • the residual prediction flag, which signals that the residual signal is to be predicted from the subordinate layer
  • the motion prediction flag, which signals that the motion vector is to be predicted from the subordinate layer
  • the reference frame index
  • JVT/P201 and JVT/P202 for progressive refinement slices.
  • the transmission of this information can be done similar to the current JVT/P201 and JVT/P202 Working Draft, but the coding efficiency can possibly be improved by using a different syntax for transmitting this refinement information.
  • the following possibilities or any combination of these are considered as preferred embodiments of the invention:
  • a macroblock mode in which the macroblock partitioning of the base macroblock is used, but reference picture indices and/or motion vector refinements are transmitted, possibly by using the base layer syntax elements for generating predictors.
  • the transmission of a new macroblock mode which specifies a new partitioning of the macroblock together with refinements for the reference indices and/or motion vectors.
  • the transmission of the new macroblock mode can also be realized by refining the macroblock partitioning of the base layer.
  • an additional motion hypothesis in addition to the motion hypothesis of the base layer.
  • the base layer macroblock specifies a 16x16 mode with prediction from list 0
  • in the enhancement layer an additional motion hypothesis could specify a 16x16 mode with prediction from list 1, and the final motion compensated prediction signal is generated as the weighted average of both the 16x16 list 0 and the 16x16 list 1 prediction.
  • the weighting factor could be additionally specified. Note that without transmitting any refinement of the motion information it is still possible to achieve nearly the same rate-distortion characteristic as that of the current SVC WD, since the bit rate is increased only by the additional possibility to signal a motion refinement mode per macroblock, which can very efficiently be realized by a well-designed entropy coding. Consequently, the coding efficiency using motion information refinement cannot become noticeably worse than that of the current SVC described in JVT/P201 and JVT/P202.
  • the predictor for the motion vectors can be adaptively switched between a spatial motion vector predictor (similar to H.264/MPEG4-AVC) and a prediction from the corresponding FGS base layer. Additionally, when the adaptive motion information refinement is used, the prediction of the residual data from the corresponding FGS base layer can be modified or completely switched off.
  • the used prediction signal could be obtained by weighting and filtering of the corresponding base layer residual signal.
  • the transform block size (4x4 or 8x8 transform) can be selected for each FGS layer macroblock independent of the used transform size of the corresponding base layer macroblock.
  • Fig. 11 shows a part of the steps performed at the decoder side for decoding the scalable bit-stream 126 generated by the encoder 100 of Fig. 1. In particular, the portion of the steps shown in Fig. 11 concentrates on the decoding of the base-layer or refinement-layer bit-streams 120 or 122 which are present in the scalable bit-stream 126 in a multiplexed form.
  • the process begins with a base-layer extraction in step 500, for example, by means of parsing datastream 126.
  • the result of this step is the base-layer motion information m0 and the corresponding residual information residual0 (see Fig. 8 and Fig. 9).
  • in step 502 it is determined whether a further refinement is desired. For example, it is determined whether a terminal of interest supports a further refined representation of the video signal. If not, the base-layer bit-stream 120 extracted in step 500 is directly decoded in step 504.
  • if yes, the process of Fig. 11 proceeds to step 506, where the current motion information, i.e. the base-layer motion information, is refined or replaced by the motion information contained in the next or first refinement layer bit-stream 122.
  • “Refining” means that the motion information refinement information contained in the refinement layer bit-stream is not self-contained, but has to be combined with the current motion information m0 in order to obtain the new motion information m1.
  • "Replacing” refers to the case where the motion information refinement information contained in the refinement layer bit-stream 122 is self-contained, i.e. the new motion information mi is derivable from this bit-stream without use of the current motion information mo.
  • m 0 is disregarded.
  • in step 508, the current residual information, i.e. the transform coefficients representing the current residual (residual0 in Figs. 8 and 9), is refined or replaced by use of the residual information refinement information contained in the refinement layer bit-stream 122.
  • the refining of current residual information is illustrated in Fig. 8, where residual1 represents residual information refinement information contained in the refinement layer bit-stream used for refining the coarsely quantized current residual information residual0.
  • the case of replacement of the current residual information is shown in Fig. 9.
  • new residual information residual1 is derived from bit-stream 122 only.
  • steps 506 and 508 are repeated for the next refinement layer, with the current motion information being m1 and the current encoded residual information being residual1 + residual0 in case of Fig. 8, and residual1 in case of Fig. 9 (see above).
  • in step 504 the current motion information and current residual information are decoded by combining this information, i.e. by combining the residual texture derived by decoding the current encoded residual information with a prediction picture as derived from the current motion information.
  • the output of step 504 is the video input signal in the desired refinement or quality level.
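The decoding loop of Fig. 11 (steps 500 to 508) can be sketched as follows. The data layout — layers as dicts with `motion`, `residual` and a `replace` flag, motion information reduced to a single number, and refinement by simple addition — is an assumption of this sketch, chosen only to make the refine-vs-replace control flow of steps 506/508 explicit.

```python
def decode_scalable_stream(base, refinements, target_level):
    """Sketch of the Fig. 11 decoding loop: start from the base layer
    (step 500) and, while a further refinement is desired (step 502),
    refine or replace the current motion and residual information
    (steps 506/508) before the final decode (step 504)."""
    motion = base["motion"]
    residual = list(base["residual"])
    for level in range(target_level):
        ref = refinements[level]
        if ref["replace"]:
            # Fig. 9 case: the refinement layer is self-contained,
            # the current information is disregarded.
            motion = ref["motion"]
            residual = list(ref["residual"])
        else:
            # Fig. 8 case: combine the refinement with the current
            # motion information and accumulate the residual layers.
            motion = motion + ref["motion"]
            residual = residual + ref["residual"]
    return motion, residual
```

Truncating the loop at any `target_level` yields a valid coarser reconstruction, which is exactly the SNR scalability the bit-stream 126 is built for.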
  • Figs. 5 and 6 show the rate-distortion performance of the invented FGS coding scheme in comparison to the current SVC WD.
  • the above description reveals an FGS coding scheme, in which it is possible to use a different method for generating the motion-compensated prediction signal in an FGS enhancement layer macroblock than in the co-located base layer macroblock.
  • the different motion-compensated prediction may be specified by a refinement of the motion vectors of the base layer macroblock.
  • the different motion-compensated prediction may be specified by a refinement of the macroblock partitioning of the base layer macroblock or a completely different macroblock partitioning as well as corresponding reference indices and motion vectors.
  • the reference indices and/or motion vectors can be predicted by using the base layer information, or the different motion-compensated prediction may be specified by an additional motion hypothesis, which consists of a macroblock partitioning (which can be identical to the partitioning of the base layer macroblock) and corresponding reference indices and motion vectors.
  • the motion-compensated prediction signal for the FGS enhancement layer may then be generated by a weighted average of the motion-compensated prediction signals that are obtained by using the motion information of the base layer motion hypothesis and the enhancement layer motion hypothesis. Additionally, in this FGS coding scheme a filtered and weighted (the weight can also be equal to zero) version of the base layer residual signal may be used as a prediction for the enhancement layer residual signal, and the used transform block size for coding of the enhancement layer residual signal may be chosen independently of the used transform block size in the base layer.
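The residual-prediction part of the summary above — using a filtered and weighted version of the base layer residual signal, where the weight may also be zero — can be sketched as below. The 3-tap smoothing kernel is an assumption standing in for the unspecified filter, and the function name is invented for this illustration.

```python
import numpy as np

def predict_enh_residual(base_residual, weight=0.5):
    """Predict the enhancement-layer residual from a weighted,
    filtered version of the base-layer residual.  A weight of zero
    switches the prediction off entirely, as the text allows."""
    kernel = np.array([0.25, 0.5, 0.25])  # assumed smoothing filter
    filtered = np.convolve(base_residual, kernel, mode="same")
    return weight * filtered
```

The enhancement layer then only needs to code the difference between its actual residual and this prediction, which shrinks the refinement bit rate when the two residual signals are correlated.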
  • the above embodiments describe a concept for fine-granular SNR scalable coding of video sequences with an adaptive refinement of motion/prediction information.
  • the coding efficiency can be significantly improved, especially when fine-granular scalability has to be supported for large bit-rate intervals.
  • the modification of motion information is signalled on a macroblock basis in a fine-granular scalable layer.
  • the transform coefficient levels for the corresponding macroblock are coded as in the FGS design of the scalable H.264/MPEG4-AVC extension as described in the above mentioned working drafts.
  • the motion parameters are additionally included into the progressive refinement slice syntax and the coding of transform coefficient levels is changed.
  • the decision on whether a motion parameter refinement is coded and the corresponding motion parameters are determined by rate-distortion optimized mode decision and motion estimation at the encoder side.
  • the decoder complexity is only slightly increased compared to the current FGS concept as specified in the above mentioned SVC Working Drafts.
  • the inventive encoding scheme can be implemented in hardware or in software. Therefore, the present invention also relates to a computer program, which can be stored on a computer-readable medium such as a CD, a disk or any other data carrier.
  • the present invention is, therefore, also a computer program having a program code which, when executed on a computer, performs the inventive method of encoding or binarizing or the inventive method of decoding or recovering described in connection with the above figures.

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

A basic idea of the present invention is that enhancement layers with improved rate/distortion efficiency can be formed by adding refined motion information to residual-information refinement information. In particular, it was recognized that at low maximum bit rates, coarse motion information, requiring fewer bits to represent the motion information in the bit-stream, yields good rate/distortion efficiency. With increasing maximum bit rates, however, merely refining the coding of the texture information relative to the coarse motion information does not yield optimum rate/distortion efficiency for representing the respective enhancement layer. Rather, more bits have to be spent on the motion information in order to refine it and thereby achieve better rate/distortion efficiency at higher maximum bit rates. By this measure, a significantly improved coding efficiency is obtained.
PCT/EP2005/010972 2005-10-12 2005-10-12 Codec video acceptant l'echelonnabilite de la qualite WO2007042063A1 (fr)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/EP2005/010972 WO2007042063A1 (fr) 2005-10-12 2005-10-12 Codec video acceptant l'echelonnabilite de la qualite

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/EP2005/010972 WO2007042063A1 (fr) 2005-10-12 2005-10-12 Codec video acceptant l'echelonnabilite de la qualite

Publications (1)

Publication Number Publication Date
WO2007042063A1 true WO2007042063A1 (fr) 2007-04-19

Family

ID=36498738

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/EP2005/010972 WO2007042063A1 (fr) 2005-10-12 2005-10-12 Codec video acceptant l'echelonnabilite de la qualite

Country Status (1)

Country Link
WO (1) WO2007042063A1 (fr)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2010144406A1 (fr) * 2009-06-11 2010-12-16 Motorola Mobility, Inc. Compression d'image numérique par décimation du résidu
US8428143B2 (en) 2006-03-22 2013-04-23 Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. Coding scheme enabling precision-scalability
US9635356B2 (en) 2012-08-07 2017-04-25 Qualcomm Incorporated Multi-hypothesis motion compensation for scalable video coding and 3D video coding
CN109155847A (zh) * 2016-03-24 2019-01-04 英迪股份有限公司 用于编码/解码视频信号的方法和装置
CN112136330A (zh) * 2018-04-27 2020-12-25 威诺瓦国际有限公司 视频解码器芯片组

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6510177B1 (en) * 2000-03-24 2003-01-21 Microsoft Corporation System and method for layered video coding enhancement

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6510177B1 (en) * 2000-03-24 2003-01-21 Microsoft Corporation System and method for layered video coding enhancement

Non-Patent Citations (6)

* Cited by examiner, † Cited by third party
Title
J. REICHEL, H. SCHWARZ, M. WIEN: "Joint Scalable Video Model JSVM-3", JOINT VIDEO TEAM (JVT) OF ISO/IEC MPEG & ITU-T VCEG (ISO/IEC JTC1/SC29/WG11 AND ITU-T SG16 Q6), 16TH MEETING: POZNAN, POLAND, JULY, 2005, no. JVT P202, 29 July 2005 (2005-07-29), pages 1 - 34, XP002384686, Retrieved from the Internet <URL:http://ftp3.itu.ch/av-arch/jvt-site/2005_07_Poznan/> [retrieved on 20060606] *
LANGE R ET AL: "Simple AVC-based codecs with spatial scalability", IMAGE PROCESSING, 2004. ICIP '04. 2004 INTERNATIONAL CONFERENCE ON SINGAPORE 24-27 OCT. 2004, PISCATAWAY, NJ, USA,IEEE, 24 October 2004 (2004-10-24), pages 2299 - 2302, XP010786245, ISBN: 0-7803-8554-3 *
LI Z G ET AL: "A Novel SNR Refinement Scheme for Scalable Video Coding", IMAGE PROCESSING, 2005. ICIP 2005. IEEE INTERNATIONAL CONFERENCE ON GENOVA, ITALY 11-14 SEPT. 2005, PISCATAWAY, NJ, USA,IEEE, 11 September 2005 (2005-09-11), pages 644 - 647, XP010851473, ISBN: 0-7803-9134-9 *
SCHWARZ H ET AL: "Constrained Inter-Layer Prediction for Single-Loop Decoding in Spatial Scalability", IMAGE PROCESSING, 2005. ICIP 2005. IEEE INTERNATIONAL CONFERENCE ON GENOVA, ITALY 11-14 SEPT. 2005, PISCATAWAY, NJ, USA,IEEE, 11 September 2005 (2005-09-11), pages 870 - 873, XP010851192, ISBN: 0-7803-9134-9 *
SCHWARZ H ET AL: "Scalable Extension of H.264/AVC", ISO/IEC JTC1/CS29/WG11 MPEG04/M10569/S03, XX, XX, March 2004 (2004-03-01), pages 1 - 39, XP002340402 *
SCHWARZ H ET AL: "SVC Core Experiment 2.1: Inter-layer prediction of motion and residual data", ISO/IEC JTC1/SC29/WG11 M11043, XX, XX, no. M11043, 23 July 2004 (2004-07-23), pages 1 - 6, XP002360488 *

Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8428143B2 (en) 2006-03-22 2013-04-23 Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. Coding scheme enabling precision-scalability
EP1859630B1 (fr) * 2006-03-22 2014-10-22 Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. Code permettant un codage hiérarchique précis
WO2010144406A1 (fr) * 2009-06-11 2010-12-16 Motorola Mobility, Inc. Compression d'image numérique par décimation du résidu
US9635356B2 (en) 2012-08-07 2017-04-25 Qualcomm Incorporated Multi-hypothesis motion compensation for scalable video coding and 3D video coding
CN109155847A (zh) * 2016-03-24 2019-01-04 英迪股份有限公司 用于编码/解码视频信号的方法和装置
EP3435673A4 (fr) * 2016-03-24 2019-12-25 Intellectual Discovery Co., Ltd. Procédé et appareil de codage/décodage de signal vidéo
US10778987B2 (en) 2016-03-24 2020-09-15 Intellectual Discovery Co., Ltd. Method and apparatus for encoding/decoding video signal
US11388420B2 (en) 2016-03-24 2022-07-12 Intellectual Discovery Co., Ltd. Method and apparatus for encoding/decoding video signal
US11770539B2 (en) 2016-03-24 2023-09-26 Intellectual Discovery Co., Ltd. Method and apparatus for encoding/decoding video signal
US11973960B2 (en) 2016-03-24 2024-04-30 Intellectual Discovery Co., Ltd. Method and apparatus for encoding/decoding video signal
CN112136330A (zh) * 2018-04-27 2020-12-25 威诺瓦国际有限公司 视频解码器芯片组

Similar Documents

Publication Publication Date Title
EP1859630B1 (fr) Code permettant un codage hiérarchique précis
Schwarz et al. Constrained inter-layer prediction for single-loop decoding in spatial scalability
Schwarz et al. Overview of the scalable H. 264/MPEG4-AVC extension
Wien et al. Performance analysis of SVC
AU2006269728B2 (en) Method and apparatus for macroblock adaptive inter-layer intra texture prediction
US8208564B2 (en) Method and apparatus for video encoding and decoding using adaptive interpolation
Schwarz et al. MCTF and scalability extension of H. 264/AVC
US9113167B2 (en) Coding a video signal based on a transform coefficient for each scan position determined by summing contribution values across quality layers
WO2007079782A1 (fr) Codage d&#39;image à qualité extensible au moyen d&#39;un chemin de balayage à coefficient de transformée spécifique
JP2015065688A (ja) スケーラブルビデオコーディングのためのテクスチャ予想及びリサンプリングの方法及び装置
BR112013031215B1 (pt) Método e aparelho de codificação escalável de vídeo
KR20140016823A (ko) 영상의 복호화 방법 및 이를 이용하는 장치
WO2007042063A1 (fr) Codec video acceptant l'echelonnabilite de la qualite
KR20160085237A (ko) 머지를 기반으로 한 복호화 방법 및 장치
KR20080002936A (ko) 하나 이상의 디지털 영상을 인코딩하는 방법, 인코더 및컴퓨터 프로그램 생성물
Liu et al. New and efficient interframe extensions of EZBC and JPEG 2000
KR102271878B1 (ko) 영상의 부호화/복호화 방법 및 이를 이용하는 장치
An et al. Low complexity scalable video coding
Agostini-Vautard et al. A new coding mode for hybrid video coders based on quantized motion vectors
De Wolf et al. Adaptive Residual Interpolation: a Tool for Efficient Spatial Scalability in Digital Video Coding.
Peixoto Fernandes da Silva Advanced heterogeneous video transcoding
KR20210120948A (ko) 영상의 복호화 방법 및 이를 이용하는 장치
KR20140076508A (ko) 비디오 부호화 방법 및 비디오 복호화 방법과 이를 이용하는 장치
Kim et al. Optimum quantization parameters for mode decision in scalable extension of H. 264/AVC video codec
원광현 et al. Adaptive Interleaved Motion Vector Coding using Motion Characteristics

Legal Events

Date Code Title Description
DPE2 Request for preliminary examination filed before expiration of 19th month from priority date (pct application filed from 20040101)
121 Ep: the epo has been informed by wipo that ep was designated in this application
NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 05799811

Country of ref document: EP

Kind code of ref document: A1