CN1843040A - Scalable video coding and decoding methods, and scalable video encoder and decoder

Info

Publication number: CN1843040A
Application number: CN 200480024363
Authority: CN
Inventors: 李培根; 河昊振; 韩宇镇; 李宰荣
Original assignee: Samsung Electronics Co Ltd
Current assignee: Samsung Electronics Co Ltd
Priority date: 2003-08-26
Filing date: 2004-08-14
Publication date: 2006-10-04

Abstract

Scalable video coding and decoding methods, a scalable video encoder, and a scalable video decoder. The scalable video coding method includes receiving a GOP, performing temporal filtering and spatial transformation thereon, quantizing and generating a bitstream. The scalable video encoder for performing the scalable video coding method includes a weight determination block which determines a weight for scaling. The scalable video decoding method includes dequantizing the coded image information obtained from a received bitstream, performing descaling, inverse spatial transformation, and inverse temporal filtering on the scaled transform coefficients, thereby recovering video frames. The scalable video decoder for performing the scalable video decoding method includes an inverse weighting block. The standard deviation of Peak Signal to Noise Ratios (PSNRs) of frames included in a group of pictures (GOP) is reduced so that video coding performance can be increased.

Description

Scalable video coding and decoding methods, and scalable video encoder and decoder

Technical field

The present invention relates to video compression and, more particularly, to scalable video coding and decoding methods that use a weight, and to an encoder and a decoder that use these methods, respectively.

Background technology

With the development of information and communication technology (ICT), including the Internet, video communication is increasing alongside text and voice communication. Conventional text communication cannot satisfy users' varied needs, so demand is growing for multimedia services that can deliver various types of information such as text, pictures, and music. Because the amount of multimedia data is usually large, it requires large-capacity storage media and a wide bandwidth for transmission. For example, a 24-bit true-color image with a resolution of 640*480 needs 640*480*24 bits per frame, i.e., about 7.37 Mbits of data. Transmitting such images at 30 frames per second requires a bandwidth of about 221 Mbps, and storing a 90-minute movie made of such images requires about 1200 Gbits of storage. A compression coding method is therefore essential for transmitting multimedia data, including text, video, and audio.
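The figures quoted above can be checked with a few lines of arithmetic. The following is only an illustrative sketch; the constant names are chosen here for readability and do not come from the specification.

```python
# Raw (uncompressed) data volume for the example in the text.
WIDTH, HEIGHT = 640, 480      # resolution in pixels
BITS_PER_PIXEL = 24           # true color
FRAMES_PER_SECOND = 30
MOVIE_MINUTES = 90

bits_per_frame = WIDTH * HEIGHT * BITS_PER_PIXEL
bits_per_second = bits_per_frame * FRAMES_PER_SECOND
movie_bits = bits_per_second * MOVIE_MINUTES * 60

print(f"{bits_per_frame / 1e6:.2f} Mbit per frame")
print(f"{bits_per_second / 1e6:.0f} Mbit/s")
print(f"{movie_bits / 1e9:.0f} Gbit for the whole movie")
```

The exact movie size comes out slightly below the rounded value of about 1200 Gbits quoted in the text.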

The basic principle of data compression is to remove data redundancy. Data can be compressed by removing spatial redundancy, in which the same color or object is repeated within an image; temporal redundancy, in which there is little change between adjacent frames of a moving image or the same sound is repeated in audio; or psycho-visual redundancy, which takes into account human eyesight and its limited perception of high frequencies. Data compression can be classified as lossy or lossless according to whether source data is lost; as intraframe or interframe according to whether each frame is compressed independently; and as symmetric or asymmetric according to whether the time required for compression equals the time required for recovery. In addition, compression is called real-time compression when the compression/recovery delay does not exceed 50 ms, and scalable compression when frames have different resolutions. Lossless compression is usually used for text or medical data, and lossy compression is usually used for multimedia data. Meanwhile, intraframe compression is usually used to remove spatial redundancy, and interframe compression is usually used to remove temporal redundancy.

Different types of transmission media for multimedia have different performance. Currently used transmission media have various transmission rates. For example, an ultrahigh-speed communication network can transmit tens of megabits of data per second, while a mobile communication network has a transmission rate of 384 kbits per second. In conventional video coding methods such as Motion Picture Experts Group (MPEG)-1, MPEG-2, H.263, and H.264, temporal redundancy is removed by motion compensation based on motion estimation and compensation, and spatial redundancy is removed by transform coding. These methods achieve satisfactory compression ratios, but they lack the flexibility of a truly scalable bitstream because their main algorithm uses a reflexive approach. Consequently, data coding methods that offer scalability, such as wavelet video coding and subband video coding, may be better suited to a multimedia environment, where data must be carried over transmission media of various speeds or sent at a data rate suited to the transmission environment. Scalability denotes the ability to partially decode a single compressed bitstream. It includes spatial scalability, which refers to video resolution; signal-to-noise ratio (SNR) scalability, which refers to video quality level; and temporal scalability, which refers to frame rate. A scalable video encoder encodes a single stream and can transmit part of the encoded stream at different quality levels, resolutions, or frame rates to adapt to constraints such as bit rate, errors, and resources. A scalable video decoder can decode the transmitted video stream while changing the quality level, resolution, or frame rate.

Interframe wavelet video coding (IWVC) can provide a very flexible, scalable bitstream. However, conventional IWVC performs worse than coding methods such as H.264. Because of this lower performance, IWVC is used only in very limited applications despite its good scalability. Improving the performance of data coding methods that offer scalability has therefore become an issue.

Fig. 1 is the flow chart of IWVC.

In step S1, images are received in units of a group of pictures (GOP) comprising a plurality of frames. For temporal scalability, the GOP preferably comprises 2^n (n = 1, 2, 3, ...) frames. In the embodiments of the present invention, the GOP comprises 16 frames, and the various operations are performed GOP by GOP.

Then, in step S2, motion estimation is performed using hierarchical variable size block matching (HVSBM). When the original image size is N*N, a wavelet transform is used to obtain images at level 0 (N*N), level 1 (N/2*N/2), and level 2 (N/4*N/4). For the level-2 image, the motion estimation block size is varied from 16*16 to 8*8 and 4*4, motion estimation is performed for each block, and the magnitude of absolute distortion (MAD) is obtained for each block. Similarly, for the level-1 image, the block size is varied from 32*32 to 16*16, 8*8, and 4*4, motion estimation is performed for each block, and the MAD is obtained for each block. For the level-0 image, the block size is varied from 64*64 to 32*32, 16*16, 8*8, and 4*4, motion estimation is performed for each block, and the MAD is obtained for each block.

Then, in step S3, the motion estimation tree is pruned to minimize the MAD.

Then, in step S4, motion compensated temporal filtering (MCTF) is performed using the pruned optimal motion estimation tree, as described with reference to Fig. 2. In Fig. 2, the number written in each frame indicates the frame's position in the time sequence, and Wn (where n = 1, 2, ..., 15) denotes a subband obtained after MCTF. In other words, fr0 through fr15 denote the 16 frames contained in a single GOP before MCTF is performed on them.

First, at temporal level 0, MCTF is performed in the forward direction on the 16 image frames, yielding 8 low-frequency frames and 8 high-frequency subbands W8, W9, W10, W11, W12, W13, W14, and W15. At temporal level 1, MCTF is performed in the forward direction on the 8 low-frequency frames, yielding 4 low-frequency frames and 4 high-frequency subbands W4, W5, W6, and W7. At temporal level 2, MCTF is performed in the forward direction on the 4 low-frequency frames obtained at temporal level 1, yielding 2 low-frequency frames and 2 high-frequency subbands W2 and W3. Finally, at temporal level 3, MCTF is performed in the forward direction on the 2 low-frequency frames obtained at temporal level 2, yielding a single low-frequency subband W0 and a single high-frequency subband W1. As a result of MCTF, a total of 16 subbands W0 through W15 are obtained: 15 high-frequency subbands and the single low-frequency subband of the last level. After the 16 subbands are obtained, spatial transformation and quantization are performed on them in step S5 of Fig. 1. Thereafter, in step S6, a bitstream is generated that contains the data obtained from spatial transformation and quantization and the motion vector data obtained from motion estimation.
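The subband numbering described above follows a simple dyadic pattern, which the following sketch makes explicit. The function name `subband_layout` is hypothetical and is introduced here only for illustration; the sketch enumerates indices and does not perform any filtering.

```python
def subband_layout(gop_size=16):
    """Map each temporal level to the indices of the high-frequency
    subbands produced at that level for a GOP of 2**n frames."""
    layout = {}
    frames = gop_size   # frames entering the current temporal level
    level = 0
    while frames > 1:
        half = frames // 2
        # 'half' high-frequency subbands are produced: W[half] .. W[frames-1]
        layout[level] = list(range(half, frames))
        frames = half   # the 'half' low-frequency frames go to the next level
        level += 1
    return layout


for level, indices in subband_layout().items():
    print(level, ["W%d" % i for i in indices])
# The one remaining low-frequency frame is W0.
```

For a 16-frame GOP this lists W8 through W15 at level 0, W4 through W7 at level 1, W2 and W3 at level 2, and W1 at level 3, for 15 high-frequency subbands in total.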

Summary of the invention

Technical problem

Although conventional IWVC has good scalability, it still has drawbacks. In general, the peak signal-to-noise ratio (PSNR) is used to measure video coding performance quantitatively. The PSNR value is large when the difference between the original image and the coded image is small, and small when the difference is large. When the two images are exactly identical, the PSNR value is infinite. Fig. 3 shows the distribution of average PSNR values against frame index in conventional IWVC. As Fig. 3 shows, the PSNR value varies widely with the frame index within a GOP. At positions such as fr0, fr4, fr8, fr12, and fr16 (i.e., fr0 of the next GOP), the PSNR value is lower than at neighboring positions. When the PSNR value varies widely with frame index, video picture quality varies widely over time, and viewers perceive such temporal fluctuation as a degradation in picture quality. This variation in picture quality is an obstacle to commercial services such as streaming. Reducing the variation in PSNR is therefore critical for wavelet-based scalable video coding. Moreover, reducing the PSNR variation between frames in a GOP is important not only in scalable video coding that uses a wavelet-based spatial transform but also in scalable video coding that uses other spatial transforms such as the discrete cosine transform (DCT).
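The text does not state a formula for PSNR. The sketch below uses the commonly used definition, 10*log10(peak^2 / MSE), and assumes 8-bit samples (peak value 255); both are assumptions made for illustration. The helper `psnr_spread`, also hypothetical, measures the frame-to-frame variation as a population standard deviation.

```python
import math
import statistics


def psnr(original, coded, peak=255):
    """PSNR in dB between two equally sized sequences of sample values.
    peak=255 assumes 8-bit samples (an assumption, not from the text)."""
    mse = sum((o - c) ** 2 for o, c in zip(original, coded)) / len(original)
    if mse == 0:
        return math.inf          # identical images: PSNR is infinite
    return 10 * math.log10(peak ** 2 / mse)


def psnr_spread(psnr_per_frame):
    """Standard deviation of per-frame PSNR values within a GOP."""
    return statistics.pstdev(psnr_per_frame)


original = [10, 20, 30, 40]
print(psnr(original, [10, 20, 30, 41]))   # small difference, larger PSNR
print(psnr(original, [0, 30, 20, 50]))    # large difference, smaller PSNR
print(psnr(original, original))           # identical: inf
```

A smaller `psnr_spread` over the frames of a GOP corresponds to the more even picture quality that the invention aims for.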

Technical scheme

The present invention provides scalable video coding and decoding methods, and a scalable video encoder and decoder, that can reduce the variation in peak signal-to-noise ratio (PSNR).

According to one aspect of the present invention, there is provided a scalable video coding method comprising: (a) receiving a plurality of video frames and performing motion compensated temporal filtering (MCTF) on them to remove temporal redundancy from the video frames; and (b) obtaining scaled transform coefficients from the video frames from which temporal redundancy has been removed, quantizing the scaled transform coefficients, and generating a bitstream.

The video frames received in step (a) may already have been wavelet-transformed to remove spatial redundancy, and the scaled transform coefficients may be obtained by applying a predetermined weight to some of the subbands of the video frames from which temporal redundancy has been removed.

Alternatively, the scaled transform coefficients may be obtained in step (b) by applying a predetermined weight to some of the subbands of the video frames from which temporal redundancy has been removed and then performing spatial transformation on the weighted subbands.

Preferably, the scaled transform coefficients are obtained in step (b) by performing spatial transformation on the video frames from which temporal redundancy has been removed and applying a predetermined weight to those of the resulting transform coefficients that are obtained from some of the subbands. In this case, the predetermined weight is determined for each group of pictures (GOP). The predetermined weight has a single value for a single GOP and is preferably determined according to the magnitude of absolute distortion (MAD) of the GOP. Preferably, the transform coefficients scaled by the predetermined weight are obtained from those subbands which, among the subbands used to construct low-PSNR frames, have substantially less influence on high-PSNR frames than on the low-PSNR frames.

The bitstream generated in step (b) includes information about the weight used to obtain the scaled transform coefficients.

According to another aspect of the present invention, there is provided a scalable video encoder that receives a plurality of video frames and generates a bitstream. The scalable video encoder comprises: a temporal filtering block that performs MCTF on the video frames to remove temporal redundancy from them; a spatial transform block that performs spatial transformation on the video frames to remove spatial redundancy from them; a weight determination block that determines a weight to be used to scale those transform coefficients which, among the transform coefficients obtained by removing temporal and spatial redundancy from the video frames, are obtained from some of the subbands; a quantization block that quantizes the scaled transform coefficients; and a bitstream generation block that generates a bitstream using the quantized transform coefficients.

The spatial transform block may perform a wavelet transform on the video frames to remove spatial redundancy from them; the temporal filtering block may produce transform coefficients from the subbands obtained by performing MCTF on the wavelet-transformed video frames; and the weight determination block may determine the weight using the wavelet-transformed frames and multiply the transform coefficients obtained from some of the subbands by the determined weight, thereby obtaining scaled transform coefficients.

Alternatively, the temporal filtering block may obtain subbands by performing MCTF on the video frames; the weight determination block may determine the weight using the video frames and multiply some of the subbands by the determined weight to obtain scaled subbands; and the spatial transform block may perform spatial transformation on the scaled subbands, thereby obtaining scaled transform coefficients.

As a further alternative, the temporal filtering block may obtain subbands by performing MCTF on the video frames; the spatial transform block may produce transform coefficients by performing spatial transformation on the subbands; and the weight determination block may determine the weight using the video frames and multiply the transform coefficients obtained from predetermined subbands by the determined weight, thereby obtaining scaled transform coefficients.

Here, the predetermined weight is preferably determined for each group of pictures (GOP) according to the magnitude of absolute distortion (MAD) of the GOP. Preferably, the transform coefficients scaled by the predetermined weight are obtained from those subbands which, among the subbands used to construct low-PSNR frames, have substantially less influence on high-PSNR frames than on the low-PSNR frames.

The bitstream generation block may include, in the bitstream, information about the weight used to obtain the scaled transform coefficients.

According to another aspect of the present invention, there is provided a scalable video decoding method comprising: extracting coded image information, coding order information, and weight information from a bitstream; obtaining scaled transform coefficients by dequantizing the coded image information; and performing descaling, inverse spatial transformation, and inverse temporal filtering on the scaled transform coefficients in a decoding order that is the reverse of the coding order indicated by the coding order information, thereby recovering video frames.

The decoding order is, for example, descaling, inverse temporal filtering, and inverse spatial transformation. Alternatively, the decoding order may be inverse spatial transformation, descaling, and inverse temporal filtering, or it may be descaling, inverse spatial transformation, and inverse temporal filtering.

The predetermined weight is extracted from the bitstream, for example, for each group of pictures (GOP). Here, the number of frames constituting a GOP is 2^k (where k = 1, 2, 3, ...).

For example, the transform coefficients that are descaled using the predetermined weight are those obtained from the subbands W4, W6, W8, W10, W12, and W14 produced during encoding.

According to another aspect of the present invention, there is provided a scalable video decoder comprising: a bitstream analysis block that analyzes a received bitstream to extract coded image information, coding order information, and weight information from it; an inverse quantization block that dequantizes the coded image information to obtain scaled transform coefficients; an inverse weighting block that performs descaling; an inverse spatial transform block that performs inverse spatial transformation; and an inverse temporal filtering block that performs inverse temporal filtering. The scalable video decoder performs descaling, inverse spatial transformation, and inverse temporal filtering on the scaled transform coefficients in an order that is the reverse of the coding order indicated by the coding order information, thereby recovering video frames.

In a non-limiting example, the decoder performs decoding in the order of descaling, inverse temporal filtering, and inverse spatial transformation. Alternatively, the decoder may perform decoding in the order of inverse spatial transformation, descaling, and inverse temporal filtering, or in the order of descaling, inverse spatial transformation, and inverse temporal filtering.

In another non-limiting example, the bitstream analysis block extracts the predetermined weight from the bitstream for each group of pictures (GOP). Here, the number of frames constituting a GOP is 2^k (where k = 1, 2, 3, ...).

According to one embodiment of the present invention, the inverse weighting block performs descaling on the scaled transform coefficients obtained from the subbands W4, W6, W8, W10, W12, and W14 produced during encoding.

Description of drawings

By describing exemplary embodiment of the present invention with reference to the accompanying drawings in detail, above-mentioned and other characteristics of the present invention and advantage will become apparent, wherein:

Fig. 1 is the flow chart of traditional interframe wavelet video coding (IWVC);

Fig. 2 illustrates traditional motion compensated temporal filtering (MCTF);

Fig. 3 is a graph showing the peak signal-to-noise ratio (PSNR) obtained when conventional IWVC is performed on two groups of pictures (GOPs) of the Foreman sequence at a rate of 512 Kbps;

Fig. 4 is a flowchart of a scalable video coding method according to one embodiment of the present invention;

Fig. 5 illustrates a procedure for determining the subbands to be scaled according to one embodiment of the present invention;

Fig. 6 is a graph of the optimal scaling factor as a function of the magnitude of absolute distortion (MAD);

Fig. 7 is a graph comparing the average PSNR values obtained in the present invention with the average PSNR values obtained in the conventional art;

Fig. 8 illustrates MCTF using different temporal directions according to one embodiment of the present invention;

Fig. 9 is the functional-block diagram according to the scalable video encoder of one embodiment of the present of invention;

Figure 10 is the functional-block diagram according to the scalable video encoder of an alternative embodiment of the invention; And

Figure 11 is the functional-block diagram according to the scalable Video Decoder of one embodiment of the present of invention.

Embodiment

Exemplary, non-limiting embodiments of the present invention will now be described in detail with reference to the accompanying drawings.

Fig. 4 is the flow chart according to the scalable method for video coding of one embodiment of the present of invention.

First, in step S10, images are received in units of a group of pictures (GOP) comprising a plurality of frames. In one embodiment of the present invention, a single GOP comprises 16 frames, and all operations are performed GOP by GOP.

After the images are received, a weight, i.e., a scaling factor, is calculated in step S20. The calculation of the scaling factor is described below.

Thereafter, in step S30, motion estimation is performed using hierarchical variable size block matching (HVSBM). After motion estimation, the motion estimation tree is pruned in step S40 so as to minimize the magnitude of absolute distortion (MAD).

Then, in step S50, motion compensated temporal filtering (MCTF) is performed using the pruned optimal motion estimation tree. As a result of MCTF, a total of 16 subbands are obtained: 15 high-frequency subbands and a single low-frequency subband. In step S60, spatial transformation is performed on the 16 subbands. A discrete cosine transform (DCT) may be used as the spatial transformation, but a wavelet transform is preferred. Thereafter, in step S70, frame scaling is performed using the scaling factor obtained in step S20. Frame scaling is described below. After frame scaling, embedded quantization is performed in step S80, and a bitstream is then generated in step S90. The bitstream contains coded image information, motion vector information, and scaling factor information. During encoding, temporal transformation may instead be performed after spatial transformation, and scaling may be performed after temporal transformation. Information about the coding order may be included in the bitstream so that a decoder can recognize the different coding orders. However, the bitstream need not include coding order information. When it does not, coding may be regarded as having been performed in a predetermined order. In the embodiments of the present invention, a high-frequency subband represents the result of comparing two image frames 'a' and 'b', namely (a-b)/2, and a low-frequency subband represents the average of the two image frames, (a+b)/2. However, the present invention is not limited to this. For example, a high-frequency subband may represent the difference (a-b) between two frames, and a low-frequency subband may represent one frame (a) of the two frames compared.

Fig. 5 illustrates a procedure for determining the subbands to be scaled according to one embodiment of the present invention. Subbands denote the plurality of high-frequency frames and the single low-frequency frame obtained as a result of temporal filtering. A high-frequency frame is referred to as a high-frequency subband, and the low-frequency frame is referred to as the low-frequency subband. In scalable video coding, MCTF is used for temporal filtering. Using MCTF removes temporal redundancy and provides temporal scalability.

The relationship between the video frames fr0 through fr15 and the subbands W0 through W15 produced by MCTF, and the method of recovering the frames, are described with reference to Fig. 5. The relationship between video frames fr0 through fr15 and subbands W0 through W15 can be defined as follows:

fr15 = W0+W1+W3+W7+W15

fr14 = W0+W1+W3+W7-W15

fr13 = W0+W1+W3-W7+W14

fr12 = W0+W1+W3-W7-W14

fr11 = W0+W1-W3+W6+W13

fr10 = W0+W1-W3+W6-W13

fr9 = W0+W1-W3-W6+W12

fr8 = W0+W1-W3-W6-W12

fr7 = W0-W1+W2+W5+W11

fr6 = W0-W1+W2+W5-W11

fr5 = W0-W1+W2-W5+W10

fr4 = W0-W1+W2-W5-W10

fr3 = W0-W1-W2+W4+W9

fr2 = W0-W1-W2+W4-W9

fr1 = W0-W1-W2-W4+W8

fr0 = W0-W1-W2-W4-W8.
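The sixteen relations above can be reproduced with a small numerical sketch. It is a simplification made for illustration: each "frame" is a single number, motion compensation is omitted, and only the averaging and differencing filter of the embodiment, (a+b)/2 and (a-b)/2, is applied in the forward direction. The function names `analyze` and `reconstruct` are hypothetical.

```python
def analyze(frames):
    """Forward temporal decomposition of 2**k scalar 'frames' into
    subbands W0..W(n-1), using low=(a+b)/2 and high=(a-b)/2, where
    'a' is the later and 'b' the earlier frame of each pair."""
    subbands = [0.0] * len(frames)
    current = list(frames)
    while len(current) > 1:
        half = len(current) // 2
        lows = []
        for k in range(half):
            b, a = current[2 * k], current[2 * k + 1]
            lows.append((a + b) / 2)
            subbands[half + k] = (a - b) / 2   # W[half] .. W[2*half-1]
        current = lows
    subbands[0] = current[0]                   # last low-frequency frame
    return subbands


def reconstruct(subbands, k):
    """Recover frame k by adding or subtracting one subband per level,
    from the coarsest level (W1) down to the finest (W8..W15)."""
    n = len(subbands)
    value = subbands[0]
    span = n                                   # frames covered by one subband
    while span >= 2:
        index = n // span + k // span
        sign = 1 if k % span >= span // 2 else -1
        value += sign * subbands[index]
        span //= 2
    return value


frames = [float(i * i) for i in range(16)]     # arbitrary test data
W = analyze(frames)
print([reconstruct(W, k) for k in range(16)] == frames)
```

For example, `reconstruct(W, 13)` evaluates W0+W1+W3-W7+W14, which matches the relation listed for fr13.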

As shown in Fig. 3, frames fr0, fr4, fr8, and fr12 have particularly low peak signal-to-noise ratios (PSNRs) compared with their neighboring frames; they are referred to as low-PSNR frames. The periodic appearance of low-PSNR frames is related to the MCTF order. In other words, motion estimation errors occur during MCTF and accumulate as the temporal level increases. The degree of accumulation is determined by the MCTF structure. It is high for frames that are replaced by high-frequency subbands at low temporal levels. Conversely, frames replaced by high-frequency subbands at high temporal levels, and the frame replaced by the low-frequency subband at the highest temporal level, have high PSNR values; these frames are referred to as high-PSNR frames.

Therefore, the subbands to be multiplied by the scaling factor are selected from among the filtered subbands needed to reconstruct the low-PSNR frames. Multiplication by the scaling factor means allocating more bits. In other words, given that bits are allocated preferentially to larger transform coefficients during embedded quantization, multiplying a subband by the scaling factor means that more bits are allocated to the transform coefficients obtained from the selected subband than to the other transform coefficients. Because a GOP is coded with a predetermined number of bits, allocating more bits to the low-PSNR frames means allocating fewer bits to the other frames in the GOP. Consequently, the PSNR values of high-PSNR frames decrease as those of low-PSNR frames increase. The subbands selected for multiplication by the scaling factor are those that are needed to reconstruct low-PSNR frames and that have less influence on high-PSNR frames. In other words, the subbands that are used least in reconstructing high-PSNR frames (hereinafter referred to as minimum-change subbands) should be selected. Accordingly, subbands W8, W10, W12, and W14 are selected first. However, frames fr0 and fr8 have particularly low PSNR values compared with the other frames, so they need special compensation. For this reason, in this embodiment of the present invention, subbands W4 and W6 are additionally selected as minimum-change subbands to be multiplied by the scaling factor, so as to greatly reduce the variation in PSNR values.
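Which frames a given subband contributes to follows from the dyadic structure of forward MCTF on a 16-frame GOP. The sketch below, with the hypothetical helper `frames_using`, counts for each frame how many of the selected subbands W4, W6, W8, W10, W12, and W14 appear in its reconstruction. It illustrates the selection only and says nothing about the resulting bit allocation.

```python
SCALED_SUBBANDS = (4, 6, 8, 10, 12, 14)


def frames_using(subband, gop_size=16):
    """Indices of the frames whose reconstruction contains the given
    high-frequency subband (forward MCTF, dyadic GOP)."""
    count = 1                       # number of subbands at this level
    while count * 2 <= subband:
        count *= 2
    span = gop_size // count        # frames covered by one subband
    first = (subband - count) * span
    return list(range(first, first + span))


coverage = [0] * 16
for s in SCALED_SUBBANDS:
    for frame in frames_using(s):
        coverage[frame] += 1

for frame, hits in enumerate(coverage):
    print("fr%d: %d scaled subband(s)" % (frame, hits))
```

In this count, frames fr0 and fr8 (together with their pair partners fr1 and fr9) are covered by two scaled subbands each, reflecting the additional selection of W4 and W6, while fr6, fr7, fr14, and fr15 are covered by none.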

Thus, as shown in Fig. 5, among the subbands W0 through W15 obtained using MCTF, the minimum-change subbands W4, W6, W8, W10, W12, and W14 are multiplied by the scaling factor 'a'. To reduce the amount of computation in video coding, the scaling factor is preferably calculated once per GOP rather than separately for every frame in the video. In the above embodiment, the same scaling factor is used for the minimum-change subbands W4, W6, W8, W10, W12, and W14 in order to reduce the amount of computation, but the spirit of the present invention is not limited to this embodiment. The spirit of the present invention should be construed as encompassing video coding and decoding techniques in which subbands obtained by MCTF are weighted to reduce the variation in PSNR values. Accordingly, the case where subbands are multiplied by different scaling factors also falls within the scope of the present invention.

Various methods can be used to determine the scaling factor by which the subbands are multiplied. In one embodiment of the present invention, the scaling factor is obtained for each GOP according to the MAD. In this embodiment, the MAD is defined by formula (1).

MAD = 8 \times \sum_{i=0}^{\frac{n-1}{2}} \sum_{x=0}^{p-1} \sum_{y=0}^{q-1} | T_{2i+1}(x, y) - T_{2i}(x, y) | \quad (1)

Here, 'i' denotes the frame index, 'n' denotes the last frame index in the GOP, T(x, y) denotes the pixel value at position (x, y) in frame T, and the size of a single frame is p*q.
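Formula (1) can be transcribed literally as follows. This is a sketch under stated assumptions: frames are represented as lists of rows of pixel values, the function name `gop_mad` is hypothetical, and the leading factor of 8 is taken exactly as printed in formula (1). The text does not state any further normalization, so none is applied here.

```python
def gop_mad(frames):
    """Literal transcription of formula (1): sum of absolute pixel
    differences between frames 2i and 2i+1, times the printed factor 8.
    'frames' is a list with an even number of frames; each frame is a
    list of q rows holding p pixel values (layout is an assumption)."""
    total = 0
    for i in range(len(frames) // 2):          # i = 0 .. (n-1)/2
        even_frame, odd_frame = frames[2 * i], frames[2 * i + 1]
        for row_even, row_odd in zip(even_frame, odd_frame):
            for pixel_even, pixel_odd in zip(row_even, row_odd):
                total += abs(pixel_odd - pixel_even)
    return 8 * total


# Two tiny 2*2 "frames" forming one pair.
frame0 = [[10, 10], [10, 10]]
frame1 = [[12, 10], [10, 7]]
print(gop_mad([frame0, frame1]))
```

Only pairs of adjacent frames (0 and 1, 2 and 3, and so on) enter the sum, which corresponds to the frame pairs filtered at temporal level 0.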

To implement the present invention, subbands were multiplied by scaling factors according to the MAD, and the PSNR value of each frame was then obtained. The optimal scaling factor 'a' obtained in this way is shown in Fig. 6.

Fig. 6 is a graph of the optimal scaling factor as a function of MAD. In Fig. 6, the solid line plots the values obtained in actual experiments, and the dashed line is a linear approximation of those values. The scaling factor 'a' is obtained using formula (2).

a = 1.3 (if MAD < 30)

a = 1.4 - 0.0033 \times MAD (if 30 < MAD < 140)

a = 1 (if MAD > 140) (2)
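Formula (2) translates directly into a piecewise function. Formula (2) does not say which branch applies when MAD equals exactly 30 or 140; the sketch below assigns both boundary values to the linear branch, which is an assumption. The function name `scaling_factor` is hypothetical.

```python
def scaling_factor(mad):
    """Scaling factor 'a' for one GOP according to formula (2).
    Boundary values (MAD == 30, MAD == 140) are assigned to the linear
    branch; the formula itself leaves them unspecified."""
    if mad < 30:
        return 1.3
    if mad > 140:
        return 1.0
    return 1.4 - 0.0033 * mad


for mad in (10, 50, 100, 200):
    print(mad, scaling_factor(mad))
```

The factor stays at 1.3 for GOPs with little inter-frame difference, decreases linearly as the MAD grows, and becomes 1 (no scaling) for GOPs with a large MAD.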

After the scaling factor 'a' is obtained, the subbands are scaled. In other words, among the subbands W0 through W15 obtained using MCTF, the minimum-change subbands W4, W6, W8, W10, W12, and W14 are scaled according to formula (3).

W4 = a*W4, W6 = a*W6

W8 = a*W8, W10 = a*W10

W12 = a*W12, W14 = a*W14 ('a' is obtained using formula (2)) (3)
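A minimal sketch of formula (3) and of the corresponding descaling at the decoder is given below. Each subband is represented as a flat list of coefficients, and the decoder is assumed to undo the scaling by dividing by the same factor 'a' carried in the bitstream; the representation, the division, and the function names are assumptions made for illustration.

```python
SCALED_SUBBANDS = (4, 6, 8, 10, 12, 14)


def scale_subbands(subbands, a):
    """Encoder side: multiply the selected subbands by 'a' (formula (3))."""
    return [
        [c * a for c in band] if index in SCALED_SUBBANDS else list(band)
        for index, band in enumerate(subbands)
    ]


def descale_subbands(subbands, a):
    """Decoder side (assumed inverse): divide the same subbands by 'a'."""
    return [
        [c / a for c in band] if index in SCALED_SUBBANDS else list(band)
        for index, band in enumerate(subbands)
    ]


# Sixteen toy subbands with two coefficients each.
subbands = [[float(i), -float(i)] for i in range(16)]
scaled = scale_subbands(subbands, 1.3)
restored = descale_subbands(scaled, 1.3)
print(scaled[4], scaled[5])   # W4 is scaled, W5 is left unchanged
```

Because quantization takes place between scaling and descaling in the actual codec, the round trip is exact only in this toy setting without quantization.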

Fig. 7 is a graph comparing the average PSNR values obtained in one embodiment of the present invention with the average PSNR values obtained using conventional MCTF.

Referring to Fig. 7, the variation in PSNR values is smaller in this embodiment of the present invention than when conventional MCTF is used. In addition, PSNR values that are low in the conventional case are raised in the present invention, while PSNR values that are high in the conventional case are lowered.

In addition to the method of weighting some frames in a GOP in the course of conventional MCTF performed only in the forward direction, the PSNR value can also be improved by combining forward temporal filtering and backward temporal filtering according to a predetermined rule during MCTF. Table 1 shows examples of combined forward and backward filtering.

Table 1

Mode flag	Level 0	Level 1	Level 2	Level 3
Forward (F=0)	++++++++	++++	++	+
Backward (F=1)	--------	----	--	-
Combined forward and backward (F=2), case (a)	+-+-+-+-	++--	+-	+(-)
Combined forward and backward (F=2), case (b)	+-+-+-+-	+-+-	+-	+(-)
Combined forward and backward (F=2), case (c)	++++++++	++--	+-	-
Combined forward and backward (F=2), case (d)	++++----	++--	+-	-

Cases (c) and (d) are characterized in that the low-frequency frame of the last level (hereinafter referred to as the reference frame) is located at the center of the 1st through 16th frames, i.e., at the 8th frame. The reference frame is the most essential frame in video coding, because the other frames are recovered from it. Recovery performance decreases as the temporal distance between a frame and the reference frame increases. Therefore, in cases (c) and (d), forward and backward temporal filtering are combined so that the reference frame is located at the center, i.e., at the 8th frame, to minimize the temporal distance between the reference frame and each of the other frames.

In cases (a) and (b), the average temporal distance (ATD) is minimized. To calculate the ATD, temporal distances are computed first. A temporal distance is defined as the difference in position between two frames. Referring to Fig. 3, the temporal distance between the first frame and the second frame is defined as 1, and the temporal distance between frame 2 and frame 4 is defined as 2. The ATD is obtained by summing the temporal distances between the pairs of frames on which motion estimation is performed and dividing the sum by the number of frame pairs defined for motion estimation.

In case (a),

ATD = \frac{8 \times 1 + 4 \times 1 + 2 \times 4 + 1 \times 3}{15} = 1.53

In case (b),

ATD = \frac{8 \times 1 + 4 \times 1 + 2 \times 3 + 1 \times 5}{15} = 1.53

In the forward mode and the backward mode shown in Table 1,

ATD = \frac{8 \times 1 + 4 \times 2 + 2 \times 4 + 1 \times 8}{15} = 2.13

In case (c),

ATD = \frac{8 \times 1 + 4 \times 2 + 2 \times 4 + 1 \times 2}{15} = 1.73

In case (d),

ATD = \frac{8 \times 1 + 4 \times 2 + 2 \times 4 + 1 \times 1}{15} = 1.67

In actual simulations, the PSNR value increased as the ATD decreased, which improves video coding performance.
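The ATD figures can be checked with a short sketch. It assumes a 16-frame GOP with 8, 4, 2, and 1 motion-estimated frame pairs at levels 0 to 3, and takes the per-level distances from the ATD expressions for each mode. For case (d), the level-1 distance is taken as 2, which is the value consistent with the stated result of 1.67. The names `PAIRS_PER_LEVEL`, `LEVEL_DISTANCES`, and `average_temporal_distance` are illustrative and do not come from the specification.

```python
# Number of motion-estimated frame pairs at levels 0..3 of a 16-frame GOP
# (15 pairs in total), as implied by the ATD expressions in the text.
PAIRS_PER_LEVEL = (8, 4, 2, 1)

# Temporal distance of each pair at levels 0..3 for every filtering mode.
# Assumption: case (d) uses distance 2 at level 1, matching the result 1.67.
LEVEL_DISTANCES = {
    "forward or backward": (1, 2, 4, 8),
    "case (a)": (1, 1, 4, 3),
    "case (b)": (1, 1, 3, 5),
    "case (c)": (1, 2, 4, 2),
    "case (d)": (1, 2, 4, 1),
}


def average_temporal_distance(distances, pairs=PAIRS_PER_LEVEL):
    """Sum the distances over all frame pairs and divide by the pair count."""
    total = sum(n * d for n, d in zip(pairs, distances))
    return total / sum(pairs)


if __name__ == "__main__":
    for mode, distances in LEVEL_DISTANCES.items():
        print(f"{mode}: ATD = {average_temporal_distance(distances):.2f}")
```

Keeping the pair counts separate from the distances makes it easy to try other filtering orders: only the distance tuple changes.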

Fig. 8 illustrates MCTF performed in the different temporal directions shown in case (a). Solid lines indicate forward temporal filtering, and dotted lines indicate backward temporal filtering. When MCTF is performed as shown in Fig. 8, the relationships between frames fr0-fr15 and subbands W0-W15 are defined as follows:

fr15 = W0+W1-W3-W7-W15

fr14 = W0+W1-W3-W7+W15

fr13 = W0+W1-W3+W7+W14

fr12 = W0+W1-W3+W7-W14

fr11 = W0+W1+W3-W6-W13

fr10 = W0+W1+W3-W6+W13

fr9 = W0+W1+W3+W6+W12

fr8 = W0+W1+W3+W6-W12

fr7 = W0-W1+W2+W5-W11

fr6 = W0-W1+W2+W5+W11

fr5 = W0-W1+W2-W5+W10

fr4 = W0-W1+W2-W5-W10

fr3 = W0-W1-W2+W4-W9

fr2 = W0-W1-W2+W4+W9

fr1 = W0-W1-W2-W4+W8

fr0 = W0-W1-W2-W4-W8
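The relationships listed for Fig. 8 can be encoded directly to see how many frames each subband influences, which is the property used when selecting subbands to scale. The following is a minimal sketch that treats each relation as a plain signed sum, exactly as written, with no normalization. The helper names `parse_relations`, `frames_using`, and `reconstruct_frame` are hypothetical.

```python
import re

# Frame/subband relations for case (a), copied from the list above.
RELATIONS = """
fr15 = W0+W1-W3-W7-W15
fr14 = W0+W1-W3-W7+W15
fr13 = W0+W1-W3+W7+W14
fr12 = W0+W1-W3+W7-W14
fr11 = W0+W1+W3-W6-W13
fr10 = W0+W1+W3-W6+W13
fr9 = W0+W1+W3+W6+W12
fr8 = W0+W1+W3+W6-W12
fr7 = W0-W1+W2+W5-W11
fr6 = W0-W1+W2+W5+W11
fr5 = W0-W1+W2-W5+W10
fr4 = W0-W1+W2-W5-W10
fr3 = W0-W1-W2+W4-W9
fr2 = W0-W1-W2+W4+W9
fr1 = W0-W1-W2-W4+W8
fr0 = W0-W1-W2-W4-W8
"""


def parse_relations(text):
    """Return {frame_index: {subband_index: +1 or -1}}."""
    frames = {}
    for line in text.strip().splitlines():
        left, right = line.split("=")
        frame = int(left.strip()[2:])  # drop the leading "fr"
        terms = re.findall(r"([+-]?)W(\d+)", right)
        frames[frame] = {int(k): (-1 if sign == "-" else 1) for sign, k in terms}
    return frames


def frames_using(frames, subband):
    """List the frames whose reconstruction involves the given subband."""
    return sorted(f for f, terms in frames.items() if subband in terms)


def reconstruct_frame(terms, subband_values):
    """Evaluate one relation as a signed sum of subband values."""
    return sum(sign * subband_values[k] for k, sign in terms.items())


FRAMES = parse_relations(RELATIONS)

if __name__ == "__main__":
    for k in range(16):
        print(f"W{k} influences frames {frames_using(FRAMES, k)}")
```

In this encoding W0 and W1 appear in all 16 frames, while W8 to W15 each appear in only two frames, so scaling one of the latter changes few frames.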

In case (a) of Table 1, the PSNR value also varies with the frame index. The frame indices having low PSNR values are determined, and the minimum-change subbands, which exert little influence on frames other than the frames corresponding to the determined indices, are also determined. After the MAD is calculated, the minimum-change subbands are multiplied by an appropriate scaling factor. Depending on the direction of temporal filtering during MCTF, frames at certain indices in a GOP show good performance, while frames at other indices show poor performance. The present invention is characterized by the following operations: once the temporal filtering order has been determined, determining the frame indices having low PSNR values; determining, among the subbands used to reconstruct the frames corresponding to those indices, the minimum-change subbands that exert little influence on the other frames; and multiplying the minimum-change subbands by a scaling factor. In one embodiment of the present invention, a single scaling factor is used for the subbands of a GOP and is determined according to the MAD.

Furthermore, even when MCTF is performed using a plurality of reference frames, unlike conventional MCTF, multiplication by the scaling factor can be performed in the same manner as described above by using the relationships between frames and subbands.

Fig. 9 is a functional block diagram of a scalable video encoder according to an embodiment of the present invention.

The scalable video encoder includes a motion estimation block 110, a motion vector encoding block 120, a bitstream generation block 130, a temporal filtering block 140, a spatial transform block 150, an embedded quantization block 160, and a weight determination block 170.

The motion estimation block 110 obtains motion vectors for the blocks in each frame to be encoded, based on matching blocks in a reference frame. The frames are also used by the temporal filtering block 140. The motion vectors can be obtained using a hierarchical method such as hierarchical variable size block matching (HVSBM). The motion vectors obtained by the motion estimation block 110 are provided to the temporal filtering block 140 so that MCTF can be performed. The motion vectors are also encoded by the motion vector encoding block 120 and then included in the bitstream by the bitstream generation block 130.

The temporal filtering block 140 performs temporal filtering on the video frames with reference to the motion vectors received from the motion estimation block 110. The temporal filtering is performed using MCTF, but it is not limited to conventional MCTF. For example, the temporal filtering order may be changed, or a plurality of reference frames may be used.

Meanwhile, the weight determination block 170 calculates the MAD of the video frames using formula (1) and obtains a weight from the calculated MAD according to formula (2). The obtained weight can be multiplied by the subbands according to formula (3). In one exemplary embodiment, the weight is multiplied by the transform coefficients generated by the spatial transform performed by the spatial transform block 150. In other words, transform coefficients are obtained by performing a spatial transform on the subbands that are to be multiplied by the weight in formula (3), and those transform coefficients are then multiplied by the weight. Obviously, the multiplication by the weight may instead be performed directly after the temporal filtering, with the spatial transform performed thereafter.
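The weighting step can be sketched as follows, assuming that the piecewise rule recited in claim 8 (a = 1.3 for MAD < 30, a = 1.4 - 0.0033 MAD for 30 < MAD < 140, a = 1 for MAD > 140) is the weight rule referred to here, and that the scaled subbands are W4, W6, W8, W10, W12, and W14 as recited there for a 16-frame GOP filtered in a single direction. The behavior exactly at MAD = 30 and MAD = 140 is not specified, so the boundary handling below is an assumption. The function names are illustrative.

```python
# Subbands scaled for a 16-frame GOP with single-direction MCTF (claim 8).
SCALED_SUBBANDS = (4, 6, 8, 10, 12, 14)


def weight_from_mad(mad):
    """Map a MAD value to the weight 'a' using the piecewise rule of claim 8.

    Assumption: MAD == 30 falls in the linear branch and MAD == 140 maps to 1,
    since the claim leaves both boundary points undefined.
    """
    if mad < 30:
        return 1.3
    if mad < 140:
        return 1.4 - 0.0033 * mad
    return 1.0


def scale_subbands(subbands, weight, selected=SCALED_SUBBANDS):
    """Multiply the coefficients of the selected subbands by the weight.

    `subbands` maps a subband index to a flat list of coefficients.
    Unselected subbands are copied unchanged (equivalent to a weight of 1).
    """
    return {
        k: [c * weight for c in coeffs] if k in selected else list(coeffs)
        for k, coeffs in subbands.items()
    }


if __name__ == "__main__":
    a = weight_from_mad(100.0)
    print(a, scale_subbands({4: [2.0, -1.0], 5: [2.0, -1.0]}, a))
```

Because a single weight applies to the whole GOP, only that one value needs to be signaled alongside the coded image information.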

The transform coefficients scaled according to the weight are sent to the embedded quantization block 160. The embedded quantization block 160 performs embedded quantization on the scaled transform coefficients, thereby generating coded image information. The coded image information and the encoded motion vectors are sent to the bitstream generation block 130. The bitstream generation block 130 generates a bitstream that includes the coded image information, the encoded motion vectors, and weight information. The bitstream is transmitted over a channel.

According to the exemplary embodiment, the spatial transform block 150 removes spatial redundancy from the video frames using a wavelet transform in order to obtain spatial scalability. Alternatively, the spatial transform block 150 may remove spatial redundancy from the video frames using a DCT.

Meanwhile, when a wavelet transform is used, the spatial transform may be performed before the temporal filtering, unlike in conventional video coding. This operation is described with reference to Fig. 10.

Fig. 10 is a functional block diagram of a scalable video encoder according to another embodiment of the present invention.

Referring to Fig. 10, the video frames are wavelet transformed by the spatial transform block 210. In a known wavelet transform method, a single frame is divided into four quadrants; one quadrant is replaced with a reduced image (called the L image) that resembles the entire frame and has 1/4 of its area, and the other three quadrants are replaced with information (called the H image) from which the entire image can be recovered from the L image. In the same manner, the L image can be replaced with an LL image having 1/4 of the area of the L image and with the information needed to recover the L image. The compression method known as JPEG2000 uses image compression based on this wavelet method. Unlike a DCT-transformed image, a wavelet-transformed image contains the original image information, and the reduced image enables video coding with spatial scalability.
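The quadrant structure described above can be illustrated with a one-level decomposition. The sketch below uses an unnormalized Haar filter on 2x2 pixel blocks; this filter is an assumption chosen for brevity, since the specification does not name a particular wavelet. The band labels LL, LH, HL, and HH and the function names are likewise illustrative: LL plays the role of the reduced image, and the other three bands hold the information needed to recover the full image.

```python
def haar_decompose(image):
    """One-level 2D Haar decomposition of an image with even width and height.

    `image` is a list of rows. Returns four quarter-size bands keyed by name.
    """
    h, w = len(image) // 2, len(image[0]) // 2
    bands = {name: [[0.0] * w for _ in range(h)] for name in ("LL", "LH", "HL", "HH")}
    for r in range(h):
        for c in range(w):
            a, b = image[2 * r][2 * c], image[2 * r][2 * c + 1]
            p, q = image[2 * r + 1][2 * c], image[2 * r + 1][2 * c + 1]
            bands["LL"][r][c] = (a + b + p + q) / 4  # block average (reduced image)
            bands["LH"][r][c] = (a - b + p - q) / 4  # left/right difference
            bands["HL"][r][c] = (a + b - p - q) / 4  # top/bottom difference
            bands["HH"][r][c] = (a - b - p + q) / 4  # diagonal difference
    return bands


def haar_reconstruct(bands):
    """Invert haar_decompose and return the full-size image."""
    h, w = len(bands["LL"]), len(bands["LL"][0])
    image = [[0.0] * (2 * w) for _ in range(2 * h)]
    for r in range(h):
        for c in range(w):
            ll, lh, hl, hh = (bands[k][r][c] for k in ("LL", "LH", "HL", "HH"))
            image[2 * r][2 * c] = ll + lh + hl + hh
            image[2 * r][2 * c + 1] = ll - lh + hl - hh
            image[2 * r + 1][2 * c] = ll + lh - hl - hh
            image[2 * r + 1][2 * c + 1] = ll - lh - hl + hh
    return image


if __name__ == "__main__":
    frame = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
    bands = haar_decompose(frame)
    print(bands["LL"])
    print(haar_reconstruct(bands) == frame)
```

Applying `haar_decompose` again to the LL band yields the next, smaller level, which mirrors the L-to-LL step in the description.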

The motion estimation block 220 obtains motion vectors from the spatially transformed frames. The motion vectors are used for temporal filtering by the temporal filtering block 240. The motion vectors are also encoded by the motion vector encoding block 230 and then included in the bitstream generated by the bitstream generation block 270.

The weight determination block 260 determines a weight from the spatially transformed frames. The determined weight is multiplied by the transform coefficients obtained from the minimum-change subbands among the subbands generated by the temporal filtering. The scaled transform coefficients are quantized by the embedded quantization block 250 and thereby converted into a coded image. The bitstream generation block 270 uses the coded image together with the motion vectors and the weight to generate the bitstream.

Meanwhile, a video encoder may include both of the video encoders shown in Figs. 9 and 10 in order to perform both types of video coding, and may generate the bitstream for each GOP using the coded image obtained with whichever of the coding orders shown in Figs. 9 and 10 provides the better performance. In such a video encoder, information on the coding order is included in the bitstream to be transmitted. In the embodiments shown in Figs. 9 and 10, information on the coding order is also included in the bitstream so that a decoder can decode all images coded in the different orders.

When temporal filtering is performed before the spatial transform, as in conventional video compression, the term 'transform coefficient' refers to a value generated by the spatial transform. That is, a transform coefficient is called a DCT coefficient when it is generated by a DCT, or a wavelet coefficient when it is generated by a wavelet transform.

In the embodiments of the present invention, the term 'transform coefficient' is intended to mean a value obtained by removing spatial redundancy and temporal redundancy from frames before quantization (i.e., embedded quantization). That is, in the embodiment shown in Fig. 9, a transform coefficient refers to a coefficient generated by the spatial transform, as in conventional video compression. In the embodiment shown in Fig. 10, however, a transform coefficient refers to a coefficient generated by the temporal filtering.

The term 'scaled transform coefficient' as used in the present invention is intended to include a value produced by scaling a transform coefficient with a weight, or by performing a spatial transform on a subband that was obtained by temporal filtering and then scaled with a weight. Meanwhile, a transform coefficient to which no weight is applied can be regarded as having been multiplied by 1. Therefore, the scaled transform coefficients may include transform coefficients to which no weight has been applied as well as transform coefficients scaled with a weight.

The scalable video decoder includes: a bitstream analysis block 310, which analyzes an input bitstream and thereby extracts coded image information, coded motion vector information, and weight information; an inverse embedded quantization block 320, which dequantizes the coded image information extracted by the bitstream analysis block 310 and thereby obtains scaled transform coefficients; an inverse weighting block 370, which descales the scaled transform coefficients using the weight information; inverse spatial transform blocks 330 and 360, which perform inverse spatial transforms; and inverse temporal filtering blocks 340 and 350, which perform inverse temporal filtering.

The scalable video decoder shown in Fig. 11 includes the two inverse temporal filtering blocks 340 and 350 and the two inverse spatial transform blocks 330 and 360 so that it can recover all images coded in the different orders. In an actual implementation, however, the temporal filtering and the spatial transform may be performed in software on a computing device. In that case, only a single software module for temporal filtering and a single software module for the spatial transform may be provided, together with an option for selecting the order of operations.

The bitstream analysis block 310 extracts the coded image information from the bitstream and sends it to the inverse embedded quantization block 320. The inverse embedded quantization block 320 then performs inverse embedded quantization on the coded image information, thereby obtaining scaled transform coefficients. The bitstream analysis block 310 also sends the weight information to the inverse weighting block 370.

The inverse weighting block 370 descales the scaled transform coefficients according to the weight information to obtain transform coefficients. When descaling takes place depends on the coding order. When coding has been performed in the order of temporal filtering, spatial transform, and scaling, the inverse weighting block 370 descales the scaled transform coefficients before the inverse spatial transform block 330 operates. The inverse spatial transform block 330 then performs the inverse spatial transform. Thereafter, the inverse temporal filtering block 340 recovers the video frames through inverse temporal filtering.
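Descaling can be sketched as the inverse of the encoder-side multiplication. The example assumes that descaling divides the coefficients of the weighted subbands by the same weight, and that the weighted subbands are W4, W6, W8, W10, W12, and W14 as recited in the claims. The function name `descale_subbands` and the dictionary layout are illustrative.

```python
# Subbands assumed to have been weighted at the encoder (see the claims).
SCALED_SUBBANDS = (4, 6, 8, 10, 12, 14)


def descale_subbands(subbands, weight, selected=SCALED_SUBBANDS):
    """Divide the coefficients of the weighted subbands by the weight.

    `subbands` maps a subband index to a flat list of dequantized coefficients.
    Subbands that were not weighted at the encoder are copied unchanged.
    """
    if weight == 0:
        raise ValueError("weight must be non-zero")
    return {
        k: [c / weight for c in coeffs] if k in selected else list(coeffs)
        for k, coeffs in subbands.items()
    }


if __name__ == "__main__":
    received = {4: [2.6, -1.3], 5: [2.0, -1.0]}
    print(descale_subbands(received, 1.3))
```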

When coding has been performed in the order of temporal filtering, scaling, and spatial transform, the inverse spatial transform block 330 performs the inverse spatial transform on the scaled transform coefficients, and the inverse weighting block 370 then descales the scaled transform coefficients processed by the inverse spatial transform block 330. Thereafter, the inverse temporal filtering block 340 recovers the video frames through inverse temporal filtering.

When coding has been performed in the order of spatial transform, temporal filtering, and scaling, the inverse weighting block 370 descales the scaled transform coefficients, thereby obtaining transform coefficients. The inverse temporal filtering block 350 then constructs an image from the transform coefficients and performs inverse temporal filtering on the image. The inverse spatial transform block 360 then performs the inverse spatial transform on the image, thereby recovering the video frames. The coding order may change from GOP to GOP. In this case, the bitstream analysis block 310 obtains the coding order information from the GOP header of the bitstream. Alternatively, a default coding order may be predetermined, and the bitstream may omit the coding order information. In this case, decoding can be performed in the reverse of the default coding order. For example, when the default coding order is temporal filtering, spatial transform, and scaling and the bitstream does not include coding order information, descaling, inverse spatial transform, and inverse temporal filtering are performed on the bitstream in sequence (i.e., decoding uses the inverse spatial transform block 330 and the inverse temporal filtering block 340 in the lower dashed box in Fig. 11).
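The rule that decoding runs in the reverse of the coding order, with a fallback to a default order when the bitstream carries no order information, can be sketched as follows. The step names and the way the coding order is represented (a tuple of strings, or `None` when absent) are assumptions made for illustration; the actual bitstream syntax is not given here.

```python
# Each encoder step and the decoder step that undoes it.
INVERSE_STEP = {
    "temporal_filtering": "inverse_temporal_filtering",
    "spatial_transform": "inverse_spatial_transform",
    "scaling": "descaling",
}

# Default coding order from the example in the text.
DEFAULT_CODING_ORDER = ("temporal_filtering", "spatial_transform", "scaling")


def decoding_order(coding_order=None):
    """Return the decoder steps for a GOP.

    `coding_order` is the order signaled in the GOP header, or None when the
    bitstream omits it, in which case the default coding order is assumed.
    """
    order = DEFAULT_CODING_ORDER if coding_order is None else tuple(coding_order)
    return [INVERSE_STEP[step] for step in reversed(order)]


if __name__ == "__main__":
    print(decoding_order())
    print(decoding_order(("temporal_filtering", "scaling", "spatial_transform")))
    print(decoding_order(("spatial_transform", "temporal_filtering", "scaling")))
```

A software decoder built this way needs only one module per inverse step and can select their order per GOP.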

In the embodiments described above, the scalable video encoder transmits a bitstream that includes the weight, and the scalable video decoder uses the weight to recover the video images. However, the present invention is not limited thereto. For example, the scalable video encoder may instead transmit MAD information, and the scalable video decoder may obtain the weight from that information.

The video encoder and the video decoder can be implemented in hardware. Alternatively, they can be implemented using a general-purpose computer, which includes a CPU capable of computation and a memory, together with software for performing the encoding and decoding methods. Such software may be recorded on a recording medium such as a compact disc read-only memory (CD-ROM) or a hard disk so that it can implement the video encoder and the video decoder in conjunction with a computer.

Accordingly, those skilled in the art will understand that various changes in form and detail may be made without departing from the spirit and scope of the present invention as defined by the appended claims. Although MCTF is used in the embodiments described above, any type of periodic temporal filtering is to be construed as falling within the scope of the present invention.

Values obtained in experiments with the present invention are shown in Tables 2-7. The average PSNR obtained with the present invention does not differ greatly from that obtained with conventional MCTF. Compared with conventional MCTF, however, the present invention reduces the standard deviation.

Table 2: Average PSNR in the Foreman sequence

Bit rate	Present invention	Conventional MCTF (forward filtering)
128	30.88	30.91
256	35.66	35.68
512	39.19	39.23
1024	43.65	43.71

Table 3: Standard deviation in the Foreman sequence

Bit rate	Present invention	Conventional MCTF (forward filtering)
128	1.22	1.23
256	0.89	0.94
512	0.75	0.84
1024	0.62	0.74

Table 4: Average PSNR in the Canoa sequence

Bit rate	Present invention	Conventional MCTF (forward filtering)
128	28.46	28.45
256	32.58	32.58
512	37.76	37.76
1024	45.36	45.43

Table 5: Standard deviation in the Canoa sequence

Bit rate	Present invention	Conventional MCTF (forward filtering)
128	0.859	0.861
256	1.004	1.007
512	1.000	1.020
1024	1.070	1.090

Table 6: Average PSNR in the Tempete sequence

Bit rate	Present invention	Conventional MCTF (forward filtering)
128	27.98	27.99
256	32.2	32.28
512	35.42	35.5
1024	37.78	37.82

Table 7: Standard deviation in the Tempete sequence

Bit rate	Present invention	Conventional MCTF (forward filtering)
128	0.348	0.350
256	0.591	0.670
512	0.555	0.682
1024	0.564	0.654

Industrial applicability

The present invention provides a model that can reduce the variation in PSNR values across frame indices in scalable video coding. In other words, according to the present invention, the high PSNR values of some frames in a single GOP are lowered and the low PSNR values of other frames in the GOP are raised, so that video coding performance can be improved.

Accordingly, it should be understood that the embodiments described above are for illustration only and should not be construed as limiting the present invention. The scope of the present invention is defined by the appended claims rather than by the foregoing description, and all changes and equivalents falling within the scope of the claims are intended to be embraced by the present invention.

Claims

1. A scalable video coding method, comprising:

(a) receiving a plurality of video frames and performing motion compensated temporal filtering (MCTF) on the video frames to remove temporal redundancy from the video frames; and

(b) obtaining scaled transform coefficients from the video frames from which the temporal redundancy has been removed, quantizing the scaled transform coefficients, and generating a bitstream.

2. The scalable video coding method of claim 1, wherein the video frames received in step (a) have been wavelet transformed so that spatial redundancy has been removed from the video frames, and the scaled transform coefficients are obtained by applying a predetermined weight to some subbands among the video frames from which the temporal redundancy has been removed.

3. The scalable video coding method of claim 1, wherein the scaled transform coefficients are obtained in step (b) by applying a predetermined weight to some subbands among the video frames from which the temporal redundancy has been removed and then performing a spatial transform on the weighted subbands.

4. The scalable video coding method of claim 1, wherein the scaled transform coefficients are obtained in step (b) by performing a spatial transform on the video frames from which the temporal redundancy has been removed and then applying a predetermined weight to those transform coefficients, among the transform coefficients generated by the spatial transform, that are obtained from some subbands.

5. The scalable video coding method of claim 4, wherein the predetermined weight is determined for each group of pictures (GOP) and has a single, identical value within a single GOP.

6. The scalable video coding method of claim 5, wherein the predetermined weight is determined according to the magnitude of absolute distortion of the GOP.

7. The scalable video coding method of claim 6, wherein the transform coefficients scaled using the predetermined weight are obtained from subbands that, among the subbands used to construct low peak signal-to-noise ratio (PSNR) frames, exert substantially less influence on high PSNR frames than on the low PSNR frames.

8. The scalable video coding method of claim 7, wherein each GOP comprises 16 frames; MCTF is performed in a single direction; the magnitude of absolute distortion (MAD) is calculated by the following formula,

MAD = 8 \times \sum_{i=0}^{\frac{n-1}{2}} \sum_{x=0}^{p-1} \sum_{y=0}^{q-1} \left| T_{2i+1}(x, y) - T_{2i}(x, y) \right|

where 'i' denotes a frame index, 'n' denotes the last frame index in the GOP, T(x, y) denotes the picture value at position (x, y) in frame T, and the size of a single frame is p*q; the predetermined weight 'a' is calculated as a=1.3 (if MAD<30), a=1.4-0.0033MAD (if 30<MAD<140), and a=1 (if MAD>140); and the transform coefficients scaled using the predetermined weight are obtained from subbands W4, W6, W8, W10, W12, and W14.

9. The scalable video coding method of claim 1, wherein the bitstream generated in step (b) includes information on the weight used to obtain the scaled transform coefficients.

10. A scalable video encoder that receives a plurality of video frames and generates a bitstream, the scalable video encoder comprising:

a temporal filtering block that performs motion compensated temporal filtering (MCTF) on the video frames to remove temporal redundancy from the video frames;

a spatial transform block that performs a spatial transform on the video frames to remove spatial redundancy from the video frames;

a weight determination block that determines a weight to be used to scale those transform coefficients, among the transform coefficients obtained as a result of removing the temporal redundancy and the spatial redundancy from the video frames, that are obtained from some subbands;

a quantization block that quantizes the scaled transform coefficients; and

a bitstream generation block that generates a bitstream using the quantized transform coefficients.

11. The scalable video encoder of claim 10, wherein the spatial transform block performs a wavelet transform on the video frames to remove spatial redundancy from the video frames, the temporal filtering block generates transform coefficients using subbands obtained by performing MCTF on the wavelet-transformed video frames, and the weight determination block determines a weight using the wavelet-transformed frames and multiplies the determined weight by the transform coefficients obtained from some subbands, thereby obtaining the scaled transform coefficients.

12. The scalable video encoder of claim 10, wherein the temporal filtering block obtains subbands by performing MCTF on the video frames, the weight determination block determines a weight using the video frames and multiplies the determined weight by some subbands to obtain scaled subbands, and the spatial transform block performs a spatial transform on the scaled subbands, thereby obtaining the scaled transform coefficients.

13. The scalable video encoder of claim 10, wherein the temporal filtering block obtains subbands by performing MCTF on the video frames, the spatial transform block generates transform coefficients by performing a spatial transform on the subbands, and the weight determination block determines a weight using the video frames and multiplies the determined weight by the transform coefficients obtained from predetermined subbands, thereby obtaining the scaled transform coefficients.

14. The scalable video encoder of claim 13, wherein the predetermined weight is determined for each group of pictures (GOP) and has a single, identical value within a single GOP.

15. The scalable video encoder of claim 14, wherein the predetermined weight is determined according to the magnitude of absolute distortion of the GOP.

16. The scalable video encoder of claim 15, wherein the transform coefficients scaled using the predetermined weight are obtained from subbands that, among the subbands used to construct low peak signal-to-noise ratio (PSNR) frames, exert substantially less influence on high PSNR frames than on the low PSNR frames.

17. The scalable video encoder of claim 16, wherein each GOP comprises 16 frames; MCTF is performed in a single direction; the magnitude of absolute distortion (MAD) is calculated by the following formula,

MAD = 8 \times \sum_{i=0}^{\frac{n-1}{2}} \sum_{x=0}^{p-1} \sum_{y=0}^{q-1} \left| T_{2i+1}(x, y) - T_{2i}(x, y) \right|

18. The scalable video encoder of claim 10, wherein the bitstream generation block includes, in the bitstream, information on the weight used to obtain the scaled transform coefficients.

19. A scalable video decoding method, comprising:

extracting coded image information, coding order information, and weight information from a bitstream;

obtaining scaled transform coefficients by dequantizing the coded image information; and

performing descaling, inverse spatial transform, and inverse temporal filtering on the scaled transform coefficients in a decoding order that is the reverse of the coding order indicated by the coding order information, thereby recovering video frames.

20. The scalable video decoding method of claim 19, wherein the decoding order is descaling, inverse temporal filtering, and inverse spatial transform.

21. The scalable video decoding method of claim 19, wherein the decoding order is inverse spatial transform, descaling, and inverse temporal filtering.

22. The scalable video decoding method of claim 19, wherein the decoding order is descaling, inverse spatial transform, and inverse temporal filtering.

23. The scalable video decoding method of claim 22, wherein a predetermined weight is extracted from the bitstream for each group of pictures (GOP).

24. The scalable video decoding method of claim 23, wherein the number of frames constituting a GOP is 2^k (where k = 1, 2, 3, ...).

25. The scalable video decoding method of claim 23, wherein the transform coefficients inversely scaled using the predetermined weight are obtained from subbands W4, W6, W8, W10, W12, and W14 generated during encoding.

26. A scalable video decoder, comprising:

a bitstream analysis block that analyzes a received bitstream to extract coded image information, coding order information, and weight information from the bitstream;

an inverse quantization block that dequantizes the coded image information to obtain scaled transform coefficients;

an inverse weighting block that performs descaling;

an inverse spatial transform block that performs an inverse spatial transform; and

an inverse temporal filtering block that performs inverse temporal filtering,

wherein the scalable video decoder performs descaling, inverse spatial transform, and inverse temporal filtering on the scaled transform coefficients in an order that is the reverse of the coding order indicated by the coding order information, thereby recovering video frames.

27. The scalable video decoder of claim 26, wherein the decoding order is descaling, inverse temporal filtering, and inverse spatial transform.

28. The scalable video decoder of claim 26, wherein the decoding order is inverse spatial transform, descaling, and inverse temporal filtering.

29. The scalable video decoder of claim 26, wherein the decoding order is descaling, inverse spatial transform, and inverse temporal filtering.

30. The scalable video decoder of claim 29, wherein the bitstream analysis block extracts a predetermined weight from the bitstream for each group of pictures (GOP).

31. The scalable video decoder of claim 30, wherein the number of frames constituting a GOP is 2^k (where k = 1, 2, 3, ...).

32. The scalable video decoder of claim 26, wherein the inverse weighting block performs inverse scaling on the scaled transform coefficients from subbands W4, W6, W8, W10, W12, and W14 generated during encoding.

33. A recording medium having computer-readable code for performing the steps of the method of any one of claims 1-9 and 19-25.