CN101176346A - Method for scalably encoding and decoding video signal - Google Patents


Publication number
CN101176346A
Authority
CN
China
Prior art keywords
base layer
frame
offset information
image
information
Prior art date
Legal status
Pending
Application number
CNA2006800161454A
Other languages
Chinese (zh)
Inventor
全柄文
朴胜煜
朴志皓
朴玄旭
尹度铉
Current Assignee
LG Electronics Inc
Original Assignee
LG Electronics Inc
Priority date
Filing date
Publication date
Application filed by LG Electronics Inc filed Critical LG Electronics Inc
Publication of CN101176346A


Abstract

In one embodiment, decoding of a video signal includes predicting at least a portion of a current image in a current layer based on at least a portion of a base image in a base layer and offset information.

Description

Method for scalably encoding and decoding a video signal
Technical field
The present invention relates to scalable encoding and decoding of a video signal.
Background art
It is difficult to allocate the high bandwidth required for TV signals to digital video signals that are transmitted and received wirelessly by mobile phones and notebook computers. The same difficulty is expected for mobile TVs and handheld PCs, which will come into widespread use in the future. Thus, video compression standards for use with such mobile devices must have high video signal compression efficiency.
Such mobile devices have a variety of processing and display capabilities, so a variety of compressed video data formats must be prepared accordingly. This means that video data of a variety of different qualities, having various combinations of a number of variables such as the number of frames transmitted per second, the resolution, and the number of bits per pixel, must be provided based on a single video source. This imposes a great burden on content providers.
Because of these problems, content providers prepare high-bitrate compressed video data for each source video and, upon receiving a request from a mobile device, perform a process of decoding the compressed video and re-encoding it into video data suited to the video processing capabilities of the mobile device. However, this method entails a transcoding procedure including decoding, scaling, and encoding, which causes some delay in providing the requested data to the mobile device. The transcoding procedure also requires complex hardware and algorithms to cope with the wide variety of target encoding formats.
The Scalable Video Codec (SVC) has been developed in an attempt to overcome these problems. This scheme encodes video into a sequence of pictures with the highest image quality while ensuring that part of the encoded picture (frame) sequence (specifically, a partial sequence of frames intermittently selected from the total sequence of frames) can be decoded to produce video with a certain level of image quality.
Motion Compensated Temporal Filtering (MCTF) is an encoding scheme that has been suggested for use in the scalable video codec. The MCTF scheme has a high compression efficiency (i.e., a high coding efficiency) for reducing the number of bits transmitted per second. The MCTF scheme is likely to be applied to transmission environments such as a mobile communication environment where bandwidth is limited.
Although it is ensured that part of a sequence of pictures encoded in the scalable MCTF scheme can be received and processed into video with a certain level of image quality as described above, there is still a problem in that the image quality is significantly reduced if the bitrate is lowered. One solution to this problem is to provide an auxiliary picture sequence for low bitrates, for example, a sequence of pictures that have a small screen size and/or a low frame rate.
The auxiliary picture sequence is referred to as a base layer (BL), and the main picture sequence is referred to as an enhanced or enhancement layer. Video signals of the base and enhancement layers have redundancy since the same video content is encoded into two layers with different spatial resolutions or different frame rates. To increase the coding efficiency of the enhancement layer, a video signal of the enhancement layer may be predicted using motion information and/or texture information of the base layer. This prediction method is referred to as inter-layer prediction.
Fig. 1 illustrates examples of an intra-BL prediction method and an inter-layer residual prediction method, which are inter-layer prediction methods for encoding the enhancement layer using the base layer.
The intra-BL prediction method uses a texture (or image data) of the base layer. Specifically, the intra-BL prediction method produces predictive data of a macroblock of the enhancement layer using a corresponding block of the base layer that has been encoded in an intra mode. The term "corresponding block" refers to a block which is located in a base layer frame temporally coincident with the frame including the macroblock, and which would have an area covering the macroblock if the base layer frame were enlarged by the ratio of the screen size of the enhancement layer to the screen size of the base layer. The intra-BL prediction method uses the corresponding block of the base layer after enlarging the corresponding block by the ratio of the screen sizes via upsampling.
The inter-layer residual prediction method is similar to the intra-BL prediction method, except that it uses a corresponding block of the base layer encoded so as to contain residual data, which is data of an image difference, rather than a corresponding block of the base layer containing image data. The inter-layer residual prediction method produces predictive data of a macroblock of the enhancement layer, encoded so as to contain residual data, using a corresponding block of the base layer encoded so as to contain residual data. Similarly to the intra-BL prediction method, the inter-layer residual prediction method uses the corresponding block of the base layer after enlarging it by the ratio of the screen size of the enhancement layer to the screen size of the base layer via upsampling.
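As a concrete illustration of the prediction described above, the following minimal sketch (ours, not the patent's; the function names and the nearest-neighbour filter are illustrative assumptions in place of a real interpolation filter) upsamples a base-layer corresponding block by the screen-size ratio and forms the enhancement-layer residual against it:

```python
# Hypothetical sketch of intra-BL prediction: the decoded base-layer
# corresponding block is upsampled by the enhancement/base screen-size
# ratio and the enlarged block serves as the prediction for the
# enhancement-layer macroblock.

def upsample_nearest(block, ratio):
    """Enlarge a 2-D block by an integer ratio using nearest-neighbour
    repetition (a stand-in for the codec's interpolation filter)."""
    return [
        [pixel for pixel in row for _ in range(ratio)]
        for row in block
        for _ in range(ratio)
    ]

def intra_bl_residual(enh_mb, base_block, ratio):
    """Prediction residual: enhancement macroblock minus the upsampled
    base-layer corresponding block."""
    pred = upsample_nearest(base_block, ratio)
    return [
        [e - p for e, p in zip(e_row, p_row)]
        for e_row, p_row in zip(enh_mb, pred)
    ]

base = [[10, 20],
        [30, 40]]
enh = [[12, 11, 22, 21],
       [10, 10, 20, 20],
       [31, 30, 41, 40],
       [30, 30, 40, 40]]
res = intra_bl_residual(enh, base, 2)
print(res[0])  # [2, 1, 2, 1]
```

Only the small residual values remain to be coded for the enhancement layer, which is the source of the coding gain that inter-layer prediction targets.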
A base layer with lower resolution for use in the inter-layer prediction method is produced by downsampling a video source. Corresponding pictures (frames or blocks) in enhancement and base layers produced from the same video source may be out of phase, since a variety of different downsampling techniques and downsampling ratios (i.e., horizontal and/or vertical size reduction ratios) may be employed.
Fig. 2 illustrates phase relationships between enhancement and base layers. A base layer can be produced (i) by sampling a video source at a lower spatial resolution, independently of the enhancement layer, or (ii) by downsampling an enhancement layer with a higher spatial resolution. In the example of Fig. 2, the downsampling ratio between the enhancement and base layers is 2/3.
A video signal is managed as separate components, namely a luma component and two chroma components. The luma component is associated with luminance information Y, and the two chroma components are associated with chrominance information Cb and Cr. A ratio of 4:2:0 (Y:Cb:Cr) between luminance and chrominance information is widely used. Samples of the chroma signal are typically located midway between samples of the luma signal. When an enhancement layer and/or a base layer are produced directly from a video source, luma and chroma signals of the enhancement layer and/or the base layer are sampled so as to satisfy the 4:2:0 ratio and the position condition according to the 4:2:0 ratio.
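The half-sample siting mentioned above can be made explicit with a small sketch (ours, not the patent's; the conventional "centered" 4:2:0 siting is assumed) that maps a chroma sample index to its position on the luma sampling grid:

```python
# Illustrative sketch: in 4:2:0 sampling there is one chroma sample per
# 2x2 block of luma samples, conventionally sited midway between them.
# Expressing the chroma position in luma-sample units exposes the
# half-sample phase offset between the two grids.

def chroma_position(cx, cy):
    """Centre of the 2x2 luma block covered by chroma sample (cx, cy),
    in luma-sample units (hence the 0.5 offsets)."""
    return (2 * cx + 0.5, 2 * cy + 0.5)

print(chroma_position(0, 0))  # (0.5, 0.5): midway between four luma samples
print(chroma_position(1, 2))  # (2.5, 4.5)
```

This built-in offset between luma and chroma grids is why downsampling a layer can leave its chroma signal either in phase with the enhancement layer or compliant with the 4:2:0 position condition, but not necessarily both.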
In case (i) above, the enhancement and base layers may have different sampling positions, so that the two layers may be out of phase as shown in part (a) of Fig. 2. In the example of part (a), the luma and chroma signals of each of the enhancement and base layers satisfy the 4:2:0 ratio and the position condition according to the 4:2:0 ratio.
In case (ii) above, the base layer is produced by downsampling the luma and chroma signals of the enhancement layer at a specific ratio. If the base layer is produced such that its luma and chroma signals are in phase with the luma and chroma signals of the enhancement layer, the chroma signal of the base layer may not satisfy the position condition according to the 4:2:0 ratio, as shown in part (b) of Fig. 2.
In addition, if the base layer is produced such that its luma and chroma signals satisfy the position condition according to the 4:2:0 ratio, the chroma signal of the base layer is out of phase with the chroma signal of the enhancement layer, as shown in part (c) of Fig. 2. In this case, if the chroma signal of the base layer is upsampled at a specific ratio according to the inter-layer prediction method, the upsampled chroma signal of the base layer is out of phase with the chroma signal of the enhancement layer.
Also in case (ii), the enhancement and base layers may be out of phase as shown in part (a) of Fig. 2.
That is, the phase of the base layer may be changed in the downsampling procedure for producing the base layer and in the upsampling procedure of the inter-layer prediction method, so that the base layer is out of phase with the enhancement layer, thereby reducing coding efficiency.
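The phase problem described above can be shown with a toy one-dimensional example (assumptions ours, not the patent's): when the base-layer sampling grid does not start at the same position as the enhancement grid, a naive upsampling that assumes zero phase lands every sample at a shifted position.

```python
# A toy illustration of phase mismatch: the base layer is produced with
# a 2/3 downsampling ratio whose grid is shifted by a quarter sample,
# while the decoder's naive upsampling assumes a zero starting phase.

def sample_positions(n, step, phase):
    """Positions of n samples taken every `step` units, starting at `phase`."""
    return [phase + i * step for i in range(n)]

enh = sample_positions(6, 1.0, 0.0)       # enhancement-layer grid
print(enh)                                # [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]

base = sample_positions(4, 1.5, 0.25)     # base layer: 2/3 ratio, shifted grid
restored = sample_positions(4, 1.5, 0.0)  # naive upsampling assumes zero phase

# The constant mismatch is exactly the phase shift an encoder would have
# to signal so the decoder can compensate during upsampling.
mismatch = [b - r for b, r in zip(base, restored)]
print(mismatch)  # [0.25, 0.25, 0.25, 0.25]
```

Signalling this offset, and compensating for it at the decoder, is the subject of the embodiments that follow.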
Summary of the invention
In one embodiment, decoding of a video signal includes predicting at least a portion of a current image in a current layer based on at least a portion of a base image in a base layer and offset information for samples in the predicted current image. For example, the samples may be luma and/or chroma samples.
In one embodiment, the offset information is based on corresponding samples in the portion of the base image.
In another embodiment, the predicting step predicts the portion of the current image based on at least a portion of an upsampled base image and the offset information.
In one embodiment, the offset information is phase shift information.
In one embodiment, the predicting step may obtain the offset information from a slice header of the base layer, and in another embodiment, may obtain the offset information from a sequence-level header of the current layer.
Other related embodiments include methods of encoding a video signal, and apparatuses for encoding and decoding a video signal.
Brief description of the drawings
The above and other objects, features, and advantages of the present invention will become more apparent from the following detailed description taken in conjunction with the accompanying drawings, in which:
Fig. 1 illustrates an example of an inter-layer prediction method for encoding an enhancement layer using a base layer;
Fig. 2 illustrates examples of phase relationships between enhancement and base layers;
Fig. 3 is a block diagram of a video signal encoding apparatus in which a scalable video signal coding method according to the present invention can be implemented;
Fig. 4 illustrates elements of the EL encoder shown in Fig. 3;
Fig. 5 illustrates a method for upsampling a base layer for use in decoding an enhancement layer, encoded according to an inter-layer prediction method, taking into account a phase shift in the base layer and/or the enhancement layer, according to an embodiment of the present invention;
Fig. 6 is a block diagram of an apparatus for decoding a bitstream encoded by the apparatus of Fig. 3; and
Fig. 7 illustrates elements of the EL decoder shown in Fig. 6.
Detailed description of the embodiments
Embodiments of the present invention will now be described in detail with reference to the accompanying drawings.
Fig. 3 is a block diagram of a video signal encoding apparatus in which a scalable video signal coding method according to the present invention can be implemented.
The video signal encoding apparatus shown in Fig. 3 comprises an enhancement layer (EL) encoder 100, a texture coding unit 110, a motion coding unit 120, a muxer (or multiplexer) 130, a downsampling unit 140, and a base layer (BL) encoder 150. The downsampling unit 140 produces an enhancement layer signal directly from an input video signal or by downsampling the input video signal, and produces a base layer signal by downsampling the input video signal or the enhancement layer signal according to a specific scheme. The specific scheme depends on the application or the devices receiving each layer, and is therefore a matter of design choice. The EL encoder 100 encodes the enhancement layer signal generated by the downsampling unit 140 on a per-macroblock basis in a scalable fashion according to a specified encoding scheme (for example, an MCTF scheme), and generates suitable management information. The texture coding unit 110 converts data of encoded macroblocks into a compressed bitstream. The motion coding unit 120 codes motion vectors of image blocks obtained by the EL encoder 100 into a compressed bitstream according to a specified scheme. The BL encoder 150 encodes the base layer signal generated by the downsampling unit 140 according to a specified scheme, for example, according to the MPEG-1, MPEG-2, or MPEG-4 standard or the H.261 or H.264 standard, and produces a small-screen picture sequence, for example, a sequence of pictures scaled down to 25% of their original size if needed. The muxer 130 encapsulates the output data of the texture coding unit 110, the small-screen picture sequence from the BL encoder 150, and the output vector data of the motion coding unit 120 into a desired format. The muxer 130 then multiplexes the encapsulated data into a desired transmission format and outputs it.
The downsampling unit 140 not only transmits the enhancement and base layer signals to the EL and BL encoders 100 and 150, but also transmits sampling-related information of the two layers to the EL and BL encoders 100 and 150. The sampling-related information of the two layers may include spatial resolutions (or screen sizes), frame rates, the ratios between luma and chroma signals of the two layers, the positions of chroma signals of the two layers, and information regarding a phase shift between the luma and chroma signals of the two layers based on the respective positions of the luma and chroma signals of the two layers.
The phase shift can be defined as the phase difference between luma signals of the two layers. Typically, the luma and chroma signals of each layer are sampled so as to satisfy a position condition according to the ratio between the luma and chroma signals, and the luma signals of the two layers are sampled so as to be in phase with each other.
The phase shift can also be defined as the phase difference between chroma signals of the two layers. The phase difference between the chroma signals of the two layers can be determined based on the difference between the positions of the chroma signals of the two layers after the positions of the luma signals of the two layers are matched so that the luma signals of the two layers are in phase with each other.
For example, the phase shift may also be defined for each layer with reference to a single virtual layer (for example, an upsampled base layer) based on the input video signal from which the enhancement layer or the base layer is produced. Here, the phase difference is between luma and/or chroma samples (i.e., pixels) of the enhancement or base layer and the virtual layer (for example, the upsampled base layer).
The EL encoder 100 records the phase shift information transferred from the downsampling unit 140 in a header area of a sequence layer or a slice layer. If the phase shift information has a value other than 0, the EL encoder 100 sets a global shift flag "global_shift_flag", which indicates whether there is a phase shift between the two layers, to, for example, "1", and records the value of the phase shift in information fields "global_shift_x" and "global_shift_y". The "global_shift_x" value represents the horizontal phase shift. The "global_shift_y" value represents the vertical phase shift. Stated another way, the "global_shift_x" value represents the horizontal position offset between samples (i.e., pixels), and the "global_shift_y" value represents the vertical position offset between samples (i.e., pixels).
On the other hand, if the phase shift information has a value of 0, the EL encoder 100 sets the flag "global_shift_flag" to, for example, "0", and does not record the value of the phase shift in the information fields "global_shift_x" and "global_shift_y".
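The conditional signalling above can be sketched as follows (field names are from the text; the dictionary layout is our own illustration, not the codec's actual bitstream syntax):

```python
# Hedged sketch of the header signalling described above: the encoder
# writes global_shift_flag, and only when the flag is 1 does it also
# write global_shift_x / global_shift_y.

def write_shift_info(shift_x, shift_y):
    header = {}
    if shift_x != 0 or shift_y != 0:
        header["global_shift_flag"] = 1
        header["global_shift_x"] = shift_x
        header["global_shift_y"] = shift_y
    else:
        header["global_shift_flag"] = 0  # shift values are not recorded
    return header

print(write_shift_info(2, -1))
# {'global_shift_flag': 1, 'global_shift_x': 2, 'global_shift_y': -1}
print(write_shift_info(0, 0))
# {'global_shift_flag': 0}
```

Gating the two shift fields on the flag means the common zero-shift case costs only a single flag in the header.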
The EL encoder 100 also records the sampling-related information in the header area of the sequence layer or the slice layer if needed.
The EL encoder 100 performs MCTF on the video data received from the downsampling unit 140. Accordingly, the EL encoder 100 performs a prediction operation on each macroblock in a video frame (or picture) by subtracting a reference block, found by motion estimation, from the macroblock. In addition, the EL encoder 100 selectively performs an update operation by adding an image difference between the reference block and the macroblock to the reference block.
The EL encoder 100 separates an input video frame sequence into, for example, odd and even frames. The EL encoder 100 performs prediction and update operations on the separated frames over a number of encoding levels, for example, until the number of L frames, which are produced by the update operation, is reduced to one for a group of pictures (GOP). Fig. 4 shows elements of the EL encoder 100 associated with the prediction and update operations at one of the encoding levels.
The elements of the EL encoder 100 shown in Fig. 4 include an estimator/predictor 101. Through motion estimation, the estimator/predictor 101 searches for a reference block for each macroblock of a frame (for example, an odd frame in the enhancement layer) that is to contain residual data, and then performs a prediction operation to calculate an image difference (i.e., a pixel-to-pixel difference) of the macroblock from the reference block and a motion vector from the macroblock to the reference block. The EL encoder 100 further includes an updater 102 which performs an update operation on a frame (for example, an even frame) including the reference block of the macroblock by normalizing the calculated image difference between the macroblock and the reference block and adding the normalized value to the reference block.
A block having the smallest image difference from a target block has the highest correlation with the target block. The image difference of two blocks is defined, for example, as the sum or average of the pixel-to-pixel differences of the two blocks. A block having the smallest difference sum (or average), not exceeding a threshold, is referred to as a reference block of the target block.
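The reference-block criterion described above can be sketched minimally as follows (our own illustration; function names are invented, and the sum-of-absolute-differences form of the image difference is one of the variants the text allows):

```python
# Sketch of reference-block selection: among candidate blocks, pick the
# one whose pixel-to-pixel difference sum (SAD) against the target block
# is lowest, i.e. the block with the highest correlation.

def sad(a, b):
    """Sum of absolute pixel-to-pixel differences between two blocks."""
    return sum(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))

def best_reference(target, candidates):
    """Index of the candidate with the smallest image difference."""
    return min(range(len(candidates)), key=lambda i: sad(target, candidates[i]))

target = [[5, 5], [5, 5]]
cands = [[[9, 9], [9, 9]],   # SAD 16
         [[5, 6], [5, 5]],   # SAD 1  -> highest correlation
         [[0, 0], [0, 0]]]   # SAD 20
print(best_reference(target, cands))  # 1
```

A real encoder would additionally apply the threshold mentioned above, declaring no reference block when even the best candidate's difference is too large.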
The operation carried out by the estimator/predictor 101 is referred to as a "P" operation, and a frame produced by the "P" operation is referred to as an "H" frame. Residual data present in the "H" frame reflects high-frequency components of the video signal. The operation carried out by the updater 102 is referred to as a "U" operation, and a frame produced by the "U" operation is referred to as an "L" frame. The "L" frame is a low-pass subband picture.
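A one-dimensional toy version of the "P" and "U" operations described above may make the H/L decomposition concrete (our own minimal sketch under simplifying assumptions: a single reference, no motion, and a normalisation factor of 1/2, which makes this the lifting form of the Haar wavelet rather than the codec's actual filter):

```python
# Minimal MCTF-style sketch: the 'P' operation turns an odd frame into
# an H frame of residuals against its reference; the 'U' operation adds
# the normalised residual back onto the even frame to form an L frame.

def p_operation(odd, ref):
    return [o - r for o, r in zip(odd, ref)]      # H frame: high-frequency residual

def u_operation(even, h):
    return [e + d // 2 for e, d in zip(even, h)]  # L frame: low-pass band

even = [10, 20, 30]
odd = [14, 22, 28]
h = p_operation(odd, even)
l = u_operation(even, h)
print(h)  # [4, 2, -2]
print(l)  # [12, 21, 29]
```

The L frames carry a smoothed version of the signal and are fed into the next encoding level, while the H frames carry only the small residuals.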
The estimator/predictor 101 and the updater 102 of Fig. 4 may perform their operations on a plurality of slices, which are produced by dividing a single frame, simultaneously and in parallel, instead of performing their operations on the frame as a unit. In the following description of the embodiments, the term "frame" is used in a broad sense to include a "slice", provided that replacing the term "frame" with the term "slice" is technically equivalent.
More specifically, the estimator/predictor 101 divides each of the input video frames, or each odd one of the L frames obtained at the previous level, into macroblocks of a certain size. The estimator/predictor 101 then searches for a block whose image is most similar to that of each divided macroblock in even frames prior to and subsequent to the current odd frame at the same temporal decomposition level, produces a predicted image of each divided macroblock using the most similar or reference block, and obtains a motion vector thereof.
As shown in Fig. 4, the EL encoder 100 may further include a BL decoder 105. The BL decoder 105 extracts encoding information, such as a macroblock mode, from an encoded base layer stream containing the small-screen picture sequence received from the BL encoder 150, and decodes the encoded base layer stream to produce frames, each composed of one or more macroblocks. The estimator/predictor 101 can also search for a reference block of the macroblock in a frame of the base layer according to the intra-BL prediction method. Specifically, the estimator/predictor 101 searches for a corresponding block, encoded in an intra mode, in a frame of the base layer reconstructed by the BL decoder 105 that temporally coincides with the frame including the macroblock. The term "corresponding block" refers to a block which is located in the temporally coincident base layer frame and which would have an area covering the macroblock if the base layer frame were enlarged by the ratio of the screen size of the enhancement layer to the screen size of the base layer.
The estimator/predictor 101 reconstructs an original image of the found corresponding block by decoding the intra-coded pixel values of the corresponding block, and then upsamples the found corresponding block to enlarge it by the ratio of the screen size of the enhancement layer to the screen size of the base layer. The estimator/predictor 101 performs this upsampling taking into account the phase shift information "global_shift_x/y" transferred from the downsampling unit 140, so that the enlarged corresponding block of the base layer is in phase with the macroblock of the enhancement layer.
The estimator/predictor 101 encodes the macroblock with reference to a corresponding area in the corresponding block of the base layer that has been enlarged so as to be in phase with the macroblock. Here, the term "corresponding area" refers to a partial area in the corresponding block which is at the same relative position in the frame as the macroblock.
If needed, the estimator/predictor 101 searches for a reference area more highly correlated with the macroblock in the enlarged corresponding block of the base layer by performing motion estimation on the macroblock while changing the phase of the corresponding block, and encodes the macroblock using the found reference area.
If the phase of the enlarged corresponding block is further changed while searching for the reference area, the estimator/predictor 101 sets a local shift flag "local_shift_flag", which indicates whether there is a phase shift, different from the global phase shift "global_shift_x/y", between the macroblock and the corresponding upsampled block, to, for example, "1". Also, the estimator/predictor 101 records the local shift flag in a header area of the macroblock and records the local phase shift between the macroblock and the corresponding block in information fields "local_shift_x" and "local_shift_y". The local phase shift information may be replacement information, provided as a replacement for the global phase shift information so as to constitute the entire phase shift information. Alternatively, the local phase shift information may be additive information, wherein the local phase shift information added to the corresponding global phase shift information constitutes the entire or total phase shift information.
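The flag logic above, with its replacement and additive variants, can be sketched as follows (field names are from the text; the function and the keyword argument are our own illustration):

```python
# Sketch of shift resolution: a macroblock uses the global shift unless
# local_shift_flag signals a deviating local shift, which either
# replaces the global value or, in the additive variant, is applied on
# top of it.

def effective_shift(global_xy, local_flag, local_xy=None, additive=False):
    if not local_flag:
        return global_xy
    if additive:
        return (global_xy[0] + local_xy[0], global_xy[1] + local_xy[1])
    return local_xy  # replacement variant

print(effective_shift((2, 1), 0))                        # (2, 1)
print(effective_shift((2, 1), 1, (0, 3)))                # (0, 3)
print(effective_shift((2, 1), 1, (0, 3), additive=True)) # (2, 4)
```

Whether the decoder treats "local_shift_x/y" as replacing or supplementing the global values must of course match the convention chosen by the encoder.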
The estimator/predictor 101 further inserts information indicating that the macroblock of the enhancement layer has been encoded in an intra-BL mode in the header area of the macroblock, so as to inform the decoder of the same. The estimator/predictor 101 can also apply the inter-layer residual prediction method to a macroblock encoded so as to contain residual data, which is data of an image difference, using reference blocks found in frames prior to and subsequent to the macroblock. Also in this case, the estimator/predictor 101 upsamples a corresponding block of the base layer encoded so as to contain residual data, taking into account the phase shift information "global_shift_x/y" transferred from the downsampling unit 140, so that the base layer is in phase with the enhancement layer. Here, the corresponding block of the base layer is a block which has been encoded so as to contain residual data, which is data of an image difference.
The estimator/predictor 101 inserts information indicating that the macroblock of the enhancement layer has been encoded according to the inter-layer residual prediction method in the header area of the macroblock, so as to inform the decoder of the same.
The estimator/predictor 101 performs the above procedure for all macroblocks in the frame to complete an H frame which is a predicted image of the frame. The estimator/predictor 101 performs the above procedure for all input video frames, or all odd ones of the L frames obtained at the previous level, to complete H frames which are predicted images of the input frames.
As described above, the updater 102 adds an image difference of each macroblock in an H frame produced by the estimator/predictor 101 to an L frame having its reference block, which is an even one of the input video frames or one of the L frames obtained at the previous level.
The data stream encoded in the method described above is transmitted by wire or wirelessly to a decoding apparatus, or is delivered via recording media. The decoding apparatus reconstructs the original video signal according to the method described below.
Fig. 5 illustrates a method for upsampling a base layer for use in decoding an enhancement layer, encoded according to an inter-layer prediction method, taking into account a phase shift in the base layer and/or the enhancement layer, according to an embodiment of the present invention.
To decode a macroblock of the enhancement layer encoded according to the inter-layer prediction method, a corresponding block of the base layer is enlarged by the ratio of the screen size of the enhancement layer to the screen size of the base layer via upsampling. This upsampling is performed taking into account phase shift information "global_shift_x/y" in the enhancement layer and/or the base layer, so as to compensate for a global phase shift between the macroblock of the enhancement layer and the enlarged corresponding block of the base layer.
If there is a local phase shift "local_shift_x/y", different from the global phase shift "global_shift_x/y", between the macroblock of the enhancement layer and the corresponding block of the base layer, the corresponding block is upsampled taking into account the local phase shift "local_shift_x/y". For example, in one embodiment the local phase shift information may be used instead of the global phase shift information, or in another embodiment it may be used in addition to the global phase shift information.
Then, an original image of the macroblock of the enhancement layer is reconstructed using the enlarged corresponding block which is in phase with the macroblock.
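The decoder-side compensation described above can be sketched in one dimension (our own illustration, not the patent's: the shift is a whole-sample cyclic roll for simplicity, whereas a real codec would realise fractional shifts inside the interpolation filter):

```python
# Illustrative decoder-side sketch: before using the base-layer block
# as a prediction, the decoder upsamples it and shifts the sampling
# grid by the signalled phase offset so the base and enhancement layers
# line up.

def upsample_1d(samples, ratio):
    """Nearest-neighbour 1-D upsampling by an integer ratio."""
    return [s for s in samples for _ in range(ratio)]

def compensate_shift(samples, shift):
    """Cyclically shift the sample grid by `shift` positions."""
    shift %= len(samples)
    return samples[shift:] + samples[:shift]

base_row = [1, 2, 3]
up = upsample_1d(base_row, 2)      # [1, 1, 2, 2, 3, 3]
aligned = compensate_shift(up, 1)  # [1, 2, 2, 3, 3, 1]
print(aligned)
```

Without the compensation step, every predicted pixel would be taken from a slightly wrong position, which is precisely the coding-efficiency loss the signalled offsets are meant to avoid.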
Fig. 6 is a block diagram of an apparatus for decoding a bitstream encoded by the apparatus of Fig. 3. The decoding apparatus of Fig. 6 includes a demuxer (or demultiplexer) 200, a texture decoding unit 210, a motion decoding unit 220, an EL decoder 230, and a BL decoder 240. The demuxer 200 separates a received bitstream into a compressed motion vector stream and a compressed macroblock information stream. The texture decoding unit 210 reconstructs the compressed macroblock information stream to its original uncompressed state. The motion decoding unit 220 reconstructs the compressed motion vector stream to its original uncompressed state. The EL decoder 230 converts the uncompressed macroblock information stream and the uncompressed motion vector stream back to an original video signal according to a specified scheme (for example, an MCTF scheme). The BL decoder 240 decodes a base layer stream according to a specified scheme (for example, the MPEG-4 or H.264 standard).
The EL decoder 230 uses encoding information of the base layer and/or a decoded frame or macroblock of the base layer in order to decode an enhancement layer stream according to the inter-layer prediction method. To accomplish this, the EL decoder 230 reads a global shift flag "global_shift_flag" and phase shift information "global_shift_x/y" from a sequence header area or a slice header area of the enhancement layer, determines whether there is a phase shift in the enhancement layer and/or the base layer, and confirms the phase shift. The EL decoder 230 upsamples the base layer taking into account the confirmed phase shift, so that the base layer to be used for the inter-layer prediction method is in phase with the enhancement layer.
The EL decoder 230 reconstructs an input stream into an original frame sequence. Fig. 7 illustrates main elements of an EL decoder 230 implemented according to the MCTF scheme.
The elements of the EL decoder 230 of Fig. 7 perform temporal composition of H and L frame sequences of temporal decomposition level N into an L frame sequence of temporal decomposition level N-1. The elements of Fig. 7 include an inverse updater 231, an inverse predictor 232, a motion vector decoder 233, and an arranger 234. The inverse updater 231 selectively subtracts difference values of pixels of input H frames from corresponding pixel values of input L frames. The inverse predictor 232 reconstructs input H frames into L frames having original images, using the H frames and the above L frames from which the image differences of the H frames have been subtracted. The motion vector decoder 233 decodes an input motion vector stream into motion vector information of blocks in H frames and provides the motion vector information to the inverse updater 231 and the inverse predictor 232 of each stage. The arranger 234 interleaves the L frames completed by the inverse predictor 232 between the L frames output from the inverse updater 231, thereby producing a normal L frame sequence.
Constitute the L frame sequence 701 of level N-1 from the L frame of arrangement machine 234 outputs.Contrary renovator of next stage and N-1 level fallout predictor are reconstructed into the L frame sequence with the L frame sequence 701 of level N-1 and the H frame sequence 702 of input.This decoding processing is to carry out reconstruct original video frame sequence thus on the progression identical with the coding progression carried out in the coded program.
Below will be in more detail reconstruct (time domain combination) program on the descriptive level N, the H frame of the L frame of the level N that wherein will produce on level N+1 and grade N of reception is reconstructed into the L frame of grade N-1.
For an input L frame of level N, the inverse updater 231 determines, with reference to motion vectors provided from the motion vector decoder 233, all H frames of level N whose image differences were obtained using, as reference blocks, blocks in an original L frame of level N-1 that was updated into the input L frame of level N in the encoding procedure. The inverse updater 231 then subtracts error values of macroblocks in the corresponding H frames of level N from pixel values of corresponding blocks in the input L frame of level N, thereby reconstructing the original L frame.
This inverse update operation is performed for blocks in the current L frame of level N that were updated with macroblock error values of H frames in the encoding procedure, thereby reconstructing the L frame of level N into an L frame of level N-1.
For a target macroblock in an input H frame, the inverse predictor 232 determines its reference blocks in the inverse-updated L frames output from the inverse updater 231, with reference to motion vectors provided by the motion vector decoder 233, and adds pixel values of the reference blocks to the difference (error) values of pixels of the target macroblock, thereby reconstructing its original image.
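The two per-block operations of the temporal composition step can be sketched as below. This is a deliberate simplification under assumed conventions (pixel-wise arithmetic on small 2-D lists, unweighted update contributions); the function names are illustrative, not from the specification.

```python
def inverse_update(l_block, h_error_blocks):
    """Undo the MCTF update step: subtract the error values of the
    referencing H-frame macroblocks from the input L-frame block."""
    out = [row[:] for row in l_block]
    for err in h_error_blocks:
        for y, row in enumerate(err):
            for x, e in enumerate(row):
                out[y][x] -= e
    return out

def inverse_predict(h_error_block, reference_block):
    """Undo the MCTF prediction step: add the motion-compensated
    reference block to the difference (error) values of the H frame."""
    return [[r + e for r, e in zip(r_row, e_row)]
            for r_row, e_row in zip(reference_block, h_error_block)]

# One referencing H frame updated this L-frame block during encoding.
l_frame_block = inverse_update([[10, 10], [10, 10]], [[[1, 2], [3, 4]]])
# The H-frame macroblock is then reconstructed against that L frame.
original = inverse_predict([[1, -1], [0, 2]], l_frame_block)
```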
If information indicating that a macroblock in an H frame has been encoded in a base layer intra mode is included in the header area of the macroblock, the inverse predictor 232 reconstructs the original image of the macroblock using a base layer frame provided from the BL decoder 240. The following is a specific example of this process.
The inverse predictor 232 reconstructs an original image of an intra-coded block in the base layer corresponding to the macroblock in the enhancement layer, and upsamples the reconstructed corresponding block from the base layer so as to enlarge it by the ratio of the enhancement layer screen size to the base layer screen size. The inverse predictor 232 performs this upsampling taking into account the phase shift information "global_shift_x/y" in the enhancement layer and/or the base layer, so that the enlarged corresponding block of the base layer is in phase with the macroblock of the enhancement layer. That is, if "global_shift_flag" indicates that there is a phase shift between the base layer and the enhancement layer (for example, equals 1), the inverse predictor 232 shifts the corresponding macroblock from the base layer by the "global_shift_x" and "global_shift_y" values during upsampling. The inverse predictor 232 reconstructs the original image of the macroblock by adding the difference values of the pixels of the macroblock to the pixel values of a corresponding area in the enlarged corresponding block of the base layer which is in phase with the macroblock. Here, the term "corresponding area" refers to a partial area in the corresponding block which is at the same relative position in the frame as the macroblock.
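As an arithmetic illustration of the intra-BL reconstruction just described, the sketch below adds a macroblock's difference values to the co-located "corresponding area" of an already enlarged, phase-aligned base-layer block. The helper name and the plain-list layout are assumptions for illustration only.

```python
def reconstruct_intra_bl(mb_diff, enlarged_base, mb_x, mb_y):
    """Add macroblock pixel differences to the 'corresponding area' of the
    enlarged base-layer block, i.e. the region at the macroblock's own
    relative position (mb_x, mb_y) within the frame."""
    size = len(mb_diff)
    return [[enlarged_base[mb_y + j][mb_x + i] + mb_diff[j][i]
             for i in range(size)]
            for j in range(size)]

# A 4x4 enlarged (upsampled, phase-aligned) base-layer block and a 2x2
# macroblock of coded differences located at position (2, 2).
enlarged = [[5, 5, 6, 6],
            [5, 5, 6, 6],
            [7, 7, 8, 8],
            [7, 7, 8, 8]]
mb = reconstruct_intra_bl([[1, 0], [0, -1]], enlarged, 2, 2)
```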
If a local shift flag "local_shift_flag" indicates that there is a local phase shift "local_shift_x/y" between the macroblock and the corresponding block, different from the global phase shift "global_shift_x/y", the inverse predictor 232 upsamples the corresponding block taking into account the local phase shift "local_shift_x/y" (as replacement or additional phase information). The local phase shift information may be included in the header area of the macroblock.
If information indicating that a macroblock in an H frame has been encoded in an inter-layer residual mode is included in the header area of the macroblock, the inverse predictor 232 upsamples a corresponding block of the base layer containing residual data, taking into account the global phase shift "global_shift_x/y" discussed above, so as to enlarge the corresponding block such that it is in phase with the macroblock of the enhancement layer. The inverse predictor 232 then reconstructs residual data of the macroblock using the enlarged corresponding block which is in phase with the macroblock.
The inverse predictor 232 searches for a reference block of the reconstructed macroblock containing the residual data in an L frame, with reference to a motion vector provided from the motion vector decoder 233, and reconstructs an original image of the macroblock by adding the pixel values of the reference block to the difference values (i.e., the residual data) of the pixels of the macroblock.
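A rough sketch of the inter-layer residual path follows, under the simplifying assumption (not stated explicitly in the text) that the macroblock's residual is recovered as the upsampled base-layer residual plus the coded difference values, and is then added to the motion-compensated reference block; all names are illustrative.

```python
def reconstruct_residual_mode(coded_diff, upsampled_bl_residual, reference_block):
    """Assumed combination rule: macroblock residual = coded differences
    + phase-aligned upsampled base-layer residual; the original image is
    then that residual added to the motion-compensated reference block."""
    residual = [[c + b for c, b in zip(c_row, b_row)]
                for c_row, b_row in zip(coded_diff, upsampled_bl_residual)]
    return [[r + s for r, s in zip(r_row, s_row)]
            for r_row, s_row in zip(reference_block, residual)]

pixels = reconstruct_residual_mode([[1, 1], [1, 1]],    # coded differences
                                   [[2, 0], [0, 2]],    # upsampled BL residual
                                   [[10, 10], [10, 10]])  # reference block
```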
All macroblocks in the current H frame are reconstructed into their original images in the same manner as the above operation, and the reconstructed macroblocks are combined to reconstruct the current H frame into an L frame. The arranger 234 alternately arranges the L frames reconstructed by the inverse predictor 232 and the L frames updated by the inverse updater 231, and outputs the L frames so arranged to the next stage.
The decoding method described above reconstructs an MCTF-encoded data stream into a complete video frame sequence. In the case where the prediction and update operations have been performed N times for a group of pictures (GOP) in the MCTF encoding procedure described above, a video frame sequence with the original image quality is obtained if the inverse update and prediction operations are performed N times in the MCTF decoding procedure. However, a video frame sequence with lower image quality and at a lower bitrate is obtained if the inverse update and prediction operations are performed fewer than N times. Accordingly, the decoding apparatus is designed to perform the inverse update and prediction operations to an extent suitable for its performance.
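The quality/bitrate trade-off just mentioned can be illustrated with a small arithmetic sketch, assuming a dyadic GOP of 2^N frames in which each temporal decomposition level halves the frame count; the function name is illustrative.

```python
def frames_recovered(gop_size, levels_decoded, total_levels):
    """Frames recovered per GOP when only `levels_decoded` of the
    `total_levels` inverse update/prediction levels are performed:
    every skipped level halves the output frame rate (dyadic assumption)."""
    assert gop_size == 1 << total_levels, "assumes a dyadic GOP of 2**N frames"
    return gop_size >> (total_levels - levels_decoded)

full = frames_recovered(16, 4, 4)     # all N levels: original frame rate
partial = frames_recovered(16, 2, 4)  # fewer levels: lower rate and quality
```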
The decoding apparatus described above can be incorporated into a mobile communication terminal, a media player, or the like.
As is apparent from the above description, a method for encoding and decoding a video signal according to the present invention increases coding efficiency, when encoding/decoding the video signal according to an inter-layer prediction method, by preventing a phase shift in the base layer and/or the enhancement layer caused by the downsampling and upsampling procedures.
Although the exemplary embodiments of the present invention have been disclosed for illustrative purposes, those skilled in the art will appreciate that various modifications, substitutions, and additions are possible without departing from the scope and spirit of the invention.

Claims (19)

1. A method of decoding a video signal, comprising:
predicting at least a portion of a current image in a current layer, the prediction being based on at least a portion of a base image in a base layer and offset information of samples in the predicted current image.
2. The method of claim 1, wherein the samples are luma samples.
3. The method of claim 1, wherein the samples are chroma samples.
4. The method of claim 1, wherein the samples are luma and chroma samples.
5. The method of claim 1, wherein the offset information is based on corresponding samples in the portion of the base image.
6. The method of claim 5, wherein the predicting step obtains the offset information from a header of a slice in the base layer.
7. The method of claim 5, wherein the offset information is phase shift information.
8. The method of claim 1, wherein the predicting step predicts the portion of the current image based on at least a portion of an upsampled portion of the base image and the offset information.
9. The method of claim 8, wherein the upsampling is performed based on the offset information.
10. The method of claim 7, wherein the offset information is phase shift information.
11. The method of claim 1, wherein the predicting step obtains the offset information from a header of a slice in the base layer.
12. The method of claim 11, wherein the predicting step determines the presence of the offset information based on an indicator in the header of the slice.
13. The method of claim 1, wherein the predicting step obtains the offset information from a sequence-level header of the current layer.
14. The method of claim 13, wherein the predicting step determines the presence of the offset information based on an indicator in the sequence-level header.
15. The method of claim 1, wherein the predicting step determines the presence of the offset information based on an indicator in one of the base layer and the current layer.
16. The method of claim 1, wherein the offset information is phase shift information.
17. A method of encoding a video signal, comprising:
encoding at least a portion of a current image in a current layer based on at least a portion of a base image in a base layer; and
recording offset information of samples in the predicted current image into the encoded video signal.
18. An apparatus for decoding a video signal, comprising:
a decoder for predicting at least a portion of a current image in a current layer, the prediction being based on at least a portion of a base image in a base layer and offset information of samples in the predicted current image.
19. An apparatus for encoding a video signal, comprising:
an encoder for encoding at least a portion of a current image in a current layer based on at least a portion of a base image in a base layer, and recording offset information of samples in the predicted current image into the encoded video signal.
CNA2006800161454A 2005-04-01 2006-03-31 Method for scalably encoding and decoding video signal Pending CN101176346A (en)

Applications Claiming Priority (5)

Application Number Priority Date Filing Date Title
US66711505P 2005-04-01 2005-04-01
US60/667,115 2005-04-01
US60/670,246 2005-04-12
US60/670,241 2005-04-12
KR1020050084744 2005-09-12

Publications (1)

Publication Number Publication Date
CN101176346A true CN101176346A (en) 2008-05-07

Family

ID=39423693

Family Applications (4)

Application Number Title Priority Date Filing Date
CN200680016296XA Active CN101176349B (en) 2005-04-01 2006-03-31 Method for scalably encoding and decoding video signal
CNA2006800161454A Pending CN101176346A (en) 2005-04-01 2006-03-31 Method for scalably encoding and decoding video signal
CN2006800161647A Active CN101176347B (en) 2005-04-01 2006-03-31 Method for scalably encoding and decoding video signal
CN2006800162955A Active CN101176348B (en) 2005-04-01 2006-03-31 Method for scalably encoding and decoding video signal

Family Applications Before (1)

Application Number Title Priority Date Filing Date
CN200680016296XA Active CN101176349B (en) 2005-04-01 2006-03-31 Method for scalably encoding and decoding video signal

Family Applications After (2)

Application Number Title Priority Date Filing Date
CN2006800161647A Active CN101176347B (en) 2005-04-01 2006-03-31 Method for scalably encoding and decoding video signal
CN2006800162955A Active CN101176348B (en) 2005-04-01 2006-03-31 Method for scalably encoding and decoding video signal

Country Status (1)

Country Link
CN (4) CN101176349B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105230018A (en) * 2013-05-24 2016-01-06 株式会社Kt For the method and apparatus of encoding to the video of support multiple layers

Families Citing this family (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20060105409A (en) 2005-04-01 2006-10-11 엘지전자 주식회사 Method for scalably encoding and decoding video signal
US8761252B2 (en) 2003-03-27 2014-06-24 Lg Electronics Inc. Method and apparatus for scalably encoding and decoding video signal
US8660180B2 (en) 2005-04-01 2014-02-25 Lg Electronics Inc. Method and apparatus for scalably encoding and decoding video signal
US8755434B2 (en) 2005-07-22 2014-06-17 Lg Electronics Inc. Method and apparatus for scalably encoding and decoding video signal
CN101742321B (en) * 2010-01-12 2011-07-27 浙江大学 Layer decomposition-based Method and device for encoding and decoding video
US11496760B2 (en) 2011-07-22 2022-11-08 Qualcomm Incorporated Slice header prediction for depth maps in three-dimensional video codecs
US9521418B2 (en) 2011-07-22 2016-12-13 Qualcomm Incorporated Slice header three-dimensional video extension for slice header prediction
US9288505B2 (en) * 2011-08-11 2016-03-15 Qualcomm Incorporated Three-dimensional video with asymmetric spatial resolution
US9485503B2 (en) 2011-11-18 2016-11-01 Qualcomm Incorporated Inside view motion prediction among texture and depth view components
US9420280B2 (en) * 2012-06-08 2016-08-16 Qualcomm Incorporated Adaptive upsampling filters
WO2014109594A1 (en) * 2013-01-10 2014-07-17 삼성전자 주식회사 Method for encoding inter-layer video for compensating luminance difference and device therefor, and method for decoding video and device therefor
US9813723B2 (en) * 2013-05-03 2017-11-07 Qualcomm Incorporated Conditionally invoking a resampling process in SHVC
US9906804B2 (en) * 2014-01-16 2018-02-27 Qualcomm Incorporated Reference layer sample position derivation for scalable video coding
TW202325025A (en) * 2021-11-26 2023-06-16 新加坡商聯發科技(新加坡)私人有限公司 Local illumination compensation with coded parameters

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6957350B1 (en) * 1996-01-30 2005-10-18 Dolby Laboratories Licensing Corporation Encrypted and watermarked temporal and resolution layering in advanced television
US6057884A (en) * 1997-06-05 2000-05-02 General Instrument Corporation Temporal and spatial scaleable coding for video object planes
US6510777B2 (en) * 1999-04-30 2003-01-28 Pinnacle Armor, Llc Encapsulated imbricated armor system
US6510177B1 (en) * 2000-03-24 2003-01-21 Microsoft Corporation System and method for layered video coding enhancement
CN1636394A (en) * 2000-10-11 2005-07-06 皇家菲利浦电子有限公司 Spatial scalability for fine granular video encoding
CN1251512C (en) * 2001-07-10 2006-04-12 皇家菲利浦电子有限公司 Method and device for generating a scalable coded video signal from a non-scalable coded video signal

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105230018A (en) * 2013-05-24 2016-01-06 株式会社Kt For the method and apparatus of encoding to the video of support multiple layers
CN105230018B (en) * 2013-05-24 2019-04-16 株式会社Kt For to the method and apparatus for supporting multiple layers of video to be encoded
US10349063B2 (en) 2013-05-24 2019-07-09 Kt Corporation Method and apparatus for coding video supporting plurality of layers

Also Published As

Publication number Publication date
CN101176348B (en) 2011-01-19
CN101176347B (en) 2010-05-19
CN101176349A (en) 2008-05-07
CN101176349B (en) 2010-09-01
CN101176347A (en) 2008-05-07
CN101176348A (en) 2008-05-07

Similar Documents

Publication Publication Date Title
CN101176348B (en) Method for scalably encoding and decoding video signal
CN100553321C (en) The coding dynamic filter
US7970057B2 (en) Method for scalably encoding and decoding video signal
RU2409005C2 (en) Method of scalable coding and decoding of video signal
CN102084655B (en) Video encoding by filter selection
US8532187B2 (en) Method and apparatus for scalably encoding/decoding video signal
CN104956679B (en) Image processing apparatus and image processing method
US20090052528A1 (en) Method and Apparatus for Encoding/Decoding Video Signal Using Block Prediction Information
WO2013164922A1 (en) Image processing device and image processing method
EP1032211A2 (en) Moving picture transcoding system
KR100880640B1 (en) Method for scalably encoding and decoding video signal
WO2013001939A1 (en) Image processing device and image processing method
KR100878824B1 (en) Method for scalably encoding and decoding video signal
KR100883604B1 (en) Method for scalably encoding and decoding video signal
US20060120454A1 (en) Method and apparatus for encoding/decoding video signal using motion vectors of pictures in base layer
KR100878825B1 (en) Method for scalably encoding and decoding video signal

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
REG Reference to a national code

Ref country code: HK

Ref legal event code: DE

Ref document number: 1119891

Country of ref document: HK

C12 Rejection of a patent application after its publication
RJ01 Rejection of invention patent application after publication

Open date: 20080507

REG Reference to a national code

Ref country code: HK

Ref legal event code: WD

Ref document number: 1119891

Country of ref document: HK