CN101860759B

CN101860759B - Encoding method and encoding device

Info

Publication number: CN101860759B
Application number: CN 200910132684
Authority: CN
Inventors: 郭宜; 刘盈嘉; 李厚强
Original assignee: University of Science and Technology of China USTC; Huawei Technologies Co Ltd
Current assignee: University of Science and Technology of China USTC; Huawei Technologies Co Ltd
Priority date: 2009-04-07
Filing date: 2009-04-07
Publication date: 2012-06-20
Anticipated expiration: 2029-04-07
Also published as: CN101860759A

Abstract

The embodiments of the invention disclose an encoding method and an encoding device. The method comprises: calculating the values of a Lagrangian cost function corresponding to each encoding mode in a mode set, where the mode set comprises a plurality of modes, a term proportional to the error concealment distortion is added to the Lagrangian cost function, and the error concealment distortion represents the error concealment capability of base layer information for enhancement layer information; selecting the encoding mode that minimizes the value of the Lagrangian cost function as the encoding mode of the base layer; and encoding the base layer according to the selected encoding mode. The embodiments of the invention increase the correlation between base layer and enhancement layer information, so that when the bitstream is correctly received at the base layer and the enhancement layer is lost, the base layer information is used to improve the error resilience of the overall bitstream and the overall video quality is not greatly affected.

Description

Coding method and coding device

Technical Field

The present invention relates to the field of video processing technologies, and in particular, to an encoding method and an encoding apparatus.

Background

In recent years, with the rapid development of video services for the Internet and wireless networks, the goal of video coding has shifted from the initial pursuit of a high compression ratio to making video streams adapt better to various network environments and user terminals, with a certain degree of fault tolerance and scalability. The best way to address this problem is SVC (Scalable Video Coding). The JVT (Joint Video Team) has incorporated SVC as an extension of the H.264/AVC (Advanced Video Coding) standard, and it has been formally accepted as an international standard.

SVC provides a single bitstream from which a number of sub-streams can be extracted. These sub-streams can meet the requirements of the network transmission rate and of end users with respect to the spatial resolution, temporal resolution, signal-to-noise ratio and other aspects of the video. The lowest quality layer that SVC can provide is called the base layer; a layer that enhances spatial resolution, temporal resolution, or signal-to-noise ratio is called an enhancement layer.

In SVC, loss of base layer information affects not only subsequent frames of the base layer but also the enhancement layer frames that refer to the base layer, so the base layer information is important. During SVC transmission, base layer frames may be given additional protection by UEP (Unequal Error Protection); for example, the base layer may be transmitted over a more reliable channel. Thus, in general, data loss occurs mainly in the enhancement layer.

The prior art provides a single-layer error concealment optimization method. It first assumes that the decoding end uses the MC (Motion Copy) error concealment method. Then, when the motion vector and mode of a macroblock in the current frame are selected, the distortion part of the Lagrangian cost function additionally accounts for how well that motion vector and mode can conceal errors in the macroblock at the corresponding position of the next frame in the current layer, so that if the decoding end loses the subsequent frame, it can be better recovered using the motion information of the current frame. The disadvantage of this prior art is that mode selection combined with current-layer error concealment optimization only strengthens the correlation between frames in the temporal direction within the current layer. It does not consider the influence across layers, namely the influence of the current layer's coding on the enhancement layer, and therefore cannot meet the requirement of improving the correlation between the base layer and the enhancement layer.

Disclosure of Invention

In view of the above, an object of the embodiments of the present invention is to provide an encoding method and an encoding apparatus, which can increase inter-layer correlation.

In order to achieve the above purpose, the embodiment of the present invention provides the following technical solutions:

a method of encoding, the method comprising:

calculating values of Lagrange cost functions corresponding to each coding mode in a mode set, wherein the mode set comprises a plurality of modes, the Lagrange cost functions comprise error concealment distortion, and the error concealment distortion represents error concealment capability of base layer information on enhancement layer information;

selecting an encoding mode capable of minimizing the value of the Lagrangian cost function as an encoding mode of the base layer;

encoding the base layer according to the selected encoding mode;

when the encoding mode of the base layer belongs to an Inter mode or an Inter-layer prediction mode, the error concealment distortion includes a sum of estimated errors between an enhancement layer original frame of a current macroblock and an error concealment value of an enhancement layer found by base layer information, which are calculated based on a sum of squared errors SSD criterion.

An encoding apparatus for encoding a base layer, the encoding apparatus comprising:

a calculation unit, configured to calculate values of Lagrangian cost functions corresponding to each coding mode in a mode set, wherein the mode set comprises a plurality of modes, the Lagrangian cost functions comprise error concealment distortion, and the error concealment distortion represents the error concealment capability of base layer information for enhancement layer information;

the mode selection unit is used for selecting the coding mode corresponding to the minimum value in the Lagrange cost function values calculated by the calculation unit;

a base layer encoding unit for encoding the base layer according to the mode selected by the mode selection unit;

the calculation unit includes:

a first error concealment distortion calculation unit, configured to calculate, based on an SSD criterion, a sum of estimated errors between an original frame of an enhancement layer of a current macroblock and an error concealment value of the enhancement layer found from base layer information, to obtain error concealment distortion when a coding mode of the base layer belongs to an Inter mode or an Inter-layer prediction mode;

and the first cost function calculation unit is used for calculating the value of the Lagrangian cost function according to the error concealment distortion obtained by the first error concealment distortion calculation unit.

According to the technical solutions disclosed in the embodiments of the invention, when mode selection is performed for the base layer, the error concealment capability of the base layer information for the enhancement layer is taken into account in order to increase the correlation between base layer and enhancement layer information. As a result, when the bitstream is correctly received at the base layer and the enhancement layer is lost, the base layer information is used to improve the error resilience of the overall bitstream, and the overall video quality is not greatly affected.

Drawings

In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings needed to be used in the description of the embodiments will be briefly introduced below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and it is obvious for those skilled in the art that other drawings can be obtained according to these drawings without creative efforts.

Fig. 1 is a flowchart of an encoding method according to an embodiment of the present invention;

Figs. 2 to 5 are schematic diagrams of RD curves obtained by applying the embodiment of the present invention to a Bus sequence under different enhancement layer packet loss rates;

Fig. 6 is a schematic diagram of an encoding apparatus according to a fourth embodiment of the present invention;

Fig. 7 is a schematic diagram of a structure of a calculation unit according to the fourth embodiment of the present invention;

Fig. 8 is a schematic diagram of a structure of a first error concealment distortion calculation unit according to the fourth embodiment of the present invention;

Fig. 9 is a schematic diagram of another structure of the first error concealment distortion calculation unit according to the fourth embodiment of the present invention;

Fig. 10 is a schematic diagram of another structure of the calculation unit according to the fourth embodiment of the present invention;

Fig. 11 is a schematic diagram of yet another structure of the calculation unit according to the fourth embodiment of the present invention.

Detailed Description

In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

Embodiments of the present invention will be described in detail below with reference to the accompanying drawings.

For a better understanding of the embodiment of the present invention, first, the process of the BLSkip (Base Layer Skip) error concealment method is discussed, and specifically, assuming that the nth frame of Layer l-1 is correctly received and the nth frame of Layer l is lost, the nth frame of Layer l is error concealed by the BLSkip error concealment method. Layer l-1 is defined as the base layer and layer l is the enhancement layer.

If the resolutions of layer l and layer l-1 are the same, the reconstruction performed by the BLSkip error concealment method can be expressed as follows:

<math> <mrow> <msubsup> <mover> <mi>f</mi> <mo>~</mo> </mover> <mrow> <mi>l</mi> <mo>,</mo> <mi>n</mi> </mrow> <mi>i</mi> </msubsup> <mo>=</mo> <msubsup> <mover> <mi>f</mi> <mo>~</mo> </mover> <mrow> <mi>l</mi> <mo>,</mo> <mi>n</mi> <mo>-</mo> <mn>1</mn> </mrow> <mrow> <mi>i</mi> <mo>+</mo> <msub> <mi>mv</mi> <mrow> <mi>l</mi> <mo>-</mo> <mn>1</mn> <mo>,</mo> <mi>n</mi> </mrow> </msub> </mrow> </msubsup> <mo>+</mo> <msubsup> <mover> <mi>r</mi> <mo>^</mo> </mover> <mrow> <mi>l</mi> <mo>-</mo> <mn>1</mn> <mo>,</mo> <mi>n</mi> </mrow> <mi>i</mi> </msubsup> <mo>,</mo> <mi>i</mi> <mo>&Element;</mo> <msub> <mi>MB</mi> <mi>m</mi> </msub> <mo>-</mo> <mo>-</mo> <mo>-</mo> <mrow> <mo>(</mo> <mn>1</mn> <mo>)</mo> </mrow> </mrow> </math>

wherein $\tilde{f}_{l,n}^{i}$ is the decoded reconstructed signal of the nth frame of enhancement layer l, $\tilde{f}_{l,n-1}^{\,i+mv_{l-1,n}}$ is the decoded reconstructed signal found in frame n-1 of enhancement layer l using the motion vector $mv_{l-1,n}$ of base layer l-1, and $\hat{r}_{l-1,n}^{i}$ is the reconstructed residual of base layer l-1. The reconstructed residual $\hat{r}_{l-1,n}^{i}$ is both the decoder-side and the encoder-side reconstructed residual, because the two are identical if the base layer is correctly received at the decoding end. MB_m is the mth macroblock, and i is the pixel coordinate.

The first term on the right-hand side of equation (1) is the prediction value found in frame n-1 of the enhancement layer using the motion vector of the base layer, and the second term is the residual of the base layer frame. Because the resolutions of layer l and layer l-1 are the same, BLSkip error concealment replaces the motion vector of the enhancement layer with the motion vector of the base layer, finds a motion compensation value in the reference frame of the enhancement layer, and then adds the residual value at the corresponding position of the base layer to that compensation value to obtain the final reconstruction of the enhancement layer. The motion vector of the enhancement layer is replaced with that of the base layer for the following reason: if no error occurs, the enhancement layer is reconstructed by adding the reconstructed residual of the base layer to the motion compensation of the enhancement layer; if an error occurs, the motion vector of the enhancement layer is lost, and since the base layer and the enhancement layer have the same resolution, it can be replaced with the motion vector of the base layer.
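The reconstruction in equation (1) can be illustrated with a minimal Python sketch. It is only an illustration under stated assumptions, not the reference decoder: frames are plain lists of rows, the motion vector is in integer pixels, displaced positions are clamped to the frame border, and samples are clipped to 8 bits. The function name `blskip_conceal_mb` and all parameter names are hypothetical.

```python
def blskip_conceal_mb(ref_enh, base_residual, mv, mb_x, mb_y, mb_size=16):
    """Conceal one lost enhancement layer macroblock following equation (1).

    ref_enh       -- reconstructed enhancement layer frame n-1 (list of rows)
    base_residual -- reconstructed residual of base layer frame n (list of rows)
    mv            -- (dx, dy) base layer motion vector, integer pixels (assumption)
    mb_x, mb_y    -- top-left pixel of the macroblock
    """
    height, width = len(ref_enh), len(ref_enh[0])
    dx, dy = mv
    out = []
    for y in range(mb_y, mb_y + mb_size):
        row = []
        for x in range(mb_x, mb_x + mb_size):
            # Clamp the displaced position to the frame (assumed border handling).
            ry = min(max(y + dy, 0), height - 1)
            rx = min(max(x + dx, 0), width - 1)
            # Motion-compensated prediction plus base layer residual.
            value = ref_enh[ry][rx] + base_residual[y][x]
            row.append(min(max(value, 0), 255))  # clip to the 8-bit range
        out.append(row)
    return out
```

A real codec would additionally handle sub-pixel motion vectors and per-partition motion; both are omitted here for brevity.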

If the resolutions of layer l and layer l-1 are not the same, for example, the base layer is QCIF (Quarter Common Intermediate Format) and the enhancement layer is CIF (Common Intermediate Format), one macroblock m of the base layer corresponds to four macroblocks m1, m2, m3, m4 of the enhancement layer, and the reconstruction of pixel i is defined as:

<math> <mrow> <msubsup> <mover> <mi>f</mi> <mo>~</mo> </mover> <mrow> <mi>l</mi> <mo>,</mo> <mi>n</mi> </mrow> <mi>i</mi> </msubsup> <mo>=</mo> <msubsup> <mover> <mi>f</mi> <mo>~</mo> </mover> <mrow> <mi>l</mi> <mo>,</mo> <mi>n</mi> <mo>-</mo> <mn>1</mn> </mrow> <mrow> <mi>i</mi> <mo>+</mo> <mi>S</mi> <mrow> <mo>(</mo> <msub> <mi>mv</mi> <mrow> <mi>l</mi> <mo>-</mo> <mn>1</mn> <mo>,</mo> <mi>n</mi> </mrow> </msub> <mo>)</mo> </mrow> </mrow> </msubsup> <mo>+</mo> <msup> <mrow> <mo>[</mo> <mi>U</mi> <mrow> <mo>(</mo> <msub> <mover> <mi>r</mi> <mo>^</mo> </mover> <mrow> <mi>l</mi> <mo>-</mo> <mn>1</mn> <mo>,</mo> <mi>n</mi> </mrow> </msub> <mo>)</mo> </mrow> <mo>]</mo> </mrow> <mi>i</mi> </msup> <mo>,</mo> <mi>i</mi> <mo>&Element;</mo> <mo>{</mo> <msub> <mi>MB</mi> <msub> <mi>m</mi> <mn>1</mn> </msub> </msub> <mo>,</mo> <msub> <mi>MB</mi> <msub> <mi>m</mi> <mn>2</mn> </msub> </msub> <mo>,</mo> <msub> <mi>MB</mi> <msub> <mi>m</mi> <mn>3</mn> </msub> </msub> <mo>,</mo> <msub> <mi>MB</mi> <msub> <mi>m</mi> <mn>4</mn> </msub> </msub> <mo>}</mo> <mo>-</mo> <mo>-</mo> <mo>-</mo> <mrow> <mo>(</mo> <mn>2</mn> <mo>)</mo> </mrow> </mrow> </math>

In the above equation, U(·) is the upsampling filtering of the residual of the entire frame. S(·) is the scaling-up of the motion vector: the motion vector of lower-layer macroblock m is mapped to the corresponding upper-layer macroblocks m1, m2, m3, m4, the motion vector value is multiplied by 2, and the mode is mapped accordingly.

$\tilde{f}_{l,n}^{i}$ is the decoded reconstructed signal of the nth frame of enhancement layer l, $\tilde{f}_{l,n-1}^{\,i+S(mv_{l-1,n})}$ is the decoded reconstructed signal found in frame n-1 of enhancement layer l using the scaled-up motion vector $mv_{l-1,n}$ of the base layer, and $\hat{r}_{l-1,n}$ is the reconstructed residual of base layer l-1. The reconstructed residual $\hat{r}_{l-1,n}$ is both the decoder-side and the encoder-side reconstructed residual, MB_m is the mth macroblock, and i is the pixel coordinate. Compared with the same-resolution case, the reconstruction process differs in that, when the resolutions differ, the base layer motion vector must be scaled up before use and the residual must be upsampled and filtered before use.
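The two operators S(·) and U(·) can be sketched in Python as follows. This is a rough illustration only: the text does not specify the upsampling filter, so simple pixel replication is used here as a stand-in, and the names `scale_mv` and `upsample_nearest` are hypothetical.

```python
def scale_mv(mv, factor=2):
    """S(.): scale a base layer motion vector to enhancement layer resolution.

    For QCIF -> CIF the text multiplies the motion vector value by 2.
    """
    return (mv[0] * factor, mv[1] * factor)


def upsample_nearest(plane, factor=2):
    """U(.): stand-in upsampler using pixel replication.

    Assumption: the real upsampling filter is not given in the text;
    replication only shows where U(.) sits in the data flow.
    """
    out = []
    for row in plane:
        # Repeat every sample `factor` times horizontally ...
        wide = [v for v in row for _ in range(factor)]
        # ... and every widened row `factor` times vertically.
        out.extend([list(wide) for _ in range(factor)])
    return out
```

With these helpers, equation (2) amounts to motion compensation in the enhancement layer reference frame with `scale_mv(mv)` and adding the corresponding sample of `upsample_nearest(residual)`.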

The processes described above all assume that the corresponding position in the base layer is coded in Inter mode. If the corresponding position is coded in Intra mode, the enhancement layer is reconstructed by the following two equations, for the cases where the resolutions of the base layer and the enhancement layer are the same and different, respectively:

<math> <mrow> <msubsup> <mover> <mi>f</mi> <mo>~</mo> </mover> <mrow> <mi>l</mi> <mo>,</mo> <mi>n</mi> </mrow> <mi>i</mi> </msubsup> <mo>=</mo> <msubsup> <mover> <mi>f</mi> <mo>^</mo> </mover> <mrow> <mi>l</mi> <mo>-</mo> <mn>1</mn> <mo>,</mo> <mi>n</mi> </mrow> <mi>i</mi> </msubsup> <mo>,</mo> <mi>i</mi> <mo>&Element;</mo> <msub> <mi>MB</mi> <mi>m</mi> </msub> <mo>-</mo> <mo>-</mo> <mo>-</mo> <mrow> <mo>(</mo> <mn>3</mn> <mo>)</mo> </mrow> </mrow> </math>

<math> <mrow> <msubsup> <mover> <mi>f</mi> <mo>~</mo> </mover> <mrow> <mi>l</mi> <mo>,</mo> <mi>n</mi> </mrow> <mi>i</mi> </msubsup> <mo>=</mo> <mi>U</mi> <msup> <mrow> <mo>(</mo> <msub> <mover> <mi>f</mi> <mo>^</mo> </mover> <mrow> <mi>l</mi> <mo>-</mo> <mn>1</mn> <mo>,</mo> <mi>n</mi> </mrow> </msub> <mo>)</mo> </mrow> <mi>i</mi> </msup> <mo>,</mo> <mi>i</mi> <mo>&Element;</mo> <mo>{</mo> <msub> <mi>MB</mi> <msub> <mi>m</mi> <mn>1</mn> </msub> </msub> <mo>,</mo> <msub> <mi>MB</mi> <msub> <mi>m</mi> <mn>2</mn> </msub> </msub> <mo>,</mo> <msub> <mi>MB</mi> <msub> <mi>m</mi> <mn>3</mn> </msub> </msub> <mo>,</mo> <msub> <mi>MB</mi> <msub> <mi>m</mi> <mn>4</mn> </msub> </msub> <mo>}</mo> <mo>-</mo> <mo>-</mo> <mo>-</mo> <mrow> <mo>(</mo> <mn>4</mn> <mo>)</mo> </mrow> </mrow> </math>

As can be seen from equations (3) and (4), when the corresponding position of the base layer is in Intra mode and the resolutions of the base layer and the enhancement layer are the same, the reconstructed value at the corresponding position of the base layer is used directly as the reconstructed value at the corresponding position of the enhancement layer; when the resolutions differ, the reconstructed value of the base layer must first be upsampled and filtered. Here, $\hat{f}$ is the encoded reconstructed signal.

In a network environment with packet loss, if an enhancement layer frame is lost and the base layer is correctly received, the BLSkip error concealment method can be used to recover the frame from the motion vectors, residual and other information of the base layer. While implementing the embodiments of the invention, the inventors found that the greater the correlation between base layer and enhancement layer information, the better the quality of the recovery. In the embodiments of the invention, the original information of the enhancement layer is introduced into the base layer macroblock mode selection process in order to calculate the error concealment capability of the base layer information for the enhancement layer, and the base layer mode, reference frame index, motion, residual and texture that allow the enhancement layer to be recovered with higher quality are selected, thereby improving the correlation between base layer and enhancement layer information.

Example one

The first embodiment provides an encoding method. In this embodiment, the decoding end performs error concealment on the enhancement layer using the BLSkip error concealment method. When the base layer is encoded, the error concealment capability of the base layer information for the enhancement layer frame is also taken into account; this capability reflects the correlation between layers. The mode of the base layer is an Inter mode.

In detail, as shown in fig. 1, the method includes:

Step S101: when a macroblock of the base layer is encoded, calculate the Lagrangian cost function values corresponding to all encoding modes in the mode set;

The Lagrangian cost function includes the source distortion, the code rate, and the error concealment capability of the base layer information for the enhancement layer, as estimated by the encoding end according to the BLSkip error concealment method. Specifically, the Lagrangian cost function can be expressed by the following equation:

J＝D_s+λR+ω·D_ec (5)

where D_s is the source distortion, i.e. the distortion measure between the original signal and the reconstructed signal, computed with the SSD (Sum of Squared Differences) criterion; R is the code rate, i.e. the bits used to encode a macroblock, including the macroblock header, motion information, transform quantization coefficients, etc.; λ is the Lagrange multiplier, and under the SSD criterion λ = 0.85 · 2^((QP-12)/3), where QP is the quantization parameter; if another criterion is used to compute the source distortion D_s, the cost function is computed with the corresponding Lagrange multiplier λ; D_ec is the error concealment distortion, i.e. the error concealment capability of the base layer information for the enhancement layer as estimated by the encoding end.
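As a small worked illustration of equation (5), the following Python sketch evaluates the Lagrange multiplier under the SSD criterion and the resulting cost J. The function names are hypothetical, and the numeric value chosen for the weight ω in any call is an assumption, since the text does not fix it.

```python
def lagrange_multiplier(qp):
    """Lagrange multiplier under the SSD criterion: 0.85 * 2^((QP - 12) / 3)."""
    return 0.85 * 2 ** ((qp - 12) / 3)


def lagrangian_cost(d_s, rate, d_ec, qp, omega):
    """Equation (5): J = D_s + lambda * R + omega * D_ec.

    d_s   -- source distortion (SSD between original and reconstruction)
    rate  -- bits spent on the macroblock
    d_ec  -- error concealment distortion
    omega -- weight of the error concealment term (value is an assumption)
    """
    return d_s + lagrange_multiplier(qp) * rate + omega * d_ec
```

Setting `omega` to 0 reduces the expression to the conventional rate-distortion cost D_s + λR, which makes the contribution of the error concealment term easy to isolate.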

The mode selection process can be tuned by adjusting the error concealment distortion weight ω. In this embodiment, for the case where the resolutions of the base layer and the enhancement layer are the same, D_ec can be calculated by:

<math> <mrow> <msub> <mi>D</mi> <mi>ec</mi> </msub> <mo>&cong;</mo> <munder> <mi>Σ</mi> <mrow> <mi>i</mi> <mo>&Element;</mo> <msub> <mi>MB</mi> <mi>m</mi> </msub> </mrow> </munder> <msup> <mrow> <mo>[</mo> <msubsup> <mi>f</mi> <mrow> <mi>l</mi> <mo>,</mo> <mi>n</mi> </mrow> <mi>i</mi> </msubsup> <mo>-</mo> <mrow> <mo>(</mo> <msubsup> <mover> <mi>f</mi> <mo>^</mo> </mover> <mrow> <mi>l</mi> <mo>,</mo> <mi>n</mi> <mo>-</mo> <mn>1</mn> </mrow> <mrow> <mi>i</mi> <mo>+</mo> <msub> <mi>mv</mi> <mrow> <mi>l</mi> <mo>-</mo> <mn>1</mn> <mo>,</mo> <mi>n</mi> </mrow> </msub> </mrow> </msubsup> <mo>+</mo> <msubsup> <mover> <mi>r</mi> <mo>^</mo> </mover> <mrow> <mi>l</mi> <mo>-</mo> <mn>1</mn> <mo>,</mo> <mi>n</mi> </mrow> <mi>i</mi> </msubsup> <mo>)</mo> </mrow> <mo>]</mo> </mrow> <mn>2</mn> </msup> <mo>-</mo> <mo>-</mo> <mo>-</mo> <mrow> <mo>(</mo> <mn>6</mn> <mo>)</mo> </mrow> </mrow> </math>

wherein, since layer l has not yet been encoded and reconstructed when the macroblock of layer l-1 is encoded, the original frame f of layer l is used in the calculation here. In formula (6), $f_{l,n}^{i}$ is the original signal of the enhancement layer, $\hat{f}_{l,n-1}^{\,i+mv_{l-1,n}}$ is the reconstructed signal found in frame n-1 of enhancement layer l using the motion vector $mv_{l-1,n}$ of the base layer, $\hat{r}_{l-1,n}^{i}$ is the reconstructed residual of base layer l-1, mv is the base layer motion vector, MB_m is the mth macroblock, i is the pixel coordinate, and $\hat{f}_{l,n-1}^{\,i+mv_{l-1,n}} + \hat{r}_{l-1,n}^{i}$ is the enhancement layer error concealment value found from the base layer information.

It can be seen from formula (6) that this error concealment value is the sum of the motion compensation value, found in the reconstructed enhancement layer reference frame using base layer information while the base layer is being encoded, and the residual value; that is, the base layer information may include motion and residual. $f_{l,n}^{i} - (\hat{f}_{l,n-1}^{\,i+mv_{l-1,n}} + \hat{r}_{l-1,n}^{i})$ is the estimated error, so formula (6) in effect computes the sum of the estimated errors using the SSD criterion.
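Formula (6) can be sketched in Python as an SSD between the enhancement layer original and the value that BLSkip concealment would produce. The sketch assumes integer-pixel motion vectors, list-of-rows frames and border clamping; the name `ec_distortion_inter` is hypothetical.

```python
def ec_distortion_inter(orig_enh, ref_enh, base_residual, mv,
                        mb_x, mb_y, mb_size=16):
    """Error concealment distortion D_ec per formula (6), same resolution.

    orig_enh      -- original enhancement layer frame n
    ref_enh       -- reconstructed enhancement layer frame n-1
    base_residual -- reconstructed residual of base layer frame n
    mv            -- (dx, dy) candidate base layer motion vector (integer pixels)
    """
    height, width = len(ref_enh), len(ref_enh[0])
    dx, dy = mv
    total = 0
    for y in range(mb_y, mb_y + mb_size):
        for x in range(mb_x, mb_x + mb_size):
            # Clamp displaced coordinates to the frame (assumed border handling).
            ry = min(max(y + dy, 0), height - 1)
            rx = min(max(x + dx, 0), width - 1)
            # Error concealment value: motion compensation + base residual.
            concealed = ref_enh[ry][rx] + base_residual[y][x]
            diff = orig_enh[y][x] - concealed
            total += diff * diff  # SSD accumulation
    return total
```

Each candidate base layer mode yields its own motion vector and residual, so this function would be evaluated once per candidate during mode selection.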

For the case that the resolutions of the base layer and the enhancement layer are different, for example, the base layer is QCIF (Quarter Common Intermediate Format), the enhancement layer is CIF (Common Intermediate Format), one macroblock m of the base layer corresponds to four macroblocks m1, m2, m3, m4 of the enhancement layer, and the error concealment distortion of the macroblock m of the base layer can be defined as:

<math> <mrow> <msub> <mi>D</mi> <mi>ec</mi> </msub> <mo>&cong;</mo> <munder> <mi>Σ</mi> <mrow> <mi>i</mi> <mo>&Element;</mo> <mo>{</mo> <msub> <mi>MB</mi> <msub> <mi>m</mi> <mn>1</mn> </msub> </msub> <mo>,</mo> <msub> <mi>MB</mi> <msub> <mi>m</mi> <mn>2</mn> </msub> </msub> <mo>,</mo> <msub> <mi>MB</mi> <msub> <mi>m</mi> <mn>3</mn> </msub> </msub> <mo>,</mo> <msub> <mi>MB</mi> <msub> <mi>m</mi> <mn>4</mn> </msub> </msub> <mo>}</mo> </mrow> </munder> <msup> <mrow> <mo>{</mo> <msubsup> <mi>f</mi> <mrow> <mi>l</mi> <mo>,</mo> <mi>n</mi> </mrow> <mi>i</mi> </msubsup> <mo>-</mo> <mrow> <mo>(</mo> <msubsup> <mover> <mi>f</mi> <mo>^</mo> </mover> <mrow> <mi>l</mi> <mo>,</mo> <mi>n</mi> <mo>-</mo> <mn>1</mn> </mrow> <mrow> <mi>i</mi> <mo>+</mo> <mi>S</mi> <mrow> <mo>(</mo> <msub> <mi>mv</mi> <mrow> <mi>l</mi> <mo>-</mo> <mn>1</mn> <mo>,</mo> <mi>n</mi> </mrow> </msub> <mo>)</mo> </mrow> </mrow> </msubsup> <mo>+</mo> <mi>U</mi> <msup> <mrow> <mo>(</mo> <msub> <mover> <mi>r</mi> <mo>^</mo> </mover> <mrow> <mi>l</mi> <mo>-</mo> <mn>1</mn> <mo>,</mo> <mi>n</mi> </mrow> </msub> <mo>)</mo> </mrow> <mi>i</mi> </msup> <mo>)</mo> </mrow> <mo>}</mo> </mrow> <mn>2</mn> </msup> <mo>-</mo> <mo>-</mo> <mo>-</mo> <mrow> <mo>(</mo> <mn>7</mn> <mo>)</mo> </mrow> </mrow> </math>

In the above equation, U(·) is the upsampling filtering of the residual of the entire frame. S(·) is the scaling-up of the motion vector: the motion vector of lower-layer macroblock m is mapped to the corresponding upper-layer macroblocks m1, m2, m3, m4, the motion vector value is multiplied by 2, and the mode is mapped accordingly. Compared with the same-resolution case, the reconstruction process differs in that, when the resolutions differ, the base layer motion vector must be scaled up before use and the residual must be upsampled and filtered before use.

$\hat{f}_{l,n-1}^{\,i+S(mv_{l-1,n})}$ is the reconstructed signal found in frame n-1 of enhancement layer l using the scaled-up motion vector $mv_{l-1,n}$ of the base layer, $\hat{r}_{l-1,n}$ is the reconstructed residual of base layer l-1, MB_m is the mth macroblock, and i is the pixel coordinate.

It can be seen that, when the resolutions of the base layer and the enhancement layer differ, the motion information must be scaled up before use and the residual must be upsampled and filtered before use. In formula (7), $\hat{f}_{l,n-1}^{\,i+S(mv_{l-1,n})} + U(\hat{r}_{l-1,n})^{i}$ is likewise the enhancement layer error concealment value found from the base layer information.

In step S101, the mode set may include the following modes: Skip or Direct, Inter16 × 16, Inter16 × 8, Inter8 × 16, Inter8 × 8, inter-layer motion prediction and inter-layer residual prediction.

It should be noted that, if layer l-1 is the lowest layer, the mode set may include the following modes: Skip or Direct, Inter16 × 16, Inter16 × 8, Inter8 × 16, and Inter8 × 8.

If layer l-1 is the highest layer, there is no layer l serving as an enhancement layer, so no enhancement layer needs to be considered when mode selection is performed for the current layer l-1. There is therefore no error concealment distortion representing the error concealment capability of the base layer information for an enhancement layer, and the Lagrangian cost function does not include D_ec. In this case, the mode set may include the following modes: Intra4 × 4, Intra8 × 8, Intra16 × 16, Skip or Direct, Inter16 × 16, Inter16 × 8, Inter8 × 16, Inter8 × 8, inter-layer motion prediction and inter-layer residual prediction.
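The three cases above can be summarized in a short Python sketch that returns the candidate mode set together with a flag saying whether D_ec enters the cost function. The mode names are illustrative strings, the last two modes of the general list are read here as the inter-layer prediction modes (an interpretation of the translated text), and the function name `candidate_modes` is hypothetical.

```python
INTER_MODES = ["Skip/Direct", "Inter16x16", "Inter16x8", "Inter8x16", "Inter8x8"]
INTER_LAYER_MODES = ["InterLayerMotionPrediction", "InterLayerResidualPrediction"]
INTRA_MODES = ["Intra4x4", "Intra8x8", "Intra16x16"]


def candidate_modes(is_lowest_layer, is_highest_layer):
    """Return (mode_set, use_d_ec) for the layer currently being encoded.

    Note: a layer that is both lowest and highest (single-layer stream)
    is not discussed in the text and is not treated specially here.
    """
    if is_highest_layer:
        # No enhancement layer above: D_ec is dropped, Intra modes allowed.
        return INTRA_MODES + INTER_MODES + INTER_LAYER_MODES, False
    if is_lowest_layer:
        # Lowest layer: no lower layer to predict from.
        return list(INTER_MODES), True
    # Intermediate layer acting as base layer for the layer above it.
    return INTER_MODES + INTER_LAYER_MODES, True
```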

While implementing the embodiments of the invention, the inventors found that, when the resolutions of the base layer and the enhancement layer differ, the upsampling filtering performed when estimating the error concealment capability of the base layer information for the enhancement layer requires a large amount of computation. To further reduce computational complexity and save computation time, this embodiment may disregard the residual information when calculating the error concealment distortion D_ec. Thus, for the cases where the resolutions of the base layer and the enhancement layer are the same and different, the error concealment distortion D_ec can be computed with the following two equations, respectively:

<math> <mrow> <msub> <mi>D</mi> <mi>ec</mi> </msub> <mo>&cong;</mo> <munder> <mi>Σ</mi> <mrow> <mi>i</mi> <mo>&Element;</mo> <msub> <mi>MB</mi> <mi>m</mi> </msub> </mrow> </munder> <msup> <mrow> <mo>[</mo> <msubsup> <mi>f</mi> <mrow> <mi>l</mi> <mo>,</mo> <mi>n</mi> </mrow> <mi>i</mi> </msubsup> <mo>-</mo> <msubsup> <mover> <mi>f</mi> <mo>^</mo> </mover> <mrow> <mi>l</mi> <mo>,</mo> <mi>n</mi> <mo>-</mo> <mn>1</mn> </mrow> <mrow> <mi>i</mi> <mo>+</mo> <msub> <mi>mv</mi> <mrow> <mi>l</mi> <mo>-</mo> <mn>1</mn> <mo>,</mo> <mi>n</mi> </mrow> </msub> </mrow> </msubsup> <mo>]</mo> </mrow> <mn>2</mn> </msup> <mo>-</mo> <mo>-</mo> <mo>-</mo> <mrow> <mo>(</mo> <mn>8</mn> <mo>)</mo> </mrow> </mrow> </math>

<math> <mrow> <msub> <mi>D</mi> <mi>ec</mi> </msub> <mo>&cong;</mo> <munder> <mi>Σ</mi> <mrow> <mi>i</mi> <mo>&Element;</mo> <mo>{</mo> <msub> <mi>MB</mi> <msub> <mi>m</mi> <mn>1</mn> </msub> </msub> <mo>,</mo> <msub> <mi>MB</mi> <msub> <mi>m</mi> <mn>2</mn> </msub> </msub> <mo>,</mo> <msub> <mi>MB</mi> <msub> <mi>m</mi> <mn>3</mn> </msub> </msub> <mo>,</mo> <msub> <mi>MB</mi> <msub> <mi>m</mi> <mn>4</mn> </msub> </msub> <mo>}</mo> </mrow> </munder> <msup> <mrow> <mo>[</mo> <msubsup> <mi>f</mi> <mrow> <mi>l</mi> <mo>,</mo> <mi>n</mi> </mrow> <mi>i</mi> </msubsup> <mo>-</mo> <msubsup> <mover> <mi>f</mi> <mo>^</mo> </mover> <mrow> <mi>l</mi> <mo>,</mo> <mi>n</mi> <mo>-</mo> <mn>1</mn> </mrow> <mrow> <mi>i</mi> <mo>+</mo> <mi>S</mi> <mrow> <mo>(</mo> <msub> <mi>mv</mi> <mrow> <mi>l</mi> <mo>-</mo> <mn>1</mn> <mo>,</mo> <mi>n</mi> </mrow> </msub> <mo>)</mo> </mrow> </mrow> </msubsup> <mo>]</mo> </mrow> <mn>2</mn> </msup> <mo>-</mo> <mo>-</mo> <mo>-</mo> <mrow> <mo>(</mo> <mn>9</mn> <mo>)</mo> </mrow> </mrow> </math>

When calculating the error concealment distortion, the encoded reconstructed signal $\hat{f}$ in equations (6), (7), (8) and (9) may be replaced by the original signal f.
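The residual-free variants in equations (8) and (9) can be sketched with a single Python function, where `scale=1` corresponds to equation (8) and `scale=2` to equation (9) for a QCIF base layer and CIF enhancement layer. Integer-pixel motion vectors, list-of-rows frames and border clamping are assumptions, and the name `ec_distortion_motion_only` is hypothetical.

```python
def ec_distortion_motion_only(orig_enh, ref_enh, base_mv,
                              mb_x, mb_y, mb_size=16, scale=1):
    """D_ec without residual: equation (8) for scale=1, (9) for scale=2.

    mb_x, mb_y -- top-left pixel of the base layer macroblock
    base_mv    -- (dx, dy) base layer motion vector, integer pixels
    scale      -- resolution ratio between enhancement and base layer
    """
    height, width = len(ref_enh), len(ref_enh[0])
    dx, dy = base_mv[0] * scale, base_mv[1] * scale  # S(mv)
    # One base macroblock covers scale*scale enhancement macroblocks.
    x0, y0, size = mb_x * scale, mb_y * scale, mb_size * scale
    total = 0
    for y in range(y0, y0 + size):
        for x in range(x0, x0 + size):
            # Clamp displaced coordinates to the frame (assumed border handling).
            ry = min(max(y + dy, 0), height - 1)
            rx = min(max(x + dx, 0), width - 1)
            diff = orig_enh[y][x] - ref_enh[ry][rx]
            total += diff * diff
    return total
```

Because no residual is involved, the upsampling filter U(·) is never evaluated, which is where the saving in computation comes from.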

Step S102: selecting an encoding mode capable of minimizing the value of the Lagrangian cost function as an encoding mode of the base layer;

step S103: the base layer is encoded according to the encoding mode selected in step S102.
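Steps S101 to S103 can be outlined in Python as a minimum search over per-mode costs. This is only a sketch: the per-mode triples (D_s, R, D_ec) are assumed to have been measured already, the function name `select_base_layer_mode` is hypothetical, and the numbers in the usage note are invented purely for illustration.

```python
def select_base_layer_mode(candidates, qp, omega):
    """Pick the mode with the smallest J = D_s + lambda * R + omega * D_ec.

    candidates -- dict mapping mode name -> (d_s, rate, d_ec)
    qp         -- quantization parameter
    omega      -- weight of the error concealment distortion (assumed value)
    """
    lam = 0.85 * 2 ** ((qp - 12) / 3)  # Lagrange multiplier under SSD
    best_mode, best_cost = None, float("inf")
    for mode, (d_s, rate, d_ec) in candidates.items():
        cost = d_s + lam * rate + omega * d_ec  # step S101
        if cost < best_cost:                    # step S102
            best_mode, best_cost = mode, cost
    return best_mode, best_cost  # the caller encodes with best_mode (step S103)
```

For example, with the made-up candidates `{"Inter16x16": (100, 20, 400), "Inter8x8": (90, 40, 100)}` and QP 12, the function returns `Inter16x16` for `omega=0` but `Inter8x8` for `omega=0.5`, showing how the error concealment term can change the decision.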

In this embodiment, when mode selection is performed for the base layer, a term proportional to the error concealment capability of the base layer information for the enhancement layer is added to the Lagrangian cost function. Selecting the mode that minimizes the Lagrangian cost function therefore also takes this error concealment capability into account, which increases the correlation between the base layer and the enhancement layer. As a result, when the bitstream is correctly received at the base layer and the enhancement layer is lost, the base layer information is used to improve the error resilience of the overall bitstream, and the overall video quality is not greatly affected.

Example two

In the encoding method provided by this embodiment, the situation that the Intra mode exists in the mode set is considered, and the error concealment distortion D is calculated by adopting different modes for the Intra mode_ec。

The encoding method provided in this embodiment is similar to the embodiment, and it is first necessary to select the mode with the smallest cost function value from the mode set, and then encode the base layer according to the mode. Specifically, for the case that there is an Intra mode in the mode set, the calculation cost function can still be calculated by the equation (5) in the first embodiment, and all possible modes are included in the alternative mode set.

The second embodiment differs from the first embodiment in that the error concealment distortion D_ec is calculated differently for the Inter mode and the inter-layer prediction modes on the one hand and for the Intra mode on the other. For the Inter mode and the inter-layer prediction modes (including inter-layer texture prediction, inter-layer motion prediction and inter-layer residual prediction), D_ec can still be calculated with equations (6) and (7) of the first embodiment, for the cases in which the base layer and the enhancement layer have the same or different resolutions respectively; similarly, to further reduce the amount of computation, equations (8) and (9) may be used instead. For the Intra mode, D_ec can be calculated using equations (10) and (11).

For the Intra mode, when the base layer and the enhancement layer have the same resolution, the error concealment distortion D_ec can be calculated by the following equation:

<math> <mrow> <msub> <mi>D</mi> <mi>ec</mi> </msub> <mo>&cong;</mo> <munder> <mi>Σ</mi> <mrow> <mi>i</mi> <mo>&Element;</mo> <msub> <mi>MB</mi> <mi>m</mi> </msub> </mrow> </munder> <msup> <mrow> <mo>[</mo> <msubsup> <mi>f</mi> <mrow> <mi>l</mi> <mo>,</mo> <mi>n</mi> </mrow> <mi>i</mi> </msubsup> <mo>-</mo> <msubsup> <mover> <mi>f</mi> <mo>^</mo> </mover> <mrow> <mi>l</mi> <mo>-</mo> <mn>1</mn> <mo>,</mo> <mi>n</mi> </mrow> <mi>i</mi> </msubsup> <mo>]</mo> </mrow> <mn>2</mn> </msup> <mo>-</mo> <mo>-</mo> <mo>-</mo> <mrow> <mo>(</mo> <mn>10</mn> <mo>)</mo> </mrow> </mrow> </math>

In the above formula, f̂_{l-1,n}^i is the texture reconstruction value of the base layer. As the formula shows, when the base layer and the enhancement layer have the same resolution, the error concealment capability for the corresponding position of the enhancement layer is estimated from the texture reconstruction value of the base layer. The bracketed difference f_{l,n}^i - f̂_{l-1,n}^i in formula (10) is the estimation error, and the estimation errors are accumulated using the SSD criterion.
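Formula (10) can be written out as a short sketch, assuming macroblocks are held as nested lists of pixel values; the function name and the 2x2 sample blocks are hypothetical and stand in for real macroblock data.

```python
def d_ec_intra_same_resolution(enh_original_mb, base_recon_mb):
    """Sum of squared differences between the enhancement layer original pixels
    and the co-located base layer texture reconstruction, as in formula (10)."""
    if len(enh_original_mb) != len(base_recon_mb):
        raise ValueError("blocks must have the same number of rows")
    total = 0
    for row_f, row_f_hat in zip(enh_original_mb, base_recon_mb):
        if len(row_f) != len(row_f_hat):
            raise ValueError("blocks must have the same number of columns")
        for f, f_hat in zip(row_f, row_f_hat):
            total += (f - f_hat) ** 2  # squared estimation error for one pixel
    return total


# Toy 2x2 "macroblock" (illustrative values, not measured data)
enh_original = [[10, 12], [14, 16]]
base_recon = [[9, 12], [15, 18]]
print(d_ec_intra_same_resolution(enh_original, base_recon))
```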

For the Intra mode, when the resolutions of the base layer and the enhancement layer are different, for example when the base layer is QCIF and the enhancement layer is CIF, the error concealment distortion D_ec can be calculated using the following formula:

<math> <mrow> <msub> <mi>D</mi> <mi>ec</mi> </msub> <mo>&cong;</mo> <munder> <mi>Σ</mi> <mrow> <mi>i</mi> <mo>&Element;</mo> <mo>{</mo> <msub> <mi>MB</mi> <msub> <mi>m</mi> <mn>1</mn> </msub> </msub> <mo>,</mo> <msub> <mi>MB</mi> <msub> <mi>m</mi> <mn>2</mn> </msub> </msub> <mo>,</mo> <msub> <mi>MB</mi> <msub> <mi>m</mi> <mn>3</mn> </msub> </msub> <mo>,</mo> <msub> <mi>MB</mi> <msub> <mi>m</mi> <mn>4</mn> </msub> </msub> <mo>}</mo> </mrow> </munder> <msup> <mrow> <mo>[</mo> <msubsup> <mi>f</mi> <mrow> <mi>l</mi> <mo>,</mo> <mi>n</mi> </mrow> <mi>i</mi> </msubsup> <mo>-</mo> <mi>U</mi> <msup> <mrow> <mo>(</mo> <msub> <mover> <mi>f</mi> <mo>^</mo> </mover> <mrow> <mi>l</mi> <mo>-</mo> <mn>1</mn> <mo>,</mo> <mi>n</mi> </mrow> </msub> <mo>)</mo> </mrow> <mi>i</mi> </msup> <mo>]</mo> </mrow> <mn>2</mn> </msup> <mo>-</mo> <mo>-</mo> <mo>-</mo> <mrow> <mo>(</mo> <mn>11</mn> <mo>)</mo> </mrow> </mrow> </math>

It can be seen from the above equation that the error concealment capability for the enhancement layer is estimated by upsampling the texture reconstruction value of the base layer, where f_{l,n}^i - U(f̂_{l-1,n})^i is the estimation error, and the estimation errors are accumulated using the SSD criterion. It should be noted that the filter used to upsample the texture reconstruction value in equation (11) differs from the filter used to upsample the residual in the first embodiment: a 4-tap filter is used for the texture reconstruction value in equation (11), whereas a 2-tap filter is used for the residual in the first embodiment. The details of this difference are described in the SVC standard and are not repeated here.
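The structure of formula (11) can be sketched as below. Note the loud assumption: the upsampling operator U is replaced here by simple 2x pixel replication, which is only a stand-in and does not reproduce the 4-tap filter of the SVC standard; all function names and sample values are hypothetical.

```python
def upsample_2x_replicate(block):
    """Stand-in for U(.): doubles width and height by pixel replication.
    This is NOT the 4-tap SVC texture filter, only a placeholder."""
    out = []
    for row in block:
        wide = [p for p in row for _ in range(2)]  # repeat each pixel horizontally
        out.append(wide)
        out.append(list(wide))                     # repeat each row vertically
    return out


def d_ec_intra_diff_resolution(enh_original_region, base_recon_mb,
                               upsample=upsample_2x_replicate):
    """SSD between the enhancement layer original pixels covered by one base layer
    macroblock and the upsampled base layer texture reconstruction."""
    up = upsample(base_recon_mb)
    if len(up) != len(enh_original_region) or any(
        len(a) != len(b) for a, b in zip(up, enh_original_region)
    ):
        raise ValueError("upsampled block and enhancement region differ in size")
    return sum(
        (f - u) ** 2
        for row_f, row_u in zip(enh_original_region, up)
        for f, u in zip(row_f, row_u)
    )


# Toy data: a 1x2 base layer block covers a 2x4 enhancement layer region
base_recon = [[10, 20]]
enh_region = [[10, 11, 20, 22], [9, 10, 21, 20]]
print(d_ec_intra_diff_resolution(enh_region, base_recon))
```

Passing the upsampler as a parameter keeps the SSD accumulation separate from the filter choice, so a standard-conformant filter could be substituted without touching the distortion code.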

In addition, when the base layer is in an Intra mode, regardless of whether the decoding end uses single-loop or multi-loop decoding, the encoding end can directly use the texture information of the base layer, or the upsampled texture information of the base layer, when it estimates the error concealment distortion for the enhancement layer with the BLSkip error concealment method. Residual information therefore does not need to be considered when calculating the error concealment distortion.

When calculating the error concealment distortion for the Intra mode, the texture reconstruction value f̂_{l-1,n} in equations (10) and (11) may be replaced by the original signal f.

Example three

In the encoding method provided in this embodiment, a certain extension is performed on the lagrangian cost function, and in detail, the cost function is expressed as follows:

As can be seen from the above formula, the cost function in this embodiment differs from that of the first and second embodiments in that, for the Inter mode or an inter-layer prediction mode, the reference frame propagation distortion D_{ep_ref} is added, which is used for Intra mode refresh. The calculation of D_{ep_ref} can follow the prior art and is not described in detail here.

The following describes a specific application process of the embodiment of the present invention by taking the first embodiment as an example.

The method of the first embodiment was integrated into the JSVM reference software, and an experimental environment meeting the following conditions was set up:

1) Sequences: Bus, Foreman.

2) Two layers. Base layer: QCIF, 30 Hz; enhancement layer: CIF, 30 Hz.

3) Base layer quantization parameter: QP0 = 32; enhancement layer quantization parameter: QP1 = 20, 24, 28, 32.

4) The GOP structure is IPPP ….

5) The Intra frame period is 30.

6) 4000 frames are coded.

7) ω is 0.25, 0.5 and 1.0 respectively.

8) With Inter frames and no Intra mode.

Packet loss is simulated with the tool from "SVC/AVC Loss Simulator", JVT-Q069, Oct. 2005, by Y. Guo, H. Li and Y. Wang. The base layer is received completely and correctly, while packet loss rates of 3%, 5%, 10% and 20% are applied to the enhancement layer. The packet loss pattern files are those provided in S. Wenger, "Error patterns for Internet experiments", ITU-T SG16 Doc. Q15-I-16r1, Oct. 1999, and each frame is carried in one packet.

The ratios at which the enhancement layer selects the same mode as the base layer were measured for the Bus and Foreman sequences, as shown in Tables 1 and 2:

Table 1 Ratio (%) at which the enhancement layer selects the same mode as the base layer, Bus sequence

            Anchor    ω=0.25    ω=0.5     ω=1
QP1=20      9.65      13.06     13.58     13.69
QP1=24      9.44      12.52     12.88     13.38
QP1=28      10.09     12.97     13.51     13.83
QP1=32      13.40     16.93     17.64     18.08

Table 2 Ratio (%) at which the enhancement layer selects the same mode as the base layer, Foreman sequence

            Anchor    ω=0.25    ω=0.5     ω=1
QP1=20      9.74      13.38     13.92     14.35
QP1=24      10.32     13.85     14.40     14.93
QP1=28      13.18     16.86     17.67     18.42
QP1=32      18.88     23.11     24.49     25.39

As can be seen from Tables 1 and 2, the larger ω is, the larger the percentage of enhancement layer selections that coincide with the base layer mode, which means that the inter-layer motion vector correlation increases. The method of the present invention can therefore increase the inter-layer correlation and improve the error resilience of the enhancement layer in a packet loss environment.
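The trend read from Tables 1 and 2 can be checked mechanically. The sketch below copies the table values into a dictionary (the variable and function names are hypothetical) and tests whether the ratio grows strictly from Anchor through ω = 0.25, 0.5 and 1 for every QP1.

```python
# Values copied from Tables 1 and 2; each list is [Anchor, w=0.25, w=0.5, w=1]
ratios = {
    "Bus": {
        20: [9.65, 13.06, 13.58, 13.69],
        24: [9.44, 12.52, 12.88, 13.38],
        28: [10.09, 12.97, 13.51, 13.83],
        32: [13.40, 16.93, 17.64, 18.08],
    },
    "Foreman": {
        20: [9.74, 13.38, 13.92, 14.35],
        24: [10.32, 13.85, 14.40, 14.93],
        28: [13.18, 16.86, 17.67, 18.42],
        32: [18.88, 23.11, 24.49, 25.39],
    },
}


def increases_with_omega(row):
    # True when every entry is strictly larger than the one before it
    return all(a < b for a, b in zip(row, row[1:]))


all_increasing = all(
    increases_with_omega(row)
    for table in ratios.values()
    for row in table.values()
)

# Difference between w=1 and Anchor, in percentage points, per sequence and QP1
gains = {
    (seq, qp): round(row[-1] - row[0], 2)
    for seq, table in ratios.items()
    for qp, row in table.items()
}
print(all_increasing, gains[("Bus", 20)], gains[("Foreman", 32)])
```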

In addition, the embodiment of the present invention with ω = 0.25 is compared with the conventional method (corresponding to ω = 0, that is, packet loss is not taken into account; denoted Anchor). Taking the Bus sequence as an example, the RD curves at each enhancement layer packet loss rate (PLR, Packet Loss Rate) are shown in fig. 2 to fig. 5 respectively. As can be seen from the figures, the PSNR (Peak Signal-to-Noise Ratio) is clearly improved at every packet loss rate; in particular, at a packet loss rate of 20% the gain is 1 dB or better, which demonstrates the effectiveness of the embodiment of the present invention.

Example four

The present embodiment correspondingly provides an encoding apparatus for encoding a base layer, as shown in fig. 6, the encoding apparatus 600 includes:

a calculating unit 601, configured to calculate a value of a cost function corresponding to each mode in the mode set;

a mode selecting unit 602, configured to select a minimum value from the cost function values calculated by the calculating unit, and select a mode corresponding to the minimum value;

a base layer encoding unit 603 configured to encode the base layer according to the mode selected by the mode selection unit 602.
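The cooperation of units 601, 602 and 603 can be pictured with a minimal structural sketch. The class names mirror the unit names in the text, but the method names, the callable parameters and the toy cost values are hypothetical; real units would operate on macroblock data rather than on a lookup table.

```python
class CalculatingUnit:
    """Unit 601: computes the cost function value for each mode in the mode set."""

    def __init__(self, cost_fn):
        self.cost_fn = cost_fn  # hypothetical callable: mode -> cost value

    def costs(self, mode_set):
        return {mode: self.cost_fn(mode) for mode in mode_set}


class ModeSelectingUnit:
    """Unit 602: selects the mode whose cost function value is minimal."""

    def select(self, costs):
        return min(costs, key=costs.get)


class BaseLayerEncodingUnit:
    """Unit 603: encodes the base layer according to the selected mode."""

    def __init__(self, encode_fn):
        self.encode_fn = encode_fn  # hypothetical callable: mode -> encoded output

    def encode(self, mode):
        return self.encode_fn(mode)


class EncodingApparatus:
    """Apparatus 600: wires the three units together."""

    def __init__(self, calculating_unit, mode_selecting_unit, base_layer_encoding_unit):
        self.calculating_unit = calculating_unit
        self.mode_selecting_unit = mode_selecting_unit
        self.base_layer_encoding_unit = base_layer_encoding_unit

    def run(self, mode_set):
        costs = self.calculating_unit.costs(mode_set)
        mode = self.mode_selecting_unit.select(costs)
        return self.base_layer_encoding_unit.encode(mode)


# Toy cost table (illustrative values only)
toy_costs = {"Inter16x16": 400.0, "Inter8x8": 305.0, "Intra16x16": 520.0}
apparatus = EncodingApparatus(
    CalculatingUnit(toy_costs.get),
    ModeSelectingUnit(),
    BaseLayerEncodingUnit(lambda mode: f"encoded with {mode}"),
)
result = apparatus.run(list(toy_costs))
print(result)
```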

The calculating unit 601 is composed of different units depending on the coding mode of the base layer and on whether the resolutions of the base layer and the enhancement layer are the same. Specifically, when the coding mode of the base layer belongs to the Inter mode or the Inter-layer prediction mode, as shown in fig. 7, the calculation unit 601 may include:

a first error concealment distortion calculation unit 701, configured to calculate, based on an SSD criterion, a sum of estimated errors between an original frame of an enhancement layer of a current macroblock and an error concealment value of the enhancement layer found from base layer information, to obtain error concealment distortion when a coding mode of the base layer belongs to an Inter mode or an Inter-layer prediction mode;

a first cost function calculating unit 702, configured to calculate a value of a lagrangian cost function according to the error concealment distortion obtained by the first error concealment distortion calculating unit 701.

Wherein, when the coding mode of the base layer belongs to the Inter mode or the Inter-layer prediction mode, for the case when the resolutions of the base layer and the enhancement layer are the same, as shown in fig. 8, the first error concealment distortion calculation unit 701 may include:

a first enhancement layer error concealment value calculation unit 801, configured to calculate a sum of a residual error and a motion compensation value of an enhancement layer that is found by reconstructing a reference frame of the enhancement layer using base layer information when resolutions of the base layer and the enhancement layer are the same, to obtain an error concealment value of the enhancement layer;

a first error sum calculating unit 802, configured to calculate an estimated error sum between an enhancement layer original frame of a current macroblock and an error concealment value of the enhancement layer obtained by the first enhancement layer error concealment value calculating unit 801 based on SSD criteria.

Wherein, when the coding mode of the base layer belongs to the Inter mode or the Inter-layer prediction mode, and when the resolutions of the base layer and the enhancement layer are not the same, as shown in fig. 9, the first error concealment distortion calculation unit 701 may include:

a second enhancement layer error concealment value calculation unit 901, configured to calculate a sum of a motion compensation value and an up-sampling value of a residual error of the enhancement layer, which are found by reconstructing a reference frame of the enhancement layer with the base layer information, when resolutions of the base layer and the enhancement layer are different;

a second error sum calculating unit 902, configured to calculate, based on SSD criterion, a sum of estimated errors between the enhancement layer original frame of the current macroblock and the error concealment value of the enhancement layer obtained by the second enhancement layer error concealment value calculating unit 901.
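The work of units 801/802 and 901/902 can be sketched together: the error concealment value is the motion compensation value plus the residual (same resolution) or plus the upsampled residual (different resolutions), and the distortion is the SSD against the enhancement layer original. This is a sketch under stated assumptions: 2x pixel replication stands in for the residual upsampling filter of the SVC standard, blocks are nested lists, and all function names and sample values are hypothetical.

```python
def upsample_2x_replicate(block):
    """Placeholder residual upsampler (pixel replication, not the SVC filter)."""
    out = []
    for row in block:
        wide = [p for p in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


def add_blocks(a, b):
    # Element-wise sum of two equally sized blocks
    return [[x + y for x, y in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]


def ssd(a, b):
    # Sum of squared differences between two equally sized blocks
    return sum((x - y) ** 2 for row_a, row_b in zip(a, b) for x, y in zip(row_a, row_b))


def d_ec_inter(enh_original, motion_comp, base_residual, same_resolution):
    """Error concealment distortion for Inter / inter-layer prediction modes."""
    # Units 801 / 901: build the error concealment value of the enhancement layer
    residual = base_residual if same_resolution else upsample_2x_replicate(base_residual)
    concealed = add_blocks(motion_comp, residual)
    # Units 802 / 902: accumulate the estimation error with the SSD criterion
    return ssd(enh_original, concealed)


# Same resolution: 1x2 toy block
same = d_ec_inter([[100, 100]], [[100, 102]], [[1, -2]], same_resolution=True)
# Different resolutions: 1x1 base residual covers a 2x2 enhancement region
diff = d_ec_inter([[52, 53], [51, 52]], [[50, 50], [50, 50]], [[2]], same_resolution=False)
print(same, diff)
```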

When the encoding mode of the base layer belongs to the Intra mode and the resolutions of the base layer and the enhancement layer are the same, as shown in fig. 10, the calculating unit 601 may include:

a second error concealment distortion calculation unit 1001, configured to calculate, based on an SSD criterion, a sum of estimation errors between an original frame of an enhancement layer of a current macroblock and a texture reconstruction estimation value of the base layer when resolutions of the base layer and the enhancement layer are the same, so as to obtain error concealment distortion when a coding mode of the base layer belongs to an Intra mode;

a second cost function calculating unit 1002, configured to calculate a value of a lagrangian cost function according to the error concealment distortion obtained by the second error concealment distortion calculating unit 1001.

When the encoding mode of the base layer belongs to the Intra mode and the resolutions of the base layer and the enhancement layer are not the same, as shown in fig. 11, the calculating unit 601 may include:

a third error concealment distortion calculation unit 1101, configured to calculate, based on the SSD criterion, a total of estimation errors between an original frame of an enhancement layer of a current macroblock and an upsampled value of a texture reconstruction estimation value of the base layer, when resolutions of the base layer and the enhancement layer are different, to obtain error concealment distortion when a coding mode of the base layer belongs to an Intra mode;

a third cost function calculating unit 1102, configured to calculate a value of a lagrangian cost function according to the error concealment distortion obtained by the third error concealment distortion calculating unit 1101.

The encoding apparatus provided by this embodiment increases the correlation between the base layer and the enhancement layer when encoding the base layer. As a result, when the base layer of the code stream is received correctly but the enhancement layer is lost, the base layer information can be used to improve the error resilience of the whole code stream, so that the overall video quality is not greatly affected.

It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by a computer program, which can be stored in a computer-readable storage medium, and when executed, can include the processes of the embodiments of the methods described above. The storage medium may be a magnetic disk, an optical disk, a Read-only Memory (ROM), a Random Access Memory (RAM), or the like.

The foregoing is only a preferred embodiment of the present invention. It should be noted that those skilled in the art can make various improvements and refinements without departing from the principle of the present invention, and these improvements and refinements shall also fall within the protection scope of the present invention.

Claims

1. A method of encoding, the method comprising:

encoding the base layer according to the selected encoding mode;

2. The method of claim 1, wherein:

when the base layer and the enhancement layer have the same resolution, the error concealment value of the enhancement layer found by the base layer information includes: reconstructing the sum of the motion compensation value and the residual error of the enhancement layer found in the enhancement layer reference frame by using the base layer information;

when the resolutions of the base layer and the enhancement layer are not the same, the error concealment value of the enhancement layer found by the base layer information includes: the sum of the found motion compensation value for the enhancement layer and the up-sampled value of the residual is reconstructed at the enhancement layer reference frame using the base layer information.

3. The method according to claim 1 or 2, wherein said lagrangian cost function further comprises: the reference frame propagates the distortion value.

4. The method of claim 1, wherein when the mode of the base layer belongs to an Intra mode,

when the base layer and enhancement layer resolutions are the same, the error concealment distortion comprises an estimated error sum between an enhancement layer original frame of a current macroblock and a texture reconstruction estimation value of the base layer, which is calculated based on an SSD criterion;

when the base layer and enhancement layer resolutions are not the same, the error concealment distortion includes an estimated error sum between an enhancement layer original frame of a current macroblock and an up-sampling value of a texture reconstruction estimation value of the base layer calculated based on an SSD criterion.

5. An encoding apparatus for encoding a base layer, comprising:

the device comprises a calculation unit and a processing unit, wherein the calculation unit is used for calculating values of Lagrange cost functions corresponding to each coding mode in a mode set, the mode set comprises a plurality of modes, the Lagrange cost functions comprise error concealment loss, and the error concealment loss represents the error concealment capability of basic layer information to enhancement layer information;

the calculation unit includes:

6. The encoding apparatus according to claim 5, wherein the first error concealment distortion calculation unit includes:

a first enhancement layer error concealment value calculation unit, configured to calculate a sum of a residual error and a motion compensation value of the enhancement layer that are found by reconstructing a reference frame of the enhancement layer using the base layer information when the resolutions of the base layer and the enhancement layer are the same, to obtain an error concealment value of the enhancement layer;

a first error sum calculating unit, configured to calculate, based on SSD criteria, a sum of estimated errors between an enhancement layer original frame of a current macroblock and the error concealment value of the enhancement layer obtained by the first enhancement layer error concealment value calculating unit.

7. The encoding apparatus according to claim 5, wherein the first error concealment distortion calculation unit includes:

a second enhancement layer error concealment value calculation unit for calculating a sum of a motion compensation value and an up-sampling value of a residual error of the enhancement layer, which are found by reconstructing a reference frame of the enhancement layer using the base layer information, when the resolutions of the base layer and the enhancement layer are different;

and a second error sum calculation unit for calculating an estimated error sum between the enhancement layer original frame of the current macroblock and the error concealment value of the enhancement layer obtained by the second enhancement layer error concealment value calculation unit based on the SSD criterion.

8. The encoding device according to claim 5, wherein the calculation unit includes:

a second error concealment distortion calculation unit, configured to calculate, based on an SSD criterion, a sum of estimation errors between an original frame of an enhancement layer of a current macroblock and a texture reconstruction estimation value of the base layer when resolutions of the base layer and the enhancement layer are the same, to obtain error concealment distortion when a coding mode of the base layer belongs to an Intra mode;

and the second cost function calculation unit is used for calculating the value of the Lagrangian cost function according to the error concealment distortion obtained by the second error concealment distortion calculation unit.

9. The encoding device according to claim 5, wherein the calculation unit includes:

a third error concealment distortion calculation unit, configured to calculate, based on an SSD criterion, a sum of estimation errors between an original frame of an enhancement layer of a current macroblock and an upsampled value of a texture reconstruction estimation value of the base layer, when resolutions of the base layer and the enhancement layer are different, to obtain error concealment distortion when a coding mode of the base layer belongs to an Intra mode;

and the third cost function calculating unit is used for calculating the value of the Lagrangian cost function according to the error concealment distortion obtained by the third error concealment distortion calculating unit.