Summary of the invention
To improve the efficiency of scalable image sequence coding, the present invention proposes a method for scalable video enhancement layer motion estimation, comprising the following steps. First, the basic layer image macroblocks are encoded, and the motion vector and prediction error of each basic layer macroblock are stored in memory for use in enhancement layer coding. Then the enhancement layer is encoded, using the motion vectors and prediction errors of its reference layer to compute the starting point of the rough search. The reference layer may be the basic layer or an enhancement layer of smaller size.
According to one aspect of the present invention, a basic layer image macroblock is encoded as follows:
Step 1: read the luminance values of the current macroblock to be encoded into the array Image[16][16];
Step 2: apply the down-conversion to Image[16][16] and Region1[48][48] to obtain the reduced images SImage[4][4] and SRegion[12][12];
Step 3: search SRegion[][] for the position that best matches SImage[][];
Step 4: compute the coding cost for each px=[-3,3], py=[-3,3], and choose the (px, py) with the minimum coding cost as the result of the coarse search; the corresponding integer-pixel motion vector is (4*px, 4*py), denoted mv0;
Step 5: perform a full-search motion estimation of Image[16][16] within Region1[][], over a range of ±4 horizontally and vertically around the search position corresponding to mv0;
Step 6: compute the coding cost for each dx=[-4,4], dy=[-4,4], and choose the minimum-cost motion vector (mv0.x+dx, mv0.y+dy) as the motion vector of the integer-pixel fine search, denoted mv1;
Step 7: within ±1 horizontally and vertically of the position in Region1[][] corresponding to mv1, apply the 1/2-pixel and 1/4-pixel filtering specified by H.264; that is, use the 6-tap filter defined by the standard to enlarge the region of Region1[x][y] starting at x=16+mv1.x-1 by a factor of four horizontally and vertically, producing the enlarged image Region2[72-3][72-3];
Step 8: perform a full search of Image[][] within Region2[][] to obtain the prediction error at 1/4-pixel precision;
Step 9: compute the cost for each qx=[-4,4], qy=[-4,4], and choose the motion vector (4*mv1.x+qx, 4*mv1.y+qy) corresponding to the minimum-cost (qx, qy) as the final motion vector mv in 1/4-pixel units.
According to one aspect of the present invention, an enhancement layer image macroblock is encoded as follows:
Step 1: take the motion vectors mvA_N, mvB_N, mvC_N, mvD_N of the four reference layer macroblocks corresponding to this enhancement layer macroblock, and intersect the sets of vectors within ±4 of each of them; the intersection is defined as S_mv;
Step 2: within the set S_mv, compute the coding cost of each mv and take the mv with the minimum coding cost as the result mv0_{N+1} of the rough search;
Step 3: if the intersection is empty, directly use the mean of the reference layer motion vectors as the rough-search motion vector mv0_{N+1};
Step 4: compute the optimal inter-prediction motion vector of the enhancement layer according to steps 5 to 9 of the basic layer procedure; this yields the optimal inter-frame motion estimation vector of layer N+1, denoted mv_{N+1};
Step 5: compute, according to steps 5 to 9 of the basic layer procedure, the optimal inter-prediction motion vector obtained when the inter-layer residual prediction tool is used; this yields the optimal motion vector of the inter-frame motion estimation of layer N+1 with the inter-layer residual prediction tool, denoted mv'_{N+1};
Step 6: compare the coding costs of mv_{N+1} and mv'_{N+1}, and select the one with the minimum coding cost as the final motion vector of this macroblock;
Step 7: if this layer is the top layer, stop; if it is an intermediate layer, continue encoding the next layer in the same way.
The invention also proposes a device for scalable video enhancement layer motion estimation, comprising the following parts: a basic layer coarse motion search module, which performs a rough search for the motion vector of the basic layer; an enhancement layer coarse motion search module, which performs a rough search for the motion vector of the enhancement layer; an integer-pixel precision motion search/estimation module, which refines the rough-search motion vector to obtain the integer-pixel motion search/estimation result; and a quarter-pixel precision motion search/estimation module, which refines the result of the integer-pixel search/estimation module to obtain the quarter-pixel motion search/estimation result and outputs the reconstructed image and the related parameters at that point.
According to one aspect of the present invention, when used for enhancement layer coding, the integer-pixel precision and quarter-pixel precision motion search/estimation modules can evaluate whether to use the inter-layer residual prediction tool, so as to obtain the optimal inter-prediction motion vector.
Embodiment
To make the purpose, technical scheme and advantages of the present invention clearer, the invention is described in further detail below with reference to the drawings and embodiments. It should be understood that the specific embodiments described here only serve to explain the present invention and are not intended to limit it.
In scalable image sequence coding, the basic layer is the image sequence with the smallest picture size, and its coding uses only images of the same layer. An enhancement layer is an image sequence whose picture size is larger than that of the basic layer, or whose coded picture quality is higher than that of the basic layer. The coding of an enhancement layer can use not only the image information of its own layer but also information from image sequences at lower levels. The lower-level image layer used in coding an enhancement layer is here called its reference layer. The reference layer may be the basic layer or another enhancement layer (one whose image resolution or quality is lower than that of the enhancement layer currently being encoded).
The algorithmic optimization of motion estimation has been studied extensively. For a non-scalable image sequence, motion estimation can be carried out by the process shown in Fig. 1.
The first step after motion estimation process begins, is read in original image macro block Image[16] [16] and searching image zone Region[H] [W];
Second step is with original image macro block Image0[16] the regional Region0[H of [16] and searching image] [W] down conversion is SImage[4] [4] and SRegion[H/4] [W/4];
The method of down conversion can require to adopt linear transformation or multi-stage filter according to operational capability and the sequential of system.The horizontal longitudinal direction of original image is dwindled four times herein, directly adopt the mean value of 16 corresponding pixels as the result of down conversion.That is,
X=0..3 wherein, y=0..3
SRegion[H] calculating and the following formula of [W] be similar.
The 3rd step is at SRegion[] in [] to SImage[4] [4] do full search, calculates SAD(x, y) and penalty(x, y), with SAD (x, y)+penalty (x, minimum value correspondence y) (x is y) as the motion vector mv0 of rough search;
The 4th one, read in the regional Region1[16+2e+2f centered by the searching image zone of mv0 motion vector correspondence] [16+2e+2f];
The 5th step is at Region1[f~16+2e+2f] in [f~16+2e] to Image[16] [16] do full search, calculates SAD(x, y) with penalty (x, y), with SAD(x, y)+penalty (x, minimum value correspondence y) (x is y) as the motion vector mv1 of fine search;
The 6th step is with Region1[] [] carry out filtering, produces the Region2[16+2e of half-pix and 1/4 pixel correspondence] [16+2e];
The 7th step is at Region2[] in [] to Image[16] [16] do full search, calculate SAD (x, y) with penalty (x, y), with SAD(x, y)+penalty (x, minimum value correspondence y) (x is y) as the motion vector mv of fine search;
Finish motion estimation process.
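The fourfold down-conversion of step 2 can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `down_convert` is hypothetical, images are plain nested lists, and the truncating integer mean is an assumption, since the text only says that the mean of the 16 corresponding pixels is used.

```python
def down_convert(block, factor=4):
    """Shrink a 2-D list by `factor` in both directions using block means.

    Each output sample is the mean of the factor x factor input samples it
    covers. Truncating integer division is an assumption of this sketch.
    """
    rows, cols = len(block), len(block[0])
    if rows % factor or cols % factor:
        raise ValueError("block size must be a multiple of the factor")
    out = []
    for top in range(0, rows, factor):
        out_row = []
        for left in range(0, cols, factor):
            total = sum(block[top + i][left + j]
                        for i in range(factor)
                        for j in range(factor))
            out_row.append(total // (factor * factor))
        out.append(out_row)
    return out


# A 16x16 macroblock becomes a 4x4 block, as Image -> SImage in step 2.
image = [[(r * 16 + c) % 256 for c in range(16)] for r in range(16)]
simage = down_convert(image)
```

The same helper applied to a 48x48 search region gives the 12x12 reduced region used in the basic layer example later in the text.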
For scalable image sequences, the present invention proposes a motion estimation method for scalable image sequence coding. For ease of explanation, this embodiment uses the following settings as an example: the enhancement layer size (width and height) is 2 times that of its reference layer; the search range of the motion vector in the x and y directions is ±14; motion estimation performs only unidirectional prediction from one reference picture; and there is one basic layer and multiple enhancement layers. The improved motion estimation algorithm for the enhancement layer is shown in Fig. 2.
First, the basic layer is encoded, and the final motion vector and prediction error of each of its macroblocks are stored in memory for use in enhancement layer coding. Then the enhancement layer is encoded, using the motion vectors and prediction errors of its reference layer (which may be the basic layer or an enhancement layer of smaller size) to compute the starting point of the rough search.
The encoding process is illustrated below using the encoding of one macroblock as an example. To simplify the description, the search for the motion vector of a 16x16 macroblock is used as the example. The search for smaller blocks (8x8, 4x4) can be carried out by reference to this description.
During encoding, the coding cost under different motion vectors can be estimated with the following formula.
Cost(mv.x, mv.y) = SAD(mv.x, mv.y) + penalty(mv.x, mv.y) + Const(t)    (Formula 1)
Here SAD is the sum of absolute differences between the original image and the predicted image computed according to the coding standard when motion vector mv (in units of 1/4 pixel) is used. SAD measures how well the target image matches the reference image under this motion vector, i.e. the prediction error. penalty(mv.x, mv.y) is the coding penalty incurred by using this motion vector, which can be expressed as the number of bits needed to encode it. To simplify the description, the penalty here is the number of bits required when the motion vector is VLC-coded according to the standard. Const(t) is the number of additional coded bits produced when different coding tools are used; it can be set to a constant, where t denotes the coding tool. In enhancement layer coding, two such tools can be used: inter-layer motion vector prediction and inter-layer prediction of the prediction residual.
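Formula 1 can be illustrated with a short sketch. Several things here are assumptions rather than part of the text: the penalty is taken as the length of the signed Exp-Golomb code of each vector component (one possible VLC), the vector itself is coded rather than its difference from a predictor, and the names `se_golomb_bits`, `sad`, `penalty` and `cost` are hypothetical.

```python
def se_golomb_bits(v):
    """Bit length of the signed Exp-Golomb code of integer v."""
    # Map the signed value to an unsigned code number: 0, 1, -1, 2, -2, ...
    code_num = 2 * v - 1 if v > 0 else -2 * v
    # An unsigned Exp-Golomb code uses 2*floor(log2(code_num + 1)) + 1 bits.
    return 2 * (code_num + 1).bit_length() - 1


def sad(block_a, block_b):
    """Sum of absolute differences between two equally sized 2-D lists."""
    return sum(abs(a - b)
               for row_a, row_b in zip(block_a, block_b)
               for a, b in zip(row_a, row_b))


def penalty(mv_qpel):
    """Assumed penalty: bits needed for both components (1/4-pixel units)."""
    return se_golomb_bits(mv_qpel[0]) + se_golomb_bits(mv_qpel[1])


def cost(sad_value, mv_qpel, tool_const=0):
    """Formula 1: Cost = SAD + penalty + Const(t)."""
    return sad_value + penalty(mv_qpel) + tool_const


# Example: a prediction error of 100 with the 1/4-pixel vector (4, -8)
# and a tool constant of 3.
example_cost = cost(100, (4, -8), tool_const=3)
```

With this penalty, a longer motion vector has to reduce the SAD by more than its extra bits before it is preferred, which is the trade-off that Formula 1 expresses.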
A basic layer image macroblock (whose upper-left corner is at (x, y)) is encoded as follows:
1. Read the luminance values of the current macroblock to be encoded into the array Image[16][16]. Read the reference data Region1[H][W] within its search range from its reference picture. The upper-left corner of Region1 is at (x-16, y-16) and the lower-right corner at (x+32, y+32). If a coordinate falls outside the image, it is filled with the nearest valid pixel as defined by the standard. When the search range in the x and y directions is ±14, H=48 and W=48 cover all the pixels needed for motion estimation. The reference data extend two pixels beyond the motion search range because motion estimation uses the 6-tap filter to estimate the image values at the 1/2-pixel and 1/4-pixel positions, so 2 more rows and 2 more columns must be fetched on each side. Thus, for a 16x16 block, 14 rows are added above and below as the search range, and a further 2 rows above and below as input to the 6-tap filter, giving 48 rows of valid data (16 + 2*14 + 2*2 = 48). Similarly, 48 columns are obtained in the horizontal direction.
2. Apply the down-conversion to Image[16][16] and Region1[48][48] to obtain the reduced images SImage[4][4] and SRegion[12][12]. The down-conversion method can be a linear transform or a multi-stage filter, depending on the computing capability and timing requirements of the system. Here the original image is reduced by a factor of four horizontally and vertically, directly using the mean of the 16 corresponding pixels as the down-conversion result. That is,
$$\mathrm{SImage}[x][y] = \frac{1}{16}\sum_{i=0}^{3}\sum_{j=0}^{3}\mathrm{Image}[4x+i][4y+j]$$
where x=0..3, y=0..3.
SRegion[12][12] is computed analogously.
3. Search SRegion[][] for the position that best matches SImage[][]. Here SAD is used as the example measure of the matching error:
$$\mathrm{SAD}(px, py) = \sum_{i=0}^{3}\sum_{j=0}^{3}\left|\mathrm{SImage}[i][j] - \mathrm{SRegion}[4+px+i][4+py+j]\right|$$
where px=[-3,3], py=[-3,3]. SAD(px, py) is the sum of absolute errors between SImage[][] and the corresponding block of SRegion[][] when the motion vector is (px, py). Because SImage[][] and SRegion[][] are the results of a fourfold down-conversion of the image, the corresponding motion vector in the original image is (4*px, 4*py) (in integer-pixel units).
4. Compute the coding cost for each px=[-3,3], py=[-3,3]. The coding cost is computed with Formula 1, where the motion vector corresponding to each (px, py) is (16*px, 16*py). This is because the finally coded motion vector has 1/4-pixel precision, so the value used when coding is 4 times the integer-pixel motion vector. Choose the (px, py) with the minimum coding cost as the result of the coarse search; the corresponding integer-pixel motion vector is (4*px, 4*py), denoted mv0 (in integer-pixel units).
5. Perform a full-search motion estimation of Image[16][16] within Region1[][], over a range of ±4 horizontally and vertically around the search position corresponding to mv0. SAD is computed as the measure of the matching error:
$$\mathrm{SAD}(dx, dy) = \sum_{i=0}^{15}\sum_{j=0}^{15}\left|\mathrm{Image}[i][j] - \mathrm{Region1}[16+mv0.x+dx+i][16+mv0.y+dy+j]\right|$$
where dx=[-4,4], dy=[-4,4]. These SAD values are recorded for use by the enhancement layer.
6. Compute the coding cost for each dx=[-4,4], dy=[-4,4]. The motion vector penalty is the coding penalty corresponding to the motion vector (4*(mv0.x+dx), 4*(mv0.y+dy)). Because the finally coded motion vector has 1/4-pixel precision, the value used when coding is four times the current integer-pixel estimate. Choose the minimum-cost motion vector (mv0.x+dx, mv0.y+dy) as the motion vector of the integer-pixel fine search, denoted mv1.
7. Within ±1 horizontally and vertically of the position in Region1[][] corresponding to mv1, apply the 1/2-pixel and 1/4-pixel filtering specified by H.264. That is, use the 6-tap filter defined by the standard to enlarge the region of Region1[x][y] starting at x=16+mv1.x-1 by a factor of four horizontally and vertically, producing the enlarged image Region2[72-3][72-3]. (The 16x16 region enlarged by ±1 is an 18x18 region, which becomes 72x72 after fourfold enlargement. Because the 1/2-pixel and 1/4-pixel interpolated values are not computed beyond the +1 position, the outer 3 rows and columns are invalid.)
8. Perform a full search of Image[][] within Region2[][] to obtain the prediction error at 1/4-pixel precision. The prediction error can be measured with SAD:
$$\mathrm{SAD}(qx, qy) = \sum_{i=0}^{15}\sum_{j=0}^{15}\left|\mathrm{Image}[i][j] - \mathrm{Region2}[4(i+1)+qx][4(j+1)+qy]\right|$$
where qx=[-4,4], qy=[-4,4]. The SAD values for (qx, qy) equal to (-4,-4), (0,-4), (4,-4), (-4,0), (0,0), (4,0), (-4,4), (0,4), (4,4) are not computed, however; for these positions the SAD of the corresponding integer pixel from step 5 is used.
9. Compute the cost for each qx=[-4,4], qy=[-4,4]. The motion vector penalty is the coding penalty corresponding to the motion vector (4*mv1.x+qx, 4*mv1.y+qy). Choose the motion vector (4*mv1.x+qx, 4*mv1.y+qy) corresponding to the minimum-cost (qx, qy) as the final motion vector mv in 1/4-pixel units.
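The integer-pixel part of the procedure above (steps 2 to 6) can be sketched as a coarse-to-fine search. This is a simplified illustration under stated assumptions: arrays are nested lists indexed [row][column], the penalty is an assumed signed Exp-Golomb bit count, Const(t) is taken as zero, and the quarter-pixel stage (steps 7 to 9) is omitted. All function names are hypothetical.

```python
def se_bits(v):
    """Assumed penalty per component: signed Exp-Golomb code length."""
    code_num = 2 * v - 1 if v > 0 else -2 * v
    return 2 * (code_num + 1).bit_length() - 1


def cost(sad_value, mv_qpel):
    """Formula 1 with Const(t) = 0; mv_qpel is in 1/4-pixel units."""
    return sad_value + se_bits(mv_qpel[0]) + se_bits(mv_qpel[1])


def down4(a):
    """Fourfold down-conversion by truncated 4x4 block means (step 2)."""
    n = len(a) // 4
    return [[sum(a[4 * r + i][4 * c + j]
                 for i in range(4) for j in range(4)) // 16
             for c in range(n)] for r in range(n)]


def sad_at(block, region, top, left):
    """SAD between `block` and the same-sized window of `region`."""
    size = len(block)
    return sum(abs(block[i][j] - region[top + i][left + j])
               for i in range(size) for j in range(size))


def coarse_to_fine(image, region1):
    """Return (mv0, mv1, sads) for a 16x16 block and its 48x48 region.

    Vectors are (x, y) in integer pixels; the co-located block starts at
    row 16, column 16 of region1.
    """
    simage, sregion = down4(image), down4(region1)
    # Steps 3-4: coarse search over px, py in [-3, 3] on the reduced images.
    best = None
    for py in range(-3, 4):
        for px in range(-3, 4):
            c = cost(sad_at(simage, sregion, 4 + py, 4 + px),
                     (16 * px, 16 * py))
            if best is None or c < best[0]:
                best = (c, px, py)
    mv0 = (4 * best[1], 4 * best[2])
    # Steps 5-6: full search within +/-4 of mv0 at integer-pixel precision.
    sads, best = {}, None
    for dy in range(-4, 5):
        for dx in range(-4, 5):
            mx, my = mv0[0] + dx, mv0[1] + dy
            s = sad_at(image, region1, 16 + my, 16 + mx)
            sads[(mx, my)] = s  # kept for use by the enhancement layer
            c = cost(s, (4 * mx, 4 * my))
            if best is None or c < best[0]:
                best = (c, mx, my)
    return mv0, (best[1], best[2]), sads


# Usage: a bright 16x16 patch displaced by (x=4, y=-8) from the co-located
# position in an otherwise black search region.
region1 = [[0] * 48 for _ in range(48)]
for r in range(8, 24):
    for c in range(20, 36):
        region1[r][c] = 200
image = [[200] * 16 for _ in range(16)]
mv0, mv1, sads = coarse_to_fine(image, region1)
```

The coarse stage evaluates 49 candidates on 4x4 blocks and the fine stage 81 candidates on 16x16 blocks, instead of a full integer search over the whole range at full resolution.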
For motion estimation of enhancement layer image macroblocks, for ease of description the basic layer is defined as layer 0, the enhancement layer above it as layer 1, the enhancement layer above layer 1 as layer 2, and so on. In the method described below, the enhancement layer currently being encoded is assumed to be layer N+1, and its encoding refers to the images of layer N. To encode the enhancement layer, the residual data of its reference layer are first up-converted as specified by the standard; the resulting image is ResidualUpSample_N[][].
A simplified implementation of the up-conversion is as follows:
1. For the pixel coordinate (x, y) to be computed, find the corresponding coordinate (xRef16, yRef16) in the original image. xRef16 and yRef16 have 1/16-pixel precision and refer to the upper-left point, i.e. truncation is used in the computation;
2. Compute the integer coordinates and the phase:
a) xRef=(xRef16>>4)-xOffset, yRef=(yRef16>>4)-yOffset;
b) xPhase=(xRef16-16*xOffset)%16, yPhase=(yRef16-16*yOffset)%16;
where xOffset and yOffset are the offset coordinates of the upper-left corner when the enhancement layer image corresponds to only part of the reference layer image.
3. Horizontal interpolation:
tempPred[x][yRef]=(16-xPhase)*refSampleArray[xRef][yRef]+xPhase*refSampleArray[xRef+1][yRef];
4. Vertical interpolation:
predArray[x][y]=(16-yPhase)*tempPred[x][yRef]+yPhase*tempPred[x][yRef+1].
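The simplified up-conversion above can be sketched as follows. The text does not state how (xRef16, yRef16) are derived, how the result is normalized, or how picture borders are handled, so three choices here are assumptions of this sketch: xRef16 is computed as a truncated 1/16-pixel position from the size ratio, the two weights of 16 are removed with a rounded shift by 8, and coordinates are clamped at the borders. The function name `upsample_residual` is hypothetical, and arrays are indexed [row][column].

```python
def clamp(v, lo, hi):
    return max(lo, min(hi, v))


def upsample_residual(ref, out_w, out_h, x_offset=0, y_offset=0):
    """Bilinear up-conversion of a 2-D list `ref` to out_h x out_w."""
    ref_h, ref_w = len(ref), len(ref[0])
    out = [[0] * out_w for _ in range(out_h)]
    for y in range(out_h):
        # Step 1 (assumed mapping): truncated 1/16-pixel reference position.
        y_ref16 = (y * ref_h * 16) // out_h
        # Step 2: integer coordinate and phase.
        y_ref = (y_ref16 >> 4) - y_offset
        y_phase = (y_ref16 - 16 * y_offset) % 16
        y0 = clamp(y_ref, 0, ref_h - 1)
        y1 = clamp(y_ref + 1, 0, ref_h - 1)
        for x in range(out_w):
            x_ref16 = (x * ref_w * 16) // out_w
            x_ref = (x_ref16 >> 4) - x_offset
            x_phase = (x_ref16 - 16 * x_offset) % 16
            x0 = clamp(x_ref, 0, ref_w - 1)
            x1 = clamp(x_ref + 1, 0, ref_w - 1)
            # Step 3: horizontal interpolation on the two rows involved.
            top = (16 - x_phase) * ref[y0][x0] + x_phase * ref[y0][x1]
            bottom = (16 - x_phase) * ref[y1][x0] + x_phase * ref[y1][x1]
            # Step 4: vertical interpolation, then assumed normalization.
            value = (16 - y_phase) * top + y_phase * bottom
            out[y][x] = (value + 128) >> 8
    return out


# Usage: enlarge a 2x2 residual block to 4x4 (twofold in each direction).
up = upsample_residual([[0, 16], [32, 48]], 4, 4)
```

For the twofold case used in the embodiment, every second output sample falls exactly on a reference sample (phase 0) and the samples in between are midpoints (phase 8).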
An enhancement layer image macroblock (whose upper-left corner is at (x, y)) is encoded as follows:
1. Take the motion vectors mvA_N, mvB_N, mvC_N, mvD_N of the four reference layer macroblocks corresponding to this enhancement layer macroblock, and intersect the sets of vectors within ±4 of each of them; the intersection is defined as S_mv.
2. For each vector in the set S_mv, use the sum of the SAD values stored by layer N for the 4 corresponding motion vectors as the prediction error, and use 2 times the corresponding motion vector as the enhancement layer motion vector mv. Compute the coding cost of each such mv with Formula 1, and take the mv with the minimum coding cost as the result mv0_{N+1} of the rough search.
3. If the intersection is empty, directly use the mean of the reference layer motion vectors as the rough-search motion vector mv0_{N+1}. Here the motion vector of a 16x16 block is taken as the example, and the enhancement layer image is twice the size of its reference layer horizontally and vertically. The initial motion vector mv0_{N+1} of the enhancement layer is estimated by the following formula:
mv0_{N+1} = (mvA_N + mvB_N + mvC_N + mvD_N) / 4 * 2;
In this formula, the four motion vectors are averaged by dividing by four, and the multiplication by 2 scales the result to a motion vector of enhancement layer N+1.
4. Compute the optimal inter-prediction motion vector of the enhancement layer according to steps 5-9 of the basic layer procedure. Here Region1[48][48] is obtained from the reference picture of layer N+1, mv0 is the result of step 2 or 3, and Image[16][16] is the macroblock to be encoded in this layer, Image_{N+1}[16][16]. This step yields the optimal inter-frame motion estimation vector of layer N+1, denoted mv_{N+1}, i.e. the optimal motion vector obtained by inter-frame motion estimation using layer N+1 as the reference picture. Note that when computing the penalty of different motion vectors, the reduction in coding overhead (a constant) from using inter-layer motion vector prediction must also be considered, which in turn determines whether the inter-layer motion vector prediction tool is used in coding. Because this coding tool does not affect the content of the present invention, it is not described in detail here.
5. Compute, according to steps 5-9 of the basic layer procedure, the optimal inter-prediction motion vector obtained when the inter-layer residual prediction tool is used. Here Region1[48][48] is obtained from the reference picture of layer N+1, mv0 is the result of step 2 or 3, and Image[16][16] is the difference between Image_{N+1}[16][16] and Residual_N[16][16], used as the macroblock image data to be encoded. Residual_N[16][16] is the region of the up-converted layer N image ResidualUpSample_N[][] that corresponds to this layer's macroblock to be encoded, Image_{N+1}[16][16]. This step yields the optimal motion vector of the inter-frame motion estimation of layer N+1 with the inter-layer residual prediction tool, denoted mv'_{N+1}. Note that when computing the penalty of different motion vectors, not only must the reduction in coding overhead from inter-layer motion vector prediction be considered (which determines whether that tool is used in coding), but the coding overhead (a constant) introduced by the inter-layer residual prediction tool must also be added. Because these coding tools do not affect the content of the present invention, they are not described in detail here.
Inter-layer motion vector prediction means that, when motion is coded, the motion vector can be obtained by up-converting the motion vector of the corresponding block in the reference layer. In that case only a single flag needs to be coded, which reduces the number of bits spent on encoding the motion vector.
6. Compare the coding costs of mv_{N+1} and mv'_{N+1}, and select the one with the minimum coding cost as the final motion vector of this macroblock. The recorded SAD values are selected accordingly and kept for use in further coding.
7. If this layer (layer N+1) is the top layer, stop. If it is an intermediate layer, continue by encoding layer N+2 in the same way.
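Steps 1 to 3 of the enhancement layer procedure, which replace the coarse search by a choice among reference layer candidates, can be sketched as follows. The sketch makes several assumptions not fixed by the text: reference layer vectors and the stored SAD tables are in integer-pixel units, a candidate without a stored SAD in all four tables is skipped, the penalty is a signed Exp-Golomb bit count with Const(t) = 0, and the fallback mean is rounded by floor division. The names `window` and `enhancement_coarse_start` are hypothetical.

```python
def se_bits(v):
    """Assumed penalty per component: signed Exp-Golomb code length."""
    code_num = 2 * v - 1 if v > 0 else -2 * v
    return 2 * (code_num + 1).bit_length() - 1


def cost(sad_value, mv_qpel):
    """Formula 1 with Const(t) = 0; mv_qpel is in 1/4-pixel units."""
    return sad_value + se_bits(mv_qpel[0]) + se_bits(mv_qpel[1])


def window(mv, radius=4):
    """All vectors within +/-radius of mv in both directions."""
    return {(mv[0] + dx, mv[1] + dy)
            for dx in range(-radius, radius + 1)
            for dy in range(-radius, radius + 1)}


def enhancement_coarse_start(ref_mvs, ref_sads, scale=2):
    """Return mv0 of layer N+1 in integer pixels.

    ref_mvs:  the four reference layer vectors mvA_N..mvD_N.
    ref_sads: four dicts mapping a reference layer vector to the SAD
              recorded for it during the reference layer fine search.
    """
    # Step 1: S_mv is the intersection of the four +/-4 windows.
    s_mv = set.intersection(*(window(mv) for mv in ref_mvs))
    # Step 2: cost of each candidate, using the summed stored SADs.
    best = None
    for mv in sorted(s_mv):
        if not all(mv in table for table in ref_sads):
            continue  # assumption: skip candidates with no stored SAD
        pred_error = sum(table[mv] for table in ref_sads)
        enh_mv = (scale * mv[0], scale * mv[1])
        c = cost(pred_error, (4 * enh_mv[0], 4 * enh_mv[1]))
        if best is None or c < best[0]:
            best = (c, enh_mv)
    if best is not None:
        return best[1]
    # Step 3: fall back to the mean of the reference vectors, scaled by 2.
    n = len(ref_mvs)
    return ((scale * sum(mv[0] for mv in ref_mvs)) // n,
            (scale * sum(mv[1] for mv in ref_mvs)) // n)


# Usage: four nearby reference vectors whose stored SADs favour (3, 1).
ref_mvs = [(2, 0), (3, 1), (2, 1), (4, 0)]
ref_sads = []
for mv in ref_mvs:
    table = {cand: 1000 for cand in window(mv)}
    table[(3, 1)] = 10
    ref_sads.append(table)
mv0_enh = enhancement_coarse_start(ref_mvs, ref_sads)
```

Because the SADs are read from tables recorded during reference layer coding, no pixel comparisons are needed at this stage; the enhancement layer proceeds directly to the fine search of steps 4 and 5.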
Fig. 3 shows a device for scalable video enhancement layer motion estimation according to the present invention, used to implement the above encoding of basic layer image macroblocks and of enhancement layer image macroblocks (whose upper-left corner is at (x, y)).
The device for scalable video enhancement layer motion estimation comprises the following parts: a basic layer coarse motion search module, which performs a rough search for the motion vector of the basic layer according to the basic layer image to be encoded and the basic layer reconstructed image; an enhancement layer coarse motion search module, which performs a rough search for the motion vector of the enhancement layer according to the enhancement layer image to be encoded, the reference layer motion search results and the reference layer motion vector costs; an integer-pixel precision motion search/estimation module, which refines the motion vectors found by the basic layer and enhancement layer coarse motion search modules to obtain the integer-pixel motion search/estimation result; and a quarter-pixel precision motion search/estimation module, which refines the result of the integer-pixel precision motion search/estimation module to obtain the quarter-pixel motion search/estimation result and outputs the reconstructed image and the related parameters at that point.
The modules of the device for scalable video enhancement layer motion estimation may, according to actual needs, each carry out the encoding of basic layer image macroblocks and of enhancement layer image macroblocks, and the concrete steps may also be carried out in different modules according to actual needs.
During search/estimation, when used for enhancement layer coding, the integer-pixel precision and quarter-pixel precision motion search/estimation modules can evaluate whether to use the inter-layer residual prediction tool, so as to obtain the optimal inter-prediction motion vector.
The present invention extends the motion estimation method for non-scalable image sequences so that it can be applied to scalable image coding. For motion estimation of the basic layer, the same algorithm is used as for non-scalable image coding. For motion estimation of the enhancement layer, the motion estimation results of the lower-resolution image are used effectively, which reduces the complexity of enhancement layer motion estimation.
To achieve the above purpose, the present invention comprises the following:
1. A scheme for inter-frame coding of scalable images is proposed. Motion estimation is first performed on the basic layer image to obtain the optimal motion vectors for its coding. Motion estimation is then performed on the enhancement layer image to compute the optimal inter-frame coding scheme and the corresponding motion vectors for enhancement layer coding.
2. In the final fine-search stage of motion estimation on the reference layer, the prediction errors corresponding to the reference layer motion vectors are recorded, so that they can guide the rough estimate of the optimal motion vector of the enhancement layer when the enhancement layer motion vector is estimated.
3. When motion estimation is performed on the enhancement layer image, the rough search process is skipped; instead, the motion vector of the higher-resolution image is estimated from the motion vector information and prediction error values obtained for the basic layer, and this estimate is used as the rough-search result for the subsequent fine search.
The scope of application of the present invention is not limited to the conditions assumed in the embodiment: for example, the enhancement layer size (width and height) may be any fractional multiple of that of its reference layer, the search range of the motion vector may be larger, and the method can also be applied to multiple enhancement layers with multiple decoded pictures as reference pictures.
Those skilled in the art will appreciate that the method and device of the present invention can be implemented in hardware, software, or a combination of the two, by various means such as a microprocessor, a digital signal processor, a field-programmable logic unit or a gate array.
In summary, although the present invention is disclosed above by way of preferred embodiments, these are not intended to limit it. Those of ordinary skill in the art to which the invention belongs may make various changes and modifications without departing from its spirit and scope. The scope of protection of the present invention is therefore defined by the appended claims.