Background technology
In video applications, encoder efficiency has always been a central concern. This is especially true of real-time coding applications such as surveillance, where an efficient encoder is a constant goal. The current trend is toward higher image resolutions and a growing number of video channels, so an encoder must handle not only standard definition but also high definition, and not only a single channel but also multiple channels. A range of fast processing algorithms therefore has to be built into the encoder implementation.
Scalable coding is in essence a multi-layer coding technique, and also a multi-channel coding technique, in which images of different resolutions are encoded within the same encoder. When an upper-layer image is encoded, coding information from the same layer can be used as a reference, which is called intra-layer prediction; image information from a reference layer can also be used as a reference, which is called inter-layer prediction.
At the end of 2007, the Joint Video Team of MPEG and VCEG established a scalable video coding standard on the basis of H.264/AVC, known as H.264/SVC (hereinafter SVC), and this standard was adopted by ISO as an international standard. SVC is a multi-layer video compression standard in which each layer corresponds to one input video sequence. The inherent attributes of a video sequence include its resolution (for example CIF, QVGA or 720p) and its frame rate (for example 30 frames per second). To improve compression performance, SVC uses inter-layer prediction: while being encoded, the current layer can obtain prediction data from a reference layer and use it as reference data. In the standard, the layer currently being encoded is called the enhancement layer and the reference layer is called the base layer.
Depending on how the input video sequences of the enhancement layer and the base layer differ, SVC provides three kinds of scalability: quality scalability, temporal scalability and spatial scalability. With quality scalability, the enhancement layer and the base layer have input sequences of the same resolution and frame rate, but the encoded enhancement layer has higher fidelity, that is, higher picture quality, than the base layer. With temporal scalability, the two input sequences have the same resolution but different frame rates, the frame rate of the enhancement layer being higher. With spatial scalability, the two input sequences have the same frame rate but different resolutions, the resolution of the enhancement layer being larger (it may also be identical). For these three scenarios SVC uses three corresponding compression techniques, namely quality scalable coding, temporal scalable coding and spatial scalable coding, to reduce the redundancy between the enhancement layer and the base layer and so improve compression efficiency. The bitstream produced by SVC coding adapts efficiently and conveniently to heterogeneous environments, such as networks with widely differing bandwidths, display devices of different resolutions, and consumer terminals of differing processing power.
Spatial scalable coding was designed for the case where the input video sequences of the current layer and the reference layer differ in resolution. During encoding, each video frame is divided into smaller coding units, called coding blocks. When a coding block of the enhancement layer is encoded, the reference data needed for coding can be obtained from the data of the enhancement layer itself, which is called intra-layer prediction, or from the data of the base layer, which is called inter-layer prediction. Depending on how the reference data is obtained, intra-layer prediction falls into two broad categories: intra prediction and inter prediction. Intra prediction takes its reference data from the same picture frame; inter prediction takes its reference data from other picture frames (one frame or two frames). In SVC, intra prediction includes prediction on 4×4 blocks, which has 9 modes such as horizontal prediction and vertical prediction, as well as prediction on 8×8 and 16×16 blocks, each with several modes. Inter prediction has even more modes: there is forward-referenced P-frame prediction, bidirectionally referenced B-frame prediction and so on, and each of these can be further divided by block size into 7 different modes such as 16×16 and 16×8.
For inter-layer prediction, SVC defines two modes: inter-layer intra prediction and inter-layer inter prediction. When the corresponding coding block of the base layer is intra coded, the coding block of the enhancement layer can use inter-layer intra prediction; when the corresponding coding block of the base layer is inter coded, the coding block of the enhancement layer can use inter-layer inter prediction. Depending on the information being predicted, inter-layer inter prediction comprises inter-layer motion information prediction (motion vector, reference index and so on) and inter-layer residual prediction. The SVC standard thus offers a great variety of coding modes for a coding block. After encoding, however, each coding block corresponds to exactly one coding mode, so a major task of the encoder is to select, from this wide variety of modes, the optimal coding mode for the current macroblock.
In the prior art, coding mode selection is a very important part of encoder implementation. A good mode selection scheme can quickly and effectively pick a suitable coding mode from the many available ones, which greatly reduces coding complexity and raises coding speed while preserving compression performance.
The SVC standard was established only about two years ago, and SVC implementations are still at an early stage. Current mode selection schemes directly compare the performance of intra-layer prediction and inter-layer prediction and select the better-performing one as the final coding mode.
A prior-art coding mode selection method for spatial scalable coding may be as shown in Fig. 1. Specifically, a performance description parameter P_EL is computed for the intra-layer prediction mode and a performance description parameter P_BL is computed for the inter-layer prediction mode; P_EL and P_BL are then compared, and the mode with the better parameter is selected as the final coding mode. If P_BL is greater than P_EL, the intra-layer mode is selected; otherwise, if P_BL is less than P_EL, the inter-layer mode is selected. Many kinds of performance description parameter can be used, such as the rate-distortion (RD) value of the coding block or the sum of absolute differences (SAD) of the coding block; for cost measures of this kind, a smaller value indicates better coding performance. Note that P_EL and P_BL must be of the same type when compared, either both SAD or both RD. These parameters measure coding efficiency, and their magnitude reflects how good the coding performance is. The prior art therefore determines the final coding mode by comparing the two performance description parameters of the intra-layer prediction mode and the inter-layer prediction mode.
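The comparison of Fig. 1 can be sketched as follows. This is a minimal illustration only: the function name and the mode labels are hypothetical, and the handling of the case P_BL = P_EL, which the description above does not specify, is an assumption.

```python
def select_mode_prior_art(p_el, p_bl):
    """Pick a mode by comparing two costs of the same type (both SAD or both RD).

    p_el: performance description parameter of the intra-layer prediction mode.
    p_bl: performance description parameter of the inter-layer prediction mode.
    """
    if p_bl > p_el:
        # The intra-layer mode has the smaller cost.
        return "intra_layer"
    # Assumption: a tie is resolved in favour of the inter-layer mode.
    return "inter_layer"
```

Both costs must be fully computed before the comparison can be made, which is the source of the extra computation discussed below.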
In studying and practising the prior art, the inventor found the following problem:
The prior-art coding mode selection method must always compute the performance description parameters of both the intra-layer prediction mode and the inter-layer prediction mode and then compare them. This computation introduces additional complexity and therefore lowers the speed of the encoder.
Embodiment
The embodiments of the present invention provide a coding mode selection method and device for layered video coding.
To help those skilled in the art better understand the solution of the present invention, the embodiments are described in further detail below with reference to the accompanying drawings and implementations.
The present application addresses mode selection in SVC and proposes a scheme for choosing between the intra-layer mode and the inter-layer motion information prediction mode. After the intra-layer coding mode has been selected for a coding block of the current layer, the scheme decides quickly and effectively whether to select the inter-layer motion information prediction mode, thereby reducing the complexity of inter-layer coding mode selection.
SVC is a scalable video coding standard that supports three kinds of scalability: temporal scalability, spatial scalability and quality scalability. The present application applies to spatial scalability. Fig. 2 is a schematic diagram of spatial scalability: the enhancement layer image is cropped and downscaled to obtain the base layer image, and the dashed region in the enhancement layer is the region that corresponds to the base layer image. SVC requires Wc >= Wb and Hc >= Hb, where Wb and Hb are the width and height of the base layer image; We and He are the width and height of the enhancement layer image; and Wc and Hc are the width and height of the region of the enhancement layer that corresponds to the base layer. Only coding blocks located inside the Wc by Hc region have inter-layer prediction modes. (x0, y0) is the upper-left corner, in the enhancement layer, of the region corresponding to the base layer image, and is used to determine the position of that region.
To improve compression performance, the standard provides many coding modes, which can be divided into two broad categories: intra-layer prediction modes and inter-layer prediction modes. The intra-layer prediction modes are the same as the coding modes of H.264/AVC; the inter-layer prediction modes obtain prediction data from the data of the reference layer and are specific to SVC. The present application, having determined the intra-layer prediction mode, goes on to decide whether to adopt the inter-layer motion information prediction mode.
The flow of an embodiment of the coding mode selection method in layered video coding of the present application may be as shown in Fig. 3, and comprises:
S310: Obtain the motion information of the intra-layer prediction mode selected for the enhancement layer coding block.
Obtaining the motion information of the intra-layer mode means selecting a suitable intra-layer mode from the many intra-layer prediction modes, encoding with that mode, and taking the resulting motion information.
In video coding, motion information comprises a motion vector and a reference index.
As shown in Fig. 4, when the current block of the current frame is mapped onto the reference frame, the mapped block does not necessarily coincide with the position of the matching block in the reference frame; there is an offset, and this offset is expressed by the motion vector. Adjacent frames of a video are very similar in content. To improve coding efficiency, predictive coding can be used: the difference between the coding block and the matching block in the reference frame is taken to obtain the residual, so that only the residual needs to be processed in subsequent coding. However, the two frames are captured a certain interval apart, and objects in the image move during that interval. If an object in the current frame is mapped directly onto the reference frame, it will not coincide with itself but will be offset by a certain amount, which is what the motion vector represents. The arrow in Fig. 4 represents the motion vector. It is a two-dimensional quantity, written for example as (x, y), where x and y represent horizontal and vertical motion respectively.
To obtain the motion vector, the best matching block is selected within the search range of the reference frame so that a performance parameter is minimized. The reference index range is a coding control parameter that the encoder can set in advance. The performance parameter here measures the degree of matching; it can be SAD, or SSD (sum of squared differences), and so on. Once the search range is set, how the search points are then chosen is a coding strategy specific to the encoder, and there are many such strategies. The easiest to understand is to search every point in the search range: a performance parameter is computed for each search point, and the point with the smallest value is selected.
The reference index indicates which of the preceding reference frames contains the matching block. It is determined in much the same way as the motion vector: the search is carried out in different reference frames, and the index of the frame that gives the best performance is chosen.
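The search over every point of the search range can be sketched as follows, using SAD as the performance parameter. This is a minimal illustration under stated assumptions: frames are plain lists of rows of integer samples, the search window is square, candidates that fall outside the reference frame are skipped, and the function names are hypothetical.

```python
def sad(cur, ref, rx, ry):
    """Sum of absolute differences between block `cur` and the block of
    the same size in `ref` whose upper-left corner is (rx, ry)."""
    return sum(
        abs(cur[j][i] - ref[ry + j][rx + i])
        for j in range(len(cur))
        for i in range(len(cur[0]))
    )


def full_search(cur, ref, bx, by, search_range):
    """Return (motion_vector, cost) minimizing SAD over a square window.

    (bx, by) is the position of the current block in its own frame;
    the motion vector (dx, dy) is the offset to the best matching block.
    """
    h, w = len(cur), len(cur[0])
    best_mv, best_cost = None, None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            rx, ry = bx + dx, by + dy
            # Skip candidates that fall outside the reference frame.
            if rx < 0 or ry < 0 or rx + w > len(ref[0]) or ry + h > len(ref):
                continue
            cost = sad(cur, ref, rx, ry)
            if best_cost is None or cost < best_cost:
                best_mv, best_cost = (dx, dy), cost
    return best_mv, best_cost
```

Repeating the same search over several reference frames and keeping the frame with the smallest cost would yield the reference index in the same manner.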
S320: On the basis of the motion information of the intra-layer prediction mode, obtain the motion information of the inter-layer prediction mode for the enhancement layer coding block.
The motion information of the inter-layer mode for the enhancement layer coding block can be obtained by extracting the inter-layer predicted motion information of the enhancement layer coding block from the data of the corresponding coding block in the base layer.
A macroblock has several prediction modes available when inter prediction is performed. Fig. 5 shows the inter prediction modes, which are also the inter modes of intra-layer prediction. There are 7 of them, distinguished by partition size: 16×16, 16×8, 8×16, 8×8, 8×4, 4×8 and 4×4. In each mode the macroblock is divided into different partitions, and each partition has its own motion information, that is, one motion vector and one reference index. In the 16×8 mode, for example, there are 2 partitions, each with its own motion vector and reference index.
Once the SVC encoder has selected an intra-layer prediction mode, that is, one of the modes in Fig. 5, the inter-layer motion prediction information of each coding block is then obtained according to the following steps:
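The relationship between a mode and its partitions can be illustrated with a short sketch. It is a simplification: it assumes that a 16×16 macroblock is tiled uniformly with partitions of one size, and the names MODES and partitions are hypothetical.

```python
# The seven partition sizes (width, height) named above.
MODES = [(16, 16), (16, 8), (8, 16), (8, 8), (8, 4), (4, 8), (4, 4)]


def partitions(width, height, mb_size=16):
    """List the (x, y, width, height) partitions that tile a macroblock.

    Assumption: the whole macroblock uses one partition size. Each
    partition would carry its own motion vector and reference index.
    """
    return [(x, y, width, height)
            for y in range(0, mb_size, height)
            for x in range(0, mb_size, width)]
```

With this tiling, the 16×8 mode produces the two partitions (0, 0, 16, 8) and (0, 8, 16, 8), in line with the example in the text.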
A1: Find the corresponding block of the enhancement layer coding block in the base layer. The corresponding block in the base layer can be calculated from the pixel at coordinate (1, 1) of each enhancement layer coding block. A coding block is a rectangle containing a number of pixels, which are identified by coordinates; the upper-left point is usually defined as position (0, 0). The calculation follows formulas (1) and (2). The 4×4 block in the base layer that contains the point (Bx, By) is the corresponding block of the enhancement layer coding block in the base layer.
Ex and Ey are the enhancement layer pixel position; Bx and By are the position in the base layer that corresponds to the enhancement layer pixel (Ex, Ey); S is the computational precision, generally 16; and round() is a rounding operation, for example rounding to the nearest integer.
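Formulas (1) and (2) are not reproduced in this text. Purely as an illustration of the kind of fixed-point position mapping that the symbols Ex, Ey, Bx, By, S and round() suggest, a sketch might look as follows. The scale-factor expression and the rounding used here are assumptions and should not be read as formulas (1) and (2); the function names are hypothetical.

```python
def map_to_base_layer(ex, ey, x0, y0, wb, hb, wc, hc, s=16):
    """Map an enhancement layer pixel (ex, ey) to a base layer position.

    Assumed mapping (NOT the document's formulas (1) and (2)): scale the
    offset from (x0, y0) by Wb/Wc and Hb/Hc in fixed point with s
    fractional bits, rounding to the nearest integer.
    """
    scale_x = ((wb << s) + wc // 2) // wc  # approximately Wb / Wc * 2**s
    scale_y = ((hb << s) + hc // 2) // hc  # approximately Hb / Hc * 2**s
    half = 1 << (s - 1)                    # added so the shift rounds
    bx = ((ex - x0) * scale_x + half) >> s
    by = ((ey - y0) * scale_y + half) >> s
    return bx, by


def containing_4x4_block(bx, by):
    """Upper-left corner of the 4x4 base layer block containing (bx, by)."""
    return (bx // 4) * 4, (by // 4) * 4
```

Under these assumptions and a resolution ratio of 2 (a 352 by 288 region mapped to a 176 by 144 base layer), the enhancement layer pixel (17, 33) maps to (9, 17), which lies in the 4×4 base layer block at (8, 16).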
A2: Obtain the motion information of the corresponding block in the base layer; this motion information comprises a motion vector and reference index information.
The reference index information in the motion information of the base layer corresponding block determined above is used as the reference index information of the inter-layer prediction mode for the enhancement layer coding block.
A3: Scale the motion vector of the base layer corresponding block and use the result as the motion vector of the inter-layer prediction mode for the enhancement layer coding block.
The scaling is given by formula (3):
Here Mv_EL is the motion vector after scaling and Mv_BL is the motion vector before scaling. Mv_BLx and Mv_BLy are the components of the motion vector of the base layer corresponding block, where x denotes the horizontal component and y the vertical component. Mv_ELx and Mv_ELy are the components of the scaled motion vector, which is the motion vector in the motion information of the inter-layer prediction mode.
S330: Select the coding mode by judging whether the motion information of the intra-layer mode selected for the enhancement layer coding block is consistent with the predicted motion information of the inter-layer mode.
On the basis of the two preceding steps, the final coding mode is selected by checking whether the motion information of the intra-layer mode is consistent with the predicted motion information of the inter-layer mode.
Specifically, if the two are consistent, the inter-layer motion information prediction mode is used for encoding; otherwise the intra-layer mode is used.
In SVC coding, a coding block is represented by motion information and residual information. The motion information indicates the prediction data that corresponds to the coding block, and the residual is the difference between the coding block and that prediction data. Inter-layer motion information prediction is one of the inter-layer prediction modes of SVC. It can effectively reduce the number of bits needed to represent the motion information, but brings no obvious gain in representing the residual. The present application avoids computing performance parameters during mode selection: it selects the coding mode merely by comparing whether the motion information of the inter-layer mode and that of the intra-layer mode are consistent, which reduces the amount of computation. Moreover, because the inter-layer prediction mode is chosen only when its motion information is consistent with that of the intra-layer mode, image coding quality is also preserved.
The above method embodiment is illustrated below by an example.
As shown in Fig. 6, in two-layer spatial scalable coding, the ratio of the enhancement layer resolution to the base layer resolution is 2. This specific example is carried out according to the steps shown in Fig. 7:
S710: Obtain the motion information of the intra-layer prediction mode of A1: the motion vector is Mv_A1 and the reference index is RIdx_A1.
As described above, the motion vector is again obtained by selecting the best matching block within the search range of the reference frame so that the performance parameter is minimized. The reference index range is a coding control parameter that the encoder can set in advance. The performance parameter here measures the degree of matching and can be SAD, SSD and so on.
A1 is a coding block of the enhancement layer whose intra-layer mode has been determined to be the 16×16 mode. The motion vector obtained in the manner described above is, for example, Mv_A1, and the reference index is RIdx_A1.
S720: On the basis of the motion information of the intra-layer prediction mode of A1, obtain the motion information of the inter-layer prediction mode: the motion vector is MvPred_A1 = 2*Mv_A0 and the reference index is RIdxPred_A1 = RIdx_A0.
It should be noted that base layer and enhancement layer are relative concepts. In a three-layer configuration, for example, the layers from bottom to top are layer A0, layer A1 and layer A2. The reference layer of layer A2 is layer A1 and the reference layer of layer A1 is layer A0, so layer A1 is an enhancement layer relative to layer A0 and a base layer relative to layer A2. Although A0 is in the lower layer, it is also a coding block that has to be encoded, and it therefore also has motion information.
According to formulas (1) and (2) above, the reference block in the base layer that corresponds to A1 is A0, a block of size 8×8. After A0 has been encoded, its motion vector is Mv_A0 and its reference index is RIdx_A0.
In inter-layer motion information prediction, the motion vector of the base layer corresponding block is scaled and used as the motion vector of the inter-layer prediction mode for the enhancement layer coding block. The predicted motion vector and predicted reference index of the inter-layer prediction mode for the enhancement layer coding block are therefore MvPred_A1 = 2*Mv_A0 and RIdxPred_A1 = RIdx_A0, where 2 is the scaling factor between the enhancement layer and the base layer.
S730: Judge whether the motion vector and reference index of the intra-layer mode are consistent with the motion vector and reference index of the inter-layer mode. If they are consistent, select the inter-layer motion information prediction mode; if they are inconsistent, select the intra-layer coding mode.
Here, if MvPred_A1 = Mv_A1 and RIdxPred_A1 = RIdx_A1, the inter-layer motion information prediction mode is selected for this coding block; otherwise, if MvPred_A1 != Mv_A1 or RIdxPred_A1 != RIdx_A1, the intra-layer coding mode is selected.
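Steps S720 and S730 of this example can be sketched as follows. The sketch assumes, as in the example, a scaling factor of 2 applied to both components of the motion vector; the function names and mode labels are hypothetical, and motion vectors are represented as (x, y) tuples.

```python
def predict_inter_layer_motion(mv_a0, ridx_a0, scale=2):
    """S720: scale the base layer motion vector and inherit its reference index."""
    mv_pred = (scale * mv_a0[0], scale * mv_a0[1])
    return mv_pred, ridx_a0


def select_mode(mv_a1, ridx_a1, mv_a0, ridx_a0, scale=2):
    """S730: choose the mode by comparing motion information only.

    No SAD or RD cost is computed for the inter-layer mode.
    """
    mv_pred, ridx_pred = predict_inter_layer_motion(mv_a0, ridx_a0, scale)
    if mv_pred == mv_a1 and ridx_pred == ridx_a1:
        return "inter_layer_motion_prediction"
    return "intra_layer"
```

For instance, with Mv_A0 = (2, -1) and RIdx_A0 = 0, the prediction is MvPred_A1 = (4, -2) and RIdxPred_A1 = 0, so the inter-layer motion information prediction mode is chosen only if the intra-layer search also produced Mv_A1 = (4, -2) with RIdx_A1 = 0.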
The prior-art coding mode selection method does not take into account the relationship between the data of the inter-layer prediction mode and the data of the intra-layer prediction mode; it computes the performance parameters of both and then compares them, which wastes computation. The above embodiment avoids computing performance parameters during mode selection and selects the coding mode merely by comparing whether the motion information of the inter-layer mode and that of the intra-layer mode are consistent, which reduces the amount of computation while also preserving image coding quality.
An embodiment of a coding mode selection device in layered video coding of the present application is described below. The device embodiment may be as shown in Fig. 8, and comprises:
A first acquiring unit 81, configured to obtain the motion information of the intra-layer prediction mode selected for the enhancement layer coding block;
A second acquiring unit 82, configured to obtain, on the basis of the motion information of the intra-layer prediction mode, the motion information of the inter-layer prediction mode for the enhancement layer coding block;
A selection unit 83, configured to select the coding mode by judging whether the motion information of the intra-layer mode is consistent with the predicted motion information of the inter-layer mode: if the two are consistent, the inter-layer motion information prediction mode is used for encoding; otherwise the intra-layer mode is used.
Preferably, the device embodiment may be as shown in Fig. 9, wherein the second acquiring unit 82 comprises:
A search unit 821, configured to find, in the base layer, the corresponding block of the enhancement layer coding block for which the intra-layer prediction mode has been determined;
A third acquiring unit 822, configured to obtain the motion information of the corresponding block in the base layer, this motion information comprising a motion vector and reference index information;
A scaling unit 823, configured to scale the motion vector of the base layer corresponding block and use the result as the motion vector of the inter-layer prediction mode for the enhancement layer coding block.
Preferably, the motion information comprises a motion vector and a reference index, and the size of the enhancement layer coding block is one of 16×16, 16×8, 8×16, 8×8, 8×4, 4×8 and 4×4.
Preferably, the search unit 821 finding the corresponding block of the enhancement layer coding block in the base layer may specifically comprise:
Calculating (Bx, By) from the (1, 1) position of each enhancement layer coding block by formulas (1) and (2) above;
Determining the 4×4 block in the base layer that contains the point (Bx, By) as the corresponding block of the enhancement layer coding block in the base layer.
Preferably, the third acquiring unit 822 obtaining the motion information of the corresponding block in the base layer may specifically comprise:
Using the reference index information in the motion information of the base layer corresponding block determined above as the reference index information of the inter-layer prediction mode for the enhancement layer coding block.
The motion vector of the base layer corresponding block can be scaled by formula (3) above and then used as the motion vector of the inter-layer prediction mode for the enhancement layer coding block.
Here Mv_EL is the motion vector after scaling and Mv_BL is the motion vector before scaling. Mv_BLx and Mv_BLy are the components of the motion vector of the base layer corresponding block, where x is the horizontal component and y is the vertical component. Mv_ELx and Mv_ELy are the components of the scaled motion vector, which is the motion vector in the motion information of the inter-layer prediction mode.
Preferably, when the first acquiring unit 81 obtains the motion information of the intra-layer prediction mode selected for the enhancement layer coding block, it can select the best matching block within the search range of the reference frame so that the performance parameter is minimized.
The embodiments of the present invention have thus been described by way of example. Those of ordinary skill in the art will appreciate that the present invention admits many modifications and variations without departing from its spirit, and it is intended that the appended claims cover such modifications and variations.