Fast transcoding method from MPEG-2 to H.264
Technical field
The invention belongs to the field of video format conversion in video compression coding, and in particular relates to a fast transcoding method from MPEG-2 to H.264.
Background technology
Digital television is developing very rapidly, but limited bandwidth constrains the expansion of digital video services. Digital TV programs mostly use the MPEG-2 video compression standard, with large picture sizes and high bit rates. The bandwidth available to most digital cable customers can hardly support real-time transmission of multiple high-bit-rate video streams. The problem is particularly acute for mobile digital TV, interactive HDTV (High-Definition Television) and Web TV. To let users watch more digital television programs smoothly at lower bandwidth, the bit rate of the video stream must be reduced. Together with limits on storage capacity and the emergence of many different digital television terminals, this makes the demand of digital cable customers for efficient video coding technology increasingly urgent.
For example, CN1745573, "Image pickup equipment and moving picture photographing method thereof", discloses an image pickup device operating in a moving-picture photographing mode. Before moving-picture photographing begins, as indicated by the shutter release button on the key input part (12), the clock frequency of the control section (10) is set to a normal frequency, which reduces power consumption in the monitoring state and extends battery life. When the start of moving-picture photographing is indicated, the clock conversion control part (101) significantly raises the clock frequency, so that while the moving-picture data is being processed the MPEG converter (7) can access at high speed the SDRAM (8) that stores YUV data, reference data, search data and the like, and can compress the moving picture in real time.
CN1567271, "MPEG code stream conversion and acquisition method and device with a high-speed network interface", performs data filtering, PID modification of the transport stream, service information insertion and rate conversion inside the device; the device has a Fast Ethernet interface that delivers the converted transport stream to a computer. It can capture the code stream directly and can also process it. CN1633180, "Multi-description video coding method based on transform and data fusion", comprises the steps of: applying transforms 1~n to the signal to be coded; quantizing and entropy-coding each of the transformed signals 1~n; decoding the quantized and entropy-coded signals 1~n over their respective channels 1~n; inverse-transforming each of the decoded signals 1~n; obtaining the side descriptions 1~n after inverse transform; and fusing the n inverse-transformed data sets into a central description. It combines multiple description coding with video coding based on transform and data fusion. For one video sequence, this coding method produces several MPEG code streams; a video sequence with larger distortion can be reconstructed from any single stream, and when several streams are received a video sequence with smaller distortion is reconstructed.
H.264 is a new-generation video coding standard jointly developed by the Joint Video Team of ISO/IEC MPEG and ITU-T VCEG. At the same image quality, H.264 can save about 50% of the bit rate relative to other standards such as MPEG-2. Because of the superior coding efficiency of H.264, research on MPEG-2 to H.264 transcoding is an urgent need. However, the H.264 coding algorithm differs significantly from that of MPEG-2, which makes this transcoding process more complex than other kinds of transcoding.
There are currently two main transcoding architectures: cascaded pixel-domain transcoding (CPDT) and DCT-domain transcoding (DDT). Cascaded pixel-domain transcoding first fully decodes the MPEG-2 video stream and then re-encodes it as H.264. This structure is very flexible and can transcode between different bit rates, frame rates, image resolutions and coding modes, but its computational complexity is high. DCT-domain transcoding (DDT) re-estimates the DCT coefficients, motion vectors and so on directly in the DCT domain. Its computational complexity is low, but its flexibility is limited: when the motion vectors, bit rate, resolution and the like must change, this architecture is difficult to use.
H.264 inter prediction is a prediction mode that uses previously coded video frames/fields and block-based motion compensation. It differs from inter prediction in earlier standards in its wider range of block sizes (from 16×16 down to 4×4), its use of sub-pixel motion vectors (luma uses MVs with 1/4-pixel precision), and its use of multiple reference frames.
The key steps are:
1. Macroblock partitioning
H.264 uses tree-structured motion compensation: each macroblock (16×16 pixels) can be partitioned in four ways: one 16×16, two 16×8, two 8×16, or four 8×8 partitions, with four corresponding kinds of motion compensation. Each sub-macroblock in 8×8 mode can in turn be partitioned in four ways: one 8×8, two 4×8, two 8×4, or four 4×4. These partitions and sub-macroblocks greatly improve the correlation between macroblocks.
Each partition or sub-macroblock has its own independent motion compensation. Each MV must be coded and transmitted, and the choice of partition must also be coded into the compressed bitstream. For large partition sizes, the MV and partition type need only a few bits, but the motion-compensated residual can have high energy in detailed regions. Small partition sizes give a low-energy motion-compensated residual but need more bits to signal the MVs and the partition choice. The choice of partition size therefore affects compression performance. In general, large partitions suit flat regions and small partitions suit detailed regions.
The chroma components of a macroblock (Cr and Cb) each have half the resolution of the luma component, both horizontally and vertically. A chroma block is partitioned in the same way as the luma block, but with half the size in both the horizontal and the vertical direction. The MV of a chroma block is obtained by halving the horizontal and vertical components of the corresponding luma MV.
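The partition sizes and the chroma relations described above can be illustrated with a minimal sketch. The names `MB_PARTITIONS`, `SUB_MB_PARTITIONS`, `partition_count`, `chroma_shape` and `chroma_mv` are hypothetical helpers introduced only for illustration; they do not come from any H.264 reference software.

```python
# Luma partition shapes (width, height) for the macroblock and sub-macroblock modes.
MB_PARTITIONS = {"16x16": (16, 16), "16x8": (16, 8), "8x16": (8, 16), "8x8": (8, 8)}
SUB_MB_PARTITIONS = {"8x8": (8, 8), "8x4": (8, 4), "4x8": (4, 8), "4x4": (4, 4)}


def partition_count(shape, parent=(16, 16)):
    """Number of partitions of the given shape that tile the parent block."""
    width, height = shape
    return (parent[0] // width) * (parent[1] // height)


def chroma_shape(luma_shape):
    """Chroma partitions have half the luma size horizontally and vertically."""
    width, height = luma_shape
    return (width // 2, height // 2)


def chroma_mv(luma_mv):
    """Chroma MV: both components of the luma MV are halved."""
    mv_x, mv_y = luma_mv
    return (mv_x / 2, mv_y / 2)


if __name__ == "__main__":
    for name, shape in MB_PARTITIONS.items():
        print(name, partition_count(shape), "partition(s), chroma", chroma_shape(shape))
    for name, shape in SUB_MB_PARTITIONS.items():
        print("sub", name, partition_count(shape, parent=(8, 8)), "partition(s)")
```

Each partition carries its own MV, so the partition count is also the number of MVs that must be signalled for that mode.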
2. RD optimization
The H.264 encoder uses a coding control model based on Lagrangian optimization to determine the coding mode of each macroblock, such as its partition type, motion vectors and quantization parameter. Its coding performance is greatly improved relative to all earlier coding standards.
Unlike SAE (sum of absolute errors), the RD optimization algorithm selects the macroblock coding mode using a Lagrangian function: it computes the bit rate and distortion of each macroblock by performing one encode and one decode, and selects the coding mode that minimizes the Lagrangian cost function as the coding mode of that macroblock. The main drawback of this method is its very high computational complexity, although it can reach optimal RD performance. In practical applications, particularly in real-time transcoding systems, the computational load of Lagrangian optimization is too large for real-time transcoding.
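As a minimal sketch of the mode decision described above, the Lagrangian cost can be written as J = D + λ·R, where D is the distortion, R the number of bits and λ the Lagrange multiplier. The function names, the candidate dictionary and all numeric values below are assumptions made for illustration only; a real encoder obtains D and R by actually encoding and decoding the macroblock in each mode.

```python
def rd_cost(distortion, rate_bits, lam):
    """Lagrangian cost J = D + lambda * R for one candidate mode."""
    return distortion + lam * rate_bits


def select_mode(candidates, lam):
    """Return the mode name whose Lagrangian cost is smallest.

    candidates maps a mode name to a (distortion, rate_bits) pair.
    """
    return min(candidates, key=lambda mode: rd_cost(*candidates[mode], lam))


if __name__ == "__main__":
    # Hypothetical measurements for one macroblock (not real encoder output).
    candidates = {"16x16": (1000.0, 20), "8x8": (600.0, 90)}
    print(select_mode(candidates, lam=10.0))
    print(select_mode(candidates, lam=2.0))
```

A large λ favours modes that spend few bits, while a small λ favours modes with low distortion. Evaluating every candidate this way is what makes full RD optimization expensive.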
Machine learning solves practical problems by learning from and analyzing data and the statistics obtained under different algorithms. It is widely used in many fields, such as search engines, medical diagnosis, stock analysis, DNA sequence classification, speech and handwriting recognition, object recognition in computer vision, and robot motion.
Summary of the invention
The object of the invention is to address the insufficient efficiency of MPEG-2 to H.264 transcoding, which cannot achieve real-time conversion, by providing a new macroblock mode prediction method based on machine learning, and in particular a fast MPEG-2 to H.264 transcoding method. A further object of the invention is to propose a cascaded pixel-domain transcoding algorithm based on machine learning. Using the correlation between H.264 macroblock mode selection and the MPEG-2 motion-compensated residual, the general H.264 macroblock mode selection problem is converted into a data classification problem. The motion-compensated residual, MB mode and coded block pattern (CBPC) obtained from MPEG-2 decoding are mapped directly to an H.264 macroblock mode, which greatly reduces the transcoding complexity while preserving the flexibility of transcoding.
The technical solution of the invention is a fast MPEG-2 to H.264 transcoding method that uses the correlation between H.264 macroblock mode selection and the MPEG-2 motion-compensated residual to convert H.264 macroblock mode selection into data classification, characterized in that: the motion-compensated residual, MB mode and coded block pattern (CBPC) obtained from MPEG-2 decoding are mapped directly to an H.264 macroblock mode; when the MPEG-2 stream is decoded, the relevant MB information is saved, including the MB coding mode, the coded block pattern (CBPC), and the mean and variance of the MB residual (computed separately for each 4×4 sub-block, giving 16 means and 16 variances); after decoding, a standard H.264 encoder encodes the YUV images and the H.264 MB coding modes are saved; and a machine learning algorithm is used to obtain a decision tree for classifying the H.264 coding mode. The decision tree classification follows these principles:
1) a classifier that divides the input into Intra, Skip, Inter 16×16 and Inter 8×8;
2) a classifier that divides Inter 16×16 into 16×16, 16×8 and 8×16;
3) a classifier that divides Inter 8×8 into 8×8, 8×4, 4×8 and 4×4;
Decision tree generation follows these principles:
1) if the MC of the MPEG-2 MB is not coded, that is, there is no non-zero MV and none of the four 8×8 blocks has coded coefficients, the H.264 MB will be coded as 16×16, and a second decision by the decision tree is needed to select the best mode;
2) if the MPEG-2 MB is in intra mode, then in H.264 the MB is coded as intra or inter 8×8; if it is coded as intra, the algorithm stops; if it is inter 8×8, a second decision is needed to select the best mode;
3) if the MPEG-2 MB is in skip mode, the MB is also in skip mode in H.264.
When the MPEG-2 stream is decoded, the MC residual and the macroblock mode of MPEG-2 are obtained, and the mean and variance of the MC residual of each 4×4 sub-block are computed. The H.264 macroblock coding mode is obtained from the decision tree and is assigned directly to the MB during H.264 encoding. The inputs to the H.264 encoder are the YUV data decoded from MPEG-2 and the MB coding modes; the MPEG-2 motion vectors are not used, and motion estimation uses the MB coding mode given by the decision tree.
Through machine learning, the invention achieves low-complexity MPEG-2 to H.264 transcoding. When MPEG-2 is decoded, the relevant MB information is saved, including the MB coding mode, the coded block pattern (CBPC), and the mean and variance of the MB residual (computed separately for each 4×4 sub-block, giving 16 means and 16 variances). After decoding, a standard H.264 encoder encodes the YUV images, and the H.264 MB coding modes are saved. From the MPEG-2 MB data and the corresponding H.264 MB coding modes, a machine learning algorithm produces a decision tree for classifying the H.264 coding mode. Fig. 3 is the block diagram of decision tree generation for MPEG-2 to H.264 transcoding.
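The per-sub-block statistics described above can be sketched as follows. This is a minimal illustration, assuming that the residual of one macroblock is available as a 16×16 array of integers, that the sub-blocks are visited in raster order, and that the population variance is meant; the text fixes none of these details, and `subblock_stats` is a hypothetical helper name.

```python
def subblock_stats(residual):
    """Return 16 (mean, variance) pairs, one per 4x4 sub-block of a 16x16 residual.

    residual is a list of 16 rows with 16 values each. Sub-blocks are visited
    in raster order (an assumption) and the population variance is used
    (also an assumption).
    """
    stats = []
    for block_y in range(0, 16, 4):
        for block_x in range(0, 16, 4):
            values = [residual[y][x]
                      for y in range(block_y, block_y + 4)
                      for x in range(block_x, block_x + 4)]
            mean = sum(values) / 16
            variance = sum((v - mean) ** 2 for v in values) / 16
            stats.append((mean, variance))
    return stats


if __name__ == "__main__":
    # Synthetic residual: constant 2 in the top-left 4x4 block, 0 elsewhere.
    residual = [[2 if (x < 4 and y < 4) else 0 for x in range(16)] for y in range(16)]
    for index, (mean, variance) in enumerate(subblock_stats(residual)):
        print(index, mean, variance)
```

The 32 resulting numbers, together with the MB mode and the CBPC bits, form one feature vector per macroblock.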
Description of drawings
Fig. 1 Macroblock and sub-macroblock partitions
Fig. 2 H.264 RD optimization algorithm
Fig. 3 Block diagram of decision tree generation for MPEG-2 to H.264 transcoding
Fig. 4 Decision tree of the video transcoder
Fig. 5 Block diagram of the transcoder
Embodiment
The invention is implemented as follows:
1. Generation of the decision tree
A decision tree is built by analyzing a set of sample data to generate branches and nodes. A node represents a variable, and a branch represents a possible value of that variable. When the decision tree has more than one level, the nodes represent decisions made on different variables. In data classification, nodes represent classes and branches represent the features on which the classification is based. With the decision tree, an input sample can be assigned to one of the classes.
The decision tree can be generated with the WEKA data mining tool. The file format used by the WEKA data mining programs is ARFF (Attribute-Relation File Format). An ARFF file is written in ASCII and describes the relation among a set of attributes. It generally has two sections: 1) a header, containing the name of the relation, the attributes and their types; 2) the data.
The training set consists of high-bit-rate MPEG-2 sequences that contain no B frames. The decision set is obtained by decoding the MPEG-2 streams and re-encoding them as H.264. During H.264 encoding, the quantization parameter is 25 and RD optimization is used to obtain the macroblock coding modes. Extensive experiments show that a good training set contains image regions ranging from flat to highly detailed. Good sample sequences are, for example, Flower or Football. The final goal is to generate a single decision tree that can be used to transcode any MPEG-2 video.
Fig. 4 shows the decision tree built by the process of Fig. 3. The transcoding decision tree has three levels and uses three different WEKA trees:
1) a classifier that divides the input into Intra, Skip, Inter 16×16 and Inter 8×8;
2) a classifier that divides Inter 16×16 into 16×16, 16×8 and 8×16;
3) a classifier that divides Inter 8×8 into 8×8, 8×4, 4×8 and 4×4.
For the first WEKA decision tree, the training data set uses the means and variances of the residuals of the sixteen 4×4 sub-blocks in an MPEG-2 macroblock, the macroblock mode (skip, intra and three non-intra modes, coded as 0, 1, 2, 4 and 8 respectively), the coded block pattern (CBPC) and the H.264 MB coding mode. The attributes in the ARFF header are defined as follows:
@RELATION mean-variance_4x4
@ATTRIBUTE mean0 Numeric
@ATTRIBUTE variance0 Numeric
@ATTRIBUTE mean1 Numeric
@ATTRIBUTE variance1 Numeric
............................................
@ATTRIBUTE mean15 Numeric
@ATTRIBUTE variance15 Numeric
@ATTRIBUTE mode_mpeg2 {0,1,2,4,8}
@ATTRIBUTE CBPC0 {0,1}
............................................
@ATTRIBUTE CBPC6 {0,1}
@ATTRIBUTE class {0,1,8,9}
The instance rows in the ARFF data section are used to train the decision tree model; each row represents one macroblock sample.
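A training file with the header shown above could be written out by a small script such as the following sketch. The helper names `arff_header` and `arff_row` are hypothetical, and the sketch assumes that the elided attributes follow the obvious numbering (mean0 to mean15, variance0 to variance15, CBPC0 to CBPC6). The meaning of the class labels 0, 1, 8 and 9 is not stated in the text, so the sketch passes them through unchanged.

```python
def arff_header(relation="mean-variance_4x4", n_subblocks=16, n_cbpc=7):
    """Build the ARFF header text, ending with the @DATA marker."""
    lines = [f"@RELATION {relation}"]
    for i in range(n_subblocks):
        lines.append(f"@ATTRIBUTE mean{i} Numeric")
        lines.append(f"@ATTRIBUTE variance{i} Numeric")
    lines.append("@ATTRIBUTE mode_mpeg2 {0,1,2,4,8}")
    for i in range(n_cbpc):
        lines.append(f"@ATTRIBUTE CBPC{i} {{0,1}}")
    lines.append("@ATTRIBUTE class {0,1,8,9}")
    lines.append("@DATA")
    return "\n".join(lines)


def arff_row(stats, mpeg2_mode, cbpc_bits, h264_class):
    """Format one macroblock sample as a comma-separated ARFF data row.

    stats is a list of (mean, variance) pairs in attribute order.
    """
    values = []
    for mean, variance in stats:
        values += [mean, variance]
    values += [mpeg2_mode, *cbpc_bits, h264_class]
    return ",".join(str(v) for v in values)


if __name__ == "__main__":
    print(arff_header())
    # One hypothetical sample: identical statistics in all 16 sub-blocks.
    print(arff_row([(1.0, 2.0)] * 16, 2, [0] * 7, 8))
```

The attribute order in each data row must match the order of the `@ATTRIBUTE` declarations, which is why both functions iterate in the same sequence.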
For the second decision tree, the training sample set uses the means and variances of the residuals of the sixteen 4×4 sub-blocks in an MPEG-2 macroblock, the macroblock mode (three non-intra modes), the coded block pattern (CBPC) and the 16×16 sub-mode of the H.264 MB (16×16, 16×8 or 8×16). This decision tree determines the final coding mode for inter 16×16.
For the third decision tree, the training sample set uses the means and variances of the residuals of four 4×4 sub-blocks in an MPEG-2 macroblock, the macroblock mode (three non-intra modes), the coded block pattern (CBPC) and the 8×8 sub-mode of the H.264 MB (8×8, 8×4, 4×8 or 4×4).
Based on these training files, the WEKA data mining tool generates the decision trees using the J48 algorithm. J48 is the WEKA implementation of the C4.5 algorithm proposed by Ross Quinlan, which is widely used in data mining.
2. Classification based on the decision tree
MPEG-2 uses 16×16 motion compensation (MC), so the picture is not completely decorrelated in the temporal domain. The MC residual can therefore indicate the H.264 macroblock coding mode. The open-source data mining tool WEKA is used to analyze the mean and variance of the MPEG-2 macroblock residual, the coding mode and the coded block pattern (CBPC) in order to obtain the H.264 macroblock coding mode. The decision tree of this transcoder is shown in Fig. 4.
This decision tree contains three WEKA decision trees, marked in grey in Fig. 4. The first WEKA decision tree distinguishes the skip, Intra, 8×8 and 16×16 modes. For the 16×16 mode the second decision tree, and for the 8×8 mode the third decision tree, decides the final mode of the MB. The WEKA tool computes the decision thresholds for the means and variances in the decision tree. The decision tree works as follows:
Node 1: the input to this node is an MPEG-2 coded MB. By examining the residual magnitude of the MPEG-2 MB, the coding of the MB is divided into four classes: skip, Intra, 8×8 or 16×16. The Intra decision process is not discussed in this patent; the other cases require a second classification based on the preceding result. The following rules are used when generating the decision tree:
1) If the MC of the MPEG-2 MB is not coded, that is, there is no non-zero MV and none of the four 8×8 blocks has coded coefficients, the H.264 MB will be coded as 16×16. A second decision by the decision tree is needed to select the best mode.
2) If the MPEG-2 MB is in intra mode, then in H.264 the MB is coded as intra or inter 8×8. If it is coded as intra, the algorithm stops; if it is inter 8×8, a second decision is needed to select the best mode.
3) If the MPEG-2 MB is in skip mode, the MB is also in skip mode in H.264.
Node 2: the input to this node is a 16×16 MB separated out by node 1. This node uses the second WEKA decision tree to classify the H.264 mode of the MB (16×16, 16×8 or 8×16). It checks whether 16×8 or 8×16 sub-blocks give a better prediction. If the decision is 16×8 or 8×16, that is the final coding mode; otherwise, node 4 continues the decision.
Node 3: the input to this node is an 8×8 MB separated out by node 1. This node uses the third WEKA decision tree to select the best mode for each H.264 8×8 sub-macroblock: 8×8, 8×4, 4×8 or 4×4. The decision tree is run four times, once for each of the four 8×8 sub-blocks in the macroblock, and each run uses only the means and variances of the four 4×4 blocks inside that 8×8 sub-block.
Node 4: the input to this node is a skip-mode block separated out by node 1 or a 16×16-mode block separated out by node 2. This node evaluates the H.264 16×16 mode (excluding the 16×8 and 8×16 modes) and selects skip or inter 16×16 as the best mode.
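The flow through nodes 1 to 4 can be sketched as a cascade of classifiers. In this sketch the four arguments `node1` to `node4` are hypothetical callables standing in for the trained trees and the node 4 evaluation; the label strings and the layout of the `features` dictionary are likewise assumptions made only for illustration.

```python
def classify_macroblock(features, node1, node2, node3, node4):
    """Route one MPEG-2 macroblock through the four decision nodes.

    features is a dict; features["sub8x8"] holds the four per-8x8 feature
    sets (each built from four 4x4 means and variances) used by node 3.
    """
    top = node1(features)  # "intra", "skip", "inter16x16" or "inter8x8"
    if top == "intra":
        # The Intra decision process is outside the scope of the description.
        return {"mode": "intra"}
    if top == "inter8x8":
        # Node 3 runs once for each of the four 8x8 sub-macroblocks.
        sub_modes = [node3(sub) for sub in features["sub8x8"]]
        return {"mode": "inter8x8", "sub_modes": sub_modes}
    if top == "inter16x16":
        mode = node2(features)  # "16x16", "16x8" or "8x16"
        if mode in ("16x8", "8x16"):
            return {"mode": mode}
    # Reached by skip blocks from node 1 and 16x16 blocks from node 2.
    return {"mode": node4(features)}  # "skip" or "inter16x16"


if __name__ == "__main__":
    # Stub classifiers with fixed answers, for demonstration only.
    features = {"sub8x8": [{}, {}, {}, {}]}
    result = classify_macroblock(
        features,
        node1=lambda f: "inter16x16",
        node2=lambda f: "16x8",
        node3=lambda f: "8x8",
        node4=lambda f: "skip",
    )
    print(result)
```

Passing the classifiers in as callables keeps the routing logic separate from whichever tool produced the trees.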
The MB mode decision and the choice of thresholds depend on the H.264 quantization parameter (QP): as QP changes, the thresholds for the means and variances change too. There are two ways to handle this: 1) generate a decision tree for each QP and, during H.264 encoding, select the decision tree corresponding to the QP in use; 2) generate only one decision tree and adjust the mean and variance thresholds according to the QP. With the first method, a transcoder needs 52 different decision trees, each of which needs three WEKA decision trees, so 156 WEKA decision trees are needed in total. In H.264 the QP and the quantization step are related: each time QP increases by 6 the quantization step doubles, so the mean and variance thresholds can be adjusted using this relation. This transcoder uses the second method. A decision tree is generated for QP 25, and other QP values are handled by adjusting the thresholds: when QP increases by 6 the thresholds are raised by 2.5%, and conversely, when QP decreases by 6, they are lowered by 2.5%.
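The threshold adjustment of the second method can be sketched as below. The text gives only the change for a QP step of 6, so this sketch assumes that the adjustment is linear in the QP difference and applies proportionally to steps that are not multiples of 6; `adjust_threshold` and the constant names are hypothetical.

```python
TRAINING_QP = 25          # QP for which the decision tree was generated
CHANGE_PER_6_QP = 0.025   # thresholds change by 2.5% per QP step of 6


def adjust_threshold(threshold, qp, training_qp=TRAINING_QP):
    """Scale a mean or variance threshold from the training QP to another QP.

    Assumption: the 2.5% change per 6 QP steps is applied linearly, so a QP
    difference of 3 changes the threshold by 1.25%.
    """
    return threshold * (1.0 + CHANGE_PER_6_QP * (qp - training_qp) / 6.0)


if __name__ == "__main__":
    for qp in (19, 25, 31, 37):
        print(qp, adjust_threshold(100.0, qp))
```

With this approach a single set of three WEKA trees serves all 52 QP values, instead of the 156 trees needed by the first method.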
The beneficial effect of the invention is that the complexity of the transcoder is much lower than that of the reference transcoder (MPEG-2 decoding followed by H.264 encoding), and the transcoder performs well at different bit rates and resolutions. Because the time consumed by MPEG-2 decoding is the same in both cases, the performance of the two transcoder structures can be compared by comparing the H.264 encoding time and the PSNR.
The block diagram of the transcoder is shown in Fig. 5. While the MPEG-2 stream is decoded, the relevant information is obtained, including the MC residual, the macroblock mode and the coded block pattern (CBPC) of MPEG-2, and the mean and variance of the MC residual of each 4×4 sub-block are computed. The H.264 macroblock coding mode is obtained from the decision tree and is assigned directly to the MB during H.264 encoding. The inputs to the H.264 encoder are the YUV data decoded from MPEG-2 and the MB coding modes; the MPEG-2 motion vectors are not used, and motion estimation uses the MB coding mode given by the decision tree.