CN106162167B - Learning-based high efficiency video coding method - Google Patents

Learning-based high efficiency video coding method

Info

Publication number
CN106162167B
CN106162167B
Authority
CN
China
Prior art keywords
coding
coding unit
unit
mode
depth
Prior art date
Legal status
Active
Application number
CN201510137157.3A
Other languages
Chinese (zh)
Other versions
CN106162167A
Inventor
Zhang Yun (张云)
Zhu Linwei (朱林卫)
Current Assignee
Shenzhen Institute of Advanced Technology of CAS
Original Assignee
Shenzhen Institute of Advanced Technology of CAS
Priority date
Filing date
Publication date
Application filed by Shenzhen Institute of Advanced Technology of CAS filed Critical Shenzhen Institute of Advanced Technology of CAS
Priority to CN201510137157.3A priority Critical patent/CN106162167B/en
Publication of CN106162167A publication Critical patent/CN106162167A/en
Application granted granted Critical
Publication of CN106162167B publication Critical patent/CN106162167B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Landscapes

  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

A learning-based high efficiency video coding method encodes video sequences with a high efficiency video encoder and extracts a feature vector for each coding unit block. The extracted feature vectors, together with the optimal coding unit sizes, are fed into a learning machine with a ternary output to build a learning model. An early-termination structure is added to the coding unit size selection process of the high efficiency video encoder: for the current block, the direct mode and the merge mode are executed first and the feature vector of the current coding unit is extracted. The feature vector is fed into the trained learning machine model, which outputs a prediction, and the current coding unit size is processed according to the corresponding early-termination structure. This repeats until all coding unit layers in the coding tree unit are encoded, and again until the coding tree units of all video frames are encoded. The encoding process can therefore be optimized jointly for rate-distortion cost and computational complexity, improving the learning and classification performance of the classifier and, in turn, the coding efficiency of video coding.

Description

Learning-based high efficiency video coding method
Technical field
The present invention relates to image signal processing methods, and more particularly to a learning-based high efficiency video coding method.
Background technique
Because they provide better perceived quality and a more lifelike visual experience, high definition (High Definition, HD) and ultra high definition (Ultra High Definition) video are becoming increasingly popular. HD and UHD video have a wide application market, including high definition television broadcasting, IMAX cinema, immersive video communication, network video-on-demand and high definition video surveillance. However, since HD and UHD video have higher resolution and frame rates, the volume of video data also increases enormously. For example, 8K×4K HD/UHD video at 120 frames per second produces 11.5 GB of raw data per second, so very efficient video compression is needed to store and transmit it effectively. To solve the HD video compression problem, the Joint Collaborative Team on Video Coding (JCT-VC) proposed the High Efficiency Video Coding (HEVC) standard. Compared with the high profile of the H.264/AVC standard, it reduces the bit rate by 50% at the same visual quality, i.e. the compression ratio is doubled. HEVC introduces a variety of advanced coding techniques, including a flexible quadtree block partition structure, 35 intra prediction modes, the discrete sine transform, and sophisticated interpolation and filtering techniques. These coding techniques effectively improve video compression efficiency, but they substantially increase encoder complexity, including computational complexity, CPU consumption, memory access, and battery consumption, which hinders real-time HD and UHD applications.
The coding tree unit (Coding Tree Unit, CTU) structure in HEVC is similar to the macroblock concept in H.264/AVC. A CTU comprises one luma coding tree block (Coding Tree Block, CTB), several chroma blocks, and associated syntax elements. Depending on the video content, each luma CTB contains a single coding unit (Coding Unit, CU) or is split into multiple coding units. HEVC supports coding unit sizes of 8×8, 16×16, 32×32 and 64×64; a sample coding unit partition of a luma CTB is shown in Fig. 1, where Depth 0 to Depth 3 correspond to coding unit sizes from 64×64 down to 8×8. In addition, each coding unit can be further divided into prediction units (Prediction Unit, PU) of different modes and sizes, including the SKIP and MERGE modes, eight inter modes and two intra modes. Finally, each prediction unit PU is transform-coded with transform units (Transform Unit, TU) of various sizes. In HEVC the coding unit, prediction unit and transform unit form a hierarchical, recursive relationship, and each CU, PU and TU level has multiple candidate modes. The best mode at each level is chosen mainly by computing the rate-distortion cost (Rate-Distortion Cost): the mode with the smallest rate-distortion cost is selected as the optimal mode. This, however, requires computing and comparing the rate-distortion costs of all levels and modes, which is very time-consuming and computationally complex.
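The exhaustive recursive decision described above can be sketched as a toy model (this is an illustration, not HEVC itself: `rd_cost` stands in for a real rate-distortion measurement, and blocks are just `(x, y, size)` tuples):

```python
def split4(block):
    """Split an (x, y, size) block into its four equal quadrants."""
    x, y, size = block
    h = size // 2
    return [(x, y, h), (x + h, y, h), (x, y + h, h), (x + h, y + h, h)]

def best_cu_partition(block, depth, max_depth, rd_cost):
    """Exhaustive quadtree decision: at each depth, compare the RD cost of
    coding the CU whole against the summed cost of its four sub-CUs.
    Returns (best cost, partition tree)."""
    cost_whole = rd_cost(block, depth)
    if depth == max_depth:                      # 8x8 level: cannot split
        return cost_whole, ("leaf", depth)
    split_cost, children = 0.0, []
    for sub in split4(block):
        c, t = best_cu_partition(sub, depth + 1, max_depth, rd_cost)
        split_cost += c
        children.append(t)
    if split_cost < cost_whole:                 # smaller RD cost wins
        return split_cost, ("split", children)
    return cost_whole, ("leaf", depth)
```

With a flat cost model the 64×64 CTB stays a single Depth-0 leaf; with a cost that falls quickly with depth the tree splits fully, matching the Depth 0 to Depth 3 hierarchy above.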
Consequently, many researchers have proposed low-complexity optimization methods for HEVC. One approach predicts the depth range of the coding units in the current coded slice from previously coded slices (Slices) or coding units, so that depths outside the predicted range need not be tested, reducing computational complexity. Another uses motion information as the main feature, deciding whether the current coding unit is split from the motion difference between neighbouring coding unit blocks and the current coding unit block. Yet others predict the intra coding unit depth from the rate-distortion costs of different depth layers and the mode correlation of neighbouring coding units. Machine learning is a hot topic in artificial intelligence, pattern recognition and signal processing, and gives effective near-optimal solutions through learning; researchers have also applied learning algorithms to video coding. For example, normalized rate-distortion costs and similar quantities have been used as features fed into a support vector machine (Support Vector Machine, SVM) to solve the mode classification problem, while also accounting for the rate-distortion increase caused by misclassification. Back propagation neural networks (Back Propagation Neural Network, BPNN) have likewise been used to solve mode classification problems in video coding. In addition, decision trees (Decision Tree) and similar classifiers have been used for mode classification in H.264/AVC and in stereoscopic video coding, for example distinguishing blocks whose optimal mode is SKIP in order to terminate the mode selection process early. These algorithms, however, are mainly aimed at H.264/AVC coding optimization and are difficult to transplant directly into HEVC.
For machine-learning-based HEVC optimization, a weighted SVM has been proposed for early termination of the coding unit split/non-split decision, with optimized feature selection and with the rate-distortion cost participating as weighting information in offline SVM learning. Bitstream information from MPEG-2 and H.264/AVC video streams has also been used to predict the CU modes of HEVC, with statistical thresholds added to further screen cases of inaccurate prediction. In general, existing machine-learning-based coding unit mode prediction methods depend heavily on feature selection and on the classification accuracy of the learning machine; an inaccurate prediction leads to a large drop in compression efficiency. Moreover, once a conventional method is fixed, it is difficult to trade coding efficiency against computational complexity by adjusting parameters, and thus difficult to meet the coding requirements of different video systems.
Summary of the invention
Based on this, it is necessary to provide an efficient, learning-based high efficiency video coding method.
A learning-based high efficiency video coding method comprises the following steps.
Step 110: encode a video sequence with a high efficiency video encoder and extract the feature vector corresponding to each coding unit block.
Step 120: feed the extracted feature vectors and the optimal coding unit sizes into a learning machine with a ternary output, configure the learning parameters and learning mode, and build the learning model.
Step 130: add an early-termination structure to the coding unit size selection process of the high efficiency video encoder, in which, at each coding unit depth layer i, the current block is first coded in the direct mode (SKIP mode) and the merge mode (MERGE mode), and the feature vector of the current coding unit is extracted as in step 110.
Step 140: feed the feature vector from step 130 into the trained learning machine model, which outputs a prediction. If the prediction is non-split, test and code only the current coding unit size and skip the testing and coding of the split coding unit sizes; if the prediction is split, skip the test of the current coding unit size and directly test and code the split coding unit sizes; if the prediction is uncertain, test the current coding unit size first and then the split coding unit sizes.
Step 150: repeat steps 130 and 140 until all coding unit layers in the coding tree unit are encoded.
Step 160: repeat steps 130 to 150 until the coding tree units of all video frames are encoded.
In one embodiment, the feature vector includes features of the current coding unit block, motion information, context information, the quantization parameter, and the optimal coding unit size.
In one embodiment, the features of the current coding unit block include the coded block flag x_CBF_Meg(i), the rate-distortion cost x_RD_Meg(i), the distortion x_D_Meg(i) and the number of coded bits x_Bit_Meg(i), where i is the depth of the current coding unit.
The motion information is computed as x_MV_Meg(i) = |MVx| + |MVy|, where MVx and MVy denote the horizontal and vertical motion amplitudes and i is the depth of the current coding unit.
The rate-distortion costs and coding unit depths of temporally and spatially neighbouring macroblock modes are used as features for the current coding unit depth decision, denoted x_NB_RD(i) and x_CU_depth(i) respectively, where x_NB_RD(i) is an estimate derived from the rate-distortion costs of the left and above neighbouring coding units, and x_CU_depth(i) is the mean depth of the neighbouring coding units.
It is calculated by the following formula:
x_CU_depth(i) = Σ_j d_j / (N_LFT(i) + N_ABV(i)),
where d_j are the depth values of the left and above coding units in units of 4×4 blocks, and N_LFT(i) and N_ABV(i) are the numbers of 4×4 blocks in the left and above coding units.
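As a sketch (function names are ours, not the patent's), the two features just defined can be computed as:

```python
def motion_feature(mv_x, mv_y):
    """x_MV_Meg(i) = |MVx| + |MVy|: amplitude of the MERGE-mode motion
    vector, characterizing how intense the motion of the CU is."""
    return abs(mv_x) + abs(mv_y)

def neighbour_depth_feature(left_depths, above_depths):
    """x_CU_depth(i): mean depth of the left and above neighbouring CUs.
    Each list holds one depth value d_j per 4x4 block, so the list
    lengths play the role of N_LFT(i) and N_ABV(i)."""
    total_blocks = len(left_depths) + len(above_depths)
    return (sum(left_depths) + sum(above_depths)) / total_blocks
```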
In one embodiment, the step of feeding the extracted feature vectors and optimal coding unit sizes into the ternary-output learning machine, configuring the learning parameters and learning mode, and building the learning model includes:
feeding the feature vector into m binary-output learning machines, each of which, using its learned model, outputs a prediction O_i of +1 or −1, where i is the index of the learning machine, from 1 to m;
and merging the m outputs O_i to obtain the final output Q_ALL,
where T_A and T_B are two thresholds between 0 and m.
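The merging rule itself appears as a formula image in the original and is not reproduced here; the sketch below assumes one plausible convention consistent with the embodiment m = 2, T_A = 2, T_B = 1 (output A when at least T_A of the binary outputs are +1, B when fewer than T_B are, C otherwise) — this threshold convention is our assumption, not taken verbatim from the patent:

```python
def fuse_outputs(outputs, t_a, t_b):
    """Merge m binary outputs O_i in {+1, -1} into a ternary decision
    Q_ALL: +1 (A, non-split), -1 (B, split) or 0 (C, uncertain).
    Threshold convention is an assumption for illustration."""
    n_pos = sum(1 for o in outputs if o == +1)
    if n_pos >= t_a:
        return +1          # enough classifiers vote non-split -> A
    if n_pos < t_b:
        return -1          # too few vote non-split -> B
    return 0               # disagreement -> C (uncertain)
```

With m = 2, T_A = 2, T_B = 1 this yields A when both classifiers agree on +1, B when both agree on −1, and C when they disagree.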
In one embodiment, the method further includes feeding the features into a classifier, which, according to the current video content characteristics and the learned model parameters, makes a prediction of A (+1), B (−1) or C (U, uncertain).
If the prediction is A, the direct mode D_n is taken as the optimal mode and only D_n is performed;
if the prediction is B, the split process P_n+1 is taken as the optimal mode and only P_n+1 is performed;
if the prediction is C, it is uncertain whether D_n or P_n+1 is optimal, and D_n and the four P_n+1 are both executed.
In one embodiment, the increase in rate-distortion cost caused by misprediction can be expressed as:
Δη_RD(i) = Δη_nS→S(i) × p_BA(i) + Δη_S→nS(i) × p_AB(i);
where p_BA(i) = N_BA,1(i)/N_ALL(i) and p_AB(i) = N_AB,2(i)/N_ALL(i) are the misprediction rates for split (B) and non-split (A), N_BA,1(i) is the number of samples in classifier No. 1 for which B is mispredicted as A, N_AB,2(i) is the number of samples in classifier No. 2 for which A is mispredicted as B, and N_ALL(i) is the number of samples predicted at coding unit depth layer i, N_ALL(1) being the number of coding units of the whole image.
The computational complexity of the optimized split and non-split decisions is calculated as
ΔT(i) = ΔT_S(i) × q_S,1(i) + ΔT_nS(i) × q_nS,2(i);
where ΔT_S(i) and ΔT_nS(i) are the percentage reductions in computational complexity obtained at coding unit depth layer i by predicting split and non-split respectively, ΔT_S(i) = 1 − T_S(i)/T_ALL(i) and ΔT_nS(i) = 1 − T_nS(i)/T_ALL(i), where T_S(i), T_nS(i) and T_ALL(i) are the computational complexities of the split, non-split and original operations respectively.
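The two quantities above translate directly into code; a minimal sketch (argument names are ours):

```python
def rd_cost_increase(d_eta_ns_to_s, d_eta_s_to_ns, n_ba, n_ab, n_all):
    """Delta eta_RD(i): expected RD-cost increase from mispredictions,
    weighting each penalty by its misprediction rate p_BA or p_AB."""
    p_ba = n_ba / n_all     # split (B) mispredicted as non-split (A)
    p_ab = n_ab / n_all     # non-split (A) mispredicted as split (B)
    return d_eta_ns_to_s * p_ba + d_eta_s_to_ns * p_ab

def complexity_reduction(t_s, t_ns, t_all, q_s, q_ns):
    """Delta T(i): complexity saving after optimization, where
    Delta T_S = 1 - T_S/T_ALL and Delta T_nS = 1 - T_nS/T_ALL."""
    return (1 - t_s / t_all) * q_s + (1 - t_ns / t_all) * q_ns
```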
An objective function is set according to the increase in rate-distortion cost caused by misprediction and the optimized computational complexity of the split and non-split decisions.
The objective function is expressed in terms of Δη_T,i, the percentage reduction in compression efficiency, and of x_i and y_i, the two model parameters of the ternary output classifier at layer i, which determine W_A(j,i) and W_B(j,i), the weights of the positive and negative samples in the j-th classifier of the i-th coding unit layer.
In one embodiment, the method further includes setting optimal learning parameters for the configured learning parameters and learning mode.
For part of the video frames of several test sequences, ΔT_nS(i), ΔT_S(i), Δη_S→nS(i) and Δη_nS→S(i) are obtained statistically with the high efficiency video encoder, and the parameters b_i, a_i, t_i, B_i, A_i, u_i, v_i and T_i are obtained by fitting.
Given Δη_T,i, the parameter λ_i can be obtained by the least squares method; substituting λ_i back allows x_i and y_i to be calculated, and finally the ratio W_A(j,i)/W_B(j,i) is obtained and used as the training parameter of the learning machine, where ΔT_nS(i), ΔT_S(i), Δη_S→nS(i), Δη_nS→S(i), b_i, a_i, t_i, B_i, A_i, u_i, v_i, T_i, x_i and y_i are intermediate parameters.
In one embodiment, the training methods of the learning machine include an online mode and an offline mode.
In one embodiment, the online mode includes encoding n video frames with the original HM model and outputting, for each coding unit depth class i, the feature vectors X_i of the learning machine and the best macroblock mode Y of each coding unit;
feeding X_i and Y into the support vector machine learning machine for training;
and using the trained support vector machine learning machine for coding unit depth prediction in the coding of subsequent video frames.
In one embodiment, the offline mode includes choosing several particular sequences and several frames of each sequence, encoding them with the original HM model, and outputting, for each coding unit depth class i in these encoded video frames, the feature vectors X_i of the learning machine and the best macroblock mode Y of each coding unit;
feeding X_i and Y into the support vector machine learning machine for training;
and using the trained support vector machine learning machine for other video sequences, for the coding of video frames and the prediction of coding unit depths.
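A minimal sketch of this train-then-predict flow, with a simple class-weighted linear classifier standing in for the weighted SVM learning machine (the class weights play the role of W_A and W_B; everything here is illustrative, not the HM-based pipeline):

```python
def train_weighted_linear(samples, labels, w_pos, w_neg, epochs=100, lr=0.1):
    """Perceptron-style stand-in for the weighted SVM: each update on a
    misclassified sample is scaled by that sample's class weight."""
    dim = len(samples[0])
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            score = sum(wi * xi for wi, xi in zip(w, x)) + b
            if y * score <= 0:                    # misclassified
                cw = w_pos if y > 0 else w_neg    # W_A or W_B
                w = [wi + lr * cw * y * xi for wi, xi in zip(w, x)]
                b += lr * cw * y
    return w, b

def predict(model, x):
    """Binary output O_i: +1 (non-split) or -1 (split)."""
    w, b = model
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else -1

# Online-mode flavour: features collected from the first n coded frames
# train the model; later frames of the same sequence only call predict().
```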
The above learning-based high efficiency video coding method builds an early-termination structure by configuring classifiers of different classification accuracies. By adjusting the proportions of the outputs, the early-termination structure can be converted into several classical decision structures, so the structure and the characteristics of the classifier can be switched according to actual needs. For an extracted feature vector, the pre-trained learning machine model outputs a corresponding prediction, and different coding modes are selected according to that prediction. The coding process can thus be optimized jointly for the rate-distortion cost and computational complexity of high efficiency video coding, improving the learning and classification performance of the classifier and, in turn, the coding efficiency of video coding.
Description of the drawings
Fig. 1 is a flowchart of the learning-based high efficiency video coding method;
Fig. 2 shows the partition modes of coding units in a coding tree unit;
Fig. 3 is a schematic diagram of the coding tree unit encoding process;
Fig. 4 is a schematic diagram of the coding unit split decision;
Fig. 5(a) to Fig. 5(d) are structure diagrams of the recursive coding unit size decision process P_n(i);
Fig. 6 is a flowchart of the ternary output classifier;
Fig. 7 is a graph of the average computational complexity reduction and bit rate increase under different Δη_T,i configurations;
Fig. 8 is a schematic diagram of online-mode training;
Fig. 9 is a schematic diagram of offline-mode training.
Detailed description of the embodiments
Fig. 1 shows the flowchart of the learning-based high efficiency video coding method.
A learning-based high efficiency video coding method comprises the following steps.
Step 110: encode a video sequence with a high efficiency video encoder and extract the feature vector corresponding to each coding unit block.
The feature vector includes features of the current coding unit block, motion information, context information, the quantization parameter, and the optimal coding unit size.
The features of the current coding unit block include the coded block flag x_CBF_Meg(i), the rate-distortion cost x_RD_Meg(i), the distortion x_D_Meg(i) and the number of coded bits x_Bit_Meg(i), where i is the depth of the current coding unit.
The motion information is computed as x_MV_Meg(i) = |MVx| + |MVy|, where MVx and MVy denote the horizontal and vertical motion amplitudes and i is the depth of the current coding unit.
Specifically, the information of the current coding unit mainly comprises the output of coding the current coding unit with the SKIP mode and the MERGE mode, including the coded block flag (Coded Block Flag, which identifies the number of residual coefficients), the rate-distortion cost, the distortion and the number of coded bits, denoted x_CBF_Meg(i), x_RD_Meg(i), x_D_Meg(i) and x_Bit_Meg(i), where i is the depth of the current coding unit and takes the values 0, 1, 2, 3. In addition, there is a SKIP flag bit in the output information after coding, 0 or 1, denoted x_SKIP(i).
The motion information mainly characterizes the motion intensity of the current coding unit: the more intense the motion, the higher the probability that small coding units are used, i.e. the higher the probability of splitting. In this embodiment the motion vector after MERGE-mode coding is used to characterize the motion of the current coding unit, computed as x_MV_Meg(i) = |MVx| + |MVy|, where MVx and MVy denote the horizontal and vertical motion amplitudes.
Because the macroblock modes of video usually have strong temporal and spatial correlation, the temporal and spatial rate-distortion costs and coding unit depths are used as features for the current coding unit depth decision, denoted x_NB_RD(i) and x_CU_depth(i) respectively. In this embodiment, x_NB_RD(i) is an estimate derived from the rate-distortion costs of the left and above neighbouring coding units, and x_CU_depth(i) is the mean depth of the neighbouring coding units, computed as
x_CU_depth(i) = Σ_j d_j / (N_LFT(i) + N_ABV(i)),
where d_j are the depth values of the left and above coding units in units of 4×4 blocks, and N_LFT(i) and N_ABV(i) are the numbers of 4×4 blocks in the left and above coding units.
The quantization parameter used to encode the current coding unit is denoted x_QP; generally, the larger the quantization parameter, the more likely the current coding unit is to be encoded with larger blocks.
The features enumerated in this embodiment are the main features; the feature set includes but is not limited to the above, and some features may also be removed from it.
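Collecting the nine quantities named above into one vector can be sketched as follows (the field names are ours, introduced only for illustration):

```python
def build_feature_vector(cu_info):
    """Order the nine per-CU features for the learning machine: the four
    SKIP/MERGE coding outputs, the SKIP flag, the motion feature, the two
    neighbourhood features and the quantization parameter."""
    keys = ["x_cbf_meg", "x_rd_meg", "x_d_meg", "x_bit_meg",
            "x_skip", "x_mv_meg", "x_nb_rd", "x_cu_depth", "x_qp"]
    return [cu_info[k] for k in keys]
```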
Step 120: feed the extracted feature vectors and the optimal coding unit sizes into a ternary-output learning machine, configure the learning parameters and learning mode, and build the learning model.
See also Fig. 6. Specifically, the ternary output classifier is composed of m (m ≥ 2) binary-output classifiers, with the structure shown in Fig. 5. These binary-output classifiers may be well-known support vector machines, neural network learning machines, Bayesian classifiers and the like. Finally, the different output results are merged by a combining unit to form the ternary output: positive (+1), negative (−1) and uncertain (U), corresponding respectively to A, B and C in Fig. 5(d).
Here T_A and T_B are two thresholds between 0 and m. When T_A = m, the proportion of output A in Fig. 5(d) is 0; when T_B = 0, the proportion of output B of the ternary classifier is 0; when T_A = m and T_B = 0 at the same time, the output C is 100%. If T_A < T_B, the output C is 0. In practice, generally T_A > T_B.
A ternary output classifier can be composed of m different learning machines, of the same learning machine with different parameters and learning modes, or of different learning machines with different parameters. In this embodiment, multiple support vector machines with different weighting coefficients are used, with m = 2, T_A = 2 and T_B = 1.
Support vector machine learning machines are used in this embodiment; their training process falls into two categories, offline mode and online mode.
Fig. 8 is a schematic diagram of online-mode training.
In the online mode, n video frames are encoded with the original HM model, and for each coding unit depth class i the feature vectors X_i of the learning machine (containing the nine features above) and the best macroblock mode Y of each coding unit are output. X_i and Y are fed into the support vector machine learning machine for training, and the trained learning machine is then used for coding unit depth prediction in the coding of subsequent video frames. In this mode, retraining is required for every sequence encoded.
Fig. 9 is a schematic diagram of offline-mode training.
In the offline mode, several frames of several particular sequences are chosen and encoded with the original HM model; the feature vectors X_i of the learning machine for each coding unit depth class i in these encoded frames (containing the nine features above) and the best macroblock mode Y of each coding unit are output, and X_i and Y are fed into the support vector machine learning machine for training. The trained learning machine is then used for other video sequences, for the coding of video frames and the prediction of coding unit depths. In this mode, the training videos can be selected as required and the training can be updated.
Step 130: add an early-termination structure to the coding unit size selection process of the high efficiency video encoder, in which, at each coding unit depth layer i, the current block is first coded in the SKIP mode and the MERGE mode, and the feature vector of the current coding unit is extracted as in step 110.
In HEVC video coding, each image is composed of a series of CTUs, and the split/non-split decision that yields the final coding unit partition of a coding tree unit is not a single binary decision problem but is composed of multiple decision problems. In HEVC coding, each 64×64 luma coding tree unit is first coded with the 64×64 coding unit size and its rate-distortion cost is computed; it is then divided into four 32×32 coding units, whose rate-distortion costs are computed separately. In this process, each 32×32 coding unit can again be divided into four 16×16 coding units, and so on recursively down to the 8×8 coding unit. As shown in Fig. 3, D_n denotes coding the current coding unit and computing its rate-distortion cost, where n = 0, 1, 2, 3 corresponds to coding sizes from 64×64 down to 8×8, and P_n(i) is the recursive coding unit size decision process, i being the index of the four sub-blocks. Finally, the recursion returns upward from the small coding unit sizes, comparing rate-distortion costs with the layer above, and the mode with the smaller cost is set as the better selection. Recursing and comparing layer by layer to the end yields the optimal coding unit size partition of the coding tree unit.
In the coding unit size selection process, the choice between the current coding unit size and the four smaller coding unit sizes can be defined as a split/non-split problem. The coding unit size decision of an entire coding tree unit can therefore be described as a three-level binary decision problem. As shown in Fig. 4, the choice between one 64×64 unit and four 32×32 units can be defined as a binary split/non-split decision requiring one classifier, defined as the first layer L1; the second layer then contains four split/non-split problems and thus requires four classifiers (L2); similarly, the third layer requires sixteen classifiers. Within a layer the classifiers share the same attributes, so a single classifier can be reused multiple times or multiple identical classifiers can be used.
Since the P_n(i) structure in Fig. 3 is recursive and repeats at different levels, for convenience we describe the optimization of the entire coding unit size decision process in terms of the optimization of the P_n(i) structure.
For P_n(i) in Fig. 3, in the verification model of the HEVC encoder (i.e. the original video encoder) it can be described as one D_n followed by four P_n+1(i), executed in the order of Fig. 5(a). The advantage of this structure is the high coding efficiency of its optimal mode selection; the disadvantage is a large amount of unnecessary computation and a high computational complexity. In this process the optimal mode may be either D_n or the four P_n+1, so it is a binary choice.
Several predictive structures follow from this. Fig. 5(b) is one early-termination scheme: after D_n is finished, a prediction classifier predicts the current best mode and judges whether the subsequent P_n+1 can be skipped, thereby reducing computational complexity. However, this method performs the D_n operation for all blocks, which is clearly an unnecessary waste for blocks whose optimal mode is P_n+1.
The early-termination scheme of Fig. 5(c) is therefore proposed: before D_n and P_n+1 are executed, a classifier first predicts the current optimal mode; if it is D_n, the decision is set to Y and only D_n is performed, otherwise it is set to N and P_n+1 is executed. The advantage of this structure is that as long as the prediction is accurate there is no additional computation cost; the disadvantage is that it depends heavily on the prediction accuracy of the classifier, and an insufficiently accurate prediction causes a large drop in compression efficiency.
The structure shown in Fig. 5(d) is therefore proposed: the classifier makes a prediction A, B or C according to the current video content characteristics. For A, i.e. D_n is predicted to be the optimal mode, only D_n is performed; for B, P_n+1 is predicted to be the optimal mode and only P_n+1 is performed; C indicates uncertainty between D_n and P_n+1, and D_n and the four P_n+1 are both executed. This structure is very flexible and has several advantages: because the three output terms A, B and C can be adjusted through the classification algorithm and the classifier parameters, when A and B are 0 and C is 100% the structure converts into Fig. 5(a), with the highest compression efficiency and the highest complexity; when B is 0 the structure becomes Fig. 5(b); and when C is 0 the structure converts into the structure of Fig. 5(c). By adjusting A, B and C the structure can trade coding efficiency against computational complexity, with coding efficiency and encoder complexity lying between the structures of Fig. 5(a) and Fig. 5(c), adjustable to the demands of the actual application system.
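The flexibility of the structure in Fig. 5(d) is easy to see in code: the branches that the prediction rules out are simply never evaluated. A sketch (the callables stand in for real encoding passes and return their RD costs):

```python
def cu_decision(prediction, code_whole, code_split):
    """Fig. 5(d) as control flow. `prediction` is the ternary classifier
    output: +1 (A) -> run only D_n; -1 (B) -> run only the four P_{n+1}
    (summed inside code_split); 0 (C) -> run both and keep the cheaper,
    exactly as the unmodified encoder of Fig. 5(a) would."""
    if prediction == +1:
        return code_whole()
    if prediction == -1:
        return code_split()
    return min(code_whole(), code_split())
```

Forcing the prediction to C for every block recovers structure 5(a); never emitting B recovers 5(b)-like behaviour; never emitting C recovers 5(c).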
Step 140: feed the feature vector from step 130 into the trained learning machine model, which outputs a prediction. If the prediction is non-split, test and code only the current coding unit size and skip the testing and coding of the split coding unit sizes; if the prediction is split, skip the test of the current coding unit size and directly test and code the split coding unit sizes; if the prediction is uncertain, test the current coding unit size first and then the split coding unit sizes.
Step 150: repeat steps 130 and 140 until all coding unit layers in the coding tree unit are encoded.
Step 160: repeat steps 130 to 150 until the coding tree units of all video frames are encoded.
The learning-based high efficiency video coding method further takes the rate-distortion costs and coding unit depths of temporally and spatially neighbouring macroblock modes as feature quantities for the current coding-unit depth decision, denoted x_NB_RD(i) and x_CU_depth(i) respectively, where the rate-distortion cost x_NB_RD(i) is an evaluation of the rate-distortion cost values of the adjacent left and above coding units, and the coding unit depth x_CU_depth(i) is the mean depth of the adjacent coding units.
These are calculated by the following formula:
where d_j is the depth value, in units of 4×4 blocks, within the left and above coding units, and N_LFT(i) and N_ABV(i) are the numbers of 4×4 blocks in the left and above coding units.
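The neighbourhood depth feature can be computed as below, a sketch assuming the 4×4-block depth values of the left and above coding units are already available (function and argument names are illustrative, not from the patent):

```python
def x_cu_depth(left_depths, above_depths):
    """Mean depth of the adjacent coding units: the average of the
    4x4-block depth values d_j over the left (N_LFT blocks) and above
    (N_ABV blocks) neighbours, per the formula above."""
    d = list(left_depths) + list(above_depths)
    return sum(d) / len(d) if d else 0.0   # 0.0 if no neighbours exist
```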
The step of inputting the extracted feature vectors and optimal coding unit sizes into the three-valued-output learning machine, configuring the learning parameters and learning mode, and establishing the learning model includes:
inputting the feature vector into m binary-output learning machines, each of which, using the learned model, outputs a predicted value O_i of +1 or −1, where i denotes the index of the learning machine and ranges from 1 to m;
fusing the m outputs O_i to obtain the final output Q_ALL;
where T_A and T_B are two thresholds between 0 and m.
The learning machines are support vector machines with different weighting parameters. Preferably, m is set to 2, with T_A = 2 and T_B = 1.
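The exact fusion formula for Q_ALL did not survive extraction; the sketch below assumes the fusion thresholds the number of +1 votes among the m binary outputs, with 0 ≤ T_B ≤ T_A ≤ m and the preferred setting m = 2, T_A = 2, T_B = 1:

```python
def fuse(outputs, t_a=2, t_b=1):
    """Fuse m binary predictions O_i (+1/-1) into a three-valued output
    Q_ALL.  Assumed rule (the patent's formula is elided here): count
    the +1 votes; at or above T_A -> confident positive, below T_B ->
    confident negative, otherwise uncertain."""
    votes = sum(1 for o in outputs if o == +1)
    if votes >= t_a:
        return +1        # A: confident positive
    if votes < t_b:
        return -1        # B: confident negative
    return 0             # C: uncertain
```

With m = 2 this yields "uncertain" exactly when the two weighted SVMs disagree, which matches the A/B/C behaviour described below.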
The learning-based high efficiency video coding method further includes inputting the features into a classifier, which makes a prediction A (+1), B (−1) or C (U, uncertain) according to the current video content characteristics and the learned model parameters.
When the prediction is A, the direct mode Dn is taken as the optimal mode, and only Dn is performed.
When the prediction is B, the split mode Pn+1 is taken as the optimal mode, and only Pn+1 is performed.
When the prediction is C, it is uncertain whether Dn or Pn+1 is optimal, and both Dn and the four Pn+1 are executed.
The learning-based high efficiency video coding method further includes setting optimal learning parameters when configuring the learning parameters and learning mode.
For partial video frames of several test sequences, ΔT_nS(i), ΔT_S(i), Δη_S→nS(i) and Δη_nS→S(i) are obtained statistically using high efficiency video coding, and the parameters b_i, a_i, t_i, B_i, A_i, u_i, v_i and T_i are obtained by fitting.
Given Δη_T,i, the parameter λ_i can be obtained by least squares; λ_i is then substituted into the formula
from which x_i and y_i can be calculated; finally, according to
the ratio W_A(j,i)/W_B(j,i) is obtained and used as the training parameter of the learning machine, where ΔT_nS(i), ΔT_S(i), Δη_S→nS(i), Δη_nS→S(i), b_i, a_i, t_i, B_i, A_i, u_i, v_i, T_i, x_i and y_i are intermediate parameters.
Specifically, in the training process of the three-valued-output support vector machine, reasonable parameters must be configured for the two support vector machine learners to achieve the best prediction effect. In this embodiment, the training process is adjusted mainly through the weighting coefficients W_A and W_B in the support vector machine learners. W_A and W_B represent the importance of the positive and negative samples respectively; the larger the coefficient, the more easily a sample is classified into that class, so the false acceptance rate increases and the false rejection rate decreases. These misclassifications or mispredictions affect the coding efficiency and computational complexity of the final encoding.
Define Δη_S→nS(i) as the increase in rate-distortion cost caused by encoding, with the non-split mode, those coding units whose optimal mode is the split mode; it can be expressed as Δη_S→nS(i) = (1 − J_nS(i)/J_Best(i)) × 100%, where i is the coding-unit decision layer, i ∈ {1, 2, 3}, and J_nS(i) and J_Best(i) denote the rate-distortion costs of encoding the current coding unit with the non-split mode and the optimal mode respectively. Similarly, define Δη_nS→S(i) as the increase in rate-distortion cost caused by encoding, with the split mode, those coding units whose optimal mode is the non-split mode; it can be expressed as Δη_nS→S(i) = (1 − J_S(i)/J_Best(i)) × 100%, where J_S(i) is the rate-distortion cost of encoding the current coding unit with the split mode. Thus, in the decision process of each coding-unit layer, the increase in rate-distortion cost caused by misprediction can be expressed as:
Δη_RD(i) = Δη_nS→S(i) × p_BA(i) + Δη_S→nS(i) × p_AB(i);
where p_BA(i) = N_BA,1(i)/N_ALL(i) and p_AB(i) = N_AB,2(i)/N_ALL(i) are the misprediction rates of split (B) and non-split (A); N_BA,1(i) is the number of samples for which B is mispredicted as A by classifier No. 1, N_AB,2(i) is the number of samples for which A is mispredicted as B by classifier No. 2, and N_ALL(i) is the number of samples predicted at the i-th coding-unit layer, with N_ALL(1) being the number of coding units of the whole image.
Define q_S,1(i) = N_S,1(i)/N_ALL(i) as the percentage predicted as split by the three-valued-output classifier, and q_nS,2(i) = N_nS,2(i)/N_ALL(i) as the percentage predicted as non-split, where N_S,1(i) and N_nS,2(i) are respectively the numbers of split and non-split samples predicted by classifiers 1 and 2 of the three-valued-output classifier. The computational complexity after the split/non-split optimization can then be calculated as
ΔT(i) = ΔT_S(i) × q_S,1(i) + ΔT_nS(i) × q_nS,2(i);
where ΔT_S(i) and ΔT_nS(i) are respectively the percentage reductions in computational complexity caused by the split and non-split predictions at coding-unit depth layer i, ΔT_S(i) = 1 − T_S(i)/T_ALL(i) and ΔT_nS(i) = 1 − T_nS(i)/T_ALL(i), where T_S(i), T_nS(i) and T_ALL(i) are respectively the computational complexities of splitting (skipping the Dn operation), not splitting (omitting the four Pn+1 operations), and the original operation (performing both Dn and the four Pn+1).
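The two expectations above transcribe directly into code. A sketch, with all inputs being the per-layer statistics defined in the text (expressed as fractions rather than percentages; function names are illustrative):

```python
def misprediction_cost(d_eta_nS_S, p_BA, d_eta_S_nS, p_AB):
    """Expected RD-cost increase per layer:
    d_eta_RD(i) = d_eta_nS->S(i) * p_BA(i) + d_eta_S->nS(i) * p_AB(i)."""
    return d_eta_nS_S * p_BA + d_eta_S_nS * p_AB

def complexity_saving(dT_S, q_S1, dT_nS, q_nS2):
    """Expected complexity reduction per layer:
    dT(i) = dT_S(i) * q_S,1(i) + dT_nS(i) * q_nS,2(i)."""
    return dT_S * q_S1 + dT_nS * q_nS2
```

These two quantities are exactly what the objective function below trades off against each other.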
For five different test sequences including BQMall and FourPeople, 20 frames each were encoded and statistics collected; the above parameters were fitted to
where b_i, a_i, t_i, B_i and A_i are fitting parameters;
where u_i, v_i and T_i are fitting parameters. These fitting parameters vary with the chosen test sequences, the number of coded frames, and so on. In addition, ΔT_nS(i), ΔT_S(i), Δη_S→nS(i) and Δη_nS→S(i) can also be obtained statistically by encoding partial test sequences.
In this embodiment, the above fitting parameters are as follows:
where R² indicates the fitting precision; the closer to 1, the better.
Thus, to obtain optimized performance, an objective function is set: minimize the computational complexity 1 − ΔT(i) of the encoder under the condition that the increase in rate-distortion cost is limited, which can be expressed as
where Δη_T,i is the percentage reduction in compression efficiency, and x_i and y_i are the two model parameters of the i-th-layer three-valued-output classifier, expressed in terms of W_A(j,i) and W_B(j,i), which respectively denote the weights of the positive and negative samples in the j-th classifier of the i-th coding-unit layer. The above problem is converted into:
Taking the partial derivatives of the above formula with respect to x_i, y_i and λ_i, and setting them to 0, gives
Solving the above formulas yields:
In this embodiment, only λ_i and Δη_T,i are parameters; the other coefficients are known constants. Although the solution cannot be expressed explicitly, given a Δη_T,i the corresponding λ_i can be obtained by known methods such as least squares; λ_i is then substituted into the formula to calculate x_i and y_i, and finally the ratio W_A(j,i)/W_B(j,i) is obtained as the training parameter of the support vector machine learners.
Based on all the above embodiments, the learning-based high efficiency video coding method was verified on the reference software platform HM 12.0 of the high efficiency video coding device. The configuration uses the low-delay B-frame class: the first frame of the coded sequence is an I frame and the remaining frames are P frames; coding unit sizes from 64×64 down to 8×8 are supported; the motion estimation range is 64; all other parameters are defaults. The coding experiments were carried out on a computer.
The coding verification experiment is divided into two parts. First, five test sequences were encoded: BasketballPass (416×240), PartyScene (832×480), Johnny (1280×720), Kimono (1920×1080) and Traffic (2560×1600), using different user-configured parameters Δη_T,i of {0.1%, 0.1%, 0.1%}, {0.3%, 0.3%, 0.3%}, {0.5%, 0.5%, 0.5%} and {0.7%, 0.7%, 0.7%}, denoted Para_111, Para_333, Para_555 and Para_777. In addition, {0.3%, 0.2%, 0.1%}, {0.6%, 0.4%, 0.2%} and {0.9%, 0.6%, 0.3%} are denoted Para_321, Para_642 and Para_963. The training parameters W_A and W_B obtained in this way were used to train the learning machines, which were then used for coding-unit depth prediction in the coding process.
Fig. 7 compares the coding efficiency and computational complexity of encoding the five videos with the optimized encoder against the encoder before optimization: the Y axis is the saving in computational complexity, and the X axis is the percentage increase in bit rate compared with before the optimization. As can be seen from the figure, under the adjustment of the different configuration parameters Δη_T,i, the average computational complexity can be reduced by 42% to 56%.
In addition, the learning machines were trained with the parameter Para_642, and the trained model was used for CU depth prediction in coding to optimize complexity. In this experiment all frames of 21 sequences were encoded, and the results were compared with three existing state-of-the-art coding methods, ShenEVIP, ShenTMM and XiongTMM. Compared with the original high efficiency video coding verification model software platform HM, the present invention reduces computational complexity by 28.82% to 70.93% (51.45% on average), with average BDPSNR and BDBR of −0.061 dB and 1.98% respectively, almost identical to the compression efficiency of the original HM, and compares favourably in compression efficiency and computational complexity with the three state-of-the-art schemes ShenEVIP, ShenTMM and XiongTMM.
The coding efficiency and computational complexity are compared in the following table:
All of the above embodiments use a three-valued-output learning machine composed of two support vector machines. The learning machines can be replaced by other types, such as Bayesian classifiers, neural networks or decision trees, and more than two learning machines may be used. Furthermore, the three-valued-output learning machine can either be composed of multiple binary-output learning machines or be implemented directly by a single multi-class learning machine.
The feature vectors input to the learning machine include, but are not limited to, the four classes mentioned in the above embodiments; they may also include image texture, edges, luminance and so on. Meanwhile, the feature quantities in the present invention can take many forms, such as features of the current coding unit, the motion information of a block calculated as x_MV_Meg(i) = |MVx| + |MVy|, contextual information, the quantization parameter, the optimal coding unit size and so on; in practice other representations may be substituted.
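The motion-magnitude feature mentioned above is a direct transcription of x_MV_Meg(i) = |MVx| + |MVy| (function name is illustrative):

```python
def x_mv_meg(mvx, mvy):
    """Motion feature of the current coding unit block: the sum of the
    absolute horizontal and vertical motion-vector components."""
    return abs(mvx) + abs(mvy)
```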
The learning-based classification method is used for the coding-unit depth decision to reduce the mode-selection complexity in coding. The actual video coding process contains many similar "choose one of many" selections besides the size/depth decision: prediction-unit mode selection, transform-unit mode selection, multi-reference-frame selection, motion estimation and so on. The learning-based classification method of the above embodiments can be used in these processes to solve the related "choose one of many" problems.
The above learning-based high efficiency video coding method establishes an early-termination strategy structure by setting classifiers of different classification precisions. By adjusting the proportions of the outputs, the early-termination structure can be converted into several classical decision structures, so it can be switched according to actual needs and the characteristics of the classifier. For an extracted feature vector, the pre-trained learning-machine model outputs a corresponding predicted value, and different coding modes are selected according to that predicted value. The optimal coding process can be output according to the rate-distortion cost and computational complexity in high efficiency video coding, improving the learning and classification performance of the classifier and thus the coding efficiency of video coding.
The technical features of the above embodiments can be combined arbitrarily. For brevity, not all possible combinations of the technical features in the above embodiments are described; however, as long as there is no contradiction in a combination of these technical features, it should be considered to fall within the scope of this specification.
The above embodiments express only several implementations of the present invention, and their description is relatively specific and detailed, but they should not therefore be construed as limiting the scope of the patent. It should be pointed out that, for those of ordinary skill in the art, various modifications and improvements can be made without departing from the inventive concept, and these all belong to the protection scope of the present invention. Therefore, the protection scope of this patent shall be subject to the appended claims.

Claims (9)

1. A learning-based high efficiency video coding method, comprising the following steps:
step 110: encoding a video sequence using a high efficiency video coding device, and extracting the feature vector corresponding to each coding unit block;
step 120: inputting the extracted feature vectors and optimal coding unit sizes into a three-valued-output learning machine, configuring learning parameters and a learning mode, and establishing a learning model;
step 130: adding an early-termination strategy structure to the coding-unit size selection process of the high efficiency video coding device, wherein, at each coding-unit depth layer i, the SKIP mode and MERGE mode are first performed on the current coding unit block, and the feature vector of the current coding unit block corresponding to step 110 is extracted;
step 140: inputting the feature vector of step 130 into the trained learning-machine model, which outputs a predicted value; if the predicted value is "not split", testing and encoding the current coding unit size while skipping the test and coding of the partitioned coding-unit sizes; if the predicted value is "split", skipping the test of the current coding unit size and directly performing the test and coding of the partitioned coding sizes; if the prediction is uncertain, testing the current coding unit size first and then testing the partitioned coding unit sizes;
step 150: repeating steps 130 and 140 until all coding-unit layers in the coding tree unit have been encoded;
step 160: repeating steps 130 to 150 until all coding tree units in all video frames have been encoded;
wherein the step in step 120 of inputting the extracted feature vectors and optimal coding unit sizes into the three-valued-output learning machine, configuring the learning parameters and learning mode, and establishing the learning model comprises:
inputting the feature vector into m binary-output learning machines, each of which, using the learned model, outputs a predicted value O_i of +1 or −1, where i denotes the index of the learning machine and ranges from 1 to m;
fusing the m outputs O_i to obtain the final output Q_ALL;
wherein T_A and T_B are two thresholds between 0 and m.
2. The learning-based high efficiency video coding method according to claim 1, wherein the feature vector includes features of the current coding unit block, motion information, contextual information, the quantization parameter and the like, and the optimal coding unit size.
3. The learning-based high efficiency video coding method according to claim 2, wherein the features of the current coding unit block include the coded block flag x_CBF_Meg(i), the rate-distortion cost value x_RD_Meg(i), the distortion x_D_Meg(i) and the number of coding bits x_Bit_Meg(i), where i is the depth of the current coding unit;
the motion information is calculated as x_MV_Meg(i) = |MVx| + |MVy|, where MVx and MVy respectively denote the horizontal and vertical motion amplitudes, and i is the depth of the current coding unit;
the rate-distortion costs and coding unit depths of the temporally and spatially neighbouring macroblock modes are taken as feature quantities for the current coding-unit depth decision, denoted x_NB_RD(i) and x_CU_depth(i) respectively, where the rate-distortion cost x_NB_RD(i) is an evaluation of the rate-distortion cost values of the adjacent left and above coding units, and the coding unit depth x_CU_depth(i) is the mean depth of the adjacent coding units;
these are calculated by the following formula:
where d_j is the depth value, in units of 4×4 blocks, within the left and above coding units, and N_LFT(i) and N_ABV(i) are the numbers of 4×4 blocks in the left and above coding units.
4. The learning-based high efficiency video coding method according to claim 1, further comprising inputting the features into a classifier, which makes a prediction A, B or C according to the current video content characteristics and the learned model parameters, where A denotes positive, B denotes negative and C denotes uncertain;
when the prediction is A, taking the direct mode Dn as the optimal mode and performing only Dn;
when the prediction is B, taking the split mode Pn+1 as the optimal mode and performing only Pn+1;
when the prediction is C, it is uncertain whether Dn or Pn+1 is optimal, and both Dn and the four Pn+1 are executed.
5. The learning-based high efficiency video coding method according to claim 4, wherein the increase in rate-distortion cost caused by misprediction can be expressed as:
Δη_RD(i) = Δη_nS→S(i) × p_BA(i) + Δη_S→nS(i) × p_AB(i);
where Δη_nS→S(i) is the increase in rate-distortion cost caused by encoding, with the split mode, those coding units whose optimal mode is the non-split mode, Δη_S→nS(i) is the increase in rate-distortion cost caused by encoding, with the non-split mode, those coding units whose optimal mode is the split mode, p_BA(i) = N_BA,1(i)/N_ALL(i) and p_AB(i) = N_AB,2(i)/N_ALL(i) are the misprediction rates of split (B) and non-split (A), N_BA,1(i) is the number of samples for which B is mispredicted as A by classifier No. 1, N_AB,2(i) is the number of samples for which A is mispredicted as B by classifier No. 2, and N_ALL(i) is the number of samples predicted at the i-th coding-unit layer, with N_ALL(1) being the number of coding units of the whole image;
the computational complexity after the split/non-split optimization is calculated as
ΔT(i) = ΔT_S(i) × q_S,1(i) + ΔT_nS(i) × q_nS,2(i);
where ΔT_S(i) and ΔT_nS(i) are respectively the percentage reductions in computational complexity caused by the split and non-split predictions at coding-unit depth layer i, ΔT_S(i) = 1 − T_S(i)/T_ALL(i) and ΔT_nS(i) = 1 − T_nS(i)/T_ALL(i), where T_S(i), T_nS(i) and T_ALL(i) are respectively the computational complexities of splitting, not splitting, and the original operation;
an objective function is set according to the increase in rate-distortion cost caused by the misprediction and the computational complexity after the split/non-split optimization;
the objective function is expressed as
where Δη_T,i is the percentage reduction in compression efficiency, and x_i and y_i are the two model parameters of the i-th-layer three-valued-output classifier, expressed in terms of W_A(j,i) and W_B(j,i), which respectively denote the weights of the positive and negative samples in the j-th classifier of the i-th coding-unit layer.
6. The learning-based high efficiency video coding method according to claim 4, further comprising setting optimal learning parameters when configuring the learning parameters and learning mode;
for partial video frames of several test sequences, obtaining ΔT_nS(i), ΔT_S(i), Δη_S→nS(i) and Δη_nS→S(i) statistically using high efficiency video coding, and obtaining the parameters b_i, a_i, t_i, B_i, A_i, u_i, v_i and T_i by fitting;
given Δη_T,i, obtaining the parameter λ_i by least squares and substituting λ_i into the formula
to calculate x_i and y_i, and finally obtaining the ratio W_A(j,i)/W_B(j,i) as the training parameter of the learning machine, where ΔT_nS(i), ΔT_S(i), Δη_S→nS(i), Δη_nS→S(i), b_i, a_i, t_i, B_i, A_i, u_i, v_i, T_i, x_i and y_i are intermediate parameters, and h_1(i), h_2(i), k_1(i) and k_2(i) are known constants.
7. The learning-based high efficiency video coding method according to claim 1, wherein the training mode of the learning machine includes an online mode and an offline mode.
8. The learning-based high efficiency video coding method according to claim 7, wherein the online mode includes encoding n video frames with the original HM model, and outputting, for each coding-unit depth class i, the feature vector X_i of the learning machine and the optimal partition mode Y of each coding unit;
inputting X_i and Y into the support vector machine learners for training;
using the trained support vector machine learners for coding-unit depth prediction when encoding the video frames.
9. The learning-based high efficiency video coding method according to claim 7, wherein the offline mode includes choosing several particular sequences and several frames of each sequence, encoding them with the original HM model, and outputting, for each coding-unit depth class i in these encoded video frames, the feature vector X_i of the learning machine and the optimal partition mode Y of each coding unit;
inputting X_i and Y into the support vector machine learners for training;
using the trained support vector machine learners for the coding of video sequences and video frames and for coding-unit depth prediction.
CN201510137157.3A 2015-03-26 2015-03-26 Efficient video coding method based on study Active CN106162167B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510137157.3A CN106162167B (en) 2015-03-26 2015-03-26 Efficient video coding method based on study


Publications (2)

Publication Number Publication Date
CN106162167A CN106162167A (en) 2016-11-23
CN106162167B true CN106162167B (en) 2019-05-17

Family

ID=57340278

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510137157.3A Active CN106162167B (en) 2015-03-26 2015-03-26 Efficient video coding method based on study

Country Status (1)

Country Link
CN (1) CN106162167B (en)

Families Citing this family (21)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10382770B2 (en) * 2017-02-06 2019-08-13 Google Llc Multi-level machine learning-based early termination in partition search for video encoding
CN106713929B (en) * 2017-02-16 2019-06-28 清华大学深圳研究生院 A kind of video inter-prediction Enhancement Method based on deep neural network
CN108737841B (en) * 2017-04-21 2020-11-24 腾讯科技(深圳)有限公司 Coding unit depth determination method and device
CN107690069B (en) * 2017-08-28 2021-01-01 中国科学院深圳先进技术研究院 Data-driven cascade video coding method
US11436471B2 (en) * 2017-10-13 2022-09-06 Panasonic Intellectual Property Corporation Of America Prediction model sharing method and prediction model sharing system
KR102535361B1 (en) 2017-10-19 2023-05-24 삼성전자주식회사 Image encoder using machine learning and data processing method thereof
US20200344474A1 (en) * 2017-12-14 2020-10-29 Interdigital Vc Holdings, Inc. Deep learning based image partitioning for video compression
CN108012150B (en) * 2017-12-14 2020-05-05 湖南兴天电子科技有限公司 Video interframe coding method and device
CN108200442B (en) * 2018-01-23 2021-11-12 北京易智能科技有限公司 HEVC intra-frame coding unit dividing method based on neural network
CN108174208B (en) * 2018-02-12 2020-05-12 杭州电子科技大学 Efficient video coding method based on feature classification
CN108495129B (en) * 2018-03-22 2019-03-08 北京航空航天大学 The complexity optimized method and device of block partition encoding based on deep learning method
CN108924558B (en) * 2018-06-22 2021-10-22 电子科技大学 Video predictive coding method based on neural network
EP3743855A1 (en) * 2018-09-18 2020-12-02 Google LLC Receptive-field-conforming convolution models for video coding
CN109089114B (en) * 2018-09-28 2022-01-28 河海大学 Complexity-adaptive screen content coding method for virtual reality equipment
CN109769119B (en) * 2018-12-18 2021-01-19 中国科学院深圳先进技术研究院 Low-complexity video signal coding processing method
CN110072119B (en) * 2019-04-11 2020-04-10 西安交通大学 Content-aware video self-adaptive transmission method based on deep learning network
CN110401834B (en) * 2019-08-06 2021-07-27 杭州微帧信息科技有限公司 Self-adaptive video coding method based on deep learning
CN111083480B (en) * 2019-12-10 2022-11-04 四川新视创伟超高清科技有限公司 High-speed coding method for 8K ultra-high definition video based on CPU
CN113347415A (en) * 2020-03-02 2021-09-03 阿里巴巴集团控股有限公司 Coding mode determining method and device
CN111988628B (en) * 2020-09-08 2023-10-17 福州大学 VVC rapid intra-frame coding method based on reinforcement learning
CN114584771B (en) * 2022-05-06 2022-09-06 宁波康达凯能医疗科技有限公司 Method and system for dividing intra-frame image coding unit based on content self-adaption

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101217663A (en) * 2008-01-09 2008-07-09 上海华平信息技术股份有限公司 A quick selecting method of the encode mode of image pixel block for the encoder
CN102387356A (en) * 2010-09-06 2012-03-21 索尼公司 Image processing apparatus and method
CN103517069A (en) * 2013-09-25 2014-01-15 北京航空航天大学 HEVC intra-frame prediction quick mode selection method based on texture analysis
WO2015034061A1 (en) * 2013-09-06 2015-03-12 三菱電機株式会社 Video encoding device, video transcoding device, video encoding method, video transcoding method and video stream transmission system


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Fast transrating for high efficiency video coding based on machine learning; Yun Zhang, Sam Kwong, Xu Wang, Hui Yuan, Zhaoqing Pan; 2013 IEEE International Conference on Image Processing; 2014-02-13; Vol. 26 (No. 3); full text *

Also Published As

Publication number Publication date
CN106162167A (en) 2016-11-23

Similar Documents

Publication Publication Date Title
CN106162167B (en) Efficient video coding method based on study
CN105306947B (en) video transcoding method based on machine learning
CN103873861B (en) Coding mode selection method for HEVC (high efficiency video coding)
Jin et al. CNN oriented fast QTBT partition algorithm for JVET intra coding
CN106454342B (en) A kind of the inter-frame mode fast selecting method and system of video compression coding
CN106937118B (en) A kind of bit rate control method combined based on subjective area-of-interest and time-space domain
CN106131546B (en) A method of determining that HEVC merges and skip coding mode in advance
CN111654698B (en) Fast CU partition decision method for H.266/VVC
CN108737841A (en) Coding unit depth determination method and device
CN107371022A (en) The quick division methods of interframe encode unit applied to HEVC medical image lossless codings
CN111355956A (en) Rate distortion optimization fast decision making system and method based on deep learning in HEVC intra-frame coding
CN105959611A (en) Adaptive H264-to-HEVC (High Efficiency Video Coding) inter-frame fast transcoding method and apparatus
CN109040764A (en) Fast coding algorithm in a kind of HEVC screen content frame based on decision tree
CN110049338B (en) HEVC (high efficiency video coding) rapid inter-frame coding method based on multi-level classification
CN107852492A (en) Method and apparatus for being coded and decoded to image
Chen et al. A novel fast intra mode decision for versatile video coding
CN108989799A (en) A kind of selection method, device and the electronic equipment of coding unit reference frame
CN103313058B (en) The HEVC Video coding multimode optimization method realized for chip and system
CN116489386A (en) VVC inter-frame rapid coding method based on reference block
CN106937116A (en) Low-complexity video coding method based on random training set adaptive learning
CN107690069B (en) Data-driven cascade video coding method
CN108200431A (en) A kind of video frequency coding rate controls frame-layer Bit distribution method
CN103888770B (en) A kind of video code conversion system efficiently and adaptively based on data mining
CN109361920A (en) A kind of interframe quick predict algorithm of the adaptive decision-making tree selection towards more scenes
Huang et al. Modeling acceleration properties for flexible INTRA HEVC complexity control

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant