CN102547293A

CN102547293A - Method for coding session video by combining time domain dependence of face region and global rate distortion optimization

Info

Publication number: CN102547293A
Application number: CN201210034708XA
Authority: CN
Inventors: 范小九; 彭强; 杨天武; 王琼华
Original assignee: Southwest Jiaotong University
Current assignee: Southwest Jiaotong University
Priority date: 2012-02-16
Filing date: 2012-02-16
Publication date: 2012-07-04
Anticipated expiration: 2032-02-16
Also published as: CN102547293B

Abstract

The invention discloses a method for coding a session video by combining the time domain dependence of a face region and global rate distortion optimization. The distortion of a face region of interest (ROI) and diffusion influence thereof are estimated in advance by utilizing the time domain dependence of the face ROI between adjacent coded fames in the same group of pictures (GOP), and an effective auxiliary means is provided for optimal motion vector and mode division and selection. According to the method, the optimization of a coded unit of the face ROI is particularly emphasized from an overall point of view, so that the subjective and objective quality of the coded unit of the face ROI and subsequent coded units taking the coded unit of the face ROI as a reference is well ensured, additional bit overhead caused by the distortion diffusion in the conventional coding process is avoided, and on the premise of maintaining or improving the subjective and objective quality of a coded picture, the coding rate of the session video is effectively decreased, and coding performance is improved; and the method is completely compatible with the conventional sequential coding structure, and is applied to application places such as video storage, real-time video coding with requirements on real-time performance of more than a GOP delay, and the like.

Description

The conversational video coding method that human face region time domain dependence combines with overall rate-distortion optimization

Affiliated technical field

The invention belongs to video coding and process field, be specifically related to the research of rate-distortion optimization coding method in the conversational video cataloged procedure.

Background technology

People's face is different from one of key feature of other biological as the mankind, in human communication and social activities, playing the part of the role of main information carrier, thereby it is carried out comprehensive and deep research has important theoretical and realistic meaning.The rise of As real-time multimedia service, application such as video conference, video telephone, news report all have direct or indirect getting in touch with people's face.Follow the extensive popularization of these application, the importance of people's face research grows with each passing day especially.Usually, video coding and communication circle come above-mentioned application is summarized with " conversational video sequence ", and its corresponding coding techniques then is called the conversational video coding techniques.In the video compression theory of classics, all two field pictures and coding unit are all based on equal importance and by sequential encoding.Along with going deep into of research; People recognize that gradually the evaluation index of video coding algorithm is except compression ratio and Y-PSNR (Peak Signal to Noise Ratio; PSNR) outside, the coding quality of considered " area-of-interest (Region of Interest, ROI) " also.In fact, the user is often to come directly evaluation video coding result's acceptable degree to the quality of the subjective feeling of ROI compression effectiveness.Therefore, how to guarantee or the encoding and decoding quality that improves people's face ROI in the conversational video sequence is the advanced subject of demanding urgently studying in the current sessions field of video encoding.

See from available research achievements, mainly be divided into two types around the correlative study of people's face ROI video coding: 1) coding side priority protection people face ROI, as the intra-frame encoding mode based on people's face ROI upgrade, based on Bit Allocation in Discrete and the resource optimization of people's face ROI; 2) the decoding end emphasis recover or under situation about going wrong priority restores people face ROI, like error concealment based on people's face ROI.Wherein, most of achievement in research is through giving people's face ROI higher encoding and decoding priority, realized the raising of people's face ROI subjective and objective quality to a certain extent and promoted the development of conversational video coding techniques.Yet the problem that scholars ignore is; Though the quality of people's face ROI has the special role on the video evaluation; But because it is the part in the conversational video sequence, for people's face ROI to stress other parts that encoding and decoding must mean the conversational video sequence be the reduction of non-face ROI part encoding and decoding priority.The embodiment of this point in cataloged procedure is particularly outstanding, and as under the situation of bit resource-constrained, the Bit Allocation in Discrete that stresses people's face ROI is a prerequisite with the coded-bit of sacrificing non-face ROI promptly.When the coded-bit of being sacrificed had influence on the coding quality of non-face ROI, the non-face ROI that this moment, coding quality suddenly reduced can surmount people's face ROI became the core that human eye is paid close attention to.Thus, though the coding quality of people's face ROI has obtained obvious raising because of bit stresses to distribute, not only the video sequence binary encoding quality that human eye is felt can not improve reduction on the contrary.On the other hand; The importance of each several part is not in full accord yet among people's face ROI; Though in the existing part document people's face coding priority has been provided finer division (as by eye, ear, mouth, nose region) so that this problem is given prominence to, it is too subjective that relevant division methods still seems.(Rate-Distortion, R-D) performance is carried out also should to combine concrete manifestation such as the rate distortion of people's face ROI in cataloged procedure during therefore, based on people's face ROI actual coding.

(Rate Distortion Optimization, RDO) control strategy is one of effective means that under limited bandwidth condition, provides the decoding end best video quality to rate-distortion optimization.In theory, the optimal solution of video coding RDO is the result who all coding units is carried out global optimization.In order to make problem be easier to find the solution, scholars often tend to do an independence assumption, promptly think to be independent of each other between each coding unit, thereby realize independently weighing of each the coding unit code check and the distortion factor.Based on this and combine method of Lagrange multipliers, video coding RDO problem is promptly divided and rule and is found the solution.In fact, because single encoded unit must could obtain after other coding units calculating finishes at the bit number under the specific coding pattern, so strictly speaking, the judgement of each coding unit forced coding pattern is complementary.Because the mission critical of video coding is the redundancy (time redundancy, spatial redundancy and statistical redundancy) that removes between the different coding unit; So strategies commonly used such as relative estimation, motion compensation and entropy coding have caused complicated coding dependence, this dependence makes that also the RDO of each coding unit can not be the individuality of a complete closed.Therefore, also unreasonable based on the RDO method of independence assumption, and consider that in each coding unit RDO process the coding dependence has become one of important means of improving video coding performance.

In recent years, in a lot of video coding correlative study work the coding dependence has been had related, but the higher defective of these method ubiquity computation complexities.In order to obtain the balance between code efficiency and the time complexity, a large amount of RDO methods have to abandon part is encoded dependent consideration with the raising on the obtained performance.In conversational video that the present invention paid close attention to coding because the similitude of people's face ROI coding unit texture and the consistency of motion, thereby an image sets (Group of Picture, GOP) in the dependence that showed when encoding of consecutive frame stronger.

Summary of the invention

Above deficiency in view of prior art; The objective of the invention is to design a kind of new method that improves the conversational video coding efficiency; Make it to obtain more outstanding coding efficiency and good using value and theory significance, and be applicable to that video storage (it is the whole sequence frame number that the GOP maximum length can be set) and real-time require the real-time video coding greater than a GOP time-delay.

The objective of the invention is to realize through following means.

The conversational video coding method that a kind of human face region time domain dependence combines with overall rate-distortion optimization; Utilize people's face region of interest ROI time domain dependence between the adjacent encoder frame in same image sets GOP; Estimate the distortion factor of people's face ROI in advance and extend influence; For optimal motion vectors and mode division select to provide efficient assistant method, to realize video sequence integral body and the synchronous raising of people's face ROI on subjective and objective quality, its implementation comprises following sequence step:

A. (before each GOP of coding conversational video sequence) to current GOP in all coded frame carry out people's face ROI and detect, thereby confirm the particular location of people's face ROI coding unit.The definition of conversational video sequence, GOP, coding unit and people's face ROI coding unit and sketch map see below about the 1st of the explanation of accompanying drawing and term.

B. whether belong to people's face ROI according to the present encoding unit, select different RDO methods to be optimized coding.

For people's face ROI coding unit,

B.1 construct people's face ROI coding unit time domain diffusion chain.The definition of people's face ROI coding unit time domain diffusion chain is seen hereinafter about the 2nd of the explanation of accompanying drawing and term.Time complexity when reducing people's face ROI coding unit time domain diffusion chain structure, it is following that the present invention provides a kind of people's face ROI coding unit time domain diffusion chain building method of simplification:

(1) each coding unit in the present encoding GOP of conversational video sequence is carried out the propulsion search; To obtain each coding unit best match unit position in next frame, forward motion vector and forward prediction difference (this step is only carried out once) that record is corresponding in current GOP.Propulsion search, best match unit, forward motion vector, forward prediction difference are seen hereinafter about the 3rd of the explanation of accompanying drawing and term.

(2) according to the diffusion position of forward motion vector derivation people face ROI coding unit in next coded frame of current GOP that obtains in the step (1), this diffusion position is pairing to be called people's face ROI diffusion unit with the identical unit of people's face ROI coding unit size.For the purpose of the difference, this step people face ROI diffusion unit is called people's face ROI diffusion unit No. 1.In fact, No. 1 people's face ROI diffusion unit is the best match unit of working as forefathers' face ROI coding unit in the step (1).The forward prediction difference of this step storage people face ROI coding unit and the position of No. 1 people's face ROI diffusion unit.

(3), thereby can obtain its diffusion position in the next again coded frame of current GOP with the forward motion vector of the actual coding unit at place, No. 1 people's face ROI diffusion unit center in the step (2) forward motion vector as this people's face ROI diffusion unit.This diffusion position is pairing promptly to be the people face ROI diffusion unit of people's face ROI coding unit in the next again coded frame of current GOP with the identical unit of people's face ROI coding unit size, is called people's face ROI diffusion unit No. 2.Resulting here diffusion unit should not exceed in the summary of the invention steps A people's face ROI scope in the resulting current encoded frame; If exceed then with in diffusion unit horizontal translation to people's face ROI scope as No. 2 people's face ROI diffusion units, then continue vertical translation and be positioned at people's face ROI scope fully if still exceed people's face ROI scope after the translation until diffusion unit.Simultaneously, according to the ratio situation of the resulting No. 1 people's face ROI diffusion unit of step (2) on each actual coding unit, the forward prediction difference of each actual coding unit is sued for peace as the forward prediction difference of No. 1 people's face ROI diffusion unit in proportion.The position of this step No. 1 people's face ROI diffusion unit forward prediction difference of storage and No. 2 people's face ROI diffusion units.

(4) be similar to the follow-up people's face ROI diffusion unit of step (3) derivation, be positioned at the last frame of current GOP until people's face ROI diffusion unit.People's face ROI coding unit and all diffusion units on subsequent frame thereof are joined together to form people's face ROI coding unit time domain diffusion chain, and each forward prediction difference is stated use after preserving and supplying.

The sketch map of this method and related description are seen hereinafter about the 2nd of the explanation of accompanying drawing and term.

B.2 calculate the distortion factor estimated value of all diffusion units on people's face ROI coding unit and the people's face ROI coding unit time domain diffusion chain.Distortion factor estimated value is the result that the distortion factor that before present encoding unit or diffusion unit are not encoded, produced after to its actual coding is rationally estimated gained, and the present invention provides the resulting distortion factor method of estimation of a kind of laplacian distribution characteristic according to residual error DCT coefficient for as follows:

Formula 1:

D = D_{MCP} \cdot F (\sqrt{2} Q / \sqrt{D_{MCP}})

Wherein D is a distortion factor estimated value, D _MCPBe a last coding unit of current coding unit or the forward prediction difference of diffusion unit on the time domain diffusion chain, Q is a quantization step.Because people's face ROI coding unit is the start element of time domain prediction chain, need adopt thereafter to prediction difference when therefore calculating its distortion factor estimated value.The back forecast difference obtains based on reverse search, and reverse search and back forecast difference are seen the 4th in specification definition and accompanying drawing.For the F in the formula 1 () function, its computational methods are following,

Formula 2:

F (θ) = [{&Integral;}_{0}^{dθ} y^{2} \cdot p (y) dy + Σ_{k = 0}^{\infty} {&Integral;}_{(k + d) \cdot θ}^{(k + d + 1) \cdot θ} (c (y) \cdot {| y - (k + d + ω) \cdot θ |}^{2} + (1 - c (y)) \cdot y^{2}) \cdot p (y) \cdot dy] .

B.3 calculate distortion factor diffusion coefficient and the summation that all diffusion units are influenced by people's face ROI coding unit on people's face ROI coding unit time domain diffusion chain and obtain total distortion degree diffusion coefficient.Distortion factor diffusion coefficient is the measurement sign of the coding result of a certain coding unit or diffusion unit to its next adjacent diffusion unit coding influence of time domain diffusion chain.The present invention provides a kind of distortion factor diffusion coefficient computational methods based on experiment derivation gained and representes as follows,

Formula 3:

β_{t} = \frac{D_{t}}{D_{t - 1} + D_{t}^{MCP}}

β wherein _tRepresent the distortion factor diffusion coefficient that current diffusion unit is influenced by last coding unit on the time domain diffusion chain or diffusion unit, D _tThe distortion factor estimated value of representing current diffusion unit, D _T-1The distortion factor estimated value of representing last coding unit or diffusion unit,

The forward prediction difference of representing current diffusion unit.For calculate every other diffusion unit on the people's face ROI coding unit time domain diffusion chain receive distortion factor diffusion coefficient that people's face ROI coding unit influences and and then try to achieve total distortion degree diffusion coefficient; The present invention calculates the distortion factor diffusion coefficient that certain diffusion unit is influenced by last coding unit or diffusion unit on people's face ROI coding unit time domain diffusion chain respectively, utilizes the sexual intercourse of taking advantage of that obtains based on derivation to draw the distortion factor diffusion coefficient that it is influenced by people's face ROI coding unit then.For example, the distortion factor diffusion coefficient of a current diffusion unit N and a front N-1 diffusion unit thereof is respectively β _N, β _N-1..., β ₁, then its distortion factor diffusion coefficient that influenced by people's face ROI coding unit is β ₁β ₂β _N

B.4 upgrade Lagrangian coefficient.

(1) the actual coding mode of statistics people face ROI coding unit (in SKIP, DIRECT, the frame, interframe etc.), motion compensated prediction distortion value and rebuild distortion value.The corresponding people's face ROI coding unit of motion compensated prediction distortion value and its absolute difference average between video coding motion search respective coding unit are rebuild distortion value then corresponding people's face ROI coding unit and its absolute difference average between the reconstruction unit behind the video coding.

(2) if when forefathers' face ROI coding unit be last people's face of present frame ROI coding unit (in the past backward from the top down) by spatial order, calculate in all encoded GOP and current GOP people's face ROI coding unit percentage, the mean motion compensation prediction distortion value of people's face ROI coding unit and the average reconstruction distortion value of people's face ROI coding unit of encoding with mode in the frame in the coded frame.Otherwise, skip to STEP 3.

(3) adjust Lagrangian coefficient.The adjustment formula does accordingly,

Formula 4:

λ_{New} = \frac{λ_{Old}}{η \cdot (1 - α \times (1 - γ) \cdot \overset{&OverBar;}{D} / {\overset{&OverBar;}{D}}_{MCP})}

Wherein, λ _NewBe adjusted Lagrangian coefficient, λ _OldLagrangian coefficient before being to adjust, the total distortion diffusion coefficient of η for obtaining in the step B.3, γ are all people's face ROI coding unit percentages of having encoded with mode in the frame in the coded frame in the current GOP,

The mean motion compensation prediction distortion value of behaviour face ROI coding unit,

The average reconstruction distortion value of behaviour face ROI coding unit, α is a constant value, optional scope be [0.88,1.0).

B.5 based on the Lagrangian coefficient that has upgraded in B.4, call the lagrangian optimization method people's face ROI coding unit is carried out RDO.

For non-face ROI coding unit,

B.6 if the Lagrangian coefficient in the current existence B.4 is η λ with its product with corresponding total distortion degree diffusion coefficient _NewThe Lagrangian coefficient that substitutes conventional RD O carries out the RDO coding of non-face ROI coding unit.Otherwise, carry out the optimization of non-face ROI coding unit by conventional RD O and corresponding Lagrangian coefficient and encode.

Employing the present invention is based on the conversational video coding method that human face region time domain dependence and overall rate-distortion optimization thought combine; Stress to optimize the people's face ROI coding unit that the human eye subjective visual quality do is had considerable influence from the angle of the overall situation; People's face ROI coding unit and following subjective and objective quality have better been guaranteed with its coding unit as a reference; Avoided spreading caused additional bit expense because of the distortion factor in traditional cataloged procedure; Under the prerequisite of keeping or promote the coded image subjective and objective quality, effectively reduce the conversational video encoder bit rate.Simultaneously, the present invention is different according to people's face ROI coding unit and non-face ROI coding unit importance in the conversational video coding, realizes selectable RDO coding, when ensureing the coding result subjective and objective quality, has improved the coding efficiency of conversational video sequence.In addition; The present invention is based on the conversational video coding method that human face region coded time domain dependence and overall rate-distortion optimization thought combines and be compatible with traditional sequential encoding structure fully; Be easy to realize, require application scenarios such as real-time video coding greater than a GOP time-delay applicable to video storage and real-time.

Description of drawings

Fig. 1 illustrates series of drawing for example for the conversational video sequence.

Fig. 2 is that frame, GOP and the coding unit of conversational video sequence formed sketch map.

Fig. 3 face ROI elementary cell example schematic of behaving.

Fig. 4 is a kind of people's face ROI coding unit time domain diffusion chain building flow chart that can be for reference.

Fig. 5 is the sweep forward procedure chart.

Fig. 6 is the sweep backward procedure chart.

Fig. 7 is the step block diagram of the inventive method.

Explanation about accompanying drawing and term:

1, (session) video sequence, GOP, (people's face ROI) coding unit

The conversational video sequence is a kind of continuous and video-frequency band set that formed with the video image (being called picture again) that people's face head shoulder image is the main body by some frames.The video sequences that the field produced such as video conference, visual telephone, news report are typical case's representative of conversational video sequence, and are as shown in Figure 1.

Because the conversational video sequence still belongs to conventional video sequence category, hereinafter still is object with the video sequence during to GOP and coding unit explanation.

GOP is the picture group in the video sequence, one group of continuous picture in the corresponding video sequence to be encoded of GOP.Common H.264/AVC or in the MPEG video encoding standard; Frame format is divided into three kinds of I (intra-coded frame), P (forward predicted frame), B (bi-directional predicted frames); As line up IBBPBBPBBPBBP ... Pattern, this continuous frame picture (picture) combination is a GOP.According to the difference of employing coded system, GOP length generally can be set to 1-15.

Coding unit is a basic conception in the video coding, is made up of a luminance pixel coding unit and two additional chroma coder unit usually.With common coding unit---macro block (Macroblock) is an example; A coded image (video frame image) can comprise several macro blocks; Each macro block is made up of a luminance pixel and two chroma pixel pieces, and wherein luminance block is the block of pixels of 16x16 size, and the size of two colourity block of image pixels is decided according to the sample format of its image; As for the YUV420 sampled images, chrominance block is the block of pixels of 8x8 size.In each picture frame, several macro blocks is arranged form in blocks, and video coding algorithm is a unit with the macro block, and macro block is encoded one by one, is organized into the continuous video code stream.

If certain (session) video sequence comprises the N frame, each GOP length is M, and what then be somebody's turn to do (session) video sequence and GOP, coding unit concerns that sketch map is as shown in Figure 2.

Each coded frame of conversational video sequence is being carried out similar coding unit as shown in Figure 2 when dividing, people's face ROI the coding unit of all or part of covering be called people's face ROI coding unit.As shown in Figure 3, light grey dashed lines labeled the position of each coding unit in the current encoded image, black dash line collimation mark has been remembered the testing result of people's face ROI, then all coding units in black side's point frame all are people's face ROI coding unit.

2, people's face ROI coding unit time domain diffusion chain

Considering under the dependent prerequisite of coded time domain, can form people's face ROI coding unit time domain diffusion chain linked together by its other all or part of coding units that influence.Wherein, the people's face ROI diffusion unit that is otherwise known as of other unit except that people's face ROI coding unit on the time domain diffusion chain.

In theory; Above-mentioned extending influence mainly comes from the motion compensated prediction reference; Therefore diffusion unit is corresponding to the whole of people's face ROI coding unit and all or part of coding unit of piece as a reference partly, and with these all or part of coding units follow-up all or part of coding unit of piece as a reference.Utilize the motion compensation information in the actual coding process can construct people's face ROI coding unit time domain diffusion chain accurately.

People's face ROI coding unit time domain diffusion chain building method sketch map of a kind of simplification of mentioning in the summary of the invention is as shown in Figure 4.

Wherein, The aterrimus frame of broken lines representes to constitute people's face ROI coding unit of people's face ROI coding unit time domain diffusion chain and the people's face ROI diffusion unit on subsequent frame thereof; When near the solid black lines frame table the aterrimus frame of broken lines is shown in structure people face ROI coding unit time domain diffusion chain, residing actual coding unit, people's face ROI diffusion unit center in each frame.People's face ROI diffusion unit among the figure on the n+1 frame directly obtains based on the forward prediction vector of people's face ROI coding unit on the n frame, and the people's face ROI diffusion unit on the n+2 frame then obtains based on the forward prediction vector of place, people's face ROI diffusion unit center coding unit on the n+1 frame.Accordingly, the people's face ROI diffusion unit on the n+3 frame will obtain based on the forward prediction vector of place, people's face ROI diffusion unit center coding unit on the n+2 frame, and follow-up diffusion unit is analogized with this rule.

3, propulsion search, best match unit, forward motion vector, forward prediction difference.

The propulsion search procedure is similar to the motion search process in the conventional video coding technique, and sketch map is as shown in Figure 5.Its basic thought is to think that the displacement of interior all pixels of each coding unit of video sequence is identical; Thereby each coding unit is found out and the most similar unit, present encoding unit with the hunting zone according to certain matching criterior (generally adopting minimum average B configuration absolute value error MAD or minimum absolute difference value and SAD) in reference frame; It is best match unit; The relative displacement of best match unit and present encoding unit is called forward motion vector, and matching error value between the two is called the forward prediction difference.What be different from conventional video coding motion search process is, propulsion is searched for selected reference frame for by next original coding frame in the current GOP of coding sequential, but not reconstruction frames.In Fig. 5, CU _nBe the specific coding unit in the n frame, CU _N+1Be CU _nBest match unit in the n+1 frame that obtains through the propulsion searching method, MV _nForward motion vector for correspondence.

4, reverse search, back forecast difference

The reverse search class is similar to propulsion and searches element.Different is that it is that sketch map is as shown in Figure 6 by the back to the original coding frame of sequential of encoding that selected reference frame is searched in reverse.Wherein, CU _N+1Be the specific coding unit in the n+1 frame, CU _nBe the best match unit in the n frame that obtains through the reverse searching method, MV _N+1Reverse vector for correspondence.CU _N+1With CU _nMatching error value between the two is called the back forecast difference, and matching criterior is consistent with the propulsion search.

Embodiment

Below in conjunction with accompanying drawing and embodiment the present invention is described further.

For ease of explanation, and be without loss of generality, conversational video sequence to be encoded done following supposition:

Suppose that the coding unit size is 16*16;

Suppose that coded image resolution is 352*288, then coding unit quantity is 22*18, is 1-396 by row order number consecutively;

Suppose that the coding totalframes is 100, the GOP size is 5;

Suppose that each coded frame all can carry out people's face according to suitable method for detecting human face and detect.

According to above hypothesis, present embodiment is that example is introduced with first GOP.

A. all coded frame in the current GOP are carried out people's face ROI and detect, thus the particular location of definite people's face ROI coding unit.If detecting the serial number of descendant's face ROI coding unit in each frame through people's face is followed successively by:

The 1st frame:

53，54，55，74，75，76，77，78，96，97，98，99，100，118，119，120，121，122，141，142，143

The 2nd frame:

52，53，54，73，74，75，76，77，95，96，97，98，99，118，119，120，121，122，140，141，142

The 3rd frame:

53，54，55，74，75，76，77，78，96，97，98，99，100，118，119，120，121，122，140，141，142，143，144

The 4th frame:

53，54，55，74，75，76，77，96，97，98，99，118，119，120，121，140，141，142，143，163，164，165

The 5th frame:

53，54，55，74，75，76，77，96，97，98，99，118，119，120，121，140，141，142，143

B. since the 1st coding unit of the 1st frame, judge whether each coding unit belongs to people's face ROI in each frame, different RDO methods is set is optimized coding.

The 1st frame the 1st is non-face ROI coding unit to the 52nd coding unit; There is not the Lagrangian coefficient that has upgraded based on people's face ROI this moment; The present invention will give tacit consent to by routine and encode based on the RDO method of independence assumption, and concrete grammar please with reference to step B.6.

The 53rd coding unit behaviour face of the 1st frame ROI coding unit, but, lack corresponding reverse searching for reference frame because present frame is current GOP first frame, thereby can't carry out the distortion factor according to the laplacian distribution characteristic of residual error DCT coefficient and estimate.Therefore, the rate-distortion optimization of each one face ROI coding unit coding still carries out based on the method for independence assumption by routine in the 1st frame.

Preceding 51 coding units of the 2nd frame are optimized coding by routine based on the RDO method of independence assumption.

The 52nd coding unit behaviour face of the 2nd frame ROI coding unit, its RDO method is following:

B.1 construct people's face ROI coding unit time domain diffusion chain.Step is following:

(1) all coding units in the current GOP is carried out the propulsion search, preserve corresponding forward motion vector and forward prediction difference (this step is only carried out once) in current GOP.

(2) according to the forward motion vector of the 52nd coding unit of the 2nd frame, its diffusion position in the 3rd frame of deriving, thus obtain the diffusion unit of the 52nd coding unit of the 2nd frame on the 3rd frame.The best match unit of the 52nd coding unit of the 2nd frame when obviously, the corresponding propulsion of this diffusion unit is searched for.Note; Resulting here diffusion unit should not surpass the people's face ROI scope that obtains that detects in the 3rd frame; If exceed then the diffusion unit horizontal translation to people's face ROI scope, then continue vertical translation and be positioned at people's face ROI scope fully until diffusion unit as still exceeding behind the horizontal translation.

(3) get the forward motion vector of the actual coding unit forward motion vector at place, diffusion unit center in the 3rd frame as this diffusion unit; The diffusion position of the 52nd coding unit of the 2nd frame on the 4th frame of deriving, thus the diffusion unit of the 52nd coding unit of the 2nd frame on the 4th frame obtained.Similar with step (1), this step need detect resulting diffusion unit equally and whether exceed people's face ROI scope.In the forward prediction difference of the ratio situation shared of diffusion unit in the 3rd frame and each actual coding unit, can obtain the forward prediction difference of this diffusion unit in each actual coding unit.

(4) the 52nd coding unit of repeating step (2) derivation the 2nd frame other coded frame in current GOP are the forward prediction difference of diffusion unit in the 4th frame and the 5th frame.The 52nd coding unit of the 2nd frame and all diffusion units on subsequent frame thereof are linked together promptly form people's face ROI coding unit time domain diffusion chain.Stating step after each forward prediction difference is preserved and supplied uses.

B.2 calculate the 52nd coding unit of the 2nd frame and the distortion factor estimated value of all diffusion units on people's face ROI coding unit time domain diffusion chain.Method step is: at first, the 52nd coding unit of the 2nd frame carried out the reverse search to obtain its best match unit position in the 1st frame, write down corresponding back forecast difference.Secondly, calculate the distortion factor estimated value of the 52nd coding unit of the 2nd frame according to formula in the summary of the invention 1 and formula 2.At last, according to the forward prediction difference of the 52nd coding unit of the 2nd frame, calculate the distortion factor estimated value of first diffusion unit of people's face ROI coding unit time domain diffusion chain.By that analogy, until the distortion factor estimated value that obtains all diffusion units on people's face ROI coding unit time domain diffusion chain.

B.3 calculate the distortion factor diffusion coefficient that each diffusion unit is influenced by the 52nd coding unit of the 2nd frame on people's face ROI coding unit time domain diffusion chain, and summation obtains total distortion degree diffusion coefficient.Method step is: at first; According to the distortion factor estimated value of the 52nd coding unit of the 2nd frame, the distortion factor estimated value and the forward prediction difference of first diffusion unit of people's face ROI coding unit time domain diffusion chain, calculate the diffusion coefficient that first diffusion unit is influenced by people's face ROI coding unit in conjunction with formula in the summary of the invention 3.Secondly; Similar with abovementioned steps; Calculate second diffusion coefficient that diffusion unit is influenced by first diffusion unit on people's face ROI coding unit time domain prediction chain, the product of itself and previous diffusion unit diffusion coefficient is the diffusion coefficient that second diffusion unit influenced by people's face ROI coding unit.The 3rd; Calculate the distortion factor diffusion coefficient that other diffusion units are influenced by last diffusion unit on people's face ROI coding unit time domain diffusion chain respectively; Utilize then and take advantage of sexual intercourse to draw the distortion factor diffusion coefficient that it is influenced by people's face ROI coding unit, until arriving last diffusion unit.At last; The distortion factor diffusion coefficient summation (distortion factor diffusion coefficient that people's face ROI coding unit receives self to influence is 1) that influenced by people's face ROI coding unit people's face ROI coding unit and each diffusion unit on the people's face ROI coding unit time domain diffusion chain, thus the total distortion degree diffusion coefficient of people's face ROI coding unit obtained.

B.4 upgrade the Lagrangian coefficient of the 52nd coding unit of the 2nd frame.Step is following:

(1) adds up actual coding mode (in SKIP, DIRECT, the frame, interframe etc.), the motion compensated prediction distortion value of people's face ROI coding unit in the coded frame and rebuild distortion value.

(2) if when forefathers' face ROI coding unit be last people's face of whole GOP ROI coding unit (press space-time incremental order the past backward from the top down), calculate people's face ROI coding unit percentage, the mean motion compensation prediction distortion value of people's face ROI coding unit and the average reconstruction distortion value of people's face ROI coding unit that all have been encoded with mode in the frame in the coded frame.Otherwise, skip to (3).

(3) press the Lagrangian coefficient of formula 4 renewals in the summary of the invention.

B.5 based on the Lagrangian coefficient that upgrades the 52nd coding unit of the 2nd frame carried out the RDO coding.

For the 53rd, 54 coding units of the 2nd frame, encode by the 52nd identical RDO coding step of coding unit of the 2nd frame.

For the 55th coding unit of the 2nd frame, the present encoding unit is non-face ROI coding unit, and its RDO coding carries out based on the method for independence assumption by routine, and method is described below:

B.6 owing to existed people's face ROI coding unit to be encoded and carried out Lagrangian coefficient update step before it, the Lagrangian coefficient that is therefore adopted is based on the i.e. η λ of the 54th coding unit of the 2nd frame of person of modern times's face ROI coding unit _NewSubstitute, the RDO method is consistent with conventional RD O.For the coding unit in the 1st frame, owing to do not have Lagrangian coefficient update before, so the present invention directly carries out RDO based on traditional Lagrangian coefficient (calculating by current quantization parameter value).

For other people face ROI coding unit and non-face ROI coding unit in the current GOP, all encode by the RDO method of the 52nd coding unit of described the 2nd frame of preceding text and the 1st coding unit of the 1st frame or the 55th coding unit of the 2nd frame.

Claims

1. the conversational video coding method that combines with overall rate-distortion optimization of human face region time domain dependence; Utilize people's face region of interest ROI time domain dependence between the adjacent encoder frame in same image sets GOP; Estimate the distortion factor of people's face ROI in advance and extend influence; For optimal motion vectors and mode division select to provide efficient assistant method, to realize video sequence integral body and the synchronous raising of people's face ROI on subjective and objective quality, its implementation comprises following series of steps:

A. before each GOP of coding conversational video sequence to current GOP in all coded frame carry out people's face ROI and detect, thereby confirm the particular location of people's face ROI coding unit;

B. whether belong to people's face ROI according to the present encoding unit, select different RDO methods to be optimized coding:

For people's face ROI coding unit,

B.1 construct people's face ROI coding unit time domain diffusion chain, building method is following:

(1) each coding unit in the present encoding GOP of conversational video sequence is carried out the propulsion search, to obtain each coding unit best match unit position in next frame, forward motion vector and forward prediction difference that record is corresponding; This step is only carried out once in current GOP;

(2) according to the diffusion position of forward motion vector derivation people face ROI coding unit in next coded frame of current GOP that obtains in the step (1), this diffusion position is pairing to be called people's face ROI diffusion unit with the identical unit of people's face ROI coding unit size; For the purpose of the difference, this step people face ROI diffusion unit is called people's face ROI diffusion unit No. 1, the forward prediction difference of storage people face ROI coding unit and the position of No. 1 people's face ROI diffusion unit;

(3), thereby obtain its diffusion position in the next again coded frame of current GOP with the forward motion vector of the actual coding unit at place, No. 1 people's face ROI diffusion unit center in the step (2) forward motion vector as this people's face ROI diffusion unit; This diffusion position is pairing promptly to be the people face ROI diffusion unit of people's face ROI coding unit in the next again coded frame of current GOP with the identical unit of people's face ROI coding unit size, is called people's face ROI diffusion unit No. 2; Resulting here diffusion unit should not exceed in the summary of the invention steps A people's face ROI scope in the resulting current encoded frame; If exceed then with in diffusion unit horizontal translation to people's face ROI scope as No. 2 people's face ROI diffusion units, then continue vertical translation and be positioned at people's face ROI scope fully if still exceed people's face ROI scope after the translation until diffusion unit; Simultaneously; According to the ratio situation of the resulting No. 1 people's face ROI diffusion unit of step (2) on each actual coding unit; The forward prediction difference of each actual coding unit is sued for peace as the forward prediction difference of No. 1 people's face ROI diffusion unit in proportion, store the position of No. 1 people's face ROI diffusion unit forward prediction difference and No. 2 people's face ROI diffusion units;

(4) repeating step (3) is handled follow-up people's face ROI diffusion unit; When people's face ROI diffusion unit is positioned at the last frame of current GOP; People's face ROI coding unit and all diffusion units on subsequent frame thereof are joined together to form people's face ROI coding unit time domain diffusion chain, state step after each forward prediction difference is preserved and supplied and use;

B.2 calculate the distortion factor estimated value of all diffusion units on people's face ROI coding unit and the people's face ROI coding unit time domain diffusion chain, distortion factor method of estimation is as follows:

Formula 1:

D = D_{MCP} \cdot F (\sqrt{2} Q / \sqrt{D_{MCP}})

Wherein D is a distortion factor estimated value, D _MCPBe a last coding unit of current coding unit or the forward prediction difference of diffusion unit on the time domain diffusion chain, Q is a quantization step, the F in the formula 1 () function, and its computational methods are following,

Formula 2:

F (θ) = [{&Integral;}_{0}^{dθ} y^{2} \cdot p (y) dy + Σ_{k = 0}^{\infty} {&Integral;}_{(k + d) \cdot θ}^{(k + d + 1) \cdot θ} (c (y) \cdot {| y - (k + d + ω) \cdot θ |}^{2} + (1 - c (y)) \cdot y^{2}) \cdot p (y) \cdot dy];

B.3 calculate distortion factor diffusion coefficient and the summation that all diffusion units are influenced by people's face ROI coding unit on people's face ROI coding unit time domain diffusion chain and obtain total distortion degree diffusion coefficient, represent as follows based on the distortion factor diffusion coefficient computational methods of experiment derivation gained,

Formula 3:

β_{t} = \frac{D_{t}}{D_{t - 1} + D_{t}^{MCP}}

The forward prediction difference of representing current diffusion unit;

B.4 upgrade Lagrangian coefficient,

(1) the actual coding mode of statistics people face ROI coding unit comprises in SKIP, DIRECT, the frame, interframe, motion compensated prediction distortion value and rebuild distortion value; The corresponding people's face ROI coding unit of motion compensated prediction distortion value and its absolute difference average between video coding motion search respective coding unit are rebuild distortion value then corresponding people's face ROI coding unit and its absolute difference average between the reconstruction unit behind the video coding;

(2) comply with behind the forward direction spatial order from the top down; If when forefathers' face ROI coding unit is last people's face of present frame ROI coding unit; Calculate in all encoded GOP and current GOP people's face ROI coding unit percentage, the mean motion compensation prediction distortion value of people's face ROI coding unit and the average reconstruction distortion value of people's face ROI coding unit of encoding with mode in the frame in the coded frame; Otherwise, skip to (3);

(3) adjust Lagrangian coefficient, adjust formula accordingly and do,

Formula 4:

λ_{New} = \frac{λ_{Old}}{η \cdot (1 - α \times (1 - γ) \cdot \overset{&OverBar;}{D} / {\overset{&OverBar;}{D}}_{MCP})}

The mean motion compensation prediction distortion value of behaviour face ROI coding unit, The average reconstruction distortion value of behaviour face ROI coding unit, α is a constant value, optional scope be [0.88,1.0);

B.5 based on the Lagrangian coefficient that has upgraded in B.4, call the lagrangian optimization method people's face ROI coding unit is carried out RDO;

For non-face ROI coding unit,

B.6 if the Lagrangian coefficient in the current existence B.4 is η λ with its product with corresponding total distortion degree diffusion coefficient _NewThe Lagrangian coefficient that substitutes conventional RD O carries out the RDO coding of non-face ROI coding unit; Otherwise, carry out the optimization of non-face ROI coding unit by conventional RD O and corresponding Lagrangian coefficient and encode.