CN105590327A - Motion estimation method and apparatus - Google Patents

Motion estimation method and apparatus

Info

Publication number
CN105590327A
CN105590327A (application CN201410578173.1A)
Authority
CN
China
Prior art keywords
motion
segmentation
frame
model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201410578173.1A
Other languages
Chinese (zh)
Inventor
陈子冲
刘余钱
章国锋
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huawei Technologies Co Ltd
Original Assignee
Huawei Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huawei Technologies Co Ltd filed Critical Huawei Technologies Co Ltd
Priority to CN201410578173.1A priority Critical patent/CN105590327A/en
Publication of CN105590327A publication Critical patent/CN105590327A/en
Pending legal-status Critical Current

Landscapes

  • Image Analysis (AREA)

Abstract

The invention, which belongs to the technical field of digital signal processing, discloses a motion estimation method and apparatus. The method comprises: obtaining an initial optical flow of each frame of an input image sequence; dividing each frame into at least one segment according to color information; obtaining initial motion models of all segments in each frame; and performing temporal matching of the motion segmentation on the initial motion models of all segments in each frame, so that a same object in the image sequence has the same label in every frame of the sequence; and obtaining the motion segmentation and an output optical flow according to the temporal matching result. The invention can process motion segmentation and motion estimation over long image sequences and obtain motion segmentation and motion estimation results with good spatio-temporal consistency, thereby effectively improving the spatio-temporal consistency of the motion segmentation.

Description

Motion estimation method and apparatus
Technical field
The present invention relates to the field of digital signal processing, and in particular to a motion estimation method and apparatus.
Background art
Optical flow estimation, i.e. motion estimation in a video image sequence, is an important method in moving-image analysis and has wide application in computer vision, for example in object tracking, video interpolation and three-dimensional reconstruction. Optical flow is the apparent motion of the image brightness pattern; it expresses the changes in the image, and because it carries information about the target's motion, an observer can use it to determine how the target moves. Existing optical flow estimation methods usually analyse the optical flow field, the set of all optical flows. Specifically, the optical flow field is the two-dimensional (2D) instantaneous velocity field formed by all pixels of an image, where each two-dimensional velocity vector is the projection onto the image plane of the three-dimensional velocity vector of a visible point in the scene. Optical flow therefore contains not only the motion information of the observed object but also rich information about its three-dimensional structure. In recent years, methods that improve motion estimation accuracy by performing motion segmentation based on the optical flow field have gradually attracted attention in academia. How to further improve the accuracy of optical flow estimation, and to obtain accurate motion segmentation while estimating object motion, has gradually become a mainstream direction in the development of motion estimation techniques.
A comparatively common existing motion estimation method proposes a unified variational model for joint motion estimation and segmentation with explicit occlusion handling. It represents the optical flow field with multiple labels, each label denoting one parameterised motion pattern and, at the same time, one motion segment.
Although existing motion estimation methods can perform motion segmentation and optical flow estimation over multiple frames and obtain fairly accurate motion segments, the space-time cost usually restricts them to processing only a few frames at a time, so the spatio-temporal consistency of the motion segmentation is hard to guarantee. Here, the spatio-temporal consistency of motion segmentation means that the motion estimation and segmentation of the video images are consistent in the temporal domain and continuous in the spatial domain.
Summary of the invention
To improve the spatio-temporal consistency of motion segmentation, embodiments of the present invention provide a motion estimation method and apparatus. The technical solutions are as follows:
In a first aspect, a motion estimation method is provided, the method comprising:
obtaining an initial optical flow of each frame of an input image sequence;
dividing each frame into at least one segment according to color information;
obtaining initial motion models of all segments in each frame;
performing temporal matching of the motion segmentation on the initial motion models of all segments in each frame, so that a same object in the image sequence has the same label in every frame of the image sequence; and
obtaining the motion segmentation and an output optical flow according to the result of the temporal matching.
In a first possible implementation of the first aspect, before the temporal matching of the motion segmentation is performed on the initial motion models of all segments of each frame, the method further comprises:
performing cluster optimization on initial motion models that share the same motion form, according to the initial motion models of all segments in each frame; and/or
merging segments with consistent motion in each frame.
With reference to the first aspect or its first possible implementation, in a second possible implementation, performing the temporal matching of the motion segmentation on the initial motion models of all segments in each frame comprises:
taking the first frame of the input image sequence as the initial matching template, treating each of its segments as one data section, and assigning each data section a unique label;
for a segment s_t^k in frame t, tracking, according to the optical flow from frame t to frame t-1, the corresponding point x_{t-1} in frame t-1 of each pixel x_t in s_t^k, where the segment containing x_{t-1} in frame t-1 is denoted s_{t-1}^{k'}; the set {s_{t-1}^{k'}} so formed represents all segments in frame t-1 that may correspond to s_t^k;
for each segment s_{t-1}^{k'} in {s_{t-1}^{k'}}, computing the matching rate of s_t^k to s_{t-1}^{k'}:
r_{t→t-1}^{k→k'} = |{x_t : x_t ∈ s_t^k, x_{t-1} ∈ s_{t-1}^{k'}}| / |s_{t-1}^{k'}|
where the numerator is the number of pixels of s_t^k whose corresponding points fall in s_{t-1}^{k'}, and the denominator is the number of points in s_{t-1}^{k'};
computing the matching rate of s_{t-1}^{k'} to s_t^k:
r_{t-1→t}^{k'→k} = c / |s_t^k|
where, for each point x_{t-1} in s_{t-1}^{k'}, its corresponding point x_t in frame t is obtained from the optical flow from frame t-1 to frame t; c is the number of such points x_t that belong to s_t^k, and |s_t^k| is the number of points in s_t^k;
computing the final matching rate of s_t^k and s_{t-1}^{k'}:
r_{t,t-1}^{k,k'} = min(r_{t→t-1}^{k→k'}, r_{t-1→t}^{k'→k});
when r_{t,t-1}^{k,k'} > σ_m, determining that s_t^k matches s_{t-1}^{k'}, updating the label of s_t^k to the label of s_{t-1}^{k'}, and adding s_t^k to the data section containing s_{t-1}^{k'}, where σ_m is a preset matching threshold;
when s_t^k has no matching segment in frame t-1, matching s_t^k against the segments of frame t-2; and
ending the matching when the number of matching attempts exceeds a preset matching step length, assigning a new label to any segment s_t^k for which no match exists, and adding s_t^k to a new data section.
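The bidirectional matching rate described above can be sketched in code. The following is an illustrative sketch only (function and variable names are assumptions, not from the patent), assuming the per-frame segmentations are integer label maps and the flows are dense (H, W, 2) displacement fields in (dx, dy) order.

```python
import numpy as np

def match_rate(seg_t, seg_prev, flow_t_to_prev, flow_prev_to_t, k, k_prime):
    """Final matching rate min(r_fwd, r_bwd) between segment k of frame t
    and segment k' of frame t-1 (illustrative sketch; names are assumptions)."""
    h, w = seg_t.shape
    # forward rate: pixels of segment k, followed through the t -> t-1 flow
    ys, xs = np.nonzero(seg_t == k)
    xp = np.clip((xs + flow_t_to_prev[ys, xs, 0]).round().astype(int), 0, w - 1)
    yp = np.clip((ys + flow_t_to_prev[ys, xs, 1]).round().astype(int), 0, h - 1)
    hits = np.count_nonzero(seg_prev[yp, xp] == k_prime)
    r_fwd = hits / max(np.count_nonzero(seg_prev == k_prime), 1)

    # backward rate: pixels of segment k', followed through the t-1 -> t flow
    ys2, xs2 = np.nonzero(seg_prev == k_prime)
    xq = np.clip((xs2 + flow_prev_to_t[ys2, xs2, 0]).round().astype(int), 0, w - 1)
    yq = np.clip((ys2 + flow_prev_to_t[ys2, xs2, 1]).round().astype(int), 0, h - 1)
    c = np.count_nonzero(seg_t[yq, xq] == k)
    r_bwd = c / max(np.count_nonzero(seg_t == k), 1)

    return min(r_fwd, r_bwd)
```

A segment would then inherit the label of k' whenever this rate exceeds the preset matching threshold σ_m.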
With reference to the second possible implementation of the first aspect, in a third possible implementation, obtaining the motion segmentation and the output optical flow according to the result of the temporal matching comprises: performing, according to the result of the temporal matching, spatio-temporal consistency optimization of the motion segmentation and spatio-temporal consistency optimization of the motion estimation respectively, to obtain a spatio-temporally consistent motion segmentation and output optical flow.
With reference to the third possible implementation of the first aspect, in a fourth possible implementation, the spatio-temporal consistency optimization of the motion segmentation comprises:
keeping the segment labels of each frame fixed;
tracking, according to the optical flow, the corresponding points of each pixel x_t of frame t in the N_t neighbouring frames, to obtain a candidate label set P(x_t);
obtaining from the candidate label set P(x_t) the label that occurs most often;
when C_s < N_t/2 or C_{Pt} < σ_{Pt}, where C_s is the number of segments contained in each data section, C_{Pt} is the number of all segmentation points in the data section and σ_{Pt} is a preset threshold, selecting for each segment in the data section, according to the labels of the corresponding points in other frames of the points in the data section, a best-matching data section and forcing a merge; and
updating the labels of the remaining data sections.
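The candidate-label voting step above can be sketched as follows. This is a minimal sketch under stated assumptions (the function name and the None fall-back are not from the patent): the labels of a pixel's tracked correspondences in the N_t neighbouring frames are collected, and the majority label wins only when it accounts for at least N_t/2 of the votes; otherwise the text falls back to a forced merge.

```python
from collections import Counter

def vote_label(candidate_labels, n_frames):
    """Majority vote over a pixel's candidate label set P(x_t), collected by
    tracking x_t through n_frames neighbouring frames (illustrative sketch).
    Returns the winning label, or None when no label reaches n_frames/2,
    i.e. the case handled by the forced merge in the text."""
    if not candidate_labels:
        return None
    label, votes = Counter(candidate_labels).most_common(1)[0]
    return label if votes >= n_frames / 2 else None
```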
With reference to the third possible implementation of the first aspect, in a fifth possible implementation, the spatio-temporal consistency optimization of the motion estimation comprises:
obtaining, according to the current motion, the affine transformation model of the segment corresponding to the data section represented by each label l in each frame;
computing, for frame t, the optical flow according to the affine transformation model, so that the sum of the Euclidean distances in color space between each pixel x_t of frame t and its corresponding points x_{t'} in the other frames t' is minimized;
updating the optical flow of frame t;
performing occlusion area detection; and
explicitly correcting the motion of the occlusion areas.
With reference to the first aspect or its first possible implementation, in a sixth possible implementation, the initial motion model comprises an affine model;
and obtaining the initial motion models of all segments in each frame comprises:
computing the parameters of an affine transformation to obtain an affine model fitting the initial motion;
obtaining the affine motion field of each frame according to the affine model;
fusing the affine motion field with the initial optical flow field;
performing occlusion area detection according to the consistency of the motion; and
explicitly correcting the affine motion of each occlusion area according to the segment in which it lies.
With reference to the sixth possible implementation of the first aspect, in a seventh possible implementation, performing cluster optimization on initial motion models sharing the same motion form, according to the initial motion models of all segments in each frame, comprises:
sorting all segments of a frame by size and screening their affine models in descending order of area; and
when the area of a segment s_i is greater than a preset area Size_s, or the number of affine models in the affine model set {A} is less than a preset number Num_A, comparing the affine model A_i of s_i with {A}; if {A} contains a model A' whose Euclidean distance to A_i is less than a preset distance Th_A, discarding the current model A_i, and otherwise adding A_i to {A}, finally obtaining a minimal affine model set {A}.
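The greedy screening above can be sketched in code. This is an illustrative sketch only (function and parameter names are assumptions): segments are visited in descending order of area, a segment's model is considered when the segment is large enough or the kept set is still small, and it is kept only if it lies at least Th_A away (Euclidean, in 6-D parameter space) from every model already kept.

```python
import numpy as np

def screen_models(areas_desc, models_desc, min_area, max_models, min_dist):
    """Greedy screening of per-segment affine models, largest segments first
    (illustrative sketch; names are assumptions). Returns the minimal model
    set {A} with near-duplicate models discarded."""
    kept = []
    for area, model in zip(areas_desc, models_desc):
        if area <= min_area and len(kept) >= max_models:
            continue                      # small segment and enough models: skip
        A = np.asarray(model, dtype=float)
        if any(np.linalg.norm(A - B) < min_dist for B in kept):
            continue                      # near-duplicate of a kept model: discard
        kept.append(A)
    return kept
```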
With reference to the sixth possible implementation of the first aspect, in an eighth possible implementation, merging segments with consistent motion in each frame comprises:
for each segment s, computing the sum of the color differences of all pixels in the segment under the segment's current affine model;
computing the candidate neighbouring segment set {S_Nei} of segment s, where each candidate neighbour s' satisfies 2|s'| > |s|;
for each neighbouring segment s' in {S_Nei}, computing the motion consistency S_{s,s'} on the boundary between s and s', the consistency F_{s,s'} of the current motion of the pixels in s with the affine model of s', and the ratio C_{s,s'} of pixels in s whose color difference under the affine model of s' is smaller than under s's own affine model;
selecting from {S_Nei} the neighbouring segments that satisfy S_{s,s'} > 0.5, F_{s,s'} > 0.5 and C_{s,s'} > 0.5, taking the neighbour with the largest C_{s,s'} as the merge candidate, denoted s*, and merging s into s* when C_{s,s*} > θ_w, where θ_w is a preset ratio; and
merging every segment s in this way; if the total number of merged segments finally exceeds half of the initial number of segments, re-estimating the affine models.
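The merge decision above reduces to a few threshold tests, which can be sketched as follows. This is a minimal sketch under stated assumptions: the function name is not from the patent, and the default θ_w is an assumed placeholder since the patent does not give a value.

```python
def should_merge(S, F, C, theta_w=0.6):
    """Decide whether segment s merges into its best-matching neighbour s*
    (illustrative sketch). S, F, C are the boundary motion consistency,
    the affine-model consistency and the colour-difference ratio of the
    pair; all three must exceed 0.5 and C must exceed the preset ratio
    theta_w (default here is an assumption, not a patent value)."""
    return S > 0.5 and F > 0.5 and C > 0.5 and C > theta_w
```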
In a ninth possible implementation of the first aspect, the initial optical flow comprises a forward optical flow and/or a backward optical flow.
In a second aspect, a motion estimation apparatus is provided, the apparatus comprising:
an acquiring unit, configured to obtain an initial optical flow of each frame of an input image sequence;
a segmentation unit, configured to divide each frame into at least one segment according to color information;
a motion model estimation unit, configured to obtain the initial motion models of all segments in each frame;
a temporal matching unit, configured to perform temporal matching of the motion segmentation on the initial motion models of all segments in each frame, so that a same object in the image sequence has the same label in every frame of the image sequence; and
a processing unit, configured to obtain the motion segmentation and an output optical flow according to the result of the temporal matching.
In a first possible implementation of the second aspect, the apparatus further comprises:
a clustering unit, configured to perform cluster optimization on initial motion models sharing the same motion form, according to the initial motion models of all segments in each frame; and/or
a merging unit, configured to merge segments with consistent motion in each frame.
With reference to the second aspect or its first possible implementation, in a second possible implementation, the temporal matching unit is specifically configured to:
take the first frame of the input image sequence as the initial matching template, treat each of its segments as one data section, and assign each data section a unique label;
for a segment s_t^k in frame t, track, according to the optical flow from frame t to frame t-1, the corresponding point x_{t-1} in frame t-1 of each pixel x_t in s_t^k, where the segment containing x_{t-1} in frame t-1 is denoted s_{t-1}^{k'}, and the set {s_{t-1}^{k'}} so formed represents all segments in frame t-1 that may correspond to s_t^k;
for each segment s_{t-1}^{k'} in {s_{t-1}^{k'}}, compute the matching rate of s_t^k to s_{t-1}^{k'}:
r_{t→t-1}^{k→k'} = |{x_t : x_t ∈ s_t^k, x_{t-1} ∈ s_{t-1}^{k'}}| / |s_{t-1}^{k'}|
where the numerator is the number of pixels of s_t^k whose corresponding points fall in s_{t-1}^{k'}, and the denominator is the number of points in s_{t-1}^{k'};
compute the matching rate of s_{t-1}^{k'} to s_t^k:
r_{t-1→t}^{k'→k} = c / |s_t^k|
where, for each point x_{t-1} in s_{t-1}^{k'}, its corresponding point x_t in frame t is obtained from the optical flow from frame t-1 to frame t, c is the number of such points x_t that belong to s_t^k, and |s_t^k| is the number of points in s_t^k;
compute the final matching rate of s_t^k and s_{t-1}^{k'}:
r_{t,t-1}^{k,k'} = min(r_{t→t-1}^{k→k'}, r_{t-1→t}^{k'→k});
when r_{t,t-1}^{k,k'} > σ_m, determine that s_t^k matches s_{t-1}^{k'}, update the label of s_t^k to the label of s_{t-1}^{k'}, and add s_t^k to the data section containing s_{t-1}^{k'}, where σ_m is a preset matching threshold;
when s_t^k has no matching segment in frame t-1, match s_t^k against the segments of frame t-2; and
end the matching when the number of matching attempts exceeds a preset matching step length, assign a new label to any segment s_t^k for which no match exists, and add s_t^k to a new data section.
With reference to the second possible implementation of the second aspect, in a third possible implementation, the processing unit comprises a motion segmentation optimization module and a motion estimation optimization module;
the motion segmentation optimization module is configured to perform spatio-temporal consistency optimization of the motion segmentation according to the result of the temporal matching; and
the motion estimation optimization module is configured to perform spatio-temporal consistency optimization of the motion estimation according to the result of the temporal matching.
With reference to the third possible implementation of the second aspect, in a fourth possible implementation, the motion segmentation optimization module is specifically configured to:
keep the segment labels of each frame fixed;
track, according to the optical flow, the corresponding points of each pixel x_t of frame t in the N_t neighbouring frames, to obtain a candidate label set P(x_t);
obtain from the candidate label set P(x_t) the label that occurs most often;
when C_s < N_t/2 or C_{Pt} < σ_{Pt}, where C_s is the number of segments contained in each data section, C_{Pt} is the number of all segmentation points in the data section and σ_{Pt} is a preset threshold, select for each segment in the data section, according to the labels of the corresponding points in other frames of the points in the data section, a best-matching data section and force a merge; and
update the labels of the remaining data sections.
With reference to the third possible implementation of the second aspect, in a fifth possible implementation, the motion estimation optimization module is specifically configured to:
obtain, according to the current motion, the affine transformation model of the segment corresponding to the data section represented by each label l in each frame;
for frame t, compute the optical flow according to the affine transformation model, so that the sum of the Euclidean distances in color space between each pixel x_t of frame t and its corresponding points x_{t'} in the other frames t' is minimized;
update the optical flow of frame t;
perform occlusion area detection; and
explicitly correct the motion of the occlusion areas.
With reference to the second aspect or its first possible implementation, in a sixth possible implementation, the initial motion model comprises an affine model;
and the motion model estimation unit is specifically configured to:
compute the parameters of an affine transformation to obtain an affine model fitting the initial motion;
obtain the affine motion field of each frame according to the affine model;
fuse the affine motion field with the initial optical flow field;
perform occlusion area detection according to the consistency of the motion; and
explicitly correct the affine motion of each occlusion area according to the segment in which it lies.
With reference to the sixth possible implementation of the second aspect, in a seventh possible implementation, the clustering unit is specifically configured to:
sort all segments of a frame by size and screen their affine models in descending order of area; and
when the area of a segment s_i is greater than a preset area Size_s, or the number of affine models in the affine model set {A} is less than a preset number Num_A, compare the affine model A_i of s_i with {A}; if {A} contains a model A' whose Euclidean distance to A_i is less than a preset distance Th_A, discard the current model A_i, and otherwise add A_i to {A}, finally obtaining a minimal affine model set {A}.
With reference to the sixth possible implementation of the second aspect, in an eighth possible implementation, the merging unit is specifically configured to:
for each segment s, compute the sum of the color differences of all pixels in the segment under the segment's current affine model;
compute the candidate neighbouring segment set {S_Nei} of segment s, where each candidate neighbour s' satisfies 2|s'| > |s|;
for each neighbouring segment s' in {S_Nei}, compute the motion consistency S_{s,s'} on the boundary between s and s', the consistency F_{s,s'} of the current motion of the pixels in s with the affine model of s', and the ratio C_{s,s'} of pixels in s whose color difference under the affine model of s' is smaller than under s's own affine model;
select from {S_Nei} the neighbouring segments that satisfy S_{s,s'} > 0.5, F_{s,s'} > 0.5 and C_{s,s'} > 0.5, take the neighbour with the largest C_{s,s'} as the merge candidate, denoted s*, and merge s into s* when C_{s,s*} > θ_w, where θ_w is a preset ratio; and
merge every segment s in this way; if the total number of merged segments finally exceeds half of the initial number of segments, re-estimate the affine models.
In a ninth possible implementation of the second aspect, the initial optical flow comprises a forward optical flow and/or a backward optical flow.
The technical solutions provided by the embodiments of the present invention bring the following beneficial effects:
The invention provides a motion estimation method and apparatus in which temporal matching of the motion segmentation is performed on the initial motion models of all segments of every frame in an image sequence, so that a same object has the same label in every frame of the sequence, and the motion segmentation and output optical flow are obtained according to the result of the temporal matching. Such a motion estimation method can process motion segmentation and motion estimation over longer image sequences and obtain motion segmentation and motion estimation results with good spatio-temporal consistency, thereby effectively improving the spatio-temporal consistency of the motion segmentation.
Brief description of the drawings
To illustrate the technical solutions in the embodiments of the present invention more clearly, the accompanying drawings needed for describing the embodiments are briefly introduced below. Apparently, the drawings described below are only some embodiments of the present invention, and a person of ordinary skill in the art may derive other drawings from them without creative effort.
Fig. 1 is a flowchart of a motion estimation method according to an embodiment of the present invention;
Fig. 2 is a schematic diagram of an original image sequence according to an embodiment of the present invention;
Fig. 3 is a schematic diagram of the initial optical flow of the image sequence according to an embodiment of the present invention;
Fig. 4 is a schematic diagram of the image segmentation of the image sequence according to an embodiment of the present invention;
Fig. 5 is a flowchart of the specific steps of obtaining the initial motion models of all segments in each frame according to an embodiment of the present invention;
Fig. 6 is a schematic diagram of the optical flow of the image sequence after affine model parameter estimation and fusion with the initial optical flow according to an embodiment of the present invention;
Fig. 7 is a schematic diagram of occlusions in the image sequence according to an embodiment of the present invention;
Fig. 8 is a flowchart of another motion estimation method according to an embodiment of the present invention;
Fig. 9 is a flowchart of the specific steps of the cluster optimization according to an embodiment of the present invention;
Fig. 10 is a flowchart of the specific steps of the segment merging according to an embodiment of the present invention;
Fig. 11 is a schematic diagram of the image sequence after the merging of motion-consistent segments is completed according to an embodiment of the present invention;
Fig. 12 is a flowchart of the specific steps of the temporal matching of the motion segmentation according to an embodiment of the present invention;
Fig. 13 is a flowchart of the specific steps of the spatio-temporal consistency optimization of the motion segmentation according to an embodiment of the present invention;
Fig. 14 is a flowchart of the specific steps of the spatio-temporal consistency optimization of the motion estimation according to an embodiment of the present invention;
Fig. 15 is a schematic structural diagram of a motion estimation apparatus according to an embodiment of the present invention;
Fig. 16 is a schematic structural diagram of another motion estimation apparatus according to an embodiment of the present invention.
Detailed description of the invention
To make the objectives, technical solutions and advantages of the present invention clearer, the embodiments of the present invention are described in further detail below with reference to the accompanying drawings.
The motion estimation method provided by the embodiments of the present invention can be used for motion estimation of video image sequences, and the obtained motion estimation results can be used to realise various applications such as target object segmentation, recognition, tracking and shape information recovery. Of course, the above is only illustrative; the results obtained by motion estimation of video image sequences can be widely used in various computer vision fields, and the embodiments of the present invention impose no limitation on this.
As shown in Fig. 1, the motion estimation method provided by an embodiment of the present invention comprises the following steps:
Step 101: a motion estimation apparatus obtains an initial optical flow of each frame of an input image sequence.
Step 102: the motion estimation apparatus divides each frame into at least one segment according to color information.
Step 103: the motion estimation apparatus obtains the initial motion models of all segments in each frame.
Step 106: the motion estimation apparatus performs temporal matching of the motion segmentation on the initial motion models of all segments in each frame, so that a same object in the image sequence has the same label in every frame of the image sequence.
Step 107: the motion estimation apparatus obtains the motion segmentation and an output optical flow according to the result of the temporal matching.
The invention thus provides a motion estimation method in which temporal matching of the motion segmentation is performed on the initial motion models of all segments of every frame in an image sequence, so that a same object has the same label in every frame of the sequence, and the motion segmentation and output optical flow are obtained according to the result of the temporal matching. Such a motion estimation method can process motion segmentation and motion estimation over longer image sequences and obtain motion segmentation and motion estimation results with good spatio-temporal consistency, thereby effectively improving the spatio-temporal consistency of the motion segmentation.
Specifically, in the actual motion estimation process, the initial optical flow of each frame of the input image sequence described in step 101 can be obtained by applying any existing optical flow estimation method to the images of the sequence. For example, motion estimation can be performed on every two adjacent frames of the image sequence; since the subsequent spatio-temporal consistency optimization requires tracking between adjacent frames, forward flow and/or backward flow estimation can be performed at the same time. Taking the image sequence shown in Fig. 2 as an example, the initial forward optical flow can be as shown in Fig. 3.
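The pairwise forward/backward flow computation described above can be sketched generically. This is an illustrative sketch only: the wrapper and its names are assumptions, and the pairwise flow estimator itself (any existing method, as the text says) is passed in as a callable rather than implemented here.

```python
import numpy as np

def bidirectional_flow(frames, flow_fn):
    """Compute forward (t -> t+1) and backward (t+1 -> t) flow for every
    adjacent pair of `frames`, using an arbitrary pairwise flow estimator
    flow_fn(img_a, img_b) (illustrative sketch; the estimator is assumed
    to be supplied by an existing optical flow method)."""
    forward, backward = [], []
    for a, b in zip(frames[:-1], frames[1:]):
        forward.append(flow_fn(a, b))    # flow from frame t to frame t+1
        backward.append(flow_fn(b, a))   # flow from frame t+1 to frame t
    return forward, backward
```

With N frames this yields N-1 forward and N-1 backward flow fields, which is exactly what the adjacent-frame tracking in the later consistency optimization consumes.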
The process of dividing each frame into at least one segment according to color information described in step 102 can likewise use any of various existing image segmentation methods, and the embodiments of the present invention impose no limitation on this. Because this step segments on the basis of color information, the obtained segmentation can stay consistent with the real boundaries of objects. Again taking the image sequence shown in Fig. 2 as an example, the color-based segmentation of each frame can be as shown in Fig. 4. Fig. 4 is the segmentation obtained with a conventional image segmentation method: in the figure, the image inside each segment has uniform color information, and the segment each pixel belongs to can be marked with one label of the set {s_i}.
In step 103, the motion of the pixels within each segment can be assumed to be consistent and to satisfy a certain motion form, i.e. the pixels within each segment have motion consistency. Specifically, in the embodiments of the present invention, the motion form of each segment is assumed to be an affine transformation: each segment is regarded as one motion plane, and the initial motion model of the points in a segment is an affine model. The specific steps of obtaining the initial motion models of all segments in each frame, as shown in Fig. 5, comprise steps 1031 to 1035:
Step 1031: compute the parameters of the affine transformation to obtain an affine model fitting the initial motion.
For example, assume that a pixel X in frame t has homogeneous coordinates X = (x, y, 1) in the image coordinate system, and express the affine transformation of pixel X from frame t to frame t' as
A = | a1  a2  b1 |
    | a3  a4  b2 |
    |  0   0   1 |
The position in frame t' of pixel X of frame t is X' = AX; writing X' = (x', y', 1), we have
x' = a1·x + a2·y + b1
y' = a3·x + a4·y + b2
Denote the motion from X to X' as w = (u, v), with u = x' - x and v = y' - y, that is:
u = a1·x + a2·y + b1 - x
v = a3·x + a4·y + b2 - y
According to the initially computed optical flow (u, v), a system of linear equations can be set up:
a1·x + a2·y + b1 = u + x
a3·x + a4·y + b2 = v + y
A contains 6 variables, but since the horizontal and vertical motions are independent, 3 points suffice to estimate A. To reduce the estimation error, however, the number of points actually sampled is greater than 3. With n data points, the system of linear equations is as follows:
a 1 · x 1 + a 2 · y 1 + b 1 = u 1 + x 1 a 3 · x 1 + a 4 · y 1 + b 2 = v 1 + y 1 a 1 · x 2 + a 2 · y 2 + b 1 = u 2 + x 2 a 3 · x 2 + a 4 · y 2 + b 2 = v 2 + y 2 . . . a 1 · x n + a 2 · y n + b 1 = u n + x n a 3 · x n + a 4 · y n + b 2 = v n + y n
Wherein, n represents the number of the sample point for carrying out parameter fitting, this system of linear equations solve employingLeast square method is carried out.
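As a concrete illustration, the least-squares fit of the affine parameters described above can be sketched in Python with NumPy; the function name and array layout are illustrative choices, not part of the patent:

```python
import numpy as np

def fit_affine_lsq(pts, flow):
    """Fit affine parameters (a1, a2, b1, a3, a4, b2) from n >= 3 points.

    pts  : (n, 2) array of pixel coordinates (x, y) in frame t
    flow : (n, 2) array of initial optical flow (u, v) at those points
    Solves a1*x + a2*y + b1 = u + x and a3*x + a4*y + b2 = v + y
    in the least-squares sense, as in the linear system above.
    """
    x, y = pts[:, 0], pts[:, 1]
    M = np.column_stack([x, y, np.ones_like(x)])  # shared design matrix
    # Horizontal and vertical motions are independent: two separate solves.
    p_h, *_ = np.linalg.lstsq(M, flow[:, 0] + x, rcond=None)  # a1, a2, b1
    p_v, *_ = np.linalg.lstsq(M, flow[:, 1] + y, rcond=None)  # a3, a4, b2
    return np.vstack([p_h, p_v, [0.0, 0.0, 1.0]])
```

Because the two rows of A act on independent equations, solving two 3-parameter systems is equivalent to (and cheaper than) one 6-parameter solve.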
It should be noted, however, that using all the points inside a segment for parameter fitting increases the time cost without noticeably improving the accuracy of the affine transformation. Therefore the points inside a segment can first be sampled uniformly to obtain a data point set, and the affine model is then estimated by Random Sample Consensus (RANSAC).
Specifically, when the initial motion is relatively accurate, only a small amount of data is needed to estimate the affine model accurately; when the initial motion is poor, more points must be chosen for the estimation. The size of the data point set is therefore related to the accuracy of the initial optical flow, and the accuracy of the initial motion can be estimated to determine the sampling rate of the data point set. First, for each point x_t in the current frame t, the color difference d between x_t and its corresponding point x_{t'} in the adjacent frame t' (obtained from the initial optical flow) is computed; then the average color difference avg_d over all points in the current frame is computed, and the number of points whose color difference is less than avg_d is counted, giving the accuracy rate r of the initial optical flow as the fraction of such points. When r > 0.75, the initial optical flow is considered relatively accurate and the sampling rate is taken as 0.25, i.e., one of every four pixels is chosen as a data point; when 0.75 > r > 0.5, the sampling rate is taken as 0.5; when r < 0.5, the sampling rate is 1.0.
After the sampling rate is determined, for each pixel that may be chosen as a data point, the color difference d at that pixel is compared with avg_d to decide whether the pixel should be chosen as a data point. If its d is greater than a preset threshold, the pixel is excluded; otherwise it is added to the data point set. The preset threshold can be chosen according to actual needs; when the initial motion at a pixel is inaccurate, or the pixel is occluded, its d value is greater than this preset threshold.
The data points sampled inside a segment are assigned to that segment, yielding for each segment S_i a data point set DS_i; next, the affine model of each segment S_i is estimated by the RANSAC method.
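The per-segment RANSAC estimation over the sampled data points can be sketched as follows; the iteration count, inlier tolerance and helper name are illustrative assumptions, not values prescribed by the text:

```python
import numpy as np

def _affine_lsq(pts, flow):
    # Least-squares solve of a1*x+a2*y+b1 = u+x and a3*x+a4*y+b2 = v+y.
    M = np.column_stack([pts, np.ones(len(pts))])
    ph, *_ = np.linalg.lstsq(M, flow[:, 0] + pts[:, 0], rcond=None)
    pv, *_ = np.linalg.lstsq(M, flow[:, 1] + pts[:, 1], rcond=None)
    return np.vstack([ph, pv, [0.0, 0.0, 1.0]])

def ransac_affine(pts, flow, n_iters=200, inlier_tol=1.0, seed=0):
    """Estimate a segment's affine model from its data point set by RANSAC:
    fit on 3 random points, keep the hypothesis with the most inliers,
    then refit on all inliers by least squares."""
    rng = np.random.default_rng(seed)
    n = len(pts)
    h = np.column_stack([pts, np.ones(n)])
    best = np.zeros(n, dtype=bool)
    for _ in range(n_iters):
        idx = rng.choice(n, size=3, replace=False)
        A = _affine_lsq(pts[idx], flow[idx])
        pred = (A @ h.T).T[:, :2] - pts      # predicted motion w = A x - x
        inliers = np.linalg.norm(pred - flow, axis=1) < inlier_tol
        if inliers.sum() > best.sum():
            best = inliers
    return _affine_lsq(pts[best], flow[best])
```

The minimal sample size of 3 follows from the observation above that 3 non-collinear points determine the 6 affine parameters.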
Step 1032: obtain the affine motion field of each frame from the affine models.

Step 1033: fuse the affine motion field with the initial optical flow field.
For example, the affine motion field and the initial optical flow field can be fused with the Quadratic Pseudo-Boolean Optimization (QPBO) algorithm. Fig. 6 shows an example of the optical flow after affine model parameter estimation and fusion; after fusing with the affine motion, the over-smoothing of the initial optical flow at the boundaries of moving objects can be effectively alleviated.
Step 1034: detect occlusion regions according to motion consistency.

Step 1035: explicitly rectify the motion of the occlusion regions according to the affine motion of the segments they belong to.
In optical flow estimation, when a target object in the image sequence is occluded, tracking of that object fails; occlusion is a relatively thorny problem. Most existing optical flow methods handle occlusion regions implicitly with robust functions, and the results obtained in this way are not good enough. In the embodiment of the present invention, after occlusion is detected according to motion consistency, the motion of the occlusion regions can be explicitly rectified according to the affine motion of the segments they belong to.
For example, for a pixel x_t in frame t, its corresponding point in the next frame can be obtained from the motion of x_t as x' = x_t + v(x_t). For the point x', following the motion from its frame back to the current frame gives the corresponding point x_{t+1→t} = x' + v(x'). If ||x_t − x_{t+1→t}|| > θ_f and s(x_t) ≠ s(x_{t+1→t}), where θ_f is a threshold representing the maximum allowed motion deviation (e.g., set to 1 pixel), then x_t is considered occluded in the next frame, denoted O_{t+1→t}(x_t) = 1; otherwise O_{t+1→t}(x_t) = 0, meaning x_t is not occluded. Similarly, if x_t is occluded in the previous frame, there is correspondingly O_{t→t−1}(x_t) = 1.
The occlusion map O obtained by detection is shown in Fig. 7, where the white regions represent the occluded parts. Occlusion is then handled explicitly according to the detected occlusion map O: if x_t is occluded, i.e., O(x_t) = 1, set v(x_t) = A x_t − x_t, using the affine motion as the motion of x_t.
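The forward-backward consistency check described above can be sketched on dense flow fields as follows; the rounding to nearest-pixel lookups and the array layout are simplifying assumptions for illustration:

```python
import numpy as np

def detect_occlusion(flow_fwd, flow_bwd, seg_t, theta_f=1.0):
    """Mark a pixel occluded when its forward-backward round trip deviates
    by more than theta_f pixels AND lands in a different segment.

    flow_fwd : (H, W, 2) flow (u, v) from frame t to t+1, i.e. v(x_t)
    flow_bwd : (H, W, 2) flow from frame t+1 back toward frame t, v(x')
    seg_t    : (H, W) integer segment labels of frame t
    """
    H, W = seg_t.shape
    ys, xs = np.mgrid[0:H, 0:W]
    # x' = x_t + v(x_t), rounded to the nearest pixel for the lookup
    xp = np.clip(np.rint(xs + flow_fwd[..., 0]).astype(int), 0, W - 1)
    yp = np.clip(np.rint(ys + flow_fwd[..., 1]).astype(int), 0, H - 1)
    # x_{t+1->t} = x' + v(x')
    bx = xp + flow_bwd[yp, xp, 0]
    by = yp + flow_bwd[yp, xp, 1]
    dev = np.hypot(bx - xs, by - ys)
    xb = np.clip(np.rint(bx).astype(int), 0, W - 1)
    yb = np.clip(np.rint(by).astype(int), 0, H - 1)
    return ((dev > theta_f) & (seg_t != seg_t[yb, xb])).astype(np.uint8)
```

Pixels flagged here would then have their motion replaced by the affine motion A x_t − x_t of their segment, as the text specifies.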
Optionally, as shown in Fig. 8, before step 106 the motion estimation method provided by the embodiment of the present invention may further comprise at least one of steps 104 and 105:
Step 104: the motion estimation apparatus performs cluster optimization on the initial motion models having the same motion form, according to the initial motion models of all the segments in each frame.
It can be observed that in real scenes motion forms are often repetitive and few in number. Repetitiveness of motion forms means that different objects in the same scene may have identical motion forms; fewness means that a large number of objects in the same scene often share only a small number of motion forms. Although the motion of objects in a scene may be complex, the interior of one object, or several objects together, often follows the same motion form. Therefore the dominant motion forms of the scene can be extracted, and the initial motion models of the objects in the scene can then be cluster-optimized according to these dominant motion forms.
Specifically, after the affine model estimation, a series of affine models is obtained. These affine models usually contain large redundancy, so it is necessary to cluster the models to obtain a minimal affine model set {A}. Specifically, as shown in Fig. 9, step 104 comprises steps 1041 to 1042:
Step 1041: sort all the segments in a frame by size, and screen the affine models in descending order of area.
Step 1042: when the area of segment S_i is greater than a preset area Size_s, or the number of affine models in the set {A} is less than a preset number Num_A, compare the affine model A_i of segment S_i with {A}; if {A} already contains an affine model A' whose Euclidean distance to A_i is less than a preset distance Th_A, discard the current affine model A_i, otherwise add A_i to {A}. This yields the minimal affine model set {A}.
Here the preset area Size_s is a threshold on segment size that can be chosen according to actual needs, e.g., Size_s = W·H·0.01. The preset number Num_A is a lower bound on the number of affine models, set according to the concrete scene, e.g., 10. The preset distance Th_A is a preset Euclidean distance, e.g., 0.01. The embodiment of the present invention does not limit these preset values.
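Steps 1041 to 1042 can be sketched as follows; measuring model distance as the Frobenius norm of the matrix difference is an illustrative assumption (the bottom rows cancel, so only the six affine parameters contribute):

```python
import numpy as np

def minimal_model_set(segments, size_s, num_a, th_a):
    """Screen per-segment affine models into a minimal set {A}.

    segments : list of (area, A) pairs, A being a 3x3 affine matrix
    A model is considered while its segment area exceeds size_s or the set
    still holds fewer than num_a models; it is discarded when some model
    already in the set lies within distance th_a of it.
    """
    models = []
    for area, A in sorted(segments, key=lambda p: p[0], reverse=True):
        if area <= size_s and len(models) >= num_a:
            break  # areas are descending: no later segment qualifies
        if any(np.linalg.norm(A - B) < th_a for B in models):
            continue  # redundant: too close to an existing model
        models.append(A)
    return models
```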
Further, each affine model in {A} is associated with a label value l, and an extra label is added to represent the initial motion, so the number of labels is N_l = |{A}| + 1. A multi-label problem about the motion is then solved: when l < N_l − 1, the motion of each point is obtained from the affine model; when l = N_l − 1, the motion of each point is obtained from the initial optical flow.
The target equation is:

$$E(x,l) = \sum_x \left( \lambda_{Data}\, w_O(x)\, \frac{d^2}{d^2+\sigma_C^2} + \lambda_{Smooth} \sum_{y\in N(x)} w_s(x,y)\,[l(x)\neq l(y)] \right)$$

where w_O(x) is the occlusion weight, with w_O(x) = 0.01 when O(x) = 1 and w_O(x) = 1 when O(x) = 0; d is the Euclidean distance in RGB space, d = ||I'(x') − I(x)||; w_s(x, y) is the smoothing coefficient obtained from the current segmentation, with w_s(x, y) = 1 when s(x) = s(y) and 0 otherwise; N(x) denotes the neighborhood of x; and λ_Data and λ_Smooth are the weights of the data term and the smoothing term respectively.
The above energy minimization problem is solved with the graph cut method, and the optical flow field is updated according to the solution obtained.
Through the joint optimization over multiple affine models, together with the motion correction based on the occlusion detection result, the motion of occlusion regions can be effectively corrected.
Consider next the motion consistency of the segments. The initial segmentation obtained by the clustering algorithm is an over-segmentation with poor temporal consistency, and it is computed from color and space rather than from motion. It is therefore necessary to merge the initial segments according to their motion consistency, obtaining motion-consistent segments. On the one hand, merging effectively reduces the number of segments and hence the number of labels, lowering the space-time complexity of the subsequent spatio-temporal consistency optimization; on the other hand, after the motion-consistent merging, the resulting motion segmentation keeps a certain temporal consistency, which can effectively speed up the convergence of the subsequent optimization.
And/or, before step 106, the motion estimation method provided by the embodiment of the present invention may further comprise:

Step 105: the motion estimation apparatus merges the motion-consistent segments in each frame.
To improve the temporal consistency of the initial segmentation, it is necessary to merge the initial segments according to their motion consistency, obtaining motion-consistent segments. On the one hand, merging effectively reduces the number of segments and hence the number of labels, lowering the space-time complexity of the subsequent spatio-temporal consistency optimization; on the other hand, after segments with consistent motion are merged, the resulting motion segmentation keeps a certain temporal consistency, which can effectively speed up the convergence of the subsequent optimization.
Specifically, as shown in Fig. 10, step 105 comprises steps 1051 to 1055:
Step 1051: for each segment s, compute the sum of the color differences of all the pixels in the segment under the segment's current affine model.
Step 1052: compute the candidate adjacent segment set {S_Nei} of segment S_i, where each candidate adjacent segment s' satisfies 2|s'| > |s|.
Step 1053: for each adjacent segment s' in {S_Nei}, compute respectively the motion consistency S_{s,s'} along the boundary between s and s', the consistency F_{s,s'} between the current motion of the pixels in s and the affine model of s', and the ratio C_{s,s'} of the color differences of the pixels in s under the affine model of s to those under the affine model of s'.

These quantities are computed as:

$$S_{s,s'} = \frac{\sum_{x\in s,\,y\in s',\,|x-y|=1} \left(\|v(x)-v(y)\| < Th_{Smooth}\right)}{\sum_{x\in s,\,y\in s',\,|x-y|=1} 1}$$

$$F_{s,s'} = \frac{\sum_{x\in s} \left(\|v(x)-(A_{s'}x-x)\| < Th_{Flow}\right)}{\sum_{x\in s} 1}$$

$$C_{s,s'} = \frac{\sum_{x\in s} \|I'(A_s x)-I(x)\|}{\sum_{x\in s} \|I'(A_{s'} x)-I(x)\|}$$
Step 1054: from {S_Nei}, select the adjacent segments satisfying S_{s,s'} > 0.5, F_{s,s'} > 0.5 and C_{s,s'} > 0.5, and take the adjacent segment with the maximum C_{s,s'} as the merge candidate, denoted s*; when C_{s,s*} > θ_w, where θ_w is a preset ratio, merge s into s*.

Here θ_w can be chosen according to actual conditions, e.g., 0.9; the embodiment of the present invention does not limit this.
Step 1055: merge each segment s as above; if the total number of segments merged in the last round exceeds half of the initial number of segments, re-run the affine model estimation.
By iterating the above steps until no segments can be merged, the merging of segments with consistent motion is completed. An example result after merging is shown in Fig. 11. As can be seen from the result, after merging, multiple segments belonging to the same motion form have been merged together.
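The per-pair merge criteria F_{s,s'} and C_{s,s'} of step 1053 can be sketched as below; the image access is abstracted behind a callable, and the boundary score S_{s,s'}, which additionally needs the pixel adjacency structure, is computed analogously and omitted here:

```python
import numpy as np

def affine_motion(A, pts):
    """Motion w = A x - x of points under a 3x3 affine model."""
    h = np.column_stack([pts, np.ones(len(pts))])
    return (A @ h.T).T[:, :2] - pts

def merge_scores(pts, flow, A_s, A_sp, color_err, th_flow=1.0):
    """Consistency scores F_{s,s'} and C_{s,s'} for merging s into s'.

    pts       : (n, 2) pixel coordinates of segment s
    flow      : (n, 2) current motion v(x) of those pixels
    A_s, A_sp : 3x3 affine models of s and of the neighbour s'
    color_err : callable mapping an affine model to the per-pixel color
                residual ||I'(A x) - I(x)|| over s
    """
    # Fraction of pixels of s whose motion also fits the neighbour's model
    F = np.mean(np.linalg.norm(flow - affine_motion(A_sp, pts), axis=1)
                < th_flow)
    # Ratio of color residuals under own model vs. the neighbour's model
    C = color_err(A_s).sum() / color_err(A_sp).sum()
    return F, C
```

A pair with F and C above the 0.5 thresholds of step 1054 is then a candidate for merging.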
Further, when optimizing the spatio-temporal consistency of the segmentation and the motion estimation, to guarantee that the same object has the same label in different frames of the video sequence, the segments of different frames need to be matched. As shown in Fig. 12, step 106 specifically comprises steps 1061 to 1068:
Step 1061: take the first frame of the input image sequence as the initial matching template, regard each of its segments as a volume segment, and assign each volume segment a unique label.
Step 1062: for a segment s_t^k in frame t, track each pixel x_t in s_t^k according to the optical flow from frame t to frame t−1 to obtain its corresponding point x_{t−1} in frame t−1, and denote the segment containing x_{t−1} as s_{t−1}^{k'}. The set {s_{t−1}^{k'}} formed in this way represents all the segments in frame t−1 that may correspond to s_t^k.

Step 1063: for each segment s_{t−1}^{k'} in {s_{t−1}^{k'}}, compute the matching rate of s_t^k with s_{t−1}^{k'}:

$$r_{t\to t-1}^{k\to k'} = \frac{\left|\{x_t \mid x_t \in s_t^k,\ x_{t-1}\in s_{t-1}^{k'}\}\right|}{\left|s_{t-1}^{k'}\right|}$$

where the numerator is the number of pixels in s_t^k whose corresponding points fall in s_{t−1}^{k'}, and the denominator |s_{t−1}^{k'}| is the number of points in s_{t−1}^{k'}.
Step 1064: compute the matching rate of s_{t−1}^{k'} with s_t^k:

$$r_{t-1\to t}^{k'\to k} = \frac{c}{\left|s_t^k\right|}$$

where, for each segment s_{t−1}^{k'}, each point x_{t−1} in s_{t−1}^{k'} is tracked to its corresponding point x_t in frame t according to the optical flow from frame t−1 to frame t; c is the number of such points x_t belonging to s_t^k, and |s_t^k| is the number of points in s_t^k.
Step 1065: compute the final matching rate of s_t^k with s_{t−1}^{k'}:

$$r_{t,t-1}^{k,k'} = \min\left(r_{t\to t-1}^{k\to k'},\ r_{t-1\to t}^{k'\to k}\right)$$
Step 1066: when r_{t,t−1}^{k,k'} > σ_m, determine that s_t^k matches s_{t−1}^{k'}, update the label of s_t^k to the label of s_{t−1}^{k'}, and add s_t^k to the volume segment containing s_{t−1}^{k'}, where σ_m is a preset matching threshold.

Specifically, if r_{t,t−1}^{k,k'} > σ_m, σ_m being a predefined matching threshold, e.g., 0.5, then s_t^k is considered to match s_{t−1}^{k'}; alternatively, the segment in {s_{t−1}^{k'}} with the maximum matching rate can be taken as the match of s_t^k. In practice the maximum matching rate is generally also greater than σ_m, and the segment whose matching rate exceeds σ_m is generally unique: at most one segment in {s_{t−1}^{k'}} can be found whose matching rate with s_t^k is greater than σ_m. The label of s_t^k is then updated to the label of s_{t−1}^{k'}, and s_t^k is added to the volume segment containing s_{t−1}^{k'}.
Step 1067: when no matching segment for s_t^k exists in frame t−1, match s_t^k against the segments in frame t−2.

Step 1068: when the number of matching attempts exceeds a preset matching step length, stop matching; assign a new label to the segment s_t^k for which no match exists, and add s_t^k to a new volume segment.

Since errors accumulate as tracking proceeds along the optical flow over time, the preset matching step length usually takes a small value, e.g., 5; the embodiment of the present invention does not limit this.
After matching, a number of volume segments are obtained, denoted S = {S_k | k = 1, 2, ..., K}. Each volume segment S_k contains the segments that belong to the same object in different frames and is represented by a unique label l. Therefore, if a pixel x_t in frame t and its corresponding pixel x_{t'} in frame t' belong to the same volume segment, then x_t and x_{t'} have the same label, i.e., S(x_t) = S(x_{t'}).
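The bidirectional matching rate of steps 1063 to 1065 can be sketched as follows, assuming the flow-based correspondences have already been rounded to integer pixel coordinates; names and array layout are illustrative:

```python
import numpy as np

def matching_rate(seg_t, seg_prev, to_prev, to_curr, k, kp):
    """Final matching rate of segment k (frame t) and k' (frame t-1).

    seg_t, seg_prev : (H, W) integer segment labels of frames t and t-1
    to_prev : (H, W, 2) corresponding (y, x) in frame t-1 of each frame-t
              pixel, from the t -> t-1 optical flow
    to_curr : (H, W, 2) corresponding (y, x) in frame t of each frame t-1
              pixel, from the t-1 -> t optical flow
    Implements r^{k->k'}, r^{k'->k} and their minimum.
    """
    mk = seg_t == k
    yp, xp = to_prev[mk][:, 0], to_prev[mk][:, 1]
    r_fwd = (np.count_nonzero(seg_prev[yp, xp] == kp)
             / np.count_nonzero(seg_prev == kp))
    mkp = seg_prev == kp
    yc, xc = to_curr[mkp][:, 0], to_curr[mkp][:, 1]
    r_bwd = np.count_nonzero(seg_t[yc, xc] == k) / np.count_nonzero(mk)
    return min(r_fwd, r_bwd)
```

Taking the minimum of the two directions, as in step 1065, prevents a small segment that falls entirely inside a large one from being declared a strong mutual match.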
After the temporal matching of the segments, a number of volume segments are obtained; the segments of the individual frames inside one volume segment are temporally consistent, and at the same time the motions inside these segments roughly satisfy spatio-temporal consistency. Next, according to the temporal matching result, the spatio-temporal consistency optimization of the motion segmentation and the spatio-temporal consistency optimization of the motion estimation can be carried out respectively, obtaining spatio-temporally consistent motion segmentation and output optical flow.
As for the spatio-temporal consistency optimization of the motion segmentation: the segmentation obtained from single-frame segmentation and temporally consistent boundaries may, due to accidental factors, still lack full spatio-temporal consistency. Consider the situation in which, in some frame t of the video sequence, two different objects are wrongly clustered into the same segment because their colors are similar and they are spatially adjacent; in another frame t' far from frame t, the positions of the objects are no longer spatially adjacent because of their different motions, so in t' the two objects can be correctly separated. However, in the frames near frame t the two objects may still be segmented wrongly, so purely local multi-frame processing still cannot correctly rectify this segmentation error.
In the spatio-temporal consistency optimization of the motion segmentation, the correct segmentation information along the time axis can be used to rectify the wrong segmentations in those local frames; the correct segmentation information is propagated globally, instead of only performing local multi-frame processing, so that globally spatio-temporally consistent segmentation can be obtained. Each pixel x_t in each frame t of the video sequence is tracked into the other frames according to the optical flow, yielding corresponding points x_{t'}. If the segmentation is correct, x_t and x_{t'} should have the same volume segment label, i.e., S(x_t) = S(x_{t'}). Therefore the volume segment labels of all the corresponding points x_{t'} are counted, yielding a label set P(x_t); the label of x_t should then be the label l occurring most often in the set P(x_t). The above process can be formalized as the following energy equation:

$$E(l) = \sum_{t=1}^{n} \left( \sum_{x_t} \left( E_d(x_t,l) + \sum_{y_t\in N(x_t)} E_s(l(x_t), l(y_t)) \right) \right)$$
where E_d(x_t, l) is the data term constraint, E_s(l(x_t), l(y_t)) denotes the smoothing term constraint, and N(x_t) denotes the neighbors of x_t. E_d(x_t, l) is defined as follows:

$$E_d(x_t,l) = -w_p \log L_p(l(x_t),x_t) - w_c \log L_c(l(x_t),x_t) - w_s \log L_s(l(x_t),x_t) + w_f \log\left(1+D_f(l(x_t),x_t)\right)$$

where L_p(l(x_t), x_t) is the probability that x_t carries label l; L_c(l(x_t), x_t) denotes the color similarity between x_t and the points of the volume segment corresponding to l; L_s(l(x_t), x_t) denotes the spatial correlation between x_t and the points of the segment corresponding to l; D_f(l(x_t), x_t) denotes the consistency between the current motion and the affine model corresponding to l; and w_p, w_c, w_s, w_f are the coefficients of the label probability, color similarity, spatial correlation and affine consistency respectively.
$$L_p(l,x_t) = \frac{1}{n_p}\left( \sum_{t'} [S(x_{t'})=l] + [S(x_t)=l] \right),\quad l \in P(x_t)$$

where n_p is the number of valid points tracked (including x_t). Clearly, the larger L_p(l, x_t) is, the larger the probability that x_t belongs to the volume segment corresponding to label l.
Assuming that the color distribution and spatial distribution of a segment follow Gaussian distributions N(·), the color similarity L_c(l(x_t), x_t) is defined as:

$$L_c(l(x_t),x_t) = N\!\left(I(x_t) \mid \mu^c_{S_l}, \Sigma^c_{S_l}\right)$$

where μ^c_{S_l} and Σ^c_{S_l} denote respectively the mean and covariance of the color distribution in the volume segment S_l corresponding to l. Similarly, L_s(l(x_t), x_t) is defined as:

$$L_s(l(x_t),x_t) = \frac{1}{|f(S_l)|}\sum_{t'\in f(S_l)} N\!\left(x_t \mid \eta_{s_l}, \Delta_{s_l}\right)$$

where f(S_l) denotes the indices of all the frames of the video sequence in which S_l has a segment, so L_s(l(x_t), x_t) computes the average spatial correlation of x_t and its corresponding points x_{t'} within the respective segments of S_l. Here s_l denotes the segment of the volume segment S_l in frame t', and η_{s_l} and Δ_{s_l} denote respectively the mean and covariance of the spatial positions of the points in s_l.
The affine consistency D_f(l(x_t), x_t) is defined as:

$$D_f(l,x_t) = \left\| \left(A^t_l\, x_t - x_t\right) - v(x_t) \right\|$$
From the definitions above, the larger the label probability, the larger the color similarity, the higher the spatial correlation and the better the affine consistency, the smaller the cost value of the data term. The coefficient w_p takes a relatively large value compared with w_c and w_s, while w_c and w_s take smaller values and w_f is set according to the accuracy of the initial motion; thus the label probability constraint plays the main role among the data term constraints, while the color similarity and spatial correlation serve only as complementary constraints.
E_s(l(x_t), l(y_t)) performs the regularization (smoothing) constraint, ensuring that the labels of spatially neighboring pixels are consistent. E_s(l(x_t), l(y_t)) is defined anisotropically, as follows:

$$E_s(l(x_t),l(y_t)) = \left( \lambda_l \frac{\varepsilon_l}{p_l(x_t,y_t)+\varepsilon_l} + \lambda_c \frac{\varepsilon_c}{\|I(x_t)-I(y_t)\|+\varepsilon_c} \right)\left[l(x_t)\neq l(y_t)\right]$$

where λ_l ε_l/(p_l(x_t, y_t)+ε_l) and λ_c ε_c/(||I(x_t)−I(y_t)||+ε_c) are the anisotropic smoothing weights; p_l(x_t, y_t) denotes the probability that the labels of x_t and y_t are inconsistent; ||I(x_t) − I(y_t)|| denotes the Euclidean distance between x_t and y_t in color space; λ_l and λ_c are coefficients; and ε_l and ε_c are sensitivity coefficients controlling the range of the weights. p_l(x_t, y_t) is defined as:
$$p_l(x_t,y_t) = \frac{1}{n_l}\left( \sum_{t'} \left[s(x_{t'})\neq s(y_{t'})\right] + \left[s(x_t)\neq s(y_t)\right] \right)$$

Clearly, when x_t and y_t do not lie in the same segment, or their color difference is large, the smoothing weight is small; otherwise the weight is large.
Due to the limits of time and memory cost, directly solving E(l) is very difficult; instead E(t, l) can be solved iteratively frame by frame. Meanwhile, to improve the efficiency of the processing, only N_t adjacent frames of the current frame t are tracked, and for each pixel x_t only the N_l labels of minimum cost are likewise chosen for the optimization.
Specifically, as shown in Fig. 13, the spatio-temporal consistency optimization of the motion segmentation is as follows, comprising steps 10711 to 10715:
Step 10711: fix the segmentation labels of each frame.

Specifically, first compute the mean and covariance of the Gaussian color distribution of each volume segment, together with the mean and covariance of the Gaussian spatial distribution, in its own frame, of each segment of each volume segment; then, starting from frame t = 1, fix the segmentation labels frame by frame.
Step 10712: according to the optical flow, track each pixel x_t in frame t to its corresponding points in the N_t adjacent frames, obtaining the candidate label set P(x_t).

Step 10713: obtain the most frequently occurring labels from the candidate label set P(x_t).

Specifically, after the N_l most frequent labels in P(x_t) are computed, the label probabilities of these N_l labels can be computed, together with the color similarity and spatial correlation of x_t with the segments corresponding to each label, thus obtaining the data term cost.

Further, according to the definition of the smoothing term constraint, the weights of the smoothing term constraint of each point x_t are computed.
The energy equation E(t, l) is solved with the Efficient Belief Propagation for Early Vision (BP) method, and the segmentation labels of the current frame t are updated according to the result obtained.
After one round of optimization, count the number C_s of segments contained in each volume segment and the number C_Pt of all the points in that volume segment.
Step 10714: when C_s < N_t/2 or C_Pt < σ_Pt, where C_s is the number of segments contained in a volume segment, C_Pt is the number of all the points in the volume segment and σ_Pt is a preset threshold, then for each segment in that volume segment, according to the labels of the corresponding points, in the other frames, of the points in the volume segment, select the best-matching volume segment and force a merge.

After the above operations, the volume segments corresponding to some labels l no longer exist.

Step 10715: update the labels of the remaining volume segments.
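The label voting underlying this optimization (collecting the volume-segment labels of a pixel's tracked correspondences and preferring the most frequent one) can be sketched as a small helper; the function name is illustrative, and in the full method this vote only feeds the L_p term of the energy rather than deciding the label outright:

```python
from collections import Counter

def majority_label(current_label, tracked_labels):
    """Most frequent volume-segment label among a pixel's correspondences.

    current_label  : label S(x_t) of the pixel in its own frame
    tracked_labels : labels S(x_{t'}) of its correspondences in other frames
    """
    votes = Counter(tracked_labels)
    votes[current_label] += 1  # the pixel's own label also counts once
    return votes.most_common(1)[0][0]
```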
Similar to the spatio-temporal consistency optimization of the motion segmentation, for the spatio-temporal consistency optimization of the motion estimation: each label l corresponds, in each frame, to the affine transformation model A^t_l of the segment belonging to the same volume segment. The motion of the segments s^t_l within one volume segment is not necessarily constant over time, but the motion form within each s^t_l in each frame is basically consistent, so the motion of the points inside s^t_l all satisfies A^t_l. For each pixel x_t in frame t, if the motion of x_t satisfies the affine transformation model A^t_l in the current frame t, then the motion of its corresponding point x_{t'} in any other frame t' should satisfy the affine transformation model A^{t'}_l. The Euclidean distance in color space between x_t and its corresponding point x_{t'} can be used to measure the accuracy of the motion. The goal is therefore to choose for each pixel the best label l*, such that the sum of the Euclidean distances between x_t and all the corresponding points x_{t'} obtained under the affine transformations corresponding to l* is minimal, while keeping the labels of locally neighboring pixels consistent. Formalizing the above description yields the following energy equation for the motion optimization:
$$E(l) = \sum_{t=1}^{n}\left( \sum_{x_t}\left( \sigma_c E_c(x_t,l) + \sigma_l E_l(x_t,l) + \sigma_f E_f(x_t,l) + \sigma_s E_s(x_t,l) \right) \right)$$

where E_c represents the Euclidean distance in RGB space between x_t and the corresponding points obtained under label l; E_l penalizes the label probability of the corresponding points of x_t obtained under label l; E_f represents the difference between the affine motion of x_t under label l in the current frame and its current motion; E_s is the smoothness constraint; and σ_c, σ_l, σ_f, σ_s are the weights of the corresponding terms.
$$E_c(x_t,l) = \sum_{t'} w_O(x_t)\,\frac{d^2}{d^2+\theta_C^2},\quad d = \left\|I_{t'}(x_{t'}) - I_t(x_t)\right\|$$

where x_{t'} is the corresponding point of x_t in the other frame t' under the affine transformations corresponding to label l, obtained by successive affine transformations through the intermediate frames t_1, t_2, ..., t_k from t to t'.
$$E_l(x_t,l) = -\log\left( \frac{\sum_{t'} \left(S_{t'}(x_{t'}) = l\right)}{N} \right)$$

where S_{t'}(x_{t'}) denotes the segmentation label at the corresponding point x_{t'} of x_t in frame t'.
$$E_f(x_t,l) = \left\| \left(A^t_l\, x_t - x_t\right) - v(x_t) \right\|$$

where A^t_l x_t − x_t is the motion of the current point x_t obtained under the affine model corresponding to label l, and v(x_t) denotes the current motion of x_t.
$$E_s(x_t,l) = \sum_{y_t\in N(x_t)} w_s(x,y)\left( l(x_t) = l(y_t)\ ?\ 0 : \theta_{Smooth} \right)$$

where w_s(x, y) is the weight of the smoothing term constraint between x_t and y_t, obtained from the current segmentation; it is used to strengthen the consistency of motion in continuous regions while allowing different label values across discontinuous boundaries. θ_Smooth is the penalty value for inconsistent labels of adjacent points.
As with the spatio-temporal consistency optimization of the motion segmentation, directly solving this energy equation is very difficult, so it is solved iteratively, again choosing N_t adjacent frames t'.
Specifically, as shown in Fig. 14, the spatio-temporal consistency optimization of the motion estimation comprises steps 10721 to 10725:
Step 10721: according to the current motion, obtain for each frame the affine transformation model A^t_l of the segment corresponding to the volume segment represented by each label l.

Step 10722: for frame t, calculate the optical flow according to the affine transformation models A^t_l, so that the sum of the Euclidean distances in color space between each pixel x_t in frame t and its corresponding points x_{t'} in the other frames t' is minimal.

Specifically, starting from frame t = 1, for each frame t compute the data terms E_c, E_l and E_f of the energy equation; further, compute the coefficients w_s(x, y) of the smoothing term constraint, and solve the energy equation with the graph cut method.

Step 10723: update the optical flow of frame t.

Step 10724: detect the occlusion regions.

Step 10725: explicitly rectify the motion of the occlusion regions.
The present invention provides a motion estimation method in which temporal matching of the motion segmentation is performed on the initial motion models of all the segments in each frame of an image sequence, so that the same object in the image sequence has the same label in each frame of the image sequence, and the motion segmentation and output optical flow are obtained according to the temporal matching result. Such a motion estimation method can handle the motion segmentation and motion estimation of longer image sequences and obtain motion segmentation and motion estimation results with good spatio-temporal consistency, thereby effectively improving the spatio-temporal consistency of the motion segmentation.
With the motion estimation method provided by the embodiment of the present invention, the motion estimation results for color-similar, occluded and textureless regions in an image can be effectively improved. In addition, the motion field keeps a good structure: inside a segment, the motion is smooth and consistent, while the boundaries of the segments often coincide with the boundaries of objects in the scene, so the motion preserves good motion discontinuities.
It should be noted that the core of the technical solution of the present invention is to exploit the spatio-temporal consistency of object segmentation and motion estimation in a video sequence, effectively using the segmentation and motion information in other frames of the video sequence to rectify segmentation and motion errors in a given frame caused by accidental factors. Similarly, this technical solution can be used for depth recovery of a video sequence, obtaining spatio-temporally consistent image depth information. In addition, the motion estimation results obtained for a video sequence by applying the proposed method can be widely applied in computer vision fields such as video interpolation, object tracking and three-dimensional reconstruction; these applications are not enumerated one by one here, but all fall within the protection scope of the present invention.
The embodiment of the present invention further provides a motion estimation apparatus; as shown in Fig. 15, the apparatus comprises:

an acquiring unit 1, configured to obtain the initial optical flow of each frame of the input image sequence;

a segmentation unit 2, configured to divide each frame into at least one segment according to color information;

a motion model estimation unit 3, configured to obtain the initial motion model of every segment in each frame;

a temporal matching unit 6, configured to perform temporal matching of the motion segmentation on the initial motion models of all the segments in each frame, so that the same object in the image sequence has the same label in each frame of the image sequence;

a processing unit 7, configured to obtain the motion segmentation and the output optical flow according to the temporal matching result.
The invention provides a kind of movement estimation apparatus, by owning in each two field picture in image sequenceThe initial motion model of cutting apart is carried out the sequential coupling of motion segmentation, to make the same target in image sequenceIn each frame in this image sequence, there is identical mark, and moved according to the result of sequential couplingCut apart and export light stream. Adopt a kind of like this method for estimating can process the motion of longer image sequenceCut apart and estimation, obtain good motion segmentation and the motion estimation result of space-time consistency, thus canEffectively improve the space-time consistency of motion segmentation.
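To make the data path between these units concrete, the following minimal Python sketch mirrors the first three units; `segment_fn` and `fit_model_fn` are hypothetical placeholders (any color-based segmenter and any per-segment model fitter), not functions defined by this invention. The temporal matching and optimization stages (units 6 and 7) are deliberately left out.

```python
import numpy as np

def estimate_motion(frames, initial_flows, segment_fn, fit_model_fn):
    """Skeleton of the apparatus's data path (hypothetical helper names,
    not from the patent): acquiring unit -> segmentation unit ->
    motion model estimation unit."""
    # Segmentation unit 2: divide each frame into segments by color.
    segmentations = [segment_fn(f) for f in frames]
    # Motion model estimation unit 3: one initial motion model per segment.
    models = []
    for flow, seg_map in zip(initial_flows, segmentations):
        models.append({int(s): fit_model_fn(flow, seg_map == s)
                       for s in np.unique(seg_map)})
    return segmentations, models
```

A caller would supply the initial optical flow (acquiring unit 1) and any segmenter/fitter pair; here the per-segment "model" could be as simple as the mean flow vector of the segment.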
The motion model estimation unit 3 may specifically be configured to:
calculate the parameters of an affine transform to obtain an affine model fitting the initial motion;
obtain the affine motion field of each frame of image according to the affine model;
fuse the affine motion field with the initial optical flow field;
detect occlusion regions according to the consistency of the motion;
correct the motion of the occlusion regions according to the affine motion of the segments where they are located.
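The first step, calculating the parameters of the affine transform that fits a segment's initial motion, is in essence a least-squares problem. The sketch below shows one standard way to solve it; the patent does not prescribe a particular solver, so treat this as an illustrative assumption.

```python
import numpy as np

def fit_affine(points, flow_vectors):
    """Least-squares fit of a 2x3 affine model A so that A @ [x, y, 1]
    approximates x + v(x) for every pixel of a segment.  One possible
    way to obtain the affine model fitting the initial motion."""
    pts = np.asarray(points, dtype=float)            # (N, 2) pixel coords
    targets = pts + np.asarray(flow_vectors, float)  # flow-displaced positions
    X = np.hstack([pts, np.ones((len(pts), 1))])     # (N, 3) homogeneous coords
    # Solve X @ A.T ~= targets for the affine matrix A.
    A, *_ = np.linalg.lstsq(X, targets, rcond=None)
    return A.T                                       # shape (2, 3)

def apply_affine(A, points):
    """Map points through the fitted affine model."""
    pts = np.asarray(points, dtype=float)
    return np.hstack([pts, np.ones((len(pts), 1))]) @ np.asarray(A).T
```

The per-frame affine motion field then follows by applying each segment's model to its pixels and subtracting the pixel coordinates.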
Optionally, as shown in FIG. 16, the apparatus may further include:
a clustering unit 4, configured to cluster and optimize, according to the initial motion models of all segments in each frame of image, the initial motion models that share the same motion pattern.
Specifically, the clustering unit 4 may be configured to:
sort all segments in a frame of image by size, and screen the affine models successively in descending order of area;
while a segment s_i has an area greater than a preset area Size_s, or the affine model set {A} contains fewer than a preset number Num_A of models, compare the affine model A_i of segment s_i with {A}: if {A} contains a model A' whose Euclidean distance to A_i is less than a preset distance Th_A, discard the current affine model A_i; otherwise add A_i to {A}. The result is the minimal affine model set {A}.
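The descending-area screening that builds the minimal affine model set {A} can be sketched as follows; `size_thresh`, `max_models`, and `dist_thresh` stand in for Size_s, Num_A, and Th_A, and their values here are illustrative only, as is the interpretation that screening stops once both conditions fail.

```python
import numpy as np

def screen_models(segments, size_thresh=100, max_models=8, dist_thresh=0.5):
    """Build a minimal affine-model set: walk segments in descending area
    order; keep a segment's model only if no retained model lies within
    dist_thresh (Euclidean distance on the flattened 2x3 parameters).
    Each segment is a dict with 'area' and 'model' keys (a convention
    assumed here, not fixed by the patent)."""
    ordered = sorted(segments, key=lambda s: s["area"], reverse=True)
    kept = []
    for seg in ordered:
        # Process while area > Size_s or the set still has room; since the
        # list is sorted by area, the first failure ends the screening.
        if seg["area"] <= size_thresh and len(kept) >= max_models:
            break
        A = np.asarray(seg["model"], float).ravel()
        if any(np.linalg.norm(A - k) < dist_thresh for k in kept):
            continue  # redundant: too close to an existing model, discard
        kept.append(A)
    return kept
```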
And/or, as shown in FIG. 16, the apparatus may further include:
a merging unit 5, configured to merge segments in each frame of image whose motion is consistent.
Specifically, the merging unit 5 may be configured to:
for each segment s, calculate the sum of the color differences of all pixels in the segment under the segment's current affine model;
compute the set {S_Nei} of candidate neighboring segments of segment s_i, where each candidate neighboring segment s' satisfies 2|s'| > |s|;
for each neighboring segment s' in {S_Nei}, calculate the motion consistency S_{s,s'} on the boundary between s and s', the consistency F_{s,s'} of the current motion of the pixels in s with the affine model of s', and the ratio C_{s,s'} of the color difference of the pixels in s under their own affine model to that under the affine model of s',
where S_{s,s'}, F_{s,s'}, and C_{s,s'} are computed as:

S_{s,s'} = \frac{\sum_{x \in s,\, y \in s',\, |x-y|=1} \big(\|v(x)-v(y)\| < Th_{Smooth}\big)}{\sum_{x \in s,\, y \in s',\, |x-y|=1} 1}

F_{s,s'} = \frac{\sum_{x \in s} \big(\|v(x) - (A_{s'}x - x)\| < Th_{Flow}\big)}{\sum_{x \in s} 1}

C_{s,s'} = \frac{\sum_{x \in s} \|I'(A_s x) - I(x)\|}{\sum_{x \in s} \|I'(A_{s'} x) - I(x)\|}

From {S_Nei}, select the neighboring segments satisfying S_{s,s'} > 0.5, F_{s,s'} > 0.5, and C_{s,s'} > 0.5, and take the neighbor with the largest C_{s,s'} as the candidate segment, denoted s*; when C_{s,s*} > θ_w, where θ_w is a preset ratio, merge s into s*.
Here θ_w can be chosen according to actual conditions, for example 0.9; the embodiments of the present invention do not limit this.
Each segment s is merged in this way; if the total number of segments merged in the end exceeds half of the initial number of segments, the estimation of the affine models is performed again.
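The three merge ratios S_{s,s'}, F_{s,s'}, C_{s,s'} and the merge decision can be sketched on precomputed inputs as follows; this is a simplified illustration (boundary pixel pairs and color errors are passed in directly rather than gathered from images), not the full per-pixel implementation.

```python
import numpy as np

def merge_scores(v_s, v_sprime_pred, boundary_pairs,
                 color_err_s, color_err_sprime,
                 th_smooth=1.0, th_flow=1.0):
    """Simplified merge ratios for a segment pair (s, s'):
    S: fraction of adjacent boundary pixel pairs whose flows differ
       by less than th_smooth;
    F: fraction of pixels in s whose flow matches s''s affine
       prediction within th_flow;
    C: color error of s under its own model divided by its error
       under s''s model.  Thresholds are illustrative."""
    S = np.mean([np.linalg.norm(vx - vy) < th_smooth
                 for vx, vy in boundary_pairs])
    F = np.mean(np.linalg.norm(v_s - v_sprime_pred, axis=1) < th_flow)
    C = color_err_s / color_err_sprime
    return S, F, C

def should_merge(S, F, C, theta_w=0.9):
    # All three gates must pass; the final merge requires C > theta_w
    # (theta_w >= 0.5, so the C > 0.5 pre-selection is subsumed).
    return S > 0.5 and F > 0.5 and C > theta_w
```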
Further, when optimizing the spatio-temporal consistency of the segmentation and the motion estimation, the segments of different frames need to be matched in order to ensure that the same object carries the same label in the different frames of the video sequence. The temporal matching unit 6 may specifically be configured to:
take the first frame of the input image sequence as the initial matching template, treat each of its segments as a data segment, and assign each data segment a unique label.
For a segment s_t^k in frame t, track each pixel x_t in s_t^k to its corresponding point x_{t-1} in frame t−1 according to the optical flow from frame t to frame t−1, where the segment containing x_{t-1} in frame t−1 is denoted s_{t-1}^{k'}; the set {s_{t-1}^{k'}} so formed represents all possible segments in frame t−1 corresponding to s_t^k.
For each segment s_{t-1}^{k'} in {s_{t-1}^{k'}}, calculate the matching rate of s_t^k with s_{t-1}^{k'}:

r_{t \to t-1}^{k \to k'} = \frac{\big|\{x_t \mid x_t \in s_t^k,\ x_{t-1} \in s_{t-1}^{k'}\}\big|}{|s_{t-1}^{k'}|}

where the numerator is the number of pixels in s_t^k whose corresponding points fall in s_{t-1}^{k'}, and the denominator is the number of points in s_{t-1}^{k'}.
Calculate the matching rate of s_{t-1}^{k'} with s_t^k:

r_{t-1 \to t}^{k' \to k} = \frac{c}{|s_t^k|}

where each point x_{t-1} in s_{t-1}^{k'} is tracked to its corresponding point x_t in frame t according to the optical flow from frame t−1 to frame t, c is the counted number of points x_t that belong to s_t^k, and |s_t^k| is the number of points in s_t^k.
Calculate the final matching rate of s_t^k and s_{t-1}^{k'}:

r_{t,t-1}^{k,k'} = \min\big(r_{t \to t-1}^{k \to k'},\ r_{t-1 \to t}^{k' \to k}\big)

When r_{t,t-1}^{k,k'} > σ_m, where σ_m is a preset matching threshold, determine that s_t^k matches s_{t-1}^{k'}, update the label of s_t^k to the label of s_{t-1}^{k'}, and add s_t^k to the data segment containing s_{t-1}^{k'}.
Specifically, with σ_m a preset matching threshold, for example 0.5, s_t^k is considered to match s_{t-1}^{k'} when r_{t,t-1}^{k,k'} > σ_m; alternatively, the segment in {s_{t-1}^{k'}} with the largest matching rate may be taken as the match of s_t^k, in which case the maximum matching rate is generally also greater than σ_m. Moreover, a segment whose matching rate exceeds σ_m is in general unique: at most one segment in {s_{t-1}^{k'}} has a matching rate with s_t^k greater than σ_m. The label of s_t^k is then updated to the label of s_{t-1}^{k'}, and s_t^k is added to the data segment containing s_{t-1}^{k'}.
When no matching segment for s_t^k exists in frame t−1, match s_t^k against the segments of frame t−2.
When the number of matching attempts exceeds a preset matching step length, end the matching, assign a new label to the segment s_t^k for which no match exists, and add s_t^k to a new data segment.
Because errors accumulate as tracking proceeds along the optical flow, the preset matching step length usually takes a small value, for example 5; the embodiments of the present invention do not limit this.
After the matching, a number of data segments are obtained, denoted S = {S_k | k = 1, 2, …, K}. Each data segment S_k contains the segments that belong to the same object in different frames and is represented by a unique label l. Therefore, if a pixel x_t in frame t and its corresponding pixel x_{t'} in frame t' belong to the same data segment, then x_t and x_{t'} carry the same label, i.e. S(x_t) = S(x_{t'}).
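The bidirectional matching rate r_{t,t-1}^{k,k'} = min(forward, backward) can be sketched with label maps and a pixel correspondence array. As a simplification, the same correspondences are used for both directions, whereas the patent tracks the backward rate along the t−1 → t flow; treat the helper below as illustrative.

```python
import numpy as np

def matching_rate(labels_t, labels_prev, corr):
    """For each segment k in frame t and k' in frame t-1:
    forward rate  = |pixels of k mapping into k'| / |k'|,
    backward rate = the symmetric overlap count / |k|,
    final rate    = min(forward, backward).
    `corr` maps each flattened pixel index of frame t to its
    flow-tracked pixel index in frame t-1 (a simplification of
    tracking along the optical flow)."""
    rates = {}
    for k in np.unique(labels_t):
        idx_k = np.flatnonzero(labels_t == k)
        prev_of_k = labels_prev[corr[idx_k]]         # labels hit in t-1
        for kp in np.unique(prev_of_k):
            overlap = np.sum(prev_of_k == kp)
            fwd = overlap / np.sum(labels_prev == kp)
            bwd = overlap / len(idx_k)
            rates[(int(k), int(kp))] = min(fwd, bwd)
    return rates
```

A match is then declared when the final rate exceeds the threshold σ_m (e.g. 0.5), and the segment joins the matching data segment.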
After the temporal matching of the segmentation, a number of data segments are obtained; the segments in each data segment are temporally consistent across frames, and at the same time the motion within these segments roughly satisfies spatio-temporal consistency. Next, the spatio-temporal consistency optimization of the motion segmentation and the spatio-temporal consistency optimization of the motion estimation can each be performed according to the result of the temporal matching, yielding the spatio-temporally consistent motion segmentation and the output optical flow.
Specifically, as shown in FIG. 16, the processing unit 7 includes a motion segmentation optimization module 71 and a motion estimation optimization module 72.
The motion segmentation optimization module 71 is configured to perform the spatio-temporal consistency optimization of the motion segmentation according to the result of the temporal matching.
Specifically, the motion segmentation optimization module 71 may be configured to:
keep the segmentation labels of each frame of image fixed;
track, according to the optical flow, the corresponding points of each pixel x_t of frame t in the N_t neighboring frames to obtain the candidate label set P(x_t);
obtain from the candidate label set P(x_t) the label that occurs most often;
when C_s < N_t/2 or C_Pt < σ_Pt, where C_s is the number of segments contained in each data segment, C_Pt is the number of points of all segments in the data segment, and σ_Pt is a preset threshold, force-merge each segment in the data segment into a best-matching data segment selected according to the labels of the corresponding points, in the other frames, of the points in the data segment;
update the labels of the remaining data segments.
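The majority vote over the candidate label set P(x_t) and the forced-merge condition C_s < N_t/2 or C_Pt < σ_Pt can be sketched as follows; the threshold values used in the test are illustrative.

```python
from collections import Counter

def vote_label(candidate_labels):
    """Most frequent label among a pixel's flow-tracked corresponding
    points in the N_t neighboring frames (the candidate set P(x_t)),
    returned with its occurrence count."""
    label, count = Counter(candidate_labels).most_common(1)[0]
    return label, count

def needs_forced_merge(seg_count, n_neighbor_frames, point_count, point_thresh):
    """True when a data segment is too short-lived (C_s < N_t/2) or too
    small (C_Pt < sigma_Pt) and must be merged into its best match."""
    return seg_count < n_neighbor_frames / 2 or point_count < point_thresh
```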
The motion estimation optimization module 72 is configured to perform the spatio-temporal consistency optimization of the motion estimation according to the result of the temporal matching.
Specifically, the motion estimation optimization module 72 may be configured to:
obtain, according to the current motion, the affine transform model A_t^l in each frame of the segments corresponding to the data segment represented by each label l;
for frame t, calculate the optical flow according to the affine transform models A_t^l so as to minimize, for each pixel x_t of frame t, the sum of the Euclidean distances in color space to its corresponding points x_{t'} in the other frames t';
update the optical flow of frame t;
detect occlusion regions;
explicitly correct the motion of the occlusion regions.
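Recomputing a frame's optical flow from the per-segment affine transform models A_t^l can be sketched as follows; occlusion detection and explicit correction are omitted from this illustration.

```python
import numpy as np

def flow_from_affine(labels, models, height, width):
    """Recompute a frame's optical flow from per-segment 2x3 affine
    models: each pixel's flow is A_l @ [x, y, 1] - [x, y], where l is
    the pixel's segment label.  A sketch of the flow-update step."""
    ys, xs = np.mgrid[0:height, 0:width]
    pts = np.stack([xs.ravel(), ys.ravel(),
                    np.ones(height * width)])          # (3, N) homogeneous
    flow = np.zeros((height, width, 2))
    flat = flow.reshape(-1, 2)                         # view onto flow
    for l, A in models.items():
        mask = (labels == l).ravel()
        mapped = (np.asarray(A, float) @ pts[:, mask]).T   # target positions
        flat[mask] = mapped - pts[:2, mask].T              # displacement
    return flow
```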
In summary, the present invention provides a motion estimation apparatus that performs temporal matching of motion segmentation on the initial motion models of all segments in each frame of an image sequence, so that the same object carries the same label in every frame of the sequence, and obtains the motion segmentation and the output optical flow according to the result of the temporal matching; it can handle the motion segmentation and motion estimation of long image sequences and produce motion segmentation and motion estimation results with good spatio-temporal consistency.
The motion estimation apparatus provided by the embodiments of the present invention can effectively improve motion estimation results in image regions that are similar in color, occluded, or textureless; in addition, the motion field maintains good structure: within a segment the motion is smooth and consistent, while segment boundaries, which often coincide with object boundaries in the scene, preserve the motion discontinuities well.
It should again be noted that the core of the technical solution is to exploit the spatio-temporal consistency of object-based segmentation and motion estimation in a video sequence, using the segmentation and motion information of other frames to correct segmentation and motion errors that accidental factors cause in a particular frame. The same scheme can also be applied to depth recovery of a video sequence to obtain spatio-temporally consistent depth information, and the motion estimation results can be widely applied in computer vision fields such as video interpolation, object tracking, and three-dimensional reconstruction; these applications are not enumerated one by one here, but all of them shall fall within the protection scope of the present invention.
When the motion estimation apparatus provided by the above embodiments performs motion estimation, the division into the functional modules described above is used only as an example; in practical applications, the above functions may be allocated to different functional modules as required, that is, the internal structure of the apparatus may be divided into different functional modules to complete all or part of the functions described above. In addition, the motion estimation apparatus provided by the above embodiments belongs to the same conception as the motion estimation method embodiments described previously; for its specific implementation process, refer to the method embodiments, which are not repeated here.
A person of ordinary skill in the art will understand that all or part of the steps of the above embodiments may be implemented by hardware, or by a program instructing the relevant hardware, where the program may be stored in a computer-readable storage medium; the storage medium mentioned above may be a read-only memory, a magnetic disk, an optical disc, or the like.
The foregoing descriptions are merely preferred embodiments of the present invention and are not intended to limit the present invention. Any modification, equivalent replacement, improvement, or the like made within the spirit and principles of the present invention shall fall within the protection scope of the present invention.

Claims (20)

1. A motion estimation method, wherein the method comprises:
obtaining the initial optical flow of each frame of image in an input image sequence;
dividing each frame of image into at least one segment according to color information;
obtaining the initial motion models of all segments in each frame of image;
performing temporal matching of motion segmentation on the initial motion models of all segments in each frame of image, so that the same object in the image sequence carries the same label in every frame of the image sequence; and
obtaining the motion segmentation and the output optical flow according to the result of the temporal matching.
2. The method according to claim 1, wherein before the temporal matching of motion segmentation is performed on the initial motion models of all segments of each frame of image, the method further comprises:
clustering and optimizing, according to the initial motion models of all segments in each frame of image, the initial motion models that share the same motion pattern; and/or
merging segments in each frame of image whose motion is consistent.
3. The method according to claim 1 or 2, wherein performing the temporal matching of motion segmentation on the initial motion models of all segments in each frame of image comprises:
taking the first frame of the input image sequence as the initial matching template, treating each of its segments as a data segment, and assigning each data segment a unique label;
for a segment s_t^k in frame t, tracking each pixel x_t in s_t^k to its corresponding point x_{t-1} in frame t−1 according to the optical flow from frame t to frame t−1, wherein the segment containing x_{t-1} in frame t−1 is denoted s_{t-1}^{k'}, and the set {s_{t-1}^{k'}} so formed represents all possible segments in frame t−1 corresponding to s_t^k;
for each segment s_{t-1}^{k'} in {s_{t-1}^{k'}}, calculating the matching rate of s_t^k with s_{t-1}^{k'}:

r_{t \to t-1}^{k \to k'} = \frac{\big|\{x_t \mid x_t \in s_t^k,\ x_{t-1} \in s_{t-1}^{k'}\}\big|}{|s_{t-1}^{k'}|}

wherein the numerator is the number of pixels in s_t^k whose corresponding points fall in s_{t-1}^{k'}, and the denominator is the number of points in s_{t-1}^{k'};
calculating the matching rate of s_{t-1}^{k'} with s_t^k:

r_{t-1 \to t}^{k' \to k} = \frac{c}{|s_t^k|}

wherein each point x_{t-1} in s_{t-1}^{k'} is tracked to its corresponding point x_t in frame t according to the optical flow from frame t−1 to frame t, c is the counted number of points x_t that belong to s_t^k, and |s_t^k| is the number of points in s_t^k;
calculating the final matching rate of s_t^k and s_{t-1}^{k'}:

r_{t,t-1}^{k,k'} = \min\big(r_{t \to t-1}^{k \to k'},\ r_{t-1 \to t}^{k' \to k}\big);

when r_{t,t-1}^{k,k'} > σ_m, wherein σ_m is a preset matching threshold, determining that s_t^k matches s_{t-1}^{k'}, updating the label of s_t^k to the label of s_{t-1}^{k'}, and adding s_t^k to the data segment containing s_{t-1}^{k'};
when no matching segment of s_t^k exists in frame t−1, matching s_t^k against the segments of frame t−2; and
when the number of matching attempts exceeds a preset matching step length, ending the matching, assigning a new label to the segment s_t^k for which no match exists, and adding s_t^k to a new data segment.
4. The method according to claim 3, wherein obtaining the motion segmentation and the output optical flow according to the result of the temporal matching comprises: performing, according to the result of the temporal matching, the spatio-temporal consistency optimization of the motion segmentation and the spatio-temporal consistency optimization of the motion estimation, respectively, to obtain the spatio-temporally consistent motion segmentation and the output optical flow.
5. The method according to claim 4, wherein the spatio-temporal consistency optimization of the motion segmentation comprises:
keeping the segmentation labels of each frame of image fixed;
tracking, according to the optical flow, the corresponding points of each pixel x_t of frame t in the N_t neighboring frames to obtain a candidate label set P(x_t);
obtaining from the candidate label set P(x_t) the label that occurs most often;
when C_s < N_t/2 or C_Pt < σ_Pt, wherein C_s is the number of segments contained in each data segment, C_Pt is the number of points of all segments in the data segment, and σ_Pt is a preset threshold, force-merging each segment in the data segment into a best-matching data segment selected according to the labels of the corresponding points, in the other frames, of the points in the data segment; and
updating the labels of the remaining data segments.
6. The method according to claim 4, wherein the spatio-temporal consistency optimization of the motion estimation comprises:
obtaining, according to the current motion, the affine transform model A_t^l in each frame of the segments corresponding to the data segment represented by each label l;
for frame t, calculating the optical flow according to the affine transform models A_t^l so as to minimize, for each pixel x_t of frame t, the sum of the Euclidean distances in color space to its corresponding points x_{t'} in the other frames t';
updating the optical flow of frame t;
detecting occlusion regions; and
explicitly correcting the motion of the occlusion regions.
7. The method according to claim 1 or 2, wherein the initial motion models comprise affine models;
and wherein obtaining the initial motion models of all segments in each frame of image comprises:
calculating the parameters of an affine transform to obtain an affine model fitting the initial motion;
obtaining the affine motion field of each frame of image according to the affine model;
fusing the affine motion field with the initial optical flow field;
detecting occlusion regions according to the consistency of the motion; and
correcting the motion of the occlusion regions according to the affine motion of the segments where they are located.
8. The method according to claim 7, wherein clustering and optimizing, according to the initial motion models of all segments in each frame of image, the initial motion models that share the same motion pattern comprises:
sorting all segments in a frame of image by size, and screening the affine models successively in descending order of area; and
while a segment s_i has an area greater than a preset area Size_s, or the affine model set {A} contains fewer than a preset number Num_A of models, comparing the affine model A_i of segment s_i with {A}: if {A} contains a model A' whose Euclidean distance to A_i is less than a preset distance Th_A, discarding the current affine model A_i, and otherwise adding A_i to {A}, so as to obtain the minimal affine model set {A}.
9. The method according to claim 7, wherein merging segments in each frame of image whose motion is consistent comprises:
for each segment s, calculating the sum of the color differences of all pixels in the segment under the segment's current affine model;
computing the set {S_Nei} of candidate neighboring segments of segment s_i, wherein each candidate neighboring segment s' satisfies 2|s'| > |s|;
for each neighboring segment s' in {S_Nei}, calculating the motion consistency S_{s,s'} on the boundary between s and s', the consistency F_{s,s'} of the current motion of the pixels in s with the affine model of s', and the ratio C_{s,s'} of the color difference of the pixels in s under their own affine model to that under the affine model of s':

S_{s,s'} = \frac{\sum_{x \in s,\, y \in s',\, |x-y|=1} \big(\|v(x)-v(y)\| < Th_{Smooth}\big)}{\sum_{x \in s,\, y \in s',\, |x-y|=1} 1}

F_{s,s'} = \frac{\sum_{x \in s} \big(\|v(x) - (A_{s'}x - x)\| < Th_{Flow}\big)}{\sum_{x \in s} 1}

C_{s,s'} = \frac{\sum_{x \in s} \|I'(A_s x) - I(x)\|}{\sum_{x \in s} \|I'(A_{s'} x) - I(x)\|}

selecting from {S_Nei} the neighboring segments satisfying S_{s,s'} > 0.5, F_{s,s'} > 0.5, and C_{s,s'} > 0.5, taking the neighbor with the largest C_{s,s'} as the candidate segment, denoted s*, and, when C_{s,s*} > θ_w, wherein θ_w is a preset ratio, merging s into s*; and
merging each segment s in this way, and, if the total number of merged segments exceeds half of the initial number of segments, performing the estimation of the affine models again.
10. The method according to claim 1, wherein the initial optical flow comprises a forward optical flow and/or a backward optical flow.
11. A motion estimation apparatus, wherein the apparatus comprises:
an acquiring unit, configured to obtain the initial optical flow of each frame of image in an input image sequence;
a segmentation unit, configured to divide each frame of image into at least one segment according to color information;
a motion model estimation unit, configured to obtain the initial motion models of all segments in each frame of image;
a temporal matching unit, configured to perform temporal matching of motion segmentation on the initial motion models of all segments in each frame of image, so that the same object in the image sequence carries the same label in every frame of the image sequence; and
a processing unit, configured to obtain the motion segmentation and the output optical flow according to the result of the temporal matching.
12. The apparatus according to claim 11, wherein the apparatus further comprises:
a clustering unit, configured to cluster and optimize, according to the initial motion models of all segments in each frame of image, the initial motion models that share the same motion pattern; and/or
a merging unit, configured to merge segments in each frame of image whose motion is consistent.
13. The apparatus according to claim 11 or 12, wherein the temporal matching unit is specifically configured to:
take the first frame of the input image sequence as the initial matching template, treat each of its segments as a data segment, and assign each data segment a unique label;
for a segment s_t^k in frame t, track each pixel x_t in s_t^k to its corresponding point x_{t-1} in frame t−1 according to the optical flow from frame t to frame t−1, wherein the segment containing x_{t-1} in frame t−1 is denoted s_{t-1}^{k'}, and the set {s_{t-1}^{k'}} so formed represents all possible segments in frame t−1 corresponding to s_t^k;
for each segment s_{t-1}^{k'} in {s_{t-1}^{k'}}, calculate the matching rate of s_t^k with s_{t-1}^{k'}:

r_{t \to t-1}^{k \to k'} = \frac{\big|\{x_t \mid x_t \in s_t^k,\ x_{t-1} \in s_{t-1}^{k'}\}\big|}{|s_{t-1}^{k'}|}

wherein the numerator is the number of pixels in s_t^k whose corresponding points fall in s_{t-1}^{k'}, and the denominator is the number of points in s_{t-1}^{k'};
calculate the matching rate of s_{t-1}^{k'} with s_t^k:

r_{t-1 \to t}^{k' \to k} = \frac{c}{|s_t^k|}

wherein each point x_{t-1} in s_{t-1}^{k'} is tracked to its corresponding point x_t in frame t according to the optical flow from frame t−1 to frame t, c is the counted number of points x_t that belong to s_t^k, and |s_t^k| is the number of points in s_t^k;
calculate the final matching rate of s_t^k and s_{t-1}^{k'}:

r_{t,t-1}^{k,k'} = \min\big(r_{t \to t-1}^{k \to k'},\ r_{t-1 \to t}^{k' \to k}\big);

when r_{t,t-1}^{k,k'} > σ_m, wherein σ_m is a preset matching threshold, determine that s_t^k matches s_{t-1}^{k'}, update the label of s_t^k to the label of s_{t-1}^{k'}, and add s_t^k to the data segment containing s_{t-1}^{k'};
when no matching segment of s_t^k exists in frame t−1, match s_t^k against the segments of frame t−2; and
when the number of matching attempts exceeds a preset matching step length, end the matching, assign a new label to the segment s_t^k for which no match exists, and add s_t^k to a new data segment.
14. The apparatus according to claim 13, wherein the processing unit comprises a motion segmentation optimization module and a motion estimation optimization module;
the motion segmentation optimization module is configured to perform the spatio-temporal consistency optimization of the motion segmentation according to the result of the temporal matching; and
the motion estimation optimization module is configured to perform the spatio-temporal consistency optimization of the motion estimation according to the result of the temporal matching.
15. The apparatus according to claim 14, wherein the motion segmentation optimization module is specifically configured to:
keep the segmentation labels of each frame of image fixed;
track, according to the optical flow, the corresponding points of each pixel x_t of frame t in the N_t neighboring frames to obtain a candidate label set P(x_t);
obtain from the candidate label set P(x_t) the label that occurs most often;
when C_s < N_t/2 or C_Pt < σ_Pt, wherein C_s is the number of segments contained in each data segment, C_Pt is the number of points of all segments in the data segment, and σ_Pt is a preset threshold, force-merge each segment in the data segment into a best-matching data segment selected according to the labels of the corresponding points, in the other frames, of the points in the data segment; and
update the labels of the remaining data segments.
16. The apparatus according to claim 14, wherein the motion estimation optimization module is specifically configured to:
obtain, according to the current motion, the affine transform model A_t^l in each frame of the segments corresponding to the data segment represented by each label l;
for frame t, calculate the optical flow according to the affine transform models A_t^l so as to minimize, for each pixel x_t of frame t, the sum of the Euclidean distances in color space to its corresponding points x_{t'} in the other frames t';
update the optical flow of frame t;
detect occlusion regions; and
explicitly correct the motion of the occlusion regions.
17. The apparatus according to claim 11 or 12, wherein the initial motion models comprise affine models;
and the motion model estimation unit is specifically configured to:
calculate the parameters of an affine transform to obtain an affine model fitting the initial motion;
obtain the affine motion field of each frame of image according to the affine model;
fuse the affine motion field with the initial optical flow field;
detect occlusion regions according to the consistency of the motion; and
correct the motion of the occlusion regions according to the affine motion of the segments where they are located.
18. The apparatus according to claim 17, wherein the clustering unit is specifically configured to:
sort all segments in a frame of image by size, and screen the affine models successively in descending order of area; and
while a segment s_i has an area greater than a preset area Size_s, or the affine model set {A} contains fewer than a preset number Num_A of models, compare the affine model A_i of segment s_i with {A}: if {A} contains a model A' whose Euclidean distance to A_i is less than a preset distance Th_A, discard the current affine model A_i, and otherwise add A_i to {A}, so as to obtain the minimal affine model set {A}.
19. The apparatus according to claim 17, wherein the merging unit is specifically configured to:
for each segment s, calculate the sum of the color differences of all pixels in the segment under the segment's current affine model;
compute the set {S_Nei} of candidate neighboring segments of segment s_i, wherein each candidate neighboring segment s' satisfies 2|s'| > |s|;
for each neighboring segment s' in {S_Nei}, calculate the motion consistency S_{s,s'} on the boundary between s and s', the consistency F_{s,s'} of the current motion of the pixels in s with the affine model of s', and the ratio C_{s,s'} of the color difference of the pixels in s under their own affine model to that under the affine model of s':

S_{s,s'} = \frac{\sum_{x \in s,\, y \in s',\, |x-y|=1} \big(\|v(x)-v(y)\| < Th_{Smooth}\big)}{\sum_{x \in s,\, y \in s',\, |x-y|=1} 1}

F_{s,s'} = \frac{\sum_{x \in s} \big(\|v(x) - (A_{s'}x - x)\| < Th_{Flow}\big)}{\sum_{x \in s} 1}

C_{s,s'} = \frac{\sum_{x \in s} \|I'(A_s x) - I(x)\|}{\sum_{x \in s} \|I'(A_{s'} x) - I(x)\|}

select from {S_Nei} the neighboring segments satisfying S_{s,s'} > 0.5, F_{s,s'} > 0.5, and C_{s,s'} > 0.5, take the neighbor with the largest C_{s,s'} as the candidate segment, denoted s*, and, when C_{s,s*} > θ_w, wherein θ_w is a preset ratio, merge s into s*; and
merge each segment s in this way, and, if the total number of merged segments exceeds half of the initial number of segments, perform the estimation of the affine models again.
20. The apparatus according to claim 11, wherein the initial optical flow comprises a forward optical flow and/or a backward optical flow.
CN201410578173.1A 2014-10-24 2014-10-24 Motion estimation method and apparatus Pending CN105590327A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410578173.1A CN105590327A (en) 2014-10-24 2014-10-24 Motion estimation method and apparatus

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201410578173.1A CN105590327A (en) 2014-10-24 2014-10-24 Motion estimation method and apparatus

Publications (1)

Publication Number Publication Date
CN105590327A true CN105590327A (en) 2016-05-18

Family

ID=55929882

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410578173.1A Pending CN105590327A (en) 2014-10-24 2014-10-24 Motion estimation method and apparatus

Country Status (1)

Country Link
CN (1) CN105590327A (en)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106210449A (en) * 2016-08-11 2016-12-07 上海交通大学 The frame rate up-conversion method for estimating of a kind of Multi-information acquisition and system
CN107465911A (en) * 2016-06-01 2017-12-12 东南大学 A kind of extraction of depth information method and device
CN108230305A (en) * 2017-12-27 2018-06-29 浙江新再灵科技股份有限公司 Method based on the detection of video analysis staircase abnormal operating condition
CN108492323A (en) * 2018-01-18 2018-09-04 天津大学 Merge the Submerged moving body detection and recognition methods of machine vision and the sense of hearing
CN110163888A (en) * 2019-05-30 2019-08-23 闽江学院 A kind of novel motion segmentation model quantity detection method
CN110798630A (en) * 2019-10-30 2020-02-14 北京市商汤科技开发有限公司 Image processing method and device, electronic equipment and storage medium

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5999651A (en) * 1997-06-06 1999-12-07 Matsushita Electric Industrial Co., Ltd. Apparatus and method for tracking deformable objects
CN101765022A (en) * 2010-01-22 2010-06-30 浙江大学 Depth representing method based on light stream and image segmentation
CN103237228A (en) * 2013-04-28 2013-08-07 清华大学 Time-space consistency segmentation method for binocular stereoscopic video

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5999651A (en) * 1997-06-06 1999-12-07 Matsushita Electric Industrial Co., Ltd. Apparatus and method for tracking deformable objects
CN101765022A (en) * 2010-01-22 2010-06-30 Zhejiang University Depth representation method based on optical flow and image segmentation
CN103237228A (en) * 2013-04-28 2013-08-07 Tsinghua University Spatio-temporal consistency segmentation method for binocular stereo video

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
GUOFENG ZHANG et al.: "Robust Bilayer Segmentation and Motion/Depth Estimation with a Handheld Camera", IEEE Transactions on Pattern Analysis and Machine Intelligence *
MICHAEL J. BLACK et al.: "Estimating Multiple Independent Motions in Segmented Images using Parametric Models with Local Deformations", Motion of Non-Rigid and Articulated Objects *
LU Hanqing et al.: "Image techniques in content-based retrieval for video and image databases", Acta Automatica Sinica *

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107465911A (en) * 2016-06-01 2017-12-12 Southeast University Depth information extraction method and device
CN107465911B (en) * 2016-06-01 2019-03-15 Southeast University Depth information extraction method and device
CN106210449A (en) * 2016-08-11 2016-12-07 Shanghai Jiao Tong University Multi-information fusion frame rate up-conversion motion estimation method and system
CN106210449B (en) * 2016-08-11 2020-01-07 上海交通大学 Multi-information fusion frame rate up-conversion motion estimation method and system
CN108230305A (en) * 2017-12-27 2018-06-29 Zhejiang Xinzailing Technology Co., Ltd. Escalator abnormal operating condition detection method based on video analysis
CN108492323A (en) * 2018-01-18 2018-09-04 Tianjin University Underwater moving target detection and recognition method fusing machine vision and hearing
CN110163888A (en) * 2019-05-30 2019-08-23 Minjiang University Novel motion segmentation model quantity detection method
CN110163888B (en) * 2019-05-30 2021-03-02 闽江学院 Novel motion segmentation model quantity detection method
CN110798630A (en) * 2019-10-30 2020-02-14 Beijing SenseTime Technology Development Co., Ltd. Image processing method and device, electronic equipment and storage medium

Similar Documents

Publication Publication Date Title
Zhang et al. Learning signed distance field for multi-view surface reconstruction
US11763433B2 (en) Depth image generation method and device
US10803546B2 (en) Systems and methods for unsupervised learning of geometry from images using depth-normal consistency
CN109003325B (en) Three-dimensional reconstruction method, medium, device and computing equipment
CN105993034B (en) Contour completion for enhanced surface reconstruction
CN110688905B (en) Three-dimensional object detection and tracking method based on key frame
CN105590327A (en) Motion estimation method and apparatus
US20130129190A1 (en) Model-Based Stereo Matching
CN114424250A (en) Structural modeling
CN109255833A (en) Wide-baseline dense three-dimensional scene reconstruction method based on semantic priors and progressive optimization
CN104200523A (en) Large-scale scene three-dimensional reconstruction method fusing additional information
EP3293700B1 (en) 3d reconstruction for vehicle
Zhang et al. Robust stereo matching with surface normal prediction
Liu et al. Deep learning based multi-view stereo matching and 3D scene reconstruction from oblique aerial images
CN107610148B (en) Foreground segmentation method based on binocular stereo vision system
Hu et al. Geometric feature enhanced line segment extraction from large-scale point clouds with hierarchical topological optimization
Yan et al. Efficient implicit neural reconstruction using lidar
Xu et al. Video-object segmentation and 3D-trajectory estimation for monocular video sequences
CN117315169A (en) Live-action three-dimensional model reconstruction method and system based on deep learning multi-view dense matching
Tian et al. HPM-TDP: An efficient hierarchical PatchMatch depth estimation approach using tree dynamic programming
Shi et al. Improved event-based dense depth estimation via optical flow compensation
CN113763474B (en) Indoor monocular depth estimation method based on scene geometric constraint
Nobis et al. Exploring the capabilities and limits of 3d monocular object detection-a study on simulation and real world data
CN113920254A (en) Indoor three-dimensional reconstruction method and system based on monocular RGB
Gao et al. Multi-target 3D reconstruction from RGB-D data

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 2016-05-18