CN101189870A - Motion stabilization - Google Patents


Info

Publication number
CN101189870A
Authority
CN
China
Prior art keywords
motion vector
block
segmentation
motion
resolution
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CNA2006800198266A
Other languages
Chinese (zh)
Inventor
Aziz Umit Batur
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Texas Instruments Inc
Original Assignee
Texas Instruments Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Texas Instruments Inc
Publication of CN101189870A
Legal status: Pending

Landscapes

  • Image Analysis (AREA)

Abstract

Stabilization for devices such as hand-held camcorders segments a low-resolution frame into a region of reliable estimation, refines the motion vectors of that region hierarchically while at the same time updating the segmentation, finds a global motion vector for the region at high resolution, and uses the global motion vector to compensate for jitter.

Description

Motion stabilization
Technical field
The present invention relates to digital signal processing, and more particularly to image stabilization methods and to imaging devices with electronic stabilization.
Background technology
The task of image stabilization (IS) is to remove jitter from video sequences captured by handheld cameras. Jitter is typically caused by unwanted hand shake during video recording and becomes a more serious problem at higher zoom ratios. For consumer digital cameras and camera phones, removing jitter from video sequences has become an increasingly important problem. Several different approaches to the image stabilization problem exist. One particular approach uses digital image processing techniques to remove the jitter; this approach is commonly called "digital image stabilization" (DIS).
A typical digital image stabilization method can be summarized as follows. Step 1, motion vector computation: compute a number of candidate motion vectors between two frames by finding correlations between blocks of pixels.
Step 2, global motion vector determination: process the candidate motion vectors from step 1 using heuristics to find the global jitter motion between the two frames. Step 3, motion compensation: compensate for the estimated jitter motion by digitally shifting the output image in the direction opposite to the motion.
See, for example, U.S. Patents 5,748,231; 5,563,652; and 6,628,711.
Summary of the invention
The invention provides digital image stabilization that estimates jitter by segmenting a low-resolution version of the input frame via motion vector analysis and hierarchically refining the motion vectors while updating the segmentation.
Description of drawings
Fig. 1 illustrates a hierarchical image representation.
Fig. 2 shows image segmentation.
Fig. 3 shows global motion estimation.
Fig. 4 illustrates motion compensation.
Figs. 5-6 are flowcharts.
Figs. 7-9 show an image pipeline, a processor, and network communication.
Embodiment
As an example, the first preferred embodiment method of digital image stabilization (DIS) for handheld video devices estimates jitter motion and compensates accordingly. Fig. 5 illustrates the jitter estimation, which includes the following steps. First, segment a low-resolution version of the input frame into valid (jitter motion) blocks and invalid (other motion) blocks by analyzing the motion of each block. Next, cluster the motion vectors to find candidate motion regions, and choose the best candidate motion region by a score that includes the difference between a smoothed average motion vector and that of the current frame, the number of blocks in the candidate motion region, and the number of blocks overlapping the chosen motion region of the previous frame. Then extend to higher resolutions by scaling and refining the motion vectors and updating the segmentation, until the region yields a global motion vector at the highest resolution. Stabilization applies the global motion vector (if one exists) to motion-compensate the frame. Fig. 6 is a flowchart of the entire method.
Preferred embodiment systems include camcorders, digital cameras, video cell phones, video display devices, and so forth that implement a preferred embodiment stabilization method. Fig. 7 shows a generic image processing pipeline; preferred embodiment stabilization could be integrated with the motion vector determination of an MPEG/JPEG function, although preferred embodiment stabilization requires no encoding or compression. Indeed, unstabilized video could be stabilized for display by applying a preferred embodiment method as part of the display process.
Preferred embodiment systems may be implemented in any of several types of hardware: digital signal processors (DSPs), general-purpose programmable processors, application-specific circuits, or systems on a chip (SoC) such as combinations of a DSP and a RISC processor together with various specialized programmable accelerators. Fig. 8 illustrates an example of a processor for digital camera applications, with a video processing subsystem in the upper left. A stored program in an onboard or external (flash EEPROM) ROM or FRAM could implement the signal processing. Analog-to-digital converters and digital-to-analog converters can provide coupling to the real world, modulators and demodulators (plus antennas for air interfaces) can provide coupling for transmission waveforms, and packetizers can provide formats for transmission over networks such as the Internet; see Fig. 9.
The first preferred embodiment DIS method includes the following three steps:
Step 1. Segmentation: compute a block-based segmentation of the frame by processing the top (lowest-resolution) level of a hierarchical representation of each frame.
Step 2. Global motion estimation: estimate the global motion vector of the frame using the segmentation computed in step 1 and the hierarchical representation.
Step 3. Motion compensation: compensate for the jitter motion in the current frame using the global motion vector computed in step 2.
The following sections describe these steps in detail.
Step 1: Segmentation
Each new frame in the sequence is processed to generate a hierarchical image representation as shown in Fig. 1. This hierarchical representation consists of several different-resolution versions of the original image. Each level of the representation can be obtained by low-pass filtering the next higher-resolution level (for example, with a Gaussian kernel) and then downsampling by a factor of 2 in each direction. Repeating this filtering and downsampling process produces successively reduced-resolution versions of the original frame. The number of levels varies with the size of the input frame. For example, for VGA input (640 × 480 pixels), a 4-level representation is used, giving 640 × 480, 320 × 240, 160 × 120, and 80 × 60 images. Roughly, the preferred embodiments use a top level (lowest resolution) of about 80 × 60 pixels. The hierarchical representations of two frames (the current frame and the previous frame) are kept in memory for use in motion estimation.
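The construction of this hierarchical representation can be sketched as follows (the 5-tap binomial approximation to the Gaussian kernel, the edge handling, and the function names are illustrative assumptions; the text only specifies Gaussian low-pass filtering followed by factor-of-2 downsampling):

```python
import numpy as np

def gaussian_blur(img):
    """Separable 5-tap binomial low-pass filter (an assumed stand-in for
    the Gaussian kernel mentioned in the text)."""
    k = np.array([1, 4, 6, 4, 1], dtype=np.float64) / 16.0
    tmp = np.apply_along_axis(lambda r: np.convolve(r, k, mode="same"), 1, img)
    return np.apply_along_axis(lambda c: np.convolve(c, k, mode="same"), 0, tmp)

def build_pyramid(frame, levels=4):
    """Return [full-res, ..., lowest-res]: each level is the previous one
    blurred, then downsampled by 2 in each direction."""
    pyr = [frame.astype(np.float64)]
    for _ in range(levels - 1):
        pyr.append(gaussian_blur(pyr[-1])[::2, ::2])
    return pyr
```

For a 640 × 480 VGA frame and levels=4 this yields the 640 × 480, 320 × 240, 160 × 120, and 80 × 60 images mentioned above, with the last (lowest-resolution) image serving as the top level.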
The top levels of the current-frame and previous-frame hierarchical representations are used to compute a block-based segmentation of the current frame. The purpose of this segmentation is to divide the frame into independently moving regions; all pixels in a particular region are assumed to have the same motion vector. The first preferred embodiment method uses 16 blocks as shown in Fig. 2: the top-level 80 × 60 frame is divided into a 4 × 4 array of blocks, each of size 16 × 12 pixels, plus a border region of width 6 or 8 pixels (used for motion vector computation). The blocks at the lower levels will thus be 32 × 24, 64 × 48, and 128 × 96 pixels.
To compute the segmentation, SAD-based motion estimation is first performed for each block at the top level, with the top level of the previous frame's hierarchical representation as the reference. A full search is used: the SAD (sum of absolute differences) is computed for every possible motion vector (MV), and the MV with the minimum SAD is chosen as the block MV. In particular, for the current frame at time t, let p_t(i, j) denote the (luminance) value of the pixel at (i, j), let p_{t−1}(m, n) denote a pixel of the previous frame, and let MV = [MVx, MVy] denote a possible motion vector of the block; then:
SAD(MV) = ∑_{(i,j)∈block} |p_t(i, j) − p_{t−1}(i + MVx, j + MVy)|
Because this process is carried out on two small low-resolution frames, its computational complexity is low. Once the minimum SAD is found, a reliability score is computed from the four SADs adjacent to the minimum in the vertical and horizontal directions, as follows. Let S denote the minimum SAD, and let St, Sb, Sr, and Sl denote the top, bottom, right, and left neighbors of the minimum SAD, respectively; that is, if V = [Vx, Vy] gives the block motion vector with minimum SAD S, then St is the SAD of motion vector [Vx, Vy−1], Sb is the SAD of [Vx, Vy+1], and so on. Compute the sum (St − S) + (Sb − S) + (Sr − S) + (Sl − S), which we call the SAD difference. This is a measure of the texture content of the block and indicates the reliability of its MV.
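A minimal sketch of the full-search block motion estimation and the SAD-difference reliability score described above (the function names, the square search window, and the handling of neighbors that fall outside the window are assumptions not specified in the text):

```python
import numpy as np

def sad(cur_block, prev, top, left, mv):
    """SAD between a current-frame block at (top, left) and the previous
    frame displaced by motion vector mv = (dy, dx)."""
    h, w = cur_block.shape
    y, x = top + mv[0], left + mv[1]
    return np.abs(cur_block - prev[y:y + h, x:x + w]).sum()

def full_search(cur_block, prev, top, left, radius):
    """Evaluate every candidate MV in a square window, return the minimizing
    MV and the reliability score (St-S)+(Sb-S)+(Sr-S)+(Sl-S)."""
    sads = {}
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            sads[(dy, dx)] = sad(cur_block, prev, top, left, (dy, dx))
    best = min(sads, key=sads.get)
    s = sads[best]
    nbrs = [(best[0] - 1, best[1]), (best[0] + 1, best[1]),
            (best[0], best[1] - 1), (best[0], best[1] + 1)]
    # Neighbors outside the window contribute zero -- an assumption, since
    # the text does not specify border handling.
    sad_diff = sum(sads.get(n, s) - s for n in nbrs)
    return best, sad_diff
```

With a synthetic previous frame and a block that is an exact shifted copy, the search recovers the true displacement and a positive SAD difference.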
Once the MVs of the blocks are available, they are clustered to identify the motion regions. Before proceeding with clustering, the least reliable MVs are removed. Two conditions identify the least reliable MVs:
1) If the horizontal or vertical magnitude of an MV is greater than a certain threshold, the MV is marked unreliable. Typical values for this threshold are in the range of 2 to 5.
2) If the SAD difference is less than a certain threshold, the MV is marked unreliable. This threshold could be the number of pixels in the block divided by 2, so for a 16 × 12 block the threshold would be 96.
After removing the unreliable MVs, candidate motion regions are found by clustering the MVs, where each cluster contains identical MVs; that is, a candidate motion region consists of all blocks having the same (reliable) MV. The number of clusters equals the number of distinct MVs. Each MV cluster now corresponds to a candidate motion region.
Next, each candidate motion region is grown by absorbing blocks whose MVs do not belong to the region, as follows. If the distance in vector space between an MV of a candidate motion region and a block's MV is less than or equal to 1, the block is absorbed into the candidate motion region. During this process some blocks may be included in more than one region; in other words, candidate motion regions may overlap.
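The clustering and region-growing steps can be sketched as follows (treating the MV distance as the max-norm is an assumption, since the text only requires the vector-space distance to be at most 1; the function name and data layout are likewise illustrative):

```python
def cluster_and_grow(block_mvs):
    """block_mvs: {block_id: (vx, vy)} for reliable blocks only.
    Returns a list of candidate motion regions (sets of block ids).
    Regions may overlap, matching the text."""
    # One cluster per distinct MV.
    clusters = {}
    for b, mv in block_mvs.items():
        clusters.setdefault(mv, set()).add(b)
    regions = []
    for mv, blocks in clusters.items():
        region = set(blocks)
        # Grow: absorb any block whose MV is within distance 1 of the
        # cluster MV (max-norm is an assumed distance).
        for b, other in block_mvs.items():
            if max(abs(mv[0] - other[0]), abs(mv[1] - other[1])) <= 1:
                region.add(b)
        regions.append(region)
    return regions
```

For example, blocks with MVs (1, 0), (1, 0), (2, 0), and (5, 5) give three clusters; the (2, 0) block is absorbed into the (1, 0) region during growing, while the (5, 5) block stays isolated.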
Among all available candidate motion regions, the best region for video stabilization is found by computing the score (PTS) of each region according to the following formula:
PTS = relative motion score + size score + overlap score
To compute the relative motion score, the relative motion of each region is computed first. For each block in a region, the relative motion between the object (in the block) and the camera is computed by accumulating the MVs over time with an autoregression of the form:
R_j^t = α R_j^{t−1} + (1 − α) V_j^t
where R_j^t is the relative motion of the j-th block in frame t, α is an accumulation factor, and V_j^t is the block motion vector in frame t. The first preferred embodiment uses α equal to 0.8. This equation implements low-pass filtering of V_j^t. Once the relative motion of each block is available, the relative motion of a region is found by averaging the relative motions of all blocks in that candidate motion region. After the relative motion of each region is obtained, the relative motion score can be computed as follows:
relative motion score = min((maxRelMotV − relMotV), (maxRelMotH − relMotH)),
where maxRelMotV and maxRelMotH are the maximum allowed relative motions in the vertical and horizontal directions, and relMotV and relMotH are the relative motions of the region in the vertical and horizontal directions. Typical values of maxRelMotV and maxRelMotH are in the range of 2 to 5. If the relative motion score of a motion region is negative, the region is marked unreliable and is no longer considered.
The size score is computed by multiplying the number of blocks in the region by a constant. The first preferred embodiment uses 0.1 as this constant.
The overlap score is computed by multiplying the number of blocks in the region that overlap the segmentation of the previous frame by a constant. The first preferred embodiment uses 0.4 for this constant.
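A sketch of the per-block autoregressive relative-motion update and the PTS score (α = 0.8 and the constants 0.1 and 0.4 come from the text; the maxRelMot value of 4 is one assumed choice from the stated 2-to-5 range, and the function names are illustrative):

```python
def update_relative_motion(prev_rel, mv, alpha=0.8):
    """Autoregression R_t = alpha*R_{t-1} + (1-alpha)*V_t, per component
    (a low-pass filter on the block motion vector)."""
    return tuple(alpha * r + (1 - alpha) * v for r, v in zip(prev_rel, mv))

def region_score(rel_motions, n_blocks, n_overlap,
                 max_rel=(4.0, 4.0), size_c=0.1, overlap_c=0.4):
    """PTS = relative motion score + size score + overlap score.
    rel_motions: accumulated (horizontal, vertical) relative motions of the
    blocks in the region. Returns None for an unreliable region (negative
    relative motion score)."""
    avg_h = sum(abs(r[0]) for r in rel_motions) / len(rel_motions)
    avg_v = sum(abs(r[1]) for r in rel_motions) / len(rel_motions)
    rel_score = min(max_rel[0] - avg_h, max_rel[1] - avg_v)
    if rel_score < 0:
        return None  # region marked unreliable, no longer considered
    return rel_score + size_c * n_blocks + overlap_c * n_overlap
```

For a two-block region with average relative motion 0.5 in each direction, one overlapping block, and max_rel = 4, the score is (4 − 0.5) + 0.1·2 + 0.4·1 = 4.1.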
Once the scores of all clusters are obtained, the cluster with the highest score is selected for video stabilization. If no cluster remains at this point (because all clusters had negative scores), video stabilization is disabled for the current frame.
If the selected cluster has too many blocks, it is reduced for computational complexity reasons. To reduce the number of blocks, the blocks with the smallest SAD differences are eliminated.
Step 2: Global motion estimation
Processing continues down the lower (higher-resolution) levels of the hierarchical representation. At each level, motion estimation is first performed for the blocks marked valid in the segmentation (from the higher level). Motion estimation refines the MVs to higher precision; see Fig. 3. With the increased precision of the motion information, motion regions that were indistinguishable at lower resolution can now be separated; therefore, the current segmentation is updated using the new motion vectors. To update the segmentation, a procedure similar to the one used at the top level is applied, but now using only the blocks marked valid in the current segmentation. The refined MVs are first clustered so that each cluster has identical MVs. Then each cluster is grown to find the candidate motion regions. During growing, a new MV is absorbed into a cluster if the maximum distance between it and the MVs in the cluster is less than or equal to 1. Note that a threshold of 1 at the current level corresponds to a threshold of 0.5 at the previous, higher level. As more accurate motion information appears at the lower levels of the hierarchical representation, the separation of the motion regions continues with greater precision. Once the candidate motion regions are found, the PTS of each is computed and the best cluster is chosen for video stabilization; this procedure is identical to the one used at the top level. If the selected cluster is too large, the least reliable blocks can be eliminated to reduce motion estimation complexity; the SAD difference is used to identify the least reliable blocks.
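One level of the motion-vector refinement can be sketched as follows (the refinement search radius of 1 and the function name are assumptions; the text only states that MVs are scaled and refined to higher precision level by level):

```python
import numpy as np

def refine_mv(cur_block, prev, top, left, coarse_mv, radius=1):
    """Scale the next-coarser level's MV by 2 (one pyramid level doubles
    the resolution) and re-search a small window around the scaled vector
    at the current level. Returns (refined_mv, its SAD)."""
    h, w = cur_block.shape
    base = (2 * coarse_mv[0], 2 * coarse_mv[1])
    best, best_sad = None, np.inf
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            y = top + base[0] + dy
            x = left + base[1] + dx
            if y < 0 or x < 0 or y + h > prev.shape[0] or x + w > prev.shape[1]:
                continue  # candidate falls outside the previous frame
            s = np.abs(cur_block - prev[y:y + h, x:x + w]).sum()
            if s < best_sad:
                best, best_sad = (base[0] + dy, base[1] + dx), s
    return best, best_sad
```

A radius-1 window around the doubled coarse vector suffices here because the coarse MV is already accurate to within one pixel at its own level.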
The above procedure is repeated for the successive lower levels of the hierarchical representation. At the lowest level, once the MVs are clustered, no region growing is allowed; only the PTS of each cluster is computed and the best one is chosen for video stabilization. Because regions are not grown at the lowest level, the winning cluster consists of blocks that all have the same MV. This motion vector is the global MV used to stabilize the frame.
If at any point during the global motion estimation process all blocks are eliminated, the current frame is unsuitable for motion estimation, so video stabilization is disabled. Disabling video stabilization means that the crop-window position will not be updated in the current frame.
Step 3: Motion compensation
For each frame, the preferred embodiment method crops a subwindow from the image and shows it to the viewer, as illustrated in Fig. 4. If this subwindow is moved appropriately in the direction opposite to the (estimated) jitter motion, the viewer does not observe the jitter. The subwindow is moved using the following equation:
U_t = K_t U_{t−1} − W_t
where U_t denotes the coordinates of the upper-left corner of the subwindow in the current frame, W_t is the estimated global MV of the current frame, and K_t is an adaptive accumulation factor. This equation is applied separately to the vertical and horizontal coordinates of the subwindow's upper-left corner. The reference point for U_t is the neutral position of the window in the middle of the frame, so that U_t is zero in the first video frame, where the window has not moved from its initial position. K_t varies linearly between a minimum and a maximum according to how far the subwindow is from its neutral position. The value of K_t is computed as follows:
K_t = [(K_min − K_max) ||U_{t−1}|| / U_max] + K_max
where ||U_{t−1}|| is the sum of the absolute values of the components of U_{t−1}, U_max is the maximum allowed offset of the subwindow from its neutral position, K_max is the maximum value of K_t, and K_min is the minimum value of K_t. The first preferred embodiment uses K_max = 1 and K_min = 0.85.
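The adaptive crop-window update of Step 3 can be sketched as follows (K_max = 1 and K_min = 0.85 come from the text; U_max = 16 and the function name are assumptions):

```python
def update_window(u_prev, w_t, u_max=16.0, k_max=1.0, k_min=0.85):
    """Crop-window update U_t = K_t * U_{t-1} - W_t, per coordinate.
    K_t = (K_min - K_max) * ||U_{t-1}|| / U_max + K_max shrinks toward
    K_min as the window drifts from its neutral position (0, 0), pulling
    the window back so it stays inside the frame."""
    norm = abs(u_prev[0]) + abs(u_prev[1])  # sum of absolute components
    k_t = (k_min - k_max) * norm / u_max + k_max
    return (k_t * u_prev[0] - w_t[0], k_t * u_prev[1] - w_t[1])
```

At the neutral position K_t equals K_max = 1, so the full global MV is cancelled; at the maximum offset K_t drops to K_min = 0.85, gradually recentering the window.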
Modifications
The preferred embodiments can be modified in various ways while retaining one or more of the features of segmenting a low-resolution frame, refining the motion vectors of the segmented region at higher resolutions while updating the segmentation to obtain high-resolution motion information, and updating a window position with a global motion vector.
For instance, the block array used for segmentation could vary with the pixel count of the lowest-resolution version of the frame (e.g., 3000 to 8000 pixels) and with the aspect ratio (e.g., a 4 × 5 aspect ratio for portrait, 4 × 3, a 16 × 9 aspect ratio for HDTV, and so forth); for example, a 3 × 3 array for a 4 × 3 aspect ratio with 3000 pixels and an 8 × 5 array for a 16 × 9 aspect ratio with 8000 pixels. That is, anywhere from 9 to 40 blocks would be suitable. The stabilization could generally be applied to fields; that is, it could be applied to top fields and bottom fields separately or to combined-field blocks. The lowest-resolution full search could be replaced by a restricted search. The SAD measurement could be replaced by other measurements of motion vector prediction error, such as SSD (sum of squared differences), or by a subsampled SAD. One or more levels of the hierarchical representation could be skipped during refinement (with corresponding threshold adjustments), and different low-pass filtering and downsampling methods could define the hierarchy. When the motion estimation process reaches the highest resolution of the hierarchical representation, it could continue refining the motion vectors and updating the segmentation at subpixel resolutions (e.g., half-pixel, quarter-pixel, and so on). If a low-resolution version of the input frame is already available (for example, the low-pass/low-pass subband of a wavelet decomposition), it could be used to speed up generation of the hierarchical representation. Different equations that take the current position into account could be used to compute the stabilized subwindow position. The various sizes, thresholds, accumulation factors, and so forth could be changed.

Claims (10)

1. A method of digital image stabilization, said method comprising the steps of:
(a) providing a low-resolution version of an input digital picture;
(b) partitioning said low-resolution version into reliable blocks and unreliable blocks according to motion prediction error;
(c) clustering said reliable blocks into candidate motion regions according to the block motion vectors of said reliable blocks;
(d) finding a segmentation of said low-resolution version from said candidate motion regions;
(e) updating said segmentation to a segmentation of said input picture;
(f) finding a global motion vector of said input picture from said segmentation of said input picture; and
(g) compensating for jitter motion in said input picture using said global motion vector.
2. the method for claim 1, wherein saidly find out segmentation and comprise that the score with described candidate motion region compares, wherein Qu Yu described score comprises: (i) motion of average piece relatively is with peaked poor, (ii) to the tolerance of the quantity of reliable piece described in the zone, and (iii) to the also tolerance of the quantity of the described reliable piece in the segmentation of the low-definition version of picture formerly in the zone.
3. the method for claim 1, comprise wherein said cutting apart: for each of described, with described described predicated error and motion vector equal described motion vector in one of two components of described motion vector, add be 1 increment or decrement predicated error difference and compare with threshold value.
4. the method for claim 1, the described segmentation of wherein said renewal is included in first the upgrading second of the described segmentation add the described input picture of being fragmented into of the above mid-resolution version and upgrade of segmentation of the mid-resolution version of described input picture, and wherein said mid-resolution version has: (i) resolution higher than described low-definition version; And (ii) than the low resolution of described input picture.
5. A method of jitter compensation for video frames, comprising the steps of:
(a) providing a resolution hierarchy F_1, F_2, ..., F_N for an input frame, wherein N is an integer greater than 2 and said input frame equals F_N;
(b) partitioning said F_1 into blocks plus a border region;
(c) for each of said blocks: (i) computing a block motion vector; (ii) computing prediction errors for said motion vector and for motion vectors differing from said motion vector by 1; and (iii) designating said block a reliable block when the sum of the differences of said prediction errors exceeds a first threshold;
(d) clustering said reliable blocks into candidate motion regions according to the block motion vectors of said reliable blocks;
(e) selecting a segmentation for F_1 from said candidate motion regions by comparing said regions according to: (i) the number of blocks; (ii) the number of blocks corresponding to blocks in the segmentation of the hierarchy of a previous frame; and (iii) the average of the associated block motion vectors;
(f) for n = 2, ..., N, selecting a segmentation for F_n by repeating steps (c)-(e) for the blocks corresponding to the blocks of said segmentation of F_{n−1};
(g) computing a global motion vector from said segmentation of F_N; and
(h) jitter-compensating said input frame using said global motion vector.
6. The method of claim 5, wherein said associated block motion vectors are low-pass filterings of said block motion vectors.
7. The method of claim 5, wherein said jitter compensation includes adaptive accumulation of said global motion vector to define the position of a subwindow in said input frame.
8. The method of claim 5, wherein said partitioning into blocks yields between 9 and 40 blocks.
9. A video camera, comprising:
(a) an image capturer;
(b) a jitter estimator coupled to said image capturer, said jitter estimator including:
(i) a resolution hierarchy downsampler;
(ii) a segmenter coupled to said downsampler, said segmenter operable to find a segmentation plus motion vectors for each level of a resolution hierarchy computed in turn from an input image;
(iii) a global motion vector determiner coupled to said segmenter;
(c) a memory for the resolution hierarchy of a previous frame, said memory coupled to said jitter estimator; and
(d) a jitter motion compensator coupled to said jitter estimator.
10. The video camera of claim 9, wherein:
(a) said jitter estimator is implemented as a program on a programmable processor.
CNA2006800198266A 2005-04-28 2006-04-28 Motion stabilization Pending CN101189870A (en)

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
US67608805P 2005-04-28 2005-04-28
US60/676,088 2005-04-28
US11/233,445 2005-09-22

Publications (1)

Publication Number Publication Date
CN101189870A true CN101189870A (en) 2008-05-28

Family

ID=39481201

Family Applications (1)

Application Number Title Priority Date Filing Date
CNA2006800198266A Pending CN101189870A (en) 2005-04-28 2006-04-28 Motion stabilization

Country Status (1)

Country Link
CN (1) CN101189870A (en)


Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102238380A (en) * 2010-04-22 2011-11-09 奇景光电股份有限公司 Hierarchical motion estimation method and system
CN102238380B (en) * 2010-04-22 2013-07-17 奇景光电股份有限公司 Hierarchical motion estimation method and system
CN104813656A (en) * 2012-11-29 2015-07-29 阿尔卡特朗讯公司 A videoconferencing server with camera shake detection
US9762856B2 (en) 2012-11-29 2017-09-12 Alcatel Lucent Videoconferencing server with camera shake detection
CN104813656B (en) * 2012-11-29 2018-03-27 阿尔卡特朗讯公司 Videoconference server with DE Camera Shake detection
CN107851302A (en) * 2015-10-14 2018-03-27 谷歌有限责任公司 Stable video
US10986271B2 (en) 2015-10-14 2021-04-20 Google Llc Stabilizing video
CN107851302B (en) * 2015-10-14 2021-07-06 谷歌有限责任公司 Stabilizing video
CN112740654A (en) * 2018-09-19 2021-04-30 高途乐公司 System and method for stabilizing video
CN112740654B (en) * 2018-09-19 2023-12-19 高途乐公司 System and method for stabilizing video

Similar Documents

Publication Publication Date Title
US7605845B2 (en) Motion stabilization
US7649549B2 (en) Motion stabilization in video frames using motion vectors and reliability blocks
CN1992789B (en) Motion estimator and motion method
US8385418B2 (en) Dominant motion estimation for image sequence processing
JP5012413B2 (en) Jitter estimation method and jitter estimation apparatus
CN100389601C (en) Video electronic flutter-proof device
US20090153730A1 (en) Method and apparatus for modifying a moving image sequence
CN102124745A (en) Apparatus and method for converting 2D image signals into 3D image signals
KR0182058B1 (en) Apparatus and method of multi-resolution circulating search for motion estimation
US20130016180A1 (en) Image processing apparatus, method, and program
CN101163247A (en) Interpolation method for a motion compensated image and device for the implementation of said method
US20050195324A1 (en) Method of converting frame rate of video signal based on motion compensation
US20060159177A1 (en) Motion estimation method, device, and system for image processing
US20040109503A1 (en) High speed motion vector estimation apparatus and method
US20140334547A1 (en) Refining motion vectors in video motion estimation
CN101189870A (en) Motion stabilization
US8665318B2 (en) Digital video coding
KR20000063035A (en) Image processing device, image processing method, and storage medium
JP4523024B2 (en) Image coding apparatus and image coding method
JP5448983B2 (en) Resolution conversion apparatus and method, scanning line interpolation apparatus and method, and video display apparatus and method
JPH08242454A (en) Method for detecting global motion parameter
KR100856474B1 (en) Stabilizing method for shaking video image
JP3271387B2 (en) Motion amount detection device and motion amount detection method
Barreto et al. Real-time super-resolution over raw video sequences
JP4100321B2 (en) Segment unit image encoding apparatus and segment unit image encoding program

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C02 Deemed withdrawal of patent application after publication (patent law 2001)
WD01 Invention patent application deemed withdrawn after publication

Open date: 20080528