CN101742122B - Method and system for removing video jitter - Google Patents

Method and system for removing video jitter

Info

Publication number
CN101742122B
Authority
CN
China
Prior art keywords
current image
image frame
frame
sampling
video
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN2009102427956A
Other languages
Chinese (zh)
Other versions
CN101742122A (en)
Inventor
黄磊
刘昌平
姚波
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hanwang Technology Co Ltd
Original Assignee
Hanwang Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hanwang Technology Co Ltd filed Critical Hanwang Technology Co Ltd
Priority to CN2009102427956A priority Critical patent/CN101742122B/en
Publication of CN101742122A publication Critical patent/CN101742122A/en
Application granted granted Critical
Publication of CN101742122B publication Critical patent/CN101742122B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Abstract

The invention provides a method and a system for removing video jitter. The method comprises the following steps: 1. registering the current image frame of a video against its adjacent image frame; 2. accumulating the trajectory information of the jitter produced during registration, smoothing the resulting motion trajectory, and correcting the current image frame in the direction opposite to the motion trajectory; 3. filling the blank area produced at the edge of the current image frame with an image frame adjacent to it; and 4. advancing to the next image frame and returning to the first step until all image frames of the video have been processed. The method and system process frames quickly, are independent of image contrast, achieve relatively high registration precision, ensure that the filled image has good continuity and consistency at the edges, and reduce traces of artificial processing.

Description

Method and system for removing video jitter
Technical field
The invention belongs to the field of image processing and in particular relates to techniques for removing video jitter.
Background art
When a camera is mounted on a building or a pole, it sways in the wind; cameras mounted on vehicles (such as cars, aircraft, and ships), on heating and ventilation equipment, on air conditioners, or on PTZ pan-tilt heads are also subject to vibration and output unstable, jittery video. The jitter is especially severe when a high-magnification lens is used, seriously degrading the visual effect. For video output by such cameras, existing de-jittering methods often cannot process frames in real time, or they reduce the resolution of the processed video.
The core task in video de-jittering is registration between consecutive image frames, and the precision of this registration directly determines the quality of the de-jittering result. Methods commonly used for image registration include optical flow, shape-context-based methods, and corner detection and matching. Optical flow has high computational complexity and is difficult to run on video in real time. Shape-context-based methods are better suited to content matching and retrieval between images. Corner detection and matching methods mainly include SIFT detection and matching, the Harris corner detector, and the SUSAN corner detector. SIFT detection and matching is scale- and rotation-invariant and can register images accurately, but its computational complexity is high and real-time processing is difficult. The Harris and SUSAN corner detectors are relatively fast and can locate corners in the image accurately, but when image contrast is low or the image offset is large, registration accuracy drops and the continuity and consistency at the edges are poor.
Summary of the invention
The object of the present invention is to provide a method and a system for removing video jitter that smooth the jitter trajectory produced when registering the original image frames of a video and fill the blank areas that result.
This method comprises the following steps:
Step 1: register the current image frame of the video with respect to its adjacent image frame;
Step 2: accumulate the trajectory information of the jitter produced during registration, smooth the resulting motion trajectory, and correct the current image frame in the direction opposite to the motion trajectory;
Step 3: use an image frame adjacent to the current image frame to fill the blank area produced at the edge of the current image frame by the smoothing;
Step 4: advance to the next image frame and return to Step 1 until all image frames of the video have been processed.
Step 1 comprises the following sub-steps:
Step a: calculate the maximum down-sampling scale of the current image frame, and down-sample the current image frame and its adjacent image frame according to this maximum scale to generate down-sampled images;
Step b: compute the texture feature of each pixel of the down-sampled images;
Step c: compute the cost function between the down-sampled images of the two frames from the per-pixel texture features, thereby obtaining the motion direction of the down-sampled images;
Step d: correct the current image frame according to the motion direction, and obtain the trajectory information of the current image frame in the row and column directions;
Step e: reduce the down-sampling scale, down-sample the corrected image, and return to Step b until the down-sampling scale reaches 0.
In Step a, the maximum down-sampling scale τ is
τ = max{ i − γ | 2^i ≤ min{H, W} }
where i denotes the critical down-sampling scale: when the down-sampling scale exceeds i, the down-sampled image becomes smaller than 2 × 2 pixels. γ is a relative parameter, H denotes the image height, and W denotes the image width.
In Step b, the texture feature is represented by a three-dimensional vector of the image's gray value and its horizontal and vertical gradients, as follows:
f(x, y) = [ I(x, y), I′_x(x, y), I′_y(x, y) ]
where I(x, y) is the gray value of the image at pixel (x, y), and I′_x(x, y) and I′_y(x, y) are the horizontal and vertical gradients of the image at pixel (x, y).
In Step c, the cost function is
C(u, v) = (1 / (H × W)) · Σ_{x=1..H} Σ_{y=1..W} ( f_1(x, y) − f_2(x + u, y + v) )²
where u is the pixel offset in the horizontal direction, v is the pixel offset in the vertical direction, and both u and v take values in {−1, 0, 1}; C(u, v) is the cost of moving between the two frames in direction (u, v); H denotes the image height and W the image width; f_1(x, y) is the texture feature of the image at pixel (x, y), and f_2(x + u, y + v) is the texture feature of the adjacent image frame at pixel (x + u, y + v).
In Step c, the motion direction of the image is computed as
(u′, v′) = the (u, v) minimizing C(u, v) over u, v ∈ {−1, 0, 1}, if that minimum is at most φ; otherwise (u′, v′) = (0, 0),
where φ is a decision threshold and (u′, v′) is the motion direction between the two adjacent frames.
In Step e, the corrected image is down-sampled using a pyramid model.
In Step 2, when the trajectory information of the jitter produced during registration is accumulated, the number of pixels δ_0r that the current frame must move in the row direction and the number of pixels δ_0c that it must move in the column direction are first computed:
δ_0r = p_0r − (1 / (2a + 1)) · Σ_{i=−a..a} p_ir
δ_0c = p_0c − (1 / (2a + 1)) · Σ_{j=−a..a} p_jc
where p_ir is the accumulated row-direction displacement on the motion trajectory of the image i frames away from the current image frame, p_jc is the accumulated column-direction displacement on the motion trajectory of the image j frames away from the current image frame, a is the number of frames on each side of the current frame used to filter the jitter, p_0r is the accumulated row-direction displacement of the current image frame, p_0c is its accumulated column-direction displacement, and i and j index the neighbouring frames. The current image frame is then moved in the opposite direction by δ_0r and δ_0c to smooth the motion trajectory produced by the jitter.
In Step 3, a shared edge constraint value between the current image frame and an image frame adjacent to it is computed. When the shared edge constraint value is less than a set threshold, the corresponding region of the adjacent image frame is used to fill the blank area produced at the edge of the current image frame by the smoothing; otherwise the next frame after that adjacent image frame is tested against the threshold, until the blank area at the edge of the current image frame has been completely filled.
In Step 3, the shared edge constraint value S_k(x_0, y_0) is
S_k(x_0, y_0) = S_0k(x_0, y_0) + S_k0(x_0, y_0)
where
S_0k(x_0, y_0) = (1/m) · Σ_{(x,y)∈ω_1(x_0,y_0)} ( I(x, y) − Ī(x_0^k, y_0^k) )²
is the shared edge constraint of the current image frame with respect to the adjacent k-th frame at (x_0, y_0), and
S_k0(x_0, y_0) = (1/m) · Σ_{(x,y)∈ω_2(x_0,y_0)} ( I_k(x, y) − Ī(x_0^k, y_0^k) )²
is the shared edge constraint of the adjacent k-th frame with respect to the current image frame at (x_0, y_0); (x_0, y_0) is the coordinate of a pixel on the edge shared by the current image frame and the adjacent k-th frame; (x, y) is a pixel coordinate in the display window; m is the number of pixels in the set neighbourhood centred at (x_0, y_0); ω_1(x_0, y_0) is the part of that neighbourhood covered by the current image frame and ω_2(x_0, y_0) the part covered by the adjacent k-th frame; I(x, y) is the brightness of the current image frame at (x, y), I_k(x, y) is the brightness of the adjacent k-th frame at (x, y), and
Ī(x_0^k, y_0^k) = (1/m) · ( Σ_{(x,y)∈ω_1(x_0,y_0)} I(x, y) + Σ_{(x,y)∈ω_2(x_0,y_0)} I_k(x, y) )
is the mean brightness of the current image frame and the adjacent k-th frame over the neighbourhood centred at (x_0, y_0).
The present invention also provides a system for removing video jitter, comprising:
a registration device for registering the current image frame of the video with respect to its adjacent image frame;
a correction device for accumulating the trajectory information of the jitter produced by registering the current image frame, smoothing the resulting motion trajectory, and correcting the current image frame in the direction opposite to the motion trajectory; and
a filling device for using an image frame adjacent to the current image frame to fill the blank area produced at the edge of the current image frame by the smoothing.
The registration device comprises:
a down-sampling unit for calculating the maximum down-sampling scale of the current image frame, down-sampling the current image frame and its adjacent image frame at that scale to generate down-sampled images, and then reducing the down-sampling scale and down-sampling the image corrected by the motion-direction correction unit until the down-sampling scale reaches 0;
a computing unit for computing the texture feature of each pixel of the down-sampled images;
a motion-direction determination unit for computing, from the per-pixel texture features, the cost function between the down-sampled images of the two frames, thereby obtaining the motion direction of the down-sampled images; and
a motion-direction correction unit for correcting the current image frame according to the motion direction and obtaining the trajectory information of the current image frame in the row and column directions.
When the correction device accumulates the trajectory information of the jitter produced during registration, it first computes the number of pixels δ_0r that the current frame must move in the row direction and the number of pixels δ_0c that it must move in the column direction:
δ_0r = p_0r − (1 / (2a + 1)) · Σ_{i=−a..a} p_ir
δ_0c = p_0c − (1 / (2a + 1)) · Σ_{j=−a..a} p_jc
where p_ir is the accumulated row-direction displacement on the motion trajectory of the image i frames away from the current image frame, p_jc is the accumulated column-direction displacement on the motion trajectory of the image j frames away from the current image frame, a is the number of frames on each side of the current frame used to filter the jitter, p_0r is the accumulated row-direction displacement of the current image frame, p_0c is its accumulated column-direction displacement, and i and j index the neighbouring frames. The current image frame is then moved in the opposite direction by δ_0r and δ_0c to smooth the motion trajectory produced by the jitter.
Compared with the prior art, the method and system for removing video jitter of the present invention register image frames at multiple sampling scales using texture features. Processing is fast, depends only on the image size and not on its contrast, uses a cost function to determine the motion direction of the image, and corrects the image along that direction, which keeps registration accurate even when the image contrast is low or the image offset is large. Adjacent image frames are used to fill the current frame, so the filled frame has good continuity and consistency at its edges and traces of artificial processing are reduced; the output video has the same resolution as the original video and preserves its sharpness and continuity.
Description of drawings
Fig. 1 is a flow chart of the method for removing video jitter of the present invention;
Fig. 2 is a flow chart of the image-frame registration in the method for removing video jitter of the present invention;
Fig. 3 is a schematic diagram of the pyramid model used in the method for removing video jitter of the present invention;
Fig. 4 is a schematic block diagram of the system for removing video jitter of the present invention;
Fig. 5a shows the motion trajectory of a video in the row direction and the trajectories after smoothing to various degrees;
Fig. 5b shows the motion trajectory of a video in the column direction and the trajectories after smoothing to various degrees;
Fig. 6 is a schematic diagram of the shared edge constraint between the current frame and an image frame adjacent to it;
Fig. 7a shows the registration accuracy of the different methods for images with different offset ratios;
Fig. 7b shows the registration accuracy of the different methods for images with different contrasts;
Fig. 7c shows the influence of image contrast on the number of corners detected;
Fig. 7d shows the time required by the three methods to register images with different contrasts.
Embodiment
The embodiments of the invention are described in detail below with reference to the accompanying drawings.
In this embodiment, 8 five-minute jittery videos of different scenes and 30 personally recorded jittery videos lasting from 3 to 54 seconds are used. The image frames are 320 × 240 pixels, and the computer is configured with a Pentium(R) D 3 GHz CPU and 1 GB of memory. A total of 100 image frames with different contrasts and sharpness were extracted from videos of different scenes for illustration. Each frame was given a random offset of varying degree, with the offset ratio ranging from 0 up to 1/2 of the image width or height, in both the horizontal and the vertical direction.
Fig. 1 is a flow chart of the method for removing video jitter of the present invention:
Step 1: register the current image frame of the video with respect to its adjacent image frame.
This step is shown in detail in Fig. 2:
Step a: calculate the maximum down-sampling scale of the current image frame and down-sample the original image according to it, generating down-sampled images. The maximum down-sampling scale τ is
τ = max{ i − γ | 2^i ≤ min{H, W} }
where i denotes the critical down-sampling scale: when the down-sampling scale exceeds i, the down-sampled image becomes smaller than 2 × 2 pixels. γ is a relative parameter introduced to guarantee registration accuracy: τ is adjusted through the value of γ so that the down-sampled image is no smaller than 2 × 2 pixels, which preserves the precision of the registration. The larger γ is, the higher the registration precision, but the fewer pixels are correctly registered; in this embodiment γ is set to 2. H denotes the image height and W the image width. The larger the image, the higher the precision of the global texture registration. The value of γ affects both the registration accuracy and the range of image offsets that can be registered. As shown in Fig. 3, with γ = 2 the down-sampled image in this embodiment is at least 3 × 3 pixels, which provides enough precision for registration while allowing the registration range to reach half of the image width or height, satisfying the registration requirements of severely jittery video.
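For illustration only, a minimal Python sketch of this computation (the function name is ours, not part of the patent) is:

```python
# Illustrative sketch, not part of the patent: maximum down-sampling scale
# tau = max{ i - gamma | 2**i <= min(H, W) } for a frame of height H and width W.
import math

def max_downsampling_scale(height, width, gamma=2):
    """The critical scale i is the largest integer with 2**i <= min(H, W);
    gamma (2 in this embodiment) trades registration range against accuracy."""
    i = int(math.floor(math.log2(min(height, width))))
    return max(i - gamma, 0)

# For the 320 x 240 frames of this embodiment: i = 7, so tau = 5.
print(max_downsampling_scale(240, 320))  # -> 5
```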
Step b: compute the texture feature of each pixel of the down-sampled images.
The texture feature is represented by a three-dimensional vector of the image's gray value and its horizontal and vertical gradients, as follows:
f(x, y) = [ I(x, y), I′_x(x, y), I′_y(x, y) ]
where I(x, y) is the gray value of the image at pixel (x, y), and I′_x(x, y) and I′_y(x, y) are the horizontal and vertical gradients of the image at pixel (x, y).
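A minimal sketch of this feature computation, assuming a grayscale NumPy image and central-difference gradients (the function name is ours), could be:

```python
# Illustrative sketch: per-pixel texture feature f(x, y) = [I, I'_x, I'_y].
import numpy as np

def texture_features(gray):
    """Return an H x W x 3 array of (gray value, horizontal gradient, vertical gradient)."""
    gray = np.asarray(gray, dtype=np.float32)
    grad_y, grad_x = np.gradient(gray)  # gradients along rows (vertical) and columns (horizontal)
    return np.stack([gray, grad_x, grad_y], axis=-1)
```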
Step c: compute the cost function between the down-sampled images of the two frames from the per-pixel texture features, thereby obtaining the motion direction of the down-sampled images.
The cost function is
C(u, v) = (1 / (H × W)) · Σ_{x=1..H} Σ_{y=1..W} ( f_1(x, y) − f_2(x + u, y + v) )²
where u is the pixel offset in the horizontal direction, v is the pixel offset in the vertical direction, and both u and v take values in {−1, 0, 1}; C(u, v) is the cost of moving between the two frames in direction (u, v); H denotes the image height and W the image width; f_1(x, y) is the texture feature of the image at pixel (x, y), and f_2(x + u, y + v) is the texture feature of the adjacent image frame at pixel (x + u, y + v).
The motion direction of the image is computed as
(u′, v′) = the (u, v) minimizing C(u, v) over u, v ∈ {−1, 0, 1}, if that minimum is at most φ; otherwise (u′, v′) = (0, 0),
where φ is a decision threshold and (u′, v′) is the motion direction between the two adjacent frames. A minimum cost greater than the threshold φ means that the two frames show entirely different scenes; otherwise the motion direction (u′, v′) of the two adjacent frames is obtained.
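A sketch of this search over the nine candidate offsets, evaluated on the overlapping region of the two feature maps (the axis convention and function names are ours, and the mean is taken over the overlap rather than over all H × W pixels), might look like:

```python
# Illustrative sketch: evaluate C(u, v) for u, v in {-1, 0, 1} and pick the
# motion direction; phi is the scene-change decision threshold.
import numpy as np

def motion_direction(feat_cur, feat_adj, phi):
    """feat_cur, feat_adj: H x W x 3 texture-feature arrays of the two frames."""
    H, W, _ = feat_cur.shape
    best_cost, best_uv = None, (0, 0)
    for u in (-1, 0, 1):
        for v in (-1, 0, 1):
            # compare f1 at (x, y) with f2 at (x + u, y + v) where both are defined
            a = feat_cur[max(0, -u):H - max(0, u), max(0, -v):W - max(0, v)]
            b = feat_adj[max(0, u):H - max(0, -u), max(0, v):W - max(0, -v)]
            cost = np.mean((a - b) ** 2)
            if best_cost is None or cost < best_cost:
                best_cost, best_uv = cost, (u, v)
    return best_uv if best_cost <= phi else (0, 0)
```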
Step d: correct the original image according to the motion direction, and obtain the trajectory information of the original image in the row and column directions.
The maximum offset ratio is divided into 10 groups ranging from 1/10 to 1/2; the specific offset ratios are listed in Table 1. For example, any image in the 1/2 group is offset by between 0 and 1/2 of its width or height. Each image frame is tested with all 10 maximum offset ratios, 100 randomly offset images are generated per group, and the offset direction and number of offset pixels are recorded to obtain the trajectory information of the original image in the row and column directions.
Table 1. Registration accuracy at different offset scales
Offset ratio:          1/10   1/8    1/6    1/5    1/4.5  1/4    1/3.5  1/3    1/2.5  1/2
Registration accuracy: 0.991  0.989  0.982  0.979  0.972  0.955  0.916  0.836  0.735  0.631
Step e: reduce the down-sampling scale, down-sample the corrected image to generate new down-sampled images, and return to Step b; processing ends when the down-sampling scale reaches 0.
In this embodiment, the corrected image is down-sampled using a pyramid model. As shown in Fig. 3, the bottom layer is the original image; one 2 × 2 down-sampling produces the middle-layer image, each of whose pixels is the average of a 2 × 2 block of pixels of the original image. Returning to Step b, the texture feature of each pixel is computed, the cost function between the down-sampled images of the two frames is evaluated from those features, and the resulting motion direction of the down-sampled images is (1, 1), meaning that the middle-layer image is offset by 1 pixel horizontally and 1 pixel vertically. The original image is corrected according to this registration result, giving trajectory information of (2, 2) for the original image in the row and column directions.
The down-sampling scale is then changed again: down-sampling the middle-layer image by 2 × 2 (equivalent to 4 × 4 down-sampling of the original image) produces the top-layer image, each of whose pixels is the average of a 2 × 2 block of pixels of the middle-layer image. Returning to Step b, the texture features are computed, the cost function between the down-sampled images of the two frames is evaluated, and the motion direction of the down-sampled images is again (1, 1), meaning that the top-layer image is offset by 1 pixel horizontally and 1 pixel vertically. The middle-layer image is corrected according to this registration result, giving trajectory information of (2, 2) for the middle-layer image and hence (4, 4) for the original image in the row and column directions. At this point the down-sampling scale is 0 and processing ends.
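A minimal sketch of the 2 × 2 averaging that builds each pyramid level (the function name is ours; odd-sized images are cropped here, an assumption not stated in the patent) could be:

```python
# Illustrative sketch: one pyramid level is built by replacing every 2 x 2
# block of pixels with its mean, as described for Fig. 3.
import numpy as np

def downsample_2x2(img):
    """Return an image half the size in each dimension; trailing odd rows/columns are cropped."""
    img = np.asarray(img, dtype=np.float32)
    H, W = (img.shape[0] // 2) * 2, (img.shape[1] // 2) * 2
    img = img[:H, :W]
    return img.reshape(H // 2, 2, W // 2, 2).mean(axis=(1, 3))

# Applied twice to a 320 x 240 frame this yields the middle (160 x 120)
# and top (80 x 60) pyramid layers.
```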
Step 2: accumulate the trajectory information of the jitter produced during registration, smooth the resulting motion trajectory, and correct the image in the direction opposite to the motion trajectory.
When the trajectory information of the jitter produced during registration is accumulated, 100 image frames in total are processed in this embodiment; curve 501 in Fig. 5a is the motion trajectory of the video frames in the row direction, and curve 501 in Fig. 5b is the motion trajectory of the video frames in the column direction. First compute the number of pixels δ_0r that the current image frame must move in the row direction and the number of pixels δ_0c that it must move in the column direction:
δ_0r = p_0r − (1 / (2a + 1)) · Σ_{i=−a..a} p_ir
δ_0c = p_0c − (1 / (2a + 1)) · Σ_{j=−a..a} p_jc
where, as shown by curve 501 in Fig. 5a, p_ir is the accumulated row-direction displacement on the motion trajectory of the image i frames away from the current image frame, p_jc is the accumulated column-direction displacement on the motion trajectory of the image j frames away from the current image frame, a is the number of frames on each side of the current frame used to filter the jitter (for example 5, 10 or 20 frames in Fig. 5a and Fig. 5b), p_0r is the accumulated row-direction displacement of the current image frame, p_0c is its accumulated column-direction displacement, and i and j index the neighbouring frames. The current image frame is then moved in the opposite direction by δ_0r and δ_0c to smooth the motion trajectory produced by the jitter.
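A sketch of this per-frame correction as a moving-average filter over the accumulated trajectory (the function name and the edge padding are our assumptions) could be:

```python
# Illustrative sketch: delta_i = p_i - mean(p_{i-a} .. p_{i+a}) for every frame,
# computed separately for the row-direction and column-direction trajectories.
import numpy as np

def smoothing_offsets(traj, a):
    """traj: 1-D sequence of accumulated displacements; returns the per-frame corrections."""
    traj = np.asarray(traj, dtype=np.float32)
    padded = np.pad(traj, a, mode='edge')              # boundary handling is our assumption
    kernel = np.ones(2 * a + 1, dtype=np.float32) / (2 * a + 1)
    smoothed = np.convolve(padded, kernel, mode='valid')
    return traj - smoothed

# Each frame is then shifted by -delta in the corresponding direction.
```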
Multi-scale global texture registration detects the jitter between video frames, and accumulating the jitter yields the motion information of the camera. The motion trajectories in the row and column directions contain large high-frequency components, i.e. violent jitter. The low-frequency components of the trajectory are the intentional motion of the camera and should be preserved. De-jittering the video therefore amounts to filtering the high-frequency components out of the motion trajectory. Once the motion information of the video has been detected, the row- and column-direction trajectories can be used to filter out the jitter. In the present invention, smoothing the motion trajectory achieves this filtering, with good results.
The jitter filtering in this embodiment is similar to low-pass filtering the motion trajectory. Curve 502 in Fig. 5a is the result of smoothing the trajectory over a width of 11 frames (about 5 frames on each side); smoothing over a width of 21 frames gives the trajectory shown as curve 503 in Fig. 5a, and smoothing over a width of 41 frames gives curve 504 in Fig. 5a. Compared with the original trajectory 501, the trajectory of the de-jittered video is smoother: the high-frequency jitter has been filtered out while the low-frequency trend of curve 501, which reflects the intentional motion of the camera, is preserved. Similarly, the different curves in Fig. 5b show the column-direction trajectory of the video after smoothing to various degrees.
After motion detection and jitter filtering, the video is essentially stable. However, shifting the images during correction produces blank areas around the periphery of the video. Earlier de-jittering algorithms often dealt with this by reducing the resolution of the output video. The present invention instead fills the blank areas with adjacent image frames selected by the shared edge constraint, so that the output video has the same resolution as the input, the continuity and consistency at the frame edges are guaranteed, and the contrast of the video frames is preserved.
Step 3: use an image frame adjacent to the current image frame to fill the blank area produced at the edge of the current image frame by the smoothing.
Image registration inevitably contains some error, so directly filling the blank area of the current frame with an adjacent image frame easily leaves visible traces of artificial processing. The present invention therefore uses the shared edge constraint to verify the registration result. As shown in Fig. 6, the solid box 601 is the display area and the dashed box 602 is the image frame after translation. After motion detection and filtering, the current image frame must be translated towards the upper left to eliminate the jitter, which leaves blank areas on the lower and right sides of the display window. The image frame adjacent to the current frame (dashed box 603) must be translated towards the lower right to cover part of the blank area in the display window. The heavy black line 604 in Fig. 6 is the edge shared by the two frames; the variance within a set neighbourhood is used to describe how smooth the shared edge is. The neighbourhood, shown as the grey box 605 in Fig. 6, is centred in turn on each point of the shared edge, and the average of the brightness variances of the pixels within the neighbourhood is taken as the shared edge constraint value.
The shared edge constraint value S_k(x_0, y_0) is
S_k(x_0, y_0) = S_0k(x_0, y_0) + S_k0(x_0, y_0)
where
S_0k(x_0, y_0) = (1/m) · Σ_{(x,y)∈ω_1(x_0,y_0)} ( I(x, y) − Ī(x_0^k, y_0^k) )²
is the shared edge constraint of the current image frame with respect to the adjacent k-th frame at (x_0, y_0), and
S_k0(x_0, y_0) = (1/m) · Σ_{(x,y)∈ω_2(x_0,y_0)} ( I_k(x, y) − Ī(x_0^k, y_0^k) )²
is the shared edge constraint of the adjacent k-th frame with respect to the current image frame at (x_0, y_0); (x_0, y_0) is the coordinate of a pixel on the edge shared by the current image frame and the adjacent k-th frame; (x, y) is a pixel coordinate in the display window; m is the number of pixels in the set neighbourhood centred at (x_0, y_0); ω_1(x_0, y_0) is the part of that neighbourhood covered by the current image frame and ω_2(x_0, y_0) the part covered by the adjacent k-th frame; I(x, y) is the brightness of the current image frame at (x, y), I_k(x, y) is the brightness of the adjacent k-th frame at (x, y), and
Ī(x_0^k, y_0^k) = (1/m) · ( Σ_{(x,y)∈ω_1(x_0,y_0)} I(x, y) + Σ_{(x,y)∈ω_2(x_0,y_0)} I_k(x, y) )
is the mean brightness of the current image frame and the adjacent k-th frame over the neighbourhood centred at (x_0, y_0).
The set neighbourhood is a square region centred at the coordinate (x_0, y_0) of a pixel on the shared edge, with a side length of between 1% and 3% of the display-area width.
The smaller the shared edge constraint value S_k(x_0, y_0), the smoother the two frames are across the shared edge, i.e. the more accurate the image registration, and the higher the confidence in filling the current image frame with that adjacent frame. In this embodiment the local window is 5 × 5 pixels.
Once the shared edge constraint of the two frames has been obtained, the precision of the registration can be judged from the constraint value along the edge shared by that frame and the current image frame. When the shared edge constraint value is less than the set threshold, the corresponding region of the adjacent image frame is used to fill the blank area produced at the edge of the current frame by the smoothing; otherwise the blank area is filled using the next frame after that adjacent frame. That is, when the shared edge constraint is less than the set threshold ρ (set to 100 in this embodiment), the adjacent frame can be used to fill the blank area of the current image frame; otherwise that adjacent frame cannot be used. λ_k = 1 indicates that the adjacent k-th frame can be used for the blank-area filling.
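A sketch of this check for a single point on the shared edge, using the 5 × 5 window and threshold ρ = 100 of this embodiment (the (row, column) convention, function name and mask argument are our assumptions), could be:

```python
# Illustrative sketch: shared edge constraint S_k at one shared-edge point.
# cur_mask is True where the window pixel is covered by the current frame
# (omega_1) and False where it is covered by the adjacent frame (omega_2).
import numpy as np

def shared_edge_constraint(cur, adj, cur_mask, r0, c0, half=2):
    """cur, adj: brightness images; (r0, c0): a pixel on the shared edge."""
    sl = (slice(r0 - half, r0 + half + 1), slice(c0 - half, c0 + half + 1))
    win_cur = cur[sl].astype(np.float64)
    win_adj = adj[sl].astype(np.float64)
    mask = cur_mask[sl]
    m = win_cur.size
    mean = (win_cur[mask].sum() + win_adj[~mask].sum()) / m   # mean over omega_1 and omega_2
    s_0k = ((win_cur[mask] - mean) ** 2).sum() / m            # current frame vs. shared mean
    s_k0 = ((win_adj[~mask] - mean) ** 2).sum() / m           # adjacent frame vs. shared mean
    return s_0k + s_k0

# The adjacent frame k may be used for filling when this value is below rho = 100.
```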
When several adjacent image frames can all be used to fill the blank area of the current image frame, the frame closer to the current image frame is chosen preferentially. The closer two frames are in the video, the smaller the rotation and scaling between them, the better they satisfy the assumptions of the fast image registration method proposed by the present invention, the higher the precision of the image registration, and the fainter the traces of artificial processing left after the blank area is filled.
Step 4: advance to the next frame and return to Step 1; processing finishes once all image frames of the video have been processed.
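Pulling the steps together, one plausible sketch of the per-frame loop (register_pair and fill_blank_area are hypothetical helpers standing in for Steps 1 and 3 above, smoothing_offsets is the moving-average sketch from Step 2, and none of this code appears in the patent) is:

```python
# Illustrative sketch only: overall de-jittering loop over a list of grayscale frames.
import numpy as np

def dejitter(frames, a=10, phi=50.0):
    trajectory = [(0, 0)]                           # accumulated (row, col) displacement per frame
    for prev, cur in zip(frames, frames[1:]):       # Step 1: register each frame to its neighbour
        du, dv = register_pair(cur, prev, phi)      # hypothetical multi-scale registration helper
        trajectory.append((trajectory[-1][0] + du, trajectory[-1][1] + dv))

    rows = smoothing_offsets([p[0] for p in trajectory], a)   # Step 2: smooth both trajectories
    cols = smoothing_offsets([p[1] for p in trajectory], a)

    out = []
    for k, frame in enumerate(frames):
        # Shift opposite to the jitter; np.roll stands in for a shift that would
        # really leave blank edges for Step 3 to fill.
        shifted = np.roll(frame, (-int(round(rows[k])), -int(round(cols[k]))), axis=(0, 1))
        out.append(fill_blank_area(shifted, frames, k))        # Step 3: fill edges from neighbours
    return out                                                  # Step 4: the loop covers every frame
```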
Referring to Fig. 4, a system 400 for removing video jitter is disclosed, comprising a registration device 401, a correction device 402 and a filling device 403. The registration device 401 registers the current image frame of the video with respect to its adjacent image frame; the correction device 402 accumulates the trajectory information of the jitter produced when registering the current image frame, smooths the resulting motion trajectory, and corrects the current image frame in the direction opposite to the motion trajectory; the filling device 403 uses an image frame adjacent to the current image frame to fill the blank area produced at the edge of the current image frame by the smoothing. The registration device 401 further comprises: a down-sampling unit 411, for calculating the maximum down-sampling scale of the current image frame, down-sampling the original image at that scale to generate down-sampled images, and then reducing the down-sampling scale and down-sampling the motion-corrected image until the down-sampling scale reaches 0; a computing unit 412, for computing the texture feature of each pixel of the down-sampled images; a motion-direction determination unit 413, for computing, from the per-pixel texture features, the cost function between the down-sampled images of the two frames and thereby obtaining their motion direction; and a motion-direction correction unit 414, for correcting the original image according to the motion direction and obtaining its trajectory information in the row and column directions. The method for removing video jitter described above can be implemented with this system.
The registration procedure of this method was compared with Harris and SUSAN corner detection and matching in terms of registration accuracy as a function of offset ratio, registration accuracy as a function of image contrast, and registration speed as a function of image contrast. Fig. 7a shows the registration accuracy of the different methods for images with different offset ratios. In the figure, the registration procedure of this method is labelled MSGTR (Multi-Scale Global Texture Registration), since it collects texture features at multiple down-sampling scales and registers with them. An abscissa value of 10 means that the offsets in the row and column directions range from 0 to 1/10 of the image width and height. The figure shows that when the image offset is large, the registration accuracy of this method remains relatively high, at 63.1%; when the offset is small, its accuracy is comparable to that of SUSAN corner detection and matching and somewhat lower than that of Harris corner detection and matching. The detailed registration accuracies of the methods are given in Table 2.
Table 2. Registration accuracy of the three methods at different offset scales
Fig. 7b shows the registration accuracy of the different methods for images with different contrasts. When image contrast is low, the precision of Harris and SUSAN corner detection decreases, which in turn degrades the precision of the image registration. As the figure shows, the registration procedure of this method (MSGTR) is relatively little affected by contrast: at the lowest contrast its registration accuracy is 82%, whereas Harris corner detection and matching is affected the most, with a registration accuracy of only 69%. At higher contrasts, all three methods register accurately.
Fig. 7c shows the influence of image contrast on the number of corners detected. As the figure shows, Harris and SUSAN detect more corners as the image contrast increases, and at the same contrast the SUSAN detector finds more corners than Harris. This is why, for images of the same contrast, SUSAN corner detection and matching is slightly more precise, as shown in Fig. 5a and Fig. 5b.
Fig. 7d shows how the time the three methods need to register images changes with image contrast. As the figure shows, higher contrast makes the Harris and SUSAN algorithms detect more corners and therefore consume more time, whereas the registration procedure of this method (MSGTR) is little affected by image contrast and is clearly faster than the other two methods.
Compared with Harris and SUSAN corner detection and matching, the registration speed of this method's registration procedure is independent of image contrast and depends only on image size. For a 320 × 240 image, the registration procedure of this method takes only 16 ms. Filling the blank area produced at the edge of the current frame with an adjacent frame chosen by the shared edge constraint value preserves the continuity and consistency of the edges, gives the reconstructed image the same resolution as the original input video, produces smooth transitions at the edges, keeps the contrast of the filled area high, and effectively removes the jitter from the video.
Obviously, those skilled in the art can make various changes and modifications to the present invention without departing from its spirit and scope. If these changes and modifications fall within the scope of the claims of the present invention and their equivalents, the present invention is intended to cover them as well.

Claims (9)

1. A method for removing video jitter, characterized by comprising:
Step 1: registering the current image frame of the video with respect to its adjacent image frame;
Step 2: accumulating the trajectory information of the jitter produced by registering the current image frame, smoothing the resulting motion trajectory, and correcting the current image frame in the direction opposite to the motion trajectory;
Step 3: using an image frame adjacent to the current image frame to fill the blank area produced at the edge of the current image frame by the smoothing;
Step 4: advancing to the next image frame and returning to Step 1 until all image frames of the video have been processed;
wherein Step 1 further comprises:
Step a: calculating the maximum down-sampling scale of the current image frame, and down-sampling the current image frame and its adjacent image frame according to this maximum scale to generate down-sampled images;
Step b: computing the texture feature of each pixel of the down-sampled images;
Step c: computing the cost function between the down-sampled images of the two frames from the per-pixel texture features, thereby obtaining the motion direction of the down-sampled images;
Step d: correcting the current image frame according to the motion direction, and obtaining the trajectory information of the current image frame in the row and column directions;
Step e: reducing the down-sampling scale, down-sampling the corrected image, and returning to Step b until the down-sampling scale reaches 0;
and wherein in Step 2, when the trajectory information of the jitter produced during registration is accumulated, the number of pixels δ_0r that the current image frame must move in the row direction and the number of pixels δ_0c that it must move in the column direction are first computed:
δ_0r = p_0r − (1 / (2a + 1)) · Σ_{i=−a..a} p_ir
δ_0c = p_0c − (1 / (2a + 1)) · Σ_{j=−a..a} p_jc
where p_ir is the accumulated row-direction displacement on the motion trajectory of the image i frames away from the current image frame, p_jc is the accumulated column-direction displacement on the motion trajectory of the image j frames away from the current image frame, a is the number of frames on each side of the current frame used to filter the jitter, p_0r is the accumulated row-direction displacement of the current image frame, p_0c is its accumulated column-direction displacement, and i and j index the neighbouring frames; the current image frame is then moved in the opposite direction by δ_0r and δ_0c to smooth the motion trajectory produced by the jitter.
2. The method according to claim 1, characterized in that in Step a the maximum down-sampling scale τ is
τ = max{ i − γ | 2^i ≤ min{H, W} }
where i denotes the critical down-sampling scale (when the down-sampling scale exceeds i, the down-sampled image becomes smaller than 2 × 2 pixels), γ is a relative parameter, H denotes the image height and W the image width.
3. The method according to claim 1, characterized in that in Step b the texture feature is represented by a three-dimensional vector of the image's gray value and its horizontal and vertical gradients, as follows:
f(x, y) = [ I(x, y), I′_x(x, y), I′_y(x, y) ]
where I(x, y) is the gray value of the image at pixel (x, y), and I′_x(x, y) and I′_y(x, y) are the horizontal and vertical gradients of the image at pixel (x, y).
4. The method according to claim 1, characterized in that in Step c the cost function is
C(u, v) = (1 / (H × W)) · Σ_{x=1..H} Σ_{y=1..W} ( f_1(x, y) − f_2(x + u, y + v) )²
where u is the pixel offset in the horizontal direction, v is the pixel offset in the vertical direction, and both u and v take values in {−1, 0, 1}; C(u, v) is the cost of moving between the two frames in direction (u, v); H denotes the image height and W the image width; f_1(x, y) is the texture feature of the image at pixel (x, y), and f_2(x + u, y + v) is the texture feature of the adjacent image frame at pixel (x + u, y + v).
5. The method according to claim 4, characterized in that in Step c the motion direction of the image is computed as
(u′, v′) = the (u, v) minimizing C(u, v) over u, v ∈ {−1, 0, 1}, if that minimum is at most φ; otherwise (u′, v′) = (0, 0),
where φ is a decision threshold and (u′, v′) is the motion direction between the two adjacent frames.
6. The method according to claim 1, characterized in that in Step e the corrected image is down-sampled using a pyramid model.
7. The method according to claim 1, characterized in that in Step 3 a shared edge constraint value between the current image frame and an image frame adjacent to it is computed; when the shared edge constraint value is less than a set threshold, the corresponding region of the adjacent image frame is used to fill the blank area produced at the edge of the current image frame by the smoothing; otherwise the next frame after that adjacent image frame is tested against the threshold, until the blank area at the edge of the current image frame has been completely filled.
8. The method according to claim 7, characterized in that in Step 3 the shared edge constraint value S_k(x_0, y_0) is
S_k(x_0, y_0) = S_0k(x_0, y_0) + S_k0(x_0, y_0)
where
S_0k(x_0, y_0) = (1/m) · Σ_{(x,y)∈ω_1(x_0,y_0)} ( I(x, y) − Ī(x_0^k, y_0^k) )²
is the shared edge constraint of the current image frame with respect to the adjacent k-th frame at (x_0, y_0), and
S_k0(x_0, y_0) = (1/m) · Σ_{(x,y)∈ω_2(x_0,y_0)} ( I_k(x, y) − Ī(x_0^k, y_0^k) )²
is the shared edge constraint of the adjacent k-th frame with respect to the current image frame at (x_0, y_0); (x_0, y_0) is the coordinate of a pixel on the edge shared by the current image frame and the adjacent k-th frame; (x, y) is a pixel coordinate in the display window; m is the number of pixels in the set neighbourhood centred at (x_0, y_0); ω_1(x_0, y_0) is the part of that neighbourhood covered by the current image frame and ω_2(x_0, y_0) the part covered by the adjacent k-th frame; I(x, y) is the brightness of the current image frame at (x, y), I_k(x, y) is the brightness of the adjacent k-th frame at (x, y), and
Ī(x_0^k, y_0^k) = (1/m) · ( Σ_{(x,y)∈ω_1(x_0,y_0)} I(x, y) + Σ_{(x,y)∈ω_2(x_0,y_0)} I_k(x, y) )
is the mean brightness of the current image frame and the adjacent k-th frame over the neighbourhood centred at (x_0, y_0).
9. A system for removing video jitter, characterized by comprising:
a registration device for registering the current image frame of the video with respect to its adjacent image frame;
a correction device for accumulating the trajectory information of the jitter produced by registering the current image frame, smoothing the resulting motion trajectory, and correcting the current image frame in the direction opposite to the motion trajectory; and
a filling device for using an image frame adjacent to the current image frame to fill the blank area produced at the edge of the current image frame by the smoothing,
wherein the registration device further comprises:
a down-sampling unit for calculating the maximum down-sampling scale of the current image frame, down-sampling the current image frame and its adjacent image frame at that scale to generate down-sampled images, and then reducing the down-sampling scale and down-sampling the image corrected by the motion-direction correction unit to generate down-sampled images until the down-sampling scale reaches 0;
a computing unit for computing the texture feature of each pixel of the down-sampled images;
a motion-direction determination unit for computing, from the per-pixel texture features, the cost function between the down-sampled images of the two frames, thereby obtaining the motion direction of the down-sampled images of the current image frame; and
a motion-direction correction unit for correcting the current image frame according to the motion direction, and obtaining the trajectory information of the current image frame in the row and column directions,
and wherein, when the correction device accumulates the trajectory information of the jitter produced during registration, it first computes the number of pixels δ_0r that the current image frame must move in the row direction and the number of pixels δ_0c that it must move in the column direction:
δ_0r = p_0r − (1 / (2a + 1)) · Σ_{i=−a..a} p_ir
δ_0c = p_0c − (1 / (2a + 1)) · Σ_{j=−a..a} p_jc
where p_ir is the accumulated row-direction displacement on the motion trajectory of the image i frames away from the current image frame, p_jc is the accumulated column-direction displacement on the motion trajectory of the image j frames away from the current image frame, a is the number of frames on each side of the current frame used to filter the jitter, p_0r is the accumulated row-direction displacement of the current image frame, p_0c is its accumulated column-direction displacement, and i and j index the neighbouring frames; the current image frame is then moved in the opposite direction by δ_0r and δ_0c to smooth the motion trajectory produced by the jitter.
CN2009102427956A 2009-12-21 2009-12-21 Method and system for removing video jitter Active CN101742122B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN2009102427956A CN101742122B (en) 2009-12-21 2009-12-21 Method and system for removing video jitter

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN2009102427956A CN101742122B (en) 2009-12-21 2009-12-21 Method and system for removing video jitter

Publications (2)

Publication Number Publication Date
CN101742122A CN101742122A (en) 2010-06-16
CN101742122B true CN101742122B (en) 2012-06-06

Family

ID=42464932

Family Applications (1)

Application Number Title Priority Date Filing Date
CN2009102427956A Active CN101742122B (en) 2009-12-21 2009-12-21 Method and system for removing video jitter

Country Status (1)

Country Link
CN (1) CN101742122B (en)

Families Citing this family (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102143376B (en) * 2010-11-29 2013-05-22 北大方正集团有限公司 Method and device for detecting consistency of twin-channel video signals
CN102499634B (en) * 2011-10-26 2014-01-08 中国科学院光电技术研究所 Living human eye retina dynamic imaging device with image stabilizing function and method
CN103051903A (en) * 2012-12-24 2013-04-17 四川九洲电器集团有限责任公司 Space adaptive H.264 video I frame error concealment method
CN104349039B (en) * 2013-07-31 2017-10-24 展讯通信(上海)有限公司 Video anti-fluttering method and device
CN103679749B (en) * 2013-11-22 2018-04-10 北京奇虎科技有限公司 A kind of image processing method and device based on motion target tracking
CN104796580B (en) * 2014-01-16 2018-07-31 北京亿羽舜海科技有限公司 A kind of real-time steady picture video routing inspection system integrated based on selection
EP3195590A4 (en) * 2014-09-19 2018-04-25 Intel Corporation Trajectory planning for video stabilization
CN104751488B (en) * 2015-04-08 2017-02-15 努比亚技术有限公司 Photographing method for moving track of moving object and terminal equipment
CN106550174B (en) * 2016-10-28 2019-04-09 大连理工大学 A kind of real time video image stabilization based on homography matrix
CN107370941B (en) * 2017-06-29 2020-06-23 联想(北京)有限公司 Information processing method and electronic equipment
US10587807B2 (en) * 2018-05-18 2020-03-10 Gopro, Inc. Systems and methods for stabilizing videos
CN110390688A (en) * 2019-07-23 2019-10-29 中国人民解放军国防科技大学 Steady video SAR image sequence registration method
CN111263069B (en) * 2020-02-24 2021-08-03 Oppo广东移动通信有限公司 Anti-shake parameter processing method and device, electronic equipment and readable storage medium
CN113744277A (en) * 2020-05-29 2021-12-03 广州汽车集团股份有限公司 Video jitter removal method and system based on local path optimization
CN113163120A (en) * 2021-04-21 2021-07-23 安徽清新互联信息科技有限公司 Transformer-based video anti-shake method
CN114245176A (en) * 2021-12-16 2022-03-25 北京数码视讯技术有限公司 Transmission detection device and method for multimedia stream

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101281650A (en) * 2008-05-05 2008-10-08 北京航空航天大学 Quick global motion estimating method for steadying video
CN101511024A (en) * 2009-04-01 2009-08-19 北京航空航天大学 Movement compensation method of real time electronic steady image based on motion state recognition

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101281650A (en) * 2008-05-05 2008-10-08 北京航空航天大学 Quick global motion estimating method for steadying video
CN101511024A (en) * 2009-04-01 2009-08-19 北京航空航天大学 Movement compensation method of real time electronic steady image based on motion state recognition

Also Published As

Publication number Publication date
CN101742122A (en) 2010-06-16

Similar Documents

Publication Publication Date Title
CN101742122B (en) Method and system for removing video jitter
US11240471B2 (en) Road vertical contour detection
US20150086080A1 (en) Road vertical contour detection
JP3868876B2 (en) Obstacle detection apparatus and method
TWI489418B (en) Parallax Estimation Depth Generation
KR100985805B1 (en) Apparatus and method for image stabilization using adaptive Kalman filter
US20120327189A1 (en) Stereo Camera Apparatus
CN105069804B (en) Threedimensional model scan rebuilding method based on smart mobile phone
US20120162395A1 (en) Method for filling hole-region and three-dimensional video system using the same
CN105872345A (en) Full-frame electronic image stabilization method based on feature matching
CN110245199B (en) Method for fusing large-dip-angle video and 2D map
Cho et al. Affine motion based CMOS distortion analysis and CMOS digital image stabilization
CN103440664A (en) Method, system and computing device for generating high-resolution depth map
US10043106B2 (en) Corresponding point searching method and distance detection device
JP2010152521A (en) Apparatus and method for performing stereographic processing to image
CN101815225A (en) Method for generating depth map and device thereof
JP4985542B2 (en) Corresponding point search device
US9380285B2 (en) Stereo image processing method, stereo image processing device and display device
JP4831084B2 (en) Corresponding point search device
CN103118221B (en) Based on the real-time video electronic image stabilization method of field process
JPH10124687A (en) Method and device for detecting white line on road
JP4985863B2 (en) Corresponding point search device
Aboussouan Super-Resolution Image Construction Using an Array Camera
Huang et al. Video stabilization with distortion correction for wide-angle lens dashcam
KR20160115068A (en) Method and apparatus for hierarchical stereo matching

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant