CN111355881B

CN111355881B - Video stabilization method for simultaneously eliminating rolling artifacts and jitters

Info

Publication number: CN111355881B
Application number: CN201911260902.8A
Authority: CN
Inventors: 肖亮; 吴慧聪; 杨帆
Original assignee: Nanjing University of Science and Technology
Current assignee: Nanjing University of Science and Technology
Priority date: 2019-12-10
Filing date: 2019-12-10
Publication date: 2021-09-21
Anticipated expiration: 2039-12-10
Also published as: CN111355881A

Abstract

The invention discloses a video stabilization method for simultaneously eliminating rolling artifacts and jitter, which comprises the following steps: 1) gridding the video frame; 2) estimating inter-frame motion; 3) constructing a data fidelity term and a motion smoothing regular term; 4) constructing an interframe intraframe motion joint optimization model; 5) estimating intra-frame motion; 6) setting an adaptive sliding window; 7) calculating self-adaptive weight; 8) solving restoration transformation; 9) while de-rolling artifact stabilization video generation. According to the method, each frame of a gridded video is converted by utilizing homography to simulate inter-frame motion, and rigid conversion simulates intra-frame motion, an inter-frame intra-frame motion combined model is established, and the recovery conversion for removing rolling artifacts and dithering is directly solved; the method can simultaneously realize the removal of the rolling artifact of the video and the stabilization of the video, avoids the problem of over-smoothness in the video shaking removal process, and can be widely applied to various types of video stabilization such as mobile phone shooting, unmanned aerial vehicle shooting, vehicle navigation and the like which adopt CMOS cameras.

Description

Video stabilization method for simultaneously eliminating rolling artifacts and jitters

Technical Field

The invention relates to a dithering video stabilization technology, in particular to a video stabilization method for simultaneously eliminating rolling artifacts and dithering.

Background

In the field of video processing and display, video signals shot by a vehicle-mounted camera shooting platform, an unmanned aerial vehicle or ship camera shooting system, a handheld camera shooting device and the like by adopting a CMOS camera often have rolling artifacts due to line-by-line imaging of the camera, and the shooting process is subject to random disturbance and also easily causes video jitter. On one hand, these video degradations are very likely to cause visual fatigue of video observers and affect the content understanding of video images, resulting in erroneous judgment or missed judgment of the observers; on the other hand, these video judder and rolling effect often hinder subsequent processing of these videos, such as tracking, recognition, pattern analysis, and the like.

Currently, there are many methods for the separation process of rolling artifacts and video judder, such as Robust Mesh repair method [ Yeong Jun Koh, Chulwoo Lee, and Chang-Su Kim.2015.video Stabilization Based on Feature extraction and Selection and Robust Mesh Processing. IEEE Transactions on Processing 24,12(2015),5260 Transactions 5273 ], and subspace method [ Feng Liu, Michael Gleicher, Jue Wang, Hailin Jin and asm Agarwala.2011.subspace video Stabilization on ACM Transactions on Graphics30,1 (4) ]. The robust mesh restoration method adopts a separation processing frame, firstly, the motion trail without rolling artifacts is estimated, and then the motion smoothing is carried out on the corrected characteristic trail. And the subspace method takes the rolling artifacts as a set of structured noise, and implicitly removes the rolling artifacts in the video image stabilization process.

However, whether the robust mesh repairing method or the subspace method is based on processing rolling artifacts or video jitter respectively, the algorithm process is complex.

Disclosure of Invention

The invention aims to provide a video stabilization method for simultaneously eliminating rolling artifacts and jitters.

The technical solution for realizing the purpose of the invention is as follows: a video stabilization method for simultaneously eliminating rolling artifacts and jitter, the method comprising the steps of:

step 1, gridding video frames: suppose that the observed jittered video sequence containing rolling artifacts is: { I_t|t∈[1,N]Where N denotes the number of frames of the video sequence, for each frame of the video image a grid of the form 8 x 8 is defined, the t-th frame I_tThe grid of the ith row and the jth column can be expressed as

Defining the exposure time between adjacent grids under the same grid column as unit time;

step 2, estimating inter-frame motion: detecting the motion characteristic points by using the characteristic points of the grids corresponding to two adjacent frames in the video sequence, and calculating a rigid transformation matrix and a homography matrix of each frame of video image by using a random sampling consistency method, wherein the rigid transformation matrix and the homography matrix of the ith row and the jth column grid of the t frame can be respectively expressed as follows:

wherein i, j ∈ [1,8 ]],t∈[1,N]；

Step 3, constructing a data fidelity term and a motion smoothing regular term: defining a grid

In a unit timeIn-frame motion of

Using the current frame I_tBefore and after frame I_t-1And I_t+1；

According to the fidelity between the sum of the intra-frame motion under the same grid column and the inter-frame motion, constructing the fidelity item of the inter-frame intra-frame motion data

According to the similarity of the intra-frame motion under the condition of dense sampling density, constructing an intra-frame motion smoothing regular term

And 4, constructing an interframe intraframe motion joint optimization model: and (3) establishing an interframe intraframe motion joint optimization model according to the constraint items constructed in the step 3: arg min_{F}P (F) + λ Q (F), where the regularization parameter λ > 0;

step 5, estimating intra-frame motion: respectively optimizing the combined model in the step 4 according to the additivity of the angle theta, the horizontal displacement x and the vertical displacement y in the rigid transformation matrix parameters, finally respectively solving the three parameters of the angle theta, the horizontal displacement x and the vertical displacement y of the intra-frame motion, and synthetically solving the intra-frame motion matrix

Step 6, setting an adaptive sliding window: adopting windowing processing to each grid, setting the window size to be s, and obtaining the t-th frame I in the step 2_tInter-frame motion matrix of

And the intra-frame motion matrix obtained in the step 5

s has a value in the range of [0,30 ]]An integer of (d);

step 7, calculating self-adaptive weight: calculating the time distance and the space distance between the current grid and the k frame global shutter point grid, and then weighting

Where G (-) represents a Gaussian function, a set of weight vectors is finally obtained

And estimating a uniform weight vector for the grid of the same frame:

||·||₁is the L1 norm of the matrix;

step 8, solving restoration transformation: according to the adaptive weight obtained in step 7 and the relation

The tth frame I can be solved_tA recovery change of the grid of the ith row and the jth column, wherein w_t,kIn order to adapt the weights adaptively to each other,

as a grid in the window

To the grid

The accumulation of motion between the frames in between,

for the i row and j column grid of the k frame to the grid column global shutter point, then

Can express whenTotal motion of the front mesh to the k-th frame global shutter point;

and 9, simultaneously generating a rolling artifact stabilization video: according to a transformation matrix

And redrawing each grid of each frame of video image to finally generate a stable video image sequence with rolling artifacts removed.

Further, when constructing the data fidelity item in step 3, according to the grid

Will be the same as other grids in the same grid column

And k ≠ i } share, can obtain properties

Grid on the left side of the equation

After 8 units of time frame motion accumulation, the right side is the corresponding frame motion of the grid, and according to the property, the data fidelity item P (F) between the sum of the frame motion and the frame motion can be designed.

Further, when the motion smoothing regularization term is constructed in step 3, the mesh is subjected to similarity under the high-frequency sampling condition according to the intra-frame motion

In a frame

Which is a grid with the next grid row of the same grid column

In a frame

Should there be similarity, then a smooth regularization term can be designed

Further, the rigid transformation in step 5 is defined as three-degree-of-freedom transformation

Wherein theta, x and y respectively represent a rotation angle, horizontal displacement and vertical displacement between every two adjacent grids; the transformation having the additive property that

Further, if the rigid transformation in step 5 has an additive property, the optimization model in step 4 can be converted into an optimization model in three degrees of freedom, and for horizontal displacement, f represents horizontal displacement in a frame, and r represents horizontal displacement between frames, the data fidelity term and the motion smoothing term in step 3 can be converted into:

conversion of the optimization model into argmin_{f}P (f) + lambda Q (f), the horizontal displacement of each grid can be obtained by a new model

The vertical displacement and the rotation angle solving method are the same as the horizontal displacement; the model can be expressed in matrix-vector form as:

further, when the adaptive weight is constructed in step 7

Then, define the grid

To the grid

The temporal distance and the spatial distance are respectively | t-k |, and

then it is the horizontal distance between the two grids,

is the vertical distance between the two grids.

Further, step 8 defines

The motion accumulation in the frame from the ith row and j column grids of the kth frame to the global shutter point, if the 4 th grid row is taken as the global shutter point, the t-th frame I_tThe intra motion restoration matrix for the grid of row k and column j can be expressed as:

by passing

The rolling artifacts for each grid can be removed.

Compared with the prior art, the invention has the following remarkable advantages: (1) aiming at the degradation problem that the rolling artifacts and the shaking of the low-quality video occur simultaneously, the prior art usually needs two separate preprocessing processes of rolling artifact removal and shaking removal, but the invention establishes a joint processing optimization model, and can overcome the problems of under-smoothing and over-smoothing in video stabilization through the intra-frame and inter-frame motion estimation of gridding video frames and the full-time adjustment of a self-adaptive sliding window; (2) the method makes full use of the correlation of the interframe intraframe motion and adaptively adjusts the weight parameters, so that the method has good effects on the aspects of removing the jitter and rolling artifacts of the video; (3) the method simulates interframe motion by utilizing homography transformation and rigid transformation of each frame of the gridded video, establishes an interframe intraframe motion combined model, directly solves the recovery transformation for removing rolling artifacts and dithering, and can be widely applied to various types of video stabilization such as mobile phone shooting, unmanned aerial vehicle shooting, vehicle navigation and the like by adopting a CMOS camera.

Drawings

Fig. 1 is a flow chart of a video stabilization method for simultaneously eliminating rolling artifacts and jitter according to the present invention.

Fig. 2(a) is a first frame diagram of a first test video.

Fig. 2(b) is a second frame diagram of the first test video.

Fig. 2(c) is a graph of the residual between the original two frames.

Fig. 2(d) is a graph of the residual between two frames after de-jittering using the method alone.

Fig. 2(e) is a diagram of the residual between two frames after being processed by the robust mesh repair method.

Fig. 2(f) is a diagram of the residual between two frames after the method simultaneously de-jitters and de-rolling the artifacts.

Fig. 3(a) is an original feature point trajectory diagram.

Fig. 3(b) is a feature point trajectory diagram after processing by the robust mesh repairing method.

Fig. 3(c) is a feature point trajectory diagram after processing by the subspace method.

FIG. 3(d) is a trace diagram of feature points after processing by the method of the present invention.

Fig. 4(a) is a randomly chosen three-frame original picture in a test video.

Fig. 4(b) is a result diagram of the three-frame image after being processed by the robust mesh repair method.

Fig. 4(c) is a result diagram of the three-frame image processed by the subspace method.

FIG. 4(d) is a graph of the results of three frame images processed by the method of the present invention.

Fig. 5 (1) - (10) are 10 test video images containing dithering and rolling artifacts, respectively, used in the experiment of the present invention.

Fig. 6 is a video visual evaluation result chart for 35 users.

Detailed Description

The invention provides a joint optimization method for simultaneously eliminating rolling artifacts and jitter by establishing an estimation model for interframe intra-frame motion aiming at videos shot by a CMOS camera and realizing the effect of simultaneously removing the rolling artifacts and the jitter in the videos. The following describes the implementation process of the present invention in detail with reference to fig. 1:

And defining the exposure time between adjacent grids under the same grid column as unit time.

Step 2, estimating inter-frame motion: applying feature point detection to the grids corresponding to two adjacent frames in the video sequence to obtain dense motion feature points, and calculating a rigid transformation matrix and a homography matrix of each frame of video image by using a random sampling consistency method, wherein the rigid transformation matrix and the homography matrix of the ith row and the jth column grid of the t frame can be respectively expressed as:

wherein i, j ∈ [1,8 ]],t∈[1,N]。

The intra-frame motion in unit time of

Using the current frame I_tBefore and after frame I_t-1And I_t+1。

Step 3.1, when constructing the data fidelity item, according to the grid

Will be the same as other grids in the same grid column

And k ≠ i } share, can obtain properties

Grid on the left side of the equation

After the accumulation of the motion in the frame within 8 unit time, the right side is the motion between frames corresponding to the grid, and the number between the sum of the motion in the frame and the motion between the frames can be designed according to the propertyAccording to fidelity terms P (F).

Step 3.2, when constructing the motion smoothing regular term, according to the similarity that the intra-frame motion should have under the high-frequency sampling condition, for the grid

In a frame

Which is a grid with the next grid row of the same grid column

In a frame

Should there be similarity, then a smooth regularization term can be designed

And 4, constructing an interframe intraframe motion joint optimization model: and (3) establishing an interframe intraframe motion joint optimization model according to the constraint items constructed in the step 3: arg min_{F}P (F) + λ Q (F), where the regularization parameter λ > 0.

Step 5.1, rigid transformation is defined as three-degree-of-freedom transformation

Where θ, x and y represent rotation angles between two adjacent grids, horizontalDisplacement and vertical displacement. According to the additivity that the transformation has, i.e.

Step 5.2, according to the additive property of the rigid transformation, the optimization model in step 4 can be converted into an optimization model in three degrees of freedom, for example, for horizontal displacement, f represents horizontal displacement in a frame, and r represents horizontal displacement between frames, the data fidelity term and the motion smoothing term in step 3 are converted into:

conversion of the optimization model to argmin_{f}P (f) + lambda Q (f), the horizontal displacement of each grid can be obtained by a new model

And spread to vertical displacement and rotation angle. The model can be expressed in matrix-vector form as:

And the intra-frame motion matrix obtained in the step 5

Value of sIn the range of [0,30]Is an integer of (1).

And estimating a uniform weight vector for the grid of the same frame:

||·||₁is the L1 norm of the matrix.

When constructing adaptive weights

Then, define the grid

To the grid

The temporal distance and the spatial distance are respectively | t-k |, and

then it is the horizontal distance between the two grids,

is the vertical distance between the two grids.

as a grid in the window

To the grid

The accumulation of motion between the frames in between,

for the i-row and j-column grid of the k-th frame to the global shutter point of the grid column, then

The total motion of the current mesh to the k-th frame global shutter point may be represented.

Definition of

The motion accumulation in the frame from the I-th frame row and j-column grid to the global shutter point of the grid column is performed, if the 4-th grid row is used as the global shutter point, the t-th frame I_tThe intra motion restoration matrix for the grid of row k and column j can be expressed as:

by passing

The rolling artifacts for each grid can be removed.

For each net of each frame of video imageAnd redrawing the grids to finally generate a stable video image sequence without rolling artifacts.

The effect of the invention can be further illustrated by the following simulation experiment:

(1) simulation conditions

The simulation experiment adopts ten groups of shaking video data containing rolling artifacts, and the simulation experiment is completed by adopting Matlab R2012 under a Windows7 operating system. The processor is Xeon W3520 CPU (2.66GHz) and the memory is 4 GB. The initialization values of the parameters in the simulation experiment are as follows: the regularization parameter λ is set to 1, the standard deviation of the two gaussian functions is 6 and 30, respectively, the window length is 2s +1 and s is 30.

The performance of the method is analyzed through the subjective visual experience qualitative evaluation of the user. In the experiment, 35 participants subjectively scored the result videos of different stabilization methods, and for fairness testing, the participants selected videos with better human visual perception when the specific methods are unknown.

(2) Emulated content

The invention adopts the de-jitter performance of the real jitter video data inspection algorithm, and the test video is the jitter video containing the rolling artifacts. In order to test the performance of the algorithm, the proposed video stabilization method is compared with the current international mainstream method. The comparison method comprises the following steps: a robust mesh repair method and subspace approach.

(3) Analysis of simulation experiment results

Fig. 2(a) to 2(f) are residual maps of two frames of images obtained by different methods for removing a rolling artifact, fig. 3(a) to 3(d) are feature point trajectory maps processed by different video stabilization methods, fig. 4(a) to 4(d) are result maps obtained by different methods for simultaneously removing a rolling artifact and a shake, fig. 5 is 10 test videos, and fig. 6 is a visual evaluation result of 10 test videos by 35 users.

In fig. 2(a) to 2(f), the restoration effect is visualized by using a frame difference method. Fig. 2(a) is a first frame image of a first test video, fig. 2(b) is a second frame image of the first test video, fig. 2(c) is a residual image between two original frames, fig. 2(d) is a residual image between two frames after de-jittering only by the method, fig. 2(e) is a residual image between two frames after being processed by the robust mesh repairing method, and fig. 2(f) is a residual image between two frames after the de-jittering and de-rolling artifacts are simultaneously performed by the method. It can be obviously observed that the method can remove shaking and rolling artifacts simultaneously to obtain the minimum residual error between two frames of images, and can well remove the rolling artifacts and shaking effects contained in the video.

Fig. 3(a) is an original feature point trajectory diagram, fig. 3(b) is a feature point trajectory diagram processed by the robust mesh restoration method, fig. 3(c) is a feature point trajectory diagram processed by the subspace method, and fig. 3(d) is a feature point trajectory diagram processed by the method of the present invention. It can be observed that the method of the present invention has good effect when processing video jitter, and the adoption of the adaptive weight design can not easily cause the phenomenon of over-smoothing when processing the jitter of step motion type, which shows that the method of the present invention has excellent effect in the aspect of video de-jitter.

Fig. 4(a) is a diagram of three randomly selected original frames in a test video, fig. 4(b) is a diagram of the result of the three frames processed by the robust mesh repair method, fig. 4(c) is a diagram of the result of the three frames processed by the subspace method, and fig. 4(d) is a diagram of the result of the three frames processed by the method of the present invention. As can be seen from fig. 4(a) -4 (d), the subspace method only treats the rolling artifacts as a structured noise implicitly in the video de-jittering process, and as a result, a spatial distortion phenomenon occurs, whereas both the robust mesh repairing method and the method of the present invention have a specific step to treat the rolling artifacts, so that the object can be corrected to some extent. In addition, because the invention adopts the self-adaptive weight, the result of the invention saves more image information under the condition of fast motion than other two methods.

Fig. 6 shows a video visual evaluation result chart for 35 users. Since one may have different criteria, it is difficult to design an index to quantitatively assess the effectiveness of stabilization and roller door correction. Thus, we performed a user survey of 35 participants, and performed a qualitative comparison. Fig. 5 shows randomly selected test videos, and for each test video, the user selects the video that is considered to have the best visual effect without knowing the three repair methods (robust mesh repair method, subspace method, and method of the present invention). In the test cases of 7 th, 9 th and 10 th videos, no participant selects the subspace method, and the visual experience is greatly reduced because of obvious geometric distortion. In addition, more users prefer the method, and the method is considered to achieve better balance between motion smoothing and information storage, so that the method is considered to be superior to the other two most advanced methods, and the video jitter and rolling artifacts are removed simultaneously.

Claims

1. A video stabilization method for simultaneously eliminating rolling artifacts and jitter, the method comprising the steps of:

wherein i, j ∈ [1,8 ]],t∈[1,N]；

In a unit timeIn-frame motion of

Using the current frame I_tAnd its preceding and following frames I_t-1、I_t+1Constructing an interframe intraframe motion data fidelity term and an intraframe motion smoothing regular term:

And 4, constructing an interframe intraframe motion joint optimization model: and (3) establishing an interframe intraframe motion joint optimization model according to the constraint items constructed in the step 3: argmin_{F}P (F) +. lambda Q (F), where the regularization parameter lambda>0；

Step 6, setting an adaptive sliding window: adopting windowing processing to each grid, setting the window size to be s, and obtaining the t-th frame I in the step 2_tIs expressed as:

the intra motion matrix obtained in step 5 is represented as:

wherein s has a value in the range of [0,30 ]]An integer of (d);

And estimating a uniform weight vector for the grid of the same frame:

is the L1 norm of the matrix; defining a grid

To the grid

The temporal distance and the spatial distance are respectively | t-k |, and

is a grid

And a grid

The horizontal distance between the two plates,

is a grid

And a grid

The vertical distance therebetween;

as a grid in the window

To the grid

The accumulation of motion between the frames in between,

May represent the total motion of the current mesh to the k-th frame global shutter point;

2. The video stabilization method for simultaneous removal of rolling artifacts and jitter according to claim 1, wherein: when constructing the data fidelity item in step 3, according to the grid

Will be mapped to other grids in the same grid column

And k ≠ i } share, can obtain properties

Grid on the left side of the equation

3. The video stabilization method for simultaneous removal of rolling artifacts and jitter according to claim 1, wherein: when the motion smoothing regular term is constructed in the step 3, according to the similarity of the intra-frame motion under the high-frequency sampling condition, for the grid

In a frame

Which is a grid with the next grid row of the same grid column

In a frame

Should there be similarity, then a smooth regularization term can be designed

4. The video stabilization method for simultaneous removal of rolling artifacts and jitter according to claim 1, wherein: the rigid transformation in step 5 is defined as a three-degree-of-freedom transformation

5. The video stabilization method for simultaneous removal of rolling artifacts and jitter according to claim 1, wherein: in step 5, according to the additive property of the rigid transformation, the optimization model in step 4 can be converted into an optimization model in three degrees of freedom, for horizontal displacement, f represents horizontal displacement in a frame, and r represents horizontal displacement between frames, the data fidelity term and the motion smoothing term in step 3 are converted into:

6. the video stabilization method for simultaneous removal of rolling artifacts and jitter according to claim 1, wherein: defined in step 8

by passing

The rolling artifacts for each grid can be removed.