CN112001860A - Video debounce algorithm based on content-aware blocking strategy - Google Patents

Video debounce algorithm based on content-aware blocking strategy

Info

Publication number
CN112001860A
CN112001860A
Authority
CN
China
Prior art keywords
frame
video
follows
constraint
feature
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010810101.0A
Other languages
Chinese (zh)
Inventor
凌强 (Qiang Ling)
赵敏达 (Minda Zhao)
王健 (Jian Wang)
李峰 (Feng Li)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
University of Science and Technology of China USTC
Original Assignee
University of Science and Technology of China USTC
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by University of Science and Technology of China USTC filed Critical University of Science and Technology of China USTC
Priority to CN202010810101.0A priority Critical patent/CN112001860A/en
Publication of CN112001860A publication Critical patent/CN112001860A/en
Pending legal-status Critical Current

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00: Image enhancement or restoration
    • G06T5/70: Denoising; Smoothing
    • G06T3/00: Geometric image transformations in the plane of the image
    • G06T3/02: Affine transformations
    • G06T7/00: Image analysis
    • G06T7/10: Segmentation; Edge detection
    • G06T7/11: Region-based segmentation
    • G06T7/20: Analysis of motion
    • G06T7/207: Analysis of motion for motion estimation over a hierarchy of resolutions
    • G06T7/246: Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N5/00: Details of television systems
    • H04N5/14: Picture signal circuitry for video frequency region
    • H04N5/21: Circuitry for suppressing or minimising disturbance, e.g. moiré or halo
    • G06T2207/00: Indexing scheme for image analysis or image enhancement
    • G06T2207/10: Image acquisition modality
    • G06T2207/10016: Video; Image sequence

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Signal Processing (AREA)
  • Image Analysis (AREA)

Abstract

The invention provides a video de-jittering (stabilization) algorithm based on a content-aware blocking strategy, which comprises the following steps. Step 1: extract feature trajectories from the video, obtain the feature-point distribution of each frame from the trajectories, and perform triangular segmentation on the feature points. Step 2: based on a trajectory smoothing constraint, an inter-frame similarity transformation constraint, an intra-frame similarity transformation constraint and a regularization constraint, solve the stable positions of the feature points with adaptive weight setting; then, from the obtained stable feature-point positions, solve the stable positions of the edge control points through the inter-frame and intra-frame similarity transformation constraints. Step 3: establish the mapping from the jittered view to the stabilized view by affine transformation of the frame-by-frame images. According to the invention, each frame of the video is divided into triangular mesh structures whose number and size follow the distribution of the trajectory points, and the triangular meshes are used for inter-frame motion estimation and smoothing, which produces a more stable result and avoids local distortion in the generated video.

Description

Video debounce algorithm based on content-aware blocking strategy
Technical Field
The invention relates to the technical field of computer vision and video debouncing, in particular to a video debouncing algorithm based on a content-aware blocking strategy.
Background
In recent years, as the video de-jittering problem has been studied more deeply, more and more algorithms have been proposed. Despite their number, existing algorithms follow the same three steps: camera motion estimation, camera motion smoothing, and mapping from the jittered view to a stable view. According to the methods used in these three steps, de-jittering algorithms can be classified into 2D, 3D and 2.5D algorithms.
2D algorithms represent the inter-frame motion of video frames with inter-frame transformation matrices and smooth the camera motion by smoothing the resulting sequence of matrices. Gaussian low-pass filtering, particle filtering, regularization, etc. are typically used for the smoothing. Grundmann et al. smooth the camera path by constraining the derivatives of the motion changes between frames. Joshi et al. match feature points across the video frames in a neighborhood of the current frame to obtain the largest inlier set fitting the sequence, thereby excluding the influence of part of the foreground region on the jitter estimation. Liu et al. use a content-preserving warping transform to divide the video frame into a grid structure and perform camera motion estimation and smoothing locally, which copes more robustly with scenes containing discontinuous depth-of-field changes.
3D algorithms are designed specifically for videos of three-dimensional scenes and can better solve the de-jittering problem for scenes with large parallax. These methods recover the 3D camera pose matrices with structure-from-motion (SfM) and then smooth the pose sequence to eliminate jitter. Liu et al. reconstruct the three-dimensional motion pattern of the captured scene and employ a mesh-based warping transformation to generate stable frames. Zhang et al. estimate and smooth the camera motion by establishing a smoothness-based optimization function.
More recently, 2.5D algorithms have been designed to combine the advantages of the 2D and 3D approaches: the feature trajectories of the jittered video are first extracted, the trajectories are then smoothed with various algorithms, and the mapping from jittered frames to stable frames is finally realized using the per-frame correspondence between the jittered and smoothed trajectories. Lee et al. extract feature trajectories from the jittered video and design optimization functions to solve for smooth trajectories at the stable viewing angle; this approach does not perform well on videos containing large foregrounds and parallax scenes. Liu et al. impose the well-known subspace constraint on the feature trajectories for motion estimation and smoothing. To guarantee performance in scenes with a large foreground, Ling et al. propose an algorithm that separates foreground and background feature trajectories and removes jitter based on a feedback strategy.
As can be seen, conventional methods divide the entire video frame into several rectangles of fixed size and then estimate, for each small block, a mapping from the local jittered view to the stable view. This approach has an obvious drawback: it ignores the content of the video and treats all regions of the full picture equally. For example, regions lacking useful information, such as sky and roads, and content-rich regions, such as crowds, are divided into rectangles of the same size, and a transformation matrix is estimated within each rectangle. For the former regions such a blocking operation may be unnecessary, since the same motion pattern can hold over a larger range; for the latter regions it is not fine enough, since they exhibit more complicated motion patterns and discontinuous parallax changes.
In conclusion, such methods have poor applicability to scenes with large foreground occlusion, complex parallax changes and the like, and poor de-jittering capability for videos of complex scenes.
Disclosure of Invention
To solve this technical problem, the invention adaptively blocks the video frame according to the scene content, imposes constraints on the foreground and the background simultaneously, and performs camera motion estimation with full-image information, thereby enhancing the de-jittering capability for videos containing complex scenes. The invention provides a video de-jittering algorithm based on a content-aware blocking strategy. Unlike existing de-jittering algorithms based on fixed blocking strategies, the algorithm considers the video content and adaptively partitions the video frame into blocks accordingly: Delaunay triangulation is performed on the feature points of the feature trajectories in each frame, realizing an adaptive blocking strategy with a different partition in every frame. By solving a two-stage optimization problem, the position of each triangle of the triangular mesh at the stable viewing angle is obtained, and the mapping from the jittered view to the stable view is carried out accordingly. The algorithm no longer distinguishes foreground from background feature trajectories and uses all trajectories to estimate and smooth the camera motion. To further improve robustness, two adaptive weight-setting strategies are provided, which significantly improve performance on scenes with large foregrounds and parallax changes.
The technical scheme of the invention is as follows: a video de-jittering algorithm based on a content-aware blocking strategy comprises the following steps.
Step 1. Extract feature trajectories from the video, obtain the feature-point distribution of each frame from the trajectories, and perform triangular segmentation on the feature points.
Step 2. Based on the trajectory smoothing constraint, the inter-frame similarity transformation constraint, the intra-frame similarity transformation constraint and the regularization constraint, solve the stable positions of the feature points with adaptive weight setting. Then, from the obtained stable feature-point positions, solve the stable positions of the edge control points through the inter-frame and intra-frame similarity transformation constraints.
Step 3. Establish the mapping of the frame-by-frame images from the jittered view to the stabilized view based on affine transformation.
Further, in the video de-jittering algorithm based on the content-aware blocking strategy, the feature trajectories in Step 1 are extracted as follows:
feature points are extracted from the video frames with the KLT algorithm and tracked to generate feature trajectories. To avoid all feature trajectories gathering in the middle region, the video frame is first divided into a 10x10 grid, and 200 corner points in total are extracted with a uniform threshold; for grid cells without corners, the threshold is lowered to guarantee that at least one feature point is detected.
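A minimal sketch of this extraction step, assuming OpenCV. The 10x10 grid, the 200-corner budget and the threshold fallback follow the text above; the detector quality levels and the KLT window parameters are illustrative assumptions, not values taken from the filing.

```python
import cv2
import numpy as np

def detect_grid_corners(gray, grid=10, total=200, quality=0.01, fallback=0.001):
    """Detect corners cell by cell so trajectories do not cluster centrally."""
    h, w = gray.shape
    per_cell = max(1, total // (grid * grid))
    found = []
    for gy in range(grid):
        for gx in range(grid):
            y0, y1 = gy * h // grid, (gy + 1) * h // grid
            x0, x1 = gx * w // grid, (gx + 1) * w // grid
            pts = cv2.goodFeaturesToTrack(gray[y0:y1, x0:x1], per_cell, quality, 5)
            if pts is None:
                # no corner found: lower the threshold, as described in the text
                pts = cv2.goodFeaturesToTrack(gray[y0:y1, x0:x1], 1, fallback, 5)
            if pts is not None:
                found.append(pts.reshape(-1, 2) + np.float32([x0, y0]))
    return np.concatenate(found, axis=0)

def klt_step(prev_gray, gray, pts):
    """Track points into the next frame; returns new points and a validity mask."""
    nxt, status, _ = cv2.calcOpticalFlowPyrLK(
        prev_gray, gray, pts.reshape(-1, 1, 2), None,
        winSize=(21, 21), maxLevel=3)
    return nxt.reshape(-1, 2), status.reshape(-1).astype(bool)
```

Chaining klt_step over consecutive frames, and keeping per-point bookkeeping of when tracking starts and fails, yields the feature trajectories used in the rest of the method.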
Further, in the video de-jittering algorithm based on the content-aware blocking strategy, Delaunay triangulation is adopted for the triangular segmentation in Step 1.

The feature-point set $M_t$ is partitioned with the standard Delaunay triangulation method, producing the triangular mesh $Q_t = \{Q_{t,1}, \dots, Q_{t,K_t}\}$, where $K_t$ is the number of triangles in $Q_t$. The positions of $M_t$ and $Q_t$ at the stable viewing angle are denoted $\hat M_t$ and $\hat Q_t$, respectively.

To obtain results at the stable viewing angle for the edge regions as well, 10 control points are set on each of the four sides of the video frame (shared corner points counted once, i.e. 36 points in total), and the set for the t-th frame is defined as $E_t = \{E_{t,1}, \dots, E_{t,36}\}$. Delaunay triangulation is then performed on the enlarged point set $\{M_t, E_t\}$. The segmentation result is expressed as $B_t \cup Q_t$: triangles in $Q_t$ are called "interior triangles", meaning triangles whose vertices are all points of $M_t$; triangles in $B_t$ are called "outer triangles", meaning triangles whose vertices contain at least one control point. The positions of $Q_t$ and $B_t$ at the stable viewing angle are denoted $\hat Q_t$ and $\hat B_t$.
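A sketch of this adaptive meshing, assuming SciPy's Delaunay implementation. The split into interior and outer triangles follows the definitions above; the helper names are illustrative.

```python
import numpy as np
from scipy.spatial import Delaunay

def mesh_frame(feature_pts, width, height, per_side=10):
    # 10 control points per side; exact duplicates at the corners are removed,
    # leaving 4 * 10 - 4 = 36 control points in total.
    xs = np.linspace(0.0, width, per_side)
    ys = np.linspace(0.0, height, per_side)
    border = np.unique(np.concatenate([
        np.stack([xs, np.zeros_like(xs)], 1),
        np.stack([xs, np.full_like(xs, height)], 1),
        np.stack([np.zeros_like(ys), ys], 1),
        np.stack([np.full_like(ys, width), ys], 1)]), axis=0)
    pts = np.concatenate([feature_pts, border], axis=0)
    tri = Delaunay(pts)                    # standard Delaunay triangulation
    n = len(feature_pts)
    is_outer = (tri.simplices >= n).any(axis=1)
    Q_t = tri.simplices[~is_outer]         # interior triangles (all feature points)
    B_t = tri.simplices[is_outer]          # outer triangles (touch a control point)
    return pts, Q_t, B_t
```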
Further, in the video de-jittering algorithm based on the content-aware blocking strategy, the stable positions of the feature points in Step 2 are solved, based on the trajectory smoothing constraint, the inter-frame similarity transformation constraint, the intra-frame similarity transformation constraint and the regularization constraint, as follows.

The method solves an optimization problem comprising three constraints to estimate the stabilized feature trajectories $\{\hat P_i\}$. The three constraints are:
(1) For a feature trajectory $P_i$, its position $\hat P_i$ at the stable viewing angle should change slowly between frames.
(2) In the t-th frame, each stabilized triangle of $\hat Q_t$ should remain similar to the corresponding original triangle in $Q_t$.
(3) In the t-th frame, the transformation relations between adjacent triangles of $\hat Q_t$ should stay consistent with the transformation relations between the corresponding triangles in $Q_t$.
Based on the above constraints, an optimization function is designed and minimized. Denoting the smoothing term by $O_s$, the inter-frame similarity transformation constraint term by $O_f$, the intra-frame similarity transformation constraint term by $O_a$ and the regularization term by $O_r$, the objective is:

$$\min \; \sum_i O_s(\hat P_i) + \sum_t \big( O_f(\hat Q_t) + O_a(\hat Q_t) \big) + O_r$$

wherein:

(1) $O_s$ is a "smoothing term" used to smooth each feature trajectory by constraining its first and second derivatives. $O_s$ is defined as:

$$O_s(\hat P_i) = \sum_t \alpha \,\lVert \hat P_{i,t} - \hat P_{i,t-1} \rVert^2 + \beta \,\lVert \hat P_{i,t+1} - 2\hat P_{i,t} + \hat P_{i,t-1} \rVert^2$$

where α and β are weight coefficients, α = 2 and β = 10.
(2) $O_f$ is an "inter-frame similarity transformation constraint term", which ensures that the transformed video frame remains a similarity transform of the original video frame. For the $K_t$ Delaunay triangles in $Q_t$, the mesh at the stable viewing angle is defined as $\hat Q_t$, and the vertices of the i-th triangle of $Q_t$ and $\hat Q_t$ are defined as $(V_{t,i}^1, V_{t,i}^2, V_{t,i}^3)$ and $(\hat V_{t,i}^1, \hat V_{t,i}^2, \hat V_{t,i}^3)$. Each triangle of $\hat Q_t$ is required to be similar to the corresponding triangle in $Q_t$. $O_f$ is defined as:

$$O_f(\hat Q_t) = \gamma \sum_{i=1}^{K_t} \big\lVert \hat V_{t,i}^1 - \hat V_{t,i}^2 - a_B(\hat V_{t,i}^3 - \hat V_{t,i}^2) - b_B R(\hat V_{t,i}^3 - \hat V_{t,i}^2) \big\rVert^2, \qquad R = \begin{bmatrix} 0 & 1 \\ -1 & 0 \end{bmatrix}$$

where γ is a weight coefficient, γ = 10, and the coefficients $a_B$ and $b_B$ represent the original triangle in the local coordinate system of its edge $V_{t,i}^3 - V_{t,i}^2$; they are obtained from:

$$V_{t,i}^1 = V_{t,i}^2 + a_B \,(V_{t,i}^3 - V_{t,i}^2) + b_B \,R\,(V_{t,i}^3 - V_{t,i}^2)$$
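For illustration, $a_B$ and $b_B$ have a closed form, since the edge $V^3 - V^2$ and its rotation are orthogonal. The sketch below assumes the rotation convention $R = [[0, 1], [-1, 0]]$ and NumPy; it is a worked example, not a reproduction of the original formula drawing.

```python
import numpy as np

R = np.array([[0.0, 1.0], [-1.0, 0.0]])  # assumed 90-degree rotation convention

def similarity_coeffs(v1, v2, v3):
    """a_B, b_B such that v1 = v2 + a_B*(v3 - v2) + b_B*R@(v3 - v2)."""
    u = v3 - v2
    ru = R @ u
    d = v1 - v2
    denom = float(u @ u)          # |u|^2; u and R@u are orthogonal
    return float(d @ u) / denom, float(d @ ru) / denom

def interframe_residual(v1h, v2h, v3h, a_b, b_b):
    """Deviation of the stabilized triangle from a similarity transform of the
    original; summing its squared norm over all triangles, weighted by
    gamma = 10, gives the inter-frame term O_f."""
    u = v3h - v2h
    return v1h - (v2h + a_b * u + b_b * (R @ u))
```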
(3) $O_a$ is an "intra-frame similarity transformation constraint term", which ensures that the transformation relations between local regions in the transformed video frame remain similar to the transformation relations in the original video frame. $O_a$ is defined as:

$$O_a(\hat Q_t) = \lambda \sum_{i=1}^{K_t} \sum_{j \in \phi(i)} \big\lVert \hat V_{t,j} - (a\,\hat V_{t,i}^1 + b\,\hat V_{t,i}^2 + c\,\hat V_{t,i}^3) \big\rVert^2$$

where λ is a weight coefficient set to 20, $\phi(i)$ denotes all triangles adjacent to triangle i, and $\hat V_{t,j}$ is the stabilized position of the vertex of neighbor j that is expressed, in the original frame, as an affine combination of the vertices of triangle i with coefficients a, b, c obtained from:

$$V_{t,j} = a\,V_{t,i}^1 + b\,V_{t,i}^2 + c\,V_{t,i}^3, \qquad a + b + c = 1$$
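One way to realize the coefficients a, b, c with a + b + c = 1 is as barycentric coordinates of the neighboring vertex with respect to triangle i, obtained from a 3x3 linear system. This concrete realization is an inference from the constraint stated above, not a reproduction of the original formula drawing.

```python
import numpy as np

def barycentric_coeffs(p, v1, v2, v3):
    """Solve p = a*v1 + b*v2 + c*v3 with a + b + c = 1."""
    A = np.array([[v1[0], v2[0], v3[0]],
                  [v1[1], v2[1], v3[1]],
                  [1.0,   1.0,   1.0]])
    a, b, c = np.linalg.solve(A, np.array([p[0], p[1], 1.0]))
    return a, b, c  # a + b + c = 1 holds by construction

# Usage: predict where the neighbor vertex should land at the stable view,
# given the stabilized triangle (v1h, v2h, v3h):
#   p_hat = a * v1h + b * v2h + c * v3h
```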
(4) $O_r$ is a "regularization term" that keeps the stabilized feature trajectories close in position to the original feature trajectories, thereby avoiding an excessive transformation that would cause a substantial loss of video content. It is defined as:

$$O_r = \sum_i \sum_t \lVert \hat P_{i,t} - P_{i,t} \rVert^2$$
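To make the first-stage solve concrete, the following sketch minimizes only the smoothing and regularization terms for a single trajectory and a single coordinate; the two similarity terms couple trajectories through the mesh and are omitted here. With α = 2 and β = 10 the restricted problem is a sparse linear least-squares problem. The names and the SciPy solver choice are illustrative assumptions.

```python
import numpy as np
from scipy.sparse import identity, diags, vstack
from scipy.sparse.linalg import lsqr

def smooth_trajectory(p, alpha=2.0, beta=10.0):
    """p: (T,) positions of one trajectory in one coordinate (x or y)."""
    T = len(p)
    I = identity(T, format="csr")                            # regularization term
    D1 = diags([-1, 1], [0, 1], shape=(T - 1, T))            # first difference
    D2 = diags([1, -2, 1], [0, 1, 2], shape=(T - 2, T))      # second difference
    A = vstack([I, np.sqrt(alpha) * D1, np.sqrt(beta) * D2]).tocsr()
    b = np.concatenate([p, np.zeros(T - 1), np.zeros(T - 2)])
    return lsqr(A, b)[0]                                     # stabilized positions
```

This solves min ||p_hat - p||^2 + alpha*||D1 p_hat||^2 + beta*||D2 p_hat||^2, i.e. the O_r and O_s parts of the objective restricted to one trajectory.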
further, in the video debounce algorithm based on the content-aware blocking strategy, the Step of solving the stable position of the feature point based on the trajectory smoothing constraint, the inter-frame similarity transformation constraint, the intra-frame similarity transformation constraint and the regularization constraint in Step2 is as follows:
further, in the video debounce algorithm based on the content-aware blocking strategy, the Step of solving the stable position of the edge control point in Step2 is as follows:
the following optimization problem is designed to find the positions of these control points at a stable viewing angle:
Figure BDA0002630639310000055
wherein:
Figure BDA0002630639310000056
(1)
Figure BDA0002630639310000057
is an interframe similarity transformation constraint term defined as follows:
Figure BDA0002630639310000058
wherein
Figure BDA0002630639310000059
Different optimization operations for the feature points and the control points are represented, and are specifically defined as follows:
Figure BDA00026306393100000510
Figure BDA00026306393100000511
representing the desired stable position, the solution is as follows:
Figure BDA00026306393100000512
Figure BDA00026306393100000513
Figure BDA00026306393100000514
indicating the stable position of the feature point. Gamma is a weight coefficient with a value equal to 1.
(2) $O_a^B$ is the intra-frame similarity transformation constraint term for the outer triangles: the transformation relations between adjacent triangles of $\hat B_t$ must stay consistent with the relations between the corresponding triangles in the original frame. As for the interior triangles, the relation is encoded by coefficients a, b, c solved from:

$$V_{t,j} = a\,V_{t,i}^1 + b\,V_{t,i}^2 + c\,V_{t,i}^3, \qquad a + b + c = 1$$

where $\hat P$ denotes the stable positions of the feature points computed in the previous stage.
Further, in the video de-jittering algorithm based on the content-aware blocking strategy, the adaptive weight setting strategies in Step 2 are as follows:

(1) Weight setting based on temporal adaptation

The smoothing term $O_s$ expects the change of a feature trajectory between adjacent frames to tend to 0. However, rapid camera motion often produces large inter-frame variations of the feature trajectories, and forcing them to 0 can lead to a collapse of the video content. That is, α should be reduced appropriately when fast camera motion is detected. The improved weight therefore decays with the estimated camera velocity, with σ = 10, where $v_x$ and $v_y$ denote the velocities in the x and y directions, taken as the average inter-frame displacement of the feature points visible in the frame.
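A hedged sketch of the time-adaptive weight: the exact expression was given only in the original formula drawing, so the Gaussian decay in the mean feature velocity below (with σ = 10) is an assumption that merely reproduces the stated behavior, namely that α shrinks when fast camera motion is detected.

```python
import numpy as np

def adaptive_alpha(prev_pts, cur_pts, alpha=2.0, sigma=10.0):
    v = np.mean(cur_pts - prev_pts, axis=0)        # mean velocity (v_x, v_y)
    speed2 = float(v @ v)
    return alpha * np.exp(-speed2 / (sigma ** 2))  # assumed decay shape
```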
(2) Weight setting based on spatial adaptation

Videos of real scenes captured by handheld devices inevitably contain discontinuous depth variations and local-motion inconsistencies caused by foreground occlusion, which can produce distortion in the stabilized video frames. To solve this problem, each triangle of the mesh is first tested for a dynamic foreground object, and the weight of the inter-frame similarity term $O_f$ is then increased for triangular regions that contain one.

For feature trajectory i, an overdetermined system of equations is used to compute the transformation matrix $H_{t,i}$ between the local point sets $C_{t,i}$ and $C_{t-1,i}$ of trajectory i in adjacent frames. Writing the system as $A_{t,i}\beta_{t,i} = B_{t,i}$, the parameters are solved by the least-squares method:

$$\hat\beta_{t,i} = (A_{t,i}^{\mathsf T} A_{t,i})^{-1} A_{t,i}^{\mathsf T} B_{t,i}$$

and the residual $\lVert A_{t,i}\hat\beta_{t,i} - B_{t,i} \rVert^2$ is normalized by a spatial scale $\theta_{t,i}$ that accounts for the spatial extent of $C_{t,i}$, where ρ = (W/τ + H/τ)/2 and W and H denote the width and height of the video frame. τ is used to control the scale of the blocks in the normalization; its value is set to 10 and kept equal to the number of control points on each side of the video frame.

Finally, the weight of each triangle p in $O_f$ is defined from the normalized residuals of the trajectories in V(p), where V(p) denotes the set of the three vertices of triangle p: a large residual indicates a dynamic foreground object and raises the weight.
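A sketch of the residual computation used by this spatial test. The affine parameterization of $H_{t,i}$ and the extent-based spatial scale are assumptions (the original definitions are in the formula drawings), flagged as such in the comments.

```python
import numpy as np

def foreground_residual(prev_pts, cur_pts, width, height, tau=10.0):
    """Normalized least-squares residual of a local motion fit; prev_pts and
    cur_pts are (n, 2) arrays with n >= 3."""
    # Overdetermined system A beta = B for a 2D affine map (6 parameters);
    # the affine form of H_{t,i} is an assumption.
    n = len(prev_pts)
    A = np.zeros((2 * n, 6))
    A[0::2, 0:2], A[0::2, 2] = prev_pts, 1.0
    A[1::2, 3:5], A[1::2, 5] = prev_pts, 1.0
    B = cur_pts.reshape(-1)
    beta, *_ = np.linalg.lstsq(A, B, rcond=None)   # beta = (A^T A)^{-1} A^T B
    r = float(np.sum((A @ beta - B) ** 2))         # ||A beta - B||^2
    rho = (width / tau + height / tau) / 2.0
    theta = max(np.ptp(prev_pts, axis=0).max() / rho, 1e-6)  # assumed scale form
    return r / theta
```

A large return value for any of a triangle's three vertex trajectories marks the triangle as containing foreground motion, and its weight in the inter-frame term is raised accordingly.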
After all $P_{i,t}$ and their stable positions $\hat P_{i,t}$ have been found, affine transformations are computed and the jittered frames are mapped to the stabilized frames.
Compared with the prior art, the invention has the following advantages:
(1) The invention adaptively blocks the video frame according to the scene content, imposes constraints on the foreground and the background simultaneously, and performs camera motion estimation with full-image information, thereby enhancing the de-jittering capability for videos containing complex scenes.
(2) The adaptive weight setting based on time and space further improves the robustness of the algorithm.
(3) The calculation of the stable positions of the edge control points improves the stability of the image in the edge regions.
Drawings
Fig. 1 is a flow chart of the video de-jittering method based on a content-aware blocking strategy according to the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be described clearly and completely with reference to the accompanying drawings. Obviously, the described embodiments are only a part of the embodiments of the present invention rather than all of them; all other embodiments obtained by a person skilled in the art from these embodiments without creative effort fall within the protection scope of the present invention.
As shown in Fig. 1, according to an embodiment of the present invention, a video de-jittering method based on a content-aware blocking strategy is provided, comprising the following steps:
Step 1. Feature trajectory extraction and triangulation
Feature points are extracted from the video frames with the KLT algorithm and tracked to generate feature trajectories. To avoid all feature trajectories gathering in the middle region, the video frame is first divided into a 10x10 grid and 200 corner points in total are extracted with a uniform threshold (in view of the computation load, preferably only 200 corner points are extracted); for regions without corners, the threshold is lowered to guarantee that at least one feature point is detected. Feature trajectories are then generated from the extracted feature points.
Suppose N feature trajectories are extracted from the jittered video, defined as $\{P_i\}_{i=1}^N$. These trajectories may come from background or foreground regions. For a feature trajectory i (i ∈ [1, N]), the first and last frames of its appearance are defined as $s_i$ and $e_i$. The position of trajectory i in the t-th frame ($s_i \le t \le e_i$) is defined as $P_{i,t}$. The set of feature points of all trajectories appearing in the t-th frame is $M_t = \{P_{i,t} \mid i \in [1, N],\; t \in [s_i, e_i]\}$.
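A small bookkeeping sketch of the notation just defined: each trajectory carries its first frame $s_i$ and last frame $e_i$, and $M_t$ collects the positions of all trajectories visible in frame t. All names here are illustrative.

```python
from dataclasses import dataclass, field

@dataclass
class Trajectory:
    s: int                                       # first frame of appearance (s_i)
    points: list = field(default_factory=list)   # P_{i,s_i}, ..., P_{i,e_i}

    @property
    def e(self):                                 # last frame of appearance (e_i)
        return self.s + len(self.points) - 1

def feature_set(trajectories, t):
    """M_t = {P_{i,t} | s_i <= t <= e_i}, keyed by trajectory index."""
    return {i: tr.points[t - tr.s]
            for i, tr in enumerate(trajectories) if tr.s <= t <= tr.e}
```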
The feature-point set $M_t$ is partitioned with the standard Delaunay triangulation method, producing the triangular mesh $Q_t = \{Q_{t,1}, \dots, Q_{t,K_t}\}$, where $K_t$ is the number of triangles in $Q_t$. The positions of $M_t$ and $Q_t$ at the stable viewing angle are denoted $\hat M_t$ and $\hat Q_t$, respectively.

To obtain results at the stable viewing angle for the edge regions, 10 control points are set on each of the four sides of the video frame, for a total of 36 control points (shared corners counted once), defined in the t-th frame as $E_t = \{E_{t,1}, \dots, E_{t,36}\}$. Delaunay triangulation is then performed on the enlarged point set $\{M_t, E_t\}$. The segmentation result is expressed as $B_t \cup Q_t$: triangles in $Q_t$ are called "interior triangles", whose vertices are all points of $M_t$; triangles in $B_t$ are called "outer triangles", whose vertices contain at least one control point. The positions of $Q_t$ and $B_t$ at the stable viewing angle are denoted $\hat Q_t$ and $\hat B_t$.
Step 2. Smoothing the feature trajectories with an optimization function
2.1 Calculation of the stable positions of the feature points
This step estimates the stabilized feature trajectories $\{\hat P_i\}$ by solving an optimization problem comprising three constraints:
(1) For a feature trajectory $P_i$, its position $\hat P_i$ at the stable viewing angle should change slowly between frames.
(2) In the t-th frame, each stabilized triangle of $\hat Q_t$ should remain similar to the corresponding original triangle in $Q_t$.
(3) In the t-th frame, the transformation relations between adjacent triangles of $\hat Q_t$ should stay consistent with the transformation relations between the corresponding triangles in $Q_t$.
Based on the above constraints, an optimization function is designed and minimized. As before, the smoothing term is denoted $O_s$, the inter-frame similarity transformation constraint term $O_f$, the intra-frame similarity transformation constraint term $O_a$ and the regularization term $O_r$:

$$\min \; \sum_i O_s(\hat P_i) + \sum_t \big( O_f(\hat Q_t) + O_a(\hat Q_t) \big) + O_r$$

wherein:

(1) $O_s$ is a "smoothing term" used to smooth each feature trajectory by constraining its first and second derivatives. $O_s$ is defined as:

$$O_s(\hat P_i) = \sum_t \alpha \,\lVert \hat P_{i,t} - \hat P_{i,t-1} \rVert^2 + \beta \,\lVert \hat P_{i,t+1} - 2\hat P_{i,t} + \hat P_{i,t-1} \rVert^2$$

where α and β are weight coefficients, α = 2 and β = 10.
(2) $O_f$ is an "inter-frame similarity transformation constraint term", which ensures that the transformed video frame remains a similarity transform of the original video frame. For the $K_t$ Delaunay triangles in $Q_t$, the mesh at the stable viewing angle is defined as $\hat Q_t$, and the vertices of the i-th triangle of $Q_t$ and $\hat Q_t$ are defined as $(V_{t,i}^1, V_{t,i}^2, V_{t,i}^3)$ and $(\hat V_{t,i}^1, \hat V_{t,i}^2, \hat V_{t,i}^3)$. Each triangle of $\hat Q_t$ is required to be similar to the corresponding triangle in $Q_t$. $O_f$ is defined as:

$$O_f(\hat Q_t) = \gamma \sum_{i=1}^{K_t} \big\lVert \hat V_{t,i}^1 - \hat V_{t,i}^2 - a_B(\hat V_{t,i}^3 - \hat V_{t,i}^2) - b_B R(\hat V_{t,i}^3 - \hat V_{t,i}^2) \big\rVert^2, \qquad R = \begin{bmatrix} 0 & 1 \\ -1 & 0 \end{bmatrix}$$

where γ is a weight coefficient, γ = 10, and $a_B$ and $b_B$ are obtained from:

$$V_{t,i}^1 = V_{t,i}^2 + a_B \,(V_{t,i}^3 - V_{t,i}^2) + b_B \,R\,(V_{t,i}^3 - V_{t,i}^2)$$
(3) $O_a$ is an "intra-frame similarity transformation constraint term", which ensures that the transformation relations between local regions in the transformed video frame remain similar to the transformation relations in the original video frame. $O_a$ is defined as:

$$O_a(\hat Q_t) = \lambda \sum_{i=1}^{K_t} \sum_{j \in \phi(i)} \big\lVert \hat V_{t,j} - (a\,\hat V_{t,i}^1 + b\,\hat V_{t,i}^2 + c\,\hat V_{t,i}^3) \big\rVert^2$$

where λ is a weight coefficient set to 20, $\phi(i)$ denotes all triangles adjacent to triangle i, and a, b, c are obtained from:

$$V_{t,j} = a\,V_{t,i}^1 + b\,V_{t,i}^2 + c\,V_{t,i}^3, \qquad a + b + c = 1$$
(4) $O_r$ is a "regularization term" that keeps the stabilized feature trajectories close in position to the original feature trajectories, thereby avoiding an excessive transformation that would cause a substantial loss of video content. It is defined as:

$$O_r = \sum_i \sum_t \lVert \hat P_{i,t} - P_{i,t} \rVert^2$$
The adaptive weight setting strategies are as follows:

(1) Weight setting based on temporal adaptation

The smoothing term $O_s$ expects the change of a feature trajectory between adjacent frames to tend to 0. However, rapid camera motion often produces large inter-frame variations of the feature trajectories, and forcing them to 0 can lead to a collapse of the video content. That is, α should be reduced appropriately when fast camera motion is detected. The improved weight therefore decays with the estimated camera velocity, with σ = 10, where $v_x$ and $v_y$ denote the velocities in the x and y directions, computed as the average inter-frame displacement of the feature points visible in the frame.
(2) Weight setting based on spatial adaptation

Videos of real scenes captured by handheld devices inevitably contain discontinuous depth variations and local-motion inconsistencies caused by foreground occlusion, which can produce distortion in the stabilized video frames. To solve this problem, each triangle of the mesh is first tested for a dynamic foreground object, and the weight of the inter-frame similarity term $O_f$ is then increased for triangular regions that contain one.

For feature trajectory i, an overdetermined system of equations is used to compute the transformation matrix $H_{t,i}$ between the local point sets $C_{t,i}$ and $C_{t-1,i}$. Writing the system as $A_{t,i}\beta_{t,i} = B_{t,i}$, the parameters are solved by the least-squares method:

$$\hat\beta_{t,i} = (A_{t,i}^{\mathsf T} A_{t,i})^{-1} A_{t,i}^{\mathsf T} B_{t,i}$$

and the residual $\lVert A_{t,i}\hat\beta_{t,i} - B_{t,i} \rVert^2$ is normalized by a spatial scale $\theta_{t,i}$ that accounts for the spatial extent of $C_{t,i}$, where ρ = (W/τ + H/τ)/2 and W and H denote the width and height of the video frame. τ controls the scale of the blocks in the normalization; its value is set to 10 and kept equal to the number of control points on each side of the video frame.

Finally, the weight of each triangle p in $O_f$ is defined from the normalized residuals of the trajectories in V(p), where V(p) denotes the set of the three vertices of triangle p.
2.2 Calculation of the stable positions of the control points
The following optimization problem is designed to find the positions of the control points at the stable viewing angle:

$$\min \; \sum_t \big( O_f^B(\hat B_t) + O_a^B(\hat B_t) \big)$$

wherein:

(1) $O_f^B$ is the inter-frame similarity transformation constraint term for the outer triangles: each stabilized triangle of $\hat B_t$ must remain similar to its counterpart in $B_t$, using the same local-coordinate coefficients $a_B$, $b_B$ as in Section 2.1. The vertices of an outer triangle may be feature points, whose stable positions $\hat P$ were computed in Section 2.1 and are treated as constants, or control points, whose stable positions are the unknowns of this stage; different optimization operations are applied to the two kinds of vertices.

(2) $O_a^B$ is the intra-frame similarity transformation constraint term for the outer triangles: the transformation relations between adjacent stabilized triangles must stay consistent with the relations between the corresponding triangles in the original frame, with coefficients a, b, c solved from:

$$V_{t,j} = a\,V_{t,i}^1 + b\,V_{t,i}^2 + c\,V_{t,i}^3, \qquad a + b + c = 1$$

where $\hat P$ denotes the stable positions of the feature points.
step3, affine transformation from jittering visual angle to stable visual angle
And performing homography matrix calculation according to the characteristic point coordinates under the dithering visual angle of the t-th frame and the estimated characteristic point coordinates under the stable visual angle, and performing affine transformation.
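A sketch of this mapping step, assuming OpenCV: each mesh triangle is warped from its jittered position to its estimated stable position with a per-triangle affine transform. Using cv2.getAffineTransform per triangle with a fill mask is one standard realization of the triangle-wise mapping described above, not necessarily the exact procedure of the filing.

```python
import cv2
import numpy as np

def warp_mesh(frame, tris_src, tris_dst):
    """tris_src/tris_dst: matching lists of (3, 2) vertex arrays, jittered
    and stabilized positions of each mesh triangle."""
    h, w = frame.shape[:2]
    out = np.zeros_like(frame)
    for src, dst in zip(tris_src, tris_dst):
        M = cv2.getAffineTransform(src.astype(np.float32), dst.astype(np.float32))
        warped = cv2.warpAffine(frame, M, (w, h))
        mask = np.zeros((h, w), np.uint8)
        cv2.fillConvexPoly(mask, dst.astype(np.int32), 255)
        out[mask > 0] = warped[mask > 0]    # paste the warped triangle region
    return out
```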
The above description is only the preferred embodiment of the present invention, but the protection scope of the present invention is not limited thereto. Any changes or substitutions that can be easily conceived by a person skilled in the art within the technical scope disclosed by the present invention shall be covered by the protection scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (7)

1. A video de-jittering algorithm based on a content-aware blocking strategy, characterized by comprising the following steps:
Step 1. extracting feature trajectories from the video, obtaining the feature-point distribution of each frame from the trajectories, and performing triangular segmentation on the feature points;
Step 2. based on the trajectory smoothing constraint, the inter-frame similarity transformation constraint, the intra-frame similarity transformation constraint and the regularization constraint, solving the stable positions of the feature points with adaptive weight setting, and further solving the stable positions of the edge control points through the inter-frame and intra-frame similarity transformation constraints from the obtained stable feature-point positions;
Step 3. establishing the mapping from the jittered view to the stabilized view based on affine transformation of the frame-by-frame images.
2. The video de-jittering algorithm based on a content-aware blocking strategy according to claim 1, characterized in that the feature trajectories in Step 1 are extracted as follows:
feature points are extracted from the video frames with the KLT algorithm and tracked to generate feature trajectories; the video frame is first divided into a 10x10 grid and 200 corner points are extracted with a uniform threshold; for regions without corners, the threshold is lowered to guarantee that at least one feature point is detected;
suppose N feature trajectories are extracted from the jittered video, defined as $\{P_i\}_{i=1}^N$; these trajectories come from background or foreground regions; for a feature trajectory i, i ∈ [1, N], the first and last frames of its appearance are defined as $s_i$ and $e_i$; the position of trajectory i in the t-th frame is defined as $P_{i,t}$, where $s_i \le t \le e_i$; the set of feature points of all trajectories appearing in the t-th frame is $M_t = \{P_{i,t} \mid i \in [1, N],\; t \in [s_i, e_i]\}$.
3. The video de-jittering algorithm based on a content-aware blocking strategy according to claim 1, characterized in that the triangulation step in Step 1 is as follows:
the feature-point set $M_t$ is partitioned with the standard Delaunay triangulation method, producing the triangular mesh $Q_t = \{Q_{t,1}, \dots, Q_{t,K_t}\}$, where $K_t$ is the number of triangles in $Q_t$, and the positions of $M_t$ and $Q_t$ at the stable viewing angle are denoted $\hat M_t$ and $\hat Q_t$;
to obtain results at the stable viewing angle for the edge regions, 10 control points are set on each of the four sides of the video frame, and their set in the t-th frame is defined as $E_t = \{E_{t,1}, \dots, E_{t,36}\}$; Delaunay triangulation is then performed on the enlarged point set $\{M_t, E_t\}$, and the segmentation result is expressed as $B_t \cup Q_t$, wherein triangles in $Q_t$ are called "interior triangles", meaning triangles whose vertices are all points of $M_t$, and triangles in $B_t$, whose number is $L_t$, are called "outer triangles", meaning triangles whose vertices contain at least one control point; the positions of $Q_t$ and $B_t$ at the stable viewing angle are denoted $\hat Q_t$ and $\hat B_t$.
4. The video de-jittering algorithm based on a content-aware blocking strategy according to claim 1, characterized in that the stable positions of the feature points in Step 2 are solved, based on the trajectory smoothing constraint, the inter-frame similarity transformation constraint, the intra-frame similarity transformation constraint and the regularization constraint, as follows:
the stabilized feature trajectories $\{\hat P_i\}$ are estimated by solving an optimization problem comprising three constraints:
(A) for a feature trajectory $P_i$, its position $\hat P_i$ at the stable viewing angle should change slowly between frames;
(B) in the t-th frame, each stabilized triangle of $\hat Q_t$ should remain similar to the corresponding original triangle in $Q_t$;
(C) in the t-th frame, the transformation relations between adjacent triangles of $\hat Q_t$ should stay consistent with the transformation relations between the corresponding adjacent triangles in $Q_t$;
based on the above constraints, the following objective is minimized, with smoothing term $O_s$, inter-frame similarity transformation constraint term $O_f$, intra-frame similarity transformation constraint term $O_a$ and regularization term $O_r$:
$$\min \; \sum_i O_s(\hat P_i) + \sum_t \big( O_f(\hat Q_t) + O_a(\hat Q_t) \big) + O_r$$
wherein:
(1) $O_s$ is a "smoothing term" used to smooth each feature trajectory by constraining its first and second derivatives; $O_s$ is defined as:
$$O_s(\hat P_i) = \sum_t \alpha \,\lVert \hat P_{i,t} - \hat P_{i,t-1} \rVert^2 + \beta \,\lVert \hat P_{i,t+1} - 2\hat P_{i,t} + \hat P_{i,t-1} \rVert^2$$
wherein α and β are weight coefficients, α = 2, β = 10;
(2) $O_f$ is an "inter-frame similarity transformation constraint term" which ensures that the transformed video frame remains a similarity transform of the original video frame; for the $K_t$ Delaunay triangles in $Q_t$, the mesh at the stable viewing angle is defined as $\hat Q_t$; the vertices of the i-th triangle of $Q_t$ and $\hat Q_t$ are defined as $(V_{t,i}^1, V_{t,i}^2, V_{t,i}^3)$ and $(\hat V_{t,i}^1, \hat V_{t,i}^2, \hat V_{t,i}^3)$; each triangle of $\hat Q_t$ is required to be similar to the corresponding triangle in $Q_t$; $O_f$ is defined as:
$$O_f(\hat Q_t) = \gamma \sum_{i=1}^{K_t} \big\lVert \hat V_{t,i}^1 - \hat V_{t,i}^2 - a_B(\hat V_{t,i}^3 - \hat V_{t,i}^2) - b_B R(\hat V_{t,i}^3 - \hat V_{t,i}^2) \big\rVert^2$$
wherein γ is a weight coefficient, γ = 10, R is the 90-degree rotation matrix, and $a_B$ and $b_B$ are the coefficients of the corresponding vector edge, solved from:
$$V_{t,i}^1 = V_{t,i}^2 + a_B \,(V_{t,i}^3 - V_{t,i}^2) + b_B \,R\,(V_{t,i}^3 - V_{t,i}^2);$$
(3) $O_a$ is an "intra-frame similarity transformation constraint term" used to ensure that the transformation relations between local regions in the transformed video frame are similar to the transformation relations between the local regions in the original video frame; $O_a$ is defined as:
$$O_a(\hat Q_t) = \lambda \sum_{i=1}^{K_t} \sum_{j \in \phi(i)} \big\lVert \hat V_{t,j} - (a\,\hat V_{t,i}^1 + b\,\hat V_{t,i}^2 + c\,\hat V_{t,i}^3) \big\rVert^2$$
wherein λ is a weighting factor set to 20, $\phi(i)$ represents all adjacent triangles of triangle i, $a\,\hat V_{t,i}^1 + b\,\hat V_{t,i}^2 + c\,\hat V_{t,i}^3$ is the expected stable position under the intra-frame similarity transformation, and a, b, c are the coefficients corresponding to the three vertices, obtained from:
$$V_{t,j} = a\,V_{t,i}^1 + b\,V_{t,i}^2 + c\,V_{t,i}^3, \qquad a + b + c = 1;$$
(4) $O_r$ is a "regularization term" used to keep the stabilized feature trajectories close in position to the original feature trajectories, thereby avoiding a large loss of video content due to excessive transformation; it is defined as:
$$O_r = \sum_i \sum_t \lVert \hat P_{i,t} - P_{i,t} \rVert^2.$$
5. The video de-jittering algorithm based on a content-aware blocking strategy according to claim 4, characterized in that the stable positions of the edge control points in Step 2 are solved as follows:
the following optimization problem is designed to find the positions of the control points at the stable viewing angle:
$$\min \; \sum_t \big( O_f^B(\hat B_t) + O_a^B(\hat B_t) \big)$$
wherein:
(1) $O_f^B$ is an inter-frame similarity transformation constraint term: each stabilized outer triangle of $\hat B_t$ must remain similar to its counterpart in $B_t$, using the same edge coefficients $a_B$, $b_B$ as in claim 4; because the vertices of an outer triangle may be feature points or control points, different optimization operations are applied to the two kinds of vertices: for a feature-point vertex the desired stable position is its stable position $\hat P$ computed in the feature-point stage, treated as a constant, while for a control-point vertex the stable position is an unknown; γ is a weight coefficient with a value equal to 1;
(2) $O_a^B$ is an intra-frame similarity transformation constraint term: the triangles of the boundary set $B_t$ are required to keep the transformation relations between adjacent triangles in the stable frame consistent with the relations between the corresponding triangles in the original video frame; an adjacent triangle j of triangle $B_{t,i}$ may belong to $B_t$ or $Q_t$, and its vertices that differ from those of $B_{t,i}$ may likewise belong to $B_t$ or $Q_t$; the union of $B_t$ and $Q_t$ is denoted $BQ_t$, and in the t-th frame the adjacent triangle of $B_{t,i}$ is denoted $BQ_{t,j}$; the vertices in which $B_{t,i}$ and $BQ_{t,j}$ differ are expressed, by linear texture mapping, as combinations of the vertices of $BQ_{t,j}$; at the stable viewing angle the corresponding triangle and its vertices are constrained in the same way, where the operation applied to each vertex of the stabilized triangle depends on whether it is a control point or a feature point; the coefficients a, b, c are solved from:
$$V = a\,V^1 + b\,V^2 + c\,V^3, \qquad a + b + c = 1.$$
6. The video de-jittering algorithm based on a content-aware blocking strategy according to claim 1, characterized in that the adaptive weight setting in Step 2 adopts the following two methods:
(1) weight setting based on temporal adaptation:
the smoothing term expects the change of a feature trajectory between adjacent frames to tend to 0; α should therefore be reduced appropriately when fast camera motion is detected, and the improved weight decays with the estimated camera velocity, wherein σ = 10 and $v_x$ and $v_y$ represent the velocities in the x and y directions, computed as the average inter-frame displacement of the feature points visible in the frame;
(2) weight setting based on spatial adaptation:
videos of real scenes shot by handheld devices contain discontinuous depth changes and local-motion inconsistencies caused by foreground occlusion, which produce distortion in the stabilized video frames; therefore it is first judged whether a dynamic foreground object exists in each triangle of the triangular mesh, and the weight of the inter-frame similarity term is then increased for triangular regions containing a foreground object; for feature trajectory i, an overdetermined system of equations is used to compute the transformation matrix $H_{t,i}$ between the local point sets $C_{t,i}$ and $C_{t-1,i}$; writing the system as $A_{t,i}\beta_{t,i} = B_{t,i}$, the parameters are solved by the least-squares method:
$$\hat\beta_{t,i} = (A_{t,i}^{\mathsf T} A_{t,i})^{-1} A_{t,i}^{\mathsf T} B_{t,i}$$
and the residual $\lVert A_{t,i}\hat\beta_{t,i} - B_{t,i} \rVert^2$ is normalized by a spatial scale $\theta_{t,i}$, where ρ = (W/τ + H/τ)/2, W and H denote the width and height of the video frame, and τ, used to control the block scale in the normalization, is set to 10 and equals the number of control points on each side of the video frame;
finally, the weight of each triangle p in the inter-frame similarity term is defined from the normalized residuals of the trajectories in V(p), the set of the three vertices of triangle p.
7. The video de-jittering algorithm based on a content-aware blocking strategy according to claim 1, characterized in that Step 3 specifically comprises:
a homography matrix is calculated according to the feature-point coordinates at the jittered viewing angle of the t-th frame and the estimated feature-point coordinates at the stable viewing angle, and the affine transformation is performed.
CN202010810101.0A 2020-08-13 2020-08-13 Video debounce algorithm based on content-aware blocking strategy Pending CN112001860A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010810101.0A CN112001860A (en) 2020-08-13 2020-08-13 Video debounce algorithm based on content-aware blocking strategy

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010810101.0A CN112001860A (en) 2020-08-13 2020-08-13 Video debounce algorithm based on content-aware blocking strategy

Publications (1)

Publication Number Publication Date
CN112001860A true CN112001860A (en) 2020-11-27

Family

ID=73463980

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010810101.0A Pending CN112001860A (en) 2020-08-13 2020-08-13 Video debounce algorithm based on content-aware blocking strategy

Country Status (1)

Country Link
CN (1) CN112001860A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112804444A (en) * 2020-12-30 2021-05-14 影石创新科技股份有限公司 Video processing method and device, computing equipment and storage medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
MINDA ZHAO et al.: "Adaptively Meshed Video Stabilization", arXiv *


Similar Documents

Publication Publication Date Title
Wang et al. Spatially and temporally optimized video stabilization
Wang et al. Motion-aware temporal coherence for video resizing
CN106331480B (en) Video image stabilization method based on image splicing
US20150022677A1 (en) System and method for efficient post-processing video stabilization with camera path linearization
US9165401B1 (en) Multi-perspective stereoscopy from light fields
US9838604B2 (en) Method and system for stabilizing video frames
EP1864502A2 (en) Dominant motion estimation for image sequence processing
CN101853497A (en) Image enhancement method and device
CN111614965B (en) Unmanned aerial vehicle video image stabilization method and system based on image grid optical flow filtering
CN106851102A (en) A kind of video image stabilization method based on binding geodesic curve path optimization
Berdnikov et al. Real-time depth map occlusion filling and scene background restoration for projected-pattern-based depth cameras
Wang et al. Video stabilization: A comprehensive survey
CN104469086A (en) Method and device for removing dithering of video
Zhao et al. Adaptively meshed video stabilization
KR101851896B1 (en) Method and apparatus for video stabilization using feature based particle keypoints
CN105282400B (en) A kind of efficient video antihunt means based on geometry interpolation
CN112001860A (en) Video debounce algorithm based on content-aware blocking strategy
Chan et al. An object-based approach to image/video-based synthesis and processing for 3-D and multiview televisions
CN109729263A (en) Video based on fusional movement model removes fluttering method
CN108596858B (en) Traffic video jitter removal method based on characteristic track
Rawat et al. Adaptive motion smoothening for video stabilization
WO2022040988A1 (en) Image processing method and apparatus, and movable platform
Yousaf et al. Real time video stabilization methods in IR domain for UAVs—A review
Lee et al. ROI-based video stabilization algorithm for hand-held cameras
Dervişoğlu et al. Interpolation-based smart video stabilization

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20201127