CN108765317B - Joint optimization method for space-time consistency and feature center EMD self-adaptive video stabilization - Google Patents
Joint optimization method for space-time consistency and feature center EMD self-adaptive video stabilization
- Publication number
- CN108765317B CN108765317B CN201810429065.6A CN201810429065A CN108765317B CN 108765317 B CN108765317 B CN 108765317B CN 201810429065 A CN201810429065 A CN 201810429065A CN 108765317 B CN108765317 B CN 108765317B
- Authority
- CN
- China
- Prior art keywords
- image
- video
- algorithm
- adaptive
- scale
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T5/00—Image enhancement or restoration
- G06T5/77—Retouching; Inpainting; Scratch removal
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T3/00—Geometric image transformations in the plane of the image
- G06T3/40—Scaling of whole images or parts thereof, e.g. expanding or contracting
- G06T3/4007—Scaling of whole images or parts thereof, e.g. expanding or contracting based on interpolation, e.g. bilinear interpolation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
- G06V10/46—Descriptors for shape, contour or point-related descriptors, e.g. scale invariant feature transform [SIFT] or bags of words [BoW]; Salient regional features
- G06V10/462—Salient features, e.g. scale invariant feature transforms [SIFT]
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N23/00—Cameras or camera modules comprising electronic image sensors; Control thereof
- H04N23/60—Control of cameras or camera modules
- H04N23/68—Control of cameras or camera modules for stable pick-up of the scene, e.g. compensating for camera body vibrations
- H04N23/682—Vibration or motion blur correction
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- Multimedia (AREA)
- Signal Processing (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Studio Devices (AREA)
Abstract
The invention provides a joint optimization method for space-time consistency and feature center EMD self-adaptive video stabilization. On the basis of decomposing noisy motion signals with the EMD method, and with video anti-shake as the goal, the method applies saliency protection based on spatial-structure-consistency matrix estimation, parallax elimination, adaptive smoothing, cropping-area reduction, video completion, and related techniques to a jittery video, improving the stability, universality, accuracy, and adaptivity of video enhancement processing as well as the completeness of the video.
Description
Technical Field
The invention relates to a joint optimization method for space-time consistency and feature center EMD (Empirical Mode Decomposition) adaptive video stabilization, belonging to the technical field of computer vision enhancement.
Background
Hand-held devices used by amateurs, such as mobile phones, camcorders, tablet computers and consumer cameras, have become ubiquitous, but because their stabilization hardware is rudimentary, the videos they capture are often shaky and uncomfortable to watch. Video stabilization techniques aim to remove the visible inter-frame jitter and shock from such footage. Video stabilization is one of the most active research topics in computer vision, and it underpins many higher-level video enhancement applications such as human observation, video identification, video detection, video tracking, and video compression.
Given videos captured by handheld devices (such as cell phones or camcorders), most state-of-the-art video stabilization methods, such as those of F. Liu et al., A. Goldstein et al., and C. Morimoto et al., learn two-dimensional linear motion models by estimating and smoothing the linear (affine or homography) transforms between consecutive frames, while many others, such as those of C. Buehler et al., F. Liu et al., and S. Liu et al., model the three-dimensional camera path and handle parallax to produce strongly stabilized results. Over their long evolution, both 2D and 3D approaches have had great success. However, several challenges remain unresolved before feature detection, feature registration, and camera-trajectory analysis can be fully integrated into a unified stabilization framework.
The disadvantages and shortcomings of the prior art are summarized as follows. First, from the perspective of producing a stable, parallax-free camera trajectory with saliency protection, parallax caused by non-trivial depth variation in the scene makes the estimation ill-posed, since different regions may require spatially diverse homographies along the trajectory. Although spatial multi-dimensional reconstruction can in principle handle parallax and produce more stable results, multi-dimensional motion-model estimation is not robust enough, owing to scene variability and the frequent neglect of salient-feature protection under fast rotation, tracking failures, camera zooming, and motion blur. A way to fuse spatially diverse homography matrices into a single parallax-free homography under saliency protection is therefore urgently needed in video stabilization. Second, from the perspective of adaptively smoothing camera motion, current approaches suffer to varying degrees from insufficient adaptivity: more challenging situations (e.g. fast motion, fast scene transitions, large occlusions) are difficult to handle by straightforward smoothing of the raw camera motion, which tends to be over- or under-smoothed in some cases; for example, over-smoothing crops unstable video excessively, while under-smoothing leaves residual jitter. Considering the quality of the original video, designing an adaptive smoothing model to analyze and smooth jittery video is therefore crucial for robust, high-quality results. Third, from the perspective of reducing the cropped area of the optimized video, many techniques focus too heavily on the smoothing effect and ignore how much of the frame is cropped away. From the standpoint of protecting the original image content, an accurate and simple strategy is needed to flexibly preserve the original video content while successfully suppressing both its high-frequency and low-frequency judder. Fourth, from the perspective of the completeness of the stabilized video, although various high-order image interpolation and extrapolation methods have proven effective, unexpected artifacts can arise if the influence of neighboring pixels is not adequately considered; for example, discontinuities in resampled values may create visible mosaic and jagged regions. Considering the uncertainty of image interpolation and extrapolation, an accurate and simple strategy is needed to flexibly complete the missing parts of the original video.
Disclosure of Invention
The technical problem solved by the invention is as follows: it overcomes the stability and robustness problems of existing video stabilization methods and provides a joint optimization method for space-time consistency and feature center EMD adaptive video stabilization, which adaptively handles video stabilization, image saliency protection, parallax reduction, adaptive smoothing, cropping-area reduction, and video completion in a space-time-consistent manner, improving the stability, universality, accuracy, and adaptivity of video enhancement processing as well as the completeness of the video.
The technical scheme adopted by the invention is as follows: a joint optimization method for space-time consistency and feature center EMD adaptive video stabilization comprises the following steps:
Firstly, the invention provides a method for estimating a spatial-structure-consistency homography matrix based on the protection of image salient regions. The method effectively reveals the consistency of the intrinsic motion of different regions of the jittery video, while still flexibly constructing a uniform camera motion profile.
Secondly, the invention provides a method for adaptive eigenmode-function coefficients based on empirical mode decomposition. The method adaptively makes the jittery video more stable and facilitates the analysis and optimization of unstable video.
Thirdly, the invention provides a feature-center strategy based on the Gaussian distribution. This strategy significantly reduces the cropped region of the stabilized video while preserving the stability of the optimized video.
Finally, the present invention proposes an efficient video matching and completion method that adaptively extrapolates missing regions and interpolates overlapping regions of the original video (image frame sequence).
The method specifically comprises the following four steps:
step (1), spatial structure consistency based on image saliency protection: extracting scale-invariant feature transform (SIFT) feature points by the SIFT method, performing SIFT feature matching, deploying a uniform grid on the image, acquiring the saliency vectors of the grid cells, and, taking the uniform grid as a reference, performing spatial-structure-consistency deformation on the basis of protecting the saliency vectors of the image grid, to obtain a deformed image frame sequence;
step (2), self-adaptive eigenmode function: starting from a viewpoint position, re-acquiring an SIFT feature set from each image frame obtained by deformation in the step (1), constructing a space structure matrix based on SIFT, extracting rotation, translation and scaling motion information of the space structure matrix, constructing an original lens motion signal, decomposing and generating an eigenmode function through an EMD (empirical mode decomposition) algorithm, adaptively generating anisotropic coefficients of all the eigenmode functions according to an adaptive eigenmode function optimization algorithm, and acquiring a new lens adaptive motion signal through a weighted summation algorithm of the eigenmode function and the anisotropic coefficients;
step (3), EMD of the feature center: taking the lens self-adaptive motion signal obtained by calculation in the step (2) as a new input signal, weighting to obtain a new characteristic center motion signal according to a characteristic center algorithm, and protecting the original signal motion trend of the lens by adopting a weighting algorithm based on a Gaussian function while further inhibiting jitter components;
step (4), region extrapolation and interpolation: on the basis of the feature-center motion signal calculated in step (3), further generating a new stable video, namely an image frame sequence, completing the blanks formed by frame translation, rotation and stretching of the video, performing extrapolation of the missing regions based on an adaptive time-domain algorithm, and performing interpolation of the overlapping regions based on a cubic spline interpolation algorithm.
The spatial-structure-consistency method based on image saliency protection in step (1) is realized as follows (a code sketch follows the list):
(11) scale-space extremum detection: firstly, constructing a scale space, searching image positions over all scales, and identifying potential interest points that are invariant to scale and rotation through a difference-of-Gaussian function;
(12) key point positioning: determining the position and scale of each candidate position by fitting a fine model; the selection of key points depends on their degree of stability;
(13) direction determination: assigning one or more directions to each keypoint location based on the local gradient direction of the image, all subsequent operations on the image data being transformed with respect to the direction, scale and location of the keypoint, thereby providing invariance to these transformations;
(14) description of key points: measuring local gradient of the image on a selected scale in a neighborhood around each key point, and then converting the gradient into a representation, wherein the representation allows deformation of local shape and illumination change, and the obtained key points are scale-invariant feature points;
(15) matching the scale-invariant feature points by measuring Euclidean distances of the scale-invariant feature points in the adjacent frames;
(16) distributing a significance mapping vector for each pixel of the image, wherein the value range of the significance mapping vector is 0-1, deploying uniform grids on the image, averaging the significance mapping vectors of all pixels on each grid, and acquiring the significance vector of each grid;
(17) taking the uniform grid as a reference, performing spatial-structure-consistency deformation on the basis of protecting the saliency vectors of the image grid, preferentially protecting the cells whose saliency vector value is larger than the median, and concentrating the distortion caused by the deformation in the cells whose saliency vector value is smaller than the median, to obtain a deformed image frame sequence.
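A minimal sketch of steps (11) to (16) is given below, assuming OpenCV and NumPy; the grid size, the ratio-test threshold, and the gradient-magnitude stand-in for the saliency map are illustrative assumptions rather than the patent's own choices.

```python
import cv2
import numpy as np

def sift_match(frame_a, frame_b, ratio=0.75):
    """Steps (11)-(15): detect SIFT keypoints in two adjacent (grayscale)
    frames and match them with a Euclidean-distance ratio test."""
    sift = cv2.SIFT_create()
    kp_a, des_a = sift.detectAndCompute(frame_a, None)
    kp_b, des_b = sift.detectAndCompute(frame_b, None)
    raw = cv2.BFMatcher(cv2.NORM_L2).knnMatch(des_a, des_b, k=2)
    # Accept a match only if it is clearly closer than the runner-up.
    good = [m for m, n in raw if m.distance < ratio * n.distance]
    pts_a = np.float32([kp_a[m.queryIdx].pt for m in good])
    pts_b = np.float32([kp_b[m.trainIdx].pt for m in good])
    return pts_a, pts_b

def grid_saliency(frame, rows=16, cols=16):
    """Step (16): a per-pixel saliency value in [0, 1] (gradient magnitude
    stands in for a real saliency map), averaged over each grid cell."""
    gx = cv2.Sobel(frame, cv2.CV_32F, 1, 0)
    gy = cv2.Sobel(frame, cv2.CV_32F, 0, 1)
    sal = cv2.magnitude(gx, gy)
    sal = (sal - sal.min()) / (sal.max() - sal.min() + 1e-8)
    h, w = sal.shape
    cells = sal[: h // rows * rows, : w // cols * cols]
    cells = cells.reshape(rows, h // rows, cols, w // cols)
    return cells.mean(axis=(1, 3))  # one saliency value per grid cell
```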
The adaptive eigenmode-function optimization algorithm in step (2) jointly considers several competing factors, including eliminating jitter, eliminating excessive cropping, and minimizing distortion, and solves the resulting convex linear program with CVX;
the concrete implementation is as follows:
(21) starting from the viewpoint position, re-acquiring an SIFT feature set from each image frame obtained by deformation in the step (1), and constructing a spatial structure matrix based on SIFT;
(22) converting the space structure matrix based on the SIFT into a model related to scale, rotation and translation, extracting rotation, translation and scaling motion information of the scale, rotation and translation related model, aggregating the rotation, translation and scaling motion information of all image frames, and constructing an original lens motion signal;
(23) decomposing and generating an eigenmode function through an EMD decomposition algorithm;
(24) and generating the anisotropy coefficients of all the eigenmode functions in a self-adaptive mode function optimization algorithm, and acquiring a new lens self-adaptive motion signal through a weighted summation algorithm of the eigenmode functions and the anisotropy coefficients.
The feature-center algorithm in step (3) further suppresses jitter components while protecting the motion trend of the original signal;
the concrete implementation is as follows:
(31) taking the lens adaptive motion signal as the new input signal, and computing its Gaussian-transformed signal through a Gaussian function;
(32) according to the feature-center algorithm, weighting the Gaussian-transformed signal and the original lens motion signal to obtain a new feature-center motion signal, protecting the motion trend of the original lens signal as much as possible while further suppressing the jitter component.
The adaptive time-domain algorithm in step (4) defines a matching rate, so that the joint optimization method of space-time consistency and feature center EMD adaptive video stabilization can adaptively select library images for video frame extrapolation;
the concrete implementation is as follows:
(41) further generating a new stable video, namely an image frame sequence, by taking the feature-center motion signal as an input parameter;
(42) in order to complete the blanks formed by translation, rotation and stretching of the video frames, adaptively selecting and warping the available library images, namely the extrapolation images, based on the adaptive time-domain algorithm and the similarity transformation algorithm, and executing extrapolation of the missing regions;
(43) in order to complete the blanks formed by frame translation, rotation and stretching of the video frame sequence, weighting the gray values of the 16 neighboring pixels of the overlapping region based on a cubic spline interpolation algorithm, and executing interpolation of the overlapping region.
The principle of the invention is as follows: on the basis of decomposing noisy signals with the EMD method, and with video anti-shake as the goal, techniques such as saliency protection based on spatial-structure-consistency matrix estimation, parallax elimination, adaptive smoothing, cropping-area reduction, and video completion are applied to the jittery video. The invention mainly comprises the following four aspects:
(1) Spatial structure consistency based on image saliency protection. Image saliency is an important visual feature that reflects how strongly the human eye attends to certain regions of an image. Saliency-mapping methods are widely used in image compression and coding, edge and region enhancement, and salient-object segmentation and extraction. In the spatial-structure-consistency method, a uniform grid is constructed over the natural image, a saliency map is computed on the original image, and the map assigns each image pixel a saliency value between 0 and 1. The saliency values within each grid cell of the original image are averaged to form the saliency vector of that cell. A new deformation mesh of the warped image is computed by the corresponding stabilization method; the warping preserves the salient image regions, and the inevitable distortion is concentrated in the unimportant regions.
(2) Adaptive eigenmode functions. Empirical mode decomposition decomposes a signal according to the time-scale characteristics of the data itself and requires no preset basis functions, which is essentially different from Fourier and wavelet decomposition, both built on a priori harmonic and wavelet bases. Because of this property, empirical mode decomposition can in theory be applied to any type of signal, giving it a marked advantage for non-stationary and non-linear data; it is well suited to analyzing non-linear, non-stationary signal sequences and yields a high signal-to-noise ratio. The key step is the decomposition itself, which splits a complex signal into a finite number of eigenmode functions, each decomposed component carrying local characteristics of the original signal at a different time scale. The invention develops an adaptive eigenmode-function decomposition method whose core idea is as follows: obtain the SIFT feature points of each frame of the jittery video, estimate a geometric deformation matrix from them, compute the lens motion signal, decompose that signal into a set of eigenmode functions, and adaptively compute the optimal coefficient of each eigenmode function with the CVX convex optimization toolbox. The method stabilizes non-stationary data and can then apply a Hilbert transform to obtain a time-frequency spectrogram, yielding physically meaningful frequencies. Three-dimensional reconstruction of the video scene is thereby avoided, which improves the robustness of the system; video processing is effectively recast as a signal-processing problem, which simplifies analysis of the original video; and the adaptivity of the algorithm removes the need for human intervention.
(3) The invention provides a feature-center strategy based on the Gaussian (normal) distribution, a convenient model for quantitative phenomena in the natural and behavioral sciences. Many psychological test scores and physical phenomena such as photon counts are approximately normally distributed. Although the root cause of these phenomena is often unknown, it can be shown theoretically that when many small effects add up in a single variable, that variable follows a normal distribution. Normal distributions appear in many settings: for example, the sampling distribution of the mean is approximately normal even when the sampled population is not. Moreover, among all distributions with a known mean and variance, the normal distribution has maximum entropy, which makes it a natural choice in that setting; it is one of the most widely used distributions in statistics and statistical testing, and in probability theory it is the limiting distribution of several continuous and discrete distributions. The feature-center strategy markedly reduces the cropped area of the stabilized video while maintaining its stabilization rate: it keeps the feature centrality of the EMD and introduces weighted Gaussian smoothing, so that the new motion trajectory suppresses jitter while following the trend of the original video trajectory.
(4) Region extrapolation and interpolation. Taking the feature-center motion signal as the input parameter, a new stabilized video (image frame sequence) is generated. To fill the blanks created by frame translation, rotation and stretching of the video (image frame sequence), a matching rate is defined; based on the adaptive time-domain algorithm and a similarity transformation algorithm, the available library images (extrapolation images) are adaptively selected and warped, and extrapolation of the missing regions is performed. For the overlapping regions, the gray values of the 16 neighboring pixels are weighted and processed by a cubic spline interpolation algorithm, and interpolation of the overlapping regions is performed. The adaptive time-domain algorithm lets the joint optimization method adaptively select library images for video frame extrapolation.
Compared with the prior art, the invention has the advantages that:
(1) Spatial structure consistency: the invention proposes spatial structure consistency based on image saliency protection; spatially distinct homography matrices are constructed from the structural diversity of each image region, and spatially consistent warping is performed on that basis. A uniform camera motion trajectory can be constructed flexibly at the same time, improving the universality and accuracy of video enhancement processing.
(2) Adaptivity: the invention creates an adaptive eigenmode-function optimization algorithm that adaptively computes the video stability factors with the CVX method and adaptively generates an optimized camera motion trajectory. It adaptively makes the jittery video more stable, facilitates the analysis and optimization of unstable video, and improves the stability and adaptivity of video enhancement.
(3) Feature centrality: a feature-center algorithm is constructed, namely a feature-center strategy based on the Gaussian distribution; the strategy markedly reduces the cropped area of the stabilized video while protecting its stability, improving the accuracy and completeness of the video.
(4) Video completeness: to fill the blanks formed by translation, rotation and stretching of the video frames, extrapolation of the missing regions is performed with the adaptive time-domain algorithm and interpolation of the overlapping regions with the cubic spline interpolation algorithm. The method adaptively extrapolates lost regions and interpolates overlapping regions of the original video (image frame sequence), further improving the completeness of the video.
Drawings
FIG. 1 is a flow chart of the method of the present invention;
FIG. 2 shows the spatial-structure-consistency homography matrix estimation and the relationship among S(t), D(t), and B(t); the upper graph is the homography matrix estimation with spatial structure consistency, and the lower graph shows the relationship among S(t), D(t), and B(t);
FIG. 3 shows the original and optimized motion signals and the IMFs (eigenmode functions) decomposed from the original signals; the upper graph is a graph of the original and optimized motion signals, and the lower graph is an eigenmode function graph decomposed from the original signals;
FIG. 4 shows a square with or without feature centers EMD and its signals; the upper diagram is a feature screenshot of a square with or without feature center EMD, the lower left diagram is an EMD change diagram without the feature center, and the lower right diagram is an EMD change diagram with the feature center;
FIG. 5 is a diagram of adaptive time domain estimation;
FIG. 6 illustrates the estimation of the gray value of an unknown pixel.
Detailed Description
The present invention will be described in detail below with reference to the accompanying drawings and examples.
As shown in FIG. 1, the method comprises the following steps:
(1) extracting image features by the SIFT method, performing feature matching, obtaining the saliency vectors, and performing spatially consistent warping with the saliency vectors as reference.
(2) Starting from the viewpoint position, re-acquiring the feature set of each image frame warped in step (1), constructing a SIFT-based spatial structure matrix, extracting rotation, translation and scaling information, constructing the original motion signal, and obtaining a new motion signal according to the adaptive eigenmode-function optimization algorithm.
(3) Taking the adaptive motion signal calculated in step (2) as a new input signal and, according to the feature-center algorithm, further suppressing the jitter component while protecting the motion trend of the original signal as far as possible.
(4) Generating a new stable video from the feature-center motion signal calculated in step (3), performing extrapolation of the missing regions based on the adaptive time-domain algorithm to complete the blanks formed by translation, rotation and stretching of the video frames, and performing interpolation of the overlapping regions based on the cubic spline interpolation algorithm.
Specific implementations of the invention are described in detail.
1. Spatial structure consistency based on image saliency protection
The method extracts image features through an SIFT method, performs feature matching, obtains a saliency vector, and performs spatial consistency deformation by taking the saliency vector as a reference.
a. Spatial structure homography matrix construction based on SIFT
The method extracts SIFT features from the jittery video and obtains descriptor components and coordinate components from them. The descriptor component is a K × 128 matrix in which each row is the invariant descriptor of one of the K keypoints; each SIFT descriptor is a vector of 128 values normalized to unit length. The coordinate component is a K × 4 matrix in which each row holds the keypoint coordinates (row, column, scale and orientation), with orientation in the range [-π, π] radians. A SIFT feature match is accepted only if its Euclidean distance is less than distRatio times the distance to the second-closest match; distRatio is generally set to 0.1. The Euclidean distance from point s to point t is the length of the line segment connecting them. The above processing yields an M × 2 matrix of [x, y] coordinates. Outliers among the matched points of the two frames are excluded with the M-estimator SAmple Consensus (MSAC) algorithm. As shown in the second image of the first rectangle of FIG. 1, the geometric transformation maps the inliers among the matched points of the left video frame to the inliers among the matched points of the right video frame. The geometric transformation is represented by a 3 × 3 matrix $T_t$:

$$T_t = \begin{bmatrix} R_t & O_t \\ \mathbf{0}^\top & 1 \end{bmatrix}$$

where $R_t$ is a 2 × 2 rotation matrix and $O_t$ is a 2 × 1 translation vector, representing the orientation and position of the camera motion in the global coordinate system, respectively. As shown in FIG. 2, the relative camera motion at time $t$ is described by the two-dimensional Euclidean transformation $T_t$, which satisfies $S_t = S_{t-1} T_{t-1}$; $S_t$ is therefore represented as

$$S_t = \prod_{k=1}^{t-1} T_k, \qquad S_1 = I.$$
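A sketch of this matching-and-estimation stage, assuming OpenCV, whose RANSAC-family robust estimator stands in here for MSAC:

```python
import cv2
import numpy as np

def estimate_frame_transform(pts_prev, pts_curr):
    """Estimate the 2D similarity/Euclidean transform T_t between the
    matched points of two adjacent frames, rejecting outliers robustly
    (OpenCV's RANSAC stands in for the MSAC estimator in the text)."""
    M, inliers = cv2.estimateAffinePartial2D(
        pts_prev, pts_curr, method=cv2.RANSAC, ransacReprojThreshold=3.0)
    T = np.eye(3)
    T[:2, :] = M        # top two rows hold [R_t | O_t]
    return T, inliers

def accumulate_camera_path(transforms):
    """S_t = S_{t-1} T_{t-1}, with S_1 = I."""
    S = [np.eye(3)]
    for T in transforms:
        S.append(S[-1] @ T)
    return S
```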
b. spatial structure consistency optimization
The invention overlays a uniform grid on the image, with $m$ columns and $n$ rows. The goal is to compute the warped mesh of the adjusted image. As in common image-retargeting methods, a saliency map is used to assign each pixel of the image an importance value between 0 and 1. A warp that preserves the salient regions of the image as much as possible can then be computed, with the inevitable distortion concentrated in the less important regions. The average of all saliency values in each grid cell of the original image finally yields the saliency vector.

Based on the saliency vector, the influence of spatial grid inconsistency is further optimized, and the parallax is greatly reduced. The camera path and the local homography matrices are acquired using the method proposed by Lowe et al. A spatially varying camera path is then defined per grid cell for the entire video. Let $S_i(t)$ denote the camera position of grid cell $i$ in frame $t$, represented as follows:

$$S_i(t) = S_i(t-1)\,T_i(t-1)$$

Taking $S_i(1)$ to be the identity matrix, the following formula is derived:

$$S_i(t) = \prod_{k=1}^{t-1} T_i(k)$$

The video frame is uniformly divided into grid cells. As shown in FIG. 2 (upper graph), each cell has a trajectory, denoted $S_i(t)$; $T_i(t-1)$ denotes the local homography matrix estimated for the same grid cell from $S_i(t-1)$ to $S_i(t)$. As shown in FIG. 2 (lower graph), $D(t)$ denotes the smoothed path and $B(t)$ the transformation from the original path $S(t)$ to the smoothed path $D(t)$. The spatially consistent camera trajectories are smoothed by minimizing:

$$\min_{D}\;\sum_{t}\sum_{i}\Big(\|D_i(t)-S_i(t)\|^{2} + \lambda_t \sum_{j\in\Omega(i)}\|D_i(t)-D_j(t)\|^{2}\Big)$$

where $S = \{S(t)\}$ is the original path, $D = \{D(t)\}$ is the optimized path, and $\Omega(i)$ denotes the eight neighbors of grid cell $i$. The data term $\|D_i(t)-S_i(t)\|$ ensures that the new camera path stays close to the original path to reduce cropping and distortion, while $\|D_i(t)-D_j(t)\|$ keeps the current grid cell consistent with its nearby neighbors; the parameter $\lambda_t$ balances the two. For an edge cell, the value of a non-existent neighbor is set equal to the cell's own, i.e., $D_j(t) = D_i(t)$ when $j$ does not exist. This optimization is quadratic, and the best result can be obtained by solving a large sparse linear system; the solution is updated by a Jacobi-based iteration.

$\delta$ is the iteration factor; at initialization, $D^{(0)}(t) = S(t)$, and the iterations yield the optimal path $D_i(t)$. Using $B(t) = S^{-1}(t)\,D(t)$, the original video frames can be converted into frames with spatial structure consistency while the salient regions are protected. With this technique, the invention eliminates parallax between spatially distinct grid cells within each frame. It cannot, however, eliminate jitter between different video frames; video stabilization across frames is described in the next section.
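A NumPy sketch of this Jacobi-style update on per-cell scalar paths; λ and the iteration count are illustrative assumptions, and δ acts as the relaxation factor:

```python
import numpy as np

def smooth_paths_jacobi(S, neighbors, lam=2.0, delta=1.0, iters=20):
    """Jacobi iteration for min_D sum_i ||D_i - S_i||^2
    + lam * sum_{j in neighbors(i)} ||D_i - D_j||^2 (per frame).
    S: (num_cells, num_frames) array of original path values.
    neighbors: dict mapping each cell index to its neighbor indices;
    a missing neighbor contributes nothing (equivalent to D_j = D_i)."""
    D = S.astype(float).copy()              # initialization D^(0) = S
    for _ in range(iters):
        D_new = D.copy()
        for i in range(S.shape[0]):
            nbrs = list(neighbors.get(i, []))
            nbr_sum = D[nbrs].sum(axis=0) if nbrs else 0.0
            # Zeroing the gradient of the quadratic objective for cell i,
            # holding its neighbors fixed, gives the Jacobi update; delta
            # acts as the iteration (relaxation) factor.
            target = (S[i] + lam * nbr_sum) / (1.0 + lam * len(nbrs))
            D_new[i] = (1.0 - delta) * D[i] + delta * target
        D = D_new
    return D
```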
2. Adaptive eigenmode function
Starting from the viewpoint position, the method re-acquires the feature set of each image frame warped in step (1), constructs a SIFT-based spatial structure matrix, extracts rotation, translation and scaling information, constructs the original motion signal, and acquires a new motion signal according to the adaptive eigenmode function optimization algorithm.
EMD decomposes any complex signal into IMFs through a sifting process that iterates the following steps. The first step is to find the local extreme points (maxima and minima). The second step is to generate the upper and lower envelopes by cubic spline interpolation. The third step is to judge whether the difference between the signal and the mean envelope is an IMF. The last step is to judge whether the residual is monotonic. Specifically, the original signal is decomposed as follows:

$$s(t) = \sum_{k=1}^{N} f_k(t) + r_N(t)$$

where the $f_k$ ($k = 1, \ldots, N$) are the IMFs and $r_N$ is the corresponding residual. FIG. 3 (lower panel) shows IMF$_k$ ($k = 1, \ldots, 5$); IMF$_6$ denotes the residual. For ease of representation and calculation, the invention sets $f_{N+1} = r_N$, that is, the residual is treated as the last IMF. The original signal shown in FIG. 3 is thus decomposed into IMFs and a residual; the six IMFs in FIG. 3 (lower panel) are the decomposed components of the original signal. To stabilize the video, the high-frequency components must be smoothed. The optimal camera trajectory, drawn as the light gray line in FIG. 3 (upper panel), is obtained by minimizing the following objective function.
Let $X = \sum_{k=1}^{N+1} \alpha_k f_k$ denote the recombined signal, where $\alpha_k$ is the coefficient of the $k$-th IMF. The objective is

$$\min_{\alpha}\; W_1\|X'\|_1 + W_2\|X''\|_1 + W_3\|X'''\|_1 + W_4\|X - S\|_1$$

where $\|X'\|_1$, $\|X''\|_1$ and $\|X'''\|_1$ are the L1 norms of the first, second and third derivatives of $X$, respectively. Minimizing their sum smooths the IMFs (as shown in FIG. 3, lower panel) to remove the jitter of the unstable video. $S$ denotes the original signal, shown in FIG. 3 (upper panel); minimizing the difference term $\|X - S\|_1$ keeps the optimized signal close to the original signal to avoid excessive cropping. $W = (W_1, \ldots, W_4)$ is an adaptive balancing factor that weighs the four terms.
In summary, the optimization method of the invention takes a variety of competing factors into account, such as eliminating vibration, eliminating excessive cropping, and minimizing distortion. The formula above is a convex optimization problem that can be solved by convex linear programming (CVX). The optimal motion signal (shown in FIG. 3, upper panel) is then

$$\hat{X} = \sum_{k=1}^{N+1} \hat{\alpha}_k f_k$$

where $\hat{X}$ is the optimal motion signal, drawn as the light gray line in FIG. 3 (upper panel), and the $\hat{\alpha}_k$ are the new IMF coefficients. FIG. 3 (upper panel) shows the camera trajectory before and after smoothing, drawn as the dark and light gray lines, respectively.
3. EMD of feature centers
As in FIG. 4 (bottom left), the dark gray line represents the motion signal of the original path and the light gray line the motion signal of the smoothed path without feature-center EMD. As the bottom-left image in FIG. 4 shows, the original method over-smooths and loses the original motion trend, so the left picture is over-cropped; the right picture shows the feature-centered result. In order to preserve the trend of the original EMD motion signal, the extreme points of the original motion signal are defined as features. To maintain the central characteristics of the EMD signal, the feature-centered mode is computed as follows.

Here $\omega_t$ denotes a window of 60 adjacent frames around frame $t$. The invention introduces a Gaussian function $G_t(\cdot)$ whose standard deviation is set to 10. $S_t$ denotes the original value of frame $t$ without feature-center EMD, and $S_\tau$ ($\tau \in \omega_t$) the original values of the neighboring frames; $\hat{X}_t$ denotes the optimized value of frame $t$, and $\hat{X}_\tau$ the optimized values of the neighboring frames. As in FIG. 4 (lower right), $\bar{X}_t$ denotes the value of frame $t$ with feature-center EMD, obtained by Gaussian-weighted combination of the optimized signal and the original motion signal over $\omega_t$. This value ensures that the new path maintains the trend of the original path while successfully suppressing both its high-frequency and low-frequency jitter.
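A NumPy sketch of one plausible reading of this weighting (the patent's exact combining equation is not reproduced here): the optimized signal is Gaussian-filtered over ω_t and re-centered on the original signal's local trend, with the window size and σ taken from the text.

```python
import numpy as np

def feature_center_emd(S, X_hat, window=60, sigma=10.0):
    """Gaussian-weighted feature-center smoothing (one plausible form):
    each output is a Gaussian average of the optimized signal over the
    window omega_t, pulled toward the original signal's local trend."""
    T = len(S)
    half = window // 2
    X_bar = np.empty(T)
    for t in range(T):
        taus = np.arange(max(0, t - half), min(T, t + half))
        w = np.exp(-0.5 * ((taus - t) / sigma) ** 2)
        w /= w.sum()
        # Gaussian-filtered optimized signal, shifted by the offset of the
        # original value at t from its own filtered neighborhood, so the
        # original motion trend is preserved while jitter is suppressed.
        X_bar[t] = w @ X_hat[taus] + (S[t] - w @ S[taus])
    return X_bar
```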
4. Regional extrapolation and interpolation
In order to fill the blanks caused by translation, rotation and stretching of the video frames, a video extrapolation method based on the adaptive time-domain algorithm and an interpolation method based on cubic splines realize the extrapolation of the missing regions and the interpolation of the overlapping regions. Library images are selected by the adaptive time-domain method and warped with an As-Similar-As-Possible algorithm; according to the warped library images of the adaptive time-domain method, all video frames are extrapolated. Specifically, the scale-invariant feature transform (SIFT) features between frame t and its neighbors are first detected and matched, and the matching rate is then computed as the number of matched features divided by the total number of features. In FIG. 5, the percentages are matching-rate values. As the matching rate decreases, the transformed library image becomes more distorted; the threshold is therefore set to 65%, and video frames whose matching rate exceeds 65% are selected as library images. The adaptive time-domain range in FIG. 5 is [t-E+1, t+E]. The overlapping part of the extrapolation may exhibit uneven transitions, which are handled by cubic spline interpolation: as FIG. 6 shows, the interpolated gray value of an unknown pixel (x, y) is solved by a weighted interpolation of the sixteen gray values in its neighborhood.
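A sketch of the matching-rate selection and the 16-neighbor interpolation; the Catmull-Rom-style cubic convolution kernel is an illustrative stand-in for the patent's spline weights, and boundary handling is omitted (x, y are assumed at least one pixel from the border):

```python
import numpy as np

def select_library_frames(match_rates, t, E, threshold=0.65):
    """Pick frames in the adaptive temporal range [t-E+1, t+E] whose SIFT
    matching rate with frame t exceeds the 65% threshold."""
    lo, hi = max(0, t - E + 1), min(len(match_rates), t + E + 1)
    return [k for k in range(lo, hi) if match_rates[k] > threshold]

def cubic_kernel(d, a=-0.5):
    """Cubic convolution kernel (Catmull-Rom-style stand-in)."""
    d = abs(d)
    if d < 1:
        return (a + 2) * d**3 - (a + 3) * d**2 + 1
    if d < 2:
        return a * d**3 - 5 * a * d**2 + 8 * a * d - 4 * a
    return 0.0

def interp16(img, x, y):
    """Gray value at a non-integer (x, y) from its 16 neighboring pixels."""
    x0, y0 = int(np.floor(x)), int(np.floor(y))
    val, wsum = 0.0, 0.0
    for j in range(-1, 3):          # 4 x 4 neighborhood
        for i in range(-1, 3):
            w = cubic_kernel(x - (x0 + i)) * cubic_kernel(y - (y0 + j))
            val += w * img[y0 + j, x0 + i]
            wsum += w
    return val / wsum
```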
The above examples are provided for the purpose of describing the present invention only, and are not intended to limit the scope of the present invention. The scope of the invention is defined by the appended claims. Various equivalent substitutions and modifications can be made without departing from the spirit and principles of the invention, and are intended to be within the scope of the invention.
Claims (1)
1. A joint optimization method for space-time consistency and feature center EMD adaptive video stabilization is characterized by comprising the following steps:
step (1), extracting scale-invariant feature transform (SIFT) feature points by the SIFT method, performing SIFT feature matching, deploying uniform grids on an image, obtaining saliency vectors of a plurality of grids, and performing space structure consistency deformation on the basis of protecting the saliency vectors of the image grids by taking the uniform grids as a reference, to obtain a deformed image frame sequence;
step (2), starting from the viewpoint position, re-acquiring a SIFT feature set of each image frame obtained by deformation in step (1), constructing a space structure matrix based on SIFT, extracting rotation, translation and scaling motion information of the space structure matrix, constructing an original lens motion signal, decomposing and generating eigenmode functions through an EMD (empirical mode decomposition) algorithm, adaptively generating the anisotropic coefficients of all the eigenmode functions according to an adaptive eigenmode function optimization algorithm, and acquiring a new lens adaptive motion signal through a weighted summation algorithm of the eigenmode functions and the anisotropic coefficients;
step (3) taking the lens adaptive motion signal obtained by calculation in the step (2) as a new input signal, weighting to obtain a new characteristic center motion signal according to a characteristic center algorithm, and protecting the original signal motion trend of the lens by adopting a weighting algorithm based on a Gaussian function while further inhibiting jitter components;
step (4), based on the characteristic central motion signal obtained by calculation in step (3), further generating a new stable video, namely an image frame sequence, compensating blanks formed by frame translation, rotation and stretching of the video, performing extrapolation of a missing region based on an adaptive time domain algorithm, and performing interpolation of an overlapping region based on a cubic spline interpolation algorithm;
the method for obtaining the deformed image frame sequence by performing consistent deformation of the spatial structure on the basis of protecting the saliency vector of the image grid in the step (1) is realized as follows:
(11) scale-space extremum detection: firstly, constructing a scale space, searching image positions over all scales, and identifying potential interest points that are invariant to scale and rotation through a difference-of-Gaussian function;
(12) key point positioning: determining the position and scale of each candidate position by fitting a fine model; the selection of key points depends on their degree of stability;
(13) direction determination: assigning one or more directions to each keypoint location based on the local gradient direction of the image, all subsequent operations on the image data being transformed with respect to the direction, scale and location of the keypoint, thereby providing invariance to these transformations;
(14) description of key points: measuring local gradient of the image on a selected scale in a neighborhood around each key point, and then converting the gradient into a representation, wherein the representation allows deformation of local shape and illumination change, and the obtained key points are scale-invariant feature points;
(15) matching the scale-invariant feature points by measuring Euclidean distances of the scale-invariant feature points in the adjacent frames;
(16) distributing a significance mapping vector for each pixel of the image, wherein the value range of the significance mapping vector is 0-1, deploying uniform grids on the image, averaging the significance mapping vectors of all pixels on each grid, and acquiring the significance vector of each grid;
(17) taking a uniform grid as a reference, performing space structure consistency deformation on the basis of protecting a saliency vector of an image grid, preferentially protecting an area with a grid saliency vector value larger than a median value, and concentrating distortion caused by space structure consistency deformation in an area with a grid saliency vector value smaller than the median value to obtain a deformed image frame sequence;
the self-adaptive eigenmode function optimization algorithm in the step (2) is realized as follows:
(21) starting from the viewpoint position, re-acquiring an SIFT feature set from each image frame obtained by deformation in the step (1), and constructing a spatial structure matrix based on SIFT;
(22) converting the space structure matrix based on the SIFT into a model related to scale, rotation and translation, extracting rotation, translation and scaling motion information of the scale, rotation and translation related model, aggregating the rotation, translation and scaling motion information of all image frames, and constructing an original lens motion signal;
(23) decomposing and generating an eigenmode function through an EMD decomposition algorithm;
(24) generating the anisotropic coefficients of all the eigenmode functions in a self-adaptive mode function optimization algorithm, and acquiring a new lens self-adaptive motion signal through a weighted summation algorithm of the eigenmode functions and the anisotropic coefficients thereof;
the feature center algorithm in the step (3) is realized as follows:
(31) taking the lens self-adaptive motion signal as a new input signal, and solving a Gaussian transformation signal of the new input signal through a Gaussian function;
(32) according to the characteristic center algorithm, weighting processing is carried out on the Gaussian transformation signal and the original motion signal of the lens, a new characteristic center motion signal is obtained, and the motion trend of the original signal of the lens is protected as far as possible while the jitter component is further suppressed;
the adaptive time domain algorithm in the step (4) is realized as follows:
(41) further generating a new stable video, namely an image frame sequence, by taking the characteristic center motion signal as an input parameter;
(42) in order to complement the blank formed by translation, rotation and stretching of the video frame, based on the adaptive time domain algorithm and the similarity transformation algorithm, the available library image, namely the extrapolated image, is adaptively selected and extrapolated, and the extrapolation of the missing area is executed;
(43) in order to complement the blank formed by frame translation, rotation, and stretching of the video frame sequence, the gray values of 16 adjacent pixels in the overlap region are weighted and processed based on a cubic spline interpolation algorithm, and interpolation of the overlap region is performed.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810429065.6A CN108765317B (en) | 2018-05-08 | 2018-05-08 | Joint optimization method for space-time consistency and feature center EMD self-adaptive video stabilization |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810429065.6A CN108765317B (en) | 2018-05-08 | 2018-05-08 | Joint optimization method for space-time consistency and feature center EMD self-adaptive video stabilization |
Publications (2)
Publication Number | Publication Date |
---|---|
CN108765317A CN108765317A (en) | 2018-11-06 |
CN108765317B true CN108765317B (en) | 2021-08-27 |
Family
ID=64010269
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810429065.6A Active CN108765317B (en) | 2018-05-08 | 2018-05-08 | Joint optimization method for space-time consistency and feature center EMD self-adaptive video stabilization |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108765317B (en) |
Families Citing this family (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109905565B (en) * | 2019-03-06 | 2021-04-27 | 南京理工大学 | Video de-jittering method based on motion mode separation |
CN110070031A (en) * | 2019-04-18 | 2019-07-30 | 哈尔滨工程大学 | A kind of sediment extracting echo characteristics of active sonar fusion method based on EMD and random forest |
CN110287921B (en) * | 2019-06-28 | 2022-04-05 | 潍柴动力股份有限公司 | Noise reduction method and noise reduction system for engine characteristic parameters |
CN112351306A (en) * | 2019-08-09 | 2021-02-09 | 飞思达技术(北京)有限公司 | Video content source consistency comparison technology based on IPTV and OTT services |
CN112561839B (en) * | 2020-12-02 | 2022-08-19 | 北京有竹居网络技术有限公司 | Video clipping method and device, storage medium and electronic equipment |
CN112866670B (en) * | 2021-01-07 | 2021-11-23 | 北京邮电大学 | Operation 3D video image stabilization synthesis system and method based on binocular space-time self-adaptation |
CN114494356B (en) * | 2022-04-02 | 2022-06-24 | 中傲数据技术(深圳)有限公司 | Badminton video clip processing method and system based on artificial intelligence |
CN117094916B (en) * | 2023-10-19 | 2024-01-26 | 江苏新路德建设有限公司 | Visual inspection method for municipal bridge support |
Patent Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20140362240A1 (en) * | 2013-06-07 | 2014-12-11 | Apple Inc. | Robust Image Feature Based Video Stabilization and Smoothing |
CN104408741A (en) * | 2014-10-27 | 2015-03-11 | 大连理工大学 | Video global motion estimation method with sequential consistency constraint |
CN105976330A (en) * | 2016-04-27 | 2016-09-28 | 大连理工大学 | Embedded foggy-weather real-time video image stabilization method |
Also Published As
Publication number | Publication date |
---|---|
CN108765317A (en) | 2018-11-06 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN108765317B (en) | Joint optimization method for space-time consistency and feature center EMD self-adaptive video stabilization | |
US9280825B2 (en) | Image processing system with registration mechanism and method of operation thereof | |
US9692939B2 (en) | Device, system, and method of blind deblurring and blind super-resolution utilizing internal patch recurrence | |
US9536147B2 (en) | Optical flow tracking method and apparatus | |
CN107358623B (en) | Relevant filtering tracking method based on significance detection and robustness scale estimation | |
KR102185963B1 (en) | Cascaded camera motion estimation, rolling shutter detection, and camera shake detection for video stabilization | |
CN106338733B (en) | Forward-Looking Sonar method for tracking target based on frogeye visual characteristic | |
CN110503620B (en) | Image fusion method based on Fourier spectrum extraction | |
CN106228544B (en) | A kind of conspicuousness detection method propagated based on rarefaction representation and label | |
KR101634562B1 (en) | Method for producing high definition video from low definition video | |
CN107749987B (en) | Digital video image stabilization method based on block motion estimation | |
Lamberti et al. | CMBFHE: a novel contrast enhancement technique based on cascaded multistep binomial filtering histogram equalization | |
Micheli et al. | A linear systems approach to imaging through turbulence | |
US9449371B1 (en) | True motion based temporal-spatial IIR filter for video | |
Ttofis et al. | High-quality real-time hardware stereo matching based on guided image filtering | |
CN107563978A (en) | Face deblurring method and device | |
CN111383252A (en) | Multi-camera target tracking method, system, device and storage medium | |
Jeong et al. | Multi-frame example-based super-resolution using locally directional self-similarity | |
CN108305268A (en) | A kind of image partition method and device | |
KR101851896B1 (en) | Method and apparatus for video stabilization using feature based particle keypoints | |
Choi et al. | Robust video stabilization to outlier motion using adaptive RANSAC | |
CN103618904B (en) | Motion estimation method and device based on pixels | |
KR101919879B1 (en) | Apparatus and method for correcting depth information image based on user's interaction information | |
CN114998358A (en) | Multi-focus image fusion method and device, computer equipment and storage medium | |
Pickup et al. | Multiframe super-resolution from a Bayesian perspective |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant |