CN115174817A - Hybrid anti-shake method and system based on deep learning - Google Patents


Info

Publication number
CN115174817A
Authority
CN
China
Prior art keywords
optical flow
network
acquiring
camera
bidirectional optical
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211077092.4A
Other languages
Chinese (zh)
Inventor
高歌
王保耀
郭奇锋
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Shenzhi Future Intelligence Co ltd
Original Assignee
Shenzhen Shenzhi Future Intelligence Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Shenzhi Future Intelligence Co ltd filed Critical Shenzhen Shenzhi Future Intelligence Co ltd
Priority to CN202211077092.4A priority Critical patent/CN115174817A/en
Publication of CN115174817A publication Critical patent/CN115174817A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods


Abstract

The invention discloses a hybrid anti-shake method and a hybrid anti-shake system based on deep learning, wherein the method comprises the following steps: acquiring a video shot by a camera, and acquiring continuous N frames of images based on the video; inputting the continuous N frames of images into a bidirectional optical flow network to obtain an output result of the bidirectional optical flow network; acquiring pose data of the camera; inputting the output result of the bidirectional optical flow network and the pose data into an alignment network; and acquiring an output result of the alignment network, warping the output result of the alignment network to the corresponding pose to obtain an image stabilization result of the current image frame, and finishing the anti-shake operation. The embodiment of the invention calculates dense optical flow with a deep learning end-to-end neural network, which is more robust than traditional algorithms and yields more accurate optical flow results, and selects historical and future camera pose data in the time domain. The pose data is fused and corrected in the spatial domain, enhancing the anti-shake effect and improving the quality of video images.

Description

Hybrid anti-shake method and system based on deep learning
Technical Field
The invention relates to the technical field of image processing, in particular to a hybrid anti-shake method and system based on deep learning.
Background
With the continuous development of smart cameras, video anti-shake technology is becoming more and more important in products in the fields of unmanned aerial vehicles, unmanned ships, city security, high-point monitoring, robots, aerospace and the like.
Video anti-shake techniques can be roughly classified into optical image stabilization (OIS), electronic image stabilization (EIS), and hybrid image stabilization (HIS).
OIS is a hardware solution that uses micro-electro-mechanical system (MEMS) gyroscopes to detect motion and adjust the camera system accordingly.
EIS approaches the problem from the software-algorithm side, requires no additional hardware support, and stabilizes low-frequency jitter and large-amplitude motion in video. Compared with OIS, it has the advantages of being embedded in software, easy to upgrade, low in power consumption, and low in cost. HIS is a fusion scheme combining OIS and EIS: it can take the strengths of each sensor, gather their information together, and improve the judgment accuracy of the camera anti-shake system through comprehensive analysis.
Most anti-shake algorithms in devices on the market today use image-based methods to smooth the camera path. Such algorithms are flexible and well suited to nonlinear motion compensation, but without rigid constraints the crop ratio is large, non-rigid distortion and smearing can occur, and the motion compensation effect is poor.
The prior art is therefore still subject to further development.
Disclosure of Invention
In view of the above technical problems, embodiments of the present invention provide a hybrid anti-shake method and system based on deep learning, which can solve the technical problems that, in the prior art, most anti-shake algorithms smooth the camera path with image-processing-based methods, so that the crop ratio is large, non-rigid distortion and smearing occur in the absence of rigid constraints, the motion compensation effect is relatively poor, and video shooting quality suffers.
A first aspect of an embodiment of the present invention provides a hybrid anti-shake method based on deep learning, including:
acquiring a video shot by a camera, and acquiring continuous N frames of images based on the video;
inputting continuous N frames of images into a bidirectional optical flow network to obtain an output result of the bidirectional optical flow network;
acquiring pose data of a camera;
inputting the output result of the bidirectional optical flow network and the pose data into an alignment network;
and acquiring an output result of the alignment network, warping the output result of the alignment network to a corresponding pose to obtain an image stabilization result of the current image frame, and finishing anti-shake operation.
Optionally, acquiring a video shot by a camera, and acquiring consecutive N-frame images based on the video includes:
acquiring a video shot by a camera, and acquiring continuous 5-frame RGB images based on the video;
4 pairs of RGB color space data are generated based on the 5-frame RGB images.
Optionally, inputting the continuous N frames of images into the bidirectional optical flow network, and acquiring an output result of the bidirectional optical flow network, including:
inputting 4 pairs of RGB color space data into a bidirectional optical flow network, wherein the bidirectional optical flow network is a CNN network conforming to a UNet structure;
and acquiring 4 positive and negative optical flow results output by the bidirectional optical flow network.
Optionally, acquiring pose data of the camera includes:
acquiring initial triaxial angular velocity data of a camera based on the MEMS gyroscope;
and filtering the initial triaxial angular velocity data based on complementary filtering and Kalman filtering to generate pose data of the camera.
Optionally, the inputting the output result of the bidirectional optical flow network and the pose data into an alignment network comprises:
carrying out synchronous processing on the triaxial angular velocity data and the video data to generate synchronous triaxial angular velocity data and obtain a relative rotation matrix corresponding to the triaxial angular velocity data;
and inputting the synchronized triaxial angular velocity data and the output result of the bidirectional optical flow network into an alignment network, wherein the alignment network is an RNN (recurrent neural network) comprising a forgetting stage, a memory-selection stage and an output stage.
A second aspect of the embodiments of the present invention provides a hybrid anti-shake system based on deep learning, where the system includes: a memory, a processor and a computer program stored on the memory and executable on the processor, the computer program when executed by the processor implementing the steps of:
acquiring a video shot by a camera, and acquiring continuous N frames of images based on the video;
inputting continuous N frames of images into a bidirectional optical flow network to obtain an output result of the bidirectional optical flow network;
acquiring pose data of a camera;
inputting the output result of the bidirectional optical flow network and the pose data into an alignment network;
and acquiring an output result of the alignment network, warping the output result of the alignment network to a corresponding pose to obtain an image stabilization result of the current image frame, and finishing anti-shake operation.
Optionally, the computer program when executed by the processor further implements the steps of:
acquiring a video shot by a camera, and acquiring continuous 5-frame RGB images based on the video;
4 pairs of RGB color space data are generated based on the 5-frame RGB images.
Optionally, the computer program when executed by the processor further implements the steps of:
inputting 4 pairs of RGB color space data into a bidirectional optical flow network, wherein the bidirectional optical flow network is a CNN network conforming to a UNet structure;
and acquiring 4 positive and negative optical flow results output by the bidirectional optical flow network.
Optionally, the computer program when executed by the processor further implements the steps of:
acquiring initial triaxial angular velocity data of a camera based on the MEMS gyroscope;
and filtering the initial triaxial angular velocity data based on complementary filtering and Kalman filtering to generate pose data of the camera.
A third aspect of embodiments of the present invention provides a non-transitory computer-readable storage medium, wherein the non-transitory computer-readable storage medium stores computer-executable instructions, and when executed by one or more processors, the computer-executable instructions may cause the one or more processors to perform the deep learning based hybrid anti-shake method described above.
In the technical scheme provided by the embodiment of the invention, a video shot by a camera is obtained, and continuous N frames of images are obtained based on the video; the continuous N frames of images are input into a bidirectional optical flow network to obtain an output result of the bidirectional optical flow network; pose data of the camera is acquired; the output result of the bidirectional optical flow network and the pose data are input into an alignment network; and an output result of the alignment network is acquired and warped to the corresponding pose to obtain an image stabilization result of the current image frame, finishing the anti-shake operation. The embodiment of the invention uses a deep learning end-to-end neural network to calculate dense optical flow, which is more robust than traditional algorithms and yields more accurate optical flow results, and selects historical and future camera pose data in the time domain. The pose data is fused and corrected in the spatial domain, enhancing the anti-shake effect and improving the quality of video images.
Drawings
Fig. 1 is a schematic flowchart illustrating an embodiment of a hybrid anti-shake method based on deep learning according to the present invention;
fig. 2 is a schematic diagram of a hardware structure of another embodiment of a hybrid anti-shake system based on deep learning according to the embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The following detailed description of embodiments of the invention refers to the accompanying drawings.
Referring to fig. 1, fig. 1 is a flowchart illustrating an embodiment of a hybrid anti-shake method based on deep learning according to the present invention. As shown in fig. 1, the method includes:
Step S100, acquiring a video shot by a camera, and acquiring continuous N-frame images based on the video;
Step S200, inputting the continuous N frames of images into a bidirectional optical flow network, and acquiring an output result of the bidirectional optical flow network;
Step S300, acquiring pose data of the camera;
Step S400, inputting the output result of the bidirectional optical flow network and the pose data into an alignment network;
Step S500, acquiring an output result of the alignment network, warping the output result of the alignment network to the corresponding pose to obtain an image stabilization result of the current image frame, and finishing the anti-shake operation.
In specific implementation, the embodiment of the invention uses a camera to shoot video and converts the captured video data into images; the converted format includes, but is not limited to, raw image formats such as RGB, DNG and RAW, or pictures in other color spaces such as HSV and YUV.
Continuous N frames are taken from the converted images and input into the bidirectional optical flow network, and the network's output result is acquired. The images are processed with a bidirectional optical flow network, where the optical flow algorithm rests on three assumptions: brightness is constant between adjacent frames; object motion between adjacent frames is relatively small; and spatial consistency holds, i.e. adjacent pixels share the same motion.
Bidirectional optical flow means the optical flow result is calculated in both the forward and reverse time directions, which plays an important role in inferring occluded areas between frames. The training data used for the bidirectional optical flow network consists of 720P-resolution pictures, but pictures of other resolutions may be used instead, combined with preprocessing such as up- or down-sampling.
The sensor can be OIS, a gyroscope, an accelerometer, a magnetometer, or any other sensor able to obtain camera pose information, and can be built as a MEMS (Micro-Electro-Mechanical System), also called a micro-machine: a device measuring a few millimeters or less whose internal structures are on the micron or even nanometer scale, forming an independent intelligent system. The camera pose information obtained this way mainly comprises the camera's three-axis angular velocity data.
The output result of the optical flow network and the camera pose information from the three-axis angular velocity sensor are input into the trained alignment network, the alignment operation is performed, and the alignment network's output result is acquired; the output result of the alignment network is then warped to the pose corresponding to the camera, giving the image stabilization result of the current image frame and finishing the anti-shake operation.
Further, acquiring a video shot by a camera, and acquiring continuous N frames of images based on the video, comprising:
acquiring a video shot by a camera, and acquiring continuous 5-frame RGB images based on the video;
4 pairs of RGB color space data are generated based on the 5-frame RGB images.
In specific implementation, a video shot by a camera is acquired and five consecutive RGB frames $I_{t-2}, I_{t-1}, I_t, I_{t+1}, I_{t+2}$ (each of dimension $H \times W \times 3$) are taken from it, forming four consecutive RGB color-space data pairs; using such pairs as input to recover inter-frame motion is a widely used approach.
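The grouping of five consecutive frames into four adjacent pairs can be sketched as follows (a minimal illustration; frame reading via `cv2.VideoCapture` is omitted and dummy arrays stand in for real frames):

```python
import numpy as np

def make_frame_pairs(frames):
    """Group 5 consecutive frames into 4 adjacent (prev, next) pairs.

    frames: list of 5 H x W x 3 RGB arrays -> list of 4 tuples
    (frame_i, frame_{i+1}), the pairs fed to the optical flow network.
    """
    if len(frames) != 5:
        raise ValueError("expected exactly 5 consecutive frames")
    return [(frames[i], frames[i + 1]) for i in range(4)]

# Dummy 720p clip; a real pipeline would decode frames from the video.
clip = [np.zeros((720, 1280, 3), dtype=np.uint8) for _ in range(5)]
pairs = make_frame_pairs(clip)
```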
Further, inputting the continuous N frames of images into the bidirectional optical flow network, and acquiring the output result of the bidirectional optical flow network, the method comprises the following steps:
inputting 4 pairs of RGB color space data into a bidirectional optical flow network, wherein the bidirectional optical flow network is a CNN network conforming to a UNet structure;
and acquiring 4 positive and negative optical flow results output by the bidirectional optical flow network.
Specifically, the bidirectional optical flow network is a CNN with a UNet structure, and its output is 4 pairs of forward and backward optical flow results $F_{t-2 \leftrightarrow t-1}$, $F_{t-1 \leftrightarrow t}$, $F_{t \leftrightarrow t+1}$, $F_{t+1 \leftrightarrow t+2}$, each in the data format $H \times W \times 2$.
The OpenCV-based Farneback algorithm is the most classical traditional dense optical flow algorithm; deep-learning networks such as FlowNet 1, 2 and 3 and PWC-Net, as well as later optical flow networks, paired with a flow-reversal layer, yield the bidirectional optical flow directly. Bidirectional optical flow results can also be obtained directly from bidirectional optical flow networks designed for frame-based applications.
Further, acquiring pose data of the camera and performing synchronous operation with the video time stamp comprises:
acquiring initial triaxial angular velocity data of a camera based on the MEMS gyroscope;
and filtering the initial triaxial angular velocity data based on complementary filtering and Kalman filtering to generate pose data of the camera.
In specific implementation, the MEMS gyroscope supplies the three-axis angular velocity $(\omega_x, \omega_y, \omega_z)$ as a function of time. Data preprocessing with complementary filtering and Kalman filtering is first applied to the gyroscope data: over short intervals the angle obtained from the gyroscope is taken as the optimal value, and at regular intervals the averaged accelerometer samples are used to correct the gyroscope angle. Kalman filtering then uses the state estimate of the previous time and the observation of the current time to obtain the optimal estimate of the dynamic system's state variables at the current time.
Further, inputting the output result of the bidirectional optical flow network and the camera pose data into an alignment network, comprising:
carrying out synchronous processing on the triaxial angular velocity data and the video data to generate synchronous triaxial angular velocity data and obtain a relative rotation matrix corresponding to the triaxial angular velocity data;
and inputting the synchronized triaxial angular velocity data and the output result of the bidirectional optical flow network into an alignment network, wherein the alignment network is an RNN (recurrent neural network) comprising a forgetting stage, a memory-selection stage and an output stage.
Specifically, the alignment network takes the bidirectional optical flow results $F_{t-2 \leftrightarrow t-1}$, $F_{t-1 \leftrightarrow t}$, $F_{t \leftrightarrow t+1}$, $F_{t+1 \leftrightarrow t+2}$ as input and passes them through several 2D convolutional layers and activation functions. The encoder encodes the high-dimensional data into low-dimensional hidden variables, forcing the neural network to learn the most informative features, and produces the image's motion parameters as a rotation matrix $R$ that includes the rotation and translation parameters of the three axes. The transformation of the current frame $I_t$ is taken as the mean of the transformation parameters of its four adjacent frames:

$R_t = \frac{1}{4}\,(R_{t-2} + R_{t-1} + R_{t+1} + R_{t+2})$ (formula 1)
The motion parameters, obtained in time sequence, are then fed to the RNN to learn long-term dependency information and allow information to persist. The anti-shake algorithm must infer the next moment from the persisted motion information of the preceding period while avoiding excessive long-term dependence. The RNN of this invention is designed with three internal stages to filter valid information over the time sequence:
Forgetting stage: this stage selectively forgets the state passed in from the previous node.
Memory-selection stage: this stage selectively memorizes the current input, recording the important parts and down-weighting the unimportant ones; the results of these two stages are summed to form the state passed onward.
Output stage: this stage decides what becomes the output of the current state; the result of the previous stage is scaled by a tanh activation function, and the final result is output.
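The three stages just described correspond to the gates of a standard LSTM cell; a minimal numpy sketch of one step (dimensions and random weight initialisation are purely illustrative, not the patent's network):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h, c, W):
    """One LSTM step: forgetting, memory-selection and output stages.

    x: input vector; h, c: previous hidden and cell state;
    W: dict of weight matrices for the gates (illustrative values).
    """
    z = np.concatenate([h, x])
    f = sigmoid(W["f"] @ z)      # forgetting stage: drop parts of old state
    i = sigmoid(W["i"] @ z)      # memory selection: what to record
    g = np.tanh(W["g"] @ z)      # candidate content to record
    c_new = f * c + i * g        # sum of the two stages, passed onward
    o = sigmoid(W["o"] @ z)      # output stage: what to emit
    h_new = o * np.tanh(c_new)   # scaled by tanh, as in the text
    return h_new, c_new

rng = np.random.default_rng(0)
d_in, d_h = 4, 8
W = {k: rng.normal(size=(d_h, d_h + d_in)) * 0.1 for k in "figo"}
h, c = np.zeros(d_h), np.zeros(d_h)
h, c = lstm_step(rng.normal(size=d_in), h, c, W)
```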
Besides memorizing and selecting the implicit motion parameters of the previous step over the time sequence, the RNN's input must also fuse and filter the preprocessed, synchronized MEMS camera poses. Because the gyroscope samples at a higher frequency than the video, the gyroscope and video data must be synchronized with respect to sampling time; the invention uses spherical linear interpolation (Slerp) between the gyroscope quaternions $q_0$ and $q_1$ bracketing each video timestamp, with interpolation parameter $t \in [0, 1]$:

$\mathrm{Slerp}(q_0, q_1; t) = \dfrac{\sin((1 - t)\,\theta)}{\sin\theta}\, q_0 + \dfrac{\sin(t\,\theta)}{\sin\theta}\, q_1$ (formula 2)

where $\theta$ is the arc angle, in radians, of the rotation from $q_0$ to $q_1$. In this way the gyroscope pose $q_t$ at the same timestamp as each camera video frame can be computed.
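The Slerp interpolation above can be implemented directly on unit quaternions (a standard formulation, not code from the patent):

```python
import numpy as np

def slerp(q0, q1, t):
    """Spherical linear interpolation between unit quaternions q0 and q1."""
    q0 = q0 / np.linalg.norm(q0)
    q1 = q1 / np.linalg.norm(q1)
    dot = np.dot(q0, q1)
    if dot < 0.0:              # take the shorter arc on the 4-sphere
        q1, dot = -q1, -dot
    if dot > 0.9995:           # nearly parallel: fall back to normalized lerp
        q = q0 + t * (q1 - q0)
        return q / np.linalg.norm(q)
    theta = np.arccos(dot)     # arc angle from q0 to q1, in radians
    s = np.sin(theta)
    return (np.sin((1 - t) * theta) / s) * q0 + (np.sin(t * theta) / s) * q1

# Interpolate halfway between the identity and a 90-degree rotation about z.
q_id = np.array([1.0, 0.0, 0.0, 0.0])
q_z90 = np.array([np.cos(np.pi / 4), 0.0, 0.0, np.sin(np.pi / 4)])
q_mid = slerp(q_id, q_z90, 0.5)  # a 45-degree rotation about z
```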
Since the gyroscope data is acquired in the 3D world coordinate system, the camera pose must be mapped to 2D image coordinates by combining the extrinsic parameters with the camera intrinsic parameters:

$x = K\,[R \mid t]\,X$ (formula 3)

$K = \begin{bmatrix} f & 0 & c_x \\ 0 & f & c_y \\ 0 & 0 & 1 \end{bmatrix}$ (formula 4)

where $K$ is the camera intrinsic matrix, $R$ is the camera rotation matrix, and $f$ is the focal length. The RNN needs a historical pose queue, and the preceding step yields absolute pose information; however, the rotation transformation between poses and the network's learning process both require relative rotation matrices, so the relative rotation matrix derived from the gyroscope data is needed. The advantage of this design is that the network model only needs to learn incremental changes and is invariant to the absolute pose. It was also found during training that using relative information achieves more consistent visual effects and stronger generalization. The pose information passed through the alignment network, assisted by the MEMS relative poses, learns rotation information better and filters high-frequency jitter in the time domain.
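Under a pure-rotation model, combining the intrinsic matrix with a camera rotation gives the image-space warp as the homography $H = K R K^{-1}$, a common construction in gyro-based stabilization (a sketch under that assumption; the patent's exact mapping is given only as images and may differ):

```python
import numpy as np

def intrinsics(f, cx, cy):
    """Pinhole intrinsic matrix K: focal length f, principal point (cx, cy)."""
    return np.array([[f, 0.0, cx],
                     [0.0, f, cy],
                     [0.0, 0.0, 1.0]])

def rotation_homography(K, R):
    """Homography H = K R K^-1 mapping pixels under a pure camera rotation R."""
    return K @ R @ np.linalg.inv(K)

K = intrinsics(f=800.0, cx=640.0, cy=360.0)  # illustrative intrinsics
H = rotation_homography(K, np.eye(3))        # no rotation: identity warp
```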
And further, warping the output result of the alignment network to a corresponding pose to obtain an image stabilization result of the current image frame, and finishing anti-shake operation.
In specific implementation, the stable transformation matrix output by the alignment network is applied to the jittery initial RGB color-space data $I_t$: the image is warped to the pose corresponding to the rotation-matrix result, which is the image stabilization result of the current frame. The warping here divides the picture into a 12 × 12 grid and warps the image within each grid cell to the stabilized pose. The image stabilization result has good uniformity and stability while preserving the original parallax.
Further, the loss function in the embodiment of the present invention is calculated as follows:
transformation loss: this loss has two components, and the camera motion can be tracked in the initial stage in order for the network to learn the motion parameters first. Part of the matrix for rotationCalculated parameters
Figure 770727DEST_PATH_IMAGE029
Sum true value
Figure 914133DEST_PATH_IMAGE030
To find the L1 loss, another part is to transform the image before
Figure 123397DEST_PATH_IMAGE031
And the image after the parameter transformation from the network science
Figure 620238DEST_PATH_IMAGE032
The L1 loss was calculated.
Figure 660875DEST_PATH_IMAGE033
(formula 5)
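A minimal numpy sketch of the two-part L1 transformation loss (the equal weighting between the two parts is an assumption for illustration, not specified here):

```python
import numpy as np

def l1_loss(a, b):
    """Mean absolute error: the L1 loss used by both parts."""
    return np.mean(np.abs(a - b))

def transformation_loss(r_pred, r_true, img_warped, img_true, w=1.0):
    """Rotation-parameter L1 plus image L1; `w` balances the two parts."""
    return l1_loss(r_pred, r_true) + w * l1_loss(img_warped, img_true)

# Perfect rotation estimate, images differing by a constant 0.5 everywhere.
loss = transformation_loss(np.eye(3), np.eye(3),
                           np.zeros((8, 8)), np.full((8, 8), 0.5))
```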
Smoothing loss: based on the sampling time interval, the invention designs the smoothing loss with two parts to constrain the camera trajectory. One part directly constrains the inter-frame displacement (formula 6), and the other enlarges the time interval to constrain the current frame to fit the global displacement more closely (formula 7).
Drawing loss: the network's smoothing effect often has the side effect of pushing the picture beyond the actual picture boundary, so the invention designs a drawing loss to penalize this directly. The loss combines a weight parameter following a Gaussian distribution whose standard deviation is a preset value, a parameter that controls the tolerance of the picture's appearance, and the number of future frames that can be merged into the calculation. To define the function, the four corners of the stabilized pose are projected into the actual camera space, the twist angle and the maximum distance to the frame edge are normalized, and this relative distance is computed (formula 8). This loss design controls the sensitivity of the algorithm to camera motion.
Deformation loss: deformation is the most important criterion for judging an anti-shake algorithm, because it greatly degrades the original image quality. The loss uses the spherical angle between the current image space and the true camera pose, a threshold, and a parameter that controls the slope of a logistic function; the deformation loss takes effect only when the angular deviation is greater than the threshold (formula 9).
Optical flow loss: the other loss functions are computed over the image as a whole, whereas the optical flow loss is applied to reduce the range of motion between individual pixels. In the calculation, points of the actual camera space are converted to the virtual space so that the correspondence between pixel points remains tight after the warping operation in image stabilization; this also avoids hollow pixels appearing after warping, which would require interpolation (formulas 10 to 12).
Total loss: since the invention is trained in stages, the weight of each loss function must be adjusted in each stage to achieve that stage's training objective; the total loss is the weighted sum of the individual losses, $L_{total} = \sum_i \lambda_i L_i$ (formula 13).
The method absorbs and integrates the advantages of a camera hardware system and a deep learning algorithm; it provides excellent video image stabilization in everyday, parallax, running, fast-rotation and crowd scenes, corrects images at the pixel level, restores the original viewing angle as far as possible, and maintains high-quality video with high stability, a low crop ratio and low distortion.
The embodiment of the invention has the following technical advantages:
the method for computing the dense optical flow by using the deep learning end-to-end CNN network is more robust than the traditional algorithm, and the obtained optical flow result precision (EPE) is higher.
An RNN is used for the first time to select historical and future camera pose data in the time domain, and the pose data is fused and corrected in the spatial domain.
The MEMS gyroscope data provides more accurate rotation parameters for the camera on top of the existing 3DOF, realizing a 6DOF anti-shake algorithm. This tracks the true motion of the camera more closely and supplements the camera data.
Index factors are fused into the loss function for the first time: trajectory smoothness, deformation, and drawing (cropping), the three hard metrics that anti-shake cares about most, are merged directly into training, and control parameters are added that constrain but do not excessively alter the actual scene.
Previous anti-shake algorithms only focus on the shake patterns associated with human motion and are not designed to handle the rolling-shutter phenomenon of the camera itself. The invention corrects rolling-shutter distortion in the optical-flow part of the pipeline.
The design of the drawing loss function not only directly controls the crop ratio but also restores the viewing angle of the original video better than other algorithms, which previous anti-shake algorithms did not address.
Representing the pose with a rotation matrix greatly reduces the parameter count and computation, and Slerp spherical linear interpolation solves the multi-sensor time synchronization problem.
It should be noted that, a certain order does not necessarily exist between the above steps, and those skilled in the art can understand, according to the description of the embodiments of the present invention, that in different embodiments, the above steps may have different execution orders, that is, may be executed in parallel, may also be executed interchangeably, and the like.
With reference to fig. 2, fig. 2 is a schematic diagram of the hardware structure of another embodiment of the deep learning-based hybrid anti-shake system according to an embodiment of the present invention. As shown in fig. 2, the system 10 includes: a memory 101, a processor 102 and a computer program stored on the memory and executable on the processor, the computer program implementing the following steps when executed by the processor 102:
acquiring a video shot by a camera, and acquiring continuous N frames of images based on the video;
inputting continuous N frames of images into a bidirectional optical flow network to obtain an output result of the bidirectional optical flow network;
acquiring pose data of a camera;
inputting the output result of the bidirectional optical flow network and the pose data into an alignment network;
and acquiring an output result of the alignment network, warping the output result of the alignment network to a corresponding pose to obtain an image stabilization result of the current image frame, and finishing anti-shake operation.
The specific implementation steps are the same as those of the method embodiments, and are not described herein again.
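For orientation, the five steps above can be sketched as a single function. Every callable here (`flow_net`, `pose_source`, `align_net`, `warp`) is a hypothetical placeholder, not the patent's implementation:

```python
def stabilize_frame(frames, flow_net, pose_source, align_net, warp):
    """Minimal sketch of the five steps above, with all components
    stubbed out as callables (hypothetical placeholders)."""
    pairs = list(zip(frames[:-1], frames[1:]))          # N frames -> N-1 adjacent pairs
    flows = [flow_net(a, b) for a, b in pairs]          # bidirectional optical flow step
    pose = pose_source()                                # gyro-derived camera pose
    target_pose = align_net(flows, pose)                # alignment network output
    return warp(frames[len(frames) // 2], target_pose)  # warp current frame to pose
```

With N = 5 frames, the middle frame is treated as the "current" frame being stabilized; that choice is an assumption for the sketch.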
Optionally, the computer program when executed by the processor 102 further implements the steps of:
acquiring a video shot by a camera, and acquiring continuous 5-frame RGB images based on the video;
4 pairs of RGB color space data are generated based on the 5-frame RGB images.
The specific implementation steps are the same as those of the method embodiments, and are not described herein again.
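The 5-frames-to-4-pairs step amounts to adjacent pairing, e.g.:

```python
def make_pairs(frames):
    """Pair N consecutive RGB frames into N-1 adjacent (prev, next)
    pairs; for the 5-frame case described above this yields 4 pairs."""
    return list(zip(frames[:-1], frames[1:]))
```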
Optionally, the computer program when executed by the processor 102 further implements the steps of:
inputting 4 pairs of RGB color space data into a bidirectional optical flow network, wherein the bidirectional optical flow network is a CNN with a UNet structure;
and acquiring 4 positive and negative optical flow results output by the bidirectional optical flow network.
The specific implementation steps are the same as those of the method embodiments, and are not described herein again.
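As a toy stand-in for the network (which predicts dense flow fields), estimating a single global integer shift in both directions illustrates what one "positive and negative" flow result per frame pair means. The exhaustive search below is purely illustrative:

```python
import numpy as np

def shift_flow(a, b, max_shift=3):
    """Toy stand-in for an optical flow estimate: find the integer
    (dy, dx) translation that best aligns frame `a` to frame `b` by
    exhaustive search. The patent's UNet-style CNN would predict a
    dense per-pixel flow field instead; this global shift only
    illustrates the forward/backward idea.
    """
    best, best_err = (0, 0), np.inf
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            err = np.mean((np.roll(a, (dy, dx), axis=(0, 1)) - b) ** 2)
            if err < best_err:
                best, best_err = (dy, dx), err
    return best

def bidirectional_flows(frames):
    """For N frames, return N-1 (forward, backward) flow pairs."""
    return [(shift_flow(a, b), shift_flow(b, a))
            for a, b in zip(frames[:-1], frames[1:])]
```

Forward and backward estimates are (up to noise) negatives of each other, which is the consistency property a bidirectional network can exploit.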
Optionally, the computer program when executed by the processor 102 further implements the steps of:
acquiring initial triaxial angular velocity data of a camera based on the MEMS gyroscope;
and filtering the initial triaxial angular velocity data based on complementary filtering and Kalman filtering to generate pose data of the camera.
The specific implementation steps are the same as those of the method embodiments, and are not described herein again.
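A minimal sketch of the two filtering ideas on a single axis, with illustrative noise parameters (not the patent's values):

```python
import numpy as np

def complementary_filter(gyro_angle, accel_angle, alpha=0.98):
    """One complementary-filter step: trust the gyro-integrated angle
    at high frequency and the accelerometer-derived angle at low
    frequency. alpha is an illustrative blend weight."""
    return alpha * gyro_angle + (1.0 - alpha) * accel_angle

def kalman_1d(measurements, q=1e-4, r=1e-2):
    """Minimal scalar Kalman filter smoothing one angular-velocity
    axis. q: process noise, r: measurement noise (assumed values)."""
    x, p = measurements[0], 1.0
    out = []
    for z in measurements:
        p += q                  # predict: uncertainty grows
        k = p / (p + r)         # Kalman gain
        x += k * (z - x)        # update toward the measurement
        p *= (1.0 - k)          # uncertainty shrinks after update
        out.append(x)
    return out
```

In the patent the same two ideas are applied per axis to the MEMS gyroscope's three-axis angular velocity before the pose is formed.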
Optionally, the computer program when executed by the processor 102 further implements the steps of:
carrying out synchronous processing on the triaxial angular velocity data and the video data to generate synchronous triaxial angular velocity data and obtain a relative rotation matrix corresponding to the triaxial angular velocity data;
and inputting the synchronized triaxial angular velocity data and the output result of the bidirectional optical flow network into an alignment network, wherein the alignment network is an RNN (recurrent neural network) comprising a forgetting stage, a select-and-memorize stage, and an output stage.
The specific implementation steps are the same as those of the method embodiments, and are not described herein again.
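The forgetting, select-and-memorize, and output stages correspond to the gates of an LSTM-style recurrent cell. A minimal numpy step, with illustrative shapes and biases omitted for brevity, might be:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h, c, W):
    """One step of an LSTM-style cell matching the three stages named
    above: a forget gate, a select-and-memorize (input) gate, and an
    output gate. W holds four weight matrices applied to [h, x];
    shapes and naming are assumptions, not the patent's.
    """
    hx = np.concatenate([h, x])
    f = sigmoid(W["f"] @ hx)      # forgetting stage
    i = sigmoid(W["i"] @ hx)      # select stage
    g = np.tanh(W["g"] @ hx)      # candidate memory
    c = f * c + i * g             # memorize stage
    o = sigmoid(W["o"] @ hx)      # output stage
    h = o * np.tanh(c)
    return h, c
```

Run over the frame sequence, such a cell carries a hidden state across time, which is what lets the alignment network smooth the pose trajectory rather than treat frames independently.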
Embodiments of the present invention provide a non-transitory computer-readable storage medium storing computer-executable instructions for execution by one or more processors, for example, to perform method steps S100 through S500 of fig. 1 described above.
By way of example, non-volatile storage media can include read-only memory (ROM), programmable ROM (PROM), erasable programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory can include random access memory (RAM), which acts as external cache memory. By way of illustration and not limitation, RAM is available in many forms such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), and direct Rambus RAM (DRRAM). The memory components or memory of the operating environment described in embodiments of the invention are intended to comprise one or more of these and/or any other suitable types of memory.
The above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims (10)

1. A hybrid anti-shake method based on deep learning is characterized by comprising the following steps:
acquiring a video shot by a camera, and acquiring continuous N frames of images based on the video;
inputting continuous N frames of images into a bidirectional optical flow network to obtain an output result of the bidirectional optical flow network;
acquiring pose data of a camera;
inputting the output result of the bidirectional optical flow network and the pose data into an alignment network;
and acquiring an output result of the alignment network, warping the output result of the alignment network to a corresponding pose to obtain an image stabilization result of the current image frame, and finishing anti-shake operation.
2. The deep learning-based hybrid anti-shake method according to claim 1, wherein the acquiring a video taken by a camera, acquiring consecutive N-frame images based on the video, comprises:
acquiring a video shot by a camera, and acquiring continuous 5-frame RGB images based on the video;
4 pairs of RGB color space data are generated based on the 5-frame RGB images.
3. The deep learning-based hybrid anti-shake method according to claim 2, wherein the inputting of the consecutive N frames of images into the bidirectional optical flow network and obtaining the output result of the bidirectional optical flow network comprises:
inputting 4 pairs of RGB color space data into a bidirectional optical flow network, wherein the bidirectional optical flow network is a CNN with a UNet structure;
and acquiring 4 positive and negative optical flow results output by the bidirectional optical flow network.
4. The deep learning-based hybrid anti-shake method according to claim 3, wherein the acquiring pose data of the camera comprises:
acquiring initial triaxial angular velocity data of a camera based on the MEMS gyroscope;
and filtering the initial triaxial angular velocity data based on complementary filtering and Kalman filtering to generate pose data of the camera.
5. The deep learning-based hybrid anti-shake method according to claim 4, wherein the inputting the output results of the bidirectional optical flow network and the pose data into an alignment network comprises:
performing synchronous processing on the triaxial angular velocity data and the video data to generate synchronized triaxial angular velocity data and obtain a relative rotation matrix corresponding to the triaxial angular velocity data;
and inputting the synchronized triaxial angular velocity data and the output result of the bidirectional optical flow network into an alignment network, wherein the alignment network is an RNN (recurrent neural network) comprising a forgetting stage, a select-and-memorize stage, and an output stage.
6. A hybrid anti-shake system based on deep learning, the system comprising: a memory, a processor and a computer program stored on the memory and executable on the processor, the computer program when executed by the processor implementing the steps of:
acquiring a video shot by a camera, and acquiring continuous N frames of images based on the video;
inputting continuous N frames of images into a bidirectional optical flow network to obtain an output result of the bidirectional optical flow network;
acquiring pose data of a camera;
inputting the output result of the bidirectional optical flow network and the pose data into an alignment network;
and acquiring an output result of the alignment network, warping the output result of the alignment network to a corresponding pose to obtain an image stabilization result of the current image frame, and finishing anti-shake operation.
7. The deep learning-based hybrid anti-shake system according to claim 6, wherein the computer program, when executed by the processor, further implements the steps of:
acquiring a video shot by a camera, and acquiring continuous 5-frame RGB images based on the video;
4 pairs of RGB color space data are generated based on the 5-frame RGB images.
8. The deep learning based hybrid anti-shake system according to claim 7, wherein the computer program, when executed by the processor, further implements the steps of:
inputting 4 pairs of RGB color space data into a bidirectional optical flow network, wherein the bidirectional optical flow network is a CNN with a UNet structure;
and acquiring 4 positive and negative optical flow results output by the bidirectional optical flow network.
9. The deep learning based hybrid anti-shake system according to claim 8, wherein the computer program, when executed by the processor, further implements the steps of:
acquiring initial triaxial angular velocity data of a camera based on the MEMS gyroscope;
and filtering the initial triaxial angular velocity data based on complementary filtering and Kalman filtering to generate pose data of the camera.
10. A non-transitory computer-readable storage medium storing computer-executable instructions that, when executed by one or more processors, cause the one or more processors to perform the deep learning based hybrid anti-shake method of any one of claims 1-5.
CN202211077092.4A 2022-09-05 2022-09-05 Hybrid anti-shake method and system based on deep learning Pending CN115174817A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211077092.4A CN115174817A (en) 2022-09-05 2022-09-05 Hybrid anti-shake method and system based on deep learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211077092.4A CN115174817A (en) 2022-09-05 2022-09-05 Hybrid anti-shake method and system based on deep learning

Publications (1)

Publication Number Publication Date
CN115174817A true CN115174817A (en) 2022-10-11

Family

ID=83481881

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211077092.4A Pending CN115174817A (en) 2022-09-05 2022-09-05 Hybrid anti-shake method and system based on deep learning

Country Status (1)

Country Link
CN (1) CN115174817A (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2012164303A (en) * 2010-12-23 2012-08-30 Samsung Electronics Co Ltd Digital image stabilization method utilizing adaptive filtering
CN109729263A (en) * 2018-12-07 2019-05-07 苏州中科广视文化科技有限公司 Video based on fusional movement model removes fluttering method
CN110490928A (en) * 2019-07-05 2019-11-22 天津大学 A kind of camera Attitude estimation method based on deep neural network
CN112967341A (en) * 2021-02-23 2021-06-15 湖北枫丹白露智慧标识科技有限公司 Indoor visual positioning method, system, equipment and storage medium based on live-action image
CN114429191A (en) * 2022-04-02 2022-05-03 深圳深知未来智能有限公司 Electronic anti-shake method, system and storage medium based on deep learning
WO2022125090A1 (en) * 2020-12-10 2022-06-16 Google Llc Enhanced video stabilization based on machine learning models


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
LIU XU: "Research on Advanced Video Stabilization Technology", Master's Thesis, Shanghai Jiao Tong University *

Similar Documents

Publication Publication Date Title
CN111133747B (en) Method and device for stabilizing video
CN108363946B (en) Face tracking system and method based on unmanned aerial vehicle
CN101616310B (en) Target image stabilizing method of binocular vision system with variable visual angle and resolution ratio
CN110493525B (en) Zoom image determination method and device, storage medium and terminal
US10764496B2 (en) Fast scan-type panoramic image synthesis method and device
CN107566688B (en) Convolutional neural network-based video anti-shake method and device and image alignment device
CN105611116B (en) A kind of global motion vector method of estimation and monitor video digital image stabilization method and device
JP6087671B2 (en) Imaging apparatus and control method thereof
CN110520694A (en) A kind of visual odometry and its implementation
CN107564063B (en) Virtual object display method and device based on convolutional neural network
CN112585644A (en) System and method for creating background blur in camera panning or movement
CN114175091A (en) Method for optimal body or face protection with adaptive dewarping based on context segmentation layer
WO2010151215A1 (en) Real time video stabilization
Wang et al. Video stabilization: A comprehensive survey
CN108900775A (en) A kind of underwater robot realtime electronic image stabilizing method
CN110060295B (en) Target positioning method and device, control device, following equipment and storage medium
CN114170290A (en) Image processing method and related equipment
Wang et al. Automated camera-exposure control for robust localization in varying illumination environments
US11531211B2 (en) Method for stabilizing a camera frame of a video sequence
CN116152121B (en) Curved surface screen generating method and correcting method based on distortion parameters
CN115174817A (en) Hybrid anti-shake method and system based on deep learning
JP2016110312A (en) Image processing method, and image processor and program
US10764500B2 (en) Image blur correction device and control method
KR101576426B1 (en) Apparatus and Method for surveillance using fish eyes lens
JP7013205B2 (en) Image shake correction device and its control method, image pickup device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination