CN114584785B - Real-time image stabilizing method and device for video image - Google Patents

Publication number: CN114584785B (granted; earlier publication CN114584785A)
Application number: CN202210116810.8A
Authority: China (CN)
Original language: Chinese (zh)
Inventors: 鹿璇, 李磊, 曾意
Assignee: Wuhan Zhuomu Technology Co., Ltd. (the listed assignee may be inaccurate; Google has not performed a legal analysis)
Legal status: Active (the legal status is an assumption and is not a legal conclusion)
Prior art keywords: image, data processing, processing module, frame image, perspective transformation
Abstract

The invention provides a real-time image stabilizing method and device for video images. The method comprises the following steps: the first data processing module receives the image sequence and determines a reference frame image; the second data processing module reads a current frame image from the image sequence, performs motion estimation between the current frame image and the reference frame image, obtains a motion vector, and sends the motion vector to the first data processing module for storage; the first data processing module stores the received motion vector into a vector set, performs smoothing on the vector set, obtains a perspective transformation matrix, and sends it to the second data processing module; the second data processing module receives the perspective transformation matrix and, using it together with the reference frame image, outputs a target image after image stabilization. On the premise of ensuring calculation accuracy, the method and device improve algorithm efficiency through hardware acceleration, balancing timeliness and accuracy.

Description

Real-time image stabilizing method and device for video image
Technical Field
The present invention relates to the field of image processing technologies, and in particular, to a method and apparatus for real-time image stabilization of video images.
Background
Video stabilization refers to processing an original video sequence acquired by video equipment to remove jitter between consecutive frames. Compared with traditional mechanical image stabilization, electronic image stabilization offers low cost, low power consumption, high integration, flexibility, easy configuration and modification, and even higher stabilization precision. However, most existing electronic video stabilization methods perform centralized software processing after video acquisition, focusing more on algorithm quality than on real-time performance.
The principle of electronic image stabilization is to calculate the relative offset between the current frame and a reference frame, and then apply the inverse compensation to the current frame, thereby achieving stabilization. Prior-art motion estimation algorithms include block matching, optical flow, and the like. A block matching algorithm usually divides a frame into many small blocks for matching and assumes that all pixels within a block move in the same direction, solving for a globally optimal offset while ignoring per-pixel depth changes; the more complex the block matching algorithm, the higher its accuracy but the more time it consumes. The optical flow method uses the apparent motion of brightness information in the image, computing per-pixel displacement vectors between consecutive frames to estimate image motion. The more feature points the optical flow method computes, the higher its accuracy, but its running time grows severely, making real-time application difficult. Therefore, prior-art electronic image stabilization methods cannot achieve timeliness and accuracy simultaneously.
Disclosure of Invention
The invention provides a real-time image stabilizing method and device for video images to address the prior-art defect that timeliness and accuracy cannot be achieved simultaneously. Under a combined FPGA and ARM platform architecture, computation-heavy operations are handled by the FPGA and lightweight operations by the ARM, so that hardware acceleration improves algorithm efficiency while both timeliness and accuracy are ensured.
The invention provides a real-time image stabilizing method of a video image, which comprises the following steps:
The first data processing module receives the image sequence and determines a reference frame image;
The second data processing module reads a current frame image from the image sequence, carries out motion estimation on the current frame image and the reference frame image, acquires a motion vector, and sends the motion vector to the first data processing module for storage;
The first data processing module stores the received motion vector into a vector set, performs smoothing on the vector set, acquires a perspective transformation matrix, and sends the perspective transformation matrix to the second data processing module;
The second data processing module receives the perspective transformation matrix and outputs a target image after image stabilization processing by utilizing the perspective transformation matrix and the reference frame image;
The image sequence comprises video image data acquired in real time, the first data processing module comprises an ARM, and the second data processing module comprises an FPGA.
According to the real-time image stabilizing method of video image provided by the invention, the motion estimation is carried out on the current frame image and the reference frame image, and the motion vector is obtained, which comprises the following steps:
A target module in the second data processing module acquires a matching point set according to the current frame image and the reference frame image;
and the target module screens the matching point set to obtain a target matching point set, and obtains the motion vector based on the target matching point set.
According to the real-time image stabilizing method for video images provided by the invention, the vector set is subjected to smoothing processing to obtain a perspective transformation matrix, and the method comprises the following steps:
under the condition that the total number of vectors of the vector set is larger than a preset threshold value, the first data processing module intercepts a target vector set from the vector set;
The first data processing module obtains the perspective transformation matrix by using a smoothing filter based on the target vector set;
wherein the target vector set consists of a target number of consecutive motion vectors.
According to the real-time image stabilizing method for video images provided by the invention, the vector set is subjected to smoothing processing to obtain a perspective transformation matrix, and the method comprises the following steps: and under the condition that the total number of vectors of the vector set is smaller than or equal to a preset threshold value, the first data processing module acquires the perspective transformation matrix by using a smoothing filter based on the vector set.
According to the real-time image stabilizing method of the video image provided by the invention, the matching point set is obtained according to the current frame image and the reference frame image, and the method comprises the following steps:
acquiring a first feature point based on the reference frame image;
processing by using a pyramid optical flow method based on the first characteristic points to obtain second characteristic points;
acquiring the matching point set based on the first characteristic point and the second characteristic point;
Wherein there are one or more first feature points, and the second feature points correspond to the current frame image.
According to the real-time image stabilizing method for the video image, the target module is built based on a Vivado HLS tool.
The invention also provides a real-time image stabilizing device of the video image, which comprises:
the first data processing module is used for receiving the image sequence and determining a reference frame image;
the second data processing module is used for reading a current frame image from the image sequence, carrying out motion estimation on the current frame image and the reference frame image, obtaining a motion vector, and sending the motion vector to the first data processing module for storage;
The first data processing module is further configured to store the received motion vector into a vector set, perform smoothing processing on the vector set, obtain a perspective transformation matrix, and send the perspective transformation matrix to the second data processing module;
The second data processing module is further configured to receive the perspective transformation matrix, and output a target image after image stabilization processing by using the perspective transformation matrix and the reference frame image;
The image sequence comprises video image data acquired in real time, the first data processing module comprises an ARM, and the second data processing module comprises an FPGA.
The invention also provides an electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the steps of the real-time image stabilizing method of any one of the video images are realized when the processor executes the program.
The present invention also provides a non-transitory computer readable storage medium having stored thereon a computer program which, when executed by a processor, performs the steps of a method of real-time image stabilization of a video image as described in any one of the above.
The invention also provides a computer program product comprising a computer program which, when executed by a processor, implements the steps of a method for real-time image stabilization of a video image as described in any one of the above.
According to the real-time image stabilizing method and device for the video image, based on the ARM and the FPGA, the reference frame image of the image sequence is determined through the ARM, the FPGA executes complex motion estimation between the current frame image and the reference frame image to obtain the motion vector, the ARM performs simple smoothing on the motion vector to obtain the perspective transformation matrix, the FPGA performs complex perspective transformation, and the target image after image stabilizing is output. On the premise of ensuring calculation accuracy, hardware acceleration is adopted to improve algorithm efficiency, and timeliness and accuracy are considered.
Drawings
In order to more clearly illustrate the invention or the technical solutions of the prior art, the following description will briefly explain the drawings used in the embodiments or the description of the prior art, and it is obvious that the drawings in the following description are some embodiments of the invention, and other drawings can be obtained according to the drawings without inventive effort for a person skilled in the art.
Fig. 1 is a flow chart of a real-time image stabilizing method of a video image provided by the invention;
FIG. 2 is a schematic flow chart of a pyramid optical flow method in the real-time image stabilization method of video images provided by the invention;
Fig. 3 is a schematic structural diagram of a real-time image stabilizing device for video images according to the present invention;
Fig. 4 is a schematic structural diagram of an electronic device provided by the present invention.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the present invention more apparent, the technical solutions of the present invention will be clearly and completely described below with reference to the accompanying drawings, and it is apparent that the described embodiments are some embodiments of the present invention, not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
The terms "first," "second," and the like in the description and claims are used to distinguish between similar elements and do not necessarily describe a particular sequential or chronological order. It is to be understood that data so used may be interchanged, where appropriate, so that embodiments of the present application can be implemented in sequences other than those illustrated or described herein; moreover, objects identified by "first," "second," etc. are generally of one type, and the number of such objects is not limited, e.g., the first object may be one or more.
It is to be understood that the terminology used in the description of the invention is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used in this specification and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise.
The terms "comprises" and "comprising" indicate the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
Fig. 1 is a flow chart of a real-time image stabilizing method for video images provided by the invention. As shown in fig. 1, the real-time image stabilizing method for video image provided by the embodiment of the invention includes: step 101, a first data processing module receives an image sequence and determines a reference frame image.
The image sequence comprises video image data acquired in real time, and the first data processing module comprises an ARM.
It should be noted that, the execution subject of the real-time image stabilizing method for video images provided by the embodiment of the invention is a real-time image stabilizing device for video images.
The real-time image stabilizing method of the video image is suitable for electronic equipment with the real-time image stabilizing device of the video image, and is used for outputting the acquired video image after electronic image stabilization, so that the actual appearance is improved.
The electronic device described above may be implemented in various forms. For example, the electronic device described in the embodiments of the present application may be a terminal device integrating the real-time image stabilizing device and a video capturing device, such as a mobile phone, smart phone, notebook computer, digital broadcast receiver, PDA (personal digital assistant), PAD (tablet computer), PMP (portable multimedia player), navigation device, smart bracelet, smart watch, digital camera, or other mobile terminal.
The electronic device described in the embodiments of the present application may also be a terminal device provided with the real-time image stabilizing device and communicatively connected to a video acquisition device, such as a fixed terminal (e.g., a desktop computer). In the following, it is assumed that the electronic device is a mobile terminal; however, those skilled in the art will understand that, apart from elements used particularly for mobile purposes, the configuration according to the embodiments of the present application can also be applied to fixed terminals.
The image sequence is a sequence of continuous still images, which is a processing object of the real-time image stabilizing device for video images. The image sequence is used to characterize the video acquired by the electronic device.
The first data processing module refers to a component in the real-time image stabilizing device of the video image. The first data processing module is used for executing simple instructions with smaller operand and is one of executing units of a real-time image stabilizing method of video images.
The first data processing module includes, but is not limited to, a central processing unit (Central Processing Unit, CPU), a graphics processor (Graphics Processing Unit, GPU), a reduced instruction set computer (Reduced Instruction Set Computer, RISC), and like processor elements.
Preferably, the first data processing module is an Advanced RISC Machines (ARM) processor.
Specifically, in step 101, the first data processing module receives, in real time, an image sequence acquired by the video acquisition device, initializes the image sequence, and indexes the image sequence according to an image sequence number, so as to set a reference frame image.
The method for setting the reference frame image according to the embodiment of the invention is not particularly limited.
Illustratively, for any image sequence to be stabilized, the first frame is taken as the initial reference frame and the sequence is traversed until the last frame has been processed as a reference image, at which point the electronic image stabilization of the corresponding video is complete.
For example, for any group of image sequences to be stabilized, preprocessing may be performed first, and after the useless image frames are removed, a new image sequence is formed. And traversing the image sequence by taking the first frame of the new image sequence as an initial reference frame until the last frame in the image sequence is used as a reference image for processing, and calculating to finish electronic image stabilizing processing of the corresponding video.
Step 102, the second data processing module reads the current frame image from the image sequence, performs motion estimation on the current frame image and the reference frame image, acquires a motion vector, and sends the motion vector to the first data processing module for storage.
Wherein the second data processing module comprises an FPGA.
The second data processing module refers to another component except the first data processing module in the real-time image stabilizing device of the video image. The second data processing module is used for executing complex instructions with larger calculation amount, and is one of execution units of the real-time image stabilizing method of the video image.
The second data processing module includes, but is not limited to, programmable devices such as Programmable Array Logic (PAL), Generic Array Logic (GAL), and Complex Programmable Logic Devices (CPLD).
Preferably, the second data processing module is a Field Programmable Gate Array (FPGA).
Therefore, before step 102, a compiler needs to map the business logic of the motion estimation algorithm onto the many basic logic gates (e.g., AND and NOR gates), registers, latches, etc. inside the FPGA. That is, the instructions of the software algorithm are compiled into machine code and assembly code that the hardware can process, and then realized as a circuit.
Specifically, in step 102, the second data processing module takes the image sequence number corresponding to the reference frame as an initial index, reads images adjacent to the reference frame image in the image sequence according to a specified order, takes the currently read image as a current frame image, performs motion estimation on the current frame image and the reference frame image, acquires a motion vector corresponding to the adjacent frames, and sends the motion vector to the first data processing module for storage.
The motion vector refers to the relative offset, under a certain matching criterion, between a spatial position in the reference frame image (taken as the benchmark) and the most similar spatial position in the current frame image; it comprises change components in the x direction, the y direction, and the angle. Motion vectors are used for motion compensation between adjacent image frames.
It will be appreciated that the reference frame image may lead or lag the current frame image in the image sequence, and the reading mode of the current frame is not particularly limited in the embodiment of the present invention.
For example, the current frame image may be the next frame image of the reference frame image, and then backward motion estimation is performed to obtain a motion vector.
For example, if the current frame image is a frame image before the reference frame image, forward motion estimation is performed to obtain a motion vector.
And 103, the first data processing module stores the received motion vector into a vector set, performs smoothing processing on the vector set, acquires a perspective transformation matrix, and sends the perspective transformation matrix to the second data processing module.
The vector set refers to a database that is maintained independently. The vector set is used for sequentially storing motion vectors between different reference frame images and current frame images.
Specifically, in step 103, the first data processing module performs euclidean transformation using the received motion vector, obtains a perspective transformation matrix, and sends the perspective transformation matrix to the second data processing module.
Preferably, in step 103, the first data processing module stores each received motion vector into the vector set and applies mean filtering to all motion vectors in the set, obtaining a smoothed motion trajectory. From the smoothed motion trajectory, a smoothed Euclidean transformation can be computed and represented by a perspective transformation matrix. The embodiment of the invention does not limit the resulting perspective transformation matrix in detail.
Illustratively, assuming the smoothed motion vector from the current frame to the reference frame is represented by (x, y, θ), the corresponding perspective transformation matrix T (here a Euclidean transform: rotation by θ plus translation (x, y)) may be represented by the following formula:

T = [[cos θ, -sin θ, x], [sin θ, cos θ, y], [0, 0, 1]]
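Assuming (x, y, θ) parameterizes the standard 2D rigid-body (Euclidean) transform, i.e. a rotation by θ plus a translation by (x, y), the matrix can be built as follows; the function name `rigid_matrix` is illustrative, not from the patent.

```python
import numpy as np

def rigid_matrix(x, y, theta):
    """Perspective transformation matrix for a smoothed motion vector
    (x, y, theta): rotation by theta plus translation (x, y), in
    homogeneous coordinates."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, x],
                     [s,  c, y],
                     [0,  0, 1.0]])
```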
step 104, the second data processing module receives the perspective transformation matrix, and outputs the target image after image stabilization processing by utilizing the perspective transformation matrix and the reference frame image.
Specifically, in step 104, the second data processing module performs perspective transformation on the reference frame image by using the received perspective transformation matrix, so as to obtain a smoothed reference frame image, and uses the smoothed reference frame image as a target image output by the real-time image stabilizing device of the video image, so as to correct jitter in the original reference frame image.
The embodiment of the invention does not limit the perspective transformation process in detail. Illustratively, the transformation is performed with reference to the following formula:
[x′,y′,z′]=[u′,v′,w′]*T
wherein [u′, v′, w′] represents the homogeneous coordinates of any spatial position in the original reference frame image; multiplying them by the perspective transformation matrix T yields the transformed coordinates [x′, y′, z′], thereby completing the perspective coordinate transformation and generating the stabilized image, which can be updated as the new reference frame.
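A minimal sketch of this coordinate transformation, written in the column-vector convention p′ = T·p (the transpose of the row form [x′, y′, z′] = [u′, v′, w′]·T used in the text); the function name is illustrative.

```python
import numpy as np

def warp_points(T, pts):
    """Apply a 3x3 perspective transformation matrix T to an (N, 2) array
    of pixel coordinates, dividing by the homogeneous coordinate."""
    pts = np.asarray(pts, dtype=float)
    hom = np.hstack([pts, np.ones((len(pts), 1))])   # (u, v) -> (u, v, 1)
    out = hom @ T.T                                  # column-vector convention
    return out[:, :2] / out[:, 2:3]                  # divide by w'
```

For a pure Euclidean T, the homogeneous coordinate w′ stays 1 and the division is a no-op; the general form also covers full perspective matrices.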
It can be understood that after the target image after the image stabilizing process is output, the method further includes using the target image after the current image stabilizing process as a new reference frame image, and continuing to execute the above steps of the real-time image stabilizing method of the video image until the image stabilizing process is completed on the last frame image in the image sequence.
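The whole loop (steps 101 through 104 plus the reference-frame update just described) can be sketched in plain Python. Here `estimate`, `smooth`, and `warp` are stand-ins for the FPGA and ARM roles described above, and all names are illustrative rather than taken from the patent.

```python
def stabilize_sequence(frames, estimate, smooth, warp):
    """Drive the image-stabilization loop: the first frame is the initial
    reference; each stabilized output becomes the next reference frame."""
    ref = frames[0]
    vector_set = []                      # maintained on the ARM side
    stabilized = [ref]
    for cur in frames[1:]:
        v = estimate(ref, cur)           # FPGA: motion estimation -> motion vector
        vector_set.append(v)
        T = smooth(vector_set)           # ARM: smoothing -> perspective matrix
        stabilized.append(warp(ref, T))  # FPGA: perspective transformation
        ref = stabilized[-1]             # stabilized image is the new reference
    return stabilized
```

The structure makes the division of labor explicit: only `smooth` (small computation) runs on the ARM, while `estimate` and `warp` (large computation) run on the FPGA.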
According to the embodiment of the invention, under the architecture based on ARM and FPGA, the reference frame image of the image sequence is determined through ARM, the FPGA executes complex motion estimation between the current frame image and the reference frame image, the motion vector is obtained, the ARM is used for carrying out simple smoothing processing on the motion vector, the perspective transformation matrix is obtained, the FPGA is used for carrying out complex perspective transformation processing, and the target image after image stabilization processing is output. On the premise of ensuring calculation accuracy, hardware acceleration is adopted to improve algorithm efficiency, and timeliness and accuracy are considered.
On the basis of any one of the above embodiments, performing motion estimation on the current frame image and the reference frame image to obtain a motion vector includes: a target module in the second data processing module acquires a matching point set according to the current frame image and the reference frame image.
It should be noted that the target module refers to a data processing unit in the first data processing module. Illustratively, where the second data processing module is an FPGA, the target module is correspondingly an IP core in the FPGA.
Specifically, in step 102, feature points are extracted from the reference frame image, and the current frame image and the reference frame image data are transmitted to the target module for calculation, so as to obtain the matching point set between the current frame and the reference frame.
The mode of acquiring the feature points in the embodiment of the invention is not particularly limited.
Illustratively, the reference frame image may be sampled with a corner detector for motion estimation with sparse optical flow.
Alternatively, points may be sampled uniformly at certain intervals on the reference frame image for motion estimation with dense optical flow.
And the target module screens the matching point set to obtain a target matching point set, and obtains a motion vector based on the target matching point set.
Specifically, in step 102, the target module iteratively screens the matching point set, retaining static background points and excluding dynamic foreground outliers, to finally obtain a screened target matching point set; using this set, the Euclidean transformation mapping the coordinate system of the reference frame image to that of the current frame image is found and taken as the motion vector between the two adjacent frames.
Screening methods for the matching point set include, but are not limited to, Random Sample Consensus (RANSAC), M-estimator Sample Consensus (MSAC), or Least Median of Squares (LMedS) to reject mismatched points, which is not particularly limited in the embodiments of the present invention.
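A minimal RANSAC-style screening for the simplest (pure-translation) case illustrates the inlier/outlier split described above; the real method estimates a full Euclidean transform, and the function name, iteration count, and tolerance below are all illustrative assumptions.

```python
import numpy as np

def ransac_translation(src, dst, iters=100, tol=2.0, seed=0):
    """Robustly estimate a 2-D translation from matched points (src -> dst),
    keeping static background points (inliers) and rejecting dynamic
    foreground outliers."""
    rng = np.random.default_rng(seed)
    diffs = dst - src
    best_inliers = np.zeros(len(src), dtype=bool)
    for _ in range(iters):
        cand = diffs[rng.integers(len(diffs))]              # 1-point hypothesis
        inliers = np.linalg.norm(diffs - cand, axis=1) < tol
        if inliers.sum() > best_inliers.sum():
            best_inliers = inliers
    # Refit on the consensus set (the target matching point set).
    return diffs[best_inliers].mean(axis=0), best_inliers
```

The returned mask corresponds to the target matching point set, and the refitted translation plays the role of the motion vector between the two frames.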
It can be understood that the target module (the optical flow IP core) can be ported to other FPGA families to realize motion estimation between the current frame image and the reference frame image.
Correspondingly, step 104 is executed by a perspective transformation IP core in the second data processing module. This IP core is set up independently of the optical flow IP core and can likewise be ported to other FPGA families, performing perspective transformation on the image to be stabilized to obtain the stabilized target image, which is then updated as the reference frame.
According to the embodiment of the invention, the target module is arranged in the FPGA, the target module is used for motion estimation to obtain the matching point set, the matching point set is subjected to iterative screening to obtain the target matching point set, and the motion vector is obtained. The motion estimation process with high operation complexity can be executed by the independent IP core, the portability of the program is improved, the development period is accelerated, and the algorithm migration is facilitated. Furthermore, on the premise of ensuring the calculation precision, hardware acceleration is adopted to improve the algorithm efficiency, and timeliness and accuracy are both considered.
On the basis of any one of the above embodiments, smoothing the vector set to obtain a perspective transformation matrix, including: and under the condition that the total number of vectors in the vector set is larger than a preset threshold value, the first data processing module intercepts the target vector set from the vector set.
Wherein the target vector set consists of a target number of consecutive motion vectors.
Specifically, in step 103, the total number of motion vectors stored in the vector set is matched with a preset threshold, and there are two types of matching results: match success and match failure.
A successful match means the total number of motion vectors stored in the vector set is greater than the preset threshold, i.e., it can be determined that too many image frames of the sequence have been processed. To reduce the amount of computation, the sequence number of the motion vector corresponding to the current reference frame is used as the initial index, a target number of consecutive historical motion vectors are extracted from the vector set from back to front, and these are stored sequentially into the target vector set.
Wherein the target number is less than or equal to a preset threshold.
The failure of matching is a situation that the total number of motion vectors stored in the vector set is not greater than a preset threshold, that is, the number of image frames processed in the image sequence can be determined to be within an acceptable computational power range, and then all the motion vectors in the vector set can be directly processed.
The first data processing module obtains a perspective transformation matrix based on the set of target vectors using a smoothing filter.
Specifically, the first data processing module processes all motion vectors in the target vector set by using a smoothing filter, and performs euclidean transformation on the obtained smoothed motion track to obtain a perspective transformation matrix.
Smoothing filters include, but are not limited to, median filters, neighborhood mean filters, and the like.
Preferably, the first data processing module applies a mean filter to all the motion vectors and performs a Euclidean transformation on the smoothed motion trajectory obtained after mean filtering to obtain the corresponding perspective transformation matrix.
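The mean-filter-then-Euclidean-transform step can be sketched as follows. This is a hypothetical Python illustration, not the patent's exact implementation: it assumes each motion vector is a (dx, dy, da) translation-plus-rotation triple and that the mean filter is a moving-average window of a chosen radius.

```python
import math

def smooth_trajectory(vectors, radius=2):
    """Mean-filter a list of (dx, dy, da) inter-frame motion vectors.

    Accumulate the vectors into a motion trajectory, smooth the trajectory
    with a moving-average window, and return the correction for the latest
    frame as an offset (smoothed minus raw trajectory).
    """
    # Cumulative motion trajectory (running sums of dx, dy, da).
    traj = []
    x = y = a = 0.0
    for dx, dy, da in vectors:
        x, y, a = x + dx, y + dy, a + da
        traj.append((x, y, a))

    # Moving-average (mean) filter over the trajectory.
    smoothed = []
    for i in range(len(traj)):
        lo, hi = max(0, i - radius), min(len(traj), i + radius + 1)
        window = traj[lo:hi]
        smoothed.append(tuple(sum(c) / len(window) for c in zip(*window)))

    # Correction for the latest frame.
    return tuple(s - r for s, r in zip(smoothed[-1], traj[-1]))

def euclidean_to_homography(dx, dy, da):
    """Embed a 2D Euclidean transform (rotation da, translation dx, dy)
    into a 3x3 perspective (homography) matrix, as nested lists."""
    c, s = math.cos(da), math.sin(da)
    return [[c, -s, dx],
            [s,  c, dy],
            [0.0, 0.0, 1.0]]
```

The resulting 3x3 matrix is what the text refers to as the perspective transformation matrix representing the smoothed Euclidean transform.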
When the total number of vectors in the vector set is larger than the preset threshold, the embodiment of the invention intercepts the target vector set from the vector set in the ARM and obtains the perspective transformation matrix by smoothing the target vector set. The ARM executes the perspective-transformation-matrix calculation, which has low computational complexity. Thus, on the premise of ensuring calculation accuracy, hardware acceleration is adopted to improve algorithm efficiency, taking both timeliness and accuracy into account.
On the basis of any one of the above embodiments, smoothing the vector set to obtain a perspective transformation matrix includes: when the total number of vectors in the vector set is smaller than or equal to a preset threshold, the first data processing module obtains the perspective transformation matrix from the vector set using a smoothing filter.
Specifically, in step 103, the total number of motion vectors stored in the vector set is matched against the preset threshold, and the result is that matching fails.
Matching fails when the total number of motion vectors stored in the vector set is smaller than or equal to the preset threshold, which indicates that the number of processed image frames is within an acceptable computational range; all motion vectors in the vector set are then processed directly by the smoothing filter, and after mean filtering, a Euclidean transformation is performed on the resulting smoothed motion trajectory to obtain the perspective transformation matrix.
The present invention does not particularly limit this process. Taking as an example an image sequence of 40 frames with the 1st frame as the initial reference frame, and setting both the preset threshold and the target number to 30, a specific implementation of the above process is given below:
Case one:
When the current reference frame image is the 36th frame of the image sequence, the vector set stored in the ARM in step 103 already contains 35 groups of motion vectors: the motion vectors from the 1st frame to the 2nd frame, from the 2nd frame to the 3rd frame, and so on, the last group being the motion vectors from the 35th frame to the 36th frame.
Since the total number of motion vectors in the vector set is greater than the preset threshold (i.e. 30), the vector set is truncated: taking the motion vectors from the 35th frame to the 36th frame as the initial index point, the target number (i.e. 30) of consecutive motion vectors are selected from back to front in the order of the vector set as the target motion vectors. These comprise the motion vectors from the 6th frame to the 7th frame, from the 7th frame to the 8th frame, and so on, the last group being the motion vectors from the 35th frame to the 36th frame, 30 groups in total. Smoothing is then performed on these 30 groups of motion vectors to obtain the perspective transformation matrix.
And a second case:
When the current reference frame image is the 5th frame of the image sequence, the vector set stored in the ARM in step 103 already contains 4 groups of motion vectors: the motion vectors from the 1st frame to the 2nd frame, from the 2nd to the 3rd, from the 3rd to the 4th, and from the 4th to the 5th.
Since the total number of motion vectors in the vector set is smaller than the preset threshold (i.e. 30), the four groups of vectors are smoothed directly to obtain the perspective transformation matrix.
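The branch illustrated by the two cases condenses into one selection rule; a minimal sketch, with the threshold value taken from the example above:

```python
def select_vectors(vector_set, threshold=30):
    """Truncation rule from the two cases above: if the set holds more
    than `threshold` motion vectors, keep only the last `threshold`
    consecutive ones (counting back from the current reference frame);
    otherwise use the whole set."""
    if len(vector_set) > threshold:
        return vector_set[-threshold:]
    return vector_set
```

With 35 stored groups this keeps the 30 groups ending at the current reference frame (case one); with 4 stored groups it returns all of them (case two).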
In the embodiment of the invention, when the total number of vectors in the vector set is smaller than or equal to the preset threshold, the perspective transformation matrix is obtained by directly smoothing the vector set in the ARM. The ARM executes the perspective-transformation-matrix calculation, which has low computational complexity. Thus, on the premise of ensuring calculation accuracy, hardware acceleration is adopted to improve algorithm efficiency, taking both timeliness and accuracy into account.
On the basis of any one of the above embodiments, obtaining a matching point set from the current frame image and the reference frame image includes: acquiring first feature points based on the reference frame image.
Wherein the number of first feature points is one or more.
Specifically, the target module collects feature points in the reference frame image to obtain one or more first feature points.
The embodiment of the invention does not specifically limit the process of acquiring the first feature point.
Preferably, feature points are sampled uniformly at equal intervals across the reference frame image.
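Equal-interval uniform sampling of this kind might look like the following sketch; the step size of 32 pixels is illustrative, not specified by the patent:

```python
def sample_grid_points(width, height, step=32):
    """Uniform equal-interval sampling of candidate feature points
    over the reference frame: one point at the center of each
    step-by-step cell of the image grid."""
    return [(x, y)
            for y in range(step // 2, height, step)
            for x in range(step // 2, width, step)]
```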
Second feature points are then obtained by processing the first feature points with a pyramid optical flow method.
Wherein the second feature points correspond to the current frame image.
Specifically, the target module matches the collected first feature points against the current frame image to find the corresponding second feature points.
The embodiment of the invention does not limit the characteristic point matching algorithm in detail.
Preferably, the target module uses the pyramid optical flow method to obtain the second feature points matched with the first feature points. The embodiment of the present invention does not particularly limit this process.
Fig. 2 is a schematic flow chart of the pyramid optical flow method in the real-time image stabilizing method for video images provided by the invention. Illustratively, as shown in fig. 2, the process of acquiring a second feature point is described below taking a three-layer pyramid structure as an example:
(1) For the two input frame images, compute the corresponding three-layer pyramid feature maps, downsampled level by level from the original image, as shown by the three-layer structure in fig. 2.
(2) From the first feature point coordinate p in the input layer-0 image (i.e. the original reference frame image), calculate the corresponding feature point coordinates p1 and p2 in the other two pyramid layers.
(3) Initialize the optical flow coordinates m0, m1, m2, where m0 = p, m1 = p1, m2 = p2.
(4) With m2 as input, find the optical flow endpoint n2 at layer 2.
(5) Calculate the coordinate n1 corresponding to n2 at layer 1, and with n1 as input obtain the optical flow endpoint q1 at layer 1.
(6) Calculate the coordinate q0 corresponding to q1 at layer 0, and with q0 as input obtain the optical flow endpoint q at layer 0, i.e. the second feature point.
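Under the common assumption of a downsampling factor of 2 per pyramid level (the patent does not state the factor), the coordinate bookkeeping in steps (2), (5) and (6) can be sketched as:

```python
def pyramid_coordinates(p, levels=3):
    """Coordinates of a layer-0 feature point p = (x, y) at each pyramid
    level, assuming a downsampling factor of 2 per level (p_k = p / 2**k),
    as in steps (1)-(3) above."""
    x, y = p
    return [(x / 2**k, y / 2**k) for k in range(levels)]

def propagate_endpoint(q_upper):
    """Map an optical-flow endpoint found at level k+1 back to level k
    (steps (5)-(6)): coordinates double when moving one level down."""
    return (q_upper[0] * 2.0, q_upper[1] * 2.0)
```

The endpoint found at each level seeds the search at the next finer level, which is what makes the coarse-to-fine scheme robust to large motions.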
And acquiring a matching point set based on the first characteristic point and the second characteristic point.
Specifically, each matched pair of first and second feature points is stored in the matching point set.
It can be understood that multiple matched pairs of first feature points p and second feature points q are stored in the matching point set S. The matching point set S is then screened: the RANSAC algorithm iterates over S, retaining the static background points while removing the dynamic foreground outliers, finally yielding the screened target matching point set S'.
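The RANSAC screening step can be illustrated with a toy estimator. Here a pure-translation motion model is hypothesized for simplicity; the patent does not fix the motion model used inside RANSAC:

```python
import random

def ransac_translation(matches, iters=100, tol=2.0, seed=0):
    """Toy RANSAC over matched point pairs ((x1, y1), (x2, y2)):
    hypothesize a pure translation from one randomly chosen pair,
    count inliers within `tol` pixels, and return the largest inlier
    subset (the 'static background points')."""
    rng = random.Random(seed)
    best = []
    for _ in range(iters):
        (x1, y1), (x2, y2) = rng.choice(matches)
        dx, dy = x2 - x1, y2 - y1
        inliers = [m for m in matches
                   if abs(m[1][0] - m[0][0] - dx) <= tol
                   and abs(m[1][1] - m[0][1] - dy) <= tol]
        if len(inliers) > len(best):
            best = inliers
    return best
```

Pairs that move with the dominant (background) translation survive; a pair on an independently moving foreground object becomes an outlier and is dropped.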
In the embodiment of the invention, the pyramid optical flow method is used in the target module to obtain the second feature points matched with the first feature points; the matching point set is built from the first and second feature points; the target matching point set is then obtained by iteratively screening the matching point set, from which the motion vector is derived. The motion estimation process, which has high computational complexity, can be executed by an independent IP core, improving program portability, shortening the development cycle, and facilitating algorithm migration. Furthermore, on the premise of ensuring calculation accuracy, hardware acceleration is adopted to improve algorithm efficiency, taking both timeliness and accuracy into account.
On the basis of any of the above embodiments, the target module is built based on a Vivado HLS tool.
Specifically, before step 102, the developer designs the target module in a hardware description language based on the algorithmic business logic of the pyramid optical flow method, so that the second data processing module can call or port the target module.
The hardware description language includes, but is not limited to, the Very High Speed Integrated Circuit Hardware Description Language (VHDL), the Verilog Hardware Description Language (Verilog HDL), or SystemC, which are not particularly limited by the embodiments of the invention.
Preferably, the target module (namely the optical-flow-method IP core) is developed and designed with the Vivado HLS tool from Xilinx, and the target module can be called and ported on an FPGA platform.
It can be understood that the pyramid optical flow method and the application of the perspective transformation matrix are implemented in code in HLS, and the module design is optimized with directives such as loop unrolling and block (partitioned) storage, finally yielding an optical-flow-method IP core and a perspective-transformation IP core.
The two IP cores are implemented with fixed-point arithmetic in the FPGA, which preserves the accuracy of the original algorithm while increasing its speed. The image stabilizing method described by the invention therefore achieves a further improvement in implementation efficiency and can perform real-time image stabilization on high-resolution video, such as 1920 x 1080 video images.
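Fixed-point arithmetic of the kind used in the IP cores can be sketched in software as follows; the Q16.16 format is an assumption, as the patent does not state the bit widths:

```python
def to_fixed(x, frac_bits=16):
    """Quantize a real number to fixed point with `frac_bits`
    fractional bits (an illustrative Q16.16-style format)."""
    return int(round(x * (1 << frac_bits)))

def fixed_mul(a, b, frac_bits=16):
    """Multiply two fixed-point values, rescaling the result back
    to the same fractional precision."""
    return (a * b) >> frac_bits
```

Trading floating point for fixed point like this is what lets the FPGA datapath keep near-original accuracy at a much higher clock-efficiency.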
The embodiment of the invention optimizes the resource occupation and calculation efficiency of the target module based on the HLS optimization directives in the Vivado HLS tool, which makes it easy to port the target module to other FPGA families, improves program portability, shortens the development cycle, and facilitates algorithm migration.
Fig. 3 is a schematic structural diagram of a real-time image stabilizing device for video images provided by the invention. On the basis of any of the above embodiments, as shown in fig. 3, the apparatus includes a first data processing module 310 and a second data processing module 320, wherein:
a first data processing module 310 is configured to receive the image sequence and determine a reference frame image.
The second data processing module 320 is configured to read a current frame image from the image sequence, perform motion estimation on the current frame image and the reference frame image, obtain a motion vector, and send the motion vector to the first data processing module for storage.
The first data processing module 310 is further configured to store the received motion vector in a vector set, perform smoothing on the vector set, obtain a perspective transformation matrix, and send the perspective transformation matrix to the second data processing module.
The second data processing module 320 is further configured to receive the perspective transformation matrix, and output the image-stabilized target image by using the perspective transformation matrix and the reference frame image.
The image sequence comprises video image data acquired in real time, the first data processing module comprises an ARM, and the second data processing module comprises an FPGA.
Specifically, the first data processing module 310 and the second data processing module 320 are electrically connected in sequence.
The first data processing module 310 receives the image sequence acquired in real time by the video acquisition device, initializes it, and indexes the image sequence by frame number to set the reference frame image.
The second data processing module 320 uses the frame number corresponding to the reference frame as the initial index to read, in the specified order, images adjacent to the reference frame image in the image sequence, takes the currently read image as the current frame image, performs motion estimation between the current frame image and the reference frame image, obtains the motion vector corresponding to the adjacent frames, and sends the motion vector to the first data processing module 310 for storage.
The first data processing module 310 performs euclidean transformation using the received motion vectors to obtain a perspective transformation matrix, and transmits the perspective transformation matrix to the second data processing module 320.
The second data processing module 320 then performs a perspective transformation on the reference frame image using the received perspective transformation matrix to obtain a smoothed reference frame image, which serves as the output image of the real-time image stabilizing device, correcting the jitter in the original reference frame image.
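Applying a 3x3 perspective transformation matrix to a pixel, as the warping step above does, can be sketched as:

```python
def warp_point(H, x, y):
    """Apply a 3x3 perspective (homography) matrix H, given as nested
    lists, to pixel (x, y) using the homogeneous-coordinate divide."""
    xh = H[0][0] * x + H[0][1] * y + H[0][2]
    yh = H[1][0] * x + H[1][1] * y + H[1][2]
    w  = H[2][0] * x + H[2][1] * y + H[2][2]
    return (xh / w, yh / w)
```

Warping the whole frame amounts to evaluating this mapping (or its inverse, with interpolation) for every output pixel, which is the data-parallel workload the FPGA accelerates.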
Optionally, the second data processing module 320 further comprises a target module, wherein:
And the target module is used for acquiring a matching point set according to the current frame image and the reference frame image.
And the target module is also used for screening the matching point set to obtain a target matching point set and obtaining a motion vector based on the target matching point set.
Optionally, the first data processing module 310 further includes an interception unit and a first smoothing unit, wherein:
the intercepting unit is used for intercepting the target vector set from the vector set under the condition that the total number of vectors in the vector set is larger than a preset threshold value.
And the first smoothing unit is used for acquiring a perspective transformation matrix by utilizing a smoothing filter based on the target vector set.
Optionally, the first data processing module 310 further comprises a second smoothing unit, wherein:
And the second smoothing unit is used for acquiring the perspective transformation matrix by using a smoothing filter based on the vector set under the condition that the total number of vectors in the vector set is smaller than or equal to a preset threshold value.
Wherein the target vector set consists of a target number of consecutive motion vectors.
Optionally, the target module includes: the system comprises a point acquisition subunit, an optical flow processing subunit and a matching set acquisition subunit, wherein:
and the point acquisition subunit is used for acquiring the first feature points based on the reference frame image.
And the optical flow processing subunit is used for processing the first characteristic points by using a pyramid optical flow method to obtain second characteristic points.
And the matching set acquisition subunit is used for acquiring a matching point set based on the first characteristic point and the second characteristic point.
Wherein the number of the first feature points is one or more, and the second feature points correspond to the current frame image.
Optionally, the target module is built based on a Vivado HLS tool.
The real-time image stabilizing device for the video image provided by the embodiment of the invention is used for executing the real-time image stabilizing method for the video image, the implementation mode of the real-time image stabilizing device for the video image is consistent with that of the real-time image stabilizing method for the video image provided by the invention, and the same beneficial effects can be achieved, and the description is omitted here.
In the embodiment of the invention, under an ARM-plus-FPGA architecture, the ARM determines the reference frame image of the image sequence; the FPGA performs the complex motion estimation between the current frame image and the reference frame image to obtain the motion vector; the ARM applies simple smoothing to the motion vectors to obtain the perspective transformation matrix; and the FPGA performs the complex perspective transformation and outputs the image-stabilized target image. On the premise of ensuring calculation accuracy, hardware acceleration is adopted to improve algorithm efficiency, taking both timeliness and accuracy into account.
Fig. 4 illustrates a physical schematic diagram of an electronic device, as shown in fig. 4, which may include: processor 410, communication interface (Communications Interface) 420, memory 430, and communication bus 440, wherein processor 410, communication interface 420, and memory 430 communicate with each other via communication bus 440. Processor 410 may invoke logic instructions in memory 430 to perform a method of real-time image stabilization of a video image, the method comprising: the first data processing module receives the image sequence and determines a reference frame image; the second data processing module reads a current frame image from the image sequence, carries out motion estimation on the current frame image and a reference frame image, acquires a motion vector, and sends the motion vector to the first data processing module for storage; the first data processing module stores the received motion vector into a vector set, performs smoothing on the vector set, acquires a perspective transformation matrix, and sends the perspective transformation matrix to the second data processing module; the second data processing module receives the perspective transformation matrix and outputs a target image after image stabilization processing by utilizing the perspective transformation matrix and the reference frame image; the image sequence comprises video image data acquired in real time, the first data processing module comprises an ARM, and the second data processing module comprises an FPGA.
Further, the logic instructions in the memory 430 described above may be implemented in the form of software functional units and may be stored in a computer-readable storage medium when sold or used as a stand-alone product. Based on this understanding, the technical solution of the present invention, in essence or in the part contributing to the prior art, may be embodied in the form of a software product stored in a storage medium, comprising several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to perform all or part of the steps of the method according to the embodiments of the present invention. The aforementioned storage medium includes: a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, an optical disk, or other media capable of storing program code.
In another aspect, the present invention also provides a computer program product, the computer program product including a computer program, the computer program being storable on a non-transitory computer readable storage medium, the computer program, when executed by a processor, being capable of executing a method for real-time image stabilization of a video image provided by the above methods, the method comprising: the first data processing module receives the image sequence and determines a reference frame image; the second data processing module reads a current frame image from the image sequence, carries out motion estimation on the current frame image and a reference frame image, acquires a motion vector, and sends the motion vector to the first data processing module for storage; the first data processing module stores the received motion vector into a vector set, performs smoothing on the vector set, acquires a perspective transformation matrix, and sends the perspective transformation matrix to the second data processing module; the second data processing module receives the perspective transformation matrix and outputs a target image after image stabilization processing by utilizing the perspective transformation matrix and the reference frame image; the image sequence comprises video image data acquired in real time, the first data processing module comprises an ARM, and the second data processing module comprises an FPGA.
In yet another aspect, the present invention also provides a non-transitory computer readable storage medium having stored thereon a computer program which, when executed by a processor, is implemented to perform a method for real-time image stabilization of video images provided by the above methods, the method comprising: the first data processing module receives the image sequence and determines a reference frame image; the second data processing module reads a current frame image from the image sequence, carries out motion estimation on the current frame image and a reference frame image, acquires a motion vector, and sends the motion vector to the first data processing module for storage; the first data processing module stores the received motion vector into a vector set, performs smoothing on the vector set, acquires a perspective transformation matrix, and sends the perspective transformation matrix to the second data processing module; the second data processing module receives the perspective transformation matrix and outputs a target image after image stabilization processing by utilizing the perspective transformation matrix and the reference frame image; the image sequence comprises video image data acquired in real time, the first data processing module comprises an ARM, and the second data processing module comprises an FPGA.
The apparatus embodiments described above are merely illustrative, wherein the elements illustrated as separate elements may or may not be physically separate, and the elements shown as elements may or may not be physical elements, may be located in one place, or may be distributed over a plurality of network elements. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. Those of ordinary skill in the art will understand and implement the present invention without undue burden.
From the above description of the embodiments, it will be apparent to those skilled in the art that the embodiments may be implemented by means of software plus necessary general hardware platforms, or of course may be implemented by means of hardware. Based on this understanding, the foregoing technical solution may be embodied essentially or in a part contributing to the prior art in the form of a software product, which may be stored in a computer readable storage medium, such as ROM/RAM, a magnetic disk, an optical disk, etc., including several instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the method described in the respective embodiments or some parts of the embodiments.
Finally, it should be noted that: the above embodiments are only for illustrating the technical solution of the present invention, and are not limiting; although the invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims (8)

1. A method for real-time image stabilization of a video image, comprising:
The first data processing module receives the image sequence and determines a reference frame image;
The second data processing module reads a current frame image from the image sequence, carries out motion estimation on the current frame image and the reference frame image, acquires a motion vector, and sends the motion vector to the first data processing module for storage;
The first data processing module stores the received motion vector into a vector set, performs smoothing on the vector set, acquires a perspective transformation matrix, and sends the perspective transformation matrix to the second data processing module;
The first data processing module stores each received group of motion vectors into a vector set, performs mean filtering processing on all the motion vectors in the vector set, obtains a smoothed motion track after mean filtering, calculates by using the smoothed motion track, obtains smoothed Euclidean transformation and represents by using a perspective transformation matrix;
The second data processing module receives the perspective transformation matrix and outputs a target image after image stabilization processing by utilizing the perspective transformation matrix and the reference frame image;
The image sequence comprises video image data acquired in real time, the first data processing module comprises an ARM, and the second data processing module comprises an FPGA;
The step of performing motion estimation on the current frame image and the reference frame image to obtain a motion vector includes: a target module in the second data processing module acquires a matching point set according to the current frame image and the reference frame image; the target module screens the matching point set to obtain a target matching point set, and obtains the motion vector based on the target matching point set;
The second data processing module receives the perspective transformation matrix, and outputs a target image after image stabilization processing by using the perspective transformation matrix and the reference frame image, wherein the target image comprises: the target module in the second data processing module performs perspective transformation on the reference frame image according to the perspective transformation matrix and the reference frame image to obtain a smoothed reference frame image and outputs the smoothed reference frame image as a target image;
A plurality of target modules are provided on the FPGA; the target modules are built based on the Vivado HLS tool and are used respectively for the optical-flow-method calculation and the perspective-matrix calculation; the programs of the two target modules are mutually independent; the code is implemented in HLS, and the module design is optimized with loop-unrolling and block-storage optimization directives.
2. The method for real-time image stabilization of a video image according to claim 1, wherein the smoothing the vector set to obtain a perspective transformation matrix comprises:
under the condition that the total number of vectors of the vector set is larger than a preset threshold value, the first data processing module intercepts a target vector set from the vector set;
The first data processing module obtains the perspective transformation matrix by using a smoothing filter based on the target vector set;
wherein the target vector set consists of a target number of consecutive said motion vectors.
3. The method for real-time image stabilization of a video image according to claim 1, wherein the smoothing the vector set to obtain a perspective transformation matrix comprises: and under the condition that the total number of vectors of the vector set is smaller than or equal to a preset threshold value, the first data processing module acquires the perspective transformation matrix by using a smoothing filter based on the vector set.
4. The method for real-time image stabilization of a video image according to claim 1, wherein the acquiring a set of matching points from the current frame image and the reference frame image comprises:
acquiring a first feature point based on the reference frame image;
processing by using a pyramid optical flow method based on the first characteristic points to obtain second characteristic points;
acquiring the matching point set based on the first characteristic point and the second characteristic point;
Wherein the number of the first feature points includes one or more, and the second feature points correspond to the current frame image.
5. A real-time image stabilization device for video images, applying the real-time image stabilization method for video images according to any one of claims 1 to 4, comprising:
the first data processing module is used for receiving the image sequence and determining a reference frame image;
the second data processing module is used for reading a current frame image from the image sequence, carrying out motion estimation on the current frame image and the reference frame image, obtaining a motion vector, and sending the motion vector to the first data processing module for storage;
The first data processing module is further configured to store the received motion vector into a vector set, perform smoothing processing on the vector set, obtain a perspective transformation matrix, and send the perspective transformation matrix to the second data processing module;
The second data processing module is further configured to receive the perspective transformation matrix, and output a target image after image stabilization processing by using the perspective transformation matrix and the reference frame image;
The image sequence comprises video image data acquired in real time, the first data processing module comprises an ARM, and the second data processing module comprises an FPGA.
6. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the processor implements the steps of the real-time image stabilization method of a video image according to any one of claims 1 to 4 when the program is executed by the processor.
7. A non-transitory computer readable storage medium having stored thereon a computer program, wherein the computer program when executed by a processor implements the steps of the real-time image stabilization method of a video image according to any one of claims 1 to 4.
8. A computer program product, characterized in that the computer program product, when being executed by a processor, implements the steps of a method for real-time image stabilization of video images according to any one of claims 1 to 4.
CN202210116810.8A 2022-02-07 Real-time image stabilizing method and device for video image Active CN114584785B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210116810.8A CN114584785B (en) 2022-02-07 Real-time image stabilizing method and device for video image


Publications (2)

Publication Number Publication Date
CN114584785A CN114584785A (en) 2022-06-03
CN114584785B true CN114584785B (en) 2024-07-02


Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112287819A (en) * 2020-10-28 2021-01-29 武汉三力通信有限责任公司 High-speed multi-channel real-time image stabilizing method for video recording equipment
CN113255538A (en) * 2021-06-01 2021-08-13 大连理工大学 FPGA-based infrared small and weak target detection tracking device and method


Similar Documents

Publication Publication Date Title
CN109271933B (en) Method for estimating three-dimensional human body posture based on video stream
CN111105352B (en) Super-resolution image reconstruction method, system, computer equipment and storage medium
CN111652966B (en) Three-dimensional reconstruction method and device based on multiple visual angles of unmanned aerial vehicle
WO2020001168A1 (en) Three-dimensional reconstruction method, apparatus, and device, and storage medium
CN111598993B (en) Three-dimensional data reconstruction method and device based on multi-view imaging technology
CN107749987B (en) Digital video image stabilization method based on block motion estimation
CN111160298B (en) Robot and pose estimation method and device thereof
CN108073857A (en) The method and device of dynamic visual sensor DVS event handlings
CN111080776B (en) Human body action three-dimensional data acquisition and reproduction processing method and system
CN112634163A (en) Method for removing image motion blur based on improved cycle generation countermeasure network
CN115294275A (en) Method and device for reconstructing three-dimensional model and computer readable storage medium
CN110853071A (en) Image editing method and terminal equipment
CN112183506A (en) Human body posture generation method and system
CN116310046B (en) Image processing method, device, computer and storage medium
CN113362338A (en) Rail segmentation method, device, computer equipment and rail segmentation processing system
CN115222889A (en) 3D reconstruction method and device based on multi-view image and related equipment
CN114584785B (en) Real-time image stabilizing method and device for video image
CN112489103A (en) High-resolution depth map acquisition method and system
CN116704123A (en) Three-dimensional reconstruction method combined with image main body extraction technology
CN108510533B (en) Fourier mellin registration and Laplace fusion image acceleration system based on FPGA
CN116109778A (en) Face three-dimensional reconstruction method based on deep learning, computer equipment and medium
CN116385577A (en) Virtual viewpoint image generation method and device
CN116342385A (en) Training method and device for text image super-resolution network and storage medium
CN116958481A (en) Point cloud reconstruction method and device, electronic equipment and readable storage medium
CN114584785A (en) Real-time image stabilizing method and device for video image

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information
Country or region after: China
Address after: 430073 No. 006, 20th floor, business project (China Pharmaceutical Technology Trading Market), No. 1, xiyaojian Road, north of Gaoxin Avenue and Heying Road, East Lake New Technology Development Zone, Wuhan City, Hubei Province
Applicant after: Wuhan Zhuomu Technology Co.,Ltd.
Address before: 430073 No. 006, 20th floor, business project (China Pharmaceutical Technology Trading Market), No. 1, xiyaojian Road, north of Gaoxin Avenue and Heying Road, East Lake New Technology Development Zone, Wuhan City, Hubei Province
Applicant before: WUHAN ZMVISION TECHNOLOGY Co.,Ltd.
Country or region before: China
GR01 Patent grant