CN114119678A - Optical flow estimation method, computer program product, storage medium, and electronic device - Google Patents

Optical flow estimation method, computer program product, storage medium, and electronic device

Info

Publication number
CN114119678A
CN114119678A
Authority
CN
China
Prior art keywords
image
optical flow
level
data
gyro
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111164952.3A
Other languages
Chinese (zh)
Inventor
李海鹏
刘帅成
李有为
叶年进
程深
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Kuangshi Technology Co Ltd
Beijing Megvii Technology Co Ltd
Original Assignee
Beijing Kuangshi Technology Co Ltd
Beijing Megvii Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Kuangshi Technology Co Ltd, Beijing Megvii Technology Co Ltd filed Critical Beijing Kuangshi Technology Co Ltd
Priority to CN202111164952.3A priority Critical patent/CN114119678A/en
Publication of CN114119678A publication Critical patent/CN114119678A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/20 Analysis of motion
    • G06T 7/269 Analysis of motion using gradient-based methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/25 Fusion techniques
    • G06F 18/253 Fusion techniques of extracted features

Abstract

The application relates to the technical field of image processing and provides an optical flow estimation method, a computer program product, a storage medium, and an electronic device. The optical flow estimation method comprises the following steps: acquiring a first image, a second image, first gyroscope data, and second gyroscope data, wherein the first image and the second image are images acquired by the same camera at different moments, the first gyroscope data are data acquired by a gyroscope during the acquisition of the first image, and the second gyroscope data are data acquired by the gyroscope during the acquisition of the second image; calculating a gyro domain from the first gyroscope data and the second gyroscope data; estimating a temporary optical flow between the first image and the second image according to the first image and the second image; and fusing the temporary optical flow and the gyro domain to obtain the optical flow between the first image and the second image. The method can remarkably improve optical flow estimation accuracy, and in particular can effectively process images collected in special scenes such as rainy days, foggy days, and nights.

Description

Optical flow estimation method, computer program product, storage medium, and electronic device
Technical Field
The present invention relates to the field of image processing technologies, and in particular, to an optical flow estimation method, a computer program product, a storage medium, and an electronic device.
Background
Optical flow estimation is a fundamental but important computer vision task and has been widely used in applications such as object tracking, visual mapping, and image alignment. Existing optical flow estimation methods rely heavily on image content, and generally require that images used for optical flow estimation contain abundant texture information and similar lighting conditions. However, the above requirements are often difficult to satisfy for images acquired in scenes such as rainy days, foggy days, and nights, which results in low accuracy of the optical flow estimated for these scenes by the conventional method.
Disclosure of Invention
It is an object of the embodiments of the present application to provide an optical flow estimation method, a computer program product, a storage medium, and an electronic device, so as to address the above technical problem.
In order to achieve the above purpose, the present application provides the following technical solutions:
in a first aspect, an embodiment of the present application provides an optical flow estimation method, including: acquiring a first image, a second image, first gyroscope data and second gyroscope data; the first image and the second image are images acquired by the same camera at different moments, the first gyroscope data is data acquired by a gyroscope during the acquisition of the first image, and the second gyroscope data is data acquired by the gyroscope during the acquisition of the second image; calculating a gyro domain from the first and second gyroscope data, the gyro domain being a two-dimensional motion field between the first and second images; estimating a temporary optical flow between the first image and the second image according to the first image and the second image, and fusing the temporary optical flow and the gyro domain to obtain the optical flow between the first image and the second image.
The gyro domain in this method is calculated from the gyroscope data. The gyroscope data are not affected by the image content and are acquired during image acquisition, so the calculated gyro domain can effectively reflect the background motion in the first image and the second image, while the temporary optical flow estimated from the first image and the second image better reflects the motion of the foreground and of moving objects in the two images. Fusing the two therefore significantly improves the optical flow estimation accuracy.
In particular, for images acquired in scenes such as rainy days, foggy days, and nights, the background is often blurred and content-based optical flow estimation methods have difficulty handling it, whereas in this method the gyroscope data provide a good estimate of that motion; the method can therefore effectively meet the challenges posed by such special scenes.
In one implementation manner of the first aspect, the first gyroscope data are data acquired by the gyroscope during the exposure of the first image, and the second gyroscope data are data acquired by the gyroscope during the exposure of the second image; image acquisition by the camera comprises two stages, exposure and post-processing.
Because the image is generated in the exposure stage and the post-processing stage merely optimizes the quality of the generated image, the gyroscope data acquired during post-processing have no correspondence with the image. Therefore, only the gyroscope data of the exposure stage are used to calculate the gyro domain, which improves the calculation accuracy.
In one implementation of the first aspect, the calculating a gyro domain from the first and second gyroscope data comprises: calculating a rotation matrix between the first image and the second image from the first gyroscope data and the second gyroscope data; calculating a homography matrix corresponding to the rotation matrix according to the rotation matrix and the internal parameters of the camera; and calculating the gyro domain according to the homography matrix.
The rotation matrix in the three-dimensional space can be calculated according to gyroscope data, the rotation matrix can be converted into a homography matrix in the two-dimensional space according to the internal parameters of the camera, and the homography matrix is applied to the pixel coordinates, so that a gyroscope domain can be calculated. Viewed in three-dimensional space, the gyro domain represents the rotation of the camera, and viewed in two-dimensional space, the gyro domain represents the background motion in the image.
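A sketch of this relationship, in notation introduced here purely for illustration (it is not the patent's own notation): let K denote the camera intrinsic matrix and R the rotation between the two exposures computed from the gyroscope data; the gyro domain value at each pixel is the coordinate warped by the homography minus the original coordinate.

```latex
% Sketch (illustrative notation): rotation -> homography -> gyro domain.
% K: camera intrinsic matrix, R: rotation computed from the gyroscope data.
H = K \, R \, K^{-1}
% Applying H to a homogeneous pixel coordinate (x, y, 1)^T of the first image:
\begin{pmatrix} x' \\ y' \\ w' \end{pmatrix} = H \begin{pmatrix} x \\ y \\ 1 \end{pmatrix},
\qquad
G_{ab}(x, y) = \left( \frac{x'}{w'} - x, \; \frac{y'}{w'} - y \right)
```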
In one implementation manner of the first aspect, the camera uses a rolling shutter, and calculating the rotation matrix between the first image and the second image according to the first gyroscope data and the second gyroscope data includes: calculating n rotation matrices between the first image and the second image according to n sets of data included in the first gyroscope data and n sets of data included in the second gyroscope data, where n is an integer greater than 1, each set of data is acquired at a different time, the i-th rotation matrix is the rotation matrix between the i-th part of the first image in exposure order and the i-th part of the second image in exposure order, and i is any integer from 1 to n. Calculating the homography matrix corresponding to the rotation matrix according to the rotation matrix and the internal parameters of the camera includes: calculating n homography matrices corresponding to the n rotation matrices according to the n rotation matrices and the internal parameters of the camera. Calculating the gyro domain according to the homography matrix includes: calculating n partial gyro domains according to the n homography matrices and splicing the n partial gyro domains into the gyro domain, where the i-th partial gyro domain is a two-dimensional motion field between the i-th part of the first image in exposure order and the i-th part of the second image in exposure order, and i is any integer from 1 to n.
In one implementation manner of the first aspect, calculating the n rotation matrices between the first image and the second image according to the n sets of data included in the first gyroscope data and the n sets of data included in the second gyroscope data includes: calculating n corresponding temporary rotation matrices M_1~M_n from the n sets of data included in the first gyroscope data, and calculating n corresponding temporary rotation matrices M_{n+1}~M_{2n} from the n sets of data included in the second gyroscope data; and, for each i from 1 to n, calculating the i-th rotation matrix between the first image and the second image from the n+1 temporary rotation matrices M_i~M_{n+i}, thereby obtaining the n rotation matrices after the traversal.
The above two implementations give a possible way of calculating the gyro domain when the camera uses a rolling shutter. Calculating n (n > 1) rotation matrices is an effective approximation of the way an image is generated under a rolling shutter.
In one implementation manner of the first aspect, the camera uses a global shutter, and calculating the rotation matrix between the first image and the second image according to the first gyroscope data and the second gyroscope data includes: calculating a rotation matrix between the whole of the first image and the whole of the second image according to n sets of data included in the first gyroscope data and n sets of data included in the second gyroscope data, where n is an integer greater than 1 and each set of data is acquired at a different time.
In one implementation manner of the first aspect, calculating the rotation matrix between the whole of the first image and the whole of the second image according to the n sets of data included in the first gyroscope data and the n sets of data included in the second gyroscope data includes: calculating n corresponding temporary rotation matrices M_1~M_n from the n sets of data included in the first gyroscope data, and calculating n corresponding temporary rotation matrices M_{n+1}~M_{2n} from the n sets of data included in the second gyroscope data; and calculating the rotation matrix from the 2n temporary rotation matrices M_1~M_{2n}.
The above two implementations give a possible way of calculating the gyro domain when the camera uses a global shutter. Only one rotation matrix is calculated, which matches the way an image is generated under a global shutter.
In an implementation manner of the first aspect, the estimating a temporary optical flow between the first image and the second image according to the first image and the second image, and fusing the temporary optical flow and the gyro domain to obtain an optical flow between the first image and the second image includes: estimating the provisional optical flow from the first image and the second image using a neural network model; and fusing the temporary optical flow and the gyro domain to obtain an optical flow between the first image and the second image.
In the above implementation, the neural network model is used to estimate the temporary optical flow, and since the gyro domain already well estimates the background optical flow (the gyro domain itself can also be regarded as an optical flow estimated from the gyro data), the neural network model can concentrate more on estimating the optical flows of the foreground and the moving object, which is equivalent to reducing the range of optical flow estimation, thereby being beneficial to improving the optical flow estimation accuracy of the model.
In one implementation manner of the first aspect, fusing the temporary optical flow and the gyro domain includes: fusing the temporary optical flow and the gyro domain by using the neural network model. The neural network model comprises m sequentially connected optical flow estimation modules, where m is an integer greater than 1, and the k-th optical flow estimation module executes the following steps: extracting the k-th level feature of the first image from the (k-1)-th level feature of the first image and extracting the k-th level feature of the second image from the (k-1)-th level feature of the second image, where the (k-1)-th level feature is downsampled when the k-th level feature is extracted; fusing the k-th level gyro domain and the (k+1)-th level temporary optical flow to obtain the k-th level fused optical flow, where the k-th level gyro domain is obtained by downsampling the (k-1)-th level gyro domain; and calculating the k-th level temporary optical flow from the k-th level feature of the first image, the k-th level feature of the second image, and the k-th level fused optical flow, where the k-th level fused optical flow is upsampled when the k-th level temporary optical flow is calculated. Here k is any integer from 1 to m, the 0-th level feature of the first image is the first image, the 0-th level feature of the second image is the second image, the 0-th level gyro domain is the gyro domain, the m-th level gyro domain is 0, the (m+1)-th level temporary optical flow is 0, and the optical flow between the first image and the second image is obtained from the 1st-level temporary optical flow.
In this implementation, the m sequentially connected optical flow estimation modules perform feature extraction, optical flow fusion, and optical flow estimation at multiple scales, and each optical flow estimation module further refines the optical flow output by the previous module, realizing coarse-to-fine optical flow estimation, which helps improve the optical flow estimation result.
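To make the coarse-to-fine data flow concrete, the following is a minimal PyTorch-style sketch of such an m-level pyramid. The module names, layer sizes, and the simple additive stand-in for the fusion step are illustrative assumptions rather than the patent's actual implementation; input dimensions are assumed to be divisible by 2^m.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def down2(flow):
    # Halve a motion field's resolution; displacement values scale with resolution.
    return F.interpolate(flow, scale_factor=0.5, mode='bilinear', align_corners=False) * 0.5

def up2(flow):
    return F.interpolate(flow, scale_factor=2.0, mode='bilinear', align_corners=False) * 2.0

class FlowPyramid(nn.Module):
    """Illustrative m-level coarse-to-fine estimator (wiring is an assumption)."""
    def __init__(self, m=4, ch=16):
        super().__init__()
        self.m = m
        # E modules: stride-2 convolutions, so each level halves the resolution.
        self.extract = nn.ModuleList(
            [nn.Conv2d(3 if k == 0 else ch, ch, 3, stride=2, padding=1) for k in range(m)])
        # D modules: predict a 2-channel flow update from both features plus the fused flow.
        self.decode = nn.ModuleList(
            [nn.Conv2d(2 * ch + 2, 2, 3, padding=1) for _ in range(m)])

    def forward(self, img1, img2, gyro_field):
        # Level-0 features are the images themselves; level-0 gyro domain is the gyro domain.
        feat_a, feat_b, gyro = [img1], [img2], [gyro_field]
        for k in range(self.m):
            feat_a.append(torch.relu(self.extract[k](feat_a[-1])))
            feat_b.append(torch.relu(self.extract[k](feat_b[-1])))
            gyro.append(down2(gyro[-1]))
        temp_flow = torch.zeros_like(gyro[self.m])            # (m+1)-th level temporary flow = 0
        for k in range(self.m, 0, -1):                        # coarse to fine
            gyro_k = gyro[k] if k < self.m else torch.zeros_like(temp_flow)  # m-th level gyro domain is 0
            fused = gyro_k + temp_flow                        # placeholder for the SGF fusion step
            update = self.decode[k - 1](torch.cat([feat_a[k], feat_b[k], fused], dim=1))
            temp_flow = up2(fused + update)                   # k-th level temporary optical flow
        return temp_flow                                      # 1st-level flow at full resolution
```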
In one implementation manner of the first aspect, fusing the k-th level gyro domain and the (k+1)-th level temporary optical flow to obtain the k-th level fused optical flow includes one of the following three manners: (1) fusing the k-th level gyro domain and the (k+1)-th level temporary optical flow by using a first optical flow fusion unit in the k-th optical flow estimation module to obtain the k-th level fused optical flow, where the first optical flow fusion unit comprises at least one convolution layer; (2) predicting a k-th level weight map from the k-th level feature of the first image and the k-th level feature of the second image by using a weight prediction unit in the k-th optical flow estimation module, and performing weighted fusion of the k-th level gyro domain and the (k+1)-th level temporary optical flow using the k-th level weight map to obtain the k-th level fused optical flow, where pixel values in the k-th level weight map represent fusion weights and the weight prediction unit comprises at least one convolution layer; (3) fusing the k-th level gyro domain and the (k+1)-th level temporary optical flow by using a first optical flow fusion unit in the k-th optical flow estimation module to obtain a k-th level temporary fused optical flow, predicting a k-th level weight map from the k-th level feature of the first image and the k-th level feature of the second image by using a weight prediction unit in the k-th optical flow estimation module, and performing weighted fusion of the k-th level gyro domain and the k-th level temporary fused optical flow using the k-th level weight map to obtain the k-th level fused optical flow, where the first optical flow fusion unit and the weight prediction unit each comprise at least one convolution layer and pixel values in the k-th level weight map represent fusion weights.
Among the three optical flow fusion manners, the first is implicit fusion, that is, a network (the first optical flow fusion unit) learns how to fuse the gyro domain and the temporary optical flow; the second is explicit fusion, that is, the gyro domain and the temporary optical flow are fused according to explicit guidance information (the weight map); and the third uses implicit fusion and explicit fusion simultaneously, which helps obtain a more accurate optical flow estimation result.
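The PyTorch-style sketch below illustrates the third manner (implicit fusion by a small convolutional unit combined with explicit fusion by a predicted weight map). Layer counts, channel sizes, and the sigmoid normalization of the weights are assumptions for illustration, not the patent's implementation.

```python
import torch
import torch.nn as nn

class SGFFusion(nn.Module):
    """Sketch of combined implicit + explicit fusion of the gyro domain and temporary flow."""
    def __init__(self, feat_ch=16):
        super().__init__()
        # First optical flow fusion unit: at least one convolution layer (implicit fusion).
        self.flow_fusion = nn.Sequential(
            nn.Conv2d(4, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 2, 3, padding=1))
        # Weight prediction unit: at least one convolution layer; sigmoid keeps weights in [0, 1].
        self.weight_pred = nn.Sequential(
            nn.Conv2d(2 * feat_ch, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 1, 3, padding=1), nn.Sigmoid())

    def forward(self, gyro_k, temp_flow_k1, feat_a_k, feat_b_k):
        # Implicit fusion of the k-th level gyro domain and the (k+1)-th level temporary flow.
        temp_fused = self.flow_fusion(torch.cat([gyro_k, temp_flow_k1], dim=1))
        # Explicit fusion: weight map predicted from the two k-th level feature maps.
        w = self.weight_pred(torch.cat([feat_a_k, feat_b_k], dim=1))
        return w * gyro_k + (1 - w) * temp_fused   # k-th level fused optical flow
```

The first manner corresponds to using only `flow_fusion`, and the second manner to using only the weighted combination driven by `weight_pred`.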
In a second aspect, an embodiment of the present application provides an optical flow estimation apparatus, including: the data acquisition component is used for acquiring a first image, a second image, first gyroscope data and second gyroscope data; the first image and the second image are images acquired by the same camera at different moments, the first gyroscope data is data acquired by a gyroscope during the acquisition of the first image, and the second gyroscope data is data acquired by the gyroscope during the acquisition of the second image; a gyro domain calculation component for calculating a gyro domain from the first and second gyro data, the gyro domain being a two-dimensional motion field between the first and second images; and the optical flow estimation component is used for estimating a temporary optical flow between the first image and the second image according to the first image and the second image, and fusing the temporary optical flow and the gyro domain to obtain the optical flow between the first image and the second image.
In a third aspect, an embodiment of the present application provides a computer program product, which includes computer program instructions, and when the computer program instructions are read and executed by a processor, the computer program instructions perform the method provided in the first aspect or any one of the possible implementation manners of the first aspect.
In a fourth aspect, an embodiment of the present application provides a computer-readable storage medium, where computer program instructions are stored on the computer-readable storage medium, and when the computer program instructions are read and executed by a processor, the computer program instructions perform the method provided by the first aspect or any one of the possible implementation manners of the first aspect.
In a fifth aspect, an embodiment of the present application provides an electronic device, including: a memory in which computer program instructions are stored, and a processor, where the computer program instructions are read and executed by the processor to perform the method provided by the first aspect or any one of the possible implementation manners of the first aspect.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are required to be used in the embodiments of the present application will be briefly described below, it should be understood that the following drawings only illustrate some embodiments of the present application and therefore should not be considered as limiting the scope, and that those skilled in the art can also obtain other related drawings based on the drawings without inventive efforts.
FIG. 1 is a flow chart of an optical flow estimation method provided by an embodiment of the present application;
fig. 2 illustrates a data acquisition manner provided in an embodiment of the present application;
FIG. 3 illustrates one way of calculating the gyro domain when a rolling shutter is used;
FIG. 4 illustrates one way of computing the rotation matrix of FIG. 3;
FIG. 5 illustrates one way of computing the gyro domain when a global shutter is employed;
FIG. 6 illustrates one way of computing the rotation matrix of FIG. 5;
FIG. 7 illustrates a structure of a neural network model provided by an embodiment of the present application;
FIG. 8 illustrates the structure of the optical flow fusion sub-module in the neural network model of FIG. 7;
FIG. 9 is a diagram illustrating the structure of an optical flow estimation apparatus according to an embodiment of the present application;
fig. 10 shows a structure of an electronic device provided in an embodiment of the present application.
Detailed Description
In recent years, technical research based on artificial intelligence, such as computer vision, deep learning, machine learning, image processing, and image recognition, has developed rapidly. Artificial Intelligence (AI) is an emerging science and technology that studies and develops theories, methods, techniques, and application systems for simulating and extending human intelligence. Artificial intelligence is a comprehensive discipline involving a wide range of technical fields, such as chips, big data, cloud computing, the Internet of Things, distributed storage, deep learning, machine learning, and neural networks. Computer vision, as an important branch of artificial intelligence, specifically uses machines to recognize the world; computer vision technologies generally include face recognition, liveness detection, fingerprint recognition and anti-counterfeiting verification, biometric recognition, face detection, pedestrian detection, object detection, pedestrian recognition, image processing, image recognition, image semantic understanding, image retrieval, character recognition, video processing, video content recognition, behavior recognition, three-dimensional reconstruction, virtual reality, augmented reality, simultaneous localization and mapping, computational photography, robot navigation and positioning, and the like. With the research and progress of artificial intelligence technology, it has been applied in many fields, such as security, city management, traffic management, building management, park management, face-based access control, face-based attendance, logistics management, warehouse management, robots, intelligent marketing, computational photography, mobile phone imaging, cloud services, smart homes, wearable devices, unmanned driving, autonomous driving, smart healthcare, face payment, face unlocking, fingerprint unlocking, person-ID verification, smart screens, smart televisions, cameras, the mobile internet, live webcasts, beauty applications, medical aesthetics, intelligent temperature measurement, and the like.
The optical flow estimation method in the embodiment of the application belongs to the technical field of computer vision on the whole, improves the accuracy of optical flow estimation by means of gyroscope data, and particularly has good performance in scenes which are difficult to process by the existing method, such as rainy days, foggy days, nights and the like.
The technical solutions in the embodiments of the present application will be described below with reference to the drawings in the embodiments of the present application. It should be noted that: like reference numbers and letters refer to like items in the following figures, and thus, once an item is defined in one figure, it need not be further defined and explained in subsequent figures.
The terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.
The terms "first," "second," and the like, are used solely to distinguish one entity or action from another entity or action without necessarily being construed as indicating or implying any actual such relationship or order between such entities or actions.
FIG. 1 shows a flow of an optical flow estimation method provided by an embodiment of the present application. The method may be, but is not limited to being, performed by the electronic device shown in fig. 10. Referring to fig. 1, the method includes:
s110: the first image, the second image, the first gyroscope data and the second gyroscope data are obtained.
Step S110 is a step of acquiring data required for estimating an optical flow, and the data required to be acquired includes two types, one type is image data (first image, second image) acquired by the camera, and the other type is gyroscope data (first gyroscope data, second gyroscope data) acquired by the gyroscope.
It should be noted here that the camera and gyroscope in the method of fig. 1 should be mounted on the same device, which may be called device A; however, the device performing the method is not necessarily device A and may be another device B. For example, image data and gyroscope data may be acquired in real time from the camera and gyroscope of a mobile phone (device A), with optical flow estimation performed locally on device A; alternatively, after the camera and gyroscope of a mobile phone (device A) generate the image data and gyroscope data, these data may be exported to a PC (device B), with optical flow estimation performed on device B, and so on.
The first image and the second image in step S110 are images acquired by the same camera at different times, and the optical flow to be estimated is an optical flow between the first image and the second image.
The optical flow between the first image and the second image is in the form of a two-dimensional vector field, each vector in the vector field (including two components in the x and y directions) reflects the rate of change of the gray scale at the corresponding position in the image, and the change of the gray scale in the image is generally caused by the motion of the pixel on the image plane, so that the optical flow can reflect the motion of the pixel between the first image and the second image to some extent.
The image collected by the camera is understood to be an image generated by an image sensor of the camera at any time, and not only an image generated after a user issues a collection instruction. For example, a video frame generated after a user clicks a button of "start recording" in a camera APP belongs to an image acquired by a camera; for another example, the image that the user sees on the preview interface of the camera APP also belongs to the image captured by the camera.
The first image and the second image are general concepts. For example, a video captured by a camera includes a large number of video frames; any two adjacent frames may be selected, one as the first image and the other as the second image. Whichever two frames are selected, the method of estimating the optical flow is similar.
In the following description, the case where the first image and the second image satisfy the following conditions is mainly taken as an example; other cases can be analyzed similarly:
(1) the contents of the first image and the second image are for the same scene
If the contents of the first image and the second image are not for the same scene, it does not make much sense to estimate the optical flow between the two. If the acquisition time interval between the first image and the second image is not large (for example, two adjacent frames in a video, two pictures taken continuously, etc.), the condition (1) is easily satisfied.
(2) The first image and the second image are the same size
Since the first image and the second image are captured by the same camera, the condition (2) is easily satisfied.
The first gyroscope data in step S110 is data acquired by the gyroscope during the first image acquisition, and the second gyroscope data is data acquired by the gyroscope during the second image acquisition, so that the first image and the first gyroscope data correspond to each other, and the second image and the second gyroscope data correspond to each other.
The acquisition of an image is not completed instantly; it lasts for a period of time. The acquisition of gyroscope data, by contrast, can be considered instantaneous: one set of gyroscope data is acquired at a time, and the acquisition may be performed at a fixed frequency. In the embodiment of the present application, at least one set of gyroscope data is acquired during the acquisition of an image. Each image generates corresponding timestamps when it is acquired, including a timestamp of the start of acquisition and a timestamp of the end of acquisition, and each set of gyroscope data also corresponds to a timestamp, so the correspondence between images and gyroscope data can be determined from the relationship between the timestamps. The time in these timestamps may be the kernel time of the system (i.e., of the operating system of the device where the camera and gyroscope are located).
Referring to FIG. 2, I_a denotes the first image; its acquisition-start timestamp is t_a^S and its acquisition-end timestamp is t_a^E. During t_a^S~t_a^E, the gyroscope acquires 14 sets of data, recorded as g_a(1)~g_a(14). I_b denotes the second image; its acquisition-start timestamp is t_b^S and its acquisition-end timestamp is t_b^E. During t_b^S~t_b^E, the gyroscope likewise acquires 14 sets of data, recorded as g_b(1)~g_b(14). Note that t_a^E and t_b^S may also be implemented as the same timestamp, in which case each image corresponds to only one timestamp, the one indicating the start of its acquisition.
It should be noted, however, that some implementations use all of the data acquired by the gyroscope during image acquisition for optical flow estimation; in FIG. 2 this means taking g_a(1)~g_a(14) as the first gyroscope data and g_b(1)~g_b(14) as the second gyroscope data. Other implementations use only a portion of the data collected by the gyroscope during image acquisition; in FIG. 2 this means taking g_a(1)~g_a(6) as the first gyroscope data and g_b(1)~g_b(6) as the second gyroscope data, while g_a(7)~g_a(14) and g_b(7)~g_b(14) are discarded. Discarding may mean deleting the data, or simply not letting them participate in subsequent calculations.
The latter case is explained below with emphasis. In FIG. 2, the process t_a^S~t_a^E in which the camera acquires I_a is divided into two phases, t_a^S~t_a^M and t_a^M~t_a^E, where the former is the exposure phase and the latter is the post-processing phase, with the exposure end time t_a^M as the boundary between them. The raw image of I_a is generated by the image sensor of the camera during the exposure phase, but the quality of the generated image is poor; the Image Signal Processor (ISP) of the camera then performs post-processing, after which an I_a of good quality is obtained.
It can be seen that the post-processing phase only optimizes the quality of I_a, so the gyroscope data g_a(7)~g_a(14) acquired during this phase have no correspondence with I_a. If these data were used for the gyro domain calculation in step S120, the calculation accuracy might be reduced; they are therefore discarded in step S110, and only the gyroscope data g_a(1)~g_a(6) acquired during the exposure phase are retained as the first gyroscope data.
Note that one way to obtain g_a(1)~g_a(6) is, as described above, to first obtain g_a(1)~g_a(14) and then filter out g_a(7)~g_a(14). Alternatively, g_a(1)~g_a(6) may be obtained directly: since the duration of the exposure phase is fixed, the number of corresponding sets of gyroscope data is also fixed (for example, 6), so taking t_a^S as the time origin and taking 6 sets of gyroscope data yields g_a(1)~g_a(6).
The case of I_b can be analyzed similarly to that of I_a, and the description will not be repeated. In the following description, the acquired gyroscope data are mainly assumed to contain only the data of the exposure phase.
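A minimal sketch of keeping only the exposure-phase gyroscope samples by timestamp might look as follows; the sample format, field names, and timestamp values are hypothetical.

```python
def select_exposure_gyro(gyro_samples, t_start, t_exposure_end):
    """Keep only gyroscope samples whose timestamps fall in the exposure phase
    [t_start, t_exposure_end); field names are illustrative assumptions."""
    return [s for s in gyro_samples if t_start <= s["timestamp"] < t_exposure_end]

# Hypothetical samples: 14 sets over ~70 ms (timestamps in nanoseconds of kernel time).
gyro_a = [{"timestamp": t, "wx": 0.01, "wy": 0.00, "wz": 0.02}
          for t in range(0, 70_000_000, 5_000_000)]
first_gyro_data = select_exposure_gyro(gyro_a, 0, 30_000_000)  # exposure phase only
print(len(first_gyro_data))  # -> 6
```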
The gyroscope data used in the embodiment of the application at least includes angular velocity information in three directions, which certainly does not exclude that the gyroscope can also acquire other information, and in addition, the timestamp corresponding to each set of gyroscope data can also be regarded as a part of the gyroscope data.
In the following, a mobile phone is taken as an example to describe a possible data acquisition process, and the situation is similar for other electronic devices:
First, a rooted mobile phone without optical anti-shake (or with that function disabled by technical means) is selected. Gyroscope data always reflect the real motion trajectory of the device (i.e., the device where the camera and gyroscope are located), but the optical anti-shake function would cause the images acquired from the camera to no longer reflect the real motion trajectory of the device, which is unfavorable for optical flow estimation, so this function needs to be disabled. Root access is needed to obtain the highest permission level of the phone so that the data collected by the sensors (such as the camera and gyroscope) can be accessed normally.
Then, a customized library is installed and operated in the mobile phone, and the code of the customized library at least realizes the following functions:
Firstly, the required data (images and gyroscope data) are obtained from the Hardware Abstraction Layer (HAL) of the system; compared with obtaining them from some upper-layer application, data obtained directly from the HAL layer are more accurate.
Secondly, the obtained data are exported through a data transmission protocol; depending on where the optical flow estimation is to be performed, the data may be exported to the local storage of the mobile phone or to an external device.
Note that the logic described above for acquiring images and their corresponding gyroscope data using timestamp information may be implemented in a custom library.
Finally, the exported data are preprocessed, including formatting, deleting useless data, and so on, to obtain data convenient for optical flow estimation. This step may or may not be performed by the custom library; for example, if the data have been exported to an external device, the preprocessing may be performed on that external device.
S120: a gyro domain is calculated from the first and second gyro data.
The gyro domain is a two-dimensional motion field between the first image and the second image, and this motion field can reflect, to a certain extent, the pixel motion between the first image and the second image. The size of the gyro domain is the same as that of the first image and the second image, and each of its pixels is a motion vector (comprising two components in the x and y directions). It should be understood that the gyro domain can also be regarded as an optical flow calculated from the gyroscope data (in the prior art, optical flow is generally calculated from images).
For the optical flow estimation problem, the image content can be roughly divided into three parts: background, foreground, and moving objects.
The background can be considered as the area of the image formed by objects that are far from the camera. The background motion (i.e., the motion of the pixels in the background) is consistent with the motion of the camera. As mentioned in the description of step S110, the gyroscope data can reflect the real motion trajectory of the device, and therefore also the motion trajectory of the camera mounted on the device; moreover, the gyroscope data are acquired during image acquisition, so the calculated gyro domain can effectively reflect the background motion in the first image and the second image. The calculation of the gyro domain is not influenced by the image content but relies on dedicated gyroscope hardware, so the estimation of the background motion can achieve high precision.
The foreground can be considered as the area of the image that is formed by objects that are closer to the camera, and although the foreground motion (meaning the motion of pixels in the foreground) and the motion of the camera are theoretically consistent, the gyroscope data does not describe the foreground motion particularly well. The reason for this is that the gyroscope can only record rotation information of the device and cannot record translation information of the device, and the first image and the second image are acquired at different times, and during this time, there is a high possibility of some translation of the device, which will cause parallax between the first image and the second image, and the parallax can be basically ignored in the background of the image, but is more obvious in the foreground of the image.
However, according to statistics, when mobile devices such as mobile phones and the like perform image acquisition, 90% of generated motion is rotation, and only 10% of generated motion is translation, so that a gyro domain can well describe background motion and can also describe foreground motion to a certain extent, and the rest can be left to be solved in step S130.
A moving object is an object in the image that moves autonomously; its motion trajectory has no necessary relation to the motion trajectory of the camera, so naturally the gyro domain cannot describe moving objects effectively.
How to calculate the gyro domain will be described in detail later, and will not be expanded here.
S130: and estimating a temporary optical flow between the first image and the second image according to the first image and the second image, and fusing the temporary optical flow and the gyro domain to obtain an optical flow between the first image and the second image.
The temporary optical flow is an intermediate optical flow estimation result. As pointed out in the description of step S120, the gyro domain may also be regarded as an optical flow, so there is no obstacle to fusing the gyro domain with the temporary optical flow; the final optical flow estimation result, i.e., the optical flow between the first image and the second image, is obtained after the fusion.
For the estimation of the temporary optical flow, a method based on deep learning may be adopted, and a conventional optical flow estimation method may also be adopted, and hereinafter, a manner of performing optical flow estimation using a neural network model (belonging to the method based on deep learning) is mainly described.
For optical flow fusion, there are also a number of ways, including: fusing directly with a neural network model, which is called implicit fusion; fusing with a weight map (in which the pixel values represent fusion weights) computed in some way, which is called explicit fusion; and combining implicit fusion and explicit fusion. Specific examples of these three fusion methods will be given later.
Regarding step S130, the following problem also needs to be explained:
(1) Estimating the temporary optical flow uses at least the first image and the second image, and in some implementations may also use the gyro domain.
(2) The process of estimating the temporal optical flow and fusing the optical flow is not necessarily performed only once, and may need to be performed multiple times to obtain the final optical flow estimation result.
(3) The size of the finally obtained optical flow is the same as that of the first image and the second image, but the size of the temporary optical flow is not necessarily the same as that of the first image and the second image, namely, the temporary optical flow can be estimated on different scales.
(4) The two stages of estimating the temporary optical flow and fusing the optical flows may be unified into a single neural network model, i.e., the model learns not only how to estimate the temporary optical flow but also how to fuse the temporary optical flow and the gyro domain (in other words, the whole of step S130 is completed by the neural network model). Of course, this does not exclude the case where only the estimation of the temporary optical flow uses a neural network while the optical flow fusion does not, or the case where the optical flow fusion uses a neural network while the estimation of the temporary optical flow does not.
The above problems (1) to (4) can be found in the following examples, and will not be explained here.
In the description of step S120, it was mentioned that the gyro domain can effectively reflect the background motion in the first image and the second image but falls short in describing foreground motion and moving objects. The temporary optical flow estimated from the first image and the second image better reflects the motion of the foreground and of moving objects in the two images. Taking estimation of the temporary optical flow with a neural network model as an example: since the gyro domain already provides a good estimate of the background optical flow (as mentioned earlier, the gyro domain itself can be regarded as an optical flow), the neural network model can concentrate on estimating the optical flow of the foreground and of moving objects, which is equivalent to narrowing the range of the optical flow estimation and thus helps improve the model's accuracy in estimating those flows. Fusing the temporary optical flow with the gyro domain in step S130 therefore allows the two to complement each other, achieving a good estimation effect in the background, foreground, and moving-object regions and significantly improving the optical flow estimation accuracy.
In particular, for images acquired in scenes such as rainy days, foggy days, and nights, the background tends to be blurred and content-based optical flow estimation methods have difficulty handling them. In this method, the background optical flow is estimated well using gyroscope data, and the optical flow estimation for the foreground and moving objects is also improved, so the method can effectively process images acquired in these special scenes.
The optical flow finally estimated in step S130 is not limited in use, and may be used, for example, to align a first image to a second image (or vice versa), to track a target in a video sequence in which the first image and the second image are located, and so on. Since the accuracy of the optical flow estimation is increased, the performance of these optical flow-based tasks is naturally improved accordingly.
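As an illustration of the image-alignment use mentioned above, a common approach is backward warping with the estimated flow; the sketch below assumes the flow stores per-pixel (dx, dy) displacements, so that warping the second image with the flow from the first image to the second approximately reconstructs the first image. This convention is an assumption, not something specified by the method itself.

```python
import torch
import torch.nn.functional as F

def warp_with_flow(image, flow):
    """Backward-warp `image` (B,C,H,W) with a dense flow (B,2,H,W):
    output(x, y) = image(x + dx, y + dy). The flow convention is an assumption."""
    b, _, h, w = image.shape
    ys, xs = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
    grid = torch.stack([xs, ys], dim=0).unsqueeze(0).to(image)      # (1, 2, H, W)
    # Normalize sampling locations to [-1, 1] as required by grid_sample.
    tx = 2.0 * (grid[:, 0] + flow[:, 0]) / (w - 1) - 1.0
    ty = 2.0 * (grid[:, 1] + flow[:, 1]) / (h - 1) - 1.0
    sample_grid = torch.stack([tx, ty], dim=-1)                     # (B, H, W, 2)
    return F.grid_sample(image, sample_grid, align_corners=True)
```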
On the basis of the above embodiment, the following description is continued on the possible calculation manner of the gyro domain in step S120:
in some implementations, the gyro domain may be calculated as follows:
step A: a rotation matrix between the first image and the second image is calculated from the first gyroscope data and the second gyroscope data.
And B: and calculating a homography matrix corresponding to the rotation matrix according to the rotation matrix and the internal parameters of the camera.
And C: and calculating a gyro domain according to the homography matrix.
The rotation matrix in the three-dimensional space can be calculated according to gyroscope data, the rotation matrix can be converted into a homography matrix in the two-dimensional space according to internal parameters of the camera, and the homography matrix is applied to pixel coordinates to calculate a gyroscope domain. As can be known from the calculation process of the gyro domain, the gyro domain represents the rotation of the camera in a three-dimensional space, and the gyro domain represents the background motion in the image in a two-dimensional space.
It is to be noted that the homography matrix calculated according to this method only contains rotation information and does not contain translation information (because the gyroscope is not capable of providing translation information), and therefore it is not excluded that in some alternatives, translation information is obtained by other sensors and is also incorporated into the calculation of the homography matrix.
The shutters used in existing cameras fall mainly into two types: the rolling shutter and the global shutter. With a camera using a rolling shutter, the image is generally exposed line by line (or column by column), that is, different lines of pixels in the image are generated at different times; this is also called the rolling shutter effect. For example, when a picture is taken from a high-speed train, a telegraph pole in the picture appears tilted, which is a consequence of the rolling shutter effect. With a camera using a global shutter, the image is exposed as a whole, i.e., every line of pixels in the image is generated at the same time. At present, the cost and technical difficulty of a global shutter are higher than those of a rolling shutter, so the cameras of most devices adopt a rolling shutter. The implementations of steps A to C differ somewhat depending on whether the camera adopts a rolling shutter or a global shutter, as described below:
Using a rolling shutter (steps A1 to C1 are an implementation of steps A to C, respectively)
Step A1: n rotation matrices between the first image and the second image are calculated from n sets of data included in the first gyroscope data and n sets of data included in the second gyroscope data.
Where n is an integer greater than 1, for example, for fig. 2, n may be 6. If i is any integer from 1 to n, the ith rotation matrix of the n rotation matrices is: a rotation matrix between the ith part of the first image in exposure order and the ith part of the second image in exposure order. For example, in the case where n is 6, if the image size is 720 × 480 (width × height) and the line-by-line exposure is performed, the image may be equally divided into 6 portions in the line direction, each portion having a size of 720 × 80, and numbered 1 to 6 in the order from top to bottom.
Assuming that the first image and the second image are both W × H (width × height), then strictly speaking, due to the rolling shutter effect, there is a rotation matrix between each line of the first image and the corresponding line of the second image, i.e., as many different rotation matrices as there are image lines. In most cases, however, one rotation matrix per line is not calculated: on the one hand, there are not enough sets of gyroscope data to support such a calculation (n is far smaller than the number of lines), and on the other hand, even with enough gyroscope data, the amount of computation required would be too large. Therefore, in step A1 only n rotation matrices are calculated; the accuracy is reduced compared with computing one rotation matrix per line, but the way an image is generated under a rolling shutter is still effectively reflected.
Referring to FIG. 3, the first image I_a is divided into n parts I_a(1)~I_a(n) in exposure order, and the second image I_b is likewise divided into n parts I_b(1)~I_b(n) in exposure order. The first gyroscope data g_a comprise n sets, g_a(1)~g_a(n), and the second gyroscope data g_b also comprise n sets, g_b(1)~g_b(n). From g_a(1)~g_a(n) and g_b(1)~g_b(n), the rotation matrix R_1 between I_a(1) and I_b(1), the rotation matrix R_2 between I_a(2) and I_b(2), ..., and the rotation matrix R_n between I_a(n) and I_b(n) can be calculated.
In some implementations, the n rotation matrices are calculated as follows:
step A11: calculating n corresponding temporary rotation matrixes M according to n groups of data contained in the first gyroscope data1~MnAnd calculating corresponding n temporary rotation matrixes M according to n groups of data contained in the second gyroscope datan+1~M2n
Referring to FIG. 4, take the gyroscope data g_a(1) as an example: g_a(1) includes angular velocity information in three directions. Multiplying the angular velocities in the three directions by the acquisition time interval (the timestamp t_a(2) of g_a(2) minus the timestamp t_a(1) of g_a(1)) gives the three components of a corresponding rotation vector, and substituting this rotation vector into the Rodrigues formula yields M_1; M_1 can be regarded as describing the rotation of the device during t_a(1)~t_a(2). The temporary rotation matrices M_2~M_n and M_{n+1}~M_{2n} corresponding to g_a(2)~g_a(n) and g_b(1)~g_b(n) are calculated similarly and will not be repeated. Note only that if g_a(n) is not the last set of gyroscope data corresponding to I_a (for example, in FIG. 2, g_a(6) is followed by g_a(7), but g_a(7) has been discarded), then g_a(6) should be multiplied by t_a(7) - t_a(6) rather than by t_b(1) - t_a(6).
Step A12: for each i from 1 to n, calculate the i-th rotation matrix between the first image and the second image from the n+1 temporary rotation matrices M_i~M_{n+i}; after traversing all values of i, the n rotation matrices are obtained.
For example, when i = 1, the n+1 temporary rotation matrices M_1~M_{n+1} are multiplied together to obtain the 1st rotation matrix R_1 between the first image and the second image. Note that R_1 is related not only to g_a(1) and g_b(1); it can only be calculated accurately by combining g_a(1)~g_a(n) and g_b(1). The rotation matrices R_2~R_n are calculated similarly and will not be repeated.
The right side of fig. 4 shows the calculation process described in step a12, with each portion of the image in the exposure order and the corresponding set of gyroscope data being written together at the far left side of fig. 4.
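A sketch of steps A11 and A12 might look as follows, using SciPy's rotation utilities as a stand-in for the Rodrigues formula. The data layout, the per-sample intervals dt (each sample's interval to the next gyroscope timestamp, as described above), and the composition order of the matrix chain are assumptions.

```python
import numpy as np
from scipy.spatial.transform import Rotation

def temp_rotation(angular_velocity, dt):
    """Step A11 (sketch): angular velocity (wx, wy, wz) times the sampling interval dt
    gives a rotation vector; Rodrigues' formula turns it into a 3x3 temporary rotation
    matrix (SciPy is used here as a stand-in for that formula)."""
    return Rotation.from_rotvec(np.asarray(angular_velocity) * dt).as_matrix()

def rolling_shutter_rotations(omegas_a, omegas_b, dts_a, dts_b):
    """Step A12 (sketch): chain the temporary matrices M_i..M_(n+i) to obtain the i-th
    rotation matrix between the two images (one matrix per image part)."""
    M = [temp_rotation(w, dt) for w, dt in zip(omegas_a + omegas_b, dts_a + dts_b)]
    n = len(omegas_a)
    R = []
    for i in range(n):                       # i-th part, 0-based here
        Ri = np.eye(3)
        for Mk in M[i:i + n + 1]:            # n + 1 matrices: M_i .. M_(n+i)
            Ri = Mk @ Ri                     # composition order is an assumption
        R.append(Ri)
    return R
```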
In some alternatives, if n is relatively large, only a part of the gyroscope data may be selected for calculating the rotation matrix, so as to save computation, for example, when n is 10, only the 1 st, 3 rd, 5 th, 7 th, and 9 th groups of data may be selected to calculate 5 rotation matrices, and the first image and the second image will be divided into only 5 parts.
Step B1: calculate the n homography matrices corresponding to the n rotation matrices according to the n rotation matrices and the internal parameters of the camera.
Assume that the intrinsic parameter matrix formed by the internal parameters of the camera is K. The homography matrix corresponding to R_1 is H_1 = K × R_1 × K^(-1); H_1 can also be regarded as the homography matrix between the 1st part of the first image in exposure order and the 1st part of the second image in exposure order. The homography matrices H_2~H_n corresponding to R_2~R_n are calculated in a similar manner and will not be repeated; the final calculation result is shown in FIG. 3.
Step C1: and calculating n partial gyro domains according to the n homography matrixes, and splicing the n partial gyro domains into a gyro domain.
The i-th partial gyro domain is a two-dimensional motion field between the i-th part of the first image in exposure order and the i-th part of the second image in exposure order, where i is any integer from 1 to n. Due to the rolling shutter effect, each homography matrix can only correspond to part of the rotation information in the image, so each gyro domain obtained from it is only a part of the whole gyro domain and is therefore called a partial gyro domain; the complete gyro domain is obtained after the partial gyro domains are spliced together.
For example, the partial gyro domain corresponding to H_1 is G_ab(1), which is calculated as follows: for any coordinate (x, y) in the 1st part of the first image in exposure order, multiply it by H_1 to obtain a new coordinate (x', y'); subtracting the original coordinate from the new coordinate gives the motion vector (u, v) = (x' - x, y' - y) at coordinate (x, y). Traversing all coordinates in the 1st part of the first image in exposure order yields G_ab(1). Note that the image content of the first image is not used in this calculation. The partial gyro domains G_ab(2)~G_ab(n) corresponding to H_2~H_n are calculated similarly and will not be repeated; splicing them finally yields G_ab, as shown in FIG. 3.
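Putting steps B1 and C1 together, a NumPy sketch of computing and splicing the partial gyro domains might look as follows, assuming row-wise exposure, an image height divisible by n, and a 3x3 intrinsic matrix K; the names are illustrative.

```python
import numpy as np

def gyro_domain_rolling_shutter(rotations, K, height, width):
    """For each of the n image parts, form H_i = K R_i K^-1, apply it to the part's
    pixel coordinates, and subtract the original coordinates; stacking the n partial
    gyro domains vertically yields the full gyro domain of shape (H, W, 2)."""
    n = len(rotations)
    K_inv = np.linalg.inv(K)
    strip_h = height // n                       # assumes height is divisible by n
    gyro = np.zeros((height, width, 2))
    xs, ys = np.meshgrid(np.arange(width), np.arange(height))
    coords = np.stack([xs, ys, np.ones_like(xs)], axis=-1).astype(np.float64)  # (H, W, 3)
    for i, R in enumerate(rotations):
        H_i = K @ R @ K_inv                     # homography for the i-th part
        rows = slice(i * strip_h, (i + 1) * strip_h)
        warped = coords[rows] @ H_i.T           # homogeneous coordinates after H_i
        warped = warped[..., :2] / warped[..., 2:3]
        gyro[rows] = warped - coords[rows, :, :2]   # partial gyro domain for this part
    return gyro
```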
Using a global shutter (steps A2 to C2 are an implementation of steps A to C, respectively)
Step A2: a rotation matrix between the whole of the first image and the whole of the second image is calculated based on n sets of data included in the first gyro data and n sets of data included in the second gyro data.
Where n is an integer greater than 1, for example, for fig. 2, n may be 6. The difference between step A2 and step A1 is that step A1 needs to calculate n rotation matrices, while step A2 only needs to calculate one rotation matrix R, because the rotation information of the entire image can be represented by one rotation when a global shutter is used, as shown in FIG. 5.
In some implementations, the rotation matrix is calculated as follows:
step A21: calculating n corresponding temporary rotation matrixes M according to n groups of data contained in the first gyroscope data1~MnAnd calculating corresponding n temporary rotation matrixes M according to n groups of data contained in the second gyroscope datan+1~M2n
Step A21 is the same as step A11 and will not be repeated.
Step A22: calculate the rotation matrix between the whole of the first image and the whole of the second image from the 2n temporary rotation matrices M_1~M_{2n}.
For example, multiplying the 2n temporary rotation matrices M_1~M_{2n} together gives the rotation matrix R between the whole of the first image and the whole of the second image.
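A brief sketch of this chaining (the composition order is an assumption):

```python
import numpy as np
from functools import reduce

def global_shutter_rotation(temp_matrices):
    """Multiply the 2n temporary rotation matrices [M_1, ..., M_2n] into the single
    rotation matrix R between the whole of the first and second images (sketch)."""
    return reduce(lambda acc, M: M @ acc, temp_matrices, np.eye(3))
```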
FIG. 6 shows the calculation process described in steps A21 and A22.
Step B2: and calculating a homography matrix corresponding to the rotation matrix according to the rotation matrix and the internal parameters of the camera.
Step B2 is similar to step B1, except that only one rotation matrix is calculated in step a2, so step B2 also only needs to calculate a corresponding homography matrix H, which can also be considered as a homography matrix between the whole of the first image and the whole of the second image, and the calculation result is shown in fig. 5.
Step C2: and calculating a gyro domain according to the homography matrix.
Step C2 is similar to step C1, except that only one homography matrix is calculated in step B2, so the complete gyro domain G_ab can be calculated directly in step C2; the calculation result is shown in FIG. 5.
In some alternatives, if n is relatively large, only a part of the gyroscope data may be selected for calculating the rotation matrix R, so as to save computation, for example, when n is 10, only the 1 st, 3 rd, 5 th, 7 th, and 9 th groups of data may be selected for calculating the rotation matrix R.
On the basis of the above embodiment, a manner of implementing step S130 by using a neural network model is described as follows:
FIG. 7 shows a structure of a neural network model provided in an embodiment of the present application. Referring to FIG. 7, viewed horizontally, the neural network model includes m sequentially connected optical flow estimation modules (m is an integer greater than 1; m = 4 in FIG. 7), and each optical flow estimation module has the functions of feature extraction, optical flow fusion, and optical flow estimation. Viewed vertically, the neural network model comprises an encoding network and a decoding network (the structures on the left side related to the gyro domain may or may not be regarded as part of the neural network model); the encoding network is mainly used to extract multi-scale features, and the decoding network is mainly used for multi-scale optical flow estimation and fusion. The following description is organized mainly around the optical flow estimation modules.
Taking the kth optical flow estimation module (k is any integer from 1 to m) as an example, the kth optical flow estimation module mainly comprises three sub-modules:
Feature extraction submodule (module E in fig. 7): extracts the kth-level feature of the first image (written Fa^k below) from its (k-1)th-level feature Fa^(k-1), and extracts the kth-level feature of the second image Fb^k from its (k-1)th-level feature Fb^(k-1). The E module down-samples the (k-1)th-level feature while extracting the kth-level feature, which is why Fa^k is drawn smaller than Fa^(k-1) in fig. 7; the same holds for Fb^k and Fb^(k-1). The E module may be implemented as a convolution module comprising at least one convolution layer, although other structures are also possible. In particular, Fa^0 = Ia and Fb^0 = Ib, i.e. the 0th-level features are the two input images themselves.
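A possible shape for the E module, written as a PyTorch sketch: a strided convolution block that halves the spatial resolution while mapping the (k-1)th-level feature to the kth-level feature. The channel widths, the activation and the number of layers are assumptions; the text above only requires at least one convolution layer together with down-sampling.

```python
import torch
import torch.nn as nn

class EModule(nn.Module):
    """Sketch of one feature-extraction block: maps the (k-1)th-level feature to
    the kth-level feature while halving the spatial resolution."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, kernel_size=3, stride=2, padding=1),  # down-sampling
            nn.LeakyReLU(0.1),
            nn.Conv2d(out_ch, out_ch, kernel_size=3, stride=1, padding=1),
            nn.LeakyReLU(0.1),
        )

    def forward(self, feat_prev):
        return self.body(feat_prev)

# Example: Fa1 = EModule(3, 16)(Ia) would give the 1st-level feature of the first image.
```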
Optical flow fusion submodule (SGF module in fig. 7): fuses the kth-level gyro domain Gab^k and the (k+1)th-level temporary optical flow Vab^(k+1) to obtain the kth-level fused optical flow Vf^k. Here Gab^k is obtained by down-sampling Gab^(k-1), and Gab^0 is the gyro domain Gab itself. The down-sampling process of the gyro domain is shown in the leftmost column of fig. 7, where DOWN denotes a down-sampling module; the DOWN module may be part of the neural network model (i.e. down-sampling is performed while running the model) or a separate component (i.e. the result of down-sampling in advance is input to the model).
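The gyro-domain pyramid produced by the DOWN modules might look like the following sketch. Whether the flow vectors are rescaled together with the resolution is not stated above, so treating the gyro domain as a pixel-unit field and halving its values at each level is an assumption.

```python
import torch
import torch.nn.functional as F

def gyro_pyramid(gyro0, m):
    """Build the levels Gab^0 .. Gab^(m-1) of the gyro domain.
    gyro0: (B, 2, H, W) gyro domain at full resolution (level 0)."""
    levels = [gyro0]
    for _ in range(1, m):
        down = F.interpolate(levels[-1], scale_factor=0.5,
                             mode='bilinear', align_corners=False)
        # Assumption: the field is in pixel units, so the vectors are halved together
        # with the resolution; the text only states that each level is a down-sampled copy.
        levels.append(down * 0.5)
    return levels  # the mth level is taken as an all-zero field, as noted below
```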
In particular, Gab^m = 0 (where 0 refers to an all-zero two-dimensional motion field, not shown in fig. 7) and Vab^(m+1) = 0 (where 0 refers to an all-zero optical flow, not shown in fig. 7). Since Gab^m and Vab^(m+1) are both zero, their fusion result Vf^m is also zero, so the SGF module in the mth optical flow estimation module may be omitted, as shown by optical flow estimation module 4 of fig. 7.
It is noted that when the functionality of the SGF module was described above, Fa^k and Fb^k were not mentioned among its inputs, whereas in fig. 7 the input of the SGF module does include these two items. The reason is that whether these two items are used as inputs is optional in different implementations of the SGF module, as described later in detail with respect to the internal structure of the SGF module.
Decoding submodule (D module in fig. 7): computes the kth-level temporary optical flow Vab^k from the kth-level feature Fa^k of the first image, the kth-level feature Fb^k of the second image and the kth-level fused optical flow Vf^k. When computing Vab^k, the D module up-samples Vf^k; the up-sampling may be realized by a structure such as a deconvolution layer. Of course, the D module also contains other structures; for example, structures that predict, based on Fa^k and Fb^k, a residual with respect to the up-sampled Vf^k, which is then used to calculate Vab^k. In particular, the optical flow Vab between the first image and the second image can be obtained from the 1st-level temporary optical flow Vab^1 and then up-sampled as needed, as shown by the UP module of fig. 7; the UP module may be omitted if up-sampling is not required.
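A hypothetical PyTorch sketch of such a D module is given below: the kth-level fused flow is up-sampled with a deconvolution, a residual is predicted from the two kth-level features together with the up-sampled flow, and their sum is output as the kth-level temporary flow. The channel sizes and the exact inputs of the residual branch are assumptions.

```python
import torch
import torch.nn as nn

class DModule(nn.Module):
    """Sketch of one decoding block: up-sample the kth-level fused flow, predict a
    residual from the two kth-level features, and output the kth-level temporary flow."""
    def __init__(self, feat_ch):
        super().__init__()
        self.up = nn.ConvTranspose2d(2, 2, kernel_size=4, stride=2, padding=1)
        self.residual = nn.Sequential(
            nn.Conv2d(2 * feat_ch + 2, 32, kernel_size=3, padding=1),
            nn.LeakyReLU(0.1),
            nn.Conv2d(32, 2, kernel_size=3, padding=1),
        )

    def forward(self, fa_k, fb_k, fused_k):
        up_flow = self.up(fused_k)                   # up-sampled fused flow
        x = torch.cat([fa_k, fb_k, up_flow], dim=1)  # features plus coarse flow
        return up_flow + self.residual(x)            # kth-level temporary optical flow
```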
To briefly summarize the structure of the neural network model: the model realizes both the optical flow estimation and the optical flow fusion of step S130. It performs feature extraction, optical flow fusion and optical flow estimation on multiple scales using the m sequentially connected optical flow estimation modules, and each optical flow estimation module further refines the optical flow output by the previous one, realizing coarse-to-fine optical flow estimation, which helps improve the optical flow estimation result. It should be understood that fig. 7 shows only the part of the neural network model relevant to optical flow estimation and does not exclude the model containing other structures.
Fig. 8 shows a possible architecture for the SGF module. Referring to fig. 8, the SGF module may include the following three units:
A first optical flow fusion unit: fuses the kth-level gyro domain Gab^k and the (k+1)th-level temporary optical flow Vab^(k+1) to obtain the kth-level temporary fused optical flow Vt^k. The first optical flow fusion unit may be implemented as a convolution module comprising at least one convolution layer, although other structures may also be included.
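A minimal sketch of what the first optical flow fusion unit could look like is given below: the two 2-channel fields are concatenated and passed through a small convolutional network that outputs the fused 2-channel flow. Bringing both inputs to a common resolution beforehand, the hidden width and the layer count are assumptions.

```python
import torch
import torch.nn as nn

class FirstFusionUnit(nn.Module):
    """Sketch of the first optical flow fusion unit: concatenate the gyro domain and
    the temporary optical flow (both 2-channel maps) and let a small convolutional
    network output the fused 2-channel flow."""
    def __init__(self, hidden=32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(4, hidden, kernel_size=3, padding=1),
            nn.LeakyReLU(0.1),
            nn.Conv2d(hidden, 2, kernel_size=3, padding=1),
        )

    def forward(self, gyro_k, flow_k_plus_1):
        return self.net(torch.cat([gyro_k, flow_k_plus_1], dim=1))
```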
A weight prediction unit: predicts the kth-level weight map W^k from the kth-level feature Fa^k of the first image and the kth-level feature Fb^k of the second image. Each pixel value in W^k represents a fusion weight, which is used to fuse the vectors of Gab^k and Vt^k at the same location. For example, the fusion weight may take a value between 0 and 255.
Optionally, before Fa^k is input into the weight prediction unit, it may first be subjected to a warping (warp) process (this applies when estimating the optical flow from Ia to Ib; when estimating the optical flow from Ib to Ia, Fb^k should be warped instead). The so-called warping is an operation that aligns Fa^k toward Fb^k, as shown in fig. 8.
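The warping operation itself can be sketched as a standard backward warp based on grid sampling; the text does not fix the implementation, so the pixel-unit flow convention and the bilinear sampling below are assumptions.

```python
import torch
import torch.nn.functional as F

def warp(feat, flow):
    """Backward-warp a feature map (B, C, H, W) with a pixel-unit flow (B, 2, H, W),
    i.e. align it toward the features of the other image."""
    b, _, h, w = feat.shape
    ys, xs = torch.meshgrid(torch.arange(h, device=feat.device),
                            torch.arange(w, device=feat.device), indexing='ij')
    grid_x = 2.0 * (xs.unsqueeze(0) + flow[:, 0]) / (w - 1) - 1.0  # normalize to [-1, 1]
    grid_y = 2.0 * (ys.unsqueeze(0) + flow[:, 1]) / (h - 1) - 1.0
    grid = torch.stack([grid_x, grid_y], dim=-1)                    # (B, H, W, 2)
    return F.grid_sample(feat, grid, align_corners=True)
```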
Optionally, in addition to Fa^k and Fb^k, the weight prediction unit may also take other inputs, such as Gab^k and/or Vt^k. The weight prediction unit may be implemented as a convolution module comprising at least one convolution layer, although other structures are also possible.
A second optical flow fusion unit: performs weighted fusion of the kth-level gyro domain Gab^k and the kth-level temporary fused optical flow Vt^k using the kth-level weight map W^k, obtaining the kth-level fused optical flow Vf^k. If the pixel values in W^k represent the fusion weights corresponding to Vt^k, a possible fusion formula (the formula implemented by the second optical flow fusion unit) is:

Vf^k = W^k ⊙ Vt^k + (1 - W^k) ⊙ Gab^k

The symbol ⊙ in the formula represents a pixel-by-pixel multiplication between matrices, and W^k has been normalized so that each pixel value lies in the interval [0, 1]; for convenience, W^k is still used to denote the normalized weight map. Understood intuitively, the blackish parts of the weight map before normalization (corresponding to background regions of the image) blend in more of Gab^k, because the optical flow of those regions is already given relatively accurately by the gyro domain, while the whitish parts (corresponding to foreground regions or moving objects in the image) blend in more of Vt^k, because the gyro domain cannot accurately give the optical flow of those regions and the optical flow estimated from the images must be relied on more heavily.

It is to be understood that if the pixel values in W^k instead represent the fusion weights corresponding to Gab^k, the above formula needs to be adjusted accordingly.
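With the weight map normalized to [0, 1], the weighted fusion reduces to a per-pixel blend, sketched below in PyTorch under the convention chosen above (W^k holds the weight of the temporary fused flow); the tensor shapes are assumptions.

```python
import torch

def weighted_fusion(gyro_k, temp_fused_k, weight_k):
    """Sketch of the second optical flow fusion unit.
    gyro_k:       (B, 2, H, W) kth-level gyro domain Gab^k
    temp_fused_k: (B, 2, H, W) kth-level temporary fused flow Vt^k
    weight_k:     (B, 1, H, W) weight map W^k, already normalized to [0, 1]"""
    return weight_k * temp_fused_k + (1.0 - weight_k) * gyro_k
```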
There are also some variations of the SGF module in fig. 8, such as:
Modification 1: remove the weight prediction unit and the second optical flow fusion unit, i.e. directly use the first optical flow fusion unit to fuse Gab^k and Vab^(k+1), and output the resulting Vt^k as the kth-level fused optical flow Vf^k. This approach is the implicit fusion mentioned above, i.e. a network (the first optical flow fusion unit) is used to learn how to fuse the gyro domain and the temporary optical flow. It is noted that, since the weight prediction unit is eliminated, the input of the SGF module no longer includes Fa^k and Fb^k.
modification 2: by removing the first optical flow fusion unit, i.e. by using a weight prediction unit, based on
Figure BDA00032913870500001631
And
Figure BDA00032913870500001632
prediction
Figure BDA00032913870500001633
Then using a second optical flow fusion unit based on
Figure BDA00032913870500001634
To pair
Figure BDA00032913870500001635
And
Figure BDA00032913870500001636
performing weighted fusion to obtain
Figure BDA00032913870500001637
This is the explicit fusion mentioned above, i.e. the fusion of the gyro domain and the temporary optical flow according to an explicit indication (weight map).
The SGF module in fig. 8 itself uses implicit fusion and explicit fusion simultaneously, which is beneficial to obtaining a more accurate optical flow estimation result.
In the following, on the basis of the above embodiments, how the above neural network model is trained will be briefly described:
Training of the neural network model falls into two modes: supervised training and unsupervised training. Supervised training requires labeled data, but because labeling optical flow data is difficult and time-consuming, it is mainly performed on synthetic data (such as video generated by a computer vision algorithm) and has a narrow application range; unsupervised training, by contrast, can be performed on real data (such as live-shot video) and has a wide application range. The unsupervised training method is therefore mainly introduced here.
In the case of unsupervised training, the following two loss functions may be set, but are not limited to:
1. image loss (photo loss)
Suppose Vab is the optical flow from Ia to Ib (during training, Ia and Ib are understood as two images in the training set). Vab may be used to warp Ia to obtain warp(Ia), and an image loss representing the difference between warp(Ia) and Ib is then calculated (it may take the form of an L1 loss or an L2 loss). Setting this loss is intended to reduce the difference between warp(Ia) and Ib.
2. Loss of smoothness (smooth loss)
This loss term calculates a smoothness indicator within Vab (it may take the form of a TV loss). The loss is set to make the estimated optical flow smoother: the motion of real objects generally has a certain integrity, so the distribution of vectors in the optical flow should show a certain regularity; if the vectors in the optical flow are disordered, this expectation is not met.
The above two losses can be weighted and summed into a total loss, and the network parameters are updated according to the gradient of the total loss until a training termination condition (e.g., model convergence) is reached. In some implementations, only the image loss is calculated and the smoothness loss is omitted.
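A compact PyTorch-style sketch of the two losses is given below; warp(Ia) can be produced with a backward-warping operation such as the one sketched after the fig. 8 discussion. The L1 form of the image loss, the TV form of the smoothness loss and the 0.1 weight are choices that the text above leaves open, so they are assumptions.

```python
import torch

def image_loss(warped_a, img_b):
    """Photometric term: L1 difference between warp(Ia) and Ib."""
    return (warped_a - img_b).abs().mean()

def smooth_loss(flow):
    """TV-style smoothness term on a (B, 2, H, W) flow field."""
    dx = (flow[:, :, :, 1:] - flow[:, :, :, :-1]).abs().mean()
    dy = (flow[:, :, 1:, :] - flow[:, :, :-1, :]).abs().mean()
    return dx + dy

def total_loss(warped_a, img_b, flow, smooth_weight=0.1):
    """Weighted sum of the two terms; the 0.1 weight is an assumption."""
    return image_loss(warped_a, img_b) + smooth_weight * smooth_loss(flow)
```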
Fig. 9 is a functional block diagram of an optical flow estimation apparatus 200 according to an embodiment of the present application. Referring to fig. 9, the optical flow estimation device 200 includes:
a data acquisition component 210 for acquiring a first image, a second image, first gyroscope data, and second gyroscope data; the first image and the second image are images acquired by the same camera at different moments, the first gyroscope data is data acquired by a gyroscope during the acquisition of the first image, and the second gyroscope data is data acquired by the gyroscope during the acquisition of the second image;
a gyro domain calculating component 220 for calculating a gyro domain from the first gyro data and the second gyro data, the gyro domain being a two-dimensional motion field between the first image and the second image;
an optical flow estimation component 230, configured to estimate a temporary optical flow between the first image and the second image according to the first image and the second image, and fuse the temporary optical flow and the gyro domain to obtain an optical flow between the first image and the second image.
In one implementation of the optical flow estimation apparatus 200, the first gyroscope data is data collected by the gyroscope during the exposure of the first image, and the second gyroscope data is data collected by the gyroscope during the exposure of the second image; the acquisition of an image by the camera comprises two stages, exposure and post-processing.
In one implementation of the optical flow estimation device 200, the gyro domain calculating component 220 calculates a gyro domain according to the first gyroscope data and the second gyroscope data, including: calculating a rotation matrix between the first image and the second image from the first gyroscope data and the second gyroscope data; calculating a homography matrix corresponding to the rotation matrix according to the rotation matrix and the internal parameters of the camera; and calculating the gyro domain according to the homography matrix.
In one implementation of the optical flow estimation apparatus 200, the camera employs a rolling shutter, and the gyro domain calculating component 220 calculates a rotation matrix between the first image and the second image according to the first gyroscope data and the second gyroscope data, including: calculating n rotation matrices between the first image and the second image according to n sets of data included in the first gyroscope data and n sets of data included in the second gyroscope data; wherein n is an integer greater than 1, each set of data is collected at a different time, the ith rotation matrix is a rotation matrix between the ith part of the first image in exposure order and the ith part of the second image in exposure order, and i is any integer from 1 to n. The gyro domain calculating component 220 calculates a homography matrix corresponding to the rotation matrix according to the rotation matrix and the internal parameters of the camera, including: calculating n homography matrices corresponding to the n rotation matrices according to the n rotation matrices and the internal parameters of the camera. The gyro domain calculating component 220 calculates the gyro domain according to the homography matrix, including: calculating n partial gyro domains according to the n homography matrices, and splicing the n partial gyro domains into the gyro domain; the ith partial gyro domain is a two-dimensional motion field between the ith part of the first image in exposure order and the ith part of the second image in exposure order, and i is any integer from 1 to n.
In one implementation of the optical flow estimation apparatus 200, the gyro domain calculating component 220 calculates n rotation matrices between the first image and the second image according to n sets of data included in the first gyroscope data and n sets of data included in the second gyroscope data, including: calculating the corresponding n temporary rotation matrices M1~Mn according to the n sets of data contained in the first gyroscope data, and calculating the corresponding n temporary rotation matrices Mn+1~M2n according to the n sets of data contained in the second gyroscope data; and, with i traversing the integers from 1 to n, calculating the ith rotation matrix between the first image and the second image according to the n+1 temporary rotation matrices Mi~Mn+i, the n rotation matrices being obtained after the traversal.
In one implementation of the optical flow estimation apparatus 200, the camera employs a global shutter, and the gyro domain calculating component 220 calculates a rotation matrix between the first image and the second image according to the first gyroscope data and the second gyroscope data, including: calculating a rotation matrix between the whole of the first image and the whole of the second image according to n sets of data included in the first gyroscope data and n sets of data included in the second gyroscope data; wherein n is an integer greater than 1, and each group of data is acquired at different times.
In one implementation of the optical flow estimation apparatus 200, the gyro domain calculating component 220 calculates a rotation matrix between the whole of the first image and the whole of the second image according to n sets of data included in the first gyroscope data and n sets of data included in the second gyroscope data, including: calculating the corresponding n temporary rotation matrices M1~Mn according to the n sets of data contained in the first gyroscope data, and calculating the corresponding n temporary rotation matrices Mn+1~M2n according to the n sets of data contained in the second gyroscope data; and calculating the rotation matrix according to the 2n temporary rotation matrices M1~M2n.
In one implementation of the optical flow estimation apparatus 200, the optical flow estimation component 230 estimates a temporary optical flow between the first image and the second image according to the first image and the second image, and fuses the temporary optical flow and the gyro domain to obtain an optical flow between the first image and the second image, including: estimating the provisional optical flow from the first image and the second image using a neural network model; and fusing the temporary optical flow and the gyro domain to obtain an optical flow between the first image and the second image.
In one implementation of the optical flow estimation apparatus 200, the optical flow estimation component 230 fuses the temporary optical flow and the gyro domain, including: fusing the temporary optical flow and the gyro domain by using the neural network model; the neural network model comprises m sequentially connected optical flow estimation modules, m being an integer greater than 1, and the kth optical flow estimation module performs the following steps: extracting the kth-level feature of the first image according to the (k-1)th-level feature of the first image, and extracting the kth-level feature of the second image according to the (k-1)th-level feature of the second image, the (k-1)th-level feature being down-sampled when the kth-level feature is extracted; fusing the kth-level gyro domain and the (k+1)th-level temporary optical flow to obtain the kth-level fused optical flow, the kth-level gyro domain being obtained by down-sampling the (k-1)th-level gyro domain; and calculating the kth-level temporary optical flow according to the kth-level feature of the first image, the kth-level feature of the second image and the kth-level fused optical flow, the kth-level fused optical flow being up-sampled when the kth-level temporary optical flow is calculated; wherein k is any integer from 1 to m, the 0th-level feature of the first image is the first image, the 0th-level feature of the second image is the second image, the 0th-level gyro domain is the gyro domain, the mth-level gyro domain is 0, the (m+1)th-level temporary optical flow is 0, and the optical flow between the first image and the second image is obtained by up-sampling the 1st-level temporary optical flow.
In one implementation of the optical flow estimation apparatus 200, the kth optical flow estimation module fuses the kth-level gyro domain and the (k+1)th-level temporary optical flow to obtain the kth-level fused optical flow in one of the following three ways: fusing the kth-level gyro domain and the (k+1)th-level temporary optical flow by using a first optical flow fusion unit in the kth optical flow estimation module to obtain the kth-level fused optical flow, the first optical flow fusion unit comprising at least one convolution layer; predicting a kth-level weight map according to the kth-level feature of the first image and the kth-level feature of the second image by using a weight prediction unit in the kth optical flow estimation module, and performing weighted fusion of the kth-level gyro domain and the (k+1)th-level temporary optical flow by using the kth-level weight map to obtain the kth-level fused optical flow, wherein the pixel values in the kth-level weight map represent fusion weights and the weight prediction unit comprises at least one convolution layer; or fusing the kth-level gyro domain and the (k+1)th-level temporary optical flow by using a first optical flow fusion unit in the kth optical flow estimation module to obtain a kth-level temporary fused optical flow, predicting a kth-level weight map according to the kth-level feature of the first image and the kth-level feature of the second image by using a weight prediction unit in the kth optical flow estimation module, and performing weighted fusion of the kth-level gyro domain and the kth-level temporary fused optical flow by using the kth-level weight map to obtain the kth-level fused optical flow, wherein the first optical flow fusion unit and the weight prediction unit each comprise at least one convolution layer, and the pixel values in the kth-level weight map represent fusion weights.
The optical flow estimation device 200 provided in the embodiment of the present application, its implementation principle and the resulting technical effects have been introduced in the foregoing method embodiments, and for the sake of brief description, portions of the device embodiments that are not mentioned may refer to corresponding contents in the method embodiments.
Fig. 10 shows a possible structure of an electronic device 300 provided in an embodiment of the present application. Referring to fig. 10, the electronic device 300 includes (solid line blocks): a processor 310, a memory 320, and a communication interface 350, which are interconnected and in communication with each other via a communication bus 360 and/or other form of connection mechanism (not shown).
The processor 310 includes one or more processors, which may be integrated circuit chips having signal processing capabilities. The processor 310 may be a general-purpose processor, including a Central Processing Unit (CPU), a Micro Control Unit (MCU), a Network Processor (NP) or other conventional processors; it may also be a dedicated processor, including a Neural-Network Processing Unit (NPU), a Graphics Processing Unit (GPU), a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component. Moreover, when there are a plurality of processors 310, some of them may be general-purpose processors and the others may be dedicated processors.
The memory 320 includes one or more memories, which may be, but are not limited to, Random Access Memory (RAM), Read-Only Memory (ROM), Programmable Read-Only Memory (PROM), Erasable Programmable Read-Only Memory (EPROM), Electrically Erasable Programmable Read-Only Memory (EEPROM), and the like.
The processor 310, as well as possibly other components, may access, read, and/or write data to the memory 320. In particular, one or more computer program instructions may be stored in the memory 320, and may be read and executed by the processor 310 to implement the optical flow estimation method provided by the embodiment of the present application.
The communication interface 350 includes one or more devices that can be used for direct or indirect communication with other devices for the purpose of data interaction. The communication interface 350 may include interfaces for wired and/or wireless communication. If communication with other devices is not required, the electronic device 300 may omit the communication interface 350.
It will be appreciated that the configuration shown in FIG. 10 is merely illustrative, and that electronic device 300 may also include more or fewer components than shown in FIG. 10, or have a different configuration than shown in FIG. 10:
in some implementations, the electronic device 300 further includes (dashed box): camera 330 and gyroscope 340, both interconnected and in communication with other components described above via communication bus 360 and/or other forms of connection mechanisms (not shown).
The camera 330 may be one or more, for example, a wide-angle camera, a telephoto camera, or the like. The camera 330 is used for acquiring image or video data, including a first image and a second image required for optical flow estimation, and the first image and the second image may be acquired by the same camera at different times.
Gyroscope 340 may be a three-axis gyroscope, a six-axis gyroscope, or the like. The gyroscope 340 is configured to acquire gyroscope data, which includes first gyroscope data and second gyroscope data required for optical flow estimation, and the gyroscope data at least includes angular velocity information.
The components shown in fig. 10 may be implemented in hardware, software, or a combination thereof. The electronic device 300 may be a physical device, such as a cell phone, a video camera, a tablet, a laptop, a PC, a drone, a wearable device, a robot, a server, etc., or may be a virtual device, such as a virtual machine, a virtualized container, etc. The electronic device 300 is not limited to a single device, and may be a combination of a plurality of devices or a cluster including a large number of devices.
Embodiments of the present application further provide a computer-readable storage medium, where computer program instructions are stored on the computer-readable storage medium, and when the computer program instructions are read and executed by a processor, the method for estimating an optical flow provided by the embodiments of the present application is performed. The computer-readable storage medium may be implemented as, for example, memory 320 in electronic device 300 in FIG. 10.
Embodiments of the present application further provide a computer program product, which includes computer program instructions, and when the computer program instructions are read and executed by a processor, the method for estimating optical flow provided by the embodiments of the present application is performed.
The above description is only an example of the present application and is not intended to limit the scope of the present application, and various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, improvement and the like made within the spirit and principle of the present application shall be included in the protection scope of the present application.

Claims (14)

1. An optical flow estimation method, comprising:
acquiring a first image, a second image, first gyroscope data and second gyroscope data; the first image and the second image are images acquired by the same camera at different moments, the first gyroscope data is data acquired by a gyroscope during the acquisition of the first image, and the second gyroscope data is data acquired by the gyroscope during the acquisition of the second image;
calculating a gyro domain from the first and second gyroscope data, the gyro domain being a two-dimensional motion field between the first and second images;
estimating a temporary optical flow between the first image and the second image according to the first image and the second image, and fusing the temporary optical flow and the gyro domain to obtain the optical flow between the first image and the second image.
2. The optical flow estimation method of claim 1, wherein the first gyroscope data is data acquired by the gyroscope during the exposure of the first image, and the second gyroscope data is data acquired by the gyroscope during the exposure of the second image; the acquisition of an image by the camera comprises two stages, exposure and post-processing.
3. The optical flow estimation method according to claim 1 or 2, wherein said calculating a gyro domain from said first gyro data and said second gyro data comprises:
calculating a rotation matrix between the first image and the second image from the first gyroscope data and the second gyroscope data;
calculating a homography matrix corresponding to the rotation matrix according to the rotation matrix and the internal parameters of the camera;
and calculating the gyro domain according to the homography matrix.
4. The optical flow estimation method according to claim 3, wherein the camera employs a rolling shutter, and the calculating a rotation matrix between the first image and the second image from the first gyroscope data and the second gyroscope data includes:
calculating n rotation matrices between the first image and the second image according to n sets of data included in the first gyroscope data and n sets of data included in the second gyroscope data; wherein n is an integer greater than 1, each set of data is collected at a different time, the ith rotation matrix is a rotation matrix between the ith part of the first image in exposure order and the ith part of the second image in exposure order, and i is any integer from 1 to n;
the calculating the homography matrix corresponding to the rotation matrix according to the rotation matrix and the internal parameters of the camera comprises:
calculating n homography matrixes corresponding to the n rotation matrixes according to the n rotation matrixes and the internal parameters of the camera;
the computing the gyro domain according to the homography matrix includes:
calculating n partial gyro domains according to the n homography matrixes, and splicing the n partial gyro domains into a gyro domain; the ith partial gyro domain is a two-dimensional motion field between the ith part of the first image in the exposure sequence and the ith part of the second image in the exposure sequence, and i is any integer from 1 to n.
5. The optical flow estimation method according to claim 4, wherein said calculating n rotation matrices between said first image and said second image according to n sets of data contained in said first gyroscope data and n sets of data contained in said second gyroscope data comprises:
calculating the corresponding n temporary rotation matrices M1~Mn according to the n sets of data contained in the first gyroscope data, and calculating the corresponding n temporary rotation matrices Mn+1~M2n according to the n sets of data contained in the second gyroscope data;
with i traversing the integers from 1 to n, calculating the ith rotation matrix between the first image and the second image according to the n+1 temporary rotation matrices Mi~Mn+i, the n rotation matrices being obtained after the traversal.
6. The optical flow estimation method of claim 3, wherein the camera employs a global shutter, and the calculating a rotation matrix between the first image and the second image from the first gyroscope data and the second gyroscope data comprises:
calculating a rotation matrix between the whole of the first image and the whole of the second image according to n sets of data included in the first gyroscope data and n sets of data included in the second gyroscope data; wherein n is an integer greater than 1, and each group of data is acquired at different times.
7. The optical flow estimation method according to claim 6, wherein said calculating a rotation matrix between the whole of the first image and the whole of the second image based on the n sets of data included in the first gyroscope data and the n sets of data included in the second gyroscope data includes:
calculating the corresponding n temporary rotation matrices M1~Mn according to the n sets of data contained in the first gyroscope data, and calculating the corresponding n temporary rotation matrices Mn+1~M2n according to the n sets of data contained in the second gyroscope data;
calculating the rotation matrix according to the 2n temporary rotation matrices M1~M2n.
8. The method according to any one of claims 1-7, wherein said estimating a provisional optical flow between said first image and said second image from said first image and said second image, and fusing said provisional optical flow and said gyro domain to obtain an optical flow between said first image and said second image comprises:
estimating the provisional optical flow from the first image and the second image using a neural network model;
and fusing the temporary optical flow and the gyro domain to obtain an optical flow between the first image and the second image.
9. The optical flow estimation method of claim 8, wherein said fusing the provisional optical flow and the gyro domain comprises: fusing the temporary optical flow and the gyro domain by using the neural network model;
the neural network model comprises m optical flow estimation modules which are sequentially connected, m is an integer larger than 1, and the kth optical flow estimation module executes the following steps:
extracting the kth level feature of the first image according to the kth-1 level feature of the first image, extracting the kth level feature of the second image according to the kth-1 level feature of the second image, and performing downsampling on the kth-1 level feature when the kth level feature is extracted;
fusing a kth-level gyro domain and a (k + 1) th-level temporary optical flow to obtain a kth-level fused optical flow, wherein the kth-level gyro domain is obtained by down-sampling a kth-1-level gyro domain;
calculating a kth-level temporary optical flow according to the kth-level feature of the first image, the kth-level feature of the second image and the kth-level fused optical flow, wherein the kth-level fused optical flow is up-sampled when the kth-level temporary optical flow is calculated;
wherein k is any integer from 1 to m, the 0 th-level feature of the first image is the first image, the 0 th-level feature of the second image is the second image, the 0 th-level gyro domain is the gyro domain, the m th-level gyro domain is 0, the m +1 th-level provisional optical flow is 0, and the optical flow between the first image and the second image is obtained from the 1 st-level provisional optical flow.
10. The optical flow estimation method according to claim 9, wherein the fusion of the kth-level gyro domain and the (k + 1) th-level temporary optical flow to obtain the kth-level fused optical flow comprises one of the following three ways:
fusing the kth-level gyro domain and the (k + 1) th-level temporary optical flow by using a first optical flow fusion unit in the kth optical flow estimation module to obtain a kth-level fusion optical flow, wherein the first optical flow fusion unit comprises at least one convolution layer;
predicting a k-th-level weight map according to the k-th-level features of the first image and the k-th-level features of the second image by using a weight prediction unit in the k-th optical flow estimation module, and performing weighted fusion on the k-th-level gyro domain and the k + 1-th-level temporary optical flow by using the k-th-level weight map to obtain a k-th-level fused optical flow; wherein pixel values in the kth-level weight map represent fusion weights, the weight prediction unit comprising at least one convolution layer;
fusing the kth-level gyro domain and the (k + 1) th-level temporary optical flow by using a first optical flow fusion unit in the kth optical flow estimation module to obtain a kth-level temporary fusion optical flow, predicting a kth-level weight map according to the kth-level feature of the first image and the kth-level feature of the second image by using a weight prediction unit in the kth optical flow estimation module, and performing weighted fusion on the kth-level gyro domain and the kth-level temporary fusion optical flow by using the kth-level weight map to obtain the kth-level fusion optical flow; wherein the first optical flow fusion unit and the weight prediction unit each include at least one convolution layer, and pixel values in the k-th-order weight map represent fusion weights.
11. A computer program product comprising computer program instructions which, when read and executed by a processor, perform the method of any one of claims 1 to 10.
12. A computer-readable storage medium having stored thereon computer program instructions which, when read and executed by a processor, perform the method of any one of claims 1-10.
13. An electronic device comprising a memory and a processor, the memory having stored therein computer program instructions that, when read and executed by the processor, perform the method of any of claims 1-10.
14. The electronic device of claim 13, further comprising a camera to capture the first image and the second image and a gyroscope to capture the first gyroscope data and the second gyroscope data.
CN202111164952.3A 2021-09-30 2021-09-30 Optical flow estimation method, computer program product, storage medium, and electronic device Pending CN114119678A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111164952.3A CN114119678A (en) 2021-09-30 2021-09-30 Optical flow estimation method, computer program product, storage medium, and electronic device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111164952.3A CN114119678A (en) 2021-09-30 2021-09-30 Optical flow estimation method, computer program product, storage medium, and electronic device

Publications (1)

Publication Number Publication Date
CN114119678A true CN114119678A (en) 2022-03-01

Family

ID=80441833

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111164952.3A Pending CN114119678A (en) 2021-09-30 2021-09-30 Optical flow estimation method, computer program product, storage medium, and electronic device

Country Status (1)

Country Link
CN (1) CN114119678A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116934654A (en) * 2022-03-31 2023-10-24 荣耀终端有限公司 Image ambiguity determining method and related equipment thereof


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination