CN113538525B - Optical flow estimation method, model training method and corresponding devices

Info

Publication number: CN113538525B
Authority: CN (China)
Prior art keywords: optical flow, flow estimation, downsampling, estimation module, inferred
Legal status: Active
Application number: CN202110597248.0A
Other languages: Chinese (zh)
Other versions: CN113538525A (en)
Inventors: 黄哲威, 袁嵩, 胡晨, 周舒畅
Assignee: Beijing Kuangshi Technology Co Ltd
Events: application filed by Beijing Kuangshi Technology Co Ltd; priority to CN202110597248.0A; publication of CN113538525A; application granted; publication of CN113538525B

Classifications

    • G06T7/269: Image analysis; analysis of motion using gradient-based methods
    • G06N3/045: Neural networks; architecture, e.g. interconnection topology; combinations of networks
    • G06N3/08: Neural networks; learning methods
    • G06T2207/10016: Image acquisition modality: video; image sequence
    • G06T2207/20081: Special algorithmic details: training; learning
    • G06T2207/20084: Special algorithmic details: artificial neural networks [ANN]

Abstract

The application relates to the technical field of video processing, and provides an optical flow estimation method, a model training method and corresponding devices. The optical flow estimation method comprises: acquiring two frames of inferred images; inputting the inferred images into an optical flow estimation model, performing operations through a plurality of optical flow estimation modules in the model, and outputting the optical flow between the inferred images. The input of the first optical flow estimation module comprises the inferred images and an initial optical flow; the input of each other optical flow estimation module comprises the inferred images and the optical flow output by the previous module. The operations performed by each optical flow estimation module include: fusing the input data, downsampling the fused data, calculating a preliminary optical flow based on the downsampling result, and upsampling the preliminary optical flow back to the original resolution. The downsampling multiple decreases from module to module, and the downsampling multiple of the first module is the ratio of the resolution of the inferred images to that of the training images, or a value near that ratio. The method can accurately estimate optical flow even when the training images have low resolution and the inferred images have high resolution.

Description

Optical flow estimation method, model training method and corresponding devices
Technical Field
The present application relates to the field of optical flow estimation technologies, and in particular, to an optical flow estimation method, a model training method, and a corresponding apparatus.
Background
Dense optical flow estimation (hereinafter referred to as optical flow estimation) is an important component of video processing and video understanding; dense optical flow describes the motion vector of each pixel from one video frame to the next. In recent years, algorithms based on deep learning have become the mainstream approach to optical flow estimation.
Because optical flow is difficult to label accurately in real video, the training datasets of current optical flow estimation algorithms are mostly generated by computer graphics methods. However, since creating high-resolution (e.g., 2K or 4K) datasets is costly and their storage and computation consume huge resources, the training datasets in widespread use often have low resolution (e.g., the FlyingChairs and FlyingThings datasets, generally below 720×1280). Models trained by existing algorithms are therefore generally suitable only for optical flow estimation on lower-resolution video; when optical flow needs to be estimated on higher-resolution video, there are generally two solutions:
The first is to downsample the video frames and then perform optical flow estimation with the model; however, downsampling necessarily loses image information, which reduces the accuracy of the optical flow estimation.
The second is to upscale the training images during training; however, this blurs the image content and leads to poor training results.
In summary, the optical flow estimation models trained by existing algorithms on low-resolution datasets give poor results when estimating optical flow for high-resolution video.
Disclosure of Invention
An objective of the embodiments of the present application is to provide an optical flow estimation method, a model training method and corresponding devices, so as to address the above technical problems.
In order to achieve the above purpose, the present application provides the following technical solutions:
In a first aspect, an embodiment of the present application provides an optical flow estimation method, including: acquiring two frames of inferred images; inputting the two frames of inferred images into an optical flow estimation model, performing operations through a plurality of sequentially connected optical flow estimation modules in the optical flow estimation model, and outputting the optical flow between the two frames of inferred images. The input data of the first optical flow estimation module includes the two frames of inferred images and an initial optical flow, and the input data of each optical flow estimation module other than the first includes the two frames of inferred images and the optical flow output by the previous optical flow estimation module. The operations performed by each optical flow estimation module include: fusing the input data of the optical flow estimation module to obtain fused data; downsampling the fused data to obtain a downsampling result; calculating a preliminary optical flow based on the downsampling result; and upsampling the preliminary optical flow back to the resolution of the input data to obtain the optical flow output by the optical flow estimation module. The downsampling multiples of the optical flow estimation modules decrease module by module, and the downsampling multiple of the first optical flow estimation module is the ratio of the resolution of the inferred images to that of the training images, or a value near that ratio.
In the above method, the optical flow output by each optical flow estimation module can be regarded as an estimate of the optical flow between the two frames of inferred images. Since the input data of each module includes the inferred images and the downsampling multiples decrease module by module, the image information loss caused by downsampling decreases from module to module, and the receptive field corresponding to each module decreases as well. The earlier optical flow estimation modules (especially the first one) therefore mainly capture the larger motions of objects in the inferred images, while the later modules mainly capture the finer motions. Because the input data of each module includes the optical flow output by the previous module, the optical flow estimation model realizes a progressive, coarse-to-fine estimation of the optical flow.
Further, since the downsampling multiple of the first optical flow estimation module is the ratio (or a value near the ratio) of the resolution of the inferred images to that of the training images, the first module can process the inferred images (which may be high-resolution images) well based on the parameters learned from the training images (which may be low-resolution images) in the training stage; that is, it can estimate the basic motion of objects in the images (i.e., larger-amplitude motion) well. On this basis, the object motion details estimated by the subsequent modules continuously refine the optical flow output by the first module, so that a high-precision optical flow estimation result is finally obtained.
In an implementation manner of the first aspect, the fusing input data of the optical flow estimation module to obtain fused data includes: aligning a second inferred image in the input data to a first inferred image in the input data using optical flow in the input data, resulting in an aligned second inferred image; and fusing the optical flow in the input data, the first inferred image and the aligned second inferred image to obtain fused data.
The first inferred image may be either of the two inferred images, and the second inferred image is the other one. In the above implementation, the optical flow, the original inferred image and the inferred image aligned by the optical flow are fused; since the contributions of these various kinds of information to optical flow estimation are all taken into account, performing subsequent optical flow estimation based on the fused data can obtain better results.
In an implementation manner of the first aspect, the calculating a preliminary optical flow based on the downsampling result includes: processing the downsampling result with at least one convolutional layer to obtain the preliminary optical flow.
In the above implementation, the convolutional neural network (i.e., the neural network including at least one convolutional layer) is used to estimate the preliminary optical flow, so that a better estimation effect can be obtained.
In an implementation manner of the first aspect, the downsampling the fused data to obtain a downsampling result includes: downsampling the fusion data by using a downsampling layer to obtain a downsampling result; the upsampling the preliminary optical flow back to the resolution of the input data comprises: upsampling the preliminary optical flow back to the resolution of the input data using an upsampling layer; wherein the downsampling layer and the upsampling layer do not include parameters requiring training.
In the above implementation, since the downsampling layer and the upsampling layer do not include parameters that need to be trained, it is easy to generalize to images with different resolutions, and parameter rewriting is not needed.
In an implementation manner of the first aspect, the downsampling manner of the downsampling layer includes one of the following: bilinear interpolation, bicubic interpolation, and nearest neighbor interpolation; the upsampling layer upsamples in a manner corresponding to the downsampling layer.
The interpolation algorithms provided in the implementation mode are simple in operation, support sampling multiples to be decimal, and are high in flexibility.
In one implementation manner of the first aspect, the downsampling multiple of each optical flow estimation module other than the first is: the value obtained by dividing the downsampling multiple of the previous optical flow estimation module by a scaling factor, or a value near that value, where the scaling factor is greater than 1.
In this implementation, the downsampling multiples of the optical flow estimation modules can be calculated proportionally; the calculation is simple, the multiples are distributed evenly, and this helps improve optical flow estimation accuracy.
In an implementation manner of the first aspect, a value near the numerical value refers to a value between 1/2 times and 2 times that value.
In this implementation, the fluctuation range of the downsampling multiple is limited to between 1/2 times and 2 times the calculated value, which is a reasonable range: if the fluctuation range is set too small, the selection of the downsampling multiple lacks flexibility; if it is set too large, the effect of the scaling factor used to calculate the downsampling multiple is impaired.
In an implementation manner of the first aspect, the downsampling multiple of the optical flow estimation module is further related to a motion amplitude of an object in the inferred image.
In accordance with the foregoing, different optical flow estimation modules are used to capture different magnitudes of motion in the inferred image, and thus in the above-described implementations, determining the downsampling multiple of the optical flow estimation module based on the magnitude of motion of objects in the inferred image facilitates a more accurate estimation of optical flow.
In one implementation of the first aspect, the downsampling multiple of the first optical flow estimation module is positively correlated with the motion amplitude of the object in the inferred image.
In light of the foregoing, the first optical flow estimation module is primarily used to capture large motions of objects. In this implementation, the larger the motion amplitude of objects in the inferred images, the larger the downsampling multiple of the first optical flow estimation module, which enlarges the module's receptive field so that larger-amplitude motion is effectively captured; the smaller the motion amplitude, the smaller the downsampling multiple, which shrinks the receptive field so that smaller-amplitude motion is effectively captured. In summary, if the downsampling multiple of the first optical flow estimation module is positively correlated with the motion amplitude of objects in the inferred images, the module's receptive field matches the motion in the images, improving optical flow estimation accuracy.
In an implementation manner of the first aspect, the downsampling multiple of each optical flow estimation module is positively correlated with the motion amplitude of the object in the inferred image.
In this implementation, the larger the motion amplitude of objects in the inferred images, the larger the downsampling multiple of each optical flow estimation module, which enlarges the modules' receptive fields so that larger-amplitude motion is effectively captured; the smaller the motion amplitude, the smaller the downsampling multiples, which shrinks the receptive fields so that smaller-amplitude motion is effectively captured. In short, if the downsampling multiple of each optical flow estimation module is positively correlated with the motion amplitude of objects in the inferred images, the modules' receptive fields match the motion in the images, improving optical flow estimation accuracy.
In a second aspect, an embodiment of the present application provides a model training method, including: acquiring two frames of training images; inputting the two frames of training images into an optical flow estimation model, performing operations through a plurality of sequentially connected optical flow estimation modules in the optical flow estimation model, and outputting the optical flow between the two frames of training images. The input data of the first optical flow estimation module includes the two frames of training images and an initial optical flow, and the input data of each optical flow estimation module other than the first includes the two frames of training images and the optical flow output by the previous optical flow estimation module. The operations performed by each optical flow estimation module include: fusing the input data of the optical flow estimation module to obtain fused data; downsampling the fused data to obtain a downsampling result; calculating a preliminary optical flow based on the downsampling result; and upsampling the preliminary optical flow back to the resolution of the input data to obtain the optical flow output by the optical flow estimation module. The downsampling multiples of the optical flow estimation modules decrease sequentially. An optical flow estimation loss is calculated according to the optical flow output by at least one optical flow estimation module and the optical flow label, and the parameters of the optical flow estimation model are updated according to the optical flow estimation loss.
The optical flow estimation model trained by the above model training method may be used in the optical flow estimation method provided by the first aspect or any implementation manner thereof; the model can effectively estimate optical flow for high-resolution images while being trained with low-resolution images (i.e., existing datasets).
In one implementation manner of the second aspect, the method further includes: changing the downsampling multiple of each optical flow estimation module during training, where the change may be random or follow a fixed pattern.
In this implementation, the combination of downsampling multiples of the optical flow estimation modules is varied so that the optical flow estimation model can effectively process images of different resolutions. In the inference stage, the downsampling multiples of the modules other than the first follow no strict rule; enhancing the generalization performance of the model in this way weakens the requirement on the downsampling multiples to a certain extent, that is, even if the multiples are not optimal values, the optical flow estimation result is not greatly affected.
In a third aspect, an embodiment of the present application provides an optical flow estimation device, including: an inferred image acquisition unit configured to acquire two frames of inferred images; and an inferred optical flow estimation unit configured to input the two frames of inferred images into an optical flow estimation model, perform operations through a plurality of sequentially connected optical flow estimation modules in the optical flow estimation model, and output the optical flow between the two frames of inferred images. The input data of the first optical flow estimation module includes the two frames of inferred images and an initial optical flow, and the input data of each optical flow estimation module other than the first includes the two frames of inferred images and the optical flow output by the previous optical flow estimation module. The operations performed by each optical flow estimation module include: fusing the input data of the optical flow estimation module to obtain fused data; downsampling the fused data to obtain a downsampling result; calculating a preliminary optical flow based on the downsampling result; and upsampling the preliminary optical flow back to the resolution of the input data to obtain the optical flow output by the optical flow estimation module. The downsampling multiples of the optical flow estimation modules decrease module by module, and the downsampling multiple of the first optical flow estimation module is the ratio of the resolution of the inferred images to that of the training images, or a value near that ratio.
In a fourth aspect, an embodiment of the present application provides a model training device, including: a training image acquisition unit configured to acquire two frames of training images; a training optical flow estimation unit configured to input the two frames of training images into an optical flow estimation model, perform operations through a plurality of sequentially connected optical flow estimation modules in the optical flow estimation model, and output the optical flow between the two frames of training images, where the input data of the first optical flow estimation module includes the two frames of training images and an initial optical flow, the input data of each optical flow estimation module other than the first includes the two frames of training images and the optical flow output by the previous optical flow estimation module, the operations performed by each optical flow estimation module include fusing the input data of the module to obtain fused data, downsampling the fused data to obtain a downsampling result, calculating a preliminary optical flow based on the downsampling result, and upsampling the preliminary optical flow back to the resolution of the input data to obtain the optical flow output by the module, and the downsampling multiples of the optical flow estimation modules decrease sequentially; and a parameter updating unit configured to calculate an optical flow estimation loss according to the optical flow output by at least one optical flow estimation module and the optical flow label, and to update the parameters of the optical flow estimation model according to the optical flow estimation loss.
In a fifth aspect, an embodiment of the present application provides a computer-readable storage medium having stored thereon computer program instructions which, when read and executed by a processor, perform the method provided by the first aspect, the second aspect, or any possible implementation manner thereof.
In a sixth aspect, an embodiment of the present application provides an electronic device, including: a memory and a processor, the memory having stored therein computer program instructions which, when read and executed by the processor, perform the method provided by the first aspect, the second aspect or any one of the possible implementations of the two aspects.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are needed in the embodiments of the present application will be briefly described below, it should be understood that the following drawings only illustrate some embodiments of the present application and should not be considered as limiting the scope, and other related drawings can be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 illustrates one possible flow of an optical flow estimation method provided by an embodiment of the present application;
FIG. 2 illustrates one possible structure of an optical flow estimation model provided by an embodiment of the present application;
FIG. 3 illustrates one possible configuration of an optical flow estimation module provided by an embodiment of the present application;
FIG. 4 illustrates one possible flow of a model training method provided by an embodiment of the present application;
FIG. 5 shows one possible configuration of an optical flow estimation device provided by an embodiment of the present application;
FIG. 6 shows one possible configuration of a model training apparatus provided by an embodiment of the present application;
FIG. 7 shows one possible structure of the electronic device provided by an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be described below with reference to the accompanying drawings in the embodiments of the present application. It should be noted that: like reference numerals and letters denote like items in the following figures, and thus once an item is defined in one figure, no further definition or explanation thereof is necessary in the following figures.
The terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising one … …" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.
The terms "first," "second," and the like, are used merely to distinguish one entity or action from another entity or action, and are not to be construed as indicating or implying any actual such relationship or order between such entities or actions.
Fig. 1 shows a possible flow of the optical flow estimation method provided by the embodiment of the application. Fig. 2 and 3 show possible configurations of the optical flow estimation model and the optical flow estimation module, respectively, involved in the method, for reference in the description of the optical flow estimation method. The method may be performed, but is not limited to, by the electronic device shown in fig. 7, as to the structure of which reference may be made to the explanation below with respect to fig. 7.
Referring to fig. 1, the method includes:
step S110: two inferred images are acquired.
Deep learning based optical flow estimation generally includes at least two stages: a training stage, in which an optical flow estimation model is trained with the training images (also called training samples) in a training set, and an inference stage, in which optical flow estimation is performed with the trained model. The inferred images in step S110 are the images used in the inference stage.
The two inferred images may be denoted I₁ and I₂ respectively. I₁ and I₂ are separated by a certain acquisition time interval; for example, they may be two consecutive, or spaced-apart, frames of a video. Since a video usually contains a large number of frames, when the optical flow in a video is to be estimated, the optical flow estimation method proposed by the present application can be applied repeatedly, estimating the optical flow between two video frames at a time. The way the inferred images are acquired is not limited: for example, optical flow estimation may be performed on video frames in real time while the video is being captured; alternatively, a video file may be read and video frames extracted from it for optical flow estimation, and so on. Note that the video here is not limited to actually captured video; it may also be video generated by a computer vision algorithm.
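As a minimal illustration of acquiring two inferred frames from a video file, the following sketch uses OpenCV; the file name is a hypothetical placeholder, and any other frame source works equally well.

```python
import cv2

cap = cv2.VideoCapture("input.mp4")  # hypothetical path to a video file
ok1, frame1 = cap.read()             # I1
ok2, frame2 = cap.read()             # I2: here, the immediately following frame
cap.release()
assert ok1 and ok2, "could not read two frames from the video"
```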
The objective of the optical flow estimation method is to estimate the optical flow between I₁ and I₂. There are two such optical flows, the flow from I₁ to I₂ and the flow from I₂ to I₁, denoted F₁→₂ and F₂→₁ respectively. The following description mainly takes the estimation of F₁→₂ as an example.
Step S120: two frames of inferred images are input into an optical flow estimation model, and the optical flow between the two frames of inferred images is output through operation of a plurality of optical flow estimation modules which are sequentially connected in the optical flow estimation model.
The optical flow estimation model includes a plurality of optical flow estimation modules, but is not precluded from including other network structures. The optical flow estimation modules are connected sequentially to form a serial structure: the module at the start of the model is called the first optical flow estimation module, the module next to it is called the second optical flow estimation module, and the module at the end of the model is called the last optical flow estimation module. Referring to FIG. 2, three optical flow estimation modules are shown, denoted optical flow estimation module 1, optical flow estimation module 2 and optical flow estimation module 3 for ease of presentation.
The input data of the first optical flow estimation module includes I₁, I₂ and an initial optical flow. Denoting the optical flow output by the i-th optical flow estimation module as F(i), the initial optical flow may be written as F(0). The initial optical flow may take a fixed value, for example the zero vector; its value may be preset before the optical flow estimation method is executed, or assigned after execution begins. The input data of each optical flow estimation module other than the first includes I₁, I₂ and the optical flow output by the previous module; that is, the input of the i-th module (i > 1) includes F(i-1). The optical flow output by the last module is the optical flow F₁→₂ estimated by the optical flow estimation model. Referring to FIG. 2, F(1) is one item of the input data of optical flow estimation module 2, F(2) is one item of the input data of optical flow estimation module 3, and F(3) = F₁→₂.
Further, the optical flow output by each optical flow estimation module can be regarded as an estimate of F₁→₂, so the estimation result of each module can be regarded as an optimization of the estimation result of the previous module (the result of the first module being an optimization of the initial optical flow); the optical flow estimation model thereby realizes a progressive estimation of the optical flow.
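The serial structure described above can be sketched as follows; this is a minimal PyTorch-style sketch under assumed names, with the internals of each module (fusion, downsampling, preliminary flow estimation, upsampling) filled in by the sketches later in this description.

```python
import torch
import torch.nn as nn

class FlowEstimationModel(nn.Module):
    def __init__(self, flow_modules):
        super().__init__()
        self.blocks = nn.ModuleList(flow_modules)  # optical flow estimation modules 1..n

    def forward(self, img1, img2):
        b, _, h, w = img1.shape
        flow = torch.zeros(b, 2, h, w, device=img1.device)  # initial optical flow F(0) = 0
        for block in self.blocks:
            flow = block(img1, img2, flow)  # each module refines the previous estimate
        return flow  # output of the last module: the model's estimate of F1->2
```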
Each optical flow estimation module performs a similar operation to obtain its output optical flow, and the operation process may include the following steps:
Step A: fuse the input data of the optical flow estimation module to obtain fused data.
Fusion refers to integrating several items of information in the input data into a single item, and the way the input data are fused in step A is not limited. Two ways are listed below:
Mode 1:
The optical flow and the two inferred images in the input data are fused directly. For example, for the i-th optical flow estimation module, F(i-1) (for i = 1, this is the initial optical flow F(0)), I₁ and I₂ may be spliced (concatenated) directly to obtain the fused data. The splicing may also be replaced by other operations such as weighted summation.
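A minimal sketch of mode 1, assuming (batch, channel, height, width) tensors; the shapes are illustrative.

```python
import torch

flow_prev = torch.zeros(1, 2, 480, 640)  # F(i-1); F(0) is the zero initial flow
img1 = torch.rand(1, 3, 480, 640)        # I1
img2 = torch.rand(1, 3, 480, 640)        # I2
fused = torch.cat((flow_prev, img1, img2), dim=1)  # splice along channels: (1, 8, 480, 640)
```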
Mode 2:
the second inferred image in the input data may be aligned to the first inferred image in the input data using the optical flow in the input data to obtain an aligned second inferred image, and then the optical flow in the input data, the first inferred image, and the aligned second inferred image may be fused to obtain fused data.
The first inferred image may be either of the two inferred images, and the second inferred image is the other one. Referring to the i-th optical flow estimation module in FIG. 3, assume the first inferred image is I₁ and the second inferred image is I₂. F(i-1) may first be used to backward-warp I₂ so as to align it to I₁; the aligned I₂ is denoted Î₂. Then F(i-1), I₁ and Î₂ are spliced to obtain the fused data. Of course, if each optical flow estimation module instead estimates the optical flow from I₂ to I₁, i.e., the output of the (i-1)-th module is an estimate of F₂→₁, then forward warping may be employed to align I₂ to I₁.
In mode 2, the optical flow, the original inferred image and the inferred image aligned by the optical flow are fused; since the contributions of these various kinds of information to optical flow estimation are all taken into account, performing the subsequent optical flow estimation based on the fused data can obtain better results.
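A sketch of fusion mode 2 is given below. It assumes the common convention that flow[:, 0] holds horizontal (x) displacement and flow[:, 1] vertical (y) displacement in pixels, and uses grid_sample for the backward warp; these conventions are assumptions, not details fixed by the text above.

```python
import torch
import torch.nn.functional as F

def backward_warp(img2, flow):
    # sample I2 at p + F(p) for every pixel p of I1's grid (backward warping)
    b, _, h, w = img2.shape
    ys, xs = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
    base = torch.stack((xs, ys), dim=0).float().to(img2.device)  # (2, H, W)
    coords = base.unsqueeze(0) + flow                            # (B, 2, H, W)
    gx = 2.0 * coords[:, 0] / (w - 1) - 1.0                      # normalize x to [-1, 1]
    gy = 2.0 * coords[:, 1] / (h - 1) - 1.0                      # normalize y to [-1, 1]
    grid = torch.stack((gx, gy), dim=-1)                         # (B, H, W, 2)
    return F.grid_sample(img2, grid, align_corners=True)

def fuse(img1, img2, flow):
    warped2 = backward_warp(img2, flow)         # aligned second inferred image
    return torch.cat((flow, img1, warped2), 1)  # fused data, same resolution as inputs
```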
In the input data, the optical flow and the two inferred images all have the same resolution, and the fusion operation does not change the resolution; that is, the fused data also has the same resolution as the input data.
Step B: downsample the fused data to obtain a downsampling result.
Step C: calculate a preliminary optical flow based on the downsampling result.
Step D: upsample the preliminary optical flow back to the resolution of the input data to obtain the optical flow output by the optical flow estimation module.
Steps B, C and D are described together. The downsampling multiple in step B and the upsampling multiple in step D are identical, so the preliminary optical flow obtained in step C can be restored in step D to the resolution of the input data. Referring to FIG. 3, if the downsampling multiple of the i-th optical flow estimation module is Kᵢ, the upsampling multiple of that module is also Kᵢ.
The structure for performing the downsampling operation in the optical flow estimation module may be referred to as a downsampling layer, and the structure for performing the upsampling operation may be referred to as an upsampling layer.
In one implementation, the downsampling layer and the upsampling layer are designed to contain no parameters that need training (though they may include other parameters, such as the downsampling multiple), so that the trained optical flow estimation model generalizes easily to images of different resolutions (i.e., it can process images of different resolutions effectively) without parameter rewriting.
For example, the downsampling layer may use one of bilinear interpolation, bicubic interpolation, nearest-neighbor interpolation, and the like. These interpolation algorithms are computationally simple, support fractional downsampling multiples, and are highly flexible. The upsampling layer upsamples in the manner corresponding to the downsampling layer; for example, if the downsampling layer uses bilinear interpolation, the upsampling layer also uses bilinear interpolation.
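A minimal sketch of such parameter-free sampling layers, assuming bilinear interpolation; note that the multiple may be fractional, and nothing here is trained.

```python
import torch.nn.functional as F

def downsample(x, k):
    # downsample by a multiple k; k may be fractional, e.g. 3.5
    return F.interpolate(x, scale_factor=1.0 / k, mode="bilinear",
                         align_corners=False, recompute_scale_factor=False)

def upsample_to(x, size):
    # upsample back to the exact (height, width) of the input data
    return F.interpolate(x, size=size, mode="bilinear", align_corners=False)
```

One caveat: when a flow field itself is resized, many implementations also rescale its displacement values by the same multiple, since displacements are measured in pixels; the text here leaves this implicit, so the sketch of the full module below includes it only as an assumption.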
Of course, in other implementations, the downsampling layer and upsampling layer may also contain parameters that require training, e.g., the downsampling layer downsamples using convolution, while the upsampling layer upsamples using deconvolution.
Step C realizes the core function of the optical flow estimation module: an optical flow of lower resolution is estimated from the lower-resolution downsampling result, and an optical flow of higher resolution is then obtained through step D. In one implementation, the preliminary optical flow may be estimated with a convolutional neural network, i.e., a neural network including at least one convolutional layer, though other structures are not excluded. Referring to FIG. 3, the convolutional neural network is implemented as N (N > 1) consecutive convolutional layers with stride 1, i.e., the resolution of the downsampling result is not changed; the network structure is simple and the optical flow estimation effect is good. Of course, using other types of neural networks or other algorithms to estimate the preliminary optical flow in step C is not excluded.
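Putting steps A to D together, one optical flow estimation module might look like the sketch below, reusing the fuse, downsample and upsample_to helpers from the earlier sketches; the channel sizes, layer count and the final rescaling of the flow values are illustrative assumptions rather than details fixed by the text.

```python
import torch.nn as nn

class FlowEstimationModule(nn.Module):
    def __init__(self, k, in_ch=2 + 3 + 3, hidden=64, n_layers=4):
        super().__init__()
        self.k = k  # downsampling (and upsampling) multiple K_i
        layers = [nn.Conv2d(in_ch, hidden, 3, stride=1, padding=1), nn.ReLU()]
        for _ in range(n_layers - 2):
            layers += [nn.Conv2d(hidden, hidden, 3, stride=1, padding=1), nn.ReLU()]
        layers += [nn.Conv2d(hidden, 2, 3, stride=1, padding=1)]  # 2-channel flow output
        self.net = nn.Sequential(*layers)

    def forward(self, img1, img2, flow):
        fused = fuse(img1, img2, flow)               # step A: fuse input data
        small = downsample(fused, self.k)            # step B: downsample fused data
        prelim = self.net(small)                     # step C: preliminary optical flow
        out = upsample_to(prelim, fused.shape[-2:])  # step D: back to input resolution
        return out * self.k  # rescale displacements (common practice; assumed here)
```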
In the optical flow estimation model, the downsampling multiple of the first optical flow estimation module is set to the ratio of the resolution of the inferred images to that of the training images, or a value near that ratio, and the downsampling multiples of the optical flow estimation modules are set to decrease sequentially. Note that this "setting" may occur in advance, before the optical flow estimation method is executed, or temporarily, before the optical flow estimation module starts estimation, and so on; the several occurrences of "setting" below are to be understood in the same way and will not be explained again. Since the upsampling multiple always equals the downsampling multiple, only the setting of the downsampling multiple is discussed below.
The optical flow estimation method provided by this embodiment is mainly aimed at the situation where the resolution of the inferred images is not lower than that of the training images (of course, the method can also be used when it is lower), and the description below proceeds mainly on that premise. For example, suppose the resolution of the inferred images is 2560×1440 and the resolution of the training images is 640×480. The resolution ratio of the inferred images to the training images may be defined as the ratio of their widths, e.g., 2560/640 = 4; as the ratio of their heights, e.g., 1440/480 = 3; or as the average of these two ratios, e.g., (4+3)/2 = 3.5; other definitions are not excluded.
As for values around the resolution ratio, there are various definition ways:
For example, it may be the resolution ratio rounded (rounding up, rounding down, rounding to the nearest integer, etc.).
For another example, the value may be a value within one section including the resolution ratio. For example, values within the interval [ x- δ, x+δ ], where x represents the resolution ratio, δ represents the interval width, δ may take a constant or value related to x, such as 0.1x, 0.5x, etc., but δ is not desirable to be too large (e.g., greater than 0.5 x), otherwise certain values within the interval are not desirable to be referred to as values near x; also for example values within the interval x mu, x/mu, where x denotes the resolution ratio and mu denotes the scaling factor, 0 < mu < 1, e.g. 0.5, 0.9 etc. are preferable, but mu is not preferable to be too small (e.g. less than 0.5), otherwise some values within the interval are not preferable to be referred to as values around x.
The decrease of the downsampling multiples referred to above is to be understood as non-strict, i.e., the downsampling multiple of each optical flow estimation module other than the first must not be greater than that of the previous module. In particular, in the inference stage, the case where the downsampling multiples of all optical flow estimation modules are equal does not count as decreasing. For example, with respect to FIG. 2, if the downsampling multiple of optical flow estimation module 1 is 4, the downsampling multiples of the three modules may be combinations such as [4, 2, 1], [4, 2, 2] or [4, 4, 1].
The downsampling multiples may be set manually, computed and set automatically by a computer program, or adjusted manually on the basis of automatic computation, and so on.
The following analyzes why the downsampling multiples take values according to the above requirements:
First, as mentioned, the optical flow output by each optical flow estimation module can be regarded as an estimate of the optical flow between the two frames of inferred images. Since the input data of each module includes the two frames of inferred images and the downsampling multiples decrease module by module, the image information loss caused by downsampling decreases from module to module, and the receptive field corresponding to each module decreases as well. Intuitively, a larger receptive field means the optical flow estimation module can "see" a larger range of the inferred images, and a smaller receptive field means it can "see" only a smaller range.
Thus the earlier optical flow estimation modules (in particular the first one) mainly capture the larger motions of objects in the inferred images, while the later modules mainly capture the finer motions, the quantitative representation of object motion being the optical flow in the inferred images. Since the input data of each module includes the optical flow output by the previous module, the working principle of the whole optical flow estimation model is: first estimate the basic motion of objects in the inferred images, then progressively estimate the motion details, continuously refining the optical flow estimates of the successive modules, thereby realizing a coarse-to-fine, progressive estimation of the optical flow.
Further, since the downsampling multiple of the first optical flow estimation module is set to the ratio (or a value near the ratio) of the resolution of the inferred images (which may be high-resolution images) to that of the training images (which may be low-resolution images), the resolution after downsampling in the first module equals or approaches the resolution of the training images. The first module can therefore process the inferred images well based on the parameters learned from the training images, i.e., estimate the basic motion of objects in the images well, when performing optical flow estimation on the downsampling result. On this basis, the object motion details estimated by the subsequent modules continuously refine the optical flow output by the first module, and a high-precision optical flow estimation result is finally obtained.
A feature of this optical flow estimation method is that the downsampling multiples of the optical flow estimation modules can be adjusted according to the resolution of the inferred images, i.e., the method has strong generalization capability; in the prior art, by contrast, model parameters are generally fixed after training, so images of different resolutions cannot be processed effectively.
It should be appreciated that although the method can accurately estimate optical flow when the training images have low resolution and the inferred images have high resolution, this does not mean the inferred images must be high-definition images (e.g., 1080p, 2K, 4K); they may also be standard-definition images (e.g., 720p) or even low-definition images (e.g., 480p). The method thus has a wide application range and high practical value.
Based on the above embodiments, some setting manners of the downsampling multiple of each optical flow estimation module are described below:
In one implementation, the downsampling multiple of each optical flow estimation module other than the first may be set to: the value obtained by dividing the downsampling multiple of the previous optical flow estimation module by a scaling factor, or a value near that value; for convenience of description the former is hereinafter called the preliminary sampling multiple. The scaling factor is greater than 1 and may take, for example, 1.5, 2 or 4.
Taking fig. 2 as an example, assuming that the downsampling multiple of the optical flow estimation module 1 is set to 4 and the scaling factor is set to 2, the downsampling multiple of the optical flow estimation modules 2 and 3 may be set to 2 and 1, or values around 2 and 1.
Calculating the downsampling multiples from the scaling factor in this way is simple and efficient, and the resulting multiples are distributed evenly, i.e., the optical flow estimation modules capture object motion at different scales, which helps improve optical flow estimation accuracy.
Further, there are several defining ways for values near the preliminary sampling multiple:
for example, the value may be a value obtained by rounding (including up-rounding, down-rounding, etc.) the preliminary sampling multiple.
For another example, it may be a value within an interval containing the preliminary sampling multiple, such as a value within [m-ε, m+ε], where m denotes the preliminary sampling multiple and ε the interval half-width; ε may be a constant or a value related to m, such as 0.1m or 0.5m, but ε should not be too large (e.g., greater than 0.5m), otherwise some values in the interval could not reasonably be called values near m. It may also be a value within [m·β, m/β], where m denotes the preliminary sampling multiple and β a scaling coefficient with 0 < β < 1, e.g., 0.5 or 0.9; β should not be too small (e.g., less than 0.5), otherwise some values in the interval could not reasonably be called values near m.
The interval containing the preliminary sampling multiple is called the fluctuation range of the downsampling multiple. If the fluctuation range is set too small, the setting of the downsampling multiple lacks flexibility; if it is set too large, the effect of the scaling factor used to calculate the multiple is impaired. The fluctuation range therefore needs to be set appropriately, for example according to the examples given above.
In addition, no matter how the downsampling multiple fluctuates around the preliminary sampling multiple, the aforementioned requirement that the multiples decrease must be satisfied first.
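A sketch of deriving the remaining multiples from the first one with a scaling factor s > 1 (rounding to nearby values, as described above, is omitted for brevity):

```python
def multiples(k1, s, n_modules):
    # downsampling multiple of module 1 is k1; each later module divides
    # the previous multiple by the scaling factor s (s > 1)
    ks = [k1]
    for _ in range(n_modules - 1):
        ks.append(ks[-1] / s)
    return ks

print(multiples(4.0, 2.0, 3))  # [4.0, 2.0, 1.0], matching the FIG. 2 example
```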
In one implementation, the downsampling multiples of some or all of the optical flow estimation modules may be set with the motion amplitude of objects in the inferred images also taken into account, in addition to the aforementioned constraints (the decreasing requirement and the setting requirement for the first module). In accordance with the foregoing, different optical flow estimation modules capture object motion of different amplitudes in the inferred images, so downsampling multiples set with the motion amplitude in mind help the modules estimate optical flow more accurately.
As for how to evaluate the motion amplitude of objects in the inferred images, it may be judged manually, for example from the scene: if the camera that captured the inferred images is far from a moving object, then by the principle of perspective the motion amplitude of that object in the inferred images is likely to be small. Of course, the motion amplitude may also be computed by a computer program.
It will be appreciated that this implementation may also be combined with the previous implementation of calculating the downsampling multiple in accordance with a scaling factor.
The following two examples illustrate how the downsampling multiple may be set according to the motion amplitude:
Example 1:
the downsampling multiple of the first optical flow estimation module is set to be positively correlated with the motion amplitude of the object in the inferred image, and the downsampling multiple of other optical flow estimation modules can be set as required, and is not particularly limited.
Alternatively, the resolution ratio of the inferred images to the training images may be calculated first, and the result then adjusted appropriately according to the motion amplitude of objects in the inferred images to obtain the downsampling multiple of the first optical flow estimation module. It should be appreciated that although the ratio calculation result may be adjusted, the multiple obtained after adjustment must still lie near the resolution ratio.
In light of the foregoing, the first optical flow estimation module is primarily used to capture large motions of objects. In example 1, the larger the motion amplitude of objects in the inferred images, the larger the downsampling multiple of the first module, which enlarges its receptive field so that larger-amplitude motion is effectively captured; the smaller the motion amplitude, the smaller the multiple, which shrinks the receptive field so that smaller-amplitude motion is effectively captured. In short, setting the downsampling multiple of the first module to be positively correlated with the motion amplitude of objects in the inferred images matches the module's receptive field to the motion in the images, improving optical flow estimation accuracy.
Example 2:
The downsampling multiple of each optical flow estimation module is set to be positively correlated with the motion amplitude of the object in the inferred image.
Alternatively, the resolution ratio of the inferred images to the training images may be calculated first and adjusted according to the motion amplitude of objects in the inferred images to obtain the downsampling multiple of the first optical flow estimation module, after which the multiples of the remaining modules are computed in turn from a scaling factor. With a fixed scaling factor, if the first module's multiple is positively correlated with the motion amplitude, the remaining modules' multiples are naturally positively correlated with it as well.
In example 2, the larger the motion amplitude of objects in the inferred images, the larger the downsampling multiple of each optical flow estimation module, which enlarges the modules' receptive fields so that larger-amplitude motion is effectively captured; the smaller the motion amplitude, the smaller the multiples, which shrinks the receptive fields so that smaller-amplitude motion is effectively captured. In short, setting the downsampling multiple of each module to be positively correlated with the motion amplitude of objects in the inferred images matches the modules' receptive fields to the motion in the images, improving optical flow estimation accuracy.
In one implementation, the downsampling multiples of some or all of the optical flow estimation modules may also be set according to the computational budget of the optical flow estimation model. The reason is that the larger a module's downsampling multiple, the smaller the computation during optical flow estimation; conversely, the smaller the multiple, the larger the computation. For example, to increase the optical flow estimation speed, the downsampling multiple of each module may be increased appropriately.
It will be appreciated that such an implementation may also be combined with the previous implementation of calculating the downsampling multiple in accordance with a scaling factor and/or the implementation of calculating the downsampling multiple in accordance with the magnitude of the motion of the object.
FIG. 4 illustrates one possible flow of a model training method provided by embodiments of the present application for training the optical flow estimation model in the above embodiments. The method may be performed, but is not limited to, by the electronic device shown in fig. 7, as to the structure of which reference may be made to the explanation below with respect to fig. 7. Referring to fig. 4, the method includes:
step S210: two frames of training images are obtained.
Step S220: two frames of training images are input into an optical flow estimation model, and operation is carried out through a plurality of optical flow estimation modules which are sequentially connected in the optical flow estimation model, so that optical flow between the two frames of training images is output.
The input data of the first optical flow estimation module includes the two frames of training images and an initial optical flow, and the input data of each optical flow estimation module other than the first includes the two frames of training images and the optical flow output by the previous module. The operations performed by each module include: fusing the input data of the module to obtain fused data; downsampling the fused data to obtain a downsampling result; calculating a preliminary optical flow based on the downsampling result; and upsampling the preliminary optical flow back to the resolution of the input data to obtain the optical flow output by the module. The downsampling multiples of the optical flow estimation modules decrease sequentially.
It will be appreciated that steps S210 and S220 are similar to steps S110 and S120 (the inferred image is replaced with a training image), and reference is made to the foregoing for a repetition and will not be explained. However, two differences need to be pointed out:
First, step S220 does not limit how the downsampling multiple of the first optical flow estimation module is set.
Second, the meaning of the term "decreasing" in step S220 is slightly different from that in step S120: in step S120, adjacent optical flow estimation modules having equal downsampling multiples does not count as decreasing (the multiples must strictly decrease), whereas in step S220 it does (the multiples need only be non-increasing).
Step S230: calculating an optical flow estimation loss according to the optical flow output by at least one optical flow estimation module and the optical flow label, and updating the parameters of the optical flow estimation model according to the optical flow estimation loss.
The at least one optical flow estimation module in step S230 may be only the last optical flow estimation module, i.e., the loss is calculated based only on the final optical flow estimation result, which supervises only the overall performance of the optical flow estimation model.
The at least one optical flow estimation module in step S230 may also be all of the optical flow estimation modules, i.e., the loss is calculated based on the final optical flow estimation result and all intermediate estimation results (the total loss being a weighted sum of the losses corresponding to the individual optical flow estimation modules), which supervises the performance of every optical flow estimation module in the model. Calculating the loss in this way helps improve the optical flow estimation accuracy of the model, but increases the cost of computing the loss.
It is understood that the at least one optical flow estimation module in step S230 may also be only a subset of all the optical flow estimation modules. A sketch of these supervision options follows.
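In the minimal sketch below, the L1 form of the per-module loss and the weighting scheme are assumptions for illustration, not the loss mandated by the present application.

```python
import torch.nn.functional as F

def flow_loss(flows, flow_gt, weights=None):
    # flows: list of per-module flow outputs; flows[-1] is the final estimate
    if weights is None:
        # supervise only the last module, i.e. the overall model output
        return F.l1_loss(flows[-1], flow_gt)
    # supervise all (or a subset of) modules: weighted sum of per-module
    # losses; a weight of 0 leaves the corresponding module unsupervised
    return sum(w * F.l1_loss(f, flow_gt) for f, w in zip(flows, weights))
```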
In addition to the optical flow estimation loss mentioned in step S230, other losses may be calculated as required. For example, if the optical flow estimation model is used for a video frame interpolation task, the frame to be interpolated may be computed from the optical flow estimation result and the training images, and an image reconstruction loss may then be calculated between that interpolated frame and the corresponding ground-truth frame of the reference video.
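As a hedged illustration of such an auxiliary loss, the sketch below synthesizes the middle frame by backward-warping both input frames with half of the estimated flow (a linear-motion assumption) and compares it with the ground-truth middle frame. `backward_warp` is the hypothetical helper sketched in the device embodiment further below, and the simple averaging of the two warped views is likewise an assumption.

```python
import torch.nn.functional as F

def interpolation_loss(img1, img2, flow_1to2, mid_gt):
    # flow from the middle frame to img1 / img2 is approximated by -/+ half
    # of the estimated img1 -> img2 flow (linear-motion assumption)
    mid_from_1 = backward_warp(img1, -0.5 * flow_1to2)
    mid_from_2 = backward_warp(img2, 0.5 * flow_1to2)
    mid_pred = 0.5 * (mid_from_1 + mid_from_2)  # naive average of both views
    return F.l1_loss(mid_pred, mid_gt)          # image reconstruction loss
```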
The updating of the model parameters in step S230 may employ the back-propagation algorithm; its specific procedure may refer to the prior art and is not described here.
The optical flow estimation model obtained by training with this model training method can effectively estimate optical flow for high-resolution images, while the training itself can use low-resolution images (i.e., the datasets widely used in the prior art). Of course, the optical flow estimation model may also be used to process images that are not high resolution.
Optionally, the downsampling multiples of the optical flow estimation modules may be changed continually during training, so that the optical flow estimation model learns to process images of different resolutions effectively. Recall that in the inference phase the downsampling multiples of the modules other than the first follow only a loose rule (they merely decrease); enhancing the generalization performance of the model during training therefore weakens, to a certain extent, the requirement on how the downsampling multiples are set. That is, even if the downsampling multiples are not optimally chosen, the optical flow estimation result is not greatly affected.
The manner of changing the downsampling multiples includes random change, change in a fixed manner, and so on. Note, however, that no matter how they are changed, the downsampling multiples of the optical flow estimation modules must still satisfy the decreasing requirement of step S220.
Changing in a fixed manner may mean changing according to preset groups of downsampling multiples. For example, three groups of downsampling multiples [4,2,1], [4,2,2], and [4,1,1] are set in advance; the downsampling multiples of the optical flow estimation modules initially take the first group [4,2,1], take the second group [4,2,2] after one change, and take the third group [4,1,1] after two changes. Alternatively, changing in a fixed manner may mean changing the downsampling multiples according to a set rule. For example, the downsampling multiples are initially [4,4,4], and the rule is that the multiple 4 of optical flow estimation modules 1 and 2 is fixed while the multiple of optical flow estimation module 3 is halved at each change; under this rule, the multiples become [4,4,2] after one change and [4,4,1] after two changes.
For example, optical flow estimation models are often trained in mini-batches: a batch of training images is input into the model each time, the total loss is calculated, and the model parameters are updated once. A group of downsampling multiples may therefore be set for each batch of training images, i.e., the frequency of changing the downsampling multiples is once per batch of training images. Of course, in other implementations the frequency may also be once per frame of training images, once per fixed duration, and so on.
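A minimal sketch of such a per-batch schedule, reusing the preset groups from the example above; the names and preset values are illustrative.

```python
import random

PRESETS = [[4, 2, 1], [4, 2, 2], [4, 1, 1]]  # fixed groups from the example

def multiples_for_batch(batch_idx, mode='fixed'):
    if mode == 'fixed':
        # change in a fixed manner: cycle through the preset groups
        return PRESETS[batch_idx % len(PRESETS)]
    # random change: sample values, then sort so the (non-strict) decreasing
    # requirement of step S220 still holds
    return sorted((random.choice([1, 2, 4]) for _ in range(3)), reverse=True)

# usage in a training loop (sketch):
# for i, (img1, img2, flow_gt) in enumerate(loader):
#     down_ks = multiples_for_batch(i)
#     flows = model(img1, img2, down_ks)
#     ...
```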
Fig. 5 shows a functional block diagram of an optical flow estimation device 300 according to an embodiment of the present application. Referring to fig. 5, the optical flow estimation device 300 includes:
an inferred image acquisition unit 310 for acquiring two frames of inferred images;
an inferred optical flow estimation unit 320, configured to input the two frames of inferred images into an optical flow estimation model, perform an operation through a plurality of optical flow estimation modules sequentially connected in the optical flow estimation model, and output an optical flow between the two frames of inferred images;
wherein the input data of the first optical flow estimation module comprises the two frames of inferred images and the initial optical flow, and the input data of each optical flow estimation module other than the first comprises the two frames of inferred images and the optical flow output by the preceding optical flow estimation module; the operations performed by each optical flow estimation module include: fusing the input data of the optical flow estimation module to obtain fused data; downsampling the fused data to obtain a downsampling result; calculating a preliminary optical flow based on the downsampling result; upsampling the preliminary optical flow back to the resolution of the input data to obtain the optical flow output by the optical flow estimation module; the downsampling multiples of the optical flow estimation modules decrease in sequence, and the downsampling multiple of the first optical flow estimation module is: the ratio of the resolution of the inferred images to the resolution of the training images, or a value near that ratio.
In one implementation of the optical flow estimation device 300, the optical flow estimation module fuses input data of the optical flow estimation module to obtain fused data, including: aligning a second inferred image in the input data to a first inferred image in the input data using optical flow in the input data, resulting in an aligned second inferred image; and fusing the optical flow in the input data, the first inferred image and the aligned second inferred image to obtain fused data.
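A hedged PyTorch sketch of this alignment-and-fusion step follows, using backward warping via `torch.nn.functional.grid_sample`; the helper names are assumptions of the sketch.

```python
import torch
import torch.nn.functional as F

def backward_warp(src, flow):
    # out(x, y) = src(x + flow_x, y + flow_y); flow in pixels, shape (B,2,H,W)
    b, _, h, w = src.shape
    ys, xs = torch.meshgrid(torch.arange(h, device=src.device),
                            torch.arange(w, device=src.device), indexing='ij')
    grid_x = xs.unsqueeze(0) + flow[:, 0]
    grid_y = ys.unsqueeze(0) + flow[:, 1]
    # normalize coordinates to [-1, 1] as grid_sample expects
    grid = torch.stack([2 * grid_x / (w - 1) - 1,
                        2 * grid_y / (h - 1) - 1], dim=-1)
    return F.grid_sample(src, grid, mode='bilinear', align_corners=True)

def fuse(img1, img2, flow):
    # align the second image to the first, then concatenate as fused data
    img2_aligned = backward_warp(img2, flow)
    return torch.cat([flow, img1, img2_aligned], dim=1)
```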
In one implementation of the optical flow estimation device 300, the optical flow estimation module calculates a preliminary optical flow based on the downsampling result, including: processing the downsampling result by using at least one convolution layer to obtain the preliminary optical flow.
In one implementation of the optical flow estimation device 300, the optical flow estimation module downsamples the fused data to obtain a downsampling result, including: downsampling the fused data by using a downsampling layer to obtain the downsampling result; and the optical flow estimation module upsamples the preliminary optical flow back to the resolution of the input data, comprising: upsampling the preliminary optical flow back to the resolution of the input data by using an upsampling layer; wherein the downsampling layer and the upsampling layer do not include parameters requiring training.
In one implementation of the optical flow estimation device 300, the manner in which the downsampling layer downsamples is one of: bilinear interpolation, bicubic interpolation, and nearest-neighbor interpolation; the upsampling layer upsamples in a manner corresponding to that of the downsampling layer.
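As a brief illustration, both layers can be realized with plain interpolation and therefore carry no trainable parameters; using the same mode in both directions mirrors the required correspondence. A minimal sketch under these assumptions:

```python
import torch.nn.functional as F

# parameter-free sampling layers; pick one of 'bilinear', 'bicubic', 'nearest'
# and use the same mode in both directions so the two layers correspond
def downsample(x, k, mode='bilinear'):
    return F.interpolate(x, scale_factor=1.0 / k, mode=mode)

def upsample_to(x, size, mode='bilinear'):
    return F.interpolate(x, size=size, mode=mode)
```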
In one implementation of optical flow estimation device 300, the downsampling multiple of each optical flow estimation module other than the first is: the downsampling multiple of the preceding optical flow estimation module divided by a scaling factor, or a value near that quotient, wherein the scaling factor is greater than 1.
In one implementation of optical flow estimation device 300, a value near the quotient refers to a value between 1/2 times and 2 times the quotient.
In one implementation of optical flow estimation device 300, the downsampling factor of the optical flow estimation module is also related to the motion amplitude of objects in the inferred image.
In one implementation of optical flow estimation device 300, the downsampling multiple of the first optical flow estimation module is positively correlated with the motion amplitude of the object in the inferred image.
In one implementation of optical flow estimation device 300, the downsampling multiple of each optical flow estimation module is positively correlated with the motion amplitude of the object in the inferred image.
The optical flow estimation device 300 according to the embodiment of the present application has been described in the foregoing method embodiments; for brevity, where the device embodiment omits details, refer to the corresponding content of the method embodiments.
Fig. 6 shows a functional block diagram of a model training apparatus 400 according to an embodiment of the present application. Referring to fig. 6, the model training apparatus 400 includes:
a training image acquisition unit 410 for acquiring two frames of training images;
the training optical flow estimation unit 420 is configured to input the two training images into an optical flow estimation model, perform operation through a plurality of optical flow estimation modules sequentially connected in the optical flow estimation model, and output an optical flow between the two training images; wherein the input data of the first optical flow estimation module comprises the two frames of training images and an initial optical flow, and the input data of the optical flow estimation modules except the first one comprises the two frames of training images and the optical flow output by the last optical flow estimation module; the operations performed by each optical flow estimation module include: fusing the input data of the optical flow estimation module to obtain fused data; downsampling the fusion data to obtain a downsampling result; calculating to obtain a preliminary optical flow based on the downsampling result; upsampling the preliminary optical flow back to the resolution of the input data to obtain an optical flow output by the optical flow estimation module; the downsampling multiples of all the optical flow estimation modules are sequentially decreased;
The parameter updating unit 430 is configured to calculate an optical flow estimation loss according to the optical flow output by the at least one optical flow estimation module and the optical flow label, and update the parameters of the optical flow estimation model according to the optical flow estimation loss.
In one implementation of the model training apparatus 400, the apparatus further comprises: and the multiple adjusting module is used for changing the downsampling multiple of each optical flow estimating module in the training process, wherein the mode of changing the downsampling multiple comprises random change or change according to a fixed mode.
The model training apparatus 400 according to the embodiment of the present application has been described in the foregoing method embodiment; for brevity, where the apparatus embodiment omits details, refer to the corresponding content of the method embodiment.
Fig. 7 shows a possible structure of an electronic device 500 according to an embodiment of the present application. Referring to fig. 7, the electronic device 500 includes: processor 510, memory 520, and communication interface 530, which are interconnected and communicate with each other by a communication bus 540 and/or other forms of connection mechanisms (not shown).
The processor 510 includes one or more processors (only one is shown), which may be an integrated circuit chip having signal processing capability. The processor 510 may be a general-purpose processor, including a central processing unit (CPU), a micro controller unit (MCU), a network processor (NP), or another conventional processor; it may also be a special-purpose processor, including a graphics processing unit (GPU), a neural-network processing unit (NPU), a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or discrete hardware components. Moreover, when there are a plurality of processors 510, some of them may be general-purpose processors and the others special-purpose processors.
The memory 520 includes one or more memories (only one is shown), which may be, but are not limited to, random access memory (RAM), read-only memory (ROM), programmable read-only memory (PROM), erasable programmable read-only memory (EPROM), electrically erasable programmable read-only memory (EEPROM), and the like.
Processor 510 and other possible components may access memory 520, read and/or write data therein. In particular, one or more computer program instructions may be stored in memory 520 that may be read and executed by processor 510 to implement the optical flow estimation method and/or the model training method provided by embodiments of the present application.
Communication interface 530 includes one or more (only one shown) that may be used to communicate directly or indirectly with other devices for data interaction. Communication interface 530 may include an interface for wired and/or wireless communication.
It is to be understood that the configuration shown in fig. 7 is illustrative only, and that electronic device 500 may also include more or fewer components than shown in fig. 7, or have a different configuration than shown in fig. 7. The components shown in fig. 7 may be implemented in hardware, software, or a combination thereof. The electronic device 500 may be a physical device, such as a PC, a notebook, a tablet, a cell phone, a server, a smart wearable device, etc., or may be a virtual device, such as a virtual machine, a virtualized container, etc. The electronic device 500 is not limited to a single device, and may be a combination of a plurality of devices or a cluster of a large number of devices.
The embodiment of the present application also provides a computer-readable storage medium storing computer program instructions which, when read and run by a processor of a computer, perform the optical flow estimation method and/or the model training method provided by the embodiments of the present application. For example, the computer-readable storage medium may be implemented as the memory 520 in the electronic device 500 of fig. 7.
The above description is only an example of the present application and is not intended to limit the scope of the present application, and various modifications and variations will be apparent to those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present application should be included in the protection scope of the present application.

Claims (16)

1. A method of optical flow estimation, comprising:
acquiring two frames of inferred images;
inputting the two frames of inferred images into an optical flow estimation model, performing operation through a plurality of optical flow estimation modules which are sequentially connected in the optical flow estimation model, and outputting optical flow between the two frames of inferred images;
wherein the input data of the first optical flow estimation module comprises the two frames of inferred images and an initial optical flow, and the input data of each optical flow estimation module other than the first comprises the two frames of inferred images and the optical flow output by the preceding optical flow estimation module;
the operations performed by each optical flow estimation module include: fusing the input data of the optical flow estimation module to obtain fused data; downsampling the fused data to obtain a downsampling result; calculating a preliminary optical flow based on the downsampling result; and upsampling the preliminary optical flow back to the resolution of the input data to obtain the optical flow output by the optical flow estimation module;
the downsampling multiples of the optical flow estimation modules decrease in sequence, and the downsampling multiple of the first optical flow estimation module is: the ratio of the resolution of the inferred images to the resolution of the training images, or a value near that ratio.
2. The optical flow estimation method according to claim 1, wherein the fusing the input data of the optical flow estimation module to obtain fused data includes:
aligning a second inferred image in the input data to a first inferred image in the input data using optical flow in the input data, resulting in an aligned second inferred image;
and fusing the optical flow in the input data, the first inferred image and the aligned second inferred image to obtain fused data.
3. The optical flow estimation method according to claim 1 or 2, characterized in that the calculating a preliminary optical flow based on the downsampling result includes:
processing the downsampling result by using at least one convolution layer to obtain the preliminary optical flow.
4. The method of optical flow estimation according to claim 1, wherein the downsampling the fused data to obtain downsampled results includes:
downsampling the fused data by using a downsampling layer to obtain the downsampling result;
the upsampling the preliminary optical flow back to the resolution of the input data comprises:
upsampling the preliminary optical flow back to the resolution of the input data using an upsampling layer;
wherein the downsampling layer and the upsampling layer do not include parameters requiring training.
5. The optical flow estimation method of claim 4, wherein the manner in which the downsampling layer downsamples comprises one of: bilinear interpolation, bicubic interpolation, and nearest-neighbor interpolation; and the upsampling layer upsamples in a manner corresponding to that of the downsampling layer.
6. The optical flow estimation method according to claim 1, wherein the downsampling multiple of each optical flow estimation module other than the first is: the downsampling multiple of the preceding optical flow estimation module divided by a scaling factor, or a value near that quotient, wherein the scaling factor is greater than 1.
7. The optical flow estimation method according to claim 6, wherein a value near the quotient refers to a value between 1/2 times and 2 times the quotient.
8. The optical flow estimation method of claim 1, wherein the downsampling factor of the optical flow estimation module is further related to a motion amplitude of an object in the inferred image.
9. The optical flow estimation method of claim 8, wherein the downsampling factor of the first optical flow estimation module is positively correlated with the motion amplitude of objects in the inferred image.
10. The optical flow estimation method of claim 8, wherein the downsampling multiple of each optical flow estimation module is positively correlated with the motion amplitude of objects in the inferred image.
11. A method of model training, comprising:
obtaining two frames of training images;
inputting the two frames of training images into an optical flow estimation model, performing operation through a plurality of optical flow estimation modules which are sequentially connected in the optical flow estimation model, and outputting optical flow between the two frames of training images; wherein the input data of the first optical flow estimation module comprises the two frames of training images and an initial optical flow, and the input data of each optical flow estimation module other than the first comprises the two frames of training images and the optical flow output by the preceding optical flow estimation module; the operations performed by each optical flow estimation module include: fusing the input data of the optical flow estimation module to obtain fused data; downsampling the fused data to obtain a downsampling result; calculating a preliminary optical flow based on the downsampling result; upsampling the preliminary optical flow back to the resolution of the input data to obtain the optical flow output by the optical flow estimation module; the downsampling multiples of all the optical flow estimation modules decrease in sequence;
And calculating an optical flow estimation loss according to the optical flow output by at least one optical flow estimation module and the optical flow label, and updating parameters of the optical flow estimation model according to the optical flow estimation loss.
12. The model training method of claim 11, wherein the method further comprises:
the downsampling factor of each optical flow estimation module is changed in the training process, and the mode of changing the downsampling factor comprises random change or change in a fixed mode.
13. An optical flow estimating device, comprising:
an inferred image acquisition unit configured to acquire two frames of inferred images;
the inferred optical flow estimation unit is used for inputting the two frames of inferred images into an optical flow estimation model, performing operation through a plurality of optical flow estimation modules which are sequentially connected in the optical flow estimation model, and outputting optical flow between the two frames of inferred images;
wherein the input data of the first optical flow estimation module comprises the two frames of inferred images and the initial optical flow, and the input data of each optical flow estimation module other than the first comprises the two frames of inferred images and the optical flow output by the preceding optical flow estimation module;
the operations performed by each optical flow estimation module include: fusing the input data of the optical flow estimation module to obtain fused data; downsampling the fusion data to obtain a downsampling result; calculating to obtain a preliminary optical flow based on the downsampling result; upsampling the preliminary optical flow back to the resolution of the input data to obtain an optical flow output by the optical flow estimation module;
the downsampling multiples of the optical flow estimation modules decrease in sequence, and the downsampling multiple of the first optical flow estimation module is: the ratio of the resolution of the inferred images to the resolution of the training images, or a value near that ratio.
14. A model training device, comprising:
the training image acquisition unit is used for acquiring two frames of training images;
the training optical flow estimation unit is used for inputting the two frames of training images into an optical flow estimation model, performing operation through a plurality of optical flow estimation modules which are sequentially connected in the optical flow estimation model, and outputting optical flow between the two frames of training images; wherein the input data of the first optical flow estimation module comprises the two frames of training images and an initial optical flow, and the input data of each optical flow estimation module other than the first comprises the two frames of training images and the optical flow output by the preceding optical flow estimation module; the operations performed by each optical flow estimation module include: fusing the input data of the optical flow estimation module to obtain fused data; downsampling the fused data to obtain a downsampling result; calculating a preliminary optical flow based on the downsampling result; upsampling the preliminary optical flow back to the resolution of the input data to obtain the optical flow output by the optical flow estimation module; the downsampling multiples of all the optical flow estimation modules decrease in sequence;
And the parameter updating unit is used for calculating the optical flow estimation loss according to the optical flow output by the at least one optical flow estimation module and the optical flow label and updating the parameters of the optical flow estimation model according to the optical flow estimation loss.
15. A computer readable storage medium, having stored thereon computer program instructions which, when read and executed by a processor, perform the method of any of claims 1-12.
16. An electronic device comprising a memory and a processor, the memory having stored therein computer program instructions that, when read and executed by the processor, perform the method of any of claims 1-12.
CN202110597248.0A 2021-05-28 2021-05-28 Optical flow estimation method, model training method and corresponding devices Active CN113538525B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110597248.0A CN113538525B (en) 2021-05-28 2021-05-28 Optical flow estimation method, model training method and corresponding devices

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110597248.0A CN113538525B (en) 2021-05-28 2021-05-28 Optical flow estimation method, model training method and corresponding devices

Publications (2)

Publication Number Publication Date
CN113538525A CN113538525A (en) 2021-10-22
CN113538525B true CN113538525B (en) 2023-12-05

Family

ID=78095040

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110597248.0A Active CN113538525B (en) 2021-05-28 2021-05-28 Optical flow estimation method, model training method and corresponding devices

Country Status (1)

Country Link
CN (1) CN113538525B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114037740A (en) * 2021-11-09 2022-02-11 北京字节跳动网络技术有限公司 Image data stream processing method and device and electronic equipment

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111311490A (en) * 2020-01-20 2020-06-19 陕西师范大学 Video super-resolution reconstruction method based on multi-frame fusion optical flow
CN111696035A (en) * 2020-05-21 2020-09-22 电子科技大学 Multi-frame image super-resolution reconstruction method based on optical flow motion estimation algorithm
WO2021085743A1 (en) * 2019-10-31 2021-05-06 전자부품연구원 Method and device for high-speed frame rate conversion of high-resolution video

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10268901B2 (en) * 2015-12-04 2019-04-23 Texas Instruments Incorporated Quasi-parametric optical flow estimation
US10783611B2 (en) * 2018-01-02 2020-09-22 Google Llc Frame-recurrent video super-resolution

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2021085743A1 (en) * 2019-10-31 2021-05-06 전자부품연구원 Method and device for high-speed frame rate conversion of high-resolution video
CN111311490A (en) * 2020-01-20 2020-06-19 陕西师范大学 Video super-resolution reconstruction method based on multi-frame fusion optical flow
CN111696035A (en) * 2020-05-21 2020-09-22 电子科技大学 Multi-frame image super-resolution reconstruction method based on optical flow motion estimation algorithm

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Satellite video super-resolution reconstruction based on motion segmentation and optical flow estimation; Bu Lijing; Zheng Xinjie; Zhang Zhengpeng; Xiao Yiming; Science of Surveying and Mapping (No. 12); full text *

Also Published As

Publication number Publication date
CN113538525A (en) 2021-10-22

Similar Documents

Publication Publication Date Title
CN112104830B (en) Video frame insertion method, model training method and corresponding device
CN110070511B (en) Image processing method and device, electronic device and storage medium
CN113542651B (en) Model training method, video frame inserting method and corresponding devices
CN106600536A (en) Video imager super-resolution reconstruction method and apparatus
WO2021238500A1 (en) Panoramic video frame interpolation method and device, and corresponding storage medium
JP2021507388A (en) Instance segmentation methods and devices, electronics, programs and media
CN112308866B (en) Image processing method, device, electronic equipment and storage medium
Pan et al. Towards bidirectional arbitrary image rescaling: Joint optimization and cycle idempotence
CN113538525B (en) Optical flow estimation method, model training method and corresponding devices
CN114640885B (en) Video frame inserting method, training device and electronic equipment
CN114298900A (en) Image super-resolution method and electronic equipment
CN113240576A (en) Method and device for training style migration model, electronic equipment and storage medium
EP4345771A1 (en) Information processing method and apparatus, and computer device and storage medium
CN111652245A (en) Vehicle contour detection method and device, computer equipment and storage medium
CN115908116A (en) Image processing method, device, equipment and storage medium
Tuli et al. Structure preserving loss function for single image super resolution
CN113469880A (en) Image splicing method and device, storage medium and electronic equipment
Yusiong et al. Unsupervised monocular depth estimation of driving scenes using siamese convolutional LSTM networks
US20230326044A1 (en) Splatting-based Digital Image Synthesis
US11908155B2 (en) Efficient pose estimation through iterative refinement
CN116342434B (en) Image processing method, device, equipment and storage medium
CN117372261B (en) Resolution reconstruction method, device, equipment and medium based on convolutional neural network
CN116051380B (en) Video super-resolution processing method and electronic equipment
US20240135747A1 (en) Information processing method, computer device, and storage medium
CN116029940A (en) Image noise reduction method and device and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant