CN111626308A - Real-time optical flow estimation method based on lightweight convolutional neural network - Google Patents

Real-time optical flow estimation method based on lightweight convolutional neural network

Info

Publication number
CN111626308A
CN111626308A
Authority
CN
China
Prior art keywords
optical flow
cost
frame
pyramid
real
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010322368.5A
Other languages
Chinese (zh)
Other versions
CN111626308B (en)
Inventor
孔令通
杨杰
黄晓霖
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Jiaotong University
Original Assignee
Shanghai Jiaotong University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Jiaotong University filed Critical Shanghai Jiaotong University
Priority to CN202010322368.5A priority Critical patent/CN111626308B/en
Publication of CN111626308A publication Critical patent/CN111626308A/en
Application granted granted Critical
Publication of CN111626308B publication Critical patent/CN111626308B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • G06V10/44Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/25Fusion techniques
    • G06F18/253Fusion techniques of extracted features
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/084Backpropagation, e.g. using gradient descent

Abstract

The invention discloses a real-time optical flow estimation method based on a lightweight convolutional neural network, which comprises the following steps: given two adjacent frames of images, constructing multi-scale feature pyramids with shared parameters; on the basis of the constructed feature pyramids, constructing a U-shaped network structure for the first frame image by using deconvolution operations to perform multi-scale information fusion; initializing the lowest-resolution optical flow field to zero, and performing a deformation operation based on bilinear sampling on the second frame matching features after upsampling the optical flow field estimated at the second-lowest resolution; performing inner-product-based local similarity calculation on the first frame features and the deformed second frame features to construct the matching cost, and performing cost aggregation; taking the multi-scale features, the upsampled optical flow field and the cost-aggregated matching cost features as the input of an optical flow regression network, and estimating the optical flow field at the current resolution; and repeating until the optical flow field at the highest resolution is estimated. With the method, optical flow estimation is more accurate, and the model is lightweight, efficient, real-time and fast.

Description

Real-time optical flow estimation method based on lightweight convolutional neural network
Technical Field
The invention relates to the technical field of computer vision, in particular to a real-time optical flow estimation method based on a lightweight convolutional neural network.
Background
Optical flow estimation is a fundamental research task in computer vision and serves as a bridge between images and videos. Its core idea is, given two consecutive frames, to estimate the correspondence of every pixel; this can also be understood approximately as the projection of the 3D motion field of objects onto the 2D image plane. Optical flow plays an important role in behavior understanding, video processing, motion prediction, multi-view 3D reconstruction, automatic driving, and simultaneous localization and mapping (SLAM). Therefore, estimating optical flow (especially dense optical flow) accurately and quickly is important in the field of computer vision.
Traditional optical flow estimation methods are based on the brightness-constancy assumption, introduce prior knowledge such as local smoothness, and solve the problem by constructing an energy function with regularization constraints and applying a variational optimization strategy. Their disadvantages are slow running speed and poor estimation of large displacements.
Block-matching-based methods can obtain sparse optical flow in the non-occluded regions of an image and then fill the missing parts through an interpolation algorithm to construct dense optical flow. Their disadvantage is that the block matching algorithm involves random initialization and random search, so the result depends on the random initial values and is not very stable; moreover, the large number of search and matching operations increases the time overhead.
Existing deep-learning-based methods construct only an image pyramid or a single feature pyramid; in contrast, the present method fuses multi-scale features by constructing a U-shaped network structure, so that the matching features carry global context and the robustness of the algorithm is improved. In addition, existing deep learning methods use the matching cost directly as the input of the optical flow regression network, but its dynamic range is inconsistent with that of the upsampled optical flow field features from the previous stage, which degrades performance.
Chinese patent application No. 201710731234.7, entitled "Dense optical flow estimation method and device", discloses a dense optical flow estimation method and apparatus. However, it still relies on traditional methods and cannot perform fast, real-time inference.
Therefore, it is urgently needed to provide a lightweight, efficient, real-time and fast convolutional neural network for full-scene dense optical flow estimation.
Disclosure of Invention
Aiming at the problems in the prior art, the invention provides a real-time optical flow estimation method based on a lightweight convolutional neural network, which performs cost aggregation after computing the matching cost, refining the matching cost while adjusting the dynamic range of the output, thereby improving network performance and outperforming existing deep learning methods in parameter count, inference speed and model accuracy.
In order to solve the technical problems, the invention is realized by the following technical scheme:
the invention provides a real-time optical flow estimation method based on a lightweight convolutional neural network, which comprises the following steps of:
s11: given two adjacent frames of images, extracting hierarchical image features by using a parameter-shared convolutional neural network to construct a first frame feature pyramid and a second frame feature pyramid;
s12: on the basis of the feature pyramid constructed in S11, constructing a U-shaped network structure for the first frame image by using deconvolution operations to perform multi-scale information fusion, obtaining multi-scale features;
s13: initializing the lowest-resolution optical flow field to zero, and performing a bilinear-sampling-based deformation operation on the second frame matching features after upsampling the optical flow field estimated at the second-lowest resolution;
s14: performing inner-product-based local similarity calculation between the features of the first frame feature pyramid and the deformed second frame features obtained in S13, constructing the matching cost, and performing cost aggregation;
s15: taking the multi-scale features constructed in S12, the upsampled optical flow field from S13 and the cost-aggregated matching cost features from S14 as the input of an optical flow regression network, and estimating the optical flow field at the current resolution;
s16: repeating S13 to S15 until the optical flow field at the highest resolution is estimated.
Preferably, the S11 specifically includes:
given two adjacent input images $I_1, I_2$, multi-scale image features are extracted by a pyramid network, and the first frame feature pyramid and the second frame feature pyramid are constructed:
$\{F_1^k\}_{k=1}^{6}, \quad \{F_2^k\}_{k=1}^{6}$
where $F_1^k$ is the first frame image feature at the k-th level, $F_2^k$ is the second frame image feature at the k-th level, and k denotes the scale level, k = 1, 2, …, 6; level 1 represents 1/2 of the original resolution and level 6 represents 1/64 of the original resolution.
Preferably, the S12 specifically includes:
for the first frame feature pyramid, the pyramid feature of the (k+1)-th level, $F_1^{k+1}$, is upsampled to the k-th level spatial resolution by a deconvolution operation and is concatenated and convolved with the original k-th level pyramid feature $F_1^k$, obtaining the k-th level semantic feature $\tilde{F}_1^k$ that fuses multi-scale information.
Preferably, the S13 specifically includes:
the optical flow field $\mathrm{flow}_{k+1}$ estimated at the (k+1)-th level is spatially upsampled by a factor of 2 to obtain the initial optical flow $\mathrm{Up}_2(\mathrm{flow}_{k+1})$ of the k-th level; using $\mathrm{Up}_2(\mathrm{flow}_{k+1})$, the k-th level pyramid feature of the second frame image, $F_2^k$, is subjected to a deformation operation based on bilinear sampling, obtaining the deformed target feature $\tilde{F}_2^k$.
Preferably, the S14 specifically includes:
s141: calculating a matching cost:
$c^k(\mathbf{x}, \mathbf{d}) = \left\langle F_1^k(\mathbf{x}),\ \tilde{F}_2^k(\mathbf{x} + \mathbf{d}) \right\rangle$
where $\langle \cdot , \cdot \rangle$ denotes the inner product, $\mathbf{x}$ denotes the two-dimensional spatial position coordinate in the first frame feature, $\mathbf{d}$ denotes the two-dimensional search offset at $\mathbf{x}$, and the search radius is R, so that $\mathbf{d}$ lies in the square region $\{-R, …, R\} \times \{-R, …, R\}$ of size $(2R+1) \times (2R+1)$;
s142: a 3 × 3 convolution is applied to the matching cost $c^k(\mathbf{x}, \mathbf{d})$ to obtain the cost-aggregated matching cost feature $\tilde{c}^k(\mathbf{x}, \mathbf{d})$.
Compared with the prior art, the invention has the following advantages:
(1) according to the real-time optical flow estimation method based on the lightweight convolutional neural network, multi-scale features are obtained by the multi-scale information fusion in S12; compared with the traditional approach of constructing only an image pyramid or a single feature pyramid, the fused feature pyramid has stronger expressive power: it takes both low-level texture information and multi-scale semantic information into account, so the full-scene dense optical flow field can be estimated accurately;
(2) according to the real-time optical flow estimation method based on the lightweight convolutional neural network, the coarse-to-fine deformation operation based on bilinear sampling applied to the second frame matching features in S13 shortens the spatial distance of large displacements, alleviates the challenge brought by large motion, and facilitates residual estimation;
(3) according to the real-time optical flow estimation method based on the lightweight convolutional neural network, through the cost aggregation in S14, the original inner-product-based matching cost gains a certain adaptability compared with the traditional strategy, thereby improving network performance;
(4) according to the real-time optical flow estimation method based on the lightweight convolutional neural network, the pieces of information cascaded in S15 are used as the input of the optical flow regression network; different from the prior art, the semantic information is provided by the fused multi-scale features rather than the original pyramid features, which enlarges the global receptive field of the network and reduces mismatches. In addition, the cost-aggregated features replace the original matching cost as input, which accelerates network convergence and improves model accuracy;
(5) according to the real-time optical flow estimation method based on the lightweight convolutional neural network, a coarse-to-fine pyramid estimation scheme is adopted for the optical flow estimation in S16; specifically, the relative displacement of large motion is small at low-resolution pyramid levels, which reduces the search radius R required during matching, so compared with conventional methods the method has a large estimation dynamic range and a fast inference speed;
(6) The real-time optical flow estimation method based on the lightweight convolutional neural network provided by the invention is light in weight, efficient, real-time and rapid, can be deployed in mobile computing equipment and has strong practicability.
Of course, it is not necessary for any product in which the invention is practiced to achieve all of the above-described advantages at the same time.
Drawings
Embodiments of the invention are further described below with reference to the accompanying drawings:
FIG. 1 is a flow chart of a method for estimating a real-time optical flow based on a lightweight convolutional neural network according to an embodiment of the present invention;
FIG. 2 is a network structure diagram of a method for estimating real-time optical flow based on a lightweight convolutional neural network according to an embodiment of the present invention;
FIG. 3a is a first frame image according to an embodiment of the present invention;
FIG. 3b is a second frame image according to an embodiment of the present invention;
FIG. 3c is the dense optical flow estimation result obtained by performing real-time optical flow estimation on FIGS. 3a and 3b using a method according to an embodiment of the present invention;
FIG. 4a is a first frame image according to another embodiment of the present invention;
FIG. 4b is a second frame image according to another embodiment of the present invention;
FIG. 4c is the dense optical flow estimation result obtained by performing real-time optical flow estimation on FIGS. 4a and 4b using a method according to an embodiment of the present invention;
FIG. 5a is a first frame image according to another embodiment of the present invention;
FIG. 5b is a second frame image according to another embodiment of the present invention;
FIG. 5c is the dense optical flow estimation result obtained by performing real-time optical flow estimation on FIGS. 5a and 5b using a method according to an embodiment of the present invention;
FIG. 6a is a first frame image according to another embodiment of the present invention;
FIG. 6b is a second frame image according to another embodiment of the present invention;
FIG. 6c is the dense optical flow estimation result obtained by performing real-time optical flow estimation on FIGS. 6a and 6b using a method according to an embodiment of the present invention;
FIG. 7a is a first frame image according to another embodiment of the present invention;
FIG. 7b is a second frame image according to another embodiment of the present invention;
FIG. 7c is the dense optical flow estimation result obtained by performing real-time optical flow estimation on FIGS. 7a and 7b using a method according to an embodiment of the present invention;
FIG. 8 is a comparison of the method of the present invention with existing deep models in terms of parameter count and inference speed.
Detailed Description
The present invention will be described in detail below with reference to specific embodiments. The following embodiments will assist those skilled in the art in further understanding the invention, but do not limit the invention in any way. It should be noted that variations and modifications can be made by persons skilled in the art without departing from the spirit of the invention, all of which fall within the scope of the present invention.
Fig. 1 is a flowchart of a method for estimating a real-time optical flow based on a lightweight convolutional neural network according to an embodiment of the present invention.
Referring to fig. 1, the method for estimating a real-time optical flow based on a lightweight convolutional neural network of the present embodiment includes the following steps:
s11: given two adjacent frames of images, extracting hierarchical image features by using a parameter-shared convolutional neural network to construct a first frame feature pyramid and a second frame feature pyramid;
s12: on the basis of the feature pyramid constructed in S11, constructing a U-shaped network structure for the first frame image by using deconvolution operations to perform multi-scale information fusion, obtaining multi-scale features;
s13: initializing the lowest-resolution optical flow field to zero, and performing a bilinear-sampling-based deformation operation on the second frame matching features after upsampling the optical flow field estimated at the second-lowest resolution;
s14: performing inner-product-based local similarity calculation between the features of the first frame feature pyramid and the deformed second frame features obtained in S13, constructing the matching cost, and performing cost aggregation;
s15: taking the multi-scale features constructed in S12, the upsampled optical flow field from S13 and the cost-aggregated matching cost features from S14 as the input of an optical flow regression network, and estimating the optical flow field at the current resolution;
s16: repeating S13 to S15 until the optical flow field at the highest resolution is estimated.
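Read together, steps S13 to S16 describe a coarse-to-fine refinement loop. The following PyTorch sketch outlines one way such a loop could be organised; here warp, matching_cost, aggregate and regressors stand for the operations detailed in the following paragraphs, and the level indexing and the doubling of flow values during upsampling are illustrative assumptions rather than details fixed by this description.

```python
import torch
import torch.nn.functional as F

def coarse_to_fine_flow(fused_feats1, feats2, regressors, warp, matching_cost, aggregate):
    """Sketch of the S13-S16 loop.  fused_feats1[k]: fused first-frame feature at level k;
    feats2[k]: second-frame pyramid feature at level k; regressors[k]: per-level optical
    flow regression network.  Levels are processed from coarsest to finest."""
    flow = None
    for k in sorted(fused_feats1.keys(), reverse=True):          # e.g. 6, 5, 4, 3, 2
        f1, f2 = fused_feats1[k], feats2[k]
        if flow is None:
            # S13: the lowest-resolution optical flow field is initialised to zero
            up_flow = f1.new_zeros(f1.shape[0], 2, f1.shape[2], f1.shape[3])
        else:
            # upsample the flow estimated at the next-lower resolution (x2 spatially;
            # scaling the flow values by 2 is a common convention assumed here)
            up_flow = 2.0 * F.interpolate(flow, scale_factor=2,
                                          mode="bilinear", align_corners=True)
        f2_warped = warp(f2, up_flow)                            # S13: bilinear-sampling warp
        cost = aggregate(matching_cost(f1, f2_warped))           # S14: matching cost + aggregation
        net_in = torch.cat([f1, up_flow, cost], dim=1)           # S15: regression-network input
        flow = regressors[k](net_in)                             # optical flow field at level k
    return flow                                                  # S16: finest estimated flow
```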
In an embodiment, S11 specifically includes:
given two adjacent input images $I_1, I_2$, multi-scale image features are extracted by a pyramid network, and the first frame feature pyramid and the second frame feature pyramid are constructed:
$\{F_1^k\}_{k=1}^{6}, \quad \{F_2^k\}_{k=1}^{6}$
where $F_1^k$ is the first frame image feature at the k-th level, $F_2^k$ is the second frame image feature at the k-th level, and k denotes the scale level, k = 1, 2, …, 6; level 1 represents 1/2 of the original resolution and level 6 represents 1/64 of the original resolution.
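A minimal PyTorch sketch of such a parameter-shared feature pyramid is given below; the six levels follow the description, while the channel widths and the exact layer composition are assumptions chosen only for illustration.

```python
import torch
import torch.nn as nn

class FeaturePyramid(nn.Module):
    """Parameter-shared encoder: the same weights process both frames.  Each level
    halves the spatial resolution, yielding features at 1/2, 1/4, ..., 1/64."""
    def __init__(self, channels=(16, 32, 64, 96, 128, 192)):   # widths are illustrative
        super().__init__()
        self.levels = nn.ModuleList()
        in_ch = 3
        for out_ch in channels:
            self.levels.append(nn.Sequential(
                nn.Conv2d(in_ch, out_ch, 3, stride=2, padding=1),
                nn.LeakyReLU(0.1, inplace=True),
                nn.Conv2d(out_ch, out_ch, 3, stride=1, padding=1),
                nn.LeakyReLU(0.1, inplace=True),
            ))
            in_ch = out_ch

    def forward(self, img):
        feats, x = [], img
        for level in self.levels:
            x = level(x)
            feats.append(x)            # feats[k-1] has 1/2**k of the input resolution
        return feats

# usage: the SAME module (shared parameters) is applied to both frames
pyramid = FeaturePyramid()
I1, I2 = torch.randn(1, 3, 256, 256), torch.randn(1, 3, 256, 256)
F1_pyr, F2_pyr = pyramid(I1), pyramid(I2)    # two 6-level feature pyramids
```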
S12 specifically includes:
for the first frame feature pyramid, the pyramid feature of the (k+1)-th level, $F_1^{k+1}$, is upsampled to the k-th level spatial resolution by a deconvolution operation and is concatenated and convolved with the original k-th level pyramid feature $F_1^k$, obtaining the k-th level semantic feature $\tilde{F}_1^k$ that fuses multi-scale information.
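One possible PyTorch sketch of this deconvolution-based fusion step is shown below; the kernel sizes and channel counts are illustrative assumptions, not values taken from the description.

```python
import torch
import torch.nn as nn

class FusionBlock(nn.Module):
    """Upsamples the level-(k+1) feature with a deconvolution, concatenates it with the
    original level-k pyramid feature, and convolves to obtain the fused level-k feature."""
    def __init__(self, coarse_ch, fine_ch, out_ch):
        super().__init__()
        self.deconv = nn.ConvTranspose2d(coarse_ch, fine_ch, kernel_size=4, stride=2, padding=1)
        self.conv = nn.Conv2d(fine_ch * 2, out_ch, kernel_size=3, padding=1)
        self.act = nn.LeakyReLU(0.1, inplace=True)

    def forward(self, coarse_feat, fine_feat):
        up = self.act(self.deconv(coarse_feat))                  # to level-k spatial resolution
        return self.act(self.conv(torch.cat([up, fine_feat], dim=1)))

# usage on two adjacent pyramid levels (channel counts are assumptions)
block = FusionBlock(coarse_ch=192, fine_ch=128, out_ch=128)
f_coarse = torch.randn(1, 192, 8, 8)      # feature at level k+1
f_fine = torch.randn(1, 128, 16, 16)      # original pyramid feature at level k
fused_k = block(f_coarse, f_fine)         # fused multi-scale feature at level k
```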
S13 specifically includes:
the optical flow field $\mathrm{flow}_{k+1}$ estimated at the (k+1)-th level is spatially upsampled by a factor of 2 to obtain the initial optical flow $\mathrm{Up}_2(\mathrm{flow}_{k+1})$ of the k-th level; using $\mathrm{Up}_2(\mathrm{flow}_{k+1})$, the k-th level pyramid feature of the second frame image, $F_2^k$, is subjected to a deformation operation based on bilinear sampling, obtaining the deformed target feature $\tilde{F}_2^k$.
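A compact PyTorch sketch of the bilinear-sampling deformation (warping) and the 2x flow upsampling it operates on could look as follows; the use of grid_sample and the doubling of the flow magnitudes when the resolution doubles are common implementation conventions assumed here rather than details stated in the text.

```python
import torch
import torch.nn.functional as F

def warp(feat, flow):
    """Bilinear-sampling warp: samples `feat` (second-frame feature) at positions
    displaced by `flow`, using torch.nn.functional.grid_sample."""
    _, _, h, w = feat.shape
    ys, xs = torch.meshgrid(torch.arange(h, device=feat.device),
                            torch.arange(w, device=feat.device), indexing="ij")
    base = torch.stack((xs, ys), dim=0).float().unsqueeze(0)      # (1, 2, H, W) pixel grid
    pos = base + flow                                             # displaced sampling positions
    # normalise to [-1, 1], the coordinate convention expected by grid_sample
    gx = 2.0 * pos[:, 0] / max(w - 1, 1) - 1.0
    gy = 2.0 * pos[:, 1] / max(h - 1, 1) - 1.0
    grid = torch.stack((gx, gy), dim=3)                           # (B, H, W, 2)
    return F.grid_sample(feat, grid, mode="bilinear", align_corners=True)

# usage at level k: the level-(k+1) flow is upsampled 2x; scaling the flow values by 2
# when the resolution doubles is a common convention assumed here
flow_k1 = torch.zeros(1, 2, 8, 8)
up_flow = 2.0 * F.interpolate(flow_k1, scale_factor=2, mode="bilinear", align_corners=True)
F2_k = torch.randn(1, 128, 16, 16)
F2_warped = warp(F2_k, up_flow)
```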
S14 specifically includes:
s141: calculating a matching cost:
$c^k(\mathbf{x}, \mathbf{d}) = \left\langle F_1^k(\mathbf{x}),\ \tilde{F}_2^k(\mathbf{x} + \mathbf{d}) \right\rangle$
where $\langle \cdot , \cdot \rangle$ denotes the inner product, $\mathbf{x}$ denotes the two-dimensional spatial position coordinate in the first frame feature, $\mathbf{d}$ denotes the two-dimensional search offset at $\mathbf{x}$, and the search radius is R, so that $\mathbf{d}$ lies in the square region $\{-R, …, R\} \times \{-R, …, R\}$ of size $(2R+1) \times (2R+1)$;
s142: a 3 × 3 convolution is applied to the matching cost $c^k(\mathbf{x}, \mathbf{d})$ to obtain the cost-aggregated matching cost feature $\tilde{c}^k(\mathbf{x}, \mathbf{d})$.
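The matching-cost construction and the 3 × 3 cost aggregation of S14 could be sketched in PyTorch as follows; the concrete search radius, the normalisation of the inner product by the channel count, and the output channel count of the aggregation convolution are assumptions made only for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def matching_cost(f1, f2_warped, radius=4):
    """Inner-product local cost volume: for each pixel x and each offset d with
    |d|_inf <= radius, compute <F1(x), F2_warped(x + d)>.
    Returns a (B, (2R+1)^2, H, W) cost tensor; dividing by the channel count
    is a normalisation convention assumed here."""
    _, c, h, w = f1.shape
    padded = F.pad(f2_warped, [radius, radius, radius, radius])
    costs = []
    for dy in range(2 * radius + 1):
        for dx in range(2 * radius + 1):
            shifted = padded[:, :, dy:dy + h, dx:dx + w]
            costs.append((f1 * shifted).sum(dim=1, keepdim=True) / c)
    return torch.cat(costs, dim=1)

# cost aggregation: a 3x3 convolution applied to the raw matching cost
radius = 4                                                     # illustrative search radius
aggregate = nn.Conv2d((2 * radius + 1) ** 2, (2 * radius + 1) ** 2, kernel_size=3, padding=1)

f1 = torch.randn(1, 128, 16, 16)
f2_warped = torch.randn(1, 128, 16, 16)
raw_cost = matching_cost(f1, f2_warped, radius)                # (1, 81, 16, 16)
aggregated_cost = aggregate(raw_cost)                          # cost features after aggregation
```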
In one embodiment, the optical flow network structure shown in fig. 2 is first constructed using any deep learning framework; for example, the proposed network architecture can be implemented using the PyTorch framework.
Then, a forward propagation procedure as shown in fig. 1 is constructed. The network outputs optical flow fields at 5 resolution levels (1/4, 1/8, 1/16, 1/32 and 1/64 of the input resolution) and is trained end to end with the following multi-scale loss function:
$L = \sum_{l=2}^{6} \alpha_l \sum_{\mathbf{x}} \left\| \mathrm{flow}_l(\mathbf{x}) - \mathrm{flow}_l^{GT}(\mathbf{x}) \right\|_2$
where $\alpha_6 = 0.32$, $\alpha_5 = 0.08$, $\alpha_4 = 0.02$, $\alpha_3 = 0.01$ and $\alpha_2 = 0.005$ are the weighting coefficients of the per-level losses, $\mathrm{flow}_l(\mathbf{x})$ denotes the optical flow field estimated by the l-th level network, $\mathrm{flow}_l^{GT}(\mathbf{x})$ denotes the supervision signal obtained by downsampling the ground-truth optical flow to the corresponding level resolution, and $\|\cdot\|_2$ denotes the 2-norm.
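A PyTorch sketch of this multi-scale loss is given below, under the assumption that the ground-truth flow is downsampled with bilinear interpolation and that its values are rescaled to each level's resolution; the weighting coefficients are those listed above.

```python
import torch
import torch.nn.functional as F

def multiscale_loss(flow_preds, flow_gt, weights=None):
    """Weighted sum of per-level 2-norm flow errors.  `flow_preds` maps level l (2..6)
    to the predicted flow at that level; `flow_gt` is the full-resolution ground truth.
    Rescaling the ground-truth flow values when downsampling is an assumed convention."""
    if weights is None:
        weights = {6: 0.32, 5: 0.08, 4: 0.02, 3: 0.01, 2: 0.005}
    total = flow_gt.new_zeros(())
    for lvl, pred in flow_preds.items():
        scale = pred.shape[-1] / flow_gt.shape[-1]               # e.g. 1/64 ... 1/4
        gt = F.interpolate(flow_gt, size=pred.shape[-2:],
                           mode="bilinear", align_corners=False) * scale
        total = total + weights[lvl] * torch.norm(pred - gt, p=2, dim=1).sum()
    return total
```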
Next, the proposed model is trained with supervision using the multi-scale loss function of the previous step on the FlyingChairs and FlyingThings3D synthetic datasets. In the FlyingChairs training phase, the initial learning rate is set to lr = 1e-4 for 600k iterations, and the learning rate is halved at 300k, 400k and 500k iterations. The model is then fine-tuned on the FlyingThings3D dataset with the initial learning rate set to lr = 1e-5 for a total of 500k iterations, halving the learning rate at 200k, 300k and 400k iterations. After these two training stages, the proposed model can be fine-tuned on other synthetic or real-scene datasets and finally deployed for use. During training, several data augmentation strategies are used, including random mirroring, random rotation, random scaling, random color jittering and random cropping.
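The two-stage learning-rate schedule described above could be configured in PyTorch roughly as follows; the choice of the Adam optimiser is an assumption, and only the learning rates and the milestone iteration counts come from the description (scheduler.step() would be called once per training iteration).

```python
import torch
import torch.nn as nn

# stand-in for the full optical-flow network, used here only to make the example runnable
model = nn.Conv2d(3, 2, 3, padding=1)

# FlyingChairs stage: lr = 1e-4 for 600k iterations, halved at 300k, 400k and 500k
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
scheduler = torch.optim.lr_scheduler.MultiStepLR(
    optimizer, milestones=[300_000, 400_000, 500_000], gamma=0.5)

# FlyingThings3D fine-tuning stage: lr = 1e-5 for 500k iterations, halved at 200k, 300k and 400k
ft_optimizer = torch.optim.Adam(model.parameters(), lr=1e-5)
ft_scheduler = torch.optim.lr_scheduler.MultiStepLR(
    ft_optimizer, milestones=[200_000, 300_000, 400_000], gamma=0.5)
```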
Finally, once the model is trained and used in practice, the highest-resolution (1/4-resolution) optical flow field among the 5 levels is upsampled to the resolution of the original input image as the final estimation result of the network.
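A short sketch of this final upsampling step; rescaling the flow values by the upsampling factor, so that displacements are expressed in full-resolution pixels, is an assumed convention rather than something stated explicitly above.

```python
import torch
import torch.nn.functional as F

def final_flow(flow_quarter, full_size):
    """Upsample the finest (1/4-resolution) flow field to the original input resolution.
    Multiplying the flow values by 4 expresses displacements in full-resolution pixels
    (a common convention, assumed here)."""
    return 4.0 * F.interpolate(flow_quarter, size=full_size,
                               mode="bilinear", align_corners=False)

# usage for a 448x1024 input whose finest estimated flow is at 112x256
flow_q = torch.randn(1, 2, 112, 256)
flow_full = final_flow(flow_q, (448, 1024))          # (1, 2, 448, 1024)
```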
The effects of the examples of the present invention will be further described below by experiments.
1. Conditions of the experiment
The MPI Sintel and KITTI standard test video image sequence is adopted as experimental data in the experiment. The experimental facility had one Intel Core i7-6700 CPU and a single NVIDIA GTX1080Ti GPU, and the experimental environment was PyTorch-0.4.0.
2. Content of the experiment
The proposed dense optical flow estimation method is validated from both qualitative and quantitative perspectives.
2.1 qualitative test results
The invention selects 5 representative pairs of adjacent test frames (FIGS. 3a-3b, 4a-4b, 5a-5b, 6a-6b and 7a-7b) from the computer-synthesized dataset MPI Sintel and the real autonomous-driving dataset KITTI; the test scenes include non-rigid object motion, large rigid motion, and so on. The optical flow estimation results of the method are shown in FIGS. 3c, 4c, 5c, 6c and 7c.
2.2 quantitative test results
The MPI Sintel and KITTI test datasets are used to quantitatively analyze the accuracy of the estimated dense optical flow, and the estimated results are submitted to the respective test servers for evaluation. The compared methods include the currently leading FlowNetC, FlowNet2, LiteFlowNet and PWC-Net. The evaluation indices are the Average End Point Error (AEPE) and the percentage of erroneously estimated pixels (Fl), where a correctly estimated pixel is defined as one whose estimate differs from the ground-truth label by less than 3 pixels, or by a distance less than 5% of the ground-truth magnitude. The performance of the compared methods on the test datasets is shown in Table 1, where Fl-Noc denotes the Fl index on non-occluded regions.
TABLE 1 comparison of different depth learning methods in Sintel, KITTI test datasets
The best results are shown in bold in Table 1, from which it can be seen that the average accuracy of the optical flow estimation exceeds most current advanced methods on the different test benchmarks of multiple datasets. The proposed method therefore improves optical flow estimation accuracy across a variety of test environments and generalizes well across scenes. As shown in FIG. 8, the method also achieves the fastest inference speed under the same test environment, reaching 63 fps on 448x1024-resolution video sequences, which fully demonstrates its efficiency, real-time performance and broad application prospects.
The embodiments were chosen and described in order to best explain the principles of the invention and the practical application, and not to limit the invention. Any modifications and variations within the scope of the description, which may occur to those skilled in the art, are intended to be within the scope of the invention.

Claims (9)

1. A real-time optical flow estimation method based on a lightweight convolutional neural network is characterized by comprising the following steps:
s11: given two adjacent frames of images, extracting hierarchical image features by using a parameter-shared convolutional neural network to construct a first frame feature pyramid and a second frame feature pyramid;
s12: on the basis of the feature pyramid constructed in S11, constructing a U-shaped network structure for the first frame image by using deconvolution operations to perform multi-scale information fusion, obtaining multi-scale features;
s13: initializing the lowest-resolution optical flow field to zero, and performing a bilinear-sampling-based deformation operation on the second frame matching features after upsampling the optical flow field estimated at the second-lowest resolution;
s14: performing inner-product-based local similarity calculation between the features of the first frame feature pyramid and the deformed second frame features obtained in S13, constructing the matching cost, and performing cost aggregation;
s15: taking the multi-scale features constructed in S12, the upsampled optical flow field from S13 and the cost-aggregated matching cost features from S14 as the input of an optical flow regression network, and estimating the optical flow field at the current resolution;
s16: repeating S13 to S15 until the optical flow field at the highest resolution is estimated.
2. The method for estimating a real-time optical flow based on a lightweight convolutional neural network as claimed in claim 1, wherein the S11 specifically comprises:
given two adjacent input images $I_1, I_2$, multi-scale image features are extracted by a pyramid network, and the first frame feature pyramid and the second frame feature pyramid are constructed:
$\{F_1^k\}_{k=1}^{6}, \quad \{F_2^k\}_{k=1}^{6}$
where $F_1^k$ is the first frame image feature at the k-th level, $F_2^k$ is the second frame image feature at the k-th level, and k denotes the scale level, k = 1, 2, …, 6; level 1 represents 1/2 of the original resolution and level 6 represents 1/64 of the original resolution.
3. The method for estimating a real-time optical flow based on a lightweight convolutional neural network as claimed in claim 1 or 2, wherein the S12 specifically comprises:
for the first frame feature pyramid, the pyramid feature of the (k+1)-th level, $F_1^{k+1}$, is upsampled to the k-th level spatial resolution by a deconvolution operation and is concatenated and convolved with the original k-th level pyramid feature $F_1^k$, obtaining the k-th level semantic feature $\tilde{F}_1^k$ that fuses multi-scale information.
4. The method for estimating a real-time optical flow based on a lightweight convolutional neural network as claimed in claim 1 or 2, wherein the S13 specifically comprises:
the optical flow field $\mathrm{flow}_{k+1}$ estimated at the (k+1)-th level is spatially upsampled by a factor of 2 to obtain the initial optical flow $\mathrm{Up}_2(\mathrm{flow}_{k+1})$ of the k-th level; using $\mathrm{Up}_2(\mathrm{flow}_{k+1})$, the k-th level pyramid feature of the second frame image, $F_2^k$, is subjected to a deformation operation based on bilinear sampling, obtaining the deformed target feature $\tilde{F}_2^k$.
5. The method according to claim 3, wherein the S13 specifically comprises:
the optical flow field $\mathrm{flow}_{k+1}$ estimated at the (k+1)-th level is spatially upsampled by a factor of 2 to obtain the initial optical flow $\mathrm{Up}_2(\mathrm{flow}_{k+1})$ of the k-th level; using $\mathrm{Up}_2(\mathrm{flow}_{k+1})$, the k-th level pyramid feature of the second frame image, $F_2^k$, is subjected to a deformation operation based on bilinear sampling, obtaining the deformed target feature $\tilde{F}_2^k$.
6. The method for estimating a real-time optical flow based on a lightweight convolutional neural network as claimed in claim 1 or 2, wherein the S14 specifically comprises:
s141: calculating a matching cost:
$c^k(\mathbf{x}, \mathbf{d}) = \left\langle F_1^k(\mathbf{x}),\ \tilde{F}_2^k(\mathbf{x} + \mathbf{d}) \right\rangle$
where $\langle \cdot , \cdot \rangle$ denotes the inner product, $\mathbf{x}$ denotes the two-dimensional spatial position coordinate in the first frame feature, $\mathbf{d}$ denotes the two-dimensional search offset at $\mathbf{x}$, and the search radius is R, so that $\mathbf{d}$ lies in the square region $\{-R, …, R\} \times \{-R, …, R\}$ of size $(2R+1) \times (2R+1)$;
s142: a 3 × 3 convolution is applied to the matching cost $c^k(\mathbf{x}, \mathbf{d})$ to obtain the cost-aggregated matching cost feature $\tilde{c}^k(\mathbf{x}, \mathbf{d})$.
7. The method for estimating a real-time optical flow based on a lightweight convolutional neural network as claimed in claim 3, wherein the step S14 specifically comprises:
s141: calculating a matching cost:
$c^k(\mathbf{x}, \mathbf{d}) = \left\langle F_1^k(\mathbf{x}),\ \tilde{F}_2^k(\mathbf{x} + \mathbf{d}) \right\rangle$
where $\langle \cdot , \cdot \rangle$ denotes the inner product, $\mathbf{x}$ denotes the two-dimensional spatial position coordinate in the first frame feature, $\mathbf{d}$ denotes the two-dimensional search offset at $\mathbf{x}$, and the search radius is R, so that $\mathbf{d}$ lies in the square region $\{-R, …, R\} \times \{-R, …, R\}$ of size $(2R+1) \times (2R+1)$;
s142: a 3 × 3 convolution is applied to the matching cost $c^k(\mathbf{x}, \mathbf{d})$ to obtain the cost-aggregated matching cost feature $\tilde{c}^k(\mathbf{x}, \mathbf{d})$.
8. The method for estimating real-time optical flow based on a lightweight convolutional neural network as claimed in claim 4, wherein the step S14 specifically comprises:
s141: calculating a matching cost:
$c^k(\mathbf{x}, \mathbf{d}) = \left\langle F_1^k(\mathbf{x}),\ \tilde{F}_2^k(\mathbf{x} + \mathbf{d}) \right\rangle$
where $\langle \cdot , \cdot \rangle$ denotes the inner product, $\mathbf{x}$ denotes the two-dimensional spatial position coordinate in the first frame feature, $\mathbf{d}$ denotes the two-dimensional search offset at $\mathbf{x}$, and the search radius is R, so that $\mathbf{d}$ lies in the square region $\{-R, …, R\} \times \{-R, …, R\}$ of size $(2R+1) \times (2R+1)$;
s142: a 3 × 3 convolution is applied to the matching cost $c^k(\mathbf{x}, \mathbf{d})$ to obtain the cost-aggregated matching cost feature $\tilde{c}^k(\mathbf{x}, \mathbf{d})$.
9. The method for estimating a real-time optical flow based on a lightweight convolutional neural network as claimed in claim 5, wherein the step S14 specifically comprises:
s141: calculating a matching cost:
$c^k(\mathbf{x}, \mathbf{d}) = \left\langle F_1^k(\mathbf{x}),\ \tilde{F}_2^k(\mathbf{x} + \mathbf{d}) \right\rangle$
where $\langle \cdot , \cdot \rangle$ denotes the inner product, $\mathbf{x}$ denotes the two-dimensional spatial position coordinate in the first frame feature, $\mathbf{d}$ denotes the two-dimensional search offset at $\mathbf{x}$, and the search radius is R, so that $\mathbf{d}$ lies in the square region $\{-R, …, R\} \times \{-R, …, R\}$ of size $(2R+1) \times (2R+1)$;
s142: a 3 × 3 convolution is applied to the matching cost $c^k(\mathbf{x}, \mathbf{d})$ to obtain the cost-aggregated matching cost feature $\tilde{c}^k(\mathbf{x}, \mathbf{d})$.
CN202010322368.5A 2020-04-22 2020-04-22 Real-time optical flow estimation method based on lightweight convolutional neural network Active CN111626308B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010322368.5A CN111626308B (en) 2020-04-22 2020-04-22 Real-time optical flow estimation method based on lightweight convolutional neural network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010322368.5A CN111626308B (en) 2020-04-22 2020-04-22 Real-time optical flow estimation method based on lightweight convolutional neural network

Publications (2)

Publication Number Publication Date
CN111626308A true CN111626308A (en) 2020-09-04
CN111626308B CN111626308B (en) 2023-04-18

Family

ID=72260062

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010322368.5A Active CN111626308B (en) 2020-04-22 2020-04-22 Real-time optical flow estimation method based on lightweight convolutional neural network

Country Status (1)

Country Link
CN (1) CN111626308B (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112465872A (en) * 2020-12-10 2021-03-09 南昌航空大学 Image sequence optical flow estimation method based on learnable occlusion mask and secondary deformation optimization
CN113538527A (en) * 2021-07-08 2021-10-22 上海工程技术大学 Efficient lightweight optical flow estimation method
CN115619740A (en) * 2022-10-19 2023-01-17 广西交科集团有限公司 High-precision video speed measuring method and system, electronic equipment and storage medium
CN116486107A (en) * 2023-06-21 2023-07-25 南昌航空大学 Optical flow calculation method, system, equipment and medium

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180293737A1 (en) * 2017-04-07 2018-10-11 Nvidia Corporation System and method for optical flow estimation
CN108881899A (en) * 2018-07-09 2018-11-23 深圳地平线机器人科技有限公司 Based on the pyramidal image prediction method and apparatus of optical flow field and electronic equipment
CN109756690A (en) * 2018-12-21 2019-05-14 西北工业大学 Lightweight view interpolation method based on feature rank light stream
CN110111366A (en) * 2019-05-06 2019-08-09 北京理工大学 A kind of end-to-end light stream estimation method based on multistage loss amount
CN110176023A (en) * 2019-04-29 2019-08-27 同济大学 A kind of light stream estimation method based on pyramid structure

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180293737A1 (en) * 2017-04-07 2018-10-11 Nvidia Corporation System and method for optical flow estimation
CN108881899A (en) * 2018-07-09 2018-11-23 深圳地平线机器人科技有限公司 Based on the pyramidal image prediction method and apparatus of optical flow field and electronic equipment
CN109756690A (en) * 2018-12-21 2019-05-14 西北工业大学 Lightweight view interpolation method based on feature rank light stream
CN110176023A (en) * 2019-04-29 2019-08-27 同济大学 A kind of light stream estimation method based on pyramid structure
CN110111366A (en) * 2019-05-06 2019-08-09 北京理工大学 A kind of end-to-end light stream estimation method based on multistage loss amount

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
TAK-WAI HUI; XIAOOU TANG et al.: "LiteFlowNet: A Lightweight Convolutional Neural Network for Optical Flow Estimation" *

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112465872A (en) * 2020-12-10 2021-03-09 南昌航空大学 Image sequence optical flow estimation method based on learnable occlusion mask and secondary deformation optimization
CN112465872B (en) * 2020-12-10 2022-08-26 南昌航空大学 Image sequence optical flow estimation method based on learnable occlusion mask and secondary deformation optimization
CN113538527A (en) * 2021-07-08 2021-10-22 上海工程技术大学 Efficient lightweight optical flow estimation method
CN113538527B (en) * 2021-07-08 2023-09-26 上海工程技术大学 Efficient lightweight optical flow estimation method, storage medium and device
CN115619740A (en) * 2022-10-19 2023-01-17 广西交科集团有限公司 High-precision video speed measuring method and system, electronic equipment and storage medium
CN115619740B (en) * 2022-10-19 2023-08-08 广西交科集团有限公司 High-precision video speed measuring method, system, electronic equipment and storage medium
CN116486107A (en) * 2023-06-21 2023-07-25 南昌航空大学 Optical flow calculation method, system, equipment and medium
CN116486107B (en) * 2023-06-21 2023-09-05 南昌航空大学 Optical flow calculation method, system, equipment and medium

Also Published As

Publication number Publication date
CN111626308B (en) 2023-04-18

Similar Documents

Publication Publication Date Title
US11210803B2 (en) Method for 3D scene dense reconstruction based on monocular visual slam
CN111626308B (en) Real-time optical flow estimation method based on lightweight convolutional neural network
Liu et al. Video super-resolution based on deep learning: a comprehensive survey
CN111325794B (en) Visual simultaneous localization and map construction method based on depth convolution self-encoder
Chen et al. Monocular neural image based rendering with continuous view control
CN111047516B (en) Image processing method, image processing device, computer equipment and storage medium
CN110782490B (en) Video depth map estimation method and device with space-time consistency
Li et al. From beginner to master: A survey for deep learning-based single-image super-resolution
Vitoria et al. Semantic image inpainting through improved wasserstein generative adversarial networks
CN114339409A (en) Video processing method, video processing device, computer equipment and storage medium
CN116205962B (en) Monocular depth estimation method and system based on complete context information
CN114429555A (en) Image density matching method, system, equipment and storage medium from coarse to fine
Wang et al. 4k-nerf: High fidelity neural radiance fields at ultra high resolutions
Klenk et al. E-nerf: Neural radiance fields from a moving event camera
CN113610912B (en) System and method for estimating monocular depth of low-resolution image in three-dimensional scene reconstruction
CN117115786B (en) Depth estimation model training method for joint segmentation tracking and application method
CN111242999A (en) Parallax estimation optimization method based on up-sampling and accurate re-matching
CN111767679B (en) Method and device for processing time-varying vector field data
Hara et al. Enhancement of novel view synthesis using omnidirectional image completion
WO2024032331A9 (en) Image processing method and apparatus, electronic device, and storage medium
Ye Learning of dense optical flow, motion and depth, from sparse event cameras
Park et al. Relativistic Approach for Training Self-Supervised Adversarial Depth Prediction Model Using Symmetric Consistency
CN117474956B (en) Light field reconstruction model training method based on motion estimation attention and related equipment
CN117593702B (en) Remote monitoring method, device, equipment and storage medium
CN117241065B (en) Video plug-in frame image generation method, device, computer equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant