CN111402302B - Optical flow generating device and method - Google Patents

Optical flow generating device and method

Info

Publication number
CN111402302B
Authority
CN
China
Prior art keywords
optical flow
generator
discriminator
neural network
predicted
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010352547.3A
Other languages
Chinese (zh)
Other versions
CN111402302A (en)
Inventor
康燕斌
张志齐
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Yitu Technology Co ltd
Original Assignee
Shanghai Yitu Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Yitu Technology Co ltd filed Critical Shanghai Yitu Technology Co ltd
Priority to CN202010352547.3A priority Critical patent/CN111402302B/en
Publication of CN111402302A publication Critical patent/CN111402302A/en
Application granted granted Critical
Publication of CN111402302B publication Critical patent/CN111402302B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/20Analysis of motion
    • G06T7/269Analysis of motion using gradient-based methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20081Training; Learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20084Artificial neural networks [ANN]

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses an optical flow generating device comprising a generative adversarial network. The generative adversarial network comprises a first generator and a discriminator. The first generator comprises a first neural network; it takes two frames of images as input and outputs a corresponding first predicted optical flow. The output end of the first generator is connected with the input end of the discriminator. The optical flow input to the discriminator is either a sample optical flow corresponding to the training set or the first predicted optical flow. The weight parameters of the first neural network of the first generator are obtained through training; when the discriminator cannot distinguish the first predicted optical flow from the sample optical flow, the training of the first generator is complete. The invention also discloses an optical flow generation method. The invention achieves better optical flow quality metrics and faster training.

Description

Optical flow generating device and method
Technical Field
The invention relates to the technical field of computer vision, in particular to an optical flow generating device. The invention also relates to an optical flow generation method.
Background
When a moving object is filmed, or the camera itself moves relative to the external environment, a sequence of continuous frame images is formed. The actual position of a moving object changes over time, so its position in the time-ordered frame images also changes continuously; optical flow refers to this change in the image position of the moving object across different frames. Equivalently, in computer vision, optical flow describes the motion of objects in an image, which may be caused by camera motion or object motion. Specifically, it is the two-dimensional vector giving the displacement of a pixel representing the same object from one video frame to the next. An optical flow generating device can predict the future position of a moving object from the optical flow computed over the frame images. Optical flow computation enables motion capture, motion prediction, object tracking, action recognition, and the like.
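As an illustrative sketch (not part of the patent), the per-pixel two-dimensional displacement described above can be stored as an H x W x 2 array, and a future position extrapolated by adding the flow vector:

```python
import numpy as np

# An optical flow field stored as an H x W x 2 array holding, per pixel, the
# (dx, dy) displacement from frame k to frame k+1. A future position is
# extrapolated by adding the flow vector to the current position.
flow = np.zeros((3, 3, 2))
flow[1, 1] = (2.0, -1.0)    # the object at pixel (x=1, y=1) moved by (dx=2, dy=-1)

x, y = 1, 1
next_x = x + flow[y, x, 0]  # predicted x coordinate in frame k+1
next_y = y + flow[y, x, 1]  # predicted y coordinate in frame k+1
```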
Existing optical flow generation methods include direct computation of the optical flow, chiefly with the Lucas-Kanade (L-K) method developed by Bruce D. Lucas and Takeo Kanade. It assumes the optical flow is constant within a pixel's neighborhood and then solves the basic optical flow equation for all pixels in that neighborhood by least squares. Direct computation of optical flow is computationally expensive and slow.
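A minimal numerical sketch of the Lucas-Kanade idea described above, using plain NumPy; the window size and the synthetic test images are hypothetical choices for illustration:

```python
import numpy as np

def lucas_kanade_point(I1, I2, x, y, win=7):
    """Estimate the optical flow (u, v) at pixel (x, y) with the classic
    Lucas-Kanade method: assume the flow is constant in a win x win
    neighborhood and solve the brightness-constancy equation
    Ix*u + Iy*v = -It over all neighborhood pixels by least squares."""
    half = win // 2
    Iy, Ix = np.gradient(I1.astype(np.float64))          # spatial gradients
    It = I2.astype(np.float64) - I1.astype(np.float64)   # temporal difference
    sl = (slice(y - half, y + half + 1), slice(x - half, x + half + 1))
    A = np.stack([Ix[sl].ravel(), Iy[sl].ravel()], axis=1)
    b = -It[sl].ravel()
    (u, v), *_ = np.linalg.lstsq(A, b, rcond=None)
    return u, v

# Synthetic check: a Gaussian blob translated by exactly (+1, 0) pixels.
ys, xs = np.mgrid[0:64, 0:64]
blob = lambda cx, cy: np.exp(-((xs - cx) ** 2 + (ys - cy) ** 2) / 50.0)
I1, I2 = blob(32.0, 32.0), blob(33.0, 32.0)
u, v = lucas_kanade_point(I1, I2, x=36, y=32)  # a point on the blob's flank
```

The recovered flow is approximately (1, 0), up to the linearization error of the one-pixel shift.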
With the development of deep learning and its wide application in artificial intelligence fields such as computer vision, convolutional neural networks (CNNs) have also been adopted for optical flow estimation. The initial CNN architecture for computing optical flow was FlowNet; its improved version is FlowNet2.0, which is commonly used in the prior art to calculate optical flow.
FlowNet generally comprises a dimension-reducing encoding (encoder) module and a dimension-restoring decoding (decoder) module. The encoder comes in two variants, FlowNetS and FlowNetC (FlowNetCorr); FlowNetC performs better because it adds an explicit correlation operation. The decoder module is the same for both variants.
The network structure of FlowNet2.0 is a stack of FlowNetC + FlowNetS + FlowNetS. FlowNet2.0 computes quickly and meets real-time requirements.
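The stacking idea can be sketched conceptually as follows; the stage functions are toy stand-ins, not the real FlowNet2.0 sub-networks:

```python
# Conceptual sketch of FlowNet2.0-style stacking: each stage receives the
# frame pair plus the previous stage's flow estimate and outputs a refined
# flow. (The real FlowNet2.0 stages also warp the second frame by the
# current flow estimate before refining; that step is omitted here.)
def stacked_flow(stages, frames):
    flow = None
    for stage in stages:
        flow = stage(frames, flow)
    return flow

coarse = lambda frames, flow: 1.0          # stands in for FlowNetC
refine = lambda frames, flow: flow + 0.25  # stands in for a FlowNetS stage
estimate = stacked_flow([coarse, refine, refine], frames=None)
```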
Disclosure of Invention
The technical problem the invention aims to solve is to provide an optical flow generating device with better optical flow quality metrics and faster training speed. To this end, the invention also provides an optical flow generation method.
In order to solve the above technical problems, the optical flow generating device provided by the present invention includes a generative adversarial network.
The generative adversarial network includes a first generator and a discriminator.
The first generator comprises a first neural network; it takes two frames of images as input and outputs a corresponding first predicted optical flow.
The output end of the first generator is connected with the input end of the discriminator.
The optical flow input to the discriminator is either a sample optical flow corresponding to the training set or the first predicted optical flow; the sample optical flow is a real optical flow, and the discriminator judges the authenticity of the first predicted optical flow. When the input is the first predicted optical flow and the discriminator's output judges it to be the sample optical flow, the discriminator considers the first predicted optical flow equivalent to the sample optical flow. That is, when an optical flow is input, the discriminator does not know at first whether it is the sample optical flow or the first predicted optical flow; only after the discrimination process does it decide which it is. Typically, when the input is the sample optical flow, the discriminator judges it to be the sample optical flow; this is achieved by training the discriminator separately on sample optical flows. When the input is the first predicted optical flow, if it deviates substantially from the corresponding sample (real) optical flow, the discriminator can single it out and judge the two to be different; however, when the first predicted optical flow and the sample optical flow are very close and their difference falls below the discriminator's recognition limit, the discriminator treats the first predicted optical flow as the sample optical flow.
The weight parameters of the first neural network of the first generator are obtained through training; when the discriminator considers the first predicted optical flow equivalent to the sample optical flow, the training of the first generator is complete. That is, when the discriminator considers them equivalent, the difference between the first predicted optical flow and the sample optical flow is small, and the first generator can be put to practical use; in other words, training is complete.
In the training stage, the two frames of images input to the first generator and the two frames of images input to the discriminator are the same and correspond to samples in the training set; they are two consecutive frames in temporal order.
The optical flow is the displacement, between the two frames, of an object captured at the two different times.
In a further improvement, the generative adversarial network further comprises a first loss function module for training the first generator. When the first generator is trained, the first loss function module takes the first predicted optical flow and the corresponding sample optical flow as input, and its output end outputs a first loss function formed from them; the first generator trains the weight parameters of the first neural network according to the first loss function.
A further improvement is that, when training the first generator, the input of the discriminator inputs two frames of images, the first predicted optical flow and the corresponding sample optical flow.
The output of the discriminator outputs a determination of the authenticity of the input optical flow.
A further improvement is that the generative adversarial network comprises a second generator; the second generator comprises a third neural network, takes two frames of images as input, and outputs a corresponding second predicted optical flow.
The third neural network is a student network with a smaller scale formed by performing knowledge distillation on the first neural network.
A further improvement is that the third neural network is a neural network formed by scaling down each layer of channels of the first neural network.
A further improvement is that the generative adversarial network further comprises a third loss function module for training the second generator. When the second generator is trained, the weight parameters of the first neural network and of the second neural network are fixed; the third loss function module takes the first predicted optical flow and the second predicted optical flow as input, and its output end outputs a third loss function formed from them; the second generator trains the weight parameters of the third neural network according to the third loss function.
A further improvement is that, when training the second generator, the input of the discriminator receives the two frames of images, the first predicted optical flow, and the second predicted optical flow.
The output of the discriminator outputs a determination of the authenticity of the input optical flow.
A further improvement is that the weight parameters of the second neural network of the discriminator are obtained by training, and the generative adversarial network further comprises a second loss function module for training the discriminator.
In a further improvement, the network structures of the first neural network and the third neural network are FlowNet2.0.
A further improvement is that the network structure of the second neural network includes ResNet-50, FlowNet2.0, MobileNet, DenseNet, and Inception.
In order to solve the above technical problems, the optical flow generation method provided by the present invention uses a generative adversarial network to generate optical flow.
The generative adversarial network includes a first generator and a discriminator.
The first generator comprises a first neural network; it takes two frames of images as input and outputs a corresponding first predicted optical flow.
The output end of the first generator is connected with the input end of the discriminator.
The optical flow input to the discriminator is either a sample optical flow corresponding to the training set or the first predicted optical flow; the sample optical flow is a real optical flow, and the discriminator judges the authenticity of the first predicted optical flow. When the input is the first predicted optical flow and the discriminator's output judges it to be the sample optical flow, the discriminator considers the first predicted optical flow equivalent to the sample optical flow.
The weight parameters of the first neural network of the first generator are obtained through training; when the discriminator considers the first predicted optical flow equivalent to the sample optical flow, the training of the first generator is complete.
In the training stage, the two frames of images input to the first generator and the two frames of images input to the discriminator are the same and correspond to samples in the training set; they are two consecutive frames in temporal order.
The optical flow is the displacement, between the two frames, of an object captured at the two different times.
A further improvement is that: the generating countermeasure network further comprises a first loss function module for training the first generator, when the first generator is trained, the first loss function module inputs the first predicted optical flow and the corresponding sample optical flow, the first loss function module output end outputs a first loss function formed by the first predicted optical flow and the corresponding sample optical flow, and the first generator trains weight parameters of the first neural network according to the first loss function.
A further improvement is that, when training the first generator, the input of the discriminator inputs two frames of images, the first predicted optical flow and the corresponding sample optical flow.
The output of the discriminator outputs a determination of the authenticity of the input optical flow.
A further improvement is that the generative adversarial network comprises a second generator; the second generator comprises a third neural network, takes two frames of images as input, and outputs a corresponding second predicted optical flow.
The third neural network is a student network with a smaller scale formed by performing knowledge distillation on the first neural network.
A further improvement is that the third neural network is a neural network formed by scaling down each layer of channels of the first neural network.
A further improvement is that the generative adversarial network further comprises a third loss function module for training the second generator. When the second generator is trained, the weight parameters of the first neural network and of the second neural network are fixed; the third loss function module takes the first predicted optical flow and the second predicted optical flow as input, and its output end outputs a third loss function formed from them; the second generator trains the weight parameters of the third neural network according to the third loss function.
A further improvement is that, when training the second generator, the input of the discriminator receives the two frames of images, the first predicted optical flow, and the second predicted optical flow.
The output of the discriminator outputs a determination of the authenticity of the input optical flow.
A further improvement is that the weight parameters of the second neural network of the discriminator are obtained by training, and the generative adversarial network further comprises a second loss function module for training the discriminator.
In a further improvement, the network structures of the first neural network and the third neural network are FlowNet2.0.
A further improvement is that the network structure of the second neural network includes ResNet-50, FlowNet2.0, MobileNet, DenseNet, and Inception.
The invention adopts a device that generates optical flow through a generative adversarial network; because the network includes a discriminator that can discriminate the first predicted optical flow generated by the first generator, the quality of the generated optical flow is improved.
The first neural network of the first generator of the invention can adopt FlowNet2.0, thereby optimizing the network structure of the first generator.
The invention can also carry out knowledge distillation on the basis of the first generator to form the second generator, and the network scale of the second generator can be halved, thereby improving the training speed.
Drawings
The invention is described in further detail below with reference to the attached drawings and detailed description:
FIG. 1 is a schematic diagram of an optical flow generating device according to a first embodiment of the present invention;
fig. 2 is a schematic structural view of an optical flow generating device according to a second embodiment of the present invention.
Detailed Description
Optical flow generating device according to first embodiment of the present invention:
Referring to FIG. 1, a schematic diagram of the optical flow generating device according to the first embodiment of the present invention, FIG. 1 mainly shows the structure of the device when training the first generator 1. The optical flow generating device according to the first embodiment of the present invention includes a generative adversarial network.
The generative adversarial network comprises a first generator 1 and a discriminator 2.
The first generator 1 includes a first neural network, and the first generator 1 inputs two frames of images and outputs a corresponding first predicted optical flow.
The output end of the first generator 1 is connected with the input end of the discriminator 2.
The discriminator 2 comprises a second neural network; the discriminator 2 takes two frames of images and an optical flow as input, and the optical flow input to the discriminator 2 is either a sample optical flow corresponding to the training set or the first predicted optical flow. The sample optical flow is a real optical flow, and the discriminator 2 judges the authenticity of the first predicted optical flow. When the input is the first predicted optical flow and the output of the discriminator 2 judges it to be the sample optical flow, the discriminator 2 considers the first predicted optical flow equivalent to the sample optical flow. That is, when an optical flow is input, the discriminator 2 does not know at first whether it is the sample optical flow or the first predicted optical flow; only after the discrimination process does it decide which it is. Typically, when the input is the sample optical flow, the discriminator 2 judges it to be the sample optical flow; this is achieved by training the discriminator separately on sample optical flows.
When the input is the first predicted optical flow, if it deviates substantially from the corresponding sample (real) optical flow, the discriminator 2 can single it out and judge the two to be different; however, when the first predicted optical flow and the sample optical flow are very close and their difference falls below the recognition limit of the discriminator 2, the discriminator 2 treats the first predicted optical flow as the sample optical flow.
The weight parameters of the first neural network of the first generator 1 are obtained through training; when the discriminator 2 considers the first predicted optical flow equivalent to the sample optical flow, the training of the first generator 1 is complete. That is, when the discriminator 2 considers them equivalent, the difference between the first predicted optical flow and the sample optical flow is small, and the first generator 1 can be put to practical use; in other words, training is complete.
In the training stage, the two frames of images input to the first generator 1, namely the k-th frame and the (k+1)-th frame in FIG. 1, and the two frames input to the discriminator 2 are identical and correspond to samples in the training set; they are two consecutive frames in temporal order.
The optical flow is the displacement, between the two frames, of an object captured at the two different times.
The generative adversarial network further comprises a first loss function module 3 for training the first generator 1. When the first generator 1 is trained, the first loss function module 3 takes the first predicted optical flow and the corresponding sample optical flow as input, and its output end outputs a first loss function formed from them; the first generator 1 trains the weight parameters of the first neural network according to the first loss function.
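The patent does not specify the form of the first loss function; a common supervised choice for flow networks is the average endpoint error (EPE) between the first predicted optical flow and the sample optical flow, sketched here under that assumption:

```python
import numpy as np

def first_loss(pred_flow, sample_flow):
    """Average endpoint error (EPE) between the first predicted optical flow
    and the corresponding sample optical flow. The patent does not state the
    loss's exact form; EPE is the standard supervised loss for flow networks
    and is used here as an assumption. Flows are H x W x 2 arrays."""
    return np.linalg.norm(pred_flow - sample_flow, axis=-1).mean()

pred = np.zeros((4, 4, 2))      # generator predicts no motion
truth = np.ones((4, 4, 2))      # sample flow: every pixel moved by (1, 1)
loss = first_loss(pred, truth)  # per-pixel error is sqrt(2)
```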
In training the first generator 1, the input of the discriminator 2 inputs two frames of images, the first predicted optical flow and the corresponding sample optical flow.
The output end of the discriminator 2 outputs a judgment of the authenticity of the input optical flow. As shown in FIG. 1, the output of the discriminator 2 is used to determine whether an input optical flow was produced by a generator; when it cannot tell the difference between the generator's optical flow and the sample optical flow, the corresponding first predicted optical flow of the first generator 1 is of good quality.
The weight parameters of the second neural network of the discriminator 2 are obtained through training, and the generative adversarial network further comprises a second loss function module for training the discriminator.
In the first embodiment of the present invention, the first generator 1 and the discriminator 2 both need training, and the training is organized along two dimensions: data and loss function. The data fall into two classes: positive example data, where the real optical flow, that is, the sample optical flow, is the optical flow input to the discriminator 2; and negative example data, where the predicted optical flow, that is, the first predicted optical flow, is the optical flow input to the discriminator 2. Because the positive example data are unrelated to prediction, they can only train the discriminator 2; that is, the sample data combined with the second loss function module train the discriminator 2 alone. The negative example data can train the generator, that is, the first generator 1, and the discriminator 2 simultaneously. As for the loss functions, for the discriminator 2 the loss function corresponding to the second loss function module is cross entropy (cross_entropy).
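A numeric sketch of the cross-entropy loss named above; the discriminator scores and the labeling convention are hypothetical:

```python
import numpy as np

def cross_entropy(pred, label):
    """Binary cross-entropy, the loss the text names for the second loss
    function module. `pred` is the discriminator's probability that the
    input optical flow is a sample (real) flow."""
    eps = 1e-7
    pred = np.clip(pred, eps, 1 - eps)
    return -(label * np.log(pred) + (1 - label) * np.log(1 - pred)).mean()

# Hypothetical scores: sample flows are labeled 1 (positive examples) and
# first predicted flows are labeled 0 (negative examples).
d_real = cross_entropy(np.array([0.9, 0.8]), np.array([1.0, 1.0]))
d_fake = cross_entropy(np.array([0.2, 0.1]), np.array([0.0, 0.0]))
d_loss = d_real + d_fake  # total discriminator loss over both classes
```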
Preferably, the network structures of the first neural network are all flownet2.0. The network structure of the second neural network is Resnet-50. In other embodiments can also be: the network structure of the second neural network is flownet2.0, mobilet, densenet, and acceptance.
The first embodiment of the present invention adopts a generative adversarial network as the optical flow generating device; since the network includes the discriminator 2, the first predicted optical flow generated by the first generator 1 can be discriminated, and the quality of the generated optical flow can be improved.
The first neural network of the first generator 1 of the first embodiment of the present invention adopts FlowNet2.0, so the network structure of the first generator 1 itself can be optimized.
Optical flow generating device according to second embodiment of the present invention:
FIG. 2 is a schematic structural view of the optical flow generating device according to the second embodiment of the present invention, mainly showing its structure when training the second generator 4. The optical flow generating device according to the second embodiment of the present invention includes a generative adversarial network.
The generative adversarial network comprises a first generator 1 and a discriminator 2.
The first generator 1 includes a first neural network, and the first generator 1 inputs two frames of images and outputs a corresponding first predicted optical flow.
The output end of the first generator 1 is connected with the input end of the discriminator 2.
The discriminator 2 comprises a second neural network; the discriminator 2 takes two frames of images and an optical flow as input, and the optical flow input to the discriminator 2 is either a sample optical flow corresponding to the training set or the first predicted optical flow. The sample optical flow is a real optical flow, and the discriminator 2 judges the authenticity of the first predicted optical flow. When the input is the first predicted optical flow and the output of the discriminator 2 judges it to be the sample optical flow, the discriminator 2 considers the first predicted optical flow equivalent to the sample optical flow.
The weight parameters of the first neural network of the first generator 1 are obtained through training, and when the arbiter 2 considers that the first predicted optical flow is equivalent to the sample optical flow, the training of the first generator 1 is completed.
In the training stage, the two frames of images input to the first generator 1, namely the k-th frame and the (k+1)-th frame in FIG. 2, are identical to the two frames input to the discriminator 2 and correspond to samples in the training set; they are two consecutive frames in temporal order.
The optical flow is the displacement, between the two frames, of an object captured at the two different times.
The generative adversarial network further comprises a first loss function module 3 for training the first generator 1. When the first generator 1 is trained, the first loss function module 3 takes the first predicted optical flow and the corresponding sample optical flow as input, and its output end outputs a first loss function formed from them; the first generator 1 trains the weight parameters of the first neural network according to the first loss function.
In training the first generator 1, the input of the discriminator 2 inputs two frames of images, the first predicted optical flow and the corresponding sample optical flow.
The output end of the discriminator 2 outputs a judgment of the authenticity of the input optical flow. As shown in FIG. 2, the output of the discriminator 2 is used to determine whether an input optical flow was produced by a generator; when it cannot tell the difference between the generator's optical flow and the sample optical flow, the corresponding first predicted optical flow of the first generator 1 is of good quality.
Preferably, the network structure of the first neural network is FlowNet2.0 and the network structure of the second neural network is ResNet-50. In other embodiments, the network structure of the second neural network can also be FlowNet2.0, MobileNet, DenseNet, or Inception.
The generative adversarial network comprises a second generator 4; the second generator 4 comprises a third neural network, takes two frames of images as input, and outputs a corresponding second predicted optical flow.
The third neural network is a smaller-scale student network formed by performing knowledge distillation on the first neural network. In the second embodiment of the present invention, the third neural network is formed by scaling down the number of channels in each layer of the first neural network. Preferably, each layer's channels are cut in half; halving the channels roughly doubles the model's computation speed (that is, halves its latency). Different reduction ratios yield proportionally different speedups, so the ratio can be set flexibly.
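A sketch of the channel-scaling step; the per-layer channel widths are hypothetical, not taken from the actual FlowNet2.0 architecture:

```python
def scale_channels(channels, ratio=0.5):
    """Form the student (third) network's per-layer channel counts by scaling
    down the teacher (first) network's counts. The patent's preferred ratio
    is 0.5 (channels cut in half); other ratios trade speed for accuracy."""
    return [max(1, int(c * ratio)) for c in channels]

teacher = [64, 128, 256, 512]      # hypothetical teacher channel widths
student = scale_channels(teacher)  # half-width student network
```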
The generative adversarial network further comprises a third loss function module 5 for training the second generator 4. When training the second generator 4, the weight parameters of the first neural network and the weight parameters of the second neural network are fixed; the third loss function module 5 receives the first predicted optical flow and the second predicted optical flow, the output end of the third loss function module 5 outputs a third loss function formed from the two, and the second generator 4 trains the weight parameters of the third neural network according to the third loss function.
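The patent does not give the third loss function's formula. A minimal sketch, assuming a mean squared error between the teacher's (first generator's) and student's (second generator's) predicted flows:

```python
import numpy as np

def third_loss(teacher_flow, student_flow):
    # Distillation loss between the first generator's (teacher's) predicted
    # flow and the second generator's (student's) predicted flow. The teacher's
    # weights stay fixed; only the student is updated against this loss.
    # Mean squared error is assumed; the patent does not specify the formula.
    return float(np.mean((teacher_flow - student_flow) ** 2))

f = np.ones((3, 3, 2))
print(third_loss(f, f))  # -> 0.0 (student matches teacher exactly)
```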
When training the second generator 4, the input end of the discriminator 2 receives the two frames of images, the first predicted optical flow, and the second predicted optical flow.
The output end of the discriminator 2 outputs a judgment of the authenticity of the input optical flow.
The weight parameters of the second neural network of the discriminator 2 are obtained through training, and the generative adversarial network further comprises a second loss function module for training the discriminator 2.
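The second loss function is likewise unspecified. A minimal sketch, assuming the standard GAN binary cross-entropy for the discriminator, where `d_real` and `d_fake` are the discriminator's scores for the sample flow and a predicted flow:

```python
import math

def second_loss(d_real, d_fake, eps=1e-7):
    # Discriminator training loss (second loss function). d_real and d_fake
    # are the discriminator's scores in (0, 1) for the sample (real) optical
    # flow and for a generator-predicted (fake) optical flow. Standard GAN
    # binary cross-entropy is assumed; the patent does not give a formula.
    d_real = min(max(d_real, eps), 1 - eps)  # clip away from 0 and 1
    d_fake = min(max(d_fake, eps), 1 - eps)
    return -(math.log(d_real) + math.log(1.0 - d_fake))

# An undecided discriminator (both scores 0.5) has loss 2*ln(2):
print(round(second_loss(0.5, 0.5), 4))  # -> 1.3863
```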
The network structure of the third neural network is FlowNet2.0.
The optical flow generating device according to the second embodiment of the present invention can further perform knowledge distillation based on the first generator 1 to form the second generator 4, and the network size of the second generator 4 can be halved, thereby improving the training speed.
The optical flow generating method of the first embodiment of the invention comprises the following steps:
The optical flow generation method according to the first embodiment of the present invention generates an optical flow by using a generative adversarial network; that is, the optical flow generating device according to the first embodiment of the present invention shown in fig. 1 is used to generate the optical flow.
As shown in fig. 1, the generative adversarial network includes a first generator 1 and a discriminator 2.
The first generator 1 includes a first neural network, and the first generator 1 inputs two frames of images and outputs a corresponding first predicted optical flow.
The output end of the first generator 1 is connected with the input end of the discriminator 2.
The discriminator 2 comprises a second neural network. The discriminator 2 inputs two frames of images and an optical flow, where the optical flow input to the discriminator 2 is either a sample optical flow corresponding to the training set or the first predicted optical flow; the sample optical flow is a real optical flow, and the discriminator 2 is used to judge the authenticity of the first predicted optical flow. When the optical flow input to the discriminator 2 is the first predicted optical flow and the output of the discriminator 2 judges it to be a sample optical flow, the discriminator 2 considers the first predicted optical flow to be equivalent to the sample optical flow.
The weight parameters of the first neural network of the first generator 1 are obtained through training; when the discriminator 2 considers the first predicted optical flow to be equivalent to the sample optical flow, the training of the first generator 1 is complete.
In the training stage, the two frames of images input to the first generator 1, that is, the kth frame image and the (k+1)th frame image in fig. 1, are identical to the two frames of images input to the discriminator 2 and correspond to samples in the training set; the two frames are consecutive frames with a temporal order.
The optical flow is the displacement, within the two frames of images, of an object that appears in both frames at different times.
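This displacement definition can be illustrated numerically; the (dx, dy) storage convention below is an assumption, not stated in the patent:

```python
import numpy as np

def displaced_position(y, x, flow):
    # Optical flow assigns each pixel of frame k a (dx, dy) displacement
    # telling where that pixel's content appears in frame k+1.
    # flow: (H, W, 2), with flow[y, x] = (dx, dy) -- an assumed convention.
    dx, dy = flow[y, x]
    return float(y + dy), float(x + dx)

flow = np.zeros((4, 4, 2))
flow[1, 1] = (2, 1)  # the object at (row 1, col 1) moves 2 right, 1 down
print(displaced_position(1, 1, flow))  # -> (2.0, 3.0)
```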
The generative adversarial network further comprises a first loss function module 3 for training the first generator 1. When the first generator 1 is trained, the first loss function module 3 receives the first predicted optical flow and the corresponding sample optical flow, the output end of the first loss function module 3 outputs a first loss function formed from the first predicted optical flow and the corresponding sample optical flow, and the first generator 1 trains the weight parameters of the first neural network according to the first loss function.
When training the first generator 1, the input end of the discriminator 2 receives the two frames of images, the first predicted optical flow, and the corresponding sample optical flow.
The output end of the discriminator 2 outputs a judgment of the authenticity of the input optical flow. As fig. 1 shows, the output of the discriminator 2 is used to judge which optical flow was generated by a generator; when the discriminator cannot distinguish the optical flow generated by the generator from the sample optical flow, the first predicted optical flow of the first generator 1 is of good quality.
Preferably, the network structure of the first neural network is FlowNet2.0 and the network structure of the second neural network is ResNet-50. In other embodiments, the network structure of the second neural network can also be FlowNet2.0, MobileNet, DenseNet, or Inception.
The optical flow generating method of the second embodiment of the invention comprises the following steps:
The optical flow generation method according to the second embodiment of the present invention generates an optical flow by using a generative adversarial network; that is, the optical flow generating device according to the second embodiment of the present invention shown in fig. 2 is used to generate the optical flow.
As shown in fig. 2, the generative adversarial network includes a first generator 1 and a discriminator 2.
The first generator 1 includes a first neural network, and the first generator 1 inputs two frames of images and outputs a corresponding first predicted optical flow.
The output end of the first generator 1 is connected with the input end of the discriminator 2.
The discriminator 2 comprises a second neural network. The discriminator 2 inputs two frames of images and an optical flow, where the optical flow input to the discriminator 2 is either a sample optical flow corresponding to the training set or the first predicted optical flow; the sample optical flow is a real optical flow, and the discriminator 2 is used to judge the authenticity of the first predicted optical flow. When the optical flow input to the discriminator 2 is the first predicted optical flow and the output of the discriminator 2 judges it to be a sample optical flow, the discriminator 2 considers the first predicted optical flow to be equivalent to the sample optical flow.
The weight parameters of the first neural network of the first generator 1 are obtained through training; when the discriminator 2 considers the first predicted optical flow to be equivalent to the sample optical flow, the training of the first generator 1 is complete.
In the training stage, the two frames of images input to the first generator 1, that is, the kth frame image and the (k+1)th frame image in fig. 2, are identical to the two frames of images input to the discriminator 2 and correspond to samples in the training set; the two frames are consecutive frames with a temporal order.
The optical flow is the displacement, within the two frames of images, of an object that appears in both frames at different times.
The generative adversarial network further comprises a first loss function module 3 for training the first generator 1. When the first generator 1 is trained, the first loss function module 3 receives the first predicted optical flow and the corresponding sample optical flow, the output end of the first loss function module 3 outputs a first loss function formed from the first predicted optical flow and the corresponding sample optical flow, and the first generator 1 trains the weight parameters of the first neural network according to the first loss function.
When training the first generator 1, the input end of the discriminator 2 receives the two frames of images, the first predicted optical flow, and the corresponding sample optical flow.
The output end of the discriminator 2 outputs a judgment of the authenticity of the input optical flow. As fig. 2 shows, the output of the discriminator 2 is used to judge which optical flow was generated by a generator; when the discriminator cannot distinguish the optical flow generated by the generator from the sample optical flow, the first predicted optical flow of the first generator 1 is of good quality.
Preferably, the network structure of the first neural network is FlowNet2.0 and the network structure of the second neural network is ResNet-50. In other embodiments, the network structure of the second neural network can also be FlowNet2.0, MobileNet, DenseNet, or Inception.
The generative adversarial network further comprises a second generator 4; the second generator 4 comprises a third neural network, and the second generator 4 inputs two frames of images and outputs a corresponding second predicted optical flow.
The third neural network is a smaller-scale student network formed by performing knowledge distillation on the first neural network. In the method of the second embodiment of the present invention, the third neural network is formed by scaling down the number of channels in each layer of the first neural network. Preferably, the third neural network is formed by cutting each layer's channels of the first neural network in half; with the channels halved, the corresponding model's computation speed doubles (its latency is halved), and different reduction ratios increase the computation speed by correspondingly different factors, so the ratio can be set flexibly.
The generative adversarial network further comprises a third loss function module 5 for training the second generator 4. When training the second generator 4, the weight parameters of the first neural network and the weight parameters of the second neural network are fixed; the third loss function module 5 receives the first predicted optical flow and the second predicted optical flow, the output end of the third loss function module 5 outputs a third loss function formed from the two, and the second generator 4 trains the weight parameters of the third neural network according to the third loss function.
When training the second generator 4, the input end of the discriminator 2 receives the two frames of images, the first predicted optical flow, and the second predicted optical flow.
The output end of the discriminator 2 outputs a judgment of the authenticity of the input optical flow.
The weight parameters of the second neural network of the discriminator 2 are obtained through training, and the generative adversarial network further comprises a second loss function module for training the discriminator 2.
The network structure of the third neural network is FlowNet2.0.
The present invention has been described in detail by way of specific embodiments, but these should not be construed as limiting the invention. Many variations and modifications may be made by those skilled in the art without departing from the principles of the invention; these are also considered to fall within the scope of the invention.

Claims (20)

1. An optical flow generating device, comprising: a generative adversarial network;
the generative adversarial network comprises a first generator, a discriminator and a first loss function module;
the first generator comprises a first neural network, two frames of images are input by the first generator, and a corresponding first predicted optical flow is output;
the output end of the first generator is connected with the input end of the discriminator;
the optical flow input by the discriminator comprises a sample optical flow corresponding to a training set or the first predicted optical flow; the sample optical flow is a real optical flow, and the discriminator is used for judging the authenticity of the first predicted optical flow; when the optical flow input by the discriminator is the first predicted optical flow, and the output of the discriminator judges that the input optical flow is the sample optical flow, the discriminator considers the first predicted optical flow to be equivalent to the sample optical flow;
the weight parameters of the first neural network of the first generator are obtained through training, the first loss function module is used for training the first generator, when the first generator is trained, the first loss function module inputs the first predicted optical flow and the corresponding sample optical flow, the output end of the first loss function module outputs a first loss function formed by the first predicted optical flow and the corresponding sample optical flow, and the first generator trains the weight parameters of the first neural network according to the first loss function; training of the first generator is completed when the discriminator considers the first predicted optical flow to be equivalent to the sample optical flow.
2. The optical flow generating device according to claim 1, wherein: in the training stage, two frames of images input by the first generator are identical to two frames of images input by the discriminator and correspond to samples in a training set, and the two frames of images are continuous two frames of images with a front-back sequence in time;
the optical flow is a displacement formed in the two frame images by an object in the two frame images at different times.
3. The optical flow generating device according to claim 1, wherein: when training the first generator, the input end of the discriminator inputs two frames of images, the first predicted optical flow and the corresponding sample optical flow;
the output of the discriminator outputs a determination of the authenticity of the input optical flow.
4. The optical flow generating device according to claim 3, wherein: the generative adversarial network comprises a second generator, the second generator comprises a third neural network, and the second generator inputs two frames of images and outputs a corresponding second predicted optical flow;
the third neural network is a smaller-scale student network formed by performing knowledge distillation on the first neural network.
5. The optical flow generating device according to claim 4, wherein: the third neural network is formed by scaling down each layer of channels of the first neural network.
6. The optical flow generating device according to claim 4, wherein: the generative adversarial network further includes a third loss function module for training the second generator; when the second generator is trained, the weight parameters of the first neural network and the weight parameters of the second neural network are fixed, the third loss function module inputs the first predicted optical flow and the second predicted optical flow, the output end of the third loss function module outputs a third loss function formed by the first predicted optical flow and the second predicted optical flow, and the second generator trains the weight parameters of the third neural network according to the third loss function.
7. The optical flow generating device according to claim 6, wherein: when training the second generator, the input end of the discriminator inputs two frames of images, the first predicted optical flow and the second predicted optical flow;
the output of the discriminator outputs a determination of the authenticity of the input optical flow.
8. The optical flow generating device according to claim 1, wherein: the weight parameters of the second neural network of the discriminator are obtained through training, and the generative adversarial network further comprises a second loss function module for training the discriminator.
9. The optical flow generating device according to claim 4, wherein: the network structures of the first neural network and the third neural network are FlowNet2.0.
10. The optical flow generating device according to claim 1, wherein: the network structure of the second neural network comprises ResNet-50, FlowNet2.0, MobileNet, DenseNet, or Inception.
11. An optical flow generating method, characterized in that: an optical flow is generated by using a generative adversarial network;
the generative adversarial network comprises a first generator, a discriminator and a first loss function module;
the first generator comprises a first neural network, two frames of images are input by the first generator, and a corresponding first predicted optical flow is output;
the output end of the first generator is connected with the input end of the discriminator;
the optical flow input by the discriminator comprises a sample optical flow corresponding to a training set or the first predicted optical flow; the sample optical flow is a real optical flow, and the discriminator is used for judging the authenticity of the first predicted optical flow; when the optical flow input by the discriminator is the first predicted optical flow, and the output of the discriminator judges that the input optical flow is the sample optical flow, the discriminator considers the first predicted optical flow to be equivalent to the sample optical flow;
the discriminator judges the authenticity of the input optical flow, wherein the optical flow input by the discriminator comprises the sample optical flow or the first predicted optical flow corresponding to the training set;
the weight parameters of the first neural network of the first generator are obtained through training, the first loss function module is used for training the first generator, when the first generator is trained, the first loss function module inputs the first predicted optical flow and the corresponding sample optical flow, the output end of the first loss function module outputs a first loss function formed by the first predicted optical flow and the corresponding sample optical flow, and the first generator trains the weight parameters of the first neural network according to the first loss function; training of the first generator is completed when the discriminator considers the first predicted optical flow to be equivalent to the sample optical flow.
12. The optical flow generation method according to claim 11, wherein: in the training stage, two frames of images input by the first generator are identical to two frames of images input by the discriminator and correspond to samples in a training set, and the two frames of images are continuous two frames of images with a front-back sequence in time;
the optical flow is a displacement formed in the two frame images by an object in the two frame images at different times.
13. The optical flow generation method according to claim 11, wherein: when training the first generator, the input end of the discriminator inputs two frames of images, the first predicted optical flow and the corresponding sample optical flow;
the output of the discriminator outputs a determination of the authenticity of the input optical flow.
14. The optical flow generation method according to claim 13, wherein: the generative adversarial network comprises a second generator, the second generator comprises a third neural network, and the second generator inputs two frames of images and outputs a corresponding second predicted optical flow;
the third neural network is a smaller-scale student network formed by performing knowledge distillation on the first neural network.
15. The optical flow generation method according to claim 14, wherein: the third neural network is formed by scaling down each layer of channels of the first neural network.
16. The optical flow generation method according to claim 14, wherein: the generative adversarial network further includes a third loss function module for training the second generator; when the second generator is trained, the weight parameters of the first neural network and the weight parameters of the second neural network are fixed, the third loss function module inputs the first predicted optical flow and the second predicted optical flow, the output end of the third loss function module outputs a third loss function formed by the first predicted optical flow and the second predicted optical flow, and the second generator trains the weight parameters of the third neural network according to the third loss function.
17. The optical flow generation method of claim 16, wherein: when training the second generator, the input end of the discriminator inputs two frames of images, the first predicted optical flow and the second predicted optical flow;
the output of the discriminator outputs a determination of the authenticity of the input optical flow.
18. The optical flow generation method according to claim 11, wherein: the weight parameters of the second neural network of the discriminator are obtained through training, and the generative adversarial network further comprises a second loss function module for training the discriminator.
19. The optical flow generation method of claim 16, wherein: the network structures of the first neural network and the third neural network are FlowNet2.0.
20. The optical flow generation method according to claim 11, wherein: the network structure of the second neural network comprises ResNet-50, FlowNet2.0, MobileNet, DenseNet, or Inception.
CN202010352547.3A 2020-04-28 2020-04-28 Optical flow generating device and method Active CN111402302B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010352547.3A CN111402302B (en) 2020-04-28 2020-04-28 Optical flow generating device and method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010352547.3A CN111402302B (en) 2020-04-28 2020-04-28 Optical flow generating device and method

Publications (2)

Publication Number Publication Date
CN111402302A CN111402302A (en) 2020-07-10
CN111402302B true CN111402302B (en) 2023-06-06

Family

ID=71413727

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010352547.3A Active CN111402302B (en) 2020-04-28 2020-04-28 Optical flow generating device and method

Country Status (1)

Country Link
CN (1) CN111402302B (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
GB201902459D0 (en) * 2019-02-22 2019-04-10 Facesoft Ltd Facial shape representation and generation system and method
CN110210429A (en) * 2019-06-06 2019-09-06 山东大学 A method of network is generated based on light stream, image, movement confrontation and improves anxiety, depression, angry facial expression recognition correct rate
CN110599421A (en) * 2019-09-12 2019-12-20 腾讯科技(深圳)有限公司 Model training method, video fuzzy frame conversion method, device and storage medium

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
GB201902459D0 (en) * 2019-02-22 2019-04-10 Facesoft Ltd Facial shape representation and generation system and method
CN110210429A (en) * 2019-06-06 2019-09-06 山东大学 A method of network is generated based on light stream, image, movement confrontation and improves anxiety, depression, angry facial expression recognition correct rate
CN110599421A (en) * 2019-09-12 2019-12-20 腾讯科技(深圳)有限公司 Model training method, video fuzzy frame conversion method, device and storage medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Zhou Yong; Tao Zhaosheng; Ruan Mengli; Wang Lihua. A target optical flow detection method based on the FlowNet2.0 network. Journal of Longyan University. 2020, (02), full text. *

Also Published As

Publication number Publication date
CN111402302A (en) 2020-07-10

Similar Documents

Publication Publication Date Title
CN110363716B (en) High-quality reconstruction method for generating confrontation network composite degraded image based on conditions
CN109064507B (en) Multi-motion-stream deep convolution network model method for video prediction
JP7147078B2 (en) Video frame information labeling method, apparatus, apparatus and computer program
Wu et al. Multi-teacher knowledge distillation for compressed video action recognition on deep neural networks
CN110795990B (en) Gesture recognition method for underwater equipment
CN109886225A (en) A kind of image gesture motion on-line checking and recognition methods based on deep learning
CN113158862B (en) Multitasking-based lightweight real-time face detection method
CN111062395B (en) Real-time video semantic segmentation method
CN110705412A (en) Video target detection method based on motion history image
CN114565655B (en) Depth estimation method and device based on pyramid segmentation attention
CN110443784B (en) Effective significance prediction model method
CN110688905A (en) Three-dimensional object detection and tracking method based on key frame
CN112906631B (en) Dangerous driving behavior detection method and detection system based on video
CN113569882A (en) Knowledge distillation-based rapid pedestrian detection method
CN110852199A (en) Foreground extraction method based on double-frame coding and decoding model
Vemprala et al. Representation learning for event-based visuomotor policies
CN116524593A (en) Dynamic gesture recognition method, system, equipment and medium
CN112288772A (en) Channel attention target tracking method based on online multi-feature selection
CN116229452A (en) Point cloud three-dimensional target detection method based on improved multi-scale feature fusion
CN111402302B (en) Optical flow generating device and method
CN117456330A (en) MSFAF-Net-based low-illumination target detection method
CN113033283A (en) Improved video classification system
CN117576149A (en) Single-target tracking method based on attention mechanism
Guo et al. Research on human-vehicle gesture interaction technology based on computer vision
CN115578436A (en) Monocular depth prediction method based on multi-level feature parallel interaction fusion

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant