CN114693744A - Optical flow unsupervised estimation method based on improved cycle generation countermeasure network

Optical flow unsupervised estimation method based on improved cycle generation countermeasure network

Info

Publication number
CN114693744A
CN114693744A
Authority
CN
China
Prior art keywords
optical flow
generator
network
training
discriminator
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210150329.0A
Other languages
Chinese (zh)
Inventor
张涛 (Zhang Tao)
刘晓晨 (Liu Xiaochen)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Southeast University
Original Assignee
Southeast University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Southeast University filed Critical Southeast University
Priority to CN202210150329.0A priority Critical patent/CN114693744A/en
Publication of CN114693744A publication Critical patent/CN114693744A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/20 Analysis of motion
    • G06T7/269 Analysis of motion using gradient-based methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/048 Activation functions
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/10 Image acquisition modality
    • G06T2207/10016 Video; Image sequence
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20081 Training; Learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20084 Artificial neural networks [ANN]

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Analysis (AREA)

Abstract

An optical flow unsupervised estimation method based on an improved cycle-generative adversarial network. First, the generator structure is improved: dilated (hole) convolution is introduced to enlarge the network's receptive field without shrinking the feature map, and a channel-space attention module is introduced so that the generator concentrates more on moving objects in the input picture. Second, Spynet is introduced as a teacher network to guide the generator. Finally, an optical flow consistency loss and a contour consistency loss are introduced. These improvements are placed into a cycle-generative adversarial network framework, in which the generator is responsible for deceiving the discriminator with the generated image. The generator and the discriminator are trained adversarially in this way; training is complete when the discriminator can no longer distinguish the image source, after which the optical flow can be estimated from two consecutive images. The method can be used for optical-flow-based autonomous navigation tasks carried on unmanned aerial vehicles, unmanned ground vehicles, unmanned underwater vehicles and the like, which are easily affected by harsh conditions such as illumination changes and rainy weather.

Description

Optical flow unsupervised estimation method based on improved cycle generation countermeasure network
Technical Field
The invention relates to the technical field of computer vision and visual navigation, in particular to an optical flow unsupervised estimation method based on an improved cycle-generative adversarial network.
Background
Navigation is a key technology for unmanned systems (unmanned ground vehicles, unmanned aerial vehicles, unmanned underwater vehicles and the like) to perform their tasks, and it has attracted increasing attention from researchers. Currently, the mainstream navigation methods for unmanned systems are the Global Positioning System (GPS) and the Inertial Navigation System (INS). However, in urban or indoor environments GPS is often unusable due to signal occlusion, and the INS inevitably accumulates errors because it obtains navigation parameters by integration, which degrades its long-term accuracy. Because of these inherent limitations of GPS and INS, more and more alternative navigation approaches are being developed. Optical flow, as one form of bionic visual navigation, is inspired by nature: at the biological level, insects such as bees use the optical flow information in their compound eyes to navigate, avoid obstacles and land. In computer vision, optical flow refers to the relative movement of corresponding object pixels between two frames caused by motion, so it contains rich motion information and scene structure information. Current ways of obtaining optical flow fall into two classes: classical methods and learning methods. Classical methods are mainly based on variational models, chiefly the Horn-Schunck (HS) and Lucas-Kanade (LK) methods. Although variational methods can obtain accurate optical flow estimates, their accuracy under large displacements, motion occlusion, illumination changes and similar conditions still needs improvement, and the large amount of iterative computation gives them poor real-time performance, which limits their application. With the rapid development of Graphics Processing Units (GPUs) and computer technology, deep-learning-based optical flow estimation has attracted extensive attention owing to its strong robustness, high real-time performance and strong adaptability. However, most current deep-learning optical flow estimation methods require corresponding pixel-by-pixel optical flow ground truth to participate in model training, and acquiring such ground truth is extremely tedious, which greatly limits these methods. The cycle-generative adversarial network is an unsupervised network based on the idea of the zero-sum game; through the cyclic adversarial interplay of generators and discriminators, it realizes unsupervised interconversion of pictures between different domains. Based on this, the invention adopts a cycle-generative adversarial network to realize the cyclic conversion between two frames of pictures and the optical flow map, and thereby completes accurate estimation of the optical flow map.
In general, optical flow serves as a bionic visual navigation means, but the classical optical flow methods are strongly affected by the environment and have high computational complexity, while the deep learning methods are limited in application by the difficulty of obtaining ground-truth training samples. A cycle-generative adversarial network can complete optical flow estimation without sample ground truth through unsupervised training; however, the classical cycle-generative adversarial network is not specifically optimized for optical flow estimation, so its optical flow estimates are inaccurate. The invention therefore improves the cycle-generative adversarial network for the characteristics of the optical flow estimation task.
Disclosure of Invention
In order to solve the above technical problems, the invention provides an optical flow unsupervised estimation method based on an improved cycle-generative adversarial network, which removes the need for ground truth when estimating an optical flow field with a deep learning method; the network is trained end to end. Compared with conventional optical flow estimation networks such as FlowNet, the proposed method abandons the dependence on training-sample ground truth through the cyclic adversarial interplay of samples from two different domains. First, the estimation accuracy of the generated optical flow is improved by improving the generator and introducing an attention mechanism. Then, the Spynet network is used as a teacher network to guide the training of the generator, which reduces the risk of the generative adversarial network falling into mode collapse and at the same time relieves the network's dependence on the number of training samples, thereby accelerating network training. Finally, the optimized loss function further improves the estimation accuracy of the optical flow. Through these improvements, the optical flow estimation for moving objects is improved, yielding an accurate and robust optical flow field.
An optical flow unsupervised estimation method based on an improved cycle generation countermeasure network, characterized by comprising the following steps:
(1) a training stage: the training data comprise three parts: picture I_1, picture I_2 and an optical flow visualization map I_real, where pictures I_1 and I_2 are used for calculating the optical flow, and I_real is used to guide the style of the optical flow image finally generated by the network. Specifically, picture I_1 and picture I_2 are merged along the channel dimension to form a 6-channel tensor I_{1+2}, which is input together with I_real into the cycle-generative network for training. During training, the generator G in the network generates from I_{1+2} a tensor FLOW with the same size as the original image but with 2 channels, and then converts FLOW by visualization into an image I_fake with a style similar to I_real; I_fake and I_real are input simultaneously to the discriminator D for discrimination. D is used to distinguish the source of the input sample, while the generator tries to deceive the discriminator with the generated sample; the network forms adversarial training in this way, and training is finished when the discriminator can no longer distinguish the image source;
(2) a prediction stage: after training of the cycle-generative adversarial network is finished, the discriminator is frozen and only the generator is retained. The two pictures Î_1 and Î_2 whose optical flow is to be calculated are merged along the channel dimension to obtain Î_{1+2}, which is input into the network simultaneously with an arbitrary optical flow visualization image Î_real; the network then uses generator G_A to complete the conversion of Î_{1+2} and obtain the predicted optical flow map of the two frames of images.
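For illustration, a minimal PyTorch sketch of this data preparation follows (PyTorch being the framework named later in the description); the one-layer module here is only a placeholder standing in for the trained generator G_A:

```python
import torch
import torch.nn as nn

# Placeholder standing in for the trained generator G_A described below;
# it maps a 6-channel input to a 2-channel flow field of the same size.
generator = nn.Conv2d(6, 2, kernel_size=3, padding=1)

i1 = torch.rand(1, 3, 256, 256)     # picture I_1 (first frame, RGB)
i2 = torch.rand(1, 3, 256, 256)     # picture I_2 (second frame, RGB)
i_12 = torch.cat([i1, i2], dim=1)   # 6-channel tensor I_{1+2}
flow = generator(i_12)              # tensor FLOW: same spatial size, 2 channels (u, v)
```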
As a further improvement of the invention, the specific meaning of the improvement of the cycle generation countermeasure network in the step (1) is as follows:
compared with the traditional method for improving the circularly generated countermeasure network, the method mainly comprises three aspects, firstly, the structure of a generator is improved, the receptive field of the network is increased under the condition that the size of a characteristic diagram is not reduced by introducing cavity convolution, meanwhile, a channel-space attention module CSAM is introduced to enable the generator to be more concentrated on moving objects in an input picture, further, the estimation precision of the optical flow is improved, secondly, Spynet is introduced to serve as a teacher network to guide the generator, the dependence of the network on large data volume is reduced, meanwhile, the risk that the circularly generated countermeasure network falls into mode collapse is reduced, and finally, the optical flow consistency loss and the contour consistency loss are introduced, and the precision of the generated optical flow is improved.
As a further improvement of the invention, the specific training process of the step (1) is as follows:
suppose there are two fields, field A and field B, in the problem being solved, field A representing the two optical flows to be solved I1And I26-dimensional tensor I generated after channel splicing1+2The domain B represents the optical flow diagram IrealThe proposed task of improving the cycle generation network is to simplify the cycle conversion between domain a and domain B, i.e. to use generator GAWill I1+2Conversion to domain B while using generator GBWill IrealAnd converting the image to the domain A, discriminating the image source by using a discriminator after the conversion, namely inputting the image generated by the generator and the real image to a discriminator D, judging the image source by the discriminator through training, guiding and optimizing the parameters of the generator, further improving the authenticity of the image generated by the generator, and carrying out co-training in the confrontation by the discriminator and the generator. In the training process, the consistency loss of the optical flow is generated by calculating the generated optical flow and I1、I2The relationship between the two is used for improving the precision of the generated light flow, and the contour consistency loss is obtained by comparing the generated light flow graph IfakeAnd I1To refine I by contour differencefakeThe contour information of the system is simultaneously introduced into Spynet as a teacher network, the EPE loss function is utilized to train and guide the generator, and when the discriminator cannot distinguishAnd if the image comes from the generator, the Nash balance is achieved, and the training is completed.
As a further improvement of the invention, the specific operation process of the step (2) is as follows:
after the training of the generation of the countermeasure network for the improvement cycle is completed, the discriminator is frozen, only the generator is adopted, and two images of the optical flow to be calculated are processed similarly to the training process
Figure RE-GDA0003643078900000031
Obtaining an image of a representative domain A after channel splicing
Figure RE-GDA0003643078900000032
With arbitrary real optical flow diagrams
Figure RE-GDA0003643078900000033
As field B, simultaneously into the trained network, at which time generator GAAnd GBThe interconversion between the domain A and the domain B is completed, and the optical flow prediction is completed under the unsupervised condition.
Beneficial effects:
(1) The invention improves the structure of the generator: dilated (hole) convolution is introduced to increase the receptive field of the network without reducing the feature map size, and a channel-space attention module (CSAM) is introduced so that the generator concentrates more on moving objects in the input picture, thereby improving the optical flow estimation accuracy.
(2) The invention introduces a teacher network to guide the generator, which reduces the training cost while lowering the risk of falling into mode collapse.
(3) The invention improves the direction and magnitude information of the optical flow generated by the network by introducing the optical flow consistency loss, and refines the contours of moving objects through the contour consistency loss.
Drawings
FIG. 1 is an overall framework of the improved cycle generation countermeasure network of the present invention;
FIG. 2 is a diagram of a color wheel for determining the direction and magnitude of optical flow;
FIG. 3 is a partial network structure of generator feature extraction proposed by the present invention;
FIG. 4 is a partial network structure of optical flow prediction for a generator according to the present invention;
FIG. 5 is a schematic diagram of the channel-space attention module according to the present invention;
FIG. 6 is the structure of the discriminator employed in the present invention;
FIG. 7 is one embodiment of optical flow estimation of the present invention.
Detailed Description
The invention is described in further detail below with reference to the following detailed description and accompanying drawings:
the method disclosed by the invention is used for solving the optical flow of the moving object by utilizing a deep learning method in view of the requirements of high robustness and high adaptability optical flow acquisition under the condition of illumination change. Specifically, optical flow is used as a means for describing object motion pixel by pixel, and the truth value of a dense optical flow field is very difficult to acquire, so that the depth network training is easy to overfit due to insufficient data set, and the accuracy of optical flow estimation is reduced. In order to overcome the defect that the traditional deep convolutional network such as the flonet needs to participate in model training in a true value, the invention introduces the improved cycle generation countermeasure network to carry out unsupervised acquisition of the optical flow field of the moving object. Firstly, the invention improves the network structure of the generator, simultaneously introduces the cavity convolution to maintain the size of the characteristic diagram and simultaneously improve the receptive field of the network, introduces the CSAM attention mechanism to enable the network to pay more attention to the moving foreground part in the input picture, and further improves the estimation precision of the generator to the optical flow. And secondly, Spynet is introduced as a teacher network to guide a generator, so that the risk of collapse of the network in a mode is reduced, and the precision and the robustness of the optical flow graph are further improved. And finally, introducing the consistency loss of the optical flow to promote the direction and the strength information of the optical flow generated by the network, and introducing the consistency loss of the outline to refine the outline of the moving object. The method can be used for autonomous navigation tasks based on optical flow, which take unmanned planes, unmanned vehicles, underwater unmanned submergence vehicles and the like as carriers and are easily influenced by severe conditions such as illumination change, rainy days and the like.
The overall framework of the cycle-generative adversarial network improved by the invention is shown in FIG. 1. The two RGB pictures I_1 and I_2 whose optical flow is to be solved are concatenated along the channel dimension to generate a 6-channel tensor, which is convenient for training the network.
In the cycle-generative adversarial network, real_A and real_B are two types of pictures without any correspondence. In the invention, real_A is the pair of frames whose optical flow is to be solved, which contains the optical flow information, while real_B is an arbitrary real optical flow map, mainly responsible for indicating the training direction for the network. The cycle-generative adversarial network consists of two completely symmetrical branches; to simplify the description, only the real_A to rec_A branch is explained. First, real_A passes through generator G_A to generate a 2-channel optical flow matrix, which is then converted into a 3-channel image, fake_B, through the optical flow visualization operation; that is, generator G_A is responsible for converting the original images into the optical flow map. Then, fake_B and real_B are input simultaneously to the discriminator D_A, which judges the image source: an image from the generator should be marked as 0, and an image from the real sample domain B should be marked as 1. After that, fake_B is input to generator G_B, which generates a picture rec_A similar to real_A; that is, generator G_B is responsible for converting pictures from the optical flow map back to the original images. Similarly, in the other branch, real_B undergoes the cyclic conversion of the two generators, which is not described in detail here. This cycle mechanism gives the adversarial network the capability of unsupervised learning: no one-to-one sample ground truth is needed, which greatly reduces the difficulty of obtaining training samples.
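A compressed sketch of one forward pass of this branch follows; the one-layer modules and the flow_to_rgb visualization are placeholders for the real architectures of FIG. 3-6, used only to make the data flow concrete:

```python
import torch
import torch.nn as nn

# Stand-ins for G_A, G_B and D_A (the real architectures are given in FIG. 3-6).
G_A = nn.Conv2d(6, 2, kernel_size=3, padding=1)   # frames -> 2-channel flow
G_B = nn.Conv2d(3, 6, kernel_size=3, padding=1)   # flow image -> frames
D_A = nn.Conv2d(3, 1, kernel_size=3, padding=1)   # patch-level real/fake scores

def flow_to_rgb(flow):
    # Placeholder for the color-wheel visualization (FIG. 2): any mapping
    # from a 2-channel flow field to a 3-channel image serves the sketch.
    magnitude = flow.norm(dim=1, keepdim=True)
    return torch.cat([flow, magnitude], dim=1)

real_A = torch.rand(1, 6, 256, 256)   # I_1 and I_2 after channel concatenation
fake_B = flow_to_rgb(G_A(real_A))     # generated optical flow image
score = torch.sigmoid(D_A(fake_B))    # D_A: ~0 for generated, ~1 for real samples
rec_A = G_B(fake_B)                   # cycle back: should resemble real_A
```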
The loss functions of the cycle-generative adversarial network are explained below.
Since rec_A is the result of real_A undergoing the two-generator cycle, the network requires rec_A and real_A to be as similar as possible to ensure the semantic correctness of fake_B. Based on this, a cycle consistency loss is introduced:
L_cyc(F, H) = E_x[||H(F(x)) - x||_1] + E_y[||F(H(y)) - y||_1]   (1)
where x and y represent the two images real_A and real_B respectively, E denotes the expected value over the data, F denotes generator G_A, and H denotes generator G_B.
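A sketch of Eq. (1), assuming the expectations are approximated by batch means:

```python
import torch

def cycle_consistency_loss(real_a, rec_a, real_b, rec_b):
    # L_cyc of Eq. (1): L1 distance between each image and its reconstruction
    # after the round trip through both generators.
    return (rec_a - real_a).abs().mean() + (rec_b - real_b).abs().mean()
```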
The second loss is the adversarial loss function of the cycle-generative adversarial network, which guides the game-theoretic adversarial training between the generator and the discriminator. The adversarial loss is as follows:
L_GAN(F, D, X, Y) = E_y[log D(y)] + E_x[log(1 - D(F(x)))]   (2)
As can be seen from the adversarial loss, the discriminator aims to maximize this loss, i.e., to distinguish as far as possible the fake image generated by the generator from the real image, while the generator deceives the discriminator by minimizing the adversarial loss function.
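A sketch of Eq. (2) in the equivalent binary cross-entropy form, with real samples labeled 1 and generated samples labeled 0 as described above:

```python
import torch
import torch.nn.functional as F

def discriminator_loss(d_real_logits, d_fake_logits):
    # The discriminator maximizes Eq. (2): real samples toward 1, fakes toward 0.
    real_target = torch.ones_like(d_real_logits)
    fake_target = torch.zeros_like(d_fake_logits)
    return (F.binary_cross_entropy_with_logits(d_real_logits, real_target)
            + F.binary_cross_entropy_with_logits(d_fake_logits, fake_target))

def generator_adversarial_loss(d_fake_logits):
    # The generator minimizes the adversarial loss, pushing fakes toward "real".
    return F.binary_cross_entropy_with_logits(d_fake_logits, torch.ones_like(d_fake_logits))
```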
As shown in FIG. 2, a color wheel diagram is used to indicate the magnitude and direction of the optical flow. The color wheel is based on the HSI color space: rotation in the tangential direction represents different optical flow directions, and increasing saturation from the center of the wheel outward along the radial direction represents increasing optical flow magnitude. The motion direction and magnitude at each point of the optical flow map can therefore be read off against the color wheel. The visualization of the optical flow reveals, on the basis of I_1, the motion of the objects in I_2 using the HSI color space. Based on this, the invention combines the definition of optical flow and introduces an optical flow consistency loss, specifically defined as follows:
L_OF = ρ(I_2(x_2 + u, y_2 + v) - I_1(x_1, y_1))   (3)
where u and v represent the horizontal and vertical optical flow calculated by the network, I_1 and I_2 represent the two original pictures whose optical flow is to be solved, x and y are coordinates in the corresponding pictures, and ρ(x) = (x^2 + 0.001^2)^α is the Charbonnier loss function, with α taken as 0.5 in the invention. Optical flow is defined as the pixel displacement from I_1 to I_2, so the optical flow consistency loss L_OF can be used to optimize the accuracy of the optical flow.
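A sketch of Eq. (3); the bilinear grid_sample backward warp is an implementation assumption, since the text only states that I_2 is evaluated at the flow-displaced coordinates:

```python
import torch
import torch.nn.functional as F

def optical_flow_consistency_loss(i1, i2, flow, alpha=0.5, eps=0.001):
    # L_OF of Eq. (3): warp I_2 back to I_1 with the predicted flow (u, v)
    # and penalize the residual with the Charbonnier function
    # rho(x) = (x^2 + eps^2)^alpha, alpha = 0.5.
    b, _, h, w = i1.shape
    ys, xs = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
    base = torch.stack((xs, ys), dim=-1).float().to(i1.device)  # H x W x 2 as (x, y)
    target = base.unsqueeze(0) + flow.permute(0, 2, 3, 1)       # add (u, v) per pixel
    # normalize pixel coordinates to [-1, 1] for grid_sample
    tx = 2.0 * target[..., 0] / (w - 1) - 1.0
    ty = 2.0 * target[..., 1] / (h - 1) - 1.0
    i2_warped = F.grid_sample(i2, torch.stack((tx, ty), dim=-1), align_corners=True)
    diff = i2_warped - i1
    return ((diff ** 2 + eps ** 2) ** alpha).mean()
```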
Second, the contours in the optical flow map are generated from the positions of the moving objects in I_1; a contour loss function is therefore introduced to optimize the contour details of the generated optical flow map against I_1, as follows:
L_outline = δ(F(x)) - δ(I_1)   (4)
wherein, δ represents the binary contour extraction operation performed on the whole image.
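A sketch of Eq. (4); the binary contour extractor δ is not specified beyond "binary contour extraction", so a thresholded Sobel gradient magnitude is assumed here, with the difference reduced by an L1 mean:

```python
import torch
import torch.nn.functional as F

SOBEL_X = torch.tensor([[-1., 0., 1.], [-2., 0., 2.], [-1., 0., 1.]]).view(1, 1, 3, 3)

def delta(img, threshold=0.1):
    # Assumed binary contour extraction: Sobel gradient magnitude + threshold.
    gray = img.mean(dim=1, keepdim=True)            # collapse channels to one
    gx = F.conv2d(gray, SOBEL_X, padding=1)
    gy = F.conv2d(gray, SOBEL_X.transpose(2, 3), padding=1)
    return ((gx ** 2 + gy ** 2).sqrt() > threshold).float()

def contour_loss(fake_flow_img, i1):
    # L_outline of Eq. (4). Note: a hard threshold blocks gradients; in
    # practice a soft (unthresholded) edge map would keep the loss trainable.
    return (delta(fake_flow_img) - delta(i1)).abs().mean()
```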
The invention then introduces Spynet as a teacher network to guide the optical flow produced by the generator. Deep learning requires a large amount of sample data for training, and introducing large amounts of training data slows network training down; introducing the teacher network reduces the dependence of the cycle-generative adversarial network on training samples and improves the training speed, while also reducing the risk of the network falling into mode collapse. The invention uses the EPE loss as the loss of the teacher network:
L_Teacher = (1/N) Σ_{i=1}^{N} sqrt((u_i^T - u_i)^2 + (v_i^T - v_i)^2)   (5)

where (u_i^T, v_i^T) are the flow values output by the teacher network in the horizontal and vertical directions, (u_i, v_i) are the optical flow values estimated by the generator in the horizontal and vertical directions, and N is the total number of optical flow values.
Finally, the optimized loss function of the cycle-generative adversarial network adopted by the invention is as follows:
L = L_cyc + λ_1·L_GAN + λ_2·L_OF + λ_3·L_outline + λ_4·L_Teacher   (6)

where λ_1-λ_4 represent the weight coefficients.
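A sketch of the combined objective of Eq. (6), using the loss terms sketched above; the unit lambda weights are hypothetical, since the text does not give their values:

```python
# Hypothetical weight coefficients lambda_1..lambda_4 (values not given in the text).
LAMBDA_GAN, LAMBDA_OF, LAMBDA_OUTLINE, LAMBDA_TEACHER = 1.0, 1.0, 1.0, 1.0

def total_loss(l_cyc, l_gan, l_of, l_outline, l_teacher):
    # Eq. (6): weighted sum of all five loss terms.
    return (l_cyc + LAMBDA_GAN * l_gan + LAMBDA_OF * l_of
            + LAMBDA_OUTLINE * l_outline + LAMBDA_TEACHER * l_teacher)
```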
In order to generate more accurate optical flow information, the invention improves the generator of the classical cycle-generative adversarial network, combining dilated convolution and an attention mechanism with the optical flow convolutional network. The improved generator is divided into two parts: a generator feature extraction network and a generator optical flow prediction network, whose structures are shown in FIG. 3 and FIG. 4 respectively. In the feature extraction part of the generator, the two original images whose optical flow is to be solved are concatenated along the channel dimension to form a 6-channel tensor; after three convolution layers (conv1-conv3), the feature map size is reduced to 1/8 of the original image while the number of channels is increased from 6 to 256. Since dense optical flow estimation is a pixel-by-pixel prediction, excessive down-sampling of the feature map causes loss of part of the feature information, which is unfavorable for accurate optical flow estimation. To solve this problem, the invention adopts multiple layers of dilated convolution after the feature map has been reduced to 1/8 size, obtaining a larger receptive field without further reducing the feature map size. Meanwhile, since the subsequent optical flow prediction network uses part of the information from the feature extraction network, a channel-space attention module (CSAM) is introduced into these feature layers so that the generator concentrates more on the moving objects in the input picture, improving the optical flow estimation accuracy. The CSAM module is shown in FIG. 5.
The CSAM module first computes channel attention: the spatial dimensions of an input feature layer of size C×W×H are compressed by max pooling and average pooling, producing two matrices of size C×1×1 which have, to some extent, a global receptive field. Features are extracted from the two matrices with a shared learnable convolution of size 3×3, the results are summed, and a sigmoid activation maps the sum to [0,1], yielding a channel attention of size C×1×1. The channel attention is then multiplied with the original C×W×H input features to obtain the feature map F_CA after the channel attention operation. In the spatial attention part of CSAM, maximum and average operations are first performed on F_CA along the channel dimension, outputting two matrices of size 1×W×H, which are concatenated; the resulting 2×W×H matrix is input into a 7×7 convolution layer followed by batch normalization and a LeakyReLU activation layer, outputting a 1×W×H matrix that is activated with sigmoid to obtain the spatial attention of size 1×W×H. Finally, the spatial attention is multiplied with F_CA to obtain the final output feature map containing both channel and spatial attention.
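A minimal PyTorch sketch of the CSAM module as just described; interpreting the shared 3×3 convolution over the pooled C×1×1 descriptors as a one-dimensional convolution along the channel axis is an assumption, since the text does not fix its layout:

```python
import torch
import torch.nn as nn

class CSAM(nn.Module):
    """Channel-space attention module (sketch of FIG. 5)."""

    def __init__(self, channels):
        super().__init__()
        # Shared learnable conv over the pooled descriptors, interpreted
        # here as a 1-D conv of kernel size 3 across the channel axis (assumption).
        self.channel_conv = nn.Conv1d(1, 1, kernel_size=3, padding=1, bias=False)
        # Spatial attention: 7x7 conv -> batch norm -> LeakyReLU on a 2-channel map.
        self.spatial = nn.Sequential(
            nn.Conv2d(2, 1, kernel_size=7, padding=3, bias=False),
            nn.BatchNorm2d(1),
            nn.LeakyReLU(0.1, inplace=True),
        )

    def forward(self, x):
        b, c, _, _ = x.shape
        # Channel attention: compress spatial dims by max and average pooling.
        max_desc = torch.amax(x, dim=(2, 3)).unsqueeze(1)   # B x 1 x C
        avg_desc = x.mean(dim=(2, 3)).unsqueeze(1)          # B x 1 x C
        summed = self.channel_conv(max_desc) + self.channel_conv(avg_desc)
        ca = torch.sigmoid(summed).view(b, c, 1, 1)         # map to [0, 1]
        f_ca = x * ca                                       # feature map F_CA
        # Spatial attention: max and mean along the channel dimension.
        max_map, _ = f_ca.max(dim=1, keepdim=True)          # B x 1 x H x W
        avg_map = f_ca.mean(dim=1, keepdim=True)            # B x 1 x H x W
        sa = torch.sigmoid(self.spatial(torch.cat([max_map, avg_map], dim=1)))
        return f_ca * sa                                    # channel + spatial attention
```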
In the optical flow prediction part of the generator, the network structure is shown in FIG. 4. The optical flow is predicted layer by layer, with each layer's prediction realized by one convolution layer, so that 5 layers of optical flow are predicted in total. During prediction, each layer (except the first) combines its features with the prediction result of the previous layer, which enhances the context information of the optical flow prediction and improves its accuracy; a concatenation operation is therefore required before each flow is predicted. Taking the solution of flow5 as an example: the 1024-channel features generated by conv6-1 of the generator feature extraction network are first passed through a transposed convolution layer with dilated convolution and CSAM to obtain 512-channel features; these are then concatenated with the 512-channel features output by conv5-1 and the 2-channel flow6 to obtain 1026-channel features, which are input into the flow5 layer for optical flow prediction. This update is iterated until flow2, of size H/4 × W/4 × 2, is calculated; flow2 is then up-sampled by a factor of four to obtain the final optical flow.
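A sketch of the flow5 step just described, reusing the CSAM sketch above; the kernel sizes, the exact placement of the dilated convolution, and the up-sampling of flow6 to flow5 resolution are assumptions not fixed by the text:

```python
import torch
import torch.nn as nn

class Flow5Step(nn.Module):
    # One decoder step: 1024-ch conv6-1 features -> 512-ch upsampled features,
    # concatenated with conv5-1 features and flow6 (512 + 512 + 2 = 1026 channels).
    def __init__(self):
        super().__init__()
        self.deconv = nn.ConvTranspose2d(1024, 512, kernel_size=4, stride=2, padding=1)
        self.csam = CSAM(512)                # CSAM as sketched above
        self.up_flow6 = nn.ConvTranspose2d(2, 2, kernel_size=4, stride=2, padding=1)
        self.predict_flow5 = nn.Conv2d(1026, 2, kernel_size=3, padding=1)

    def forward(self, feat6, feat5, flow6):
        up = self.csam(self.deconv(feat6))               # 1024-d -> 512-d features
        flow6_up = self.up_flow6(flow6)                  # bring flow6 to flow5 scale
        cat = torch.cat([feat5, up, flow6_up], dim=1)    # 1026-channel features
        return self.predict_flow5(cat)                   # 2-channel flow5

# Shape check with hypothetical feature sizes:
flow5 = Flow5Step()(torch.rand(1, 1024, 8, 16),   # conv6-1 output
                    torch.rand(1, 512, 16, 32),   # conv5-1 output
                    torch.rand(1, 2, 8, 16))      # flow6
```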
FIG. 6 shows the network structure of the discriminator used in the invention. The network is a binary classification network dedicated to separating real images from generator-produced images; that is, as far as possible, the picture generated by the generator is labeled as 0 and the real image is labeled as 1.
In the training stage of the cycle-generative adversarial network, the two original images whose optical flow is to be solved are concatenated along the channel dimension as domain A, an arbitrary real optical flow map serves as domain B, and both are input into the improved cycle-generative adversarial network, where the discriminator and the generator are trained alternately. The adversarial and cycle mechanisms of the network realize unsupervised training. The optical flow consistency loss improves the optical flow prediction accuracy; the contour loss optimizes the contour information of the optical flow; the teacher network reduces the risk of the network falling into mode collapse, lowers the network's requirement for large data volumes, and speeds up training; and the dilated convolution and the CSAM attention module further improve the prediction accuracy of the optical flow. The network thereby completes the mutual transformation from domain A to domain B and from domain B to domain A, i.e., the solution from the original images to the optical flow. The image obtained by visualizing the generator's output and the real image of the original domain B are input to the discriminator, which is trained according to the image source, labeling the samples generated by the generator as 0 and the real samples as 1. The generator aims to deceive the discriminator with the generated data, realizing the adversarial training of the discriminator and the generator and continuously optimizing the generator's parameters; training of the adversarial network is finished when the discriminator cannot distinguish the image source.
In the prediction stage of the cycle-generative adversarial network, the image pairs whose optical flow is to be solved and an arbitrary optical flow map are input into the network simultaneously, and the network realizes the cycle transformation between domain A and domain B, i.e., the prediction of the optical flow.
According to one example of the invention, the training equipment and parameter information are as follows. For training equipment, the server uses two Intel Xeon E5-2698 v4 CPUs and two Nvidia RTX 3090 GPUs with 24 GB of video memory each; the machine has 128 GB of RAM and 1 TB SSD + 4 TB HDD storage. For training parameters, 200 epochs are trained: the learning rate is fixed at 0.0002 for the first 100 epochs and then decays continuously from 0.0002 until it reaches 0 at the 200th epoch. Meanwhile, the teacher network Spynet acts only on the first 100 epochs, with λ_4 set to 0 for the last 100 epochs. Under the above configuration, one epoch takes approximately 610 s, the batch size is 4, and the deep learning framework used is PyTorch. As can be seen in FIG. 7, the classical CycleGAN, which makes no improvement for optical flow estimation, yields unclear contours of moving objects and large errors; after the improvements of the invention, the optical flow estimation accuracy is obviously improved and the detail information is also richer.
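A sketch of the learning-rate schedule just described (fixed at 0.0002 for 100 epochs, then linear decay to 0 at epoch 200); the Adam optimizer and the placeholder module are assumptions:

```python
import torch
import torch.nn as nn

generator = nn.Conv2d(6, 2, kernel_size=3, padding=1)         # placeholder for G_A
optimizer = torch.optim.Adam(generator.parameters(), lr=2e-4) # optimizer is assumed

def lr_lambda(epoch):
    # epochs 0-99: factor 1.0 (lr = 0.0002); epochs 100-199: linear decay to 0
    return 1.0 if epoch < 100 else (200 - epoch) / 100.0

scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda)
```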
The above description is only a preferred embodiment of the present invention, and is not intended to limit the present invention in any way, but any modifications or equivalent variations made according to the technical spirit of the present invention are within the scope of the present invention as claimed.

Claims (4)

1. An optical flow unsupervised estimation method based on an improved cycle generation countermeasure network, characterized by comprising the following steps:
(1) a training stage: the training data comprise three parts: picture I_1, picture I_2 and an optical flow visualization map I_real, where pictures I_1 and I_2 are used for calculating the optical flow, and I_real is used to guide the style of the optical flow image finally generated by the network. Specifically, picture I_1 and picture I_2 are merged along the channel dimension to form a 6-channel tensor I_{1+2}, which is input together with I_real into the cycle-generative network for training. During training, the generator G in the network generates from I_{1+2} a tensor FLOW with the same size as the original image but with 2 channels, and then converts FLOW by visualization into an image I_fake with a style similar to I_real; I_fake and I_real are input simultaneously to the discriminator D for discrimination. D is used to distinguish the source of the input sample, while the generator tries to deceive the discriminator with the generated sample; the network forms adversarial training in this way, and training is finished when the discriminator can no longer distinguish the image source;
(2) a prediction stage: after training of the cycle-generative adversarial network is finished, the discriminator is frozen and only the generator is retained. The two pictures Î_1 and Î_2 whose optical flow is to be calculated are merged along the channel dimension to obtain Î_{1+2}, which is input into the network simultaneously with an arbitrary optical flow visualization image Î_real; the network then uses generator G_A to complete the conversion of Î_{1+2} and obtain the predicted optical flow map of the two frames of images.
2. The method of claim 1, wherein the method comprises:
the specific meaning of the improved cycle generation countermeasure network in the step (1) is as follows:
Compared with the traditional cycle-generative adversarial network, the improvements mainly comprise three aspects. First, the generator structure is improved: dilated (hole) convolution is introduced to increase the receptive field of the network without reducing the feature map size, and a channel-space attention module (CSAM) is introduced so that the generator concentrates more on moving objects in the input picture, further improving the optical flow estimation accuracy. Second, Spynet is introduced as a teacher network to guide the generator, reducing the network's dependence on large data volumes while lowering the risk of the cycle-generative adversarial network falling into mode collapse. Finally, an optical flow consistency loss and a contour consistency loss are introduced to improve the accuracy of the generated optical flow.
3. The method of claim 1, wherein the method comprises:
the specific training process of the step (1) is as follows:
Suppose there are two domains in the problem being solved, domain A and domain B: domain A represents the 6-channel tensor I_{1+2} generated by channel-concatenating the two pictures I_1 and I_2 whose optical flow is to be solved, and domain B represents the optical flow map I_real. The task of the proposed improved cycle-generative network is to complete the cycle conversion between domain A and domain B, i.e., generator G_A converts I_{1+2} to domain B while generator G_B converts I_real to domain A. After conversion, a discriminator judges the image source: the image generated by the generator and the real image are input to the discriminator D, which through training learns to judge the image source, guiding the optimization of the generator parameters and thereby improving the realism of the images produced by the generator; the discriminator and the generator are trained together in this adversarial manner. During training, the optical flow consistency loss improves the accuracy of the generated optical flow by relating the generated flow to I_1 and I_2, and the contour consistency loss refines the contour information of I_fake through the contour difference between the generated optical flow map I_fake and I_1. At the same time, Spynet is introduced as a teacher network, and the EPE loss function is used to train and guide the generator; when the discriminator cannot distinguish whether an image comes from the generator, Nash equilibrium is reached and training is completed.
4. The method of claim 1, wherein the method comprises: the specific operation process of the step (2) is as follows:
After training of the improved cycle-generative adversarial network is completed, the discriminator is frozen and only the generator is used. Similarly to the training process, the two images Î_1 and Î_2 whose optical flow is to be calculated are concatenated along the channel dimension to obtain an image Î_{1+2} representing domain A, which is input into the trained network simultaneously with an arbitrary real optical flow map Î_real serving as domain B; generators G_A and G_B then complete the interconversion between domain A and domain B, and the optical flow prediction is completed under the unsupervised condition.
CN202210150329.0A 2022-02-18 2022-02-18 Optical flow unsupervised estimation method based on improved cycle generation countermeasure network Pending CN114693744A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210150329.0A CN114693744A (en) 2022-02-18 2022-02-18 Optical flow unsupervised estimation method based on improved cycle generation countermeasure network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210150329.0A CN114693744A (en) 2022-02-18 2022-02-18 Optical flow unsupervised estimation method based on improved cycle generation countermeasure network

Publications (1)

Publication Number Publication Date
CN114693744A true CN114693744A (en) 2022-07-01

Family

ID=82138107

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210150329.0A Pending CN114693744A (en) 2022-02-18 2022-02-18 Optical flow unsupervised estimation method based on improved cycle generation countermeasure network

Country Status (1)

Country Link
CN (1) CN114693744A (en)


Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115861384A (en) * 2023-02-27 2023-03-28 广东工业大学 Optical flow estimation method and system based on generation of countermeasure and attention mechanism
CN115861384B (en) * 2023-02-27 2023-05-23 广东工业大学 Optical flow estimation method and system based on countermeasure and attention mechanism generation

Similar Documents

Publication Publication Date Title
Cortinhal et al. Salsanext: Fast, uncertainty-aware semantic segmentation of lidar point clouds for autonomous driving
Ming et al. Deep learning for monocular depth estimation: A review
CN107945204B (en) Pixel-level image matting method based on generation countermeasure network
CN111325794A (en) Visual simultaneous localization and map construction method based on depth convolution self-encoder
CN109756690B (en) Light-weight video interpolation method based on feature-level optical flow
Mahjourian et al. Geometry-based next frame prediction from monocular video
CN110910437B (en) Depth prediction method for complex indoor scene
CN111311663B (en) Real-time large-scene three-dimensional semantic modeling method
CN115018888A (en) Optical flow unsupervised estimation method based on Transformer
Maslov et al. Online supervised attention-based recurrent depth estimation from monocular video
CN111739037B (en) Semantic segmentation method for indoor scene RGB-D image
Ubina et al. Intelligent underwater stereo camera design for fish metric estimation using reliable object matching
CN116205962A (en) Monocular depth estimation method and system based on complete context information
Qin et al. Depth estimation by parameter transfer with a lightweight model for single still images
CN114693744A (en) Optical flow unsupervised estimation method based on improved cycle generation countermeasure network
CN116342675B (en) Real-time monocular depth estimation method, system, electronic equipment and storage medium
CN116452654B (en) BEV perception-based relative pose estimation method, neural network and training method thereof
CN116503602A (en) Unstructured environment three-dimensional point cloud semantic segmentation method based on multi-level edge enhancement
CN116485892A (en) Six-degree-of-freedom pose estimation method for weak texture object
CN115496859A (en) Three-dimensional scene motion trend estimation method based on scattered point cloud cross attention learning
Nie et al. Binocular image dehazing via a plain network without disparity estimation
CN116883770A (en) Training method and device of depth estimation model, electronic equipment and storage medium
Bi et al. EBStereo: edge-based loss function for real-time stereo matching
Billy et al. DA-NET: Monocular Depth Estimation using Disparity maps Awareness NETwork
US20230005165A1 (en) Cross-task distillation to improve depth estimation

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination