CN113537379A - Three-dimensional matching method based on CGANs - Google Patents
Three-dimensional matching method based on CGANs
- Publication number
- CN113537379A CN113537379A CN202110860315.3A CN202110860315A CN113537379A CN 113537379 A CN113537379 A CN 113537379A CN 202110860315 A CN202110860315 A CN 202110860315A CN 113537379 A CN113537379 A CN 113537379A
- Authority
- CN
- China
- Prior art keywords
- cgans
- input
- discriminator
- image
- network
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Links
- 238000000034 method Methods 0.000 title claims abstract description 64
- 238000012549 training Methods 0.000 claims abstract description 30
- 238000000605 extraction Methods 0.000 claims abstract description 7
- 230000006870 function Effects 0.000 claims description 44
- 230000008569 process Effects 0.000 claims description 21
- 238000012545 processing Methods 0.000 claims description 10
- 238000005070 sampling Methods 0.000 claims description 10
- 238000013527 convolutional neural network Methods 0.000 claims description 8
- 238000005457 optimization Methods 0.000 claims description 7
- 238000013135 deep learning Methods 0.000 description 9
- 230000004913 activation Effects 0.000 description 8
- 238000004364 calculation method Methods 0.000 description 8
- 230000000694 effects Effects 0.000 description 4
- 238000010606 normalization Methods 0.000 description 4
- 238000013528 artificial neural network Methods 0.000 description 3
- 230000009286 beneficial effect Effects 0.000 description 3
- 210000002569 neuron Anatomy 0.000 description 3
- 230000003044 adaptive effect Effects 0.000 description 2
- 230000002776 aggregation Effects 0.000 description 2
- 238000004220 aggregation Methods 0.000 description 2
- 238000006243 chemical reaction Methods 0.000 description 2
- 238000010586 diagram Methods 0.000 description 2
- 239000000284 extract Substances 0.000 description 2
- 230000004927 fusion Effects 0.000 description 2
- 238000011160 research Methods 0.000 description 2
- 230000002411 adverse Effects 0.000 description 1
- 238000013459 approach Methods 0.000 description 1
- 238000013473 artificial intelligence Methods 0.000 description 1
- 230000003190 augmentative effect Effects 0.000 description 1
- 239000003086 colorant Substances 0.000 description 1
- 230000003247 decreasing effect Effects 0.000 description 1
- 230000007547 defect Effects 0.000 description 1
- 238000013461 design Methods 0.000 description 1
- 238000001514 detection method Methods 0.000 description 1
- 238000006073 displacement reaction Methods 0.000 description 1
- 238000011478 gradient descent method Methods 0.000 description 1
- 238000005286 illumination Methods 0.000 description 1
- 238000010801 machine learning Methods 0.000 description 1
- 238000012423 maintenance Methods 0.000 description 1
- 238000013178 mathematical model Methods 0.000 description 1
- 230000003278 mimic effect Effects 0.000 description 1
- 230000008447 perception Effects 0.000 description 1
- 230000035945 sensitivity Effects 0.000 description 1
- 230000009466 transformation Effects 0.000 description 1
- 230000000007 visual effect Effects 0.000 description 1
- 239000002699 waste material Substances 0.000 description 1
Images
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/22—Matching criteria, e.g. proximity measures
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- Life Sciences & Earth Sciences (AREA)
- Artificial Intelligence (AREA)
- General Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Evolutionary Computation (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Computational Linguistics (AREA)
- Bioinformatics & Computational Biology (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Health & Medical Sciences (AREA)
- Biomedical Technology (AREA)
- Biophysics (AREA)
- Evolutionary Biology (AREA)
- General Health & Medical Sciences (AREA)
- Molecular Biology (AREA)
- Computing Systems (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Image Analysis (AREA)
- Image Processing (AREA)
Abstract
A stereo matching method based on CGANs comprises the following steps. Image input: two left and right camera views and a true value (ground-truth disparity map) are input; the left and right images serve as the reference image and the target image respectively, and the true value serves as the label corresponding to the left image. Feature extraction: features are extracted from the two input camera views by a pseudo-twin network and fused along the channel dimension. Generating a disparity map: the fused features serve as the condition in the CGANs, shared by the generator and the discriminator, and the generator generates a disparity map. Identifying true and false: the fused features, as the condition, are input to the discriminator together with either the true value or the generated disparity map, and the discriminator judges whether the input sample is a generated sample or the true value. Training a model: the error between the generated disparity map and the true value, together with the output of the discriminator, guides the learning of the network model.
Description
Technical Field
The invention belongs to the field of computer vision and the technical field of deep learning, and particularly relates to a stereo matching method based on CGANs (conditional generative adversarial networks).
Background
Stereo matching plays a crucial role in many computer vision applications, such as robot navigation, autonomous driving, augmented reality, gesture recognition, three-dimensional reconstruction, military reconnaissance, and maintenance inspection. The purpose of computer vision is to mimic the way the human visual system perceives the distance of objects in a three-dimensional scene, and stereo matching can recover depth information of a three-dimensional scene from two-dimensional images. As one of the key research directions in computer vision, stereo matching first matches corresponding pixels between two camera views taken from different viewpoints, then computes the horizontal displacement between corresponding pixels to obtain the disparity, and finally derives depth information through a mathematical model. Because the disparity between two corresponding pixels is inversely proportional to depth, the task of acquiring depth information can reasonably be converted into the task of computing disparity. Stereo matching suffers from problems such as occlusion, illumination changes, and weak texture, and previous algorithms all aim to address these problems so as to improve the accuracy of disparity prediction.
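For reference, the standard geometric relation for a rectified stereo pair, which the above "mathematical model" refers to but does not spell out, is

    Z = f · B / d

where Z is the depth of a scene point, f is the focal length, B is the baseline between the two cameras, and d is the horizontal disparity between corresponding pixels; depth is therefore inversely proportional to disparity.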
Existing algorithms can be divided into two types: traditional stereo matching algorithms and end-to-end stereo matching algorithms. The first type, the traditional stereo matching algorithm, comprises four steps: matching cost computation, cost aggregation, disparity computation, and disparity refinement. However, this divide-and-conquer approach has several problems: first, the number of hyper-parameters increases; second, the implementation of stereo matching becomes complicated; third, solving the problem step by step does not necessarily give the best result, because combining the optimal solutions of the sub-problems is not the same as reaching the global optimum; fourth, the association range of a single pixel during disparity computation is limited by the aggregation window. The second type is the end-to-end stereo matching algorithm. Deep learning is one of the key research fields of artificial intelligence and machine learning and has achieved remarkable results in computer vision. To overcome the problems caused by the separate steps of the traditional stereo matching algorithm, an end-to-end system built with deep learning can merge the four steps. Deep learning trains a multi-layer neural network: training data are first fed into the first layer of neurons, the weights of that layer are obtained through a nonlinear activation function, the output of that layer is then passed as input to the next layer to obtain the weights of the corresponding layer, and the weights are continuously updated as learning progresses until reasonable weights, i.e. a distributed feature representation of the data, are learned. The end-to-end idea in deep learning realizes a process in which data are input at one end and results are output directly at the other end, so shortcomings caused by hand-crafted design can be overcome. In an end-to-end deep neural network, the features produced after each layer of neurons fuses semantic information can reduce the independence of individual pixels in disparity prediction. Therefore, an end-to-end stereo matching algorithm takes the two left and right camera views as input and outputs the corresponding disparity map, leaving intermediate feature learning and information fusion to deep learning.
Although existing end-to-end stereo matching algorithms solve the problems of the traditional step-by-step methods, they essentially build a cost volume from matching costs between candidate points, and matching pixels to pixels does not necessarily fit the ideal situation, which causes a loss of precision. Moreover, most of them use 3D convolution to process the cost volume, which results in a very high computational cost.
Disclosure of Invention
Object of the Invention
The invention provides a stereo matching method based on CGANs, aiming at the problems caused by the divide-and-conquer strategy of traditional stereo matching algorithms and the high computational cost caused by processing the cost volume with 3D convolution.
Technical scheme
A stereo matching method based on CGANs comprises the following steps:
image input: two left and right camera views and a true value are input; the left and right images serve as the reference image and the target image respectively, and the true value serves as the label corresponding to the left image;
feature extraction: features are extracted from the two input camera views by a pseudo-twin network and fused along the channel dimension;
generating a disparity map: the fused features serve as the condition in the CGANs, shared by the generator and the discriminator, and the generator generates a disparity map;
identifying true and false: the fused features, as the condition, are input to the discriminator together with either the true value or the generated disparity map, and the discriminator identifies whether the input sample is a generated sample or the true value;
training a model: the error between the generated disparity map and the true value, together with the output of the discriminator, guides the learning of the network model.
Further, after the two left and right camera views are input, a 256 × 256 cropping operation is performed, and it is then checked whether the number of channels of each image is 3; if so, the next operation proceeds, otherwise an error is reported.
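A minimal sketch of this input step (PyTorch-style; the random-crop choice and the helper name are assumptions, since the patent only specifies a 256 × 256 crop and a 3-channel check):

    import torch

    def prepare_inputs(left, right, crop=256):
        """Crop both views to crop x crop with the same window and check RGB channels.
        left, right: tensors of shape (C, H, W)."""
        if left.shape[0] != 3 or right.shape[0] != 3:
            raise ValueError("both camera views must have 3 channels")
        _, h, w = left.shape
        top = torch.randint(0, h - crop + 1, (1,)).item()
        lft = torch.randint(0, w - crop + 1, (1,)).item()
        # the same crop window is applied to both views so pixels stay aligned
        left_c = left[:, top:top + crop, lft:lft + crop]
        right_c = right[:, top:top + crop, lft:lft + crop]
        return left_c, right_c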
Further, since the input consists of two images, a pseudo-twin network is adopted. The pseudo-twin network used in the algorithm is composed of two convolutional neural networks with the same structure but different weights. The features extracted from the two images need to be merged into one input before being passed to the next module, so they are stacked along the channel dimension.
Further, the fused features are set as the condition in the CGANs and input to the U-Net in the generator. The U-Net acts as an encoder-decoder network that performs down-sampling and up-sampling operations on the input and generates a disparity map with one channel.
Further, the result of stacking the extracted features along the channel dimension serves as the condition and is input to the discriminator together with either the true value or the U-Net output; the discriminator treats this as a binary true-or-false problem and, via convolutional neural network processing, outputs a probability value indicating whether the input sample is the true value or a generated sample.
Further, the error between the disparity map generated by the U-Net and the true value is computed with the conventional L1 loss function. The conventional L1 loss function is as follows:

    L_L1(G) = E_{x,y}[ ||y − G(x)||_1 ]
where E_{x,y} denotes the expectation with x drawn from the training data distribution and y from the true-value distribution, x is the condition input to the generator G and shared with the discriminator D, G(x) is the generated sample, and y is the true value.
The judgment of the discriminator D on the true value y or the generated sample G(x) is used to compute the loss function of the CGANs. The loss function of the CGANs is as follows:

    L_cGAN(G, D) = E_{x,y}[ log D(x, y) ] + E_x[ log(1 − D(x, G(x))) ]
where E_x denotes the expectation with x drawn from the training data distribution.
The two loss functions are used together for gradient updates via the Adam optimization method so as to guide the training of the whole network model. When training the network, the generator G tries to minimize the loss function while the discriminator D tries to maximize it. The final loss function G* of the CGANs used in the algorithm is expressed as follows:

    G* = arg min_G max_D L_cGAN(G, D) + λ · L_L1(G)
advantages and effects
In order to obtain more accurate disparity information, the invention solves the problems caused by the divide-and-conquer strategy of traditional stereo matching algorithms and, instead of the 3D convolution required by conventional end-to-end algorithms, predicts disparity with a better method, reducing the computational cost. The invention provides a stereo matching method based on CGANs. Following the end-to-end idea in deep learning, the four steps of the traditional stereo matching method are merged into one, which simplifies the stereo matching pipeline and resolves the problems caused by divide and conquer. Compared with similar traditional end-to-end algorithms, a pseudo-twin network is adopted to process the two similar pictures, which removes the negative influence of the convolutional neural network on subsequent U-Net learning. The condition set in the CGANs is changed from the left and right camera views to the feature maps extracted by the pseudo-twin network, which reduces the number of parameters of the training model and thus the computational cost. For the choice of generator in the CGANs, a U-Net, which learns high-level semantic information well and has skip connections, is used to generate a more accurate disparity map with a better effect. Meanwhile, the network structure of the U-Net is adjusted to find a number of layers that balances the amount of computation against the accuracy of the disparity map.
The method adopts CGANs to generate the disparity map and thereby completes the disparity prediction task; it reduces memory and time consumption while improving accuracy to a certain extent, lowers the computational cost, and simplifies the implementation of the stereo matching algorithm.
Drawings
Fig. 1 is a network structure diagram of a stereo matching method based on CGANs provided by the present invention.
Fig. 2 is a logic flow diagram of a stereo matching method based on CGANs provided in the present invention.
Detailed Description
The invention is further described below with reference to the accompanying drawings:
Examples
A stereo matching method based on CGANs comprises the following steps:
Image input: two left and right camera views and a true value are input; the left and right images serve as the reference image and the target image respectively, and the true value serves as the label corresponding to the left image, as shown in fig. 1.
Feature extraction: features are extracted from the two input camera views by a pseudo-twin network and fused along the channel dimension.
the difference from the conventional image generation task is that: for the binocular stereo matching task, the input is no longer one picture but two left and right camera views. Two methods for performing input processing on two left and right camera views are generally used, one is to directly stack two original images on a channel dimension, and the other is to use a twin network for reference, namely to respectively extract features of input by using two convolutional neural networks with the same structure and shared weight and then fuse the features. Although the two methods can be handed to a subsequent network to learn data distribution, the first method can adversely affect feature learning, so that the effect of generating the disparity map by the generator G has a certain limitation. The second method using a twin network affects subsequent network learning because of the correlation with pixel points, and ignores the concern about a slight difference between left and right camera views when finding parallax to some extent.
Therefore, to improve the result, feature extraction is implemented as a pseudo-twin network, i.e., two neural network branches with different weights inside the network structure of the generator G. This removes the influence of the correlation computation in this step, preserves the attention to the small differences between the left and right camera views during disparity prediction, and reduces the difficulty of subsequent network learning.
The left and right camera views, cropped to 256 × 256, are fed separately into two convolutional neural networks with the same structure but different weights for feature extraction. Each of the two input images passes through a convolution module with 64 output channels and then a convolution module with 128 output channels, and the image size remains 256 × 256 throughout. The convolution module used in this part of the convolutional neural network consists of a convolutional layer with kernel size 3 × 3, stride 1 and padding 1, a BN layer, and a LeakyReLU activation function layer.
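A sketch of this pseudo-twin branch as described above (two structurally identical but independently weighted Conv-BN-LeakyReLU stacks with 64 and then 128 output channels, 3 × 3 kernels, stride 1, padding 1, followed by channel-dimension fusion); class and function names are illustrative, not from the patent:

    import torch
    import torch.nn as nn

    def conv_block(in_ch, out_ch):
        # 3x3 convolution, stride 1, padding 1, followed by BN and LeakyReLU(0.2)
        return nn.Sequential(
            nn.Conv2d(in_ch, out_ch, kernel_size=3, stride=1, padding=1),
            nn.BatchNorm2d(out_ch),
            nn.LeakyReLU(0.2, inplace=True),
        )

    class PseudoTwinExtractor(nn.Module):
        """Two branches with identical structure but unshared weights."""
        def __init__(self):
            super().__init__()
            self.left_branch = nn.Sequential(conv_block(3, 64), conv_block(64, 128))
            self.right_branch = nn.Sequential(conv_block(3, 64), conv_block(64, 128))

        def forward(self, left, right):
            f_left = self.left_branch(left)     # (N, 128, 256, 256)
            f_right = self.right_branch(right)  # (N, 128, 256, 256)
            # fuse by stacking along the channel dimension -> (N, 256, 256, 256)
            return torch.cat([f_left, f_right], dim=1)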
The BN layer refers to the Batch Normalization regularization method. The CGANs learning process captures the distribution of the training data, but the value ranges of the pictures processed at each step differ, which is not conducive to learning by the network model. The batch normalization method commonly used in deep learning can therefore unify the range of the input data to [-1, 1]. This not only eases the learning of the network model but also benefits the gradient updates of back-propagation, exploits the nonlinearity of the LeakyReLU activation function, accelerates network convergence, and reduces the network's sensitivity to hyper-parameter adjustment. Concretely, batch normalization subtracts the per-channel mean computed over the batch after the convolutional layer and divides by the standard deviation; when dividing the image by the standard deviation during training, the divisor may be replaced directly by 255, i.e. the maximum value of an 8-bit unsigned integer representing the maximum of the RGB channels, in order to reduce the amount of computation.
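As a rough numerical illustration of the per-channel normalization described above (nn.BatchNorm2d already provides the trainable version; the 255 scaling reflects the simplification mentioned in the text and is otherwise an assumption):

    import torch

    def channelwise_normalize(x, eps=1e-5):
        # x: (N, C, H, W); subtract the per-channel mean over the batch and
        # divide by the per-channel standard deviation, as batch normalization does
        mean = x.mean(dim=(0, 2, 3), keepdim=True)
        std = x.std(dim=(0, 2, 3), keepdim=True)
        return (x - mean) / (std + eps)

    def scale_uint8_image(img_uint8):
        # simplification mentioned in the text: divide raw 8-bit images by 255
        return img_uint8.float() / 255.0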
The LeakyReLU function is expressed as follows:

    y_i = x_i,        if x_i > 0
    y_i = a_i · x_i,  if x_i ≤ 0

where a_i is a fixed parameter in the interval (0, +∞), set here to 0.2; x_i is the value input to the function; and y_i is the output of the function.
The features extracted from the two images are then stacked along the channel dimension, serving both as the input to the generator G and as the condition for the discriminator D.
Generating a disparity map: the fused features serve as the condition in the CGANs, shared by the generator and the discriminator, and the generator generates a disparity map.
the condition shared by the generator and the discriminator in the CGANs is that the convolution layer in the pseudo-twin network is used for extracting higher-layer features with higher resolution from the two left and right camera views to replace the original image pixel condition.
To improve the accuracy of the generated result, the problems of occlusion, lighting, and weak texture in stereo matching need to be addressed. The key is to learn high-level semantic information, so a suitable network must be selected as the generator in the CGANs, and networks with an encoder-decoder structure can handle these problems. The encoder processes low-level features such as contours, colors, edges, textures and shapes, continuously extracting features, shrinking the picture and enlarging the receptive field, while the decoder restores the image and processes the high-level features with complex semantics that aid understanding. U-Net is one such encoder-decoder network and has advantages over other networks in generating disparity maps. The conventional CGANs generator network structure requires all information to flow through every layer from input to output, which undoubtedly lengthens training. For the stereo matching task, although the two input camera views and the generated disparity map require a complicated conversion, their structures are roughly the same, so the low-level semantic information shared between them is very important. During feature learning this information should not be lost, redundant conversion operations should be avoided, and the network structure of the feature learning module can be adjusted according to the requirements of stereo matching. A U-Net with skip connections in its network structure not only shares information between input and output but also, to a certain extent, avoids the waste of resources brought by the conventional CGANs network structure. In other words, the generator network fuses the features extracted by the pseudo-twin network along the channel dimension and then hands the fused features to the U-Net to learn and generate the disparity map.
The U-Net performs 8 down-sampling and 8 up-sampling operations to process the input. The convolution module used in down-sampling consists of a convolutional layer with kernel size 3 × 3, stride 2 and padding 1, a BN layer, and a LeakyReLU activation function layer. The first seven convolution modules used in up-sampling consist of a deconvolution layer with kernel size 3 × 3, stride 2 and padding 1, a BN layer, and a ReLU activation function layer; the mathematical expression of the ReLU activation function is as follows:

    ReLU(x) = max(0, x)
the last layer of upsampling, i.e. the output layer, will replace the activation function with a Tanh function, the mathematical expression of which is as follows:
wherein e isxIs that the input value is subjected to an exponential function operation, e-xThe method is characterized in that exponential function operation is carried out after an input value takes a negative value.
The input data pass through 3 convolution modules with 256 output channels and 5 convolution modules with 512 output channels. During down-sampling, the length and width of the input are halved each time it passes through a convolution module, the size going from 128 × 128 after the first module down to 1 × 1 at the end of down-sampling. During up-sampling, the data pass through deconvolution modules with 512 and then 256 output channels; in the process, the skip connections of the U-Net superimpose the data with the outputs of the corresponding down-sampling layers before the result is fed into the next deconvolution module. Each deconvolution module doubles the length and width of the image, gradually adjusting it from 1 × 1 back to the 256 × 256 size required when outputting the disparity map, so that it matches the size of the input image in step one.
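A sketch of an 8-down / 8-up U-Net generator consistent with the description; the exact per-layer channel plan and the output_padding needed to double sizes exactly are assumptions not stated in the patent:

    import torch
    import torch.nn as nn

    def down(in_ch, out_ch):
        # 3x3 conv, stride 2, padding 1 halves H and W; BN + LeakyReLU(0.2)
        return nn.Sequential(
            nn.Conv2d(in_ch, out_ch, 3, stride=2, padding=1),
            nn.BatchNorm2d(out_ch),
            nn.LeakyReLU(0.2, inplace=True),
        )

    def up(in_ch, out_ch, last=False):
        # 3x3 transposed conv, stride 2, padding 1; output_padding=1 is an
        # implementation assumption so the spatial size doubles exactly
        layers = [nn.ConvTranspose2d(in_ch, out_ch, 3, stride=2, padding=1, output_padding=1)]
        if last:
            layers.append(nn.Tanh())  # output layer uses Tanh
        else:
            layers += [nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True)]
        return nn.Sequential(*layers)

    class UNetGenerator(nn.Module):
        """8 down / 8 up U-Net; the channel plan below is one assumption
        consistent with '3 modules of 256 channels and 5 of 512'."""
        def __init__(self, in_ch=256):
            super().__init__()
            enc_ch = [256, 256, 256, 512, 512, 512, 512, 512]
            self.downs = nn.ModuleList()
            c = in_ch
            for oc in enc_ch:
                self.downs.append(down(c, oc))
                c = oc
            dec_ch = [512, 512, 512, 512, 256, 256, 256]
            self.ups = nn.ModuleList()
            for i, oc in enumerate(dec_ch):
                # after the first up block, the skip connection doubles the input channels
                ic = c if i == 0 else dec_ch[i - 1] + enc_ch[-(i + 1)]
                self.ups.append(up(ic, oc))
            self.out = up(dec_ch[-1] + enc_ch[0], 1, last=True)  # 1-channel disparity map

        def forward(self, x):
            skips = []
            for d in self.downs:
                x = d(x)
                skips.append(x)
            skips = skips[:-1][::-1]  # drop the bottleneck, reverse for the decoder
            for i, u in enumerate(self.ups):
                x = u(x)
                x = torch.cat([x, skips[i]], dim=1)  # skip connection
            return self.out(x)  # (N, 1, 256, 256)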
Identifying true and false: either the true value or the generated disparity map is input to the discriminator together with the condition, and the discriminator then identifies whether the input sample is a generated sample or the true value.
for the discriminator network, the original left and right camera views are no longer used as conditions shared with the generator, but the setting of the conditions is replaced with feature maps extracted for the two left and right views by the pseudo-twin network. After stacking the condition and the generated sample or the real sample on the channel dimension, inputting the stacked condition and generated sample or real sample into the convolution modules with the four layers of output channels with the numbers of 64, 128, 256 and 512, and then outputting a probability value indicating the judgment result of the discriminator by utilizing the convolution module with the output channel with the number of 1. The first four layers of convolution modules used in the discriminator are consistent in structure with the convolution modules adopted in the U-Net down sampling in the step three, and the last layer of output layer convolution module consists of convolution layers with the size of 3 x 3, the step length of 2 and the filling of 1 and a Sigmoid activation function layer. The Sigmoid function is used to handle the binary problem that the input samples are true or false. The mathematical expression of Sigmoid function is as follows:
where σ (x) refers to the output value of the Sigmoid function.
Training a model: the error of the generated disparity map and the true value and the result output by the discriminator are used for guiding the network model learning.
During training, the generator G is trained first and then the discriminator D; the pseudo-twin network that extracts features from the two camera views is trained together with the U-Net, and this cycle repeats until training ends. The whole training is a game between the generator G and the discriminator D: G hopes that the disparity map generated by the U-Net fools D, i.e. that D judges the generated sample as true, so G tries to minimize the loss function, while D tries to maximize the loss function because it wants to improve its ability to judge generated samples as false. Training stops when both G and D reach their optimal solutions, theoretically achieving a Nash equilibrium.
In detail, training of the whole network model is guided by the loss functions: gradients are updated by means of an optimization method, and gradient descent continues toward the optimal solution to update the weight parameters. The weight parameters involve both weight initialization and the optimization method.
Weight initialization gives the network model a better starting position when searching the numerical space for the global optimum, which helps it converge better and faster during learning. The convolutional layer weights are initialized from a random normal distribution with mean 0 and variance 0.02.
The process by which the network model searches for the optimal solution is called optimization. The optimization method adopted here is Adam, an improvement on gradient descent; Adam is used because, once initial values of a few related hyper-parameters are set, it adapts the learning rate automatically and helps the network model converge better and faster during learning.
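A sketch of the initialization and optimizer setup described above, reusing the module classes sketched earlier; the learning rate and betas are common pix2pix defaults and are assumptions here, as is interpreting "variance 0.02" as a standard deviation of 0.02:

    import torch
    import torch.nn as nn

    def init_weights(module):
        # convolution weights drawn from a normal distribution with mean 0, std 0.02
        if isinstance(module, (nn.Conv2d, nn.ConvTranspose2d)):
            nn.init.normal_(module.weight, mean=0.0, std=0.02)
            if module.bias is not None:
                nn.init.zeros_(module.bias)

    extractor = PseudoTwinExtractor()   # from the feature-extraction sketch
    generator = UNetGenerator()         # from the U-Net sketch
    discriminator = Discriminator()     # from the discriminator sketch
    for net in (extractor, generator, discriminator):
        net.apply(init_weights)

    # the pseudo-twin extractor is trained together with the U-Net generator
    opt_g = torch.optim.Adam(list(extractor.parameters()) + list(generator.parameters()),
                             lr=2e-4, betas=(0.5, 0.999))
    opt_d = torch.optim.Adam(discriminator.parameters(), lr=2e-4, betas=(0.5, 0.999))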
The error between the disparity map generated by the U-Net and the true value is computed with the L1 loss function; the conventional L1 loss function is as follows:

    L_L1(G) = E_{x,y}[ ||y − G(x)||_1 ]
where E_{x,y} denotes the expectation with x drawn from the training data distribution and y from the true-value distribution, x is the condition input to the generator G and shared with the discriminator D, G(x) is a generated sample, and y is the true value;
the judgment result of the discriminator D on the real value y or the generated sample G (x) is used for calculating the loss function of the CGANs; the loss function for CGANs is as follows:
where E_x denotes the expectation with x drawn from the training data distribution. The two loss functions are used together for gradient updates via the optimization method so as to guide the training of the whole network model. When training the network, the generator G needs to minimize the loss function while the discriminator D needs to maximize it. To balance the CGAN loss term and the L1 loss term, a hyper-parameter λ is added. The final loss function G* of the CGANs used in the algorithm is expressed as follows:

    G* = arg min_G max_D L_cGAN(G, D) + λ · L_L1(G)
where G* is the final loss function and λ is the hyper-parameter added to balance the CGAN loss term and the L1 loss term.
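A sketch of one training iteration combining the two losses as described (G first, then D, with the pseudo-twin extractor updated alongside the U-Net); the binary-cross-entropy formulation and λ = 100 are pix2pix-style assumptions, not values given in the patent:

    import torch
    import torch.nn.functional as F

    def train_step(left, right, target, extractor, generator, discriminator,
                   opt_g, opt_d, lam=100.0):
        """One adversarial step. left, right: cropped camera views; target: true disparity."""
        # condition shared by G and D: fused pseudo-twin features
        fused = extractor(left, right)

        # generator (and extractor) update: fool D and stay close to the true value (L1 term)
        fake = generator(fused)
        pred_fake = discriminator(fused, fake)
        loss_g = F.binary_cross_entropy(pred_fake, torch.ones_like(pred_fake)) \
                 + lam * F.l1_loss(fake, target)
        opt_g.zero_grad()
        loss_g.backward()
        opt_g.step()

        # discriminator update: real pairs -> 1, generated pairs -> 0
        pred_real = discriminator(fused.detach(), target)
        pred_fake = discriminator(fused.detach(), fake.detach())
        loss_d = 0.5 * (F.binary_cross_entropy(pred_real, torch.ones_like(pred_real))
                        + F.binary_cross_entropy(pred_fake, torch.zeros_like(pred_fake)))
        opt_d.zero_grad()
        loss_d.backward()
        opt_d.step()
        return loss_g.item(), loss_d.item()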
Claims (6)
1. A stereo matching method based on CGANs, characterized in that the method comprises the following steps:
image input: two left and right camera views and a true value are input; the left and right images serve as the reference image and the target image respectively, and the true value serves as the label corresponding to the left image;
feature extraction: features are extracted from the two input camera views by a pseudo-twin network and fused along the channel dimension;
generating a disparity map: the fused features serve as the condition in the CGANs, shared by the generator and the discriminator, and the generator generates a disparity map;
identifying true and false: the extracted and fused features, as the condition, are input to the discriminator together with either the true value or the generated disparity map, and the discriminator identifies whether the input sample is a generated sample or the true value;
training a model: the error between the generated disparity map and the true value, together with the output of the discriminator, guides the learning of the network model.
2. The CGANs-based stereo matching method according to claim 1, wherein: after the two left and right camera views are input, a 256 × 256 cropping operation is performed, and it is then checked whether the number of channels of each image is 3; if so, the next operation proceeds, otherwise an error is reported.
3. The CGANs-based stereo matching method according to claim 1, wherein: since the input consists of two images, a pseudo-twin network is adopted; the pseudo-twin network used in the algorithm is composed of two convolutional neural networks with the same structure and different weights; and the features extracted from the two images are stacked along the channel dimension before being input to the next module.
4. The CGANs-based stereo matching method according to claim 1, wherein: the fused features are set as the condition in the CGANs and input to the U-Net in the generator; the U-Net acts as an encoder-decoder network that performs down-sampling and up-sampling operations on the input and generates a disparity map with one channel.
5. The CGANs-based stereo matching method according to claim 4, wherein: either the true value or the U-Net output is stacked with the condition along the channel dimension and input to the discriminator, and the discriminator treats this as a binary true-or-false problem and, via convolutional neural network processing, outputs a probability value indicating whether the input sample is the true value or a generated sample.
6. The CGANs-based stereo matching method according to claim 5, wherein: the error between the disparity map generated by the U-Net and the true value is computed with the L1 loss function; the conventional L1 loss function is as follows:

    L_L1(G) = E_{x,y}[ ||y − G(x)||_1 ]
where E_{x,y} denotes the expectation with x drawn from the training data distribution and y from the true-value distribution, x is the condition input to the generator G and shared with the discriminator D, G(x) is a generated sample, and y is the true value;
the judgment of the discriminator D on the true value y or the generated sample G(x) is used to compute the loss function of the CGANs; the loss function of the CGANs is as follows:

    L_cGAN(G, D) = E_{x,y}[ log D(x, y) ] + E_x[ log(1 − D(x, G(x))) ]
where E_x denotes the expectation with x drawn from the training data distribution; the two loss functions are used together for gradient updates via the optimization method so as to guide the training of the whole network model; when training the network, the generator G needs to minimize the loss function while the discriminator D needs to maximize it; to balance the CGAN loss term and the L1 loss term, a hyper-parameter λ is added; the final loss function G* of the CGANs used in the algorithm is expressed as follows:

    G* = arg min_G max_D L_cGAN(G, D) + λ · L_L1(G)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110860315.3A CN113537379B (en) | 2021-07-27 | 2021-07-27 | Three-dimensional matching method based on CGANs |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110860315.3A CN113537379B (en) | 2021-07-27 | 2021-07-27 | Three-dimensional matching method based on CGANs |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113537379A true CN113537379A (en) | 2021-10-22 |
CN113537379B CN113537379B (en) | 2024-04-16 |
Family
ID=78121448
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110860315.3A Active CN113537379B (en) | 2021-07-27 | 2021-07-27 | Three-dimensional matching method based on CGANs |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113537379B (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN118262385A (en) * | 2024-05-30 | 2024-06-28 | 齐鲁工业大学(山东省科学院) | Scheduling sequence based on camera difference and pedestrian re-recognition method based on training |
Citations (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107358626A (en) * | 2017-07-17 | 2017-11-17 | 清华大学深圳研究生院 | A kind of method that confrontation network calculations parallax is generated using condition |
CN110136063A (en) * | 2019-05-13 | 2019-08-16 | 南京信息工程大学 | A kind of single image super resolution ratio reconstruction method generating confrontation network based on condition |
CN110263192A (en) * | 2019-06-06 | 2019-09-20 | 西安交通大学 | A kind of abrasive grain topographic data base establishing method generating confrontation network based on condition |
CN110619347A (en) * | 2019-07-31 | 2019-12-27 | 广东工业大学 | Image generation method based on machine learning and method thereof |
CN111028277A (en) * | 2019-12-10 | 2020-04-17 | 中国电子科技集团公司第五十四研究所 | SAR and optical remote sensing image registration method based on pseudo-twin convolutional neural network |
CN111091144A (en) * | 2019-11-27 | 2020-05-01 | 云南电网有限责任公司电力科学研究院 | Image feature point matching method and device based on depth pseudo-twin network |
CN111145116A (en) * | 2019-12-23 | 2020-05-12 | 哈尔滨工程大学 | Sea surface rainy day image sample augmentation method based on generation of countermeasure network |
WO2020172838A1 (en) * | 2019-02-26 | 2020-09-03 | 长沙理工大学 | Image classification method for improvement of auxiliary classifier gan |
CN112785478A (en) * | 2021-01-15 | 2021-05-11 | 南京信息工程大学 | Hidden information detection method and system based on embedded probability graph generation |
CN112861774A (en) * | 2021-03-04 | 2021-05-28 | 山东产研卫星信息技术产业研究院有限公司 | Method and system for identifying ship target by using remote sensing image |
-
2021
- 2021-07-27 CN CN202110860315.3A patent/CN113537379B/en active Active
Patent Citations (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107358626A (en) * | 2017-07-17 | 2017-11-17 | 清华大学深圳研究生院 | A kind of method that confrontation network calculations parallax is generated using condition |
WO2020172838A1 (en) * | 2019-02-26 | 2020-09-03 | 长沙理工大学 | Image classification method for improvement of auxiliary classifier gan |
CN110136063A (en) * | 2019-05-13 | 2019-08-16 | 南京信息工程大学 | A kind of single image super resolution ratio reconstruction method generating confrontation network based on condition |
CN110263192A (en) * | 2019-06-06 | 2019-09-20 | 西安交通大学 | A kind of abrasive grain topographic data base establishing method generating confrontation network based on condition |
CN110619347A (en) * | 2019-07-31 | 2019-12-27 | 广东工业大学 | Image generation method based on machine learning and method thereof |
CN111091144A (en) * | 2019-11-27 | 2020-05-01 | 云南电网有限责任公司电力科学研究院 | Image feature point matching method and device based on depth pseudo-twin network |
CN111028277A (en) * | 2019-12-10 | 2020-04-17 | 中国电子科技集团公司第五十四研究所 | SAR and optical remote sensing image registration method based on pseudo-twin convolutional neural network |
CN111145116A (en) * | 2019-12-23 | 2020-05-12 | 哈尔滨工程大学 | Sea surface rainy day image sample augmentation method based on generation of countermeasure network |
CN112785478A (en) * | 2021-01-15 | 2021-05-11 | 南京信息工程大学 | Hidden information detection method and system based on embedded probability graph generation |
CN112861774A (en) * | 2021-03-04 | 2021-05-28 | 山东产研卫星信息技术产业研究院有限公司 | Method and system for identifying ship target by using remote sensing image |
Non-Patent Citations (2)
Title |
---|
LI CONGLI et al.: "Reconnaissance Image Sharpening and Quality Evaluation Methods", Hefei University of Technology Press, page 91 *
WEI LINLIN: "Research on Image Generation Algorithm Based on Text Semantics", China Master's Theses Full-text Database, Information Science and Technology, 15 July 2020 (2020-07-15), pages 91 - 79 *
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN118262385A (en) * | 2024-05-30 | 2024-06-28 | 齐鲁工业大学(山东省科学院) | Scheduling sequence based on camera difference and pedestrian re-recognition method based on training |
Also Published As
Publication number | Publication date |
---|---|
CN113537379B (en) | 2024-04-16 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN107945204B (en) | Pixel-level image matting method based on generation countermeasure network | |
CN111753698B (en) | Multi-mode three-dimensional point cloud segmentation system and method | |
CN112634341B (en) | Method for constructing depth estimation model of multi-vision task cooperation | |
CN110378838B (en) | Variable-view-angle image generation method and device, storage medium and electronic equipment | |
CN108846473B (en) | Light field depth estimation method based on direction and scale self-adaptive convolutional neural network | |
CN112884682B (en) | Stereo image color correction method and system based on matching and fusion | |
CN111508013B (en) | Stereo matching method | |
CN111402311B (en) | Knowledge distillation-based lightweight stereo parallax estimation method | |
CN111819568A (en) | Method and device for generating face rotation image | |
CN110689599A (en) | 3D visual saliency prediction method for generating countermeasure network based on non-local enhancement | |
CN109389667B (en) | High-efficiency global illumination drawing method based on deep learning | |
CN115393410A (en) | Monocular view depth estimation method based on nerve radiation field and semantic segmentation | |
CN116883990B (en) | Target detection method for stereoscopic vision depth perception learning | |
CN113763446B (en) | Three-dimensional matching method based on guide information | |
CN112270701B (en) | Parallax prediction method, system and storage medium based on packet distance network | |
CN111027581A (en) | 3D target detection method and system based on learnable codes | |
CN114693744A (en) | Optical flow unsupervised estimation method based on improved cycle generation countermeasure network | |
CN115511759A (en) | Point cloud image depth completion method based on cascade feature interaction | |
CN115115917A (en) | 3D point cloud target detection method based on attention mechanism and image feature fusion | |
CN113537379B (en) | Three-dimensional matching method based on CGANs | |
CN114092540A (en) | Attention mechanism-based light field depth estimation method and computer readable medium | |
CN116485892A (en) | Six-degree-of-freedom pose estimation method for weak texture object | |
CN112463936B (en) | Visual question-answering method and system based on three-dimensional information | |
CN110910450A (en) | Method for carrying out 3D target detection based on mixed feature perception neural network | |
CN115035545B (en) | Target detection method and device based on improved self-attention mechanism |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant |