CN113537379A - Three-dimensional matching method based on CGANs - Google Patents

Three-dimensional matching method based on CGANs

Info

Publication number
CN113537379A
Authority
CN
China
Prior art keywords
cgans
input
discriminator
image
network
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110860315.3A
Other languages
Chinese (zh)
Other versions
CN113537379B (en)
Inventor
魏东
刘涵
何雪
于璟玮
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenyang University of Technology
Original Assignee
Shenyang University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenyang University of Technology filed Critical Shenyang University of Technology
Priority to CN202110860315.3A priority Critical patent/CN113537379B/en
Publication of CN113537379A publication Critical patent/CN113537379A/en
Application granted granted Critical
Publication of CN113537379B publication Critical patent/CN113537379B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/22 Matching criteria, e.g. proximity measures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computational Linguistics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Evolutionary Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

A stereo matching method based on CGANs comprises the following steps. Image input: two left and right camera views and a true value are input, with the left and right images taken as the reference image and the target image respectively, and the true value serving as the label corresponding to the left image. Feature extraction: features are extracted from the two input camera views by a pseudo-twin network and fused along the channel dimension. Disparity map generation: the fused features serve as the condition in the CGANs, shared by the generator and the discriminator, and the generator produces a disparity map. True/false discrimination: the fused features are taken as the condition and input into the discriminator together with the true value or the generated disparity map, and the discriminator then judges whether the input sample is a generated sample or the true value. Model training: the error between the generated disparity map and the true value, together with the output of the discriminator, guides the learning of the network model.

Description

Three-dimensional matching method based on CGANs
Technical Field
The invention belongs to the field of computer vision and the technical field of deep learning, and particularly relates to a stereo matching method based on CGANs (conditional generative adversarial networks).
Background
Stereo matching plays a crucial role in many computer vision applications, such as robot navigation, autonomous driving, augmented reality, gesture recognition, three-dimensional reconstruction, military reconnaissance and maintenance inspection. The purpose of computer vision is to imitate the way the human visual system perceives the distance of objects in a three-dimensional scene, and stereo matching obtains depth information of a three-dimensional scene from two-dimensional images. As one of the key research directions in computer vision, stereo matching first matches corresponding pixel points in two camera views taken from different viewpoints, then computes their difference in horizontal displacement to obtain the disparity, and finally obtains depth information through a mathematical model. Because the disparity between two corresponding pixel points and the depth information are related by a proportional relationship, the task of acquiring depth information can reasonably be converted, through this mathematical transformation, into the task of computing disparity. Stereo matching suffers from problems such as occlusion, illumination and weak texture, and previous algorithms have all aimed at solving these problems so as to improve the accuracy of disparity prediction.
These algorithms can be divided into two classes: traditional stereo matching algorithms and end-to-end stereo matching algorithms. The first class, traditional stereo matching algorithms, comprises four steps: matching cost calculation, cost aggregation, disparity calculation and disparity refinement. However, this divide-and-conquer approach has several problems: first, the number of hyper-parameters increases; second, the implementation of stereo matching becomes complicated; third, solving the problem in separate steps does not necessarily yield the best result, because combining the optimal solutions of the sub-problems is not equivalent to finding the global optimum; fourth, when the disparity is calculated, the range of pixels that a single pixel point can be associated with is limited by the aggregation window. The second class is end-to-end stereo matching algorithms. Deep learning is one of the key research fields of artificial intelligence and machine learning and has achieved remarkable results in computer vision. To avoid the problems caused by the separate steps of traditional stereo matching algorithms, an end-to-end system can be constructed through deep learning to merge the four steps. Deep learning is a method for training multi-layer neural networks: training data are first fed to the first layer of neurons, whose weights are obtained through a nonlinear activation; the output of that layer is then passed as input to the next layer to obtain the weights of the corresponding layer; the weights are continuously updated as learning progresses, and reasonable weights are finally obtained, i.e. a distributed feature representation of the data is learned. The end-to-end idea in deep learning realizes a process in which data are input at one end and results are directly output at the other end, which overcomes the shortcomings of manual design. In an end-to-end deep neural network, the features produced after each layer of neurons fuses semantic information reduce the independence of individual pixel points in disparity prediction. An end-to-end stereo matching algorithm therefore takes the two left and right camera views as input and outputs the corresponding disparity map, leaving the intermediate feature learning and information fusion entirely to deep learning.
Although existing end-to-end stereo matching algorithms solve the problems that the traditional step-by-step methods suffer from, in essence they still build a cost volume by computing matching points; such pixel-to-pixel matching does not necessarily fit the ideal situation, which causes a loss of precision. Moreover, most of them use 3D convolution to process the cost volume, which makes the computation cost very high.
Disclosure of Invention
Object of the Invention
The invention provides a stereo matching method based on CGANs, aiming at the problems caused by the divide-and-conquer nature of traditional stereo matching algorithms and at the high computation cost caused by processing the cost volume with 3D convolution.
Technical scheme
A stereo matching method based on CGANs comprises the following steps:
image input: two left and right camera views and a true value are input, with the left and right images taken as the reference image and the target image respectively, and the true value serving as the label corresponding to the left image;
feature extraction: features are respectively extracted from the two input camera views by a pseudo-twin network and fused along the channel dimension;
generating a disparity map: the fused features serve as the condition in the CGANs, shared by the generator and the discriminator, and the generator generates a disparity map;
identifying true and false: the fused features are taken as the condition and input into the discriminator together with the true value or the generated disparity map, and the discriminator then judges whether the input sample is a generated sample or the true value;
training a model: the error between the generated disparity map and the true value, together with the result output by the discriminator, is used to guide the learning of the network model.
Further, after the two left and right camera views are input, a cropping operation of size 256 × 256 is performed, and it is then checked whether the number of channels of both images is 3; if so, the next operation is carried out, otherwise an error is reported.
Further, when the input consists of two images, a pseudo-twin network is adopted. The pseudo-twin network used in the algorithm is composed of two convolutional neural networks with the same structure but different weights. The features extracted from the two images must be merged into a single input before entering the next module, so they are superimposed along the channel dimension.
Further, the fused features are set as the condition in the CGANs and input into the U-Net in the generator. The U-Net, acting as an encoder-decoder network, performs down-sampling and up-sampling operations on the input and generates a disparity map with 1 channel.
Further, the result of the channel-dimension superposition from the feature extraction is taken as the condition and input into the discriminator together with the true value or the U-Net output; through convolutional neural network processing, the discriminator handles the binary true-or-false problem, i.e. it outputs a probability value indicating whether the input sample is the true value or a generated sample.
Further, the error between the disparity map generated by the U-Net and the true value is computed with the conventional L1 loss function. The conventional L1 loss function is as follows:
L_{L1}(G) = E_{x,y}\left[\, \lVert y - G(x) \rVert_{1} \,\right]
where E_{x,y} denotes the expectation with x drawn from the training data distribution and y from the true-value distribution, x is the input of the generator G and also the condition shared with the discriminator D, G(x) is the generated sample, and y is the true value.
The judgment of the discriminator D on the true value y or the generated sample G(x) is used to compute the loss function of the CGANs. The CGANs loss function is as follows:
L_{CGAN}(G, D) = E_{x,y}\left[\log D(x, y)\right] + E_{x}\left[\log\left(1 - D(x, G(x))\right)\right]
where E_{x} denotes the expectation with x drawn from the training data distribution.
The two loss functions are used for gradient updates through the Adam optimization method so as to guide the training of the whole network model. In training the network, the generator G needs to minimize the loss function while the discriminator D needs to maximize it. The final loss function G* of the CGANs used in the algorithm, in which the hyper-parameter λ balances the CGAN term and the L1 term, is expressed as follows:
G^{*} = \arg\min_{G}\max_{D}\, L_{CGAN}(G, D) + \lambda\, L_{L1}(G)
Advantages and Effects
In order to obtain disparity information with higher precision, the invention solves the problems caused by the divide-and-conquer nature of traditional stereo matching algorithms and, at the same time, predicts disparity by a better method than traditional end-to-end algorithms that require 3D convolution, thereby reducing the computation cost. The invention provides a stereo matching method based on CGANs. Following the end-to-end idea in deep learning, the invention merges the four steps of the traditional stereo matching method into one, which simplifies the stereo matching algorithm and removes the problems caused by divide and conquer. Compared with similar traditional end-to-end algorithms, a pseudo-twin network is adopted to process the two similar pictures, eliminating the negative influence that the feature-extraction convolutional neural network would otherwise have on the subsequent U-Net learning. The condition set in the CGANs is changed from the left and right camera views themselves into the feature maps extracted by the pseudo-twin network, which reduces the number of parameters of the training model and lowers the computation cost. For the generator in the CGANs, a U-Net, which learns high-level semantic information well and has skip connections, is used to generate a disparity map with higher precision and better quality. Meanwhile, the network structure of the U-Net is adjusted, and a suitable number of layers is chosen to balance the amount of computation against the accuracy of the disparity map.
The method adopts CGANs to generate the disparity map and thereby completes the disparity prediction task; it reduces memory and time consumption while improving accuracy to a certain degree, lowers the computation cost and simplifies the implementation of the stereo matching algorithm.
Drawings
Fig. 1 is a network structure diagram of a stereo matching method based on CGANs provided by the present invention.
Fig. 2 is a logic flow diagram of a stereo matching method based on CGANs provided in the present invention.
Detailed Description
The invention is further described below with reference to the accompanying drawings:
Examples
A stereo matching method based on CGANs comprises the following steps:
Image input: two left and right camera views and a true value are input, with the left and right images taken as the reference image and the target image respectively, and the true value serving as the label corresponding to the left image; as shown in fig. 1.
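For illustration, a minimal PyTorch-style sketch of this input step is given below. The 256 × 256 crop and the 3-channel check follow the description in this disclosure; the function name, the fixed crop position and the use of torchvision are assumptions of this sketch rather than part of the patent.

```python
import torch
import torchvision.transforms.functional as TF

def prepare_pair(left: torch.Tensor, right: torch.Tensor, size: int = 256):
    """Crop both camera views to size x size and check that each has 3 channels."""
    if left.shape[0] != 3 or right.shape[0] != 3:
        raise ValueError("both camera views must have 3 channels")  # "an error is reported"
    # The same crop window is used for both views so that the stereo geometry is preserved.
    left = TF.crop(left, top=0, left=0, height=size, width=size)
    right = TF.crop(right, top=0, left=0, height=size, width=size)
    return left, right
```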
Feature extraction: features are respectively extracted from the two input left and right camera views by a pseudo-twin network and fused along the channel dimension;
the difference from the conventional image generation task is that: for the binocular stereo matching task, the input is no longer one picture but two left and right camera views. Two methods for performing input processing on two left and right camera views are generally used, one is to directly stack two original images on a channel dimension, and the other is to use a twin network for reference, namely to respectively extract features of input by using two convolutional neural networks with the same structure and shared weight and then fuse the features. Although the two methods can be handed to a subsequent network to learn data distribution, the first method can adversely affect feature learning, so that the effect of generating the disparity map by the generator G has a certain limitation. The second method using a twin network affects subsequent network learning because of the correlation with pixel points, and ignores the concern about a slight difference between left and right camera views when finding parallax to some extent.
Therefore, to improve the result, feature extraction is realized by a pseudo-twin network, i.e. two neural network branches with different weights inside the network structure of the generator G. This removes the influence of the correlation computation in this step, preserves attention to the small differences between the left and right camera views during disparity prediction, and reduces the difficulty of the subsequent network learning.
The left and right camera views, cropped to 256 × 256, are respectively input into two convolutional neural networks with the same structure but different weights for feature extraction. Each of the two input images first passes through a convolution module with 64 output channels and then through a convolution module with 128 output channels, the image size remaining 256 × 256 throughout. The convolution module used in this part of the convolutional neural network consists of a convolution layer of size 3 × 3 with stride 1 and padding 1, a BN layer and a LeakyReLU activation function layer.
The BN layer refers to the batch normalization regularization method. The CGANs learning process captures the distribution of the training data, but the numerical ranges of the pictures processed at each step differ, which is not conducive to the learning of the network model. The common batch normalization method in deep learning is therefore used to unify the range of the input data to [-1, 1]. This not only eases the learning of the network model, but also benefits the gradient updates of back-propagation, exploits the nonlinearity of the LeakyReLU activation function, accelerates network convergence and reduces the sensitivity of the network to hyper-parameter tuning. Concretely, batch normalization after a convolution layer subtracts the per-channel mean computed over the batch and divides by the standard deviation; when dividing the image by the standard deviation during training, the divisor may be directly replaced by the value 255, i.e. the maximum value of an 8-bit unsigned integer and hence the maximum value of the RGB channels, in order to reduce the amount of computation.
The LeakyReLU function is expressed as follows, where a_i is a fixed parameter in the interval (0, +∞), set here to 0.2; x_i denotes the value input into the function; y_i denotes the output of the function.
y_{i} = \begin{cases} x_{i}, & x_{i} \ge 0 \\ a_{i} x_{i}, & x_{i} < 0 \end{cases}
The features extracted separately from the two images are superimposed along the channel dimension; the result serves both as the input to the generator G and as the condition for the discriminator D.
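A minimal PyTorch sketch of this feature-extraction step follows. The layer configuration (3 → 64 → 128 output channels, 3 × 3 convolutions with stride 1 and padding 1, BN and LeakyReLU with slope 0.2) is taken from the text above; the class and function names are illustrative assumptions.

```python
import torch
import torch.nn as nn

def conv_block(in_ch: int, out_ch: int) -> nn.Sequential:
    """Convolution module: 3x3 conv (stride 1, padding 1) + BN + LeakyReLU(0.2)."""
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size=3, stride=1, padding=1),
        nn.BatchNorm2d(out_ch),
        nn.LeakyReLU(0.2, inplace=True),
    )

class PseudoTwinExtractor(nn.Module):
    """Two branches with identical structure but independent (unshared) weights."""
    def __init__(self):
        super().__init__()
        self.left_branch = nn.Sequential(conv_block(3, 64), conv_block(64, 128))
        self.right_branch = nn.Sequential(conv_block(3, 64), conv_block(64, 128))

    def forward(self, left: torch.Tensor, right: torch.Tensor) -> torch.Tensor:
        f_left = self.left_branch(left)     # (N, 128, 256, 256)
        f_right = self.right_branch(right)  # (N, 128, 256, 256)
        # Superimpose along the channel dimension: this fused map is the generator
        # input and the condition shared with the discriminator.
        return torch.cat([f_left, f_right], dim=1)  # (N, 256, 256, 256)
```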
Generating a disparity map: the fused features serve as the condition in the CGANs, shared by the generator and the discriminator, and the generator generates a disparity map;
the condition shared by the generator and the discriminator in the CGANs is that the convolution layer in the pseudo-twin network is used for extracting higher-layer features with higher resolution from the two left and right camera views to replace the original image pixel condition.
In order to improve the accuracy of the generated result, the occlusion, illumination and weak-texture problems in stereo matching must be addressed. The key is to learn high-level semantic information, so a suitable network must be selected as the generator in the CGANs, and networks with an encoder-decoder structure are able to handle these problems. The encoder processes low-level features such as contours, colors, edges, textures and shapes, continuously extracting features, shrinking the pictures and enlarging the receptive field, while the decoder restores the images and processes high-level features with complex semantics that aid understanding. U-Net is one such encoder-decoder network and has advantages over other networks in generating disparity maps. The conventional CGANs generator network structure requires all information to flow through every layer from input to output, which undoubtedly lengthens training. For the stereo matching task, although a complicated transformation is needed between the two input camera views and the generated disparity map, their structures are roughly the same, so the low-level semantic information shared between them is very important. During feature learning, this information should not be lost and redundant transformations should be avoided, and the network structure can be adjusted according to the requirements of stereo matching. The U-Net, with skip connections in its network structure, not only shares information between input and output but also avoids, to a certain extent, the waste of resources brought by the conventional CGANs network structure. In other words, the generator network fuses the features extracted by the pseudo-twin network along the channel dimension and then hands the fused features to the U-Net, which learns from them and generates the disparity map.
The U-Net processes its input with 8 down-sampling and 8 up-sampling operations. The convolution module used in down-sampling consists of a convolution layer of size 3 × 3 with stride 2 and padding 1, a BN layer and a LeakyReLU activation function layer. The first seven up-sampling modules consist of a deconvolution layer of size 3 × 3 with stride 2 and padding 1, a BN layer and a ReLU activation function layer; the mathematical expression of the ReLU activation function is as follows:
f(x) = \max(0, x)
the last layer of upsampling, i.e. the output layer, will replace the activation function with a Tanh function, the mathematical expression of which is as follows:
\tanh(x) = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}}
where e^{x} denotes the exponential of the input value and e^{-x} the exponential of the negated input value.
The input data pass through three convolution modules with 256 output channels and five convolution modules with 512 output channels. During down-sampling, the length and width of the input are halved each time it passes through a convolution module, going through 128 × 128 down to 1 × 1 by the end of down-sampling. During up-sampling, the data pass through deconvolution modules with 512 and then 256 output channels; during this processing, the skip connections of the U-Net superimpose the output of the corresponding down-sampling layer onto the data before they enter each deconvolution module. Each deconvolution module doubles the length and width of the image, gradually enlarging it from 1 × 1 to the 256 × 256 required for the output disparity map, so that it remains consistent with the size of the input image in step one.
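Continuing the sketch above, the U-Net generator can be outlined in PyTorch as follows. The module structure (eight stride-2 down-sampling modules with 3 × 3 convolutions, BN and LeakyReLU; eight up-sampling modules with 3 × 3 deconvolutions, BN and ReLU, the last replaced by Tanh; skip connections; channel widths of three times 256 and five times 512; a 1-channel output) follows the text, while the exact skip wiring, the decoder channel ordering and the output_padding needed to double the spatial size exactly are assumptions of this sketch; note also that BN at the 1 × 1 bottleneck requires a batch size greater than one.

```python
import torch
import torch.nn as nn

def down_block(in_ch: int, out_ch: int) -> nn.Sequential:
    """Down-sampling module: 3x3 conv, stride 2, padding 1 + BN + LeakyReLU(0.2)."""
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, 3, stride=2, padding=1),
        nn.BatchNorm2d(out_ch),
        nn.LeakyReLU(0.2, inplace=True),
    )

def up_block(in_ch: int, out_ch: int, last: bool = False) -> nn.Sequential:
    """Up-sampling module: 3x3 deconv, stride 2, padding 1 (+ BN + ReLU, or Tanh on the output layer).
    output_padding=1 makes each module exactly double the spatial size (an implementation assumption)."""
    layers = [nn.ConvTranspose2d(in_ch, out_ch, 3, stride=2, padding=1, output_padding=1)]
    layers += [nn.Tanh()] if last else [nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True)]
    return nn.Sequential(*layers)

class UNetGenerator(nn.Module):
    def __init__(self, in_ch: int = 256, out_ch: int = 1):
        super().__init__()
        enc = [256, 256, 256, 512, 512, 512, 512, 512]  # three 256-channel and five 512-channel modules
        dec = [512, 512, 512, 512, 512, 256, 256]       # first seven up-sampling modules
        self.downs = nn.ModuleList()
        prev = in_ch
        for ch in enc:
            self.downs.append(down_block(prev, ch))
            prev = ch
        self.ups = nn.ModuleList()
        for i, ch in enumerate(dec):
            skip = 0 if i == 0 else enc[-(i + 1)]       # channels of the mirrored encoder feature
            self.ups.append(up_block(prev + skip, ch))
            prev = ch
        self.final = up_block(prev + enc[0], out_ch, last=True)  # 1-channel disparity map via Tanh

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        feats = []
        for down in self.downs:                          # 256x256 -> ... -> 1x1
            x = down(x)
            feats.append(x)
        x = self.ups[0](feats[-1])
        for i, up in enumerate(self.ups[1:], start=1):
            # Skip connection: concatenate with the output of the corresponding down-sampling layer.
            x = up(torch.cat([x, feats[-(i + 1)]], dim=1))
        return self.final(torch.cat([x, feats[0]], dim=1))  # back to 256x256, 1 channel
```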
Identifying true and false: the true value or the generated disparity map is input into the discriminator together with the condition, and the discriminator then judges whether the input sample is a generated sample or the true value;
for the discriminator network, the original left and right camera views are no longer used as conditions shared with the generator, but the setting of the conditions is replaced with feature maps extracted for the two left and right views by the pseudo-twin network. After stacking the condition and the generated sample or the real sample on the channel dimension, inputting the stacked condition and generated sample or real sample into the convolution modules with the four layers of output channels with the numbers of 64, 128, 256 and 512, and then outputting a probability value indicating the judgment result of the discriminator by utilizing the convolution module with the output channel with the number of 1. The first four layers of convolution modules used in the discriminator are consistent in structure with the convolution modules adopted in the U-Net down sampling in the step three, and the last layer of output layer convolution module consists of convolution layers with the size of 3 x 3, the step length of 2 and the filling of 1 and a Sigmoid activation function layer. The Sigmoid function is used to handle the binary problem that the input samples are true or false. The mathematical expression of Sigmoid function is as follows:
\sigma(x) = \frac{1}{1 + e^{-x}}
where σ (x) refers to the output value of the Sigmoid function.
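A corresponding PyTorch sketch of this discriminator follows, continuing the earlier examples and reusing the down_block module defined in the U-Net sketch above; the way the grid of output probabilities is reduced to a single value is an assumption of this sketch.

```python
import torch
import torch.nn as nn

class Discriminator(nn.Module):
    """Condition (fused pseudo-twin features) and true/generated disparity map are stacked on the
    channel dimension, passed through four conv modules with 64/128/256/512 output channels
    (same structure as the U-Net down-sampling modules), then a 1-channel conv module with Sigmoid."""
    def __init__(self, cond_ch: int = 256, disp_ch: int = 1):
        super().__init__()
        blocks, prev = [], cond_ch + disp_ch
        for ch in (64, 128, 256, 512):
            blocks.append(down_block(prev, ch))   # down_block as defined in the U-Net sketch above
            prev = ch
        blocks.append(nn.Sequential(
            nn.Conv2d(prev, 1, 3, stride=2, padding=1),
            nn.Sigmoid(),
        ))
        self.net = nn.Sequential(*blocks)

    def forward(self, condition: torch.Tensor, disparity: torch.Tensor) -> torch.Tensor:
        # For a 256x256 input this yields a small grid of probabilities; it can be
        # averaged to obtain the single probability value described in the text.
        return self.net(torch.cat([condition, disparity], dim=1))
```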
Training a model: the error between the generated disparity map and the true value, together with the result output by the discriminator, is used to guide the learning of the network model.
During training, the generator G is trained first and then the discriminator D; the pseudo-twin network that extracts features from the two camera views is trained together with the U-Net, and this cycle repeats until training finishes. The whole training is a game between the generator G and the discriminator D: G hopes that the disparity map generated by the U-Net fools D, i.e. that D judges the generated sample to be true, so G tries to minimize the loss function, while D tries to maximize it because it wants to improve its ability to judge generated samples as false. Training stops when G and D both reach their optimal solutions, theoretically attaining a Nash equilibrium.
In detail, the training of the whole network model is guided by the loss functions: the gradients are updated by means of an optimization method and repeatedly descended towards the optimal solution so as to update the weight parameters. Concerning the weight parameters, both weight initialization and the optimization method are involved.
Weight initialization gives the network model a better starting position when searching the numerical space for the global optimum, which helps it converge better and faster during learning. The weights of the convolution layers are initialized from a random normal distribution with mean 0 and variance 0.02.
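A small sketch of this initialization follows; the text gives 0.02 for the spread of the normal distribution, which is treated here as the standard deviation (the usual convention), and the bias handling is an assumption of this sketch.

```python
import torch.nn as nn

def init_weights(module: nn.Module) -> None:
    """Initialize conv/deconv weights from a normal distribution with mean 0 and std 0.02."""
    if isinstance(module, (nn.Conv2d, nn.ConvTranspose2d)):
        nn.init.normal_(module.weight, mean=0.0, std=0.02)
        if module.bias is not None:
            nn.init.zeros_(module.bias)

# Applied to the modules sketched above, e.g.:
# generator.apply(init_weights); discriminator.apply(init_weights)
```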
The process by which the network model searches for the optimal solution is called optimization. The method adopted here is Adam, an improvement on gradient descent; it is used because, once the initial values of a few related hyper-parameters are set, it adjusts the learning rate automatically and helps the network model converge better and faster during learning.
The error between the disparity map generated by the U-Net and the true value is computed with an L1 loss function; the conventional L1 loss function is as follows:
L_{L1}(G) = E_{x,y}\left[\, \lVert y - G(x) \rVert_{1} \,\right]
where E_{x,y} denotes the expectation with x drawn from the training data distribution and y from the true-value distribution, x is the input of the generator G and also the condition shared with the discriminator D, G(x) is the generated sample, and y is the true value;
the judgment result of the discriminator D on the real value y or the generated sample G (x) is used for calculating the loss function of the CGANs; the loss function for CGANs is as follows:
L_{CGAN}(G, D) = E_{x,y}\left[\log D(x, y)\right] + E_{x}\left[\log\left(1 - D(x, G(x))\right)\right]
where E_{x} denotes the expectation with x drawn from the training data distribution. The two loss functions are used together for gradient updates through the optimization method so as to guide the training of the whole network model; in training the network, the generator G needs to minimize the loss function while the discriminator D needs to maximize it. To balance the CGAN loss term and the L1 loss term, a hyper-parameter λ is added. The final loss function G* of the CGANs used in the algorithm is expressed as follows:
G^{*} = \arg\min_{G}\max_{D}\, L_{CGAN}(G, D) + \lambda\, L_{L1}(G)
where G* is the final loss function and λ is the hyper-parameter added to balance the CGAN loss term and the L1 loss term.
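Putting the pieces together, a minimal training-step sketch of this objective is given below, using the extractor, generator and discriminator modules sketched in the earlier examples; the generator (together with the pseudo-twin extractor) is updated first and the discriminator second, as described above. The learning rate, the Adam betas, the value of λ and the use of the non-saturating BCE form of the generator loss are assumptions of this sketch rather than values from the patent.

```python
import torch
import torch.nn as nn

# Instances of the modules sketched above (assumed names).
extractor, generator, discriminator = PseudoTwinExtractor(), UNetGenerator(), Discriminator()
bce, l1 = nn.BCELoss(), nn.L1Loss()
lam = 100.0                                   # λ balancing the CGAN and L1 terms (assumed value)
opt_g = torch.optim.Adam(list(extractor.parameters()) + list(generator.parameters()),
                         lr=2e-4, betas=(0.5, 0.999))
opt_d = torch.optim.Adam(discriminator.parameters(), lr=2e-4, betas=(0.5, 0.999))

def train_step(left, right, gt_disparity):
    condition = extractor(left, right)        # pseudo-twin features: generator input and shared condition
    fake = generator(condition)               # generated disparity map, shape (N, 1, 256, 256)

    # Generator step: G (and the pseudo-twin extractor trained with the U-Net) minimizes
    # the CGAN term plus lambda times the L1 error to the true value.
    pred_fake = discriminator(condition, fake)
    loss_g = bce(pred_fake, torch.ones_like(pred_fake)) + lam * l1(fake, gt_disparity)
    opt_g.zero_grad(); loss_g.backward(); opt_g.step()

    # Discriminator step: D maximizes the CGAN term, i.e. labels true pairs 1 and generated pairs 0.
    condition = condition.detach()
    pred_real = discriminator(condition, gt_disparity)
    pred_fake = discriminator(condition, fake.detach())
    loss_d = 0.5 * (bce(pred_real, torch.ones_like(pred_real)) +
                    bce(pred_fake, torch.zeros_like(pred_fake)))
    opt_d.zero_grad(); loss_d.backward(); opt_d.step()
    return loss_g.item(), loss_d.item()
```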

Claims (6)

1. A three-dimensional matching method based on CGANs is characterized in that: the method comprises the following steps:
image input: inputting two left and right camera views and a real value, respectively taking a left image and a right image as a reference image and a target image, and taking the real value as a label corresponding to the left image;
feature extraction: respectively extracting features of the two input left and right camera views by using a pseudo-twin network, and fusing the extracted features by using channel dimensions;
generating a disparity map: the fused features are used as conditions in CGANs to enable a generator to be shared with a discriminator, and the generator generates a disparity map;
and (3) identifying true and false: the extracted and fused features are used as conditions and input into a discriminator together with a real value or a generated disparity map, and then the discriminator identifies whether the input sample is a generated sample or a real value;
training a model: the error of the generated disparity map and the true value and the result output by the discriminator are used for guiding the network model learning.
2. The CGANs-based stereo matching method according to claim 1, wherein: and after the two left and right camera views are input, a cropping operation with the size of 256 × 256 is carried out, then whether the number of channels of the two images is 3 or not is judged, if so, the next operation is carried out, and if not, an error is reported.
3. The CGANs-based stereo matching method according to claim 1, wherein: when the input is two images, a pseudo twin network method is adopted; the pseudo-twin network used in the algorithm is composed of two convolutional neural networks with the same structure and different weights; and features extracted from both images need to be superimposed in the channel dimension before being input to the next module.
4. The CGANs-based stereo matching method according to claim 1, wherein: the fused features are set as the condition in the CGANs and input into the U-Net in the generator; and the U-Net, used as an encoder-decoder network, performs down-sampling and up-sampling operations on the input to generate a disparity map with the number of channels being 1.
5. The CGANs-based stereo matching method according to claim 4, wherein: either the real value or the output result of the U-Net and the condition are superposed by the channel dimension and then input into a discriminator, and the discriminator identifies the real or false binary problem through the convolutional neural network processing, namely, a probability value is output to indicate whether the input sample is the real value or the generated sample.
6. The CGANs-based stereo matching method according to claim 5, wherein: the error between the disparity map generated by the U-Net and the real value is calculated through an L1 loss function; the conventional L1 loss function is as follows:
L_{L1}(G) = E_{x,y}\left[\, \lVert y - G(x) \rVert_{1} \,\right]
wherein E_{x,y} denotes the expectation with x drawn from the training data distribution and y from the true-value distribution, x is the input of the generator G and also the condition shared with the discriminator D, G(x) is a generated sample, and y is the true value;
the judgment result of the discriminator D on the real value y or the generated sample G (x) is used for calculating the loss function of the CGANs; the loss function for CGANs is as follows:
L_{CGAN}(G, D) = E_{x,y}\left[\log D(x, y)\right] + E_{x}\left[\log\left(1 - D(x, G(x))\right)\right]
wherein E_{x} denotes the expectation with x drawn from the training data distribution; the two loss functions are subjected to gradient updating together through an optimization method so as to guide the training of the whole network model; in the process of training the network, the generator G needs to minimize the loss function, and the discriminator D needs to maximize the loss function; to balance the CGAN loss term and the L1 loss term, a hyper-parameter λ is added; the resulting loss function G* of the CGANs used in the algorithm is represented as follows:
G^{*} = \arg\min_{G}\max_{D}\, L_{CGAN}(G, D) + \lambda\, L_{L1}(G)
CN202110860315.3A 2021-07-27 2021-07-27 Three-dimensional matching method based on CGANs Active CN113537379B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110860315.3A CN113537379B (en) 2021-07-27 2021-07-27 Three-dimensional matching method based on CGANs

Publications (2)

Publication Number Publication Date
CN113537379A true CN113537379A (en) 2021-10-22
CN113537379B CN113537379B (en) 2024-04-16

Family

ID=78121448

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110860315.3A Active CN113537379B (en) 2021-07-27 2021-07-27 Three-dimensional matching method based on CGANs

Country Status (1)

Country Link
CN (1) CN113537379B (en)

Patent Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107358626A (en) * 2017-07-17 2017-11-17 清华大学深圳研究生院 A kind of method that confrontation network calculations parallax is generated using condition
WO2020172838A1 (en) * 2019-02-26 2020-09-03 长沙理工大学 Image classification method for improvement of auxiliary classifier gan
CN110136063A (en) * 2019-05-13 2019-08-16 南京信息工程大学 A kind of single image super resolution ratio reconstruction method generating confrontation network based on condition
CN110263192A (en) * 2019-06-06 2019-09-20 西安交通大学 A kind of abrasive grain topographic data base establishing method generating confrontation network based on condition
CN110619347A (en) * 2019-07-31 2019-12-27 广东工业大学 Image generation method based on machine learning and method thereof
CN111091144A (en) * 2019-11-27 2020-05-01 云南电网有限责任公司电力科学研究院 Image feature point matching method and device based on depth pseudo-twin network
CN111028277A (en) * 2019-12-10 2020-04-17 中国电子科技集团公司第五十四研究所 SAR and optical remote sensing image registration method based on pseudo-twin convolutional neural network
CN111145116A (en) * 2019-12-23 2020-05-12 哈尔滨工程大学 Sea surface rainy day image sample augmentation method based on generation of countermeasure network
CN112785478A (en) * 2021-01-15 2021-05-11 南京信息工程大学 Hidden information detection method and system based on embedded probability graph generation
CN112861774A (en) * 2021-03-04 2021-05-28 山东产研卫星信息技术产业研究院有限公司 Method and system for identifying ship target by using remote sensing image

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Li Congli et al.: "Reconnaissance Image Clarity Enhancement and Quality Evaluation Methods", Hefei University of Technology Press, page 91 *
Wei Linlin: "Research on Image Generation Algorithms Based on Text Semantics", China Master's Theses Full-text Database, Information Science and Technology, 15 July 2020 (2020-07-15), pages 91 - 79 *

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN118262385A (en) * 2024-05-30 2024-06-28 齐鲁工业大学(山东省科学院) Scheduling sequence based on camera difference and pedestrian re-recognition method based on training

Also Published As

Publication number Publication date
CN113537379B (en) 2024-04-16

Similar Documents

Publication Publication Date Title
CN107945204B (en) Pixel-level image matting method based on generation countermeasure network
CN111753698B (en) Multi-mode three-dimensional point cloud segmentation system and method
CN112634341B (en) Method for constructing depth estimation model of multi-vision task cooperation
CN110378838B (en) Variable-view-angle image generation method and device, storage medium and electronic equipment
CN108846473B (en) Light field depth estimation method based on direction and scale self-adaptive convolutional neural network
CN112884682B (en) Stereo image color correction method and system based on matching and fusion
CN111508013B (en) Stereo matching method
CN111402311B (en) Knowledge distillation-based lightweight stereo parallax estimation method
CN111819568A (en) Method and device for generating face rotation image
CN110689599A (en) 3D visual saliency prediction method for generating countermeasure network based on non-local enhancement
CN109389667B (en) High-efficiency global illumination drawing method based on deep learning
CN115393410A (en) Monocular view depth estimation method based on nerve radiation field and semantic segmentation
CN116883990B (en) Target detection method for stereoscopic vision depth perception learning
CN113763446B (en) Three-dimensional matching method based on guide information
CN112270701B (en) Parallax prediction method, system and storage medium based on packet distance network
CN111027581A (en) 3D target detection method and system based on learnable codes
CN114693744A (en) Optical flow unsupervised estimation method based on improved cycle generation countermeasure network
CN115511759A (en) Point cloud image depth completion method based on cascade feature interaction
CN115115917A (en) 3D point cloud target detection method based on attention mechanism and image feature fusion
CN113537379B (en) Three-dimensional matching method based on CGANs
CN114092540A (en) Attention mechanism-based light field depth estimation method and computer readable medium
CN116485892A (en) Six-degree-of-freedom pose estimation method for weak texture object
CN112463936B (en) Visual question-answering method and system based on three-dimensional information
CN110910450A (en) Method for carrying out 3D target detection based on mixed feature perception neural network
CN115035545B (en) Target detection method and device based on improved self-attention mechanism

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant