CN114972822A - End-to-end binocular stereo matching method based on convolutional neural network - Google Patents

End-to-end binocular stereo matching method based on convolutional neural network

Info

Publication number
CN114972822A
CN114972822A (application CN202210659456.3A)
Authority
CN
China
Prior art keywords
stereo matching
feature
layer
module
convolutional
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210659456.3A
Other languages
Chinese (zh)
Inventor
刘杰
高晶
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Harbin University of Science and Technology
Original Assignee
Harbin University of Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Harbin University of Science and Technology filed Critical Harbin University of Science and Technology
Priority to CN202210659456.3A priority Critical patent/CN114972822A/en
Publication of CN114972822A publication Critical patent/CN114972822A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/74 Image or video pattern matching; Proximity measures in feature spaces
    • G06V10/75 Organisation of the matching processes, e.g. simultaneous or sequential comparisons of image or video features; Coarse-fine approaches, e.g. multi-scale approaches; using context analysis; Selection of dictionaries
    • G06V10/751 Comparing pixel values or logical combinations thereof, or feature values having positional relevance, e.g. template matching
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/084 Backpropagation, e.g. using gradient descent
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G06V10/52 Scale-space analysis, e.g. wavelet analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/80 Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G06V10/806 Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Multimedia (AREA)
  • Medical Informatics (AREA)
  • Databases & Information Systems (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • General Engineering & Computer Science (AREA)
  • Molecular Biology (AREA)
  • Data Mining & Analysis (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses an end-to-end binocular stereo matching method based on a convolutional neural network, which improves the existing PSMNet model for disparity estimation. First, dilated (atrous) convolution is added to the multi-scale spatial pyramid pooling layer to enlarge the receptive field of the network; second, in the cost aggregation module, four encoder-decoder modules are stacked in series to further extract high-level information. The improved network model enlarges the receptive field while retaining the strengths of the original model, captures richer detail information, and alleviates the problem that occluded and weakly textured regions cannot be matched correctly.

Description

End-to-end binocular stereo matching method based on convolutional neural network
Technical Field
The invention belongs to the technical field of computer vision, and particularly relates to an end-to-end binocular stereo matching method based on a convolutional neural network.
Background
The purpose of stereo matching is to obtain the disparity values of image points in a stereo image pair, from which depth information can be computed; this is at the core of many computer vision applications, such as autonomous driving, robot navigation, binocular ranging, and three-dimensional reconstruction. Stereo matching methods can be divided into traditional methods and deep learning methods, and deep learning methods can be further divided into non-end-to-end and end-to-end methods. Traditional stereo matching algorithms have low accuracy and low processing speed, which greatly limits their application in real scenes. In recent years, with the development of massively parallel computing devices, methods based on deep learning have made breakthrough progress in numerous vision tasks. Convolutional neural networks offer high processing speed and strong robustness, which fits the requirements of stereo matching well, so they have gradually become the mainstream research direction for stereo matching algorithms. Compared with non-end-to-end algorithms, end-to-end algorithms allow the whole pipeline to be optimized jointly and are therefore more convenient in practical applications. However, end-to-end binocular stereo matching still faces many problems that affect its accuracy and speed, such as weak texture, repeated texture, and low matching rates at object edges.
Disclosure of Invention
Aiming at the defects in the prior art, the invention provides an end-to-end binocular stereo matching method based on a convolutional neural network, which improves the existing PSMNet model for disparity estimation. First, dilated convolution is added to the multi-scale spatial pyramid pooling layer to enlarge the receptive field of the network; second, in the cost aggregation module, four encoder-decoder modules are stacked in series to further extract high-level information. The improved network model enlarges the receptive field while retaining the strengths of the original model, captures richer detail information, and alleviates the problem that occluded and weakly textured regions cannot be matched correctly. The method is realized through the following steps:
(1) collecting a data set and preprocessing the data set;
(1-1) collecting a data set: the data set is derived from two open source data sets, SceneFlow and KITTI 2015, the former including a training set and a validation set, the latter including a training set and a test set;
(1-2) preprocessing: randomly cropping each input left and right view in the data set to 256 × 512, and then normalizing the cropped views;
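By way of illustration, a minimal PyTorch-style sketch of this preprocessing step is given below. The normalization statistics are an assumption of the sketch (the text only states that a normalization operation is applied), and the helper name preprocess_pair is introduced here for illustration only.

```python
import random
import numpy as np
import torch

CROP_H, CROP_W = 256, 512
MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32)  # assumed statistics
STD = np.array([0.229, 0.224, 0.225], dtype=np.float32)   # assumed statistics

def preprocess_pair(left, right):
    """left/right: H x W x 3 uint8 arrays; returns normalized C x H x W tensors."""
    h, w, _ = left.shape
    y = random.randint(0, h - CROP_H)   # random crop origin shared by both views
    x = random.randint(0, w - CROP_W)
    out = []
    for img in (left, right):
        crop = img[y:y + CROP_H, x:x + CROP_W].astype(np.float32) / 255.0
        crop = (crop - MEAN) / STD      # per-channel normalization
        out.append(torch.from_numpy(crop).permute(2, 0, 1))
    return out[0], out[1]
```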
(2) constructing a stereo matching network, wherein the stereo matching network comprises a feature extraction module, a feature fusion module, a cost volume construction module, a cost aggregation module and a disparity regression module;
(2-1) constructing the feature extraction module: the feature extraction module is a weight-sharing twin (Siamese) network used to extract features from the input left and right views to be matched, and its output is two unary feature maps. The twin network first downsamples the input left and right views once using 3 convolutional layers, where the convolution kernel of each convolutional layer is 3 × 3 and the stride is 2; next, 4 residual layers further process the features, where the first residual layer comprises 3 residual blocks, the second comprises 16, the third comprises 3, and the fourth comprises 3. The convolution kernels of the four residual layers are all 3 × 3 and the feature dimensions are all 32; the stride of one residual block in the second residual layer is 2, and the strides of the remaining residual blocks are all 1. Each residual block has the structure BN-conv-BN-ReLU-conv-BN, where BN, conv and ReLU denote batch normalization, a convolutional layer and a rectified linear unit, respectively. After the convolution operations, the output of the twin network is two unary features of size H/4 × W/4 × F, where H, W denote the height and width of the original input image and F denotes the feature dimension;
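The residual block layout stated above (BN-conv-BN-ReLU-conv-BN with 3 × 3 kernels and 32 channels) can be sketched as follows. The strided 1 × 1 projection on the identity path is an assumption needed to make the shapes match when a block uses stride 2; it is not spelled out in the text.

```python
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Residual block following the stated BN-conv-BN-ReLU-conv-BN layout
    (3 x 3 kernels, 32 channels); the identity projection is an assumption."""
    def __init__(self, channels=32, stride=1):
        super().__init__()
        self.body = nn.Sequential(
            nn.BatchNorm2d(channels),
            nn.Conv2d(channels, channels, 3, stride=stride, padding=1, bias=False),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, stride=1, padding=1, bias=False),
            nn.BatchNorm2d(channels),
        )
        # When stride != 1 the identity path needs a matching resolution; a strided
        # 1 x 1 projection is a standard choice (assumed, not stated in the text).
        self.down = None
        if stride != 1:
            self.down = nn.Conv2d(channels, channels, 1, stride=stride, bias=False)

    def forward(self, x):
        identity = x if self.down is None else self.down(x)
        return self.body(x) + identity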
(2-2) constructing the feature fusion module: the feature fusion module performs a multi-scale pooling operation on the features obtained in the previous step, with each level using dilated convolution, and then performs feature fusion: a 1 × 1 convolution fuses the four-scale features obtained from the pooling with the outputs of the second and fourth residual layers. The output is two unary features of size H/4 × W/4 × F, where H, W denote the height and width of the original input image and F denotes the feature dimension;
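A sketch of this multi-scale dilated-convolution branch and 1 × 1 fusion is shown below, using the dilation rates 6, 12, 18 and 24 given later in the specific embodiment; the channel counts and the class name DilatedSPP are illustrative assumptions rather than values fixed by the text.

```python
import torch
import torch.nn as nn

class DilatedSPP(nn.Module):
    """Four parallel 3 x 3 dilated convolutions (rates 6/12/18/24) whose outputs
    are fused with the skip features by a 1 x 1 convolution. Channel counts are
    illustrative; `skip` stands for the 2nd/4th residual-layer outputs."""
    def __init__(self, in_ch=32, branch_ch=32, skip_ch=64, out_ch=32):
        super().__init__()
        self.branches = nn.ModuleList([
            nn.Sequential(
                nn.Conv2d(in_ch, branch_ch, 3, padding=r, dilation=r, bias=False),
                nn.BatchNorm2d(branch_ch),
                nn.ReLU(inplace=True))
            for r in (6, 12, 18, 24)
        ])
        # 1 x 1 fusion of the four branches plus the skip features
        self.fuse = nn.Conv2d(4 * branch_ch + skip_ch, out_ch, 1, bias=False)

    def forward(self, x, skip):
        feats = [b(x) for b in self.branches] + [skip]
        return self.fuse(torch.cat(feats, dim=1))
```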
(2-3) constructing the cost volume construction module: this module computes the matching cost between the two feature maps; its input is the two feature maps containing context information, and its output is a four-dimensional tensor. The specific computation is as follows: at each possible disparity, the reference feature map containing context information is concatenated with the corresponding target feature map containing context information, and the results are packed into a 4-dimensional cost volume. The dimensions of the cost volume output by this module are H/4 × W/4 × D/4 × F, where H, W denote the height and width of the original input image, D denotes the maximum possible disparity value, and F denotes the feature dimension;
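A sketch of this concatenation-based cost volume construction follows. Note that concatenating the two feature maps doubles the channel count per disparity level (64 channels in the embodiment, where each input map has 32 channels); the function name is introduced here for illustration.

```python
import torch

def build_cost_volume(ref_feat, tgt_feat, max_disp_over_4):
    """For each candidate disparity d, pair the reference features with the
    target features shifted by d; output shape is B x 2F x D/4 x H/4 x W/4."""
    b, f, h, w = ref_feat.shape
    cost = ref_feat.new_zeros(b, 2 * f, max_disp_over_4, h, w)
    for d in range(max_disp_over_4):
        if d == 0:
            cost[:, :f, d] = ref_feat
            cost[:, f:, d] = tgt_feat
        else:
            # only columns with a valid correspondence are filled at disparity d
            cost[:, :f, d, :, d:] = ref_feat[:, :, :, d:]
            cost[:, f:, d, :, d:] = tgt_feat[:, :, :, :-d]
    return cost
```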
(2-4) constructing the cost aggregation module: the cost aggregation module is built from encoder-decoder structures, four in total, and learns a regularization function over the cost volume to perform cost aggregation; its input is the cost volume and its output is a regularized feature map. First, the cost volume is convolved with 2 3D convolutional layers, each using 2 convolution kernels of size 3 × 3 with a feature dimension of 32, and the output of the 1st 3D convolutional layer is added to the output of the 2nd 3D convolutional layer. The four encoder-decoder structures are then stacked in series; each consists of an encoding stage and a decoding stage, and the encoding stage contains 4 3D convolutional layers. The decoding stage applies only two 3D deconvolution layers for upsampling, and for the first deconvolution layer a feature map of the corresponding dimension is added from the encoding stage so as to retain coarse high-level information and detailed low-level information. Finally, two 3D convolutional layers further reduce the feature dimension to obtain the regularized feature map, whose dimensions are H/4 × W/4 × D/4 × 1, where H, W denote the height and width of the original input image and D denotes the maximum possible disparity value;
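One encoder-decoder block of this aggregation stage can be sketched as follows, using the kernel sizes and strides given later in the specific embodiment (4 encoding 3D convolutions with the 1st and 3rd at stride 2, two stride-2 3D deconvolutions, and an additive skip from the encoder for the first deconvolution). The internal channel widths are assumptions, since the text only fixes the 32-channel input.

```python
import torch
import torch.nn as nn

def conv3d_bn(in_ch, out_ch, stride):
    return nn.Sequential(
        nn.Conv3d(in_ch, out_ch, 3, stride=stride, padding=1, bias=False),
        nn.BatchNorm3d(out_ch),
        nn.ReLU(inplace=True))

class Hourglass(nn.Module):
    """One encoder-decoder block: four 3D convolutions (1st and 3rd strided) and
    two stride-2 3D deconvolutions, with an additive skip from the encoder."""
    def __init__(self, ch=32):
        super().__init__()
        self.enc1 = conv3d_bn(ch, 2 * ch, stride=2)
        self.enc2 = conv3d_bn(2 * ch, 2 * ch, stride=1)
        self.enc3 = conv3d_bn(2 * ch, 2 * ch, stride=2)
        self.enc4 = conv3d_bn(2 * ch, 2 * ch, stride=1)
        self.dec1 = nn.Sequential(
            nn.ConvTranspose3d(2 * ch, 2 * ch, 3, stride=2, padding=1,
                               output_padding=1, bias=False),
            nn.BatchNorm3d(2 * ch))
        self.dec2 = nn.Sequential(
            nn.ConvTranspose3d(2 * ch, ch, 3, stride=2, padding=1,
                               output_padding=1, bias=False),
            nn.BatchNorm3d(ch))

    def forward(self, x):
        e2 = self.enc2(self.enc1(x))          # 1/2 resolution of the cost volume
        e4 = self.enc4(self.enc3(e2))         # 1/4 resolution
        d1 = torch.relu(self.dec1(e4) + e2)   # additive skip from the encoder
        return self.dec2(d1)                  # back to the input resolution
```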
(2-5) constructing the disparity regression module: the disparity regression module negates the values of the matching cost volume and converts them into the corresponding matching probabilities using a softmax function; its input is the regularized feature map, and its output is a disparity map of dimensions H × W, where H, W denote the height and width of the original input image, respectively;
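The regression itself can be sketched as a soft-argmin: the cost is negated, a softmax over the disparity dimension yields matching probabilities, and the predicted disparity is the probability-weighted sum. The weighted-sum step is implied by the word "regression" but not spelled out in the text, so it is an assumption of the sketch, as is the requirement that the cost has already been upsampled to full resolution.

```python
import torch
import torch.nn.functional as F

def disparity_regression(cost, max_disp):
    """Soft-argmin regression; `cost` is assumed to have shape B x D x H x W,
    i.e. upsampled to full resolution with the channel dimension removed."""
    prob = F.softmax(-cost, dim=1)                       # matching probabilities
    disp_values = torch.arange(max_disp, dtype=prob.dtype,
                               device=prob.device).view(1, max_disp, 1, 1)
    return torch.sum(prob * disp_values, dim=1)          # B x H x W disparity map
```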
(3) training a model;
(3-1) determining parameter settings of the network model;
the parameter settings of the network model include selecting Adam as the optimizer, setting the learning rate to 1e-4, and setting the maximum number of training epochs to 10;
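These settings correspond to the following minimal sketch; the helper name make_optimizer is illustrative.

```python
import torch

def make_optimizer(model: torch.nn.Module) -> torch.optim.Optimizer:
    """Optimizer setup as stated above: Adam with a learning rate of 1e-4."""
    return torch.optim.Adam(model.parameters(), lr=1e-4)

MAX_EPOCHS = 10  # maximum number of training epochs stated above
```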
(3-2) sending the preprocessed left and right views into model training;
First, the preprocessed left and right views of the SceneFlow training set are input into the stereo matching network model and forward propagation is performed to obtain the final disparity map; the output final disparity map and the real disparity map are then input into the loss function, and back propagation is performed using the batch gradient descent method to obtain a pre-trained model; the pre-trained model is then further trained on the preprocessed KITTI training set data until the loss function converges, yielding the final stereo matching network model;
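A sketch of one epoch of this training procedure is given below. The loss function is not given in closed form here, so a smooth-L1 loss over pixels with valid ground truth is assumed (the usual choice for this family of networks), as is the maximum disparity of 192 (consistent with D/4 = 48 in the specific embodiment) and the model/loader interfaces.

```python
import torch.nn.functional as F

def train_one_epoch(model, loader, optimizer, max_disp=192):
    """One epoch of the training procedure described above.

    Assumptions: `model(left, right)` returns a B x H x W disparity map, `loader`
    yields (left, right, disparity) batches, and the loss is a smooth-L1 loss over
    pixels whose ground-truth disparity lies in (0, max_disp).
    """
    model.train()
    for left, right, gt_disp in loader:
        optimizer.zero_grad()
        pred = model(left, right)                       # forward propagation
        mask = (gt_disp > 0) & (gt_disp < max_disp)     # ignore invalid pixels
        loss = F.smooth_l1_loss(pred[mask], gt_disp[mask])
        loss.backward()                                 # back propagation
        optimizer.step()
```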
(4) and carrying out binocular stereo matching by using the trained stereo matching network model.
Drawings
In order to more clearly illustrate the technical solution of the present invention, the drawings needed in the embodiments are briefly described below. The drawings in the following description show only some embodiments of the present invention, and those skilled in the art can derive other drawings from them without creative effort.
FIG. 1 is a flowchart of an overall algorithm of an end-to-end binocular stereo matching method based on a convolutional neural network according to the present invention;
FIG. 2 is a network structure diagram of the end-to-end binocular stereo matching method based on a convolutional neural network according to the present invention.
Detailed Description
Exemplary embodiments of the present invention will be described hereinafter with reference to the accompanying drawings. In the interest of clarity and conciseness, not all features of an actual implementation are described in the specification. It will of course be appreciated that in the development of any such actual embodiment, numerous implementation-specific decisions must be made to achieve the developers' specific goals, such as compliance with system-related and business-related constraints, which will vary from one implementation to another.
It should be noted that, in order to avoid obscuring the present invention with unnecessary details, only the parts closely related to the scheme according to the present invention are shown in the drawings, and other details not closely related to the present invention are omitted.
The first embodiment is as follows:
in this embodiment, an end-to-end binocular stereo matching method based on a convolutional neural network is described with reference to fig. 1, and the method includes the following steps:
step one, collecting a data set and preprocessing: the data set is derived from two open source data sets, namely SceneFlow and KITTI 2015, where the former comprises a training set and a validation set and the latter comprises a training set and a test set, and network training is performed under the PyTorch framework;
step two, constructing a stereo matching network: the stereo matching network comprises a feature extraction module, a feature fusion module, a cost volume construction module, a cost aggregation module and a disparity regression module;
step three, model training: inputting the left and right views of the preprocessed training data set into a model of a stereo matching network for forward propagation calculation to obtain a final disparity map; then, inputting the output final disparity map and the real disparity map into a loss function, and performing backward propagation by using a batch gradient descent method until the model converges;
and step four, performing binocular stereo matching by using the trained stereo matching network model.
The second embodiment is as follows:
On the basis of the first specific embodiment and with reference to FIG. 2, this embodiment details the construction of the feature extraction module, the feature fusion module, the cost volume construction module, the cost aggregation module and the disparity regression module of the stereo matching network in step two of the end-to-end binocular stereo matching method based on a convolutional neural network, as follows:
The feature extraction module is a weight-sharing twin (Siamese) network used to extract features from the input left and right views to be matched, and its output is two unary feature maps. The twin network first downsamples the input left and right views once using 3 convolutional layers, where the convolution kernel of each convolutional layer is 3 × 3 and the stride is 2; next, 4 residual layers further process the features, where the first residual layer comprises 3 residual blocks, the second comprises 16, the third comprises 3, and the fourth comprises 3. The convolution kernels of the four residual layers are all 3 × 3 and the feature dimensions are all 32; the stride of one residual block in the second residual layer is 2, and the strides of the remaining residual blocks are all 1. Each residual block has the structure BN-conv-BN-ReLU-conv-BN, where BN, conv and ReLU denote batch normalization, a convolutional layer and a rectified linear unit, respectively. After the convolution operations, the output of the twin network is two unary features of size 64 × 128 × 128, where 64, 128 and 128 denote the feature height, width and feature dimension in turn;
The feature fusion module performs a multi-scale pooling operation on the obtained features, with each level using dilated convolution with dilation rates of 6, 12, 18 and 24 respectively and a stride of 1; feature fusion is then performed, with a 1 × 1 convolution fusing the four-scale features obtained from the pooling with the outputs of the second and fourth residual layers. The output is two unary features of size 64 × 128 × 32, where 64, 128 and 32 denote the feature height, width and feature dimension in turn;
The cost volume construction module computes the matching cost between the two feature maps; its input is the two feature maps containing context information, and its output is a four-dimensional tensor. The specific computation is as follows: at each possible disparity, the reference feature map containing context information is concatenated with the corresponding target feature map containing context information, and the results are packed into a 4-dimensional cost volume. The dimensions of the cost volume output by this module are 64 × 128 × 48 × 64, where 64, 128, 48 and 64 denote the feature height and width, the maximum possible disparity value and the feature dimension in turn;
The cost aggregation module is built from encoder-decoder structures and learns a regularization function over the cost volume to perform cost aggregation; its input is the cost volume and its output is a regularized feature map. First, the cost volume is convolved with 2 3D convolutional layers, each using 2 convolution kernels of size 3 × 3 with a feature dimension of 32, and the output of the 1st 3D convolutional layer is added to the output of the 2nd 3D convolutional layer. Four encoder-decoder structures are then stacked in series; each consists of an encoding stage and a decoding stage. The encoding stage contains 4 3D convolutional layers with 3 × 3 × 3 kernels, where the strides of the first and third convolutional layers are 2 and the remaining strides are 1. The decoding stage applies only two 3D deconvolution layers for upsampling, with 3 × 3 × 3 kernels and a stride of 2; for the first deconvolution layer, a feature map of the corresponding dimension is added from the encoding stage so as to retain coarse high-level information and detailed low-level information. The output of each encoder-decoder module is added to the result of passing the cost volume through the two convolutions, both with 3 × 3 × 3 kernels and a stride of 1. Finally, two 3D convolutional layers further reduce the feature dimension to obtain the regularized feature map, whose dimensions are 64 × 128 × 48 × 1, where 64, 128, 48 and 1 denote the feature height and width, the maximum possible disparity value and the feature dimension in turn;
The disparity regression module negates the values of the matching cost volume and converts them into the corresponding matching probabilities using a softmax function; its input is the regularized feature map, and its output is a disparity map of dimensions H × W, where H, W denote the height and width, respectively, of the original input image.
The above description is only one embodiment of the present invention, but the protection scope of the present invention is not limited thereto. Any modification or substitution that a person skilled in the art can readily conceive within the technical scope disclosed by the present invention shall fall within the protection scope of the present invention, which shall therefore be subject to the protection scope of the claims.

Claims (6)

1. An end-to-end binocular stereo matching method based on a convolutional neural network is characterized by comprising the following steps:
step 1: collecting and preprocessing a data set, wherein the data set is derived from two open source data sets, namely SceneFlow and KITTI 2015, the former comprising a training set and a validation set and the latter comprising a training set and a test set, and network training is performed under the PyTorch framework;
step 2: constructing a stereo matching network, wherein the stereo matching network comprises a feature extraction module, a feature fusion module, a cost volume construction module, a cost aggregation module and a disparity regression module;
and step 3: model training, namely inputting the left view and the right view of the preprocessed training data set into a model of a stereo matching network for forward propagation calculation to obtain a final disparity map; then, inputting the output final disparity map and the real disparity map into a loss function, and performing backward propagation by using a batch gradient descent method until the model converges;
and 4, step 4: and carrying out binocular stereo matching by using the trained stereo matching network model.
2. The convolutional neural network-based end-to-end binocular stereo matching method as claimed in claim 1, wherein the preprocessing in step 1 is implemented by the following steps:
(1) randomly cropping each input left view and right view in the data set to 256 × 512;
(2) and carrying out normalization operation on the cut picture.
3. The end-to-end binocular stereo matching method based on the convolutional neural network as claimed in claim 1, wherein the step 2 of constructing the stereo matching network is implemented by the following steps:
(1) a feature extraction module:
(1-1) downsampling the input left and right views once using 3 convolutional layers, each with a 3 × 3 convolution kernel and a stride of 2;
(1-2) further processing the input left and right views with 4 residual layers, wherein the first residual layer includes 3 residual blocks, the second includes 16, the third includes 3, and the fourth includes 3; the convolution kernels of all residual layers are 3 × 3 and the feature dimensions are all 32; the stride of one residual block in the second residual layer is 2, and the strides of the remaining residual blocks are 1; after the convolution operations, the output is two unary features of size H/4 × W/4 × F, where H, W respectively denote the height and width of the original input image and F denotes the feature dimension;
(2) a feature fusion module:
(2-1) performing a multi-scale pooling operation on the obtained features, with each level using dilated convolution;
(2-2) performing feature fusion by using a 1 × 1 convolution to fuse the multi-scale features obtained from the pooling with the outputs of the second and fourth residual layers, the output being two unary features of size H/4 × W/4 × F, where H, W respectively denote the height and width of the original input image and F denotes the feature dimension;
(3) a cost volume construction module:
(3-1) concatenating the reference feature map containing context information with the corresponding target feature map containing context information at each possible disparity;
(3-2) packing the resulting feature maps into a 4-dimensional cost volume, the dimensions of the cost volume output by the module being H/4 × W/4 × D/4 × F, where H, W respectively denote the height and width of the original input image, D denotes the maximum possible disparity value, and F denotes the feature dimension;
(4) a cost aggregation module:
(4-1) convolving the obtained cost volume with 2 3D convolutional layers, each using 2 convolution kernels of size 3 × 3 with a feature dimension of 32, the output of the 1st 3D convolutional layer being added to the output of the 2nd 3D convolutional layer;
(4-2) stacking four encoder-decoder structures in series, each comprising an encoding stage and a decoding stage, the encoding stage containing 4 3D convolutional layers; the decoding stage applies only two 3D deconvolution layers for upsampling, and for the first deconvolution layer a feature map of the corresponding dimension is added from the encoding stage so as to retain coarse high-level information and detailed low-level information;
(4-3) for each encoder-decoder structure, further reducing the feature dimension with two 3D convolutional layers to obtain a regularized feature map with dimensions H/4 × W/4 × D/4 × 1, where H, W respectively denote the height and width of the original input image and D denotes the maximum possible disparity value;
(5) a disparity regression module:
(5-1) negating the values of the matching cost volume;
(5-2) converting the matching cost volume into the corresponding matching probabilities by using a softmax function; its input is the regularized feature map and the output is a disparity map with dimensions H × W, where H, W denote the height and width, respectively, of the original input image.
4. The convolutional neural network-based end-to-end binocular stereo matching method as claimed in claim 1, wherein the model training in step 3 is implemented by the following steps:
(1) firstly, inputting the left and right views of a preprocessed training data set into a model of a stereo matching network for forward propagation calculation to obtain a final disparity map;
(2) inputting the output final disparity map and the real disparity map into a loss function, and performing back propagation by using a batch gradient descent method until the loss function converges, to obtain a pre-trained model;
(3) training the pre-trained model on the preprocessed KITTI training set data until the loss function converges, to obtain the final stereo matching network model.
5. The end-to-end binocular stereo matching method based on the convolutional neural network as claimed in claim 3, wherein the multi-scale pooling operations in the feature fusion module all use dilated convolution for downsampling, with dilation rates of 6, 12, 18 and 24 respectively and strides of 1.
6. The convolutional neural network-based end-to-end binocular stereo matching method of claim 3, wherein the cost aggregation module has four encoder-decoder structures, the encoding stage of each employing 4 convolutional layers with 3 × 3 × 3 convolution kernels, wherein the strides of the first and third convolutional layers are 2 and the remaining strides are 1; the decoding stage applies 2 deconvolution layers with 3 × 3 × 3 convolution kernels, each with a stride of 2.
CN202210659456.3A 2022-06-10 2022-06-10 End-to-end binocular stereo matching method based on convolutional neural network Pending CN114972822A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210659456.3A CN114972822A (en) 2022-06-10 2022-06-10 End-to-end binocular stereo matching method based on convolutional neural network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210659456.3A CN114972822A (en) 2022-06-10 2022-06-10 End-to-end binocular stereo matching method based on convolutional neural network

Publications (1)

Publication Number Publication Date
CN114972822A true CN114972822A (en) 2022-08-30

Family

ID=82960904

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210659456.3A Pending CN114972822A (en) 2022-06-10 2022-06-10 End-to-end binocular stereo matching method based on convolutional neural network

Country Status (1)

Country Link
CN (1) CN114972822A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115375930A (en) * 2022-10-26 2022-11-22 中国航发四川燃气涡轮研究院 Stereo matching network and stereo matching method based on multi-scale information
CN115375930B (en) * 2022-10-26 2023-05-05 中国航发四川燃气涡轮研究院 Three-dimensional matching network and three-dimensional matching method based on multi-scale information

Similar Documents

Publication Publication Date Title
US11521039B2 (en) Method and apparatus with neural network performing convolution
CN110533712A (en) A kind of binocular solid matching process based on convolutional neural networks
CN110021069A (en) A kind of method for reconstructing three-dimensional model based on grid deformation
CN112927357A (en) 3D object reconstruction method based on dynamic graph network
CN108647723B (en) Image classification method based on deep learning network
CN112348870B (en) Significance target detection method based on residual error fusion
CN113554084B (en) Vehicle re-identification model compression method and system based on pruning and light convolution
CN112541572A (en) Residual oil distribution prediction method based on convolutional encoder-decoder network
CN113112607B (en) Method and device for generating three-dimensional grid model sequence with any frame rate
CN113514877A (en) Self-adaptive quick earthquake magnitude estimation method
CN112509021A (en) Parallax optimization method based on attention mechanism
CN114972822A (en) End-to-end binocular stereo matching method based on convolutional neural network
CN109948575A (en) Eyeball dividing method in ultrasound image
CN111783862A (en) Three-dimensional significant object detection technology of multi-attention-directed neural network
CN112529068A (en) Multi-view image classification method, system, computer equipment and storage medium
Kakillioglu et al. 3D capsule networks for object classification with weight pruning
CN113642675B (en) Underground rock stratum distribution imaging acquisition method, system, terminal and readable storage medium based on full waveform inversion and convolutional neural network
CN114549757A (en) Three-dimensional point cloud up-sampling method based on attention mechanism
WO2022213395A1 (en) Light-weighted target detection method and device, and storage medium
CN111612046B (en) Feature pyramid graph convolution neural network and application thereof in 3D point cloud classification
CN117011943A (en) Multi-scale self-attention mechanism-based decoupled 3D network action recognition method
CN116797640A (en) Depth and 3D key point estimation method for intelligent companion line inspection device
CN116597071A (en) Defect point cloud data reconstruction method based on K-nearest neighbor point sampling capable of learning
CN115578561A (en) Real-time semantic segmentation method and device based on multi-scale context aggregation network
Liu et al. A deep neural network pruning method based on gradient L1-norm

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination