CN116681655A - Three-dimensional matching method and network based on residual cost volume - Google Patents


Info

Publication number
CN116681655A
Authority
CN
China
Legal status
Pending
Application number
CN202310553305.4A
Other languages
Chinese (zh)
Inventor
夏富坤
钱刃
刘洋
丘文峰
杨文帮
李建华
赵建川
Current Assignee
Peking University Shenzhen Graduate School
Original Assignee
Peking University Shenzhen Graduate School

Classifications

    • G06T7/97 Determining parameters from multiple pictures
    • G06N3/0464 Convolutional networks [CNN, ConvNet]
    • G06N5/04 Inference or reasoning models
    • G06V10/74 Image or video pattern matching; Proximity measures in feature spaces
    • G06V10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • Y02T10/40 Engine management systems

Abstract

A stereo matching method and network based on residual cost volumes. A plurality of first residual cost volumes of different scales, each with a different dimensionality, are constructed from an extracted feature pyramid, and information in these residual cost volumes is fused by residual heterogeneous aggregation. This efficiently aggregates heterogeneous cost representations and enables information interaction among cost representations of different forms, which addresses the information redundancy of multi-scale cost volume networks and gives the binocular stereo matching network a better balance between accuracy and inference speed. Error correction is then performed according to a plurality of first disparity maps of different scales, which effectively improves stereo matching quality.

Description

Stereo matching method and network based on residual cost volume
Technical Field
This application relates to the technical field of stereo matching, and in particular to a stereo matching method and network based on residual cost volumes.
Background
Binocular stereo matching is one of the most widely used depth estimation approaches and is indispensable in many practical applications, including autonomous driving, robot navigation, 3D reconstruction and augmented reality. Because the performance of the stereo matching algorithm directly determines the final performance of these technologies, it is important to optimize stereo matching so that target depth can be estimated accurately in less time.
With the rise of deep learning and neural networks, many computer vision problems that were difficult to solve with traditional methods have seen great progress. A classical stereo matching network is built on an end-to-end convolutional neural network framework and is divided, in order, into four modules: feature extraction, cost volume construction, cost aggregation and disparity regression. Cost volume construction is the step that most clearly distinguishes stereo matching networks from other computer vision tasks, and it has a decisive effect on the speed and accuracy of the network. The conventional 4D cost volume contains a large amount of redundant information and is slow to process, so subsequent processing based on it struggles to balance accuracy and speed.
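As an illustration of why the conventional 4D cost volume is heavy, the following Python/NumPy sketch builds one by concatenating left features with disparity-shifted right features, and, for comparison, builds a 3D cost volume by channel-averaged correlation. Both are common constructions in learned stereo matching, but the text does not state which construction it uses, so treat them as assumptions; the function names and toy shapes are hypothetical.

```python
import numpy as np


def concat_cost_volume(left, right, max_disp):
    """Conventional 4D cost volume of shape (2C, D, H, W).

    For each candidate disparity d, left features at column x are stacked with
    right features at column x - d; columns x < d have no match and stay zero.
    """
    c, h, w = left.shape
    vol = np.zeros((2 * c, max_disp, h, w), dtype=left.dtype)
    for d in range(max_disp):
        vol[:c, d, :, d:] = left[:, :, d:]
        vol[c:, d, :, d:] = right[:, :, : w - d]
    return vol


def correlation_cost_volume(left, right, max_disp):
    """3D cost volume of shape (D, H, W): channel-averaged feature correlation."""
    _, h, w = left.shape
    vol = np.zeros((max_disp, h, w), dtype=left.dtype)
    for d in range(max_disp):
        vol[d, :, d:] = (left[:, :, d:] * right[:, :, : w - d]).mean(axis=0)
    return vol


# Hypothetical features: 8 channels on a 4 x 16 grid, 6 disparity candidates.
rng = np.random.default_rng(0)
left = rng.standard_normal((8, 4, 16)).astype(np.float32)
right = rng.standard_normal((8, 4, 16)).astype(np.float32)
vol4d = concat_cost_volume(left, right, 6)
vol3d = correlation_cost_volume(left, right, 6)
print(vol4d.shape, vol3d.shape)  # (16, 6, 4, 16) (6, 4, 16)
```

The 4D volume stores 2C values for every (disparity, row, column) cell, so it is 2C times larger than the 3D volume built from the same features, and every later 3D convolution has to process all of those values.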
Disclosure of Invention
In this application, first residual cost volumes of different scales and different dimensionalities are constructed from the extracted feature pyramid, and cost aggregation is performed on them by residual heterogeneous aggregation. This efficiently aggregates heterogeneous cost representations and enables information interaction among cost representations of different forms, thereby addressing the information redundancy of multi-scale cost volume networks and giving the binocular stereo matching network a better balance between accuracy and inference speed.
In a first aspect, an embodiment provides a stereo matching method based on residual cost volumes, comprising: acquiring a left view and a right view to be matched; extracting features from the left view and the right view to obtain a feature pyramid; constructing, from the feature pyramid, a plurality of first residual cost volumes of different scales, each first residual cost volume having a different dimensionality; performing cost aggregation on all the first residual cost volumes by residual heterogeneous aggregation to obtain a plurality of second cost volumes of different scales, where the second cost volumes correspond one-to-one to the first residual cost volumes and each second cost volume has the same scale and dimensionality as its corresponding first residual cost volume; performing disparity regression on each second cost volume to obtain a plurality of first disparity maps of different sizes; and performing error correction according to all the first disparity maps to obtain a second disparity map of the same size as the left view and the right view.
In some embodiments, the residual heterogeneous aggregation comprises: performing intra-scale cost aggregation on each first residual cost volume to obtain an intra-scale aggregated cost volume; and performing inter-cost-volume information fusion on all the intra-scale aggregated cost volumes to obtain a plurality of second cost volumes of different scales.
In some embodiments, performing inter-cost-volume information fusion on all the intra-scale aggregated cost volumes comprises multi-scale cross-scale cost aggregation operations, where the cross-scale cost aggregation operation for each scale comprises: sampling the intra-scale aggregated cost volumes of different scales to the same scale; and performing inter-cost-volume information fusion on the intra-scale aggregated cost volumes of the same scale to obtain the second cost volume of that scale.
In some embodiments, the first residual cost volumes comprise: a 1/3-scale residual 3D cost volume, a 1/6-scale residual 4D cost volume and a 1/12-scale conventional 4D cost volume.
In some embodiments, the residual heterogeneous aggregation is expressed by a formula whose symbols are defined as follows: I is the identity transformation; T is the forward residual cost volume transformation, and a separate operator denotes the reverse residual cost volume transformation; S_q is a squeeze operation that converts a 4D cost volume into a 3D cost volume; US_q is an unsqueeze operation that converts a 3D cost volume into a 4D cost volume; the inputs are the 1/3-scale residual 3D cost volume, the 1/6-scale residual 4D cost volume and the 1/12-scale conventional 4D cost volume; and the outputs are the 1/3-scale, 1/6-scale and 1/12-scale second cost volumes.
In some embodiments, intra-scale cost aggregation is performed on each first residual cost volume through two convolution layers.
In some embodiments, performing disparity regression on each second cost volume to obtain a plurality of first disparity maps of different sizes comprises: performing disparity regression on the 1/3-scale second cost volume to obtain a first disparity map whose size is 1/3 of that of the left view and the right view; performing disparity regression on the 1/6-scale second cost volume to obtain a first disparity map whose size is 1/6 of that of the left view and the right view; and performing disparity regression on the 1/12-scale second cost volume to obtain a first disparity map whose size is 1/12 of that of the left view and the right view.
In some embodiments, performing error correction according to all the first disparity maps to obtain a second disparity map of the same size as the left view and the right view comprises: upsampling the first disparity map at 1/12 of the size of the left and right views to obtain a first upsampled disparity map at 1/6 of that size; adding the first upsampled disparity map and the first disparity map of the same size to obtain a 1/6-size first error correction map; upsampling the first error correction map to obtain a second upsampled error map at 1/3 of the size of the left and right views; adding the second upsampled error map and the first disparity map of the same size to obtain a 1/3-size second error correction map; and upsampling the second error correction map to obtain a second disparity map of the same size as the left view and the right view.
In a second aspect, the present application provides a stereo matching network based on residual cost volumes, comprising a feature extraction module, a cost volume construction module, a cost aggregation module, a disparity regression module and a disparity optimization module. The feature extraction module extracts features from the acquired left view and right view to obtain a feature pyramid. The cost volume construction module constructs, from the feature pyramid, a plurality of first residual cost volumes of different scales, each with a different dimensionality. The cost aggregation module performs cost aggregation on all the first residual cost volumes by residual heterogeneous aggregation to obtain a plurality of second cost volumes of different scales, where the second cost volumes correspond one-to-one to the first residual cost volumes and each second cost volume has the same scale and dimensionality as its corresponding first residual cost volume. The disparity regression module performs disparity regression on each second cost volume to obtain a plurality of first disparity maps of different sizes. The disparity optimization module performs error correction according to all the first disparity maps to obtain a second disparity map of the same size as the left view and the right view.
In some embodiments, the cost aggregation module comprises an intra-scale cost aggregation sub-module and a cross-form cost aggregation sub-module. The intra-scale cost aggregation sub-module performs intra-scale cost aggregation on each first residual cost volume to obtain an intra-scale aggregated cost volume; the cross-form cost aggregation sub-module performs inter-cost-volume information fusion on all the intra-scale aggregated cost volumes to obtain a plurality of second cost volumes of different scales.
In this application, a plurality of residual cost volumes of different scales and dimensionalities are constructed from the extracted feature pyramid and fused by residual heterogeneous aggregation, which efficiently aggregates heterogeneous cost representations and enables information interaction among cost representations of different forms. This addresses the information redundancy of multi-scale cost volume networks and gives the binocular stereo matching network a better balance between accuracy and inference speed. In addition, error correction is performed according to a plurality of first disparity maps of different scales, which effectively improves stereo matching quality.
Drawings
FIG. 1 is a flow chart of a stereo matching method based on residual cost volumes provided by the application;
FIG. 2 is a flow chart of residual heterogeneous aggregation according to an embodiment;
FIG. 3 is a flow chart of the cross-scale cost aggregation operation for each scale according to an embodiment;
FIG. 4 is a flow chart of performing disparity regression on each second cost volume according to an embodiment;
FIG. 5 is a flow chart of performing error correction according to all the first disparity maps according to an embodiment;
FIG. 6 is a schematic structural diagram of a stereo matching network based on residual cost volumes provided by the application;
FIG. 7 is a schematic structural diagram of a cost aggregation module according to an embodiment.
Detailed Description
The application is described in further detail below through specific embodiments with reference to the drawings, in which like elements in different embodiments share associated reference numbers. In the following embodiments, numerous specific details are set forth to give a better understanding of the present application. However, those skilled in the art will readily recognize that some of these features may be omitted in different situations, or replaced by other elements, materials or methods. In some instances, operations related to the present application are not shown or described in the specification, to avoid obscuring its core parts with excessive description; a detailed description of these operations is unnecessary for those skilled in the art, who can fully understand them from the description and general technical knowledge.
Furthermore, the features, operations or characteristics described in the specification may be combined in any suitable manner to form various embodiments. The steps or actions in the method descriptions may also be reordered or adjusted in ways apparent to those of ordinary skill in the art. Therefore, the orders given in the description and drawings serve only to describe particular embodiments clearly and are not required orders unless otherwise stated.
The ordinal labels used for components herein, such as "first" and "second", serve only to distinguish the described objects and carry no sequential or technical meaning. Unless otherwise stated, the term "coupled" as used herein covers both direct and indirect coupling.
In stereo matching, cost volume construction is the step that most clearly distinguishes stereo matching networks from other computer vision tasks and plays a decisive role in their speed and accuracy. However, the conventional 4D cost volume contains a large amount of redundant information and is slow to process, so subsequent cost aggregation and other processing based on it have difficulty balancing accuracy and speed.
In the embodiments of the application, a plurality of cost volumes of different scales and dimensionalities are constructed from the extracted feature pyramid, and their information is fused by residual heterogeneous aggregation. This efficiently aggregates heterogeneous cost representations, enables information interaction among cost representations of different forms, addresses the information redundancy of multi-scale cost volume networks, and gives the binocular stereo matching network a better balance between accuracy and inference speed.
Referring to FIG. 1, an embodiment of the present application provides a stereo matching method based on residual cost volumes, comprising:
S10: Acquire a left view and a right view to be matched.
S20: Extract features from the left view and the right view to obtain a feature pyramid.
S30: Construct a plurality of first residual cost volumes of different scales from the feature pyramid, each first residual cost volume having a different dimensionality.
In some embodiments, the first residual cost volumes comprise a 1/3-scale residual 3D cost volume, a 1/6-scale residual 4D cost volume and a 1/12-scale conventional 4D cost volume. This construction balances the parameter count of the residual cost volumes against the information quality of the cost representation. Because of its higher resolution, the 1/3 scale accounts for a larger share of the parameters of the overall cost volume, so using a 3D residual cost volume there effectively reduces the computational cost. At the 1/6 and 1/12 scales, the final disparity estimation performance of the network is taken into account: these scales must supply a more accurate initial disparity and higher-level semantic features, so a 1/6-scale residual 4D cost volume and a 1/12-scale conventional 4D cost volume are constructed.
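The parameter-share argument above can be checked with simple arithmetic. The sketch below counts cost volume elements at each scale under hypothetical settings (a 384 x 1248 input, 192 disparity levels, 8 channels for 4D volumes) and divides the disparity range by the scale factor in the same way as height and width. A residual cost volume would usually search a narrower disparity range, so the absolute numbers are only indicative; the helper name is hypothetical.

```python
def cost_volume_elements(scale, height, width, max_disp, channels=None):
    """Element count of a cost volume at 1/scale resolution.

    channels=None gives a 3D volume (D, H, W); otherwise a 4D volume (C, D, H, W).
    The disparity range is divided by the scale factor, like height and width.
    """
    n = (max_disp // scale) * (height // scale) * (width // scale)
    return n if channels is None else channels * n


# Hypothetical setting: 384 x 1248 input, 192 disparity levels, 8 channels for 4D volumes.
H, W, D, C = 384, 1248, 192, 8
as_4d_at_3 = cost_volume_elements(3, H, W, D, C)              # 1/3 scale if it were 4D
residual_3d_at_3 = cost_volume_elements(3, H, W, D)           # 1/3 scale, 3D as in the embodiment
residual_4d_at_6 = cost_volume_elements(6, H, W, D, C)        # 1/6 scale, 4D
conventional_4d_at_12 = cost_volume_elements(12, H, W, D, C)  # 1/12 scale, 4D
print(as_4d_at_3, residual_3d_at_3, residual_4d_at_6, conventional_4d_at_12)
# 27262976 3407872 3407872 425984
```

Under these assumptions a 4D volume at 1/3 scale would hold 8 times as many elements as the 1/6-scale 4D volume and 64 times as many as the 1/12-scale one, whereas choosing a 3D volume at 1/3 scale divides that count by the channel number C.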
S40: Perform cost aggregation on all the first residual cost volumes by residual heterogeneous aggregation to obtain a plurality of second cost volumes of different scales, where the second cost volumes correspond one-to-one to the first residual cost volumes and each second cost volume has the same scale and dimensionality as its corresponding first residual cost volume.
In some embodiments, as shown in FIG. 2, the residual heterogeneous aggregation comprises:
S41: Perform intra-scale cost aggregation on each first residual cost volume to obtain an intra-scale aggregated cost volume.
In some embodiments, intra-scale cost aggregation is performed on each first residual cost volume through two convolution layers.
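The two-layer intra-scale aggregation can be sketched as two same-padded 3D convolutions over the (D, H, W) axes of a 4D cost volume, with a ReLU between them. The 3 x 3 x 3 kernel size, the ReLU and the unchanged channel count are assumptions, since the text specifies only two convolution layers; a 3D cost volume could be handled the same way after adding a singleton channel axis. This is a plain NumPy illustration, not the network's implementation.

```python
import numpy as np


def conv3d_same(x, weight, bias):
    """3 x 3 x 3 convolution (cross-correlation), stride 1, zero padding 1.

    x: (C_in, D, H, W); weight: (C_out, C_in, 3, 3, 3); bias: (C_out,).
    Returns (C_out, D, H, W).
    """
    _, d, h, w = x.shape
    xp = np.pad(x, ((0, 0), (1, 1), (1, 1), (1, 1)))
    out = np.zeros((weight.shape[0], d, h, w), dtype=np.result_type(x, weight))
    for kd in range(3):
        for kh in range(3):
            for kw in range(3):
                patch = xp[:, kd:kd + d, kh:kh + h, kw:kw + w]
                out += np.einsum("oi,idhw->odhw", weight[:, :, kd, kh, kw], patch)
    return out + bias[:, None, None, None]


def intra_scale_aggregate(cost, w1, b1, w2, b2):
    """Two convolution layers with a ReLU between them (activation is an assumption)."""
    hidden = np.maximum(conv3d_same(cost, w1, b1), 0.0)
    return conv3d_same(hidden, w2, b2)


rng = np.random.default_rng(0)
cost = rng.standard_normal((4, 6, 5, 7))  # (C, D, H, W) at one scale
w1 = 0.1 * rng.standard_normal((4, 4, 3, 3, 3))
w2 = 0.1 * rng.standard_normal((4, 4, 3, 3, 3))
out = intra_scale_aggregate(cost, w1, np.zeros(4), w2, np.zeros(4))
print(out.shape)  # (4, 6, 5, 7)
```

Keeping the output shape equal to the input shape matches the requirement that each aggregated volume keeps the scale and dimensionality of its first residual cost volume.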
S42: Perform cross-scale cost aggregation on all the intra-scale aggregated cost volumes to obtain a plurality of second cost volumes of different scales.
In some embodiments, performing inter-cost-volume information fusion on all the intra-scale aggregated cost volumes comprises multi-scale cross-scale cost aggregation operations, where
the cross-scale cost aggregation operation for each scale, as shown in FIG. 3, comprises:
S420: Sample the intra-scale aggregated cost volumes of different scales to the same scale;
S421: Perform inter-cost-volume information fusion on the intra-scale aggregated cost volumes of the same scale to obtain the second cost volume of that scale.
In some embodiments, the inter-cost-volume information fusion on all the intra-scale aggregated cost volumes comprises N cross-scale cost aggregation operations, one per scale, and each operation aggregates the N intra-scale aggregation results. Here N is 3, and the 3 scales correspond to the three scales (1/3, 1/6 and 1/12) of the first residual cost volumes.
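A minimal sketch of the N = 3 cross-scale operations (S420 and S421) follows. It assumes that all three intra-scale aggregated volumes are already 4D with the same channel count (the 1/3-scale 3D volume having been lifted, for example by US_q), that the disparity axis scales with resolution, that resampling uses nearest-neighbour repetition upward and average pooling downward, and that fusion is plain summation. The text specifies neither the resampling nor the fusion operator, and this sketch leaves out the forward and reverse residual transformations, so it shows only the data flow.

```python
import numpy as np

SCALES = (3, 6, 12)  # denominators of the 1/3, 1/6 and 1/12 scales


def resample(vol, src_scale, dst_scale):
    """Resample a 4D cost volume (C, D, H, W) from 1/src_scale to 1/dst_scale.

    Coarse-to-fine uses nearest-neighbour repetition along D, H and W;
    fine-to-coarse uses average pooling over non-overlapping k x k x k blocks.
    """
    if src_scale == dst_scale:
        return vol
    if src_scale > dst_scale:  # source is coarser: upsample
        k = src_scale // dst_scale
        for axis in (1, 2, 3):
            vol = np.repeat(vol, k, axis=axis)
        return vol
    k = dst_scale // src_scale  # source is finer: downsample
    c, d, h, w = vol.shape
    return vol.reshape(c, d // k, k, h // k, k, w // k, k).mean(axis=(2, 4, 6))


def cross_scale_aggregate(volumes):
    """One cross-scale operation per target scale: resample all N volumes, then sum."""
    return {
        dst: sum(resample(volumes[src], src, dst) for src in SCALES)
        for dst in SCALES
    }


vols = {
    3: np.full((2, 8, 8, 8), 1.0),
    6: np.full((2, 4, 4, 4), 2.0),
    12: np.full((2, 2, 2, 2), 3.0),
}
fused = cross_scale_aggregate(vols)
print({s: v.shape for s, v in fused.items()})
# {3: (2, 8, 8, 8), 6: (2, 4, 4, 4), 12: (2, 2, 2, 2)}
```

With constant inputs every fused entry equals 1 + 2 + 3 = 6, which makes it easy to see that each target scale receives a contribution from all three source scales.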
In some embodiments, the residual heterogeneous aggregation is expressed by a formula in which:
I is the identity transformation; T is the forward residual cost volume transformation, and a separate operator denotes the reverse residual cost volume transformation; S_q is a squeeze operation that converts a 4D cost volume into a 3D cost volume, i.e. a dimension-reduction operation: a 3D convolution layer first adjusts the number of channels to 1, and the squeeze operation then removes the channel dimension; US_q is an unsqueeze operation that converts a 3D cost volume into a 4D cost volume, i.e. a dimension-raising operation: correspondingly, the unsqueeze operation adds a channel dimension to the 3D cost volume, and a convolution layer then increases the number of channels to match the corresponding 4D cost volume; the inputs are the 1/3-scale residual 3D cost volume, the 1/6-scale residual 4D cost volume and the 1/12-scale conventional 4D cost volume; and the outputs are the 1/3-scale, 1/6-scale and 1/12-scale second cost volumes.
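The squeeze and unsqueeze operations described above can be sketched as follows. Here the "3D convolution layer" of S_q and the "convolution layer" of US_q are both assumed to be 1 x 1 x 1 convolutions, which reduce to a weighted sum over channels and to a per-channel scale and shift respectively; the text does not give the kernel sizes, and the function names are hypothetical.

```python
import numpy as np


def squeeze_op(cost4d, weight, bias):
    """S_q sketch: 4D (C, D, H, W) -> 3D (D, H, W).

    A 1 x 1 x 1 convolution with one output channel is a weighted sum over the
    C channels plus a bias; removing the resulting length-1 channel axis is the squeeze.
    """
    one_channel = np.tensordot(weight, cost4d, axes=1)[None] + bias  # (1, D, H, W)
    return np.squeeze(one_channel, axis=0)


def unsqueeze_op(cost3d, weight, bias):
    """US_q sketch: 3D (D, H, W) -> 4D (C, D, H, W).

    The unsqueeze adds a length-1 channel axis; a 1 x 1 x 1 convolution from
    1 to C channels then scales and shifts that channel once per output channel.
    """
    one_channel = np.expand_dims(cost3d, axis=0)  # (1, D, H, W)
    return weight[:, None, None, None] * one_channel + bias[:, None, None, None]


rng = np.random.default_rng(0)
c4 = rng.standard_normal((8, 6, 5, 7))
c3 = squeeze_op(c4, np.full(8, 1 / 8), 0.0)  # averaging weights for the demo
c4_back = unsqueeze_op(c3, rng.standard_normal(8), np.zeros(8))
print(c3.shape, c4_back.shape)  # (6, 5, 7) (8, 6, 5, 7)
```

Because US_q only re-expands one channel, it cannot restore the information S_q discarded; in this reading the two operators simply let 3D and 4D volumes be added to each other during fusion.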
Performing cost aggregation on all the constructed first residual cost volumes by residual heterogeneous aggregation efficiently aggregates heterogeneous cost representations and enables information interaction among cost representations of different forms, thereby addressing the information redundancy of multi-scale cost volume networks and giving the binocular stereo matching network a better balance between accuracy and inference speed.
S50: Perform disparity regression on each second cost volume to obtain a plurality of first disparity maps of different sizes.
In some embodiments, performing disparity regression on each second cost volume to obtain a plurality of first disparity maps of different sizes, as shown in FIG. 4, comprises:
S51: Perform disparity regression on the 1/3-scale second cost volume to obtain a first disparity map whose size is 1/3 of that of the left view and the right view.
S52: Perform disparity regression on the 1/6-scale second cost volume to obtain a first disparity map whose size is 1/6 of that of the left view and the right view.
S53: Perform disparity regression on the 1/12-scale second cost volume to obtain a first disparity map whose size is 1/12 of that of the left view and the right view.
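The text does not say how disparity regression is performed. One widely used choice is the soft argmin: a softmax over the negated costs along the disparity axis, followed by the expected disparity. The sketch below assumes a single-channel (D, H, W) volume (a 4D second cost volume would first be reduced to one channel, for example by a convolution) and lets the caller pass the disparity candidates, which for a residual volume might be small signed offsets; both points are assumptions.

```python
import numpy as np


def soft_argmin(cost, candidates=None):
    """Regress a disparity map (H, W) from a single-channel cost volume (D, H, W).

    A softmax over the negated costs along D gives a probability per candidate;
    the disparity is the probability-weighted mean of the candidates.
    """
    if candidates is None:
        candidates = np.arange(cost.shape[0], dtype=float)
    logits = -cost
    logits = logits - logits.max(axis=0, keepdims=True)  # numerical stability
    prob = np.exp(logits)
    prob /= prob.sum(axis=0, keepdims=True)
    return np.tensordot(candidates, prob, axes=1)


cost = np.full((5, 2, 3), 10.0)
cost[2] = 0.0  # lowest cost at disparity 2 for every pixel
disp = soft_argmin(cost)  # about 2.0 everywhere
offsets = np.arange(-2, 3, dtype=float)  # hypothetical residual candidates
residual = soft_argmin(np.zeros((5, 2, 3)), candidates=offsets)  # about 0.0 everywhere
```

The soft argmin is differentiable and yields sub-pixel values, which is why it is a common default; applying it at each scale gives the three first disparity maps of S51 to S53.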
S60: Perform error correction according to all the first disparity maps to obtain a second disparity map of the same size as the left view and the right view.
In some embodiments, performing error correction according to all the first disparity maps to obtain a second disparity map of the same size as the left view and the right view, as shown in FIG. 5, comprises:
S61: Upsample the first disparity map at 1/12 of the size of the left and right views to obtain a first upsampled disparity map at 1/6 of that size.
S62: Add the first upsampled disparity map and the first disparity map of the same size to obtain a 1/6-size first error correction map.
S63: Upsample the first error correction map to obtain a second upsampled error map at 1/3 of the size of the left and right views.
S64: Add the second upsampled error map and the first disparity map of the same size to obtain a 1/3-size second error correction map.
S65: Upsample the second error correction map to obtain a second disparity map of the same size as the left view and the right view.
In some embodiments, the addition in steps S62 and S64 means adding the values of corresponding pixels in the two maps. This corrects the errors of the first disparity maps and yields a second disparity map of the same size as the left view and the right view; performing error correction according to first disparity maps of different scales effectively improves stereo matching quality.
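Steps S61 to S65 can be sketched as below. Two details are assumptions not stated in the text: nearest-neighbour upsampling is used, and upsampled disparity values are multiplied by the upsampling factor, because a disparity measured in pixels at 1/12 resolution spans twice as many pixels at 1/6 resolution. Reading the 1/6- and 1/3-scale first disparity maps as residual corrections is likewise an interpretation of the addition in S62 and S64; the function names are hypothetical.

```python
import numpy as np


def upsample_disparity(disp, factor):
    """Nearest-neighbour upsampling of a disparity map by an integer factor.

    Values are also multiplied by the factor, since disparities are measured in
    pixels of the current resolution (an assumption; the text only says "upsample").
    """
    return np.kron(disp, np.ones((factor, factor))) * factor


def error_correction(d12, d6, d3):
    """S61 to S65: cascade the 1/12, 1/6 and 1/3 first disparity maps to full size."""
    first_correction = upsample_disparity(d12, 2) + d6                # S61, S62
    second_correction = upsample_disparity(first_correction, 2) + d3  # S63, S64
    return upsample_disparity(second_correction, 3)                   # S65


# Hypothetical 12 x 24 views: 1/12 -> 1 x 2, 1/6 -> 2 x 4, 1/3 -> 4 x 8.
d12 = np.ones((1, 2))
d6 = np.full((2, 4), 0.5)
d3 = np.zeros((4, 8))
full = error_correction(d12, d6, d3)
print(full.shape, full[0, 0])  # (12, 24) 15.0
```

In the usage example a 1-pixel disparity at 1/12 scale, refined by +0.5 pixel at 1/6 scale, becomes 15 pixels at full resolution.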
In another embodiment of the present application, as shown in FIG. 6, a stereo matching network based on residual cost volumes is provided, comprising a feature extraction module 10, a cost volume construction module 20, a cost aggregation module 30, a disparity regression module 40 and a disparity optimization module 50. The feature extraction module 10 extracts features from the acquired left view and right view to obtain a feature pyramid; the cost volume construction module 20 constructs, from the feature pyramid, a plurality of first residual cost volumes of different scales, each with a different dimensionality; the cost aggregation module 30 performs cost aggregation on all the first residual cost volumes by residual heterogeneous aggregation to obtain a plurality of second cost volumes of different scales, where the second cost volumes correspond one-to-one to the first residual cost volumes and each second cost volume has the same scale and dimensionality as its corresponding first residual cost volume; the disparity regression module 40 performs disparity regression on each second cost volume to obtain a plurality of first disparity maps of different sizes; and the disparity optimization module 50 performs error correction according to all the first disparity maps to obtain a second disparity map of the same size as the left view and the right view.
In some embodiments, the residual heterogeneous aggregation is expressed by a formula in which:
I is the identity transformation; T is the forward residual cost volume transformation, and a separate operator denotes the reverse residual cost volume transformation; S_q is a squeeze operation that converts a 4D cost volume into a 3D cost volume; US_q is an unsqueeze operation that converts a 3D cost volume into a 4D cost volume; the inputs are the 1/3-scale residual 3D cost volume, the 1/6-scale residual 4D cost volume and the 1/12-scale conventional 4D cost volume; and the outputs are the 1/3-scale, 1/6-scale and 1/12-scale second cost volumes.
The cost aggregation module 30 performs cost aggregation on all the constructed first residual cost volumes, which efficiently aggregates heterogeneous cost representations and enables information interaction among cost representations of different forms, thereby addressing the information redundancy of multi-scale cost volume networks and giving the binocular stereo matching network a better balance between accuracy and inference speed.
In some embodiments, as shown in FIG. 7, the cost aggregation module 30 comprises an intra-scale cost aggregation sub-module 31 and a cross-form cost aggregation sub-module 32. The intra-scale cost aggregation sub-module 31 performs intra-scale cost aggregation on each first residual cost volume to obtain an intra-scale aggregated cost volume; the cross-form cost aggregation sub-module 32 performs inter-cost-volume information fusion on all the intra-scale aggregated cost volumes to obtain a plurality of second cost volumes of different scales.
The stereo matching network provided in this embodiment is implemented in the same way as the foregoing method, which is not repeated here.
The foregoing description of the application has been presented for purposes of illustration and description, and is not intended to be limiting. Several simple deductions, modifications or substitutions may also be made by a person skilled in the art to which the application pertains, based on the idea of the application.

Claims (10)

1. A stereo matching method based on residual cost volumes, comprising the following steps:
acquiring a left view and a right view to be matched;
extracting features from the left view and the right view to obtain a feature pyramid;
constructing, from the feature pyramid, a plurality of first residual cost volumes of different scales, each first residual cost volume having a different dimensionality;
performing cost aggregation on all the first residual cost volumes by residual heterogeneous aggregation to obtain a plurality of second cost volumes of different scales, wherein the second cost volumes correspond one-to-one to the first residual cost volumes, and each second cost volume has the same scale and dimensionality as its corresponding first residual cost volume;
performing disparity regression on each second cost volume to obtain a plurality of first disparity maps of different sizes;
and performing error correction according to all the first disparity maps to obtain a second disparity map of the same size as the left view and the right view.
2. The method of claim 1, wherein the residual heterogeneous polymerization scheme comprises:
performing inner scale cost aggregation on each first residual cost volume to obtain inner scale aggregation cost volumes;
and carrying out information fusion among the cost volumes on all the intra-scale aggregation cost volumes so as to obtain a plurality of second cost volumes with different scales.
3. The method of claim 2, wherein performing inter-cost-volume information fusion on all the intra-scale aggregated cost volumes comprises performing a cross-scale cost aggregation operation at each of multiple scales, wherein:
the cross-scale cost aggregation operation for each scale comprises:
resampling the intra-scale aggregated cost volumes of different scales to the same scale;
and performing inter-cost-volume information fusion on the intra-scale aggregated cost volumes that now share that scale, to obtain the second cost volume of that scale.
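The per-scale operation of claim 3 (resample every intra-scale aggregated volume to one target scale, then fuse) can be sketched as below. This is a minimal sketch under stated assumptions: all inputs are 3D (disparity × height × width) nested lists, resampling is nearest-neighbour along every axis, and fusion is element-wise summation. The claim specifies none of these choices; a trained network would more plausibly use learned down/upsampling, and a 4D volume would first have to be converted, for example with a squeeze operation as in claim 5. The names `resample` and `cross_scale_fuse` are hypothetical.

```python
from typing import List, Tuple

Volume = List[List[List[float]]]  # disparity x height x width


def _nearest_indices(n_in: int, n_out: int) -> List[int]:
    # Map each output index to an input index (simple nearest-neighbour resampling).
    return [min(n_in - 1, (i * n_in) // n_out) for i in range(n_out)]


def resample(vol: Volume, shape: Tuple[int, int, int]) -> Volume:
    """Hypothetical resampler: nearest-neighbour along disparity, height and width."""
    d_out, h_out, w_out = shape
    di = _nearest_indices(len(vol), d_out)
    hi = _nearest_indices(len(vol[0]), h_out)
    wi = _nearest_indices(len(vol[0][0]), w_out)
    return [[[vol[d][y][x] for x in wi] for y in hi] for d in di]


def cross_scale_fuse(volumes: List[Volume], target: int) -> Volume:
    """Fuse all volumes at the scale of volumes[target] by element-wise summation
    (an assumed fusion rule standing in for the claimed information fusion)."""
    ref = volumes[target]
    shape = (len(ref), len(ref[0]), len(ref[0][0]))
    resampled = [resample(v, shape) for v in volumes]
    return [[[sum(v[d][y][x] for v in resampled) for x in range(shape[2])]
             for y in range(shape[1])]
            for d in range(shape[0])]
```

Calling `cross_scale_fuse` once per target index yields one fused volume per scale, which mirrors the one-to-one correspondence between first and second cost volumes in claim 1.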
4. The method of claim 1, wherein the first residual cost volumes comprise: a 1/3-scale residual 3D cost volume, a 1/6-scale residual 4D cost volume, and a 1/12-scale conventional 4D cost volume.
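To make the three volumes of claim 4 concrete, the sketch below computes their shapes (without a batch axis) for a given image size and disparity range. The assumptions are: a 3D volume is laid out as disparity × height × width, a 4D volume adds a leading feature-channel axis, every axis is divided by the scale factor with integer division, and the channel counts are hypothetical parameters that the claim does not give.

```python
from typing import Dict, Tuple


def cost_volume_shapes(height: int, width: int, max_disp: int,
                       channels_1_6: int, channels_1_12: int) -> Dict[str, Tuple[int, ...]]:
    """Shapes of the three first cost volumes of claim 4 (assumed layouts)."""
    def scaled(s: int) -> Tuple[int, int, int]:
        return (max_disp // s, height // s, width // s)

    return {
        "1/3 residual 3D": scaled(3),                          # D x H x W
        "1/6 residual 4D": (channels_1_6,) + scaled(6),        # C x D x H x W
        "1/12 conventional 4D": (channels_1_12,) + scaled(12),  # C x D x H x W
    }


def num_elements(shape: Tuple[int, ...]) -> int:
    # Product of all axis lengths, i.e. the number of stored cost values.
    n = 1
    for s in shape:
        n *= s
    return n
```

With a 384 × 1248 image, a disparity range of 192 and hypothetical channel counts of 8 and 16, the 1/3-scale 3D volume has shape (64, 128, 416) and 3,407,872 entries; a 4D volume with 8 channels at that same scale would hold eight times as many.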
5. The method of claim 4, wherein the residual heterogeneous aggregation manner is expressed as:
[Formula image not reproduced]
wherein $I$ denotes the identity transformation; $T$ denotes the forward residual cost volume transformation, and a further symbol (not reproduced) denotes the reverse residual cost volume transformation; $S_q$ denotes a squeeze operation that converts a 4D cost volume into a 3D cost volume; $US_q$ denotes an unsqueeze operation that converts a 3D cost volume into a 4D cost volume; the remaining symbols (not reproduced) denote the 1/3-scale, 1/6-scale and 1/12-scale second cost volumes, the 1/3-scale residual 3D cost volume, the 1/6-scale residual 4D cost volume, and the 1/12-scale conventional 4D cost volume.
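Claim 5 defines $S_q$ (4D to 3D) and $US_q$ (3D to 4D) only by their input and output forms. The sketch below is one hypothetical realization: $S_q$ averages over the feature-channel axis and $US_q$ inserts a singleton channel axis. A trained network would more likely use a learned projection (for example a convolution to a single channel) before squeezing; the averaging here is only a stand-in. The names `squeeze` and `unsqueeze` and the C × D × H × W layout are assumptions.

```python
from typing import List

Vol3D = List[List[List[float]]]        # D x H x W
Vol4D = List[List[List[List[float]]]]  # C x D x H x W


def squeeze(vol: Vol4D) -> Vol3D:
    """Hypothetical S_q: collapse the channel axis by averaging over it."""
    c = len(vol)
    depth, height, width = len(vol[0]), len(vol[0][0]), len(vol[0][0][0])
    return [[[sum(vol[k][d][y][x] for k in range(c)) / c for x in range(width)]
             for y in range(height)]
            for d in range(depth)]


def unsqueeze(vol: Vol3D) -> Vol4D:
    """Hypothetical US_q: add a singleton channel axis (C = 1)."""
    return [vol]
```

In this sketch, applying `unsqueeze` and then `squeeze` returns the original 3D volume, whereas `squeeze` followed by `unsqueeze` discards per-channel information. That asymmetry is one reason a real network would learn these conversions rather than hard-code them.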
6. The method of claim 3, wherein intra-scale cost aggregation is performed on each of the first residual cost volumes by two convolutional layers.
7. The method of claim 5, wherein performing disparity regression on each of the second cost volumes to obtain a plurality of first disparity maps of different sizes comprises:
performing disparity regression on the 1/3-scale second cost volume to obtain a first disparity map whose size is 1/3 that of the left view and the right view;
performing disparity regression on the 1/6-scale second cost volume to obtain a first disparity map whose size is 1/6 that of the left view and the right view;
and performing disparity regression on the 1/12-scale second cost volume to obtain a first disparity map whose size is 1/12 that of the left view and the right view.
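Claim 7 does not name a regression rule. A widely used choice in learned stereo matching is the differentiable soft-argmin: each pixel's disparity is the expectation of d under a softmax over negated costs. The sketch below applies it along the disparity axis of a D × H × W volume, assuming that a lower cost means a better match (for similarity scores the sign would be flipped). The names `soft_argmin` and `regress_disparity` are hypothetical, and using soft-argmin here is an assumption rather than the application's stated method.

```python
import math
from typing import List

Volume = List[List[List[float]]]  # disparity x height x width


def soft_argmin(costs: List[float]) -> float:
    """Expected disparity sum_d d * softmax(-cost)_d for one pixel."""
    m = min(costs)  # shift by the minimum so every exponent is <= 0 (numerically stable)
    weights = [math.exp(-(c - m)) for c in costs]
    total = sum(weights)
    return sum(d * w for d, w in enumerate(weights)) / total


def regress_disparity(volume: Volume) -> List[List[float]]:
    """Apply soft_argmin along the disparity axis to get an H x W disparity map."""
    depth, height, width = len(volume), len(volume[0]), len(volume[0][0])
    return [[soft_argmin([volume[d][y][x] for d in range(depth)]) for x in range(width)]
            for y in range(height)]
```

Running the same function on each of the three second cost volumes yields the three first disparity maps, at 1/3, 1/6 and 1/12 size respectively. Because the output is an expectation, it can take sub-pixel values.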
8. The method of claim 7, wherein performing error correction based on all the first disparity maps to obtain a second disparity map of the same size as the left view and the right view comprises the following steps, where all sizes are relative to the left view and the right view:
upsampling the first disparity map of 1/12 size to obtain a first upsampled disparity map of 1/6 size;
adding the first upsampled disparity map to the first disparity map of the same (1/6) size to obtain a first error-corrected map of 1/6 size;
upsampling the first error-corrected map to obtain a second upsampled error map of 1/3 size;
adding the second upsampled error map to the first disparity map of the same (1/3) size to obtain a second error-corrected map of 1/3 size;
and upsampling the second error-corrected map to obtain a second disparity map of the same size as the left view and the right view.
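The coarse-to-fine correction of claim 8 amounts to "upsample, then add" twice, followed by a final upsampling: 1/12 → 1/6 (factor 2), 1/6 → 1/3 (factor 2), 1/3 → full size (factor 3). The sketch below assumes nearest-neighbour upsampling and, because disparities are measured in pixels, scales disparity values by the upsampling factor; the claim states neither choice, so `scale_values` is exposed as a hypothetical switch. The names `upsample`, `add_maps` and `refine` are illustrative only.

```python
from typing import List

Map = List[List[float]]  # height x width disparity map


def upsample(disp: Map, factor: int, scale_values: bool = True) -> Map:
    """Nearest-neighbour upsampling by an integer factor. If scale_values is True
    (an assumption), disparities are multiplied by the factor as well."""
    gain = float(factor) if scale_values else 1.0
    out: Map = []
    for row in disp:
        wide = [v * gain for v in row for _ in range(factor)]
        for _ in range(factor):
            out.append(list(wide))  # repeat each widened row `factor` times
    return out


def add_maps(a: Map, b: Map) -> Map:
    # Element-wise sum of two equally sized maps.
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def refine(d12: Map, d6: Map, d3: Map) -> Map:
    e6 = add_maps(upsample(d12, 2), d6)  # first error-corrected map, 1/6 size
    e3 = add_maps(upsample(e6, 2), d3)   # second error-corrected map, 1/3 size
    return upsample(e3, 3)               # second disparity map, full size
```

With a 1 × 1 map at 1/12 scale, `refine` returns a 12 × 12 full-size map, and each finer-scale map contributes only an additive correction to the upsampled coarse estimate.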
9. A stereo matching network based on residual cost volumes, comprising: a feature extraction module, a cost volume construction module, a cost aggregation module, a disparity regression module and a disparity optimization module;
the feature extraction module is configured to extract features from the acquired left view and right view to obtain a feature pyramid;
the cost volume construction module is configured to construct a plurality of first residual cost volumes of different scales according to the feature pyramid, wherein the first residual cost volumes differ in dimensionality;
the cost aggregation module is configured to perform cost aggregation on all the first residual cost volumes in a residual heterogeneous aggregation manner to obtain a plurality of second cost volumes of different scales, wherein the second cost volumes correspond one-to-one to the first residual cost volumes, and each second cost volume has the same scale and dimensionality as its corresponding first residual cost volume;
the disparity regression module is configured to perform disparity regression on each second cost volume to obtain a plurality of first disparity maps of different sizes;
the disparity optimization module is configured to perform error correction based on all the first disparity maps to obtain a second disparity map of the same size as the left view and the right view.
10. The stereo matching network of claim 9, wherein the cost aggregation module comprises an intra-scale cost aggregation sub-module and a cross-form cost aggregation sub-module; the intra-scale cost aggregation sub-module is configured to perform intra-scale cost aggregation on each first residual cost volume to obtain an intra-scale aggregated cost volume; and the cross-form cost aggregation sub-module is configured to perform inter-cost-volume information fusion on all the intra-scale aggregated cost volumes to obtain the plurality of second cost volumes of different scales.
CN202310553305.4A 2023-05-16 2023-05-16 Three-dimensional matching method and network based on residual cost volume Pending CN116681655A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310553305.4A CN116681655A (en) 2023-05-16 2023-05-16 Three-dimensional matching method and network based on residual cost volume

Publications (1)

Publication Number Publication Date
CN116681655A true CN116681655A (en) 2023-09-01

Family

ID=87780025

Country Status (1)

Country Link
CN (1) CN116681655A (en)

Similar Documents

Publication Publication Date Title
CN109508681B (en) Method and device for generating human body key point detection model
US9418458B2 (en) Graph image representation from convolutional neural networks
US20200380695A1 (en) Methods, systems, and media for segmenting images
CN111145170A (en) Medical image segmentation method based on deep learning
WO2022257578A1 (en) Method for recognizing text, and apparatus
US20160196479A1 (en) Image similarity as a function of weighted descriptor similarities derived from neural networks
JP2021522565A (en) Neural hardware accelerator for parallel distributed tensor calculations
CN106683048A (en) Image super-resolution method and image super-resolution equipment
CN112990077B (en) Face action unit identification method and device based on joint learning and optical flow estimation
CN110569851B (en) Real-time semantic segmentation method for gated multi-layer fusion
CN117597703B (en) Multi-scale converter for image analysis
CN116740162B (en) Stereo matching method based on multi-scale cost volume and computer storage medium
CN113674334A (en) Texture recognition method based on depth self-attention network and local feature coding
CN104484886A (en) Segmentation method and device for MR image
CN111833261A (en) Image super-resolution restoration method for generating countermeasure network based on attention
CN112509021A (en) Parallax optimization method based on attention mechanism
CN113642445A (en) Hyperspectral image classification method based on full convolution neural network
KR102329546B1 (en) System and method for medical diagnosis using neural network and non-local block
CN113592015B (en) Method and device for positioning and training feature matching network
CN114565628A (en) Image segmentation method and system based on boundary perception attention
CN112927236B (en) Clothing analysis method and system based on channel attention and self-supervision constraint
CN114170519A (en) High-resolution remote sensing road extraction method based on deep learning and multidimensional attention
CN110599495A (en) Image segmentation method based on semantic information mining
CN116681655A (en) Three-dimensional matching method and network based on residual cost volume
CN116486155A (en) Target detection method based on transducer and cascade characteristics

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination