CN116681655A - Stereo matching method and network based on residual cost volume

Info

Publication number: CN116681655A
Application number: CN202310553305.4A
Authority: CN
Inventors: 夏富坤; 钱刃; 刘洋; 丘文峰; 杨文帮; 李建华; 赵建川
Original assignee: Peking University Shenzhen Graduate School
Current assignee: Peking University Shenzhen Graduate School
Priority date: 2023-05-16
Filing date: 2023-05-16
Publication date: 2023-09-01

Abstract

A stereo matching method and network based on residual cost volumes. A plurality of first residual cost volumes with different scales are constructed from an extracted feature pyramid, the dimensions of each first residual cost volume being different. The residual cost volumes are fused in a residual heterogeneous aggregation mode, which efficiently aggregates heterogeneous cost representations and realizes information interaction between cost representations of different forms. This addresses the information redundancy of multi-scale cost volume networks and allows a binocular stereo matching network to strike a better balance between accuracy and inference speed. Error correction is performed according to a plurality of first disparity maps with different scales, which can effectively improve the quality of stereo matching.

Description

Stereo matching method and network based on residual cost volume

Technical Field

The application relates to the technical field of stereo matching, and in particular to a stereo matching method and network based on residual cost volumes.

Background

As the most popular depth estimation scheme, binocular stereo matching is indispensable in many practical applications, including autonomous driving, robot navigation, three-dimensional reconstruction and augmented reality. The performance of the binocular stereo matching algorithm directly affects the final performance of these technologies, so it is important to optimize the stereo matching algorithm so that target depth can be estimated accurately in less time.

With the rise of deep learning and neural networks, great progress has been made on many computer vision problems that were difficult to solve with traditional methods. A classical stereo matching network is built on an end-to-end convolutional neural network framework and is divided into four sequential modules: feature extraction, cost volume construction, cost aggregation and disparity regression. Cost volume construction is the step that most clearly distinguishes a stereo matching network from other computer vision tasks, and it has a decisive effect on the speed and accuracy of the network. The conventional 4D cost volume, however, contains considerable redundant information and is slow, so subsequent processing based on it has difficulty balancing accuracy and speed.

Disclosure of Invention

In the application, a plurality of first residual cost volumes with different scales and different dimensions are constructed according to the extracted feature pyramid, and cost aggregation is performed on the first residual cost volumes in a residual heterogeneous aggregation mode. Heterogeneous cost representations can thus be aggregated efficiently and information interaction between cost representations of different forms is realized, which addresses the information redundancy of multi-scale cost volume networks and allows the binocular stereo matching network to better balance accuracy and inference speed.

In a first aspect, an embodiment provides a stereo matching method based on residual cost volumes, comprising: acquiring a left view and a right view to be matched; extracting features from the left view and the right view to obtain a feature pyramid; constructing a plurality of first residual cost volumes with different scales according to the feature pyramid, the dimensions of each first residual cost volume being different; performing cost aggregation on all the first residual cost volumes in a residual heterogeneous aggregation mode to obtain a plurality of second cost volumes with different scales, wherein the second cost volumes are in one-to-one correspondence with the first residual cost volumes, and each first residual cost volume has the same scale and dimensions as its corresponding second cost volume; performing disparity regression on each second cost volume to obtain a plurality of first disparity maps with different sizes; and performing error correction according to all the first disparity maps to obtain a second disparity map with the same size as the left view and the right view.

In some embodiments, the residual heterogeneous aggregation mode includes: performing intra-scale cost aggregation on each first residual cost volume to obtain intra-scale aggregated cost volumes; and performing information fusion between cost volumes on all the intra-scale aggregated cost volumes to obtain a plurality of second cost volumes with different scales.

In some embodiments, performing information fusion between cost volumes on all the intra-scale aggregated cost volumes comprises cross-scale cost aggregation operations at a plurality of scales, wherein the cross-scale cost aggregation operation for each scale comprises: resampling the intra-scale aggregated cost volumes with different scales to the same scale; and performing information fusion between cost volumes on the intra-scale aggregated cost volumes of the same scale to obtain the second cost volume of one scale.

In some embodiments, the first residual cost volumes comprise: a 1/3-scale residual 3D cost volume, a 1/6-scale residual 4D cost volume and a 1/12-scale conventional 4D cost volume.

In some embodiments, the residual heterogeneous aggregation mode is represented by a formula [formula not reproduced], wherein: I is the identity transformation; T is the forward residual cost volume transformation, and its counterpart is the reverse residual cost volume transformation; S_q denotes a squeeze operation for converting a 4D cost volume into a 3D cost volume; US_q denotes an unsqueeze operation for converting a 3D cost volume into a 4D cost volume; and the remaining terms of the formula denote the second cost volume at 1/3 scale, the second cost volume at 1/6 scale, the second cost volume at 1/12 scale, the 1/3-scale residual 3D cost volume, the 1/6-scale residual 4D cost volume and the 1/12-scale conventional 4D cost volume.

In some embodiments, intra-scale cost aggregation is performed on each first residual cost volume by a two-layer convolution.

In some embodiments, performing disparity regression on each second cost volume to obtain a plurality of first disparity maps with different sizes includes: performing disparity regression on the second cost volume at 1/3 scale to obtain a first disparity map whose size is 1/3 of the left view and the right view; performing disparity regression on the second cost volume at 1/6 scale to obtain a first disparity map whose size is 1/6 of the left view and the right view; and performing disparity regression on the second cost volume at 1/12 scale to obtain a first disparity map whose size is 1/12 of the left view and the right view.

In some embodiments, performing error correction according to all the first disparity maps to obtain a second disparity map with the same size as the left view and the right view includes: upsampling the first disparity map whose size is 1/12 of the left view and the right view to obtain a first upsampled disparity map whose size is 1/6 of the left view and the right view; adding the first upsampled disparity map and the first disparity map of the same size to obtain a first error correction map of size 1/6; upsampling the first error correction map to obtain a second upsampled error map whose size is 1/3 of the left view and the right view; adding the second upsampled error map and the first disparity map of the same size to obtain a second error correction map of size 1/3; and upsampling the second error correction map to obtain a second disparity map with the same size as the left view and the right view.

In a second aspect, the present application provides a stereo matching network based on residual cost volumes, comprising: a feature extraction module, a cost volume construction module, a cost aggregation module, a disparity regression module and a disparity optimization module; the feature extraction module is used for extracting features from the acquired left view and right view to obtain a feature pyramid; the cost volume construction module is used for constructing a plurality of first residual cost volumes with different scales according to the feature pyramid, the dimensions of each first residual cost volume being different; the cost aggregation module is used for performing cost aggregation on all the first residual cost volumes in a residual heterogeneous aggregation mode to obtain a plurality of second cost volumes with different scales, wherein the second cost volumes are in one-to-one correspondence with the first residual cost volumes, and each first residual cost volume has the same scale and dimensions as its corresponding second cost volume; the disparity regression module is used for performing disparity regression on each second cost volume to obtain a plurality of first disparity maps with different sizes; and the disparity optimization module is used for performing error correction according to all the first disparity maps to obtain a second disparity map with the same size as the left view and the right view.

In some embodiments, the cost aggregation module includes an intra-scale cost aggregation sub-module and a cross-form cost aggregation sub-module; the intra-scale cost aggregation sub-module is used for performing intra-scale cost aggregation on each first residual cost volume to obtain intra-scale aggregated cost volumes; the cross-form cost aggregation sub-module is used for performing information fusion between cost volumes on all the intra-scale aggregated cost volumes to obtain a plurality of second cost volumes with different scales.

According to the above embodiments, a plurality of residual cost volumes with different scales and different dimensions are constructed according to the extracted feature pyramid and fused in a residual heterogeneous aggregation mode. Heterogeneous cost representations can thus be aggregated efficiently and information interaction between cost representations of different forms is realized, which addresses the information redundancy of multi-scale cost volume networks and allows the binocular stereo matching network to better balance accuracy and inference speed. Error correction is performed according to a plurality of first disparity maps with different scales, which can effectively improve the quality of stereo matching.

Drawings

FIG. 1 is a flow chart of the stereo matching method based on residual cost volumes provided by the application;

FIG. 2 is a flow chart of the residual heterogeneous aggregation mode according to an embodiment;

FIG. 3 is a flow chart of the cross-scale cost aggregation operation for each scale according to an embodiment;

FIG. 4 is a flow chart of performing disparity regression on each second cost volume according to an embodiment;

FIG. 5 is a flow chart of performing error correction according to all the first disparity maps according to an embodiment;

FIG. 6 is a schematic structural diagram of the stereo matching network based on residual cost volumes provided by the application;

FIG. 7 is a schematic structural diagram of the cost aggregation module according to an embodiment.

Detailed Description

The application will be described in further detail below with reference to the drawings by means of specific embodiments, in which like elements in different embodiments are given like reference numbers. In the following embodiments, numerous specific details are set forth in order to provide a better understanding of the application. However, one skilled in the art will readily recognize that some of the features may be omitted, or replaced by other elements, materials or methods, in different situations. In some instances, operations related to the application are not shown or described in the specification, in order to avoid obscuring its core portions; a detailed description of these operations is unnecessary for persons skilled in the art, who can fully understand them from the description and from general knowledge in the field.

Furthermore, the described features, operations, or characteristics of the description may be combined in any suitable manner in various embodiments. Also, various steps or acts in the method descriptions may be interchanged or modified in a manner apparent to those of ordinary skill in the art. Thus, the various orders in the description and drawings are for clarity of description of only certain embodiments, and are not meant to be required orders unless otherwise indicated.

The numbering of the components itself, e.g. "first", "second", etc., is used herein merely to distinguish between the described objects and does not have any sequential or technical meaning. The term "coupled" as used herein includes both direct and indirect coupling, unless otherwise indicated.

In the stereo matching process, cost volume construction is the step that most clearly distinguishes a stereo matching network from other computer vision tasks, and it plays a decisive role in the speed and accuracy of the network. The conventional 4D cost volume, however, contains considerable redundant information and is slow, so subsequent cost aggregation and other processing based on it has difficulty balancing accuracy and speed.

In the embodiments of the application, a plurality of cost volumes with different scales and different dimensions are constructed according to the extracted feature pyramid and fused in a residual heterogeneous aggregation mode. Heterogeneous cost representations can thus be aggregated efficiently and information interaction between cost representations of different forms is realized, which addresses the information redundancy of multi-scale cost volume networks and allows the binocular stereo matching network to better balance accuracy and inference speed.

Referring to FIG. 1, an embodiment of the application provides a stereo matching method based on residual cost volumes, including:

S10: Acquire a left view and a right view to be matched.

S20: Extract features from the left view and the right view to obtain a feature pyramid.

S30: Construct a plurality of first residual cost volumes with different scales according to the feature pyramid, the dimensions of each first residual cost volume being different.

In some embodiments, the first residual cost volumes comprise a 1/3-scale residual 3D cost volume, a 1/6-scale residual 4D cost volume and a 1/12-scale conventional 4D cost volume. This construction balances the number of parameters of the residual cost volumes against the information quality of the cost representation. Because of its higher resolution, the 1/3 scale accounts for a large share of the parameters of the overall cost volume, so choosing a 3D residual cost volume at this scale effectively reduces the computational cost. At the 1/6 and 1/12 scales, the final disparity estimation performance of the network is taken into account: a more accurate initial disparity and high-level semantic features need to be provided, so a 1/6-scale residual 4D cost volume and a 1/12-scale conventional 4D cost volume are constructed.
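
The difference between a 4D and a 3D cost volume can be illustrated with a minimal NumPy sketch. The text does not specify how the residual cost volumes themselves are built, so the sketch only shows two generic forms that are common in stereo matching: a concatenation volume (4D, with a channel axis) and a correlation volume (3D, without one). The function names, the feature layout (C, H, W) and the use of channel-averaged correlation are assumptions made for illustration only.

```python
import numpy as np

def concat_cost_volume(left, right, max_disp):
    """Generic 4D cost volume of shape (2C, D, H, W) from (C, H, W) features."""
    c, h, w = left.shape
    volume = np.zeros((2 * c, max_disp, h, w), dtype=left.dtype)
    for d in range(max_disp):
        # Pixel x in the left view is paired with pixel x - d in the right view.
        volume[:c, d, :, d:] = left[:, :, d:]
        volume[c:, d, :, d:] = right[:, :, : w - d]
    return volume

def correlation_cost_volume(left, right, max_disp):
    """Generic 3D cost volume of shape (D, H, W): channel-averaged correlation."""
    c, h, w = left.shape
    volume = np.zeros((max_disp, h, w), dtype=left.dtype)
    for d in range(max_disp):
        volume[d, :, d:] = (left[:, :, d:] * right[:, :, : w - d]).mean(axis=0)
    return volume
```

The 3D form drops the channel axis entirely, which is why it is much cheaper at the high-resolution 1/3 scale, while the 4D form keeps per-channel information for the coarser scales.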

S40: Perform cost aggregation on all the first residual cost volumes in a residual heterogeneous aggregation mode to obtain a plurality of second cost volumes with different scales, wherein the second cost volumes are in one-to-one correspondence with the first residual cost volumes, and each first residual cost volume has the same scale and dimensions as its corresponding second cost volume.

In some embodiments, as shown in FIG. 2, the residual heterogeneous aggregation mode includes:

S41: Perform intra-scale cost aggregation on each first residual cost volume to obtain an intra-scale aggregated cost volume.

S42: Perform cross-scale cost aggregation on all the intra-scale aggregated cost volumes to obtain a plurality of second cost volumes with different scales.
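
Step S41 can be sketched as follows for a single-channel 3D cost volume, using the two-layer convolution mentioned elsewhere in the text for intra-scale aggregation. The kernel size (odd, e.g. 3x3x3), the zero padding and the ReLU between the two layers are assumptions; the text only states that two convolution layers are used. `conv3d_same` and `intra_scale_aggregate` are hypothetical helper names.

```python
import numpy as np

def conv3d_same(volume, kernel):
    """Single-channel 3D cross-correlation with zero padding (odd kernel sizes)."""
    kd, kh, kw = kernel.shape
    pad = [(k // 2, k // 2) for k in kernel.shape]
    padded = np.pad(volume, pad)
    d, h, w = volume.shape
    out = np.zeros((d, h, w), dtype=float)
    for i in range(kd):
        for j in range(kh):
            for k in range(kw):
                # Accumulate one shifted copy of the input per kernel tap.
                out += kernel[i, j, k] * padded[i:i + d, j:j + h, k:k + w]
    return out

def intra_scale_aggregate(volume, kernel1, kernel2):
    """Two convolution layers applied within one scale; ReLU in between is assumed."""
    hidden = np.maximum(conv3d_same(volume, kernel1), 0.0)
    return conv3d_same(hidden, kernel2)
```

The output keeps the shape of the input, consistent with the requirement that scale and dimensions are preserved within a scale.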

In some embodiments, performing information fusion between cost volumes on all the intra-scale aggregated cost volumes comprises cross-scale cost aggregation operations at a plurality of scales, wherein:

the cross-scale cost aggregation operation for each scale, as shown in FIG. 3, includes:

S420: Resample the intra-scale aggregated cost volumes with different scales to the same scale;

S421: Perform information fusion between cost volumes on the intra-scale aggregated cost volumes of the same scale to obtain the second cost volume of one scale.

In some embodiments, performing information fusion between cost volumes on all the intra-scale aggregated cost volumes comprises cross-scale cost aggregation operations at N scales, and the cross-scale cost aggregation operation at each scale aggregates the intra-scale cost aggregation results of the N scales, where N is 3 and the 3 scales correspond to the three scales (1/3, 1/6 and 1/12) of the first residual cost volumes.
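
Steps S420 and S421 with N = 3 can be sketched as below. For simplicity the sketch assumes that all three intra-scale aggregated volumes have already been brought to the same 3D form (D, H, W); nearest-neighbour resampling and fusion by summation are assumptions, since the text does not fix the interpolation or the fusion operator. `resize_nearest` and `cross_scale_aggregate` are hypothetical names.

```python
import numpy as np

def resize_nearest(volume, shape):
    """Nearest-neighbour resampling of a (D, H, W) volume to the given shape."""
    idx = [np.arange(n) * s // n for n, s in zip(shape, volume.shape)]
    return volume[np.ix_(*idx)]

def cross_scale_aggregate(volumes):
    """For each target scale: resample every volume to it (S420), then fuse (S421)."""
    fused = []
    for target in volumes:
        resampled = [resize_nearest(v, target.shape) for v in volumes]
        fused.append(np.sum(resampled, axis=0))
    return fused
```

Each output has the shape of the corresponding input, which matches the one-to-one correspondence between the second cost volumes and the first residual cost volumes.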

In some embodiments, the residual heterogeneous aggregation mode is represented by a formula [formula not reproduced], wherein: I is the identity transformation; T is the forward residual cost volume transformation, and its counterpart is the reverse residual cost volume transformation; S_q denotes a squeeze operation for converting a 4D cost volume into a 3D cost volume, i.e. a dimension reduction, in which a 3D convolution layer first adjusts the number of channels to 1 and the squeeze operation then removes the channel dimension; US_q denotes an unsqueeze operation for converting a 3D cost volume into a 4D cost volume, i.e. a dimension increase, in which the unsqueeze operation first adds a channel dimension to the 3D cost volume and a convolution layer then increases the number of channels to match the corresponding 4D cost volume; and the remaining terms of the formula denote the second cost volume at 1/3 scale, the second cost volume at 1/6 scale, the second cost volume at 1/12 scale, the 1/3-scale residual 3D cost volume, the 1/6-scale residual 4D cost volume and the 1/12-scale conventional 4D cost volume.
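
The S_q and US_q operations described above can be sketched in NumPy. The sketch assumes a 1x1x1 convolution without bias, which reduces to a weighted sum over the channel axis; the kernel size, the weight shapes and the function names are illustrative assumptions, not details given in the text.

```python
import numpy as np

def squeeze_4d_to_3d(volume4d, weights):
    """S_q sketch: 1x1x1 convolution to one channel, then drop the channel axis.

    volume4d has shape (C, D, H, W); weights has shape (C,).
    """
    one_channel = np.einsum("c,cdhw->dhw", weights, volume4d)[None]  # (1, D, H, W)
    return np.squeeze(one_channel, axis=0)                           # (D, H, W)

def unsqueeze_3d_to_4d(volume3d, weights):
    """US_q sketch: add a channel axis, then 1x1x1 convolution up to C channels.

    volume3d has shape (D, H, W); weights has shape (C,).
    """
    with_channel = np.expand_dims(volume3d, axis=0)                  # (1, D, H, W)
    return np.einsum("c,kdhw->cdhw", weights, with_channel)          # (C, D, H, W)
```

With these two conversions, a 3D volume at one scale and a 4D volume at another can be brought to a common form before they are fused.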

Performing cost aggregation on all the constructed first residual cost volumes in the residual heterogeneous aggregation mode efficiently aggregates heterogeneous cost representations and realizes information interaction between cost representations of different forms, which addresses the information redundancy of multi-scale cost volume networks and allows the binocular stereo matching network to better balance accuracy and inference speed.

S50: Perform disparity regression on each second cost volume to obtain a plurality of first disparity maps with different sizes.

In some embodiments, performing disparity regression on each second cost volume to obtain a plurality of first disparity maps with different sizes, as shown in FIG. 4, includes:

S51: Perform disparity regression on the second cost volume at 1/3 scale to obtain a first disparity map whose size is 1/3 of the left view and the right view.

S52: Perform disparity regression on the second cost volume at 1/6 scale to obtain a first disparity map whose size is 1/6 of the left view and the right view.

S53: Perform disparity regression on the second cost volume at 1/12 scale to obtain a first disparity map whose size is 1/12 of the left view and the right view.
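
The text does not say which regression operator is used in steps S51 to S53. A common choice in stereo matching networks is soft-argmin, shown here purely as an assumed example on a 3D cost volume of shape (D, H, W) in which a lower value means a better match; `soft_argmin_disparity` is a hypothetical name.

```python
import numpy as np

def soft_argmin_disparity(cost_volume):
    """Expected disparity per pixel from a (D, H, W) cost volume (lower = better)."""
    logits = -cost_volume
    logits = logits - logits.max(axis=0, keepdims=True)  # numerical stability
    prob = np.exp(logits)
    prob /= prob.sum(axis=0, keepdims=True)              # softmax over disparity
    disparities = np.arange(cost_volume.shape[0]).reshape(-1, 1, 1)
    return (prob * disparities).sum(axis=0)              # (H, W)
```

Applied to the three second cost volumes, the same function would yield disparity maps at 1/3, 1/6 and 1/12 of the input resolution, because the spatial size of each output follows that of its cost volume.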

S60: Perform error correction according to all the first disparity maps to obtain a second disparity map with the same size as the left view and the right view.

In some embodiments, performing error correction according to all the first disparity maps to obtain a second disparity map with the same size as the left view and the right view, as shown in FIG. 5, includes:

S61: Upsample the first disparity map whose size is 1/12 of the left view and the right view to obtain a first upsampled disparity map whose size is 1/6 of the left view and the right view.

S62: Add the first upsampled disparity map and the first disparity map of the same size to obtain a first error correction map of size 1/6.

S63: Upsample the first error correction map to obtain a second upsampled error map whose size is 1/3 of the left view and the right view.

S64: Add the second upsampled error map and the first disparity map of the same size to obtain a second error correction map of size 1/3.

S65: Upsample the second error correction map to obtain a second disparity map with the same size as the left view and the right view.

In some embodiments, the addition in steps S62 and S64 adds the values of corresponding pixels in the two maps. This corrects the errors of the first disparity maps and yields a second disparity map with the same size as the left view and the right view. Performing error correction according to first disparity maps with different scales can effectively improve the quality of stereo matching.
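
Steps S61 to S65 can be sketched as a short coarse-to-fine routine. Nearest-neighbour upsampling and leaving the disparity values unscaled during upsampling are assumptions; the text only specifies upsampling followed by pixel-wise addition. `upsample_nearest` and `error_correct` are hypothetical names.

```python
import numpy as np

def upsample_nearest(disp, factor):
    """Repeat every pixel of an (H, W) map `factor` times along both axes."""
    return np.repeat(np.repeat(disp, factor, axis=0), factor, axis=1)

def error_correct(d12, d6, d3):
    """Combine first disparity maps at 1/12, 1/6 and 1/3 scale into a full-size map."""
    up6 = upsample_nearest(d12, 2)      # S61: 1/12 -> 1/6
    corr6 = up6 + d6                    # S62: pixel-wise addition at 1/6
    up3 = upsample_nearest(corr6, 2)    # S63: 1/6 -> 1/3
    corr3 = up3 + d3                    # S64: pixel-wise addition at 1/3
    return upsample_nearest(corr3, 3)   # S65: 1/3 -> full size
```

For a 12x12 input pair, for example, the three inputs would have shapes (1, 1), (2, 2) and (4, 4), and the result has shape (12, 12).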

In another embodiment of the present application, a stereo matching network based on residual cost volumes is provided, as shown in FIG. 6, including: a feature extraction module 10, a cost volume construction module 20, a cost aggregation module 30, a disparity regression module 40 and a disparity optimization module 50. The feature extraction module 10 is used for extracting features from the acquired left view and right view to obtain a feature pyramid. The cost volume construction module 20 is used for constructing a plurality of first residual cost volumes with different scales according to the feature pyramid, the dimensions of each first residual cost volume being different. The cost aggregation module 30 is used for performing cost aggregation on all the first residual cost volumes in a residual heterogeneous aggregation mode to obtain a plurality of second cost volumes with different scales, where the second cost volumes are in one-to-one correspondence with the first residual cost volumes, and each first residual cost volume has the same scale and dimensions as its corresponding second cost volume. The disparity regression module 40 is used for performing disparity regression on each second cost volume to obtain a plurality of first disparity maps with different sizes. The disparity optimization module 50 is used for performing error correction according to all the first disparity maps to obtain a second disparity map with the same size as the left view and the right view.

In some embodiments, the residual heterogeneous aggregation mode adopted by the cost aggregation module 30 is represented by a formula [formula not reproduced], wherein: I is the identity transformation; T is the forward residual cost volume transformation, and its counterpart is the reverse residual cost volume transformation; S_q denotes a squeeze operation for converting a 4D cost volume into a 3D cost volume; US_q denotes an unsqueeze operation for converting a 3D cost volume into a 4D cost volume; and the remaining terms of the formula denote the second cost volume at 1/3 scale, the second cost volume at 1/6 scale, the second cost volume at 1/12 scale, the 1/3-scale residual 3D cost volume, the 1/6-scale residual 4D cost volume and the 1/12-scale conventional 4D cost volume.

The cost aggregation module 30 performs cost aggregation on all the constructed first residual cost volumes, so that heterogeneous cost representations can be aggregated efficiently and information interaction between cost representations of different forms is realized, which addresses the information redundancy of multi-scale cost volume networks and allows the binocular stereo matching network to better balance accuracy and inference speed.

In some embodiments, as shown in FIG. 7, the cost aggregation module 30 includes an intra-scale cost aggregation sub-module 31 and a cross-form cost aggregation sub-module 32; the intra-scale cost aggregation sub-module 31 is used for performing intra-scale cost aggregation on each first residual cost volume to obtain intra-scale aggregated cost volumes; the cross-form cost aggregation sub-module 32 is used for performing information fusion between cost volumes on all the intra-scale aggregated cost volumes to obtain a plurality of second cost volumes with different scales.

The implementation manner of the stereo matching network provided in this embodiment is the same as that of the foregoing method, and will not be repeated here.

The foregoing description of the application has been presented for purposes of illustration and description, and is not intended to be limiting. Several simple deductions, modifications or substitutions may also be made by a person skilled in the art to which the application pertains, based on the idea of the application.

Claims

1. A stereo matching method based on residual cost volumes, characterized by comprising:

acquiring a left view and a right view to be matched;

extracting features from the left view and the right view to obtain a feature pyramid;

constructing a plurality of first residual cost volumes with different scales according to the feature pyramid, the dimensions of each first residual cost volume being different;

performing cost aggregation on all the first residual cost volumes in a residual heterogeneous aggregation mode to obtain a plurality of second cost volumes with different scales, wherein the second cost volumes are in one-to-one correspondence with the first residual cost volumes, and each first residual cost volume has the same scale and dimensions as its corresponding second cost volume;

performing disparity regression on each second cost volume to obtain a plurality of first disparity maps with different sizes;

and performing error correction according to all the first disparity maps to obtain a second disparity map with the same size as the left view and the right view.

2. The method of claim 1, wherein the residual heterogeneous aggregation mode comprises:

performing intra-scale cost aggregation on each first residual cost volume to obtain intra-scale aggregated cost volumes;

and performing information fusion between cost volumes on all the intra-scale aggregated cost volumes to obtain a plurality of second cost volumes with different scales.

3. The method of claim 2, wherein performing information fusion between cost volumes on all the intra-scale aggregated cost volumes comprises cross-scale cost aggregation operations at a plurality of scales, wherein:

the cross-scale cost aggregation operation for each scale comprises:

resampling the intra-scale aggregated cost volumes with different scales to the same scale;

and performing information fusion between cost volumes on the intra-scale aggregated cost volumes of the same scale to obtain the second cost volume of one scale.

4. The method of claim 1, wherein the first residual cost volumes comprise: a 1/3-scale residual 3D cost volume, a 1/6-scale residual 4D cost volume and a 1/12-scale conventional 4D cost volume.

5. The method of claim 4, wherein the residual heterogeneous aggregation mode is represented by a formula [formula not reproduced], wherein: I is the identity transformation; T is the forward residual cost volume transformation, and its counterpart is the reverse residual cost volume transformation; S_q denotes a squeeze operation for converting a 4D cost volume into a 3D cost volume; US_q denotes an unsqueeze operation for converting a 3D cost volume into a 4D cost volume; and the remaining terms of the formula denote the second cost volume at 1/3 scale, the second cost volume at 1/6 scale, the second cost volume at 1/12 scale, the 1/3-scale residual 3D cost volume, the 1/6-scale residual 4D cost volume and the 1/12-scale conventional 4D cost volume.

6. The method of claim 3, wherein intra-scale cost aggregation is performed on each of the first residual cost volumes by a two-layer convolution.

7. The method of claim 5, wherein performing disparity regression on each second cost volume to obtain a plurality of first disparity maps with different sizes comprises:

performing disparity regression on the second cost volume at 1/3 scale to obtain a first disparity map whose size is 1/3 of the left view and the right view;

performing disparity regression on the second cost volume at 1/6 scale to obtain a first disparity map whose size is 1/6 of the left view and the right view;

and performing disparity regression on the second cost volume at 1/12 scale to obtain a first disparity map whose size is 1/12 of the left view and the right view.

8. The method of claim 7, wherein performing error correction according to all the first disparity maps to obtain a second disparity map with the same size as the left view and the right view comprises:

upsampling the first disparity map whose size is 1/12 of the left view and the right view to obtain a first upsampled disparity map whose size is 1/6 of the left view and the right view;

adding the first upsampled disparity map and the first disparity map of the same size to obtain a first error correction map of size 1/6;

upsampling the first error correction map to obtain a second upsampled error map whose size is 1/3 of the left view and the right view;

adding the second upsampled error map and the first disparity map of the same size to obtain a second error correction map of size 1/3;

and upsampling the second error correction map to obtain a second disparity map with the same size as the left view and the right view.

9. A stereo matching network based on residual cost volumes, characterized by comprising: a feature extraction module, a cost volume construction module, a cost aggregation module, a disparity regression module and a disparity optimization module;

the feature extraction module is used for extracting features from the acquired left view and right view to obtain a feature pyramid;

the cost volume construction module is used for constructing a plurality of first residual cost volumes with different scales according to the feature pyramid, the dimensions of each first residual cost volume being different;

the cost aggregation module is used for performing cost aggregation on all the first residual cost volumes in a residual heterogeneous aggregation mode to obtain a plurality of second cost volumes with different scales, wherein the second cost volumes are in one-to-one correspondence with the first residual cost volumes, and each first residual cost volume has the same scale and dimensions as its corresponding second cost volume;

the disparity regression module is used for performing disparity regression on each second cost volume to obtain a plurality of first disparity maps with different sizes;

the disparity optimization module is used for performing error correction according to all the first disparity maps to obtain a second disparity map with the same size as the left view and the right view.

10. The stereo matching network of claim 9, wherein the cost aggregation module comprises an intra-scale cost aggregation sub-module and a cross-form cost aggregation sub-module; the intra-scale cost aggregation sub-module is used for performing intra-scale cost aggregation on each first residual cost volume to obtain intra-scale aggregated cost volumes; the cross-form cost aggregation sub-module is used for performing information fusion between cost volumes on all the intra-scale aggregated cost volumes to obtain a plurality of second cost volumes with different scales.