CN117576428A - Hierarchical parallel aggregation calculation method and device for stereo matching - Google Patents
- Publication number
- CN117576428A (application number CN202311350821.3A)
- Authority
- CN
- China
- Prior art keywords
- scale
- low
- aggregation
- hierarchical
- resolution
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/74—Image or video pattern matching; Proximity measures in feature spaces
- G06V10/75—Organisation of the matching processes, e.g. simultaneous or sequential comparisons of image or video features; Coarse-fine approaches, e.g. multi-scale approaches; using context analysis; Selection of dictionaries
- G06V10/757—Matching configurations of points or features
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/77—Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
- G06V10/80—Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
- G06V10/806—Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
Landscapes
- Engineering & Computer Science (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Theoretical Computer Science (AREA)
- Evolutionary Computation (AREA)
- Computing Systems (AREA)
- Databases & Information Systems (AREA)
- Artificial Intelligence (AREA)
- General Health & Medical Sciences (AREA)
- Medical Informatics (AREA)
- Software Systems (AREA)
- Health & Medical Sciences (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Multimedia (AREA)
- Image Processing (AREA)
Abstract
The application discloses a hierarchical parallel aggregation calculation method and device for stereo matching. First, features are extracted from two stereo-matching original images that were captured at the same moment and have undergone epipolar rectification, yielding low-resolution feature maps of at least two different scales. The low-resolution feature maps of different scales are then hierarchically aggregated to obtain cost volumes of at least one scale. Finally, the cost volumes of each scale are aggregated in parallel, and the feature map of preset size obtained by parallel aggregation is used to predict the disparity map. Context information from the multi-scale cost volumes is fused into one integrated cost volume, and multiple three-dimensional dilated convolutions then capture the global and local cues of that context information simultaneously, so that the disparity map can finally be obtained through disparity regression, greatly improving stereo matching performance.
Description
Technical Field
The invention relates to the technical field of machine vision stereo matching, in particular to a hierarchical parallel aggregation calculation method and device for stereo matching.
Background
Stereo matching, also known as disparity estimation or binocular depth estimation, has been widely studied as one of the core techniques of computer vision and is indispensable for many applications such as autonomous driving, robot navigation, and three-dimensional reconstruction. Accurate disparity estimation from rectified stereo images is essential for many computer vision tasks. The input to stereo matching is two images (a left image I_l and a right image I_r), and the output is a disparity map d composed of the disparity values of each pixel in a reference image (typically the left image). Referring to fig. 1, a schematic diagram of disparity map acquisition, disparity is the pixel-level difference between the positions, in the left and right images, of the points corresponding to a given point in the three-dimensional scene. Once the disparity map d has been acquired, a depth map can be obtained from the following depth formula:
z=(b×f)/d;
where f is the focal length of the camera lens, b is the distance between the centers of the two cameras (the baseline), d is the predicted disparity between the corresponding pixels in the left and right images, and z is the depth value corresponding to d. How to predict disparity accurately and quickly from a given pair of rectified stereo images under limited computing resources is the core problem in stereo matching computation.
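The depth formula above can be illustrated with a minimal sketch. The function name `disparity_to_depth`, the baseline of 0.5 and the focal length of 720 are hypothetical values chosen only for illustration; the sketch also assumes that b and f are expressed in units that make z meaningful (for example, b in metres and f in pixels).

```python
import numpy as np

def disparity_to_depth(disparity, baseline, focal_length, eps=1e-6):
    """Convert a disparity map (in pixels) to depth using z = (b * f) / d."""
    disparity = np.asarray(disparity, dtype=np.float64)
    depth = np.full(disparity.shape, np.inf)
    valid = disparity > eps  # zero disparity corresponds to a point at infinity
    depth[valid] = (baseline * focal_length) / disparity[valid]
    return depth

# Hypothetical 2x2 disparity map; b = 0.5 and f = 720 are illustrative only.
depth = disparity_to_depth([[64.0, 32.0], [0.0, 16.0]],
                           baseline=0.5, focal_length=720.0)
```

Because depth is inversely proportional to disparity, halving the disparity doubles the depth, and pixels with zero disparity are mapped to infinity rather than dividing by zero.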
Disclosure of Invention
The main technical problem addressed by the invention is how to construct a stereo matching calculation method capable of capturing context information representations.
According to a first aspect, in one embodiment, there is provided a hierarchical parallel aggregation computing method for stereo matching, including:
performing feature acquisition on two stereo-matching original images that are acquired at the same moment and have undergone epipolar rectification, so as to obtain low-resolution feature maps of at least two different scales;
performing hierarchical aggregation on the low-resolution feature maps of different scales so as to obtain cost volumes of at least one scale;
and performing parallel aggregation on the cost volumes of each scale, and using the feature map of preset size obtained by parallel aggregation to predict the disparity map.
In an embodiment, the feature acquisition on the two stereo-matching original images acquired at the same moment and subjected to epipolar rectification includes:
performing feature acquisition on each of the two original images through a weight-sharing twin (Siamese) feature extraction network, so as to obtain low-resolution feature maps of a first scale, a second scale, a third scale and a fourth scale, wherein the values of the first scale, the second scale, the third scale and the fourth scale decrease proportionally in sequence.
In an embodiment, the twin feature extraction network includes a convolution layer with a 3x3 convolution kernel, four residual blocks, and two dilated (atrous) convolution blocks.
In an embodiment, the feature acquisition on the two stereo-matching original images acquired at the same moment and subjected to epipolar rectification further includes:
regularizing each low-resolution feature map through two preset convolution layers.
In an embodiment, the regularizing of each low-resolution feature map through two preset convolution layers includes:
providing a batch normalization layer and a rectified linear unit (ReLU) activation layer after each convolution layer in the twin feature extraction network except the last convolution layer.
In one embodiment, the values of the first scale, the second scale, the third scale and the fourth scale are 1/4, 1/8, 1/16 and 1/32, respectively.
In an embodiment, the hierarchical aggregation of the low-resolution feature maps of different scales includes downsampling aggregation and/or upsampling aggregation;
the downsampling aggregation includes:
downsampling the low-resolution feature map of the high scale value to obtain a low-resolution feature map whose scale equals the low scale value;
performing a same-scale convolution operation on the new low-resolution feature map obtained by downsampling and the original low-resolution feature map of the same scale value, so as to obtain the cost volume corresponding to the feature map of the high scale value;
the upsampling aggregation includes:
upsampling the low-resolution feature map of the low scale value to obtain a low-resolution feature map whose scale equals the high scale value;
and performing a same-scale convolution operation on the new low-resolution feature map obtained by upsampling and the original low-resolution feature map of the same scale value, so as to obtain the cost volume corresponding to the feature map of the low scale value.
In an embodiment, the parallel aggregation of the cost volumes of each scale includes:
after performing a 3D convolution with a preset stride on each cost volume, reducing the feature size of the cost volume to 1/8 using another three-dimensional convolution, so as to obtain a cost volume to be dilated;
performing parallel dilated convolutions on each cost volume to be dilated, so as to output dilated feature maps equal in number and size to the cost volumes to be dilated;
concatenating the dilated feature maps to obtain a combined feature map that merges the feature maps of each cost volume;
and inputting the combined feature map into a three-dimensional convolution operation model to obtain the feature map of preset size output by the model, wherein the three-dimensional convolution operation model comprises two cascaded three-dimensional convolution layers, the latter of which is a deconvolution layer with a stride of 2.
According to a second aspect, an embodiment provides a computer readable storage medium having stored thereon a program executable by a processor to implement a hierarchical parallel aggregation computing method as described above.
According to a third aspect, there is provided in an embodiment a hierarchical parallel aggregation computing device for stereo matching for applying the hierarchical parallel aggregation computing method as described above, the hierarchical parallel aggregation computing device comprising:
the twin feature extraction neural network unit is used for performing feature acquisition on two stereo-matching original images that are acquired at the same moment and have undergone epipolar rectification, so as to obtain low-resolution feature maps of at least two different scales;
the hierarchical aggregation neural network unit is used for performing hierarchical aggregation on the low-resolution feature maps of different scales so as to obtain cost volumes of at least one scale;
and the parallel aggregation neural network unit is used for performing parallel aggregation on the cost volumes of each scale and using the feature map of preset size obtained by parallel aggregation to predict the disparity map.
According to the hierarchical parallel aggregation computing method of the embodiment, context information from the multi-scale cost volumes is fused into one integrated cost volume, and multiple three-dimensional dilated convolutions then capture the global and local cues of that context information simultaneously, so that the disparity map can finally be obtained through disparity regression, greatly improving stereo matching performance.
Drawings
FIG. 1 is a schematic diagram of disparity map acquisition;
FIG. 2 is a schematic workflow diagram of a stereo matching system in one embodiment;
FIG. 3 is a flow diagram of a hierarchical parallel aggregation computing method in one embodiment;
FIG. 4 is a block diagram of a hierarchical parallel aggregation computing device in one embodiment;
FIG. 5 is a flow chart of a hierarchical parallel aggregation computing method according to another embodiment.
Detailed Description
The invention is described in further detail below through specific embodiments with reference to the drawings. Like elements in different embodiments are given like associated reference numbers. In the following embodiments, numerous specific details are set forth to provide a better understanding of the present application. However, one skilled in the art will readily recognize that some of the features may be omitted, or replaced by other elements, materials, or methods, in different situations. In some instances, certain operations associated with the present application are not shown or described in the specification, in order to avoid obscuring its core portions; a detailed description of these operations is unnecessary, because those skilled in the art can fully understand them from the description herein and from general technical knowledge in the field.
Furthermore, the described features, operations, or characteristics of the description may be combined in any suitable manner in various embodiments. Also, various steps or acts in the method descriptions may be interchanged or modified in a manner apparent to those of ordinary skill in the art. Thus, the various orders in the description and drawings are for clarity of description of only certain embodiments, and are not meant to be required orders unless otherwise indicated.
The numbering of the components itself, e.g. "first", "second", etc., is used herein merely to distinguish between the described objects and does not have any sequential or technical meaning. The terms "coupled" and "connected," as used herein, are intended to encompass both direct and indirect coupling (coupling), unless otherwise indicated.
Please refer to fig. 2, a schematic diagram of the workflow of a stereo matching system in an embodiment. The left and right images captured by two image acquisition devices at the same moment are first acquired; the devices are then calibrated according to their relative positions and acquisition parameters; epipolar rectification is performed according to the calibration values; and finally a stereo matching algorithm is applied to compute the disparity map. At present, deep-learning-based stereo matching networks generally construct a cost volume at a single scale, regularize it, and regress the disparity. However, none of these methods exploits multi-scale context information, which limits disparity prediction performance in ill-posed regions.
In the embodiment of the application, context information from multi-scale cost volumes is fused into one integrated cost volume through hierarchical aggregation and parallel aggregation, multiple three-dimensional dilated convolutions capture the global and local cues of the context information simultaneously, and the disparity map is finally obtained by disparity regression, thereby greatly improving stereo matching performance.
Embodiment one:
Referring to fig. 3, a flow chart of a hierarchical parallel aggregation computing method according to an embodiment, the method includes:
Step 101, feature extraction.
Feature acquisition is performed on two stereo-matching original images that are acquired at the same moment and have undergone epipolar rectification, so as to obtain low-resolution feature maps of at least two different scales. In an embodiment, feature acquisition is performed on the two original images through a weight-sharing twin (Siamese) feature extraction network, so as to obtain low-resolution feature maps of a first scale, a second scale, a third scale and a fourth scale, where the values of the four scales decrease proportionally in sequence. In one embodiment, the values of the first scale, the second scale, the third scale, and the fourth scale are 1/4, 1/8, 1/16, and 1/32, respectively. In one embodiment, the twin feature extraction network includes a convolution layer with a 3x3 convolution kernel, four residual blocks, and two dilated (atrous) convolution blocks. In one embodiment, a batch normalization layer and a rectified linear unit (ReLU) activation layer are provided after each convolution layer in the twin feature extraction network except the last convolution layer. In one embodiment, the feature acquisition further includes regularizing each low-resolution feature map through two preset convolution layers.
Step 102, hierarchical aggregation.
The low-resolution feature maps of different scales are hierarchically aggregated to obtain cost volumes of at least one scale. In one embodiment, the hierarchical aggregation includes downsampling aggregation and/or upsampling aggregation.
The downsampling aggregation includes:
First, the low-resolution feature map of the high scale value is downsampled to obtain a low-resolution feature map whose scale equals the low scale value. Then, a same-scale convolution operation is performed on the new low-resolution feature map obtained by downsampling and the original low-resolution feature map of the same scale value, so as to obtain the cost volume corresponding to the feature map of the high scale value.
The upsampling aggregation includes:
First, the low-resolution feature map of the low scale value is upsampled to obtain a low-resolution feature map whose scale equals the high scale value. Then, a same-scale convolution operation is performed on the new low-resolution feature map obtained by upsampling and the original low-resolution feature map of the same scale value, so as to obtain the cost volume corresponding to the feature map of the low scale value.
Step 103, parallel aggregation.
The cost volumes of each scale are aggregated in parallel, and the feature map of preset size obtained by parallel aggregation is used to predict the disparity map. In one embodiment, aggregating the cost volumes of each scale in parallel includes:
First, after a 3D convolution with a preset stride is performed on each cost volume, another three-dimensional convolution is used to reduce the feature size of the cost volume to 1/8, so as to obtain a cost volume to be dilated.
Then, parallel dilated convolutions are performed on each cost volume to be dilated, so as to output dilated feature maps equal in number and size to the cost volumes to be dilated.
Next, the dilated feature maps are concatenated to obtain a combined feature map that merges the feature maps of each cost volume.
Finally, the combined feature map is input into a three-dimensional convolution operation model to obtain the feature map of preset size output by the model, wherein the three-dimensional convolution operation model comprises two cascaded three-dimensional convolution layers, the latter of which is a deconvolution layer with a stride of 2.
Referring to fig. 4, a block diagram of a hierarchical parallel aggregation computing device in an embodiment, the embodiment of the present application further discloses a hierarchical parallel aggregation computing device configured to apply the hierarchical parallel aggregation computing method described above. The device includes a twin feature extraction neural network unit 100, a hierarchical aggregation neural network unit 200, and a parallel aggregation neural network unit 300. The twin feature extraction neural network unit 100 is configured to perform feature extraction on two stereo-matching original images that are acquired at the same moment and have undergone epipolar rectification, so as to obtain low-resolution feature maps of at least two different scales. The hierarchical aggregation neural network unit 200 is configured to perform hierarchical aggregation on the low-resolution feature maps of different scales, so as to obtain cost volumes of at least one scale. The parallel aggregation neural network unit 300 is configured to perform parallel aggregation on the cost volumes of each scale and to use the feature map of preset size obtained by parallel aggregation to predict the disparity map.
According to the hierarchical parallel aggregation calculation method disclosed in the embodiment of the application, features are first extracted from two stereo-matching original images that are acquired at the same moment and have undergone epipolar rectification, yielding low-resolution feature maps of at least two different scales; the low-resolution feature maps of different scales are then hierarchically aggregated to obtain cost volumes of at least one scale; finally, the cost volumes of each scale are aggregated in parallel, and the feature map of preset size obtained by parallel aggregation is used to predict the disparity map. Context information from the multi-scale cost volumes is fused into one integrated cost volume, and multiple three-dimensional dilated convolutions then capture the global and local cues of that context information simultaneously, so that the disparity map can finally be obtained through disparity regression, greatly improving stereo matching performance.
The flow of the hierarchical parallel aggregation computing method disclosed in the present application is described below by way of a specific embodiment.
Referring to fig. 5, a flow chart of a hierarchical parallel aggregation computing method according to another embodiment specifically includes:
Feature extraction:
The method starts with a weight-sharing twin (Siamese) feature extraction network that takes a pair of images as input. The twin feature extraction network uses 3 convolutional layers with 3x3 kernels, 4 residual blocks, and 2 dilated (atrous) convolution blocks. Specifically, it includes two convolution layers with a stride of 2 to obtain the 1/4-scale feature map. In addition, three further downsampling blocks, each with a stride of 2, are used to obtain the low-resolution feature maps at 1/8, 1/16 and 1/32 scale, respectively. In one embodiment, to construct the concatenation cost volume, the feature maps of the four scales are regularized using two further convolution layers, where every convolution except the last is followed by a batch normalization layer and a rectified linear unit (ReLU) activation layer.
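The relationship between the stride-2 layers and the four scales can be checked with simple arithmetic. The sketch below assumes 3x3 kernels with padding 1 (the padding is not stated in the text) and a hypothetical 256x512 input; `conv_out` and `pyramid_sizes` are illustrative names, not part of the disclosed network.

```python
def conv_out(n, kernel=3, stride=2, padding=1):
    """Output length of a convolution along one spatial axis."""
    return (n + 2 * padding - kernel) // stride + 1

def pyramid_sizes(height, width, num_stride2_layers=5):
    """Spatial size after each successive stride-2 layer, keyed by downsampling factor."""
    sizes = {}
    h, w = height, width
    for i in range(1, num_stride2_layers + 1):
        h, w = conv_out(h), conv_out(w)
        sizes[2 ** i] = (h, w)
    return sizes

sizes = pyramid_sizes(256, 512)  # hypothetical input resolution
# Two stride-2 layers give 1/4 scale; three more give 1/8, 1/16 and 1/32.
scales = {f"1/{s}": sizes[s] for s in (4, 8, 16, 32)}
```

With these assumptions, a 256x512 input yields feature maps of 64x128, 32x64, 16x32 and 8x16 at the four scales.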
In one embodiment, network performance may be improved by using group-wise correlation cost volumes in combination with concatenation volumes.
Here, < sum > denotes the concatenation (splicing) operator. The multi-scale cost volume has size C × α(D × H × W), that is, channels by scaled depth, height and width, where the multi-scale coefficient α takes the values 1/4, 1/8, 1/16 and 1/32, respectively. The channel count C is 32, 64, 128 in order from the highest scale to the lowest.
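As a rough illustration of how a concatenation volume and a group-wise correlation volume can be built and then joined along the channel dimension, the following NumPy sketch uses a commonly used construction. It is an assumption rather than the exact formulation of this application: the function names, the feature sizes, the number of groups and the maximum disparity are all hypothetical.

```python
import numpy as np

def concat_volume(left, right, max_disp):
    """left, right: (C, H, W) feature maps. Returns a (2C, max_disp, H, W) volume."""
    c, h, w = left.shape
    volume = np.zeros((2 * c, max_disp, h, w), dtype=left.dtype)
    for d in range(max_disp):
        # Left pixel x is paired with right pixel x - d; unmatched columns stay zero.
        volume[:c, d, :, d:] = left[:, :, d:]
        volume[c:, d, :, d:] = right[:, :, :w - d]
    return volume

def groupwise_correlation_volume(left, right, max_disp, num_groups):
    """Mean of channel-wise products within each group: (num_groups, max_disp, H, W)."""
    c, h, w = left.shape
    assert c % num_groups == 0
    per_group = c // num_groups
    volume = np.zeros((num_groups, max_disp, h, w), dtype=left.dtype)
    for d in range(max_disp):
        prod = left[:, :, d:] * right[:, :, :w - d]
        volume[:, d, :, d:] = prod.reshape(num_groups, per_group, h, w - d).mean(axis=1)
    return volume

rng = np.random.default_rng(0)
left = rng.standard_normal((8, 4, 6))   # hypothetical feature maps
right = rng.standard_normal((8, 4, 6))
cat_vol = concat_volume(left, right, max_disp=3)
gwc_vol = groupwise_correlation_volume(left, right, max_disp=3, num_groups=4)
combined = np.concatenate([gwc_vol, cat_vol], axis=0)  # splice along the channel axis
```

The combined volume keeps the disparity, height and width axes unchanged and only grows along the channel axis, which is what allows volumes of the same scale to be merged before further 3D convolution.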
Hierarchical aggregation:
The 1/4 combined volume is downsampled to 1/8 scale (V1) by a three-dimensional convolution with a stride of 2. The downsampled volume (V1) is concatenated with the original 1/8 volume (V2) to form a new 1/8-scale volume (V). A 1x1x1 convolution is then applied to halve the channels of the new cost volume to the channel count corresponding to that scale. In the downsampling pass, the four volumes are hierarchically aggregated in this way until the smallest scale (1/32) is reached. The upsampling pass works in reverse: through concatenation with the other three larger volumes, the new lowest-scale volume (1/32) is aggregated stage by stage up to the highest-scale volume (1/4), with the corresponding cost volume upsampled at each stage using a three-dimensional deconvolution with a stride of 2. Through hierarchical aggregation, the multi-scale cost volumes of four different scales are aggregated into a 1/4-scale volume for subsequent disparity regression.
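One downsampling step of the hierarchical aggregation can be sketched in terms of tensor shapes. This is only a shape-level illustration under stated assumptions: the learned stride-2 3D convolution is replaced by plain subsampling followed by a channel-mixing map, the weights are random, and the volume sizes and helper names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def pointwise_conv(volume, out_channels):
    """1x1x1 convolution: a per-voxel linear map over channels (random weights, illustration only)."""
    in_channels = volume.shape[0]
    weight = rng.standard_normal((out_channels, in_channels)) / np.sqrt(in_channels)
    return np.einsum("oc,cdhw->odhw", weight, volume)

def downsample_stride2(volume, out_channels):
    """Stand-in for a stride-2 3D convolution: subsample D, H, W by 2, then mix channels."""
    return pointwise_conv(volume[:, ::2, ::2, ::2], out_channels)

def aggregate_down(finer, coarser):
    """Downsample the finer volume, concatenate it with the coarser one, then halve the channels."""
    v1 = downsample_stride2(finer, coarser.shape[0])  # V1: same scale and channels as V2
    v = np.concatenate([v1, coarser], axis=0)         # V: channel count doubles
    return pointwise_conv(v, coarser.shape[0])        # 1x1x1 convolution halves it again

vol_4 = rng.standard_normal((32, 12, 16, 32))  # hypothetical 1/4-scale volume, C = 32
vol_8 = rng.standard_normal((64, 6, 8, 16))    # hypothetical 1/8-scale volume, C = 64
new_8 = aggregate_down(vol_4, vol_8)
```

Repeating `aggregate_down` from 1/8 to 1/16 and from 1/16 to 1/32 gives the downsampling pass; the upsampling pass would mirror it with a stride-2 deconvolution in place of the subsampling.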
Parallel aggregation:
In one embodiment of the present application, the original cost volume is aggregated by a parallel aggregation network. To learn additional context information, the parallel aggregation network consists of three cascaded parallel aggregation modules. In a parallel aggregation module, a 3D convolution with a stride of 2 is applied first, followed by another three-dimensional convolution, reducing the feature size to 1/8; then 4 parallel dilated convolutions with increasing dilation rates output 4 feature maps of the same size. The four feature maps are concatenated and then fed into two three-dimensional convolutions, the latter of which is a deconvolution with a stride of 2. The final 1/4-size feature map is then processed at the output to predict the disparity map.
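The reason four dilated convolutions with different dilation rates can output feature maps of the same size is that, for a 3x3x3 kernel, zero padding equal to the dilation rate preserves the input size. The single-channel NumPy sketch below illustrates this; the dilation rates (1, 2, 4, 8), the random kernels and the input size are hypothetical, since the text does not specify them.

```python
import numpy as np

def dilated_conv3d(x, weight, dilation):
    """Single-channel 3x3x3 dilated convolution with zero padding equal to the dilation.

    x: (D, H, W); weight: (3, 3, 3). The output has the same shape as x.
    """
    r = dilation
    d, h, w = x.shape
    padded = np.pad(x, r)  # zero-pad every axis by r on both sides
    out = np.zeros_like(x)
    for i in range(3):
        for j in range(3):
            for k in range(3):
                out += weight[i, j, k] * padded[i * r:i * r + d,
                                                j * r:j * r + h,
                                                k * r:k * r + w]
    return out

rng = np.random.default_rng(0)
x = rng.standard_normal((6, 8, 10))  # hypothetical 1/8-scale feature volume
rates = (1, 2, 4, 8)                 # hypothetical increasing dilation rates
branches = [dilated_conv3d(x, rng.standard_normal((3, 3, 3)), r) for r in rates]
merged = np.stack(branches, axis=0)  # combine the four same-size branch outputs
```

Larger dilation rates widen the receptive field without adding parameters, so the branches see progressively more global context while remaining directly stackable.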
In one embodiment, at the output of parallel aggregation, two stacked 3D convolutions and an upsampling operator are used to generate a 1-channel 4D volume. The 4D volume is then converted into a probability volume by applying softmax along the disparity dimension. Let c_d denote the predicted cost at disparity d and Dmax the maximum disparity; the predicted disparity is then:
;
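Assuming the regression takes the commonly used soft-argmin form, in which the predicted disparity is the probability-weighted sum of the candidate disparities, d_hat = sum over d from 0 to Dmax - 1 of d * p_d with p = softmax(c) along the disparity dimension, the step can be sketched as follows. The exact formula, including the sign convention applied to the cost, is an assumption here, and `disparity_regression` is an illustrative name.

```python
import numpy as np

def disparity_regression(cost):
    """cost: (Dmax, H, W). Softmax along the disparity axis, then the expected disparity."""
    shifted = cost - cost.max(axis=0, keepdims=True)  # subtract the max for numerical stability
    prob = np.exp(shifted)
    prob /= prob.sum(axis=0, keepdims=True)           # probability volume
    candidates = np.arange(cost.shape[0], dtype=cost.dtype).reshape(-1, 1, 1)
    return (prob * candidates).sum(axis=0)            # (H, W) disparity map

peaked = np.zeros((8, 2, 2))
peaked[3] = 50.0                       # strong evidence for disparity 3 at every pixel
disp_peaked = disparity_regression(peaked)
disp_uniform = disparity_regression(np.zeros((8, 2, 2)))
```

A sharply peaked distribution regresses to the peak disparity, while a uniform distribution regresses to the mean of the candidates, which is why the output is continuous rather than restricted to integer disparities.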
Cross-scale features are extracted from the multi-scale cost volumes to improve the network's understanding of multi-level context. The parallel aggregation module with dilated convolutions is used for cost volume filtering, which improves the utilization of global context information.
In this embodiment, hierarchical concatenation fuses multi-scale content information and makes full use of global and local information to construct an aggregated cost volume, yielding more accurate depth estimation; concatenating multi-scale features builds feature representations with greater expressive power. The concatenation operation joins two tensors along the channel dimension into a single tensor. Parallel aggregation accelerates network inference, alleviating the high time and computation costs of sequential inference in dense pixel-matching tasks. The use of dilated convolutions with multiple dilation rates gives the resulting feature maps a larger receptive field, so that both local and global information are fully captured.
Experimental results for this embodiment show that the content-based hierarchical parallel aggregation network (HPA-Net) achieves state-of-the-art stereo matching performance on the KITTI dataset.
Those skilled in the art will appreciate that all or part of the functions of the various methods in the above embodiments may be implemented by hardware or by a computer program. When all or part of the functions in the above embodiments are implemented by a computer program, the program may be stored in a computer-readable storage medium, which may include read-only memory, random access memory, a magnetic disk, an optical disk, a hard disk, and the like, and the program is executed by a computer to realize the above functions. For example, the program is stored in the memory of the device, and all or part of the functions described above are realized when the program in the memory is executed by the processor. In addition, when all or part of the functions in the above embodiments are implemented by a computer program, the program may also be stored in a storage medium such as a server, another computer, a magnetic disk, an optical disk, a flash disk, or a removable hard disk, and be downloaded or copied into the memory of a local device, or used to update the system version of the local device; all or part of the functions in the above embodiments are then realized when the program in the memory is executed by a processor.
The foregoing description of the invention has been presented for purposes of illustration and description, and is not intended to be limiting. Several simple deductions, modifications or substitutions may also be made by a person skilled in the art to which the invention pertains, based on the idea of the invention.
Claims (10)
1. A hierarchical parallel aggregation computing method for stereo matching, comprising:
performing feature acquisition on two stereo-matching original images that are acquired at the same moment and have undergone epipolar rectification, so as to obtain low-resolution feature maps of at least two different scales;
performing hierarchical aggregation on the low-resolution feature maps of different scales so as to obtain cost volumes of at least one scale;
and performing parallel aggregation on the cost volumes of each scale, and using the feature map of preset size obtained by parallel aggregation to predict the disparity map.
2. The hierarchical parallel aggregation computing method according to claim 1, wherein the feature acquisition of the two stereo matching original images acquired at the same time and subjected to epipolar correction includes:
and respectively carrying out feature acquisition on the two original images through a twin feature extraction network sharing weight so as to respectively acquire low-resolution feature graphs of a first scale, a second scale, a third scale and a fourth scale, wherein the values of the first scale, the second scale, the third scale and the fourth scale are sequentially decreased in proportion.
3. The hierarchical parallel aggregation computing method of claim 2, wherein the twin feature extraction network comprises a convolutional layer with a 3x3 convolution kernel, four residual blocks, and two dilated (atrous) convolution blocks.
4. The hierarchical parallel aggregation computing method according to claim 3, wherein the feature acquisition of the two stereo matching original images acquired at the same time and subjected to epipolar correction further comprises:
regularizing each low-resolution feature map through two preset convolution layers.
5. The hierarchical parallel aggregation computing method according to claim 4, wherein the regularizing each low-resolution feature map by two preset convolution layers includes:
arranging a batch normalization layer and a rectified linear unit (ReLU) activation layer after each convolution layer in the twin feature extraction network except the last convolution layer.
6. The hierarchical parallel aggregation computing method of claim 2, wherein the first scale, the second scale, the third scale, and the fourth scale have values of 1/4, 1/8, 1/16, and 1/32, respectively.
7. The hierarchical parallel aggregation computing method of claim 2, wherein the hierarchical aggregation of the low-resolution feature maps of different scales comprises downsampling aggregation and/or upsampling aggregation;
the downsampling aggregation includes:
downsampling the low-resolution feature map with the high scale value to obtain a new low-resolution feature map whose scale value equals the low scale value;
performing an equal-scale convolution operation on the new low-resolution feature map obtained by downsampling and the original low-resolution feature map of the same scale value, so as to obtain the cost volume corresponding to the low-resolution feature map with the high scale value;
the upsampling aggregation includes:
upsampling the low-resolution feature map with the low scale value to obtain a new low-resolution feature map whose scale value equals the high scale value;
and performing an equal-scale convolution operation on the new low-resolution feature map obtained by upsampling and the original low-resolution feature map of the same scale value, so as to obtain the cost volume corresponding to the low-resolution feature map with the low scale value.
8. The hierarchical parallel aggregation computing method of claim 2, wherein the parallel aggregation of the cost volumes of each scale comprises:
after performing a 3D convolution with a preset stride on each cost volume, reducing the feature size of the cost volume to 1/8 by another 3D convolution, so as to obtain a cost volume to be dilated;
performing parallel dilated convolutions on each cost volume to be dilated, so as to output dilated feature maps equal in number and size to the cost volumes to be dilated;
concatenating the dilated feature maps to obtain a combined feature map that merges the feature mappings of the cost volumes; and
inputting the combined feature map into a three-dimensional convolution operation model to obtain the feature map of a preset size output by the three-dimensional convolution operation model, wherein the three-dimensional convolution operation model comprises two cascaded three-dimensional convolution layers, the latter of which is a deconvolution layer with a stride of 2.
9. A computer readable storage medium having stored thereon a program executable by a processor to implement the hierarchical parallel aggregation computing method of any one of claims 1-8.
10. A hierarchical parallel aggregation computing device for stereo matching, characterized by being adapted to apply the hierarchical parallel aggregation computing method according to any one of claims 1-8, the hierarchical parallel aggregation computing device comprising:
a twin feature extraction neural network unit, configured to perform feature acquisition on two stereo matching original images that were acquired at the same moment and subjected to epipolar correction, so as to obtain low-resolution feature maps of at least two different scales;
a hierarchical aggregation neural network unit, configured to perform hierarchical aggregation on the low-resolution feature maps of different scales to obtain cost volumes of at least one scale; and
a parallel aggregation neural network unit, configured to perform parallel aggregation on the cost volumes of each scale and to use the feature map of a preset size obtained by the parallel aggregation to predict the parallax map.
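The scale relationships recited in claims 6 and 7 can be illustrated with a minimal sketch. It only does shape bookkeeping: it computes the feature-map size at each of the four scales and lists, for a chosen target scale, whether each map would be downsampled or upsampled and by what factor. The function names `feature_map_sizes` and `resampling_plan`, the example image size, and the assumption that the image dimensions are divisible by 32 are illustrative and do not come from the claims; the convolution operations themselves are not modeled.

```python
from fractions import Fraction

# Scale values taken from claim 6; everything else is an illustrative assumption.
SCALES = [Fraction(1, 4), Fraction(1, 8), Fraction(1, 16), Fraction(1, 32)]


def feature_map_sizes(height, width, scales=SCALES):
    """Return {scale: (h, w)} for an input image of size height x width.

    Assumes height and width are divisible by 32 so that every size is exact.
    """
    return {s: (int(height * s), int(width * s)) for s in scales}


def resampling_plan(target, scales=SCALES):
    """List how each feature map would be resampled to reach the target scale.

    A map with a larger scale value (higher resolution) is downsampled and a
    map with a smaller scale value is upsampled, mirroring the two branches
    of claim 7. Each entry is (source scale, operation, integer factor).
    """
    plan = []
    for s in scales:
        if s > target:
            plan.append((s, "downsample", int(s / target)))
        elif s < target:
            plan.append((s, "upsample", int(target / s)))
        else:
            plan.append((s, "keep", 1))
    return plan


if __name__ == "__main__":
    # Hypothetical 256 x 512 rectified input image.
    for scale, size in feature_map_sizes(256, 512).items():
        print(f"scale {scale}: feature map {size}")
    for scale, op, factor in resampling_plan(Fraction(1, 8)):
        print(f"map at scale {scale}: {op} by factor {factor}")
```

For the hypothetical 256 x 512 input, the four feature maps are 64 x 128, 32 x 64, 16 x 32, and 8 x 16. Using `Fraction` keeps the scale comparisons exact, which avoids floating-point rounding when deciding between the downsampling and upsampling branches.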
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202311350821.3A CN117576428A (en) | 2023-10-18 | 2023-10-18 | Hierarchical parallel aggregation calculation method and device for stereo matching |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202311350821.3A CN117576428A (en) | 2023-10-18 | 2023-10-18 | Hierarchical parallel aggregation calculation method and device for stereo matching |
Publications (1)
Publication Number | Publication Date |
---|---|
CN117576428A (en) | 2024-02-20 |
Family
ID=89888774
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202311350821.3A Pending CN117576428A (en) | 2023-10-18 | 2023-10-18 | Hierarchical parallel aggregation calculation method and device for stereo matching |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN117576428A (en) |
Similar Documents
Publication | Title |
---|---|
CN108961327B (en) | Monocular depth estimation method and device, equipment and storage medium thereof |
CN110220493B (en) | Binocular distance measuring method and device |
WO2022100379A1 (en) | Object attitude estimation method and system based on image and three-dimensional model, and medium |
US20210365194A1 (en) | Method and apparatus for allocating memory space for driving neural network |
CN113140011A (en) | Infrared thermal imaging monocular vision distance measurement method and related assembly |
CN109389667B (en) | High-efficiency global illumination drawing method based on deep learning |
WO2023159757A1 (en) | Disparity map generation method and apparatus, electronic device, and storage medium |
CN112509021B (en) | Parallax optimization method based on attention mechanism |
CN115984494A (en) | Deep learning-based three-dimensional terrain reconstruction method for lunar navigation image |
EP3965014A1 (en) | Method and apparatus with image processing |
US11727591B2 (en) | Method and apparatus with image depth estimation |
CN114170311A (en) | Binocular stereo matching method |
Makarov et al. | Sparse depth map interpolation using deep convolutional neural networks |
CN113034666B (en) | Stereo matching method based on pyramid parallax optimization cost calculation |
CN110335228B (en) | Method, device and system for determining image parallax |
CN112270701B (en) | Parallax prediction method, system and storage medium based on packet distance network |
CN114820755B (en) | Depth map estimation method and system |
CN117576428A (en) | Hierarchical parallel aggregation calculation method and device for stereo matching |
CN115731542A (en) | Multi-mode weak supervision three-dimensional target detection method, system and equipment |
CN115223079A (en) | Video classification method and device |
CN116188349A (en) | Image processing method, device, electronic equipment and storage medium |
CN112802079A (en) | Disparity map acquisition method, device, terminal and storage medium |
CN114550137B (en) | Method and device for identifying traffic sign board and electronic equipment |
EP4386655A1 (en) | Method and apparatus with semiconductor image processing |
US20220189031A1 (en) | Method and apparatus with optimization and prediction for image segmentation |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||