CN117576428A - Hierarchical parallel aggregation calculation method and device for stereo matching - Google Patents

Hierarchical parallel aggregation calculation method and device for stereo matching

Info

Publication number
CN117576428A
Authority
CN
China
Prior art keywords
scale
low
aggregation
hierarchical
resolution
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311350821.3A
Other languages
Chinese (zh)
Inventor
赵昀
杨文邦
钱刃
刘钢
赵勇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Naro Era Technology Shenzhen Co ltd
Original Assignee
Naro Era Technology Shenzhen Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Naro Era Technology Shenzhen Co ltd
Priority to CN202311350821.3A
Publication of CN117576428A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/74 Image or video pattern matching; Proximity measures in feature spaces
    • G06V10/75 Organisation of the matching processes, e.g. simultaneous or sequential comparisons of image or video features; Coarse-fine approaches, e.g. multi-scale approaches; using context analysis; Selection of dictionaries
    • G06V10/757 Matching configurations of points or features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/80 Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G06V10/806 Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Theoretical Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Computing Systems (AREA)
  • Databases & Information Systems (AREA)
  • Artificial Intelligence (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Image Processing (AREA)

Abstract

The application discloses a hierarchical parallel aggregation calculation method and device for stereo matching. First, feature extraction is performed on two epipolar-rectified original stereo images captured at the same moment, to obtain low-resolution feature maps of at least two different scales; then hierarchical aggregation is performed on the low-resolution feature maps of different scales to obtain cost volumes of at least one scale; finally, parallel aggregation is performed on the cost volumes of each scale, and the feature map of preset size obtained by the parallel aggregation is used to predict the disparity map. Context information from the multi-scale cost volumes is fused into one integrated cost volume, and multiple 3D dilated convolutions then capture the global and local cues of the context information simultaneously, so that the disparity map is finally obtained through disparity regression and stereo matching performance is greatly improved.

Description

Hierarchical parallel aggregation calculation method and device for stereo matching
Technical Field
The invention relates to the technical field of machine vision stereo matching, in particular to a hierarchical parallel aggregation calculation method and device for stereo matching.
Background
Stereo matching, also known as disparity estimation or binocular depth estimation, has been widely studied as one of the core techniques of computer vision, and is indispensable for many applications such as autonomous driving, robot navigation, and three-dimensional reconstruction. Accurate disparity estimation from rectified stereo images is essential for many computer vision tasks. The input of stereo matching is two images (a left image I_l and a right image I_r), and the output is a disparity map d composed of the disparity values corresponding to each pixel in a reference image (typically, the left image is taken as the reference image). Referring to fig. 1, a schematic diagram of disparity map acquisition is shown, where disparity is the pixel-level difference between the positions, in the left and right images, of the points corresponding to a given point in the three-dimensional scene. After the disparity map d is acquired, a depth map can be obtained according to the following depth formula:
z = (b × f) / d;
where f is the focal length of the camera lens, b is the distance between the centers of the two cameras, d is the predicted disparity of the corresponding pixels in the left and right images, and z is the depth value corresponding to that disparity. How to predict disparity accurately and quickly from a given pair of rectified stereo images under limited computing resources is the core problem in stereo matching.
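As a small worked example of the depth formula, the following sketch converts a few disparity values to depth. It assumes the focal length is expressed in pixels and the baseline in metres, so that the depth comes out in metres; the function name and the rig parameters (0.5 m baseline, 700 px focal length) are hypothetical and do not come from the application.

```python
def depth_from_disparity(disparity_px, baseline_m, focal_px):
    """Return depth z = (b * f) / d, or None where the disparity is not positive."""
    if disparity_px <= 0:
        return None  # zero disparity corresponds to a point at infinity
    return (baseline_m * focal_px) / disparity_px


# Hypothetical rig: 0.5 m baseline, 700 px focal length.
depths = [depth_from_disparity(d, 0.5, 700.0) for d in (70.0, 35.0, 0.0)]
print(depths)  # [5.0, 10.0, None]
```

Halving the disparity doubles the depth, which is why small disparity errors matter most for distant points.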
Disclosure of Invention
The main technical problem addressed by the invention is how to construct a stereo matching calculation method capable of capturing representations of context information.
According to a first aspect, in one embodiment, there is provided a hierarchical parallel aggregation computing method for stereo matching, including:
performing feature extraction on two epipolar-rectified original stereo images captured at the same moment, to obtain low-resolution feature maps of at least two different scales;
performing hierarchical aggregation on the low-resolution feature maps of different scales to obtain cost volumes of at least one scale;
and performing parallel aggregation on the cost volumes of each scale, and using the feature map of preset size obtained by the parallel aggregation to predict the disparity map.
In an embodiment, performing feature extraction on the two epipolar-rectified original stereo images captured at the same moment includes:
performing feature extraction on each of the two original images through a weight-sharing Siamese feature extraction network, to obtain low-resolution feature maps of a first scale, a second scale, a third scale and a fourth scale, wherein the values of the first scale, the second scale, the third scale and the fourth scale decrease proportionally in sequence.
In an embodiment, the Siamese feature extraction network includes a convolution layer with a 3x3 convolution kernel, four residual blocks, and two dilated convolution blocks.
In an embodiment, performing feature extraction on the two epipolar-rectified original stereo images captured at the same moment further includes:
regularizing each low-resolution feature map through two preset convolution layers.
In an embodiment, regularizing each low-resolution feature map through two preset convolution layers includes:
providing a batch normalization layer and a rectified linear unit (ReLU) activation layer after each convolution layer in the Siamese feature extraction network except the last convolution layer.
In one embodiment, the values of the first scale, the second scale, the third scale and the fourth scale are 1/4, 1/8, 1/16 and 1/32, respectively.
In an embodiment, the hierarchical aggregation of the low-resolution feature maps of different scales includes downsampling aggregation and/or upsampling aggregation;
the downsampling aggregation includes:
downsampling the low-resolution feature map of the high scale value to obtain a low-resolution feature map whose scale equals the low scale value;
performing a same-scale convolution operation on the new low-resolution feature map obtained by downsampling and the original low-resolution feature map of equal scale value, to obtain the cost volume corresponding to the feature map of the high scale value;
the upsampling aggregation includes:
upsampling the low-resolution feature map of the low scale value to obtain a low-resolution feature map whose scale equals the high scale value;
and performing a same-scale convolution operation on the new low-resolution feature map obtained by upsampling and the original low-resolution feature map of equal scale value, to obtain the cost volume corresponding to the feature map of the low scale value.
In an embodiment, the parallel aggregation of the cost volumes of each scale includes:
after applying a 3D convolution with a preset stride to each cost volume, reducing the feature size of the cost volume to 1/8 with another 3D convolution, to obtain a cost volume to be dilated;
applying parallel dilated convolutions to each cost volume to be dilated, to output dilated feature maps that are equal in number and size to the cost volumes to be dilated;
concatenating the dilated feature maps to obtain a combined feature map that merges the feature maps of each cost volume;
and inputting the combined feature map into a 3D convolution model to obtain a feature map of preset size output by the 3D convolution model, wherein the 3D convolution model includes two cascaded 3D convolution layers, the latter of which is a deconvolution layer with a stride of 2.
According to a second aspect, an embodiment provides a computer readable storage medium having stored thereon a program executable by a processor to implement a hierarchical parallel aggregation computing method as described above.
According to a third aspect, there is provided in an embodiment a hierarchical parallel aggregation computing device for stereo matching for applying the hierarchical parallel aggregation computing method as described above, the hierarchical parallel aggregation computing device comprising:
a Siamese feature extraction neural network unit, configured to perform feature extraction on two epipolar-rectified original stereo images captured at the same moment, to obtain low-resolution feature maps of at least two different scales;
a hierarchical aggregation neural network unit, configured to perform hierarchical aggregation on the low-resolution feature maps of different scales to obtain cost volumes of at least one scale;
and a parallel aggregation neural network unit, configured to perform parallel aggregation on the cost volumes of each scale and to use the feature map of preset size obtained by the parallel aggregation to predict the disparity map.
According to the hierarchical parallel aggregation computing method of the embodiment, context information from the multi-scale cost volumes is fused into one integrated cost volume, and multiple 3D dilated convolutions then capture the global and local cues of the context information simultaneously, so that the disparity map is finally obtained through disparity regression and stereo matching performance is greatly improved.
Drawings
Fig. 1 is a schematic diagram of disparity map acquisition;
FIG. 2 is a schematic workflow diagram of a stereo matching system in one embodiment;
FIG. 3 is a flow diagram of a hierarchical parallel aggregation computing method in one embodiment;
FIG. 4 is a block diagram of a hierarchical parallel aggregation computing device in one embodiment;
FIG. 5 is a flow chart of a hierarchical parallel aggregation computing method according to another embodiment.
Detailed Description
The invention will be described in further detail below with reference to the drawings by means of specific embodiments. Like elements in different embodiments are given like associated reference numerals. In the following embodiments, numerous specific details are set forth in order to provide a better understanding of the present application. However, one skilled in the art will readily recognize that some of the features may be omitted, or replaced by other elements, materials, or methods in different situations. In some instances, some operations related to the present application are not shown or described in the specification, in order to avoid obscuring the core of the application with excessive description; a detailed description of these operations is not necessary, since a person skilled in the art can fully understand them from the description herein and the general technical knowledge in the field.
Furthermore, the described features, operations, or characteristics of the description may be combined in any suitable manner in various embodiments. Also, various steps or acts in the method descriptions may be interchanged or modified in a manner apparent to those of ordinary skill in the art. Thus, the various orders in the description and drawings are for clarity of description of only certain embodiments, and are not meant to be required orders unless otherwise indicated.
The numbering of the components itself, e.g. "first", "second", etc., is used herein merely to distinguish between the described objects and does not have any sequential or technical meaning. The terms "coupled" and "connected," as used herein, encompass both direct and indirect connection (coupling), unless otherwise indicated.
Please refer to fig. 2, which is a schematic diagram of the workflow of a stereo matching system in an embodiment. The left and right images captured by two image acquisition devices at the same moment are first acquired; the devices are then calibrated according to their relative positions and acquisition parameters; epipolar rectification is then performed according to the calibration values; and finally a stereo matching algorithm is applied to compute the disparity map. At present, deep-learning-based stereo matching networks generally construct a cost volume of a single scale, regularize it, and regress the disparity. However, none of these methods uses multi-scale context information, which limits disparity prediction performance in ill-posed regions.
In the embodiments of the application, it is proposed that context information from multi-scale cost volumes is fused into one integrated cost volume through hierarchical aggregation and parallel aggregation, that multiple 3D dilated convolutions capture the global and local cues of the context information simultaneously, and that the disparity map is finally obtained by disparity regression, so that stereo matching performance is greatly improved.
Embodiment one:
Referring to fig. 3, a flow chart of a hierarchical parallel aggregation computing method according to an embodiment includes:
Step 101, feature extraction.
Feature extraction is performed on two epipolar-rectified original stereo images captured at the same moment, to obtain low-resolution feature maps of at least two different scales. In an embodiment, feature extraction is performed on the two original images through a weight-sharing Siamese feature extraction network, to obtain low-resolution feature maps of a first scale, a second scale, a third scale and a fourth scale, where the values of the first scale, the second scale, the third scale and the fourth scale decrease proportionally in sequence. In one embodiment, the values of the first scale, the second scale, the third scale, and the fourth scale are 1/4, 1/8, 1/16, and 1/32, respectively. In one embodiment, the Siamese feature extraction network includes a convolution layer with a 3x3 convolution kernel, four residual blocks, and two dilated convolution blocks. In one embodiment, a batch normalization layer and a rectified linear unit (ReLU) activation layer are provided after each convolution layer in the Siamese feature extraction network except the last convolution layer. In one embodiment, the feature extraction further includes regularizing each low-resolution feature map through two preset convolution layers.
Step 102, hierarchical aggregation.
Hierarchical aggregation is performed on the low-resolution feature maps of different scales to obtain cost volumes of at least one scale. In one embodiment, the hierarchical aggregation includes downsampling aggregation and/or upsampling aggregation.
The downsampling aggregation includes:
first, downsampling the low-resolution feature map of the high scale value to obtain a low-resolution feature map whose scale equals the low scale value; then, performing a same-scale convolution operation on the new low-resolution feature map obtained by downsampling and the original low-resolution feature map of equal scale value, to obtain the cost volume corresponding to the feature map of the high scale value.
The upsampling aggregation includes:
first, upsampling the low-resolution feature map of the low scale value to obtain a low-resolution feature map whose scale equals the high scale value; then, performing a same-scale convolution operation on the new low-resolution feature map obtained by upsampling and the original low-resolution feature map of equal scale value, to obtain the cost volume corresponding to the feature map of the low scale value.
Step 103, parallel aggregation.
Parallel aggregation is performed on the cost volumes of each scale, and the feature map of preset size obtained by the parallel aggregation is used to predict the disparity map. In one embodiment, the parallel aggregation of the cost volumes of each scale includes:
First, after a 3D convolution with a preset stride is applied to each cost volume, another 3D convolution is used to reduce the feature size of the cost volume to 1/8, to obtain the cost volume to be dilated.
Then, parallel dilated convolutions are applied to each cost volume to be dilated, to output dilated feature maps that are equal in number and size to the cost volumes to be dilated.
Next, the dilated feature maps are concatenated to obtain a combined feature map that merges the feature maps of each cost volume.
Finally, the combined feature map is input into a 3D convolution model to obtain a feature map of preset size output by the model, wherein the 3D convolution model includes two cascaded 3D convolution layers, the latter of which is a deconvolution layer with a stride of 2.
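The size bookkeeping of this step (a stride-2 convolution that takes a 1/4-scale volume to 1/8, and a final stride-2 deconvolution that restores it) can be checked with the standard output-size formulas. The sketch below is only illustrative: the kernel size 3, padding 1 and output padding 1 are assumptions chosen so that the sizes halve and double exactly, and the function names are hypothetical.

```python
def conv_out(n, kernel=3, stride=2, padding=1):
    """Output length of a convolution: floor((n + 2p - k) / s) + 1."""
    return (n + 2 * padding - kernel) // stride + 1


def deconv_out(n, kernel=3, stride=2, padding=1, output_padding=1):
    """Output length of a transposed convolution: (n - 1) * s - 2p + k + output_padding."""
    return (n - 1) * stride - 2 * padding + kernel + output_padding


quarter = 64                   # e.g. width of a 1/4-scale volume for a 256-pixel-wide image
eighth = conv_out(quarter)     # the stride-2 convolution halves it
restored = deconv_out(eighth)  # the stride-2 deconvolution doubles it back
print(quarter, eighth, restored)  # 64 32 64
```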
Referring to fig. 4, which is a block diagram of a hierarchical parallel aggregation computing device in an embodiment, the present application further discloses a hierarchical parallel aggregation computing device configured to apply the hierarchical parallel aggregation computing method described above. The device includes a Siamese feature extraction neural network unit 100, a hierarchical aggregation neural network unit 200, and a parallel aggregation neural network unit 300. The Siamese feature extraction neural network unit 100 is configured to perform feature extraction on two epipolar-rectified original stereo images captured at the same moment, to obtain low-resolution feature maps of at least two different scales. The hierarchical aggregation neural network unit 200 is configured to perform hierarchical aggregation on the low-resolution feature maps of different scales to obtain cost volumes of at least one scale. The parallel aggregation neural network unit 300 is configured to perform parallel aggregation on the cost volumes of each scale and to use the feature map of preset size obtained by the parallel aggregation to predict the disparity map.
According to the hierarchical parallel aggregation calculation method disclosed in the embodiments of the application, feature extraction is first performed on two epipolar-rectified original stereo images captured at the same moment, to obtain low-resolution feature maps of at least two different scales; hierarchical aggregation is then performed on the low-resolution feature maps of different scales to obtain cost volumes of at least one scale; finally, parallel aggregation is performed on the cost volumes of each scale, and the feature map of preset size obtained by the parallel aggregation is used to predict the disparity map. Context information from the multi-scale cost volumes is fused into one integrated cost volume, and multiple 3D dilated convolutions then capture the global and local cues of the context information simultaneously, so that the disparity map is finally obtained through disparity regression and stereo matching performance is greatly improved.
The flow of the hierarchical parallel aggregation computing method disclosed in the present application is described below by way of a specific embodiment.
Referring to fig. 5, a flow chart of a hierarchical parallel aggregation computing method according to another embodiment specifically includes:
Feature extraction:
The pipeline starts with a weight-sharing Siamese feature extraction network that takes a pair of images as input. The network first uses 3 convolution layers with 3x3 kernels, 4 residual blocks, and 2 dilated convolution blocks. Specifically, it contains two convolution layers with a stride of 2 to obtain the 1/4-scale feature map. In addition, three further downsampling blocks, each with a stride of 2, produce the low-resolution feature maps at 1/8, 1/16 and 1/32 scale. In one embodiment, to construct the concatenation cost volume, the feature maps of the four scales are regularized using two further convolution layers, where every convolution except the last is followed by a batch normalization layer and a rectified linear unit (ReLU) activation layer.
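To make the scale pyramid concrete, the following sketch tracks the spatial size of a feature map through five stride-2 layers (two to reach 1/4 scale, then three more for 1/8, 1/16 and 1/32). The 3x3 kernel is taken from the text; the padding of 1 and the input size of 256 x 512 are assumptions for illustration, and the function names are hypothetical.

```python
def conv_out_size(n, kernel=3, stride=2, padding=1):
    """Spatial size after a convolution: floor((n + 2p - k) / s) + 1."""
    return (n + 2 * padding - kernel) // stride + 1


def pyramid_sizes(height, width, num_stride2_layers=5):
    """Sizes after each stride-2 layer; entries 1..4 are the 1/4..1/32 scales."""
    sizes = []
    h, w = height, width
    for _ in range(num_stride2_layers):
        h, w = conv_out_size(h), conv_out_size(w)
        sizes.append((h, w))
    return sizes


sizes = pyramid_sizes(256, 512)
scales = dict(zip(("1/4", "1/8", "1/16", "1/32"), sizes[1:]))
print(scales)
# {'1/4': (64, 128), '1/8': (32, 64), '1/16': (16, 32), '1/32': (8, 16)}
```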
In one embodiment, network performance may be improved by using group-wise correlation cost volumes in combination with concatenation volumes.
Here, < sum > is the concatenation operator; the size of the multi-scale cost volume is channel C × multi-scale coefficient α × (depth × height × width), where the multi-scale coefficient α is 1/4, 1/8, 1/16 and 1/32, respectively. Channel C is 32, 64, 128 in order from the highest scale to the lowest.
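For orientation, a concatenation cost volume is commonly built by pairing, for every candidate disparity, the left features with the right features shifted by that disparity. The sketch below follows that widely used construction; it is an assumption about the general technique, not a reproduction of the formula of this application, and the function name and tensor layout (C, H, W) are hypothetical.

```python
import numpy as np


def concat_cost_volume(left_feat, right_feat, max_disp):
    """Build a (2C, D, H, W) volume from two (C, H, W) feature maps."""
    c, h, w = left_feat.shape
    volume = np.zeros((2 * c, max_disp, h, w), dtype=left_feat.dtype)
    for d in range(max_disp):
        # Left pixel x is paired with right pixel x - d; columns x < d stay zero.
        volume[:c, d, :, d:] = left_feat[:, :, d:]
        volume[c:, d, :, d:] = right_feat[:, :, : w - d]
    return volume


left = np.arange(2 * 3 * 5, dtype=np.float32).reshape(2, 3, 5)
right = left + 100.0
volume = concat_cost_volume(left, right, max_disp=3)
print(volume.shape)  # (4, 3, 3, 5)
```

Concatenation doubles the channel count, which is why the channel dimension of the volume is 2C for feature maps with C channels.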
Hierarchical aggregation:
The 1/4 combined volume is downsampled to 1/8 scale (V1) by a 3D convolution with a stride of 2. The downsampled volume (V1) is concatenated with the original 1/8 volume (V2) to form a new 1/8-scale volume (V). A 1x1x1 convolution is then applied to halve the channels of the new cost volume to the channel count corresponding to that scale. During downsampling, the four volumes are hierarchically aggregated in this way until the smallest scale (1/32) is reached. The reverse direction works likewise: together with the other three larger volumes and concatenation operations, the new lowest-scale volume (1/32) is brought stage by stage to the highest scale (1/4), with the corresponding cost volume upsampled at each stage by a 3D deconvolution with a stride of 2. Through hierarchical aggregation, the multi-scale cost volumes of four different scales are aggregated into a 1/4-scale volume for subsequent disparity regression.
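One downsampling stage of the hierarchical aggregation (downsample, concatenate with the same-scale volume, halve the channels with a 1x1x1 convolution) can be sketched as follows. This is a simplified illustration under stated assumptions: 2x2x2 average pooling stands in for the learned stride-2 3D convolution, both input volumes are given the same channel count, the weights are random, and all function names are hypothetical.

```python
import numpy as np


def downsample2(volume):
    """Halve D, H, W of a (C, D, H, W) volume by 2x2x2 average pooling.

    Stand-in for the learned stride-2 3D convolution described in the text.
    """
    c, d, h, w = volume.shape
    return volume.reshape(c, d // 2, 2, h // 2, 2, w // 2, 2).mean(axis=(2, 4, 6))


def pointwise_conv(volume, weight):
    """1x1x1 convolution: mix channels with a (C_out, C_in) weight matrix."""
    return np.einsum("oc,cdhw->odhw", weight, volume)


def aggregate_down(fine, coarse, weight):
    """Downsample `fine`, concatenate with `coarse`, then halve the channels."""
    merged = np.concatenate([downsample2(fine), coarse], axis=0)
    return pointwise_conv(merged, weight)


rng = np.random.default_rng(0)
fine = rng.standard_normal((8, 8, 16, 16))   # e.g. a 1/4-scale volume
coarse = rng.standard_normal((8, 4, 8, 8))   # e.g. the original 1/8-scale volume
weight = rng.standard_normal((8, 16))        # 16 concatenated channels -> 8
out = aggregate_down(fine, coarse, weight)
print(out.shape)  # (8, 4, 8, 8)
```

Repeating the same stage on the result and the next coarser volume would continue the aggregation down to the smallest scale.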
Parallel aggregation:
In one embodiment of the present application, a parallel aggregation network is proposed to aggregate the cost volume. The parallel aggregation network consists of three cascaded parallel aggregation modules in order to learn additional context information. In each module, a 3D convolution with a stride of 2 is applied first, followed by another 3D convolution, to reduce the feature size to 1/8; then, 4 parallel dilated convolutions with increasing dilation rates output 4 feature maps of the same size. The four feature maps are combined by concatenation and then input into two 3D convolutions, the latter being a deconvolution with a stride of 2. At the output, the final 1/4-size feature map is processed to predict the disparity map.
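The parallel branch structure (several dilated convolutions over the same input, each producing an output of the same size, followed by concatenation) can be sketched on a single-channel volume as below. The dilation rates (1, 2, 4, 8) are an assumption, since the text only states that the rates increase; the kernels are arbitrary, and the function names are hypothetical.

```python
import numpy as np


def dilated_conv3d(volume, kernel, rate):
    """'Same'-size 3x3x3 dilated convolution on a single-channel (D, H, W) volume."""
    d, h, w = volume.shape
    padded = np.pad(volume, rate)  # zero padding of `rate` on every side
    out = np.zeros(volume.shape, dtype=float)
    for i in range(3):
        for j in range(3):
            for k in range(3):
                # Kernel taps are spaced `rate` voxels apart.
                out += kernel[i, j, k] * padded[
                    i * rate : i * rate + d,
                    j * rate : j * rate + h,
                    k * rate : k * rate + w,
                ]
    return out


def parallel_dilated_block(volume, kernels, rates=(1, 2, 4, 8)):
    """Run one dilated branch per rate and stack the results on a new channel axis."""
    branches = [dilated_conv3d(volume, k, r) for k, r in zip(kernels, rates)]
    return np.stack(branches, axis=0)


volume = np.ones((5, 5, 5))
kernels = [np.ones((3, 3, 3))] * 4
features = parallel_dilated_block(volume, kernels)
print(features.shape)  # (4, 5, 5, 5)
```

Because the padding equals the dilation rate, every branch preserves the input size, which is what allows the four outputs to be concatenated directly.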
In one embodiment, at the output of the parallel aggregation, two stacked 3D convolutions and an upsampling operator are used to generate a 1-channel 4D volume. The 4D volume is then converted into a probability volume by applying softmax along the disparity dimension; with c_d denoting the predicted cost at disparity d and Dmax the maximum disparity, the predicted disparity is obtained by regression over this probability volume.
Cross-scale features are extracted from the multi-scale cost volumes to improve the network's understanding of multi-level context. The parallel aggregation module with dilated convolutions is used for cost volume filtering, which improves the use of global context information.
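The softmax-based disparity regression mentioned above is commonly implemented as an expectation over the candidate disparities (often called soft-argmin). The sketch below shows that common form for a single pixel; it is an assumption about the general technique rather than the exact formula of this application. In particular, it applies softmax directly to the per-disparity scores, whereas some networks negate the costs first. The function names are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def regress_disparity(scores):
    """Expected disparity sum_d d * p_d over candidates d = 0 .. len(scores) - 1."""
    probs = softmax(scores)
    return sum(d * p for d, p in enumerate(probs))


print(regress_disparity([0.0, 0.0, 0.0, 0.0]))  # 1.5 (uniform over 0..3)
```

Unlike a hard argmax, this weighted sum is differentiable and can yield sub-pixel disparity values.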
In this embodiment, hierarchical concatenation fuses multi-scale context information and makes full use of global and local information to construct the aggregated cost volume, so that a more accurate depth estimate is obtained; concatenating multi-scale features also builds feature representations with greater expressive power. The concatenation operation joins two tensors along the channel dimension to form a single tensor. Parallel aggregation speeds up network inference, which addresses the high time and computation cost of sequential inference in dense pixel matching tasks. The use of dilated convolutions with multiple dilation rates gives the resulting feature maps a larger receptive field, so that both local and global information are fully captured.
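The receptive-field benefit of dilated (hole) convolutions can be quantified: a kernel of size k with dilation rate r spans k + (k - 1)(r - 1) positions along each axis while keeping the same number of weights. The rates below are illustrative assumptions, and the function name is hypothetical.

```python
def effective_kernel(kernel=3, rate=1):
    """Extent covered along one axis by a dilated kernel: k + (k - 1) * (r - 1)."""
    return kernel + (kernel - 1) * (rate - 1)


print([effective_kernel(3, r) for r in (1, 2, 4, 8)])  # [3, 5, 9, 17]
```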
Experimental results of the embodiment show that the content-based hierarchical parallel aggregation network HPA-Net achieves state-of-the-art stereo matching performance on the KITTI data set.
Those skilled in the art will appreciate that all or part of the functions of the methods in the above embodiments may be implemented by hardware or by a computer program. When all or part of the functions are implemented by a computer program, the program may be stored in a computer-readable storage medium, which may include read-only memory, random access memory, a magnetic disk, an optical disk, a hard disk, and the like, and the program is executed by a computer to realize the above functions. For example, the program is stored in the memory of the device, and all or part of the functions described above are realized when the processor executes the program in the memory. In addition, the program may also be stored in a storage medium such as a server, another computer, a magnetic disk, an optical disk, a flash disk, or a removable hard disk, and be downloaded or copied into the memory of a local device, or used to update the system version of the local device; all or part of the functions in the above embodiments are then realized when the processor executes the program in the memory.
The foregoing description of the invention has been presented for purposes of illustration and description, and is not intended to be limiting. Several simple deductions, modifications or substitutions may also be made by a person skilled in the art to which the invention pertains, based on the idea of the invention.

Claims (10)

1. A hierarchical parallel aggregation computing method for stereo matching, comprising:
performing feature extraction on two epipolar-rectified original stereo images captured at the same moment, to obtain low-resolution feature maps of at least two different scales;
performing hierarchical aggregation on the low-resolution feature maps of different scales to obtain cost volumes of at least one scale;
and performing parallel aggregation on the cost volumes of each scale, and using the feature map of preset size obtained by the parallel aggregation to predict the disparity map.
2. The hierarchical parallel aggregation computing method according to claim 1, wherein the feature acquisition of the two stereo matching original images acquired at the same time and subjected to epipolar correction includes:
and respectively carrying out feature acquisition on the two original images through a twin feature extraction network sharing weight so as to respectively acquire low-resolution feature graphs of a first scale, a second scale, a third scale and a fourth scale, wherein the values of the first scale, the second scale, the third scale and the fourth scale are sequentially decreased in proportion.
3. The hierarchical parallel aggregation computing method of claim 2, wherein the twinning feature extraction network comprises a convolutional layer with a 3x3 convolutional kernel, four residual blocks, and two hole convolutional blocks.
4. The hierarchical parallel aggregation computing method according to claim 3, wherein the feature acquisition of the two stereo matching original images acquired at the same time and subjected to epipolar correction further comprises:
and regularizing each low-resolution feature map through two preset convolution layers.
5. The hierarchical parallel aggregation computing method according to claim 4, wherein the regularizing each low-resolution feature map by two preset convolution layers includes:
and a batch normalization layer and a correction linear unit activation layer are arranged behind each convolution layer except the last convolution layer in the twin feature extraction network.
6. The hierarchical parallel aggregation computing method of claim 2, wherein the first scale, the second scale, the third scale, and the fourth scale have values of 1/4, 1/8, 1/16, and 1/32, respectively.
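As a worked example of the scale values in claims 2 and 6, the sketch below computes the feature-map size at each scale. The input size of 384 x 1248 pixels and the round-up rule are assumptions made for illustration; the claims specify neither.

```python
from fractions import Fraction

# Scale values recited in claim 6.
SCALES = [Fraction(1, 4), Fraction(1, 8), Fraction(1, 16), Fraction(1, 32)]


def feature_map_sizes(height, width, scales=SCALES):
    """Return the (height, width) of the feature map at each scale.

    Assumption: sizes are rounded up; the claims state no rounding rule.
    """
    sizes = []
    for s in scales:
        h = -(-height * s.numerator // s.denominator)  # ceiling division
        w = -(-width * s.numerator // s.denominator)
        sizes.append((h, w))
    return sizes


# Hypothetical input resolution, chosen only for the example.
sizes = feature_map_sizes(384, 1248)
for scale, size in zip(SCALES, sizes):
    print(scale, size)
```

Each scale is half the previous one, which matches the statement in claim 2 that the four scale values decrease in proportion.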
7. The hierarchical parallel aggregation computing method of claim 2, wherein the hierarchical aggregation of the low-resolution feature maps of different scales comprises downsampling aggregation and/or upsampling aggregation;
the downsampling aggregation includes:
downsampling the low-resolution feature map of the higher scale value to obtain a new low-resolution feature map whose scale equals the lower scale value;
performing an equal-scale convolution operation on the new low-resolution feature map obtained by downsampling and the original low-resolution feature map of the same scale value, to obtain the cost volume corresponding to the low-resolution feature map of the higher scale value;
the upsampling aggregation includes:
upsampling the low-resolution feature map of the lower scale value to obtain a new low-resolution feature map whose scale equals the higher scale value;
and performing an equal-scale convolution operation on the new low-resolution feature map obtained by upsampling and the original low-resolution feature map of the same scale value, to obtain the cost volume corresponding to the low-resolution feature map of the lower scale value.
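The resample-then-combine pattern of claim 7 can be sketched on small 2D arrays. This is an illustration under stated assumptions, not the patented operation: 2x2 average pooling, nearest-neighbour upsampling, and a per-pixel weighted sum (equivalent to a 1x1 convolution over the two maps stacked as channels) are stand-ins chosen here, and all function names and weights are hypothetical.

```python
def downsample2(fmap):
    """Halve height and width by 2x2 average pooling (even sizes assumed)."""
    return [
        [
            (fmap[r][c] + fmap[r][c + 1] + fmap[r + 1][c] + fmap[r + 1][c + 1]) / 4
            for c in range(0, len(fmap[0]), 2)
        ]
        for r in range(0, len(fmap), 2)
    ]


def upsample2(fmap):
    """Double height and width by nearest-neighbour repetition."""
    out = []
    for row in fmap:
        wide = [value for value in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


def fuse(a, b, wa=0.5, wb=0.5):
    """Per-pixel weighted sum of two maps that share one scale.

    Stand-in for the equal-scale convolution; weights are placeholders.
    """
    assert len(a) == len(b) and len(a[0]) == len(b[0]), "scales must match"
    return [[wa * x + wb * y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


# Toy maps: `high` plays the higher-scale map, `low` the lower-scale map.
high = [[1, 1, 3, 3], [1, 1, 3, 3], [5, 5, 7, 7], [5, 5, 7, 7]]
low = [[2, 4], [6, 8]]

down_fused = fuse(downsample2(high), low)  # downsampling aggregation
up_fused = fuse(upsample2(low), high)      # upsampling aggregation
print(down_fused)
print(up_fused[0])
```

In both branches the resampling step exists only to bring the two maps to a common scale, so that the combining operation sees inputs of identical size.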
8. The hierarchical parallel aggregation computing method of claim 2, wherein the parallel aggregation of the cost volumes of each scale comprises:
after performing a 3D convolution on each cost volume with a preset stride, reducing the feature size of the cost volume to 1/8 by another 3D convolution, to obtain a cost volume to be dilated;
performing dilated convolutions in parallel on the cost volumes to be dilated, to output dilated feature maps equal in number and size to the cost volumes to be dilated;
concatenating the dilated feature maps to obtain a combined feature map that merges the feature mappings of the cost volumes;
and inputting the combined feature map into a three-dimensional convolution operation model to obtain the feature map of a preset size output by the three-dimensional convolution operation model, wherein the three-dimensional convolution operation model comprises two cascaded three-dimensional convolution layers, the latter of which is a deconvolution layer with a stride of 2.
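Two properties recited in claim 8 can be checked with a small sketch: a dilated (expansion) convolution with zero padding returns an output of the same size as its input, and a stride-2 deconvolution doubles a dimension. The code works on 1D lists for brevity rather than 3D cost volumes. The kernel, the dilation rates (1, 2, 4), and the deconvolution hyperparameters are assumptions, since the claims do not give them.

```python
def dilated_conv1d(signal, kernel, dilation):
    """Zero-padded 1D dilated convolution; output length equals input length.

    Assumes an odd-length kernel centred on each output position.
    """
    centre = len(kernel) // 2
    out = []
    for i in range(len(signal)):
        acc = 0.0
        for j, weight in enumerate(kernel):
            idx = i + (j - centre) * dilation
            if 0 <= idx < len(signal):  # samples outside the signal count as zero
                acc += weight * signal[idx]
        out.append(acc)
    return out


def parallel_dilated(volumes, kernel, dilations):
    """Apply one dilated convolution per volume; the branches are independent."""
    return [dilated_conv1d(v, kernel, d) for v, d in zip(volumes, dilations)]


def transposed_conv_size(n, kernel=3, stride=2, padding=1, output_padding=1):
    """Output length of a transposed convolution (standard size formula)."""
    return (n - 1) * stride - 2 * padding + kernel + output_padding


# Three toy "cost volumes to be dilated", flattened to 1D for the example.
volumes = [[float(x) for x in range(1, 9)] for _ in range(3)]
branches = parallel_dilated(volumes, [1.0, 1.0, 1.0], [1, 2, 4])
combined = list(branches)  # concatenation: branches stacked as channels
print(len(combined), [len(b) for b in combined])
print(transposed_conv_size(24))
```

Because the branches share no state, they can run concurrently, and because each preserves its input size, the results can be stacked directly without further resampling.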
9. A computer readable storage medium having stored thereon a program executable by a processor to implement the hierarchical parallel aggregation computing method of any one of claims 1-8.
10. A hierarchical parallel aggregation computing device for stereo matching, adapted to apply the hierarchical parallel aggregation computing method according to any one of claims 1-8, the hierarchical parallel aggregation computing device comprising:
a twin feature extraction neural network unit, configured to perform feature acquisition on two stereo matching original images acquired at the same moment and subjected to epipolar correction, to obtain low-resolution feature maps of at least two different scales;
a hierarchical aggregation neural network unit, configured to perform hierarchical aggregation on the low-resolution feature maps of different scales to obtain cost volumes of at least one scale;
and a parallel aggregation neural network unit, configured to perform parallel aggregation on the cost volumes of each scale and to use the feature map of a preset size obtained by the parallel aggregation to predict a disparity map.
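The three units of claim 10 form a simple chain, which can be sketched as a container of three callables. The class name, field names, and the choice to pass left and right features separately to the hierarchical unit are hypothetical; the stub functions only trace the data flow and perform no real computation.

```python
from dataclasses import dataclass
from typing import Any, Callable, Sequence


@dataclass
class HierarchicalParallelAggregator:
    # image -> feature maps at several scales
    extract: Callable[[Any], Sequence[Any]]
    # (left features, right features) -> cost volumes
    aggregate_hierarchically: Callable[[Sequence[Any], Sequence[Any]], Sequence[Any]]
    # cost volumes -> feature map of a preset size
    aggregate_in_parallel: Callable[[Sequence[Any]], Any]

    def __call__(self, left: Any, right: Any) -> Any:
        # Using the same callable for both images mirrors weight sharing.
        left_feats = self.extract(left)
        right_feats = self.extract(right)
        volumes = self.aggregate_hierarchically(left_feats, right_feats)
        return self.aggregate_in_parallel(volumes)


# Stub units that only record which inputs reached which stage.
device = HierarchicalParallelAggregator(
    extract=lambda img: [f"{img}@1/4", f"{img}@1/8"],
    aggregate_hierarchically=lambda l, r: [f"cv({a},{b})" for a, b in zip(l, r)],
    aggregate_in_parallel=lambda vols: "|".join(vols),
)
result = device("L", "R")
print(result)
```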
CN202311350821.3A 2023-10-18 2023-10-18 Hierarchical parallel aggregation calculation method and device for stereo matching Pending CN117576428A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311350821.3A CN117576428A (en) 2023-10-18 2023-10-18 Hierarchical parallel aggregation calculation method and device for stereo matching

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311350821.3A CN117576428A (en) 2023-10-18 2023-10-18 Hierarchical parallel aggregation calculation method and device for stereo matching

Publications (1)

Publication Number Publication Date
CN117576428A true CN117576428A (en) 2024-02-20

Family

ID=89888774

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311350821.3A Pending CN117576428A (en) 2023-10-18 2023-10-18 Hierarchical parallel aggregation calculation method and device for stereo matching

Country Status (1)

Country Link
CN (1) CN117576428A (en)

Similar Documents

Publication Publication Date Title
CN108961327B (en) Monocular depth estimation method and device, equipment and storage medium thereof
CN110220493B (en) Binocular distance measuring method and device
WO2022100379A1 (en) Object attitude estimation method and system based on image and three-dimensional model, and medium
US20210365194A1 (en) Method and apparatus for allocating memory space for driving neural network
CN113140011A (en) Infrared thermal imaging monocular vision distance measurement method and related assembly
CN109389667B (en) High-efficiency global illumination drawing method based on deep learning
WO2023159757A1 (en) Disparity map generation method and apparatus, electronic device, and storage medium
CN112509021B (en) Parallax optimization method based on attention mechanism
CN115984494A (en) Deep learning-based three-dimensional terrain reconstruction method for lunar navigation image
EP3965014A1 (en) Method and apparatus with image processing
US11727591B2 (en) Method and apparatus with image depth estimation
CN114170311A (en) Binocular stereo matching method
Makarov et al. Sparse depth map interpolation using deep convolutional neural networks
CN113034666B (en) Stereo matching method based on pyramid parallax optimization cost calculation
CN110335228B (en) Method, device and system for determining image parallax
CN112270701B (en) Parallax prediction method, system and storage medium based on packet distance network
CN114820755B (en) Depth map estimation method and system
CN117576428A (en) Hierarchical parallel aggregation calculation method and device for stereo matching
CN115731542A (en) Multi-mode weak supervision three-dimensional target detection method, system and equipment
CN115223079A (en) Video classification method and device
CN116188349A (en) Image processing method, device, electronic equipment and storage medium
CN112802079A (en) Disparity map acquisition method, device, terminal and storage medium
CN114550137B (en) Method and device for identifying traffic sign board and electronic equipment
EP4386655A1 (en) Method and apparatus with semiconductor image processing
US20220189031A1 (en) Method and apparatus with optimization and prediction for image segmentation

Legal Events

Date Code Title Description
PB01 Publication