CN117058252B - Self-adaptive fusion stereo matching method - Google Patents

Self-adaptive fusion stereo matching method

Info

Publication number
CN117058252B
CN117058252B · Application CN202311317490.3A
Authority
CN
China
Prior art keywords
cost volume
fusion
branch
scale
left view
Prior art date
Legal status
Active
Application number
CN202311317490.3A
Other languages
Chinese (zh)
Other versions
CN117058252A (en)
Inventor
俞正中
李鹏飞
钱刃
丘文峰
赵勇
李福池
Current Assignee
Dongguan Aipeike Technology Co ltd
Original Assignee
Dongguan Aipeike Technology Co ltd
Priority date
Filing date
Publication date
Application filed by Dongguan Aipeike Technology Co ltd filed Critical Dongguan Aipeike Technology Co ltd
Priority to CN202311317490.3A
Publication of CN117058252A
Application granted
Publication of CN117058252B
Legal status: Active
Anticipated expiration


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/50Depth or shape recovery
    • G06T7/55Depth or shape recovery from multiple images
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/0464Convolutional networks [CNN, ConvNet]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/50Depth or shape recovery
    • G06T7/55Depth or shape recovery from multiple images
    • G06T7/593Depth or shape recovery from multiple images from stereo images
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/80Analysis of captured images to determine intrinsic or extrinsic camera parameters, i.e. camera calibration
    • G06T7/85Stereo camera calibration
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20084Artificial neural networks [ANN]
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00Road transport of goods or passengers
    • Y02T10/10Internal combustion engine [ICE] based vehicles
    • Y02T10/40Engine management systems

Abstract

A self-adaptive fusion stereo matching method relates to the technical field of stereo matching. The method comprises the following steps: acquiring a left view and a right view shot by a binocular camera; performing feature extraction on the left view and the right view to construct a first scale cost volume and a second scale cost volume; downsampling the first scale cost volume to generate a first fusion branch; convolving the second scale cost volume to generate a second fusion branch; fusing the first fusion branch and the second fusion branch to generate an initial fusion cost volume; sequentially upsampling and downsampling the initial fusion cost volume to update the first fusion branch; convolving the second scale cost volume to maintain the second fusion branch; fusing the first fusion branch and the second fusion branch to update the initial fusion cost volume; repeating the updating of the initial fusion cost volume until a set number of times is reached, and outputting the initial fusion cost volume as a fusion cost volume; and generating a dense disparity map according to the fusion cost volume to calculate disparities of the left view and the right view.

Description

Self-adaptive fusion stereo matching method
Technical Field
The invention relates to the technical field of stereo matching, in particular to a self-adaptive fusion stereo matching method.
Background
Convolutional neural networks have made significant progress in stereo matching; however, occlusion regions remain difficult to handle. Stereo matching is a fundamental problem in computer vision with applications in many areas such as robotics and autonomous driving. The goal of stereo matching is to establish a dense correspondence between a pair of rectified stereo images. Although convolutional neural networks have been widely applied to stereo matching and outperform traditional methods, existing approaches still cannot achieve accurate matching in occluded, repetitive, and reflective regions of complex autonomous driving scenes.
Disclosure of Invention
The invention mainly solves the following technical problem: providing a stereo matching method capable of improving matching accuracy in complex autonomous driving scenes.
According to a first aspect, in one embodiment, there is provided a stereo matching method of adaptive fusion, including:
acquiring a left view and a right view shot by a binocular camera;
feature extraction is carried out on the left view and the right view to construct a first scale cost volume and a second scale cost volume, and the scale of the first scale cost volume is larger than that of the second scale cost volume;
downsampling the first scale cost volume to generate a first fused branch; convolving the second scale cost volume to generate a second fused branch that maintains the scale of the second scale cost volume; fusing the first fused branch and the second fused branch to generate an initial fused cost volume;
updating the initial fusion cost volume to generate a fusion cost volume, wherein the updating of the initial fusion cost volume comprises the following steps:
sequentially upsampling and downsampling the initial fusion cost volume to update the first fusion branch; convolving the second scale cost volume to maintain the second fusion branch; fusing the updated first fusion branch and the second fusion branch to update the initial fusion cost volume;
repeating the updating of the initial fusion cost volume until the set times are reached, and outputting the initial fusion cost volume as a fusion cost volume;
and generating a dense parallax map according to the fusion cost volume so as to calculate the parallax of the left view and the right view.
In one embodiment, the first scale cost volume comprises 1/4d,1/4h,1/4w cost volumes; the second scale cost volume comprises 1/16d,1/16h and 1/16w cost volumes; wherein d represents a preset maximum parallax value, h represents the heights of the left view and the right view, and w represents the widths of the left view and the right view.
In one embodiment, the downsampling the first scale cost volume to generate a first fused branch includes:
when the first scale cost volume is downsampled, the first scale cost volume is changed from 1/4d,1/4h,1/4w cost volume to 1/8d,1/8h and 1/8w cost volume, and then is changed from 1/8d,1/8h,1/8w cost volume to 1/16d,1/16h and 1/16w cost volume; the first fusion branch comprises 1/16d,1/16h and 1/16w cost volumes; wherein d represents a preset maximum parallax value, h represents the heights of the left view and the right view, and w represents the widths of the left view and the right view.
In one embodiment, the convolving the second scale cost volume to generate a second fused branch that maintains the scale of the second scale cost volume, comprising:
and convolving the second scale cost volume by using a two-layer 3D convolution with a step length of 1 to generate a second fusion branch maintaining the scale of the second scale cost volume.
In one embodiment, the fusing the first fused branch and the second fused branch generates an initial fused cost volume, including:
respectively carrying out maximum pooling and average pooling on the first fusion branch and the second fusion branch so as to correspondingly acquire the 2D characteristics of the first fusion branch and the 2D characteristics of the second fusion branch;
inputting the 2D characteristics of the first fusion branch and the 2D characteristics of the second fusion branch into a 2D convolution layer to correspondingly generate a weight map of the first fusion branch and a weight map of the second fusion branch;
summing the weight map of the first fusion branch and the weight map of the second fusion branch to generate an initial fusion cost volume; the initial fusion cost volume comprises 1/16d,1/16h and 1/16w cost volumes; wherein d represents a preset maximum parallax value, h represents the heights of the left view and the right view, and w represents the widths of the left view and the right view.
In one embodiment, the sequentially upsampling and downsampling the initial fusion cost volume includes:
when the initial fusion cost volume is up-sampled, the initial fusion cost volume is changed from 1/16d,1/16h,1/16w cost volume to 1/8d,1/8h and 1/8w cost volume, and then from 1/8d,1/8h and 1/8w cost volume to 1/4d,1/4h and 1/4w cost volume;
when the initial fusion cost volume is downsampled, the initial fusion cost volume is changed from 1/4d,1/4h,1/4w cost volume to 1/8d,1/8h and 1/8w cost volume, and then from 1/8d,1/8h and 1/8w cost volume to 1/16d,1/16h and 1/16w cost volume;
wherein d represents a preset maximum parallax value, h represents the heights of the left view and the right view, and w represents the widths of the left view and the right view.
In one embodiment, the set number of times is 2.
In one embodiment, the acquiring of the left view and the right view shot by the binocular camera includes:
calibrating the binocular camera;
and matching the original left view and the original right view shot by the binocular camera according to the camera calibration result, and correcting the matched original left view and original right view to generate the left view and the right view shot by the binocular camera.
In one embodiment, the correcting the matched original left view and original right view to generate the left view and right view shot by the binocular camera includes:
and correcting the matched original left view and original right view by using the Bouguet epipolar rectification method so as to generate the left view and the right view shot by the binocular camera.
According to a second aspect, an embodiment provides a computer readable storage medium having stored thereon a program executable by a processor to implement the method described above.
According to the self-adaptive fusion stereo matching method and the computer readable storage medium, the algorithm first acquires the left view and the right view of the binocular camera, then performs feature extraction on the left view and the right view to construct a first scale cost volume and a second scale cost volume, and then downsamples the first scale cost volume to generate a first fusion branch; the second scale cost volume is convolved to generate a second fusion branch that maintains the scale of the second scale cost volume. The first fusion branch and the second fusion branch are fused to generate an initial fusion cost volume, which is then updated by sequential upsampling and downsampling. These downsampling and upsampling operations change the scale of the initial fusion cost volume from large to small and then from small to large, so that detailed information in the left view and the right view is captured better and the understanding and recognition accuracy for the two views is improved. Finally, the initial fusion cost volume obtained after the set number of updates is output as the fusion cost volume, and a dense parallax map is generated from the fusion cost volume to calculate the parallaxes of the left view and the right view. By learning cost volumes of different scales and fusing the information obtained from them, the method improves the accuracy of the stereo matching effect.
Drawings
FIG. 1 is a flow chart of a stereo matching method of adaptive fusion according to one embodiment;
FIG. 2 is a flowchart of step S100 of an adaptive fusion stereo matching method according to an embodiment;
FIG. 3 is a schematic diagram of cost aggregation of a first scale cost volume and a second scale cost volume of an adaptive fusion stereo matching method according to an embodiment;
FIG. 4 is a flowchart of step S300 of an adaptive fusion stereo matching method according to an embodiment;
FIG. 5 is a schematic diagram of a first fusion branch and a second fusion branch of an embodiment of a stereo matching method of adaptive fusion;
fig. 6 is a flowchart of step S340 of the stereo matching method of adaptive fusion according to an embodiment.
Detailed Description
The invention will be described in further detail below with reference to the drawings by means of specific embodiments, in which like elements in different embodiments are given like reference numerals. In the following embodiments, numerous specific details are set forth in order to provide a better understanding of the present application. However, one skilled in the art will readily recognize that some of these features may be omitted, or replaced by other elements, materials, or methods, in different situations. In some instances, operations associated with the present application are not shown or described in the specification in order to avoid obscuring its core; a detailed description of such operations is not necessary, since a person skilled in the art can understand them from the description herein and from general knowledge in the field.
Furthermore, the described features, operations, or characteristics of the description may be combined in any suitable manner in various embodiments. Also, various steps or acts in the method descriptions may be interchanged or modified in a manner apparent to those of ordinary skill in the art. Thus, the various orders in the description and drawings are for clarity of description of only certain embodiments, and are not meant to be required orders unless otherwise indicated.
The numbering of the components itself, e.g. "first", "second", etc., is used herein merely to distinguish between the described objects and does not have any sequential or technical meaning.
The application provides a self-adaptive fusion stereo matching method, which learns weights for cost volumes of different scales from the different distributions along their parallax dimensions and performs weighted fusion of the different branches. The self-adaptive fusion stereo matching method provided by the application can use the information of the different branches to achieve a good effect in weak-texture areas, as described in detail below.
Referring to fig. 1, the stereo matching method of adaptive fusion provided in the present application includes the following steps.
Step S100: and acquiring a left view and a right view shot by the binocular camera.
Referring to fig. 2, in one embodiment, when step S100 is performed to obtain the left view and the right view captured by the binocular camera, the following steps are further included.
Step S110: and calibrating the binocular camera.
In one embodiment, calibrating the binocular camera means determining its internal and external parameters. Calibrating the internal parameters refers to determining parameters internal to each camera, such as the focal length, principal point position and distortion. Calibrating the external parameters refers to determining the relative position and attitude between the two cameras, i.e., the rotation matrix and translation vector between them. These parameters are important for stereo matching in computer vision tasks; the external parameters are obtained by solving for the transformation between the two cameras from a set of known spatial points observed by both cameras.
Step S120: and matching the original left view and the original right view shot by the binocular camera according to the camera calibration result, and correcting the matched original left view and original right view to generate the left view and right view shot by the binocular camera.
In one embodiment, the original left view and the original right view shot by the binocular camera are matched according to the camera calibration result, and the matched views are then corrected so that, after correction, potentially matching pixels of the original left view and the original right view lie on the same horizontal epipolar line. In practice the optical axes of the two cameras may not be parallel and their optical centers may not lie in the same plane, so the Bouguet epipolar rectification method is used to correct the matched original left view and original right view and thereby generate the left view and the right view shot by the binocular camera.
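For illustration only, the following is a minimal sketch of this calibration and rectification step using OpenCV, whose stereoRectify function implements Bouguet's algorithm. The function name rectify_pair, the chessboard-style calibration inputs and the fixed-intrinsics flag are assumptions of this example and are not specified by the method.

```python
import cv2

# Assumed inputs: per-camera intrinsics K1/K2 and distortion D1/D2 from a prior
# single-camera calibration, plus matched calibration points (object_pts in 3D,
# left_pts/right_pts in pixels) and the image size (w, h).
def rectify_pair(object_pts, left_pts, right_pts, K1, D1, K2, D2, image_size,
                 raw_left, raw_right):
    # External calibration: solve the rotation R and translation T between cameras.
    _, K1, D1, K2, D2, R, T, _, _ = cv2.stereoCalibrate(
        object_pts, left_pts, right_pts, K1, D1, K2, D2, image_size,
        flags=cv2.CALIB_FIX_INTRINSIC)

    # Bouguet epipolar rectification: rotate both views so that matching pixels
    # end up on the same horizontal epipolar line.
    R1, R2, P1, P2, Q, _, _ = cv2.stereoRectify(K1, D1, K2, D2, image_size, R, T)

    map1x, map1y = cv2.initUndistortRectifyMap(K1, D1, R1, P1, image_size, cv2.CV_32FC1)
    map2x, map2y = cv2.initUndistortRectifyMap(K2, D2, R2, P2, image_size, cv2.CV_32FC1)

    left = cv2.remap(raw_left, map1x, map1y, cv2.INTER_LINEAR)
    right = cv2.remap(raw_right, map2x, map2y, cv2.INTER_LINEAR)
    return left, right, Q  # Q re-projects disparity to depth if needed later
```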
Step S200: and extracting features of the left view and the right view to construct a first scale cost volume and a second scale cost volume.
In one embodiment, before feature extraction is performed on the left view and the right view, normalization preprocessing is applied to them, followed by operations such as random cropping and contrast changes in order to improve the generalization of matching.
In one embodiment, the normalization preprocessing refers to numerically adjusting the input left view and right view so that their value ranges are the same in each dimension, in order to improve the training effect and stability of the model and reduce overfitting. Generalization refers to how well a machine learning model performs on data it has not seen; a model with good generalization can handle new data effectively rather than merely fitting the training data, so generalization performance is an important measure of model quality. Random cropping refers to randomly cropping regions of the left view and the right view to increase the diversity of the data and the generalization ability of the model. The contrast change refers to randomly adjusting the contrast of the left view and the right view so that the model adapts better to different illumination conditions and background noise.
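As a rough illustration of this preprocessing, the sketch below assumes PyTorch-style image tensors of shape [3, h, w]; the crop size and contrast range are arbitrary example values rather than values taken from the method.

```python
import torch

def preprocess_pair(left, right, crop_hw=(256, 512)):
    # Per-channel normalization so both views share a comparable value range.
    def normalize(img):
        mean = img.mean(dim=(1, 2), keepdim=True)
        std = img.std(dim=(1, 2), keepdim=True).clamp_min(1e-6)
        return (img - mean) / std

    left, right = normalize(left), normalize(right)

    # Identical random crop for both views, so corresponding pixels stay on the
    # same rows and the ground-truth disparity remains valid.
    _, h, w = left.shape
    ch, cw = crop_hw
    top = torch.randint(0, h - ch + 1, (1,)).item()
    lft = torch.randint(0, w - cw + 1, (1,)).item()
    left = left[:, top:top + ch, lft:lft + cw]
    right = right[:, top:top + ch, lft:lft + cw]

    # Random contrast change to simulate different illumination conditions.
    factor = 0.8 + 0.4 * torch.rand(1).item()
    return left * factor, right * factor
```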
After the above processing, features are extracted from the left view and the right view to construct a first scale cost volume and a second scale cost volume, where the scale of the first scale cost volume is larger than that of the second scale cost volume. In one embodiment, the first scale cost volume is a (b, c, 1/4d, 1/4h, 1/4w) cost volume and the second scale cost volume is a (b, c, 1/16d, 1/16h, 1/16w) cost volume, where b represents the number of image pairs fed into the neural network each time, c represents the number of feature channels, d represents a preset maximum parallax value, h represents the height of the left view and the right view, and w represents the width of the left view and the right view.
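The description does not state how a cost volume is assembled from the extracted features. As one common possibility, the sketch below builds a concatenation-based cost volume at a single scale (which would give 2c rather than c feature channels); this construction is an illustrative assumption, not the patented formulation.

```python
import torch

def build_cost_volume(feat_left, feat_right, max_disp):
    """feat_left/feat_right: [b, c, h, w] features at 1/4 (or 1/16) resolution;
    max_disp: the disparity range at that scale (e.g. d/4 or d/16).
    Returns a [b, 2c, max_disp, h, w] cost volume."""
    b, c, h, w = feat_left.shape
    volume = feat_left.new_zeros(b, 2 * c, max_disp, h, w)
    for disp in range(max_disp):
        if disp == 0:
            volume[:, :c, disp] = feat_left
            volume[:, c:, disp] = feat_right
        else:
            # Pair each left pixel with the right pixel shifted by `disp`.
            volume[:, :c, disp, :, disp:] = feat_left[:, :, :, disp:]
            volume[:, c:, disp, :, disp:] = feat_right[:, :, :, :-disp]
    return volume
```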
Step S300: and carrying out cost aggregation on the first scale cost volume and the second scale cost volume.
Please refer to fig. 3, which is a schematic diagram of cost aggregation of the first scale cost volume and the second scale cost volume, wherein the upper branch in fig. 3 is a branch corresponding to the second scale cost volume, and the lower branch in fig. 3 is a branch corresponding to the first scale cost volume.
Referring to fig. 4, in one embodiment, performing step S300 to aggregate costs of the first scale cost volume and the second scale cost volume includes the following steps.
Step S310: downsampling the first scale cost volume to generate a first fused branch.
In one embodiment, when the first scale cost volume is downsampled, the scale of the first scale cost volume is changed from 1/4d,1/4h,1/4w to 1/8d,1/8h,1/8w, and then from 1/8d,1/8h,1/8w to 1/16d,1/16h,1/16w, so that the generated first fusion branch is the 1/16d,1/16h,1/16w cost volume, wherein d represents a preset maximum parallax value, h represents the heights of the left view and the right view, and w represents the widths of the left view and the right view.
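A possible reading of this downsampling step in PyTorch is sketched below; the use of stride-2 3D convolutions for each halving, and the kernel size, normalization and activation, are assumptions, since the description only fixes the scale schedule 1/4 → 1/8 → 1/16.

```python
import torch.nn as nn

class FirstBranchDownsample(nn.Module):
    """Reduce a [b, c, d/4, h/4, w/4] cost volume to [b, c, d/16, h/16, w/16]
    in two stride-2 steps, producing the first fusion branch."""
    def __init__(self, channels):
        super().__init__()
        self.down = nn.Sequential(
            # 1/4 -> 1/8 in the disparity, height and width dimensions
            nn.Conv3d(channels, channels, kernel_size=3, stride=2, padding=1),
            nn.BatchNorm3d(channels), nn.ReLU(inplace=True),
            # 1/8 -> 1/16
            nn.Conv3d(channels, channels, kernel_size=3, stride=2, padding=1),
            nn.BatchNorm3d(channels), nn.ReLU(inplace=True),
        )

    def forward(self, first_scale_volume):
        return self.down(first_scale_volume)
```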
Step S320: the second scale cost volume is convolved to generate a second fused branch that maintains the scale of the second scale cost volume.
In one embodiment, the second scale cost volume is convolved with a two-layer 3D convolution of step size 1 to generate a second fused branch that maintains the scale of the second scale cost volume.
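A direct reading of this step is sketched below; only the two layers and the stride of 1 are specified, so the kernel size, normalization and activation are assumptions.

```python
import torch.nn as nn

def make_second_branch(channels):
    # Two 3D convolutions with stride 1 keep the second scale cost volume at its
    # original 1/16 resolution while refining it into the second fusion branch.
    return nn.Sequential(
        nn.Conv3d(channels, channels, kernel_size=3, stride=1, padding=1),
        nn.BatchNorm3d(channels), nn.ReLU(inplace=True),
        nn.Conv3d(channels, channels, kernel_size=3, stride=1, padding=1),
        nn.BatchNorm3d(channels), nn.ReLU(inplace=True),
    )
```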
Step S330: the first fused branch and the second fused branch are fused to generate an initial fused cost volume.
Please refer to fig. 5, which is a schematic diagram of the first fusion branch and the second fusion branch; the upper half of fig. 5 is the first fusion branch and the lower half is the second fusion branch. In one embodiment, when the first fusion branch and the second fusion branch are fused, both branches are subjected to maximum pooling and average pooling along the parallax dimension D, so that the 2D features of the first fusion branch and the 2D features of the second fusion branch are obtained correspondingly. The 2D features of the first fusion branch and the 2D features of the second fusion branch are then input into a 2D convolution layer to generate, correspondingly, a weight map of the first fusion branch and a weight map of the second fusion branch. Finally, the weight map of the first fusion branch and the weight map of the second fusion branch are summed to generate an initial fusion cost volume. In one embodiment, the initial fusion cost volume is a 1/16d, 1/16h, 1/16w cost volume, where d represents a preset maximum parallax value, h represents the height of the left view and the right view, and w represents the width of the left view and the right view.
In one embodiment, an attention mechanism is introduced during the fusion process to guide the convolutional neural network to select important matching information at different scales, so that the fusion branches corresponding to the cost volumes of the two scales are fully utilized. The first fusion branch and the second fusion branch are max-pooled and average-pooled along the parallax dimension to obtain the matching information of each pixel in the parallax dimension, yielding three weight feature maps for each fusion branch: the weight feature map corresponding to maximum pooling, the weight feature map corresponding to average pooling, and the weight feature map corresponding to maximum pooling − average pooling. The three weight feature maps of each branch are spliced and input into a 2D convolution layer to obtain a weight map for each pixel. The weights of pixels at the same position in the two fusion branches are then summed to generate the initial fusion cost volume. The calculation is specifically given by the following formulas:
M(c,h,w) = Maxpool(Vk(c,d,h,w))
wherein M(c,h,w) represents the weight feature map corresponding to maximum pooling, Maxpool represents the maximum pooling calculation, Vk(c,d,h,w) represents the cost volume of the k-th fusion branch, c represents a channel, d represents the preset maximum parallax value, h represents the height of the left view and the right view, and w represents the width of the left view and the right view.
A(c,h,w) = Avgpool(Vk(c,d,h,w))
wherein A(c,h,w) represents the weight feature map corresponding to average pooling and Avgpool represents the average pooling calculation; the remaining symbols are as defined above.
I(3c,h,w) = Concat(M(c,h,w), A(c,h,w), M(c,h,w) − A(c,h,w))
wherein I(3c,h,w) represents the spliced feature map of the three weight maps, Concat represents feature concatenation, and the third term is the weight feature map corresponding to maximum pooling − average pooling.
V(c,d,h,w) = V1(c,d,h,w) ⊙ PWNet(I1(3c,h,w)) + V2(c,d,h,w) ⊙ PWNet(I2(3c,h,w))
wherein V(c,d,h,w) represents the initial fusion cost volume, V1(c,d,h,w) and V2(c,d,h,w) represent the cost volumes of the first fusion branch and the second fusion branch, ⊙ represents element-wise multiplication of a branch with its weight map, PWNet represents the 2D convolutional neural network that produces the pixel-wise weight maps, and I1(3c,h,w) and I2(3c,h,w) represent the spliced feature maps of the three weight maps of the first fusion branch and the second fusion branch respectively; c represents a channel, d the preset maximum parallax value, h the height of the left view and the right view, and w the width of the left view and the right view.
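Putting the formulas above together, the following PyTorch-style sketch shows one way the adaptive fusion could be realized. The single-convolution PWNet with a sigmoid, the sharing of PWNet between the two branches, and the use of subtraction for the maximum pooling − average pooling map are assumptions of this example and are not fixed by the description.

```python
import torch
import torch.nn as nn

class AdaptiveFusion(nn.Module):
    """Fuse two cost-volume branches of equal size [b, c, d, h, w] with
    pixel-wise weights learned from statistics along the disparity dimension."""
    def __init__(self, channels):
        super().__init__()
        # 2D network mapping the 3c-channel pooled statistics to per-pixel weights.
        self.pwnet = nn.Sequential(
            nn.Conv2d(3 * channels, channels, kernel_size=3, padding=1),
            nn.Sigmoid(),
        )

    def _stats(self, volume):
        # Max pooling and average pooling along the disparity dimension (dim=2),
        # plus their difference, spliced into a [b, 3c, h, w] feature map I.
        m, _ = volume.max(dim=2)
        a = volume.mean(dim=2)
        return torch.cat([m, a, m - a], dim=1)

    def forward(self, branch1, branch2):
        w1 = self.pwnet(self._stats(branch1)).unsqueeze(2)  # [b, c, 1, h, w]
        w2 = self.pwnet(self._stats(branch2)).unsqueeze(2)
        # Weighted sum of the two branches gives the (initial) fusion cost volume.
        return branch1 * w1 + branch2 * w2
```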
Step S340: updating the initial fusion cost volume to generate a fusion cost volume.
Referring to fig. 6, in one embodiment, after the initial fusion cost volume is generated, the initial fusion cost volume is updated, and the whole updating process includes the following steps.
Step S341: and sequentially up-sampling and down-sampling the initial fusion cost volume, and updating the first fusion branch according to the down-sampling result.
In one embodiment, the initial fusion cost volume is 1/16d,1/16h,1/16w cost volume, so that the initial fusion cost volume is up-sampled, the scale of the initial fusion cost volume is changed from 1/16d,1/16h,1/16w to 1/8d,1/8h,1/8w, and then from 1/8d,1/8h,1/8w to 1/4d,1/4h,1/4w. After the up-sampling is completed, the down-sampling is performed, the scale of the initial fusion cost volume is changed from 1/4d,1/4h,1/4w to 1/8d,1/8h,1/8w cost volume, and then from 1/8d,1/8h,1/8w to 1/16d,1/16h,1/16w, wherein d represents a preset maximum parallax value, h represents the heights of the left view and the right view, and w represents the widths of the left view and the right view. After the initial fusion cost volume finishes up-sampling and down-sampling once, updating the first fusion branch according to the down-sampling result, wherein the updated first fusion branch is still 1/16d,1/16h and 1/16w cost volume, but the up-sampling and the down-sampling of one round are already carried out, and at the moment, the updated first fusion branch better captures detailed information in the left view and the right view, so that the understanding capability and the recognition accuracy of the left view and the right view are further improved.
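One round of this update could look like the hourglass sketched below; transposed 3D convolutions for upsampling and strided 3D convolutions for downsampling are assumptions, since the description only fixes the 1/16 → 1/8 → 1/4 → 1/8 → 1/16 scale schedule.

```python
import torch.nn as nn

class FusedVolumeHourglass(nn.Module):
    """One update round of the fused cost volume: upsample 1/16 -> 1/8 -> 1/4,
    then downsample 1/4 -> 1/8 -> 1/16 to obtain the updated first fusion branch."""
    def __init__(self, channels):
        super().__init__()
        def up():
            return nn.Sequential(
                nn.ConvTranspose3d(channels, channels, kernel_size=4, stride=2, padding=1),
                nn.BatchNorm3d(channels), nn.ReLU(inplace=True))
        def down():
            return nn.Sequential(
                nn.Conv3d(channels, channels, kernel_size=3, stride=2, padding=1),
                nn.BatchNorm3d(channels), nn.ReLU(inplace=True))
        self.up1, self.up2 = up(), up()          # 1/16 -> 1/8 -> 1/4
        self.down1, self.down2 = down(), down()  # 1/4 -> 1/8 -> 1/16

    def forward(self, fused_volume):
        x = self.up2(self.up1(fused_volume))
        return self.down2(self.down1(x))
```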
Step S342: the second scale cost volume is convolved to maintain a second fused branch.
In one embodiment, while the first fusion branch is updated, the second scale cost volume is still convolved with the two-layer 3D convolution of stride 1, thereby maintaining the second fusion branch.
Step S343: and fusing the updated first fused branch and the updated second fused branch to update the initial fused cost roll.
In one embodiment, the updated first fusion branch and the second fusion branch are fused, which completes the update of the initial fusion cost volume; the fusion process is the same as that used to generate the initial fusion cost volume in step S330. Steps S341-S343 are repeated until the set number of times is reached, and the initial fusion cost volume finally output by step S343 is output as the fusion cost volume. In one embodiment, the set number of times is 2, i.e., two update cycles are performed.
Step S400: and generating a dense parallax map according to the fusion cost volume so as to calculate the parallax of the left view and the right view.
The stereo matching method first acquires a left view and a right view from the binocular camera, then performs feature extraction on them to construct a first scale cost volume and a second scale cost volume, and then downsamples the first scale cost volume to generate a first fusion branch; the second scale cost volume is convolved to generate a second fusion branch that maintains the scale of the second scale cost volume. The first fusion branch and the second fusion branch are fused to generate an initial fusion cost volume, which is then updated by sequential upsampling and downsampling. These downsampling and upsampling operations change the scale of the initial fusion cost volume from large to small and then from small to large, so that detailed information in the left view and the right view is captured better and the understanding and recognition accuracy for the two views is improved. The initial fusion cost volume obtained after the set number of updates is output as the fusion cost volume, and the parallaxes of the left view and the right view are calculated from the dense parallax map generated from the fusion cost volume. By learning cost volumes of different scales and fusing the information obtained from them, the stereo matching method improves the accuracy of its matching effect.
Those skilled in the art will appreciate that all or part of the functions of the various methods in the above embodiments may be implemented by hardware or by a computer program. When all or part of the functions in the above embodiments are implemented by means of a computer program, the program may be stored in a computer readable storage medium, which may include read-only memory, random access memory, a magnetic disk, an optical disk, a hard disk, and the like; the above functions are realized when the program is executed by a computer. For example, the program may be stored in the memory of a device, and all or part of the functions described above are realized when the program in the memory is executed by a processor. The program may also be stored in a storage medium such as a server, another computer, a magnetic disk, an optical disk, a flash disk, or a removable hard disk, and may be downloaded or copied into the memory of a local device, or used to update the system version of the local device; the above functions are then realized when the program in the memory is executed by a processor.
The foregoing description of the invention has been presented for purposes of illustration and description, and is not intended to be limiting. Several simple deductions, modifications or substitutions may also be made by a person skilled in the art to which the invention pertains, based on the idea of the invention.

Claims (10)

1. The self-adaptive fusion stereo matching method is characterized by comprising the following steps:
acquiring a left view and a right view shot by a binocular camera;
feature extraction is carried out on the left view and the right view to construct a first scale cost volume and a second scale cost volume, and the scale of the first scale cost volume is larger than that of the second scale cost volume;
downsampling the first scale cost volume to generate a first fused branch; convolving the second scale cost volume to generate a second fused branch that maintains the scale of the second scale cost volume; fusing the first fused branch and the second fused branch to generate an initial fused cost volume;
the method comprises the steps of obtaining matching information of a single pixel point in a parallax dimension by using an attention mechanism, and correspondingly determining a maximum pooled weight feature map, an average pooled weight feature map and a maximum pooled-average pooled weight feature map of a first fused branch and a second fused branch; inputting weight feature graphs corresponding to the first fusion branch and the second fusion branch into a 2D convolution layer to correspondingly determine the 2D features of the first fusion branch and the 2D features of the second fusion branch; inputting the 2D characteristics of the first fusion branch and the 2D characteristics of the second fusion branch into a 2D convolution layer to correspondingly generate a weight map of the first fusion branch and a weight map of the second fusion branch; summing the weight map of the first fusion branch and the weight map of the second fusion branch to generate an initial fusion cost volume;
updating the initial fusion cost volume to generate a fusion cost volume, wherein the updating of the initial fusion cost volume comprises the following steps:
sequentially upsampling and downsampling the initial fusion cost volume to update the first fusion branch; convolving the second scale cost volume to maintain the second fusion branch; fusing the updated first fusion branch and the second fusion branch to update the initial fusion cost volume;
repeating the updating of the initial fusion cost volume until the set times are reached, and outputting the initial fusion cost volume as a fusion cost volume;
and generating a dense parallax map according to the fusion cost volume so as to calculate the parallax of the left view and the right view.
2. The method of stereo matching for adaptive fusion of claim 1, wherein the first scale cost volume comprises 1/4d,1/4h,1/4w cost volumes; the second scale cost volume comprises 1/16d,1/16h and 1/16w cost volumes; wherein d represents a preset maximum parallax value, h represents the heights of the left view and the right view, and w represents the widths of the left view and the right view.
3. The method of stereo matching for adaptive fusion of claim 2, wherein downsampling the first scale cost volume to generate a first fused branch comprises:
when the first scale cost volume is downsampled, the first scale cost volume is changed from 1/4d,1/4h,1/4w cost volume to 1/8d,1/8h and 1/8w cost volume, and then is changed from 1/8d,1/8h,1/8w cost volume to 1/16d,1/16h and 1/16w cost volume; the first fusion branch comprises 1/16d,1/16h and 1/16w cost volumes; wherein d represents a preset maximum parallax value, h represents the heights of the left view and the right view, and w represents the widths of the left view and the right view.
4. The method of stereo matching for adaptive fusion of claim 2, wherein convolving the second scale cost volume to generate a second fused branch that maintains the scale of the second scale cost volume, comprises:
and convolving the second scale cost volume by using a two-layer 3D convolution with a step length of 1 to generate a second fusion branch maintaining the scale of the second scale cost volume.
5. The method for stereo matching for adaptive fusion according to claim 2,
the initial fusion cost volume comprises 1/16d,1/16h and 1/16w cost volumes; wherein d represents a preset maximum parallax value, h represents the heights of the left view and the right view, and w represents the widths of the left view and the right view.
6. The stereo matching method of adaptive fusion of claim 5, wherein the sequentially upsampling and downsampling the initial fusion cost volume comprises:
when the initial fusion cost volume is up-sampled, the initial fusion cost volume is changed from 1/16d,1/16h,1/16w cost volume to 1/8d,1/8h and 1/8w cost volume, and then from 1/8d,1/8h and 1/8w cost volume to 1/4d,1/4h and 1/4w cost volume;
when the initial fusion cost volume is downsampled, the initial fusion cost volume is changed from 1/4d,1/4h,1/4w cost volume to 1/8d,1/8h and 1/8w cost volume, and then from 1/8d,1/8h and 1/8w cost volume to 1/16d,1/16h and 1/16w cost volume;
wherein d represents a preset maximum parallax value, h represents the heights of the left view and the right view, and w represents the widths of the left view and the right view.
7. The stereo matching method of adaptive fusion according to claim 1, wherein the set number of times is 2.
8. The method for stereo matching of adaptive fusion according to claim 1, wherein the acquiring the left view and the right view photographed by the binocular camera comprises:
calibrating the binocular camera;
and matching the original left view and the original right view shot by the binocular camera according to the camera calibration result, and correcting the matched original left view and original right view to generate the left view and the right view shot by the binocular camera.
9. The method of stereo matching for adaptive fusion according to claim 8, wherein the correcting the matched original left view and original right view to generate the left view and right view captured by the binocular camera comprises:
and correcting the matched original left view and original right view by using the Bouguet epipolar rectification method so as to generate the left view and the right view shot by the binocular camera.
10. A computer-readable storage medium, characterized in that the medium has stored thereon a program executable by a processor to implement the stereo matching method as claimed in any one of claims 1-9.
CN202311317490.3A 2023-10-12 2023-10-12 Self-adaptive fusion stereo matching method Active CN117058252B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311317490.3A CN117058252B (en) 2023-10-12 2023-10-12 Self-adaptive fusion stereo matching method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311317490.3A CN117058252B (en) 2023-10-12 2023-10-12 Self-adaptive fusion stereo matching method

Publications (2)

Publication Number Publication Date
CN117058252A CN117058252A (en) 2023-11-14
CN117058252B true CN117058252B (en) 2023-12-26

Family

ID=88663116

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311317490.3A Active CN117058252B (en) 2023-10-12 2023-10-12 Self-adaptive fusion stereo matching method

Country Status (1)

Country Link
CN (1) CN117058252B (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101852609A (en) * 2010-06-02 2010-10-06 北京理工大学 Ground obstacle detection method based on binocular stereo vision of robot
CN106355570A (en) * 2016-10-21 2017-01-25 昆明理工大学 Binocular stereoscopic vision matching method combining depth characteristics
CN111191694A (en) * 2019-12-19 2020-05-22 浙江科技学院 Image stereo matching method
EP3822910A1 (en) * 2019-11-14 2021-05-19 Samsung Electronics Co., Ltd. Depth image generation method and device
CN115170636A (en) * 2022-06-17 2022-10-11 五邑大学 Binocular stereo matching method and device for mixed cost body and storage medium
CN116740162A (en) * 2023-08-14 2023-09-12 东莞市爱培科技术有限公司 Stereo matching method based on multi-scale cost volume and computer storage medium
CN116777971A (en) * 2023-05-29 2023-09-19 北京计算机技术及应用研究所 Binocular stereo matching method based on horizontal deformable attention module

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111210481A (en) * 2020-01-10 2020-05-29 大连理工大学 Depth estimation acceleration method of multiband stereo camera

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101852609A (en) * 2010-06-02 2010-10-06 北京理工大学 Ground obstacle detection method based on binocular stereo vision of robot
CN106355570A (en) * 2016-10-21 2017-01-25 昆明理工大学 Binocular stereoscopic vision matching method combining depth characteristics
EP3822910A1 (en) * 2019-11-14 2021-05-19 Samsung Electronics Co., Ltd. Depth image generation method and device
CN111191694A (en) * 2019-12-19 2020-05-22 浙江科技学院 Image stereo matching method
CN115170636A (en) * 2022-06-17 2022-10-11 五邑大学 Binocular stereo matching method and device for mixed cost body and storage medium
CN116777971A (en) * 2023-05-29 2023-09-19 北京计算机技术及应用研究所 Binocular stereo matching method based on horizontal deformable attention module
CN116740162A (en) * 2023-08-14 2023-09-12 东莞市爱培科技术有限公司 Stereo matching method based on multi-scale cost volume and computer storage medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Binocular image super-resolution reconstruction based on a multi-level fusion attention network; Xu Lei et al.; Journal of Image and Graphics (No. 4); pp. 1079-1090 *

Also Published As

Publication number Publication date
CN117058252A (en) 2023-11-14

Similar Documents

Publication Publication Date Title
CN108961327B (en) Monocular depth estimation method and device, equipment and storage medium thereof
CN110909693B (en) 3D face living body detection method, device, computer equipment and storage medium
CN109919993B (en) Parallax map acquisition method, device and equipment and control system
US20020015048A1 (en) System and method for median fusion of depth maps
KR100681320B1 (en) Method for modelling three dimensional shape of objects using level set solutions on partial difference equation derived from helmholtz reciprocity condition
JP2010513907A (en) Camera system calibration
WO2012100225A1 (en) Systems and methods for generating a three-dimensional shape from stereo color images
JP2021196951A (en) Image processing apparatus, image processing method, program, method for manufacturing learned model, and image processing system
US8433187B2 (en) Distance estimation systems and method based on a two-state auto-focus lens
CN110782412A (en) Image processing method and device, processor, electronic device and storage medium
US20230410341A1 (en) Passive and single-viewpoint 3d imaging system
CN115147709B (en) Underwater target three-dimensional reconstruction method based on deep learning
CN112509021A (en) Parallax optimization method based on attention mechanism
CN114170290A (en) Image processing method and related equipment
CN110070610B (en) Feature point matching method, and feature point matching method and device in three-dimensional reconstruction process
CN114742875A (en) Binocular stereo matching method based on multi-scale feature extraction and self-adaptive aggregation
CN110335228B (en) Method, device and system for determining image parallax
CN113313740B (en) Disparity map and surface normal vector joint learning method based on plane continuity
CN113034666B (en) Stereo matching method based on pyramid parallax optimization cost calculation
CN112270701B (en) Parallax prediction method, system and storage medium based on packet distance network
CN117058252B (en) Self-adaptive fusion stereo matching method
WO2009099117A1 (en) Plane parameter estimating device, plane parameter estimating method, and plane parameter estimating program
CN117058183A (en) Image processing method and device based on double cameras, electronic equipment and storage medium
CN112950653B (en) Attention image segmentation method, device and medium
CN114782507A (en) Asymmetric binocular stereo matching method and system based on unsupervised learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant