CN116703999A - Residual fusion method for binocular stereo matching - Google Patents

Residual fusion method for binocular stereo matching

Info

Publication number
CN116703999A
CN116703999A CN202310972969.4A
Authority
CN
China
Prior art keywords
cost volume
volume
cost
stereo matching
residual
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310972969.4A
Other languages
Chinese (zh)
Inventor
俞正中
翟聚才
钱刃
杨文帮
赵勇
李福池
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Dongguan Aipeike Technology Co ltd
Original Assignee
Dongguan Aipeike Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Dongguan Aipeike Technology Co ltd filed Critical Dongguan Aipeike Technology Co ltd
Priority to CN202310972969.4A priority Critical patent/CN116703999A/en
Publication of CN116703999A publication Critical patent/CN116703999A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/50Depth or shape recovery
    • G06T7/55Depth or shape recovery from multiple images
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/0464Convolutional networks [CNN, ConvNet]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/60Analysis of geometric attributes
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • G06V10/44Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • G06V10/443Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components by matching or filtering
    • G06V10/449Biologically inspired filters, e.g. difference of Gaussians [DoG] or Gabor filters
    • G06V10/451Biologically inspired filters, e.g. difference of Gaussians [DoG] or Gabor filters with interaction between the filter responses, e.g. cortical complex cells
    • G06V10/454Integrating the filters into a hierarchical structure, e.g. convolutional neural networks [CNN]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/74Image or video pattern matching; Proximity measures in feature spaces
    • G06V10/75Organisation of the matching processes, e.g. simultaneous or sequential comparisons of image or video features; Coarse-fine approaches, e.g. multi-scale approaches; using context analysis; Selection of dictionaries
    • G06V10/757Matching configurations of points or features
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20228Disparity calculation for image-based rendering
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00Road transport of goods or passengers
    • Y02T10/10Internal combustion engine [ICE] based vehicles
    • Y02T10/40Engine management systems

Abstract

A residual fusion method for binocular stereo matching relates to the field of stereo matching. The method comprises the following steps: respectively acquiring image features of a left view and a right view of the binocular camera; performing point-by-point correlation on the image features of the left view and the right view to construct cost volumes of a plurality of set scales; performing a nonlinear operation on each set-scale cost volume to correspondingly obtain a first cost volume, and performing a linear operation on the first cost volume to correspondingly obtain a second cost volume; fitting the second cost volume by using an attention module to correspondingly obtain a third cost volume; upsampling the third cost volume to a first set resolution to obtain a fourth cost volume; taking the difference of the third cost volume and the fourth cost volume to obtain a residual cost volume; fusing the residual cost volume to a cost volume of a corresponding set scale to obtain a parallax regression map; up-sampling the parallax regression map to a second set resolution to obtain a parallax map; and estimating geometric information of objects in the left view and the right view using the parallax map.

Description

Residual fusion method for binocular stereo matching
Technical Field
The application relates to the field of stereo matching, in particular to a residual fusion method for binocular stereo matching.
Background
In stereo matching for binocular vision, a key problem is to find corresponding points in the left and right images so as to obtain the horizontal position difference of corresponding pixels between the two images, also called the parallax (disparity). The depth of each pixel can then be calculated directly from the parallax and the parameters of the binocular camera.
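For a rectified binocular rig this calculation is depth = f·B/d, where f is the focal length in pixels, B is the baseline, and d is the parallax. A minimal illustrative sketch (function and variable names are our own, not from the patent):

```python
import numpy as np

def disparity_to_depth(disparity, focal_px, baseline_m, eps=1e-6):
    """Convert a disparity map (in pixels) to metric depth for a rectified
    stereo pair via depth = f * B / d; non-positive disparities are masked
    to avoid division by zero."""
    disparity = np.asarray(disparity, dtype=np.float32)
    depth = np.zeros_like(disparity)
    valid = disparity > eps
    depth[valid] = focal_px * baseline_m / disparity[valid]
    return depth
```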
Currently, most methods use convolutional neural networks for stereo matching, and the network model for stereo matching generally comprises four parts: feature extraction, cost computation, cost aggregation, and parallax regression. Some network models use 3D convolution during cost aggregation, which achieves high accuracy but introduces many more floating-point operations, runs slowly, and is difficult to deploy in real-time applications. Other network models use 2D convolution during cost aggregation, but 2D convolution is less accurate in depth estimation, which reduces the applicability of binocular stereo matching networks.
Disclosure of Invention
The application mainly solves the following technical problem: providing a high-precision residual fusion method for binocular stereo matching.
According to a first aspect, an embodiment provides a residual fusion method for binocular stereo matching, comprising:
respectively acquiring image features of a left view and a right view of the binocular camera;
performing point-by-point correlation on the image features of the left view and the right view to construct cost volumes with a plurality of set scales;
performing nonlinear operation on each cost volume with a set scale to correspondingly obtain a first cost volume, and performing linear operation on the first cost volume to correspondingly obtain a second cost volume;
fitting the second cost volume by using an attention module to correspondingly obtain a third cost volume;
upsampling the third cost volume to a first set resolution to obtain a fourth cost volume; taking the difference of the third cost volume and the fourth cost volume to obtain a residual cost volume; fusing the residual cost volume to the third cost volume to obtain a parallax feature map; fusing the parallax feature map to a cost volume of a corresponding set scale to obtain a parallax regression map;
up-sampling the parallax regression map to a second set resolution to obtain a parallax map; and estimating geometric information of objects in the left view and the right view by using the parallax map.
In one embodiment, the capturing image features of the left view and the right view of the binocular camera respectively includes:
respectively extracting features from images of different areas of the left view and the right view by using the same convolution kernel, so as to correspondingly obtain the image features of the left view and the right view.
In one embodiment, the performing point-by-point correlation on the image features of the left view and the right view to construct cost volumes of several set scales includes:
the set-scale cost volumes include: 1/3Dmax × 1/3H × 1/3W, 1/6Dmax × 1/6H × 1/6W, and 1/12Dmax × 1/12H × 1/12W;
where Dmax denotes a maximum parallax range, H denotes original heights of the left and right views, and W denotes original widths of the left and right views.
In one embodiment, the fitting the second cost volume with the attention module to obtain a third cost volume includes:
fitting the image features of the points surrounding each point in the second cost volume to that point to obtain a third cost volume, so as to strengthen the connection between each point in the second cost volume and its surrounding points.
In one embodiment, the upsampling the third cost volume to the first set resolution includes:
carrying out point-by-point convolution on the third cost volume to enlarge its channel number, and up-sampling the third cost volume with the enlarged channel number to the first set resolution by nearest-neighbor interpolation.
In one embodiment, the first set resolution comprises 1/2 resolution.
In one embodiment, the fusing of the parallax feature map to a cost volume of a corresponding set scale includes:
adjusting the cost volume of the set scale to the scale of the parallax feature map through a set convolution layer, and fusing the image feature of each point in the parallax feature map to the scale-adjusted cost volume.
In one embodiment, the set convolution layer comprises a 3×3 convolution layer with a step size of 2.
In one embodiment, the second set resolution includes original resolutions of left and right views.
According to a second aspect, an embodiment provides a computer readable storage medium having stored thereon a program executable by a processor to implement the above-described residual fusion method for binocular stereo matching.
According to the residual fusion method and the computer-readable storage medium for binocular stereo matching of the embodiments, cost volumes of different set scales are constructed from the image features of the left and right views, so that the parallax map can contain information at different scales. Each set-scale cost volume is then processed with a nonlinear operation and a linear operation, so that higher-order image information can be obtained and a more accurate parallax map produced. The attention module emphasizes different parts of each cost volume and also facilitates further cross-scale aggregation. Finally, fusion by up-sampling and down-sampling gives the fused image features a stronger discriminative capability.
Drawings
FIG. 1 is a first flowchart of a residual fusion method for binocular stereo matching in one embodiment;
FIG. 2 is a second flowchart of a residual fusion method for binocular stereo matching in one embodiment;
FIG. 3 is a schematic block diagram of a residual fusion method for binocular stereo matching in one embodiment;
FIG. 4 is a third flowchart of a residual fusion method for binocular stereo matching in one embodiment;
FIG. 5 is a fourth flowchart of a residual fusion method for binocular stereo matching in one embodiment.
Detailed Description
The application will be described in further detail below with reference to the drawings by means of specific embodiments, wherein like elements in different embodiments are given like associated numbers. In the following embodiments, numerous specific details are set forth in order to provide a better understanding of the present application. However, one skilled in the art will readily recognize that some of the features may be omitted, or replaced by other elements, materials, or methods, in different situations. In some instances, operations related to the present application are not shown or described in the specification, in order to avoid obscuring its core; a detailed description of such operations is unnecessary for persons skilled in the art, who can fully understand them from the description herein together with their general knowledge.
Furthermore, the described features, operations, or characteristics of the description may be combined in any suitable manner in various embodiments. Also, various steps or acts in the method descriptions may be interchanged or modified in a manner apparent to those of ordinary skill in the art. Thus, the various orders in the description and drawings are for clarity of description of only certain embodiments, and are not meant to be required orders unless otherwise indicated.
The numbering of components herein, e.g. "first", "second", etc., is used merely to distinguish the described objects and does not carry any ordinal or technical meaning. The term "coupled", as used herein, includes both direct and indirect coupling, unless otherwise indicated.
An embodiment of the application provides a residual fusion method for binocular stereo matching, which consists of four parts: feature extraction, cost volume construction, cost aggregation, and parallax refinement. Please refer to fig. 1; the method specifically includes the following steps.
In the feature extraction stage, step S100 is adopted: respectively acquiring the image features of the left view and the right view of the binocular camera.
In some embodiments, when performing step S100 to obtain the image features of the left view and the right view of the binocular camera, please refer to fig. 2, the following steps are further included.
Step S110: and respectively extracting the characteristics of the images of the different areas of the left view and the right view by using the same convolution kernel so as to correspondingly obtain the image characteristics of the left view and the right view.
In some embodiments, a stacked hourglass extractor is used to extract features from images of different areas of the left and right views. The stacked hourglass extractor is formed by stacking a plurality of "hourglass" modules and includes a plurality of convolution kernels. Features of the left view and the right view are extracted with the same convolution kernels; because the stacked hourglass extractor contains multiple convolution kernels, the extracted image features cover different scales. Features of the left and right views are then spliced across scales using dense connections, so as to correspondingly obtain the image features of the left view and the right view.
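As a rough, illustrative sketch of one hourglass stage with weights shared between the two views (PyTorch; module names and channel counts are assumptions, and the dense cross-scale connections are omitted):

```python
import torch.nn as nn

class HourglassStage(nn.Module):
    """One encoder-decoder ("hourglass") stage; the extractor stacks several
    of these. Assumes even spatial dimensions so the decoded map matches x."""
    def __init__(self, ch):
        super().__init__()
        self.down = nn.Conv2d(ch, ch, 3, stride=2, padding=1)          # to 1/2 scale
        self.mid = nn.Conv2d(ch, ch, 3, stride=1, padding=1)
        self.up = nn.ConvTranspose2d(ch, ch, 4, stride=2, padding=1)   # back to full scale
        self.act = nn.ReLU(inplace=True)

    def forward(self, x):
        d = self.act(self.down(x))
        m = self.act(self.mid(d))
        return self.act(self.up(m)) + x  # skip connection at the input scale

def extract_features(extractor, left, right):
    """The same convolution kernels (shared weights) process both views."""
    return extractor(left), extractor(right)
```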
In the cost volume construction stage, step S200 is adopted: the image features of the left view and the right view are correlated point by point to construct a cost volume of a plurality of set scales.
In some embodiments, each corresponding pixel point in the left view and the right view is correlated point by point, so as to construct a plurality of cost volumes with set scales.
In some embodiments, the set-scale cost volumes include 1/3Dmax × 1/3H × 1/3W, 1/6Dmax × 1/6H × 1/6W, and 1/12Dmax × 1/12H × 1/12W, where Dmax denotes the maximum parallax range, H denotes the original height of the left and right views, and W denotes the original width of the left and right views. Constructed cost volumes of different scales capture different cost information, so that a parallax map of higher precision can be constructed.
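A minimal sketch of building one such cost volume by point-by-point correlation (averaging over feature channels is an assumed correlation measure; the patent does not spell out the exact operation):

```python
import torch

def correlation_cost_volume(feat_l, feat_r, max_disp):
    """Point-by-point correlation between left/right features at one scale.

    feat_l, feat_r: [B, C, H, W]; returns [B, max_disp, H, W], where the right
    features are shifted by d pixels before correlating at disparity d. Called
    once per scale, e.g. Dmax/3 disparities on 1/3-resolution features."""
    b, c, h, w = feat_l.shape
    volume = feat_l.new_zeros(b, max_disp, h, w)
    for d in range(max_disp):
        if d == 0:
            volume[:, d] = (feat_l * feat_r).mean(dim=1)
        else:
            volume[:, d, :, d:] = (feat_l[..., d:] * feat_r[..., :-d]).mean(dim=1)
    return volume
```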
In the cost aggregation stage, step S300 is adopted: determining a parallax regression map from the cost volumes of different set scales. Referring to fig. 3, step S300 uses three sub-modules for this purpose, namely a combination module 300a, an attention module 300b, and a residual fusion module 300c.
In some embodiments, referring to fig. 4, determining the parallax regression map from the cost volumes of different set scales in step S300 includes the following steps.
Step S310 is performed in the combination module 300a: carrying out a nonlinear operation on each set-scale cost volume to correspondingly obtain a first cost volume, and carrying out a linear operation on the first cost volume to correspondingly obtain a second cost volume.
In some embodiments, each constructed set-scale cost volume is processed using a residual structure consisting of a 1×1 pointwise convolution and a 3×3 depthwise convolution. A nonlinear operation is applied to each set-scale cost volume with a ReLU activation function to obtain the first cost volume corresponding to each set-scale cost volume, and each first cost volume is then processed with a linear transformation to obtain the corresponding second cost volume.
Processing each constructed set-scale cost volume with a residual structure consisting of a 1×1 pointwise convolution and a 3×3 depthwise convolution means adding some extra convolution layers on top of the set-scale cost volume to form a structure similar to a residual network. This structure can enhance the expressive power of the image features and improve the accuracy and robustness of binocular stereo matching based on the parallax map. In addition, the pointwise convolution can reduce dimensionality and add nonlinearity, while the depthwise convolution increases the depth and receptive field of the network, further improving the expressive power of the image features.
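A sketch of such a residual block, treating the disparity dimension of the cost volume as the channel dimension of 2D convolutions (channel counts and the exact placement of the ReLU are assumptions consistent with the description above):

```python
import torch.nn as nn

class CombinationBlock(nn.Module):
    """Residual structure over a cost volume: 1x1 pointwise + 3x3 depthwise
    convolutions with a ReLU (nonlinear step -> "first cost volume"), followed
    by a linear transform (no activation -> "second cost volume")."""
    def __init__(self, ch):
        super().__init__()
        self.pointwise = nn.Conv2d(ch, ch, kernel_size=1)
        self.depthwise = nn.Conv2d(ch, ch, kernel_size=3, padding=1, groups=ch)
        self.relu = nn.ReLU(inplace=True)
        self.linear = nn.Conv2d(ch, ch, kernel_size=1)  # linear: no activation after

    def forward(self, cost):
        first = self.relu(self.depthwise(self.pointwise(cost)))  # nonlinear operation
        second = self.linear(first)                              # linear operation
        return second + cost                                     # residual connection
```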
Step S320 is performed in the attention module 300b: fitting the second cost volume with the attention module to correspondingly obtain a third cost volume.
In some embodiments, the image features of the points surrounding each point in the second cost volume are fitted to that point to obtain the third cost volume. That is, each point in the second cost volume has several points around it and may be regarded as a center point; the image features of the points around the center point are fitted to the center point, so that the center point incorporates the image features of its surrounding points.
In some embodiments, the attention module is a lightweight attention module, with which attention can be focused on the image features most relevant to the parallax map to be obtained, thereby improving the accuracy of the parallax map. Let $V \in \mathbb{R}^{C \times h \times w}$ be the input second cost volume, where $C$ is the number of input channels and $h$ and $w$ are the height and width of the second cost volume. The cost volume is divided into $g$ groups along the channel direction, written $[V_1, V_2, \ldots, V_g]$, and each group is processed separately; let $V_k$ denote one of the groups, where $1 \le k \le g$. Processing of the second cost volume with the attention module may be accomplished by the following formulas:

$$A_k = \mathrm{softmax}\big(\mathrm{PW}(\mathrm{maxpool}(V_k))\big), \qquad \tilde{V}_k = A_k \odot V_k + V_k, \qquad V' = \mathrm{Concat}(\tilde{V}_1, \ldots, \tilde{V}_g)$$

where maxpool is a 3×3 max pooling layer, PW is a point-by-point convolution, and $A_k$ is the attention map inferred from $V_k$. Each group's $A_k$ captures spatial relationships by learning cross-channel information, and the softmax activation models the probability of correlation between points. For each group, the output $\tilde{V}_k$ is obtained by element-wise multiplication and addition, and the output third cost volume $V'$ is obtained by stacking all of the highlighted $\tilde{V}_k$; Concat denotes the resulting increase in the number of channels.
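A sketch consistent with the formulas above (sharing one PW layer across the groups and taking the softmax over each group's channels are implementation assumptions):

```python
import torch
import torch.nn as nn

class GroupedAttention(nn.Module):
    """Lightweight grouped attention over a cost volume V in R^{C x h x w}:
    split channels into g groups, infer A_k = softmax(PW(maxpool(V_k))),
    re-weight each group, and keep a residual path before concatenation."""
    def __init__(self, channels, groups):
        super().__init__()
        assert channels % groups == 0
        self.groups = groups
        gc = channels // groups
        self.pool = nn.MaxPool2d(3, stride=1, padding=1)  # 3x3 max pooling
        self.pw = nn.Conv2d(gc, gc, kernel_size=1)        # point-by-point convolution

    def forward(self, v):
        outs = []
        for v_k in torch.chunk(v, self.groups, dim=1):
            a_k = torch.softmax(self.pw(self.pool(v_k)), dim=1)  # attention map A_k
            outs.append(a_k * v_k + v_k)  # element-wise multiplication and addition
        return torch.cat(outs, dim=1)     # Concat: stack the highlighted groups
```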
Step S330 is performed in the residual fusion module 300c: upsampling the third cost volume to the first set resolution to obtain a fourth cost volume; taking the difference of the third cost volume and the fourth cost volume to obtain a residual cost volume; fusing the residual cost volume to the third cost volume to obtain a parallax feature map; and fusing the parallax feature map to the cost volume of the corresponding set scale to obtain a parallax regression map.
Here, resolution refers to the image size, i.e. the height and width, of the input to the model. The input resolution is typically determined by the number of downsampling operations in the model and by the resolution of the feature map after the last downsampling.
In convolutional neural networks, the output of feature extraction is usually smaller than the input image, so the image sometimes needs to be restored to its original size for further computation; this operation, which maps an image from a lower resolution to a higher resolution, is called upsampling.
Referring to fig. 5, in some embodiments, step S330 upsamples the third cost volume to the first set resolution to obtain a fourth cost volume, takes the difference of the third cost volume and the fourth cost volume to obtain a residual cost volume, and fuses the residual cost volume to the cost volume of the corresponding set scale to obtain the parallax regression map; this includes the following steps:
step S331: and carrying out point-by-point convolution on the third price volume to enlarge the channel number, and up-sampling the third price volume with the enlarged channel number by a nearest neighbor interpolation method to reach the first set resolution to form a fourth price volume.
In some embodiments, the first set resolution is 1/2 resolution.
Step S332: and performing difference on the third cost volume and the fourth cost volume to obtain a residual cost volume, and processing the residual cost volume through a 3*3 convolution layer with the step length of 1 to fuse the residual cost volume to the third cost volume to obtain a parallax characteristic diagram.
In some embodiments, the parallax feature map is obtained by the following formula:

$$R_{up} = \mathrm{Upsampling}\big(\mathrm{PW}(V_l)\big), \qquad \tilde{D}_h = \mathrm{Conv}(V_h - R_{up}) + V_h$$

where $R_{up}$ represents the result of the upsampling process (nearest-neighbor Upsampling), PW is a pointwise convolution used to expand the number of channels, $V_l$ represents the low-resolution cost volume, $V_h$ represents the high-resolution cost volume, Conv represents a convolution operation, and $\tilde{D}_h$ represents the parallax features of the residual cost volume fused to the high-resolution cost volume.
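An illustrative sketch of this up-branch fusion (the sign of the residual and the channel counts are assumptions consistent with the formula above):

```python
import torch.nn as nn
import torch.nn.functional as F

class UpFusion(nn.Module):
    """R_up = Upsampling(PW(V_l)); residual = V_h - R_up; the residual is
    passed through a 3x3, stride-1 convolution and added back to V_h."""
    def __init__(self, low_ch, high_ch):
        super().__init__()
        self.pw = nn.Conv2d(low_ch, high_ch, kernel_size=1)  # enlarge channel number
        self.conv = nn.Conv2d(high_ch, high_ch, kernel_size=3, stride=1, padding=1)

    def forward(self, v_low, v_high):
        r_up = F.interpolate(self.pw(v_low), size=v_high.shape[-2:], mode="nearest")
        residual = v_high - r_up              # residual cost volume
        return v_high + self.conv(residual)   # parallax feature map
```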
Step S333: and (3) adjusting the cost volume with the set dimension into the dimension of the parallax characteristic map through a 3X 3 convolution layer with the step length of 2 by utilizing a downsampling mode, and fusing each point image characteristic in the parallax characteristic map to the cost volume with the set dimension adjusted to obtain a parallax regression map.
In some embodiments, the parallax regression map is obtained by the following formula:

$$R_{down} = \mathrm{Conv}_{stride=2}(V_h), \qquad \tilde{D}_l = \mathrm{Conv}(V_l - R_{down}) + V_l$$

where $R_{down}$ represents the result of the downsampling process, Conv represents a convolution operation, $stride=2$ denotes a convolution step size of 2 used for downsampling, $V_h$ represents the high-resolution cost volume, $V_l$ represents the low-resolution cost volume, and $\tilde{D}_l$ represents the parallax features of the residual cost volume fused to the low-resolution cost volume.
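A matching sketch of the down-branch (again, residual sign and channel counts are assumptions):

```python
import torch.nn as nn

class DownFusion(nn.Module):
    """R_down = Conv_{3x3, stride=2}(V_h); residual = V_l - R_down; the residual
    is fused back into the low-resolution volume V_l. Assumes the stride-2
    convolution lands exactly on V_l's spatial size (e.g. 1/3 -> 1/6 scale)."""
    def __init__(self, high_ch, low_ch):
        super().__init__()
        self.down = nn.Conv2d(high_ch, low_ch, kernel_size=3, stride=2, padding=1)
        self.conv = nn.Conv2d(low_ch, low_ch, kernel_size=3, stride=1, padding=1)

    def forward(self, v_high, v_low):
        r_down = self.down(v_high)           # downsampled high-resolution volume
        residual = v_low - r_down            # residual cost volume
        return v_low + self.conv(residual)   # features for the parallax regression map
```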
In some embodiments, during upsampling and downsampling the residual cost volumes concentrate on the differing information between the inputs, while the addition operation preserves the original cost volume in each branch so that no information is lost; the fused features therefore have a stronger discriminative capability than simple addition or concatenation.
In the parallax refinement stage, step S400 is employed: up-sampling the parallax regression map to a second set resolution to obtain a parallax map, and estimating geometric information of objects in the left view and the right view by using the parallax map.
The parallax map is an image that takes either the left view or the right view as its reference image, has the same size as the reference image, and whose element values are parallax values; it contains the geometric distance information of the scene. A continuous parallax map is estimated from the parallax regression map by parallax regression. In some embodiments, the second set resolution is the original resolution of the left and right views.
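A sketch of one common way to realize this step (soft-argmin regression is an assumption; the patent only says "parallax regression"):

```python
import torch
import torch.nn.functional as F

def regress_disparity(cost_volume, full_size):
    """Soft-argmin over the disparity dimension, then upsampling to the second
    set resolution (the original view size), rescaling disparities to match."""
    b, d, h, w = cost_volume.shape
    prob = torch.softmax(cost_volume, dim=1)              # per-pixel disparity distribution
    disps = torch.arange(d, device=cost_volume.device, dtype=prob.dtype).view(1, d, 1, 1)
    disp = (prob * disps).sum(dim=1, keepdim=True)        # expected (continuous) disparity
    scale = full_size[-1] / w                             # disparity scales with image width
    return F.interpolate(disp, size=full_size, mode="bilinear", align_corners=False) * scale
```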
In the residual fusion method for binocular stereo matching provided by the application, after the image features of the left view and the right view from the binocular camera are obtained, cost volumes of several set scales are constructed from the image features of the two views by point-by-point correlation. Taking the 1/3Dmax × 1/3H × 1/3W scale as an example, a nonlinear operation is performed on the 1/3Dmax × 1/3H × 1/3W cost volume to obtain the corresponding first cost volume, and a linear operation is then performed on the first cost volume to obtain the second cost volume. The second cost volume is fitted with the attention module to obtain the corresponding third cost volume, the third cost volume is upsampled to 1/2 resolution to obtain the fourth cost volume, the third and fourth cost volumes are differenced to obtain the residual cost volume, the residual cost volume is fused to the third cost volume to obtain the parallax feature map, and finally the parallax feature map is fused to the 1/3Dmax × 1/3H × 1/3W cost volume to obtain the parallax regression map. The other two set-scale cost volumes, 1/6Dmax × 1/6H × 1/6W and 1/12Dmax × 1/12H × 1/12W, are processed in the same way and are not described again here.
The residual fusion method for binocular stereo matching provided by the application combines features with different receptive fields using the combination module, emphasizes the salient areas in different cost volumes using the attention module, and, in the residual fusion module, uses gradual residual aggregation that focuses on extracting the differences between adjacent scales instead of simple stacking or addition, thereby improving the accuracy of binocular stereo matching and of the resulting parallax map.
Those skilled in the art will appreciate that all or part of the functions of the various methods in the above embodiments may be implemented by hardware or by a computer program. When all or part of the functions in the above embodiments are implemented by means of a computer program, the program may be stored in a computer-readable storage medium, which may include read-only memory, random access memory, a magnetic disk, an optical disk, a hard disk, and the like; the functions are realized when the program is executed by a computer. For example, the program may be stored in the memory of a device, and all or part of the functions described above can be realized when the program in the memory is executed by a processor. The program may also be stored in a storage medium such as a server, another computer, a magnetic disk, an optical disk, a flash disk, or a removable hard disk, and implemented by downloading or copying it into the memory of a local device, or by updating the version of the local device's system, so that when the program in the memory is executed by a processor, all or part of the functions in the above embodiments can be realized.
The foregoing description of the application has been presented for purposes of illustration and description, and is not intended to be limiting. Several simple deductions, modifications or substitutions may also be made by a person skilled in the art to which the application pertains, based on the idea of the application.

Claims (10)

1. A residual fusion method for binocular stereo matching, comprising:
respectively acquiring image features of a left view and a right view of the binocular camera;
performing point-by-point correlation on the image features of the left view and the right view to construct cost volumes with a plurality of set scales;
performing nonlinear operation on each cost volume with a set scale to correspondingly obtain a first cost volume, and performing linear operation on the first cost volume to correspondingly obtain a second cost volume;
fitting the second cost volume by using an attention module to correspondingly obtain a third cost volume;
upsampling the third cost volume to a first set resolution to obtain a fourth cost volume; taking the difference of the third cost volume and the fourth cost volume to obtain a residual cost volume; fusing the residual cost volume to the third cost volume to obtain a parallax feature map; fusing the parallax feature map to a cost volume of a corresponding set scale to obtain a parallax regression map;
up-sampling the parallax regression map to a second set resolution to obtain a parallax map; and estimating geometric information of objects in the left view and the right view by using the parallax map.
2. The residual fusion method for binocular stereo matching according to claim 1, wherein the respectively acquiring image features of left and right views under the binocular camera comprises:
respectively extracting features from images of different areas of the left view and the right view by using the same convolution kernel, so as to correspondingly obtain the image features of the left view and the right view.
3. The residual fusion method for binocular stereo matching of claim 1, wherein the performing point-by-point correlation on the image features of the left and right views to construct cost volumes of several set scales comprises:
the set-scale cost volumes include: 1/3Dmax × 1/3H × 1/3W, 1/6Dmax × 1/6H × 1/6W, and 1/12Dmax × 1/12H × 1/12W;
where Dmax denotes a maximum parallax range, H denotes original heights of the left and right views, and W denotes original widths of the left and right views.
4. The residual fusion method for binocular stereo matching of claim 1, wherein fitting the second cost volume with an attention module to obtain a third cost volume comprises:
fitting the image features of the points around the points in the second cost volume to the points to obtain a third cost volume, wherein the third cost volume is used for increasing the connection between the points in the second cost volume and the surrounding points.
5. The residual fusion method for binocular stereo matching of claim 1, wherein upsampling the third cost volume to the first set resolution comprises:
carrying out point-by-point convolution on the third cost volume to enlarge its channel number, and up-sampling the third cost volume with the enlarged channel number to the first set resolution by nearest-neighbor interpolation.
6. The residual fusion method for binocular stereo matching of claim 5, wherein the first set resolution comprises 1/2 resolution.
7. The residual fusion method for binocular stereo matching of claim 1, wherein the fusing of the parallax feature map to the cost volume of the corresponding set scale comprises:
adjusting the cost volume of the set scale to the scale of the parallax feature map through a set convolution layer, and fusing the image feature of each point in the parallax feature map to the scale-adjusted cost volume.
8. The residual fusion method for binocular stereo matching of claim 7, wherein the set convolution layer comprises a 3×3 convolution layer with a step size of 2.
9. The residual fusion method for binocular stereo matching of claim 1, wherein the second set resolution includes original resolutions of left and right views.
10. A computer readable storage medium, characterized in that the medium has stored thereon a program executable by a processor to implement the method of any of claims 1-9.
CN202310972969.4A 2023-08-04 2023-08-04 Residual fusion method for binocular stereo matching Pending CN116703999A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310972969.4A CN116703999A (en) 2023-08-04 2023-08-04 Residual fusion method for binocular stereo matching

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310972969.4A CN116703999A (en) 2023-08-04 2023-08-04 Residual fusion method for binocular stereo matching

Publications (1)

Publication Number Publication Date
CN116703999A (en) 2023-09-05

Family

ID=87841823

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310972969.4A Pending CN116703999A (en) 2023-08-04 2023-08-04 Residual fusion method for binocular stereo matching

Country Status (1)

Country Link
CN (1) CN116703999A (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2022120988A1 (en) * 2020-12-11 2022-06-16 深圳先进技术研究院 Stereo matching method based on hybrid 2d convolution and pseudo 3d convolution
CN116051752A (en) * 2023-02-22 2023-05-02 桂林电子科技大学 Binocular stereo matching algorithm based on multi-scale feature fusion cavity convolution ResNet
CN116229123A (en) * 2023-02-21 2023-06-06 深圳市爱培科技术股份有限公司 Binocular stereo matching method and device based on multi-channel grouping cross-correlation cost volume

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2022120988A1 (en) * 2020-12-11 2022-06-16 深圳先进技术研究院 Stereo matching method based on hybrid 2d convolution and pseudo 3d convolution
CN116229123A (en) * 2023-02-21 2023-06-06 深圳市爱培科技术股份有限公司 Binocular stereo matching method and device based on multi-channel grouping cross-correlation cost volume
CN116051752A (en) * 2023-02-22 2023-05-02 桂林电子科技大学 Binocular stereo matching algorithm based on multi-scale feature fusion cavity convolution ResNet

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
ZIJING HUANG ET AL: "Fast Multi-Scale Residual Fusion Network for Stereo Matching", 2021 IEEE International Conference on Multimedia and Expo (ICME), pages 1-6 *
JING MINGTAO ET AL: "Research on Swimming Pool Drowning Detection Based on Improved Mask R-CNN", Journal of Qingdao University (Engineering & Technology Edition), vol. 36, no. 1, pages 1-7 *
LIU JIEPING ET AL: "Monocular Image Depth Estimation Based on Multi-Scale Attention-Guided Network", Journal of South China University of Technology (Natural Science Edition), vol. 48, no. 12, pages 52-62 *
ZHOU XINGJIE ET AL: "An Improved Convolutional Neural Network Method for Text Recognition", Journal of Jiangsu University of Technology, vol. 26, no. 6, pages 44-49 *

Similar Documents

Publication Publication Date Title
CN109101975B (en) Image semantic segmentation method based on full convolution neural network
CN110476185B (en) Depth of field information estimation method and device
CN111915660B (en) Binocular disparity matching method and system based on shared features and attention up-sampling
EP2466901B1 (en) Depth data upsampling
CN109635714B (en) Correction method and device for document scanning image
CN110880162A (en) Snapshot spectrum depth combined imaging method and system based on deep learning
CN113344869A (en) Driving environment real-time stereo matching method and device based on candidate parallax
CN115311454A (en) Image segmentation method based on residual error feature optimization and attention mechanism
CN114742875A (en) Binocular stereo matching method based on multi-scale feature extraction and self-adaptive aggregation
CN115272683A (en) Central differential information filtering phase unwrapping method based on deep learning
KR101795952B1 (en) Method and device for generating depth image of 2d image
CN113034666B (en) Stereo matching method based on pyramid parallax optimization cost calculation
CN115035551B (en) Three-dimensional human body posture estimation method, device, equipment and storage medium
CN116703999A (en) Residual fusion method for binocular stereo matching
CN116630388A (en) Thermal imaging image binocular parallax estimation method and system based on deep learning
CN112950653B (en) Attention image segmentation method, device and medium
EP4198897A1 (en) Vehicle motion state evaluation method and apparatus, device, and medium
CN112203023B (en) Billion pixel video generation method and device, equipment and medium
CN115496654A (en) Image super-resolution reconstruction method, device and medium based on self-attention mechanism
CN114998630A (en) Ground-to-air image registration method from coarse to fine
CN114445277A (en) Depth image pixel enhancement method and device and computer readable storage medium
CN113112547A (en) Robot, repositioning method thereof, positioning device and storage medium
Cho et al. Depth map up-sampling using cost-volume filtering
CN113674154A (en) Single image super-resolution reconstruction method and system based on generation countermeasure network
CN117058252B (en) Self-adaptive fusion stereo matching method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination