CN114581505A - Convolution-based binocular stereo matching network structure
- Publication number: CN114581505A (application CN202210070978.XA)
- Authority: CN (China)
- Prior art keywords: disparity, image, deconvolution, convolution, dimensional
- Legal status: Granted
Classifications
- G06T7/593: Depth or shape recovery from multiple images; from stereo images
- G06N3/045: Neural network architectures; combinations of networks
- G06T2207/10012: Image acquisition modality; stereo images
- G06T2207/20081: Special algorithmic details; training, learning
- G06T2207/20084: Special algorithmic details; artificial neural networks [ANN]
Abstract
The invention relates to a convolution-based binocular stereo matching network structure comprising a feature extraction module, a coarse disparity value generation module, a disparity range prediction module, a cost space construction module, a coarse disparity image generation module and a fine disparity image generation module. The feature extraction module extracts feature data from the input images, processes the data and outputs a first feature image for each input image. The coarse disparity value generation module acquires the first feature image, processes it and outputs a coarse disparity value for each pixel of the first feature image. The network takes a binocular image pair as input and directly outputs a disparity image, realizing an end-to-end network design that eliminates the post-processing operations of traditional binocular stereo matching methods, such as interpolation, filtering and sub-pixel enhancement, and greatly improves efficiency.
Description
Technical Field
The invention belongs to the field of computer vision, and particularly relates to a convolution-based binocular stereo matching network structure.
Background
Computer vision is a discipline that studies how to use computers to simulate the human visual system.
Depth estimation from one or more RGB images is a long-standing research problem with applications in fields such as robotics, autonomous driving, object recognition and scene understanding, 3D modeling and animation, augmented reality, industrial control and medical diagnosis.
Binocular stereo matching is one of the core technologies of computer vision: two cameras located on the same horizontal line capture two RGB images, pixel correspondences between the images are found, and depth is obtained by the triangulation principle.
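For reference, for a rectified stereo pair with focal length f, baseline B and pixel disparity d, the standard triangulation relation (a textbook identity, not spelled out in this patent) gives the depth as Z = f·B/d, so larger disparities correspond to closer points.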
Traditional binocular stereo matching is generally divided into four steps: matching cost computation, cost aggregation, disparity computation and post-processing. However, traditional stereo matching methods produce poor matches in occluded, weakly textured or repetitively textured regions, and are sensitive to illumination, contrast and noise.
In recent years, deep-learning-based stereo matching has received wide attention: a CNN that learns a strong representation of the data can also achieve good results, as in the MC-CNN method. However, such CNN-based stereo matching methods only use the matching cost computed by the CNN as an initialization and then still perform the same steps as traditional stereo matching, which makes the pipeline complicated.
Disclosure of Invention
The present invention aims to solve the above problems by providing a convolution-based binocular stereo matching network structure.
The invention achieves this purpose through the following technical scheme:
A convolution-based binocular stereo matching network structure, comprising:
a feature extraction module for extracting feature data from an input binocular image pair, processing the feature data and outputting a first feature image for each input image;
a coarse disparity value generation module for acquiring the first feature image, processing it and outputting a coarse disparity value for each pixel of the first feature image;
a disparity range prediction module for acquiring the first feature image and the coarse disparity value of each of its pixels, processing them and outputting a disparity range interval for each pixel;
a cost space construction module for acquiring the first feature image and the disparity range interval of each of its pixels, processing them and outputting a four-dimensional cost space of the first feature image over the disparity range interval;
a coarse disparity image generation module for acquiring the four-dimensional cost space and, after processing, outputting a coarse disparity image at the scale of the cost space;
and a fine disparity image generation module for acquiring the coarse disparity image, processing it and outputting the disparity image corresponding to the binocular image pair.
As a further optimization of the present invention, the feature extraction module comprises a first convolution unit with three two-dimensional convolutions, a residual structure unit with four residual modules, a second convolution unit with four two-dimensional convolutions, a first deconvolution unit with three deconvolution modules, a third convolution unit with three two-dimensional convolutions and a second deconvolution unit with three deconvolution modules;
wherein,
the first convolution unit processes the input binocular image pair, the result is processed in turn by the residual structure unit, the second convolution unit, the first deconvolution unit, the third convolution unit and the second deconvolution unit, and the second deconvolution unit outputs the first feature image.
As a further optimization of the present invention, the input and output of each of the four residual modules serve as the input of the next adjacent residual module;
the output of each of the three deconvolution modules in the first deconvolution unit serves as the input of the next adjacent deconvolution module;
and the output of each of the three deconvolution modules in the second deconvolution unit serves as the input of the next adjacent deconvolution module, the last deconvolution module in the second deconvolution unit outputting the first feature image.
As a further optimization of the present invention, the coarse disparity value generation module comprises:
a disparity initialization unit for randomly initializing N disparity values for each pixel of the first feature image within an initial disparity search range;
a disparity propagation unit for propagating the randomly initialized disparity values of each pixel in the horizontal and vertical directions, so that each pixel has 5×N random disparity values;
and a disparity evaluation unit for computing the matching similarity of each pixel for the 5×N random disparity values and selecting the disparity value with the highest matching similarity as the coarse disparity value of that pixel.
As a further optimization of the present invention, the disparity range prediction module comprises a first three-dimensional convolution unit provided with three-dimensional convolutions and a first three-dimensional deconvolution unit provided with three-dimensional deconvolutions; the first three-dimensional convolution unit acquires the first feature image and the coarse disparity value of each of its pixels and processes them, and the range interval in which each pixel's disparity lies is output by the last three-dimensional deconvolution of the first three-dimensional deconvolution unit;
wherein,
the output of each three-dimensional convolution serves as the input of the next adjacent three-dimensional convolution;
the output of each three-dimensional deconvolution serves as the input of the next adjacent three-dimensional deconvolution.
As a further optimization of the present invention, the cost space construction module comprises a first packing layer configured to pack the first feature image and the disparity range interval of each of its pixels into a four-dimensional cost space along the channel dimension.
As a further optimization of the present invention, the coarse disparity image generation module comprises:
a first encoder-decoder unit for acquiring the four-dimensional cost space, processing it and outputting a second feature image corresponding to the binocular image pair;
and a coarse disparity regression unit for acquiring the second feature image, processing it and outputting a coarse disparity image at the same scale as the second feature image.
As a further optimization of the present invention, the fine disparity image generation module comprises:
a second packing layer for acquiring the first feature image, the second feature image and the coarse disparity image and packing them into a third feature image along the channel dimension;
a fourth convolution unit for acquiring and processing the third feature image and outputting a fine disparity image at the same scale as the coarse disparity image;
and a disparity map normalization unit for acquiring the fine disparity image and upsampling it by interpolation to obtain a disparity image of the same size as the binocular images.
The beneficial effects of the invention are as follows:
the invention takes a binocular image pair as input and directly outputs a disparity image through the binocular stereo matching network, realizing an end-to-end network design, eliminating the post-processing operations of traditional binocular stereo matching methods, such as interpolation, filtering and sub-pixel enhancement, and greatly improving efficiency.
Drawings
FIG. 1 is a block diagram of the overall architecture of the present invention;
FIG. 2 is a block diagram of the structure of the feature extraction module of the present invention;
fig. 3 is a block diagram of the coarse parallax image generation module according to the present invention;
fig. 4 is a block diagram of the fine parallax image generation module of the present invention.
Detailed Description
The present application will now be described in further detail with reference to the drawings. It should be noted that the following detailed description is given for illustrative purposes only and is not to be construed as limiting the scope of the application; those skilled in the art will be able to make numerous insubstantial modifications and adaptations to the present application based on the above disclosure.
Embodiment 1
As shown in Fig. 1, a convolution-based binocular stereo matching network structure includes a feature extraction module, a coarse disparity value generation module, a disparity range prediction module, a cost space construction module, a coarse disparity image generation module and a fine disparity image generation module;
the feature extraction module extracts feature data from the input images, processes the data and outputs a first feature image for each input image, the input images being the left and right images of a binocular image pair.
as shown in fig. 2, the feature extraction module includes a first convolution unit, a residual structure unit, a second convolution unit, a first deconvolution unit, a third convolution unit, and a second deconvolution unit, wherein,
the first convolution unit comprises three two-dimensional convolutions, and the first convolution unit processes an input image and outputs the processed image to the residual error structure unit; the sizes of convolution kernels of the two-dimensional convolution are all 3 multiplied by 3, the step lengths are respectively 2, 1 and 1, and the number of output characteristic channels of the first convolution unit is 32;
the residual error structure unit comprises four residual error modules, wherein the input and the output of each residual error module are used as the input of the next adjacent residual error module, and the residual error structure unit processes the output data of the first convolution unit and outputs the processed data to the second convolution unit; the sizes of convolution kernels of the residual error modules are all 3 multiplied by 3, the step lengths are respectively 1, 2 and 1, and the output characteristic channel numbers of the four residual error modules are respectively 32, 64, 128 and 128;
the second convolution unit comprises four two-dimensional convolutions, and the second convolution unit processes the output data of the residual error structure unit and outputs the processed output data to the first deconvolution unit; the sizes of convolution kernels of the two-dimensional convolutions are all 3 multiplied by 3, the step lengths are respectively 1, 2 and 2, and the output characteristic channels of the four two-dimensional convolutions are respectively 32, 48, 64 and 96;
the first deconvolution unit comprises three deconvolution modules, wherein the output of each deconvolution module is used as the input of the next adjacent deconvolution module, and the first deconvolution unit processes the output data of the second convolution unit and outputs the processed data to the third convolution unit; the sizes of deconvolution kernels of the deconvolution modules are all 4 multiplied by 4, the step lengths of the three deconvolution modules are all 2, and the number of output characteristic channels is 64, 48 and 32 respectively;
the third convolution unit comprises three two-dimensional convolutions, and outputs the data output by the first deconvolution unit to the second deconvolution unit after processing; the convolution kernel size of the two-dimensional convolution is 3 multiplied by 3, the step length is 2, and the number of output characteristic channels is 48, 64 and 96 respectively;
the second deconvolution unit comprises three deconvolution modules, wherein the output of each deconvolution module is used as the input of the next adjacent deconvolution module, and the last deconvolution module outputs the first characteristic image; the sizes of deconvolution kernels of the deconvolution modules are all 4 multiplied by 4, the step lengths of the three deconvolution modules are all 2, and the number of output characteristic channels is 64, 48 and 32 respectively.
The feature extraction module uses fewer residual error modules (4), the structure of the feature extraction module is simple, the speed of the network is improved, the network can still have a larger receptive field, and the first feature image with the output size of H/8 xW/8 xC (C is the number of feature channels) is used for constructing a cost space with a small size.
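The sketch below is a minimal PyTorch rendering under stated assumptions: `FeatureExtraction`, `conv_bn` and `deconv_bn` are invented names; the residual modules are approximated by plain convolutions; and where the text lists three strides for four layers, the missing strides are filled in as (1, 2, 2, 1) and (1, 2, 2, 2) so that the output lands at H/8 × W/8 as stated.

```python
import torch.nn as nn

def conv_bn(in_ch, out_ch, stride):
    # 3x3 two-dimensional convolution + batch norm + ReLU
    # (assumed as the basic block for every 2D convolution in the text)
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, 3, stride, 1, bias=False),
        nn.BatchNorm2d(out_ch),
        nn.ReLU(inplace=True))

def deconv_bn(in_ch, out_ch):
    # 4x4 transposed convolution with stride 2, as specified for the
    # deconvolution modules; doubles the spatial resolution
    return nn.Sequential(
        nn.ConvTranspose2d(in_ch, out_ch, 4, 2, 1, bias=False),
        nn.BatchNorm2d(out_ch),
        nn.ReLU(inplace=True))

class FeatureExtraction(nn.Module):
    def __init__(self):
        super().__init__()
        # first convolution unit: strides 2, 1, 1; 32 output channels
        self.unit1 = nn.Sequential(conv_bn(3, 32, 2), conv_bn(32, 32, 1),
                                   conv_bn(32, 32, 1))
        # residual structure unit (approximated): channels 32, 64, 128, 128
        self.unit2 = nn.Sequential(conv_bn(32, 32, 1), conv_bn(32, 64, 2),
                                   conv_bn(64, 128, 2), conv_bn(128, 128, 1))
        # second convolution unit: channels 32, 48, 64, 96
        self.unit3 = nn.Sequential(conv_bn(128, 32, 1), conv_bn(32, 48, 2),
                                   conv_bn(48, 64, 2), conv_bn(64, 96, 2))
        # first deconvolution unit: channels 64, 48, 32
        self.unit4 = nn.Sequential(deconv_bn(96, 64), deconv_bn(64, 48),
                                   deconv_bn(48, 32))
        # third convolution unit: strides all 2; channels 48, 64, 96
        self.unit5 = nn.Sequential(conv_bn(32, 48, 2), conv_bn(48, 64, 2),
                                   conv_bn(64, 96, 2))
        # second deconvolution unit: channels 64, 48, 32
        self.unit6 = nn.Sequential(deconv_bn(96, 64), deconv_bn(64, 48),
                                   deconv_bn(48, 32))

    def forward(self, img):
        x = self.unit1(img)   # H/2
        x = self.unit2(x)     # H/8
        x = self.unit3(x)     # H/64
        x = self.unit4(x)     # H/8
        x = self.unit5(x)     # H/64
        return self.unit6(x)  # first feature image, H/8 x W/8 x 32
```

Both the left and the right image of the binocular pair would be passed through this module (with shared weights, a common choice in Siamese stereo networks) to obtain the two first feature images.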
The coarse disparity value generation module acquires the first feature image, processes it and outputs a coarse disparity value for each pixel of the first feature image;
the coarse disparity value generation module comprises a disparity initialization unit, a disparity propagation unit and a disparity evaluation unit, wherein,
after acquiring the first feature image, the disparity initialization unit randomly initializes N disparity values for each pixel within an initial disparity search range; the disparity propagation unit propagates the randomly initialized disparity values of each pixel in the horizontal and vertical directions so that each pixel has 5×N random disparity values; the disparity evaluation unit then computes the matching similarity for the 5×N random disparity values of each pixel and selects the disparity value with the highest similarity as the coarse disparity value of that pixel.
Specifically, the initial disparity search range is divided evenly into N intervals and one disparity value is randomly initialized per interval for each pixel, giving each pixel N randomly initialized disparity values; the disparity propagation unit then propagates the random disparity values of each pixel in the horizontal and vertical directions via one-hot encoding, so that each pixel has 5×N random disparity values; finally, the disparity evaluation unit computes matching similarity by a dot-product operation on the first feature images along the channel dimension and selects, in each interval, the disparity value with the highest matching similarity as the coarse disparity value of that pixel. A sketch of this step follows.
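A minimal sketch of this PatchMatch-style initialize-propagate-evaluate step, assuming a nearest-neighbour warp of the right feature image and a global (rather than per-interval) best-candidate selection; `coarse_disparity` is an invented name:

```python
import torch

def coarse_disparity(feat_l, feat_r, max_disp, N):
    # feat_l, feat_r: B x C x H x W left/right first feature images
    B, C, H, W = feat_l.shape
    dev = feat_l.device

    # 1) random initialization: one sample per interval of the search range
    bins = torch.arange(N, device=dev).view(1, N, 1, 1)
    d = (bins + torch.rand(B, N, H, W, device=dev)) * (max_disp / N)

    # 2) propagation: each pixel also receives the samples of its four
    #    horizontal/vertical neighbours, giving 5*N candidates per pixel
    shifts = [(0, 0), (0, 1), (0, -1), (1, 0), (-1, 0)]
    cands = torch.cat([torch.roll(d, s, dims=(2, 3)) for s in shifts],
                      dim=1)                           # B x 5N x H x W

    # 3) evaluation: warp the right features by each candidate and score
    #    with a channel-wise dot product; keep the best candidate
    xs = torch.arange(W, device=dev).view(1, 1, W).float()
    best_score = torch.full((B, H, W), float('-inf'), device=dev)
    best_d = torch.zeros(B, H, W, device=dev)
    for k in range(cands.shape[1]):
        dk = cands[:, k]                               # B x H x W
        x_src = (xs - dk).clamp(0, W - 1).round().long()
        idx = x_src.unsqueeze(1).expand(-1, C, -1, -1)
        warped = torch.gather(feat_r, 3, idx)          # nearest-neighbour warp
        score = (feat_l * warped).sum(dim=1)           # dot product over C
        better = score > best_score
        best_score = torch.where(better, score, best_score)
        best_d = torch.where(better, dk, best_d)
    return best_d                                      # coarse disparity map
```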
The disparity range prediction module acquires the first feature image and the coarse disparity value of each of its pixels, processes them and outputs a disparity range interval for each pixel;
the disparity range prediction module comprises a first three-dimensional convolution unit and a first three-dimensional deconvolution unit; the first three-dimensional convolution unit acquires the first feature image and the coarse disparity value of each of its pixels, processes them and feeds the result to the first three-dimensional deconvolution unit;
the first three-dimensional convolution unit comprises three-dimensional convolutions, the output of each serving as the input of the next adjacent one; the convolution kernels are all 3×3×3 and the strides are all 2;
the first three-dimensional deconvolution unit comprises three-dimensional deconvolutions with kernels of 3×3×3 and strides of (1, 2, 2); the output of each three-dimensional deconvolution serves as the input of the next adjacent one, and the last three-dimensional deconvolution outputs the range interval in which each pixel's disparity lies.
The coarse disparity value of each pixel is obtained first, a small range interval for the pixel's disparity is derived from it, and the first feature image is then used to construct a small cost space over that interval; narrowing the disparity search range per pixel greatly reduces the computation of the network and improves its prediction speed while preserving prediction accuracy. A sketch of this head is given below.
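The sketch below renders this range prediction head under stated assumptions: the text fixes only the kernel sizes and strides, so the number of three-dimensional convolutions (three), the channel width, the two-channel (lower, upper) bound output, the final collapse of the depth dimension and the name `DisparityRangePredictor` are all assumptions.

```python
import torch.nn as nn

class DisparityRangePredictor(nn.Module):
    def __init__(self, ch=32):
        super().__init__()
        # 3D convolution unit: 3x3x3 kernels, stride 2 (layer count assumed)
        self.enc = nn.Sequential(
            nn.Conv3d(ch, ch, 3, 2, 1), nn.ReLU(inplace=True),
            nn.Conv3d(ch, ch, 3, 2, 1), nn.ReLU(inplace=True),
            nn.Conv3d(ch, ch, 3, 2, 1), nn.ReLU(inplace=True))

        def deconv(ci, co):
            # 3x3x3 transposed convolution with stride (1, 2, 2): keeps the
            # disparity dimension, doubles the spatial dimensions
            return nn.ConvTranspose3d(ci, co, 3, stride=(1, 2, 2),
                                      padding=1, output_padding=(0, 1, 1))

        self.dec = nn.Sequential(
            deconv(ch, ch), nn.ReLU(inplace=True),
            deconv(ch, ch), nn.ReLU(inplace=True),
            deconv(ch, 2))                 # last deconv emits (lower, upper)

    def forward(self, volume):             # B x C x D x H x W input volume
        out = self.dec(self.enc(volume))   # B x 2 x D/8 x H x W
        return out.mean(dim=2)             # B x 2 x H x W range interval
```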
The cost space construction module acquires the first feature image and the disparity range interval of each of its pixels, processes them and outputs a four-dimensional cost space of the first feature image over the disparity range interval;
the cost space construction module comprises a first packing layer, which packs the first feature image and the disparity range interval of each of its pixels into a four-dimensional cost space along the channel dimension, as sketched below.
The coarse disparity image generation module acquires the four-dimensional cost space and, after processing, outputs a coarse disparity image at the scale of the cost space;
as shown in Fig. 3, the coarse disparity image generation module includes a first encoder-decoder unit and a coarse disparity regression unit, wherein,
the first encoder-decoder unit acquires the four-dimensional cost space, processes it and outputs a second feature image corresponding to the input image, and the coarse disparity regression unit acquires the second feature image, processes it and outputs a coarse disparity image at the same scale as the second feature image;
the first encoder-decoder unit comprises three three-dimensional convolution modules and three three-dimensional deconvolution modules; the convolution kernels are all 3×3×3 with strides of 2, and the deconvolution kernels are all 3×3×3 with strides of (1, 2, 2).
Specifically, the input of the first encoder-decoder unit is the four-dimensional cost space and its output is the second feature image corresponding to the input image; the coarse disparity regression unit applies a Softmax operation to this output along the channel dimension and outputs a coarse disparity image at the same scale as the second feature image.
The coarse disparity image generation module has a simple structure (three three-dimensional convolutions and three three-dimensional deconvolutions); using only a small number of 3D convolutions and deconvolutions reduces the computation of the network and increases its speed. A sketch of the regression step follows.
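In the sketch below, the Softmax along the channel (disparity) dimension follows the text, while taking the expectation over the candidate disparities (a soft-argmin) is an assumed standard step; `disparity_regression` is an invented name.

```python
import torch
import torch.nn.functional as F

def disparity_regression(scores, d_low):
    # scores: B x D x H x W output of the first encoder-decoder unit;
    # d_low: B x H x W lower bound of each pixel's disparity range interval
    B, D, H, W = scores.shape
    prob = F.softmax(scores, dim=1)          # Softmax along the channel dim
    offsets = torch.arange(D, device=scores.device).view(1, D, 1, 1).float()
    disp = d_low.unsqueeze(1) + offsets      # candidate disparity values
    return (prob * disp).sum(dim=1)          # B x H x W coarse disparity image
```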
As shown in Fig. 4, the fine disparity image generation module acquires the coarse disparity image, processes it and outputs the disparity image corresponding to the input image;
the fine disparity image generation module includes a second packing layer, a fourth convolution unit and a disparity map normalization unit, wherein,
the second packing layer acquires the first feature image, the second feature image and the coarse disparity image and packs them along the channel dimension to obtain a third feature image;
the fourth convolution unit acquires and processes the third feature image and outputs a fine disparity image at the same scale as the coarse disparity image; the fourth convolution unit comprises seven two-dimensional convolutions, all with 3×3 kernels and stride 1;
the disparity map normalization unit acquires the fine disparity image and upsamples it by interpolation to obtain a disparity image at the same scale as the input image.
The fine disparity image generation module takes the first feature image and the second feature image as guide images, abandons a complex residual module, and outputs the fine disparity image through a structurally simple fourth convolution unit (seven two-dimensional convolutions), which increases the network speed. A sketch of this refinement stage is given below.
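In the sketch below, the channel widths, the treatment of the second feature image as a 2D guide map at the coarse scale, and the rescaling of disparity values after upsampling are assumptions; `FineDisparity` is an invented name.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FineDisparity(nn.Module):
    def __init__(self, c1, c2, ch=32):
        super().__init__()
        layers, in_ch = [], c1 + c2 + 1      # guide features + coarse map
        # fourth convolution unit: seven 3x3 convolutions, stride 1
        for out_ch in [ch] * 6 + [1]:
            layers += [nn.Conv2d(in_ch, out_ch, 3, 1, 1),
                       nn.ReLU(inplace=True)]
            in_ch = out_ch
        layers.pop()                          # no ReLU after the last conv
        self.refine = nn.Sequential(*layers)

    def forward(self, feat1, feat2, coarse, out_size):
        # second packing layer: concatenate along the channel dimension
        x = torch.cat([feat1, feat2, coarse.unsqueeze(1)], dim=1)
        fine = self.refine(x).squeeze(1)      # fine disparity, coarse scale
        # disparity map normalization unit: interpolation upsampling to the
        # input image size, rescaling disparities by the width ratio
        up = F.interpolate(fine.unsqueeze(1), size=out_size,
                           mode='bilinear', align_corners=False)
        return up.squeeze(1) * (out_size[1] / fine.shape[-1])
```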
It should be noted that the binocular image pair is taken as input and the disparity image is output directly by the binocular stereo matching network, realizing an end-to-end network design, eliminating the post-processing operations of traditional binocular stereo matching methods, such as interpolation, filtering and sub-pixel enhancement, and greatly improving efficiency.
The above embodiments express only several implementations of the present invention, and their description is relatively specific and detailed, but they are not to be construed as limiting the scope of the invention. It should be noted that a person skilled in the art can make several variations and modifications without departing from the inventive concept, all of which fall within the scope of the present invention.
Claims (8)
1. A convolution-based binocular stereo matching network structure, characterized by comprising:
a feature extraction module for extracting feature data from an input binocular image pair, processing the feature data and outputting a first feature image for each input image;
a coarse disparity value generation module for acquiring the first feature image, processing it and outputting a coarse disparity value for each pixel of the first feature image;
a disparity range prediction module for acquiring the first feature image and the coarse disparity value of each of its pixels, processing them and outputting a disparity range interval for each pixel;
a cost space construction module for acquiring the first feature image and the disparity range interval of each of its pixels, processing them and outputting a four-dimensional cost space of the first feature image over the disparity range interval;
a coarse disparity image generation module for acquiring the four-dimensional cost space and, after processing, outputting a coarse disparity image at the scale of the cost space;
and a fine disparity image generation module for acquiring the coarse disparity image, processing it and outputting the disparity image corresponding to the binocular image pair.
2. The convolution-based binocular stereo matching network structure of claim 1, wherein: the feature extraction module comprises a first convolution unit with three two-dimensional convolutions, a residual structure unit with four residual modules, a second convolution unit with four two-dimensional convolutions, a first deconvolution unit with three deconvolution modules, a third convolution unit with three two-dimensional convolutions and a second deconvolution unit with three deconvolution modules;
wherein,
the first convolution unit processes the input binocular image pair, the result is processed in turn by the residual structure unit, the second convolution unit, the first deconvolution unit, the third convolution unit and the second deconvolution unit, and the second deconvolution unit outputs the first feature image.
3. The convolution-based binocular stereo matching network structure of claim 2, wherein: the input and output of each of the four residual modules serve as the input of the next adjacent residual module;
the output of each of the three deconvolution modules in the first deconvolution unit serves as the input of the next adjacent deconvolution module;
and the output of each of the three deconvolution modules in the second deconvolution unit serves as the input of the next adjacent deconvolution module, the last deconvolution module in the second deconvolution unit outputting the first feature image.
4. The convolution-based binocular stereo matching network structure of claim 3, wherein the coarse disparity value generation module comprises:
a disparity initialization unit for randomly initializing N disparity values for each pixel of the first feature image within an initial disparity search range;
a disparity propagation unit for propagating the randomly initialized disparity values of each pixel in the horizontal and vertical directions, so that each pixel has 5×N random disparity values;
and a disparity evaluation unit for computing the matching similarity of each pixel for the 5×N random disparity values and selecting the disparity value with the highest matching similarity as the coarse disparity value of that pixel.
5. The convolution-based binocular stereo matching network structure of claim 4, wherein: the disparity range prediction module comprises a first three-dimensional convolution unit provided with three-dimensional convolutions and a first three-dimensional deconvolution unit provided with three-dimensional deconvolutions; the first three-dimensional convolution unit acquires the first feature image and the coarse disparity value of each of its pixels and processes them, and the range interval in which each pixel's disparity lies is output by the last three-dimensional deconvolution of the first three-dimensional deconvolution unit;
wherein,
the output of each three-dimensional convolution serves as the input of the next adjacent three-dimensional convolution;
the output of each three-dimensional deconvolution serves as the input of the next adjacent three-dimensional deconvolution.
6. The convolution-based binocular stereo matching network structure of claim 1, wherein: the cost space construction module comprises a first packing layer configured to pack the first feature image and the disparity range interval of each of its pixels into a four-dimensional cost space along the channel dimension.
7. The convolution-based binocular stereo matching network structure of claim 1, wherein the coarse disparity image generation module comprises:
a first encoder-decoder unit for acquiring the four-dimensional cost space, processing it and outputting a second feature image corresponding to the binocular image pair;
and a coarse disparity regression unit for acquiring the second feature image, processing it and outputting a coarse disparity image at the same scale as the second feature image.
8. The convolution-based binocular stereo matching network structure of claim 1, wherein the fine disparity image generation module comprises:
a second packing layer for acquiring the first feature image, the second feature image and the coarse disparity image and packing them into a third feature image along the channel dimension;
a fourth convolution unit for acquiring and processing the third feature image and outputting a fine disparity image at the same scale as the coarse disparity image;
and a disparity map normalization unit for acquiring the fine disparity image and upsampling it by interpolation to obtain a disparity image of the same size as the binocular images.
Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
---|---|---|---
CN202210070978.XA (granted as CN114581505B) | 2022-01-21 | 2022-01-21 | Binocular stereo matching network system based on convolution
Publications (2)

Publication Number | Publication Date
---|---
CN114581505A | 2022-06-03
CN114581505B | 2024-07-09
Family

ID=81772923

Family Applications (1)

Application Number | Title | Priority Date | Filing Date
---|---|---|---
CN202210070978.XA (Active) | Binocular stereo matching network system based on convolution | 2022-01-21 | 2022-01-21
Citations (6)

Publication number | Priority date | Publication date | Assignee | Title
---|---|---|---|---
CN110009691A * | 2019-03-28 | 2019-07-12 | 北京清微智能科技有限公司 | Disparity image generation method and system based on binocular stereo vision matching
CN110533712A * | 2019-08-26 | 2019-12-03 | 北京工业大学 | Binocular stereo matching method based on convolutional neural networks
WO2020182117A1 * | 2019-03-12 | 2020-09-17 | 腾讯科技(深圳)有限公司 | Method, apparatus, and device for obtaining disparity map, control system, and storage medium
CN112132201A * | 2020-09-17 | 2020-12-25 | 长春理工大学 | Non-end-to-end stereo matching method based on convolutional neural network
WO2021138992A1 * | 2020-01-10 | 2021-07-15 | 大连理工大学 | Disparity estimation optimization method based on up-sampling and accurate rematching
CN113313740A * | 2021-05-17 | 2021-08-27 | 北京航空航天大学 | Disparity map and surface normal vector joint learning method based on plane continuity
Non-Patent Citations (1)

Title
---
习路; 陆济湘; 涂婷: "Stereo matching method based on multi-scale convolutional neural networks" (in Chinese), Computer Engineering and Design, no. 09, 16 September 2018 *
Also Published As

Publication Number | Publication Date
---|---
CN114581505B | 2024-07-09
Legal Events

Date | Code | Title
---|---|---
| PB01 | Publication
| SE01 | Entry into force of request for substantive examination
| GR01 | Patent grant