CN111833386A - Pyramid binocular stereo matching method based on multi-scale information and attention mechanism - Google Patents
- Publication number
- CN111833386A
- Authority
- CN
- China
- Prior art keywords
- attention mechanism
- information
- pyramid
- module
- features
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/30—Determination of transform parameters for the alignment of images, i.e. image registration
- G06T7/33—Determination of transform parameters for the alignment of images, i.e. image registration using feature-based methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/10—Image acquisition modality
- G06T2207/10004—Still image; Photographic image
- G06T2207/10012—Stereo images
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- General Physics & Mathematics (AREA)
- General Health & Medical Sciences (AREA)
- General Engineering & Computer Science (AREA)
- Biophysics (AREA)
- Computational Linguistics (AREA)
- Data Mining & Analysis (AREA)
- Evolutionary Computation (AREA)
- Artificial Intelligence (AREA)
- Molecular Biology (AREA)
- Computing Systems (AREA)
- Biomedical Technology (AREA)
- Life Sciences & Earth Sciences (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Health & Medical Sciences (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Compression Or Coding Systems Of Tv Signals (AREA)
- Image Processing (AREA)
Abstract
The invention relates to a pyramid binocular stereo matching method based on multi-scale information and an attention mechanism, and belongs to the field of binocular vision in computer vision. The method first extracts inter-pixel information from the original images using a channel attention mechanism with convolution kernels of different sizes, then enlarges the receptive field with an atrous (dilated) spatial pyramid module to obtain multi-scale information, and finally computes disparity using a three-dimensional channel attention mechanism and a stacked separable encoder-decoder structure to obtain depth information. Depth estimation tests on rectified binocular images show that the algorithm improves matching accuracy while reducing the parameter count and computational cost of the model and shortening the running time.
Description
Technical Field
The invention relates to a pyramid binocular stereo matching method based on multi-scale information and an attention mechanism, and belongs to the field of binocular vision in computer vision.
Background
As a research hotspot in computer vision, binocular stereo matching is widely applied in three-dimensional reconstruction, autonomous driving, mobile robotics and similar fields. For a pair of rectified stereo images captured by a binocular camera, the essence of stereo matching is to compute the disparity of each pixel in the image. Stereo matching algorithms fall broadly into two classes: traditional algorithms and methods based on convolutional neural networks. The development of traditional stereo matching methods is limited by their reliance on manually selected features. With the development of deep learning, convolutional neural networks have shown strong computing power and feature extraction capability, so current research focuses mainly on neural-network-based approaches. However, improving the information extraction capability of the network and obtaining an accurate disparity map in ill-posed regions (such as weakly textured regions and reflective surfaces) remains difficult.
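The link between the per-pixel disparity mentioned above and depth can be illustrated with the standard pinhole relation for a rectified stereo pair, depth = focal length × baseline / disparity. The sketch below is only an illustration: the function name and the numeric values for focal length and baseline are assumptions and do not come from the patent.

```python
def disparity_to_depth(disparity_px, focal_px, baseline_m):
    """Convert a disparity (pixels) to depth (metres) for a rectified stereo pair.

    focal_px and baseline_m are hypothetical camera parameters.
    """
    if disparity_px <= 0:
        # Zero disparity corresponds to a point at infinity.
        return float("inf")
    return focal_px * baseline_m / disparity_px


# Hypothetical camera: 1000 px focal length, 0.5 m baseline.
depth = disparity_to_depth(disparity_px=50.0, focal_px=1000.0, baseline_m=0.5)
print(depth)
```

Because depth is inversely proportional to disparity, small disparity errors on distant objects translate into large depth errors, which is one reason matching accuracy matters.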
Disclosure of Invention
To address these problems, the invention provides a pyramid binocular stereo matching method based on multi-scale information and an attention mechanism. The method comprises three modules: an adaptive feature extraction module, a context information extraction module, and a disparity calculation module. To this end, the technical scheme of the invention is as follows:
A pyramid binocular stereo matching method based on multi-scale information and an attention mechanism comprises the following steps:
Step one: acquire the binocular images after parameter correction (rectification);
Step two: obtain weighted channel-dimension features using the adaptive feature extraction module. The module is built on residual network blocks and adds a channel attention module with multiple convolution kernel sizes, so that the network obtains features at different scales and assigns greater weight to features rich in effective information, which benefits subsequent matching accuracy. Convolutional layers process the features after global pooling to improve the learning capacity of the network, and the PReLU function is used as the activation function to retain more detail;
Step three: use an atrous spatial pyramid pooling structure with four convolution branches of different dilation rates and one global average pooling layer as the context information extraction module, obtaining multi-scale context information and global context information as image-level information, so as to improve the accuracy of the network in ill-posed regions;
Step four: fuse the features of step one with the features of step two to construct a matching cost volume, and process this cost volume with three-dimensional separable convolutions in the disparity calculation module to compute depth information. The disparity calculation module keeps only the connections between the encoder-decoder structures; inside each encoder-decoder module there are no skip connections other than the added channel attention module, which effectively reduces the network parameters. To ensure that matching accuracy is not lost as the parameters are reduced, a three-dimensional channel attention mechanism is added to the disparity calculation module.
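The cost volume construction in step four can be sketched as follows. The patent does not state how the left and right features are fused, so this sketch assumes the concatenation scheme common in pyramid stereo networks: for each candidate disparity d, the left feature at column x is paired with the right feature at column x - d. The function name, the nested-list layout and the zero padding are illustrative assumptions.

```python
def build_cost_volume(left, right, max_disp):
    """Build a concatenation-based matching cost volume (assumed fusion scheme).

    left, right: feature maps as nested lists indexed [channel][row][col].
    Returns a volume indexed [channel][disparity][row][col] with 2*C channels.
    Positions with no valid match (x < d) are left as zeros.
    """
    C, H, W = len(left), len(left[0]), len(left[0][0])
    volume = [[[[0.0] * W for _ in range(H)] for _ in range(max_disp)]
              for _ in range(2 * C)]
    for d in range(max_disp):
        for c in range(C):
            for y in range(H):
                for x in range(d, W):
                    # Left feature at x is paired with right feature at x - d.
                    volume[c][d][y][x] = left[c][y][x]
                    volume[C + c][d][y][x] = right[c][y][x - d]
    return volume


left = [[[1, 2, 3, 4]]]    # 1 channel, 1 row, 4 columns
right = [[[5, 6, 7, 8]]]
vol = build_cost_volume(left, right, max_disp=2)
print(vol[0][1][0], vol[1][1][0])
```

The resulting four-dimensional volume (channels × disparity × height × width) is what the three-dimensional convolutions of the disparity calculation module would operate on.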
Beneficial Effects
The invention provides a novel end-to-end stereo matching network. Local and global features are extracted by an adaptive feature extraction module and a multi-scale information extraction module. A disparity calculation module is then built from three-dimensional depthwise convolution and a three-dimensional channel attention mechanism, which increases the width of the network, helps recover image details, and improves matching accuracy in ill-posed regions.
Drawings
FIG. 1 is a flow chart of the algorithm of the present invention.
FIG. 2 is a schematic diagram of the feature extraction module of the present invention.
FIG. 3 is a schematic diagram of the disparity calculation module of the present invention.
Detailed Description
The invention discloses a pyramid binocular stereo matching method based on a multi-scale context attention mechanism. It takes full account of the context information in the image while preserving detail among image pixels; in addition, a disparity calculation module based on separable convolution and a channel attention mechanism is designed to compute depth information and produce the disparity map. The method is further described below:
The method provides an adaptive feature extraction module for binocular stereo matching. The features are first compressed by dimensionality reduction, and a channel attention mechanism processes the compressed features to generate the corresponding channel-dimension weights. Where a fully connected layer would ordinarily perform this compression, convolutional layers are used instead to give the network better learning capability. Because the PReLU function applies a linear operation to negative inputs, it avoids the dying-neuron problem that the ReLU function exhibits for negative inputs. The PReLU function is therefore chosen as the activation function to retain more feature information;
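A minimal sketch of the channel attention idea described above: global average pooling, a compressing layer with PReLU, an expanding layer, and per-channel reweighting. The sigmoid gate, the weight shapes and the fixed PReLU slope of 0.25 are assumptions in the style of squeeze-and-excitation blocks; the patent does not specify them. On a pooled 1×1 map a 1×1 convolution reduces to a matrix-vector product, which is how it is written here.

```python
import math


def relu(x):
    return max(0.0, x)


def prelu(x, a=0.25):
    # Negative inputs keep a scaled value instead of being zeroed as in ReLU.
    # In a real network the slope `a` is learned; 0.25 is a placeholder.
    return x if x > 0 else a * x


def channel_attention(features, w_down, w_up, a=0.25):
    """features: [C][H][W]; w_down: [C_reduced][C]; w_up: [C][C_reduced]."""
    # Global average pooling: one scalar per channel.
    pooled = [sum(sum(row) for row in ch) / (len(ch) * len(ch[0]))
              for ch in features]
    # Compression (1x1 convolution on a 1x1 map) followed by PReLU.
    hidden = [prelu(sum(w * p for w, p in zip(row, pooled)), a)
              for row in w_down]
    # Expansion followed by a sigmoid gate (assumed) to get channel weights.
    weights = [1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(row, hidden))))
               for row in w_up]
    # Reweight every channel by its attention weight.
    scaled = [[[v * wt for v in row] for row in ch]
              for ch, wt in zip(features, weights)]
    return scaled, weights


features = [[[1.0, 3.0]], [[-2.0, -4.0]]]   # 2 channels, 1 x 2 each
scaled, weights = channel_attention(features, [[1.0, 1.0]], [[1.0], [-1.0]])
print(relu(-1.0), prelu(-1.0), weights)
```

In this toy input the compressed activation is negative; ReLU would zero it and both channel weights would collapse to 0.5, whereas PReLU preserves the signal and the two channels receive different weights.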
Improving stereo matching accuracy also requires stronger multi-scale feature extraction, so an atrous spatial pyramid structure is used to build the context information extraction module. The module consists of four convolution branches with different dilation rates and one global average pooling layer. This design obtains information at different scales from the four branches and image-level information from the global average pooling layer;
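How different dilation rates yield different scales can be shown with a one-dimensional sketch. A kernel of size k with dilation rate r covers k + (k - 1)(r - 1) input positions, so the same three weights see progressively wider context. The dilation rates 1, 2, 3, 4 used below are hypothetical, since the patent does not list the rates, and the one-dimensional signal stands in for a two-dimensional feature map.

```python
def effective_kernel_size(kernel_size, dilation):
    """Number of input positions spanned by a dilated kernel."""
    return kernel_size + (kernel_size - 1) * (dilation - 1)


def dilated_conv1d(signal, kernel, dilation):
    """'Valid' 1-D dilated convolution (cross-correlation form)."""
    span = effective_kernel_size(len(kernel), dilation)
    return [sum(k * signal[i + j * dilation] for j, k in enumerate(kernel))
            for i in range(len(signal) - span + 1)]


def context_branches(signal, kernel, dilations):
    """Four dilated branches plus a global-average branch (image-level info)."""
    branches = [dilated_conv1d(signal, kernel, r) for r in dilations]
    global_avg = sum(signal) / len(signal)
    return branches, global_avg


signal = list(range(1, 10))                      # 1, 2, ..., 9
branches, global_avg = context_branches(signal, [1, 1, 1], [1, 2, 3, 4])
for r, out in zip([1, 2, 3, 4], branches):
    print(r, effective_kernel_size(3, r), out)
print(global_avg)
```

The parameter count is identical across the four branches; only the spacing of the sampled positions changes, which is why the structure widens the receptive field without adding weights.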
As the number of dimensions and network layers increases, the disparity calculation module generates a large number of parameters, which increases the computation time of the network. To reduce network parameters and save computation time, the disparity calculation module keeps only the connections between the encoder-decoder structures; inside each encoder-decoder module there are no skip connections other than the added channel attention module. The disparity calculation unit is built from three-dimensional separable convolutions, and a three-dimensional channel attention mechanism is added to the disparity calculation module so that the matching accuracy of the network does not drop as the parameters are reduced.
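The parameter saving from separable convolution can be estimated with simple arithmetic. The sketch below assumes the separable convolution is a depthwise k×k×k convolution followed by a 1×1×1 pointwise convolution, and ignores bias terms; the channel counts are hypothetical and not taken from the patent.

```python
def conv3d_params(c_in, c_out, k):
    """Weights in a standard 3-D convolution (bias ignored)."""
    return c_in * c_out * k ** 3


def separable_conv3d_params(c_in, c_out, k):
    """Weights in a depthwise + pointwise 3-D convolution (bias ignored)."""
    depthwise = c_in * k ** 3      # one k*k*k filter per input channel
    pointwise = c_in * c_out       # 1x1x1 convolution mixing channels
    return depthwise + pointwise


# Hypothetical layer: 32 input channels, 32 output channels, 3x3x3 kernel.
standard = conv3d_params(32, 32, 3)
separable = separable_conv3d_params(32, 32, 3)
print(standard, separable, standard / separable)
```

For this hypothetical layer the separable form needs roughly one fifteenth of the weights, which illustrates why the module can shed parameters and why an attention mechanism might be added to compensate for the reduced capacity.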
Claims (3)
1. A pyramid binocular stereo matching method based on multi-scale information and an attention mechanism is characterized by comprising the following steps:
step one, acquiring the binocular images after parameter correction (rectification);
step two, obtaining weighted channel-dimension features using an adaptive feature extraction module, wherein the adaptive feature extraction module enables the network to obtain features at different scales by setting convolution kernels of different sizes in the same network layer, processes the features after global pooling with convolutional layers to improve the learning capability of the network, and uses a PReLU function as the activation function to retain more detail;
step three, using an atrous spatial pyramid pooling structure with four convolution branches of different dilation rates and one global average pooling layer as a context information extraction module to obtain multi-scale context information and global context information in the image as image-level information;
and step four, fusing the features of step one with the features of step two to construct a matching cost volume, and processing the constructed matching cost volume in a disparity calculation module using three-dimensional separable convolution and a three-dimensional channel attention mechanism to compute depth information, thereby reducing network parameters while maintaining matching accuracy, wherein the disparity calculation module keeps only the connections between the encoder-decoder structures, and there are no skip connections inside the encoder-decoder modules other than the added channel attention module, thereby effectively reducing the computational parameters of the model.
2. The pyramid binocular stereo matching method based on the multi-scale contextual attention mechanism as claimed in claim 1, wherein an adaptive feature extraction module and an image-level information extraction module are designed to extract local features and global features.
3. The pyramid binocular stereo matching method based on the multi-scale contextual attention mechanism as claimed in claim 1, wherein a disparity calculation module is constructed using three-dimensional depthwise convolution and a three-dimensional channel attention mechanism, and the model has no skip connections inside the encoder-decoder modules, which not only increases the network width but also helps restore image details, thereby improving matching accuracy in ill-posed regions and reducing the computational parameters of the model.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010707918.5A CN111833386A (en) | 2020-07-22 | 2020-07-22 | Pyramid binocular stereo matching method based on multi-scale information and attention mechanism |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010707918.5A CN111833386A (en) | 2020-07-22 | 2020-07-22 | Pyramid binocular stereo matching method based on multi-scale information and attention mechanism |
Publications (1)
Publication Number | Publication Date |
---|---|
CN111833386A true CN111833386A (en) | 2020-10-27 |
Family
ID=72924570
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010707918.5A Pending CN111833386A (en) | 2020-07-22 | 2020-07-22 | Pyramid binocular stereo matching method based on multi-scale information and attention mechanism |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111833386A (en) |
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112991422A (en) * | 2021-04-27 | 2021-06-18 | 杭州云智声智能科技有限公司 | Stereo matching method and system based on void space pyramid pooling |
CN113222904A (en) * | 2021-04-21 | 2021-08-06 | 重庆邮电大学 | Concrete pavement crack detection method for improving PoolNet network structure |
CN114445480A (en) * | 2022-01-26 | 2022-05-06 | 安徽大学 | Transformer-based thermal infrared image stereo matching method and device |
CN115375930A (en) * | 2022-10-26 | 2022-11-22 | 中国航发四川燃气涡轮研究院 | Stereo matching network and stereo matching method based on multi-scale information |
CN116128946A (en) * | 2022-12-09 | 2023-05-16 | 东南大学 | Binocular infrared depth estimation method based on edge guiding and attention mechanism |
CN118570492A (en) * | 2024-07-25 | 2024-08-30 | 长春工程学院 | Depth stereo matching method based on PSMNet optimized feature extraction |
Cited By (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113222904A (en) * | 2021-04-21 | 2021-08-06 | 重庆邮电大学 | Concrete pavement crack detection method for improving PoolNet network structure |
CN112991422A (en) * | 2021-04-27 | 2021-06-18 | 杭州云智声智能科技有限公司 | Stereo matching method and system based on void space pyramid pooling |
CN114445480A (en) * | 2022-01-26 | 2022-05-06 | 安徽大学 | Transformer-based thermal infrared image stereo matching method and device |
CN115375930A (en) * | 2022-10-26 | 2022-11-22 | 中国航发四川燃气涡轮研究院 | Stereo matching network and stereo matching method based on multi-scale information |
CN115375930B (en) * | 2022-10-26 | 2023-05-05 | 中国航发四川燃气涡轮研究院 | Three-dimensional matching network and three-dimensional matching method based on multi-scale information |
CN116128946A (en) * | 2022-12-09 | 2023-05-16 | 东南大学 | Binocular infrared depth estimation method based on edge guiding and attention mechanism |
CN116128946B (en) * | 2022-12-09 | 2024-02-09 | 东南大学 | Binocular infrared depth estimation method based on edge guiding and attention mechanism |
CN118570492A (en) * | 2024-07-25 | 2024-08-30 | 长春工程学院 | Depth stereo matching method based on PSMNet optimized feature extraction |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN111833386A (en) | Pyramid binocular stereo matching method based on multi-scale information and attention mechanism | |
CN112150521B (en) | Image stereo matching method based on PSMNet optimization | |
CN111259945B (en) | Binocular parallax estimation method introducing attention map | |
CN112435282B (en) | Real-time binocular stereo matching method based on self-adaptive candidate parallax prediction network | |
CN111860693A (en) | Lightweight visual target detection method and system | |
CN111508013B (en) | Stereo matching method | |
CN102903096B (en) | Monocular video based object depth extraction method | |
CN113870335B (en) | Monocular depth estimation method based on multi-scale feature fusion | |
US8406512B2 (en) | Stereo matching method based on image intensity quantization | |
CN109389667B (en) | High-efficiency global illumination drawing method based on deep learning | |
CN113592026A (en) | Binocular vision stereo matching method based on void volume and cascade cost volume | |
Dai et al. | Adaptive disparity candidates prediction network for efficient real-time stereo matching | |
CN112016237A (en) | Deep learning method, device and system for lithium battery life prediction | |
CN111062395A (en) | Real-time video semantic segmentation method | |
CN110022422B (en) | Video frame sequence generation method based on dense connection network | |
CN113362242B (en) | Image restoration method based on multi-feature fusion network | |
CN115578426A (en) | Indoor service robot repositioning method based on dense feature matching | |
CN115239564A (en) | Mine image super-resolution reconstruction method combining semantic information | |
CN113887568A (en) | Anisotropic convolution binocular image stereo matching method | |
CN117152580A (en) | Binocular stereoscopic vision matching network construction method and binocular stereoscopic vision matching method | |
CN112489097A (en) | Stereo matching method based on mixed 2D convolution and pseudo 3D convolution | |
CN116434035A (en) | Target detection method of binary neural network model and hardware acceleration method thereof | |
CN111311698A (en) | Image compression method and system for multi-scale target | |
CN114494284B (en) | Scene analysis model and method based on explicit supervision area relation | |
CN113225552B (en) | Intelligent rapid interframe coding method |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
WD01 | Invention patent application deemed withdrawn after publication | ||
Application publication date: 20201027 |