CN111833386A - Pyramid binocular stereo matching method based on multi-scale information and attention mechanism - Google Patents

Publication number
CN111833386A
Authority
CN
China
Prior art keywords
attention mechanism
information
pyramid
module
features
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010707918.5A
Other languages
Chinese (zh)
Inventor
郑秋梅
温阳
王风华
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China University of Petroleum East China
Original Assignee
China University of Petroleum East China
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2020-07-22
Filing date
2020-07-22
Publication date
2020-10-27
Application filed by China University of Petroleum East China
Priority to CN202010707918.5A
Publication of CN111833386A
Current legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/30 Determination of transform parameters for the alignment of images, i.e. image registration
    • G06T7/33 Determination of transform parameters for the alignment of images, i.e. image registration using feature-based methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/10 Image acquisition modality
    • G06T2207/10004 Still image; Photographic image
    • G06T2207/10012 Stereo images

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)
  • Image Processing (AREA)

Abstract

The invention relates to a pyramid binocular stereo matching method based on multi-scale information and an attention mechanism, and belongs to the field of binocular vision in computer vision. The method first applies a channel attention mechanism with convolution kernels of different sizes to the original images to capture information among pixels, then enlarges the receptive field with an atrous spatial pyramid module to obtain multi-scale information, and finally computes disparity with a three-dimensional channel attention mechanism and a stacked separable encoder-decoder structure to obtain depth information. Depth tests on rectified binocular images show that the algorithm improves matching accuracy while also reducing the model's parameter count and computational cost and shortening the running time.

Description

Pyramid binocular stereo matching method based on multi-scale information and attention mechanism
Technical Field
The invention relates to a pyramid binocular stereo matching method based on multi-scale information and an attention mechanism, and belongs to the field of binocular vision in computer vision.
Background
Binocular stereo matching is a research hotspot in computer vision and is widely applied in three-dimensional reconstruction, autonomous driving, mobile robotics and similar fields. For a pair of rectified stereo images captured by a binocular camera, the essence of stereo matching is to compute the disparity of each pixel in the image. Stereo matching algorithms fall broadly into two classes: traditional algorithms and methods based on convolutional neural networks. Traditional methods rely on hand-crafted features, which has limited their development. With the rise of deep learning, convolutional neural networks have shown strong computing power and feature extraction capability, so current research focuses mainly on neural-network-based approaches. However, improving the network's ability to extract information and obtaining accurate disparity maps in ill-posed regions (such as weakly textured regions and reflective surfaces) remain difficult.
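The link between per-pixel disparity and depth can be illustrated with the standard triangulation relation for a rectified pair, Z = f * B / d. The sketch below is only an illustration; the focal length and baseline are hypothetical values, not parameters taken from this application.

```python
def disparity_to_depth(disparity_px, focal_px, baseline_m):
    """Depth in metres from a disparity in pixels, for a rectified stereo pair."""
    if disparity_px < 0:
        raise ValueError("disparity must be non-negative")
    if disparity_px == 0:
        return float("inf")  # zero disparity corresponds to a point at infinity
    return focal_px * baseline_m / disparity_px


# Hypothetical camera: 720 px focal length, 0.54 m baseline.
depth_m = disparity_to_depth(36.0, focal_px=720.0, baseline_m=0.54)
print(round(depth_m, 3))  # about 10.8 m
```

Because depth is inversely proportional to disparity, a small disparity error on a distant surface produces a large depth error, which is one reason accuracy in weakly textured regions matters.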
Disclosure of Invention
To address these problems, the invention provides a pyramid binocular stereo matching method based on multi-scale information and an attention mechanism. The method comprises three modules: an adaptive feature extraction module, a context information extraction module and a disparity calculation module. To achieve this purpose, the technical scheme of the invention is as follows:
A pyramid binocular stereo matching method based on multi-scale information and an attention mechanism comprises the following steps:
Step one: acquire the rectified (parameter-corrected) binocular images;
Step two: obtain weighted channel-dimension features with the adaptive feature extraction module. The module is built on residual network blocks and adds a channel attention module with multiple convolution kernels, so that the network obtains features at different scales and increases the weight of features rich in useful information, which benefits subsequent matching accuracy. Convolutional layers process the features after global pooling to improve the learning capacity of the network, and the PReLU function is used as the activation function to retain more detail;
Step three: use an atrous spatial pyramid pooling structure, with four convolution branches of different dilation rates and one global average pooling layer, as the context information extraction module. It obtains multi-scale context information and global context information as image-level information, improving the accuracy of the network in ill-posed regions;
Step four: fuse the features from step one with the features from step two to construct a matching cost volume, and process this volume with three-dimensional separable convolutions in the disparity calculation module to compute depth information. The disparity calculation module keeps only the connections between the encoder-decoder structures; inside each encoder-decoder module there are no skip connections other than the added channel attention module, which effectively reduces the parameters of the network. To ensure that matching accuracy is not lost as the parameters are reduced, a three-dimensional channel attention mechanism is added to the disparity calculation module.
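The text does not state how the features are combined into the matching cost volume. One common construction in pyramid stereo networks, assumed here purely for illustration, concatenates each left-image feature with the right-image feature shifted by every candidate disparity. The sketch works on a single scanline; the function name, the zero padding and the toy features are all hypothetical.

```python
def build_cost_volume(left, right, max_disp):
    """Concatenation-based cost volume for one scanline (an assumed construction).

    left, right: lists of per-pixel feature vectors (lists of floats), same length.
    Returns volume[d][x] = left[x] + right[x - d], zero-filled where x - d < 0.
    """
    width = len(left)
    channels = len(left[0])
    volume = []
    for d in range(max_disp):
        row = []
        for x in range(width):
            if x - d >= 0:
                row.append(left[x] + right[x - d])  # list concatenation: 2 * channels values
            else:
                row.append([0.0] * (2 * channels))  # no valid match outside the image
        volume.append(row)
    return volume


# Toy scanline with one feature channel; the right view is the left view shifted by one pixel.
left = [[1.0], [2.0], [3.0]]
right = [[2.0], [3.0], [9.0]]
volume = build_cost_volume(left, right, max_disp=2)
print(volume[1][1])  # [2.0, 2.0]: the two features agree at the true disparity of 1
```

On full images the same idea yields a four-dimensional tensor (channels, disparity, height, width), which is why the subsequent processing uses three-dimensional convolutions.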
Beneficial effects:
The invention provides a novel end-to-end stereo matching network. An adaptive feature extraction module and a multi-scale information extraction module are designed to extract local and global features. A disparity calculation module is then built from three-dimensional depthwise convolution and a three-dimensional channel attention mechanism, which increases the width of the network, helps recover image details, and improves matching accuracy in ill-posed regions.
Drawings
FIG. 1 is a flow chart of the algorithm of the present invention.
FIG. 2 is a schematic diagram of the feature extraction module of the present invention.
FIG. 3 is a schematic diagram of the disparity calculation module of the present invention.
Detailed Description
The invention discloses a pyramid binocular stereo matching method based on a multi-scale context attention mechanism, which takes full account of context information in the image while preserving detail among image pixels. In addition, a disparity calculation module based on separable convolution and a channel attention mechanism is designed to compute depth information and obtain a disparity map. The method is further described below.
The method provides an adaptive feature extraction module for binocular stereo matching. First, a fully connected layer performs dimensionality-reduction compression, and a channel attention mechanism processes the compressed features to generate the corresponding channel-dimension weights. To give the network better learning capability, convolutional layers are used in place of the fully connected layers. Because the PReLU function applies a linear operation to negative inputs, it avoids the dying-neuron problem that the ReLU function has for negative inputs. The PReLU function is therefore chosen as the activation function so that more feature information is retained.
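A minimal sketch of these two ideas follows, under stated assumptions. It uses a squeeze-and-excitation style layout (global average pooling, a reducing layer, PReLU, an expanding layer, a sigmoid gate); the sigmoid, the fixed slope of 0.25 and all weights are hypothetical choices, and the multiple kernel sizes mentioned in the method are not modelled. On a globally pooled 1x1 map, a 1x1 convolution reduces to a matrix product, which is what the code computes.

```python
import math


def relu(x):
    """ReLU: negative inputs are zeroed, so their gradient vanishes."""
    return x if x > 0 else 0.0


def prelu(x, a=0.25):
    """PReLU: identity for positive inputs, slope `a` for negative inputs."""
    return x if x > 0 else a * x


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def channel_attention(feature_maps, w_reduce, w_expand, a=0.25):
    """Reweight channels from globally pooled statistics (assumed SE-style layout).

    feature_maps: list of C channels, each a flat list of pixel values.
    w_reduce: R x C weights; w_expand: C x R weights.
    Returns (reweighted feature maps, per-channel weights).
    """
    pooled = [sum(ch) / len(ch) for ch in feature_maps]  # global average pooling
    hidden = [prelu(sum(w * p for w, p in zip(row, pooled)), a) for row in w_reduce]
    weights = [sigmoid(sum(w * h for w, h in zip(row, hidden))) for row in w_expand]
    scaled = [[wt * v for v in ch] for wt, ch in zip(weights, feature_maps)]
    return scaled, weights


print(relu(-2.0), prelu(-2.0))  # 0.0 -0.5: PReLU keeps a signal for negative inputs

features = [[1.0, 3.0], [-4.0, -2.0]]  # two channels, two pixels each
scaled, weights = channel_attention(features, w_reduce=[[1.0, 1.0]], w_expand=[[1.0], [-1.0]])
```

In a trained network the slope `a` is learned together with the other weights rather than fixed.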
Improving stereo matching accuracy also requires better multi-scale feature extraction, so an atrous spatial pyramid structure is used to build the context information extraction module. The module consists of four convolution branches with different dilation rates and one global average pooling layer. This design obtains information at different scales from the four branches and image-level information from the global average pooling layer.
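The effect of the dilation rate (also rendered as the void or cavity rate) can be seen in one dimension: spacing the kernel taps further apart widens the receptive field without adding weights. The sketch below is illustrative only; the rates 1, 2, 4 and 8 are hypothetical, since the text says only that the four branches use different rates.

```python
def effective_kernel_size(kernel_size, rate):
    """Span covered by a kernel whose taps are spaced `rate` samples apart."""
    return (kernel_size - 1) * rate + 1


def dilated_conv1d(signal, kernel, rate):
    """'Valid' 1-D cross-correlation with dilated kernel taps."""
    span = effective_kernel_size(len(kernel), rate)
    return [
        sum(k * signal[i + j * rate] for j, k in enumerate(kernel))
        for i in range(len(signal) - span + 1)
    ]


def global_average(signal):
    """Image-level branch: a single value summarising the whole input."""
    return sum(signal) / len(signal)


rates = [1, 2, 4, 8]  # hypothetical dilation rates for the four branches
print([effective_kernel_size(3, r) for r in rates])  # [3, 5, 9, 17]

signal = [1, 2, 3, 4, 5]
print(dilated_conv1d(signal, [1, 1, 1], rate=2))  # [9]: taps land on samples 1, 3 and 5
print(global_average(signal))  # 3.0
```

Every branch here uses the same three weights, so the larger context comes at no extra parameter cost.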
As the number of dimensions and the number of network layers increase, the disparity calculation module generates a large number of parameters, which increases the computation time of the network. To reduce the parameters and save computation time, the disparity calculation module keeps only the connections between the encoder-decoder structures; inside each encoder-decoder module there are no skip connections other than the added channel attention module. The disparity calculation unit is built from three-dimensional separable convolutions, and a three-dimensional channel attention mechanism is added to the module so that the matching accuracy of the network does not fall as the parameters are reduced.
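The parameter saving can be estimated with simple arithmetic, assuming that "three-dimensional separable convolution" means the usual depthwise-separable factorisation (one k x k x k filter per input channel followed by a 1 x 1 x 1 pointwise convolution). The layer size below is hypothetical and biases are ignored.

```python
def conv3d_params(c_in, c_out, k):
    """Weights in a standard 3-D convolution (bias omitted)."""
    return c_in * c_out * k ** 3


def separable_conv3d_params(c_in, c_out, k):
    """Depthwise k*k*k filter per input channel, then a 1x1x1 pointwise mix."""
    return c_in * k ** 3 + c_in * c_out


# Hypothetical layer: 32 input channels, 32 output channels, 3x3x3 kernel.
standard = conv3d_params(32, 32, 3)
separable = separable_conv3d_params(32, 32, 3)
print(standard, separable)  # 27648 1888
print(round(standard / separable, 1))  # 14.6
```

Under these assumptions a single layer needs roughly one fifteenth of the weights, which is consistent with the stated aim of reducing parameters and then using channel attention to preserve accuracy.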

Claims (3)

1. A pyramid binocular stereo matching method based on multi-scale information and an attention mechanism, characterized by comprising the following steps:
step one, acquiring the rectified (parameter-corrected) binocular images;
step two, obtaining weighted channel-dimension features with an adaptive feature extraction module, wherein the adaptive feature extraction module enables the network to obtain features of different scales by setting convolution kernels of different sizes in the same network layer, uses convolutional layers to process the features after global pooling so as to improve the learning capability of the network, and uses the PReLU function as the activation function to retain more detail;
step three, using an atrous spatial pyramid pooling structure with four convolution branches of different dilation rates and one global average pooling layer as a context information extraction module to obtain multi-scale context information and global context information in the image as image-level information;
and step four, fusing the features from step one with the features from step two to construct a matching cost volume, and processing the constructed matching cost volume with three-dimensional separable convolution and a three-dimensional channel attention mechanism in a disparity calculation module to compute depth information, thereby reducing network parameters while maintaining matching accuracy, wherein the disparity calculation module keeps only the connections between the encoder-decoder structures and has no skip connections inside the encoder-decoder modules other than the added channel attention module, thereby effectively reducing the computational parameters of the model.
2. The pyramid binocular stereo matching method based on the multi-scale contextual attention mechanism as claimed in claim 1, wherein an adaptive feature extraction module and an image-level information extraction module are designed to extract local features and global features.
3. The pyramid binocular stereo matching method based on the multi-scale contextual attention mechanism as claimed in claim 1, wherein a disparity calculation module is constructed from three-dimensional depthwise convolution and a three-dimensional channel attention mechanism, and the model has no skip connections inside the encoder-decoder modules, so that the network width is increased and image details are more readily restored, thereby improving matching accuracy in ill-posed regions and reducing the computational parameters of the model.
CN202010707918.5A 2020-07-22 2020-07-22 Pyramid binocular stereo matching method based on multi-scale information and attention mechanism Pending CN111833386A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010707918.5A CN111833386A (en) 2020-07-22 2020-07-22 Pyramid binocular stereo matching method based on multi-scale information and attention mechanism

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010707918.5A CN111833386A (en) 2020-07-22 2020-07-22 Pyramid binocular stereo matching method based on multi-scale information and attention mechanism

Publications (1)

Publication Number Publication Date
CN111833386A (en) 2020-10-27

Family

ID=72924570

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010707918.5A Pending CN111833386A (en) 2020-07-22 2020-07-22 Pyramid binocular stereo matching method based on multi-scale information and attention mechanism

Country Status (1)

Country Link
CN (1) CN111833386A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112991422A (en) * 2021-04-27 2021-06-18 杭州云智声智能科技有限公司 Stereo matching method and system based on void space pyramid pooling
CN113222904A (en) * 2021-04-21 2021-08-06 重庆邮电大学 Concrete pavement crack detection method for improving PoolNet network structure
CN115375930A (en) * 2022-10-26 2022-11-22 中国航发四川燃气涡轮研究院 Stereo matching network and stereo matching method based on multi-scale information
CN116128946A (en) * 2022-12-09 2023-05-16 东南大学 Binocular infrared depth estimation method based on edge guiding and attention mechanism

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113222904A (en) * 2021-04-21 2021-08-06 重庆邮电大学 Concrete pavement crack detection method for improving PoolNet network structure
CN112991422A (en) * 2021-04-27 2021-06-18 杭州云智声智能科技有限公司 Stereo matching method and system based on void space pyramid pooling
CN115375930A (en) * 2022-10-26 2022-11-22 中国航发四川燃气涡轮研究院 Stereo matching network and stereo matching method based on multi-scale information
CN115375930B (en) * 2022-10-26 2023-05-05 中国航发四川燃气涡轮研究院 Three-dimensional matching network and three-dimensional matching method based on multi-scale information
CN116128946A (en) * 2022-12-09 2023-05-16 东南大学 Binocular infrared depth estimation method based on edge guiding and attention mechanism
CN116128946B (en) * 2022-12-09 2024-02-09 东南大学 Binocular infrared depth estimation method based on edge guiding and attention mechanism

Legal Events

Date Code Title Description
PB01 Publication
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20201027