CN114581505A - Convolution-based binocular stereo matching network structure

Info

Publication number: CN114581505A (application CN202210070978.XA)
Authority: CN (China)
Prior art keywords: parallax, image, deconvolution, convolution, dimensional
Legal status: Granted; Active
Other languages: Chinese (zh)
Other versions: CN114581505B
Inventors: 鲍伟, 沈浩然, 徐玉华, 孟周, 张凯
Assignee (current and original): Hefei University of Technology
Filing/priority date: 2022-01-21
Publication date: 2022-06-03 (CN114581505A); grant 2024-07-09 (CN114581505B)

Classifications

    • G06T 7/593 — Image analysis; depth or shape recovery from multiple images; from stereo images
    • G06N 3/045 — Neural network architectures; combinations of networks
    • G06T 2207/10012 — Image acquisition modality: stereo images
    • G06T 2207/20081 — Special algorithmic details: training; learning
    • G06T 2207/20084 — Special algorithmic details: artificial neural networks [ANN]


Abstract

The invention relates to a convolution-based binocular stereo matching network structure comprising a feature extraction module, a coarse parallax value generation module, a parallax range prediction module, a cost space construction module, a coarse parallax image generation module and a fine parallax image generation module. The feature extraction module extracts feature data from the input images, processes it and outputs a first characteristic image for each input image; the coarse parallax value generation module acquires the first characteristic image, processes it and outputs a coarse parallax value for each pixel point of the first characteristic image. The convolution-based binocular stereo matching network structure takes a binocular image pair as input and directly outputs the parallax image through the binocular stereo matching network, realizing an end-to-end network design that eliminates the post-processing operations of the traditional binocular stereo matching method, such as interpolation, filtering and sub-pixel enhancement, and greatly improves efficiency.

Description

Convolution-based binocular stereo matching network structure
Technical Field
The invention belongs to the field of computer vision, and particularly relates to a binocular stereo matching network structure based on convolution.
Background
Computer vision is a subject of research that uses computers to simulate the human visual system.
Depth estimation from one or more RGB images is a long-standing research problem with applications in fields such as robotics, autonomous driving, object recognition and scene understanding, 3D modeling and animation, augmented reality, industrial control and medical diagnostics.
The binocular stereo matching technology is one of the cores in the field of computer vision, and is to shoot two RGB images by adopting two cameras positioned on the same horizontal line, find the corresponding relation of pixels in the images and obtain the depth by a triangulation principle.
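For reference, with focal length f (in pixels), baseline B between the two cameras and disparity d between a matched pixel pair, triangulation gives the depth as Z = f·B/d, so larger disparities correspond to closer points.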
The traditional binocular stereo matching pipeline generally comprises four steps: matching cost calculation, cost aggregation, parallax calculation and post-processing. However, traditional stereo matching methods match poorly in occluded, weakly textured or repetitively textured regions, and are sensitive to illumination, contrast and noise.
In recent years, deep-learning-based stereo matching has received wide attention: learning strong representations of the data with a CNN can achieve good results, as in the MC-CNN method. However, such CNN-based methods only use the CNN result to initialize the matching cost and then still perform the same steps as traditional stereo matching, so the pipeline remains complicated.
Disclosure of Invention
The present invention is directed to solving the above problems by providing a convolution-based binocular stereo matching network structure.
The invention achieves the above purpose through the following technical scheme:
a binocular stereo matching network structure based on convolution, comprising:
the characteristic extraction module is used for extracting characteristic data from an input binocular image pair, processing the characteristic data and outputting a first characteristic image corresponding to each input image;
the coarse parallax value generation module is used for acquiring the first characteristic image, processing it and outputting a coarse parallax value for each pixel point of the first characteristic image;
the parallax range prediction module is used for acquiring the first characteristic image and the coarse parallax value of each of its pixel points, processing them and outputting the parallax range interval of each pixel point;
the cost space construction module is used for acquiring the first characteristic image and the parallax range interval of each of its pixel points, processing them and outputting a four-dimensional cost space of the first characteristic image over the parallax range interval;
the coarse parallax image generation module is used for acquiring the four-dimensional cost space and, after processing, outputting a coarse parallax image at the scale of the four-dimensional cost space;
and the fine parallax image generation module is used for acquiring the coarse parallax image, processing it and outputting the parallax image corresponding to the binocular image pair.
As a further optimization scheme of the present invention, the feature extraction module includes a first convolution unit provided with three two-dimensional convolutions, a residual error structure unit provided with four residual error modules, a second convolution unit provided with four two-dimensional convolutions, a first deconvolution unit provided with three deconvolution modules, a third convolution unit provided with three two-dimensional convolutions, and a second deconvolution unit provided with three deconvolution modules;
wherein,
the first convolution unit processes the input binocular image pair, the result is processed in turn by the residual error structure unit, the second convolution unit, the first deconvolution unit, the third convolution unit and the second deconvolution unit, and the second deconvolution unit outputs the first characteristic image.
As a further optimization scheme of the present invention, the input and output of each of the four residual error modules are used as the input of the next adjacent residual error module;
the output of each deconvolution module of the three deconvolution modules in the first deconvolution unit is used as the input of the next adjacent deconvolution module;
and the output of each deconvolution module of the three deconvolution modules in the second deconvolution unit is used as the input of the next adjacent deconvolution module, and the last deconvolution module in the second deconvolution unit outputs the first characteristic image.
As a further optimization scheme of the present invention, the coarse parallax value generation module comprises:
the parallax initialization unit is used for randomly initializing N parallax values within an initial parallax search range for each pixel point of the first characteristic image;
the parallax transmission unit is used for transmitting the randomly initialized parallax values of each pixel point in the horizontal and vertical directions, so that each pixel point has 5×N random parallax values;
and the parallax evaluation unit is used for calculating the matching similarity of each of the 5×N random parallax values of each pixel point and selecting the parallax value with the highest matching similarity as the coarse parallax value of the pixel point.
As a further optimization scheme of the present invention, the parallax range prediction module includes a first three-dimensional convolution unit provided with three-dimensional convolutions and a first three-dimensional deconvolution unit provided with three-dimensional deconvolutions; the first three-dimensional convolution unit acquires the first characteristic image and the coarse parallax value of each of its pixel points and processes them, and the last three-dimensional deconvolution of the first three-dimensional deconvolution unit outputs the range interval in which each pixel point's parallax lies;
wherein,
the output of each of the three-dimensional convolutions is used as the input of the next adjacent three-dimensional convolution;
the output of each of the three-dimensional deconvolutions is used as the input of the next adjacent three-dimensional deconvolution.
As a further optimization scheme of the present invention, the cost space construction module includes a first packaging layer configured to package the first characteristic image and the parallax range interval of each of its pixel points into a four-dimensional cost space in the channel dimension.
As a further optimization of the present invention, the coarse parallax image generation module includes:
the first coding and decoding structure unit is used for acquiring a four-dimensional cost space, processing the four-dimensional cost space and outputting a second characteristic image corresponding to a binocular image pair;
and the coarse parallax regression unit is used for acquiring the second characteristic image, processing it and outputting a coarse parallax image with the same scale as the second characteristic image.
As a further optimization of the present invention, the fine parallax image generation module includes:
the second packaging layer is used for acquiring the first characteristic image, the second characteristic image and the coarse parallax image and packaging them into a third characteristic image in the channel dimension;
the fourth convolution unit is used for acquiring and processing the third characteristic image and outputting a fine parallax image with the same scale as the coarse parallax image;
and the parallax image normalization unit is used for acquiring the fine parallax image and performing interpolation up-sampling on it to obtain a parallax image with the same size as the binocular images.
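To make the dataflow between the six modules concrete, the following is a minimal PyTorch-style sketch of how they could be wired end to end; all class names, argument names and sub-module interfaces are illustrative assumptions, not the reference implementation of the invention:

    import torch.nn as nn

    class BinocularStereoMatchingNet(nn.Module):
        # Wires the six modules of the invention together; each sub-module is
        # assumed to be an nn.Module with the interface sketched in its comment.
        def __init__(self, feat, coarse_disp, rng, cost, coarse_img, fine_img):
            super().__init__()
            self.feat = feat                # image -> first characteristic image (H/8 x W/8 x C)
            self.coarse_disp = coarse_disp  # (f_left, f_right) -> coarse parallax values
            self.rng = rng                  # (f_left, coarse) -> per-pixel parallax range interval
            self.cost = cost                # (f_left, f_right, range) -> four-dimensional cost space
            self.coarse_img = coarse_img    # cost space -> (coarse parallax image, second characteristic image)
            self.fine_img = fine_img        # (f_left, f_second, coarse image) -> final parallax image

        def forward(self, left, right):
            f_left, f_right = self.feat(left), self.feat(right)
            coarse = self.coarse_disp(f_left, f_right)
            rng = self.rng(f_left, coarse)
            cost = self.cost(f_left, f_right, rng)
            coarse_img, f_second = self.coarse_img(cost)
            return self.fine_img(f_left, f_second, coarse_img)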
The invention has the beneficial effects that:
the invention takes the binocular image pair as input, directly outputs parallax images through the binocular stereo matching network, realizes the end-to-end network structure design, eliminates the post-processing operations of the traditional binocular stereo matching method, such as interpolation, filtering, sub-pixel enhancement and the like, and greatly improves the efficiency.
Drawings
FIG. 1 is a block diagram of the overall architecture of the present invention;
FIG. 2 is a block diagram of the structure of the feature extraction module of the present invention;
FIG. 3 is a block diagram of the coarse parallax image generation module of the present invention;
FIG. 4 is a block diagram of the fine parallax image generation module of the present invention.
Detailed Description
The present application will now be described in further detail with reference to the drawings. It should be noted that the following detailed description is given for illustrative purposes only and is not to be construed as limiting the scope of the present application, since those skilled in the art will be able to make numerous insubstantial modifications and adaptations based on the disclosure above.
Example 1
As shown in fig. 1, a convolution-based binocular stereo matching network structure includes a feature extraction module, a coarse disparity value generation module, a disparity range prediction module, a cost space construction module, a coarse disparity image generation module, and a fine disparity image generation module;
the characteristic extraction module is used for extracting characteristic data of the input image, processing the characteristic data and outputting a first characteristic image of the corresponding input image; wherein, the input images are a left image and a right image of a binocular image pair;
as shown in fig. 2, the feature extraction module includes a first convolution unit, a residual structure unit, a second convolution unit, a first deconvolution unit, a third convolution unit, and a second deconvolution unit, wherein,
the first convolution unit comprises three two-dimensional convolutions; it processes the input image and outputs the result to the residual error structure unit; the convolution kernels are all 3×3, the strides are 2, 1 and 1 respectively, and the first convolution unit outputs 32 feature channels;
the residual error structure unit comprises four residual error modules, where the input and output of each residual error module are used as the input of the next adjacent residual error module; the residual error structure unit processes the output of the first convolution unit and outputs the result to the second convolution unit; the convolution kernels of the residual error modules are all 3×3, the strides are 1, 2 and 1 respectively, and the four residual error modules output 32, 64, 128 and 128 feature channels respectively;
the second convolution unit comprises four two-dimensional convolutions; it processes the output of the residual error structure unit and outputs the result to the first deconvolution unit; the convolution kernels are all 3×3, the strides are 1, 2 and 2 respectively, and the four two-dimensional convolutions output 32, 48, 64 and 96 feature channels respectively;
the first deconvolution unit comprises three deconvolution modules, where the output of each deconvolution module is used as the input of the next adjacent deconvolution module; the first deconvolution unit processes the output of the second convolution unit and outputs the result to the third convolution unit; the deconvolution kernels are all 4×4, the strides of the three deconvolution modules are all 2, and the output feature channels are 64, 48 and 32 respectively;
the third convolution unit comprises three two-dimensional convolutions; it processes the output of the first deconvolution unit and outputs the result to the second deconvolution unit; the convolution kernels are all 3×3, the strides are all 2, and the output feature channels are 48, 64 and 96 respectively;
the second deconvolution unit comprises three deconvolution modules, where the output of each deconvolution module is used as the input of the next adjacent deconvolution module, and the last deconvolution module outputs the first characteristic image; the deconvolution kernels are all 4×4, the strides of the three deconvolution modules are all 2, and the output feature channels are 64, 48 and 32 respectively.
The feature extraction module uses only a small number of residual error modules (four), so its structure is simple and the network runs fast while still retaining a large receptive field; it outputs a first characteristic image of size H/8 × W/8 × C (where C is the number of feature channels), which is used to construct a small cost space.
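For illustration, the building blocks described above can be sketched in PyTorch as follows. Kernel sizes, strides and channel counts follow the text where stated; padding, batch normalization, the skip-connection projection and the simplified module-to-module wiring (the text additionally feeds each residual module's input forward) are assumptions:

    import torch
    import torch.nn as nn

    def conv2d_bn_relu(in_ch, out_ch, stride):
        # 3x3 two-dimensional convolution block; padding=1 preserves size at stride 1.
        return nn.Sequential(
            nn.Conv2d(in_ch, out_ch, 3, stride=stride, padding=1, bias=False),
            nn.BatchNorm2d(out_ch),
            nn.ReLU(inplace=True),
        )

    class ResidualModule(nn.Module):
        # 3x3 residual module with an assumed 1x1 projection on the skip path
        # so the shapes match when channels or stride change.
        def __init__(self, in_ch, out_ch, stride=1):
            super().__init__()
            self.body = nn.Sequential(
                conv2d_bn_relu(in_ch, out_ch, stride),
                nn.Conv2d(out_ch, out_ch, 3, stride=1, padding=1, bias=False),
                nn.BatchNorm2d(out_ch),
            )
            self.skip = (nn.Identity() if stride == 1 and in_ch == out_ch
                         else nn.Conv2d(in_ch, out_ch, 1, stride=stride, bias=False))

        def forward(self, x):
            return torch.relu(self.body(x) + self.skip(x))

    def deconv2d(in_ch, out_ch):
        # 4x4 deconvolution (transposed convolution) with stride 2, as stated above.
        return nn.ConvTranspose2d(in_ch, out_ch, 4, stride=2, padding=1)

    # First convolution unit: three 3x3 convolutions, strides 2/1/1, 32 channels out.
    first_conv_unit = nn.Sequential(
        conv2d_bn_relu(3, 32, 2),
        conv2d_bn_relu(32, 32, 1),
        conv2d_bn_relu(32, 32, 1),
    )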
The coarse parallax value generation module is used for acquiring the first characteristic image, processing it and outputting a coarse parallax value for each pixel point of the first characteristic image;
the coarse parallax value generation module comprises a parallax initialization unit, a parallax transmission unit and a parallax evaluation unit, wherein,
after acquiring the first characteristic image, the parallax initialization unit randomly initializes N parallax values within the initial parallax search range for each pixel point; the parallax transmission unit propagates each pixel point's randomly initialized parallax values in the horizontal and vertical directions, so that each pixel point has 5×N random parallax values; the parallax evaluation unit then calculates the matching similarity of each of the 5×N random parallax values of each pixel point and selects the parallax value with the highest matching similarity as the coarse parallax value of the pixel point.
Specifically, the initial parallax search range is divided evenly into N intervals and 1 parallax value is randomly initialized in each interval, giving each pixel point N randomly initialized parallax values; the parallax transmission unit then propagates the randomly initialized parallax values of each pixel point in the horizontal and vertical directions through One-Hot coding, so that each pixel point has 5×N random parallax values; finally, the parallax evaluation unit computes the matching similarity by a dot-product operation on the first characteristic image in the channel dimension and selects the parallax value with the highest matching similarity in each interval as the coarse parallax value of the pixel point.
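By way of illustration, the coarse parallax generation can be sketched as follows; the nearest-pixel warping used for scoring and the neighbour-rolling form of the propagation (the text describes One-Hot coding) are simplifying assumptions:

    import torch

    def coarse_disparity(f_left, f_right, d_max, N):
        # f_left, f_right: (B, C, H, W) first characteristic images of both views.
        B, C, H, W = f_left.shape
        # 1. Split [0, d_max) into N equal intervals; draw one random value per interval.
        edges = torch.linspace(0, d_max, N + 1, device=f_left.device)[:-1]
        disp = edges.view(1, N, 1, 1) + torch.rand(B, N, H, W, device=f_left.device) * (d_max / N)
        # 2. Propagate candidates from the four horizontal/vertical neighbours: 5*N per pixel.
        shifts = [(0, 0), (0, 1), (0, -1), (1, 0), (-1, 0)]
        cands = torch.cat([torch.roll(disp, s, dims=(2, 3)) for s in shifts], dim=1)
        # 3. Score each candidate by a channel-wise dot product between the left
        #    features and the right features warped by the candidate disparity.
        xs = torch.arange(W, device=f_left.device).view(1, 1, 1, W)
        scores = []
        for k in range(5 * N):
            x_src = (xs - cands[:, k:k + 1]).clamp(0, W - 1).round().long()
            warped = torch.gather(f_right, 3, x_src.expand(B, C, H, W))
            scores.append((f_left * warped).sum(dim=1, keepdim=True))
        scores = torch.cat(scores, dim=1)                      # (B, 5N, H, W)
        # 4. Keep the candidate with the highest matching similarity.
        best = scores.argmax(dim=1, keepdim=True)
        return torch.gather(cands, 1, best)                    # (B, 1, H, W) coarse parallax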
The parallax range prediction module is used for acquiring the first characteristic image and the coarse parallax value of each of its pixel points, processing them and outputting the parallax range interval of each pixel point;
the parallax range prediction module comprises a first three-dimensional convolution unit and a first three-dimensional deconvolution unit, wherein the first three-dimensional convolution unit acquires the first characteristic image and the coarse parallax value of each of its pixel points, processes them and feeds the result into the first three-dimensional deconvolution unit,
the first three-dimensional convolution unit comprises three-dimensional convolutions, with the output of each three-dimensional convolution used as the input of the next adjacent three-dimensional convolution; the convolution kernels are all 3×3×3 and the strides are all 2;
the first three-dimensional deconvolution unit comprises three-dimensional deconvolutions with 3×3×3 kernels and strides of (1, 2, 2); the output of each three-dimensional deconvolution is used as the input of the next adjacent three-dimensional deconvolution, and the last three-dimensional deconvolution outputs the range interval in which each pixel point's parallax lies.
The coarse parallax value of each pixel point is obtained first, the small range interval in which the pixel point's parallax lies is derived from this coarse value, and the first characteristic image is then used to construct a small cost space over that interval; narrowing the parallax search range in this way greatly reduces the computation of the network and improves its prediction speed while preserving prediction accuracy.
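As a hedged sketch of these layers (the channel width, the activation placement and the assembly of the input feature volume around the coarse parallax values are assumptions):

    import torch.nn as nn

    class RangePredictor(nn.Module):
        def __init__(self, ch=32):
            super().__init__()
            def conv(ci, co):   # 3x3x3 three-dimensional convolution, stride 2
                return nn.Sequential(nn.Conv3d(ci, co, 3, stride=2, padding=1),
                                     nn.ReLU(inplace=True))
            def deconv(ci, co): # 3x3x3 three-dimensional deconvolution, stride (1, 2, 2)
                return nn.ConvTranspose3d(ci, co, 3, stride=(1, 2, 2),
                                          padding=1, output_padding=(0, 1, 1))
            self.encoder = nn.Sequential(conv(ch, ch), conv(ch, ch), conv(ch, ch))
            self.decoder = nn.Sequential(deconv(ch, ch), nn.ReLU(inplace=True),
                                         deconv(ch, ch), nn.ReLU(inplace=True),
                                         deconv(ch, 2))  # two channels: interval bounds

        def forward(self, volume):
            # volume: (B, ch, D, H, W) feature volume assembled around each pixel's
            # coarse parallax value (the assembly itself is not shown).
            bounds = self.decoder(self.encoder(volume))    # (B, 2, D/8, H, W)
            low = bounds[:, 0].mean(dim=1)                 # (B, H, W) lower bound
            high = bounds[:, 1].mean(dim=1)                # (B, H, W) upper bound
            return low, high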
The cost space construction module is used for acquiring the first characteristic image and the parallax range interval of each pixel point of the first characteristic image, processing the first characteristic image and outputting a four-dimensional cost space of the first characteristic image in the parallax range interval;
the cost space construction module comprises a first packaging layer, which packages the first characteristic image and the parallax range interval of each of its pixel points into a four-dimensional cost space in the channel dimension.
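A sketch of this packaging step follows; the patent specifies only concatenation along the channel dimension, so the uniform sampling of each pixel's interval and the nearest-pixel warping are assumptions:

    import torch

    def build_cost_space(f_left, f_right, low, high, D):
        # f_left, f_right: (B, C, H, W); low, high: (B, H, W) per-pixel interval; D samples.
        B, C, H, W = f_left.shape
        xs = torch.arange(W, device=f_left.device).view(1, 1, W)
        slices = []
        for i in range(D):
            d = low + (high - low) * i / max(D - 1, 1)       # i-th sample in the interval
            x_src = (xs - d).clamp(0, W - 1).round().long()  # warp source column
            warped = torch.gather(f_right, 3, x_src.unsqueeze(1).expand(B, C, H, W))
            slices.append(torch.cat([f_left, warped], dim=1))  # 2C channels per sample
        return torch.stack(slices, dim=2)                    # (B, 2C, D, H, W) cost space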
The coarse parallax image generation module is used for acquiring the four-dimensional cost space and, after processing, outputting a coarse parallax image at the scale of the four-dimensional cost space;
as shown in fig. 3, the coarse parallax image generation module includes a first encoding/decoding structure unit and a coarse parallax regression unit, wherein,
the first coding and decoding structure unit acquires the four-dimensional cost space, processes it and outputs a second characteristic image corresponding to the input image; the coarse parallax regression unit acquires the second characteristic image, processes it and outputs a coarse parallax image with the same scale as the second characteristic image;
the first coding and decoding structure unit comprises three three-dimensional convolution modules and three three-dimensional deconvolution modules; the convolution kernels of the three-dimensional convolution modules are all 3×3×3 with strides of 2, and the kernels of the three-dimensional deconvolution modules are all 3×3×3 with strides of (1, 2, 2).
Specifically, the first coding and decoding structure unit takes the four-dimensional cost space as input and outputs a second characteristic image corresponding to the input image; the coarse parallax regression unit then applies a Softmax operation to this output over the channel dimension and outputs a coarse parallax image with the same scale as the second characteristic image.
The coarse parallax image generation module is simple in structure (three three-dimensional convolutions and three three-dimensional deconvolutions); using only a small number of three-dimensional convolutions and deconvolutions reduces the computation of the network and increases its speed.
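Under the assumption that the coding and decoding structure unit has already reduced the cost space to one matching score per parallax sample, the Softmax regression amounts to a soft argmax; the function name and shapes below are illustrative:

    import torch
    import torch.nn.functional as F

    def coarse_parallax_regression(scores, d_samples):
        # scores: (B, D, H, W) output of the coding/decoding unit;
        # d_samples: (B, D, H, W) parallax value of each sample.
        prob = F.softmax(scores, dim=1)        # distribution over the D samples
        return (prob * d_samples).sum(dim=1)   # (B, H, W) coarse parallax image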
As shown in fig. 4, the fine parallax image generation module is configured to obtain the coarse parallax image, process the coarse parallax image, and output a parallax image corresponding to the input image;
the fine parallax image generation module includes a second encapsulation layer, a fourth convolution unit, and a parallax map normalization unit, wherein,
the second packaging layer acquires the first characteristic image, the second characteristic image and the coarse parallax image and packages them in the channel dimension to obtain a third characteristic image;
the fourth convolution unit acquires and processes the third characteristic image and outputs a fine parallax image with the same scale as the coarse parallax image; the fourth convolution unit comprises seven two-dimensional convolutions, all with 3×3 kernels and strides of 1;
the parallax map normalization unit acquires the fine parallax image and performs interpolation up-sampling on it to obtain a parallax image with the same scale as the input image.
The fine parallax image generation module takes the first and second characteristic images output by the feature extraction module as guide images; it abandons a complex residual error module and outputs the fine parallax image through the structurally simple fourth convolution unit (seven two-dimensional convolutions), which increases the network speed.
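For illustration, this stage can be sketched as follows; the intermediate channel width, the residual formulation and the rescaling of parallax values during up-sampling are assumptions:

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class FineParallax(nn.Module):
        def __init__(self, in_ch, mid_ch=32):
            super().__init__()
            chans = [in_ch] + [mid_ch] * 6 + [1]   # seven 3x3, stride-1 convolutions
            self.convs = nn.Sequential(*[
                nn.Sequential(nn.Conv2d(chans[i], chans[i + 1], 3, stride=1, padding=1),
                              nn.ReLU(inplace=True) if i < 6 else nn.Identity())
                for i in range(7)
            ])

        def forward(self, f_first, f_second, coarse, out_hw):
            # Package the guide images and the coarse parallax image in the channel
            # dimension to form the third characteristic image.
            x = torch.cat([f_first, f_second, coarse], dim=1)
            fine = self.convs(x) + coarse             # refine at the coarse scale
            scale = out_hw[1] / fine.shape[-1]        # parallax grows with image width
            return F.interpolate(fine, size=out_hw, mode='bilinear',
                                 align_corners=False) * scale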
It should be noted that the binocular image pair is used as input, and the parallax image is directly output through the binocular stereo matching network, so that the end-to-end network structure design is realized, the post-processing operations of the traditional binocular stereo matching method, such as interpolation, filtering, sub-pixel enhancement and the like, are eliminated, and the efficiency is greatly improved.
The above-mentioned embodiments express only several embodiments of the present invention, and their description is relatively specific and detailed, but should not be construed as limiting the scope of the present invention. It should be noted that a person skilled in the art can make several variations and modifications without departing from the inventive concept, and these fall within the scope of the present invention.

Claims (8)

1. A binocular stereo matching network structure based on convolution is characterized by comprising:
the characteristic extraction module is used for extracting characteristic data from an input binocular image pair, processing the characteristic data and outputting a first characteristic image corresponding to each input image;
the coarse parallax value generation module is used for acquiring the first characteristic image, processing it and outputting a coarse parallax value for each pixel point of the first characteristic image;
the parallax range prediction module is used for acquiring the first characteristic image and the coarse parallax value of each of its pixel points, processing them and outputting the parallax range interval of each pixel point;
the cost space construction module is used for acquiring the first characteristic image and the parallax range interval of each of its pixel points, processing them and outputting a four-dimensional cost space of the first characteristic image over the parallax range interval;
the coarse parallax image generation module is used for acquiring the four-dimensional cost space and, after processing, outputting a coarse parallax image at the scale of the four-dimensional cost space;
and the fine parallax image generation module is used for acquiring the coarse parallax image, processing it and outputting the parallax image corresponding to the binocular image pair.
2. The binocular stereo matching network structure based on convolution of claim 1, wherein: the feature extraction module comprises a first convolution unit provided with three two-dimensional convolutions, a residual error structure unit provided with four residual error modules, a second convolution unit provided with four two-dimensional convolutions, a first deconvolution unit provided with three deconvolution modules, a third convolution unit provided with three two-dimensional convolutions and a second deconvolution unit provided with three deconvolution modules;
wherein,
the first convolution unit processes the input binocular image pair, the result is processed in turn by the residual error structure unit, the second convolution unit, the first deconvolution unit, the third convolution unit and the second deconvolution unit, and the second deconvolution unit outputs the first characteristic image.
3. The binocular stereo matching network structure based on convolution of claim 2, wherein: the input and output of each of the four residual error modules are used as the input of the next adjacent residual error module;
the output of each deconvolution module of the three deconvolution modules in the first deconvolution unit is used as the input of the next adjacent deconvolution module;
and the output of each deconvolution module of the three deconvolution modules in the second deconvolution unit is used as the input of the next adjacent deconvolution module, and the last deconvolution module in the second deconvolution unit outputs the first characteristic image.
4. The binocular stereo matching network structure based on convolution of claim 3, wherein the coarse parallax value generation module comprises:
the parallax initialization unit is used for randomly initializing N parallax values within an initial parallax search range for each pixel point of the first characteristic image;
the parallax transmission unit is used for transmitting the randomly initialized parallax values of each pixel point in the horizontal and vertical directions, so that each pixel point has 5×N random parallax values;
and the parallax evaluation unit is used for calculating the matching similarity of each of the 5×N random parallax values of each pixel point and selecting the parallax value with the highest matching similarity as the coarse parallax value of the pixel point.
5. The binocular stereo matching network structure based on convolution of claim 4, wherein: the parallax range prediction module comprises a first three-dimensional convolution unit provided with three-dimensional convolutions and a first three-dimensional deconvolution unit provided with three-dimensional deconvolutions; the first three-dimensional convolution unit acquires the first characteristic image and the coarse parallax value of each of its pixel points and processes them, and the last three-dimensional deconvolution of the first three-dimensional deconvolution unit outputs the range interval in which each pixel point's parallax lies;
wherein,
the output of each of the three-dimensional convolutions is used as the input of the next adjacent three-dimensional convolution;
the output of each of the three-dimensional deconvolutions is used as the input of the next adjacent three-dimensional deconvolution.
6. The binocular stereo matching network structure based on convolution of claim 1, wherein: the cost space construction module comprises a first packaging layer used for packaging the first characteristic image and the parallax range interval of each of its pixel points into a four-dimensional cost space in the channel dimension.
7. The binocular stereo matching network structure based on convolution of claim 1, wherein: the coarse parallax image generation module includes:
the first coding and decoding structure unit is used for acquiring the four-dimensional cost space, processing it and outputting a second characteristic image corresponding to the binocular image pair;
and the coarse parallax regression unit is used for acquiring the second characteristic image, processing it and outputting a coarse parallax image with the same scale as the second characteristic image.
8. The binocular stereo matching network structure based on convolution of claim 1, wherein: the fine parallax image generation module includes:
the second packaging layer is used for acquiring the first characteristic image, the second characteristic image and the coarse parallax image and packaging them into a third characteristic image in the channel dimension;
the fourth convolution unit is used for acquiring and processing the third characteristic image and outputting a fine parallax image with the same scale as the coarse parallax image;
and the parallax image normalization unit is used for acquiring the fine parallax image and performing interpolation up-sampling on it to obtain a parallax image with the same size as the binocular images.

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant