CN116128946A - Binocular infrared depth estimation method based on edge guiding and attention mechanism - Google Patents

Binocular infrared depth estimation method based on edge guiding and attention mechanism Download PDF

Info

Publication number
CN116128946A
Authority
CN
China
Prior art keywords
depth
map
characteristic
edge
characteristic diagram
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202211588573.1A
Other languages
Chinese (zh)
Other versions
CN116128946B (en)
Inventor
耿可可
王金虎
殷国栋
汤文成
成小龙
孙宇啸
丁鹏博
王子威
柳志超
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Southeast University
Original Assignee
Southeast University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Southeast University filed Critical Southeast University
Priority to CN202211588573.1A priority Critical patent/CN116128946B/en
Publication of CN116128946A publication Critical patent/CN116128946A/en
Application granted granted Critical
Publication of CN116128946B publication Critical patent/CN116128946B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/50 Depth or shape recovery
    • G06T 5/00 Image enhancement or restoration
    • G06T 5/90 Dynamic range modification of images or parts thereof
    • G06T 5/92 Dynamic range modification of images or parts thereof based on global image properties
    • G06T 7/10 Segmentation; Edge detection
    • G06T 7/13 Edge detection
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/10 Image acquisition modality
    • G06T 2207/10028 Range image; Depth image; 3D point clouds
    • G06T 2207/10048 Infrared image
    • G06T 2207/20 Special algorithmic details
    • G06T 2207/20024 Filtering details
    • G06T 2207/20032 Median filtering
    • G06T 2207/20081 Training; Learning
    • G06T 2207/20084 Artificial neural networks [ANN]
    • G06T 2207/20172 Image enhancement details
    • G06T 2207/20192 Edge enhancement; Edge preservation

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Processing (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a binocular infrared depth estimation method based on edge guiding and an attention mechanism. It relates to the technical field of binocular vision in computer vision and solves the technical problem of low depth estimation accuracy caused by the inherent defects of infrared images, such as unclear texture, blurred edges, and weak features. A mixed attention module is constructed on the high-dimensional feature maps to capture the deep correlations between different channels and spatial positions of the features to be matched, promoting effective depth reasoning in the subsequent network. Meanwhile, an edge guiding module is introduced and an edge-depth joint loss function is constructed to generate a foreground depth map with sharp edges, smooth depth, and no depth holes, making it possible for an intelligent agent to keep operating normally in a low-illumination environment.

Description

Binocular infrared depth estimation method based on edge guiding and attention mechanism
Technical Field
The application relates to the technical field of binocular vision in computer vision, in particular to a binocular infrared depth estimation method based on edge guiding and attention mechanisms.
Background
As a research hotspot in the field of computer vision, binocular depth estimation has been widely applied in three-dimensional reconstruction, autonomous driving, mobile robotics, and related fields. For a pair of rectified stereo images captured by a binocular camera, the essence of depth estimation is to find, for each pixel, the matching point at which the matching cost is minimal, and to output the disparity between the left and right matching points. Compared with traditional depth estimation algorithms, deep-learning-based algorithms can effectively mitigate the ill-posed nature of image depth estimation and can use prior knowledge to learn and estimate the depth of occluded and weakly textured regions. However, most existing research is based only on visible-light images. Unlike a visible-light camera, which is limited by low illumination and harsh environments, an infrared camera can still image in a low-illumination environment by receiving the infrared radiation emitted by objects in the scene, thereby perceiving the surroundings. On the one hand, it is therefore worthwhile to exploit this characteristic of infrared cameras and carry out depth estimation research based on infrared images; on the other hand, research on infrared images involves certain difficulties because of inherent defects such as unclear texture, blurred edges, and weak features.
Disclosure of Invention
The application provides a binocular infrared depth estimation method based on edge guiding and an attention mechanism, which aims to compensate for the inherent defects of infrared images, such as unclear texture, blurred edges, and weak features, so that an intelligent agent can keep operating normally in a low-illumination environment.
The technical aim of the application is achieved through the following technical scheme:
A binocular infrared depth estimation method based on edge guiding and attention mechanisms comprises:
S1: constructing an edge-guided depth estimation network framework;
S2: training the depth estimation network framework to obtain a first depth estimation network;
S3: validating the first depth estimation network, validation being complete when the accuracy of the first depth estimation network meets a preset threshold, otherwise repeating step S2;
S4: performing binocular infrared depth estimation with the first depth estimation network;
wherein the depth estimation network framework comprises an image preprocessing module, a feature extraction module, a pyramid pooling module, a mixed attention mechanism module, a stacked hourglass module, and an edge guiding module.
The beneficial effects of the application are as follows: in the binocular infrared depth estimation method based on edge guiding and an attention mechanism, an image preprocessing module based on gamma correction and median filtering is introduced to enhance image edge and detail information, providing the convolutional neural network with richer deep feature representations to mine; a mixed attention module is constructed on the high-dimensional feature maps to capture the deep correlations between different channels and spatial positions of the features to be matched, promoting effective depth reasoning in the subsequent network; meanwhile, an edge guiding module is introduced and an edge-depth joint loss function is constructed to generate a foreground depth map with sharp edges, smooth depth, and no depth holes, making it possible for an intelligent agent to keep operating normally in a low-illumination environment.
Drawings
FIG. 1 is a flow chart of the method described herein;
FIG. 2 is a schematic diagram of the mixed attention mechanism module;
FIG. 3 is a schematic diagram of the edge guiding module.
Detailed Description
The technical scheme of the application will be described in detail below with reference to the accompanying drawings.
As shown in fig. 1, the binocular infrared depth estimation method based on the edge guiding and attention mechanism described in the application comprises the following steps:
s1: an edge-guided depth estimation network framework is constructed.
Specifically, the depth estimation network framework comprises an image preprocessing module, a feature extraction module, a pyramid pooling module, a mixed attention mechanism module, a stacked hourglass module, and an edge guiding module.
The preprocessing of the image preprocessing module comprises the following steps:
(1) Preprocessing operations based on gamma correction and median filtering are performed on the binocular infrared images after distortion correction and Bouguet epipolar rectification, yielding preprocessed images IML and IMR respectively. That is, after distortion correction and Bouguet epipolar rectification, the left and right images of the binocular infrared pair are each subjected to gamma correction and median filtering.
Gamma correction edits the gamma curve of the image to apply a nonlinear tone adjustment: it detects the dark and light portions of the image signal and increases their proportion, thereby improving the contrast of the image. Gamma correction is expressed as V_out = V_in^γ: when γ > 1, the contrast of high-gray regions of the image is enhanced; when γ < 1, the contrast of low-gray regions is enhanced; when γ = 1, the original image is unchanged.
The basic principle of median filtering is to replace the value of a point in a digital image or digital sequence with the median of the values in a neighborhood of that point, so that the surrounding pixel values are closer to the true value, thereby eliminating isolated noise points.
(2) IML and IMR are input to the feature extraction module, and IML is input to the edge guiding module. A sketch of this preprocessing step is given below.
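As an illustration only, the following minimal Python sketch shows one way to implement the gamma correction and median filtering described above. The gamma value, the 3×3 median window, the use of OpenCV, and the function name preprocess are assumptions; the patent only specifies that gamma correction and median filtering are applied to the rectified images.

    import cv2
    import numpy as np

    def preprocess(ir_img_u8, gamma=0.8, median_ksize=3):
        """Gamma correction V_out = V_in ** gamma followed by median filtering."""
        v_in = ir_img_u8.astype(np.float32) / 255.0     # normalize to [0, 1]
        v_out = np.power(v_in, gamma)                   # gamma < 1 boosts low-gray contrast
        corrected = (v_out * 255.0).astype(np.uint8)
        return cv2.medianBlur(corrected, median_ksize)  # suppress isolated noise points

    # Usage: IML = preprocess(left_rectified); IMR = preprocess(right_rectified)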
The workflow of the feature extraction module comprises:
(1) A 3×3 convolution with stride 2 is applied to IML and IMR for downsampling, followed by batch normalization and ReLU activation, yielding feature maps FL1 and FR1;
(2) FL1 and FR1 are each passed through 3 consecutive residual blocks, with batch normalization and ReLU activation, yielding feature maps FL2 and FR2;
(3) FL2 and FR2 are each passed through 16 consecutive residual blocks, with batch normalization and ReLU activation, yielding feature maps FL3 and FR3;
(4) FL3 and FR3 are each passed through 3 consecutive residual blocks performing dilated convolution with a dilation factor of 2, with batch normalization and ReLU activation, yielding feature maps FL4 and FR4;
(5) FL4 and FR4 are each passed through 3 residual blocks performing dilated convolution with a dilation factor of 4, with batch normalization and ReLU activation, yielding feature maps FL5 and FR5;
(6) FL5 and FR5 are input to the pyramid pooling module. A sketch of this feature extraction backbone is given below.
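The following PyTorch sketch outlines a shared-weight backbone with the block counts, stride, and dilation factors listed above. The channel widths, the residual-block internals, and the choice of returned intermediate maps are assumptions not fixed by the patent.

    import torch
    import torch.nn as nn

    class ResidualBlock(nn.Module):
        def __init__(self, channels, dilation=1):
            super().__init__()
            pad = dilation
            self.body = nn.Sequential(
                nn.Conv2d(channels, channels, 3, padding=pad, dilation=dilation, bias=False),
                nn.BatchNorm2d(channels),
                nn.ReLU(inplace=True),
                nn.Conv2d(channels, channels, 3, padding=pad, dilation=dilation, bias=False),
                nn.BatchNorm2d(channels),
            )
            self.relu = nn.ReLU(inplace=True)

        def forward(self, x):
            return self.relu(self.body(x) + x)  # residual connection

    class FeatureExtractor(nn.Module):
        def __init__(self, in_ch=1, ch=32):
            super().__init__()
            # (1) 3x3 stride-2 convolution + BN + ReLU for downsampling
            self.stem = nn.Sequential(
                nn.Conv2d(in_ch, ch, 3, stride=2, padding=1, bias=False),
                nn.BatchNorm2d(ch), nn.ReLU(inplace=True))
            # (2) 3 residual blocks, (3) 16 residual blocks
            self.stage2 = nn.Sequential(*[ResidualBlock(ch) for _ in range(3)])
            self.stage3 = nn.Sequential(*[ResidualBlock(ch) for _ in range(16)])
            # (4) 3 blocks with dilation 2, (5) 3 blocks with dilation 4
            self.stage4 = nn.Sequential(*[ResidualBlock(ch, dilation=2) for _ in range(3)])
            self.stage5 = nn.Sequential(*[ResidualBlock(ch, dilation=4) for _ in range(3)])

        def forward(self, x):
            f1 = self.stem(x)
            f2 = self.stage2(f1)
            f3 = self.stage3(f2)
            f4 = self.stage4(f3)
            f5 = self.stage5(f4)
            return f3, f5  # FL3/FR3 and FL5/FR5 are reused by the pyramid pooling module

The same module is applied to IML and IMR with shared weights, as is usual for stereo matching networks.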
The workflow of the pyramid pooling module comprises:
(1) Adaptive average pooling operations with output sizes of 64×64, 32×32, 16×16, and 8×8 are applied to FL5 and FR5 respectively, producing four feature maps of different resolutions for each input; the four feature maps are each reduced in dimensionality by a 1×1 convolution kernel and upsampled by bilinear interpolation, producing four feature maps of the same resolution;
(2) The four feature maps corresponding to FL5 are concatenated with FL3 and FL5 to obtain the feature map FL6;
(3) The four feature maps corresponding to FR5 are concatenated with FR3 and FR5 to obtain the feature map FR6;
(4) FL6 and FR6 are input to the mixed attention mechanism module. A sketch of this pyramid pooling step is given below.
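The following sketch shows a pyramid pooling step with the four pooling sizes given above. The channel widths of the 1×1 convolutions and the class name PyramidPooling are illustrative assumptions.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class PyramidPooling(nn.Module):
        def __init__(self, in_ch=32, branch_ch=32, pool_sizes=(64, 32, 16, 8)):
            super().__init__()
            self.branches = nn.ModuleList([
                nn.Sequential(nn.AdaptiveAvgPool2d(s),            # pool to s x s
                              nn.Conv2d(in_ch, branch_ch, 1, bias=False),
                              nn.BatchNorm2d(branch_ch),
                              nn.ReLU(inplace=True))
                for s in pool_sizes])

        def forward(self, f3, f5):
            h, w = f5.shape[-2:]
            outs = [F.interpolate(b(f5), size=(h, w), mode="bilinear", align_corners=False)
                    for b in self.branches]            # four same-resolution maps
            return torch.cat(outs + [f3, f5], dim=1)   # concatenate with FL3/FL5 -> FL6 (or FR6)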
As shown in FIG. 2, the mixed attention mechanism module comprises a channel attention mechanism and a spatial attention mechanism, and its workflow comprises:
(1) Global max pooling and global average pooling over the spatial dimensions are applied to FL6 and FR6, giving two 1×1×C channel descriptors; both descriptors are fed through a two-layer neural network with ReLU activation, the two resulting C-dimensional features are added, and a Sigmoid activation yields the weight coefficient Ac; Ac is multiplied with FL6 and FR6 respectively to obtain intermediate features FL7 and FR7;
(2) Max pooling and average pooling along the channel dimension are applied to FL7 and FR7, giving two H×W×1 descriptors; these are concatenated along the channel dimension and passed through a 7×7 convolution layer with Sigmoid activation to obtain the weight coefficient As; As is multiplied with FL7 and FR7 respectively to obtain feature maps FL8 and FR8. A sketch of this attention block is given below.
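The following sketch shows a channel-plus-spatial attention block of the kind described above. The reduction ratio inside the two-layer network and the class name MixedAttention are assumptions.

    import torch
    import torch.nn as nn

    class MixedAttention(nn.Module):
        def __init__(self, channels, reduction=16):
            super().__init__()
            self.mlp = nn.Sequential(                      # two-layer network with ReLU
                nn.Linear(channels, channels // reduction),
                nn.ReLU(inplace=True),
                nn.Linear(channels // reduction, channels))
            self.spatial = nn.Conv2d(2, 1, kernel_size=7, padding=3)

        def forward(self, x):
            b, c, h, w = x.shape
            # channel attention: spatial max/avg pooling -> shared two-layer net -> add -> sigmoid
            avg = self.mlp(x.mean(dim=(2, 3)))
            mx = self.mlp(x.amax(dim=(2, 3)))
            ac = torch.sigmoid(avg + mx).view(b, c, 1, 1)
            x7 = x * ac                                    # intermediate feature FL7 / FR7
            # spatial attention: channel max/avg pooling -> concat -> 7x7 conv -> sigmoid
            desc = torch.cat([x7.amax(dim=1, keepdim=True),
                              x7.mean(dim=1, keepdim=True)], dim=1)
            a_s = torch.sigmoid(self.spatial(desc))
            return x7 * a_s                                # feature map FL8 / FR8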
After the mixed attention mechanism module, the feature maps FL8 and FR8 are concatenated along the channel dimension for each disparity level to obtain a four-dimensional cost volume C_disp(u, v, d, :). Bilinear interpolation and disparity-to-depth conversion are then performed on C_disp(u, v, d, :) to obtain a depth cost volume C_depth(u, v, z, :). Finally, the depth cost volume C_depth(u, v, z, :) is input to the stacked hourglass module.
The disparity-to-depth conversion is expressed as:
Z(u, v) = f_U · B / D(u, v)    (1)
where f_U denotes the horizontal focal length; B denotes the baseline length; and D(u, v) and Z(u, v) denote the disparity and depth of the feature map at position (u, v), respectively.
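As an illustration, the following sketch builds a concatenation cost volume over disparities and resamples it at the disparities implied by a set of candidate depths using formula (1). The number of disparity levels, the depth sampling, and the simple linear resampling along the disparity axis are assumptions.

    import torch

    def build_cost_volume(fl8, fr8, max_disp=48):
        """Concatenate left/right features along the channel axis at every disparity level."""
        b, c, h, w = fl8.shape
        cost = fl8.new_zeros(b, 2 * c, max_disp, h, w)
        for d in range(max_disp):
            if d == 0:
                cost[:, :c, d] = fl8
                cost[:, c:, d] = fr8
            else:
                cost[:, :c, d, :, d:] = fl8[:, :, :, d:]
                cost[:, c:, d, :, d:] = fr8[:, :, :, :-d]
        return cost  # C_disp(u, v, d, :)

    def disparity_to_depth_volume(cost_disp, f_u, baseline, depth_bins):
        """Linearly resample the disparity axis at D = f_U * B / Z for each candidate depth Z."""
        dmax = cost_disp.shape[2]
        d = (f_u * baseline / depth_bins).clamp(0, dmax - 1)  # disparity implied by each depth bin
        lo = d.floor().long()
        hi = (lo + 1).clamp(max=dmax - 1)
        w_hi = (d - lo.float()).view(1, 1, -1, 1, 1)
        return (1 - w_hi) * cost_disp[:, :, lo] + w_hi * cost_disp[:, :, hi]  # C_depth(u, v, z, :)

    # Usage: depth_bins = torch.linspace(2.0, 50.0, steps=48)  # candidate depths in meters (assumed)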
The stacked hourglass module comprises three hourglass networks, each of which processes the depth cost volume C_depth(u, v, z, :); bilinear interpolation is applied to the three resulting outputs to obtain feature maps S_depth1, S_depth2, and S_depth3, each of size Z×H×W. When training the depth estimation network framework, S_depth1, S_depth2, and S_depth3 are all taken as initial prediction results S; when validating the depth estimation network framework, the feature map S_depth3 output by the last hourglass network is taken as the initial prediction result S. Depth regression is performed on S to obtain an initial depth map DM of size H×W, and the initial depth map DM is input to the edge guiding module.
The depth regression maps the Z×H×W volume S to a per-pixel depth value by aggregating over the depth dimension.
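The patent's own regression formula is not reproduced above, so the following sketch should be read only as one common realization of the depth regression step: a soft-argmin (probability-weighted sum) over the depth dimension.

    import torch
    import torch.nn.functional as F

    def depth_regression(s, depth_bins):
        """s: (B, Z, H, W) regularized depth cost volume; depth_bins: (Z,) candidate depths."""
        prob = F.softmax(-s, dim=1)                                 # lower cost -> higher probability
        return (prob * depth_bins.view(1, -1, 1, 1)).sum(dim=1)     # (B, H, W) initial depth map DM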
as shown in fig. 3, the workflow of the edge guiding module includes:
(1) By edge detection operators
Figure BDA0003989779000000045
Performing joint extraction on edge information of the preprocessed image IML to obtain edge density E (u, v); at the same time by an edge detection operator->
Figure BDA0003989779000000046
Extracting edge information of the initial depth map DM to obtain edge density e (u, v);
wherein ,
Figure BDA0003989779000000047
Figure BDA0003989779000000048
(2) Constructing an edge loss function by E (u, v) and E (u, v), expressed as:
Figure BDA0003989779000000049
(3) Constructing a depth loss function corresponding to the initial depth map DM, wherein the depth loss function is expressed as:
Figure BDA00039897790000000410
(4) Finally, the joint loss function L is obtained tatal Expressed as:
L total =αL edge +βL depth (7)
wherein alpha and beta represent balance coefficients of corresponding loss terms;
(5) By the joint loss function L tatal And performing joint supervision to obtain a final prediction depth map with clear edges.
As a specific embodiment, the edge detection operators comprise the Laplacian operator and the Canny operator, and the edge information of the preprocessed image or the initial depth map is jointly extracted by the Laplacian and Canny operators. The Canny operator can detect weak image edges under noisy conditions, which complements the Laplacian operator, which locates step edges accurately but is easily disturbed by noise; together they achieve an edge enhancement effect.
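As an illustration, the following sketch combines Laplacian and Canny responses into an edge density and forms a joint loss of the form L_total = α·L_edge + β·L_depth. The normalization of the operator responses, the Canny thresholds, and the L1 form of the individual loss terms are assumptions not stated in the patent.

    import cv2
    import numpy as np

    def edge_density(img_u8):
        """Joint Laplacian + Canny edge response, normalized to [0, 1]."""
        lap = np.abs(cv2.Laplacian(img_u8, cv2.CV_32F, ksize=3))
        lap = lap / (lap.max() + 1e-6)
        canny = cv2.Canny(img_u8, 50, 150).astype(np.float32) / 255.0
        return np.clip(lap + canny, 0.0, 1.0)

    def joint_loss(E_img, e_depth, depth_pred, depth_gt, valid, alpha=1.0, beta=1.0):
        """L_total = alpha * L_edge + beta * L_depth (L1 forms assumed for illustration)."""
        l_edge = np.mean(np.abs(E_img - e_depth))
        l_depth = np.mean(np.abs(depth_pred[valid] - depth_gt[valid]))
        return alpha * l_edge + beta * l_depth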
S2: The depth estimation network framework is trained to obtain a first depth estimation network.
Specifically, step S2 comprises:
S21: image_2 and image_3 from the training and validation sets of the KITTI dataset are input into a pre-trained GAN network to perform style transfer from color images to infrared images;
S22: a dataset is constructed from the binocular infrared image files and their corresponding calib.txt and velodyne.bin files;
S23: training uses the Adam optimizer, with an initial learning rate of 1e-4 that is automatically decayed during training, and β1 = 0.9, β2 = 0.999;
S24: after each iteration, the training loss and validation loss are computed; the validation losses of the iterations are compared, the model parameters with the minimum validation loss are saved, and the model corresponding to these parameters is the first depth estimation network. A sketch of this training procedure is given below.
S3: The first depth estimation network is validated; validation is complete when the accuracy of the first depth estimation network meets a preset threshold, otherwise step S2 is repeated.
Specifically, step S3 includes:
S31: The binocular infrared images in the validation set are input into the first depth estimation network to obtain a predicted depth map Z_p;
S32: The 3D coordinates of the radar point cloud are converted into pixel-plane coordinates, the depth information z is retained, and the accuracy of the depth estimation is calculated, expressed as:
z = Z(u, v);    (8)
u = f_U · x / z + c_U;    (9)
v = f_V · y / z + c_V;    (10)
where x, y, z denote the spatial coordinate components of the radar point cloud; c_U and c_V denote the coordinate components of the camera principal point; and f_U and f_V denote the horizontal and vertical focal lengths of the camera, respectively. A sketch of this projection-based evaluation is given below.
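The following sketch projects lidar points onto the image plane using the formulas above and compares the retained depth z with the predicted depth map. The helper name evaluate_depth and the mean absolute relative error used as the accuracy measure are illustrative assumptions.

    import numpy as np

    def evaluate_depth(points_xyz, depth_pred, f_u, f_v, c_u, c_v):
        """points_xyz: (N, 3) lidar points in the camera frame; depth_pred: (H, W) map Z_p."""
        x, y, z = points_xyz.T
        keep = z > 0                                     # points in front of the camera
        x, y, z = x[keep], y[keep], z[keep]
        u = np.round(f_u * x / z + c_u).astype(int)      # pixel-plane coordinate, formula (9)
        v = np.round(f_v * y / z + c_v).astype(int)      # pixel-plane coordinate, formula (10)
        h, w = depth_pred.shape
        inside = (u >= 0) & (u < w) & (v >= 0) & (v < h)
        z_pred = depth_pred[v[inside], u[inside]]        # Z(u, v)
        z_true = z[inside]
        return np.mean(np.abs(z_pred - z_true) / z_true) # mean absolute relative error (assumed metric)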
S4: Binocular infrared depth estimation is performed with the first depth estimation network.
The foregoing is an exemplary embodiment of the present application, the scope of which is defined by the claims and their equivalents.

Claims (10)

1. A binocular infrared depth estimation method based on edge guiding and attention mechanisms, comprising:
S1: constructing an edge-guided depth estimation network framework;
S2: training the depth estimation network framework to obtain a first depth estimation network;
S3: validating the first depth estimation network, validation being complete when the accuracy of the first depth estimation network meets a preset threshold, otherwise repeating step S2;
S4: performing binocular infrared depth estimation with the first depth estimation network;
wherein the depth estimation network framework comprises an image preprocessing module, an edge guiding module, a feature extraction module, a pyramid pooling module, a mixed attention mechanism module, and a stacked hourglass module.
2. The method of claim 1, wherein the preprocessing of the image preprocessing module comprises:
preprocessing operations based on gamma correction and median filtering are performed on the binocular infrared images after distortion correction and Bouguet epipolar rectification, yielding preprocessed images IML and IMR respectively;
IML and IMR are input to the feature extraction module, and IML is input to the edge guiding module;
wherein the gamma correction is expressed as V_out = V_in^γ: when γ > 1, the contrast of high-gray regions of the image is enhanced; when γ < 1, the contrast of low-gray regions is enhanced; when γ = 1, the original image is unchanged.
3. The method of claim 2, wherein the workflow of the feature extraction module comprises:
a 3×3 convolution with stride 2 is applied to IML and IMR respectively for downsampling, followed by batch normalization and ReLU activation, yielding feature maps FL1 and FR1;
FL1 and FR1 are each passed through 3 consecutive residual blocks, with batch normalization and ReLU activation, yielding feature maps FL2 and FR2;
FL2 and FR2 are each passed through 16 consecutive residual blocks, with batch normalization and ReLU activation, yielding feature maps FL3 and FR3;
FL3 and FR3 are each passed through 3 consecutive residual blocks performing dilated convolution with a dilation factor of 2, with batch normalization and ReLU activation, yielding feature maps FL4 and FR4;
FL4 and FR4 are each passed through 3 residual blocks performing dilated convolution with a dilation factor of 4, with batch normalization and ReLU activation, yielding feature maps FL5 and FR5;
FL5 and FR5 are input to the pyramid pooling module.
4. The method of claim 3, wherein the workflow of the pyramid pooling module comprises:
adaptive average pooling operations with output sizes of 64×64, 32×32, 16×16, and 8×8 are applied to FL5 and FR5 respectively, producing four feature maps of different resolutions for each; the four feature maps are each reduced in dimensionality by a 1×1 convolution kernel and upsampled by bilinear interpolation, producing four feature maps of the same resolution;
the four feature maps corresponding to FL5 are concatenated with FL3 and FL5 to obtain the feature map FL6;
the four feature maps corresponding to FR5 are concatenated with FR3 and FR5 to obtain the feature map FR6;
FL6 and FR6 are input to the mixed attention mechanism module.
5. The method of claim 4, wherein the workflow of the mixed attention mechanism module comprises:
global max pooling and global average pooling over the spatial dimensions are applied to FL6 and FR6, giving two 1×1×C channel descriptors; both descriptors are fed through a two-layer neural network with ReLU activation, the two resulting C-dimensional features are added, and a Sigmoid activation yields the weight coefficient Ac; Ac is multiplied with FL6 and FR6 respectively to obtain intermediate features FL7 and FR7;
max pooling and average pooling along the channel dimension are applied to FL7 and FR7, giving two H×W×1 descriptors; these are concatenated along the channel dimension and passed through a 7×7 convolution layer with Sigmoid activation to obtain the weight coefficient As; As is multiplied with FL7 and FR7 respectively to obtain feature maps FL8 and FR8.
6. The method as recited in claim 5, comprising:
the feature maps FL8 and FR8 are concatenated along the channel dimension for each disparity level to obtain a four-dimensional cost volume C_disp(u, v, d, :);
bilinear interpolation and disparity-to-depth conversion are performed on C_disp(u, v, d, :) to obtain a depth cost volume C_depth(u, v, z, :);
the depth cost volume C_depth(u, v, z, :) is input to the stacked hourglass module;
wherein the disparity-to-depth conversion is expressed as:
Z(u, v) = f_U · B / D(u, v)    (1)
wherein f_U denotes the horizontal focal length; B denotes the baseline length; and D(u, v) and Z(u, v) denote the disparity and depth of the feature map at position (u, v), respectively.
7. The method of claim 6, wherein the stacked hourglass module comprises three hourglass networks, each of which processes the depth cost volume C_depth(u, v, z, :); bilinear interpolation is applied to the three resulting outputs to obtain feature maps S_depth1, S_depth2, and S_depth3, each of size Z×H×W;
when training the depth estimation network framework, S_depth1, S_depth2, and S_depth3 are all taken as initial prediction results S; when validating the depth estimation network framework, the feature map S_depth3 output by the last hourglass network is taken as the initial prediction result S;
depth regression is performed on S to obtain an initial depth map DM of size H×W;
the initial depth map DM is input to the edge guiding module;
wherein the depth regression maps the Z×H×W volume S to a per-pixel depth value by aggregating over the depth dimension.
8. The method of claim 7, wherein the workflow of the edge guiding module comprises:
edge information of the preprocessed image IML is jointly extracted by the edge detection operators to obtain an edge density E(u, v); at the same time, edge information of the initial depth map DM is extracted by the edge detection operators to obtain an edge density e(u, v);
an edge loss function L_edge is constructed from E(u, v) and e(u, v);
a depth loss function L_depth corresponding to the initial depth map DM is constructed;
finally, the joint loss function L_total is obtained, expressed as:
L_total = α·L_edge + β·L_depth    (7)
wherein α and β denote the balance coefficients of the corresponding loss terms;
joint supervision with the joint loss function L_total yields a final predicted depth map with sharp edges.
9. The method of claim 8, wherein training the depth estimation network framework in step S2 comprises:
inputting image_2 and image_3 from the training and validation sets of the KITTI dataset into a pre-trained GAN network to perform style transfer from color images to infrared images;
constructing a dataset from the binocular infrared image files and their corresponding calib.txt and velodyne.bin files;
training with the Adam optimizer, with an initial learning rate of 1e-4 that is automatically decayed during training, and β1 = 0.9, β2 = 0.999;
after each iteration, computing the training loss and validation loss, comparing the validation losses of the iterations, and saving the model parameters with the minimum validation loss, the model corresponding to these parameters being the first depth estimation network.
10. The method of claim 9, wherein, in step S3, validating the first depth estimation network comprises:
inputting the binocular infrared images in the validation set into the first depth estimation network to obtain a predicted depth map Z_p;
converting the 3D coordinates of the radar point cloud into pixel-plane coordinates, retaining the depth information z, and calculating the accuracy of the depth estimation, expressed as:
z = Z(u, v);    (8)
u = f_U · x / z + c_U;    (9)
v = f_V · y / z + c_V;    (10)
wherein x, y, z denote the spatial coordinate components of the radar point cloud; c_U and c_V denote the coordinate components of the camera principal point; and f_U and f_V denote the horizontal and vertical focal lengths of the camera, respectively.
CN202211588573.1A 2022-12-09 2022-12-09 Binocular infrared depth estimation method based on edge guiding and attention mechanism Active CN116128946B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211588573.1A CN116128946B (en) 2022-12-09 2022-12-09 Binocular infrared depth estimation method based on edge guiding and attention mechanism

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211588573.1A CN116128946B (en) 2022-12-09 2022-12-09 Binocular infrared depth estimation method based on edge guiding and attention mechanism

Publications (2)

Publication Number Publication Date
CN116128946A (en) 2023-05-16
CN116128946B (en) 2024-02-09

Family

ID=86296430

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211588573.1A Active CN116128946B (en) 2022-12-09 2022-12-09 Binocular infrared depth estimation method based on edge guiding and attention mechanism

Country Status (1)

Country Link
CN (1) CN116128946B (en)


Patent Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20220215567A1 (en) * 2019-05-10 2022-07-07 Nippon Telegraph And Telephone Corporation Depth estimation device, depth estimation model learning device, depth estimation method, depth estimation model learning method, and depth estimation program
CN111833386A (en) * 2020-07-22 2020-10-27 中国石油大学(华东) Pyramid binocular stereo matching method based on multi-scale information and attention mechanism
CN112150518A (en) * 2020-08-06 2020-12-29 江苏大学 Attention mechanism-based image stereo matching method and binocular device
CN112581517A (en) * 2020-12-16 2021-03-30 电子科技大学中山学院 Binocular stereo matching device and method
CN113763446A (en) * 2021-08-17 2021-12-07 沈阳工业大学 Stereo matching method based on guide information
CN114119704A (en) * 2021-12-02 2022-03-01 吉林大学 Light field image depth estimation method based on spatial pyramid pooling
CN114926669A (en) * 2022-05-17 2022-08-19 南京理工大学 Efficient speckle matching method based on deep learning
CN115049739A (en) * 2022-06-14 2022-09-13 贵州大学 Binocular vision stereo matching method based on edge detection
CN115170921A (en) * 2022-07-07 2022-10-11 广西师范大学 Binocular stereo matching method based on bilateral grid learning and edge loss
CN115170638A (en) * 2022-07-13 2022-10-11 东北林业大学 Binocular vision stereo matching network system and construction method thereof

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
XIAO SONG ET AL.: "EdgeStereo: An Effective Multi-Task Learning Network for Stereo Matching and Edge Detection", 《ARXIV:1903.01700V2》, pages 1 - 18 *
XIAOWEI YANG ET AL.: "Edge supervision and multi-scale cost volume for stereo matching", 《IMAGE AND VISION COMPUTING》, pages 3 - 4 *

Also Published As

Publication number Publication date
CN116128946B (en) 2024-02-09

Similar Documents

Publication Publication Date Title
CN108986050B (en) Image and video enhancement method based on multi-branch convolutional neural network
US10353271B2 (en) Depth estimation method for monocular image based on multi-scale CNN and continuous CRF
EP3510561B1 (en) Predicting depth from image data using a statistical model
CN109101975B (en) Image semantic segmentation method based on full convolution neural network
CN111259945B (en) Binocular parallax estimation method introducing attention map
WO2021013334A1 (en) Depth maps prediction system and training method for such a system
CN111695633B (en) Low-illumination target detection method based on RPF-CAM
CN109005398B (en) Stereo image parallax matching method based on convolutional neural network
CN111815665B (en) Single image crowd counting method based on depth information and scale perception information
CN109509156B (en) Image defogging processing method based on generation countermeasure model
CN110443775B (en) Discrete wavelet transform domain multi-focus image fusion method based on convolutional neural network
CN111861939B (en) Single image defogging method based on unsupervised learning
CN111553940B (en) Depth image edge optimization method and processing device
CN113222033A (en) Monocular image estimation method based on multi-classification regression model and self-attention mechanism
CN110751157B (en) Image significance segmentation and image significance model training method and device
Hegde et al. Adaptive cubic spline interpolation in cielab color space for underwater image enhancement
CN114677479A (en) Natural landscape multi-view three-dimensional reconstruction method based on deep learning
CN112509021A (en) Parallax optimization method based on attention mechanism
CN115511759A (en) Point cloud image depth completion method based on cascade feature interaction
CN116128946B (en) Binocular infrared depth estimation method based on edge guiding and attention mechanism
CN110766609B (en) Depth-of-field map super-resolution reconstruction method for ToF camera
Yang et al. Image defogging based on amended dark channel prior and 4‐directional L1 regularisation
Novikov et al. Local-adaptive blocks-based predictor for lossless image compression
CN115631223A (en) Multi-view stereo reconstruction method based on self-adaptive learning and aggregation
Shuang et al. Algorithms for improving the quality of underwater optical images: A comprehensive review

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant