CN116128946A - Binocular infrared depth estimation method based on edge guiding and attention mechanism - Google Patents
- Publication number
- CN116128946A CN116128946A CN202211588573.1A CN202211588573A CN116128946A CN 116128946 A CN116128946 A CN 116128946A CN 202211588573 A CN202211588573 A CN 202211588573A CN 116128946 A CN116128946 A CN 116128946A
- Authority
- CN
- China
- Prior art keywords
- depth
- map
- characteristic
- edge
- characteristic diagram
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/50—Depth or shape recovery
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T5/00—Image enhancement or restoration
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T5/00—Image enhancement or restoration
- G06T5/90—Dynamic range modification of images or parts thereof
- G06T5/92—Dynamic range modification of images or parts thereof based on global image properties
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/10—Segmentation; Edge detection
- G06T7/13—Edge detection
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/10—Image acquisition modality
- G06T2207/10028—Range image; Depth image; 3D point clouds
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/10—Image acquisition modality
- G06T2207/10048—Infrared image
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20024—Filtering details
- G06T2207/20032—Median filtering
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20081—Training; Learning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20084—Artificial neural networks [ANN]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20172—Image enhancement details
- G06T2207/20192—Edge enhancement; Edge preservation
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Image Processing (AREA)
- Image Analysis (AREA)
Abstract
The invention discloses a binocular infrared depth estimation method based on edge guiding and an attention mechanism. It relates to the technical field of binocular vision within computer vision and addresses the low depth estimation accuracy caused by the inherent drawbacks of infrared images, such as unclear texture, blurred edges, and indistinct features. A mixed attention module is constructed on the high-dimensional feature maps to capture the depth associations across channels and spatial positions of the features to be matched, promoting effective depth reasoning in the subsequent network. An edge guiding module is also introduced, and an edge-depth joint loss function is constructed to generate foreground depth maps with clear edges, smooth depth, and no depth holes, enabling an intelligent agent to keep operating normally in low-illumination environments.
Description
Technical Field
The application relates to the technical field of binocular vision in computer vision, in particular to a binocular infrared depth estimation method based on edge guiding and attention mechanisms.
Background
As a research hotspot in computer vision, binocular depth estimation has been widely applied in three-dimensional reconstruction, autonomous driving, mobile robotics, and related fields. For a pair of rectified stereo images captured by a binocular camera, the essence of depth estimation is to find, for each pixel, the matching point at which the matching cost is minimal, and to output the disparity between the left and right matching points. Compared with traditional depth estimation algorithms, deep-learning-based algorithms can effectively mitigate the ill-posedness of image depth estimation and can use prior knowledge to learn and estimate the depth of occluded and weakly textured regions. However, most research is based only on visible-light images. Unlike a visible-light camera, which is limited by low illumination and harsh environments, an infrared camera can still image in low-illumination environments by receiving the infrared electromagnetic waves emitted by objects in the environment, thereby sensing the surroundings. It is therefore worthwhile to exploit this characteristic of infrared cameras and develop depth estimation research based on infrared images; on the other hand, such research has certain difficulties due to the inherent drawbacks of infrared images, such as unclear texture, blurred edges, and indistinct features.
Disclosure of Invention
The application provides a binocular infrared depth estimation method based on an edge guiding and attention mechanism, which aims to compensate for the inherent drawbacks of infrared images, such as unclear texture, blurred edges, and indistinct features, so that an intelligent agent can maintain normal operation in a low-illumination environment.
The technical aim of the application is achieved through the following technical scheme:
a binocular infrared depth estimation method based on edge guiding and attention mechanisms, comprising:
s1: constructing a depth estimation network framework based on edge guidance;
s2: training the depth estimation network framework to obtain a first depth estimation network;
s3: verifying the first depth estimation network, completing verification when the accuracy of the first depth estimation network meets a preset threshold, otherwise, repeating the step S2;
s4: performing binocular infrared depth estimation through the first depth estimation network;
the depth estimation network framework comprises an image preprocessing module, a feature extraction module, a pyramid pooling module, a mixed attention mechanism module, a stacking hourglass module and an edge guiding module.
The beneficial effects of this application are as follows. In the binocular infrared depth estimation method based on edge guiding and an attention mechanism, an image preprocessing module based on gamma correction and median filtering is introduced to enhance image edge and detail information, providing the convolutional neural network with richer deep feature representations to mine. A mixed attention module is constructed on the high-dimensional feature maps to capture the depth associations across channels and spatial positions of the features to be matched, promoting effective depth reasoning in the subsequent network. An edge guiding module is also introduced, and an edge-depth joint loss function is constructed to generate foreground depth maps with clear edges, smooth depth, and no depth holes, enabling an intelligent agent to keep operating normally in low-illumination environments.
Drawings
FIG. 1 is a flow chart of a method described herein;
FIG. 2 is a schematic diagram of a hybrid attention mechanism module;
fig. 3 is a schematic view of an edge guide module.
Detailed Description
The technical scheme of the application will be described in detail below with reference to the accompanying drawings.
As shown in fig. 1, the binocular infrared depth estimation method based on the edge guiding and attention mechanism described in the application comprises the following steps:
s1: an edge-guided depth estimation network framework is constructed.
Specifically, the depth estimation network framework comprises an image preprocessing module, a feature extraction module, a pyramid pooling module, a mixed attention mechanism module, a stacking hourglass module and an edge guiding module.
The preprocessing of the image preprocessing module comprises the following steps:
(1) Preprocessing based on gamma correction and median filtering is applied to the binocular infrared images after distortion correction and Bouguet epipolar rectification: the left and right images each undergo the same gamma correction and median filtering, yielding preprocessed images IML and IMR respectively.
Gamma correction edits the gamma curve of the image to perform nonlinear tone editing: it detects the dark and light portions of the image signal and increases their ratio, thereby improving image contrast. Gamma correction is written V_out = V_in^γ, where: when γ > 1, the contrast of high-gray regions of the image is enhanced; when γ < 1, the contrast of low-gray regions is enhanced; when γ = 1, the original image is unchanged.
The basic principle of median filtering is to replace the value of a point in a digital image or digital sequence with the median of the values in its neighborhood, bringing outlying pixel values closer to the true value and thereby eliminating isolated noise points.
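The two preprocessing steps can be sketched as follows (a minimal NumPy illustration of gamma correction followed by 3×3 median filtering on an image normalized to [0, 1]; the 3×3 window and the sample γ are illustrative assumptions, not values from the patent):

```python
import numpy as np

def gamma_correct(img, gamma):
    """Apply V_out = V_in^gamma on an image normalized to [0, 1]."""
    return np.clip(img, 0.0, 1.0) ** gamma

def median_filter3(img):
    """3x3 median filter; border pixels are handled by reflection padding."""
    padded = np.pad(img, 1, mode="reflect")
    stacked = np.stack([padded[i:i + img.shape[0], j:j + img.shape[1]]
                        for i in range(3) for j in range(3)])
    return np.median(stacked, axis=0)

def preprocess(img, gamma=0.8):
    # gamma < 1 boosts contrast in low-gray (dark) regions, which helps dim
    # infrared imagery; median filtering then removes isolated noise points.
    return median_filter3(gamma_correct(img, gamma))
```

With γ < 1 the dark regions of a dim infrared frame gain contrast, and the median filter suppresses isolated hot-pixel noise while preserving edges better than a mean filter would.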
(2) IML and IMR are input to the feature extraction module, and IML is input to the edge steering module.
The workflow of the feature extraction module comprises:
(1) Perform a 3×3 convolution downsampling operation with stride 2 on IML and IMR, followed by batch normalization and ReLU activation, to obtain feature maps FL_1 and FR_1 at half the input resolution;
(2) Pass FL_1 and FR_1 through 3 consecutive distinct residual blocks, with batch normalization and ReLU activation, to obtain feature maps FL_2 and FR_2;
(3) Pass FL_2 and FR_2 through 16 consecutive distinct residual blocks, with batch normalization and ReLU activation, to obtain feature maps FL_3 and FR_3;
(4) Pass FL_3 and FR_3 through 3 consecutive distinct residual blocks applying dilated convolution with dilation factor 2, with batch normalization and ReLU activation, to obtain feature maps FL_4 and FR_4;
(5) Pass FL_4 and FR_4 through 3 distinct residual blocks applying dilated convolution with dilation factor 4, with batch normalization and ReLU activation, to obtain feature maps FL_5 and FR_5;
(6) Input FL_5 and FR_5 to the pyramid pooling module.
The workflow of the pyramid pooling module comprises:
(1) Apply adaptive global average pooling at sizes 64×64, 32×32, 16×16, and 8×8 to feature maps FL_5 and FR_5, yielding four feature maps at different resolutions; reduce the channel dimensionality of each with a 1×1 convolution kernel, then perform bilinear interpolation upsampling so that the four maps share the same resolution;
(2) Concatenate the four upsampled maps corresponding to FL_5 with FL_3 and FL_5 to obtain feature map FL_6;
(3) Concatenate the four upsampled maps corresponding to FR_5 with FR_3 and FR_5 to obtain feature map FR_6;
(4) Input FL_6 and FR_6 to the mixed attention mechanism module.
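The pool-upsample-concatenate pattern above can be sketched in NumPy as follows (a structural sketch only: the 1×1 channel-reduction convolution is omitted, and nearest-neighbour upsampling stands in for the bilinear interpolation used in the method):

```python
import numpy as np

def adaptive_avg_pool(fmap, out_h, out_w):
    """Average-pool a (C, H, W) feature map to (C, out_h, out_w).
    Assumes H, W are divisible by out_h, out_w for brevity."""
    c, h, w = fmap.shape
    return fmap.reshape(c, out_h, h // out_h, out_w, w // out_w).mean(axis=(2, 4))

def pyramid_pool(fmap, sizes=(64, 32, 16, 8)):
    """Pool at several scales, upsample each branch back to the input
    resolution, and concatenate all branches with the input along the
    channel axis (nearest-neighbour repeat stands in for bilinear)."""
    c, h, w = fmap.shape
    branches = [fmap]
    for s in sizes:
        pooled = adaptive_avg_pool(fmap, s, s)
        up = pooled.repeat(h // s, axis=1).repeat(w // s, axis=2)
        branches.append(up)
    return np.concatenate(branches, axis=0)
```

Each branch summarizes context at a different scale; concatenating them gives the subsequent matching stages both local detail and global context.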
As shown in fig. 2, the mixed attention mechanism module includes a channel attention mechanism and a spatial attention mechanism, and the workflow of the mixed attention mechanism module includes:
(1) Apply global max pooling and global average pooling over the spatial dimensions of FL_6 and FR_6 to obtain two 1×1×C channel descriptors; feed both descriptors into a shared two-layer neural network with ReLU activation to output two C-dimensional features; add the two features and pass the sum through a Sigmoid activation to obtain the weight coefficient Ac; multiply Ac with FL_6 and FR_6 respectively to obtain intermediate features FL_7 and FR_7;
(2) Apply max pooling and average pooling along the channel dimension of FL_7 and FR_7 to obtain two H×W×1 descriptors; concatenate them along the channel axis, then pass the result through a 7×7 convolution layer with Sigmoid activation to obtain the weight coefficient As; multiply As with FL_7 and FR_7 respectively to obtain feature maps FL_8 and FR_8.
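The two stages above follow a channel-then-spatial attention pattern; the NumPy sketch below illustrates the data flow under that reading (the shared two-layer MLP is passed in as raw weight matrices, and a simple sum stands in for the learned 7×7 convolution over the concatenated descriptors, so this is a structural sketch rather than the exact module):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_attention(fmap, w1, w2):
    """Channel attention on a (C, H, W) map.
    w1: (C, C//r) and w2: (C//r, C) are the shared two-layer MLP weights."""
    max_desc = fmap.max(axis=(1, 2))               # global max pool -> (C,)
    avg_desc = fmap.mean(axis=(1, 2))              # global avg pool -> (C,)
    mlp = lambda d: np.maximum(d @ w1, 0.0) @ w2   # ReLU hidden layer
    ac = sigmoid(mlp(max_desc) + mlp(avg_desc))    # weight coefficient Ac
    return fmap * ac[:, None, None]

def spatial_attention(fmap):
    """Spatial attention: pool along channels, derive a weight map As.
    (A sum + sigmoid stands in for the learned 7x7 convolution.)"""
    max_desc = fmap.max(axis=0)                    # (H, W)
    avg_desc = fmap.mean(axis=0)                   # (H, W)
    a_s = sigmoid(max_desc + avg_desc)             # weight coefficient As
    return fmap * a_s[None, :, :]

def mixed_attention(fmap, w1, w2):
    return spatial_attention(channel_attention(fmap, w1, w2))
```

Both gates lie in (0, 1), so the module re-weights rather than amplifies features, emphasizing informative channels and positions before cost-volume construction.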
After the mixed attention mechanism module, the feature maps FL_8 and FR_8 are concatenated along the channel and disparity dimensions to form a four-dimensional cost volume C_disp(u, v, d, :). Bilinear interpolation and disparity-depth conversion are then applied to C_disp(u, v, d, :) to obtain a depth cost volume C_depth(u, v, z, :), which is finally input to the stacked hourglass module.

The disparity-depth conversion is expressed as:

Z(u, v) = f_U · B / D(u, v)

where f_U denotes the horizontal focal length; B denotes the baseline length; and D(u, v) and Z(u, v) denote the disparity and depth of the feature map at position (u, v), respectively.
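This conversion is the standard pinhole-stereo relation and can be written directly:

```python
import numpy as np

def disparity_to_depth(d, f_u, baseline):
    """Z(u, v) = f_U * B / D(u, v) for each pixel of a disparity map."""
    return f_u * baseline / np.asarray(d, dtype=float)
```

For example, with f_U = 500 px and B = 0.1 m, a disparity of 10 px corresponds to a depth of 5 m; depth falls off as the reciprocal of disparity.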
The stacked hourglass module comprises three hourglass networks, each of which processes the depth cost volume C_depth(u, v, z, :); the three resulting feature maps are upsampled by bilinear interpolation to obtain feature maps S_depth1, S_depth2 and S_depth3, each of size Z×H×W. When training the depth estimation network framework, S_depth1, S_depth2 and S_depth3 together serve as the initial prediction result S; when verifying the framework, the feature map S_depth3 output by the last hourglass network serves as the initial prediction result S. Depth regression is performed on S to obtain an initial depth map DM of size H×W, and DM is input to the edge guiding module.
Wherein the depth regression is expressed as:
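The patent's regression equation image did not survive extraction; the sketch below assumes the softmax-weighted (soft-argmax) regression over the depth dimension commonly used with stacked-hourglass cost volumes, which keeps the prediction differentiable:

```python
import numpy as np

def soft_depth_regression(cost, depth_values):
    """Soft-argmax regression over a (Z, H, W) cost volume:
    Z_hat(u, v) = sum_z depth_values[z] * softmax_z(-cost)(u, v).
    This standard formulation is an assumption, not the patent's exact
    equation (the original formula image is not recoverable)."""
    shifted = -cost - (-cost).max(axis=0, keepdims=True)  # numerically stable
    e = np.exp(shifted)
    p = e / e.sum(axis=0, keepdims=True)                  # per-pixel softmax over z
    return np.tensordot(depth_values, p, axes=(0, 0))     # (H, W) depth map
```

Low cost at a depth bin translates into high probability mass there, so the weighted sum lands near the minimum-cost depth while remaining smooth.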
as shown in fig. 3, the workflow of the edge guiding module includes:
(1) Edge information of the preprocessed image IML is extracted by the joint edge detection operators to obtain the edge density E(u, v); at the same time, edge information of the initial depth map DM is extracted by the same operators to obtain the edge density e(u, v);
(2) An edge loss function is constructed from E(u, v) and e(u, v), expressed as:
(3) Constructing a depth loss function corresponding to the initial depth map DM, wherein the depth loss function is expressed as:
(4) Finally, the joint loss function L_total is obtained, expressed as:

L_total = α·L_edge + β·L_depth (7)

where α and β denote the balance coefficients of the corresponding loss terms;

(5) Joint supervision via the joint loss function L_total yields a final predicted depth map with clear edges.
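A sketch of this joint supervision under stated assumptions: only equation (7), the weighted sum, is taken from the patent; the exact forms of L_edge and L_depth are not recoverable from the text, so an L1 edge term and a smooth-L1 depth term are assumed here:

```python
import numpy as np

def edge_loss(E, e):
    """Assumed L1 gap between image edge density E and depth-map edge density e."""
    return float(np.mean(np.abs(E - e)))

def depth_loss(pred, gt, mask):
    """Assumed smooth-L1 depth term over valid (e.g. LiDAR-covered) pixels."""
    diff = np.abs(pred - gt)[mask]
    return float(np.mean(np.where(diff < 1.0, 0.5 * diff ** 2, diff - 0.5)))

def joint_loss(E, e, pred, gt, mask, alpha=1.0, beta=1.0):
    # L_total = alpha * L_edge + beta * L_depth  (equation 7)
    return alpha * edge_loss(E, e) + beta * depth_loss(pred, gt, mask)
```

The balance coefficients α and β trade edge sharpness against depth accuracy; their values are hyperparameters not specified in the surviving text.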
As a specific embodiment, the edge detection operators include the Laplacian operator and the Canny operator, and the edge information of the preprocessed image or the initial depth map is jointly extracted by both. The Canny operator can detect weak image edges under noisy conditions, complementing the Laplacian operator, which locates step edges accurately but is easily disturbed by noise; together they achieve edge enhancement.
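A NumPy sketch of fusing two complementary operators into one edge-density map (a Sobel gradient magnitude stands in for the full Canny pipeline here, so this illustrates the fusion idea rather than the exact operators):

```python
import numpy as np

LAPLACIAN = np.array([[0, 1, 0], [1, -4, 1], [0, 1, 0]], dtype=float)
SOBEL_X = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)

def conv2(img, k):
    """3x3 cross-correlation with reflection padding (same-size output)."""
    p = np.pad(img, 1, mode="reflect")
    out = np.zeros_like(img, dtype=float)
    for i in range(3):
        for j in range(3):
            out += k[i, j] * p[i:i + img.shape[0], j:j + img.shape[1]]
    return out

def joint_edge_density(img):
    """Combine the Laplacian response with a Sobel gradient magnitude
    (simplified stand-in for Canny) into one normalized edge map."""
    lap = np.abs(conv2(img, LAPLACIAN))
    gx, gy = conv2(img, SOBEL_X), conv2(img, SOBEL_X.T)
    grad = np.hypot(gx, gy)
    return np.maximum(lap / (lap.max() + 1e-8), grad / (grad.max() + 1e-8))
```

Taking the pixelwise maximum keeps an edge if either operator fires, which is the complementarity the embodiment describes: precise step-edge localization from the Laplacian plus robustness to noise from the gradient-based detector.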
S2: and training the depth estimation network framework to obtain a first depth estimation network.
Specifically, step S2 includes:
S21: Input image_2 and image_3 from the training and verification sets of the KITTI data set into a GAN network trained in advance, to perform style transfer from color images to infrared images;
S22: Construct a data set from the binocular infrared image files and their corresponding calib.txt and velodyne.bin files;
S23: Train using the Adam optimizer, with the initial learning rate set to 1e-4 and automatically decayed during training; β1 = 0.9, β2 = 0.999;
S24: After each iteration, compute the training loss and verification loss; compare the verification losses across iterations and keep the model parameters with the minimum verification loss. The model corresponding to those parameters is the first depth estimation network.
S3: and (3) verifying the first depth estimation network, when the accuracy of the first depth estimation network meets a preset threshold, completing verification, otherwise, repeating the step (S2).
Specifically, step S3 includes:
s31: inputting the binocular infrared image in the verification set into the first depth estimation network to obtain a predicted depth map Z p ;
S32: predicting the depth map Z p The 3D coordinates of the radar point cloud are converted into pixel plane coordinates, the depth information z is reserved, and the accuracy of the depth estimation is calculated and expressed as follows:
z=Z(u,v);(8)
wherein x, y, z represent the spatial coordinate components of the radar point cloud; c U 、c V Representing a camera principal point coordinate component; f (f) U 、f V Representing the horizontal and vertical focal lengths of the camera, respectively.
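A sketch of this verification step under the above reading (the projection u = f_U·x/z + c_U, v = f_V·y/z + c_V is reconstructed from the symbol definitions in the text; the δ < 1.25 accuracy ratio is a commonly used depth metric assumed here, not taken from the patent):

```python
import numpy as np

def project_points(points, f_u, f_v, c_u, c_v):
    """Project LiDAR points (N, 3) = (x, y, z) onto the pixel plane,
    keeping the depth z as the ground-truth value at that pixel."""
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    u = f_u * x / z + c_u
    v = f_v * y / z + c_v
    return np.stack([u, v, z], axis=1)

def depth_accuracy(pred, gt_uvz, delta=1.25):
    """Share of projected ground-truth points whose predicted depth is
    within the delta ratio threshold (assumed metric)."""
    u = np.round(gt_uvz[:, 0]).astype(int)
    v = np.round(gt_uvz[:, 1]).astype(int)
    z = gt_uvz[:, 2]
    ratio = np.maximum(pred[v, u] / z, z / pred[v, u])
    return float(np.mean(ratio < delta))
```

Because LiDAR returns are sparse, the accuracy is evaluated only at pixels covered by a projected point rather than over the full depth map.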
S4: and performing binocular infrared depth estimation through the first depth estimation network.
The foregoing is an exemplary embodiment of the present application, the scope of which is defined by the claims and their equivalents.
Claims (10)
1. A binocular infrared depth estimation method based on edge guiding and attention mechanisms, comprising:
s1: constructing a depth estimation network framework based on edge guidance;
s2: training the depth estimation network framework to obtain a first depth estimation network;
s3: verifying the first depth estimation network, completing verification when the accuracy of the first depth estimation network meets a preset threshold, otherwise, repeating the step S2;
s4: performing binocular infrared depth estimation through the first depth estimation network;
the depth estimation network framework comprises an image preprocessing module, an edge guiding module, a feature extraction module, a pyramid pooling module, a mixed attention mechanism module and a stacking hourglass module.
2. The method of claim 1, wherein the preprocessing of the image preprocessing module comprises:
preprocessing operations based on gamma correction and median filtering are performed on the binocular infrared images after distortion correction and Bouguet epipolar rectification, yielding preprocessed images IML and IMR respectively;
inputting an IML and an IMR to the feature extraction module, and inputting an IML to the edge guide module;
wherein the gamma correction is written V_out = V_in^γ, where: when γ > 1, the contrast of high-gray regions of the image is enhanced; when γ < 1, the contrast of low-gray regions is enhanced; when γ = 1, the original image is unchanged.
3. The method of claim 2, wherein the workflow of the feature extraction module comprises:
performing a 3×3 convolution downsampling operation with stride 2 on IML and IMR respectively, followed by batch normalization and ReLU activation, to obtain feature maps FL_1 and FR_1 at half the input resolution;
passing FL_1 and FR_1 through 3 consecutive distinct residual blocks, with batch normalization and ReLU activation, to obtain feature maps FL_2 and FR_2;
passing FL_2 and FR_2 through 16 consecutive distinct residual blocks, with batch normalization and ReLU activation, to obtain feature maps FL_3 and FR_3;
passing FL_3 and FR_3 through 3 consecutive distinct residual blocks applying dilated convolution with dilation factor 2, with batch normalization and ReLU activation, to obtain feature maps FL_4 and FR_4;
passing FL_4 and FR_4 through 3 distinct residual blocks applying dilated convolution with dilation factor 4, with batch normalization and ReLU activation, to obtain feature maps FL_5 and FR_5;
inputting FL_5 and FR_5 to the pyramid pooling module.
4. The method of claim 3, wherein the workflow of the pyramid pooling module comprises:
applying adaptive global average pooling at sizes 64×64, 32×32, 16×16 and 8×8 to feature maps FL_5 and FR_5, yielding four feature maps at different resolutions; reducing the channel dimensionality of each with a 1×1 convolution kernel, and performing bilinear interpolation upsampling so that the four maps share the same resolution;
concatenating the four upsampled maps corresponding to FL_5 with FL_3 and FL_5 to obtain feature map FL_6;
concatenating the four upsampled maps corresponding to FR_5 with FR_3 and FR_5 to obtain feature map FR_6;
inputting FL_6 and FR_6 to the mixed attention mechanism module.
5. The method of claim 4, wherein the workflow of the mixed attention mechanism module comprises:
applying global max pooling and global average pooling over the spatial dimensions of FL_6 and FR_6 to obtain two 1×1×C channel descriptors; feeding both descriptors into a shared two-layer neural network with ReLU activation to output two C-dimensional features; adding the two features and passing the sum through a Sigmoid activation to obtain the weight coefficient Ac; multiplying Ac with FL_6 and FR_6 respectively to obtain intermediate features FL_7 and FR_7;
applying max pooling and average pooling along the channel dimension of FL_7 and FR_7 to obtain two H×W×1 descriptors; concatenating them along the channel axis, then passing the result through a 7×7 convolution layer with Sigmoid activation to obtain the weight coefficient As; multiplying As with FL_7 and FR_7 respectively to obtain feature maps FL_8 and FR_8.
6. The method as recited in claim 5, comprising:
concatenating the feature maps FL_8 and FR_8 along the channel and disparity dimensions to form a four-dimensional cost volume C_disp(u, v, d, :);
applying bilinear interpolation and disparity-depth conversion to C_disp(u, v, d, :) to obtain a depth cost volume C_depth(u, v, z, :);
inputting the depth cost volume C_depth(u, v, z, :) to the stacked hourglass module;
wherein the disparity-depth conversion is expressed as:
Z(u, v) = f_U · B / D(u, v)
where f_U denotes the horizontal focal length; B denotes the baseline length; and D(u, v) and Z(u, v) denote the disparity and depth of the feature map at position (u, v), respectively.
7. The method of claim 6, wherein the stacked hourglass module comprises three hourglass networks, each of which processes the depth cost volume C_depth(u, v, z, :); the three resulting feature maps are upsampled by bilinear interpolation to obtain feature maps S_depth1, S_depth2 and S_depth3, each of size Z×H×W;
when training the depth estimation network framework, S_depth1, S_depth2 and S_depth3 together serve as the initial prediction result S; when verifying the framework, the feature map S_depth3 output by the last hourglass network serves as the initial prediction result S;
depth regression is performed on S to obtain an initial depth map DM of size H×W;
the initial depth map DM is input to the edge guiding module;
wherein the depth regression is expressed as:
8. the method of claim 7, wherein the workflow of the edge-directed module comprises:
edge information of the preprocessed image IML is extracted by the joint edge detection operators to obtain the edge density E(u, v); at the same time, edge information of the initial depth map DM is extracted by the same operators to obtain the edge density e(u, v);
an edge loss function is constructed from E(u, v) and e(u, v), expressed as:
and a depth loss function corresponding to the initial depth map DM is constructed, expressed as:
finally, the joint loss function L_total is obtained, expressed as:
L_total = α·L_edge + β·L_depth (7)
where α and β denote the balance coefficients of the corresponding loss terms;
joint supervision via the joint loss function L_total yields a final predicted depth map with clear edges.
9. The method of claim 8, wherein training the depth estimation network framework in step S2 comprises:
inputting an image_2 and an image_3 in a training set and a verification set in the KITTI data set into a GAN network obtained by training in advance to realize style migration from a color image to an infrared image;
constructing a data set through a binocular infrared image file and calib.txt and velodyne.bin files corresponding to the binocular infrared image file;
training using the Adam optimizer, with the initial learning rate set to 1e-4 and automatically decayed during training; β1 = 0.9, β2 = 0.999;
After each iteration, training loss and verification loss are calculated, verification loss of each iteration is compared, model parameters with minimum verification loss are stored, and a model corresponding to the model parameters with minimum verification loss is the first depth estimation network.
10. The method of claim 9, wherein validating the first depth estimation network in step S3 comprises:
inputting the binocular infrared images of the verification set into the first depth estimation network to obtain a predicted depth map Z_p;
converting the 3D coordinates of the radar point cloud into pixel-plane coordinates, retaining the depth information z, and calculating the accuracy of the depth estimation, expressed as:
u = f_U·x/z + c_U, v = f_V·y/z + c_V, z = Z_p(u, v); (8)
wherein x, y, z represent the spatial coordinate components of the radar point cloud; c_U, c_V represent the principal-point coordinate components of the camera; f_U, f_V represent the horizontal and vertical focal lengths of the camera, respectively.
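The pinhole projection of claim 10 can be sketched as follows. The accuracy formula itself is not reproduced in this record, so the sketch scores the prediction with a standard δ < 1.25 threshold metric as an assumed stand-in:

```python
import numpy as np

def project_and_compare(points, Zp, fU, fV, cU, cV, threshold=1.25):
    """Project radar points (x, y, z) to pixel coordinates and score the predicted depth Zp.

    Pinhole model implied by the claim's definitions:
        u = fU * x / z + cU,   v = fV * y / z + cV,   z_pred = Zp[v, u]
    Accuracy here = fraction of valid points with max(z/z_pred, z_pred/z) < threshold
    (a common delta metric; the patent's exact accuracy formula may differ).
    """
    H, W = Zp.shape
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    u = np.round(fU * x / z + cU).astype(int)
    v = np.round(fV * y / z + cV).astype(int)
    # Keep only points that land inside the image and in front of the camera
    valid = (u >= 0) & (u < W) & (v >= 0) & (v < H) & (z > 0)
    z_gt, z_pred = z[valid], Zp[v[valid], u[valid]]
    ratio = np.maximum(z_gt / z_pred, z_pred / z_gt)
    return (ratio < threshold).mean()

# Example: a constant predicted depth map that exactly matches two radar points
Zp = np.full((4, 4), 5.0)
pts = np.array([[0.0, 0.0, 5.0], [5.0, 5.0, 5.0]])
acc = project_and_compare(pts, Zp, fU=1.0, fV=1.0, cU=2.0, cV=2.0)
```

Projecting the sparse LiDAR points into the image plane and indexing Z_p at the resulting pixels is what lets a sparse 3D ground truth supervise a dense 2D depth prediction.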
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211588573.1A CN116128946B (en) | 2022-12-09 | 2022-12-09 | Binocular infrared depth estimation method based on edge guiding and attention mechanism |
Publications (2)
Publication Number | Publication Date |
---|---|
CN116128946A true CN116128946A (en) | 2023-05-16 |
CN116128946B CN116128946B (en) | 2024-02-09 |
Family
ID=86296430
Legal Events
Date | Code | Title | Description
---|---|---|---
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||