CN114494386A - Infrared image depth estimation method based on multi-spectral image supervision


Info

Publication number
CN114494386A
CN114494386A
Authority
CN
China
Prior art keywords
image
loss
parallax
infrared
depth estimation
Prior art date
Legal status
Pending
Application number
CN202111531301.3A
Other languages
Chinese (zh)
Inventor
孙正兴
刘胡伟
孙蕴瀚
张巍
Current Assignee
Nanjing University
Original Assignee
Nanjing University
Priority date: 2021-12-14
Filing date: 2021-12-14
Publication date: 2022-05-13
Application filed by Nanjing University filed Critical Nanjing University
Priority to CN202111531301.3A priority Critical patent/CN114494386A/en
Publication of CN114494386A publication Critical patent/CN114494386A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/50 Depth or shape recovery
    • G06T7/55 Depth or shape recovery from multiple images
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/10 Image acquisition modality
    • G06T2207/10048 Infrared image
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20081 Training; Learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20084 Artificial neural networks [ANN]

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Processing (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a multi-spectral-image-supervised infrared image depth estimation method, which comprises the following steps: 1) constructing a spectrum conversion module: obtaining spectrum conversion images from the multi-spectral images; 2) constructing a depth estimation module: obtaining parallax from the infrared image; 3) constructing a spectrum conversion loss module: computing the spectrum conversion loss and iteratively optimizing the spectrum conversion network model with this loss; 4) constructing a depth estimation loss module: performing image warping with the obtained parallax, computing the depth estimation loss, and iteratively optimizing the depth estimation network model with this loss; 5) constructing an auxiliary loss module: computing the auxiliary loss by image warping from the obtained spectrum conversion images and parallax, and iteratively optimizing the spectrum conversion network model with this loss; 6) training the whole framework, which comprises four stages: data preprocessing, model framework warm-up, training and testing.

Description

Infrared image depth estimation method based on multi-spectral image supervision
Technical Field
The invention relates to an infrared image depth estimation method, belongs to the technical field of computer graphics, and particularly relates to an infrared image depth estimation method based on multi-spectral image supervision.
Background
For many large and complex engineering projects, there is an urgent need for a low-cost solution that can monitor project quality over long periods and periodically detect defects, both to ensure project safety and to meet daily-maintenance requirements; inspection robots have therefore received much attention. An inspection robot acquires multi-modal information through various sensors to complete a series of two-dimensional or three-dimensional tasks, such as defect detection among the two-dimensional tasks and three-dimensional reconstruction among the three-dimensional tasks, and depth information plays an important role in these tasks.
Because an infrared camera is insensitive to the environment (it directly measures the infrared radiation of objects and the surroundings whether or not an external light source is present), obtaining depth information from infrared images acquired by a monocular infrared camera has advantages over other methods. Compared with active sensors such as lidar and structured-light depth cameras, which are expensive and exhibit various defects in complex scenes, a passive sensor based on standard imaging technology, such as an infrared camera, is cheaper, lighter and more adaptable; it can be deployed more flexibly on an inspection robot and can cope with different complex environments. It also compares favourably with methods that obtain depth information by combining RGB images from a monocular RGB camera with depth estimation techniques, which cannot predict well at night or in low-light or even zero-light environments; see document 1: Godard C, Mac Aodha O, Firman M, et al. Digging Into Self-Supervised Monocular Depth Estimation. International Conference on Computer Vision, 2019: 3827-3837.
Current infrared image depth estimation methods that obtain depth information from a single infrared image are supervised, for example document 2: Wang Q, Zhao H, Hu Z, et al. Densely connected CRF networks for depth estimation from monocular infrared images. Int. J. Mach. Learn. & Cybern. 12, 2021: 187-200. Without depth labels it is difficult to obtain depth information from a single infrared image alone, so it is desirable to generate the supervision signal with a cheaper RGB camera instead. However, infrared image depth estimation based on multi-spectral supervision faces the problem that the spectra differ greatly in appearance and cannot be matched directly. The invention therefore provides a multi-spectral-image-supervised infrared image depth estimation method, which can obtain depth information from a single infrared image by using multi-modal information as the supervision signal, without depth-label supervision.
Disclosure of Invention
The purpose of the invention is as follows: the invention aims to overcome the deficiencies of the prior art and provides a multi-spectral-image-supervised infrared image depth estimation method which can accurately estimate the depth information of a single infrared image.
In order to solve the above technical problem, the invention discloses a multi-spectral-image-supervised infrared image depth estimation method, which comprises the following steps:
step 1, constructing a spectrum conversion module: building a frequency spectrum conversion network, inputting the multi-frequency spectrum image into a frequency spectrum conversion network model, and obtaining a frequency spectrum conversion image of the multi-frequency spectrum image under the condition of neglecting parallax, namely converting an infrared right image and an RGB left image into an RGB right image and an infrared left image; the right image is an image obtained by a right camera in the binocular camera, the left image is an image obtained by a left camera in the binocular camera, the infrared spectrum indicates that the frequency spectrum of the image is an infrared spectrum, and the RGB indicates that the frequency spectrum of the image is a visible light spectrum, namely, the image is composed of three channels of red, green and blue.
Step 2, constructing a depth estimation module: building a depth estimation network, and inputting the infrared right image into a depth estimation network model to obtain parallax;
step 3, constructing a spectrum conversion loss module: acquiring a cyclic conversion image and a consistent reconstruction image by using the spectrum conversion network model and the spectrum conversion image obtained in the step 1, calculating to obtain a spectrum conversion loss, and iteratively optimizing the spectrum conversion network model by using the loss;
step 4, constructing a depth estimation loss module: carrying out image warping by using the parallax obtained in the step (2), calculating to obtain depth estimation loss, and iteratively optimizing a depth estimation network model by using the loss;
Step 5, constructing an auxiliary loss module: calculating the auxiliary loss through image warping, using the spectrum conversion images and the parallax obtained in step 1 and step 2, and iteratively optimizing the spectrum conversion network model with this loss.
Step 6, training the whole framework: unify the multi-spectral image data set to consistent channels through channel expansion; input the processed data into the spectrum conversion module to obtain spectrum conversion images; warm up the spectrum conversion module with the spectrum conversion loss module; input the data into the depth estimation module to obtain the parallax; iteratively optimize the spectrum conversion module and the depth estimation module in turn with the depth estimation loss module, the spectrum conversion loss module and the auxiliary loss module; and realize depth estimation of a single infrared image with the trained depth estimation module plus post-processing.
The step 1 comprises the following steps:
step 1-1: the construction of the spectrum conversion network comprises the construction of a spectrum conversion generator G and a spectrum conversion discriminator D
Step 1-2: the input multi-spectral images yield spectrum conversion images with parallax ignored:

$I_B^{fake}(p) = G_B(F(I_A(p)))$ and $I_A^{fake}(p) = G_A(F(I_B(p)))$

i.e. the infrared right image $I_A(p)$ is converted into the RGB right image $I_B^{fake}(p)$, and the RGB left image $I_B(p)$ is converted into the infrared left image $I_A^{fake}(p)$, where the superscript fake indicates that the image is an output of the spectrum conversion module.
The step 2 comprises the following steps:
step 2-1: constructing a depth estimation network comprises constructing a depth estimation network M;
Step 2-2: inputting a single infrared right image $I_A(p)$ generates the left and right parallaxes $d_l$ and $d_r$, where the left parallax $d_l$ corresponds to the parallax of the RGB left image $I_B(p)$ in the multi-spectral pair, and the right parallax $d_r$ corresponds to the parallax of the input infrared right image $I_A(p)$; l and r denote the left and right images, respectively;
the step 3 comprises the following steps:
step 3-1: acquiring a cyclic conversion image and a consistent reconstruction image;
Step 3-2: designing the spectrum conversion loss. The first iteration optimizes the spectrum conversion network with the spectrum conversion loss, which consists of the two parts $L_G$ and $L_D$:

$L_G = \lambda_{cyc} L^{cyc} + \lambda_{rec} L^{rec} + \lambda_g L^{adv,G}$

$L_D = \lambda_d L^{adv,D}$

where $\lambda_{cyc}$, $\lambda_{rec}$, $\lambda_g$ and $\lambda_d$ are respectively the weights of the loss terms $L^{cyc}$, $L^{rec}$, $L^{adv,G}$ and $L^{adv,D}$, set to 10, 5, 1 and 1 in the present invention. $L^{adv,G}$ and $L^{adv,D}$ are respectively the adversarial losses of the generator G and the discriminator D of the shared-encoder cycle generative adversarial network F-CycleGAN; they accomplish the image conversion task. $L^{cyc}$ is the cycle consistency loss and $L^{rec}$ is the consistent reconstruction loss; together they accomplish the task of ignoring parallax while converting the spectrum. The superscript cyc indicates that a variable is associated with the cycle consistency loss, rec with the consistent reconstruction loss, adv with the adversarial loss, and G and D with the adversarial losses of the generator G and the discriminator D, respectively.
Step 3-1 comprises the following steps:
Step 3-1-1: inputting the spectrum conversion images $I_B^{fake}(p)$ and $I_A^{fake}(p)$ into the generator of the spectrum conversion network yields the cyclic conversion images $I_A^{cyc}(p)$ and $I_B^{cyc}(p)$, namely $I_A^{cyc}(p) = G_A(F(I_B^{fake}(p)))$ and $I_B^{cyc}(p) = G_B(F(I_A^{fake}(p)))$;

Step 3-1-2: inputting the multi-spectral images $I_A(p)$ and $I_B(p)$ into the generator of the spectrum conversion network, but using the decoder of the input's own spectrum (the opposite of the conversion direction used before), yields the consistent reconstruction images $I_A^{rec}(p)$ and $I_B^{rec}(p)$, namely $I_A^{rec}(p) = G_A(F(I_A(p)))$ and $I_B^{rec}(p) = G_B(F(I_B(p)))$.
step 4 comprises the following steps:
step 4-1: the image is warped.
Step 4-2: the depth estimation penalty is designed. Utilizing depth estimation loss L in iteratively optimizing depth estimation networkMEN. Depth estimation penalty LMENThe form is as follows:
Figure BDA00034108697000000411
wherein alpha isap,αdsAnd alphalrAre corresponding loss weights, which are set to 1.0,0.2,0.1, respectively, in the present invention.
Figure BDA00034108697000000412
Respectively, appearance reconstruction loss, parallax smoothing loss, and left-right parallax coincidence loss. Ap, ds, lr in the upper subscript indicate that the variable is associated with apparent reconstruction loss, parallax smoothing loss, and left-right parallax coincidence loss, respectively.
Step 4-1 comprises the following steps:
step 4-1-1: constructing a warp module, defining a warp operation ω, which performs the following for p ═ x, y, as:
Figure BDA00034108697000000413
wherein, IlAnd IrRespectively representing a left image and a right image,
Figure BDA00034108697000000414
and
Figure BDA00034108697000000415
respectively representing the warped pseudo left image and the pseudo right image.
Step 4-1-2: will infrared right image Ir(p) infrared left image Il(p) performing left parallax dlAnd right parallax drCarrying out warping operation to obtain a pseudo infrared left image
Figure BDA00034108697000000416
And pseudo-infrared right image
Figure BDA00034108697000000417
Step 5 comprises the following steps:
Step 5-1: auxiliary-loss-module image warping. Using the warping operation ω constructed in step 4-1, the converted infrared left image $I_A^{fake}(p)$ and the converted RGB right image $I_B^{fake}(p)$ are warped with the right parallax $d_r$ and the left parallax $d_l$ respectively, i.e. $I_A^{aux}(p) = \omega(I_A^{fake}, d_r)(p)$ and $I_B^{aux}(p) = \omega(I_B^{fake}, d_l)(p)$, recomputing the infrared right composite image $I_A^{aux}(p)$ and the RGB left composite image $I_B^{aux}(p)$, which approximate the original infrared right image $I_A(p)$ and RGB left image $I_B(p)$.

Step 5-2: designing the auxiliary loss. The auxiliary loss is used when the spectrum conversion network is optimized in the second iteration; it compares the original infrared right image $I_A(p)$ and RGB left image $I_B(p)$ with the images obtained by warping the outputs of the spectrum conversion network with the parallaxes obtained from the depth estimation network. The auxiliary loss $L^{aux}$ is designed as:

$L^{aux} = \frac{1}{N} \sum_p \left( \alpha_A^{aux} \left| I_A(p) - I_A^{aux}(p) \right| + \alpha_B^{aux} \left| I_B(p) - I_B^{aux}(p) \right| \right)$

where $\alpha_A^{aux}$ and $\alpha_B^{aux}$ are hyper-parameters, both set to 20 here, $I_A^{aux}(p)$ and $I_B^{aux}(p)$ are the infrared right composite image and the RGB left composite image, and N is the number of picture pixels. The superscript aux indicates that a variable is related to the auxiliary loss.
Step 6 comprises the following steps:
Step 6-1: during data preprocessing, infrared-image channel expansion is performed, i.e. the single-channel infrared right image $I_a(p)$ is expanded into the three-channel image $I_A(p)$: the one-channel matrix $I_a(p)$ is copied once into each of the three channels, making the infrared image consistent with the RGB image.

Step 6-2: model framework warm-up. The warm-up process trains the spectrum conversion network with only the spectrum conversion losses $L_G$ and $L_D$ in one iteration round; after warm-up, the spectrum conversion network can obtain spectrum conversion images from the multi-spectral images with parallax ignored.

Step 6-3: model framework training. In one iteration round, the complete training process first trains the spectrum conversion network with the spectrum conversion losses $L_G$ and $L_D$, then trains the depth estimation network with the depth estimation loss $L_{MEN}$, and finally retrains the spectrum conversion network with the auxiliary loss $L^{aux}$.

Step 6-4: model framework testing.

Step 6-4 comprises the following steps:

Step 6-4-1: performing channel expansion on the target infrared image and inputting it into the trained depth estimation network to obtain the right parallax;

Step 6-4-2: post-processing the right parallax with the formula $D(p) = \frac{B \cdot f}{d(p)}$, converting parallax into depth, where $D(p)$ is the depth estimation result of the target infrared image, d is the parallax, B is the baseline length and f is the camera focal length.
Beneficial effects: the invention has the following advantages. First, the depth estimation method of the invention completes depth estimation of a single infrared image while requiring only multi-spectral images, without using depth labels as supervision during training, which reduces the difficulty and cost of acquiring a training data set. Second, the invention uses a spectrum conversion network to overcome the large appearance difference between spectra, so that multi-spectral images can be used for depth estimation. Finally, the auxiliary loss designed by the invention lets the spectrum conversion network produce clearer images and improves the accuracy of the depth estimation network.
Drawings
The foregoing and/or other advantages of the invention will become further apparent from the following detailed description of the invention when taken in conjunction with the accompanying drawings.
FIG. 1 is a schematic process flow diagram of the present invention.
Fig. 2a is an RGB left image example of the embodiment.
Fig. 2b is an infrared right image example of an embodiment.
Fig. 2c is an example of the right disparity estimation result of the embodiment.
Detailed Description
Examples
As shown in fig. 1, the present invention relates to a multi-spectral-image-supervised infrared image depth estimation method, which comprises the following steps:
1. constructing a spectrum conversion module
Input: the multi-spectral images, comprising an infrared right image and an RGB left image.
Output: the spectrum conversion images, comprising an RGB right image and an infrared left image.
1.1 Building the spectrum conversion network includes building a spectrum conversion generator G and a spectrum conversion discriminator D.
The spectrum conversion network obtains spectrum conversion images from the spectral images to be converted with parallax ignored, i.e. it converts the infrared right image into an RGB right image and the RGB left image into an infrared left image, solving the problem that the two spectra differ greatly in appearance and cannot be matched directly in the depth estimation network. The spectrum conversion network of this example is based on the F-CycleGAN of document 3: Liang M, Guo X, Li H, et al. Unsupervised cross-spectral stereo matching by learning to synthesize. Proceedings of the AAAI Conference on Artificial Intelligence, 2019, 33(01): 8706-8713. The spectrum conversion generator G comprises an encoder F and two decoders $G_A$ and $G_B$; the spectrum conversion discriminator D consists of two discriminators $D_A$ and $D_B$. The encoder F comprises 2 convolutional layers for downsampling and 4 residual blocks; the decoders $G_A$ and $G_B$ each comprise 2 convolutional layers for upsampling and 4 residual blocks; the discriminators $D_A$ and $D_B$ each consist of 5 convolutional layers.
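To make the architecture concrete, the following PyTorch sketch shows one plausible layout of the shared encoder F (2 downsampling convolutions plus 4 residual blocks), the decoders $G_A$/$G_B$ (4 residual blocks plus 2 upsampling convolutions) and a 5-layer discriminator. Channel widths, kernel sizes, normalization and activation choices are assumptions for illustration; the patent fixes only the layer counts.

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    def __init__(self, ch):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(ch, ch, 3, padding=1), nn.InstanceNorm2d(ch), nn.ReLU(inplace=True),
            nn.Conv2d(ch, ch, 3, padding=1), nn.InstanceNorm2d(ch))
    def forward(self, x):
        return x + self.body(x)

class EncoderF(nn.Module):
    """Shared encoder F: 2 downsampling convolutions + 4 residual blocks."""
    def __init__(self, in_ch=3, ch=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_ch, ch, 4, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(ch, ch * 2, 4, stride=2, padding=1), nn.ReLU(inplace=True),
            *[ResidualBlock(ch * 2) for _ in range(4)])
    def forward(self, x):
        return self.net(x)

class Decoder(nn.Module):
    """Decoder G_A or G_B: 4 residual blocks + 2 upsampling convolutions."""
    def __init__(self, out_ch=3, ch=64):
        super().__init__()
        self.net = nn.Sequential(
            *[ResidualBlock(ch * 2) for _ in range(4)],
            nn.ConvTranspose2d(ch * 2, ch, 4, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.ConvTranspose2d(ch, out_ch, 4, stride=2, padding=1), nn.Tanh())
    def forward(self, z):
        return self.net(z)

class Discriminator(nn.Module):
    """Discriminator D_A or D_B: 5 convolutional layers (PatchGAN-style output map)."""
    def __init__(self, in_ch=3, ch=64):
        super().__init__()
        layers, c = [], in_ch
        for nc in [ch, ch * 2, ch * 4, ch * 8]:
            layers += [nn.Conv2d(c, nc, 4, stride=2, padding=1), nn.LeakyReLU(0.2, inplace=True)]
            c = nc
        layers += [nn.Conv2d(c, 1, 4, padding=1)]  # 5th convolution: real/fake score map
        self.net = nn.Sequential(*layers)
    def forward(self, x):
        return self.net(x)

# Spectrum conversion: I_B_fake = G_B(F(I_A)), I_A_fake = G_A(F(I_B))
F_enc, G_A, G_B = EncoderF(), Decoder(), Decoder()
I_A = torch.randn(1, 3, 256, 256)   # channel-expanded infrared right image
I_B_fake = G_B(F_enc(I_A))          # converted RGB right image
```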
1.2 The input multi-spectral images yield spectrum conversion images with parallax ignored:

$I_B^{fake}(p) = G_B(F(I_A(p)))$ and $I_A^{fake}(p) = G_A(F(I_B(p)))$

i.e. the infrared right image $I_A(p)$ is converted into the RGB right image $I_B^{fake}(p)$, and the RGB left image $I_B(p)$ is converted into the infrared left image $I_A^{fake}(p)$, where the superscript fake indicates that the image is an output of the spectrum conversion module.
2. Constructing a depth estimation module
Input: the infrared right image.
Output: the parallax, comprising the left parallax and the right parallax.
2.1 building a depth estimation network includes building a depth estimation network M
The depth estimation network takes a single infrared right image as input and outputs the left and right parallaxes corresponding to that image, using the converted infrared left image produced by the spectrum conversion network together with the input infrared right image as the supervision signal. The depth estimation network of this example is based on Monodepth, document 4: C. Godard, O. Mac Aodha, and G. J. Brostow. Unsupervised monocular depth estimation with left-right consistency. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017. The depth estimation network M comprises an encoder and a decoder: resnet18 is used as the encoder, and the decoder consists of 4 convolutional layers.
2.2 The input single infrared right image $I_A(p)$ generates the left and right parallaxes $d_l$ and $d_r$, where the left parallax $d_l$ corresponds to the parallax of the RGB left image $I_B(p)$ in the multi-spectral pair, and the right parallax $d_r$ corresponds to the parallax of the input infrared right image $I_A(p)$; l and r denote the left and right images, respectively.
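A minimal sketch of such a network, assuming a standard torchvision resnet18 backbone and a simple 4-convolution decoder that predicts both parallax maps at once; the upsampling scheme and the output scaling are assumptions, not specified in the text:

```python
import torch
import torch.nn as nn
import torch.nn.functional as Fn
import torchvision

class DepthNetM(nn.Module):
    """Sketch of the depth estimation network M: resnet18 encoder + 4-conv decoder."""
    def __init__(self):
        super().__init__()
        resnet = torchvision.models.resnet18(weights=None)
        self.encoder = nn.Sequential(*list(resnet.children())[:-2])  # drop avgpool/fc
        self.decoder = nn.Sequential(
            nn.Conv2d(512, 256, 3, padding=1), nn.ELU(inplace=True),
            nn.Conv2d(256, 128, 3, padding=1), nn.ELU(inplace=True),
            nn.Conv2d(128, 64, 3, padding=1), nn.ELU(inplace=True),
            nn.Conv2d(64, 2, 3, padding=1), nn.Sigmoid())  # 2 channels: d_l and d_r

    def forward(self, ir_right):
        disp = self.decoder(self.encoder(ir_right))
        disp = Fn.interpolate(disp, size=ir_right.shape[-2:],
                              mode="bilinear", align_corners=False)
        # scale sigmoid output to an assumed maximum parallax (fraction of width)
        d_l, d_r = disp[:, 0:1] * 0.3, disp[:, 1:2] * 0.3
        return d_l, d_r

depth_net = DepthNetM()
d_l, d_r = depth_net(torch.randn(1, 3, 256, 256))
```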
3. Building spectral conversion loss module
Input: the spectrum conversion network, the multi-spectral images and the spectrum conversion images.
Output: the spectrum conversion loss.
3.1 Acquiring the cyclic conversion images and consistent reconstruction images

Step 1: inputting the spectrum conversion images $I_B^{fake}(p)$ and $I_A^{fake}(p)$ into the generator of the spectrum conversion network yields the cyclic conversion images $I_A^{cyc}(p)$ and $I_B^{cyc}(p)$, namely $I_A^{cyc}(p) = G_A(F(I_B^{fake}(p)))$ and $I_B^{cyc}(p) = G_B(F(I_A^{fake}(p)))$.

Step 2: inputting the multi-spectral images $I_A(p)$ and $I_B(p)$ into the generator of the spectrum conversion network, but using the decoder of the input's own spectrum (the opposite of the conversion direction), yields the consistent reconstruction images $I_A^{rec}(p)$ and $I_B^{rec}(p)$, namely $I_A^{rec}(p) = G_A(F(I_A(p)))$ and $I_B^{rec}(p) = G_B(F(I_B(p)))$.
3.2 Designing the spectrum conversion loss

As shown in stage 1 of fig. 1, the first iteration optimizes the spectrum conversion network with the spectrum conversion loss, which consists of the two parts $L_G$ and $L_D$:

$L_G = \lambda_{cyc} L^{cyc} + \lambda_{rec} L^{rec} + \lambda_g L^{adv,G}$

$L_D = \lambda_d L^{adv,D}$

where $\lambda_{cyc}$, $\lambda_{rec}$, $\lambda_g$ and $\lambda_d$ are respectively the weights of the loss terms $L^{cyc}$, $L^{rec}$, $L^{adv,G}$ and $L^{adv,D}$, set to 10, 5, 1 and 1 in the present invention. $L^{adv,G}$ and $L^{adv,D}$ are respectively the adversarial losses of the generator G and the discriminator D of the shared-encoder cycle generative adversarial network F-CycleGAN; they accomplish the image conversion task. $L^{cyc}$ is the cycle consistency loss and $L^{rec}$ is the consistent reconstruction loss; together they accomplish the task of ignoring parallax while converting the spectrum. The superscript cyc indicates that a variable is associated with the cycle consistency loss, rec with the consistent reconstruction loss, adv with the adversarial loss, and G and D with the adversarial losses of the generator G and the discriminator D, respectively.

The cycle consistency loss $L^{cyc}$ is designed as:

$L^{cyc} = \frac{1}{N} \sum_p \left( \left| I_A(p) - I_A^{cyc}(p) \right| + \left| I_B(p) - I_B^{cyc}(p) \right| \right)$

where N is the number of picture pixels and $I_A^{cyc}(p)$, $I_B^{cyc}(p)$ are the cyclic conversion images.

The consistent reconstruction loss $L^{rec}$ is designed as:

$L^{rec} = \frac{1}{N} \sum_p \left( \left| I_A(p) - I_A^{rec}(p) \right| + \left| I_B(p) - I_B^{rec}(p) \right| \right)$

where N is the number of picture pixels and $I_A^{rec}(p)$, $I_B^{rec}(p)$ are the consistent reconstruction images.
4. Building a depth estimation loss module
Input: the infrared right image, the infrared left image from the spectrum conversion images, the left parallax and the right parallax.
Output: the depth estimation loss.
4.1 Image warping

Here the single infrared right image $I_A(p)$ is relabelled $I_r(p)$, and the converted infrared left image $I_A^{fake}(p)$ is relabelled $I_l(p)$.

Step 1: construct the warping module and define the warping operation ω, which for p = (x, y) performs:

$\tilde{I}_l(p) = \omega(I_r, d_l)(p) = I_r(x - d_l(p), y)$, $\tilde{I}_r(p) = \omega(I_l, d_r)(p) = I_l(x + d_r(p), y)$

where $I_l$ and $I_r$ denote the left and right images, and $\tilde{I}_l$ and $\tilde{I}_r$ denote the warped pseudo left image and pseudo right image, respectively.

Step 2: warping the infrared right image $I_r(p)$ and infrared left image $I_l(p)$ with the left parallax $d_l$ and right parallax $d_r$, respectively, yields the pseudo infrared left image $\tilde{I}_l(p)$ and pseudo infrared right image $\tilde{I}_r(p)$.
4.2 Designing the depth estimation loss

As shown in stage 2 of fig. 1, the depth estimation loss $L_{MEN}$ is used when iteratively optimizing the depth estimation network. It has the form:

$L_{MEN} = \alpha_{ap}(L_l^{ap} + L_r^{ap}) + \alpha_{ds}(L_l^{ds} + L_r^{ds}) + \alpha_{lr}(L_l^{lr} + L_r^{lr})$

where $\alpha_{ap}$, $\alpha_{ds}$ and $\alpha_{lr}$ are the corresponding loss weights, set in this example to 1.0, 0.2 and 0.1 respectively. $L^{ap}$, $L^{ds}$ and $L^{lr}$ are the appearance reconstruction loss, the parallax smoothing loss and the left-right parallax consistency loss; the superscripts ap, ds and lr indicate association with each of these losses. The subscripts l and r indicate the left and right terms; only the left terms are written out here, since the right terms are obtained by interchanging the labels l and r.

The appearance reconstruction loss $L_l^{ap}$ is designed as:

$L_l^{ap} = \frac{1}{N} \sum_p \left[ \alpha \, \frac{1 - \mathrm{SSIM}(I_l(p), \tilde{I}_l(p))}{2} + (1 - \alpha) \left| I_l(p) - \tilde{I}_l(p) \right| \right]$

where α is a hyper-parameter set to 0.9 in this example, SSIM is the structural similarity function, and N is the number of picture pixels.

The parallax smoothing loss $L_l^{ds}$ is designed as:

$L_l^{ds} = \frac{1}{N} \sum_p \left( \left| \partial_x d_l(p) \right| e^{-\left| \partial_x I_l(p) \right|} + \left| \partial_y d_l(p) \right| e^{-\left| \partial_y I_l(p) \right|} \right)$

where $\partial_x$ and $\partial_y$ denote the horizontal and vertical gradients of $d_l$ and $I_l$, and N is the number of picture pixels.

The left-right parallax consistency loss $L_l^{lr}$ is designed as:

$L_l^{lr} = \frac{1}{N} \sum_p \left| d_l(p) - d_r(x - d_l(p), y) \right|$

where N is the number of picture pixels.
5. Building the auxiliary loss module
Input: the infrared right image, RGB left image, left parallax and right parallax.
Output: the auxiliary loss.
5.1 Auxiliary-loss-module image warping

Using the warping operation ω constructed in step 4.1, the converted infrared left image $I_A^{fake}(p)$ and the converted RGB right image $I_B^{fake}(p)$ are warped with the right parallax $d_r$ and the left parallax $d_l$ respectively, i.e.

$I_A^{aux}(p) = \omega(I_A^{fake}, d_r)(p)$, $I_B^{aux}(p) = \omega(I_B^{fake}, d_l)(p)$

recomputing the infrared right composite image $I_A^{aux}(p)$ and the RGB left composite image $I_B^{aux}(p)$, which approximate the original infrared right image $I_A(p)$ and RGB left image $I_B(p)$.

5.2 Designing the auxiliary loss

As shown in stage 3 of fig. 1, the second iteration optimizes the spectrum conversion network with the auxiliary loss, which compares the original infrared right image $I_A(p)$ and RGB left image $I_B(p)$ with the images obtained by warping the outputs of the spectrum conversion network with the parallaxes obtained from the depth estimation network. The auxiliary loss $L^{aux}$ is designed as:

$L^{aux} = \frac{1}{N} \sum_p \left( \alpha_A^{aux} \left| I_A(p) - I_A^{aux}(p) \right| + \alpha_B^{aux} \left| I_B(p) - I_B^{aux}(p) \right| \right)$

where $\alpha_A^{aux}$ and $\alpha_B^{aux}$ are hyper-parameters, both set to 20 here, $I_A^{aux}(p)$ and $I_B^{aux}(p)$ are the infrared right composite image and the RGB left composite image, and N is the number of picture pixels. The superscript aux indicates that a variable is related to the auxiliary loss.
6. Whole framework training

The whole-framework training unifies the input multi-spectral image data set to consistent channels through channel expansion; inputs the processed data into the spectrum conversion module to obtain spectrum conversion images; warms up the spectrum conversion module with the spectrum conversion loss module; inputs the data into the depth estimation module to obtain the parallax; iteratively optimizes the spectrum conversion module and the depth estimation module in turn with the spectrum conversion loss module, the depth estimation loss module and the auxiliary loss module; and realizes depth estimation of a single infrared image with the trained depth estimation module plus post-processing.
6.1 data preprocessing
Input: the multi-spectral images.
Output: channel-consistent multi-spectral images.
During data preprocessing, infrared-image channel expansion is performed, i.e. the single-channel infrared right image $I_a(p)$ is expanded into the three-channel image $I_A(p)$: the one-channel matrix $I_a(p)$ is copied once into each of the three channels, making the infrared image consistent with the RGB image. All images are then resized to 256 × 256.
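A minimal preprocessing helper, assuming tensors of shape (B, 1, H, W) and bilinear resizing (the interpolation mode is not specified in the text):

```python
import torch.nn.functional as Fn

def preprocess_infrared(ir_single_channel):
    """Channel expansion: copy the one-channel infrared matrix into each of three
    channels so infrared and RGB inputs have consistent shape, then resize to 256x256."""
    x = ir_single_channel.repeat(1, 3, 1, 1)            # (B,1,H,W) -> (B,3,H,W)
    return Fn.interpolate(x, size=(256, 256), mode="bilinear", align_corners=False)
```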
6.2 Model framework warm-up
Input: channel-consistent multi-spectral images.
Output: spectrum conversion images.
The model framework warm-up process trains the spectrum conversion network with only the spectrum conversion losses $L_G$ and $L_D$ in one iteration round. After warm-up, the spectrum conversion network can obtain spectrum conversion images from the multi-spectral images with parallax ignored; that is, only stage 1 of fig. 1 is executed.
6.3 model framework training
Input: channel-consistent multi-spectral images.
Output: the parallax, comprising the left parallax and the right parallax.
In one iteration round, the model framework training process first trains the spectrum conversion network with the spectrum conversion losses $L_G$ and $L_D$, then trains the depth estimation network with the depth estimation loss $L_{MEN}$, and finally retrains the spectrum conversion network with the auxiliary loss $L^{aux}$. That is, stage 1, stage 2 and stage 3 of fig. 1 are executed, as sketched below.
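The three stages of one iteration round can be sketched as follows, reusing the helpers from the earlier sketches (F_enc, G_A, G_B, depth_net, warp, preprocess_infrared and the loss functions, all of which are illustrative names). The optimizers, data loader, adversarial term and discriminator step are stubbed or elided, so this is a schedule illustration rather than a full implementation:

```python
import torch

for I_a, I_B in loader:                       # infrared right (1ch), RGB left (3ch)
    I_A = preprocess_infrared(I_a)

    # Stage 1: spectrum conversion losses L_G (shown) and L_D train the conversion net
    I_B_fake, I_A_fake = G_B(F_enc(I_A)), G_A(F_enc(I_B))
    I_A_cyc, I_B_cyc = G_A(F_enc(I_B_fake)), G_B(F_enc(I_A_fake))
    I_A_rec, I_B_rec = G_A(F_enc(I_A)), G_B(F_enc(I_B))
    adv_G = torch.tensor(0.0)                 # placeholder for the adversarial term
    loss_G = spectrum_conversion_loss_G(I_A, I_B, I_A_cyc, I_B_cyc,
                                        I_A_rec, I_B_rec, adv_G)
    opt_G.zero_grad(); loss_G.backward(); opt_G.step()
    # (discriminator step with L_D omitted for brevity)

    # Stage 2: depth estimation loss L_MEN trains the depth network M
    d_l, d_r = depth_net(I_A)
    I_l, I_r = G_A(F_enc(I_B)).detach(), I_A  # converted IR left, input IR right
    I_l_tilde, I_r_tilde = warp(I_r, d_l, -1), warp(I_l, d_r, +1)
    loss_M = ((I_l - I_l_tilde).abs().mean()  # appearance terms only, for brevity
              + (I_r - I_r_tilde).abs().mean())
    opt_M.zero_grad(); loss_M.backward(); opt_M.step()

    # Stage 3: auxiliary loss retrains the conversion network; fakes are recomputed
    # so gradients reach the generator, while the parallaxes are detached
    I_B_fake, I_A_fake = G_B(F_enc(I_A)), G_A(F_enc(I_B))
    I_A_aux = warp(I_A_fake, d_r.detach(), +1)
    I_B_aux = warp(I_B_fake, d_l.detach(), -1)
    loss_aux = auxiliary_loss(I_A, I_B, I_A_aux, I_B_aux)
    opt_G.zero_grad(); loss_aux.backward(); opt_G.step()
```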
6.4 Model framework testing
Input: the target infrared image.
Output: the depth estimation result.
Step 1: perform channel expansion on the target infrared image and input it into the trained depth estimation network to obtain the right parallax.

Step 2: post-process the right parallax with the formula $D(p) = \frac{B \cdot f}{d(p)}$, converting parallax into depth, where $D(p)$ is the depth estimation result of the target infrared image, d is the parallax, B is the baseline length and f is the camera focal length.
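The post-processing step in code; the baseline and focal-length values in the usage comment are placeholders, not taken from the patent:

```python
def disparity_to_depth(d, baseline_B, focal_f, eps=1e-6):
    """Convert parallax to depth via depth = B * f / d; eps guards division by zero."""
    return baseline_B * focal_f / (d + eps)

# e.g. with an assumed 10 cm baseline and 700 px focal length:
# depth_map = disparity_to_depth(d_r_pixels, baseline_B=0.10, focal_f=700.0)
```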
In the present embodiment, fig. 2a shows an example RGB left image, fig. 2b an example infrared right image, and fig. 2c an example right-parallax estimation result. The depth estimation method of this embodiment completes depth estimation of a single infrared image, as shown in fig. 2c, without depth-label supervision during training; only multi-spectral images are needed, as shown in figs. 2a and 2b, which reduces the difficulty and cost of acquiring a training data set.
The present invention provides a multi-spectral-supervised infrared image depth estimation method, and there are many methods and approaches for implementing this technical solution. The above description is only a preferred embodiment of the invention; it should be noted that those skilled in the art may make improvements and modifications without departing from the principle of the invention, and these improvements and modifications should also be considered within the protection scope of the invention. All components not specified in this embodiment can be realized by the prior art.

Claims (10)

1. A multi-spectral-image-supervised infrared image depth estimation method, characterized by comprising the following steps:
step 1, constructing a spectrum conversion module: building a spectrum conversion network model, and inputting the multi-spectral images into the spectrum conversion network model to obtain spectrum conversion images of the multi-spectral images with parallax ignored;
step 2, constructing a depth estimation module: building a depth estimation network model, and inputting the infrared right image into the depth estimation network model to obtain parallax;
step 3, constructing a spectrum conversion loss module: acquiring a cyclic conversion image and a consistent reconstruction image by using the spectrum conversion network model and the spectrum conversion image obtained in the step 1, calculating to obtain spectrum conversion loss, and performing iterative optimization on the spectrum conversion network model by using the spectrum conversion loss;
step 4, constructing a depth estimation loss module: performing image warping by using the parallax obtained in the step (2), calculating to obtain depth estimation loss, and performing iterative optimization on a depth estimation network model by using the depth estimation loss;
step 5, constructing an auxiliary loss module: obtaining auxiliary loss through image warping calculation by using the spectrum conversion image and the parallax obtained in the step 1 and the step 2, and performing iterative optimization on the spectrum conversion network model by using the auxiliary loss;
step 6, training the whole framework: unifying the multi-spectral image data set to consistent channels through channel expansion, and inputting the processed data into the spectrum conversion module to obtain spectrum conversion images with parallax ignored; warming up the spectrum conversion module through the spectrum conversion loss module, and inputting the infrared right image into the depth estimation module to obtain the parallax; iteratively optimizing the spectrum conversion module and the depth estimation module in turn through the spectrum conversion loss module, the depth estimation loss module and the auxiliary loss module, and realizing depth estimation of a single infrared image with the trained depth estimation module and post-processing.
2. The multi-spectral-image-supervised infrared image depth estimation method of claim 1, characterized in that step 1 comprises:

step 1-1: building the spectrum conversion network, including building a spectrum conversion generator G and a spectrum conversion discriminator D; the spectrum conversion generator G comprises an encoder F and two decoders $G_A$ and $G_B$, and the spectrum conversion discriminator D consists of two discriminators $D_A$ and $D_B$;

step 1-2: inputting the multi-spectral images and obtaining spectrum conversion images with parallax ignored, $I_B^{fake}(p) = G_B(F(I_A(p)))$ and $I_A^{fake}(p) = G_A(F(I_B(p)))$: the infrared right image $I_A(p)$ is converted into the RGB right image $I_B^{fake}(p)$ and the RGB left image $I_B(p)$ is converted into the infrared left image $I_A^{fake}(p)$, where p denotes the pixel coordinate of any point in the image, the subscript A denotes the infrared spectrum, the subscript B denotes the RGB spectrum, the superscript fake denotes that the image is an output of the spectrum conversion module, and RGB denotes that the image consists of red, green and blue channels.
3. The multi-spectral-image-supervised infrared image depth estimation method of claim 2, characterized in that step 2 comprises:

step 2-1: building the depth estimation network, including building a depth estimation network M;

step 2-2: inputting a single infrared right image $I_A(p)$ to generate the left and right parallaxes $d_l$ and $d_r$, where the left parallax $d_l$ corresponds to the parallax of the RGB left image $I_B(p)$ in the multi-spectral pair, the right parallax $d_r$ corresponds to the parallax of the input infrared right image $I_A(p)$, and l and r denote the left and right images, respectively.
4. The multi-spectral-image-supervised infrared image depth estimation method of claim 3, characterized in that step 3 comprises:

step 3-1: acquiring the cyclic conversion images and consistent reconstruction images;

step 3-2: designing the spectrum conversion loss; the spectrum conversion loss consists of the two parts $L_G$ and $L_D$, denoted as:

$L_G = \lambda_{cyc} L^{cyc} + \lambda_{rec} L^{rec} + \lambda_g L^{adv,G}$

$L_D = \lambda_d L^{adv,D}$

where $L^{adv,G}$ and $L^{adv,D}$ are respectively the adversarial losses of the generator G and the discriminator D; $L^{cyc}$ denotes the cycle consistency loss and $L^{rec}$ denotes the consistent reconstruction loss; $\lambda_{cyc}$, $\lambda_{rec}$, $\lambda_g$ and $\lambda_d$ are respectively the weights of the loss terms $L^{cyc}$, $L^{rec}$, $L^{adv,G}$ and $L^{adv,D}$; λ denotes that a variable is a hyper-parameter, cyc denotes that the variable is related to the cycle consistency loss, rec to the consistent reconstruction loss, adv to the adversarial loss, and G and D to the adversarial losses of the generator G and the discriminator D, respectively.
5. The multi-spectral-image-supervised infrared image depth estimation method of claim 4, characterized in that step 3-1 comprises:

step 3-1-1: inputting the spectrum conversion images $I_B^{fake}(p)$ and $I_A^{fake}(p)$ into the generator of the spectrum conversion network to obtain the cyclic conversion images $I_A^{cyc}(p)$ and $I_B^{cyc}(p)$, namely: $I_A^{cyc}(p) = G_A(F(I_B^{fake}(p)))$ and $I_B^{cyc}(p) = G_B(F(I_A^{fake}(p)))$;

step 3-1-2: inputting the infrared right image $I_A(p)$ and the RGB left image $I_B(p)$ of the multi-spectral pair into the generator of the spectrum conversion network, using the decoder of the input's own spectrum (the opposite of the conversion direction), to obtain the consistent reconstruction images $I_A^{rec}(p)$ and $I_B^{rec}(p)$, namely: $I_A^{rec}(p) = G_A(F(I_A(p)))$ and $I_B^{rec}(p) = G_B(F(I_B(p)))$.
6. The multi-spectral-image-supervised infrared image depth estimation method of claim 5, characterized in that step 4 comprises:

step 4-1: warping the images;

step 4-2: designing the depth estimation loss; the depth estimation loss $L_{MEN}$ is used when iteratively optimizing the depth estimation network, and has the form:

$L_{MEN} = \alpha_{ap}(L_l^{ap} + L_r^{ap}) + \alpha_{ds}(L_l^{ds} + L_r^{ds}) + \alpha_{lr}(L_l^{lr} + L_r^{lr})$

where $\alpha_{ap}$, $\alpha_{ds}$ and $\alpha_{lr}$ denote the appearance reconstruction loss weight, the parallax smoothing loss weight and the left-right parallax consistency loss weight; $L^{ap}$, $L^{ds}$ and $L^{lr}$ denote the appearance reconstruction loss, the parallax smoothing loss and the left-right parallax consistency loss; ap, ds and lr denote that a variable is associated with the appearance reconstruction loss, the parallax smoothing loss and the left-right parallax consistency loss, respectively.
7. The multi-spectral-image-supervised infrared image depth estimation method of claim 6, characterized in that step 4-1 comprises:

step 4-1-1: constructing the warping module and defining the warping operation ω, which for p = (x, y) performs:

$\tilde{I}_l(p) = \omega(I_r, d_l)(p) = I_r(x - d_l(p), y)$, $\tilde{I}_r(p) = \omega(I_l, d_r)(p) = I_l(x + d_r(p), y)$

where $I_l$ and $I_r$ denote the left and right images, $\tilde{I}_l$ and $\tilde{I}_r$ denote the warped pseudo left image and pseudo right image, the integer x ∈ [1, W] denotes the pixel abscissa and the integer y ∈ [1, H] denotes the pixel ordinate, W and H being respectively the width and height of the image;

step 4-1-2: warping the infrared right image $I_r(p)$ and the infrared left image $I_l(p)$ with the left parallax $d_l$ and the right parallax $d_r$, respectively, to obtain the pseudo infrared left image $\tilde{I}_l(p)$ and the pseudo infrared right image $\tilde{I}_r(p)$.
8. The multi-spectral-image-supervised infrared image depth estimation method of claim 7, characterized in that step 5 comprises:

step 5-1: auxiliary-loss-module image warping; using the warping operation ω constructed in step 4-1, warping the converted infrared left image $I_A^{fake}(p)$ and the converted RGB right image $I_B^{fake}(p)$ with the right parallax $d_r$ and the left parallax $d_l$ respectively, i.e.

$I_A^{aux}(p) = \omega(I_A^{fake}, d_r)(p)$, $I_B^{aux}(p) = \omega(I_B^{fake}, d_l)(p)$

to recompute the infrared right composite image $I_A^{aux}(p)$ and the RGB left composite image $I_B^{aux}(p)$;

step 5-2: designing the auxiliary loss; the auxiliary loss is used when the spectrum conversion network is optimized in the second iteration, and compares the original infrared right image $I_A(p)$ and RGB left image $I_B(p)$ with the images obtained by warping the outputs of the spectrum conversion network with the parallaxes obtained by the depth estimation network; the auxiliary loss $L^{aux}$ is designed as:

$L^{aux} = \frac{1}{N} \sum_p \left( \alpha_A^{aux} \left| I_A(p) - I_A^{aux}(p) \right| + \alpha_B^{aux} \left| I_B(p) - I_B^{aux}(p) \right| \right)$

where $\alpha_A^{aux}$ and $\alpha_B^{aux}$ denote the weights of the loss terms, α denotes that a variable is a hyper-parameter and aux denotes that the variable is related to the auxiliary loss; $I_A^{aux}(p)$ and $I_B^{aux}(p)$ denote respectively the infrared right composite image and the RGB left composite image, and N is the number of picture pixels.
9. The multi-spectral-image-supervised infrared image depth estimation method of claim 8, characterized in that step 6 comprises:

step 6-1: performing infrared-image channel expansion during data preprocessing, expanding the infrared right image into a three-channel image by copying the single-channel image matrix once into each of the three channels, to make the infrared image consistent with the RGB image;

step 6-2: model framework warm-up: training the spectrum conversion network with the spectrum conversion losses $L_G$ and $L_D$ in one iteration round, and obtaining spectrum conversion images with parallax ignored from the multi-spectral images;

step 6-3: model framework training: in one iteration round, training the spectrum conversion network with the spectrum conversion losses $L_G$ and $L_D$, training the depth estimation network with the depth estimation loss $L_{MEN}$, and retraining the spectrum conversion network with the auxiliary loss $L^{aux}$;

step 6-4: model framework testing.
10. The multi-spectral-image-supervised infrared image depth estimation method of claim 9, characterized in that step 6-4 comprises:

step 6-4-1: performing channel expansion on the target infrared image and inputting it into the trained depth estimation network to obtain the right parallax;

step 6-4-2: post-processing the right parallax with the formula $D(p) = \frac{B \cdot f}{d(p)}$ to convert parallax into depth and obtain the depth estimation result $D(p)$ of the target infrared image, where d is the parallax, B is the baseline length and f is the camera focal length.
CN202111531301.3A 2021-12-14 2021-12-14 Infrared image depth estimation method based on multi-spectral image supervision Pending CN114494386A (en)

Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
CN202111531301.3A | 2021-12-14 | 2021-12-14 | Infrared image depth estimation method based on multi-spectral image supervision

Applications Claiming Priority (1)

Application Number | Priority Date | Filing Date | Title
CN202111531301.3A | 2021-12-14 | 2021-12-14 | Infrared image depth estimation method based on multi-spectral image supervision

Publications (1)

Publication Number | Publication Date
CN114494386A | 2022-05-13

Family

ID=81493818

Family Applications (1)

Application Number | Priority Date | Filing Date | Title
CN202111531301.3A (Pending) | 2021-12-14 | 2021-12-14 | Infrared image depth estimation method based on multi-spectral image supervision

Country Status (1)

Country Link
CN (1) CN114494386A (en)

Cited By (1)

* Cited by examiner, † Cited by third party

Publication number | Priority date | Publication date | Assignee | Title
TWI787141B * | 2022-06-21 | 2022-12-11 | 鴻海精密工業股份有限公司 | Method and equipment for training depth estimation model, and method and equipment for depth estimation


Legal Events

Code | Title
PB01 | Publication
SE01 | Entry into force of request for substantive examination