CN114494386A - Infrared image depth estimation method based on multi-spectral image supervision - Google Patents
Infrared image depth estimation method based on multi-spectral image supervision
- Publication number: CN114494386A (application CN202111531301.3A)
- Authority: CN (China)
- Prior art keywords: image, loss, parallax, infrared, depth estimation
- Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- G06T7/50: Depth or shape recovery; G06T7/55: Depth or shape recovery from multiple images
- G06N3/045: Combinations of networks; G06N3/08: Learning methods
- G06T2207/10048: Infrared image; G06T2207/20081: Training, learning; G06T2207/20084: Artificial neural networks [ANN]
Abstract
The invention discloses a multi-spectral-image-supervised infrared image depth estimation method, which comprises the following steps: 1) constructing a spectrum conversion module, which obtains spectrum conversion images from the multi-spectral images; 2) constructing a depth estimation module, which obtains parallax from the infrared image; 3) constructing a spectrum conversion loss module, which computes the spectrum conversion loss and uses it to iteratively optimize the spectrum conversion network model; 4) constructing a depth estimation loss module, which warps images with the obtained parallax, computes the depth estimation loss, and uses it to iteratively optimize the depth estimation network model; 5) constructing an auxiliary loss module, which computes an auxiliary loss by warping the obtained spectrum conversion images with the parallax and uses it to further optimize the spectrum conversion network model; 6) training the whole framework, which comprises four stages: data preprocessing, model framework warm-up, training, and testing.
Description
Technical Field
The invention relates to infrared image depth estimation and belongs to the technical field of computer graphics; it particularly relates to an infrared image depth estimation method based on multi-spectral image supervision.
Background
For many large and complex engineering projects there is an urgent need for a low-cost solution that can monitor project quality over the long term and periodically detect defects, both to ensure safety and to meet daily maintenance requirements, so inspection robots have received much attention. An inspection robot acquires multi-modal information through various sensors to complete a series of two-dimensional or three-dimensional tasks, such as defect detection in two dimensions and three-dimensional reconstruction in three dimensions. Depth information plays an important role in these tasks.
An infrared camera is insensitive to the environment: it directly measures the infrared radiation of objects and surroundings whether or not an external light source is present, so depth information obtained from images acquired by a monocular infrared camera has notable advantages over other methods. For example, compared with active sensors such as lidar and structured-light depth cameras, which are expensive and have various shortcomings in complex scenes, a passive sensor based on standard imaging technology, such as an infrared camera, is cheaper, lighter and more adaptable; it can be deployed more flexibly on an inspection robot and can cope with different complex environments. It also avoids the weakness of methods that obtain depth by applying depth estimation to RGB images from a monocular RGB camera, which cannot predict well at night or in low-light or even zero-light environments; see document 1: Godard C., Mac Aodha O., Firman M., et al. Digging Into Self-Supervised Monocular Depth Estimation. International Conference on Computer Vision, 2019, 3827-3837.
Currently, infrared image depth estimation methods that obtain depth information from a single infrared image are supervised methods, for example document 2: Wang, Q., Zhao, H., Hu, Z., et al. Discrete connected CRF networks for depth estimation from monocular infrared images. Int. J. Mach. Learn. & Cyber. 12, 2021, 187-200. Without depth labels it is difficult to obtain depth information from a single infrared image alone, and it is desirable to generate the supervisory signal with a cheaper RGB camera. However, infrared image depth estimation based on multi-spectral supervision faces the problem that the two spectra differ greatly in appearance and cannot be matched directly. The invention therefore provides a multi-spectral-image-supervised infrared image depth estimation method, which can obtain depth information from a single infrared image using multi-modal information as the supervision signal, without depth-label supervision.
Disclosure of Invention
The purpose of the invention is as follows: the invention aims to solve the technical problems of the prior art and provides a multi-spectral-image-supervised infrared image depth estimation method that can accurately estimate the depth information of a single infrared image.
To solve this technical problem, the invention discloses a multi-spectral-image-supervised infrared image depth estimation method comprising the following steps:
step 1, constructing a spectrum conversion module: build a spectrum conversion network, input the multi-spectral images into the spectrum conversion network model, and obtain spectrum conversion images of the multi-spectral images with parallax ignored, i.e. convert the infrared right image and the RGB left image into an RGB right image and an infrared left image respectively. The right image is the image captured by the right camera of the binocular pair and the left image the image captured by the left camera; "infrared" indicates that the spectrum of the image is the infrared spectrum, and "RGB" indicates that the spectrum is the visible-light spectrum, i.e. the image consists of red, green and blue channels.
Step 2, constructing a depth estimation module: building a depth estimation network, and inputting the infrared right image into a depth estimation network model to obtain parallax;
step 3, constructing a spectrum conversion loss module: acquiring a cyclic conversion image and a consistent reconstruction image by using the spectrum conversion network model and the spectrum conversion image obtained in the step 1, calculating to obtain a spectrum conversion loss, and iteratively optimizing the spectrum conversion network model by using the loss;
step 4, constructing a depth estimation loss module: carrying out image warping by using the parallax obtained in the step (2), calculating to obtain depth estimation loss, and iteratively optimizing a depth estimation network model by using the loss;
step 5, constructing an auxiliary loss module: using the spectrum conversion images obtained in step 1 and the parallax obtained in step 2, compute the auxiliary loss through image warping, and iteratively optimize the spectrum conversion network model with this loss.
Step 6, training the whole framework: the method comprises the steps of unifying a multi-spectral image data set to a consistent channel through channel expansion, inputting processed data to a spectrum conversion module to obtain a spectrum conversion image, preheating the spectrum conversion module through a spectrum conversion loss module, inputting the preheated data to a depth estimation module to obtain parallax, sequentially and iteratively optimizing the spectrum conversion module and the depth estimation module through the depth estimation loss module, the spectrum conversion loss module and an auxiliary loss module, and realizing depth estimation of a single infrared image by using a trained depth estimation module and post-processing.
The step 1 comprises the following steps:
step 1-1: the construction of the spectrum conversion network comprises the construction of a spectrum conversion generator G and a spectrum conversion discriminator D
Step 1-2: the input multi-spectral spectrum image obtains a spectrum conversion image with parallax neglected, there areAndi.e. the infrared right image IA(p) conversion to RGB Right imageRGB left image IB(p) conversion to an infrared left imageWherein the superscript fake indicates that the image is the output term obtained by the spectral conversion module.
The step 2 comprises the following steps:
step 2-1: constructing a depth estimation network comprises constructing a depth estimation network M;
step 2-2: inputting a single infrared right image IA(p) generating left and right parallaxes dlAnd dr. Wherein, the left parallax dlCorresponding to RGB left image I in multi-spectral imageB(p) parallax, right parallax drCorresponding input infrared right image IA(p) parallax. l and r represent the left and right images, respectively;
the step 3 comprises the following steps:
step 3-1: acquiring a cyclic conversion image and a consistent reconstruction image;
step 3-2: the spectral conversion loss is designed. The first iteration optimizes the spectrum conversion network by utilizing the spectrum conversion loss which is LGAnd LDThe two parts are composed of the following forms:
wherein λ iscyc,λrec,λgAnd λdAre respectively loss termsAndthe weights of (a) and (b) are set to 10,5,1 and 1, respectively, in the present invention.Andthe cycles of the shared encoder generate the countering losses of the countering network F-CycleGAN generator G and the discriminator D, respectively, which fulfill the task of image conversion.Is a loss of consistency of the cycle,is a consistent reconstruction penalty, both of which together fulfill the task of ignoring parallax when transforming the spectrum. Cyc in the upper subscript indicates that the variable is associated with a cyclic uniform loss, rec indicates that the variable is associated with a uniform reconstruction loss, adv indicates that the variable is associated with an antagonistic loss, and G and D indicate that the variable is associated with an antagonistic loss of the generator G and an antagonistic loss of the discriminator D, respectively.
Step 3-1 comprises the following steps:
step 3-1-1: input the spectrum conversion images $I_A^{fake}(p)$ and $I_B^{fake}(p)$ into the generator of the spectrum conversion network to obtain the cyclic conversion images $I_A^{cyc}(p)$ and $I_B^{cyc}(p)$;
step 3-1-2: input the multi-spectral images $I_A(p)$ and $I_B(p)$ into the generator of the spectrum conversion network, but with the decoder inverse to the one used for those images before, to obtain the consistent reconstructed images $I_A^{rec}(p)$ and $I_B^{rec}(p)$.
step 4 comprises the following steps:
step 4-1: the image is warped.
Step 4-2: design the depth estimation loss. The depth estimation loss $L_{MEN}$ is used when iteratively optimizing the depth estimation network and takes the form:

$$L_{MEN} = \alpha_{ap}\left(L_l^{ap} + L_r^{ap}\right) + \alpha_{ds}\left(L_l^{ds} + L_r^{ds}\right) + \alpha_{lr}\left(L_l^{lr} + L_r^{lr}\right)$$

where $\alpha_{ap}$, $\alpha_{ds}$ and $\alpha_{lr}$ are the corresponding loss weights, set to 1.0, 0.2 and 0.1 respectively in the invention, and $L^{ap}$, $L^{ds}$ and $L^{lr}$ are the appearance reconstruction loss, the parallax smoothing loss and the left-right parallax consistency loss; the superscripts ap, ds and lr indicate which of these losses a variable relates to.
Step 4-1 comprises the following steps:
step 4-1-1: constructing a warp module, defining a warp operation ω, which performs the following for p ═ x, y, as:
wherein, IlAnd IrRespectively representing a left image and a right image,andrespectively representing the warped pseudo left image and the pseudo right image.
Step 4-1-2: will infrared right image Ir(p) infrared left image Il(p) performing left parallax dlAnd right parallax drCarrying out warping operation to obtain a pseudo infrared left imageAnd pseudo-infrared right image
Step 5 comprises the following steps:
step 5-1: loss-assisted module image warping is assisted. Utilizing the warping operation omega constructed in the step 4-1 to obtain the infrared right image IA(p), RGB left image IB(p) using respective left parallaxes dlAnd right parallax drPerforming a warping operation, includingRecalculating to obtain the infrared right synthetic imageAnd RGB left composite image
Step 5-2: the auxiliary losses are designed. The auxiliary loss is utilized when the frequency spectrum conversion network is optimized in the second iteration, and the auxiliary loss utilizes the original infrared right image IA(p) and RGB left image IBAnd (p) warping the image converted by the frequency spectrum conversion network and the parallax obtained by the depth estimation network to obtain an image.Loss of assistanceThe design is as follows:
wherein the content of the first and second substances,andis a hyper-parameter, here all set to 20,andthe image processing method comprises the steps of respectively obtaining an infrared right composite image and an RGB left composite image, wherein N is the number of picture pixels. Aux in the superscript indicates that the loss is related to the auxiliary loss.
Step 6 comprises the following steps:
step 6-1: in the data preprocessing process, the infrared image channel expansion is executed, namely the infrared right image I isa(p) expansion into a three-channel image IA(p) of the formula (I). In detail, the one-channel matrix I is replicated once for each of the three channelsaAnd (p) realizing the consistency of the infrared image and the RGB image.
Step 6-2: the mold frame is preheated. Model preheating process only utilizes spectrum conversion loss L in one iteration turnGAnd LDAnd training the spectrum conversion network. The spectrum conversion network subjected to the model preheating process can obtain the spectrum conversion image from the multi-spectrum image under the condition of ignoring parallax.
Step 6-3: and (5) training a model framework. The complete model training process firstly utilizes the spectrum conversion loss L in an iteration roundGAnd LDTraining a spectral translation network and secondarily using depthEstimating loss LMENTraining the depth estimation, and finally utilizing the auxiliary lossAnd retraining the spectrum conversion network.
Step 6-4: model framework testing.
Step 6-4 comprises the following steps:
step 6-4-1: perform channel expansion on the target infrared image, then input it into the trained depth estimation network to obtain the right parallax.
step 6-4-2: post-process the right parallax with the formula $z = \frac{Bf}{d}$ to convert the parallax into depth, where $z$ is the depth, $d$ is the parallax, $B$ is the baseline length and $f$ is the camera focal length, giving the depth estimation result of the target infrared image.
Advantageous effects: the invention has the following advantages. First, the depth estimation method completes depth estimation from a single infrared image, and training requires only multi-spectral images rather than depth labels as supervision, reducing the difficulty and cost of acquiring a training data set. Second, the invention solves the problem of the large appearance difference between spectra with a spectrum conversion network, so that multi-spectral images can be used for depth estimation. Finally, the auxiliary loss designed in the invention lets the spectrum conversion network produce clearer images, improving the accuracy of the depth estimation network.
Drawings
The foregoing and/or other advantages of the invention will become further apparent from the following detailed description of the invention when taken in conjunction with the accompanying drawings.
FIG. 1 is a schematic process flow diagram of the present invention.
Fig. 2a is an RGB left image example of the embodiment.
Fig. 2b is an infrared right image example of an embodiment.
Fig. 2c is an example of the right disparity estimation result of the embodiment.
Detailed Description
Examples
As shown in fig. 1, the present invention relates to a multi-spectral-image-supervised infrared image depth estimation method comprising the following steps:
1. constructing a spectrum conversion module
Input: multi-spectral images, comprising an infrared right image and an RGB left image.
Output: spectrum conversion images, comprising an RGB right image and an infrared left image.
1.1 building a spectrum conversion network includes building a spectrum conversion generator G and a spectrum conversion discriminator D
The spectrum conversion network obtains spectrum conversion images from the spectral images to be converted with parallax ignored, i.e. it converts the infrared right image into an RGB right image and the RGB left image into an infrared left image, which solves the problem that the two spectra differ too much in appearance to be matched directly in the depth estimation network. The spectrum conversion network of this example is based on the F-CycleGAN of document 3: Liang M., Guo X., Li H., et al. Unsupervised Cross-Spectral Stereo Matching by Learning to Synthesize. Proceedings of the AAAI Conference on Artificial Intelligence, 2019, 33(01): 8706-. The spectrum conversion generator G comprises an encoder F and two decoders $G_A$ and $G_B$; the spectrum conversion discriminator D consists of two discriminators $D_A$ and $D_B$. The encoder F comprises 2 convolutional layers for downsampling and 4 residual blocks, the decoders $G_A$ and $G_B$ each comprise 2 convolutional layers for upsampling and 4 residual blocks, and the discriminators $D_A$ and $D_B$ each consist of 5 convolutional layers.
1.2 The input multi-spectral images yield spectrum conversion images with parallax ignored, $I_A^{fake}(p)$ and $I_B^{fake}(p)$: the infrared right image $I_A(p)$ is converted to the RGB right image $I_A^{fake}(p)$ and the RGB left image $I_B(p)$ is converted to the infrared left image $I_B^{fake}(p)$, where the superscript fake indicates that the image is an output of the spectrum conversion module.
2. Constructing a depth estimation module
Input: an infrared right image.
Output: parallax, including left parallax and right parallax.
2.1 building a depth estimation network includes building a depth estimation network M
The depth estimation network takes a single infrared right image as input and outputs the left and right parallaxes corresponding to that image; the converted infrared left image produced by the spectrum conversion network and the input infrared right image serve as the supervision signal. The depth estimation network of this example is based on Monodepth, document 4: C. Godard, O. Mac Aodha, and G. J. Brostow. Unsupervised Monocular Depth Estimation with Left-Right Consistency. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017. The depth estimation network M comprises an encoder and a decoder, with ResNet-18 as the encoder and a decoder of 4 convolutional layers.
2.2 The input single infrared right image $I_A(p)$ generates the left and right parallaxes $d_l$ and $d_r$, where the left parallax $d_l$ corresponds to the RGB left image $I_B(p)$ of the multi-spectral pair and the right parallax $d_r$ corresponds to the input infrared right image $I_A(p)$; l and r denote the left and right images respectively.
3. Building spectral conversion loss module
Input: the spectrum conversion network, the multi-spectral images and the spectrum conversion images.
Output: spectrum conversion loss.
3.1 acquiring circularly transformed and consistently reconstructed images
Step 1, input the spectrum conversion images $I_A^{fake}(p)$ and $I_B^{fake}(p)$ into the generator of the spectrum conversion network to obtain the cyclic conversion images $I_A^{cyc}(p)$ and $I_B^{cyc}(p)$;
Step 2, input the multi-spectral images $I_A(p)$ and $I_B(p)$ into the generator of the spectrum conversion network, but with the decoder inverse to the one used for those images before, to obtain the consistent reconstructed images $I_A^{rec}(p)$ and $I_B^{rec}(p)$.
3.2 design spectral conversion loss
As shown in stage 1 of FIG. 1, the first iteration optimizes the spectrum conversion network with the spectrum conversion loss, which consists of the two parts $L_G$ and $L_D$:

$$L_G = \lambda_g L_G^{adv} + \lambda_{cyc} L^{cyc} + \lambda_{rec} L^{rec}, \qquad L_D = \lambda_d L_D^{adv}$$

where $\lambda_{cyc}$, $\lambda_{rec}$, $\lambda_g$ and $\lambda_d$ are the weights of the loss terms $L^{cyc}$, $L^{rec}$, $L_G^{adv}$ and $L_D^{adv}$, set to 10, 5, 1 and 1 respectively in the invention. $L_G^{adv}$ and $L_D^{adv}$ are the adversarial losses of the generator G and the discriminator D of the shared-encoder cycle-consistent generative adversarial network F-CycleGAN; they accomplish the image conversion task. $L^{cyc}$ is the cycle-consistency loss and $L^{rec}$ is the consistent-reconstruction loss; together they accomplish the task of ignoring parallax during spectrum conversion. In the superscripts, cyc indicates that the variable relates to the cycle-consistency loss, rec to the consistent-reconstruction loss and adv to the adversarial loss, while the subscripts G and D relate the variable to the generator's and the discriminator's adversarial loss respectively.
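As a minimal illustrative sketch (not code from the patent), the composite losses $L_G$ and $L_D$ can be assembled from scalar adversarial terms and L1 cycle/reconstruction terms with the stated weights of 10, 5, 1 and 1; all function and argument names here are assumptions:

```python
import numpy as np

def l1(a, b):
    """Mean absolute error between two image arrays."""
    return float(np.mean(np.abs(a - b)))

def spectral_conversion_losses(i_a, i_b, cyc_a, cyc_b, rec_a, rec_b,
                               adv_g, adv_d,
                               lam_cyc=10.0, lam_rec=5.0, lam_g=1.0, lam_d=1.0):
    """Combine cycle-consistency, reconstruction and adversarial terms
    into the generator loss L_G and the discriminator loss L_D."""
    l_cyc = l1(cyc_a, i_a) + l1(cyc_b, i_b)   # cyclic conversion vs. originals
    l_rec = l1(rec_a, i_a) + l1(rec_b, i_b)   # consistent reconstruction vs. originals
    l_g = lam_g * adv_g + lam_cyc * l_cyc + lam_rec * l_rec
    l_d = lam_d * adv_d
    return l_g, l_d
```

With perfect cycle and reconstruction images, the generator loss reduces to the weighted adversarial term alone.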
4. Building a depth estimation loss module
Input: the infrared right image, the infrared left image from the spectrum conversion images, the left parallax and the right parallax.
Output: depth estimation loss.
4.1 image warping
Here the single infrared right image $I_A(p)$ is relabelled $I^r(p)$ and the converted infrared left image $I_B^{fake}(p)$ is relabelled $I^l(p)$.
Step 1, construct the warping module and define a warping operation $\omega$ that, for $p = (x, y)$, performs:

$$\tilde{I}^l(x, y) = \omega\left(I^r, d^l\right)(x, y) = I^r\left(x - d^l(x, y),\, y\right), \qquad \tilde{I}^r(x, y) = \omega\left(I^l, d^r\right)(x, y) = I^l\left(x + d^r(x, y),\, y\right)$$

where $I^l$ and $I^r$ denote the left and right images and $\tilde{I}^l$ and $\tilde{I}^r$ the warped pseudo left and pseudo right images.
Step 2, warp the infrared right image $I^r(p)$ with the left parallax $d_l$ and the infrared left image $I^l(p)$ with the right parallax $d_r$ to obtain the pseudo infrared left image $\tilde{I}^l(p)$ and the pseudo infrared right image $\tilde{I}^r(p)$.
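The warping operation ω of this step can be sketched in NumPy. This is a hedged simplification: nearest-neighbour sampling with border clamping instead of the differentiable bilinear sampler used in practice, and the sign convention (sample the right image at x - d for the pseudo left image) is one common choice, not stated explicitly in the patent:

```python
import numpy as np

def warp(image, disparity, sign=-1):
    """Horizontal warp: out(x, y) = image(x + sign * d(x, y), y), nearest-neighbour."""
    h, w = disparity.shape
    xs = np.arange(w)[None, :] + sign * disparity      # shifted x coordinates
    xs = np.clip(np.rint(xs).astype(int), 0, w - 1)    # clamp to the image border
    ys = np.repeat(np.arange(h)[:, None], w, axis=1)
    return image[ys, xs]

# One convention for the pseudo images of this step:
# pseudo_left  = warp(I_r, d_l, sign=-1)  # sample the right image at x - d_l
# pseudo_right = warp(I_l, d_r, sign=+1)  # sample the left image at x + d_r
```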
4.2 design depth estimation penalty
As shown in stage 2 of FIG. 1, the depth estimation loss $L_{MEN}$ is used when iteratively optimizing the depth estimation network and takes the form:

$$L_{MEN} = \alpha_{ap}\left(L_l^{ap} + L_r^{ap}\right) + \alpha_{ds}\left(L_l^{ds} + L_r^{ds}\right) + \alpha_{lr}\left(L_l^{lr} + L_r^{lr}\right)$$

where $\alpha_{ap}$, $\alpha_{ds}$ and $\alpha_{lr}$ are the corresponding loss weights, set in this example to 1.0, 0.2 and 0.1 respectively, and $L^{ap}$, $L^{ds}$ and $L^{lr}$ are the appearance reconstruction loss, the parallax smoothing loss and the left-right parallax consistency loss; the superscripts ap, ds and lr indicate which loss a variable relates to. The subscripts l and r denote the left and right terms; only the left term is given below, since the right term is obtained by interchanging l and r. The appearance reconstruction loss $L_l^{ap}$ is designed as:

$$L_l^{ap} = \frac{1}{N}\sum_p \alpha\,\frac{1 - \mathrm{SSIM}\left(I^l(p), \tilde{I}^l(p)\right)}{2} + (1-\alpha)\left|I^l(p) - \tilde{I}^l(p)\right|$$

where $\alpha$ is a hyper-parameter, set to 0.9 in this example, SSIM is the structural similarity function, and N is the number of picture pixels. The parallax smoothing loss $L_l^{ds}$ is:

$$L_l^{ds} = \frac{1}{N}\sum_p \left|\partial_x d^l(p)\right| e^{-\left|\partial_x I^l(p)\right|} + \left|\partial_y d^l(p)\right| e^{-\left|\partial_y I^l(p)\right|}$$

where $\partial_x$ and $\partial_y$ are the horizontal and vertical gradients of $d^l$ and $I^l$, and N is the number of picture pixels. The left-right parallax consistency loss $L_l^{lr}$ is:

$$L_l^{lr} = \frac{1}{N}\sum_p \left|d^l(p) - d^r\left(p + d^l(p)\right)\right|$$

where N is the number of picture pixels.
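The appearance and smoothness terms can be sketched as follows. Note the whole-image SSIM here is a simplification (windowed SSIM is typically used), so this is an illustrative approximation rather than the patent's exact implementation:

```python
import numpy as np

def ssim_global(x, y, c1=0.01 ** 2, c2=0.03 ** 2):
    """Whole-image SSIM surrogate for two single-channel images."""
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()
    return ((2 * mx * my + c1) * (2 * cov + c2)) / ((mx * mx + my * my + c1) * (vx + vy + c2))

def appearance_loss(pseudo, target, alpha=0.9):
    """alpha * (1 - SSIM)/2 + (1 - alpha) * L1, averaged over the N pixels."""
    s = ssim_global(pseudo, target)
    return float(alpha * (1 - s) / 2 + (1 - alpha) * np.mean(np.abs(pseudo - target)))

def smoothness_loss(d, img):
    """Edge-aware parallax smoothing: |partial d| weighted by exp(-|partial I|)."""
    dx_d, dy_d = np.abs(np.diff(d, axis=1)), np.abs(np.diff(d, axis=0))
    dx_i, dy_i = np.abs(np.diff(img, axis=1)), np.abs(np.diff(img, axis=0))
    return float(np.mean(dx_d * np.exp(-dx_i)) + np.mean(dy_d * np.exp(-dy_i)))
```

A perfectly reconstructed image yields zero appearance loss, and a constant disparity map yields zero smoothness loss, as expected from the formulas.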
5. Building auxiliary loss modules
Input: the infrared right image, the RGB left image, the left parallax and the right parallax.
Output: auxiliary loss.
5.1 auxiliary loss Module image warping
Using the warping operation $\omega$ constructed in step 4.1, warp the infrared right image $I_A(p)$ with the left parallax $d_l$ and the RGB left image $I_B(p)$ with the right parallax $d_r$, and from the warped results obtain the infrared right composite image $\hat{I}_A(p)$ and the RGB left composite image $\hat{I}_B(p)$.
5.2 design assistance loss
As shown in stage 3 of fig. 1, the second iteration optimizes the spectrum conversion network with the auxiliary loss, which compares, against the original infrared right image $I_A(p)$ and RGB left image $I_B(p)$, images obtained by warping the outputs of the spectrum conversion network with the parallaxes produced by the depth estimation network. The auxiliary loss $L^{aux}$ is designed as:

$$L^{aux} = \frac{\lambda_A}{N}\sum_p \left|\hat{I}_A(p) - I_A(p)\right| + \frac{\lambda_B}{N}\sum_p \left|\hat{I}_B(p) - I_B(p)\right|$$

where $\lambda_A$ and $\lambda_B$ are hyper-parameters, both set to 20 here, $\hat{I}_A(p)$ and $\hat{I}_B(p)$ are the infrared right composite image and the RGB left composite image, and N is the number of picture pixels. The superscript aux indicates that the variable is associated with the auxiliary loss.
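A sketch of the auxiliary loss as a weighted per-pixel L1 between each composite image and its original counterpart, with both weights set to 20 as stated; since the original formula was an image in the source, treat this as an assumption-laden illustration:

```python
import numpy as np

def auxiliary_loss(ir_composite, ir_right, rgb_composite, rgb_left,
                   lam_a=20.0, lam_b=20.0):
    """L_aux = lam_A/N * sum|IR_composite - I_A| + lam_B/N * sum|RGB_composite - I_B|."""
    n = ir_right.size
    term_a = lam_a / n * np.sum(np.abs(ir_composite - ir_right))
    term_b = lam_b / n * np.sum(np.abs(rgb_composite - rgb_left))
    return float(term_a + term_b)
```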
6. Whole frame training
Unify the input multi-spectral image data set to consistent channels through channel expansion; input the processed data into the spectrum conversion module to obtain spectrum conversion images; warm up the spectrum conversion module with the spectrum conversion loss module; input the data into the depth estimation module to obtain parallax; iteratively optimize the spectrum conversion module and the depth estimation module in turn with the spectrum conversion loss module, the depth estimation loss module and the auxiliary loss module; and finally estimate the depth of a single infrared image with the trained depth estimation module and post-processing.
6.1 Data preprocessing
Input: multi-spectral images.
Output: channel-consistent multi-spectral images.
In the data preprocessing stage, perform infrared image channel expansion, i.e. expand the one-channel infrared right image $I_a(p)$ into the three-channel image $I_A(p)$: the one-channel matrix $I_a(p)$ is copied once into each of the three channels, making the infrared image consistent with the RGB image. All images are resized to a uniform 256 x 256.
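The channel expansion itself is a one-line operation; this sketch copies the single infrared channel into three identical channels so the infrared array matches the RGB layout (the 256 x 256 resize would be done separately with an image library):

```python
import numpy as np

def expand_channels(ir_single):
    """Expand a one-channel infrared image (H, W) into three identical channels (H, W, 3)."""
    return np.repeat(ir_single[..., None], 3, axis=2)
```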
6.2 Model framework warm-up
Input: channel-consistent multi-spectral images.
Output: spectrum conversion images.
The model framework warm-up stage trains the spectrum conversion network for one iteration round using only the spectrum conversion losses $L_G$ and $L_D$, i.e. only stage 1 of fig. 1 is performed. After warm-up, the spectrum conversion network can obtain spectrum conversion images from the multi-spectral images with parallax ignored.
6.3 model framework training
Input: channel-consistent multi-spectral images.
Output: parallax, including left parallax and right parallax.
In one iteration round, the model framework training process first trains the spectrum conversion network with the spectrum conversion losses LG and LD, then trains the depth estimation network with the depth estimation loss LMEN, and finally retrains the spectrum conversion network with the auxiliary loss Laux. That is, stages 1, 2 and 3 of fig. 1 are performed.
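The stage ordering of sections 6.2 and 6.3 can be made explicit with a minimal scheduling sketch. The functions here are hypothetical stand-ins (the patent publishes no code); each one only records which stage ran, so the iteration order is visible:

```python
# Records (module, loss) pairs in the order the stages execute.
phases_run = []

def train_spectrum_conversion():               # stage 1: losses L_G and L_D
    phases_run.append(("spectrum_conversion", "L_G/L_D"))

def train_depth_estimation():                  # stage 2: loss L_MEN
    phases_run.append(("depth_estimation", "L_MEN"))

def train_spectrum_with_aux():                 # stage 3: auxiliary loss L_aux
    phases_run.append(("spectrum_conversion", "L_aux"))

def run_round(preheat: bool) -> None:
    train_spectrum_conversion()                # every round starts with the conversion losses
    if not preheat:                            # preheating rounds stop after stage 1
        train_depth_estimation()
        train_spectrum_with_aux()

run_round(preheat=True)    # section 6.2: model framework preheating
run_round(preheat=False)   # section 6.3: full training round
print(phases_run)
```

The design choice this illustrates is that the spectrum conversion network is optimized twice per full round: once with its own losses and once more with the auxiliary loss after the depth network has produced parallax.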
6.4 Model framework testing
Input: target infrared image.
Output: depth estimation result.
Step 1, channel expansion is carried out on the target infrared image, and then the target infrared image is input into a trained depth estimation network to obtain right parallax.
Step 2, post-processing the right parallax: the parallax is converted into depth using the formula D = B·f / d, where d is the parallax, B is the baseline length, and f is the camera focal length; this gives the depth estimation result of the target infrared image.
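A small sketch of this post-processing step, with illustrative baseline and focal-length values (the patent specifies no numbers here):

```python
import numpy as np

def disparity_to_depth(d, baseline, focal_length, eps=1e-6):
    """Convert disparity to depth with D = B * f / d; eps guards against zero disparity."""
    d = np.asarray(d, dtype=np.float64)
    return baseline * focal_length / np.maximum(d, eps)

disp = np.array([[2.0, 4.0], [8.0, 16.0]])            # toy right-parallax map, in pixels
depth = disparity_to_depth(disp, baseline=0.5, focal_length=720.0)
print(depth)
```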
In the present embodiment, fig. 2a is an RGB left image example, fig. 2b is an infrared right image example, and fig. 2c is a right disparity estimation result example. The depth estimation method of this embodiment completes depth estimation of a single infrared image, as shown in fig. 2c. Training requires no depth labels for supervision, only multi-spectral images, as shown in fig. 2a and fig. 2b, which reduces the difficulty and cost of acquiring a training data set.
The present invention provides a multi-spectral supervised infrared image depth estimation method, and there are many ways to implement the technical solution; the above description is only a preferred embodiment of the present invention. It should be noted that those skilled in the art may make a number of improvements and modifications without departing from the principle of the present invention, and these improvements and modifications should also be considered within the protection scope of the present invention. All components not specified in this embodiment can be realized by the prior art.
Claims (10)
1. A multi-spectral image supervised infrared image depth estimation method, characterized by comprising the following steps:
step 1, constructing a spectrum conversion module: building a spectrum conversion network model, and inputting the multi-spectral image into the spectrum conversion network model to obtain a spectrum conversion image in which parallax is ignored;
step 2, constructing a depth estimation module: building a depth estimation network model, and inputting the infrared right image into the depth estimation network model to obtain parallax;
step 3, constructing a spectrum conversion loss module: acquiring a cyclic conversion image and a consistent reconstruction image by using the spectrum conversion network model and the spectrum conversion image obtained in the step 1, calculating to obtain spectrum conversion loss, and performing iterative optimization on the spectrum conversion network model by using the spectrum conversion loss;
step 4, constructing a depth estimation loss module: performing image warping by using the parallax obtained in the step (2), calculating to obtain depth estimation loss, and performing iterative optimization on a depth estimation network model by using the depth estimation loss;
step 5, constructing an auxiliary loss module: obtaining auxiliary loss through image warping calculation by using the spectrum conversion image and the parallax obtained in the step 1 and the step 2, and performing iterative optimization on the spectrum conversion network model by using the auxiliary loss;
step 6, training the whole framework: unifying the multi-spectral image data set to a consistent channel through channel expansion, and inputting the processed data to the spectrum conversion module to obtain a spectrum conversion image ignoring parallax; preheating the spectrum conversion module through the spectrum conversion loss module, and inputting the infrared right image into the depth estimation module to obtain parallax; sequentially and iteratively optimizing the spectrum conversion module and the depth estimation module through the spectrum conversion loss module, the depth estimation loss module and the auxiliary loss module, and realizing depth estimation of a single infrared image through the trained depth estimation module and post-processing.
2. The multi-spectral image supervised infrared image depth estimation method of claim 1, wherein step 1 comprises the following steps:
step 1-1: building a spectrum conversion network, including building a spectrum conversion generator G and a spectrum conversion discriminator D; the spectrum conversion generator G comprises an encoder F and two decoders GA and GB; the spectrum conversion discriminator D consists of two discriminators DA and DB;
step 1-2: inputting a multi-spectral image and obtaining a spectrum conversion image that ignores parallax: the infrared right image IA(p) is converted into an RGB right image IB^fake(p), and the RGB left image IB(p) is converted into an infrared left image IA^fake(p); where p represents the pixel coordinate of any point in the image, subscript A represents the infrared spectrum, subscript B represents the RGB spectrum, superscript fake represents that the image is an output of the spectrum conversion module, and RGB represents that the image consists of three channels: red, green and blue.
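The generator structure of step 1-1 and the two conversions of step 1-2 can be sketched as follows. The tiny per-pixel linear layers are placeholders, not the patent's (unpublished) architecture; only the wiring — one shared encoder F with two spectrum-specific decoders — follows the text:

```python
import numpy as np

rng = np.random.default_rng(0)

class Encoder:
    """Shared encoder F: maps a 3-channel image to a feature map."""
    def __init__(self, ch_in=3, ch_feat=8):
        self.w = rng.standard_normal((ch_in, ch_feat)) * 0.1
    def __call__(self, img):                # (H, W, 3) -> (H, W, ch_feat)
        return np.tanh(img @ self.w)

class Decoder:
    """Spectrum-specific decoder: maps features back to a 3-channel image."""
    def __init__(self, ch_feat=8, ch_out=3):
        self.w = rng.standard_normal((ch_feat, ch_out)) * 0.1
    def __call__(self, feat):               # (H, W, ch_feat) -> (H, W, 3)
        return feat @ self.w

F = Encoder()
G_A, G_B = Decoder(), Decoder()             # infrared decoder, RGB decoder

I_A = rng.random((4, 4, 3))                 # channel-expanded infrared right image (toy size)
I_B_fake = G_B(F(I_A))                      # A -> B conversion: fake RGB right image
I_A_cycle = G_A(F(I_B_fake))                # converted back, as used by the cyclic consistency loss
print(I_B_fake.shape, I_A_cycle.shape)
```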
3. The method of claim 2, wherein step 2 comprises the following steps:
step 2-1: building a depth estimation network, including building a depth estimation network M;
step 2-2: inputting a single infrared right image IA(p) and generating a left parallax dl and a right parallax dr; the left parallax dl corresponds to the RGB left image IB(p) in the multi-spectral image, and the right parallax dr corresponds to the input infrared right image IA(p); the superscripts l and r represent the left and right images, respectively.
4. The method of claim 3, wherein step 3 comprises the following steps:
step 3-1: acquiring a cyclic conversion image and a consistent reconstruction image;
step 3-2: designing the spectrum conversion loss; the spectrum conversion loss consists of two parts, LG and LD, denoted as:

LG = λg·Ladv^G + λcyc·Lcyc + λrec·Lrec
LD = λd·Ladv^D

where Ladv^G and Ladv^D are the adversarial losses of the generator G and the discriminator D, respectively; Lcyc denotes the cyclic consistency loss and Lrec the consistent reconstruction loss; λcyc, λrec, λg and λd are the weights of the loss terms Lcyc, Lrec, Ladv^G and Ladv^D, respectively. λ denotes that a variable is a hyper-parameter; cyc, rec and adv indicate that a variable is related to the cyclic consistency loss, the consistent reconstruction loss and the adversarial loss; G and D indicate the adversarial losses of the generator G and the discriminator D, respectively.
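A scalar sketch of how these two losses combine. The individual loss values and the weight values are placeholders, since this excerpt does not publish them:

```python
def spectrum_conversion_losses(l_adv_g, l_adv_d, l_cyc, l_rec,
                               lam_g=1.0, lam_d=1.0, lam_cyc=10.0, lam_rec=5.0):
    """L_G combines the generator's adversarial, cyclic and reconstruction terms;
    L_D is the weighted discriminator adversarial term."""
    L_G = lam_g * l_adv_g + lam_cyc * l_cyc + lam_rec * l_rec
    L_D = lam_d * l_adv_d
    return L_G, L_D

# Illustrative scalar loss values for one batch.
L_G, L_D = spectrum_conversion_losses(l_adv_g=0.7, l_adv_d=0.6, l_cyc=0.2, l_rec=0.1)
print(L_G, L_D)
```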
5. The method of claim 4, wherein step 3-1 comprises the following steps:
step 3-1-1: inputting the spectrum conversion images IA^fake(p) and IB^fake(p) into the generator of the spectrum conversion network to obtain the cyclic conversion images IA^cyc(p) and IB^cyc(p), namely:

IA^cyc(p) = GA(F(IB^fake(p))), IB^cyc(p) = GB(F(IA^fake(p)))

step 3-1-2: inputting the infrared right image IA(p) and the RGB left image IB(p) of the multi-spectral image into the generator of the spectrum conversion network; in contrast to step 3-1-1, the generator here reconstructs an image in the same spectrum, yielding the consistent reconstruction images IA^rec(p) and IB^rec(p), namely:

IA^rec(p) = GA(F(IA(p))), IB^rec(p) = GB(F(IB(p)))
6. The method of claim 5, wherein step 4 comprises the following steps:
step 4-1: warping the image;
step 4-2: designing the depth estimation loss; the depth estimation loss LMEN is utilized when iteratively optimizing the depth estimation network, and takes the form:

LMEN = αap·Lap + αds·Lds + αlr·Llr

where αap, αds and αlr represent the appearance reconstruction loss weight, the parallax smoothing loss weight and the left-right parallax consistency loss weight; Lap, Lds and Llr represent the appearance reconstruction loss, the parallax smoothing loss and the left-right parallax consistency loss; the subscripts ap, ds and lr indicate that a variable is associated with the appearance reconstruction loss, the parallax smoothing loss and the left-right parallax consistency loss, respectively.
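A sketch of the weighted combination above, together with one common form of a parallax smoothing term. The exact formulas of the individual terms and the weight values are not given in this excerpt, so both are assumptions:

```python
import numpy as np

def smoothness_loss(disp):
    """L1 penalty on disparity gradients — one common form of a parallax smoothing term."""
    dx = np.abs(np.diff(disp, axis=1))   # horizontal gradients
    dy = np.abs(np.diff(disp, axis=0))   # vertical gradients
    return dx.mean() + dy.mean()

def depth_estimation_loss(l_ap, l_ds, l_lr, a_ap=1.0, a_ds=0.1, a_lr=1.0):
    """Weighted combination L_MEN = a_ap*L_ap + a_ds*L_ds + a_lr*L_lr."""
    return a_ap * l_ap + a_ds * l_ds + a_lr * l_lr

disp = np.tile(np.arange(4.0), (4, 1))   # toy disparity increasing left to right
l_ds = smoothness_loss(disp)             # horizontal step of 1 everywhere, no vertical change
print(depth_estimation_loss(l_ap=0.3, l_ds=l_ds, l_lr=0.05))
```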
7. The method of claim 6, wherein step 4-1 comprises the following steps:
step 4-1-1: constructing a warp module and defining a warp operation ω; for p = (x, y),

Ĩl(p) = ω(Ir, dl)(p) = Ir(x + dl(p), y), Ĩr(p) = ω(Il, dr)(p) = Il(x − dr(p), y)

where Il and Ir represent the left and right images respectively, Ĩl and Ĩr represent the warped pseudo left image and pseudo right image respectively, the integer x ∈ [1, H] represents the pixel abscissa, the integer y ∈ [1, W] represents the pixel ordinate, and W and H are the width and length of the image, respectively;
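A minimal sketch of the warp operation ω: sample the source image at horizontally shifted coordinates given by the per-pixel disparity. Nearest-neighbour sampling with border clamping is used here for simplicity; a trainable implementation would use differentiable bilinear sampling:

```python
import numpy as np

def warp(src: np.ndarray, disp: np.ndarray, sign: int = 1) -> np.ndarray:
    """Build a pseudo image: out[row, col] = src[row, col + sign*disp], clamped at borders."""
    h, w = src.shape[:2]
    rows, cols = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    cols_shifted = np.clip(np.round(cols + sign * disp).astype(int), 0, w - 1)
    return src[rows, cols_shifted]

right = np.arange(16.0).reshape(4, 4)      # toy "right image"
disp_l = np.ones((4, 4))                   # constant left disparity of 1 pixel
pseudo_left = warp(right, disp_l, sign=1)  # sketch of omega(I_r, d_l)
print(pseudo_left)
```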
8. The method of claim 7, wherein step 5 comprises the following steps:
step 5-1: image warping for the auxiliary loss module; utilizing the warping operation ω constructed in step 4-1, the spectrum conversion images are warped with the left parallax dl and the right parallax dr to synthesize the RGB left composite image ĨB^aux(p) and the infrared right composite image ĨA^aux(p) respectively, namely:

ĨB^aux(p) = ω(IB^fake, dl)(p), ĨA^aux(p) = ω(IA^fake, dr)(p)
step 5-2: designing the auxiliary loss; the auxiliary loss is utilized when the spectrum conversion network is optimized for the second time in an iteration round; it compares the original infrared right image IA(p) and RGB left image IB(p) with the images obtained by warping the outputs of the spectrum conversion network with the parallaxes obtained by the depth estimation network; the auxiliary loss Laux is designed as:

Laux = αA^aux · (1/N) Σp |IA(p) − ĨA^aux(p)| + αB^aux · (1/N) Σp |IB(p) − ĨB^aux(p)|

where αA^aux and αB^aux represent the weights of the loss terms, α denotes that a variable is a hyper-parameter, and aux denotes that a variable is related to the auxiliary loss; ĨA^aux(p) and ĨB^aux(p) respectively represent the infrared right composite image and the RGB left composite image, and N is the number of picture pixels.
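A sketch of the auxiliary loss in this form: a weighted mean absolute difference between each original image and its warped composite. The L1 form is an assumption; the weight value of 20 follows the description:

```python
import numpy as np

def auxiliary_loss(I_A, I_A_aux, I_B, I_B_aux, alpha_A=20.0, alpha_B=20.0):
    """Weighted per-pixel L1 distance between originals and warped composites."""
    n = I_A.size                                        # N: number of picture pixels
    term_A = alpha_A * np.abs(I_A - I_A_aux).sum() / n  # infrared right term
    term_B = alpha_B * np.abs(I_B - I_B_aux).sum() / n  # RGB left term
    return term_A + term_B

I_A = np.zeros((2, 2))
I_B = np.zeros((2, 2))
loss = auxiliary_loss(I_A, I_A + 0.1, I_B, I_B + 0.05)  # toy constant-offset composites
print(loss)
```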
9. The method of claim 8, wherein step 6 comprises the following steps:
step 6-1: executing infrared image channel expansion in the data preprocessing process, and expanding the infrared right image into a three-channel image; copying a single-channel image matrix once on each channel of the three channels to realize the consistency of the infrared image and the RGB image;
step 6-2: preheating the model framework: in one iteration round, training the spectrum conversion network using only the spectrum conversion losses LG and LD, and obtaining, from the multi-spectral images, spectrum conversion images that ignore parallax;
step 6-3: training the model framework: in one iteration round, first training the spectrum conversion network with the spectrum conversion losses LG and LD, then training the depth estimation network with the depth estimation loss LMEN, and finally retraining the spectrum conversion network with the auxiliary loss Laux;
step 6-4: and (5) testing a model framework.
10. The method of claim 9, wherein step 6-4 comprises the following steps:
6-4-1, performing channel expansion on the target infrared image, and inputting it into the trained depth estimation network to obtain the right parallax;
6-4-2, post-processing the right parallax: converting the parallax into depth using D = B·f / d, where d is the parallax, B is the baseline length and f is the camera focal length, to obtain the depth estimation result of the target infrared image.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111531301.3A CN114494386A (en) | 2021-12-14 | 2021-12-14 | Infrared image depth estimation method based on multi-spectral image supervision |
Publications (1)
Publication Number | Publication Date |
---|---|
CN114494386A true CN114494386A (en) | 2022-05-13 |
Family
ID=81493818
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
TWI787141B (en) * | 2022-06-21 | 2022-12-11 | 鴻海精密工業股份有限公司 | Method and equipment for training depth estimation model, and method and equipment for depth estimation |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||