CN114494386A

CN114494386A - Infrared image depth estimation method based on multi-spectral image supervision

Info

Publication number: CN114494386A
Application number: CN202111531301.3A
Authority: CN
Inventors: 孙正兴; 刘胡伟; 孙蕴瀚; 张巍
Original assignee: Nanjing University
Current assignee: Nanjing University
Priority date: 2021-12-14
Filing date: 2021-12-14
Publication date: 2022-05-13

Abstract

The invention discloses a multi-spectral image supervised infrared image depth estimation method, which comprises the following steps: 1) constructing a spectrum conversion module: obtaining a spectrum conversion image according to the multi-spectrum image; 2) constructing a spectrum conversion loss module: obtaining parallax according to the infrared image; 3) constructing an auxiliary loss module: and calculating auxiliary loss by using the spectrum conversion image and the parallax acquired in the step through image warping, and iteratively optimizing the spectrum conversion network model by using the loss. 4) Constructing a depth estimation loss module: carrying out image warping by using the obtained parallax, calculating depth estimation loss, and iteratively optimizing a depth estimation network model by using the loss; 5) constructing an auxiliary loss module: and calculating auxiliary loss by using the obtained spectrum conversion image and the parallax through image warping, and iteratively optimizing a spectrum conversion network model by using the loss. 6) Training the whole frame: the method comprises four stages of data preprocessing, model framework preheating, training and testing.

Description

Infrared image depth estimation method based on multi-spectral image supervision

Technical Field

The invention relates to an infrared image depth estimation method, belongs to the technical field of computer graphics, and particularly relates to an infrared image depth estimation method based on multi-spectral image supervision.

Background

For many large and complex projects, there is an urgent need for a low-cost solution capable of monitoring the quality of the project for a long time and periodically detecting the defects of the project to ensure the safety of the project and meet the requirement of daily maintenance, so that the inspection robot has received much attention. The inspection robot acquires multi-mode information through various sensors to complete a series of two-dimensional or three-dimensional tasks, such as defect detection in the two-dimensional tasks and three-dimensional reconstruction in the three-dimensional tasks. And depth information plays an important role in these tasks.

Considering that the infrared camera has the characteristic of being insensitive to the environment, the infrared camera can directly measure the infrared radiation of an object and the environment no matter whether an external light source exists or not, and therefore, the depth information obtained by utilizing the infrared image acquired by the monocular infrared camera has the more outstanding advantage than other methods. For example, compared with active sensors such as a laser radar and a depth camera with a light structure, which are expensive and have different defects when facing a complex scene, a passive sensor based on a standard imaging technology, such as an infrared camera, is cheaper in price, lighter in weight and stronger in adaptability, can be more flexibly deployed on an inspection robot, and can adapt to different complex environments; compared with a method that prediction cannot be well performed in a night environment and a low-light or even zero-light environment, a method for obtaining depth information by combining an RGB image acquired by a monocular RGB camera with a depth estimation technology is disclosed in document 1: godard C, Aodha O M, Firman M, et al, scaling Into Self-supper simple Depth estimation, International Conference on Computer Vision, 2019, 3827 and 3837.

Currently, an infrared image depth estimation method for obtaining depth information from a single infrared image is a supervised method, for example, document 2: wang, q., Zhao, h., Hu, z.et al.discrete connected CRF networks for depth estimation from cellular in images, int.j.mach.learn. & cyber.12, 2021, 187 + 200. Without the use of depth tags, it is difficult to obtain depth information from only a single infrared image, where it is desirable to generate the supervisory signals with a cheaper RGB camera. However, the infrared image depth estimation based on multi-spectrum supervision has the problem that the spectrums have large appearance difference, and the spectrums cannot be directly matched. Therefore, a multi-spectral image supervised infrared image depth estimation method is provided. The method can obtain the depth information from a single infrared image by using multi-modal information as a supervision signal under the condition of no supervision of a depth label.

Disclosure of Invention

The purpose of the invention is as follows: the invention aims to solve the technical problem of the prior art and provides a multi-spectral-image-supervised infrared image depth estimation method which can accurately estimate the depth information of a single infrared image.

In order to solve the technical problem, the invention discloses an infrared image depth method for multi-spectral image supervision, which comprises the following steps of:

step 1, constructing a spectrum conversion module: building a frequency spectrum conversion network, inputting the multi-frequency spectrum image into a frequency spectrum conversion network model, and obtaining a frequency spectrum conversion image of the multi-frequency spectrum image under the condition of neglecting parallax, namely converting an infrared right image and an RGB left image into an RGB right image and an infrared left image; the right image is an image obtained by a right camera in the binocular camera, the left image is an image obtained by a left camera in the binocular camera, the infrared spectrum indicates that the frequency spectrum of the image is an infrared spectrum, and the RGB indicates that the frequency spectrum of the image is a visible light spectrum, namely, the image is composed of three channels of red, green and blue.

Step 2, constructing a depth estimation module: building a depth estimation network, and inputting the infrared right image into a depth estimation network model to obtain parallax;

step 3, constructing a spectrum conversion loss module: acquiring a cyclic conversion image and a consistent reconstruction image by using the spectrum conversion network model and the spectrum conversion image obtained in the step 1, calculating to obtain a spectrum conversion loss, and iteratively optimizing the spectrum conversion network model by using the loss;

step 4, constructing a depth estimation loss module: carrying out image warping by using the parallax obtained in the step (2), calculating to obtain depth estimation loss, and iteratively optimizing a depth estimation network model by using the loss;

step 5, constructing an auxiliary loss module: and (3) calculating to obtain auxiliary loss through image warping by using the spectrum conversion image and the parallax obtained in the step (1) and the step (2), and performing iterative optimization on the spectrum conversion network model by using the loss.

Step 6, training the whole framework: the method comprises the steps of unifying a multi-spectral image data set to a consistent channel through channel expansion, inputting processed data to a spectrum conversion module to obtain a spectrum conversion image, preheating the spectrum conversion module through a spectrum conversion loss module, inputting the preheated data to a depth estimation module to obtain parallax, sequentially and iteratively optimizing the spectrum conversion module and the depth estimation module through the depth estimation loss module, the spectrum conversion loss module and an auxiliary loss module, and realizing depth estimation of a single infrared image by using a trained depth estimation module and post-processing.

The step 1 comprises the following steps:

step 1-1: the construction of the spectrum conversion network comprises the construction of a spectrum conversion generator G and a spectrum conversion discriminator D

Step 1-2: the input multi-spectral spectrum image obtains a spectrum conversion image with parallax neglected, there are

And

i.e. the infrared right image I_A(p) conversion to RGB Right image

RGB left image I_B(p) conversion to an infrared left image

Wherein the superscript fake indicates that the image is the output term obtained by the spectral conversion module.

The step 2 comprises the following steps:

step 2-1: constructing a depth estimation network comprises constructing a depth estimation network M;

step 2-2: inputting a single infrared right image I_A(p) generating left and right parallaxes d^lAnd d^r. Wherein, the left parallax d^lCorresponding to RGB left image I in multi-spectral image_B(p) parallax, right parallax d^rCorresponding input infrared right image I_A(p) parallax. l and r represent the left and right images, respectively;

the step 3 comprises the following steps:

step 3-1: acquiring a cyclic conversion image and a consistent reconstruction image;

step 3-2: the spectral conversion loss is designed. The first iteration optimizes the spectrum conversion network by utilizing the spectrum conversion loss which is L_GAnd L_DThe two parts are composed of the following forms:

wherein λ is_cyc,λ_rec,λ_gAnd λ_dAre respectively loss terms

And

the weights of (a) and (b) are set to 10,5,1 and 1, respectively, in the present invention.

And

the cycles of the shared encoder generate the countering losses of the countering network F-CycleGAN generator G and the discriminator D, respectively, which fulfill the task of image conversion.

Is a loss of consistency of the cycle,

is a consistent reconstruction penalty, both of which together fulfill the task of ignoring parallax when transforming the spectrum. Cyc in the upper subscript indicates that the variable is associated with a cyclic uniform loss, rec indicates that the variable is associated with a uniform reconstruction loss, adv indicates that the variable is associated with an antagonistic loss, and G and D indicate that the variable is associated with an antagonistic loss of the generator G and an antagonistic loss of the discriminator D, respectively.

Step 3-1 comprises the following steps:

step 3-1-1: converting a spectrum into an image

And

inputting the image into a generator of a spectrum conversion network to obtain a cyclic conversion image

And

namely have

And

step 3-1-2: will multi-spectral image I_A(p) and I_B(p) input into a generator of a spectral conversion network, but using the same as beforeTo generators inverse to the multi-spectral image, resulting in a coherent reconstructed image

And

namely have

And

step 4 comprises the following steps:

step 4-1: the image is warped.

Step 4-2: the depth estimation penalty is designed. Utilizing depth estimation loss L in iteratively optimizing depth estimation network_MEN. Depth estimation penalty L_MENThe form is as follows:

wherein alpha is_ap，α_dsAnd alpha_lrAre corresponding loss weights, which are set to 1.0,0.2,0.1, respectively, in the present invention.

Respectively, appearance reconstruction loss, parallax smoothing loss, and left-right parallax coincidence loss. Ap, ds, lr in the upper subscript indicate that the variable is associated with apparent reconstruction loss, parallax smoothing loss, and left-right parallax coincidence loss, respectively.

Step 4-1 comprises the following steps:

step 4-1-1: constructing a warp module, defining a warp operation ω, which performs the following for p ═ x, y, as:

wherein, I^lAnd I^rRespectively representing a left image and a right image,

and

respectively representing the warped pseudo left image and the pseudo right image.

Step 4-1-2: will infrared right image I^r(p) infrared left image I^l(p) performing left parallax d^lAnd right parallax d^rCarrying out warping operation to obtain a pseudo infrared left image

And pseudo-infrared right image

Step 5 comprises the following steps:

step 5-1: loss-assisted module image warping is assisted. Utilizing the warping operation omega constructed in the step 4-1 to obtain the infrared right image I_A(p), RGB left image I_B(p) using respective left parallaxes d^lAnd right parallax d^rPerforming a warping operation, including

Recalculating to obtain the infrared right synthetic image

And RGB left composite image

Step 5-2: the auxiliary losses are designed. The auxiliary loss is utilized when the frequency spectrum conversion network is optimized in the second iteration, and the auxiliary loss utilizes the original infrared right image I_A(p) and RGB left image I_BAnd (p) warping the image converted by the frequency spectrum conversion network and the parallax obtained by the depth estimation network to obtain an image.Loss of assistance

The design is as follows:

wherein the content of the first and second substances,

and

is a hyper-parameter, here all set to 20,

and

the image processing method comprises the steps of respectively obtaining an infrared right composite image and an RGB left composite image, wherein N is the number of picture pixels. Aux in the superscript indicates that the loss is related to the auxiliary loss.

Step 6 comprises the following steps:

step 6-1: in the data preprocessing process, the infrared image channel expansion is executed, namely the infrared right image I is_a(p) expansion into a three-channel image I_A(p) of the formula (I). In detail, the one-channel matrix I is replicated once for each of the three channels_aAnd (p) realizing the consistency of the infrared image and the RGB image.

Step 6-2: the mold frame is preheated. Model preheating process only utilizes spectrum conversion loss L in one iteration turn_GAnd L_DAnd training the spectrum conversion network. The spectrum conversion network subjected to the model preheating process can obtain the spectrum conversion image from the multi-spectrum image under the condition of ignoring parallax.

Step 6-3: and (5) training a model framework. The complete model training process firstly utilizes the spectrum conversion loss L in an iteration round_GAnd L_DTraining a spectral translation network and secondarily using depthEstimating loss L_MENTraining the depth estimation, and finally utilizing the auxiliary loss

And retraining the spectrum conversion network.

And 6-4: and (5) testing a model framework.

Step 6-4 comprises the following steps:

and 6-4-1, performing channel expansion on the target infrared image, and inputting the target infrared image into the trained depth estimation network to obtain the right parallax.

Step 6-4-2, post-processing the right parallax by using a formula

Converting the disparity into depth, wherein

And obtaining a depth estimation result of the target infrared image, wherein d is the parallax, B is the base length, and f is the focal length of the camera.

Has the advantages that: the invention has the following advantages: firstly, the depth estimation method of the invention can complete the depth estimation of a single infrared image, and only multi-spectral images are needed without using a depth label as supervision during training, thereby reducing the acquisition difficulty and cost of a training data set. Secondly, the invention solves the problem of larger appearance difference between frequency spectrums by utilizing a frequency spectrum conversion network, so that the multi-spectrum image can be used for depth estimation. Finally, the auxiliary loss designed by the invention enables the frequency spectrum conversion network to obtain a clearer image, and the accuracy of depth estimation of the depth estimation network is improved.

Drawings

The foregoing and/or other advantages of the invention will become further apparent from the following detailed description of the invention when taken in conjunction with the accompanying drawings.

FIG. 1 is a schematic process flow diagram of the present invention.

Fig. 2a is an RGB left image example of the embodiment.

Fig. 2b is an infrared right image example of an embodiment.

Fig. 2c is an example of the right disparity estimation result of the embodiment.

Detailed Description

Examples

As shown in fig. 1, the present invention relates to an infrared image depth method with multi-spectral image supervision, which comprises the following steps:

1. constructing a spectrum conversion module

Inputting: the multi-spectral image includes an infrared right image and an RGB left image.

And (3) outputting: the spectrum conversion image includes an RGB right image and an infrared left image.

1.1 building a spectrum conversion network includes building a spectrum conversion generator G and a spectrum conversion discriminator D

The spectrum conversion network can obtain a spectrum conversion image from a spectrum image to be converted under the condition of ignoring parallax, namely, an infrared right image is converted into an RGB right image, and an RGB left image is converted into an infrared left image, so that the problem that the spectrum and the infrared left image have large appearance difference and cannot be directly matched in the depth estimation network is solved. The spectrum conversion network of this example is based on document 3: F-CycleGAN construction of Liang M, Guo X, Li H, et al, in the case of unsuperved cross-spectral analysis by learning to synthesis, proceedings of the AAAI reference on the scientific interest.2019, 33(01): 8706-. The spectral transformation generator G comprises an encoder F, two decoders G_AAnd G_BThe spectrum conversion discriminator D consists of two discriminators D_AAnd D_BAnd (4) forming. Wherein the encoder F comprises 2 convolutional layers for downsampling and 4 residual blocks, two decoders G_AAnd G_BComprising 2 convolutional layers for upsampling and 4 residual blocks, a discriminator D_AAnd D_BConsists of 5 convolutional layers.

1.2 input Multi-spectral image spectral conversion image is obtained ignoring parallax

And

i.e. the infrared right image I_A(p) conversion to RGB Right image

RGB left image I_B(p) conversion to an infrared left image

2. Constructing a depth estimation module

Inputting: and infrared right image.

And (3) outputting: parallax, including left parallax and right parallax.

2.1 building a depth estimation network includes building a depth estimation network M

And the depth estimation network inputs a single infrared right image and outputs a left-right parallax corresponding to the image, and the converted infrared left image converted by the frequency spectrum conversion network and the input single infrared right image are used as a monitoring signal. The depth estimation network of this example is based on document 4: monodepth, Godard, O.Mac Aodha, and G.J.Brostow.Unsurmounted monoclonal estimation with left-right consistency. in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition,2017. The depth estimation network M comprises an encoder and a decoder. Using resnet18 as the encoder, the decoder consists of 4 convolutional layers.

2.2 Single Infrared Right image I to be input_A(p) generating left and right parallaxes d^lAnd d^r. Wherein, the left parallax d^lCorresponding to RGB left image I in multi-spectral image_B(p) parallax, right parallax d^rCorresponding input infrared right image I_A(p) parallax. l and r represent the left and right images, respectively.

3. Building spectral conversion loss module

Inputting: a spectral conversion network, a multi-spectral image, and a spectral conversion image.

And (3) outputting: spectral conversion loss.

3.1 acquiring circularly transformed and consistently reconstructed images

Step 1, converting the frequency spectrum into an image

And

And

namely have

And

step 2, the multi-spectral image I_A(p) and I_B(p) input into a generator of a spectral conversion network, but using a generator that is the inverse of the previously obtained multi-spectral image, a coherent reconstructed image is obtained

And

namely have

And

3.2 design spectral conversion loss

As shown in stage 1 of FIG. 1, the first iteration optimizes the spectral translation networkThe spectrum conversion loss is utilized when the network is on, and the spectrum conversion loss is L_GAnd L_DThe two parts are composed of the following forms:

wherein λ is_cyc,λ_rec,λ_gAnd λ_dAre respectively loss terms

And

And

Is a loss of consistency of the cycle,

Loss of consistency of circulation

The design of (1) is as follows:

wherein N is the number of picture pixels,

and

is a cyclic conversion image.

Consistent reconstruction loss

The design of (1) is as follows:

wherein N is the number of picture pixels,

and

is a consistent reconstructed image.

4. Building a depth estimation loss module

Inputting: an infrared right image, an infrared left image in the spectrum conversion image, a left parallax, and a right parallax.

And (3) outputting: the depth estimates the loss.

4.1 image warping

This document refers to a single infrared right image I_A(p) Relabelling as I^r(p), converted infrared left image

Relabeling as I^l(p)。

Step 1, constructing a warping module, defining a warping operation ω, which performs the following operations for p ═ x, y, including:

wherein, I^lAnd I^rRespectively representing a left image and a right image,

and

Step 2, the infrared right image I^r(p) infrared left image I^l(p) using respective left parallaxes d^lAnd right parallax d^rCarrying out warping operation to obtain a pseudo infrared left image

And pseudo-infrared right image

4.2 design depth estimation penalty

As shown in stage 2 of FIG. 1, depth estimation loss L is utilized in iteratively optimizing a depth estimation network_MEN. Depth estimation penalty L_MENThe form is as follows:

wherein alpha is_ap,α_dsAnd alpha_lrAre the corresponding loss weights, which in this example are set to 1.0,0.2,0.1, respectively.

Respectively, loss of appearance reconstruction, visionDifferential smooth loss and left-right parallax coincidence loss. Ap, ds, lr in the upper subscript indicate that the variable is associated with apparent reconstruction loss, parallax smoothing loss, and left-right parallax coincidence loss, respectively.

And

the items on the left and the right are indicated separately, and only the item on the left is explained here, since the right can derive the method by interchanging the labels l and r.

Is the loss of appearance reconstruction, which is designed to be:

where α is a hyper-parameter, which is set to 0.9 in this example, SSIM is a structural similarity function, and N is the number of picture pixels.

Is the parallax smoothing loss, which is designed to be:

wherein the content of the first and second substances,

and

are respectively d^lAnd I^lN is the number of picture pixels.

Is a left-right parallax consistent loss, which is designed as:

where N is the number of picture pixels.

5. Building auxiliary loss modules

Inputting: infrared right image, RGB left image, left parallax, and right parallax.

And (3) outputting: loss of support.

5.1 auxiliary loss Module image warping

Utilizing the warping operation omega constructed in the step 4.1 to obtain the infrared right image I_A(p), RGB left image I_B(p) using respective left parallaxes d^lAnd right parallax d^rPerforming a warping operation, including

Recalculating to obtain an infrared right composite image

And RGB left composite image

5.2 design assistance loss

As shown in stage 3 of fig. 1, the second iteration optimizes the spectrum conversion network by using the auxiliary loss which uses the original infrared right image I_A(p) and RGB left image I_BAnd (p) warping the image converted by the frequency spectrum conversion network and the parallax obtained by the depth estimation network to obtain an image. Loss of assistance

The design is as follows:

wherein the content of the first and second substances,

and

is a hyper-parameter, here all set to 20,

and

the image processing method comprises the steps of respectively obtaining an infrared right composite image and an RGB left composite image, wherein N is the number of picture pixels. Aux in the superscript indicates that this variable is associated with a loss of assist.

6. Whole frame training

The method comprises the steps of unifying an input multi-spectral image data set to a consistent channel through channel expansion, inputting processed data to a spectrum conversion module to obtain a spectrum conversion image, preheating the spectrum conversion module through a spectrum conversion loss module, inputting the preheated data to a depth estimation module to obtain parallax, sequentially and iteratively optimizing the spectrum conversion module and the depth estimation module through the spectrum conversion loss module, the depth estimation loss module and an auxiliary loss module, and realizing depth estimation of a single infrared image by using a trained depth estimation module and post-processing.

6.1 data preprocessing

Inputting: multi-spectral images.

And (3) outputting: channel-consistent multi-spectral images.

In the data preprocessing process, the infrared image channel expansion is executed, namely the infrared right image I is_a(p) expansion into a three-channel image I_A(p) of the formula (I). In detail, the one-channel matrix I is replicated once for each of the three channels_aAnd (p) realizing the consistency of the infrared image and the RGB image. And unifies the sizes of all images to 256 × 256.

6.2 mold frame preheating

Inputting: channel consistent multi-spectral images.

And (3) outputting: spectrally converted images

Model framework preheating process only utilizes spectrum conversion loss L in one iteration turn_GAnd L_DAnd training the spectrum conversion network. The spectrum conversion network subjected to the model preheating process can obtain the spectrum conversion image from the multi-spectrum image under the condition of ignoring parallax. I.e. only phase 1 of fig. 1 is performed.

6.3 model framework training

Inputting: channel consistent multi-spectral images

And (3) outputting: parallax, including left parallax and right parallax

In an iteration turn, the model framework training process firstly utilizes the spectrum conversion loss L_GAnd L_DTraining the spectrum conversion network, and estimating the loss L by using depth_MENTraining the depth estimation, and finally utilizing the auxiliary loss

And retraining the spectrum conversion network. I.e. phase 1, phase 2 and phase 3 of fig. 1 are performed.

6.4 model frame testing

Inputting: target infrared image

And (3) outputting: depth estimation result

Step 1, channel expansion is carried out on the target infrared image, and then the target infrared image is input into a trained depth estimation network to obtain right parallax.

Step 2, post-processing the right parallax by using a formula

Converting the disparity into depth, wherein

In the present embodiment, fig. 2a is an RGB left image example, fig. 2b is an infrared right image example, and fig. 2c is a right disparity estimation result example. By the depth estimation method of the embodiment, the depth estimation of a single infrared image is completed, as shown in fig. 2c, and a depth tag is not needed for supervision during training, but only multi-spectral images are needed, as shown in fig. 2a and fig. 2b, which reduces the difficulty and cost of acquiring a training data set.

The present invention provides a multi-spectral supervised infrared image depth estimation method, and a number of methods and approaches for implementing the technical solution are provided, the above description is only a preferred embodiment of the present invention, and it should be noted that, for those skilled in the art, a number of improvements and modifications may be made without departing from the principle of the present invention, and these improvements and modifications should also be considered as the protection scope of the present invention. All the components not specified in the present embodiment can be realized by the prior art.

Claims

1. A multi-spectral image supervised infrared image depth estimation method is characterized by comprising the following steps: comprises the following steps of (a) carrying out,

step 1, constructing a spectrum conversion module: building a frequency spectrum conversion network model, and inputting the multi-spectrum image into the frequency spectrum conversion network model to obtain a frequency spectrum conversion image of which the multi-spectrum image ignores parallax error;

step 2, constructing a depth estimation module: building a depth estimation network model, and inputting the infrared right image into the depth estimation network model to obtain parallax;

step 3, constructing a spectrum conversion loss module: acquiring a cyclic conversion image and a consistent reconstruction image by using the spectrum conversion network model and the spectrum conversion image obtained in the step 1, calculating to obtain spectrum conversion loss, and performing iterative optimization on the spectrum conversion network model by using the spectrum conversion loss;

step 4, constructing a depth estimation loss module: performing image warping by using the parallax obtained in the step (2), calculating to obtain depth estimation loss, and performing iterative optimization on a depth estimation network model by using the depth estimation loss;

step 5, constructing an auxiliary loss module: obtaining auxiliary loss through image warping calculation by using the spectrum conversion image and the parallax obtained in the step 1 and the step 2, and performing iterative optimization on the spectrum conversion network model by using the auxiliary loss;

step 6, training the whole framework: unifying the multi-spectral image data set to a consistent channel through channel expansion, and inputting the processed data to a spectrum conversion module to obtain a spectrum conversion image ignoring parallax; preheating the frequency spectrum conversion module through the frequency spectrum conversion loss module, and inputting the infrared right image into the depth estimation module to obtain parallax; the spectrum conversion module and the depth estimation module are sequentially and iteratively optimized through the spectrum conversion loss module, the depth estimation loss module and the auxiliary loss module, and the depth estimation of a single infrared image is realized through the trained depth estimation module and post-processing.

2. The multi-spectral image-supervised infrared image depth estimation method of claim 1, wherein: the step 1 comprises the following steps of,

step 1-1: building a spectrum conversion network, including building a spectrum conversion generator G and a spectrum conversion discriminator D; the spectral transformation generator G comprises an encoder F, two decoders G_AAnd G_BThe spectrum conversion discriminator D consists of two discriminators D_AAnd D_BComposition is carried out;

step 1-2: inputting a multi-spectrum image, obtaining a spectrum conversion image ignoring parallax,

and

will infrared right image I_A(p) conversion to RGB Right image

RGB left image I_B(p) conversion to an Infrared left image

Wherein p represents the pixel coordinate of any point in the image, subscript A represents the infrared spectrum, subscript B represents the RGB spectrum, superscript fake represents that the image is the output item obtained by the spectrum conversion module, RGB represents the image

Consists of three channels of red, green and blue.

3. The method of claim 2, wherein the method comprises: the step 2 comprises the following steps of,

step 2-1: building a depth estimation network, including building a depth estimation network M;

step 2-2: inputting a single infrared right image I_A(p) generating left and right parallaxes d^lAnd d^r(ii) a Wherein, the left parallax d^lCorresponding to RGB left image I in multi-spectral image_B(p) parallax, right parallax d^rCorresponding input infrared right image I_A(p) parallax; l and r represent the left and right images, respectively.

4. The method of claim 3, wherein the method comprises: the step 3 comprises the following steps of,

step 3-2: designing spectral conversion loss; spectral conversion loss from L_GAnd L_DTwo parts, denoted as:

wherein the content of the first and second substances,

and

respectively, the countermeasure losses of the generator G and the discriminator D;

indicating a loss of consistency in the cycle,

representing a consistent reconstruction loss; lambda [ alpha ]_cyc、λ_rec、λ_gAnd λ_dAre respectively loss terms

And

the weight of (c); lambda denotes that the variable is a hyperparameter, cyc denotes that the variable is related to the cyclic uniform loss, rec denotes that the variable is related to the uniform reconstruction loss, adv denotes that the variable is related to the countermeasure loss, and G and D denote that the variable is related to the countermeasure loss of the generator G and the countermeasure loss of the discriminator D, respectively.

5. The method of claim 4, wherein the method comprises: step 3-1 comprises the following steps:

step 3-1-1: converting a spectrum into an image

And

And

namely:

and

step 3-1-2: infrared right image I of multi-spectrum image_A(p) and RGB left image I_B(p) inputting into a generator of a spectral transformation network, obtaining a coherent reconstructed image using the generator as opposed to obtaining a multi-spectral image

And

namely:

and

6. the method of claim 5, wherein the method comprises: the step 4 comprises the following steps of,

step 4-1: warping the image;

step 4-2: designing a depth estimation penalty; utilizing depth estimation loss L in iteratively optimizing depth estimation network_MEN；

Depth estimation penalty L_MENThe form is as follows:

wherein alpha is_ap、α_dsAnd alpha_lrRepresenting an appearance reconstruction loss weight, a parallax smoothing loss weight and a left-right parallax consistency loss weight;

and

representing appearance reconstruction loss, parallax smoothing loss and left-right parallax consistency loss; ap, ds, and lr represent variables associated with appearance reconstruction loss, parallax smoothing loss, and left-right parallax coincidence loss, respectively.

7. The method of claim 6, wherein the method comprises: the step 4-1 comprises the steps of,

step 4-1-1: constructing a warp module, defining a warp operation ω, for p ═ x, y,

wherein, I^lAnd I^rRespectively representing a left image and a right image,

and

respectively representing a warped pseudo left image and a pseudo right image, and an integer x is in the range of [1, H ∈]Representing the pixel abscissa, the integer y ∈ [1, W ∈ ]]Representing the pixel ordinate, W and H being the width and length of the image, respectively;

step 4-1-2: will infrared right image I^r(p) infrared left image I^l(p) using left parallax respectivelyd^lAnd right parallax d^rCarrying out warping operation to obtain a pseudo infrared left image

And pseudo-infrared right image

8. The method of claim 7, wherein the method comprises: the step 5 comprises the steps of,

step 5-1: auxiliary loss module image warping; utilizing the warping operation omega constructed in the step 4-1 to obtain the infrared right image I_A(p), RGB left image I_B(p) using respective left parallaxes d^lAnd right parallax d^rPerforming warping operations, i.e.

Recalculating to obtain the infrared right synthetic image

And RGB left composite image

Step 5-2: designing auxiliary loss; the auxiliary loss is utilized when the frequency spectrum conversion network is optimized in the second iteration, and the auxiliary loss utilizes the original infrared right image I_A(p) and RGB left image I_B(p) warping the image converted by the frequency spectrum conversion network and the parallax obtained by the depth estimation network to obtain an image; loss of assistance

The design is as follows:

wherein the content of the first and second substances,

and

representing the weight of the loss term, alpha representing that the variable is a hyperparameter, and aux representing that the variable is related to the auxiliary loss;

and

respectively representing an infrared right composite image and an RGB left composite image, wherein N is the number of picture pixels.

9. The method of claim 8, wherein the method comprises: the step 6 comprises the steps of,

step 6-1: executing infrared image channel expansion in the data preprocessing process, and expanding the infrared right image into a three-channel image; copying a single-channel image matrix once on each channel of the three channels to realize the consistency of the infrared image and the RGB image;

step 6-2: preheating a model frame: exploiting spectral translation loss L in one iteration round_GAnd L_DTraining a spectrum conversion network, and obtaining a spectrum conversion image with parallax omitted by using the multi-spectrum image;

step 6-3: training a model framework: in an iteration round, the spectral conversion loss L is used_GAnd L_DTraining the frequency spectrum conversion network and estimating the loss L by using depth_MENTraining a depth estimation network with assistance loss

Retraining the spectrum conversion network;

step 6-4: and (5) testing a model framework.

10. The method of claim 9, wherein the method comprises: step 6-4 comprises the steps of,

6-4-1, performing channel expansion on the target infrared image, and inputting the target infrared image into a trained depth estimation network to obtain right parallax;

6-4-2, post-processing the right parallax through a formula

Converting the disparity into depth, wherein