CN109636742B - Modality conversion method between SAR images and visible light images based on a generative adversarial network - Google Patents
Modality conversion method between SAR images and visible light images based on a generative adversarial network
- Publication number
- CN109636742B CN109636742B CN201811405188.2A CN201811405188A CN109636742B CN 109636742 B CN109636742 B CN 109636742B CN 201811405188 A CN201811405188 A CN 201811405188A CN 109636742 B CN109636742 B CN 109636742B
- Authority
- CN
- China
- Prior art keywords
- image
- visible light
- sar
- input
- domain
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Links
- 238000000034 method Methods 0.000 title claims abstract description 54
- 238000006243 chemical reaction Methods 0.000 title claims abstract description 9
- 239000013598 vector Substances 0.000 claims abstract description 32
- 238000012549 training Methods 0.000 claims abstract description 17
- 230000008569 process Effects 0.000 claims description 16
- 238000011176 pooling Methods 0.000 claims description 14
- 238000010586 diagram Methods 0.000 claims description 13
- 238000000605 extraction Methods 0.000 claims description 8
- 239000000284 extract Substances 0.000 claims description 7
- 230000009471 action Effects 0.000 claims description 3
- 238000013528 artificial neural network Methods 0.000 claims description 3
- 230000006835 compression Effects 0.000 claims description 3
- 238000007906 compression Methods 0.000 claims description 3
- 238000005070 sampling Methods 0.000 claims description 2
- 238000002474 experimental method Methods 0.000 description 8
- 238000011160 research Methods 0.000 description 4
- 230000003213 activating effect Effects 0.000 description 2
- 238000013135 deep learning Methods 0.000 description 2
- 238000005516 engineering process Methods 0.000 description 2
- 230000006870 function Effects 0.000 description 2
- 230000006872 improvement Effects 0.000 description 2
- 230000010365 information processing Effects 0.000 description 2
- 230000010287 polarization Effects 0.000 description 2
- 238000012545 processing Methods 0.000 description 2
- 230000009102 absorption Effects 0.000 description 1
- 238000010521 absorption reaction Methods 0.000 description 1
- 230000005540 biological transmission Effects 0.000 description 1
- 238000001514 detection method Methods 0.000 description 1
- 238000003384 imaging method Methods 0.000 description 1
- 238000004519 manufacturing process Methods 0.000 description 1
- 238000013507 mapping Methods 0.000 description 1
- 238000013178 mathematical model Methods 0.000 description 1
- 230000007246 mechanism Effects 0.000 description 1
- 230000003287 optical effect Effects 0.000 description 1
- 238000003672 processing method Methods 0.000 description 1
- 230000005855 radiation Effects 0.000 description 1
- 238000012360 testing method Methods 0.000 description 1
- 238000013519 translation Methods 0.000 description 1
- 230000000007 visual effect Effects 0.000 description 1
Classifications
- G06T5/73—
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
- G06T5/92—
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/10—Image acquisition modality
- G06T2207/10032—Satellite or aerial image; Remote sensing
- G06T2207/10044—Radar image
Abstract
The invention relates to a modality conversion method for converting an SAR image into a visible light image based on a generative adversarial network, which comprises the following steps. First, a feature vector of a satellite image at the same position is extracted and used as prior information of the SAR image; the prior information and the SAR image are input into a generator together to generate a visible light image containing the SAR image target. Second, a discriminator in the generative adversarial network is trained, adopting the formula L_GAN(G_AB, D, A, B) = E_{b~B}[log D(b)] + E_{a~A}[log(1 - D(G_AB(a)))] as the discrimination loss. Finally, it is judged whether the trained adversarial network suffers from mode collapse, i.e., whether, for different input SAR images, the generator mostly outputs the same visible light image. Another generator is trained, and a generation loss is adopted to compare the feature similarity of the two images. The generation loss is L_GAN(G_AB, G_BA, A, B) = E_{a~A}[||G_BA(G_AB(a)) - a||_1]. When network training is finished, the curves of the discrimination loss and the generation loss tend to be stable: the discrimination loss no longer increases, and the generation loss no longer decreases.
Description
Technical Field
The invention belongs to the field of image translation in deep learning, and relates to a modality conversion method between SAR images and visible light images based on a generative adversarial network.
Background
Since its emergence in 1978, Synthetic Aperture Radar (SAR) has revolutionized radar technology. Its all-day, all-weather imaging capability is unmatched and opens broad application prospects, attracting great attention from the radar research community. Subsequent SAR-related research has driven a surge of technical innovation, with SAR systems of different wave bands, different polarizations and even different resolutions emerging continuously. This revolution has affected many military and civil fields.
Due to the improvement of resolution, the data volume of SAR grows rapidly, and manual information processing and application research (such as target identification) face many difficulties. First, over large-scale areas, ground-object detection and identification tasks based on SAR images are realized by manual interpretation; the workload greatly exceeds what manual operation can judge quickly, and the resulting subjective and interpretation errors are inevitable. Second, due to the special imaging mechanism of SAR, a target is very sensitive to the azimuth angle, and a large azimuth difference can produce a completely different SAR image; this further increases the visual difference between SAR images and optical images and raises the difficulty of image interpretation. Third, with the continuous improvement of SAR sensor resolution and the diversification of sensor modes, wave bands and polarization modes, the target information in SAR images grows explosively: a target changes from a point target on the original single-channel, single-polarization, low-resolution image into an extended target with rich detail and scattering characteristics. On the one hand this makes more detailed interpretation and identification of ground-object information possible; at the same time, the variety and instability of ground-object characteristics increase greatly. Traditional information processing and application methods therefore cannot meet the requirements of practical application; the related key technologies must be tackled to accelerate data processing and improve the accuracy of information extraction.
On this basis, the SAR image is converted into an image modality with visible-light-image characteristics using a method based on a generative adversarial network, whose main advantages include the following aspects. First, when the CycleGAN network within the generative adversarial framework generates a picture, the SAR image in the source image domain is used as input, semantic features of the satellite image at the same position are extracted as prior information of the SAR image, and this prior information is input into the generator together as a condition, so that the generated image not only has the visible light style but also contains target information that cannot be seen in the SAR image. Second, in order to prevent the generation network from suffering mode collapse (Mode Collapse) during training, two generators are trained: the first generator generates the required visible light image of the target from the SAR image, and the second generator translates the generated visible light image back into an SAR image. During training, the generation loss between the original SAR image and the regenerated SAR image is calculated, and the common "network memory" failure when training a generative adversarial network is avoided by continuously reducing this generation loss.
Disclosure of Invention
Technical problem to be solved
When the SAR image is subjected to image processing, on the one hand, traditional methods are sensitive to the model and place high demands on the image, while the resolution of the SAR image is relatively low and the image quality is blurry, so when the image does not fit the model, a satisfactory result cannot be obtained. On the other hand, it is difficult to establish a semi-empirical formula or mathematical model between the characteristic signals of the SAR image and the target, because characteristics of the target such as reflection, scattering, transmission, absorption and radiation are not sufficiently understood. Therefore, a method based on a generative adversarial network is provided to convert an SAR image into an image modality with visible-light-image characteristics, so that targets can be recognized and detected.
Technical scheme
The basic idea of the invention is as follows: a deep learning method, the Generative Adversarial Network (GAN), is adopted to train an unsupervised learning network, CycleGAN, so as to realize the conversion from a low-resolution SAR image to the visible light image modality. CycleGAN takes the visible light image as the target image domain and the SAR image as the source image domain, and learns the mapping from the source image domain to the target image domain, thereby realizing the conversion from source images to target images. The CycleGAN used here includes two generators and one discriminator D. The generator G1 converts from the source image domain to the target image domain (i.e., converts an SAR image into a visible-light-modality image), the generator G2 converts from the target image domain back to the source image domain (i.e., converts a visible-light-modality image into an SAR image), and the discriminator D judges whether an input picture is a real visible light image.
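The two-generator arrangement described above can be sketched at the domain level; this is an illustrative stub (the class and variable names are ours, not the patent's implementation):

```python
# Illustrative stub of the CycleGAN roles: G1 maps the source (SAR) domain to
# the target (visible light) domain, G2 maps back; a discriminator would
# score realism of the target-domain output.

class DomainMapper:
    """Stands in for a generator network; tracks only domain labels."""
    def __init__(self, source, target):
        self.source, self.target = source, target

    def __call__(self, domain):
        assert domain == self.source, "input must come from the source domain"
        return self.target

g1 = DomainMapper("SAR", "visible")   # generator G1: SAR -> visible light
g2 = DomainMapper("visible", "SAR")   # generator G2: visible light -> SAR

# Cycle consistency at the domain level: going A -> B -> A returns to SAR.
print(g2(g1("SAR")))  # SAR
```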
The method of the invention is characterized by comprising the following steps:
Step 1: acquiring prior information of the SAR image: a neural-network-based method extracts the feature vector of the satellite image at the same position as prior information of the SAR image, so that the target in the visible light image generated by the adversarial network is clearer.
(1) Feature extraction of satellite images: features are extracted by the convolutional layer and compressed by the pooling layer.
(a) Convolution operation extracts features: suppose the input feature map F_in of a convolutional layer has size W_in × H_in × C_in, where W_in is the width of the input feature map, H_in is its height, and C_in is its number of channels. The convolution parameters of the layer are K, S, P and Stride, where K denotes the number of convolution kernels, S denotes the width and height of each convolution kernel, P denotes the zero padding applied to the input feature map (for example, P = 1 denotes padding the input feature map with one circle of 0 around its border), and Stride denotes the sliding step of the convolution kernel on the input feature map. The output feature map F_out of the convolutional layer has size W_out × H_out × C_out, where W_out is the width of the output feature map, H_out is its height, and C_out is its number of channels, calculated as follows:
W_out = (W_in - S + 2P)/Stride + 1,  H_out = (H_in - S + 2P)/Stride + 1,  C_out = K  (1)
the size of the current convolution kernel is generally 3 × 3, P is 1, and Stride is 1, which can ensure that the sizes of the input feature map and the output feature map are consistent.
(b) Pooling layer compression characteristics: the maximum pooling layer is generally adopted, that is, when the feature map is downsampled, the number with the largest value in a 2 × 2 grid is selected and transmitted to the output feature map. In the pooling operation, the number of channels of the input and output feature layers is unchanged, and the size of the output feature map is half of the size of the input feature map.
The feature extraction of the satellite image can be completed through the convolution pooling operation.
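The size bookkeeping above can be checked with a small helper; this is a sketch of the stated formulas (the function names are ours, not the patent's):

```python
# Output-size rules from step 1: a convolution gives
# W_out = (W_in - S + 2P) // Stride + 1 (same for height), C_out = K,
# and a 2x2 max pooling halves width/height while keeping channels.

def conv_output_size(w_in, h_in, k, s, p, stride):
    w_out = (w_in - s + 2 * p) // stride + 1
    h_out = (h_in - s + 2 * p) // stride + 1
    return w_out, h_out, k  # channel count equals the number of kernels K

def pool_output_size(w_in, h_in, c_in):
    return w_in // 2, h_in // 2, c_in

# A 256x256x3 satellite image through a 3x3 convolution (K=64, P=1, Stride=1)
# keeps its spatial size, as the text states; pooling then halves it:
print(conv_output_size(256, 256, k=64, s=3, p=1, stride=1))  # (256, 256, 64)
print(pool_output_size(256, 256, 64))                        # (128, 128, 64)
```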
Step 2: the generator generates a visible light image: the designed generator is provided with two input interfaces, wherein the first interface receives the SAR image, the second interface receives the feature vector of the satellite image extracted in the first step, and then a visible light image is generated under the action of the encoder, the converter and the decoder.
(1) Encoder for encoding a video signal
The SAR image is input into an encoder, which extracts the feature information of the SAR image and expresses it as a feature vector. The encoder consists of three convolutional layers: one 7 × 7 convolutional layer with 32 filters and stride 1, one 3 × 3 convolutional layer with 64 filters and stride 2, and one 3 × 3 convolutional layer with 128 filters and stride 2. An SAR image of size [256, 256, 3] is input into the designed encoder; the convolution kernels of different sizes in the encoder move over the input image and extract features, yielding a feature vector of size [64, 256].
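As a sanity check, the encoder's spatial-size path can be traced; the padding values below are an assumption chosen so that the 7 × 7 layer preserves the size and each stride-2 layer exactly halves it, as common CycleGAN implementations do:

```python
# Spatial sizes through the encoder: a 7x7/stride-1 layer, then two
# 3x3/stride-2 layers shrink a 256x256 input to 64x64.

def conv_out(size, kernel, stride, pad):
    return (size - kernel + 2 * pad) // stride + 1

size, trace = 256, []
for kernel, stride, pad in [(7, 1, 3), (3, 2, 1), (3, 2, 1)]:
    size = conv_out(size, kernel, stride, pad)
    trace.append(size)
print(trace)  # [256, 128, 64]
```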
(2) Converter
The role of the converter is to combine the different neighboring features of the SAR image and then, based on these features, decide how to convert them into the feature vector of an image in the target domain (visible light image). The feature vector of the satellite image obtained in step 1 serves as the prior information of the SAR image, while the encoder provides the feature vector of the SAR image itself. Therefore, the two different feature vectors are first fused and then used as the feature input of the converter. The converter consists of several residual blocks, whose purpose is to ensure that the input information of an earlier network layer is applied directly to a later network layer, so that the deviation of the corresponding output (the feature vector of the visible light image) from the original input is reduced.
(3) Decoder
The feature vector of the image (visible light image) in the target domain obtained by the converter is taken as input, the decoder receives the input and restores low-level features from the feature vector, the decoding process and the encoding process are completely opposite, and the whole decoding process adopts a transposed convolution layer. Finally, the low-level features are converted to obtain an image in the target image domain, i.e., a visible light image.
Step 3: the discriminator discriminates the visible light image: the picture output by the generator is input into the trained discriminator D, which produces a score d. The closer the output is to an image in the target domain (i.e., a visible light image), the closer the value of d is to 1; otherwise, the closer the value of d is to 0. In this way the discriminator D judges whether the generated image is a visible light image. The judgment of D is completed by calculating the discrimination loss.
The discrimination loss is:
L_GAN(G_AB, D, A, B) = E_{b~B}[log D(b)] + E_{a~A}[log(1 - D(G_AB(a)))]    (2)
where A is the source image domain (SAR images), B is the target image domain (visible light images), a is an SAR image in the source image domain, b is a visible light image in the target image domain, G_AB is the generator from the source image domain A to the target image domain B, and D is the discriminator. The training process makes the discrimination loss L_GAN(G_AB, D, A, B) as small as possible.
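For illustration, the discrimination loss of equation (2) can be evaluated on toy scores; the helper below is ours (a sketch with simple per-sample averaging), not the patent's code:

```python
# Toy evaluation of equation (2):
# L_GAN = E_b[log D(b)] + E_a[log(1 - D(G_AB(a)))].
import math

def discrimination_loss(real_scores, fake_scores):
    """real_scores: D(b) on real visible images; fake_scores: D(G_AB(a))."""
    real_term = sum(math.log(s) for s in real_scores) / len(real_scores)
    fake_term = sum(math.log(1 - s) for s in fake_scores) / len(fake_scores)
    return real_term + fake_term

# A confident, correct discriminator keeps the loss near 0 (its maximum);
# an undecided one (all scores 0.5) sits at 2*log(0.5), about -1.386.
good = discrimination_loss([0.99, 0.98], [0.02, 0.01])
undecided = discrimination_loss([0.5, 0.5], [0.5, 0.5])
print(good > undecided)  # True
```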
Step 4: verifying the feature similarity of the generated pictures: since the discriminator D can only judge whether the picture generated by the generator is in the visible light image style, it cannot discriminate the target features in the picture well. Mode Collapse, in which the generator in effect memorizes and keeps emitting the same output, must be prevented. Another generator is therefore trained at the same time to convert the visible light image generated by the generator of step 2 back into an SAR image; the network architectures of the two generators are exactly the same. The feature similarity of the generated pictures is verified by calculating the generation loss.
The generation loss is:
L_GAN(G_AB, G_BA, A, B) = E_{a~A}[||G_BA(G_AB(a)) - a||_1]    (3)
the generation loss is the Euclidean distance (the Euclidean distance refers to the real distance between two points in m-dimensional space) of two SAR images, wherein A is a source image domain (SAR image), B is a target image domain (visible light image), and G isABFor the generator from the source image domain A to the target image domain B, GBAFrom the target image domain B to the source image domain a. a is the SAR image in the source image domain, GAB(GBA(a) Is the generated SAR image. In the training process, L is requiredGAN(GAB,GBAA, B) are as small as possible.
Advantageous effects
The invention provides a modality conversion method for converting an SAR image into a visible light image based on a generative adversarial network. The method effectively solves the problem that traditional model-based methods cannot effectively detect and identify targets in SAR images because the resolution of SAR images is relatively low and the image quality is blurry. The method not only retains the advantages of SAR images but also makes effective use of existing image processing methods, reduces the limitations caused by SAR image quality problems, has low research cost and considerable research value, and plays an important role in national economic and military fields.
Drawings
FIG. 1: general framework diagram of the inventive method.
Fig. 2 (a): a network structure diagram of a generator in a countermeasure network is generated.
Fig. 2 (b): a network structure diagram of discriminators in the countermeasure network is generated.
Detailed Description
The invention will now be further described with reference to the examples, figure 1, and figures 2(a) and 2(b):
the hardware environment tested here was: GPU: intel to strong series, memory: 8G, hard disk: 500G mechanical hard disk, independent display card: NVIDIA GeForce GTX 1080Ti, 11G; the system environment is Ubuntu 16.0.4; the software environment was python3.6, Tensorflow-GPU. The experiment of actually measured data is performed aiming at a mode conversion method of an SAR image and a visible light image. Firstly, obtaining an SAR image (about 5000 images with the image size of 256 x 256) aerial photographed in the flying process of a military aircraft, then obtaining satellite images (5000 images with the image size of 256 x 256) at the same position by combining with satellites, inputting the satellite images into a network built by people, iteratively training the network for 200000 times, wherein the basic learning rate is 0.0002, the learning rate is changed every 100000 times, an Adam optimizer is adopted in the training process, and the model is saved every 10000 times in the training process. Through practical tests, the generated image not only has the property of a visible light image, but also can well show the target characteristics which are not clearly seen in the original SAR image.
The invention is implemented as follows:
Step 1: acquiring prior information of the SAR image: a neural-network-based method extracts the feature vector of the satellite image at the same position as prior information of the aerial SAR image, so that the target in the visible light image generated by the adversarial network is clearer.
(1) Feature extraction of satellite images: features are extracted by the convolutional layer and compressed by the pooling layer.
(a) Convolution operation extracts features: in our design, the example input feature map (i.e., the satellite image) F_in of the convolutional layer has size 256 × 256 × 3, where 256 is the width of the input feature map, 256 is its height, and 3 is its number of channels. The convolution parameters of the convolutional layer are K, S, P and Stride, where K denotes the number of convolution kernels, S denotes the width and height of each convolution kernel, P denotes the zero padding applied to the input feature map (for example, P = 1 denotes padding the input feature map with one circle of 0 around its border), and Stride denotes the sliding step of the convolution kernel on the input feature map. In the example we use a convolutional layer with convolution parameters K = 64, S = 3, P = 1 and Stride = 1. The output feature map F_out of the convolutional layer then has size 256 × 256 × 64, where 256 is the width of the output feature map, 256 is its height, and 64 is its number of channels, calculated as follows:
W_out = (W_in - S + 2P)/Stride + 1,  H_out = (H_in - S + 2P)/Stride + 1,  C_out = K  (4)
where W_in, H_in and C_in are the parameters of the input feature map, and W_out, H_out and C_out are the parameters of the output feature map obtained after the convolution of each convolutional layer.
(b) Pooling layer compresses features: a max pooling layer is adopted to pool the output feature map obtained after each convolutional layer, i.e., when the feature map is downsampled, the largest value in each 2 × 2 grid is selected and passed to the output feature map. In the pooling operation, the number of channels of the input and output feature layers is unchanged, and the size of the output feature map is half the size of the input feature map. In the experiment we pooled only after the first, third, and fourth convolutional layers.
Through the convolution pooling operation, the feature extraction of the image is completed, and the feature vector of the satellite image at the same position is obtained; the size of this vector is [256, 64].
Step 2: training the first generator to generate a visible light image: the generator we designed has two input interfaces. The first interface receives the SAR image, whose size in the experiment is 256 × 256 × 3; the second interface receives the feature vector of the satellite image extracted in step 1. A visible light image is then generated through the actions of the encoder, converter and decoder.
(1) Encoder for encoding a video signal
The SAR image is input into the encoder, which extracts the feature information of the SAR image and expresses it as a feature vector. The encoder consists of three convolutional layers: one 7 × 7 convolutional layer with 32 filters and stride 1, one 3 × 3 convolutional layer with 64 filters and stride 2, and one 3 × 3 convolutional layer with 128 filters and stride 2. In the experiment, the output scale of the first convolution module was 256 × 64, that of the second convolution module was 256 × 128, and that of the third convolution module was 64 × 256. That is, we input an SAR image of size [256, 256, 3] into the designed encoder; the convolution kernels of different sizes in the encoder move over the input image and extract features, finally yielding a feature vector of size [64, 256].
(2) Converter
The role of the converter is to combine the different neighboring features of the SAR image and then, based on these features, decide how to convert them into the feature vector of an image in the target domain (visible light image). Since the prior information of the SAR image, namely the feature vector of the satellite image, was obtained in step 1, and the encoder provides the feature vector of the SAR image, the two different feature vectors are first fused and then used as the feature input of the converter. The converter consists of several residual blocks, whose purpose is to ensure that the input information of an earlier network layer is applied directly to a later network layer, so that the deviation of the corresponding output (the feature vector of the visible light image) from the original input is reduced. We used 9 residual blocks in the experiment, each consisting of two 3 × 3 convolutional layers with 256 filters and stride 1, and the output scale of the 9th residual block was 64 × 256.
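The residual connection that motivates these blocks can be illustrated in miniature; the `transform` placeholder stands in for the block's two 3 × 3 convolutional layers (a sketch, not the network code):

```python
# Miniature residual connection: the block output is input + transform(input),
# so earlier-layer information flows directly to later layers.

def residual_block(x, transform):
    return [xi + ti for xi, ti in zip(x, transform(x))]

# With a transform that outputs zeros (an untrained, identity-like state),
# the block passes its input through unchanged:
out = residual_block([1.0, 2.0, 3.0], lambda v: [0.0] * len(v))
print(out)  # [1.0, 2.0, 3.0]
```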
(3) Decoder
The feature vector of the image in the target domain (visible light image) obtained by the converter is taken as input; the decoder receives it and restores low-level features from the feature vector. The decoding process is the exact opposite of the encoding process, and the whole decoding process uses transposed convolutional layers. Finally, the low-level features are converted to obtain an image in the target image domain, i.e., a visible light image. In the experiment, three deconvolution modules are defined; each deconvolution module receives the output of the previous module as input, and the first deconvolution module receives the output of the 9th residual block in the converter as input. The first deconvolution module consists of a 3 × 3 convolutional layer with 128 filters and stride 2, with an output scale of 128 × 128; the second deconvolution module consists of a 3 × 3 convolutional layer with 64 filters and stride 2, with an output scale of 256 × 64; the third deconvolution module consists of a 7 × 7 convolutional layer with 3 filters and stride 1, with an output scale of 256 × 3. Finally, activation through the tanh function gives the generated output.
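The decoder's upsampling path can be checked against the standard transposed-convolution size formula, W_out = (W_in − 1)·Stride − 2P + S + output_padding; the padding and output_padding values below are assumptions chosen to reproduce the stated output scales:

```python
# Spatial sizes through the three deconvolution modules: 64 -> 128 -> 256 -> 256.

def deconv_out(size, kernel, stride, pad, out_pad=0):
    return (size - 1) * stride - 2 * pad + kernel + out_pad

s1 = deconv_out(64, kernel=3, stride=2, pad=1, out_pad=1)   # 64 -> 128
s2 = deconv_out(s1, kernel=3, stride=2, pad=1, out_pad=1)   # 128 -> 256
s3 = deconv_out(s2, kernel=7, stride=1, pad=3)              # 256 -> 256
print(s1, s2, s3)  # 128 256 256
```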
Step 3: training the discriminator to discriminate the visible light image: the picture output by the generator is input into the trained discriminator D, which produces a score d. The closer the output is to the target-domain image (i.e., a visible light image), the closer the value of d is to 1; otherwise, the closer the value of d is to 0. In this way the discriminator D judges whether the generated image is a visible light image. The judgment of D is completed by calculating the discrimination loss.
The discrimination loss is:
L_GAN(G_AB, D, A, B) = E_{b~B}[log D(b)] + E_{a~A}[log(1 - D(G_AB(a)))]    (5)
where A is the source image domain (SAR images), B is the target image domain (visible light images), a is an SAR image in the source image domain, b is a visible light image in the target image domain, G_AB is the generator from the source image domain A to the target image domain B, and D is the discriminator. The training process makes the discrimination loss L_GAN(G_AB, D, A, B) as small as possible.
In the experiment, 5 convolution modules are designed for the discriminator, and the last convolution module is followed by a sigmoid layer to constrain the output to the range 0 to 1. The first convolution module consists of a 4 × 4 convolutional layer with 64 filters and stride 2, with an output scale of 128 × 64; the second convolution module consists of a 4 × 4 convolutional layer with 128 filters and stride 2, with an output scale of 64 × 128; the third convolution module consists of a 4 × 4 convolutional layer with 256 filters and stride 2, with an output scale of 32 × 256; the fourth convolution module consists of a 4 × 4 convolutional layer with 512 filters and stride 1, with an output scale of 32 × 512; the fifth convolution module consists of a 4 × 4 convolutional layer with 1 filter and stride 1, with an output scale of 32 × 1. The output of the fifth convolution module is input into the sigmoid layer, and the sigmoid activation gives the final output.
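The stated discriminator output scales (128 → 64 → 32 → 32 → 32) are consistent with a "same"-style padding convention, where the output spatial size is ceil(input/stride); a quick check under that assumption, plus the sigmoid squashing:

```python
# Spatial sizes through the five 4x4 convolution modules under 'same'
# padding, followed by the sigmoid that maps scores into (0, 1).
import math

def conv_out_same(size, stride):
    """'same' padding: output spatial size is ceil(input / stride)."""
    return math.ceil(size / stride)

size, sizes = 256, []
for stride in [2, 2, 2, 1, 1]:  # strides of the five convolution modules
    size = conv_out_same(size, stride)
    sizes.append(size)
print(sizes)  # [128, 64, 32, 32, 32] -> matches the stated output scales

def sigmoid(x):
    """Squashes the final convolution output into the (0, 1) score range."""
    return 1.0 / (1.0 + math.exp(-x))

print(round(sigmoid(4.0), 3), round(sigmoid(-4.0), 3))  # 0.982 0.018
```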
Step 4: training the second generator and verifying the feature similarity of the generated pictures: since the discriminator D can only judge whether the picture generated by the generator is in the visible light style, it cannot discriminate the target features in the picture well. Mode Collapse, in which the generator in effect memorizes its output, must be prevented. Another generator is therefore trained at the same time to convert the visible light image generated by the generator of step 2 back into an SAR image. The feature similarity of the generated pictures is verified by calculating the generation loss.
The generation loss is:
L_GAN(G_AB, G_BA, A, B) = E_{a~A}[||G_BA(G_AB(a)) - a||_1]    (6)
the generation loss is the Euclidean distance (the Euclidean distance refers to the real distance between two points in m-dimensional space) of two SAR images, wherein A is a source image domain (SAR image), B is a target image domain (visible light image), and G isABFor the generator from the source image domain A to the target image domain B, GBAFrom the target image domain B to the source image domain a. a is the original SAR image, GBA(GAB(a) Is the generated SAR image. In the training process, L is requiredGAN(GAB,GBAA, B) are as small as possible.
In the experiment, the network architectures of the two generators we designed are exactly the same (see step 2 for details). During the experiment, we recorded three logs of generation losses. The first is the generation loss of the visible light image generated from the SAR image, represented by L_GAN(G_AB, A, B) = E_{a~A}[||G_AB(a) - a||_1]; the second is the generation loss of the SAR image reconstructed from the visible light image, represented by L_GAN(G_AB, G_BA, A, B) = E_{a~A}[||G_BA(G_AB(a)) - G_AB(a)||_1]; the third is the generation loss of the whole generator, represented by L_GAN(G_AB, G_BA, A, B) = E_{a~A}[||G_BA(G_AB(a)) - a||_1]. Here A is the source image domain (SAR images), B is the target image domain (visible light images), G_AB is the generator from the source image domain A to the target image domain B, and G_BA is the generator from the target image domain B to the source image domain A; a is the original SAR image, G_AB(a) is the generated visible light image, and G_BA(G_AB(a)) is the regenerated SAR image.
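The three logged losses can be sketched on toy flattened images; the `l1` helper and the toy pixel values are ours, not from the patent:

```python
# Toy versions of the three generation-loss logs described above.

def l1(x, y):
    return sum(abs(p - q) for p, q in zip(x, y)) / len(x)

a = [0.2, 0.4, 0.6]            # original SAR image a
g_ab_a = [0.3, 0.5, 0.7]       # G_AB(a): generated visible light image
g_ba_g_ab_a = [0.2, 0.4, 0.6]  # G_BA(G_AB(a)): regenerated SAR image

loss_visible = l1(g_ab_a, a)                 # first log: ||G_AB(a) - a||_1
loss_reconstruct = l1(g_ba_g_ab_a, g_ab_a)   # second log: ||G_BA(G_AB(a)) - G_AB(a)||_1
loss_cycle = l1(g_ba_g_ab_a, a)              # third log: ||G_BA(G_AB(a)) - a||_1
print(loss_cycle)  # 0.0 for a perfect cycle
```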
Claims (10)
1. A modality conversion method between SAR images and visible light images based on a generative adversarial network, characterized by comprising the following steps:
step 1: acquiring prior information of the SAR image:
extracting a feature vector of the satellite image at the same position as prior information of the SAR image based on a neural network method, so that the target in the visible light image generated by the adversarial network is clearer;
step 2: the generator generates a visible light image:
the designed generator is provided with two input interfaces: the first interface receives the SAR image, and the second interface receives the feature vector of the satellite image extracted in step 1; a visible light image is then generated under the action of the encoder, the converter and the decoder;
step 3: the discriminator discriminates the visible light image:
inputting the picture output by the generator into a trained discriminator D, which produces a score d; the closer the output is to an image in the target domain, the closer the value of d is to 1; otherwise, the closer the value of d is to 0; the discriminator D judges whether the generated image is a visible light image; the judgment of the discriminator D is completed by calculating the discrimination loss;
step 4: verifying the feature similarity of the generated pictures: the discriminator D can only judge whether the picture generated by the generator is in the visible light image style, and cannot well discriminate the target features in the picture; in order to prevent mode collapse of the model, i.e., the generator developing a memory, another generator is trained simultaneously to convert the visible light image generated by the generator in step 2 back into a SAR image, the network architectures of the two generators being completely the same; the feature similarity of the generated pictures is verified by calculating the generation loss.
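The four claimed steps can be sketched as one training iteration. This is a hedged outline under our own naming (`extract_prior`, `train_step` and the placeholder callables are assumptions, not the patent's code), showing only the data flow between prior extraction, generation, discrimination, and cycle reconstruction:

```python
import numpy as np

def train_step(a, G_AB, G_BA, D, extract_prior):
    """Sketch of the four claimed steps for one SAR batch `a`;
    every callable is a placeholder for one of the patent's trained networks."""
    prior = extract_prior(a)                  # step 1: prior feature vector from the satellite image
    b_fake = G_AB(a, prior)                   # step 2: generator produces a visible-light image
    d_score = float(D(b_fake))                # step 3: discriminator score in (0, 1)
    disc_loss = -np.log(max(d_score, 1e-8))   # generator is trained to push D(b_fake) toward 1
    a_rec = G_BA(b_fake)                      # step 4: second generator maps back to a SAR image
    gen_loss = float(np.mean(np.abs(a_rec - a)))  # generation (cycle) loss, Eq. (3)
    return disc_loss, gen_loss

# Toy stand-ins, just to exercise the control flow:
rng = np.random.default_rng(1)
a = rng.random((2, 64, 64, 1))
disc_loss, gen_loss = train_step(
    a,
    G_AB=lambda x, p: x,        # identity "generators" for illustration only
    G_BA=lambda x: x,
    D=lambda x: 0.5,            # undecided discriminator
    extract_prior=lambda x: None,
)
```

With identity generators the cycle loss is exactly zero, and an undecided discriminator yields disc_loss = -log(0.5) = log 2.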
2. The method of claim 1, wherein the method comprises the following steps: the feature extraction of the satellite image is to extract features through a convolution layer and compress the features through a pooling layer;
convolution operation for feature extraction: suppose the input feature map F_in of a convolution layer has parameters W_in × H_in × C_in, where W_in is the width of the input feature map, H_in is the height of the input feature map, and C_in is the number of channels of the input feature map; the convolution parameters of the layer are K, S, P and Stride, where K is the number of convolution kernels, S is the width and height of each convolution kernel, P denotes the zero-padding applied to the input feature map (P = 1 means padding one ring of zeros around the input feature map), and Stride is the sliding step of the convolution kernel over the input feature map; the output feature map F_out of the convolution layer has parameters W_out × H_out × C_out, where W_out is the width of the output feature map, H_out is the height of the output feature map, and C_out is the number of channels of the output feature map, calculated as follows: W_out = (W_in − S + 2P)/Stride + 1, H_out = (H_in − S + 2P)/Stride + 1, C_out = K;
pooling layer for feature compression: a maximum pooling layer is adopted, i.e., when the feature map is down-sampled, the maximum value within each 2 × 2 grid is passed to the output feature map.
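The output-shape arithmetic of claim 2 can be written out directly. A small sketch of the standard convolution and 2 × 2 max-pooling size formulas (function names are ours):

```python
def conv_out_size(w_in, h_in, c_in, k, s, p, stride):
    """Output shape of a convolution layer per the claimed parameters:
    W_out = (W_in - S + 2P)/Stride + 1 (likewise for height), C_out = K."""
    w_out = (w_in - s + 2 * p) // stride + 1
    h_out = (h_in - s + 2 * p) // stride + 1
    return w_out, h_out, k

def maxpool_out_size(w_in, h_in, c_in):
    """2 x 2 max pooling halves width and height; channel count is unchanged."""
    return w_in // 2, h_in // 2, c_in

# Claim 3's setting (3 x 3 kernel, P = 1, Stride = 1) preserves spatial size:
same = conv_out_size(256, 256, 3, 32, 3, 1, 1)     # -> (256, 256, 32)
# Claim 4's pooling halves spatial size with channels unchanged:
halved = maxpool_out_size(256, 256, 32)            # -> (128, 128, 32)
```

This confirms the dependent claims: a 3 × 3 / P = 1 / Stride = 1 convolution keeps input and output sizes consistent, and pooling halves the feature map.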
3. The method of claim 2, wherein the method comprises the following steps: the size of the convolution kernel is 3 × 3, P is 1, and Stride is 1, so that the sizes of the input feature map and the output feature map are consistent.
4. The method of claim 2, wherein the method comprises the following steps: in the pooling operation, the number of channels of the input and output feature layers is unchanged, and the size of the output feature map is half of the size of the input feature map.
5. The method of claim 1, wherein the method comprises the following steps: the SAR image is input into an encoder, which extracts the feature information of the SAR image and expresses it as a feature vector.
6. Mode conversion method based on SAR images and visible light images of a countermeasure generation network according to claim 1 or 5, characterized in that: the encoder consists of three convolution layers: one with 32 filters of size 7 × 7 and stride 1, one with 64 filters of size 3 × 3 and stride 2, and one with 128 filters of size 3 × 3 and stride 2; a SAR image of size [256, 256, 3] is input into the designed encoder, and the convolution kernels of different sizes move over the input image and extract features, yielding a feature vector of size [64, 64, 256].
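The spatial sizes through the claimed encoder can be traced with the convolution size formula. The paddings below are our assumption (the claim does not state them; we take the "same"-style values 3 and 1). Under this reading the spatial size comes out to 64 × 64, matching the claimed feature map, with a channel count of 128, the filter count of the last layer:

```python
def conv_out(w, kernel, stride, pad):
    # spatial output size of a square convolution layer
    return (w - kernel + 2 * pad) // stride + 1

# The three claimed encoder layers as (filters, kernel, stride, assumed padding).
layers = [(32, 7, 1, 3), (64, 3, 2, 1), (128, 3, 2, 1)]

w, c = 256, 3   # a [256, 256, 3] SAR image
for filters, kernel, stride, pad in layers:
    w, c = conv_out(w, kernel, stride, pad), filters
# 256 -> 256 -> 128 -> 64 spatially
```

The two stride-2 layers each halve the spatial resolution, which is what compresses the 256 × 256 input down to a 64 × 64 feature map.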
7. The method of claim 1, wherein the method comprises the following steps: the feature vector of the satellite image obtained in step 1 serves as the prior information of the SAR image, and the encoder yields the feature vector of the SAR image; the two different feature vectors are therefore fused first and used as the feature input of the decoder; the decoder consists of several residual blocks, which ensure that the input information of a preceding network layer acts directly on the following network layers, so that the deviation of the corresponding output from the original input is reduced.
8. The method of claim 1, wherein the method comprises the following steps: the feature vector of the image in the target domain obtained by the converter is used as input; the decoder receives this input and restores low-level features from the feature vector; the decoding process is the exact opposite of the encoding process, and the whole decoding process uses transposed convolution layers; finally, the low-level features are converted into an image in the target image domain, i.e. a visible light image.
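Since the decoding process mirrors the encoding process, its spatial sizes follow the transposed-convolution size formula. A sketch under our own assumptions (kernel 3, stride 2, padding 1, output padding 1 per layer; the claim does not fix these values): two stride-2 transposed convolutions restore a 64 × 64 feature map to a 256 × 256 visible-light image:

```python
def deconv_out(w, kernel, stride, pad, out_pad=0):
    """Spatial output size of a transposed-convolution layer
    (the inverse of the usual convolution size formula)."""
    return (w - 1) * stride - 2 * pad + kernel + out_pad

w = 64
w = deconv_out(w, kernel=3, stride=2, pad=1, out_pad=1)   # 64 -> 128
w = deconv_out(w, kernel=3, stride=2, pad=1, out_pad=1)   # 128 -> 256
```

Each transposed layer doubles the spatial resolution, undoing one stride-2 encoder layer.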
9. The method of claim 1, wherein the method comprises the following steps: the discrimination loss is:
L_GAN(G_AB, D, A, B) = −(E_{b~B}[log D(b)] + E_{a~A}[log(1 − D(G_AB(a)))])   (2)
wherein A is the source image domain, B is the target image domain, a is a SAR image in the source image domain, b is a visible light image in the target image domain, G_AB is the generator from the source image domain A to the target image domain B, and D is the discriminator; the training process makes the discrimination loss L_GAN(G_AB, D, A, B) as small as possible.
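Equation (2) can be evaluated directly from batches of discriminator scores. A minimal sketch (the function name and the example score values are ours):

```python
import numpy as np

def discrimination_loss(d_real, d_fake, eps=1e-12):
    """Eq. (2): L = -(E_{b~B}[log D(b)] + E_{a~A}[log(1 - D(G_AB(a)))]).
    `d_real` are scores D(b) on target-domain images, `d_fake` are scores
    D(G_AB(a)) on generated images; `eps` guards against log(0)."""
    d_real = np.asarray(d_real, dtype=float)
    d_fake = np.asarray(d_fake, dtype=float)
    return float(-(np.mean(np.log(d_real + eps))
                   + np.mean(np.log(1.0 - d_fake + eps))))

# A confident, correct discriminator (real -> ~1, fake -> ~0) gives a small loss;
# an undecided one (all scores 0.5) gives -2 * log(0.5) = 2 * log 2.
good = discrimination_loss(d_real=[0.99, 0.98], d_fake=[0.02, 0.01])
undecided = discrimination_loss(d_real=[0.5, 0.5], d_fake=[0.5, 0.5])
```

Minimizing this loss is what trains D to push real scores toward 1 and generated scores toward 0.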
10. The method of claim 1, wherein the method comprises the following steps: the generation loss is:
L_GAN(G_AB, G_BA, A, B) = E_{a~A}[||G_BA(G_AB(a)) − a||_1]   (3)
the generation loss is the Euclidean distance between the two SAR images, wherein A is the source image domain, B is the target image domain, G_AB is the generator from the source image domain A to the target image domain B, and G_BA is the generator from the target image domain B to the source image domain A; a is the SAR image in the source image domain, and G_BA(G_AB(a)) is the generated SAR image; in the training process, L_GAN(G_AB, G_BA, A, B) is required to be as small as possible.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811405188.2A CN109636742B (en) | 2018-11-23 | 2018-11-23 | Mode conversion method of SAR image and visible light image based on countermeasure generation network |
Publications (2)
Publication Number | Publication Date |
---|---|
CN109636742A CN109636742A (en) | 2019-04-16 |
CN109636742B true CN109636742B (en) | 2020-09-22 |
Family
ID=66069278
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201811405188.2A Active CN109636742B (en) | 2018-11-23 | 2018-11-23 | Mode conversion method of SAR image and visible light image based on countermeasure generation network |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109636742B (en) |
Families Citing this family (16)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110188667B (en) * | 2019-05-28 | 2020-10-30 | 复旦大学 | Face rectification method based on three-party confrontation generation network |
CN110197517B (en) * | 2019-06-11 | 2023-01-31 | 常熟理工学院 | SAR image coloring method based on multi-domain cycle consistency countermeasure generation network |
CN110210574B (en) * | 2019-06-13 | 2022-02-18 | 中国科学院自动化研究所 | Synthetic aperture radar image interpretation method, target identification device and equipment |
CN110472627B (en) * | 2019-07-02 | 2022-11-08 | 五邑大学 | End-to-end SAR image recognition method, device and storage medium |
CN110363163B (en) * | 2019-07-18 | 2021-07-13 | 电子科技大学 | SAR target image generation method with controllable azimuth angle |
WO2021016352A1 (en) * | 2019-07-22 | 2021-01-28 | Raytheon Company | Machine learned registration and multi-modal regression |
GB2595122B (en) * | 2019-08-13 | 2022-08-24 | Univ Of Hertfordshire Higher Education Corporation | Method and apparatus |
CN111047525A (en) * | 2019-11-18 | 2020-04-21 | 宁波大学 | Method for translating SAR remote sensing image into optical remote sensing image |
CN112330562B (en) * | 2020-11-09 | 2022-11-15 | 中国人民解放军海军航空大学 | Heterogeneous remote sensing image transformation method and system |
US11915401B2 (en) | 2020-12-09 | 2024-02-27 | Shenzhen Institutes Of Advanced Technology | Apriori guidance network for multitask medical image synthesis |
CN112668621B (en) * | 2020-12-22 | 2023-04-18 | 南京航空航天大学 | Image quality evaluation method and system based on cross-source image translation |
CN113554671A (en) * | 2021-06-23 | 2021-10-26 | 西安电子科技大学 | Method and device for converting SAR image into visible light image based on contour enhancement |
CN113361508B (en) * | 2021-08-11 | 2021-10-22 | 四川省人工智能研究院(宜宾) | Cross-view-angle geographic positioning method based on unmanned aerial vehicle-satellite |
CN114202679A (en) * | 2021-12-01 | 2022-03-18 | 昆明理工大学 | Automatic labeling method for heterogeneous remote sensing image based on GAN network |
CN115186814B (en) * | 2022-07-25 | 2024-02-13 | 南京慧尔视智能科技有限公司 | Training method, training device, electronic equipment and storage medium of countermeasure generation network |
CN117611644A (en) * | 2024-01-23 | 2024-02-27 | 南京航空航天大学 | Method, device, medium and equipment for converting visible light image into SAR image |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
KR101405435B1 (en) * | 2012-12-14 | 2014-06-11 | 한국항공우주연구원 | Method and apparatus for blending high resolution image |
CN105809194A (en) * | 2016-03-08 | 2016-07-27 | 华中师范大学 | Method for translating SAR image into optical image |
CN108510532A (en) * | 2018-03-30 | 2018-09-07 | 西安电子科技大学 | Optics and SAR image registration method based on depth convolution GAN |
CN108564606A (en) * | 2018-03-30 | 2018-09-21 | 西安电子科技大学 | Heterologous image block matching method based on image conversion |
CN108717698A (en) * | 2018-05-28 | 2018-10-30 | 深圳市唯特视科技有限公司 | A kind of high quality graphic generation method generating confrontation network based on depth convolution |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109636742B (en) | Mode conversion method of SAR image and visible light image based on countermeasure generation network | |
US11402494B2 (en) | Method and apparatus for end-to-end SAR image recognition, and storage medium | |
CN111145131B (en) | Infrared and visible light image fusion method based on multiscale generation type countermeasure network | |
WO2023050746A1 (en) | Method for enhancing sar image data for ship target detection | |
CN112083422A (en) | Single-voyage InSAR system end-to-end classification method based on multistage deep learning network | |
CN115236655B (en) | Landslide identification method, system, equipment and medium based on fully-polarized SAR | |
CN111784560A (en) | SAR and optical image bidirectional translation method for generating countermeasure network based on cascade residual errors | |
CN114241003B (en) | All-weather lightweight high-real-time sea surface ship detection and tracking method | |
CN112766223A (en) | Hyperspectral image target detection method based on sample mining and background reconstruction | |
CN113222824B (en) | Infrared image super-resolution and small target detection method | |
CN113408540B (en) | Synthetic aperture radar image overlap area extraction method and storage medium | |
Long et al. | Dual self-attention Swin transformer for hyperspectral image super-resolution | |
CN113111706B (en) | SAR target feature unwrapping and identifying method for azimuth continuous deletion | |
CN113050083A (en) | Ultra-wideband radar human body posture reconstruction method based on point cloud | |
Drees et al. | Multi-modal deep learning with sentinel-3 observations for the detection of oceanic internal waves | |
CN111126508A (en) | Hopc-based improved heterogeneous image matching method | |
CN116071664A (en) | SAR image ship detection method based on improved CenterNet network | |
CN110263777B (en) | Target detection method and system based on space-spectrum combination local preserving projection algorithm | |
CN113762271A (en) | SAR image semantic segmentation method and system based on irregular convolution kernel neural network model | |
CN114912499A (en) | Deep learning-based associated imaging method and system | |
CN115457120A (en) | Absolute position sensing method and system under GPS rejection condition | |
Zhang et al. | Structural similarity preserving GAN for infrared and visible image fusion | |
Li et al. | Transformer meets GAN: Cloud-free multispectral image reconstruction via multi-sensor data fusion in satellite images | |
CN115909045B (en) | Two-stage landslide map feature intelligent recognition method based on contrast learning | |
Hu et al. | Detection of Tea Leaf Blight in Low-Resolution UAV Remote Sensing Images |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||