CN115797163A - Target data cross-domain inversion augmentation method based on remote sensing image - Google Patents
Abstract
The invention provides a target data cross-domain inversion augmentation method based on remote sensing images, which comprises the following steps: step 1, multi-domain conversion of image data based on a cycle generative adversarial network (CycleGAN); step 2, multi-domain data augmentation based on contrastive learning; and step 3, image migration and synthesis to obtain a multi-domain augmented data set. Taking a generative adversarial network as its framework, the method introduces a CycleGAN-based multi-domain image-data conversion method and a contrastive-learning-based multi-domain data augmentation method, transfers visible-light remote sensing images into infrared and SAR images, and uses the synthesized data set as the matching reference map of an unmanned aerial vehicle, so that the vehicle can perform navigation and positioning tasks with multi-domain images from its multi-source sensors. The method performs well and improves the accuracy of the positioning-matching algorithm.
Description
Technical Field
The invention belongs to the technical field of image data set preparation, relates to target data, and particularly relates to a remote sensing image-based target data cross-domain inversion augmentation method.
Background
In recent years, unmanned aerial patrol vehicles have developed rapidly and are gradually being applied in many fields such as military reconnaissance and strike, surveying and mapping, fire rescue, and power-line inspection. Realizing intelligent visual navigation and positioning with the multi-source image sensors such vehicles carry has become a current research hotspot.
With the progress of the technology, the resolution of optical remote sensing images continues to improve. Information about a distant target and its surroundings can be obtained from optical remote sensing images, enabling tasks such as navigation, positioning, reconnaissance, and strike for unmanned aerial vehicles.
With the advance of artificial intelligence, society is entering an era of high-speed, intelligent big data, and intelligent scene matching has become one of the important approaches to navigation and positioning. A deep-learning-based intelligent matching model is obtained by training on a data set and by analyzing and mining the discriminative information in the data, so a large number of multi-domain heterogeneous images are required as data support during early model training, and the quality of the data set directly affects the capability of the resulting model. Most research focuses on the algorithm model itself but neglects that an intelligent algorithm needs large amounts of data as its driver to achieve better performance.
Due to the lack of image samples from other domains, navigation and positioning with multi-source imaging sensors is difficult, so developing a remote sensing image-based target data cross-domain inversion augmentation method is a task of great practical significance and considerable difficulty.
Disclosure of Invention
In view of the deficiencies of the prior art, the invention aims to provide a target data cross-domain inversion augmentation method based on remote sensing images, solving the technical problem that the positioning accuracy of the various imaging technologies used in the unmanned aerial vehicle navigation and positioning task still needs to be improved.
In order to solve the technical problems, the invention adopts the following technical scheme:
a target data cross-domain inversion augmentation method based on remote sensing images comprises the following steps:
step 1, multi-domain conversion of image data based on a cycle generative adversarial network (CycleGAN):
step 101, image generation based on the cycle generative adversarial network.
step 102, discrimination of the generated images based on the cycle generative adversarial network.
step 103, designing a total loss function between the generated image and the ground truth.
step 2, multi-domain data augmentation based on contrastive learning.
step 3, performing image migration and synthesis to obtain a multi-domain augmented data set:
step 301, a set of unpaired visible-light remote sensing image/infrared image data sets and a set of unpaired visible-light remote sensing image/SAR image data sets are given, together with a set of visible-light remote sensing images to be converted, which serves as the validation set;
step 302, the two groups of data given in step 301 are trained separately with the CycleGAN-based multi-domain image-data conversion method of step 1, and the visible-light remote sensing images are converted by model inference into a corresponding infrared image data set and a corresponding SAR image data set;
step 303, the two groups of data given in step 301 are trained separately with the contrastive-learning-based multi-domain data augmentation method of step 2, and the visible-light remote sensing images are converted by model inference into a corresponding infrared image data set and a corresponding SAR image data set;
step 304, the data sets obtained in steps 302 and 303 are fused to form a fused infrared image data set and a fused SAR image data set, which together constitute the multi-domain augmented data set.
step 4, similarity calculation and matching tests:
The multi-domain augmented data set obtained in step 3 is taken as the reference map, and image similarity is calculated with the PSNR and LPIPS algorithms.
The multi-domain augmented data set obtained in step 3 is taken as the reference map, and matching tests are performed with the ORB and LoFTR algorithms.
Compared with the prior art, the invention has the following technical effects:
(I) Taking a generative adversarial network as its framework, the method introduces a CycleGAN-based multi-domain image-data conversion method and a contrastive-learning-based multi-domain data augmentation method, transfers visible-light remote sensing images into infrared and SAR images, and uses the synthesized data set as the matching reference map of the unmanned aerial vehicle, so that the vehicle can perform navigation and positioning tasks with multi-domain images from its multi-source sensors. The method performs well and improves the accuracy of the positioning-matching algorithm.
(II) Because both the CycleGAN-based model training and the contrastive-learning-based multi-domain data augmentation work without paired training images, the method greatly reduces the difficulty of data preparation before training and improves image conversion efficiency.
(III) The method converts single-domain images into multi-domain images, easing the single-sensor limitation of the unmanned aerial vehicle in visual navigation; navigating and positioning with multi-domain images from multi-source sensors effectively improves the positioning accuracy of the vehicle.
(IV) The method has been validated by extensive data generation and experimental comparison. Compared with traditional matching algorithms and existing intelligent matching algorithms, the multi-domain data set generated by the method raises the probability of a successful image match, and its effectiveness is well verified.
(V) By augmenting the data set, the method prevents the overfitting problem that frequently occurs in deep-learning training, improves the accuracy and generalization ability of the model, and enriches the variety of heterogeneous data sets, thereby enabling visual navigation and positioning with multi-domain images.
Drawings
Fig. 1 is a schematic diagram of the cycle generative adversarial network architecture.
Fig. 2 is a framework diagram of the contrastive-learning generator.
Fig. 3 (a) and fig. 3 (b) are schematic diagrams of the visible-light remote sensing image/infrared image conversion effect.
Fig. 4 (a) and fig. 4 (b) are schematic diagrams of the visible-light remote sensing image/SAR image conversion effect.
Fig. 5 is a schematic diagram of the matching result between the original visible-light remote sensing image and the original infrared image.
Fig. 6 is a schematic diagram of the matching result between the converted infrared image and the original infrared image.
Fig. 7 is a schematic diagram of the matching result between the original visible-light remote sensing image and the original SAR image.
Fig. 8 is a schematic diagram of the matching result between the converted SAR image and the original SAR image.
The invention is explained in further detail below with reference to the embodiment.
Detailed Description
It is to be understood that, unless otherwise specified, all devices and algorithms described in the invention may be implemented with any materials and algorithms known in the art.
In the present invention, "/" means "and", and for example, "visible light remote sensing image/SAR image" means a visible light remote sensing image and a SAR image.
SAR stands for Synthetic Aperture Radar.
The invention discloses a target data cross-domain inversion augmentation method based on remote sensing images, and provides a data augmentation method extended from a single domain to multiple domains to address deep-learning-based multi-source scene-matching navigation and positioning. By augmenting the data set, the method prevents the overfitting that frequently occurs in deep-learning training and improves the accuracy and generalization ability of the model.
The invention considers the various imaging technologies involved in the unmanned aerial vehicle navigation and positioning task; to meet these requirements, the remote sensing image-based target data cross-domain inversion augmentation method is designed to enrich the variety of heterogeneous data sets, thereby realizing visual navigation and positioning with multi-domain images.
The following embodiment is given as an example of the invention. It should be noted that the invention is not limited to the following embodiment, and all equivalent variants based on the technical solutions of the invention fall within its scope of protection.
Embodiment:
the embodiment provides a target data cross-domain inversion augmentation method based on remote sensing images, which comprises the following steps:
Step 1, multi-domain conversion of image data based on the cycle generative adversarial network:
The architecture of the cycle generative adversarial network is shown in fig. 1; it comprises three parts: feature extraction (i.e. encoding), image domain conversion, and image reconstruction (i.e. decoding).
Step 101, image generation based on the cycle generative adversarial network:
This step learns the mapping between the two domains, A and B, of the given training samples. The method comprises two generator mappings, G: A → B and F: B → A: first, generator G converts a sample from domain A to domain B; then generator F converts a sample from domain B to domain A.
Step 10101, an initial convolution is applied to the original image; the spatial size is unchanged, but the number of feature channels is raised from 3 to 64.
Step 10102, two convolution layers extract abstract features of the input image, finally converting its dimensions from 256 × 256 × 64 to 64 × 64 × 256.
Step 10103, several residual modules convert the features from domain A to domain B.
Step 10104, finally, two deconvolution layers decode the features, completing the conversion from domain A to domain B.
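The shape bookkeeping of steps 10101–10102 can be sketched with a small helper. The kernel sizes, strides, and paddings below are assumptions (typical CycleGAN encoder choices), not values taken from the patent; they are chosen so the sizes match the text, 256 × 256 × 3 → 256 × 256 × 64 → 64 × 64 × 256:

```python
def conv2d_out_hw(h, w, kernel, stride, pad):
    """Spatial size after a convolution: floor((x + 2*pad - kernel)/stride) + 1."""
    return ((h + 2 * pad - kernel) // stride + 1,
            (w + 2 * pad - kernel) // stride + 1)

# Assumed layers: a 7x7 stride-1 stem, then two 3x3 stride-2 downsampling convs.
h, w, c = 256, 256, 3
h, w = conv2d_out_hw(h, w, kernel=7, stride=1, pad=3); c = 64   # initial conv, size unchanged
h, w = conv2d_out_hw(h, w, kernel=3, stride=2, pad=1); c = 128  # downsample 1
h, w = conv2d_out_hw(h, w, kernel=3, stride=2, pad=1); c = 256  # downsample 2
print(h, w, c)  # -> 64 64 256
```

Each stride-2 convolution halves the spatial size while the channel count doubles, which is how two such layers take 256 × 256 × 64 down to 64 × 64 × 256.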
Step 102, discrimination of the generated images based on the cycle generative adversarial network:
The discriminator is a classifier built from four convolution layers: the convolution layers extract the feature map of the input image from 3 to 512 channels, and a fully connected layer and an average pooling layer then output the confidence that the image is real.
Step 103, designing a total loss function between the generated image and the ground truth:
Step 10301, the adversarial loss L_GAN(G, D_B, A, B) is applied to the mapping function G: A → B and its discriminator D_B; the adversarial loss L_GAN(F, D_A, B, A) is applied to the mapping function F: B → A and its discriminator D_A:
L_GAN(G, D_B, A, B) = E_{b∼P_data(b)}[log D_B(b)] + E_{a∼P_data(a)}[log(1 − D_B(G(a)))]
In the formula:
A represents the A domain;
B represents the B domain;
D_A represents the discriminator corresponding to the A domain;
D_B represents the discriminator corresponding to the B domain;
a represents an image;
b represents a ground-truth image;
P_data(·) represents the probability density of the data.
In this step, G is used to generate images resembling the B domain, while D_B distinguishes the converted samples G(a) from the real samples b.
Step 10302, for each image a from domain A, a cycle-consistency loss L_cyc(G, F) is applied: one full cycle should restore the original image, i.e. F(G(a)) ≈ a, and likewise G(F(b)) ≈ b:
L_cyc(G, F) = E_{a∼P_data(a)}[||F(G(a)) − a||_1] + E_{b∼P_data(b)}[||G(F(b)) − b||_1]
In the formula:
|| · ||_1 represents the L1 norm.
The total loss is then:
L(G, F, D_A, D_B) = L_GAN(G, D_B, A, B) + L_GAN(F, D_A, B, A) + λ L_cyc(G, F)
In this embodiment, λ controls the relative importance of these two objectives.
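The two loss terms of step 103 can be sketched numerically with toy one-dimensional "generators" standing in for G and F; all names and values below are illustrative, not from the patent:

```python
import numpy as np

def gan_loss(d_real, d_fake):
    """Adversarial term: E[log D(b)] + E[log(1 - D(G(a)))], with D outputs in (0, 1)."""
    return np.mean(np.log(d_real)) + np.mean(np.log(1.0 - d_fake))

def cycle_loss(a, b, G, F):
    """Cycle-consistency term: E[||F(G(a)) - a||_1] + E[||G(F(b)) - b||_1]."""
    return np.mean(np.abs(F(G(a)) - a)) + np.mean(np.abs(G(F(b)) - b))

# Toy generators: G doubles, F halves, so F(G(a)) = a and the cycle loss is 0.
G = lambda x: 2.0 * x
F = lambda x: 0.5 * x
a = np.array([1.0, 2.0, 3.0])
b = np.array([2.0, 4.0, 6.0])

lam = 10.0                      # lambda, weighting cycle consistency vs. the adversarial terms
d_real = np.array([0.9, 0.8])   # discriminator scores on real images
d_fake = np.array([0.1, 0.2])   # discriminator scores on generated images
total = gan_loss(d_real, d_fake) + lam * cycle_loss(a, b, G, F)
print(cycle_loss(a, b, G, F))   # -> 0.0
print(round(total, 4))          # -> -0.3285
```

A perfect inverse pair (F ∘ G = identity) drives the cycle term to zero, which is exactly the property λ enforces during training.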
Step 2, multi-domain data augmentation based on contrastive learning:
The images are generated with an encoder–decoder generator whose input domain is A ⊂ R^{H×W×C} and whose output domain is B ⊂ R^{H×W×C}, given unpaired data sets a ∈ A and b ∈ B.
In the formula:
H represents the height of the image;
W represents the width of the image;
C represents the number of channels of the image;
A represents the unpaired data set corresponding to the input domain;
B represents the unpaired data set corresponding to the output domain;
a represents data in data set A;
b represents data in data set B.
The generator is divided into two parts, an encoder G_enc and a decoder G_dec, which together produce the output image b' = G(a) = G_dec(G_enc(a)). In this embodiment, the framework of the generator is shown in fig. 2. The encoder G_enc extracts high-dimensional feature vectors, and iterative training with a total contrastive loss function realizes the multi-domain data augmentation.
The total contrastive loss function is:
L = L_GAN(G, D, A, B) + λ_A · L_NCE(G, M, A) + λ_B · L_NCE(G, M, B)
In the formula:
G represents the generator;
D represents the discriminator;
A represents the unpaired data set corresponding to the input domain;
B represents the unpaired data set corresponding to the output domain;
M represents a multi-layer perceptron network.
In this embodiment, when λ_A = λ_B = 1, the joint training can be regarded as a lightweight version of the CycleGAN network.
In this embodiment, the adversarial loss function L_GAN, the mutual-information-maximizing loss function L_NCE, and the external loss function are calculated by computation methods commonly known in the art.
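The mutual-information-maximizing term is typically an InfoNCE-style patch loss (as in the CUT method, which the multi-layer perceptron M suggests): a query patch of the output should match the patch at the same location in the input (the positive) and not other patches (the negatives). The sketch below is a hedged illustration for a single query; all shapes, seeds, and values are assumptions:

```python
import numpy as np

def info_nce(query, positive, negatives, tau=0.07):
    """InfoNCE loss for one query feature: cross-entropy over cosine-style
    similarities, with the positive at index 0 and a temperature tau."""
    logits = np.concatenate([[query @ positive], negatives @ query]) / tau
    logits -= logits.max()                      # numerical stability
    p = np.exp(logits) / np.exp(logits).sum()
    return -np.log(p[0])

rng = np.random.default_rng(0)
q = rng.standard_normal(8); q /= np.linalg.norm(q)
pos = q.copy()                                  # perfectly aligned positive patch
neg = rng.standard_normal((16, 8))
neg /= np.linalg.norm(neg, axis=1, keepdims=True)
loss_aligned = info_nce(q, pos, neg)            # positive matches the query
loss_random = info_nce(q, neg[0], neg[1:])      # positive is an unrelated patch
print(f"aligned: {loss_aligned:.3f}  random: {loss_random:.3f}")
```

An aligned positive yields a much lower loss than a random one, which is the signal that pushes corresponding input/output patches toward high mutual information.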
Step 3, performing image migration and synthesis to obtain a multi-domain augmented data set:
Step 301, a set of unpaired visible-light remote sensing image/infrared image data sets and a set of unpaired visible-light remote sensing image/SAR image data sets are given, together with a set of visible-light remote sensing images to be converted, which serves as the validation set;
Step 302, the two groups of data given in step 301 are trained separately with the CycleGAN-based multi-domain image-data conversion method of step 1, and the visible-light remote sensing images are converted by model inference into a corresponding infrared image data set and a corresponding SAR image data set;
Step 303, the two groups of data given in step 301 are trained separately with the contrastive-learning-based multi-domain data augmentation method of step 2, and the visible-light remote sensing images are converted by model inference into a corresponding infrared image data set and a corresponding SAR image data set;
Step 304, the data sets obtained in steps 302 and 303 are fused to form a fused infrared image data set and a fused SAR image data set, which together constitute the multi-domain augmented data set.
In this embodiment, the visible-light remote sensing image conversion effect is shown in fig. 3 (a), fig. 3 (b), fig. 4 (a), and fig. 4 (b).
Step 4, similarity calculation and matching tests:
The multi-domain augmented data set obtained in step 3 is taken as the reference map, and image similarity is calculated with the PSNR (peak signal-to-noise ratio) algorithm and the LPIPS (learned perceptual image patch similarity) algorithm.
In this step, the generation quality for the visible/infrared and visible/SAR conversions is evaluated by similarity: a larger PSNR and a smaller LPIPS indicate a higher image similarity.
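PSNR can be stated concretely; a minimal sketch (the 4 × 4 test images and the 8-bit peak value are illustrative):

```python
import numpy as np

def psnr(ref, img, peak=255.0):
    """Peak signal-to-noise ratio in dB: 10 * log10(peak^2 / MSE)."""
    mse = np.mean((ref.astype(np.float64) - img.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")          # identical images
    return 10.0 * np.log10(peak ** 2 / mse)

ref = np.full((4, 4), 100.0)
noisy = ref + 10.0                   # constant error of 10 grey levels -> MSE = 100
print(round(psnr(ref, noisy), 2))    # -> 28.13
```

LPIPS, by contrast, requires a pretrained perceptual network and is not reproducible in a few lines; in practice it is computed with the published `lpips` package.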
In the present example, the evaluation results are shown in tables 1 and 2.
Table 1: comparison of the visible-light remote sensing image/infrared image conversion effect
Table 2: comparison of the visible-light remote sensing image/SAR image conversion effect
The multi-domain augmented data set obtained in step 3 is taken as the reference map, and matching tests are performed with the ORB (Oriented FAST and Rotated BRIEF) algorithm and the LoFTR (detector-free local feature matching with transformers) algorithm.
In this example, the test results are shown in figs. 5, 6, 7, and 8.
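ORB matching rests on minimum Hamming distance between binary descriptors. The following self-contained sketch matches synthetic 256-bit descriptors by brute force; the data are illustrative, and a real pipeline would extract descriptors with e.g. OpenCV's ORB and use its Hamming-distance matcher:

```python
import numpy as np

def hamming_match(desc_a, desc_b):
    """For each binary descriptor in desc_a (rows of uint8 bytes), return the
    index of its nearest neighbour in desc_b by Hamming distance, and that distance."""
    # XOR then bit-count over the byte axis gives the full distance matrix.
    xor = desc_a[:, None, :] ^ desc_b[None, :, :]
    dist = np.unpackbits(xor, axis=2).sum(axis=2)
    return dist.argmin(axis=1), dist.min(axis=1)

rng = np.random.default_rng(1)
b = rng.integers(0, 256, size=(5, 32), dtype=np.uint8)  # 5 ORB-like 256-bit descriptors
a = b[[2, 0]].copy()                                    # queries identical to b[2] and b[0]
idx, dist = hamming_match(a, b)
print(idx.tolist(), dist.tolist())  # -> [2, 0] [0, 0]
```

Exact copies match at distance 0; in practice a ratio test or distance threshold filters ambiguous matches before pose estimation.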
Simulation example:
the effects of the present invention are further illustrated by the following simulations:
1. Simulation conditions:
To verify the effectiveness of the invention, multi-domain augmentation was performed on several groups of data sets, obtaining the corresponding infrared and SAR image results. Experimental environment: a notebook computer running Ubuntu 18.04 with a 2.9 GHz Intel Xeon E5-2667 processor.
2. Simulation experiment:
A large amount of data was generated with the invention and compared experimentally. Compared with conventional matching algorithms and existing intelligent matching algorithms, the multi-domain data set generated by the method improves image matching accuracy and benefits the navigation and positioning of the unmanned aerial vehicle.
Fig. 5 shows the matching result between the original visible-light remote sensing image and the original infrared image; fig. 6, between the converted infrared image and the original infrared image; fig. 7, between the original visible-light remote sensing image and the original SAR image; and fig. 8, between the converted SAR image and the original SAR image. The figures show that the multi-domain data augmentation method alleviates problems such as mismatching and inaccurate navigation positioning, realizes navigation and positioning with multi-domain images from the multi-source sensors, and effectively improves the positioning accuracy of the aircraft.
Comparative example 1:
This comparative example gives a target data cross-domain inversion augmentation method whose steps are essentially the same as in the embodiment, except for step one. Specifically, in this comparative example:
Step one, the loss function is adjusted:
The loss function of the algorithm is the binary cross-entropy loss; that is, training uses the loss function that combines binary cross-entropy with the Sigmoid activation function.
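The combined binary cross-entropy / Sigmoid loss of this comparative example can be sketched in its numerically stable form (the sample values are illustrative):

```python
import numpy as np

def bce_with_logits(logits, targets):
    """Binary cross-entropy fused with the sigmoid, in the stable form
    max(x, 0) - x*y + log(1 + exp(-|x|)), averaged over samples."""
    x = np.asarray(logits, dtype=float)
    y = np.asarray(targets, dtype=float)
    return np.mean(np.maximum(x, 0) - x * y + np.log1p(np.exp(-np.abs(x))))

# A logit of 0 gives p = 0.5, so the loss is log(2) whatever the target.
print(round(bce_with_logits([0.0], [1.0]), 4))  # -> 0.6931
```

Fusing the sigmoid into the loss avoids computing log(sigmoid(x)) directly, which overflows for large negative logits.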
Comparative example 2:
This comparative example gives a target data cross-domain inversion augmentation method whose steps are essentially the same as in the embodiment, except for step one. Specifically, in this comparative example:
Step one, the loss function is adjusted:
The loss function of the algorithm is the Smooth L1 loss; that is, training uses a loss function that applies a quadratic around the point 0 to make the loss smoother.
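The Smooth L1 loss replaces the absolute error with a quadratic near 0, as described above; a minimal sketch (beta = 1 and the sample values are assumptions):

```python
import numpy as np

def smooth_l1(pred, target, beta=1.0):
    """Smooth L1: quadratic for |d| < beta (smooth around 0), linear elsewhere."""
    d = np.abs(np.asarray(pred, dtype=float) - np.asarray(target, dtype=float))
    loss = np.where(d < beta, 0.5 * d ** 2 / beta, d - 0.5 * beta)
    return loss.mean()

print(smooth_l1([0.5], [0.0]))  # -> 0.125  (quadratic branch: 0.5 * 0.5^2)
print(smooth_l1([3.0], [0.0]))  # -> 2.5    (linear branch: 3 - 0.5)
```

The two branches meet with matching value and slope at |d| = beta, which is what makes the loss smooth while staying robust to outliers.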
Comparing the embodiment with comparative examples 1 and 2, the network of the method converges faster and is more stable. The conversion effect after model training is shown in figs. 3 (a), 3 (b), 4 (a), and 4 (b), and the matching experiments show that the loss function used by the method yields a better conversion effect.
Comparative example 3:
This comparative example provides a target data cross-domain inversion augmentation method that uses a StyleGAN model for cross-domain inversion of visible-light remote sensing images.
Comparative example 4:
This comparative example provides a target data cross-domain inversion augmentation method that uses a Pix2Pix model for cross-domain inversion of visible-light remote sensing images.
Comparing the embodiment with comparative examples 3 and 4, the infrared and SAR images generated by the method show better modal consistency and are closer to real infrared and SAR images while keeping the image content details unchanged, whereas comparative examples 3 and 4 show partial distortion.
Claims (6)
1. A target data cross-domain inversion augmentation method based on remote sensing images is characterized by comprising the following steps:
step 1, multi-domain conversion of image data based on a cycle generative adversarial network:
step 101, image generation based on the cycle generative adversarial network;
step 102, discrimination of the generated images based on the cycle generative adversarial network;
step 103, designing a total loss function between the generated image and the ground truth;
step 2, multi-domain data augmentation based on contrastive learning;
step 3, performing image migration and synthesis to obtain a multi-domain augmented data set:
step 301, a set of unpaired visible-light remote sensing image/infrared image data sets and a set of unpaired visible-light remote sensing image/SAR image data sets are given, together with a set of visible-light remote sensing images to be converted, which serves as the validation set;
step 302, the two groups of data given in step 301 are trained separately with the CycleGAN-based multi-domain image-data conversion method of step 1, and the visible-light remote sensing images are converted by model inference into a corresponding infrared image data set and a corresponding SAR image data set;
step 303, the two groups of data given in step 301 are trained separately with the contrastive-learning-based multi-domain data augmentation method of step 2, and the visible-light remote sensing images are converted by model inference into a corresponding infrared image data set and a corresponding SAR image data set.
2. The remote sensing image-based target data cross-domain inversion augmentation method of claim 1, wherein the step 101 comprises the following steps:
step 10101, an initial convolution is applied to the original image; the spatial size is unchanged, but the number of feature channels is raised from 3 to 64;
step 10102, two convolution layers extract abstract features of the input image, finally converting its dimensions from 256 × 256 × 64 to 64 × 64 × 256;
step 10103, several residual modules convert the features from domain A to domain B.
3. The remote sensing image-based target data cross-domain inversion augmentation method of claim 2, wherein in step 102: the discriminator is a classifier built from four convolution layers; the convolution layers extract the feature map of the input image from 3 to 512 channels, and a fully connected layer and an average pooling layer then output the confidence that the image is real.
4. The remote sensing image-based target data cross-domain inversion augmentation method of claim 3, wherein step 103 comprises the following steps:
step 10301, the adversarial loss L_GAN(G, D_B, A, B) is applied to the mapping function G: A → B and its discriminator D_B, and the adversarial loss L_GAN(F, D_A, B, A) is applied to the mapping function F: B → A and its discriminator D_A:
L_GAN(G, D_B, A, B) = E_{b∼P_data(b)}[log D_B(b)] + E_{a∼P_data(a)}[log(1 − D_B(G(a)))]
in the formula: A represents the A domain; B represents the B domain; D_A represents the discriminator corresponding to the A domain; P_data(·) represents the probability density of the data;
step 10302, for each image a from domain A, a cycle-consistency loss L_cyc(G, F) is applied, one full cycle being required to restore the original image, i.e. F(G(a)) ≈ a:
L_cyc(G, F) = E_{a∼P_data(a)}[||F(G(a)) − a||_1] + E_{b∼P_data(b)}[||G(F(b)) − b||_1]
in the formula: || · ||_1 represents the L1 norm.
5. The remote sensing image-based target data cross-domain inversion augmentation method of claim 1, wherein step 2 comprises the following steps:
the images are generated with an encoder–decoder generator whose input domain is A ⊂ R^{H×W×C} and whose output domain is B ⊂ R^{H×W×C}, given unpaired data sets a ∈ A and b ∈ B;
in the formula:
H represents the height of the image;
W represents the width of the image;
C represents the number of channels of the image;
A represents the unpaired data set corresponding to the input domain;
B represents the unpaired data set corresponding to the output domain;
a represents data in data set A;
b represents data in data set B;
the generator is divided into two parts, an encoder G_enc and a decoder G_dec, which together produce the output image b' = G(a) = G_dec(G_enc(a)); the encoder G_enc extracts high-dimensional feature vectors, and iterative training with a total contrastive loss function realizes the multi-domain data augmentation;
the total contrastive loss function is:
L = L_GAN(G, D, A, B) + λ_A · L_NCE(G, M, A) + λ_B · L_NCE(G, M, B)
in the formula:
G represents the generator;
D represents the discriminator;
A represents the unpaired data set corresponding to the input domain;
B represents the unpaired data set corresponding to the output domain;
M represents a multi-layer perceptron network.
6. The remote sensing image-based target data cross-domain inversion augmentation method of claim 1, further comprising a step 4 of similarity calculation and matching tests:
the multi-domain augmented data set obtained in step 3 is taken as the reference map, and image similarity is calculated with the PSNR and LPIPS algorithms;
the multi-domain augmented data set obtained in step 3 is taken as the reference map, and matching tests are performed with the ORB and LoFTR algorithms.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310101406.8A CN115797163B (en) | 2023-02-13 | 2023-02-13 | Target data cross-domain inversion augmentation method based on remote sensing image |
Publications (2)
Publication Number | Publication Date |
---|---|
CN115797163A true CN115797163A (en) | 2023-03-14 |
CN115797163B CN115797163B (en) | 2023-04-28 |