CN115049841A - High-resolution SAR image ground-feature element extraction method based on deep unsupervised multi-step adversarial domain adaptation - Google Patents

High-resolution SAR image ground-feature element extraction method based on deep unsupervised multi-step adversarial domain adaptation

Info

Publication number
CN115049841A
CN115049841A
Authority
CN
China
Prior art keywords
domain
image
data
target domain
network
Prior art date
Legal status
Pending
Application number
CN202210664345.1A
Other languages
Chinese (zh)
Inventor
张雨
任仲乐
侯彪
焦李成
韩祥永
张锐
苏海波
李永强
Current Assignee
Xidian University
Original Assignee
Xidian University
Priority date
Filing date
Publication date
Application filed by Xidian University
Priority to CN202210664345.1A
Publication of CN115049841A

Classifications

    • G06V10/40 — Extraction of image or video features
    • G06N3/088 — Non-supervised learning, e.g. competitive learning
    • G06V10/20 — Image preprocessing
    • G06V10/26 — Segmentation of patterns in the image field; cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; detection of occlusion
    • G06V10/764 — Image or video recognition or understanding using pattern recognition or machine learning, using classification, e.g. of video objects
    • G06V10/82 — Image or video recognition or understanding using pattern recognition or machine learning, using neural networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Software Systems (AREA)
  • Computing Systems (AREA)
  • Medical Informatics (AREA)
  • Databases & Information Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

A high-resolution SAR image ground-feature element extraction method based on deep unsupervised multi-step adversarial domain adaptation first converts the source-domain image into a style similar to that of the target-domain image with an image translation model, thereby reducing the distribution difference between the source domain and the target domain. The style-converted source-domain image and the unlabeled target-domain image are then fed into an adaptation network: on the one hand, a deep network is trained to learn the features of the source and target domains and to perform semantic segmentation; on the other hand, a discriminator is trained to distinguish whether its input comes from the source domain or the target domain, and its feedback guides the deep network to align the feature distributions of the two domains. Finally, the trained model predicts the ground-object categories of the large target-domain scene, completing pixel-level ground-feature element extraction for single-polarization high-resolution SAR images. The method breaks through the bottleneck of poor model generalization caused by insufficient labeled samples and by the inconsistent distributions of source-domain and target-domain data, and improves the accuracy and performance of ground-object classification on target-domain SAR images.

Description

High-resolution SAR image ground-feature element extraction method based on deep unsupervised multi-step adversarial domain adaptation
Technical Field
The invention belongs to the technical field of intelligent interpretation of radar remote sensing images, and particularly relates to a high-resolution SAR image ground-feature element extraction method based on deep unsupervised multi-step adversarial domain adaptation.
Background
Synthetic Aperture Radar (SAR) is a high-resolution active microwave remote sensing imaging radar. It works in all weather and around the clock with a short imaging period, has strong penetration capability, is not affected by weather conditions such as cloud, rain and fog, can penetrate surface structures such as soil and foliage, and is widely applied in military and civil fields, for example environmental protection, disaster monitoring, ocean observation, resource protection, land cover mapping, precision agriculture, urban area detection and geographic mapping. Semantic segmentation of the high-resolution large-scene land-cover images acquired by satellites is a very challenging task: labeled data are scarce, and data characteristics differ because of different imaging parameters (sensor, frequency band, polarization, resolution or incidence angle). How to make full use of existing labeled data through transfer has therefore become a popular solution. Domain adaptation can overcome the differences between different SAR data sets and transfer knowledge from a source domain to a different but related target domain, which alleviates the problem of limited training samples in the target domain.
Traditional domain adaptation methods are mainly feature-based, instance-based or model-based. Feature-based methods aim to map the source-domain and target-domain samples into the same feature space with a mapping φ so that the samples are aligned in that space. Considering that some source-domain samples are similar to the target-domain samples, instance-based methods multiply the loss of every source-domain sample by a weight w_i during training, giving larger weights to samples that are more similar to the target domain. Model-based methods aim to find new parameters θ' so that, through parameter transfer, the model works better on the target domain.
Deep-learning domain adaptation mainly comprises discrepancy-based, adversarial-based and reconstruction-based methods. The present method is an adversarial domain adaptation method: since source-domain data and target-domain data with different distributions naturally exist in domain adaptation, the sample-generation process can be omitted and the target domain is used directly as the "generated" samples. The generator then degenerates into a feature extractor that keeps learning domain features so that the discriminator cannot tell the two domains apart. However, unsupervised adversarial domain adaptation still has shortcomings. The long-tailed distribution over categories and the differences in category distribution between data domains make it less robust. Adversarial adaptation has no theoretical guarantee in regression problems: the features are scattered over the whole space, and even if the discriminator is successfully fooled, there is no guarantee that source-domain and target-domain features with the same label are drawn together. In addition, although adversarial adaptation draws the distributions of different domains together in feature space and improves the transferability of features across domains, it may reduce the discriminability of the data features, making it harder to train a classifier on the fixed adversarially adapted features.
Disclosure of Invention
In order to overcome the defects of the prior art, the invention aims to provide a high-resolution SAR image ground-feature element extraction method based on deep unsupervised multi-step adversarial domain adaptation, which improves the classification accuracy on the target domain through an auxiliary task and addresses the problems of insufficient training samples and inconsistent distributions of training and test samples.
In order to achieve the above purpose, the invention adopts the following technical scheme:
A high-resolution SAR image ground-feature element extraction method based on deep unsupervised multi-step adversarial domain adaptation translates the source-domain image into the style of the target domain through a style-transfer upstream task, drawing the distributions of the source domain and the target domain closer; the translated source-domain data and the unlabeled target-domain data are fed into an adversarial adaptation network, a feature extractor is trained to extract and classify the features of the source and target domains, and a domain discrimination network is trained to distinguish whether the output of the feature extractor comes from the source domain or the target domain while encouraging the feature extractor to align the output distributions of the target domain and the source domain.
A high-resolution SAR image ground-feature element extraction method based on deep unsupervised multi-step adversarial domain adaptation comprises the following steps:
S1, performing data preprocessing on the source-domain image and the target-domain image, including 16-bit to 8-bit conversion of the SAR images, truncation, cropping, dataset splitting and data-format conversion;
S2, feeding the preprocessed source-domain image S and target-domain image T into an image translation network for style transfer to obtain translated source data S';
S3, initializing the segmentation network M of the downstream task and its optimizer SGD, and initializing the domain discrimination network D and its optimizer Adam; the domain discrimination network D is trained to distinguish whether the output of the feature extractor comes from the source domain or the target domain, while encouraging the feature extractor to align the output distributions of the target-domain and source-domain images, helping the feature extractor learn domain-invariant features;
S4, feeding the translated source data S', the corresponding labels Y_S and the target-domain image T into the segmentation network M to obtain segmentation outputs M(S') and M(T), and calculating the source-domain segmentation loss using the corresponding labels Y_S;
S5, inputting the output M(T) of the segmentation network M on the target domain into the domain discrimination network D, calculating the adversarial loss against the discrimination network D, multiplying it by the corresponding coefficient, adding it to the segmentation loss, and updating the segmentation network M and its optimizer SGD;
S6, feeding the segmentation-network outputs M(S') and M(T) into the domain discrimination network D respectively to calculate the domain classification loss, and updating the domain discrimination network D and its optimizer Adam;
S7, repeating S4 to S6 until the maximum number of training iterations is reached, obtaining the model parameters of the segmentation network M;
S8, feeding the target-domain data into the trained segmentation network M for classification, then performing label optimization with TTA testing or a trained CRF to obtain pixel-level classification results; each class is assigned a color to generate an RGB prediction map, which is compared with the ground-truth class labels, and the per-class evaluation indexes Precision, Recall and F1 score and the overall evaluation indexes OA, Kappa, MIoU and FWIoU are calculated.
The S1 specifically includes:
(1a) Storage conversion and truncation: the image is truncated/contrast-stretched, gray levels with low occurrence probability are discarded, and the gray-level range with high occurrence probability is retained. The gray-level distribution of the 16-bit SAR image is counted, the occurrence frequency of each gray level is accumulated in order of gray level, and when the cumulative distribution function reaches a threshold (Threshold), the remaining pixels are discarded: all pixels above the threshold gray level are set to the gray level at the threshold, then divided by that gray level and multiplied by 255 to be stored as 8-bit SAR data;
the linear stretching formula is:
gray_out = (gray_in − min_in) / (max_in − min_in) × (max_out − min_out) + min_out
where gray denotes the gray level; min_in and max_in denote respectively the minimum and maximum gray levels of the retained (truncated) range in the input format; min_out and max_out denote respectively the minimum and maximum gray levels of the output format; for SAR data, Threshold is generally set to 95% and min_in is set to zero;
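As an illustration of step (1a), the following Python sketch performs the cumulative-histogram truncation and 8-bit conversion described above; the function name and the NumPy-based implementation are illustrative assumptions, not part of the patent.

```python
import numpy as np

def sar_16bit_to_8bit(img16, threshold=0.95):
    """Truncate/contrast-stretch a 16-bit SAR image to 8 bits (sketch of step (1a))."""
    hist, _ = np.histogram(img16, bins=65536, range=(0, 65536))
    cdf = np.cumsum(hist) / img16.size
    # Gray level at which the cumulative distribution first reaches the threshold.
    max_in = max(int(np.searchsorted(cdf, threshold)), 1)
    clipped = np.minimum(img16.astype(np.float64), max_in)  # pixels above threshold set to it
    return (clipped / max_in * 255.0).astype(np.uint8)      # min_in is taken as zero
```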
(1b) The test data are resampled with a dilated (overlapping) sampling scheme, and edge predictions are ignored during stitching: the prediction result of an actually cropped image has size A, only the central region a is kept when stitching, and the percentage of the area of A occupied by a is r, so the overlap ratio between adjacent cropped images is
overlap = (A − a) / A = 1 − √r
The dilation boundary slidesize, i.e. the per-side margin between A and a, is set to 100;
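A minimal sketch of the dilated-sampling prediction and center-crop stitching of (1b) is given below; the tile size 512 and per-side margin 100 follow the embodiment, while the function signature, array layout and border handling are assumptions.

```python
import numpy as np

def predict_large_scene(predict_tile, image, tile=512, margin=100):
    """Predict a large scene tile by tile, keeping only each tile's central region."""
    h, w = image.shape[:2]             # assumes the scene is at least one tile in each dimension
    keep = tile - 2 * margin           # 512 - 2*100 = 312 center kept per tile
    out = np.zeros((h, w), dtype=np.int64)
    for y in range(0, h, keep):
        for x in range(0, w, keep):
            y0, x0 = min(y, h - tile), min(x, w - tile)   # clamp the tile inside the image
            pred = predict_tile(image[y0:y0 + tile, x0:x0 + tile])
            out[y0 + margin:y0 + margin + keep,
                x0 + margin:x0 + margin + keep] = pred[margin:margin + keep,
                                                       margin:margin + keep]
    # The outermost `margin` pixels of the scene are not written here; in practice the
    # scene would be padded first so that the border is also covered.
    return out
```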
(1c) Based on the AI development platform ModelArts and its self-developed framework MindSpore, image data in jpg, png and tif formats are converted into the MindRecord format, and the data are then read through the MindDataset interface. This data format has the following characteristics: unified storage and access of data; aggregated storage and efficient reading, which makes the data easy to manage and move during training; efficient data encoding and decoding operations that are transparent to the user; and flexible control of the partition size for data splitting, enabling distributed data processing.
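The MindRecord conversion and MindDataset reading of (1c) can be sketched as follows; the schema field names and file names are assumptions, and only the FileWriter / MindDataset interfaces themselves come from MindSpore.

```python
from mindspore.mindrecord import FileWriter
import mindspore.dataset as ds

def convert_to_mindrecord(pairs, out_file="sar_train.mindrecord"):
    """Write (image, label) patch pairs into a MindRecord file (sketch of step (1c))."""
    writer = FileWriter(file_name=out_file, shard_num=1)
    schema = {"image": {"type": "bytes"}, "label": {"type": "bytes"}}
    writer.add_schema(schema, "sar_segmentation")
    records = [{"image": img.tobytes(), "label": lbl.tobytes()} for img, lbl in pairs]
    writer.write_raw_data(records)
    writer.commit()

# Reading back through the MindDataset interface:
# dataset = ds.MindDataset("sar_train.mindrecord", columns_list=["image", "label"], shuffle=True)
```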
The S2 specifically includes:
(2a) The translation network uses the classical CycleGAN: bidirectional mapping generators G and F are established between the source-domain image S and the target-domain image T, and two discriminators D_S and D_T distinguish the source-domain images S from F(T) and the target-domain images T from G(S), respectively; the loss function consists of an adversarial loss and a cycle-consistency loss. In addition, let X_S denote the sample space of the source domain S and X_T the sample space of the target domain T;
(2b) Adversarial loss: it makes the distribution of the mapped data approach that of the target domain; generator G learns the mapping from the source-domain image S to the target-domain image T (G: S → T), and generator F learns the mapping from the target-domain image T to the source-domain image S (F: T → S);
the adversarial loss for S → T is:
l_GAN(G, D_T, S, T) = E_{t∼X_T}[log D_T(t)] + E_{s∼X_S}[log(1 − D_T(G(s)))]
where G(s) is the fake image, similar to the target domain T, generated by generator G, and D_T represents the probability that the input variable is a sample from the T space, aiming to distinguish the translated samples G(s) from the real samples t; the goal is to minimize over G and maximize over D_T;
the adversarial loss for T → S is:
l_GAN(F, D_S, S, T) = E_{s∼X_S}[log D_S(s)] + E_{t∼X_T}[log(1 − D_S(F(t)))]
where F(t) is the fake image, similar to the source domain S, generated by generator F, and D_S represents the probability that the input variable is a sample from the S space, aiming to distinguish the translated samples F(t) from the real samples s; the goal is to minimize over F and maximize over D_S;
(2c) Cycle-consistency loss: it ensures that the generators G and F of the two learned mappings do not contradict each other. While learning the two mappings, G(F(t)) should be as similar to t as possible and F(G(s)) as similar to s as possible, which prevents generator G from over-fitting the samples of the target-domain space T and over-changing the samples of the source-domain space S; the L1 loss is used:
l_cyc(G, F) = E_{s∼X_S}[||F(G(s)) − s||_1] + E_{t∼X_T}[||G(F(t)) − t||_1]
For each image s from the source domain S, G and F satisfy forward cycle consistency: after one image-translation cycle, s is brought back to the original image, i.e. s → G(s) → F(G(s)) ≈ s; similarly, for each image t of the target domain T, G and F should also satisfy backward cycle consistency, i.e. t → F(t) → G(F(t)) ≈ t;
(2d) Final loss function:
l(G, F, D_S, D_T) = l_GAN(G, D_T, S, T) + l_GAN(F, D_S, S, T) + λ·l_cyc(G, F)
i.e. the final overall loss is the sum of the adversarial losses for S → T and T → S and the cycle-consistency loss of generators G and F, where λ is a weighting coefficient;
the final objective is to optimize:
G*, F* = arg min_{G,F} max_{D_S,D_T} l(G, F, D_S, D_T)
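The generator side of the CycleGAN objective in (2b)–(2d) can be written compactly as below. This is a PyTorch-style sketch for illustration only (the patent's implementation runs on MindSpore); the BCE-with-logits form of the GAN term and the default λ = 10 are assumptions.

```python
import torch
import torch.nn.functional as nnf

def cyclegan_generator_loss(G, F_gen, D_S, D_T, s, t, lam=10.0):
    """Adversarial + cycle-consistency loss for the two generators (illustrative sketch)."""
    fake_t = G(s)          # G: S -> T
    fake_s = F_gen(t)      # F: T -> S
    d_fake_t = D_T(fake_t)
    d_fake_s = D_S(fake_s)
    # Adversarial terms: the generators try to make the discriminators output "real" (= 1).
    adv = nnf.binary_cross_entropy_with_logits(d_fake_t, torch.ones_like(d_fake_t)) \
        + nnf.binary_cross_entropy_with_logits(d_fake_s, torch.ones_like(d_fake_s))
    # Cycle consistency (L1): F(G(s)) should reconstruct s and G(F(t)) should reconstruct t.
    cyc = nnf.l1_loss(F_gen(fake_t), s) + nnf.l1_loss(G(fake_s), t)
    return adv + lam * cyc
```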
the S3 specifically includes:
(3a) The segmentation network M uses the DeepLabv2 model architecture based on ResNet101; it outputs spatially structured information and keeps learning domain features so that the discriminator cannot distinguish the two domains. The ASPP (Atrous Spatial Pyramid Pooling) module uses multiple dilation rates to enlarge the receptive field from k (ordinary convolution) to k + (k − 1)(r − 1), where r is the dilation rate;
(3b) The domain discrimination network D consists of an input layer, five convolutional layers and an activation-function layer; the convolutional layers use 2-D convolutions, and the activation layers use LeakyReLU with an α coefficient of 0.2. LeakyReLU addresses the zero-gradient problem for negative inputs by giving a negative input x a small linear component αx, so that the gradient is α when x < 0, alleviating the dead-ReLU problem;
for layer 1, the input layer, the number of feature maps is set to 5;
for layer 2, a convolutional layer, the number of feature maps is set to 64, the filter size to 4 and the stride to 2;
for layer 3, a convolutional layer, the number of feature maps is set to 128, the filter size to 4 and the stride to 2;
for layer 4, a convolutional layer, the number of feature maps is set to 256, the filter size to 4 and the stride to 2;
for layer 5, a convolutional layer, the number of feature maps is set to 512, the filter size to 4 and the stride to 2;
for layer 6, a convolutional layer, the number of feature maps is set to 1, the filter size to 4 and the stride to 2;
for layer 7, the activation-function layer, the α coefficient is set to 0.2;
(3c) For the segmentation network M, the maximum number of iterations is set to 56000, the initial learning rate lr to 2.5e-4 and the weight decay to 5e-4; stochastic gradient descent (SGD) is used to minimize the loss function of the segmentation network M;
(3d) for the domain discrimination network D, the initial learning rate is set to 1e-4 and the adversarial-loss coefficient λ_adv to 0.001; adaptive moment estimation (Adam) is used to minimize the loss function of the domain discrimination network D.
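The domain discrimination network D of (3b) can be sketched as follows (PyTorch-style, for illustration only; the padding values and the placement of a LeakyReLU after each convolution are assumptions). The optimizers of (3c)/(3d) would then be created analogously, e.g. SGD with lr 2.5e-4 and weight decay 5e-4 for M and Adam with lr 1e-4 for D.

```python
import torch.nn as nn

class DomainDiscriminator(nn.Module):
    """Fully convolutional domain discriminator following the layer sizes above (sketch)."""
    def __init__(self, in_channels=5, ndf=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_channels, ndf, kernel_size=4, stride=2, padding=1),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(ndf, ndf * 2, kernel_size=4, stride=2, padding=1),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(ndf * 2, ndf * 4, kernel_size=4, stride=2, padding=1),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(ndf * 4, ndf * 8, kernel_size=4, stride=2, padding=1),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(ndf * 8, 1, kernel_size=4, stride=2, padding=1),  # 1-channel domain map
        )

    def forward(self, x):
        return self.net(x)
```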
The segmentation loss l_seg in S4 uses the cross-entropy loss; the source-domain segmentation loss is defined as follows:
l_seg(M(S'), Y_S) = −Σ_{h,w} Σ_{c∈C} Y_S^{(h,w,c)} · log P_S^{(h,w,c)}
where Y_S is the label map of I'_S, C is the number of classes, H and W are the height and width of the output probability map, and P_S is the source-domain probability output of the segmentation adaptation model M, defined as P_S = M(I'_S).
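In practice this pixel-wise cross-entropy can be computed with a standard library call; a minimal sketch, in which the ignore_index used to mask invalid pixels is an assumption:

```python
import torch.nn.functional as nnf

def segmentation_loss(logits, labels, ignore_index=255):
    # logits: (N, C, H, W) raw scores from M; labels: (N, H, W) class indices Y_S.
    return nnf.cross_entropy(logits, labels, ignore_index=ignore_index)
```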
The adversarial loss in S5 is defined as follows:
l_adv(M(S'), M(T)) = −Σ_{h,w} log D(M(I_t))^{(h,w)}
where X_S denotes the sample space of the source domain S and X_T
the sample space of the target domain T; I'_s and I_t denote the input translated source-domain sample and the target-domain sample, respectively; this adversarial loss, computed against the discriminator D used for adversarial learning of M, is intended to reduce the difference between the source-domain and target-domain features extracted by the segmentation network M;
the total loss function for learning the segmentation network M is defined as follows:
l_M = λ_adv · l_adv(M(S'), M(T)) + l_seg(M(S'), Y_S)
where λ_adv denotes the coefficient of the adversarial loss; the total loss for training the segmentation network M is the sum of the adversarial loss l_adv and the segmentation loss l_seg.
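Steps S4–S5 combine the source segmentation loss and the adversarial term when updating M; a minimal PyTorch-style sketch, assuming softmax probabilities are fed to D and that D's "source" label is 1:

```python
import torch
import torch.nn.functional as nnf

def update_segmentation_network(M, D, opt_M, s_prime, y_s, t, lambda_adv=0.001):
    opt_M.zero_grad()
    pred_s = M(s_prime)                               # M(S'), shape (N, C, H, W)
    loss_seg = nnf.cross_entropy(pred_s, y_s)         # source-domain segmentation loss
    pred_t = M(t)                                     # M(T)
    d_out_t = D(torch.softmax(pred_t, dim=1))         # discriminator map on the target output
    # Adversarial term: push D to believe the target output comes from the source domain (label 1).
    loss_adv = nnf.binary_cross_entropy_with_logits(d_out_t, torch.ones_like(d_out_t))
    (loss_seg + lambda_adv * loss_adv).backward()
    opt_M.step()
    return loss_seg.item(), loss_adv.item()
```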
The domain discrimination network D in S6 uses the BCE loss, defined as follows:
l_D(M(S'), M(T)) = −Σ_{h,w} [ log D(M(S'))^{(h,w)} + log(1 − D(M(T))^{(h,w)}) ]
where S' denotes the translated source-domain data and T the target-domain data, and M(S') and M(T) are the outputs of the segmentation network M on the source domain and the target domain; the domain discrimination network D is intended to distinguish whether its input comes from the source domain or the target domain.
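Step S6 then updates the domain discrimination network D on detached segmentation outputs; the 1 = source / 0 = target convention matches the formula above, and the sketch is again PyTorch-style for illustration only:

```python
import torch
import torch.nn.functional as nnf

def update_domain_discriminator(M, D, opt_D, s_prime, t):
    opt_D.zero_grad()
    with torch.no_grad():                              # keep this step's gradients out of M
        p_s = torch.softmax(M(s_prime), dim=1)         # M(S')
        p_t = torch.softmax(M(t), dim=1)               # M(T)
    d_s, d_t = D(p_s), D(p_t)
    loss_d = nnf.binary_cross_entropy_with_logits(d_s, torch.ones_like(d_s)) \
           + nnf.binary_cross_entropy_with_logits(d_t, torch.zeros_like(d_t))
    loss_d.backward()
    opt_D.step()
    return loss_d.item()
```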
Test-time augmentation (TTA) in S8: TTA has the model predict each image on augmented copies obtained by flipping the test input image vertically and horizontally (and both), then collects the set of predictions and obtains the final result by averaging the predictions of the original and flipped images;
conditional random field (CRF): an undirected graphical model that models the conditional distribution; when computing the label of a pixel it uses the information of the neighboring pixels, making the segmentation result more accurate. The "condition" in the conditional random field is a conditional probability, namely the probability that the current pixel belongs to a certain class given the gray values of the current pixel and the pixels in its surrounding area; the conditional probability distribution is specifically a Gibbs distribution, i.e. the conditional probability that the classes of all pixels on the image take a given assignment: P = exp(−E)/Z,
where exp(−E) scores the current segmentation result of the pixels and Z is the normalization term (partition function) so that the probabilities over all possible assignments sum to one.
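The TTA averaging in S8 can be sketched as follows for a (C, H, W) probability output; the flip set and array layout are assumptions:

```python
import numpy as np

def predict_with_tta(predict_fn, image):
    # predict_fn maps an image (C, H, W) to per-class probabilities (num_classes, H, W).
    variants = [
        (image,                lambda p: p),                 # original
        (image[:, :, ::-1],    lambda p: p[:, :, ::-1]),     # horizontal flip
        (image[:, ::-1, :],    lambda p: p[:, ::-1, :]),     # vertical flip
        (image[:, ::-1, ::-1], lambda p: p[:, ::-1, ::-1]),  # both flips
    ]
    preds = [undo(predict_fn(np.ascontiguousarray(img))) for img, undo in variants]
    return np.mean(preds, axis=0)                            # averaged probability map
```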
Compared with the prior art, the invention has at least the following beneficial effects:
the invention relates to a high-resolution SAR image surface feature element extraction method based on depth unsupervised multistep anti-domain self-adaptation, which uses labeled source domain data and unlabeled target domain data to train a network, and transfers knowledge from a source domain to different but related target domains, thereby solving the problem that a training sample in the target domain is unlabeled or the labeling is limited and is not enough to support network training due to difficult and expensive labeling of single-polarized high-resolution SAR data. By means of the style migration upstream task, the source domain image is translated into the style of the target domain, the distribution of the source domain and the target domain is drawn, the domain gap is reduced, the downstream task is easier to learn, the translated source domain and the unmarked target domain data are sent to an impedance adaptive network, a feature extractor is trained, the features of the source domain and the target domain are extracted and classified, a domain discriminator is trained to distinguish whether the output of the feature extractor comes from the source domain or the target domain, meanwhile, the feature extractor is encouraged to align the output distribution of the target domain and the source domain, and the feature extractor is effectively helped to learn more effective domain invariant features. The invention is based on depth unsupervised, uses a typical generator-discriminator structure, wherein an encoder adopts a DeepLabv2 model architecture based on ResNet101, is used for outputting spatial structured information, is used as a part of countermeasure loss and returns to a generator for obtaining effective domain invariant features, prevents training deviation, improves the classification accuracy when training samples are insufficient, and can be used for classification and change detection.
Drawings
FIG. 1 is a flow chart of the present invention.
FIG. 2 is a flow chart of data reading in hardware according to the present invention.
FIG. 3 shows the source-domain image and the target-domain image of the embodiment.
FIG. 4 shows the translated source-domain image (in target-domain style) and the translated target-domain image (in source-domain style) produced by the style-transfer network in the embodiment.
FIG. 5 is the target-domain label map of the embodiment.
FIG. 6 shows the target-domain test result of the single-step adversarial domain adaptation method in the embodiment.
FIG. 7 shows the target-domain test result of the proposed method in the embodiment.
Detailed Description
The present invention is described in further detail below with reference to the attached drawings.
Referring to fig. 1, a method for extracting ground-feature elements of a high-resolution SAR image based on deep unsupervised multi-step adversarial domain adaptation comprises the following steps:
S1, performing data preprocessing on the source-domain image and the target-domain image, including 16-bit to 8-bit conversion of the SAR images, truncation, cropping, dataset splitting and data-format conversion;
(1a) Storage conversion and truncation: the acquired panoramic SAR images are 16-bit unsigned integer data, most of which are concentrated in the range [0, 500]. If the images are read directly with the cv2 or PIL libraries and compressed from 65536 gray levels into a 256-gray-level range, the data are squeezed into the first few small gray levels, so a correct image cannot be displayed and the neural network cannot learn. Therefore the images must be truncated/contrast-stretched, discarding the gray levels with low occurrence probability and retaining the gray-level range with high occurrence probability. The gray-level distribution of the 16-bit SAR image is counted, the occurrence frequency of each gray level is accumulated in order of gray level, and when the cumulative distribution function reaches a threshold (Threshold), the remaining pixels are discarded: all pixels above the threshold gray level are set to the gray level at the threshold, then divided by that gray level and multiplied by 255 to be stored as 8-bit SAR data;
the linear stretching formula is:
gray_out = (gray_in − min_in) / (max_in − min_in) × (max_out − min_out) + min_out
where gray denotes the gray level; min_in and max_in denote respectively the minimum and maximum gray levels of the retained (truncated) range in the input format; min_out and max_out denote respectively the minimum and maximum gray levels of the output format; for SAR data, the Threshold is generally set to 95% and min_in is set to zero;
(1b) In the prediction process of large-scene semantic segmentation, directly feeding the large remote sensing image to be classified into the network model would overflow the memory, so the image to be classified is generally cropped into a series of small images that are fed into the network one by one for prediction, and the prediction results are then stitched back in the cropping order to form the final result image. If conventional regular-grid cropping, prediction and stitching is adopted, obvious stitching traces between blocks are unavoidable, because there is less information at the block boundaries and the network cannot accurately estimate their categories. To address this phenomenon the invention resamples the test data with a dilated (overlapping) sampling scheme and ignores edge predictions during stitching: the prediction result of an actually cropped image has size A, only the central region a is kept when stitching, and the percentage of the area of A occupied by a is r, so the overlap ratio between adjacent cropped images is
overlap = (A − a) / A = 1 − √r
From experimental experience, the dilation boundary slidesize, i.e. the per-side margin between A and a, is usually set to 100;
(1c) Data are the foundation of deep learning, and high-quality data input benefits the whole deep neural network. Since the experiments are based on Huawei's AI development platform ModelArts and its self-developed framework MindSpore, the invention converts image data in jpg, png, tif and similar formats into the MindRecord format according to the characteristics of the data, and reads the data through the MindDataset interface, as shown in FIG. 2. This data format has the following characteristics: unified storage and access of data, which makes data reading during training simpler and more convenient; aggregated storage and efficient reading, which makes the data easy to manage and move during training; efficient data encoding and decoding operations that are transparent to the user; and flexible control of the partition size for data splitting, enabling distributed data processing;
S2, feeding the preprocessed source-domain image S and target-domain image T into an image translation network for style transfer to obtain translated source data S', drawing the data distributions of the source domain and the target domain closer and reducing the domain gap so that the downstream task is easier to learn;
(2a) The translation network uses the classical CycleGAN, a ring structure that mainly consists of two generators and two discriminators: bidirectional mapping generators G and F are established between the source-domain image S and the target-domain image T, and two discriminators D_S and D_T distinguish the source-domain images S from F(T) and the target-domain images T from G(S), respectively; the loss function consists of an adversarial loss and a cycle-consistency loss. In addition, let X_S denote the sample space of the source domain S and X_T the sample space of the target domain T;
(2b) Adversarial loss: it makes the distribution of the mapped data approach that of the target domain; generator G learns the mapping from the source-domain image S to the target-domain image T (G: S → T), and generator F learns the mapping from the target-domain image T to the source-domain image S (F: T → S);
the adversarial loss for S → T is:
l_GAN(G, D_T, S, T) = E_{t∼X_T}[log D_T(t)] + E_{s∼X_S}[log(1 − D_T(G(s)))]
where G(s) is the fake image, similar to the target domain T, generated by generator G, and D_T represents the probability that the input variable is a sample from the T space, aiming to distinguish the translated samples G(s) from the real samples t; the goal is to minimize over G and maximize over D_T;
the adversarial loss for T → S is:
l_GAN(F, D_S, S, T) = E_{s∼X_S}[log D_S(s)] + E_{t∼X_T}[log(1 − D_S(F(t)))]
where F(t) is the fake image, similar to the source domain S, generated by generator F, and D_S represents the probability that the input variable is a sample from the S space, aiming to distinguish the translated samples F(t) from the real samples s; the goal is to minimize over F and maximize over D_S;
(2c) Cycle-consistency loss: it ensures that the generators G and F of the two learned mappings do not contradict each other. While learning the two mappings, G(F(t)) should be as similar to t as possible and F(G(s)) as similar to s as possible, which prevents generator G from over-fitting the samples of the target-domain space T and over-changing the samples of the source-domain space S; the L1 loss is used:
l_cyc(G, F) = E_{s∼X_S}[||F(G(s)) − s||_1] + E_{t∼X_T}[||G(F(t)) − t||_1]
For each image s from the source domain S, G and F satisfy forward cycle consistency: after one image-translation cycle, s is brought back to the original image, i.e. s → G(s) → F(G(s)) ≈ s; similarly, for each image t of the target domain T, G and F should also satisfy backward cycle consistency, i.e. t → F(t) → G(F(t)) ≈ t;
(2d) Final loss function:
l(G, F, D_S, D_T) = l_GAN(G, D_T, S, T) + l_GAN(F, D_S, S, T) + λ·l_cyc(G, F)
i.e. the final overall loss is the sum of the adversarial losses for S → T and T → S and the cycle-consistency loss of generators G and F, where λ is a weighting coefficient;
the final objective is to optimize:
G*, F* = arg min_{G,F} max_{D_S,D_T} l(G, F, D_S, D_T)
s3, initializing a segmentation network M of a downstream task and an optimizer SGD thereof, and initializing a domain discrimination network D and an optimizer Adam thereof;
(3a) The segmentation network M uses the DeepLabv2 model architecture based on ResNet101; it outputs spatially structured information and keeps learning domain features so that the discriminator cannot distinguish the two domains. The ASPP (Atrous Spatial Pyramid Pooling) module uses multiple dilation rates to enlarge the receptive field from k (ordinary convolution) to k + (k − 1)(r − 1), where r is the dilation rate;
(3b) The domain discrimination network D consists of an input layer, five convolutional layers and an activation-function layer; the convolutional layers use 2-D convolutions, and the activation layers use LeakyReLU with an α coefficient of 0.2. LeakyReLU addresses the zero-gradient problem for negative inputs by giving a negative input x a small linear component αx, so that the gradient is α when x < 0, alleviating the dead-ReLU problem to some extent. The domain discrimination network D is trained to distinguish whether the output of the feature extractor comes from the source domain or the target domain, while encouraging the feature extractor to align the output distributions of the target domain and the source domain, which effectively helps the feature extractor learn more effective domain-invariant features;
for layer 1, the input layer, the number of feature maps is set to 5;
for layer 2, a convolutional layer, the number of feature maps is set to 64, the filter size to 4 and the stride to 2;
for layer 3, a convolutional layer, the number of feature maps is set to 128, the filter size to 4 and the stride to 2;
for layer 4, a convolutional layer, the number of feature maps is set to 256, the filter size to 4 and the stride to 2;
for layer 5, a convolutional layer, the number of feature maps is set to 512, the filter size to 4 and the stride to 2;
for layer 6, a convolutional layer, the number of feature maps is set to 1, the filter size to 4 and the stride to 2;
for layer 7, the activation-function layer, the α coefficient is set to 0.2;
(3c) For the segmentation network M, the maximum number of iterations is set to 56000, the initial learning rate lr to 2.5e-4 and the weight decay to 5e-4; stochastic gradient descent (SGD) is used to minimize the loss function of the segmentation network M;
(3d) for the domain discrimination network D, the initial learning rate is set to 1e-4 and the adversarial-loss coefficient λ_adv to 0.001; adaptive moment estimation (Adam) is used to minimize the loss function of the domain discrimination network D;
S4, feeding the translated source data S', the corresponding labels Y_S and the target-domain image T into the segmentation network M to obtain segmentation outputs M(S') and M(T), and calculating the source-domain segmentation loss using the corresponding labels Y_S;
where the segmentation loss l_seg uses the cross-entropy loss, and the source-domain segmentation loss is defined as follows:
l_seg(M(S'), Y_S) = −Σ_{h,w} Σ_{c∈C} Y_S^{(h,w,c)} · log P_S^{(h,w,c)}
where Y_S is the label map of I'_S, C is the number of classes, H and W are the height and width of the output probability map, and P_S is the source-domain probability output of the segmentation adaptation model M, defined as P_S = M(I'_S);
S5, inputting the output M(T) of the segmentation network M on the target domain into the domain discrimination network D, calculating the adversarial loss against the discrimination network D, multiplying it by the corresponding coefficient, adding it to the segmentation loss, and updating the segmentation network M and its optimizer SGD;
where the adversarial loss is defined as follows:
l_adv(M(S'), M(T)) = −Σ_{h,w} log D(M(I_t))^{(h,w)}
where X_S denotes the sample space of the source domain S and X_T the sample space of the target domain T; I'_s and I_t denote the input translated source-domain sample and the target-domain sample, respectively; this adversarial loss, computed against the discriminator D used for adversarial learning of M, aims to reduce the difference between the source-domain and target-domain features extracted by the segmentation network M;
thus, the total loss function for learning the segmentation network M can be defined as follows:
l_M = λ_adv · l_adv(M(S'), M(T)) + l_seg(M(S'), Y_S)
where λ_adv denotes the coefficient of the adversarial loss; the total loss for training the segmentation network M is the sum of the adversarial loss l_adv and the segmentation loss l_seg;
S6, feeding the segmentation-network outputs M(S') and M(T) into the domain discrimination network D respectively to calculate the domain classification loss, and updating the domain discrimination network D and its optimizer Adam to further reduce the domain gap;
where the domain discrimination network D uses the BCE loss, defined as follows:
l_D(M(S'), M(T)) = −Σ_{h,w} [ log D(M(S'))^{(h,w)} + log(1 − D(M(T))^{(h,w)}) ]
where S' denotes the translated source-domain data and T the target-domain data, and M(S') and M(T) are the outputs of the segmentation network M on the source domain and the target domain; the domain discrimination network D is intended to distinguish whether its input comes from the source domain or the target domain;
S7, repeating S4 to S6 until the maximum number of training iterations is reached, obtaining the model parameters of the segmentation network M;
S8, feeding the target-domain data into the trained segmentation network M for classification, then performing label optimization with TTA testing or a trained CRF to obtain pixel-level classification results; each class is assigned a color to generate an RGB prediction map, which is compared with the ground-truth classes, and the per-class evaluation indexes Precision, Recall and F1 score and the overall evaluation indexes OA, Kappa, MIoU and FWIoU are calculated;
(8a) Test-time data augmentation (TTA): test-time augmentation is used to improve the prediction result; TTA and CRF are both means of suppressing noise in the test result. TTA has the model predict each image on augmented copies obtained by flipping the test input image vertically and horizontally (and both), then collects the set of predictions and obtains the final result by averaging the predictions of the original and flipped images;
(8b) Conditional random field (CRF): an undirected graphical model that models the conditional distribution; its core idea is that when computing the label of a pixel it uses the information of the neighboring pixels, making the segmentation result more accurate. The "condition" in the conditional random field is a conditional probability, namely the probability that the current pixel belongs to a certain class given the gray values of the current pixel and the pixels in its surrounding area; the conditional probability distribution is specifically a Gibbs distribution, i.e. the conditional probability that the classes of all pixels on the image take a given assignment: P = exp(−E)/Z,
where exp(−E) scores the current segmentation result of the pixels and Z is the normalization term (partition function) so that the probabilities over all possible assignments sum to one.
The effects of the present invention can be further illustrated by the following simulations:
1. Simulation conditions: the hardware test platform used in the simulation experiments is the ModelArts cloud AI development platform with an Ascend-910 (32 GB) accelerator card | ARM: 24 cores, 96 GB; the software platform is Python 3.7 and MindSpore 1.6.0; the operating system is EulerOS V2R8 aarch64 (64-bit).
The single-polarization high-resolution SAR data used in the simulation experiments are two private data sets acquired by the GF-3 satellite; the radar parameters are C band, VV polarization and 1 m resolution. The data sets are Dongyong, China (shandong, 10240 × 9216) and Korea (korea, 9728 × 7680), and contain six types of ground objects: the invalid category (others), buildings, water bodies, cultivated land, vegetation and roads.
During training, each data set is seamlessly cropped into 512 × 512 patches; a labeled data set is taken as the source domain and an unlabeled data set as the target domain and fed into the network for training. During testing, the target-domain data set is cropped with the dilated sampling scheme with a dilation size of 100: the patches are still 512 × 512, but only the central 312 × 312 result of each test patch is kept, and finally all test patches are stitched together to obtain the final test result. The detailed data information is given in Tables 1 and 2.
Table 1: Data set information
[Table 1 is provided as an image in the original publication.]
Table 2: region class-RGB value comparison table
Category:   Others    Water area   Tree/green   Buildings   Farmland      Road
RGB value:  [0,0,0]   [0,0,255]    [0,255,0]    [255,0,0]   [255,255,0]   [210,180,140]
The evaluation indexes used in the simulation experiments are defined as follows: i denotes the positive class and j a negative class; p_ii denotes the total number of pixels whose true class is i and which are identified as class i, i.e. true positives (TP); p_ij denotes the total number of pixels whose true class is j and which are identified as class i, i.e. false positives (FP); p_ji denotes the total number of pixels whose true class is i and which are identified as class j, i.e. false negatives (FN); p_jj denotes the total number of pixels whose true class is j and which are identified as class j, i.e. true negatives (TN).
Precision: the proportion of correct predictions among all positive predictions, defined as follows:
Precision = TP / (TP + FP) = p_ii / (p_ii + p_ij)
Recall: the proportion of positive samples that are correctly predicted as positive, defined as follows:
Recall = TP / (TP + FN) = p_ii / (p_ii + p_ji)
F1 value: the harmonic mean of recall and precision, defined as follows:
F1 = 2 × Precision × Recall / (Precision + Recall)
Overall accuracy (OA): the proportion of correctly labeled pixels among all pixels, defined as follows:
OA = Σ_i p_ii / Σ_i Σ_j p_ij
Kappa coefficient: penalizes the "bias" of the model to obtain a fairer evaluation, defined as follows:
Kappa = (p_o − p_e) / (1 − p_e),   with p_e = (a_1·b_1 + a_2·b_2 + … + a_C·b_C) / n²
where p_o denotes the ratio of the number of correctly classified samples of each class to the total number of samples, equivalent to OA; a_1, a_2, …, a_C denote the numbers of true samples of each class; b_1, b_2, …, b_C denote the numbers of predicted samples of each class; the number of classes is C, and the total number of samples is n.
Mean intersection over union (MIoU): the average over all classes of the intersection-over-union ratio between the predicted result and the ground truth, defined as follows:
MIoU = (1/C) · Σ_i p_ii / (Σ_j p_ij + Σ_j p_ji − p_ii)
Frequency-weighted intersection over union (FWIoU): an improvement of MIoU in which weights are set according to the occurrence frequency of each class, defined as follows:
FWIoU = Σ_i [ (Σ_j p_ji) / (Σ_k Σ_l p_kl) ] · p_ii / (Σ_j p_ij + Σ_j p_ji − p_ii)
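All of the overall indexes above can be computed from a single confusion matrix; a NumPy sketch, assuming conf[i, j] counts pixels of true class i predicted as class j (the per-class Precision/Recall/F1 follow from the same counts):

```python
import numpy as np

def overall_indexes(conf):
    # conf[i, j]: number of pixels whose true class is i and predicted class is j.
    total = conf.sum()
    tp = np.diag(conf)
    oa = tp.sum() / total                                          # overall accuracy
    pe = (conf.sum(axis=0) * conf.sum(axis=1)).sum() / total ** 2  # chance agreement
    kappa = (oa - pe) / (1 - pe)
    union = conf.sum(axis=0) + conf.sum(axis=1) - tp
    iou = tp / np.maximum(union, 1)
    miou = iou.mean()                                              # mean IoU
    freq = conf.sum(axis=1) / total                                # true-class frequencies
    fwiou = (freq * iou).sum()                                     # frequency-weighted IoU
    return oa, kappa, miou, fwiou
```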
2. Simulation experiment contents: on the private data sets, the invention and a classical single-step adversarial domain adaptation algorithm are used respectively to complete ground-feature element extraction for single-polarization high-resolution SAR images across regions with the same radar and the same resolution (with large intra-class differences), and the relevant evaluation indexes are calculated. FIG. 3 shows the original SAR images of the source domain korea and the target domain shandong in the korea → shandong experiment; FIG. 4 shows the translated source-domain korea (shandong-style) SAR image and the translated target-domain shandong (korea-style) SAR image in the korea → shandong experiment. Comparing the two sets of images in FIG. 3 and FIG. 4, it can be clearly seen that the translated image (a) in FIG. 4 is closer to the brightness and texture of image (b) in FIG. 3, and the translated image (b) is closer to the brightness and texture of image (a) in FIG. 3, which further draws the distributions of the two domains closer. The results of the various groups of experiments are shown in Tables 3 and 4 below:
table 3: shandong- > korea same-resolution cross-region simulation result comparison table with radar
Evaluation index OA Kappa MIoU FWIoU
Model_1 0.6694 0.5372 0.3948 0.5385
Model_2 0.7355 0.6262 0.4553 0.6053
Table 4: korea → shandong (same radar, same resolution, cross-region) simulation result comparison
Evaluation index   OA       Kappa    MIoU     FWIoU
Model_1            0.5888   0.4430   0.3414   0.4347
Model_2            0.8137   0.7520   0.5626   0.7029
3. Analysis of experimental results: as can be seen from Table 3, on the cross-region data sets with the same radar and resolution, the shandong → korea domain adaptation reaches an overall accuracy (OA) of 73.55% and an MIoU of 45.53%, an accuracy improvement of 5%; the effect is even more remarkable for the korea → shandong domain adaptation, where the OA reaches 81.37% and the MIoU 56.26%, an accuracy improvement of 22%. In shandong → korea, farmland and vegetation are easily confused, and the method markedly improves the green-vegetation class. In korea → shandong, shandong is a typical Chinese town image in which water areas, cultivated land and buildings are interlaced and more complicated; the method distinguishes vegetation and buildings better, with better regional consistency and clearer edge information. The label map of this area is shown in FIG. 5, the Model_1 prediction map in FIG. 6 and the Model_2 prediction map in FIG. 7; it can clearly be seen that FIG. 7 is closer to the label map than FIG. 6, with more accurate segmentation and clearer boundaries.
Combining the analysis of the simulation results, the high-resolution SAR image ground-feature element extraction method based on deep unsupervised multi-step adversarial domain adaptation effectively alleviates the poor prediction performance of a model on test data that do not satisfy the independent-and-identically-distributed assumption caused by the domain shift of cross-region data; knowledge is transferred from the source domain to a different but related target domain, solving the problem that existing unlabeled SAR data cannot support network training, and improving the classification accuracy of the classifier on the unlabeled target domain.

Claims (9)

1. A high-resolution SAR image ground-feature element extraction method based on deep unsupervised multi-step adversarial domain adaptation, characterized in that: the source-domain image is translated into the style of the target domain through a style-transfer upstream task, drawing the distributions of the source domain and the target domain closer; the translated source-domain data and the unlabeled target-domain data are fed into an adversarial adaptation network, a feature extractor is trained to extract and classify the features of the source and target domains, and a domain discrimination network is trained to distinguish whether the output of the feature extractor comes from the source domain or the target domain and to encourage the feature extractor to align the output distributions of the target domain and the source domain.
2. The high-resolution SAR image ground-feature element extraction method based on deep unsupervised multi-step adversarial domain adaptation, characterized by comprising the following steps:
S1, performing data preprocessing on the source-domain image and the target-domain image, including 16-bit to 8-bit conversion of the SAR images, truncation, cropping, dataset splitting and data-format conversion;
S2, feeding the preprocessed source-domain image S and target-domain image T into an image translation network for style transfer to obtain translated source data S';
S3, initializing the segmentation network M of the downstream task and its optimizer SGD, and initializing the domain discrimination network D and its optimizer Adam; the domain discrimination network D is trained to distinguish whether the output of the feature extractor comes from the source domain or the target domain, while encouraging the feature extractor to align the output distributions of the target-domain and source-domain images, helping the feature extractor learn domain-invariant features;
S4, feeding the translated source data S', the corresponding labels Y_S and the target-domain image T into the segmentation network M to obtain segmentation outputs M(S') and M(T), and calculating the source-domain segmentation loss using the corresponding labels Y_S;
S5, inputting the output M(T) of the segmentation network M on the target domain into the domain discrimination network D, calculating the adversarial loss against the discrimination network D, multiplying it by the corresponding coefficient, adding it to the segmentation loss, and updating the segmentation network M and its optimizer SGD;
S6, feeding the segmentation-network outputs M(S') and M(T) into the domain discrimination network D respectively to calculate the domain classification loss, and updating the domain discrimination network D and its optimizer Adam;
S7, repeating S4 to S6 until the maximum number of training iterations is reached, obtaining the model parameters of the segmentation network M;
S8, feeding the target-domain data into the trained segmentation network M for classification, then performing label optimization with TTA testing or a trained CRF to obtain pixel-level classification results; each class is assigned a color to generate an RGB prediction map, which is compared with the ground-truth class labels, and the per-class evaluation indexes Precision, Recall and F1 score and the overall evaluation indexes OA, Kappa, MIoU and FWIoU are calculated.
3. The method according to claim 2, wherein S1 specifically is:
(1a) Storage conversion and truncation: the image is truncated/contrast-stretched, gray levels with low occurrence probability are discarded, and the gray-level range with high occurrence probability is retained; the gray-level distribution of the 16-bit SAR image is counted, the occurrence frequency of each gray level is accumulated in order of gray level, and when the cumulative distribution function reaches a threshold (Threshold), the remaining pixels are discarded: all pixels above the threshold gray level are set to the gray level at the threshold, then divided by that gray level and multiplied by 255 to be stored as 8-bit SAR data;
the linear stretching formula is:
gray_out = (gray_in − min_in) / (max_in − min_in) × (max_out − min_out) + min_out
where gray denotes the gray level; min_in and max_in denote respectively the minimum and maximum gray levels of the retained (truncated) range in the input format; min_out and max_out denote respectively the minimum and maximum gray levels of the output format; for SAR data, Threshold is set to 95% and min_in is set to zero;
(1b) The test data are resampled with a dilated (overlapping) sampling scheme, and edge predictions are ignored during stitching; the prediction result of an actually cropped image has size A, only the central region a is kept when stitching, and the percentage of the area of A occupied by a is r, so the overlap ratio between adjacent cropped images is
overlap = (A − a) / A = 1 − √r
The dilation boundary slidesize, i.e. the per-side margin between A and a, is set to 100;
(1c) Based on the AI development platform ModelArts and its self-developed framework MindSpore, image data in jpg, png and tif formats are converted into the MindRecord format, and the data are then read through the MindDataset interface. This data format has the following characteristics: unified storage and access of data; aggregated storage and efficient reading, which makes the data easy to manage and move during training; efficient data encoding and decoding operations that are transparent to the user; and flexible control of the partition size for data splitting, enabling distributed data processing.
4. The method according to claim 2, wherein S2 specifically is:
(2a) The translation network uses the classical CycleGAN: bidirectional mapping generators G and F are established between the source-domain image S and the target-domain image T, and two discriminators D_S and D_T distinguish the source-domain images S from F(T) and the target-domain images T from G(S), respectively; the loss function consists of an adversarial loss and a cycle-consistency loss. In addition, let X_S denote the sample space of the source domain S and X_T the sample space of the target domain T;
(2b) the resistance loss: the mapped data distribution is made to approach that of the target domain, and the generator G learns the mapping of the source domain image S to the target domain image T (G: S → T); the generator F learns the mapping of the target domain image T to the source domain image S (F: T → S);
the adversarial loss for S → T is:
l_GAN(G, D_T, S, T) = E_{t~p_data(t)}[log D_T(t)] + E_{s~p_data(s)}[log(1 - D_T(G(s)))]
wherein G(s) is a fake image similar to the target domain T generated by the generator G; D_T outputs the probability that its input is a sample of the T space and aims to distinguish the translated samples G(s) from the real samples t; the goal is to minimize this loss over G while maximizing it over D_T;
the adversarial loss for T → S is:
l_GAN(F, D_S, S, T) = E_{s~p_data(s)}[log D_S(s)] + E_{t~p_data(t)}[log(1 - D_S(F(t)))]
wherein F(t) is a fake image similar to the source domain S generated by the generator F; D_S outputs the probability that its input is a sample of the S space and aims to distinguish the translated samples F(t) from the real samples s; the goal is to minimize this loss over F while maximizing it over D_S;
(2c) cycle-consistency loss: it ensures that the two learned mappings G and F do not contradict each other; while learning the two mappings, it is desired that G(F(t)) is as close to t and F(G(s)) as close to s as possible, which prevents the generator G from over-fitting the sample space of the target domain image T and changing the source domain samples too much; an L1 loss is used:
l_cyc(G, F) = E_{s~p_data(s)}[||F(G(s)) - s||_1] + E_{t~p_data(t)}[||G(F(t)) - t||_1]
for each image s from the source domain S, G and F satisfy forward cycle consistency, bringing s back to the original image after one translation cycle, i.e. s → G(s) → F(G(s)) ≈ s; similarly, for each image t of the target domain T, G and F should also satisfy backward cycle consistency, i.e. t → F(t) → G(F(t)) ≈ t;
(2d) final loss function:
l(G, F, D_S, D_T) = l_GAN(G, D_T, S, T) + l_GAN(F, D_S, S, T) + λ · l_cyc(G, F)
the final overall loss is the sum of the adversarial losses for S → T and T → S and the cycle-consistency loss of generators G and F, where λ is a weighting coefficient;
the final goal is to optimize:
G*, F* = arg min_{G,F} max_{D_S,D_T} l(G, F, D_S, D_T)
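A minimal PyTorch sketch of the generator-side CycleGAN objective of claim 4; the network instances, the BCE form of the adversarial terms and the weight lam are illustrative assumptions, not part of the claim.

```python
import torch
import torch.nn.functional as F

def cyclegan_generator_loss(G, F_gen, D_S, D_T, s, t, lam=10.0):
    """Adversarial losses for S->T and T->S plus the L1 cycle-consistency loss.
    G: S->T generator, F_gen: T->S generator, D_S/D_T: discriminators returning probabilities."""
    fake_t, fake_s = G(s), F_gen(t)
    # generators try to make the discriminators output 1 on translated samples
    adv_g = F.binary_cross_entropy(D_T(fake_t), torch.ones_like(D_T(fake_t)))
    adv_f = F.binary_cross_entropy(D_S(fake_s), torch.ones_like(D_S(fake_s)))
    # forward and backward cycle consistency: F(G(s)) ~ s and G(F(t)) ~ t
    cyc = F.l1_loss(F_gen(fake_t), s) + F.l1_loss(G(fake_s), t)
    return adv_g + adv_f + lam * cyc
```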
5. The method according to claim 2, wherein S3 specifically is:
(3a) the segmentation network M uses a DeepLabv2 architecture based on ResNet101; it outputs spatially structured predictions and keeps learning domain-invariant features so that the discriminator cannot distinguish the two domains; the ASPP (Atrous Spatial Pyramid Pooling) module uses multiple dilation rates to enlarge the receptive field of an ordinary convolution from k to k + (k-1)(r-1), where r is the dilation rate;
(3b) the domain discrimination network D consists of an input layer, 5 convolutional layers and an activation function layer, wherein the convolutional layers use 2D convolution and the activation uses LeakyReLU with an alpha coefficient of 0.2; LeakyReLU addresses the zero-gradient problem for negative inputs by giving a negative input x a small linear component αx, so that the gradient is α instead of 0 when x < 0, alleviating the dead-ReLU problem;
for the 1st layer (input), set the number of feature maps to 5;
for the 2nd layer (convolutional), set the number of feature maps to 64, the filter size to 4 and the step size to 2;
for the 3rd layer (convolutional), set the number of feature maps to 128, the filter size to 4 and the step size to 2;
for the 4th layer (convolutional), set the number of feature maps to 256, the filter size to 4 and the step size to 2;
for the 5th layer (convolutional), set the number of feature maps to 512, the filter size to 4 and the step size to 2;
for the 6th layer (convolutional), set the number of feature maps to 1, the filter size to 4 and the step size to 2;
for the 7th layer (activation function), set the alpha coefficient to 0.2;
(3c) for the segmentation network M, the maximum number of iterations is set to 56000, the initial learning rate lr is set to 2.5e-4 and the weight decay to 5e-4; stochastic gradient descent (SGD) is used to minimize the loss function of the segmentation network M;
(3d) for the domain discrimination network D, the initial learning rate is set to 1e-4 and the adversarial loss coefficient λ_adv to 0.001; adaptive moment estimation (Adam) is used to minimize the loss function of the domain discrimination network D.
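A minimal PyTorch sketch consistent with the discriminator layout of (3b): a 5-channel input (one channel per class), five 4x4 stride-2 convolutions with 64/128/256/512/1 feature maps, and LeakyReLU(0.2) activations; the padding value and the helper name are illustrative assumptions.

```python
import torch.nn as nn

def build_domain_discriminator(num_classes=5, ndf=64):
    """Domain discrimination network D: stride-2 4x4 convolutions over the C-channel
    softmax output, LeakyReLU(0.2) activations, and a single-channel domain logit map."""
    return nn.Sequential(
        nn.Conv2d(num_classes, ndf, kernel_size=4, stride=2, padding=1),
        nn.LeakyReLU(0.2, inplace=True),
        nn.Conv2d(ndf, ndf * 2, kernel_size=4, stride=2, padding=1),
        nn.LeakyReLU(0.2, inplace=True),
        nn.Conv2d(ndf * 2, ndf * 4, kernel_size=4, stride=2, padding=1),
        nn.LeakyReLU(0.2, inplace=True),
        nn.Conv2d(ndf * 4, ndf * 8, kernel_size=4, stride=2, padding=1),
        nn.LeakyReLU(0.2, inplace=True),
        nn.Conv2d(ndf * 8, 1, kernel_size=4, stride=2, padding=1),  # domain logit map
    )
```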
6. The method according to claim 2, wherein the segmentation loss l_seg in S4 uses the cross-entropy loss, and the source-domain segmentation loss is defined as follows:
l_seg(I'_S) = -Σ_{h,w} Σ_{c∈C} Y_S^{(h,w,c)} · log P_S^{(h,w,c)}
wherein Y_S is the label map of I'_S, C is the number of classes, H and W are the height and width of the output probability map, and P_S is the source-domain probability output of the segmentation adaptation model M, defined as P_S = M(I'_S).
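A minimal PyTorch sketch of the source-domain cross-entropy of claim 6; `F.cross_entropy` applies the softmax internally, so it is fed raw logits; the names are illustrative assumptions.

```python
import torch.nn.functional as F

def seg_loss(M, img_s_translated, label_s):
    """Pixel-wise cross-entropy between P_S = M(I'_S) and the label map Y_S."""
    logits = M(img_s_translated)             # (N, C, H, W) raw class scores
    return F.cross_entropy(logits, label_s)  # label_s: (N, H, W) integer class indices
```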
7. The method according to claim 2, wherein the adversarial loss in S5 is defined as follows:
l_adv(I_t) = -Σ_{h,w} log D(M(I_t))^{(h,w)}
wherein s ~ p_data(s) denotes the sample distribution of the source domain S and t ~ p_data(t) the sample distribution of the target domain T; I'_s and I_t respectively denote the input translated source-domain and target-domain samples; the discriminator D used for adversarial learning of M is intended to reduce the difference between the source-domain and target-domain features extracted by the segmentation network M;
the total loss function for training the segmentation network M is defined as follows:
l_M = λ_adv · l_adv(M(S'), M(T)) + l_seg(M(S'), Y_S)
wherein λ_adv denotes the coefficient of the adversarial loss; the total loss for training the segmentation network M is the sum of the adversarial loss l_adv and the segmentation loss l_seg.
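A minimal PyTorch sketch of the total loss of claim 7 for one batch: the segmentation loss on the translated source domain plus λ_adv times an adversarial term that rewards M for making target-domain predictions look like source-domain ones to D; the "source = 1" label convention and the names are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def segnet_total_loss(M, D, img_s_translated, label_s, img_t, lambda_adv=0.001):
    """l_M = lambda_adv * l_adv + l_seg for the segmentation network M."""
    p_s = M(img_s_translated)                 # source-domain logits
    p_t = M(img_t)                            # target-domain logits
    l_seg = F.cross_entropy(p_s, label_s)
    d_out = D(torch.softmax(p_t, dim=1))      # discriminator on target predictions
    l_adv = F.binary_cross_entropy_with_logits(
        d_out, torch.ones_like(d_out))        # fool D: label target predictions as "source"
    return l_seg + lambda_adv * l_adv
```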
8. The method according to claim 2, wherein the domain discrimination network D in S6 uses the BCE loss, which is defined as follows:
l_D = -Σ_{h,w} [ log D(M(S'))^{(h,w)} + log(1 - D(M(T))^{(h,w)}) ]
wherein S' denotes the translated source-domain data, T denotes the target-domain data, and M(S') and M(T) denote the outputs of the segmentation network M on the source domain and the target domain; the domain discrimination network D is intended to distinguish whether its input comes from the source domain or the target domain.
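A minimal PyTorch sketch of the BCE training of the domain discrimination network D in claim 8; the 1 = source / 0 = target labelling and the detaching of M's outputs are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def discriminator_loss(M, D, img_s_translated, img_t):
    """BCE that labels M(S') as source (1) and M(T) as target (0); M is fixed while D is updated."""
    with torch.no_grad():
        p_s = torch.softmax(M(img_s_translated), dim=1)
        p_t = torch.softmax(M(img_t), dim=1)
    d_s, d_t = D(p_s), D(p_t)
    loss_s = F.binary_cross_entropy_with_logits(d_s, torch.ones_like(d_s))
    loss_t = F.binary_cross_entropy_with_logits(d_t, torch.zeros_like(d_t))
    return 0.5 * (loss_s + loss_t)
```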
9. The method according to claim 2, wherein the test-time augmentation (TTA) of the test data in S8 is: TTA augments each test input image with copies obtained by vertical and horizontal flipping, lets the model predict each copy, flips the predictions back, and obtains the final result of the image by averaging the prediction results of the original image and the flipped images;
conditional random field (CRF): an undirected graphical model that models the conditional distribution; when computing the label of a pixel it uses information from the neighbouring pixels, which makes the segmentation result more accurate; the condition in the conditional random field is the conditional probability, i.e. the probability that the current pixel belongs to a certain class given the gray values of the current pixel and of the pixels in its neighbourhood; the conditional probability distribution is specifically a Gibbs distribution, giving the conditional probability of a particular assignment of classes to all pixels on the image: P = exp(-E)/Z;
wherein E is the energy of the current label assignment, exp(-E) is its unnormalized probability, and Z is the normalization constant (partition function) that makes the probabilities of all possible label assignments sum to one.
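A minimal PyTorch sketch of the flip-based TTA described in claim 9; averaging softmax probability maps is an illustrative assumption.

```python
import torch

def tta_predict(M, img):
    """Predict the original image and its vertical/horizontal flips, flip the
    predictions back, average the probability maps and take the argmax."""
    probs = torch.softmax(M(img), dim=1)
    for dim in (2, 3):                        # dim 2: vertical flip, dim 3: horizontal flip
        flipped = torch.flip(img, dims=[dim])
        probs = probs + torch.flip(torch.softmax(M(flipped), dim=1), dims=[dim])
    return (probs / 3).argmax(dim=1)          # final pixel-level class labels
```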
CN202210664345.1A 2022-06-14 2022-06-14 Depth unsupervised multistep anti-domain self-adaptive high-resolution SAR image surface feature extraction method Pending CN115049841A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210664345.1A CN115049841A (en) 2022-06-14 2022-06-14 Depth unsupervised multistep anti-domain self-adaptive high-resolution SAR image surface feature extraction method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210664345.1A CN115049841A (en) 2022-06-14 2022-06-14 Depth unsupervised multistep anti-domain self-adaptive high-resolution SAR image surface feature extraction method

Publications (1)

Publication Number Publication Date
CN115049841A true CN115049841A (en) 2022-09-13

Family

ID=83161887

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210664345.1A Pending CN115049841A (en) 2022-06-14 2022-06-14 Depth unsupervised multistep anti-domain self-adaptive high-resolution SAR image surface feature extraction method

Country Status (1)

Country Link
CN (1) CN115049841A (en)

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113366494A (en) * 2019-01-29 2021-09-07 辉达公司 Method for few-sample unsupervised image-to-image conversion
CN115830597A (en) * 2023-01-05 2023-03-21 安徽大学 Domain self-adaptive remote sensing image semantic segmentation method from local to global based on pseudo label generation
CN115830597B (en) * 2023-01-05 2023-07-07 安徽大学 Domain self-adaptive remote sensing image semantic segmentation method from local to global based on pseudo tag generation
CN116403058A (en) * 2023-06-09 2023-07-07 昆明理工大学 Remote sensing cross-scene multispectral laser radar point cloud classification method
CN116403058B (en) * 2023-06-09 2023-09-12 昆明理工大学 Remote sensing cross-scene multispectral laser radar point cloud classification method
CN117934869A (en) * 2024-03-22 2024-04-26 中铁大桥局集团有限公司 Target detection method, system, computing device and medium
CN118037755A (en) * 2024-04-11 2024-05-14 苏州大学 Focus segmentation domain generalization method and system based on double space constraint

Similar Documents

Publication Publication Date Title
CN110472627B (en) End-to-end SAR image recognition method, device and storage medium
CN110119728B (en) Remote sensing image cloud detection method based on multi-scale fusion semantic segmentation network
CN110443143B (en) Multi-branch convolutional neural network fused remote sensing image scene classification method
CN115049841A (en) Depth unsupervised multistep anti-domain self-adaptive high-resolution SAR image surface feature extraction method
CN106547880B (en) Multi-dimensional geographic scene identification method fusing geographic area knowledge
CN110516095B (en) Semantic migration-based weak supervision deep hash social image retrieval method and system
CN106909902B (en) Remote sensing target detection method based on improved hierarchical significant model
CN111259906B (en) Method for generating remote sensing image target segmentation countermeasures under condition containing multilevel channel attention
CN112734775B (en) Image labeling, image semantic segmentation and model training methods and devices
CN111460968B (en) Unmanned aerial vehicle identification and tracking method and device based on video
Liu et al. Remote sensing image change detection based on information transmission and attention mechanism
CN113505792B (en) Multi-scale semantic segmentation method and model for unbalanced remote sensing image
CN111950453A (en) Optional-shape text recognition method based on selective attention mechanism
CN108052966A (en) Remote sensing images scene based on convolutional neural networks automatically extracts and sorting technique
CN112541508A (en) Fruit segmentation and recognition method and system and fruit picking robot
CN103049763A (en) Context-constraint-based target identification method
CN112950780B (en) Intelligent network map generation method and system based on remote sensing image
CN110334656B (en) Multi-source remote sensing image water body extraction method and device based on information source probability weighting
CN112633140A (en) Multi-spectral remote sensing image urban village multi-category building semantic segmentation method and system
CN115565019A (en) Single-channel high-resolution SAR image ground object classification method based on deep self-supervision generation countermeasure
CN114283431B (en) Text detection method based on differentiable binarization
CN110135435B (en) Saliency detection method and device based on breadth learning system
Wang et al. Pedestrian detection in infrared image based on depth transfer learning
CN115019163A (en) City factor identification method based on multi-source big data
CN112330562B (en) Heterogeneous remote sensing image transformation method and system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination