CN115049841A - High-resolution SAR image ground-feature element extraction method based on deep unsupervised multi-step adversarial domain adaptation - Google Patents

High-resolution SAR image ground-feature element extraction method based on deep unsupervised multi-step adversarial domain adaptation

Info

Publication number
CN115049841A
CN115049841A
Authority
CN
China
Prior art keywords
domain
image
data
target domain
network
Prior art date
Legal status
Pending
Application number
CN202210664345.1A
Other languages
Chinese (zh)
Inventor
张雨
任仲乐
侯彪
焦李成
韩祥永
张锐
苏海波
李永强
Current Assignee
Xidian University
Original Assignee
Xidian University
Priority date
Filing date
Publication date
Application filed by Xidian University
Priority to CN202210664345.1A
Publication of CN115049841A

Classifications

    • G06V10/40 — Extraction of image or video features
    • G06N3/088 — Non-supervised learning, e.g. competitive learning
    • G06V10/20 — Image preprocessing
    • G06V10/26 — Segmentation of patterns in the image field; cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; detection of occlusion
    • G06V10/764 — Image or video recognition or understanding using pattern recognition or machine learning, using classification, e.g. of video objects
    • G06V10/82 — Image or video recognition or understanding using pattern recognition or machine learning, using neural networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Software Systems (AREA)
  • Computing Systems (AREA)
  • Medical Informatics (AREA)
  • Databases & Information Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

A high-resolution SAR image ground-feature element extraction method based on deep unsupervised multi-step adversarial domain adaptation first converts the source-domain image into a style similar to that of the target-domain image with an image translation model, thereby reducing the distribution difference between the source domain and the target domain. The style-converted source-domain image and the unlabeled target-domain image are then fed into an adaptation network: on the one hand, a deep network is trained to learn the features of the source and target domains and to perform semantic segmentation; on the other hand, a discriminator is trained to distinguish whether its input comes from the source domain or the target domain, and its feedback guides the deep network to align the feature distributions of the two domains. Finally, the trained model predicts the ground-object categories of the large target-domain scene, completing pixel-level ground-feature element extraction for single-polarization high-resolution SAR images. The method breaks through the bottleneck of poor model generalization caused by insufficient labeled samples and by the inconsistent distributions of source-domain and target-domain data, and improves the accuracy and performance of ground-object classification on target-domain SAR images.

Description

High-resolution SAR image ground-feature element extraction method based on deep unsupervised multi-step adversarial domain adaptation
Technical Field
The invention belongs to the technical field of intelligent interpretation of radar remote sensing images, and particularly relates to a high-resolution SAR image ground-feature element extraction method based on deep unsupervised multi-step adversarial domain adaptation.
Background
Synthetic Aperture Radar (SAR) is a high-resolution active microwave remote sensing imaging radar. It works in all weather and around the clock with a short imaging period, has strong penetration capability, is not affected by weather conditions such as cloud, rain and fog, can penetrate surface structures such as soil and foliage, and is widely applied in military and civil fields, for example environmental protection, disaster monitoring, ocean observation, resource protection, land cover mapping, precision agriculture, urban area detection and geographic mapping. Semantic segmentation of the high-resolution large-scene land-cover images acquired by satellites is a very challenging task: labeled data are scarce, and data characteristics differ because of different imaging parameters (sensor, frequency band, polarization, resolution or incidence angle). How to make full use of existing labeled data through transfer has therefore become a popular solution. Domain adaptation can overcome the differences between different SAR data sets and transfer knowledge from a source domain to a different but related target domain, which alleviates the problem of limited training samples in the target domain.
Traditional domain adaptation methods are mainly feature-based, instance-based or model-based. Feature-based methods aim to map the source-domain and target-domain samples into the same feature space with a mapping φ so that the samples are aligned in that space. Considering that some source-domain samples are similar to the target-domain samples, instance-based methods multiply the loss of every source-domain sample by a weight w_i during training, giving larger weights to samples that are more similar to the target domain. Model-based methods aim to find new parameters θ' so that, through parameter transfer, the model works better on the target domain.
Deep-learning domain adaptation mainly comprises discrepancy-based, adversarial-based and reconstruction-based methods. The present method is an adversarial domain adaptation method: since source-domain data and target-domain data with different distributions naturally exist in domain adaptation, the sample-generation process can be omitted and the target domain is used directly as the "generated" samples. The generator then degenerates into a feature extractor that keeps learning domain features so that the discriminator cannot tell the two domains apart. However, unsupervised adversarial domain adaptation still has shortcomings. The long-tailed distribution over categories and the differences in category distribution between data domains make it less robust. Adversarial adaptation has no theoretical guarantee in regression problems: the features are scattered over the whole space, and even if the discriminator is successfully fooled, there is no guarantee that source-domain and target-domain features with the same label are drawn together. In addition, although adversarial adaptation draws the distributions of different domains together in feature space and improves the transferability of features across domains, it may reduce the discriminability of the data features, making it harder to train a classifier on the fixed adversarially adapted features.
Disclosure of Invention
In order to overcome the defects of the prior art, the invention aims to provide a high-resolution SAR image ground-feature element extraction method based on deep unsupervised multi-step adversarial domain adaptation, which improves the classification accuracy on the target domain through an auxiliary task and addresses the problems of insufficient training samples and inconsistent distributions of training and test samples.
In order to achieve the above purpose, the invention adopts the following technical scheme:
A high-resolution SAR image ground-feature element extraction method based on deep unsupervised multi-step adversarial domain adaptation translates the source-domain image into the style of the target domain through a style-transfer upstream task, drawing the distributions of the source domain and the target domain closer; the translated source-domain data and the unlabeled target-domain data are fed into an adversarial adaptation network, a feature extractor is trained to extract and classify the features of the source and target domains, and a domain discrimination network is trained to distinguish whether the output of the feature extractor comes from the source domain or the target domain while encouraging the feature extractor to align the output distributions of the target domain and the source domain.
A high-resolution SAR image ground-feature element extraction method based on deep unsupervised multi-step adversarial domain adaptation comprises the following steps:
S1, performing data preprocessing on the source-domain image and the target-domain image, including 16-bit to 8-bit conversion of the SAR images, truncation, cropping, dataset splitting and data-format conversion;
S2, feeding the preprocessed source-domain image S and target-domain image T into an image translation network for style transfer to obtain translated source data S';
S3, initializing the segmentation network M of the downstream task and its optimizer SGD, and initializing the domain discrimination network D and its optimizer Adam; the domain discrimination network D is trained to distinguish whether the output of the feature extractor comes from the source domain or the target domain, while encouraging the feature extractor to align the output distributions of the target-domain and source-domain images, helping the feature extractor learn domain-invariant features;
S4, feeding the translated source data S', the corresponding labels Y_S and the target-domain image T into the segmentation network M to obtain segmentation outputs M(S') and M(T), and calculating the source-domain segmentation loss using the corresponding labels Y_S;
S5, inputting the output M(T) of the segmentation network M on the target domain into the domain discrimination network D, calculating the adversarial loss against the discrimination network D, multiplying it by the corresponding coefficient, adding it to the segmentation loss, and updating the segmentation network M and its optimizer SGD;
S6, feeding the segmentation-network outputs M(S') and M(T) into the domain discrimination network D respectively to calculate the domain classification loss, and updating the domain discrimination network D and its optimizer Adam;
S7, repeating S4 to S6 until the maximum number of training iterations is reached, obtaining the model parameters of the segmentation network M;
S8, feeding the target-domain data into the trained segmentation network M for classification, then performing label optimization with TTA testing or a trained CRF to obtain pixel-level classification results; each class is assigned a color to generate an RGB prediction map, which is compared with the ground-truth class labels, and the per-class evaluation indexes Precision, Recall and F1 score and the overall evaluation indexes OA, Kappa, MIoU and FWIoU are calculated.
The S1 specifically includes:
(1a) Storage conversion and truncation: the image is truncated/contrast-stretched, gray levels with low occurrence probability are discarded, and the gray-level range with high occurrence probability is retained. The gray-level distribution of the 16-bit SAR image is counted, the occurrence frequency of each gray level is accumulated in order of gray level, and when the cumulative distribution function reaches a threshold (Threshold), the remaining pixels are discarded: all pixels above the threshold gray level are set to the gray level at the threshold, then divided by that gray level and multiplied by 255 to be stored as 8-bit SAR data;
the linear stretching formula is:
gray_out = (gray_in − min_in) / (max_in − min_in) × (max_out − min_out) + min_out
where gray denotes the gray level; min_in and max_in denote respectively the minimum and maximum gray levels of the retained (truncated) range in the input format; min_out and max_out denote respectively the minimum and maximum gray levels of the output format; for SAR data, Threshold is generally set to 95% and min_in is set to zero;
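As an illustration of step (1a), the following Python sketch performs the cumulative-histogram truncation and 8-bit conversion described above; the function name and the NumPy-based implementation are illustrative assumptions, not part of the patent.

```python
import numpy as np

def sar_16bit_to_8bit(img16, threshold=0.95):
    """Truncate/contrast-stretch a 16-bit SAR image to 8 bits (sketch of step (1a))."""
    hist, _ = np.histogram(img16, bins=65536, range=(0, 65536))
    cdf = np.cumsum(hist) / img16.size
    # Gray level at which the cumulative distribution first reaches the threshold.
    max_in = max(int(np.searchsorted(cdf, threshold)), 1)
    clipped = np.minimum(img16.astype(np.float64), max_in)  # pixels above threshold set to it
    return (clipped / max_in * 255.0).astype(np.uint8)      # min_in is taken as zero
```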
(1b) The test data are resampled with a dilated (overlapping) sampling scheme, and edge predictions are ignored during stitching: the prediction result of an actually cropped image has size A, only the central region a is kept when stitching, and the percentage of the area of A occupied by a is r, so the overlap ratio between adjacent cropped images is
overlap = (A − a) / A = 1 − √r
The dilation boundary slidesize, i.e. the per-side margin between A and a, is set to 100;
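A minimal sketch of the dilated-sampling prediction and center-crop stitching of (1b) is given below; the tile size 512 and per-side margin 100 follow the embodiment, while the function signature, array layout and border handling are assumptions.

```python
import numpy as np

def predict_large_scene(predict_tile, image, tile=512, margin=100):
    """Predict a large scene tile by tile, keeping only each tile's central region."""
    h, w = image.shape[:2]             # assumes the scene is at least one tile in each dimension
    keep = tile - 2 * margin           # 512 - 2*100 = 312 center kept per tile
    out = np.zeros((h, w), dtype=np.int64)
    for y in range(0, h, keep):
        for x in range(0, w, keep):
            y0, x0 = min(y, h - tile), min(x, w - tile)   # clamp the tile inside the image
            pred = predict_tile(image[y0:y0 + tile, x0:x0 + tile])
            out[y0 + margin:y0 + margin + keep,
                x0 + margin:x0 + margin + keep] = pred[margin:margin + keep,
                                                       margin:margin + keep]
    # The outermost `margin` pixels of the scene are not written here; in practice the
    # scene would be padded first so that the border is also covered.
    return out
```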
(1c) Based on the AI development platform ModelArts and its self-developed framework MindSpore, image data in jpg, png and tif formats are converted into the MindRecord format, and the data are then read through the MindDataset interface. This data format has the following characteristics: unified storage and access of data; aggregated storage and efficient reading, which makes the data easy to manage and move during training; efficient data encoding and decoding operations that are transparent to the user; and flexible control of the partition size for data splitting, enabling distributed data processing.
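The MindRecord conversion and MindDataset reading of (1c) can be sketched as follows; the schema field names and file names are assumptions, and only the FileWriter / MindDataset interfaces themselves come from MindSpore.

```python
from mindspore.mindrecord import FileWriter
import mindspore.dataset as ds

def convert_to_mindrecord(pairs, out_file="sar_train.mindrecord"):
    """Write (image, label) patch pairs into a MindRecord file (sketch of step (1c))."""
    writer = FileWriter(file_name=out_file, shard_num=1)
    schema = {"image": {"type": "bytes"}, "label": {"type": "bytes"}}
    writer.add_schema(schema, "sar_segmentation")
    records = [{"image": img.tobytes(), "label": lbl.tobytes()} for img, lbl in pairs]
    writer.write_raw_data(records)
    writer.commit()

# Reading back through the MindDataset interface:
# dataset = ds.MindDataset("sar_train.mindrecord", columns_list=["image", "label"], shuffle=True)
```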
The S2 specifically includes:
(2a) The translation network uses the classical CycleGAN: bidirectional mapping generators G and F are established between the source-domain image S and the target-domain image T, and two discriminators D_S and D_T distinguish the source-domain images S from F(T) and the target-domain images T from G(S), respectively; the loss function consists of an adversarial loss and a cycle-consistency loss. In addition, let X_S denote the sample space of the source domain S and X_T the sample space of the target domain T;
(2b) Adversarial loss: it makes the distribution of the mapped data approach that of the target domain; generator G learns the mapping from the source-domain image S to the target-domain image T (G: S → T), and generator F learns the mapping from the target-domain image T to the source-domain image S (F: T → S);
the adversarial loss for S → T is:
l_GAN(G, D_T, S, T) = E_{t∼X_T}[log D_T(t)] + E_{s∼X_S}[log(1 − D_T(G(s)))]
where G(s) is the fake image, similar to the target domain T, generated by generator G, and D_T represents the probability that the input variable is a sample from the T space, aiming to distinguish the translated samples G(s) from the real samples t; the goal is to minimize over G and maximize over D_T;
the adversarial loss for T → S is:
l_GAN(F, D_S, S, T) = E_{s∼X_S}[log D_S(s)] + E_{t∼X_T}[log(1 − D_S(F(t)))]
where F(t) is the fake image, similar to the source domain S, generated by generator F, and D_S represents the probability that the input variable is a sample from the S space, aiming to distinguish the translated samples F(t) from the real samples s; the goal is to minimize over F and maximize over D_S;
(2c) Cycle-consistency loss: it ensures that the generators G and F of the two learned mappings do not contradict each other. While learning the two mappings, G(F(t)) should be as similar to t as possible and F(G(s)) as similar to s as possible, which prevents generator G from over-fitting the samples of the target-domain space T and over-changing the samples of the source-domain space S; the L1 loss is used:
l_cyc(G, F) = E_{s∼X_S}[||F(G(s)) − s||_1] + E_{t∼X_T}[||G(F(t)) − t||_1]
For each image s from the source domain S, G and F satisfy forward cycle consistency: after one image-translation cycle, s is brought back to the original image, i.e. s → G(s) → F(G(s)) ≈ s; similarly, for each image t of the target domain T, G and F should also satisfy backward cycle consistency, i.e. t → F(t) → G(F(t)) ≈ t;
(2d) Final loss function:
l(G, F, D_S, D_T) = l_GAN(G, D_T, S, T) + l_GAN(F, D_S, S, T) + λ·l_cyc(G, F)
i.e. the final overall loss is the sum of the adversarial losses for S → T and T → S and the cycle-consistency loss of generators G and F, where λ is a weighting coefficient;
the final objective is to optimize:
G*, F* = arg min_{G,F} max_{D_S,D_T} l(G, F, D_S, D_T)
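The generator side of the CycleGAN objective in (2b)–(2d) can be written compactly as below. This is a PyTorch-style sketch for illustration only (the patent's implementation runs on MindSpore); the BCE-with-logits form of the GAN term and the default λ = 10 are assumptions.

```python
import torch
import torch.nn.functional as nnf

def cyclegan_generator_loss(G, F_gen, D_S, D_T, s, t, lam=10.0):
    """Adversarial + cycle-consistency loss for the two generators (illustrative sketch)."""
    fake_t = G(s)          # G: S -> T
    fake_s = F_gen(t)      # F: T -> S
    d_fake_t = D_T(fake_t)
    d_fake_s = D_S(fake_s)
    # Adversarial terms: the generators try to make the discriminators output "real" (= 1).
    adv = nnf.binary_cross_entropy_with_logits(d_fake_t, torch.ones_like(d_fake_t)) \
        + nnf.binary_cross_entropy_with_logits(d_fake_s, torch.ones_like(d_fake_s))
    # Cycle consistency (L1): F(G(s)) should reconstruct s and G(F(t)) should reconstruct t.
    cyc = nnf.l1_loss(F_gen(fake_t), s) + nnf.l1_loss(G(fake_s), t)
    return adv + lam * cyc
```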
the S3 specifically includes:
(3a) The segmentation network M uses the DeepLabv2 model architecture based on ResNet101; it outputs spatially structured information and keeps learning domain features so that the discriminator cannot distinguish the two domains. The ASPP (Atrous Spatial Pyramid Pooling) module uses multiple dilation rates to enlarge the receptive field from k (ordinary convolution) to k + (k − 1)(r − 1), where r is the dilation rate;
(3b) The domain discrimination network D consists of an input layer, five convolutional layers and an activation-function layer; the convolutional layers use 2-D convolutions, and the activation layers use LeakyReLU with an α coefficient of 0.2. LeakyReLU addresses the zero-gradient problem for negative inputs by giving a negative input x a small linear component αx, so that the gradient is α when x < 0, alleviating the dead-ReLU problem;
for layer 1, the input layer, the number of feature maps is set to 5;
for layer 2, a convolutional layer, the number of feature maps is set to 64, the filter size to 4 and the stride to 2;
for layer 3, a convolutional layer, the number of feature maps is set to 128, the filter size to 4 and the stride to 2;
for layer 4, a convolutional layer, the number of feature maps is set to 256, the filter size to 4 and the stride to 2;
for layer 5, a convolutional layer, the number of feature maps is set to 512, the filter size to 4 and the stride to 2;
for layer 6, a convolutional layer, the number of feature maps is set to 1, the filter size to 4 and the stride to 2;
for layer 7, the activation-function layer, the α coefficient is set to 0.2;
(3c) For the segmentation network M, the maximum number of iterations is set to 56000, the initial learning rate lr to 2.5e-4 and the weight decay to 5e-4; stochastic gradient descent (SGD) is used to minimize the loss function of the segmentation network M;
(3d) for the domain discrimination network D, the initial learning rate is set to 1e-4 and the adversarial-loss coefficient λ_adv to 0.001; adaptive moment estimation (Adam) is used to minimize the loss function of the domain discrimination network D.
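The domain discrimination network D of (3b) can be sketched as follows (PyTorch-style, for illustration only; the padding values and the placement of a LeakyReLU after each convolution are assumptions). The optimizers of (3c)/(3d) would then be created analogously, e.g. SGD with lr 2.5e-4 and weight decay 5e-4 for M and Adam with lr 1e-4 for D.

```python
import torch.nn as nn

class DomainDiscriminator(nn.Module):
    """Fully convolutional domain discriminator following the layer sizes above (sketch)."""
    def __init__(self, in_channels=5, ndf=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_channels, ndf, kernel_size=4, stride=2, padding=1),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(ndf, ndf * 2, kernel_size=4, stride=2, padding=1),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(ndf * 2, ndf * 4, kernel_size=4, stride=2, padding=1),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(ndf * 4, ndf * 8, kernel_size=4, stride=2, padding=1),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(ndf * 8, 1, kernel_size=4, stride=2, padding=1),  # 1-channel domain map
        )

    def forward(self, x):
        return self.net(x)
```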
The segmentation loss l_seg in S4 uses the cross-entropy loss; the source-domain segmentation loss is defined as follows:
l_seg(M(S'), Y_S) = −Σ_{h,w} Σ_{c∈C} Y_S^{(h,w,c)} · log P_S^{(h,w,c)}
where Y_S is the label map of I'_S, C is the number of classes, H and W are the height and width of the output probability map, and P_S is the source-domain probability output of the segmentation adaptation model M, defined as P_S = M(I'_S).
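In practice this pixel-wise cross-entropy can be computed with a standard library call; a minimal sketch, in which the ignore_index used to mask invalid pixels is an assumption:

```python
import torch.nn.functional as nnf

def segmentation_loss(logits, labels, ignore_index=255):
    # logits: (N, C, H, W) raw scores from M; labels: (N, H, W) class indices Y_S.
    return nnf.cross_entropy(logits, labels, ignore_index=ignore_index)
```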
The adversarial loss in S5 is defined as follows:
l_adv(M(S'), M(T)) = −Σ_{h,w} log D(M(I_t))^{(h,w)}
where X_S denotes the sample space of the source domain S and X_T
the sample space of the target domain T; I'_s and I_t denote the input translated source-domain sample and the target-domain sample, respectively; this adversarial loss, computed against the discriminator D used for adversarial learning of M, is intended to reduce the difference between the source-domain and target-domain features extracted by the segmentation network M;
the total loss function for learning the segmentation network M is defined as follows:
l_M = λ_adv · l_adv(M(S'), M(T)) + l_seg(M(S'), Y_S)
where λ_adv denotes the coefficient of the adversarial loss; the total loss for training the segmentation network M is the sum of the adversarial loss l_adv and the segmentation loss l_seg.
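Steps S4–S5 combine the source segmentation loss and the adversarial term when updating M; a minimal PyTorch-style sketch, assuming softmax probabilities are fed to D and that D's "source" label is 1:

```python
import torch
import torch.nn.functional as nnf

def update_segmentation_network(M, D, opt_M, s_prime, y_s, t, lambda_adv=0.001):
    opt_M.zero_grad()
    pred_s = M(s_prime)                               # M(S'), shape (N, C, H, W)
    loss_seg = nnf.cross_entropy(pred_s, y_s)         # source-domain segmentation loss
    pred_t = M(t)                                     # M(T)
    d_out_t = D(torch.softmax(pred_t, dim=1))         # discriminator map on the target output
    # Adversarial term: push D to believe the target output comes from the source domain (label 1).
    loss_adv = nnf.binary_cross_entropy_with_logits(d_out_t, torch.ones_like(d_out_t))
    (loss_seg + lambda_adv * loss_adv).backward()
    opt_M.step()
    return loss_seg.item(), loss_adv.item()
```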
The domain discrimination network D in S6 uses the BCE loss, defined as follows:
l_D(M(S'), M(T)) = −Σ_{h,w} [ log D(M(S'))^{(h,w)} + log(1 − D(M(T))^{(h,w)}) ]
where S' denotes the translated source-domain data and T the target-domain data, and M(S') and M(T) are the outputs of the segmentation network M on the source domain and the target domain; the domain discrimination network D is intended to distinguish whether its input comes from the source domain or the target domain.
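Step S6 then updates the domain discrimination network D on detached segmentation outputs; the 1 = source / 0 = target convention matches the formula above, and the sketch is again PyTorch-style for illustration only:

```python
import torch
import torch.nn.functional as nnf

def update_domain_discriminator(M, D, opt_D, s_prime, t):
    opt_D.zero_grad()
    with torch.no_grad():                              # keep this step's gradients out of M
        p_s = torch.softmax(M(s_prime), dim=1)         # M(S')
        p_t = torch.softmax(M(t), dim=1)               # M(T)
    d_s, d_t = D(p_s), D(p_t)
    loss_d = nnf.binary_cross_entropy_with_logits(d_s, torch.ones_like(d_s)) \
           + nnf.binary_cross_entropy_with_logits(d_t, torch.zeros_like(d_t))
    loss_d.backward()
    opt_D.step()
    return loss_d.item()
```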
Test-time augmentation (TTA) in S8: TTA has the model predict each image on augmented copies obtained by flipping the test input image vertically and horizontally (and both), then collects the set of predictions and obtains the final result by averaging the predictions of the original and flipped images;
conditional random field (CRF): an undirected graphical model that models the conditional distribution; when computing the label of a pixel it uses the information of the neighboring pixels, making the segmentation result more accurate. The "condition" in the conditional random field is a conditional probability, namely the probability that the current pixel belongs to a certain class given the gray values of the current pixel and the pixels in its surrounding area; the conditional probability distribution is specifically a Gibbs distribution, i.e. the conditional probability that the classes of all pixels on the image take a given assignment: P = exp(−E)/Z,
where exp(−E) scores the current segmentation result of the pixels and Z is the normalization term (partition function) so that the probabilities over all possible assignments sum to one.
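The TTA averaging in S8 can be sketched as follows for a (C, H, W) probability output; the flip set and array layout are assumptions:

```python
import numpy as np

def predict_with_tta(predict_fn, image):
    # predict_fn maps an image (C, H, W) to per-class probabilities (num_classes, H, W).
    variants = [
        (image,                lambda p: p),                 # original
        (image[:, :, ::-1],    lambda p: p[:, :, ::-1]),     # horizontal flip
        (image[:, ::-1, :],    lambda p: p[:, ::-1, :]),     # vertical flip
        (image[:, ::-1, ::-1], lambda p: p[:, ::-1, ::-1]),  # both flips
    ]
    preds = [undo(predict_fn(np.ascontiguousarray(img))) for img, undo in variants]
    return np.mean(preds, axis=0)                            # averaged probability map
```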
Compared with the prior art, the invention has at least the following beneficial effects:
the invention relates to a high-resolution SAR image surface feature element extraction method based on depth unsupervised multistep anti-domain self-adaptation, which uses labeled source domain data and unlabeled target domain data to train a network, and transfers knowledge from a source domain to different but related target domains, thereby solving the problem that a training sample in the target domain is unlabeled or the labeling is limited and is not enough to support network training due to difficult and expensive labeling of single-polarized high-resolution SAR data. By means of the style migration upstream task, the source domain image is translated into the style of the target domain, the distribution of the source domain and the target domain is drawn, the domain gap is reduced, the downstream task is easier to learn, the translated source domain and the unmarked target domain data are sent to an impedance adaptive network, a feature extractor is trained, the features of the source domain and the target domain are extracted and classified, a domain discriminator is trained to distinguish whether the output of the feature extractor comes from the source domain or the target domain, meanwhile, the feature extractor is encouraged to align the output distribution of the target domain and the source domain, and the feature extractor is effectively helped to learn more effective domain invariant features. The invention is based on depth unsupervised, uses a typical generator-discriminator structure, wherein an encoder adopts a DeepLabv2 model architecture based on ResNet101, is used for outputting spatial structured information, is used as a part of countermeasure loss and returns to a generator for obtaining effective domain invariant features, prevents training deviation, improves the classification accuracy when training samples are insufficient, and can be used for classification and change detection.
Drawings
FIG. 1 is a flow chart of the present invention.
FIG. 2 is a flow chart of data reading in hardware according to the present invention.
FIG. 3 shows the source-domain image and the target-domain image of the embodiment.
FIG. 4 shows the translated source-domain image (in target-domain style) and the translated target-domain image (in source-domain style) produced by the style-transfer network in the embodiment.
FIG. 5 is the target-domain label map of the embodiment.
FIG. 6 shows the target-domain test result of the single-step adversarial domain adaptation method in the embodiment.
FIG. 7 shows the target-domain test result of the proposed method in the embodiment.
Detailed Description
The present invention is described in further detail below with reference to the attached drawings.
Referring to fig. 1, a method for extracting ground-feature elements of a high-resolution SAR image based on deep unsupervised multi-step adversarial domain adaptation comprises the following steps:
S1, performing data preprocessing on the source-domain image and the target-domain image, including 16-bit to 8-bit conversion of the SAR images, truncation, cropping, dataset splitting and data-format conversion;
(1a) Storage conversion and truncation: the acquired panoramic SAR images are 16-bit unsigned integer data, most of which are concentrated in the range [0, 500]. If the images are read directly with the cv2 or PIL libraries and compressed from 65536 gray levels into a 256-gray-level range, the data are squeezed into the first few small gray levels, so a correct image cannot be displayed and the neural network cannot learn. Therefore the images must be truncated/contrast-stretched, discarding the gray levels with low occurrence probability and retaining the gray-level range with high occurrence probability. The gray-level distribution of the 16-bit SAR image is counted, the occurrence frequency of each gray level is accumulated in order of gray level, and when the cumulative distribution function reaches a threshold (Threshold), the remaining pixels are discarded: all pixels above the threshold gray level are set to the gray level at the threshold, then divided by that gray level and multiplied by 255 to be stored as 8-bit SAR data;
the linear stretching formula is:
gray_out = (gray_in − min_in) / (max_in − min_in) × (max_out − min_out) + min_out
where gray denotes the gray level; min_in and max_in denote respectively the minimum and maximum gray levels of the retained (truncated) range in the input format; min_out and max_out denote respectively the minimum and maximum gray levels of the output format; for SAR data, the Threshold is generally set to 95% and min_in is set to zero;
(1b) In the prediction process of large-scene semantic segmentation, directly feeding the large remote sensing image to be classified into the network model would overflow the memory, so the image to be classified is generally cropped into a series of small images that are fed into the network one by one for prediction, and the prediction results are then stitched back in the cropping order to form the final result image. If conventional regular-grid cropping, prediction and stitching is adopted, obvious stitching traces between blocks are unavoidable, because there is less information at the block boundaries and the network cannot accurately estimate their categories. To address this phenomenon the invention resamples the test data with a dilated (overlapping) sampling scheme and ignores edge predictions during stitching: the prediction result of an actually cropped image has size A, only the central region a is kept when stitching, and the percentage of the area of A occupied by a is r, so the overlap ratio between adjacent cropped images is
overlap = (A − a) / A = 1 − √r
From experimental experience, the dilation boundary slidesize, i.e. the per-side margin between A and a, is usually set to 100;
(1c) Data are the foundation of deep learning, and high-quality data input benefits the whole deep neural network. Since the experiments are based on Huawei's AI development platform ModelArts and its self-developed framework MindSpore, the invention converts image data in jpg, png, tif and similar formats into the MindRecord format according to the characteristics of the data, and reads the data through the MindDataset interface, as shown in FIG. 2. This data format has the following characteristics: unified storage and access of data, which makes data reading during training simpler and more convenient; aggregated storage and efficient reading, which makes the data easy to manage and move during training; efficient data encoding and decoding operations that are transparent to the user; and flexible control of the partition size for data splitting, enabling distributed data processing;
S2, feeding the preprocessed source-domain image S and target-domain image T into an image translation network for style transfer to obtain translated source data S', drawing the data distributions of the source domain and the target domain closer and reducing the domain gap so that the downstream task is easier to learn;
(2a) The translation network uses the classical CycleGAN, a ring structure that mainly consists of two generators and two discriminators: bidirectional mapping generators G and F are established between the source-domain image S and the target-domain image T, and two discriminators D_S and D_T distinguish the source-domain images S from F(T) and the target-domain images T from G(S), respectively; the loss function consists of an adversarial loss and a cycle-consistency loss. In addition, let X_S denote the sample space of the source domain S and X_T the sample space of the target domain T;
(2b) Adversarial loss: it makes the distribution of the mapped data approach that of the target domain; generator G learns the mapping from the source-domain image S to the target-domain image T (G: S → T), and generator F learns the mapping from the target-domain image T to the source-domain image S (F: T → S);
the adversarial loss for S → T is:
l_GAN(G, D_T, S, T) = E_{t∼X_T}[log D_T(t)] + E_{s∼X_S}[log(1 − D_T(G(s)))]
where G(s) is the fake image, similar to the target domain T, generated by generator G, and D_T represents the probability that the input variable is a sample from the T space, aiming to distinguish the translated samples G(s) from the real samples t; the goal is to minimize over G and maximize over D_T;
the adversarial loss for T → S is:
l_GAN(F, D_S, S, T) = E_{s∼X_S}[log D_S(s)] + E_{t∼X_T}[log(1 − D_S(F(t)))]
where F(t) is the fake image, similar to the source domain S, generated by generator F, and D_S represents the probability that the input variable is a sample from the S space, aiming to distinguish the translated samples F(t) from the real samples s; the goal is to minimize over F and maximize over D_S;
(2c) Cycle-consistency loss: it ensures that the generators G and F of the two learned mappings do not contradict each other. While learning the two mappings, G(F(t)) should be as similar to t as possible and F(G(s)) as similar to s as possible, which prevents generator G from over-fitting the samples of the target-domain space T and over-changing the samples of the source-domain space S; the L1 loss is used:
l_cyc(G, F) = E_{s∼X_S}[||F(G(s)) − s||_1] + E_{t∼X_T}[||G(F(t)) − t||_1]
For each image s from the source domain S, G and F satisfy forward cycle consistency: after one image-translation cycle, s is brought back to the original image, i.e. s → G(s) → F(G(s)) ≈ s; similarly, for each image t of the target domain T, G and F should also satisfy backward cycle consistency, i.e. t → F(t) → G(F(t)) ≈ t;
(2d) Final loss function:
l(G, F, D_S, D_T) = l_GAN(G, D_T, S, T) + l_GAN(F, D_S, S, T) + λ·l_cyc(G, F)
i.e. the final overall loss is the sum of the adversarial losses for S → T and T → S and the cycle-consistency loss of generators G and F, where λ is a weighting coefficient;
the final objective is to optimize:
G*, F* = arg min_{G,F} max_{D_S,D_T} l(G, F, D_S, D_T)
s3, initializing a segmentation network M of a downstream task and an optimizer SGD thereof, and initializing a domain discrimination network D and an optimizer Adam thereof;
(3a) The segmentation network M uses the DeepLabv2 model architecture based on ResNet101; it outputs spatially structured information and keeps learning domain features so that the discriminator cannot distinguish the two domains. The ASPP (Atrous Spatial Pyramid Pooling) module uses multiple dilation rates to enlarge the receptive field from k (ordinary convolution) to k + (k − 1)(r − 1), where r is the dilation rate;
(3b) The domain discrimination network D consists of an input layer, five convolutional layers and an activation-function layer; the convolutional layers use 2-D convolutions, and the activation layers use LeakyReLU with an α coefficient of 0.2. LeakyReLU addresses the zero-gradient problem for negative inputs by giving a negative input x a small linear component αx, so that the gradient is α when x < 0, alleviating the dead-ReLU problem to some extent. The domain discrimination network D is trained to distinguish whether the output of the feature extractor comes from the source domain or the target domain, while encouraging the feature extractor to align the output distributions of the target domain and the source domain, which effectively helps the feature extractor learn more effective domain-invariant features;
for layer 1, the input layer, the number of feature maps is set to 5;
for layer 2, a convolutional layer, the number of feature maps is set to 64, the filter size to 4 and the stride to 2;
for layer 3, a convolutional layer, the number of feature maps is set to 128, the filter size to 4 and the stride to 2;
for layer 4, a convolutional layer, the number of feature maps is set to 256, the filter size to 4 and the stride to 2;
for layer 5, a convolutional layer, the number of feature maps is set to 512, the filter size to 4 and the stride to 2;
for layer 6, a convolutional layer, the number of feature maps is set to 1, the filter size to 4 and the stride to 2;
for layer 7, the activation-function layer, the α coefficient is set to 0.2;
(3c) For the segmentation network M, the maximum number of iterations is set to 56000, the initial learning rate lr to 2.5e-4 and the weight decay to 5e-4; stochastic gradient descent (SGD) is used to minimize the loss function of the segmentation network M;
(3d) for the domain discrimination network D, the initial learning rate is set to 1e-4 and the adversarial-loss coefficient λ_adv to 0.001; adaptive moment estimation (Adam) is used to minimize the loss function of the domain discrimination network D;
S4, feeding the translated source data S', the corresponding labels Y_S and the target-domain image T into the segmentation network M to obtain segmentation outputs M(S') and M(T), and calculating the source-domain segmentation loss using the corresponding labels Y_S;
where the segmentation loss l_seg uses the cross-entropy loss, and the source-domain segmentation loss is defined as follows:
l_seg(M(S'), Y_S) = −Σ_{h,w} Σ_{c∈C} Y_S^{(h,w,c)} · log P_S^{(h,w,c)}
where Y_S is the label map of I'_S, C is the number of classes, H and W are the height and width of the output probability map, and P_S is the source-domain probability output of the segmentation adaptation model M, defined as P_S = M(I'_S);
S5, inputting the output M(T) of the segmentation network M on the target domain into the domain discrimination network D, calculating the adversarial loss against the discrimination network D, multiplying it by the corresponding coefficient, adding it to the segmentation loss, and updating the segmentation network M and its optimizer SGD;
where the adversarial loss is defined as follows:
l_adv(M(S'), M(T)) = −Σ_{h,w} log D(M(I_t))^{(h,w)}
where X_S denotes the sample space of the source domain S and X_T the sample space of the target domain T; I'_s and I_t denote the input translated source-domain sample and the target-domain sample, respectively; this adversarial loss, computed against the discriminator D used for adversarial learning of M, aims to reduce the difference between the source-domain and target-domain features extracted by the segmentation network M;
thus, the total loss function for learning the segmentation network M can be defined as follows:
l_M = λ_adv · l_adv(M(S'), M(T)) + l_seg(M(S'), Y_S)
where λ_adv denotes the coefficient of the adversarial loss; the total loss for training the segmentation network M is the sum of the adversarial loss l_adv and the segmentation loss l_seg;
S6, feeding the segmentation-network outputs M(S') and M(T) into the domain discrimination network D respectively to calculate the domain classification loss, and updating the domain discrimination network D and its optimizer Adam to further reduce the domain gap;
where the domain discrimination network D uses the BCE loss, defined as follows:
l_D(M(S'), M(T)) = −Σ_{h,w} [ log D(M(S'))^{(h,w)} + log(1 − D(M(T))^{(h,w)}) ]
where S' denotes the translated source-domain data and T the target-domain data, and M(S') and M(T) are the outputs of the segmentation network M on the source domain and the target domain; the domain discrimination network D is intended to distinguish whether its input comes from the source domain or the target domain;
S7, repeating S4 to S6 until the maximum number of training iterations is reached, obtaining the model parameters of the segmentation network M;
S8, feeding the target-domain data into the trained segmentation network M for classification, then performing label optimization with TTA testing or a trained CRF to obtain pixel-level classification results; each class is assigned a color to generate an RGB prediction map, which is compared with the ground-truth classes, and the per-class evaluation indexes Precision, Recall and F1 score and the overall evaluation indexes OA, Kappa, MIoU and FWIoU are calculated;
(8a) Test-time data augmentation (TTA): test-time augmentation is used to improve the prediction result; TTA and CRF are both means of suppressing noise in the test result. TTA has the model predict each image on augmented copies obtained by flipping the test input image vertically and horizontally (and both), then collects the set of predictions and obtains the final result by averaging the predictions of the original and flipped images;
(8b) Conditional random field (CRF): an undirected graphical model that models the conditional distribution; its core idea is that when computing the label of a pixel it uses the information of the neighboring pixels, making the segmentation result more accurate. The "condition" in the conditional random field is a conditional probability, namely the probability that the current pixel belongs to a certain class given the gray values of the current pixel and the pixels in its surrounding area; the conditional probability distribution is specifically a Gibbs distribution, i.e. the conditional probability that the classes of all pixels on the image take a given assignment: P = exp(−E)/Z,
where exp(−E) scores the current segmentation result of the pixels and Z is the normalization term (partition function) so that the probabilities over all possible assignments sum to one.
The effects of the present invention can be further illustrated by the following simulations:
1. Simulation conditions: the hardware test platform used in the simulation experiments is the ModelArts cloud AI development platform with an Ascend-910 (32 GB) accelerator card | ARM: 24 cores, 96 GB; the software platform is Python 3.7 and MindSpore 1.6.0; the operating system is EulerOS V2R8 aarch64 (64-bit).
The single-polarization high-resolution SAR data used in the simulation experiments are two private data sets acquired by the GF-3 satellite; the radar parameters are C band, VV polarization and 1 m resolution. The data sets are Dongyong, China (shandong, 10240 × 9216) and Korea (korea, 9728 × 7680), and contain six types of ground objects: the invalid category (others), buildings, water bodies, cultivated land, vegetation and roads.
During training, each data set is seamlessly cropped into 512 × 512 patches; a labeled data set is taken as the source domain and an unlabeled data set as the target domain and fed into the network for training. During testing, the target-domain data set is cropped with the dilated sampling scheme with a dilation size of 100: the patches are still 512 × 512, but only the central 312 × 312 result of each test patch is kept, and finally all test patches are stitched together to obtain the final test result. The detailed data information is given in Tables 1 and 2.
Table 1: Data set information
[Table 1 is provided as an image in the original publication.]
Table 2: region class-RGB value comparison table
Category:   Others    Water area   Tree/green   Buildings   Farmland      Road
RGB value:  [0,0,0]   [0,0,255]    [0,255,0]    [255,0,0]   [255,255,0]   [210,180,140]
The evaluation indexes used in the simulation experiments are defined as follows: i denotes the positive class and j a negative class; p_ii denotes the total number of pixels whose true class is i and which are identified as class i, i.e. true positives (TP); p_ij denotes the total number of pixels whose true class is j and which are identified as class i, i.e. false positives (FP); p_ji denotes the total number of pixels whose true class is i and which are identified as class j, i.e. false negatives (FN); p_jj denotes the total number of pixels whose true class is j and which are identified as class j, i.e. true negatives (TN).
Precision: the proportion of correct predictions among all positive predictions, defined as follows:
Precision = TP / (TP + FP) = p_ii / (p_ii + p_ij)
Recall: the proportion of positive samples that are correctly predicted as positive, defined as follows:
Recall = TP / (TP + FN) = p_ii / (p_ii + p_ji)
F1 value: the harmonic mean of recall and precision, defined as follows:
F1 = 2 × Precision × Recall / (Precision + Recall)
Overall accuracy (OA): the proportion of correctly labeled pixels among all pixels, defined as follows:
OA = Σ_i p_ii / Σ_i Σ_j p_ij
Kappa coefficient: penalizes the "bias" of the model to obtain a fairer evaluation, defined as follows:
Kappa = (p_o − p_e) / (1 − p_e),   with p_e = (a_1·b_1 + a_2·b_2 + … + a_C·b_C) / n²
where p_o denotes the ratio of the number of correctly classified samples of each class to the total number of samples, equivalent to OA; a_1, a_2, …, a_C denote the numbers of true samples of each class; b_1, b_2, …, b_C denote the numbers of predicted samples of each class; the number of classes is C, and the total number of samples is n.
Mean intersection over union (MIoU): the average over all classes of the intersection-over-union ratio between the predicted result and the ground truth, defined as follows:
MIoU = (1/C) · Σ_i p_ii / (Σ_j p_ij + Σ_j p_ji − p_ii)
Frequency-weighted intersection over union (FWIoU): an improvement of MIoU in which weights are set according to the occurrence frequency of each class, defined as follows:
FWIoU = Σ_i [ (Σ_j p_ji) / (Σ_k Σ_l p_kl) ] · p_ii / (Σ_j p_ij + Σ_j p_ji − p_ii)
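All of the overall indexes above can be computed from a single confusion matrix; a NumPy sketch, assuming conf[i, j] counts pixels of true class i predicted as class j (the per-class Precision/Recall/F1 follow from the same counts):

```python
import numpy as np

def overall_indexes(conf):
    # conf[i, j]: number of pixels whose true class is i and predicted class is j.
    total = conf.sum()
    tp = np.diag(conf)
    oa = tp.sum() / total                                          # overall accuracy
    pe = (conf.sum(axis=0) * conf.sum(axis=1)).sum() / total ** 2  # chance agreement
    kappa = (oa - pe) / (1 - pe)
    union = conf.sum(axis=0) + conf.sum(axis=1) - tp
    iou = tp / np.maximum(union, 1)
    miou = iou.mean()                                              # mean IoU
    freq = conf.sum(axis=1) / total                                # true-class frequencies
    fwiou = (freq * iou).sum()                                     # frequency-weighted IoU
    return oa, kappa, miou, fwiou
```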
2. Simulation experiment contents: on the private data sets, the invention and a classical single-step adversarial domain adaptation algorithm are used respectively to complete ground-feature element extraction for single-polarization high-resolution SAR images across regions with the same radar and the same resolution (with large intra-class differences), and the relevant evaluation indexes are calculated. FIG. 3 shows the original SAR images of the source domain korea and the target domain shandong in the korea → shandong experiment; FIG. 4 shows the translated source-domain korea (shandong-style) SAR image and the translated target-domain shandong (korea-style) SAR image in the korea → shandong experiment. Comparing the two sets of images in FIG. 3 and FIG. 4, it can be clearly seen that the translated image (a) in FIG. 4 is closer to the brightness and texture of image (b) in FIG. 3, and the translated image (b) is closer to the brightness and texture of image (a) in FIG. 3, which further draws the distributions of the two domains closer. The results of the various groups of experiments are shown in Tables 3 and 4 below:
table 3: shandong- > korea same-resolution cross-region simulation result comparison table with radar
Evaluation index OA Kappa MIoU FWIoU
Model_1 0.6694 0.5372 0.3948 0.5385
Model_2 0.7355 0.6262 0.4553 0.6053
Table 4: korea → shandong (same radar, same resolution, cross-region) simulation result comparison
Evaluation index   OA       Kappa    MIoU     FWIoU
Model_1            0.5888   0.4430   0.3414   0.4347
Model_2            0.8137   0.7520   0.5626   0.7029
3. Analysis of experimental results: as can be seen from Table 3, on the cross-region data sets with the same radar and resolution, the shandong → korea domain adaptation reaches an overall accuracy (OA) of 73.55% and an MIoU of 45.53%, an accuracy improvement of 5%; the effect is even more remarkable for the korea → shandong domain adaptation, where the OA reaches 81.37% and the MIoU 56.26%, an accuracy improvement of 22%. In shandong → korea, farmland and vegetation are easily confused, and the method markedly improves the green-vegetation class. In korea → shandong, shandong is a typical Chinese town image in which water areas, cultivated land and buildings are interlaced and more complicated; the method distinguishes vegetation and buildings better, with better regional consistency and clearer edge information. The label map of this area is shown in FIG. 5, the Model_1 prediction map in FIG. 6 and the Model_2 prediction map in FIG. 7; it can clearly be seen that FIG. 7 is closer to the label map than FIG. 6, with more accurate segmentation and clearer boundaries.
Combining the analysis of the simulation results, the high-resolution SAR image ground-feature element extraction method based on deep unsupervised multi-step adversarial domain adaptation effectively alleviates the poor prediction performance of a model on test data that do not satisfy the independent-and-identically-distributed assumption caused by the domain shift of cross-region data; knowledge is transferred from the source domain to a different but related target domain, solving the problem that existing unlabeled SAR data cannot support network training, and improving the classification accuracy of the classifier on the unlabeled target domain.

Claims (9)

1. A high-resolution SAR image ground-feature element extraction method based on deep unsupervised multi-step adversarial domain adaptation, characterized in that: the source-domain image is translated into the style of the target domain through a style-transfer upstream task, drawing the distributions of the source domain and the target domain closer; the translated source-domain data and the unlabeled target-domain data are fed into an adversarial adaptation network, a feature extractor is trained to extract and classify the features of the source and target domains, and a domain discrimination network is trained to distinguish whether the output of the feature extractor comes from the source domain or the target domain and to encourage the feature extractor to align the output distributions of the target domain and the source domain.
2. The high-resolution SAR image ground-feature element extraction method based on deep unsupervised multi-step adversarial domain adaptation, characterized by comprising the following steps:
S1, performing data preprocessing on the source-domain image and the target-domain image, including 16-bit to 8-bit conversion of the SAR images, truncation, cropping, dataset splitting and data-format conversion;
S2, feeding the preprocessed source-domain image S and target-domain image T into an image translation network for style transfer to obtain translated source data S';
S3, initializing the segmentation network M of the downstream task and its optimizer SGD, and initializing the domain discrimination network D and its optimizer Adam; the domain discrimination network D is trained to distinguish whether the output of the feature extractor comes from the source domain or the target domain, while encouraging the feature extractor to align the output distributions of the target-domain and source-domain images, helping the feature extractor learn domain-invariant features;
S4, feeding the translated source data S', the corresponding labels Y_S and the target-domain image T into the segmentation network M to obtain segmentation outputs M(S') and M(T), and calculating the source-domain segmentation loss using the corresponding labels Y_S;
S5, inputting the output M(T) of the segmentation network M on the target domain into the domain discrimination network D, calculating the adversarial loss against the discrimination network D, multiplying it by the corresponding coefficient, adding it to the segmentation loss, and updating the segmentation network M and its optimizer SGD;
S6, feeding the segmentation-network outputs M(S') and M(T) into the domain discrimination network D respectively to calculate the domain classification loss, and updating the domain discrimination network D and its optimizer Adam;
S7, repeating S4 to S6 until the maximum number of training iterations is reached, obtaining the model parameters of the segmentation network M;
S8, feeding the target-domain data into the trained segmentation network M for classification, then performing label optimization with TTA testing or a trained CRF to obtain pixel-level classification results; each class is assigned a color to generate an RGB prediction map, which is compared with the ground-truth class labels, and the per-class evaluation indexes Precision, Recall and F1 score and the overall evaluation indexes OA, Kappa, MIoU and FWIoU are calculated.
3. The method according to claim 2, wherein S1 specifically is:
(1a) Storage conversion and truncation: the image is truncated/contrast-stretched, gray levels with low occurrence probability are discarded, and the gray-level range with high occurrence probability is retained; the gray-level distribution of the 16-bit SAR image is counted, the occurrence frequency of each gray level is accumulated in order of gray level, and when the cumulative distribution function reaches a threshold (Threshold), the remaining pixels are discarded: all pixels above the threshold gray level are set to the gray level at the threshold, then divided by that gray level and multiplied by 255 to be stored as 8-bit SAR data;
the linear stretching formula is:
gray_out = (gray_in − min_in) / (max_in − min_in) × (max_out − min_out) + min_out
where gray denotes the gray level; min_in and max_in denote respectively the minimum and maximum gray levels of the retained (truncated) range in the input format; min_out and max_out denote respectively the minimum and maximum gray levels of the output format; for SAR data, Threshold is set to 95% and min_in is set to zero;
(1b) The test data are resampled with a dilated (overlapping) sampling scheme, and edge predictions are ignored during stitching; the prediction result of an actually cropped image has size A, only the central region a is kept when stitching, and the percentage of the area of A occupied by a is r, so the overlap ratio between adjacent cropped images is
overlap = (A − a) / A = 1 − √r
The dilation boundary slidesize, i.e. the per-side margin between A and a, is set to 100;
(1c) Based on the AI development platform ModelArts and its self-developed framework MindSpore, image data in jpg, png and tif formats are converted into the MindRecord format, and the data are then read through the MindDataset interface. This data format has the following characteristics: unified storage and access of data; aggregated storage and efficient reading, which makes the data easy to manage and move during training; efficient data encoding and decoding operations that are transparent to the user; and flexible control of the partition size for data splitting, enabling distributed data processing.
4. The method according to claim 2, wherein S2 specifically is:
(2a) The translation network uses the classical CycleGAN: bidirectional mapping generators G and F are established between the source-domain image S and the target-domain image T, and two discriminators D_S and D_T distinguish the source-domain images S from F(T) and the target-domain images T from G(S), respectively; the loss function consists of an adversarial loss and a cycle-consistency loss. In addition, let X_S denote the sample space of the source domain S and X_T the sample space of the target domain T;
(2b) the resistance loss: the mapped data distribution is made to approach that of the target domain, and the generator G learns the mapping of the source domain image S to the target domain image T (G: S → T); the generator F learns the mapping of the target domain image T to the source domain image S (F: T → S);
the adversarial loss for S → T is:
l_GAN(G, D_T, S, T) = E_{t~p_data(t)}[log D_T(t)] + E_{s~p_data(s)}[log(1 - D_T(G(s)))]
wherein G(s) is a fake image similar to the target domain T generated by the generator G; D_T outputs the probability that its input is a sample of the T space and aims to distinguish the translated samples G(s) from the real samples t; the goal is to minimize this loss over G while maximizing it over D_T;
the adversarial loss for T → S is:
l_GAN(F, D_S, S, T) = E_{s~p_data(s)}[log D_S(s)] + E_{t~p_data(t)}[log(1 - D_S(F(t)))]
wherein F(t) is a fake image similar to the source domain S generated by the generator F; D_S outputs the probability that its input is a sample of the S space and aims to distinguish the translated samples F(t) from the real samples s; the goal is to minimize this loss over F while maximizing it over D_S;
(2c) cycle-consistency loss: it ensures that the two learned mappings G and F do not contradict each other; while learning the two mappings, it is desired that G(F(t)) is as close to t and F(G(s)) as close to s as possible, which prevents the generator G from over-fitting the sample space of the target domain image T and changing the source domain samples too much; an L1 loss is used:
l_cyc(G, F) = E_{s~p_data(s)}[||F(G(s)) - s||_1] + E_{t~p_data(t)}[||G(F(t)) - t||_1]
for each image s from the source domain S, G and F satisfy forward cycle consistency, bringing s back to the original image after one translation cycle, i.e. s → G(s) → F(G(s)) ≈ s; similarly, for each image t of the target domain T, G and F should also satisfy backward cycle consistency, i.e. t → F(t) → G(F(t)) ≈ t;
(2d) final loss function:
l(G, F, D_S, D_T) = l_GAN(G, D_T, S, T) + l_GAN(F, D_S, S, T) + λ · l_cyc(G, F)
the final overall loss is the sum of the adversarial losses for S → T and T → S and the cycle-consistency loss of generators G and F, where λ is a weighting coefficient;
the final goal is to optimize:
G*, F* = arg min_{G,F} max_{D_S,D_T} l(G, F, D_S, D_T)
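A minimal PyTorch sketch of the generator-side CycleGAN objective of claim 4; the network instances, the BCE form of the adversarial terms and the weight lam are illustrative assumptions, not part of the claim.

```python
import torch
import torch.nn.functional as F

def cyclegan_generator_loss(G, F_gen, D_S, D_T, s, t, lam=10.0):
    """Adversarial losses for S->T and T->S plus the L1 cycle-consistency loss.
    G: S->T generator, F_gen: T->S generator, D_S/D_T: discriminators returning probabilities."""
    fake_t, fake_s = G(s), F_gen(t)
    # generators try to make the discriminators output 1 on translated samples
    adv_g = F.binary_cross_entropy(D_T(fake_t), torch.ones_like(D_T(fake_t)))
    adv_f = F.binary_cross_entropy(D_S(fake_s), torch.ones_like(D_S(fake_s)))
    # forward and backward cycle consistency: F(G(s)) ~ s and G(F(t)) ~ t
    cyc = F.l1_loss(F_gen(fake_t), s) + F.l1_loss(G(fake_s), t)
    return adv_g + adv_f + lam * cyc
```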
5. The method according to claim 2, wherein S3 specifically is:
(3a) the segmentation network M uses a DeepLabv2 architecture based on ResNet101; it outputs spatially structured predictions and keeps learning domain-invariant features so that the discriminator cannot distinguish the two domains; the ASPP (Atrous Spatial Pyramid Pooling) module uses multiple dilation rates to enlarge the receptive field of an ordinary convolution from k to k + (k-1)(r-1), where r is the dilation rate;
(3b) the domain discrimination network D consists of an input layer, 5 convolutional layers and an activation function layer, wherein the convolutional layers use 2D convolution and the activation uses LeakyReLU with an alpha coefficient of 0.2; LeakyReLU addresses the zero-gradient problem for negative inputs by giving a negative input x a small linear component αx, so that the gradient is α instead of 0 when x < 0, alleviating the dead-ReLU problem;
for the 1st layer (input), set the number of feature maps to 5;
for the 2nd layer (convolutional), set the number of feature maps to 64, the filter size to 4 and the step size to 2;
for the 3rd layer (convolutional), set the number of feature maps to 128, the filter size to 4 and the step size to 2;
for the 4th layer (convolutional), set the number of feature maps to 256, the filter size to 4 and the step size to 2;
for the 5th layer (convolutional), set the number of feature maps to 512, the filter size to 4 and the step size to 2;
for the 6th layer (convolutional), set the number of feature maps to 1, the filter size to 4 and the step size to 2;
for the 7th layer (activation function), set the alpha coefficient to 0.2;
(3c) for the segmentation network M, the maximum number of iterations is set to 56000, the initial learning rate lr is set to 2.5e-4 and the weight decay to 5e-4; stochastic gradient descent (SGD) is used to minimize the loss function of the segmentation network M;
(3d) for the domain discrimination network D, the initial learning rate is set to 1e-4 and the adversarial loss coefficient λ_adv to 0.001; adaptive moment estimation (Adam) is used to minimize the loss function of the domain discrimination network D.
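A minimal PyTorch sketch consistent with the discriminator layout of (3b): a 5-channel input (one channel per class), five 4x4 stride-2 convolutions with 64/128/256/512/1 feature maps, and LeakyReLU(0.2) activations; the padding value and the helper name are illustrative assumptions.

```python
import torch.nn as nn

def build_domain_discriminator(num_classes=5, ndf=64):
    """Domain discrimination network D: stride-2 4x4 convolutions over the C-channel
    softmax output, LeakyReLU(0.2) activations, and a single-channel domain logit map."""
    return nn.Sequential(
        nn.Conv2d(num_classes, ndf, kernel_size=4, stride=2, padding=1),
        nn.LeakyReLU(0.2, inplace=True),
        nn.Conv2d(ndf, ndf * 2, kernel_size=4, stride=2, padding=1),
        nn.LeakyReLU(0.2, inplace=True),
        nn.Conv2d(ndf * 2, ndf * 4, kernel_size=4, stride=2, padding=1),
        nn.LeakyReLU(0.2, inplace=True),
        nn.Conv2d(ndf * 4, ndf * 8, kernel_size=4, stride=2, padding=1),
        nn.LeakyReLU(0.2, inplace=True),
        nn.Conv2d(ndf * 8, 1, kernel_size=4, stride=2, padding=1),  # domain logit map
    )
```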
6. The method according to claim 2, wherein the segmentation loss l_seg in S4 uses the cross-entropy loss, and the source-domain segmentation loss is defined as follows:
l_seg(I'_S) = -Σ_{h,w} Σ_{c∈C} Y_S^{(h,w,c)} · log P_S^{(h,w,c)}
wherein Y_S is the label map of I'_S, C is the number of classes, H and W are the height and width of the output probability map, and P_S is the source-domain probability output of the segmentation adaptation model M, defined as P_S = M(I'_S).
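A minimal PyTorch sketch of the source-domain cross-entropy of claim 6; `F.cross_entropy` applies the softmax internally, so it is fed raw logits; the names are illustrative assumptions.

```python
import torch.nn.functional as F

def seg_loss(M, img_s_translated, label_s):
    """Pixel-wise cross-entropy between P_S = M(I'_S) and the label map Y_S."""
    logits = M(img_s_translated)             # (N, C, H, W) raw class scores
    return F.cross_entropy(logits, label_s)  # label_s: (N, H, W) integer class indices
```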
7. The method according to claim 2, wherein the adversarial loss in S5 is defined as follows:
l_adv(I_t) = -Σ_{h,w} log D(M(I_t))^{(h,w)}
wherein s ~ p_data(s) denotes the sample distribution of the source domain S and t ~ p_data(t) the sample distribution of the target domain T; I'_s and I_t respectively denote the input translated source-domain and target-domain samples; the discriminator D used for adversarial learning of M is intended to reduce the difference between the source-domain and target-domain features extracted by the segmentation network M;
the total loss function for training the segmentation network M is defined as follows:
l_M = λ_adv · l_adv(M(S'), M(T)) + l_seg(M(S'), Y_S)
wherein λ_adv denotes the coefficient of the adversarial loss; the total loss for training the segmentation network M is the sum of the adversarial loss l_adv and the segmentation loss l_seg.
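A minimal PyTorch sketch of the total loss of claim 7 for one batch: the segmentation loss on the translated source domain plus λ_adv times an adversarial term that rewards M for making target-domain predictions look like source-domain ones to D; the "source = 1" label convention and the names are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def segnet_total_loss(M, D, img_s_translated, label_s, img_t, lambda_adv=0.001):
    """l_M = lambda_adv * l_adv + l_seg for the segmentation network M."""
    p_s = M(img_s_translated)                 # source-domain logits
    p_t = M(img_t)                            # target-domain logits
    l_seg = F.cross_entropy(p_s, label_s)
    d_out = D(torch.softmax(p_t, dim=1))      # discriminator on target predictions
    l_adv = F.binary_cross_entropy_with_logits(
        d_out, torch.ones_like(d_out))        # fool D: label target predictions as "source"
    return l_seg + lambda_adv * l_adv
```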
8. The method according to claim 2, wherein the domain discrimination network D in S6 uses the BCE loss, which is defined as follows:
l_D = -Σ_{h,w} [ log D(M(S'))^{(h,w)} + log(1 - D(M(T))^{(h,w)}) ]
wherein S' denotes the translated source-domain data, T denotes the target-domain data, and M(S') and M(T) denote the outputs of the segmentation network M on the source domain and the target domain; the domain discrimination network D is intended to distinguish whether its input comes from the source domain or the target domain.
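A minimal PyTorch sketch of the BCE training of the domain discrimination network D in claim 8; the 1 = source / 0 = target labelling and the detaching of M's outputs are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def discriminator_loss(M, D, img_s_translated, img_t):
    """BCE that labels M(S') as source (1) and M(T) as target (0); M is fixed while D is updated."""
    with torch.no_grad():
        p_s = torch.softmax(M(img_s_translated), dim=1)
        p_t = torch.softmax(M(img_t), dim=1)
    d_s, d_t = D(p_s), D(p_t)
    loss_s = F.binary_cross_entropy_with_logits(d_s, torch.ones_like(d_s))
    loss_t = F.binary_cross_entropy_with_logits(d_t, torch.zeros_like(d_t))
    return 0.5 * (loss_s + loss_t)
```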
9. The method according to claim 2, wherein the test-time augmentation (TTA) of the test data in S8 is: TTA augments each test input image with copies obtained by vertical and horizontal flipping, lets the model predict each copy, flips the predictions back, and obtains the final result of the image by averaging the prediction results of the original image and the flipped images;
conditional random field (CRF): an undirected graphical model that models the conditional distribution; when computing the label of a pixel it uses information from the neighbouring pixels, which makes the segmentation result more accurate; the condition in the conditional random field is the conditional probability, i.e. the probability that the current pixel belongs to a certain class given the gray values of the current pixel and of the pixels in its neighbourhood; the conditional probability distribution is specifically a Gibbs distribution, giving the conditional probability of a particular assignment of classes to all pixels on the image: P = exp(-E)/Z;
wherein E is the energy of the current label assignment, exp(-E) is its unnormalized probability, and Z is the normalization constant (partition function) that makes the probabilities of all possible label assignments sum to one.
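A minimal PyTorch sketch of the flip-based TTA described in claim 9; averaging softmax probability maps is an illustrative assumption.

```python
import torch

def tta_predict(M, img):
    """Predict the original image and its vertical/horizontal flips, flip the
    predictions back, average the probability maps and take the argmax."""
    probs = torch.softmax(M(img), dim=1)
    for dim in (2, 3):                        # dim 2: vertical flip, dim 3: horizontal flip
        flipped = torch.flip(img, dims=[dim])
        probs = probs + torch.flip(torch.softmax(M(flipped), dim=1), dims=[dim])
    return (probs / 3).argmax(dim=1)          # final pixel-level class labels
```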
CN202210664345.1A 2022-06-14 2022-06-14 Depth unsupervised multistep anti-domain self-adaptive high-resolution SAR image surface feature extraction method Pending CN115049841A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210664345.1A CN115049841A (en) 2022-06-14 2022-06-14 Depth unsupervised multistep anti-domain self-adaptive high-resolution SAR image surface feature extraction method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210664345.1A CN115049841A (en) 2022-06-14 2022-06-14 Depth unsupervised multistep anti-domain self-adaptive high-resolution SAR image surface feature extraction method

Publications (1)

Publication Number Publication Date
CN115049841A true CN115049841A (en) 2022-09-13

Family

ID=83161887

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210664345.1A Pending CN115049841A (en) 2022-06-14 2022-06-14 Depth unsupervised multistep anti-domain self-adaptive high-resolution SAR image surface feature extraction method

Country Status (1)

Country Link
CN (1) CN115049841A (en)

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113366494A (en) * 2019-01-29 2021-09-07 辉达公司 Method for few-sample unsupervised image-to-image conversion
CN115830597A (en) * 2023-01-05 2023-03-21 安徽大学 Domain self-adaptive remote sensing image semantic segmentation method from local to global based on pseudo label generation
CN115830597B (en) * 2023-01-05 2023-07-07 安徽大学 Domain self-adaptive remote sensing image semantic segmentation method from local to global based on pseudo tag generation
CN116403058A (en) * 2023-06-09 2023-07-07 昆明理工大学 Remote sensing cross-scene multispectral laser radar point cloud classification method
CN116403058B (en) * 2023-06-09 2023-09-12 昆明理工大学 Remote sensing cross-scene multispectral laser radar point cloud classification method
CN117934869A (en) * 2024-03-22 2024-04-26 中铁大桥局集团有限公司 Target detection method, system, computing device and medium
CN118037755A (en) * 2024-04-11 2024-05-14 苏州大学 Focus segmentation domain generalization method and system based on double space constraint

Similar Documents

Publication Publication Date Title
CN110472627B (en) End-to-end SAR image recognition method, device and storage medium
CN110119728B (en) Remote sensing image cloud detection method based on multi-scale fusion semantic segmentation network
CN110443143B (en) Multi-branch convolutional neural network fused remote sensing image scene classification method
CN115049841A (en) Depth unsupervised multistep anti-domain self-adaptive high-resolution SAR image surface feature extraction method
CN106547880B (en) Multi-dimensional geographic scene identification method fusing geographic area knowledge
CN110516095B (en) Semantic migration-based weak supervision deep hash social image retrieval method and system
CN106909902B (en) Remote sensing target detection method based on improved hierarchical significant model
CN111259906B (en) Method for generating remote sensing image target segmentation countermeasures under condition containing multilevel channel attention
CN112734775B (en) Image labeling, image semantic segmentation and model training methods and devices
CN111460968B (en) Unmanned aerial vehicle identification and tracking method and device based on video
Liu et al. Remote sensing image change detection based on information transmission and attention mechanism
CN113505792B (en) Multi-scale semantic segmentation method and model for unbalanced remote sensing image
CN111950453A (en) Optional-shape text recognition method based on selective attention mechanism
CN108052966A (en) Remote sensing images scene based on convolutional neural networks automatically extracts and sorting technique
CN112541508A (en) Fruit segmentation and recognition method and system and fruit picking robot
CN103049763A (en) Context-constraint-based target identification method
CN112950780B (en) Intelligent network map generation method and system based on remote sensing image
CN110334656B (en) Multi-source remote sensing image water body extraction method and device based on information source probability weighting
CN112633140A (en) Multi-spectral remote sensing image urban village multi-category building semantic segmentation method and system
CN115565019A (en) Single-channel high-resolution SAR image ground object classification method based on deep self-supervision generation countermeasure
CN114283431B (en) Text detection method based on differentiable binarization
CN110135435B (en) Saliency detection method and device based on breadth learning system
Wang et al. Pedestrian detection in infrared image based on depth transfer learning
CN115019163A (en) City factor identification method based on multi-source big data
CN112330562B (en) Heterogeneous remote sensing image transformation method and system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination