CN113160234B - Unsupervised remote sensing image semantic segmentation method based on super-resolution and domain self-adaptation - Google Patents
Unsupervised remote sensing image semantic segmentation method based on super-resolution and domain self-adaptation
Info
- Publication number: CN113160234B (application number CN202110530385.2A)
- Authority: CN (China)
- Legal status: Active (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- G06T7/10 — Segmentation; Edge detection
- G06N3/045 — Combinations of networks
- G06N3/08 — Learning methods
- G06T3/4053 — Scaling of whole images or parts thereof based on super-resolution, i.e. the output image resolution being higher than the sensor resolution
- G06T5/50 — Image enhancement or restoration using two or more images, e.g. averaging or subtraction
- G06T2207/10032 — Satellite or aerial image; Remote sensing
- G06T2207/20081 — Training; Learning
- G06T2207/20084 — Artificial neural networks [ANN]
- G06T2207/20112 — Image segmentation details
- G06T2207/20132 — Image cropping
Abstract
The invention relates to an unsupervised remote sensing image semantic segmentation method based on super-resolution and domain adaptation, belonging to the technical field of remote sensing image semantic segmentation methods. The technical problem to be solved is to provide an improvement of the unsupervised remote sensing image semantic segmentation method based on super-resolution and domain adaptation. The technical scheme comprises the following steps: acquiring a source-domain low-resolution remote sensing image data set and a target-domain high-resolution remote sensing image data set, and dividing the acquired target-domain image data set into training images and test images according to a set proportion; building a remote sensing image semantic segmentation network and a super-resolution network; pre-training the built super-resolution network and optimizing its parameters; training the remote sensing image semantic segmentation network; inputting the preprocessed test-set data into the trained remote sensing image semantic segmentation network and outputting an accurate segmentation result. The invention is applied to remote sensing image processing.
Description
Technical Field
The invention relates to an improved unsupervised high-resolution remote sensing image semantic segmentation method based on super-resolution and domain self-adaptation, and belongs to the technical field of remote sensing image semantic segmentation methods.
Background
In recent years, with the continuous progress and wide application of high-resolution earth observation technology, the spatial resolution of remote sensing data has steadily improved and the volume of such data has accumulated geometrically, so automatically, quickly and accurately extracting high-value geographic information from high-resolution remote sensing images has become one of the important problems urgently needing to be solved. Semantic segmentation, which labels each pixel in an image as a specific ground-feature type and is also called ground-feature extraction or land classification, is one of the main means of information extraction from high-resolution remote sensing images, and is widely applied in land planning, environmental monitoring, disaster assessment and other fields.
Deep neural networks can automatically extract semantic information at every level from an image and have strong feature-expression capability, so they have achieved great success in image semantic segmentation. However, the excellent performance of these deep-learning-based semantic segmentation methods relies on remote sensing labels annotated at the level of millions of pixels. Because manual annotation of high-resolution remote sensing images is time-consuming, labor-intensive and requires considerable professional knowledge, current semantic segmentation models in this field rely only on small-scale training sets acquired in a specific period, from a few limited regions and with specific remote sensing sensors. This limits the generalization performance of the models: segmentation accuracy drops sharply when a model is applied to a different region or sensor, a phenomenon known as domain shift. To overcome domain shift and fully utilize existing data sets, unsupervised domain adaptation methods accomplish the semantic segmentation task on an unlabeled target-domain data set by transferring knowledge learned on a source-domain data set; in particular, domain adaptation methods based on generative adversarial networks learn domain-invariant features through the competition between a generator and a discriminator, which can effectively reduce inter-domain differences.
Unlike most current high-resolution remote sensing image domain adaptation methods, which only consider the style difference, i.e. the spectral difference, between different sensors, the unsupervised semantic segmentation method based on Super-Resolution Domain Adaptation (SRDA) notes that remote sensing images obtained by different sensors also differ in resolution, and that different types of ground objects appear at different sizes in remote sensing images; it realizes unsupervised domain-adaptive semantic segmentation through a multi-task, multi-scale generative adversarial network that jointly performs super-resolution and semantic segmentation and thereby learns spectral and scale invariance simultaneously. That model uses an Atrous Spatial Pyramid Pooling (ASPP) module to extract multi-scale, scale-invariant features, but atrous (dilated) convolution is a sparse computation and may cause grid artifacts, while the spatial pyramid pooling module may lose pixel-level positioning information. In addition, semantic segmentation of high-resolution remote sensing images often faces a severe class imbalance problem.
Disclosure of Invention
In order to overcome the defects in the prior art, the technical problem the invention aims to solve is to provide an improved unsupervised remote sensing image semantic segmentation method based on super-resolution and domain adaptation.
In order to solve the technical problems, the invention adopts the technical scheme that: the unsupervised remote sensing image semantic segmentation method based on super-resolution and domain self-adaptation comprises the following steps:
the method comprises the following steps: acquiring a source domain low-resolution remote sensing image data set and a target domain high-resolution remote sensing image data set, and dividing the acquired target domain image data set into a training image and a test image according to a set proportion;
forming a training set of image semantic segmentation by using the source domain image data set and the target domain training image, and forming a test set of image semantic segmentation by using the test image of the target domain;
preprocessing the remote sensing image data of the training set to obtain a remote sensing image data set subjected to data enhancement;
step two: building a remote sensing image semantic segmentation network, wherein the super-resolution remote sensing image semantic segmentation network comprises a feature coding module, a super-resolution module, a super-resolution domain discrimination module, a semantic segmentation module and a semantic segmentation domain discrimination module, and the feature coding module and the super-resolution module jointly form the super-resolution network of the remote sensing image;
step three: performing network pre-training and parameter optimization on the super-resolution network built in the step two;
step four: training a semantic segmentation network of the remote sensing image;
step five: and inputting the preprocessed test set data into the trained semantic segmentation network of the remote sensing image in the fourth step, and outputting an accurate segmentation result of the remote sensing image.
And acquiring a source domain low-resolution remote sensing image data set and a target domain high-resolution remote sensing image data set in the first step through a remote sensing satellite, wherein the source domain low-resolution remote sensing image data set comprises a low-resolution original image and a label data image which is artificially marked, and the target domain high-resolution remote sensing image data set comprises a high-resolution original image.
Preprocessing the source domain image data set and the target domain training set image remote sensing image data in the first step specifically comprises image cutting, image sampling and data enhancement of original images in a training set;
the image clipping specifically comprises the following steps: cutting the original image of the source domain and the label data image into images with 256 pixels multiplied by 256 pixels and the resolution of 1 meter per pixel; cutting the target domain training set and the test set image into images with 512 pixels multiplied by 512 pixels and 0.5 meter per pixel resolution;
the image sampling specifically comprises: up-sampling the original image of the source domain and the label data image to obtain a high-resolution image with 512 pixels multiplied by 512 pixels and the resolution of 0.5 meter per pixel; down-sampling the target domain image to obtain a low-resolution image with 256 pixels multiplied by 256 pixels and the resolution of 1 meter per pixel;
the data enhancement comprises: and carrying out image rotation, image vertical and horizontal overturning and image size adjustment on the images in the semantic segmentation training set of the remote sensing images.
The second step of building the super-resolution remote sensing image semantic segmentation network comprises the following steps:
step 2.1: inputting the image into a feature coding module to obtain multi-scale and multi-level image features: the feature coding module realizes the hierarchical feature extraction of the image from the bottom level detail feature to the high level semantic feature through convolution and maximum pooling operation, and specifically comprises the following steps: performing convolution and maximum pooling operation on the image for three times to extract bottom-layer features of the network, extracting multi-scale features of the image through a residual feature pyramid attention module, fusing the extracted multi-scale features of the image through a residual network module, and repeating the extraction and fusion of the multi-scale features for two times to finally obtain rich image features with multiple scales and multiple levels;
step 2.2: the image features extracted by the feature coding module in step 2.1 are processed by the super-resolution module to obtain a high-resolution image enlarged from the original image; the image generated by the super-resolution module is then input into the super-resolution domain discrimination module, which judges the domain to which the input image belongs in order to optimize the parameters of the super-resolution module and the feature coding module;
the feature coding module and the super-resolution module jointly form a super-resolution network of the remote sensing image, and the super-resolution network is combined with the super-resolution domain distinguishing module to jointly realize the super-resolution of the low-resolution image;
step 2.3: the features extracted by the feature coding module and the super-resolution module trained in the step 2.2 are taken as the input of the semantic segmentation module together: the features extracted by the feature coding module and part of the features in the image super-resolution module are input into a semantic segmentation module of the image together to realize semantic segmentation of the remote sensing image data;
the system comprises a feature extraction module, a super-resolution module, a semantic segmentation module and a semantic segmentation module, wherein the feature extraction module, the super-resolution module and the semantic segmentation module jointly form a semantic segmentation network of an image, a high-resolution remote sensing image and a probability map generated by the semantic segmentation module are spliced and then input into the semantic segmentation domain discrimination module to optimize the semantic segmentation network of the image, and finally, the segmentation function of the remote sensing image is realized.
The third step specifically comprises:
step 3.1: carrying out parameter random initialization on the feature coding module and the super-resolution module, inputting the training set data preprocessed in the step one into the remote sensing image super-resolution network in the step two to generate a high-resolution image, and calculating super-resolution loss;
inputting the generated high-resolution image and the original high-resolution image into a super-resolution domain discrimination module, discriminating the domain of the input image, and calculating the discrimination loss of the super-resolution domain;
step 3.2: loss reverse propagation is performed, parameters of a super-resolution network and a super-resolution domain discrimination module are alternately optimized, and super resolution of the low-resolution image is finally achieved;
step 3.3: and after the training is finished, storing the parameters of the trained feature coding module, the super-resolution module and the super-resolution domain distinguishing module.
The fourth step specifically comprises:
step 4.1: initializing a feature coding module, a super-resolution module and a super-resolution domain distinguishing module in the semantic segmentation network by using the model parameters saved in the third step, simultaneously performing random initialization on the parameters of the semantic segmentation module and the semantic segmentation domain distinguishing module, inputting the training set data preprocessed in the first step into the remote sensing image semantic segmentation network in the second step, generating a semantic segmentation probability map of the remote sensing image, and calculating semantic segmentation loss;
splicing the high-resolution remote sensing image and the semantic segmentation probability map, and inputting the spliced high-resolution remote sensing image and the semantic segmentation probability map into a semantic segmentation domain discrimination module to realize discrimination of a domain to which the semantic segmentation network generation probability map belongs and calculate discrimination loss of the semantic segmentation domain;
step 4.2: loss back propagation, namely alternately optimizing parameters of the semantic segmentation network and the two domain discrimination modules, and finally finishing the optimization of the parameters of the semantic segmentation network by taking the minimization of a loss function as an optimization target;
step 4.3: and after the training is finished, storing the trained semantic segmentation network model parameters.
The network structure of the feature coding module in step 2.1 is as follows:
the first, second and third layers are convolution layers: performing convolution with convolution kernel size of 3 × 3 and step size of 1;
the fourth layer is a maximum pooling layer: the largest pooling layer with the step length of 2 is arranged behind the convolution layer;
the feature coding module is provided with two repeated residual feature pyramid attention modules and a residual module for realizing multi-scale feature fusion behind the maximum pooling layer;
the residual error feature pyramid attention module is composed of three continuous feature pyramid attention module networks containing residual error connection, the feature pyramid attention module is divided into two paths, the first path adopts global pooling for input features, convolution with convolution kernel of 1 × 1 and an upper sampling layer to achieve feature transfer of the network, the second path adopts a U-shaped network structure to achieve multi-layer feature extraction, convolution operation with step length of 2 is conducted on the features three times to obtain feature maps with different sizes of input feature sizes 1/2, 1/4 and 1/8, the convolution kernels of the convolution three times are respectively 7 × 7, 5 × 5 and 3 × 3 in size, then the feature map with size of 1/8 is sampled and overlapped with the feature map with size of 1/4, the steps are repeated twice, and finally the output feature map and the feature map which is subjected to convolution with size of 1 × 1 are multiplied pixel by pixel to obtain the feature map with the same size as the input feature map Figure representation; finally, overlapping the characteristic diagrams of the two paths to obtain a multi-scale characteristic diagram;
and the two convolution operations in the residual error module realize the characteristic channel fusion by the convolution with the step length of 1 and the convolution kernel of 3 multiplied by 3, and the residual error module is internally provided with a short circuit connection for accelerating the network convergence.
The super-resolution module halves the number of output channels through a decoder module while gradually restoring the image size; the decoder module comprises a convolution layer with a 1 × 1 kernel and stride 1 and a deconvolution layer with a 3 × 3 kernel and stride 2, which control the number of output channels and the increase of image resolution; the super-resolution image is finally obtained through three consecutive deconvolution (decoder) modules;
the semantic segmentation module gradually restores the resolution of the image through the decoder module, and simultaneously cascades the decoder module and the image with the same resolution restored by the super-resolution as the input of the next decoder module, and finally realizes the semantic segmentation of the network through the two decoder modules;
the super-resolution domain distinguishing module consists of five convolution layers with convolution kernel size of 3 multiplied by 3 and step length of 2, a residual error feature pyramid attention module and a sigmoid activation layer, wherein feature extraction of a high-resolution image is realized through the convolution layers, feature extraction and integration are performed on a network through the feature pyramid attention module, and finally a final super-resolution domain label feature map is obtained through the sigmoid activation layer;
the semantic segmentation domain distinguishing module consists of five convolution layers with convolution kernel size of 3 multiplied by 3 and step length of 2, a residual error feature pyramid attention module and a sigmoid activation layer, the feature extraction of a semantic segmentation image and a probability map is realized through the convolution layers, and finally, the final semantic segmentation domain label feature map is obtained through the sigmoid activation layer.
In the third step, the data sets used in training the remote sensing image super-resolution network are the down-sampled low-resolution target-domain data, the original high-resolution target-domain data, and the low-resolution source-domain data;
in the third step, the loss function used in training the remote sensing image super-resolution network is the mean-square loss function, calculated as:

L_mse = (1/N) Σ_{i=1..N} (X_i − Y_i)²

in the above formula: X is the super-resolution image generated by the super-resolution network, Y is the real high-resolution image, and N is the number of image pixels;
the loss function of the super-resolution domain discrimination module in the third step is a mean square loss function, and the calculation formula of the mean square loss function is as follows:
Ldsr=ΕS[(Is-1)2]+ΕT[(It)2]
Ldsrinv=ΕS[(Is)2]+ΕT[(It-1)2];
in the above formula: l isdsrFor loss of the super-resolution domain discrimination module Dsr in training the generator, LdsrinvFor loss of the super-resolution domain discrimination module Dsr when training the discriminator network, IsSuper-resolution maps generated for source domain low resolution images, ItHigh resolution image generated for down-sampling a low resolution image in the target domain, ESTo expect the loss of all inputs belonging to the source domain S, ETThe loss of all the target domains T is expected;
in the third step, when a super-resolution network E-SR consisting of the characteristic extraction module E and the super-resolution module SR is trained, the loss function L is minimizedGTo optimize the parameter theta of the super-resolution network discriminator sectionE-SR;
When the super-resolution domain discrimination module Dsr is trained, the loss function L is minimizedDTo optimize the network parameter theta of the domain discriminator part of the super-resolution domain discrimination moduleDsr;
The super-resolution of the image is realized through the alternative confrontation training of a super-resolution network and a super-resolution domain discrimination module;
the loss function of the super-resolution network in the training process is as follows:
the loss function of the super-resolution domain discrimination module in the training process is as follows:
in the fourth step, the loss function used in training the remote sensing image semantic segmentation network combines a Dice coefficient loss function and a cross-entropy loss function. The cross-entropy loss is calculated as:

L_ce = −(1/N) Σ_{i=1..N} y_i log y'_i

in the above formula: y is the real label map, y' is the predicted label map, and N is the number of image pixels;
the Dice coefficient loss is calculated as:

L_dice = 1 − (1/K) Σ_{k=1..K} 2|X_k ∩ Y_k| / (|X_k| + |Y_k|)

in the above formula: X is the generated prediction-label probability map, Y is the real label map, |X ∩ Y| is the intersection between the real label map and the predicted label map, |X| is the number of elements of the predicted label map, |Y| is the number of elements of the real label map, and K is the number of label categories;
the loss function of the semantic partition domain judging module in the fourth step is a mean square loss function, and the calculation formula of the mean square loss function is as follows:
Lds=ΕS[(Ls)2]+ΕT[(Lt-1)2]
Ldsinv=ΕS[(Ls-1)2]+ΕT[(Lt)2];
in the above formula: l isdsFor loss of the semantic Domain discriminant Module Ds when training the generators, LdsinvFor loss of the semantic Domain partition discrimination Module Ds in training the discriminator, LsSemantic segmentation domain discrimination Module Label map, L, generated for Source Domain imagestSemantic segmentation domain discrimination Module tag map generated for target Domain images, ESTo expect the loss of all inputs belonging to the source domain S, ETThe loss of all the target domains T is expected;
in the fourth step, when a semantic segmentation network E-SR-S consisting of the feature extraction module E, the semantic segmentation module S and the super-resolution module SR is trained, a loss function L is minimizedGTo optimize the parameter theta of a semantic segmentation networkE-SR-SWherein the loss function LGThe sum of the cross entropy loss function, the Dice coefficient loss function, the super resolution loss and the loss of a super resolution domain discrimination module and a semantic segmentation domain discrimination module is obtained;
when training the super-resolution domain discrimination module Dsr and the semantic segmentation domain discrimination module Ds, the loss function L is minimizedDTo optimize the network parameters theta of the two domain discrimination modulesDsr;
The semantic segmentation of the image is realized through the alternative countermeasure training of a semantic segmentation network, a semantic segmentation domain judging module Dsr and a super-resolution domain judging module Ds;
the loss function of the semantic segmentation network in the training process is as follows:
loss functions of the super-resolution domain discrimination module and the semantic segmentation domain discrimination module in the training process are as follows:
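The alternating optimisation schedule in step 4.2 can be sketched structurally; the updates themselves are stand-ins (no real gradient steps), so only the generator/discriminator alternation is shown:

```python
def alternating_schedule(iterations=3):
    # each iteration first minimises L_G to update the generator network
    # E-SR-S, then minimises L_D to update the two domain discrimination
    # modules Dsr and Ds; the strings are placeholders for the updates
    schedule = []
    for _ in range(iterations):
        schedule.append("generator: min L_G over theta_{E-SR-S}")
        schedule.append("discriminators: min L_D over theta_{Dsr}, theta_{Ds}")
    return schedule
```

Freezing one side while updating the other is the standard adversarial-training pattern; the patent's minimisation targets L_G and L_D correspond to the two halves of each iteration.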
compared with the prior art, the invention has the beneficial effects that:
1) the method of the invention uses a residual error feature pyramid attention module in a feature coding module, and ensures that more global information is extracted by extracting image features under different resolutions. The structure can adopt convolution kernels of different receptive fields to achieve feature acquisition of targets of different sizes, and can be combined with residual connection to avoid explosion and disappearance of gradients.
2) The problem of unbalanced variety in semantic segmentation of the high-resolution remote sensing image can be effectively solved.
3) The method adopts the characteristics of jump connection transmission from the super-resolution module when the image semantic segmentation network is built. In the image processing, the feature map transmitted from the super-resolution process through jump connection not only contains the position, edge and other detailed features of the target, but also contains a large amount of high-level semantic information; the method has good segmentation effect and strong robustness.
Drawings
The invention is further described below with reference to the accompanying drawings:
FIG. 1 is a flow chart of an embodiment of an image semantic segmentation network constructed in the method of the present invention;
FIG. 2 is a schematic diagram of a structure of an image semantic segmentation network constructed in the method of the present invention;
FIG. 3 is a schematic diagram of a super-resolution network in an image semantic segmentation network constructed in the method of the present invention;
FIG. 4 is a schematic diagram of a component structure of a feature coding module in an image semantic segmentation network constructed in the method of the present invention;
FIG. 5 is a schematic diagram of a structure of a residual feature pyramid attention module in the image semantic segmentation network constructed in the method of the present invention;
FIG. 6 is a schematic diagram of a structure of a feature pyramid attention module in an image semantic segmentation network constructed in the method of the present invention;
FIG. 7 is a schematic diagram of the structure of the residual module in the image semantic segmentation network constructed by the method of the present invention;
FIG. 8 is a schematic diagram of a super-resolution domain discrimination module in the image semantic segmentation network constructed in the method of the present invention;
FIG. 9 is a schematic diagram of the structure of the semantic segmentation domain discrimination module in the image semantic segmentation network constructed by the method of the present invention.
Detailed Description
As shown in FIGS. 1 to 9, in this unsupervised remote sensing image semantic segmentation method based on super-resolution and domain adaptation, a feature pyramid attention module replaces the ASPP module of the original super-resolution domain-adaptive method, a residual feature pyramid attention module is applied in the discriminators to obtain accurate pixel-level attention over high-level semantic features, and the class-imbalance problem is alleviated through a Dice coefficient loss function. The method comprises the following steps:
Step one: obtain a source-domain low-resolution remote sensing image data set and a target-domain high-resolution remote sensing image data set, both acquired by remote sensing satellites. The source-domain data comprise low-resolution original images and manually annotated label images, while the target-domain data comprise only high-resolution original images. The acquired target-domain image data set is divided into training images and test images in a set proportion; the source-domain data and the target-domain training images form the training set for image semantic segmentation, and the target-domain test images form the test set;
preprocessing the remote sensing image data of the training set to obtain a remote sensing image data set subjected to data enhancement;
step two: build the semantic segmentation network of the remote sensing image. The network comprises a feature coding module, a super-resolution module, a super-resolution domain discrimination module, a semantic segmentation module and a semantic segmentation domain discrimination module, and the construction steps are as follows:
step 2.1: input the image into the feature coding module to obtain multi-scale, multi-level image features. The feature coding module performs hierarchical feature extraction from low-level detail features to high-level semantic features through convolution and max-pooling operations: the image first passes through three rounds of convolution and a max-pooling operation to extract the low-level features of the network, then a residual feature pyramid attention module extracts multi-scale image features and a residual network module fuses them, and this extraction and fusion of multi-scale features is repeated twice, finally yielding rich multi-scale, multi-level image features;
step 2.2: the image features extracted by the feature coding module in step 2.1 pass through the super-resolution module to obtain a high-resolution image with an enlarged size relative to the original image; the image generated by the super-resolution module is then input into the super-resolution domain discrimination module, which discriminates the domain to which the input image belongs so as to optimize the parameters of the super-resolution module and the feature coding module. The feature coding module and the super-resolution module jointly form the super-resolution network of the remote sensing image, which, combined with the super-resolution domain discrimination module, realizes super-resolution of the low-resolution image;
step 2.3: the features extracted by the feature coding module and by the super-resolution module trained in step 2.2 together serve as the input of the semantic segmentation module: the features extracted by the feature coding module and part of the features in the image super-resolution module are input into the semantic segmentation module to realize semantic segmentation of the remote sensing image data. The feature extraction module, the super-resolution module and the semantic segmentation module jointly form the semantic segmentation network of the image. The high-resolution remote sensing image and the probability map generated by the semantic segmentation module are concatenated and input into the semantic segmentation domain discrimination module to optimize the image semantic segmentation network, finally realizing the segmentation of the remote sensing image.
Step three: pre-training and parameter optimization of the super-resolution network:
step 3.1: carrying out parameter random initialization on the feature coding module and the super-resolution module, inputting the training set data preprocessed in the step one into the remote sensing image super-resolution network in the step two to generate a high-resolution image, and calculating super-resolution loss; and inputting the generated high-resolution image and the original high-resolution image into a super-resolution domain discrimination module to realize discrimination of the domain to which the input image belongs and calculate the discrimination loss of the super-resolution domain.
Step 3.2: back-propagate the losses and alternately optimize the parameters of the super-resolution network and the super-resolution domain discrimination module, finally realizing super-resolution of the low-resolution image.
Step 3.3: after training is finished, storing the parameters of the trained feature coding module, the super-resolution module and the super-resolution domain distinguishing module;
step four: training a semantic segmentation model of the remote sensing image:
step 4.1: initializing a feature coding module, a super-resolution module and a super-resolution domain distinguishing module in the semantic segmentation network by using the model parameters saved in the third step, simultaneously performing random initialization on the parameters of the semantic segmentation module and the semantic segmentation domain distinguishing module, inputting the training set data preprocessed in the first step into the remote sensing image semantic segmentation network in the second step, generating a semantic segmentation probability map of the remote sensing image, and calculating semantic segmentation loss; and splicing the high-resolution remote sensing image and the semantic segmentation probability map, and inputting the spliced high-resolution remote sensing image and the semantic segmentation probability map into a semantic segmentation domain discrimination module to realize discrimination of the domain to which the semantic segmentation network generation probability map belongs and calculate the discrimination loss of the semantic segmentation domain.
Step 4.2: back-propagate the losses and alternately optimize the parameters of the semantic segmentation network and the two domain discrimination modules, finally completing the optimization of the semantic segmentation network parameters with minimization of the loss function as the optimization target.
Step 4.3: after training is finished, storing the trained semantic segmentation network model parameters;
step five: and inputting the processed test set data into the trained semantic segmentation network of the remote sensing image in the fourth step, and outputting an accurate segmentation result of the remote sensing image.
The preprocessing of the remote sensing image training set data in step one comprises image cropping, image resampling and data enhancement of the original training images;
the image cropping specifically comprises: cropping the source-domain training images and labels into 256 × 256-pixel images with a resolution of 1 meter per pixel, and cropping the target-domain training and test images into 512 × 512-pixel images with a resolution of 0.5 meter per pixel;
the image resampling specifically comprises: upsampling the source-domain images and labels to obtain 512 × 512-pixel high-resolution images with a resolution of 0.5 meter per pixel, and downsampling the target-domain images to obtain 256 × 256-pixel low-resolution images with a resolution of 1 meter per pixel.
The data enhancement comprises: image rotation, vertical and horizontal flipping, and image resizing of the images in the remote sensing image semantic segmentation training set.
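As a sanity check on the cropping and resampling figures above, the pixel bookkeeping follows from keeping the ground extent of a tile fixed while changing the resolution. A minimal sketch (the helper name is hypothetical):

```python
def output_pixels(pixels: int, gsd_m: float, target_gsd_m: float) -> int:
    """Number of pixels after resampling a tile of `pixels` width at
    `gsd_m` meters/pixel to `target_gsd_m` meters/pixel, keeping the
    same ground extent."""
    extent_m = pixels * gsd_m  # ground extent covered by the tile
    return round(extent_m / target_gsd_m)

# Source domain: 256 px at 1 m/px upsampled to 0.5 m/px gives 512 px.
assert output_pixels(256, 1.0, 0.5) == 512
# Target domain: 512 px at 0.5 m/px downsampled to 1 m/px gives 256 px.
assert output_pixels(512, 0.5, 1.0) == 256
```

This confirms the source and target tiles cover the same 256-meter ground extent at both resolutions, which is what makes the two domains comparable after resampling.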
The network structure of the feature coding module in step 2.1 is as follows:
the first, second and third layers are convolution layers: performing convolution with convolution kernel size of 3 × 3 and step size of 1;
the fourth layer is a maximum pooling layer: the largest pooling layer with the step length of 2 is arranged behind the convolution layer;
behind the max-pooling layer, the feature coding module is provided with two repeated residual feature pyramid attention modules and a residual module for multi-scale feature fusion. The residual feature pyramid attention module consists of three consecutive feature pyramid attention modules with residual connections. The feature pyramid attention module is divided into two paths: the first path applies global pooling, a convolution with a 1 × 1 kernel and an upsampling layer to the input features to realize feature transfer of the network; the second path adopts a U-shaped network structure for multi-layer feature extraction, performing three convolutions with a stride of 2 to obtain feature maps of 1/2, 1/4 and 1/8 of the input feature size, with kernels of 7 × 7, 5 × 5 and 3 × 3 respectively. The 1/8-size feature map is upsampled and superimposed on the 1/4-size feature map, this step is repeated twice, and the output feature map is then multiplied pixel by pixel with the feature map produced by the 1 × 1 convolution to obtain a feature map of the same size as the input; finally, the feature maps of the two paths are superimposed to obtain the multi-scale feature map. The two convolution operations in the residual module realize feature channel fusion through convolutions with a stride of 1 and a 3 × 3 kernel, and the residual module contains a shortcut connection to accelerate network convergence.
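The 1/2, 1/4 and 1/8 feature-map sizes produced by the three stride-2 convolutions in the U-shaped branch follow from the standard convolution output-size formula. A small sketch, assuming "same"-style padding p = k // 2 (the patent does not state the padding):

```python
def conv_out(n: int, k: int, s: int, p: int) -> int:
    """Output size of a convolution: floor((n + 2p - k) / s) + 1."""
    return (n + 2 * p - k) // s + 1

# Three stride-2 convolutions with 7x7, 5x5 and 3x3 kernels on a 64-pixel map.
n, sizes = 64, []
for k in (7, 5, 3):
    n = conv_out(n, k, s=2, p=k // 2)
    sizes.append(n)
# sizes is [32, 16, 8], i.e. 1/2, 1/4 and 1/8 of the input size.
```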
The super-resolution module halves the number of output channels through each decoder module while gradually restoring the image size. The decoder module comprises a convolution layer with a 1 × 1 kernel and a stride of 1 and a deconvolution layer with a 3 × 3 kernel and a stride of 2, which control the number of output channels and the increase in image resolution; the super-resolution image is finally obtained through three consecutive deconvolution modules.
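The gradual size restoration by the 3 × 3, stride-2 deconvolution layers can be sketched with the standard transposed-convolution size formula. The padding and output-padding values below are assumptions chosen so that each layer exactly doubles the spatial size; the patent does not specify them:

```python
def deconv_out(n: int, k: int = 3, s: int = 2, p: int = 1, op: int = 1) -> int:
    """Output size of a transposed convolution: (n - 1)*s - 2p + k + op."""
    return (n - 1) * s - 2 * p + k + op

# Three consecutive deconvolution (decoder) modules: 64 -> 128 -> 256 -> 512.
n = 64
for _ in range(3):
    n = deconv_out(n)
```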
The semantic segmentation module gradually restores the image resolution through decoder modules, concatenating each decoder output with the super-resolution feature map of the same resolution as the input of the next decoder module; semantic segmentation of the network is finally realized through two decoder modules.
The super-resolution domain distinguishing module comprises five convolution layers with the convolution kernel size of 3 multiplied by 3 and the step length of 2, a residual feature pyramid attention module and a sigmoid activation layer, wherein feature extraction of a high-resolution image is achieved through the convolution layers, then feature extraction and integration are conducted on a network through the residual feature pyramid attention module, and finally a final super-resolution domain label feature map is obtained through the sigmoid activation layer.
The semantic segmentation domain distinguishing module comprises five convolution layers with the convolution kernel size of 3 multiplied by 3 and the step length of 2, a residual error feature pyramid attention module and a sigmoid activation layer, the feature extraction of a semantic segmentation image and a probability map is realized through the convolution layers, then the feature extraction and integration are carried out on the network through the residual error feature pyramid attention module, and finally the final semantic segmentation domain label feature map is obtained through the sigmoid activation layer.
In step three, the data used to train the remote sensing image super-resolution network are the down-sampled low-resolution target-domain data, the initial high-resolution target-domain data and the low-resolution source-domain data; the loss function is the mean-square loss, whose calculation formula is:
L_sr = (1/N) Σ_{i=1}^{N} (X_i - Y_i)^2 (1)
In the above formula: X is the super-resolution image generated by the super-resolution network, Y is the real high-resolution image, and N is the number of image pixels;
the loss function of the super-resolution domain discrimination module in step three is the mean-square loss, whose calculation formulas are
L_dsr = E_S[(I_s - 1)^2] + E_T[(I_t)^2] (2)
L_dsrinv = E_S[(I_s)^2] + E_T[(I_t - 1)^2] (3)
In the above formulas: I_s is the super-resolution image generated from the source-domain low-resolution image, and I_t is the high-resolution image generated from the down-sampled target-domain low-resolution image; L_dsr is the loss of the super-resolution domain discrimination module Dsr when training the generator, and L_dsrinv is the loss of the module Dsr when training the discriminator network;
the adversarial training of the super-resolution network in step three alternately optimizes the super-resolution network and the super-resolution domain discrimination module. When training the super-resolution network E-SR formed by the feature extraction module E and the super-resolution module SR, the loss function L_G is minimized to optimize the parameters of the super-resolution network; when training the super-resolution domain discrimination module Dsr, the loss function L_D is minimized to optimize the network parameters of the domain discriminator. Super-resolution of the image is realized through alternating adversarial training of the super-resolution network and the super-resolution domain discrimination module. The loss function of the network during training is:
in step four, the loss function used to train the remote sensing image semantic segmentation network is the Dice coefficient loss combined with the cross-entropy loss. The calculation formula of the cross-entropy loss is:
L_ce = -(1/N) Σ_{i=1}^{N} Σ_{k=1}^{K} y_{i,k} log y'_{i,k}
In the above formula: y is the real label map, y' is the predicted label map, N is the number of image pixels, and K is the number of label categories;
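The pixel-averaged cross-entropy above can be sketched in pure Python over one-hot labels y and predicted class probabilities y'; a small epsilon guards against log(0):

```python
from math import log

def cross_entropy_loss(y_true, y_pred, eps=1e-12):
    """L_ce = -(1/N) sum_i sum_k y_{i,k} * log(y'_{i,k}).
    y_true: N x K one-hot labels; y_pred: N x K predicted probabilities."""
    total = 0.0
    for yi, pi in zip(y_true, y_pred):
        total -= sum(t * log(p + eps) for t, p in zip(yi, pi))
    return total / len(y_true)
```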
the calculation formula of the Dice coefficient loss is:
L_Dice = 1 - (1/K) Σ_{k=1}^{K} 2|X_k ∩ Y_k| / (|X_k| + |Y_k|)
In the above formula: X is the generated predicted-label probability map, Y is the real label map, |X ∩ Y| is the intersection of the real and predicted label maps, |X| is the number of elements of the predicted label map, |Y| is the number of elements of the real label map, and K is the number of label categories;
the loss function of the semantic segmentation domain discrimination module in step four is the mean-square loss, whose calculation formulas are
L_ds = E_S[(L_s)^2] + E_T[(L_t - 1)^2] (8)
L_dsinv = E_S[(L_s - 1)^2] + E_T[(L_t)^2] (9)
In the above formulas: L_s is the semantic segmentation domain discrimination label map generated from the source-domain image, and L_t is the semantic segmentation domain discrimination label map generated from the target-domain image; L_ds is the loss of the semantic segmentation domain discrimination module Ds when training the generator, and L_dsinv is the loss of the module Ds when training the discriminator;
the adversarial training of the semantic segmentation network in step four alternately optimizes the semantic segmentation network, the super-resolution domain discrimination module and the semantic segmentation domain discrimination module. When training the semantic segmentation network E-SR-S, formed by the feature extraction module E, the super-resolution module SR and the semantic segmentation module S, the loss function L_G is minimized to optimize the parameters of the semantic segmentation network, where L_G is the sum of the cross-entropy loss, the Dice coefficient loss, the super-resolution loss and the losses of the two domain discrimination modules; when training the super-resolution domain discrimination module Dsr and the semantic segmentation domain discrimination module Ds, the loss function L_D is minimized to optimize the network parameters of the two domain discrimination modules. Semantic segmentation of the image is realized through alternating adversarial training of the semantic segmentation network E-SR-S, the super-resolution domain discrimination module Dsr and the semantic segmentation domain discrimination module Ds. The loss function of the network during training is:
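The composite generator objective L_G described above can be sketched as a simple weighted sum. The weighting coefficients below are illustrative assumptions; the patent states the terms but not how they are balanced:

```python
def generator_loss(l_ce, l_dice, l_sr, l_dsr, l_ds, w_sr=1.0, w_adv=0.01):
    """L_G = cross-entropy + Dice loss + super-resolution loss
    + the two domain-adversarial terms (weights are illustrative)."""
    return l_ce + l_dice + w_sr * l_sr + w_adv * (l_dsr + l_ds)
```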
remote sensing image data sets at different resolutions are divided into a training set and a test set in a set proportion; a deep remote sensing image semantic segmentation network combining super-resolution and domain self-adaptation is constructed, comprising a feature coding module, a super-resolution module, a super-resolution domain discrimination module, a semantic segmentation module and a semantic segmentation domain discrimination module; the preprocessed training set data are input into the network, the remote sensing image super-resolution network and the image semantic segmentation network are trained in stages, and the network parameters are saved; the test set data are then input into the trained remote sensing image semantic segmentation network and the segmentation results of the test images are output.
The invention discloses an unsupervised remote sensing image semantic segmentation method combining super-resolution and domain self-adaptation. The images are divided into a test set and a training set, and the training images are preprocessed; a remote sensing image semantic segmentation network based on deep learning is then constructed, the training images are input to train the remote sensing super-resolution network and the semantic segmentation network in stages, and the model parameters are saved when the networks converge; finally, the test images are passed through the image semantic segmentation network to obtain the final prediction results. Compared with the prior art, the method realizes semantic segmentation of remote sensing images by adding a super-resolution module and a multi-scale feature extraction module, achieving a good segmentation effect and strong robustness.
It should be noted that, with regard to the specific structure of the present invention, the connection relationships between the modules adopted in the invention are determinate and realizable; except where specifically described in the embodiments, these connection relationships bring about the corresponding technical effects and solve the technical problem proposed by the invention without depending on the execution of corresponding software programs.
Finally, it should be noted that: the above embodiments are only used to illustrate the technical solution of the present invention, and not to limit the same; while the invention has been described in detail and with reference to the foregoing embodiments, it will be understood by those skilled in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some or all of the technical features may be equivalently replaced; and the modifications or the substitutions do not make the essence of the corresponding technical solutions depart from the scope of the technical solutions of the embodiments of the present invention.
Claims (10)
1. An unsupervised remote sensing image semantic segmentation method based on super-resolution and domain self-adaptation, characterized by comprising the following steps:
the method comprises the following steps: acquiring a source domain low-resolution remote sensing image data set and a target domain high-resolution remote sensing image data set, and dividing the acquired target domain image data set into a training image and a test image according to a set proportion;
forming a training set of image semantic segmentation by using the source domain image data set and the target domain training image, and forming a test set of image semantic segmentation by using the test image of the target domain;
preprocessing the remote sensing image data of the training set to obtain a remote sensing image data set subjected to data enhancement;
step two: building a remote sensing image semantic segmentation network, wherein the super-resolution remote sensing image semantic segmentation network comprises a feature coding module, a super-resolution domain distinguishing module, a semantic segmentation module and a semantic segmentation domain distinguishing module, and the feature coding module and the super-resolution module jointly form the super-resolution network of the remote sensing image;
step three: performing network pre-training and parameter optimization on the super-resolution network built in the step two;
step four: training a semantic segmentation network of the remote sensing image;
step five: inputting the preprocessed test set data into the trained semantic segmentation network of the remote sensing image in the fourth step, and outputting an accurate segmentation result of the remote sensing image;
the feature coding module is provided with two repeated residual feature pyramid attention modules and a residual module for realizing multi-scale feature fusion behind the maximum pooling layer;
the super-resolution domain distinguishing module is provided with a residual characteristic pyramid attention module;
the semantic division domain distinguishing module is provided with a residual error characteristic pyramid attention module;
the residual feature pyramid attention module consists of three consecutive feature pyramid attention modules with residual connections; the feature pyramid attention module is divided into two paths, the first path applying global pooling, a convolution with a 1 × 1 kernel and an upsampling layer to the input features to realize feature transfer of the network, and the second path adopting a U-shaped network structure to realize multi-layer feature extraction, performing three convolutions with a stride of 2 to obtain feature maps of 1/2, 1/4 and 1/8 of the input feature size, the kernels of the three convolutions being 7 × 7, 5 × 5 and 3 × 3 respectively; the feature map of size 1/8 is then upsampled and superimposed on the feature map of size 1/4, this step is repeated twice, and finally the output feature map is multiplied pixel by pixel with the feature map subjected to the 1 × 1 convolution to obtain a feature map of the same size as the input feature map; finally, the feature maps of the two paths are superimposed to obtain the multi-scale feature map;
the two convolution operations in the residual module realize feature channel fusion through convolutions with a stride of 1 and a 3 × 3 kernel, and the residual module contains a shortcut connection to accelerate network convergence.
2. The unsupervised remote sensing image semantic segmentation method based on super-resolution and domain self-adaptation according to claim 1, characterized in that: and acquiring a source domain low-resolution remote sensing image data set and a target domain high-resolution remote sensing image data set in the first step through a remote sensing satellite, wherein the source domain low-resolution remote sensing image data set comprises a low-resolution original image and a label data image which is artificially marked, and the target domain high-resolution remote sensing image data set comprises a high-resolution original image.
3. The unsupervised remote sensing image semantic segmentation method based on super-resolution and domain self-adaptation according to claim 1, characterized in that: preprocessing the source domain image data set and the target domain training set image remote sensing image data in the first step specifically comprises image cutting, image sampling and data enhancement of original images in a training set;
the image clipping specifically comprises the following steps: cutting the original image of the source domain and the label data image into images with 256 pixels multiplied by 256 pixels and the resolution of 1 meter per pixel; cutting the target domain training set and the test set image into images with 512 pixels multiplied by 512 pixels and 0.5 meter per pixel resolution;
the image sampling specifically comprises: up-sampling the original image of the source domain and the label data image to obtain a high-resolution image with 512 pixels multiplied by 512 pixels and the resolution of 0.5 meter per pixel; down-sampling the target domain image to obtain a low-resolution image with 256 pixels multiplied by 256 pixels and the resolution of 1 meter per pixel;
the data enhancement comprises: and carrying out image rotation, image vertical and horizontal overturning and image size adjustment on the images in the semantic segmentation training set of the remote sensing images.
4. The unsupervised remote sensing image semantic segmentation method based on super-resolution and domain self-adaptation according to claim 1, characterized in that: the second step of building the super-resolution remote sensing image semantic segmentation network comprises the following steps:
step 2.1: inputting the image into a feature coding module to obtain multi-scale and multi-level image features: the feature coding module realizes the hierarchical feature extraction of the image from the bottom level detail feature to the high level semantic feature through convolution and maximum pooling operation, and specifically comprises the following steps: performing convolution and maximum pooling operation on the image for three times to extract bottom-layer features of the network, extracting multi-scale features of the image through a residual feature pyramid attention module, fusing the extracted multi-scale features of the image through a residual network module, and repeating the extraction and fusion of the multi-scale features for two times to finally obtain rich image features with multiple scales and multiple levels;
step 2.2: the image features extracted by the feature coding module in step 2.1 are processed by the super-resolution module to obtain a high-resolution image with an enlarged size relative to the original image; the image generated by the super-resolution module is then input into the super-resolution domain discrimination module, and the domain to which the input image belongs is discriminated so as to optimize the parameters of the super-resolution module and the feature coding module;
the feature coding module and the super-resolution module jointly form a super-resolution network of the remote sensing image, and the super-resolution network is combined with the super-resolution domain distinguishing module to jointly realize the super-resolution of the low-resolution image;
step 2.3: the features extracted by the feature coding module and the super-resolution module trained in the step 2.2 are taken as the input of the semantic segmentation module together: the features extracted by the feature coding module and part of the features in the image super-resolution module are input into a semantic segmentation module of the image together to realize semantic segmentation of the remote sensing image data;
the feature extraction module, the super-resolution module and the semantic segmentation module jointly form the semantic segmentation network of the image; the high-resolution remote sensing image and the probability map generated by the semantic segmentation module are concatenated and input into the semantic segmentation domain discrimination module to optimize the semantic segmentation network of the image, finally realizing the segmentation function for remote sensing images.
5. The unsupervised remote sensing image semantic segmentation method based on super-resolution and domain self-adaptation according to claim 1, characterized in that: the third step specifically comprises:
step 3.1: randomly initializing the parameters of the feature coding module and the super-resolution module, inputting the training set data preprocessed in the first step into the remote sensing image super-resolution network of the second step to generate high-resolution images, and calculating the super-resolution loss;
inputting the generated high-resolution image and the original high-resolution image into the super-resolution domain discrimination module, discriminating the domain to which each input image belongs, and calculating the super-resolution domain discrimination loss;
step 3.2: back-propagating the losses and alternately optimizing the parameters of the super-resolution network and of the super-resolution domain discrimination module, finally realizing super-resolution of the low-resolution images;
step 3.3: after training is finished, saving the parameters of the trained feature coding module, super-resolution module and super-resolution domain discrimination module.
6. The unsupervised remote sensing image semantic segmentation method based on super-resolution and domain self-adaptation according to claim 1, characterized in that: the fourth step specifically comprises:
step 4.1: initializing the feature coding module, the super-resolution module and the super-resolution domain discrimination module in the semantic segmentation network with the model parameters saved in the third step, while randomly initializing the parameters of the semantic segmentation module and the semantic segmentation domain discrimination module; inputting the training set data preprocessed in the first step into the remote sensing image semantic segmentation network of the second step, generating the semantic segmentation probability map of the remote sensing image, and calculating the semantic segmentation loss;
concatenating the high-resolution remote sensing image with the semantic segmentation probability map and inputting the result into the semantic segmentation domain discrimination module, so as to discriminate the domain to which the probability map generated by the semantic segmentation network belongs and to calculate the semantic segmentation domain discrimination loss;
step 4.2: back-propagating the losses and alternately optimizing the parameters of the semantic segmentation network and of the two domain discrimination modules, with minimization of the loss function as the optimization target, finally completing the optimization of the semantic segmentation network parameters;
step 4.3: after training is finished, saving the trained semantic segmentation network model parameters.
7. The unsupervised remote sensing image semantic segmentation method based on super-resolution and domain adaptation according to claim 4, characterized in that: the network structure of the feature coding module in step 2.1 is as follows:
the first, second and third layers are convolution layers: performing convolution with convolution kernel size of 3 × 3 and step size of 1;
the fourth layer is a max-pooling layer: a max-pooling layer with stride 2 follows the convolution layers.
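The layer arithmetic of claim 7 can be checked with the standard output-size formula; a minimal sketch (the padding value of 1 and the 2 × 2 pooling window are assumptions, since the claim states only kernel sizes and strides):

```python
def conv2d_out(size: int, kernel: int, stride: int, padding: int) -> int:
    # standard convolution/pooling output-size formula:
    # floor((size + 2*padding - kernel) / stride) + 1
    return (size + 2 * padding - kernel) // stride + 1

size = 256                           # hypothetical input tile width
for _ in range(3):                   # layers 1-3: 3x3 convs, stride 1 (padding 1 assumed)
    size = conv2d_out(size, 3, 1, 1)
pooled = conv2d_out(size, 2, 2, 0)   # layer 4: max pooling, stride 2 (2x2 window assumed)
print(size, pooled)                  # 256 128
```

With these assumed paddings, the three convolutions preserve spatial size and each encoder stage halves it only at the pooling layer.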
8. The unsupervised remote sensing image semantic segmentation method based on super-resolution and domain self-adaptation according to claim 1, characterized in that:
the super-resolution module halves the number of output channels through a decoder module while gradually restoring the image size; the decoder module comprises a convolution layer with a 1 × 1 kernel and stride 1 and a deconvolution layer with a 3 × 3 kernel and stride 2, which control the number of output channels and the increase of image resolution respectively; the super-resolution image is finally obtained through three successive deconvolution modules;
the semantic segmentation module gradually restores the image resolution through decoder modules, concatenating each decoder output with the super-resolution feature map of the same resolution as the input of the next decoder module; the semantic segmentation output of the network is finally obtained through two decoder modules;
the super-resolution domain discrimination module consists of five convolution layers with 3 × 3 kernels and stride 2, a residual feature pyramid attention module and a sigmoid activation layer; feature extraction of the high-resolution image is realized through the convolution layers, the extracted features are integrated through the feature pyramid attention module, and the final super-resolution domain label feature map is obtained through the sigmoid activation layer;
the semantic segmentation domain discrimination module consists of five convolution layers with 3 × 3 kernels and stride 2, a residual feature pyramid attention module and a sigmoid activation layer; feature extraction of the semantically segmented image and the probability map is realized through the convolution layers, and the final semantic segmentation domain label feature map is obtained through the sigmoid activation layer.
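As a rough shape check for the decoder/deconvolution path of claim 8, a sketch using the transposed-convolution output-size formula (the claim fixes the 3 × 3 kernel and stride 2; the padding and output-padding values here are assumptions chosen so each deconvolution exactly doubles the spatial size while halving the channel count):

```python
def deconv2d_out(size: int, kernel: int = 3, stride: int = 2,
                 padding: int = 1, output_padding: int = 1) -> int:
    # transposed-convolution output-size formula:
    # (size - 1)*stride - 2*padding + kernel + output_padding
    return (size - 1) * stride - 2 * padding + kernel + output_padding

size, channels = 32, 256          # hypothetical encoder output
for _ in range(3):                # three successive deconvolution modules
    size = deconv2d_out(size)     # spatial size doubles: 32 -> 64 -> 128 -> 256
    channels //= 2                # output channels halved by each decoder module
print(size, channels)             # 256 32
```

Under these assumptions the three deconvolution modules together give an 8× spatial upscaling.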
9. The unsupervised remote sensing image semantic segmentation method based on super-resolution and domain self-adaptation according to claim 1, characterized in that: in the third step, the data set used in training the remote sensing image super-resolution network consists of the low-resolution target domain data obtained by down-sampling, the initial high-resolution target domain data and the low-resolution source domain data;
in the third step, the loss function used in training the remote sensing image super-resolution network is the mean square loss function, which is calculated as:
Lmse = (1/N) ∑i (Xi − Yi)²;
in the above formula: X is the super-resolution image generated by the super-resolution network, Y is the real high-resolution image, and N is the number of image pixel points;
the loss function of the super-resolution domain discrimination module in the third step is a mean square loss function, and the calculation formula of the mean square loss function is as follows:
Ldsr = ΕS[(Is − 1)²] + ΕT[(It)²]
Ldsrinv = ΕS[(Is)²] + ΕT[(It − 1)²];
in the above formula: l isdsrFor loss of the super-resolution domain discrimination module Dsr in training the generator, LdsrinvFor loss of the super-resolution domain discrimination module Dsr when training the discriminator network, IsSuper-resolution maps generated for source domain low resolution images, ItHigh resolution image generated for down-sampling a low resolution image in the target domain, ESTo expect the loss of all inputs belonging to the source domain S, ETThe loss of all the target domains T is expected;
in the third step, when training the super-resolution network E-SR consisting of the feature extraction module E and the super-resolution module SR, the loss function LG is minimized to optimize the parameters θE-SR of the super-resolution network;
when training the super-resolution domain discrimination module Dsr, the loss function LD is minimized to optimize the network parameters θDsr of the domain discriminator part;
the super-resolution of the image is realized through alternating adversarial training of the super-resolution network and the super-resolution domain discrimination module;
the loss function of the super-resolution network in the training process is as follows:
LG = Lmse + Ldsr;
the loss function of the super-resolution domain discrimination module in the training process is as follows:
LD = Ldsrinv;
10. The unsupervised remote sensing image semantic segmentation method based on super-resolution and domain self-adaptation according to claim 1, characterized in that: in the fourth step, the loss functions used in training the remote sensing image semantic segmentation network are the Dice coefficient loss function and the cross entropy loss function used jointly, wherein the cross entropy loss function is calculated as:
Lce = −(1/N) ∑i yi log(y'i);
in the above formula: y is the real label map, y' is the predicted label map, and N is the number of pixel points of the image;
the Dice coefficient loss function is calculated as:
Ldice = 1 − (1/K) ∑k 2|Xk ∩ Yk| / (|Xk| + |Yk|);
in the above formula: X is the generated prediction label probability map, Y is the real label map, |X ∩ Y| is the intersection between the real label map and the prediction label map, |X| is the number of elements of the prediction label map, |Y| is the number of elements of the real label map, and K is the number of label categories;
the loss function of the semantic segmentation domain discrimination module in the fourth step is a mean square loss function, which is calculated as:
Lds = ΕS[(Ls)²] + ΕT[(Lt − 1)²]
Ldsinv = ΕS[(Ls − 1)²] + ΕT[(Lt)²];
in the above formula: l isdsFor loss of the semantic Domain discriminant Module Ds when training the generators, LdsinvFor loss of the semantic Domain partition discrimination Module Ds in training the discriminator, LsSemantic segmentation domain discrimination Module Label map, L, generated for Source Domain imagestSemantic segmentation domain discrimination Module tag map generated for target Domain images, ESTo expect the loss of all inputs belonging to the source domain S, ETThe loss of all the target domains T is expected;
in the fourth step, when training the semantic segmentation network E-SR-S consisting of the feature extraction module E, the semantic segmentation module S and the super-resolution module SR, the loss function LG is minimized to optimize the parameters θE-SR-S of the semantic segmentation network, wherein the loss function LG is the sum of the cross entropy loss function, the Dice coefficient loss function, the super-resolution loss, and the losses of the super-resolution domain discrimination module and the semantic segmentation domain discrimination module;
when training the super-resolution domain discrimination module Dsr and the semantic segmentation domain discrimination module Ds, the loss function LD is minimized to optimize the network parameters θDsr and θDs of the two domain discrimination modules;
the semantic segmentation of the image is realized through alternating adversarial training of the semantic segmentation network with the semantic segmentation domain discrimination module Ds and the super-resolution domain discrimination module Dsr;
the loss function of the semantic segmentation network in the training process is as follows:
LG = Lce + Ldice + Lmse + Ldsr + Lds;
the loss functions of the super-resolution domain discrimination module and the semantic segmentation domain discrimination module in the training process are as follows:
LD = Ldsrinv + Ldsinv.
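The alternating optimization of claims 9 and 10 can be summarized in a short bookkeeping sketch: the generator phase sums the five terms stated above into LG while the discriminators are frozen, then the discriminator phase sums the two inverse domain losses into LD while the generator is frozen (all numeric loss values below are hypothetical):

```python
def generator_loss(l_ce, l_dice, l_mse, l_dsr, l_ds):
    # L_G: sum of cross-entropy, Dice, super-resolution (MSE) and the two
    # generator-phase domain-discrimination losses
    return l_ce + l_dice + l_mse + l_dsr + l_ds

def discriminator_loss(l_dsrinv, l_dsinv):
    # L_D: sum of the two discriminator-phase domain losses
    return l_dsrinv + l_dsinv

# one alternating round with hypothetical per-term values:
lg = generator_loss(0.42, 0.17, 0.08, 0.51, 0.48)  # minimize w.r.t. θ_E-SR-S, D frozen
ld = discriminator_loss(0.55, 0.60)                # minimize w.r.t. θ_Dsr, θ_Ds, G frozen
print(round(lg, 2), round(ld, 2))                  # 1.66 1.15
```

Each phase minimizes its own total against the other's parameters, which is the alternating adversarial schedule the claims describe.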
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110530385.2A CN113160234B (en) | 2021-05-14 | 2021-05-14 | Unsupervised remote sensing image semantic segmentation method based on super-resolution and domain self-adaptation |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113160234A CN113160234A (en) | 2021-07-23 |
CN113160234B true CN113160234B (en) | 2021-12-14 |
Family
ID=76876131
Legal Events

Date | Code | Title | Description
---|---|---|---
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| GR01 | Patent grant | |
20221221 | TR01 | Transfer of patent right | Effective date of registration: 20221221; Address after: No. 47, Xutan East Street, Taiyuan, Shanxi 030031; Patentee after: Shanxi corps of China Building Materials Industry Geological Exploration Center; Address before: 030024 No. 79 West Main Street, Taiyuan, Shanxi, Yingze; Patentee before: Taiyuan University of Technology