CN116912595A - Cross-domain multi-mode remote sensing image classification method based on contrast learning - Google Patents
- Publication number
- CN116912595A (application number CN202310959584.4A)
- Authority
- CN
- China
- Prior art keywords
- domain
- source domain
- source
- target
- network
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G06V10/764 — Arrangements for image or video recognition or understanding using pattern recognition or machine learning, using classification, e.g. of video objects
- G06V10/82 — Arrangements for image or video recognition or understanding using pattern recognition or machine learning, using neural networks
- G06V10/86 — Arrangements for image or video recognition or understanding using pattern recognition or machine learning, using syntactic or structural representations of the image or video pattern, e.g. symbolic string recognition; using graph matching
- G06V20/10 — Scenes; Scene-specific elements; Terrestrial scenes
- Y02A40/10 — Adaptation technologies in agriculture, forestry, livestock or agroalimentary production in agriculture
Abstract
A cross-domain multi-mode remote sensing image classification method based on contrast learning comprises the following steps: a pre-training stage, in which source domain data are preprocessed; source domain data features are extracted to obtain source domain fusion features; the source domain fusion features are input into a classifier to obtain classification results; and a cross-domain contrast learning stage, in which source domain and target domain data are re-input and preprocessed; the source domain and target domain networks of this stage are initialized with the optimal source domain network parameters obtained in the pre-training stage, and a feature queue is initialized for each target domain class; source domain and target domain data features are extracted to obtain source domain and target domain fusion features; classification results and high-dimensional features are obtained, and domain-adaptive alignment is performed with contrast learning; back propagation updates the source domain network, momentum updates the target domain network, and the parameters of the optimal target domain network and its classification results are saved. The invention achieves unsupervised classification of the target domain, reduces the inter-domain difference between source and target domains, and markedly improves the classification accuracy of target domain remote sensing images.
Description
Technical Field
The invention belongs to the technical field of remote sensing image classification, and particularly relates to a cross-domain multi-mode remote sensing image classification method based on contrast learning.
Background
In the field of remote sensing, multi-source remote sensing data based on hyperspectral images have been widely applied to land use/land cover classification, target detection, and environmental change monitoring. In particular, hyperspectral images provide detailed spectral information, while remote sensing data from other sources provide complementary information; lidar data, for example, provide meaningful elevation and spatial information. With these complementary strengths, multimodal data improve classification accuracy over a single modality.
In recent years, deep learning has achieved tremendous success in many image processing applications. Deep networks, such as convolutional neural networks, have a strong ability to extract high-level features for pattern recognition. Motivated by this, many works have applied deep neural networks to remote sensing image classification and demonstrated superior performance when large amounts of labeled data are available; these are known as supervised learning methods. To fuse spatial and spectral information for more accurate classification, two-dimensional and three-dimensional convolution layers are also used to extract deep features of remote sensing images.
However, labeling large-scale training data to obtain supervision is laborious and expensive. One potential solution is to transfer a model trained on a labeled source domain to the desired unlabeled target domain; however, direct model transfer often degrades classification performance on the target domain because of domain shift, i.e., misaligned distributions.
Domain adaptation is an effective way to address this problem. It comprises two main families of methods: traditional domain adaptation methods and deep-learning-based domain adaptation methods.
Traditional domain adaptation methods mainly include: feature-based adaptation, which maps the source and target domain samples into a common feature space with a mapping f so that the two can be aligned in that space; instance-based adaptation, which, observing that some source domain samples are always very similar to target domain samples, weights each source sample's loss during training, giving greater weight the more similar the sample is to the target domain; and parameter-based adaptation, which finds new model parameters θ' and transfers them so that the model works better on the target domain.
Deep-learning-based domain adaptation methods include: methods based on the maximum mean discrepancy (MMD), which reduce the target domain generalization error by reducing the discrepancy between the two domains — transfer component analysis, a common example, maps the source and target domains into a reproducing kernel Hilbert space, measures the discrepancy between the two mapped data distributions with MMD, and builds an MMD regularization term that constrains the learned representation during feature learning so that features on the two domains are as alike as possible; adversarial methods, in which a generator produces features and a discriminator tries to tell whether they come from the source or the target domain — if it cannot, the two domains coincide in the feature space; and reconstruction-based methods such as DRCN, which encode source and target domain samples with an encoder, classify the source domain features with a classifier, and decode the target domain features with a decoder so that target samples can be restored as faithfully as possible, making the feature space of the generated features similar across source and target domain samples.
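To make the maximum mean discrepancy concrete, the following is a minimal PyTorch sketch (not taken from any cited method) of a single-kernel RBF estimate of the squared MMD between two feature batches; the kernel choice and bandwidth sigma are illustrative assumptions:

```python
import torch

def rbf_mmd(source_feats, target_feats, sigma=1.0):
    # Squared MMD between two feature batches with a single RBF kernel;
    # small values mean the two feature distributions are close.
    def rbf(a, b):
        d2 = torch.cdist(a, b) ** 2          # pairwise squared distances
        return torch.exp(-d2 / (2 * sigma ** 2))
    k_ss = rbf(source_feats, source_feats).mean()
    k_tt = rbf(target_feats, target_feats).mean()
    k_st = rbf(source_feats, target_feats).mean()
    return k_ss + k_tt - 2 * k_st
```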
Among deep-learning-based domain adaptation methods, designing an inter-domain distance representation for MMD-based methods is very complicated; adversarial methods, when the feature extraction network becomes too strong after adversarial training, can mismatch the class distributions of source and target domains in the feature space, so the classifier performs poorly on target domain samples; and reconstruction-based methods rely on powerful feature extractors and are relatively sensitive to noise and outliers, which can bias the model when processing target domain data.
Disclosure of Invention
To overcome the problems in the prior art, the invention aims to provide a cross-domain multi-mode remote sensing image classification method based on contrast learning. The method achieves unsupervised remote sensing image classification on the target domain, effectively addressing the difficulty of acquiring remote sensing image labels; it normalizes the mean and variance of the source and target domain data, effectively reducing the inter-domain difference; it adopts a two-step training strategy — source domain pre-training followed by source-target cross-domain contrast learning training — which effectively speeds up network convergence; and by integrating contrast learning into cross-domain training, it effectively reduces the inter-domain difference between source and target domains and effectively improves the classification accuracy of target domain remote sensing images.
In order to achieve the above purpose, the technical scheme adopted by the invention is as follows:
a cross-domain multi-mode remote sensing image classification method based on contrast learning comprises the following steps:
S101: firstly inputting the source domain hyperspectral and laser radar image data to be classified, and performing data preprocessing on the source domain data;
S102: extracting features of the preprocessed source domain data with convolution layers, flattening the features, and then concatenating them to obtain the source domain fusion feature;
S103: inputting the source domain fusion feature into a classifier to obtain the classification result, repeating S102 to S103 to train the network multiple times, and saving the optimal parameters of the source domain network;
S104: in the cross-domain contrast learning stage, firstly re-inputting the source domain hyperspectral and source domain laser radar image data, inputting the target domain hyperspectral and target domain laser radar image data, and preprocessing the data; taking the optimal source domain network parameters as the initial parameters of the source domain and target domain networks of this stage, and initializing the feature queue of each target domain class;
S105: extracting features of the source and target domain data preprocessed in S104 with convolution layers, flattening and concatenating the source domain and target domain features respectively to obtain the source domain fusion feature and the target domain fusion feature;
S106: inputting the source domain and target domain fusion features into a classifier and a mapper to obtain classification results and high-dimensional features of the source and target domains, updating the feature queues of S104 according to the target domain classification results, and performing contrast learning at the same time;
S107: back-propagating to update the source domain network, updating the target domain network by momentum, repeating S104 to S107, and saving the parameters of the optimal target domain network and its classification results.

As a further technical solution of the present invention, step S101 specifically includes:
In the pre-training stage, mean-variance normalization is first performed on the input source domain hyperspectral and source domain laser radar image data to obtain the source domain hyperspectral image $H_S$ and source domain laser radar image $L_S$ via the following two formulas:

$$H_S = \frac{H'_S - \mu_{H_S}}{\sigma_{H_S}}, \qquad L_S = \frac{L'_S - \mu_{L_S}}{\sigma_{L_S}}$$

where $H'_S$ is the unnormalized source domain hyperspectral image, $\mu_{H_S}$ is the mean of the source domain hyperspectral image, and $\sigma_{H_S}$ is its standard deviation; $L'_S$ is the unnormalized source domain laser radar image, $\mu_{L_S}$ is the mean of the source domain laser radar image, and $\sigma_{L_S}$ is its standard deviation;
Edge filling is performed on the source domain hyperspectral image $H_S$ and source domain laser radar image $L_S$ respectively, and with each pre-filling pixel as the center, one-to-one corresponding source domain hyperspectral image blocks and source domain laser radar image blocks are constructed;

Next, for each class, 200 labeled pairs of corresponding source domain hyperspectral and laser radar image blocks are randomly selected according to the class of the center pixel as the training set, and the rest serve as the test set.
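For illustration, a minimal NumPy sketch of this preprocessing; the reflect padding mode, the background label 0, and the function names are our assumptions, since the text fixes only the normalization, the 11×11 patch size, and the 200-per-class split:

```python
import numpy as np

def normalize(img):
    # Mean-variance (z-score) normalization over the whole image cube.
    return (img - img.mean()) / img.std()

def extract_patches(img, patch=11):
    # Edge-pad so every original pixel can be a patch center, then cut an
    # 11x11 block around each pixel. img: (C, H, W) -> (H*W, C, 11, 11).
    r = patch // 2
    padded = np.pad(img, ((0, 0), (r, r), (r, r)), mode="reflect")
    C, H, W = img.shape
    blocks = np.empty((H * W, C, patch, patch), dtype=img.dtype)
    for i in range(H):
        for j in range(W):
            blocks[i * W + j] = padded[:, i:i + patch, j:j + patch]
    return blocks

def split_per_class(labels, n_train=200, seed=0):
    # 200 labeled samples per class for training, the rest for testing;
    # label 0 is assumed to mean "unlabeled background".
    rng = np.random.default_rng(seed)
    train_idx, test_idx = [], []
    for c in np.unique(labels[labels > 0]):
        idx = rng.permutation(np.flatnonzero(labels == c))
        train_idx.extend(idx[:n_train])
        test_idx.extend(idx[n_train:])
    return np.array(train_idx), np.array(test_idx)
```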
As a further technical solution of the present invention, the step S102 specifically includes:
in a source domain network, a hyperspectral image processing branch and a laser radar image processing branch respectively apply two layers of convolution layers to perform feature extraction;
Assume a source domain hyperspectral image block $P_H^S$ of size C×11×11, where C is the number of channels of the hyperspectral image block. Two convolution layers are constructed, the first with 64 kernels of size 3×3, stride 2, padding 1, and the second with 32 kernels of size 3×3, stride 2, padding 1; each layer's output is passed through a ReLU activation function, finally yielding the source domain hyperspectral image feature $F_H^S$ of size 32×3×3;

Assume a source domain laser radar image block $P_L^S$ of size 1×11×11. Two convolution layers are constructed, the first with 16 kernels of size 3×3, stride 2, padding 1, and the second with 32 kernels of size 3×3, stride 2, padding 1; each layer's output is passed through a ReLU activation function, finally yielding the source domain laser radar image feature $F_L^S$ of size 32×3×3;

The convolved source domain hyperspectral image feature $F_H^S$ and source domain laser radar image feature $F_L^S$ are flattened with the channel dimension kept unchanged, giving a source domain hyperspectral image feature of size 32×9 and a source domain laser radar image feature of size 32×9;

The flattened source domain hyperspectral image feature and source domain laser radar image feature are concatenated and fused along the channel dimension to obtain the source domain fusion feature $F^S$ of size 64×9;
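A minimal PyTorch sketch of the two-branch extractor with exactly the layer sizes stated above (the class and variable names are ours):

```python
import torch
import torch.nn as nn

class TwoBranchEncoder(nn.Module):
    # C x 11 x 11 HSI patch -> 32 x 3 x 3; 1 x 11 x 11 lidar patch -> 32 x 3 x 3;
    # flattened and concatenated into a 64 x 9 fusion feature.
    def __init__(self, hsi_channels):
        super().__init__()
        self.hsi_branch = nn.Sequential(
            nn.Conv2d(hsi_channels, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, 32, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.lidar_branch = nn.Sequential(
            nn.Conv2d(1, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
        )

    def forward(self, hsi_patch, lidar_patch):
        f_h = self.hsi_branch(hsi_patch).flatten(2)      # (B, 32, 9)
        f_l = self.lidar_branch(lidar_patch).flatten(2)  # (B, 32, 9)
        return torch.cat([f_h, f_l], dim=1)              # (B, 64, 9)
```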
as a further technical solution of the present invention, the step S103 specifically includes:
A network consisting of a linear layer, a batch normalization layer, and a ReLU layer is selected as the classifier; the last layer of the classifier is a linear layer whose number of output channels is the number of ground object classes. Let the classifier output be $\hat{Y}_S$ and the true labels be $Y_S$; the resulting cross entropy loss is:

$$L_{CE} = -\frac{1}{N}\sum_{i=1}^{N}\sum_{c=1}^{M} y_{ic}\,\log P_S^{ic}$$

where M is the number of classes; $y_{ic}$ is an indicator (0 or 1) that takes 1 if the true class of sample i equals c and 0 otherwise; $P_S$ is the predicted probability vector obtained from $\hat{Y}_S$ via the Softmax function; and $P_S^{ic}$ is the predicted probability that sample i belongs to class c;
The cross entropy loss function supervises network learning; the source domain network parameters are updated by back propagation with stochastic gradient descent, and the source domain network parameters that perform best on the source domain test set are saved.
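A hedged sketch of this pre-training step; the hidden width of the classifier and the SGD settings are assumptions, since the text fixes only the layer types and the output width:

```python
import torch.nn as nn

def make_classifier(in_dim=64 * 9, num_classes=7, hidden=128):
    # Linear -> batch norm -> ReLU, ending in a linear layer with one
    # output channel per ground object class; hidden=128 is an assumption.
    return nn.Sequential(
        nn.Flatten(),                  # (B, 64, 9) -> (B, 576)
        nn.Linear(in_dim, hidden),
        nn.BatchNorm1d(hidden),
        nn.ReLU(),
        nn.Linear(hidden, num_classes),
    )

def pretrain_step(encoder, classifier, optimizer, hsi, lidar, labels):
    # One supervised source domain update: cross entropy loss, then back
    # propagation with the caller's optimizer (e.g. torch.optim.SGD).
    logits = classifier(encoder(hsi, lidar))
    loss = nn.functional.cross_entropy(logits, labels)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```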
As a further technical solution of the present invention, the specific steps of step S104 are as follows:
In the cross-domain contrast learning stage, preprocessing of the source domain data is the same as in the pre-training stage; mean-variance normalization is performed on the input target domain hyperspectral image and target domain laser radar image to obtain the target domain hyperspectral image $H_T$ and target domain laser radar image $L_T$ via the following two formulas:

$$H_T = \frac{H'_T - \mu_{H_T}}{\sigma_{H_T}}, \qquad L_T = \frac{L'_T - \mu_{L_T}}{\sigma_{L_T}}$$

where $H'_T$ is the unnormalized target domain hyperspectral image, $\mu_{H_T}$ is its mean, and $\sigma_{H_T}$ is its standard deviation; $L'_T$ is the unnormalized target domain laser radar image, $\mu_{L_T}$ is its mean, and $\sigma_{L_T}$ is its standard deviation;
Edge filling is then performed on the target domain hyperspectral image $H_T$ and target domain laser radar image $L_T$ respectively, and with each pre-filling pixel as the center, one-to-one corresponding target domain hyperspectral image blocks and target domain laser radar image blocks are constructed. Since the target domain data have no true labels, no training/test split is needed and all samples are used directly for training;
loading the source domain network optimal parameters stored in the S103 as initial parameters of the source domain network and the target domain network in subsequent training;
The invention sets a queue of capacity K for each class of the target domain data to store the corresponding features, providing data for the subsequent computation of the contrast learning loss. All target domain samples are tested with the initialized target domain network; the index of the maximum of the probability vector obtained by passing the classifier output through the Softmax function is taken as the pseudo label, and if the pseudo label's confidence exceeds a set threshold, the corresponding high-dimensional feature output by the mapper is enqueued into the queue of that pseudo label's class. Once all tests are finished, the initial feature queue of each class is obtained.
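A sketch of this queue initialization under stated assumptions: the capacity K, the confidence threshold, and the ℓ2 normalization of mapper outputs are our choices, as the text leaves them unspecified:

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def init_class_queues(encoder, classifier, mapper, loader,
                      num_classes=7, K=64, thresh=0.9):
    # One queue per target domain class, filled with the mapper features
    # of confidently pseudo-labeled samples. K and thresh are assumed.
    queues = [[] for _ in range(num_classes)]
    for hsi, lidar in loader:
        fused = encoder(hsi, lidar)
        probs = torch.softmax(classifier(fused), dim=1)
        conf, pseudo = probs.max(dim=1)            # confidence, pseudo label
        feats = F.normalize(mapper(fused.flatten(1)), dim=1)
        for f, c, p in zip(feats, conf, pseudo.tolist()):
            if c > thresh and len(queues[p]) < K:
                queues[p].append(f)
    return [torch.stack(q) if q else None for q in queues]
```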
As a further technical solution of the present invention, the step S105 specifically includes:
In the cross-domain contrast learning stage the source domain and target domain networks have exactly the same structure, so the convolution kernels used for feature extraction are the same, and their parameters are consistent with the source domain convolution kernel parameters of the pre-training stage; the extracted target domain hyperspectral image feature $F_H^T$ and target domain laser radar image feature $F_L^T$ are each of size 32×3×3;

The flattening and concatenation operations of the target domain follow the source domain; concatenating along the channel dimension yields the source domain fusion feature $F^S$ and the target domain fusion feature $F^T$ of this stage, both of size 64×9.
As a further technical solution of the present invention, the step S106 specifically includes:
In this stage the target domain network adopts, like the source domain network, a classifier consisting of a linear layer, a batch normalization layer, and a ReLU layer; the last layer of the classifier is a linear layer whose number of output channels is the number of ground object classes. Since the target domain data carry no true labels, the target domain cross entropy loss need not be computed; only the source domain cross entropy loss is computed, in the same way as in the pre-training stage;
At this stage both the source domain and target domain networks adopt a network consisting of a linear layer, a batch normalization layer, and a ReLU layer as the mapper; the mapper maps features into a high-dimensional space, and the feature queues store the corresponding high-dimensional mapper outputs according to the target domain network's classification results while dequeuing the earliest features;
During contrast learning cross-domain training, the invention selects the target domain feature queue of the class matching the current source domain sample's true label and averages all features in that queue to obtain one high-dimensional feature; the corresponding high-dimensional feature output by the mapper for the source domain sample is regarded as its positive sample, high-dimensional features in the queues of the other classes are regarded as negative samples, and the contrast loss is computed with the InfoNCE loss function:

$$L_{CL} = -\log \frac{\exp(q \cdot k_{+}/\tau)}{\sum_{i=0}^{N}\exp(q \cdot k_{i}/\tau)}$$

where q is the averaged high-dimensional feature of the matching queue, $k_{+}$ is the high-dimensional feature of the source domain sample output by the mapper, N is the number of all samples (positive and negative), and τ is a temperature hyperparameter;
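A minimal sketch of this loss for one source domain sample; the ℓ2 normalization and the default τ = 0.07 are common InfoNCE conventions assumed here, not values fixed by the text:

```python
import torch
import torch.nn.functional as F

def infonce_loss(queue_mean, source_feat, negatives, tau=0.07):
    # queue_mean: averaged features of the matching class queue (q).
    # source_feat: mapper output of the source sample (the positive k+).
    # negatives: stacked features from the other classes' queues, (n, d).
    q = F.normalize(queue_mean, dim=0)
    k_pos = F.normalize(source_feat, dim=0)
    k_neg = F.normalize(negatives, dim=1)
    logits = torch.cat([(q * k_pos).sum().view(1), k_neg @ q]) / tau
    return -F.log_softmax(logits, dim=0)[0]   # = -log(pos / sum over all)
```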
as a further technical solution of the present invention, the step S107 specifically includes:
At this stage, the source domain cross entropy loss $L_{CE}$ and the contrast loss $L_{CL}$ together guide source domain network learning, and the source domain network parameters are updated by back propagation with stochastic gradient descent;
Gradients are not back-propagated through the target domain network; the target domain network parameters are updated by the following momentum update:
θ=m·θ+(1-m)·ξ
where θ denotes the target domain network parameters, ξ the source domain network parameters, and m a momentum hyperparameter;
Training proceeds for multiple iterations until the network converges, and the parameters of the target domain network that performs best on the target domain samples, together with its classification results, are saved.
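A sketch of the momentum update θ = m·θ + (1−m)·ξ applied parameter-wise; m = 0.99 is an assumed value, as the text does not state m:

```python
import torch

@torch.no_grad()
def momentum_update(target_net, source_net, m=0.99):
    # theta = m * theta + (1 - m) * xi; no gradient ever flows into target_net.
    for theta, xi in zip(target_net.parameters(), source_net.parameters()):
        theta.mul_(m).add_(xi, alpha=1 - m)
```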
The invention has the beneficial effects that:
1. The invention trains on image blocks, reducing hardware requirements while preserving the integrity of spatial information.
2. The invention normalizes the source and target domain data by mean and variance, so that overall the source and target domain data approximately follow a normal distribution with mean 0 and standard deviation 1, reducing the inter-domain difference.
3. The invention provides a two-step training strategy, namely, source domain pre-training is performed firstly, and then the optimal parameters of the source domain network obtained by the source domain pre-training are loaded to the source domain network and the target domain network in the second-stage cross-domain contrast learning training as initial parameters, so that the network training speed is effectively accelerated.
4. According to the method, contrast learning is integrated into cross-domain training, so that the encoded representation can capture information shared between two domains in the same class, meanwhile, the inter-domain difference between a source domain and a target domain is effectively reduced, and the classification precision of the target domain remote sensing image is effectively improved.
5. The invention can realize the non-supervision remote sensing image classification of the target domain and effectively solve the problem that the remote sensing image label is difficult to acquire.
Drawings
Fig. 1 is a flowchart of a cross-domain multi-mode remote sensing image classification method based on contrast learning according to an embodiment of the present invention.
Fig. 2 is a schematic diagram of a source domain network structure in a pre-training stage according to an embodiment of the present invention.
Fig. 3 is a schematic diagram of a cross-domain contrast learning training network according to an embodiment of the present invention.
Fig. 4 shows the target domain remote sensing image classification result map and the true label map for the method of the embodiment of the present invention, where Fig. 4(a) is the true label map and Fig. 4(b) is the target domain remote sensing image classification result map obtained by the method of the invention.
Detailed Description
The invention is described in further detail below with reference to the accompanying drawings.
As shown in FIG. 1, the cross-domain multi-mode remote sensing image classification method based on contrast learning provided by the invention comprises the following steps:
S101: the method comprises the steps of firstly inputting hyperspectral and laser radar image data of a source domain to be classified, and carrying out data preprocessing on the source domain data;
s102: extracting the characteristics of the preprocessed source domain data by using a convolution layer, flattening the characteristics, and then splicing to obtain source domain fusion characteristics;
s103: inputting the source domain fusion characteristics into a classifier to obtain a classification result, repeating S102 to S103, training the network for multiple times, and storing optimal parameters of the source domain network;
s104: in the cross-domain contrast learning stage, firstly, re-inputting source domain hyperspectral and source domain laser radar image data, inputting target domain hyperspectral and target domain laser radar image data, and preprocessing the data; taking the optimal parameters of the source domain network obtained in the pre-training stage in the S103 as initial parameters of the source domain network and the target domain network in the stage, and initializing the characteristic queues of the target domain corresponding to each category;
s105: and extracting the characteristics of the source domain and the target domain data preprocessed in the step S104 by using a convolution layer, respectively flattening and splicing the source domain and the target domain characteristics to obtain a source domain fusion characteristic and a target domain fusion characteristic.
S106: inputting the source domain fusion features and the target domain fusion features into a classifier and a mapper to obtain classification results and high-dimensional features, updating a feature queue according to the classification results of the target domain, and performing contrast learning at the same time;
and S107, back propagation updates the source domain network, momentum updates the target domain network, and repeatedly S104 to S107, and saves parameters and classification results of the optimal target domain network.
As shown in fig. 1, the cross-domain multi-mode remote sensing image classification method based on contrast learning provided by the invention has the following implementation process:
(1) Fig. 2 shows a source domain network model of the pre-training phase. In the pre-training stage, firstly, source domain hyperspectral and source domain laser radar image data to be classified are input, and data preprocessing is carried out on the source domain data.
To give features that may originally differ greatly in distribution equal influence on the model, mean-variance standardization is performed on the input source domain hyperspectral image and source domain laser radar image data so that they follow a normal distribution with mean 0 and standard deviation 1. The processed source domain hyperspectral image $H_S$ and source domain laser radar image $L_S$ are obtained via the following two formulas:

$$H_S = \frac{H'_S - \mu_{H_S}}{\sigma_{H_S}}, \qquad L_S = \frac{L'_S - \mu_{L_S}}{\sigma_{L_S}}$$

where $H'_S$ is the unnormalized source domain hyperspectral image, $\mu_{H_S}$ is its mean, and $\sigma_{H_S}$ is its standard deviation; $L'_S$ is the unnormalized source domain laser radar image, $\mu_{L_S}$ is its mean, and $\sigma_{L_S}$ is its standard deviation;
Hyperspectral images usually contain rich information; training directly on the whole image places high demands on hardware, while training on single pixels ignores the spatial correlation between pixels. In view of this, the invention performs edge filling on the source domain hyperspectral image $H_S$ and source domain laser radar image $L_S$, and with each pre-filling pixel as the center constructs one-to-one corresponding source domain hyperspectral image blocks and source domain laser radar image blocks of size C×11×11, where C is the number of channels of the source domain hyperspectral or laser radar image block; this reduces hardware requirements while keeping the information essentially complete.
Next, 200 pairs are randomly selected from the corresponding source domain hyperspectral image blocks and source domain laser radar image blocks with labels according to the category of the central pixel point from each category to serve as a training set, and the rest serve as a test set.
(2) And extracting the characteristics of the preprocessed source domain data by using a convolution layer, flattening the characteristics, and then splicing to obtain the source domain fusion characteristics.
In a source domain network, a hyperspectral image processing branch and a laser radar image processing branch apply two layers of convolution layers to extract space, spectrum information and space and elevation information;
The source domain hyperspectral image block $P_H^S$ has size C×11×11. Two convolution layers are constructed, the first with 64 kernels of size 3×3, stride 2, padding 1, and the second with 32 kernels of size 3×3, stride 2, padding 1; each layer's output is fed to a ReLU activation function, which strengthens the nonlinear relations between layers of the neural network and enhances its expressive power. Finally the source domain hyperspectral image feature $F_H^S$ of size 32×3×3 is obtained;

The source domain laser radar image block $P_L^S$ has size 1×11×11. Two convolution layers are constructed, the first with 16 kernels of size 3×3, stride 2, padding 1, and the second with 32 kernels of size 3×3, stride 2, padding 1; each layer's output is fed to a ReLU activation function, finally yielding the source domain laser radar image feature $F_L^S$ of size 32×3×3;

The convolved source domain hyperspectral image feature $F_H^S$ and source domain laser radar image feature $F_L^S$ are flattened with the channel dimension kept unchanged, giving a source domain hyperspectral image feature of size 32×9 and a source domain laser radar image feature of size 32×9;

Hyperspectral images usually contain rich spectral information but relatively little spatial information, while laser radar images contain rich spatial-elevation information; using only hyperspectral features or only laser radar features for the classification task therefore gives unsatisfactory results. The obtained source domain hyperspectral image feature $F_H^S$ and source domain laser radar image feature $F_L^S$ are consequently concatenated and fused along the channel dimension to obtain the source domain fusion feature $F^S$ of size 64×9;
(3) Inputting the source domain fusion characteristics into a classifier to obtain a classification result, repeating S102 to S103, training the network for multiple times, and storing optimal parameters of the source domain network.
A network consisting of a linear layer, a batch normalization layer, and a ReLU layer is selected as the classifier; the last layer is a linear layer whose number of output channels is the number of ground object classes, and the source domain fusion feature is input into the classifier to obtain the classification result. Let the classifier output be $\hat{Y}_S$ and the true labels be $Y_S$; the cross entropy loss is:

$$L_{CE} = -\frac{1}{N}\sum_{i=1}^{N}\sum_{c=1}^{M} y_{ic}\,\log P_S^{ic}$$

where M is the number of classes; $y_{ic}$ is an indicator (0 or 1) taking 1 if the true class of sample i equals c and 0 otherwise; $P_S$ is the predicted probability vector obtained from $\hat{Y}_S$ via the Softmax function; and $P_S^{ic}$ is the predicted probability that sample i belongs to class c;
The cross entropy loss supervises source domain network learning; the source domain network parameters are updated by back propagation with stochastic gradient descent, and the source domain network parameters performing best on the source domain test set are saved.
(4) The cross-domain contrast learning stage is shown in Fig. 3. In this stage, the source domain hyperspectral and source domain laser radar image data are first re-input, the target domain hyperspectral and target domain laser radar image data are input, and the data are preprocessed; the optimal source domain network parameters obtained in the pre-training stage of S103 are taken as the initial parameters of the source domain and target domain networks of this stage, and a feature queue is initialized for each target domain class.
(4a) In the cross-domain contrast learning stage, preprocessing of the source domain data is the same as in the pre-training stage; mean-variance normalization is performed on the input target domain hyperspectral image and target domain laser radar image to obtain the target domain hyperspectral image $H_T$ and target domain laser radar image $L_T$ via the following two formulas:

$$H_T = \frac{H'_T - \mu_{H_T}}{\sigma_{H_T}}, \qquad L_T = \frac{L'_T - \mu_{L_T}}{\sigma_{L_T}}$$

where $H'_T$ is the unnormalized target domain hyperspectral image, $\mu_{H_T}$ is its mean, and $\sigma_{H_T}$ is its standard deviation; $L'_T$ is the unnormalized target domain laser radar image, $\mu_{L_T}$ is its mean, and $\sigma_{L_T}$ is its standard deviation;
After mean-variance normalization of the source and target domain data, both approximately follow a normal distribution with mean 0 and standard deviation 1 overall, reducing the inter-domain difference.
Then respectively aiming at the hyperspectral image H of the target domain T And target area lidar image L T And (3) performing edge filling, and constructing a target domain hyperspectral image block and a target domain laser radar image block which are in one-to-one correspondence by taking each pixel point before filling as a center. The target domain data has no real label, a training set and a testing set are not needed to be divided, and all samples are directly taken for training;
(4b) Loading optimal parameters of a source domain network in a pre-training stage as initial parameters of the source domain network and a target domain network in subsequent training;
(4c) The invention sets a queue of capacity K for each class of the target domain data to store the corresponding features, providing data for the subsequent computation of the contrast learning loss. All target domain samples are tested with the initialized target domain network; the index of the maximum of the probability vector obtained by passing the classifier output through the Softmax function is taken as the pseudo label, and if that maximum — the pseudo label's confidence — exceeds a set threshold, the corresponding high-dimensional feature output by the mapper is enqueued into the queue of that pseudo label's class. Once all tests are finished, the initial feature queue of each class is obtained. During subsequent training the feature queues are updated dynamically: the newest features are enqueued and some of the earliest-enqueued features are dequeued, so that the features enqueued early and late in a queue never differ greatly.
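A sketch of such a first-in-first-out update for one class queue; the tensor layout and the capacity K (matching the initialization) are our assumptions:

```python
import torch

def update_queue(queue, new_feats, K=64):
    # Enqueue the newest confident features and dequeue the oldest, so
    # early and late entries in a queue never drift far apart.
    # Shapes: queue (n, d), new_feats (b, d).
    queue = torch.cat([queue, new_feats.detach()], dim=0)
    return queue[-K:]   # keep only the K most recent features
```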
(5) And extracting the characteristics of the source domain and the target domain data by using the convolution layer, respectively flattening and splicing the source domain and the target domain characteristics to obtain the source domain fusion characteristics and the target domain fusion characteristics.
The structure of the source domain and the target domain network is completely consistent in the cross-domain contrast learning stage, soThe convolution kernel used for extracting the characteristics is consistent, the parameters of the convolution kernel are consistent with the parameters of the source domain convolution kernel in the pre-training stage, and the extracted target domain hyperspectral image characteristics can be obtainedAnd target area lidar image feature +.>Is 32 x 3 and 32 x 3;
the flattening and splicing operation of the hyperspectral image features of the target domain and the laser radar image features of the target domain are consistent with the source domain, and are spliced in the channel dimension, so that the source domain fusion features at the stage can be obtainedFusion features with the target Domain->The sizes of the two components are 64 multiplied by 9;
(6) And inputting the source domain fusion features and the target domain fusion features into a classifier and a mapper to obtain classification results and high-dimensional features, updating a feature queue according to the classification results of the target domain, and performing contrast learning.
In this stage, the target domain network adopts the same network composed of a linear layer, a batch normalization layer and a ReLu layer as the source domain network as a classifier, the last layer of the classifier is the linear layer, the output channel number of the classifier is the ground object class number, but the target domain data is not provided with a real label, so that the target domain cross entropy loss is not required to be calculated, only the source domain cross entropy loss is required to be calculated, and the calculation of the source domain cross entropy loss is consistent with the calculation mode of the pre-training stage;
In contrast learning, each sample is usually mapped into a projection space in which the distance to positive samples is reduced and the distance to negative samples is increased, forcing the representation model to ignore surface factors and learn the samples' inherent, consistent structural information. The invention adopts a network consisting of a linear layer, a batch normalization layer, and a ReLU layer as the mapper in both the source domain and target domain networks; the mapper maps features into a high-dimensional space, and the target domain's feature queues store the corresponding high-dimensional features according to the target domain network's classification results while dequeuing the earliest features.
During contrast learning cross-domain training, the invention selects the target domain feature queue of the class matching the current source domain sample's true label and averages all features in that queue to obtain a high-dimensional feature, the Anchor; the corresponding high-dimensional feature output by the mapper for the source domain sample is regarded as the Anchor's positive sample, high-dimensional features in the queues of the other classes are regarded as negative samples, and the contrast loss is computed with the InfoNCE loss function:

$$L_{CL} = -\log \frac{\exp(q \cdot k_{+}/\tau)}{\sum_{i=0}^{N}\exp(q \cdot k_{i}/\tau)}$$

where q is the averaged high-dimensional feature of the matching queue, $k_{+}$ is the high-dimensional feature of the source domain sample output by the mapper, N is the number of all samples (positive and negative), and τ is a temperature hyperparameter;
Through this contrast loss function, the distance between the Anchor and the positive sample is reduced in the high-dimensional mapping space while the distance between the Anchor and the negative samples is increased, so that the encoded representation can capture information shared between the two domains.
(7) And updating the source domain network by using back propagation, updating the target domain network by using momentum, repeating training the network, and storing the optimal target domain network parameters and the classification results thereof.
At this stage, the source domain cross entropy loss $L_{CE}$ and the contrast loss $L_{CL}$ together guide source domain network learning, and the source domain network parameters are updated by back propagation with stochastic gradient descent;

Gradients are not back-propagated through the target domain network; the target domain network parameters are updated by the following momentum update:

θ = m·θ + (1−m)·ξ

where θ denotes the target domain network parameters, ξ the source domain network parameters, and m a momentum hyperparameter;
Momentum updating keeps the source and target domain networks consistent, so the target domain network evolves more smoothly, avoiding abrupt changes in the target domain network parameters that would create large feature differences within the queues and destroy the consistency of the representation;
Training proceeds until the network converges, and the parameters of the target domain network that performs best on the target domain samples, together with its classification results, are saved.

The technical effects of the present invention are described in detail below in conjunction with simulation experiments:
(1) Simulation experiment conditions:
the hardware platform of the simulation experiment of the invention is: NVIDIA GeForce 3090 and Intel (R) Core (TM) i9-10900X CPU@3.70GHz.
The software platform of the simulation experiment of the invention is: operating system Ubuntu 18.04, Python 3.7, and PyTorch 1.12.
The data sets used in the simulation experiment are Houston2013-LiDAR and Houston2018-LiDAR, each consisting of a hyperspectral image and its corresponding LiDAR image. Houston2013 and Houston2018 are hyperspectral images of the University of Houston campus and nearby scenes, taken by different sensors at different times. The Houston2013 image consists of 349×1905 pixels, comprises 144 spectral bands in the wavelength range 380-1050 nm, and has a spatial resolution of 2.5 m. The Houston2018 image covers the same wavelength range but contains only 48 spectral bands, with a spatial resolution of 1 m. The two scenes share seven land-cover classes. From the Houston2013 image, the 48 spectral bands (wavelength range 0.38-1.05 μm) corresponding to the Houston2018 image were extracted, and the overlapping region, of size 209×955, was selected. Houston2013-LiDAR is used as the source domain data and Houston2018-LiDAR as the target domain data. The sample classes and amounts are listed in Table 1.
Table 1 source and destination domain data sample numbers
(2) Experimental content and results analysis
To verify the effectiveness of the proposed method, three widely used cross-domain remote sensing image classification methods were selected for comparison: DeepCoral, DAAN, and DSAN. Cross-domain remote sensing image classification was performed on the input Houston2013-LiDAR and Houston2018-LiDAR image data with each method to obtain the final target domain remote sensing image classification result maps.
The prior art contrast cross-domain remote sensing image classification method used in the invention refers to:
The prior art Deep CORAL cross-domain remote sensing image classification method refers to the method proposed by Sun et al. in "Deep CORAL: Correlation Alignment for Deep Domain Adaptation," in: Hua, G., Jégou, H. (eds) Computer Vision - ECCV 2016 Workshops, Lecture Notes in Computer Science, vol. 9915, Springer, Cham.
The prior art DAAN cross-domain remote sensing image classification method refers to the method proposed by Yu et al. in "Transfer Learning with Dynamic Adversarial Adaptation Network," 2019 IEEE International Conference on Data Mining (ICDM), Beijing, China, 2019, pp. 778-786.
The prior art DSAN cross-domain remote sensing image classification method refers to the method proposed by Zhu et al. in "Deep Subdomain Adaptation Network for Image Classification," IEEE Transactions on Neural Networks and Learning Systems, vol. 32, no. 4, pp. 1713-1722, April 2021.
The target domain image classification results obtained by the four methods are objectively evaluated with two indexes: overall accuracy (OA) and the Kappa coefficient. OA is the proportion of correctly classified samples among all samples; the closer OA is to 1, the higher the detection accuracy. The Kappa coefficient characterizes the consistency of the obtained result with the reference map; the closer Kappa is to 1, the better the method performs. The values of the evaluation indexes are listed in Table 2.
TABLE 2 Quantitative analysis of target domain remote sensing image classification results obtained by performing cross-domain remote sensing image classification on the Houston2013-LiDAR and Houston2018-LiDAR image data

|        | DeepCoral | DAAN  | DSAN  | Proposed |
|--------|-----------|-------|-------|----------|
| OA (%) | 52.30     | 53.27 | 57.75 | 67.76    |
| Kappa  | 33.94     | 35.81 | 39.30 | 53.46    |
As can be seen from Table 2, the overall accuracy OA of the invention reaches 67.76% and the Kappa value reaches 53.46; compared with the best of the listed comparison methods (DSAN), these are improvements of 10.01 percentage points and 14.16 respectively, clearly higher than the prior-art methods, showing that the invention can effectively improve the classification accuracy of target domain remote sensing images. Fig. 4(b) shows the target domain remote sensing image classification result map obtained by the method of the invention, and Fig. 4(a) its true label map.
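For reference, OA and the Kappa coefficient reported in Table 2 can be computed from a confusion matrix as in this generic sketch (not the patent's code):

```python
import numpy as np

def oa_and_kappa(y_true, y_pred, num_classes=7):
    # Confusion matrix -> overall accuracy and Cohen's kappa.
    cm = np.zeros((num_classes, num_classes), dtype=np.int64)
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1
    n = cm.sum()
    oa = np.trace(cm) / n
    pe = (cm.sum(axis=0) * cm.sum(axis=1)).sum() / n ** 2  # chance agreement
    return oa, (oa - pe) / (1 - pe)
```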
The simulation experiment shows that the cross-domain multi-mode remote sensing image classification method based on the contrast learning realizes the non-supervision remote sensing image classification of the target domain, integrates the contrast learning into the cross-domain training, effectively reduces the inter-domain difference between the source domain and the target domain, and effectively improves the classification precision of the remote sensing image of the target domain.
The above is only for illustrating the technical idea of the present invention, and the protection scope of the present invention is not limited by this, and any modification made on the basis of the technical scheme according to the technical idea of the present invention falls within the protection scope of the claims of the present invention.
Claims (10)
1. A cross-domain multi-mode remote sensing image classification method based on contrast learning is characterized by comprising the following steps:
S101: firstly inputting the source domain hyperspectral and laser radar image data to be classified, and performing data preprocessing on the source domain data;
S102: extracting features of the preprocessed source domain data with convolution layers, flattening the features, and then concatenating them to obtain the source domain fusion feature;
S103: inputting the source domain fusion feature into a classifier to obtain the classification result, repeating S102 to S103 to train the network multiple times, and saving the optimal parameters of the source domain network;
S104: in the cross-domain contrast learning stage, firstly re-inputting the source domain hyperspectral and source domain laser radar image data, inputting the target domain hyperspectral and target domain laser radar image data, and preprocessing the data; taking the optimal source domain network parameters as the initial parameters of the source domain and target domain networks of this stage, and initializing the feature queue of each target domain class;
S105: extracting features of the source and target domain data preprocessed in S104 with convolution layers, flattening and concatenating the source domain and target domain features respectively to obtain the source domain fusion feature and the target domain fusion feature;
S106: inputting the source domain and target domain fusion features into a classifier and a mapper to obtain classification results and high-dimensional features of the source and target domains, updating the feature queues of S104 according to the target domain classification results, and performing contrast learning at the same time;
S107: back-propagating to update the source domain network, updating the target domain network by momentum, repeating S104 to S107, and saving the parameters of the optimal target domain network and its classification results.
2. The method for classifying cross-domain multi-modal remote sensing images based on contrast learning according to claim 1, wherein the step S101 specifically comprises:
In the pre-training stage, mean-variance normalization is first performed on the input source domain hyperspectral and source domain laser radar image data to obtain the source domain hyperspectral image $H_S$ and source domain laser radar image $L_S$ via the following two formulas:

$$H_S = \frac{H'_S - \mu_{H_S}}{\sigma_{H_S}}, \qquad L_S = \frac{L'_S - \mu_{L_S}}{\sigma_{L_S}}$$

where $H'_S$ is the unnormalized source domain hyperspectral image, $\mu_{H_S}$ is its mean, and $\sigma_{H_S}$ is its standard deviation; $L'_S$ is the unnormalized source domain laser radar image, $\mu_{L_S}$ is its mean, and $\sigma_{L_S}$ is its standard deviation;
Edge filling is performed on the source domain hyperspectral image $H_S$ and source domain laser radar image $L_S$ respectively, and with each pre-filling pixel as the center, one-to-one corresponding source domain hyperspectral image blocks and source domain laser radar image blocks are constructed;

Next, for each class, 200 labeled pairs of corresponding source domain hyperspectral and laser radar image blocks are randomly selected according to the class of the center pixel as the training set, and the rest serve as the test set.
3. The method for classifying cross-domain multi-modal remote sensing images based on contrast learning according to claim 1, wherein the step S102 is specifically:
in the source domain network, the hyperspectral image processing branch and the lidar image processing branch each apply two convolution layers for feature extraction;
assume a source domain hyperspectral image block of size $C \times 11 \times 11$, where $C$ is the number of channels of the hyperspectral image block; two convolution layers are constructed, the first with kernel size $64 \times 3 \times 3$, stride 2 and padding 1, the second with kernel size $32 \times 3 \times 3$, stride 2 and padding 1, and the output of each layer is passed through the activation function ReLU, finally yielding a source domain hyperspectral image feature of size $32 \times 3 \times 3$;
assume a source domain lidar image block of size $1 \times 11 \times 11$; two convolution layers are constructed, the first with kernel size $16 \times 3 \times 3$, stride 2 and padding 1, the second with kernel size $32 \times 3 \times 3$, stride 2 and padding 1, and the output of each layer is passed through the activation function ReLU, finally yielding a source domain lidar image feature of size $32 \times 3 \times 3$;
the source domain hyperspectral image feature and the source domain lidar image feature obtained by convolution are flattened with the channel dimension kept unchanged, giving a source domain hyperspectral image feature of size $32 \times 9$ and a source domain lidar image feature of size $32 \times 9$;
the flattened source domain hyperspectral image feature and source domain lidar image feature are spliced and fused along the channel dimension to obtain the source domain fusion feature of size $64 \times 9$.
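A minimal PyTorch sketch of this two-branch extractor, under the kernel sizes, strides and padding stated above; the module and variable names are illustrative assumptions.

```python
import torch
import torch.nn as nn

class TwoBranchExtractor(nn.Module):
    def __init__(self, hsi_channels):
        super().__init__()
        # HSI branch: C -> 64 -> 32 channels, 3x3 kernels, stride 2, padding 1.
        self.hsi = nn.Sequential(
            nn.Conv2d(hsi_channels, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, 32, 3, stride=2, padding=1), nn.ReLU(),
        )
        # LiDAR branch: 1 -> 16 -> 32 channels, same kernel/stride/padding.
        self.lidar = nn.Sequential(
            nn.Conv2d(1, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
        )

    def forward(self, hsi_patch, lidar_patch):
        # Each branch maps an 11x11 patch to a 32x3x3 feature map.
        f_h = self.hsi(hsi_patch).flatten(2)      # (B, 32, 9)
        f_l = self.lidar(lidar_patch).flatten(2)  # (B, 32, 9)
        return torch.cat([f_h, f_l], dim=1)       # fusion feature (B, 64, 9)
```

With an 11x11 input, each stride-2 layer halves the spatial size (11 to 6 to 3), which is why the flattened features have 9 spatial positions.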
4. The method for classifying cross-domain multi-modal remote sensing images based on contrast learning according to claim 1, wherein the step S103 is specifically as follows:
a network consisting of a linear layer, a batch normalization layer and a ReLU layer is selected as the classifier; the last layer of the classifier is a linear layer, and its number of output channels is the number of ground object classes; let the classifier output be $\hat{Y}_S$ and the true labels be $Y_S$; the resulting cross entropy loss is as follows:

$$L_{CE} = -\frac{1}{N}\sum_{i=1}^{N}\sum_{c=1}^{M} y_{ic} \log P^S_{ic}$$

wherein $M$ is the number of classes; $y_{ic}$ is an indicator function (0 or 1), taking 1 if the true class of sample $i$ equals $c$ and 0 otherwise; $P^S$ is the predicted probability vector obtained by passing $\hat{Y}_S$ through the Softmax function; and $P^S_{ic}$ is the predicted probability that observed sample $i$ belongs to class $c$;
network learning is guided in a supervised manner by the cross entropy loss function; the source domain network parameters are updated by back propagation and stochastic gradient descent, and the source domain network parameters that perform best on the source domain test set are saved.
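As a hedged sketch of one supervised pre-training step: the classifier shape (linear, batch normalization, ReLU, final linear) follows the claim, while the hidden width of 128 and the use of a torch.optim-style SGD optimizer are assumptions.

```python
import torch
import torch.nn as nn

def make_classifier(in_dim, num_classes, hidden=128):
    # Linear -> BatchNorm -> ReLU, ending in a linear layer with one output
    # channel per ground-object class; the hidden width is an assumed value.
    return nn.Sequential(
        nn.Linear(in_dim, hidden), nn.BatchNorm1d(hidden), nn.ReLU(),
        nn.Linear(hidden, num_classes),
    )

def pretrain_step(extractor, classifier, optimizer, hsi, lidar, labels):
    fused = extractor(hsi, lidar).flatten(1)            # (B, 64*9) fusion feature
    logits = classifier(fused)
    loss = nn.functional.cross_entropy(logits, labels)  # applies Softmax internally
    optimizer.zero_grad()
    loss.backward()                                     # back propagation
    optimizer.step()                                    # e.g. torch.optim.SGD
    return loss.item()
```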
5. The method for classifying cross-domain multi-modal remote sensing images based on contrast learning according to claim 1, wherein the specific steps of step S104 are as follows:
in the cross-domain contrast learning stage, the preprocessing of the source domain data is consistent with the pre-training stage; mean-variance normalization is applied to the input target domain hyperspectral image and target domain lidar image to obtain the target domain hyperspectral image $H_T$ and the target domain lidar image $L_T$, given by the following two formulas:

$$H_T = \frac{H'_T - \mu_{H_T}}{\sigma_{H_T}}, \qquad L_T = \frac{L'_T - \mu_{L_T}}{\sigma_{L_T}}$$

wherein $H'_T$ is the unnormalized target domain hyperspectral image, $\mu_{H_T}$ is the mean of the target domain hyperspectral image, and $\sigma_{H_T}$ is the standard deviation of the target domain hyperspectral image; $L'_T$ is the unnormalized target domain lidar image, $\mu_{L_T}$ is the mean of the target domain lidar image, and $\sigma_{L_T}$ is the standard deviation of the target domain lidar image;
edge padding is then applied to the target domain hyperspectral image $H_T$ and the target domain lidar image $L_T$ respectively, and one-to-one corresponding target domain hyperspectral image blocks and target domain lidar image blocks are constructed with each pre-padding pixel as the center; since the target domain data has no true labels, no training/test split is needed, and all samples are used directly for training;
the optimal source domain network parameters saved in S103 are then loaded as the initial parameters of both the source domain network and the target domain network for subsequent training.
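A short sketch of this initialization, assuming the parameters were saved with torch.save under a hypothetical file name:

```python
import copy
import torch

def init_stage_networks(source_net, ckpt_path="source_pretrain_best.pt"):
    # Load the optimal source-domain parameters saved in S103 (file name is
    # assumed) and start both networks from the same weights.
    source_net.load_state_dict(torch.load(ckpt_path))
    target_net = copy.deepcopy(source_net)   # identical structure and parameters
    return source_net, target_net
```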
6. The contrast learning-based cross-domain multi-mode remote sensing image classification method according to claim 5, wherein each class of the target domain data is provided with a queue of capacity K for storing the corresponding features, which supplies data for the subsequent calculation of the contrast learning loss value; all target domain samples are tested with the initialized target domain network, the index of the maximum of the probability vector obtained by passing the classifier output through the Softmax function is taken as the pseudo label, and if the confidence of the pseudo label exceeds a set threshold, the corresponding feature output by the mapper is enqueued into the queue of the corresponding class according to the pseudo label; once all tests are completed, the initial feature queue of each class is obtained.
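A hedged sketch of this queue initialization: the capacity K = 64 and the confidence threshold 0.9 are assumed values, and the decomposition into extractor, classifier and mapper modules is ours.

```python
from collections import deque
import torch

@torch.no_grad()
def init_queues(extractor, classifier, mapper, loader, num_classes, K=64, thresh=0.9):
    # One FIFO queue of capacity K per target-domain class.
    queues = [deque(maxlen=K) for _ in range(num_classes)]
    for hsi, lidar in loader:                    # all target samples, unlabelled
        fused = extractor(hsi, lidar).flatten(1)
        probs = torch.softmax(classifier(fused), dim=1)
        conf, pseudo = probs.max(dim=1)          # pseudo label = argmax index
        feats = mapper(fused)                    # mapper output: high-dim features
        for f, c, p in zip(feats, pseudo, conf):
            if p > thresh:                       # keep only confident pseudo labels
                queues[c.item()].append(f)
    return queues
```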
7. The method for classifying cross-domain multi-modal remote sensing images based on contrast learning according to claim 1, wherein the step S105 is specifically as follows:
in the cross-domain contrast learning stage, the structures of the source domain and target domain networks are completely consistent, so the convolution kernels used for feature extraction are the same, and their parameters are consistent with the source domain convolution kernels of the pre-training stage; the extracted target domain hyperspectral image feature and target domain lidar image feature are therefore both of size $32 \times 3 \times 3$;
the flattening and splicing operations of the target domain are consistent with the source domain, splicing along the channel dimension, so that the source domain fusion feature and the target domain fusion feature of this stage are obtained, both of size $64 \times 9$.
8. The method for classifying cross-domain multi-modal remote sensing images based on contrast learning according to claim 1, wherein the step S106 is specifically as follows:
in this stage, the target domain network adopts the same classifier as the source domain network, a network consisting of a linear layer, a batch normalization layer and a ReLU layer, whose last layer is a linear layer with as many output channels as there are ground object classes; since the target domain data has no true labels, the target domain cross entropy loss need not be calculated, and only the source domain cross entropy loss is calculated, in the same way as in the pre-training stage;
at this stage, both the source domain network and the target domain network adopt a network consisting of a linear layer, a batch normalization layer and a ReLU layer as the mapper; the mapper maps features to a high-dimensional space, and the feature queues enqueue the corresponding high-dimensional features output by the mapper according to the classification results of the target domain network while dequeuing the earliest features.
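A sketch of the mapper and the rolling queue update under the same assumptions as above; a Python deque with maxlen K evicts its earliest entry automatically, realizing the dequeue-oldest behaviour.

```python
import torch.nn as nn

def make_mapper(in_dim, out_dim=128, hidden=128):
    # Linear -> BatchNorm -> ReLU -> Linear mapper to the high-dimensional
    # contrastive space; the widths are assumed values.
    return nn.Sequential(
        nn.Linear(in_dim, hidden), nn.BatchNorm1d(hidden), nn.ReLU(),
        nn.Linear(hidden, out_dim),
    )

def update_queues(queues, feats, pseudo_labels, conf, thresh=0.9):
    # Enqueue confident target features by pseudo label; deque(maxlen=K)
    # drops the oldest stored feature when the queue is full.
    for f, c, p in zip(feats, pseudo_labels, conf):
        if p > thresh:
            queues[c.item()].append(f.detach())
```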
9. The cross-domain multi-mode remote sensing image classification method based on contrast learning according to claim 8, wherein the feature queue of the target domain class corresponding to the true label of the current source domain sample is selected, and all features in that queue are averaged to obtain a high-dimensional feature; the corresponding high-dimensional feature output by the mapper for the source domain sample is regarded as its positive sample, the high-dimensional features in the feature queues of the other classes are regarded as negative samples, and the contrast loss is calculated with the InfoNCE loss function, as shown in the following formula:

$$L_{CL} = -\log \frac{\exp(q \cdot k_+ / \tau)}{\sum_{i=1}^{N} \exp(q \cdot k_i / \tau)}$$

wherein $q$ is the averaged high-dimensional feature of the corresponding queue, $k_+$ is the high-dimensional feature of the source domain sample, $N$ is the number of all samples (including positive and negative samples), and $\tau$ is a temperature hyper-parameter.
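A hedged sketch of this loss, assuming non-empty queues, dot-product similarity, and a default temperature of 0.07 (the patent does not fix these values):

```python
import torch
import torch.nn.functional as F

def contrast_loss(k_pos, queues, label, tau=0.07):
    # q: mean of all features in the queue of the source sample's true class;
    # this sketch assumes that queue is non-empty.
    q = torch.stack(list(queues[label])).mean(dim=0)
    # Negatives: every feature stored in the queues of the other classes.
    negs = [f for c, qs in enumerate(queues) if c != label for f in qs]
    keys = torch.stack([k_pos] + negs)               # (N, D), positive at index 0
    logits = keys @ q / tau                          # similarities q . k_i / tau
    target = torch.zeros(1, dtype=torch.long, device=keys.device)
    return F.cross_entropy(logits.unsqueeze(0), target)  # InfoNCE as a 1-of-N loss
```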
10. The method for classifying cross-domain multi-modal remote sensing images based on contrast learning according to claim 1, wherein the step S107 is specifically as follows:
at this stage, the source domain cross entropy loss $L_{CE}$ and the contrast loss $L_{CL}$ jointly guide the learning of the source domain network, and the source domain network parameters are updated by back propagation and stochastic gradient descent;
the gradients of the target domain network are not back-propagated; instead, the target domain network parameters are updated by the following momentum update:

$$\theta = m \cdot \theta + (1 - m) \cdot \xi$$

wherein $\theta$ denotes the target domain network parameters, $\xi$ denotes the source domain network parameters, and $m$ is a momentum hyper-parameter;
training is repeated until the network converges, and the parameters and classification results of the target domain network that performs best on the target domain samples are saved.
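A minimal sketch of this momentum update, with m = 0.99 as an assumed value; the target network receives no gradients, matching the claim.

```python
import torch

@torch.no_grad()
def momentum_update(target_net, source_net, m=0.99):
    # theta = m * theta + (1 - m) * xi, applied parameter-wise.
    for theta, xi in zip(target_net.parameters(), source_net.parameters()):
        theta.mul_(m).add_(xi, alpha=1.0 - m)
```

Choosing m close to 1 makes the target network a slowly moving average of the source network, which keeps the pseudo labels and feature queues stable across training steps.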
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310959584.4A CN116912595A (en) | 2023-08-01 | 2023-08-01 | Cross-domain multi-mode remote sensing image classification method based on contrast learning |
Publications (1)
Publication Number | Publication Date |
---|---|
CN116912595A true CN116912595A (en) | 2023-10-20 |
Family
ID=88362931
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202310959584.4A Pending CN116912595A (en) | 2023-08-01 | 2023-08-01 | Cross-domain multi-mode remote sensing image classification method based on contrast learning |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116912595A (en) |
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117252274A (en) * | 2023-11-17 | 2023-12-19 | 北京理工大学 | Text audio image contrast learning method, device and storage medium |
CN117252274B (en) * | 2023-11-17 | 2024-01-30 | 北京理工大学 | Text audio image contrast learning method, device and storage medium |
CN118429684A (en) * | 2024-03-05 | 2024-08-02 | 合肥喆塔科技有限公司 | Cross-domain detection method, equipment and medium for surface defects of known unknown type wafer |
CN118247668A (en) * | 2024-05-24 | 2024-06-25 | 安徽大学 | Diffusion model-based hyperspectral image multi-source domain self-adaptive classification method |
CN118247668B (en) * | 2024-05-24 | 2024-07-30 | 安徽大学 | Diffusion model-based hyperspectral image multi-source domain self-adaptive classification method |
CN118262182A (en) * | 2024-05-30 | 2024-06-28 | 中国人民解放军国防科技大学 | Image element learning domain generalization method and system based on element regularization and distance constraint |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication ||
SE01 | Entry into force of request for substantive examination ||