CN115471739A - Cross-domain remote sensing scene classification and retrieval method based on self-supervision contrast learning - Google Patents
- Publication number
- CN115471739A (application CN202210927707.1A / CN202210927707A)
- Authority
- CN
- China
- Prior art keywords
- domain
- image
- target
- target domain
- class
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/10—Terrestrial scenes
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/50—Information retrieval; Database structures therefor; File system structures therefor of still image data
- G06F16/55—Clustering; Classification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/50—Information retrieval; Database structures therefor; File system structures therefor of still image data
- G06F16/58—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
- G06F16/583—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/74—Image or video pattern matching; Proximity measures in feature spaces
- G06V10/761—Proximity, similarity or dissimilarity measures
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/764—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/77—Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
- G06V10/774—Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
- G06V10/7753—Incorporation of unlabelled data, e.g. multiple instance learning [MIL]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/82—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
Abstract
The invention relates to a cross-domain remote sensing scene classification and retrieval method based on self-supervised contrast learning, which comprises the following steps. a) Acquire remote sensing images and construct input data. b) Construct a loss function based on self-supervised contrast learning that combines the known and unknown classes of the target domain images, construct a deep domain adaptive learning network, and train the deep domain adaptive learning network with the input data and the loss function. c) Classify the target domain images with the deep domain adaptive learning network; extract target image feature vectors of the target domain images to construct a feature database; extract the query image feature vector of a target domain query image; calculate the Euclidean distances between the query image feature vector and the target image feature vectors in the feature database; and select the required retrieval targets based on these distances. The cross-domain remote sensing scene classification and retrieval method based on self-supervised contrast learning retains good cross-domain classification and retrieval precision even when unknown classes exist in the target domain.
Description
Technical Field
The invention relates to the technical field of optical remote sensing image retrieval, in particular to a cross-domain remote sensing scene classification and retrieval method based on self-supervision contrast learning.
Background
In recent years, advances in Earth observation technology have provided ever more high-resolution remote sensing images, bringing great opportunities to the remote sensing field and strongly promoting the application of remote sensing images in different domains. Remote sensing scene classification and retrieval are basic tasks in remote sensing image interpretation: they allow remote sensing image data to be understood and managed quickly and accurately, and they play an important role in fields such as environmental monitoring, land use and visual navigation.
Deep convolutional neural networks (CNNs) developed in recent years have strong feature-fitting capability and show great superiority in remote sensing scene classification and retrieval tasks. The general procedure is to first fine-tune a backbone network pre-trained on a general large-scale image data set (such as ImageNet) on a remote sensing image data set, and then extract the activation output of the network as the image feature representation for retrieval or classification.
However, most existing CNN-based methods are supervised and usually assume that the training set and the test set share the same data distribution. In practical applications, differences in imaging conditions such as sensors, shooting angles and shooting weather cause features of the same class to differ greatly across data distributions, a phenomenon known as domain shift. When domain shift exists between the training set and the test set, the model generalizes poorly to the test set, and re-labelling the test set is time-consuming, labour-intensive and impractical. In addition, most existing domain adaptation methods for remote sensing scene classification or retrieval are proposed for closed-set scenarios, that is, the target domain and the source domain share the same label space. In complex real scenes this assumption is easily violated: the classes of the source domain are often incomplete, the source domain cannot cover all classes, and the target domain may contain class samples that the source domain does not share.
In view of the above, it is necessary to design a cross-domain remote sensing scene classification and retrieval method based on self-supervision contrast learning.
Disclosure of Invention
The invention aims to provide a cross-domain remote sensing scene classification and retrieval method based on self-supervised contrast learning that retains good classification and retrieval precision on the target domain even when the target domain contains unknown classes.
In order to solve the technical problem, the invention provides a cross-domain remote sensing scene classification and retrieval method based on self-supervision contrast learning, which comprises the following steps:
a) Acquiring a remote sensing image, and dividing a source domain image and a target domain image of the remote sensing image to construct input data;
b) Constructing a loss function based on self-supervision contrast learning and combined with the known class and the unknown class of the target domain image, constructing a depth domain adaptive learning network, and training the depth domain adaptive learning network by using the input data and the loss function;
c) Classifying the target domain images by using the trained depth domain adaptive learning network, extracting target image feature vectors of the target domain images to construct a feature database, extracting query image feature vectors of the target domain query images, calculating Euclidean distances between the query image feature vectors and all the target image feature vectors in the feature database, arranging according to the Euclidean distances, and obtaining the required retrieval target according to a set Euclidean distance range.
Further, the step of constructing the input data comprises: extracting a number of images {1, 2, …, N} from the remote sensing image data set to construct the source domain D_s = {(x_i^s, y_i^s)}_{i=1}^{n_s}, which comprises n_s labelled source domain images x_i^s with corresponding labels y_i^s ∈ {1, 2, …, C}, where {1, 2, …, C} is the label space of the labelled source domain images and C is their total number of categories; the target domain is D_t = {x_j^t}_{j=1}^{n_t}, which comprises n_t unlabelled target domain images x_j^t, wherein the label space of the target domain images is {1, 2, …, C+1}, with C+1 representing the unknown class of the unlabelled target domain images.
Further, the deep domain adaptive learning network comprises a plurality of feature coding networks f(·), a plurality of contrast learning networks g(·) and a plurality of classifiers c(·).
Further, the feature coding network f(·) is a deep residual network with the fully connected layer removed, and its average pooling layer is replaced by a bottleneck layer.
Further, the contrast learning network g(·) is a perceptron with a ReLU (rectified linear unit) activation function.
Further, the classifier c(·) is a fully connected network whose output dimension matches the number of classes of the target domain image.
Further, the constructing step of the loss function includes:
b11 Construct source domain classification loss: and carrying out supervised learning on the source domain image, and calculating the classification accuracy by adopting cross entropy loss:
wherein L is softmax In order to classify the function of the loss,a source domain annotated image representing the source domain imageTrue class distribution, functionA source domain weakly enhanced sample class probability distribution representing the classifier output,a collection of label exemplars representing an annotated image in a source domain;
b12 Construct the self-supervised contrast loss: constructing a target domain strong enhancement sample of the target domain imageAnd target domain weakly enhanced samplesTo calculate the contrast loss L ssl :
Wherein sim (-) is the similarity measure function, θ is the scaling factor, A ∈ {0,1} is an indication function for evaluating whether k equals j, B represents the number of samples selected by one training;
b13 Construct a known class identification penalty as:
where μ represents the proportion of samples within a training run that meet the selection requirements for a known class threshold, H (-) represents the cross entropy loss,for weakly enhancing samples from the target domainThe collected set of the target domain known pseudo labels obtained through screening, ind represents the target domain weak enhancement sampleThe category to which the known pseudo label belongs after screening, and ind epsilon {1,2, …, C },representing strongly enhanced samples of the target domainIs determined based on the predicted class probability distribution of (c),a collection of labeled exemplars representing strongly enhanced exemplars of the target domain;
b14 Construct unknown class identification loss: consistency classification loss L of unknown class identification loss as high-confidence unknown class sample unknown :
Wherein, the first and the second end of the pipe are connected with each other,for weakly enhancing samples from the target domainScreening the obtained collection of the unknown pseudo labels of the target domain,representing strongly enhanced samples of the target domainA predicted class probability distribution of (a);
b15) The constructed total loss function L is:
L = L_softmax + α·L_ssl + β·L_known + γ·L_unknown
where α, β and γ are parameters that balance the optimization objectives of the model.
Further, the target domain weakly enhanced sample x_j^{t,w} is obtained by randomly cropping and flipping the unlabelled target domain image x_j^t; the target domain strongly enhanced sample x_j^{t,s} is obtained from x_j^t by a random enhancement method; and the source domain weakly enhanced sample is obtained by randomly cropping and flipping the labelled source domain image x_i^s.
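A minimal sketch of the two enhancement pipelines on raw NumPy image arrays may help fix ideas: random crop-and-flip for the weak view, and brightness jitter plus cutout as a crude, assumed stand-in for the strong random enhancement, which the patent does not specify further.

```python
import numpy as np

rng = np.random.default_rng(0)

def weak_augment(img):
    """Random horizontal flip + random crop with reflect padding.

    img: (H, W, C) float array in [0, 1]. A simplified stand-in for the
    patent's random crop-and-flip weak enhancement.
    """
    if rng.random() < 0.5:
        img = img[:, ::-1, :]                       # horizontal flip
    pad = 4
    padded = np.pad(img, ((pad, pad), (pad, pad), (0, 0)), mode="reflect")
    y = rng.integers(0, 2 * pad + 1)
    x = rng.integers(0, 2 * pad + 1)
    h, w, _ = img.shape
    return padded[y:y + h, x:x + w, :]

def strong_augment(img):
    """Weak enhancement followed by brightness jitter and cutout;
    a crude proxy (an assumption of this sketch) for RandAugment-style
    strong enhancement."""
    img = weak_augment(img).copy()
    img = np.clip(img * rng.uniform(0.6, 1.4), 0.0, 1.0)   # brightness jitter
    h, w, _ = img.shape
    cy, cx, s = rng.integers(0, h), rng.integers(0, w), 8  # cutout square
    img[max(0, cy - s // 2):cy + s // 2, max(0, cx - s // 2):cx + s // 2, :] = 0.0
    return img
```

Both functions preserve the image shape, so the two views of one image can be batched together for the contrast loss.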
Further, the training step of the deep domain adaptive learning network comprises:
b21 The source domain weakly enhanced sample and the target domain weakly enhanced sampleAnd target domain strongly enhanced samplesInputting into the feature coding network f (-) to respectively obtain the source domain featuresTarget domain weakly enhanced image featuresAnd strong enhancement of image features in the target domain
B22 Weakly enhancing the target domain image featuresAnd the target domain strongly enhances image featuresInputting the contrast learning network g (-) to obtain the embedded characteristics of the projected target domain weak enhanced imageAnd strong enhancement of image embedding characteristics in the target domainTo calculate the contrast loss L ssl ;
B23 Characterize the source domainThe target domain weakly enhances image featuresAnd the target domain strongly enhances image featuresInputting the classifier c (-) to respectively obtain the source domain weakly enhanced sample class probability distribution predicted by the classifierThe target domain weakly enhanced sample class probability distributionAnd the target domain strongly enhances the sample class probability distribution
B24 Weakly enhancing sample class probability distribution to the source domainBased on the classification loss function L softmax Calculating the classification loss of the source domain;
b25 Weakly enhancing sample class probability distribution to the target domainFirstly, finding the category of the maximum prediction probability, comparing the probability value of the category with a preset predefined threshold tau, abandoning the samples smaller than tau, reserving the samples larger than tau as pseudo label samples, and taking the category of the maximum prediction probability as a known hard label, wherein the screening formula is as follows:
wherein, the first and the second end of the pipe are connected with each other,to representA category in which the maximum prediction probability that satisfies a threshold condition is located;
b26 Using the target domain weakly enhanced samplesKnown pseudo-labelAs strong enhancement samples of the corresponding target domainCalculates the target domain strong enhancement samplesSaid known class of (1) identifies a loss L known ;
B27 Selecting the target domain weakly enhanced sample class probability distributionAnd taking the sample with lower confidence level as a candidate unknown sample, wherein the specific selection formula is as follows:
whereinFor the preliminarily screened candidate unknown samples, t l Selecting a threshold value for the candidate sample, selecting a sample predicted that the probability of the unknown class is higher than the set unknown class sample selection threshold value as an unknown class sample,
whereinIs the candidate sampleProbability of prediction as unknown class, t uk A threshold value is chosen for the unknown class sample,unknown pseudo-label for target domain;
b28 With the target domain unknown class pseudo-tagAs a strongly enhanced sample of the target domainCalculating a consistent classification loss L of the unknown class samples unknown And obtaining the total loss function L, and updating the parameters of the deep domain adaptive learning network.
Further, the step of obtaining the retrieval target is:
c21 Extracting the query image feature vector based on the trained feature coding network;
c22 Computing Euclidean distances between the query image feature vector and each target image feature vector in the feature database one by one;
c23 According to the Euclidean distance between the target image feature vector and the query image feature vector, sorting the target image feature vectors to obtain the retrieval target corresponding to the target image feature vectors.
According to the above technical scheme, the cross-domain remote sensing scene classification and retrieval method based on self-supervised contrast learning first constructs input data comprising source domain images and target domain images, where the source domain data are labelled and the target domain data are unlabelled, and applies the corresponding enhancements to the constructed input data. The enhanced source domain and target domain data are then fed into the corresponding feature coding networks, the outputs are contrasted with the inputs, and a loss function is constructed on the basis of self-supervised contrast learning combining the known and unknown classes of the target domain images. The network parameters of the feature coding network can therefore be adjusted on the basis of this loss function, reducing the influence of unknown class samples present in the target domain, so that the trained deep domain adaptive learning network performs better when classifying or retrieving data that contain unknown class samples.
Further advantages of the present invention, as well as the technical effects of preferred embodiments, are further described in the following detailed description.
Drawings
FIG. 1 is a flow chart of the cross-domain remote sensing scene classification and retrieval method based on self-supervision contrast learning according to the present invention;
FIG. 2 is a schematic diagram illustrating the principle of the cross-domain remote sensing scene classification and retrieval method based on self-supervision contrast learning according to the present invention;
FIG. 3 is a schematic diagram of a training process of a deep domain adaptive learning network in the cross-domain remote sensing scene classification and retrieval method based on self-supervision contrast learning according to the present invention;
FIG. 4 is a schematic diagram of a retrieval process in the cross-domain remote sensing scene classification and retrieval method based on self-supervision contrast learning according to the present invention;
FIG. 5 is the classification confusion matrix of the adversarial discriminative domain adaptation comparison method in the cross-domain remote sensing scene classification and retrieval method based on self-supervision contrast learning according to the present invention;
FIG. 6 is a classification confusion matrix of batch singular value constraints in the cross-domain remote sensing scene classification and retrieval method based on self-supervision contrast learning according to the present invention;
FIG. 7 is a classification confusion matrix of a depth domain adaptive network in the cross-domain remote sensing scene classification and retrieval method based on self-supervision contrast learning according to the present invention;
FIG. 8 is a classification confusion matrix for a back propagation open set domain adaptation in the cross-domain remote sensing scene classification and retrieval method based on self-supervision contrast learning according to the present invention;
FIG. 9 is a classification confusion matrix of the cross-domain remote sensing scene classification and retrieval method based on self-supervision contrast learning.
Detailed Description
The following describes in detail embodiments of the present invention with reference to the drawings. It should be understood that the detailed description and specific examples, while indicating the present invention, are given by way of illustration and explanation only, not limitation.
As shown in fig. 1 and fig. 2, as an embodiment of the method for classifying and retrieving cross-domain remote sensing scenes based on self-supervised contrast learning provided by the present invention, the method includes the following steps:
a) Acquiring a remote sensing image, and dividing a source domain image and a target domain image of the remote sensing image to construct input data;
b) Constructing a loss function based on self-supervision contrast learning and combining the known class and the unknown class of the target domain image, constructing a deep domain adaptive learning network, and training the deep domain adaptive learning network by using the input data and the loss function;
c) Classifying the target domain images by using a trained depth domain adaptive learning network, extracting target image feature vectors of the target domain images to construct a feature database, extracting query image feature vectors of the target domain query images, calculating Euclidean distances between the query image feature vectors and all target image feature vectors in the feature database, arranging according to the Euclidean distances, and obtaining a required retrieval target according to a set Euclidean distance range.
Specifically, in an embodiment of the cross-domain remote sensing scene classification and retrieval method based on self-supervision contrast learning provided by the present invention, the input data construction step includes: extracting a number of images {1, 2, …, N} from the remote sensing image data set to construct the source domain D_s = {(x_i^s, y_i^s)}_{i=1}^{n_s}, which contains n_s labelled source domain images x_i^s with corresponding labels y_i^s ∈ {1, 2, …, C}, where {1, 2, …, C} is the label space of the labelled source domain images and C is their total number of categories; the target domain is D_t = {x_j^t}_{j=1}^{n_t}, which contains n_t unlabelled target domain images x_j^t whose label space is {1, 2, …, C+1}, with C+1 representing the unknown class of the unlabelled target domain images.
Further, in an embodiment of the cross-domain remote sensing scene classification and retrieval method based on the self-supervision contrast learning provided by the present invention, as shown in fig. 3 and fig. 4, the deep domain adaptive learning network includes a plurality of feature coding networks f(·), a plurality of contrast learning networks g(·) and a plurality of classifiers c(·). The feature coding network is a deep residual network with the fully connected layer removed; its last average pooling layer is replaced by a bottleneck layer, and it outputs 256-dimensional feature vectors. The contrast learning network g(·) is a perceptron with a ReLU (rectified linear unit) activation function. The classifier c(·) is a fully connected network whose input is a 256-dimensional feature vector and whose output dimension matches the number of classes of the target domain image (namely, a probability distribution over the {1, 2, …, C+1} classes).
Further, in an embodiment of the cross-domain remote sensing scene classification and retrieval method based on self-supervision contrast learning provided by the present invention, the step of constructing the loss function includes:
b11) Construct the source domain classification loss: because the source domain images are semantically labelled, supervised learning can be carried out on them, computing the classification loss with cross entropy:

L_softmax = -(1/|B_s|) Σ_{i∈B_s} y_i^s · log p_i^{s,w}

wherein L_softmax is the classification loss function, y_i^s is the true class distribution of the labelled source domain image x_i^s, p_i^{s,w} = c(f(x_i^{s,w})) is the source domain weakly enhanced sample class probability distribution output by the classifier, and B_s is the collection of labels of the source domain annotated images;
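As an illustration of the cross-entropy term above, the following minimal NumPy sketch computes the mean cross-entropy of a batch of predicted class distributions against hard integer labels; the function name and the integer-label interface are choices of this sketch, not part of the patent.

```python
import numpy as np

def cross_entropy_loss(probs, labels):
    """Mean cross-entropy over a batch.

    probs:  (B, C) predicted class probabilities (rows sum to 1).
    labels: (B,) integer ground-truth classes.
    """
    eps = 1e-12  # guard against log(0)
    picked = probs[np.arange(len(labels)), labels]
    return float(-np.mean(np.log(picked + eps)))

# Illustrative data: perfectly confident, correct predictions.
probs = np.array([[1.0, 0.0, 0.0],
                  [0.0, 1.0, 0.0]])
labels = np.array([0, 1])
```

A perfectly confident correct prediction gives (near-)zero loss, while a uniform prediction over C classes gives log C.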
b12 Construct an unsupervised contrast loss: self-supervised contrast learning is to learn a representation by maximizing information between different views of the data, and in particular, by encouraging two views from the same image of the target domain (i.e., a strongly enhanced view and a weakly enhanced view) to be similar, and two views from different images to be dissimilar, to learn more discriminative image features, and thus, a strongly enhanced sample of the target domain that can construct an image of the target domainAnd target domain weakly enhanced samplesTo calculate the contrast loss L ssl :
Wherein sim (-) is the similarity measure function, θ is the scaling factor, A ∈ {0,1} is an indication function for evaluating whether k equals j, B represents the number of samples selected by one training; in particular, the target domain weakly enhances the samplesIs formed by the image of the unmarked target domainObtaining the product by random cutting and overturning; target domain strongly enhanced samplesIs formed by the image of the target domain without markingObtaining by using a random enhancement method;
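The contrast loss above has the form of the widely used NT-Xent (SimCLR-style) objective; under that assumption, with cosine similarity as sim(·) and θ as the temperature, a NumPy sketch might look as follows (function and variable names are illustrative):

```python
import numpy as np

def nt_xent_loss(z_weak, z_strong, theta=0.5):
    """SimCLR-style contrastive loss between two enhanced views.

    z_weak, z_strong: (B, D) embedding matrices for the weak and strong
    views of the same B images. theta plays the role of the scaling
    factor; sim(.) is cosine similarity here.
    """
    b = len(z_weak)
    z = np.concatenate([z_weak, z_strong], axis=0)        # (2B, D)
    z = z / np.linalg.norm(z, axis=1, keepdims=True)      # unit-normalize
    sim = z @ z.T / theta                                  # pairwise similarities
    np.fill_diagonal(sim, -np.inf)                         # indicator A: drop k == j
    pos = np.concatenate([np.arange(b, 2 * b), np.arange(b)])  # index of the positive pair
    log_prob = sim - np.log(np.exp(sim).sum(axis=1, keepdims=True))
    n = 2 * b
    return float(-np.mean(log_prob[np.arange(n), pos]))
```

When the two views of each image coincide, the positives dominate the denominator and the loss is small; scrambling the pairing raises it, which is the behaviour the objective is meant to enforce.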
b13 Construct the known class identification loss as:
wherein, mu represents the sample proportion meeting the selection requirement of the threshold value of the known class in one training, H (-) represents the cross entropy loss,for weakly enhancing samples from the target domainThe collected set of the target domain known pseudo labels is obtained through screening, and ind represents a target domain weak enhancement sampleThe category to which the pseudo label belongs after being screened by the known class, and ind belongs to {1,2, …, C },representing strongly enhanced samples of a target domainThe probability distribution of the prediction classes of (a),a collection of labeled exemplars representing strongly enhanced exemplars of the target domain;
b14 Constructing unknown class identification loss: consistency classification loss L for unknown class identification loss as high confidence unknown class samples unknown :
Wherein the content of the first and second substances,for weakly enhancing samples from the target domainScreening the obtained collection of the unknown pseudo labels of the target domain,representing strongly enhanced samples of a target domainA predicted class probability distribution of (a);
b15) The constructed total loss function L is:
L = L_softmax + α·L_ssl + β·L_known + γ·L_unknown
where α, β and γ are parameters that balance the optimization objectives of the model.
Further, in an embodiment of the cross-domain remote sensing scene classification and retrieval method based on the self-supervised contrast learning provided by the present invention, the training step of the deep-domain adaptive learning network includes:
b21) The feature coding network f(·) can be set to three; the source domain weakly enhanced samples x_i^{s,w}, the target domain weakly enhanced samples x_j^{t,w} and the target domain strongly enhanced samples x_j^{t,s} are respectively input into the corresponding feature coding networks f(·) to obtain the source domain features f_i^s, the target domain weakly enhanced image features f_j^{t,w} and the target domain strongly enhanced image features f_j^{t,s}, wherein the source domain weakly enhanced sample is obtained by randomly cropping and flipping the labelled source domain image x_i^s;
b22) The contrast learning network g(·) can be set to two; the target domain weakly enhanced image features f_j^{t,w} and the target domain strongly enhanced image features f_j^{t,s} are respectively input into the corresponding contrast learning networks g(·) to obtain the projected target domain weakly enhanced image embedded features z_j^{t,w} and strongly enhanced image embedded features z_j^{t,s}, from which the contrast loss L_ssl is calculated;
b23) The classifier c(·) can be set to three; the source domain features f_i^s, the target domain weakly enhanced image features f_j^{t,w} and the target domain strongly enhanced image features f_j^{t,s} are respectively input into the corresponding classifiers c(·) to obtain the classifier-predicted source domain weakly enhanced sample class probability distribution p_i^{s,w}, target domain weakly enhanced sample class probability distribution p_j^{t,w} and target domain strongly enhanced sample class probability distribution q_j^{t,s};
b24) The source domain classification loss is calculated from the source domain weakly enhanced sample class probability distribution p_i^{s,w} based on the classification loss function L_softmax;
b25) For the target domain weakly enhanced sample class probability distribution p_j^{t,w}, first find the category with the maximum prediction probability, then compare that probability value with a preset predefined threshold τ; samples below τ are discarded and samples above τ are retained as pseudo label samples, with the category of the maximum prediction probability taken as the known-class hard label; the screening formula is:

ŷ_j^t = argmax_c p_j^{t,w}(c), retained if max_c p_j^{t,w}(c) ≥ τ

wherein ŷ_j^t represents the category in which the maximum prediction probability of p_j^{t,w} satisfying the threshold condition is located;
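The threshold screening of step b25) can be sketched as follows (NumPy, with illustrative names; τ is passed as `tau`):

```python
import numpy as np

def screen_known_pseudo_labels(probs_weak, tau=0.95):
    """Keep weakly enhanced target samples whose top class probability
    meets the threshold tau; return their indices and hard pseudo-labels.

    probs_weak: (B, C+1) predicted class probability distributions.
    """
    top = probs_weak.max(axis=1)
    keep = np.where(top >= tau)[0]          # samples passing the threshold
    pseudo = probs_weak.argmax(axis=1)[keep]  # class of the maximum probability
    return keep, pseudo
```

The retained pseudo-labels are then applied to the corresponding strongly enhanced samples, as step b26) describes.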
b26 Using target domain weakly enhanced samplesKnown pseudo-labelStrongly enhanced samples as corresponding target domainsTo calculate a target domain strong enhancement sampleIs known to identify the loss L known ;
B27 Selects a target domain weakly enhanced sample class probability distributionAnd taking the sample with lower confidence level as a candidate unknown sample, wherein the specific selection formula is as follows:
whereinFor the preliminarily screened candidate unknown samples, t l Selecting a threshold value for the candidate sample, selecting a sample predicted that the probability of the unknown class is higher than the set unknown class sample selection threshold value as an unknown class sample,
whereinAs candidate samplesProbability of prediction as unknown class, t uk A threshold value is selected for the unknown class of samples,unknown pseudo-label for target domain;
b28 With target domain unknown pseudo-tagsStrongly enhanced samples as target domainsComputing a consistent classification loss L of the unknown class sample unknown And obtaining the total loss function L and updating the parameters of the depth domain adaptive learning network.
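The known-class pseudo-label screening of step b25) can be sketched as follows. This is an illustrative NumPy example; the function name and toy batch are hypothetical, not part of the patented implementation:

```python
import numpy as np

def screen_pseudo_labels(probs_weak, tau=0.9):
    """Step b25: keep only weakly enhanced target samples whose maximum
    class probability exceeds the threshold tau; return their indices
    and the hard pseudo-labels (the class of the maximum probability)."""
    max_prob = probs_weak.max(axis=1)        # confidence of each sample
    hard_label = probs_weak.argmax(axis=1)   # class of maximum probability
    keep = max_prob > tau                    # discard samples below tau
    return np.where(keep)[0], hard_label[keep]

# toy batch: three target domain samples, four known classes
p_w = np.array([[0.97, 0.01, 0.01, 0.01],
                [0.40, 0.30, 0.20, 0.10],
                [0.02, 0.96, 0.01, 0.01]])
idx, labels = screen_pseudo_labels(p_w, tau=0.9)
# samples 0 and 2 pass the threshold, with pseudo-labels 0 and 1
```

In step b26) these hard labels would then be paired with the predictions on the corresponding strongly enhanced samples to compute the cross-entropy loss L_known.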
Further, in an embodiment of the cross-domain remote sensing scene classification and retrieval method based on self-supervision contrast learning provided by the invention, the steps of acquiring the retrieval target are as follows:
c21 Extracting a query image feature vector based on the trained feature coding network;
c22 Calculating Euclidean distances between the feature vectors of the query image and the feature vectors of each target image in the feature database one by one;
c23 According to the Euclidean distance between the target image feature vector and the query image feature vector, sorting the target image feature vectors to obtain the retrieval target corresponding to the target image feature vectors.
The construction of the input data and the training of the depth domain adaptive learning network are realized based on the PyTorch library of the Python language. In addition, simulation experiments with domain adaptation methods such as ADDA (Adversarial Discriminative Domain Adaptation), BSP (Batch Spectral Penalization), DAN (Deep Adaptation Network) and OSBP (Open Set Back-Propagation) were carried out for comparison with the cross-domain remote sensing scene classification and retrieval method based on self-supervision contrast learning of the invention. The invention adopts the overall classification accuracy and the classification confusion matrix to evaluate the classification effect, and adopts the Average Normalized Modified Retrieval Rank (ANMRR), the mean Average Precision (mAP) and PK (retrieval precision of the top K images) to evaluate the retrieval effect: the higher the mAP and PK values, the better the retrieval performance; the smaller the ANMRR value, the better the retrieval performance. The comparison results are shown in Table 1:
Method | Classification accuracy | ANMRR | mAP | P5 | P10 | P20 | P50 | P100
---|---|---|---|---|---|---|---|---
ADDA | 0.602 | 0.2872 | 0.5845 | 0.7770 | 0.7540 | 0.7215 | 0.6546 | 0.5280
BSP | 0.616 | 0.2800 | 0.5928 | 0.8070 | 0.7675 | 0.7238 | 0.6498 | 0.5324
DAN | 0.6 | 0.2622 | 0.5997 | 0.7930 | 0.7695 | 0.7375 | 0.6658 | 0.5503
OSBP | 0.6563 | 0.2725 | 0.5921 | 0.7260 | 0.7000 | 0.6880 | 0.6365 | 0.5403
The invention | 0.8063 | 0.2222 | 0.6777 | 0.8800 | 0.8635 | 0.8318 | 0.7630 | 0.6103

TABLE 1
The results in Table 1 show that the cross-domain remote sensing scene classification and retrieval method based on self-supervision contrast learning achieves the highest accuracy. Compared with the comparison methods, the classification accuracy of the method is improved by 15% to 20.63%, and the retrieval accuracy also comprehensively exceeds that of the comparison methods; specifically, the mean average precision of the method is improved by at least 7.8%, and its P5-P100 and ANMRR values are all superior to those of the comparison methods. In addition, Figs. 5 to 9 show the classification confusion matrices of the different methods and of the invention, in which the values on the diagonal represent the probability of a class being correctly classified, and the off-diagonal values represent the probability of it being misclassified into other classes. The results show that the method of the invention effectively improves the classification accuracy of the target domain, in particular greatly improving the classification accuracy of the unknown class of the target domain, while reducing the confusion between the unknown class and the known classes. In conclusion, the cross-domain remote sensing scene classification and retrieval method based on self-supervision contrast learning provided by the invention can effectively improve the cross-domain classification and retrieval effect under conditions of data distribution difference and inconsistent class spaces.
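Assuming the standard definition of PK used in Table 1 (the fraction of relevant images among the top K retrieved), the metric can be computed as follows; the example ids are hypothetical (ANMRR, which follows the MPEG-7 definition, is omitted here):

```python
def precision_at_k(relevant_ids, ranked_ids, k):
    """PK: fraction of the top-k retrieved images that are relevant
    to the query (e.g. share its scene class)."""
    return sum(1 for i in ranked_ids[:k] if i in relevant_ids) / k

relevant = {1, 3, 5, 7}          # ground-truth relevant image ids
ranked = [1, 2, 3, 4, 5, 6]      # retrieval order returned by the system
p5 = precision_at_k(relevant, ranked, 5)   # 3 of the top 5 are relevant -> 0.6
```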
According to the technical scheme, in the cross-domain remote sensing scene classification and retrieval method based on self-supervision contrast learning, input data are first constructed, comprising source domain image data and target domain image data, where the source domain image data are labeled, the target domain image data are unlabeled, and the constructed input data are correspondingly enhanced. The enhanced source domain image data and target domain image data are then input into the corresponding feature coding network, the output result is compared with the input data, and a loss function is constructed on the basis of self-supervision contrast learning in combination with the known classes and the unknown class of the target domain image. The network parameters of the feature coding network can thus be adjusted on the basis of the loss function, the influence of the unknown-class samples present in the target domain on the feature coding network can be reduced, and the trained depth domain adaptive learning network achieves a better effect when classifying or retrieving data containing unknown-class samples.
The preferred embodiments of the present invention have been described in detail above with reference to the accompanying drawings, but the present invention is not limited thereto. Within the scope of the technical idea of the invention, numerous simple modifications can be made to the technical solution of the invention, including combinations of the specific features in any suitable way, and the invention will not be further described in relation to the various possible combinations in order to avoid unnecessary repetition. Such simple modifications and combinations should also be considered as disclosed in the present invention, and all such modifications and combinations are intended to be included within the scope of the present invention.
Claims (10)
1. The cross-domain remote sensing scene classification and retrieval method based on the self-supervision contrast learning is characterized by comprising the following steps of:
a) Acquiring a remote sensing image, and dividing a source domain image and a target domain image of the remote sensing image to construct input data;
b) Constructing a loss function based on self-supervision contrast learning and combined with a known sample and an unknown sample of the target domain image, constructing a depth domain adaptive learning network, and training the depth domain adaptive learning network by using the input data and the loss function;
c) Classifying the target domain images by using the trained depth domain adaptive learning network, extracting target image feature vectors of the target domain images to construct a feature database, extracting query image feature vectors of the target domain query images, calculating Euclidean distances between the query image feature vectors and all the target image feature vectors in the feature database, arranging according to the Euclidean distances, and obtaining the required retrieval target according to a set Euclidean distance range.
2. The cross-domain remote sensing scene classification and retrieval method based on the self-supervision contrast learning according to claim 1, characterized in that the construction steps of the input data comprise: extracting a plurality of images {1, 2, ..., N} from the data set of the remote sensing image, and constructing the source domain image D_s = {(x_i^s, y_i^s)}, i = 1, ..., n_s, wherein the source domain image comprises n_s labeled source domain images x_i^s, and y_i^s ∈ Y_s represents the label corresponding to the labeled source domain image x_i^s, wherein Y_s = {1, 2, ..., C} represents the label space of the labeled source domain images, and C represents the total number of classes of the labeled source domain images; the target domain image is D_t = {x_j^t}, j = 1, ..., n_t, wherein the target domain image comprises n_t unlabeled target domain images x_j^t, and the label space of the target domain image x_j^t is {1, 2, ..., C+1}, wherein C+1 denotes the unknown class of the unlabeled target domain images.
3. The cross-domain remote sensing scene classification and retrieval method based on the self-supervision comparison learning according to claim 2, characterized in that the deep domain adaptive learning network comprises a plurality of feature coding networks f (-) and a plurality of comparison learning networks g (-) and a plurality of classifiers c (-).
4. The cross-domain remote sensing scene classification and retrieval method based on the self-supervision contrast learning according to claim 3, characterized in that the feature coding network f (-) is a depth residual network with a full connection layer removed, and an average pooling layer of the depth residual network is replaced by a bottleneck layer.
5. The cross-domain remote sensing scene classification and retrieval method based on the self-supervision comparison learning according to claim 4, characterized in that the comparison learning network g (-) is a perceptron with a ReLU activation function.
6. The cross-domain remote sensing scene classification and retrieval method based on the self-supervision contrast learning of claim 5 is characterized in that the classifier c (-) is a full-connection network, and the output dimension of the classifier c (-) is consistent with the category number of the target domain images.
7. The cross-domain remote sensing scene classification and retrieval method based on the self-supervision contrast learning according to claim 6, characterized in that the construction step of the loss function comprises:
b11 Construct source domain classification loss: and carrying out supervised learning on the source domain image, and calculating the classification accuracy by adopting cross entropy loss:
wherein L is softmax In order to classify the function of the loss,a source domain annotated image representing the source domain imageTrue class distribution, functionA source domain weakly enhanced sample class probability distribution representing the classifier output,a collection of labeled exemplars representing annotated images in the source domain;
b12 Construct an unsupervised contrast loss: constructing a target domain strong enhancement sample of the target domain imageAnd target domain weakly enhanced samplesTo calculate the contrast loss L ssl :
Wherein sim (-) is the similarity measure function, θ is the scaling factor, A ∈ {0,1} is an indication function for evaluating whether k equals j, B represents the number of samples selected by one training;
b13 Construct a known class identification penalty as:
where μ represents the proportion of samples within a training run that meet the selection requirements for a known class threshold, H (-) represents the cross entropy loss,for weakly enhancing samples from the target domainThe collected set of the target domain known pseudo labels obtained through screening, ind represents the target domain weak enhancement sampleThe class to which the known pseudo-label belongs after being screened, and ind ∈ {1,2.Representing strongly enhanced samples of the target domainIs determined based on the predicted class probability distribution of (c),a collection of labeled exemplars representing strongly enhanced exemplars of the target domain;
b14 Constructing unknown class identification loss: consistency classification loss L for unknown class identification loss as high confidence unknown class samples unknown :
Wherein the content of the first and second substances,for weakly enhancing samples from the target domainScreening the obtained collection of the unknown pseudo labels of the target domain,representing strongly enhanced samples of the target domainA predicted class probability distribution of (a);
b15 ) the constructed total loss function L is:
L=L softmax +αL ssl +βL known +γL unknown
where α, β and γ are parameters that balance the optimization objectives of the model.
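One common form of the contrast loss in step b12) is the NT-Xent loss; the sketch below is an illustrative NumPy implementation consistent with the symbols sim(·), θ, A and B defined above, not necessarily the exact patented formula, and the loss weights at the end are placeholder values:

```python
import numpy as np

def nt_xent(z_weak, z_strong, theta=0.1):
    """NT-Xent-style contrastive loss over B weak/strong embedding pairs:
    for each weak embedding z_j^w, the matching strong embedding z_j^st is
    the positive, and every other embedding in the 2B batch is a negative
    (the indicator A keeps only k != j), cf. step b12."""
    z = np.concatenate([z_weak, z_strong])            # 2B embeddings
    z = z / np.linalg.norm(z, axis=1, keepdims=True)  # unit norm -> dot = cosine
    sim = z @ z.T / theta                             # sim(., .) / theta
    B = len(z_weak)
    loss = 0.0
    for j in range(B):
        pos = sim[j, j + B]                  # positive pair: weak_j vs strong_j
        mask = np.ones(2 * B, dtype=bool)
        mask[j] = False                      # indicator A: exclude k == j
        loss += -pos + np.log(np.exp(sim[j, mask]).sum())
    return loss / B

# toy batch of B=4 pairs with 8-dimensional embeddings
rng = np.random.default_rng(0)
z_w = rng.normal(size=(4, 8))
z_st = rng.normal(size=(4, 8))
L_ssl = nt_xent(z_w, z_st)

# total loss of step b15 with illustrative weights and placeholder values
L_softmax, L_known, L_unknown = 0.9, 0.4, 0.2
L_total = L_softmax + 0.5 * L_ssl + 1.0 * L_known + 1.0 * L_unknown
```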
8. The cross-domain remote sensing scene classification and retrieval method based on self-supervision contrast learning according to claim 7, characterized in that the target domain weakly enhanced samples x_j^w are obtained by randomly cropping and flipping the unlabeled target domain image x_j^t; the target domain strongly enhanced samples x_j^st are obtained by applying a random enhancement method to the unlabeled target domain image x_j^t; and the source domain weakly enhanced samples are obtained by randomly cropping and flipping the labeled source domain image x_i^s.
9. The cross-domain remote sensing scene classification and retrieval method based on the self-supervision contrast learning according to claim 8, wherein the training step of the deep domain adaptive learning network comprises:
b21 The source domain weakly enhanced samples and the target domain weakly enhanced samplesAnd target domain strongly enhanced samplesInputting the characteristics into the characteristic coding network f (-) to respectively obtain the source domain characteristics f i s Target domain weakly enhanced image features f j w And strong enhancement of image features in the target domain
B22 Weakly enhancing the target domain image featuresAnd the target domain strongly enhances image featuresInputting the contrast learning network g (-) to obtain the embedded characteristics of the projected target domain weak enhancement imageAnd strong enhancement of image embedding characteristics in the target domainTo calculate the contrast loss Lssl;
b23 Characterize the source domainThe target domain weakly enhances image featuresAnd the target domain strongly enhances image featuresInputting the classifier c (-) to respectively obtain the source domain weakly enhanced sample class probability distribution predicted by the classifierThe target domain weakly enhanced sample class probability distributionAnd the target domain strongly enhances the sample class probability distribution
B24 Weakly enhancing sample class probability distribution to the source domainBased on the classification loss function L softmax Calculating the classification loss of the source domain;
b25 Weakly enhancing sample class probability distribution to the target domainFirstly, the category where the maximum prediction probability is located is found, the probability value of the category is compared with a preset predefined threshold value sigma, and the category which is smaller than tau is abandonedAnd (3) reserving samples larger than tau as pseudo label samples, and taking the class where the maximum prediction probability is as a known class hard label, wherein the screening formula is as follows:
wherein the content of the first and second substances,to representA category in which the maximum prediction probability that satisfies a threshold condition is located;
b26 Using the target domain weakly enhanced samplesKnown pseudo-labelStrongly enhancing samples as the corresponding target domainCalculates the target domain strong enhancement samplesSaid known class of (1) identifies a loss L known ;
B27 Selecting the target domain weakly enhanced sample class probability distributionAnd taking the sample with lower confidence level as a candidate unknown sample, wherein the specific selection formula is as follows:
whereinFor the preliminarily screened candidate unknown samples, t l Selecting a threshold value for the candidate sample, selecting a sample predicted that the probability of the unknown class is higher than the set unknown class sample selection threshold value as an unknown class sample,
whereinIs the candidate sampleProbability of prediction as unknown class, t uk A threshold value is chosen for the unknown class sample,unknown pseudo-label for target domain;
b28 With the target domain unknown class pseudo-tagAs a strongly enhanced sample of the target domainComputing a consistent classification loss L of the unknown class samples unknown And obtaining the total loss function L, and updating the parameters of the depth domain adaptive learning network by using a gradient descent algorithm.
10. The cross-domain remote sensing scene classification and retrieval method based on self-supervision contrast learning according to claim 9, characterized in that the step of obtaining the retrieval target is:
c21 Extracting the query image feature vector based on the trained feature coding network;
c22 Computing Euclidean distances between the query image feature vector and each target image feature vector in the feature database one by one;
c23 According to the Euclidean distance between the target image feature vector and the query image feature vector, sorting the target image feature vectors to obtain the retrieval target corresponding to the target image feature vectors.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210927707.1A CN115471739A (en) | 2022-08-03 | 2022-08-03 | Cross-domain remote sensing scene classification and retrieval method based on self-supervision contrast learning |
Publications (1)
Publication Number | Publication Date |
---|---|
CN115471739A true CN115471739A (en) | 2022-12-13 |
Family
ID=84368251
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210927707.1A Pending CN115471739A (en) | 2022-08-03 | 2022-08-03 | Cross-domain remote sensing scene classification and retrieval method based on self-supervision contrast learning |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115471739A (en) |
Cited By (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116524302A (en) * | 2023-05-05 | 2023-08-01 | 广州市智慧城市投资运营有限公司 | Training method, device and storage medium for scene recognition model |
CN116524302B (en) * | 2023-05-05 | 2024-01-26 | 广州市智慧城市投资运营有限公司 | Training method, device and storage medium for scene recognition model |
CN116543269A (en) * | 2023-07-07 | 2023-08-04 | 江西师范大学 | Cross-domain small sample fine granularity image recognition method based on self-supervision and model thereof |
CN116543269B (en) * | 2023-07-07 | 2023-09-05 | 江西师范大学 | Cross-domain small sample fine granularity image recognition method based on self-supervision and model thereof |
CN116740578A (en) * | 2023-08-14 | 2023-09-12 | 北京数慧时空信息技术有限公司 | Remote sensing image recommendation method based on user selection |
CN116740578B (en) * | 2023-08-14 | 2023-10-27 | 北京数慧时空信息技术有限公司 | Remote sensing image recommendation method based on user selection |
CN117456309A (en) * | 2023-12-20 | 2024-01-26 | 合肥综合性国家科学中心人工智能研究院(安徽省人工智能实验室) | Cross-domain target identification method based on intermediate domain guidance and metric learning constraint |
CN117456309B (en) * | 2023-12-20 | 2024-03-15 | 合肥综合性国家科学中心人工智能研究院(安徽省人工智能实验室) | Cross-domain target identification method based on intermediate domain guidance and metric learning constraint |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN111191732B (en) | Target detection method based on full-automatic learning | |
CN107679250B (en) | Multi-task layered image retrieval method based on deep self-coding convolutional neural network | |
CN113378632B (en) | Pseudo-label optimization-based unsupervised domain adaptive pedestrian re-identification method | |
CN113190699B (en) | Remote sensing image retrieval method and device based on category-level semantic hash | |
CN107133569B (en) | Monitoring video multi-granularity labeling method based on generalized multi-label learning | |
CN115471739A (en) | Cross-domain remote sensing scene classification and retrieval method based on self-supervision contrast learning | |
CN110717534B (en) | Target classification and positioning method based on network supervision | |
CN110909820A (en) | Image classification method and system based on self-supervision learning | |
CN108108657A (en) | A kind of amendment local sensitivity Hash vehicle retrieval method based on multitask deep learning | |
CN109871875B (en) | Building change detection method based on deep learning | |
CN108052966A (en) | Remote sensing images scene based on convolutional neural networks automatically extracts and sorting technique | |
CN110852107B (en) | Relation extraction method, device and storage medium | |
CN115934990B (en) | Remote sensing image recommendation method based on content understanding | |
CN112132014B (en) | Target re-identification method and system based on non-supervised pyramid similarity learning | |
CN114358188A (en) | Feature extraction model processing method, feature extraction model processing device, sample retrieval method, sample retrieval device and computer equipment | |
CN113032613B (en) | Three-dimensional model retrieval method based on interactive attention convolution neural network | |
CN115292532B (en) | Remote sensing image domain adaptive retrieval method based on pseudo tag consistency learning | |
CN114897085A (en) | Clustering method based on closed subgraph link prediction and computer equipment | |
Zhang et al. | An efficient class-constrained DBSCAN approach for large-scale point cloud clustering | |
CN115965867A (en) | Remote sensing image earth surface coverage classification method based on pseudo label and category dictionary learning | |
CN115393713A (en) | Scene understanding method based on plot perception dynamic memory | |
Sari et al. | Parking Lots Detection in Static Image Using Support Vector Machine Based on Genetic Algorithm. | |
CN114882376B (en) | Convolutional neural network remote sensing image target detection method based on optimal anchor point scale | |
CN116932487B (en) | Quantized data analysis method and system based on data paragraph division | |
Gaynor | Unsupervised Context Distillation from Weakly Supervised Data to Augment Video Question Answering |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||