CN115601584A - Remote sensing scene image multi-label classification method and device and storage medium - Google Patents

Remote sensing scene image multi-label classification method and device and storage medium

Info

Publication number
CN115601584A
CN115601584A (application CN202211113132.6A)
Authority
CN
China
Prior art keywords
label
remote sensing
class
scene image
inter
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211113132.6A
Other languages
Chinese (zh)
Inventor
刘宏哲
吴宏俊
刘力铭
徐成
代松银
潘卫国
徐冰心
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Union University
Original Assignee
Beijing Union University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Union University filed Critical Beijing Union University
Priority to CN202211113132.6A priority Critical patent/CN115601584A/en
Publication of CN115601584A publication Critical patent/CN115601584A/en
Pending legal-status Critical Current

Classifications

    • G — PHYSICS
        • G06 — COMPUTING; CALCULATING OR COUNTING
            • G06V — IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
                • G06V 10/00 — Arrangements for image or video recognition or understanding
                    • G06V 10/40 — Extraction of image or video features
                    • G06V 10/70 — Arrangements for image or video recognition or understanding using pattern recognition or machine learning
                        • G06V 10/764 — using classification, e.g. of video objects
                        • G06V 10/77 — Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
                        • G06V 10/82 — using neural networks
                • G06V 20/00 — Scenes; Scene-specific elements
                    • G06V 20/10 — Terrestrial scenes

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Multimedia (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • Databases & Information Systems (AREA)
  • Computing Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a remote sensing scene image multi-label classification method and device and a storage medium, wherein the method comprises the following steps: extracting remote sensing scene image features; converting the remote sensing scene image features into label embeddings corresponding to each category label; obtaining a first inter-class relation matrix according to the correlation between the label embeddings; constructing a mask according to the first inter-class relation matrix to obtain a second inter-class relation matrix; updating the label embeddings according to the second inter-class relation matrix to obtain the prediction score of each class label; and determining the labels of the remote sensing scene image according to the prediction score of each class label. The technical scheme of the invention solves the prior-art problem that the bias introduced by classes absent from an image is not eliminated when modeling inter-class relationships.

Description

Remote sensing scene image multi-label classification method and device and storage medium
Technical Field
The invention belongs to the technical field of remote sensing image processing, and particularly relates to a multi-label classification method and device for remote sensing scene images based on a mask attention mechanism, and a storage medium.
Background
In recent years, with the continuous development of remote sensing technology, airborne and spaceborne remote sensing images have been widely used for land cover mapping and monitoring. Generally, since the land cover depicted in a high-resolution remote sensing image is of many types, the content of the image cannot be accurately described with only a single label. A multi-label remote sensing image classification method can assign multiple land-cover labels to each remote sensing image, thereby expressing the image more accurately and better matching the actual requirements of remote sensing image understanding.
Recently, visual feature extractors based on deep learning have made great progress in the field of image recognition, such as ResNet (Deep Residual Network) among DCNNs (Deep Convolutional Neural Networks) and the Swin Transformer (a hierarchical Vision Transformer using shifted windows) among Vision Transformers. These feature extractors can extract high-level semantic features that are easier to distinguish, which greatly helps single-label image classification. However, multi-label classification of remote sensing images is a more complex task than single-label classification. On the one hand, a single remote sensing image contains multiple surface covers at different spatial scales. For example, "cars" are much smaller in size than "courses", so "cars" is one of the inconspicuous categories. On the other hand, since land-cover objects generally coexist in a remote sensing image, the inter-class relationship is another key to classification. Therefore, the multi-label classification task of remote sensing images must consider not only accurate spatial feature extraction but also the correlation among multiple classes.
In typical multi-label image classification, the utilization of spatial information and of inter-class relationships are two important issues. Methods for processing spatial information mainly include the introduction of region proposals, implicit spatial attention, or multi-scale features. Introducing region proposals requires additional bounding-box labeling, which can be labor-intensive. Using implicit spatial attention enables automatic localization of objects of each class in an image through classification-loss supervision alone, without manually labeled bounding-box supervision. Using multi-scale features can increase the model's ability to recognize objects of different scales to some extent, but increases the amount of computation.
On the other hand, modeling of relationships between classes is also widely studied. Early methods used an RNN (Recurrent Neural Network) or LSTM (Long Short-Term Memory) to predict the multiple tags of an image in a sequential manner and learned the sequential relevance of the tags. However, the performance of RNN- or LSTM-based methods is affected by the preset or learned order. Other studies formulate the multi-label image classification task as a structural inference problem based on probabilistic graphical models, but their utility is limited by high computational complexity. Inspired by the GCN (Graph Convolutional Network) in multivariate relational representation, some researchers use GCNs to explicitly model tag relevance. However, the performance of such convolutional networks is limited by the receptive field of convolution, so their long-range relational modeling is poor. The Transformer, based on the attention mechanism, learns the relationship between each pair of elements in a long sequence using self-attention, which gives it an advantage over convolutional networks in long-range relationship modeling. Currently, Transformers are widely used in the fields of natural language processing and computer vision.
Multi-label classification thus faces two broad categories of problems: more accurate spatial information and inter-class relationship modeling are both needed, yet existing multi-label classification methods for remote sensing images mainly fall into two types — methods that process spatial information and methods that process inter-class relationships — and methods that consider both problems jointly are lacking. Meanwhile, existing inter-class relationship modeling methods generally model the overall label dependency among all classes directly. However, only some class objects exist in any single image, and most of the visual features extracted from the image relate to the true tags, while features related to the absent classes are lacking. Because of this, inter-class relationships computed between absent classes are inaccurate. These inaccurate inter-tag dependencies introduce noise into the classification task.
Disclosure of Invention
The invention aims to solve the technical problem of providing a multi-label classification method and device for remote sensing scene images based on a mask attention mechanism and a storage medium, so as to solve the problem that the prior art does not eliminate the deviation caused by the categories which do not exist in the images when modeling the relationship between the categories.
In order to realize the purpose, the invention adopts the following technical scheme:
a remote sensing scene image multi-label classification method comprises the following steps:
s1, extracting the characteristics of a remote sensing scene image;
s2, converting the remote sensing scene image characteristics into label embedding corresponding to each class label;
s3, obtaining a first inter-class relation matrix according to the correlation between the label embedding;
s4, constructing a mask according to the first inter-class relation matrix to obtain a second inter-class relation matrix;
s5, updating the label embedding according to the second inter-class relation matrix to obtain the prediction score of each class label;
and S6, determining the label of the remote sensing scene image according to the prediction score of each class label.
Preferably, step S2 includes:
converting the remote sensing scene image characteristics into category specific activation;
and obtaining label embedding corresponding to each category label according to the remote sensing scene image characteristics and the category specific activation.
Preferably, step S3 is specifically: learning the correlation between the label embeddings through a multi-head dot-product self-attention mechanism to obtain a first inter-class relation matrix. First, the label embedding E is divided into h subsequences [e_1, e_2, …, e_h], e_i ∈ R^{C×(d/h)}, i = 1, 2, …, h; then, for each subsequence e_i, three weight matrices W_i^Q, W_i^K, W_i^V are learned, and the subsequence e_i is converted into the vectors Q_i, K_i, V_i using the following formula:

Q_i = e_i W_i^Q,  K_i = e_i W_i^K,  V_i = e_i W_i^V;

the dot product of the vectors Q_i and K_i is calculated and mapped into the (0, 1) interval to obtain the first inter-class relation matrix.
Preferably, step S4 is specifically: converting the category-specific activation into a category prediction score 1 of the remote sensing scene image using a global max-pooling function; according to category prediction score 1, selecting the indices of the top k classes with the highest scores and adding them to a set I;

the mask is constructed using the following formula:

M(x, y) = 0 if x ∈ I and y ∈ I, and M(x, y) = −∞ otherwise;

inaccurate inter-class relationships are filtered using the following formula:

A_i = A'_i + M, i = 1, 2, …, h,

obtaining the second inter-class relation matrices [A_1, A_2, …, A_h].
Preferably, step S5 is specifically:
updating the tag embedding E using the following formulas:

E = E + A(E, E, E),
E = σ(E P_1 + b_1) P_2 + b_2 + E,

where σ(·) is a nonlinear activation function and P_1, P_2, b_1, b_2 are learnable parameters;
obtaining a category prediction score 2 according to the updated label embedding;
and selecting the mean value of the category prediction score 1 and the category prediction score 2 to obtain the final prediction score of each category label.
Preferably, step S6 determines the label of the remote sensing scene image using the following method:

ŷ_i = 1 if Y_i > t, and ŷ_i = 0 otherwise,

wherein Y_i represents the i-th element in the final class prediction score Y and t is a decision threshold.
The invention also provides a multi-label classification device for remote sensing scene images, which comprises:
the extraction module is used for extracting the remote sensing scene image characteristics;
the conversion module is used for converting the remote sensing scene image characteristics into label embedding corresponding to each category label;
the first calculation module is used for obtaining a first inter-class relation matrix according to the correlation between the label embedding;
the second calculation module is used for constructing a mask according to the first inter-class relation matrix to obtain a second inter-class relation matrix;
the third calculation module is used for updating the label embedding according to the second inter-class relation matrix to obtain the prediction score of each class label;
and the classification module is used for determining the label of the remote sensing scene image according to the prediction score of each class label.
The invention also provides a storage medium storing machine executable instructions which, when invoked and executed by a processor, cause the processor to implement the remote sensing scene image multi-label classification method.
The invention jointly considers spatial information and inter-class relationship modeling: it uses implicit spatial attention to locate the position of each class of ground cover in the image, and replaces the standard self-attention mechanism in the Transformer with a mask attention mechanism. The mask attention mechanism can filter out the inaccurate inter-class dependencies, thereby improving multi-label classification performance.
Drawings
In order to more clearly illustrate the technical solution of the present invention, the drawings needed to be used in the embodiments are briefly introduced below, it is obvious that the drawings in the following description are only some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to the drawings without creative efforts.
FIG. 1 is a flow chart of a remote sensing scene image multi-label classification method of the present invention;
FIG. 2 is a schematic diagram illustrating the principle of the remote sensing scene image multi-label classification method of the invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
In order to make the aforementioned objects, features and advantages of the present invention more comprehensible, the present invention is described in detail with reference to the accompanying drawings and the detailed description thereof.
Example 1:
as shown in fig. 1 and 2, the invention provides a remote sensing scene image multi-label classification method based on a mask attention mechanism, which comprises the following steps:
s1, extracting the remote sensing scene image features by using a feature extractor;
s2, converting the remote sensing scene image characteristics into label embedding corresponding to each class label;
s3, obtaining a first inter-class relation matrix according to the correlation between the label embedding;
s4, constructing a mask according to the first inter-class relation matrix to obtain a second inter-class relation matrix;
s5, updating the label embedding according to the second inter-class relation matrix to obtain the prediction score of each class label;
and S6, determining the label of the remote sensing scene image according to the prediction score of each class label.
As an implementation manner of the embodiment of the present invention, step S1 obtains the remote sensing scene image features by the following method, comprising the following steps:
S11, preprocessing the remote sensing scene images, including horizontally flipping the images and cropping and resizing them to a uniform size; in this embodiment, all remote sensing scene images are uniformly resized to 224 × 224 pixels;
S12, extracting the features of the remote sensing scene image using a deep convolutional neural network, from which the final pooling layer is removed; preferably, to balance the accuracy and real-time performance of the network, the deep residual network ResNet50 is adopted as the backbone for extracting image features, obtaining the remote sensing scene image feature

X ∈ R^{D×H×W},

where H, W and D are the three dimensions of the remote sensing scene image feature, namely the length, width and number of channels of the feature, respectively. The dimension of the remote sensing scene image feature X obtained in this embodiment is 2048 × 7 × 7.
As an implementation manner of the embodiment of the present invention, the step S2 of converting the remote sensing scene image features into tag embedding corresponding to each category tag includes:
converting the remote sensing scene image characteristics into category specific activation; and obtaining label embedding corresponding to each class label according to the remote sensing scene image characteristics and the class specific activation.
Further, the remote sensing scene image features are converted into the category-specific activation by the following method, specifically:

the remote sensing scene image feature X ∈ R^{D×H×W} is converted into the category-specific activation G ∈ R^{C×H×W} using the following formula:

G = sigmoid(f_{1×1}(X)),

where f_{1×1}(·) is a two-dimensional convolution function with a 1 × 1 convolution kernel, H, W and D are the three dimensions of the remote sensing scene image feature (namely the length, width and number of channels of the feature), and C is the total number of label categories of the dataset. The category-specific activation G represents the degree to which each position in the remote sensing scene image feature X belongs to a class in {c_1, c_2, …, c_C}, so the position of each category of ground cover in the image can be located.
The label embedding corresponding to each category label is obtained from the remote sensing scene image features and the category-specific activation by the following method, specifically:

according to the remote sensing scene image feature X and the category-specific activation G, the label embedding E is obtained using the following formula:

E = r(G) · r(g(X))^T,

where the function r(·) is a matrix dimension-change (reshape) function, which changes the dimensions of the remote sensing scene image feature X ∈ R^{D×H×W} and the category-specific activation G ∈ R^{C×H×W} to R^{D×HW} and R^{C×HW} respectively, i.e., the H and W dimensions are merged; g(·) is a one-dimensional convolution function with kernel size 1, which is used to change the channel dimension of the remote sensing scene image feature.
As another implementation manner of the embodiment of the present invention, step S2 obtains the tag embedding using the following method, comprising the following steps:
S21, converting the remote sensing scene image feature X into the class-specific activation G using a convolution layer and a sigmoid activation layer. Specifically, in this embodiment, the convolution layer is a two-dimensional convolution layer, the convolution kernel size is 1 × 1, the stride is 1, the input dimension is 2048, and the output dimension is the number of ground-cover categories labeled in the remote sensing scene image dataset, i.e., the number of label categories C. The dimension of the class-specific activation G obtained in this embodiment is C × 7 × 7;
S22, changing the dimension of the remote sensing scene image feature X using a convolution layer. Specifically, in this embodiment, the convolution layer is a two-dimensional convolution layer, the convolution kernel size is 1 × 1, the stride is 1, the input dimension is 2048, and the output dimension is 1024. The dimension of the remote sensing scene image feature X becomes 1024 × 7 × 7;
S23, merging the H and W dimensions of the class-specific activation G and of the remote sensing scene image feature X; in this embodiment, after merging, G ∈ R^{C×49} and X ∈ R^{1024×49};
S24, multiplying the class-specific activation G and the remote sensing scene image feature X using a multiplier to obtain the label embedding E. In this embodiment, E ∈ R^{C×1024}.
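Steps S21–S24 can be sketched with plain NumPy: a 1 × 1 convolution is a per-pixel linear map, so matrix multiplications over the flattened spatial positions reproduce it exactly. All weights below are random stand-ins for the learned convolution parameters, and C = 17 (the UC-Merced label count) is used purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
D, H, W, C, d = 2048, 7, 7, 17, 1024       # dimensions from the embodiment; C = 17 illustrative

X = rng.standard_normal((D, H, W))         # backbone feature map (random stand-in)

W_g = rng.standard_normal((C, D)) * 0.01   # class-activation 1x1 conv weights (hypothetical)
W_x = rng.standard_normal((d, D)) * 0.01   # dimension-reduction 1x1 conv weights (hypothetical)

X_flat = X.reshape(D, H * W)               # S23: merge the H and W dimensions -> D x HW
G = 1.0 / (1.0 + np.exp(-(W_g @ X_flat)))  # S21: class-specific activation, C x HW
X_red = W_x @ X_flat                       # S22: channel reduction, d x HW

E = G @ X_red.T                            # S24: label embedding, C x d
print(G.shape, E.shape)                    # (17, 49) (17, 1024)
```

Each row of E is the embedding of one category, pooled from the spatial positions that its activation map highlights.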
as another implementation manner of the embodiment of the present invention, in step S3, a first inter-class relationship matrix is obtained according to the correlation between the embedded tags, and specifically:
and learning the correlation between the label embedding through a multi-head dot product self-attention mechanism to obtain a first inter-class relation matrix. The multi-head dot product self-attention mechanism consists of h dot product self-attention heads and can learn the relevance of different aspects of characteristics. First E is divided into h subsequences [ E 1 ,e 2 ,…,e h ],
Figure BDA0003844371250000101
i =1,2, \8230;, h. Then for each subsequence e i Learning three weight matrices
Figure BDA0003844371250000102
Sub-sequence e using the following formula i Conversion to vector Q i ,K i ,V i
Figure BDA0003844371250000103
Calculating the vector Q i And K i And mapping the dot product of (1) to the interval of (0) to obtain the first inter-class relationship matrix.
As another implementation manner of the embodiment of the present invention, in step S3 the following method is used to obtain the first inter-class relation matrices, comprising the following steps:
S31, dividing the label embedding E into h subsequences [e_1, e_2, …, e_h], i = 1, 2, …, h; in this embodiment, h = 4;
S32, converting each subsequence into three vectors Q_i, K_i, V_i using three fully connected layers respectively; in this embodiment, the input and output dimensions of each fully connected layer are both 1024;
S33, calculating the dot product of the vectors Q_i and K_i and normalizing it with a softmax function to obtain the first inter-class relation matrices [A'_1, A'_2, …, A'_h];
As an implementation manner of the embodiment of the present invention, in step S4 a mask is constructed by the following method, and the low-confidence entries of the first inter-class relation matrix are filtered out to obtain the second inter-class relation matrix, specifically:

converting the category-specific activation G into the category prediction score 1 of the remote sensing scene image using a global max-pooling function (MaxPooling2D); according to category prediction score 1, selecting the indices of the top k classes with the highest scores and adding them to the set I.

The mask is constructed using the following formula:

M(x, y) = 0 if x ∈ I and y ∈ I, and M(x, y) = −∞ otherwise,

where x and y are the row and column indices of the first relation matrix, respectively.

Inaccurate inter-class relationships are filtered using the following formula:

A_i = A'_i + M, i = 1, 2, …, h,

obtaining the second inter-class relation matrices [A_1, A_2, …, A_h].
As another implementation manner of the embodiment of the present invention, in step S4 the following method is used to obtain the second inter-class relation matrices, comprising the following steps:
S41, according to category prediction score 1, selecting the indices of the top k classes with the highest scores using a topK function and adding them to the set I; in this embodiment, k = 20;
S42, constructing the mask using the following formula:

M(x, y) = 0 if x ∈ I and y ∈ I, and M(x, y) = −∞ otherwise;

S43, using an adder to add the mask M to each of the first inter-class relation matrices respectively, obtaining the second inter-class relation matrices [A_1, A_2, …, A_h].
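Steps S41–S43 can be sketched as follows. The scores and the first relation matrix are random placeholders, and k = 5 (rather than the embodiment's k = 20) keeps the toy example small:

```python
import numpy as np

rng = np.random.default_rng(0)
C, k = 17, 5                            # k = 5 for readability; the embodiment uses k = 20
score1 = rng.random(C)                  # category prediction score 1 (random stand-in)

# S41: indices of the top-k highest-scoring classes
I = set(np.argsort(score1)[-k:].tolist())

# S42: mask is 0 where both row and column classes are confidently present, -inf elsewhere
M = np.full((C, C), -np.inf)
for x in range(C):
    for y in range(C):
        if x in I and y in I:
            M[x, y] = 0.0

# S43: adding the mask suppresses every relation entry that involves an absent class
A_prime = rng.random((C, C))            # a first inter-class relation matrix (illustrative)
A = A_prime + M
```

After the addition, only the k × k block of entries between confidently present classes remains finite; the rest are −∞ and contribute nothing after any subsequent softmax-style normalization.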
As an implementation manner of the embodiment of the present invention, step S5 updates the label embedding by the following method to obtain the prediction score of each category label, specifically:

the tag embedding E is updated using the following formulas:

E = E + A(E, E, E),
E = σ(E P_1 + b_1) P_2 + b_2 + E,

where σ(·) is a nonlinear activation function and P_1, P_2, b_1, b_2 are learnable parameters.
Obtaining a category prediction score 2 according to the updated label embedding;
and taking the average value of the category prediction score 1 and the category prediction score 2 to obtain the final prediction score of each category label.
As another implementation manner of the embodiment of the present invention, in step S5 the following method is used to obtain the prediction score of each category label, comprising the following steps:
S51, using a multiplier to multiply each element A_i of the second inter-class relation matrices [A_1, A_2, …, A_h] with the corresponding vector V_i, obtaining a new label embedding E_1;
S52, updating the label embedding E_1 to the label embedding E_2 using two fully connected layers and one GELU activation layer. In this embodiment, the input dimension of fully connected layer 1 is 1024 and its output dimension is 2048, while the input dimension of fully connected layer 2 is 2048 and its output dimension is 1024;
S53, converting the label embedding E_2 into category prediction score 2 using a fully connected layer and a sigmoid activation layer. In this embodiment, the input dimension of the fully connected layer is 1024 and the output dimension is the number of label categories C;
S54, combining category prediction score 1 and category prediction score 2 element-wise (taking their mean) to obtain the final category prediction score Y.
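Steps S51–S54 can be sketched in NumPy as below. The attention weights, feed-forward parameters and scoring head all use random stand-in values; the single shared scoring vector `Wc` is a simplification of the per-class fully connected layer described above, labeled as such:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gelu(z):  # tanh approximation of GELU
    return 0.5 * z * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (z + 0.044715 * z ** 3)))

rng = np.random.default_rng(0)
C, d = 17, 1024
E = rng.standard_normal((C, d)) * 0.1
V = rng.standard_normal((C, d)) * 0.1                      # value vectors from the attention step
A = rng.random((C, C)); A /= A.sum(axis=1, keepdims=True)  # stand-in masked attention weights

E1 = E + A @ V                                    # S51 plus residual: attention update
P1 = rng.standard_normal((d, 2 * d)) * 0.01; b1 = np.zeros(2 * d)
P2 = rng.standard_normal((2 * d, d)) * 0.01; b2 = np.zeros(d)
E2 = gelu(E1 @ P1 + b1) @ P2 + b2 + E1            # S52: feed-forward with residual

Wc = rng.standard_normal((d,)) * 0.01             # S53: scoring head (simplified, hypothetical)
score2 = sigmoid(E2 @ Wc)                         # category prediction score 2, shape (C,)
score1 = rng.random(C)                            # category prediction score 1 (stand-in)
Y = (score1 + score2) / 2.0                       # S54: final score as the mean of the two
```

The 1024 → 2048 → 1024 feed-forward dimensions mirror those stated for fully connected layers 1 and 2.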
As an implementation manner of the embodiment of the present invention, step S6 determines the label of the remote sensing scene image using the following method:

ŷ_i = 1 if Y_i > t, and ŷ_i = 0 otherwise,

where Y_i represents the i-th element in the final class prediction score Y and t is a decision threshold.
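The final thresholding step can be illustrated directly; the threshold value 0.5 below is an assumption for illustration, since the text does not state the value used:

```python
import numpy as np

Y = np.array([0.91, 0.12, 0.55, 0.49])  # illustrative final class prediction scores
t = 0.5                                 # decision threshold (assumed; not stated in the text)
labels = (Y > t).astype(int)            # 1 = label assigned to the image, 0 = not assigned
print(labels)                           # [1 0 1 0]
```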
By adopting the above technical scheme, namely the multi-label image classification network based on the mask attention mechanism, the corresponding position of each ground cover in the image can be automatically located, the self-attention mechanism assigns a weight to each pair of labels, and the constructed mask filters out the low-confidence, inaccurate part of the inter-class relationships, so that more accurate inter-class relationships can be obtained. Combining the spatial information with the inter-class relationships makes the classification labels of the remote sensing scene image more accurate.
The embodiment of the invention uses implicit spatial attention to position the position of each type of ground covering on the image, automatically extracts the relevant regional characteristics of the ground covering of the types, and fully utilizes the spatial information in the remote sensing image.
The embodiment of the invention uses a mask attention mechanism to replace a standard self-attention mechanism and automatically learns the relationship between classes. The masking attention mechanism can filter out the partially inaccurate inter-class dependency relationship, so that the accuracy of multi-label classification is improved.
To verify the validity of the present application, the following comparative experiments were performed:
data set
The UC-Merced multi-label land use dataset is a multi-label remote sensing image dataset. The dataset contains a total of 2100 remote sensing images covering 17 land-cover types. Each image is 256 × 256 pixels and carries 1 to 7 land-cover labels. We randomly chose 80% of the images for training the model and used the remaining images for validation and testing.
Evaluation index
Five common multi-label remote sensing image classification evaluation indices are selected: overall precision (OP), overall recall (OR), per-class precision (CP), per-class recall (CR) and per-class F1 score (CF1). For all indices, a larger value indicates a better classification effect.
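Under common definitions of these indices (which this sketch assumes match those used in the experiments), the five values can be computed from binary ground-truth and prediction matrices as follows:

```python
import numpy as np

def multilabel_metrics(y_true, y_pred):
    """OP, OR, CP, CR, CF1 for binary multi-label matrices of shape (samples, classes)."""
    t = y_true.astype(bool)
    p = y_pred.astype(bool)
    tp = (t & p).sum(axis=0).astype(float)    # per-class true positives
    fp = (~t & p).sum(axis=0).astype(float)   # per-class false positives
    fn = (t & ~p).sum(axis=0).astype(float)   # per-class false negatives
    eps = 1e-12                               # guards against division by zero
    cp = (tp / (tp + fp + eps)).mean()        # per-class precision, macro-averaged
    cr = (tp / (tp + fn + eps)).mean()        # per-class recall, macro-averaged
    cf1 = 2 * cp * cr / (cp + cr + eps)       # per-class F1, derived from CP and CR
    op = tp.sum() / (tp.sum() + fp.sum() + eps)   # overall (micro) precision
    orr = tp.sum() / (tp.sum() + fn.sum() + eps)  # overall (micro) recall
    return op, orr, cp, cr, cf1

y_true = np.array([[1, 0], [1, 1]])
y_pred = np.array([[1, 0], [0, 1]])
op, orr, cp, cr, cf1 = multilabel_metrics(y_true, y_pred)
```

One caveat of the epsilon guard: a class that is never predicted gets precision 0 rather than being excluded from the average.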
The comparison method comprises the following steps:
The first existing method: ResNet50, the method proposed by He et al. in the document "Deep residual learning for image recognition";
The second existing method: CA-ResNet-BiLSTM, the method proposed by Hua et al. in the document "Recurrently exploring class-wise attention in a hybrid convolutional and bidirectional LSTM network for multi-label aerial image classification";
The third existing method: AL-RN-ResNet50, the method proposed by Hua et al. in the document "Relation network for multi-label aerial image classification";
The results of performing the multi-label image classification task on the UC-Merced multi-label land use dataset using the method of the present application and the existing methods are as follows:

[Results table not reproduced: the original shows the five evaluation indices for each compared method.]
the above experiment illustrates that: the method is superior to the existing method in most evaluation indexes, and the effectiveness of the method is reflected.
Example 2:
the invention also provides a multi-label classification device for remote sensing scene images, which comprises:
the extraction module is used for extracting the remote sensing scene image characteristics;
the conversion module is used for converting the remote sensing scene image characteristics into label embedding corresponding to each category label;
the first calculation module is used for obtaining a first inter-class relation matrix according to the correlation between the label embedding;
the second calculation module is used for constructing a mask according to the first inter-class relation matrix to obtain a second inter-class relation matrix;
the third calculation module is used for updating the label embedding according to the second inter-class relation matrix to obtain the prediction score of each class label;
and the classification module is used for determining the label of the remote sensing scene image according to the prediction score of each class label.
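The cooperation of the six modules above can be sketched as a sequential pipeline. This is an illustrative composition only; the class name, the toy stand-in functions, and the threshold are assumptions, and the real modules would be neural network components.

```python
class MultiLabelClassifier:
    """Illustrative composition of the six modules described above."""
    def __init__(self, extract, convert, relation, mask_relation, update, classify):
        # Each step consumes the previous step's output, mirroring S1-S6
        self.steps = [extract, convert, relation, mask_relation, update, classify]

    def __call__(self, image):
        x = image
        for step in self.steps:
            x = step(x)
        return x

# Toy stand-ins for the six modules (hypothetical, for illustration only)
pipeline = MultiLabelClassifier(
    extract=lambda img: [sum(img)],          # feature extraction
    convert=lambda f: [f[0], f[0] * 2],      # label embeddings per class
    relation=lambda e: e,                    # first inter-class relation matrix
    mask_relation=lambda e: e,               # masked (second) relation matrix
    update=lambda e: [v + 1 for v in e],     # updated prediction scores
    classify=lambda s: [v > 2 for v in s],   # thresholded label decisions
)
labels = pipeline([1, 1])
```

The design point is that each module has a single responsibility and a fixed interface, so any one stage (for instance the relation-learning stage) can be replaced without changing the others.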
Embodiment 3:
the invention also provides a storage medium, which stores machine executable instructions, and when the machine executable instructions are called and executed by a processor, the machine executable instructions cause the processor to realize the remote sensing scene image multi-label classification method.
The above-described embodiments are only intended to describe the preferred embodiments of the present invention, and not to limit the scope of the present invention, and various modifications and improvements made to the technical solution of the present invention by those skilled in the art without departing from the spirit of the present invention should fall within the protection scope defined by the claims of the present invention.

Claims (8)

1. A remote sensing scene image multi-label classification method is characterized by comprising the following steps:
s1, extracting the characteristics of a remote sensing scene image;
s2, converting the remote sensing scene image characteristics into label embedding corresponding to each class label;
s3, obtaining a first inter-class relation matrix according to the correlation between the label embedding;
s4, constructing a mask according to the first inter-class relation matrix to obtain a second inter-class relation matrix;
s5, updating the label embedding according to the second inter-class relation matrix to obtain the prediction score of each class label;
and S6, determining the label of the remote sensing scene image according to the prediction score of each class label.
2. The remote sensing scene image multi-label classification method according to claim 1, wherein the step S2 comprises:
converting the remote sensing scene image characteristics into category specific activation;
and obtaining label embedding corresponding to each category label according to the remote sensing scene image characteristics and the category specific activation.
3. The remote sensing scene image multi-label classification method according to claim 2, characterized in that step S3 specifically comprises: learning the correlation between the label embeddings through a multi-head dot-product self-attention mechanism to obtain the first inter-class relation matrix; first, the label embedding E is divided into h subsequences
[e_1, e_2, …, e_h], where e_i ∈ R^(C×(d/h));
then, for each subsequence e_i, three weight matrices
W_i^Q, W_i^K, W_i^V ∈ R^((d/h)×(d/h))
are learned, and the subsequence e_i is converted into matrices Q_i, K_i and V_i using the following formulas:
Q_i = e_i W_i^Q, K_i = e_i W_i^K, V_i = e_i W_i^V;
the similarity between the matrices Q_i and K_i is then calculated and mapped into the (0,1) interval to obtain the first inter-class relation matrix.
4. The remote sensing scene image multi-label classification method according to claim 3, wherein step S4 specifically comprises: converting the class-specific activation into a category prediction score 1 of the remote sensing scene image by using a global maximum pooling function; according to the category prediction score 1, selecting the indices of the k highest values and adding them to a set I;
the mask M is constructed using the following formula:
M_ij = 0 if j ∈ I, and M_ij = −∞ otherwise;
inaccurate inter-class relationships are filtered using the following formula:
A_i = softmax(Q_i K_i^T / sqrt(d/h) + M),
thereby obtaining the second inter-class relation matrix [A_1, A_2, …, A_h].
5. The remote sensing scene image multi-label classification method according to claim 4, wherein the step S5 specifically comprises:
the label embedding E is updated using the following formulas:
E = E + A(E, E, E),
E = σ(E P_1 + b_1) P_2 + b_2 + E,
where σ(·) denotes a nonlinear activation function, and P_1, P_2, b_1, b_2 are learnable parameters;
obtaining a category prediction score 2 according to the updated label embedding;
and taking the mean of the category prediction score 1 and the category prediction score 2 to obtain the final prediction score of each category label.
6. The remote sensing scene image multi-label classification method according to claim 5, characterized in that step S6 determines the label of the remote sensing scene image using the following method:
y_i = 1 if Y_i > 0.5, and y_i = 0 otherwise,
wherein Y_i represents the i-th element in the final class prediction score Y.
7. A multi-label classification device for remote sensing scene images is characterized by comprising:
the extraction module is used for extracting the remote sensing scene image characteristics;
the conversion module is used for converting the remote sensing scene image characteristics into label embedding corresponding to each category label;
the first calculation module is used for obtaining a first inter-class relation matrix according to the correlation between the label embedding;
the second calculation module is used for constructing a mask according to the first inter-class relation matrix to obtain a second inter-class relation matrix;
the third calculation module is used for updating the label embedding according to the second inter-class relation matrix to obtain the prediction score of each class label;
and the classification module is used for determining the label of the remote sensing scene image according to the prediction score of each class label.
8. A storage medium having stored thereon machine executable instructions which, when invoked and executed by a processor, cause the processor to implement the method of multi-label classification of images of remote sensing scenes according to any one of claims 1 to 6.
CN202211113132.6A 2022-09-14 2022-09-14 Remote sensing scene image multi-label classification method and device and storage medium Pending CN115601584A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211113132.6A CN115601584A (en) 2022-09-14 2022-09-14 Remote sensing scene image multi-label classification method and device and storage medium


Publications (1)

Publication Number Publication Date
CN115601584A true CN115601584A (en) 2023-01-13

Family

ID=84842543

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211113132.6A Pending CN115601584A (en) 2022-09-14 2022-09-14 Remote sensing scene image multi-label classification method and device and storage medium

Country Status (1)

Country Link
CN (1) CN115601584A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116524258A (en) * 2023-04-25 2023-08-01 云南师范大学 Landslide detection method and system based on multi-label classification

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113222068A (en) * 2021-06-03 2021-08-06 西安电子科技大学 Remote sensing image multi-label classification method based on adjacency matrix guidance label embedding


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
HONGJUN WU ET AL.: "S-MAT: Semantic-Driven Masked Attention Transformer for Multi-Label Aerial Image Classification" *



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination