CN115546569B - Attention mechanism-based data classification optimization method and related equipment - Google Patents
Attention mechanism-based data classification optimization method and related equipment
- Publication number
- CN115546569B (application CN202211550245.2A)
- Authority
- CN
- China
- Prior art keywords
- fusion
- data
- attention
- classification
- feature extraction
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/74—Image or video pattern matching; Proximity measures in feature spaces
- G06V10/761—Proximity, similarity or dissimilarity measures
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/764—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/77—Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
- G06V10/774—Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/77—Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
- G06V10/80—Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
- G06V10/806—Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/82—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02A—TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
- Y02A40/00—Adaptation technologies in agriculture, forestry, livestock or agroalimentary production
- Y02A40/10—Adaptation technologies in agriculture, forestry, livestock or agroalimentary production in agriculture
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Evolutionary Computation (AREA)
- Artificial Intelligence (AREA)
- Databases & Information Systems (AREA)
- Computing Systems (AREA)
- General Health & Medical Sciences (AREA)
- Medical Informatics (AREA)
- Software Systems (AREA)
- Health & Medical Sciences (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Multimedia (AREA)
- Image Analysis (AREA)
Abstract
The invention discloses an attention-based data classification optimization method and related equipment. The method comprises the following steps: dividing all labeled pixels into a training set and a test set, and acquiring the true label data of each; embedding an attention mechanism into a convolutional neural network to construct an attention-based multi-source data feature extraction and fusion network; acquiring training data that fuse sample semantic information and similarity information, and performing supervised training on the multi-source data feature extraction and fusion network; and inputting a sample to be tested into the trained network and outputting a final classification label according to the decision-level fusion result. By constructing an attention-based feature extraction and fusion framework that considers both the semantic information and the similarity information of the samples, the method markedly improves feature characterization capability and achieves accurate classification of HSI and LiDAR data through efficient feature extraction and fusion.
Description
Technical Field
The invention relates to the technical field of multi-source data fusion classification, in particular to a data classification optimization method based on an attention mechanism, a terminal and a computer readable storage medium.
Background
With the rapid development of Earth observation technology, different types of sensors have been developed to obtain multi-source information about ground objects. For example, multispectral and hyperspectral cameras acquire the spectral attributes of ground objects, LiDAR (Light Detection and Ranging) sensors directly acquire their three-dimensional spatial information, and Synthetic Aperture Radar (SAR) sensors acquire amplitude and phase information.
Although these sensors play an important role in remote sensing Earth observation and ground-object classification, relying on a single sensor has drawbacks. For example, hyperspectral images (HSI) carry rich spectral information and can identify different material attributes, but they struggle to distinguish ground objects with similar spectra and different elevations (e.g., grassland and trees, parking lots and building roofs, or roads and viaducts cannot be effectively distinguished from HSI spectral information alone). LiDAR data, on the other hand, can directly separate ground objects of different elevations using height information, but cannot distinguish objects of the same height and different spectra (e.g., asphalt and concrete, sheet-iron tiles and glazed tiles, or trees and tree-shaped signal towers cannot be effectively distinguished from the LiDAR point cloud alone). Data from any single sensor therefore cannot capture ground-object information fully and accurately, and can hardly meet the requirement of reliable remote sensing ground-object classification. Combining LiDAR point clouds with HSI so that the two data types complement each other is a key technical means of achieving fine classification of remote sensing images.
Currently, LiDAR point cloud and HSI fusion classification methods fall into four categories: methods based on feature stacking, methods based on a low-dimensional subspace, methods based on kernel transformation, and methods based on deep learning.
Feature stacking is the simplest feature fusion method to implement. However, simple concatenation or stacking can leave the fused features with a large amount of redundant information, and because labeled samples are limited, it usually suffers from the curse of dimensionality, which limits classification accuracy. Fusion based on a low-dimensional subspace decomposes the high-dimensional hyperspectral data into a low-dimensional spectral subspace and coefficients, which avoids the curse of dimensionality during classification and improves computational efficiency; however, it requires solving a complex decomposition model, and its performance depends strongly on the solved coefficients. Fusion based on kernel transformation maps data that are linearly inseparable in the original space into a high-dimensional space where they become linearly separable, and is widely used in LiDAR point cloud and HSI fusion classification research; however, the kernel function must be selected manually, and the selected kernel cannot be guaranteed to perform best in all scenes. Deep learning is the current mainstream approach: a deep neural network extracts highly representative semantic features, and fully connected fusion layers deeply fuse the HSI and LiDAR point cloud features. However, deep learning needs a large number of labeled samples for model training, whereas labeled hyperspectral pixels are usually very limited, which restricts its application in the hyperspectral field to some extent.
Several exploratory studies have addressed the fusion classification of HSI and LiDAR point cloud multi-source data and have obtained improved ground-object classification results. However, because the spatial structure of remote sensing data is highly complex and the heterogeneity between HSI and LiDAR point clouds is strong, the feature characterization capability of current multi-source data feature extraction and fusion methods is still insufficient, and the requirement of high-precision ground-object classification is difficult to meet.
Accordingly, the prior art is yet to be improved and developed.
Disclosure of Invention
The invention mainly aims to provide a data classification optimization method based on an attention mechanism, a terminal and a computer readable storage medium, and aims to solve the problems that the feature characterization capability obtained by a multi-source data feature extraction and fusion method in the prior art is insufficient, and the requirement of high-precision classification of the current ground features is difficult to meet.
In order to achieve the above object, the present invention provides an attention-based data classification optimization method, which includes the following steps:
dividing all the marked pixels into a training set and a testing set, and respectively acquiring real label data of the training set and the testing set;
embedding an attention mechanism into a convolutional neural network, and constructing a multi-source data feature extraction and fusion network based on the attention mechanism;
acquiring training data that fuse sample semantic information and similarity information, and performing supervised training on the multi-source data feature extraction and fusion network;
and inputting the sample to be tested into the trained multi-source data feature extraction and fusion network, and outputting a final classification label according to a decision-level fusion result.
Optionally, the method for data classification optimization based on an attention mechanism, wherein the dividing all labeled pixels into a training set and a test set and respectively acquiring true label data of the training set and the test set, further includes:
let $X^H = \{x_i^H\}_{i=1}^{N}$ and $X^L = \{x_i^L\}_{i=1}^{N}$ denote the labeled pixel sets in the HSI and the LiDAR point cloud depth image, respectively;
where $x_i^H \in \mathbb{R}^{B}$ and $x_i^L$ denote the $i$-th HSI pixel and the $i$-th LiDAR pixel, $N$ is the total number of labeled pixels, and $B$ is the number of HSI spectral bands;
the corresponding true label data are denoted $Y = \{y_i\}_{i=1}^{N}$, where $y_i \in \{1, \dots, K\}$ is the true label of the $i$-th pixel and $K$ is the total number of categories.
Optionally, the method for data classification and optimization based on an attention mechanism, where the dividing all labeled pixels into a training set and a test set and respectively obtaining real label data of the training set and the test set specifically includes:
forming a sample pair from the pixels at the same coordinate position in the HSI and the LiDAR point cloud depth image, and dividing all labeled pixels into a training set and a test set according to a predefined data division criterion;
$D_{tr}$ and $D_{te}$ denote the training set and the test set, and $Y_{tr}$ and $Y_{te}$ denote their true label data, where $N_{tr}$ and $N_{te}$ are the numbers of training samples and test samples and satisfy $N_{tr} + N_{te} = N$, with $N$ the total number of labeled pixels.
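The pairing and splitting step can be illustrated with a minimal Python sketch. The text only says the split follows a "predefined data division criterion", so the random split and the `train_fraction` parameter below are assumptions, and the function name is hypothetical.

```python
import random

def split_labeled_pixels(coords, labels, train_fraction=0.2, seed=0):
    """Split labeled pixel positions into a training set and a test set.

    coords: (row, col) positions of labeled pixels; the HSI pixel and the LiDAR
            depth pixel at the same position together form one sample pair.
    labels: one class label per position.
    train_fraction: assumed split criterion (not specified in the source).
    """
    indices = list(range(len(coords)))
    random.Random(seed).shuffle(indices)          # reproducible shuffle
    n_train = int(round(train_fraction * len(indices)))
    train = [(coords[i], labels[i]) for i in indices[:n_train]]
    test = [(coords[i], labels[i]) for i in indices[n_train:]]
    return train, test

coords = [(r, c) for r in range(5) for c in range(4)]   # 20 labeled pixels
labels = [(r + c) % 3 for r, c in coords]
train, test = split_labeled_pixels(coords, labels, train_fraction=0.25)
```

Every labeled position ends up in exactly one of the two sets, so the training and test counts add up to the total number of labeled pixels.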
Optionally, in the attention-based data classification optimization method, the multi-source data feature extraction and fusion network includes: a data preprocessing module, a residual-attention-based feature extraction module, an attention-based feature fusion module and a decision-level fusion classification module.
Optionally, in the attention-based data classification optimization method, the data preprocessing module is configured to:
crop, centered at the labeled pixels $x_i^H$ and $x_i^L$, image blocks of a preset size from the HSI and the LiDAR point cloud depth image, respectively, and construct an image-pair sample $(P_i^H, P_i^L)$, where $P_i^H$ is the hyperspectral image block, $P_i^L$ is the LiDAR point cloud depth image block, and $s$ is the image block size;
apply two different convolution layers $\mathrm{Conv}^H$ and $\mathrm{Conv}^L$ to $P_i^H$ and $P_i^L$, respectively, so that the two have equal dimensions; the preprocessed data are expressed as $\tilde{P}_i^H = \mathrm{Conv}^H(P_i^H)$ and $\tilde{P}_i^L = \mathrm{Conv}^L(P_i^L)$;
where $\tilde{P}_i^H$ and $\tilde{P}_i^L$ denote the preprocessed hyperspectral image block and the preprocessed LiDAR point cloud depth image block, respectively;
the convolution kernels of $\mathrm{Conv}^H$ and $\mathrm{Conv}^L$ have spatial size $k$ and $d$ output channels, with input channels matching the hyperspectral block and the LiDAR depth block, respectively.
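The two preprocessing steps can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the patented implementation: the border policy (edge replication), the odd patch size, and the use of a 1×1 (point-wise) convolution instead of a general spatial kernel are simplifications, and all function names are hypothetical.

```python
def extract_patch(image, row, col, size):
    """Crop a size x size block centered at (row, col); size is assumed odd.

    image: list of rows; a pixel is a list of channel values.
    Positions outside the image reuse the nearest edge pixel (assumed policy).
    """
    half = size // 2
    height, width = len(image), len(image[0])
    patch = []
    for r in range(row - half, row + half + 1):
        rr = min(max(r, 0), height - 1)           # clamp row index
        patch.append([image[rr][min(max(c, 0), width - 1)]
                      for c in range(col - half, col + half + 1)])
    return patch

def pointwise_conv(patch, weights):
    """1x1 convolution: map each pixel's channels through weights (d x in_channels)."""
    return [[[sum(w * x for w, x in zip(w_row, pixel)) for w_row in weights]
             for pixel in row] for row in patch]

# Toy data: a 6x6 HSI with 3 bands and a 6x6 single-channel LiDAR depth image.
hsi = [[[r, c, r + c] for c in range(6)] for r in range(6)]
lidar = [[[10 * r + c] for c in range(6)] for r in range(6)]

hsi_patch = extract_patch(hsi, 2, 3, 3)
lidar_patch = extract_patch(lidar, 2, 3, 3)

# Two different layers bring both blocks to the same number of channels (d = 2).
hsi_aligned = pointwise_conv(hsi_patch, [[1.0, 0.0, 0.0], [0.0, 1.0, 1.0]])
lidar_aligned = pointwise_conv(lidar_patch, [[1.0], [0.5]])
```

After this step both blocks have shape 3×3×2, so later layers can treat the two sources uniformly.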
Optionally, in the attention-based data classification optimization method, the residual-attention-based feature extraction module is configured to:
learn a residual mapping through skip connections, where $F(\cdot)$ is the network function of two convolutional layers, i.e. $F(X) = W_2 * \delta(W_1 * X + b_1) + b_2$;
where $W_1$ and $W_2$ are convolution kernels, $b_1$ and $b_2$ are bias vectors, $*$ denotes the convolution operation, and $\delta$ denotes the ReLU activation function;
extract, for an input $X$ of the multi-scale channel attention module, the global feature $G(X) = \mathcal{B}(W_{up}\,\delta(\mathcal{B}(W_{down}\,g(X))))$, where $g(\cdot)$ denotes the global average pooling operation, $\mathcal{B}$ denotes batch normalization, $W_{down} \in \mathbb{R}^{\frac{C}{r} \times C}$ and $W_{up} \in \mathbb{R}^{C \times \frac{C}{r}}$ denote the dimension-reduction layer and the dimension-increase layer, $r$ is the channel reduction factor, and $C$ is the number of feature channels of the input $X$;
extract the local feature $L(X) = \mathcal{B}(\mathrm{PWConv}_2(\delta(\mathcal{B}(\mathrm{PWConv}_1(X)))))$, where $\mathrm{PWConv}_1$ and $\mathrm{PWConv}_2$ denote the two point-wise convolution operations of the local feature extraction, with convolution kernel sizes of $\frac{C}{r} \times C \times 1 \times 1$ and $C \times \frac{C}{r} \times 1 \times 1$, respectively; the local feature $L(X)$ has the same size as the input $X$;
the output feature of the multi-scale channel attention module is expressed as $X' = X \otimes M(X) = X \otimes \sigma(L(X) \oplus G(X))$;
where $M(X)$ denotes the attention weights, $\otimes$ denotes element-wise multiplication, $\oplus$ denotes broadcasting addition, and $\sigma$ denotes the sigmoid activation function;
after processing by several residual-attention modules, the extracted HSI and LiDAR image features are denoted $E^H$ and $E^L$, respectively.
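The attention computation described above (a global branch from average pooling, a local branch from point-wise convolutions, a sigmoid of their broadcast sum, then element-wise reweighting) can be sketched in plain Python. Batch normalization is omitted for brevity, which is a simplification of the description, and the function and parameter names are hypothetical.

```python
import math

def matvec(w, v):
    return [sum(wi * vi for wi, vi in zip(row, v)) for row in w]

def bottleneck(v, w_down, w_up):
    """Reduce C -> C/r channels, apply ReLU, expand back to C channels."""
    hidden = [max(0.0, h) for h in matvec(w_down, v)]
    return matvec(w_up, hidden)

def ms_cam(x, w_down_g, w_up_g, w_down_l, w_up_l):
    """x is a feature map indexed x[c][h][w]; returns x scaled by attention weights."""
    channels, height, width = len(x), len(x[0]), len(x[0][0])
    # Global branch: one average per channel, then the channel bottleneck.
    pooled = [sum(sum(row) for row in x[c]) / (height * width) for c in range(channels)]
    g = bottleneck(pooled, w_down_g, w_up_g)
    out = [[[0.0] * width for _ in range(height)] for _ in range(channels)]
    for h in range(height):
        for w in range(width):
            pixel = [x[c][h][w] for c in range(channels)]
            # Local branch: point-wise convolutions act on each position separately.
            l = bottleneck(pixel, w_down_l, w_up_l)
            for c in range(channels):
                weight = 1.0 / (1.0 + math.exp(-(l[c] + g[c])))  # sigmoid(L + G)
                out[c][h][w] = x[c][h][w] * weight
    return out

# C = 2 channels, reduction factor r = 2, so the bottleneck has a single channel.
x = [[[1.0, 2.0], [3.0, 4.0]], [[0.5, 0.5], [0.5, 0.5]]]
out = ms_cam(x, [[0.5, 0.5]], [[1.0], [-1.0]], [[1.0, -1.0]], [[0.5], [0.5]])
```

Because the weights come from a sigmoid, each output value is the input value scaled by a factor between 0 and 1, and the output keeps the shape of the input.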
Optionally, in the attention-based data classification optimization method, the attention-based feature fusion module is configured to:
perform a global pooling operation on the extracted HSI image features $E^H$ and LiDAR image features $E^L$, respectively, and generate the corresponding semantic features $f^H$ and $f^L$ through vector flattening and fully connected layer processing;
adopt two feature-level fusion strategies to exploit the complementary information between HSI and LiDAR data;
where the first fusion strategy is addition-based feature fusion, which directly adds $f^H$ and $f^L$ to obtain the fused semantic feature $f^{add}$;
and the second fusion strategy is attention-based feature fusion, which fuses $E^H$ and $E^L$ with an attention feature fusion module and then generates the fused semantic feature $f^{att}$ through vector flattening and fully connected layer processing; the features to be fused are first summed and then input into the multi-scale channel attention module to generate attention-based fusion weights, expressed as $Z = M(X + Y) \otimes X + (1 - M(X + Y)) \otimes Y$;
where $Z$ denotes the fused feature, $M$ denotes the fusion weights, and $X$ and $Y$ denote the two features to be fused;
after the above processing, four semantic features are formed in total: the two single-source semantic features $f^H$ and $f^L$ and the two fused semantic features $f^{add}$ and $f^{att}$.
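The difference between the two fusion strategies can be illustrated with a short Python sketch. The attention function here is a parameter-free stand-in for the multi-scale channel attention module (a sigmoid of a local term plus a broadcast per-channel global average), chosen only to keep the example self-contained; it is an assumption, as are all names. Both strategies are shown on feature maps of the same layout purely for comparison.

```python
import math

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def channel_attention(u):
    """Stand-in for MS-CAM without learned weights; u is indexed u[c][h][w]."""
    weights = []
    for channel in u:
        n = len(channel) * len(channel[0])
        global_term = sum(sum(row) for row in channel) / n   # global average pooling
        weights.append([[sigmoid(v + global_term) for v in row] for row in channel])
    return weights

def additive_fusion(x, y):
    """First strategy: element-wise sum of the two features."""
    return [[[a + b for a, b in zip(rx, ry)] for rx, ry in zip(cx, cy)]
            for cx, cy in zip(x, y)]

def attentional_fusion(x, y, attention=channel_attention):
    """Second strategy: weights from the summed features blend x and y."""
    m = attention(additive_fusion(x, y))
    return [[[mv * a + (1.0 - mv) * b for mv, a, b in zip(rm, rx, ry)]
             for rm, rx, ry in zip(cm, cx, cy)]
            for cm, cx, cy in zip(m, x, y)]

x = [[[1.0, 2.0]]]   # one channel, 1 x 2 spatial positions
y = [[[3.0, 0.0]]]
summed = additive_fusion(x, y)
fused = attentional_fusion(x, y)
```

Since the weights lie strictly between 0 and 1, each fused value is a convex combination of the two inputs, whereas additive fusion treats both sources equally at every position.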
Optionally, in the attention-based data classification optimization method, the decision-level fusion classification module is configured to:
input the single-source semantic features (HSI and LiDAR) and the two fused semantic features (addition-based and attention-based) into different classifiers, respectively, to obtain four classification prediction results;
and integrate the four classification results with a decision-level fusion strategy to obtain the final classification result.
Optionally, in the attention-based data classification optimization method, acquiring training data that fuse sample semantic information and similarity information and performing supervised training on the multi-source data feature extraction and fusion network specifically includes:
designing a loss function that fuses sample semantic information and similarity information, and solving the network parameters of the multi-source data feature extraction and fusion network by gradient descent.
Optionally, in the attention-based data classification optimization method, designing the loss function that fuses sample semantic information and similarity information and solving the network parameters of the multi-source data feature extraction and fusion network by gradient descent specifically includes:
constraining the similarity between pairs of image block samples using deep-hashing-based metric learning;
binarizing the extracted semantic features into hash codes to obtain the corresponding hash code matrices $B^H$, $B^L$ and $B^{HL}$, which denote the hash code matrices of HSI, LiDAR and HSI-LiDAR, respectively, where $b_i^H$ and $b_i^L$ denote the hash codes of the $i$-th HSI pixel and the $i$-th LiDAR pixel;
defining, for any sample pair $(i, j)$, a similarity variable $s_{ij}$, with $s_{ij} = 1$ if the two samples have the same category label and $s_{ij} = 0$ otherwise;
computing the negative log-likelihood of the sample-pair similarity labels to obtain the similarity loss over single-source and cross-source sample pairs;
approximating the discrete hash codes with continuous semantic features, where this continuous relaxation introduces a quantization loss;
measuring, on the basis of the extracted semantic features, the semantic loss of each sample with a cross-entropy loss function, where $\hat{y}_i$ denotes the classification result predicted by the classifier;
forming the objective function by jointly minimizing the above three loss functions, where three hyperparameters balance the weights of the different types of losses;
and solving the objective function with a gradient descent algorithm, obtaining suitable network parameters through continued iterative updates.
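The three loss terms can be sketched in plain Python. The exact formulas are not recoverable from the text above, so the forms below are assumptions borrowed from common deep-hashing practice: a pairwise negative log-likelihood with $\theta_{ij} = \tfrac{1}{2} u_i^\top v_j$, a squared-error quantization loss, and a standard cross-entropy. The weights `alpha`, `beta` and `gamma` and all function names are hypothetical.

```python
import math

def sign(v):
    """Binarize a continuous feature vector into a {-1, +1} hash code (0 maps to +1)."""
    return [1.0 if x >= 0 else -1.0 for x in v]

def similarity_loss(feats_a, feats_b, labels_a, labels_b):
    """Pairwise negative log-likelihood; s_ij = 1 when labels match, else 0.

    Pass the same source twice for a single-source loss, or HSI and LiDAR
    features for a cross-source loss.
    """
    total = 0.0
    for u, yi in zip(feats_a, labels_a):
        for v, yj in zip(feats_b, labels_b):
            s = 1.0 if yi == yj else 0.0
            theta = 0.5 * sum(a * b for a, b in zip(u, v))
            softplus = max(theta, 0.0) + math.log1p(math.exp(-abs(theta)))  # log(1 + e^theta)
            total += softplus - s * theta
    return total

def quantization_loss(feats):
    """Squared distance between continuous features and their binarized codes."""
    return sum((b - x) ** 2 for u in feats for b, x in zip(sign(u), u))

def cross_entropy(probabilities, labels):
    """Semantic loss: negative log-probability assigned to the true class."""
    return -sum(math.log(p[y]) for p, y in zip(probabilities, labels))

def objective(sim, quant, sem, alpha=1.0, beta=0.1, gamma=1.0):
    """Weighted sum of the three losses (weights are illustrative)."""
    return alpha * sim + beta * quant + gamma * sem

feats = [[2.0, -2.0], [-2.0, 2.0]]
labels = [0, 1]
sim_good = similarity_loss(feats, feats, labels, labels)
sim_bad = similarity_loss(feats, feats, [0, 0], [0, 0])
total = objective(sim_good, quantization_loss(feats),
                  cross_entropy([[0.9, 0.1], [0.2, 0.8]], labels))
```

In this toy case the features of differently labeled samples point in opposite directions, so the similarity loss is small; relabeling both samples as the same class makes the same features disagree with the similarity variable and the loss grows.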
Optionally, in the attention-based data classification optimization method, inputting a sample to be tested into the trained multi-source data feature extraction and fusion network and outputting a final classification label according to the decision-level fusion result specifically includes:
for any test sample pair consisting of an HSI image block and a LiDAR point cloud depth image block, inputting the pair into the trained multi-source data feature extraction and fusion network;
extracting four semantic features (HSI, LiDAR, addition-fused and attention-fused) through the feedforward operation of the multi-source data feature extraction and fusion network;
inputting the four semantic features into different classifiers, respectively, to obtain their classification results;
integrating the four classification results by decision-level fusion to obtain the final classification result;
where each classifier adopts a softmax function.
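A minimal Python sketch of the final step follows. The text states that each classifier uses softmax and that the four results are integrated by decision-level fusion, but the fusion rule itself is not recoverable here, so the (weighted) averaging of class probabilities below is an assumption, and the function names are hypothetical.

```python
import math

def softmax(logits):
    shift = max(logits)                       # subtract the maximum for numerical stability
    exps = [math.exp(v - shift) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def decision_fusion(logit_sets, weights=None):
    """Average the class probabilities of several classifiers and pick the best class."""
    if weights is None:
        weights = [1.0 / len(logit_sets)] * len(logit_sets)
    probs = [softmax(logits) for logits in logit_sets]
    n_classes = len(probs[0])
    fused = [sum(w * p[k] for w, p in zip(weights, probs)) for k in range(n_classes)]
    label = max(range(n_classes), key=lambda k: fused[k])
    return label, fused

# Logits from four classifier heads (HSI, LiDAR, addition-fused, attention-fused), 3 classes.
heads = [[2.0, 1.0, 0.0], [0.5, 1.5, 0.0], [2.5, 0.0, 0.0], [1.0, 1.2, 0.0]]
label, fused = decision_fusion(heads)
```

Here two heads favor class 0 and two favor class 1, but the heads favoring class 0 are more confident, so the averaged probabilities select class 0.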
In addition, to achieve the above object, the present invention further provides a terminal, wherein the terminal includes: a memory, a processor, and an attention-based data classification optimization program stored on the memory and executable on the processor, wherein the attention-based data classification optimization program, when executed by the processor, implements the steps of the attention-based data classification optimization method described above.
In addition, to achieve the above object, the present invention further provides a computer readable storage medium, wherein the computer readable storage medium stores an attention-based data classification optimization program, and the attention-based data classification optimization program, when executed by a processor, implements the steps of the attention-based data classification optimization method as described above.
In the invention, all labeled pixels are divided into a training set and a test set, and the true label data of each are acquired; an attention mechanism is embedded into a convolutional neural network to construct an attention-based multi-source data feature extraction and fusion network; training data that fuse sample semantic information and similarity information are acquired, and the multi-source data feature extraction and fusion network is trained under supervision; and the sample to be tested is input into the trained network, which outputs a final classification label according to the decision-level fusion result. The invention constructs an attention-based feature extraction and fusion framework and designs a novel objective loss function that considers both the semantic information and the similarity information of the samples, which markedly improves feature characterization capability and achieves accurate classification of HSI and LiDAR data through efficient feature extraction and fusion.
Drawings
FIG. 1 is a flow chart of a preferred embodiment of the data classification optimization method based on attention mechanism of the present invention;
FIG. 2 is a block diagram of a multi-source data feature extraction and fusion network according to a preferred embodiment of the data classification optimization method based on attention mechanism;
FIG. 3 is a schematic diagram of the data preprocessing module processing data according to the preferred embodiment of the data classification optimization method based on attention mechanism;
FIG. 4 is a schematic diagram of the feature extraction module based on the residual-attention mechanism for processing data according to the preferred embodiment of the data classification optimization method based on the attention mechanism of the present invention;
FIG. 5 is a schematic diagram of feature extraction using a multi-scale channel attention module MS-CAM according to a preferred embodiment of the data classification optimization method based on the attention mechanism of the present invention;
FIG. 6 is a schematic diagram of the attention-based feature fusion module processing data according to the preferred embodiment of the data classification optimization method based on attention mechanism of the present invention;
FIG. 7 is a schematic diagram of the feature to be fused being input into the MS-CAM module after the summation operation to generate the fusion weight based on attention in the preferred embodiment of the data classification optimization method based on attention mechanism of the present invention;
FIG. 8 is a schematic diagram of a decision-level fusion-based classification module for processing data according to a preferred embodiment of the present invention;
FIG. 9 is a schematic diagram of the operating environment of a terminal according to a preferred embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention clearer, the present invention is described in further detail below with reference to the accompanying drawings and examples. It should be understood that the specific embodiments described herein are merely illustrative of the invention and do not limit the invention.
As shown in fig. 1, the data classification optimization method based on attention mechanism according to the preferred embodiment of the present invention includes the following steps:
Step S10: dividing all labeled pixels into a training set and a test set, and acquiring the true label data of each.
Specifically, let $X^H = \{x_i^H\}_{i=1}^{N}$ and $X^L = \{x_i^L\}_{i=1}^{N}$ denote the labeled pixel sets in the HSI and the LiDAR point cloud depth image, respectively, where $x_i^H \in \mathbb{R}^{B}$ and $x_i^L$ denote the $i$-th HSI pixel and the $i$-th LiDAR pixel, $N$ is the total number of labeled pixels, and $B$ is the number of HSI spectral bands.
The corresponding true label data are denoted $Y = \{y_i\}_{i=1}^{N}$, where $y_i \in \{1, \dots, K\}$ is the true label of the $i$-th pixel and $K$ is the total number of categories.
The pixels at the same coordinate position in the HSI and the LiDAR point cloud depth image form a sample pair, and all labeled pixels are divided into a training set and a test set according to a predefined data division criterion. $D_{tr}$ and $D_{te}$ denote the training set and the test set, and $Y_{tr}$ and $Y_{te}$ denote their true label data, where $N_{tr}$ and $N_{te}$ are the numbers of training samples and test samples and satisfy $N_{tr} + N_{te} = N$.
Step S20: embedding the attention mechanism into a convolutional neural network, and constructing an attention-based multi-source data feature extraction and fusion network.
Specifically, an attention mechanism is embedded into a convolutional neural network to construct an attention-based multi-source data feature extraction and fusion network that extracts highly representative semantic features. As shown in FIG. 2, the multi-source data feature extraction and fusion network includes: a data preprocessing module, a residual-attention-based feature extraction module, an attention-based feature fusion module and a decision-level fusion classification module.
As shown in FIG. 3, the data preprocessing module consists of two parts: image block extraction and dimension transformation. First, centered at the labeled pixels $x_i^H$ and $x_i^L$, image blocks of a preset size are cropped from the HSI and the LiDAR point cloud depth image, respectively, to construct an image-pair sample $(P_i^H, P_i^L)$, where $P_i^H$ is the hyperspectral image block, $P_i^L$ is the LiDAR point cloud depth image block, and $s$ is the image block size. Second, two different convolution layers $\mathrm{Conv}^H$ and $\mathrm{Conv}^L$ are applied to $P_i^H$ and $P_i^L$, respectively, so that the two have equal dimensions. The preprocessed data are expressed as follows:
$$\tilde{P}_i^H = \mathrm{Conv}^H(P_i^H), \qquad \tilde{P}_i^L = \mathrm{Conv}^L(P_i^L)$$
where $\tilde{P}_i^H$ and $\tilde{P}_i^L$ denote the preprocessed hyperspectral image block and the preprocessed LiDAR point cloud depth image block, respectively.
The convolution kernels of $\mathrm{Conv}^H$ and $\mathrm{Conv}^L$ have spatial size $k$ and $d$ output channels, with input channels matching the hyperspectral block and the LiDAR depth block, respectively.
As shown in FIG. 4, the residual-attention-based feature extraction module adopts a dual-branch structure, and the branches share weights to reduce the number of network parameters. Each branch consists of several residual-attention blocks (i.e., Res-MS-CAM blocks). In residual learning, skip connections let the stacked convolutional layers fit a residual that can be driven toward zero, so that the block approximates an identity mapping. Because skip connections add no network parameters and make the whole network easier to optimize, the network performs better at greater depth.
Here $F(\cdot)$ is the network function of two convolutional layers, i.e. $F(X) = W_2 * \delta(W_1 * X + b_1) + b_2$, where $W_1$ and $W_2$ are convolution kernels, $b_1$ and $b_2$ are bias vectors, $*$ denotes the convolution operation, and $\delta$ denotes the ReLU activation function.
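The two-layer residual function and its skip connection can be illustrated with a minimal Python sketch. To keep it short, it works on a single-channel 1-D signal with zero padding rather than on multi-channel image blocks; this simplification, the odd kernel length, and the function names are assumptions.

```python
def conv1d_same(signal, kernel, bias=0.0):
    """Single-channel 1-D cross-correlation with zero padding (output length = input length)."""
    half = len(kernel) // 2
    out = []
    for i in range(len(signal)):
        acc = bias
        for k, w in enumerate(kernel):
            j = i + k - half
            if 0 <= j < len(signal):          # positions outside the signal count as zero
                acc += w * signal[j]
        out.append(acc)
    return out

def residual_block(x, w1, b1, w2, b2):
    hidden = [max(0.0, v) for v in conv1d_same(x, w1, b1)]   # ReLU(W1 * X + b1)
    f = conv1d_same(hidden, w2, b2)                           # W2 * hidden + b2
    return [xi + fi for xi, fi in zip(x, f)]                  # skip connection: X + F(X)

x = [1.0, -2.0, 3.0]
identity = residual_block(x, [0.0, 0.0, 0.0], 0.0, [0.0, 0.0, 0.0], 0.0)
boosted = residual_block(x, [0.0, 1.0, 0.0], 0.0, [0.0, 1.0, 0.0], 0.0)
```

With all-zero kernels the residual branch contributes nothing and the block reduces to the identity mapping, which is the property that makes deep stacks of such blocks easy to optimize.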
In addition, in order to focus the network on more significant information in the feature extraction process, the implementation of the invention also adopts a Multi-Scale Channel Attention Module (MS-CAM). As shown in fig. 5, the MS-CAM utilizes both global and local features. If the input of the multi-scale channel attention module (MS-CAM) isExtracted global featuresExpressed as:
wherein the content of the first and second substances,a global average pooling operation is represented as,which represents the regularization of the batch,andthe dimension reduction layer and the dimension increase layer are respectively represented,in order to reduce the factor by which the channel is,presentation inputThe number of characteristic channels of (2).
whereinAndrepresenting two point-by-point convolution operations in the local feature extraction process,andrespectively having a convolution kernel size ofAnd(ii) a Thus, local featuresAnd inputThe sizes are the same.
Finally, the output feature of the multi-scale channel attention module (MS-CAM) is obtained by element-wise multiplication of the input with the attention weight, where the attention weight is produced by broadcast addition of the global and local features followed by the sigmoid activation function.
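The multi-scale channel attention computation can be sketched as below. This is a hedged illustration rather than the exact module: it assumes the commonly published MS-CAM layout (a bottleneck in each branch, broadcast addition, sigmoid, then re-weighting of the input), omits batch normalization for brevity, and uses hypothetical names (`ms_cam`, `g_down`, `g_up`, `l_down`, `l_up`).

```python
import math


def relu(v):
    return [max(0.0, x) for x in v]


def matvec(m, v):
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def bottleneck(v, w_down, w_up):
    """Dimension-reduction layer, ReLU, then dimension-increase layer."""
    return matvec(w_up, relu(matvec(w_down, v)))


def ms_cam(x, g_down, g_up, l_down, l_up):
    """x is a list of C channels, each a flat list of H*W values.

    Batch normalization is omitted (treated as identity) to keep the sketch short.
    """
    n_pos = len(x[0])
    # Global branch: global average pooling gives one value per channel.
    pooled = [sum(ch) / n_pos for ch in x]
    g = bottleneck(pooled, g_down, g_up)
    # Local branch: point-wise convolution applied at every spatial position.
    out = [[0.0] * n_pos for _ in x]
    for p in range(n_pos):
        column = [ch[p] for ch in x]
        loc = bottleneck(column, l_down, l_up)
        for c in range(len(x)):
            # Broadcast addition of the global term, sigmoid, then re-weighting.
            weight = 1.0 / (1.0 + math.exp(-(loc[c] + g[c])))
            out[c][p] = x[c][p] * weight
    return out
```

With C = 2 channels and a channel reduction factor of 2, `g_down` and `l_down` are 1x2 matrices and `g_up` and `l_up` are 2x1 matrices, so the output keeps the size of the input.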
After processing by several residual-attention modules (Res-MS-CAM), the extracted HSI image features and LiDAR image features are obtained.
For the attention-based feature fusion module, as shown in Fig. 6, the extracted HSI image features and LiDAR image features each undergo global average pooling (GlobalAvgPool), followed by vector flattening (Flatten) and fully connected (FC) layer processing, to generate the corresponding single-source semantic features.
In addition, the invention adopts two feature-level fusion strategies to exploit the complementary information between HSI and LiDAR data. The first strategy is addition-based feature fusion: the HSI and LiDAR features are added directly to obtain a fused semantic feature. The second strategy is attention-based feature fusion: an Attention Feature Fusion (AFF) module first fuses the HSI and LiDAR features, and the result is then processed by vector flattening (Flatten) and a fully connected (FC) layer to generate a second fused semantic feature. As shown in Fig. 7, the features to be fused are first summed and then input into the multi-scale channel attention module to generate the attention-based fusion weight.
The fused feature is then computed from the fusion weight M and the two features to be fused.
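The attention-based fusion step can be illustrated with a short sketch. It assumes the usual AFF soft-selection rule, in which the fusion weight M scales one feature and (1 - M) scales the other; the functions `aff_fuse` and `toy_attention` are hypothetical, and `toy_attention` is only a stand-in for the multi-scale channel attention module.

```python
import math


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def aff_fuse(x, y, attention):
    """Attention-weighted fusion of two equally sized feature lists.

    `attention` is a hypothetical stand-in for the multi-scale channel attention
    module: it maps the summed features to fusion weights M in (0, 1).
    """
    summed = [a + b for a, b in zip(x, y)]   # initial integration by summation
    m = attention(summed)                    # fusion weights M
    # Assumed soft selection: M weights x and (1 - M) weights y.
    return [w * a + (1.0 - w) * b for w, a, b in zip(m, x, y)]


def toy_attention(features):
    """Toy attention: an element-wise sigmoid, used only for illustration."""
    return [sigmoid(f) for f in features]


fused = aff_fuse([1.0, 2.0], [-1.0, 2.0], toy_attention)
```

Because the two weights sum to one at every element, the fused value always lies between the two inputs, which is what distinguishes this strategy from plain addition.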
Compared with addition-based fusion, AFF uses both the local and global characteristics of the input features and supports deep fusion from same-layer to cross-layer scenarios.
Therefore, after processing by the above modules, four semantic features are formed in total: two single-source semantic features and two fused semantic features.
For the decision-level fusion-based classification module, as shown in Fig. 8, the four semantic features, namely the two single-source semantic features and the two fused semantic features, are input into separate classifiers to obtain four classification prediction results. To improve the classification result, this embodiment of the invention adopts a decision-level fusion strategy that combines the four prediction results into the final classification result.
S30, acquiring training data that fuses sample semantic information and similarity information, and performing supervised training on the multi-source data feature extraction and fusion network.
Specifically, a loss function that fuses sample semantic information and similarity information is designed, and the network parameters of the multi-source data feature extraction and fusion network are solved by gradient descent. Sample similarity information refers to the similarity between samples: the feature distance between samples of the same class should be as small as possible, and the feature distance between samples of different classes should be as large as possible. To learn the similarity information between samples, this embodiment uses deep-hashing-based metric learning to constrain the similarity between image-block sample pairs.
First, the extracted semantic features are further binarized into hash codes to obtain the corresponding hash code matrices, one each for HSI, LiDAR, and HSI-LiDAR; the matrices are built from the hash codes of the individual HSI and LiDAR pixels.
In addition, a similarity variable is defined for any sample pair: if the two samples have the same class label, the variable is 1; otherwise, it is 0.
Based on the above definition, the similarity loss between single-source and cross-source samples is obtained by computing the negative log-likelihood of the sample-pair similarity labels.
This loss function contains a discrete constraint (the elements of the hash code matrix take discrete values), so minimizing it directly is an NP-hard problem. To this end, embodiments of the invention use the continuous-valued semantic features to approximate the discrete hash codes, and the quantization loss introduced by this continuous relaxation is added as a separate loss term.
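The similarity and quantization losses can be sketched as follows. This is an illustration under stated assumptions, not the exact formulas of the method: it assumes the pairwise likelihood commonly used in deep hashing, p(s = 1) = sigmoid(theta) with theta equal to half the inner product of the two features, and a squared-error quantization penalty against the sign codes. All function names are hypothetical.

```python
import math


def similarity(label_a, label_b):
    """Similarity variable: 1 when the two class labels match, otherwise 0."""
    return 1 if label_a == label_b else 0


def similarity_loss(feats_a, labels_a, feats_b, labels_b):
    """Negative log-likelihood of pairwise similarity labels.

    Assumes p(s = 1) = sigmoid(theta) with theta = 0.5 * <f_a, f_b>.
    """
    loss = 0.0
    for fa, la in zip(feats_a, labels_a):
        for fb, lb in zip(feats_b, labels_b):
            theta = 0.5 * sum(p * q for p, q in zip(fa, fb))
            # Numerically stable log(1 + exp(theta)).
            softplus = max(theta, 0.0) + math.log1p(math.exp(-abs(theta)))
            loss += softplus - similarity(la, lb) * theta
    return loss


def sign(v):
    return 1.0 if v >= 0 else -1.0


def quantization_loss(feats):
    """Squared distance between continuous features and their sign codes."""
    return sum((sign(v) - v) ** 2 for f in feats for v in f)
```

Passing the same feature list twice gives a single-source loss, while passing HSI features and LiDAR features gives a cross-source loss.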
In addition to inter-sample correlation, each sample carries rich semantic information. On the basis of the extracted semantic features, the semantic loss of each sample is measured with a cross-entropy loss function that compares the true label with the classification result predicted by the classifier.
The objective function jointly minimizes the above three loss functions, with three hyperparameters balancing the weights of the different types of loss. Minimizing this objective makes the predicted class output by the network as close as possible to the true class of the sample.
The embodiment of the invention solves the objective function with a gradient descent algorithm and obtains suitable network parameters through iterative updates.
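The semantic loss, the combined objective, and one parameter update can be sketched as below. The weighted-sum form and the names `alpha`, `beta`, `gamma`, and `learning_rate` are assumptions made for illustration; in practice the gradients would come from backpropagation through the network.

```python
import math


def cross_entropy(probabilities, true_class):
    """Semantic loss of one sample: negative log-probability of the true class."""
    return -math.log(probabilities[true_class])


def objective(sim_loss, quant_loss, sem_loss, alpha, beta, gamma):
    """Weighted sum of the three losses; alpha, beta, gamma are hypothetical names."""
    return alpha * sim_loss + beta * quant_loss + gamma * sem_loss


def gradient_step(params, grads, learning_rate):
    """One plain gradient-descent update on a flat parameter list."""
    return [p - learning_rate * g for p, g in zip(params, grads)]
```

Repeating `gradient_step` with freshly computed gradients corresponds to the iterative updating described above.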
S40, inputting a sample to be tested into the trained multi-source data feature extraction and fusion network, and outputting a final classification label according to the decision-level fusion result.
Specifically, any test sample pair is input into the trained multi-source data feature extraction and fusion network; the four semantic features are extracted through the feed-forward operation of the network; the four semantic features are input into separate classifiers to obtain their respective classification results; finally, decision-level fusion integrates the four classification results into the final classification result.
Each classifier uses a softmax function.
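The test-time decision-level fusion can be sketched as follows. The fusion rule used here, averaging the four softmax probability vectors and taking the most probable class, is an assumption chosen for illustration and may differ from the rule of the method; `decision_fusion` is a hypothetical name.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of class scores."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def decision_fusion(logit_sets):
    """Fuse classifier outputs by averaging their softmax probabilities.

    Averaging is an assumption of this sketch, not necessarily the fusion
    rule of the method.
    """
    probs = [softmax(logits) for logits in logit_sets]
    n_classes = len(probs[0])
    fused = [sum(p[c] for p in probs) / len(probs) for c in range(n_classes)]
    label = max(range(n_classes), key=lambda c: fused[c])
    return label, fused
```

For example, `decision_fusion` would be called with four score lists, one per classifier (HSI, LiDAR, and the two fused features), and returns the fused label together with the fused probabilities.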
Further, this embodiment reports classification results under different evaluation metrics: Overall Accuracy (OA), Average Accuracy (AA), per-class accuracy (CA), and the Kappa coefficient. The method proposed in this embodiment is also compared with other deep-learning-based HSI and LiDAR classification methods: Two-branch CNN, FDSSCN, and Coupled CNN. Table 1 shows the quantitative comparison of the different classification methods.
Table 1. Classification results of different methods on the Houston dataset
As can be seen from Table 1, the method provided by the embodiment of the present invention achieves the best classification results on the three metrics OA, AA, and Kappa. In addition, the per-class results of the method are higher than those of the other classification methods in most categories. The experimental results further demonstrate the effectiveness and superiority of the method for multi-source data fusion classification.
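The four evaluation metrics can be computed from a confusion matrix, as in the following sketch. The function name `classification_metrics` and the matrix layout (rows are true classes, columns are predicted classes) are assumptions of this example; the formulas are the standard definitions of OA, AA, per-class accuracy, and Cohen's kappa.

```python
def classification_metrics(confusion):
    """Compute OA, AA, per-class accuracy, and Cohen's kappa.

    confusion[i][j] counts samples of true class i predicted as class j.
    """
    n_classes = len(confusion)
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[i][i] for i in range(n_classes))
    oa = correct / total
    # Per-class accuracy: correctly classified fraction of each true class.
    ca = [confusion[i][i] / sum(confusion[i]) for i in range(n_classes)]
    aa = sum(ca) / n_classes
    # Expected agreement by chance, from row and column marginals.
    col_sums = [sum(row[j] for row in confusion) for j in range(n_classes)]
    pe = sum(sum(confusion[i]) * col_sums[i] for i in range(n_classes)) / total ** 2
    kappa = (oa - pe) / (1.0 - pe)
    return oa, aa, ca, kappa
```

OA weights every sample equally, AA weights every class equally, and Kappa discounts the agreement expected by chance, which is why the three can rank methods differently on imbalanced data.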
Further, as shown in fig. 9, based on the above method and system for optimizing data classification based on attention mechanism, the present invention also provides a terminal, which includes a processor 10, a memory 20 and a display 30. Fig. 9 shows only some of the components of the terminal, but it is to be understood that not all of the shown components are required to be implemented, and that more or fewer components may be implemented instead.
The memory 20 may in some embodiments be an internal storage unit of the terminal, such as a hard disk or a memory of the terminal. The memory 20 may also be an external storage device of the terminal in other embodiments, such as a plug-in hard disk provided on the terminal, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card), and the like. Further, the memory 20 may also include both an internal storage unit and an external storage device of the terminal. The memory 20 is used for storing application software installed in the terminal and various types of data, such as the program code installed on the terminal. The memory 20 may also be used to temporarily store data that has been output or is to be output. In one embodiment, the memory 20 stores an attention-based data classification optimization program 40, and the attention-based data classification optimization program 40 can be executed by the processor 10 to implement the attention-based data classification optimization method of the present application.
The processor 10 may be, in some embodiments, a Central Processing Unit (CPU), a microprocessor or other data Processing chip, and is configured to run program codes stored in the memory 20 or process data, for example, execute the attention-based data classification optimization method.
The display 30 may be an LED display, a liquid crystal display, a touch-sensitive liquid crystal display, an OLED (Organic Light-Emitting Diode) touch panel, or the like in some embodiments. The display 30 is used for displaying information at the terminal and for displaying a visual user interface. The components 10-30 of the terminal communicate with each other via a system bus.
In one embodiment, the steps of the attention-based data classification optimization method described above are implemented when the processor 10 executes the attention-based data classification optimization program 40 in the memory 20.
The present invention also provides a computer readable storage medium, wherein the computer readable storage medium stores an attention-based data classification optimization program, and the attention-based data classification optimization program implements the steps of the attention-based data classification optimization method as described above when executed by a processor.
In summary, the present invention provides an attention mechanism-based data classification optimization method and related equipment. The method includes: dividing all labeled pixels into a training set and a test set, and acquiring the real label data of each; embedding an attention mechanism into a convolutional neural network to construct an attention mechanism-based multi-source data feature extraction and fusion network; acquiring training data that fuses sample semantic information and similarity information, and performing supervised training on the multi-source data feature extraction and fusion network; and inputting a sample to be tested into the trained network and outputting a final classification label according to the decision-level fusion result. By constructing an attention-based feature extraction and fusion framework and designing a new objective loss function that accounts for both the semantic information and the similarity information of samples, the method markedly improves feature representation capability, achieves accurate classification of HSI and LiDAR data through efficient feature extraction and fusion, and provides an effective approach to the combined use of multi-source data.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or terminal that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or terminal. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of additional like elements in the process, method, article, or terminal that comprises the element.
Of course, it will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by instructing relevant hardware (such as a processor, a controller, etc.) through a computer program, and the program can be stored in a computer readable storage medium, and when executed, the program can include the processes of the embodiments of the methods described above. The computer readable storage medium may be a memory, a magnetic disk, an optical disk, etc.
It is to be understood that the invention is not limited to the examples described above, but that modifications and variations may be effected thereto by those of ordinary skill in the art in light of the foregoing description, and that all such modifications and variations are intended to be within the scope of the invention as defined by the appended claims.
Claims (4)
1. An attention-based data classification optimization method is characterized by comprising the following steps:
dividing all the marked pixels into a training set and a testing set, and respectively acquiring real label data of the training set and the testing set;
embedding an attention mechanism into a convolutional neural network, and constructing a multi-source data feature extraction and fusion network based on the attention mechanism;
acquiring training data fusing sample semantic information and similar information, and performing supervised training on the multi-source data feature extraction and fusion network;
inputting a sample to be tested into the trained multi-source data feature extraction and fusion network, and outputting a final classification label according to a decision-level fusion result;
the method for dividing all the labeled pixels into a training set and a test set and respectively obtaining the real label data of the training set and the test set comprises the following steps:
two sets respectively represent the labeled pixel sets in the HSI and the LiDAR point cloud depth image;
wherein the elements of the two sets respectively denote an individual HSI pixel and the corresponding LiDAR pixel, and the total number of labeled pixels and the number of HSI spectral bands are denoted separately;
wherein each labeled pixel has a true label, and the labels range over the total number of categories;
the dividing all the labeled pixels into a training set and a test set, and respectively obtaining real label data of the training set and the test set specifically includes:
forming a sample pair from the pixels at the same coordinate position in the HSI and the LiDAR point cloud depth image, and dividing all labeled pixels into a training set and a test set according to a predefined data division criterion;
the training set and the test set are denoted separately, as are the real label data of the training set and of the test set, wherein the number of training samples and the number of test samples together equal the total number of labeled pixels;
the multi-source data feature extraction and fusion network comprises: a data preprocessing module, a residual-attention mechanism-based feature extraction module, an attention mechanism-based feature fusion module, and a decision-level fusion-based classification module;
the data preprocessing module is used for:
taking each pair of labeled pixels as the center, cropping image blocks of a preset size from the HSI and the LiDAR point cloud depth image respectively, and constructing an image-block sample pair consisting of a hyperspectral image block and a LiDAR point cloud depth image block, both of the given image block size;
applying two different convolutional layers to the hyperspectral image block and the LiDAR point cloud depth image block respectively, so that the two have equal data dimensions, the preprocessed data being expressed as follows:
wherein the two outputs respectively represent the preprocessed hyperspectral image block and the preprocessed LiDAR point cloud depth image block;
the two convolutional layers each have their own convolution kernel size, determined by the spatial size of the convolution kernel and the number of output channels of the convolution kernel;
the residual-attention mechanism-based feature extraction module is configured to:
wherein the residual function is the network function of two convolutional layers;
wherein the residual function is defined by two convolution kernels and two bias vectors, together with the convolution operation and the ReLU activation function;
wherein the global feature is computed with a global average pooling operation, batch normalization, a dimension-reduction layer and a dimension-increase layer, a channel reduction factor, and the number of feature channels of the input;
wherein the local feature is extracted with two point-wise convolution operations, each with its own convolution kernel size, and the local feature has the same size as the input;
the output characteristics of the multi-scale channel attention module are expressed as:
wherein the expression uses an attention weight, an element-wise multiplication operation, broadcast addition, and a sigmoid activation function;
after processing by a plurality of residual-attention mechanism modules, the extracted HSI image features and LiDAR image features are respectively recorded;
the attention mechanism-based feature fusion module is configured to:
performing a global pooling operation on the extracted HSI image features and LiDAR image features respectively, and generating the corresponding semantic features through vector flattening and fully connected layer processing;
Two feature level fusion strategies are employed to take advantage of complementary information between HSI and LiDAR data;
wherein the first fusion strategy is addition-based feature fusion, in which the HSI and LiDAR features are added directly to obtain a fused semantic feature;
wherein the second fusion strategy is attention mechanism-based feature fusion, in which an attention feature fusion module fuses the HSI and LiDAR features, and vector flattening and fully connected layer processing then generate a fused semantic feature; after the features to be fused are summed, they are input into a multi-scale channel attention module to generate attention-based fusion weights, which are expressed as follows:
wherein the fused feature is computed from the fusion weight M and the two features to be fused;
after this processing, four semantic features are formed in total, including two single-source semantic features and two fused semantic features;
the decision-level fusion-based classification module is configured to:
inputting the two single-source semantic features and the two fused semantic features into different classifiers respectively to obtain four classification prediction results;
and optimizing the four classification results by adopting a decision-level fusion strategy, wherein the final classification result is expressed as:
the acquiring training data fusing sample semantic information and similar information, and performing supervised training on the multi-source data feature extraction and fusion network specifically comprises:
designing a loss function fusing sample semantic information and similar information, and solving network parameters of the multi-source data feature extraction and fusion network by adopting a gradient descent method;
the designing a loss function fusing sample semantic information and similar information, and solving network parameters of the multi-source data feature extraction and fusion network by adopting a gradient descent method specifically comprises:
adopting metric learning based on deep hashing to constrain the similarity between the image-block sample pairs;
the extracted semantic features are binarized into hash codes, and corresponding hash code matrixes are obtained:
wherein the three hash code matrices correspond to HSI, LiDAR, and HSI-LiDAR respectively, and are built from the hash codes of the individual HSI and LiDAR pixels;
defining a similarity variable for any sample pair, which is 1 if the two class labels are the same and 0 otherwise;
calculating the negative log-likelihood of the sample-pair similarity labels to obtain the similarity loss between single-source and cross-source samples:
wherein the loss uses a sigmoid activation function;
approximating the discrete hash codes with the continuous-valued semantic features, wherein the quantization loss introduced by the continuous relaxation is expressed as follows:
on the basis of the extracted semantic features, measuring the semantic loss of each sample by adopting a cross entropy loss function:
wherein the loss compares the true label with the classification result predicted by the classifier;
by jointly minimizing the above three loss functions, the objective function is expressed as follows:
wherein the three hyperparameters balance the weights of the different types of losses;
solving the objective function with a gradient descent algorithm, and obtaining suitable network parameters through iterative updates.
2. The attention-based data classification optimization method of claim 1, wherein the inputting of the sample to be tested into the trained multi-source data feature extraction and fusion network and the outputting of the final classification label according to the decision-level fusion result specifically comprises:
inputting any test sample pair into the trained multi-source data feature extraction and fusion network;
extracting the four semantic features through the feed-forward operation of the multi-source data feature extraction and fusion network;
inputting the four semantic features into different classifiers respectively to obtain respective classification results;
integrating the four classification results by adopting decision-level fusion to obtain a final classification result:
wherein the classifier adopts a softmax function.
3. A terminal, characterized in that the terminal comprises: a memory, a processor, and an attention-based data classification optimization program stored on the memory and executable on the processor, the attention-based data classification optimization program when executed by the processor implementing the steps of the attention-based data classification optimization method of any of claims 1-2.
4. A computer-readable storage medium, characterized in that the computer-readable storage medium stores an attention-based data classification optimization program, which when executed by a processor implements the steps of the attention-based data classification optimization method according to any one of claims 1-2.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211550245.2A CN115546569B (en) | 2022-12-05 | 2022-12-05 | Attention mechanism-based data classification optimization method and related equipment |
Publications (2)
Publication Number | Publication Date |
---|---|
CN115546569A CN115546569A (en) | 2022-12-30 |
CN115546569B CN115546569B (en) | 2023-04-07 |
Family
ID=84722227
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202211550245.2A Active CN115546569B (en) | 2022-12-05 | 2022-12-05 | Attention mechanism-based data classification optimization method and related equipment |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115546569B (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116894972B (en) * | 2023-06-25 | 2024-02-13 | 耕宇牧星(北京)空间科技有限公司 | Wetland information classification method and system integrating airborne camera image and SAR image |
Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2022073452A1 (en) * | 2020-10-07 | 2022-04-14 | 武汉大学 | Hyperspectral remote sensing image classification method based on self-attention context network |
Family Cites Families (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109993220B (en) * | 2019-03-23 | 2022-12-06 | 西安电子科技大学 | Multi-source remote sensing image classification method based on double-path attention fusion neural network |
CN113435253B (en) * | 2021-05-31 | 2022-12-02 | 西安电子科技大学 | Multi-source image combined urban area ground surface coverage classification method |
CN114708455A (en) * | 2022-03-24 | 2022-07-05 | 中国人民解放军战略支援部队信息工程大学 | Hyperspectral image and LiDAR data collaborative classification method |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||