CN115546569B - Attention mechanism-based data classification optimization method and related equipment - Google Patents

Attention mechanism-based data classification optimization method and related equipment

Info

Publication number
CN115546569B
CN115546569B (application CN202211550245.2A)
Authority
CN
China
Prior art keywords
fusion
data
attention
classification
feature extraction
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202211550245.2A
Other languages
Chinese (zh)
Other versions
CN115546569A (en)
Inventor
宋伟伟
莫继学
戴勇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Peng Cheng Laboratory
Original Assignee
Peng Cheng Laboratory
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Peng Cheng Laboratory filed Critical Peng Cheng Laboratory
Priority to CN202211550245.2A priority Critical patent/CN115546569B/en
Publication of CN115546569A publication Critical patent/CN115546569A/en
Application granted granted Critical
Publication of CN115546569B publication Critical patent/CN115546569B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/74 Image or video pattern matching; Proximity measures in feature spaces
    • G06V10/761 Proximity, similarity or dissimilarity measures
    • G06V10/764 Arrangements using pattern recognition or machine learning using classification, e.g. of video objects
    • G06V10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/774 Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06V10/80 Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G06V10/806 Fusion of extracted features
    • G06V10/82 Arrangements using pattern recognition or machine learning using neural networks
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02A TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
    • Y02A40/00 Adaptation technologies in agriculture, forestry, livestock or agroalimentary production
    • Y02A40/10 Adaptation technologies in agriculture, forestry, livestock or agroalimentary production in agriculture

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Databases & Information Systems (AREA)
  • Computing Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a data classification optimization method based on an attention mechanism and related equipment, the method comprising the following steps: dividing all labeled pixels into a training set and a test set, and respectively acquiring the true label data of the training set and the test set; embedding an attention mechanism into a convolutional neural network, and constructing an attention-based multi-source data feature extraction and fusion network; acquiring training data that fuses sample semantic information and similarity information, and performing supervised training of the multi-source data feature extraction and fusion network; and inputting a sample to be tested into the trained multi-source data feature extraction and fusion network, and outputting the final classification label according to the decision-level fusion result. The method constructs a feature extraction and fusion framework based on an attention mechanism and takes both the semantic information and the similarity information of the samples into account, which remarkably improves the feature characterization capability; HSI and LiDAR data are accurately classified through efficient feature extraction and fusion.

Description

Attention mechanism-based data classification optimization method and related equipment
Technical Field
The invention relates to the technical field of multi-source data fusion classification, in particular to a data classification optimization method based on an attention mechanism, a terminal and a computer readable storage medium.
Background
With the rapid development of Earth observation technology, different types of sensors have been developed to obtain multi-source information about ground objects. For example, multispectral and hyperspectral cameras can acquire the spectral attributes of ground features, LiDAR (Light Detection and Ranging) sensors can directly acquire their three-dimensional spatial information, and synthetic aperture radar (SAR) sensors can acquire amplitude and phase information.
Although these sensors play an important role in remote sensing Earth observation and surface feature classification applications, using only a single sensor has drawbacks. For example, hyperspectral images (HSI) carry rich spectral information and can identify different material attributes, but they struggle to distinguish land features with similar spectra but different elevations (e.g., grasslands and trees, parking lots and building roofs, or roads and viaducts cannot be effectively distinguished by HSI spectral information alone); on the other hand, LiDAR data can directly separate features of different elevations using height information, but cannot distinguish features of the same height but different spectra (e.g., asphalt and concrete, iron-sheet tiles and glazed tiles, or trees and tree-shaped camouflaged signal towers cannot be effectively distinguished using the LiDAR point cloud alone). Therefore, no single sensor's data can comprehensively capture real and accurate ground feature information, and it is difficult to meet the requirement of reliable remote sensing ground feature classification. Combining the LiDAR point cloud with HSI, so that the complementary advantages of the different data types are fully exploited, is a key technical means for achieving fine classification of remote sensing images.
Currently, liDAR point cloud and HSI fusion classification methods can be classified into the following categories: a fusion classification method based on feature stack, a fusion classification method based on low-dimensional subspace, a fusion classification method based on kernel transformation, and a fusion classification method based on deep learning.
Among them, feature stacking is the simplest and easiest feature fusion method to implement; however, simple concatenation or stacking may leave the fused features with a large amount of redundant information, and because labeled samples are limited, this fusion method usually faces the "curse of dimensionality", resulting in limited classification accuracy. The fusion method based on low-dimensional subspaces decomposes high-dimensional hyperspectral data into a low-dimensional spectral subspace and coefficients, which effectively avoids the curse of dimensionality during classification and improves computational efficiency; however, it requires solving a complex decomposition model, and its performance is strongly affected by the solved coefficients. The fusion method based on kernel transformation maps data that are linearly inseparable in the original space into a high-dimensional space where they become linearly separable, and it has been widely used in LiDAR point cloud and HSI fusion classification research; however, it requires manual kernel function selection, and there is no guarantee that the selected kernel performs optimally in all scenes. The deep-learning-based method is the current mainstream: it extracts highly representative semantic features by constructing a deep neural network and realizes deep fusion of the HSI and LiDAR point cloud features through fusion fully connected layers; however, deep learning requires a large number of labeled samples for model training, while calibrated hyperspectral pixels are generally very limited, which restricts the application of deep learning in the hyperspectral field to a certain extent.
Although several exploratory works on the HSI and LiDAR point cloud multi-source data fusion classification problem have been carried out and have obtained good ground object classification results, the spatial structure of remote sensing data is highly complex and the heterogeneity between HSI and the LiDAR point cloud is strong; the feature characterization capability obtained by current multi-source data feature extraction and fusion methods is therefore still insufficient, and it is difficult to meet the requirement of high-precision classification of ground objects.
Accordingly, the prior art is yet to be improved and developed.
Disclosure of Invention
The invention mainly aims to provide a data classification optimization method based on an attention mechanism, a terminal and a computer readable storage medium, and aims to solve the problems that the feature characterization capability obtained by multi-source data feature extraction and fusion methods in the prior art is insufficient and that the requirement of high-precision classification of ground features is difficult to meet.
In order to achieve the above object, the present invention provides an attention-based data classification optimization method, which includes the following steps:
dividing all labeled pixels into a training set and a test set, and respectively acquiring the true label data of the training set and the test set;
embedding an attention mechanism into a convolutional neural network, and constructing a multi-source data feature extraction and fusion network based on the attention mechanism;
acquiring training data fusing sample semantic information and similarity information, and performing supervised training of the multi-source data feature extraction and fusion network;
and inputting the sample to be tested into the trained multi-source data feature extraction and fusion network, and outputting a final classification label according to a decision-level fusion result.
Optionally, in the attention-mechanism-based data classification optimization method, the dividing of all labeled pixels into a training set and a test set and the respective acquisition of their true label data further comprises:

letting $X^{H}=\{x_i^{H}\}_{i=1}^{N}\subset\mathbb{R}^{B}$ and $X^{L}=\{x_i^{L}\}_{i=1}^{N}$ denote the sets of labeled pixels in the HSI and in the LiDAR point cloud depth image, respectively, where $x_i^{H}$ and $x_i^{L}$ denote the $i$-th HSI pixel and the $i$-th LiDAR pixel, $N$ is the total number of labeled pixels, and $B$ is the number of HSI spectral bands;

the true label data being expressed as $Y=\{y_i\}_{i=1}^{N}$, where $y_i\in\{1,2,\dots,C\}$ denotes the true label of the $i$-th pixel and $C$ denotes the total number of categories.
Optionally, in the attention-mechanism-based data classification optimization method, the dividing of all labeled pixels into a training set and a test set and the respective acquisition of their true label data specifically comprises:

forming a sample pair from the pixels at the same coordinate position in the HSI and the LiDAR point cloud depth image, and dividing all labeled pixels into a training set and a test set according to a predefined data division criterion;

$X_{tr}$ and $X_{te}$ denote the training set and the test set, and $Y_{tr}$ and $Y_{te}$ denote the true label data of the training set and the test set, respectively, where $N_{tr}$ and $N_{te}$ denote the number of training samples and the number of test samples and satisfy $N_{tr}+N_{te}=N$.
Optionally, in the attention-mechanism-based data classification optimization method, the multi-source data feature extraction and fusion network comprises: a data preprocessing module, a feature extraction module based on a residual-attention mechanism, a feature fusion module based on an attention mechanism, and a classification module based on decision-level fusion.
Optionally, in the attention-mechanism-based data classification optimization method, the data preprocessing module is configured to:

take the labeled pixels $x_i^{H}$ and $x_i^{L}$ as centers, crop image blocks of a preset size from the HSI and the LiDAR point cloud depth image respectively, and construct an image-pair sample $(P_i^{H},P_i^{L})$, where $P_i^{H}\in\mathbb{R}^{s\times s\times B}$ is a hyperspectral image block, $P_i^{L}\in\mathbb{R}^{s\times s}$ is a LiDAR point cloud depth image block, and $s$ is the image block size;

apply two different convolutional layers $\mathrm{Conv}^{H}$ and $\mathrm{Conv}^{L}$ to $P_i^{H}$ and $P_i^{L}$ respectively, so that the data dimensions of the two become equal; the preprocessed data are expressed as follows:

$$\bar{P}_i^{H}=\mathrm{Conv}^{H}(P_i^{H}),\qquad \bar{P}_i^{L}=\mathrm{Conv}^{L}(P_i^{L})$$

where $\bar{P}_i^{H}$ and $\bar{P}_i^{L}$ denote the preprocessed hyperspectral image block and the preprocessed LiDAR point cloud depth image block; the convolution kernels of $\mathrm{Conv}^{H}$ and $\mathrm{Conv}^{L}$ have sizes $k\times k\times B\times d$ and $k\times k\times 1\times d$ respectively, where $k$ is the spatial size of the convolution kernel and $d$ is the number of output channels of the convolution kernel.
Optionally, in the attention-mechanism-based data classification optimization method, the feature extraction module based on the residual-attention mechanism is configured to:

let $x$ denote the input of a residual block; the output is expressed as $y=\mathcal{F}(x)+x$, where $\mathcal{F}$ is the network function of two convolutional layers, i.e. $\mathcal{F}(x)=W_{2}*\delta(W_{1}*x+b_{1})+b_{2}$, where $W_{1}$ and $W_{2}$ are convolution kernels, $b_{1}$ and $b_{2}$ are bias vectors, $*$ denotes the convolution operation, and $\delta$ denotes the ReLU activation function;

if the input of the multi-scale channel attention module is $X$, the extracted global features $g(X)$ are expressed as:

$$g(X)=\mathcal{B}\big(W_{u}\,\delta\big(\mathcal{B}\big(W_{d}\,\mathrm{GAP}(X)\big)\big)\big)$$

where $\mathrm{GAP}(\cdot)$ denotes the global average pooling operation, $\mathcal{B}$ denotes batch normalization, $W_{d}$ and $W_{u}$ denote the dimension-reduction and dimension-increase layers respectively, $r$ is the channel reduction factor, and $c$ denotes the number of feature channels of the input $X$;

the local features $L(X)$ are expressed as:

$$L(X)=\mathcal{B}\big(\mathrm{PWConv}_{2}\big(\delta\big(\mathcal{B}\big(\mathrm{PWConv}_{1}(X)\big)\big)\big)\big)$$

where $\mathrm{PWConv}_{1}$ and $\mathrm{PWConv}_{2}$ denote the two point-wise convolution operations in the local feature extraction process, with convolution kernel sizes $1\times 1\times c\times\tfrac{c}{r}$ and $1\times 1\times\tfrac{c}{r}\times c$ respectively; the local features $L(X)$ have the same size as the input $X$;

the output features of the multi-scale channel attention module are expressed as:

$$X'=X\otimes M(X)=X\otimes\sigma\big(L(X)\oplus g(X)\big)$$

where $M(X)$ denotes the attention weights, $\otimes$ denotes element-wise multiplication, $\oplus$ denotes broadcast addition, and $\sigma$ denotes the sigmoid activation function;

after processing by several residual-attention modules, the extracted HSI and LiDAR image features are denoted $F^{H}$ and $F^{L}$, respectively.
Optionally, in the attention-mechanism-based data classification optimization method, the feature fusion module based on the attention mechanism is configured to:

perform a global pooling operation on the extracted HSI image features $F^{H}$ and LiDAR image features $F^{L}$ respectively, and then generate the corresponding semantic features $f^{H}$ and $f^{L}$ through vector flattening and fully connected layer processing;

adopt two feature-level fusion strategies to exploit the complementary information between the HSI and LiDAR data;

the first fusion strategy is addition-based feature fusion: $f^{H}$ and $f^{L}$ are directly added to obtain the fused semantic features $f^{A}$;

the second fusion strategy is attention-based feature fusion: an attentional feature fusion module is adopted to fuse the HSI and LiDAR features, and the fused semantic features $f^{F}$ are generated through vector flattening and fully connected layer processing; after the features to be fused are summed, they are input into the multi-scale channel attention module to generate the attention-based fusion weights, expressed as follows:

$$Z=M(X\uplus Y)\otimes X+\big(1-M(X\uplus Y)\big)\otimes Y$$

where $Z$ denotes the fused features, $M$ denotes the fusion weights, and $X$ and $Y$ denote the two features to be fused ($\uplus$ denotes their initial summation);

after this processing, four semantic features are formed jointly, including the two single-source semantic features $f^{H}$ and $f^{L}$ and the two fused semantic features $f^{A}$ and $f^{F}$.
Optionally, in the attention-mechanism-based data classification optimization method, the classification module based on decision-level fusion is configured to:

input the single-source semantic features $f^{H}$ and $f^{L}$ and the fused semantic features $f^{A}$ and $f^{F}$ into four separate classifiers to obtain four classification predictions;

optimize the four classification results with a decision-level fusion strategy, the final classification result being the class with the maximum fused prediction probability over the four classifier outputs.
optionally, the attention-based data classification optimization method includes acquiring training data with sample semantic information and similar information fused, where the supervised training of the multi-source data feature extraction and fusion network specifically includes:
and designing a loss function fusing sample semantic information and similar information, and solving network parameters of the multi-source data feature extraction and fusion network by adopting a gradient descent method.
Optionally, in the attention-mechanism-based data classification optimization method, the designing of a loss function fusing sample semantic information and similarity information and the solving of the network parameters of the multi-source data feature extraction and fusion network by gradient descent specifically comprises:

constraining the similarity between image-block sample pairs using metric learning based on deep hashing;

binarizing the extracted semantic features into hash codes to obtain the corresponding hash code matrices:

$$B^{H}=\mathrm{sign}(f^{H}),\qquad B^{L}=\mathrm{sign}(f^{L}),\qquad B^{F}=\mathrm{sign}(f^{F})$$

where $B^{H}$, $B^{L}$ and $B^{F}$ denote the hash code matrices of HSI, LiDAR and HSI-LiDAR respectively, and $b_i^{H}$ and $b_i^{L}$ denote the hash codes of the $i$-th HSI and LiDAR pixels;

defining, for any sample pair $(x_i,x_j)$, the similarity variable $s_{ij}$: if the two category labels are the same then $s_{ij}=1$, otherwise $s_{ij}=0$;

computing the negative log-likelihood of the sample-pair labels to obtain the similarity loss between single-source and cross-source samples:

$$L_{s}=-\sum_{i,j}\big(s_{ij}\Theta_{ij}-\log(1+e^{\Theta_{ij}})\big)$$

where $\Theta_{ij}=\tfrac{1}{2}\,b_i^{\top}b_j$ and $p(s_{ij}=1\mid b_i,b_j)=\sigma(\Theta_{ij})$, where $\sigma$ denotes the sigmoid activation function;

approximating the discrete hash codes with the semantic features of continuous variables, the quantization loss produced by this continuous relaxation being expressed as follows:

$$L_{q}=\sum_{m\in\{H,L,F\}}\big\|B^{m}-f^{m}\big\|_{F}^{2}$$

measuring, on the basis of the extracted semantic features, the semantic loss of each sample with the cross-entropy loss function:

$$L_{c}=-\sum_{i=1}^{N_{tr}}\sum_{c=1}^{C}y_{i,c}\log\hat{y}_{i,c}$$

where $\hat{y}_{i,c}$ denotes the classification result predicted by the classifier;

jointly minimizing the above three loss functions, the objective function being expressed as follows:

$$L=\lambda_{1}L_{s}+\lambda_{2}L_{q}+\lambda_{3}L_{c}$$

where $\lambda_{1}$, $\lambda_{2}$ and $\lambda_{3}$ are hyper-parameters used to balance the weights of the different types of losses;

solving the objective function with a gradient descent algorithm, and obtaining appropriate network parameters through continuous update iterations.
Optionally, in the attention-mechanism-based data classification optimization method, the inputting of a sample to be tested into the trained multi-source data feature extraction and fusion network and the outputting of the final classification label according to the decision-level fusion result specifically comprises:

for any test sample pair $(P^{H},P^{L})$, inputting $(P^{H},P^{L})$ into the trained multi-source data feature extraction and fusion network;

extracting the four semantic features $f^{H}$, $f^{L}$, $f^{A}$ and $f^{F}$ through the feed-forward operation of the multi-source data feature extraction and fusion network;

inputting the four semantic features $f^{H}$, $f^{L}$, $f^{A}$ and $f^{F}$ into their respective classifiers to obtain the respective classification results;

integrating the four classification results by decision-level fusion to obtain the final classification result, i.e. the class with the maximum fused prediction probability, where the classifiers adopt the softmax function.
In addition, to achieve the above object, the present invention further provides a terminal, wherein the terminal comprises: a memory, a processor, and an attention-based data classification optimization program stored in the memory and executable on the processor; when executed by the processor, the attention-based data classification optimization program implements the steps of the attention-based data classification optimization method described above.
In addition, to achieve the above object, the present invention further provides a computer readable storage medium, wherein the computer readable storage medium stores an attention-based data classification optimization program, and the attention-based data classification optimization program, when executed by a processor, implements the steps of the attention-based data classification optimization method as described above.
In the invention, all labeled pixels are divided into a training set and a test set, and the true label data of the training set and the test set are respectively acquired; an attention mechanism is embedded into a convolutional neural network, and an attention-based multi-source data feature extraction and fusion network is constructed; training data fusing sample semantic information and similarity information are acquired, and the multi-source data feature extraction and fusion network is trained in a supervised manner; a sample to be tested is input into the trained multi-source data feature extraction and fusion network, and the final classification label is output according to the decision-level fusion result. The method constructs a feature extraction and fusion framework based on an attention mechanism, designs a novel target loss function, and takes both the semantic information and the similarity information of the samples into account, which remarkably improves the feature characterization capability and realizes accurate classification of HSI and LiDAR through efficient feature extraction and fusion.
Drawings
FIG. 1 is a flow chart of a preferred embodiment of the data classification optimization method based on attention mechanism of the present invention;
FIG. 2 is a block diagram of a multi-source data feature extraction and fusion network according to a preferred embodiment of the data classification optimization method based on attention mechanism;
FIG. 3 is a schematic diagram of the data preprocessing module processing data according to the preferred embodiment of the data classification optimization method based on attention mechanism;
FIG. 4 is a schematic diagram of the feature extraction module based on the residual-attention mechanism for processing data according to the preferred embodiment of the data classification optimization method based on the attention mechanism of the present invention;
FIG. 5 is a schematic diagram of feature extraction using a multi-scale channel attention module MS-CAM according to a preferred embodiment of the data classification optimization method based on the attention mechanism of the present invention;
FIG. 6 is a schematic diagram of the attention-based feature fusion module processing data according to the preferred embodiment of the data classification optimization method based on the attention mechanism of the present invention;
FIG. 7 is a schematic diagram of the feature to be fused being input into the MS-CAM module after the summation operation to generate the fusion weight based on attention in the preferred embodiment of the data classification optimization method based on attention mechanism of the present invention;
FIG. 8 is a schematic diagram of a decision-level fusion-based classification module for processing data according to a preferred embodiment of the present invention;
fig. 9 is a schematic operating environment of a terminal according to a preferred embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention clearer and clearer, the present invention is further described in detail below with reference to the accompanying drawings and examples. It should be understood that the specific embodiments described herein are merely illustrative of the invention and do not limit the invention.
As shown in fig. 1, the data classification optimization method based on attention mechanism according to the preferred embodiment of the present invention includes the following steps:
and S10, dividing all the marked pixels into a training set and a testing set, and respectively acquiring real label data of the training set and the testing set.
Specifically, let $X^{H}=\{x_i^{H}\}_{i=1}^{N}\subset\mathbb{R}^{B}$ and $X^{L}=\{x_i^{L}\}_{i=1}^{N}$ denote the sets of labeled pixels in the HSI and in the LiDAR point cloud depth image, respectively, where $x_i^{H}$ and $x_i^{L}$ denote the $i$-th HSI pixel and the $i$-th LiDAR pixel, $N$ is the total number of labeled pixels, and $B$ is the number of HSI spectral bands.

The corresponding true label data are expressed as $Y=\{y_i\}_{i=1}^{N}$, where $y_i\in\{1,2,\dots,C\}$ denotes the true label of the $i$-th pixel and $C$ denotes the total number of categories.

Pixels at the same coordinate position in the HSI and the LiDAR point cloud depth image form a sample pair, and all labeled pixels are divided into a training set and a test set according to a predefined data division criterion. $X_{tr}$ and $X_{te}$ denote the training set and the test set, and $Y_{tr}$ and $Y_{te}$ denote the true label data of the training set and the test set, respectively, where $N_{tr}$ and $N_{te}$ denote the number of training samples and the number of test samples and satisfy $N_{tr}+N_{te}=N$.
And S20, embedding the attention mechanism into a convolutional neural network, and constructing a multi-source data feature extraction and fusion network based on the attention mechanism.
Specifically, an attention mechanism is embedded into a convolutional neural network, and an attention-based multi-source data feature extraction and fusion network is constructed to extract highly representative semantic features. As shown in fig. 2, the multi-source data feature extraction and fusion network includes: a data preprocessing module, a feature extraction module based on a residual-attention mechanism, a feature fusion module based on an attention mechanism, and a classification module based on decision-level fusion.
For the data preprocessing module, as shown in fig. 3, it comprises two parts: image block extraction and dimension transformation. First, taking the labeled pixels $x_i^{H}$ and $x_i^{L}$ as centers, image blocks of a preset size are cropped from the HSI and the LiDAR point cloud depth image respectively, constructing an image-pair sample $(P_i^{H},P_i^{L})$, where $P_i^{H}\in\mathbb{R}^{s\times s\times B}$ is a hyperspectral image block, $P_i^{L}\in\mathbb{R}^{s\times s}$ is a LiDAR point cloud depth image block, and $s$ is the image block size. Second, two different convolutional layers $\mathrm{Conv}^{H}$ and $\mathrm{Conv}^{L}$ are applied to $P_i^{H}$ and $P_i^{L}$ respectively, so that the data dimensions of the two become equal; the preprocessed data are expressed as follows:

$$\bar{P}_i^{H}=\mathrm{Conv}^{H}(P_i^{H}),\qquad \bar{P}_i^{L}=\mathrm{Conv}^{L}(P_i^{L})$$

where $\bar{P}_i^{H}$ and $\bar{P}_i^{L}$ denote the preprocessed hyperspectral image block and the preprocessed LiDAR point cloud depth image block; the convolution kernels of $\mathrm{Conv}^{H}$ and $\mathrm{Conv}^{L}$ have sizes $k\times k\times B\times d$ and $k\times k\times 1\times d$ respectively, where $k$ is the spatial size of the convolution kernel and $d$ is the number of output channels of the convolution kernel.
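A minimal PyTorch sketch of this preprocessing module follows; the 144-band HSI block, the 11 x 11 block size and the channel count d = 32 are illustrative assumptions. The two convolutions play the roles of Conv^H and Conv^L above, mapping both blocks to tensors of equal dimension.

```python
import torch
import torch.nn as nn

class Preprocess(nn.Module):
    """Map an HSI block (B channels) and a LiDAR depth block (1 channel)
    to feature tensors of identical shape (d channels each)."""
    def __init__(self, bands: int, d: int = 32, k: int = 3):
        super().__init__()
        self.conv_h = nn.Conv2d(bands, d, kernel_size=k, padding=k // 2)  # Conv^H
        self.conv_l = nn.Conv2d(1, d, kernel_size=k, padding=k // 2)      # Conv^L

    def forward(self, p_h, p_l):
        return self.conv_h(p_h), self.conv_l(p_l)

# e.g. an 11x11 block pair: both outputs have shape (1, 32, 11, 11)
pre = Preprocess(bands=144)
ph_bar, pl_bar = pre(torch.randn(1, 144, 11, 11), torch.randn(1, 1, 11, 11))
```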
For the feature extraction module based on the residual-attention mechanism, as shown in fig. 4, the module adopts a dual-branch structure with a weight-sharing mechanism between the branches to reduce the number of network parameters. Each branch consists of several residual-attention blocks (Res-MS-CAM blocks). In residual learning, skip connections drive the network residual between the convolutional layers toward zero, so that the mapping approximates an identity mapping. Because the skip connections add no network parameters and ease the training of the whole network, the network performs better with a deep structure.
In the embodiment of the invention, let $x$ denote the input of a residual block; the output is expressed as $y=\mathcal{F}(x)+x$, where $\mathcal{F}$ is the network function of two convolutional layers, i.e. $\mathcal{F}(x)=W_{2}*\delta(W_{1}*x+b_{1})+b_{2}$, where $W_{1}$ and $W_{2}$ are convolution kernels, $b_{1}$ and $b_{2}$ are bias vectors, $*$ denotes the convolution operation, and $\delta$ denotes the ReLU activation function.
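A PyTorch sketch of one such residual block is given below, assuming 3 x 3 convolution kernels; it implements y = F(x) + x with F composed of two convolutional layers and a ReLU, as in the formulation above.

```python
import torch.nn as nn

class ResidualBlock(nn.Module):
    """y = F(x) + x; the skip connection adds no parameters and lets the
    block learn an approximate identity mapping."""
    def __init__(self, c: int):
        super().__init__()
        self.f = nn.Sequential(
            nn.Conv2d(c, c, kernel_size=3, padding=1),  # W1 * x + b1
            nn.ReLU(inplace=True),                      # delta(.)
            nn.Conv2d(c, c, kernel_size=3, padding=1),  # W2 * (.) + b2
        )

    def forward(self, x):
        return self.f(x) + x
```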
In addition, to make the network focus on more salient information during feature extraction, the embodiment of the invention also adopts a Multi-Scale Channel Attention Module (MS-CAM). As shown in fig. 5, the MS-CAM exploits both global and local features. If the input of the multi-scale channel attention module (MS-CAM) is $X$, the extracted global features $g(X)$ are expressed as:

$$g(X)=\mathcal{B}\big(W_{u}\,\delta\big(\mathcal{B}\big(W_{d}\,\mathrm{GAP}(X)\big)\big)\big)$$

where $\mathrm{GAP}(\cdot)$ denotes the global average pooling operation, $\mathcal{B}$ denotes batch normalization, $W_{d}$ and $W_{u}$ denote the dimension-reduction and dimension-increase layers respectively, $r$ is the channel reduction factor, and $c$ denotes the number of feature channels of the input $X$.
In addition, the local features $L(X)$ are expressed as:

$$L(X)=\mathcal{B}\big(\mathrm{PWConv}_{2}\big(\delta\big(\mathcal{B}\big(\mathrm{PWConv}_{1}(X)\big)\big)\big)\big)$$

where $\mathrm{PWConv}_{1}$ and $\mathrm{PWConv}_{2}$ denote the two point-wise convolution operations in the local feature extraction process, with convolution kernel sizes $1\times 1\times c\times\tfrac{c}{r}$ and $1\times 1\times\tfrac{c}{r}\times c$ respectively; thus the local features $L(X)$ have the same size as the input $X$.
Finally, the output features of the multi-scale channel attention module (MS-CAM) are expressed as:

$$X'=X\otimes M(X)=X\otimes\sigma\big(L(X)\oplus g(X)\big)$$

where $M(X)$ denotes the attention weights, $\otimes$ denotes element-wise multiplication, $\oplus$ denotes broadcast addition, and $\sigma$ denotes the sigmoid activation function.

After processing by several residual-attention modules (Res-MS-CAM), the extracted HSI and LiDAR image features are denoted $F^{H}$ and $F^{L}$, respectively.
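The PyTorch sketch below mirrors the MS-CAM equations above; the channel reduction factor r = 4 is an assumed default, and the attention weights are exposed separately so the fusion sketch later in this description can reuse them.

```python
import torch
import torch.nn as nn

class MSCAM(nn.Module):
    """Multi-scale channel attention: X' = X (*) sigmoid(L(X) (+) g(X))."""
    def __init__(self, c: int, r: int = 4):
        super().__init__()
        mid = max(c // r, 1)
        # g(X): global branch -- GAP, then a 1x1-conv bottleneck with BN/ReLU
        self.global_branch = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(c, mid, 1), nn.BatchNorm2d(mid), nn.ReLU(inplace=True),
            nn.Conv2d(mid, c, 1), nn.BatchNorm2d(c),
        )
        # L(X): local branch -- the same bottleneck via point-wise convolutions,
        # keeping the spatial size of X
        self.local_branch = nn.Sequential(
            nn.Conv2d(c, mid, 1), nn.BatchNorm2d(mid), nn.ReLU(inplace=True),
            nn.Conv2d(mid, c, 1), nn.BatchNorm2d(c),
        )

    def weights(self, x):
        # broadcast addition of the local map and the global vector, then sigmoid
        return torch.sigmoid(self.local_branch(x) + self.global_branch(x))

    def forward(self, x):
        return x * self.weights(x)   # element-wise multiplication
```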
For the attention-based feature fusion module, as shown in fig. 6, a global pooling operation (GlobalAvgPool) is performed on the extracted HSI image features $F^{H}$ and LiDAR image features $F^{L}$ respectively, after which the corresponding semantic features $f^{H}$ and $f^{L}$ are generated through vector flattening (Flatten) and fully connected layer (FC) processing.

In addition, the invention adopts two feature-level fusion strategies to exploit the complementary information between HSI and LiDAR data. The first fusion strategy is addition-based feature fusion, i.e. $f^{H}$ and $f^{L}$ are directly added to obtain the fused semantic features $f^{A}$. The second fusion strategy is attention-based feature fusion: an Attentional Feature Fusion (AFF) module is first adopted to fuse the HSI and LiDAR features, and the fused semantic features $f^{F}$ are then generated through vector flattening (Flatten) and fully connected layer (FC) processing. As shown in fig. 7, after the features to be fused are summed, they are input into the multi-scale channel attention module to generate the attention-based fusion weights, expressed as follows:

$$Z=M(X\uplus Y)\otimes X+\big(1-M(X\uplus Y)\big)\otimes Y$$

where $Z$ denotes the fused features, $M$ denotes the fusion weights, and $X$ and $Y$ denote the two features to be fused ($\uplus$ denotes their initial summation).
Compared with addition-based fusion, AFF exploits the local and global context of the input features simultaneously and realizes deep fusion from the same layer to cross-layer.
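Building on the MSCAM sketch above, a minimal attentional feature fusion module can be written as follows; it realizes Z = M(X + Y) * X + (1 - M(X + Y)) * Y with the fusion weight computed from the summed inputs.

```python
import torch.nn as nn

class AFF(nn.Module):
    """Attentional feature fusion of two same-shape feature maps,
    reusing the MSCAM class from the previous sketch."""
    def __init__(self, c: int, r: int = 4):
        super().__init__()
        self.mscam = MSCAM(c, r)

    def forward(self, x, y):
        m = self.mscam.weights(x + y)   # fusion weight M from the initial sum
        return m * x + (1.0 - m) * y    # Z = M * X + (1 - M) * Y
```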
Therefore, after the processing of the above modules, four semantic features are formed jointly, including the two single-source semantic features $f^{H}$ and $f^{L}$ and the two fused semantic features $f^{A}$ and $f^{F}$.

For the classification module based on decision-level fusion, as shown in fig. 8, the four semantic features are input into four separate classifiers: the single-source semantic features $f^{H}$ and $f^{L}$ and the fused semantic features $f^{A}$ and $f^{F}$ are each input into a different classifier, yielding four classification predictions. To improve the classification result, the embodiment of the invention adopts a decision-level fusion strategy to optimize the four classification results, i.e. the final classification result is the class with the maximum fused prediction probability over the four classifier outputs.
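The patent renders its decision-level fusion formula as an image, so the sketch below shows one plausible instantiation rather than the exact formula: averaging the four softmax probability vectors and returning the class with the maximum fused probability.

```python
import torch
import torch.nn.functional as F

def decision_fusion(logits_h, logits_l, logits_a, logits_f):
    """Fuse four classifier outputs at decision level by averaging their
    softmax probabilities and taking the arg max class."""
    probs = [F.softmax(z, dim=1) for z in (logits_h, logits_l, logits_a, logits_f)]
    fused = torch.stack(probs, dim=0).mean(dim=0)
    return fused.argmax(dim=1)   # final classification label
```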
and S30, acquiring training data fused with sample semantic information and similar information, and performing supervised training on the multi-source data feature extraction and fusion network.
Specifically, a loss function fusing sample semantic information and similarity information is designed, and a gradient descent method is adopted to solve the network parameters of the multi-source data feature extraction and fusion network. The sample similarity information refers to the similarity between samples, i.e. the feature distance between samples of the same class should be as small as possible, and the feature distance between samples of different classes should be as large as possible. To learn the similarity information between samples, this embodiment employs metric learning based on deep hashing to constrain the similarity between image-block sample pairs.
First, the extracted semantic features are further binarized into hash codes to obtain the corresponding hash code matrices:

$$B^{H}=\mathrm{sign}(f^{H}),\qquad B^{L}=\mathrm{sign}(f^{L}),\qquad B^{F}=\mathrm{sign}(f^{F})$$

where $B^{H}$, $B^{L}$ and $B^{F}$ denote the hash code matrices of HSI, LiDAR and HSI-LiDAR respectively, and $b_i^{H}$ and $b_i^{L}$ denote the hash codes of the $i$-th HSI and LiDAR pixels.

In addition, for any sample pair $(x_i,x_j)$ the similarity variable $s_{ij}$ is defined: if the two class labels are the same then $s_{ij}=1$, otherwise $s_{ij}=0$.
Based on the above definitions, the similarity loss between single-source and cross-source samples is obtained by computing the negative log-likelihood of the sample-pair labels:

$$L_{s}=-\sum_{i,j}\big(s_{ij}\Theta_{ij}-\log(1+e^{\Theta_{ij}})\big)$$

where $\Theta_{ij}=\tfrac{1}{2}\,b_i^{\top}b_j$ and $p(s_{ij}=1\mid b_i,b_j)=\sigma(\Theta_{ij})$, where $\sigma$ denotes the sigmoid activation function.

Because the loss function $L_{s}$ involves discrete constraints (the elements of the hash code matrices are discrete values), solving it directly is an NP-hard problem. Therefore, the embodiment of the invention approximates the discrete hash codes (i.e. $B$) with the semantic features of continuous variables (i.e. $f$); the quantization loss produced by this continuous relaxation is expressed as:

$$L_{q}=\sum_{m\in\{H,L,F\}}\big\|B^{m}-f^{m}\big\|_{F}^{2}$$
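A hedged sketch of the similarity and quantization losses follows, in the DPSH style suggested by the negative log-likelihood formulation above; the mean reduction over pairs and the use of the continuous features f in place of the discrete codes are assumptions of the example.

```python
import torch
import torch.nn.functional as F

def similarity_loss(f, s):
    """Pairwise negative log-likelihood loss. f: (N, K) continuous features
    standing in for hash codes; s: (N, N) matrix with s_ij = 1 for same-class
    pairs and 0 otherwise; theta_ij = 0.5 * <f_i, f_j>."""
    theta = 0.5 * f @ f.t()
    # -(s * theta - log(1 + exp(theta))), with the log term computed stably
    return (F.softplus(theta) - s * theta).mean()

def quantization_loss(f):
    """Penalty for relaxing the discrete codes B = sign(f) to continuous f."""
    return ((f - torch.sign(f)) ** 2).mean()
```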
in addition to inter-sample correlation, each sample has rich semantic information. On the basis of the extracted semantic features, measuring the semantic loss of each sample by adopting a cross entropy loss function:
Figure DEST_PATH_IMAGE132
wherein the content of the first and second substances,
Figure DEST_PATH_IMAGE133
Figure DEST_PATH_IMAGE134
representing the classification result predicted by the classifier;
by jointly minimizing the above three loss functions, the objective function is expressed as follows:
Figure DEST_PATH_IMAGE135
wherein the content of the first and second substances,
Figure 264543DEST_PATH_IMAGE100
Figure 599709DEST_PATH_IMAGE101
Figure 594210DEST_PATH_IMAGE102
being a hyper-parameter, for balancing the weights of different types of losses; i.e., by minimizing the above-mentioned loss function, the predicted class of the network output can be made as close as possible to the true class of the sample,
the embodiment of the invention adopts a gradient descent algorithm to solve the objective function, and obtains appropriate network parameters through continuous updating iteration.
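Putting the three terms together, the sketch below performs one gradient-descent update on L = lambda_1 L_s + lambda_2 L_q + lambda_3 L_c; the model interface (four semantic features plus a tuple of four classifier logits) and the unit default weights are assumptions of this sketch, and the loss helpers come from the previous sketch.

```python
import torch.nn.functional as F

def training_step(model, optimizer, p_h, p_l, y, s, l1=1.0, l2=1.0, l3=1.0):
    """One update on the joint objective; reuses similarity_loss and
    quantization_loss defined above."""
    f_h, f_l, f_a, f_f, logits = model(p_h, p_l)
    loss = l3 * sum(F.cross_entropy(z, y) for z in logits)   # semantic loss L_c
    for f in (f_h, f_l, f_f):     # HSI, LiDAR and fused HSI-LiDAR codes
        loss = loss + l1 * similarity_loss(f, s) + l2 * quantization_loss(f)
    optimizer.zero_grad()
    loss.backward()               # gradient descent on the network parameters
    optimizer.step()
    return loss.item()
```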
And S40, inputting a sample to be tested into the trained multi-source data feature extraction and fusion network, and outputting a final classification label according to a decision-level fusion result.
Specifically, for any test sample pair $(P^{H},P^{L})$, $(P^{H},P^{L})$ is input into the trained multi-source data feature extraction and fusion network; the four semantic features $f^{H}$, $f^{L}$, $f^{A}$ and $f^{F}$ are extracted through the feed-forward operation of the multi-source data feature extraction and fusion network; the four semantic features $f^{H}$, $f^{L}$, $f^{A}$ and $f^{F}$ are input into their respective classifiers to obtain the respective classification results; finally, the four classification results are integrated by decision-level fusion to obtain the final classification result, i.e. the class with the maximum fused prediction probability, where the classifiers adopt the softmax function.
Further, this embodiment reports classification results under different metrics; the adopted classification metrics include: overall accuracy (OA), average accuracy (AA), class accuracy (CA), and the Kappa coefficient. Besides the method proposed in this embodiment, other deep-learning-based HSI and LiDAR classification methods are compared, including: Two-branch CNN, FDSSCN, and Coupled CNN. Table 1 shows the quantitative comparison results of the different classification methods.
Table 1. Classification results of different methods on the Houston dataset (the table is rendered as an image in the original publication).
As can be seen from Table 1, the method provided by the embodiment of the present invention achieves the best classification results on the three indexes OA, AA and Kappa. In addition, the classification accuracy of the proposed method is higher than that of the other classification methods in most categories. The experimental results further demonstrate the effectiveness and superiority of the method for multi-source data fusion classification.
Further, as shown in fig. 9, based on the above data classification optimization method based on an attention mechanism, the present invention also provides a terminal, which includes a processor 10, a memory 20 and a display 30. Fig. 9 shows only some of the components of the terminal, but it is to be understood that not all of the shown components are required to be implemented, and that more or fewer components may be implemented instead.
The memory 20 may in some embodiments be an internal storage unit of the terminal, such as a hard disk or a memory of the terminal. In other embodiments, the memory 20 may also be an external storage device of the terminal, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, or a Flash memory card (Flash Card) provided on the terminal. Further, the memory 20 may also include both an internal storage unit and an external storage device of the terminal. The memory 20 is used for storing application software installed on the terminal and various types of data, such as the program code installed on the terminal, and may also be used to temporarily store data that has been output or is to be output. In one embodiment, the memory 20 stores an attention-based data classification optimization program 40, and the attention-based data classification optimization program 40 can be executed by the processor 10 to implement the attention-based data classification optimization method of the present application.
The processor 10 may be, in some embodiments, a Central Processing Unit (CPU), a microprocessor or another data processing chip, configured to run the program code stored in the memory 20 or to process data, for example to execute the attention-based data classification optimization method.
The display 30 may be an LED display, a liquid crystal display, a touch-sensitive liquid crystal display, an OLED (Organic Light-Emitting Diode) touch panel, or the like in some embodiments. The display 30 is used for displaying information at the terminal and for displaying a visual user interface. The components 10-30 of the terminal communicate with each other via a system bus.
In one embodiment, the steps of the attention-based data classification optimization method described above are implemented when the processor 10 executes the attention-based data classification optimization program 40 in the memory 20.
The present invention also provides a computer readable storage medium, wherein the computer readable storage medium stores an attention-based data classification optimization program, and the attention-based data classification optimization program implements the steps of the attention-based data classification optimization method as described above when executed by a processor.
In summary, the present invention provides a data classification optimization method based on an attention mechanism and related equipment, the method comprising: dividing all labeled pixels into a training set and a test set, and respectively acquiring the true label data of the training set and the test set; embedding an attention mechanism into a convolutional neural network, and constructing an attention-based multi-source data feature extraction and fusion network; acquiring training data fusing sample semantic information and similarity information, and performing supervised training of the multi-source data feature extraction and fusion network; and inputting a sample to be tested into the trained multi-source data feature extraction and fusion network and outputting the final classification label according to the decision-level fusion result. The method constructs a feature extraction and fusion framework based on an attention mechanism, designs a novel target loss function, and takes both the semantic information and the similarity information of the samples into account, which remarkably improves the feature characterization capability, realizes accurate classification of HSI and LiDAR through efficient feature extraction and fusion, and provides an effective approach for the joint utilization of multi-source data.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or terminal that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or terminal. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of additional like elements in the process, method, article, or terminal that comprises the element.
Of course, it will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by instructing relevant hardware (such as a processor, a controller, etc.) through a computer program, and the program can be stored in a computer readable storage medium, and when executed, the program can include the processes of the embodiments of the methods described above. The computer readable storage medium may be a memory, a magnetic disk, an optical disk, etc.
It is to be understood that the invention is not limited to the examples described above, but that modifications and variations may be effected thereto by those of ordinary skill in the art in light of the foregoing description, and that all such modifications and variations are intended to be within the scope of the invention as defined by the appended claims.

Claims (4)

1. An attention-based data classification optimization method is characterized by comprising the following steps:
dividing all labeled pixels into a training set and a test set, and respectively acquiring the true label data of the training set and the test set;
embedding an attention mechanism into a convolutional neural network, and constructing a multi-source data feature extraction and fusion network based on the attention mechanism;
acquiring training data fusing sample semantic information and similarity information, and performing supervised training of the multi-source data feature extraction and fusion network;
inputting a sample to be tested into the trained multi-source data feature extraction and fusion network, and outputting a final classification label according to a decision-level fusion result;
the method for dividing all the labeled pixels into a training set and a test set and respectively obtaining the real label data of the training set and the test set comprises the following steps:
if it is
Figure QLYQS_1
And &>
Figure QLYQS_2
Respectively representing marker pixel sets in the HSI and LiDAR point cloud depth images;
wherein the content of the first and second substances,
Figure QLYQS_3
and &>
Figure QLYQS_4
Respectively denote a fifth->
Figure QLYQS_5
An HSI pixel and a ^ th ^ or ^ th ^>
Figure QLYQS_6
A LiDAR pixel; />
Figure QLYQS_7
Is the total number of the marker pixel sets,
Figure QLYQS_8
is the number of HSI spectral bands;
the real tag data is expressed as
Figure QLYQS_9
Wherein, the first and the second end of the pipe are connected with each other,
Figure QLYQS_10
indicates the fifth->
Figure QLYQS_11
A true label of an individual pixel, based on the number of pixels in the image>
Figure QLYQS_12
Representing a total number of categories;
the dividing all the labeled pixels into a training set and a test set, and respectively obtaining real label data of the training set and the test set specifically includes:
forming a sample pair by using pixels at the same coordinate position in the HSI point cloud depth image and the LiDAR point cloud depth image, and dividing all marked pixels into a training set and a test set according to a predefined data division criterion;
Figure QLYQS_13
and &>
Figure QLYQS_14
The training set and the test set are represented separately,
Figure QLYQS_15
and &>
Figure QLYQS_16
Real label data representing a training set and a test set, respectively, wherein ` is `>
Figure QLYQS_17
And &>
Figure QLYQS_18
Represents the number of training samples and the number of test samples, respectively, and satisfies->
Figure QLYQS_19
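As a non-authoritative illustration of this pairing-and-splitting step, a Python sketch follows; the array names, split ratio, and random seed are assumptions made for the sketch, not values fixed by the claim.

```python
# Illustrative sketch: pair co-located HSI/LiDAR pixels, then split them.
import numpy as np

def split_labeled_pixels(hsi, lidar, labels, coords, train_ratio=0.1, seed=0):
    """hsi: (H, W, d) cube; lidar: (H, W) depth image;
    labels: (N,) true labels; coords: (N, 2) row/col of each marked pixel."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(coords))
    n_tr = int(train_ratio * len(coords))
    tr, te = order[:n_tr], order[n_tr:]
    # A sample pair couples the two modalities at the same coordinate.
    pairs = [(hsi[r, c], lidar[r, c]) for r, c in coords]
    X_tr = [pairs[i] for i in tr]
    X_te = [pairs[i] for i in te]
    return X_tr, labels[tr], X_te, labels[te]
```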
the multi-source data feature extraction and fusion network comprises: a data preprocessing module, a residual-attention-mechanism-based feature extraction module, an attention-mechanism-based feature fusion module, and a decision-level fusion classification module;
the data preprocessing module is configured to:
taking the marked pixels $x_i^h$ and $x_i^l$ as centers, intercept image blocks of a preset size from the HSI and the LiDAR point cloud depth image, respectively, and construct sample pairs $\{(P_i^h, P_i^l)\}_{i=1}^{N}$, where $P_i^h \in \mathbb{R}^{s \times s \times d}$ is a hyperspectral image block, $P_i^l \in \mathbb{R}^{s \times s}$ is a LiDAR point cloud depth image block, and $s$ is the image block size;
perform convolution operations on $P_i^h$ and $P_i^l$ with two different convolutional layers $W_0^h$ and $W_0^l$, respectively, so that the data dimensions of the two become equal; the preprocessed data are expressed as:
$\bar{P}_i^h = W_0^h * P_i^h, \qquad \bar{P}_i^l = W_0^l * P_i^l$
where $\bar{P}_i^h$ and $\bar{P}_i^l$ denote the preprocessed hyperspectral image block and the preprocessed LiDAR point cloud depth image block, respectively; the convolution kernels of $W_0^h$ and $W_0^l$ have sizes $k \times k \times d \times c$ and $k \times k \times 1 \times c$, respectively, where $k$ is the spatial size of the convolution kernel and $c$ is the number of output channels of the convolution kernel;
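A minimal PyTorch sketch of this preprocessing step might look as follows; the band count d, channel count c, kernel size k, and block size are illustrative assumptions, not values specified by the claim.

```python
# Two modality-specific convolutions map HSI and LiDAR blocks to equal dimensions.
import torch
import torch.nn as nn

d, c, k = 144, 32, 3                      # HSI bands, shared channels, kernel size (assumed)
conv_h = nn.Conv2d(d, c, kernel_size=k, padding=k // 2)  # preprocesses HSI blocks
conv_l = nn.Conv2d(1, c, kernel_size=k, padding=k // 2)  # preprocesses LiDAR depth blocks

P_h = torch.randn(8, d, 11, 11)           # a batch of hyperspectral image blocks
P_l = torch.randn(8, 1, 11, 11)           # a batch of LiDAR depth image blocks
assert conv_h(P_h).shape == conv_l(P_l).shape  # data dimensions now match
```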
the residual-attention mechanism-based feature extraction module is configured to:
if it is
Figure QLYQS_40
Is an input to a residual module, the output is expressed as ≥ v>
Figure QLYQS_41
Wherein, the first and the second end of the pipe are connected with each other,
Figure QLYQS_42
network functions being two convolutional layers, i.e.
Figure QLYQS_43
Wherein the content of the first and second substances,
Figure QLYQS_44
and &>
Figure QLYQS_45
For convolution kernel, <' > based on>
Figure QLYQS_46
And &>
Figure QLYQS_47
For a bias vector>
Figure QLYQS_48
Represents a convolution operation, <' > or>
Figure QLYQS_49
Representing a ReLU activation function;
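The residual function above is a standard two-convolution block with an identity shortcut; a hedged PyTorch sketch, with the channel count as an assumption:

```python
# Two-convolution residual block implementing y = F(x) + x.
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    def __init__(self, c: int):
        super().__init__()
        self.conv1 = nn.Conv2d(c, c, 3, padding=1)
        self.conv2 = nn.Conv2d(c, c, 3, padding=1)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        # F(x) = W2 * relu(W1 * x + b1) + b2, then the identity shortcut.
        return self.conv2(self.relu(self.conv1(x))) + x
```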
let the input of the multi-scale channel attention module be $X \in \mathbb{R}^{s \times s \times C}$; the extracted global feature $g(X)$ is expressed as:
$g(X) = \mathcal{B}\big(W_2 \cdot \delta\big(\mathcal{B}\big(W_1 \cdot \mathrm{GAP}(X)\big)\big)\big)$
where $\mathrm{GAP}(\cdot)$ denotes the global average pooling operation, $\mathcal{B}(\cdot)$ denotes batch normalization, $W_1$ and $W_2$ denote the dimension-reducing layer and the dimension-increasing layer, respectively, with convolution kernel sizes $\frac{C}{r} \times C \times 1 \times 1$ and $C \times \frac{C}{r} \times 1 \times 1$, $r$ is the channel reduction factor, and $C$ is the number of feature channels of the input $X$;
the local feature $L(X)$ is expressed as:
$L(X) = \mathcal{B}\big(W_2' \cdot \delta\big(\mathcal{B}\big(W_1' \cdot X\big)\big)\big)$
where $W_1'$ and $W_2'$ denote the two point-by-point convolution operations in the local feature extraction process, with convolution kernel sizes $\frac{C}{r} \times C \times 1 \times 1$ and $C \times \frac{C}{r} \times 1 \times 1$, respectively; the local feature $L(X)$ has the same size as the input $X$;
the output feature of the multi-scale channel attention module is expressed as:
$X' = X \otimes M(X) = X \otimes \sigma\big(L(X) \oplus g(X)\big)$
where $M(X)$ denotes the attention weight, $\otimes$ denotes the element-by-element multiplication operation, $\oplus$ denotes broadcast addition, and $\sigma(\cdot)$ denotes the sigmoid activation function;
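Assuming the MS-CAM-style layout described above, one possible PyTorch rendering of the module is sketched below; the reduction factor r and the exact placement of batch normalization are assumptions for the sketch.

```python
# Multi-scale channel attention: X' = X (x) sigmoid(L(X) (+) g(X)).
import torch
import torch.nn as nn

class MSChannelAttention(nn.Module):
    def __init__(self, channels: int, r: int = 4):
        super().__init__()
        mid = channels // r
        self.global_branch = nn.Sequential(   # g(X): GAP -> reduce -> expand
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, mid, 1), nn.BatchNorm2d(mid), nn.ReLU(inplace=True),
            nn.Conv2d(mid, channels, 1), nn.BatchNorm2d(channels),
        )
        self.local_branch = nn.Sequential(    # L(X): point-wise convolutions
            nn.Conv2d(channels, mid, 1), nn.BatchNorm2d(mid), nn.ReLU(inplace=True),
            nn.Conv2d(mid, channels, 1), nn.BatchNorm2d(channels),
        )

    def forward(self, x):
        # Broadcast addition: g(x) is 1x1 in space, L(x) keeps the input size.
        m = torch.sigmoid(self.local_branch(x) + self.global_branch(x))
        return x * m
```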
the extracted HSI and LiDAR images are processed by a plurality of residual error-attention mechanism modulesFeatures are respectively noted as
Figure QLYQS_75
And &>
Figure QLYQS_76
the attention-mechanism-based feature fusion module is configured to:
perform a global pooling operation on the extracted HSI image feature $F^h$ and LiDAR image feature $F^l$, respectively, and generate the corresponding semantic features $f^h$ and $f^l$ through vector stretching and fully connected layer processing; two feature-level fusion strategies are adopted to exploit the complementary information between the HSI and LiDAR data;
the first fusion strategy is addition-based feature fusion, which directly adds $F^h$ and $F^l$ to obtain the fused semantic feature $f^{add}$;
the second fusion strategy is attention-mechanism-based feature fusion, which fuses $F^h$ and $F^l$ with an attentional feature fusion module and generates the fused semantic feature $f^{att}$ through vector stretching and fully connected layer processing; after a summation operation, the features to be fused are input into the multi-scale channel attention module to generate the attention-based fusion weight, expressed as:
$Z = M(X + Y) \otimes X + \big(1 - M(X + Y)\big) \otimes Y$
where $Z$ denotes the fused feature, $M$ denotes the fusion weight, and $X$ and $Y$ denote the two features to be fused;
after the above processing, four semantic features are formed, namely the two single-source semantic features $f^h$ and $f^l$ and the two fused semantic features $f^{add}$ and $f^{att}$;
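A self-contained sketch of the attentional fusion step follows, under the assumption that the fusion weight M is produced by the same global/local bottleneck structure as the multi-scale channel attention module above:

```python
# Attentional feature fusion: Z = M(X+Y)*X + (1 - M(X+Y))*Y.
import torch
import torch.nn as nn

class AFF(nn.Module):
    def __init__(self, channels: int, r: int = 4):
        super().__init__()
        mid = channels // r
        def bottleneck() -> nn.Sequential:
            return nn.Sequential(
                nn.Conv2d(channels, mid, 1), nn.BatchNorm2d(mid), nn.ReLU(inplace=True),
                nn.Conv2d(mid, channels, 1), nn.BatchNorm2d(channels),
            )
        self.local = bottleneck()
        self.glob = nn.Sequential(nn.AdaptiveAvgPool2d(1), bottleneck())

    def forward(self, x, y):
        s = x + y                                        # summation of the inputs
        m = torch.sigmoid(self.local(s) + self.glob(s))  # fusion weight M
        return m * x + (1.0 - m) * y                     # Z = M*X + (1 - M)*Y
```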
the decision-level fusion classification module is configured to:
input the single-source semantic features $f^h$ and $f^l$ and the fused semantic features $f^{add}$ and $f^{att}$ into four different classifiers, respectively, to obtain four classification prediction results $p^h$, $p^l$, $p^{add}$ and $p^{att}$;
adopt a decision-level fusion strategy to integrate the four classification results, the final classification result being expressed as:
$\hat{y} = \arg\max_{k}\big(p_k^h + p_k^l + p_k^{add} + p_k^{att}\big)$
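Read as summing softmax probabilities before the argmax, the decision-level fusion could be sketched as follows; the feature width and class count are placeholder assumptions.

```python
# Four classifier heads vote by summing their softmax probabilities.
import torch
import torch.nn as nn

feat_dim, num_classes = 64, 10                     # placeholder sizes (assumed)
heads = nn.ModuleList(nn.Linear(feat_dim, num_classes) for _ in range(4))

def decision_fusion(features):
    """features: four (B, feat_dim) tensors, i.e. f_h, f_l, f_add, f_att."""
    probs = [torch.softmax(head(f), dim=1) for head, f in zip(heads, features)]
    return torch.stack(probs).sum(dim=0).argmax(dim=1)   # fused class labels
```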
the method comprises the following steps of acquiring training data fusing sample semantic information and similar information, wherein the supervised training of the multi-source data feature extraction and fusion network specifically comprises the following steps:
designing a loss function fusing sample semantic information and similar information, and solving network parameters of the multi-source data feature extraction and fusion network by adopting a gradient descent method;
the method comprises the following steps of designing a loss function fusing semantic information and similar information of a sample, and solving network parameters of the multi-source data feature extraction and fusion network by adopting a gradient descent method, wherein the method specifically comprises the following steps:
adopting metric learning based on depth hash to restrict the similarity between the image block sample pairs;
the extracted semantic features are binarized into hash codes, and corresponding hash code matrixes are obtained:
Figure QLYQS_100
Figure QLYQS_101
Figure QLYQS_102
wherein the content of the first and second substances,
Figure QLYQS_103
、/>
Figure QLYQS_104
and &>
Figure QLYQS_105
Hash code matrices representing HSI, liDAR and HSI-LiDAR, respectively, < >>
Figure QLYQS_106
And &>
Figure QLYQS_107
Respectively denote a fifth->
Figure QLYQS_108
Hash codes for individual HSI and LiDAR pixels;
defining any sample pair
Figure QLYQS_109
In a degree of similarity variable>
Figure QLYQS_110
If the two category labels are the same, then->
Figure QLYQS_111
Otherwise, the value is 0;
and (3) calculating the negative log-likelihood of the sample to the label to obtain the similarity loss between the single-source sample and the cross-source sample:
Figure QLYQS_112
wherein the content of the first and second substances,
Figure QLYQS_113
; />
Figure QLYQS_114
wherein, the first and the second end of the pipe are connected with each other,
Figure QLYQS_115
representing a sigmoid activation function;
approximating the discrete hash code by adopting the semantic features of continuous variables, wherein the quantization loss generated by the serialization is expressed as follows:
Figure QLYQS_116
on the basis of the extracted semantic features, measuring the semantic loss of each sample by adopting a cross entropy loss function:
Figure QLYQS_117
wherein the content of the first and second substances,
Figure QLYQS_118
,/>
Figure QLYQS_119
representing the classification result predicted by the classifier;
by jointly minimizing the above three loss functions, the objective function is expressed as follows:
Figure QLYQS_120
wherein the content of the first and second substances,
Figure QLYQS_121
、/>
Figure QLYQS_122
、/>
Figure QLYQS_123
being a hyper-parameter, for balancing the weights of different types of losses;
and solving the objective function by adopting a gradient descent algorithm, and obtaining appropriate network parameters through continuous updating and iteration.
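Since the loss formulas are rendered as images in the source, the sketch below uses standard stand-ins: a DPSH-style pairwise negative log-likelihood, an L2 quantization term, and cross-entropy, combined with assumed weights.

```python
# Hedged sketch of the joint objective; the exact forms are assumptions.
import torch
import torch.nn.functional as F

def similarity_loss(feats, sim):
    """feats: (N, q) continuous codes; sim: (N, N) matrix with s_ij in {0, 1}."""
    theta = 0.5 * feats @ feats.t()
    # -(s_ij * theta_ij - log(1 + exp(theta_ij))), via softplus for stability.
    return (F.softplus(theta) - sim * theta).mean()

def joint_loss(feats, sim, logits, labels, lam=(1.0, 0.1, 1.0)):
    b = torch.sign(feats).detach()              # discrete hash codes B
    l_sim = similarity_loss(feats, sim)         # L1: pairwise similarity loss
    l_quant = F.mse_loss(feats, b)              # L2: quantization loss
    l_sem = F.cross_entropy(logits, labels)     # L3: semantic (cross-entropy) loss
    return lam[0] * l_sim + lam[1] * l_quant + lam[2] * l_sem
```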
2. The attention-mechanism-based data classification optimization method of claim 1, wherein the inputting of the sample to be tested into the trained multi-source data feature extraction and fusion network and the outputting of the final classification label according to the decision-level fusion result specifically comprises:
for any test sample pair $(P_i^h, P_i^l)$, inputting the pair into the trained multi-source data feature extraction and fusion network;
extracting the four semantic features $f^h$, $f^l$, $f^{add}$ and $f^{att}$ through the feedforward operation of the multi-source data feature extraction and fusion network;
inputting the four semantic features $f^h$, $f^l$, $f^{add}$ and $f^{att}$ into the different classifiers, respectively, to obtain the respective classification results;
integrating the four classification results by decision-level fusion to obtain the final classification result:
$\hat{y} = \arg\max_{k}\big(p_k^h + p_k^l + p_k^{add} + p_k^{att}\big)$
wherein the classifiers adopt the softmax function.
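Putting claim 2 together as a test-time routine, under the assumption that `net` returns the four semantic features and `heads` are the four trained classifiers from the earlier sketch:

```python
# End-to-end inference sketch for one batch of test sample pairs.
import torch

@torch.no_grad()
def predict(net, heads, P_h, P_l):
    """net: trained fusion network returning (f_h, f_l, f_add, f_att);
    heads: four trained softmax classifiers; P_h, P_l: test image blocks."""
    feats = net(P_h, P_l)                       # feedforward feature extraction
    probs = [torch.softmax(h(f), dim=1) for h, f in zip(heads, feats)]
    return torch.stack(probs).sum(dim=0).argmax(dim=1)  # decision-level fusion
```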
3. A terminal, characterized in that the terminal comprises: a memory, a processor, and an attention-mechanism-based data classification optimization program stored on the memory and executable on the processor, the program, when executed by the processor, implementing the steps of the attention-mechanism-based data classification optimization method of any one of claims 1-2.
4. A computer-readable storage medium, characterized in that the computer-readable storage medium stores an attention-mechanism-based data classification optimization program which, when executed by a processor, implements the steps of the attention-mechanism-based data classification optimization method of any one of claims 1-2.
CN202211550245.2A 2022-12-05 2022-12-05 Attention mechanism-based data classification optimization method and related equipment Active CN115546569B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211550245.2A CN115546569B (en) 2022-12-05 2022-12-05 Attention mechanism-based data classification optimization method and related equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211550245.2A CN115546569B (en) 2022-12-05 2022-12-05 Attention mechanism-based data classification optimization method and related equipment

Publications (2)

Publication Number Publication Date
CN115546569A CN115546569A (en) 2022-12-30
CN115546569B true CN115546569B (en) 2023-04-07

Family

ID=84722227

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211550245.2A Active CN115546569B (en) 2022-12-05 2022-12-05 Attention mechanism-based data classification optimization method and related equipment

Country Status (1)

Country Link
CN (1) CN115546569B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116894972B (en) * 2023-06-25 2024-02-13 耕宇牧星(北京)空间科技有限公司 Wetland information classification method and system integrating airborne camera image and SAR image

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2022073452A1 (en) * 2020-10-07 2022-04-14 武汉大学 Hyperspectral remote sensing image classification method based on self-attention context network

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109993220B (en) * 2019-03-23 2022-12-06 西安电子科技大学 Multi-source remote sensing image classification method based on double-path attention fusion neural network
CN113435253B (en) * 2021-05-31 2022-12-02 西安电子科技大学 Multi-source image combined urban area ground surface coverage classification method
CN114708455A (en) * 2022-03-24 2022-07-05 中国人民解放军战略支援部队信息工程大学 Hyperspectral image and LiDAR data collaborative classification method

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2022073452A1 (en) * 2020-10-07 2022-04-14 武汉大学 Hyperspectral remote sensing image classification method based on self-attention context network

Also Published As

Publication number Publication date
CN115546569A (en) 2022-12-30

Similar Documents

Publication Publication Date Title
CN110321963B (en) Hyperspectral image classification method based on fusion of multi-scale and multi-dimensional space spectrum features
Shabbir et al. Satellite and scene image classification based on transfer learning and fine tuning of ResNet50
Delibasoglu et al. Improved U-Nets with inception blocks for building detection
Li et al. Toward in situ zooplankton detection with a densely connected YOLOV3 model
Zhang et al. Semantic segmentation of very high-resolution remote sensing image based on multiple band combinations and patchwise scene analysis
Zhou et al. Surveillance of pine wilt disease by high resolution satellite
CN115546569B (en) Attention mechanism-based data classification optimization method and related equipment
Huang et al. Attention-guided label refinement network for semantic segmentation of very high resolution aerial orthoimages
CN116524189A (en) High-resolution remote sensing image semantic segmentation method based on coding and decoding indexing edge characterization
Abbas et al. Deep neural networks for automatic flower species localization and recognition
Sjahputera et al. Clustering of detected changes in high-resolution satellite imagery using a stabilized competitive agglomeration algorithm
Ps et al. Building footprint extraction from very high-resolution satellite images using deep learning
Cheng et al. Multi-scale Feature Fusion and Transformer Network for urban green space segmentation from high-resolution remote sensing images
Li Segment Any Building
Song et al. Multi-source remote sensing image classification based on two-channel densely connected convolutional networks.
CN116630700A (en) Remote sensing image classification method based on introduction channel-space attention mechanism
İsa Performance Evaluation of Jaccard-Dice Coefficient on Building Segmentation from High Resolution Satellite Images
Alshammari et al. An efficient deep learning mechanism for the recognition of olive trees in Jouf Region
Wu et al. Research on asphalt pavement disease detection based on improved YOLOv5s
Sivagami et al. Analysis of encoder-decoder based deep learning architectures for semantic segmentation in remote sensing images
Moody et al. Land cover classification in multispectral satellite imagery using sparse approximations on learned dictionaries
Yuan et al. Hyperspectral image classification using residual 2d and 3d convolutional neural network joint attention model
He et al. Tackling the over-smoothing problem of CNN-based hyperspectral image classification
Yifter et al. Deep transfer learning of satellite imagery for land use and land cover classification
Subhashini et al. A hybrid optimal technique for road extraction using entropy rate super-pixel segmentation and probabilistic neural networks

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant