CN115546569B

CN115546569B - Attention mechanism-based data classification optimization method and related equipment

Info

Publication number: CN115546569B
Application number: CN202211550245.2A
Authority: CN
Inventors: 宋伟伟; 莫继学; 戴勇
Original assignee: Peng Cheng Laboratory
Current assignee: Peng Cheng Laboratory
Priority date: 2022-12-05
Filing date: 2022-12-05
Publication date: 2023-04-07
Anticipated expiration: 2042-12-05
Also published as: CN115546569A

Abstract

The invention discloses a data classification optimization method based on an attention mechanism and related equipment, wherein the method comprises the following steps: dividing all the marked pixels into a training set and a testing set, and respectively acquiring real label data of the training set and the testing set; embedding an attention mechanism into a convolutional neural network, and constructing a multi-source data feature extraction and fusion network based on the attention mechanism; acquiring training data fused with sample semantic information and similar information, and performing supervised training on the multi-source data feature extraction and fusion network; and inputting a sample to be tested into the trained multi-source data feature extraction and fusion network, and outputting a final classification label according to a decision-level fusion result. According to the method, a feature extraction and fusion framework based on an attention mechanism is constructed, semantic information and similar information of a sample are considered, the feature characterization capability is remarkably improved, and the HSI and the LiDAR are accurately classified through efficient feature extraction and fusion.

Description

Attention mechanism-based data classification optimization method and related equipment

Technical Field

The invention relates to the technical field of multi-source data fusion classification, in particular to a data classification optimization method based on an attention mechanism, a terminal and a computer readable storage medium.

Background

With the rapid development of earth observation technology, different types of sensors have been developed to obtain multi-source information of ground objects (ground objects). For example, multispectral and hyperspectral cameras can acquire spectral attributes of the ground features, liDAR (laser Detection and Ranging) sensors can directly acquire three-dimensional spatial information of the ground features, and Synthetic Aperture Radar (SAR) sensors can acquire amplitude and phase information.

Although these types of sensors play an important role in remote sensing earth observation and surface feature classification applications, there are drawbacks to using only a single sensor. For example, hyper Spectral Images (HSI) have rich Spectral information, and can identify different material attributes, but it is difficult to distinguish land features with similar spectra and different elevation information (e.g., grasslands and trees, parking lots and building roofs, roads and viaducts cannot be effectively distinguished only by HSI Spectral information); on the other hand, liDAR data can directly classify features of different elevations by using height information, but cannot distinguish features of the same height but different spectra (for example, asphalt and concrete, iron skin tiles and glazed tiles, trees and pseudo-tree signal towers and the like cannot be effectively distinguished by using LiDAR point cloud only). Therefore, any single sensor data cannot comprehensively capture real and accurate ground feature information, and the requirement for reliable remote sensing ground feature classification is difficult to meet. By combining LiDAR point cloud and HSI, the advantages of different types of data are fully utilized for complementation, and the method is a key technical means for realizing fine classification of remote sensing images.

Currently, liDAR point cloud and HSI fusion classification methods can be classified into the following categories: a fusion classification method based on feature stack, a fusion classification method based on low-dimensional subspace, a fusion classification method based on kernel transformation, and a fusion classification method based on deep learning.

Among them, feature stacking is the simplest and easiest to implement feature fusion method, however, a simple concatenation or stacking method may cause fused features to contain a large amount of redundant information, and due to limited labeled samples, the fusion method usually faces the problem of "dimension disaster", resulting in limited classification accuracy; the fusion method based on the low-dimensional subspace can effectively avoid dimension disasters generated in the classification process by decomposing high-dimensional hyperspectral data into the low-dimensional spectrum subspace and coefficients, and improve the calculation efficiency, however, the method needs to solve a complex decomposition model, and the performance of the method is greatly influenced by the coefficients obtained by the solution; the fusion method based on kernel transformation maps linear inseparable data in an original space into a high-dimensional space to enable the linear inseparable data to become linearly separable, and is widely used for LiDAR point cloud and HSI fusion classification research, however, the method needs manual kernel function selection and cannot guarantee that the performance of the selected kernel function is optimal in all scenes; the method based on deep learning is a current mainstream method, the method extracts high-representation semantic features by constructing a deep neural network and realizes deep fusion of the high-representation semantic features and the HSI and LiDAR point cloud features by fusing full-connection layers, however, the method based on deep learning needs a large number of label samples to perform model training, generally calibrated hyperspectral pixels are very limited, and the application of the deep learning method in the hyperspectral field is limited to a certain extent.

Although several exploratory works have been carried out for the HSI and LiDAR point cloud multi-source data fusion classification problem, better ground object classification results are obtained. However, since the complexity of the spatial structure of the remote sensing data is high, the heterogeneity between the HSI and the LiDAR point cloud is strong, the feature characterization capability obtained by the current multi-source data feature extraction and fusion method is still insufficient, and the requirement of high-precision classification of the local object is difficult to meet.

Accordingly, the prior art is yet to be improved and developed.

Disclosure of Invention

The invention mainly aims to provide a data classification optimization method based on an attention mechanism, a terminal and a computer readable storage medium, and aims to solve the problems that the feature characterization capability obtained by a multi-source data feature extraction and fusion method in the prior art is insufficient, and the requirement of high-precision classification of the current ground features is difficult to meet.

In order to achieve the above object, the present invention provides an attention-based data classification optimization method, which includes the following steps:

dividing all the marked pixels into a training set and a testing set, and respectively acquiring real label data of the training set and the testing set;

embedding an attention mechanism into a convolutional neural network, and constructing a multi-source data feature extraction and fusion network based on the attention mechanism;

acquiring training data fused with sample semantic information and similar information, and performing supervised training on the multi-source data feature extraction and fusion network;

and inputting the sample to be tested into the trained multi-source data feature extraction and fusion network, and outputting a final classification label according to a decision-level fusion result.

Optionally, the method for data classification optimization based on an attention mechanism, wherein the dividing all labeled pixels into a training set and a test set and respectively acquiring true label data of the training set and the test set, further includes:

if it is

And

respectively representing marker pixel sets in the HSI and LiDAR point cloud depth images;

wherein the content of the first and second substances,

and

respectively represent

An HSI pixel and a

A LiDAR pixel;

is the total number of the marker pixel sets,

is the number of HSI spectral bands;

the real tag data is expressed as

；

Wherein the content of the first and second substances,

is shown as

The true label of a single pixel of the image,

representing the total number of categories.

Optionally, the method for data classification and optimization based on an attention mechanism, where the dividing all labeled pixels into a training set and a test set and respectively obtaining real label data of the training set and the test set specifically includes:

forming a sample pair by using pixels at the same coordinate position in the HSI point cloud depth image and the LiDAR point cloud depth image, and dividing all marked pixels into a training set and a test set according to a predefined data division criterion;

and

the training set and the test set are represented separately,

and

the true label data representing the training set and the test set, respectively, wherein,

and

respectively represent the number of training samples and the number of test samples, and satisfy

。

Optionally, the data classification optimization method based on the attention mechanism, wherein the multi-source data feature extraction and fusion network includes: the system comprises a data preprocessing module, a residual error-attention mechanism-based feature extraction module, an attention mechanism-based feature fusion module and a decision-level fusion classification module.

Optionally, the method for optimizing data classification based on attention mechanism includes:

to mark pixels

And

as a center, image blocks with preset sizes are respectively intercepted on the HSI point cloud depth image and the LiDAR point cloud depth image, and an image pair sample is constructed

Wherein, in the step (A),

is a hyperspectral image block,

is a block of depth images of a point cloud of LiDAR,

is the image block size;

using two different convolution layers in respective pairs

And

performing a convolution operation such that

And

the dimensions of the data are equal, and the preprocessed data are expressed as follows:

；

；

wherein the content of the first and second substances,

and

respectively representing a preprocessed hyperspectral image block and a preprocessed LiDAR point cloud depth image block;

convolutional layer

And

respectively, of a convolution kernel size of

And

wherein, in the process,

is the spatial size of the convolution kernel,

the number of output channels of the convolution kernel.

Optionally, the method for data classification optimization based on an attention mechanism, wherein the residual-attention mechanism-based feature extraction module is configured to:

if it is

For a residual block input, the output is expressed as

；

Wherein the content of the first and second substances,

network functions being two convolutional layers, i.e.

；

Wherein the content of the first and second substances,

and

in the form of a convolution kernel, the kernel is,

and

in order to be a vector of the offset,

which represents a convolution operation, the operation of the convolution,

representing a ReLU activation function;

if the input of the multi-scale channel attention module is

Extracted global features

Expressed as:

；

wherein the content of the first and second substances,

a global average pooling operation is represented as,

which represents the regularization of the batch,

and

the dimension reduction layer and the dimension increase layer are respectively represented,

in order to reduce the factor for the channel,

presentation input

The number of characteristic channels of (a);

local features

Expressed as:

；

wherein the content of the first and second substances,

and

representing two point-by-point convolution operations in the local feature extraction process,

and

respectively having a convolution kernel size of

And

(ii) a Local feature

And input

The sizes are the same;

the output characteristics of the multi-scale channel attention module are expressed as:

；

wherein, the first and the second end of the pipe are connected with each other,

the weight of attention is represented by the weight of attention,

which represents an element-by-element multiplication operation,

it means that the broadcast addition method is performed,

representing a sigmoid activation function;

after being processed by a plurality of residual error-attention mechanism modules, extracted HSI and LiDAR image characteristics are respectively recorded as

And

。

optionally, the method for data classification optimization based on an attention mechanism includes:

for the extracted HIS image characteristics

And LiDAR image features

Respectively performing global pooling operation, and respectively generating corresponding semantic features through vector stretching and full-connection layer processing

And

；

two feature level fusion strategies are employed to take advantage of complementary information between HSI and LiDAR data;

wherein the first fusion strategy is feature fusion based on addition, and is directly opposite to

And

adding the two to obtain the semantic features after the fusion of the two

；

Wherein, the second fusion strategy is feature fusion based on attention mechanism, and adopts an attention feature fusion module pair

And

fusing, vector stretching and full-connection layer processing to generate fused semantic features

(ii) a After the features to be fused are subjected to summation operation, the features are input into a multi-scale channel attention module to generate attention-based fusion weights, which are expressed as follows:

；

wherein the content of the first and second substances,

represents the fused features, M represents the fusion weights,

and

representing two features to be fused;

after being processed, four semantic features are formed jointly, including two single-source data semantic features

And

and two fused semantic features

And

。

semantic features of single source data

And

and fused semantic features

And

respectively inputting the four classification prediction results into different classifiers to obtain four classification prediction results;

and optimizing the four classification results by adopting a decision-level fusion strategy, wherein the final classification result is expressed as:

optionally, the attention-based data classification optimization method includes acquiring training data with sample semantic information and similar information fused, where the supervised training of the multi-source data feature extraction and fusion network specifically includes:

and designing a loss function fusing sample semantic information and similar information, and solving network parameters of the multi-source data feature extraction and fusion network by adopting a gradient descent method.

Optionally, the data classification optimization method based on the attention mechanism, where the designing of a loss function fusing sample semantic information and similar information and solving of network parameters of the multi-source data feature extraction and fusion network by using a gradient descent method specifically includes:

constraining similarity between pairs of image block samples using depth hash-based metric learning;

the extracted semantic features are binarized into hash codes, and corresponding hash code matrixes are obtained:

；

；

；

、

and

hash code matrices representing HSI, liDAR and HSI-LiDAR respectively,

and

respectively represent

Hash codes for individual HSI and LiDAR pixels;

defining any sample pair

Of similarity variable

If the two category labels are the same, then

Otherwise, the value is 0;

and (3) calculating the negative log-likelihood of the sample to the label to obtain the similarity loss between the single-source sample and the cross-source sample:

；

wherein the content of the first and second substances,

；

；

wherein the content of the first and second substances,

representing a sigmoid activation function;

adopting semantic features of continuous variables to approximate the discrete hash code, wherein the quantization loss generated by the serialization is expressed as follows:

；

on the basis of the extracted semantic features, measuring the semantic loss of each sample by adopting a cross entropy loss function:

；

wherein the content of the first and second substances,

，

representing the classification result predicted by the classifier;

by jointly minimizing the above three loss functions, the objective function is expressed as follows:

；

wherein the content of the first and second substances,

、

、

being a hyper-parameter, for balancing the weights of different types of losses;

and solving the objective function by adopting a gradient descent algorithm, and obtaining appropriate network parameters through continuous updating and iteration.

Optionally, the method for data classification optimization based on an attention mechanism, where the inputting a sample to be tested into the trained multi-source data feature extraction and fusion network, and outputting a final classification label according to a decision-level fusion result specifically includes:

for any test sample pair

Will be

Inputting the data into the trained multi-source data feature extraction and fusion network;

extracting four semantic features by the multi-source data feature extraction and the feedforward operation of the fusion network

、

、

And

；

combining four semantic features

、

、

And

respectively inputting the data into different classifiers to obtain respective classification results;

integrating the four classification results by adopting decision-level fusion to obtain a final classification result:

wherein the classifier adopts a softmax function.

In addition, to achieve the above object, the present invention further provides a terminal, wherein the terminal includes: the data classification optimization method comprises the following steps of a memory, a processor and an attention-based data classification optimization program stored on the memory and capable of running on the processor, wherein when executed by the processor, the attention-based data classification optimization program realizes the steps of the attention-based data classification optimization method.

In addition, to achieve the above object, the present invention further provides a computer readable storage medium, wherein the computer readable storage medium stores an attention-based data classification optimization program, and the attention-based data classification optimization program, when executed by a processor, implements the steps of the attention-based data classification optimization method as described above.

In the invention, all the marked pixels are divided into a training set and a testing set, and real label data of the training set and the testing set are respectively obtained; embedding an attention mechanism into a convolutional neural network, and constructing a multi-source data feature extraction and fusion network based on the attention mechanism; acquiring training data fused with sample semantic information and similar information, and performing supervised training on the multi-source data feature extraction and fusion network; and inputting the sample to be tested into the trained multi-source data feature extraction and fusion network, and outputting a final classification label according to a decision-level fusion result. According to the method, a feature extraction and fusion framework based on an attention mechanism is constructed, a novel target loss function is designed, semantic information and similar information of a sample are considered, the feature characterization capability is remarkably improved, and the accurate classification of HSI and LiDAR is realized through efficient feature extraction and fusion.

Drawings

FIG. 1 is a flow chart of a preferred embodiment of the data classification optimization method based on attention mechanism of the present invention;

FIG. 2 is a block diagram of a multi-source data feature extraction and fusion network according to a preferred embodiment of the data classification optimization method based on attention mechanism;

FIG. 3 is a schematic diagram of the data preprocessing module processing data according to the preferred embodiment of the data classification optimization method based on attention mechanism;

FIG. 4 is a schematic diagram of the feature extraction module based on the residual-attention mechanism for processing data according to the preferred embodiment of the data classification optimization method based on the attention mechanism of the present invention;

FIG. 5 is a schematic diagram of feature extraction using a multi-scale channel attention module MS-CAM according to a preferred embodiment of the data classification optimization method based on the attention mechanism of the present invention;

FIG. 6 is a schematic diagram of the data classification optimization method based on attention mechanism according to the preferred embodiment of the present invention;

FIG. 7 is a schematic diagram of the feature to be fused being input into the MS-CAM module after the summation operation to generate the fusion weight based on attention in the preferred embodiment of the data classification optimization method based on attention mechanism of the present invention;

FIG. 8 is a schematic diagram of a decision-level fusion-based classification module for processing data according to a preferred embodiment of the present invention;

fig. 9 is a schematic operating environment of a terminal according to a preferred embodiment of the present invention.

Detailed Description

In order to make the objects, technical solutions and advantages of the present invention clearer and clearer, the present invention is further described in detail below with reference to the accompanying drawings and examples. It should be understood that the specific embodiments described herein are merely illustrative of the invention and do not limit the invention.

As shown in fig. 1, the data classification optimization method based on attention mechanism according to the preferred embodiment of the present invention includes the following steps:

and S10, dividing all the marked pixels into a training set and a testing set, and respectively acquiring real label data of the training set and the testing set.

In particular, if

And

respectively representing marker pixel sets in the HSI and LiDAR point cloud depth images; wherein the content of the first and second substances,

and

respectively represent the first

An HSI pixel and

a LiDAR pixel;

is the total number of the marker pixel sets,

is the number of HSI spectral bands.

Corresponding genuine tag data is expressed as

(ii) a Wherein the content of the first and second substances,

is shown as

The true label of an individual pixel or pixels,

representing the total number of categories.

Forming a sample pair by pixels at the same coordinate position in the HSI point cloud depth image and the LiDAR point cloud depth image, and dividing all marked pixels into a training set and a test set according to a predefined data division criterion;

and

the training set and the test set are represented separately,

and

and

。

And S20, embedding the attention mechanism into a convolutional neural network, and constructing a multi-source data feature extraction and fusion network based on the attention mechanism.

Specifically, an attention mechanism is embedded into a convolutional neural network, a multi-source data feature extraction and fusion network based on the attention mechanism is constructed, and high-representation semantic features are extracted; as shown in fig. 2, the multi-source data feature extraction and fusion network includes: the system comprises a data preprocessing module, a residual error-attention mechanism-based feature extraction module, an attention mechanism-based feature fusion module and a decision-level fusion classification module.

WhereinFor the data preprocessing module, as shown in fig. 3, the data preprocessing module includes two parts of image fetching and dimension transformation. First, marking the pixels

And

Wherein, in the step (A),

is a hyperspectral image block,

is a block of depth images of a point cloud of LiDAR,

is the image block size; secondly, two different convolution layers are adopted to be respectively paired

And

performing a convolution operation such that

And

；

；

wherein the content of the first and second substances,

and

convolutional layer

And

respectively having a convolution kernel size of

And

wherein, in the process,

is the spatial size of the convolution kernel,

the number of output channels of the convolution kernel.

For the feature extraction module based on the residual error-attention mechanism, as shown in fig. 4, the constructed feature extraction module based on the residual error-attention mechanism adopts a dual-branch structure, and a weight sharing mechanism is adopted between each branch to reduce the number of network parameters. Each branch is in turn made up of a number of residual-attention mechanism blocks (i.e., res-MS-CAM blocks). In the residual learning, the network residual between the plurality of convolutional layers is made to be zero by adopting jump connection, so that the network residual is made to be approximately identical mapping. The network performance of the network is better under a deep structure because the network parameters are not increased in the jump connection and the training of the whole network is optimized.

In the examples of the invention, if

For a residual block input, the output is expressed as

；

Wherein the content of the first and second substances,

network functions being two convolutional layers, i.e.

(ii) a Wherein, the first and the second end of the pipe are connected with each other,

and

in the form of a convolution kernel, the kernel is,

and

in order to be a vector of the offset,

which represents a convolution operation, the operation of the convolution,

indicating the ReLU activation function.

In addition, in order to focus the network on more significant information in the feature extraction process, the implementation of the invention also adopts a Multi-Scale Channel Attention Module (MS-CAM). As shown in fig. 5, the MS-CAM utilizes both global and local features. If the input of the multi-scale channel attention module (MS-CAM) is

Extracted global features

Expressed as:

；

wherein the content of the first and second substances,

a global average pooling operation is represented as,

which represents the regularization of the batch,

and

in order to reduce the factor by which the channel is,

presentation input

The number of characteristic channels of (2).

In addition, local features

Expressed as:

；

wherein

And

and

respectively having a convolution kernel size of

And

(ii) a Thus, local features

And input

The sizes are the same.

Finally, the output characteristics of the multi-scale channel attention module (MS-CAM) are expressed as:

；

wherein the content of the first and second substances,

the weight of attention is represented as a weight of attention,

which represents an element-by-element multiplication operation,

it means that the broadcast addition method is performed,

representing the sigmoid activation function.

After processing by a plurality of residual error-attention mechanism modules (Res-MS-CAM), extracted HSI and LiDAR image characteristics are respectively recorded as

And

。

wherein, for the attention-based feature fusion module, as shown in fig. 6, the extracted HIS image features are processed

And LiDAR image features

Respectively carrying out global pooling (Globavalgpool), and then respectively generating corresponding semantic features through vector stretching (Flatten) and full connection layer (FC) processing

And

。

in addition, the invention also adopts two characteristic level fusion strategies to utilize complementary information between HSI and LiDAR data; wherein the first fusion strategy is an additive-based feature fusion, i.e. direct pair

And

adding to obtain the semantic features after the two are fused

(ii) a The second fusion strategy is basedIn Feature Fusion of the Attention mechanism, an Attention Feature Fusion (AFF) module pair is first adopted

And

fusing, and processing with vector stretching (Flatten) and full connection layer (FC) to generate fused semantic features

(ii) a As shown in fig. 7, after the features to be fused are summed, they are input into the multi-scale channel attention module to generate the attention-based fusion weight, which is expressed as follows:

；

wherein the content of the first and second substances,

represents the fused features, M represents the fusion weights,

and

representing two features to be fused.

Compared with fusion based on addition, AFF simultaneously utilizes local and global characteristics of input characteristics and realizes depth fusion from the same layer to the cross-layer.

Therefore, after the processing of the modules, four semantic features are formed together, including two single-source data semantic features

And

and two fused semantic features

And

。

for the decision-level fusion-based classification module, as shown in fig. 8, the four semantic features are respectively input into different classifiers, that is, the semantic features of the single-source data are input into different classifiers

And

and fused semantic features

And

respectively inputting the four classification prediction results into different classifiers to obtain four classification prediction results; in order to improve the classification result, the implementation of the invention adopts a decision-level fusion strategy to optimize the four classification results, namely the final classification result is expressed as:

and S30, acquiring training data fused with sample semantic information and similar information, and performing supervised training on the multi-source data feature extraction and fusion network.

Specifically, a loss function fusing sample semantic information and similar information is designed, and a gradient descent method is adopted to solve network parameters of the multi-source data feature extraction and fusion network. The sample similarity information refers to the similarity between samples, that is, the feature distance of samples in the same class should be as small as possible, and the feature distance of samples in different classes should be as large as possible. To learn the similarity information between samples, the present embodiment employs metric learning based on depth hash to constrain the similarity between image block sample pairs.

Firstly, further binarizing the extracted semantic features into hash codes to obtain corresponding hash code matrixes:

；

；

；

wherein the content of the first and second substances,

、

and

hash code matrices representing HSI, liDAR and HSI-LiDAR respectively,

and

respectively represent

Hash codes of individual HSI and LiDAR pixels.

In addition, any sample pair is defined

Of similarity variable

If 2 isIf the class labels are the same, then

Otherwise, it is 0.

Based on the above definition, the similarity loss between the single-source and cross-source samples is obtained by calculating the negative log-likelihood of the samples to the label:

wherein the content of the first and second substances,

；

；

wherein the content of the first and second substances,

representing a sigmoid activation function;

due to loss function

There is a discontinuity constraint (i.e., the hash code matrix elements are discrete values) and solving the loss function directly is an NP-hard problem. To this end, embodiments of the present invention employ semantic features of continuous variables (i.e.

) To approximate a discrete hash code (i.e.

) The quantization loss generated by the serialization is expressed as:

；

in addition to inter-sample correlation, each sample has rich semantic information. On the basis of the extracted semantic features, measuring the semantic loss of each sample by adopting a cross entropy loss function:

wherein the content of the first and second substances,

，

representing the classification result predicted by the classifier;

；

wherein the content of the first and second substances,

、

、

being a hyper-parameter, for balancing the weights of different types of losses; i.e., by minimizing the above-mentioned loss function, the predicted class of the network output can be made as close as possible to the true class of the sample,

the embodiment of the invention adopts a gradient descent algorithm to solve the objective function, and obtains appropriate network parameters through continuous updating iteration.

And S40, inputting a sample to be tested into the trained multi-source data feature extraction and fusion network, and outputting a final classification label according to a decision-level fusion result.

In particular, for any one test sample pair

Will be

Inputting the data into the trained multi-source data feature extraction and fusion network; extracting four semantic features by the multi-source data feature extraction and the feedforward operation of the fusion network

、

、

And

(ii) a Combining four semantic features

、

、

And

respectively inputting the data into different classifiers to obtain respective classification results; and finally, integrating the four classification results by adopting decision-level fusion to obtain a final classification result:

wherein the classifier adopts a softmax function.

Further, this embodiment shows classification results under different metric indexes, and the adopted classification metric indexes include: overall Accuracy (OA), average Accuracy (AA), class Accuracy (CA), and Kappa coefficient. In addition to the method proposed in this embodiment, other deep learning based methods of HSI and LiDAR classification are further compared, including: two-branch CNN, FDSSCN, coupled CNN. Table 1 shows the results of the quantitative comparisons of the different classification methods.

TABLE 1 Classification results of different methods on Houston data sets

As can be seen from Table 1, the method provided by the embodiment of the present invention achieves the best classification results on the three indexes OA, AA and Kappa. In addition, the classification results of the method of the present invention are also higher in most categories than other classification methods. The experimental results further prove the effectiveness and superiority of the method for multi-source data fusion classification.

Further, as shown in fig. 9, based on the above method and system for optimizing data classification based on attention mechanism, the present invention also provides a terminal, which includes a processor 10, a memory 20 and a display 30. Fig. 9 shows only some of the components of the terminal, but it is to be understood that not all of the shown components are required to be implemented, and that more or fewer components may be implemented instead.

The memory 20 may in some embodiments be an internal storage unit of the terminal, such as a hard disk or a memory of the terminal. The memory 20 may also be an external storage device of the terminal in other embodiments, such as a plug-in hard disk provided on the terminal, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card), and the like. Further, the memory 20 may also include both an internal storage unit and an external storage device of the terminal. The memory 20 is used for storing application software installed in the terminal and various types of data, such as program codes of the installation terminal. The memory 20 may also be used to temporarily store data that has been output or is to be output. In one embodiment, the memory 20 stores an attention-based data classification optimization program 40, and the attention-based data classification optimization program 40 can be executed by the processor 10 to implement the attention-based data classification optimization method of the present application.

The processor 10 may be, in some embodiments, a Central Processing Unit (CPU), a microprocessor or other data Processing chip, and is configured to run program codes stored in the memory 20 or process data, for example, execute the attention-based data classification optimization method.

The display 30 may be an LED display, a liquid crystal display, a touch-sensitive liquid crystal display, an OLED (Organic Light-Emitting Diode) touch panel, or the like in some embodiments. The display 30 is used for displaying information at the terminal and for displaying a visual user interface. The components 10-30 of the terminal communicate with each other via a system bus.

In one embodiment, the steps of the attention-based data classification optimization method described above are implemented when the processor 10 executes the attention-based data classification optimization program 40 in the memory 20.

The present invention also provides a computer readable storage medium, wherein the computer readable storage medium stores an attention-based data classification optimization program, and the attention-based data classification optimization program implements the steps of the attention-based data classification optimization method as described above when executed by a processor.

In summary, the present invention provides a data classification optimization method based on attention mechanism and related apparatus, the method includes: dividing all the marked pixels into a training set and a testing set, and respectively acquiring real label data of the training set and the testing set; embedding an attention mechanism into a convolutional neural network, and constructing a multi-source data feature extraction and fusion network based on the attention mechanism; acquiring training data fused with sample semantic information and similar information, and performing supervised training on the multi-source data feature extraction and fusion network; and inputting the sample to be tested into the trained multi-source data feature extraction and fusion network, and outputting a final classification label according to a decision-level fusion result. According to the method, a feature extraction and fusion framework based on an attention mechanism is constructed, a novel target loss function is designed, semantic information and similar information of a sample are considered, the feature characterization capability is obviously improved, the accurate classification of HSI and LiDAR is realized through efficient feature extraction and fusion, and an effective method is provided for the current combined utilization of multi-source data.

It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or terminal that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or terminal. Without further limitation, an element defined by the phrases "comprising a component of' 8230; \8230;" does not exclude the presence of additional like elements in the process, method, article, or terminal that comprises the element.

Of course, it will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by instructing relevant hardware (such as a processor, a controller, etc.) through a computer program, and the program can be stored in a computer readable storage medium, and when executed, the program can include the processes of the embodiments of the methods described above. The computer readable storage medium may be a memory, a magnetic disk, an optical disk, etc.

It is to be understood that the invention is not limited to the examples described above, but that modifications and variations may be effected thereto by those of ordinary skill in the art in light of the foregoing description, and that all such modifications and variations are intended to be within the scope of the invention as defined by the appended claims.

Claims

1. An attention-based data classification optimization method is characterized by comprising the following steps:

acquiring training data fusing sample semantic information and similar information, and performing supervised training on the multi-source data feature extraction and fusion network;

inputting a sample to be tested into the trained multi-source data feature extraction and fusion network, and outputting a final classification label according to a decision-level fusion result;

the method for dividing all the labeled pixels into a training set and a test set and respectively obtaining the real label data of the training set and the test set comprises the following steps:

if it is

And &>

wherein the content of the first and second substances,

and &>

Respectively denote a fifth->

An HSI pixel and a ^ th ^ or ^ th ^>

A LiDAR pixel; />

Is the total number of the marker pixel sets,

is the number of HSI spectral bands;

the real tag data is expressed as

；

indicates the fifth->

A true label of an individual pixel, based on the number of pixels in the image>

Representing a total number of categories;

the dividing all the labeled pixels into a training set and a test set, and respectively obtaining real label data of the training set and the test set specifically includes:

and &>

The training set and the test set are represented separately,

and &>

Real label data representing a training set and a test set, respectively, wherein ` is `>

And &>

Represents the number of training samples and the number of test samples, respectively, and satisfies->

；

The multi-source data feature extraction and fusion network comprises: the system comprises a data preprocessing module, a residual error-attention mechanism-based feature extraction module, an attention mechanism-based feature fusion module and a decision-level fusion classification-based module;

the data preprocessing module is used for:

marking the pixel

And &>

As a center, image blocks with preset sizes are respectively intercepted on the HSI point cloud depth image and the LiDAR point cloud depth image, and an image pair is constructed to be based on a sample->

Wherein is present>

For a hyperspectral image block,>

for LiDAR point cloud depth image block, \>

Is the image block size;

using two different convolutional layers in respective pairs

And &>

Performing convolution operation to make->

And &>

；

；

wherein the content of the first and second substances,

and &>

convolutional layer

And &>

Respectively has a convolution kernel size of->

And

in which>

Is the spatial magnitude of the convolution kernel, < >>

Being convolution kernelsThe number of output channels;

the residual-attention mechanism-based feature extraction module is configured to:

if it is

Is an input to a residual module, the output is expressed as ≥ v>

；

network functions being two convolutional layers, i.e.

；

Wherein the content of the first and second substances,

and &>

For convolution kernel, <' > based on>

And &>

For a bias vector>

Represents a convolution operation, <' > or>

Representing a ReLU activation function;

if the input of the multi-scale channel attention module is

Extracted global feature>

Expressed as:

；

wherein the content of the first and second substances,

represents a global average pooling operation, <' > or>

Indicating that the batch is regularized>

And

represents a dimension decreasing layer and a dimension increasing layer, <' > respectively>

Is reduced by a factor for the channel>

Represents input>

The number of characteristic channels of (a);

local features

Expressed as:

；

and &>

Represents two point-by-point convolution operations in a local feature extraction process, and>

and &>

Respectively has a convolution kernel size of->

And &>

(ii) a Local characteristic->

And input->

The sizes are the same;

；

wherein the content of the first and second substances,

represents an attention weight, is asserted>

Represents an element-by-element multiplication operation, and->

Indicates broadcast addition, and->

Representing a sigmoid activation function;

the extracted HSI and LiDAR images are processed by a plurality of residual error-attention mechanism modulesFeatures are respectively noted as

And &>

；

The attention-based mechanism feature fusion module is to:

for the extracted HIS image characteristics

And LiDAR image features>

Respectively performing global pooling operation, and respectively generating corresponding semantic features based on vector stretching and full-link layer processing>

And &>

；

And &>

Adding the semantic features to obtain the semantic feature of the fused cell phone and the semantic feature of the fused cell phone>

；

Wherein the second fusion strategy is feature fusion based on attention mechanism, and adopts an attention feature fusion module pair

And &>

；

wherein the content of the first and second substances,

represents the feature after fusion, M represents the fusion weight, is>

And &>

Representing two features to be fused;

And &>

And two fused semantic features->

And &>

；

The decision-level based fusion classification module is to:

single source data semanticsFeature(s)

And &>

And fused semantic feature>

And &>

；

the method comprises the following steps of acquiring training data fusing sample semantic information and similar information, wherein the supervised training of the multi-source data feature extraction and fusion network specifically comprises the following steps:

designing a loss function fusing sample semantic information and similar information, and solving network parameters of the multi-source data feature extraction and fusion network by adopting a gradient descent method;

the method comprises the following steps of designing a loss function fusing semantic information and similar information of a sample, and solving network parameters of the multi-source data feature extraction and fusion network by adopting a gradient descent method, wherein the method specifically comprises the following steps:

adopting metric learning based on depth hash to restrict the similarity between the image block sample pairs;

；

；

；

wherein the content of the first and second substances,

、/>

and &>

Hash code matrices representing HSI, liDAR and HSI-LiDAR, respectively, < >>

And &>

Respectively denote a fifth->

Hash codes for individual HSI and LiDAR pixels;

defining any sample pair

In a degree of similarity variable>

If the two category labels are the same, then->

Otherwise, the value is 0;

；

wherein the content of the first and second substances,

； />

；

representing a sigmoid activation function;

approximating the discrete hash code by adopting the semantic features of continuous variables, wherein the quantization loss generated by the serialization is expressed as follows:

；

；

wherein the content of the first and second substances,

，/>

representing the classification result predicted by the classifier;

；

wherein the content of the first and second substances,

、/>

、/>

2. The attention-based data classification optimization method of claim 1, wherein the inputting of the sample to be tested into the trained multi-source data feature extraction and fusion network and the outputting of the final classification label according to the decision-level fusion result specifically comprises:

for any test sample pair

Will>

、/>

、/>

And &>

；

Combining four semantic features

、/>

、/>

And &>

；

wherein the classifier adopts a softmax function.

3. A terminal, characterized in that the terminal comprises: a memory, a processor, and an attention-based system data classification optimization program stored on the memory and executable on the processor, the attention-based system data classification optimization program when executed by the processor implementing the steps of the attention-based system data classification optimization method of any of claims 1-2.

4. A computer-readable storage medium, characterized in that the computer-readable storage medium stores an attention-based data classification optimization program, which when executed by a processor implements the steps of the attention-based data classification optimization method according to any one of claims 1-2.