CN116229163A - Medical hyperspectral image classification method based on space-spectrum self-attention mechanism - Google Patents
- Publication number
- CN116229163A (application number CN202310152996.7A)
- Authority
- CN
- China
- Prior art keywords
- space
- spectrum
- self
- matrix
- processing
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/764—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
- G06V10/765—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects using rules for classification or partitioning the feature space
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/10—Image acquisition
- G06V10/16—Image acquisition using multiple overlapping images; Image stitching
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/20—Image preprocessing
- G06V10/26—Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/77—Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
- G06V10/80—Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
- G06V10/806—Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/82—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/10—Terrestrial scenes
- G06V20/194—Terrestrial scenes using hyperspectral data, i.e. more or other wavelengths than RGB
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02A—TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
- Y02A40/00—Adaptation technologies in agriculture, forestry, livestock or agroalimentary production
- Y02A40/10—Adaptation technologies in agriculture, forestry, livestock or agroalimentary production in agriculture
Abstract
The invention discloses a medical hyperspectral image classification method based on a space-spectrum self-attention mechanism, which comprises the following steps. S1: normalize the obtained original hyperspectral image. S2: crop the normalized image with the target sample at the center. S3: input the cropped image into a processing unit to obtain a first, a second and a third space-spectrum feature. S4: process the three space-spectrum features with three classifiers to obtain three prediction results. S5: weight and fuse the three prediction results to obtain the final prediction result. The beneficial technical effect of the invention is a medical hyperspectral image classification method based on a space-spectrum self-attention mechanism that extracts the corresponding space-spectrum features from multiple fields of view and thereby improves classification accuracy.
Description
Technical Field
The invention relates to a hyperspectral image processing technology, in particular to a medical hyperspectral image classification method based on a space-spectrum self-attention mechanism.
Background
Hyperspectral imaging is an advanced technique for extracting image spatial information and spectral information: it simultaneously acquires two-dimensional spatial information and one-dimensional spectral information of the photographed object, covering spectral ranges such as visible light, infrared and ultraviolet. In recent years, because hyperspectral imaging can provide diagnostic information about the physiological, morphological and biochemical composition of tissue, it offers fine spectral features for histological research and has received widespread attention as a non-invasive auxiliary diagnostic means; it has been successfully applied to non-invasive disease diagnosis and monitoring, image-guided minimally invasive surgery, drug dose evaluation and the like. With the rapid development of precision medicine, how to design an efficient and accurate diagnostic algorithm for the high dimensionality, high redundancy and "image and spectrum in one" characteristics of hyperspectral medical images has become a research hotspot in medical hyperspectral image analysis.
Traditional medical hyperspectral image classification methods generally extract hand-crafted features and then classify them with a classifier; however, such methods cannot extract deep features, which greatly limits their performance. In recent years, deep learning has begun to be applied to medical hyperspectral image processing as an end-to-end approach. Deep learning relies on data to learn low-, mid- and high-level features of images. Among deep models, convolutional neural networks perform well on diagnostic tasks thanks to their local receptive fields and translation invariance. However, hyperspectral images contain many bands, and a traditional convolutional neural network can neither mine the effective relational information between long-distance bands nor preserve the original spectral sequence relation. This limits the performance of convolutional-neural-network methods on medical hyperspectral images.
Recently, Transformers have received great attention for their strong global modeling capability. The self-attention mechanism in the Transformer can capture relations between long-distance spectral bands and better model the spectral sequence, and it has achieved a certain effect in the medical hyperspectral field. However, because acquisition equipment, operating procedures and preprocessing methods (spectral correction, noise reduction, unmixing and the like) differ, the spectral and spatial resolutions of medical hyperspectral images often differ, and the spectral curves of the photographed biological tissues vary considerably. Each specific diagnostic task therefore often requires a differently designed algorithm, and when a Transformer algorithm is applied across different diagnostic tasks, its performance struggles to meet further accuracy requirements. In addition, traditional deep-learning algorithms for hyperspectral image classification usually produce a single output prediction and cannot combine information from multiple fields of view into a comprehensive prediction of the image class, which creates a bottleneck for model performance.
Disclosure of Invention
Aiming at the problems in the background art, the invention provides a medical hyperspectral image classification method based on a space-spectrum self-attention mechanism, which comprises the following steps:
s1: normalizing the obtained original hyperspectral image to obtain a normalized image;
s2: cutting the normalized image by taking the target sample as a center to obtain a cut image;
s3: inputting the cut image into a processing unit for processing: the processing unit consists of three convertors encoders with the same structure, and the sizes of the internal convolution kernels of the three convertors are different; the three transducer encoders are respectively marked as a first encoder, a second encoder and a third encoder according to the size of the internal convolution kernel from large to small; the first encoder processes the cut image according to a first method to obtain a first space-spectrum characteristic, the second encoder processes the first space-spectrum characteristic according to a second method to obtain a second space-spectrum characteristic, and the third encoder processes the second space-spectrum characteristic according to a third method to obtain a third space-spectrum characteristic;
s4: inputting the first space-spectrum characteristic into a first classifier, outputting a first prediction result by the first classifier, inputting the second space-spectrum characteristic into a second classifier, outputting a second prediction result by the second classifier, inputting the third space-spectrum characteristic into a third classifier, and outputting a third prediction result by the third classifier;
s5: weighting and fusing the first prediction result, the second prediction result and the third prediction result to obtain a final prediction result; the final prediction result comprises a plurality of probabilities, and each probability corresponds to one medical semantic category;
the classifier can predict the probability that the target sample belongs to different medical semantic categories according to the space-spectrum characteristics (the classifier and the encoder are trained in advance when the classifier and the encoder are trained as an integral network); the prediction result comprises a plurality of probabilities, and each probability corresponds to one medical semantic category;
the first method comprises the following steps:
1) map the cropped image into three matrices Q1, K1 and V1 through the multi-layer perceptron module of the first encoder;
2) process matrix Q1 with the spatial attention mechanism to extract the spatial feature F1^spa from Q1;
3) process matrix K1 with the spectral attention mechanism to extract the spectral feature F1^spe from K1;
4) perform a dot product of F1^spa and F1^spe to obtain the first space-spectrum self-attention parameter A1^ss;
5) based on the self-attention mechanism, extract the first space-spectrum feature F1^ss from matrix V1 according to the first space-spectrum self-attention parameter; step 5) is expressed as:
F1^ss = A1^ss ⊙ f_{a×a×a}(V1)
where f_{a×a×a}(V1) denotes a 3D convolution applied to regions of V1 with a receptive field of size a×a×a, a×a×a being the receptive field size of the first encoder, and ⊙ denotes the dot-product operation (note that the dot product in the formula of step 5) and the dot product in step 4) are two different dot-product operations); f_{a×a×a}(V1) essentially performs preliminary feature extraction on V1, and the dot product with A1^ss then extracts the first space-spectrum feature;
the second method comprises the following steps:
A) map the first space-spectrum feature into three matrices Q2, K2 and V2 through the multi-layer perceptron module of the second encoder;
B) process matrix Q2 with the spatial attention mechanism to extract the spatial feature F2^spa from Q2;
C) process matrix K2 with the spectral attention mechanism to extract the spectral feature F2^spe from K2;
D) perform a dot product of F2^spa and F2^spe to obtain the second space-spectrum self-attention parameter A2^ss;
E) splice and fuse the first and second space-spectrum self-attention parameters, reduce the dimension of the spliced result with a convolution layer to obtain the first fused self-attention parameter, and then, based on the self-attention mechanism, extract the second space-spectrum feature F2^ss from matrix V2 according to the first fused self-attention parameter; step E) is expressed as:
F2^ss = Conv(Concat(A1^ss, A2^ss)) ⊙ f_{b×b×b}(V2)
where Conv(Concat(A1^ss, A2^ss)) denotes the dimension-reduction processing of the spliced and fused A1^ss and A2^ss, f_{b×b×b}(V2) denotes a 3D convolution applied to regions of V2 with a receptive field of size b×b×b, b×b×b being the receptive field size of the second encoder, and Concat(·) denotes the splicing-fusion process (note that the dot product in the formula of step E) and the dot product in step D) are two different dot-product operations); f_{b×b×b}(V2) performs preliminary feature extraction on V2, and the dot product with the first fused self-attention parameter then extracts the second space-spectrum feature;
the third method comprises the following steps:
one) map the second space-spectrum feature into three matrices Q3, K3 and V3 through the multi-layer perceptron module of the third encoder;
two) process matrix Q3 with the spatial attention mechanism to extract the spatial feature F3^spa from Q3;
three) process matrix K3 with the spectral attention mechanism to extract the spectral feature F3^spe from K3;
four) perform a dot product of F3^spa and F3^spe to obtain the third space-spectrum self-attention parameter A3^ss;
five) splice and fuse the first, second and third space-spectrum self-attention parameters, reduce the dimension of the spliced result with a convolution layer to obtain the second fused self-attention parameter, and then, based on the self-attention mechanism, extract the third space-spectrum feature F3^ss from matrix V3 according to the second fused self-attention parameter; step five) is expressed as:
F3^ss = Conv(Concat(A1^ss, A2^ss, A3^ss)) ⊙ f_{c×c×c}(V3)
where Conv(Concat(A1^ss, A2^ss, A3^ss)) denotes the dimension-reduction processing of the spliced and fused parameters, f_{c×c×c}(V3) denotes a 3D convolution applied to regions of V3 with a receptive field of size c×c×c, c×c×c being the receptive field size of the third encoder, and a > b > c (note that the dot product in the formula of step five) and the dot product in step four) are two different dot-product operations); f_{c×c×c}(V3) performs preliminary feature extraction on V3, and the dot product with the second fused self-attention parameter then extracts the third space-spectrum feature.
The principle of the scheme is as follows:
1) To meet the processing requirements of medical hyperspectral images of different information densities acquired by different instruments, and to extract feature information under different field-of-view conditions, three Transformer encoders with different fields of view are designed: because their internal convolution kernel sizes differ, the three encoders have three field-of-view sizes and can extract feature information under three field-of-view conditions. 2) An existing Transformer encoder generally processes the three mapped matrices Q, K, V directly with the self-attention mechanism; unlike the prior art, the invention first extracts spatial features from the Q matrix with a spatial attention mechanism, then extracts spectral features from the K matrix with a spectral attention mechanism, and constructs a space-spectrum self-attention parameter as the dot product of the spatial and spectral features; the space-spectrum feature is then extracted from the corresponding V matrix according to this self-attention parameter, based on the self-attention mechanism. 3) Except for the first encoder, which has the largest field of view, the other two encoders use fused self-attention parameters: the first fused self-attention parameter is obtained by splicing and fusing the first and second space-spectrum self-attention parameters and then reducing the dimension, and the second fused self-attention parameter is obtained by splicing and fusing the first, second and third space-spectrum self-attention parameters and then reducing the dimension; compared with the prior art, this processing effectively combines the space-spectrum self-attention of different scales obtained under different fields of view and extracts key space-spectrum information at multiple scales. 4) The prediction results of the three classifiers are weighted and fused into the final prediction result; compared with the prior art, this fuses the prediction probabilities under multiple fields of view more effectively, making the prediction more accurate.
Preferably, in the step S5, a trainable weight is used to dynamically weight and fuse the first prediction result, the second prediction result and the third prediction result.
The beneficial technical effects of the invention are as follows: the medical hyperspectral image classification method based on the space-spectrum self-attention mechanism is provided, corresponding space-spectrum characteristics can be extracted from multiple view angles by the scheme, and finally the classification accuracy can be improved.
Detailed Description
A medical hyperspectral image classification method based on a space-spectrum self-attention mechanism comprises the following steps:
s1: for a pair ofNormalizing the obtained original hyperspectral image to obtain a normalized image; in particular, the smallest pixel value V in the image is found min Maximum pixel value V max For any pixel value V in the image, the normalized pixel value V norm Can be expressed as V norm =(V-V min )/(V max -V min );
S2: cutting the normalized image by taking the target sample as a center to obtain a cut image; in the specific implementation, the target pixel is taken as the center, and the image blocks with the length and the width of m pixels are cut along the whole spectrum dimension;
s3: inputting the cut image into a processing unit for processing: the processing unit consists of three convertors encoders with the same structure, and the sizes of the internal convolution kernels of the three convertors are different; the three transducer encoders are respectively marked as a first encoder, a second encoder and a third encoder according to the size of the internal convolution kernel from large to small; the first encoder processes the cut image according to a first method to obtain a first space-spectrum characteristic, the second encoder processes the first space-spectrum characteristic according to a second method to obtain a second space-spectrum characteristic, and the third encoder processes the second space-spectrum characteristic according to a third method to obtain a third space-spectrum characteristic;
s4: inputting the first space-spectrum characteristic into a first classifier, outputting a first prediction result by the first classifier, inputting the second space-spectrum characteristic into a second classifier, outputting a second prediction result by the second classifier, inputting the third space-spectrum characteristic into a third classifier, and outputting a third prediction result by the third classifier;
s5: weighting and fusing the first prediction result, the second prediction result and the third prediction result to obtain a final prediction result; the final prediction result comprises a plurality of probabilities, and each probability corresponds to one medical semantic category;
the classifier can predict the probability that the target sample belongs to different medical semantic categories according to the space-spectrum characteristics; the prediction result comprises a plurality of probabilities, and each probability corresponds to one medical semantic category;
the first method comprises the following steps:
1) Map the cropped image into three matrices Q1, K1 and V1 through the multi-layer perceptron module of the first encoder; in a specific implementation, the input features are mapped into the three matrices Q1, K1, V1 through layer normalization, multi-layer perceptron and reshape operations, where w is the spatial scale of the feature and d is the number of bands it contains (the corresponding matrices in the second and third methods are obtained similarly);
2) Process matrix Q1 with the spatial attention mechanism to extract the spatial feature F1^spa from Q1; in a specific implementation, the channel-domain features of Q1 are compressed by global max pooling and global average pooling, the two resulting features are spliced along the channel dimension and converted into a single-channel feature by a convolution layer, which is then activated by a Sigmoid function to obtain the spatial attention A1^spa; Q1 is then dot-multiplied with the spatial attention, residual-connected, and activated with a ReLU function to obtain the spatial feature F1^spa (the corresponding steps in the second and third methods are similar);
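The spatial-attention branch can be sketched in numpy as below. This is a minimal illustration, not the patent's implementation: the learned 2-to-1-channel convolution is stood in for by a two-element weight vector `conv_w`, which is an assumed simplification.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def spatial_attention(q, conv_w):
    """Sketch of the spatial-attention branch: compress Q along the
    band/channel axis with max- and mean-pooling, fuse the two maps
    with a stand-in 1x1 conv (weights conv_w, shape (2,)), gate Q
    with the sigmoid map, then residual connection + ReLU.
    q: patch features of shape (w, w, d)."""
    max_map = q.max(axis=-1)                # (w, w) channel-wise max pool
    mean_map = q.mean(axis=-1)              # (w, w) channel-wise mean pool
    fused = conv_w[0] * max_map + conv_w[1] * mean_map  # 2 -> 1 channel
    attn = sigmoid(fused)[..., None]        # spatial attention A^spa
    return np.maximum(q * attn + q, 0.0)    # dot product, residual, ReLU

q = np.random.randn(4, 4, 8)
f_spa = spatial_attention(q, np.array([0.5, 0.5]))
print(f_spa.shape)  # (4, 4, 8)
```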
3) Process matrix K1 with the spectral attention mechanism to extract the spectral feature F1^spe from K1; in a specific implementation, max pooling and mean pooling are used to compress the spatial feature information in each band of K1, the compressed features are then mapped by a multi-layer perceptron to improve their representation, and finally the compressed information obtained by the two compression paths is added and fused on each band and activated with a Sigmoid function to obtain the spectral attention A1^spe; K1 is then dot-multiplied with the spectral attention, residual-connected, and activated with a ReLU function to obtain the spectral feature F1^spe (the corresponding steps in the second and third methods are similar);
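The spectral-attention branch can be sketched similarly. Again this is only an illustration under assumptions: the shared two-layer perceptron is represented by explicit weight matrices `w1`, `w2`, which are hypothetical stand-ins for learned parameters.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def spectral_attention(k, w1, w2):
    """Sketch of the spectral-attention branch: compress each band's
    spatial information with max- and mean-pooling, pass both vectors
    through a shared two-layer perceptron (w1: (d, h), w2: (h, d)),
    add the two paths, sigmoid -> per-band attention, then gate K
    with residual connection + ReLU.  k: (w, w, d)."""
    max_vec = k.max(axis=(0, 1))            # (d,) per-band max pool
    mean_vec = k.mean(axis=(0, 1))          # (d,) per-band mean pool
    mlp = lambda v: np.maximum(v @ w1, 0.0) @ w2
    attn = sigmoid(mlp(max_vec) + mlp(mean_vec))  # spectral attention A^spe
    return np.maximum(k * attn + k, 0.0)    # dot product, residual, ReLU

k = np.random.randn(4, 4, 8)
out = spectral_attention(k, 0.1 * np.random.randn(8, 4), 0.1 * np.random.randn(4, 8))
print(out.shape)  # (4, 4, 8)
```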
4) Perform a dot product of F1^spa and F1^spe to obtain the first space-spectrum self-attention parameter A1^ss;
5) based on the self-attention mechanism, extract the first space-spectrum feature F1^ss from matrix V1 according to the first space-spectrum self-attention parameter; step 5) is expressed as:
F1^ss = A1^ss ⊙ f_{a×a×a}(V1)
where f_{a×a×a}(V1) denotes a 3D convolution applied to regions of V1 with a receptive field of size a×a×a, a×a×a being the receptive field size of the first encoder, and ⊙ denotes the dot-product operation;
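Steps 4) and 5) together can be sketched as follows. Note the assumptions: the learned 3D convolution f_{a×a×a} is stood in for by a naive a×a×a mean filter so the example stays self-contained; the real encoder would use trained kernels.

```python
import numpy as np

def mean_filter3d(v, a):
    """Naive a x a x a mean filter as a stand-in for the learned 3D
    convolution f_{a×a×a} (same-size output via edge padding)."""
    p = a // 2
    vp = np.pad(v, p, mode="edge")
    out = np.zeros_like(v, dtype=float)
    for i, j, k in np.ndindex(v.shape):
        out[i, j, k] = vp[i:i + a, j:j + a, k:k + a].mean()
    return out

def space_spectrum_feature(f_spa, f_spe, v, a=3):
    """Dot product of spatial and spectral features gives the
    space-spectrum self-attention parameter A^ss (step 4); gating the
    convolved V with A^ss extracts the space-spectrum feature (step 5)."""
    attn_ss = f_spa * f_spe                 # A^ss = F^spa ⊙ F^spe
    return attn_ss * mean_filter3d(v, a)    # F^ss = A^ss ⊙ f(V)

v = np.random.randn(4, 4, 4)
feat = space_spectrum_feature(np.ones_like(v), np.ones_like(v), v)
print(feat.shape)  # (4, 4, 4)
```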
the second method comprises the following steps:
A) map the first space-spectrum feature into three matrices Q2, K2 and V2 through the multi-layer perceptron module of the second encoder;
B) process matrix Q2 with the spatial attention mechanism to extract the spatial feature F2^spa from Q2;
C) process matrix K2 with the spectral attention mechanism to extract the spectral feature F2^spe from K2;
D) perform a dot product of F2^spa and F2^spe to obtain the second space-spectrum self-attention parameter A2^ss;
E) splice and fuse the first and second space-spectrum self-attention parameters (to make better use of the space-spectrum self-attention obtained under different fields of view, the two parameters are integrated by the splicing-fusion process), then reduce the dimension of the spliced result with a convolution layer to obtain the first fused self-attention parameter, and then, based on the self-attention mechanism, extract the second space-spectrum feature F2^ss from matrix V2 according to the first fused self-attention parameter; step E) is expressed as:
F2^ss = Conv(Concat(A1^ss, A2^ss)) ⊙ f_{b×b×b}(V2)
where Conv(Concat(A1^ss, A2^ss)) denotes the dimension-reduction processing (a 3D convolution in the specific operation) of the spliced and fused A1^ss and A2^ss, f_{b×b×b}(V2) denotes a 3D convolution applied to regions of V2 with a receptive field of size b×b×b, b×b×b being the receptive field size of the second encoder, and Concat(·) denotes the splicing-fusion process;
the third method comprises the following steps:
first) mapping the second space-spectrum feature into three matrices Q by a multi-layer perceptron module of a third encoder 3 、K 3 、V 3 ;
Two) matrix Q based on spatial attention mechanism 3 Processing is performed from matrix Q 3 Extracting spatial features from the image
Three) matrix K based on spectrum attention mechanism 3 Processing is performed from matrix K 3 Extracting spectral features from/>
Fourth) will and />Dot product was performed to obtain a third space-spectrum self-attention parameter->
Fifth), the first space-spectrum self-attention parameter, the second space-spectrum self-attention parameter and the third space-spectrum self-attention parameter are enteredPerforming row-splice fusion processing (integrating three space-spectrum self-attention parameters based on the same reason as corresponding processing in the second method), performing dimension reduction processing on the spliced and fused result by using one convolution layer to obtain a second fusion self-attention parameter, and then based on a self-attention mechanism, performing matrix V according to the second fusion self-attention parameter 3 Extracting a third space-spectrum characteristicThe process of step five) is represented as follows:
wherein the dimension-reduction term denotes the dimension-reduction processing applied to the splice-fused result of the three space-spectrum self-attention parameters, f c×c×c (V3) denotes 3D convolution processing applied to regions of matrix V3 with receptive field size c×c×c, c×c×c being the receptive field size of the third encoder; a > b > c.
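Taken together, the three methods describe a chain of encoder stages, each computing spatial attention on Q, spectral attention on K, their dot product as a space-spectrum self-attention parameter, and a fusion with the parameters of earlier stages before attending over V. A minimal NumPy sketch of one such stage is given below. It is an illustrative reconstruction, not the patent's implementation: the multi-layer perceptron mapping is reduced to a linear projection, the spatial and spectral attentions to simple squared-norm weightings, and the concatenation-plus-convolution reduction to a channel mean, with all shapes and weights hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def space_spectrum_attention(X, W_q, W_k, W_v, prev_params=()):
    """One encoder-stage sketch: X has shape (n_pixels, n_bands)."""
    Q, K, V = X @ W_q, X @ W_k, X @ W_v
    # spatial attention: weight each pixel (row) of Q
    spatial = softmax((Q * Q).sum(axis=1))            # (n_pixels,)
    Q_s = Q * spatial[:, None]
    # spectral attention: weight each band (column) of K
    spectral = softmax((K * K).sum(axis=0))           # (n_bands,)
    K_lam = K * spectral[None, :]
    # space-spectrum self-attention parameter via dot product
    A = softmax(Q_s @ K_lam.T / np.sqrt(K.shape[1]), axis=-1)
    # splice-fuse with the parameters of earlier stages, then reduce;
    # the channel mean stands in for the learned convolution layer
    stack = np.stack((*prev_params, A))               # (k, n, n)
    fused = stack.mean(axis=0)                        # (n, n)
    return fused @ V, A   # feature for the next stage, parameter to pass on

rng = np.random.default_rng(0)
X = rng.standard_normal((16, 8))
W_q, W_k, W_v = (rng.standard_normal((8, 8)) for _ in range(3))
F1, A1 = space_spectrum_attention(X, W_q, W_k, W_v)
F2, A2 = space_spectrum_attention(F1, W_q, W_k, W_v, prev_params=(A1,))
```

In the patent each encoder would have its own mapping weights and its own receptive field (a×a×a, b×b×b, c×c×c); sharing `W_q`, `W_k`, `W_v` across the two calls above is purely for brevity.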
Further, in S5, trainable weights are used to dynamically weight and fuse the first prediction result, the second prediction result and the third prediction result.
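The dynamic weighting of S5 can be sketched as follows. This is a minimal illustration under assumptions, not the patent's code: the weight logits are fixed here, whereas in training they would be learnable parameters updated by backpropagation, and they are softmax-normalized so the fused output remains a probability distribution.

```python
import numpy as np

def softmax1d(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def fuse_predictions(preds, logits_w):
    """Weighted fusion of per-encoder class-probability vectors.

    preds: list of (n_classes,) probability vectors; logits_w: trainable
    scalars (fixed here for illustration), softmax-normalized so the
    fused result still sums to one."""
    w = softmax1d(np.asarray(logits_w, dtype=float))
    return sum(wi * np.asarray(p, dtype=float) for wi, p in zip(w, preds))

# hypothetical predictions from the three classifiers
p1 = np.array([0.7, 0.2, 0.1])
p2 = np.array([0.5, 0.4, 0.1])
p3 = np.array([0.6, 0.3, 0.1])
final = fuse_predictions([p1, p2, p3], [0.0, 0.0, 0.0])  # equal weights
```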
To verify the effectiveness of the proposed method, a series of experiments was conducted. All experiments were performed on the In-vivo Human Brain HSI Dataset and the BloodCell HSI Dataset, each of which is described below:
(1) In-vivo Human Brain HSI Dataset (Brain HSI Dataset): the dataset was jointly collected by the University Hospital of Southampton (UHS) and the Spanish University Hospital Doctor Negrín in Las Palmas (UHDRN). The acquisition system consists of a VNIR A-Series camera. The camera is based on push-broom technology and uses a silicon CCD detector array, with a minimum frame rate of 90 frames/second, a spectral range of 400–1000 nm and a spectral resolution of 2–3 nm, capturing 826 spectral bands over 1004 spatial pixels per line. The acquisition subjects were 16 adult patients undergoing craniotomy for brain tumors, yielding 26 hyperspectral images covering four categories: background, normal, tumor and blood vessel. In the experiments herein, the hyperspectral images containing all four categories were selected, comprising 9 images from 6 patients.
(2) BloodCell HSI Dataset: the dataset was collected with a microscope integrated with a silicon charge-coupled device combined with a Liquid Crystal Tunable Filter (LCTF). The dataset contains two blood cell images, named Bloodcell1-3 and Bloodcell2-2. Bloodcell1-3 has size 973×799 and Bloodcell2-2 has size 462×451, both containing 33 bands. Each hyperspectral image contains 3 categories: red blood cells, white blood cells and background. As comparison methods, the convolutional neural network methods HybridSN, SSRN and DBDA and the Transformer deep learning methods spectra-wise ViT, SSFTT and CTMixer were selected as comparison algorithms. Each algorithm was repeated 10 times, and the Overall Accuracy (OA), Average Accuracy (AA) and Kappa Coefficient (KC) were reported as mean ± standard deviation (STD) to comprehensively compare and evaluate the classification performance of each algorithm.
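The three reported metrics can be computed from a confusion matrix as sketched below; this is the standard formulation of overall accuracy, average accuracy and Cohen's kappa, not code taken from the patent:

```python
import numpy as np

def classification_metrics(y_true, y_pred, n_classes):
    """Return (OA, AA, KC) from integer labels.

    Assumes every class appears at least once in y_true (otherwise the
    per-class recall in AA would divide by zero)."""
    C = np.zeros((n_classes, n_classes), dtype=float)
    for t, p in zip(y_true, y_pred):
        C[t, p] += 1
    n = C.sum()
    oa = np.trace(C) / n                          # overall accuracy
    aa = np.mean(np.diag(C) / C.sum(axis=1))      # mean per-class recall
    # expected agreement under chance, for Cohen's kappa
    pe = (C.sum(axis=0) * C.sum(axis=1)).sum() / n**2
    kc = (oa - pe) / (1 - pe)
    return oa, aa, kc

# toy example with three classes
oa, aa, kc = classification_metrics([0, 0, 1, 1, 2, 2],
                                    [0, 0, 1, 0, 2, 2], 3)
```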
To verify the effectiveness of the two proposed strategies, ablation experiments were performed on the Brain HSI dataset. The results show that when the two strategies, space-spectrum self-attention and multi-view prediction fusion, were each added to the network individually, OA, AA and KC increased by 6.27%, 8.55% and 6.63%, and by 4.82%, 5.12% and 5.61%, respectively. When both strategies were added to the network simultaneously, the predictive capability of the model improved further, with OA, AA and KC rising by 10.67%, 9.99% and 8.36%, respectively. This shows that the multi-view space-spectrum information strategy successfully focuses on the key space-spectrum feature regions and extracts more discriminative space-spectrum features. Meanwhile, the multi-view prediction fusion strategy organically fuses the diagnostic predictions under different fields of view, improving the classification effect of the model. These experiments separately verify the contribution of the two added strategies, multi-view space-spectrum information fusion and multi-view prediction fusion, to the network's classification performance, and demonstrate that both strategies work well when applied to the network.
To further verify the effectiveness of the proposed algorithm, the convolutional neural network methods HybridSN, SSRN and DBDA and the Transformer deep learning methods spectra-wise ViT, SSFTT and CTMixer were selected as comparison algorithms. Each algorithm was repeated 10 times, and the Overall Accuracy (OA) was reported as mean ± standard deviation (STD) to comprehensively compare and evaluate the classification performance of each algorithm. The experimental setups on the two datasets are shown in Table 1.
Table 1 Experimental setup on the Brain and BloodCell HSI datasets
On the Brain HSI dataset, HybridSN, SSRN, DBDA, ViT, CTMixer, SSFTT and the proposed method achieved classification accuracies of 68.32%, 72.25%, 68.99%, 51.75%, 71.07%, 65.66% and 82.25%, respectively, because the spatial-attention extraction and spectral-attention extraction operations in the present invention successfully capture the important spatial regions and important spectral bands, eliminating feature redundancy. In addition, the prediction probabilities under the multi-view encoders are weighted and fused according to adaptively learned weights, improving the predictive capability of the model. On the BloodCell HSI dataset, HybridSN, SSRN, DBDA, ViT, CTMixer, SSFTT and the proposed method achieved classification accuracies of 89.21%, 88.71%, 88.25%, 77.13%, 90.45%, 89.50% and 91.74%, respectively; the invention still obtains the better classification result. This is because the proposed space-spectrum information extraction block is still able to capture the key space-spectrum information, improving the classification ability of the model, and to fuse the diagnostic information under different fields of view. The experiments demonstrate that the proposed method is applicable to medical hyperspectral images acquired by different instruments and different imaging modalities, generalizes well, and saves model development cost.
Claims (2)
1. A medical hyperspectral image classification method based on a space-spectrum self-attention mechanism, characterized by comprising the following steps:
S1: normalizing the obtained original hyperspectral image to obtain a normalized image;
S2: cropping the normalized image centered on the target sample to obtain a cropped image;
S3: inputting the cropped image into a processing unit for processing: the processing unit consists of three Transformer encoders with the same structure, the internal convolution kernels of the three Transformer encoders having different sizes; the three Transformer encoders are denoted, in descending order of internal convolution kernel size, as a first encoder, a second encoder and a third encoder; the first encoder processes the cropped image according to a first method to obtain a first space-spectrum feature, the second encoder processes the first space-spectrum feature according to a second method to obtain a second space-spectrum feature, and the third encoder processes the second space-spectrum feature according to a third method to obtain a third space-spectrum feature;
S4: inputting the first space-spectrum feature into a first classifier, which outputs a first prediction result; inputting the second space-spectrum feature into a second classifier, which outputs a second prediction result; and inputting the third space-spectrum feature into a third classifier, which outputs a third prediction result;
S5: weighting and fusing the first prediction result, the second prediction result and the third prediction result to obtain a final prediction result; the final prediction result comprises a plurality of probabilities, each probability corresponding to one medical semantic category;
each classifier predicts, from the input space-spectrum feature, the probabilities that the target sample belongs to the different medical semantic categories; each prediction result comprises a plurality of probabilities, each probability corresponding to one medical semantic category;
the first method comprises the following steps:
1) mapping the cropped image into three matrices Q1, K1 and V1 by the multi-layer perceptron module of the first encoder;
2) processing matrix Q1 based on the spatial attention mechanism to extract spatial features from Q1;
3) processing matrix K1 based on the spectral attention mechanism to extract spectral features from K1;
4) taking the dot product of the extracted spatial features and spectral features to obtain a first space-spectrum self-attention parameter;
5) extracting, based on the self-attention mechanism, a first space-spectrum feature from matrix V1 according to the first space-spectrum self-attention parameter; the procedure of step 5) is expressed as follows:
wherein f a×a×a (V1) denotes 3D convolution processing applied to regions of matrix V1 with receptive field size a×a×a, a×a×a being the receptive field size of the first encoder;
the second method comprises the following steps:
A) mapping the first space-spectrum feature into three matrices Q2, K2 and V2 by the multi-layer perceptron module of the second encoder;
B) processing matrix Q2 based on the spatial attention mechanism to extract spatial features from Q2;
C) processing matrix K2 based on the spectral attention mechanism to extract spectral features from K2;
D) taking the dot product of the extracted spatial features and spectral features to obtain a second space-spectrum self-attention parameter;
E) subjecting the first and second space-spectrum self-attention parameters to splice-fusion processing, then performing dimension-reduction processing on the spliced and fused result with a convolution layer to obtain a first fusion self-attention parameter, and then extracting, based on the self-attention mechanism, a second space-spectrum feature from matrix V2 according to the first fusion self-attention parameter; the procedure of step E) is expressed as follows:
wherein the dimension-reduction term denotes the dimension-reduction processing applied to the splice-fused result of the first and second space-spectrum self-attention parameters, f b×b×b (V2) denotes 3D convolution processing applied to regions of matrix V2 with receptive field size b×b×b, b×b×b being the receptive field size of the second encoder, and Concat(·) denotes the splice-fusion process;
the third method comprises the following steps:
First) mapping the second space-spectrum feature into three matrices Q3, K3 and V3 by the multi-layer perceptron module of the third encoder;
Second) processing matrix Q3 based on the spatial attention mechanism to extract spatial features from Q3;
Third) processing matrix K3 based on the spectral attention mechanism to extract spectral features from K3;
Fourth) taking the dot product of the extracted spatial features and spectral features to obtain a third space-spectrum self-attention parameter;
Fifth) subjecting the first, second and third space-spectrum self-attention parameters to splice-fusion processing, then performing dimension-reduction processing on the spliced and fused result with a convolution layer to obtain a second fusion self-attention parameter, and then extracting, based on the self-attention mechanism, a third space-spectrum feature from matrix V3 according to the second fusion self-attention parameter; the process of step Fifth) is expressed as follows:
wherein the dimension-reduction term denotes the dimension-reduction processing applied to the splice-fused result of the three space-spectrum self-attention parameters, f c×c×c (V3) denotes 3D convolution processing applied to regions of matrix V3 with receptive field size c×c×c, c×c×c being the receptive field size of the third encoder; a > b > c.
2. The medical hyperspectral image classification method based on the space-spectrum self-attention mechanism according to claim 1, characterized in that: in S5, trainable weights are used to dynamically weight and fuse the first prediction result, the second prediction result and the third prediction result.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310152996.7A CN116229163A (en) | 2023-02-22 | 2023-02-22 | Medical hyperspectral image classification method based on space-spectrum self-attention mechanism |
Publications (1)
Publication Number | Publication Date |
---|---|
CN116229163A true CN116229163A (en) | 2023-06-06 |
Family
ID=86576418
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202310152996.7A Pending CN116229163A (en) | 2023-02-22 | 2023-02-22 | Medical hyperspectral image classification method based on space-spectrum self-attention mechanism |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116229163A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116740474A (en) * | 2023-08-15 | 2023-09-12 | 南京信息工程大学 | Remote sensing image classification method based on anchoring stripe attention mechanism |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||