CN116229163A - Medical hyperspectral image classification method based on space-spectrum self-attention mechanism


Info

Publication number
CN116229163A
CN116229163A (application CN202310152996.7A)
Authority
CN
China
Prior art keywords
space
spectrum
self
matrix
processing
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310152996.7A
Other languages
Chinese (zh)
Inventor
黄鸿
李�远
谭崎娟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chongqing University
Original Assignee
Chongqing University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chongqing University
Priority to CN202310152996.7A
Publication of CN116229163A
Legal status: Pending (current)

Classifications

    • G06V10/764 Image or video recognition using pattern recognition or machine learning: classification, e.g. of video objects
    • G06V10/765 Classification using rules for classification or partitioning the feature space
    • G06N3/04 Neural networks: architecture, e.g. interconnection topology
    • G06N3/08 Neural networks: learning methods
    • G06V10/16 Image acquisition using multiple overlapping images; image stitching
    • G06V10/26 Segmentation of patterns in the image field; cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; detection of occlusion
    • G06V10/806 Fusion of extracted features at the sensor, preprocessing, feature extraction or classification level
    • G06V10/82 Image or video recognition using neural networks
    • G06V20/194 Terrestrial scenes using hyperspectral data, i.e. more or other wavelengths than RGB
    • Y02A40/10 Adaptation technologies in agriculture, forestry, livestock or agroalimentary production in agriculture

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Multimedia (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Computing Systems (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Remote Sensing (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a medical hyperspectral image classification method based on a space-spectrum self-attention mechanism, comprising the following steps. S1: normalize the acquired original hyperspectral image. S2: crop the normalized image with the target sample as the center. S3: input the cropped image into a processing unit to obtain a first, a second and a third space-spectrum feature. S4: process the three space-spectrum features with three classifiers to obtain three prediction results. S5: weight and fuse the three prediction results to obtain the final prediction result. The beneficial technical effect of the invention is a medical hyperspectral image classification method, based on a space-spectrum self-attention mechanism, that extracts space-spectrum features from multiple fields of view and thereby improves classification accuracy.

Description

Medical hyperspectral image classification method based on space-spectrum self-attention mechanism
Technical Field
The invention relates to a hyperspectral image processing technology, in particular to a medical hyperspectral image classification method based on a space-spectrum self-attention mechanism.
Background
Hyperspectral imaging is an advanced technique for extracting image spatial information and spectral information: it simultaneously acquires two-dimensional spatial information and one-dimensional spectral information of the imaged object, covering spectral ranges such as visible light, infrared and ultraviolet. In recent years, because hyperspectral imaging can provide diagnostic information about the physiological, morphological and biochemical components of tissue, it supplies fine spectral features for histological research. It is receiving widespread attention as a non-invasive auxiliary diagnostic means and has been successfully applied to non-invasive diagnosis and monitoring of disease, image-guided minimally invasive surgery, drug dose evaluation and so on. With the rapid development of precision medicine, how to design an efficient and accurate diagnostic algorithm suited to the characteristics of medical hyperspectral images (high dimensionality, high redundancy, and integrated image-spectrum data) has become a research hotspot in medical hyperspectral image analysis.
Traditional medical hyperspectral image classification methods generally extract hand-crafted features and then classify them with a classifier; however, such methods cannot extract deep features and their performance is therefore strongly limited. In recent years, deep learning, as an end-to-end approach, has begun to be applied to medical hyperspectral image processing. Deep learning relies on data to learn the low-, mid- and high-level features of images. Among deep models, convolutional neural networks perform well in diagnostic tasks thanks to their local receptive fields and translation invariance. However, hyperspectral images contain many bands, and a conventional convolutional neural network can neither mine the effective relations between distant bands nor preserve the original spectral ordering. This limits the performance of convolutional neural network methods on medical hyperspectral images.
Recently, Transformers have attracted great attention for their strong global modeling capability. The self-attention mechanism in a Transformer can capture relations between distant spectral bands and better model the spectral sequence, and it has achieved some success on medical hyperspectral images. However, during acquisition of medical hyperspectral images, differences in acquisition equipment, operating procedures and preprocessing (spectral correction, noise reduction, unmixing and the like) mean that the spectral and spatial resolutions of the images often differ, and the spectral curves of the imaged biological tissues vary considerably. Each specific diagnostic task therefore often requires a dedicated algorithm, and when a Transformer algorithm is applied across different diagnostic tasks its performance struggles to meet further accuracy requirements. In addition, traditional deep learning algorithms for hyperspectral image classification usually output a single prediction and cannot combine information from multiple fields of view to predict the image class comprehensively, which bottlenecks model performance.
Disclosure of Invention
Aiming at the problems identified in the background, the invention provides a medical hyperspectral image classification method based on a space-spectrum self-attention mechanism, whose innovation comprises the following steps:
S1: normalizing the obtained original hyperspectral image to obtain a normalized image;
S2: cropping the normalized image with the target sample as the center to obtain a cropped image;
S3: inputting the cropped image into a processing unit for processing: the processing unit consists of three Transformer encoders with the same structure; the sizes of the internal convolution kernels of the three encoders differ, and the encoders are denoted first encoder, second encoder and third encoder in order of internal convolution kernel size from large to small; the first encoder processes the cropped image by a first method to obtain a first space-spectrum feature, the second encoder processes the first space-spectrum feature by a second method to obtain a second space-spectrum feature, and the third encoder processes the second space-spectrum feature by a third method to obtain a third space-spectrum feature;
S4: inputting the first space-spectrum feature into a first classifier, which outputs a first prediction result; inputting the second space-spectrum feature into a second classifier, which outputs a second prediction result; and inputting the third space-spectrum feature into a third classifier, which outputs a third prediction result;
S5: weighting and fusing the first, second and third prediction results to obtain a final prediction result; the final prediction result comprises a plurality of probabilities, each corresponding to one medical semantic category;
each classifier predicts, from a space-spectrum feature, the probabilities that the target sample belongs to the different medical semantic categories (the classifiers and the encoders are trained in advance as one integral network); each prediction result comprises a plurality of probabilities, each corresponding to one medical semantic category;
The first method comprises the following steps:
1) mapping the cropped image into three matrices $Q_1$, $K_1$, $V_1$ through the multi-layer perceptron module of the first encoder;
2) processing $Q_1$ with a spatial attention mechanism to extract the spatial feature $\hat{Q}_1$;
3) processing $K_1$ with a spectral attention mechanism to extract the spectral feature $\hat{K}_1$;
4) taking the product of $\hat{Q}_1$ and $\hat{K}_1$ to obtain the first space-spectrum self-attention parameter $A_1 = \hat{Q}_1 \otimes \hat{K}_1$;
5) based on the self-attention mechanism, extracting the first space-spectrum feature $X_1$ from the matrix $V_1$ according to the first space-spectrum self-attention parameter;
the procedure of step 5) is expressed as

$$X_1 = A_1 \odot f_{a\times a\times a}(V_1)$$

where $f_{a\times a\times a}(V_1)$ denotes a 3D convolution applied to the regions of $V_1$ with receptive field a×a×a, a×a×a being the receptive field size of the first encoder (note that the product $\odot$ in the formula of step 5) and the product $\otimes$ in step 4) are two different product operations); $f_{a\times a\times a}(V_1)$ essentially performs preliminary feature extraction on $V_1$, and the product of $A_1$ with $f_{a\times a\times a}(V_1)$ then extracts the first space-spectrum feature;
The second method comprises the following steps:
A) mapping the first space-spectrum feature into three matrices $Q_2$, $K_2$, $V_2$ through the multi-layer perceptron module of the second encoder;
B) processing $Q_2$ with a spatial attention mechanism to extract the spatial feature $\hat{Q}_2$;
C) processing $K_2$ with a spectral attention mechanism to extract the spectral feature $\hat{K}_2$;
D) taking the product of $\hat{Q}_2$ and $\hat{K}_2$ to obtain the second space-spectrum self-attention parameter $A_2 = \hat{Q}_2 \otimes \hat{K}_2$;
E) concatenating and fusing the first and second space-spectrum self-attention parameters, reducing the dimension of the fused result with a convolution layer to obtain the first fused self-attention parameter, and then, based on the self-attention mechanism, extracting the second space-spectrum feature $X_2$ from the matrix $V_2$ according to the first fused self-attention parameter;
the procedure of step E) is expressed as

$$X_2 = f_{\mathrm{red}}\big(\mathrm{Concat}(A_1, A_2)\big) \odot f_{b\times b\times b}(V_2)$$

where $f_{\mathrm{red}}(\cdot)$ denotes the dimension-reducing convolution applied to the concatenated result, $f_{b\times b\times b}(V_2)$ denotes a 3D convolution applied to the regions of $V_2$ with receptive field b×b×b, b×b×b being the receptive field size of the second encoder, and $\mathrm{Concat}(\cdot)$ denotes the concatenation operation (note that the product $\odot$ in the formula of step E) and the product $\otimes$ in step D) are two different product operations; $f_{b\times b\times b}(V_2)$ performs preliminary feature extraction on $V_2$, and its product with the first fused self-attention parameter then extracts the second space-spectrum feature);
The third method comprises the following steps:
i) mapping the second space-spectrum feature into three matrices $Q_3$, $K_3$, $V_3$ through the multi-layer perceptron module of the third encoder;
ii) processing $Q_3$ with a spatial attention mechanism to extract the spatial feature $\hat{Q}_3$;
iii) processing $K_3$ with a spectral attention mechanism to extract the spectral feature $\hat{K}_3$;
iv) taking the product of $\hat{Q}_3$ and $\hat{K}_3$ to obtain the third space-spectrum self-attention parameter $A_3 = \hat{Q}_3 \otimes \hat{K}_3$;
v) concatenating and fusing the first, second and third space-spectrum self-attention parameters, reducing the dimension of the fused result with a convolution layer to obtain the second fused self-attention parameter, and then, based on the self-attention mechanism, extracting the third space-spectrum feature $X_3$ from the matrix $V_3$ according to the second fused self-attention parameter;
the procedure of step v) is expressed as

$$X_3 = f_{\mathrm{red}}\big(\mathrm{Concat}(A_1, A_2, A_3)\big) \odot f_{c\times c\times c}(V_3)$$

where $f_{\mathrm{red}}(\cdot)$ denotes the dimension-reducing convolution applied to the concatenated result, and $f_{c\times c\times c}(V_3)$ denotes a 3D convolution applied to the regions of $V_3$ with receptive field c×c×c, c×c×c being the receptive field size of the third encoder; a > b > c (note that the product $\odot$ in the formula of step v) and the product $\otimes$ in step iv) are two different product operations; $f_{c\times c\times c}(V_3)$ performs preliminary feature extraction on $V_3$, and its product with the second fused self-attention parameter then extracts the third space-spectrum feature).
The principle of the scheme is as follows:
1) To meet the processing requirements of medical hyperspectral images of different information densities acquired by different instruments, and to extract feature information under different field-of-view conditions, three Transformer encoders with different fields of view are designed: three encoders with different internal convolution kernel sizes give three field-of-view sizes, so feature information can be extracted under three field-of-view conditions.
2) An existing Transformer encoder generally processes the three mapped matrices Q, K, V directly with the self-attention mechanism. The invention differs from this prior art: it first extracts spatial features from the Q matrix with a spatial attention mechanism, then extracts spectral features from the K matrix with a spectral attention mechanism, and builds a space-spectrum self-attention parameter as the product of the spatial and spectral features; then, based on the self-attention mechanism, it extracts the space-spectrum feature from the corresponding V matrix according to that self-attention parameter.
3) Except for the first encoder, which has the largest field of view, the other two encoders use fused self-attention parameters: the first fused self-attention parameter is obtained by concatenating the first and second space-spectrum self-attention parameters and then reducing the dimension, and the second fused self-attention parameter is obtained by concatenating the first, second and third space-spectrum self-attention parameters and then reducing the dimension. Compared with the prior art, this processing effectively combines space-spectrum self-attention at different scales obtained under different fields of view and extracts key space-spectrum information at multiple scales.
4) The prediction results of the three classifiers are weighted and fused into the final prediction result. Compared with the prior art, this fuses the prediction probabilities under multiple fields of view more effectively, making the prediction more accurate.
Preferably, in the step S5, a trainable weight is used to dynamically weight and fuse the first prediction result, the second prediction result and the third prediction result.
The beneficial technical effect of the invention is a medical hyperspectral image classification method, based on a space-spectrum self-attention mechanism, that extracts space-spectrum features from multiple fields of view and thereby improves classification accuracy.
Detailed Description
A medical hyperspectral image classification method based on a space-spectrum self-attention mechanism, whose innovation comprises the following steps:
S1: normalizing the obtained original hyperspectral image to obtain a normalized image; in a specific implementation, find the smallest pixel value $V_{min}$ and the largest pixel value $V_{max}$ in the image; for any pixel value $V$ in the image, the normalized pixel value is $V_{norm} = (V - V_{min}) / (V_{max} - V_{min})$;
S2: cropping the normalized image with the target sample as the center to obtain a cropped image; in a specific implementation, an image block of m×m pixels centered on the target pixel is cut out along the whole spectral dimension (a preprocessing sketch is given after this step list);
S3: inputting the cropped image into a processing unit for processing: the processing unit consists of three Transformer encoders with the same structure; the sizes of the internal convolution kernels of the three encoders differ, and the encoders are denoted first encoder, second encoder and third encoder in order of internal convolution kernel size from large to small; the first encoder processes the cropped image by a first method to obtain a first space-spectrum feature, the second encoder processes the first space-spectrum feature by a second method to obtain a second space-spectrum feature, and the third encoder processes the second space-spectrum feature by a third method to obtain a third space-spectrum feature;
S4: inputting the first space-spectrum feature into a first classifier, which outputs a first prediction result; inputting the second space-spectrum feature into a second classifier, which outputs a second prediction result; and inputting the third space-spectrum feature into a third classifier, which outputs a third prediction result;
S5: weighting and fusing the first, second and third prediction results to obtain a final prediction result; the final prediction result comprises a plurality of probabilities, each corresponding to one medical semantic category;
each classifier predicts, from a space-spectrum feature, the probabilities that the target sample belongs to the different medical semantic categories; each prediction result comprises a plurality of probabilities, each corresponding to one medical semantic category;
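The following is a minimal NumPy sketch of the preprocessing in S1 and S2, as referenced above. The patch size m = 9, the reflect padding at image borders and all function names are illustrative assumptions; the patent does not specify them.

```python
import numpy as np

def normalize(cube: np.ndarray) -> np.ndarray:
    """S1: min-max normalize a hyperspectral cube of shape (H, W, B)."""
    v_min, v_max = cube.min(), cube.max()
    return (cube - v_min) / (v_max - v_min)

def crop_patch(cube: np.ndarray, row: int, col: int, m: int = 9) -> np.ndarray:
    """S2: cut an m x m block centered on the target pixel, keeping every band."""
    half = m // 2
    padded = np.pad(cube, ((half, half), (half, half), (0, 0)), mode="reflect")
    return padded[row:row + m, col:col + m, :]  # shape (m, m, B)
```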
the first method comprises the following steps:
1) Mapping the cropped image into three matrices Q by the multi-layer perceptron module of the first encoder 1 、K 1 、V 1 The method comprises the steps of carrying out a first treatment on the surface of the In specific implementation, the input features are mapped into three matrices through layer normalization, multi-layer perceptron and Reshape operations
Figure BDA0004091399350000051
and />
Figure BDA0004091399350000052
Wherein w is the spatial scale of the feature, d is the number of bands contained in the feature (the process of obtaining the corresponding matrix in the second and third methods is similar);
2) processing $Q_1$ with a spatial attention mechanism to extract the spatial feature $\hat{Q}_1$; in a specific implementation, the channel-domain features of $Q_1$ are compressed by global max pooling and global average pooling, the two resulting features are concatenated along the channel dimension, the two-channel feature is converted into a single-channel feature by a convolution layer and activated with a Sigmoid function to obtain the spatial attention $A_1^{spa}$; $Q_1$ is then multiplied element-wise with the spatial attention, combined through a residual connection and activated with a ReLU function to obtain the spatial feature $\hat{Q}_1$ (the corresponding steps in the second and third methods are similar);
3) processing $K_1$ with a spectral attention mechanism to extract the spectral feature $\hat{K}_1$; in a specific implementation, the spatial feature information within each band of $K_1$ is compressed using max pooling and mean pooling, the compressed features are mapped by a multi-layer perceptron to improve their representation, the compressed information from the two pooling paths is summed band by band and activated with a Sigmoid function to obtain the spectral attention $A_1^{spe}$; $K_1$ is then multiplied element-wise with the spectral attention, combined through a residual connection and activated with a ReLU function to obtain the spectral feature $\hat{K}_1$ (the corresponding steps in the second and third methods are similar);
4) taking the product of $\hat{Q}_1$ and $\hat{K}_1$ to obtain the first space-spectrum self-attention parameter $A_1 = \hat{Q}_1 \otimes \hat{K}_1$;
5) based on the self-attention mechanism, extracting the first space-spectrum feature $X_1$ from the matrix $V_1$ according to the first space-spectrum self-attention parameter;
the procedure of step 5) is expressed as

$$X_1 = A_1 \odot f_{a\times a\times a}(V_1)$$

where $f_{a\times a\times a}(V_1)$ denotes a 3D convolution applied to the regions of $V_1$ with receptive field a×a×a, a×a×a being the receptive field size of the first encoder, and $\odot$ denotes the product used by the self-attention mechanism (a sketch of this encoder block is given after this step list);
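As referenced above, the following PyTorch sketch gives one reading of steps 1) through 5). The tensor layout (batch, d bands, w×w pixels), the kernel size a = 7, and the use of element-wise products for both operations marked ⊗ and ⊙ are assumptions for illustration; the patent does not disclose the exact layer configuration, so all names and hyperparameters here are illustrative.

```python
import torch
import torch.nn as nn

class SpatialAttention(nn.Module):
    """Step 2): spatial attention over Q of shape (B, d, w, w)."""
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size=7, padding=3)

    def forward(self, q):
        mx, _ = q.max(dim=1, keepdim=True)           # global max pool over channels
        avg = q.mean(dim=1, keepdim=True)            # global average pool over channels
        attn = torch.sigmoid(self.conv(torch.cat([mx, avg], dim=1)))
        return torch.relu(q * attn + q)              # product + residual + ReLU

class SpectralAttention(nn.Module):
    """Step 3): spectral (band-wise) attention over K of shape (B, d, w, w)."""
    def __init__(self, d, reduction=4):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(d, d // reduction), nn.ReLU(),
                                 nn.Linear(d // reduction, d))

    def forward(self, k):
        b, d, w, _ = k.shape
        mx = self.mlp(k.flatten(2).max(dim=2).values)    # max pool within each band
        avg = self.mlp(k.flatten(2).mean(dim=2))         # mean pool within each band
        attn = torch.sigmoid(mx + avg).view(b, d, 1, 1)  # sum band by band, activate
        return torch.relu(k * attn + k)

class SpaceSpectrumEncoder(nn.Module):
    """Steps 1)-5) of one encoder; a is the assumed receptive-field size."""
    def __init__(self, d, a=7):
        super().__init__()
        self.norm = nn.LayerNorm(d)
        self.to_qkv = nn.Linear(d, 3 * d)                # MLP mapping to Q, K, V
        self.spa = SpatialAttention()
        self.spe = SpectralAttention(d)
        self.conv3d = nn.Conv3d(1, 1, kernel_size=a, padding=a // 2)  # f_{a×a×a}

    def forward(self, x):                                # x: (B, d, w, w)
        b, d, w, _ = x.shape
        tokens = self.norm(x.flatten(2).transpose(1, 2))  # (B, w*w, d)
        q, k, v = self.to_qkv(tokens).chunk(3, dim=-1)
        q = q.transpose(1, 2).reshape(b, d, w, w)         # Reshape back to cubes
        k = k.transpose(1, 2).reshape(b, d, w, w)
        v = v.transpose(1, 2).reshape(b, d, w, w)
        attn = self.spa(q) * self.spe(k)                  # step 4): A = Q-hat x K-hat
        v = self.conv3d(v.unsqueeze(1)).squeeze(1)        # preliminary 3D conv on V
        return attn * v, attn                             # step 5): X = A * f(V)
```

Returning the attention map alongside the feature lets the later encoders fuse it, as the second and third methods require.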
The second method comprises the following steps:
A) mapping the first space-spectrum feature into three matrices $Q_2$, $K_2$, $V_2$ through the multi-layer perceptron module of the second encoder;
B) processing $Q_2$ with a spatial attention mechanism to extract the spatial feature $\hat{Q}_2$;
C) processing $K_2$ with a spectral attention mechanism to extract the spectral feature $\hat{K}_2$;
D) taking the product of $\hat{Q}_2$ and $\hat{K}_2$ to obtain the second space-spectrum self-attention parameter $A_2 = \hat{Q}_2 \otimes \hat{K}_2$;
E) concatenating and fusing the first and second space-spectrum self-attention parameters (to make better use of the space-spectrum self-attention obtained under different fields of view, the two parameters are integrated through concatenation and fusion), reducing the dimension of the fused result with a convolution layer to obtain the first fused self-attention parameter, and then, based on the self-attention mechanism, extracting the second space-spectrum feature $X_2$ from the matrix $V_2$ according to the first fused self-attention parameter;
the procedure of step E) is expressed as

$$X_2 = f_{\mathrm{red}}\big(\mathrm{Concat}(A_1, A_2)\big) \odot f_{b\times b\times b}(V_2)$$

where $f_{\mathrm{red}}(\cdot)$ denotes the dimension-reducing convolution (in practice a 3D convolution) applied to the concatenated result, $f_{b\times b\times b}(V_2)$ denotes a 3D convolution applied to the regions of $V_2$ with receptive field b×b×b, b×b×b being the receptive field size of the second encoder, and $\mathrm{Concat}(\cdot)$ denotes the concatenation operation (a sketch of this fusion is given below);
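A minimal sketch of the fused self-attention in step E), reusing the attention maps returned by the encoder sketch above. Stacking the maps along a new axis and using a 1×1×1 3D convolution as the dimension-reducing layer are assumptions; the patent only states that a convolution layer performs the reduction.

```python
import torch
import torch.nn as nn

class FusedAttention(nn.Module):
    """Step E): concatenate earlier attention maps and reduce back to one map."""
    def __init__(self, n_maps):
        super().__init__()
        # 1x1x1 3D convolution as the assumed dimension-reducing layer
        self.reduce = nn.Conv3d(n_maps, 1, kernel_size=1)

    def forward(self, attn_maps):                    # list of (B, d, w, w) maps
        stacked = torch.stack(attn_maps, dim=1)      # (B, n_maps, d, w, w)
        return self.reduce(stacked).squeeze(1)       # back to (B, d, w, w)

# usage sketch: fused = FusedAttention(2)([a1, a2]); x2 = fused * f_bbb(v2)
```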
The third method comprises the following steps:
i) mapping the second space-spectrum feature into three matrices $Q_3$, $K_3$, $V_3$ through the multi-layer perceptron module of the third encoder;
ii) processing $Q_3$ with a spatial attention mechanism to extract the spatial feature $\hat{Q}_3$;
iii) processing $K_3$ with a spectral attention mechanism to extract the spectral feature $\hat{K}_3$;
iv) taking the product of $\hat{Q}_3$ and $\hat{K}_3$ to obtain the third space-spectrum self-attention parameter $A_3 = \hat{Q}_3 \otimes \hat{K}_3$;
v) concatenating and fusing the first, second and third space-spectrum self-attention parameters (the three parameters are integrated for the same reason as in the corresponding step of the second method), reducing the dimension of the fused result with a convolution layer to obtain the second fused self-attention parameter, and then, based on the self-attention mechanism, extracting the third space-spectrum feature $X_3$ from the matrix $V_3$ according to the second fused self-attention parameter;
the procedure of step v) is expressed as

$$X_3 = f_{\mathrm{red}}\big(\mathrm{Concat}(A_1, A_2, A_3)\big) \odot f_{c\times c\times c}(V_3)$$

where $f_{\mathrm{red}}(\cdot)$ denotes the dimension-reducing convolution applied to the concatenated result, and $f_{c\times c\times c}(V_3)$ denotes a 3D convolution applied to the regions of $V_3$ with receptive field c×c×c, c×c×c being the receptive field size of the third encoder; a > b > c.
Further, in S5, trainable weights are used to dynamically weight and fuse the first, second and third prediction results (a sketch is given below).
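A minimal sketch of the trainable weighted fusion in S5. Normalizing the learned weights with a softmax is an assumption, chosen so that the fused result remains a convex combination of the three predicted probability vectors; the patent only states that the weights are trainable.

```python
import torch
import torch.nn as nn

class WeightedFusion(nn.Module):
    """S5: fuse three class-probability vectors with trainable weights."""
    def __init__(self, n_heads=3):
        super().__init__()
        self.w = nn.Parameter(torch.zeros(n_heads))   # learned jointly with the network

    def forward(self, preds):                         # list of (B, n_classes) tensors
        weights = torch.softmax(self.w, dim=0)        # keep the fusion convex
        return sum(wi * p for wi, p in zip(weights, preds))
```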
To verify the effectiveness of the proposed method, several experiments were carried out. All experiments used the In-vivo Human Brain HSI Dataset and the BloodCell HSI Dataset, which are described in turn below:
(1) In-vivo Human Brain HSI Dataset (Brain HSI dataset): the dataset was collected jointly by the University Hospital Southampton (UHS) and the University Hospital Doctor Negrín of Las Palmas de Gran Canaria, Spain (UHDRN). The acquisition system is built around a VNIR A-Series hyperspectral camera. The camera is based on push-broom scanning and uses a silicon CCD detector array; it has a minimum frame rate of 90 frames per second, a spectral range of 400-1000 nm and a spectral resolution of 2-3 nm, capturing 826 spectral bands over 1004 spatial pixels per line. The subjects were 16 adult patients undergoing craniotomy for brain tumors, yielding 26 hyperspectral images that cover four categories: background, normal tissue, tumor and blood vessels. In the experiments reported here, the hyperspectral images containing all four categories were selected, comprising 9 images from 6 patients.
(2) BloodCell HSI Dataset: the dataset was acquired with a system combining a microscope, a silicon charge-coupled device and a liquid crystal tunable filter (LCTF). It contains two blood cell images, named Bloodcell1-3 and Bloodcell2-2. Bloodcell1-3 has a size of 973×799 pixels and Bloodcell2-2 a size of 462×451 pixels; both contain 33 bands. Each hyperspectral image contains 3 categories: red blood cells, white blood cells and background. For comparison, the convolutional neural network methods HybridSN, SSRN and DBDA and the Transformer deep learning methods spectra-wise ViT, SSFTT and CTMixer were selected as baseline algorithms. Each algorithm was repeated 10 times, and the Overall Accuracy (OA), Average Accuracy (AA) and Kappa Coefficient (KC) are reported as mean ± standard deviation (STD) to compare the classification performance of the algorithms comprehensively.
To verify the effectiveness of the two proposed strategies, ablation experiments were performed on the Brain HSI dataset. The results show that after adding the space-spectrum self-attention strategy alone, OA, AA and KC increase by 6.27%, 8.55% and 6.63%, and after adding the multi-view prediction fusion strategy alone they increase by 4.82%, 5.12% and 5.61%. When both strategies are added to the network, the predictive capability of the model improves further, with OA, AA and KC increasing by 10.67%, 9.99% and 8.36%. This indicates that the multi-view space-spectrum information strategy successfully focuses on the key space-spectrum feature regions and extracts more discriminative space-spectrum features, while the multi-view prediction fusion strategy organically fuses diagnostic predictions under different fields of view and thereby improves the classification performance. These experiments separately verify that the two added strategies, multi-view space-spectrum information fusion and multi-view prediction fusion, improve the classification performance of the network, and show that they work well when applied to the network together.
To further verify the effectiveness of the proposed algorithm, the convolutional neural network methods HybridSN, SSRN and DBDA and the Transformer deep learning methods spectra-wise ViT, SSFTT and CTMixer were selected as comparison algorithms. Each algorithm was repeated 10 times and the Overall Accuracy (OA) is reported as mean ± standard deviation (STD) to compare the classification performance of the algorithms. The experimental setup on both datasets is shown in Table 1.
Table 1. Experimental setup on the Brain and BloodCell HSI datasets (the table is provided as an image in the original document).
On the Brain HSI dataset, HybridSN, SSRN, DBDA, ViT, CTMixer, SSFTT and the proposed method achieve classification accuracies of 68.32%, 72.25%, 68.99%, 51.75%, 71.07%, 65.66% and 82.25% respectively. The gain arises because the spatial and spectral attention extraction operations in the invention successfully identify the important spatial regions and spectral bands, eliminating feature redundancy; in addition, the prediction probabilities under the multi-view encoders are weighted and fused with adaptively learned weights, which improves the predictive capability of the model. On the BloodCell HSI dataset, HybridSN, SSRN, DBDA, ViT, CTMixer, SSFTT and the proposed method achieve 89.21%, 88.71%, 88.25%, 77.13%, 90.45%, 89.50% and 91.74% classification accuracy respectively, so the invention again obtains the best classification result. This is because the proposed space-spectrum information extraction block still captures the critical space-spectrum information, improves the classification ability of the model and fuses diagnostic information under different fields of view. The experiments demonstrate that the proposed method is suitable for medical hyperspectral images acquired by different instruments and imaging modes, generalizes well and saves model development cost.
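For reference, the metrics reported above (OA, AA and KC, as mean ± STD over 10 repeated runs) can be computed with a sketch like the following; the availability of scikit-learn and all function names are assumptions, and AA is taken as the mean per-class recall.

```python
import numpy as np
from sklearn.metrics import accuracy_score, balanced_accuracy_score, cohen_kappa_score

def run_metrics(y_true, y_pred):
    """OA, AA and KC for one run."""
    oa = accuracy_score(y_true, y_pred)
    aa = balanced_accuracy_score(y_true, y_pred)   # mean per-class recall = AA
    kc = cohen_kappa_score(y_true, y_pred)
    return oa, aa, kc

def summarize(runs):
    """Mean and STD over repeated runs, as reported in the experiments."""
    arr = np.array(runs)                           # shape (n_runs, 3)
    return arr.mean(axis=0), arr.std(axis=0)
```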

Claims (2)

1. A medical hyperspectral image classification method based on a space-spectrum self-attention mechanism, characterized by comprising the following steps:
S1: normalizing the obtained original hyperspectral image to obtain a normalized image;
S2: cropping the normalized image with the target sample as the center to obtain a cropped image;
S3: inputting the cropped image into a processing unit for processing: the processing unit consists of three Transformer encoders with the same structure; the sizes of the internal convolution kernels of the three encoders differ, and the encoders are denoted first encoder, second encoder and third encoder in order of internal convolution kernel size from large to small; the first encoder processes the cropped image by a first method to obtain a first space-spectrum feature, the second encoder processes the first space-spectrum feature by a second method to obtain a second space-spectrum feature, and the third encoder processes the second space-spectrum feature by a third method to obtain a third space-spectrum feature;
S4: inputting the first space-spectrum feature into a first classifier, which outputs a first prediction result; inputting the second space-spectrum feature into a second classifier, which outputs a second prediction result; and inputting the third space-spectrum feature into a third classifier, which outputs a third prediction result;
S5: weighting and fusing the first, second and third prediction results to obtain a final prediction result; the final prediction result comprises a plurality of probabilities, each corresponding to one medical semantic category;
each classifier predicts, from a space-spectrum feature, the probabilities that the target sample belongs to the different medical semantic categories; each prediction result comprises a plurality of probabilities, each corresponding to one medical semantic category;
The first method comprises the following steps:
1) mapping the cropped image into three matrices $Q_1$, $K_1$, $V_1$ through the multi-layer perceptron module of the first encoder;
2) processing $Q_1$ with a spatial attention mechanism to extract the spatial feature $\hat{Q}_1$;
3) processing $K_1$ with a spectral attention mechanism to extract the spectral feature $\hat{K}_1$;
4) taking the product of $\hat{Q}_1$ and $\hat{K}_1$ to obtain the first space-spectrum self-attention parameter $A_1 = \hat{Q}_1 \otimes \hat{K}_1$;
5) based on the self-attention mechanism, extracting the first space-spectrum feature $X_1$ from the matrix $V_1$ according to the first space-spectrum self-attention parameter;
the procedure of step 5) is expressed as

$$X_1 = A_1 \odot f_{a\times a\times a}(V_1)$$

where $f_{a\times a\times a}(V_1)$ denotes a 3D convolution applied to the regions of $V_1$ with receptive field a×a×a, a×a×a being the receptive field size of the first encoder;
The second method comprises the following steps:
A) mapping the first space-spectrum feature into three matrices $Q_2$, $K_2$, $V_2$ through the multi-layer perceptron module of the second encoder;
B) processing $Q_2$ with a spatial attention mechanism to extract the spatial feature $\hat{Q}_2$;
C) processing $K_2$ with a spectral attention mechanism to extract the spectral feature $\hat{K}_2$;
D) taking the product of $\hat{Q}_2$ and $\hat{K}_2$ to obtain the second space-spectrum self-attention parameter $A_2 = \hat{Q}_2 \otimes \hat{K}_2$;
E) concatenating and fusing the first and second space-spectrum self-attention parameters, reducing the dimension of the fused result with a convolution layer to obtain the first fused self-attention parameter, and then, based on the self-attention mechanism, extracting the second space-spectrum feature $X_2$ from the matrix $V_2$ according to the first fused self-attention parameter;
the procedure of step E) is expressed as

$$X_2 = f_{\mathrm{red}}\big(\mathrm{Concat}(A_1, A_2)\big) \odot f_{b\times b\times b}(V_2)$$

where $f_{\mathrm{red}}(\cdot)$ denotes the dimension-reducing convolution applied to the concatenated result, $f_{b\times b\times b}(V_2)$ denotes a 3D convolution applied to the regions of $V_2$ with receptive field b×b×b, b×b×b being the receptive field size of the second encoder, and $\mathrm{Concat}(\cdot)$ denotes the concatenation operation;
The third method comprises the following steps:
i) mapping the second space-spectrum feature into three matrices $Q_3$, $K_3$, $V_3$ through the multi-layer perceptron module of the third encoder;
ii) processing $Q_3$ with a spatial attention mechanism to extract the spatial feature $\hat{Q}_3$;
iii) processing $K_3$ with a spectral attention mechanism to extract the spectral feature $\hat{K}_3$;
iv) taking the product of $\hat{Q}_3$ and $\hat{K}_3$ to obtain the third space-spectrum self-attention parameter $A_3 = \hat{Q}_3 \otimes \hat{K}_3$;
v) concatenating and fusing the first, second and third space-spectrum self-attention parameters, reducing the dimension of the fused result with a convolution layer to obtain the second fused self-attention parameter, and then, based on the self-attention mechanism, extracting the third space-spectrum feature $X_3$ from the matrix $V_3$ according to the second fused self-attention parameter;
the procedure of step v) is expressed as

$$X_3 = f_{\mathrm{red}}\big(\mathrm{Concat}(A_1, A_2, A_3)\big) \odot f_{c\times c\times c}(V_3)$$

where $f_{\mathrm{red}}(\cdot)$ denotes the dimension-reducing convolution applied to the concatenated result, and $f_{c\times c\times c}(V_3)$ denotes a 3D convolution applied to the regions of $V_3$ with receptive field c×c×c, c×c×c being the receptive field size of the third encoder; a > b > c.
2. The medical hyperspectral image classification method based on the space-spectrum self-attention mechanism according to claim 1, characterized in that: in step S5, trainable weights are used to dynamically weight and fuse the first prediction result, the second prediction result and the third prediction result.
CN202310152996.7A 2023-02-22 2023-02-22 Medical hyperspectral image classification method based on space-spectrum self-attention mechanism Pending CN116229163A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310152996.7A CN116229163A (en) 2023-02-22 2023-02-22 Medical hyperspectral image classification method based on space-spectrum self-attention mechanism

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310152996.7A CN116229163A (en) 2023-02-22 2023-02-22 Medical hyperspectral image classification method based on space-spectrum self-attention mechanism

Publications (1)

Publication Number Publication Date
CN116229163A true CN116229163A (en) 2023-06-06

Family

ID=86576418

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310152996.7A Pending CN116229163A (en) 2023-02-22 2023-02-22 Medical hyperspectral image classification method based on space-spectrum self-attention mechanism

Country Status (1)

Country Link
CN (1) CN116229163A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116740474A (en) * 2023-08-15 2023-09-12 南京信息工程大学 Remote sensing image classification method based on anchoring stripe attention mechanism



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination