CN116229163A - Medical hyperspectral image classification method based on space-spectrum self-attention mechanism - Google Patents
- Publication number
- CN116229163A (application number CN202310152996.7A)
- Authority
- CN
- China
- Prior art keywords
- space
- spectrum
- self
- matrix
- processing
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/764—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
- G06V10/765—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects using rules for classification or partitioning the feature space
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/10—Image acquisition
- G06V10/16—Image acquisition using multiple overlapping images; Image stitching
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/20—Image preprocessing
- G06V10/26—Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/77—Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
- G06V10/80—Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
- G06V10/806—Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/82—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/10—Terrestrial scenes
- G06V20/194—Terrestrial scenes using hyperspectral data, i.e. more or other wavelengths than RGB
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02A—TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
- Y02A40/00—Adaptation technologies in agriculture, forestry, livestock or agroalimentary production
- Y02A40/10—Adaptation technologies in agriculture, forestry, livestock or agroalimentary production in agriculture
Abstract
The invention discloses a medical hyperspectral image classification method based on a space-spectrum self-attention mechanism, which comprises the following steps. S1: normalize the obtained original hyperspectral image. S2: crop the normalized image with the target sample at the center. S3: input the cropped image into a processing unit to obtain a first, a second and a third space-spectrum feature. S4: process the three space-spectrum features with three classifiers to obtain three prediction results. S5: weight and fuse the three prediction results to obtain the final prediction result. The beneficial technical effect of the invention is a medical hyperspectral image classification method based on a space-spectrum self-attention mechanism that extracts the corresponding space-spectrum features from multiple fields of view and thereby improves classification accuracy.
Description
Technical Field
The invention relates to a hyperspectral image processing technology, in particular to a medical hyperspectral image classification method based on a space-spectrum self-attention mechanism.
Background
Hyperspectral imaging is an advanced technique for extracting image spatial information and spectral information: it simultaneously acquires two-dimensional spatial information and one-dimensional spectral information of the photographed object, covering spectral ranges such as visible light, infrared and ultraviolet. In recent years, because hyperspectral imaging can provide diagnostic information about the physiological, morphological and biochemical composition of tissue, it offers fine spectral features for histological research and has received widespread attention as a non-invasive auxiliary diagnostic means; it has been successfully applied to non-invasive disease diagnosis and monitoring, image-guided minimally invasive surgery, drug dose evaluation and the like. With the rapid development of precision medicine, how to design an efficient and accurate diagnostic algorithm for the high dimensionality, high redundancy and "image and spectrum in one" characteristics of hyperspectral medical images has become a research hotspot in medical hyperspectral image analysis.
Traditional medical hyperspectral image classification methods generally extract hand-crafted features and then classify them with a classifier; however, such methods cannot extract deep features, which greatly limits their performance. In recent years, deep learning has begun to be applied to medical hyperspectral image processing as an end-to-end approach. Deep learning relies on data to learn low-, mid- and high-level features of images. Among deep models, convolutional neural networks perform well on diagnostic tasks thanks to their local receptive fields and translation invariance. However, hyperspectral images contain many bands, and a traditional convolutional neural network can neither mine the effective relational information between long-distance bands nor preserve the original spectral sequence relation. This limits the performance of convolutional-neural-network methods on medical hyperspectral images.
Recently, Transformers have received great attention for their strong global modeling capability. The self-attention mechanism in the Transformer can capture relations between long-distance spectral bands and better model the spectral sequence, and it has achieved a certain effect in the medical hyperspectral field. However, because acquisition equipment, operating procedures and preprocessing methods (spectral correction, noise reduction, unmixing and the like) differ, the spectral and spatial resolutions of medical hyperspectral images often differ, and the spectral curves of the photographed biological tissues vary considerably. Each specific diagnostic task therefore often requires a differently designed algorithm, and when a Transformer algorithm is applied across different diagnostic tasks, its performance struggles to meet further accuracy requirements. In addition, traditional deep-learning algorithms for hyperspectral image classification usually produce a single output prediction and cannot combine information from multiple fields of view into a comprehensive prediction of the image class, which creates a bottleneck for model performance.
Disclosure of Invention
Aiming at the problems in the background art, the invention provides a medical hyperspectral image classification method based on a space-spectrum self-attention mechanism, which comprises the following steps:
s1: normalizing the obtained original hyperspectral image to obtain a normalized image;
s2: cutting the normalized image by taking the target sample as a center to obtain a cut image;
s3: inputting the cut image into a processing unit for processing: the processing unit consists of three convertors encoders with the same structure, and the sizes of the internal convolution kernels of the three convertors are different; the three transducer encoders are respectively marked as a first encoder, a second encoder and a third encoder according to the size of the internal convolution kernel from large to small; the first encoder processes the cut image according to a first method to obtain a first space-spectrum characteristic, the second encoder processes the first space-spectrum characteristic according to a second method to obtain a second space-spectrum characteristic, and the third encoder processes the second space-spectrum characteristic according to a third method to obtain a third space-spectrum characteristic;
s4: inputting the first space-spectrum characteristic into a first classifier, outputting a first prediction result by the first classifier, inputting the second space-spectrum characteristic into a second classifier, outputting a second prediction result by the second classifier, inputting the third space-spectrum characteristic into a third classifier, and outputting a third prediction result by the third classifier;
s5: weighting and fusing the first prediction result, the second prediction result and the third prediction result to obtain a final prediction result; the final prediction result comprises a plurality of probabilities, and each probability corresponds to one medical semantic category;
the classifier can predict the probability that the target sample belongs to different medical semantic categories according to the space-spectrum characteristics (the classifier and the encoder are trained in advance when the classifier and the encoder are trained as an integral network); the prediction result comprises a plurality of probabilities, and each probability corresponds to one medical semantic category;
the first method comprises the following steps:
1) map the cropped image into three matrices Q1, K1 and V1 through the multi-layer perceptron module of the first encoder;
2) process matrix Q1 with the spatial attention mechanism to extract the spatial feature F1^spa from Q1;
3) process matrix K1 with the spectral attention mechanism to extract the spectral feature F1^spe from K1;
4) perform a dot product of F1^spa and F1^spe to obtain the first space-spectrum self-attention parameter A1^ss;
5) based on the self-attention mechanism, extract the first space-spectrum feature F1^ss from matrix V1 according to the first space-spectrum self-attention parameter; step 5) is expressed as:
F1^ss = A1^ss ⊙ f_{a×a×a}(V1)
where f_{a×a×a}(V1) denotes a 3D convolution applied to regions of V1 with a receptive field of size a×a×a, a×a×a being the receptive field size of the first encoder, and ⊙ denotes the dot-product operation (note that the dot product in the formula of step 5) and the dot product in step 4) are two different dot-product operations); f_{a×a×a}(V1) essentially performs preliminary feature extraction on V1, and the dot product with A1^ss then extracts the first space-spectrum feature;
the second method comprises the following steps:
A) map the first space-spectrum feature into three matrices Q2, K2 and V2 through the multi-layer perceptron module of the second encoder;
B) process matrix Q2 with the spatial attention mechanism to extract the spatial feature F2^spa from Q2;
C) process matrix K2 with the spectral attention mechanism to extract the spectral feature F2^spe from K2;
D) perform a dot product of F2^spa and F2^spe to obtain the second space-spectrum self-attention parameter A2^ss;
E) splice and fuse the first and second space-spectrum self-attention parameters, reduce the dimension of the spliced result with a convolution layer to obtain the first fused self-attention parameter, and then, based on the self-attention mechanism, extract the second space-spectrum feature F2^ss from matrix V2 according to the first fused self-attention parameter; step E) is expressed as:
F2^ss = Conv(Concat(A1^ss, A2^ss)) ⊙ f_{b×b×b}(V2)
where Conv(Concat(A1^ss, A2^ss)) denotes the dimension-reduction processing of the spliced and fused A1^ss and A2^ss, f_{b×b×b}(V2) denotes a 3D convolution applied to regions of V2 with a receptive field of size b×b×b, b×b×b being the receptive field size of the second encoder, and Concat(·) denotes the splicing-fusion process (note that the dot product in the formula of step E) and the dot product in step D) are two different dot-product operations); f_{b×b×b}(V2) performs preliminary feature extraction on V2, and the dot product with the first fused self-attention parameter then extracts the second space-spectrum feature;
the third method comprises the following steps:
one) map the second space-spectrum feature into three matrices Q3, K3 and V3 through the multi-layer perceptron module of the third encoder;
two) process matrix Q3 with the spatial attention mechanism to extract the spatial feature F3^spa from Q3;
three) process matrix K3 with the spectral attention mechanism to extract the spectral feature F3^spe from K3;
four) perform a dot product of F3^spa and F3^spe to obtain the third space-spectrum self-attention parameter A3^ss;
five) splice and fuse the first, second and third space-spectrum self-attention parameters, reduce the dimension of the spliced result with a convolution layer to obtain the second fused self-attention parameter, and then, based on the self-attention mechanism, extract the third space-spectrum feature F3^ss from matrix V3 according to the second fused self-attention parameter; step five) is expressed as:
F3^ss = Conv(Concat(A1^ss, A2^ss, A3^ss)) ⊙ f_{c×c×c}(V3)
where Conv(Concat(A1^ss, A2^ss, A3^ss)) denotes the dimension-reduction processing of the spliced and fused parameters, f_{c×c×c}(V3) denotes a 3D convolution applied to regions of V3 with a receptive field of size c×c×c, c×c×c being the receptive field size of the third encoder, and a > b > c (note that the dot product in the formula of step five) and the dot product in step four) are two different dot-product operations); f_{c×c×c}(V3) performs preliminary feature extraction on V3, and the dot product with the second fused self-attention parameter then extracts the third space-spectrum feature.
The principle of the scheme is as follows:
1) To meet the processing requirements of medical hyperspectral images of different information densities acquired by different instruments, and to extract feature information under different field-of-view conditions, three Transformer encoders with different fields of view are designed: because their internal convolution kernel sizes differ, the three encoders have three field-of-view sizes and can extract feature information under three field-of-view conditions. 2) An existing Transformer encoder generally processes the three mapped matrices Q, K, V directly with the self-attention mechanism; unlike the prior art, the invention first extracts spatial features from the Q matrix with a spatial attention mechanism, then extracts spectral features from the K matrix with a spectral attention mechanism, and constructs a space-spectrum self-attention parameter as the dot product of the spatial and spectral features; the space-spectrum feature is then extracted from the corresponding V matrix according to this self-attention parameter, based on the self-attention mechanism. 3) Except for the first encoder, which has the largest field of view, the other two encoders use fused self-attention parameters: the first fused self-attention parameter is obtained by splicing and fusing the first and second space-spectrum self-attention parameters and then reducing the dimension, and the second fused self-attention parameter is obtained by splicing and fusing the first, second and third space-spectrum self-attention parameters and then reducing the dimension; compared with the prior art, this processing effectively combines the space-spectrum self-attention of different scales obtained under different fields of view and extracts key space-spectrum information at multiple scales. 4) The prediction results of the three classifiers are weighted and fused into the final prediction result; compared with the prior art, this fuses the prediction probabilities under multiple fields of view more effectively, making the prediction more accurate.
Preferably, in the step S5, a trainable weight is used to dynamically weight and fuse the first prediction result, the second prediction result and the third prediction result.
The beneficial technical effects of the invention are as follows: the medical hyperspectral image classification method based on the space-spectrum self-attention mechanism is provided, corresponding space-spectrum characteristics can be extracted from multiple view angles by the scheme, and finally the classification accuracy can be improved.
Detailed Description
A medical hyperspectral image classification method based on a space-spectrum self-attention mechanism comprises the following steps:
s1: for a pair ofNormalizing the obtained original hyperspectral image to obtain a normalized image; in particular, the smallest pixel value V in the image is found min Maximum pixel value V max For any pixel value V in the image, the normalized pixel value V norm Can be expressed as V norm =(V-V min )/(V max -V min );
S2: cutting the normalized image by taking the target sample as a center to obtain a cut image; in the specific implementation, the target pixel is taken as the center, and the image blocks with the length and the width of m pixels are cut along the whole spectrum dimension;
s3: inputting the cut image into a processing unit for processing: the processing unit consists of three convertors encoders with the same structure, and the sizes of the internal convolution kernels of the three convertors are different; the three transducer encoders are respectively marked as a first encoder, a second encoder and a third encoder according to the size of the internal convolution kernel from large to small; the first encoder processes the cut image according to a first method to obtain a first space-spectrum characteristic, the second encoder processes the first space-spectrum characteristic according to a second method to obtain a second space-spectrum characteristic, and the third encoder processes the second space-spectrum characteristic according to a third method to obtain a third space-spectrum characteristic;
s4: inputting the first space-spectrum characteristic into a first classifier, outputting a first prediction result by the first classifier, inputting the second space-spectrum characteristic into a second classifier, outputting a second prediction result by the second classifier, inputting the third space-spectrum characteristic into a third classifier, and outputting a third prediction result by the third classifier;
s5: weighting and fusing the first prediction result, the second prediction result and the third prediction result to obtain a final prediction result; the final prediction result comprises a plurality of probabilities, and each probability corresponds to one medical semantic category;
the classifier can predict the probability that the target sample belongs to different medical semantic categories according to the space-spectrum characteristics; the prediction result comprises a plurality of probabilities, and each probability corresponds to one medical semantic category;
the first method comprises the following steps:
1) Map the cropped image into three matrices Q1, K1 and V1 through the multi-layer perceptron module of the first encoder; in a specific implementation, the input features are mapped into the three matrices Q1, K1, V1 through layer normalization, multi-layer perceptron and reshape operations, where w is the spatial scale of the feature and d is the number of bands it contains (the corresponding matrices in the second and third methods are obtained similarly);
2) Process matrix Q1 with the spatial attention mechanism to extract the spatial feature F1^spa from Q1; in a specific implementation, the channel-domain features of Q1 are compressed by global max pooling and global average pooling, the two resulting features are spliced along the channel dimension and converted into a single-channel feature by a convolution layer, which is then activated by a Sigmoid function to obtain the spatial attention A1^spa; Q1 is then dot-multiplied with the spatial attention, residual-connected, and activated with a ReLU function to obtain the spatial feature F1^spa (the corresponding steps in the second and third methods are similar);
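The spatial-attention branch can be sketched in numpy as below. This is a minimal illustration, not the patent's implementation: the learned 2-to-1-channel convolution is stood in for by a two-element weight vector `conv_w`, which is an assumed simplification.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def spatial_attention(q, conv_w):
    """Sketch of the spatial-attention branch: compress Q along the
    band/channel axis with max- and mean-pooling, fuse the two maps
    with a stand-in 1x1 conv (weights conv_w, shape (2,)), gate Q
    with the sigmoid map, then residual connection + ReLU.
    q: patch features of shape (w, w, d)."""
    max_map = q.max(axis=-1)                # (w, w) channel-wise max pool
    mean_map = q.mean(axis=-1)              # (w, w) channel-wise mean pool
    fused = conv_w[0] * max_map + conv_w[1] * mean_map  # 2 -> 1 channel
    attn = sigmoid(fused)[..., None]        # spatial attention A^spa
    return np.maximum(q * attn + q, 0.0)    # dot product, residual, ReLU

q = np.random.randn(4, 4, 8)
f_spa = spatial_attention(q, np.array([0.5, 0.5]))
print(f_spa.shape)  # (4, 4, 8)
```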
3) Process matrix K1 with the spectral attention mechanism to extract the spectral feature F1^spe from K1; in a specific implementation, max pooling and mean pooling are used to compress the spatial feature information in each band of K1, the compressed features are then mapped by a multi-layer perceptron to improve their representation, and finally the compressed information obtained by the two compression paths is added and fused on each band and activated with a Sigmoid function to obtain the spectral attention A1^spe; K1 is then dot-multiplied with the spectral attention, residual-connected, and activated with a ReLU function to obtain the spectral feature F1^spe (the corresponding steps in the second and third methods are similar);
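The spectral-attention branch can be sketched similarly. Again this is only an illustration under assumptions: the shared two-layer perceptron is represented by explicit weight matrices `w1`, `w2`, which are hypothetical stand-ins for learned parameters.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def spectral_attention(k, w1, w2):
    """Sketch of the spectral-attention branch: compress each band's
    spatial information with max- and mean-pooling, pass both vectors
    through a shared two-layer perceptron (w1: (d, h), w2: (h, d)),
    add the two paths, sigmoid -> per-band attention, then gate K
    with residual connection + ReLU.  k: (w, w, d)."""
    max_vec = k.max(axis=(0, 1))            # (d,) per-band max pool
    mean_vec = k.mean(axis=(0, 1))          # (d,) per-band mean pool
    mlp = lambda v: np.maximum(v @ w1, 0.0) @ w2
    attn = sigmoid(mlp(max_vec) + mlp(mean_vec))  # spectral attention A^spe
    return np.maximum(k * attn + k, 0.0)    # dot product, residual, ReLU

k = np.random.randn(4, 4, 8)
out = spectral_attention(k, 0.1 * np.random.randn(8, 4), 0.1 * np.random.randn(4, 8))
print(out.shape)  # (4, 4, 8)
```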
4) Perform a dot product of F1^spa and F1^spe to obtain the first space-spectrum self-attention parameter A1^ss;
5) based on the self-attention mechanism, extract the first space-spectrum feature F1^ss from matrix V1 according to the first space-spectrum self-attention parameter; step 5) is expressed as:
F1^ss = A1^ss ⊙ f_{a×a×a}(V1)
where f_{a×a×a}(V1) denotes a 3D convolution applied to regions of V1 with a receptive field of size a×a×a, a×a×a being the receptive field size of the first encoder, and ⊙ denotes the dot-product operation;
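Steps 4) and 5) together can be sketched as follows. Note the assumptions: the learned 3D convolution f_{a×a×a} is stood in for by a naive a×a×a mean filter so the example stays self-contained; the real encoder would use trained kernels.

```python
import numpy as np

def mean_filter3d(v, a):
    """Naive a x a x a mean filter as a stand-in for the learned 3D
    convolution f_{a×a×a} (same-size output via edge padding)."""
    p = a // 2
    vp = np.pad(v, p, mode="edge")
    out = np.zeros_like(v, dtype=float)
    for i, j, k in np.ndindex(v.shape):
        out[i, j, k] = vp[i:i + a, j:j + a, k:k + a].mean()
    return out

def space_spectrum_feature(f_spa, f_spe, v, a=3):
    """Dot product of spatial and spectral features gives the
    space-spectrum self-attention parameter A^ss (step 4); gating the
    convolved V with A^ss extracts the space-spectrum feature (step 5)."""
    attn_ss = f_spa * f_spe                 # A^ss = F^spa ⊙ F^spe
    return attn_ss * mean_filter3d(v, a)    # F^ss = A^ss ⊙ f(V)

v = np.random.randn(4, 4, 4)
feat = space_spectrum_feature(np.ones_like(v), np.ones_like(v), v)
print(feat.shape)  # (4, 4, 4)
```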
the second method comprises the following steps:
A) map the first space-spectrum feature into three matrices Q2, K2 and V2 through the multi-layer perceptron module of the second encoder;
B) process matrix Q2 with the spatial attention mechanism to extract the spatial feature F2^spa from Q2;
C) process matrix K2 with the spectral attention mechanism to extract the spectral feature F2^spe from K2;
D) perform a dot product of F2^spa and F2^spe to obtain the second space-spectrum self-attention parameter A2^ss;
E) splice and fuse the first and second space-spectrum self-attention parameters (to make better use of the space-spectrum self-attention obtained under different fields of view, the two parameters are integrated by the splicing-fusion process), then reduce the dimension of the spliced result with a convolution layer to obtain the first fused self-attention parameter, and then, based on the self-attention mechanism, extract the second space-spectrum feature F2^ss from matrix V2 according to the first fused self-attention parameter; step E) is expressed as:
F2^ss = Conv(Concat(A1^ss, A2^ss)) ⊙ f_{b×b×b}(V2)
where Conv(Concat(A1^ss, A2^ss)) denotes the dimension-reduction processing (a 3D convolution in the specific operation) of the spliced and fused A1^ss and A2^ss, f_{b×b×b}(V2) denotes a 3D convolution applied to regions of V2 with a receptive field of size b×b×b, b×b×b being the receptive field size of the second encoder, and Concat(·) denotes the splicing-fusion process;
the third method comprises the following steps:
first) mapping the second space-spectrum feature into three matrices Q by a multi-layer perceptron module of a third encoder 3 、K 3 、V 3 ;
Two) matrix Q based on spatial attention mechanism 3 Processing is performed from matrix Q 3 Extracting spatial features from the image
Three) matrix K based on spectrum attention mechanism 3 Processing is performed from matrix K 3 Extracting spectral features from/>
Fourth) will and />Dot product was performed to obtain a third space-spectrum self-attention parameter->
Fifth), the first space-spectrum self-attention parameter, the second space-spectrum self-attention parameter and the third space-spectrum self-attention parameter are enteredPerforming row-splice fusion processing (integrating three space-spectrum self-attention parameters based on the same reason as corresponding processing in the second method), performing dimension reduction processing on the spliced and fused result by using one convolution layer to obtain a second fusion self-attention parameter, and then based on a self-attention mechanism, performing matrix V according to the second fusion self-attention parameter 3 Extracting a third space-spectrum characteristicThe process of step five) is represented as follows:
wherein the dimension-reduction term denotes the dimension-reduction processing applied to the splice-fused result of the three space-spectrum self-attention parameters, f c×c×c (V3) denotes 3D convolution processing applied to regions of matrix V3 with receptive field size c×c×c, c×c×c being the receptive field size of the third encoder; a > b > c.
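Taken together, the three methods describe a chain of encoder stages, each computing spatial attention on Q, spectral attention on K, their dot product as a space-spectrum self-attention parameter, and a fusion with the parameters of earlier stages before attending over V. A minimal NumPy sketch of one such stage is given below. It is an illustrative reconstruction, not the patent's implementation: the multi-layer perceptron mapping is reduced to a linear projection, the spatial and spectral attentions to simple squared-norm weightings, and the concatenation-plus-convolution reduction to a channel mean, with all shapes and weights hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def space_spectrum_attention(X, W_q, W_k, W_v, prev_params=()):
    """One encoder-stage sketch: X has shape (n_pixels, n_bands)."""
    Q, K, V = X @ W_q, X @ W_k, X @ W_v
    # spatial attention: weight each pixel (row) of Q
    spatial = softmax((Q * Q).sum(axis=1))            # (n_pixels,)
    Q_s = Q * spatial[:, None]
    # spectral attention: weight each band (column) of K
    spectral = softmax((K * K).sum(axis=0))           # (n_bands,)
    K_lam = K * spectral[None, :]
    # space-spectrum self-attention parameter via dot product
    A = softmax(Q_s @ K_lam.T / np.sqrt(K.shape[1]), axis=-1)
    # splice-fuse with the parameters of earlier stages, then reduce;
    # the channel mean stands in for the learned convolution layer
    stack = np.stack((*prev_params, A))               # (k, n, n)
    fused = stack.mean(axis=0)                        # (n, n)
    return fused @ V, A   # feature for the next stage, parameter to pass on

rng = np.random.default_rng(0)
X = rng.standard_normal((16, 8))
W_q, W_k, W_v = (rng.standard_normal((8, 8)) for _ in range(3))
F1, A1 = space_spectrum_attention(X, W_q, W_k, W_v)
F2, A2 = space_spectrum_attention(F1, W_q, W_k, W_v, prev_params=(A1,))
```

In the patent each encoder would have its own mapping weights and its own receptive field (a×a×a, b×b×b, c×c×c); sharing `W_q`, `W_k`, `W_v` across the two calls above is purely for brevity.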
Further, in S5, trainable weights are used to dynamically weight and fuse the first prediction result, the second prediction result and the third prediction result.
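The dynamic weighting of S5 can be sketched as follows. This is a minimal illustration under assumptions, not the patent's code: the weight logits are fixed here, whereas in training they would be learnable parameters updated by backpropagation, and they are softmax-normalized so the fused output remains a probability distribution.

```python
import numpy as np

def softmax1d(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def fuse_predictions(preds, logits_w):
    """Weighted fusion of per-encoder class-probability vectors.

    preds: list of (n_classes,) probability vectors; logits_w: trainable
    scalars (fixed here for illustration), softmax-normalized so the
    fused result still sums to one."""
    w = softmax1d(np.asarray(logits_w, dtype=float))
    return sum(wi * np.asarray(p, dtype=float) for wi, p in zip(w, preds))

# hypothetical predictions from the three classifiers
p1 = np.array([0.7, 0.2, 0.1])
p2 = np.array([0.5, 0.4, 0.1])
p3 = np.array([0.6, 0.3, 0.1])
final = fuse_predictions([p1, p2, p3], [0.0, 0.0, 0.0])  # equal weights
```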
To verify the effectiveness of the proposed method, a series of experiments was conducted. All experiments were performed on the In-vivo Human Brain HSI Dataset and the BloodCell HSI Dataset, each of which is described below:
(1) In-vivo Human Brain HSI Dataset (Brain HSI Dataset): the dataset was jointly collected by the University Hospital of Southampton (UHS) and the Spanish University Hospital Doctor Negrín in Las Palmas (UHDRN). The acquisition system consists of a VNIR A-Series camera. The camera is based on push-broom technology and uses a silicon CCD detector array, with a minimum frame rate of 90 frames/second, a spectral range of 400–1000 nm and a spectral resolution of 2–3 nm, capturing 826 spectral bands over 1004 spatial pixels per line. The acquisition subjects were 16 adult patients undergoing craniotomy for brain tumors, yielding 26 hyperspectral images covering four categories: background, normal, tumor and blood vessel. In the experiments herein, the hyperspectral images containing all four categories were selected, comprising 9 images from 6 patients.
(2) BloodCell HSI Dataset: the dataset was collected with a microscope integrated with a silicon charge-coupled device combined with a Liquid Crystal Tunable Filter (LCTF). The dataset contains two blood cell images, named Bloodcell1-3 and Bloodcell2-2. Bloodcell1-3 has size 973×799 and Bloodcell2-2 has size 462×451, both containing 33 bands. Each hyperspectral image contains 3 categories: red blood cells, white blood cells and background. As comparison methods, the convolutional neural network methods HybridSN, SSRN and DBDA and the Transformer deep learning methods spectra-wise ViT, SSFTT and CTMixer were selected as comparison algorithms. Each algorithm was repeated 10 times, and the Overall Accuracy (OA), Average Accuracy (AA) and Kappa Coefficient (KC) were reported as mean ± standard deviation (STD) to comprehensively compare and evaluate the classification performance of each algorithm.
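The three reported metrics can be computed from a confusion matrix as sketched below; this is the standard formulation of overall accuracy, average accuracy and Cohen's kappa, not code taken from the patent:

```python
import numpy as np

def classification_metrics(y_true, y_pred, n_classes):
    """Return (OA, AA, KC) from integer labels.

    Assumes every class appears at least once in y_true (otherwise the
    per-class recall in AA would divide by zero)."""
    C = np.zeros((n_classes, n_classes), dtype=float)
    for t, p in zip(y_true, y_pred):
        C[t, p] += 1
    n = C.sum()
    oa = np.trace(C) / n                          # overall accuracy
    aa = np.mean(np.diag(C) / C.sum(axis=1))      # mean per-class recall
    # expected agreement under chance, for Cohen's kappa
    pe = (C.sum(axis=0) * C.sum(axis=1)).sum() / n**2
    kc = (oa - pe) / (1 - pe)
    return oa, aa, kc

# toy example with three classes
oa, aa, kc = classification_metrics([0, 0, 1, 1, 2, 2],
                                    [0, 0, 1, 0, 2, 2], 3)
```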
To verify the effectiveness of the two proposed strategies, ablation experiments were performed on the Brain HSI dataset. The results show that when the two strategies, space-spectrum self-attention and multi-view prediction fusion, were each added to the network individually, OA, AA and KC increased by 6.27%, 8.55% and 6.63%, and by 4.82%, 5.12% and 5.61%, respectively. When both strategies were added to the network simultaneously, the predictive capability of the model improved further, with OA, AA and KC rising by 10.67%, 9.99% and 8.36%, respectively. This shows that the multi-view space-spectrum information strategy successfully focuses on the key space-spectrum feature regions and extracts more discriminative space-spectrum features. Meanwhile, the multi-view prediction fusion strategy organically fuses the diagnostic predictions under different fields of view, improving the classification effect of the model. These experiments separately verify the contribution of the two added strategies, multi-view space-spectrum information fusion and multi-view prediction fusion, to the network's classification performance, and demonstrate that both strategies work well when applied to the network.
To further verify the effectiveness of the proposed algorithm, the convolutional neural network methods HybridSN, SSRN and DBDA and the Transformer deep learning methods spectra-wise ViT, SSFTT and CTMixer were selected as comparison algorithms. Each algorithm was repeated 10 times, and the Overall Accuracy (OA) was reported as mean ± standard deviation (STD) to comprehensively compare and evaluate the classification performance of each algorithm. The experimental setups on the two datasets are shown in Table 1.
Table 1 Experimental setup on the Brain and BloodCell HSI datasets
On the Brain HSI dataset, HybridSN, SSRN, DBDA, ViT, CTMixer, SSFTT and the proposed method achieved classification accuracies of 68.32%, 72.25%, 68.99%, 51.75%, 71.07%, 65.66% and 82.25%, respectively, because the spatial-attention extraction and spectral-attention extraction operations in the present invention successfully capture the important spatial regions and important spectral bands, eliminating feature redundancy. In addition, the prediction probabilities under the multi-view encoders are weighted and fused according to adaptively learned weights, improving the predictive capability of the model. On the BloodCell HSI dataset, HybridSN, SSRN, DBDA, ViT, CTMixer, SSFTT and the proposed method achieved classification accuracies of 89.21%, 88.71%, 88.25%, 77.13%, 90.45%, 89.50% and 91.74%, respectively; the invention still obtains the better classification result. This is because the proposed space-spectrum information extraction block is still able to capture the key space-spectrum information, improving the classification ability of the model, and to fuse the diagnostic information under different fields of view. The experiments demonstrate that the proposed method is applicable to medical hyperspectral images acquired by different instruments and different imaging modalities, generalizes well, and saves model development cost.
Claims (2)
1. A medical hyperspectral image classification method based on a space-spectrum self-attention mechanism, characterized by comprising the following steps:
S1: normalizing the obtained original hyperspectral image to obtain a normalized image;
S2: cropping the normalized image centered on the target sample to obtain a cropped image;
S3: inputting the cropped image into a processing unit for processing: the processing unit consists of three Transformer encoders with the same structure, the internal convolution kernels of the three Transformer encoders having different sizes; the three Transformer encoders are denoted, in descending order of internal convolution kernel size, as a first encoder, a second encoder and a third encoder; the first encoder processes the cropped image according to a first method to obtain a first space-spectrum feature, the second encoder processes the first space-spectrum feature according to a second method to obtain a second space-spectrum feature, and the third encoder processes the second space-spectrum feature according to a third method to obtain a third space-spectrum feature;
S4: inputting the first space-spectrum feature into a first classifier, which outputs a first prediction result; inputting the second space-spectrum feature into a second classifier, which outputs a second prediction result; and inputting the third space-spectrum feature into a third classifier, which outputs a third prediction result;
S5: weighting and fusing the first prediction result, the second prediction result and the third prediction result to obtain a final prediction result; the final prediction result comprises a plurality of probabilities, each probability corresponding to one medical semantic category;
each classifier predicts, from the input space-spectrum feature, the probabilities that the target sample belongs to the different medical semantic categories; each prediction result comprises a plurality of probabilities, each probability corresponding to one medical semantic category;
the first method comprises the following steps:
1) mapping the cropped image into three matrices Q1, K1 and V1 by the multi-layer perceptron module of the first encoder;
2) processing matrix Q1 based on the spatial attention mechanism to extract spatial features from Q1;
3) processing matrix K1 based on the spectral attention mechanism to extract spectral features from K1;
4) taking the dot product of the extracted spatial features and spectral features to obtain a first space-spectrum self-attention parameter;
5) extracting, based on the self-attention mechanism, a first space-spectrum feature from matrix V1 according to the first space-spectrum self-attention parameter; the procedure of step 5) is expressed as follows:
wherein f a×a×a (V1) denotes 3D convolution processing applied to regions of matrix V1 with receptive field size a×a×a, a×a×a being the receptive field size of the first encoder;
the second method comprises the following steps:
A) mapping the first space-spectrum feature into three matrices Q2, K2 and V2 by the multi-layer perceptron module of the second encoder;
B) processing matrix Q2 based on the spatial attention mechanism to extract spatial features from Q2;
C) processing matrix K2 based on the spectral attention mechanism to extract spectral features from K2;
D) taking the dot product of the extracted spatial features and spectral features to obtain a second space-spectrum self-attention parameter;
E) subjecting the first and second space-spectrum self-attention parameters to splice-fusion processing, then performing dimension-reduction processing on the spliced and fused result with a convolution layer to obtain a first fusion self-attention parameter, and then extracting, based on the self-attention mechanism, a second space-spectrum feature from matrix V2 according to the first fusion self-attention parameter; the procedure of step E) is expressed as follows:
wherein the dimension-reduction term denotes the dimension-reduction processing applied to the splice-fused result of the first and second space-spectrum self-attention parameters, f b×b×b (V2) denotes 3D convolution processing applied to regions of matrix V2 with receptive field size b×b×b, b×b×b being the receptive field size of the second encoder, and Concat(·) denotes the splice-fusion process;
the third method comprises the following steps:
First) mapping the second space-spectrum feature into three matrices Q3, K3 and V3 by the multi-layer perceptron module of the third encoder;
Second) processing matrix Q3 based on the spatial attention mechanism to extract spatial features from Q3;
Third) processing matrix K3 based on the spectral attention mechanism to extract spectral features from K3;
Fourth) taking the dot product of the extracted spatial features and spectral features to obtain a third space-spectrum self-attention parameter;
Fifth) subjecting the first, second and third space-spectrum self-attention parameters to splice-fusion processing, then performing dimension-reduction processing on the spliced and fused result with a convolution layer to obtain a second fusion self-attention parameter, and then extracting, based on the self-attention mechanism, a third space-spectrum feature from matrix V3 according to the second fusion self-attention parameter; the process of step Fifth) is expressed as follows:
wherein the dimension-reduction term denotes the dimension-reduction processing applied to the splice-fused result of the three space-spectrum self-attention parameters, f c×c×c (V3) denotes 3D convolution processing applied to regions of matrix V3 with receptive field size c×c×c, c×c×c being the receptive field size of the third encoder; a > b > c.
2. The medical hyperspectral image classification method based on the space-spectrum self-attention mechanism according to claim 1, characterized in that: in S5, trainable weights are used to dynamically weight and fuse the first prediction result, the second prediction result and the third prediction result.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310152996.7A CN116229163A (en) | 2023-02-22 | 2023-02-22 | Medical hyperspectral image classification method based on space-spectrum self-attention mechanism |
Publications (1)
Publication Number | Publication Date |
---|---|
CN116229163A true CN116229163A (en) | 2023-06-06 |
Family
ID=86576418
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202310152996.7A Pending CN116229163A (en) | 2023-02-22 | 2023-02-22 | Medical hyperspectral image classification method based on space-spectrum self-attention mechanism |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116229163A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116740474A (en) * | 2023-08-15 | 2023-09-12 | 南京信息工程大学 | Remote sensing image classification method based on anchoring stripe attention mechanism |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||