CN117274662A - Lightweight multi-mode medical image classification method for improving ResNeXt neural network - Google Patents
- Publication number
- CN117274662A (application number CN202311022573.XA)
- Authority
- CN
- China
- Prior art keywords
- convolution
- layer
- feature
- data
- residual block
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/764—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/0464—Convolutional networks [CNN, ConvNet]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/048—Activation functions
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/77—Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
- G06V10/80—Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
- G06V10/806—Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/82—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V2201/00—Indexing scheme relating to image or video recognition or understanding
- G06V2201/03—Recognition of patterns in medical or anatomical images
Abstract
The invention relates to the field of image processing and provides a lightweight multi-modal medical image classification method based on an improved ResNeXt neural network. It addresses the problem that, in medical image classification, images of different modalities differ greatly and conventional deep-learning-based machine learning methods handle multi-modal data poorly. The main scheme is as follows: medical images from two different modalities are preprocessed to obtain processed images; the processed images are divided proportionally into a training set, a test set and a validation set; data augmentation is applied to the training set; sub-networks are constructed to extract features from the image data of the two modalities separately, and the resulting feature tensors of different dimensions are concatenated; an improved ResNeXt convolutional neural network is constructed as the classification model; and, after parameter optimization, the preprocessed data are fed into the optimized classification model, whose classifier outputs the classification result.
Description
Technical Field
The invention relates to the field of image processing and provides a lightweight multi-modal medical image classification method based on an improved ResNeXt neural network.
Background
Traditional machine learning methods: these include support vector machines, random forests, naive Bayes and the like. Medical images are converted into numerical features through feature extraction and feature selection and are then classified with a traditional machine learning algorithm. Because such algorithms classify on manually designed features, typically obtained by filtering, edge detection and similar operations, they struggle to capture the high-level features and semantic information of an image. Moreover, the classifier used is usually linear, such as a support vector machine (SVM), and cannot represent nonlinear characteristics well, so it has difficulty with high-dimensional, nonlinear image data and its generalization performance is limited. In addition, because features must be extracted manually, these methods place high demands on data quantity and quality: a large amount of labeled data is required, and data quality strongly affects model performance.
Deep learning methods: these use deep convolutional neural networks (CNNs) for feature extraction and classification, including classical models such as AlexNet, VGG and ResNet. They typically require large amounts of labeled data and computing resources; if the training data are insufficient, the model may overfit or underfit, and large-scale networks and complex image classification tasks need even more training data. In addition, deep learning models have complex structures: the huge number of parameters in a deep network makes the model poorly interpretable, so the internal logic of its classification decisions is hard to understand, and the complexity of the network makes deployment in practical applications cumbersome.
In summary, traditional machine learning methods require a manually designed feature extractor, have limited feature expression capability, and cannot fully extract the information in images, while deep learning methods require large amounts of data and computing resources for training and have parameter counts so large that they are difficult to deploy on resource-constrained mobile devices. In medical image classification, images of different modalities differ greatly, and conventional deep-learning-based machine learning methods handle multi-modal data poorly.
Disclosure of Invention
The invention aims to solve the problem that, in medical image classification, images of different modalities differ greatly and conventional deep-learning-based machine learning methods handle multi-modal data poorly.
To achieve this aim, the invention adopts the following technical scheme:
A lightweight multi-modal medical image classification method based on an improved ResNeXt neural network comprises the following steps:
Step 1: preprocess medical images from two different modalities to obtain processed images;
Step 2: divide the images processed in step 1 proportionally into a training set, a test set and a validation set;
Step 3: apply data augmentation to the training set;
Step 4: construct sub-networks, extract features from the image data of the two modalities separately, and concatenate the resulting feature tensors of different dimensions;
Step 5: construct an improved ResNeXt convolutional neural network as the classification model;
Step 6: take the result of step 4 as the input of the model from step 5, with the different categories as the classification results;
Step 7: optimize the parameters of the ResNeXt convolutional neural network and save the final optimized classification model;
Step 8: feed the preprocessed data into the optimized classification model and output the classification result through the classifier of the classification model.
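Step 2 does not state the split proportions. As a minimal sketch, assuming a 7:2:1 ratio and hypothetical file names for the paired samples of the two modalities (neither is given in the source), a proportional split might look like this:

```python
import random

def split_dataset(samples, parts=(7, 2, 1), seed=0):
    """Shuffle samples and split them into train / test / validation lists.

    parts gives the relative sizes; (7, 2, 1) is an assumed ratio,
    not one stated in the source.
    """
    total = sum(parts)
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    n_train = len(shuffled) * parts[0] // total
    n_test = len(shuffled) * parts[1] // total
    train = shuffled[:n_train]
    test = shuffled[n_train:n_train + n_test]
    validation = shuffled[n_train + n_test:]  # the remainder goes to validation
    return train, test, validation

# Hypothetical paired samples: (modality A image, modality B image, label).
samples = [(f"a_{i}.png", f"b_{i}.png", i % 2) for i in range(20)]
train, test, validation = split_dataset(samples)
print(len(train), len(test), len(validation))  # 14 4 2
```

Splitting at the level of the paired sample keeps both modalities of one case in the same subset.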
In the above technical scheme, the preprocessing of the input medical images in step 1 aligns the images of the different modalities so that their spatial positions and orientations are consistent, and unifies the sample size.
In the above technical scheme, the data augmentation of the data set in step 3 comprises randomly rotating, flipping, scaling and cropping the images of the different modalities.
In the above technical scheme, the sub-networks constructed in step 4 comprise a sub-network AlexNet and a sub-network DenseNet, and step 4 specifically comprises the following steps:
Step 4.1: construct the sub-network AlexNet, which receives image data from one modality and extracts features from it to obtain a feature representation of that data; the AlexNet comprises 5 convolution layers, each followed by a ReLU activation function and a local response normalization layer, and then two max pooling layers;
Step 4.2: construct the sub-network DenseNet, which receives image data from the other modality and extracts features from it to obtain a feature representation of that data; the convolutional part of the DenseNet consists of several dense blocks (DenseBlocks); within each dense block, the outputs of preceding convolution layers are concatenated to serve as the input of the current convolution layer, so that information is fully propagated; each convolution layer in a dense block is a 3×3 convolution followed by batch normalization and a ReLU activation function;
Step 4.3: add a global average pooling layer after the convolution and pooling layers of each of the two sub-networks, averaging the feature values on each channel to obtain a global feature;
Step 4.4: concatenate the global features of the two sub-networks to obtain the final feature, which serves as the input of the classification model.
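The source lists the augmentation operations but not their parameters. The sketch below assumes that the two modalities are already aligned square images of equal size and that the same random parameters should be applied to both so that they stay aligned. For brevity it restricts rotation to multiples of 90 degrees and omits scaling. All function names are illustrative.

```python
import random

def flip_horizontal(img):
    """Mirror a 2D image (list of rows) left to right."""
    return [row[::-1] for row in img]

def rotate90(img):
    """Rotate a 2D image 90 degrees clockwise."""
    return [list(col) for col in zip(*img[::-1])]

def crop(img, top, left, size):
    """Cut a size x size window whose top-left corner is (top, left)."""
    return [row[left:left + size] for row in img[top:top + size]]

def augment_pair(img_a, img_b, crop_size, rng):
    """Draw one set of random parameters and apply it to both modalities."""
    size = len(img_a)                      # assumes square, equally sized images
    do_flip = rng.random() < 0.5
    turns = rng.randrange(4)               # number of 90-degree rotations
    top = rng.randrange(size - crop_size + 1)
    left = rng.randrange(size - crop_size + 1)
    results = []
    for img in (img_a, img_b):
        if do_flip:
            img = flip_horizontal(img)
        for _ in range(turns):
            img = rotate90(img)
        results.append(crop(img, top, left, crop_size))
    return results[0], results[1]

# Toy aligned pair: img_b is img_a shifted by 100 so alignment is easy to check.
rng = random.Random(0)
img_a = [[r * 4 + c for c in range(4)] for r in range(4)]
img_b = [[v + 100 for v in row] for row in img_a]
aug_a, aug_b = augment_pair(img_a, img_b, crop_size=3, rng=rng)
```

Because both images receive identical parameters, every pixel of `aug_b` still corresponds to the same position in `aug_a`.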
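Steps 4.3 and 4.4 can be sketched as follows. This is an illustration only: the feature maps are toy values rather than real AlexNet or DenseNet outputs, and the function names are assumptions. It shows why global average pooling allows feature tensors of different spatial sizes to be concatenated.

```python
def global_average_pool(feature_maps):
    """Average each channel of a C x H x W nested list down to one value."""
    pooled = []
    for channel in feature_maps:
        values = [v for row in channel for v in row]
        pooled.append(sum(values) / len(values))
    return pooled

def fuse(features_a, features_b):
    """Concatenate the global features of the two sub-networks."""
    return global_average_pool(features_a) + global_average_pool(features_b)

# Toy outputs: 2 channels of 2x2 from one branch, 3 channels of 1x2 from the other.
alexnet_maps = [[[1.0, 3.0], [5.0, 7.0]], [[2.0, 2.0], [2.0, 2.0]]]
densenet_maps = [[[0.0, 4.0]], [[1.0, 1.0]], [[6.0, 0.0]]]
fused = fuse(alexnet_maps, densenet_maps)
print(fused)  # [4.0, 2.0, 2.0, 1.0, 3.0]
```

The fused vector has one entry per channel of each branch, regardless of the height and width of the individual feature maps.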
In the above technical scheme, step 5 specifically comprises the following steps:
Step 5.1: receive the result of step 4, namely the feature vector obtained by concatenating the multi-modal data, as the input of the classification model;
Step 5.2: pass the input through a first convolution layer with a 7×7 convolution kernel and a stride of 2; the convolution yields a feature map with an output dimension of 64;
the specific formula of the convolution operation here is:
$$F_{out} = \left\lfloor \frac{F_{in} - k + 2p}{s} \right\rfloor + 1$$
where $F_{out}$ is the size of the output feature map, $F_{in}$ is the size of the input feature map, $k$ is the size of the convolution kernel, $p$ is the padding size, and $s$ is the stride;
Step 5.3: pass the feature map from step 5.2 through a second convolution layer, which convolves it with a 3×3 convolution kernel; the input has 64 channels, and the convolution is followed by a normalization layer and a ReLU activation function, giving a feature map with 128 channels;
Step 5.4: pass the result through the first residual block, called residual block 1. Residual block 1 replaces the convolution layer of the ResNeXt residual block with a depth separable convolution and adds a channel attention mechanism as its last layer. First, a 1×1 convolution kernel converts the number of channels to 128. Then a 3×3 depth separable convolution performs feature extraction: the input is separated and convolved in the depth direction and in the spatial direction respectively, the data of the different modalities are added element by element at corresponding positions and multiplied by different weight factors, and the results are summed to form the new data input; this reduces the number of parameters and the amount of computation and achieves lightweight feature extraction. Next, a channel attention module is connected: a sigmoid activation function normalizes the feature map in the channel dimension to compute a weight for each channel, and the features of the different channels are weighted accordingly, raising the weight of important features. Finally, a multi-branch fusion module is connected, in which each branch corresponds to the feature map of one modality and extracts features with a 3×3 convolution kernel; the size of the features is then changed, and the result is output from residual block 1;
Step 5.5: connect a further residual block, called residual block 2, which has the same structure as residual block 1 described in step 5.4, including the channel attention and depth separable convolution operations, except that the number of channels is 256 and the number of output channels is 512;
Step 5.6: connect a further residual block, called residual block 3, which has the same structure as residual block 1 described in step 5.4, including the channel attention and depth separable convolution operations, except that the number of channels is 512 and the number of output channels is 1024;
Step 5.7: connect a further residual block, called residual block 4, which has the same structure as residual block 1 described in step 5.4, including the channel attention and depth separable convolution operations, except that the number of channels is 1024 and the number of output channels is 2048;
Step 5.8: connect a global pooling layer, which averages the feature map on each channel and outputs a global feature vector;
Step 5.9: connect a fully connected layer, which classifies the feature vector output by the global pooling layer by mapping it into the vector space of the prediction categories, and outputs the prediction result;
Step 5.10: the output layer normalizes the output of the fully connected layer with a softmax activation function, converting it into a probability distribution over the predicted classes; the final prediction of the classification model is the class with the largest probability.
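Two of the steps above lend themselves to a small numeric illustration: the output size of the first convolution (step 5.2) and the softmax prediction (step 5.10). The sketch uses the standard output-size relation floor((F_in - k + 2p) / s) + 1. The input size of 224 and the padding of 3 are assumptions chosen for the example, since the source gives only the 7×7 kernel and the stride of 2. The logits are made-up values.

```python
import math

def conv_output_size(f_in, k, p, s):
    """Spatial output size of a convolution: floor((f_in - k + 2p) / s) + 1."""
    return (f_in - k + 2 * p) // s + 1

def softmax(logits):
    """Convert raw scores into a probability distribution."""
    largest = max(logits)                      # subtract the max for stability
    exps = [math.exp(x - largest) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def predict(logits):
    """Return the index of the most probable class and the probabilities."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=lambda i: probs[i])
    return best, probs

# Assumed 224-pixel input with padding 3; kernel 7 and stride 2 are from step 5.2.
first_layer_size = conv_output_size(224, k=7, p=3, s=2)
print(first_layer_size)  # 112

# Made-up fully connected outputs for three classes.
predicted_class, probabilities = predict([1.0, 3.0, 2.0])
print(predicted_class)  # 1
```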
Because the invention adopts the above technical means, it has the following beneficial effects:
The technology adopts a deep convolutional neural network model with ResNeXt as the backbone network, takes images of different modalities as different input channels of the network, fuses the features of the data of the different modalities, and then performs classification prediction. The method exploits the strong feature extraction capability of the ResNeXt network while fusing multi-modal features to improve classification accuracy; at the same time it adopts several lightweight techniques, such as depth separable convolution and a channel attention mechanism, which reduce the computational and storage complexity of the model and the time and resources consumed by training and inference. Single-modality medical image classification typically uses only one type of image data and is affected by many factors, which limits its accuracy. Different types of medical images (e.g., CT, MRI, X-ray) have different image characteristics, so applying the same classification algorithm to different image types may give limited results. Moreover, single-modality classification methods may perform poorly on some specific types of medical images. Single-modality classification algorithms also usually need a large amount of training data to ensure accuracy, but because medical image data are expensive to acquire, the amount of data is often limited, which may cause such algorithms to perform poorly. In addition, medical images are often subject to interference such as artifacts, motion artifacts and noise, which degrades image quality and hence the performance of the classification algorithm. Single-modality classification algorithms may have difficulty distinguishing noise from genuine features, resulting in classification errors. Therefore, the multi-modal medical image classification method can make comprehensive use of multiple types of medical image data, overcomes the shortcomings of single-modality classification, and improves classification accuracy and stability.
Drawings
FIG. 1 is a diagram of an improved classification network architecture;
FIG. 2 is an internal block diagram of a residual block;
FIG. 3 is a diagram of the channel attention module.
Detailed Description
Hereinafter, embodiments of the present invention will be described in detail. While the invention will be described and illustrated in conjunction with certain specific embodiments, it will be understood that it is not intended to limit the invention to these embodiments alone. On the contrary, the invention is intended to cover modifications and equivalent arrangements included within the scope of the appended claims.
In addition, numerous specific details are set forth in the following description in order to provide a better illustration of the invention. It will be understood by those skilled in the art that the present invention may be practiced without these specific details.
Accordingly, the invention provides a technical scheme with the following characteristics:
Lightweight design: while maintaining classification accuracy, the technology uses a series of lightweight designs that reduce the number of parameters and the amount of computation, making it suitable for medical imaging equipment and mobile applications;
Multi-modal fusion: when medical images of different modalities are processed simultaneously, a modality interaction mechanism fully fuses the information of the different modalities and improves the accuracy of the classifier;
Attention mechanism: the technology adds an attention mechanism as the last layer of the residual block of the original network, which strengthens the representational capability of the network, filters out unimportant noise and interference, focuses the network on more important information, and improves its feature extraction capability for different samples;
The original ResNeXt network is pre-trained on ImageNet, so the method inherits ResNeXt's strong performance on large-scale image classification tasks and performs well on medical image classification tasks.
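To give a sense of scale for the lightweight design, the sketch below counts the weights of a 3×3 depth separable convolution against an ordinary dense 3×3 convolution for the channel pairs named in steps 5.5 to 5.7. The dense convolution is an assumed baseline chosen for illustration, biases are ignored, and the function names are hypothetical.

```python
def standard_conv_params(c_in, c_out, k=3):
    """Weights of a dense k x k convolution (bias ignored)."""
    return k * k * c_in * c_out

def depthwise_separable_params(c_in, c_out, k=3):
    """Weights of a k x k depth convolution plus a 1x1 point-by-point convolution."""
    depthwise = k * k * c_in      # one k x k kernel per input channel
    pointwise = c_in * c_out      # 1x1 convolution mixing the channels
    return depthwise + pointwise

# Channel pairs taken from residual blocks 2, 3 and 4.
for c_in, c_out in [(256, 512), (512, 1024), (1024, 2048)]:
    dense = standard_conv_params(c_in, c_out)
    separable = depthwise_separable_params(c_in, c_out)
    print(c_in, c_out, dense, separable, round(dense / separable, 2))
```

For a 3×3 kernel the saving approaches a factor of 9 as the number of output channels grows, because the point-by-point term dominates the separable count.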
To help those skilled in the art understand the technical scheme, the lightweight multi-modal medical image classification method based on an improved ResNeXt neural network is carried out according to steps 1 to 8 set out in the Disclosure of Invention above, including sub-steps 4.1 to 4.4 and 5.1 to 5.10.
The improved ResNeXt structure is shown in FIG. 1, in which CT and MRI are taken as examples of the two modes of data.
The internal structure of the residual block is shown in FIG. 2. The depth separable convolution is computed in two steps: depth convolution and point-by-point convolution. The depth convolution convolves each channel of the input tensor with a separate convolution kernel. For example, if the input tensor has $C_{in}$ channels, each channel is convolved with its own 3X3 kernel, giving $C_{in}$ output channels. The point-by-point convolution then applies a 1X1 convolution to the output of the depth convolution, pixel by pixel across the channels, mapping it to $C_{out}$ channels and producing the final output tensor.
The calculation formula of the depth separable convolution is:
$$Y = \mathrm{pointwise}(\mathrm{depthwise}(X))$$
where X is the input tensor, Y is the output tensor, depthwise is the depth convolution, and pointwise is the point-by-point convolution, which is applied to the output of the depth convolution.
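The two-step computation can be sketched in pure Python as follows. This is a minimal illustration that assumes no padding, a stride of 1, and a channel-first layout `x[channel][row][col]`; as in most deep learning frameworks, the "convolution" is implemented as a cross-correlation. The toy tensors are hypothetical.

```python
def depthwise(x, kernels):
    """Convolve each channel with its own k x k kernel (no padding, stride 1)."""
    out = []
    for channel, kernel in zip(x, kernels):
        k = len(kernel)
        rows = len(channel) - k + 1
        cols = len(channel[0]) - k + 1
        out.append([[sum(channel[r + i][c + j] * kernel[i][j]
                         for i in range(k) for j in range(k))
                     for c in range(cols)]
                    for r in range(rows)])
    return out

def pointwise(x, weights):
    """1x1 convolution: mix channels at every pixel; weights is C_out x C_in."""
    rows, cols = len(x[0]), len(x[0][0])
    return [[[sum(w * x[ch][r][c] for ch, w in enumerate(row_w))
              for c in range(cols)]
             for r in range(rows)]
            for row_w in weights]

def depthwise_separable(x, kernels, weights):
    """Depth convolution followed by point-by-point convolution."""
    return pointwise(depthwise(x, kernels), weights)

# Hypothetical toy input: C_in = 2 channels of 3x3, C_out = 1.
x = [[[1, 2, 3], [4, 5, 6], [7, 8, 9]],
     [[1, 1, 1], [1, 1, 1], [1, 1, 1]]]
ones = [[1] * 3 for _ in range(3)]
kernels = [ones, ones]   # one 3x3 kernel per input channel
weights = [[1, 2]]       # one output channel mixing both input channels
y = depthwise_separable(x, kernels, weights)
```

The depth convolution never mixes channels, and the point-by-point convolution never looks at neighbouring pixels; splitting the work this way is what reduces the parameter count.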
The specific structure of the channel attention module is shown in FIG. 3. The module is calculated as follows. For an input feature map X of size H×W×C, where H is the height, W is the width, and C is the number of channels of the feature map, global average pooling is first performed on the input feature map to obtain one feature value per channel:
$$z_j = \frac{1}{H W} \sum_{i=1}^{H} \sum_{k=1}^{W} X_{i,k,j}$$
where i enumerates the height dimension H, k enumerates the width dimension W, j enumerates the channel dimension C, i ranges over [1, H], k ranges over [1, W], and $X_{i,k,j}$ is the value of the feature map X at height i, width k, and channel j.
The feature vector z is then passed through two fully connected layers to obtain s and f:
$$s = \sigma(W_1 z + b_1), \quad f = \sigma(W_2 s + b_2)$$
where σ is the sigmoid activation function, z is the feature vector obtained by the global average pooling, $W_1$ and $W_2$ are the weights of the two fully connected layers, and $b_1$ and $b_2$ are bias terms. The bias terms shift the input of the activation function, so that the model adapts better to different data distributions;
Each element of f is then taken as a channel weight, and the input feature map is rescaled channel by channel to obtain the output feature map Y:
$$Y_{i,k,j} = f_j X_{i,k,j}, \quad j \in [1, C].$$
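The channel attention computation (global average pooling, two fully connected layers with sigmoid activations, then per-channel rescaling) can be sketched as below. The sketch assumes a channel-first layout `x[channel][row][col]` for convenience, and the weights, biases, and hidden size of 1 are hypothetical values chosen for illustration.

```python
import math

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def dense_sigmoid(vector, weights, biases):
    """Fully connected layer followed by a sigmoid activation."""
    return [sigmoid(sum(w * v for w, v in zip(row, vector)) + b)
            for row, b in zip(weights, biases)]

def channel_attention(x, w1, b1, w2, b2):
    """Return the rescaled feature map and the channel weights f."""
    # Global average pooling: one value z_j per channel.
    z = [sum(sum(row) for row in fmap) / (len(fmap) * len(fmap[0]))
         for fmap in x]
    s = dense_sigmoid(z, w1, b1)   # s = sigmoid(W1 z + b1)
    f = dense_sigmoid(s, w2, b2)   # f = sigmoid(W2 s + b2)
    # Rescale every pixel of channel j by its weight f_j.
    y = [[[f_j * value for value in row] for row in fmap]
         for f_j, fmap in zip(f, x)]
    return y, f

# Hypothetical toy input: C = 2 channels of 2x2, hidden size 1.
x = [[[1.0, 2.0], [3.0, 4.0]],
     [[0.0, 0.0], [0.0, 8.0]]]
w1, b1 = [[0.5, -0.5]], [0.0]
w2, b2 = [[1.0], [-1.0]], [0.0, 0.0]
y, f = channel_attention(x, w1, b1, w2, b2)
```

Because f comes out of a sigmoid, every channel weight lies strictly between 0 and 1, so the module can only attenuate channels relative to one another.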
By adopting the above scheme, the invention has the following characteristics:
By fusing multi-mode medical image data, the information in images of different modes can be fully utilized, which improves classification accuracy.
A channel attention mechanism is added as the last layer of each residual structure, which improves the feature extraction capability of the network and strengthens its ability to classify medical images of different modes. Multi-mode data fusion yields more comprehensive and more accurate disease or lesion features, which supports pathological analysis and medical diagnosis and improves the interpretability of the model.
The convolution layers in the residual network are replaced with lightweight depth separable convolutions, which reduces the number of parameters and the computational and storage complexity of the model, and thus the time and resources consumed by training and inference. The network retains the residual connections of the original residual network, which help the model process deep information and complex data.
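The parameter reduction can be checked with a short calculation. The sketch below compares a dense 3X3 convolution with a 3X3 depth separable convolution, ignoring biases. Treating each residual block as a single layer from its stated input channels to its stated output channels is a simplifying assumption made only for this illustration, and the baseline is a dense convolution rather than the grouped convolution of the original ResNeXt.

```python
def standard_conv_params(c_in, c_out, k):
    """Weights of a dense k x k convolution (bias ignored)."""
    return k * k * c_in * c_out

def separable_conv_params(c_in, c_out, k):
    """Weights of a k x k depth convolution plus a 1x1 point-by-point convolution."""
    return k * k * c_in + c_in * c_out

# Channel pairs stated for residual blocks 2, 3, and 4.
channel_pairs = [(256, 512), (512, 1024), (1024, 2048)]
report = []
for c_in, c_out in channel_pairs:
    dense = standard_conv_params(c_in, c_out, 3)
    separable = separable_conv_params(c_in, c_out, 3)
    report.append((c_in, c_out, dense, separable, dense / separable))

for c_in, c_out, dense, separable, ratio in report:
    print(f"{c_in}->{c_out}: dense={dense}, separable={separable}, "
          f"reduction={ratio:.2f}x")
```

Under these assumptions the saving approaches a factor of 9 (the 3X3 kernel area) as the number of output channels grows.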
Claims (5)
1. The lightweight multi-mode medical image classification method for improving the ResNeXt neural network is characterized by comprising the following steps of:
step 1: preprocessing medical images from two different modes to obtain processed images;
step 2: dividing the images processed in the step 1 into a training set, a testing set and a verification set according to the proportion;
step 3: data enhancement is carried out on the data of the training set;
step 4: constructing a sub-network, respectively carrying out feature extraction on the image data of the two modes, and splicing the feature tensors with different dimensions after obtaining the feature tensors;
step 5: constructing an improved ResNeXt convolutional neural network as a classification model;
step 6: taking the result of the step 4 as the input of the step 5, and taking different categories as classification results;
step 7: parameter optimization is carried out on the ResNeXt convolutional neural network, and a finally optimized classification model is saved;
step 8: and sending the preprocessed data into an optimized classification model, and outputting a classification result through a classifier of the classification model.
2. The method for classifying lightweight multi-modal medical images for improved ResNeXt neural networks as claimed in claim 1, wherein the input medical images are preprocessed in step 1, i.e., the images of different modalities are aligned such that their spatial locations and directions are consistent, unifying sample sizes.
3. The method of claim 1, wherein the step 3 of enhancing the data set comprises randomly rotating, flipping, scaling, and cropping the images of different modalities.
4. The method for classifying lightweight multi-modal medical images with improved ResNeXt neural networks according to claim 1, wherein the constructing of the sub-networks in step 4 includes constructing a sub-network AlexNet and a sub-network DenseNet, and step 4 specifically includes the following steps:
step 4.1: constructing a sub-network AlexNet, receiving image data from a mode, extracting characteristics of the image data to obtain characteristic representation of the image data, wherein the AlexNet comprises 5 convolution layers, each convolution layer is followed by a ReLU activation function and a local response normalization layer, and then connecting two maximum pooling layers;
step 4.2: constructing a sub-network DenseNet, which receives image data from the other mode and extracts its features to obtain a feature representation of the image data, wherein the convolution part of the DenseNet consists of a plurality of dense blocks (DenseBlocks); within each dense block, the outputs of the preceding convolution layers are spliced together to serve as the input of the current convolution layer, so that information is fully transferred; each convolution layer in a dense block is a 3X3 convolution layer followed by Batch Normalization and a ReLU activation function;
step 4.3: adding a global average pooling layer after the convolution layers and pooling layers of the two sub-networks, and averaging the feature values on each channel to obtain a global feature;
step 4.4: and connecting the global features of the two sub-networks to obtain final features, and taking the final features as the input of the classification model.
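Steps 4.3 and 4.4 can be sketched as follows: each sub-network output is reduced to one value per channel by global average pooling, and the two resulting vectors are concatenated. The function name `fuse_global_features` and the toy CT and MRI feature maps are hypothetical and serve only as an illustration.

```python
def global_average_pool(feature_maps):
    """Average each channel's H x W map into a single value (step 4.3)."""
    return [sum(sum(row) for row in fmap) / (len(fmap) * len(fmap[0]))
            for fmap in feature_maps]

def fuse_global_features(features_a, features_b):
    """Concatenate the global features of the two sub-networks (step 4.4)."""
    return global_average_pool(features_a) + global_average_pool(features_b)

# Hypothetical sub-network outputs in channel-first layout.
ct_features = [[[2.0, 4.0], [6.0, 8.0]]]     # 1 channel from one mode
mri_features = [[[1.0, 1.0], [1.0, 1.0]],    # 2 channels from the other mode
                [[0.0, 4.0], [4.0, 0.0]]]
fused = fuse_global_features(ct_features, mri_features)
```

Pooling before concatenation means the two sub-networks may produce feature maps of different spatial sizes and channel counts; only the length of the fused vector depends on them.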
5. The method for classifying lightweight multi-modal medical images for improved ResNeXt neural networks according to claim 1, wherein step 5 comprises the steps of:
step 5.1: receiving the processed result in the step 4, wherein the result is the characteristic vector after the multi-mode data are spliced and is used as the input of a classification model;
step 5.2: the input passes through a first convolution layer with a 7X7 convolution kernel and a stride of 2; the convolution operation yields a feature map with an output dimension of 64;
the specific formula of the convolution operation here is:
$$F_{out} = \left\lfloor \frac{F_{in} - k + 2p}{s} \right\rfloor + 1$$
wherein $F_{out}$ is the size of the output feature map, $F_{in}$ is the size of the input feature map, k is the size of the convolution kernel, p is the size of the padding, and s is the size of the stride;
step 5.3: a second convolution layer performs a convolution operation on the feature map from step 5.2 with a 3X3 convolution kernel, the number of input channels being 64; a normalization layer is connected after the convolution, followed by a ReLU activation function, giving a feature map with 128 channels;
step 5.4: the feature map then passes through the first residual block, called residual block 1. Residual block 1 replaces the convolution layer of the ResNeXt residual block with a depth separable convolution and adds a channel attention mechanism as its last layer. First, a 1X1 convolution kernel converts the number of channels to 128. Next, a 3X3 depth separable convolution performs feature extraction: the input is separated and convolved in the depth direction and the spatial direction respectively; the data of the different modes are added position by position, multiplied by different weight factors, and the results are summed to form the new data input. This reduces the number of parameters and the amount of computation and realizes lightweight feature extraction. A channel attention module follows: the feature map is normalized in the channel dimension with a sigmoid activation function, a weight is calculated for each channel, and the features of the different channels are weighted so that important features receive larger weights. A multi-branch fusion module is then connected, in which each branch corresponds to the feature map of one mode; each branch extracts features with a 3X3 convolution kernel, the branch outputs are resized to the same size by bilinear interpolation, and a 1X1 convolution produces the final output;
step 5.5: a residual block, called residual block 2, is connected. It has the same structure as the block described in step 5.4, including the channel attention and depth separable convolution operations, except that the number of input channels is 256 and the number of output channels is 512;
step 5.6: a residual block, called residual block 3, is connected. It has the same structure as the block described in step 5.4, including the channel attention and depth separable convolution operations, except that the number of input channels is 512 and the number of output channels is 1024;
step 5.7: a residual block, called residual block 4, is connected. It has the same structure as the block described in step 5.4, including the channel attention and depth separable convolution operations, except that the number of input channels is 1024 and the number of output channels is 2048;
step 5.8: a global pooling layer is connected, which averages the feature map on each channel and outputs a global feature vector;
step 5.9: a fully connected layer is connected, which classifies the feature vector output by the global pooling layer by mapping it to the vector space of the prediction categories;
step 5.10: the output layer normalizes the output of the fully connected layer with a softmax activation function, converting it into a probability distribution over the predicted classes; the final prediction of the classification model is the class with the maximum probability value.
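The output size of the convolutions in steps 5.2 and 5.3 follows the standard relation F_out = floor((F_in - k + 2p) / s) + 1, which can be checked numerically. In the sketch below, the 224-pixel input size and the padding values are assumptions (the claims state only the kernel sizes and the stride of step 5.2), chosen to match a typical ResNeXt stem.

```python
def conv_output_size(f_in, k, p, s):
    """Spatial size after a convolution: floor((f_in - k + 2p) / s) + 1."""
    return (f_in - k + 2 * p) // s + 1

# Step 5.2: 7X7 kernel, stride 2; input size 224 and padding 3 are assumed.
after_first_conv = conv_output_size(f_in=224, k=7, p=3, s=2)

# Step 5.3: 3X3 kernel; stride 1 and padding 1 are assumed, which keeps the size.
after_second_conv = conv_output_size(f_in=after_first_conv, k=3, p=1, s=1)

print(after_first_conv, after_second_conv)
```

With these assumed settings the first layer halves the spatial size and the second layer preserves it.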
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202311022573.XA CN117274662A (en) | 2023-08-14 | 2023-08-14 | Lightweight multi-mode medical image classification method for improving ResNeXt neural network |
Publications (1)
Publication Number | Publication Date |
---|---|
CN117274662A (en) | 2023-12-22 |
Family
ID=89209410
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202311022573.XA Pending CN117274662A (en) | 2023-08-14 | 2023-08-14 | Lightweight multi-mode medical image classification method for improving ResNeXt neural network |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN117274662A (en) |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117765322A (en) * | 2023-12-25 | 2024-03-26 | 中国科学技术大学 | Classifying system capable of explaining deep learning based on multi-modal data |
CN117934962A (en) * | 2024-02-06 | 2024-04-26 | 青岛兴牧畜牧科技发展有限公司 | Pork quality classification method based on reference color card image correction |
CN118196584A (en) * | 2024-05-15 | 2024-06-14 | 江苏富翰医疗产业发展有限公司 | Multi-mode glaucoma image recognition method and system |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||