CN117274662A - Lightweight multi-mode medical image classification method for improving ResNeXt neural network - Google Patents

Lightweight multi-mode medical image classification method for improving ResNeXt neural network

Info

Publication number
CN117274662A
Authority
CN
China
Prior art keywords
convolution
layer
feature
data
residual block
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311022573.XA
Other languages
Chinese (zh)
Inventor
付立军
仇慧琪
李旭
伍强
刘婧
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhongke Zhihe Digital Technology Beijing Co ltd
Original Assignee
Zhongke Zhihe Digital Technology Beijing Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhongke Zhihe Digital Technology Beijing Co ltd filed Critical Zhongke Zhihe Digital Technology Beijing Co ltd
Priority to CN202311022573.XA priority Critical patent/CN117274662A/en
Publication of CN117274662A publication Critical patent/CN117274662A/en
Pending legal-status Critical Current


Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/04: Architecture, e.g. interconnection topology
    • G06N 3/045: Combinations of networks
    • G06N 3/0464: Convolutional networks [CNN, ConvNet]
    • G06N 3/048: Activation functions
    • G06N 3/08: Learning methods
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00: Arrangements for image or video recognition or understanding
    • G06V 10/70: Arrangements using pattern recognition or machine learning
    • G06V 10/764: Arrangements using classification, e.g. of video objects
    • G06V 10/77: Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA], independent component analysis [ICA] or self-organising maps [SOM]; blind source separation
    • G06V 10/80: Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G06V 10/806: Fusion of extracted features
    • G06V 10/82: Arrangements using neural networks
    • G06V 2201/00: Indexing scheme relating to image or video recognition or understanding
    • G06V 2201/03: Recognition of patterns in medical or anatomical images

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Biomedical Technology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Medical Informatics (AREA)
  • Databases & Information Systems (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

The invention relates to the field of image processing and provides a lightweight multi-mode medical image classification method that improves the ResNeXt neural network. The method aims to solve the problem that, in medical image classification, large differences exist between images of different modalities, and both traditional machine learning and deep learning methods handle multi-modal data poorly. The main scheme is as follows: medical images from two different modalities are preprocessed to obtain processed images; the processed images are divided proportionally into a training set, a test set and a validation set; data enhancement is applied to the training set; sub-networks are constructed to extract features from the image data of the two modalities separately, and the resulting feature tensors of different dimensions are concatenated; an improved ResNeXt convolutional neural network is constructed as the classification model; after parameter optimization, the preprocessed data are fed into the optimized classification model, and the classification result is output through the model's classifier.

Description

Lightweight multi-mode medical image classification method for improving ResNeXt neural network
Technical Field
The invention relates to the field of image processing, and provides a lightweight multi-mode medical image classification method for improving a ResNeXt neural network.
Background
Traditional machine learning based methods: these include support vector machines, random forests, naive Bayes and the like, which convert medical images into numerical features through feature extraction, feature selection and similar steps, and then classify them with a traditional machine learning algorithm. Such algorithms classify on the basis of manually designed features, whose extraction is typically based on filtering, edge detection and similar operations, and it is difficult for them to capture the high-level features and semantic information of an image. Moreover, because the classifier used in traditional machine learning is usually linear, such as a support vector machine (SVM), it cannot represent nonlinear characteristics well, struggles with high-dimensional and nonlinear image data, and has limited generalization performance. In addition, since features must be extracted manually, traditional machine learning methods place high demands on data quantity and quality: a large amount of labeled data is required, and data quality has an important influence on model performance.
Deep learning based methods: such methods use deep convolutional neural networks (CNNs) for feature extraction and classification, including classical models such as AlexNet, VGG and ResNet. They typically require large amounts of labeled data and computational resources; insufficient training data may lead to over-fitting or under-fitting, especially for large-scale networks and complex image classification tasks, which require even more data for training. In addition, because deep learning models have complex structures and huge numbers of parameters, deep networks are poorly interpretable: the internal logic behind a classification decision is difficult to understand, and the complexity of the network model makes practical deployment troublesome.
Traditional machine learning methods require manually designed feature extractors, have limited feature expression capability and cannot fully extract the information in images, while deep learning methods require large amounts of data and computing resources for training and have huge parameter counts, making them difficult to deploy on resource-limited mobile devices. In medical image classification, there are large differences between images of different modalities, and both kinds of methods handle multi-modal data poorly.
Disclosure of Invention
The invention aims to solve the problem that, because of the large differences between images of different modalities in medical image classification, existing traditional machine learning and deep learning methods handle multi-modal data poorly.
The invention adopts the following technical scheme to realize this purpose:
A lightweight multi-mode medical image classification method for improving the ResNeXt neural network comprises the following steps:
step 1: preprocessing medical images from two different modalities to obtain processed images;
step 2: dividing the images processed in step 1 proportionally into a training set, a test set and a validation set;
step 3: performing data enhancement on the data of the training set;
step 4: constructing sub-networks, extracting features from the image data of the two modalities respectively, and concatenating the resulting feature tensors of different dimensions;
step 5: constructing an improved ResNeXt convolutional neural network as the classification model;
step 6: taking the result of step 4 as the input of step 5, with the different categories as the classification results;
step 7: performing parameter optimization on the ResNeXt convolutional neural network and saving the finally optimized classification model;
step 8: feeding the preprocessed data into the optimized classification model and outputting the classification result through the classifier of the classification model.
In the above technical solution, in step 1 the input medical images are preprocessed: the images of the different modalities are aligned so that their spatial positions and orientations are consistent, and the sample sizes are unified.
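For illustration only, the following is a minimal sketch of such preprocessing, assuming the alignment reduces to resampling both modalities onto a common grid and normalizing intensities; the target size and helper names are assumptions, not taken from the patent, and full registration of the two modalities is outside the scope of this sketch:

```python
import numpy as np
from PIL import Image

TARGET_SIZE = (224, 224)  # assumed unified sample size

def preprocess(path: str) -> np.ndarray:
    """Load one modality, resample it to the target grid, and normalize
    intensities to zero mean / unit variance."""
    img = Image.open(path).convert("L")            # single-channel slice
    img = img.resize(TARGET_SIZE, Image.BILINEAR)  # unify spatial size
    arr = np.asarray(img, dtype=np.float32)
    return (arr - arr.mean()) / (arr.std() + 1e-8)

def preprocess_pair(ct_path: str, mri_path: str):
    """Produce a spatially consistent CT/MRI pair."""
    return preprocess(ct_path), preprocess(mri_path)
```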
In the above technical solution, the data enhancement of step 3 on the training set includes randomly rotating, flipping, scaling and cropping the images of the different modalities.
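One possible augmentation pipeline using torchvision is sketched below; the rotation angle, crop size and scale range are assumptions, and for paired modalities the same random parameters would normally be applied to both images of a pair:

```python
import torchvision.transforms as T

# A possible training-set augmentation pipeline (parameter values assumed).
train_augment = T.Compose([
    T.RandomRotation(degrees=15),                 # random rotation
    T.RandomHorizontalFlip(p=0.5),                # random flip
    T.RandomResizedCrop(224, scale=(0.8, 1.0)),   # random scale + crop
])
```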
In the above technical solution, the sub-network construction in step 4 includes a sub-network AlexNet and a sub-network DenseNet, and step 4 specifically includes the following steps (an illustrative code sketch is given after step 4.4):
step 4.1: constructing the sub-network AlexNet, which receives image data from one modality and performs feature extraction on it to obtain its feature representation; AlexNet comprises 5 convolution layers, each followed by a ReLU activation function and a local response normalization layer, after which two max pooling layers are connected;
step 4.2: constructing the sub-network DenseNet, which receives image data from the other modality and performs feature extraction on it to obtain its feature representation; the convolution layers of DenseNet are organized into several Dense Blocks, and within each Dense Block the output of every convolution layer is concatenated with the outputs of the preceding layers to serve as the input of the current layer, so that information is fully transferred; each convolution layer in a Dense Block is a 3×3 convolution followed by Batch Normalization and a ReLU activation function;
step 4.3: adding a global average pooling layer behind the convolution and pooling layers of the two sub-networks, averaging the feature values on each channel to obtain a global feature;
step 4.4: concatenating the global features of the two sub-networks to obtain the final feature, which is used as the input of the classification model.
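A minimal PyTorch sketch of steps 4.1 to 4.4 follows, with stock torchvision backbones standing in for the custom sub-networks described above; the use of the alexnet and densenet121 feature extractors and all shapes are assumptions for illustration, not the patent's exact sub-networks:

```python
import torch
import torch.nn as nn
import torchvision.models as models

class TwoBranchFeatures(nn.Module):
    """AlexNet branch for one modality, DenseNet branch for the other,
    global average pooling on each branch, then feature concatenation."""
    def __init__(self):
        super().__init__()
        self.branch_a = models.alexnet(weights=None).features      # conv stack only
        self.branch_b = models.densenet121(weights=None).features
        self.gap = nn.AdaptiveAvgPool2d(1)                         # global average pooling

    def forward(self, x_a, x_b):
        fa = self.gap(self.branch_a(x_a)).flatten(1)   # (N, 256) from AlexNet
        fb = self.gap(self.branch_b(x_b)).flatten(1)   # (N, 1024) from DenseNet-121
        return torch.cat([fa, fb], dim=1)              # fused feature vector

# Usage: two pseudo-batches standing in for the two modalities.
feats = TwoBranchFeatures()(torch.randn(2, 3, 224, 224),
                            torch.randn(2, 3, 224, 224))
```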
In the above technical solution, step 5 specifically includes the following steps:
step 5.1: receiving the processed result in the step 4, wherein the result is the characteristic vector after the multi-mode data are spliced and is used as the input of a classification model;
step 5.2: through a first convolution layer, the layer is a convolution kernel of 7X7, the step length is 2, and a feature map with the output dimension of 64 is obtained after convolution operation;
the specific formula of the convolution operation here is:
wherein F is out Is an output feature map, F in Is an input feature map, k is the size of the convolution kernel, p is padding, the size of the padding, s is the size of the stride step;
step 5.3: performing convolution operation by using a 3X3 convolution check feature map in the step 5.2 through a second convolution layer, wherein an input channel is 64, connecting a normalization layer after convolution, and connecting a ReLU activation function to obtain a feature map with the channel number of 128;
step 5.4: then through the first residual block, called residual block 1, the residual block 1 replaces the convolution layer in the ResNeXt residual block by depth separable convolution, and adds a channel attention mechanism in the last layer, firstly, the convolution kernel of 1X1 is used for carrying out channel number conversion to obtain the channel number as 128, then the depth separable convolution of 3X3 is used for carrying out feature extraction, the input images are separated, convolution operations are respectively carried out in the depth direction and the space direction, the data of different modes are added at corresponding positions one by one, and then different weight factors are multiplied, finally, the result is added to obtain new data input, the quantity of parameters and the calculated quantity are reduced, light-weight feature extraction is realized, then, a channel attention is connected, the feature map is normalized in the channel dimension by using a sigmoid activation function, the weight of each channel is calculated, the features of different channels are weighted, the weight of important features is improved, then a multi-branch fusion module is connected, each branch corresponds to a feature map of one mode, the convolution kernel is adopted for carrying out the convolution operation in the depth direction, the feature pattern of 3X3 is adopted, then the feature extraction is carried out on the convolution kernel, and the feature is adopted for carrying out the feature extraction in the same mode, and the size is changed into a double-line 1, and finally, the size is adopted for outputting the double-line 1;
step 5.5: connecting a residual block, called residual block 2, which has the same structure as described in step 5.4, including channel attention and depthwise separable convolution operations; the difference is that the number of input channels is 256 and the number of output channels is 512;
step 5.6: connecting a residual block, called residual block 3, which has the same structure as described in step 5.4; the difference is that the number of input channels is 512 and the number of output channels is 1024;
step 5.7: connecting a residual block, called residual block 4, which has the same structure as described in step 5.4; the difference is that the number of input channels is 1024 and the number of output channels is 2048;
step 5.8: connecting a global pooling layer, averaging the feature images on each channel, and outputting a global feature vector;
step 5.9: connecting a full connection layer, classifying the output feature vectors of the global pooling layer, mapping the feature vectors to a vector space of a prediction category, and outputting a final prediction result;
step 5.10: and the output layer normalizes the output of the full-connection layer by using a softmax activation function, converts the output into probability distribution of a predicted class, and predicts the final result of the classification model into the class with the maximum probability value.
Because the invention adopts the above technical means, it has the following beneficial effects:
This technology adopts a deep-learning convolutional neural network model with ResNeXt as the backbone, uses images of different modalities as different input channels of the network, fuses the features of the different modalities, and then performs classification prediction. It can exploit the powerful feature extraction capability of the ResNeXt network while fusing multi-modal features, improving classification accuracy; at the same time it adopts several lightweight techniques, such as depthwise separable convolution and channel attention mechanisms, which reduce the computational and storage complexity of the model and the time and resource consumption of training and inference. Single-modality medical image classification typically uses only one type of image data and is affected by many factors, so its accuracy is limited. Different types of medical images (e.g., CT, MRI, X-ray) have different characteristics, and using the same classification algorithm for all of them may be limiting; single-modality methods may also perform poorly on some specific types of medical images. Moreover, single-modality classification algorithms usually need large amounts of data to reach acceptable accuracy, but because medical image data are expensive to acquire, the available data are often limited, which can further degrade performance. In addition, medical images are often subject to interference such as artifacts, motion artifacts and noise, which degrade image quality and hence classification performance; a single-modality algorithm may have difficulty distinguishing noise from genuine features, causing classification errors. The multi-modal medical image classification method therefore comprehensively utilizes several types of medical image data, overcomes the shortcomings of single-modality methods, and improves classification precision and stability.
Drawings
FIG. 1 is a diagram of an improved classification network architecture;
FIG. 2 is an internal block diagram of a residual block;
fig. 3 is a channel attention module.
Detailed Description
Hereinafter, embodiments of the present invention will be described in detail. While the invention will be described and illustrated in conjunction with certain specific embodiments, it will be understood that it is not intended to limit the invention to these embodiments alone. On the contrary, the invention is intended to cover modifications and equivalent arrangements included within the scope of the appended claims.
In addition, numerous specific details are set forth in the following description in order to provide a better illustration of the invention. It will be understood by those skilled in the art that the present invention may be practiced without these specific details.
Therefore, the invention provides a technical scheme which has the following characteristics:
lightweight design: on the premise of ensuring classification accuracy, the technology uses a series of lightweight designs, reduces the number of parameters and the calculated amount, and is suitable for medical image equipment and mobile terminal application;
multimodal fusion: when medical images of different modes are processed simultaneously, a mode interaction mechanism is adopted, information of different modes is fully fused, and the accuracy of the classifier is improved;
attention mechanism: the technology adds a attention mechanism in the last layer of the original network residual block, enhances the characterization capability of the network, filters out unimportant noise and interference information, is more important information focused on the network, and improves the feature extraction capability of the network for different samples;
the original resnext network is pretrained on the ImageNet, inherits the superior performance of resnext on a large-scale image classification task, and is excellent in medical image classification task.
To facilitate understanding of the technical scheme by those skilled in the art, the invention provides a lightweight multi-mode medical image classification method for improving a ResNeXt neural network, which comprises the following steps:
step 1: preprocessing medical images from two different modalities to obtain processed images;
step 2: dividing the images processed in step 1 proportionally into a training set, a test set and a validation set;
step 3: performing data enhancement on the data of the training set;
step 4: constructing sub-networks, extracting features from the image data of the two modalities respectively, and concatenating the resulting feature tensors of different dimensions;
step 5: constructing an improved ResNeXt convolutional neural network as the classification model;
step 6: taking the result of step 4 as the input of step 5, with the different categories as the classification results;
step 7: performing parameter optimization on the ResNeXt convolutional neural network and saving the finally optimized classification model;
step 8: feeding the preprocessed data into the optimized classification model and outputting the classification result through the classifier of the classification model.
In the above technical solution, in step 1 the input medical images are preprocessed: the images of the different modalities are aligned so that their spatial positions and orientations are consistent, and the sample sizes are unified.
In the above technical solution, the data enhancement of step 3 on the training set includes randomly rotating, flipping, scaling and cropping the images of the different modalities.
In the above technical solution, the sub-network construction in step 4 includes a sub-network AlexNet and a sub-network DenseNet, and step 4 specifically includes the following steps:
step 4.1: constructing the sub-network AlexNet, which receives image data from one modality and performs feature extraction on it to obtain its feature representation; AlexNet comprises 5 convolution layers, each followed by a ReLU activation function and a local response normalization layer, after which two max pooling layers are connected;
step 4.2: constructing the sub-network DenseNet, which receives image data from the other modality and performs feature extraction on it to obtain its feature representation; the convolution layers of DenseNet are organized into several Dense Blocks, and within each Dense Block the output of every convolution layer is concatenated with the outputs of the preceding layers to serve as the input of the current layer, so that information is fully transferred; each convolution layer in a Dense Block is a 3×3 convolution followed by Batch Normalization and a ReLU activation function;
step 4.3: adding a global average pooling layer behind the convolution and pooling layers of the two sub-networks, averaging the feature values on each channel to obtain a global feature;
step 4.4: concatenating the global features of the two sub-networks to obtain the final feature, which is used as the input of the classification model.
In the above technical solution, step 5 specifically includes the following steps:
step 5.1: receiving the processed result in the step 4, wherein the result is the characteristic vector after the multi-mode data are spliced and is used as the input of a classification model;
step 5.2: through a first convolution layer, the layer is a convolution kernel of 7X7, the step length is 2, and a feature map with the output dimension of 64 is obtained after convolution operation;
the specific formula of the convolution operation here is:
wherein F is out Is an output feature map, F in Is an input feature map, k is the size of the convolution kernel, p is padding, the size of the padding, s is the size of the stride step;
step 5.3: performing convolution operation by using a 3X3 convolution check feature map in the step 5.2 through a second convolution layer, wherein an input channel is 64, connecting a normalization layer after convolution, and connecting a ReLU activation function to obtain a feature map with the channel number of 128;
step 5.4: then through the first residual block, called residual block 1, the residual block 1 replaces the convolution layer in the ResNeXt residual block by depth separable convolution, and adds a channel attention mechanism in the last layer, firstly, the convolution kernel of 1X1 is used for carrying out channel number conversion to obtain the channel number as 128, then the depth separable convolution of 3X3 is used for carrying out feature extraction, the input images are separated, convolution operations are respectively carried out in the depth direction and the space direction, the data of different modes are added at corresponding positions one by one, and then different weight factors are multiplied, finally, the result is added to obtain new data input, the quantity of parameters and the calculated quantity are reduced, light-weight feature extraction is realized, then, a channel attention is connected, the feature map is normalized in the channel dimension by using a sigmoid activation function, the weight of each channel is calculated, the features of different channels are weighted, the weight of important features is improved, then a multi-branch fusion module is connected, each branch corresponds to a feature map of one mode, the convolution kernel is adopted for carrying out the convolution operation in the depth direction, the feature pattern of 3X3 is adopted, then the feature extraction is carried out on the convolution kernel, and the feature is adopted for carrying out the feature extraction in the same mode, and the size is changed into a double-line 1, and finally, the size is adopted for outputting the double-line 1;
step 5.5: a residual block is connected, which is called residual block 2, and the residual block 2 has the same structure as that described in the step 5.3 and comprises a channel attention and depth separable convolution operation, wherein the difference is that the number of channels is 256, and the number of output channels is 512;
step 5.6: a residual block, called residual block 3, is connected, and has the same structure as that described in the step 5.3, including a channel attention and depth separable convolution operation, except that the number of channels is 512, and the number of output channels is 1024;
step 5.7: a residual block called residual block 4 is connected, and the residual block has the same structure as that described in the step 5.3 and comprises a channel attention and depth separable convolution operation, wherein the difference is that the number of channels is 1024, and the number of output channels is 2048;
step 5.8: connecting a global pooling layer, averaging the feature images on each channel, and outputting a global feature vector;
step 5.9: connecting a full connection layer, classifying the output feature vectors of the global pooling layer, mapping the feature vectors to a vector space of a prediction category, and outputting a final prediction result;
step 5.10: and the output layer normalizes the output of the full-connection layer by using a softmax activation function, converts the output into probability distribution of a predicted class, and predicts the final result of the classification model into the class with the maximum probability value.
The improved ResNeXt structure is shown in FIG. 1, where CT and MRI are taken as examples of the data of the two modalities.
The internal structure of the residual block is shown in FIG. 2. The computation of the depthwise separable convolution can be divided into two steps: depthwise convolution and point-wise convolution. The depthwise convolution convolves each channel of the input tensor separately, using an individual kernel per channel: if the input tensor has C_in channels, each of the C_in channels is convolved with a 3×3 kernel, yielding C_in output channels. Point-wise convolution is then applied to the output of the depthwise convolution: the per-channel feature maps are combined pixel by pixel with 1×1 convolutions, mapping the depthwise output to C_out channels and producing the final output tensor.
The calculation formula of the depthwise separable convolution is:
Y = pointwise(depthwise(X))
where X is the input tensor, Y is the output tensor, depthwise is the depthwise (per-channel) convolution, and pointwise is the 1×1 point-wise convolution applied to its result.
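A short PyTorch sketch of this decomposition follows; the channel counts and spatial size are illustrative, and the parameter comparison shows the lightweight saving over a standard 3×3 convolution:

```python
import torch
import torch.nn as nn

c_in, c_out = 64, 128  # illustrative channel counts

# Depthwise: one 3x3 kernel per input channel (groups = c_in).
depthwise = nn.Conv2d(c_in, c_in, kernel_size=3, padding=1, groups=c_in)
# Pointwise: 1x1 convolution mapping c_in channels to c_out channels.
pointwise = nn.Conv2d(c_in, c_out, kernel_size=1)

x = torch.randn(1, c_in, 56, 56)
y = pointwise(depthwise(x))        # Y = pointwise(depthwise(X)); shape (1, 128, 56, 56)

# Parameter count versus a standard 3x3 convolution:
standard = nn.Conv2d(c_in, c_out, kernel_size=3, padding=1)
sep = sum(p.numel() for p in depthwise.parameters()) + \
      sum(p.numel() for p in pointwise.parameters())
std = sum(p.numel() for p in standard.parameters())
print(sep, std)                    # 8960 vs 73856 parameters
```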
The specific structure of the channel attention module is shown in FIG. 3, and its computation is as follows. For an input feature map of size H×W×C, where H is the height of the feature map, W is its width and C is its number of channels, global average pooling is first performed on the input feature map to obtain a feature vector with one entry per channel:
z_c = (1 / (H·W)) · Σ_{i=1..H} Σ_{k=1..W} X_{i,k,c}
where i enumerates the height dimension with i ∈ [1, H], k enumerates the width dimension with k ∈ [1, W], and X_{i,k,c} denotes the value of the feature map X at height i, width k and channel c.
The feature vector is then passed through two fully connected layers to obtain s and f:
s = σ(W_1 z + b_1), f = σ(W_2 s + b_2)
where σ is the sigmoid activation function, z is the channel descriptor vector produced by the global average pooling above, W_1 and W_2 are the weights of the two fully connected layers, and b_1 and b_2 are bias terms; the biases shift the output of the activation functions so that the model adapts better to different data distributions, giving better model performance.
and then taking each element of f as a channel weight, and carrying out weighted rescaling on the input feature map to obtain an output feature map Y:
Y i,k,j =f j X i,k ,j∈[1,C]。
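The following sketch implements these three formulas literally in PyTorch (global average pooling, two sigmoid-activated fully connected layers, channel-wise rescaling); the reduction ratio and all shapes are assumptions:

```python
import torch

def channel_attention(x, w1, b1, w2, b2):
    """z = GAP(X); s = sigmoid(W1 z + b1); f = sigmoid(W2 s + b2);
    Y_{i,k,j} = f_j * X_{i,k,j}."""
    n, c, h, w = x.shape
    z = x.mean(dim=(2, 3))              # global average pooling: (N, C)
    s = torch.sigmoid(z @ w1.T + b1)    # first fully connected layer
    f = torch.sigmoid(s @ w2.T + b2)    # second layer -> channel weights f
    return x * f.view(n, c, 1, 1)       # channel-wise rescaling

# Usage with illustrative shapes (reduction ratio r = 4 assumed):
c, r = 64, 4
x = torch.randn(2, c, 32, 32)
w1, b1 = torch.randn(c // r, c), torch.randn(c // r)
w2, b2 = torch.randn(c, c // r), torch.randn(c)
y = channel_attention(x, w1, b1, w2, b2)   # same shape as x
```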
by adopting the scheme, the invention has the following characteristics:
by fusing the multi-mode medical image data, the information of different mode images can be fully utilized, so that the classification accuracy is improved.
And a channel attention mechanism is added to the last layer of each residual structure, so that the capability of extracting features of the network is effectively improved, and the classification capability of the network on medical images of different modes is enhanced. Through multi-mode data fusion, more comprehensive and more accurate disease or pathological change characteristics can be obtained, so that pathological analysis and medical diagnosis are better carried out, and the interpretation of the model is improved.
The convolution layer in the residual error network is improved, the convolution layer is changed into light-weight depth separable convolution, the parameter quantity is reduced, and the calculation complexity and the storage complexity of the model are reduced, so that the time and the resource consumption of model training and reasoning are reduced. The network uses the residual connection of the original residual network to help the model to better process deep information and to better process complex data.

Claims (5)

1. A lightweight multi-mode medical image classification method for improving the ResNeXt neural network, characterized by comprising the following steps:
step 1: preprocessing medical images from two different modalities to obtain processed images;
step 2: dividing the images processed in step 1 proportionally into a training set, a test set and a validation set;
step 3: performing data enhancement on the data of the training set;
step 4: constructing sub-networks, extracting features from the image data of the two modalities respectively, and concatenating the resulting feature tensors of different dimensions;
step 5: constructing an improved ResNeXt convolutional neural network as the classification model;
step 6: taking the result of step 4 as the input of step 5, with the different categories as the classification results;
step 7: performing parameter optimization on the ResNeXt convolutional neural network and saving the finally optimized classification model;
step 8: feeding the preprocessed data into the optimized classification model and outputting the classification result through the classifier of the classification model.
2. The lightweight multi-mode medical image classification method for improving the ResNeXt neural network as claimed in claim 1, wherein the input medical images are preprocessed in step 1, i.e., the images of the different modalities are aligned so that their spatial positions and orientations are consistent and the sample sizes are unified.
3. The method of claim 1, wherein the data enhancement of step 3 on the training set comprises randomly rotating, flipping, scaling and cropping the images of the different modalities.
4. The lightweight multi-mode medical image classification method for improving the ResNeXt neural network according to claim 1, wherein the sub-network construction in step 4 includes a sub-network AlexNet and a sub-network DenseNet, and step 4 specifically includes the following steps:
step 4.1: constructing the sub-network AlexNet, which receives image data from one modality and performs feature extraction on it to obtain its feature representation; AlexNet comprises 5 convolution layers, each followed by a ReLU activation function and a local response normalization layer, after which two max pooling layers are connected;
step 4.2: constructing the sub-network DenseNet, which receives image data from the other modality and performs feature extraction on it to obtain its feature representation; the convolution layers of DenseNet are organized into several Dense Blocks, and within each Dense Block the output of every convolution layer is concatenated with the outputs of the preceding layers to serve as the input of the current layer, so that information is fully transferred; each convolution layer in a Dense Block is a 3×3 convolution followed by Batch Normalization and a ReLU activation function;
step 4.3: adding a global average pooling layer behind the convolution and pooling layers of the two sub-networks, averaging the feature values on each channel to obtain a global feature;
step 4.4: concatenating the global features of the two sub-networks to obtain the final feature, which is used as the input of the classification model.
5. The method for classifying lightweight multi-modal medical images for improved ResNeXt neural networks according to claim 1, wherein step 5 comprises the steps of:
step 5.1: receiving the processed result in the step 4, wherein the result is the characteristic vector after the multi-mode data are spliced and is used as the input of a classification model;
step 5.2: through a first convolution layer, the layer is a convolution kernel of 7X7, the step length is 2, and a feature map with the output dimension of 64 is obtained after convolution operation;
the specific formula of the convolution operation here is:
wherein F is out Is an output feature map, F in Is an input feature map, k is the size of the convolution kernel, p is padding, the size of the padding, s is the size of the stride step;
step 5.3: performing convolution operation by using a 3X3 convolution check feature map in the step 5.2 through a second convolution layer, wherein an input channel is 64, connecting a normalization layer after convolution, and connecting a ReLU activation function to obtain a feature map with the channel number of 128;
step 5.4: then through the first residual block, called residual block 1, the residual block 1 replaces the convolution layer in the ResNeXt residual block by depth separable convolution, and adds a channel attention mechanism in the last layer, firstly, the convolution kernel of 1X1 is used for carrying out channel number conversion to obtain the channel number as 128, then the depth separable convolution of 3X3 is used for carrying out feature extraction, the input images are separated, convolution operations are respectively carried out in the depth direction and the space direction, the data of different modes are added at corresponding positions one by one, and then different weight factors are multiplied, finally, the result is added to obtain new data input, the quantity of parameters and the calculated quantity are reduced, light-weight feature extraction is realized, then, a channel attention is connected, the feature map is normalized in the channel dimension by using a sigmoid activation function, the weight of each channel is calculated, the features of different channels are weighted, the weight of important features is improved, then a multi-branch fusion module is connected, each branch corresponds to a feature map of one mode, the convolution kernel is adopted for carrying out the convolution operation in the depth direction, the feature pattern of 3X3 is adopted, then the feature extraction is carried out on the convolution kernel, and the feature is adopted for carrying out the feature extraction in the same mode, and the size is changed into a double-line 1, and finally, the size is adopted for outputting the double-line 1;
step 5.5: a residual block is connected, which is called residual block 2, and the residual block 2 has the same structure as that described in the step 5.3 and comprises a channel attention and depth separable convolution operation, wherein the difference is that the number of channels is 256, and the number of output channels is 512;
step 5.6: a residual block, called residual block 3, is connected, and has the same structure as that described in step 5.3, and comprises a channel attention and depth separable convolution operation, wherein the difference is that the number of channels is 512, and the number of output channels is 1024;
step 5.7: a residual block called residual block 4 is connected, and the residual block has the same structure as that described in the step 5.3 and comprises a channel attention and depth separable convolution operation, wherein the difference is that the number of channels is 1024, and the number of output channels is 2048;
step 5.8: connecting a global pooling layer, averaging the feature images on each channel, and outputting a global feature vector;
step 5.9: connecting a full connection layer, classifying the output feature vectors of the global pooling layer, mapping the feature vectors to a vector space of a prediction category, and outputting a final prediction result;
step 5.10: and the output layer normalizes the output of the full-connection layer by using a softmax activation function, converts the output into probability distribution of a predicted class, and predicts the final result of the classification model into the class with the maximum probability value.
CN202311022573.XA 2023-08-14 2023-08-14 Lightweight multi-mode medical image classification method for improving ResNeXt neural network Pending CN117274662A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311022573.XA CN117274662A (en) 2023-08-14 2023-08-14 Lightweight multi-mode medical image classification method for improving ResNeXt neural network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311022573.XA CN117274662A (en) 2023-08-14 2023-08-14 Lightweight multi-mode medical image classification method for improving ResNeXt neural network

Publications (1)

Publication Number Publication Date
CN117274662A true CN117274662A (en) 2023-12-22

Family

ID=89209410

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311022573.XA Pending CN117274662A (en) 2023-08-14 2023-08-14 Lightweight multi-mode medical image classification method for improving ResNeXt neural network

Country Status (1)

Country Link
CN (1) CN117274662A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117765322A (en) * 2023-12-25 2024-03-26 中国科学技术大学 Classifying system capable of explaining deep learning based on multi-modal data
CN117934962A (en) * 2024-02-06 2024-04-26 青岛兴牧畜牧科技发展有限公司 Pork quality classification method based on reference color card image correction
CN118196584A (en) * 2024-05-15 2024-06-14 江苏富翰医疗产业发展有限公司 Multi-mode glaucoma image recognition method and system



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination