CN117274662A - Lightweight multi-mode medical image classification method for improving ResNeXt neural network - Google Patents

Lightweight multi-mode medical image classification method for improving ResNeXt neural network

Info

Publication number
CN117274662A
Authority
CN
China
Prior art keywords
convolution
layer
feature
data
residual block
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311022573.XA
Other languages
Chinese (zh)
Inventor
付立军
仇慧琪
李旭
伍强
刘婧
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhongke Zhihe Digital Technology Beijing Co ltd
Original Assignee
Zhongke Zhihe Digital Technology Beijing Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhongke Zhihe Digital Technology Beijing Co ltd filed Critical Zhongke Zhihe Digital Technology Beijing Co ltd
Priority to CN202311022573.XA priority Critical patent/CN117274662A/en
Publication of CN117274662A publication Critical patent/CN117274662A/en
Pending legal-status Critical Current


Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/04: Architecture, e.g. interconnection topology
    • G06N 3/045: Combinations of networks
    • G06N 3/0464: Convolutional networks [CNN, ConvNet]
    • G06N 3/048: Activation functions
    • G06N 3/08: Learning methods
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00: Arrangements for image or video recognition or understanding
    • G06V 10/70: Arrangements using pattern recognition or machine learning
    • G06V 10/764: Arrangements using classification, e.g. of video objects
    • G06V 10/77: Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA], independent component analysis [ICA] or self-organising maps [SOM]; blind source separation
    • G06V 10/80: Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G06V 10/806: Fusion of extracted features
    • G06V 10/82: Arrangements using neural networks
    • G06V 2201/00: Indexing scheme relating to image or video recognition or understanding
    • G06V 2201/03: Recognition of patterns in medical or anatomical images

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Biomedical Technology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Medical Informatics (AREA)
  • Databases & Information Systems (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

The invention relates to the field of image processing and provides a lightweight multi-mode medical image classification method that improves the ResNeXt neural network. The method aims to solve the problem that, in medical image classification, large differences exist between images of different modalities, and both traditional machine learning and deep learning methods handle multi-modal data poorly. The main scheme is as follows: medical images from two different modalities are preprocessed to obtain processed images; the processed images are divided proportionally into a training set, a test set and a validation set; data enhancement is applied to the training set; sub-networks are constructed to extract features from the image data of the two modalities separately, and the resulting feature tensors of different dimensions are concatenated; an improved ResNeXt convolutional neural network is constructed as the classification model; after parameter optimization, the preprocessed data are fed into the optimized classification model, and the classification result is output through the model's classifier.

Description

Lightweight multi-mode medical image classification method for improving ResNeXt neural network
Technical Field
The invention relates to the field of image processing, and provides a lightweight multi-mode medical image classification method for improving a ResNeXt neural network.
Background
Traditional machine learning based methods: these include support vector machines, random forests, naive Bayes and the like, which convert medical images into numerical features through feature extraction, feature selection and similar steps, and then classify them with a traditional machine learning algorithm. Such algorithms classify on the basis of manually designed features, whose extraction is typically based on filtering, edge detection and similar operations, and it is difficult for them to capture the high-level features and semantic information of an image. Moreover, because the classifier used in traditional machine learning is usually linear, such as a support vector machine (SVM), it cannot represent nonlinear characteristics well, struggles with high-dimensional and nonlinear image data, and has limited generalization performance. In addition, since features must be extracted manually, traditional machine learning methods place high demands on data quantity and quality: a large amount of labeled data is required, and data quality has an important influence on model performance.
Deep learning based methods: such methods use deep convolutional neural networks (CNNs) for feature extraction and classification, including classical models such as AlexNet, VGG and ResNet. They typically require large amounts of labeled data and computational resources; insufficient training data may lead to over-fitting or under-fitting, especially for large-scale networks and complex image classification tasks, which require even more data for training. In addition, because deep learning models have complex structures and huge numbers of parameters, deep networks are poorly interpretable: the internal logic behind a classification decision is difficult to understand, and the complexity of the network model makes practical deployment troublesome.
Traditional machine learning methods require manually designed feature extractors, have limited feature expression capability and cannot fully extract the information in images, while deep learning methods require large amounts of data and computing resources for training and have huge parameter counts, making them difficult to deploy on resource-limited mobile devices. In medical image classification, there are large differences between images of different modalities, and both kinds of methods handle multi-modal data poorly.
Disclosure of Invention
The invention aims to solve the problem that, because of the large differences between images of different modalities in medical image classification, existing traditional machine learning and deep learning methods handle multi-modal data poorly.
The invention adopts the following technical scheme to realize this purpose:
A lightweight multi-mode medical image classification method for improving the ResNeXt neural network comprises the following steps:
step 1: preprocessing medical images from two different modalities to obtain processed images;
step 2: dividing the images processed in step 1 proportionally into a training set, a test set and a validation set;
step 3: performing data enhancement on the data of the training set;
step 4: constructing sub-networks, extracting features from the image data of the two modalities respectively, and concatenating the resulting feature tensors of different dimensions;
step 5: constructing an improved ResNeXt convolutional neural network as the classification model;
step 6: taking the result of step 4 as the input of step 5, with the different categories as the classification results;
step 7: performing parameter optimization on the ResNeXt convolutional neural network and saving the finally optimized classification model;
step 8: feeding the preprocessed data into the optimized classification model and outputting the classification result through the classifier of the classification model.
In the above technical solution, in step 1 the input medical images are preprocessed: the images of the different modalities are aligned so that their spatial positions and orientations are consistent, and the sample sizes are unified.
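For illustration only, the following is a minimal sketch of such preprocessing, assuming the alignment reduces to resampling both modalities onto a common grid and normalizing intensities; the target size and helper names are assumptions, not taken from the patent, and full registration of the two modalities is outside the scope of this sketch:

```python
import numpy as np
from PIL import Image

TARGET_SIZE = (224, 224)  # assumed unified sample size

def preprocess(path: str) -> np.ndarray:
    """Load one modality, resample it to the target grid, and normalize
    intensities to zero mean / unit variance."""
    img = Image.open(path).convert("L")            # single-channel slice
    img = img.resize(TARGET_SIZE, Image.BILINEAR)  # unify spatial size
    arr = np.asarray(img, dtype=np.float32)
    return (arr - arr.mean()) / (arr.std() + 1e-8)

def preprocess_pair(ct_path: str, mri_path: str):
    """Produce a spatially consistent CT/MRI pair."""
    return preprocess(ct_path), preprocess(mri_path)
```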
In the above technical solution, the data enhancement of step 3 on the training set includes randomly rotating, flipping, scaling and cropping the images of the different modalities.
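One possible augmentation pipeline using torchvision is sketched below; the rotation angle, crop size and scale range are assumptions, and for paired modalities the same random parameters would normally be applied to both images of a pair:

```python
import torchvision.transforms as T

# A possible training-set augmentation pipeline (parameter values assumed).
train_augment = T.Compose([
    T.RandomRotation(degrees=15),                 # random rotation
    T.RandomHorizontalFlip(p=0.5),                # random flip
    T.RandomResizedCrop(224, scale=(0.8, 1.0)),   # random scale + crop
])
```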
In the above technical solution, the sub-network construction in step 4 includes a sub-network AlexNet and a sub-network DenseNet, and step 4 specifically includes the following steps (an illustrative code sketch is given after step 4.4):
step 4.1: constructing the sub-network AlexNet, which receives image data from one modality and performs feature extraction on it to obtain its feature representation; AlexNet comprises 5 convolution layers, each followed by a ReLU activation function and a local response normalization layer, after which two max pooling layers are connected;
step 4.2: constructing the sub-network DenseNet, which receives image data from the other modality and performs feature extraction on it to obtain its feature representation; the convolution layers of DenseNet are organized into several Dense Blocks, and within each Dense Block the output of every convolution layer is concatenated with the outputs of the preceding layers to serve as the input of the current layer, so that information is fully transferred; each convolution layer in a Dense Block is a 3×3 convolution followed by Batch Normalization and a ReLU activation function;
step 4.3: adding a global average pooling layer behind the convolution and pooling layers of the two sub-networks, averaging the feature values on each channel to obtain a global feature;
step 4.4: concatenating the global features of the two sub-networks to obtain the final feature, which is used as the input of the classification model.
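A minimal PyTorch sketch of steps 4.1 to 4.4 follows, with stock torchvision backbones standing in for the custom sub-networks described above; the use of the alexnet and densenet121 feature extractors and all shapes are assumptions for illustration, not the patent's exact sub-networks:

```python
import torch
import torch.nn as nn
import torchvision.models as models

class TwoBranchFeatures(nn.Module):
    """AlexNet branch for one modality, DenseNet branch for the other,
    global average pooling on each branch, then feature concatenation."""
    def __init__(self):
        super().__init__()
        self.branch_a = models.alexnet(weights=None).features      # conv stack only
        self.branch_b = models.densenet121(weights=None).features
        self.gap = nn.AdaptiveAvgPool2d(1)                         # global average pooling

    def forward(self, x_a, x_b):
        fa = self.gap(self.branch_a(x_a)).flatten(1)   # (N, 256) from AlexNet
        fb = self.gap(self.branch_b(x_b)).flatten(1)   # (N, 1024) from DenseNet-121
        return torch.cat([fa, fb], dim=1)              # fused feature vector

# Usage: two pseudo-batches standing in for the two modalities.
feats = TwoBranchFeatures()(torch.randn(2, 3, 224, 224),
                            torch.randn(2, 3, 224, 224))
```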
In the above technical solution, step 5 specifically includes the following steps:
step 5.1: receiving the processed result in the step 4, wherein the result is the characteristic vector after the multi-mode data are spliced and is used as the input of a classification model;
step 5.2: through a first convolution layer, the layer is a convolution kernel of 7X7, the step length is 2, and a feature map with the output dimension of 64 is obtained after convolution operation;
the specific formula of the convolution operation here is:
wherein F is out Is an output feature map, F in Is an input feature map, k is the size of the convolution kernel, p is padding, the size of the padding, s is the size of the stride step;
step 5.3: performing convolution operation by using a 3X3 convolution check feature map in the step 5.2 through a second convolution layer, wherein an input channel is 64, connecting a normalization layer after convolution, and connecting a ReLU activation function to obtain a feature map with the channel number of 128;
step 5.4: then through the first residual block, called residual block 1, the residual block 1 replaces the convolution layer in the ResNeXt residual block by depth separable convolution, and adds a channel attention mechanism in the last layer, firstly, the convolution kernel of 1X1 is used for carrying out channel number conversion to obtain the channel number as 128, then the depth separable convolution of 3X3 is used for carrying out feature extraction, the input images are separated, convolution operations are respectively carried out in the depth direction and the space direction, the data of different modes are added at corresponding positions one by one, and then different weight factors are multiplied, finally, the result is added to obtain new data input, the quantity of parameters and the calculated quantity are reduced, light-weight feature extraction is realized, then, a channel attention is connected, the feature map is normalized in the channel dimension by using a sigmoid activation function, the weight of each channel is calculated, the features of different channels are weighted, the weight of important features is improved, then a multi-branch fusion module is connected, each branch corresponds to a feature map of one mode, the convolution kernel is adopted for carrying out the convolution operation in the depth direction, the feature pattern of 3X3 is adopted, then the feature extraction is carried out on the convolution kernel, and the feature is adopted for carrying out the feature extraction in the same mode, and the size is changed into a double-line 1, and finally, the size is adopted for outputting the double-line 1;
step 5.5: connecting a residual block, called residual block 2, which has the same structure as described in step 5.4, including channel attention and depthwise separable convolution operations; the difference is that the number of input channels is 256 and the number of output channels is 512;
step 5.6: connecting a residual block, called residual block 3, which has the same structure as described in step 5.4; the difference is that the number of input channels is 512 and the number of output channels is 1024;
step 5.7: connecting a residual block, called residual block 4, which has the same structure as described in step 5.4; the difference is that the number of input channels is 1024 and the number of output channels is 2048;
step 5.8: connecting a global pooling layer, averaging the feature images on each channel, and outputting a global feature vector;
step 5.9: connecting a full connection layer, classifying the output feature vectors of the global pooling layer, mapping the feature vectors to a vector space of a prediction category, and outputting a final prediction result;
step 5.10: and the output layer normalizes the output of the full-connection layer by using a softmax activation function, converts the output into probability distribution of a predicted class, and predicts the final result of the classification model into the class with the maximum probability value.
Because the invention adopts the above technical means, it has the following beneficial effects:
This technology adopts a deep-learning convolutional neural network model with ResNeXt as the backbone, uses images of different modalities as different input channels of the network, fuses the features of the different modalities, and then performs classification prediction. It can exploit the powerful feature extraction capability of the ResNeXt network while fusing multi-modal features, improving classification accuracy; at the same time it adopts several lightweight techniques, such as depthwise separable convolution and channel attention mechanisms, which reduce the computational and storage complexity of the model and the time and resource consumption of training and inference. Single-modality medical image classification typically uses only one type of image data and is affected by many factors, so its accuracy is limited. Different types of medical images (e.g., CT, MRI, X-ray) have different characteristics, and using the same classification algorithm for all of them may be limiting; single-modality methods may also perform poorly on some specific types of medical images. Moreover, single-modality classification algorithms usually need large amounts of data to reach acceptable accuracy, but because medical image data are expensive to acquire, the available data are often limited, which can further degrade performance. In addition, medical images are often subject to interference such as artifacts, motion artifacts and noise, which degrade image quality and hence classification performance; a single-modality algorithm may have difficulty distinguishing noise from genuine features, causing classification errors. The multi-modal medical image classification method therefore comprehensively utilizes several types of medical image data, overcomes the shortcomings of single-modality methods, and improves classification precision and stability.
Drawings
FIG. 1 is a diagram of an improved classification network architecture;
FIG. 2 is an internal block diagram of a residual block;
fig. 3 is a channel attention module.
Detailed Description
Hereinafter, embodiments of the present invention will be described in detail. While the invention will be described and illustrated in conjunction with certain specific embodiments, it will be understood that it is not intended to limit the invention to these embodiments alone. On the contrary, the invention is intended to cover modifications and equivalent arrangements included within the scope of the appended claims.
In addition, numerous specific details are set forth in the following description in order to provide a better illustration of the invention. It will be understood by those skilled in the art that the present invention may be practiced without these specific details.
Therefore, the invention provides a technical scheme which has the following characteristics:
lightweight design: on the premise of ensuring classification accuracy, the technology uses a series of lightweight designs, reduces the number of parameters and the calculated amount, and is suitable for medical image equipment and mobile terminal application;
multimodal fusion: when medical images of different modes are processed simultaneously, a mode interaction mechanism is adopted, information of different modes is fully fused, and the accuracy of the classifier is improved;
attention mechanism: the technology adds a attention mechanism in the last layer of the original network residual block, enhances the characterization capability of the network, filters out unimportant noise and interference information, is more important information focused on the network, and improves the feature extraction capability of the network for different samples;
the original resnext network is pretrained on the ImageNet, inherits the superior performance of resnext on a large-scale image classification task, and is excellent in medical image classification task.
To facilitate understanding of the technical scheme by those skilled in the art, the invention provides a lightweight multi-mode medical image classification method for improving a ResNeXt neural network, which comprises the following steps:
step 1: preprocessing medical images from two different modalities to obtain processed images;
step 2: dividing the images processed in step 1 proportionally into a training set, a test set and a validation set;
step 3: performing data enhancement on the data of the training set;
step 4: constructing sub-networks, extracting features from the image data of the two modalities respectively, and concatenating the resulting feature tensors of different dimensions;
step 5: constructing an improved ResNeXt convolutional neural network as the classification model;
step 6: taking the result of step 4 as the input of step 5, with the different categories as the classification results;
step 7: performing parameter optimization on the ResNeXt convolutional neural network and saving the finally optimized classification model;
step 8: feeding the preprocessed data into the optimized classification model and outputting the classification result through the classifier of the classification model.
In the above technical solution, in step 1 the input medical images are preprocessed: the images of the different modalities are aligned so that their spatial positions and orientations are consistent, and the sample sizes are unified.
In the above technical solution, the data enhancement of step 3 on the training set includes randomly rotating, flipping, scaling and cropping the images of the different modalities.
In the above technical solution, the sub-network construction in step 4 includes a sub-network AlexNet and a sub-network DenseNet, and step 4 specifically includes the following steps:
step 4.1: constructing the sub-network AlexNet, which receives image data from one modality and performs feature extraction on it to obtain its feature representation; AlexNet comprises 5 convolution layers, each followed by a ReLU activation function and a local response normalization layer, after which two max pooling layers are connected;
step 4.2: constructing the sub-network DenseNet, which receives image data from the other modality and performs feature extraction on it to obtain its feature representation; the convolution layers of DenseNet are organized into several Dense Blocks, and within each Dense Block the output of every convolution layer is concatenated with the outputs of the preceding layers to serve as the input of the current layer, so that information is fully transferred; each convolution layer in a Dense Block is a 3×3 convolution followed by Batch Normalization and a ReLU activation function;
step 4.3: adding a global average pooling layer behind the convolution and pooling layers of the two sub-networks, averaging the feature values on each channel to obtain a global feature;
step 4.4: concatenating the global features of the two sub-networks to obtain the final feature, which is used as the input of the classification model.
In the above technical solution, step 5 specifically includes the following steps:
step 5.1: receiving the processed result in the step 4, wherein the result is the characteristic vector after the multi-mode data are spliced and is used as the input of a classification model;
step 5.2: through a first convolution layer, the layer is a convolution kernel of 7X7, the step length is 2, and a feature map with the output dimension of 64 is obtained after convolution operation;
the specific formula of the convolution operation here is:
wherein F is out Is an output feature map, F in Is an input feature map, k is the size of the convolution kernel, p is padding, the size of the padding, s is the size of the stride step;
step 5.3: performing convolution operation by using a 3X3 convolution check feature map in the step 5.2 through a second convolution layer, wherein an input channel is 64, connecting a normalization layer after convolution, and connecting a ReLU activation function to obtain a feature map with the channel number of 128;
step 5.4: then through the first residual block, called residual block 1, the residual block 1 replaces the convolution layer in the ResNeXt residual block by depth separable convolution, and adds a channel attention mechanism in the last layer, firstly, the convolution kernel of 1X1 is used for carrying out channel number conversion to obtain the channel number as 128, then the depth separable convolution of 3X3 is used for carrying out feature extraction, the input images are separated, convolution operations are respectively carried out in the depth direction and the space direction, the data of different modes are added at corresponding positions one by one, and then different weight factors are multiplied, finally, the result is added to obtain new data input, the quantity of parameters and the calculated quantity are reduced, light-weight feature extraction is realized, then, a channel attention is connected, the feature map is normalized in the channel dimension by using a sigmoid activation function, the weight of each channel is calculated, the features of different channels are weighted, the weight of important features is improved, then a multi-branch fusion module is connected, each branch corresponds to a feature map of one mode, the convolution kernel is adopted for carrying out the convolution operation in the depth direction, the feature pattern of 3X3 is adopted, then the feature extraction is carried out on the convolution kernel, and the feature is adopted for carrying out the feature extraction in the same mode, and the size is changed into a double-line 1, and finally, the size is adopted for outputting the double-line 1;
step 5.5: a residual block is connected, which is called residual block 2, and the residual block 2 has the same structure as that described in the step 5.3 and comprises a channel attention and depth separable convolution operation, wherein the difference is that the number of channels is 256, and the number of output channels is 512;
step 5.6: a residual block, called residual block 3, is connected, and has the same structure as that described in the step 5.3, including a channel attention and depth separable convolution operation, except that the number of channels is 512, and the number of output channels is 1024;
step 5.7: a residual block called residual block 4 is connected, and the residual block has the same structure as that described in the step 5.3 and comprises a channel attention and depth separable convolution operation, wherein the difference is that the number of channels is 1024, and the number of output channels is 2048;
step 5.8: connecting a global pooling layer, averaging the feature images on each channel, and outputting a global feature vector;
step 5.9: connecting a full connection layer, classifying the output feature vectors of the global pooling layer, mapping the feature vectors to a vector space of a prediction category, and outputting a final prediction result;
step 5.10: and the output layer normalizes the output of the full-connection layer by using a softmax activation function, converts the output into probability distribution of a predicted class, and predicts the final result of the classification model into the class with the maximum probability value.
The improved ResNeXt structure is shown in FIG. 1, where CT and MRI are taken as examples of the data of the two modalities.
The internal structure of the residual block is shown in FIG. 2. The computation of the depthwise separable convolution can be divided into two steps: depthwise convolution and point-wise convolution. The depthwise convolution convolves each channel of the input tensor separately, using an individual kernel per channel: if the input tensor has C_in channels, each of the C_in channels is convolved with a 3×3 kernel, yielding C_in output channels. Point-wise convolution is then applied to the output of the depthwise convolution: the per-channel feature maps are combined pixel by pixel with 1×1 convolutions, mapping the depthwise output to C_out channels and producing the final output tensor.
The calculation formula of the depthwise separable convolution is:
Y = pointwise(depthwise(X))
where X is the input tensor, Y is the output tensor, depthwise is the depthwise (per-channel) convolution, and pointwise is the 1×1 point-wise convolution applied to its result.
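A short PyTorch sketch of this decomposition follows; the channel counts and spatial size are illustrative, and the parameter comparison shows the lightweight saving over a standard 3×3 convolution:

```python
import torch
import torch.nn as nn

c_in, c_out = 64, 128  # illustrative channel counts

# Depthwise: one 3x3 kernel per input channel (groups = c_in).
depthwise = nn.Conv2d(c_in, c_in, kernel_size=3, padding=1, groups=c_in)
# Pointwise: 1x1 convolution mapping c_in channels to c_out channels.
pointwise = nn.Conv2d(c_in, c_out, kernel_size=1)

x = torch.randn(1, c_in, 56, 56)
y = pointwise(depthwise(x))        # Y = pointwise(depthwise(X)); shape (1, 128, 56, 56)

# Parameter count versus a standard 3x3 convolution:
standard = nn.Conv2d(c_in, c_out, kernel_size=3, padding=1)
sep = sum(p.numel() for p in depthwise.parameters()) + \
      sum(p.numel() for p in pointwise.parameters())
std = sum(p.numel() for p in standard.parameters())
print(sep, std)                    # 8960 vs 73856 parameters
```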
The specific structure of the channel attention module is shown in FIG. 3, and its computation is as follows. For an input feature map of size H×W×C, where H is the height of the feature map, W is its width and C is its number of channels, global average pooling is first performed on the input feature map to obtain a feature vector with one entry per channel:
z_c = (1 / (H·W)) · Σ_{i=1..H} Σ_{k=1..W} X_{i,k,c}
where i enumerates the height dimension with i ∈ [1, H], k enumerates the width dimension with k ∈ [1, W], and X_{i,k,c} denotes the value of the feature map X at height i, width k and channel c.
The feature vector is then passed through two fully connected layers to obtain s and f:
s = σ(W_1 z + b_1), f = σ(W_2 s + b_2)
where σ is the sigmoid activation function, z is the channel descriptor vector produced by the global average pooling above, W_1 and W_2 are the weights of the two fully connected layers, and b_1 and b_2 are bias terms; the biases shift the output of the activation functions so that the model adapts better to different data distributions, giving better model performance.
and then taking each element of f as a channel weight, and carrying out weighted rescaling on the input feature map to obtain an output feature map Y:
Y i,k,j =f j X i,k ,j∈[1,C]。
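The following sketch implements these three formulas literally in PyTorch (global average pooling, two sigmoid-activated fully connected layers, channel-wise rescaling); the reduction ratio and all shapes are assumptions:

```python
import torch

def channel_attention(x, w1, b1, w2, b2):
    """z = GAP(X); s = sigmoid(W1 z + b1); f = sigmoid(W2 s + b2);
    Y_{i,k,j} = f_j * X_{i,k,j}."""
    n, c, h, w = x.shape
    z = x.mean(dim=(2, 3))              # global average pooling: (N, C)
    s = torch.sigmoid(z @ w1.T + b1)    # first fully connected layer
    f = torch.sigmoid(s @ w2.T + b2)    # second layer -> channel weights f
    return x * f.view(n, c, 1, 1)       # channel-wise rescaling

# Usage with illustrative shapes (reduction ratio r = 4 assumed):
c, r = 64, 4
x = torch.randn(2, c, 32, 32)
w1, b1 = torch.randn(c // r, c), torch.randn(c // r)
w2, b2 = torch.randn(c, c // r), torch.randn(c)
y = channel_attention(x, w1, b1, w2, b2)   # same shape as x
```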
by adopting the scheme, the invention has the following characteristics:
by fusing the multi-mode medical image data, the information of different mode images can be fully utilized, so that the classification accuracy is improved.
And a channel attention mechanism is added to the last layer of each residual structure, so that the capability of extracting features of the network is effectively improved, and the classification capability of the network on medical images of different modes is enhanced. Through multi-mode data fusion, more comprehensive and more accurate disease or pathological change characteristics can be obtained, so that pathological analysis and medical diagnosis are better carried out, and the interpretation of the model is improved.
The convolution layer in the residual error network is improved, the convolution layer is changed into light-weight depth separable convolution, the parameter quantity is reduced, and the calculation complexity and the storage complexity of the model are reduced, so that the time and the resource consumption of model training and reasoning are reduced. The network uses the residual connection of the original residual network to help the model to better process deep information and to better process complex data.

Claims (5)

1. A lightweight multi-mode medical image classification method for improving the ResNeXt neural network, characterized by comprising the following steps:
step 1: preprocessing medical images from two different modalities to obtain processed images;
step 2: dividing the images processed in step 1 proportionally into a training set, a test set and a validation set;
step 3: performing data enhancement on the data of the training set;
step 4: constructing sub-networks, extracting features from the image data of the two modalities respectively, and concatenating the resulting feature tensors of different dimensions;
step 5: constructing an improved ResNeXt convolutional neural network as the classification model;
step 6: taking the result of step 4 as the input of step 5, with the different categories as the classification results;
step 7: performing parameter optimization on the ResNeXt convolutional neural network and saving the finally optimized classification model;
step 8: feeding the preprocessed data into the optimized classification model and outputting the classification result through the classifier of the classification model.
2. The lightweight multi-mode medical image classification method for improving the ResNeXt neural network as claimed in claim 1, wherein the input medical images are preprocessed in step 1, i.e., the images of the different modalities are aligned so that their spatial positions and orientations are consistent and the sample sizes are unified.
3. The method of claim 1, wherein the data enhancement of step 3 on the training set comprises randomly rotating, flipping, scaling and cropping the images of the different modalities.
4. The lightweight multi-mode medical image classification method for improving the ResNeXt neural network according to claim 1, wherein the sub-network construction in step 4 includes a sub-network AlexNet and a sub-network DenseNet, and step 4 specifically includes the following steps:
step 4.1: constructing the sub-network AlexNet, which receives image data from one modality and performs feature extraction on it to obtain its feature representation; AlexNet comprises 5 convolution layers, each followed by a ReLU activation function and a local response normalization layer, after which two max pooling layers are connected;
step 4.2: constructing the sub-network DenseNet, which receives image data from the other modality and performs feature extraction on it to obtain its feature representation; the convolution layers of DenseNet are organized into several Dense Blocks, and within each Dense Block the output of every convolution layer is concatenated with the outputs of the preceding layers to serve as the input of the current layer, so that information is fully transferred; each convolution layer in a Dense Block is a 3×3 convolution followed by Batch Normalization and a ReLU activation function;
step 4.3: adding a global average pooling layer behind the convolution and pooling layers of the two sub-networks, averaging the feature values on each channel to obtain a global feature;
step 4.4: concatenating the global features of the two sub-networks to obtain the final feature, which is used as the input of the classification model.
5. The method for classifying lightweight multi-modal medical images for improved ResNeXt neural networks according to claim 1, wherein step 5 comprises the steps of:
step 5.1: receiving the processed result in the step 4, wherein the result is the characteristic vector after the multi-mode data are spliced and is used as the input of a classification model;
step 5.2: through a first convolution layer, the layer is a convolution kernel of 7X7, the step length is 2, and a feature map with the output dimension of 64 is obtained after convolution operation;
the specific formula of the convolution operation here is:
wherein F is out Is an output feature map, F in Is an input feature map, k is the size of the convolution kernel, p is padding, the size of the padding, s is the size of the stride step;
step 5.3: performing convolution operation by using a 3X3 convolution check feature map in the step 5.2 through a second convolution layer, wherein an input channel is 64, connecting a normalization layer after convolution, and connecting a ReLU activation function to obtain a feature map with the channel number of 128;
step 5.4: then through the first residual block, called residual block 1, the residual block 1 replaces the convolution layer in the ResNeXt residual block by depth separable convolution, and adds a channel attention mechanism in the last layer, firstly, the convolution kernel of 1X1 is used for carrying out channel number conversion to obtain the channel number as 128, then the depth separable convolution of 3X3 is used for carrying out feature extraction, the input images are separated, convolution operations are respectively carried out in the depth direction and the space direction, the data of different modes are added at corresponding positions one by one, and then different weight factors are multiplied, finally, the result is added to obtain new data input, the quantity of parameters and the calculated quantity are reduced, light-weight feature extraction is realized, then, a channel attention is connected, the feature map is normalized in the channel dimension by using a sigmoid activation function, the weight of each channel is calculated, the features of different channels are weighted, the weight of important features is improved, then a multi-branch fusion module is connected, each branch corresponds to a feature map of one mode, the convolution kernel is adopted for carrying out the convolution operation in the depth direction, the feature pattern of 3X3 is adopted, then the feature extraction is carried out on the convolution kernel, and the feature is adopted for carrying out the feature extraction in the same mode, and the size is changed into a double-line 1, and finally, the size is adopted for outputting the double-line 1;
step 5.5: a residual block is connected, which is called residual block 2, and the residual block 2 has the same structure as that described in the step 5.3 and comprises a channel attention and depth separable convolution operation, wherein the difference is that the number of channels is 256, and the number of output channels is 512;
step 5.6: a residual block, called residual block 3, is connected, and has the same structure as that described in step 5.3, and comprises a channel attention and depth separable convolution operation, wherein the difference is that the number of channels is 512, and the number of output channels is 1024;
step 5.7: a residual block called residual block 4 is connected, and the residual block has the same structure as that described in the step 5.3 and comprises a channel attention and depth separable convolution operation, wherein the difference is that the number of channels is 1024, and the number of output channels is 2048;
step 5.8: connecting a global pooling layer, averaging the feature images on each channel, and outputting a global feature vector;
step 5.9: connecting a full connection layer, classifying the output feature vectors of the global pooling layer, mapping the feature vectors to a vector space of a prediction category, and outputting a final prediction result;
step 5.10: and the output layer normalizes the output of the full-connection layer by using a softmax activation function, converts the output into probability distribution of a predicted class, and predicts the final result of the classification model into the class with the maximum probability value.
CN202311022573.XA 2023-08-14 2023-08-14 Lightweight multi-mode medical image classification method for improving ResNeXt neural network Pending CN117274662A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311022573.XA CN117274662A (en) 2023-08-14 2023-08-14 Lightweight multi-mode medical image classification method for improving ResNeXt neural network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311022573.XA CN117274662A (en) 2023-08-14 2023-08-14 Lightweight multi-mode medical image classification method for improving ResNeXt neural network

Publications (1)

Publication Number Publication Date
CN117274662A true CN117274662A (en) 2023-12-22

Family

ID=89209410

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311022573.XA Pending CN117274662A (en) 2023-08-14 2023-08-14 Lightweight multi-mode medical image classification method for improving ResNeXt neural network

Country Status (1)

Country Link
CN (1) CN117274662A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117765322A (en) * 2023-12-25 2024-03-26 中国科学技术大学 Classifying system capable of explaining deep learning based on multi-modal data
CN117934962A (en) * 2024-02-06 2024-04-26 青岛兴牧畜牧科技发展有限公司 Pork quality classification method based on reference color card image correction
CN118196584A (en) * 2024-05-15 2024-06-14 江苏富翰医疗产业发展有限公司 Multi-mode glaucoma image recognition method and system



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination