CN114387467A - Medical image classification method based on multi-module convolution feature fusion

Medical image classification method based on multi-module convolution feature fusion

Info

Publication number
CN114387467A
Authority
CN
China
Prior art keywords
layer
feature map
output
feature
module
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202111501536.8A
Other languages
Chinese (zh)
Other versions
CN114387467B (en)
Inventor
Sun Mingjian (孙明健)
Shen Yi (沈毅)
Ma Lingyu (马凌玉)
Hu Xinge (胡歆格)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Harbin Institute Of Technology At Zhangjiakou
Harbin Institute of Technology
Original Assignee
Harbin Institute Of Technology At Zhangjiakou
Harbin Institute of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Harbin Institute Of Technology At Zhangjiakou, Harbin Institute of Technology filed Critical Harbin Institute Of Technology At Zhangjiakou
Priority to CN202111501536.8A priority Critical patent/CN114387467B/en
Publication of CN114387467A publication Critical patent/CN114387467A/en
Application granted granted Critical
Publication of CN114387467B publication Critical patent/CN114387467B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/25Fusion techniques
    • G06F18/253Fusion techniques of extracted features
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Biology (AREA)
  • Computing Systems (AREA)
  • Molecular Biology (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Probability & Statistics with Applications (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Health & Medical Sciences (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a medical image classification method based on multi-module convolution feature fusion, which comprises the following steps: step one, image preprocessing; step two, modular convolution design of the network model; step three, multi-module feature fusion; step four, feature extraction based on the multi-module fused features; and step five, outputting the classification result through an average pooling layer. Because the proposed method extracts features purely by convolution, it has a small number of parameters and a high operation speed, which suits the small-sample nature of medical image data: no excessive weight parameters need to be updated, and experimental verification shows excellent performance. The method is suitable not only for binary classification tasks but also for multi-class tasks, and the number of output channels of each module can be adjusted flexibly according to the task requirements to improve the classification effect.

Description

Medical image classification method based on multi-module convolution feature fusion
Technical Field
The invention belongs to the technical field of artificial intelligence, relates to a medical image classification method, and particularly relates to a medical image classification method based on multi-module convolution feature fusion.
Background
Deep learning techniques enable machines to analyze large numbers of training images and automatically learn feature representations through the back-propagation algorithm. They can solve many image understanding and analysis problems, such as semantic segmentation, image recognition and classification. At present, convolutional neural networks (CNNs) are gradually becoming the standard technology in medical image screening and classification, with very wide applications such as skin cancer classification, lung nodule identification and thyroid disease diagnosis. Medical images are characterized by small sample sizes and unbalanced data distributions, so extracting features from limited samples is the key to effective recognition. Medical image classification and recognition currently relies mostly on classic CNN models, and the accuracy still needs to be improved.
Disclosure of Invention
The invention aims to provide a medical image classification method based on multi-module convolution feature fusion so as to fully extract key features of images and improve the accuracy of medical image identification.
The purpose of the invention is realized by the following technical scheme:
a medical image classification method based on multi-module convolution feature fusion comprises the following steps:
step one, image preprocessing
A two-dimensional medical image of any size is converted to 224 × 224 by a preprocessing module; the preprocessing module also identifies the current usage state of the network model: if the model is in the training stage, image data enhancement operations are performed, such as one or more of rotation, flipping, brightness adjustment and Gaussian noise addition; if the model is in the testing stage, no such operation is performed;
step two, modular convolution design of the network model
The multi-module convolutional network model architecture comprises the following 5 modules:
module 1: comprises a first layer Conv3-64 and a second layer Dwconv3-64, wherein: after the preprocessing module, the input data dimension of the first layer Conv3-64 is N × 3 × 224 × 224, i.e. the height and width of the input image are both 224, N is the size of the data volume loaded in each batch, and the color channels are RGB; the first layer outputs a feature map of dimension N × 64 × 224 × 224 to the second layer Dwconv3-64; the second layer Dwconv3-64 outputs a feature map A of dimension N × 64 × 224 × 224, which passes through a maximum pooling layer to give a feature map of dimension N × 64 × 112 × 112 that is input into module 2; the outputs of the first layer Conv3-64 and the second layer Dwconv3-64 are each connected to a standard convolutional layer Conv1-32, producing feature maps of dimension N × 32 × 224 × 224; a summation operation Σ is performed on these outputs to obtain an information-fused feature map of dimension N × 32 × 224 × 224; feature extraction is then performed by a standard convolutional layer Conv1-X, where X is a variable adjusted adaptively according to the number of classes to be learned, with a suggested range of 16 ≤ X ≤ 64, giving a feature map of dimension N × X × 224 × 224; finally, an average pooling layer with kernel size 32 × 32 outputs a feature map B of dimension N × X × 7 × 7;
module 2: comprises a third layer Conv3-128 and a fourth layer Dwconv3-128, wherein: the third layer Conv3-128 takes the feature map of dimension N × 64 × 112 × 112 as input and outputs a feature map of dimension N × 128 × 112 × 112 to the fourth layer Dwconv3-128; the fourth layer Dwconv3-128 outputs a feature map C of dimension N × 128 × 112 × 112, which passes through a maximum pooling layer to give a feature map of dimension N × 128 × 56 × 56 that is input into module 3; the outputs of the third layer Conv3-128 and the fourth layer Dwconv3-128 are each connected to a standard convolutional layer Conv1-32, producing feature maps of dimension N × 32 × 112 × 112; a summation operation Σ is performed on these outputs to obtain an information-fused feature map of dimension N × 32 × 112 × 112; feature extraction is then performed by a standard convolutional layer Conv1-X, giving a feature map of dimension N × X × 112 × 112; finally, an average pooling layer with kernel size 16 × 16 outputs a feature map D of dimension N × X × 7 × 7;
module 3: comprises a fifth layer Conv3-256, a sixth layer Dwconv3-256 and a seventh layer Dwconv3-256, wherein: the fifth layer Conv3-256 takes the feature map of dimension N × 128 × 56 × 56 as input and outputs a feature map of dimension N × 256 × 56 × 56 to the sixth layer Dwconv3-256; the sixth layer Dwconv3-256 outputs a feature map of dimension N × 256 × 56 × 56 to the seventh layer Dwconv3-256; the seventh layer Dwconv3-256 outputs a feature map E of dimension N × 256 × 56 × 56, which passes through a maximum pooling layer to give a feature map of dimension N × 256 × 28 × 28 that is input into module 4; the outputs of the fifth layer Conv3-256, the sixth layer Dwconv3-256 and the seventh layer Dwconv3-256 are each connected to a standard convolutional layer Conv1-32, producing feature maps of dimension N × 32 × 56 × 56; a summation operation Σ is performed on these outputs to obtain an information-fused feature map of dimension N × 32 × 56 × 56; feature extraction is then performed by a standard convolutional layer Conv1-X, giving a feature map of dimension N × X × 56 × 56; finally, an average pooling layer with kernel size 8 × 8 outputs a feature map F of dimension N × X × 7 × 7;
module 4: comprises an eighth layer Conv3-512, a ninth layer Dwconv3-512 and a tenth layer Dwconv3-512, wherein: the eighth layer Conv3-512 takes the feature map of dimension N × 256 × 28 × 28 as input and outputs a feature map of dimension N × 512 × 28 × 28 to the ninth layer Dwconv3-512; the ninth layer Dwconv3-512 outputs a feature map of dimension N × 512 × 28 × 28 to the tenth layer Dwconv3-512; the tenth layer Dwconv3-512 outputs a feature map G of dimension N × 512 × 28 × 28, which passes through a maximum pooling layer to give a feature map of dimension N × 512 × 14 × 14 that is input into module 5; the outputs of the eighth layer Conv3-512, the ninth layer Dwconv3-512 and the tenth layer Dwconv3-512 are each connected to a standard convolutional layer Conv1-32, producing feature maps of dimension N × 32 × 28 × 28; a summation operation Σ is performed on these outputs to obtain an information-fused feature map of dimension N × 32 × 28 × 28; feature extraction is then performed by a standard convolutional layer Conv1-X, giving a feature map of dimension N × X × 28 × 28; finally, an average pooling layer with kernel size 4 × 4 outputs a feature map H of dimension N × X × 7 × 7;
module 5: comprises an eleventh layer Conv3-512, a twelfth layer Dwconv3-512 and a thirteenth layer Dwconv3-512, wherein: the eleventh layer Conv3-512 takes the feature map of dimension N × 512 × 14 × 14 as input and outputs a feature map of dimension N × 512 × 14 × 14 to the twelfth layer Dwconv3-512; the twelfth layer Dwconv3-512 outputs a feature map of dimension N × 512 × 14 × 14 to the thirteenth layer Dwconv3-512; the thirteenth layer Dwconv3-512 outputs a feature map I of dimension N × 512 × 14 × 14; the outputs of the eleventh layer Conv3-512, the twelfth layer Dwconv3-512 and the thirteenth layer Dwconv3-512 are each connected to a standard convolutional layer Conv1-32, producing feature maps of dimension N × 32 × 14 × 14; a summation operation Σ is performed on these outputs to obtain an information-fused feature map of dimension N × 32 × 14 × 14; feature extraction is then performed by a standard convolutional layer Conv1-X, giving a feature map of dimension N × X × 14 × 14; finally, an average pooling layer with kernel size 2 × 2 outputs a feature map J of dimension N × X × 7 × 7;
step three, multi-module feature fusion
Multi-module feature fusion is performed in concat mode: the N × X × 7 × 7 feature map B output by module 1, the N × X × 7 × 7 feature map D output by module 2, the N × X × 7 × 7 feature map F output by module 3, the N × X × 7 × 7 feature map H output by module 4 and the N × X × 7 × 7 feature map J output by module 5 are concatenated channel-wise into an N × 5X × 7 × 7 feature map;
step four, feature extraction based on multi-module fusion features
Feature extraction is performed on the N × 5X × 7 × 7 feature map fused in step three using a standard convolutional layer Conv1-nclass, giving an N × nclass × 7 × 7 feature map;
step five, outputting classification results through an average pooling layer
An average pooling layer with kernel size 7 × 7 reduces the dimensionality to output an N × nclass classification result, and softmax outputs the predicted probability of each class.
Compared with the prior art, the invention has the following advantages:
1. The proposed classification method extracts features purely by convolution, so the number of parameters is small (5.2 MB when Conv1-X is Conv1-16) and the operation speed is high; this suits the small-sample nature of medical image data, since no excessive weight parameters need to be updated, and experimental verification shows excellent performance.
2. The method is suitable not only for binary classification tasks but also for multi-class tasks, and the number of output channels of each module can be adjusted flexibly according to the task requirements to improve the classification effect.
3. The last layer of the method is an average pooling layer, which makes it convenient to extract the features learned by the network and to visualize the class activation heat map.
Drawings
FIG. 1 is a network model architecture proposed by the present invention;
FIG. 2 shows example images of the four anatomical positions;
FIG. 3 is an example class activation heatmap.
Detailed Description
The technical solution of the present invention is further described below with reference to the accompanying drawings, but is not limited thereto; any modification or equivalent replacement that does not depart from the spirit and scope of the technical solution of the present invention shall fall within the protection scope of the present invention.
The invention provides a medical image classification method based on multi-module convolution feature fusion, which comprises the following steps:
step one, image preprocessing
A two-dimensional medical image of any size is converted to 224 × 224 by a preprocessing module; the preprocessing module also identifies the current usage state of the network model: if the model is in the training stage, image data enhancement operations such as rotation, flipping, brightness adjustment and Gaussian noise addition are performed; if the model is in the testing stage, no such operation is performed.
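The preprocessing step can be sketched as follows, assuming a PyTorch/torchvision implementation; the specific augmentation parameters (rotation angle, brightness range, noise level) are illustrative and are not fixed by this description.

```python
import torch
import torchvision.transforms as T

def build_preprocess(training: bool):
    """Resize any 2-D image to 224 x 224; apply data enhancement only during training."""
    ops = [T.Resize((224, 224))]
    if training:
        ops += [
            T.RandomRotation(degrees=15),        # rotation (illustrative angle)
            T.RandomHorizontalFlip(p=0.5),       # flipping
            T.ColorJitter(brightness=0.2),       # brightness adjustment
        ]
    ops.append(T.ToTensor())                     # -> 3 x 224 x 224 tensor, RGB, values in [0, 1]
    if training:
        # Gaussian noise addition (illustrative standard deviation)
        ops.append(T.Lambda(lambda t: t + 0.01 * torch.randn_like(t)))
    return T.Compose(ops)

train_transform = build_preprocess(training=True)   # training stage: enhancement applied
test_transform = build_preprocess(training=False)   # testing stage: no enhancement
```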
Step two, modular convolution design of the network model
As shown in FIG. 1, the network model architecture comprises 5 modules. Each module is composed of standard convolutional layers, depthwise separable convolutional layers and an average pooling layer, and the modules are connected by maximum pooling layers, each using a 2 × 2 kernel with stride 2; the activation function used throughout the model is the ReLU function. If the number of classes nclass ≤ 16, Conv1-X can be Conv1-16; if nclass > 16, the channel depth of Conv1-X can be increased, e.g. X > 20. The implementation is illustrated below with Conv1-X taken as Conv1-16; in that case the total number of model parameters is 5.2 MB and the computation is about 83.8 G FLOPs.
The default input data dimension of the network is N × 3 × 224 × 224, where N is the size of the data volume loaded in each batch, the height and width of the input image are both 224, and the color channels are RGB.
(1) Module 1: Conv3-64 is the first convolutional layer, i.e. the convolution kernel sliding window is 3 × 3 × 3, the stride is 1 and the padding is 1. By the general formulas (1) and (2), the height and width of the output feature map are both 224, and this layer is required to output a depth of 64, so the feature map output by Conv3-64 has dimension N × 64 × 224 × 224. The second layer Dwconv3-64 is a depthwise separable convolution, which splits the convolution into two steps: a channel-by-channel convolution and a point-by-point 1 × 1 convolution. The channel-by-channel convolution uses 64 corresponding 3 × 3 kernels, giving an output of size N × 64 × 224 × 224; the point-by-point 1 × 1 convolution then fuses the features of different channels and outputs feature map A of dimension N × 64 × 224 × 224. The outputs of the first layer Conv3-64 and the second layer Dwconv3-64 are each connected to a standard convolutional layer Conv1-32 (kernel size 1 × 1, channel depth 32, padding = 0, stride = 1), and each output feature map has size N × 32 × 224 × 224. The summation operation Σ is then performed on these outputs to obtain an information-fused feature map of dimension N × 32 × 224 × 224. Feature extraction is then performed by a standard convolutional layer Conv1-16 (kernel size 1 × 1, channel depth 16, padding = 0, stride = 1), giving an output feature map of dimension N × 16 × 224 × 224. Finally, an average pooling layer with kernel size 32 × 32 outputs feature map B of dimension N × 16 × 7 × 7.
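One such module can be sketched as follows, assuming PyTorch; the class names DepthwiseSeparable and FusionModule are illustrative, not taken from the description, and details such as the exact placement of the ReLU activations within a module are an assumption of this sketch.

```python
import torch
import torch.nn as nn

class DepthwiseSeparable(nn.Module):
    """Dwconv3-C: channel-by-channel 3x3 convolution followed by a point-by-point 1x1 convolution."""
    def __init__(self, channels):
        super().__init__()
        self.depthwise = nn.Conv2d(channels, channels, 3, stride=1, padding=1, groups=channels)
        self.pointwise = nn.Conv2d(channels, channels, 1)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.relu(self.pointwise(self.relu(self.depthwise(x))))

class FusionModule(nn.Module):
    """One module: a standard 3x3 convolution, one or two depthwise separable convolutions,
    and a side branch that taps every layer with Conv1-32, sums the taps, applies Conv1-X and
    average pooling so the branch always ends at an N x X x 7 x 7 feature map."""
    def __init__(self, in_ch, out_ch, num_dw, pool_kernel, x_ch=16):
        super().__init__()
        self.conv = nn.Sequential(nn.Conv2d(in_ch, out_ch, 3, padding=1), nn.ReLU(inplace=True))
        self.dws = nn.ModuleList([DepthwiseSeparable(out_ch) for _ in range(num_dw)])
        # one Conv1-32 tap per layer (the standard conv plus each depthwise separable conv)
        self.taps = nn.ModuleList([nn.Conv2d(out_ch, 32, 1) for _ in range(1 + num_dw)])
        self.reduce = nn.Conv2d(32, x_ch, 1)            # Conv1-X
        self.branch_pool = nn.AvgPool2d(pool_kernel)    # brings the branch down to 7 x 7

    def forward(self, x):
        outs = [self.conv(x)]
        for dw in self.dws:
            outs.append(dw(outs[-1]))
        fused = sum(tap(o) for tap, o in zip(self.taps, outs))   # summation operation Σ
        branch = self.branch_pool(self.reduce(fused))            # N x X x 7 x 7
        return outs[-1], branch                                  # main-path output, branch output
```

For module 1, FusionModule(3, 64, num_dw=1, pool_kernel=32) reproduces the dimensions above: the main path ends at N × 64 × 224 × 224 (feature map A) and the branch at N × 16 × 7 × 7 (feature map B).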
W_output = (W_input - W_filter + 2P) / S + 1    (1)
H_output = (H_input - H_filter + 2P) / S + 1    (2)
where W and H denote the width and height of an image, respectively; the subscript input refers to parameters of the input image, the subscript output to parameters of the output image, and the subscript filter to parameters of the convolution kernel; S denotes the stride of the convolution kernel; and P (short for padding) denotes the number of pixel layers added at the image boundary.
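A quick check of general formulas (1) and (2), assuming the usual convolution output-size rule; the helper name conv_output_size is illustrative.

```python
def conv_output_size(size_in, size_filter, stride, padding):
    """W_output = (W_input - W_filter + 2P) / S + 1 (the same formula holds for the height)."""
    return (size_in - size_filter + 2 * padding) // stride + 1

# Conv3-64 in module 1: a 3x3 kernel with stride 1 and padding 1 keeps a 224x224 input at 224x224.
assert conv_output_size(224, 3, stride=1, padding=1) == 224
# The 2x2, stride-2 maximum pooling between modules halves the spatial size: 224 -> 112.
assert conv_output_size(224, 2, stride=2, padding=0) == 112
```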
The feature map A output by module 1 passes through the maximum pooling layer to dimension N × 64 × 112 × 112 and is input into module 2.
(2) Module 2: Conv3-128 is the third convolutional layer, using a 3 × 3 kernel, stride 1 and padding 1, and the output feature map has dimension N × 128 × 112 × 112. The fourth layer uses the depthwise separable convolution Dwconv3-128: the channel-by-channel convolution with 128 corresponding 3 × 3 kernels gives an output of size N × 128 × 112 × 112, and the point-by-point 1 × 1 convolution then fuses the features of different channels and outputs feature map C of dimension N × 128 × 112 × 112. The outputs of the third layer Conv3-128 and the fourth layer Dwconv3-128 are each connected to a standard convolutional layer Conv1-32 (kernel size 1 × 1, channel depth 32, padding = 0, stride = 1), and each output feature map has size N × 32 × 112 × 112. The summation operation Σ is then performed on these outputs to obtain an information-fused feature map of dimension N × 32 × 112 × 112. Feature extraction is then performed by a standard convolutional layer Conv1-16 (kernel size 1 × 1, channel depth 16, padding = 0, stride = 1), giving an output feature map of dimension N × 16 × 112 × 112. Finally, an average pooling layer with kernel size 16 × 16 outputs feature map D of dimension N × 16 × 7 × 7.
The feature map C output by module 2 passes through the maximum pooling layer to dimension N × 128 × 56 × 56 and is input into module 3.
(3) Module 3: Conv3-256 is the fifth convolutional layer, using a 3 × 3 kernel, stride 1 and padding 1, and the output feature map has dimension N × 256 × 56 × 56. The sixth layer uses the depthwise separable convolution Dwconv3-256: the channel-by-channel convolution with 256 corresponding 3 × 3 kernels gives an output of size N × 256 × 56 × 56, and the point-by-point 1 × 1 convolution then fuses the features of different channels and outputs a feature map of dimension N × 256 × 56 × 56 to the seventh layer. The seventh layer again uses the depthwise separable convolution Dwconv3-256 and outputs feature map E of dimension N × 256 × 56 × 56. The outputs of the fifth layer Conv3-256, the sixth layer Dwconv3-256 and the seventh layer Dwconv3-256 are each connected to a standard convolutional layer Conv1-32 (kernel size 1 × 1, channel depth 32, padding = 0, stride = 1), and each output feature map has size N × 32 × 56 × 56. The summation operation Σ is then performed on these outputs to obtain an information-fused feature map of dimension N × 32 × 56 × 56. Feature extraction is then performed by a standard convolutional layer Conv1-16 (kernel size 1 × 1, channel depth 16, padding = 0, stride = 1), giving an output feature map of dimension N × 16 × 56 × 56. Finally, an average pooling layer with kernel size 8 × 8 outputs feature map F of dimension N × 16 × 7 × 7.
The feature map E output by module 3 passes through the maximum pooling layer to dimension N × 256 × 28 × 28 and is input into module 4.
(4) Module 4: Conv3-512 is the eighth convolutional layer, using a 3 × 3 kernel, stride 1 and padding 1, and the output feature map has dimension N × 512 × 28 × 28. The ninth layer uses the depthwise separable convolution Dwconv3-512: the channel-by-channel convolution with 512 corresponding 3 × 3 kernels gives an output of size N × 512 × 28 × 28, and the point-by-point 1 × 1 convolution then fuses the features of different channels and outputs a feature map of dimension N × 512 × 28 × 28 to the tenth layer. The tenth layer again uses the depthwise separable convolution Dwconv3-512 and outputs feature map G of dimension N × 512 × 28 × 28. The outputs of the eighth layer Conv3-512, the ninth layer Dwconv3-512 and the tenth layer Dwconv3-512 are each connected to a standard convolutional layer Conv1-32 (kernel size 1 × 1, channel depth 32, padding = 0, stride = 1), and each output feature map has size N × 32 × 28 × 28. The summation operation Σ is then performed on these outputs to obtain an information-fused feature map of dimension N × 32 × 28 × 28. Feature extraction is then performed by a standard convolutional layer Conv1-16 (kernel size 1 × 1, channel depth 16, padding = 0, stride = 1), giving an output feature map of dimension N × 16 × 28 × 28. Finally, an average pooling layer with kernel size 4 × 4 outputs feature map H of dimension N × 16 × 7 × 7.
The feature map G output by module 4 passes through the maximum pooling layer to dimension N × 512 × 14 × 14 and is input into module 5.
(5) Module 5: the standard convolutional layer and the depthwise separable convolutional layers of module 5 are essentially the same as those of module 4. Conv3-512 is the eleventh convolutional layer, using a 3 × 3 kernel, stride 1 and padding 1, and the output feature map has dimension N × 512 × 14 × 14. The twelfth layer uses the depthwise separable convolution Dwconv3-512: the channel-by-channel convolution with 512 corresponding 3 × 3 kernels gives an output of size N × 512 × 14 × 14, and the point-by-point 1 × 1 convolution then fuses the features of different channels and outputs a feature map of dimension N × 512 × 14 × 14 to the thirteenth layer. The thirteenth layer again uses the depthwise separable convolution Dwconv3-512 and outputs feature map I of dimension N × 512 × 14 × 14. The outputs of the eleventh layer Conv3-512, the twelfth layer Dwconv3-512 and the thirteenth layer Dwconv3-512 are each connected to a standard convolutional layer Conv1-32 (kernel size 1 × 1, channel depth 32, padding = 0, stride = 1), and each output feature map has size N × 32 × 14 × 14. The summation operation Σ is then performed on these outputs to obtain an information-fused feature map of dimension N × 32 × 14 × 14. Feature extraction is then performed by a standard convolutional layer Conv1-16 (kernel size 1 × 1, channel depth 16, padding = 0, stride = 1), giving an output feature map of dimension N × 16 × 14 × 14. Finally, an average pooling layer with kernel size 2 × 2 outputs feature map J of dimension N × 16 × 7 × 7.
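Chaining the five modules with the 2 × 2, stride-2 maximum pooling layers can be sketched as follows, reusing the FusionModule class from the sketch above; the Backbone name and the configuration tuples are illustrative.

```python
class Backbone(nn.Module):
    """Modules 1-5 connected by 2x2, stride-2 max pooling; module 5 has no pooling after it."""
    def __init__(self, x_ch=16):
        super().__init__()
        # (in_ch, out_ch, number of depthwise separable layers, branch avg-pool kernel)
        cfg = [(3, 64, 1, 32), (64, 128, 1, 16), (128, 256, 2, 8),
               (256, 512, 2, 4), (512, 512, 2, 2)]
        self.blocks = nn.ModuleList([FusionModule(i, o, n, k, x_ch) for i, o, n, k in cfg])
        self.pool = nn.MaxPool2d(2, stride=2)

    def forward(self, x):
        branches = []
        for i, block in enumerate(self.blocks):
            x, b = block(x)
            branches.append(b)                    # feature maps B, D, F, H, J, each N x X x 7 x 7
            if i < len(self.blocks) - 1:
                x = self.pool(x)                  # max pooling connects module k to module k+1
        return branches
```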
Step three, multi-module feature fusion
Multi-module feature fusion is performed in concat mode: the N × 16 × 7 × 7 feature map B output by module 1, the N × 16 × 7 × 7 feature map D output by module 2, the N × 16 × 7 × 7 feature map F output by module 3, the N × 16 × 7 × 7 feature map H output by module 4 and the N × 16 × 7 × 7 feature map J output by module 5 are concatenated channel-wise into an N × 80 × 7 × 7 feature map.
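Step three as code, continuing the sketch above (the batch size N = 4 is only for illustration); with X = 16 the concatenated map is N × 80 × 7 × 7.

```python
branches = Backbone(x_ch=16)(torch.randn(4, 3, 224, 224))  # five branch maps B, D, F, H, J
fused = torch.cat(branches, dim=1)                          # channel-wise concat: N x 80 x 7 x 7
print(fused.shape)                                          # torch.Size([4, 80, 7, 7])
```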
Step four, feature extraction based on multi-module fusion features
A standard convolutional layer Conv1-nclass (i.e. a kernel size of 1 × 1 and a number of channels equal to the corresponding number of classes nclass) takes the N × 80 × 7 × 7 feature map as input and extracts features to obtain an N × nclass × 7 × 7 feature map.
Step five, outputting classification results through an average pooling layer
An average pooling layer with kernel size 7 × 7 reduces the dimensionality to output an N × nclass classification result; the predicted probability of each class can then be output via softmax.
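Steps four and five as code, continuing the sketch above with nclass = 4 (matching the example below): a Conv1-nclass layer, 7 × 7 average pooling, and softmax.

```python
nclass = 4
head = nn.Sequential(
    nn.Conv2d(5 * 16, nclass, kernel_size=1),  # Conv1-nclass applied to the N x 80 x 7 x 7 map
    nn.AvgPool2d(7),                           # 7 x 7 average pooling -> N x nclass x 1 x 1
)
logits = head(fused).flatten(1)                # N x nclass classification result
probs = torch.softmax(logits, dim=1)           # predicted probability of each class
```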
Example:
The proposed method is used to recognize the anatomical positions of the cardia, the anterior wall of the gastric angle, the posterior wall of the gastric angle and the pylorus, i.e. nclass = 4. Example images of the positions are shown in FIG. 2.
The data set is divided into a training set and a test set, and the size N of the data volume loaded in each batch is 16. The training set contains 1912 images in total: 320 of the cardia, 634 of the anterior wall of the gastric angle, 634 of the posterior wall of the gastric angle and 324 of the pylorus. The test set contains 741 images in total: 101 of the cardia, 245 of the anterior wall of the gastric angle, 262 of the posterior wall of the gastric angle and 133 of the pylorus. As shown in Table 1, the model recognizes the 4 anatomical positions with an accuracy above 99%.
TABLE 1 Evaluation indices for anatomical position recognition
The last layer of the proposed network model is an average pooling layer, so gradient back-propagation can be carried out directly without modifying the model, and the class activation heat map can be obtained, as shown in FIG. 3.
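One way to read the class activation heat map off this architecture, continuing the sketch above: because the classifier is a 1 × 1 convolution followed by global average pooling, the c-th channel of the N × nclass × 7 × 7 map is itself the activation map for class c and can be upsampled to the input size for display. Variable names are illustrative.

```python
import torch.nn.functional as F

cam_maps = head[0](fused)                                  # N x nclass x 7 x 7, before pooling
pred = probs.argmax(dim=1)                                 # predicted class per image
cam = cam_maps[torch.arange(cam_maps.size(0)), pred]       # N x 7 x 7 map of the predicted class
cam = F.interpolate(cam.unsqueeze(1), size=(224, 224),
                    mode='bilinear', align_corners=False)  # N x 1 x 224 x 224 heat map
# normalize each map to [0, 1] for visualization
cam_min = cam.amin(dim=(2, 3), keepdim=True)
cam_max = cam.amax(dim=(2, 3), keepdim=True)
cam = (cam - cam_min) / (cam_max - cam_min + 1e-8)
```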

Claims (6)

1. A medical image classification method based on multi-module convolution feature fusion is characterized by comprising the following steps:
step one, image preprocessing
A two-dimensional medical image of any size is converted to 224 × 224 by a preprocessing module; the preprocessing module also identifies the current usage state of the network model: if the model is in the training stage, an image data enhancement operation is performed; if the model is in the testing stage, no image data enhancement operation is performed;
step two, modular convolution design of the network model
The multi-module convolutional network model architecture comprises the following 5 modules:
module 1: comprises a first layer Conv3-64 and a second layer Dwconv3-64, wherein: after the preprocessing module, the input data dimension of the first layer Conv3-64 is N × 3 × 224 × 224, i.e. the height and width of the input image are both 224, N is the size of the data volume loaded in each batch, and the color channels are RGB; the first layer outputs a feature map of dimension N × 64 × 224 × 224 to the second layer Dwconv3-64; the second layer Dwconv3-64 outputs a feature map A of dimension N × 64 × 224 × 224, which passes through a maximum pooling layer to give a feature map of dimension N × 64 × 112 × 112 that is input into module 2; the outputs of the first layer Conv3-64 and the second layer Dwconv3-64 are each connected to a standard convolutional layer Conv1-32, producing feature maps of dimension N × 32 × 224 × 224; a summation operation Σ is performed on these outputs to obtain an information-fused feature map of dimension N × 32 × 224 × 224; feature extraction is then performed by a standard convolutional layer Conv1-X, giving a feature map of dimension N × X × 224 × 224; finally, an average pooling layer with kernel size 32 × 32 outputs a feature map B of dimension N × X × 7 × 7;
module 2: comprises a third layer Conv3-128 and a fourth layer Dwconv3-128, wherein: the third layer Conv3-128 takes the feature map of dimension N × 64 × 112 × 112 as input and outputs a feature map of dimension N × 128 × 112 × 112 to the fourth layer Dwconv3-128; the fourth layer Dwconv3-128 outputs a feature map C of dimension N × 128 × 112 × 112, which passes through a maximum pooling layer to give a feature map of dimension N × 128 × 56 × 56 that is input into module 3; the outputs of the third layer Conv3-128 and the fourth layer Dwconv3-128 are each connected to a standard convolutional layer Conv1-32, producing feature maps of dimension N × 32 × 112 × 112; a summation operation Σ is performed on these outputs to obtain an information-fused feature map of dimension N × 32 × 112 × 112; feature extraction is then performed by a standard convolutional layer Conv1-X, giving a feature map of dimension N × X × 112 × 112; finally, an average pooling layer with kernel size 16 × 16 outputs a feature map D of dimension N × X × 7 × 7;
module 3: comprises a fifth layer Conv3-256, a sixth layer Dwconv3-256 and a seventh layer Dwconv3-256, wherein: the fifth layer Conv3-256 takes the feature map of dimension N × 128 × 56 × 56 as input and outputs a feature map of dimension N × 256 × 56 × 56 to the sixth layer Dwconv3-256; the sixth layer Dwconv3-256 outputs a feature map of dimension N × 256 × 56 × 56 to the seventh layer Dwconv3-256; the seventh layer Dwconv3-256 outputs a feature map E of dimension N × 256 × 56 × 56, which passes through a maximum pooling layer to give a feature map of dimension N × 256 × 28 × 28 that is input into module 4; the outputs of the fifth layer Conv3-256, the sixth layer Dwconv3-256 and the seventh layer Dwconv3-256 are each connected to a standard convolutional layer Conv1-32, producing feature maps of dimension N × 32 × 56 × 56; a summation operation Σ is performed on these outputs to obtain an information-fused feature map of dimension N × 32 × 56 × 56; feature extraction is then performed by a standard convolutional layer Conv1-X, giving a feature map of dimension N × X × 56 × 56; finally, an average pooling layer with kernel size 8 × 8 outputs a feature map F of dimension N × X × 7 × 7;
module 4: comprises an eighth layer Conv3-512, a ninth layer Dwconv3-512 and a tenth layer Dwconv3-512, wherein: the eighth layer Conv3-512 takes the feature map of dimension N × 256 × 28 × 28 as input and outputs a feature map of dimension N × 512 × 28 × 28 to the ninth layer Dwconv3-512; the ninth layer Dwconv3-512 outputs a feature map of dimension N × 512 × 28 × 28 to the tenth layer Dwconv3-512; the tenth layer Dwconv3-512 outputs a feature map G of dimension N × 512 × 28 × 28, which passes through a maximum pooling layer to give a feature map of dimension N × 512 × 14 × 14 that is input into module 5; the outputs of the eighth layer Conv3-512, the ninth layer Dwconv3-512 and the tenth layer Dwconv3-512 are each connected to a standard convolutional layer Conv1-32, producing feature maps of dimension N × 32 × 28 × 28; a summation operation Σ is performed on these outputs to obtain an information-fused feature map of dimension N × 32 × 28 × 28; feature extraction is then performed by a standard convolutional layer Conv1-X, giving a feature map of dimension N × X × 28 × 28; finally, an average pooling layer with kernel size 4 × 4 outputs a feature map H of dimension N × X × 7 × 7;
module 5: comprises an eleventh layer Conv3-512, a twelfth layer Dwconv3-512 and a thirteenth layer Dwconv3-512, wherein: the eleventh layer Conv3-512 takes the feature map of dimension N × 512 × 14 × 14 as input and outputs a feature map of dimension N × 512 × 14 × 14 to the twelfth layer Dwconv3-512; the twelfth layer Dwconv3-512 outputs a feature map of dimension N × 512 × 14 × 14 to the thirteenth layer Dwconv3-512; the thirteenth layer Dwconv3-512 outputs a feature map I of dimension N × 512 × 14 × 14; the outputs of the eleventh layer Conv3-512, the twelfth layer Dwconv3-512 and the thirteenth layer Dwconv3-512 are each connected to a standard convolutional layer Conv1-32, producing feature maps of dimension N × 32 × 14 × 14; a summation operation Σ is performed on these outputs to obtain an information-fused feature map of dimension N × 32 × 14 × 14; feature extraction is then performed by a standard convolutional layer Conv1-X, giving a feature map of dimension N × X × 14 × 14; finally, an average pooling layer with kernel size 2 × 2 outputs a feature map J of dimension N × X × 7 × 7;
step three, multi-module feature fusion
Multi-module feature fusion is performed in concat mode: the N × X × 7 × 7 feature map B output by module 1, the N × X × 7 × 7 feature map D output by module 2, the N × X × 7 × 7 feature map F output by module 3, the N × X × 7 × 7 feature map H output by module 4 and the N × X × 7 × 7 feature map J output by module 5 are concatenated channel-wise into an N × 5X × 7 × 7 feature map;
step four, feature extraction based on multi-module fusion features
Feature extraction is performed on the N × 5X × 7 × 7 feature map fused in step three using a standard convolutional layer Conv1-nclass, giving an N × nclass × 7 × 7 feature map;
step five, outputting classification results through an average pooling layer
An average pooling layer with kernel size 7 × 7 reduces the dimensionality to output an N × nclass classification result, and softmax outputs the predicted probability of each class.
2. The method for classifying medical images based on multi-module convolution feature fusion according to claim 1, wherein the image data enhancement operation is one or more of rotation, flipping, brightness adjustment and Gaussian noise addition.
3. The method for classifying medical images based on multi-module convolution feature fusion according to claim 1, wherein each of the modules 1 to 5 is composed of a standard convolution layer, a depth separable convolution layer and an average pooling layer, and a maximum pooling layer is used for connection between modules.
4. The method for classifying medical images based on multi-module convolution feature fusion according to claim 1 or 3, characterized in that the maximum pooling layer uses a convolution kernel of 2 x 2 and the step size stride is 2.
5. The method of classifying medical images based on multi-module convolution feature fusion according to claim 1, wherein the activation function used by the network model is a ReLU function.
6. The method of classifying medical images based on multi-module convolution feature fusion according to claim 1, wherein 16 ≤ X ≤ 64.
CN202111501536.8A 2021-12-09 2021-12-09 Medical image classification method based on multi-module convolution feature fusion Active CN114387467B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111501536.8A CN114387467B (en) 2021-12-09 2021-12-09 Medical image classification method based on multi-module convolution feature fusion

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111501536.8A CN114387467B (en) 2021-12-09 2021-12-09 Medical image classification method based on multi-module convolution feature fusion

Publications (2)

Publication Number Publication Date
CN114387467A true CN114387467A (en) 2022-04-22
CN114387467B CN114387467B (en) 2022-07-29

Family

ID=81196240

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111501536.8A Active CN114387467B (en) 2021-12-09 2021-12-09 Medical image classification method based on multi-module convolution feature fusion

Country Status (1)

Country Link
CN (1) CN114387467B (en)

Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108345890A (en) * 2018-03-01 2018-07-31 腾讯科技(深圳)有限公司 Image processing method, device and relevant device
CN110321967A (en) * 2019-07-11 2019-10-11 南京邮电大学 Image classification innovatory algorithm based on convolutional neural networks
CN110619638A (en) * 2019-08-22 2019-12-27 浙江科技学院 Multi-mode fusion significance detection method based on convolution block attention module
US20200065619A1 (en) * 2017-11-09 2020-02-27 Boe Technology Group Co., Ltd. Image processing method, processing apparatus and processing device
CN111145170A (en) * 2019-12-31 2020-05-12 电子科技大学 Medical image segmentation method based on deep learning
WO2020222985A1 (en) * 2019-04-30 2020-11-05 The Trustees Of Dartmouth College System and method for attention-based classification of high-resolution microscopy images
CN112036475A (en) * 2020-08-28 2020-12-04 江南大学 Fusion module, multi-scale feature fusion convolutional neural network and image identification method
CN112215291A (en) * 2020-10-19 2021-01-12 中国计量大学 Method for extracting and classifying medical image features under cascade neural network
CN112561863A (en) * 2020-12-03 2021-03-26 吉林大学 Medical image multi-classification recognition system based on improved ResNet
CN112990391A (en) * 2021-05-20 2021-06-18 四川大学 Feature fusion based defect classification and identification system of convolutional neural network
CN113177465A (en) * 2021-04-27 2021-07-27 江苏科技大学 SAR image automatic target recognition method based on depth separable convolutional neural network

Patent Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20200065619A1 (en) * 2017-11-09 2020-02-27 Boe Technology Group Co., Ltd. Image processing method, processing apparatus and processing device
CN108345890A (en) * 2018-03-01 2018-07-31 腾讯科技(深圳)有限公司 Image processing method, device and relevant device
WO2020222985A1 (en) * 2019-04-30 2020-11-05 The Trustees Of Dartmouth College System and method for attention-based classification of high-resolution microscopy images
CN110321967A (en) * 2019-07-11 2019-10-11 南京邮电大学 Image classification innovatory algorithm based on convolutional neural networks
CN110619638A (en) * 2019-08-22 2019-12-27 浙江科技学院 Multi-mode fusion significance detection method based on convolution block attention module
CN111145170A (en) * 2019-12-31 2020-05-12 电子科技大学 Medical image segmentation method based on deep learning
CN112036475A (en) * 2020-08-28 2020-12-04 江南大学 Fusion module, multi-scale feature fusion convolutional neural network and image identification method
CN112215291A (en) * 2020-10-19 2021-01-12 中国计量大学 Method for extracting and classifying medical image features under cascade neural network
CN112561863A (en) * 2020-12-03 2021-03-26 吉林大学 Medical image multi-classification recognition system based on improved ResNet
CN113177465A (en) * 2021-04-27 2021-07-27 江苏科技大学 SAR image automatic target recognition method based on depth separable convolutional neural network
CN112990391A (en) * 2021-05-20 2021-06-18 四川大学 Feature fusion based defect classification and identification system of convolutional neural network

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Zhu Dejiang: "Research on Automatic Detection and Intelligent Classification of Pulmonary Nodules Based on Deep Neural Networks", China Master's Theses Full-text Database (Basic Sciences) *
Li Chao, Sun Mingjian, Ma Liyong, Shen Yi, Lin Riqiang, Gong Xiaojing: "Three-Dimensional Vessel Enhancement Algorithm for In Vivo Photoacoustic Endoscopic Imaging", Chinese Journal of Lasers *

Also Published As

Publication number Publication date
CN114387467B (en) 2022-07-29

Similar Documents

Publication Publication Date Title
CN110378381B (en) Object detection method, device and computer storage medium
CN110120040B (en) Slice image processing method, slice image processing device, computer equipment and storage medium
CN108182441B (en) Parallel multichannel convolutional neural network, construction method and image feature extraction method
CN111028146B (en) Image super-resolution method for generating countermeasure network based on double discriminators
CN107016681B (en) Brain MRI tumor segmentation method based on full convolution network
CN106951825B (en) Face image quality evaluation system and implementation method
CN110705555B (en) Abdomen multi-organ nuclear magnetic resonance image segmentation method, system and medium based on FCN
CN112446476A (en) Neural network model compression method, device, storage medium and chip
CN108288035A (en) The human motion recognition method of multichannel image Fusion Features based on deep learning
WO2018052587A1 (en) Method and system for cell image segmentation using multi-stage convolutional neural networks
CN110110596B (en) Hyperspectral image feature extraction, classification model construction and classification method
CN110084159A (en) Hyperspectral image classification method based on the multistage empty spectrum information CNN of joint
CN112365514A (en) Semantic segmentation method based on improved PSPNet
CN110879982A (en) Crowd counting system and method
CN110852369B (en) Hyperspectral image classification method combining 3D/2D convolutional network and adaptive spectrum unmixing
CN111489364A (en) Medical image segmentation method based on lightweight full convolution neural network
CN107180241A (en) A kind of animal classification method of the profound neutral net based on Gabor characteristic with fractal structure
CN110222718A (en) The method and device of image procossing
CN115601751B (en) Fundus image semantic segmentation method based on domain generalization
CN113989405B (en) Image generation method based on small sample continuous learning
CN109508639B (en) Road scene semantic segmentation method based on multi-scale porous convolutional neural network
CN114882278A (en) Tire pattern classification method and device based on attention mechanism and transfer learning
CN113361466A (en) Multi-modal cross-directed learning-based multi-spectral target detection method
CN109344852A (en) Image-recognizing method and device, analysis instrument and storage medium
CN114387467B (en) Medical image classification method based on multi-module convolution feature fusion

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant