CN112561863B - Medical image multi-classification recognition system based on improved ResNet - Google Patents

Medical image multi-classification recognition system based on improved ResNet

Info

Publication number
CN112561863B
CN112561863B (application CN202011406222.5A)
Authority
CN
China
Prior art keywords
convolution
layer
size
kernels
classification
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202011406222.5A
Other languages
Chinese (zh)
Other versions
CN112561863A (en)
Inventor
李玲
梁楫坤
崔红花
张海蓉
黄玉兰
姚桂锦
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Jilin University
Original Assignee
Jilin University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Jilin University filed Critical Jilin University
Priority to CN202011406222.5A priority Critical patent/CN112561863B/en
Publication of CN112561863A publication Critical patent/CN112561863A/en
Application granted granted Critical
Publication of CN112561863B publication Critical patent/CN112561863B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 - Image analysis
    • G06T7/0002 - Inspection of images, e.g. flaw detection
    • G06T7/0012 - Biomedical image inspection
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/24 - Classification techniques
    • G06F18/241 - Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415 - Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/04 - Architecture, e.g. interconnection topology
    • G06N3/045 - Combinations of networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/08 - Learning methods
    • G06N3/084 - Backpropagation, e.g. using gradient descent
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00 - Geometric image transformations in the plane of the image
    • G06T3/40 - Scaling of whole images or parts thereof, e.g. expanding or contracting
    • G06T3/4007 - Scaling of whole images or parts thereof, e.g. expanding or contracting based on interpolation, e.g. bilinear interpolation
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 - Arrangements for image or video recognition or understanding
    • G06V10/40 - Extraction of image or video features
    • G06V10/44 - Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 - Indexing scheme for image analysis or image enhancement
    • G06T2207/20 - Special algorithmic details
    • G06T2207/20081 - Training; Learning
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 - Indexing scheme for image analysis or image enhancement
    • G06T2207/30 - Subject of image; Context of image processing
    • G06T2207/30004 - Biomedical image processing
    • G06T2207/30101 - Blood vessel; Artery; Vein; Vascular

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Biomedical Technology (AREA)
  • Computing Systems (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Probability & Statistics with Applications (AREA)
  • Evolutionary Biology (AREA)
  • Multimedia (AREA)
  • Medical Informatics (AREA)
  • Nuclear Medicine, Radiotherapy & Molecular Imaging (AREA)
  • Radiology & Medical Imaging (AREA)
  • Quality & Reliability (AREA)
  • Image Analysis (AREA)

Abstract

The invention belongs to the technical field of medical image processing, and particularly relates to a deep-learning-based system for fine-grained classification and recognition of granulocyte images. A positioning module extracts features from an input granulocyte picture with a Hourglass network model, locates every cell in the picture, crops out each located cell so that a single complete cell remains, and normalizes the size of all cropped cells. A classification module classifies the granulocytes located by the positioning module with a purpose-built deep learning classification model. The system can assist clinicians in completing granulocyte classification, recognition, and counting tasks accurately and efficiently, reduce errors caused by subjectivity, lower the workload of doctors, and support their disease judgments. It effectively handles cell classification under imbalanced data and fine-grained classification among granulocytes, and improves the network's classification and recognition performance.

Description

Medical image multi-classification recognition system based on improved ResNet
Technical Field
The invention belongs to the technical field of medical image processing, and particularly relates to a medical image multi-classification recognition system based on improved ResNet.
Background
There are three broad types of blood cells in the human body: red blood cells, white blood cells, and platelets, granulocytes being a major class of white blood cell. The recognition and classification of granulocytes is an active research area compared with other cell types, because granulocytes are responsible for the body's immunity. Counting granulocytes in the bone marrow provides physicians with valuable information and aids many important diagnoses, such as of leukemia and AIDS. Granulocyte identification and counting are performed manually under a microscope, which is not only time-consuming but also has a high error rate.
At present, the clinical examination of granulocytes relies on manual microscopy. Manual microscopy can reach an accuracy above 95%, but it is inefficient, classification is slow, and accuracy depends on the experience and condition of the examiner. In the field of medical image processing, with the great progress of imaging technology, computer-assisted medical diagnosis is a clear trend: on one hand, the development of imaging technology brings massive medical data; on the other hand, images of blood samples can be generated, and accurate computer assistance helps accelerate the diagnosis of diseases, reduce the workload of doctors, improve working efficiency, and deliver more accurate and efficient diagnostic results.
Deep learning is a newer field of machine learning research whose motivation is to build models that simulate the analytical learning of the human brain. Deep learning is data-driven: it can imitate the visual mechanisms of the human brain to automatically learn abstract features of data at every level, and thus better reflect the essential characteristics of the data. At present, deep learning is applied to lesion classification, segmentation, and recognition in medical images, as well as to brain-function research.
Because manual omissions and imbalance between the numbers of samples of different cell types often arise during data set collection, the classification and recognition performance of deep-learning network models is not ideal, and the accuracy is low for cell classes with small amounts of data.
Disclosure of Invention
Aiming at the defects of the prior art, the invention provides a medical image multi-classification recognition system based on improved ResNet: a fine-grained granulocyte classification, recognition, and statistics system built on improved ResNet for imbalanced data sets. The system comprises a preprocessing module, a positioning module, and a classification module. The preprocessing module enlarges the capacity of the data set through data enhancement while reducing the background interference caused by uneven staining. The positioning and recognition modules use pre-trained network parameters as the initial values of the learning network, extract features from the images over different channels simultaneously, apply pooled downsampling according to the spatial position of the field of view, and perform feature extraction and feature fusion on the images. By analyzing granulocyte images collected under a microscope, the system assists clinicians in completing granulocyte classification, recognition, and counting tasks accurately and efficiently, reduces errors caused by subjectivity, lowers the workload of doctors, and supports their disease judgments. The system effectively handles cell classification under imbalanced data and fine-grained classification among granulocytes, and improves the network's classification and recognition performance.
A medical image multi-classification recognition system based on improved ResNet comprises a positioning module and a classification module, wherein the positioning module utilizes a Hourglass network model to perform feature extraction on an input granulocyte picture, positions all cells in the granulocyte picture respectively, cuts out the positioned cells, leaves single complete cells, and performs size normalization processing on all the cut cells;
the classification module classifies the granulocytes positioned by the positioning module by adopting the constructed deep learning classification model:
the network structure of the constructed deep learning classification model is as follows:
the first layer is a convolution layer, the number of convolution kernels is 64, and the size of each convolution kernel is 7 x 7; the second layer is a normalization layer and an activation function layer; the third layer is a pooling layer using max pooling with a pooling size of 3 x 3; the fourth layer is a ResNet-Block classification model; and the fifth layer and the sixth layer are TBC-Block classification models;
the ResNet-Block classification model comprises two branches, wherein the first layer of the first branch is a convolution layer, the number of convolution kernels is 128, the size of each convolution kernel is 1 x 1, the second layer is the convolution layer, the number of convolution kernels is 128, the size of each convolution kernel is 3 x 3, the third layer is the convolution layer, the number of convolution kernels is 128, the size of each convolution kernel is 1 x 1, the second branch is a convolution layer, the number of convolution kernels is 128, the size of each convolution kernel is 1 x 1, and a BN layer and an activation function layer are added after each convolution layer of each branch;
the TBC-Block modules of the fifth layer and the sixth layer have the same structure and respectively comprise three branches and three fully-connected layers, wherein the first branch is a convolution layer, the number of convolution kernels is 256, the size of each convolution kernel is 3 x 3, the second branch is a convolution layer, the number of convolution kernels is 256, the size of each convolution kernel is 3 x 3, the third layer is a convolution layer, the number of convolution kernels is 256, the size of each convolution kernel is 1 x 1, and a BN layer and an activation function layer are added after each convolution layer of each branch;
and finally, the network ends with three fully-connected layers, followed by an activation function and then a Softmax classifier; the Softmax classifier classifies the cells and outputs the category of each cell.
The Hourglass network module is of a symmetrical structure and comprises four lower convolution layer groups and four upper convolution layer groups;
the first lower convolution layer group comprises a first convolution layer, a second convolution layer and a third convolution layer, wherein the size of each convolution kernel of the first convolution layer is 1 x 1, the number of the convolution kernels is 256, the size of each convolution kernel of the second convolution layer is 3 x 3, the number of the convolution kernels is 128, the size of each convolution kernel of the third convolution layer is 1 x 1, and the number of the convolution kernels is 256;
the second lower convolution layer group comprises the fourth, fifth and sixth convolution layers, wherein the size of each convolution kernel of the fourth convolution layer is 1 x 1, the number of the convolution kernels is 256, the size of each convolution kernel of the fifth convolution layer is 3 x 3, the number of the convolution kernels is 128, the size of each convolution kernel of the sixth convolution layer is 1 x 1, and the number of the convolution kernels is 256;
the third lower convolution layer group comprises the seventh, eighth and ninth convolution layers, wherein the size of each convolution kernel of the seventh convolution layer is 1 x 1, the number of the convolution kernels is 256, the size of each convolution kernel of the eighth convolution layer is 3 x 3, the number of the convolution kernels is 128, the size of each convolution kernel of the ninth convolution layer is 1 x 1, and the number of the convolution kernels is 256;
the fourth lower convolution layer group includes the tenth, eleventh and twelfth convolution layers, wherein the size of each convolution kernel of the tenth convolution layer is 1 x 1, the number of convolution kernels is 256, the size of each convolution kernel of the eleventh convolution layer is 3 x 3, the number of convolution kernels is 128, the size of each convolution kernel of the twelfth convolution layer is 1 x 1, and the number of convolution kernels is 256;
the first upper convolution layer group comprises the thirteenth, fourteenth and fifteenth convolution layers, wherein the size of each convolution kernel of the thirteenth convolution layer is 1 x 1, the number of the convolution kernels is 256, the size of each convolution kernel of the fourteenth convolution layer is 3 x 3, the number of the convolution kernels is 128, the size of each convolution kernel of the fifteenth convolution layer is 1 x 1, and the number of the convolution kernels is 256;
the second upper convolution layer group includes the sixteenth, seventeenth, and eighteenth convolution layers, wherein the size of each convolution kernel of the sixteenth convolution layer is 1 x 1, the number of convolution kernels is 256, the size of each convolution kernel of the seventeenth convolution layer is 3 x 3, the number of convolution kernels is 128, the size of each convolution kernel of the eighteenth convolution layer is 1 x 1, and the number of convolution kernels is 256;
the third upper convolution layer group includes the nineteenth, twentieth, and twenty-first convolution layers, where each convolution kernel of the nineteenth convolution layer has a size of 1 x 1, the number of convolution kernels is 256, each convolution kernel of the twentieth convolution layer has a size of 3 x 3, the number of convolution kernels is 128, each convolution kernel of the twenty-first convolution layer has a size of 1 x 1, and the number of convolution kernels is 256;
the fourth upper convolution layer group includes the twenty-second, twenty-third, and twenty-fourth convolution layers, where each convolution kernel of the twenty-second convolution layer is 1 x 1 in size, the number of convolution kernels is 256, each convolution kernel of the twenty-third convolution layer is 3 x 3 in size, the number of convolution kernels is 128, each convolution kernel of the twenty-fourth convolution layer is 1 x 1 in size, and the number of convolution kernels is 256;
adding a pooling layer after each lower convolution layer group, wherein the size of the pooling layer is 2 x 2, and the step length is 2;
adding an upper sampling layer after each upper convolution layer group;
the training process of the Hourglass network model adopted by the positioning module comprises the following steps:
collecting 2000 granulocyte pictures as a training set, and carrying out normalization operation on the sizes of the collected pictures;
manually marking each cell and the category of each cell in each picture in the training set to obtain a marked training set;
step two, expanding the capacity of the labeled training set by a data enhancement method applied to each image and reducing the background interference caused by uneven staining;
and step three, inputting the labeled training set processed in step two into the Hourglass network model of the positioning module for training, enabling the Hourglass network model to learn the characteristics of the various cells labeled in the training set, and obtaining a trained model when the positioning accuracy of the Hourglass network model on the labeled cells reaches 98%, where positioning accuracy = (number of labeled cells the model locates) / (total number of manually labeled cells) x 100%.
The training process of the constructed deep learning classification model adopted by the classification module comprises the following contents:
inputting the processed labeled training set into the constructed deep learning classification model adopted by the classification module for training, outputting parameters that can identify the morphological characteristic information of the various cells, classifying through the fully-connected layers, and outputting the type of each cell; a trained classification model is obtained when the classification accuracy of the constructed deep learning classification model on the cells labeled in the training set reaches 90%, where classification accuracy = (number of labeled cells assigned their correct type) / (total number of manually labeled cells) x 100%.
The counting module counts different types of cells output by the classification module respectively and generates a cell count classification report.
The invention has the beneficial effects that:
the method combines the traditional image processing algorithm and the target recognition network to carry out fine-grained classification, recognition and statistics on granulocytes under a microscope, adopts key point detection and an Anchor-Free-based image positioning network, greatly reduces the calculated amount of the network, improves the network performance, adopts a novel constructed fine-grained classification framework, and effectively improves the recognition accuracy, the judgment precision and the robustness.
Drawings
FIG. 1 is a schematic diagram of a Hourglass network model in the positioning module according to the present invention;
FIG. 2 is a structural diagram of a deep learning classification model constructed in the classification module of the present invention;
FIG. 3 is a diagram of ResNet-Block in the classifier architecture of the present invention;
FIG. 4 is a schematic diagram of TBC-Block in the classifier structure of the present invention.
Detailed Description
The invention discloses a medical image multi-classification recognition system based on improved ResNet, which comprises two modules: a positioning module and a classification module, wherein:
the positioning module utilizes a Hourglass network model to perform feature extraction on an input granulocyte picture, positions all cells in the granulocyte picture respectively, positions the cells by extracting the central point of a target cell, cuts out the positioned cells, only leaves single complete cells in a visual field, and performs size normalization processing on all the cut cells;
as shown in fig. 1, the Hourglass network module has a symmetrical structure and comprises four lower convolution layer groups and four upper convolution layer groups; and cross-layer connection is carried out in the upper and lower symmetrical convolution layer groups through feature map fusion:
the first lower convolution layer group comprises the first, second and third convolution layers, wherein the size of each convolution kernel of the first convolution layer is 1 x 1, the number of the convolution kernels is 256, the size of each convolution kernel of the second convolution layer is 3 x 3, the number of the convolution kernels is 128, the size of each convolution kernel of the third convolution layer is 1 x 1, the number of the convolution kernels is 256, and the first lower convolution layer group and the fourth upper convolution layer group are connected through a convolution structure identical to the first lower convolution layer group;
the second lower convolution layer group comprises the fourth, fifth and sixth convolution layers, wherein the size of each convolution kernel of the fourth convolution layer is 1 x 1, the number of the convolution kernels is 256, the size of each convolution kernel of the fifth convolution layer is 3 x 3, the number of the convolution kernels is 128, the size of each convolution kernel of the sixth convolution layer is 1 x 1, the number of the convolution kernels is 256, and the second lower convolution layer group and the third upper convolution layer group are connected through a convolution structure identical to the second lower convolution layer group;
the third lower convolution layer group comprises the seventh, eighth and ninth convolution layers, wherein the size of each convolution kernel of the seventh convolution layer is 1 x 1, the number of the convolution kernels is 256, the size of each convolution kernel of the eighth convolution layer is 3 x 3, the number of the convolution kernels is 128, the size of each convolution kernel of the ninth convolution layer is 1 x 1, the number of the convolution kernels is 256, and the third lower convolution layer group and the second upper convolution layer group are connected through a convolution structure identical to the third lower convolution layer group;
the fourth lower convolution layer group includes the tenth, eleventh, and twelfth convolution layers, wherein the size of each convolution kernel of the tenth convolution layer is 1 x 1, the number of the convolution kernels is 256, the size of each convolution kernel of the eleventh convolution layer is 3 x 3, the number of the convolution kernels is 128, the size of each convolution kernel of the twelfth convolution layer is 1 x 1, the number of the convolution kernels is 256, and the fourth lower convolution layer group is connected to the first upper convolution layer group through a convolution structure identical to the fourth lower convolution layer group;
the first upper convolution layer group comprises a thirteenth convolution layer group, a fourteenth convolution layer group and a fifteenth convolution layer group, wherein the size of each convolution kernel of the thirteenth convolution layer is 1 x 1, the number of the convolution kernels is 256, the size of each convolution kernel of the fourteenth convolution layer is 3 x 3, the number of the convolution kernels is 128, the size of each convolution kernel of the fifteenth convolution layer is 1 x 1, the number of the convolution kernels is 256, and the input features of the fourth lower convolution layer group are subjected to feature fusion;
the second upper convolution layer group comprises sixteenth, seventeenth and eighteenth convolution layers, wherein the size of each convolution kernel of the sixteenth convolution layer is 1 x 1, the number of the convolution kernels is 256, the size of each convolution kernel of the seventeenth convolution layer is 3 x 3, the number of the convolution kernels is 128, the size of each convolution kernel of the eighteenth convolution layer is 1 x 1, the number of the convolution kernels is 256, and the input features of the third lower convolution layer group are subjected to feature fusion;
the third upper convolution layer group comprises the nineteenth, twentieth and twenty-first convolution layers, wherein the size of each convolution kernel of the nineteenth convolution layer is 1 x 1, the number of the convolution kernels is 256, the size of each convolution kernel of the twentieth convolution layer is 3 x 3, the number of the convolution kernels is 128, the size of each convolution kernel of the twenty-first convolution layer is 1 x 1, the number of the convolution kernels is 256, and feature fusion is carried out on the input features from the second lower convolution layer group;
the fourth upper convolution layer group comprises the twenty-second, twenty-third and twenty-fourth convolution layers, wherein the size of each convolution kernel of the twenty-second convolution layer is 1 x 1, the number of the convolution kernels is 256, the size of each convolution kernel of the twenty-third convolution layer is 3 x 3, the number of the convolution kernels is 128, the size of each convolution kernel of the twenty-fourth convolution layer is 1 x 1, the number of the convolution kernels is 256, and feature fusion is carried out on the input features from the first lower convolution layer group;
a pooling layer is added after each lower convolution layer group; max pooling with a pooling size of 2 x 2 and a stride of 2 downsamples the features input to it, so after each pooling step the output feature map is one half the size of the input feature map;
an upsampling layer is added after each group of the upper convolution layers.
And after each upper convolution layer group, an upsampling layer upsamples the input feature map so that the output feature map is twice the size of the input: the width and the height are both expanded to twice their original values, with the interpolated values obtained by bilinear interpolation.
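To make the symmetric structure above concrete, the following PyTorch sketch assembles four lower (Bottom-Up) groups, four upper (Top-Down) groups, and same-structured skip connections. It is a minimal illustration under stated assumptions, not the patented implementation: a 256-channel input, BN plus Leaky ReLU after every convolution, skip features taken from each lower group's output before pooling, and fusion by element-wise addition.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def conv_group():
    # 1 x 1 (256 kernels) -> 3 x 3 (128 kernels) -> 1 x 1 (256 kernels),
    # the bottleneck described for every lower/upper convolution layer group
    return nn.Sequential(
        nn.Conv2d(256, 256, 1), nn.BatchNorm2d(256), nn.LeakyReLU(0.01),
        nn.Conv2d(256, 128, 3, padding=1), nn.BatchNorm2d(128), nn.LeakyReLU(0.01),
        nn.Conv2d(128, 256, 1), nn.BatchNorm2d(256), nn.LeakyReLU(0.01),
    )

class Hourglass(nn.Module):
    def __init__(self):
        super().__init__()
        self.lower = nn.ModuleList(conv_group() for _ in range(4))
        self.skip = nn.ModuleList(conv_group() for _ in range(4))   # cross-layer connections
        self.upper = nn.ModuleList(conv_group() for _ in range(4))

    def forward(self, x):
        skips = []
        for i in range(4):                       # Bottom-Up: conv group, then 2 x 2 stride-2 pooling
            x = self.lower[i](x)
            skips.append(self.skip[i](x))
            x = F.max_pool2d(x, 2, stride=2)
        for i in range(4):                       # Top-Down: conv group, then 2x bilinear upsampling
            x = self.upper[i](x)
            x = F.interpolate(x, scale_factor=2, mode="bilinear", align_corners=False)
            x = x + skips[3 - i]                 # fuse with the symmetric lower group's features
        return x

feat = Hourglass()(torch.randn(1, 256, 64, 64))  # spatial size is restored: (1, 256, 64, 64)
```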
The invention trains the Hourglass network model in a supervised manner and evaluates the effectiveness of the model on the recognition task by means of the intersection over union (IoU).
The Hourglass network is a feature extraction network with an hourglass-shaped structure: each stage reduces the picture from high resolution to low resolution through a Bottom-Up process and restores it from low resolution to high resolution through a Top-Down process. The structure contains multiple pooling and upsampling steps, and the upsampling can combine features at multiple resolutions to better extract the key features of the image.
And in the Bottom-Up process in Hourglass, the convolution layers extract features from the image with convolution kernels of size 3 x 3. Convolution maps each position of the image to a new value through a linear transformation, with the formula

$$y_{i,j} = \sum_{m}\sum_{n} w_{m,n}\, x_{i+m,\, j+n} + b$$

that is, the product of the input patch and the weights plus an offset value (bias): a vector inner product plus an offset. From this point of view, multi-layer convolution performs a layer-by-layer mapping, and the whole structure constitutes one complex function.
The Leaky ReLU is used as the activation function:

$$\mathrm{LeakyReLU}(x) = \begin{cases} x, & x > 0 \\ \alpha x, & x \le 0 \end{cases}$$

where α is a small positive slope. The Leaky ReLU is a widely used variant of the ReLU activation function that gives negative inputs a small non-zero output. Because its derivative is never zero, it reduces the occurrence of silent neurons and permits gradient-based learning, solving the problem that ReLU neurons cannot learn once their input enters the negative interval.
In these steps, pooling is adopted for feature downsampling, in max-pooling mode; in a convolutional neural network, the pooling layer performs feature fusion and dimensionality reduction.
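The three operations just described, convolution as a linear map plus bias, Leaky ReLU activation, and max pooling, compose one Bottom-Up step. The sketch below shows them in PyTorch; the 3-to-64 channel counts are chosen only for illustration.

```python
import torch
import torch.nn as nn

step = nn.Sequential(
    nn.Conv2d(3, 64, kernel_size=3, padding=1),  # y = sum(w * x) + bias at every image position
    nn.LeakyReLU(0.01),                          # small non-zero slope for negative inputs
    nn.MaxPool2d(kernel_size=2, stride=2),       # feature fusion and dimensionality reduction
)

x = torch.randn(1, 3, 512, 512)                  # e.g. one size-normalized 512 x 512 picture
print(step(x).shape)                             # torch.Size([1, 64, 256, 256])
```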
As shown in fig. 2, the classification module classifies the granulocytes located by the location module by using the constructed deep learning classification model:
the network structure of the constructed deep learning classification model is as follows:
the first layer is a convolution layer, the number of convolution kernels is 64, and the size of each convolution kernel is 7 x 7; the second layer is a normalization layer and an activation function layer, where the activation function is the Leaky ReLU; the third layer is a pooling layer using max pooling with a pooling size of 3 x 3; the fourth layer is a ResNet-Block classification model; and the fifth layer and the sixth layer are TBC-Block classification models;
the ResNet-Block classification model comprises two branches, as shown in fig. 3. The first layer of the first branch is a convolution layer with 128 convolution kernels of size 1 x 1; the second layer is a convolution layer with 128 convolution kernels of size 3 x 3; the third layer is a convolution layer with 128 convolution kernels of size 1 x 1. The second branch is a convolution layer with 128 convolution kernels of size 1 x 1. A BN layer and an activation function layer are added after each convolution layer of each branch, and the feature maps of the two branches are fused to give a new feature map Y;
a branch is added to ResNet-Block, the current output is directly transmitted to the next layer of network, the operation of the current layer is skipped, and meanwhile, the gradient of the next layer of network is directly transmitted to the previous layer of network in the backward propagation process, so that the problem of gradient disappearance of the deep layer of network is solved, but the ResNet-Block increases the complexity of the network and makes the training more complex.
The fifth layer and the sixth layer are optimized into TBC-Block classification models by means of the novel Tied Block Convolution module. Tied Block Convolution applies the idea of group convolution to the convolution operation but, unlike group convolution, performs the operation with convolution kernels whose weights are shared across the groups, yielding the final output feature map.
As shown in fig. 4, the TBC-Block modules of the fifth layer and the sixth layer have the same structure; each includes three branches, where the first branch is a convolution layer with 256 convolution kernels of size 3 x 3, the second branch is a convolution layer with 256 convolution kernels of size 3 x 3, and the third layer is a convolution layer with 256 convolution kernels of size 1 x 1 that receives the fused feature maps of the first two branches; a BN layer and an activation function layer are added after each convolution layer of each branch. The third branch fuses the input feature map x with the output of the first two branches to obtain a new feature map Y.
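The sketch below illustrates both ideas in PyTorch: a tied block convolution that applies one shared filter bank to B channel groups (B = 2 is an illustrative choice), and the three-branch TBC-Block built from it. Fusion by addition and the 1 x 1 layer's output width (set to the block's input width so the residual addition type-checks) are assumptions.

```python
import torch
import torch.nn as nn

class TiedBlockConv2d(nn.Module):
    """One filter bank with shared weights, applied to each of B channel groups."""
    def __init__(self, in_ch, out_ch, k, B=2):
        super().__init__()
        assert in_ch % B == 0 and out_ch % B == 0
        self.B = B
        self.conv = nn.Conv2d(in_ch // B, out_ch // B, k, padding=k // 2)

    def forward(self, x):
        n, c, h, w = x.shape
        x = x.reshape(n * self.B, c // self.B, h, w)  # fold groups into the batch axis,
        return self.conv(x).reshape(n, -1, h, w)      # so one conv serves every group

def tbc(cin, cout, k):
    return nn.Sequential(TiedBlockConv2d(cin, cout, k),
                         nn.BatchNorm2d(cout), nn.LeakyReLU(0.01))

class TBCBlock(nn.Module):
    def __init__(self, in_ch=256):
        super().__init__()
        self.branch1 = tbc(in_ch, 256, 3)   # first branch: 3 x 3, 256 kernels
        self.branch2 = tbc(in_ch, 256, 3)   # second branch: 3 x 3, 256 kernels
        self.fuse1x1 = tbc(256, in_ch, 1)   # 1 x 1 layer fed by the fused branches

    def forward(self, x):
        z = self.branch1(x) + self.branch2(x)   # fuse the two 3 x 3 branches
        return x + self.fuse1x1(z)              # third branch: fuse with the input x -> Y

y = TBCBlock(256)(torch.randn(1, 256, 16, 16))  # -> (1, 256, 16, 16)
```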
And finally, the network ends with three fully-connected layers, followed by an activation function and a Softmax classifier; the Softmax classifier classifies the cells according to the extracted feature vector and outputs the category of each cell. With ReLU as the final activation function of the network, the feature vector obtained from the deep convolutional stages and the three fully-connected layers is passed to the SoftMax classifier, which yields the image category.
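Assembled end to end, the six layers plus the fully-connected head look roughly as follows in PyTorch, reusing the ResNetBlock and TBCBlock sketches above. The 1 x 1 channel adapter, the global pooling before the head, the fully-connected widths, and the five-class output are all illustrative assumptions.

```python
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Conv2d(3, 64, 7, stride=2, padding=3),   # layer 1: 64 kernels, 7 x 7
    nn.BatchNorm2d(64), nn.LeakyReLU(0.01),     # layer 2: normalization + activation
    nn.MaxPool2d(3, stride=2, padding=1),       # layer 3: 3 x 3 max pooling
    ResNetBlock(64),                            # layer 4 (outputs 128 channels)
    nn.Conv2d(128, 256, 1),                     # assumed adapter between block widths
    TBCBlock(256),                              # layer 5
    TBCBlock(256),                              # layer 6
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),      # assumed pooling before the FC head
    nn.Linear(256, 128), nn.ReLU(),             # three fully-connected layers, with
    nn.Linear(128, 64), nn.ReLU(),              # ReLU as the final activation function
    nn.Linear(64, 5),                           # e.g. 5 cell categories (assumption)
    nn.Softmax(dim=1),                          # Softmax classifier outputs the category
)

probs = model(torch.randn(1, 3, 128, 128))      # 128 x 128 network input, as in the text
```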
A convolutional neural network has the property of invariance: it can classify an object robustly even when the object appears in different places. CNNs can be invariant to translation, viewpoint, size, and illumination, or to combinations of these.
The data enhancement of the invention uses methods such as random rotation transforms, stretching transforms, addition of Gaussian noise, and random reordering of the image's three RGB channels.
Rotation/reflection transform: the image is randomly rotated by a certain angle, changing the orientation of its content; rotations of 45, 90, and 180 degrees are used;
stretching transform: the input rectangular image is stretched into a square image with side length equal to the image width;
flip transform: the image is flipped along the horizontal direction;
noise perturbation: each RGB pixel of the image is randomly perturbed; here, Gaussian noise is added to the image.
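A torchvision sketch of the augmentation pipeline above; the target size, noise standard deviation, and flip probability are illustrative assumptions.

```python
import torch
import torchvision.transforms as T
from PIL import Image

def add_gaussian_noise(img, std=0.05):
    # perturb every pixel with Gaussian noise, clamped back to the valid range
    return (img + std * torch.randn_like(img)).clamp(0.0, 1.0)

augment = T.Compose([
    T.Resize((512, 512)),                            # stretch the rectangle into a square
    T.RandomChoice([T.RandomRotation((45, 45)),      # rotate by exactly 45,
                    T.RandomRotation((90, 90)),      # 90,
                    T.RandomRotation((180, 180))]),  # or 180 degrees
    T.RandomHorizontalFlip(p=0.5),                   # flip along the horizontal direction
    T.ToTensor(),
    T.Lambda(add_gaussian_noise),                    # noise perturbation
    T.Lambda(lambda t: t[torch.randperm(3)]),        # randomly reorder the R, G, B channels
])

out = augment(Image.new("RGB", (300, 480)))          # tensor of shape (3, 512, 512)
```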
The training process of the Hourglass network model adopted by the positioning module comprises the following steps:
collecting 2000 granulocyte pictures as a training set, and carrying out normalization operation on the sizes of the collected pictures to ensure that the pixel size of each picture is 512 x 512;
manually marking each cell and the category of each cell in each picture in the training set, wherein each image contains 6 granulocytes, to obtain a labeled training set;
expanding the capacity of the labeled training set by a data enhancement method applied to each image while reducing the background interference caused by uneven staining;
firstly, each image in the labeled training set is rotated (for example by 45, 90, and 180 degrees) and stretch-transformed so that the rectangular image becomes a square image with side length equal to the image width; Gaussian noise is then added to the resulting square image, each RGB pixel is randomly perturbed, and the order of the three RGB channels is randomly changed, reducing the background interference caused by uneven staining;
and then the proportion of each cell type in each image of the labeled training set is counted, a cell enhancement weight factor is assigned to each type according to its share of the cells in the image, and the cells of different types in each image are augmented to different degrees so as to balance the classes.
And a data set in PASCAL VOC format is produced, mainly to provide training samples for the target recognition network; unlabeled data in the cell sample images are not taken as samples of the data set.
And step three, inputting the labeling training set processed in the step two into a Hourglass network model of a positioning module for training, enabling the Hourglass network model to learn the characteristics of various cells labeled in the labeling training set, adopting a back propagation algorithm and a random gradient descent method for training the Hourglass network, performing back propagation iteration to update the weight of each layer according to the magnitude of the Loss value of forward propagation, stopping the training model until the value of the model tends to converge, and obtaining the trained model when the positioning accuracy of the Hourglass network model to the cells labeled in the labeling training set is 98%, wherein the positioning accuracy of the Hourglass network model to the cells in the training labeling training set is 100% of the images of all the cells in the labeling training set/the number of all the cells manually labeled in the labeling training set.
The training process of the constructed deep learning classification model adopted by the classification module comprises the following contents:
inputting the processed labeled training set into the constructed deep learning classification model adopted by the classification module for training, outputting parameters that can identify the morphological characteristic information of the various cells, classifying through the fully-connected layers, and outputting the type of each cell; a trained classification model is obtained when the classification accuracy of the constructed deep learning classification model on the cells labeled in the training set reaches 90%, where classification accuracy = (number of labeled cells assigned their correct type) / (total number of manually labeled cells) x 100%.
The positioning module obtains a feature map through the Hourglass network during training from the acquired key points; the feature map is obtained through a Gaussian kernel as follows:

$$Y_{xyc} = \exp\!\left(-\frac{(x - \tilde{p}_x)^2 + (y - \tilde{p}_y)^2}{2\sigma_p^2}\right)$$

where $\sigma_p$ is the target scale-adaptive standard deviation and $\tilde{p}$ is the key point p mapped to feature-map coordinates. The computed coordinates are dispersed over a heatmap

$$Y \in [0,1]^{\frac{W}{R} \times \frac{H}{R} \times C}$$

where R is the scale of the output size, W is the width of the image, H is the height of the image, and C is the number of object-detection classes. If two Gaussian functions of the same class C overlap, the element-wise maximum is chosen. The objective function, the Focal Loss of a pixel-level logistic regression, is trained as follows:

$$L_k = -\frac{1}{N} \sum_{xyc} \begin{cases} \left(1 - \hat{Y}_{xyc}\right)^{\alpha} \log\left(\hat{Y}_{xyc}\right), & Y_{xyc} = 1 \\ \left(1 - Y_{xyc}\right)^{\beta} \left(\hat{Y}_{xyc}\right)^{\alpha} \log\left(1 - \hat{Y}_{xyc}\right), & \text{otherwise} \end{cases}$$

where α and β are the focal-loss hyper-parameters (α = 2 and β = 4 in the experiments), $\hat{Y}_{xyc}$ is the predicted center-point value, $Y_{xyc} = 1$ denotes a positive sample and any other value a negative sample ($Y_{xyc}$ being the ground-truth center-point value given by the Gaussian model), and N is the number of key points in the image; the factor 1/N in front of the formula normalizes all the focal-loss values. The loss is computed between the predicted values and the ground-truth values.
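The two formulas above translate directly into code. The sketch below renders ground-truth centers as Gaussians with an element-wise maximum and evaluates the pixel-level focal loss with α = 2 and β = 4; the small epsilon for numerical stability is an added assumption.

```python
import torch

def gaussian_heatmap(height, width, centers, sigma):
    # render each key point as a 2-D Gaussian; overlapping Gaussians
    # of the same class keep the element-wise maximum
    ys = torch.arange(height, dtype=torch.float32).view(-1, 1)
    xs = torch.arange(width, dtype=torch.float32).view(1, -1)
    heat = torch.zeros(height, width)
    for cx, cy in centers:
        g = torch.exp(-((xs - cx) ** 2 + (ys - cy) ** 2) / (2 * sigma ** 2))
        heat = torch.maximum(heat, g)
    return heat

def center_focal_loss(pred, gt, alpha=2, beta=4, eps=1e-6):
    pos = gt.eq(1).float()                    # Y_xyc = 1 marks a positive sample
    pos_term = pos * (1 - pred) ** alpha * torch.log(pred + eps)
    neg_term = (1 - pos) * (1 - gt) ** beta * pred ** alpha * torch.log(1 - pred + eps)
    n = pos.sum().clamp(min=1)                # N = number of key points in the image
    return -(pos_term + neg_term).sum() / n

gt = gaussian_heatmap(128, 128, centers=[(30.0, 40.0), (90.0, 70.0)], sigma=3.0)
loss = center_focal_loss(torch.rand(128, 128), gt)
```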
Obtaining the result coordinates of the positions of the cells in the large image, cutting out the positioned cells so that only single complete cells remain in the visual field, and performing size normalization to build a new data set for the classification model;
the positions of the cells are obtained through positioning by the positioning module, the cells in the picture are cut, a new data set with only single cells in the visual field is obtained, and classification interference caused by the similarity of adjacent cell structures is reduced.
Sending the cells into the ResNet + TBC-Block-based target classification network for training; a trained classification network model is obtained when the classification accuracy of the network on the cells labeled in the training set reaches 90%, where classification accuracy = (number of labeled cells the network classifies correctly) / (total number of manually labeled cells) x 100%, yielding the trained deep learning classification model. A new network architecture is designed with the novel Tied Block Convolution module, which improves the network's classification of fine-grained cells; the classification model is evaluated by its mean average precision (mAP). According to the magnitude of the forward-propagation loss value, backpropagation iteratively updates the weights of each layer until the loss tends to converge; when the loss no longer decreases, the network is considered trained.
The model uses a ResNet-based deep learning recognition network as the backbone, with images input to the network at a size of 128 x 128.
The counting module counts different types of cells output by the classification module respectively and generates a cell count classification report.
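The counting step itself is simple; a sketch follows, with the cell-type names purely illustrative.

```python
from collections import Counter

def cell_count_report(predicted_types):
    """Tally the classifier's per-cell outputs and report count and share per type."""
    counts = Counter(predicted_types)
    total = sum(counts.values())
    return {cell_type: {"count": n, "share": n / total} for cell_type, n in counts.items()}

report = cell_count_report(["neutrophil", "eosinophil", "neutrophil", "basophil"])
# {'neutrophil': {'count': 2, 'share': 0.5}, 'eosinophil': {...}, 'basophil': {...}}
```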
In the above steps, pooling is adopted for feature downsampling, and a maximum pooling mode is used.
The classification module is trained with the backpropagation algorithm and stochastic gradient descent: backpropagation iteratively updates the weights of each layer according to the magnitude of the forward-propagation loss value, and training stops when the loss value of the model tends to converge. For the imbalanced data set, the classification module uses Class-Balanced Focal Loss as the model's loss function, which improves the model's ability to handle imbalanced data.
The formula of the Class-Balanced Focal Loss function is as follows:

$$CB_{\mathrm{focal}}(p, y) = \frac{1 - \beta}{1 - \beta^{n_y}}\, L(p, y)$$

wherein $(1 - \beta^{n})/(1 - \beta)$ is the effective number of samples of each class of sample data, n is the number of samples of the class, β ∈ (0, 1) is a hyper-parameter, and L(p, y) is the Focal Loss function, where p is the predicted probability of each class and y is the true class;
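A sketch of the Class-Balanced Focal Loss above in PyTorch; the focal exponent gamma, the value of beta, and the weight renormalization are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def class_balanced_focal_loss(logits, labels, samples_per_class, beta=0.999, gamma=2.0):
    n = torch.as_tensor(samples_per_class, dtype=torch.float32)
    weights = (1.0 - beta) / (1.0 - beta ** n)        # inverse effective number of samples
    weights = weights / weights.sum() * len(n)        # renormalize across classes
    p = F.softmax(logits, dim=1)
    pt = p.gather(1, labels.unsqueeze(1)).squeeze(1)  # probability of each true class y
    focal = (1.0 - pt) ** gamma * torch.log(pt + 1e-6)
    return -(weights[labels] * focal).mean()

loss = class_balanced_focal_loss(torch.randn(4, 5), torch.tensor([0, 2, 1, 4]),
                                 samples_per_class=[500, 120, 80, 40, 10])
```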
The research of the invention has been verified on a data set obtained from clinical cases. The system can classify and count granulocytes quickly and accurately, and the generalization ability of the model is reliable. Manual evaluation of granulocytes today is subjective, loosely standardized, and time-consuming: an experienced doctor needs about ten hours to examine 1000 granulocyte pictures, whereas the system processes 1000 granulocyte pictures in only 180 seconds, greatly reducing the workload of doctors. The system is highly accurate in practical application and reduces errors caused by subjectivity. It can assist, and partly replace, doctors in granulocyte classification and counting, and has good application prospects.

Claims (5)

1. A medical image multi-classification recognition system based on improved ResNet is characterized by comprising a positioning module and a classification module, wherein the positioning module utilizes a Hourglass network model to perform feature extraction on an input granulocyte picture, positions all cells in the granulocyte picture respectively, cuts out the positioned cells, leaves single complete cells, and performs size normalization processing on all the cut cells;
the classification module classifies the granulocytes positioned by the positioning module by adopting the constructed deep learning classification model:
the network structure of the constructed deep learning classification model is as follows:
the first layer is a convolution layer, the number of convolution kernels is 64, and the size of each convolution kernel is 7 x 7; the second layer is a normalization layer and an activation function layer; the third layer is a pooling layer using max pooling with a pooling size of 3 x 3; the fourth layer is a ResNet-Block classification model; and the fifth layer and the sixth layer are TBC-Block classification models;
the ResNet-Block classification model comprises two branches, wherein the first layer of the first branch is a convolution layer, the number of convolution kernels is 128, the size of each convolution kernel is 1 x 1, the second layer is a convolution layer, the number of convolution kernels is 128, the size of each convolution kernel is 3 x 3, the third layer is a convolution layer, the number of convolution kernels is 128, the size of each convolution kernel is 1 x 1, the second branch is a convolution layer, the number of convolution kernels is 128, the size of each convolution kernel is 1 x 1, and a BN layer and an activation function layer are added after each convolution layer of each branch;
the TBC-Block modules of the fifth layer and the sixth layer have the same structure and respectively comprise three branches and three fully-connected layers, wherein the first branch is a convolution layer, the number of convolution kernels is 256, the size of each convolution kernel is 3 x 3, the second branch is a convolution layer, the number of convolution kernels is 256, the size of each convolution kernel is 3 x 3, the third layer is a convolution layer, the number of convolution kernels is 256, the size of each convolution kernel is 1 x 1, and a BN layer and an activation function layer are added after each convolution layer of each branch;
and finally, the network ends with three fully-connected layers, followed by an activation function and then a Softmax classifier, where the Softmax classifier classifies the cells and outputs the category of each cell.
2. The improved ResNet-based medical image multi-classification recognition system as claimed in claim 1, wherein the Hourglass network module is a symmetric structure comprising four lower convolution layer groups and four upper convolution layer groups;
the first lower convolution layer group comprises a first convolution layer, a second convolution layer and a third convolution layer, wherein the size of each convolution kernel of the first convolution layer is 1 x 1, the number of the convolution kernels is 256, the size of each convolution kernel of the second convolution layer is 3 x 3, the number of the convolution kernels is 128, the size of each convolution kernel of the third convolution layer is 1 x 1, and the number of the convolution kernels is 256;
the second lower convolution layer group comprises the fourth, fifth and sixth convolution layers, wherein the size of each convolution kernel of the fourth convolution layer is 1 x 1, the number of the convolution kernels is 256, the size of each convolution kernel of the fifth convolution layer is 3 x 3, the number of the convolution kernels is 128, the size of each convolution kernel of the sixth convolution layer is 1 x 1, and the number of the convolution kernels is 256;
the third lower convolution layer group comprises the seventh, eighth and ninth convolution layers, wherein the size of each convolution kernel of the seventh convolution layer is 1 x 1, the number of the convolution kernels is 256, the size of each convolution kernel of the eighth convolution layer is 3 x 3, the number of the convolution kernels is 128, the size of each convolution kernel of the ninth convolution layer is 1 x 1, and the number of the convolution kernels is 256;
the fourth lower convolution layer group includes the tenth, eleventh and twelfth convolution layers, wherein the size of each convolution kernel of the tenth convolution layer is 1 x 1, the number of convolution kernels is 256, the size of each convolution kernel of the eleventh convolution layer is 3 x 3, the number of convolution kernels is 128, the size of each convolution kernel of the twelfth convolution layer is 1 x 1, and the number of convolution kernels is 256;
the first upper convolution layer group comprises the thirteenth, fourteenth and fifteenth convolution layers, wherein the size of each convolution kernel of the thirteenth convolution layer is 1 x 1, the number of the convolution kernels is 256, the size of each convolution kernel of the fourteenth convolution layer is 3 x 3, the number of the convolution kernels is 128, the size of each convolution kernel of the fifteenth convolution layer is 1 x 1, and the number of the convolution kernels is 256;
the second upper convolution layer group comprises sixteenth, seventeenth and eighteenth convolution layers, wherein the size of each convolution kernel of the sixteenth convolution layer is 1 x 1, the number of convolution kernels is 256, the size of each convolution kernel of the seventeenth convolution layer is 3 x 3, the number of convolution kernels is 128, the size of each convolution kernel of the eighteenth convolution layer is 1 x 1, and the number of convolution kernels is 256;
the third upper convolution layer group comprises the nineteenth, twentieth and twenty-first convolution layers, wherein the size of each convolution kernel of the nineteenth convolution layer is 1 x 1, the number of convolution kernels is 256, the size of each convolution kernel of the twentieth convolution layer is 3 x 3, the number of convolution kernels is 128, the size of each convolution kernel of the twenty-first convolution layer is 1 x 1, and the number of convolution kernels is 256;
the fourth upper convolution layer group includes the twenty-second, twenty-third, and twenty-fourth convolution layers, where each convolution kernel of the twenty-second convolution layer is 1 x 1 in size, the number of convolution kernels is 256, each convolution kernel of the twenty-third convolution layer is 3 x 3 in size, the number of convolution kernels is 128, each convolution kernel of the twenty-fourth convolution layer is 1 x 1 in size, and the number of convolution kernels is 256;
adding a pooling layer after each lower convolution layer group, wherein the size of the pooling layer is 2 x 2, and the step length is 2;
an upsampling layer is added after each group of the upper convolution layers.
3. The improved ResNet-based medical image multi-classification recognition system as claimed in claim 2, wherein the training process of the Hourglass network model adopted by the positioning module comprises the following steps:
collecting 2000 granulocyte pictures as a training set, and carrying out normalization operation on the sizes of the collected pictures;
manually marking each cell and the category of each cell in each picture in the training set to obtain a marked training set;
step two, expanding the capacity of the labeled training set by a data enhancement method applied to each image and reducing the background interference caused by uneven staining;
and step three, inputting the labeled training set processed in step two into the Hourglass network model of the positioning module for training, enabling the Hourglass network model to learn the characteristics of the various cells labeled in the training set, and obtaining a trained model when the positioning accuracy of the Hourglass network model on the labeled cells reaches 98%, where positioning accuracy = (number of labeled cells the model locates) / (total number of manually labeled cells) x 100%.
4. The improved ResNet-based medical image multi-class recognition system according to claim 3, wherein the training process of the constructed deep learning classification model adopted by the classification module comprises the following steps:
inputting the processed labeled training set into the constructed deep learning classification model adopted by the classification module for training, outputting parameters that can identify the morphological characteristic information of the various cells, classifying through the fully-connected layers, and outputting the type of each cell; a trained classification model is obtained when the classification accuracy of the constructed deep learning classification model on the cells labeled in the training set reaches 90%, where classification accuracy = (number of labeled cells assigned their correct type) / (total number of manually labeled cells) x 100%.
5. The improved ResNet-based medical image multi-classification recognition system according to claim 4, further comprising a counting module, wherein the counting module counts different types of cells outputted from the classification module respectively, and generates a cell count classification report.
CN202011406222.5A 2020-12-03 2020-12-03 Medical image multi-classification recognition system based on improved ResNet Active CN112561863B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011406222.5A CN112561863B (en) 2020-12-03 2020-12-03 Medical image multi-classification recognition system based on improved ResNet

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011406222.5A CN112561863B (en) 2020-12-03 2020-12-03 Medical image multi-classification recognition system based on improved ResNet

Publications (2)

Publication Number Publication Date
CN112561863A CN112561863A (en) 2021-03-26
CN112561863B true CN112561863B (en) 2022-06-10

Family

ID=75048229

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011406222.5A Active CN112561863B (en) 2020-12-03 2020-12-03 Medical image multi-classification recognition system based on improved ResNet

Country Status (1)

Country Link
CN (1) CN112561863B (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113408463B (en) * 2021-06-30 2022-05-10 吉林大学 Cell image small sample classification system based on distance measurement
CN114387467B (en) * 2021-12-09 2022-07-29 哈工大(张家口)工业技术研究院 Medical image classification method based on multi-module convolution feature fusion
CN114366047B (en) * 2022-01-27 2023-05-09 上海国民集团健康科技有限公司 Multi-task neural network pulse condition data processing method, system and terminal
CN115393351B (en) * 2022-10-27 2023-01-24 北京大学第三医院(北京大学第三临床医学院) Method and device for judging cornea immune state based on Langerhans cells
CN116012367B (en) * 2023-02-14 2023-09-12 山东省人工智能研究院 Deep learning-based stomach mucosa feature and position identification method

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109034045A (en) * 2018-07-20 2018-12-18 中南大学 A kind of leucocyte automatic identifying method based on convolutional neural networks
CN111986103A (en) * 2020-07-20 2020-11-24 北京市商汤科技开发有限公司 Image processing method, image processing device, electronic equipment and computer storage medium

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO1996009598A1 (en) * 1994-09-20 1996-03-28 Neopath, Inc. Cytological slide scoring apparatus
US5933519A (en) * 1994-09-20 1999-08-03 Neo Path, Inc. Cytological slide scoring apparatus
CN109166100A (en) * 2018-07-24 2019-01-08 中南大学 Multi-task learning method for cell count based on convolutional neural networks
CN110705639A (en) * 2019-09-30 2020-01-17 吉林大学 Medical sperm image recognition system based on deep learning
CN111931764A (en) * 2020-06-30 2020-11-13 华为技术有限公司 Target detection method, target detection framework and related equipment
CN112017161A (en) * 2020-08-06 2020-12-01 杭州深睿博联科技有限公司 Pulmonary nodule detection method and device based on central point regression

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
Chuanguang Yang et al.; Gated Convolutional Networks with Hybrid Connectivity for Image Classification; arXiv; 2019-11-28; pp. 1-8 *
Seung-Taek Kim et al.; Lightweight Stacked Hourglass Network for Human Pose Estimation; Applied Sciences; 2020-09-17; pp. 1-15 *
Xudong Wang et al.; Tied Block Convolution: Leaner and Better CNNs with Shared Thinner Filters; arXiv; 2020-09-25; pp. 1-27 *
Wu Fenqi et al.; A deep learning model for automatic recognition of bone marrow erythroid and granulocytic cells; Journal of Jilin University (Information Science Edition); Nov. 2020; Vol. 38, No. 6; pp. 729-736 *

Also Published As

Publication number Publication date
CN112561863A (en) 2021-03-26

Similar Documents

Publication Publication Date Title
CN112561863B (en) Medical image multi-classification recognition system based on improved ResNet
Dong et al. Evaluations of deep convolutional neural networks for automatic identification of malaria infected cells
CN111985536B Gastroscopic pathology image classification method based on weakly supervised learning
Oztel et al. Mitochondria segmentation in electron microscopy volumes using deep convolutional neural network
Pan et al. Mitosis detection techniques in H&E stained breast cancer pathological images: A comprehensive review
Hyeon et al. Diagnosing cervical cell images using pre-trained convolutional neural network as feature extractor
CN113378792B (en) Weak supervision cervical cell image analysis method fusing global and local information
CN112381164B (en) Ultrasound image classification method and device based on multi-branch attention mechanism
CN111860406A (en) Blood cell microscopic image classification method based on regional confusion mechanism neural network
CN111429407A (en) Chest X-ray disease detection device and method based on two-channel separation network
de Oliveira et al. Classification of Normal versus Leukemic Cells with Data Augmentation and Convolutional Neural Networks.
Yonekura et al. Improving the generalization of disease stage classification with deep CNN for glioma histopathological images
CN114332473A (en) Object detection method, object detection device, computer equipment, storage medium and program product
CN104933415B (en) A kind of visual remote sensing image cloud sector detection method in real time
Bondre et al. Review on leaf diseases detection using deep learning
Jabbar et al. Diagnosis of malaria infected blood cell digital images using deep convolutional neural networks
CN114972202A (en) Ki67 pathological cell rapid detection and counting method based on lightweight neural network
CN114283326A (en) Underwater target re-identification method combining local perception and high-order feature reconstruction
El Alaoui et al. Deep stacked ensemble for breast cancer diagnosis
Pavithra et al. An Overview of Convolutional Neural Network Architecture and Its Variants in Medical Diagnostics of Cancer and Covid-19
Zhang et al. Blood cell image classification based on image segmentation preprocessing and CapsNet network model
Yan et al. Two and multiple categorization of breast pathological images by transfer learning
Dwivedi et al. EMViT-Net: A novel transformer-based network utilizing CNN and multilayer perceptron for the classification of environmental microorganisms using microscopic images
Liu et al. One-stage attention-based network for image classification and segmentation on optical coherence tomography image
Kong et al. Toward large-scale histopathological image analysis via deep learning

Legal Events

Code Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant