CN113486930B - Method and device for establishing and segmenting small intestine lymphoma segmentation model based on improved RetinaNet - Google Patents


Info

Publication number
CN113486930B
CN113486930B (application number CN202110678494.9A)
Authority
CN
China
Prior art keywords
model
segmentation
small intestine
lymphoma
network
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110678494.9A
Other languages
Chinese (zh)
Other versions
CN113486930A (en)
Inventor
谢飞 (Xie Fei)
郜刚 (Gao Gang)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shaanxi Great Wisdom Medical Care Technology Co ltd
Original Assignee
Shaanxi Great Wisdom Medical Care Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shaanxi Great Wisdom Medical Care Technology Co., Ltd.
Priority to CN202110678494.9A
Publication of CN113486930A
Application granted
Publication of CN113486930B
Legal status: Active

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/10Segmentation; Edge detection
    • G06T7/11Region-based segmentation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10072Tomographic images
    • G06T2207/10081Computed x-ray tomography [CT]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/30Subject of image; Context of image processing
    • G06T2207/30004Biomedical image processing
    • G06T2207/30096Tumor; Lesion
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00Road transport of goods or passengers
    • Y02T10/10Internal combustion engine [ICE] based vehicles
    • Y02T10/40Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a method and device for establishing a small intestine lymphoma segmentation model based on an improved RetinaNet, and for performing segmentation with it. The segmentation model is built on the RetinaNet network: the original FPN structure is replaced with a feature pyramid network that fuses features of different scales nonlinearly in multiple ways, and a channel attention module is added to the backbone network. These changes improve the model's ability to extract features from targets of different scales and reduce the influence of background factors on training. Combined with the Free-Anchor method, candidate boxes are matched adaptively to detection targets, so that targets of different morphologies are handled better. Compared with traditional models, the segmentation model performs better on small intestine lymphoma segmentation.

Description

Method and device for establishing and segmenting small intestine lymphoma segmentation model based on improved RetinaNet
Technical Field
The invention belongs to the field of medical image segmentation, and particularly relates to a small intestine lymphoma segmentation model establishment and segmentation method and device based on an improved RetinaNet.
Background
With the development of computer hardware, large-scale data storage has become possible, providing neural network models with abundant material to learn from. Meanwhile, the development of the graphics processing unit (GPU) lets computers perform matrix operations rapidly, greatly accelerating model training. These developments have made deep learning the dominant approach in image recognition. Unlike traditional methods, deep learning does not require hand-crafted features: it adaptively learns the most salient features in an image, avoiding the influence that differences in manually extracted features have on the final classification result.
More and more deep learning methods are now applied to the diagnosis of medical CT images, with good results. Yan et al. proposed the 3DCE network structure, which exploits the spatial correlation of features between slices: adjacent slices are stacked along the channel dimension to form several three-channel data modules, the middle channel of each module carrying the annotation information, and these modules are fed into the network for separate feature extraction and feature fusion, which helps improve detection. Li et al. proposed the MVP-Net model, which incorporates clinical diagnostic experience by converting the same slice into data of different window levels and window widths and feeding the converted data into the network through separate branches for feature extraction and fusion, effectively improving detection precision. However, these models all target tumor diseases with relatively high incidence, and their network structures are complex and demand substantial computing resources.
At present, few works apply target detection methods directly to CT images of small intestine lymphoma, and directly applying the prior art to its detection gives unsatisfactory results. Small intestine lymphoma, a malignant tumor of the small intestine, has a very complex morphological structure; by morphology and pathology it can be classified into Burkitt lymphoma, diffuse large B-cell lymphoma, mantle cell lymphoma, and others. Because of its complex morphology, large scale differences and other factors, some lymphomas are hard to identify effectively. The main difficulties are: 1) The gastrointestinal environment itself is complex. The shape of the intestinal tract varies widely and differs greatly between individuals, and surrounding tissues and organs such as bones, fat layers, the pancreas, the kidneys and the stomach interfere with tumor recognition. 2) Tumor scales differ greatly. Some smaller tumors in particular are indistinct and highly similar to the surrounding intestinal tract. 3) Tumor morphology is diverse. These large differences in morphology and scale pose a great challenge to the correct segmentation of small intestine lymphoma.
Disclosure of Invention
The invention aims to provide a method and device for establishing a small intestine lymphoma segmentation model based on an improved RetinaNet, and for performing segmentation with it. It addresses two problems of the prior art: tumors resemble the surrounding background, so existing algorithms are easily disturbed by the background information around a tumor during segmentation and achieve low accuracy; and small intestine lymphoma exhibits large differences in morphology and scale, while existing methods extract features of targets at different scales insufficiently.
To achieve these aims, the invention adopts the following technical scheme:
a small intestine lymphoma segmentation model establishment method based on improved RetinaNet comprises the following steps:
step 1: acquiring an abdomen slice image data set, and labeling each abdomen slice image to acquire a label set;
step 2: establishing a RetinaNet model comprising a backbone network, a feature pyramid network and a detection network; the backbone network comprises four backbone blocks, each followed by a channel attention module, and extracts feature maps {C2, C3, C4, C5, C6, C7} of different scales, where C2, C3, C4 and C5 are output by the respective backbone blocks, C6 is obtained by downsampling C5, and C7 by downsampling C6;
the feature pyramid network performs feature enhancement on the feature maps of different scales to obtain enhanced feature maps {P2, P3, P4, P5, P6, P7}, where P3, P4, P5, P6 and P7 are obtained by enhancing C3, C4, C5, C6 and C7 of the same layers, and P2 is obtained by up-sampling P3 to the resolution of C2 and adding C2;
a detection network follows each layer of enhanced feature map of the feature pyramid network and comprises a regressor, which generates anchor boxes, and a classifier, which classifies the targets within the anchor boxes;
step 3: pre-training the RetinaNet model, initializing the parameters of the pre-trained RetinaNet model, training the initialized model on the abdominal slice image data set as training set together with the label set, and taking the trained RetinaNet model as the small intestine lymphoma segmentation model.
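The channel attention module attached after each backbone block is not given line-by-line in the text; below is a minimal squeeze-and-excitation style sketch in PyTorch. The class name, reduction ratio and layer sizes are assumptions, not the patent's implementation.

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    """Squeeze-and-excitation style channel attention (illustrative sketch)."""
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)       # squeeze: one value per channel
        self.fc = nn.Sequential(                  # excitation: bottleneck MLP
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),                         # per-channel score in (0, 1)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, _, _ = x.shape
        w = self.fc(self.pool(x).view(b, c)).view(b, c, 1, 1)
        return x * w                              # reweight the channels

feat = torch.randn(2, 256, 32, 32)                # e.g. output of one backbone block
out = ChannelAttention(256)(feat)                 # same shape, channel-reweighted
```

Because the attention scores lie in (0, 1), the module can only attenuate channels, which is how it suppresses background-dominated channels during training.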
Further, the backbone network is ResNeXt101, and the feature pyramid network is NAS-FPN.
Further, the enhanced feature map is obtained by the following method:
for each enhanced feature map, two feature maps are selected from the feature maps of different scales and, in sequence, unified to the same scale, fused, and convolved to obtain the enhanced feature map.
A small intestine lymphoma segmentation method based on improved RetinaNet comprises the following steps:
step one: acquiring an abdomen slice image to be detected;
step two: inputting the abdominal slice image to be detected into a small intestine lymphoma segmentation model obtained by any of the above methods for establishing a small intestine lymphoma segmentation model based on the improved RetinaNet, to obtain the segmentation result of the image to be detected.
A small intestine lymphoma segmentation device based on improved RetinaNet, the device comprising a processor and a memory for storing a plurality of functional modules capable of running on the processor, the functional modules comprising a small intestine lymphoma segmentation model and a segmentation module;
the small intestine lymphoma segmentation model is obtained by adopting a small intestine lymphoma segmentation model establishment method based on improved RetinaNet;
the segmentation module is used for acquiring an abdominal slice image to be detected, inputting the abdominal slice image to be detected into the small intestine lymphoma segmentation model, and obtaining a segmentation result of the abdominal slice image to be detected.
A storage medium having stored thereon a computer program which when executed by a processor implements a method of small intestine lymphoma segmentation based on improved RetinaNet.
Compared with the prior art, the invention has the following technical characteristics:
(1) The invention uses a model trained on the DeepLesion dataset as the pre-training model. The shallow-layer parameters are frozen and only the deep-layer parameters are fine-tuned, which improves the model's performance on small intestine lymphoma segmentation.
(2) The invention adopts a pyramid network, obtained by a neural architecture search algorithm, that fuses features of different scales nonlinearly in multiple ways, improving the model's ability to extract features from targets of different morphological scales.
(3) The invention introduces a channel attention module in the model backbone network, improves the extraction capability of the model on small intestine lymphoma characteristics, and reduces the influence of background factors on training.
(4) The invention combines the free-anchor method to realize the self-adaptive matching of the candidate frame and the detection target, thereby better aiming at targets with different forms and improving the positioning and identifying ability of the model to tumors.
Drawings
FIG. 1 is a network structure diagram of a small intestine lymphoma segmentation model;
FIG. 2 is a pre-training flow chart;
FIG. 3 is a schematic diagram of a backbone block structure;
FIG. 4 is a schematic diagram of a channel attention module configuration;
FIG. 5 is a schematic diagram of a backbone network connection and parameter configuration in an embodiment;
FIG. 6 is a schematic diagram of a feature pyramid network architecture;
FIG. 7 is a schematic diagram of a fusion block composition;
FIG. 8 is an ablation experimental result;
fig. 9 is a comparison of experiments for different models.
Detailed Description
First, the technical vocabulary presented in the present invention will be explained:
RetinaNet: a single-stage object detection model proposed by Tsung-Yi Lin, Kaiming He, et al. Its key components are the feature pyramid and the Focal loss. The model has a simple structure and handles the class imbalance between positive and negative samples well.
ImageNet: a computer vision recognition project and currently the world's largest image recognition database, containing more than ten million images across 1,000 categories.
DeepLesion database: developed by a team at the National Institutes of Health Clinical Center (NIHCC), this database contains a variety of lesion types, such as kidney lesions, bone lesions, lung nodules and enlarged lymph nodes. It holds data from a total of 30,000 patients, with detailed annotation of each patient's lesion areas.
Anchor frame (anchor box): the anchor box is similar to the candidate box for delineating the location of the target region.
Free-Anchor: literature origin: zhang X, wan F, liu C, et al learning to match anchors for visual object detection.arXiv 2019[ J ]. ArXiv preprintcs.CV/1909.02466.
The embodiment discloses a small intestine lymphoma segmentation model establishment method based on improved RetinaNet, which comprises the following steps:
step 1: acquiring an abdomen slice image data set, and labeling each abdomen slice image to acquire a label set;
step 2: establishing a RetinaNet model comprising a backbone network, a feature pyramid network and a detection network; the backbone network comprises four backbone blocks, each followed by a channel attention module, and extracts feature maps {C2, C3, C4, C5, C6, C7} of different scales, where C2, C3, C4 and C5 are output by the respective backbone blocks, C6 is obtained by downsampling C5, and C7 by downsampling C6;
the feature pyramid network performs feature enhancement on the feature maps of different scales to obtain enhanced feature maps {P2, P3, P4, P5, P6, P7}, where P3, P4, P5, P6 and P7 are obtained by enhancing C3, C4, C5, C6 and C7 of the same layers, and P2 is obtained by up-sampling P3 to the resolution of C2 and adding C2;
a detection network follows each layer of enhanced feature map of the feature pyramid network and comprises a regressor, which generates anchor boxes, and a classifier, which classifies the targets within the anchor boxes;
step 3: pre-training the RetinaNet model, initializing the parameters of the pre-trained RetinaNet model, training the initialized model on the abdominal slice image data set as training set together with the label set, and taking the trained RetinaNet model as the small intestine lymphoma segmentation model.
Specifically, the labeling in step 1 that obtains the label set works as follows:
labels take the form defined by the COCO dataset, whose annotations are JSON data. The fields relevant to model training are:
id, the unique identifier of the tumor in the training image; width and height, the width and height of the tumor detection box, which is the bounding rectangle annotated by the doctor; and file_name, the file name of the image slice.
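A hypothetical minimal COCO-style annotation carrying the fields just described; all ids, file names and coordinates below are invented for illustration (COCO annotations are JSON in practice).

```python
import json

# Every value here is a made-up example, not real patient data.
annotation = {
    "images": [
        {"id": 1, "file_name": "patient_012_slice_034.jpg",
         "width": 512, "height": 512}
    ],
    "annotations": [
        {"id": 1, "image_id": 1, "category_id": 1,
         "bbox": [140, 200, 96, 80]}   # [x, y, width, height] of the tumor box
    ],
    "categories": [{"id": 1, "name": "lymphoma"}],
}
encoded = json.dumps(annotation)       # serialized as the training-time label file
```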
Specifically, the backbone network is ResNeXt101, and the feature pyramid network is NAS-FPN.
Specifically, the enhanced feature map is obtained by the following method:
the NAS-FPN fuses the feature maps of different scales in two ways. The two modes are sum and global mapping, respectively. Before feature fusion is carried out in the two modes, the extracted feature images with different scales are required to be adjusted to the same scale. sum is the element-wise addition of the two feature maps after adjustment. global mapping obtains the attention score of each channel through global maximum pooling operation and sigmoid of one of the feature maps after adjustment, multiplies the attention score with the other feature map, and then adds the multiplied feature map with the initial feature map to obtain the enhanced feature map.
Specifically, the structure of NAS-FPN is shown in Table 1:
TABLE 1 Composition of NAS-FPN

Output feature map   Input feature map 1   Input feature map 2   Connection mode
p4_1                 c6                    c4                    global pooling
p4_2                 p4_1                  c4                    sum
p3                   p4_2                  c3                    sum
p4                   p3                    p4_2                  sum
p5_temp              p4                    p3                    global pooling
p5                   c5                    p5_temp               sum
p7_temp              p5                    p4_2                  global pooling
p7                   c7                    p7_temp               sum
p6                   p7                    p7                    global pooling
Here c3, c4, c5, c6 and c7 denote the feature maps of different scales extracted by the backbone network, and p3, p4, p5, p6 and p7 denote the feature maps output by the feature pyramid; each output feature map has the same size as the corresponding input feature map. To increase the model's ability to detect small targets, a higher-resolution p2 layer is also used for target detection; p2 is obtained by up-sampling the p3 layer to the resolution of the c2 layer and directly adding the two.
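A minimal NumPy sketch of forming p2 from p3 and c2, assuming a nearest-neighbor 2x resampling of p3 to c2's resolution before the element-wise addition; the resampling method and the feature sizes are assumptions for illustration.

```python
import numpy as np

def upsample2x(x: np.ndarray) -> np.ndarray:
    """Nearest-neighbor 2x upsampling of a (C, H, W) feature map."""
    return x.repeat(2, axis=1).repeat(2, axis=2)

p3 = np.random.rand(16, 64, 64)       # lower-resolution pyramid level
c2 = np.random.rand(16, 128, 128)     # higher-resolution backbone feature
p2 = upsample2x(p3) + c2              # direct element-wise addition
```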
Specifically, the detection network generates anchor boxes at each of the P2-P7 layers; the size of an anchor box depends on the layer (the smaller the feature map, the larger the anchor box when mapped back to the original image), and their number is W x H x 9, where W and H are the width and height of the feature map. The regressor fine-tunes the anchor boxes to produce the coordinates of the detection boxes (typically the top-left and bottom-right corners, representing the box's position). The classifier classifies the targets within the anchor boxes to determine their specific categories. Each of the P2-P7 layers has its own classifier and regressor. The final result is obtained via non-maximum suppression (NMS).
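The W x H x 9 anchor count can be tabulated per level. The sketch below assumes a 512 x 512 input and a stride of 2^i at level Pi (so the feature-map side halves at each level), which is an assumption consistent with the C5→C6→C7 downsampling described earlier.

```python
# Assumption: 512x512 input, stride 2**i at pyramid level Pi (P2: 128 ... P7: 4).
input_size = 512
anchors_per_level = {
    f"P{i}": (input_size // 2 ** i) ** 2 * 9      # W x H x 9 anchors per level
    for i in range(2, 8)
}
total_anchors = sum(anchors_per_level.values())
```

Under these assumptions P2 alone contributes 147,456 anchors, which is why the higher-resolution p2 layer noticeably increases computation.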
Specifically, pre-training comprises feeding the labeled DeepLesion data into the model used by the method, with randomly initialized parameters, training until the training loss converges, and saving the trained model parameters as a .pth file.
Specifically, during training in step 3, the labeled small intestine lymphoma data are fed into the model; during parameter initialization, the corresponding parameters are imported in turn from the file saved in the pre-training task, and the imported parameters are frozen (frozen network layers are not updated during the backpropagation stage of training). Training continues until the loss converges, yielding the final trained network model.
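Freezing the imported pre-trained layers might look like this in PyTorch. The prefix names and the toy model are placeholders; the real parameter names depend on the model definition.

```python
import torch.nn as nn

def freeze_by_prefix(model: nn.Module, prefixes) -> nn.Module:
    """Freeze every parameter whose name starts with one of `prefixes`;
    frozen parameters receive no gradient updates during backpropagation."""
    for name, param in model.named_parameters():
        if name.startswith(tuple(prefixes)):
            param.requires_grad = False
    return model

# Toy stand-in: freeze the first ("shallow") layer, fine-tune the second.
toy = freeze_by_prefix(nn.Sequential(nn.Linear(4, 4), nn.Linear(4, 2)), ["0."])
```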
Specifically, during training in step 3, the loss function is Free-Anchor.
The embodiment discloses a small intestine lymphoma segmentation method based on improved RetinaNet, which comprises the following steps:
step one: acquiring an abdomen slice image to be detected;
step two: and inputting the abdominal section image to be detected into a small intestine lymphoma segmentation model to obtain a segmentation result of the abdominal section image to be detected.
In this embodiment, a small intestine lymphoma segmentation device based on improved RetinaNet is disclosed, the device comprises a processor and a memory for storing a plurality of functional modules capable of running on the processor, wherein the functional modules comprise a small intestine lymphoma segmentation model and a segmentation module;
the small intestine lymphoma segmentation model is obtained by the above method for establishing a small intestine lymphoma segmentation model based on the improved RetinaNet;
the segmentation module is used for acquiring an abdominal slice image to be detected, inputting the abdominal slice image to be detected into the small intestine lymphoma segmentation model, and obtaining a segmentation result of the abdominal slice image to be detected.
Specifically, the segmentation result is a focal region of small intestine lymphoma.
The present embodiment also discloses a storage medium having stored thereon a computer program which, when executed by a processor, implements a small intestine lymphoma segmentation method based on improved RetinaNet.
Example 1
This embodiment uses pytorch as the experimental framework and RetinaNet as the baseline network model. The backbone network is ResNeXt101, the input picture size is 512 x 512, the optimizer is Adam (adaptive moment estimation), and the initial learning rate is 0.00001. The pytorch utility class ReduceLROnPlateau adjusts the learning rate during training, with its patience parameter set to 4: the learning rate is reduced only when the model loss fails to decrease for four consecutive iterations, and each update sets the new learning rate to 80% of the previous one. In the loss function, α is set to 0.25 and γ to 2. The number of iterations is 18, and the anchor scales are set to [0.25, 0.5, 1.2].
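The optimizer and learning-rate schedule described above can be wired up as follows. The model is a stand-in and no real training happens; only the Adam / ReduceLROnPlateau configuration reflects the text.

```python
import torch
from torch.optim.lr_scheduler import ReduceLROnPlateau

model = torch.nn.Linear(10, 2)                    # stand-in for the detector
optimizer = torch.optim.Adam(model.parameters(), lr=0.00001)
# patience=4: shrink the LR only after four iterations without loss improvement;
# factor=0.8: each update sets the new LR to 80% of the previous one.
scheduler = ReduceLROnPlateau(optimizer, mode="min", factor=0.8, patience=4)

for epoch in range(18):                           # 18 iterations, as in the text
    val_loss = 1.0                                # placeholder constant loss
    scheduler.step(val_loss)                      # LR drops whenever loss stalls

final_lr = optimizer.param_groups[0]["lr"]
```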
The abdominal slice image dataset of this embodiment contains labeled data for 34 patients in total, 26 used for training and 8 for testing, plus data for 58 patients whose illness was confirmed by doctors but whose data were not annotated. In the annotated data, each patient's DICOM file is a series of three-dimensional images stacked along the z-axis, obtained by layer-by-layer slice scans of the patient's abdomen on the corresponding device. Each patient has about 40 slices on average, and up to 80. Some slices contain lymphoma, and some of those are annotated by a doctor. Each patient slice is in DICOM format, as are the annotation files generated after the doctors mark the tumor data. Besides the medical image, each DICOM file contains a corresponding DICOM data header with identifying information. The patient lymphoma slices and the doctors' annotations share unique identifiers that establish a one-to-one mapping between patient lymphoma data and doctor annotation data, through which the doctor-annotated lymphoma data can be separated from the patient's original slice data. In this example, the pydicom library was used for this separation, yielding 954 doctor-annotated lymphoma slices. To facilitate the subsequent transfer learning, all DICOM files were converted to JPG format at a size of 512 x 512.
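The DICOM-to-JPG step reduces to normalizing each slice's pixel array to 8 bits. Below is a sketch of that normalization; reading the array via pydicom.dcmread(...).pixel_array and saving the 512 x 512 JPG with an image library are assumed steps, noted only in the docstring.

```python
import numpy as np

def slice_to_uint8(pixel_array: np.ndarray) -> np.ndarray:
    """Min-max normalize one CT slice to 0-255 for JPG export.
    In the pipeline above, the input would come from
    pydicom.dcmread(path).pixel_array, and the result would then be
    resized to 512x512 and saved as JPG (both assumed, not shown)."""
    arr = pixel_array.astype(np.float32)
    lo, hi = float(arr.min()), float(arr.max())
    arr = (arr - lo) / max(hi - lo, 1e-6)         # scale to [0, 1]
    return (arr * 255.0).round().astype(np.uint8)

ct = np.array([[-1000, 0], [400, 1600]], dtype=np.int16)   # toy HU-like values
jpg_ready = slice_to_uint8(ct)
```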
In this embodiment, models trained on the existing labeled data are used to detect the unlabeled patient data, and high-confidence detections are selected as additional training data. To ensure the accuracy of the results, three models with different network structures perform the detection; the final annotations are obtained by comparing the models' results and screening manually, with the screening based mainly on the correlation between adjacent slices of the same patient. The models are iterated twice, each iteration including the screened data obtained from the previous round of training. In total, 603 additional annotated slices were obtained, all of which are used for training. This gives 1,353 slices for training and 154 for testing.
This embodiment adopts average precision (AP) and recall as the main evaluation indices of the target detection model. Table 2 details the basic quantities required to calculate these indices.
Table 2 Basic quantities required for the detection indices
Recall is the recall ratio of the model for tumors, i.e. how many of the actually present tumors the model detects. Precision is the accuracy of the model's tumor detections, i.e. how many of the detected tumors are correct. The AP value equals the area under the PR curve, the curve obtained by plotting precision (vertical axis) against recall (horizontal axis); the larger the AP value, the better the detection. For ease of calculation, the AP value is typically averaged over 11 equally spaced points on the PR curve.
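The 11-point AP computation mentioned above can be sketched as:

```python
def eleven_point_ap(recalls, precisions):
    """11-point interpolated AP: average, over recall levels 0.0, 0.1, ..., 1.0,
    the maximum precision attained at recall >= that level (0 if unreachable)."""
    ap = 0.0
    for i in range(11):
        r = i / 10.0
        candidates = [p for rec, p in zip(recalls, precisions) if rec >= r]
        ap += max(candidates) if candidates else 0.0
    return ap / 11.0

perfect = eleven_point_ap([0.0, 0.5, 1.0], [1.0, 1.0, 1.0])   # ideal detector
partial = eleven_point_ap([0.5], [1.0])                        # recall caps at 0.5
```

A detector whose recall never exceeds 0.5 collects full precision at only 6 of the 11 recall levels, so its AP is capped at 6/11 no matter how precise it is.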
The Pascal VOC dataset usually uses AP0.5 to reflect detection quality, where 0.5 is the IoU threshold between a detection and the ground-truth annotation: an IoU above 0.5 counts as a positive example. Starting with the COCO dataset, however, target detection has increasingly been evaluated with AP{0.5:0.95}, computed by evaluating the AP at IoU thresholds from 0.5 to 0.95 in steps of 0.05 and averaging all the results. This metric characterizes the quality of the model's detections and reflects its ability to localize the target.
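Computing AP{0.5:0.95} needs an IoU function and the ten thresholds; a minimal sketch:

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0

# The ten IoU thresholds whose per-threshold APs are averaged for AP{0.5:0.95}.
thresholds = [round(0.5 + 0.05 * k, 2) for k in range(10)]
```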
This embodiment comprises two parts. The first part subjects the individual modules of the model to ablation experiments, verifying the interactions between modules and their effects on the experimental results. The second part compares the model with target detection models proposed in recent years, verifying the superiority of the invention's model for small intestine lymphoma detection.
S denotes the added channel attention module; Nms denotes the NAS-FPN network structure incorporating the p2 layer; FA denotes training with the optimal anchor matching (Free-Anchor) mechanism; D denotes freezing the first-stage parameters of the network using the DeepLesion pre-trained model. The comparative experimental setup is shown in Table 3, where a check mark indicates that the corresponding module is selected. For example, Nms+D+S uses the channel attention module, the NAS-FPN structure with the added p2 layer, and transfer learning on the DeepLesion dataset. baseline denotes the original RetinaNet network model, pre-trained on the ImageNet dataset.
TABLE 3 Model design

Method          S     Nms   FA    D
baseline        -     -     -     -
D               -     -     -     ✓
S+D             ✓     -     -     ✓
Nms+D           -     ✓     -     ✓
FA+D            -     -     ✓     ✓
Nms+D+FA        -     ✓     ✓     ✓
Nms+D+S         ✓     ✓     -     ✓
Nms+D+FA+S      ✓     ✓     ✓     ✓
The following table shows the results for the different control groups, evaluated with AP0.5, AP{0.5:0.95}, Recall0.5 and Recall{0.5:0.95}, where the recall decision threshold is 0.5. The experimental results are shown in Table 4.
Table 4 ablation experimental results
As can be seen from Table 4, the initial model (baseline) performs poorly, probably because it was pre-trained on natural images, which differ significantly from the medical image data used here. After pre-training on the DeepLesion dataset, the model's AP0.5 improves by 7.2%, showing that transfer learning between medical image datasets can effectively improve detection capability. Adding the NAS-FPN structure with the p2 layer (Nms+D), the channel attention mechanism (S+D) and the optimal anchor matching mechanism (FA+D) to the transfer-learning model raises the AP0.5 of the three models by 22.1%, 5.3% and 3.3%, respectively, showing that each of the three modules improves on the initial model. The Nms+D module gives the most pronounced improvement, indicating that strengthening the model's extraction of multi-scale features is very important for its detection performance. Compared with Nms+D+S, Nms+D+FA clearly improves AP{0.5:0.95} and Recall{0.5:0.95}, showing that the optimal anchor matching mechanism lets the model learn better tumor features and effectively improves its ability to localize the detection target. Compared with baseline, the invention's model (Nms+D+FA+S) improves the AP0.5, AP{0.5:0.95}, Recall0.5 and Recall{0.5:0.95} detection indices by 32.4%, 65.3%, 21.8% and 28.2%, respectively, a marked improvement over the original model.
FIG. 8 shows the results of the ablation experiments, wherein a) is the ground truth, b) the baseline method, c) the D method, d) the S+D method, e) the Nms+D method, f) the FA+D method, g) the Nms+D+FA method, h) the Nms+D+S method, and i) the Nms+D+FA+S method.
The experimental results show that the detection effect of the model improves obviously after the three modules are combined. Specifically, after the improved feature pyramid structure is incorporated, the model recognizes smaller targets better; the introduction of the attention mechanism lets the model avoid interference from background information, markedly reducing its false detection rate; and after the optimal anchor matching mechanism is incorporated, the model's detection of targets with complex morphology and its localization of some tumors improve obviously.
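The patent states only that a channel attention module follows each backbone block; a squeeze-and-excitation (SE) style block is one common realization. The sketch below is an assumed illustration of such a forward pass (the weight shapes, reduction scheme, and function names are hypothetical, not the patent's implementation):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_attention(feat, w1, w2):
    """SE-style channel attention (assumed form).
    feat: (C, H, W) feature map; w1: (C//r, C); w2: (C, C//r).
    Returns the channel-recalibrated feature map (C, H, W)."""
    squeeze = feat.mean(axis=(1, 2))                       # global average pool -> (C,)
    excite = sigmoid(w2 @ np.maximum(w1 @ squeeze, 0.0))   # FC -> ReLU -> FC -> sigmoid -> (C,)
    return feat * excite[:, None, None]                    # per-channel reweighting
```

Each channel is scaled by a learned weight in (0, 1), which is how such a module can suppress background-dominated channels and reduce false detections.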
To further verify the superiority of the model presented herein, it was compared with, among others, RepPoints (Yang Z, Liu S, Hu H, et al. RepPoints: Point set representation for object detection [C]// Proceedings of the IEEE/CVF International Conference on Computer Vision. 2019: 9657-9666). All models used the DeepLesion data for transfer learning; YOLOv3 used DarkNet-53 as its backbone structure, and the other models used ResNeXt101 as the network backbone. Table 5 shows the results of the comparison experiments on the different models.
TABLE 5 results of comparative experiments on different models
Fig. 9 shows the comparison results of the different model experiments, wherein a) is the ground truth (GT), b) the FCOS method, c) the RepPoints method, d) the PANet method, e) the Libra R-CNN method, f) the YOLOv3 method, and g) the present method.
Comparison with these recent detection models shows that the model presented herein has superior detection capability on the small intestine lymphoma dataset.
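The transfer-learning scheme underlying all of the compared models — initialize the backbone from weights pre-trained on a medical-image dataset, while the task-specific detection head starts from fresh initialization — can be sketched as follows. All names here are illustrative; the patent does not disclose its actual parameter-loading code:

```python
def init_from_pretrained(model_params, pretrained_params):
    """Copy every pre-trained tensor whose name matches a backbone
    parameter of the target model; head parameters keep their fresh
    initialization. Parameters are represented as a name -> tensor dict."""
    loaded, skipped = [], []
    for name in model_params:
        if name in pretrained_params and name.startswith("backbone."):
            model_params[name] = pretrained_params[name]
            loaded.append(name)
        else:
            skipped.append(name)
    return loaded, skipped
```

The model is then fine-tuned end-to-end on the abdominal slice training set, which is the step the ablation credits with the 7.2% AP@0.5 gain.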

Claims (6)

1. The small intestine lymphoma segmentation model building method based on the improved RetinaNet is characterized by comprising the following steps of:
step 1: acquiring an abdomen slice image data set, and labeling each abdomen slice image to acquire a label set;
step 2: establishing a RetinaNet model, wherein the RetinaNet model comprises a backbone network, a feature pyramid network and a detection network; the backbone network comprises four backbone blocks, each backbone block being followed by a channel attention module, and the backbone network is used for extracting feature maps { C2, C3, C4, C5, C6, C7 } of different scales, wherein C2, C3, C4 and C5 are output by different backbone blocks, C6 is obtained by down-sampling C5, and C7 is obtained by down-sampling C6;
the feature pyramid network is used for performing feature enhancement on the feature maps of different scales to obtain enhanced feature maps { P2, P3, P4, P5, P6, P7 }, wherein P3, P4, P5, P6 and P7 are obtained by enhancing C3, C4, C5, C6 and C7 of the same layer, respectively, and P2 is obtained by up-sampling P3 and adding the result to C2;
the detection network is arranged after each layer of enhanced feature map of the feature pyramid network and comprises a regressor and a classifier, wherein the regressor is used for generating anchor frames and the classifier is used for classifying targets within the anchor frames;
step 3: pre-training the RetinaNet model, initializing the parameters of the pre-trained RetinaNet model, training the initialized RetinaNet model by taking the abdomen slice image data set as a training set in combination with the label set, and taking the trained RetinaNet model as the small intestine lymphoma segmentation model.
2. The method for establishing the improved RetinaNet-based small intestine lymphoma segmentation model according to claim 1, wherein the backbone network is ResNeXt101, and the feature pyramid network is NAS-FPN.
3. The method for establishing the small intestine lymphoma segmentation model based on the improved RetinaNet according to claim 1, wherein each enhanced feature map is obtained by the following method:
extracting two feature maps from the feature maps of different scales, and sequentially performing scale unification, feature fusion and convolution on them to obtain the enhanced feature map.
4. The small intestine lymphoma segmentation method based on the improved RetinaNet is characterized by comprising the following steps of:
step one: acquiring an abdomen slice image to be detected;
step two: inputting the to-be-detected abdominal section image into the small intestine lymphoma segmentation model obtained by the small intestine lymphoma segmentation model establishment method based on the improved RetinaNet of any one of claims 1-3, and obtaining the segmentation result of the to-be-detected abdominal section image.
5. A small intestine lymphoma segmentation device based on improved RetinaNet, characterized in that the device comprises a processor and a memory for storing a plurality of functional modules capable of running on the processor, said functional modules comprising a small intestine lymphoma segmentation model and a segmentation module;
the small intestine lymphoma segmentation model is obtained by adopting the small intestine lymphoma segmentation model establishment method based on the improved RetinaNet according to any one of claims 1-3;
the segmentation module is used for acquiring an abdominal slice image to be detected, inputting the abdominal slice image to be detected into the small intestine lymphoma segmentation model, and obtaining a segmentation result of the abdominal slice image to be detected.
6. A storage medium having stored thereon a computer program, which when executed by a processor, implements the improved RetinaNet-based small intestine lymphoma segmentation method according to claim 4.
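The pyramid geometry recited in claim 1 — C6/C7 produced by successive stride-2 down-sampling of C5/C6, and P2 produced by bringing P3 up to C2's resolution and adding C2 — can be sketched on single-channel arrays. The 2x nearest-neighbour up-sampling and the concrete spatial sizes below are illustrative assumptions, not part of the claims:

```python
import numpy as np

def downsample2(x):
    """Stride-2 spatial subsampling, halving each dimension."""
    return x[::2, ::2]

def upsample2(x):
    """2x nearest-neighbour up-sampling, doubling each dimension."""
    return np.repeat(np.repeat(x, 2, axis=0), 2, axis=1)

c5 = np.ones((8, 8))
c6 = downsample2(c5)            # C6 from C5 -> (4, 4)
c7 = downsample2(c6)            # C7 from C6 -> (2, 2)

c2 = np.ones((64, 64))          # high-resolution backbone output
p3 = np.ones((32, 32))          # enhanced map one level up the pyramid
p2 = upsample2(p3) + c2         # P2 fuses P3 with C2 at C2's resolution -> (64, 64)
```

The extra P2 level is what gives the detection network a high-resolution map for the smaller lymphoma targets discussed in the ablation study.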
CN202110678494.9A 2021-06-18 2021-06-18 Method and device for establishing and segmenting small intestine lymphoma segmentation model based on improved RetinaNet Active CN113486930B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110678494.9A CN113486930B (en) 2021-06-18 2021-06-18 Method and device for establishing and segmenting small intestine lymphoma segmentation model based on improved RetinaNet

Publications (2)

Publication Number Publication Date
CN113486930A CN113486930A (en) 2021-10-08
CN113486930B true CN113486930B (en) 2024-04-16

Family

ID=77933739

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10482603B1 (en) * 2019-06-25 2019-11-19 Artificial Intelligence, Ltd. Medical image segmentation using an integrated edge guidance module and object segmentation network
CN111401201A (en) * 2020-03-10 2020-07-10 南京信息工程大学 Aerial image multi-scale target detection method based on spatial pyramid attention drive
CN112686276A (en) * 2021-01-26 2021-04-20 重庆大学 Flame detection method based on improved RetinaNet network
CN112767389A (en) * 2021-02-03 2021-05-07 紫东信息科技(苏州)有限公司 Gastroscope picture focus identification method and device based on FCOS algorithm
CN112837330A (en) * 2021-03-02 2021-05-25 中国农业大学 Leaf segmentation method based on multi-scale double attention mechanism and full convolution neural network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant