CN113486930A

CN113486930A - Small intestinal lymphoma segmentation model establishing and segmenting method and device based on improved RetinaNet

Info

Publication number: CN113486930A
Application number: CN202110678494.9A
Authority: CN
Inventors: 谢飞; 郜刚
Original assignee: Shaanxi Great Wisdom Medical Care Technology Co ltd
Current assignee: Shaanxi Great Wisdom Medical Care Technology Co ltd
Priority date: 2021-06-18
Filing date: 2021-06-18
Publication date: 2021-10-08
Anticipated expiration: 2041-06-18
Also published as: CN113486930B

Abstract

The invention discloses a method and a device for establishing a small intestinal lymphoma segmentation model, and for segmenting small intestinal lymphoma, based on an improved RetinaNet. The segmentation model is built on the RetinaNet network model: the original FPN structure is replaced by a pyramid network that fuses features of different scales nonlinearly in multiple ways, and a channel attention module is added to the backbone network. This improves the model's ability to extract features from targets of different shapes and scales and reduces the influence of background factors on training. A free-anchor method is also incorporated to adaptively match candidate frames to detection targets, so that targets of different shapes are handled better. Compared with traditional models, the segmentation model provided by the invention performs better in segmenting small intestinal lymphoma.

Description

Small intestinal lymphoma segmentation model establishing and segmenting method and device based on improved RetinaNet

Technical Field

The invention belongs to the field of medical image segmentation, and particularly relates to a method and a device for establishing and segmenting a small intestinal lymphoma segmentation model based on improved RetinaNet.

Background

With the development of computer hardware, large-scale data storage has become possible, and the resulting volume of data gives neural network models rich material to learn from. Meanwhile, the development of the graphics processing unit (GPU) lets computers perform matrix operations quickly, which greatly speeds up model training. Together these developments have made deep learning the mainstream approach in image recognition. Unlike traditional methods, deep learning does not require features to be extracted manually; it learns to extract the most salient features of an image adaptively, which avoids the effect that differences in manually extracted features have on the final classification result.

At present, more and more deep learning methods are applied to the diagnosis of medical CT images, with good results. Yan et al. propose the 3DCE network structure, which exploits the spatial correlation of features between slices: adjacent slices are stacked along the channel dimension to form several three-channel data modules, the middle channel of each module carries the annotation information, and the modules are fed into the network to extract features separately before feature fusion, which improves detection performance. Li et al. propose the MVP-Net network model, which draws on clinical diagnostic experience: the same slice is converted into data with different window levels and window widths, and the converted data are fed into the network through separate branches for feature extraction and fusion, effectively improving detection precision. However, these models were all developed for detecting tumor diseases with high incidence, and their network structures are overly complex and consume substantial computing resources.

At present, target detection methods are rarely applied directly to CT images of small intestinal lymphoma, and applying the prior art directly to this task gives unsatisfactory results. The reason is that small intestinal lymphoma, a type of malignant tumor of the small intestine, has a very complex morphological structure. According to its morphology and pathological type it can be classified into Burkitt's lymphoma, diffuse large B cell lymphoma, mantle cell lymphoma, and so on. Because of this complex morphology and the large differences in scale, the lymphoma is difficult to identify effectively. The main difficulties are: 1) The complexity of the gastrointestinal environment itself. The shape of the intestinal tract varies widely, both within the gastrointestinal tract and from individual to individual. In addition, the intestinal tract is surrounded by tissues and organs such as bones, fat layers, the pancreas, the kidneys and the stomach, which interfere with tumor recognition. 2) The large variation in tumor size. In particular, some smaller tumors have indistinct features and closely resemble the surrounding intestinal tract. 3) The diversity of tumor morphology. These large differences in shape and scale make correct segmentation of small intestinal lymphoma a significant challenge.

Disclosure of Invention

The invention aims to provide a method and a device for establishing a small intestinal lymphoma segmentation model, and for segmenting small intestinal lymphoma, based on an improved RetinaNet, in order to solve the following problems of the prior art: tumors resemble the surrounding background, so existing algorithms are easily disturbed by the background information around the tumor and segment with low accuracy; and small intestinal lymphoma varies greatly in scale and shape, while existing methods have insufficient ability to extract features from targets of different scales and shapes.

In order to realize the task, the invention adopts the following technical scheme:

a method for establishing a small intestine lymphoma segmentation model based on improved RetinaNet comprises the following steps:

step 1: acquiring an abdominal slice image data set, and labeling each abdominal slice image to obtain a label set;

step 2: establishing a RetinaNet model, wherein the RetinaNet model comprises a backbone network, a feature pyramid network and a detection network; the backbone network comprises four backbone blocks, each followed by a channel attention module, and is used for extracting feature maps {C2, C3, C4, C5, C6, C7} of different scales, wherein C2, C3, C4 and C5 are output by the respective backbone blocks, C6 is obtained by downsampling C5, and C7 is obtained by downsampling C6;

the feature pyramid network is used for performing feature enhancement on feature maps of different scales to obtain enhanced feature maps { P2, P3, P4, P5, P6 and P7}, wherein P3, P4, P5, P6 and P7 are respectively obtained by enhancement of C3, C4, C5, C6 and C7 of the same layer, and P2 is obtained by downsampling by P3 and adding C2;

the detection network is arranged behind each layer of enhanced feature map of the feature pyramid network, and comprises a regressor and a classifier, wherein the regressor is used for generating an anchor frame, and the classifier is used for classifying targets in the anchor frame;

step 3: pre-training the RetinaNet model and initializing its parameters from the pre-training result, then training the initialized RetinaNet model using the abdominal slice image data set as the training set together with the label set, and taking the trained RetinaNet model as the small intestinal lymphoma segmentation model.

Further, the backbone network is ResNeXt101, and the feature pyramid network is NAS-FPN.

Further, the enhanced feature map is obtained by the following method:

for each enhanced feature map, extracting two feature maps from the feature maps of different scales, and sequentially performing scale unification, feature fusion and convolution on the two feature maps to obtain the enhanced feature map.

A method for segmenting small intestine lymphoma based on improved RetinaNet comprises the following steps:

step one: acquiring an abdominal section image to be detected;

step two: inputting the abdominal section image to be detected into any small intestinal lymphoma segmentation model obtained by the improved RetinaNet-based small intestinal lymphoma segmentation model establishing method, and obtaining the segmentation result of the abdominal section image to be detected.

An improved RetinaNet based small intestine lymphoma segmentation apparatus, the apparatus comprising a processor and a memory for storing a plurality of functional modules capable of running on the processor, the functional modules including a small intestine lymphoma segmentation model and a segmentation module;

the small intestinal lymphoma segmentation model is obtained by adopting a small intestinal lymphoma segmentation model establishment method based on improved RetinaNet;

the segmentation module is used for acquiring an abdominal slice image to be detected, inputting the abdominal slice image to be detected into the small intestinal lymphoma segmentation model, and acquiring a segmentation result of the abdominal slice image to be detected.

A storage medium having stored thereon a computer program which, when executed by a processor, implements an improved RetinaNet-based method of small bowel lymphoma segmentation.

Compared with the prior art, the invention has the following technical characteristics:

(1) The invention uses a model trained on the DeepLesion data set as the pre-training model. The parameters of the shallow layers are frozen and only the parameters of the deep layers are fine-tuned, which improves the model's performance on small intestinal lymphoma segmentation.

(2) The invention adopts a pyramid network that fuses features of different scales nonlinearly in multiple ways; the network structure is obtained with a neural network search algorithm. This improves the model's ability to extract features from targets of different shapes and scales.

(3) According to the method, the channel attention module is introduced into the model backbone network, so that the extraction capability of the model on the small intestinal lymphoma characteristics is improved, and the influence of background factors on training is reduced.

(4) The method combines a free-anchor method to realize the self-adaptive matching of the candidate frame and the detection target, thereby better aiming at the targets with different forms and improving the tumor positioning and identifying capability of the model.

Drawings

FIG. 1 is a network structure diagram of a small intestine lymphoma segmentation model;

FIG. 2 is a pre-training flow diagram;

FIG. 3 is a schematic diagram of a backbone block structure;

FIG. 4 is a schematic diagram of a channel attention module configuration;

FIG. 5 shows the connection mode and parameter setting of the backbone network in the embodiment;

FIG. 6 is a schematic diagram of a feature pyramid network structure;

FIG. 7 is a schematic diagram of fusion block composition;

FIG. 8 shows the results of an ablation experiment;

FIG. 9 shows comparative results of experiments of different models.

Detailed Description

First, the technical vocabulary appearing in the present invention is explained:

RetinaNet: a single-stage object detection model proposed by Kaiming He et al. The model mainly consists of a feature pyramid and Focal Loss. Its structure is simple, and it copes well with the imbalance between positive and negative samples.

ImageNet: a computer vision recognition project and currently the largest image recognition database in the world, containing more than 10 million images in 1000 categories.

DeepLesion database: a database developed by a team at the National Institutes of Health Clinical Center (NIHCC). Its images cover a variety of lesion types, such as kidney lesions, bone lesions, lung nodules and enlarged lymph nodes. It contains data from a total of 30000 patients, and each patient's data carries detailed annotation of the lesion area.

Anchor frame (anchor box): similar to a candidate frame, an anchor frame delineates the location of a target region.

Free-Anchor: see Zhang X, Wan F, Liu C, et al. FreeAnchor: Learning to match anchors for visual object detection. arXiv preprint arXiv:1909.02466, 2019.

The embodiment discloses a method for establishing a small intestinal lymphoma segmentation model based on improved RetinaNet, which comprises the following steps:

Specifically, the labels in step 1 are annotated in the label format defined by the COCO data set. The label file uses an xml-format data type, and the fields relevant to model training are as follows:

id is the unique identifier of the tumor in the training image. width and height indicate the size of the tumor, that is, the width and height of the tumor detection frame. The tumor detection frame is the circumscribed rectangle marked by the doctor in the figure above. file_name is the file name of the image slice.
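As an illustration of the fields just described, a single label record might be represented as follows. This is only a sketch: the values are made-up placeholders, a Python dict stands in for the on-disk format, and the helper `has_required_fields` is a hypothetical name that does not appear in the source.

```python
# Hypothetical label record with the four fields named in the text.
# Every value below is a made-up placeholder, not data from the patent.
label = {
    "id": 1,                        # unique identifier of the tumor in the training image
    "width": 64,                    # width of the tumor detection frame (pixels assumed)
    "height": 48,                   # height of the tumor detection frame (pixels assumed)
    "file_name": "slice_0001.jpg",  # file name of the image slice (placeholder)
}

REQUIRED_FIELDS = ("id", "width", "height", "file_name")


def has_required_fields(record):
    """Return True when every training-related field is present in the record."""
    return all(field in record for field in REQUIRED_FIELDS)


print(has_required_fields(label))
```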

Specifically, the backbone network is ResNeXt101, and the feature pyramid network is NAS-FPN.

Specifically, the enhanced feature map is obtained by the following method:

NAS-FPN fuses feature maps of different scales in two ways: sum and global pooling. In both cases, the feature maps extracted at different scales must first be adjusted to the same scale. sum adds the two adjusted feature maps element by element. global pooling applies a global max pooling operation followed by a sigmoid to one of the adjusted feature maps to obtain an attention score for each channel, multiplies the other feature map by the attention scores, and adds the product to the initial feature map to obtain the enhanced feature map.
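The two fusion modes can be sketched in plain Python on small nested lists. This is a minimal illustration, not the patented implementation: a feature map is assumed to be a list of channels, each channel a list of rows, both inputs are assumed to be already resized to the same shape, and the "initial feature map" that is added back is assumed to be the one that was pooled. The function names are hypothetical.

```python
import math


def sum_fusion(a, b):
    """Element-wise addition of two feature maps with identical shape."""
    return [
        [[x + y for x, y in zip(row_a, row_b)] for row_a, row_b in zip(ch_a, ch_b)]
        for ch_a, ch_b in zip(a, b)
    ]


def global_pooling_fusion(pooled, other):
    """Scale `other` by per-channel attention from `pooled`, then add `pooled` back."""
    fused = []
    for ch_p, ch_o in zip(pooled, other):
        peak = max(max(row) for row in ch_p)      # global max pooling over the channel
        score = 1.0 / (1.0 + math.exp(-peak))     # sigmoid gives the attention score
        fused.append([
            [p + score * o for p, o in zip(row_p, row_o)]
            for row_p, row_o in zip(ch_p, ch_o)
        ])
    return fused


# One-channel 2x2 toy inputs (placeholder values).
a = [[[0.0, 0.0], [0.0, 0.0]]]
b = [[[2.0, 4.0], [6.0, 8.0]]]
print(sum_fusion(a, b))
print(global_pooling_fusion(a, b))
```

In the toy input the pooled channel has a maximum of 0, so its attention score is sigmoid(0) = 0.5 and the second map is simply halved before being added.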

Specifically, the structure of NAS-FPN is shown in Table 1:

TABLE 1 compositional Structure of NAS-FPN

Output feature map	Input feature map 1	Input feature map 2	Connection mode
p4_1	c6	c4	global pooling
p4_2	p4_1	c4	sum
p3	p4_2	c3	sum
p4	p3	p4_2	sum
p5_temp	p4	p3	global pooling
p5	c5	p5_temp	sum
p7_temp	p5	p4_2	global pooling
p7	c7	p7_temp	sum
p6	p7	p7	global pooling

Here c3, c4, c5, c6 and c7 denote the feature maps of different scales extracted by the backbone network, and p3, p4, p5, p6 and p7 denote the feature maps output by the feature pyramid. Each output feature map has the same size as the backbone feature map of the corresponding level. To improve the model's ability to detect small targets, a higher-resolution p2 layer is also used for target detection; the p2 layer is obtained by directly adding the p3 layer and the c2 layer through downsampling.

Specifically, the detection network generates anchor frames (anchor boxes) on each of the P2-P7 layers. The size of the anchor frames depends on the size of the layer (a smaller feature map corresponds to larger anchor frames on the original image), and their number is W × H × 9, where W and H are the width and height of the feature map. The regressor fine-tunes the anchor frames to generate the coordinates of the detection frames (generally the coordinates of the upper-left and lower-right corners, which give the position of the detection frame). The classifier classifies the target in each anchor frame to determine its specific category. Each of the P2-P7 layers has its own classifier and regressor. The final result is obtained by Non-Maximum Suppression (NMS).
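To make the anchor count and the final suppression step concrete, the sketch below computes W × H × 9 for each level and runs a simple greedy NMS. It rests on assumptions not stated in the source: a 512 × 512 input, strides of 4, 8, 16, 32, 64 and 128 for P2 to P7 (the usual ResNet-style layout), boxes given as (x1, y1, x2, y2), and an IoU threshold of 0.5. All function names are hypothetical.

```python
def anchors_per_level(image_size=512, strides=(4, 8, 16, 32, 64, 128), per_cell=9):
    """Number of anchor frames on each level: W * H * per_cell."""
    counts = {}
    for level, stride in zip(range(2, 8), strides):
        side = image_size // stride          # feature map width and height (square input)
        counts[f"P{level}"] = side * side * per_cell
    return counts


def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_threshold=0.5):
    """Greedy NMS: keep the highest-scoring box, drop boxes that overlap a kept one."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep


counts = anchors_per_level()
kept = nms([(0, 0, 10, 10), (1, 1, 11, 11), (50, 50, 60, 60)], [0.9, 0.8, 0.7])
print(counts, sum(counts.values()), kept)
```

Under these assumptions P2 alone contributes 128 × 128 × 9 anchor frames, which illustrates why adding the high-resolution level increases the cost of detection as well as the sensitivity to small targets.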

Specifically, the pre-training comprises: randomly initializing the model parameters, feeding the annotated DeepLesion data into the model used by the method for training, stopping once the training loss converges, and saving the trained model parameters as a .pth file.

Specifically, during training in step 3, the labeled small intestinal lymphoma data is fed into the model. In the parameter initialization process, the corresponding parameters are imported in order from the file saved by the pre-training task, and the imported pre-training parameters are frozen (the parameters of a frozen network layer are not updated during back propagation). Training continues until the training loss converges, giving the final trained network model.

Specifically, when step 3 is trained, the loss function is Free-Anchor.

In this embodiment, a method for segmenting small intestinal lymphoma based on improved RetinaNet is disclosed, which includes the following steps:

step two: and inputting the abdominal slice image to be detected into the small intestinal lymphoma segmentation model to obtain a segmentation result of the abdominal slice image to be detected.

In this embodiment, a small intestine lymphoma segmentation apparatus based on improved RetinaNet is disclosed, the apparatus includes a processor and a memory for storing a plurality of functional modules capable of running on the processor, the functional modules include a small intestine lymphoma segmentation model and a segmentation module;

the small intestinal lymphoma segmentation model is obtained by adopting the improved RetinaNet-based small intestinal lymphoma segmentation model establishing method;

Specifically, the segmentation result is a focus area of the small intestine lymphoma.

The present embodiment also discloses a storage medium having stored thereon a computer program which, when executed by a processor, implements an improved RetinaNet-based method for small bowel lymphoma segmentation.

Example 1

This example uses PyTorch as the experimental framework and RetinaNet as the reference network model. The backbone network is ResNeXt101 and the input image size is 512 × 512. The model uses the Adam (adaptive moment estimation) optimizer with an initial learning rate of 0.00001. PyTorch's ReduceLROnPlateau utility class adjusts the learning rate during training, with patience set to 4, i.e., the learning rate is reduced when the loss has not dropped for four consecutive iterations. At each update, the new learning rate becomes 80% of the previous one. In the loss function, α is set to 0.25 and γ is set to 2. The number of iterations is 18. The anchor scales are set to [0.25, 0.5, 1.2].
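The learning-rate rule described above can be sketched without any framework. The class below is a simplified stand-in written for illustration, not PyTorch's ReduceLROnPlateau: it follows the rule as stated in the text (four consecutive iterations without a lower loss, then multiply the rate by 0.8), whereas the library scheduler differs in details such as its improvement threshold and exactly when its counter triggers. The class name and the loss values are hypothetical.

```python
class PlateauSchedule:
    """Reduce the learning rate by `factor` after `patience` iterations without improvement."""

    def __init__(self, lr=1e-5, factor=0.8, patience=4):
        self.lr = lr
        self.factor = factor
        self.patience = patience
        self.best = float("inf")   # lowest loss seen so far
        self.bad = 0               # consecutive iterations without a new best loss

    def step(self, loss):
        if loss < self.best:
            self.best = loss
            self.bad = 0
        else:
            self.bad += 1
            if self.bad >= self.patience:
                self.lr *= self.factor
                self.bad = 0
        return self.lr


schedule = PlateauSchedule()
history = [schedule.step(loss) for loss in [1.0, 0.9, 0.9, 0.9, 0.9, 0.9]]
print(history)
```

With the placeholder losses above, the rate stays at its initial value for the first five iterations and drops to 80% of it on the sixth, the fourth iteration in a row without improvement.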

The abdominal slice image data set of this example contains annotated data from 34 patients, of which 26 patients were used for training and 8 for testing. There were also 58 patients with a confirmed diagnosis whose data were not annotated. In the annotated data, each patient's DICOM files form a series of images stacked along the z-axis, obtained by scanning the patient's abdomen in cross section, slice by slice, with the corresponding device. Each patient has about 40 slices on average and up to 80. Some of a patient's slices contain lymphoma, and these slices are labeled by the doctor. Every patient slice is in DICOM format, and the label files generated when the doctor annotates the tumor data are also in DICOM format. Besides the medical image information, each DICOM file contains the corresponding DICOM header information, which includes identification-related information. The patient's lymphoma slice information and the doctor's annotation information both carry unique identifying information that establishes a one-to-one mapping between the patient's lymphoma data and the doctor's annotation data; through this mapping, the lymphoma data labeled by the doctor can be separated from the patient's original slice data. This example uses the pydicom library to separate the lymphoma data, finally obtaining 954 lymphoma slices labeled by doctors. To facilitate subsequent transfer learning training, all DICOM files are converted to JPG format. The converted image size is 512 × 512.
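The one-to-one mapping between slices and annotations can be illustrated with plain dictionaries. This sketch does not read DICOM files or call pydicom; it assumes the header fields have already been extracted into dicts, and the key name `uid`, the file names and the function name are hypothetical stand-ins for whatever identifying field the headers actually carry.

```python
def select_labeled_slices(slices, annotations, key="uid"):
    """Pair each annotation with the slice that shares its unique identifier."""
    slice_by_id = {s[key]: s for s in slices}      # identifier -> slice record
    pairs = []
    for ann in annotations:
        matched = slice_by_id.get(ann[key])
        if matched is not None:                    # skip annotations with no matching slice
            pairs.append((matched, ann))
    return pairs


# Placeholder records standing in for parsed DICOM headers.
slices = [
    {"uid": "a1", "file": "s001.dcm"},
    {"uid": "a2", "file": "s002.dcm"},
    {"uid": "a3", "file": "s003.dcm"},
]
annotations = [{"uid": "a2", "file": "label_002.dcm"}]

pairs = select_labeled_slices(slices, annotations)
print([(s["file"], a["file"]) for s, a in pairs])
```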

This embodiment uses the model trained on the labeled data to detect lesions in the unlabeled patient data, and selects high-confidence results as additional training data. To ensure the accuracy of the results, three models with different network structures are used for detection; the final labels are obtained by comparing the models' results and screening them manually, mainly on the basis of the correlation between adjacent slices of the same patient. The model undergoes two iterations, each of which includes the data from the previous training round. In the end 603 labeled items are obtained, all of which are used for training. In total, 1353 items are used for training and 154 for testing.

The present embodiment employs Average Precision (AP) and Recall as the main evaluation metrics of the target detection model. Table 2 details the basic quantities required to calculate these metrics.

TABLE 2 basic indexes required for index detection

Recall represents the Recall of the tumor by the model, i.e., how many tumors are detected by the model out of the tumors that actually exist. Precision is the accuracy of the model to tumor detection, and indicates how many tumors are correct among the tumors detected by the model. The magnitude of the AP value is represented by the area under the PR curve. The PR curve is a curve obtained with a recall value as a horizontal axis and a precision value as a vertical axis. The larger the AP value is, the better the detection effect of the model is. For ease of calculation, the AP value is typically calculated as an average of 11 equally spaced points on the PR curve.
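The metrics described above can be written out in a few lines. The sketch assumes the usual definitions Precision = TP / (TP + FP) and Recall = TP / (TP + FN), and for the 11-point average it assumes the Pascal VOC 2007 convention of taking, at each recall level 0, 0.1, ..., 1.0, the highest precision reached at that recall or beyond. The function names and sample numbers are hypothetical.

```python
def precision_recall(tp, fp, fn):
    """Precision and recall from true positive, false positive and false negative counts."""
    precision = tp / (tp + fp) if (tp + fp) > 0 else 0.0
    recall = tp / (tp + fn) if (tp + fn) > 0 else 0.0
    return precision, recall


def eleven_point_ap(recalls, precisions):
    """Average the interpolated precision at recall levels 0.0, 0.1, ..., 1.0."""
    total = 0.0
    for i in range(11):
        level = i / 10
        # Interpolated precision: best precision among points with recall >= level.
        reachable = [p for r, p in zip(recalls, precisions) if r >= level]
        total += max(reachable) if reachable else 0.0
    return total / 11


pr = precision_recall(tp=8, fp=2, fn=2)
ap = eleven_point_ap(recalls=[0.5, 1.0], precisions=[1.0, 0.5])
print(pr, ap)
```

In the toy PR curve, six of the eleven recall levels (0 to 0.5) see a precision of 1.0 and the remaining five see 0.5, so the average is 8.5 / 11.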

The Pascal VOC data set generally uses AP0.5 to reflect the quality of the test results, and 0.5 represents the IOU value between the test results and the true annotation, i.e., IOU values greater than 0.5 are considered as positive examples. But starting from the COCO data set, the target detection gradually adopts AP {0.5:0.95} to evaluate the detection result. AP {0.5:0.95} is calculated by calculating the value of the AP once every 0.05 for IOU values between 0.5 and 0.95, and then averaging all the values obtained. The method can represent the quality of the detection result of the model and can reflect the positioning capability of the model for the target.
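The averaging over IoU thresholds can be sketched as follows. The code only illustrates the threshold sweep: it takes a callable that returns the AP at a given IoU threshold (how that AP is computed is left out), and the toy example uses a single detection whose IoU with the ground truth is an invented 0.72. The function names are hypothetical.

```python
def coco_iou_thresholds():
    """IoU thresholds 0.50, 0.55, ..., 0.95, built from integers to avoid float drift."""
    return [t / 100 for t in range(50, 100, 5)]


def average_over_thresholds(ap_at_threshold):
    """Mean of the AP values obtained at each of the ten IoU thresholds."""
    thresholds = coco_iou_thresholds()
    return sum(ap_at_threshold(t) for t in thresholds) / len(thresholds)


# Toy case: one detection with IoU 0.72 counts as positive only while IoU > threshold,
# so its "AP" is 1.0 at thresholds 0.50 to 0.70 and 0.0 from 0.75 upward.
detection_iou = 0.72
score = average_over_thresholds(lambda t: 1.0 if detection_iou > t else 0.0)
print(coco_iou_thresholds(), score)
```

A detection that is only loosely aligned with the annotation therefore scores well at AP 0.5 but poorly on the averaged metric, which is why the averaged value reflects localization quality.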

The experiments are divided into two parts. The first part runs ablation experiments on each module of the model to verify the interaction among the modules and the influence of each module on the experimental results. The second part compares the model with target detection models proposed in recent years to verify the superiority of the model of the invention in detecting small intestinal lymphoma.

S denotes introducing the channel attention module, and Nms denotes using the NAS-FPN network structure with the added p2 layer. FA denotes training with the optimal anchor matching mechanism (Free-Anchor). D denotes using the DeepLesion pre-trained model and freezing the first-stage parameters of the network. The comparative experimental setup is shown in Table 3, where "√" indicates that the corresponding module is selected. For example, Nms+D+S denotes introducing the channel attention module, using the NAS-FPN network structure with the added p2 layer, and using the DeepLesion data set for transfer learning. Baseline in the table denotes the original RetinaNet network model, pre-trained on the ImageNet data set.

TABLE 3 model design

Method	S	Nms	FA	D
baseline	-	-	-	-
D	-	-	-	√
S+D	√	-	-	√
Nms+D	-	√	-	√
FA+D	-	-	√	√
Nms+D+FA	-	√	√	√
Nms+D+S	√	√	-	√
Nms+D+FA+S	√	√	√	√

The following table compares the different control groups. The experimental results are evaluated using AP_0.5, AP_{0.5:0.95}, Recall_0.5 and Recall_{0.5:0.95}, where the decision threshold for recall is 0.5. The results of the experiment are shown in Table 4.

Table 4 ablation experimental results

As can be seen from Table 4, the initial model (baseline) does not perform well, perhaps because it is pre-trained on natural images, which differ significantly from the medical image data used here. After pre-training on the DeepLesion data set, the model's AP_0.5 improves by 7.2%, which shows that transfer learning between medical image data sets can effectively improve the model's detection capability. Adding the NAS-FPN network structure with a p2 layer (Nms+D), the channel attention mechanism (S+D) and the optimal anchor matching mechanism (FA+D) to the transfer-learned model improves AP_0.5 by 22.1%, 5.3% and 3.3%, respectively. All three modules therefore improve on the initial model to some degree. The gain from Nms+D is the most pronounced, showing that strengthening the model's multi-scale feature extraction is important for improving its detection performance. Compared with Nms+D+S, Nms+D+FA clearly improves AP_{0.5:0.95} and Recall_{0.5:0.95}, which shows that the optimal anchor matching mechanism lets the model learn better tumor features and effectively improves its ability to localize the detected target. Compared with the baseline, the model of the invention (Nms+D+FA+S) improves AP_0.5, AP_{0.5:0.95}, Recall_0.5 and Recall_{0.5:0.95} by 32.4%, 65.3%, 21.8% and 28.2%, respectively, a clear improvement over the original model.

FIG. 8 shows the results of the ablation experiments, wherein a) is the ground truth, b) is the baseline method, c) is the D method, d) is the S+D method, e) is the Nms+D method, f) is the FA+D method, g) is the Nms+D+FA method, h) is the Nms+D+S method, and i) is the Nms+D+FA+S method.

The combination of the experimental results shows that the detection effect of the model is obviously improved after the three modules are combined. The specific improvement is that after the improved characteristic pyramid structure is combined, the model has better identification capability for some smaller targets. Meanwhile, the introduction of the attention mechanism enables the model to avoid the interference of some background information, and the false detection rate of the model is obviously reduced. After the optimal anchor matching mechanism is combined, the model has obvious improvement on the detection of targets with complex shapes and the positioning capability of some tumors.

To further verify the superiority of the proposed model, it was compared with models from the literature (Yang Z, Liu S, Hu H, et al.). All models used DeepLesion data for transfer learning; YOLOv3 used Darknet53 as its backbone structure, and the other models used ResNeXt101. Table 5 shows the results of the comparison experiments between the different models.

TABLE 5 results of different model comparison experiments

FIG. 9 shows the comparative results of the different models: a) GT, b) FCOS, c) RepPoints, d) PANet, e) LibraNe, f) YOLOv3, g) the present method.

By comparison with some detection models in recent years, it can be seen that the model proposed herein has superior detection capabilities on small bowel lymphoma datasets.

Claims

1. A method for establishing a small intestine lymphoma segmentation model based on improved RetinaNet is characterized by comprising the following steps:

2. The method for establishing the segmentation model of the small intestine lymphoma based on the improved RetinaNet as claimed in claim 1, wherein the main network is ResNeXt101, and the characteristic pyramid network is NAS-FPN.

3. The method for modeling segmentation of intestinal lymphoma according to claim 1 based on improved RetinaNet, wherein said enhanced feature map is obtained by:

4. A method for segmenting small intestine lymphoma based on improved RetinaNet is characterized by comprising the following steps:

step two: inputting the abdominal slice image to be detected into the small intestinal lymphoma segmentation model obtained by any one of claims 1 to 3 based on the improved RetinaNet small intestinal lymphoma segmentation model establishing method, and obtaining the segmentation result of the abdominal slice image to be detected.

5. An improved RetinaNet based small intestine lymphoma segmentation apparatus, the apparatus comprising a processor and a memory for storing a plurality of functional modules capable of running on the processor, the functional modules including a small intestine lymphoma segmentation model and a segmentation module;

the small intestinal lymphoma segmentation model is obtained by adopting the small intestinal lymphoma segmentation model building method based on improved RetinaNet as claimed in any one of claims 1 to 3;

6. A storage medium having stored thereon a computer program, characterized in that the program, when being executed by a processor, is adapted to carry out the improved RetinaNet-based segmentation method of small intestinal lymphoma according to claim 4.