CN113052799A - Osteosarcoma and osteochondroma prediction method based on Mask RCNN network - Google Patents


Info

Publication number
CN113052799A
CN113052799A
Authority
CN
China
Prior art keywords
osteochondroma
mask
osteosarcoma
module
prediction
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110253569.9A
Other languages
Chinese (zh)
Inventor
夏国庆
潘君
王敏
吴桓
冉天飞
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chongqing University
Original Assignee
Chongqing University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chongqing University filed Critical Chongqing University
Priority to CN202110253569.9A priority Critical patent/CN113052799A/en
Publication of CN113052799A publication Critical patent/CN113052799A/en
Pending legal-status Critical Current

Classifications

    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06T — IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 — Image analysis
    • G06T 7/0002 — Inspection of images, e.g. flaw detection
    • G06T 7/0012 — Biomedical image inspection
    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06F — ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 — Pattern recognition
    • G06F 18/20 — Analysing
    • G06F 18/24 — Classification techniques
    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06F — ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 — Pattern recognition
    • G06F 18/20 — Analysing
    • G06F 18/25 — Fusion techniques
    • G06F 18/253 — Fusion techniques of extracted features
    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06N — COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 — Computing arrangements based on biological models
    • G06N 3/02 — Neural networks
    • G06N 3/04 — Architecture, e.g. interconnection topology
    • G06N 3/044 — Recurrent networks, e.g. Hopfield networks
    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06N — COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 — Computing arrangements based on biological models
    • G06N 3/02 — Neural networks
    • G06N 3/08 — Learning methods
    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06T — IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 — Image analysis
    • G06T 7/10 — Segmentation; Edge detection
    • G06T 7/11 — Region-based segmentation
    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06T — IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 — Image analysis
    • G06T 7/60 — Analysis of geometric attributes
    • G06T 7/62 — Analysis of geometric attributes of area, perimeter, diameter or volume
    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06T — IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 — Indexing scheme for image analysis or image enhancement
    • G06T 2207/10 — Image acquisition modality
    • G06T 2207/10116 — X-ray image
    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06T — IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 — Indexing scheme for image analysis or image enhancement
    • G06T 2207/30 — Subject of image; Context of image processing
    • G06T 2207/30004 — Biomedical image processing
    • G06T 2207/30096 — Tumor; Lesion

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Computing Systems (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Molecular Biology (AREA)
  • Evolutionary Biology (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Geometry (AREA)
  • Medical Informatics (AREA)
  • Nuclear Medicine, Radiotherapy & Molecular Imaging (AREA)
  • Radiology & Medical Imaging (AREA)
  • Quality & Reliability (AREA)
  • Image Analysis (AREA)

Abstract

The invention relates to an osteosarcoma and osteochondroma prediction method based on the Mask RCNN network, which comprises the following steps: (1) collecting raw osteosarcoma and osteochondroma data based on X-ray plain films of a patient group; (2) converting the raw DICOM-format data into JPG format and manually annotating the converted images with the labeling software labelme to form a data set, the labels being divided into osteosarcoma and osteochondroma; (3) training an instance segmentation model based on Mask RCNN using the COCO public data set and the labeled data set; (4) outputting the segmentation masks generated by the prediction model to a de-overlap module, which effectively eliminates overlapping regions in the prediction results; (5) outputting the result of the de-overlap module to a heterogeneous removal module, which alleviates the phenomenon of two tumor types appearing simultaneously in the prediction results; (6) outputting the segmentation masks screened by the two post-processing modules to a projection area calculation module, which computes the actual projection area of the identified region. The invention identifies osteosarcoma and osteochondroma on X-ray plain films through artificial intelligence and calculates the actual projection area, with low cost, easy adoption, and a high degree of automation and accuracy.

Description

Osteosarcoma and osteochondroma prediction method based on Mask RCNN network
Technical Field
The invention belongs to the fields of computer vision, deep learning, and medical image processing, relates to image recognition and instance segmentation techniques, and in particular relates to an osteosarcoma and osteochondroma prediction method based on the Mask RCNN network.
Background
With rapid economic development, continuously rising living standards, and steady scientific and technological progress, medical health has become a major concern of society as a whole.
In the modern medical system, radiologists are few relative to the large number of patients awaiting diagnosis, so their workload is heavy and diagnosis cycles are long; moreover, a radiologist must read the medical images of an entire department, which demands broad and hard-to-master knowledge. Under such intense and difficult image-reading work, human error is inevitable and may delay or even misdirect a patient's treatment. Manual medical diagnosis is therefore time-consuming, labor-intensive, and error-prone. Against this background, an intelligent diagnosis method with acceptable accuracy, low cost, high efficiency, and easy popularization can replace manual work to a certain extent and save considerable manpower, material, and financial resources.
In recent years, with continuous progress in computer vision and deep learning, deep learning techniques based on convolutional neural networks have developed rapidly. Among them, the object detection model Faster RCNN, proposed by Kaiming He's team in 2015, achieved very good object detection performance, and Mask R-CNN goes one step further on that basis, obtaining pixel-level detection results: for each target object, it gives not only the bounding box but also marks whether each pixel within the bounding box belongs to the object.
In Mask RCNN, the addition of a mask branch allows the network to handle not only object detection but also pixel-level segmentation. Mask RCNN replaces the RoI Pooling layer with a RoIAlign layer, adds a parallel FCN branch (the mask branch), and upgrades the feature extraction network to ResNet-50 plus FPN to strengthen feature extraction.
Disclosure of Invention
Aiming at the deficiencies or improvement needs of the prior art, the osteosarcoma and osteochondroma prediction method based on Mask RCNN provided by the invention addresses the problem that diagnosis by radiologists is time-consuming and labor-intensive. The method collects X-ray plain-film medical images of osteochondroma and osteosarcoma from a patient group to form a data set, identifies the lesion features with a Mask RCNN model, improves the accuracy of the recognition results with a de-overlap module and a heterogeneous removal module, and finally obtains the actual tiled (projected) area of the identified region with a projection area calculation module.
In order to achieve the purpose of the invention, the following technical scheme is adopted: an osteosarcoma and osteochondroma prediction method based on the Mask RCNN network, comprising the following steps:
Step (1): construct a deep neural network model for detecting osteosarcoma and osteochondroma, comprising an instance segmentation module, a de-overlap module, a heterogeneous removal module, and a projection area calculation module.
The instance segmentation module is based on Mask RCNN and identifies the semantic masks of osteosarcoma and osteochondroma, giving the identified tumor type (osteosarcoma or osteochondroma), the identified region, and the confidence;
The de-overlap module judges, from the semantic-mask predictions, whether the number of pixels in an overlapping region exceeds a set threshold; if so, the semantic mask with the larger recognition area is retained. The de-overlap module effectively eliminates multiple prediction regions for the same target.
The heterogeneous removal module removes low-confidence semantic masks according to the prediction confidence, effectively alleviating the phenomenon of two tumor types appearing simultaneously in the prediction results.
The projection area calculation module calculates the tiled (projected) area of the identified region from the predicted semantic masks.
Step (2): training a target instance segmentation module based on Mask RCNN, comprising:
step (2.1), acquiring plain medical images of osteochondroma and osteosarcoma of a patient group to form an original data set;
and (2.2), converting the DICOM-format raw data of the medical images into JPG format, and annotating the osteosarcoma and osteochondroma labels on the images with the labeling software labelme to obtain json files comprising the original images and their corresponding labels; randomly dividing the data set into a training set, a validation set, and a test set (a conversion sketch is given below);
and (2.3), initializing Mask RCNN with the weights of the COCO public data set, following the transfer learning idea in deep learning; training with the training set, validating with the validation set at every epoch during training; if the validation curve converges, proceeding to step (3); otherwise, returning to step (2.1), expanding the original database, re-labeling, and repeating the training and validation process (an initialization sketch is given below);
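For concreteness, a minimal sketch of the format conversion in step (2.2) follows, assuming pydicom and Pillow are available; `dicom_to_jpg` is a hypothetical helper name, and the min-max scaling merely stands in for proper DICOM window/level handling.

```python
import numpy as np
import pydicom
from PIL import Image

def dicom_to_jpg(dicom_path: str, jpg_path: str) -> None:
    ds = pydicom.dcmread(dicom_path)            # read the DICOM file
    pixels = ds.pixel_array.astype(np.float32)  # raw detector values
    pixels -= pixels.min()                      # min-max normalize to [0, 1];
    if pixels.max() > 0:                        # a clinical pipeline would apply
        pixels /= pixels.max()                  # the DICOM window center/width
    Image.fromarray((pixels * 255).astype(np.uint8)).save(jpg_path, "JPEG")
```

And a sketch of the transfer-learning initialization in step (2.3), using torchvision's COCO-pretrained Mask R-CNN (ResNet-50 + FPN) with its heads replaced for the two tumor classes; torchvision is an assumed framework choice (the patent names none), and `pretrained=True` follows the pre-0.13 torchvision API (newer versions use `weights=`).

```python
import torchvision
from torchvision.models.detection.faster_rcnn import FastRCNNPredictor
from torchvision.models.detection.mask_rcnn import MaskRCNNPredictor

NUM_CLASSES = 3  # background + osteosarcoma + osteochondroma

# COCO weights initialize the whole network, per the transfer-learning idea.
model = torchvision.models.detection.maskrcnn_resnet50_fpn(pretrained=True)

# Replace the box head so it classifies the 2 tumor types (+ background).
in_features = model.roi_heads.box_predictor.cls_score.in_features
model.roi_heads.box_predictor = FastRCNNPredictor(in_features, NUM_CLASSES)

# Replace the mask head accordingly (256 hidden channels, torchvision's default).
in_channels = model.roi_heads.mask_predictor.conv5_mask.in_channels
model.roi_heads.mask_predictor = MaskRCNNPredictor(in_channels, 256, NUM_CLASSES)
```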
and (3): the test set is used to test the detection effect of the Mask RCNN model.
Further, in order to give the model generality across multi-angle X-ray films and reduce the loss value of the model, the data set of the model contains picture data from different view angles.
Further, when the labels of the raw data are prepared, they are divided into 2 types, i.e., osteosarcoma and osteochondroma.
Further, in step (2.3), the Mask RCNN model adopts the ResNet-50 network plus a feature pyramid network as the feature extractor for extracting low-level and high-level image features from the original picture. This process allows the features of each level to be combined with both higher- and lower-level features; these features are input into a region proposal network, which generates proposal regions; the proposal regions are then aligned and pooled, and the semantic masks are identified.
Further, anchor points are used in the region proposal network, and inputs of different sizes can be adjusted into outputs of the same size, so that feature maps of arbitrary size can be converted into fixed-size feature vectors.
Further, the loss function L of Mask RCNN in step (2.3) is as follows:
L = L_cls + L_box + L_mask
where L_cls, L_box, and L_mask denote the classification, bounding-box regression, and mask prediction losses, respectively, and L is the total loss value.
In order to achieve the above object, according to another aspect of the present invention, there is provided an osteosarcoma and osteochondroma detection system based on the Mask RCNN network, comprising an X-ray plain-film device, a processor, and a deep learning neural network model program module for detecting osteosarcoma and osteochondroma obtained after training and verification according to the intelligent image detection method of any one of claims 1 to 6; the processor calls the deep learning neural network model program module to analyze the images captured by the X-ray plain-film device, thereby identifying whether the patient has osteosarcoma or osteochondroma.
In general, the above technical solution conceived by the present invention has the following advantages:
(1) Data are easy to obtain, at low cost: the data set used for training and testing the model comes from patients' X-ray plain films, which are the most common examination compared with CT and MRI films.
(2) The results are objective: tumor detection is completed by a convolutional-neural-network-based algorithm that does not depend on expert experience or subjective human judgment.
(3) Automatic and low-cost: since patients' X-ray plain films can be used directly and the projection area is computed from the recognition results, detection is fully automatic, saving time and economic cost.
Drawings
FIG. 1 is a schematic flow diagram of a preferred embodiment of the present invention;
FIG. 2 is a basic functional block diagram of a preferred embodiment of the present invention;
FIG. 3 is a schematic image-processing flow diagram of Mask RCNN according to a preferred embodiment of the present invention;
FIG. 4 is an anteroposterior X-ray plain film of an osteochondroma according to a preferred embodiment of the present invention;
FIG. 5 is a lateral X-ray plain film of an osteochondroma according to a preferred embodiment of the present invention;
FIG. 6 is an X-ray plain film of an osteosarcoma according to a preferred embodiment of the present invention.
Detailed Description
The present invention is further illustrated by the following specific examples, which are intended to be illustrative rather than limiting and do not limit the scope of the invention.
As shown in figs. 1 and 2, the present invention provides a method for predicting osteosarcoma and osteochondroma based on the Mask RCNN network, used for automatically and intelligently detecting whether a patient has osteosarcoma or osteochondroma;
Step (1): construct a deep neural network model for detecting osteosarcoma and osteochondroma, comprising an instance segmentation module, a de-overlap module, a heterogeneous removal module, and a projection area calculation module.
The instance segmentation module is based on Mask RCNN and identifies the semantic masks of osteosarcoma and osteochondroma;
The de-overlap module judges, from the semantic-mask predictions, whether the number of pixels in an overlapping region exceeds a set threshold; if so, the semantic mask with the larger recognition area is retained.
The heterogeneous removal module removes low-confidence semantic masks according to the prediction confidence.
The projection area calculation module calculates the tiled (projected) area of the identified region from the predicted semantic masks.
Step (2): training a target instance segmentation module based on Mask RCNN, comprising:
step (2.1), acquiring plain medical images of osteochondroma and osteosarcoma of a patient group to form an original data set;
and (2.2), converting the DICOM-format raw data of the medical images into JPG format, and annotating the osteosarcoma and osteochondroma labels on the images with the labeling software labelme to obtain json files comprising the original images and their corresponding labels; randomly dividing the data set into a training set, a validation set, and a test set;
and (2.3), initializing Mask RCNN with the weights of the COCO public data set, following the transfer learning idea in deep learning; training with the training set, validating with the validation set at every epoch during training; if the validation curve converges, proceeding to step (3); otherwise, returning to step (2.1), expanding the original database, re-labeling, and repeating the training and validation process;
and (3): the test set is used to test the detection effect of the Mask RCNN model.
The principles and working processes of the instance segmentation module, the de-overlap module, the heterogeneous removal module, and the projection area calculation module are described in detail below, with examples.
1. Instance segmentation module (based on Mask RCNN)
Compared with other models (Fast RCNN, Faster RCNN, and the like), the Mask RCNN model adopted by the invention performs better when applied to object detection and instance segmentation. Like other object detection models, Mask RCNN first generates candidate regions that may contain the detected targets, then uses a convolutional neural network to classify the targets in the candidate regions, and further fine-tunes the position and size of the boxes enclosing the targets. Mask RCNN builds on Faster RCNN by adding a segmentation mask branch, so that it can handle instance segmentation tasks in addition to detection.
As shown in fig. 3, the image is sent to a Mask RCNN network for network training:
The image is input into a pre-trained ResNet50 neural network to obtain the corresponding feature map; the feature map is fed into the region proposal network to generate proposal regions (Region Proposals) containing the targets to be detected, the proposal regions are classified, and the positions and sizes of the boxes are further fine-tuned to enclose the targets.
After the proposal regions are processed, they are cropped from the feature map according to their positions and sizes. The RoIAlign layer then pools these local feature maps to a uniform size, after which they are processed in two ways: (1) they are input into fully connected layers that output the classification and box prediction results; (2) the segmentation masks are output through several convolutional layers.
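A minimal sketch of this two-branch prediction pass using torchvision's Mask R-CNN API (an assumed implementation choice; the random tensor merely stands in for a converted X-ray image, and the output field names are torchvision's):

```python
import torch
import torchvision

model = torchvision.models.detection.maskrcnn_resnet50_fpn(pretrained=True)
model.eval()
image = torch.rand(3, 512, 512)  # placeholder for a converted X-ray JPG tensor
with torch.no_grad():
    pred = model([image])[0]     # one prediction dict per input image
boxes = pred["boxes"]    # (N, 4) refined boxes from the fully connected branch
labels = pred["labels"]  # (N,) predicted class ids
scores = pred["scores"]  # (N,) confidences, used later by the post-processing modules
masks = pred["masks"]    # (N, 1, H, W) soft masks from the convolutional branch
```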
1.1. Network architecture
1.1.1 feature extractor
The feature extractor consists of the residual network ResNet-50 and a Feature Pyramid Network (FPN); its functions are to extract preliminary feature maps and to combine high-level and low-level features with each other.
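A sketch of such a ResNet-50 + FPN extractor via torchvision's helper (an assumed implementation route, not necessarily the patent's own code; `pretrained=True` follows the pre-0.13 torchvision API):

```python
import torch
from torchvision.models.detection.backbone_utils import resnet_fpn_backbone

backbone = resnet_fpn_backbone("resnet50", pretrained=True)
features = backbone(torch.rand(1, 3, 512, 512))
# `features` is an OrderedDict of multi-scale maps; the FPN's top-down
# pathway is what merges high-level and low-level features.
for name, fmap in features.items():
    print(name, tuple(fmap.shape))  # levels '0'..'3' and 'pool', all 256-channel
```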
1.1.2 Region proposal network
The RPN is a lightweight neural network that scans the image with a sliding window and looks for regions in which objects are present. The input to the RPN is the shared feature map obtained from the feature extractor. This feature map first undergoes a 3 × 3 convolution, yielding a 256-channel feature map with the same spatial size as the shared feature map; assuming that size is 256 × H × W, there are H × W feature vectors, each 256-dimensional. Each feature vector then undergoes two independent fully connected (FC) operations, whose outputs are: (1) 2 × k scores, i.e., a foreground score and a background score per anchor; (2) 4 × k coordinates, i.e., the box offsets relative to the anchor coordinates (k being the number of anchors per position).
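A minimal sketch of the head just described; the two "fully connected" operations are written as 1 × 1 convolutions, which act as per-position FC layers (k = 9 anchors per position is an assumed value, and torchvision ships its own RPNHead):

```python
import torch
import torch.nn as nn

class RPNHeadSketch(nn.Module):
    def __init__(self, in_channels: int = 256, k: int = 9):
        super().__init__()
        self.conv = nn.Conv2d(in_channels, 256, kernel_size=3, padding=1)  # shared 3x3 conv
        self.cls_logits = nn.Conv2d(256, 2 * k, kernel_size=1)   # 2k scores: fg/bg per anchor
        self.bbox_deltas = nn.Conv2d(256, 4 * k, kernel_size=1)  # 4k box offsets per anchor

    def forward(self, feature_map: torch.Tensor):
        t = torch.relu(self.conv(feature_map))
        return self.cls_logits(t), self.bbox_deltas(t)

scores, deltas = RPNHeadSketch()(torch.rand(1, 256, 32, 32))
print(scores.shape, deltas.shape)  # (1, 18, 32, 32) (1, 36, 32, 32)
```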
1.1.3 candidate region alignment
Compared with Faster RCNN, RoIAlign is a region feature aggregation method proposed in Mask RCNN; it solves the region mismatch (mis-alignment) problem caused by the two quantization steps in Faster RCNN's RoI Pooling operation.
RoIAlign removes the limitation of RoI Pooling at its source: the quantization operation is cancelled, and for coordinates that are floating-point numbers, bilinear interpolation is used to obtain the image values at the corresponding pixel points, so that the whole feature aggregation process becomes a continuous operation.
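torchvision exposes this operation directly; a sketch showing a variable-sized, floating-point region pooled to a fixed 7 × 7 grid by bilinear sampling, with no quantization:

```python
import torch
from torchvision.ops import roi_align

features = torch.rand(1, 256, 50, 50)                 # shared feature map
rois = torch.tensor([[0.0, 10.3, 15.7, 40.2, 44.9]])  # (batch_idx, x1, y1, x2, y2), kept as floats
pooled = roi_align(features, rois, output_size=(7, 7),
                   spatial_scale=1.0, sampling_ratio=2)
print(pooled.shape)  # torch.Size([1, 256, 7, 7]) regardless of the RoI's size
```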
1.1.4 Segmentation mask network
Through the RoIAlign layer, local feature maps of arbitrary size become fixed-size feature maps that serve as input to the subsequent classification prediction layer (softmax), bounding-box regression prediction layer (bbox reg), and segmentation mask prediction layer (Segmentation Mask). A "head" portion follows the RoIAlign layer; its main purpose is to expand the output dimension of RoIAlign, which makes mask prediction more accurate.
1.2. Loss function
L = L_cls + L_box + L_mask
where L_cls, L_box, and L_mask denote the classification, bounding-box regression, and mask prediction losses, respectively, and L is the total loss value.
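For reference, a sketch of how these terms arise in torchvision's implementation (an assumed framework choice): in training mode the model returns the component losses, which also include the RPN's own objectness and box terms; the class id and placeholder target below are hypothetical.

```python
import torch
import torchvision

model = torchvision.models.detection.maskrcnn_resnet50_fpn(pretrained=True)
model.train()
images = [torch.rand(3, 512, 512)]
targets = [{
    "boxes": torch.tensor([[100.0, 100.0, 200.0, 200.0]]),
    "labels": torch.tensor([1]),                           # hypothetical tumor class id
    "masks": torch.zeros(1, 512, 512, dtype=torch.uint8),  # placeholder ground-truth mask
}]
loss_dict = model(images, targets)
# 'loss_classifier' ~ L_cls, 'loss_box_reg' ~ L_box, 'loss_mask' ~ L_mask,
# plus 'loss_objectness' and 'loss_rpn_box_reg' from the RPN.
total_loss = sum(loss_dict.values())
total_loss.backward()
```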
2. De-overlap module
An image passes through the model to yield a series of segmentation masks. When prediction results overlap, one of the overlapping masks is denoted α, the other β, and Δ denotes the pixel overlap region of the two masks; whether the mask with the smaller recognition area is retained is decided according to formula (1).
[Formula (1) appears only as an image in the original record: the decision rule comparing the overlap Δ with the threshold μ.]
where the hyperparameter μ is a threshold preset by experiment; in this embodiment, μ is taken to be 0.5.
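Since formula (1) survives only as an image here, the sketch below assumes the overlap is measured as Δ over the smaller mask's area, which is consistent with the surrounding prose but not confirmed by the source; `de_overlap` is a hypothetical helper.

```python
import numpy as np

def de_overlap(masks: list, mu: float = 0.5) -> list:
    """masks: boolean (H, W) numpy arrays; drops the smaller of two masks
    whose overlap ratio exceeds the threshold mu (assumed rule)."""
    keep = [True] * len(masks)
    areas = [int(m.sum()) for m in masks]
    for i in range(len(masks)):
        for j in range(i + 1, len(masks)):
            if not (keep[i] and keep[j]):
                continue
            delta = int(np.logical_and(masks[i], masks[j]).sum())  # overlap pixels
            smaller = min(areas[i], areas[j])
            if smaller and delta / smaller > mu:
                # Overlap exceeds the threshold: keep the larger recognition area.
                keep[j if areas[j] < areas[i] else i] = False
    return [m for m, k in zip(masks, keep) if k]
```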
3. Heterogeneous removal module
Considering the actual situation, the two conditions do not occur simultaneously; therefore, to handle different categories appearing in the prediction results, only the category with the highest confidence is retained. Let {γ1, γ2, γ3, …} be the confidence values of the detections of one tumor class and {ε1, ε2, ε3, …} those of the other class; γmax = max{γ1, γ2, γ3, …} is the highest confidence of the first class, and εmax = max{ε1, ε2, ε3, …} is the highest confidence of the other. The corresponding tumor set is retained according to formula (2).
[Formula (2) appears only as an image in the original record: the rule retaining the class with the larger of γmax and εmax.]
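Formula (2) likewise appears only as an image, so the sketch below implements the stated rule directly: keep only the detections of the class whose best detection has the higher confidence; `keep_single_class` is a hypothetical helper.

```python
def keep_single_class(labels: list, scores: list) -> list:
    """labels: class ids per detection; scores: confidences. Returns kept indices."""
    best = {}
    for lbl, s in zip(labels, scores):
        best[lbl] = max(best.get(lbl, 0.0), s)  # gamma_max / epsilon_max per class
    winner = max(best, key=best.get)            # class with the highest max confidence
    return [i for i, lbl in enumerate(labels) if lbl == winner]

# e.g. detections labeled [1, 1, 2] with scores [0.9, 0.6, 0.7] keep indices [0, 1]
```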
4. Projection area calculation module
After processing by the preceding modules, let μ be the number of pixels of the whole image, η the number of pixels in each remaining segmentation mask, ψ and a second symbol (lost in extraction) the pixel intervals along the X and Y axes, ζ and ξ the corresponding pixel counts, and Φ the projection area, calculated according to formula (3).
[Formula (3) appears only as an image in the original record.]
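Formula (3) is also only an image in this record; the sketch below therefore assumes the standard DICOM route, i.e. physical area = mask pixel count × (row spacing × column spacing), reading the spacing from the PixelSpacing tag (plain films sometimes store it as ImagerPixelSpacing instead).

```python
import numpy as np
import pydicom

def projection_area_mm2(mask: np.ndarray, dicom_path: str) -> float:
    """mask: boolean (H, W) array from one retained segmentation mask."""
    ds = pydicom.dcmread(dicom_path)
    spacing = getattr(ds, "PixelSpacing", None) or ds.ImagerPixelSpacing
    row_mm, col_mm = float(spacing[0]), float(spacing[1])  # mm per pixel (Y, X)
    return float(mask.sum()) * row_mm * col_mm             # pixel count x pixel area
```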
Example osteosarcoma prediction
In order to make the purpose, technical scheme, and advantages of the osteosarcoma and osteochondroma prediction method based on the Mask RCNN network clearer, the invention is described in further detail below with reference to the accompanying drawings and embodiments; it should be understood that the specific embodiments described here merely illustrate the invention and are not intended to limit it.
As shown in figs. 1 and 2, the present invention provides an osteosarcoma and osteochondroma prediction method based on the Mask RCNN network, used for automatically and intelligently detecting whether a patient has osteosarcoma or osteochondroma;
Step (1): construct a deep neural network model for detecting osteosarcoma and osteochondroma, comprising an instance segmentation module, a de-overlap module, a heterogeneous removal module, and a projection area calculation module.
The instance segmentation module is based on Mask RCNN and identifies the semantic masks of osteosarcoma and osteochondroma;
The de-overlap module retains the larger recognition area when the overlapping pixels of prediction masks exceed a set threshold, effectively eliminating multiple overlapping prediction regions for the same target.
The heterogeneous removal module removes low-confidence categories according to the prediction confidence, effectively alleviating two tumor types appearing simultaneously in the prediction results.
The projection area calculation module calculates the tiled (projected) area of the identified region from the prediction masks.
Step (2): training a target instance segmentation module based on Mask RCNN, comprising:
Step (2.1): based on the X-ray plain-film medical image data of patients, acquire plain-film medical images of patients with osteosarcoma to form the raw data, referring to figs. 4, 5, and 6. Within practical limits, the larger the total amount of raw data, the better;
Step (2.2): after converting the DICOM-format raw data, annotate the osteosarcoma labels to form the original images and their corresponding labels; randomly divide the data set into a training set, a validation set, and a test set;
Step (2.3): using the idea of transfer learning, initialize Mask RCNN with the weights of the COCO public data set, train with the training set, and validate each round's Mask RCNN with the validation set at every epoch; if the validation curve converges, proceed to step (3); otherwise, return to step (2.1), expand the original database, re-label, train, and verify.
Step (3): test the detection effect of the trained and verified Mask RCNN model with the randomly divided test set.
Further, in step (1), in order to give the model generality across multi-angle X-ray films and reduce the loss value of the model, the data set of the model includes picture data from different viewing angles, as shown in figs. 4 and 5.
Further, in step (2), when the labels of the raw data are made, the label name is osteosarcoma.
Further, in step (2.3), the Mask RCNN model adopts the ResNet-50 network plus a feature pyramid network as the feature extractor for extracting low-level and high-level image features from the original picture. This process allows the features of each level to be combined with both higher- and lower-level features; these features are input into a region proposal network, which generates proposal regions; the proposal regions are then aligned and pooled, and the semantic masks are identified.
Anchor points are used in the region proposal network, and inputs of different sizes can be adjusted into outputs of the same size, so that feature maps of arbitrary size can be converted into fixed-size feature vectors.
Further, the loss function L of the Mask RCNN network in step (2.3) is as follows:
L = L_cls + L_box + L_mask
where L_cls, L_box, and L_mask denote the classification, bounding-box regression, and mask prediction losses, respectively, and L is the total loss value.
Example two osteochondroma prediction
In order to make the purpose, technical scheme, and advantages of the osteosarcoma and osteochondroma prediction method based on the Mask RCNN network clearer, the invention is described in further detail below with reference to the accompanying drawings and embodiments; it should be understood that the specific embodiments described here merely illustrate the invention and are not intended to limit it.
As shown in figs. 1 and 2, the present invention provides an osteosarcoma and osteochondroma prediction method based on the Mask RCNN network, used for automatically and intelligently detecting whether a patient has osteosarcoma or osteochondroma;
Step (1): construct a deep neural network model for detecting osteosarcoma and osteochondroma, comprising an instance segmentation module, a de-overlap module, a heterogeneous removal module, and a projection area calculation module.
The instance segmentation module is based on Mask RCNN and identifies the semantic masks of osteosarcoma and osteochondroma;
The de-overlap module retains the larger recognition area when the overlapping pixels of prediction masks exceed a set threshold, effectively eliminating multiple overlapping prediction regions for the same target.
The heterogeneous removal module removes low-confidence categories according to the prediction confidence, effectively alleviating two tumor types appearing simultaneously in the prediction results.
The projection area calculation module calculates the tiled (projected) area of the identified region from the prediction masks.
Step (2): training a target instance segmentation module based on Mask RCNN, comprising:
Step (2.1): based on the X-ray plain-film medical image data of patients, acquire plain-film medical images of patients with osteochondroma to form the raw data, referring to figs. 4, 5, and 6. Within practical limits, the larger the total amount of raw data, the better;
Step (2.2): after converting the DICOM-format raw data, annotate the osteochondroma labels to form the original images and their corresponding labels; randomly divide the data set into a training set, a validation set, and a test set;
Step (2.3): using the idea of transfer learning, initialize Mask RCNN with the weights of the COCO public data set, train with the training set, and validate each round's Mask RCNN with the validation set at every epoch; if the validation curve converges, proceed to step (3); otherwise, return to step (2.1), expand the original database, re-label, train, and verify.
Step (3): test the detection effect of the trained and verified Mask RCNN model with the randomly divided test set.
Further, in step (1), in order to give the model generality across multi-angle X-ray films and reduce the loss value of the model, the data set of the model includes picture data from different viewing angles, as shown in figs. 4 and 5.
Further, in step (2), when the labels of the raw data are made, the label is named osteochondroma.
Further, in step (2.3), the Mask RCNN model adopts the ResNet-50 network plus a feature pyramid network as the feature extractor for extracting low-level and high-level image features from the original picture. This process allows the features of each level to be combined with both higher- and lower-level features; these features are input into a region proposal network, which generates proposal regions; the proposal regions are then aligned and pooled, and the semantic masks are identified.
Anchor points are used in the region proposal network, and inputs of different sizes can be adjusted into outputs of the same size, so that feature maps of arbitrary size can be converted into fixed-size feature vectors.
Further, the loss function L of the Mask RCNN network in step (2.3) is as follows:
L = L_cls + L_box + L_mask
where L_cls, L_box, and L_mask denote the classification, bounding-box regression, and mask prediction losses, respectively, and L is the total loss value.
It will be readily understood by those skilled in the art that the foregoing is only a preferred embodiment of the present invention and is not intended to limit the invention, and any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the scope of the present invention.

Claims (7)

1. An osteosarcoma and osteochondroma prediction method based on the Mask RCNN network, comprising the following steps:
Step (1): construct a deep neural network model for detecting osteosarcoma and osteochondroma, comprising an instance segmentation module, a de-overlap module, a heterogeneous removal module, and a projection area calculation module;
the instance segmentation module is constructed based on Mask RCNN and is used for identifying osteosarcoma and osteochondroma;
the de-overlap module judges, from the semantic-mask predictions, whether the number of pixels in an overlapping region exceeds a set threshold; if so, the semantic mask with the larger recognition area is retained;
the heterogeneous removal module is used for removing low-confidence semantic masks according to the prediction confidence;
the projection area calculation module is used for calculating the tiled (projected) area of the identified region from the predicted semantic masks.
Step (2): training an example segmentation module based on Mask RCNN, comprising:
step (2.1), acquiring X-ray plain film medical images of osteochondroma and osteosarcoma of a patient group to form an original data set;
and (2.2), converting the DICOM-format raw data of the medical images into JPG format, and labeling the osteosarcoma and osteochondroma labels respectively on the images with the labeling software labelme to obtain json files comprising the original images and their corresponding labels; randomly dividing the data set into a training set, a validation set, and a test set in a ratio of 7:1:2;
and (2.3), initializing Mask RCNN with the weights of the COCO public data set, following the transfer learning idea in deep learning; training with the training set, validating with the validation set at every epoch during training; if the validation curve converges, proceeding to step (3); otherwise, returning to step (2.1), expanding the original database, re-labeling, and repeating the training and validation process;
and (3): the test set was used to test the predicted effect of the Mask RCNN model.
2. The method as claimed in claim 1, wherein in step (1), in order to give the model generality across multi-angle X-ray plain films and reduce the loss value of the model, the data set of the model contains picture data from different viewing angles.
3. The method of claim 1, wherein in the step (2), when the label of the original data is generated, the label is classified into two types, i.e., osteosarcoma and osteochondroma.
4. The method as claimed in claim 1, wherein in step (2.3), the Mask RCNN model adopts the ResNet-50 network plus a feature pyramid network as the feature extractor for extracting low-level and high-level image features from the original picture; this process allows the features of each level to be combined with both higher- and lower-level features; these features are input into a region proposal network, which generates proposal regions; the proposal regions are then aligned and pooled, and the semantic masks are identified.
5. The method of claim 4, wherein anchor points are used in the region proposal network to adjust inputs of different sizes into outputs of the same size, so that feature maps of any size can be converted into feature vectors of fixed size.
6. The method of claim 1, wherein the loss function L of the Mask RCNN network in step (2.3) is as follows:
L = L_cls + L_box + L_mask
where L_cls, L_box, and L_mask denote the classification, bounding-box regression, and mask prediction losses, respectively, and L is the total loss value.
7. The osteosarcoma and osteochondroma prediction method based on the Mask RCNN network of claim 1, implemented by a system comprising an X-ray plain-film device, a processor, and a deep learning neural network model program module for detecting osteosarcoma and osteochondroma obtained after training and verification according to the intelligent image prediction method of any one of claims 1 to 6; the processor calls the deep learning neural network model program module to analyze the images captured by the X-ray plain-film device, thereby identifying whether the patient has osteosarcoma or osteochondroma.
CN202110253569.9A 2021-03-09 2021-03-09 Osteosarcoma and osteochondroma prediction method based on Mask RCNN network Pending CN113052799A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110253569.9A CN113052799A (en) 2021-03-09 2021-03-09 Osteosarcoma and osteochondroma prediction method based on Mask RCNN network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110253569.9A CN113052799A (en) 2021-03-09 2021-03-09 Osteosarcoma and osteochondroma prediction method based on Mask RCNN network

Publications (1)

Publication Number Publication Date
CN113052799A true CN113052799A (en) 2021-06-29

Family

ID=76510360

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110253569.9A Pending CN113052799A (en) 2021-03-09 2021-03-09 Osteosarcoma and osteochondroma prediction method based on Mask RCNN network

Country Status (1)

Country Link
CN (1) CN113052799A (en)


Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109858367A (en) * 2018-12-29 2019-06-07 华中科技大学 The vision automated detection method and system that worker passes through support unsafe acts
CN110599448A (en) * 2019-07-31 2019-12-20 浙江工业大学 Migratory learning lung lesion tissue detection system based on MaskScoring R-CNN network
CN111369540A (en) * 2020-03-06 2020-07-03 西安电子科技大学 Plant leaf disease identification method based on mask convolutional neural network


Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113516201A (en) * 2021-08-09 2021-10-19 中国农业大学 Estimation method of residual material amount in meat rabbit feed box based on deep neural network
CN113516201B (en) * 2021-08-09 2023-10-31 中国农业大学 Method for estimating residual material quantity in meat rabbit feed box based on deep neural network

Similar Documents

Publication Publication Date Title
Xie et al. Automatic detection and classification of sewer defects via hierarchical deep learning
WO2020253629A1 (en) Detection model training method and apparatus, computer device, and storage medium
CN109800824B (en) Pipeline defect identification method based on computer vision and machine learning
US20230316702A1 (en) Explainable artificial intelligence (ai) based image analytic, automatic damage detection and estimation system
WO2019104767A1 (en) Fabric defect detection method based on deep convolutional neural network and visual saliency
CN112967243A (en) Deep learning chip packaging crack defect detection method based on YOLO
CN110264444B (en) Damage detection method and device based on weak segmentation
CN113160200B (en) Industrial image defect detection method and system based on multi-task twin network
CN111524144A (en) Intelligent pulmonary nodule diagnosis method based on GAN and Unet network
CN111428664A (en) Real-time multi-person posture estimation method based on artificial intelligence deep learning technology for computer vision
WO2024021461A1 (en) Defect detection method and apparatus, device, and storage medium
CN114663426A (en) Bone age assessment method based on key bone area positioning
CN114494160A (en) Fracture detection method based on complete fusion integrated network candidate frame
Liu et al. Deep domain adaptation for pavement crack detection
KR20100116404A (en) Method and apparatus of dividing separated cell and grouped cell from image
CN114170686A (en) Elbow bending behavior detection method based on human body key points
CN117095180B (en) Embryo development stage prediction and quality assessment method based on stage identification
CN113052799A (en) Osteosarcoma and osteochondroma prediction method based on Mask RCNN network
CN117333948A (en) End-to-end multi-target broiler behavior identification method integrating space-time attention mechanism
CN110889418A (en) Gas contour identification method
CN115953678A (en) Pavement damage detection method based on local gray extreme point and feature fusion
CN116912872A (en) Drawing identification method, device, equipment and readable storage medium
CN112069997B (en) Unmanned aerial vehicle autonomous landing target extraction method and device based on DenseHR-Net
CN114067159A (en) EUS-based fine-granularity classification method for submucosal tumors
CN112991280A (en) Visual detection method and system and electronic equipment

Legal Events

Date Code Title Description
PB01 — Publication
SE01 — Entry into force of request for substantive examination
WD01 — Invention patent application deemed withdrawn after publication
Application publication date: 20210629