CN108538369B

CN108538369B - Method for analyzing central nervous system tumor image data

Info

Publication number: CN108538369B
Application number: CN201810231463.7A
Authority: CN
Inventors: 王中杰; 李学军; 易小平; 王苟思义; 张晓金
Original assignee: Xiangya Hospital of Central South University
Current assignee: Xiangya Hospital of Central South University
Priority date: 2018-03-20
Filing date: 2018-03-20
Publication date: 2022-02-15
Anticipated expiration: 2038-03-20
Also published as: CN108538369A

Abstract

The invention relates to a method for analyzing central nervous system tumor image data, comprising the following steps: S1, acquiring the central nervous system tumor image data; S2, automatically selecting features from the image data with a LASSO algorithm, linearly combining the features to obtain a first classification probability, and taking the first classification probability as a first classification dimension; S3, generating a plurality of decision trees from the features with a random forest algorithm, obtaining a second classification probability from the decision trees, and taking the second classification probability as a second classification dimension; S4, taking the first and second classification dimensions as a first two-dimensional feature, obtaining a third classification probability with a two-dimensional SVM algorithm, and taking the third classification probability as a first classification result. Combining these analysis steps improves the accuracy of central nervous system tumor image classification.

Description

Method for analyzing central nervous system tumor image data

Technical Field

The invention relates to the field of medical images, in particular to an analysis method of central nervous system tumor image data.

Background

Central nervous system tumors are common in clinical practice. Because of their special anatomical location, high invasiveness, high recurrence rate, and strong resistance to conventional radiotherapy and chemotherapy, clinical outcomes both in China and abroad have improved very slowly over the past 30 years, despite continuous advances in surgery, radiotherapy, and chemotherapy. With the advancement of information technology, medical image data is growing explosively. How to use these data systematically and comprehensively, keep innovating life-science theory and technology, study the mechanisms of tumor initiation, progression, and treatment resistance, and explore new effective treatments has become a new challenge for the medical field.

CNS tumors are difficult to diagnose because there are many tumor types and their imaging appearances are highly similar, which makes it hard for physicians to determine the type or malignancy of a tumor. If these characteristics cannot be judged accurately, the best treatment regimen cannot be determined.

Traditional diagnostic methods for central nervous system tumors fall into two main categories: 1. a radiologist judges the type and malignancy of the tumor from magnetic resonance images based on personal experience; 2. a small amount of tumor tissue is obtained through puncture surgery, and a diagnosis is made using pathological analysis, genetic testing, and other means.

These conventional methods have several problems: 1. diagnosis by radiologists reading medical images carries considerable uncertainty. Because doctors must read a large number of images every day, their judgment is easily affected by external and internal factors, and high accuracy cannot always be maintained. In addition, many tumors look highly similar on imaging, which leads to a high misjudgment rate; 2. pathological examination of a small amount of tumor tissue obtained through puncture surgery gives a highly accurate diagnosis, but the puncture inevitably injures the patient's brain tissue and increases the patient's suffering, and it takes much longer than an imaging examination, so it cannot be used on a large scale.

Disclosure of Invention

Therefore, to address the technical problem that central nervous system tumors cannot be judged accurately from medical images, it is necessary to provide a method for analyzing central nervous system tumor image data.

A method for analyzing central nervous system tumor image data, comprising:

S1, acquiring the central nervous system tumor image data;

S2, automatically screening out features in the image data with a LASSO (least absolute shrinkage and selection operator) algorithm, linearly combining the features to obtain a first classification probability, and taking the first classification probability as a first classification dimension;

S3, generating a plurality of decision trees from the features with a Random Forest algorithm, obtaining a second classification probability from the decision trees, and taking the second classification probability as a second classification dimension;

S4, taking the first classification dimension and the second classification dimension as a first two-dimensional feature, obtaining a third classification probability with a two-dimensional SVM (support vector machine) algorithm, and taking the third classification probability as a first classification result.

Further, the image data includes conventional medical image data and/or computer image processing data.

Still further, the computer image processing data includes texture features.

Preferably, the conventional medical image data is processed to obtain a final fourth classification probability, and the computer image processing data is processed to obtain a final fifth classification probability; the fourth and fifth classification probabilities are used as a third classification dimension and a fourth classification dimension, respectively. The third and fourth classification dimensions are taken as a second two-dimensional feature, a sixth classification probability is obtained with the two-dimensional SVM algorithm, and the sixth classification probability is taken as a second classification result.

Preferably, in step S2, the coefficient of each selected feature is non-zero, and the features and coefficients are related by: $S = a_1 \times F_1 + \cdots + a_k \times F_k$

where k is a positive integer, S represents the first classification probability, $a_1$ represents the coefficient corresponding to the first feature, $a_k$ represents the coefficient corresponding to the kth feature, $F_1$ represents the first feature, and $F_k$ represents the kth feature.
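A minimal sketch of how LASSO yields the non-zero coefficients $a_1, \ldots, a_k$ and the linear combination S, assuming a LASSO fitted by cyclic coordinate descent with soft-thresholding (one standard solver; the experiment described later used standard MATLAB R2016b functions instead). The function names, the penalty `lam=0.1`, and the toy data are hypothetical. The sketch returns the linear combination exactly as written above and does not map it into [0, 1].

```python
# A minimal LASSO sketch: cyclic coordinate descent with soft-thresholding.
# Assumption: features and target are centred, so no intercept is fitted.


def soft_threshold(rho, lam):
    """Shrink rho toward zero by lam; values inside [-lam, lam] become exactly 0."""
    if rho > lam:
        return rho - lam
    if rho < -lam:
        return rho + lam
    return 0.0


def lasso_coordinate_descent(X, y, lam, n_iter=100):
    """Minimise (1/(2n)) * ||y - X a||^2 + lam * ||a||_1 over the coefficients a.

    X is a list of rows (one row per case), y a list of targets.
    """
    n, p = len(X), len(X[0])
    a = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            # Correlation of feature j with the partial residual that excludes feature j.
            rho = sum(
                X[i][j] * (y[i] - sum(X[i][k] * a[k] for k in range(p) if k != j))
                for i in range(n)
            ) / n
            z = sum(X[i][j] ** 2 for i in range(n)) / n
            a[j] = soft_threshold(rho, lam) / z
    return a


def linear_score(coefs, features):
    """S = a_1*F_1 + ... + a_k*F_k over the features with non-zero coefficients."""
    return sum(c * f for c, f in zip(coefs, features) if c != 0.0)


# Toy data: the target depends only on the first feature (y = 2 * x0).
X = [[1.0, 1.0], [-1.0, 1.0], [2.0, -1.0], [-2.0, -1.0]]
y = [2.0, -2.0, 4.0, -4.0]
coefs = lasso_coordinate_descent(X, y, lam=0.1)
selected = [j for j, c in enumerate(coefs) if c != 0.0]
print(coefs, selected)
```

The irrelevant second feature receives a coefficient of exactly zero, which is how LASSO "screens out" features; the surviving coefficient is shrunk slightly below 2 by the L1 penalty.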

Preferably, in step S3, each of the decision trees includes a number of randomly selected features.

Further, after step S4, the first classification result is verified as follows: a training set and a verification set are selected; the data in the training set are used for machine-learning training, and a correct classification result is obtained with the LASSO algorithm, the random forest algorithm, and the two-dimensional SVM algorithm; the data in the verification set are then classified to obtain a verification classification result; and the verification classification result is compared with the correct classification result to obtain the accuracy of the first classification result.

Preferably, the AUC value of said validated classification result is compared with the AUC value of said correct classification result.

Compared with the prior art, the invention has the following advantages: a LASSO algorithm selects features from the acquired central nervous system tumor image data, and the features are linearly combined to obtain a first classification probability; a random forest algorithm then generates a plurality of decision trees from the features, and a second classification probability is obtained from the decision trees, which improves accuracy; finally, a two-dimensional SVM algorithm reclassifies the results of the two algorithms (the first and second classification probabilities) to obtain a third classification probability, which further improves classification accuracy.

Other beneficial effects: a two-dimensional SVM algorithm reclassifies the results obtained from two different kinds of data, namely the fourth classification probability from conventional medical image data and the fifth classification probability from computer image processing data, to obtain a sixth classification probability. This further improves classification accuracy, to about 89%, with an AUC value of about 0.97.

Drawings

FIG. 1 is a flowchart illustrating a method for analyzing image data of a CNS tumor according to an embodiment of the present invention.

FIG. 2 is a diagram illustrating the classification result of the training set in the method for analyzing central nervous system tumor image data according to the embodiment of the present invention.

FIG. 3 is a diagram illustrating the classification result of the validation set in the method for analyzing image data of CNS tumor according to an embodiment of the present invention.

Detailed Description

In order to make the aforementioned objects, features and advantages of the present invention comprehensible, embodiments accompanied with figures are described in detail below. In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present invention. This invention may, however, be embodied in many different forms and should not be construed as limited to the embodiments set forth herein, but rather should be construed as broadly as the present invention is capable of modification in various respects, all without departing from the spirit and scope of the present invention.

As shown in FIG. 1, an embodiment of the present invention provides a method for classifying central nervous system tumor image data, comprising:

T1, acquiring the central nervous system tumor image data;

T2, automatically extracting features from the image data with a LASSO algorithm, linearly combining the features to obtain a first classification probability, and taking the first classification probability as a first classification dimension. The coefficient of each feature is non-zero and is generated automatically by the LASSO algorithm, and the features and coefficients are related by $S = a_1 \times F_1 + \cdots + a_k \times F_k$, where S represents the first classification probability, $a_1$ represents the coefficient corresponding to the first feature, $a_k$ represents the coefficient corresponding to the kth feature, $F_1$ represents the first feature, and $F_k$ represents the kth feature.

T3, generating a plurality of decision trees from the features with a random forest algorithm, obtaining a second classification probability from the decision trees, and taking the second classification probability as a second classification dimension, where each decision tree preferably consists of several randomly selected features. Each decision tree produces one classification result for a case, and the final classification probabilities are obtained by voting across all decision trees. For example, if there are 10 decision trees in total, of which 2 give result A and 8 give result B, the classification probability of result A is 20% and that of result B is 80%. This value can be retained as another classification feature dimension.
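The voting rule in T3 can be sketched in a few lines. This is an illustration only: `vote_probabilities` and `random_feature_subset` are hypothetical helpers, and the default of floor(sqrt(number of features)) features per tree is an assumed, commonly used choice that the text does not state. Following the text, the sketch draws one random feature subset per tree.

```python
import math
import random
from collections import Counter


def random_feature_subset(feature_names, rng, k=None):
    """Pick k distinct features at random for one tree.

    Hypothetical helper: the default k = floor(sqrt(number of features)) is an
    assumed, commonly used choice, not a value stated in the text.
    """
    if k is None:
        k = max(1, int(math.sqrt(len(feature_names))))
    return rng.sample(feature_names, k)


def vote_probabilities(tree_votes):
    """Turn one class label per tree into per-class probabilities (vote fractions)."""
    counts = Counter(tree_votes)
    total = len(tree_votes)
    return {label: count / total for label, count in counts.items()}


# The worked example from the text: 10 trees, 2 vote "A" and 8 vote "B".
votes = ["A"] * 2 + ["B"] * 8
print(vote_probabilities(votes))  # {'A': 0.2, 'B': 0.8}

rng = random.Random(0)
names = [f"feature_{i}" for i in range(21)]
print(random_feature_subset(names, rng))
```

The vote fraction for one class is the value that would be carried forward as the second classification dimension.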

T4, taking the first classification dimension and the second classification dimension as a first two-dimensional feature, obtaining a third classification probability with a two-dimensional SVM algorithm, and taking the third classification probability as a first classification result.

It should be noted that a classification probability gives the probability of each of the two possible outcomes: if the first outcome has a probability of 40%, the other has a probability of 60%. In this embodiment, the classification probabilities are the probabilities that a central nervous system lesion is benign or atypical.

On the basis of the above embodiment, the image data in this embodiment includes conventional medical image data or computer image processing data. Applying the above central nervous system tumor image data analysis method to the conventional medical image data yields a final fourth classification probability, and applying it to the computer image processing data yields a final fifth classification probability; that is, the third classification probability obtained for each data type serves as the fourth and fifth classification probability, respectively. The computer image processing data includes texture features.

On the basis of the foregoing embodiment, to obtain a more accurate classification result, this embodiment further uses the fourth and fifth classification probabilities as a third classification dimension and a fourth classification dimension, respectively, takes the third and fourth classification dimensions as a second two-dimensional feature, obtains a sixth classification probability with the two-dimensional SVM algorithm, and takes the sixth classification probability as a second classification result, which is more accurate than the first classification result.

On the basis of the foregoing embodiment, this embodiment further verifies the first classification result. A training set and a verification set are selected, and the data in the training set are used for machine-learning training; at the same time, all parameters required by the data analysis method are determined, including the names and coefficients of the features in the LASSO algorithm, the number of decision trees in the Random Forest algorithm, and the type of kernel function in the SVM algorithm. A correct classification result is obtained by applying the LASSO algorithm, the Random Forest algorithm, and the two-dimensional SVM algorithm to the extracted features. The data in the verification set are then classified by the same method to obtain a verification classification result, and the verification classification result is compared with the correct classification result to obtain the accuracy of the first classification result. The AUC value, i.e., the area under the ROC curve, is used as the index for evaluating the quality of a classification result: an AUC of 1 means the classification is completely correct, and an AUC of 0 means it is completely incorrect. If the AUC value of the verification set is essentially consistent with that of the training set, the classification result is valid.
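The AUC criterion above can be made concrete with the pairwise (Mann-Whitney) formulation of the area under the ROC curve: the fraction of (positive, negative) case pairs in which the positive case receives the higher score, with ties counted as one half. The sketch below is a plain-Python illustration with made-up scores, not the MATLAB routine used in the experiment; the function name `auc` is hypothetical.

```python
def auc(scores, labels, positive=1):
    """Area under the ROC curve via the Mann-Whitney pairwise formulation.

    Counts, over all (positive, negative) pairs, how often the positive case
    gets the higher score; ties count one half.
    """
    pos = [s for s, lab in zip(scores, labels) if lab == positive]
    neg = [s for s, lab in zip(scores, labels) if lab != positive]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


print(auc([0.9, 0.8, 0.3, 0.1], [1, 1, 0, 0]))   # 1.0: every positive ranked above every negative
print(auc([0.1, 0.3, 0.8, 0.9], [1, 1, 0, 0]))   # 0.0: every positive ranked below every negative
print(auc([0.9, 0.8, 0.3, 0.85], [1, 1, 0, 0]))  # 0.75
```

Computing this value separately on the training set and the verification set gives the two AUCs that the method compares.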

Central nervous system tumors include meningiomas; in particular, the present method was validated on benign and atypical meningioma data. The purpose of the validation is to check whether the classification probability obtained with the disclosed data analysis method can distinguish the two types of meningioma (benign and atypical), so that it can assist diagnosis. Of the 188 cases collected, 108 were benign meningiomas and 80 were atypical meningiomas. The data were split into a training set (80%) and a validation set (20%), with the proportions of benign and atypical cases kept consistent across the two sets. In the verification test, MaZda software automatically generated 271 texture features, from which the LASSO algorithm selected 21 important features, as follows:

Mean: 7.46743513329021e-08

Variance: -9.64620487628537e-09

Skewness: 0.0183510694901050

Perc_01_: 3.96287078842167e-07

Perc_10_: 3.03430212931759e-07

Perc_50_: 1.84485943484775e-07

S_2_2_Correlat: -0.0597885449198297

S_0_3_Correlat: -0.0237332224922227

S_4__4_Correlat: -0.0432722324250192

S_5_5_Correlat: -0.0943270560138581

S_5__5_Correlat: -0.0897137900197810

Horzl_GLevNonU: -0.000244276607501659

Vertl_LngREmph: -5.30755477163183e-06

x45dgr_GLevNonU: -0.000103509863405282

x135dr_LngREmph: -4.66538949337198e-06

GrMean: -0.0523716801683309

GrVariance: -0.184954357921091

GrNonZeros: -0.129586627947559

Teta1: -0.168344340866505

Teta2: 0.111206870926906

Sigma: 0.340163327683169

In this list, the value after each colon is the coefficient of that feature.
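The stratified 80/20 split of the 188 meningioma cases described above can be sketched as follows. The function name, the fixed seed, and the per-class rounding rule are assumptions for illustration; with this rounding, 108 benign cases give 86 for training and 80 atypical cases give 64, which matches the 150/38 training/verification sizes reported later.

```python
import random


def stratified_split(labels, train_frac=0.8, seed=0):
    """Split case indices into training and validation sets, class by class,
    so each set keeps roughly the same benign/atypical proportion."""
    rng = random.Random(seed)
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)
    train, val = [], []
    for label in sorted(by_class):
        idxs = by_class[label][:]
        rng.shuffle(idxs)
        n_train = round(len(idxs) * train_frac)  # 108 -> 86, 80 -> 64
        train.extend(idxs[:n_train])
        val.extend(idxs[n_train:])
    return sorted(train), sorted(val)


labels = ["benign"] * 108 + ["atypical"] * 80
train, val = stratified_split(labels)
print(len(train), len(val))  # 150 38
```

Splitting within each class, rather than over the pooled cases, is what keeps the benign/atypical ratio the same in both sets.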

In addition, 43 features of the conventional medical image data were used, from which the LASSO algorithm selected 18 important features, as follows:

Gender: -0.0229764960541845

Location: 9.07750835589749e-06

Orientation: 0.00248730392127705

Shape: -0.112592986308194

Long diameter: -0.00377227594852725

Short diameter: -0.000343448221861136

Contour: -0.000662784706417665

Tumor-brain interface invasion (brain parenchymal invasion): -0.0388307762159711

Bone invasion: -0.00407057076398080

Edema: -0.0164659469830602

Degree of enhancement: -0.0367956384909010

Enhancement homogeneity: -0.0288309780906818

Intratumoral artery: -0.0655023328383138

Eccentric cystic change: -0.0184289725094601

Intratumoral necrosis: -0.0700134125234530

Hemorrhage: -0.0216729527444438

Dural tail: 0.0199106009684027

subLocation 1: 0.0123148020999805

subLocation 2: -0.00574664106065702

In this list, the value after each colon is the coefficient of that feature.

The two feature sets were each fed into the RandomForest algorithm, using a random forest of 8 decision trees, to obtain their respective second classification probabilities; an SVM algorithm with a radial basis function as its kernel function then produced the fourth and fifth classification probabilities, respectively. Finally, the fourth and fifth classification probabilities were used as the third and fourth classification dimensions, which together form the second two-dimensional feature, and a two-dimensional SVM algorithm produced the final sixth classification probability, which was taken as the final classification result.
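The final step applies an SVM with a radial basis function kernel to the second two-dimensional feature. The sketch below trains such an SVM with kernelized Pegasos, a stochastic sub-gradient SVM solver swapped in for the MATLAB functions used in the experiment. The class name `KernelPegasosSVM`, the parameters `lam`, `gamma`, and `epochs`, and the toy points are all hypothetical. It also simplifies in three labeled ways: no bias term, cyclic rather than random visiting order, and a fixed logistic squashing of the decision value in place of fitted Platt scaling.

```python
import math


def rbf_kernel(u, v, gamma):
    """Radial basis function kernel exp(-gamma * ||u - v||^2)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(u, v)))


class KernelPegasosSVM:
    """RBF-kernel SVM trained with kernelized Pegasos (stochastic sub-gradient).

    Simplifications (assumptions, not the patent's MATLAB setup): no bias term,
    points are visited cyclically instead of at random, and predict_proba squashes
    the decision value with a fixed logistic function instead of fitted Platt scaling.
    """

    def __init__(self, lam=0.01, gamma=2.0, epochs=500):
        self.lam, self.gamma, self.epochs = lam, gamma, epochs

    def fit(self, points, labels):
        """labels are +1 (e.g. benign) or -1 (e.g. atypical)."""
        self.points, self.labels = points, labels
        n = len(points)
        self.alpha = [0] * n
        t = 0
        for _ in range(self.epochs):
            for i in range(n):
                t += 1
                margin = labels[i] * self._raw(points[i]) / (self.lam * t)
                if margin < 1:  # hinge loss active: add this point's sub-gradient
                    self.alpha[i] += 1
        self.t = t
        return self

    def _raw(self, x):
        """Unscaled kernel expansion sum_j alpha_j * y_j * K(x_j, x)."""
        return sum(a * y * rbf_kernel(x, p, self.gamma)
                   for a, y, p in zip(self.alpha, self.labels, self.points) if a)

    def decision_function(self, x):
        return self._raw(x) / (self.lam * self.t)

    def predict_proba(self, x):
        """Probability of the +1 class via a fixed logistic squashing."""
        return 1.0 / (1.0 + math.exp(-self.decision_function(x)))


# Each row is a hypothetical second two-dimensional feature:
# (fourth classification probability, fifth classification probability).
X2 = [(0.9, 0.8), (0.2, 0.1), (0.8, 0.9), (0.1, 0.2)]
y2 = [1, -1, 1, -1]  # +1 benign, -1 atypical (illustrative labels)
svm = KernelPegasosSVM().fit(X2, y2)
for x in [(0.85, 0.85), (0.15, 0.15)]:
    print(x, round(svm.predict_proba(x), 3))
```

The same class could be applied first to each (first, second) classification-dimension pair to produce the fourth and fifth probabilities, then to the resulting pair to produce the sixth.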

As shown in FIGS. 2 and 3, the patients are sorted in ascending order of the sixth classification probability, which ranges from 0 to 1. A case with a probability below 0.5 is judged to be an atypical meningioma, and a case with a probability above 0.5 is judged to be a benign meningioma. The bar colors in the figures (black and white) represent the standard diagnosis: white bars are benign meningiomas and black bars are atypical meningiomas. A patient is misclassified if a benign meningioma receives a probability below 0.5 or an atypical meningioma receives a probability above 0.5. In the training set (FIG. 2), of the 150 patients, 10 atypical cases were judged benign and 7 benign cases were judged atypical, giving an accuracy of 88.7%. In the verification set (FIG. 3), of the 38 patients, 2 atypical cases were judged benign and 2 benign cases were judged atypical, giving an accuracy of 89.4%. This is very close to the training-set accuracy, which shows that the data analysis method of the invention is highly accurate.
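The 0.5 decision threshold and the accuracy arithmetic can be checked with a short sketch. The helper names and the four example probabilities are hypothetical, and assigning a probability of exactly 0.5 to "atypical" is an assumption, since the text leaves that case open. The reported error counts correspond to 133/150 ≈ 0.887 and 34/38 ≈ 0.895.

```python
def classify(prob_benign, threshold=0.5):
    """Benign above the threshold, atypical otherwise (a probability of exactly
    0.5 is assigned to atypical here, an assumption the text leaves open)."""
    return "benign" if prob_benign > threshold else "atypical"


def accuracy(probs_benign, true_labels, threshold=0.5):
    """Fraction of patients whose thresholded label matches the standard diagnosis."""
    correct = sum(classify(p, threshold) == y for p, y in zip(probs_benign, true_labels))
    return correct / len(true_labels)


# Hypothetical sixth classification probabilities for four patients.
probs = [0.92, 0.35, 0.48, 0.61]
truth = ["benign", "atypical", "benign", "atypical"]
print(accuracy(probs, truth))  # 0.5: the last two patients are misclassified

# Accuracy implied by the reported error counts.
train_acc = (150 - (10 + 7)) / 150
val_acc = (38 - (2 + 2)) / 38
print(round(train_acc, 4), round(val_acc, 4))  # 0.8867 0.8947
```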

It should be noted that the LASSO, RandomForest, and SVM algorithms in this experiment were implemented with standard functions in MATLAB R2016b.

Finally, the AUC value (area under the ROC curve) of the verification classification result is compared with that of the correct classification result. Prior-art methods typically achieve an AUC of about 0.8, whereas this method achieved an AUC of 0.971 on the training set and 0.957 on the verification set. Both AUC values are close to 1 and differ little from each other, which further demonstrates the accuracy of the classification results of this embodiment.

Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. The terminology used in the description of the invention herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. The technical features of the embodiments described above may be arbitrarily combined, and for the sake of brevity, all possible combinations of the technical features in the embodiments described above are not described, but should be considered as being within the scope of the present specification as long as there is no contradiction between the combinations of the technical features.

The above-mentioned embodiments only express several embodiments of the present invention, and the description thereof is more specific and detailed, but not construed as limiting the scope of the invention. It should be noted that, for a person skilled in the art, several variations and modifications can be made without departing from the inventive concept, which falls within the scope of the present invention. Therefore, the protection scope of the present patent shall be subject to the appended claims.

Claims

1. A method for analyzing image data of a tumor in the central nervous system, comprising:

S1, acquiring the central nervous system tumor image data, wherein the image data comprises conventional medical image data and computer image processing data, and steps S2-S4 are executed separately for the conventional medical image data and for the computer image processing data;

S2, automatically extracting features from the image data with a LASSO algorithm, linearly combining the features to obtain a first classification probability, and taking the first classification probability as a first classification dimension;

wherein the texture features of the computer image processing data include: Mean, Variance, Skewness, Perc_01_, Perc_10_, Perc_50_, S_2_2_Correlat, S_0_3_Correlat, S_4__4_Correlat, S_5_5_Correlat, S_5__5_Correlat, Horzl_GLevNonU, Vertl_LngREmph, x45dgr_GLevNonU, x135dr_LngREmph, GrMean, GrVariance, GrNonZeros, Teta1, Teta2, and Sigma; and the conventional medical image data includes: gender, location, orientation, shape, long diameter, short diameter, contour, tumor-brain interface invasion, bone invasion, edema, degree of enhancement, enhancement homogeneity, intratumoral artery, eccentric cystic change, intratumoral necrosis, hemorrhage, dural tail, sub-location one, and sub-location two;

S3, generating a plurality of decision trees from the features with a random forest algorithm, obtaining a second classification probability from the decision trees, and taking the second classification probability as a second classification dimension;

S4, taking the first classification dimension and the second classification dimension as a first two-dimensional feature, obtaining a third classification probability with a two-dimensional SVM algorithm, and taking the third classification probability as a first classification result;

S5, the first classification result obtained by executing steps S2-S4 on the conventional medical image data is a fourth classification probability, the first classification result obtained by executing steps S2-S4 on the computer image processing data is a fifth classification probability, and the fourth and fifth classification probabilities are used as a third classification dimension and a fourth classification dimension, respectively; and taking the third classification dimension and the fourth classification dimension as a second two-dimensional feature, obtaining a sixth classification probability with the two-dimensional SVM algorithm, and taking the sixth classification probability as a second classification result.

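2. The method for analyzing CNS tumor image data according to claim 1, wherein in step S2, the coefficient of each feature is non-zero, and the features and coefficients are related by $S = a_1 \times F_1 + \cdots + a_k \times F_k$, where S is the first classification probability, k is a positive integer, and $a_i$ is the coefficient of the i-th feature $F_i$.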

3. The method for analyzing central nervous system tumor image data according to claim 1, wherein in step S3, each of the decision trees comprises a plurality of randomly selected features.

4. The method for analyzing central nervous system tumor image data according to claim 1, further comprising verifying the first classification result after step S4: selecting a training set and a verification set, using data in the training set to perform machine learning training, and obtaining a correct classification result according to the LASSO algorithm, the random forest algorithm and the two-dimensional SVM algorithm; then classifying the data in the verification set to obtain a verification classification result; and comparing the verification classification result with the correct classification result to obtain the accuracy of the first classification result.

5. The method of claim 4, wherein the AUC value of said validated classification is compared with the AUC value of said correct classification.