CN113160969A

CN113160969A - Soft tissue sarcoma recurrence probability prediction method based on machine learning

Info

Publication number: CN113160969A
Application number: CN202110399327.0A
Authority: CN
Inventors: 王鹤翔; 杨海强; 郝大鹏; 刘银华
Original assignee: Qingdao University; Affiliated Hospital of University of Qingdao
Current assignee: Qingdao University; Affiliated Hospital of University of Qingdao
Priority date: 2021-04-14
Filing date: 2021-04-14
Publication date: 2021-07-23

Abstract

The invention relates to a machine-learning-based method for predicting the recurrence probability of soft tissue sarcoma, belonging to the technical field of medical image processing. The method mainly comprises the following steps: S1: calculating the recurrence probability from the basic soft tissue sarcoma sample data; S2: screening conventional features and image features in the sample data set; S3: performing conventional feature processing, image feature processing and data set division on the sample data set; S4: combining a BP neural network model and a random forest to construct a recurrence probability prediction model. Based on soft tissue sarcoma patient samples collected by hospitals, the invention uses a subsampling approach to calculate three-year and five-year recurrence probability values for each individual sample, converts these values using the recurrence time data to obtain an accurate and reliable recurrence probability for each individual patient, and determines the final recurrence probability prediction model according to the difference between predicted and true values.

Description

Soft tissue sarcoma recurrence probability prediction method based on machine learning

Technical Field

The invention relates to a soft tissue sarcoma recurrence probability prediction method based on machine learning, belonging to the technical field of medical image processing.

Background

Existing methods for predicting the recurrence probability of soft tissue sarcoma have two main problems. First, doctors judge properties of the sarcoma such as size, histological type and pathological grade by inspecting medical images according to their own experience; differences in ability and experience between doctors lead to large differences in these judgments, which can delay treatment. Second, a mathematical model for recurrence risk prediction can be built from specific feature information in soft tissue sarcoma data, but existing models depend excessively on the specific features they use; because soft tissue sarcomas vary widely in morphology and have many complex features, the prediction accuracy of such models is low and their reliability is poor.

Disclosure of Invention

To address the above defects in the prior art, the invention provides a machine-learning-based method for predicting the recurrence probability of soft tissue sarcoma. The method extracts typical features from magnetic resonance imaging (MRI) images, which are easy to collect in hospitals in most cities, and combines a BP neural network with a random forest algorithm to build a recurrence probability prediction model, so that the recurrence risk of soft tissue sarcoma can be predicted.

The invention relates to a soft tissue sarcoma recurrence probability prediction method based on machine learning, which comprises the following steps:

S1: calculating the recurrence probability based on sample data, namely collecting information on soft tissue sarcoma patients and converting it to obtain the recurrence probability of an individual patient, comprising the following steps:

S11: collecting samples {D₁, D₂, D₃, ..., D_n} of soft tissue sarcoma patients, where the recommended number of samples is n ≥ 100;

S12: calculating the recurrence probability of each sample, comprising the following specific steps:

S121: for sample i, forming all subsamples that contain sample i, each subsample containing a fixed number of samples;

S122: for each subsample, calculating the 3-year recurrence probability and the 5-year recurrence probability of sample i within that subsample from n_3-r and n_5-r, which are, respectively, the number of patients in the subsample who relapsed within 3 years and the number who relapsed within 5 years;

S123: calculating the recurrence probability of sample i from its subsamples;

S124: the three-year recurrence probability and the five-year recurrence probability are thereby known for all samples {D₁, D₂, D₃, ..., D_n};

S125: converting the three-year recurrence probability and the five-year recurrence probability respectively using the recurrence time t, where t is the number of months after surgery at which recurrence occurs, with t in the range [1, 60];
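The subsample-based calculation of steps S121 to S123 can be illustrated with a minimal sketch. The formulas themselves are not reproduced in this text, so the sketch rests on assumptions: the probability within a subsample is taken to be the fraction of relapsed patients in that subsample, the probability of sample i is taken to be the mean over all subsamples containing i, and the subsample size `k` and the function name are hypothetical. The time-based conversion of step S125 is not modelled.

```python
from fractions import Fraction
from itertools import combinations


def subsample_recurrence_probability(relapsed, i, k):
    """Mean relapse fraction over all size-k subsamples that contain sample i.

    relapsed: list of booleans, one per patient (True = relapsed in the window).
    i:        index of the patient of interest.
    k:        subsample size (hypothetical parameter, 1 <= k <= len(relapsed)).
    """
    n = len(relapsed)
    if not 0 <= i < n:
        raise ValueError("i must index a sample")
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and the number of samples")

    others = [j for j in range(n) if j != i]
    rates = []
    # Every subsample is sample i plus (k - 1) of the remaining samples.
    for rest in combinations(others, k - 1):
        members = (i,) + rest
        n_relapsed = sum(1 for j in members if relapsed[j])
        rates.append(Fraction(n_relapsed, k))
    # Assumption: the per-sample probability is the plain average.
    return sum(rates, Fraction(0)) / len(rates)


if __name__ == "__main__":
    three_year = [True, False, False, True]
    for idx in range(len(three_year)):
        print(idx, subsample_recurrence_probability(three_year, idx, 2))
```

Exhaustive enumeration is only practical for small data sets; with n ≥ 100 samples a random draw of subsamples would be a natural substitute, which is again an assumption rather than something stated in the source.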

S2: feature screening for soft tissue sarcoma recurrence: screening conventional features and image features in the sample data set;

S3: feature-based sample data processing: from steps S1 and S2, the conventional features, image features, 3-year recurrence probability and 5-year recurrence probability of all samples in the collected sample set {D₁, D₂, D₃, ..., D_n} are obtained, and the conventional features and image features are processed in the following sub-steps:

S31: conventional feature processing;

S32: image feature processing: all image features of the samples {D₁, D₂, D₃, ..., D_n} are standardized, that is, for each image feature its feature values are normalized;

S32: data set partitioning: the data set is divided into a test set and a training set, where the training set is used to train the machine learning algorithm and the test set is used to evaluate its performance; the samples are sorted from high to low by 3-year recurrence probability or 5-year recurrence probability, samples at regularly spaced sequence numbers are selected as the test set, and the remaining samples form the training set;

S4: recurrence probability prediction based on machine learning models: from steps S1, S2 and S3, a complete data set of all samples is obtained, and a BP neural network and a random forest are used to map the sample features to the recurrence probability, comprising the following steps:

S41: model training, covering a BP neural network and a random forest, wherein:

S411: BP neural network;

S412: random forest;

S42: model evaluation and determination: the test samples, whose true three-year and five-year recurrence probabilities are known, are input to the trained neural network and the trained random forest respectively to obtain the predicted three-year and five-year recurrence probabilities, and the differences v³ and v⁵ between the predicted and true values for three years and five years are calculated;

the larger the value of v³ or v⁵, the larger the difference between the predicted and true values, that is, the larger the error of the corresponding model and the worse its performance;

among the values v^ANN and v^RF obtained for the models, the minimum min{v^ANN, v^RF} is selected, and the corresponding model is the soft tissue sarcoma recurrence probability prediction model.

Preferably, in step S11, the collecting sample information of the soft tissue sarcoma patient includes: personal information, pathological characteristics, image characteristics, whether the patient relapses in 3 years after the operation and whether the patient relapses in 5 years after the operation.

Preferably, in step S2, the features of soft tissue sarcoma recurrence include:

S21: conventional features, including gender, age and postoperative time;

S22: image features, extracted from MRI images obtained by nuclear magnetic resonance equipment.

Preferably, in step S22, the MRI images obtained by the nuclear magnetic resonance equipment are divided into T1-weighted imaging and T2-weighted imaging according to the imaging mode.

Preferably, in step S22, T1-weighted imaging includes the following cases:

case one: in the wavelet low-low-low (LLL) sub-band imaging mode:

(a) the large area high gray level emphasis feature of the gray level size zone matrix;

(b) the small area high gray level emphasis feature of the gray level size zone matrix;

case two: in the wavelet low-low-high (LLH) sub-band imaging mode:

(a) the coarseness feature of the neighbouring gray tone difference matrix;

(b) the total energy feature of the first-order statistics;

case three: in the wavelet high-low-low (HLL) sub-band imaging mode:

(a) the small dependence low gray level emphasis feature of the gray level dependence matrix;

case four: in the wavelet high-low-high (HLH) sub-band imaging mode:

(a) the large area high gray level emphasis feature of the gray level size zone matrix;

(b) the small area high gray level emphasis feature of the gray level size zone matrix;

case five: in the three-dimensional Laplacian of Gaussian imaging mode with a sigma of 0.5 mm:

(a) the dependence non-uniformity normalized feature of the gray level dependence matrix;

(b) the maximal correlation coefficient feature of the gray level co-occurrence matrix;

(c) the kurtosis feature of the first-order statistics;

case six: in the three-dimensional Laplacian of Gaussian imaging mode with a sigma of 1.5 mm:

(a) the dependence non-uniformity normalized feature of the gray level dependence matrix;

(b) the kurtosis feature of the first-order statistics;

case seven: in the original imaging mode:

(a) the inverse variance feature of the gray level co-occurrence matrix;

(b) the large dependence high gray level emphasis feature of the gray level dependence matrix;

(c) the large area high gray level emphasis feature of the gray level size zone matrix.

Preferably, in step S22, T2-weighted imaging includes the following cases:

case one: in the original imaging mode:

(a) the elongation feature of the shape;

(b) the inverse variance feature of the gray level co-occurrence matrix;

(c) the large dependence high gray level emphasis feature of the gray level dependence matrix;

case two: in the wavelet high-high-high (HHH) sub-band imaging mode:

(a) the contrast feature of the neighbouring gray tone difference matrix;

(b) the gray level non-uniformity normalized feature of the gray level size zone matrix;

(c) the long run high gray level emphasis feature of the gray level run length matrix;

(d) the mean feature of the first-order statistics;

case three: in the three-dimensional Laplacian of Gaussian imaging mode with a sigma of 1.5 mm:

(a) the 90th percentile feature of the first-order statistics;

(b) the kurtosis feature of the first-order statistics;

case four: in the three-dimensional Laplacian of Gaussian imaging mode with a sigma of 0.5 mm:

(a) the dependence non-uniformity normalized feature of the gray level dependence matrix;

(b) the maximal correlation coefficient feature of the gray level co-occurrence matrix;

case five: in the wavelet high-low-high (HLH) sub-band imaging mode:

(a) the inverse variance feature of the gray level co-occurrence matrix;

(b) the cluster shade feature of the gray level co-occurrence matrix;

case six: in the wavelet low-low-low (LLL) sub-band imaging mode:

(a) the inverse variance feature of the gray level co-occurrence matrix;

(b) the small area high gray level emphasis feature of the gray level size zone matrix.

Preferably, in step S31, the conventional feature processing includes the following steps:

a) sex: male is 1 and female is 0;

b) age: 0-10 years is 0.1, 10-20 years is 0.2, 20-30 years is 0.3, 30-40 years is 0.4, 40-50 years is 0.5, 50-60 years is 0.6, 60-70 years is 0.7, 70-80 years is 0.8, 80-90 years is 0.9, and over 90 years is 1;

c) postoperative time: the actual number of months m divided by 60.
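The three encodings above can be sketched in a few lines of Python. The function names are hypothetical, and one detail is an assumption: the text does not say which bucket a boundary age such as exactly 10 or exactly 90 belongs to, so the sketch places boundary ages in the upper bucket.

```python
def encode_sex(sex):
    """Male -> 1, female -> 0, as listed in item a)."""
    mapping = {"male": 1.0, "female": 0.0}
    return mapping[sex.lower()]


def encode_age(age_years):
    """Ten-year buckets mapped to 0.1, 0.2, ..., 0.9, and 1 from age 90 up.

    Assumption: buckets are half-open, so age 10 falls in the 10-20 bucket.
    """
    if age_years < 0:
        raise ValueError("age must be non-negative")
    decade = min(int(age_years // 10), 9)
    return (decade + 1) / 10


def encode_postoperative_time(months):
    """Actual number of months after surgery divided by 60, as in item c)."""
    return months / 60


def encode_conventional_features(sex, age_years, months):
    """Return the three conventional features in a fixed order."""
    return [
        encode_sex(sex),
        encode_age(age_years),
        encode_postoperative_time(months),
    ]


if __name__ == "__main__":
    print(encode_conventional_features("female", 63, 18))
```

Dividing by 60 keeps the postoperative time in the same 0 to 1 range as the other two features, since the recurrence time is limited to 60 months.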

Preferably, in step S32, the data set partition selects the samples whose sequence numbers form an arithmetic progression, i.e., the 3rd, 6th, 9th, 12th, 15th, 18th, 21st, 24th, 27th, 30th, ... samples, as the test set, and the remaining data as the training set.

Preferably, in step S411, the BP neural network includes the following contents:

a) a 5-layer network structure is selected, namely an input layer, hidden layer 1, hidden layer 2, hidden layer 3 and an output layer, denoted Lⁱⁿ, L^y1, L^y2, L^y3, L^out;

b) number of neurons in the 5 layers: sⁱⁿ, s^y1, s^y2, s^y3, s^out respectively, where s^y1 takes values in [16, 30], s^y2 in [8, 12] and s^y3 in [3, 5];

c) initial network weights: random values;

d) activation function: the sigmoid function, calculated as f(x) = 1 / (1 + e^(-x));

e) error function: the sum of squared errors (SSE);

f) learning rate: a value in the range [0.1, 0.5].

Preferably, in step S412, the key parameters involved in the random forest are set as follows:

the variable sampling value of each iteration is set to be 10;

the number of decision trees contained in the random forest was set to 3000.

The invention has the beneficial effects that:

(1) based on soft tissue sarcoma patient samples collected by a hospital, calculating recurrence probability values of soft tissue sarcoma in three-year period and five-year period by using the thinking of sample sampling, and converting the recurrence probability values by combining recurrence time data to obtain accurate and reliable recurrence probability of individual soft tissue sarcoma patients;

(2) the method comprises the steps of extracting 33 typical characteristics such as age, sex and Magnetic Resonance Imaging (MRI) images by using a data set of a soft tissue sarcoma patient, establishing a BP neural network and a random forest model to realize mapping of the characteristics and recurrence probability values, and determining a final soft tissue sarcoma recurrence probability prediction model according to the difference between a predicted value and a true value.

Drawings

FIG. 1 is a flow diagram of the present invention.

FIG. 2 is a flow diagram of the present invention.

Detailed Description

The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

Example 1:

As shown in FIG. 1 and FIG. 2, the machine-learning-based method for predicting the recurrence probability of soft tissue sarcoma according to the present invention proceeds as follows. First, the recurrence probability is calculated from the basic soft tissue sarcoma sample data; second, conventional features and image features in the sample data set are screened; third, conventional feature processing, image feature processing and data set division are performed on the sample data set; finally, a BP neural network model and a random forest are combined to construct a recurrence probability prediction model.

The invention specifically comprises the following steps:

step S1: calculating the recurrence probability based on sample data:

The accurate recurrence probability of a single soft tissue sarcoma patient is difficult to know directly, so information on a sufficient number of soft tissue sarcoma patients is collected and converted to obtain the recurrence probability of an individual patient.

First, samples {D₁, D₂, D₃, ..., D_n} of soft tissue sarcoma patients are collected, with at least 100 samples (n ≥ 100). The information in each sample includes personal information, pathological characteristics, image features, whether the patient relapsed within 3 years after surgery, whether the patient relapsed within 5 years after surgery, and the like.

Second, the probability of recurrence was calculated for each sample as follows:

(1) For sample i, all subsamples that contain sample i are formed, each subsample containing a fixed number of samples.

(2) For each subsample, the 3-year recurrence probability and the 5-year recurrence probability of sample i within that subsample are calculated from n_3-r and n_5-r, which are, respectively, the number of patients in the subsample who relapsed within 3 years and the number who relapsed within 5 years.

(3) The recurrence probability of sample i is then calculated from its subsamples.

(4) The three-year recurrence probability and the five-year recurrence probability are thereby known for all samples {D₁, D₂, D₃, ..., D_n}.

(5) The three-year recurrence probability and the five-year recurrence probability are each converted using the recurrence time t (t₁, t₂, ..., t_n ∈ [1, 60]; for example, t = 10 denotes recurrence 10 months after surgery).

step S2: characteristic screening for soft tissue sarcoma recurrence:

The features relevant to soft tissue sarcoma recurrence fall into two main categories: conventional features and medical imaging features. The features screened by the invention as the basis for calculating the recurrence probability are as follows:

(I) Conventional features

(1) Gender, (2) age, (3) postoperative time (months)

(II) Medical imaging features

The invention extracts 30 image features from MRI images obtained by nuclear magnetic resonance equipment, specifically:

In T1-weighted imaging, wavelet low-low-low sub-band (wavelet-LLL) imaging mode:

(1) Large Area High Gray Level Emphasis feature of the gray level size zone matrix (GLSZM);

(2) Small Area High Gray Level Emphasis feature of the gray level size zone matrix (GLSZM);

In T1-weighted imaging, wavelet low-low-high sub-band (wavelet-LLH) imaging mode:

(3) Coarseness feature of the neighbouring gray tone difference matrix (NGTDM);

(4) Total Energy feature of the first-order statistics (firstorder);

In T1-weighted imaging, wavelet high-low-low sub-band (wavelet-HLL) imaging mode:

(5) Small Dependence Low Gray Level Emphasis feature of the gray level dependence matrix (GLDM);

In T1-weighted imaging, wavelet high-low-high sub-band (wavelet-HLH) imaging mode:

(6) Large Area High Gray Level Emphasis feature of the gray level size zone matrix (GLSZM);

(7) Small Area High Gray Level Emphasis feature of the gray level size zone matrix (GLSZM);

In T1-weighted imaging, three-dimensional Laplacian of Gaussian imaging mode with a sigma of 0.5 mm (log-sigma-0-5-mm-3D):

(8) Dependence Non-Uniformity Normalized feature of the gray level dependence matrix (GLDM);

(9) Maximal Correlation Coefficient (MCC) feature of the gray level co-occurrence matrix (GLCM);

(10) Kurtosis feature of the first-order statistics (firstorder);

In T1-weighted imaging, three-dimensional Laplacian of Gaussian imaging mode with a sigma of 1.5 mm (log-sigma-1-5-mm-3D):

(11) Dependence Non-Uniformity Normalized feature of the gray level dependence matrix (GLDM);

(12) Kurtosis feature of the first-order statistics (firstorder);

In T1-weighted imaging, original imaging mode:

(13) Inverse Variance feature of the gray level co-occurrence matrix (GLCM);

(14) Large Dependence High Gray Level Emphasis feature of the gray level dependence matrix (GLDM);

(15) Large Area High Gray Level Emphasis feature of the gray level size zone matrix (GLSZM);

In T2-weighted imaging, original imaging mode:

(16) Elongation feature of the shape (shape);

(17) Inverse Variance feature of the gray level co-occurrence matrix (GLCM);

(18) Large Dependence High Gray Level Emphasis feature of the gray level dependence matrix (GLDM);

In T2-weighted imaging, wavelet high-high-high sub-band (wavelet-HHH) imaging mode:

(19) Contrast feature of the neighbouring gray tone difference matrix (NGTDM);

(20) Gray Level Non-Uniformity Normalized feature of the gray level size zone matrix (GLSZM);

(21) Long Run High Gray Level Emphasis feature of the gray level run length matrix (GLRLM);

(22) Mean feature of the first-order statistics (firstorder);

In T2-weighted imaging, three-dimensional Laplacian of Gaussian imaging mode with a sigma of 1.5 mm (log-sigma-1-5-mm-3D):

(23) 90th percentile (90Percentile) feature of the first-order statistics (firstorder);

(24) Kurtosis feature of the first-order statistics (firstorder);

In T2-weighted imaging, three-dimensional Laplacian of Gaussian imaging mode with a sigma of 0.5 mm (log-sigma-0-5-mm-3D):

(25) Dependence Non-Uniformity Normalized feature of the gray level dependence matrix (GLDM);

(26) Maximal Correlation Coefficient (MCC) feature of the gray level co-occurrence matrix (GLCM);

In T2-weighted imaging, wavelet high-low-high sub-band (wavelet-HLH) imaging mode:

(27) Inverse Variance feature of the gray level co-occurrence matrix (GLCM);

(28) Cluster Shade feature of the gray level co-occurrence matrix (GLCM);

In T2-weighted imaging, wavelet low-low-low sub-band (wavelet-LLL) imaging mode:

(29) Inverse Variance feature of the gray level co-occurrence matrix (GLCM);

(30) Small Area High Gray Level Emphasis feature of the gray level size zone matrix (GLSZM);

step S3: sample data processing based on features:

From steps S1 and S2, the conventional features, image features, 3-year recurrence probability and 5-year recurrence probability of all samples in the collected sample set {D₁, D₂, D₃, ..., D_n} are obtained. The conventional features and the image features are processed as follows:

(1) Conventional feature processing

a) Sex: male is 1 and female is 0

b) Age: 0.1 for 0-10 years, 0.2 for 10-20 years, 0.3 for 20-30 years, 0.4 for 30-40 years, 0.5 for 40-50 years, 0.6 for 50-60 years, 0.7 for 60-70 years, 0.8 for 70-80 years, 0.9 for 80-90 years, and 1 for over 90 years

c) Postoperative time: the actual number of months m divided by 60 (m/60)

(2) Image feature processing

For every image feature of the samples {D₁, D₂, D₃, ..., D_n}, its feature values are normalized.

(3) data set partitioning

The data set is divided into a test set and a training set; the training set is used to train the machine learning algorithm, and the test set is used to evaluate its performance.

The samples are sorted from largest to smallest by 3-year recurrence probability or 5-year recurrence probability; the samples at sequence numbers 3, 6, 9, 12, 15, 18, 21, 24, 27, 30, ... (an arithmetic progression) are selected as the test set, and the remaining samples form the training set.
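The partitioning rule can be sketched as follows: sort in descending order of recurrence probability, then send every third position to the test set. The function name, the `step` parameter and the `(identifier, probability)` sample layout are illustrative assumptions, not part of the described method.

```python
def split_by_rank(samples, probability_of, step=3):
    """Sort samples by recurrence probability (descending) and split them.

    Positions step, 2*step, 3*step, ... (1-based) go to the test set;
    all other positions go to the training set.
    """
    ranked = sorted(samples, key=probability_of, reverse=True)
    train, test = [], []
    for position, sample in enumerate(ranked, start=1):
        if position % step == 0:
            test.append(sample)
        else:
            train.append(sample)
    return train, test


if __name__ == "__main__":
    # Hypothetical (identifier, 3-year recurrence probability) pairs.
    data = [("a", 0.2), ("b", 0.9), ("c", 0.5),
            ("d", 0.7), ("e", 0.1), ("f", 0.4)]
    train_set, test_set = split_by_rank(data, probability_of=lambda s: s[1])
    print("train:", train_set)
    print("test:", test_set)
```

Selecting every third sample from the ranked list gives a test set that spans the whole range of recurrence probabilities, holding out roughly one third of the data.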

Step S4: recurrence probability prediction based on machine learning model:

From steps S1, S2 and S3, a complete data set of all samples is obtained. The method uses a BP neural network (back propagation neural network) and a random forest to map the sample features (conventional features and image features) to the 3-year recurrence probability (or the 5-year recurrence probability).

(1) Model training

1) BP neural network

a) A 5-layer network structure is selected, namely an input layer, hidden layer 1, hidden layer 2, hidden layer 3 and an output layer, denoted Lⁱⁿ, L^y1, L^y2, L^y3, L^out;

b) The numbers of neurons in the layers are sⁱⁿ, s^y1, s^y2, s^y3, s^out respectively, where sⁱⁿ = 33 and s^out = 1 correspond to the 33 feature values and the single output (3-year or 5-year recurrence probability), s^y1 takes values in [16, 30], s^y2 in [8, 12] and s^y3 in [3, 5];

c) Initial network weights: random values;

d) Activation function: the sigmoid function;

e) Error function: the sum of squared errors (SSE);

f) Learning rate: a value in the range [0.1, 0.5].
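A network with these settings can be sketched in plain Python as below. This is an illustrative sketch, not the inventors' implementation: the class name, the uniform weight range of [-0.5, 0.5], the use of bias terms and the per-sample update schedule are all assumptions, and only the layer layout, random initialisation, sigmoid activation, SSE error and learning-rate range come from the text.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


class BPNetwork:
    """Fully connected sigmoid network trained by backpropagation on SSE."""

    def __init__(self, layer_sizes, learning_rate=0.1, seed=0):
        rng = random.Random(seed)
        self.learning_rate = learning_rate
        pairs = list(zip(layer_sizes[:-1], layer_sizes[1:]))
        # weights[l][j][i] connects unit i of layer l to unit j of layer l + 1.
        self.weights = [[[rng.uniform(-0.5, 0.5) for _ in range(n_in)]
                         for _ in range(n_out)] for n_in, n_out in pairs]
        self.biases = [[rng.uniform(-0.5, 0.5) for _ in range(n_out)]
                       for _, n_out in pairs]

    def forward(self, inputs):
        """Return the activations of every layer, input layer included."""
        activations = [list(inputs)]
        for layer_w, layer_b in zip(self.weights, self.biases):
            previous = activations[-1]
            activations.append([
                sigmoid(sum(w * a for w, a in zip(row, previous)) + b)
                for row, b in zip(layer_w, layer_b)
            ])
        return activations

    def predict(self, inputs):
        return self.forward(inputs)[-1]

    def train_step(self, inputs, targets):
        """One gradient-descent update on a single sample."""
        activations = self.forward(inputs)
        outputs = activations[-1]
        # Gradient of SSE = sum((o - t)^2) w.r.t. the output pre-activations.
        deltas = [2.0 * (o - t) * o * (1.0 - o)
                  for o, t in zip(outputs, targets)]
        for layer in reversed(range(len(self.weights))):
            previous = activations[layer]
            layer_w = self.weights[layer]
            upstream = []
            if layer > 0:
                # Back-propagate using the weights before they are updated.
                upstream = [
                    sum(layer_w[j][i] * deltas[j] for j in range(len(layer_w)))
                    * previous[i] * (1.0 - previous[i])
                    for i in range(len(previous))
                ]
            for j, row in enumerate(layer_w):
                for i in range(len(row)):
                    row[i] -= self.learning_rate * deltas[j] * previous[i]
                self.biases[layer][j] -= self.learning_rate * deltas[j]
            deltas = upstream


if __name__ == "__main__":
    rng = random.Random(1)
    features = [rng.random() for _ in range(33)]  # hypothetical sample
    net = BPNetwork([33, 16, 8, 3, 1], learning_rate=0.3)
    for _ in range(200):
        net.train_step(features, [0.8])
    print(net.predict(features)[0])
```

The hidden-layer sizes 16, 8 and 3 are the lower ends of the stated ranges; any values within [16, 30], [8, 12] and [3, 5] fit the same constructor.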

2) Random forest

The key parameter settings involved in the algorithm are as follows:

the variable sampling value of each iteration is set to be 10;

the number of decision trees contained in the random forest is set to 3000;

(2) model evaluation and determination

All test samples, whose true three-year and five-year recurrence probabilities are known, are input to the trained neural network and the trained random forest respectively to obtain the predicted three-year and five-year recurrence probabilities.

The differences v³ and v⁵ between the predicted and true values for three years and five years are then calculated.

The larger the value of v³ or v⁵, the larger the difference between the predicted and true values, that is, the larger the error of the corresponding model (neural network or random forest) and the worse its performance.

Among the values v^ANN and v^RF obtained for the models, the minimum min{v^ANN, v^RF} is selected, and the corresponding model is the soft tissue sarcoma recurrence probability prediction model of the invention. The method can be extended to other fields, regions and sample sets.
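The selection rule can be sketched as below. The exact formula for the difference v is not reproduced in this text, so the sketch assumes a mean absolute difference between predicted and true values; the function names and the example numbers are hypothetical.

```python
def mean_absolute_difference(predicted, actual):
    """Assumed stand-in for v: average |predicted - true| over test samples."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)


def select_model(predictions_by_model, actual):
    """Return the name of the model with the smallest v, plus all v values."""
    errors = {
        name: mean_absolute_difference(predicted, actual)
        for name, predicted in predictions_by_model.items()
    }
    best = min(errors, key=errors.get)
    return best, errors


if __name__ == "__main__":
    # Hypothetical true and predicted 3-year recurrence probabilities.
    true_values = [0.2, 0.6, 0.9]
    predictions = {
        "ANN": [0.25, 0.5, 0.8],
        "RF": [0.2, 0.7, 0.9],
    }
    best_model, all_errors = select_model(predictions, true_values)
    print(best_model, all_errors)
```

The same function can be called once with the three-year values and once with the five-year values, giving one selected model per horizon.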

The invention has the following effects: (1) based on soft tissue sarcoma patient samples collected by a hospital, the recurrence probability values of the soft tissue sarcoma in three-year period and five-year period are calculated for the individual samples by using the thinking of sample sampling, and the recurrence probability values are converted by combining the recurrence time data, so that the accurate and reliable recurrence probability of the individual soft tissue sarcoma patient is obtained. (2) The method comprises the steps of extracting 33 typical characteristics such as age, sex and Magnetic Resonance Imaging (MRI) images by using a data set of a soft tissue sarcoma patient, establishing a BP neural network and a random forest model to realize mapping of the characteristics and recurrence probability values, and determining a final soft tissue sarcoma recurrence probability prediction model according to the difference between a predicted value and a true value.

The invention can be widely applied to medical image processing occasions.

It is noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus.

Although embodiments of the present invention have been shown and described, it will be appreciated by those skilled in the art that changes, modifications, substitutions and alterations can be made in these embodiments without departing from the principles and spirit of the invention, the scope of which is defined in the appended claims and their equivalents.

Claims

1. A soft tissue sarcoma recurrence probability prediction method based on machine learning is characterized by comprising the following steps:

S121: for sample i, forming all subsamples that contain sample i, each subsample containing a fixed number of samples;

S122: for each subsample, calculating the 3-year recurrence probability and the 5-year recurrence probability of sample i within that subsample from n_3-r and n_5-r, which are, respectively, the number of patients in the subsample who relapsed within 3 years and the number who relapsed within 5 years;

s123: calculating the recurrence probability of sample i, namely:

And probability of recurrence in five years

S3: feature-based sample data processing: from steps S1 and S2, the conventional features, image features, 3-year recurrence probability and 5-year recurrence probability of all samples in the collected sample set {D₁, D₂, D₃, ..., D_n} are obtained, and the conventional features and image features are processed in the following steps:

s31: processing conventional characteristics;

Its characteristic value

Normalization is performed, namely:

s411: a BP neural network;

s412: random forests;

And probability of recurrence in five years

And the five-year recurrence probability prediction value

2. The method according to claim 1, wherein the step S11 of collecting sample information of soft tissue sarcoma patient includes: personal information, pathological characteristics, image characteristics, whether the patient relapses in 3 years after the operation and whether the patient relapses in 5 years after the operation.

3. The method of predicting probability of recurrence of soft tissue sarcoma based on machine learning of claim 1, wherein the characteristics of recurrence of soft tissue sarcoma in step S2 include:

s21: routine characteristics include gender, age, and post-operative time;

4. The method as claimed in claim 3, wherein the MRI images obtained by MRI apparatus in step S22 are divided into T1 weighted imaging and T2 weighted imaging according to different imaging modes.

5. The method of claim 3, wherein in step S22, T1 weighted imaging includes the following steps:

case two: in wavelet-low high frequency sub-band imaging mode:

(a) roughness characteristics of adjacent gray level difference matrices;

(b) total energy characteristics of the first order statistics;

case three: in wavelet-high-low-frequency subband imaging mode:

case four: in wavelet-high-low-high-frequency sub-band imaging mode:

case five: under the three-dimensional imaging mode of the 5mm Laplacian:

(c) a kurtosis characteristic of the first order statistic;

case six: under a 15mm Laplacian three-dimensional imaging mode:

(b) a kurtosis characteristic of the first order statistic;

case seven: in the original imaging mode:

(a) the inverse variance characteristic of the gray level co-occurrence matrix;

(c) large area high gray level factor characteristics of the gray area matrix.

6. The method of claim 3, wherein in step S22, T2 weighted imaging includes the following steps:

case one: in the original imaging mode:

(a) elongation characteristics of the shape;

(b) the inverse variance characteristic of the gray level co-occurrence matrix;

case two: in wavelet-high frequency sub-band imaging mode:

(a) contrast characteristics of adjacent gray level difference matrices;

(b) non-uniform normalization of gray levels of the gray level area matrix;

(d) mean feature of first order statistics

Case three: under a 15mm Laplacian three-dimensional imaging mode:

(a) a 90 quantile value feature of the first order statistic;

(b) a kurtosis characteristic of the first order statistic;

case four: under the three-dimensional imaging mode of the 5mm Laplacian:

case five: in wavelet-high-low-high-frequency sub-band imaging mode:

(a) the inverse variance characteristic of the gray level co-occurrence matrix;

(b) clustering shadow features of the gray level co-occurrence matrix;

case six: in wavelet-low frequency subband imaging mode:

(a) the inverse variance characteristic of the gray level co-occurrence matrix;

7. The method of predicting probability of recurrence of soft tissue sarcoma based on machine learning of claim 1, wherein the routine characteristic processing in step S31 comprises the following steps:

a) sex: male 1 and female 0;

8. The method of claim 7, wherein in step S32, the data set division selects the samples whose sequence numbers form an arithmetic progression, i.e. the 3rd, 6th, 9th, 12th, 15th, 18th, 21st, 24th, 27th, 30th, ... samples, as the test set, and the remaining data as the training set.

9. The method of predicting probability of recurrence of soft tissue sarcoma based on machine learning of claim 1, wherein in step S411, the BP neural network comprises the following contents:

c) Network initial weight: taking a random value;

e) Error function: the sum of squared errors (SSE);

f) learning rate: the value range is [0.1,0.5 ].

10. The method of predicting probability of recurrence of soft tissue sarcoma based on machine learning of claim 1, wherein in step S412, the key parameters involved in random forest are set as follows:

the variable sampling value of each iteration is set to be 10;

the number of decision trees contained in the random forest was set to 3000.