CN111445992B - Method, device, medium and equipment for establishing prediction model - Google Patents
- Publication number: CN111445992B (application CN202010069021.4A)
- Authority: CN (China)
- Prior art keywords: observation, observation indexes, results, indexes, actual
- Legal status: Active (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Classifications
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16H—HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
- G16H50/00—ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
- G16H50/20—ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for computer-aided diagnosis, e.g. based on medical expert systems
- G16H50/70—ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for mining of medical data, e.g. analysing previous cases of other patients
Landscapes
- Engineering & Computer Science (AREA)
- Health & Medical Sciences (AREA)
- Medical Informatics (AREA)
- Public Health (AREA)
- Data Mining & Analysis (AREA)
- Biomedical Technology (AREA)
- Databases & Information Systems (AREA)
- Pathology (AREA)
- Epidemiology (AREA)
- General Health & Medical Sciences (AREA)
- Primary Health Care (AREA)
- Measuring And Recording Apparatus For Diagnosis (AREA)
Abstract
The present application relates to a method, a device, a medium and equipment for establishing a lymph node prediction model, wherein the method comprises the following steps: obtaining inspection results comprising N observation indexes, and marking the inspection results according to actual results; carrying out statistical analysis on the marked inspection results, and determining M observation indexes related to the actual results from the N observation indexes; and establishing a prediction model based on the M observation indexes. By carrying out statistical analysis on a large number of examination results, finding out observation indexes related to actual results, establishing a prediction model based on the observation indexes related to the actual results, and using the prediction model for analysis and prediction on subsequent examination results, the accuracy of judgment on the examination results can be improved, and the establishment of a clinical treatment scheme can be guided more effectively.
Description
Technical Field
This disclosure relates to the medical arts, and more particularly, to methods, apparatus, media, and devices for building predictive models.
Background
Colorectal cancer is one of the most common malignancies of the digestive tract worldwide. For example: (1) for early-stage rectal cancer, the presence or absence of lymph node metastasis is a critical factor in deciding whether minimally invasive transanal endoscopic local resection is feasible; (2) for patients scheduled for total mesorectal excision, the status of the lateral lymph nodes determines whether the extent of lymph node dissection should be enlarged during the operation; (3) preoperative neoadjuvant chemoradiotherapy is a recommended treatment regimen for locally advanced rectal cancer, and the presence or absence of lymph node metastasis is one of the key indicators for selecting suitable cases; (4) the presence or absence of regional lymph node metastasis also determines the delineation of the patient's radiotherapy target volume and the radiotherapy dose. Lymph node status in rectal cancer is therefore a key factor in choosing a treatment strategy and improving patient prognosis, and is a precondition for precision treatment. With the conventional pathological sampling and sectioning methods used in clinical practice, the lymph nodes shown on preoperative MRI (magnetic resonance imaging) cannot be matched one-to-one with the lymph nodes harvested for postoperative pathology. This makes retrospective review and improvement difficult, and it is the main reason the accuracy of imaging-based lymph node staging has remained low for many years.
In the related art, there is still no unified standard for determining the nature of lymph nodes by MRI. Lymph node size remains the most commonly used index in clinical research, and the diagnostic thresholds reported in the literature for maximum lymph node diameter range from 4 mm to 10 mm. However, because the sizes of benign and metastatic lymph nodes overlap to some extent, no size threshold achieves a satisfactory balance between sensitivity and specificity. Some scholars also recommend using the clarity of the lymph node boundary and the uniformity of the internal signal as indexes for distinguishing benign from metastatic lymph nodes, but because both indexes are highly subjective in application, results differ considerably between studies and overall accuracy is not high. No single imaging index, or pair of imaging indexes, can meet the requirements of the precision medicine era for evaluating lymph node status in rectal cancer.
Disclosure of Invention
To overcome the problems in the related art, a method, apparatus, medium, and device for building a predictive model are provided herein.
According to a first aspect herein, there is provided a method of building a predictive model, comprising:
obtaining inspection results comprising N observation indexes, and marking the inspection results according to actual results;
carrying out statistical analysis on the marked inspection results, and determining M observation indexes related to the actual results from the N observation indexes;
and establishing a prediction model based on the M observation indexes.
Performing statistical analysis on the marked inspection results and determining M observation indexes related to the actual results from the N observation indexes comprises:
counting the inspection results, performing single-factor analysis on the N observation indexes, determining the relationship between each observation index and the actual result, and determining the p-value of each observation index;
and selecting, from the N observation indexes, the observation indexes whose p-value is smaller than a target value together with specific indexes, and performing multi-factor analysis to determine M observation indexes related to the actual result.
The single-factor analysis of the N observation indexes includes:
selecting a single-factor analysis method according to the category of the observation index, wherein the single-factor analysis methods comprise: Fisher's exact test, the chi-square test, and the Mann-Whitney U test.
The multi-factor analysis comprises analysis by backward stepwise regression, and determining M observation indexes related to the actual results comprises:
inputting all of the observation indexes selected after single-factor analysis into an analysis model as variables;
removing each variable one at a time, and determining the change in the explanatory power of the analysis model after that variable is removed;
deleting, from the observation indexes, the observation index corresponding to the variable whose removal causes the smallest change in the explanatory power of the analysis model;
and repeating the above process until the Akaike information criterion (AIC) value of the analysis model is minimal, and determining the observation indexes corresponding to the remaining variables at that point as the M observation indexes related to the actual results.
Establishing a prediction model based on the M observation indexes includes:
according to the contribution of the variables corresponding to the M observation indexes to the actual result, calculating the relative contribution of the remaining variables with the variable of greatest contribution as the reference, converting the contributions to a 100-point scale so that each variable is normalized into a score, and obtaining the total score of the M observation indexes at different values.
The method for establishing the prediction model further comprises:
verifying the discrimination and calibration of the prediction model; and evaluating the clinical diagnostic efficacy of the prediction model.
According to another aspect herein, there is provided an apparatus for building a predictive model, comprising:
the acquisition module is used for acquiring inspection results comprising N observation indexes and marking the inspection results according to actual results;
the statistical analysis module is used for carrying out statistical analysis on the marked inspection results and determining M observation indexes related to the actual results from the N observation indexes;
and the model building module is used for building a prediction model based on the M observation indexes.
According to another aspect herein, there is provided a computer readable storage medium having stored thereon a computer program which when executed performs the steps of a method of building a predictive model.
According to another aspect herein, there is provided a computer device comprising a processor, a memory and a computer program stored on the memory, the processor implementing the steps of a method of building a predictive model when the computer program is executed.
According to the method, the observation indexes related to the actual results are found out through statistical analysis of a large number of examination results, a prediction model is built based on the observation indexes related to the actual results, and then the prediction model is used for analysis and prediction of subsequent examination results, so that the accuracy of judgment of the examination results can be improved, and the establishment of a clinical treatment scheme can be guided more effectively.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the application as claimed.
Drawings
The accompanying drawings, which are included to provide a further understanding of the disclosure, illustrate and explain the disclosure, and do not constitute a limitation on the disclosure. In the drawings:
FIG. 1 is a flowchart illustrating a method of building a predictive model, according to an exemplary embodiment.
FIG. 2 is a schematic diagram of a predictive model shown in accordance with an exemplary embodiment.
Fig. 3 is a schematic diagram of a receiver operating characteristic (ROC) curve, according to an exemplary embodiment.
Fig. 4 is a waterfall plot of each lymph node status shown according to an exemplary embodiment.
FIG. 5 is a schematic diagram of a calibration curve shown according to an exemplary embodiment.
FIG. 6 is a schematic diagram of a decision curve shown according to an exemplary embodiment.
FIG. 7 is a block diagram illustrating an apparatus for building a predictive model in accordance with an exemplary embodiment.
FIG. 8 is a block diagram of a computer device, according to an example embodiment.
Detailed Description
For the purposes of making the objects, technical solutions and advantages of the embodiments herein more apparent, the technical solutions in the embodiments herein will be clearly and completely described below with reference to the accompanying drawings in the embodiments herein, and it is apparent that the described embodiments are some, but not all, embodiments herein. All other embodiments, based on the embodiments herein, which a person of ordinary skill in the art would obtain without undue burden, are within the scope of protection herein. It should be noted that, without conflict, the embodiments and features of the embodiments herein may be arbitrarily combined with each other.
FIG. 1 is a flowchart illustrating a method of building a predictive model, according to an exemplary embodiment. As shown in fig. 1, the method for establishing the prediction model includes:
Step S11: obtaining inspection results comprising N observation indexes, and marking the inspection results according to the actual results.
Step S12: carrying out statistical analysis on the marked inspection results, and determining M observation indexes related to the actual results from the N observation indexes.
Step S13: building a prediction model based on the M observation indexes.
In this embodiment, for clarity, the method of building the prediction model is described using lymph node examination results as an example; this example is not limiting. The lymph node examination results shown by preoperative rectal MRI are obtained and matched one-to-one with the actual lymph node results obtained from postoperative pathology. An examination result whose lymph node is actually metastatic is marked positive, and an examination result whose lymph node is actually benign is marked negative. The examination results cover a plurality of observation indexes, such as lymph node location, minimum distance from the intestinal wall, chemical shift effect, long diameter, short diameter, long-to-short diameter ratio, and internal signal.
After a certain number of actual results and inspection results are collected, the marked inspection results can be used as samples for statistical analysis.
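One possible way to hold a marked inspection result in code is sketched below. The class name, field names, and the two example records are hypothetical and chosen only for illustration; the fields follow the observation indexes listed above, and the `metastatic` flag stands for the marking made from postoperative pathology.

```python
from dataclasses import dataclass


@dataclass
class LymphNodeExam:
    """One MRI examination result for a single lymph node (hypothetical layout)."""
    location: int              # location category, e.g. 1 to 4
    min_distance_mm: float     # minimum distance from the intestinal wall
    chemical_shift: str        # "regular", "irregular" or "vanished"
    long_diameter_mm: float
    short_diameter_mm: float
    internal_signal: str
    metastatic: bool           # marking: True = positive, False = negative

    @property
    def long_short_ratio(self) -> float:
        # Derived observation index: long-to-short diameter ratio.
        return self.long_diameter_mm / self.short_diameter_mm


def count_labels(samples):
    """Return (number marked positive, number marked negative)."""
    positives = sum(1 for s in samples if s.metastatic)
    return positives, len(samples) - positives


# Two made-up records, for illustration only.
samples = [
    LymphNodeExam(1, 2.0, "vanished", 9.0, 6.0, "uneven", True),
    LymphNodeExam(4, 30.0, "regular", 5.0, 4.0, "even", False),
]
print(count_labels(samples))
```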
In one embodiment, in step S12, performing statistical analysis on the marked inspection results and determining M observation indexes related to the actual results from the N observation indexes includes:
Counting the inspection results, performing single-factor analysis on the N observation indexes, determining the relationship between each observation index and the actual result, and determining the p-value of each observation index.
Selecting, from the N observation indexes, the observation indexes whose p-value is smaller than the target value together with specific indexes, performing multi-factor analysis, and determining M observation indexes related to the actual result.
For example, in the present embodiment, 324 sample data are obtained, and statistics are performed on all the sample data, as shown in table one:
Table one: single-factor analysis table
Note: values marked with superscript a are results of Fisher's exact test; values marked with superscript b are results of the chi-square test; values marked with superscript c are results of the Mann-Whitney U test.
According to the actual results, of the 324 samples, 42 have inspection results marked positive and 282 have inspection results marked negative.
Taking the observation indexes as variables, single-factor analysis is carried out according to the attribute of each observation index: the numbers of samples marked positive and marked negative are counted for each index, the relationship between each observation index and the actual result is determined, and the p-value of each observation index is determined. Taking the chemical shift effect in table one as an example, among the 42 samples marked positive, 1 was regular, 8 were irregular, and 33 had vanished. Among the 282 samples marked negative, 204 were regular, 33 were irregular, and 45 had vanished. Single-factor analysis of the variable "chemical shift effect" using the chi-square test gave a p-value of <0.001. Similarly, single-factor analysis is performed on each observation index and its p-value is determined.
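The chi-square result for the chemical shift effect can be checked with a short sketch. It assumes the positive samples split as 1 regular, 8 irregular, and 33 vanished (a reading of the counts above that sums to 42) and uses Pearson's chi-square test of independence without continuity correction. The function name is hypothetical, and the p-value formula used here, exp(-x/2), is exact only for 2 degrees of freedom, which is the case for a 2 x 3 table.

```python
import math


def chi_square_test(table):
    """Pearson chi-square test of independence for a table of counts.

    Returns (statistic, degrees of freedom, p-value). The p-value is only
    evaluated for 2 degrees of freedom, where it equals exp(-statistic / 2).
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)

    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            statistic += (observed - expected) ** 2 / expected

    dof = (len(table) - 1) * (len(table[0]) - 1)
    if dof != 2:
        raise ValueError("this sketch only evaluates the p-value for 2 degrees of freedom")
    return statistic, dof, math.exp(-statistic / 2)


# Rows: marked positive, marked negative.
# Columns: regular, irregular, vanished chemical shift effect.
counts = [[1, 8, 33], [204, 33, 45]]
statistic, dof, p_value = chi_square_test(counts)
print(f"chi-square = {statistic:.2f}, dof = {dof}, p < 0.001: {p_value < 0.001}")
```

Under these assumptions the p-value falls far below 0.001, in line with the value reported for this variable.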
In an embodiment, a single-factor analysis method is selected according to the category of the observation index; the single-factor analysis methods include Fisher's exact test, the chi-square test, and the Mann-Whitney U test. In this embodiment, the variables lymph node location and signal intensity were analyzed using Fisher's exact test; the variables chemical shift effect, boundary, and internal signal were analyzed using the chi-square test; and the remaining variables were analyzed using the Mann-Whitney U test.
After single-factor analysis has been carried out on each variable and the p-value of the corresponding observation index has been determined, the observation indexes whose p-value is smaller than the target value, together with specific indexes, are selected for multi-factor analysis. In this example, the observation indexes with a p-value of less than 0.1 and the specific indexes, a specific index being a variable recognized as clinically relevant to lymph node metastasis, were selected for multi-factor analysis. An observation index with a p-value smaller than 0.1 is considered to have a larger influence on the actual result. Setting the p-value threshold at 0.1 and also admitting specific indexes prevents variables that are not significant in single-factor analysis but are clinically meaningful from being excluded from the multi-factor analysis. Taking table one as an example, of the 9 variables in total, 7 have a p-value of less than 0.1: chemical shift effect, lymph node location, internal signal, signal intensity, minimum distance from intestinal wall, short diameter, and boundary. The other variables are considered less clinically relevant to lymph node metastasis. These 7 variables were therefore selected for multi-factor analysis.
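The screening rule can be sketched as a small filter. The function name is hypothetical, and the p-values below are placeholders invented for illustration, since the contents of table one are not reproduced here; only the rule itself (p-value below 0.1, plus any specific indexes) comes from the text.

```python
def screen_indexes(p_values, threshold=0.1, specific_indexes=()):
    """Keep indexes with p-value below the threshold, plus any specific indexes."""
    selected = [name for name, p in p_values.items() if p < threshold]
    for name in specific_indexes:
        # Specific indexes are kept even if they fail the p-value threshold.
        if name in p_values and name not in selected:
            selected.append(name)
    return selected


# Placeholder p-values (NOT taken from the patent), chosen so that the same
# 7 variables named in the text pass the 0.1 threshold.
placeholder_p_values = {
    "chemical shift effect": 0.0005,
    "lymph node location": 0.01,
    "internal signal": 0.02,
    "signal intensity": 0.03,
    "minimum distance from intestinal wall": 0.04,
    "short diameter": 0.001,
    "boundary": 0.05,
    "long diameter": 0.2,
    "long-to-short diameter ratio": 0.5,
}

selected = screen_indexes(placeholder_p_values)
print(len(selected), selected)

# Hypothetical case: forcing a clinically recognized index into the analysis.
with_specific = screen_indexes(placeholder_p_values, specific_indexes=("long diameter",))
print(len(with_specific))
```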
Among the selected factors, M observation indicators related to the actual results are determined.
In one embodiment, in step S12, the multi-factor analysis includes analysis by backward stepwise regression, and determining M observation indexes related to the actual result includes:
Inputting all of the observation indexes selected after single-factor analysis into an analysis model as variables.
Removing each variable one at a time, and determining the change in the explanatory power of the analysis model after that variable is removed.
Deleting, from the observation indexes, the observation index corresponding to the variable whose removal causes the smallest change in the explanatory power of the analysis model.
Repeating the above process until the Akaike information criterion (AIC) value of the analysis model is minimal, and determining the observation indexes corresponding to the remaining variables at that point as the M observation indexes related to the actual results.
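The elimination loop above can be sketched as follows. This is a minimal illustration, not the patented implementation: `backward_eliminate` and `toy_aic` are hypothetical names, the candidate to delete is chosen as the one whose removal yields the lowest AIC (used here as a stand-in for "smallest change in explanatory power"), and the per-variable log-likelihood gains are invented numbers. In practice `aic_of` would refit the regression model on the sample data for each candidate set.

```python
def backward_eliminate(variables, aic_of):
    """Backward stepwise selection driven by an AIC callback.

    `aic_of(list_of_variables)` must return the AIC of a model fitted with
    exactly those variables. Stops when no single removal lowers the AIC.
    """
    current = list(variables)
    best_aic = aic_of(current)
    while len(current) > 1:
        # Try removing each variable in turn and record the resulting AIC.
        candidates = [
            (aic_of([v for v in current if v != dropped]), dropped)
            for dropped in current
        ]
        candidate_aic, dropped = min(candidates)
        if candidate_aic >= best_aic:
            break  # AIC is already minimal; keep the remaining variables
        current.remove(dropped)
        best_aic = candidate_aic
    return current, best_aic


# Invented, additive log-likelihood gains per variable (illustration only).
TOY_GAIN = {
    "chemical shift effect": 40.0,
    "lymph node location": 6.0,
    "short diameter": 5.0,
    "minimum distance from intestinal wall": 3.0,
    "internal signal": 0.4,
    "signal intensity": 0.3,
    "boundary": 0.2,
}


def toy_aic(variables):
    # AIC = 2k - 2 ln(L), with one parameter per variable in this toy model.
    return 2 * len(variables) - 2 * sum(TOY_GAIN[v] for v in variables)


kept, final_aic = backward_eliminate(list(TOY_GAIN), toy_aic)
print(kept, final_aic)
```

With these invented gains the loop removes boundary, signal intensity, and internal signal in that order and then stops, leaving four variables.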
In this example, multi-factor analysis of the 7 selected variables (chemical shift effect, lymph node location, internal signal, signal intensity, minimum distance from intestinal wall, short diameter, and boundary) identified 4 variables strongly correlated with the actual result: chemical shift effect, lymph node location, short diameter, and minimum distance from intestinal wall, as shown in table two.
Table two: multi-factor analysis table
Variable | Regression coefficient | Odds ratio (95% confidence interval) | p-value |
1. Lymph node location | | | |
Position 1 | 0.000 | 1.000 | |
Position 2 | -0.500 | 0.606 (0.222, 1.604) | 0.318 |
Position 3 | -2.435 | 0.088 (0.004, 0.581) | 0.033 |
Position 4 | -2.698 | 0.067 (0.003, 0.414) | 0.016 |
2. Chemical shift effect | | | |
Regular | 0.000 | 1.000 | - |
Irregular | 4.195 | 66.381 (10.075, 1357.976) | <0.001 |
Vanished | 5.859 | 350.435 (55.849, 7441.252) | <0.001 |
3. Short diameter | 0.632 | 1.881 (1.277, 3.008) | 0.004 |
4. Minimum distance from intestinal wall | -0.103 | 0.902 (0.814, 0.990) | 0.038 |
5. Constant term | -6.407 | | |
In table two, an odds ratio less than 1 corresponds to a negative regression coefficient and indicates that the independent variable is negatively correlated with the outcome; an odds ratio greater than 1 corresponds to a positive regression coefficient and indicates that the independent variable is positively correlated with the outcome.
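The link between the sign of a coefficient and the position of its ratio relative to 1 can be checked numerically. The sketch below assumes the model is a logistic regression, in which the odds ratio equals the exponential of the regression coefficient; the dictionary names are hypothetical, and the numbers are the coefficients and ratios from table two. Small differences from the tabulated ratios are expected because the coefficients are printed to three decimals.

```python
import math

# Regression coefficients and reported ratios from table two.
COEFFICIENTS = {
    "position 2": -0.500,
    "position 3": -2.435,
    "position 4": -2.698,
    "irregular": 4.195,
    "vanished": 5.859,
    "short diameter": 0.632,
    "minimum distance from intestinal wall": -0.103,
}
REPORTED_RATIOS = {
    "position 2": 0.606,
    "position 3": 0.088,
    "position 4": 0.067,
    "irregular": 66.381,
    "vanished": 350.435,
    "short diameter": 1.881,
    "minimum distance from intestinal wall": 0.902,
}

# Odds ratio = exp(regression coefficient), assuming a logistic model.
odds_ratios = {name: math.exp(beta) for name, beta in COEFFICIENTS.items()}

for name, ratio in odds_ratios.items():
    direction = "positive" if COEFFICIENTS[name] > 0 else "negative"
    print(f"{name}: exp(beta) = {ratio:.3f}, reported {REPORTED_RATIOS[name]}, {direction} association")
```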
Through single-factor analysis and multi-factor analysis, the M variables that are related to the actual result and have a large influence on it are screened out from the variables corresponding to the observation indexes, and the observation indexes corresponding to these M variables are determined as the M observation indexes related to the actual result.
FIG. 2 is a schematic diagram of a predictive model shown in accordance with an exemplary embodiment. Referring to fig. 2, in step S13, building a prediction model based on M observation indexes includes:
According to the contribution of the variables corresponding to the M observation indexes to the actual result, the relative contribution of the remaining variables is calculated with the variable of greatest contribution as the reference, and the contributions are converted to a 100-point scale so that each variable is normalized into a score, giving the total score of the M observation indexes at different values.
Referring to table two, when the odds ratio is greater than 1, a larger odds ratio indicates a stronger association with lymph node metastasis; when the odds ratio is less than 1, a larger odds ratio (one closer to 1) indicates a weaker association. The odds ratio is the antilogarithm (exponential) of the regression coefficient, so the variables can be normalized using the regression coefficients. As table two shows, the odds ratios of the chemical shift effect are greater than 1 and are largest, at 350.4350, when the chemical shift effect is "vanished", which therefore contributes most to predicting lymph node metastasis. In this embodiment, the variable of greatest contribution, "chemical shift effect", is therefore used as the reference for calculating the 100-point score of each variable. Calculation results are given to 4 decimal places.
"Vanished" is assigned 100.0000 points and "regular" is assigned 0.0000 points. The regression coefficient of "vanished" is 5.8590 and that of "regular" is 0. The regression coefficient of "irregular" is 4.1950, so its score is 100.0000 × (4.1950/5.8590) = 71.5992 points.
The normalized scores of lymph node location are determined next. "Position 4" has the smallest regression coefficient, -2.6980, so its score is set to 0.0000 points; the reference remains the variable of greatest contribution, "chemical shift effect", whose "vanished" regression coefficient of 5.8590 corresponds to 100.0000 points.
The regression coefficient of "position 3" is -2.4350, and the corresponding score is:
100.0000 × (-2.4350/5.8590) + 100.0000 × (2.6980/5.8590) = 4.4888 points.
The regression coefficient of "position 2" is -0.5000, and the corresponding score is:
100.0000 × (-0.5000/5.8590) + 100.0000 × (2.6980/5.8590) = 37.5149 points.
The regression coefficient of "position 1" is 0.0000, and the corresponding score is:
100.0000 × (0.0000/5.8590) + 100.0000 × (2.6980/5.8590) = 46.0488 points.
The normalized score of the short diameter is determined next. The regression coefficient of the short diameter is 0.6320, a positive value, which indicates that the larger the short diameter, the larger its contribution to the result. The score is therefore set to 0.0000 points at a short diameter of 1 mm. With the variable of greatest contribution, "chemical shift effect", as the reference (maximum regression coefficient 5.8590), the short diameter coefficient is 100.0000 × (0.6320/5.8590) = 10.7868, which gives the formula: short diameter score = 10.7868 × (short diameter - 1) = 10.7868 × short diameter - 10.7868, with the short diameter in mm. Thus the short diameter score is 0 points at a short diameter of 1 mm and 97.0814 points at 10 mm (computed with the unrounded coefficient).
The normalized score of the minimum distance from the intestinal wall is determined next. Its regression coefficient is -0.1030, a negative value, which indicates that the greater the distance, the smaller its contribution to the result. With the variable of greatest contribution, "chemical shift effect", as the reference (maximum regression coefficient 5.8590), the coefficient of the minimum distance from the intestinal wall is 100.0000 × (-0.1030/5.8590) = -1.7580. Setting the score to 0 points at a minimum distance of 50 mm gives the formula: minimum distance from intestinal wall score = -1.7580 × minimum distance from intestinal wall + 87.9000, with the distance in mm. At a minimum distance of 0 mm, the score is 87.9000 points.
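The normalization just described can be reproduced from the regression coefficients in table two. The sketch below is illustrative only: the function and constant names are hypothetical, each categorical variable is anchored so that its lowest-scoring level receives 0 points, and each continuous variable gets a slope rounded to 4 decimal places with a chosen zero point (1 mm for short diameter, 50 mm for minimum distance from the intestinal wall).

```python
# Largest regression coefficient in table two ("vanished" chemical shift
# effect); it defines 100 points on the common scale.
REFERENCE_COEFFICIENT = 5.859


def category_points(coefficients):
    """Map each level of a categorical variable to points on the 100-point scale.

    The level with the smallest coefficient is anchored at 0 points.
    """
    lowest = min(coefficients.values())
    return {
        level: round(100.0 * (beta - lowest) / REFERENCE_COEFFICIENT, 4)
        for level, beta in coefficients.items()
    }


def linear_points(beta, zero_at):
    """Return (slope, intercept) of the score line for a continuous variable.

    `zero_at` is the value of the variable that is assigned 0 points.
    """
    slope = round(100.0 * beta / REFERENCE_COEFFICIENT, 4)
    intercept = round(-slope * zero_at, 4)
    return slope, intercept


chemical_shift = category_points({"regular": 0.0, "irregular": 4.195, "vanished": 5.859})
location = category_points({1: 0.0, 2: -0.500, 3: -2.435, 4: -2.698})
short_slope, short_intercept = linear_points(0.632, zero_at=1.0)
distance_slope, distance_intercept = linear_points(-0.103, zero_at=50.0)

print(chemical_shift)
print(location)
print(short_slope, short_intercept)
print(distance_slope, distance_intercept)
```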
The variables are normalized to different scores, and then the total score of the M observation indexes at different values is calculated.
In this example, total score = lymph node location score + chemical shift effect score + minor diameter score + minimum distance from intestinal wall score.
Through the above calculation process, the following predictive model can be established:
Lymph node location score: position 1 = 46.0488; position 2 = 37.5149; position 3 = 4.4888; position 4 = 0.0000.
Chemical shift effect score: regular = 0.0000; irregular = 71.5992; vanished = 100.0000.
Short diameter score = 10.7868 × short diameter - 10.7868.
Minimum distance from intestinal wall score = -1.7580 × minimum distance from intestinal wall + 87.9000.
Probability = 0.0000 × (total score)^3 + 0.0004 × (total score)^2 - 0.0729 × total score + 4.3789.
After the prediction model is obtained, it can be applied to lymph nodes to be predicted. The detection results for the observation indexes of the lymph node to be predicted are input into the prediction model; the score of the variable corresponding to each index is obtained from the specific index value in the detection results, and the total score of the lymph node to be predicted is calculated.
Based on the total score, the prediction model gives a probability value for whether the lymph node is metastatic, providing a reference for the doctor's judgment.
Before the prediction model is applied, it needs to be verified and evaluated. In one embodiment, the verification and evaluation of the prediction model includes: verifying the discrimination and calibration of the prediction model; and evaluating the clinical diagnostic efficacy of the prediction model.
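As a rough illustration of how the scoring step could be applied to one lymph node, the following Python sketch sums the four component scores listed above. The dictionary keys, the function name `total_score` and the example input are hypothetical; the numeric scores and linear formulas are copied from the text. The probability polynomial is deliberately not implemented, because its coefficients are printed to only four decimals (the cubic coefficient appears as 0.0000), which is likely too coarse to reproduce the original curve.

```python
# Category scores copied from the model listing; key names are illustrative.
LOCATION_SCORE = {1: 46.0488, 2: 37.5149, 3: 4.4888, 4: 0.0000}
CHEMICAL_SHIFT_SCORE = {"regular": 0.0000, "irregular": 71.5992, "vanishing": 100.0000}


def total_score(location: int, chemical_shift: str,
                short_diameter_mm: float, min_distance_mm: float) -> float:
    """Sum the four component scores for one lymph node (hypothetical helper)."""
    short_score = 10.7868 * short_diameter_mm - 10.7868
    distance_score = -1.7580 * min_distance_mm + 87.9000
    return (LOCATION_SCORE[location]
            + CHEMICAL_SHIFT_SCORE[chemical_shift]
            + short_score
            + distance_score)


# Made-up example input: position 2, irregular chemical shift effect,
# short diameter 6 mm, 10 mm from the intestinal wall.
example_total = total_score(2, "irregular", 6.0, 10.0)
print(round(example_total, 4))
```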
FIG. 3 is a schematic diagram of a receiver operating characteristic (ROC) curve shown according to an exemplary embodiment. In FIG. 3, the abscissa is specificity, the ordinate is sensitivity, 31 is the ROC curve, and 32 is the confidence interval. The area under the curve represents the discrimination capability of the model: the larger the area under the curve, the better the discrimination. Referring to FIG. 3, the area under the curve is 0.947 with a 95% confidence interval of 0.920-0.974, indicating that the model has very high discrimination.
FIG. 4 is a waterfall plot of the status of each lymph node shown according to an exemplary embodiment. In this embodiment, the 324 inspection results obtained in step S11 are again used and input into the prediction model for prediction, yielding a waterfall plot of the status of each lymph node. The plot shows which of the 324 lymph nodes in the group were correctly identified and which were incorrectly identified by the prediction model, with the prediction result for each lymph node represented by a bar. As shown in FIG. 4, the abscissa is the lymph node number and the ordinate is the relative score of each lymph node. Each bar represents the prediction for one lymph node: a downward bar indicates that the lymph node was predicted negative, and an upward bar indicates that it was predicted positive. A dark bar 41 indicates that the corresponding actual result is negative, and a gray bar 42 indicates that the corresponding actual result is positive.
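For readers unfamiliar with the area under the curve, the following Python sketch computes it directly from labels and scores using the pairwise-ranking interpretation (the fraction of positive/negative pairs in which the positive case scores higher, with ties counted as one half). The function name `roc_auc` and the sample data are made up for illustration; the disclosure does not state how its value of 0.947 or its confidence interval was computed.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of positives and negatives.

    labels: iterable of 0 (negative) or 1 (positive)
    scores: iterable of model scores, higher meaning "more likely positive"
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0      # positive ranked above negative
            elif p == n:
                wins += 0.5      # tie counts as half
    return wins / (len(positives) * len(negatives))


# Made-up toy data, not taken from the disclosure.
toy_auc = roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
print(toy_auc)
```

The quadratic loop is fine for a few hundred lymph nodes; a rank-based implementation would scale better for large samples.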
As can be seen in connection with fig. 3 and 4, the predictive model is able to correctly identify and distinguish the vast majority of lymph nodes.
A calibration curve is drawn to verify the calibration of the model. FIG. 5 is a schematic diagram of a calibration curve shown according to an exemplary embodiment. As shown in FIG. 5, the abscissa is the predicted probability and the ordinate is the actual probability. 51 is a reference line on which the predicted value equals the true value; 52 is the bias-corrected estimate curve; and 53 is the actual prediction curve. Referring to FIG. 5, the actual prediction curve 53 of the model fits the reference line 51 well, indicating that the prediction model predicts accurately.
A decision curve is drawn. FIG. 6 is a schematic diagram of a decision curve shown according to an exemplary embodiment. As shown in FIG. 6, the abscissa is the threshold probability and the ordinate is the net benefit. Line 61 shows the net benefit to the patient when all lymph nodes in the sample are treated as negative, line 62 shows the net benefit when all lymph nodes in the sample are treated as positive, and curve 63 shows the net benefit to the patient after prediction using the model. Referring to FIG. 6, over the threshold interval 0-1.0 the net benefit of model prediction is higher than the net benefit when all lymph nodes in the sample are treated as positive or all as negative, which indicates that the model has good clinical application value.
To further evaluate the model, it was assessed by two physicians. Using MRI examination results different from those used to establish the prediction model, and taking the actual results as the gold standard, physicians 1 and 2 each independently evaluated the lymph node properties in the MRI examination results according to their prior clinical experience; after 2 weeks, physicians 1 and 2 each independently evaluated the results again by applying the model criteria. The effect of the model is evaluated by comparing the accuracy of the physicians' experience-based evaluation of lymph node properties with the accuracy of their model-based evaluation.
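The idea behind such a curve can be sketched in Python by grouping cases into bins of predicted probability and comparing each bin's mean prediction with the observed positive rate. This is a simplified sketch under assumptions: the function name `calibration_points`, the equal-width binning and the toy data are illustrative, and the bias-corrected curve 52 (which would typically require resampling) is not reproduced here.

```python
def calibration_points(probs, outcomes, n_bins=5):
    """Return (mean predicted probability, observed positive rate) per non-empty bin.

    probs:    predicted probabilities in [0, 1]
    outcomes: actual results, 1 for positive and 0 for negative
    """
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, outcomes):
        # Equal-width bins; a probability of exactly 1.0 goes into the last bin.
        index = min(int(p * n_bins), n_bins - 1)
        bins[index].append((p, y))
    points = []
    for members in bins:
        if members:
            mean_pred = sum(p for p, _ in members) / len(members)
            observed = sum(y for _, y in members) / len(members)
            points.append((mean_pred, observed))
    return points


# Made-up toy data: a perfectly separated sample with two bins.
toy_points = calibration_points([0.1, 0.1, 0.9, 0.9], [0, 0, 1, 1], n_bins=2)
print(toy_points)
```

Points lying close to the diagonal (predicted = observed) correspond to the good fit between curve 53 and reference line 51 described in the text.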
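The disclosure does not spell out how net benefit is calculated. A common definition in decision curve analysis, assumed here, is net benefit = TP/n - (FP/n) × pt/(1 - pt), where pt is the threshold probability. The Python sketch below uses that definition; the function names and toy data are hypothetical.

```python
def net_benefit(probs, outcomes, threshold):
    """Net benefit of calling a case positive when its probability >= threshold.

    Uses the usual decision-curve definition (an assumption, not stated in
    the disclosure): TP/n - FP/n * threshold / (1 - threshold).
    """
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    n = len(outcomes)
    true_pos = sum(1 for p, y in zip(probs, outcomes) if p >= threshold and y == 1)
    false_pos = sum(1 for p, y in zip(probs, outcomes) if p >= threshold and y == 0)
    return true_pos / n - false_pos / n * threshold / (1.0 - threshold)


def net_benefit_all_positive(outcomes, threshold):
    """Net benefit when every case is treated as positive (line 62 in FIG. 6)."""
    return net_benefit([1.0] * len(outcomes), outcomes, threshold)


# Treating every case as negative (line 61) has net benefit 0 by definition.
NET_BENEFIT_ALL_NEGATIVE = 0.0

# Made-up toy data, not taken from the disclosure.
toy_probs = [0.9, 0.8, 0.3, 0.2]
toy_outcomes = [1, 1, 0, 0]
model_nb = net_benefit(toy_probs, toy_outcomes, 0.5)
all_pos_nb = net_benefit_all_positive(toy_outcomes, 0.5)
print(model_nb, all_pos_nb)
```

Evaluating these functions over a grid of thresholds and plotting the three results against the threshold gives a decision curve of the kind shown in FIG. 6.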
Table three: lymph node grade diagnostic efficacy
Table four: efficacy of diagnosing benign and malignant lymph node
As can be seen from Tables three and four, both physicians achieved higher efficacy in discriminating lymph node status when applying the model than when relying on empirical evaluation.
The verification and evaluation show that the prediction model can predict whether a lymph node is metastatic according to the examination results, providing a reference for the doctor's judgment.
According to this embodiment, on the premise that the lymph node examination results shown by rectal MRI correspond one-to-one with the actual lymph node results obtained by pathological harvesting, and starting from the characteristics of lymph nodes and the patterns of lymph node metastasis, the morphological characteristics of benign and malignant lymph nodes are fully combined with data characteristics reflecting tumor heterogeneity. Meaningful indexes are screened out by multi-factor analysis, and a lymph node prediction model is established and verified. This provides accurate reference data for clinical practice, improves the accuracy of judging lymph node properties before rectal cancer treatment, and more effectively guides the formulation of clinical treatment plans.
FIG. 7 is a block diagram illustrating an apparatus for building a predictive model, according to an exemplary embodiment. Referring to FIG. 7, the apparatus for building a prediction model includes an acquisition module 701, a statistical analysis module 702 and a model building module 703.
The acquiring module 701 is configured to acquire an inspection result including N observation indexes, and mark the inspection result according to an actual result;
the statistical analysis module 702 is configured to perform statistical analysis on the marked inspection result, and determine M observation indexes related to the actual result from the N observation indexes;
the model building module 703 is configured to build a prediction model based on the M observation indexes.
The specific manner in which each module of the apparatus in the above embodiment performs its operations has been described in detail in connection with the embodiments of the method and will not be repeated here.
FIG. 8 is a block diagram illustrating a computer device 800 for building a predictive model, according to an exemplary embodiment. For example, the computer device 800 may be provided as a server. Referring to FIG. 8, the computer device 800 includes a processor 801, the number of which may be set to one or more as required. The computer device 800 also includes a memory 802 for storing instructions, such as application programs, that can be executed by the processor 801. The number of memories may likewise be set to one or more as required, and the memory may store one or more application programs. The processor 801 is configured to execute the instructions to perform the method of building a predictive model described above.
It will be apparent to one of ordinary skill in the art that embodiments herein may be provided as a method, apparatus (device), or computer program product. Accordingly, the present disclosure may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present disclosure may take the form of a computer program product embodied on one or more computer-usable storage media having computer-usable program code embodied therein. Computer storage media includes volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information such as computer readable instructions, data structures, program modules or other data, including, but not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, digital Versatile Disk (DVD) or other optical disk storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to store the desired information and which can be accessed by a computer. Furthermore, as is well known to those of ordinary skill in the art, communication media typically embodies computer readable instructions, data structures, program modules or other data in a modulated data signal such as a carrier wave or other transport mechanism and includes any information delivery media.
The description herein is with reference to flowchart illustrations and/or block diagrams of methods, apparatus (devices) and computer program products according to embodiments herein. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
In this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that an article or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such article or apparatus. Without further limitation, an element defined by the phrase "comprising … …" does not exclude the presence of additional identical elements in an article or apparatus that comprises the element.
While preferred embodiments herein have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. It is therefore intended that the following claims be interpreted as including the preferred embodiments and all alterations and modifications as fall within the scope herein.
It will be apparent to those skilled in the art that various modifications and variations can be made herein without departing from the spirit and scope of the disclosure. Thus, given that such modifications and variations herein fall within the scope of the claims herein and their equivalents, such modifications and variations are intended to be included herein.
Claims (8)
1. A method of building a predictive model, comprising:
obtaining inspection results comprising N observation indexes, and marking the inspection results according to actual results, wherein each inspection result is marked as positive or negative and the observation indexes are indexes displayed in nuclear magnetic resonance images;
carrying out statistical analysis on the marked inspection results, and determining M observation indexes related to the actual results from the N observation indexes;
establishing a prediction model based on the M observation indexes, wherein the prediction model is used for predicting whether an actual result corresponding to the inspection result is negative or positive;
the establishing a prediction model based on the M observation indexes includes:
according to the contribution degree of the variables corresponding to the M observation indexes to the actual result, calculating the relative contribution degree of the remaining variables by taking the variable with the maximum contribution degree as a reference, and converting the different contribution degrees to a percentage scale so that the different variables are normalized into different scores, thereby obtaining the total score of the M observation indexes at different values;
wherein the contribution degree is obtained by calculating the regression coefficient and ratio of the variable corresponding to each observation index, and calculating from the regression coefficient and ratio the strength of the correlation between each observation index and the actual result being negative or positive, the correlation strength being the contribution degree.
2. The method of building a predictive model of claim 1 wherein said statistically analyzing the marked inspection results to determine M observations related to said actual results from said N observations comprises:
compiling statistics on the inspection results, carrying out single factor analysis on the N observation indexes, determining the relation between each observation index and the actual result, and determining the hypothesis-test probability value (P value) of each observation index;
and selecting, from the N observation indexes, the observation indexes whose probability value is smaller than a target value together with specific indexes, and performing multi-factor analysis to determine the M observation indexes related to the actual result.
3. The method of building a predictive model of claim 2 wherein said single factor analysis of said N observations comprises:
selecting different single factor analysis methods according to the category of the observation index, wherein the single factor analysis methods comprise: Fisher's exact test, the chi-square test, and the Mann-Whitney U test.
4. The method of building a predictive model of claim 2 wherein the multi-factor analysis comprises a backward stepwise regression analysis, the determining M observations related to actual results comprising:
inputting all of the observation indexes selected after single factor analysis into an analysis model as variables;
removing each variable one at a time, and determining the change in the explanatory power of the analysis model after the variable is removed;
deleting, from the selected observation indexes, the observation index corresponding to the variable whose removal causes the smallest change in the explanatory power of the analysis model;
and repeating the above process until the Akaike information criterion (AIC) value of the analysis model is at its minimum, and determining the observation indexes corresponding to the variables remaining at that point as the M observation indexes related to the actual results.
5. The method of building a predictive model of claim 1, further comprising:
verifying the discrimination and calibration of the prediction model; and evaluating the clinical diagnostic efficacy of the prediction model.
6. An apparatus for building a predictive model, comprising:
the acquisition module is used for acquiring inspection results comprising N observation indexes and marking the inspection results according to actual results, wherein each inspection result is marked as positive or negative and the observation indexes are indexes displayed in nuclear magnetic resonance images;
the statistical analysis module is used for carrying out statistical analysis on the marked inspection results and determining M observation indexes related to the actual results from the N observation indexes;
the model building module is used for building a prediction model based on the M observation indexes, and the prediction model is used for predicting whether an actual result corresponding to the inspection result is negative or positive;
the establishing a prediction model based on the M observation indexes includes:
according to the contribution degree of the variables corresponding to the M observation indexes to the actual result, calculating the relative contribution degree of the remaining variables by taking the variable with the maximum contribution degree as a reference, and converting the different contribution degrees to a percentage scale so that the different variables are normalized into different scores, thereby obtaining the total score of the M observation indexes at different values;
wherein the contribution degree is obtained by calculating the regression coefficient and ratio of the variable corresponding to each observation index, and calculating from the regression coefficient and ratio the strength of the correlation between each observation index and the actual result being negative or positive, the correlation strength being the contribution degree.
7. A computer readable storage medium, on which a computer program is stored, characterized in that the computer program, when executed, implements the steps of the method according to any one of claims 1-5.
8. A computer device comprising a processor, a memory and a computer program stored on the memory, characterized in that the processor implements the steps of the method according to any of claims 1-5 when the computer program is executed.
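By way of illustration only, the backward stepwise selection recited in claim 4 can be sketched in Python as a loop that repeatedly drops one variable until the information criterion stops improving. This sketch makes several assumptions that go beyond the claims: it selects the variable to drop by the resulting criterion value rather than by change in explanatory power, and it takes the model-fitting step as a caller-supplied function `aic_of` (hypothetical), so no particular regression library is implied. The toy criterion at the end is invented purely to exercise the loop.

```python
from typing import Callable, FrozenSet, Iterable


def backward_eliminate(variables: Iterable[str],
                       aic_of: Callable[[FrozenSet[str]], float]) -> FrozenSet[str]:
    """Backward elimination guided by an information criterion (lower is better).

    variables: candidate variable names kept after single factor analysis
    aic_of:    hypothetical callback that fits a model on the given subset
               and returns its criterion value
    """
    current = frozenset(variables)
    best = aic_of(current)
    while len(current) > 1:
        # Evaluate the model obtained by removing each variable in turn.
        candidates = [(aic_of(current - {name}), name) for name in sorted(current)]
        candidate_value, to_drop = min(candidates)
        if candidate_value >= best:
            break  # no removal improves the criterion, so stop
        current = current - {to_drop}
        best = candidate_value
    return current


# Invented toy criterion: each variable costs 2, each missing useful one costs 10.
USEFUL = frozenset({"location", "short_diameter"})


def toy_aic(subset: FrozenSet[str]) -> float:
    return 2.0 * len(subset) + 10.0 * len(USEFUL - subset)


selected = backward_eliminate(
    ["location", "short_diameter", "noise_a", "noise_b"], toy_aic)
print(sorted(selected))
```

In practice `aic_of` would fit a logistic regression on the chosen variables and return 2k - 2 ln(L), where k is the number of fitted parameters and L is the maximized likelihood.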
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010069021.4A CN111445992B (en) | 2020-01-21 | 2020-01-21 | Method, device, medium and equipment for establishing prediction model |
Publications (2)
Publication Number | Publication Date |
---|---|
CN111445992A CN111445992A (en) | 2020-07-24 |
CN111445992B CN111445992B (en) | 2023-11-03 |
Family
ID=71653930
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010069021.4A Active CN111445992B (en) | 2020-01-21 | 2020-01-21 | Method, device, medium and equipment for establishing prediction model |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111445992B (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113161000B (en) * | 2021-05-06 | 2024-05-28 | 复旦大学附属中山医院 | Prognosis scoring model of mixed cell type liver cancer and construction method thereof |
Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2007122418A (en) * | 2005-10-28 | 2007-05-17 | Bioinformatics Institute For Global Good Inc | Prediction method, prediction device, and prediction program |
WO2014110350A2 (en) * | 2013-01-11 | 2014-07-17 | Oslo Universitetssykehus Hf | Systems and methods for identifying polymorphisms |
CN107845409A (en) * | 2017-12-04 | 2018-03-27 | 泰康保险集团股份有限公司 | A kind of processing method and processing device of health data |
CN108345544A (en) * | 2018-03-27 | 2018-07-31 | 北京航空航天大学 | A kind of software defect distribution analysis of Influential Factors method based on complex network |
CN108693268A (en) * | 2018-05-21 | 2018-10-23 | 百迈康生物医药科技(广州)有限公司 | A kind of combination of metabolic marker object and its kit for predicting coronary heart disease prognosis |
CN108758969A (en) * | 2018-06-14 | 2018-11-06 | 河南科技大学 | A kind of handpiece Water Chilling Units fault detection method and system |
CN109665596A (en) * | 2018-12-14 | 2019-04-23 | 浙江工业大学 | Method for simultaneously optimizing COD (chemical oxygen demand) and ammonia nitrogen removing effects of biogas slurry by reverse osmosis membrane |
CN109767830A (en) * | 2018-12-13 | 2019-05-17 | 平安医疗健康管理股份有限公司 | Hospital evaluation method and Related product based on data analysis |
Also Published As
Publication number | Publication date |
---|---|
CN111445992A (en) | 2020-07-24 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||