CN114121291A - Disease grading prediction method and device, electronic equipment and storage medium
Disease grading prediction method and device, electronic equipment and storage medium
- Publication number
- CN114121291A (application number CN202111250817.0A)
- Authority
- CN
- China
- Prior art keywords
- target
- feature
- preset
- user
- features
- Prior art date
- Legal status
- Pending
Classifications
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16H—HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
- G16H10/00—ICT specially adapted for the handling or processing of patient-related medical or healthcare data
- G16H10/60—ICT specially adapted for the handling or processing of patient-related medical or healthcare data for patient-specific data, e.g. for electronic patient records
-
- A—HUMAN NECESSITIES
- A61—MEDICAL OR VETERINARY SCIENCE; HYGIENE
- A61B—DIAGNOSIS; SURGERY; IDENTIFICATION
- A61B5/00—Measuring for diagnostic purposes; Identification of persons
- A61B5/0033—Features or image-related aspects of imaging apparatus classified in A61B5/00, e.g. for MRI, optical tomography or impedance tomography apparatus; arrangements of imaging apparatus in a room
-
- A—HUMAN NECESSITIES
- A61—MEDICAL OR VETERINARY SCIENCE; HYGIENE
- A61B—DIAGNOSIS; SURGERY; IDENTIFICATION
- A61B5/00—Measuring for diagnostic purposes; Identification of persons
- A61B5/05—Detecting, measuring or recording for diagnosis by means of electric currents or magnetic fields; Measuring using microwaves or radio waves
- A61B5/055—Detecting, measuring or recording for diagnosis by means of electric currents or magnetic fields; Measuring using microwaves or radio waves involving electronic [EMR] or nuclear [NMR] magnetic resonance, e.g. magnetic resonance imaging
-
- A—HUMAN NECESSITIES
- A61—MEDICAL OR VETERINARY SCIENCE; HYGIENE
- A61B—DIAGNOSIS; SURGERY; IDENTIFICATION
- A61B5/00—Measuring for diagnostic purposes; Identification of persons
- A61B5/48—Other medical applications
- A61B5/4842—Monitoring progression or stage of a disease
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/0002—Inspection of images, e.g. flaw detection
- G06T7/0012—Biomedical image inspection
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B20/00—ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
- G16B20/50—Mutagenesis
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16H—HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
- G16H50/00—ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
- G16H50/70—ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for mining of medical data, e.g. analysing previous cases of other patients
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/10—Image acquisition modality
- G06T2207/10072—Tomographic images
- G06T2207/10088—Magnetic resonance imaging [MRI]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/30—Subject of image; Context of image processing
- G06T2207/30004—Biomedical image processing
- G06T2207/30096—Tumor; Lesion
Abstract
The embodiment of the invention provides a disease grading prediction method, a disease grading prediction device, electronic equipment and a storage medium. The method comprises the following steps: acquiring target image features, target gene features and target clinical features of a target user, wherein the target users comprise first users in a preset training set and second users in a preset test set; determining target features of the target user according to the target image features, the target gene features and the target clinical features; training a preset grading prediction model based on the target features and a preset label corresponding to the first user to obtain a target grading prediction model; and inputting the target features of the second user into the target grading prediction model to obtain the target grading corresponding to the second user. A target grading prediction model is obtained through training and automatic grading prediction is carried out based on the model, so that the interference of human factors is reduced and the accuracy of disease grading prediction is improved.
Description
Technical Field
The present invention relates to the field of medical image analysis, and in particular, to a disease classification prediction method, apparatus, electronic device, and storage medium.
Background
A brain tumor consists of abnormal cells that divide and grow uncontrollably in brain tissue; its incidence is high, its mortality rate exceeds 3 percent, and it seriously endangers human health. Glioma is one of the common brain tumors originating inside the cranium and can be divided into grades I-IV according to the World Health Organization classification, where grades I-II are low-grade gliomas and grades III-IV are high-grade gliomas. High-grade gliomas (HGGs), such as glioblastoma multiforme (GBM), have a mean survival time of 23 months, a two-year survival rate of 47.4%, and a four-year survival rate of only 18.5%. Low-grade gliomas (LGGs), such as oligodendroglioma and astrocytoma, have a ten-year survival rate of 57%. Accurate grading of brain glioma is a prerequisite for saving patients' lives and has positive clinical significance for treatment decisions, monitoring and management of chemoradiotherapy, and prognosis evaluation.
For brain tumors, magnetic resonance imaging (MRI) is a typical non-invasive imaging technique that can generate high-quality brain images without tissue damage or skull artifacts, can provide comprehensive information for brain tumor analysis, and is the main technical means for analyzing and processing brain tumors.
In the prior art, diseases such as brain tumors are generally graded qualitatively by a radiologist who combines personal experience with MRI images. Grading based on manual experience is strongly affected by human factors, misgrading occurs frequently, and the accuracy of disease grading is therefore low.
Disclosure of Invention
The embodiment of the invention provides a disease grading prediction method and device, electronic equipment and a storage medium, and aims to solve the problem that the accuracy of existing manual disease grading is low.
In order to solve the above problem, the embodiment of the present invention is implemented as follows:
in a first aspect, an embodiment of the present invention discloses a disease classification prediction method, including:
acquiring target image characteristics, target gene characteristics and target clinical characteristics of a target user; the target users comprise first users in a preset training set and second users in a preset test set;
determining a target feature of the target user according to the target image feature, the target gene feature and the target clinical feature;
training a preset hierarchical prediction model based on the target feature and a preset label corresponding to the first user to obtain a target hierarchical prediction model;
and inputting the target characteristics of the second user into the target grading prediction model to obtain the target grading corresponding to the second user.
Optionally, the obtaining of the target image feature of the target user includes:
acquiring a target image of the target user;
segmenting the target image according to a preset image segmentation algorithm, and determining a target area in the target image;
and extracting the image characteristics of the target area according to a preset image processing algorithm to obtain the target image characteristics of the target user.
Optionally, the acquiring target gene characteristics of the target user includes:
obtaining target genomics data of the target user;
selecting candidate genes from the target genomics data based on the type of the target disease; the candidate gene is related to the target disease and/or the mutation rate of the candidate gene in the target user is higher than a first preset threshold value;
and extracting the gene characteristics of the candidate genes to obtain the target gene characteristics of the target user.
Optionally, the acquiring the target clinical characteristics of the target user includes:
acquiring preset clinical data of the target user;
according to the type of the target disease, screening preset clinical data related to the target disease from the preset clinical data to obtain target clinical information;
and extracting the clinical characteristics of the target clinical information to obtain the target clinical characteristics of the target user.
Optionally, the determining the target feature of the target user according to the target image feature, the target gene feature and the target clinical feature includes:
merging the target image features, the target gene features and the target clinical features to obtain target total quantity features;
selecting target features of the target user from the target full-scale features according to a preset feature selection algorithm; the target features are the features with the importance degrees ranked at the top N bits in the target full-scale features; n is the target number of the target features; and N is an integer greater than 0.
Optionally, the merging the target image feature, the target gene feature and the target clinical feature includes:
vectorizing the target image features, the target gene features and the target clinical features respectively to obtain target image feature vectors, target gene feature vectors and target clinical feature vectors;
and merging the target image feature vector, the target gene feature vector and the target clinical feature vector to obtain a first target full-scale feature vector corresponding to the target full-scale feature.
Optionally, the selecting, according to a preset feature selection algorithm, a target feature of the target user from the target full-scale features includes:
normalizing the first target full-scale feature vector to obtain a second target full-scale feature vector corresponding to the first target full-scale feature;
determining the importance degree of each feature in the second target full-scale feature vector according to the preset feature selection algorithm;
determining a target number N corresponding to the target feature according to the importance degree and the preset feature selection algorithm;
and screening the feature with the importance degree ranked at the top N bits in the second target full-scale feature vector as the target feature.
Optionally, the determining the number N of the targets corresponding to the target feature includes:
sequentially selecting different numbers of features according to the ranking of the importance degrees;
inputting the features of each quantity into the preset feature selection algorithm for verification respectively, and determining the average accuracy corresponding to each feature quantity respectively;
and taking the feature quantity with the highest average accuracy as a target quantity N corresponding to the target feature.
In a second aspect, an embodiment of the present invention discloses a disease grading prediction apparatus, including:
the acquisition module is used for acquiring target image characteristics, target gene characteristics and target clinical characteristics of a target user; the target users comprise first users in a preset training set and second users in a preset test set;
a determination module, configured to determine a target feature of the target user according to the target image feature, the target gene feature, and the target clinical feature;
the training module is used for training a preset grading prediction model based on the target characteristics and a preset label corresponding to the first user to obtain a target grading prediction model;
and the input module is used for inputting the target characteristics of the second user into the target grading prediction model to obtain the target grading corresponding to the second user.
Optionally, the obtaining module is specifically configured to:
acquiring a target image of the target user;
segmenting the target image according to a preset image segmentation algorithm, and determining a target area in the target image;
and extracting the image characteristics of the target area according to a preset image processing algorithm to obtain the target image characteristics of the target user.
Optionally, the obtaining module is further specifically configured to:
obtaining target genomics data of the target user;
selecting candidate genes from the target genomics data based on the type of the target disease; the candidate gene is related to the target disease and/or the mutation rate of the candidate gene in the target user is higher than a first preset threshold value;
and extracting the gene characteristics of the candidate genes to obtain the target gene characteristics of the target user.
Optionally, the obtaining module is further specifically configured to:
acquiring preset clinical data of the target user;
according to the type of the target disease, screening preset clinical data related to the target disease from the preset clinical data to obtain target clinical information;
and extracting the clinical characteristics of the target clinical information to obtain the target clinical characteristics of the target user.
Optionally, the determining module is specifically configured to:
merging the target image features, the target gene features and the target clinical features to obtain target total quantity features;
selecting target features of the target user from the target full-scale features according to a preset feature selection algorithm; the target features are the features with the importance degrees ranked at the top N bits in the target full-scale features; n is the target number of the target features; and N is an integer greater than 0.
Optionally, the determining module is specifically configured to: vectorizing the target image features, the target gene features and the target clinical features respectively to obtain target image feature vectors, target gene feature vectors and target clinical feature vectors;
and merging the target image feature vector, the target gene feature vector and the target clinical feature vector to obtain a first target full-scale feature vector corresponding to the target full-scale feature.
Optionally, the determining module is further specifically configured to:
normalizing the first target full-scale feature vector to obtain a second target full-scale feature vector corresponding to the first target full-scale feature;
determining the importance degree of each feature in the second target full-scale feature vector according to the preset feature selection algorithm;
determining a target number N corresponding to the target feature according to the importance degree and the preset feature selection algorithm;
and screening the feature with the importance degree ranked at the top N bits in the second target full-scale feature vector as the target feature.
Optionally, the determining module is further specifically configured to:
sequentially selecting different numbers of features according to the ranking of the importance degrees;
inputting the features of each quantity into the preset feature selection algorithm for verification respectively, and determining the average accuracy corresponding to each feature quantity respectively;
and taking the feature quantity with the highest average accuracy as a target quantity N corresponding to the target feature.
In a third aspect, an embodiment of the present invention further provides an electronic device, which includes a processor, a memory, and a computer program stored on the memory and executable on the processor, wherein the computer program, when executed by the processor, implements the steps of the disease grading prediction method according to the first aspect.
In a fourth aspect, the present invention further provides a computer-readable storage medium, on which a computer program is stored, wherein the computer program, when executed by a processor, implements the steps of the disease grading prediction method according to the first aspect.
In the embodiment of the invention, the target image features, target gene features and target clinical features of a target user are obtained, the target users comprising first users in a preset training set and second users in a preset test set; the target features of the target user are determined according to the target image features, the target gene features and the target clinical features; a preset grading prediction model is trained based on the target features and a preset label corresponding to the first user to obtain a target grading prediction model; and the target features of the second user are input into the target grading prediction model to obtain the target grading corresponding to the second user. In this way, by introducing the target image features, target gene features and target clinical features of the target user, a target grading prediction model is obtained through training and automatic grading prediction is performed based on that model, so that the interference of human factors is reduced and the accuracy of disease grading prediction is improved.
Drawings
FIG. 1 is a flow chart illustrating the steps of a disease stratification prediction method according to an embodiment of the present invention;
FIG. 2 is a flow chart illustrating disease progression prediction according to an embodiment of the present invention;
fig. 3 is a block diagram showing a disease classification prediction apparatus according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Referring to fig. 1, a flow chart of the steps of a disease staging prediction method of the present invention is shown. The execution subject in the embodiment of the present invention may refer to a computing device, and may specifically refer to a medical device, an intelligent device, a server, and the like, which is not specifically limited in this embodiment of the present invention. The disease grading prediction method specifically comprises the following steps:
Step 101, acquiring target image features, target gene features and target clinical features of a target user; the target users comprise first users in a preset training set and second users in a preset test set.
In an embodiment of the present invention, the target user may refer to a target patient with a disease. Before the final target grading prediction model is obtained, the corresponding training and testing processes need to be completed based on the relevant data of target users. The first users in the preset training set may be used for training the model, and the second users in the preset test set may be used for testing the accuracy of the model.
The target image features may refer to relevant features of an MRI image of the target user. For example, the target image features may include first-order statistical features, shape features, texture features, and the like, where the first-order statistical features may specifically include the mean, variance, and the like, the shape features may specifically include the volume, surface area, and the like, and the texture features may specifically include the gray level co-occurrence matrix, gray level size zone matrix, and the like. The specific features may be determined based on the specific kind of disease and the specific grading prediction requirement, which is not limited in this embodiment of the present invention.
The target gene characteristics may refer to gene information of a target user, and may specifically include a gene expression level, a gene mutation state, and the like. The target clinical characteristics may refer to clinical data of the target user, and may specifically include data of the target user such as age, sex, height, weight, and blood pressure, and the embodiments of the present invention are not limited to specific types of the target gene characteristics and the target clinical characteristics.
The traditional manual image-reading approach can only rely on manual experience and on part of the histological grading characteristics visible in the medical image. The gene information introduced in the embodiments of the invention captures changes in molecular markers associated with treatment and prognosis, which are more critical predictors. Illustratively, several molecular genetic markers (including IDH1/2 mutation, TP53 mutation, MGMT promoter methylation, 1p/19q co-deletion, etc.) play an important role in tumorigenesis. In the embodiment of the invention, the target image features and target gene features are fused and combined with the target clinical features of the target user, so that the advantages of both imaging and genomics can be exploited, biomolecular grading information is fused into the imaging method, and the actual clinical data of the target user is incorporated, thereby further improving the accuracy of disease grading prediction.
Step 102, determining the target features of the target user according to the target image features, the target gene features and the target clinical features.
In the embodiment of the present invention, the target features may refer to the features of the target user that are used for model training. There may be many target image features, target gene features and target clinical features for a target user; in this step, the features of higher importance for disease grading prediction are screened out through feature fusion and extraction to obtain the target features, and the model is subsequently trained based on the target features, which improves the accuracy of model prediction.
Step 103, training a preset grading prediction model based on the target features and a preset label corresponding to the first user to obtain a target grading prediction model.
In the embodiment of the present invention, the preset label may be the grading label corresponding to a first user in the preset training set. The preset label may be determined and annotated in advance. The preset grading prediction model may refer to a preset classification model, specifically a support vector machine (SVM) classification model or the like; of course, other classifiers may also be used. The target grading prediction model refers to the trained grading prediction model that can be used for prediction.
In this step, after the target features of the target users are obtained, the preset hierarchical prediction model may be trained based on data of the first users in the preset training set, the target features of the plurality of first users and the preset labels of the first users are input into the preset hierarchical prediction model, and a training process of the preset hierarchical prediction model is executed to obtain a final target hierarchical prediction model.
Step 104, inputting the target features of the second user into the target grading prediction model to obtain the target grading corresponding to the second user.
In an embodiment of the present invention, the target rating may refer to a prediction result of a disease level of the second user. After the preset hierarchical prediction model is trained and the target hierarchical prediction model is obtained, the target characteristics of the second user in the preset test set can be input into the target hierarchical prediction model to obtain the target hierarchy of the second user. When disease grading prediction needs to be carried out on other patients subsequently, target image characteristics, target gene characteristics and target clinical characteristics of other patients can be collected, target characteristics of other patients are obtained based on the characteristics, the target characteristics of other patients are input into a target grading prediction model, and then disease grades corresponding to other patients can be obtained, so that the method is convenient and rapid to use, and the accuracy is high.
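As a minimal sketch of this training-and-prediction flow, assuming a scikit-learn linear SVM classifier and pre-computed target feature matrices (the file names, kernel choice and label encoding are illustrative assumptions, not the actual implementation of the embodiment):

```python
import numpy as np
from sklearn.svm import SVC

# Illustrative inputs: rows are users, columns are the selected target features.
X_train = np.load("train_features.npy")  # first users (preset training set)
y_train = np.load("train_labels.npy")    # preset grading labels, e.g. 0 = low grade, 1 = high grade
X_test = np.load("test_features.npy")    # second users (preset test set)

# Train the preset grading prediction model to obtain the target grading prediction model.
model = SVC(kernel="linear", C=1.0)
model.fit(X_train, y_train)

# Input the target features of the second users to obtain their target grading.
predicted_grades = model.predict(X_test)
print(predicted_grades)
```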
In the embodiment of the invention, the target image features, target gene features and target clinical features of a target user are obtained, the target users comprising first users in a preset training set and second users in a preset test set; the target features of the target user are determined according to the target image features, the target gene features and the target clinical features; a preset grading prediction model is trained based on the target features and a preset label corresponding to the first user to obtain a target grading prediction model; and the target features of the second user are input into the target grading prediction model to obtain the target grading corresponding to the second user. In this way, by introducing the target image features, target gene features and target clinical features of the target user, a target grading prediction model is obtained through training and automatic grading prediction is performed based on that model, so that the interference of human factors is reduced and the accuracy of disease grading prediction is improved.
Optionally, in the embodiment of the present invention, the obtaining of the target image feature of the target user in step 101 may be specifically implemented through the following steps S21 to S23:
and step S21, acquiring a target image of the target user.
In an embodiment of the present invention, the target image may refer to a multi-modal MRI magnetic resonance image of the target user (including the four sequences T1, FLAIR, T1c and T2), where T1 and T2 measure tissue relaxation properties: T1 may be used to show anatomical structure and T2 may be used to show lesions. The FLAIR sequence is based on the fluid-attenuated inversion recovery sequence of magnetic resonance imaging, also called water-suppression imaging. The T1c sequence is acquired after a contrast agent (dye) is injected into the blood before the MR scan; bright regions indicate a rich blood supply, strong enhancement indicates abundant blood flow, and the tumor is a region of rapid blood flow. The T1c sequence can further indicate intratumoral conditions and distinguish tumors from non-neoplastic lesions. Acquiring multi-modal target images ensures the comprehensiveness and integrity of the image data and improves the accuracy of subsequent grading prediction.
Illustratively, when performing grading prediction for the target disease brain glioma, the embodiment of the present invention may acquire and use the TCGA-LGG low-grade glioma data set and the TCGA-GBM high-grade glioma data set, which together include preoperative multi-modal MRI magnetic resonance images of 65 low-grade brain tumor patients and 102 high-grade brain tumor patients, and perform feature extraction and the subsequent model training and testing processes.
And step S22, segmenting the target image according to a preset image segmentation algorithm, and determining a target area in the target image.
In the embodiment of the present invention, the preset image segmentation algorithm may be used for image segmentation, and specifically may refer to a Semantic Feature Pyramid Network (SFPN) deep learning model, and the like. The target region may refer to a disease region in the target image, and for a brain tumor, the target region may refer to a tumor region.
In this step, after the target image is obtained, the target image may be preprocessed; specifically, the preprocessing may include size scaling and Z-score normalization. Through preprocessing, the target images are scaled to a uniform size and processing standard, which ensures the accuracy of subsequent feature extraction and avoids, to the greatest extent, the influence of irrelevant factors such as image size on the accuracy of grading prediction. Then, the SFPN deep learning model is used to automatically segment the target image at the pixel level, and the region to which each pixel belongs is classified to determine the target region in the target image. For example, after pixel-level segmentation of the target image of a brain tumor patient, the classification result of each pixel can be determined, outputting 1 for pixels belonging to the tumor region and 0 for pixels in non-tumor regions.
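A short sketch of the Z-score normalization step mentioned above, assuming the MRI volume is held in a NumPy array (the segmentation network itself is not sketched here):

```python
import numpy as np

def zscore_normalize(volume: np.ndarray) -> np.ndarray:
    """Scale voxel intensities to zero mean and unit variance (Z-score)."""
    mean = volume.mean()
    std = volume.std()
    return (volume - mean) / (std + 1e-8)  # small epsilon avoids division by zero
```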
And step S23, extracting the image features of the target area according to a preset image processing algorithm to obtain the target image features of the target user.
In the embodiment of the invention, the preset image processing algorithm can be used for extracting the image features. After the target region of the disease is determined according to the original target image and the preset image segmentation algorithm, the image processing algorithm can be used to extract the image features of the target region images under 4 modalities (T1, FLAIR, T1c, T2) respectively to form the target image features.
For example, for a target user with a brain tumor, the embodiment of the present invention first acquires an MRI image of the target user, performs image segmentation on the MRI image, and determines the target region, namely the tumor region image. Then 386 image features, comprising first-order statistical features, shape features and texture features, are extracted from the tumor region image in the 4 modalities to form the image feature vector R. The first-order statistical features include 18 features such as mean, variance, kurtosis, energy and entropy; the shape features include 14 features such as volume, surface area, sphericity and maximum three-dimensional diameter; and the texture features include 75 features derived from the gray level co-occurrence matrix, gray level size zone matrix, gray level run length matrix, gray level dependence matrix and neighboring gray tone difference matrix. Table 1 shows the types and numbers of image features extracted in different modalities in the embodiment of the invention. Note that since the shape features do not depend on the modality and relate only to the segmentation result of the target region, only one set needs to be extracted.
TABLE 1
In the embodiment of the invention, a target image of a target user is obtained; segmenting the target image according to a preset image segmentation algorithm, and determining a target area in the target image; and extracting the image characteristics of the target area according to a preset image processing algorithm to obtain the target image characteristics of the target user. Therefore, the target area can be automatically segmented and determined, the image features of the target area can be subsequently extracted to obtain the target image features, the image processing efficiency is improved, the extracted image features have interpretability, and the accuracy of subsequent model prediction is further ensured.
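To illustrate the kind of first-order image features described above, the following is a minimal sketch that computes a few statistics over the segmented tumor region (a simplification of a full radiomics pipeline; the mask convention of 1 = tumor and the histogram binning are assumptions):

```python
import numpy as np
from scipy.stats import kurtosis

def first_order_features(volume: np.ndarray, mask: np.ndarray) -> dict:
    """Compute simple first-order statistics over voxels where mask == 1."""
    voxels = volume[mask == 1].astype(np.float64)
    counts, _ = np.histogram(voxels, bins=64)
    p = counts / counts.sum()
    p = p[p > 0]
    return {
        "mean": voxels.mean(),
        "variance": voxels.var(),
        "kurtosis": kurtosis(voxels),
        "energy": float(np.sum(voxels ** 2)),
        "entropy": float(-np.sum(p * np.log2(p))),
    }
```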
Optionally, in the embodiment of the present invention, the obtaining of the target gene feature of the target user in step 101 may be specifically implemented by steps S31 to S33 as follows:
and step S31, obtaining target genomics data of the target user.
In the embodiment of the present invention, the target genomics data may refer to genetic data of a target user.
Step S32, selecting candidate genes from the target genomics data based on the types of target diseases; the candidate gene is related to the target disease and/or the mutation rate of the candidate gene in the target user is higher than a first preset threshold value.
In the embodiment of the present invention, the target disease may refer to a disease targeted by hierarchical prediction, such as brain glioma. The candidate gene can be a gene which is highly related to a target disease and has a high mutation rate in target genomics data. The first preset threshold may be a preset mutation rate threshold, and when the mutation rate is higher than the first preset threshold, it may be determined that the mutation rate of the gene is higher.
Specifically, in this step, the first n genes most strongly associated with grading of the target disease may be selected as first candidate genes according to the type of the target disease. Then the gene mutation states of the target users are obtained, a mutation state being either wild type (denoted 0) or mutant (denoted 1). According to the mutation rates of the different genes among the target users (the mutation rate of a gene is the number of patients in the data set carrying a mutation of that gene divided by the total number of patients in the data set), the first q genes with the highest mutation rates are selected as second candidate genes; alternatively, genes whose mutation rate is higher than the first preset threshold may be selected directly as the second candidate genes. After the first candidate genes and the second candidate genes are obtained, their intersection or union may be taken to determine the final candidate genes, so that the genes most strongly associated with the target disease or with the highest mutation rates are accurately determined, ensuring the accuracy and comprehensiveness of subsequent gene feature extraction.
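A sketch of this candidate-gene selection logic, assuming a pandas DataFrame of per-patient mutation states (0 = wild type, 1 = mutant); the disease-related gene list, n, q and the union/intersection choice are placeholders:

```python
import pandas as pd

def select_candidate_genes(mutations: pd.DataFrame,
                           disease_related_genes: list,
                           n: int, q: int,
                           use_union: bool = True) -> list:
    """mutations: rows = patients, columns = genes, values in {0, 1}."""
    # First candidate genes: the top-n genes most associated with the target disease
    # (here simply the first n entries of a curated, ranked disease-related list).
    first = set(disease_related_genes[:n])

    # Second candidate genes: the top-q genes by mutation rate across the data set.
    mutation_rate = mutations.mean(axis=0)  # fraction of patients carrying each mutation
    second = set(mutation_rate.sort_values(ascending=False).head(q).index)

    combined = first | second if use_union else first & second
    return sorted(combined)
```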
And step S33, extracting the gene characteristics of the candidate genes to obtain the target gene characteristics of the target user.
In the embodiment of the invention, after the candidate genes are determined, the gene characteristics of each target user for each candidate gene, specifically including the gene expression level, the gene mutation state and the like, can be respectively obtained, so as to obtain the target gene characteristics of the target user.
In the embodiment of the invention, target genomics data of a target user are obtained; selecting candidate genes from the target genomics data based on the type of the target disease; the candidate gene is related to a target disease and/or the mutation rate of the candidate gene in a target user is higher than a first preset threshold value; and extracting the gene characteristics of the candidate genes to obtain the target gene characteristics of the target user. Therefore, in the embodiment of the invention, by determining the candidate gene most related to the target disease or having the highest mutation rate and extracting the gene characteristics of the candidate gene of the target user, the genomics characteristics and the image characteristics can be fused, and the accuracy of disease grading prediction is improved.
Optionally, in this embodiment of the present invention, the obtaining of the target clinical characteristics of the target user in step 101 may be specifically implemented by steps S41 to S43 as follows:
and step S41, acquiring preset clinical data of the target user.
In an embodiment of the present invention, the preset clinical data may refer to various clinical data of the target user.
And step S42, according to the type of the target disease, screening preset clinical data related to the target disease from the preset clinical data to obtain target clinical information.
In an embodiment of the present invention, the target clinical information may refer to the clinical data, among the preset clinical data, that is related to the target disease. Through data screening, clinical data irrelevant to the target disease can be filtered out, ensuring the validity of the clinical data.
And step S43, extracting the clinical characteristics of the target clinical information to obtain the target clinical characteristics of the target user.
In the embodiment of the invention, after the target clinical information related to the target disease is determined, data processing operations such as discretization and labeling can be performed on the target clinical information, and the target clinical characteristics of the target user can be extracted.
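A brief sketch of the discretization and labeling of clinical data mentioned above, using pandas (the columns, bin edges and codes are illustrative assumptions):

```python
import pandas as pd

clinical = pd.DataFrame({
    "age": [34, 61, 48],
    "sex": ["F", "M", "M"],
})

# Discretize continuous values into integer-coded bins and label categorical values.
clinical["age_group"] = pd.cut(clinical["age"], bins=[0, 40, 60, 120], labels=False)
clinical["sex_code"] = clinical["sex"].map({"F": 0, "M": 1})

clinical_features = clinical[["age_group", "sex_code"]].to_numpy()
print(clinical_features)
```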
In the embodiment of the invention, preset clinical data of a target user are acquired; screening preset clinical data related to the target disease from the preset clinical data according to the type of the target disease to obtain target clinical information; and extracting the clinical characteristics of the target clinical information to obtain the target clinical characteristics of the target user. Therefore, in the embodiment of the invention, the target clinical data of the target user is determined to be used in the subsequent model training process through data screening and data extraction processing, so that the classification accuracy of diseases can be improved.
Optionally, in this embodiment of the present invention, step 102 may be specifically implemented by the following steps 1021 to 1022:
and 1021, merging the target image characteristic, the target gene characteristic and the target clinical characteristic to obtain a target total quantity characteristic.
In the embodiment of the present invention, the target full-scale feature may refer to a feature set formed by combining a target image feature, a target gene feature, and a target clinical feature. The target total quantity features comprise all the features in the target image features, the target gene features and the target clinical features determined in the previous steps, and can be used as a data basis for subsequent feature screening.
Step 1022, selecting a target feature of the target user from the target full-scale features according to a preset feature selection algorithm; the target features are the features with the importance degrees ranked at the top N bits in the target full-scale features; n is the target number of the target features; and N is an integer greater than 0.
In the embodiment of the present invention, the preset feature selection algorithm may be a preset feature screening algorithm, and specifically may refer to a cross validation-recursive feature elimination method, etc., and the embodiment of the present invention does not limit the specific type of the preset feature selection algorithm. The degree of importance may refer to the degree of importance of a certain feature of the target full-scale features for the hierarchical prediction. The target number N may refer to an optimal number of target features. It should be noted that the accuracy of the hierarchical prediction of the target features with different numbers is different, and in this step, the target number N may be determined first based on the importance degree and a preset feature selection algorithm, and then the target features with the target number N are selected to form a target feature set finally used for the hierarchical prediction.
In the embodiment of the invention, the target image characteristics, the target gene characteristics and the target clinical characteristics are combined to obtain target total quantity characteristics; selecting target characteristics of a target user from the target full-scale characteristics according to a preset characteristic selection algorithm; the target feature is the feature with the importance degree ranked in the top N bits in the target full-scale feature. Therefore, the target total quantity features are obtained through feature combination, then the target features with higher importance degree are determined based on a feature selection algorithm, the optimal target feature subset can be selected, the classifier is trained by using the optimal feature subset, and the accuracy of the target grading prediction model can be further improved.
Optionally, in the embodiment of the present invention, the step 1021 may be specifically implemented by the following steps S51 to S52:
step S51, vectorizing the target image feature, the target gene feature and the target clinical feature respectively to obtain a target image feature vector, a target gene feature vector and a target clinical feature vector.
In the embodiment of the invention, after the target image feature, the target gene feature and the target clinical feature are obtained, vectorization of the feature set can be performed to obtain a target image feature vector R, a target gene feature vector G and a target clinical feature vector C.
And step S52, merging the target image feature vector, the target gene feature vector and the target clinical feature vector to obtain a first target full-scale feature vector corresponding to the target full-scale feature.
In an embodiment of the present invention, the first target full-scale feature vector may refer to the vector corresponding to the original target full-scale features, containing all imaging, genomic and clinical features. Specifically, the target image feature vector R, the target gene feature vector G, and the target clinical feature vector C may be merged to obtain the first target full-scale feature vector F, i.e., F = R + G + C, where "+" denotes concatenation of the feature vectors.
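A minimal sketch of the vectorization and merging described above, assuming the three feature sets have already been converted to numeric NumPy vectors for one user (the dimensions shown are illustrative):

```python
import numpy as np

R = np.random.rand(386)  # target image feature vector
G = np.random.rand(12)   # target gene feature vector
C = np.random.rand(5)    # target clinical feature vector

# First target full-scale feature vector F: the concatenation of R, G and C.
F = np.concatenate([R, G, C])
print(F.shape)  # (403,)
```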
In the embodiment of the invention, the target image characteristic, the target gene characteristic and the target clinical characteristic are respectively vectorized to obtain a target image characteristic vector, a target gene characteristic vector and a target clinical characteristic vector; and merging the target image feature vector, the target gene feature vector and the target clinical feature vector to obtain a first target full-scale feature vector corresponding to the target full-scale feature. Therefore, the first target full-quantity feature vector is obtained by vectorizing and merging the features, the operation speed can be improved by performing feature processing based on the vector, and the comprehensiveness of the features in subsequent feature screening can be ensured by merging the vectors.
Optionally, in the embodiment of the present invention, the step 1022 specifically includes the following steps S61 to S64:
step S61, performing normalization processing on the first target full-scale feature vector to obtain a second target full-scale feature vector corresponding to the first target full-scale feature.
In this embodiment of the present invention, the second target full-scale feature vector may refer to the first target full-scale feature vector after normalization. In this step, after feature merging yields the first target full-scale feature vector F, a normalization operation may be performed on F: all quantized feature data in F are normalized to the range [-1, 1], yielding the normalized second target full-scale feature vector F'. The specific normalization formula may be:
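(A minimal sketch, assuming per-feature min-max scaling to [-1, 1]; the exact formula used by the embodiment may differ.)

```python
import numpy as np

def normalize_to_unit_range(F: np.ndarray) -> np.ndarray:
    """Scale each feature column of F (samples x features) into [-1, 1]."""
    f_min = F.min(axis=0)
    f_max = F.max(axis=0)
    return 2.0 * (F - f_min) / (f_max - f_min + 1e-12) - 1.0
```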
and step S62, determining the importance degree of each feature in the second target full-quantity feature vector according to the preset feature selection algorithm.
In the embodiment of the invention, after the normalization processing is performed, the importance degree of each feature in the second target full-scale feature vector can be calculated using the second target full-scale feature vector and the preset feature selection algorithm. Exemplarily, taking the preset feature selection algorithm to be the support vector machine-based recursive feature elimination method (SVM-RFE), feature selection based on the second target full-scale feature vector F' proceeds as follows: the data set is randomly divided into 5 subsets, of which 4 subsets are used as training sets and the remaining subset is used as a verification set, so that 5 groups of training and verification sets are obtained; an SVM classifier is used to model each of the 5 groups, the importance degree of each feature is calculated, and the importance ranking of all features is obtained by recursive feature elimination (RFE).
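A sketch of this SVM-RFE ranking using scikit-learn, assuming the normalized feature matrix and grading labels are already available in NumPy arrays (a simplified stand-in for the 5-group modeling described above; the file names are placeholders):

```python
import numpy as np
from sklearn.feature_selection import RFE
from sklearn.svm import SVC

F_prime = np.load("normalized_features.npy")  # second target full-scale feature vectors (samples x features)
y = np.load("grading_labels.npy")             # preset grading labels

# A linear SVM provides per-feature weights; RFE recursively removes the weakest feature.
selector = RFE(estimator=SVC(kernel="linear"), n_features_to_select=1, step=1)
selector.fit(F_prime, y)

# ranking_[i] == 1 means feature i survived longest, i.e. it is the most important.
importance_order = np.argsort(selector.ranking_)
print(importance_order[:10])  # indices of the ten most important features
```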
And step S63, determining the target number N corresponding to the target feature according to the importance degree and the preset feature selection algorithm.
In the embodiment of the invention, after the importance degree of each feature in the second target full-scale feature vector is determined, verification can be performed again based on the importance degree and the preset feature selection algorithm, the optimal number N corresponding to the target feature is determined, and the accuracy in the subsequent model training process is further improved.
And step S64, screening the features with the importance degree ranked at the top N bits in the second target full-quantity feature vector as the target features.
In the embodiment of the present invention, after the target number N is determined, the features whose importance degree ranks in the top N of the second target full-scale feature vector are selected, based on the importance ranking, as the target features to form the target feature set.
In the embodiment of the invention, the first target full-scale feature vector is subjected to normalization processing to obtain a second target full-scale feature vector corresponding to the first target full-scale feature; determining the importance degree of each feature in the second target full-scale feature vector according to a preset feature selection algorithm; determining the target number N corresponding to the target features according to the importance degree and a preset feature selection algorithm; and in the second target full-quantity feature vector, screening features with importance degrees ranked in the top N bits as target features. Therefore, the target features are screened by determining the importance degree of each feature and determining the optimal number of the target features, the rationality of feature screening can be improved, the screening of the optimal target feature set is realized, and the accuracy of classification of a target grading prediction model obtained by subsequent training can be further improved.
Optionally, in the embodiment of the present invention, step S63 may be specifically implemented by steps S631 to S633 as follows:
and S631, sequentially selecting different numbers of features according to the ranking of the importance degrees.
In the embodiment of the invention, after the importance degree of each feature in the second target full-scale feature vector is determined, the optimal number of target features can be further determined. In this step, different numbers of features are first selected from the second target full-scale feature vector in order of importance degree, for example the top 5, top 10 or top 20 features, and the feature sets composed of these different numbers of features are then verified to determine the optimal number.
Step S632: inputting each quantity of features into the preset feature selection algorithm for verification, and determining the average accuracy corresponding to each feature quantity.
In the embodiment of the present invention, the average accuracy may be the accuracy achieved on the preset classification task by a model trained on that number of features. For example, 5-fold cross validation may be adopted in this step to determine the optimal number of selected features, i.e. the target number N: different numbers of features are selected in turn according to the importance ranking, and cross validation is performed on each selected feature set to obtain the average accuracy corresponding to each feature quantity.
Step S633: taking the feature quantity with the highest average accuracy as the target number N corresponding to the target features.
In the embodiment of the present invention, after the average accuracies corresponding to the feature sets of different sizes are determined, the size of the feature set with the highest average accuracy is taken as the optimal number of features. A recursive feature elimination (RFE) operation may then be performed on the entire data set to obtain all the target features and compose the target feature vector set. For example, on the data set of the aforementioned 65 low-grade and 102 high-grade brain tumor patients, the SVM-RFE method determines the optimal number of features to be 19; table 2 shows the types of target features determined after feature selection according to the embodiment of the present invention.
TABLE 2 Types of the target features determined after feature selection
In the embodiment of the invention, different numbers of features are selected in turn according to the ranking of the importance degrees; each feature set is input into the preset feature selection algorithm for verification, and the average accuracy corresponding to each feature quantity is determined; and the feature quantity with the highest average accuracy is taken as the target number N corresponding to the target features. In this way, the optimal number of features is determined through the importance degrees and the preset feature selection algorithm, the optimal target feature subset can be selected, and training the classifier on this optimal subset further improves the accuracy of the model's grading prediction.
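A minimal sketch of this N-selection loop, again assuming scikit-learn; the candidate feature counts and the placeholder X, y and importance_order are illustrative only.

```python
# Minimal sketch (assumption: scikit-learn) of choosing the target number N:
# cross-validate a linear SVM on the top-n ranked features for several candidate
# values of n and keep the one with the highest average accuracy.
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.random((167, 120))                 # placeholder feature matrix
y = rng.integers(0, 2, 167)                # placeholder grade labels
importance_order = np.arange(X.shape[1])   # placeholder ranking (see sketch above)

best_n, best_acc = None, -1.0
for n in (5, 10, 15, 20, 30, 50):          # illustrative candidate feature counts
    scores = cross_val_score(SVC(kernel="linear"), X[:, importance_order[:n]], y,
                             cv=5, scoring="accuracy")
    if scores.mean() > best_acc:
        best_n, best_acc = n, scores.mean()

target_feature_idx = importance_order[:best_n]   # indices of the final target features
```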
Illustratively, fig. 2 shows a flow chart of disease grading prediction according to an embodiment of the present invention. The specific procedure of the disease grading prediction method is described below with reference to an example in which the target disease is brain glioma and the target user is a glioma patient:
Step 201: obtaining multi-modal MRI images of a brain glioma patient; automatically segmenting the images according to a preset image segmentation algorithm to determine the tumor region (target region); and extracting the target image features of the tumor region through a preset image processing algorithm.
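As an illustration of this step, the sketch below computes a few first-order statistics inside a binary tumor mask; the function name, the bin count and the restriction to first-order features (no shape or texture features) are assumptions made for brevity, not the patent's algorithm.

```python
# Minimal sketch of first-order feature extraction inside the segmented tumor
# region. `image` is one MRI volume and `mask` its binary tumor segmentation,
# both NumPy arrays of the same shape; shape and texture features are omitted.
import numpy as np
from scipy import stats

def first_order_features(image: np.ndarray, mask: np.ndarray) -> dict:
    voxels = image[mask > 0].astype(float)       # intensities inside the tumor region
    counts, _ = np.histogram(voxels, bins=64)
    p = counts / counts.sum()
    p = p[p > 0]
    return {
        "mean": float(voxels.mean()),
        "std": float(voxels.std()),
        "skewness": float(stats.skew(voxels)),
        "kurtosis": float(stats.kurtosis(voxels)),
        "entropy": float(-np.sum(p * np.log2(p))),   # intensity-histogram entropy
    }

# Usage (illustrative): feats = first_order_features(t1c_volume, tumor_mask)
```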
Step 202: acquiring the gene data (target genomics data) of the brain glioma patient, determining candidate genes through correlation analysis and mutation rate statistics, and extracting the gene features of each patient's candidate genes to obtain the target gene features.
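A minimal sketch of the candidate-gene screening, assuming the mutation calls live in a pandas DataFrame with hypothetical `patient_id` and `gene` columns and that a list of disease-related genes is supplied; the 10% mutation-rate threshold is only an example of the first preset threshold.

```python
# Minimal sketch of candidate gene selection: keep genes that are known to be
# disease-related and/or whose mutation rate across patients exceeds a threshold.
# Column names and the threshold value are illustrative assumptions.
import pandas as pd

def select_candidate_genes(mutations: pd.DataFrame,
                           disease_related_genes: list[str],
                           n_patients: int,
                           rate_threshold: float = 0.1) -> list[str]:
    # Mutation rate per gene = fraction of patients carrying at least one mutation.
    rate = mutations.groupby("gene")["patient_id"].nunique() / n_patients
    frequent = set(rate[rate > rate_threshold].index)
    return sorted(frequent | set(disease_related_genes))
```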
Step 203: obtaining the clinical data (preset clinical data) of the brain glioma patient, determining the target clinical information through data screening, and obtaining the target clinical features through feature extraction.
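The clinical-feature step can be pictured with the short sketch below, which discretizes age and label-encodes sex; the column names, bin edges and encodings are assumptions, not values prescribed by the patent.

```python
# Minimal sketch of turning screened clinical information into numeric features
# via discretization and label encoding. Column names and bins are illustrative.
import pandas as pd

def clinical_features(clinical: pd.DataFrame) -> pd.DataFrame:
    out = pd.DataFrame(index=clinical.index)
    out["age_group"] = pd.cut(clinical["age"], bins=[0, 40, 60, 120],
                              labels=[0, 1, 2]).astype(int)   # discretized age
    out["sex"] = clinical["sex"].map({"F": 0, "M": 1})         # label-encoded sex
    return out
```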
Steps 201 to 203 have been described in detail above and are not repeated here.
Step 204: merging the target image features, the target clinical features and the target gene features to obtain the target full-scale features, and then performing feature selection to obtain the target features.
In the embodiment of the invention, the TCGA-LGG low-grade glioma data set and the TCGA-GBM high-grade glioma data set are first obtained. Together they comprise preoperative multi-modal MRI images (four sequences: T1, T1c, T2 and FLAIR) of 65 low-grade and 102 high-grade brain tumor patients, as well as the preset clinical data and target genomics data corresponding to each patient. For the MRI data, a Semantic Feature Pyramid Network (SFPN) is used to automatically segment the tumor region, and an image processing algorithm extracts first-order statistical features, shape features, texture features and the like of the tumor region to form the target image features. For the gene data, candidate genes are determined through correlation analysis and mutation rate statistics, and the expression level and mutation state of each candidate gene are extracted as the target gene features. For the clinical data, target clinical information such as the patient's age and sex is obtained through data screening, and discretization, labeling and similar processing of this information yields the target clinical features. The three groups of features are merged into the target full-scale features, and feature selection is performed on them by cross-validation-based recursive feature elimination to obtain the target feature vector set corresponding to the target features.
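A minimal sketch of the merge-normalize-select chain, assuming scikit-learn; RFECV is used here as one possible realization of cross-validation-based recursive feature elimination, and the per-modality feature arrays are random placeholders.

```python
# Minimal sketch (assumption: scikit-learn) of merging the three feature blocks,
# z-score normalization, and cross-validation-based recursive feature elimination.
import numpy as np
from sklearn.feature_selection import RFECV
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)
image_feats = rng.random((167, 100))       # placeholder target image features
gene_feats = rng.random((167, 15))         # placeholder target gene features
clinical_feats = rng.random((167, 5))      # placeholder target clinical features
y = rng.integers(0, 2, 167)                # placeholder grade labels

full_feats = np.hstack([image_feats, gene_feats, clinical_feats])  # first full-scale vector
full_feats = StandardScaler().fit_transform(full_feats)            # second (normalized) vector

selector = RFECV(estimator=SVC(kernel="linear"), step=1, cv=5, scoring="accuracy")
selector.fit(full_feats, y)
target_feats = full_feats[:, selector.support_]   # selected target feature vector set
```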
Step 205: inputting the target features and the preset labels into the preset grading prediction model (an SVM classifier) and training it to obtain the target grading prediction model.
In this step, the target feature vector set corresponding to the target features is randomly divided into 5 subsets, a support vector machine (SVM) classification model is adopted as the preset grading prediction model, and model training is performed with 5-fold cross validation to obtain the target grading prediction model. The target feature vectors of the patients (second users) in the preset test set can then be input into the target grading prediction model in turn, and the model outputs the target grading, i.e. the tumor grade prediction result. The cross validation adopted in the embodiment of the invention specifically comprises grouping the original data, taking one part as the training set and the other part as the test set, first training the classifier with the training set, and then testing the trained model with the test set; the test result serves as a performance index for evaluating the classifier.
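A minimal sketch of the training and prediction step, assuming scikit-learn's SVC; the placeholder matrices stand in for the selected target features of the training and test patients, and the split sizes are illustrative.

```python
# Minimal sketch (assumption: scikit-learn) of training the SVM grading classifier
# with 5-fold cross validation and grading the patients in the preset test set.
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

rng = np.random.default_rng(0)
train_feats = rng.random((130, 19))     # placeholder: training-set target features
train_labels = rng.integers(0, 2, 130)  # placeholder: preset labels (grades)
test_feats = rng.random((37, 19))       # placeholder: test-set target features

clf = SVC(kernel="linear")
cv_scores = cross_val_score(clf, train_feats, train_labels, cv=5, scoring="accuracy")
print("per-fold accuracy:", cv_scores, "mean:", cv_scores.mean())

clf.fit(train_feats, train_labels)          # train on the full training set
predicted_grades = clf.predict(test_feats)  # target grading for the second users
```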
In the embodiment of the present invention, the classification accuracy may be used as the performance evaluation index of the model for the prediction result. Its calculation formula is (TP + TN)/(P + N), where TP is the number of samples correctly classified as positive, TN is the number of samples correctly classified as negative, and P + N is the sum of all positive and negative samples, i.e. the total number of samples.
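The accuracy metric can be checked with a few lines, assuming scikit-learn's metrics module; y_true and y_pred are placeholders.

```python
# Minimal sketch of the accuracy metric (TP + TN) / (P + N), computed directly
# from the confusion matrix and cross-checked against scikit-learn.
from sklearn.metrics import accuracy_score, confusion_matrix

y_true = [0, 0, 1, 1, 1, 0, 1, 0]   # placeholder ground-truth grades
y_pred = [0, 1, 1, 1, 0, 0, 1, 0]   # placeholder predicted grades

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
manual_acc = (tp + tn) / (tp + tn + fp + fn)          # (TP + TN) / (P + N)
assert abs(manual_acc - accuracy_score(y_true, y_pred)) < 1e-12
```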
In the above example, the five-fold cross validation evaluation results for the 167 patients are shown in table 3. To validate the performance of the disease grading prediction method of the embodiment of the present invention, a radiomics-only method (grading prediction using image features alone) was compared with the embodiment of the present invention. The disease grading prediction method of the embodiment achieves higher grading accuracy on 4 of the subsets (1, 2, 4 and 5), and its average accuracy over the 167 patients is 94%, an improvement of 2.4% over the radiomics-only method.
TABLE 3 five-fold cross-validation evaluation results
These results show that, compared with the radiomics method that uses image features alone, the disease grading method provided by the embodiment of the invention comprehensively quantifies the tumor phenotype features, reduces interference from human factors through automatic quantitative analysis, introduces the patient's genomics features and clinical features into the prediction, improves the grading prediction accuracy, and provides a basis for establishing the optimal diagnosis and treatment plan for the patient.
It should be noted that, for simplicity of description, the method embodiments are described as a series of acts or combination of acts, but those skilled in the art will recognize that the present invention is not limited by the illustrated order of acts, as some steps may occur in other orders or concurrently in accordance with the embodiments of the present invention. Further, those skilled in the art will appreciate that the embodiments described in the specification are presently preferred and that no particular act is required to implement the invention.
Referring to fig. 3, a block diagram of a disease grading prediction apparatus according to an embodiment of the present invention is shown, and specifically, the apparatus 30 may include the following modules:
an obtaining module 301, configured to obtain a target image feature, a target gene feature, and a target clinical feature of a target user; the target users comprise first users in a preset training set and second users in a preset test set;
a determining module 302, configured to determine a target feature of the target user according to the target image feature, the target gene feature, and the target clinical feature;
a training module 303, configured to train a preset hierarchical prediction model based on the target feature and a preset label corresponding to the first user, to obtain a target hierarchical prediction model;
an input module 304, configured to input the target feature of the second user into the target classification prediction model, so as to obtain a target classification corresponding to the second user.
In summary, the disease grading prediction apparatus provided in the embodiment of the present invention obtains the target image features, target gene features and target clinical features of a target user, where the target users comprise first users in a preset training set and second users in a preset test set; determines the target features of the target user according to the target image features, target gene features and target clinical features; trains a preset grading prediction model based on the target features and the preset labels corresponding to the first users to obtain a target grading prediction model; and inputs the target features of a second user into the target grading prediction model to obtain the target grading corresponding to the second user. By introducing the target image features, target gene features and target clinical features of the target user into the training of the target grading prediction model and performing automatic grading prediction with that model, the interference of human factors is reduced and the accuracy of disease grading prediction is improved.
Optionally, the obtaining module 301 is specifically configured to:
acquiring a target image of the target user;
segmenting the target image according to a preset image segmentation algorithm, and determining a target area in the target image;
and extracting the image characteristics of the target area according to a preset image processing algorithm to obtain the target image characteristics of the target user.
Optionally, the obtaining module 301 is further specifically configured to:
obtaining target genomics data of the target user;
selecting candidate genes from the target genomics data based on the type of the target disease; the candidate gene is related to the target disease and/or the mutation rate of the candidate gene in the target user is higher than a first preset threshold value;
and extracting the gene characteristics of the candidate genes to obtain the target gene characteristics of the target user.
Optionally, the obtaining module 301 is further specifically configured to:
acquiring preset clinical data of the target user;
according to the type of the target disease, screening preset clinical data related to the target disease from the preset clinical data to obtain target clinical information;
and extracting the clinical characteristics of the target clinical information to obtain the target clinical characteristics of the target user.
Optionally, the determining module 302 is specifically configured to:
merging the target image features, the target gene features and the target clinical features to obtain the target full-scale features;
selecting target features of the target user from the target full-scale features according to a preset feature selection algorithm; the target features are the features whose importance degrees rank in the top N of the target full-scale features; N is the target number of the target features; and N is an integer greater than 0.
Optionally, the selecting, according to a preset feature selection algorithm, a target feature of the target user from the target full-scale features includes:
normalizing the first target full-scale feature vector to obtain a second target full-scale feature vector corresponding to the first target full-scale feature;
determining the importance degree of each feature in the second target full-scale feature vector according to the preset feature selection algorithm;
determining a target number N corresponding to the target feature according to the importance degree and the preset feature selection algorithm;
and screening the features whose importance degrees rank in the top N of the second target full-scale feature vector as the target features.
Optionally, the determining the target number N corresponding to the target feature includes:
sequentially selecting different numbers of features according to the ranking of the importance degrees;
inputting the features of each quantity into the preset feature selection algorithm for verification respectively, and determining the average accuracy corresponding to each feature quantity respectively;
and taking the feature quantity with the highest average accuracy as the target number N corresponding to the target feature.
Optionally, an embodiment of the present invention further provides an electronic device, which includes a processor, a memory, and a computer program stored in the memory and executable on the processor. When executed by the processor, the computer program implements each process of the above disease grading prediction method embodiment and achieves the same technical effects; to avoid repetition, the details are not repeated here.
Optionally, an embodiment of the present invention further provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements each process of the above disease grading prediction method embodiment and achieves the same technical effects; to avoid repetition, the details are not repeated here.
The embodiments in the present specification are described in a progressive manner, each embodiment focuses on differences from other embodiments, and the same and similar parts among the embodiments are referred to each other.
As will be readily appreciated by those skilled in the art, any combination of the above embodiments is possible, and any such combination is therefore an embodiment of the present invention; for reasons of space, these combinations are not described in detail herein.
In the description provided herein, numerous specific details are set forth. It is understood, however, that embodiments of the invention may be practiced without these specific details. In some instances, well-known methods, structures and techniques have not been shown in detail in order not to obscure an understanding of this description.
Similarly, it should be appreciated that in the foregoing description of exemplary embodiments of the invention, various features of the invention are sometimes grouped together in a single embodiment, figure, or description thereof for the purpose of streamlining the invention and aiding in the understanding of one or more of the various inventive aspects. However, the disclosed method should not be interpreted as reflecting an intention that: that the invention as claimed requires more features than are expressly recited in each claim. Rather, as the following claims reflect, inventive aspects lie in less than all features of a single foregoing disclosed embodiment. Thus, the claims following the detailed description are hereby expressly incorporated into this detailed description, with each claim standing on its own as a separate embodiment of this invention.
Those skilled in the art will appreciate that the modules in the device in an embodiment may be adaptively changed and disposed in one or more devices different from the embodiment. The modules or units or components of the embodiments may be combined into one module or unit or component, and furthermore they may be divided into a plurality of sub-modules or sub-units or sub-components. All of the features disclosed in this specification (including any accompanying claims, abstract and drawings), and all of the processes or elements of any method or apparatus so disclosed, may be combined in any combination, except combinations where at least some of such features and/or processes or elements are mutually exclusive. Each feature disclosed in this specification (including any accompanying claims, abstract and drawings) may be replaced by alternative features serving the same, equivalent or similar purpose, unless expressly stated otherwise.
Furthermore, those skilled in the art will appreciate that while some embodiments described herein include some features included in other embodiments, rather than other features, combinations of features of different embodiments are meant to be within the scope of the invention and form different embodiments. For example, in the claims, any of the claimed embodiments may be used in any combination.
It should be noted that the above-mentioned embodiments illustrate rather than limit the invention, and that those skilled in the art will be able to design alternative embodiments without departing from the scope of the appended claims. In the claims, any reference signs placed between parentheses shall not be construed as limiting the claim. The word "comprising" does not exclude the presence of elements or steps not listed in a claim. The word "a" or "an" preceding an element does not exclude the presence of a plurality of such elements. The invention may be implemented by means of hardware comprising several distinct elements, and by means of a suitably programmed computer. In the unit claims enumerating several means, several of these means may be embodied by one and the same item of hardware. The use of the words first, second, third and so on does not indicate any ordering; these words may be interpreted as names.
Claims (11)
1. A disease grading prediction method, comprising:
acquiring target image characteristics, target gene characteristics and target clinical characteristics of a target user; the target users comprise first users in a preset training set and second users in a preset test set;
determining a target feature of the target user according to the target image feature, the target gene feature and the target clinical feature;
training a preset hierarchical prediction model based on the target feature and a preset label corresponding to the first user to obtain a target hierarchical prediction model;
and inputting the target characteristics of the second user into the target grading prediction model to obtain the target grading corresponding to the second user.
2. The method of claim 1, wherein the obtaining target image features of the target user comprises:
acquiring a target image of the target user;
segmenting the target image according to a preset image segmentation algorithm, and determining a target area in the target image;
and extracting the image characteristics of the target area according to a preset image processing algorithm to obtain the target image characteristics of the target user.
3. The method of claim 1, wherein the obtaining the target gene characteristics of the target user comprises:
obtaining target genomics data of the target user;
selecting candidate genes from the target genomics data based on the type of the target disease; the candidate gene is related to the target disease and/or the mutation rate of the candidate gene in the target user is higher than a first preset threshold value;
and extracting the gene characteristics of the candidate genes to obtain the target gene characteristics of the target user.
4. The method of claim 1, wherein the obtaining the target clinical characteristics of the target user comprises:
acquiring preset clinical data of the target user;
according to the type of the target disease, screening preset clinical data related to the target disease from the preset clinical data to obtain target clinical information;
and extracting the clinical characteristics of the target clinical information to obtain the target clinical characteristics of the target user.
5. The method of any one of claims 1 to 4, wherein the determining a target feature of the target user according to the target image feature, the target gene feature and the target clinical feature comprises:
merging the target image features, the target gene features and the target clinical features to obtain target full-scale features;
selecting target features of the target user from the target full-scale features according to a preset feature selection algorithm; the target features are the features whose importance degrees rank in the top N of the target full-scale features; N is the target number of the target features; and N is an integer greater than 0.
6. The method of claim 5, wherein the merging the target image features, the target gene features and the target clinical features comprises:
vectorizing the target image features, the target gene features and the target clinical features respectively to obtain target image feature vectors, target gene feature vectors and target clinical feature vectors;
and merging the target image feature vector, the target gene feature vector and the target clinical feature vector to obtain a first target full-scale feature vector corresponding to the target full-scale feature.
7. The method according to claim 6, wherein the selecting the target feature of the target user from the target full-scale features according to a preset feature selection algorithm comprises:
normalizing the first target full-scale feature vector to obtain a second target full-scale feature vector corresponding to the first target full-scale feature;
determining the importance degree of each feature in the second target full-scale feature vector according to the preset feature selection algorithm;
determining a target number N corresponding to the target feature according to the importance degree and the preset feature selection algorithm;
and screening the features whose importance degrees rank in the top N of the second target full-scale feature vector as the target features.
8. The method according to claim 7, wherein the determining the target number N corresponding to the target feature comprises:
sequentially selecting different numbers of features according to the ranking of the importance degrees;
inputting the features of each quantity into the preset feature selection algorithm for verification respectively, and determining the average accuracy corresponding to each feature quantity respectively;
and taking the feature quantity with the highest average accuracy as the target number N corresponding to the target feature.
9. A disease grading prediction apparatus, comprising:
the acquisition module is used for acquiring target image characteristics, target gene characteristics and target clinical characteristics of a target user; the target users comprise first users in a preset training set and second users in a preset test set;
a determination module, configured to determine a target feature of the target user according to the target image feature, the target gene feature, and the target clinical feature;
the training module is used for training a preset grading prediction model based on the target characteristics and a preset label corresponding to the first user to obtain a target grading prediction model;
and the input module is used for inputting the target characteristics of the second user into the target grading prediction model to obtain the target grading corresponding to the second user.
10. An electronic device comprising a processor, a memory, and a computer program stored on the memory and executable on the processor, the computer program, when executed by the processor, implementing the disease grading prediction method as claimed in any one of claims 1 to 8.
11. A computer-readable storage medium, on which a computer program is stored which, when executed by a processor, carries out the disease grading prediction method according to any one of claims 1 to 8.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111250817.0A CN114121291A (en) | 2021-10-26 | 2021-10-26 | Disease grading prediction method and device, electronic equipment and storage medium |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111250817.0A CN114121291A (en) | 2021-10-26 | 2021-10-26 | Disease grading prediction method and device, electronic equipment and storage medium |
Publications (1)
Publication Number | Publication Date |
---|---|
CN114121291A true CN114121291A (en) | 2022-03-01 |
Family
ID=80377116
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202111250817.0A | Disease grading prediction method and device, electronic equipment and storage medium (CN114121291A, Pending) | 2021-10-26 | 2021-10-26 |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114121291A (en) |
Patent Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107280697A (en) * | 2017-05-15 | 2017-10-24 | 北京市计算中心 | Lung neoplasm grading determination method and system based on deep learning and data fusion |
CN109817332A (en) * | 2019-02-28 | 2019-05-28 | 南京信息工程大学 | The stage division of Pancreatic Neuroendocrine Tumors based on CT radiation group |
CN111242174A (en) * | 2019-12-31 | 2020-06-05 | 浙江大学 | Liver cancer image feature extraction and pathological classification method and device based on imaging omics |
US20210200988A1 (en) * | 2019-12-31 | 2021-07-01 | Zhejiang University | Method and equipment for classifying hepatocellular carcinoma images by combining computer vision features and radiomics features |
CN111476754A (en) * | 2020-02-28 | 2020-07-31 | 中国人民解放军陆军军医大学第二附属医院 | Artificial intelligence auxiliary grading diagnosis system and method for bone marrow cell image |
CN112117003A (en) * | 2020-09-03 | 2020-12-22 | 中国科学院深圳先进技术研究院 | Tumor risk grading method, system, terminal and storage medium |
CN112991363A (en) * | 2021-03-17 | 2021-06-18 | 泰康保险集团股份有限公司 | Brain tumor image segmentation method and device, electronic equipment and storage medium |
Non-Patent Citations (2)
Title |
---|
Bian Xiuwu et al., "Molecular Pathology and Precision Diagnosis", Shanghai Jiao Tong University Press, 31 December 2020, pages 464-465 *
Sun Xianting et al., "Radiomics-based logistic regression model for predicting glioma grade", Journal of Central South University (Medical Science), vol. 46, no. 4, 15 April 2021 (2021-04-15), pages 385-392 *
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2024066722A1 (en) * | 2022-09-27 | 2024-04-04 | BOE Technology Group Co., Ltd. | Target-model acquisition method and apparatus, prognostic-evaluation-value determination method and apparatus, and device and medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |