CN113851216A - Acute ischemic stroke clinical phenotype construction method, key biomarker screening method and application thereof - Google Patents

Info

Publication number
CN113851216A
Authority
CN
China
Prior art keywords
clinical
ischemic stroke
acute ischemic
risk
prognosis
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111114266.5A
Other languages
Chinese (zh)
Inventor
王拥军
李子孝
丁玲玲
李昊
孟霞
姜勇
荆京
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Tiantan Hospital
Original Assignee
Beijing Tiantan Hospital
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Tiantan Hospital
Priority to CN202111114266.5A
Publication of CN113851216A
Legal status: Pending

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/20 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for computer-aided diagnosis, e.g. based on medical expert systems
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B20/00 ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B40/00 ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/70 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for mining of medical data, e.g. analysing previous cases of other patients

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Public Health (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • Biomedical Technology (AREA)
  • Databases & Information Systems (AREA)
  • Epidemiology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Biotechnology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Theoretical Computer Science (AREA)
  • Biophysics (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Evolutionary Biology (AREA)
  • Pathology (AREA)
  • Primary Health Care (AREA)
  • Evolutionary Computation (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioethics (AREA)
  • Chemical & Material Sciences (AREA)
  • Analytical Chemistry (AREA)
  • Genetics & Genomics (AREA)
  • Molecular Biology (AREA)
  • Proteomics, Peptides & Aminoacids (AREA)
  • Investigating Or Analysing Biological Materials (AREA)

Abstract

The invention provides a method for constructing clinical phenotypes of acute ischemic stroke, a method for screening key biomarkers, and applications thereof, and belongs to the technical field of bioinformatics. The invention discloses a data-driven method for mining clinical phenotypes of acute ischemic stroke from multi-omics data, implements a machine learning algorithm for evaluating patients' clinical outcomes and treatment responsiveness, and applies the method to risk stratification and treatment decision support for patients with acute ischemic stroke, forming an effective clinical evaluation tool. The invention also screens a set of key prognostic biomarkers for acute ischemic stroke that can accurately predict patients' disease risk according to different pathophysiological mechanisms. Because machine learning is used to mine reliable key biomarkers from a large amount of data, the resulting prediction model achieves high-precision prediction while remaining convenient to apply in clinical practice.

Description

Acute ischemic stroke clinical phenotype construction method, key biomarker screening method and application thereof
Technical Field
The invention belongs to the technical field of bioinformatics, and particularly relates to a construction method of an acute ischemic stroke clinical phenotype, a screening method of a key biomarker and application of the method.
Background
Cerebrovascular disease seriously threatens human health and life and has become one of the leading causes of death and disability worldwide; acute ischemic stroke (AIS) is its most common type. Secondary prevention of ischemic stroke includes antithrombotic therapy, blood pressure control, blood glucose control and cholesterol-lowering therapy, and is important for reducing the risk of stroke recurrence and improving prognosis. However, there is evidence that some patients with ischemic stroke remain at high risk of recurrence even after receiving guideline-recommended standardized secondary prevention. This indicates that existing secondary prevention measures do not provide effective risk management for all patients with cerebrovascular disease, and that a new tool for risk stratification of ischemic stroke patients is urgently needed so that prognosis can be further improved through individualized precision management. Existing stroke risk assessment mostly stratifies risk using clinical information such as traditional risk factors; examples include the Stroke Prognosis Instrument I (SPI-I), SPI-II, the Essen Stroke Risk Score, the California Risk Score and the ABCD scoring system. However, these traditional scores and prediction models have limited reliability and validity, and they generalize poorly to external cohorts.
A clinical phenotype is a single disease attribute or a set of attributes that describes differences between individual patients, is closely related to clinical outcome (e.g., response to treatment, recurrence, death), and is an important basis for studying disease risk. Ischemic stroke is a complex, multifactorial disease, and no clinical phenotype study of acute ischemic stroke based on multiple biomarkers is currently available. How to discover effective biomarkers and construct ischemic stroke clinical phenotypes that stratify prognostic risk and reflect pathophysiological mechanisms and treatment responsiveness is an urgent problem for precision diagnosis and treatment of ischemic stroke.
Disclosure of Invention
In view of the above, the present invention aims to provide a method for constructing an acute ischemic stroke clinical phenotype, a method for screening key biomarkers, and applications thereof.
The invention provides a construction method of an acute ischemic stroke clinical phenotype, which comprises the following steps:
1) collating 92 features covering clinical information, imaging information and blood test indexes of acute ischemic stroke cases;
2) selecting the top 30 features related to disease prognosis from the 92 features using information gain and the Light Gradient Boosting Machine (LightGBM) learning method;
3) performing cluster analysis on the 30 features selected in step 2) based on the Bayesian information criterion (BIC), determining the optimal number of clusters from the minimum BIC value, and thereby determining that the cases are divided into 4 clinical phenotypes;
4) performing unsupervised cluster analysis on the 30 features selected in step 2) using a Gaussian mixture model to obtain 4 clusters;
5) comparing the differences in clinical information, blood test indexes and clinical outcomes among the 4 clusters from step 4) to obtain 4 clinical phenotypes.
Preferably, the clinical outcomes include stroke recurrence, composite vascular events, poor functional outcome, and death.
The invention provides the 4 clinical phenotypes obtained by the construction method: clinical phenotype 1, mainly characterized by abnormal glucose and lipid metabolism; clinical phenotype 2, mainly characterized by inflammation and abnormal renal function; clinical phenotype 3, mainly characterized by arteriolar occlusion; and clinical phenotype 4, mainly characterized by abnormal homocysteine metabolism.
The invention provides a machine learning-based screening method of key biomarkers related to acute ischemic stroke prognosis, which comprises the following steps:
further screening the 30 features selected in the above construction method using information gain and the LightGBM algorithm to obtain the key biomarkers.
The invention provides key biomarkers for acute ischemic stroke prognosis. When the features relate only to clinical information and blood test indexes, the key biomarkers comprise one or more of the following: glutamic-pyruvic transaminase, high-sensitivity C-reactive protein, gamma-glutamyl transpeptidase, neutrophil count, creatinine, triglycerides, fasting plasma glucose, methylmalonic acid, and lipoprotein(a).
Preferably, when the features also relate to imaging information, the key biomarkers further include infarct volume.
The invention provides application of the key biomarker in construction of an acute ischemic stroke prognosis risk prediction model.
The invention provides a construction method of an acute ischemic stroke prognosis risk prediction model based on a key biomarker, which comprises the following steps:
constructing a prediction model by LightGBM machine learning based on the key prognostic biomarkers of acute ischemic stroke.
The invention provides use of the constructed acute ischemic stroke prognosis risk prediction model for non-therapeutic prediction of disease risk and/or evaluation of treatment responsiveness in patients with acute ischemic stroke.
Preferably, predicting the disease risk of a patient with acute ischemic stroke means classifying the patient into one of the 4 clinical phenotypes using the disease prognosis risk prediction model;
the treatment responsiveness evaluation analyzes the responsiveness of the different clinical phenotypes to intensive statin therapy.
The construction method provided by the invention mines clinical phenotypes of acute ischemic stroke in a data-driven manner from multi-omics data, integrates the multi-omics data to construct deep clinical phenotypes of acute ischemic stroke, and implements a machine learning algorithm for evaluating patients' clinical outcomes and treatment responsiveness. The constructed clinical phenotypes can be applied to risk stratification and treatment decision support for patients with acute ischemic stroke, forming an effective clinical evaluation tool.
The invention provides key biomarkers for acute ischemic stroke prognosis. When the features relate only to clinical information and blood test indexes, the key biomarkers comprise one or more of the following: glutamic-pyruvic transaminase, high-sensitivity C-reactive protein, gamma-glutamyl transpeptidase, neutrophil count, creatinine, triglycerides, fasting plasma glucose, methylmalonic acid, and lipoprotein(a). The key prognostic biomarkers obtained by screening can accurately stratify patient risk according to different pathophysiological mechanisms. Because machine learning is used to mine reliable biomarkers from a large amount of data and only a small number of key biomarkers are used in the final prediction model, high-precision prediction is achieved while the model remains convenient to apply in clinical practice.
Drawings
FIG. 1 shows the characteristics of the clinical phenotypes: clinical phenotype 1 is mainly characterized by abnormal glucose metabolism (fasting blood glucose) and lipid metabolism (triglycerides); clinical phenotype 2 is mainly characterized by inflammation (high-sensitivity C-reactive protein, neutrophils) and abnormal renal function (creatinine); clinical phenotype 3 shows no significant abnormalities; clinical phenotype 4 is mainly characterized by abnormal homocysteine metabolism (methylmalonic acid);
FIG. 2 shows the differences in prognostic risk among the clinical phenotypes: in both the derivation cohort (A-D) and the validation cohort (E-H), phenotype 2 has the highest risk of stroke recurrence, composite vascular events, all-cause death and poor functional outcome compared with phenotype 3, and phenotypes 1 and 4 also show increased risk to varying degrees;
FIG. 3 shows model performance: the prediction models built from 9 biomarkers (A) and 10 biomarkers (B) both achieve accurate phenotype classification;
FIG. 4 shows a Monte Carlo stochastic simulation comparing the responsiveness of the different clinical phenotypes to intensive statin therapy: phenotype 2 responds well to intensive statin therapy in both the derivation cohort (A-C) and the validation cohort (D-F).
Detailed Description
The invention provides a construction method of an acute ischemic stroke clinical phenotype, which comprises the following steps:
1) collating 92 features covering clinical information, imaging information and blood test indexes of acute ischemic stroke cases;
2) selecting the top 30 features related to disease prognosis from the 92 features using information gain and the Light Gradient Boosting Machine (LightGBM) learning method;
3) performing cluster analysis on the 30 features selected in step 2) based on the Bayesian information criterion (BIC), determining the optimal number of clusters from the minimum BIC value, and thereby determining that the cases are divided into 4 clinical phenotypes;
4) performing unsupervised cluster analysis on the 30 features selected in step 2) using a Gaussian mixture model to obtain 4 clusters;
5) comparing the differences in clinical information, blood test indexes and clinical outcomes among the 4 clusters obtained in step 4) to obtain 4 clinical phenotypes.
The invention first collates 92 features covering clinical information, imaging information and blood test indexes of acute ischemic stroke cases.
In the present invention, the 92 features covering clinical information, imaging information and blood test indexes include demographic characteristics, clinical symptoms and signs, blood components, coagulation function, liver and kidney function, inflammatory markers, glucose metabolism, lipid metabolism, homocysteine metabolism, gut microbiota metabolites, infarct imaging features, and intracranial and extracranial vascular stenosis, as detailed in Table 1.
TABLE 1. The 92 features used in the invention
[Table 1 appears only as images in the source publication and is not reproduced here.]
After the 92 features are obtained, the invention selects the top 30 features related to disease prognosis from them using information gain and the LightGBM learning method.
In the present invention, the top 30 features related to disease prognosis are listed in Table 2.
TABLE 2. Top 30 features related to disease prognosis
[Table 2 appears only as images in the source publication and is not reproduced here.]
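The information-gain part of the feature ranking can be illustrated with a minimal sketch. It assumes features have already been discretized (continuous laboratory values would first need binning), and it covers only information gain, not the LightGBM importance that the invention combines with it. The feature names and toy values are hypothetical and are not taken from the patent.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a sequence of discrete labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(feature, labels):
    """Entropy reduction in `labels` after splitting on a discrete feature."""
    n = len(labels)
    groups = {}
    for value, label in zip(feature, labels):
        groups.setdefault(value, []).append(label)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - conditional

def top_k_features(table, labels, k):
    """Return the k feature names with the highest information gain."""
    gains = {name: information_gain(col, labels) for name, col in table.items()}
    return sorted(gains, key=gains.get, reverse=True)[:k]

# Hypothetical toy data: 1 = poor outcome, 0 = good outcome.
outcome = [1, 1, 1, 0, 0, 0, 0, 1]
features = {
    "hs_crp_high":     [1, 1, 1, 0, 0, 0, 0, 1],  # fully aligned with outcome
    "smoker":          [1, 0, 1, 0, 1, 0, 1, 0],  # uninformative
    "creatinine_high": [1, 1, 0, 0, 0, 0, 1, 1],  # partly informative
}
ranked = top_k_features(features, outcome, k=2)
print(ranked)
```

In the invention the same idea is applied with k = 30 over 92 features.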
After the 30 features are obtained, the invention determines the optimal number of clusters based on the Bayesian information criterion (BIC), taking the cluster number with the minimum BIC, and accordingly divides the cases described by the 30 selected features into 4 clinical phenotypes.
In the invention, the BIC is calculated according to Formula I, and the model with the lowest BIC is the best-fitting model.
$$\mathrm{BIC} = -2\ln(L) + k\ln(n) \qquad \text{(Formula I)}$$
where L is the maximized likelihood of the model, n is the number of data points and k is the number of estimated parameters.
After the optimal cluster number of 4 is obtained, the invention performs unsupervised cluster analysis on the 30 selected features using a Gaussian mixture model to obtain 4 clusters, which correspond to the 4 clinical phenotypes.
The Gaussian mixture model clustering method is not particularly limited; any Gaussian mixture model clustering method known in the art may be used.
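Since any standard Gaussian mixture clustering method may be used, the following is only a minimal sketch of one: expectation-maximization for a one-dimensional mixture in pure Python. The invention clusters 30 features into 4 components; the single feature, two components, fixed initial means and toy data here are simplifying assumptions for illustration.

```python
import math

def normal_pdf(x, mean, var):
    """Density of a univariate normal distribution."""
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def fit_gmm_1d(data, means, n_iter=100, tol=1e-8):
    """EM for a 1-D Gaussian mixture; `means` supplies the initial centres."""
    k = len(means)
    means = list(means)
    variances = [1.0] * k
    weights = [1.0 / k] * k
    prev_ll = -math.inf
    for _ in range(n_iter):
        # E step: responsibility of each component for each point.
        resp, ll = [], 0.0
        for x in data:
            dens = [w * normal_pdf(x, m, v)
                    for w, m, v in zip(weights, means, variances)]
            total = sum(dens)
            ll += math.log(total)
            resp.append([d / total for d in dens])
        # M step: re-estimate weights, means and variances.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            weights[j] = nj / len(data)
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj
            variances[j] = max(var, 1e-6)  # floor avoids a collapsing component
        if abs(ll - prev_ll) < tol:
            break
        prev_ll = ll
    # Hard assignment: the component with the highest responsibility.
    labels = [max(range(k), key=lambda j: r[j]) for r in resp]
    return weights, means, variances, labels

# Hypothetical standardized values of one biomarker forming two groups.
data = [1.0, 1.2, 0.8, 1.1, 0.9, 5.0, 5.2, 4.8, 5.1, 4.9]
weights, means, variances, labels = fit_gmm_1d(data, means=[0.0, 6.0])
print(labels)
```

A multivariate version replaces the scalar variance with a covariance matrix per component; the E and M steps keep the same structure.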
After the 4 clusters are obtained, the invention compares the differences in clinical information, blood test indexes and clinical outcomes among them to obtain the 4 clinical phenotypes.
In the invention, differences in clinical information and blood test indexes among the 4 clusters, expressed as median (interquartile range), are compared using statistical methods such as the Kruskal-Wallis H test and the chi-square test; differences in clinical outcomes are compared using hazard ratios (HR) from Cox regression, odds ratios (OR) and 95% confidence intervals, with P < 0.05 considered statistically significant.
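As a worked illustration of an odds ratio with its 95% confidence interval, the sketch below computes an unadjusted OR and a Wald interval from a 2x2 table. This is a simplification: the ratios reported in this document are adjusted estimates from regression models, and the counts used here are hypothetical.

```python
import math

def odds_ratio_ci(a, b, c, d, z=1.96):
    """Unadjusted odds ratio and Wald 95% CI from a 2x2 table.

    a, b: events / non-events in the phenotype of interest
    c, d: events / non-events in the reference phenotype
    """
    or_ = (a * d) / (b * c)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # SE of ln(OR)
    lo = math.exp(math.log(or_) - z * se)
    hi = math.exp(math.log(or_) + z * se)
    return or_, lo, hi

# Hypothetical counts, not taken from the patent.
or_, lo, hi = odds_ratio_ci(a=40, b=60, c=20, d=80)
significant = not (lo <= 1.0 <= hi)  # CI excluding 1 corresponds to P < 0.05
print(f"OR {or_:.2f}, 95% CI {lo:.2f}-{hi:.2f}, significant={significant}")
```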
In the present invention, the clinical outcomes preferably include stroke recurrence, composite vascular events, poor functional outcome, and death. Poor functional outcome is preferably defined as a modified Rankin Scale (mRS) score of 3-6.
The 4 clinical phenotypes obtained by the above construction method are clinical phenotype 1, mainly characterized by abnormal glucose and lipid metabolism; clinical phenotype 2, mainly characterized by inflammation and abnormal renal function; clinical phenotype 3, mainly characterized by arteriolar occlusion; and clinical phenotype 4, mainly characterized by abnormal homocysteine metabolism. The 4 clinical phenotypes are distinguished by different biomarkers and reflect different pathophysiological mechanisms. The biomarker differences among the 4 clinical phenotypes are detailed in Table 3 and FIG. 1. With clinical phenotype 3 as the reference, clinical phenotype 2 had a significantly increased 3-month risk of stroke recurrence (adjusted hazard ratio [aHR] 1.89, 95% confidence interval [CI] 1.38-2.57), composite vascular events (aHR 1.98, 95% CI 1.46-2.68), death (aHR 12.92, 95% CI 6.95-24.02) and poor functional outcome (adjusted odds ratio [aOR] 3.61, 95% CI 2.96-4.39, p < 0.001); clinical phenotype 4 had a significantly increased 3-month risk of stroke recurrence (aHR 1.77, 95% CI 1.45-2.16, p < 0.001), composite vascular events (aHR 1.79, 95% CI 1.47-2.18, p < 0.001), death (aHR 4.18, 95% CI 2.32-7.55, p < 0.001) and poor functional outcome (aOR 2.31, 95% CI 2.04-2.61, p < 0.001); clinical phenotype 1 showed no significant difference in 3-month clinical outcome risk, but had a significantly increased 1-year risk of stroke recurrence (aHR 1.23, 95% CI 1.04-1.45, p = 0.014), composite vascular events (aHR 1.24, 95% CI 1.06-1.46, p = 0.008) and poor functional outcome (aOR 1.35, 95% CI 1.19-1.53, p < 0.001). The differences in clinical outcomes among the 4 clinical phenotypes are shown in FIG. 2, A-D.
The invention provides a screening method of a machine learning-based key biomarker related to acute ischemic stroke prognosis, which comprises the following steps:
further screening the 30 features selected in the above construction method using information gain and the LightGBM algorithm to obtain the key biomarkers.
In the invention, the number of features retained by this further screening is preferably 9-10. The top 10 ranked features are selected using information gain and the LightGBM algorithm.
In the present invention, when the features relate only to clinical information and blood test indexes, the key biomarkers consist of 9 biomarkers: glutamic-pyruvic transaminase, high-sensitivity C-reactive protein, gamma-glutamyl transpeptidase, neutrophil count, creatinine, triglycerides, fasting plasma glucose, methylmalonic acid, and lipoprotein(a). When the features also relate to imaging information, the key biomarkers further include infarct volume, giving a set of 10 key biomarkers.
The invention also provides application of the key biomarker in constructing an acute ischemic stroke prognosis risk prediction model.
The invention provides a construction method of an acute ischemic stroke prognosis risk prediction model based on a key biomarker, which comprises the following steps:
constructing a prediction model by LightGBM machine learning based on the key prognostic biomarkers of acute ischemic stroke.
In the invention, after the prediction model is constructed by LightGBM machine learning, ten-fold cross-validation is preferably used, and model performance is evaluated by the area under the receiver operating characteristic curve (AUC).
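The evaluation procedure can be sketched without any particular model library. The code below shows one way to split samples into ten folds, pool the out-of-fold scores and compute a binary AUC as the probability that a positive case outranks a negative one. The `fit` and `predict` callables are hypothetical stand-ins for LightGBM training and scoring, the data are invented, and the binary setting is a simplification of the four-phenotype classification.

```python
import random

def auc(y_true, scores):
    """AUC as the probability a positive outranks a negative (ties count 0.5)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def k_fold_indices(n_samples, k=10, seed=0):
    """Shuffle indices once and split them into k nearly equal test folds."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def cross_validated_auc(X, y, fit, predict, k=10):
    """Score every sample with a model that never saw it, then compute AUC."""
    scores = [0.0] * len(y)
    for test_idx in k_fold_indices(len(y), k):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(y)) if i not in held_out]
        model = fit([X[i] for i in train_idx], [y[i] for i in train_idx])
        for i in test_idx:
            scores[i] = predict(model, X[i])
    return auc(y, scores)

# Hypothetical data and a stand-in "model" whose score is the first feature.
X = [[0.1], [0.4], [0.35], [0.8], [0.2], [0.9], [0.7], [0.3], [0.6], [0.05]]
y = [0, 0, 1, 1, 0, 1, 1, 0, 1, 0]
result = cross_validated_auc(X, y,
                             fit=lambda Xs, ys: None,
                             predict=lambda model, x: x[0])
print(result)
```

Pooling out-of-fold scores is one common convention; averaging the per-fold AUCs is another, and the source does not say which was used.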
In the present invention, the acute ischemic stroke prognosis risk prediction model is preferably applied as follows:
when the case information of the patient to be predicted includes only blood markers and clinical information, the prediction model built on the 9 biomarkers is used; its performance is shown in FIG. 3, A;
when the case information of the patient to be predicted includes imaging information, blood markers and clinical information, the prediction model built on the 10 biomarkers, which can accurately achieve risk stratification, is used; its performance is shown in FIG. 3, B.
The invention provides use of the constructed acute ischemic stroke prognosis risk prediction model for non-therapeutic prediction of disease risk and/or evaluation of treatment responsiveness in patients with acute ischemic stroke.
In the present invention, predicting the disease risk of a patient with acute ischemic stroke preferably means classifying the patient into one of the 4 clinical phenotypes using the disease prognosis risk prediction model. The disease prognosis risk prediction results are shown in FIG. 2, E-H.
In the present invention, the treatment responsiveness evaluation is preferably an analysis of the responsiveness of the different clinical phenotypes to intensive statin therapy. The analysis preferably compares the responsiveness of the different phenotype subclasses to intensive statin therapy by Monte Carlo stochastic simulation.
The construction method, the screening method for key biomarkers and the application of the clinical phenotype of acute ischemic stroke provided by the present invention are described in detail with reference to the following examples, but they should not be construed as limiting the scope of the present invention.
Example 1
1. Acute ischemic stroke population
Patients with ischemic stroke or TIA were consecutively recruited through the Third China National Stroke Registry (CNSR-III) from 201 hospitals in 22 provinces and 4 municipalities in China between August 2015 and March 2018. A total of 15166 patients were enrolled as the derivation cohort. Clinical data were collected prospectively; baseline brain imaging, including brain MRI and cerebrovascular imaging, was completed; baseline blood samples were collected and blood indexes were measured; and clinical outcomes were followed up at 3 months, 6 months and 1 year. Follow-up outcomes included stroke recurrence, modified Rankin Scale (mRS) score, death and the like.
1) Inclusion criteria:
① age 18 years or older;
② time from onset to enrollment of 7 days or less;
③ clinical diagnosis of acute ischemic stroke, confirmed by head CT or MRI;
④ signed informed consent.
2) Exclusion criteria:
① cardiogenic (cardioembolic) stroke and the presence of high-risk cardiogenic risk factors;
② ischemic stroke caused by concomitant tumor, acute infectious disease or other specific causes.
2. Feature selection and preprocessing
1) Demographic characteristics: age, sex, smoking, alcohol consumption, hypertension, diabetes, hyperlipidemia, etc.
2) Clinical symptoms and signs: NIHSS score, heart rate, systolic blood pressure, diastolic blood pressure, etc.
3) Routine laboratory tests: routine blood test, coagulation function, myocardial enzymes, biochemistry and electrolytes, liver and kidney function, homocysteine (HCY), folic acid, vitamin B12, etc.
4) Inflammatory biomarkers: high-sensitivity C-reactive protein (hs-CRP), interleukin-1 receptor antagonist (IL-1RA), interleukin-6 (IL-6), interleukin-6 receptor (IL-6R), monocyte chemoattractant protein-1 (MCP-1), chitinase-3-like protein 1 (YKL-40), etc.
5) Glucose metabolism: fasting plasma glucose, glycated hemoglobin, glycated albumin, AGEs, esRAGE, sRAGE, insulin, C-peptide, etc.
6) Lipid metabolism: total cholesterol, HDL-C, LDL-C, triglycerides, lipoprotein(a), PCSK9, apolipoprotein A1 (Apo-AI), Apo-AII, Apo-B, Apo-CII, Apo-CIII, Apo-E, etc.
7) Gut microbiota metabolites: choline, betaine, trimethylamine N-oxide (TMAO), carnitine, trimethyllysine (TML), choline phosphate, etc.
8) Acute infarct imaging features: acute infarct lesions were segmented from the original DWI and ADC images using a previously established deep learning model to obtain imaging features such as infarct volume and number of infarcts.
9) Intracranial and extracranial vascular evaluation: intracranial and extracranial vascular examinations such as magnetic resonance angiography (MRA) and head and neck CT angiography (CTA) were evaluated in a blinded manner by professional neurologists and radiologists to obtain intracranial and extracranial vascular stenosis features, including symptomatic intracranial arterial stenosis, symptomatic extracranial arterial stenosis, and stenosis severity (< 50%; 50-99%; occlusion).
After data collation, 92 features were finally obtained, and the data were preprocessed by standardization and normalization. The top 30 features related to disease prognosis were selected based on information gain and the LightGBM model. The results are shown in Table 2.
TABLE 2. Top 30 features related to disease prognosis
[Table 2 appears only as images in the source publication and is not reproduced here.]
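The preprocessing step mentions standardization and normalization without specifying the variants. A minimal sketch of two common choices, z-score standardization and min-max normalization, is given below as an assumption; the glucose readings are hypothetical.

```python
import statistics

def standardize(values):
    """Z-score: subtract the mean and divide by the sample standard deviation."""
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)
    return [(v - mean) / sd for v in values]

def min_max_normalize(values):
    """Rescale values linearly onto the interval [0, 1]."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

# Hypothetical fasting plasma glucose readings (mmol/L).
glucose = [4.8, 5.6, 7.2, 9.4, 6.0]
z = standardize(glucose)
scaled = min_max_normalize(glucose)
print([round(v, 2) for v in scaled])
```

Putting features on a common scale matters for the later Gaussian mixture clustering, where features with large numeric ranges would otherwise dominate the covariance estimates.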
3. Cluster analysis
1) Determining an optimal number of clusters
The optimal number of clusters is chosen by the Bayesian information criterion (BIC) using finite mixture model-based clustering. The cluster number giving the lowest BIC is optimal. BIC is calculated by Formula I, where L is the maximized likelihood, n is the number of data points and k is the number of estimated parameters.
$$\mathrm{BIC} = -2\ln(L) + k\ln(n) \qquad \text{(Formula I)}$$
BIC adds a penalty on the number of model parameters so that a more suitable model is selected. Some software reports BIC with the opposite sign and maximizes it; under Formula I, the goal is to minimize BIC.
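Formula I and the selection rule can be sketched as follows. The parameter count assumes a full-covariance Gaussian mixture, which the source does not state, and the log-likelihood values are hypothetical numbers chosen only to show the penalty at work.

```python
import math

def bic(log_likelihood, n_params, n_samples):
    """Formula I: BIC = -2 ln(L) + k ln(n)."""
    return -2.0 * log_likelihood + n_params * math.log(n_samples)

def gmm_param_count(n_components, n_features):
    """Free parameters of a full-covariance Gaussian mixture (an assumption)."""
    means = n_components * n_features
    covs = n_components * n_features * (n_features + 1) // 2
    weights = n_components - 1
    return means + covs + weights

def best_cluster_count(log_likelihoods, n_features, n_samples):
    """Pick the number of clusters whose fitted model has the lowest BIC."""
    scores = {
        g: bic(ll, gmm_param_count(g, n_features), n_samples)
        for g, ll in log_likelihoods.items()
    }
    return min(scores, key=scores.get), scores

# Hypothetical maximized log-likelihoods for 1 to 5 clusters.
fitted = {1: -5200.0, 2: -4900.0, 3: -4700.0, 4: -4500.0, 5: -4490.0}
best, scores = best_cluster_count(fitted, n_features=3, n_samples=1000)
print(best)
```

In this toy example the fifth cluster improves the likelihood only slightly, so the added parameter penalty makes its BIC worse than that of the four-cluster model.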
2) Gaussian mixture model (GMM) cluster analysis
GMM is a probabilistic clustering method that assumes all data samples X are generated from a mixture of K multivariate Gaussian distributions.
① Initialization: define the number of components K, randomly select K objects to represent the mean (center) of each cluster, set initial values of π_k, μ_k and Σ_k for each component k, and compute the initial log-likelihood.
② E step: compute the posterior probability γ(z_nk) from the current π_k, μ_k and Σ_k.
[Equation image in the source publication, not reproduced here.]
③ M step: update π_k, μ_k and Σ_k of each cluster using γ(z_nk) computed in the E step.
[Equation image in the source publication, not reproduced here.]
④ Compute the log-likelihood function.
[Equation image in the source publication, not reproduced here.]
⑤ Repeat steps ② and ③ until the algorithm converges.
Each object is then assigned to the cluster for which its membership probability is highest.
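The equations for these steps are images in the source and could not be recovered. For reference, the textbook forms of the E step, M step and log-likelihood for a Gaussian mixture are given below, using the symbols π_k, μ_k, Σ_k and γ(z_nk) from the text; that the original images match this notation exactly is an assumption.

```latex
% E step: posterior responsibility of component k for sample x_n
\gamma(z_{nk}) = \frac{\pi_k \,\mathcal{N}(x_n \mid \mu_k, \Sigma_k)}
                      {\sum_{j=1}^{K} \pi_j \,\mathcal{N}(x_n \mid \mu_j, \Sigma_j)}

% M step: re-estimate the parameters from the responsibilities
N_k = \sum_{n=1}^{N} \gamma(z_{nk}), \qquad
\mu_k = \frac{1}{N_k} \sum_{n=1}^{N} \gamma(z_{nk})\, x_n, \qquad
\Sigma_k = \frac{1}{N_k} \sum_{n=1}^{N} \gamma(z_{nk})\,(x_n - \mu_k)(x_n - \mu_k)^{\top}, \qquad
\pi_k = \frac{N_k}{N}

% Log-likelihood monitored for convergence
\ln p(X \mid \pi, \mu, \Sigma) =
  \sum_{n=1}^{N} \ln \sum_{k=1}^{K} \pi_k \,\mathcal{N}(x_n \mid \mu_k, \Sigma_k)
```

Here N is the number of samples and the covariance update uses the newly estimated μ_k.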
The clustering algorithm was repeated 1000 times on bootstrap replicates to obtain a stable model.
4. The differences in clinical information, blood test indexes and clinical outcomes (including stroke recurrence, composite vascular events and poor functional outcome, defined as a modified Rankin Scale [mRS] score of 3-6) among the 4 clusters were compared to obtain 4 clinical phenotypes, which are distinguished by different biomarkers and reflect different pathophysiological mechanisms. Clinical phenotype 1 is mainly characterized by abnormal glucose and lipid metabolism; clinical phenotype 2 is mainly characterized by inflammation and abnormal renal function; clinical phenotype 3 is mainly characterized by arteriolar occlusion; clinical phenotype 4 is mainly characterized by abnormal homocysteine metabolism. The biomarker differences among the 4 clinical phenotypes are detailed in Table 3.
With clinical phenotype 3 as the reference, clinical phenotype 2 had a significantly increased 3-month risk of stroke recurrence (adjusted hazard ratio [aHR] 1.89, 95% confidence interval [CI] 1.38-2.57), composite vascular events (aHR 1.98, 95% CI 1.46-2.68), death (aHR 12.92, 95% CI 6.95-24.02) and poor functional outcome (adjusted odds ratio [aOR] 3.61, 95% CI 2.96-4.39, p < 0.001); clinical phenotype 4 had a significantly increased 3-month risk of stroke recurrence (aHR 1.77, 95% CI 1.45-2.16, p < 0.001), composite vascular events (aHR 1.79, 95% CI 1.47-2.18, p < 0.001), death (aHR 4.18, 95% CI 2.32-7.55, p < 0.001) and poor functional outcome (aOR 2.31, 95% CI 2.04-2.61, p < 0.001); clinical phenotype 1 showed no significant difference in 3-month clinical outcome risk, but had a significantly increased 1-year risk of stroke recurrence (aHR 1.23, 95% CI 1.04-1.45, p = 0.014), composite vascular events (aHR 1.24, 95% CI 1.06-1.46, p = 0.008) and poor functional outcome (aOR 1.35, 95% CI 1.19-1.53, p < 0.001). The differences in clinical outcomes among the 4 clinical phenotypes are shown in FIG. 2, A-D.
Example 2
Machine learning-based screening method for acute ischemic stroke prognosis-related key biomarkers
Prognosis-related features are selected using information gain and the Light Gradient Boosting Machine (LightGBM) learning method;
if the case information includes only blood markers and clinical information, 9 prognosis-related key biomarkers are selected (glutamic-pyruvic transaminase, high-sensitivity C-reactive protein, gamma-glutamyl transpeptidase, neutrophil count, creatinine, triglyceride, fasting plasma glucose, methylmalonic acid and lipoprotein (a));
if the case information includes imaging information, blood markers and clinical information, 10 prognosis-related key biomarkers are selected (infarct volume, glutamic-pyruvic transaminase, high-sensitivity C-reactive protein, gamma-glutamyl transpeptidase, neutrophil count, creatinine, triglyceride, fasting plasma glucose, methylmalonic acid and lipoprotein (a)). The key biomarker differences among the 4 clinical phenotypes are detailed in FIG. 1 and Table 3.
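The idea of ranking features by information gain can be sketched as follows. This is a minimal illustration, assuming the classic entropy-based definition of information gain evaluated on the best single threshold split of each feature; it does not call LightGBM and is not the patent's implementation. The function names (`best_gain`, `rank_features`) and the toy markers are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())


def information_gain(values, labels, threshold):
    """Entropy reduction obtained by splitting the labels at values <= threshold."""
    left = [y for x, y in zip(values, labels) if x <= threshold]
    right = [y for x, y in zip(values, labels) if x > threshold]
    if not left or not right:
        return 0.0
    n = len(labels)
    weighted = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
    return entropy(labels) - weighted


def best_gain(values, labels):
    """Largest gain over all midpoints between consecutive distinct values."""
    distinct = sorted(set(values))
    cuts = [(a + b) / 2 for a, b in zip(distinct, distinct[1:])]
    return max((information_gain(values, labels, c) for c in cuts), default=0.0)


def rank_features(table, labels, top_k):
    """Return the top_k feature names ordered by their best single-split gain."""
    gains = {name: best_gain(column, labels) for name, column in table.items()}
    return sorted(gains, key=gains.get, reverse=True)[:top_k]


# Toy, made-up data: 1 = adverse outcome, 0 = favourable outcome.
outcome = [0, 0, 0, 1, 1, 1]
toy = {
    "marker_a": [1.0, 1.2, 1.1, 3.0, 3.2, 2.9],  # separates the outcomes cleanly
    "marker_b": [5.0, 1.0, 5.0, 1.0, 5.0, 1.0],  # only weakly related to outcome
}
ranked = rank_features(toy, outcome, top_k=1)
```

In a gradient-boosting library the gain is typically accumulated over every split in every tree rather than taken from one split, so the ranking produced by this sketch should be read as an intuition aid only.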
Example 3
Disease risk prediction model construction method based on biomarkers
The key biomarkers screened in Example 2 were used to construct a prediction model with LightGBM. Ten-fold cross-validation was performed, and model performance was evaluated by the area under the receiver operating characteristic curve (AUC).
If the case information includes only blood markers and clinical information, step 1.1) is executed;
if the case information includes imaging information, blood markers and clinical information, step 1.2) is executed.
In step 1.1), a prediction model capable of accurate risk stratification is constructed from the 9 biomarkers; the model performance is shown in A of FIG. 3.
In step 1.2), a prediction model capable of accurate risk stratification is constructed from the 10 biomarkers; the model performance is shown in B of FIG. 3.
The prediction model is used to classify patients into the different clinical phenotypes. The prognosis differences among the 4 clinical phenotypes in the validation cohort were compared, and the results were highly consistent with those in the derivation cohort; the prognosis risk prediction results are shown in E-H of FIG. 2. This indicates that the model achieves good prognostic risk stratification.
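The evaluation scheme used in this example (ten-fold cross-validation scored by AUC) can be sketched in plain Python. This is a minimal illustration under stated assumptions: a simple nearest-centroid scorer stands in for the LightGBM model, out-of-fold scores are pooled into a single AUC, and all names (`cross_validated_auc`, `fit_centroid_scorer`) and the toy cohort are hypothetical.

```python
import random


def auc(scores, labels):
    """AUC as the probability that a positive case outranks a negative one (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def k_fold_indices(n_samples, k, seed=0):
    """Shuffle the indices once and deal them into k roughly equal folds."""
    order = list(range(n_samples))
    random.Random(seed).shuffle(order)
    return [order[i::k] for i in range(k)]


def fit_centroid_scorer(rows, labels):
    """Stand-in model: higher score means closer to the positive-class centroid."""
    def centroid(target):
        picked = [r for r, y in zip(rows, labels) if y == target]
        return [sum(col) / len(picked) for col in zip(*picked)]

    c_pos, c_neg = centroid(1), centroid(0)

    def score(row):
        d_pos = sum((a - b) ** 2 for a, b in zip(row, c_pos))
        d_neg = sum((a - b) ** 2 for a, b in zip(row, c_neg))
        return d_neg - d_pos

    return score


def cross_validated_auc(rows, labels, k=10, seed=0):
    """Score every case with a model that never saw it, then compute one pooled AUC."""
    out_of_fold = [0.0] * len(rows)
    for fold in k_fold_indices(len(rows), k, seed):
        held_out = set(fold)
        train = [i for i in range(len(rows)) if i not in held_out]
        model = fit_centroid_scorer([rows[i] for i in train], [labels[i] for i in train])
        for i in fold:
            out_of_fold[i] = model(rows[i])
    return auc(out_of_fold, labels)


# Deterministic toy cohort (made-up values): 20 negatives followed by 20 positives.
rows = [[0.1 * i, 0.2 * i] for i in range(20)] + [[5 + 0.1 * i, 6 + 0.2 * i] for i in range(20)]
labels = [0] * 20 + [1] * 20
cv_auc = cross_validated_auc(rows, labels, k=10)
```

Because the toy classes are cleanly separated, the resulting AUC should be close to 1; real biomarker data would give a lower and more informative value. Averaging one AUC per fold is an equally common alternative to pooling.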
Example 4
Analysis of the correlation of the deep phenotype with the risk of stroke recurrence
The risk of stroke recurrence at 3 months, 6 months and 1 year was compared among patients in the different phenotype subgroups. Recurrence risk was assessed using a Cox proportional hazards regression model, and hazard ratios (HR) with 95% CI were reported. The key phenotypic characteristics of the group at high risk of stroke recurrence were then analyzed further.
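For reference, the standard form of the Cox proportional hazards model and of the reported quantities is given below. The notation is an assumption of this note and does not appear in the source: h_0(t) is the baseline hazard, x the covariate vector (for example phenotype indicators plus adjustment variables), and the interval is the usual Wald interval.

```latex
h(t \mid x) = h_0(t)\,\exp\!\left(\beta^{\top} x\right), \qquad
\mathrm{HR}_j = \exp(\beta_j), \qquad
95\%\ \mathrm{CI}_j = \exp\!\left(\hat{\beta}_j \pm 1.96\,\mathrm{SE}(\hat{\beta}_j)\right)
```

A Wald interval is symmetric on the log scale, so a reported hazard ratio should lie near the geometric mean of its confidence limits. For the aHR of 1.89 with 95% CI 1.38-2.57 reported in Example 1, the geometric mean of the limits is about 1.88.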
Example 5
Treatment responsiveness assessment
The responsiveness of the different phenotype subgroups to intensive statin treatment was compared by Monte Carlo stochastic simulation.
The results are shown in FIG. 4. In the original baseline data, intensive statin treatment may benefit 0.01% of patients, may be ineffective in 76.69%, and may be harmful in 23.30%. The highest rate of change in clinical phenotype 2 in the cohort was expected to bring the proportion of patients who benefit to 6.1% and the proportion harmed to 0.22% (A and D in FIG. 4). In the validation cohort, the highest rate of change in clinical phenotype 2 brought the proportion who benefit to 87.51% (B in FIG. 4) and the proportion harmed to 0.00% (E in FIG. 4). Changes in clinical phenotype 1, clinical phenotype 2, and clinical phenotype 4 had no significant effect on treatment responsiveness (C and F in FIG. 4).
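The source does not describe how the simulation was set up, so the following is only a generic sketch of how a Monte Carlo simulation can relate the phenotype mix of a cohort to the share of patients who benefit, see no effect, or are harmed. Every name and number in it is hypothetical: the phenotype labels, the response probabilities and the two mixes are invented for illustration and are not taken from the patent.

```python
import random

OUTCOMES = ("benefit", "no_effect", "harm")


def simulate_response_shares(phenotype_mix, response_probs, n_patients=10_000, seed=0):
    """Estimate outcome shares for a simulated cohort with the given phenotype mix."""
    rng = random.Random(seed)
    phenotypes = list(phenotype_mix)
    weights = [phenotype_mix[p] for p in phenotypes]
    counts = dict.fromkeys(OUTCOMES, 0)
    for _ in range(n_patients):
        # Draw a phenotype for this simulated patient, then draw a response.
        phenotype = rng.choices(phenotypes, weights=weights)[0]
        probs = [response_probs[phenotype][o] for o in OUTCOMES]
        counts[rng.choices(OUTCOMES, weights=probs)[0]] += 1
    return {o: counts[o] / n_patients for o in OUTCOMES}


# Hypothetical per-phenotype response probabilities (illustrative only).
response_probs = {
    "phenotype_a": {"benefit": 0.05, "no_effect": 0.80, "harm": 0.15},
    "phenotype_b": {"benefit": 0.60, "no_effect": 0.38, "harm": 0.02},
}
baseline_mix = {"phenotype_a": 0.9, "phenotype_b": 0.1}
shifted_mix = {"phenotype_a": 0.3, "phenotype_b": 0.7}

baseline = simulate_response_shares(baseline_mix, response_probs)
shifted = simulate_response_shares(shifted_mix, response_probs)
```

Comparing `baseline` with `shifted` shows how the simulated share of patients who benefit moves as the proportion of a more responsive phenotype changes, which is the kind of comparison FIG. 4 is described as presenting.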
The foregoing is only a preferred embodiment of the present invention. It should be noted that those skilled in the art can make various improvements and refinements without departing from the principle of the present invention, and such improvements and refinements shall also be regarded as falling within the scope of protection of the present invention.
Table 3. Comparison of the characteristics of the 4 clinical phenotypes
[Table 3 is reproduced as images in the original publication: Figure BDA0003274933550000181, Figure BDA0003274933550000191, Figure BDA0003274933550000201]
Note: p < 0.05 with clinical phenotype 3 as the reference; # indicates the 10 key biomarkers screened.

Claims (10)

1. A method for constructing clinical phenotypes of acute ischemic stroke, characterized by comprising the following steps:
1) collating data on 92 features covering clinical information, imaging information and blood test indexes of acute ischemic stroke cases;
2) selecting, from the 92 features, the top 30 features related to disease prognosis using information gain and the Light Gradient Boosting Machine (LightGBM) learning method;
3) performing cluster analysis on the 30 features selected in step 2) based on a maximized Bayesian information criterion (BIC), determining the optimal number of clusters according to the minimum value of BIC, and determining a division into 4 clinical phenotypes;
4) performing unsupervised cluster analysis on the 30 features selected in step 2) using a Gaussian mixture model to obtain 4 clusters;
5) comparing the differences in clinical information, blood test indexes and clinical outcomes among the 4 clusters of step 4) to obtain 4 clinical phenotypes.
2. The construction method according to claim 1, wherein the clinical outcomes comprise stroke recurrence, composite vascular events, adverse functional outcome, and death.
3. The 4 clinical phenotypes obtained by the construction method according to claim 1 or 2, which include a clinical phenotype 1 mainly characterized by abnormal glucose metabolism and lipid metabolism, a clinical phenotype 2 mainly characterized by inflammation and abnormal renal function, a clinical phenotype 3 mainly characterized by arteriolar occlusion, and a clinical phenotype 4 mainly characterized by abnormal homocysteine metabolism.
4. A machine learning-based method for screening key biomarkers for acute ischemic stroke prognosis, characterized by comprising the following step:
optimizing the 30 features selected in the construction method of claim 1 using information gain and the LightGBM algorithm to obtain the key biomarkers.
5. The screening method of claim 4, wherein, when the features relate only to clinical information and blood test indexes, the key biomarkers for acute ischemic stroke prognosis comprise one or more of the following markers: glutamic-pyruvic transaminase, high-sensitivity C-reactive protein, gamma-glutamyl transpeptidase, neutrophil count, creatinine, triglycerides, fasting plasma glucose, methylmalonic acid, and lipoprotein (a).
6. The key biomarker of claim 5, wherein the key biomarker further comprises infarct volume when the features further relate to imaging information.
7. Use of the key biomarker of claim 5 or 6 in the construction of a model for predicting the risk of prognosis of acute ischemic stroke.
8. A method for constructing an acute ischemic stroke prognosis risk prediction model based on the key biomarker of claim 5 or 6, which comprises the following steps:
constructing a prediction model by LightGBM machine learning based on the key biomarker for acute ischemic stroke prognosis as claimed in claim 5 or 6.
9. Use of the prognostic risk prediction model constructed according to the method of claim 8 for non-therapeutic purposes in predicting the risk of disease and/or the treatment responsiveness of patients with acute ischemic stroke.
10. The use of claim 9, wherein predicting the disease risk of an acute ischemic stroke patient comprises classifying the patient into the 4 clinical phenotypes of claim 3 using the disease prognosis risk prediction model;
and the treatment responsiveness assessment comprises analyzing the responsiveness of the different clinical phenotypes to intensive statin treatment.
CN202111114266.5A 2021-09-23 2021-09-23 Acute ischemic stroke clinical phenotype construction method, key biomarker screening method and application thereof Pending CN113851216A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111114266.5A CN113851216A (en) 2021-09-23 2021-09-23 Acute ischemic stroke clinical phenotype construction method, key biomarker screening method and application thereof

Publications (1)

Publication Number Publication Date
CN113851216A true CN113851216A (en) 2021-12-28

Family

ID=78978810

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111114266.5A Pending CN113851216A (en) 2021-09-23 2021-09-23 Acute ischemic stroke clinical phenotype construction method, key biomarker screening method and application thereof

Country Status (1)

Country Link
CN (1) CN113851216A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20211228
