CN116087482A - Biomarkers for typing the disease-course severity of patients with novel coronavirus infection - Google Patents
- Publication number
- CN116087482A (application CN202310172516.3A)
- Authority
- CN
- China
- Prior art keywords
- biomarker
- model
- serum
- algorithm
- machine learning
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G01—MEASURING; TESTING
- G01N—INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
- G01N33/00—Investigating or analysing materials by specific methods not covered by groups G01N1/00 - G01N31/00
- G01N33/48—Biological material, e.g. blood, urine; Haemocytometers
- G01N33/50—Chemical analysis of biological material, e.g. blood, urine; Testing involving biospecific ligand binding methods; Immunological testing
-
- G—PHYSICS
- G01—MEASURING; TESTING
- G01N—INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
- G01N33/00—Investigating or analysing materials by specific methods not covered by groups G01N1/00 - G01N31/00
- G01N33/48—Biological material, e.g. blood, urine; Haemocytometers
- G01N33/50—Chemical analysis of biological material, e.g. blood, urine; Testing involving biospecific ligand binding methods; Immunological testing
- G01N33/53—Immunoassay; Biospecific binding assay; Materials therefor
- G01N33/5302—Apparatus specially adapted for immunological test procedures
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B20/00—ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B30/00—ICT specially adapted for sequence analysis involving nucleotides or amino acids
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B40/00—ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding
- G16B40/20—Supervised data analysis
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16H—HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
- G16H50/00—ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
- G16H50/20—ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for computer-aided diagnosis, e.g. based on medical expert systems
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16H—HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
- G16H50/00—ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
- G16H50/80—ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for detecting, monitoring or modelling epidemics or pandemics, e.g. flu
-
- G—PHYSICS
- G01—MEASURING; TESTING
- G01N—INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
- G01N2800/00—Detection or diagnosis of diseases
- G01N2800/26—Infectious diseases, e.g. generalised sepsis
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02A—TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
- Y02A50/00—TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE in human health protection, e.g. against extreme weather
- Y02A50/30—Against vector-borne diseases, e.g. mosquito-borne, fly-borne, tick-borne or waterborne diseases whose impact is exacerbated by climate change
Landscapes
- Health & Medical Sciences (AREA)
- Engineering & Computer Science (AREA)
- Life Sciences & Earth Sciences (AREA)
- Physics & Mathematics (AREA)
- Medical Informatics (AREA)
- Biomedical Technology (AREA)
- General Health & Medical Sciences (AREA)
- Public Health (AREA)
- Immunology (AREA)
- Chemical & Material Sciences (AREA)
- Biotechnology (AREA)
- Molecular Biology (AREA)
- Data Mining & Analysis (AREA)
- Pathology (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Analytical Chemistry (AREA)
- Hematology (AREA)
- Urology & Nephrology (AREA)
- Evolutionary Biology (AREA)
- Biophysics (AREA)
- Theoretical Computer Science (AREA)
- Databases & Information Systems (AREA)
- Spectroscopy & Molecular Physics (AREA)
- Bioinformatics & Computational Biology (AREA)
- Epidemiology (AREA)
- Proteomics, Peptides & Aminoacids (AREA)
- Biochemistry (AREA)
- Primary Health Care (AREA)
- Microbiology (AREA)
- General Physics & Mathematics (AREA)
- Cell Biology (AREA)
- Medicinal Chemistry (AREA)
- Food Science & Technology (AREA)
- Genetics & Genomics (AREA)
- Artificial Intelligence (AREA)
- Bioethics (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Evolutionary Computation (AREA)
- Software Systems (AREA)
- Investigating Or Analysing Biological Materials (AREA)
Abstract
The present invention provides biomarkers for typing the disease-course severity of patients with novel coronavirus infection, the biomarkers comprising at least one selected from serum signature proteins and serum signature metabolites. The biomarkers show good predictive performance: they can effectively evaluate the severity of COVID-19 in a patient, can reflect the patient's recovery, and help physicians accurately predict disease progression and intervene clinically in time.
Description
Technical Field
The invention relates to the technical field of biomedicine, and in particular to biomarkers for typing the disease-course severity of patients infected with the novel coronavirus.
Background
Coronavirus disease 2019 (COVID-19) is a novel respiratory and systemic disease caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). Since it was first reported in December 2019, COVID-19 has spread rapidly worldwide and posed a major challenge to global public health systems. In China it has been included, as an acute respiratory infectious disease, among the Class B infectious diseases regulated by the Law of the People's Republic of China on the Prevention and Control of Infectious Diseases, and managed under Class A measures. COVID-19 patients mainly present with fever, cough and fatigue; a few present with taste and smell disorders, and bilateral ground-glass opacities in the lungs are the most common imaging finding. Symptoms appear within 1-2 weeks of SARS-CoV-2 infection. Most cases are mild or moderate (common type), but a few severe cases develop dyspnea and/or hypoxia and, without timely treatment, can progress rapidly to acute respiratory distress syndrome, septic shock, and even refractory metabolic acidosis and coagulation dysfunction; the reported mortality among hospitalized COVID-19 patients reaches 3.28%. In the early stage of an outbreak the number of COVID-19 patients is large and limited medical resources are easily overwhelmed, so truly severe patients may struggle to obtain effective treatment. Accurately distinguishing critical from non-critical patients in the clinic is therefore of great importance.
Disclosure of Invention
The present invention aims to solve at least one of the technical problems existing in the prior art to at least some extent.
Since the COVID-19 outbreak, many studies have built machine learning classifiers from clinical characteristics and omics data, but limited sample sizes have made these classifiers difficult to implement effectively. This study integrates human serum proteomic and metabolomic data and, using machine learning models built on signature molecules (serum proteins and metabolites), identifies biomarkers that can be used to diagnose COVID-19 or to judge the severity of COVID-19 patients. The biomarkers can be used for risk stratification and for monitoring disease progression in COVID-19 patients.
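The patent does not say how the two serum data sets were combined. A minimal sketch, assuming simple feature-level concatenation on a shared sample ID (the helper name merge_omics, the table layout and the values are hypothetical, not taken from the patent):

```python
def merge_omics(proteins, metabolites):
    """Join two {sample_id: {feature: value}} tables on sample ID.

    Only samples measured on both platforms are kept, and feature names are
    prefixed so a protein and a metabolite can never collide.
    """
    shared = sorted(proteins.keys() & metabolites.keys())
    merged = {}
    for sample_id in shared:
        row = {f"prot:{name}": value for name, value in proteins[sample_id].items()}
        row.update({f"met:{name}": value for name, value in metabolites[sample_id].items()})
        merged[sample_id] = row
    return merged


# Illustrative values only (not data from the patent).
proteins = {"S1": {"SAA4": 1.2, "LECT2": 0.7}, "S2": {"SAA4": 0.8, "LECT2": 1.1}}
metabolites = {"S1": {"8-HETE": 3.4}, "S3": {"8-HETE": 2.9}}
table = merge_omics(proteins, metabolites)  # only S1 was measured on both platforms
```

Keeping only samples present in both tables avoids imputing a whole platform's worth of missing values; a real pipeline would typically also normalize each platform before merging.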
Thus, in one aspect, the invention provides a set of biomarkers. According to an embodiment of the invention, the biomarkers comprise at least one selected from serum signature proteins and serum signature metabolites, wherein the serum signature proteins comprise at least one of cortactin (SRC8), multiple inositol polyphosphate phosphatase 1 (MINP1), serum amyloid A4 (SAA4), Ras-related protein Rap-1b (RAP1B), filamin A (FLNA), matrix Gla protein (MGP), platelet-derived growth factor subunit B (PDGFB), attractin (ATRN), talin-1 (TLN1), tropomyosin alpha-4 chain (TPM4) and leukocyte cell-derived chemotaxin-2 (LECT2); and the serum signature metabolites comprise at least one of γ-glutamyl-β-aminopropionitrile, thiomorpholine-3-carboxylate (3-carboxythiomorpholine), 2-styryl-1,3-dioxolane (2-(phenylmethyl)-1,3-dioxolane), 3-hexadecanoyloleanolic acid, PC(16:1(9Z)/2:0), 8-hydroxyeicosatetraenoic acid (8-HETE), archaetidyl-myo-inositol, PC(O-16:0/O-18:0), the phenylalanyl-isoleucine dipeptide (Phe-Ile), 4-acetamido-2-aminobutanoic acid, N-undecanoylglycine and etoposide. The biomarkers show good predictive performance: they can effectively diagnose COVID-19 or evaluate its severity in a patient, can reflect the patient's recovery, and help physicians accurately predict disease progression and intervene clinically in time.
According to an embodiment of the present invention, the biomarker may further comprise at least one of the following additional technical features:
According to an embodiment of the invention, the serum signature proteins comprise at least one of cortactin (SRC8), multiple inositol polyphosphate phosphatase 1 (MINP1), serum amyloid A4 (SAA4), Ras-related protein Rap-1b (RAP1B), filamin A (FLNA), matrix Gla protein (MGP), platelet-derived growth factor subunit B (PDGFB), attractin (ATRN), talin-1 (TLN1) and tropomyosin alpha-4 chain (TPM4).
According to an embodiment of the invention, the serum signature metabolites comprise at least one of γ-glutamyl-β-aminopropionitrile, thiomorpholine-3-carboxylate, 2-styryl-1,3-dioxolane, 3-hexadecanoyloleanolic acid, PC(16:1(9Z)/2:0), 8-hydroxyeicosatetraenoic acid (8-HETE), archaetidyl-myo-inositol, PC(O-16:0/O-18:0), the phenylalanyl-isoleucine dipeptide (Phe-Ile) and 4-acetamido-2-aminobutanoic acid.
According to an embodiment of the invention, the biomarkers comprise at least one of γ-glutamyl-β-aminopropionitrile, thiomorpholine-3-carboxylate (3-carboxythiomorpholine), 2-styryl-1,3-dioxolane, cortactin (SRC8), filamin A (FLNA), leukocyte cell-derived chemotaxin-2 (LECT2), N-undecanoylglycine, etonogestrel, attractin (ATRN) and desloratadine.
According to an embodiment of the invention, the biomarkers are selected by means of a feature-selection algorithm combined with a machine learning model.
According to an embodiment of the invention, the algorithm is at least one selected from a correlation algorithm, a recursive feature elimination algorithm, a genetic algorithm, the Boruta algorithm and the max-min parents and children (MMPC) algorithm.
According to an embodiment of the invention, the algorithm is a correlation algorithm.
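The patent names a "correlation algorithm" without defining it. One common form of correlation-based feature selection ranks each candidate marker by its absolute Pearson correlation with the severity label and then discards markers that are nearly collinear with one already kept. The sketch below implements that variant as an assumption; the function names, the 0/1 label coding, the 0.9 redundancy cutoff and the toy data are all hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant vector has no linear association with anything
    return sxy / math.sqrt(sxx * syy)


def select_by_correlation(features, labels, top_k, redundancy_cutoff=0.9):
    """Rank features by |r| with the severity label, then keep them greedily,
    skipping any feature whose |r| with an already-kept feature reaches the cutoff."""
    ranked = sorted(features, key=lambda name: abs(pearson(features[name], labels)), reverse=True)
    kept = []
    for name in ranked:
        if len(kept) == top_k:
            break
        if all(abs(pearson(features[name], features[k])) < redundancy_cutoff for k in kept):
            kept.append(name)
    return kept


# Toy data: 0 = non-severe, 1 = severe (hypothetical encoding and values).
labels = [0, 0, 0, 1, 1, 1]
features = {
    "marker_a": [1.0, 1.2, 0.9, 3.1, 2.8, 3.3],  # tracks severity
    "marker_b": [2.0, 2.4, 1.8, 6.2, 5.6, 6.6],  # exactly 2 x marker_a, so redundant
    "marker_c": [5.0, 4.9, 5.1, 5.0, 5.2, 4.8],  # unrelated to severity
}
selected = select_by_correlation(features, labels, top_k=2)
```

Because marker_b is a rescaled copy of marker_a, only one of the two survives the redundancy check, and the second slot falls to marker_c even though it carries no signal; in practice a minimum |r| threshold would usually be added as well.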
According to an embodiment of the invention, the machine learning model is at least one selected from a random forest model, a k-nearest neighbor model, a single C5.0 decision tree model and a partial least squares model.
According to an embodiment of the invention, the machine learning model is a random forest model. The inventors found that the biomarkers screened by combining the correlation algorithm with the random forest model gave the best precision and accuracy in predicting COVID-19.
According to an embodiment of the invention, the machine learning model is a random forest model; the biomarkers comprise γ-glutamyl-β-aminopropionitrile, thiomorpholine-3-carboxylate, 2-styryl-1,3-dioxolane, cortactin (SRC8), filamin A (FLNA), leukocyte cell-derived chemotaxin 2 (LECT2), N-undecanoylglycine, etonogestrel, attractin (ATRN) and desloratadine; and the model parameters are set as follows: method=rf, mtry=4; the trainControl parameters are method=cv, number=5. According to a specific embodiment of the invention, with these model parameters the 10 biomarkers can effectively predict or diagnose COVID-19 and estimate the severity of a patient's disease with high accuracy.
According to an embodiment of the invention, the machine learning model is a random forest model; the biomarkers comprise γ-glutamyl-β-aminopropionitrile, thiomorpholine-3-carboxylate, 2-styryl-1,3-dioxolane, 3-hexadecanoyloleanolic acid, PC(16:1(9Z)/2:0), 8-hydroxyeicosatetraenoic acid (8-HETE), archaetidyl-myo-inositol, PC(O-16:0/O-18:0), the dipeptide phenylalanyl-isoleucine and 4-acetamido-2-aminobutyric acid; and the model parameters are set as follows: method=rf, mtry=3; the trainControl parameters are method=cv, number=5. According to a specific embodiment of the invention, with these model parameters the 10 biomarkers can effectively predict or diagnose COVID-19 and estimate the severity of a patient's disease with high accuracy.
According to an embodiment of the invention, the machine learning model is a random forest model; the biomarkers comprise cortactin (SRC8), multiple inositol-polyphosphate phosphatase (MINP1), serum amyloid A4 (SAA4), RAS oncogene family member RAP1B (RAP1B), filamin A (FLNA), matrix Gla protein (MGP), platelet-derived growth factor subunit B (PDGFB), attractin (ATRN), talin 1 (TLN1) and tropomyosin 4 (TPM4); and the model parameters are set as follows: method=rf, mtry=2; the trainControl parameters are method=cv, number=5. According to a specific embodiment of the invention, with these model parameters the 10 biomarkers can effectively predict or diagnose COVID-19 and estimate the severity of a patient's disease with high accuracy.
It should be noted that, in the present application, "method" refers to the modeling method and "rf" to random forest; "mtry" is the tuning parameter used in building the random forest model, namely the number of variables randomly sampled as split candidates at each node of a tree; in the trainControl parameters, "method" refers to the model validation method, "cv" to cross-validation, and "number" to the number of cross-validation folds.
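To make these tuning parameters concrete, the following Python sketch (standard library only) illustrates what mtry and 5-fold cross-validation mean. It is not the caret implementation used by the inventors: the function names are hypothetical, and the folds here are plain shuffled splits, whereas caret's own fold construction may differ (for example, by balancing classes).

```python
import random

def kfold_indices(n_samples, k=5, seed=0):
    """Shuffle sample indices and deal them into k roughly equal folds,
    illustrating the idea behind method=cv, number=5."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def split_candidates(feature_names, mtry, rng):
    """At each tree node, a random forest evaluates only `mtry`
    randomly drawn variables as candidate split variables."""
    return rng.sample(feature_names, mtry)

# Hypothetical usage with the 10-molecule multi-omics panel and mtry=4.
panel = ["gamma-glutamyl-beta-aminopropionitrile", "thiomorpholine-3-carboxylate",
         "2-styryl-1,3-dioxolane", "SRC8", "FLNA", "LECT2",
         "N-undecanoylglycine", "etonogestrel", "ATRN", "desloratadine"]
rng = random.Random(42)
print(split_candidates(panel, 4, rng))  # 4 of the 10 markers, drawn at random

folds = kfold_indices(109, k=5)  # 109 = training-set size reported in the examples
for i, held_out in enumerate(folds):
    held = set(held_out)
    train = [j for j in range(109) if j not in held]
    print(f"fold {i}: train on {len(train)}, evaluate on {len(held_out)}")
```

Each of the five folds is held out once while the model is fitted on the other four, and the five validation scores are averaged to choose mtry.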
In another aspect, the invention provides a method for determining the source of a sample to be tested. According to an embodiment of the invention, the method comprises determining, based on the content of the aforementioned biomarkers in the sample to be tested and the machine learning model defined above, whether the sample originates from a COVID-19 patient and whether it originates from a severe COVID-19 patient. The method has good predictive performance and can effectively determine whether a sample is derived from a COVID-19 patient or from a severe COVID-19 patient. This lays a foundation for further analysis of the sample by researchers; clinically, the method can be used to diagnose COVID-19 or assess its severity and to reflect the patient's recovery, helping physicians predict disease progression accurately and intervene in time.
According to an embodiment of the present invention, the method for determining a source of a sample to be tested may further include at least one of the following additional technical features:
According to an embodiment of the present invention, the sample to be tested is a serum sample.
In yet another aspect, the invention provides a system for determining the source of a sample to be tested. According to an embodiment of the invention, the system comprises an assay device for determining the content of the aforementioned biomarkers in a sample to be tested, and a determination device connected to the assay device for determining, based on the biomarker content obtained by the assay device and the machine learning model defined above, whether the sample originates from a COVID-19 patient and whether it originates from a severe COVID-19 patient. The system can carry out the method for determining the source of a sample described above and can effectively determine whether a sample is derived from a COVID-19 patient or from a severe COVID-19 patient. This lays a foundation for further analysis of the sample by researchers; clinically, the system can be used to diagnose COVID-19 or assess its severity and to reflect the patient's recovery, helping physicians predict disease progression accurately and intervene in time.
In yet another aspect, the invention provides a method for classifying patients infected with the novel coronavirus. According to an embodiment of the invention, the method comprises classifying a patient based on the content of the aforementioned biomarkers in a sample to be tested derived from that patient and on the aforementioned model. The method has good predictive performance, can effectively evaluate the severity of infection, reflects the patient's recovery, and improves the accuracy of risk stratification of patients infected with the novel coronavirus.
In a further aspect of the invention, the invention proposes the use of a biomarker as described hereinbefore in the manufacture of a kit for diagnosing or predicting the severity of a novel coronavirus infection.
In a further aspect of the invention, the invention proposes the use of a reagent for detecting a biomarker as described hereinbefore in the manufacture of a kit for diagnosing a novel coronavirus infection or predicting the severity of a novel coronavirus infection.
According to an embodiment of the invention, the reagent comprises at least one of a probe, an antibody and a small-molecule compound that specifically recognizes the biomarker.
Additional aspects and advantages of the invention will be set forth in part in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention.
The invention provides a panel of biomarkers comprising serum characteristic proteins, serum characteristic metabolites, or combinations thereof. By measuring the content of these biomarkers and applying the model of the invention, healthy individuals and COVID-19 patients can be distinguished rapidly and effectively, and COVID-19 patients can be accurately stratified by severity. This helps physicians predict disease progression accurately and intervene in time, allows treatment to be matched to each patient's condition, and prevents truly severe patients from missing timely and effective treatment.
Drawings
The foregoing and/or additional aspects and advantages of the invention will become apparent and may be better understood from the following description of embodiments taken in conjunction with the accompanying drawings in which:
FIG. 1 is a system diagram of determining the source of a sample to be tested according to an embodiment of the present invention;
FIG. 2 is a flow chart of the sample cohort analysis according to an embodiment of the present invention (in cohort 1, H denotes healthy controls, M mild patients and S severe patients; in cohort 2, patients were clinically diagnosed as severe at time T1, and T2 and T3 are sampling time points during treatment);
FIG. 3 shows untargeted proteomic and metabolomic profiling of cohorts 1 and 2 according to an embodiment of the present invention (UMAP: uniform manifold approximation and projection);
FIG. 4 is a machine learning model building flow chart according to an embodiment of the invention;
FIG. 5 compares the diagnostic performance in the training set of different "feature-machine learning" models based on multi-omics data according to an embodiment of the invention (Accuracy: accuracy; LogLoss: log loss; AUPR: area under the precision-recall curve);
FIG. 6 shows the top 10 characteristic molecules ranked by contribution in the multi-omics CA-RF model according to an embodiment of the invention;
FIG. 7 shows rasterized density plots of the precision-recall curve (left) and the receiver operating characteristic curve (right) of the multi-omics CA-RF model according to an embodiment of the invention;
FIG. 8 shows confusion matrices in the validation dataset for the multi-omics, proteomic and metabolomic rRF models according to an embodiment of the invention (cells on the diagonal from top left to bottom right represent correct predictions and off-diagonal cells represent incorrect predictions; H, healthy control; M, mild patient; S, severe patient; Prop, proportion of cases in each cell; light shading indicates poor classification and dark shading good classification);
FIG. 9 is a bar graph of rRF model scores of patients in the follow-up cohort according to an embodiment of the invention, including results from samples collected at multiple time points from a total of 7 severe patients (patients were clinically diagnosed as severe at time T1; T2 and T3 are sampling time points during treatment);
FIG. 10 shows the top 10 characteristic molecules of the CA-RF model based on proteomic data according to an embodiment of the present invention;
FIG. 11 shows the top 10 characteristic molecules of the CA-RF model based on metabolomic data according to an embodiment of the invention;
FIG. 12 compares the multi-omics, metabolomic and proteomic rRF models constructed from the top 10 characteristic molecules according to an embodiment of the invention (Accuracy: accuracy; Mean_F1: mean of the F1 scores; logLoss: log loss).
Detailed Description
The invention provides a system for determining the source of a sample to be tested. As shown in FIG. 1, the system includes an assay device 100 and a determination device 200. After the sample to be tested, i.e., a serum sample, enters the assay device 100, the biomarkers in the serum sample are determined by means of the algorithm and machine learning model described above. The determination device 200 then determines the origin of the serum sample based on the biomarker results obtained by the assay device 100.
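The two-device arrangement of FIG. 1 can be sketched as a simple pipeline. Everything in this sketch is hypothetical: the class names, the callable interfaces, and the toy rule standing in for the trained rRF model, which the document does not provide as code.

```python
from dataclasses import dataclass
from typing import Callable, Dict

Contents = Dict[str, float]

@dataclass
class AssayDevice:
    """Hypothetical stand-in for assay device 100: maps a sample ID to biomarker contents."""
    measure: Callable[[str], Contents]

@dataclass
class DeterminationDevice:
    """Hypothetical stand-in for determination device 200: applies a trained
    classifier returning class probabilities ('H', 'M', 'S') and reports
    the most likely group."""
    predict_proba: Callable[[Contents], Dict[str, float]]

    def determine(self, contents: Contents) -> str:
        probs = self.predict_proba(contents)
        return max(probs, key=probs.get)

def run_system(sample_id: str, assay: AssayDevice, determiner: DeterminationDevice) -> str:
    """Assay device output feeds directly into the determination device."""
    return determiner.determine(assay.measure(sample_id))

# Toy usage: a fake assay and a fake rule in place of a real trained model.
fake_assay = AssayDevice(measure=lambda sid: {"ATRN": 1.0, "LECT2": 3.0})
toy_model = DeterminationDevice(
    predict_proba=lambda c: {"H": 0.1, "M": 0.2, "S": 0.7} if c["LECT2"] > 2
    else {"H": 0.6, "M": 0.3, "S": 0.1}
)
print(run_system("serum-001", fake_assay, toy_model))  # S
```

Keeping the devices behind plain callables means the toy rule can be swapped for any fitted classifier without changing the pipeline.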
Embodiments of the present invention are described in detail below. The following examples are illustrative only and are not to be construed as limiting the invention. Where specific techniques or conditions are not indicated in the examples, they were carried out according to the techniques or conditions described in the literature in this field or according to the product instructions. Reagents or instruments for which no manufacturer is indicated are conventional, commercially available products.
Examples
1. Collection of serum samples from COVID-19 patients and healthy controls, and recording of clinical data:
According to the COVID-19 diagnosis and treatment guidelines (7th edition) of the National Health Commission, patients were divided into a mild group (M) and a severe group (S): mild patients showed mild clinical symptoms; severe patients showed shortness of breath, a respiratory rate ≥ 30 breaths/min, blood oxygen saturation ≤ 93%, a ratio of arterial partial pressure of oxygen to fraction of inspired oxygen (PaO2/FiO2) < 300, and/or lung imaging showing lesions progressing by more than 50% within 24 to 48 hours.
Healthy subjects were also enrolled as a healthy control group (H). In addition, a separate cohort of patients was enrolled, and blood samples were collected from these patients serially as a follow-up cohort. Peripheral blood was collected 1 to 10 days after admission, and serum samples were heat-inactivated at 56 ℃ for 30 minutes before omics analysis.
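The severe-group criteria above can be written as a simple rule. The sketch below is a hypothetical encoding for illustration only, not a clinical decision tool: the field names are assumptions, shortness of breath is not encoded separately, any single criterion is treated as sufficient, and the PaO2/FiO2 threshold is taken exactly as stated in the text.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class ClinicalFindings:
    respiratory_rate: float   # breaths per minute
    spo2_percent: float       # blood oxygen saturation, %
    pao2_fio2: float          # PaO2/FiO2 ratio
    lesion_progression_percent: Optional[float] = None  # lung-lesion progression within 24-48 h, %

def meets_severe_criteria(f: ClinicalFindings) -> bool:
    """Return True if any severe-group criterion from the text is met."""
    return (
        f.respiratory_rate >= 30
        or f.spo2_percent <= 93
        or f.pao2_fio2 < 300  # threshold as written in the text
        or (f.lesion_progression_percent is not None
            and f.lesion_progression_percent > 50)
    )

print(meets_severe_criteria(ClinicalFindings(18, 98, 420)))  # False
print(meets_severe_criteria(ClinicalFindings(18, 92, 420)))  # True
```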
2. Acquisition of proteomic and metabolomic data from the study subjects' serum samples by high performance liquid chromatography-mass spectrometry:
(1) Proteomic analysis:
1) High-abundance serum proteins were removed and low-abundance proteins enriched using the ProteoMiner™ kit (Bio-Rad, USA).
2) Samples were subjected in sequence to reduction (10 mM dithiothreitol, 37 ℃, 60 min), alkylation (40 mM iodoacetamide, room temperature, 30 min) and enzymatic digestion (trypsin : protein = 1:50, 37 ℃, 12 h).
3) The digested peptides were desalted on C18 (The Nest Group, USA) and lyophilized.
4) Preparation of library samples: (1) equal amounts of the processed sample peptide solutions were pooled, separated into 6 fractions and vacuum freeze-dried; (2) equal amounts of sample serum were pooled and, after removal of high-abundance proteins (High Select™ Top14 Abundant Protein Depletion Mini Spin Columns, Thermo Fisher, USA), the peptides were prepared as described above, separated into 18 fractions on a liquid chromatography system and vacuum freeze-dried.
5) Mass spectrometry: (1) Library construction: the digested peptide samples were dissolved in 0.1% formic acid in acetonitrile containing iRT standard peptides (Biognosys, Schlieren, Switzerland); 5 μL was loaded onto an EASY-nLC 1000 nano-UPLC system (Thermo Fisher Scientific, USA) at a flow rate of 300 nL/min and separated on an analytical column, with detection by a 65-min positive-ion scan. Data-dependent acquisition (DDA) was performed on a Q Exactive™ HF-X mass spectrometer (Thermo Fisher Scientific, USA) with an MS1 scan range of 300-1800 m/z, resolution of 60,000 (m/z 200), AGC target of 3e6 and maximum IT of 50 ms. The DDA data were imported directly into Spectronaut (version 14.10; Biognosys AG) to build a spectral library. (2) Data-independent acquisition (DIA): 2 μg of peptides per sample (spiked with an appropriate amount of iRT standard peptides) was analyzed on the same platform as in the library construction step. The DIA method used 60 variable isolation windows, with the following settings: resolution 120,000 (m/z 350-1250); NCE = 27%; AGC target = 3e6; maximum IT = 60 ms; cycle time 3 s. The DIA data were imported into Spectronaut for analysis.
(2) Metabolomic analysis:
1) Metabolite extraction: (1) Hydrophobic metabolites: 40 μL of serum was added to 300 μL of methanol containing internal standards (PC[12:0-13:0], PE[12:0-13:0], Cer/Sph Mixture I and FFA[19:0]); after shaking for 2 min, 1000 μL of methyl tert-butyl ether and 250 μL of deionized water were added in sequence to extract the hydrophobic metabolites, and the extract was lyophilized with liquid nitrogen and stored at -80 ℃. (2) Hydrophilic metabolites: 100 μL of serum was mixed with 300 μL of methanol containing an internal standard (TMAO-D9) and centrifuged; the supernatant was lyophilized with liquid nitrogen and stored at -80 ℃.
2) Mass spectrometry: detection was performed on a Dionex™ UltiMate™ 3000 Rapid Separation LC (RSLC) system (Thermo Scientific, USA) coupled to a Q Exactive™ hybrid quadrupole-Orbitrap mass spectrometer (Thermo Scientific, USA). Hydrophobic metabolites were separated on an ACQUITY UPLC BEH C8 column (1.7 μm, 2.1 mm × 100 mm; Waters, USA) and detected in both negative and positive ion modes; hydrophilic metabolites were separated on a UPLC BEH Amide column (2.1 mm × 100 mm, 1.7 μm; Waters, USA) and detected in positive ion mode. The mass spectrometry parameters were: scan range m/z 150-1500; resolution 70,000; AGC 3e6; maximum IT 50 ms; MS/MS resolution 17,500. Data were processed with Xcalibur 2.2 SP1.48 (Thermo Fisher Scientific, USA).
3. Data processing and analysis. Machine learning models based on proteomic data, on metabolomic data and on the integrated data from both omics were established in turn through feature variable extraction, model construction and selection, reconstruction of the model with the top 10 variables, and model validation. In addition, the utility of the machine learning model for monitoring disease course was evaluated in the follow-up cohort. The process for building, from serum proteomic and metabolomic data, a machine learning model that can stratify the risk of COVID-19 patients is as follows (FIG. 4):
(1) The proteomic and metabolomic data were log2-transformed, median-normalized and minimum-imputed, and the result was used as the dataset for machine learning model construction.
(2) The 154 subjects were randomly divided into a training set of 109 subjects (70%) and a test set of 45 subjects (30%).
(3) Five algorithms were used to screen feature variable combinations (FP): the correlation algorithm (CA), recursive feature elimination (RFE), the genetic algorithm (GA), the Boruta algorithm (BA) and the max-min parents and children algorithm (MMPC).
(4) Four machine learning models (ML) were constructed for each feature variable combination: random forest (RF), k-nearest neighbors (KNN), single C5.0 tree (C5.0Tree) and partial least squares (PLS). The parameters of each FP-ML combination were optimized by a basic grid search with 5-fold cross-validation. To counter the unequal sample sizes of the study groups when constructing the RF model, each group $c$ was assigned the weight $w_c = \frac{N}{k N_c}$, where $N_c$ is the number of samples in group $c$, $k$ is the number of groups and $N$ is the total number of samples. The FP-ML combination with the largest area under the precision-recall curve (AUPR) and the lowest log loss, CA-RF, was selected for further optimization.
(5) Model variables were ranked by importance in the CA-RF model, and the top 10 variables were selected to construct a new RF model (the rRF model).
(6) The rRF model was validated on the validation dataset, and its ability to monitor disease course was evaluated in follow-up cohort 2.
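Steps (1), (2) and (4) above can be sketched in plain Python. This is an illustrative sketch under stated assumptions, not the inventors' code: "median normalization" is taken to mean shifting each sample's log2 values so that its median matches the median of all sample medians; "minimum imputation" is taken to mean replacing a missing value with that feature's smallest observed value; and the 70/30 split is implemented as a stratified split that rounds each group's training count up, which reproduces the 109/45 split for the cohort-1 group sizes of 30, 42 and 82. All function names are hypothetical.

```python
import math
import random
from collections import Counter, defaultdict
from statistics import median

def preprocess(matrix):
    """matrix: one row per sample of positive raw intensities, None for missing.
    Assumes every feature is observed in at least one sample."""
    logged = [[math.log2(v) if v is not None else None for v in row] for row in matrix]
    sample_medians = [median(v for v in row if v is not None) for row in logged]
    target = median(sample_medians)
    # Median normalization: align every sample's median to a common target.
    normed = [[v - m + target if v is not None else None for v in row]
              for row, m in zip(logged, sample_medians)]
    # Minimum imputation: fill gaps with each feature's smallest observed value.
    feature_mins = [min(row[j] for row in normed if row[j] is not None)
                    for j in range(len(normed[0]))]
    return [[feature_mins[j] if v is None else v for j, v in enumerate(row)]
            for row in normed]

def stratified_split(labels, train_percent=70, seed=0):
    """Return (train_indices, test_indices), sampling within each group."""
    rng = random.Random(seed)
    by_group = defaultdict(list)
    for i, g in enumerate(labels):
        by_group[g].append(i)
    train, test = [], []
    for g in sorted(by_group):
        idx = by_group[g]
        rng.shuffle(idx)
        n_train = (len(idx) * train_percent + 99) // 100  # integer ceiling
        train += idx[:n_train]
        test += idx[n_train:]
    return sorted(train), sorted(test)

def class_weights(labels):
    """w_c = N / (k * N_c) for each group c."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {c: n / (k * n_c) for c, n_c in counts.items()}

labels = ["H"] * 30 + ["M"] * 42 + ["S"] * 82  # cohort-1 group sizes
train, test = stratified_split(labels)
print(len(train), len(test))  # 109 45
train_labels = [labels[i] for i in train]
print({c: round(w, 3) for c, w in class_weights(train_labels).items()})
# {'H': 1.73, 'M': 1.211, 'S': 0.626}
```

With these weights each group contributes the same total weight (N/k) to training, so the smaller healthy and mild groups are not swamped by the severe group.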
Specific data and results are described below. 1. Recruitment of study cohorts. Study cohort 1 included healthy controls (H, n = 30), mild patients (M, n = 42) and severe patients (S, n = 82). Study cohort 2 included 7 severe patients, from whom 15 peripheral blood samples were collected serially, as shown in FIG. 2.
2. Serological testing. Proteomic and metabolomic analysis identified 717 serum metabolites and 628 serum proteins (each stably detected in at least 50% of the subjects of at least one study group). Uniform manifold approximation and projection (UMAP) analysis showed that both omics datasets distinguished the H, M and S groups well, as shown in FIG. 3.
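The inclusion rule in parentheses (a molecule is kept if it is detected in at least half of the subjects of at least one study group) can be written as a small filter. A hypothetical sketch, treating None as "not detected" and reading "50%" as "at least 50%":

```python
from collections import defaultdict

def stably_detected(values, groups, min_fraction=0.5):
    """values: one measurement per sample for a single molecule (None = not detected).
    groups: the study group (e.g. "H", "M", "S") of each sample.
    Returns True if the molecule is detected in >= min_fraction of the
    samples of at least one group."""
    detected = defaultdict(int)
    total = defaultdict(int)
    for v, g in zip(values, groups):
        total[g] += 1
        if v is not None:
            detected[g] += 1
    return any(detected[g] / total[g] >= min_fraction for g in total)

groups = ["H", "H", "M", "M", "S", "S"]
print(stably_detected([None, None, None, 1.2, None, None], groups))  # True (M: 1/2)
print(stably_detected([None] * 6, groups))                           # False
```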
3. Screening of characteristic molecules and construction of machine learning models. First, study cohort 1 was divided into a training set (n = 109) and a validation set (n = 45). Five feature-screening algorithms (the correlation algorithm (CA), recursive feature elimination (RFE), the genetic algorithm (GA), the Boruta algorithm (BA) and the max-min parents and children algorithm (MMPC)) then yielded five molecular feature combinations (FP): CA-FP (n = 774 feature molecules), RFE-FP (n = 4), GA-FP (n = 281), BA-FP (n = 162) and MMPC-FP (n = 7). Next, 24 "molecular feature-machine learning" models were constructed with four machine learning algorithms (random forest (RF), k-nearest neighbors (KNN), single C5.0 tree (C5.0Tree) and partial least squares (PLS)), as shown in FIG. 5. By comparison, the model with the largest area under the precision-recall curve (AUPR) and the lowest log loss, the CA-RF model, was selected as the best model for further optimization.
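The document does not specify how the correlation algorithm (CA) selects features or exactly how AUPR and log loss were weighed against each other. The sketch below is therefore an assumption: CA is illustrated as a greedy filter that drops a molecule whose absolute Pearson correlation with an already-kept molecule exceeds a hypothetical cutoff of 0.9, and model selection ranks candidates by AUPR and breaks ties with the lower multiclass log loss. The AUPR and log-loss values in the usage example are invented.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def correlation_filter(features, cutoff=0.9):
    """features: molecule name -> values in sample order.
    Keeps a molecule only if |r| <= cutoff against every molecule kept so far."""
    kept = []
    for name, values in features.items():
        if all(abs(pearson(values, features[k])) <= cutoff for k in kept):
            kept.append(name)
    return kept

def multiclass_log_loss(y_true, probs, eps=1e-15):
    """Mean negative log of the probability assigned to the true class.
    probs: one dict per sample mapping class label -> predicted probability."""
    total = 0.0
    for y, p in zip(y_true, probs):
        total -= math.log(min(max(p[y], eps), 1 - eps))
    return total / len(y_true)

def pick_best(results):
    """results: model name -> (AUPR, log loss). Highest AUPR wins;
    lower log loss breaks ties."""
    return max(results, key=lambda m: (results[m][0], -results[m][1]))

features = {"a": [1, 2, 3, 4], "b": [2, 4, 6, 8.1], "c": [4, 1, 3, 2]}
print(correlation_filter(features))  # ['a', 'c']
print(pick_best({"CA-RF": (0.93, 0.21), "BA-RF": (0.93, 0.35),
                 "GA-KNN": (0.88, 0.30)}))  # CA-RF
```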
Further, ranked by the contribution of the model variables (FIG. 6), the top 10 characteristic molecules of the CA-RF model (γ-glutamyl-β-aminopropionitrile, thiomorpholine-3-carboxylate, 2-styryl-1,3-dioxolane, SRC8, FLNA, LECT2, N-undecanoylglycine, etonogestrel, ATRN and desloratadine) were used to reconstruct an RF model, the rRF model (again built with the caret package, using 5-fold cross-validation, tuning parameter mtry = 4 and sample weights, with all other parameters at their defaults). The micro-averaged AUPR and micro-averaged AUROC of the rRF model were 0.7693 (95% CI, 0.7570-0.7708) and 0.9997 (95% CI, 0.9997-0.9998), respectively (FIG. 7). Finally, the accuracy of the rRF models was verified in the validation set, with the confusion matrices shown in FIG. 8: the multi-omics and metabolomic rRF models classified the 45 subjects of the validation set with 100% accuracy, and the proteomic rRF model with 91.84% accuracy, indicating that all three models classify the validation samples accurately. In addition, analysis of the samples in cohort 2 showed that the rRF model score (the probability of a sample being classified as severe minus the probability of it being classified as mild) decreased steadily with recovery for 71.43% (5/7) of the subjects, suggesting that the rRF model may also have potential for monitoring clinical course (FIG. 9).
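Two computations in this paragraph can be made explicit: selecting the ten most important variables to rebuild the model, and the follow-up score defined as the severe probability minus the mild probability. The sketch below uses hypothetical probabilities, and "decreased steadily" is interpreted, as an assumption, as strictly decreasing from one sampling time point to the next.

```python
def top_k(importance, k=10):
    """Names of the k variables with the highest importance scores."""
    ranked = sorted(importance.items(), key=lambda kv: kv[1], reverse=True)
    return [name for name, _ in ranked[:k]]

def rrf_score(probs):
    """Follow-up score: predicted probability of 'S' (severe) minus that of 'M' (mild)."""
    return probs["S"] - probs["M"]

def steadily_decreasing(scores):
    """True if each score is lower than the one at the previous time point."""
    return all(later < earlier for earlier, later in zip(scores, scores[1:]))

# Hypothetical predicted class probabilities for one patient at T1, T2, T3.
timeline = [
    {"H": 0.02, "M": 0.18, "S": 0.80},
    {"H": 0.05, "M": 0.45, "S": 0.50},
    {"H": 0.10, "M": 0.70, "S": 0.20},
]
scores = [rrf_score(p) for p in timeline]
print([round(s, 2) for s in scores])  # [0.62, 0.05, -0.5]
print(steadily_decreasing(scores))    # True
```

A positive score means the model leans towards severe; a falling score over time is consistent with recovery.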
To further validate the multi-omics rRF model, rRF models based on the proteomic top 10 characteristic molecules (SRC8, MINP1, SAA4, RAP1B, FLNA, MGP, PDGFB, ATRN, TLN1 and TPM4; FIG. 10) and on the metabolomic top 10 characteristic molecules (γ-glutamyl-β-aminopropionitrile, thiomorpholine-3-carboxylate, 2-styryl-1,3-dioxolane, 3-hexadecanoyloleanolic acid, PC(16:1(9Z)/2:0), 8-HETE, archaetidyl-myo-inositol, PC(O-16:0/O-18:0), phenylalanyl-isoleucine and 4-acetamido-2-aminobutanoic acid; FIG. 11) were constructed separately using the same analysis procedure. Comparison of the accuracy, mean F1 score and log loss of the multi-omics, metabolomic and proteomic rRF models (FIG. 12) showed that the multi-omics rRF model performed better than the single-omics rRF models.
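FIG. 8 and FIG. 12 report accuracy and a mean F1 score. As a sketch, both can be computed from a confusion matrix as below; "mean F1" is assumed here to be the unweighted (macro) average of the per-class F1 scores, and the confusion matrix is hypothetical, not data from the examples.

```python
def accuracy(confusion, labels):
    """Fraction of samples on the diagonal; confusion[true][pred] = count."""
    correct = sum(confusion[c][c] for c in labels)
    total = sum(confusion[t][p] for t in labels for p in labels)
    return correct / total

def mean_f1(confusion, labels):
    """Unweighted mean of per-class F1 = 2TP / (2TP + FP + FN)."""
    scores = []
    for c in labels:
        tp = confusion[c][c]
        fp = sum(confusion[t][c] for t in labels if t != c)
        fn = sum(confusion[c][p] for p in labels if p != c)
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)

labels = ["H", "M", "S"]
cm = {  # hypothetical validation-set counts; rows = true group, columns = predicted
    "H": {"H": 9, "M": 0, "S": 0},
    "M": {"H": 1, "M": 10, "S": 2},
    "S": {"H": 0, "M": 1, "S": 22},
}
print(round(accuracy(cm, labels), 4))  # 0.9111
print(round(mean_f1(cm, labels), 4))
```

Macro-averaging gives the small healthy-control group the same influence on the score as the large severe group.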
Therefore, random forest models constructed from the top 10 characteristic molecules of the proteomic, metabolomic and integrated multi-omics data can be used to distinguish severe from mild COVID-19 patients, show good discriminative performance on samples from severe and mild patients, and can improve the accuracy of risk stratification of COVID-19 patients.
In the description of the present specification, a description referring to the terms "one embodiment," "some embodiments," "examples," "specific examples," or "some examples," etc., means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the present invention. In this specification, schematic representations of the above terms are not necessarily directed to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. In addition, where no contradiction arises, those skilled in the art may combine the different embodiments or examples described in this specification and the features thereof.
While embodiments of the present invention have been shown and described above, it will be understood that the above embodiments are illustrative and are not to be construed as limiting the invention, and that changes, modifications, substitutions and variations may be made to the above embodiments by one of ordinary skill in the art within the scope of the invention.
Claims (16)
1. A panel of biomarkers, wherein the biomarkers comprise at least one member selected from the group consisting of serum characteristic proteins and serum characteristic metabolites,
wherein
the serum characteristic proteins comprise at least one of cortactin, multiple inositol-polyphosphate phosphatase, serum amyloid A4, RAS oncogene family member RAP1B, filamin A, matrix Gla protein, platelet-derived growth factor subunit B, attractin, talin 1, tropomyosin 4 and leukocyte cell-derived chemotaxin 2;
the serum characteristic metabolites comprise at least one of γ-glutamyl-β-aminopropionitrile, thiomorpholine-3-carboxylate, 2-styryl-1,3-dioxolane, 3-hexadecanoyloleanolic acid, PC(16:1(9Z)/2:0), 8-hydroxyeicosatetraenoic acid, archaetidyl-myo-inositol, PC(O-16:0/O-18:0), phenylalanyl-isoleucine dipeptide, 4-acetamido-2-aminobutyric acid, N-undecanoylglycine, etonogestrel and desloratadine.
2. The biomarker of claim 1, wherein the serum characteristic proteins comprise at least one of cortactin, multiple inositol-polyphosphate phosphatase, serum amyloid A4, RAS oncogene family member RAP1B, filamin A, matrix Gla protein, platelet-derived growth factor subunit B, attractin, talin 1 and tropomyosin 4.
3. The biomarker of claim 1, wherein the serum characteristic metabolites comprise at least one of γ-glutamyl-β-aminopropionitrile, thiomorpholine-3-carboxylate, 2-styryl-1,3-dioxolane, 3-hexadecanoyloleanolic acid, PC(16:1(9Z)/2:0), 8-hydroxyeicosatetraenoic acid, archaetidyl-myo-inositol, PC(O-16:0/O-18:0), phenylalanyl-isoleucine dipeptide and 4-acetamido-2-aminobutyric acid.
4. The biomarker of claim 1, wherein the biomarkers comprise at least one of γ-glutamyl-β-aminopropionitrile, thiomorpholine-3-carboxylate, 2-styryl-1,3-dioxolane, cortactin, filamin A, leukocyte cell-derived chemotaxin 2, N-undecanoylglycine, etonogestrel, attractin and desloratadine.
5. The biomarker of claim 1, wherein the biomarker is screened out by an algorithm and a machine learning model.
6. The biomarker according to claim 5 wherein the algorithm is selected from at least one of a correlation algorithm, a recursive feature elimination algorithm, a genetic algorithm, a Boruta algorithm and an MMPC algorithm.
7. The biomarker of claim 5, wherein the algorithm is a correlation algorithm.
8. The biomarker of claim 5, wherein the machine learning model is selected from at least one of a random forest model, a k-nearest neighbor model, a single C5.0 decision tree model, and a partial least squares model.
9. The biomarker of claim 5, wherein the machine learning model is a random forest model.
10. The biomarker of claim 5, wherein the machine learning model is a random forest model, the biomarkers comprise γ-glutamyl-β-aminopropionitrile, thiomorpholine-3-carboxylate, 2-styryl-1,3-dioxolane, cortactin, filamin A, leukocyte cell-derived chemotaxin 2, N-undecanoylglycine, etonogestrel, attractin and desloratadine, and the model parameters are set as follows: method=rf, mtry=4; the trainControl parameters are method=cv, number=5.
11. The biomarker of claim 5, wherein the machine learning model is a random forest model, the biomarkers comprise γ-glutamyl-β-aminopropionitrile, thiomorpholine-3-carboxylate, 2-styryl-1,3-dioxolane, 3-hexadecanoyloleanolic acid, PC(16:1(9Z)/2:0), 8-hydroxyeicosatetraenoic acid, archaetidyl-myo-inositol, PC(O-16:0/O-18:0), phenylalanyl-isoleucine dipeptide and 4-acetamido-2-aminobutyric acid, and the model parameters are set as follows: method=rf, mtry=3; the trainControl parameters are method=cv, number=5.
12. The biomarker of claim 5, wherein the machine learning model is a random forest model, the biomarkers comprise cortactin, multiple inositol-polyphosphate phosphatase, serum amyloid A4, RAS oncogene family member RAP1B, filamin A, matrix Gla protein, platelet-derived growth factor subunit B, attractin, talin 1 and tropomyosin 4, and the model parameters are set as follows: method=rf, mtry=2; the trainControl parameters are method=cv, number=5.
13. A system for determining the source of a sample to be tested, comprising:
an assay device for determining the content of the biomarker according to any one of claims 1-4 in a sample to be tested; and
a determination device, connected to the assay device, for determining, based on the biomarker content obtained by the assay device and a machine learning model, whether the sample to be tested is derived from a novel coronavirus patient or from a severe patient, wherein the machine learning model is as defined in any one of claims 10-12, and the sample to be tested is a serum sample.
14. Use of a biomarker according to any of claims 1 to 4, in the manufacture of a kit for diagnosing or predicting the severity of a novel coronavirus infection.
15. Use of a reagent for detecting a biomarker according to any of claims 1 to 4 in the manufacture of a kit for diagnosing or predicting the severity of a novel coronavirus infection.
16. The use according to claim 15, wherein the reagent comprises at least one of a probe, an antibody and a small-molecule compound that specifically recognizes the biomarker.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310172516.3A CN116087482B (en) | 2023-02-24 | 2023-02-24 | Biomarkers for severity typing of course of patients with 2019 novel coronavirus infection |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310172516.3A CN116087482B (en) | 2023-02-24 | 2023-02-24 | Biomarkers for severity typing of course of patients with 2019 novel coronavirus infection |
Publications (2)
Publication Number | Publication Date |
---|---|
CN116087482A (en) | 2023-05-09 |
CN116087482B (en) | 2023-07-11 |
Family ID: 86186939
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310172516.3A Active CN116087482B (en) | 2023-02-24 | 2023-02-24 | Biomarkers for severity typing of course of patients with 2019 novel coronavirus infection |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116087482B (en) |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112458159A (en) * | 2020-08-27 | 2021-03-09 | 中国人民解放军军事科学院军事医学研究院 | Method and kit for detecting polymorphism of 21q22.3 region related to severe coronavirus pneumonia, and application of method and kit |
WO2021067628A2 (en) * | 2019-10-01 | 2021-04-08 | Beth Israel Deaconess Medical Center, Inc. | Conformation-specific antibodies that bind nuclear factor kappa-light-chain-enhancer of activated b cells |
CN114051536A (en) * | 2020-06-12 | 2022-02-15 | 国际宇宙医疗株式会社 | COVID-19 severity prediction method using RNA in blood |
CN114107439A (en) * | 2020-08-28 | 2022-03-01 | 苏州同力生物医药有限公司 | Method, system and kit for preparing test solution for pathogen detection, detection primers and method |
WO2022114984A1 (en) * | 2020-11-25 | 2022-06-02 | Qatar Foundation For Education, Science And Community Development | Methods of treating sars-cov-2 infections |
CN114858906A (en) * | 2021-02-04 | 2022-08-05 | 北京毅新博创生物科技有限公司 | Kit for diagnosing neocoronary pneumonia |
CN114858903A (en) * | 2021-02-04 | 2022-08-05 | 北京毅新博创生物科技有限公司 | Characteristic polypeptide composition for diagnosing neocoronary pneumonia |
Non-Patent Citations (1)
Title |
---|
Toshifumi Matsuyama et al.: "An aberrant STAT pathway is central to COVID-19", Cell Death and Differentiation, vol. 27, pp. 3209-3225, XP037312803, DOI: 10.1038/s41418-020-00633-7 * |
Also Published As
Publication number | Publication date |
---|---|
CN116087482B (en) | 2023-07-11 |
Similar Documents
Publication | Title |
---|---|
Zhang et al. | Evaluation of a novel, integrated approach using functionalized magnetic beads, bench-top MALDI-TOF-MS with prestructured sample supports, and pattern recognition software for profiling potential biomarkers in human plasma |
CN110057955B (en) | Method for screening specific serum marker of hepatitis B | |
US20080086272A1 (en) | Identification and use of biomarkers for the diagnosis and the prognosis of inflammatory diseases | |
CA2681010A1 (en) | Apolipoprotein fingerprinting technique | |
WO2011157655A1 (en) | Use of bile acids for prediction of an onset of sepsis | |
CN112881547B (en) | Screening method of early liver cancer diagnosis markers for liver cirrhosis and hepatitis people | |
JP2017510821A (en) | Method and system for determining risk of autism spectrum disorder | |
CN110057954B (en) | Application of plasma metabolism marker in diagnosis or monitoring of HBV | |
JP2020517935A (en) | Diagnostic method for Behcet's disease using metabolite analysis | |
JPWO2006129401A1 (en) | Screening method for specific proteins in comprehensive proteome analysis | |
US20070082402A1 (en) | Device and process for the quantitative evaluation of the polypeptides and markers contained in a sample of body fluid for recognizing pathological conditions | |
US20160018413A1 (en) | Methods of Prognosing Preeclampsia | |
CN116087482B (en) | Biomarkers for severity typing of course of patients with 2019 novel coronavirus infection | |
CN116754772A (en) | Peripheral blood protein marker for early diagnosis of senile dementia, application and auxiliary diagnosis system | |
CN108318573B (en) | Preparation method of mass spectrum model for detecting insulin resistance | |
WO2009156747A2 (en) | Assay | |
Heegaard et al. | Important options available—from start to finish—for translating proteomics results to clinical chemistry |
CN112305120B (en) | Application of metabolite in atherosclerotic cerebral infarction | |
US20120003744A1 (en) | Diagnostic methods | |
CN116183924B (en) | Serum metabolism marker for liver cancer risk prediction and screening method and application thereof | |
CN112305123B (en) | Application of small molecular substance in atherosclerotic cerebral infarction | |
CN116165385B (en) | Serum metabolic marker for liver cancer diagnosis and screening method and application thereof | |
CN113866284B (en) | Intestinal microbial metabolism markers for heart failure diagnosis and application thereof | |
Smith et al. | Maximizing Analytical Performance in Biomolecular Discovery with LC-MS: Focus on Psychiatric Disorders |
CN109870583B (en) | Metabolites associated with acute pancreatitis and uses thereof |
Legal Events
Code | Title | Description |
---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
GR01 | Patent grant | |