CN116735889A - Protein marker for early colorectal cancer screening, kit and application - Google Patents
- Publication number
- CN116735889A (application number CN202310049892.3A)
- Authority
- CN
- China
- Prior art keywords
- colorectal cancer
- protein
- protein marker
- marker combination
- application
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G01—MEASURING; TESTING
- G01N—INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
- G01N33/00—Investigating or analysing materials by specific methods not covered by groups G01N1/00 - G01N31/00
- G01N33/48—Biological material, e.g. blood, urine; Haemocytometers
- G01N33/50—Chemical analysis of biological material, e.g. blood, urine; Testing involving biospecific ligand binding methods; Immunological testing
- G01N33/68—Chemical analysis of biological material, e.g. blood, urine; Testing involving biospecific ligand binding methods; Immunological testing involving proteins, peptides or amino acids
-
- G—PHYSICS
- G01—MEASURING; TESTING
- G01N—INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
- G01N27/00—Investigating or analysing materials by the use of electric, electrochemical, or magnetic means
- G01N27/62—Investigating or analysing materials by the use of electric, electrochemical, or magnetic means by investigating the ionisation of gases, e.g. aerosols; by investigating electric discharges, e.g. emission of cathode
-
- G—PHYSICS
- G01—MEASURING; TESTING
- G01N—INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
- G01N27/00—Investigating or analysing materials by the use of electric, electrochemical, or magnetic means
- G01N27/62—Investigating or analysing materials by the use of electric, electrochemical, or magnetic means by investigating the ionisation of gases, e.g. aerosols; by investigating electric discharges, e.g. emission of cathode
- G01N27/626—Investigating or analysing materials by the use of electric, electrochemical, or magnetic means by investigating the ionisation of gases, e.g. aerosols; by investigating electric discharges, e.g. emission of cathode using heat to ionise a gas
-
- G—PHYSICS
- G01—MEASURING; TESTING
- G01N—INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
- G01N33/00—Investigating or analysing materials by specific methods not covered by groups G01N1/00 - G01N31/00
- G01N33/48—Biological material, e.g. blood, urine; Haemocytometers
- G01N33/50—Chemical analysis of biological material, e.g. blood, urine; Testing involving biospecific ligand binding methods; Immunological testing
- G01N33/53—Immunoassay; Biospecific binding assay; Materials therefor
- G01N33/531—Production of immunochemical test materials
-
- G—PHYSICS
- G01—MEASURING; TESTING
- G01N—INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
- G01N33/00—Investigating or analysing materials by specific methods not covered by groups G01N1/00 - G01N31/00
- G01N33/48—Biological material, e.g. blood, urine; Haemocytometers
- G01N33/50—Chemical analysis of biological material, e.g. blood, urine; Testing involving biospecific ligand binding methods; Immunological testing
- G01N33/53—Immunoassay; Biospecific binding assay; Materials therefor
- G01N33/574—Immunoassay; Biospecific binding assay; Materials therefor for cancer
- G01N33/57407—Specifically defined cancers
-
- G—PHYSICS
- G01—MEASURING; TESTING
- G01N—INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
- G01N2800/00—Detection or diagnosis of diseases
- G01N2800/06—Gastro-intestinal diseases
- G01N2800/065—Bowel diseases, e.g. Crohn, ulcerative colitis, IBS
Abstract
The application discloses a protein marker combination for colorectal cancer prediction, diagnosis or prognosis, and belongs to the technical field of cancer proteomics detection. The protein marker combination includes at least one selected from LRG1, SERPINA1, ITIH3, CP, ORM1, C9, IGFBP2, and CNDP1. The application also provides uses of, and a system based on, the protein marker combination. The protein marker combination provides a plasma-based, non-invasive screening means for predicting early colorectal cancer and even premalignant lesions. When used for predicting, diagnosing or prognosing colorectal cancer, the method and system of the application are non-invasive for patients, rely on easily obtained material, require only a small plasma sample, and offer high sensitivity and specificity. Most importantly, they fill the gap left by the absence of an effective protein marker for early colorectal cancer.
Description
Technical Field
The application belongs to the technical field of cancer proteomics detection, and particularly relates to a protein marker for early colorectal cancer screening, a kit, and applications thereof.
Background
Colorectal cancer is one of the five leading causes of cancer death worldwide. In the United States, it ranks third in incidence and second in mortality among cancers. In China, colorectal cancer is likewise a highly malignant tumor that seriously affects public health, with both incidence and mortality ranking among the top three of all malignant tumors. The main reason for the low survival rate of colorectal cancer patients is the lack of effective early diagnosis. Clinical practice has shown that patients who undergo surgery at an early stage (stage I or IIa) have a five-year survival rate of 90%, whereas patients who undergo surgery at a late stage (stage III or IV) have a five-year survival rate of less than 10%. Colorectal cancer typically takes 10-15 years to progress from a precancerous lesion to a disseminated, metastatic malignancy, so diagnosing the disease before cancer cells spread and metastasize is of great importance for improving patient survival.
The main clinical means of colorectal cancer screening currently include colonoscopy, imaging examination, fecal occult blood testing, DNA testing, and detection of protein markers such as CEA. These conventional techniques are invasive or cause radiation damage and, more importantly, have low sensitivity, which makes them difficult to use for early screening of large at-risk populations. In addition, the general population has low tolerance and acceptance of colonoscopy. The only non-invasive detection method in clinical use is chemical and immunological fecal occult blood testing, but its sensitivity for colorectal cancer is only 61-79% at a specificity of 86-95%. Although it is widely used clinically, its detection rate for early colorectal cancer hardly meets clinical needs.
In recent years, liquid biopsy technology has developed rapidly and has, to some extent, addressed the low sensitivity of traditional detection techniques. Examples include a product that detects methylation of the Septin9 gene in plasma (Epi proColon) and an early colorectal cancer screening product that combines detection of BMP3/NDRG4 methylation in stool with KRAS gene mutation detection and FIT (Cologuard). These novel non-invasive screening technologies have opened a new era of early colorectal cancer diagnosis. However, there is still considerable room for improvement in their sensitivity and specificity. For example, the Epi proColon assay has a specificity of 97.5% but a sensitivity of only 79%, which can lead to a large proportion of missed diagnoses. Cologuard can reach a sensitivity of 95.55%, but its specificity drops to 87.1%. Improving sensitivity and specificity at the same time would improve detection accuracy and minimize the probability of both missed diagnosis and misdiagnosis. In addition, detection of protein markers such as CEA has even more limited sensitivity and specificity.
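The trade-off between sensitivity and specificity described above can be made concrete with a short calculation. The following is a minimal sketch, not part of the application: the function name and the cohort sizes of 1000 cancer cases and 1000 healthy subjects are hypothetical, chosen only for illustration. The sensitivity and specificity figures are the ones quoted in the text.

```python
def expected_errors(sensitivity, specificity, n_cancer, n_healthy):
    """Return (missed cancers, false positives) expected in a cohort.

    Missed cancers = (1 - sensitivity) * number of true cancer cases.
    False positives = (1 - specificity) * number of healthy subjects.
    """
    missed = (1.0 - sensitivity) * n_cancer
    false_positives = (1.0 - specificity) * n_healthy
    return missed, false_positives


# (sensitivity, specificity) as quoted in the background section.
ASSAYS = {
    "Septin9 plasma methylation assay": (0.79, 0.975),
    "stool DNA methylation + KRAS + FIT assay": (0.9555, 0.871),
}

# Hypothetical cohort sizes, used only to express the rates as counts.
N_CANCER = 1000
N_HEALTHY = 1000

for name, (sens, spec) in ASSAYS.items():
    missed, false_pos = expected_errors(sens, spec, N_CANCER, N_HEALTHY)
    print(f"{name}: about {missed:.1f} missed cancers and "
          f"{false_pos:.1f} false positives per 1000")
```

Under these assumed cohort sizes, the first assay misses more cancers while the second produces more false positives, which is why improving both figures together matters.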
In recent years, proteomics based on high-resolution mass spectrometry has greatly improved detection accuracy and speed, and has gradually become suitable for analyzing protein expression levels in large-scale clinical samples. After years of practice, the field widely recognizes that early cancer screening strategies with high sensitivity and high specificity require a shift from single protein markers to marker combinations. At present, no protein-marker-based kit for early screening and diagnosis of colorectal cancer is available in clinical practice.
Disclosure of Invention
In order to solve at least one of the above technical problems, the application adopts the following technical solutions:
A first aspect of the present application provides a protein marker combination for colorectal cancer prediction, diagnosis or prognosis, comprising at least one selected from LRG1, SERPINA1, ITIH3, CP, ORM1, C9, IGFBP2, and CNDP1.
ITIH3: inter-alpha-trypsin inhibitor heavy chain H3. The inhibitor complex can stabilize the extracellular matrix through its ability to bind hyaluronic acid. Polymorphisms of this gene may be associated with an increased risk of schizophrenia and major depression.
LRG1: the protein belongs to the leucine-rich repeat (LRR) family, whose members play important roles in protein-protein interactions, signal transduction, intercellular adhesion and development.
C9: the protein is the final component of the complement system and is involved in the formation of the membrane attack complex (MAC). The membrane attack complex plays a key role in innate and adaptive immune responses.
IGFBP2: the protein binds insulin-like growth factors I and II (IGF-I and IGF-II). It can be secreted into the blood, where it binds IGF-I and IGF-II, or remain inside cells, where it interacts with various ligands. High expression of IGFBP2 may promote the growth of a variety of tumors and may be predictive of patient prognosis.
CNDP1: the protein is a member of the M20 metalloprotease family and is specifically expressed in the brain. The coding region of the gene contains a trinucleotide (CTG) repeat sequence.
SERPINA1: the protein is a serine protease inhibitor belonging to the serpin superfamily, and its targets include elastase, plasmin, thrombin, trypsin, chymotrypsin and plasminogen activator. The protein is produced in the liver and bone marrow, by lymphocytes and monocytes in lymphoid tissue, and by the Paneth cells of the gut. Defects in this gene are known to be associated with chronic obstructive pulmonary disease, emphysema and chronic liver disease.
CP: the protein is a metalloprotein that binds most of the copper in plasma and is involved in the peroxidation of Fe(II) transferrin to Fe(III) transferrin. Mutations in this gene cause aceruloplasminemia, which results in iron accumulation and tissue damage and is associated with diabetes and neurological abnormalities.
ORM1: the protein is an acute-phase plasma protein whose expression level increases during the acute inflammatory response. Its specific function is unknown, but it may be involved in immunosuppression.
In some embodiments of the application, the protein marker combination comprises LRG1 and further comprises at least one of SERPINA1, ITIH3, CP, ORM1, C9, IGFBP2, and CNDP1.
In other embodiments of the application, the protein marker combination comprises C9 and further comprises at least one of LRG1, SERPINA1, ITIH3, CP, ORM1, IGFBP2, and CNDP1.
In some embodiments of the application, the protein marker combination comprises ITIH3, LRG1, C9, IGFBP2, and CNDP1.
In some embodiments of the application, the protein marker combination comprises CP, LRG1, C9, IGFBP2, and CNDP1.
In some embodiments of the application, the protein marker combination comprises ITIH3, CP, LRG1, C9 and CNDP1.
In some embodiments of the application, the protein marker combination comprises SERPINA1, LRG1, C9, IGFBP2, and CNDP1.
In some embodiments of the application, the protein marker combination comprises SERPINA1, CP, LRG1, C9, and CNDP1.
In some embodiments of the application, the protein marker combination comprises LRG1, ORM1, C9, IGFBP2, and CNDP1.
In some embodiments of the application, the protein marker combination comprises LRG1, SERPINA1, CP, ORM1, C9, and CNDP1.
In some embodiments of the application, the protein marker combination comprises LRG1, SERPINA1, ITIH3, CP, C9, and CNDP1.
In some embodiments of the application, the protein marker combination comprises LRG1, SERPINA1, ITIH3, C9, IGFBP2, and CNDP1.
In some embodiments of the application, the protein marker combination comprises SERPINA1, ITIH3, LRG1, C9, IGFBP2, and CNDP1.
In some embodiments of the application, the protein marker combination comprises SERPINA1, ITIH3, LRG1, ORM1, C9, and CNDP1.
In the present application, detecting the expression level of each protein in the protein marker combination makes it possible to predict whether a subject is at risk of colorectal cancer, i.e., it can be used for early colorectal cancer screening; to diagnose whether the subject has colorectal cancer, where the result may serve as an auxiliary diagnosis that the clinician combines with other clinical indicators; and to assess the prognosis of a subject with colorectal cancer after treatment.
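The marker combinations listed in the embodiments above can be represented as sets, which makes it easy to check that a combination draws only on the eight named proteins and to see which markers the listed combinations share. This is an illustrative sketch only; the names `MARKERS`, `PANELS`, `is_valid_combination` and `common_core` are hypothetical and do not appear in the application.

```python
# The eight candidate markers named in the first aspect.
MARKERS = frozenset(
    {"LRG1", "SERPINA1", "ITIH3", "CP", "ORM1", "C9", "IGFBP2", "CNDP1"}
)

# The five- and six-marker combinations listed in the embodiments,
# in the order in which they appear in the text.
PANELS = [
    frozenset({"ITIH3", "LRG1", "C9", "IGFBP2", "CNDP1"}),
    frozenset({"CP", "LRG1", "C9", "IGFBP2", "CNDP1"}),
    frozenset({"ITIH3", "CP", "LRG1", "C9", "CNDP1"}),
    frozenset({"SERPINA1", "LRG1", "C9", "IGFBP2", "CNDP1"}),
    frozenset({"SERPINA1", "CP", "LRG1", "C9", "CNDP1"}),
    frozenset({"LRG1", "ORM1", "C9", "IGFBP2", "CNDP1"}),
    frozenset({"LRG1", "SERPINA1", "CP", "ORM1", "C9", "CNDP1"}),
    frozenset({"LRG1", "SERPINA1", "ITIH3", "CP", "C9", "CNDP1"}),
    frozenset({"LRG1", "SERPINA1", "ITIH3", "C9", "IGFBP2", "CNDP1"}),
    frozenset({"SERPINA1", "ITIH3", "LRG1", "C9", "IGFBP2", "CNDP1"}),
    frozenset({"SERPINA1", "ITIH3", "LRG1", "ORM1", "C9", "CNDP1"}),
]


def is_valid_combination(panel):
    """True if the panel is non-empty and uses only the eight named markers."""
    panel = frozenset(panel)
    return len(panel) > 0 and panel <= MARKERS


def common_core(panels):
    """Return the markers shared by every panel in a non-empty list."""
    return frozenset.intersection(*panels)


if __name__ == "__main__":
    print("all listed panels valid:", all(is_valid_combination(p) for p in PANELS))
    print("distinct panels:", len(set(PANELS)))
    print("shared markers:", sorted(common_core(PANELS)))
```

Treating each combination as a set also shows that two of the listed embodiments name the same six proteins in a different order.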
In a second aspect, the application provides a polypeptide combination for use in the prediction, diagnosis or prognosis of colorectal cancer, the polypeptide combination comprising at least one polypeptide from each protein in any of the protein marker combinations according to the first aspect of the application.
Optionally, the polypeptide from C9 comprises the amino acid sequence shown as SEQ ID No.1 or SEQ ID No. 2.
Optionally, the polypeptide from SERPINA1 comprises the amino acid sequence shown in SEQ ID No. 3.
Optionally, the polypeptide from ITIH3 comprises the amino acid sequence shown in SEQ ID No. 4.
Optionally, the polypeptide from CP comprises the amino acid sequence shown in SEQ ID No. 5.
Optionally, the polypeptide from LRG1 comprises the amino acid sequence shown as SEQ ID No.6 or SEQ ID No. 7.
Optionally, the polypeptide from IGFBP2 comprises the amino acid sequence set forth in SEQ ID No. 8.
Optionally, the polypeptide from KNG1 comprises the amino acid sequence shown in SEQ ID No. 9.
Optionally, the polypeptide from ORM1 comprises the amino acid sequence shown in SEQ ID No. 10.
Optionally, the polypeptide from PRDX2 comprises the amino acid sequence shown in SEQ ID No. 11.
Optionally, the polypeptide from CNDP1 comprises the amino acid sequence shown in SEQ ID No. 12.
In a third aspect, the application provides the use of a reagent for detecting the expression level of a combination of protein markers according to any one of the first aspects of the application for the preparation of a kit for the prediction, diagnosis or prognosis of colorectal cancer.
In some embodiments of the application, the detection reagent detects the expression level of each protein in the protein marker combination based on mass spectrometry.
In some embodiments of the application, the level of expression of each protein in the protein marker combination is detected by detecting the level of one or more polypeptides of each protein in the protein marker combination.
Optionally, the polypeptide from C9 comprises the amino acid sequence shown as SEQ ID No.1 or SEQ ID No. 2.
Optionally, the polypeptide from SERPINA1 comprises the amino acid sequence shown in SEQ ID No. 3.
Optionally, the polypeptide from ITIH3 comprises the amino acid sequence shown in SEQ ID No. 4.
Optionally, the polypeptide from CP comprises the amino acid sequence shown in SEQ ID No. 5.
Optionally, the polypeptide from LRG1 comprises the amino acid sequence shown as SEQ ID No.6 or SEQ ID No. 7.
Optionally, the polypeptide from IGFBP2 comprises the amino acid sequence set forth in SEQ ID No. 8.
Optionally, the polypeptide from KNG1 comprises the amino acid sequence shown in SEQ ID No. 9.
Optionally, the polypeptide from ORM1 comprises the amino acid sequence shown in SEQ ID No. 10.
Optionally, the polypeptide from PRDX2 comprises the amino acid sequence shown in SEQ ID No. 11.
Optionally, the polypeptide from CNDP1 comprises the amino acid sequence shown in SEQ ID No. 12.
In a fourth aspect the application provides a kit for the prediction, diagnosis or prognosis of colorectal cancer comprising an expression level detection reagent for any one of the protein marker combinations of the first aspect of the application.
In a fifth aspect the present application provides a method for the prediction, diagnosis or prognosis of colorectal cancer comprising the steps of:
S1: obtaining expression level data for each protein in any one of the protein marker combinations of the first aspect of the application;
S2: constructing a machine learning model using the expression level data of each protein in the protein marker combination in population samples, together with information on whether each sample is derived from a colorectal cancer patient, and judging, based on the machine learning model, whether a subject has colorectal cancer, is at risk of colorectal cancer, or has a good colorectal cancer prognosis.
In some embodiments of the application, the machine learning model is trained using any one of the following algorithms:
random forest algorithms, support vector machine algorithms, linear regression algorithms, logistic regression algorithms, bayesian classifiers, and neural network algorithms.
In some preferred embodiments of the application, the machine learning model is trained using a logistic regression algorithm.
Further, a preset threshold is obtained from the machine learning model using the population samples. If the model output for a subject sample is higher than the preset threshold, the subject is judged to have colorectal cancer, to be at risk of colorectal cancer, or to have a poor colorectal cancer prognosis. If the model output is not higher than the preset threshold, the subject is judged not to have colorectal cancer, not to be at risk of colorectal cancer, or to have a good colorectal cancer prognosis.
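The threshold rule can be sketched in a few lines of Python. This is a minimal illustration only: the intercept, coefficients, marker levels, and the threshold of 0.5 are hypothetical values chosen for the example, and the helper names `logistic_score` and `classify` are not part of the application.

```python
import math

def logistic_score(intercept, coefs, levels):
    """Return the logistic-regression probability for one sample."""
    z = intercept + sum(c * x for c, x in zip(coefs, levels))
    return 1.0 / (1.0 + math.exp(-z))

def classify(score, threshold):
    """Positive only when the score is strictly higher than the threshold."""
    return "positive" if score > threshold else "negative"

# Hypothetical coefficients and marker levels (assumptions, not from the application).
intercept = -0.5
coefs = [1.2, 0.8, -0.6]
levels = [0.9, 0.4, 0.3]

score = logistic_score(intercept, coefs, levels)
label = classify(score, threshold=0.5)
```

A score exactly equal to the threshold is treated as negative, matching the "not higher than the preset threshold" wording.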
In some embodiments of the application, in step S1, a blood sample from the subject is anticoagulated with EDTA to obtain plasma; the plasma proteins are denatured, reduced, alkylated, and digested with trypsin to obtain polypeptide fragments, which are desalted, evaporated to dryness, and then subjected to liquid-phase separation and mass spectrometry detection, so that the level of the protein marker combination is determined from the polypeptide levels.
In some embodiments of the application, the mass spectrometry detection is performed using a triple quadrupole mass spectrometry method.
In a sixth aspect the application provides a system for colorectal cancer prediction, diagnosis or prognosis comprising the following modules:
a data input module for inputting a subject's expression level data for each protein in any of the protein marker combinations of the first aspect of the present application;
a data storage module for storing the expression level data of each protein in the protein marker combination in population samples and information on whether each sample is derived from a colorectal cancer patient;
a colorectal cancer analysis module, connected to both the data input module and the data storage module, which constructs a machine learning model using the population-sample expression level data of each protein in the protein marker combination stored in the data storage module, together with the information on whether each sample is derived from a colorectal cancer patient, and judges, based on the machine learning model, whether the subject has colorectal cancer, is at risk of colorectal cancer, or has a good colorectal cancer prognosis.
In some embodiments of the application, the machine learning model is trained using any one of the following algorithms:
random forest algorithms, support vector machine algorithms, linear regression algorithms, logistic regression algorithms, bayesian classifiers, and neural network algorithms.
In some embodiments of the application, the colorectal cancer analysis module further writes the subject's expression level data for each protein in the protein marker combination, together with the resulting determination, to the data storage module.
In some preferred embodiments of the application, the machine learning model is trained using a logistic regression algorithm.
Beneficial effects of the application
Compared with the prior art, the application has the following beneficial effects:
Multiple protein markers in plasma are detected simultaneously by targeted mass spectrometry with absolute quantification, so the results are accurate and detection time is saved.
The protein marker combination of the application provides a non-invasive screening means based on plasma for early colorectal cancer.
The method and the system of the application, when used for the prediction, diagnosis or prognosis of colorectal cancer, are non-invasive for patients, use easily obtained material, require only a small plasma sample, and have high sensitivity and specificity. Most importantly, they fill the gap left by the lack of an effective protein marker for early colorectal cancer.
The protein marker combination predicts early colorectal cancer with high accuracy and can prompt patients with a positive result to seek further diagnosis, so that colorectal cancer mortality in the population can be effectively reduced in the long term.
Applying machine learning to the plasma marker protein measurements makes it possible to dynamically monitor a patient's disease state.
Drawings
FIG. 1 shows the receiver operating characteristic (ROC) curves of the single protein marker LRG1; the areas under the curve (AUC) for the training set, the test set and the independent validation set are 0.904, 0.85 and 0.8, respectively. In the figure, "train" denotes the training set, "test" the test set and "valid" the independent validation set; the axes are the true positive rate (sensitivity) and the false positive rate (1-specificity).
FIG. 2 shows the ROC curves of the single protein marker SERPINA1; the AUC values for the training set, the test set and the independent validation set are 0.837, 0.779 and 0.771, respectively. In the figure, "train" denotes the training set, "test" the test set and "valid" the independent validation set; the axes are the true positive rate (sensitivity) and the false positive rate (1-specificity).
FIG. 3 shows the ROC curves of the single protein marker ITIH3; the AUC values for the training set, the test set and the independent validation set are 0.835, 0.921 and 0.79, respectively. In the figure, "train" denotes the training set, "test" the test set and "valid" the independent validation set; the axes are the true positive rate (sensitivity) and the false positive rate (1-specificity).
FIG. 4 shows the ROC curves of the single protein marker CP; the AUC values for the training set, the test set and the independent validation set are 0.823, 0.842 and 0.624, respectively. In the figure, "train" denotes the training set, "test" the test set and "valid" the independent validation set; the axes are the true positive rate (sensitivity) and the false positive rate (1-specificity).
FIG. 5 shows the ROC curves of the single protein marker ORM1; the AUC values for the training set, the test set and the independent validation set are 0.818, 0.783 and 0.697, respectively. In the figure, "train" denotes the training set, "test" the test set and "valid" the independent validation set; the axes are the true positive rate (sensitivity) and the false positive rate (1-specificity).
FIG. 6 shows the ROC curves of the single protein marker C9; the AUC values for the training set, the test set and the independent validation set are 0.875, 0.91 and 0.81, respectively. In the figure, "train" denotes the training set, "test" the test set and "valid" the independent validation set; the axes are the true positive rate (sensitivity) and the false positive rate (1-specificity).
FIG. 7 shows the ROC curves of the single protein marker IGFBP2; the AUC values for the training set, the test set and the independent validation set are 0.728, 0.738 and 0.737, respectively. In the figure, "train" denotes the training set, "test" the test set and "valid" the independent validation set; the axes are the true positive rate (sensitivity) and the false positive rate (1-specificity).
FIG. 8 shows the ROC curves of the 5-protein marker combination; the AUC values for the training set, the test set and the independent validation set are 0.956, 0.954 and 0.893, respectively. In the figure, "train" denotes the training set, "test" the test set and "valid" the independent validation set; the axes are the true positive rate (sensitivity) and the false positive rate (1-specificity).
FIG. 9 shows the confusion matrix of the 5-protein marker combination for 121 colorectal cancer patients and 186 healthy individuals, where 1 indicates positive and 0 indicates negative. In the figure, "train" denotes the training set, "test" the test set and "valid" the independent validation set; "Truth" denotes the actual class and "Prediction" the predicted class.
Detailed Description
Unless otherwise indicated, implied from the context, or customary in the art, all parts and percentages in the present application are based on weight, and the test and characterization methods used are current as of the filing date of the present application. Where applicable, the disclosure of any patent, patent application, or publication referred to in this application is incorporated by reference in its entirety, and the equivalent patents of those cited are likewise incorporated by reference, particularly with respect to their definitions of terms in the art. If the definition of a particular term disclosed in the prior art is inconsistent with any definition provided in the present application, the definition provided in the present application controls.
The numerical ranges in the present application are approximations, and thus may include values outside the stated range unless otherwise indicated. A numerical range includes all values from the lower value to the upper value in increments of 1 unit, provided that there is a separation of at least 2 units between any lower value and any higher value. For ranges containing values less than 1 or containing fractional numbers greater than 1 (e.g., 1.1, 1.5, etc.), 1 unit is considered to be 0.0001, 0.001, 0.01, or 0.1, as appropriate. For ranges containing numbers less than 10 (e.g., 1 to 5), 1 unit is generally considered to be 0.1. These are merely examples of what is specifically intended, and all possible combinations of numerical values between the lowest value and the highest value enumerated are to be considered expressly stated in this disclosure.
The terms "comprises," "comprising," "including," and their derivatives do not exclude the presence of any other component, step or process, whether or not that component, step or process is disclosed in the present application. For the avoidance of any doubt, anything described herein using the terms "comprising", "including" or "having" may, unless expressly stated otherwise, include any additional additive, adjuvant or compound. In contrast, the term "consisting essentially of" excludes any other component, step or process from the scope of what follows it, except those that are not essential to operability. The term "consisting of" excludes any component, step or process not specifically described or listed. The term "or" refers to the listed members individually or in any combination unless explicitly stated otherwise.
In order to make the technical problems, technical schemes and beneficial effects solved by the application more clear, the application is further described in detail below with reference to the embodiments.
Examples
The following examples are presented herein to demonstrate preferred embodiments of the present application. It will be appreciated by those skilled in the art that the techniques disclosed in the examples which follow represent techniques discovered by the inventor to function in the practice of the application, and thus can be considered to constitute preferred modes for its practice. Those of skill in the art should, in light of the present disclosure, appreciate that many changes can be made in the specific embodiments which are disclosed and still obtain a like or similar result without departing from the spirit or scope of the application.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs, and the relevant disclosures are incorporated herein by reference.
Those skilled in the art will recognize, or be able to ascertain using no more than routine experimentation, many equivalents to the specific embodiments of the application described herein. Such equivalents are intended to be encompassed by the claims.
The experimental methods in the following examples are conventional methods unless otherwise specified. The instruments used in the following examples are laboratory conventional instruments unless otherwise specified; the test materials used in the examples described below, unless otherwise specified, were purchased from conventional biochemical reagent stores.
Example 1 discovery of protein markers
The inventors collected fresh blood samples from 101 colorectal cancer patients and 89 sex- and age-matched healthy controls for the discovery of protein markers.
1. Blood sample processing
After anticoagulation treatment, the fresh blood samples were centrifuged at 1000 g for 5 min to obtain plasma samples, which were stored long-term in a freezer at -70 ℃.
Plasma samples were diluted 50-fold and their protein concentration was determined by BCA assay: BSA standards were serially diluted to 2, 1, 0.5, 0.25, 0.125 and 0.0625 mg/mL to serve as the working curve for calibrating the plasma concentrations. The diluted samples and the standards were added to a 96-well plate, pre-prepared BCA working solution was added, the plate was incubated at 37 ℃ for 30 min, and the plasma protein concentration was determined from the absorbance at 562 nm.
50 μg of protein was taken and ammonium bicarbonate solution was added to a final concentration of 50 mM. DTT was added to a final concentration of 10 mM and the sample was heated at 95 ℃ for 10 min. After the sample returned to room temperature, IAA was added to a final concentration of 15 mM and the reaction proceeded in the dark for 30 min. 1 μg of trypsin was added to each sample, and digestion was carried out overnight (12-14 h) in a metal bath at 37 ℃. The next day, formic acid was added to a final concentration of 1% to acidify the sample and terminate the digestion.
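The working-curve calibration can be sketched as a least-squares line through the BSA standards. This is a minimal sketch under stated assumptions: the absorbance readings are hypothetical and made exactly linear for illustration, a straight-line fit is assumed (the application does not state the curve model), and `fit_line` and `plasma_conc` are illustrative names.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# BSA standard concentrations from the text, in mg/mL.
std_conc = [2, 1, 0.5, 0.25, 0.125, 0.0625]
# Hypothetical A562 readings (assumption): exactly linear for illustration.
std_abs = [0.1 + 0.5 * c for c in std_conc]

slope, intercept = fit_line(std_conc, std_abs)

def plasma_conc(a562, dilution=50):
    """Convert an A562 reading of a diluted sample to undiluted mg/mL."""
    return (a562 - intercept) / slope * dilution

conc = plasma_conc(0.75)  # hypothetical sample reading
```

The 50-fold dilution factor from the text is applied after interpolation so that the result refers to the undiluted plasma.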
2. Differential proteins and polypeptides
Target selection began with finding differentially expressed proteins. The inventors acquired mass spectra in data-independent acquisition (DIA) mode for 190 sex- and age-matched plasma samples (89 healthy people and 101 colorectal cancer patients), analyzed them with DIA-NN software to obtain expression data for proteins and polypeptides, and normalized the data by total protein intensity, giving a total of 714 proteins and 7988 polypeptides. For proteins and polypeptides whose expression followed a normal distribution, differentially expressed proteins and polypeptides were identified with the t-test; for those whose expression did not follow a normal distribution, the Wilcoxon non-parametric test was used. In total, the inventors obtained 96 differentially expressed proteins and 832 differentially expressed polypeptides. These results were integrated to obtain the set of differentially expressed polypeptides.
3. Marker protein screening
Potential polypeptides capable of distinguishing colorectal cancer patients from healthy people were selected by a random forest method: the mean Gini coefficient of each target was calculated by the random forest, the targets were ranked by importance, and the biological functions of the proteins were also taken into account. This finally yielded the 10 top-ranked proteins, namely LRG1, SERPINA1, ITIH3, CP, ORM1, C9, IGFBP2, CNDP1, KNG1 and PRDX2. The corresponding polypeptide sequences are shown in Table 1:
TABLE 1 polypeptide sequences of candidate proteins
Example 2 machine learning model establishment
For each polypeptide, a ¹³C- and ¹⁵N-labeled heavy-isotope polypeptide at an appropriate concentration was added to the digested plasma sample and mixed thoroughly; the samples were then desalted on a 96-well SOLA solid-phase extraction device and evaporated to dryness.
For each polypeptide, a standard curve covering an appropriate concentration range (9 calibration points) was prepared, and an equal amount of internal standard was added to each calibration point. Mass spectrometry was performed on an AB Sciex 5500 QTRAP mass spectrometer; the polypeptides were separated on a C18 column (Phenomenex) at a column temperature of 45 ℃, with 15 μL of standard injected. 150 μL of 0.1% formic acid was added to each evaporated sample, the sample was mixed thoroughly, and 15 μL was injected for mass spectrometry detection. The liquid-phase separation conditions are shown in Table 2:
Table 2 Liquid-phase separation conditions
Time (min) | Event | Parameter (% B) | Flow rate (mL/min) |
0.01 | PumpBConc. | 6 | 0.25 |
2.0 | PumpBConc. | 6 | 0.25 |
18.0 | PumpBConc. | 28 | 0.25 |
18.5 | PumpBConc. | 28 | 0.25 |
21.5 | PumpBConc. | 98 | 0.25 |
22 | PumpBConc. | 98 | 0.25 |
25 | PumpBConc. | 6 | 0.25 |
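The time program in Table 2 can be read as a piecewise-linear gradient. The sketch below assumes that the pump B concentration ramps linearly between consecutive table rows, which is the usual convention for such time programs but is not stated in the application; `percent_b` is an illustrative helper name.

```python
# (time in min, pump B concentration in %) taken from Table 2.
GRADIENT = [
    (0.01, 6), (2.0, 6), (18.0, 28), (18.5, 28),
    (21.5, 98), (22.0, 98), (25.0, 6),
]

def percent_b(t):
    """Pump B concentration at time t, assuming linear ramps between rows."""
    if t <= GRADIENT[0][0]:
        return float(GRADIENT[0][1])
    if t >= GRADIENT[-1][0]:
        return float(GRADIENT[-1][1])
    for (t0, b0), (t1, b1) in zip(GRADIENT, GRADIENT[1:]):
        if t0 <= t <= t1:
            return b0 + (b1 - b0) * (t - t0) / (t1 - t0)

mid_gradient = percent_b(10.0)  # on the 6% -> 28% separation ramp
wash_ramp = percent_b(20.0)     # on the 28% -> 98% wash ramp
```

Under this reading, the main separation runs from 2 to 18 min, followed by a high-organic wash and a return to the starting conditions by 25 min.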
Triple quadrupole targeted mass spectrometry was then performed; the ion pair information for multiple reaction monitoring (MRM) is shown in Table 3.
Table 3 MRM monitoring information
After mass spectrometry, the polypeptide concentrations corresponding to the respective protein markers were quantified and used for model establishment. Of the 190 samples, 80% (152) were randomly selected as the training set and the remaining 20% (38) were used as the test set, and each of the 10 potential protein markers was modeled by logistic regression. The inventors found that seven single protein markers, LRG1, SERPINA1, ITIH3, CP, ORM1, C9 and IGFBP2, had very good predictive power in both the training set and the test set; their ROC curves are shown in FIG. 1 to FIG. 7, respectively.
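The random 80/20 partition can be sketched as follows. The sample identifiers, the fixed seed, and the helper name `split_train_test` are assumptions made for illustration; the application does not describe how the random selection was implemented or whether it was stratified by class.

```python
import random

def split_train_test(sample_ids, train_fraction=0.8, seed=0):
    """Shuffle the sample IDs reproducibly and split them into two lists."""
    rng = random.Random(seed)
    shuffled = list(sample_ids)
    rng.shuffle(shuffled)
    n_train = round(len(shuffled) * train_fraction)
    return shuffled[:n_train], shuffled[n_train:]

# Hypothetical identifiers for the 190 discovery samples.
sample_ids = [f"S{i:03d}" for i in range(1, 191)]
train, test = split_train_test(sample_ids)
```

With 190 samples this reproduces the 152/38 split sizes given in the text.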
Example 3 model verification
The inventors selected 121 colorectal cancer patients and 186 matched healthy people as a validation set to verify the model. To quantify the polypeptides more accurately and to reduce errors introduced by complicated sample handling, the inventors did not perform high-abundance protein depletion, which also greatly reduces the cost of sample pretreatment. After protein extraction and concentration determination, liquid-phase separation and mass spectrometry detection were performed.
Example 4 modeling and validation of multiple marker combinations
The inventors further used the concentrations of the optimal combination of the aforementioned proteins, namely 5 protein markers (ITIH3, LRG1, SERPINA1, IGFBP2, and CNDP1), to build a logistic regression model that better discriminates colorectal cancer patients from healthy people. Specifically, the logistic regression model was trained on 77 colorectal cancer patients and 79 healthy people to learn the discriminating effect of the 5 protein markers. The threshold of the logistic regression model was set to 0.34, and the model was independently validated using 44 colorectal cancer patients and 107 healthy people. The threshold was set based on the model results for all 307 plasma samples; a sample was judged positive if its model output was above the threshold and negative if its model output was below the threshold.
The ROC curves are shown in FIG. 8; the areas under the curve (AUC) for the training set, the test set, and the independent validation set are 0.956, 0.954, and 0.893, respectively. The final results were 92% sensitivity, 81% specificity, a 94% negative predictive value, and a 76% positive predictive value, as shown in FIG. 9.
In addition, the inventors present 10 other protein marker combinations that performed well during machine learning; the results are shown in Table 4.
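For reference, the four reported metrics are derived from the cells of a confusion matrix as sketched below. The counts used here are hypothetical round numbers chosen for illustration and are not the counts shown in FIG. 9; `screening_metrics` is an illustrative helper name.

```python
def screening_metrics(tp, fn, fp, tn):
    """Compute standard screening metrics from confusion-matrix counts."""
    return {
        "sensitivity": tp / (tp + fn),  # true positive rate
        "specificity": tn / (tn + fp),  # true negative rate
        "ppv": tp / (tp + fp),          # positive predictive value
        "npv": tn / (tn + fn),          # negative predictive value
    }

# Hypothetical counts (assumption): 100 patients and 100 healthy controls.
metrics = screening_metrics(tp=90, fn=10, fp=20, tn=80)
```

Sensitivity and specificity depend only on the classifier, whereas the predictive values also depend on the proportion of patients in the evaluated cohort.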
Table 4 protein marker combinations
Model | Training set AUC | Test set AUC | Independent validation set AUC |
CP+LRG1+C9+IGFBP2+CNDP1 | 0.955 | 0.945 | 0.870 |
ITIH3+CP+LRG1+C9+CNDP1 | 0.953 | 0.945 | 0.872 |
SERPINA1+LRG1+C9+IGFBP2+CNDP1 | 0.952 | 0.939 | 0.884 |
SERPINA1+CP+LRG1+C9+CNDP1 | 0.952 | 0.942 | 0.870 |
LRG1+ORM1+C9+IGFBP2+CNDP1 | 0.947 | 0.935 | 0.891 |
LRG1+SERPINA1+CP+ORM1+C9+CNDP1 | 0.950 | 0.939 | 0.861 |
LRG1+SERPINA1+ITIH3+CP+C9+CNDP1 | 0.951 | 0.941 | 0.866 |
LRG1+SERPINA1+ITIH3+C9+IGFBP2+CNDP1 | 0.949 | 0.936 | 0.892 |
SERPINA1+ITIH3+LRG1+C9+IGFBP2+CNDP1 | 0.952 | 0.941 | 0.887 |
SERPINA1+ITIH3+LRG1+ORM1+C9+CNDP1 | 0.951 | 0.941 | 0.890 |
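An AUC value such as those in Table 4 can be computed directly from model scores as the probability that a randomly chosen patient scores higher than a randomly chosen healthy control, with ties counted as one half. The sketch below uses this pairwise definition; the scores and labels are hypothetical, and `roc_auc` is an illustrative helper name rather than the software used by the inventors.

```python
def roc_auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Hypothetical model scores; 1 = colorectal cancer, 0 = healthy.
scores = [0.9, 0.8, 0.35, 0.6, 0.2, 0.1]
labels = [1, 1, 1, 0, 0, 0]
auc = roc_auc(scores, labels)
```

The pairwise loop is quadratic in the number of samples, which is acceptable for cohorts of a few hundred samples such as those described here.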
All documents mentioned in this disclosure are incorporated by reference in this disclosure as if each were individually incorporated by reference. Further, it will be appreciated that various changes and modifications may be made by those skilled in the art after reading the above teachings, and such equivalents are intended to fall within the scope of the application as defined in the appended claims.
Claims (10)
1. A protein marker combination for colorectal cancer prediction, diagnosis or prognosis, characterized in that the protein marker combination comprises at least one selected from LRG1, SERPINA1, ITIH3, CP, ORM1, C9, IGFBP2 and CNDP1.
2. The protein marker combination according to claim 1, wherein the protein marker combination comprises LRG1 and further comprises at least one of SERPINA1, ITIH3, CP, ORM1, C9, IGFBP2, and CNDP1.
3. The protein marker combination of claim 1, wherein the protein marker combination comprises C9 and further comprises at least one of LRG1, SERPINA1, ITIH3, CP, ORM1, IGFBP2, and CNDP1.
4. A combination of polypeptides for use in the prediction, diagnosis or prognosis of colorectal cancer, characterized in that the combination of polypeptides comprises at least one polypeptide from each protein in the combination of protein markers according to any one of claims 1-3.
5. Use of a reagent for detecting the expression level of a combination of protein markers according to any one of claims 1 to 3 for the preparation of a kit for the prediction, diagnosis or prognosis of colorectal cancer.
6. The use according to claim 5, wherein the detection reagent detects the expression level of each protein in the protein marker combination based on mass spectrometry.
7. A kit for colorectal cancer prediction, diagnosis or prognosis comprising an expression level detection reagent comprising a combination of protein markers comprising ITIH3, LRG1 and C9.
8. A system for colorectal cancer prediction, diagnosis or prognosis comprising the following modules:
a data input module for inputting expression level data for each protein in a subject protein marker combination comprising ITIH3, LRG1 and C9;
a data storage module for storing the expression level data of each protein in the protein marker combination in population samples and information on whether each sample is derived from a colorectal cancer patient;
a colorectal cancer analysis module, connected to both the data input module and the data storage module, which constructs a machine learning model using the population-sample expression level data of each protein in the protein marker combination stored in the data storage module, together with the information on whether each sample is derived from a colorectal cancer patient, and judges, based on the machine learning model, whether the subject has colorectal cancer, is at risk of colorectal cancer, or has a good colorectal cancer prognosis.
9. The system of claim 8, wherein the machine learning model is trained using any one of the following algorithms:
random forest algorithms, support vector machine algorithms, linear regression algorithms, logistic regression algorithms, bayesian classifiers, and neural network algorithms.
10. The system of claim 8 or 9, wherein the colorectal cancer analysis module further inputs into the data storage module expression level data and determinations of each protein in the subject protein marker combination.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310049892.3A CN116735889B (en) | 2023-02-01 | Protein marker for early colorectal cancer screening, kit and application |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310049892.3A CN116735889B (en) | 2023-02-01 | Protein marker for early colorectal cancer screening, kit and application |
Publications (2)
Publication Number | Publication Date |
---|---|
CN116735889A true CN116735889A (en) | 2023-09-12 |
CN116735889B CN116735889B (en) | 2024-05-17 |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20120149022A1 (en) * | 2009-02-20 | 2012-06-14 | Eva I-Wei Aw | Compositions and methods for diagnosis and prognosis of colorectal cancer |
WO2013152989A2 (en) * | 2012-04-10 | 2013-10-17 | Eth Zurich | Biomarker assay and uses thereof for diagnosis, therapy selection, and prognosis of cancer |
US20170269089A1 (en) * | 2014-12-11 | 2017-09-21 | Wisconsin Alumni Research Foundation | Methods for Detection and Treatment of Colorectal Cancer |
CN109036571A (en) * | 2014-12-08 | 2018-12-18 | 20/20基因系统股份有限公司 | The method and machine learning system of a possibility that for predicting with cancer or risk |
CN111584008A (en) * | 2020-05-29 | 2020-08-25 | 杭州广科安德生物科技有限公司 | Method for constructing mathematical model for detecting colorectal cancer in vitro and application thereof |
Patent Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20120149022A1 (en) * | 2009-02-20 | 2012-06-14 | Eva I-Wei Aw | Compositions and methods for diagnosis and prognosis of colorectal cancer |
WO2013152989A2 (en) * | 2012-04-10 | 2013-10-17 | Eth Zurich | Biomarker assay and uses thereof for diagnosis, therapy selection, and prognosis of cancer |
CN109036571A (en) * | 2014-12-08 | 2018-12-18 | 20/20基因系统股份有限公司 | The method and machine learning system of a possibility that for predicting with cancer or risk |
US20170269089A1 (en) * | 2014-12-11 | 2017-09-21 | Wisconsin Alumni Research Foundation | Methods for Detection and Treatment of Colorectal Cancer |
CN111584008A (en) * | 2020-05-29 | 2020-08-25 | 杭州广科安德生物科技有限公司 | Method for constructing mathematical model for detecting colorectal cancer in vitro and application thereof |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
CB03 | Change of inventor or designer information | | Inventor after: Liao Lujian, Wang Tingting, Gao Fei, Pan Liangxuan, Du Xiaoyao; Inventor before: Liao Lujian |
GR01 | Patent grant |