CN116519954B - Colorectal cancer detection model construction method, colorectal cancer detection model construction system and biomarker
- Publication number
- CN116519954B (application CN202310770060.0A)
- Authority
- CN
- China
- Prior art keywords
- colorectal cancer
- model
- biomarker
- individual
- constructing
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B40/00—ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding
- G16B40/20—Supervised data analysis
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12Q—MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
- C12Q1/00—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
- C12Q1/34—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving hydrolase
- C12Q1/44—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving hydrolase involving esterase
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12Q—MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
- C12Q1/00—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
- C12Q1/48—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving transferase
-
- G—PHYSICS
- G01—MEASURING; TESTING
- G01N—INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
- G01N33/00—Investigating or analysing materials by specific methods not covered by groups G01N1/00 - G01N31/00
- G01N33/48—Biological material, e.g. blood, urine; Haemocytometers
- G01N33/50—Chemical analysis of biological material, e.g. blood, urine; Testing involving biospecific ligand binding methods; Immunological testing
- G01N33/68—Chemical analysis of biological material, e.g. blood, urine; Testing involving biospecific ligand binding methods; Immunological testing involving proteins, peptides or amino acids
- G01N33/6893—Chemical analysis of biological material, e.g. blood, urine; Testing involving biospecific ligand binding methods; Immunological testing involving proteins, peptides or amino acids related to diseases not provided for elsewhere
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B20/00—ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16H—HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
- G16H10/00—ICT specially adapted for the handling or processing of patient-related medical or healthcare data
- G16H10/40—ICT specially adapted for the handling or processing of patient-related medical or healthcare data for data related to laboratory analysis, e.g. patient specimen analysis
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16H—HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
- G16H50/00—ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
- G16H50/20—ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for computer-aided diagnosis, e.g. based on medical expert systems
-
- G—PHYSICS
- G01—MEASURING; TESTING
- G01N—INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
- G01N2800/00—Detection or diagnosis of diseases
- G01N2800/60—Complex ways of combining multiple protein biomarkers for diagnosis
-
- G—PHYSICS
- G01—MEASURING; TESTING
- G01N—INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
- G01N2800/00—Detection or diagnosis of diseases
- G01N2800/70—Mechanisms involved in disease identification
- G01N2800/7023—(Hyper)proliferation
- G01N2800/7028—Cancer
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02A—TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
- Y02A90/00—Technologies having an indirect contribution to adaptation to climate change
- Y02A90/10—Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation
Abstract
The invention provides a colorectal cancer detection model construction method, a colorectal cancer detection model construction system and a biomarker. Proteins that differ significantly between the blood of colorectal cancer patients and that of healthy people are analyzed by a proteomics method in order to screen out proteins that can be used, singly or in combination, as biomarkers for early prediction of colorectal cancer risk. The invention further provides products, models, systems, computer-readable storage media and information data processing terminals that comprise the biomarkers and are used for predicting whether an individual has colorectal cancer. The method, system and biomarker allow convenient, noninvasive and efficient prediction of whether an individual has colorectal cancer and can meet clinical requirements.
Description
Technical Field
The present invention relates to the field of medicine, and in particular to the use of proteomics to screen biomarkers for colorectal cancer and the application of those biomarkers to predicting whether an individual has colorectal cancer.
Background
Proteomics is the science that studies the composition, localization, changes and interaction rules of proteins in cells, tissues or organisms, including protein expression patterns and protein function patterns. With the development of mass spectrometry, liquid chromatography coupled with tandem mass spectrometry (LC-MS/MS) has become the dominant tool in proteomics research. Proteomics is of great significance for finding diagnostic markers of disease, screening drug targets and toxicology research, and is widely applied in medical research.
Colorectal cancer is one of the most common malignant tumors in clinical practice. About 60% of colorectal cancer patients are older than 65, and the incidence of colorectal cancer is increasing year by year under the influence of factors such as population aging and changes in dietary structure. The 2022 national cancer report showed that colorectal cancer ranks second only to lung cancer, with its mortality accounting for 9.5% of all cancers, and ranks second among cancers in women.
Notably, early detection is a key factor in reducing colorectal cancer mortality: the 5-year survival rate after radical surgery is about 90% when colorectal cancer is diagnosed as localized disease, whereas only 5% of patients diagnosed with distant metastasis survive for 5 years. Among the screening methods for colorectal cancer, the fecal occult blood test (FOBT) is considered the most effective noninvasive method, but it still has limitations that have not yet been overcome. With the development of immunology and molecular biology, tumor-associated protein markers show increasingly important clinical value in the diagnosis and treatment of colorectal cancer and have become indispensable biological indicators for assisting diagnosis, monitoring treatment response and judging prognosis. A number of tumor markers usable for colorectal cancer diagnosis, pathological typing, clinical staging, and judgment of prognosis and treatment response have been found clinically, but the diagnostic performance of the currently common colorectal cancer markers (CEA and CA199) is not ideal, and no specific tumor marker yet offers high sensitivity and specificity for colorectal cancer diagnosis.
Therefore, searching for new colorectal cancer diagnosis related markers and various marker combinations and constructing a colorectal cancer diagnosis prediction model have important clinical value and significance.
Disclosure of Invention
In view of the problems in the prior art, the invention provides a biomarker for colorectal cancer detection, together with a model, a system, a computer-readable storage medium and an information data processing terminal for predicting whether an individual has colorectal cancer. These allow convenient, noninvasive and efficient prediction of whether an individual has colorectal cancer and meet clinical requirements.
Specifically, in one aspect, the invention provides the use of a biomarker in the manufacture of a reagent for predicting whether an individual has colorectal cancer, the biomarker being selected from one of ORM2, CD74, FBLN5, RNASE1, ITIH3, SPINK5 and B3GNT2.
The research and development team analyzed blood samples from two groups, a healthy group and a colorectal cancer patient group, using TMT (tandem mass tag) labeling-based quantitative proteomics and ultra-high performance liquid chromatography-tandem mass spectrometry (LC-MS/MS). Proteins differing significantly between the colorectal cancer samples and the control samples were identified by the orthogonal partial least squares method, yielding colorectal cancer-related proteins that can be used as biomarkers for efficiently predicting whether an individual has colorectal cancer.
In another aspect, the invention provides the use of a biomarker in the manufacture of a reagent for predicting whether an individual has colorectal cancer, the biomarker being selected from a combination of at least two of ORM1, ORM2, CD74, FBLN5, RNASE1, ITIH3, SPINK5, B3GNT2, CEA and CA199. The research and development team combined ORM1 with the currently common colorectal cancer markers CEA and CA199 and provides a biomarker comprising at least 2 of the above 10 proteins. A colorectal cancer diagnosis model built on such a biomarker has better diagnostic value, so that whether an individual has colorectal cancer can be predicted more accurately.
Further, the research and development team has optimized the number and combination of proteins comprised by the above biomarker: the biomarker is selected from a combination of at least two of ORM2, CD74, FBLN5, RNASE1, ITIH3, SPINK5 and B3GNT2, or from a combination of 1 or more of ORM1, CEA and CA199 with 1 or more of ORM2, CD74, FBLN5, RNASE1, ITIH3, SPINK5 and B3GNT2.
Still further, the biomarker is selected from a combination of at least two of FBLN5, RNASE1, ITIH3, SPINK5 and B3GNT2, or from a combination of any 1 or more of ORM1, ORM2, CD74, CEA and CA199 with any 1 or more of FBLN5, RNASE1, ITIH3, SPINK5 and B3GNT2.
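The selection rule in the preceding paragraph can be written as a small membership check. The following is a minimal sketch, not part of the patent text: the names `CORE`, `AUX`, `is_allowed_panel` and `enumerate_panels` are hypothetical, and the code assumes the rule is read as "at least two core markers, or at least one core marker together with at least one of the other five".

```python
from itertools import combinations

# Marker groups as named in the paragraph above (group names are hypothetical).
CORE = ("FBLN5", "RNASE1", "ITIH3", "SPINK5", "B3GNT2")
AUX = ("ORM1", "ORM2", "CD74", "CEA", "CA199")


def is_allowed_panel(panel):
    """Return True if `panel` satisfies the assumed reading of the rule."""
    markers = set(panel)
    if not markers <= set(CORE) | set(AUX):
        return False  # contains a marker outside the ten listed proteins
    n_core = len(markers & set(CORE))
    n_aux = len(markers & set(AUX))
    # Either two or more core markers, or one-or-more core plus one-or-more auxiliary.
    return n_core >= 2 or (n_core >= 1 and n_aux >= 1)


def enumerate_panels():
    """List every allowed panel as a sorted tuple of marker names."""
    all_markers = CORE + AUX
    panels = []
    for size in range(2, len(all_markers) + 1):
        for combo in combinations(all_markers, size):
            if is_allowed_panel(combo):
                panels.append(tuple(sorted(combo)))
    return panels
```

Under this reading, `is_allowed_panel(["RNASE1", "SPINK5"])` is true, while `is_allowed_panel(["ORM1", "CEA"])` is false because the panel contains none of the five core markers.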
Still further, the biomarker may be selected to comprise RNASE1 and SPINK5; or RNASE1, SPINK5 and B3GNT2; or ORM2, RNASE1, SPINK5 and B3GNT2; or ITIH3, FBLN5, RNASE1, SPINK5 and B3GNT2; or CA199, ORM2, RNASE1, ITIH3, SPINK5 and B3GNT2; or CA199, CEA, ORM1, FBLN5, RNASE1, SPINK5 and B3GNT2; or CA199, CEA, ORM2, CD74, RNASE1, ITIH3, SPINK5 and B3GNT2; or CA199, CEA, ORM1, ORM2, CD74, FBLN5, RNASE1, SPINK5 and B3GNT2; or CA199, CEA, ORM1, ORM2, CD74, FBLN5, RNASE1, ITIH3, SPINK5 and B3GNT2.
It should be noted that, in the present invention, orosomucoid 1 (ORM1) is the protein or amino acid sequence of UniProt database No. P02763; orosomucoid 2 (ORM2) is the protein or amino acid sequence of UniProt database No. P19652; the CD74 molecule (CD74) is the protein or amino acid sequence of UniProt database No. P04233; fibulin-5 (FBLN5) is the protein or amino acid sequence of UniProt database No. Q9UBX5; ribonuclease family member 1 (RNASE1) is the protein or amino acid sequence of UniProt database No. P07998; inter-alpha-trypsin inhibitor heavy chain 3 (ITIH3) is the protein or amino acid sequence of UniProt database No. Q06033; serine peptidase inhibitor Kazal-type 5 (SPINK5) is the protein or amino acid sequence of UniProt database No. Q9NQ38; and beta-1,3-N-acetylglucosaminyltransferase 2 (B3GNT2) and carcinoembryonic antigen (CEA) are likewise the proteins or amino acid sequences of their corresponding UniProt database numbers.
In another aspect, the invention provides a biomarker for predicting whether an individual has colorectal cancer, the biomarker being selected from a combination of at least two of ORM1, ORM2, CD74, FBLN5, RNASE1, ITIH3, SPINK5, B3GNT2, CEA and CA199. Further, the research and development team has optimized the biomarkers used in the reagent to achieve a better technical effect in predicting whether an individual has colorectal cancer.
Specifically, in a further aspect, the invention provides a product for predicting whether an individual has colorectal cancer, the product comprising a kit or chip that contains the biomarker for the use described above. In some embodiments, the biomarkers can be used to prepare detection reagents for the detection targets, such as sample pretreatment reagents and biological reagents and kits suitable for detecting the biomarkers (for example antigens or antibodies); standardized reagents or kits suitable for the biomarkers can also be developed. In some embodiments, the detection reagent is an antibody against a biomarker described above, the antibody being a monoclonal antibody. Furthermore, the research and development team has optimized the biomarkers contained in the kit or chip of the product so as to improve the accuracy of product detection.
In another aspect, the invention provides a method of constructing a model for predicting whether an individual has colorectal cancer, the method comprising:
(1) Data acquisition: setting a model group and acquiring the concentrations of biomarkers in the serum of the model group samples; wherein the model group comprises colorectal cancer samples and healthy control samples, and the detected biomarkers are a combination of at least two selected from ORM1, ORM2, CD74, FBLN5, RNASE1, ITIH3, SPINK5, B3GNT2, CEA and CA199;
(2) Model construction, comprising the following steps:
S201, taking the biomarker concentrations of the samples in the model group as the original training data set, dividing the original training data set into K subsets according to a K-fold cross-validation mechanism, selecting one subset as the validation set Ddev, and combining the unselected subsets to form the training data pool Dtrain;
S202, selecting the generalized linear model (glmnet) algorithm for constructing the prediction model, setting the grid search range for the hyperparameter optimization of the algorithm, and determining the parameters for constructing the prediction model;
S203, based on the training data pool Dtrain obtained in S201, constructing the prediction model using the algorithm and hyperparameters selected in S202.
Furthermore, the research and development team optimized the composition of the biomarkers used in the model construction method, and evaluated the model on the validation set Ddev to obtain an AUC value that can serve as the final performance evaluation value of the model.
In another aspect, the invention provides a system for predicting whether an individual has colorectal cancer, the system comprising:
a data acquisition module, used for acquiring the concentrations of biomarkers in the serum of the model group samples, wherein the detected biomarkers are a combination of at least two selected from ORM1, ORM2, CD74, FBLN5, RNASE1, ITIH3, SPINK5, B3GNT2, CEA and CA199;
a model construction module, which builds the model by the following steps:
S001, taking the biomarker concentrations of the samples in the model group as the original training data set, dividing the original training data set into K subsets according to a K-fold cross-validation mechanism, selecting one subset as the validation set Ddev, and combining the unselected subsets to form the training data pool Dtrain;
S002, selecting the generalized linear model (glmnet) algorithm for constructing the prediction model, setting the grid search range for the hyperparameter optimization of the algorithm, and determining the parameters for constructing the prediction model;
S003, based on the training data pool Dtrain obtained in step S001, constructing the prediction model using the algorithm and hyperparameters selected in step S002;
a prediction module, which predicts an individual using the model built by the model construction module.
In another aspect, the present invention discloses a computer-readable storage medium having a computer program stored thereon; the computer program, when executed by a processor, implements the above-described method of constructing a model for predicting whether an individual has colorectal cancer.
Optionally, the storage medium includes various media that can store program code, such as ROM, RAM, a magnetic disk, or an optical disk.
In another aspect, the invention discloses an information data processing terminal used for implementing the above method of constructing a model for predicting whether an individual has colorectal cancer.
Optionally, the information data processing terminal includes a processor and a memory; the memory may include RAM, and may also include non-volatile memory (NVRAM), such as at least one disk memory. The processor may be a general-purpose processor, including a CPU, a network processor (NP), etc.; it may also be a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or another programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
Optionally, the information data processing terminal includes a processor, a memory, and a communicator.
Using proteomics, the invention screens out the biomarkers ORM2, CD74, FBLN5, RNASE1, ITIH3, SPINK5 and B3GNT2, each of which can be used independently for early prediction of the risk of colorectal cancer. It further provides a biomarker for predicting whether an individual has colorectal cancer, being a combination of at least two selected from ORM1, ORM2, CD74, FBLN5, RNASE1, ITIH3, SPINK5, B3GNT2, CEA and CA199, together with a product, a model and a system comprising the biomarker, as well as a computer-readable storage medium and an information data processing terminal. These allow convenient, non-invasive and efficient prediction of whether an individual has colorectal cancer in clinical practice.
Drawings
FIG. 1 is a graph of Wilcoxon results for the healthy control and colorectal cancer groups of example 1;
FIG. 2 is a graph of the results of ROC and OPLS-DA analyses of the healthy control and colorectal cancer groups of example 1;
FIG. 3 is a graph of AUC results of a model constructed under different combinations of super parameters of the glmnet algorithm in example 3;
FIG. 4 is a ROC curve of the colorectal cancer joint diagnosis model constructed in example 3 in the model group;
FIG. 5 is a ROC curve of the colorectal cancer combined diagnostic model constructed in example 3 in the test group;
FIG. 6 is a graph showing the results of performance evaluation of the colorectal cancer joint diagnosis model constructed in example 3 in a test group;
FIG. 7 is a comparison of the area under the ROC curve of diagnostic models constructed from different protein combination biomarkers in example 4;
FIG. 8 is a comparison of the area under the ROC curve of the colorectal cancer diagnostic model (10 MP) of example 4 with conventional markers and combinations thereof;
FIG. 9 shows the system for predicting whether an individual has colorectal cancer described in example 5.
Note that the "Log-transformed corrected P value" shown in the drawings denotes the -log10 adjusted P value, and the "generalized linear model hyper-parameters" shown denote the glmnet model hyperparameters.
Detailed Description
(1) Diagnosis or detection
Diagnosis or detection herein refers to detecting or assaying a biomarker in a sample, or the level of the biomarker of interest (absolute or relative), and then indicating, from the presence or amount of that biomarker, whether the individual who provided the sample is likely to have a disease, or how likely. The terms "diagnosis" and "detection" are used interchangeably herein. The result of such detection or diagnosis is not a direct confirmation of the disease but an intermediate result; to obtain a direct result, the disease must also be confirmed by other auxiliary means such as pathology or anatomy. For example, the present invention provides a number of novel biomarkers associated with colorectal cancer, and changes in the levels of these markers are directly related to whether colorectal cancer is present.
(2) Association of markers or biomarkers with colorectal cancer
Markers and biomarkers have the same meaning in the present invention. Association here means that the presence of a biomarker in a sample, or a change in its amount, is directly correlated with a particular disease; for example, a relative increase or decrease in the amount indicates a higher likelihood of the disease than in a healthy person.
If multiple different markers are present in the sample at the same time, or their amounts change relative to normal, this likewise indicates a higher likelihood of the disease than in healthy persons. Among the marker categories, some markers are strongly associated with a disease, some are weakly associated, and some are not associated with a particular disease at all. One or more strongly associated markers can be used as markers for diagnosing the disease, and weakly associated markers can be combined with strongly associated markers to diagnose a disease, thereby improving the accuracy of the detection results.
The numerous biomarkers found in serum in the present invention can be used to distinguish colorectal cancer patients from healthy persons. A marker herein may be used alone for direct detection or diagnosis; selecting such a marker indicates that a relative change in its content is strongly correlated with colorectal cancer. Of course, one or more markers strongly associated with colorectal cancer may be detected simultaneously. In some embodiments, detection or diagnosis with strongly correlated biomarkers can reach a given standard of accuracy, such as 60%, 65%, 70%, 80%, 85%, 90% or 95%; the values obtained with these markers are intermediate values for diagnosing a disease and do not directly confirm it.
Of course, a differential protein with a larger ROC value (AUC) may also be selected as a diagnostic marker. Whether an association is strong or weak is typically determined by some algorithm, such as analysis of the contribution rate or weight of the marker with respect to colorectal cancer. Such calculation methods include significance analysis (p values or FDR values) and fold change; multivariate statistical analysis, which mainly comprises principal component analysis (PCA), partial least squares discriminant analysis (PLS-DA) and orthogonal partial least squares discriminant analysis (OPLS-DA); and other methods such as ROC analysis. Other model-based prediction methods are also possible. When specifically selecting biomarkers, the differential proteins disclosed herein may be selected, or they may be combined with other known markers and used for prediction by a model-based method.
Detailed Description
The invention is described in further detail below with reference to the drawings and examples. It should be noted that the examples described below are intended to facilitate an understanding of the invention and do not limit it in any way. The reagents used in the examples are all known, commercially available products.
Example 1: Screening of colorectal cancer biomarkers using proteomics
1.1 Collection of samples
From July to December 2021, the study enrolled 50 colorectal cancer patients and 50 healthy controls; all enrolled subjects signed informed consent. All colorectal cancer patients were confirmed by biopsy pathology, and the healthy controls had normal physical examinations. Inclusion criteria for colorectal cancer patients: (a) no history of other malignancy; (b) surgical treatment within one month after blood collection, with post-operative pathology confirming colorectal cancer. Healthy controls were selected from the physical examination center; gastrointestinal examination confirmed that they had no gastrointestinal lesions, physical examination found no other serious diseases, and they were matched to the cases by age and sex. After informed consent was obtained, all collected serum samples were stored in a serum bank at -80 °C.
1.2 Sample processing and enzymatic digestion
First, the plasma samples were centrifuged for 15 minutes (15,000 × g), and the supernatant was collected and filtered, followed by immunoaffinity chromatography to remove 14 high-abundance proteins. The samples were then concentrated by centrifugation (4,000 × g, 1 hour) in a concentration tube with a molecular weight cut-off of 3 kDa. The concentrate was recovered and subjected to buffer exchange by centrifugation (1,000 × g, 2 minutes) on a desalting column with a molecular weight cut-off of 7 kDa, the exchange buffer being AEX-A (20 mM Tris, 4 M urea, 3% isopropanol, pH 8.0). Protein concentration in the samples was determined by the BCA method with AEX-A as the blank. According to the sample grouping in Table 1, TCEP was added to the samples and the proteins were reduced by incubation at 37 °C for 30 minutes. The corresponding 6-plex TMT reagent was then added and the samples were incubated at room temperature for 1 hour in the dark for TMT labelling. The samples were then buffer-exchanged on a Zeba column, the exchange buffer being AEX-A. After the 6-plex TMT-labelled samples were mixed, 2 mL of AEX-A was added to the mixture to a final volume of 5.5 mL. The samples were filtered through a 0.22 μm filter, and the 6-plex TMT-labelled samples were separated on a 2D-HPLC system. The collected fractions were freeze-dried; finally, a Trypsin/Lys-C enzyme mix was added, the samples were incubated at 37 °C for 5 hours for digestion, and 5 μL of 10% TFA was added to stop the digestion. A total of 60 digested 2D-HPLC fractions were used for nano-LC-MS/MS analysis.
Table 1: proteomics study sample grouping
1.3 LC-MS/MS data acquisition and search analysis
The LC-MS/MS system consisted of an EASY-nLC 1200 and a Q Exactive HF-X. Mobile phase A was an aqueous solution containing 0.1% formic acid and 2% acetonitrile; mobile phase B was an aqueous solution containing 0.1% formic acid and 80% acetonitrile. The in-house packed analytical column was 20 cm long and packed with ReproSil-Pur C18 1.9 μm particles (Dr. Maisch GmbH). 1 μg of peptides was dissolved in mobile phase A and separated on the EASY-nLC 1200 ultra-high-performance liquid chromatography system. The gradient was: 0-26 min, 7%-22% B; 26-34 min, 22%-32% B; 34-37 min, 32%-80% B; 37-40 min, 80% B; the flow rate was kept at 450 nL/min.
The peptides separated by the liquid chromatography system were injected into the NanoFlex ion source for ionization and then analyzed on the Q Exactive HF-X mass spectrometer. The ion source voltage was set to 2.1 kV; the MS1 scan range was 400-1200 m/z at a resolution of 60,000; the MS2 scan range started at 100 m/z at a resolution of 15,000. In data-dependent acquisition (DDA) mode, the top 20 precursor ions were sequentially fragmented in the HCD collision cell and then analyzed by MS2. The automatic gain control (AGC) was set to 5E4, the signal threshold to 1E4, and the maximum injection time to 22 ms. To avoid repeated scanning of high-abundance peptides, the dynamic exclusion time for tandem mass spectrometry was set to 30 seconds.
Mass spectrometry data obtained by LC-MS/MS were searched using MaxQuant (v1.6.15.0). The data type was TMT proteomic data quantified from MS2 reporter ions, and MS2 spectra used for quantification were required to have a precursor ion fraction greater than 75% in the MS1 spectrum. The database was the UniProt database homo_sapiens_9606_protein (release: 2021-10-14; sequences: 20614); a common contaminant library was added to the database, and contaminant proteins were removed during data analysis. The enzyme specificity was set to Trypsin/P; the number of missed cleavages was set to 2; the precursor mass tolerance for the first search and the main search was set to 20 ppm and 5 ppm, respectively, and the fragment ion mass tolerance was set to 20 ppm. The fixed modification was cysteine alkylation; the variable modifications were methionine oxidation and protein N-terminal acetylation. The FDR was set to 1% for both protein identification and PSM identification.
1.4 Sample grouping by orthogonal partial least squares discriminant analysis, combined with significance analysis, to screen differential proteins
Differential proteins were screened by combining univariate analysis and multivariate statistical analysis. Univariate analysis mainly comprises significance analysis of the features between groups (p values or FDR values) and fold change; multivariate statistical analysis mainly comprises principal component analysis (PCA), partial least squares discriminant analysis (PLS-DA) and orthogonal partial least squares discriminant analysis (OPLS-DA).
In total, 581 proteins were found, including some newly discovered markers associated with colorectal cancer and some markers already known and confirmed to be associated with colorectal cancer (e.g., carcinoembryonic antigen (CEA) and carbohydrate antigen 19-9 (CA199)).
Among the 581 proteins, those with significant differences in content were identified through analysis. All statistical analyses were performed in R; the specific R-related information is shown in Table 2.
Table 2: R and related information used in the present invention
The variable importance for the projection (VIP) was calculated to measure how strongly the expression pattern of each protein influences and explains the classification of the sample groups, and a Wilcoxon rank-sum test was then performed to obtain corrected p values (FDR). The Wilcoxon test showed that, among the 581 proteins, 90 were significantly decreased and 53 were significantly increased in the serum of colorectal cancer patients (see FIG. 1 for details).
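As a non-limiting illustration of how corrected p values of this kind may be obtained, the sketch below applies the Benjamini-Hochberg procedure. The text does not state which FDR correction was used, so the choice of Benjamini-Hochberg is an assumption, and the p values shown are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p values in the input order."""
    m = len(p_values)
    # Indices of the p values sorted in ascending order.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down so the adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical raw p values, for illustration only (not data from the study).
raw_p = [0.001, 0.008, 0.039, 0.041, 0.60]
adj_p = benjamini_hochberg(raw_p)
```

Each adjusted value is the raw p value scaled by m/rank, then capped by the adjusted value of the next larger p value.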
The results of the ROC and OPLS-DA analyses are shown in FIG. 2: the abscissa is the AUC obtained by ROC analysis, the ordinate is the VIP value obtained by OPLS-DA analysis, the size of each dot represents the p value calculated by the Wilcoxon test, and the color of each dot represents the significance rating of the VIP value.
The screening criteria for differential proteins were: (1) when FC > 1.2 and adj.P.Val < 0.01, the protein is a significantly different down-regulated protein; (2) when FC < 0.83 and adj.P.Val < 0.01, the protein is a significantly different up-regulated protein. Based on these criteria, a total of 8 more significant differential proteins were found as biomarkers: orosomucoid 1 (ORM1), orosomucoid 2 (ORM2), CD74 molecule (CD74), fibulin 5 (FBLN5), ribonuclease A family member 1 (RNASE1), inter-alpha-trypsin inhibitor heavy chain 3 (ITIH3), serine peptidase inhibitor Kazal type 5 (SPINK5) and beta-1,3-N-acetylglucosaminyltransferase 2 (B3GNT2).
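The screening criteria above can be sketched as a simple filter. This is a minimal illustration only: the function name and the example values are hypothetical, and the mapping of FC > 1.2 to "down-regulated" follows the wording of the criteria, which depends on which group forms the numerator of FC (not stated in the text).

```python
def classify_protein(fc, adj_p, fc_hi=1.2, fc_lo=0.83, alpha=0.01):
    """Label one protein using the fold-change and adjusted-p thresholds."""
    if adj_p >= alpha:
        return "not significant"
    if fc > fc_hi:
        return "down-regulated"  # criterion (1), as worded in the text
    if fc < fc_lo:
        return "up-regulated"    # criterion (2), as worded in the text
    return "not significant"


# Hypothetical (FC, adj.P.Val) pairs, for illustration only.
proteins = {
    "P_A": (1.50, 0.002),
    "P_B": (0.70, 0.004),
    "P_C": (1.05, 0.001),
    "P_D": (1.40, 0.200),
}
screen = {name: classify_protein(fc, p) for name, (fc, p) in proteins.items()}
```

A protein must pass both the significance threshold and one of the fold-change thresholds to be retained.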
The 8 differential proteins found in the present invention, which are predominantly significantly up-regulated, are shown in Table 3:
Table 3: Up-regulated markers that differ between colorectal cancer and normal healthy samples
Larger LogFC values and/or smaller adj.P.Val values in Table 3 indicate, to some extent, a more pronounced difference between the two groups, and also suggest that the differential protein may have higher diagnostic value.
As can be seen from Table 3, among the 1256 serum differential substances between colorectal cancer patients and normal healthy controls, 8 differential proteins differed more significantly between the colorectal cancer group and the non-colorectal cancer group and were used as markers for efficient prediction of colorectal cancer: orosomucoid 1 (ORM1), orosomucoid 2 (ORM2), CD74 molecule (CD74), fibulin 5 (FBLN5), ribonuclease A family member 1 (RNASE1), inter-alpha-trypsin inhibitor heavy chain 3 (ITIH3), serine peptidase inhibitor Kazal type 5 (SPINK5) and beta-1,3-N-acetylglucosaminyltransferase 2 (B3GNT2). Among these, the most significant difference between colorectal cancer and healthy samples was found for orosomucoid 2 (ORM2), followed by CD74 molecule (CD74), fibulin 5 (FBLN5), serine peptidase inhibitor Kazal type 5 (SPINK5), orosomucoid 1 (ORM1), ribonuclease A family member 1 (RNASE1), beta-1,3-N-acetylglucosaminyltransferase 2 (B3GNT2) and inter-alpha-trypsin inhibitor heavy chain 3 (ITIH3).
Example 2: single biomarker prediction of colorectal cancer
This example demonstrates that a single biomarker screened in example 1 can be used to distinguish colorectal cancer from non-colorectal cancer, to screen colorectal cancer patients from a population, or to predict whether an individual has colorectal cancer.
Specifically, in this example, ROC curves were established for each of the 8 proteins obtained in example 1; the results are shown in Table 4. The quality of each result is judged by the area under the curve (AUC): an AUC of 0.5 indicates that a single protein has no diagnostic value; an AUC greater than 0.5 indicates that it has diagnostic value; and the greater the AUC, the higher the diagnostic value.
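For illustration, the AUC of a single marker can be computed as the probability that a randomly chosen case has a higher marker value than a randomly chosen control, with ties counted as one half. The sketch below is a minimal example under that definition; the concentrations are hypothetical and are not data from the study.

```python
def auc(scores_pos, scores_neg):
    """AUC as the fraction of case/control pairs ranked correctly (ties count 0.5)."""
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


# Hypothetical marker concentrations, for illustration only.
cases = [5.1, 6.3, 4.8, 7.0]
controls = [3.9, 4.8, 4.1, 5.0]
single_marker_auc = auc(cases, controls)
```

For a marker that is lower in cases than in controls, this definition gives a value below 0.5; the marker is then equally informative once its direction is reversed.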
Table 4: ROC analysis results and related information for the differential proteins between colorectal cancer and normal healthy samples
It is noted that the association between a change in biomarker concentration and the presence or absence of colorectal cancer can be judged from the AUC values, sensitivity, specificity, etc. in the table, of which the AUC value is the most intuitive. The higher the AUC value, the more accurately the biomarker distinguishes the colorectal cancer population from the non-colorectal cancer population.
From Table 4 it can be seen that changes in the concentrations of the above 8 biomarkers are clearly associated with the presence of colorectal cancer. When any one of these biomarkers is used alone to distinguish the colorectal cancer population from the non-colorectal cancer population, the AUC reaches 0.7 or more, indicating high accuracy. The CD74 molecule (CD74) shows the strongest association, with an AUC of 0.838, followed by fibulin 5 (FBLN5) with an AUC of 0.816 and ribonuclease A family member 1 (RNASE1) with an AUC of 0.801, and then in turn by beta-1,3-N-acetylglucosaminyltransferase 2 (B3GNT2), inter-alpha-trypsin inhibitor heavy chain 3 (ITIH3), serine peptidase inhibitor Kazal type 5 (SPINK5), orosomucoid 2 (ORM2) and orosomucoid 1 (ORM1).
Example 3: 10-protein combination biomarker for predicting whether an individual has colorectal cancer
The colorectal cancer differential biomarkers can be used not only individually as candidate biomarkers for differentiating colorectal cancer from health, but also in combinations of several for the auxiliary diagnosis of colorectal cancer. In general, a single biomarker can be used to distinguish serum samples of colorectal cancer from those of non-colorectal cancer or to predict colorectal cancer, and the accuracy of the distinction or prediction is generally greater when multiple biomarkers are combined.
It is noted that a single biomarker that predicts colorectal cancer with greater accuracy does not necessarily contribute more when combined with other biomarker(s); nor does using a greater number of biomarkers necessarily give a combination with greater predictive accuracy (AUC value). Therefore, in order to obtain a combined biomarker with better prediction accuracy, the research team of the invention performed a large number of verification experiments.
This example describes a model for predicting colorectal cancer constructed from 10 protein markers (10 MP): the 8 protein markers orosomucoid 1 (ORM1), orosomucoid 2 (ORM2), CD74 molecule (CD74), fibulin 5 (FBLN5), ribonuclease A family member 1 (RNASE1), inter-alpha-trypsin inhibitor heavy chain 3 (ITIH3), serine peptidase inhibitor Kazal type 5 (SPINK5) and beta-1,3-N-acetylglucosaminyltransferase 2 (B3GNT2), and the 2 conventional markers carcinoembryonic antigen (CEA) and carbohydrate antigen 19-9 (CA199).
3.1 Data acquisition
Study population: from July to December 2021, 300 colorectal cancer patients and 650 healthy controls were enrolled, and all enrolled subjects signed informed consent. All colorectal cancer patients were confirmed by biopsy pathology, and the healthy controls had normal physical examinations (individuals with or without nodules, or individuals without colorectal cancer). The enrolled subjects were divided at a ratio of 7:3 into a model group (colorectal cancer n=210, healthy control n=450) and a test group (colorectal cancer n=90, healthy control n=200). The data information is shown in Table 5:
Table 5: Modeling sample information
Inclusion criteria for colorectal cancer patients: (a) no history of other malignancy; (b) surgical treatment within one month after blood collection, with post-operative pathology confirming colorectal cancer. Healthy controls were selected from the physical examination center; gastrointestinal examination confirmed that they had no gastrointestinal lesions, physical examination found no other serious diseases, and they were matched to the cases by age and sex. After informed consent was obtained, all collected serum samples were stored in a serum bank at -80 °C.
In this example, ELISA was performed on the collected serum samples to obtain the concentrations of the 10 protein markers: orosomucoid 1 (ORM1), orosomucoid 2 (ORM2), CD74 molecule (CD74), fibulin 5 (FBLN5), ribonuclease A family member 1 (RNASE1), inter-alpha-trypsin inhibitor heavy chain 3 (ITIH3), serine peptidase inhibitor Kazal type 5 (SPINK5), beta-1,3-N-acetylglucosaminyltransferase 2 (B3GNT2), carcinoembryonic antigen (CEA) and carbohydrate antigen 19-9 (CA199).
3.2 Statistical analysis of experimental data
The Shapiro-Wilk test was used to assess normality, and the non-parametric Wilcoxon test was used to analyze differences in blood marker concentrations between colorectal cancer patients and healthy controls in the model group and the test group, respectively.
In the model group, a combined diagnostic model of the 10 colorectal cancer markers was constructed using a combination of machine learning methods. The predicted probability values were used to estimate the area under the receiver operating characteristic (ROC) curve (AUC) with 95% confidence intervals (CI), to assess the discriminatory power of the multivariate diagnostic model.
Using the test set, the Youden index (YI) was calculated to determine the predicted-probability cut-off for distinguishing colorectal cancer patients from normal controls. In addition, ROC curves of individual markers and of different subgroups were constructed and compared. Standard descriptive statistics, such as frequency, mean, median, positive predictive value (PPV), negative predictive value (NPV) and standard deviation (SD), were calculated to describe the experimental results for the study population. Statistical analysis was performed using R 3.6.1, and p values less than 0.05 were considered statistically significant.
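As a minimal sketch of how a cut-off may be chosen with the Youden index (sensitivity + specificity - 1), the example below scans every observed predicted probability as a candidate cut-off and keeps the one with the largest index. The function name and the data are hypothetical; a sample is called positive when its predicted value is greater than the cut-off, matching the decision rule used later in this example.

```python
def youden_cutoff(probs, labels):
    """Return (cutoff, sensitivity, specificity) maximising the Youden index.

    labels: 1 = colorectal cancer, 0 = control; positive means prob > cutoff.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    best = (None, 0.0, 0.0, -1.0)
    for c in sorted(set(probs)):
        tp = sum(1 for p, y in zip(probs, labels) if y == 1 and p > c)
        tn = sum(1 for p, y in zip(probs, labels) if y == 0 and p <= c)
        sens, spec = tp / n_pos, tn / n_neg
        j = sens + spec - 1.0
        if j > best[3]:  # keep the first cut-off reaching the maximum
            best = (c, sens, spec, j)
    return best[:3]


# Hypothetical predicted probabilities and true labels, for illustration only.
probs = [0.10, 0.20, 0.35, 0.40, 0.55, 0.60, 0.80, 0.90]
labels = [0, 0, 0, 1, 0, 1, 1, 1]
cutoff, sens, spec = youden_cutoff(probs, labels)
```

When several cut-offs give the same Youden index, this sketch keeps the smallest one; other tie-breaking rules are equally defensible.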
3.3 Construction of the colorectal cancer diagnostic model
This example illustrates the construction of a colorectal cancer diagnostic model, taking the biomarker consisting of the 10-protein combination (10 MP) as an example.
S201, the concentration matrix of the 10 protein markers of the samples in the model group, namely orosomucoid 1 (ORM1), orosomucoid 2 (ORM2), CD74 molecule (CD74), fibulin 5 (FBLN5), ribonuclease A family member 1 (RNASE1), inter-alpha-trypsin inhibitor heavy chain 3 (ITIH3), serine peptidase inhibitor Kazal type 5 (SPINK5), beta-1,3-N-acetylglucosaminyltransferase 2 (B3GNT2), carcinoembryonic antigen (CEA) and carbohydrate antigen 19-9 (CA199), is taken as the original training data set.
S201, the original training data set is divided into K subsets according to a K-fold cross-validation mechanism. To ensure that the proportion of majority-class and minority-class samples in each fold is the same as in the original data set, a stratified K-fold cross-validation (Stratified K-Folds cross validation) mechanism is used to divide the data. From the K training data subsets obtained, one subset is selected as the validation set Ddev, and the unselected subsets are combined to form the training data pool Dtrain.
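The stratified division in step S201 can be sketched as follows. This is an illustrative implementation only, not the code used in the invention: the function names and the random seed are assumptions, K = 10 is taken from the 10-fold cross-validation mentioned later in this example, and the class sizes follow the model group (210 cases, 450 controls).

```python
import random


def stratified_k_folds(labels, k, seed=0):
    """Split sample indices into k folds that preserve the class ratio."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        # Deal the indices of this class round-robin across the folds.
        for pos, i in enumerate(idx):
            folds[pos % k].append(i)
    return folds


def train_dev_splits(labels, k, seed=0):
    """Yield (Dtrain, Ddev) index lists, using each fold once as Ddev."""
    folds = stratified_k_folds(labels, k, seed)
    for d in range(k):
        d_dev = folds[d]
        d_train = [i for f, fold in enumerate(folds) if f != d for i in fold]
        yield d_train, d_dev


# 1 = colorectal cancer, 0 = healthy control (model group sizes from the text).
labels = [1] * 210 + [0] * 450
splits = list(train_dev_splits(labels, k=10))
```

With these class sizes each validation fold holds 21 cases and 45 controls, so every fold keeps the 210:450 ratio of the full model group.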
S202, the generalized linear model (glmnet) algorithm is selected for constructing the prediction model, and the grid search range for the hyperparameter optimization of the algorithm is set. In this step, the grid search range for the hyperparameter optimization of the model is set for each algorithm as shown in Table 6.
Table 6: parameter grid search range of glmnet algorithm
S203, according to the algorithm and hyperparameter ranges set in step S202, one hyperparameter combination is selected as the parameters for constructing the prediction model, and the prediction model is constructed on the training data pool Dtrain obtained in step S201 using the selected supervised classification algorithm and hyperparameters.
In addition, the construction step of this embodiment further includes:
S204, the prediction model obtained in step S203 is evaluated on the validation set Ddev to obtain an AUC value, and the current prediction model and its AUC value are stored in the prediction model pool for later selection of the base prediction model. The evaluation metric in this step may be the AUC value or another reasonable index of model performance.
S205, judging whether every subset has been used for validation. If all subsets have served as the validation set and training is complete, proceed to step S206; if any subset has not yet served as the validation set, return to step S201 and select that subset as the validation set Ddev. In this way every sample in the original data set is used for validation, which improves the stability of the model and prevents it from overfitting to a particular subset.
S206, taking the mean AUC of all models in the prediction model pool as the final performance evaluation value of the model for the current combination, and storing the model parameters and the final AUC in the optimal model pool.
S207, judging whether prediction models have been constructed for all hyperparameter combinations, that is, whether every algorithm and hyperparameter combination obtained in step S202 has been used to construct a prediction model. If all combinations have been completed, proceed to step S208; if any combination remains, return to step S203.
S208, from the optimal model pool obtained after the iteration of step S207, selecting for each algorithm the prediction model with the highest AUC value and storing it in the candidate prediction model set M.set for colorectal cancer diagnosis.
S209, selecting the model with the largest AUC value from the model set M.set obtained in step S208 as the final prediction model for colorectal cancer diagnosis.
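The loop over steps S203 to S209 can be sketched as a grid search with K-fold cross-validation. The sketch below is illustrative only. `fit_toy` is a hypothetical stand-in for a glmnet fit (it merely shrinks class-mean differences in an elastic-net-like way and is not the glmnet algorithm), and the data and grid values are synthetic assumptions rather than the contents of Table 6.

```python
import random


def auc(scores, labels):
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def stratified_folds(labels, k, seed=0):
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for cls in (0, 1):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        for pos, i in enumerate(idx):
            folds[pos % k].append(i)
    return folds


def fit_toy(X_train, y_train, alpha, lam):
    """Hypothetical stand-in for a glmnet fit; returns a scoring function."""
    n1 = sum(y_train)
    n0 = len(y_train) - n1
    weights = []
    for j in range(len(X_train[0])):
        mean1 = sum(x[j] for x, y in zip(X_train, y_train) if y == 1) / n1
        mean0 = sum(x[j] for x, y in zip(X_train, y_train) if y == 0) / n0
        d = mean1 - mean0
        shrunk = max(abs(d) - alpha * lam, 0.0) / (1.0 + (1.0 - alpha) * lam)
        weights.append(shrunk if d >= 0 else -shrunk)
    return lambda x: sum(w * v for w, v in zip(weights, x))


def grid_search_cv(X, y, grid, fit, k=10):
    """Return (best_params, best_mean_auc, pool), following steps S203 to S209."""
    folds = stratified_folds(y, k)
    pool = []                                   # optimal model pool (S206)
    for params in grid:                         # S207: every combination
        fold_aucs = []                          # prediction model pool (S204)
        for d in range(k):                      # S205: each fold is Ddev once
            dev = folds[d]
            train = [i for f in range(k) if f != d for i in folds[f]]
            model = fit([X[i] for i in train], [y[i] for i in train], **params)
            fold_aucs.append(auc([model(X[i]) for i in dev], [y[i] for i in dev]))
        pool.append((params, sum(fold_aucs) / k))
    best_params, best_auc = max(pool, key=lambda t: t[1])   # S208 and S209
    return best_params, best_auc, pool


# Synthetic two-marker data, for illustration only.
rng = random.Random(1)
y = [1] * 40 + [0] * 60
X = [[rng.gauss(1.0 if yi == 1 else 0.0, 1.0), rng.gauss(0.0, 1.0)] for yi in y]
grid = [{"alpha": a, "lam": lam} for a in (0.1, 0.55, 1.0) for lam in (0.01, 0.1, 1.0)]
best_params, best_auc, pool = grid_search_cv(X, y, grid, fit_toy, k=5)
```

Because only one algorithm is used here, steps S208 and S209 collapse into a single selection of the combination with the highest mean AUC.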
3.4 Colorectal cancer diagnostic model (10 MP) parameter optimization
By performing the model construction steps described above, we obtained models built under 9 different glmnet hyperparameter combinations (FIG. 3), and model performance was assessed by AUC values. As shown in Table 7 and FIG. 3, the AUC reached its maximum of 0.897 when the glmnet hyperparameters were alpha = 0.55 and lambda = 0.0551 (the AUC was calculated by 10-fold cross-validation during modeling).
Table 7: AUC of model constructed under different hyper-parameter combinations of glmnet algorithm
Therefore, the equation of the model constructed in this embodiment from the 10-protein biomarker combination under the optimal hyperparameter combination is:
where Y is a predicted value, i denotes the i-th biomarker, m denotes the number of biomarkers (m=10), xi denotes the detection value of the i-th biomarker (μg/mL), ki denotes the coefficient of the i-th biomarker (table 8), and b is a constant 2.28584755043089.
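The equation itself is not reproduced in this text, so its exact form cannot be confirmed here. As a minimal sketch, assuming the model combines the detection values linearly with the coefficients ki and the constant b, the computation could look like the following. The logistic transform is a second assumption (glmnet is commonly used for logistic regression); whether Y is the linear score or its logistic transform is not stated. Only the value of b comes from the text; the coefficients and concentrations are hypothetical placeholders, not the values of table 8.

```python
import math

B = 2.28584755043089  # constant b as given in the text

def linear_score(x, k, b=B):
    """b + sum_i k_i * x_i for m biomarkers (assumed linear form)."""
    if len(x) != len(k):
        raise ValueError("x and k must have one entry per biomarker")
    return b + sum(ki * xi for ki, xi in zip(k, x))

def logistic(z):
    """Map a linear score to the (0, 1) interval (assumed, not confirmed by the text)."""
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical placeholder inputs for m = 10 biomarkers.
k_example = [-0.1] * 10   # NOT the coefficients of table 8
x_example = [1.0] * 10    # detection values in ug/mL, made up for illustration

score = linear_score(x_example, k_example)
probability = logistic(score)
```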
Table 8: coefficients of 10 biomarkers in model
3.5 Determination of the diagnostic threshold of the colorectal cancer diagnostic model (10 MP)
The ROC curve was plotted using the predicted values in the model group, and the optimal diagnostic cutoff was set to 0.472 based on the Youden index. That is, when the predicted value of the diagnostic model is ≤ 0.472, the subject is not considered to be a colorectal cancer patient; when the predicted value is > 0.472, the subject is considered to be a colorectal cancer patient. The results are shown in FIG. 4: in the model group, the AUC of the model was 0.886, the sensitivity was 90.6% and the specificity was 83.3%.
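Choosing a cutoff by the Youden index means picking the threshold that maximizes sensitivity + specificity - 1. The sketch below illustrates this with the same decision rule as above (positive when the predicted value is greater than the cutoff). The function name `youden_cutoff` and the toy scores and labels are hypothetical and are not data from the patent.

```python
def youden_cutoff(scores, labels):
    """Return (cutoff, J) maximizing J = sensitivity + specificity - 1.

    A sample is called positive when its score is strictly greater than the
    cutoff, matching the rule "predicted value > cutoff means patient".
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    best = None
    for t in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s > t)
        tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s <= t)
        j = tp / n_pos + tn / n_neg - 1
        if best is None or j > best[1]:  # the first maximum wins on ties
            best = (t, j)
    return best

# Toy predicted values and true labels (hypothetical; 1 = patient, 0 = control).
toy_scores = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.9]
toy_labels = [0, 0, 0, 1, 0, 1, 1]

cutoff, j_index = youden_cutoff(toy_scores, toy_labels)
```

Here the candidate cutoffs are the observed scores themselves; other conventions, such as midpoints between adjacent scores, give the same classification of the observed samples.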
Colorectal cancer diagnostic model (10 MP) validation
ROC curves were plotted using the predicted values in the test set; as shown in fig. 5, the AUC was 0.827. The optimal diagnostic cutoff was set to 0.465 based on the Youden index. That is, when the predicted value of the diagnostic model is ≤ 0.465, the subject is not considered to be a colorectal cancer patient; when the predicted value is > 0.465, the subject is considered to be a colorectal cancer patient. The results are shown in FIG. 6: in the test group, the accuracy of the model was 76.1%, the kappa value was 0.457, the sensitivity was 59.4%, the specificity was 85.0%, the positive predictive value was 67.9%, and the negative predictive value was 79.7%.
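The reported statistics all derive from a 2x2 confusion matrix. The sketch below shows the arithmetic. The counts used (19 true positives, 13 false negatives, 9 false positives, 51 true negatives) are not given in the source; they are an assumption, chosen because they are one set of counts that reproduces the reported percentages. The function name `diagnostic_metrics` is likewise hypothetical.

```python
def diagnostic_metrics(tp, fn, fp, tn):
    """Standard diagnostic statistics from a 2x2 confusion matrix."""
    n = tp + fn + fp + tn
    accuracy = (tp + tn) / n
    # Agreement expected by chance, used by Cohen's kappa.
    expected = ((tp + fn) * (tp + fp) + (tn + fp) * (tn + fn)) / n ** 2
    return {
        "accuracy": accuracy,
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "kappa": (accuracy - expected) / (1 - expected),
    }

# Assumed counts (not stated in the source), consistent with the reported values.
metrics = diagnostic_metrics(tp=19, fn=13, fp=9, tn=51)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```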
Example 4: comparison of colorectal cancer diagnostic models constructed based on biomarkers of different protein combinations
To further analyze the diagnostic value of colorectal cancer diagnostic models constructed based on biomarkers of different protein combinations, diagnostic models constructed based on biomarkers of different protein combinations were compared in the test set in this example. The results are shown in fig. 7 and table 9, with table 10 showing the coefficients of the Max AUC Panel biomarkers in table 9.
Table 9: area under ROC curve comparison of diagnostic model constructed based on different protein combination biomarkers
Table 10: coefficients of biomarkers of Max AUC Panel of diagnostic model constructed from 2MP-10MP biomarkers
In theory, more markers provide more information for disease diagnosis. Modeling is the process of interpreting the role of each marker in disease diagnosis. The model's interpretation of some markers may be biased, which can instead reduce model performance in the test set. It is therefore necessary to optimize the model parameters to strengthen the model's ability to interpret the markers, and to exclude markers that tend to interfere with the model. This requires finding the optimal combination by enumerating the possible combinations.
As can be seen from tables 9, 10 and 7, as the number of proteins contained in the biomarker increases, the average AUC value of the constructed models increases, but the diagnostic value of any particular model is less predictable. For example, in the Max. data of table 9, the AUC value of the constructed model first increases and then decreases as the number of proteins contained in the biomarker increases, whereas in the Min., 1st Qu., Median, Mean and 3rd Qu. data the AUC value of the model also varies with the number of proteins in the biomarker. In addition, table 9 shows that, when the number of proteins contained in the biomarker is the same, different protein combinations yield colorectal cancer diagnostic models of different diagnostic value.
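To give a sense of the size of such a search, the sketch below enumerates every panel of 2 to 10 markers drawn from the ten markers named in this document. Whether the authors enumerated all panels exhaustively is not stated, so this is an assumption made for illustration; the names `MARKERS`, `candidate_panels` and `panels_by_size` are hypothetical.

```python
from itertools import combinations

# The ten markers named in this document.
MARKERS = ["ORM1", "ORM2", "CD74", "FBLN5", "RNASE1",
           "ITIH3", "SPINK5", "B3GNT2", "CEA", "CA 199"]

def candidate_panels(markers, min_size=2):
    """Yield every marker combination from min_size markers up to all of them."""
    for size in range(min_size, len(markers) + 1):
        for panel in combinations(markers, size):
            yield panel

# Count how many candidate panels exist for each panel size (2MP ... 10MP).
panels_by_size = {}
for panel in candidate_panels(MARKERS):
    panels_by_size[len(panel)] = panels_by_size.get(len(panel), 0) + 1

total_panels = sum(panels_by_size.values())
```

Each candidate panel would then be passed through the model construction and evaluation steps, and the panels compared by AUC.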
Furthermore, the performance of the model constructed based on the 10MP biomarker was compared in this example with the traditional markers (CEA and CA 199) and their combinations (2 MP, including CEA and CA 199) in the test group. The results are shown in fig. 8 and table 11:
Table 11: Comparison of the area under the ROC curve of the colorectal cancer diagnostic model (10 MP) with that of the traditional markers and their combination
As can be seen from fig. 8 and table 11, according to the AUC difference significance test, the diagnostic value of the colorectal cancer diagnostic model (10 MP) is significantly (p < 0.05) higher than that of the traditional markers or the traditional marker combination model.
Example 5: system for predicting whether individual is colorectal cancer
This example shows a system for predicting whether an individual is colorectal cancer, as shown in fig. 9, comprising:
a data acquisition module for acquiring the concentration of a biomarker in the serum of a model group sample, wherein the detected biomarker is selected from one of ORM1, ORM2, CD74, FBLN5, RNASE1, ITIH3, SPINK5, B3GNT2, or at least two of ORM1, ORM2, CD74, FBLN5, RNASE1, ITIH3, SPINK5, B3GNT2, CEA, CA 199; wherein the model group comprises colorectal cancer group samples and healthy control samples;
a model construction module, which builds the model by the following steps:
S001, using the biomarker concentrations of the samples in the model group as the original training data set, dividing the original training data set into K subsets according to a K-fold cross-validation mechanism, selecting one subset as the validation set Ddev, and combining the unselected subsets to form the training data pool Dtrain;
S002, selecting the generalized linear model (glmnet) algorithm for constructing the prediction model and the grid search range for the hyperparameter optimization of the algorithm, and determining the parameters for constructing the prediction model;
S003, based on the training data pool Dtrain obtained in step S001, constructing a prediction model using the algorithm and hyperparameters selected in step S002;
a prediction module, which predicts the individual by using the model constructed by the model construction module.
It should be noted that the modules provided in this embodiment are similar to the methods and implementations provided in embodiment 3, and are not described again here for brevity.
It will be appreciated by those of ordinary skill in the art that the division of the system of this embodiment for predicting whether an individual is colorectal cancer into modules is merely a division by logical function; in an actual implementation, the modules may be fully or partially integrated into one physical entity or may be physically separate. These modules may all be implemented as software invoked by a processing element, or all in hardware; alternatively, some modules may be implemented as software invoked by a processing element and others in hardware. In addition, it should be noted that these modules may be fully or partially integrated together in the present embodiment, or may be implemented separately. The processing element here may be an integrated circuit with signal processing capabilities.
In implementing this embodiment, each step of the above method or each of the above modules may be carried out by integrated logic circuits of hardware in a processor element or by instructions in the form of software. For example, the above modules may be one or more integrated circuits configured to implement the prediction model construction method of the present invention, such as one or more application-specific integrated circuits, one or more microprocessors, or one or more field programmable gate arrays. As another example, when the above modules are implemented in the form of program code invoked by a processing element, the processing element may be a general-purpose processor, such as a central processing unit or another processor capable of invoking the program code. As a further example, the modules may be integrated together and implemented in the form of a system-on-chip.
Although the present invention is disclosed above, it is not limited thereto. Those skilled in the art may make various changes and modifications without departing from the spirit and scope of the invention, and the scope of protection of the invention shall therefore be defined by the appended claims.
Claims (12)
1. A method of constructing a model for predicting whether an individual is colorectal cancer, the method comprising:
(1) Data acquisition, setting a model group, and acquiring the concentration of a biomarker in serum of a sample of the model group; wherein the model group comprises colorectal cancer group samples and healthy control samples, and the detected biomarkers are the combination of SPINK5 and ORM1, ORM2, CD74, FBLN5, RNASE1, ITIH3, B3GNT2, CEA and CA 199;
(2) The model construction comprises the following steps:
S201, adopting biomarker concentration of samples in a model group as an original training data set, dividing the original training data set into K subsets according to a K-fold cross validation mechanism, selecting one subset as a validation set Ddev, and combining unselected subsets to form a training data pool Dtrain;
S202, selecting a generalized linear model algorithm for constructing a prediction model and a grid search range in a hyperparameter optimization process of the algorithm, and determining parameters for constructing the prediction model;
S203, based on the training data pool Dtrain obtained in S201, constructing a prediction model by adopting the algorithm and the hyperparameters selected in S202.
2. The method according to claim 1, further comprising S204 of calculating AUC values of the model as final performance evaluation values of the model by ROC method in the validation set Ddev according to the prediction model obtained in S203.
3. The method of constructing a model for predicting whether an individual is colorectal cancer as claimed in claim 2, wherein the equation for constructing the model based on the hyper-parametric combination is:
wherein Y is a predicted value, i represents an i-th biomarker, m represents the number of proteins combined in the biomarker, xi represents a detected value of the i-th protein contained in the biomarker, ki represents a coefficient of the i-th biomarker, and b is a constant.
4. A system for predicting whether an individual is colorectal cancer, the system comprising:
a data acquisition module: obtaining the concentration of a biomarker in serum of a sample of a model group, wherein the biomarker to be detected is a combination of SPINK5 and ORM1, ORM2, CD74, FBLN5, RNASE1, ITIH3, B3GNT2, CEA and CA 199; wherein the model group comprises colorectal cancer group samples and healthy control samples;
a model construction module: the model is built by adopting the following steps:
S001, adopting biomarker concentration of samples in a model group as an original training data set, dividing the original training data set into K subsets according to a K-fold cross validation mechanism, selecting one subset as a validation set Ddev, and combining unselected subsets to form a training data pool Dtrain;
S002, selecting a generalized linear model algorithm for constructing a prediction model and a grid search range in a hyperparameter optimization process of the algorithm, and determining parameters for constructing the prediction model;
S003, constructing a prediction model by adopting the algorithm and the hyperparameters selected in the S002 based on the training data pool Dtrain obtained in the S001;
a prediction module: predicting the individual by using the model constructed by the model construction module.
5. The system for predicting whether an individual is colorectal cancer of claim 4, further comprising S004, calculating AUC values at the validation set Ddev using ROC method as final performance assessment value of the model according to the prediction model obtained in S003.
6. The system for predicting whether an individual is colorectal cancer of claim 5, wherein the equation for constructing the model based on the hyper-parametric combination is:
wherein Y is a predicted value, i represents an i-th biomarker, m represents the number of proteins combined in the biomarker, xi represents a detected value of the i-th protein contained in the biomarker, ki represents a coefficient of the i-th biomarker, and b is a constant.
7. A computer readable storage medium having a computer program stored thereon; the computer program, when executed by a processor, implements the method of constructing a model for predicting whether an individual is colorectal cancer according to any one of claims 1 to 3.
8. An information data processing terminal, characterized by implementing a method of constructing a model for predicting whether an individual is colorectal cancer according to any one of claims 1 to 3.
9. Use of a biomarker in the preparation of a reagent for predicting whether an individual is colorectal cancer, characterized in that the biomarker is a combination of SPINK5 with ORM1, ORM2, CD74, FBLN5, RNASE1, ITIH3, B3GNT2, CEA, CA 199.
10. The use according to claim 9, wherein the reagent is for detecting a biomarker in a body fluid sample.
11. The use of claim 10, wherein the detection of a marker in a body fluid sample is detection of the presence or relative abundance or concentration of a biomarker in a body fluid sample of an individual.
12. A product for predicting whether an individual is colorectal cancer comprising a kit or chip comprising reagents for detecting a biomarker, wherein the biomarker is a combination of SPINK5 and ORM1, ORM2, CD74, FBLN5, RNASE1, ITIH3, B3GNT2, CEA, CA 199.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310770060.0A CN116519954B (en) | 2023-06-28 | 2023-06-28 | Colorectal cancer detection model construction method, colorectal cancer detection model construction system and biomarker |
Publications (2)
Publication Number | Publication Date |
---|---|
CN116519954A CN116519954A (en) | 2023-08-01 |
CN116519954B CN116519954B (en) | 2023-10-27 |
Family
ID=87394365
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202310770060.0A Active CN116519954B (en) | 2023-06-28 | 2023-06-28 | Colorectal cancer detection model construction method, colorectal cancer detection model construction system and biomarker |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116519954B (en) |
Also Published As
Publication number | Publication date |
---|---|
CN116519954A (en) | 2023-08-01 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||