CN116344027A - Intestinal adenoma adenocarcinoma diagnosis method based on peripheral blood circulation micro ribonucleic acid and protein
- Publication number
- CN116344027A (application CN202310110374.8A)
- Authority
- CN
- China
- Prior art keywords
- mir
- adenocarcinoma
- intestinal adenoma
- model
- protein
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12Q—MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
- C12Q1/00—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
- C12Q1/68—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
- C12Q1/6876—Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes
- C12Q1/6883—Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for diseases caused by alterations of genetic material
- C12Q1/6886—Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for diseases caused by alterations of genetic material for cancer
-
- G—PHYSICS
- G01—MEASURING; TESTING
- G01N—INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
- G01N33/00—Investigating or analysing materials by specific methods not covered by groups G01N1/00 - G01N31/00
- G01N33/48—Biological material, e.g. blood, urine; Haemocytometers
- G01N33/50—Chemical analysis of biological material, e.g. blood, urine; Testing involving biospecific ligand binding methods; Immunological testing
- G01N33/53—Immunoassay; Biospecific binding assay; Materials therefor
- G01N33/574—Immunoassay; Biospecific binding assay; Materials therefor for cancer
- G01N33/57407—Specifically defined cancers
- G01N33/57419—Specifically defined cancers of colon
-
- G—PHYSICS
- G01—MEASURING; TESTING
- G01N—INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
- G01N33/00—Investigating or analysing materials by specific methods not covered by groups G01N1/00 - G01N31/00
- G01N33/48—Biological material, e.g. blood, urine; Haemocytometers
- G01N33/50—Chemical analysis of biological material, e.g. blood, urine; Testing involving biospecific ligand binding methods; Immunological testing
- G01N33/53—Immunoassay; Biospecific binding assay; Materials therefor
- G01N33/574—Immunoassay; Biospecific binding assay; Materials therefor for cancer
- G01N33/57484—Immunoassay; Biospecific binding assay; Materials therefor for cancer involving compounds serving as markers for tumor, cancer, neoplasia, e.g. cellular determinants, receptors, heat shock/stress proteins, A-protein, oligosaccharides, metabolites
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16H—HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
- G16H50/00—ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
- G16H50/20—ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for computer-aided diagnosis, e.g. based on medical expert systems
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16H—HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
- G16H50/00—ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
- G16H50/70—ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for mining of medical data, e.g. analysing previous cases of other patients
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12Q—MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
- C12Q2600/00—Oligonucleotides characterized by their use
- C12Q2600/158—Expression markers
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12Q—MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
- C12Q2600/00—Oligonucleotides characterized by their use
- C12Q2600/178—Oligonucleotides characterized by their use miRNA, siRNA or ncRNA
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02A—TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
- Y02A90/00—Technologies having an indirect contribution to adaptation to climate change
- Y02A90/10—Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation
Abstract
The invention discloses a method for diagnosing intestinal adenoma and adenocarcinoma based on circulating microRNA (miRNA) and protein in peripheral blood. A diagnosis model for intestinal adenoma and adenocarcinoma is established by a deep learning method from the differential expression of microRNAs and protein tumor markers in a tested blood sample, and the model is then used to diagnose intestinal adenoma and adenocarcinoma. The invention has high sensitivity, allows patients with intestinal adenoma or adenocarcinoma to be diagnosed more simply and accurately, facilitates timely treatment, and has high clinical application value.
Description
Technical Field
The invention relates to the technical field of bioinformatics, and in particular to a method for diagnosing intestinal adenoma and adenocarcinoma based on circulating microRNA and protein in peripheral blood.
Background
Colorectal cancer is one of the most common malignant tumors worldwide, ranking third in incidence and fourth in mortality among malignant tumors. It usually arises from mutation and proliferation of colorectal mucosal epithelial cells. In recent years, with rapid socioeconomic development and major changes in living standards and lifestyle, the incidence of colorectal cancer in China has risen steadily and patients are trending younger; the number of colorectal cancer patients diagnosed each year increases by about 4% over the previous year.
Despite continued progress in diagnosis and treatment, colorectal cancer still causes a large share of cancer-related deaths, with high morbidity and mortality and a trend toward younger onset. Many forms of colorectal cancer can be prevented by early and routine screening, that is, by finding and excising precancerous lesions before malignant transformation or metastasis occurs. Despite extensive efforts to increase colorectal cancer screening rates, at least 40% of age-appropriate adults do not comply with screening guidelines. The current mainstream screening workflow is as follows. First, primary screening uses a high-risk factor questionnaire (HRFQ) and an immunochemical fecal occult blood test (iFOBT); the iFOBT is performed twice, one week apart, and the final iFOBT result is assessed as positive if either test is positive. Second, subjects positive on the HRFQ and the iFOBT are all included in the high-risk group, and all of them must undergo colonoscopy. Finally, if colonoscopy finds ulcers or polypoid lesions, a biopsy must also be taken and a pathological diagnosis made. The sampling steps are complicated, the time span is long, and the procedures are highly specialized, so this workflow is difficult to popularize widely.
Disclosure of Invention
The invention aims to provide a method for diagnosing intestinal adenoma and adenocarcinoma based on circulating microRNA and protein in peripheral blood. The invention diagnoses intestinal adenoma and adenocarcinoma through early screening and has the advantage of convenient and accurate detection.
The technical scheme of the invention is as follows: in the method for diagnosing intestinal adenoma and adenocarcinoma based on circulating microRNA and protein in peripheral blood, a diagnosis model is established by a deep learning method from the differential expression of microRNAs and protein tumor markers in a tested blood sample, and the model is used to diagnose intestinal adenoma and adenocarcinoma.
In the above method, the microRNAs comprise: miR-23a-3p, miR-16-5p, miR-17-5p, miR-101-3p, miR-21-5p, miR-30e-5p, miR-19b-3p, miR-374a-5p, miR-20a-5p, miR-106a-5p, miR-26b-5p, miR-3613-5p, miR-15b-5p, miR-19b-3p and miR-106b-5p.
In the above method, the protein tumor markers comprise: CD31, Galectin-3, AFP, GDF-15, CD106, CD66e, ALDH1A1, CA125, Her3, CA15-3, Fractalkine, CD and TROP1.
In the above method, the diagnosis model is established by the following steps:
step (1), obtaining microRNA and protein tumor marker expression data of intestinal adenoma and adenocarcinoma tumor samples;
step (2), model prediction: training a classifier using the random forest algorithm in an R language package, and validating the remaining samples with the resulting prediction model;
step (3), model training: performing feature screening with Boruta, and screening the selected features by recursive feature elimination with cross-validation (RFECV);
step (4), feature elimination: using a machine learning model evaluation metric in the ShapRFECV of Probatus, with ten-fold cross-validation, to find the number of features required in the training set; in each nine-tenths portion of the training data, testing biomarkers for differences between the colorectal cancer patient group and the control group at a false discovery rate < 0.05;
step (5), evaluating model accuracy: using samples from patients diagnosed with advanced adenoma as an independent test for detecting precancerous lesions.
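The feature-elimination steps above can be illustrated with a minimal sketch. It assumes scikit-learn and uses that library's own RFECV (ranking features by the forest's impurity-based importance) as a stand-in; it is not the Boruta or Probatus SHAP-based procedure named in the steps. The data are synthetic, and the sample count, feature count, and forest size are arbitrary illustrative choices.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFECV
from sklearn.model_selection import StratifiedKFold

# Synthetic stand-in for a samples x markers expression table (not patent data).
X, y = make_classification(
    n_samples=200, n_features=20, n_informative=5, n_redundant=2,
    random_state=0,
)

forest = RandomForestClassifier(n_estimators=30, random_state=0)
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)

# Drop the least important feature one at a time; each candidate subset size
# is scored by ten-fold cross-validated ROC AUC.
selector = RFECV(
    estimator=forest, step=1, cv=cv, scoring="roc_auc",
    min_features_to_select=1,
)
selector.fit(X, y)

selected = np.flatnonzero(selector.support_)
print("number of features kept:", selector.n_features_)
print("kept column indices:", selected.tolist())
```

The number of features kept is whichever subset size gave the best mean cross-validated score, which corresponds to reading the peak off a feature-count versus cross-validation-score plot.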
Compared with the prior art, the invention establishes a diagnosis model for intestinal adenoma and adenocarcinoma by a deep learning method from the differential expression of microRNAs and multiple tumor proteins in blood samples of patients with intestinal adenoma or adenocarcinoma, and uses the model for diagnosis. After the microRNAs and tumor protein markers extracted from sample plasma were sequenced and named by database comparison, 38 and 15 optimal feature markers were determined by the random forest model and the SHAP algorithm, respectively. The colorectal cancer screening model built on the random forest algorithm has a specificity and accuracy of 0.879 and 0.875, and the SHAP model 0.788 and 0.8125. The area under the receiver operating characteristic (ROC) curve for diagnosing patients with intestinal adenoma or adenocarcinoma is 0.926 for the random forest model and 0.901 for the SHAP model. These experimental results demonstrate the feasibility of constructing, with a random forest model and the SHAP algorithm, a model from the optimal highly expressed microRNA markers and proteins and of applying it to early colorectal cancer screening. The invention has high sensitivity, allows patients with intestinal adenoma or adenocarcinoma to be diagnosed more simply and accurately, facilitates timely treatment, and has high clinical application value.
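The "area under the subject working characteristic curve" quoted above is the area under the receiver operating characteristic (ROC) curve, usually abbreviated AUC. As a hedged illustration of what that number measures, the sketch below computes AUC directly as the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative one. The function name `roc_auc` and the labels and scores are hypothetical and are not taken from the patent.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for label, s in zip(labels, scores) if label == 1]
    neg = [s for label, s in zip(labels, scores) if label == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical model scores: 1 = patient, 0 = control.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.2]
auc = roc_auc(labels, scores)
print(round(auc, 3))
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation of the two groups.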
Drawings
FIG. 1 is the receiver operating characteristic (ROC) curve of the random forest model;
FIG. 2 is the confusion matrix of the random forest model;
FIG. 3 is a line plot of cross-validation score against number of features for the random forest model;
FIG. 4 shows the proportion contributed by each marker to diagnosis in the random forest model;
FIG. 5 is the ROC curve of the logistic regression model;
FIG. 6 is the confusion matrix of the logistic regression model;
FIG. 7 is a line plot of the area under the ROC curve against number of markers for the test set and the validation set of the logistic regression model;
FIG. 8 shows the proportion contributed by each marker to diagnosis in the logistic regression model.
Detailed Description
The invention is further illustrated by the following figures and examples, which are not intended to be limiting.
Examples: construction and verification of an early screening prediction model based on the difference between serum microRNA and protein tumor markers of an intestinal adenoma adenocarcinoma patient.
1. Obtaining a micro ribonucleic acid and protein tumor marker sample;
blood was drawn from the elbow vein prior to enteroscopy using two 5mL ethylene diamine tetraacetic acid
The vacuum blood collection tube collects blood. Blood was centrifuged (1800 Xg, 4 ℃,10 min, two times) and plasma was collected, stored at-80 ℃, then assayed using nucleic acid extraction or purification reagents, 0.1mL of plasma was diluted 2-fold with phosphate buffered saline, and data sets were screened using nucleic acid extraction and purification reagents according to inclusion and exclusion criteria.
The inclusion criteria for the samples were as follows.
Positive control group inclusion criteria:
(1) The sample type is a serum sample from a colorectal cancer patient in stage I/II or stage III/IV;
(2) Relapse-free survival data are available for the patient;
(3) The detection technology is a gene expression profile chip.
Data sets meeting all 3 of the above criteria are included in the subsequent analysis.
Negative control group inclusion criteria:
(1) The sample type is a serum sample from a healthy person, an enteritis patient, or a patient with a benign intestinal lesion;
(2) Relapse-free survival data are available for the patient;
(3) The detection technology is a gene expression profile chip.
Data sets meeting all 3 of the above criteria are included in the subsequent analysis.
Independent validation group inclusion criteria:
(1) The sample type is a serum sample from a patient with advanced intestinal adenoma (stage I);
(2) The detection technology is a gene expression profile chip.
Samples meeting both of the above criteria are included in the subsequent analysis.
The exclusion criteria for the samples were:
(1) The sample type is a postoperative tumor tissue sample from a patient with non-stage II colorectal cancer;
(2) The sample source underwent complete resection of the lesion within the last 4 weeks.
Samples meeting either of the above 2 criteria are excluded.
The samples finally included in the analysis comprise 63 normal persons and enteritis patients together with 112 patients with benign lesions as the negative control group; 203 intestinal cancer patients as the positive control group; and 35 patients with advanced adenoma as the independent validation group.
2. Machine learning based on the random forest model;
Model prediction was performed on the selected samples: 70% of the samples in each group were randomly selected to train a classifier with the random forest (RF) algorithm in the scikit-learn package, and the remaining samples were validated with the resulting prediction model.
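The 70% training split described above can be sketched as follows, assuming scikit-learn. The group sizes (175 controls, read here as 63 + 112, and 203 cancer samples) follow the counts given in the text, but the marker values are randomly generated placeholders, and the number of markers, the forest size, and the random seeds are illustrative assumptions rather than the patent's settings.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Placeholder data: label 0 = negative control, label 1 = colorectal cancer.
n_control, n_cancer, n_markers = 175, 203, 15
y = np.array([0] * n_control + [1] * n_cancer)
X = rng.normal(size=(y.size, n_markers))
X[y == 1, :3] += 1.0  # give three synthetic markers a between-group shift

# Stratifying on y draws 70% from each group, as the text describes.
X_train, X_val, y_train, y_val = train_test_split(
    X, y, train_size=0.7, stratify=y, random_state=0
)

model = RandomForestClassifier(n_estimators=200, random_state=0)
model.fit(X_train, y_train)

# Validate on the held-out 30% using the predicted probability of class 1.
val_scores = model.predict_proba(X_val)[:, 1]
val_auc = roc_auc_score(y_val, val_scores)
print(f"training samples: {len(y_train)}, validation samples: {len(y_val)}")
print(f"validation ROC AUC: {val_auc:.3f}")
```

Because the features here are synthetic, the printed AUC says nothing about the performance figures reported in this document; the sketch only shows the shape of the split, fit, and validate sequence.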
Feature (biomarker) screening was performed with Boruta, and the selected features were further screened by recursive feature elimination with cross-validation (RFECV). Feature elimination used the "roc_auc" scoring method in the ShapRFECV (recursive feature elimination based on SHAP importance) of Probatus (https://pypi.org/project/Probatus/), with ten-fold cross-validation, to find the number of features required in the training set. In each nine-tenths portion of the training data, biomarkers were tested for differences between colorectal cancer patients and the control group at a false discovery rate (FDR) < 0.05. The accuracy of model prediction was evaluated on the remaining training samples, and the receiver operating characteristic (ROC) curve was drawn. The performance of each random forest model in early detection of colorectal cancer was examined on samples from patients diagnosed with advanced adenoma (n = 35), used as an independent test for detecting precancerous lesions.
FIG. 1 shows the ROC curve of the random forest model; the corresponding area under the curve (AUC) is 0.926, the model specificity is 0.879, and the sensitivity is 0.932. FIG. 2 is the confusion matrix of the random forest model; specificity, sensitivity and similar metrics can be calculated from the predicted labels and true labels in the matrix, and the ROC curve can then be drawn. FIG. 3 is a line plot of cross-validation score against number of features for the random forest model, showing that 38 markers were selected for colorectal cancer diagnosis. FIG. 4 shows the proportion of each marker in the random forest model, reflecting the relevance of each miRNA to diagnostic performance in the model; the markers comprise miR-15b-5p, miR-106b-5p, miR-16-5p, miR-451a, miR-19b-3p, miR-17-5p, miR-1246, miR-19b-3p, miR-23a-3p, miR-21-5p, miR-3613-5p, miR-21-5p and miR-3613-5p.
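The description tests biomarker differences at FDR < 0.05 but does not name the correction procedure. As one common possibility, the sketch below applies the Benjamini-Hochberg step-up procedure; treating it as the method used here is an assumption. The function name `benjamini_hochberg` and the per-marker p-values are hypothetical.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return one boolean per p-value: True if rejected at FDR level alpha."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest 1-based rank k whose sorted p-value is <= (k / m) * alpha.
    max_k = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            max_k = rank
    # Reject every hypothesis ranked at or below that k.
    rejected = [False] * m
    for idx in order[:max_k]:
        rejected[idx] = True
    return rejected


# Hypothetical p-values from per-marker tests of patients versus controls.
p_values = [0.001, 0.008, 0.039, 0.041, 0.20, 0.74]
keep = benjamini_hochberg(p_values, alpha=0.05)
print(keep)
```

In this toy input only the first two markers pass, even though four raw p-values fall below 0.05, which is the practical effect of controlling the false discovery rate across many markers.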
3. Machine learning based on the logistic regression model. FIG. 5 is the ROC curve of the logistic regression model; the corresponding area under the curve (AUC) is 0.901, the model specificity is 0.788, and the sensitivity is 0.909. FIG. 6 is the confusion matrix of the logistic regression model; specificity, sensitivity and similar metrics can be calculated from the predicted labels and true labels in the matrix, and the ROC curve can then be drawn. FIG. 7 is a line plot of AUC against number of markers for the logistic regression model, showing the AUC values of the training set and the validation set for diagnosis models built with different numbers of miRNAs. The analysis shows that the AUC of both sets, especially the validation set, is higher when the number of markers is around 14; that is, choosing a marker count between 14 and 24 for modeling gives a good diagnostic effect. FIG. 8 shows the proportion of each marker in the logistic regression model, reflecting the relevance of each miRNA to diagnostic performance in the model. The markers comprise miR-15b-5p, miR-106b-5p, miR-16-5p, miR-451a, miR-19b-3p, miR-17-5p, miR-1246, miR-19b-3p, miR-23a-3p, miR-21-5p, miR-3613-5p, miR-21-5p and miR-3613-5p.
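Both model sections state that specificity, sensitivity and related metrics are calculated from the confusion matrix. A minimal sketch of that arithmetic follows. The function name `confusion_metrics` and the counts are hypothetical round numbers chosen for illustration; they are not the contents of FIG. 2 or FIG. 6.

```python
def confusion_metrics(tp, fn, tn, fp):
    """Compute standard rates from the four cells of a binary confusion matrix."""
    return {
        # Fraction of true patients that the model flags as positive.
        "sensitivity": tp / (tp + fn),
        # Fraction of true controls that the model flags as negative.
        "specificity": tn / (tn + fp),
        # Fraction of all samples that the model labels correctly.
        "accuracy": (tp + tn) / (tp + fn + tn + fp),
    }


# Hypothetical counts: 50 patients (45 detected) and 50 controls (40 cleared).
metrics = confusion_metrics(tp=45, fn=5, tn=40, fp=10)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Each point on an ROC curve is one such (sensitivity, 1 - specificity) pair obtained at a different decision threshold on the model score.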
The evaluation results of the diagnostic models are summarized in Table 1:
TABLE 1
According to the invention, miRNAs extracted from sample plasma were sequenced and named by database comparison, and 38 and 15 optimal feature markers were then determined by the RFE and SHAP algorithms, respectively. The screening model built with the RFE algorithm has a specificity of 0.879 and an accuracy of 0.896, the SHAP model has a specificity of 0.788 and an accuracy of 0.844, and the accuracy on advanced adenomas reaches 0.875 and 0.8125, respectively. These results demonstrate the feasibility of building a model from the optimal highly expressed microRNA markers and proteins, based on a random forest model and the SHAP algorithm, and of applying it to early colorectal cancer screening.
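For reference, sensitivity, specificity, and accuracy follow directly from the four cells of a binary confusion matrix. The sketch below uses made-up counts purely to show the arithmetic; they are not the counts behind the figures reported above.

```python
def diagnostic_metrics(tp, fn, tn, fp):
    """Compute standard diagnostic metrics from confusion-matrix counts.

    tp: patients correctly predicted positive
    fn: patients wrongly predicted negative
    tn: controls correctly predicted negative
    fp: controls wrongly predicted positive
    """
    return {
        "sensitivity": tp / (tp + fn),  # true positive rate
        "specificity": tn / (tn + fp),  # true negative rate
        "accuracy": (tp + tn) / (tp + fn + tn + fp),
    }


# Hypothetical counts for illustration only.
metrics = diagnostic_metrics(tp=45, fn=5, tn=40, fp=10)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```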
In conclusion, based on the differential expression of microRNAs and multiple tumor proteins in blood samples from patients with intestinal adenoma adenocarcinoma, a deep learning method is used to establish an intestinal adenoma adenocarcinoma diagnosis model, and the model is then used to diagnose intestinal adenoma adenocarcinoma. The invention has high sensitivity, allows patients with intestinal adenoma and adenocarcinoma to be diagnosed more simply and accurately, makes timely treatment easier, and has high clinical application value.
Claims (4)
1. A method for diagnosing intestinal adenoma adenocarcinoma based on peripheral blood circulation microRNA and protein is characterized in that: according to the differential expression of the microRNA gene and the protein tumor marker in the detected blood sample, a deep learning method is adopted to establish an intestinal adenoma adenocarcinoma diagnosis model, and the intestinal adenoma adenocarcinoma diagnosis model is utilized to diagnose the intestinal adenoma adenocarcinoma.
2. The method for diagnosing intestinal adenoma adenocarcinoma based on peripheral blood circulation micro ribonucleic acid and protein according to claim 1, wherein: the microribonucleic acid gene comprises: miR-23a-3p, miR-16-5p, miR-17-5p, miR-101-3p, miR-21-5p, miR-30e-5p, miR-19b-3p, miR-374a-5p, miR-20a-5p, miR-106a-5p, miR-26b-5p, miR-3613-5p, miR-15b-5p, miR-19b-3p and miR-106b-5p.
3. The method for diagnosing intestinal adenoma adenocarcinoma based on peripheral blood circulation micro ribonucleic acid and protein according to claim 1, wherein: the protein tumor markers include CD31, galectin-3, AFP, GDF-15, CD106, CD66e, ALDH1A1, CA125, her3, CA15-3, fractalkine, CD117 and TROP1.
4. The method for diagnosing intestinal adenoma adenocarcinoma based on peripheral blood circulation micro ribonucleic acid and protein according to claim 1, wherein: the construction method of the diagnosis model comprises the following steps:
step (1), obtaining micro ribonucleic acid gene and protein tumor marker expression data of an intestinal adenoma adenocarcinoma tumor sample;
step (2), model prediction: training a classifier by using a random forest algorithm in an R language package, and verifying other samples by using the generated prediction model;
step (3), model training: feature screening is performed by using Boruta, and the defined features are subjected to recursive feature elimination screening by using a cross-validation recursive feature elimination method;
step (4), feature elimination: using a machine learning model evaluation index in ShapRFECV of Probatus, and adopting a ten-fold cross-validation method to search for the number of features required in a training set; detecting differences in the biomarkers between the colorectal cancer patient group and the control group on nine-tenths of the training data in each fold, with a false discovery rate < 0.05;
step (5), evaluating model accuracy: as an independent detection of precancerous lesions from patient samples diagnosed with advanced adenomas.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310110374.8A CN116344027B (en) | 2023-02-14 | 2023-02-14 | Intestinal adenoma adenocarcinoma diagnosis method based on peripheral blood circulation micro ribonucleic acid and protein |
Publications (2)
Publication Number | Publication Date |
---|---|
CN116344027A true CN116344027A (en) | 2023-06-27 |
CN116344027B CN116344027B (en) | 2023-09-26 |
Family
ID=86876472
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202310110374.8A Active CN116344027B (en) | 2023-02-14 | 2023-02-14 | Intestinal adenoma adenocarcinoma diagnosis method based on peripheral blood circulation micro ribonucleic acid and protein |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116344027B (en) |
Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103667516A (en) * | 2014-01-07 | 2014-03-26 | 山东大学齐鲁医院 | miRNAs specific expression profile and diagnosis model for early colonic adenocarcinoma and rectal adenocarcinoma |
CN105018594A (en) * | 2015-04-27 | 2015-11-04 | 广州医科大学附属第三医院 | Early-diagnosis marker for colorectal cancer and related kit |
CN109852714A (en) * | 2019-03-07 | 2019-06-07 | 南京世和基因生物技术有限公司 | A kind of early diagnosis of intestinal cancer and Diagnosis of Pituitary marker and purposes |
WO2019122341A1 (en) * | 2017-12-21 | 2019-06-27 | Belgian Volition Sprl | Method for the detection and treatment of colorectal adenomas |
CN110791565A (en) * | 2019-09-29 | 2020-02-14 | 浙江大学 | Prognostic marker gene for colorectal cancer recurrence prediction in stage II and random survival forest model |
US20220214345A1 (en) * | 2019-05-08 | 2022-07-07 | Deutsches Krebsforschungszentrum Stiftung des öffentlichen Rechts | Colorectal cancer screening examination and early detection method |
CN115094142A (en) * | 2022-07-19 | 2022-09-23 | 中国医学科学院肿瘤医院 | Methylation markers for diagnosing colorectal adenocarcinoma |
CN115521982A (en) * | 2022-09-26 | 2022-12-27 | 浙江洛兮医疗科技有限公司 | Construction of colorectal cancer serum exosome miRNA diagnosis classifier based on MLP |
Non-Patent Citations (4)
Title |
---|
TANG W J et al.: "Diagnostic Value of 128-slice Spiral CT Combined with Virtual Colonoscopy for Colorectal Cancer", 《CURRENT MEDICAL SCIENCE》, vol. 39, pages 146, XP036725429, DOI: 10.1007/s11596-019-2013-7 * |
宋志刚 et al.: "Recognition of intestinal adenoma lesions based on deep learning", 《诊断病理学杂志》, vol. 26, no. 4, pages 201 - 206 * |
谌燕 et al.: "Clinicopathological features of 7 cases of adenoma-like adenocarcinoma of the intestine", 《临床与实验病理学杂志》, vol. 38, no. 12, pages 1515 - 1518 * |
陆玮 et al.: "Research progress on microorganisms and the pathogenesis, early diagnosis and treatment of colorectal cancer", 《肿瘤防治研究》, vol. 47, no. 12, pages 909 - 914 * |
Also Published As
Publication number | Publication date |
---|---|
CN116344027B (en) | 2023-09-26 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||