CN112397153A - Method for screening biomarker for predicting esophageal squamous cell carcinoma prognosis - Google Patents


Info

Publication number
CN112397153A
Authority
CN
China
Prior art keywords
candidate
screening
molecules
gene expression
esophageal squamous
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011295497.6A
Other languages
Chinese (zh)
Inventor
齐义军
李孟祥
陈攀
焦叶林
刘轲
冯笑山
高社干
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
First Affiliated Hospital of Henan University of Science and Technology
Original Assignee
First Affiliated Hospital of Henan University of Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by First Affiliated Hospital of Henan University of Science and Technology filed Critical First Affiliated Hospital of Henan University of Science and Technology
Priority to CN202011295497.6A priority Critical patent/CN112397153A/en
Publication of CN112397153A publication Critical patent/CN112397153A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B50/00 ICT programming tools or database systems specially adapted for bioinformatics
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B25/00 ICT specially adapted for hybridisation; ICT specially adapted for gene or protein expression
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B5/00 ICT specially adapted for modelling or simulations in systems biology, e.g. gene-regulatory networks, protein interaction networks or metabolic networks

Abstract

The application relates to the field of biotechnology and discloses a method for screening biomarkers for predicting the prognosis of esophageal squamous cell carcinoma. The method comprises: searching a biological information database according to identification information of esophageal squamous carcinoma to obtain a candidate molecules associated with esophageal squamous carcinoma, where a is the number of candidates; obtaining gene expression values of the a candidate molecules; constructing an interaction network of the a candidate molecules according to the gene expression values; performing a first screening of the candidate molecules based on the interaction network to filter out some of them and retain b candidate molecules that meet preset conditions; performing a second screening of the b retained candidate molecules based on a prognosis prediction model to obtain c preferred molecules; and screening the biomarkers for predicting the prognosis of esophageal squamous cell carcinoma from the c preferred molecules. According to the application, screening in this way makes the screening result more credible and more accurate.

Description

Method for screening biomarker for predicting esophageal squamous cell carcinoma prognosis
Technical Field
The application relates to the field of biotechnology, in particular to a method for screening biomarkers for predicting esophageal squamous cell carcinoma prognosis, a computer-readable storage medium and the biomarkers for predicting esophageal squamous cell carcinoma prognosis.
Background
Cancer does great harm to human health; according to epidemiological data, about 11 million new cancer cases occur every year. Among cancers, esophageal cancer is the eighth most common and the sixth most lethal malignant tumor, and 45000 people worldwide develop the disease annually. Unlike western countries, where most esophageal cancers are adenocarcinomas, in China esophageal squamous cell carcinoma (ESCC, hereinafter referred to as esophageal squamous carcinoma or ESCC) accounts for more than 95% of esophageal cancer cases. The main current treatments for esophageal squamous carcinoma are surgery, radiotherapy and chemotherapy. Although comprehensive treatment of esophageal squamous cell carcinoma has improved continuously over the past decade or more, the prognosis remains unsatisfactory, and the five-year overall survival rate is currently only 15%-25%.
During long-term research, the present inventors found that low reproducibility between different clinical studies is the biggest challenge in existing research, owing to differences in tumor molecular heterogeneity, specimen origin, tissue processing, detection techniques, data analysis, and so on. Exploring biomarkers for esophageal squamous cell carcinoma prognosis and establishing a relatively accurate prognosis prediction model is therefore of particular clinical significance in China.
Disclosure of Invention
In view of the problems of molecular heterogeneity, sample source, tissue treatment, detection technology, data analysis and the like of tumors in the prior art and low repeatability of different clinical experiment results, the application provides a screening method of biomarkers for predicting esophageal squamous cell carcinoma prognosis, a computer-readable storage medium and the biomarkers for predicting esophageal squamous cell carcinoma prognosis.
In a first aspect, the present application provides a method of screening biomarkers for predicting the prognosis of esophageal squamous carcinoma, the method comprising: searching a biological information database according to identification information of esophageal squamous carcinoma to obtain a candidate molecules associated with esophageal squamous carcinoma; obtaining gene expression values of the a candidate molecules; constructing an interaction network of the a candidate molecules according to the gene expression values; performing a first screening of the candidate molecules based on the interaction network to filter out some of them and retain b candidate molecules meeting preset conditions, wherein b is less than a; performing a second screening of the b candidate molecules retained after the first screening, based on a prognosis prediction model, to obtain c preferred molecules, wherein c is less than b; and screening the biomarkers for predicting esophageal squamous carcinoma prognosis from the c preferred molecules.
In a second aspect, the present application provides a computer readable storage medium having computer readable instructions stored thereon, the computer readable instructions being executable by a processor to implement the foregoing method.
In a third aspect, the present application provides a biomarker for predicting esophageal squamous carcinoma prognosis, wherein the biomarker is screened by the method as described above.
The application has the following advantages and beneficial effects. Unlike the prior art, the application obtains the candidate molecules associated with esophageal squamous cell carcinoma from a biological information database, so that essentially all such candidate molecules are covered. The method first screens the candidate molecules based on their interaction network to filter out some of them, then screens the retained candidate molecules a second time based on a prognosis prediction model to obtain preferred molecules, and finally screens the biomarkers for predicting esophageal squamous cell carcinoma prognosis from the preferred molecules. Screening the biomarkers in stages with multiple models makes the screening result more credible and more accurate.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings needed to be used in the description of the embodiments are briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present application, and it is obvious for those skilled in the art to obtain other drawings based on these drawings without creative efforts. Wherein:
FIG. 1 is a schematic flow chart of a first embodiment of the screening method for biomarkers for predicting esophageal squamous carcinoma prognosis according to the present application;
FIG. 2 is a schematic flow chart of a second embodiment of the screening method for biomarkers for predicting esophageal squamous carcinoma prognosis according to the present application;
FIG. 3 is a schematic flow chart of a third embodiment of the screening method for biomarkers for predicting esophageal squamous carcinoma prognosis according to the present application;
FIG. 4 is a schematic flow chart of a fourth embodiment of the screening method for biomarkers for predicting esophageal squamous carcinoma prognosis according to the present application;
FIG. 5 is a schematic flow chart of a fifth embodiment of the screening method for biomarkers for predicting esophageal squamous carcinoma prognosis according to the present application;
FIG. 6 is a diagram of an interaction network of 16 candidate molecules;
FIG. 7 is the probability that 17 candidate molecules are included in a candidate molecule combination in 5 classifier algorithms;
FIG. 8 shows the survival analysis results of SFN in 179 samples of gene expression profiles (GEO number GSE53625);
FIG. 9 is the results of a survival analysis of SFN in 37 TCGA-ESCC datasets;
FIG. 10 is the survival analysis results of SFN in 89 independent experimental samples.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
In the description of the present application, it should be noted that the terms "upper", "lower", "inner" and "outer" and the like indicate orientations or positional relationships based on the orientations or positional relationships shown in the drawings, and are only for convenience of description and simplification of description, but do not indicate or imply that the referred device or element must have a specific orientation, be constructed in a specific orientation, and be operated, and thus should not be construed as limiting the present application. Furthermore, the terms "first" and "second" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance.
In the description of the present application, it is to be noted that, unless otherwise explicitly specified or limited, the terms "mounted," "connected," and "connected" are to be construed broadly, e.g., as meaning either a fixed connection, a removable connection, or an integral connection; can be mechanically or electrically connected; the connection may be direct or indirect via an intermediate medium, and may be a communication between the two elements. The specific meaning of the above terms in the present application can be understood in a specific case by those of ordinary skill in the art.
The application provides a method for screening biomarkers for predicting esophageal squamous cell carcinoma prognosis, wherein the biomarkers refer to molecules related to diseases, for example, the biomarkers can be molecules related to the existence, stage, prognosis and the like of esophageal squamous cell carcinoma, and specifically, the esophageal squamous cell carcinoma biomarkers can comprise proteins (such as full-length polypeptides, splice variants, post-translational modified polypeptides and the like) which are differentially expressed in esophageal squamous cell carcinoma subjects, as well as fragments of gene products and corresponding polynucleotide sequences, such as mRNA, DNA and the like.
As shown in fig. 1, the method comprises the steps of:
S10: Search the biological information database according to the identification information of esophageal squamous carcinoma to obtain a candidate molecules associated with esophageal squamous carcinoma.
Specifically, the biological information database comprises Chinese and English academic databases, specifically the Chinese Hopkinson area database, the Wanfang database, Web of Science and the NCBI database. Literature related to ESCC is retrieved from these academic databases and collated to obtain 48 candidate molecules associated with ESCC; see Table 1.
TABLE 1. The 48 candidate molecules associated with ESCC
[Table 1 is provided as an image in the original publication and is not reproduced here.]
S20: Acquire the gene expression values of the a candidate molecules.
Specifically, the gene expression values of the above 48 candidate molecules associated with ESCC were obtained.
S30: Construct an interaction network of the a candidate molecules according to the gene expression values.
Specifically, the interaction network of the a candidate molecules is constructed using the software tool NetBox.
NetBox is a Java-based software tool developed for analyzing human protein interaction networks. It is based on a Human Interaction Network (HIN) assembled from four data sources: HPRD, Reactome, the NCI-Nature PID database, and the MSKCC Cancer Cell Map. NetBox connects genes into a network, identifies significant "linker" genes, and partitions the network into modules.
S40: Perform a first screening of the candidate molecules based on the interaction network to filter out some of them and retain b candidate molecules meeting preset conditions, where b is less than a.
The interaction network of the candidate molecules is constructed with the NetBox software using the following parameters: a shortest path threshold of 1 and a P value threshold of less than 0.05. Candidate molecules among the 48 whose interaction is smaller than a preset value are filtered out, leaving 17 candidate molecules that satisfy the interaction network parameters, as shown in Table 2.
TABLE 2. The 17 candidate molecules retained after the first screening
CCNA2 CD44 MDM2 TRAM1 RRM2B
CCND1 EGFR MLH1 RAC3 SFN
BRCA1 CDKN2A PTGS2 PIK3CA TP53
VEGFA RAD51
Note that when the 17 candidate molecules are entered into the STRING website (https://string-db.org/) with the intermolecular relationship score set to 0.7, candidate molecules that are not strongly linked to other candidate molecules are filtered out, giving the 16 candidate molecules shown in FIG. 6.
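By way of illustration only, the score-based filtering described in the note above can be sketched as follows. This is a minimal sketch, not the STRING service itself: the function name `filter_by_interaction`, the molecule names `MOL_A` to `MOL_D`, and all edge scores are hypothetical placeholders, and only the 0.7 cutoff comes from the text.

```python
def filter_by_interaction(molecules, scored_edges, min_score=0.7):
    """Keep molecules that have at least one edge scoring >= min_score."""
    connected = set()
    for (first, second), score in scored_edges.items():
        if score >= min_score:
            connected.update((first, second))
    # Preserve the input order of the retained molecules.
    return [m for m in molecules if m in connected]


# Hypothetical placeholder molecules and scores (not real STRING output).
molecules = ["MOL_A", "MOL_B", "MOL_C", "MOL_D"]
scored_edges = {
    ("MOL_A", "MOL_B"): 0.99,
    ("MOL_A", "MOL_C"): 0.85,
    ("MOL_C", "MOL_D"): 0.40,  # below the cutoff, so MOL_D has no strong link
}
kept = filter_by_interaction(molecules, scored_edges)
```

In this toy input, `MOL_D` is dropped because its only edge falls below the cutoff, which mirrors how one of the 17 molecules is removed to leave 16.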
S50: Perform a second screening of the b candidate molecules retained after the first screening, based on a prognosis prediction model, to obtain c preferred molecules, where c is less than b.
Specifically, the 17 candidate molecules are randomly combined to give $2^{17} - 1 = 131071$ candidate molecule combinations; the 131071 combinations are screened based on a prognosis prediction model to obtain preferred candidate molecule combinations; the occurrence frequency of each of the 17 candidate molecules in those combinations is counted; and the c molecules with the highest occurrence frequency are selected as preferred molecules.
S60: Screen the biomarkers for predicting the prognosis of esophageal squamous cell carcinoma from the c preferred molecules.
Specifically, the c preferred molecules obtained in step S50 are further screened to obtain the biomarkers for predicting esophageal squamous cell carcinoma prognosis.
Further, independent validation is performed in several sets of gene expression profile samples. An accurate quantitative PCR (polymerase chain reaction) experiment is performed on the biomarker to measure its gene expression values in esophageal squamous carcinoma tissue samples and paracancerous tissue samples, and the difference or fold change between the two gene expression values is calculated. Survival analysis is then performed with the gene expression value as the factor: the samples are divided into a high-risk (High) group and a low-risk (Low) group according to the difference in SFN gene expression values. The survival analysis results are shown in FIGS. 8-10.
Unlike the prior art, the application obtains the candidate molecules associated with esophageal squamous cell carcinoma from a biological information database, so that essentially all such candidate molecules are covered. The method first screens the candidate molecules based on their interaction network to filter out some of them, then screens the retained candidate molecules a second time based on a prognosis prediction model to obtain preferred molecules, and finally screens the biomarkers for predicting esophageal squamous cell carcinoma prognosis from the preferred molecules. Screening the biomarkers in stages with multiple models makes the screening result more credible and more accurate.
As shown in fig. 2, in an embodiment, step S20 includes the following steps:
S21: Obtain a plurality of pairs of gene expression profiles from the GEO database (Gene Expression Omnibus database), wherein each pair comprises the gene expression profile of an esophageal squamous carcinoma tissue sample and the gene expression profile of a paracancerous tissue sample.
Specifically, in this step, the molecular chip data (GEO number GSE53625) for a single cohort of esophageal squamous cell carcinoma patients is retrieved from the GEO database (https://www.ncbi.nlm.nih.gov/geo/), and the gene expression profiles of 179 pairs of esophageal squamous carcinoma tissue samples and paracancerous tissue samples, each pair taken from the same patient, are extracted from it, giving 179 gene expression profile samples.
S22: the gene probe sequences of the chip platform were aligned to the human reference genome (GRCh37) to re-annotate the chip platform.
Specifically, the chip platform is the microarray chip platform GPL18109. Re-annotation uses a gene annotation method based on sequence similarity: with the human reference genome (GRCh37) from the Gencode database as the genome annotation file, the gene probe sequences of the chip platform are aligned to the human reference genome (GRCh37) using Gencode and SeqMap.
S23: Extract the gene expression values of the corresponding a candidate molecules from the plurality of pairs of gene expression profiles according to the re-annotated chip platform.
Specifically, using the re-annotated chip platform, the gene expression profiles obtained from the GEO database in step S21 (GEO number GSE53625) are searched, and the gene expression values of the 48 candidate molecules are extracted.
In one embodiment, the gene expression value of each candidate molecule in step S23 is the difference between the gene expression value of the corresponding esophageal squamous carcinoma tissue sample and the gene expression value of the paracarcinoma tissue sample.
Specifically, the difference between the gene expression value of the esophageal squamous carcinoma tissue sample and the gene expression value of the tissue sample beside the carcinoma is used as the gene expression value of the candidate molecule, and the gene expression value of the candidate molecule is used as the input data of all subsequent calculation works.
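The paired difference described above can be sketched as follows. This is a minimal illustration under stated assumptions: the function name `paired_differences`, the molecule names, and the expression values are hypothetical, and the sketch assumes both profiles of a pair are already expressed on the same scale.

```python
def paired_differences(tumor, adjacent):
    """Return tumor-minus-adjacent expression differences per molecule."""
    if tumor.keys() != adjacent.keys():
        raise ValueError("both profiles must cover the same molecules")
    return {name: tumor[name] - adjacent[name] for name in tumor}


# Hypothetical expression values for one tumor/paracancerous pair.
tumor_profile = {"MOL_A": 9.5, "MOL_B": 6.0}
adjacent_profile = {"MOL_A": 7.0, "MOL_B": 6.5}
diffs = paired_differences(tumor_profile, adjacent_profile)
```

Each patient then contributes one vector of differences, which serves as the input to the later steps.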
As shown in fig. 3, in an embodiment, before step S50, the method further includes:
S70: Randomly combine the b candidate molecules retained after the first screening to obtain m candidate molecule combinations.
The 17 candidate molecules in Table 2 are randomly combined to obtain $2^{17} - 1 = 131071$ candidate molecule combinations, that is, every non-empty subset of the 17 molecules.
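The count of 131071 is the number of non-empty subsets of 17 items, $2^{17} - 1$. A minimal sketch that enumerates them with the Python standard library is given below; the helper name `nonempty_subsets` is hypothetical, and the molecule names are those listed in Table 2.

```python
from itertools import combinations


def nonempty_subsets(items):
    """Yield every non-empty subset of items as a tuple."""
    for size in range(1, len(items) + 1):
        yield from combinations(items, size)


# The 17 candidate molecules listed in Table 2.
candidates = [
    "CCNA2", "CD44", "MDM2", "TRAM1", "RRM2B", "CCND1", "EGFR", "MLH1",
    "RAC3", "SFN", "BRCA1", "CDKN2A", "PTGS2", "PIK3CA", "TP53", "VEGFA",
    "RAD51",
]
n_combinations = sum(1 for _ in nonempty_subsets(candidates))
```

Using a generator keeps memory use low, since the combinations can be consumed one at a time when building models.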
S80: Construct m models respectively corresponding to the m candidate molecule combinations using multiple classifier algorithms.
The classifier algorithms comprise at least: logistic regression (LR), support vector machine (SVM), artificial neural network (ANN), random forest (RF), and eXtreme Gradient Boosting (XGBoost).
LR, SVM and ANN are weak classifier algorithms, and RF and XGBoost are strong classifier algorithms. The 5 classifier algorithms are used to construct 131071 models corresponding respectively to the 131071 candidate molecule combinations.
S90: Acquire a training sample set and a test sample set, wherein the training sample set comprises e pairs of gene expression profiles and the test sample set comprises the remaining f pairs of gene expression profiles.
Specifically, the 179 gene expression profile samples in step S21 are divided into a training sample set and a test sample set, where the training sample set includes 134 gene expression profile samples, and the test sample set includes the remaining 45 gene expression profile samples.
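A minimal sketch of the 134/45 partition follows. The source does not state how samples are assigned to the two sets, so the seeded random shuffle here is an assumption, and the function name `split_samples` and the sample IDs are hypothetical.

```python
import random


def split_samples(sample_ids, n_train, seed=0):
    """Shuffle sample IDs reproducibly and split into train and test lists."""
    ids = list(sample_ids)
    random.Random(seed).shuffle(ids)
    return ids[:n_train], ids[n_train:]


# 179 hypothetical sample IDs, one per tumor/paracancerous pair.
sample_ids = [f"pair_{i:03d}" for i in range(1, 180)]
train_ids, test_ids = split_samples(sample_ids, n_train=134)
```

Fixing the seed makes the partition repeatable, which matters when many models are compared on the same test set.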
S100: Using whether the survival time is longer than a preset time as the label, train the m models on the training sample set with cross-validation and parameter optimization to obtain the n candidate models with the best model quality parameters in the various classifier algorithms.
Specifically, for the 131071 models, parameter optimization and model building are performed in the training sample set (134 gene expression profile samples) by cross-validation. When the model quality parameter is the area under the receiver operating characteristic curve (AUC), a model whose AUC is larger than the mean AUC is determined to be a candidate model, thereby obtaining n candidate models in the various classifier algorithms, where 1000 < n < 131071 and n is an integer.
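The selection rule in this step, keeping models whose AUC exceeds the mean AUC, can be sketched as below. This is an illustrative sketch only: the function names `auc` and `above_mean`, the labels, scores, and model AUC values are hypothetical, and the AUC is computed by the pairwise-ranking definition rather than by any particular R package used in the application.

```python
def auc(labels, scores):
    """AUC as the chance a positive outranks a negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def above_mean(model_aucs):
    """Keep only the models whose AUC is strictly above the mean AUC."""
    mean_auc = sum(model_aucs.values()) / len(model_aucs)
    return {name: a for name, a in model_aucs.items() if a > mean_auc}


# Hypothetical labels (1 = survival longer than the preset time) and scores.
labels = [1, 1, 0, 0]
scores = [0.9, 0.4, 0.5, 0.1]
example_auc = auc(labels, scores)

# Hypothetical cross-validated AUCs for three candidate molecule combinations.
model_aucs = {"combo_1": 0.5, "combo_2": 0.625, "combo_3": 0.75}
candidate_models = above_mean(model_aucs)
```

Because the comparison is strict, a model sitting exactly at the mean is not kept; whether the application treats that boundary the same way is not stated.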
S110: For each candidate model, input each pair of gene expression profiles in the test sample set into the candidate model to obtain the p prognosis prediction models with the best model quality parameters in the various classifier algorithms.
When the model quality parameter is the area under the receiver operating characteristic curve (AUC), the candidate models are sorted in descending order of AUC, and the top 1000 are determined to be the prognosis prediction models in the various classifier algorithms, giving the 1000 prognosis prediction models with the best quality parameters.
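Ranking by test-set AUC and keeping the top p models can be sketched as follows. The function name `top_models` and the AUC values are hypothetical; the application uses p = 1000, while the toy example uses p = 2.

```python
def top_models(model_aucs, p):
    """Return the p (name, auc) pairs with the highest AUC, best first."""
    ranked = sorted(model_aucs.items(), key=lambda item: item[1], reverse=True)
    return ranked[:p]


# Hypothetical test-set AUCs for four candidate models.
test_aucs = {"combo_1": 0.61, "combo_2": 0.74, "combo_3": 0.58, "combo_4": 0.70}
best = top_models(test_aucs, p=2)
```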
As shown in fig. 4, in an embodiment, step S50 includes:
S51: Count the occurrence frequency of each candidate molecule in the p candidate molecule combinations corresponding to the p prognosis prediction models, and select the c candidate molecules with the highest occurrence frequency as preferred molecules.
Specifically, the occurrence probability of each candidate molecule in the 1000 feature combinations corresponding to the 1000 prognosis prediction models with the best quality parameters is counted, and the 5 candidate molecules with the highest occurrence probability in each classifier algorithm are selected as the preferred molecules of the corresponding classifier algorithm, which is specifically shown in table 3 and fig. 7.
TABLE 3. Preferred molecules of the respective classifier algorithms
[Table 3 is provided as an image in the original publication and is not reproduced here.]
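The frequency count of step S51 can be sketched as below. This is a minimal sketch: the function name `most_frequent_molecules` and the placeholder combinations are hypothetical, and the alphabetical tie-break is an assumption, since the source does not say how ties are resolved.

```python
from collections import Counter


def most_frequent_molecules(combos, c):
    """Return the c molecules that appear in the most combinations."""
    counts = Counter(m for combo in combos for m in set(combo))
    # Sort by descending count, then by name so ties break deterministically.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [name for name, _ in ranked[:c]]


# Hypothetical top-ranked combinations for one classifier algorithm.
top_combinations = [
    ("MOL_A", "MOL_B"),
    ("MOL_A", "MOL_C"),
    ("MOL_A", "MOL_B", "MOL_D"),
]
preferred = most_frequent_molecules(top_combinations, c=2)
```

In the application the same count would be run once per classifier algorithm over its 1000 best combinations, with c = 5.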
Step S60 includes:
S61: Take the intersection of the c preferred molecules of all classifier algorithms to screen out the biomarker for predicting esophageal squamous cell carcinoma prognosis.
Specifically, as shown in FIG. 6, among the 5 classifier algorithms (LR, SVM, ANN, RF, XGBoost): LR alone selected 2 preferred molecules that the other four algorithms did not select, TP53 and EGFR; SVM alone selected 1 preferred molecule that the other four did not select, TRAM1; ANN alone selected 1 preferred molecule that the other four did not select, CCND1; LR and RF both selected 1 preferred molecule that the other three did not select, RAC3; RF and ANN both selected 1 preferred molecule that the other three did not select, MDM2; RF and XGBoost both selected 1 preferred molecule that the other three did not select, PIK3CA; XGBoost and SVM both selected 1 preferred molecule that the other three did not select, VEGFA; SVM, ANN, RF and XGBoost all selected a preferred molecule that LR did not select, PTGS2; LR, SVM, ANN and XGBoost all selected a preferred molecule that RF did not select, CD44; and all 5 classifier algorithms selected the same preferred molecule, SFN. SFN was therefore determined as a biomarker for predicting esophageal squamous carcinoma prognosis.
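The intersection in step S61 can be checked with a short sketch. The five sets below are reconstructed from the description in the preceding paragraph, not copied from Table 3, which is only available as an image; the variable names are hypothetical.

```python
# Preferred molecules per classifier, as read from the paragraph above.
preferred_by_classifier = {
    "LR": {"TP53", "EGFR", "RAC3", "CD44", "SFN"},
    "SVM": {"TRAM1", "VEGFA", "PTGS2", "CD44", "SFN"},
    "ANN": {"CCND1", "MDM2", "PTGS2", "CD44", "SFN"},
    "RF": {"RAC3", "MDM2", "PIK3CA", "PTGS2", "SFN"},
    "XGBoost": {"PIK3CA", "VEGFA", "PTGS2", "CD44", "SFN"},
}

# Molecules selected by every classifier algorithm.
shared = set.intersection(*preferred_by_classifier.values())
```

Under this reading each algorithm contributes exactly 5 preferred molecules, and the only molecule common to all five sets is SFN.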
Through the steps, the SFN is finally determined to be the biomarker for predicting the prognosis of the esophageal squamous cell carcinoma.
Further, independent validation is performed in several sets of gene expression profile samples. A precise quantitative PCR experiment is performed on SFN to measure the gene expression values of SFN in esophageal squamous carcinoma tissue samples and paracancerous tissue samples, and the difference or fold change between the two gene expression values is calculated. Survival analysis is then performed with SFN as the factor.
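Survival analysis of this kind is commonly summarized with a Kaplan-Meier curve per group. The sketch below is an assumption about how such a curve could be computed, not the application's own procedure, which relies on an R package; the function name `kaplan_meier` and the follow-up data are hypothetical.

```python
def kaplan_meier(times, events):
    """Return [(time, survival_probability)] at each distinct event time.

    times: follow-up durations; events: 1 if death observed, 0 if censored.
    """
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(set(times)):
        deaths = sum(1 for ti, e in zip(times, events) if ti == t and e == 1)
        leaving = sum(1 for ti in times if ti == t)
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        # Everyone whose follow-up ends at t leaves the risk set.
        at_risk -= leaving
    return curve


# Hypothetical follow-up times (months) and event indicators for one group.
times = [5, 8, 8, 12, 20]
events = [1, 1, 0, 1, 0]
curve = kaplan_meier(times, events)
```

Computing one such curve for the High group and one for the Low group gives the two lines that a survival plot would compare.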
In addition, an embodiment of the application also provides a biomarker for predicting the prognosis of esophageal squamous cell carcinoma, the biomarker being obtained by the screening method described above.
Specifically, the biomarker is SFN, measured as the mRNA expression level of the SFN gene or the expression level of its protein.
It should be noted that model construction in the present application is performed in R version 3.6.3 (http://www.r-project.org). The weak classifiers use the software packages bestglm, e1071, and nnet; the ensemble learning algorithms use the software packages randomForest and xgboost; and survival analysis uses the software package survminer.
Furthermore, an embodiment of the present application also provides a computer-readable storage medium, on which computer-readable instructions are stored, and the computer-readable instructions, when executed by a processor, implement the steps of the method for screening biomarkers for predicting esophageal squamous carcinoma prognosis, as described in the above embodiment.
Finally, the application obtains the candidate molecules associated with esophageal squamous cell carcinoma from a biological information database, so that essentially all such candidate molecules are covered. The method first screens the candidate molecules based on their interaction network to filter out some of them, then screens the retained candidate molecules a second time based on a prognosis prediction model to obtain preferred molecules, and finally screens the biomarkers for predicting esophageal squamous cell carcinoma prognosis from the preferred molecules. Screening the biomarkers in stages with multiple models makes the screening result more credible and more accurate.
The above description is only for the purpose of illustrating embodiments of the present application and is not intended to limit the scope of the present application, and all modifications of equivalent structures and equivalent processes, which are made by the contents of the specification and the drawings of the present application or are directly or indirectly applied to other related technical fields, are also included in the scope of the present application.

Claims (10)

1. A method of screening for biomarkers for predicting esophageal squamous carcinoma prognosis, the method comprising:
retrieving a biological information database according to the identification information of the esophageal squamous carcinoma to obtain a candidate molecules related to the esophageal squamous carcinoma;
obtaining gene expression values of a candidate molecules;
constructing an interaction network of a candidate molecules according to the gene expression value;
performing a first screening on a plurality of candidate molecules based on the interaction network to filter part of the candidate molecules and reserve b candidate molecules meeting preset conditions, wherein b is less than a;
b candidate molecules reserved after the first screening are screened for the second time based on a prognosis prediction model to obtain c preferred molecules, wherein c is less than b;
screening the biomarkers for predicting esophageal squamous carcinoma prognosis from the c preferential molecules.
2. The screening method of claim 1, wherein obtaining the gene expression values of the a candidate molecules comprises:
acquiring a plurality of pairs of gene expression profiles from the GEO database, wherein each pair comprises a gene expression profile of an esophageal squamous carcinoma tissue sample and a gene expression profile of a paracancerous (cancer-adjacent) tissue sample;
aligning the gene probe sequences of the chip (microarray) platform used for the profiles to a human reference genome to re-annotate the chip platform; and
extracting the gene expression values of the a candidate molecules from the plurality of pairs of gene expression profiles according to the re-annotated chip platform.
3. The screening method according to claim 2, wherein the gene expression value of each candidate molecule is the difference between the gene expression value of the corresponding esophageal squamous carcinoma tissue sample and the gene expression value of the tissue sample adjacent to the carcinoma.
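The paired difference described in claims 2 and 3 can be sketched as follows. This is a minimal illustration in plain Python, assuming each gene expression profile has already been reduced to a gene-to-value mapping. The function name `paired_expression_difference`, the placeholder gene names `GENE_A` and `GENE_B`, and all numeric values are hypothetical and do not come from the application.

```python
def paired_expression_difference(tumor, adjacent, candidates):
    """Return tumor-minus-adjacent expression for each candidate gene.

    Genes missing from either profile of the pair are skipped.
    """
    return {
        gene: tumor[gene] - adjacent[gene]
        for gene in candidates
        if gene in tumor and gene in adjacent
    }


# Hypothetical values for one tumor / cancer-adjacent sample pair.
tumor_profile = {"SFN": 9.5, "GENE_A": 6.0, "GENE_B": 7.25}
adjacent_profile = {"SFN": 7.0, "GENE_A": 6.5}

diff = paired_expression_difference(
    tumor_profile, adjacent_profile, ["SFN", "GENE_A", "GENE_B"]
)
print(diff)
```

Here `GENE_B` is dropped because it has no value in the adjacent profile; whether the application handles missing values this way is not stated, so that behaviour is an assumption of the sketch.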
4. The screening method of claim 1, wherein constructing the interaction network of the a candidate molecules according to the gene expression values comprises:
constructing the interaction network of the a candidate molecules using NetBox software, wherein the parameters of the interaction network comprise a shortest path threshold of 1 and a P value threshold of less than 0.05.
5. The screening method of claim 1, wherein before the second screening of the b candidate molecules retained after the first screening based on the prognosis prediction model to obtain c preferred molecules, the method further comprises:
randomly combining the b candidate molecules retained after the first screening to obtain m candidate molecule combinations;
constructing m models, one for each of the m candidate molecule combinations, using each of a plurality of classifier algorithms, wherein the classifier algorithms at least comprise: a logistic regression algorithm, a support vector machine algorithm, an artificial neural network algorithm, a random forest algorithm and an extreme gradient boosting algorithm;
acquiring a training sample set and a test sample set, wherein the training sample set comprises e pairs of gene expression profiles and the test sample set comprises the remaining f pairs of gene expression profiles;
using whether the survival time is longer than a preset time as the label, training the m models on the training sample set with cross validation and parameter optimization to obtain, for each classifier algorithm, the n candidate models with the best model quality parameter; and
for each candidate model, inputting each pair of gene expression profiles in the test sample set into the candidate model to obtain, for each classifier algorithm, the p prognosis prediction models with the best model quality parameter.
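The random combination step and the training/test split in claim 5 can be illustrated with the Python standard library. This is only a sketch under stated assumptions: the claim does not fix the size of each combination, so a single `size` parameter is assumed here, and the helper names `sample_combinations` and `split_pairs`, the seeds, and the placeholder molecule names are hypothetical. Classifier training, cross validation and parameter optimization are deliberately left out.

```python
import itertools
import random


def sample_combinations(molecules, size, m, seed=0):
    """Draw m distinct combinations of `size` molecules at random."""
    all_combos = list(itertools.combinations(sorted(molecules), size))
    if m > len(all_combos):
        raise ValueError("m exceeds the number of possible combinations")
    rng = random.Random(seed)
    return rng.sample(all_combos, m)


def split_pairs(pair_ids, e, seed=0):
    """Shuffle sample-pair IDs; the first e train, the remaining f test."""
    shuffled = list(pair_ids)
    random.Random(seed).shuffle(shuffled)
    return shuffled[:e], shuffled[e:]


# Hypothetical input: four retained molecules, ten paired profiles.
combos = sample_combinations(["SFN", "GENE_A", "GENE_B", "GENE_C"], size=2, m=4, seed=1)
train_ids, test_ids = split_pairs(range(10), e=7, seed=1)
print(combos)
print(train_ids, test_ids)
```

Each sampled combination would then define the feature set of one model per classifier algorithm, trained on `train_ids` and evaluated on `test_ids`.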
6. The screening method according to claim 5, wherein:
when the model quality parameter is the area under the receiver operating characteristic curve (AUC), the n models whose AUC is greater than the average AUC are determined, for each classifier algorithm, to be the candidate models with the best model quality parameter; and
when the model quality parameter is the AUC, the candidate models are ranked in descending order of AUC and the p candidate models with the largest AUC are determined, for each classifier algorithm, to be the prognosis prediction models with the best model quality parameter.
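The two AUC-based selections in claim 6 can be sketched in plain Python. The AUC is computed here with the standard pairwise (Mann-Whitney) formulation, which is one common way to obtain it and is not necessarily the calculation used in the application. The function names `roc_auc`, `above_mean` and `top_p`, the model names and the scores are hypothetical.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of positive/negative pairs ranked correctly (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes are required to compute an AUC")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def above_mean(auc_by_model):
    """Models whose AUC is strictly greater than the mean AUC (training stage)."""
    mean_auc = sum(auc_by_model.values()) / len(auc_by_model)
    return [name for name, auc in auc_by_model.items() if auc > mean_auc]


def top_p(auc_by_model, p):
    """The p models with the largest AUC, in descending order (test stage)."""
    return sorted(auc_by_model, key=auc_by_model.get, reverse=True)[:p]


# Hypothetical labels (1 = survival longer than the preset time) and scores.
example_auc = roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1])
aucs = {"model_1": 0.5, "model_2": 0.75, "model_3": 1.0}
candidates = above_mean(aucs)
best = top_p(aucs, 2)
print(example_auc, candidates, best)
```

In the claimed method both selections are applied separately within each classifier algorithm, so a sketch like this would be run once per algorithm.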
7. The screening method of claim 6, wherein the second screening of the b candidate molecules retained after the first screening based on the prognosis prediction model to obtain c preferred molecules comprises:
for each classifier algorithm, counting the number of occurrences of each candidate molecule in the p candidate molecule combinations corresponding to the p prognosis prediction models, and selecting the c candidate molecules with the most occurrences as the preferred molecules for that classifier algorithm;
and wherein the screening of the biomarkers for predicting esophageal squamous carcinoma prognosis from the c preferred molecules comprises:
taking the intersection of the c preferred molecules across all of the classifier algorithms to screen out the biomarkers for predicting esophageal squamous carcinoma prognosis.
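The occurrence counting and intersection in claim 7 map naturally onto `collections.Counter` and set intersection. The sketch below assumes the p best combinations per classifier algorithm are already known; the function names `preferred_molecules` and `shared_biomarkers`, the algorithm keys and the placeholder molecules `GENE_A` to `GENE_C` are hypothetical.

```python
from collections import Counter


def preferred_molecules(combinations, c):
    """Return the c molecules that occur most often across the given combinations."""
    counts = Counter(mol for combo in combinations for mol in combo)
    return [mol for mol, _ in counts.most_common(c)]


def shared_biomarkers(preferred_by_algorithm):
    """Intersect the preferred molecules of all classifier algorithms."""
    sets = [set(mols) for mols in preferred_by_algorithm.values()]
    return set.intersection(*sets) if sets else set()


# Hypothetical top combinations for two classifier algorithms.
top_combos = {
    "logistic_regression": [("SFN", "GENE_A"), ("SFN", "GENE_B"), ("SFN", "GENE_A")],
    "svm": [("SFN", "GENE_B"), ("GENE_B", "GENE_C"), ("SFN", "GENE_B")],
}
preferred = {algo: preferred_molecules(combos, c=2) for algo, combos in top_combos.items()}
biomarkers = shared_biomarkers(preferred)
print(preferred, biomarkers)
```

Ties in the occurrence counts are broken by first appearance in this sketch; the application does not say how ties are handled.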
8. A computer readable storage medium having computer readable instructions stored thereon, the computer readable instructions being executable by a processor to implement the method of any one of claims 1-7.
9. A biomarker for predicting the prognosis of esophageal squamous carcinoma, wherein the biomarker is selected by the method of any one of claims 1 to 7.
10. The biomarker of claim 9, wherein the biomarker comprises SFN, measured as the mRNA expression level or the protein expression level of the SFN gene.
CN202011295497.6A 2020-11-18 2020-11-18 Method for screening biomarker for predicting esophageal squamous cell carcinoma prognosis Pending CN112397153A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011295497.6A CN112397153A (en) 2020-11-18 2020-11-18 Method for screening biomarker for predicting esophageal squamous cell carcinoma prognosis

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011295497.6A CN112397153A (en) 2020-11-18 2020-11-18 Method for screening biomarker for predicting esophageal squamous cell carcinoma prognosis

Publications (1)

Publication Number Publication Date
CN112397153A true CN112397153A (en) 2021-02-23

Family

ID=74606519

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011295497.6A Pending CN112397153A (en) 2020-11-18 2020-11-18 Method for screening biomarker for predicting esophageal squamous cell carcinoma prognosis

Country Status (1)

Country Link
CN (1) CN112397153A (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108866185A (en) * 2017-05-16 2018-11-23 肿瘤学风险公司 Method for predicting the medicine response in cancer patient
CN111575376A (en) * 2020-05-14 2020-08-25 复旦大学附属肿瘤医院 Combined genome for evaluating kidney clear cell carcinoma prognosis and application thereof

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
ETHAN CERAMI et al.: "Automated Network Analysis Identifies Core Pathways in Glioblastoma", PLOS ONE, vol. 5, no. 2, pages 8918 *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113777311A (en) * 2021-09-16 2021-12-10 郑州大学 ELISA kit for auxiliary diagnosis of esophageal squamous cell carcinoma
CN113777311B (en) * 2021-09-16 2023-08-01 郑州大学 ELISA kit for auxiliary diagnosis of esophageal squamous carcinoma
CN117594133A (en) * 2024-01-19 2024-02-23 普瑞基准科技(北京)有限公司 Screening method of biomarker for distinguishing uterine lesion type and application thereof

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination