CN114203256B - MIBC typing and prognosis prediction model construction method based on microbial abundance
- Publication number
- CN114203256B (CN202210148697.1A)
- Authority
- CN
- China
- Prior art keywords
- microorganism
- differential
- bladder cancer
- muscle
- sample
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B15/00—ICT specially adapted for analysing two-dimensional or three-dimensional molecular structures, e.g. structural or functional relations or structure alignment
- G16B15/30—Drug targeting using structural data; Docking or binding prediction
- G16B20/00—ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
- G16B30/00—ICT specially adapted for sequence analysis involving nucleotides or amino acids
- G16B40/00—ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding
Abstract
The invention discloses a typing method for muscle-invasive bladder cancer (MIBC) that performs molecular typing of MIBC according to the tissue microbial abundance profile of MIBC patients. The invention also discloses an MIBC prognosis prediction model based on the typing method and a method for constructing it. The method analyzes RNA-seq transcriptome data of muscle-invasive bladder cancer in the TCGA database to obtain microbial data for MIBC patients, performs NMF clustering with the microbial abundance profile as the feature, and establishes a molecular typing of MIBC at the microbial level. The association between microorganisms and MIBC is thereby analyzed at the level of tumor tissue microorganisms, and an MIBC prognosis prediction model is established that helps predict the 1- to 5-year survival rate of MIBC patients accurately.
Description
Technical Field
The invention relates to the field of urinary system tumors, in particular to muscle-invasive bladder cancer (MIBC), and more particularly to a method for typing muscle-invasive bladder cancer based on the microbial abundance profile and a method for constructing a prognosis prediction model.
Background
Bladder cancer is one of the most common malignancies of the urinary system. Worldwide, there are about 540,000 new cases of bladder cancer and 188,000 deaths. Chinese estimates for 2016 show 80,500 new cases of bladder cancer, 62,100 in men and 18,400 in women, and 32,900 deaths, 25,100 in men (ranking 11th among malignant tumors in men) and 7,800 in women.
Bladder cancer typically originates in the epithelium (urothelium) that covers the inner surface of the bladder. According to the 2004 WHO classification of urinary tract tumors, bladder cancer includes urothelial (transitional cell) carcinoma, squamous cell carcinoma, adenocarcinoma, urachal carcinoma, Müllerian-type malignancy, neuroendocrine tumor, mesenchymal tumor, mixed-type carcinoma, sarcomatoid carcinoma, metastatic carcinoma, and others. Of these, urothelial carcinoma of the bladder is the most common and accounts for more than 90% of bladder cancers; squamous cell carcinoma of the bladder accounts for about 3% to 7%, and adenocarcinoma of the bladder for less than 2%.
Approximately 75% of newly diagnosed patients have NMIBC (non-muscle-invasive bladder cancer); the remaining 25% have MIBC or metastatic disease. NMIBC carries a high risk of progression, with progression probabilities of 29% and 74% at 1 and 5 years, respectively. MIBC has infiltrated the muscle layer of the bladder wall and is therefore more aggressive: like other aggressive cancers, it spreads more easily and is more difficult to treat. The 5-year survival rate of locally advanced MIBC is 36.3%-69.5%, and that of metastatic disease is only 4.6%.
Treatment modalities for MIBC patients include radical cystectomy, partial cystectomy, neoadjuvant/adjuvant chemotherapy, and bladder-preserving multimodal treatment. Radical cystectomy is the standard treatment for MIBC patients with clinical stage cT2-T4aN0M0, but its 5-year overall survival (OS) rate is only about 50%. To improve outcomes, cisplatin-based combination neoadjuvant chemotherapy has been widely used. Multiple randomized trials and meta-analyses have shown that cisplatin-based neoadjuvant chemotherapy significantly improves the complete response (CR) rate, prolongs overall survival (OS), reduces the risk of death by 10-13%, improves the 5-year overall survival rate by 5-8%, and can improve the 5-year survival rate of cT3 patients by 11%.
However, MIBC is a highly heterogeneous tumor with inconsistent clinical course and prognosis. Currently, a range of clinical and pathological parameters are used for risk stratification of bladder cancer, including tumor number, size, recurrence rate, staging and grading of primary tumors, lymph node metastasis and histological variant types. However, these parameters still have certain limitations for the determination of clinical prognosis and may influence the selection of clinical treatment scheme and treatment effect. Therefore, there is a need for more accurate methods of biological typing for MIBC.
A number of studies have classified MIBC into subtypes according to different molecular characteristics through large-scale transcriptome analysis. As with breast cancer, each RNA-based subtype appears to have distinct characteristics in terms of both prognosis and therapeutic targets. Since 2012, four research centers have proposed comprehensive molecular typing schemes for bladder cancer: The Cancer Genome Atlas (TCGA) typing, University of North Carolina (NCU) typing, University of Texas M.D. Anderson Cancer Center (MDA) typing, and Lund University (Lund) typing.
NCU (University of North Carolina) typing: two molecular subtypes of high-grade bladder cancer were proposed, the luminal type and the basal type. This typing revealed the similarity between bladder cancer and breast cancer subtypes, showed the significance of stratified treatment, and prompted follow-up research.
M.D. Anderson Cancer Center (MDA) typing: whole-transcriptome mRNA of 73 transurethrally resected MIBC tissues was analyzed, and 3 subtypes were proposed by hierarchical clustering: basal, luminal and p53-like. Compared with the other two subtypes, basal tumors are associated with shorter overall survival (median 25.0 months, p = 0.011) and shorter disease-specific survival (median 25.3 months, p = 0.004), and are enriched in basal biomarkers. The MDA subtypes differ in clinical significance: basal tumors are associated with squamous differentiation, the p53-like subtype is resistant to neoadjuvant chemotherapy, and MDA typing was the first to suggest that the basal type needs to be subdivided.
TCGA typing was based on high-throughput sequencing analysis of DNA, RNA and protein data of 412 MIBC patients. Five mRNA expression subtypes were determined by unbiased NMF consensus clustering of RNA-Seq data (n = 408): luminal, luminal-infiltrated, basal-squamous, neuronal, and luminal-papillary. Combined with Bayesian NMF expression cluster analysis, the study summarized key drivers for each subtype and proposed treatment strategies that may suit a variety of clinical scenarios, including perioperative treatment (neoadjuvant and adjuvant therapy) combined with radical cystectomy, systemic treatment combined with local radiotherapy, and systemic treatment of measurable metastatic disease.
In general, multiple centers have proposed their own classification schemes for the molecular typing of MIBC, but the inconsistency among these subtype classifications hinders clinical application. Therefore, in 2020 the Bladder Cancer Molecular Taxonomy Group, made up of multiple institutions, published a consensus on MIBC molecular typing in European Urology, constructing a consensus classifier mainly from 18 previously published studies that included RNA sequencing data of 1750 MIBC patients. The classifiers are incorporated into an R package (https://github.com/cit-bioinfo/BLCAsubstyping). In essence, the approach builds a weighted network of subtype results, uses Cohen's kappa to quantify the similarity between subtypes from different classification systems, and applies a Markov clustering algorithm to identify robust network substructures corresponding to potential consensus classes. The European Urology (EU) consensus divides bladder cancer into 6 subtypes, and establishing it facilitates the clinical translation of typing.
Although the EU consensus for MIBC was established in 2020, the existing molecular typing of MIBC is mainly based on expression clustering of RNA transcriptome data, and factors in many other dimensions, such as microorganisms, have not yet been taken into account. It is not known whether the information provided by the RNA transcriptome covers all the information relevant to cancer development. The development of cancer is an extremely complex process in which changes in the host genome are accompanied by the influence of external microorganisms on the host. Over the past decade, microbial communities have come to be regarded as affecting the occurrence, progression, metastasis and therapeutic response of various cancer types. Analysis at the level of the host genome alone may miss important information about tumor tissue microorganisms that is related to cancer development. Studying tumor tissue microorganisms gives a better understanding of the tumor microenvironment, reveals how intratumoral microorganisms influence different tumor phenotypes, and helps predict the potential effect of certain treatments, thereby guiding clinical diagnosis and treatment.
Disclosure of Invention
The first technical problem to be solved by the invention is to provide a typing method for muscle-invasive bladder cancer based on microbial abundance, which can be used to establish a prognosis prediction model that guides the prognosis of MIBC patients.
To solve this technical problem, the typing method provided by the invention performs molecular typing of muscle-invasive bladder cancer according to the tissue microbial abundance profile of patients with muscle-invasive bladder cancer. Preferably, muscle-invasive bladder cancer is divided into four molecular subtypes: microorganism type I, microorganism type II, microorganism type III and microorganism type IV.
The typing method may specifically include the steps of:
1) acquiring gene expression profile data and clinical information data of patients with muscle-invasive bladder cancer, and removing T1-stage samples and repeat-sequencing samples from the same patient to obtain the actual analysis samples;
2) obtaining the genomes of known microorganisms, and filtering them according to their quality score;
3) aligning the sequencing reads of the actual analysis samples to a human reference genome, splitting the sequencing reads that do not align to the human reference genome into sequence fragments, and mapping the sequence fragments to the microbial genomes obtained in step 2);
4) normalizing the microbial data obtained in step 3) to reduce the influence of sequencing depth and batch effects, and selecting the microbial abundance profile data of the actual analysis samples;
5) calculating the median absolute deviation (MAD) of the abundance values of each microorganism in the microbial abundance profile obtained in step 4) across the actual analysis samples, and selecting different proportions of microorganisms according to the MAD to form microbial abundance profile subsets of the actual analysis samples, in which rows are microorganisms and columns are samples;
6) clustering the microbial abundance profile subsets into microbial types, classifying muscle-invasive bladder cancer into four molecular subtypes: microorganism type I, microorganism type II, microorganism type III and microorganism type IV.
In the step 1), the clinical information mainly includes age, disease stage, DSS and OS.
In step 2), bacteria and archaea can be filtered on the criterion of whether the quality score is greater than 0.8: bacteria and archaea with a quality score less than or equal to 0.8 are filtered out, and those with a quality score greater than 0.8 are retained.
In step 4), to reduce the influence of sequencing depth, the microbial data can be normalized as follows:
filtering out samples lacking clinical information;
filtering out low-abundance microbial data, and performing trimmed mean of M-values (TMM) normalization;
and performing log-cpm conversion on the microbial data of each sample to eliminate the heteroscedasticity of the data.
To reduce the influence of batch effects, the sample type (including primary tumor samples, metastatic tumor samples, normal tissue samples, blood-derived normal samples and new tumor samples) can be used as the biological variable, and the sequencing center, sequencing platform, experimental method, tissue source site, and formalin-fixed paraffin-embedded (FFPE) status can be used as adjustment variables; the effect of the biological variable is retained and the effects of the adjustment variables are removed.
In step 5), preferably, the MAD values are arranged in descending order, and the top 5%, 10%, 15%, 20%, 25%, 30%, 35%, 40%, 45%, 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95% and 100% of the microorganisms are selected to form 20 subsets, in which rows are microorganisms and columns are samples.
In step 6), clustering is preferably performed with a non-negative matrix factorization (NMF) algorithm.
The second technical problem to be solved by the invention is to provide a construction method of a muscle-invasive bladder cancer prognosis prediction model based on the typing method. The method specifically comprises the following steps:
1) screening microorganisms whose abundance distribution differs among the different muscle-invasive bladder cancer types (differential microorganisms);
2) constructing a lasso model with the differential microorganisms screened in step 1) and the clinical information of the muscle-invasive bladder cancer patients as independent variables and the survival time and survival status of the patients as dependent variables, and screening out the differential microorganisms with nonzero lasso coefficients;
3) constructing a multivariate regression model with the differential microorganisms screened in step 2) as independent variables and the survival time and survival status of the muscle-invasive bladder cancer patients as dependent variables, and screening the differential microorganisms that satisfy the proportional hazards assumption of the Cox regression model;
4) extracting the regression coefficients of the differential microorganisms screened in step 3), and calculating a differential microorganism risk value for each sample;
5) constructing a prognosis prediction model with the differential microorganism risk value and the clinical information as variables.
Preferably, the differential microorganisms selected in step 3) are 49 microorganisms shown in Table 3.
In step 4), the differential microorganism risk value is calculated by the formula:
$$ S = \sum_{i=1}^{n} \beta_i X_i $$
wherein S is the differential microorganism risk value of the sample, beta_i is the regression coefficient of the i-th differential microorganism, X_i is the abundance value of the i-th differential microorganism in the sample, i is the index of the differential microorganism, and n is the total number of differential microorganisms screened in step 3).
The clinical information in the above step 2) and step 5) is preferably the stage of the disease and the age of the patient.
The invention also provides a muscle invasive bladder cancer prognosis prediction model constructed by the construction method.
The prognostic predictive model comprises a nomogram whose variables predominantly comprise differential microbial risk values.
The fourth technical problem to be solved by the invention is to provide the usage of the typing method and the prognosis prediction model for the muscle-invasive bladder cancer, and the typing method and the prognosis prediction model can be used for prognosis prediction of patients with the muscle-invasive bladder cancer.
The method analyzes RNA-seq transcriptome data of muscle-invasive bladder cancer in the TCGA database to obtain tissue microbial data for MIBC patients, performs NMF clustering with the microbial abundance profile as the feature, and establishes a molecular typing of MIBC at the microbial level. Based on the association between tumor tissue microorganisms and MIBC, it then establishes a cancer prognosis prediction model that predicts the 1- to 5-year survival rate of MIBC patients, enabling accurate prognosis prediction for MIBC patients.
Drawings
FIG. 1 is a heat map of four subtypes of microorganisms clustered based on NMF in example 1 of the present invention. Wherein the rows represent genera and the columns represent samples.
FIG. 2 shows that the four microbial subtypes in example 1 of the present invention are significantly associated with overall survival (OS).
FIG. 3 shows that the four microbial subtypes in example 1 of the present invention are significantly associated with disease-specific survival (DSS).
FIG. 4 is a diagram of the selection of a parameter lambda for adjusting the complexity of a model in example 1 of the present invention, wherein the abscissa represents the log-transformed lambda value and the ordinate represents the C-Index.
FIG. 5 is a nomogram for predicting 1-5 year overall survival for MIBC patients in example 1 of the present invention.
FIG. 6 is a graph showing the results of performance evaluation of the MIBC patient prognosis prediction model constructed in example 1 of the present invention.
Detailed Description
For a more detailed understanding of the technical contents, characteristics and effects of the present invention, the technical solutions of the present invention will now be described in detail with reference to the accompanying drawings and specific embodiments.
Example 1
First, collecting MIBC and clinical data
Gene expression profile data and clinical information data (clinical information mainly including age, disease stage, DSS and OS) of MIBC patients were obtained from The Cancer Genome Atlas (TCGA) database (website: https://portal.gdc.cancer.gov).
The MIBC cohort included in this embodiment is the TCGA-BLCA cohort (n = 410). After T1-stage samples and duplicate samples from the same patient were excluded, 406 samples were included in the actual analysis.
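The exclusion step above can be sketched as follows. This is only an illustration under stated assumptions: the record fields (`patient`, `sample`, `t_stage`) are hypothetical names, and keeping the first sample seen for each patient is an assumed rule, since the text does not say which duplicate is retained.

```python
# Hypothetical sample records; field names are illustrative, not TCGA column names.
samples = [
    {"patient": "P1", "sample": "P1-A", "t_stage": "T2"},
    {"patient": "P1", "sample": "P1-B", "t_stage": "T2"},  # repeat sequencing of P1
    {"patient": "P2", "sample": "P2-A", "t_stage": "T1"},  # excluded: stage T1
    {"patient": "P3", "sample": "P3-A", "t_stage": "T3"},
]

def select_analysis_samples(records):
    """Drop T1-stage samples, then keep the first sample seen per patient (assumed rule)."""
    kept, seen = [], set()
    for rec in records:
        if rec["t_stage"] == "T1":
            continue  # T1 is not muscle-invasive
        if rec["patient"] in seen:
            continue  # duplicate sample from an already included patient
        seen.add(rec["patient"])
        kept.append(rec)
    return kept

analysis = select_analysis_samples(samples)
print([rec["sample"] for rec in analysis])
```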
Second, TCGA microbiological detection process
1. Identifying microbial data
A total of 71782 microbial genomes were downloaded from the RepoPhlAn database (https://bitbucket.org/nsegata/repophlan), of which 5503 were viruses and 66279 were bacteria/archaea. The bacteria and archaea were filtered on the criterion of whether the quality score is greater than 0.8. After filtering, 5503 viruses and 54471 bacteria/archaea remained, i.e., 59974 microbial genomes in total, for subsequent analysis.
Sequencing reads that did not align to the known GRCh37 human reference genome (based on the mapping information in the original BAM file) were mapped to the filtered bacterial, archaeal and viral genomes described above using the ultrafast Kraken algorithm (http://ccb.jhu.edu/software/kraken/). Each sequencing read was split into individual sequence fragments (i.e., k-mers), with k set to the default of 31, and the fragments were matched against the known microbial fragments of the filtered RepoPhlAn database.
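The splitting of a read into 31-mers can be illustrated with a minimal sketch. It shows only the fragmenting step; how Kraken indexes and classifies the fragments is not reproduced here, and the toy read is an invented example.

```python
def kmers(read, k=31):
    """Return all overlapping k-mers of a read (empty list if the read is shorter than k)."""
    return [read[i:i + k] for i in range(len(read) - k + 1)]

read = "ACGT" * 10          # toy 40-base read, not real sequencing data
fragments = kmers(read)     # 40 - 31 + 1 = 10 overlapping 31-mers
print(len(fragments), fragments[0])
```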
2. Microbial data normalization
In order to reduce the batch effect caused by technical differences while preserving the biological effect, the microbial data were subjected to the following normalization process:
1) treatment to reduce the impact of sequencing depth
First, samples whose metadata lack clinical information are filtered out. Low-abundance microorganisms are filtered from the raw count data with the filterByExpr function of the R package edgeR, and the data are normalized by the trimmed mean of M-values (TMM). The data are then converted to log-counts per million (log-cpm) for each sample with the voom function; log-cpm removes the heteroscedasticity of the data so that they approximately follow a normal distribution and can be used in the subsequent SNM normalization.
2) Processing to reduce batch effects
The R package snm (supervised normalization) is used. The sample type (primary tumor sample, metastatic tumor sample, normal tissue sample, blood-derived normal sample and new tumor sample) is taken as the biological variable, whose effect needs to be retained. The sequencing center, sequencing platform, experimental protocol, tissue source site, and formalin-fixed paraffin-embedded (FFPE) status are taken as adjustment variables, whose effects need to be removed. The removal method is to call the snm function of the snm package with the bio.var parameter set to the sample type, the adj.var parameter set to the adjustment variables, and the rm.adj parameter set to TRUE, which indicates that the effects of the adjustment variables and the intensity-dependent effects are removed. Finally, the microbial data of the MIBC samples were selected, 1284 microorganisms in total; the classification numbers of these microorganisms in the NCBI (National Center for Biotechnology Information, nih.gov) Taxonomy Browser (root) are shown in table 1.
Table 1. Classification numbers of the 1284 microorganisms
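As an illustration of the log-cpm conversion, the sketch below uses the offsets documented for limma's voom, log2((count + 0.5) / (library size + 1) × 10^6). It is not the edgeR/limma implementation: the TMM scaling factor is passed in as an assumed value (1.0 here) rather than estimated, and the counts are invented.

```python
import math

def log_cpm(counts, norm_factor=1.0):
    """Log2 counts per million for one sample, with voom-style offsets.

    norm_factor stands in for a TMM scaling factor; it is assumed, not computed.
    """
    lib_size = sum(counts) * norm_factor
    return [math.log2((c + 0.5) / (lib_size + 1.0) * 1e6) for c in counts]

sample_counts = [0, 10, 90, 900]   # toy read counts for four microorganisms
values = log_cpm(sample_counts)
print([round(v, 2) for v in values])
```

The 0.5 offset keeps zero counts finite after the logarithm.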
Third, selecting microbial features
Since each of the 406 patients has a different abundance value for each of the 1284 microorganisms, the median abundance of each microorganism across the 406 patient samples is first calculated. Then, according to the formula for the median absolute deviation (MAD), MAD = median(|Xi - median(X)|), the MAD of each of the 1284 microorganisms across the 406 samples in the microbial abundance profile is calculated, where Xi is the abundance value of the microorganism in sample i, i is the sample number (i = 1-406), and median(X) is the median of the 406 abundance values.
The MAD values are sorted in descending order, and the top 5%, 10%, 15%, 20%, 25%, 30%, 35%, 40%, 45%, 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95% and 100% of the microorganisms are selected to form 20 matrices in which rows are microorganisms and columns are samples; these serve as the microbial abundance profile subsets of the samples for subsequent clustering.
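A minimal sketch of this feature selection follows, taking MAD to be the median absolute deviation (the text defines its centre as the median of the abundance values). The toy abundance table and the rule of rounding the subset size up to a whole microorganism are assumptions for illustration.

```python
from statistics import median

def mad(values):
    """Median absolute deviation of one microorganism's abundances across samples."""
    centre = median(values)
    return median([abs(v - centre) for v in values])

def top_percent_subsets(abundance, percents):
    """abundance: dict microbe -> per-sample values. Returns percent -> selected microbes."""
    ranked = sorted(abundance, key=lambda name: mad(abundance[name]), reverse=True)
    subsets = {}
    for p in percents:
        n_keep = max(1, -((-len(ranked) * p) // 100))  # integer ceiling (assumed rounding)
        subsets[p] = ranked[:n_keep]
    return subsets

# Toy table: 4 hypothetical microorganisms measured in 5 samples.
abundance = {
    "A": [1, 1, 1, 1, 1],
    "B": [0, 2, 4, 6, 8],
    "C": [3, 3, 4, 5, 5],
    "D": [0, 10, 20, 30, 40],
}
percents = list(range(5, 101, 5))   # 5%, 10%, ..., 100% -> 20 subsets
subsets = top_percent_subsets(abundance, percents)
print(subsets[50])
```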
Fourth, microorganism typing
Each microbial abundance profile subset was taken as input data, and NMF clustering was performed on each subset with the NMF (non-negative matrix factorization) algorithm of the R package NMF (CRAN, version 0.22.0) to cluster the subsets into microbial types. Because the normalized microbial data contain negative values, the data were first transformed by adding a minimal constant to the original matrix so that every entry became positive for the subsequent NMF clustering. The number of runs was set to 20, the method was set to "brunet", and the factorization rank was set to 2-6, giving clustering results with 2 to 6 subtypes. The optimal rank = 4 (i.e., the point where the cophenetic coefficient changes most with K) was finally determined from the cophenetic coefficient and the silhouette coefficient, as shown in fig. 1. MIBC was thus optimally classified into four molecular subtypes, microorganism type I, microorganism type II, microorganism type III and microorganism type IV, taking into account the prognostic survival status (OS status/DSS status) of each patient in TCGA.
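Two small pieces of this step can be sketched without reproducing the factorization itself: shifting the matrix so that all entries are positive, and reading a subtype off a coefficient matrix by taking the dominant factor for each sample. The size of the constant (`eps`) and the argmax assignment rule are assumptions for illustration; the text specifies neither, and the NMF package's "brunet" updates are not implemented here.

```python
def shift_to_positive(matrix, eps=1e-9):
    """Add one constant to every entry so the smallest entry becomes eps (> 0)."""
    lowest = min(min(row) for row in matrix)
    offset = eps - lowest if lowest <= 0 else 0.0
    return [[v + offset for v in row] for row in matrix]

def assign_subtypes(H):
    """H: rank x samples coefficient matrix. Each sample joins its largest factor."""
    n_samples = len(H[0])
    return [max(range(len(H)), key=lambda k: H[k][j]) for j in range(n_samples)]

# Toy normalized abundances (rows: microorganisms, columns: samples).
V = [[-2.5, 0.0, 1.5],
     [0.5, -1.0, 3.0]]
V_pos = shift_to_positive(V)

# Hypothetical rank-2 coefficient matrix for three samples.
H = [[0.9, 0.1, 0.4],
     [0.1, 0.8, 0.6]]
print(assign_subtypes(H))
```

Adding a single constant leaves the differences between entries unchanged, which is why it can be applied before clustering.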
Fifth, verifying prognosis prediction based on microbial typing
1. Univariate Kaplan-Meier survival analysis, with differences tested by the log-rank test
Using the R package survival, the survival time (OS time/DSS time), survival status (OS status/DSS status) and typing information of each patient were input, and the p value for the survival difference between types was calculated. The p value was 0.00075 for OS and 0.039 for DSS; a p value less than 0.05 indicates that survival differs significantly between patients of different types.
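The Kaplan-Meier estimate underlying this analysis can be sketched in a few lines. This is a plain product-limit estimator on invented data, not the survival package's implementation, and the log-rank test that produces the p values is not included.

```python
def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct event time; events: 1 = death, 0 = censored."""
    event_times = sorted({t for t, e in zip(times, events) if e == 1})
    surv, curve = 1.0, []
    for t in event_times:
        at_risk = sum(1 for x in times if x >= t)
        deaths = sum(1 for x, e in zip(times, events) if x == t and e == 1)
        surv *= 1 - deaths / at_risk   # product-limit step
        curve.append((t, surv))
    return curve

# Toy follow-up data for one subtype (months, event flag); not patient data.
times = [5, 8, 8, 12, 20]
events = [1, 1, 0, 1, 0]
curve = kaplan_meier(times, events)
print(curve)
```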
2. Multivariate Cox proportional hazards regression analysis, with differences tested by the Wald test
Using the R packages survival and survminer, the survival time (OS time/DSS time), survival status (OS status/DSS status), typing information, age and stage of each patient were input for multivariate survival analysis. For every categorical variable (typing and disease stage), the category with the longest median survival time, i.e., the best prognosis (microorganism type I, and clinical stages I and II), was taken as the reference, and the influence of each factor on survival was calculated. The p value was 0.964 between microorganism type I and type II, 0.024 between type I and type III, and 0.046 between type I and type IV; a p value less than 0.05 indicates that the factor has a significant influence on patient survival.
3. Survival curve visualization
The survival of patients of different types was visualized with the ggsurvplot function.
ggsurvplot(survfit(Surv(OS.time, OS.status) ~ cluster, data), data): as shown in fig. 2, microbial typing correlated significantly with the overall survival (OS) of patients, P = 0.00075.
ggsurvplot(survfit(Surv(DSS.time, DSS.status) ~ cluster, data), data): as shown in fig. 3, microbial typing also correlated significantly with disease-specific survival (DSS), P = 0.039.
Sixth, constructing an MIBC survival prediction model based on microbial subtype labels
1. Calculating differential microorganisms
The Kruskal-Wallis test (kruskal.test) was used to identify microorganisms whose abundance differs among the four types, microorganism types I, II, III and IV. The p values were corrected by FDR, and if the corrected p value was less than 0.05, the abundance distribution of the microorganism was considered to differ among the four types. A total of 1147 differential microorganisms were obtained.
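The FDR correction and the 0.05 cut-off can be sketched as below. The text does not name the correction procedure, so the Benjamini-Hochberg method is assumed here, and the p values are invented; the Kruskal-Wallis test itself is not reimplemented.

```python
def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):             # walk from the largest p value down
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

raw_p = [0.01, 0.04, 0.03, 0.5]              # toy per-microorganism p values
adj_p = benjamini_hochberg(raw_p)
differential = [i for i, q in enumerate(adj_p) if q < 0.05]
print(adj_p, differential)
```

In this toy input, two microorganisms with raw p values below 0.05 no longer pass after correction, which is the purpose of the adjustment when more than a thousand microorganisms are tested at once.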
2. Construction of survival prediction model using differential microorganisms
1147 differential microorganisms and clinical information (age, disease stage) of the patient were taken as independent variables x, and the survival time and survival status of the patient were taken as dependent variables y.
Cross validation is performed using the cv. The specific operation is as follows: the family parameter is set to "cox" (i.e., the regression model is defined as the Cox proportional hazards model), the nfolds parameter is set to 5 (i.e., 5-fold cross-validation is used), the type. And lambda.min is the optimal lambda value obtained by cross validation; the optimal lambda = 0.05148491 (lambda is the parameter that adjusts model complexity; its selection is shown in fig. 4, where the abscissa is the log-transformed lambda value and the ordinate is the C-index; a higher C-index means higher accuracy, and lambda is chosen by weighing prediction accuracy against model complexity). The lasso model constructed with this lambda value yielded 56 differential microorganisms with nonzero coefficients (see table 2) for subsequent Cox model construction.
Table 2. Differential microorganisms with nonzero coefficients screened by the lasso model
A multivariate regression model was constructed with the coxph function of the R package survival, using the 56 differential microorganisms obtained above as independent variables x and the survival time and survival status of patients as dependent variables y, and each of the 56 differential microorganisms was evaluated against the proportional hazards (PH) assumption of the Cox regression model. If the p value of the PH assumption test for a differential microorganism variable is greater than 0.05, the variable is considered to satisfy the PH assumption and can be used to construct the Cox regression model. Screening the variables again on this principle finally gave 49 differential microorganisms that satisfy the PH assumption, which were used to construct the final Cox regression model.
According to the Cox regression model obtained above, the regression coefficients of the 49 differential microorganisms (see Table 3) were extracted, and a differential microorganism risk value (risk score) was calculated for each sample using the formula:
$$S = \sum_{i=1}^{n} \beta_i X_i$$
wherein S is the differential microorganism risk value of the sample, beta_i is the regression coefficient of the i-th differential microorganism, X_i is the abundance value of the i-th differential microorganism in the sample, i is the number of the differential microorganism, and n is the total number of differential microorganisms (49 here).
TABLE 3 Regression coefficients for the 49 differential microorganisms
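The risk value is a weighted sum of abundances, which can be sketched as follows. The microorganism names, coefficients and abundance values in this Python example are hypothetical placeholders, not the values of Table 3.

```python
def risk_score(coefficients, abundances):
    """Differential microorganism risk value: sum of beta_i * X_i.

    coefficients : dict mapping microorganism name -> regression coefficient
    abundances   : dict mapping microorganism name -> abundance in one sample
    """
    return sum(beta * abundances[name] for name, beta in coefficients.items())


# Hypothetical coefficients and one hypothetical sample.
example_coefficients = {"taxon_A": 0.5, "taxon_B": -0.25}
example_sample = {"taxon_A": 2.0, "taxon_B": 2.0}
print(risk_score(example_coefficients, example_sample))
```

A positive coefficient raises the risk value as abundance increases, while a negative coefficient lowers it.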
A Cox prognosis prediction model was established using the differential microorganism risk value and clinical characteristics (tumor stage and age) as variables. Tumor stage is a categorical variable comprising stage I, stage II, stage III and stage IV, with stage I and stage II as the reference; age is a continuous variable. The resulting regression coefficients were 0.1851 for clinical stage III, 0.7124 for clinical stage IV and 0.0196 for age. A nomogram was built with the nomogram function of the R package rms to predict the 1- to 5-year overall survival of patients. As shown in fig. 5, the left side of the nomogram lists the variables of the prediction model (differential microorganism risk value, disease stage (i.e., tumor stage) and age), and the line segment next to each variable represents its value range. Each variable value corresponds to a score (see Table 4); the total score is the sum of the scores of all variables and maps to the patient's 1- to 5-year survival rates (see Tables 5-9).
TABLE 4 Score values for the variables in the nomogram
TABLE 5 1-year survival rates corresponding to the total score in the nomogram
TABLE 6 2-year survival rates corresponding to the total score in the nomogram
TABLE 7 3-year survival rates corresponding to the total score in the nomogram
TABLE 8 4-year survival rates corresponding to the total score in the nomogram
TABLE 9 5-year survival rates corresponding to the total score in the nomogram
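How the reported clinical coefficients enter the Cox model can be illustrated with a short Python sketch of the linear predictor and the resulting hazard ratio between two patients. The stage and age coefficients are the ones stated above; the coefficient of the differential microorganism risk value is not given in the text, so `risk_coef` is a hypothetical parameter, as are the function names and example patients.

```python
import math

# Coefficients reported in the text; stages I and II are the reference.
STAGE_COEF = {"I": 0.0, "II": 0.0, "III": 0.1851, "IV": 0.7124}
AGE_COEF = 0.0196


def linear_predictor(risk_value, stage, age, risk_coef):
    """Cox linear predictor for one patient.

    risk_coef is the coefficient of the microorganism risk value;
    it is NOT stated in the source and must be supplied by the caller.
    """
    return risk_coef * risk_value + STAGE_COEF[stage] + AGE_COEF * age


def hazard_ratio(patient, reference, risk_coef):
    """Hazard of `patient` relative to `reference`.

    Each patient is a (risk_value, stage, age) tuple.
    """
    diff = (linear_predictor(*patient, risk_coef)
            - linear_predictor(*reference, risk_coef))
    return math.exp(diff)


# Two hypothetical patients who differ only in tumor stage.
print(hazard_ratio((1.0, "IV", 65), (1.0, "II", 65), risk_coef=1.0))
```

For two patients who differ only in stage, the risk-value and age terms cancel, so the hazard ratio of stage IV against the reference is exp(0.7124), roughly 2.04, regardless of the assumed `risk_coef`.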
To evaluate and validate the performance of the nomogram (i.e., the prognosis prediction model) described above, bootstrap internal validation was performed. The Score function of the R package riskRegression was called to calculate the AUC of the model for predicting 1- to 5-year survival (the AUC is the area under the ROC curve; the closer the AUC is to 1.0, the better the model performance), and setting split. The mean AUCs for predicting 1-5 year survival obtained from cross-validation were 0.87, 0.90, 0.91, 0.92, respectively (see fig. 6).
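The meaning of the AUC can be illustrated with a minimal Python sketch. It is a deliberate simplification: survival status at a fixed time point is treated as a binary label and censoring is ignored, whereas a time-dependent AUC for survival data has to account for censored samples. The function name and toy data are hypothetical.

```python
def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    labels : 1 for samples that died before the time point, 0 otherwise
    scores : predicted risk (higher means death is more likely)
    """
    positives = [s for label, s in zip(labels, scores) if label == 1]
    negatives = [s for label, s in zip(labels, scores) if label == 0]
    if not positives or not negatives:
        raise ValueError("both classes are required")
    # Fraction of (positive, negative) pairs ranked correctly;
    # tied scores count as half a correct pair.
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))


# Hypothetical toy data: two deaths and two survivors at the time point.
print(auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1]))
```

An AUC of 0.5 corresponds to random ranking, so values near 0.9 indicate that the model ranks most deceased patients above most surviving patients.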
The above embodiments are only possible or preferred embodiments of the present invention and are not intended to limit the scope of the claims; therefore, all equivalent changes and modifications made within the claims of the present invention shall fall within the scope of the present invention.
Claims (13)
1. A method for typing muscle-invasive bladder cancer based on microbial abundance, which is not used for the purpose of disease diagnosis or treatment, characterized in that the method performs molecular typing of muscle-invasive bladder cancer according to the tissue microorganism abundance spectrum characteristics of patients with muscle-invasive bladder cancer, and divides muscle-invasive bladder cancer into four molecular subtypes, namely microorganism I type, microorganism II type, microorganism III type and microorganism IV type; the method comprises the following steps:
1) acquiring gene expression profile data and clinical information data of patients with muscle-invasive bladder cancer, and removing T1 stage samples and repeated sequencing samples of the same patient to obtain the actual analysis samples;
2) obtaining the genomes of known microorganisms, and filtering them according to quality score;
3) aligning the sequencing reads of the actual analysis samples to a human reference genome, splitting the sequencing reads that do not align to the human reference genome into sequence fragments, and mapping the sequence fragments to the microbial genomes obtained in step 2);
4) normalizing the microbial genome data obtained in step 3) to reduce the influence of sequencing depth and batch effects, and selecting the microbial abundance spectrum data of the actual analysis samples;
5) calculating the median absolute deviation (MAD) of the abundance value of each microorganism in the microorganism abundance spectrum obtained in step 4) across the actual analysis samples, and selecting different proportions of microorganisms according to the MAD to form subsets of the microorganism abundance spectrum, with microorganisms as rows and actual analysis samples as columns;
6) clustering the microorganism abundance spectrum subsets to perform microorganism typing, and classifying muscle-invasive bladder cancer into four molecular subtypes: microorganism I type, microorganism II type, microorganism III type and microorganism IV type.
2. The typing method according to claim 1, wherein the clinical information of step 1) comprises age, disease stage, disease specific survival and overall survival.
3. The typing method according to claim 1, wherein in step 2), bacteria and archaea with a quality score less than or equal to 0.8 are filtered out.
4. The typing method according to claim 1, wherein in step 4), the normalization method for reducing the influence of sequencing depth comprises:
filtering out samples lacking clinical information;
filtering out low-abundance microorganism data and performing trimmed mean of M-values (TMM) normalization;
and performing log-cpm transformation on the microbial data of each sample to eliminate the heteroscedasticity of the data.
5. The typing method according to claim 1 or 4, wherein in step 4), the normalization method for reducing the influence of batch effects comprises: taking the sample type as the biological variable, and taking the sequencing center, sequencing platform, experimental method, sample source institution, formalin fixation state and paraffin embedding state as correction variables; the effect of the biological variable is retained and the effects of the correction variables are removed; the sample types include primary tumor samples, metastatic tumor samples, normal tissue samples, blood-derived normal samples, and new tumor samples.
6. The typing method according to claim 1, wherein in step 5), the top 5%, 10%, 15%, 20%, 25%, 30%, 35%, 40%, 45%, 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95% and 100% of microorganisms ranked by median absolute deviation are selected to constitute 20 subsets with microorganisms as rows and samples as columns.
7. A method for constructing a prognosis prediction model for muscle-invasive bladder cancer based on the typing method according to any one of claims 1 to 6, which is not used for the purpose of diagnosis and treatment of diseases, comprising the steps of:
1) screening for differential microorganisms whose abundance distributions differ among the muscle-invasive bladder cancer subtypes;
2) constructing a lasso model by taking the differential microorganisms screened in step 1) and the clinical information of the muscle-invasive bladder cancer patients as independent variables and the survival time and survival status of the muscle-invasive bladder cancer patients as dependent variables, and screening out the differential microorganisms with non-zero lasso coefficients;
3) constructing a multi-factor regression model by taking the differential microorganisms screened in step 2) as independent variables and the survival time and survival status of the muscle-invasive bladder cancer patients as dependent variables, and screening out the differential microorganisms that satisfy the proportional hazards assumption of the Cox regression model;
4) extracting the regression coefficients of the differential microorganisms screened in step 3), and calculating the differential microorganism risk value of each sample;
5) constructing a prognosis prediction model by taking the differential microorganism risk value and the clinical information as variables.
9. The method according to claim 7, wherein in step 4) the differential microorganism risk value is calculated by the formula:
$$S = \sum_{i=1}^{n} \beta_i X_i$$
wherein S is the differential microorganism risk value of the sample, beta_i is the regression coefficient of the i-th differential microorganism, X_i is the abundance value of the i-th differential microorganism in the sample, i is the number of the differential microorganism, and n is the total number of the differential microorganisms screened in step 3).
10. The method of claim 7, wherein the clinical information includes disease stage and age.
11. The muscle invasive bladder cancer prognostic prediction model constructed by the method according to any one of claims 7 to 10, wherein the prognostic prediction model comprises a nomogram, the variables of which include differential microbial risk values.
12. Use of the typing method according to any one of claims 1 to 6 for the prognostic prediction of patients with muscle-invasive bladder cancer, which is not intended for the purpose of diagnosis or treatment of the disease.
13. Use of the prognostic predictive model according to claim 11 in the prognostic prediction of patients with muscle-invasive bladder cancer, which is not intended for the purpose of diagnosis and treatment of the disease.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210148697.1A CN114203256B (en) | 2022-02-18 | 2022-02-18 | MIBC typing and prognosis prediction model construction method based on microbial abundance |
Publications (2)
Publication Number | Publication Date |
---|---|
CN114203256A CN114203256A (en) | 2022-03-18 |
CN114203256B true CN114203256B (en) | 2022-06-21 |
Family
ID=80645680
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210148697.1A Active CN114203256B (en) | 2022-02-18 | 2022-02-18 | MIBC typing and prognosis prediction model construction method based on microbial abundance |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114203256B (en) |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||