CN116987789A - UTUC molecular typing, single sample classifier and construction method thereof - Google Patents
UTUC molecular typing, single sample classifier and construction method thereof
- Publication number
- CN116987789A (application number CN202310791539.2A)
- Authority
- CN
- China
- Prior art keywords
- lncrna
- subtype
- urinary tract
- urothelial cancer
- gene
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12Q—MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
- C12Q1/00—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
- C12Q1/68—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
- C12Q1/6876—Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes
- C12Q1/6883—Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for diseases caused by alterations of genetic material
- C12Q1/6886—Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for diseases caused by alterations of genetic material for cancer
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B25/00—ICT specially adapted for hybridisation; ICT specially adapted for gene or protein expression
- G16B25/10—Gene or protein expression profiling; Expression-ratio estimation or normalisation
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12Q—MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
- C12Q2600/00—Oligonucleotides characterized by their use
- C12Q2600/112—Disease subtyping, staging or classification
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12Q—MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
- C12Q2600/00—Oligonucleotides characterized by their use
- C12Q2600/118—Prognosis of disease development
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12Q—MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
- C12Q2600/00—Oligonucleotides characterized by their use
- C12Q2600/158—Expression markers
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12Q—MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
- C12Q2600/00—Oligonucleotides characterized by their use
- C12Q2600/178—Oligonucleotides characterized by their use miRNA, siRNA or ncRNA
Abstract
The invention discloses a molecular typing method for upper urinary tract urothelial cancer, which is used for carrying out molecular typing on the upper urinary tract urothelial cancer according to the lncRNA characteristics of patients suffering from the upper urinary tract urothelial cancer. The invention also discloses a single sample classifier for identifying the molecular typing of the upper urinary tract urothelial cancer and a construction method thereof. The invention utilizes transcriptome sequencing data of patients with the upper urinary tract urothelial cancer to analyze and obtain lncRNA abundance expression data of the patients, then obtains molecular typing of the upper urinary tract urothelial cancer by screening lncRNA related to prognosis and clustering, and further constructs a single sample classifier capable of identifying the molecular typing of the upper urinary tract urothelial cancer, thereby deeply analyzing the influence of the lncRNA abundance expression characteristics of the patients on prognosis from a molecular level and being beneficial to accurate identification and practical application of the molecular typing of the upper urinary tract urothelial cancer.
Description
Technical Field
The invention relates to the field of urologic oncology, in particular to upper urinary tract urothelial carcinoma (UTUC), and more particularly to molecular typing of UTUC and construction of a single sample classifier.
Background
UTUC is a type of urothelial carcinoma (UC) that occurs in the renal pelvis and ureter. In China, UTUC accounts for about 10%-30% of all UC, significantly higher than the 5%-10% reported in Western countries. UTUC shares some clinicopathological features with urothelial bladder carcinoma (UBC), but it also has distinctive features, such as insidious onset, high grade, strong invasiveness and a high recurrence rate. Current studies show that factors such as sex, age, tumor stage and grade, and lymph node metastasis may be risk factors affecting its prognosis. Compared with UBC, research on the molecular mechanisms underlying the development of UTUC is limited, and accurate biomarkers for molecular typing and prognosis are lacking.
In 2017, Moss TJ et al. of the MD Anderson Cancer Center (USA) reported in European Urology the first integrated analysis of whole-exome sequencing and RNA sequencing of 31 UTUC samples (Moss T J, Qi Y, Xi L, et al. Comprehensive genomic characterization of upper tract urothelial carcinoma [J]. European Urology, 2017, 72(4): 641-649). The whole-exome sequencing (WES) analysis showed that the genes most frequently mutated in UTUC are FGFR3, KMT2D, PIK3CA, TP53, etc. Unsupervised cluster analysis of the RNA sequencing data classified UTUC into 4 types, characterized as follows: 1) Type I: no PIK3CA mutation, no smoking history, high-grade < pT2 tumor, high recurrence; 2) Type II: 100% FGFR3 mutation, low-grade tumor, smoking history, non-muscle-invasive disease, no recurrence; 3) Type III: 100% FGFR3 mutation, 71% PIK3CA mutation, no TP53 mutation, 5 recurrences, smoking history, and tumor stage < pT2; 4) Type IV: 62.5% KMT2D mutation, 50% FGFR3 mutation, 50% TP53 mutation, no PIK3CA mutation, high-grade tumor, smoking history, carcinoma in situ, and short survival.
In 2019, Robinson BD et al. of the Department of Pathology and Laboratory Medicine, Weill Cornell Medical College, described the molecular characteristics of 37 high-grade UTUCs; integrated analysis of WES and RNA-seq data showed that the vast majority were of the luminal-papillary type. UTUC has a T-cell-depleted immune environment and highly expresses FGFR3. Furthermore, sporadic UTUC has a lower tumor mutational burden than UBC.
In 2021, the research group of Seishi Ogawa at Kyoto University, Japan, comprehensively characterized the molecular pathogenesis of UTUC through integrated analysis of gene mutations, copy number variation, DNA methylation and gene expression profiles of 199 UTUC samples. UTUC was first classified into 5 types according to the mutation status of TP53, MDM2, RAS and FGFR3: hypermutated (5.5%), TP53/MDM2 (37.7%), RAS (HRAS/KRAS/NRAS, 15.1%), FGFR3 (35.2%) and triple-negative (6.5%). In addition, five expression subtypes, C1-C5, were identified by RNA sequencing of 158 UTUC samples followed by unsupervised clustering analysis. Most of the FGFR3-mutated and hypermutated subtypes fall into the C1 expression subtype; the TP53/MDM2-mutated and triple-negative subtypes fall mainly into the C3-C5 expression subtypes; and the RAS-mutated subtype and a subset of the FGFR3-mutated subtype mostly belong to the C2 expression subtype. The authors also performed clustering analysis based on the DNA methylation status of tumor-specific CpG islands, yielding three DNA methylation subclasses.
In general, few molecular typing studies of UTUC have been carried out so far, and they are based primarily on genomic and mRNA transcriptome data. There is an urgent need to integrate omics information from other dimensions to explore more comprehensively the biological processes of disease invasion, recurrence and progression. lncRNA (long non-coding RNA) refers to non-coding RNA longer than 200 nucleotides; it is highly heterogeneous and is mainly involved in transcriptional regulation, post-transcriptional regulation, translational regulation, mediation of chromosome modification, and the like. lncRNA can be extracted from body fluids, tissues and cells, in some cases non-invasively. In recent years, lncRNA has received extensive attention and is believed to be involved in developmental processes and various diseases.
Disclosure of Invention
The first technical problem to be solved by the invention is to provide a molecular typing method for upper urinary tract urothelial cancer that can analyze the prognostic differences between UTUC patients at the lncRNA level.
To solve this technical problem, the molecular typing method provided by the invention performs molecular typing of upper urinary tract urothelial cancer according to the lncRNA characteristics of patients with upper urinary tract urothelial cancer, and comprises the following steps:
1) Obtaining tumor tissue transcriptome sequencing data and clinical information of patients with upper urinary tract urothelial cancer;
2) Aligning the sequencing data obtained in step 1) to a human reference genome, and annotating genes using the GTF gene annotation file of the corresponding reference genome version;
3) Quantifying, filtering, normalizing and log2 transforming the genes annotated in step 2);
4) Screening lncRNAs that are associated with the prognosis of upper urinary tract urothelial cancer and whose abundance expression values vary widely across all patient samples as candidate typing features;
5) Performing cluster analysis on all patient samples based on the expression matrix of the candidate typing features obtained in step 4) to obtain the optimal molecular typing result for upper urinary tract urothelial cancer.
In step 1) above, the clinical information includes progression-free survival time and progression-free survival status.
In step 3) above, the normalization preferably uses the TPM normalization method.
In step 4) above, the candidate typing features are preferably screened using, in sequence, a univariate Cox proportional hazards model, a LASSO model and the median absolute deviation.
Step 5) above is preferably performed by consensus clustering. The optimal molecular typing result classifies upper urinary tract urothelial cancer into three molecular subtypes: type I, type II and type III.
The second technical problem to be solved by the present invention is to provide a group of markers for molecular typing of upper urinary tract urothelial cancer, the group of markers comprising the 46 lncRNAs shown in Table 1 below:
TABLE 1
ENSG00000235491 | ENSG00000203706 |
ENSG00000228873 | ENSG00000283684 |
ENSG00000259439 | ENSG00000285280 |
ENSG00000226674 | ENSG00000231246 |
ENSG00000204588 | ENSG00000289326 |
ENSG00000240040 | ENSG00000125462 |
ENSG00000224165 | ENSG00000224616 |
ENSG00000289062 | ENSG00000224559 |
ENSG00000225087 | ENSG00000226780 |
ENSG00000203709 | ENSG00000229021 |
ENSG00000233593 | ENSG00000225643 |
ENSG00000175147 | ENSG00000225077 |
ENSG00000291077 | ENSG00000287670 |
ENSG00000189223 | ENSG00000289305 |
ENSG00000226994 | ENSG00000286572 |
ENSG00000227088 | ENSG00000230186 |
ENSG00000289033 | ENSG00000228971 |
ENSG00000238122 | ENSG00000287064 |
ENSG00000228794 | ENSG00000224875 |
ENSG00000228044 | ENSG00000287628 |
ENSG00000289077 | ENSG00000287305 |
ENSG00000228852 | ENSG00000231407 |
ENSG00000289483 | ENSG00000288007 |
The third technical problem to be solved by the invention is to provide a single sample classifier for classifying upper urinary tract urothelial cancer. The single sample classifier mainly comprises a storage module and a correlation calculation module. The storage module stores, for each UTUC subtype, the abundance expression center point values of all subtype-specific lncRNA features in the samples of that subtype. The correlation calculation module calculates the correlation (Pearson or Spearman) between the abundance expression values of all subtype-specific lncRNA features in the UTUC sample whose subtype is to be identified and the abundance expression center point values of each subtype stored in the storage module.
The fourth technical problem to be solved by the invention is to provide a construction method of the above-mentioned single sample classifier for upper urinary tract urothelial carcinoma, which specifically comprises the following steps:
1) Obtaining tumor tissue transcriptome sequencing data of patients with upper urinary tract urothelial cancer;
2) Aligning the sequencing data obtained in step 1) to a human reference genome, and annotating genes using the GTF gene annotation file of the corresponding reference genome version;
3) Quantifying, filtering, normalizing and log2 transforming the genes annotated in step 2);
4) For each molecular subtype, calculating the AUC value of each lncRNA obtained in step 3) for predicting that subtype, and retaining lncRNAs with an AUC value greater than 0.7 as the specific lncRNA features of the subtype;
5) Merging and de-duplicating the specific lncRNA features screened in step 4) to obtain all subtype-specific lncRNA features of upper urinary tract urothelial cancer;
6) Calculating the average abundance expression value of each specific lncRNA feature obtained in step 5) in the samples of each subtype and taking it as the abundance expression center point value of that feature, so that each subtype finally has a set of data containing the center point values of all subtype-specific lncRNA features.
The fifth technical problem to be solved by the present invention is to provide a method for typing and identifying a UTUC sample by using the single sample classifier, the method comprising the steps of:
Calculating the correlation between the abundance expression values of all subtype-specific lncRNA features in the UTUC sample whose subtype is to be identified and the set of abundance expression center point values of all subtype-specific lncRNA features corresponding to each subtype, and assigning the sample to the subtype with the highest correlation.
All subtype-specific lncRNA features of UTUC and their abundance expression center point values in samples of the three molecular subtypes (type I, type II and type III) are preferably as shown in Table 3.
The invention analyzes UTUC transcriptome sequencing data to obtain lncRNA data, then screens lncRNAs associated with UTUC prognosis and applies consensus clustering to obtain 3 lncRNA-based molecular types of UTUC. The invention further constructs a UTUC single sample classifier by screening the specific lncRNA features of each molecular subtype; the classifier can identify the lncRNA molecular type of an individual UTUC patient, thereby enabling subtype identification and prognostic stratification of UTUC patients.
Drawings
Fig. 1 is a consensus clustering diagram of the present invention when k=3.
FIG. 2 shows that three molecular types based on lncRNA of example 1 of the present invention are significantly related to progression-free survival (PFS).
Fig. 3 shows the prognostic differences between the different types after typing 403 UTUC samples in the TCGA public database using a single sample classifier in accordance with example 2 of the present invention.
Detailed Description
For a more specific understanding of the technical content, features and effects of the present invention, the technical solution of the present invention will be described in further detail with reference to the accompanying drawings and specific embodiments:
Example 1 lncRNA-based UTUC molecular typing
1. Collecting the lncRNA abundance expression data and clinical data of UTUC
Tumor tissue transcriptome sequencing (RNA-seq) data of 156 UTUC patients (dataset accession EGAD00001007667) were downloaded from the EGA public database as BAM files aligned to the human reference genome hg19. The GTF gene annotation file corresponding to hg19, gencode.v43lift37.annotation.gtf, was downloaded from the GENCODE website, and genes were annotated and quantified with the featureCounts tool. Genes whose gene type in the GTF annotation file is "lncRNA" and whose median expression abundance is greater than 0 were retained, finally yielding 7698 lncRNAs.
TPM normalization and log2 transformation were applied to the abundance expression values of the 7698 lncRNAs to obtain normalized lncRNA abundance expression values. The TPM normalization formula is as follows:
$$F_{ij} = \frac{R_{ij} \times 10^{9}}{L_{j} \times T_{i}}, \qquad \mathrm{TPM}_{ij} = \frac{F_{ij}}{\sum_{j} F_{ij}} \times 10^{6}$$
where i is the sample index, j is the gene index, $R_{ij}$ is the read count of gene j in sample i, $F_{ij}$ is the FPKM (Fragments Per Kilobase of exon model per Million mapped fragments) value of gene j in sample i, $L_{j}$ is the length of the coding region of gene j, and $T_{i}$ is the number of sequencing reads of sample i.
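The normalization step can be illustrated with a minimal Python sketch. It assumes the standard definitions FPKM = read count × 10^9 / (gene length × total reads) and TPM = FPKM / (sum of FPKM over genes) × 10^6. The function names and the pseudocount of 1 in the log2 step are illustrative assumptions and are not stated in the text.

```python
import math

def fpkm(counts, lengths, total_reads):
    """FPKM for one sample: counts and lengths are per-gene lists."""
    return [c * 1e9 / (l * total_reads) for c, l in zip(counts, lengths)]

def tpm_from_fpkm(fpkm_values):
    """Rescale FPKM values of one sample so that they sum to one million."""
    total = sum(fpkm_values)
    if total == 0:
        return [0.0] * len(fpkm_values)
    return [f / total * 1e6 for f in fpkm_values]

def log2_transform(values, pseudocount=1.0):
    """log2(x + pseudocount); the pseudocount is an assumption to handle zeros."""
    return [math.log2(v + pseudocount) for v in values]

# Toy sample with three genes (hypothetical numbers).
counts = [10, 20, 30]
lengths = [1000, 2000, 3000]
tpm = tpm_from_fpkm(fpkm(counts, lengths, total_reads=60))
log_tpm = log2_transform(tpm)
```

Because each TPM value is a ratio to the per-sample FPKM total, the total read count cancels out, and the TPM values of every sample sum to one million.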
In addition, clinical information corresponding to the 156 patients described above, including time to progression free survival and state of progression free survival, was collected.
2. Selection of lncRNA associated with prognosis of UTUC
The 7698 lncRNAs were screened using, in sequence, a univariate Cox proportional hazards model, a LASSO model and the median absolute deviation (MAD). First, lncRNAs with a p value less than 0.05 in the univariate Cox proportional hazards analysis were retained. Next, the LASSO model analysis was repeated 100 times, and lncRNAs with a non-zero coefficient in more than 60 of the runs were selected; these lncRNAs are considered to be significantly associated with the progression-free survival of the patients. Finally, the MAD of each prognosis-associated lncRNA was calculated, the lncRNAs were ranked by MAD from largest to smallest, and the top 50% were selected as candidate typing features. This yielded 46 lncRNAs that are associated with prognosis and show large variation in abundance expression (their gene IDs in the Ensembl database are shown in Table 1).
TABLE 1. 46 lncRNAs associated with UTUC prognosis and showing large variation in abundance expression
ENSG00000235491 | ENSG00000203706 |
ENSG00000228873 | ENSG00000283684 |
ENSG00000259439 | ENSG00000285280 |
ENSG00000226674 | ENSG00000231246 |
ENSG00000204588 | ENSG00000289326 |
ENSG00000240040 | ENSG00000125462 |
ENSG00000224165 | ENSG00000224616 |
ENSG00000289062 | ENSG00000224559 |
ENSG00000225087 | ENSG00000226780 |
ENSG00000203709 | ENSG00000229021 |
ENSG00000233593 | ENSG00000225643 |
ENSG00000175147 | ENSG00000225077 |
ENSG00000291077 | ENSG00000287670 |
ENSG00000189223 | ENSG00000289305 |
ENSG00000226994 | ENSG00000286572 |
ENSG00000227088 | ENSG00000230186 |
ENSG00000289033 | ENSG00000228971 |
ENSG00000238122 | ENSG00000287064 |
ENSG00000228794 | ENSG00000224875 |
ENSG00000228044 | ENSG00000287628 |
ENSG00000289077 | ENSG00000287305 |
ENSG00000228852 | ENSG00000231407 |
ENSG00000289483 | ENSG00000288007 |
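The last two screening stages (selection frequency across repeated LASSO runs, then ranking by median absolute deviation) can be sketched in plain Python. This is an illustration under assumptions: the Cox and LASSO fits themselves are not shown, each LASSO run is represented only by the set of features that received a non-zero coefficient, and the function names, the strict "more than" threshold and the rounding down for an odd number of features are hypothetical choices.

```python
from statistics import median

def stable_features(selection_runs, min_runs):
    """Features with a non-zero coefficient in more than min_runs of the runs."""
    counts = {}
    for selected in selection_runs:
        for feature in selected:
            counts[feature] = counts.get(feature, 0) + 1
    return sorted(f for f, n in counts.items() if n > min_runs)

def mad(values):
    """Median absolute deviation of a list of expression values."""
    m = median(values)
    return median([abs(v - m) for v in values])

def top_half_by_mad(expression):
    """Keep the top 50% of features ranked by MAD, largest first.

    expression maps a feature ID to its values across samples.
    """
    ranked = sorted(expression, key=lambda f: mad(expression[f]), reverse=True)
    return ranked[: len(ranked) // 2]

# Toy data (hypothetical feature names and values).
runs = [{"x", "y"}, {"x"}, {"x", "z"}]
kept = stable_features(runs, min_runs=1)
expr = {"a": [1, 1, 1, 1], "b": [0, 5, 10, 15], "c": [2, 3, 2, 3], "d": [0, 100, 0, 100]}
candidates = top_half_by_mad(expr)
```

With 100 runs and the threshold described in the text, the call would be `stable_features(runs, min_runs=60)`.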
3. Consensus clustering to obtain molecular typing
Based on the 46 screened lncRNAs associated with UTUC prognosis, consensus clustering was performed on all patient samples using the CancerSubtypes package, with the clustering algorithm set to "pam", the distance measure set to "pearson" and the number of clusters set to 2-4. The average silhouette coefficient of the samples was calculated for k = 2 to 4, and the 3-cluster solution (k=3), which had the largest average silhouette coefficient, was taken as the optimal clustering result. This gave the final lncRNA-based UTUC molecular typing result (shown in Fig. 1), in which 61 patients belong to type I, 34 patients to type II and 61 patients to type III.
4. Validation of the prognostic stratification ability of the molecular typing
Based on the UTUC molecular typing results obtained, univariate Kaplan-Meier survival analysis was used to evaluate the prognostic differences among the three types. As shown in Fig. 2, type I has the best prognosis and type III the worst, and the log-rank test p value among the three types is less than 0.001. This indicates that progression-free survival differs significantly between UTUC patients of different types and that the molecular typing of this example can distinguish the prognostic risk of UTUC patients.
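The choice of k by average silhouette coefficient can be illustrated with the following sketch. It does not reproduce the consensus clustering itself; it only scores a given labeling. Using 1 minus the Pearson correlation as the distance, and scoring singleton clusters as 0, are assumptions made for this example.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length numeric lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0
    return cov / (sx * sy)

def average_silhouette(samples, labels):
    """Mean silhouette coefficient using distance = 1 - Pearson correlation."""
    n = len(samples)
    dist = [[1.0 - pearson(samples[i], samples[j]) for j in range(n)] for i in range(n)]
    total = 0.0
    for i in range(n):
        mean_to = {}
        for c in set(labels):
            members = [j for j in range(n) if labels[j] == c and j != i]
            if members:
                mean_to[c] = sum(dist[i][j] for j in members) / len(members)
        others = [v for c, v in mean_to.items() if c != labels[i]]
        if labels[i] not in mean_to or not others:
            continue  # singleton cluster or single cluster: contributes 0
        a, b = mean_to[labels[i]], min(others)
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / n

# Toy profiles: the first two rise, the last two fall (hypothetical values).
profiles = [[1, 2, 3, 4], [2, 4, 6, 8.5], [4, 3, 2, 1], [8, 6, 4, 2.5]]
good = average_silhouette(profiles, [0, 0, 1, 1])
bad = average_silhouette(profiles, [0, 1, 0, 1])
```

In practice the labeling for each k would come from the clustering step, and the k whose labeling has the largest average silhouette would be kept.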
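For reference, the Kaplan-Meier estimate behind curves such as those in Fig. 2 can be sketched as below. This is a generic product-limit estimator for a single group, written for illustration; the log-rank comparison between groups is not shown, and the function name and input encoding (1 = progression observed, 0 = censored) are assumptions.

```python
def kaplan_meier(times, events):
    """Product-limit survival estimate.

    times: follow-up times; events: 1 if progression was observed, 0 if censored.
    Returns a list of (time, survival probability) at each distinct event time.
    """
    order = sorted(zip(times, events))
    at_risk = len(order)
    survival = 1.0
    curve = []
    i = 0
    while i < len(order):
        t = order[i][0]
        observed = 0
        removed = 0
        # Group all subjects sharing the same follow-up time.
        while i < len(order) and order[i][0] == t:
            observed += order[i][1]
            removed += 1
            i += 1
        if observed > 0:
            survival *= 1.0 - observed / at_risk
            curve.append((t, survival))
        at_risk -= removed
    return curve

# Toy cohort (hypothetical): the subject at time 2 is censored.
curve = kaplan_meier([1, 2, 3, 4], [1, 0, 1, 1])
```

Running the estimator separately on the samples of each subtype gives one curve per subtype.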
Example 2 UTUC typing identification
1. Construction of a single sample classifier based on the molecular typing labels
(1) Screening for subtype-specific lncRNA signatures
Using the 7698 lncRNAs from the 156 samples collected in example 1, the AUC value of each lncRNA for predicting each UTUC molecular subtype is calculated with the auc function of the R package pROC, and lncRNAs with an AUC value greater than 0.7 are retained as the specific features of that subtype. Type I yields 7 specific lncRNA features, type II yields 148 and type III yields 6; 3 of these features appear in two subtypes, so after all specific lncRNA features are merged and de-duplicated, 158 subtype-specific lncRNA features are finally obtained (their gene IDs in the Ensembl database are shown in table 2).
Table 2 Specific lncRNA features of the 3 UTUC subtypes
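The one-versus-rest AUC screening described in step (1) can be sketched in Python as follows. This is an illustration under stated assumptions rather than the pROC computation itself: the AUC is obtained from the Mann-Whitney statistic, higher expression is assumed to indicate membership of the target subtype (pROC can choose the comparison direction automatically, so its values may differ), and the names `auc` and `subtype_specific_features` are hypothetical.

```python
def auc(scores, labels):
    """AUC of `scores` for binary `labels` (1 = target subtype).

    Computed as the Mann-Whitney statistic; ties count one half.
    Assumes higher expression points to the target subtype.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

def subtype_specific_features(expr, subtypes, threshold=0.7):
    """Screen features per subtype, then merge and de-duplicate them.

    expr:     {lncRNA id: [expression value per sample]}
    subtypes: [subtype label per sample], same sample order as expr
    """
    per_subtype = {}
    for target in sorted(set(subtypes)):
        labels = [1 if s == target else 0 for s in subtypes]
        per_subtype[target] = [gene for gene, values in expr.items()
                               if auc(values, labels) > threshold]
    # A set removes features that were selected for more than one subtype.
    merged = sorted({gene for genes in per_subtype.values() for gene in genes})
    return per_subtype, merged
```

With the counts reported in this example (7, 148 and 6 features, 3 of them shared), the merged list would contain 158 entries.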
(2) Calculating the abundance expression central point value of each subtype-specific lncRNA feature
The average expression value of each of the 158 specific lncRNA features is calculated separately in the samples of each of the three subtypes and taken as the central point value of that feature. Each subtype thus obtains a set of data containing the central point values of all specific lncRNA features (see table 3), and this set of data can be used for UTUC single sample classification.
Table 3 Central point values of the 158 subtype-specific lncRNA features
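The central point calculation of step (2) amounts to a per-subtype mean of each feature. A minimal Python sketch, assuming the expression matrix is held as a dictionary of per-sample lists and using the hypothetical function name `subtype_centroids`, could look like this:

```python
def subtype_centroids(expr, subtypes, features):
    """Mean expression of every feature within every subtype.

    expr:     {lncRNA id: [expression value per sample]}
    subtypes: [subtype label per sample], same sample order as expr
    features: the subtype-specific lncRNA ids to summarise
    Returns {subtype: {lncRNA id: central point value}}.
    """
    centroids = {}
    for target in sorted(set(subtypes)):
        # Indices of the samples that belong to this subtype.
        idx = [i for i, s in enumerate(subtypes) if s == target]
        centroids[target] = {
            gene: sum(expr[gene][i] for i in idx) / len(idx)
            for gene in features
        }
    return centroids
```

The returned dictionary plays the role of table 3: one set of central point values per subtype, covering all subtype-specific features.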
2. UTUC molecular typing identification of new samples
A urothelial cancer data set was downloaded from the TCGA public database to verify the performance of the single sample classifier constructed in this example. It comprises 403 urothelial cancer samples with a complete gene abundance expression matrix and clinical information (including overall survival time). For each sample, the Pearson correlation is calculated between its abundance expression values for the 158 subtype-specific lncRNA features and the central point values of those features for each subtype in table 3, and the sample is assigned to the subtype with the highest correlation. In total, 111 samples were identified as type I, 180 as type II and 112 as type III.
Based on the obtained molecular typing results, the prognostic differences among the three types are assessed by univariate Kaplan-Meier survival analysis. As shown in fig. 3, type I has the best prognosis and type III the worst, and the log-rank test p value among the three types is less than 0.05. This is consistent with the trend seen in the earlier typing and indicates that the single sample classifier of the present embodiment can identify the UTUC molecular type of a new sample.
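The assignment rule used for new samples is a nearest-centroid classification by Pearson correlation, which can be sketched in Python as follows. The data layout (dictionaries keyed by lncRNA id) and the names `pearson` and `classify_sample` are assumptions made for illustration; the sketch also assumes that neither the sample nor any centroid is constant across the features.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def classify_sample(sample, centroids):
    """Assign a sample to the subtype whose centroid correlates best.

    sample:    {lncRNA id: abundance expression value}
    centroids: {subtype: {lncRNA id: central point value}}
    Returns (best subtype, {subtype: correlation}).
    """
    features = sorted(sample)  # fixed feature order for both vectors
    x = [sample[g] for g in features]
    scores = {
        subtype: pearson(x, [center[g] for g in features])
        for subtype, center in centroids.items()
    }
    return max(scores, key=scores.get), scores
```

Because only the correlation with each stored centroid is needed, a single new sample can be typed without re-clustering the original cohort.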
The foregoing embodiments are merely examples of possible or preferred embodiments of the present invention, which are not intended to limit the scope of the present invention, and therefore, all equivalent changes and modifications that are consistent with the scope of the present invention shall fall within the scope of the present invention.
Claims (18)
1. A method for molecular typing of urothelial cancer in an upper urinary tract, said method not being used for the diagnosis and treatment of diseases, characterized in that the molecular typing of urothelial cancer is performed on the basis of lncRNA characteristics of patients with urothelial cancer.
2. The molecular typing method according to claim 1, wherein the method comprises the steps of:
1) Obtaining tumor tissue transcriptome sequencing data and clinical information of a patient with the urothelial cancer;
2) Comparing the sequencing data obtained in the step 1) to a human reference genome, and annotating genes by using GTF gene annotation files of corresponding versions of the reference genome;
3) Quantifying, filtering, normalizing and log2 transforming the genes annotated in step 2);
4) Screening lncRNAs that are related to the prognosis of upper urinary tract urothelial cancer and whose abundance expression values vary widely across all patient samples as candidate typing features;
5) Carrying out cluster analysis on all patient samples based on the expression matrix of the candidate typing features obtained in step 4) to obtain the optimal molecular typing result of upper urinary tract urothelial cancer.
3. The molecular typing method of claim 2, wherein step 1) the clinical information includes progression free survival time and progression free survival status.
4. The molecular typing method of claim 2, wherein in step 3), the filtering comprises: retaining genes whose gene type in the GTF gene annotation file is "lncRNA" and whose median expression abundance is greater than 0.
5. The molecular typing method of claim 2, wherein in step 3), the normalization process uses a TPM normalization method, and a TPM normalization conversion formula is:
wherein i is the sample number, j is the gene number, $R_{ij}$ is the read count value of gene j in sample i, $F_{ij}$ is the FPKM value of gene j in sample i, $L_j$ is the length of the coding region of gene j, and $T_i$ is the number of sequencing reads of sample i.
6. The molecular typing method according to claim 2, wherein in step 4), the typing features are screened sequentially using a univariate Cox proportional hazards model, a LASSO model and the median absolute deviation.
7. The method of molecular typing according to claim 6, wherein the screening method comprises: selecting lncRNAs with a p value less than 0.05 in the univariate Cox proportional hazards model analysis; running the LASSO model analysis 100 times and retaining lncRNAs with non-zero coefficients in more than 60 of the runs; then calculating the median absolute deviation of these lncRNAs, ranking them from largest to smallest, and selecting the lncRNAs in the top 50% by median absolute deviation as candidate typing features.
8. The method of molecular typing according to any one of claims 2, 6 or 7, wherein in step 4), the typing profile comprises 46 lncRNA as shown in table 1:
TABLE 1
9. The molecular typing method of claim 2, wherein step 5) employs consensus cluster analysis and, by calculating the average silhouette coefficient, determines the clustering with the highest average silhouette coefficient as the optimal clustering result.
10. The method of molecular typing according to claim 2, wherein in step 5), the optimal molecular typing result classifies upper urinary tract urothelial cancer into three molecular subtypes: type I, type II and type III.
11. An upper urinary tract urothelial cancer molecular typing marker, comprising 46 lncRNA as shown in table 1:
TABLE 1
12. A single sample classifier for upper urinary tract urothelial cancer typing, characterized by comprising a storage module and a correlation calculation module, wherein the storage module stores the abundance expression central point values of all subtype-specific lncRNA features of UTUC in the samples of each UTUC subtype; and the correlation calculation module is used for calculating the correlation between the abundance expression values of all subtype-specific lncRNA features in a UTUC sample whose subtype is to be identified and the abundance expression central point values of the specific lncRNA features of each subtype stored by the storage module.
13. The single sample classifier of claim 12, wherein the subtype specific lncRNA signature and its abundance expression center point values in three molecular subtype samples of upper urinary tract urothelial cancer are shown in table 3:
TABLE 3
14. A method of constructing a single sample classifier for upper urinary tract urothelial carcinoma according to claim 12 or 13, said method not being used for disease diagnosis and treatment purposes, comprising the steps of:
1) Obtaining tumor tissue transcriptome sequencing data of a patient with the upper urinary tract urothelial cancer;
2) Comparing the sequencing data obtained in the step 1) to a human reference genome, and annotating genes by using GTF gene annotation files of corresponding versions of the reference genome;
3) Quantifying, filtering, normalizing and log2 transforming the genes annotated in step 2);
4) For each molecular subtype, calculating the AUC value of each lncRNA obtained in step 3) for predicting that subtype, and retaining lncRNAs with an AUC value greater than 0.7 as the specific lncRNA features of the subtype;
5) Merging and de-duplicating the specific lncRNA features screened in step 4) to obtain all subtype-specific lncRNA features of upper urinary tract urothelial cancer;
6) Calculating the average abundance expression value of each specific lncRNA characteristic obtained in the step 5) in each subtype sample, and taking the average abundance expression value as an abundance expression central point value of the specific lncRNA characteristic, and finally obtaining a group of data containing all subtype specific lncRNA characteristic central point values for each subtype.
15. The method of claim 14, wherein in step 3), the filtering comprises: retaining genes whose gene type in the GTF gene annotation file is "lncRNA" and whose median expression abundance is greater than 0.
16. The method of claim 14, wherein in step 3), the normalization process uses a TPM normalization method, and the TPM normalization conversion formula is:
wherein i is the sample number, j is the gene number, $R_{ij}$ is the read count value of gene j in sample i, $F_{ij}$ is the FPKM value of gene j in sample i, $L_j$ is the length of the coding region of gene j, and $T_i$ is the number of sequencing reads of sample i.
17. The method of claim 14, wherein step 6) wherein the set of data for each subtype comprising the central point values of all subtype-specific lncRNA signatures is set forth in table 3:
TABLE 3
18. A method for the typing of an upper urinary tract urothelial cancer sample using the single sample classifier of claim 12 or 13, said method not being used for diagnostic and therapeutic purposes of the disease, comprising the steps of:
calculating the correlation between the abundance expression values of all subtype-specific lncRNA features in the upper urinary tract urothelial cancer sample of which the subtype types need to be identified and a group of abundance expression central point values containing all subtype-specific lncRNA features corresponding to each subtype, and classifying the sample into the subtype corresponding to the highest correlation.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310791539.2A CN116987789A (en) | 2023-06-30 | 2023-06-30 | UTUC molecular typing, single sample classifier and construction method thereof |
Publications (1)
Publication Number | Publication Date |
---|---|
CN116987789A (en) | 2023-11-03 |
Family
ID=88533034
Family Applications (1)
Application Number | Status | Publication | Priority Date | Filing Date | Title |
---|---|---|---|---|---|
CN202310791539.2A | Pending | CN116987789A (en) | 2023-06-30 | 2023-06-30 | UTUC molecular typing, single sample classifier and construction method thereof |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116987789A (en) |
Patent Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103558395A (en) * | 2013-10-28 | 2014-02-05 | 深圳市第二人民医院 | Application of SMAD3 gene in detection of upper tract urothelial carcinomas |
US20190256924A1 (en) * | 2017-08-07 | 2019-08-22 | The Johns Hopkins University | Methods and materials for assessing and treating cancer |
CN111868260A (en) * | 2017-08-07 | 2020-10-30 | 约翰斯霍普金斯大学 | Methods and materials for assessing and treating cancer |
US20210388450A1 (en) * | 2018-10-29 | 2021-12-16 | Samsung Life Public Welfare Foundation | Biomarker panel for determining molecular subtype of lung cancer, and use thereof |
US20230126920A1 (en) * | 2019-11-08 | 2023-04-27 | Beijing Institute of Genomics, Chinese Academy of Sciences (China National Center for Bioinformation | Method and device for classification of urine sediment genomic dna, and use of urine sediment genomic dna |
CN114203256A (en) * | 2022-02-18 | 2022-03-18 | 上海仁东医学检验所有限公司 | MIBC typing and prognosis prediction model construction method based on microbial abundance |
CN114582425A (en) * | 2022-03-14 | 2022-06-03 | 上海交通大学医学院附属仁济医院 | NMIBC prognosis prediction molecular marker, screening method and modeling method |
Non-Patent Citations (2)
Title |
---|
Tyler J Moss et al.: "Comprehensive Genomic Characterization of Upper Tract Urothelial Carcinoma", EUR UROL, vol. 72, no. 4, 7 June 2017 (2017-06-07), pages 641 - 649, XP085198501, DOI: 10.1016/j.eururo.2017.05.048 * |
李幸达; 孙卫兵; 蒋葵: "Expression and clinical significance of GATA3 in upper tract urothelial carcinoma", 临床泌尿外科杂志 (Journal of Clinical Urology), no. 02, 6 February 2020 (2020-02-06), pages 31 - 35 * |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||