CN116987789A - UTUC molecular typing, single sample classifier and construction method thereof - Google Patents


Publication number
CN116987789A
Authority
CN
China
Prior art keywords
lncrna
subtype
urinary tract
urothelial cancer
gene
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310791539.2A
Other languages
Chinese (zh)
Inventor
金鸽
赵婷婷
徐小红
曹建军
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Rendong Medical Laboratory Co ltd
Original Assignee
Shanghai Rendong Medical Laboratory Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Rendong Medical Laboratory Co ltd
Priority to CN202310791539.2A
Publication of CN116987789A


Classifications

    • C: CHEMISTRY; METALLURGY
    • C12: BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q: MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00: Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68: Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6876: Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes
    • C12Q1/6883: Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for diseases caused by alterations of genetic material
    • C12Q1/6886: Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for diseases caused by alterations of genetic material for cancer
    • G: PHYSICS
    • G16: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B: BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B25/00: ICT specially adapted for hybridisation; ICT specially adapted for gene or protein expression
    • G16B25/10: Gene or protein expression profiling; Expression-ratio estimation or normalisation
    • C12Q2600/00: Oligonucleotides characterized by their use
    • C12Q2600/112: Disease subtyping, staging or classification
    • C12Q2600/118: Prognosis of disease development
    • C12Q2600/158: Expression markers
    • C12Q2600/178: Oligonucleotides characterized by their use miRNA, siRNA or ncRNA

Landscapes

  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Chemical & Material Sciences (AREA)
  • Genetics & Genomics (AREA)
  • Proteomics, Peptides & Aminoacids (AREA)
  • Organic Chemistry (AREA)
  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Pathology (AREA)
  • Analytical Chemistry (AREA)
  • Zoology (AREA)
  • Biophysics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Biotechnology (AREA)
  • Immunology (AREA)
  • Molecular Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Wood Science & Technology (AREA)
  • Microbiology (AREA)
  • Biochemistry (AREA)
  • Oncology (AREA)
  • General Engineering & Computer Science (AREA)
  • Hospice & Palliative Care (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Medical Informatics (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Theoretical Computer Science (AREA)
  • Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)

Abstract

The invention discloses a molecular typing method for upper urinary tract urothelial cancer (UTUC), which types UTUC according to the lncRNA characteristics of UTUC patients. The invention also discloses a single sample classifier for identifying the molecular type of UTUC and a method for constructing it. Transcriptome sequencing data of UTUC patients are analyzed to obtain lncRNA abundance expression data; prognosis-related lncRNAs are then screened and clustered to obtain the molecular typing of UTUC, and a single sample classifier capable of identifying that typing is constructed. The invention thereby analyzes, at the molecular level, how the lncRNA abundance expression characteristics of patients affect prognosis, and facilitates accurate identification and practical application of UTUC molecular typing.

Description

UTUC molecular typing, single sample classifier and construction method thereof
Technical Field
The invention relates to the field of urologic oncology, in particular to upper urinary tract urothelial carcinoma (UTUC), and more particularly to molecular typing of UTUC and construction of a single sample classifier.
Background
UTUC is a type of urothelial carcinoma (UC) that occurs in the renal pelvis and ureter. In China, UTUC accounts for about 10%-30% of all UC, significantly higher than the 5%-10% reported in Western countries. UTUC shares some clinicopathological features with urothelial bladder cancer (UBC), but it also has distinctive features such as insidious onset, high grade, strong invasiveness and a high recurrence rate. Current studies show that factors such as sex, age, tumor stage and grade, and lymph node metastasis may be risk factors affecting its prognosis. Compared with UBC, research on the molecular mechanisms of UTUC development is limited, and accurate biomarkers for molecular typing and prognosis are also lacking.
In 2017, Moss TJ et al. of the MD Anderson Cancer Center (USA) reported in European Urology the first integrated analysis of whole-exome sequencing and RNA sequencing of 31 UTUC samples (Moss T J, Qi Y, Xi L, et al. Comprehensive genomic characterization of upper tract urothelial carcinoma [J]. European Urology, 2017, 72(4): 641-649). The whole-exome sequencing (WES) analysis showed that the genes most frequently mutated in UTUC include FGFR3, KMT2D, PIK3CA and TP53. Unsupervised cluster analysis of the RNA sequencing data classified UTUC into four types, characterized as follows: 1) Type I: no PIK3CA mutation, no smoking history, high-grade <pT2 tumors, high recurrence; 2) Type II: 100% FGFR3 mutation, low-grade tumors, smoking history, non-muscle-invasive disease, no recurrence; 3) Type III: 100% FGFR3 mutation, 71% PIK3CA mutation, no TP53 mutation, five recurrences, smoking history, tumor stage <pT2; 4) Type IV: 62.5% KMT2D mutation, 50% FGFR3 mutation, 50% TP53 mutation, no PIK3CA mutation, high-grade tumors, smoking history, carcinoma in situ, and short survival.
In 2019, Robinson BD et al. of the Department of Pathology and Laboratory Medicine, Weill Cornell Medical College, described the molecular characteristics of 37 high-grade UTUCs; integrated analysis of WES and RNA-seq data showed that the vast majority were of the luminal-papillary subtype. UTUC has a T-cell-depleted immune microenvironment and expresses FGFR3 at high levels. Furthermore, sporadic UTUC has a lower tumor mutational burden than UBC.
In 2021, the research group of Seishi Ogawa at Kyoto University, Japan, comprehensively characterized the molecular pathogenesis of UTUC through integrative analysis of gene mutations, copy number variation, DNA methylation and gene expression profiles of 199 UTUC samples. UTUC was first classified into five subtypes according to the mutation status of TP53, MDM2, RAS and FGFR3: hypermutated (5.5%), TP53/MDM2 (37.7%), RAS (HRAS/KRAS/NRAS, 15.1%), FGFR3 (35.2%) and triple-negative (6.5%). In addition, five expression subtypes, C1-C5, were identified by RNA sequencing of 158 UTUC samples followed by unsupervised clustering analysis. Most of the FGFR3-mutated and hypermutated subtypes fall into expression subtype C1; the TP53/MDM2-mutated and triple-negative subtypes fall mainly into C3-C5; and most of the RAS-mutated subtype, together with a subset of the FGFR3-mutated subtype, belongs to C2. The authors also performed clustering analysis based on the DNA methylation status of tumor-specific CpG islands, obtaining three DNA methylation subclasses.
In general, few molecular typing studies of UTUC have been carried out so far, and they are based primarily on genomic and mRNA transcriptome data. There is an urgent need to integrate omics information from other dimensions to explore more comprehensively the biological processes of invasion, recurrence and progression of the disease. lncRNA (long non-coding RNA) refers to non-coding RNA more than 200 nucleotides in length; it is highly heterogeneous and is mainly involved in transcriptional regulation, post-transcriptional regulation, translational regulation, mediation of chromosome modification, and the like. lncRNA can be extracted non-invasively from body fluids, tissues and cells. In recent years, lncRNA has received extensive attention and is believed to be involved in developmental processes and various diseases.
Disclosure of Invention
The first technical problem to be solved by the invention is to provide a molecular typing method for upper urinary tract urothelial cancer, which can analyze the prognosis differences between UTUC patients at the lncRNA level.
To solve this technical problem, the molecular typing method provided by the invention types upper urinary tract urothelial cancer according to the lncRNA characteristics of patients with upper urinary tract urothelial cancer, and comprises the following steps:
1) Obtaining tumor tissue transcriptome sequencing data and clinical information of patients with the urothelial cancer;
2) Aligning the sequencing data obtained in step 1) to a human reference genome, and annotating genes with the GTF gene annotation file of the corresponding version of the reference genome;
3) Quantifying, filtering, normalizing and log2 transforming the genes annotated in step 2);
4) Screening lncRNAs that are related to the prognosis of upper urinary tract urothelial cancer and whose abundance expression values vary widely across all patient samples as candidate typing features;
5) Performing cluster analysis on all patient samples based on the expression matrix of the candidate typing features obtained in step 4) to obtain the optimal molecular typing result of the upper urinary tract urothelial cancer.
In step 1) above, the clinical information includes the progression-free survival time and the progression-free survival status.
In step 3) above, the normalization preferably uses the TPM normalization method.
In step 4) above, the typing features are preferably screened sequentially with a univariate Cox proportional hazards model, a LASSO model and the median absolute deviation (MAD).
In step 5) above, the cluster analysis is preferably performed by consensus clustering. The optimal molecular typing result classifies the upper urinary tract urothelial cancer into three molecular subtypes: type I, type II and type III.
The second technical problem to be solved by the present invention is to provide a group of markers for molecular typing of upper urinary tract urothelial cancer, the group of markers comprising the 46 lncRNAs shown in table 1 below:
TABLE 1
ENSG00000235491 ENSG00000203706
ENSG00000228873 ENSG00000283684
ENSG00000259439 ENSG00000285280
ENSG00000226674 ENSG00000231246
ENSG00000204588 ENSG00000289326
ENSG00000240040 ENSG00000125462
ENSG00000224165 ENSG00000224616
ENSG00000289062 ENSG00000224559
ENSG00000225087 ENSG00000226780
ENSG00000203709 ENSG00000229021
ENSG00000233593 ENSG00000225643
ENSG00000175147 ENSG00000225077
ENSG00000291077 ENSG00000287670
ENSG00000189223 ENSG00000289305
ENSG00000226994 ENSG00000286572
ENSG00000227088 ENSG00000230186
ENSG00000289033 ENSG00000228971
ENSG00000238122 ENSG00000287064
ENSG00000228794 ENSG00000224875
ENSG00000228044 ENSG00000287628
ENSG00000289077 ENSG00000287305
ENSG00000228852 ENSG00000231407
ENSG00000289483 ENSG00000288007
The third technical problem to be solved by the invention is to provide a single sample classifier for typing upper urinary tract urothelial cancer. The single sample classifier mainly comprises a storage module and a correlation calculation module. The storage module stores, for each UTUC subtype, the abundance expression center point values of all subtype-specific lncRNA features of UTUC in the samples of that subtype. The correlation calculation module calculates the correlation (which can be the Pearson correlation or the Spearman correlation) between the abundance expression values of all subtype-specific lncRNA features in the UTUC sample whose subtype is to be identified and the abundance expression center point values of those features for each subtype stored in the storage module.
The fourth technical problem to be solved by the invention is to provide a construction method of the above-mentioned single sample classifier for upper urinary tract urothelial carcinoma, which specifically comprises the following steps:
1) Obtaining tumor tissue transcriptome sequencing data of patients with the upper urinary tract urothelial cancer;
2) Aligning the sequencing data obtained in step 1) to a human reference genome, and annotating genes with the GTF gene annotation file of the corresponding version of the reference genome;
3) Quantifying, filtering, normalizing and log2 transforming the genes annotated in step 2);
4) For each molecular subtype, calculating the AUC value of each lncRNA obtained in step 3) for predicting that subtype, and retaining the lncRNAs with an AUC value greater than 0.7 as the specific lncRNA features of the subtype;
5) Merging and deduplicating the specific lncRNA features screened in step 4) to obtain all subtype-specific lncRNA features of the upper urinary tract urothelial cancer;
6) Calculating the average abundance expression value of each specific lncRNA feature obtained in step 5) in the samples of each subtype and taking it as the abundance expression center point value of that feature, so that each subtype finally obtains a set of data containing the center point values of all subtype-specific lncRNA features.
The fifth technical problem to be solved by the present invention is to provide a method for typing and identifying a UTUC sample by using the single sample classifier, the method comprising the steps of:
calculating the correlation between the abundance expression values of all subtype-specific lncRNA features in the UTUC sample whose subtype is to be identified and, for each subtype, the set of abundance expression center point values of all subtype-specific lncRNA features, and classifying the sample into the subtype with the highest correlation.
All subtype-specific lncRNA features of UTUC and their abundance expression center point values in the samples of the three UTUC molecular subtypes (type I, type II and type III) are preferably as shown in table 3:
TABLE 3
The invention analyzes transcriptome sequencing data of UTUC to obtain lncRNA data, and then screens lncRNAs related to UTUC prognosis for consensus clustering to obtain three lncRNA-based molecular types of UTUC. The invention further constructs a UTUC single sample classifier by screening the specific lncRNA features of each molecular subtype; the classifier can identify the lncRNA molecular type of a UTUC patient, thereby enabling subtype identification and prognostic stratification of UTUC patients.
Drawings
Fig. 1 is the consensus clustering diagram of the present invention when k=3.
FIG. 2 shows that three molecular types based on lncRNA of example 1 of the present invention are significantly related to progression-free survival (PFS).
Fig. 3 shows the prognostic differences between the different types after typing 403 UTUC samples in the TCGA public database using a single sample classifier in accordance with example 2 of the present invention.
Detailed Description
For a more specific understanding of the technical content, features and effects of the present invention, the technical solution of the present invention will be described in further detail with reference to the accompanying drawings and specific embodiments:
Example 1 lncRNA-based UTUC molecular typing
1. Collecting the lncRNA abundance expression data and clinical data of UTUC
Tumor tissue transcriptome sequencing (RNA-seq) data of 156 UTUC patients (dataset ID EGAD00001007667) were downloaded from the EGA public database as BAM files aligned to the human reference genome hg19. The GTF gene annotation file corresponding to hg19, gencode.v43lift37.annotation.gtf, was downloaded from the GENCODE website, and genes were annotated and quantified with the featureCounts tool. Genes whose gene type in the GTF annotation file is "lncRNA" and whose median expression abundance is greater than 0 were retained, finally yielding 7698 lncRNAs.
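The filtering rule in the preceding paragraph can be sketched as follows. This is a minimal illustration, not the pipeline used in the example: the dictionary layout, the function name `filter_lncrna` and the toy gene IDs and counts are all assumptions made here.

```python
from statistics import median

def filter_lncrna(counts, gene_types):
    """Keep genes annotated as lncRNA whose median count across samples is > 0.

    counts:     dict gene_id -> list of per-sample read counts (hypothetical layout)
    gene_types: dict gene_id -> gene_type string taken from the GTF annotation
    """
    return {
        gene: values
        for gene, values in counts.items()
        if gene_types.get(gene) == "lncRNA" and median(values) > 0
    }

# Toy data; gene IDs and counts are made up for illustration only.
counts = {
    "geneA": [0, 3, 5, 8],   # lncRNA, median 4   -> kept
    "geneB": [0, 0, 0, 7],   # lncRNA, median 0   -> dropped
    "geneC": [4, 9, 2, 6],   # not an lncRNA      -> dropped
}
gene_types = {"geneA": "lncRNA", "geneB": "lncRNA", "geneC": "protein_coding"}
kept = filter_lncrna(counts, gene_types)
```

Using the median rather than the mean means a gene expressed in only a minority of samples is discarded even if a few samples have high counts.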
TPM normalization and log2 transformation were applied to the abundance expression values of the 7698 lncRNAs to obtain normalized lncRNA abundance expression values. The TPM conversion formula is as follows:
where i is the sample index, j is the gene index, R_ij is the read count of gene j in sample i, F_ij is the FPKM (Fragments Per Kilobase of exon model per Million mapped fragments) value of gene j in sample i, L_j is the length of the coding region of gene j, and T_i is the number of sequencing reads of sample i.
In addition, the clinical information of the 156 patients described above was collected, including progression-free survival time and progression-free survival status.
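The symbols defined above match the standard conversion from read counts to FPKM and from FPKM to TPM. Assuming that standard form is what the formula expresses (with L_j in bases and T_i the total read count of sample i, and j' a summation index over all genes), it can be written as:

```latex
F_{ij} = \frac{R_{ij} \times 10^{9}}{L_{j}\, T_{i}},
\qquad
\mathrm{TPM}_{ij} = \frac{F_{ij}}{\sum_{j'} F_{ij'}} \times 10^{6}
```

Under this form the TPM values of each sample sum to 10^6, which is what makes them comparable across samples.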
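The normalization and log2 transformation described in this subsection can be sketched for a single sample as below. It assumes the standard FPKM-based TPM conversion, takes T_i as the sum of the supplied counts, and adds a pseudocount of 1 before the log2 step; the pseudocount and all function names are assumptions of this sketch, since the text does not state them.

```python
import math

def fpkm(read_count, gene_length_bp, total_reads):
    """FPKM of one gene: reads per kilobase of gene per million reads."""
    return read_count * 1e9 / (gene_length_bp * total_reads)

def tpm(counts, lengths):
    """TPM values for one sample, derived from FPKM values.

    counts:  read counts R_ij for the genes of sample i
    lengths: gene lengths L_j in bases, in the same order
    """
    total_reads = sum(counts)  # assumption: T_i taken as the sum of these counts
    f = [fpkm(r, l, total_reads) for r, l in zip(counts, lengths)]
    f_sum = sum(f)
    return [x / f_sum * 1e6 for x in f]

def log2_transform(values, pseudocount=1.0):
    """log2(x + pseudocount); the pseudocount is an assumption to avoid log2(0)."""
    return [math.log2(v + pseudocount) for v in values]

# Toy sample: counts proportional to gene length give identical TPM values.
sample_tpm = tpm([10, 20, 30], [1000, 2000, 3000])
sample_log = log2_transform(sample_tpm)
```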
2. Selection of lncRNA associated with prognosis of UTUC
The 7698 lncRNAs were screened sequentially with a univariate Cox proportional hazards model, a LASSO model and the median absolute deviation (MAD). First, lncRNAs with a p value less than 0.05 in the univariate Cox proportional hazards analysis were retained. Next, LASSO model analysis was repeated 100 times, and lncRNAs with non-zero coefficients in more than 60 of the runs were selected; these lncRNAs are considered to be significantly associated with the progression-free survival of the patients. Finally, the MAD of these prognosis-related lncRNAs was calculated, the lncRNAs were ranked by MAD from largest to smallest, and the top 50% were selected as candidate typing features, giving 46 lncRNAs that are related to prognosis and vary widely in abundance expression (their gene IDs in the Ensembl database are shown in table 1).
TABLE 1 46 lncRNAs related to UTUC prognosis with large variation in abundance expression
ENSG00000235491 ENSG00000203706
ENSG00000228873 ENSG00000283684
ENSG00000259439 ENSG00000285280
ENSG00000226674 ENSG00000231246
ENSG00000204588 ENSG00000289326
ENSG00000240040 ENSG00000125462
ENSG00000224165 ENSG00000224616
ENSG00000289062 ENSG00000224559
ENSG00000225087 ENSG00000226780
ENSG00000203709 ENSG00000229021
ENSG00000233593 ENSG00000225643
ENSG00000175147 ENSG00000225077
ENSG00000291077 ENSG00000287670
ENSG00000189223 ENSG00000289305
ENSG00000226994 ENSG00000286572
ENSG00000227088 ENSG00000230186
ENSG00000289033 ENSG00000228971
ENSG00000238122 ENSG00000287064
ENSG00000228794 ENSG00000224875
ENSG00000228044 ENSG00000287628
ENSG00000289077 ENSG00000287305
ENSG00000228852 ENSG00000231407
ENSG00000289483 ENSG00000288007
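The last two screening stages (keeping lncRNAs selected in more than 60 of 100 LASSO runs, then keeping the top half ranked by median absolute deviation) can be sketched as follows. The Cox and LASSO fits themselves are not reproduced; `run_selections` stands in for their output, and the function names, the strict "more than 60" threshold and the floor rounding of the 50% cut are assumptions of this sketch.

```python
from collections import Counter
from statistics import median

def stable_features(run_selections, min_runs=60):
    """Features with a non-zero coefficient in more than min_runs runs."""
    tally = Counter(f for selected in run_selections for f in set(selected))
    return sorted(f for f, n in tally.items() if n > min_runs)

def mad(values):
    """Median absolute deviation: median of |x - median(x)|."""
    m = median(values)
    return median([abs(v - m) for v in values])

def top_half_by_mad(expr, features):
    """Rank features by MAD (largest first) and keep the top 50%."""
    ranked = sorted(features, key=lambda f: mad(expr[f]), reverse=True)
    return ranked[: len(ranked) // 2]

# Toy input: 100 hypothetical LASSO runs, each listing the selected features.
run_selections = (
    [["a", "b", "c", "d"]] * 60   # runs 1-60
    + [["a", "b", "d"]] * 1       # run 61
    + [["a", "d"]] * 9            # runs 62-70
    + [["a"]] * 30                # runs 71-100
)
expr = {
    "a": [1, 1, 1, 1, 1],   # MAD 0
    "b": [0, 2, 4, 6, 8],   # MAD 2
    "c": [3, 3, 3, 9, 9],
    "d": [0, 1, 2, 3, 4],   # MAD 1
}
stable = stable_features(run_selections)   # "c" appears in exactly 60 runs
candidates = top_half_by_mad(expr, stable)
```

Counting selections across repeated fits guards against features that enter the model only under a particular cross-validation split.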
3. Consensus clustering to obtain molecular typing
Based on the 46 screened lncRNAs related to UTUC prognosis, consensus clustering was performed on all patient samples with the CancerSubtypes package. The clustering algorithm was set to pam, the distance measure to "pearson", and the number of clusters to 2-4. The average silhouette coefficient of the samples was calculated for 2 to 4 clusters, and the 3-cluster solution (k=3), which had the largest average silhouette coefficient, was taken as the optimal clustering result, giving the final lncRNA-based UTUC molecular typing (shown in fig. 1): 61 patients belong to type I, 34 to type II and 61 to type III.
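Choosing k by the largest average silhouette coefficient can be illustrated with the sketch below. It does not perform consensus clustering; it only scores candidate label assignments that are given as input, using 1 minus the Pearson correlation as the distance. The toy samples, the candidate labelings and the function names are assumptions of this sketch.

```python
from statistics import mean

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length vectors."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

def avg_silhouette(samples, labels):
    """Average silhouette coefficient with distance = 1 - Pearson correlation."""
    n = len(samples)
    dist = [[1 - pearson(samples[i], samples[j]) for j in range(n)] for i in range(n)]
    scores = []
    for i in range(n):
        same = [dist[i][j] for j in range(n) if j != i and labels[j] == labels[i]]
        if not same:                      # singleton cluster: score 0 by convention
            scores.append(0.0)
            continue
        a = mean(same)                    # mean distance within own cluster
        b = min(                          # mean distance to the nearest other cluster
            mean([dist[i][j] for j in range(n) if labels[j] == other])
            for other in set(labels) if other != labels[i]
        )
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return mean(scores)

# Toy expression profiles: two rising, two falling, two zigzag samples.
samples = [
    [1, 2, 3, 4], [2, 3, 4, 6],
    [4, 3, 2, 1], [6, 4, 3, 2],
    [1, 4, 1, 4], [2, 5, 1, 5],
]
candidate_labels = {2: [0, 0, 1, 1, 1, 1], 3: [0, 0, 1, 1, 2, 2]}
scores = {k: avg_silhouette(samples, lab) for k, lab in candidate_labels.items()}
best_k = max(scores, key=scores.get)
```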
4. Molecular typing prognostic layering ability validation
Based on the UTUC molecular typing results obtained, the prognosis differences among the three types were assessed by univariate Kaplan-Meier survival analysis. As shown in fig. 2, type I has the best prognosis and type III the worst, and the log-rank test p value across the three types is less than 0.001. This indicates that progression-free survival differs significantly between UTUC patients of different types, and that the molecular typing of this example can distinguish the prognostic risk of UTUC patients.
Example 2 UTUC typing identification
1. Construction of a single sample classifier based on the molecular typing labels
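For readers unfamiliar with the survival analysis named above, the Kaplan-Meier estimate for one group can be sketched as follows. The log-rank test is not implemented here, and the toy times and event flags are invented for illustration.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier survival estimate.

    times:  follow-up time of each patient
    events: 1 if progression was observed at that time, 0 if censored
    Returns a list of (time, survival probability) at each distinct event time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        n_at_t = sum(1 for tt, _ in data if tt == t)           # leaving the risk set at t
        d_at_t = sum(1 for tt, e in data if tt == t and e == 1)  # events at t
        if d_at_t:
            surv *= 1 - d_at_t / at_risk
            curve.append((t, surv))
        at_risk -= n_at_t
        i += n_at_t
    return curve

# Toy group of five patients (values are hypothetical).
curve = kaplan_meier([2, 4, 4, 6, 8], [1, 1, 0, 1, 0])
```

Computing one such curve per molecular type and comparing them is what fig. 2 reports; the p value in the text comes from a log-rank test on those curves.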
(1) Screening for subtype-specific lncRNA signatures
Using the 7698 lncRNAs of the 156 samples collected in example 1, the AUC value of each lncRNA for predicting each UTUC molecular subtype was calculated with the auc function of the R package pROC, and lncRNAs with an AUC value greater than 0.7 were retained as the specific features of that subtype. Type I obtained 7 specific lncRNA features, type II obtained 148 and type III obtained 6; 3 of these features appear in two subtypes, so after all specific lncRNA features were merged and deduplicated, 158 subtype-specific lncRNA features were finally obtained (their gene IDs in the Ensembl database are shown in table 2).
TABLE 2 Specific lncRNA features of the 3 UTUC subtypes
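The one-versus-rest AUC screen and the merge-and-deduplicate step can be sketched in plain Python as below. This is not the pROC implementation: the AUC here is the rank-based probability that a sample of the subtype has a higher value than a sample outside it, and the sketch assumes that higher expression indicates the subtype. Gene names, labels and values are toy data.

```python
def auc(pos, neg):
    """Probability that a random positive exceeds a random negative (ties count 0.5)."""
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def subtype_specific(expr, labels, subtype, threshold=0.7):
    """Genes whose one-vs-rest AUC for the given subtype exceeds the threshold."""
    keep = []
    for gene, values in expr.items():
        pos = [v for v, lab in zip(values, labels) if lab == subtype]
        neg = [v for v, lab in zip(values, labels) if lab != subtype]
        if auc(pos, neg) > threshold:
            keep.append(gene)
    return keep

# Toy data: six samples, two per subtype.
labels = ["I", "I", "II", "II", "III", "III"]
expr = {
    "g1": [9, 8, 1, 2, 3, 1],   # high in type I
    "g2": [1, 2, 9, 8, 2, 1],   # high in type II
    "g3": [5, 5, 5, 5, 5, 5],   # uninformative, AUC 0.5 everywhere
}
per_subtype = {s: subtype_specific(expr, labels, s) for s in ("I", "II", "III")}
# Merge the per-subtype lists and remove duplicates.
all_features = sorted({g for genes in per_subtype.values() for g in genes})
```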
(2) Calculating the abundance expression center point value of each subtype-specific lncRNA feature
The average expression values of the 158 specific lncRNA features were calculated in the samples of each of the three subtypes and taken as the center point values of those features. Each subtype thus obtains a set of data containing the center point values of all specific lncRNA features (see table 3), which can be used for UTUC single sample classification.
TABLE 3 Center point values of the 158 subtype-specific lncRNA features
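Computing the center point values amounts to a per-subtype mean of each feature, as in the sketch below. The data layout, the function name `centroids` and the toy values are assumptions made for illustration; they are not the values of table 3.

```python
from statistics import mean

def centroids(expr, labels):
    """Mean expression of every feature within each subtype.

    expr:   dict feature -> list of per-sample normalized expression values
    labels: per-sample subtype labels, in the same sample order
    Returns dict subtype -> dict feature -> center point value.
    """
    result = {}
    for subtype in sorted(set(labels)):
        idx = [i for i, lab in enumerate(labels) if lab == subtype]
        result[subtype] = {
            feature: mean([values[i] for i in idx])
            for feature, values in expr.items()
        }
    return result

# Toy data: two features, four samples.
labels = ["I", "I", "II", "II"]
expr = {"f1": [4.0, 6.0, 1.0, 3.0], "f2": [0.0, 2.0, 8.0, 10.0]}
centers = centroids(expr, labels)
```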
2. UTUC molecular typing identification of new samples
A urothelial cancer data set comprising 403 urothelial cancer samples with a complete gene abundance expression matrix and clinical information (including overall survival time) was downloaded from the TCGA public database and used to verify the single sample classifier constructed in this example. For each sample, the Pearson correlation was calculated between the abundance expression values of the 158 subtype-specific lncRNA features in the sample and the center point values of those features for each subtype in table 3, and the sample was assigned to the subtype with the highest Pearson correlation. In the end, 111 samples were identified as type I, 180 as type II and 112 as type III.
Based on the molecular typing results obtained, the prognosis differences among the three types were assessed by univariate Kaplan-Meier survival analysis. As shown in fig. 3, type I has the best prognosis and type III the worst, and the log-rank test p value across the three types is less than 0.05, consistent with the trend observed in the earlier typing. This indicates that the single sample classifier of this example can identify the UTUC molecular type of a new sample.
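The assignment rule described above (highest Pearson correlation with a subtype's center point values) can be sketched as follows. The three-feature centroids and the sample are toy values standing in for the 158 features of table 3, and the function names are assumptions of this sketch.

```python
from statistics import mean

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length vectors."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

def classify(sample, centers):
    """Assign a sample to the subtype whose center point values correlate best.

    sample:  dict feature -> expression value of the new sample
    centers: dict subtype -> dict feature -> center point value
    Returns (best subtype, dict of correlations per subtype).
    """
    features = sorted(sample)
    x = [sample[f] for f in features]
    corr = {s: pearson(x, [c[f] for f in features]) for s, c in centers.items()}
    return max(corr, key=corr.get), corr

# Toy center point values (hypothetical, not taken from table 3).
centers = {
    "I":   {"f1": 5.0, "f2": 1.0, "f3": 1.0},
    "II":  {"f1": 1.0, "f2": 5.0, "f3": 1.0},
    "III": {"f1": 1.0, "f2": 1.0, "f3": 5.0},
}
new_sample = {"f1": 0.9, "f2": 1.2, "f3": 4.0}
subtype, corr = classify(new_sample, centers)
```

Because correlation ignores overall scale and offset, the rule depends on the shape of a sample's expression profile rather than its absolute level, which is one reason it can be applied to a single sample from another cohort.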
The foregoing embodiments are merely examples of possible or preferred embodiments of the present invention, which are not intended to limit the scope of the present invention, and therefore, all equivalent changes and modifications that are consistent with the scope of the present invention shall fall within the scope of the present invention.

Claims (18)

1. A method for molecular typing of urothelial cancer in an upper urinary tract, said method not being used for the diagnosis and treatment of diseases, characterized in that the molecular typing of urothelial cancer is performed on the basis of lncRNA characteristics of patients with urothelial cancer.
2. The molecular typing method according to claim 1, wherein the method comprises the steps of:
1) Obtaining tumor tissue transcriptome sequencing data and clinical information of a patient with the urothelial cancer;
2) Comparing the sequencing data obtained in the step 1) to a human reference genome, and annotating genes by using GTF gene annotation files of corresponding versions of the reference genome;
3) Quantifying, filtering, normalizing and log2 transforming the genes annotated in step 2);
4) Screening lncRNAs that are related to the prognosis of upper urinary tract urothelial cancer and whose abundance expression values vary widely across all patient samples as candidate typing features;
5) Performing cluster analysis on all patient samples based on the expression matrix of the candidate typing features obtained in step 4) to obtain the optimal molecular typing result of the upper urinary tract urothelial cancer.
3. The molecular typing method of claim 2, wherein in step 1), the clinical information includes progression-free survival time and progression-free survival status.
4. The molecular typing method of claim 2, wherein in step 3), the filtering comprises: retaining genes whose gene type in the GTF gene annotation file is "lncRNA" and whose median expression abundance is greater than 0.
5. The molecular typing method of claim 2, wherein in step 3), the normalization process uses a TPM normalization method, and a TPM normalization conversion formula is:
wherein i is the sample index, j is the gene index, R_ij is the read count of gene j in sample i, F_ij is the FPKM value of gene j in sample i, L_j is the length of the coding region of gene j, and T_i is the number of sequencing reads of sample i.
6. The molecular typing method according to claim 2, wherein in step 4), the typing features are screened sequentially with a univariate Cox proportional hazards model, a LASSO model and the median absolute deviation.
7. The method of molecular typing according to claim 6, wherein the screening method comprises: retaining lncRNAs with a p value less than 0.05 in the univariate Cox proportional hazards analysis; repeating LASSO model analysis 100 times and retaining lncRNAs with non-zero coefficients in more than 60 of the runs; and calculating the median absolute deviation of these lncRNAs, ranking them from largest to smallest, and selecting the lncRNAs in the top 50% as candidate typing features.
8. The method of molecular typing according to any one of claims 2, 6 or 7, wherein in step 4), the typing features comprise the 46 lncRNAs shown in table 1:
TABLE 1
9. The molecular typing method of claim 2, wherein step 5) employs consensus cluster analysis, the average silhouette coefficient is calculated, and the clustering with the highest average silhouette coefficient is taken as the optimal clustering result.
10. The method of molecular typing according to claim 2, wherein in step 5), the optimal molecular typing result classifies the upper urinary tract urothelial cancer into three molecular subtypes: type I, type II and type III.
11. An upper urinary tract urothelial cancer molecular typing marker, comprising the 46 lncRNAs shown in table 1:
TABLE 1
12. A single sample classifier for typing upper urinary tract urothelial cancer, characterized by comprising a storage module and a correlation calculation module, wherein the storage module stores, for each UTUC subtype, the abundance expression center point values of all subtype-specific lncRNA features of UTUC in the samples of that subtype; and the correlation calculation module calculates the correlation between the abundance expression values of all subtype-specific lncRNA features in the UTUC sample whose subtype is to be identified and the abundance expression center point values of those features for each subtype stored in the storage module.
13. The single sample classifier of claim 12, wherein the subtype-specific lncRNA features and their abundance expression center point values in the samples of the three molecular subtypes of upper urinary tract urothelial cancer are shown in table 3:
TABLE 3
14. A method of constructing a single sample classifier for upper urinary tract urothelial carcinoma according to claim 12 or 13, said method not being used for disease diagnosis and treatment purposes, comprising the steps of:
1) Obtaining tumor tissue transcriptome sequencing data of a patient with the upper urinary tract urothelial cancer;
2) Aligning the sequencing data obtained in step 1) to a human reference genome, and annotating genes with the GTF gene annotation file that matches the reference genome version;
3) Quantifying, filtering, normalizing and log2 transforming the genes annotated in step 2);
4) For each molecular subtype, calculating the AUC value of each lncRNA obtained in step 3) for predicting that subtype, and retaining lncRNAs with an AUC value greater than 0.7 as the specific lncRNA features of that subtype;
5) Merging and de-duplicating the specific lncRNA features screened in step 4) to obtain all subtype-specific lncRNA features of upper urinary tract urothelial cancer;
6) Calculating the mean expression abundance of each specific lncRNA feature obtained in step 5) in the samples of each subtype and taking it as the expression abundance central point value of that feature, thereby obtaining, for each subtype, a set of data containing the central point values of all subtype-specific lncRNA features.
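Steps 4) to 6) can be sketched as follows. This is a minimal illustration under stated assumptions: the AUC is computed one-vs-rest with the rank-sum formulation, higher expression is treated as predicting the subtype, and the names `auc`, `build_centroids`, `expression` and `subtypes` are hypothetical rather than taken from the patent.

```python
def auc(scores, is_positive):
    """AUC via the rank-sum (Mann-Whitney) statistic; ties count as 0.5."""
    pos = [s for s, p in zip(scores, is_positive) if p]
    neg = [s for s, p in zip(scores, is_positive) if not p]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def build_centroids(expression, subtypes, threshold=0.7):
    """expression: {lncRNA: [log2 value per sample]}; subtypes: [label per sample]."""
    labels = sorted(set(subtypes))
    features = set()
    for lab in labels:
        # step 4: keep lncRNAs whose one-vs-rest AUC for this subtype exceeds the threshold
        is_pos = [s == lab for s in subtypes]
        features |= {g for g, v in expression.items() if auc(v, is_pos) > threshold}
    centroids = {}
    for lab in labels:
        idx = [i for i, s in enumerate(subtypes) if s == lab]
        # step 5: the set union is already de-duplicated
        # step 6: the mean within each subtype is that subtype's central point value
        centroids[lab] = {g: sum(expression[g][i] for i in idx) / len(idx)
                          for g in sorted(features)}
    return centroids
```

Every subtype ends up with a central point value for the same merged feature list, which is what lets a new sample be compared against all subtypes on equal terms.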
15. The method of claim 14, wherein in step 3) the filtering comprises: retaining genes whose gene type in the GTF gene annotation file is "lncRNA" and whose median expression abundance is greater than 0.
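The filter in claim 15 amounts to two conditions per gene. A minimal sketch, assuming the gene types have already been read from the GTF file into a hypothetical `gene_types` dictionary and the abundances into a per-gene list of sample values:

```python
from statistics import median

def filter_lncrna(gene_types, expression):
    """Keep genes annotated as 'lncRNA' whose median abundance across samples is > 0."""
    return {
        gene: values
        for gene, values in expression.items()
        if gene_types.get(gene) == "lncRNA" and median(values) > 0
    }
```

Requiring a positive median removes lncRNAs that are undetected in at least half of the samples.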
16. The method of claim 14, wherein in step 3), the normalization process uses a TPM normalization method, and the TPM normalization conversion formula is:
wherein i is the sample index, j is the gene index, R_ij is the read count of gene j in sample i, F_ij is the FPKM value of gene j in sample i, L_j is the length of the coding region of gene j, and T_i is the number of sequencing reads of sample i.
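The conversion formula itself is not reproduced in this text. For orientation only, the standard FPKM and TPM definitions written with the symbols defined above are given below. This is an assumption based on the common convention (L_j in bases, T_i as the total read count of sample i, k running over all genes), not a transcription of the claimed formula.

```latex
F_{ij} = \frac{R_{ij} \times 10^{9}}{L_{j}\, T_{i}},
\qquad
\mathrm{TPM}_{ij} = \frac{F_{ij}}{\sum_{k} F_{ik}} \times 10^{6}
```

Under this convention the TPM values of each sample sum to 10^6, which makes abundances comparable across samples.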
17. The method of claim 14, wherein in step 6) the set of data for each subtype containing the central point values of all subtype-specific lncRNA features is shown in Table 3:
TABLE 3
18. A method for molecular typing of an upper urinary tract urothelial cancer sample using the single sample classifier of claim 12 or 13, said method not being used for disease diagnosis or treatment purposes, comprising the steps of:
calculating the correlation between the expression abundance values of all subtype-specific lncRNA features in the upper urinary tract urothelial cancer sample whose subtype is to be identified and each subtype's set of expression abundance central point values for all subtype-specific lncRNA features, and assigning the sample to the subtype with the highest correlation.
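The classification step can be sketched as a nearest-centroid rule on correlation. The claim does not state which correlation measure is used, so Pearson correlation is an assumption here, as are the names `pearson`, `classify`, `sample` and `centroids`.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length vectors (undefined if either is constant)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

def classify(sample, centroids):
    """sample: {lncRNA: value}; centroids: {subtype: {lncRNA: central point value}}."""
    scores = {}
    for subtype, center in centroids.items():
        genes = sorted(center)            # same feature order for both vectors
        scores[subtype] = pearson([sample[g] for g in genes],
                                  [center[g] for g in genes])
    best = max(scores, key=scores.get)    # subtype with the highest correlation
    return best, scores
```

Because only one sample's expression vector and the stored central point values are needed, a new sample can be typed without re-clustering a cohort.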
CN202310791539.2A 2023-06-30 2023-06-30 UTUC molecular typing, single sample classifier and construction method thereof Pending CN116987789A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310791539.2A CN116987789A (en) 2023-06-30 2023-06-30 UTUC molecular typing, single sample classifier and construction method thereof

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310791539.2A CN116987789A (en) 2023-06-30 2023-06-30 UTUC molecular typing, single sample classifier and construction method thereof

Publications (1)

Publication Number Publication Date
CN116987789A true CN116987789A (en) 2023-11-03

Family

ID=88533034

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310791539.2A Pending CN116987789A (en) 2023-06-30 2023-06-30 UTUC molecular typing, single sample classifier and construction method thereof

Country Status (1)

Country Link
CN (1) CN116987789A (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103558395A (en) * 2013-10-28 2014-02-05 深圳市第二人民医院 Application of SMAD3 gene in detection of upper tract urothelial carcinomas
US20190256924A1 (en) * 2017-08-07 2019-08-22 The Johns Hopkins University Methods and materials for assessing and treating cancer
US20210388450A1 (en) * 2018-10-29 2021-12-16 Samsung Life Public Welfare Foundation Biomarker panel for determining molecular subtype of lung cancer, and use thereof
CN114203256A (en) * 2022-02-18 2022-03-18 上海仁东医学检验所有限公司 MIBC typing and prognosis prediction model construction method based on microbial abundance
CN114582425A (en) * 2022-03-14 2022-06-03 上海交通大学医学院附属仁济医院 NMIBC prognosis prediction molecular marker, screening method and modeling method
US20230126920A1 (en) * 2019-11-08 2023-04-27 Beijing Institute of Genomics, Chinese Academy of Sciences (China National Center for Bioinformation) Method and device for classification of urine sediment genomic dna, and use of urine sediment genomic dna

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103558395A (en) * 2013-10-28 2014-02-05 深圳市第二人民医院 Application of SMAD3 gene in detection of upper tract urothelial carcinomas
US20190256924A1 (en) * 2017-08-07 2019-08-22 The Johns Hopkins University Methods and materials for assessing and treating cancer
CN111868260A (en) * 2017-08-07 2020-10-30 约翰斯霍普金斯大学 Methods and materials for assessing and treating cancer
US20210388450A1 (en) * 2018-10-29 2021-12-16 Samsung Life Public Welfare Foundation Biomarker panel for determining molecular subtype of lung cancer, and use thereof
US20230126920A1 (en) * 2019-11-08 2023-04-27 Beijing Institute of Genomics, Chinese Academy of Sciences (China National Center for Bioinformation) Method and device for classification of urine sediment genomic dna, and use of urine sediment genomic dna
CN114203256A (en) * 2022-02-18 2022-03-18 上海仁东医学检验所有限公司 MIBC typing and prognosis prediction model construction method based on microbial abundance
CN114582425A (en) * 2022-03-14 2022-06-03 上海交通大学医学院附属仁济医院 NMIBC prognosis prediction molecular marker, screening method and modeling method

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
TYLER J MOSS et al.: "Comprehensive Genomic Characterization of Upper Tract Urothelial Carcinoma", EUR UROL, vol. 72, no. 4, 7 June 2017 (2017-06-07), pages 641 - 649, XP085198501, DOI: 10.1016/j.eururo.2017.05.048 *
李幸达; 孙卫兵; 蒋葵: "Expression and clinical significance of GATA3 in upper urinary tract urothelial carcinoma", 临床泌尿外科杂志 (Journal of Clinical Urology), no. 02, 6 February 2020 (2020-02-06), pages 31 - 35 *

Similar Documents

Publication Publication Date Title
AU2020260534B2 (en) Using size and number aberrations in plasma DNA for detecting cancer
CN113257350B (en) ctDNA mutation degree analysis method and device based on liquid biopsy and ctDNA performance analysis device
CN113366122B (en) Free DNA end characterization
US20230170048A1 (en) Systems and methods for classifying patients with respect to multiple cancer classes
KR20170125044A (en) Mutation detection for cancer screening and fetal analysis
CN111192634A (en) Method for processing genomic data
TW201928065A (en) Using nucleic acid size range for noninvasive prenatal testing and cancer detection
WO2018140521A1 (en) Methods and processes for assessment of genetic variations
WO2020237184A1 (en) Systems and methods for determining whether a subject has a cancer condition using transfer learning
WO2022150663A1 (en) Systems and methods for joint low-coverage whole genome sequencing and whole exome sequencing inference of copy number variation for clinical diagnostics
EP3938534A1 (en) Systems and methods for enriching for cancer-derived fragments using fragment size
CN115418401A (en) Diagnostic assay for urine monitoring of bladder cancer
WO2018136882A1 (en) Methods for non-invasive assessment of copy number alterations
CN113862351A (en) Kit and method for identifying extracellular RNA biomarkers in body fluid sample
CN115807089A (en) Hepatocellular carcinoma prognosis biomarker and application thereof
CN116987789A (en) UTUC molecular typing, single sample classifier and construction method thereof
CN110819700A (en) Method for constructing small pulmonary nodule computer-aided detection model
CN117558346A (en) UTUC molecular typing and prognosis prediction model construction method
Chieruzzi Identification of RAS co-occurrent mutations in colorectal cancer patients: workflow assessment and enhancement

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination