WO2016011524A1 - Yin-Yang gene expression ratio models for generation of clinical prognosis signatures for lung cancer patients - Google Patents

Yin-Yang gene expression ratio models for generation of clinical prognosis signatures for lung cancer patients

Info

Publication number
WO2016011524A1
WO2016011524A1 PCT/CA2014/050704 CA2014050704W WO2016011524A1 WO 2016011524 A1 WO2016011524 A1 WO 2016011524A1 CA 2014050704 W CA2014050704 W CA 2014050704W WO 2016011524 A1 WO2016011524 A1 WO 2016011524A1
Authority
WO
WIPO (PCT)
Prior art keywords
seq
genes
gene
ymr
yin
Prior art date
Application number
PCT/CA2014/050704
Other languages
English (en)
Inventor
Wayne Xu
Original Assignee
University Of Manitoba
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by University Of Manitoba
Priority to PCT/CA2014/050704
Publication of WO2016011524A1

Classifications

    • C CHEMISTRY; METALLURGY
    • C07 ORGANIC CHEMISTRY
    • C07H SUGARS; DERIVATIVES THEREOF; NUCLEOSIDES; NUCLEOTIDES; NUCLEIC ACIDS
    • C07H21/00 Compounds containing two or more mononucleotide units having separate phosphate or polyphosphate groups linked by saccharide radicals of nucleoside groups, e.g. nucleic acids
    • C07H21/04 Compounds containing two or more mononucleotide units having separate phosphate or polyphosphate groups linked by saccharide radicals of nucleoside groups, e.g. nucleic acids with deoxyribosyl as saccharide radical
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B25/00 ICT specially adapted for hybridisation; ICT specially adapted for gene or protein expression
    • G16B25/10 Gene or protein expression profiling; Expression-ratio estimation or normalisation
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6876 Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes
    • C12Q1/6883 Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for diseases caused by alterations of genetic material
    • C12Q1/6886 Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for diseases caused by alterations of genetic material for cancer
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B20/00 ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B20/00 ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
    • G16B20/20 Allele or variant detection, e.g. single nucleotide polymorphism [SNP] detection
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q2600/00 Oligonucleotides characterized by their use
    • C12Q2600/158 Expression markers
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B25/00 ICT specially adapted for hybridisation; ICT specially adapted for gene or protein expression

Definitions

  • TITLE YIN-YANG GENE EXPRESSION RATIO MODELS FOR GENERATION OF CLINICAL PROGNOSIS SIGNATURES FOR LUNG CANCER PATIENTS
  • the present disclosure pertains to development of prognoses for lung cancer patients. More particularly, the present disclosure pertains to systems, methods and tools for assessment of gene expression in lung cancer cells.
  • NSCLC Non-small cell lung cancer
  • the exemplary embodiments of the present disclosure pertain to methods for developing biomarker signatures for use in providing prognoses for cancer patients.
  • the biomarker signatures are based on a Yin Yang hypothesis that the imbalance of two opposing effects in cancer cells determines a patient's prognosis.
  • the expression values of selected Yin genes and Yang genes are extracted from a patient's microarray expression data and then calculated as a mean ratio of the Yin (Y) gene expression and the Yang (y) gene expression. This mean ratio is referred to herein as the "YMR signature".
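  • As a minimal sketch of the mean-ratio calculation described above, the YMR for one patient can be illustrated as the arithmetic mean of the Yin gene expression values divided by the arithmetic mean of the Yang gene expression values. The choice of arithmetic means, the function name `ymr`, the gene labels and the expression values are all assumptions made for illustration and are not taken from the disclosure.

```python
from statistics import mean


def ymr(expression, yin_genes, yang_genes):
    """Return mean Yin expression divided by mean Yang expression for one patient.

    expression: dict mapping gene label -> linear-scale expression value.
    yin_genes / yang_genes: lists of gene labels (hypothetical here).
    """
    yin_mean = mean(expression[g] for g in yin_genes)
    yang_mean = mean(expression[g] for g in yang_genes)
    if yang_mean == 0:
        raise ValueError("mean Yang expression is zero; the ratio is undefined")
    return yin_mean / yang_mean


# Hypothetical expression values for a single patient sample.
patient = {"YIN_A": 8.0, "YIN_B": 12.0, "YANG_A": 4.0, "YANG_B": 6.0}
score = ymr(patient, ["YIN_A", "YIN_B"], ["YANG_A", "YANG_B"])
```

Because both means come from the same sample, a ratio of this kind does not require coefficients fitted on a training cohort.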
  • One embodiment of the present disclosure is a method for developing a protocol for development of a Yin-Yang set of genes for use in the development of a prognosis for a cancer patient.
  • One step of the method pertains to the identification and selection of a group of genes that predominantly function in normal cells, and terming these genes the "Yang genes".
  • Another step of the method pertains to the identification and selection of a group of genes that predominantly function in selected cancerous cells, and terming these genes the "Yin genes".
  • the ratio of expression levels of these two groups of genes indicates the status of the cells as functioning normally or alternatively, functioning in a cancer-mode modality.
  • the cancer prognosis is a multi-dimensional complex process.
  • The essence of the Yin-Yang theory is that it simplifies the multi-dimensional complications into two simple opposing dimensions.
  • The extent of the two-dimensional (Yin and Yang) imbalance would indicate the severity or progression of the cancer and thus predict the patient's survival time.
  • An aspect of the present disclosure pertains to a model for determination of predictive signature models for cancer patients.
  • The model is a mathematical formula with coefficients obtained from training data. This model produces a risk score for each patient that enables stratification of the patient into a high-risk group or a low-risk group.
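  • The stratification step can be sketched as follows, assuming a single cutoff applied to each patient's risk score. The function name `stratify`, the patient identifiers, the scores and the cutoff value are hypothetical and serve only to illustrate the idea.

```python
def stratify(risk_scores, cutoff):
    """Assign each patient to 'high-risk' or 'low-risk' by comparing the score to a cutoff.

    risk_scores: dict mapping patient ID -> numeric risk score.
    cutoff: scores strictly above this value are labelled high-risk (illustrative rule).
    """
    return {
        patient_id: ("high-risk" if score > cutoff else "low-risk")
        for patient_id, score in risk_scores.items()
    }


# Hypothetical risk scores for three patients and an illustrative cutoff.
scores = {"P1": 0.8, "P2": 2.4, "P3": 1.1}
groups = stratify(scores, cutoff=2.0)
```

In practice the cutoff would have to be chosen and validated on independent data; the value used here carries no clinical meaning.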
  • There are four steps for development of a cancer prognosis signature. The first step is identifying and selecting genes that can be used for assessing a cancer prognosis.
  • The second step is use of the selected genes to build a cancer prognosis signature model.
  • The third step is to validate the cancer prognosis signature with independent data sets.
  • The fourth step is to confirm the clinical relevance of the cancer prognosis signature.
  • The cancer prognosis signature may be optimized by adding two additional steps between steps three and four, wherein the first additional step comprises optimizing the signature around thirty or fewer Yin genes and Yang genes that produce the highest performance in all data sets.
  • The second additional step comprises comparing the optimized signature with previously reported lung cancer signatures.
  • The YMR signatures disclosed herein and the methods disclosed herein for deriving the YMR signatures contrast with all previous signature models that are based on significant data training, and provide a more precise insight into the biology of cancer development.
  • the YMR signatures may be used for the development of qRT-PCR kits for detection of Yin gene expression and Yang gene expression in tissue samples and/or cell samples collected from a patient.
  • The calculated YMR risk score can aid clinical therapy decision-making with regard to disease stage.
  • This study may also have potential in drug development, either by modulating the expression of selected Yin genes and Yang genes, or by altering other target gene expression so that a lower YMR can be achieved.
  • The YMR approach to biomarker discovery can also be used for preparation of prognoses for other types of cancers and/or diseases.
  • DESCRIPTION OF THE DRAWINGS
  • Fig. 1 is a schematic flowchart illustrating exemplary steps taken to identify and validate the methods for producing the YMR signature model disclosed herein;
  • Fig. 2 is a schematic flowchart illustrating use of the YMR model disclosed herein for predicting a lung cancer patient's survival or recurrence-free time period;
  • Fig. 3 is a schematic flowchart illustrating a process for optimization of a Yin and Yang gene list for use in the YMR model disclosed herein;
  • Fig. 4 is a schematic illustration of 2-D Euclidean clustering analysis for identification of candidate Yin genes, showing a complete linkage setting for both the genes (12,625 genes on HG-U95av2) and the 100 samples of the Bhattacharjee data set (2001, Classification of human lung carcinomas by mRNA expression profiling reveals distinct adenocarcinoma subclasses. Proc. Natl. Acad. Sci. USA 98(24): 13790-13795). The region was selected where the genes were upregulated in normal samples but downregulated in almost all different types of lung cancers. The region where genes were downregulated in one or a few cancer types was not selected;
  • Fig. 5 is a schematic illustration of an expanded section from the 2-D Euclidean clustering analysis shown in Fig. 4;
  • Fig. 6 is a schematic illustration of 2-D Euclidean clustering analysis for identification of candidate Yang genes, showing a complete linkage setting for both the genes (12,625 genes on HG-U95av2) and the 100 samples of the Bhattacharjee data set. The region was selected where the genes were upregulated in normal samples but downregulated in almost all different types of lung cancers. The region where genes were downregulated in one or a few cancer types was not selected;
  • Fig. 7 is a schematic illustration of an expanded section from the 2-D Euclidean clustering analysis shown in Fig. 4;
  • Fig. 8 shows a combination of the cluster analyses from Figs. 1 and 3 for identification and selection of Yin genes and Yang genes of particular relevance in lung cancer.
  • the probe sets are shown in the vertical dendrogram and DNA sample data are shown in the horizontal dendrogram.
  • the expression indices of all 12,625 probe sets of the 100 sample data sets were summarized with the RMA algorithm and then further normalized by itemwise Z-normalization.
  • 74 up-regulated genes (bottom half rows) and 108 (top half rows) down-regulated genes in cancer tissues were selected from the 2D clustering regions.
  • the preselected 74 and 108 probe sets were displayed by re-clustering;
  • Fig. 9 is a 3-D depiction of an IPA analysis of Yin gene probe sets. The Molecular Mechanisms of Cancer canonical pathway is highlighted by green lines;
  • Fig. 10 is a schematic illustration of the selection of Yin genes (bottom portion) and Yang genes (top portion) using functional analysis. The genes highlighted by the same color are in the same interaction network;
  • Fig. 11 is a 3-D depiction of an IPA analysis of Yang gene probe sets.
  • The RAR Activation pathway and the Hepatic Stellate Cell Activation pathway are highlighted by green lines;
  • Fig. 12 is a chart showing boxplots of the distributions of the mean ratios (YMR) of the expressions of Yin genes (Y) relative to the Yang genes (y) in normal lung samples and lung cancer samples.
  • The YMR were derived from microarray gene expression data sets described in Table S7 (six normal lung data sets and seven different lung cancer type data sets);
  • Figs. 13A-13D are charts showing validation of YMR in four data sets using Kaplan-Meier estimates of the survivor function:
  • low YMR scores in green
  • high YMR scores in red
  • Figs. 14A-14D are charts showing random group gene expression ratios. 500 groups of 31 genes and 500 groups of 32 genes were randomly picked from 12,625 genes among 125 adenocarcinomas of the Bhattacharjee data set: Fig. 14A shows a histogram of 500 p-values of random group ratios as a continuous variable, Fig. 14B shows a histogram of 500 hazard ratios (HR) of random group ratios as a continuous variable, Fig. 14C shows a histogram of 500 p-values of random group ratios as a dichotomous (ratio >2.0) variable, and Fig. 14D shows a histogram of 500 hazard ratios (HR) of random group ratios as a dichotomous (ratio >2.0) variable.
  • HR hazard ratios
  • Figs. 15A-15B are charts showing the effects of dropping Yin genes on continuous and dichotomous YMR and gYMR signatures developed from 442 samples from the DCC data set.
  • "orig" is the original set of 31 Yin genes; the other entries show the effects of dropping one gene at a time, dropping two genes ("24_10", i.e. HIST1H4J, 214463_x; CDC25A, 204696_s), and dropping three genes ("24-10-7", i.e. HIST1H4J; CDC25A; and IGFBP5, 203425_s). These three genes were chosen because they showed the best performance in gYMR after they were dropped.
  • Fig. 15B shows the effects on HR using the same genes as in 15A;
  • Figs. 16A-16B are charts showing the effects of dropping Yin genes on continuous and dichotomous YMR and gYMR signatures developed from 442 samples from the DCC data set.
  • "orig" is the original set of 31 Yin genes; the other entries show the effects of dropping one gene at a time, dropping two genes ("24_10", i.e. HIST1H4J, 214463_x; CDC25A, 204696_s), and dropping three genes ("24-10-7", i.e. HIST1H4J; CDC25A; and IGFBP5, 203425_s). These three genes were chosen because they showed the best performance in gYMR after they were dropped.
  • Fig. 16B shows the effects on HR using the same genes as in 16A;
  • Figs. 17A-17B are charts showing the effects on HR of dropping Yang genes on continuous and dichotomous YMR and gYMR signatures developed from 442 samples from the DCC data set.
  • Fig. 17A shows the effects on YMR; "orig" is the original set of 31 Yin genes, and the other entries show the effects of dropping one gene at a time, dropping two genes ("24_10", i.e. HIST1H4J, 214463_x; CDC25A, 204696_s), and dropping three genes ("24-10-7", i.e. HIST1H4J; CDC25A; and IGFBP5, 203425_s).
  • Fig. 17B shows the effects on gYMR using the same genes as in 17A;
  • Figs. 19A-19F are charts showing Kaplan-Meier estimates of the survivor function of the gYMR signature in different groups of patients from the DCC data set:
  • Fig. 19B is a chart showing YMR signatures for "low-risk” patients
  • Low gYMR scores (in green) correspond to the highest predicted survival probability and high gYMR scores (in red) correspond to the greatest predicted risk;
  • Figs. 20A-20C are charts showing Kaplan-Meier estimates of the survivor function of patients with or without chemotherapy after diagnosis, wherein Fig. 20A shows all Stage I patient samples from the DCC project, Fig. 20B shows low YMR Stage II and Stage III patients, and Fig. 20C shows high YMR Stage II and Stage III patients;
  • Figs. 21A-21C show the optimization of YMR signature sizes using a multiple permutation process (MPP), wherein Fig. 21A shows the occurrences of signature tests that have p-values < 0.05, Fig. 21B shows the YMR 75th percentile p-value of 1000 random data sets that are less than 0.5, and Fig. 21C shows the YMR signatures' 85th percentile p-value of 1000 random data sets that are less than 0.5.
  • The Y-axis represents the proportion of p-values less than 0.05, while the X-axis represents signature sizes ranging from the smallest (2-2) to the largest (231-32).
  • the maximum difference of the Yin gene and the Yang gene numbers in each signature is 2;
  • Fig. 22 shows the optimization of YMR signature genes using MPP. Bars denoted by a "*" underneath are Yang genes while the remaining bars are the Yin genes;
  • Figs. 23(A), 23(B) show the Kaplan-Meier survival curves for the Bhattacharjee data (23(A)) and the DCC data (23(B));
  • Figs. 24(A), 24(B) show the Kaplan-Meier survival curves for the TCGA RNAseq data (24(A)) and the MTAB_923 data (24(B));
  • Figs. 25(A), 25(B) show the Kaplan-Meier survival curves for the GSE42127 data (25(A)) and the GSE41271 data (25(B));
  • Figs. 26(A), 26(B) show the Kaplan-Meier survival curves for the GSE31210 data (26(A)) and the GSE14814 data (26(B));
  • Figs. 27(A), 27(B) show the Kaplan-Meier survival curves for the GSE13213 data
  • Figs. 28(A), 28(B) show the Kaplan-Meier survival curves for the total combined data of 1,664 patient samples (28(A)) and data for all Stage I patients (28(B));
  • Figs. 29(A)-29(D) show the Kaplan-Meier survival curves for different groups of patients from the 1,664 patient samples, wherein Fig. 29(A) shows the patient cohorts who underwent treatment after diagnosis, Fig. 29(B) show the patient cohorts who did not undergo treatment after diagnosis, Fig. 29(C) shows all Stage II patients, and Fig. 29(D) shows all Stage III/IV patients; and
  • Fig. 30 is a schematic illustration of the network connection of ten selected Yin genes and Yang genes.
  • The Yin genes are colored red and the Yang genes are colored green.
  • the Yin and Yang genes were connected by two canonical pathway components that are colored in yellow.
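  • Several of the figure descriptions above refer to Kaplan-Meier estimates of the survivor function. As a minimal sketch of the standard product-limit estimator, assuming follow-up times paired with an event indicator (1 for an observed death, 0 for a censored observation), the calculation for one patient group might look as follows. The function name `kaplan_meier` and the toy follow-up data are hypothetical.

```python
def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct time with at least one observed event.

    times: follow-up time for each patient.
    events: 1 if the event (death) was observed at that time, 0 if censored.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)      # patients still under observation just before time t
    survival = 1.0           # running product-limit estimate
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Gather every patient whose follow-up ends at time t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= removed
    return curve


# Hypothetical follow-up data (months) for five patients in one risk group.
curve = kaplan_meier([5, 8, 8, 12, 15], [1, 1, 0, 1, 0])
```

Running the estimator separately on a low-score group and a high-score group gives the two curves that such a chart would compare.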
  • The exemplary embodiments of the present disclosure pertain to systems, methods and tools for identifying suitable sets of genes that express in a "Yin" and "Yang" manner in lung cancer cells.
  • The expression values of selected Yin genes and Yang genes are extracted from a patient's microarray expression data and then calculated as a mean ratio of the Yin (Y) gene expression and the Yang (y) gene expression. This mean ratio is referred to herein as the "YMR signature". It is well known that variations in gene expression determine phenotype changes and that these expression variations can be caused by factors such as DNA mutations or epigenetics. Gene expression can be used as a surrogate measurement of cancer disease phenotype.
  • Expression variation can be correlated to disease aggressiveness, which can be used to determine patient prognosis.
  • the utility of gene expression-based methods to guide cancer therapy has already been used for breast cancer, where the ONCOTYPE DX ® test (http://www.oncotypedx.com/; ONCOTYPE DX is a registered trademark of Genomic Health Inc., Redwood City, CA, USA) helps define patients most likely to benefit from adjuvant chemotherapy.
  • Many studies used the gene expression signature for lung cancer prognosis prediction. Generally, there are four steps in a prognostic signature development:
  • A predictive signature model is not just a gene list. It is a mathematical formula with coefficients obtained from training data. This model produces a risk score for each patient, and the patients are stratified into high-risk or low-risk groups. Most models compute the patient risk score by multiplying the Cox proportional hazards coefficient of each signature gene by that gene's expression value and combining the products. Some models instead use the probability of a patient falling into the low-risk or high-risk class as the patient risk score.
  • The gene expression values and the patients' survival times in the training data set are used to calculate the coefficients, and the same coefficients are then meant to be used for calculating risk scores from new patients' data.
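  • The coefficient-weighted type of risk score described above can be sketched as a weighted sum, assuming the products are combined by simple addition and that the coefficients have already been fitted on a training cohort. The function name `cox_risk_score`, the gene labels, the coefficients and the expression values are all hypothetical.

```python
def cox_risk_score(expression, coefficients):
    """Sum of (coefficient * expression value) over the signature genes.

    expression: dict mapping gene label -> expression value for one patient.
    coefficients: dict mapping signature gene -> coefficient from a previously
                  fitted Cox model (values here are made up for illustration).
    """
    return sum(coef * expression[gene] for gene, coef in coefficients.items())


# Hypothetical fitted coefficients and one patient's expression values.
coefficients = {"GENE_A": 0.5, "GENE_B": -0.25}
patient = {"GENE_A": 4.0, "GENE_B": 2.0, "GENE_C": 3.0}  # GENE_C is not in the signature
score = cox_risk_score(patient, coefficients)
```

A score of this kind is only meaningful if the new patient's expression values are on the same scale as the training data, which is the dependency the ratio-based approach aims to avoid.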
  • substantial gene expression variations exist within individual subjects. Some genes associated with other aggressive diseases may be present in a subject's tumor. Similarly, a subject might die from a secondary clinical condition. In these instances, a correlation between gene expression in cancer and subject survival does not exist and their data should not be included in the training set.
  • the ratio of two-gene expression within the same individual patients has been reported for biomarker signature development in lung cancer diagnosis and prognosis as well as for breast cancer prognosis.
  • the single two-gene ratio or geometric mean of several two-gene ratios was selected between the treatment failures and the treatment responders from the training data samples.
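  • The geometric mean of several two-gene ratios mentioned above can be sketched as follows, assuming strictly positive linear-scale expression values. The function name `geometric_mean_ratio`, the gene labels and the values are hypothetical.

```python
import math


def geometric_mean_ratio(expression, gene_pairs):
    """Geometric mean of expression[a] / expression[b] over the given (a, b) pairs.

    Computed on the log scale; all expression values must be strictly positive.
    """
    log_ratios = [math.log(expression[a] / expression[b]) for a, b in gene_pairs]
    return math.exp(sum(log_ratios) / len(log_ratios))


# Hypothetical expression values for one patient and two gene pairs.
patient = {"G1": 8.0, "G2": 2.0, "G3": 3.0, "G4": 3.0}
gm = geometric_mean_ratio(patient, [("G1", "G2"), ("G3", "G4")])
```

Here the two ratios are 4 and 1, so their geometric mean is 2 up to floating-point rounding.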
  • Nat Med.12: 1294-300 used molecular profiles from cell lines to establish sensitivity to chemotherapy.
  • the usefulness of this approach is that one tumor sample can be interrogated for response to many agents on the basis of cell-line derived signatures.
  • Beane et al. found a significant association between docetaxel resistance and PI3-kinase inhibitor (LY-294002) sensitivity, suggesting its use as a second-line therapy (2009, Clinical impact of high-throughput gene expression studies in lung cancer. J. Thorac. Oncol. 4: 109-18). It is well known that there are significant problems with the validation and reproducibility of the above-noted proposed methods and tools for developing prognoses for lung cancer patients.
  • the ideal prediction model in a clinical setting should be applicable to any single patient by providing an informative risk score for that patient.
  • A limitation of all previous prediction models is that the signature gene expression values of new samples have to be comparable to those of the training sample data in terms of data preprocessing, analysis platform, and data normalization. For example, Shedden et al. (2008) normalized the entire training and testing data sets together. This is not practical for clinical use.
  • Although statistical methods can deal with signatures comprising a large number of genes, the number of genes determines the feasibility and cost of assay development and of use in clinical practice. Reducing the number of genes in a signature while achieving similar prediction performance is crucial in the development of a practical assay. More practical would be the use of a small number of genes assayed by qRT-PCR, even though these qRT-PCR data also need to be normalized before the same models can be applied.
  • the exemplary embodiments of the present disclosure pertain to use of the Yin Yang paradigm for development of cancer biomarker signatures.
  • the Yin Yang imbalance indicates illness and the relationship between Yin and Yang forms the general basis for all diagnoses and treatment protocols in Chinese medicine.
  • the core of Yin and Yang theory is the "global" effects of the perturbation.
  • An important aspect of the present disclosure is the realization that one group of genes predominantly functions in normal lung cells, these genes being termed herein the "Yang genes", whereas another group of genes predominantly functions in lung cancer cells, these genes being termed herein the "Yin genes".
  • the ratio of expression levels of these two groups of genes would indicate the status of the cells as functioning normally or alternatively, functioning in a cancer-mode modality.
  • the lung cancer prognosis is a multi-dimensional complex process.
  • The essence of the Yin Yang theory is that it simplifies the multi-dimensional complications into two simple opposing dimensions.
  • The extent of the two-dimensional (Yin and Yang) imbalance would indicate the severity or progression of the cancer and thus predict the patient's survival time.
  • Yin gene refers to a group of genes whose expressions and functions dominate in cancerous lung cells.
  • Yang gene refers to a group of genes whose expressions and functions dominate in normal lung cells.
  • YMR refers to a mean ratio of the concurrent expression of Yin genes and Yang genes in a patient's lung cells.
  • signature means a group of genes whose expression status distinguishes high risk from low risk patients.
  • effective amount means an amount effective, at dosages and for periods of time necessary to achieve the desired results (e.g. the modulation of collagen synthesis). Effective amounts of a molecule may vary according to factors such as the disease state, age, sex, weight of the animal. Dosage regimes may be adjusted to provide the optimum therapeutic response. For example, several divided doses may be administered daily or the dose may be proportionally reduced as indicated by the exigencies of the therapeutic situation.
  • subject as used herein includes all members of the animal kingdom, and specifically includes humans.
  • a cell includes a single cell as well as a plurality or population of cells. Administering an agent to a cell includes both in vitro and in vivo administrations.
  • nucleic acid refers to a polymeric compound comprised of covalently linked subunits called nucleotides.
  • Nucleic acid includes polyribonucleic acid (RNA) and polydeoxyribonucleic acid (DNA), both of which may be single-stranded or double-stranded.
  • DNA includes cDNA, genomic DNA, synthetic DNA, and semisynthetic DNA.
  • gene refers to an assembly of nucleotides that encode a polypeptide, and includes cDNA and genomic DNA nucleic acids.
  • the term "recombinant DNA molecule” refers to a DNA molecule that has undergone a molecular biological manipulation.
  • vector refers to any means for the transfer of a nucleic acid into a host cell.
  • a vector may be a replicon to which another DNA segment may be attached so as to bring about the replication of the attached segment.
  • a "replicon” is any genetic element (e.g., plasmid, phage, cosmid, chromosome, virus) that functions as an autonomous unit of DNA replication in vivo, i.e., capable of replication under its own control.
  • vector includes plasmids, liposomes, electrically charged lipids (cytofectins), DNA-protein complexes, and biopolymers.
  • a vector may also contain one or more regulatory regions, and/or selectable markers useful in selecting, measuring, and monitoring nucleic acid transfer results (transfer to which tissues, duration of expression, etc.).
  • Cloning vector refers to a replicon, such as plasmid, phage or cosmid, to which another DNA segment may be attached so as to bring about the replication of the attached segment.
  • Cloning vectors may be capable of replication in one cell type, and expression in another ("shuttle vector").
  • a cell has been "transfected” by exogenous or heterologous DNA when such DNA has been introduced inside the cell.
  • a cell has been "transformed” by exogenous or heterologous DNA when the transfected DNA effects a phenotypic change.
  • the transforming DNA can be integrated (covalently linked) into chromosomal DNA making up the genome of the cell.
  • nucleic acid molecule refers to the phosphate ester polymeric form of ribonucleosides (adenosine, guanosine, uridine or cytidine; "RNA molecules") or deoxyribonucleosides (deoxyadenosine, deoxyguanosine, deoxythymidine, or deoxycytidine; "DNA molecules"), or any phosphoester analogs thereof, such as phosphorothioates and thioesters, in either single-stranded form or a double-stranded helix. Double-stranded DNA-DNA, DNA-RNA and RNA-RNA helices are possible.
  • nucleic acid molecule and in particular DNA or RNA molecule, refers only to the primary and secondary structure of the molecule, and does not limit it to any particular tertiary forms.
  • Modification of a genetic and/or chemical nature is understood to mean any mutation, substitution, deletion, addition and/or modification of one or more residues.
  • Such derivatives may be generated for various purposes, such as in particular that of enhancing its production levels, that of increasing and/or modifying its activity, or that of conferring new pharmacokinetic and/or biological properties on it.
  • the derivatives resulting from an addition there may be mentioned, for example, the chimeric nucleic acid sequences comprising an additional heterologous part linked to one end, for example of the hybrid construct type consisting of a cDNA with which one or more introns would be associated.
  • the claimed nucleic acids may comprise promoter, activating or regulatory sequences, and the like.
  • promoter sequence refers to a DNA regulatory region capable of binding RNA polymerase in a cell and initiating transcription of a downstream (3' direction) coding sequence.
  • the promoter sequence is bounded at its 3' terminus by the transcription initiation site and extends upstream (5' direction) to include the minimum number of bases or elements necessary to initiate transcription at levels detectable above background.
  • a coding sequence is "under the control" of transcriptional and translational control sequences in a cell when RNA polymerase transcribes the coding sequence into mRNA, which is then trans-RNA spliced (if the coding sequence contains introns) and translated into the protein encoded by the coding sequence.
  • homologous in all its grammatical forms and spelling variations refers to the relationship between proteins that possess a "common evolutionary origin,” including homologous proteins from different species. Such proteins (and their encoding genes) have sequence homology, as reflected by their high degree of sequence similarity. This homology is greater than about 75%, greater than about 80%, greater than about 85%. In some cases the homology will be greater than about 90% to 95% or 98%.
  • amino acid sequence homology is understood to include both amino acid sequence identity and similarity. Homologous sequences share identical and/or similar amino acid residues, where similar residues are conservative substitutions for, or "allowed point mutations" of, corresponding amino acid residues in an aligned reference sequence. Thus, a candidate polypeptide sequence that shares 70% amino acid homology with a reference sequence is one in which any 70% of the aligned residues are either identical to, or are conservative substitutions of, the corresponding residues in a reference sequence.
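  • The percentage homology described above can be sketched as the share of aligned residue pairs that are either identical or conservative substitutions. This is a minimal illustration that assumes the two sequences are already aligned to equal length with "-" marking gaps, and that excludes gapped positions from the count. The conservative-substitution groups, the function names and the example sequences are assumptions chosen for illustration and are not taken from the disclosure.

```python
# Illustrative conservative-substitution groups (an assumption, not from the source).
CONSERVATIVE_GROUPS = [
    set("AVLIM"), set("ST"), set("DE"), set("NQ"), set("KRH"), set("FYW"),
]


def is_conservative(a, b):
    """True if residues a and b fall in the same illustrative group."""
    return any(a in group and b in group for group in CONSERVATIVE_GROUPS)


def percent_homology(candidate, reference):
    """Percentage of aligned residue pairs that are identical or conservative.

    Both sequences must already be aligned to the same length; '-' marks a gap,
    and positions with a gap in either sequence are skipped.
    """
    if len(candidate) != len(reference):
        raise ValueError("sequences must be aligned to equal length")
    aligned = [(c, r) for c, r in zip(candidate, reference) if c != "-" and r != "-"]
    if not aligned:
        raise ValueError("no aligned residues to compare")
    matches = sum(1 for c, r in aligned if c == r or is_conservative(c, r))
    return 100.0 * matches / len(aligned)


# Hypothetical pre-aligned sequences: 4 identical, 3 conservative, 3 mismatched.
homology = percent_homology("AKDLSTWGPC", "AKELTTFKAG")
```

In this toy alignment seven of the ten aligned positions count as homologous, matching the 70% case used as the example in the definition.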
  • polypeptide refers to a polymeric compound comprised of covalently linked amino acid residues.
  • Amino acids are classified into seven groups on the basis of the side chain R: (1) aliphatic side chains, (2) side chains containing a hydroxylic (OH) group, (3) side chains containing sulfur atoms, (4) side chains containing an acidic or amide group, (5) side chains containing a basic group, (6) side chains containing an aromatic ring, and (7) proline, an imino acid in which the side chain is fused to the amino group.
  • a polypeptide of the invention preferably comprises at least about 14 amino acids.
  • protein refers to a polypeptide which plays a structural or functional role in a living cell.
  • corresponding to is used herein to refer to similar or homologous sequences, whether the exact position is identical or different from the molecule to which the similarity or homology is measured.
  • a nucleic acid or amino acid sequence alignment may include spaces.
  • corresponding to refers to the sequence similarity, and not the numbering of the amino acid residues or nucleotide bases.
  • derivative refers to a product comprising, for example, modifications at the level of the primary structure, such as deletions of one or more residues, substitutions of one or more residues, and/or modifications at the level of one or more residues.
  • the number of residues affected by the modifications may be, for example, from 1, 2 or 3 to 10, 20, or 30 residues.
  • derivative also comprises the molecules comprising additional internal or terminal parts, of a peptide nature or otherwise. They may be in particular active parts, markers, amino acids, such as methionine at position -1.
  • derivative also comprises the molecules comprising modifications at the level of the tertiary structure (N-terminal end, and the like).
  • the term derivative also comprises sequences homologous to the sequence considered, derived from other cellular sources, and in particular from cells of human origin, or from other organisms, and possessing activity of the same type or of substantially similar type. Such homologous sequences may be obtained by hybridization experiments. The hybridizations may be performed based on nucleic acid libraries, using, as probe, the native sequence or a fragment thereof, under conventional stringency conditions or preferably under high stringency conditions.
  • the steps that may be used to develop a YMR signature model for developing a prognosis for early-stage lung cancer patients are illustrated in Fig. 1.
  • the first step is to identify a set of suitable Yin genes and Yang genes isolated from lung cancer tissue samples by comparing the total gene complements extracted from lung cancer patients' tissue samples with a database comprising gene sets isolated from normal lung samples from healthy subjects, and gene sets isolated from lung cancer samples collected from patients of mixed tumour stages with different survival times.
  • a suitable exemplary data set is shown in Table 1.
  • the second step comprises identification of the Yin gene candidates and Yang gene candidates, for example up to about 30 or 40 or 50 each of candidate Yin genes and candidate Yang genes, and then calculating the mean ratio of the concurrent expression of Yin genes and Yang genes in cancerous lung tissue samples, i.e., the "YMR signature".
  • the third step comprises validation of the YMR signature by comparisons to YMR signatures calculated for other lung cancer patients using other reported models.
  • the last step is translation of the YMR model and signature into clinical assays.
  • Fig. 2 sets out in more detail the methods used for the first three steps for developing the YMR signature model, wherein (i) the expression values of selected Yin genes and Yang genes for each patient sample are extracted from the expression data sets and then correlated with clinical overall survival (OS) or relapse-free (RF) information for each sample, (ii) the geometric mean of all Yin gene expression values and the geometric mean of all Yang gene expression values are computed, and then the ratio of these geometric means (YMR) is calculated for each patient sample, (iii) the Cox regression of the continuous YMR values is then evaluated, and (iv) the YMR signature value is used as a risk score to stratify patients into low-risk and high-risk groups.
  • Algorithms may be used to model the balance of Yin genes and Yang genes for prediction of a patient's survival or recurrence-free time.
  • the expression values of the selected Yin genes and Yang genes may be extracted from microarray expression data or RNA-Seq data, after which the ratio of the geometric means of Yin gene and Yang gene expression (YMR) may be computed for each patient.
  • the Cox regression of the continuous covariate YMR is preferably evaluated by univariate analysis.
  • using the YMR score as a continuous covariate tests whether each unit increase in YMR results in proportional scaling of the hazard rate.
  • This YMR signature model does not integrate any coefficients between gene expression and patients' survival time.
  • a median YMR cutoff score or alternatively, graphical diagnostic plots may be used to find optimal YMR signature cutoff scores for stratifying patients into low-risk or high-risk groups.
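The scoring and stratification steps described above can be sketched in a few lines of Python. This is a minimal illustration, not the disclosed implementation: the gene names, patient identifiers, and expression values are invented, and the median cutoff is only one of the options mentioned in the text.

```python
from statistics import geometric_mean, median


def ymr(expression, yin_genes, yang_genes):
    """Ratio of the geometric mean of Yin expression to that of Yang expression."""
    yin = geometric_mean([expression[g] for g in yin_genes])
    yang = geometric_mean([expression[g] for g in yang_genes])
    return yin / yang


def stratify(cohort, yin_genes, yang_genes, cutoff=None):
    """Score every patient and split the cohort at `cutoff` (median if None)."""
    scores = {pid: ymr(expr, yin_genes, yang_genes) for pid, expr in cohort.items()}
    if cutoff is None:
        cutoff = median(scores.values())
    groups = {pid: ("high" if s > cutoff else "low") for pid, s in scores.items()}
    return scores, groups, cutoff


# Hypothetical toy cohort: patient id -> {gene: expression value}
cohort = {
    "P1": {"YIN_A": 8, "YIN_B": 2, "YANG_A": 4, "YANG_B": 1},
    "P2": {"YIN_A": 1, "YIN_B": 1, "YANG_A": 4, "YANG_B": 4},
    "P3": {"YIN_A": 9, "YIN_B": 1, "YANG_A": 3, "YANG_B": 3},
}
scores, groups, cutoff = stratify(cohort, ["YIN_A", "YIN_B"], ["YANG_A", "YANG_B"])
print(groups)
```

Because each score is a ratio computed within one sample, no survival-time coefficients enter the calculation, which matches the statement above that the model does not integrate such coefficients.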
  • An additional optional fourth step may be incorporated in an exemplary method used to develop a YMR signature model as illustrated in Fig. 1, wherein the YMR signature is optimized around an aggregate of about 30 or less Yin genes and Yang genes that produce the highest performance in the data sets assessed in steps 1 and 2, generally following the flowchart outlined in Fig. 3.
  • the first step is to extract the previously defined Yin genes and Yang genes from expression data sets.
  • the second step is to pick a set of 1 million combinations, each comprising a random number of Yin genes and a random number of Yang genes, and then to test each combination against a set of 1 million randomly permuted expression data sets with clinical information, wherein each data set comprises 100 to 200 samples.
  • the gene expression of the Bhattacharjee samples was detected by AFFYMETRIX ® HU_U95Av2 GENECHIP ® (AFFYMETRIX and GENECHIP are registered trademarks of Affymetrix Inc., Santa Clara, CA, USA).
  • the raw hybridization intensity data files (CEL) were downloaded from http://www.broadinstitute.org/mpr/lung/.
  • the gene expression indexes were processed with the MAS5.0 algorithm by using the EXPRESSIONIST ® Refiner module (GeneData Inc, San Francisco, CA, USA; EXPRESSIONIST is a registered trademark of GeneData AG, Basel, Switzerland). No further normalization was done within each data set in order to keep the individual samples independent in gene biomarker detection.
  • the Robust Multi-array Average (RMA) derived and normalized expression measurements were calculated from the raw CEL files.
  • the gene expression of thessen samples was detected by AFFYMETRIX ® HU_U133plus2 GENECHIP ® and the signal intensity was calculated by the MAS5.0 algorithm.
  • the data set was downloaded from NCBI GEO database (GSE3141).
  • MAS5.0 algorithm was used for gene expression summarization.
  • RNA-seq data were downloaded from the TCGA Data Portal (http://tcga-data.nci.nih.gov/tcga/tcgaDownload.jsp).
  • the gene expression values are expressed as RPKM (reads per kilobase per million mapped reads)
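For readers unfamiliar with the unit, reads per kilobase per million mapped reads is a simple normalization of a raw read count by transcript length and sequencing depth. The Python sketch below shows the standard formula; the function name and the example numbers are illustrative and are not taken from the TCGA data.

```python
def rpkm(read_count, gene_length_bp, total_mapped_reads):
    """Reads per kilobase of transcript per million mapped reads."""
    # 1e9 = 1e3 (bases per kilobase) * 1e6 (reads per million)
    return read_count * 1e9 / (gene_length_bp * total_mapped_reads)


# Hypothetical gene: 500 reads on a 2,000 bp transcript, 10 million mapped reads
value = rpkm(500, 2_000, 10_000_000)
print(value)
```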
  • Genes that were expressed higher in normal lung tissues than in lung cancer cells were called “Yang” gene candidates; conversely, genes expressed higher in lung cancer cells than in normal lung tissues were called “Yin” gene candidates.
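The naming rule above can be expressed as a short Python sketch. The comparison of group means and the 2-fold threshold are assumptions made here for illustration; this passage does not state which statistic or threshold was used to call a gene differentially expressed, and the gene labels are invented.

```python
from statistics import fmean


def classify_candidates(normal, tumor, min_fold=2.0):
    """Split genes into Yin (higher in tumor) and Yang (higher in normal) candidates.

    `normal` and `tumor` map gene -> list of expression values.
    `min_fold` is a hypothetical fold-change threshold, not a value from the source.
    """
    yin, yang = [], []
    for gene in sorted(normal.keys() & tumor.keys()):
        n_mean, t_mean = fmean(normal[gene]), fmean(tumor[gene])
        if t_mean >= min_fold * n_mean:
            yin.append(gene)   # expressed higher in lung cancer
        elif n_mean >= min_fold * t_mean:
            yang.append(gene)  # expressed higher in normal lung
    return yin, yang


normal = {"G1": [10, 12], "G2": [100, 90], "G3": [50, 50]}
tumor = {"G1": [40, 44], "G2": [20, 30], "G3": [55, 45]}
print(classify_candidates(normal, tumor))
```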
  • IPA9.0 Ingenuity Systems Inc., Redwood City, CA, USA
  • the networks are built by direct interactions. The networks with significant scores were selected for further analysis.
  • the expression values of the selected Yin genes and Yang genes were extracted from published microarray expression data. Initially, the Yin (Y) and Yang (y) expression mean ratios (YMR) were calculated as a signature classifier for each sample. In the HG-U95A platform of the Bhattacharjee data set, 31 and 32 probe sets were used to extract the Yin gene expression and Yang gene expression values, respectively. To extract Yin and Yang genes from different platforms, the best match probe sets were downloaded from Affymetrix (http://www.affymetrix.com) and the gene symbols were used to match the gene IDs. For non-Affymetrix platforms, gene symbols were used as gene IDs. For multiple IDs within the same gene symbol, an average value was used.
  • in the HG-133plus2 platform of the Rick data set, average expression values were computed from multiple probe sets for 62 genes, since only one probe set was a best match to an HG-U95A probe set, 39651_at (RECQL4 gene).
  • the expression of 22 Yin genes was derived from 22 best matched probe sets, 3 genes matched single probe sets, and the expression of 6 genes was the average expression of multiple probe sets; the expression of 29 Yang genes was derived from best matched probe sets, and that of 2 genes from multiple probe sets.
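Averaging several probe sets that map to the same gene symbol, as described above, amounts to a group-by followed by a mean. The following Python sketch illustrates this; the probe identifiers and gene symbols are placeholders, and probes without a mapping are simply dropped, which is an assumption of this sketch.

```python
from collections import defaultdict
from statistics import fmean


def collapse_probes(probe_values, probe_to_gene):
    """Average the values of all probe sets that map to the same gene symbol."""
    by_gene = defaultdict(list)
    for probe, value in probe_values.items():
        gene = probe_to_gene.get(probe)
        if gene is not None:  # ignore probes with no gene annotation
            by_gene[gene].append(value)
    return {gene: fmean(values) for gene, values in by_gene.items()}


# Hypothetical probe-level values for one sample
probe_values = {"probe_1": 10.0, "probe_2": 20.0, "probe_3": 5.0}
probe_to_gene = {"probe_1": "GENE_A", "probe_2": "GENE_A", "probe_3": "GENE_B"}
print(collapse_probes(probe_values, probe_to_gene))
```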
  • the patient risk scores were derived from the YMR values. Patients were divided into high-risk and low-risk prognostic groups using YMR cutoff values.
  • a 2-fold difference in the Yin value over the Yang value was defined as a cutoff and then was adjusted by either: (i) a normal sample mean YMR, or (ii) a cancer sample mean YMR. If the normal lung sample YMR is significantly less than 1.0 (for example, the TCGA RNAseq data), the YMR cutoff will be adjusted to be lower than 2.0. If normal sample mean YMR is not available for a particular data set (for example, DCC and Marsh data sets), a cutoff value was selected that is close to the mean YMR of the lung cancer data set since many studies use the mean risk score to stratify patients. This arbitrary YMR cutoff value is only used for the YMR signature validation. In future, a universal YMR cutoff value may be selected for results from a clinically relevant platform such as qPCR.
  • the YMR was compared to a geometric Mean of Yin and Yang Ratio (gYMR).
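To make the comparison concrete, the Python sketch below computes both quantities for one sample. It assumes that the plain YMR of this Example is a ratio of arithmetic means, which the text implies by contrasting it with gYMR but does not state outright; the function names and expression values are invented.

```python
from statistics import fmean, geometric_mean


def mean_ratio(yin_values, yang_values):
    """Ratio of arithmetic means (assumed here to correspond to the plain YMR)."""
    return fmean(yin_values) / fmean(yang_values)


def geometric_mean_ratio(yin_values, yang_values):
    """Ratio of geometric means (gYMR)."""
    return geometric_mean(yin_values) / geometric_mean(yang_values)


yin = [1.0, 1.0, 16.0]   # one strongly expressed Yin gene
yang = [2.0, 2.0, 2.0]
print(mean_ratio(yin, yang))            # dominated by the single high value
print(geometric_mean_ratio(yin, yang))  # the high value is damped
```

With a fixed cutoff such as 2.0, the same sample can therefore fall on different sides of the threshold depending on which mean is used.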
  • the effects of dropping genes from the 31 Yin and 32 Yang gene list on associations with clinical outcome were assessed to determine optimal sizes of gene sets.
  • the significance of the YMR signature was assessed by comparing the YMR to the ratios of randomly picked gene groups of identical group size.
  • each YMR was assessed as a dichotomous covariate or a continuous covariate in a Cox proportional hazards model, with 5 to 6 years overall survival or without recurrence as the outcome variable.
  • the estimated hazard ratio, 95% confidence interval and p-value allowed us to directly compare the performances of YMR covariate with other clinical variables.
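For reference, the standard Cox proportional hazards model with YMR as the only covariate can be written as follows. The symbols $h_0(t)$ (baseline hazard), $\beta$ (fitted coefficient), and $c$ (cutoff) are generic notation introduced here, not notation from this disclosure.

```latex
% Continuous covariate: each unit increase in YMR scales the hazard by e^{\beta}
h(t \mid \mathrm{YMR}) = h_0(t)\,\exp(\beta \cdot \mathrm{YMR}),
\qquad
\mathrm{HR} = \frac{h(t \mid \mathrm{YMR}+1)}{h(t \mid \mathrm{YMR})} = e^{\beta}

% Dichotomous covariate: x = 1 for the high-risk group, 0 for the low-risk group
x = \mathbf{1}[\mathrm{YMR} > c],
\qquad
h(t \mid x) = h_0(t)\,\exp(\beta x),
\qquad
\mathrm{HR}_{\text{high vs. low}} = e^{\beta}
```

A hazard ratio above 1 with a confidence interval excluding 1 indicates that a higher YMR is associated with a higher risk of the outcome.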
  • Kaplan-Meier product-limit methods and log-rank tests were used to estimate and test differences in probability of survival between low- and high-risk patient groups.
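The product-limit estimate itself is straightforward to compute. The following Python sketch is a minimal stand-alone estimator for illustration only; the analyses in this disclosure were run in the statistical packages named below, and the follow-up times here are invented.

```python
def kaplan_meier(times, events):
    """Product-limit survival estimate.

    `times` are follow-up times; `events` are 1 for an observed event
    (death or recurrence) and 0 for a censored observation.
    Returns a list of (time, survival probability) at each event time.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        # Gather every subject whose follow-up ends at time t
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / n_at_risk
            curve.append((t, survival))
        n_at_risk -= removed
    return curve


# Hypothetical follow-up (years) for one risk group
print(kaplan_meier([1, 2, 2, 3, 4], [1, 1, 0, 1, 0]))
```

Running the function separately on the low-risk and high-risk groups yields the two curves that a log-rank test then compares.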
  • the survivor function was plotted for each subgroup. All statistical analyses were performed using PARTEK ® software, version 6.3 (PARTEK is a registered trademark of Partek Inc., St. Louis, MO, USA) or the R statistical package survcomp.
  • the YMRs were measured in new independent data sets. These data sets were processed on different platforms including Affymetrix GENECHIP ® HG-U95, HG-133A, HG-133plus2, ILLUMINA ® BeadChip (ILLUMINA is a registered trademark of Continental Resources Inc., Bedford, MA, USA), and two-channel arrays. The YMRs were calculated from these data sets either with or without data normalization based on the original data status.
  • To validate the YMR signature for lung cancer prognosis, four independent data sets were used: (i) 125 Bhattacharjee adenocarcinoma samples of the HG_U95Av2 platform, (ii) 58 Schm adenocarcinoma samples of the HG-133Plus2 platform, (iii) 442 DCC sample files of the HG-133A platform, and (iv) 259 TCGA samples of the RNA-seq platform. These are well-defined patient samples with clinical information. For analyses in this study, survival or recurrence-free outcomes were compared according to high-risk YMR (i.e., YMR is greater than 2.0 or an adjusted cutoff) and low-risk YMR (i.e., YMR is less than or equal to 2.0 or an adjusted cutoff).
  • the YMR score stratification in the same stages and in response to treatment was tested in the following groups of the DCC patients, respectively: Stage I; Stage II; Stage III; received chemotherapy; no chemotherapy; chemotherapy on stage I; chemotherapy on stage II & III; no chemotherapy on stage I; no chemotherapy on stage II & III.
  • defensin alpha 1 /// defensin
  • alpha IB /// defensin
  • solute carrier family 6 neurotransmitter transporter
  • cystatin F leukocystatin
  • pro-platelet basic protein chemokine (C-X-C motif)
  • neural precursor cell expressed developmentally
  • Yin genes and Yang genes showed little overlap with the previously reported lung cancer prognostic signature genes.
  • many Yin genes reported here were found in previous studies that relate lung cancer or other tissue type cancer development such as GRIN2D, GAST, AMH, TCF3, EXOSC2, GRM1, CDT1, RecQL4, CSTF2, FCGR2B, RNASEH2A, CDC6, CACYBP, BIRC5, CDC25, NRAS, EN2, and MIF.
  • Typical oncogenes were not found. Accordingly, it appears that the progression genes play more important roles than tumor initiation genes in determining lung cancer prognoses.
  • These networks participate in the canonical Molecular Mechanisms of Cancer pathway (Figs. 9, 10). These networks contain 31 genes whose gene symbol names matched the Affymetrix U95Av2 probe set identifiers. We selected these 31 genes as Yin genes (Table 5).
  • the 108 down-regulated genes constituted two main networks related to maintenance (network significant score of 63) and cellular development (network significant score of 23) processes.
  • the RAR Activation pathway and the Hepatic Stellate Cell Activation pathway (Fig. 11) invoked by Yang genes exert a wide variety of effects on tissue homeostasis, cell proliferation, differentiation, and apoptosis.
  • lung tissue harbors Hepatic Stellate-like cells, vitamin-A-storing lung cells.
  • Assessment of focus genes retrieved from the networks that involved cell maintenance and cellular development process revealed two groups of genes. These two groups (Tables 6, 7) contain 43 genes resulting in 32 unique genes. We defined these 32 genes as Yang gene candidates for signature development (Table 8).
  • aldehyde dehydrogenase 1 family, member
  • the signature models disclosed herein are based on computation of the YMR as the patient risk scores.
  • the YMR represents a simple combination or interaction effect of the Yin genes and Yang genes. The ratio indicates the Yin and Yang balance status in lung cells, that is, which group of genes is more active than the other and the extent of this difference. In normal lung cells, Yang is greater than Yin. Cancer phenotypes have higher YMR scores, which are associated with higher-risk disease.
  • YMR is less than 1.0 in normal lung tissues and greater than 1.0 in lung cancer tissues.
  • Several independent sample data sets with different platforms and different preprocesses were assessed (Table 9). YMRs were less than 1.0 in all normal lung data sets (Fig. 12).
  • YMRs of 12 different normal human tissue types were also measured.
  • the data were preprocessed by MAS5.0, and the quantile-normalized data were downloaded from the NCBI GEO database (GSE803).
  • the YMRs of each sample were directly calculated from the 31 Yin gene and the 32 Yang gene mean values. (Table 10).
  • the YMRs were less than 1.0 in normal lung, as well as in other normal tissues such as the heart, spleen, skeletal muscle, and prostate, but greater than 1.0 in other tissues such as the liver. This result suggests that the Yin and Yang gene expression profiles are tissue type specific.
  • all samples had an YMR greater than 1.0.
  • the YMRs greater than 1.0 in other independent lung cancer sample data sets are also shown in Fig. 12.
  • the YMR was evaluated for prognosis in four data sets in which the patient clinical information was available.
  • the YMR model was validated for the risk outcome of the Bhattacharjee data set from which the model was built. Since the patients' survival time or recurrence-free survival time information was not used in the modeling, this data set could therefore serve as an independent data set.
  • the YMR ratio was calculated using RNA-seq data of 259 TCGA samples.
  • the continuous YMR scores associate with the survival rate significantly (p-value 0.007, HR 1.87) (Table 11).
  • the geometric mean of the Yin and Yang gene expression ratio was calculated, and its association with poor outcome was tested both as a continuous variable and as a dichotomous variable.
  • the continuous gYMR does not work for the Bhattacharjee data and Marsh data
  • the dichotomous gYMR does not work for the Bhattacharjee data either.
  • the YMR is robust in four data sets.
  • Chemotherapy was a category variable (no chemotherapy group as reference);
  • Smoking history was a category variable (no smoking group as reference);
  • Sex was a binary variable (0 for female as reference);
  • Age was a binary variable (0 for ⁇ 60 years old as reference);
  • Tumor stage was a category variable (stage I as reference);
  • Hybrid models identified a 12-gene signature for lung cancer prognosis and chemoresponse prediction.
  • the YMR were evaluated with clinic covariates in lung cancer prognosis.
  • the 442 DCC samples showed a greater than 50% survival rate within 5 years (Fig. 10C), which is biased, given that the 5-year overall survival rate for lung cancer is as low as 16% and has not significantly improved over the past 30 years.
  • Table 15 gYMR covariate and multivariate analysis using an actuarial method*
  • Chemo was a category variable (no chemotherapy group as reference);
  • Smoker was a category variable (no smoking group as reference);
  • Sex was a binary variable (0 for female as reference);
  • Age was a binary variable (0 for ⁇ 70 years old as reference);
  • Tumor stage was a category variable (stage I as reference);
  • This disclosure pertains to a new survival prediction signature for lung cancer called "YMR".
  • This YMR signature was built from a cancer biology hypothesis in contrast to previously reported models that are based on survival time training (Table 14).
  • the YMR value of individual patients can provide valuable biomarker information relevant to lung cancer prognosis and therapeutic decision-making.
  • the ideal prediction model should be applicable to any single patient by providing an informative risk score for that patient.
  • the major shortcoming of all previous prediction models is that the signature gene-expression values of new samples have to be comparable to those of the training sample data in terms of data preprocessing, analysis platform, and data normalization. For example, Shedden et al.
  • YMR signatures as disclosed herein not only simplify the modeling but also avoid the data normalization preprocessing step, since the ratio for each patient is comparable.
  • the YMR is computed from the same individuals; therefore, it works for a single patient sample.
  • YMR works for different data analysis platforms and different data preprocess methods.
  • lung cancer prognosis with the YMR could be improved by optimizing the Yin and Yang gene lists and the number of genes in the YMR calculation.
  • the ratio of the expression of two genes within an individual patient has been reported as a biomarker signature for lung cancer diagnosis and prognosis as well as for breast cancer prognosis.
  • the single two-gene ratio or geometric mean of several two-gene ratios was selected between the treatment failures and the treatment responders from the training data samples.
  • the single two-gene ratio works well for cancer cell type classification or diagnosis; for example, between malignant pleural mesothelioma (MPM) and adenocarcinoma (ADCA), but it may not be able to reflect the complex tumor progression process for prognosis. In some cases, there could be substantial variation of the two genes among different samples.
  • DEGs were chosen between normal tissue samples and cancer samples. There have been no prior disclosures that selected the DEGs between normal and cancer samples for cancer prognostic signature development. Rather, previous disclosures selected genes between patients of long and short survival time or genes that correlate to survival time (Table 3). In those publications, Cox regression analysis of all genes against the survival time of all patients resulted in a proportional hazard rate for each gene. The top gene in the list, pre- clustered genes, or metagenes were used as signature genes. Other studies selected genes that were differentially expressed between high-risk and low-risk patients who were simply grouped by survival time. If the same idea (gene association with survival time) was used in gene selection then the selected signature gene lists would be similar for different studies.
  • the published signatures showed little overlap in the genes identified as significant predictors of outcome.
  • gene selections were influenced by variations in sample collection, sample size, data processing, and microarray platform.
  • the YMR signatures disclosed herein do not use survival time as a parameter for gene selection and used a gene clustering approach instead of group statistics. It is not unexpected that the gene list does not overlap previously reported lung cancer signature genes as the YMR signature development approach is quite different.
  • a useful prognostic signature should not only predict the patient's prognosis, but should also help clinical therapeutic decision making. Even though surgery alone is a standard treatment for early stage lung cancer, more than 20% of stage I patients will relapse. This portion of patients might benefit from chemotherapy. For the late stage lung cancer patients, after the complete resection of tumors, a good prognostic signal could spare the patient from chemotherapy or recommend less intensive therapy.
  • the YMR signature disclosed herein is a diagnostic tool for use in clinical therapeutic decision making for different stages of lung cancer. For those high YMR stage I patients, a careful therapy recipe is recommended. Chemotherapy can improve outcomes for high YMR stage II & III patients.
  • Example 2
  • the goal of this study was to narrow down the 31 Yin genes and 32 Yang genes disclosed in Example 1 required for development of reliable YMR signatures while retaining or increasing the YMR signature prognosis performance.
  • a smaller Yin gene / Yang gene list will reduce the clinical costs required to generate the YMR signatures and will be more practical for routine PCR-based detection protocols.
  • the YMR signature development was optimized on a variety of platform data sets using a multiple permutation process (MPP) as illustrated in Fig. 3. Seven hundred and forty one sample data were used in this optimization; specifically (i) 300 samples were downloaded from the DCC U133A NICI caArray database (https://array.nci.
  • the best gene list size was optimized by permutation of 10,000 YMR signatures with different sizes, followed by testing each signature against 1000 random data sets of 200 samples each.
  • the signature with all 31 Yin and 32 Yang genes had the highest occurrence of tests of p-value less than 0.05 among the 100,000,000 data set tests, (Fig. 21(A)-(C)).
  • Very few signatures consisting of 2 Yin and 2 Yang genes had p-values less than 0.05.
  • none of the signatures containing 27 or more Yin or Yang genes had a 75th percentile p-value less than 0.05.
  • Each signature was tested in 1,000 data sets of 200 random samples each. Signatures with the lowest p-values as well as a Hazard Ratio (HR) greater than 1.1 were retained. The top ten best performing YMR signatures were ranked by the number of tests with p-value < 0.05, the median p-value, the 90th percentile p-value, as well as the number of genes in the signature (Table 16).
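The structure of this permutation screen can be sketched in Python as follows. This is only an outline under stated assumptions: `evaluate` is a hypothetical user-supplied function that returns a p-value for a signature on a given set of sample identifiers (for example from a Cox fit, which is not implemented here), the minimum of 2 genes per side is an assumption, and the toy counts are far smaller than those reported above.

```python
import random


def random_signature(yin_pool, yang_pool, rng, min_size=2):
    """Draw a random-size subset of Yin genes and of Yang genes."""
    n_yin = rng.randint(min_size, len(yin_pool))
    n_yang = rng.randint(min_size, len(yang_pool))
    return rng.sample(yin_pool, n_yin), rng.sample(yang_pool, n_yang)


def permutation_screen(yin_pool, yang_pool, sample_ids, evaluate,
                       n_signatures, n_datasets, dataset_size,
                       alpha=0.05, seed=0):
    """Rank random signatures by how often they reach p < alpha on random data sets."""
    rng = random.Random(seed)
    # Random data sets are drawn once and shared by all signatures
    datasets = [rng.sample(sample_ids, dataset_size) for _ in range(n_datasets)]
    results = []
    for _ in range(n_signatures):
        yin, yang = random_signature(yin_pool, yang_pool, rng)
        hits = sum(evaluate(yin, yang, ds) < alpha for ds in datasets)
        results.append((hits, yin, yang))
    results.sort(key=lambda r: r[0], reverse=True)
    return results


# Toy run with a dummy evaluator (NOT a real survival test)
def dummy(yin, yang, dataset):
    return 0.01 if "Y1" in yin else 0.5


ranked = permutation_screen(["Y1", "Y2", "Y3", "Y4"], ["N1", "N2", "N3"],
                            list(range(50)), dummy,
                            n_signatures=20, n_datasets=5, dataset_size=10)
print(ranked[0])
```

In a real run, the ranking key would also include the median and percentile p-values and the signature size, as described above.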
  • IGFBP5 IGFBP5; AMH; NRAS; RECQL4; GRM1 6 SOSTDC1 ; TNNC1 ; CD83; HOXA5; ALDH1A2; GATA2
  • the top two signatures exhibited very close performance and differ in only one Yang gene (MYH10 versus HOXA5). Since HOXA5 occurred more frequently than MYH10 in the 4,398 signatures that have p-values less than 0.05 (Fig. 22), and HOXA5 acts directly downstream of the retinoic acid receptor (RAR) activation pathway that was noted in Example 1 as one of the main Yang effects, the signature of 4 Yin genes (GRM1, IGFBP5, NRAS, RECQL4) and 6 Yang genes (CRIP2, CD83, GATA2, HOXA5, SOSTDC1, TNNC1) was chosen for further testing.
  • This signature showed prognostic significance in 994 of 1,000 data sets with a median p-value of 1.30e-05 and a 90th percentile p-value of 0.002. Each one of these data sets consisted of 200 randomly picked samples. The clinical potential of this signature is high since it was validated in 1,000 different data sets. It is worth noting that all these top 10 signatures present similar performance, and therefore, all 10 signatures are useful. They share the main components of biological processes or pathways. Therefore any one gene from a particular biological process or pathway group could substitute for any other within the group since each would be expected to exhibit the same biological effect in a signature.
  • the optimized YMR significantly stratified patients into high-risk and low-risk groups (Figs. 23-27). These varieties of platform data sets were combined to make up a total of 1,664 samples. This large data set contains 909 samples from the 7 new data sets along with 613 samples used in the optimization studies plus an additional 142 DCC samples that were not used in the optimization studies. Among the 741 cases used in optimization, 128 (56 Marsh data samples and 72 TCGA RNAseq samples) were not included in the large data set because these cohorts lacked tumor stage information. As summarized in Table 16, among these patients, the median age at diagnosis was 63.2 years, 47% were male, 58% were smokers, and 54% were Stage I.
  • the patient cohort included approximately 17% who took therapy, 48% who did not take therapy after diagnosis, and 35% of patients whose treatment information was unknown.
  • EGFR/KRAS mutation status was known for 526 patients and p53 mutation status was known for 207 patients.
  • the YMR signature significantly stratified Stage I patients into high- and low-risk groups (p-value of 3.5e-05, HR of 2.7(1.7-4.2)) (Fig. 28(B)). YMR also stratified stage II patients (p-value of 0.046, HR of 2.8(1.01-7.9)) and stage III patients (p-value of 0.004, HR of 2.7(1.37-5.37)) (Figs. 29(A)-(D)). The YMR signature significantly stratified patients who underwent either chemotherapy or radiotherapy treatment into high-risk and low-risk groups (p-value of 0.03, HR of 2.9(1.1-7.6)). Similarly, the YMR signature stratified patients who did not receive treatment (p-value of 0.02, HR of 2.7(1.16-6.26)).
  • Age > 65 years old carries significantly higher risk of death than age ≤ 65 years (p-value of 5.10⁻⁶, HR of 1.47(1.25-1.74)), as expected.
  • Cancer stage was the most significant predictor of death.
  • the patients who bore at least one common oncogene mutation (KRAS/EGFR) or tumor suppressor gene p53 did not show higher risk than the patients not carrying these mutations.
  • the 4 Yin genes are oncogenes or are involved in tumorigenesis in various organs, whereas the 6 Yang genes are tumor suppressors or are involved in apoptosis or other anti-tumor processes.
  • the two pathways discussed in Example 1, i.e., the molecular mechanisms of cancer pathway and the RAR pathway, are the core of this Yin and Yang signature.
  • YMR Yin-Yang ratio model
  • It is disclosed in Example 1 that the identified 63 Yin and Yang genes were involved in the canonical Molecular Mechanisms of Cancer pathway and the Retinoic acid receptor (RAR) activation pathway.
  • the refined 4 Yin and 6 Yang genes disclosed in this Example are either directly or indirectly related to these two pathways. The ratio of these gene expressions represents the balance of these pathways, thereby reflecting the biological balance of the Yin and Yang effects within the tumor cells and consequently the risk of either cancer progression or cancer suppression.
  • the network that was generated using the 4 Yin and 6 Yang genes included the p38MAPK, JNK, AKT, MAPK, and ERK1/2 gene products, all with established roles in regulation of cancer development and progression.
  • GRM1 (SEQ ID NO: 16) is an oncogene in epithelial cells. GRM1 can activate ERK1/2 (Mann et al, 2006, Stimulation of oncogenic metabotropic glutamate receptor 1 in melanoma cells activates ERK1/2 via PKC. Cellular signalling 18: 1279-1286). ERK1/2 in turn can activate c-JUN and c-FOS transcription factors which regulate genes functional in the cell cycle and oncogenesis.
  • GRM1 (SEQ ID NO: 16) was found to play a role in the regulation of cell proliferation and tumor growth of breast cancer and was suggested as a potential new molecular target for anti-angiogenic therapy of breast cancer (Speyer et al., 2014, Metabotropic Glutamate Receptor-1 as a Novel Target for the Antiangiogenic Treatment of Breast Cancer. PloS one 9:e88830) and renal cell carcinoma (RCC) (Martino et al, 2013, Metabotropic glutamate receptor 1 (Grm1) is an oncogene in epithelial cells. Oncogene 32:4366-4376). RECQL4 (SEQ ID NO:28) was found to be highly expressed in human prostate tumor tissues.
  • Transient and stable suppression of RECQL4 (SEQ ID NO:28) by small interfering RNA and short hairpin RNA vectors drastically reduced the growth and survival of metastatic prostate cancer cells, indicating that RECQL4 (SEQ ID NO:28) could play critical roles in prostate cancer progression (Su et al, Human RecQL4 helicase plays critical roles in prostate carcinogenesis. Cancer research 70:9207-9217). RECQL4 (SEQ ID NO:28) was also overexpressed in breast cancer cells and may play a critical role in human breast tumor progression (Fang et al, 2013, RecQL4 Helicase Amplification Is Involved in Human Breast Tumorigenesis. PloS one 8:e69600).
  • IGFBP5 (SEQ ID NO:31) is an important member of the insulin growth factor system, which is critical for both normal cell physiology and tumorigenesis. IGFBP5 (SEQ ID NO:31) was found more highly expressed in breast cancer compared to adjacent normal tissues (Pekonen et al, 1992, Insulin-like growth factor binding proteins in human breast cancer tissue. Cancer research 52:5204-5207). IGFBP-5 (SEQ ID NO:31) overexpression has also been found to be a poor prognostic factor in patients with urothelial carcinomas of upper urinary tracts and urinary bladder (Gopal et al, 2013, SOSTDC1 down-regulation of expression involves CpG methylation and is a potential prognostic marker in gastric cancer.
  • the Yang gene HOXA5 (SEQ ID NO: 161) is a transcriptional factor whose expression is lost in more than 60% of breast carcinomas (Chen et al, 2004, HOXA5-induced apoptosis in breast cancer cells is mediated by caspases 2 and 8. Molec. Cell. Biol. 24:924-935).
  • HOXA5 acts directly downstream of retinoic acid receptor β and contributes to retinoic acid-induced apoptosis, growth inhibition, and chemopreventive effects, and induction of HOXA5 (SEQ ID NO: 161) expression leads to cell death with features typical of apoptosis (Chen et al, 2007).
  • Expression of SOSTDC1 (SEQ ID NO:94) in gastric tumors increased the probability of both overall and disease-free survival and it is consequently a potential prognostic factor and tumor suppressor in gastric cancer (SEQ ID NO:94).
  • CRIP2 (SEQ ID NO: 136) is a candidate tumor-suppressor gene, capable of functionally suppressing tumor formation. It acts as a repressor of NF-κB-mediated proangiogenic cytokine transcription to suppress tumorigenesis and angiogenesis (Cheung et al, 2011, Cysteine-rich intestinal protein 2 (CRIP2) acts as a repressor of NF-κB-mediated proangiogenic cytokine transcription to suppress tumorigenesis and angiogenesis. PNAS 108:8390-8395). Down-regulation of NF-κB leads to positive feedback of the RAR pathway (Figs. 23-27).
  • CRIP2 (SEQ ID NO: 136) induces apoptosis through induction of active caspase 3 and 9 proteins (Lo et al., The LIM domain protein, CRIP2, promotes apoptosis in esophageal squamous cell carcinoma. Cancer letters 316:39-45).
  • GATA2 (SEQ ID NO: 171) is critical for organ development, is associated with the progression of various cancer types, and was found to associate with RAR. This association is mediated by the zinc fingers of GATA2 (SEQ ID NO: 171) and the DNA-binding domain of RAR (Tsuzuki et al, 2004, Cross talk between retinoic acid signaling and transcription factor GATA-2. Molec. Cell. Biol.
  • Decreased expression of GATA2 (SEQ ID NO: 171) was associated with poor prognosis of hepatocellular carcinoma (HCC) following resection (Li et al, 2014, Decreased Expression of GATA2 Promoted Proliferation, Migration and Invasion of HepG2 In Vitro and Correlated with Poor Prognosis of Hepatocellular Carcinoma. PloS one 9:e87505).
  • Repression of GATA2 (SEQ ID NO: 171) in human and mouse lung tumors occurs via an epigenetic mechanism, since its promoter was unmethylated in normal lung but frequently methylated in lung tumors and NSCLC cell lines (Tessema et al, 2014, GATA2 is Epigenetically Repressed in Human and Mouse Lung Tumors and Is Not Requisite for Survival of KRAS Mutant Lung Cancer. J. Thorac. Oncol. 9:784-793).
  • Human CD83 is a marker molecule for mature dendritic cells (DC) that play a key role in inducing and maintaining antitumor immunity. DC antigen-presenting function may be lost or inefficient in the tumor environment.
  • TNNC1 (SEQ ID NO: 125), the troponin C type 1 (slow) gene, encodes troponin C, a central calcium-regulatory protein of striated muscle contraction. It was the gene occurring most frequently in the optimizing Yang gene lists, suggesting that it could be a tumor suppressor in lung cancer, much as MYOD, another muscle gene, is in brain cancer (Dey et al., 2013, MyoD is a tumor suppressor gene in medulloblastoma. Cancer Res. 73:6828-6837).
  • TNNC1 (SEQ ID NO: 125) binds calcium ions, which are involved in apoptotic signaling (Pinton et al, 2008, Calcium and apoptosis: ER-mitochondria Ca2+ transfer in the control of apoptosis. Oncogene 27:6407-6418).
  • The combination of data from different platforms as a whole provides confidence that this YMR will also work using qPCR detection.
  • The 4 Yin genes, 6 Yang genes, and 3 housekeeping genes are feasible for development as a qPCR assay for clinical use.
  • This Yin and Yang gene signature provides prognostic and potentially predictive information for patients at all stages. In particular, patients whose tumors have a low YMR have better treatment outcomes than patients whose tumors have a higher YMR.
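
The ratio described in the points above can be illustrated with a minimal Python sketch. It assumes the YMR is the mean expression of the Yin genes divided by the mean expression of the Yang genes on a linear scale; the excerpts here do not spell out the exact formula, so that definition, the function names `ymr_score` and `stratify`, the placeholder Yin gene names, the expression values, and the cutoff of 1.0 are all illustrative assumptions rather than the patented method. Housekeeping-gene normalization is omitted because a common per-sample reference factor cancels in a ratio of means.

```python
from statistics import mean


def ymr_score(expression, yin_genes, yang_genes):
    """Mean Yin expression divided by mean Yang expression for one sample.

    `expression` maps gene symbol -> expression on a linear scale.
    The mean-ratio form is an assumption made for illustration.
    """
    yin_mean = mean(expression[g] for g in yin_genes)
    yang_mean = mean(expression[g] for g in yang_genes)
    if yang_mean <= 0:
        raise ValueError("mean Yang expression must be positive")
    return yin_mean / yang_mean


def stratify(score, cutoff=1.0):
    """Label a sample 'low' or 'high' YMR; the cutoff is hypothetical."""
    return "low" if score < cutoff else "high"


# Illustrative values only. HOXA5 and TNNC1 are named as Yang genes in the
# text; YIN_A and YIN_B are placeholder names, not genes from the patent.
sample = {"YIN_A": 4.0, "YIN_B": 2.0, "HOXA5": 6.0, "TNNC1": 6.0}
score = ymr_score(sample, ["YIN_A", "YIN_B"], ["HOXA5", "TNNC1"])
print(score, stratify(score))
```

Under these assumptions a sample with a score below the cutoff falls in the low-YMR group, which the text associates with better treatment outcomes. In practice the cutoff would have to be derived from training cohorts.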

Landscapes

  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Chemical & Material Sciences (AREA)
  • Physics & Mathematics (AREA)
  • Engineering & Computer Science (AREA)
  • Genetics & Genomics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Molecular Biology (AREA)
  • Biotechnology (AREA)
  • General Health & Medical Sciences (AREA)
  • Biophysics (AREA)
  • Proteomics, Peptides & Aminoacids (AREA)
  • Theoretical Computer Science (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Medical Informatics (AREA)
  • Evolutionary Biology (AREA)
  • Analytical Chemistry (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Organic Chemistry (AREA)
  • Biochemistry (AREA)
  • Immunology (AREA)
  • Pathology (AREA)
  • Wood Science & Technology (AREA)
  • Zoology (AREA)
  • Hospice & Palliative Care (AREA)
  • Oncology (AREA)
  • Microbiology (AREA)
  • General Engineering & Computer Science (AREA)
  • Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)

Abstract

The present invention relates to a method of selecting a set of Yin genes and Yang genes for use in establishing a prognosis for a cancer patient, the method comprising the following steps: (i) screening gene expression datasets collected from a plurality of patient cancer tissue samples to identify a first plurality of genes expressed in cancer cells; (ii) screening gene expression datasets collected from a plurality of healthy subjects or of normal tissue samples adjacent to a cancer to identify a second plurality of genes expressed in healthy cells; (iii) selecting a set of Yin genes and a set of Yang genes through functional analysis of pathways and genes; and (iv) constructing a Yin and Yang gene expression ratio (YMR) model as a prognostic signature of a cancer. The prognostic signature is implemented in software for a cancer patient by: (v) detecting the expression levels of the selected Yin genes and the selected Yang genes; and (vi) calculating the YMR to predict clinical outcomes for the patient.
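
The screening in steps (i) and (ii) can be pictured with a short Python sketch. It is only a stand-in: the abstract states that the final gene sets are chosen through functional analysis of pathways and genes, whereas this sketch uses a simple log2 fold-change filter between tumor and normal samples to propose candidates. The function name `screen_candidates`, the threshold `min_log2_fc`, the gene names, and the data are hypothetical.

```python
from statistics import mean


def screen_candidates(tumor, normal, min_log2_fc=1.0):
    """Split genes into Yin and Yang candidates by mean log2 difference.

    `tumor` and `normal` map gene symbol -> list of log2 expression values.
    Genes higher in tumor become Yin candidates; genes higher in normal
    tissue become Yang candidates. The threshold is an assumption.
    """
    yin, yang = [], []
    for gene in sorted(tumor.keys() & normal.keys()):
        diff = mean(tumor[gene]) - mean(normal[gene])
        if diff >= min_log2_fc:
            yin.append(gene)
        elif diff <= -min_log2_fc:
            yang.append(gene)
    return yin, yang


# Made-up log2 expression values for three placeholder genes.
tumor = {"GENE_A": [8.0, 9.0], "GENE_B": [3.0, 4.0], "GENE_C": [5.0, 5.0]}
normal = {"GENE_A": [5.0, 6.0], "GENE_B": [7.0, 6.0], "GENE_C": [5.0, 5.5]}
print(screen_candidates(tumor, normal))
```

Candidates produced this way would still need the pathway and functional review described in step (iii) before being used in a YMR model.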
PCT/CA2014/050704 2014-07-24 2014-07-24 Yin-yang gene expression ratio models for generating clinical prognostic signatures for lung cancer patients WO2016011524A1 (fr)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/CA2014/050704 WO2016011524A1 (fr) 2014-07-24 2014-07-24 Yin-yang gene expression ratio models for generating clinical prognostic signatures for lung cancer patients

Publications (1)

Publication Number Publication Date
WO2016011524A1 (fr) 2016-01-28

Family

ID=55162345

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CA2014/050704 WO2016011524A1 (fr) Yin-yang gene expression ratio models for generating clinical prognostic signatures for lung cancer patients 2014-07-24 2014-07-24

Country Status (1)

Country Link
WO (1) WO2016011524A1 (fr)

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2014176687A1 (fr) * 2013-04-29 2014-11-06 University Of Manitoba Yin-yang gene expression ratio models for generating clinical prognostic signatures for lung cancer patients

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
BHATTACHARJEE, A. ET AL.: "Classification of human lung carcinomas by mRNA expression profiling reveals distinct adenocarcinoma subclasses.", PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES USA., vol. 98, no. 24, 20 November 2001 (2001-11-20), pages 13790 - 13795, XP003011749, ISSN: 0027-8424, DOI: doi:10.1073/pnas.191502998 *
XU, W. ET AL.: "Yin Yang gene expression ratio signature for lung cancer prognosis.", PLOS ONE., vol. 8, no. 7, 17 July 2013 (2013-07-17), pages 1 - 11, ISSN: 1932-6203 *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2019243567A1 (fr) * 2018-06-21 2019-12-26 Cancer Research Technology Limited Method for predicting prognosis and treatment response
CN112567052A (zh) * 2018-06-21 2021-03-26 Cancer Research Technology Limited Method for predicting prognosis and treatment response

Similar Documents

Publication Publication Date Title
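JP4938672B2 (ja) Methods, systems, and arrays for classifying cancer, predicting prognosis, and diagnosing cancer based on the association between p53 status and gene expression profiles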
Hu et al. Serum microRNA signatures identified in a genome-wide serum microRNA expression profiling predict survival of non-small-cell lung cancer
US8877445B2 (en) Methods for identification of tumor phenotype and treatment
ES2525382T3 (es) Method for predicting breast cancer recurrence under endocrine treatment
Stigliani et al. High genomic instability predicts survival in metastatic high-risk neuroblastoma
EP1721159B1 (fr) Breast cancer prognostics
Ak et al. MicroRNA and mRNA features of malignant pleural mesothelioma and benign asbestos-related pleural effusion
US20130273079A1 (en) Ultraconserved Regions Encoding ncRNAs
CN106574297B (zh) Method for selecting an individualized triple therapy for cancer treatment
US20100216131A1 (en) Gene expression profiling of esophageal carcinomas
WO2011039734A2 (fr) Use of genes involved in anchorage independence for optimizing the diagnosis and treatment of human cancer
JP7043404B2 (ja) Gene signature of residual risk following endocrine treatment in early breast cancer
KR20190089552A (ko) Biomarker for diagnosing non-muscle-invasive bladder cancer and use thereof
CA2696947A1 (fr) Methods and tools for diagnosing cancer in ER- patients
Hwang et al. Genomic copy number alterations as predictive markers of systemic recurrence in breast cancer
Sriram et al. Genomic medicine in non‐small cell lung cancer: Paving the path to personalized care
Zhao et al. A three long noncoding RNA-based signature for oral squamous cell carcinoma prognosis prediction
Bie et al. Higher expression of SPP1 predicts poorer survival outcomes in head and neck cancer
Nair et al. Genomic and transcriptomic analyses identify a prognostic gene signature and predict response to therapy in pleural and peritoneal mesothelioma
EP2657348B1 (fr) Diagnostic miRNA profiles for multiple sclerosis
WO2022122994A1 (fr) Prognostic method for aggressive lung adenocarcinomas
US20120238617A1 (en) Microrna expression signature in peripheral blood of patients affected by hepatocarcinoma or hepatic cirrhosis and uses thereof
EP1512758B1 (fr) Colorectal cancer prognostics
WO2014176687A1 (fr) Yin-yang gene expression ratio models for generating clinical prognostic signatures for lung cancer patients
WO2016011524A1 (fr) Yin-yang gene expression ratio models for generating clinical prognostic signatures for lung cancer patients

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 14898185

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 14898185

Country of ref document: EP

Kind code of ref document: A1