WO2019220459A1 - A chip and a method for head & neck cancer prognosis - Google Patents

A chip and a method for head & neck cancer prognosis

Info

Publication number
WO2019220459A1
Authority
WO
WIPO (PCT)
Prior art keywords
samples
genes
sample
tumour
chip
Prior art date
Application number
PCT/IN2019/050386
Other languages
French (fr)
Inventor
Sultan PRADHAN
Susanta ROYCHOUDHURY
Surajit GANGULY
Rajan KANNAN
Sucheta TRIPATHY
Sanjib DEY
Dipanjana DATTA DE
Piyush DAS
Indranil MUKHOPADHYAY
Farokh CHINOY ROSHAN
Rajesh MUNDE
Arnab Choudhury
Original Assignee
Council Of Scientific And Industrial Research
Prince Aly Khan Hospital
Tcg Lifesciences Limited
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Council Of Scientific And Industrial Research, Prince Aly Khan Hospital, Tcg Lifesciences Limited
Publication of WO2019220459A1

Classifications

    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6876 Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes
    • C12Q1/6883 Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for diseases caused by alterations of genetic material
    • C12Q1/6886 Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for diseases caused by alterations of genetic material for cancer
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q2600/00 Oligonucleotides characterized by their use
    • C12Q2600/118 Prognosis of disease development
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q2600/00 Oligonucleotides characterized by their use
    • C12Q2600/158 Expression markers

Definitions

  • the invention relates to a chip and a method for head & neck cancer prognosis and further the biomarkers that predict the aggressive subset amongst early squamous cancer of tongue, biomarkers that can predict the aggressive subset amongst early squamous cancer of the buccal mucosa, and an analytical process using micro-array based gene expression data to classify metastatic and non-metastatic Squamous cell carcinoma of the head and neck (HNSCC) tumors to aid in the decision making by a Surgeon to dissect or not to dissect the neck of a patient. In other words, this classification process will allow the surgeon to avoid over-treatment as well as under treatment.
  • HNSCC arises from the epithelial mucosal region of the upper aerodigestive tract encompassing regions of oral cavity, larynx and pharynx and has variable etiologies and prognosis. Smoking and alcohol consumption are the foremost risk factors for HNSCC.
  • Oral submucous fibrosis is a typically Indian disease as it is caused due to the habit of chewing areca nut.
  • a very large number of Indian oral cancer cases have associated submucous fibrosis. This is not so in oral cancers in the Western population.
  • HNSCC has a tendency to metastasize to the cervical lymph nodes. Presence of metastasis to cervical lymph node has a major adverse impact on the prognosis of oral cancer. There are site wise differences in the pattern of lymph node metastases. Level one lymph node is the first echelon of metastasis from cancer of the buccal mucosa whereas level two is the first echelon of metastasis from cancer of the tongue.
  • Presence of cervical lymph node metastases may manifest clinically as a palpable mass or may be demonstrated on imaging and a needle biopsy.
  • In a sizeable number of cases of oral cancer there may be micrometastases that are neither clinically palpable nor demonstrable on imaging. These cases may pass off as 'N zero' (N0) though they are truly 'N+' cases.
  • stage T1: tumour size < 2 cm
  • stage T2: tumour size 2-4 cm
  • a radical or modified radical neck dissection is performed.
  • N0 (N zero) cases may be harbouring occult cervical lymph node metastases. If the neck is not dissected, the neck disease will manifest within weeks or months as palpable metastatic nodes. Resection at that stage may or may not be possible. If, on the other hand, neck dissection is performed prophylactically in all N0 cases, 70% of the cases will be subjected to an unnecessary radical surgery with all its costs & morbidity.
  • the initial 102-gene signature was developed on an in-house 'customized' array platform and was subsequently subjected to a commercial whole gene expression array for platform transition, followed by analysis on a dedicated diagnostic array for multicentric validation.
  • Van Hoof SR et al. (Journal of Clinical Oncology 2012 30:33, 4104-4110) reported an analysis that showed an overall accuracy of 72% for the whole validation cohort and an 89% NPV upon combining with clinical assessment.
  • the 825 genes or subset thereof generated from the HNSCC tumors of the patient population in the Netherlands, disclosed in PCT publication WO2006/085746, suffer from severe limitations and are not applicable to patients in the Asian subcontinent due to distinct population-specific differences in the genetic architecture of the HNSCC tumors.
  • the gene list could classify only at 72% accuracy for the validation cohort that reaches to 89% upon combining with clinical information. Further, the said gene list is unlikely to successfully classify the HNSCC samples originating from the Asian subcontinent as a major percentage of these population cohorts have a history of exposure to smokeless tobacco, as depicted by the current set of patient volunteers whose tissues were subjected to transcriptomics analysis (Table 1).
  • biomarkers that predict the aggressive subset amongst early squamous cancer of tongue, biomarkers that can predict the aggressive subset amongst early squamous cancer of the buccal mucosa within the population in the Asian subcontinent, and an analytical process using micro-array based gene expression signature to distinguish potentially metastatic from non-metastatic HNSCC tumors to aid in the decision making by a Surgeon whether to undertake a prophylactic radical neck dissection or whether to wait and watch with regular follow up.
  • the main object of the invention is to provide a chip, a kit, biomarkers and an in-vitro method for head & neck cancer prognosis.
  • Another major objective of the invention is to provide biomarkers to aid the surgeon in making an evidence-based decision whether to dissect or not to dissect the neck during surgery for an early (T1/T2, N0) squamous cancer of the oral cavity, thus minimising the risk of under-treatment and avoiding the morbidities of over-treatment.
  • a major objective of the invention is to provide biomarkers to identify the aggressive subset amongst early HNSCC (T1, T2, N0) so as to assist the head and neck oncologist in prognostication as also in planning appropriate and optimum treatment upfront.
  • Another object of the invention is to provide biomarkers that predict the aggressive subset amongst early squamous cancer of tongue (T1/T2) within the population in the Asian subcontinent.
  • Another object of the invention is to provide biomarkers that can predict the aggressive subset amongst early squamous cancer of the buccal mucosa (T1/T2) within the population in the Asian subcontinent.
  • Yet another object of the invention is to provide an analytical process using micro-array based gene expression data to classify metastatic and non-metastatic HNSCC tumors that can be used by a surgeon to decide on whether to dissect the neck of a patient.
  • Yet another object of the invention is to provide a kit for the determination of head-and-neck tumor, comprising a microarray which includes one or more polynucleotides comprising a nucleic acid sequence complementary to the sequence of one or more mRNAs selected from the group arrived at using the said analytical process, capable of measuring the expression level of the mRNA in tumor biopsy materials or body fluids like saliva and blood, so as to yield a comparative expression profile between sample and control to determine presence of tumor and its metastatic potential.
  • Yet another object of the invention is to provide the said kit for the determination of head-and-neck tumor, including a primer set capable of amplifying the sequence of one or more mRNA in tumor biopsy materials or body fluids like saliva and blood, selected from the group arrived at using the analytical process and a fluorescent probe comprising a polynucleotide consisting of a nucleic acid sequence complementary to the mRNA sequence or a part thereof.
  • the present invention relates to a chip for detection of head & neck cancer prognosis consisting of cDNA sequence complementary to the nucleotide sequence having Seq id no. 1-957.
  • the invention provides a method for microarray-based gene-expression assay conducted on a microarray chip for quantifying the expression levels of signature biomarkers in a tumour biopsy material, such that the said method comprises steps of:
  • step (e) hybridising the said fT as obtained in step (d) with oligonucleotide probes on the said microarray chip;
  • the scanner generates fluorescent coloured image of the resultant spots due to hybridization of fT to the probes on the Chip.
  • the microarray chip has a biomarker specific oligonucleotide probe on the surface of a glass slide.
  • tumour is selected from a group consisting of HNSCC, at N0, N+, T1, T2, T3, T4 stage of tongue and buccal mucosa.
  • cRNA as in step (c) is a biomarker for detection of head & neck cancer prognosis having Seq id no. 1-957.
  • kits for head & neck cancer prognosis consisting of :
  • Step VIII repeating Steps V to Step VII 200 times to remove the sample bias and collecting all 200 feature lists so obtained;
  • the invention provides a method for screening therapeutic agents for head-and-neck tumour, comprising the steps: i. administering a test substance to a non-human animal suffering from artificially induced head-and-neck tumour;
  • Fig. 1 illustrates the flow chart of the analysis process followed.
  • the model depicts the workflow and the numbers of samples analyzed and genes shortlisted at each analysis. 443 samples were received, of which 286 were tumor samples. Of these 286 samples, 50 were kept aside for the testing set and the rest were used as a training set for the discovery phase. Reiterative statistical and computer algorithms were used to generate a final gene list. The combined final list was then validated in the 50 samples that were kept aside for the testing phase.
  • Fig. 2 illustrates the Heat Map.
  • the heat map depicts the differentially expressed transcripts (genes) between tumor and control tissues.
  • 2980 differentially expressed transcripts with p value less than 0.05 and 1.5 fold differences were selected.
  • Fig. 3 illustrates the prediction accuracy curve.
  • the lower panel shows how the prediction accuracy is varied by the different transcript lists.
  • Fig. 4 illustrates AUC - training sample.
  • the panels depict the sensitivity of prediction both in the training set samples used for discovery phase and the testing set samples used for validation phase.
  • Fig. 5 illustrates the validation data of prediction of 50 test samples.
  • Fig. 6 illustrates prediction accuracy of 10 follow up samples converted from N0 to N+.
  • the picture depicts that prediction with the help of this algorithm accurately predicts the clinical progression of 10 samples. Samples which converted from N0 to N+ clinically after 18 months were predicted successfully by the transcript list generated.
  • Fig. 7 illustrates comparison of gene signatures derived from Indian and Caucasian head and neck patients.
  • the graph illustrates that the transcript list generated is indigenous to the Indian population and is unique when compared to the Caucasian population. This shows the necessity to have an indigenous biomarker for tumor progression as the allele pool is different between Indian and other populations.
  • Fig. 8 illustrates classifying unknown sample as N0 or N+:
  • the normalized expression value is used as input into the model generated from the 34 or 288 gene signature using the learning algorithm as shown.
  • Table 1 illustrates the history of 293 patient volunteers whose tissues were subjected to gene-expression analysis for this invention.
  • Table 2 illustrates the clinical description of the samples used for gene-expression analysis.
  • Table 3 illustrates the differentially expressed [Tumour (T) versus Control (C)] 2980 genes or nucleotide sequences.
  • Table 4 illustrates the 288 gene signatures [N+ vs N0]
  • Table 5 illustrates the signature of 34 differentially (N+ versus N0) expressed genes or nucleotide sequences.
  • Table 6 illustrates the prediction accuracy/sensitivity.
  • Table 7 illustrates the signatures of 957 genes or nucleotide sequences [N+ vs N0]
  • RNA samples were collected in the hospital (Table 2) [Tissue material is taken from Prince Aly Khan Hospital (PAKH), Mumbai- 400010] and preserved in RNA Later (Ambion) at -80°C.
  • Total RNA was isolated from the tissue samples, and its quality and quantity were determined.
  • the isolated RNA samples were converted to cRNA and labelled with Cy3 (Cyanine™ 3) dye.
  • the labelled cRNA was hybridized to the Illumina Human HT-12 v4 Expression Beadchip, the Illumina bioarrays were read in an Illumina iScan Reader, and the primary intensity data were obtained in standard file format using Genome Studio software.
  • the process of analysing the micro-array data comprises the following main steps:
  • Step VIII Repeating Steps V to Step VII 200 times to remove the sample bias and collecting all 200 feature lists so obtained;
  • Step XI Predicting the 50-sample validation set with the gene that appeared the highest number of times in the 957-gene list and calculating prediction accuracy, followed by prediction with the two transcripts having the highest and next highest frequency. This procedure was repeated by increasing the number of transcripts.
  • the transcript list(s) comprising 288 nucleotide sequences (genes or transcripts) (Table 4) and 34 nucleotide sequences (genes or transcripts) (Table 5) showed the highest prediction accuracy.
  • the biomarker information generated from the training set samples was used for class prediction of 50 tumor samples that were initially put aside and are referred to as validation set samples. Analysis of these 50 samples with the 288 nucleotide sequences (genes or transcripts), and also with the 34 nucleotide sequence signature which is a subset of the 288 nucleotide sequences (genes or transcripts), gave overall prediction accuracy of 84% and 82% respectively in these samples (Fig. 5). Moreover, Fig. 3 illustrates that the nucleotide sequence signature of 288 nucleotide sequences (genes or transcripts) or a subset of 34 nucleotide sequences (genes or transcripts) gave overall prediction accuracy of > 95% in training samples. The specificity and sensitivity for both the test samples yielded acceptable results as demonstrated in Fig. 4 and Table 6. The ROC curves illustrated in Fig. 4 depict the percentage of false positives and false negatives.
  • the present invention includes a nucleotide sequence (mRNA or cDNA) signature of 288 genes with up-regulation (induction) of one or more genes selected from serial numbers 1-174 and concomitant down-regulation (suppression) of one or more genes from numbers 175-288 as listed in Table 4, and a signature of 34 genes (a subset of the 288 genes) as given in Table 5, where up-regulation of one or more genes selected from genes 1-21 and concomitant down-regulation of one or more genes from numbers 22-34 are indicative of the aggressive subset of head and neck cancer.
  • Both the signature gene sets of 34 and 288 genes are subsets of 957 genes (Table 7).
  • the present invention provides: (i) A signature of 957 nucleotide sequences or genes or transcripts — any combinations of nucleotide sequences 1-957;
  • any combinations of nucleotide sequences 1-288;
  • nucleotide sequences 1-34 that can be used for determining the expression level of the mRNA in tumor biopsy materials so as to yield a comparative expression profile between sample and control to determine presence of tumor and its metastatic potential
  • RNA quality control criteria were set in concordance with the tumor analysis best practices group; samples with a RIN value below 7 were not used further.
  • cRNA was prepared using the Illumina Total Prep RNA Amplification Kit from Ambion as per the manufacturer's instructions. First strand synthesis of cDNA was performed by reverse transcription of 500 ng of RNA using ArrayScript, followed by second strand synthesis by DNA polymerase. The double stranded cDNA was then purified and used as template for in vitro transcription at 37 °C for 16 hours to generate cRNA. During in vitro transcription, biotinylated 5-(3-aminoallyl)-UTP was incorporated in the single stranded cRNA. The cRNA was then purified and the yield was quantified by Qubit spectrophotometry. 750 ng of cRNA was used for further hybridizations.
  • Example 3 Array Based Whole Genome Gene expression profiling using Illumina Human HT-12 v4 Expression Beadchip
  • the Direct hybridization Whole-Gene Expression assay offers the highest multiplexing capabilities for whole genome gene expression, simultaneously profiling more than 47,000 transcripts (genes).
  • the HumanHT-12 v4.0 Expression Beadchip supports a 12-sample format, which facilitates large-scale gene expression studies. This technique allows analysis of the genome-wide transcriptome profile, targeting more than 31,000 annotated genes per sample using 47,000 probe sets.
  • Illumina Whole Genome Gene Expression Beadchips consist of oligonucleotides immobilized to beads held in microwells on the surface of an array substrate. The presence of a 29-mer array sequence on each bead helps in uniquely identifying the location of the bead on the array surface and hence helps in the hybridization-based procedure to map the array. Seven hundred fifty nanograms (750 ng) of labeled sample cRNA were detected by hybridization to 50-mer probes on the bead chip. Subsequent steps included washing, blocking, and streptavidin-Cy3 staining followed by serial non-stringent washing steps to remove unbound conjugate. Following the final rinse, the chips were dried by centrifugation and scanned. The Illumina bioarrays were read in an Illumina iScan Reader and the primary intensity data were obtained in standard file format, using Genome Studio software.
  • Raw background-subtracted data were extracted by Illumina GenomeStudio Software v3, and were further processed in the R statistical environment (http://www.r-project.org) using the lumi package.
  • Raw data were pre-processed to force positive values and transformed using the variance stabilization transformation of the R Bioconductor lumi package.
  • the robust spline method was used for normalization and outliers were removed using the k-means method. Filtration retained probes reaching a detection p value < 0.01 in all samples.
  • Differentially expressed nucleotide sequences were analyzed in the R Bioconductor limma package. A linear model was fitted for each nucleotide sequence given a series of arrays using the lmFit function. Using the Benjamini and Hochberg method, the p values were adjusted for multiple testing. Nucleotide sequences/probes with FC > 1.5 or FC < -1.5 and adjusted p value < 0.05 were considered to be differentially expressed between tumor and control samples.
  • a Class Prediction based on training testing method was developed. Three hundred three (303) tumor samples were first normalized based on robust spline normalization and outliers (21 samples) were removed. From this group of 286 tumor samples, 50 randomly selected samples were kept aside for validation. The remaining 236 tumor samples were used for identification of the biomarker.
  • the training-testing method was performed in 236 samples using the 2980 nucleotide sequences as input to predict N0 and N+ tumors (Table 3). The data from each sample was subjected to reiterative statistical analysis to produce a nucleotide sequence/probe list that could accurately classify early N0 and N+ HNSCC patients in the 236 training set samples.
  • nucleotide sequence signature of 957 (Table 7) nucleotide sequences or a sub set of 288 (Table 4) nucleotide sequences or a subset of 34 (Table 5) nucleotide sequences gave overall prediction of > 95% in training samples. (Fig. 4).
  • the biomarker information generated from the discovery phase / training set samples was used for class prediction of 50 tumor samples that were initially put aside and are referred to as validation set samples. Analysis of these 50 samples with the 288 nucleotide sequences, and also with the 34 nucleotide sequence signature which is a subset of the 288 nucleotide sequences, gave overall prediction accuracy of 84% and 82% respectively in these samples (Fig. 3 and Fig. 5). The specificity and sensitivity for both the test samples yielded acceptable results as depicted in Fig. 4 and Table 6. The ROC curves illustrated in Fig. 4 depict the percentage of false positives and false negatives.
  • the list of biomarkers or genes, as provided in Tables 4, 5, and 7, was used to identify the aggressive tumour subset amongst early HNSCC (T1, T2, N0) tumours by assaying the expression level of mRNAs of the corresponding genes in tumour biopsy materials by a hybridization-based assay on a microarray chip.
  • the microarray chip is prepared by spotting (printing) mRNA- or biomarker-specific oligonucleotide probes as listed in the sequence listing, along with a few additional spots (5-10) of oligonucleotides unrelated to human samples for background correction and a few oligonucleotides (3-5) for normalization purposes, on the surface of a glass slide.
  • the probes spotted on a chip “hybridize” with the corresponding mRNA or biomarker, listed in Tables 4, 5 and 7, present in the tumor sample of the subject.
  • the invention relates to the development of a gene expression based biomarker that predicts the aggressive subset amongst early stage oral cavity squamous cell carcinoma of the head and neck (HNSCC).
  • This gene expression based kit will aid in the decision making by a Surgeon to dissect or not to dissect the lymph node of a patient with oral cavity cancer. Thus, this classification process will allow the surgeon to avoid over-treatment as well as under treatment.
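The statistical steps listed in the definitions above (Benjamini and Hochberg adjustment with a fold-change cut-off, a 50-sample hold-out from 286 tumour samples, and frequency ranking of probes across repeated feature lists) can be illustrated with a minimal Python sketch. This is not the applicants' R/limma pipeline; all function names are hypothetical, and the classifier that produces each per-round feature list is assumed to exist elsewhere.

```python
import random
from collections import Counter

def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

def differential_probes(probes, fold_changes, p_values, fc_cut=1.5, alpha=0.05):
    """Keep probes with FC > fc_cut or FC < -fc_cut and adjusted p < alpha."""
    adj = benjamini_hochberg(p_values)
    return [p for p, fc, q in zip(probes, fold_changes, adj)
            if abs(fc) > fc_cut and q < alpha]

def hold_out_split(sample_ids, n_test=50, seed=0):
    """Randomly set aside n_test samples; returns (training, validation)."""
    rng = random.Random(seed)  # fixed seed is an assumption, for repeatability
    shuffled = list(sample_ids)
    rng.shuffle(shuffled)
    return shuffled[n_test:], shuffled[:n_test]

def rank_by_frequency(feature_lists):
    """Rank probes by how many resampling rounds selected them."""
    counts = Counter(p for features in feature_lists for p in set(features))
    return [p for p, _ in counts.most_common()]
```

In this sketch, `rank_by_frequency` would receive the 200 feature lists collected in Step VIII, and the resulting order is the one in which transcripts would be added one at a time for the incremental prediction described in Step XI.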

Abstract

The present application provides a method for classification of metastatic and non-metastatic squamous cell carcinoma of head and neck (HNSCC) into N0 (low or no risk of metastases) or N+ (high risk of metastases) based on analysis of gene expression profile. Particularly, the present application provides a method for classification of HNSCC which is specifically effective for the Asian population.

Description

A CHIP AND A METHOD FOR HEAD & NECK CANCER PROGNOSIS
FIELD OF THE INVENTION
The invention relates to a chip and a method for head & neck cancer prognosis and further the biomarkers that predict the aggressive subset amongst early squamous cancer of tongue, biomarkers that can predict the aggressive subset amongst early squamous cancer of the buccal mucosa, and an analytical process using micro-array based gene expression data to classify metastatic and non-metastatic Squamous cell carcinoma of the head and neck (HNSCC) tumors to aid in the decision making by a Surgeon to dissect or not to dissect the neck of a patient. In other words, this classification process will allow the surgeon to avoid over-treatment as well as under treatment.
BACKGROUND OF THE INVENTION
Squamous cell carcinoma of the head and neck (HNSCC), the sixth most common cancer worldwide, is a significant cause of morbidity and mortality, with approximately 540,000 new cases annually worldwide (Vigneswaran and Williams; Oral Maxillofac Surg Clin North Am. 2014 May;26(2):123-41. doi: 10.1016/j.coms.2014.01.001). HNSCC has the highest incidence of all cancers in South East Asia, and in India it comprises 32% to 40% of total malignancies, in comparison to Western countries where it is only 3% to 5% (Vigneswaran and Williams; Oral Maxillofac Surg Clin North Am. 2014 May;26(2):123-41. doi: 10.1016/j.coms.2014.01.001). Despite recent advances, the 5-year survival rate is approximately 50% due to relatively high recurrence rates in the patients and the development of second primary tumours (SPTs) in the upper aerodigestive tract (Tabor, Brakenhoff et al.; Clin Cancer Res. 2001 Jun;7(6):1523-32). HNSCC arises from the epithelial mucosal region of the upper aerodigestive tract encompassing regions of oral cavity, larynx and pharynx and has variable etiologies and prognosis. Smoking and alcohol consumption are the foremost risk factors for HNSCC. However, the difference in prevalence between populations is attributable to the indigenous habit of chewing a mixture of tobacco, areca nut, lime, betel leaf and spices in a variety of combinations (Tabor, Brakenhoff et al.; Clin Cancer Res. 2001 Jun;7(6):1523-32). Human papillomavirus (HPV) is also emerging as an important risk factor for the development of HNSCC. In India, it is the chewing of tobacco quid, and its placement in the gingivo-buccal groove, that is responsible for the high incidence overall and the predilection for buccal mucosa cancer. In the West it is mainly alcohol consumption along with smoking that is responsible for causing oral cancer, which affects mainly the floor of the mouth and the tongue.
Oral submucous fibrosis, a premalignant condition, is a typically Indian disease as it is caused by the habit of chewing areca nut. A very large number of Indian oral cancer cases have associated submucous fibrosis. This is not so in oral cancers in the Western population.
HNSCC has a tendency to metastasize to the cervical lymph nodes. Presence of metastasis to cervical lymph node has a major adverse impact on the prognosis of oral cancer. There are site wise differences in the pattern of lymph node metastases. Level one lymph node is the first echelon of metastasis from cancer of the buccal mucosa whereas level two is the first echelon of metastasis from cancer of the tongue.
Presence of cervical lymph node metastases (N+ neck) may manifest clinically as a palpable mass or may be demonstrated on imaging and a needle biopsy. In a sizeable number of cases of oral cancer there may be micrometastases that are neither clinically palpable nor demonstrable on imaging. These cases may pass off as 'N zero' (N0) though they are truly 'N+' cases.
Treatment of oral squamous cancer must necessarily address the issue of management of the neck.
Surgery is the mainstay of treatment for oral cancer. An early primary tumour, viz. stage T1 (tumour size < 2 cm) or stage T2 (tumour size 2-4 cm), is generally resected intra-orally. If the neck is N+, clinically or on imaging, a radical or modified radical neck dissection is performed.
Nearly 30% of the T1/T2, N0 (N zero) cases may be harbouring occult cervical lymph node metastases. If the neck is not dissected, the neck disease will manifest within weeks or months as palpable metastatic nodes. Resection at that stage may or may not be possible. If, on the other hand, neck dissection is performed prophylactically in all N0 cases, 70% of the cases will be subjected to an unnecessary radical surgery with all its costs & morbidity.
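The trade-off stated above can be made concrete with a short worked calculation. The sketch below assumes a hypothetical cohort of 100 clinically node-negative patients and takes the approximately 30% occult metastasis rate from the text; the function name and dictionary keys are illustrative only.

```python
def neck_dissection_tradeoff(n_patients, occult_rate=0.30):
    """Expected under- and over-treatment counts for the two blanket policies."""
    occult = round(n_patients * occult_rate)
    return {
        # Policy 1: dissect nobody, so every occult metastasis is missed.
        "missed_if_none_dissected": occult,
        # Policy 2: dissect everybody, so every truly negative neck is over-treated.
        "unnecessary_if_all_dissected": n_patients - occult,
    }

print(neck_dissection_tradeoff(100))
# {'missed_if_none_dissected': 30, 'unnecessary_if_all_dissected': 70}
```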
Several clinico-pathological parameters have been used to try and predict the true 'N' status in a neck that is 'N0' clinically and on imaging. These include the tumour size, its depth, the grade of the tumour and the presence or absence of perineural invasion. Neither singly nor in combination have these factors allowed a high enough predictability for a 'wait & watch' policy to be pursued safely.
If the neck is "N0" clinically and on imaging, a surgeon is faced with the dilemma of whether to dissect the neck at this stage.
Roepman, Wessels et al. (Nat Genet. 2005 Feb;37(2):182-6. Epub 2005 Jan 9) and PCT Publication WO2006/085746 disclosed a microarray-based study on the European population and reported a gene signature of 102 differentially expressed genes (those which are up-regulated or down-regulated) predictive of lymph node metastases, derived from 82 primary SCCs of the oral cavity and oropharynx. Significantly, an overall predictive accuracy of 86% was achieved compared with a clinical staging accuracy of 68%. Roepman et al. (Cancer Res. 2006 Feb 15;66(4):2361-6) further reported several subsets of an 825-gene panel that are capable of predicting lymph node metastasis in head and neck cancer with equal robustness. A further refinement was reported by Roepman et al. (Cancer Res. 2006 Feb 15;66(4):2361-6) on the predictive genes, which were enriched in two over-represented functional categories, viz. binding to the extracellular matrix and protease activity for the degradation of the extracellular matrix, supporting the view that invasion of tumor cells into the surrounding tissues involves the non-tumor cells in the tumor microenvironment. The initial 102-gene signature was developed on an in-house 'customized' array platform and was subsequently subjected to a commercial whole gene expression array for platform transition, followed by analysis on a dedicated diagnostic array for multicentric validation. Van Hoof SR et al. (Journal of Clinical Oncology 2012 30:33, 4104-4110) reported an analysis that showed an overall accuracy of 72% for the whole validation cohort and an 89% NPV upon combining with clinical assessment.
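For readers less familiar with the figures quoted above, the following sketch shows how overall accuracy and negative predictive value (NPV), together with sensitivity and specificity, are derived from a confusion matrix of predicted versus pathological nodal status. The counts in the example are hypothetical and are not taken from any of the cited studies.

```python
def classification_metrics(tp, fp, tn, fn):
    """Standard metrics, treating N+ as the positive class and N0 as the negative class."""
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total,     # fraction of all cases classified correctly
        "sensitivity": tp / (tp + fn),     # fraction of true N+ cases detected
        "specificity": tn / (tn + fp),     # fraction of true N0 cases recognised
        "npv": tn / (tn + fn),             # fraction of predicted-N0 cases that are truly N0
    }

# Hypothetical 50-sample validation set (illustrative numbers only).
metrics = classification_metrics(tp=18, fp=6, tn=22, fn=4)
```

A high NPV is the clinically relevant quantity for a "wait and watch" policy, since it measures how safely a predicted-N0 neck can be left undissected.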
The 825 genes or subsets thereof, generated from the HNSCC tumors of a patient population in the Netherlands and disclosed in PCT publication WO2006/085746, suffer from severe limitations and are not applicable to patients in the Asian subcontinent because of distinct population-specific differences in the genetic architecture of the HNSCC tumors. The gene list could classify the validation cohort with only 72% accuracy, which reaches 89% upon combining with clinical information. Further, the said gene list is unlikely to successfully classify HNSCC samples originating from the Asian subcontinent, as a major percentage of these population cohorts have a history of exposure to smokeless tobacco, as shown by the current set of patient volunteers whose tissues were subjected to transcriptomic analysis (Table 1).
Smoking and alcohol consumption are the foremost risk factors for HNSCC. However, the difference in prevalence between populations is attributable to the indigenous habit of chewing a mixture of tobacco, areca nut, lime, betel leaf and spices in a variety of combinations, as described by Tabor, Brakenhoff et al. in 2001. This is also supported by the fact that there is very little resemblance between the gene list generated from data obtained using European cohorts and the list from this work based on Indian patients (Fig. 7).
There is therefore an unmet need to provide biomarkers that predict the aggressive subset amongst early squamous cancer of the tongue, biomarkers that can predict the aggressive subset amongst early squamous cancer of the buccal mucosa within the population in the Asian subcontinent, and an analytical process using a micro-array based gene expression signature to distinguish potentially metastatic from non-metastatic HNSCC tumors, to aid the surgeon in deciding whether to undertake a prophylactic radical neck dissection or to wait and watch with regular follow-up.
OBJECTS OF THE INVENTION
The main object of the invention is to provide a chip, a kit, biomarkers and an in-vitro method for head & neck cancer prognosis.
Another major objective of the invention is to provide biomarkers to aid the surgeon in making an evidence-based decision whether to dissect or not to dissect the neck during surgery for an early (T1/T2, N0) squamous cancer of the oral cavity, thus minimising the risk of under-treatment and avoiding the morbidities of over-treatment.
A major objective of the invention is to provide biomarkers to identify the aggressive subset amongst early HNSCC (T1, T2, N0) so as to assist the head and neck oncologist in prognostication as also in planning appropriate and optimum treatment upfront.
Another object of the invention is to provide biomarkers that predict the aggressive subset amongst early squamous cancer of the tongue (T1/T2) within the population in the Asian subcontinent.
Another object of the invention is to provide biomarkers that can predict the aggressive subset amongst early squamous cancer of the buccal mucosa (T1/T2) within the population in the Asian subcontinent.
Yet another object of the invention is to provide an analytical process using micro-array based gene expression data to classify metastatic and non-metastatic HNSCC tumors that can be used by a surgeon to decide on whether to dissect the neck of a patient.
Yet another object of the invention is to provide a kit for the determination of head-and-neck tumor, comprising a microarray which includes one or more polynucleotides comprising a nucleic acid sequence complementary to the sequence of one or more mRNAs selected from the group arrived at using the said analytical process, capable of measuring the expression level of the mRNA in tumor biopsy materials or body fluids like saliva and blood, so as to yield a comparative expression profile between sample and control to determine presence of tumor and its metastatic potential.
Yet another object of the invention is to provide the said kit for the determination of head-and-neck tumor, including a primer set capable of amplifying the sequence of one or more mRNA in tumor biopsy materials or body fluids like saliva and blood, selected from the group arrived at using the analytical process and a fluorescent probe comprising a polynucleotide consisting of a nucleic acid sequence complementary to the mRNA sequence or a part thereof.
SUMMARY OF THE INVENTION
Accordingly, the present invention relates to a chip for detection of head & neck cancer prognosis consisting of cDNA sequence complementary to the nucleotide sequence having Seq id no. 1-957.
In an embodiment of the invention it provides a method for microarray-based gene-expression assay conducted on a microarray chip for quantifying the expression levels of signature biomarkers in a tumour biopsy material, such that the said method comprises steps of:
(a) preparing total RNA from a tumour sample (T);
(b) purifying the said RNA from samples (T) and converting it to the respective cDNA;
(c) characterized in converting the said cDNA of sample (T) into cRNA and labelling that cRNA with a fluorescent dye (selected from the colours red, green, blue and yellow) to obtain a fluorescent-labelled cRNA (fT);
(d) purifying the fluorescent labelled cRNAs (fT) from free, unreacted dye;
(e) hybridising the said fT as obtained in step (d) with oligonucleotide probes on the said microarray chip;
(f) washing the said chip after hybridization in order to remove excess, unbound fT, followed by processing and scanning using an array reader;
(g) the scanner generates a fluorescent coloured image of the resultant spots due to hybridization of fT to the probes on the chip;
(h) the fluorescent spots for the colour assigned to fT are normalized and corrected against the background and non-specific (undesired) hybridization;
(i) the normalized values of each fluorescent spot from the sample as obtained in (h) are matched with the pre-determined hyperplane approximation as template (also called classifier template) to obtain the classified N0 and N+ samples.
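By way of illustration only, matching normalized values against a hyperplane template in step (i) can be sketched as evaluating a linear decision function and reading off which side of the hyperplane the sample falls on. The weights, bias and probe names below are hypothetical and are not taken from the classifier template of the invention; the sketch assumes a linear form for the template.

```python
# Hypothetical linear classifier: weights, bias and probe names are invented
# for illustration and are not the patent's classifier template.
def classify_sample(expression, weights, bias):
    """Return 'N+' if the sample lies on the positive side of the hyperplane, else 'N0'."""
    score = bias + sum(weights[probe] * expression[probe] for probe in weights)
    return "N+" if score > 0 else "N0"


weights = {"probe_a": 0.8, "probe_b": -0.5}
bias = -1.0

sample_1 = {"probe_a": 3.0, "probe_b": 1.0}  # score = 2.4 - 0.5 - 1.0 = 0.9
sample_2 = {"probe_a": 1.0, "probe_b": 2.0}  # score = 0.8 - 1.0 - 1.0 = -1.2

print(classify_sample(sample_1, weights, bias))
print(classify_sample(sample_2, weights, bias))
```

In such a scheme the template reduces to one weight per probe plus a bias, so an unknown sample can be classified without access to the training data.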
In another embodiment of the invention it provides a method wherein the microarray chip has a biomarker specific oligonucleotide probe on the surface of a glass slide.
In yet another embodiment, the invention provides a method wherein the tumour is selected from a group consisting of HNSCC at N0, N+, T1, T2, T3, T4 stage of tongue and buccal mucosa.
In yet another embodiment, the invention provides a method wherein the cRNA as in step 'c' is a biomarker for detection of head & neck cancer prognosis having Seq id no. 1-957.
In yet another embodiment, the invention provides a kit for head & neck cancer prognosis consisting of:
I. a panel of cDNA sequence complementary to the nucleotide sequence having Seq id no. 1-957 on a chip,
II. suitable reagents capable of detecting singly or a combination of the cDNA;
III. instruction manual for using the kit.
wherein up-regulation (induction) of genes having nucleotide sequence of seq id no. 1-555 and concomitant down-regulation (suppression) of genes having nucleotide sequence of seq id no. 556-957 yield a comparative expression profile between sample and control to determine presence of tumour and its node metastatic potential.
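As a hedged sketch of how such a comparative expression profile might be summarised, the fraction of signature genes that change in the expected direction between sample and control can be counted. The gene identifiers, log fold-change values and the concordance score itself are assumptions made for illustration; the specification does not prescribe this particular summary.

```python
def direction_concordance(log_fc, up_genes, down_genes):
    """Fraction of signature genes whose sample-versus-control change has the expected sign."""
    hits = sum(1 for gene in up_genes if log_fc.get(gene, 0.0) > 0)
    hits += sum(1 for gene in down_genes if log_fc.get(gene, 0.0) < 0)
    total = len(up_genes) + len(down_genes)
    return hits / total if total else 0.0


# Hypothetical identifiers standing in for sequences from the up- and down-regulated groups.
up_genes = ["seq_1", "seq_2"]
down_genes = ["seq_556", "seq_557"]

# Hypothetical log fold changes (sample relative to control).
log_fc = {"seq_1": 1.2, "seq_2": -0.3, "seq_556": -0.9, "seq_557": -0.1}

concordance = direction_concordance(log_fc, up_genes, down_genes)
print(concordance)
```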
In yet another embodiment the invention provides an in-silico method for analysing the micro-array data obtained by using the chip comprising steps:
I. normalising the intensity output from the iScan for all Tumour and Normal samples (n=443) using Robust Spline Normalization in Lumi, Bioconductor package method;
II. removing the Outlier samples (n=21) from the total sample by using K-means clustering method;
III. randomly selecting from all samples 50 Tumour samples as validation set and generating robust gene list from remaining samples (n=372) consisting of Tumour samples (n=236) and Normal samples (n=136);
IV. using the normalized intensity values of the two groups of samples (n=236 and n=136) for differential expression analysis using the Bioconductor packages limma and lumi and generating a list of 2980 differentially expressed genes that classify the tumour and normal samples with adjusted p value < 0.05 and fold change > 1.5 or < -1.5;
V. generating the robust feature list that would classify the N0 and N+ tumour samples using the 236 Tumour samples with 2980 genes wherein the tumour samples (n=236) are separated randomly in two parts; one part containing approximately 90% of the samples (n=216) and the other part containing approximately 10% of the samples (n=20) and denoting them as "Sample 1" and "Sample 2", respectively;
VI. applying a Recursive Feature Elimination process on Sample 1 and identifying top 300 features (genes) as important ones;
VII. predicting Sample 1 and Sample 2 separately by choosing top "i" transcripts (genes) from this list varying i=1,2,...,300, and finding the feature lists with the highest prediction accuracy in both sample sets;
VIII. repeating Steps V to Step VII 200 times to remove the sample bias and collecting all 200 feature lists so obtained;
IX. preparing a frequency distribution of genes found in all 200 feature lists;
X. characterized in preparing a list of 957 genes from the above 200 feature lists such that each of these 957 transcripts or genes appears at least once in the 200 feature lists thus making it the largest list of which successive lists are subsets of this 957-gene list;
XI. predicting 50 validation samples set with the gene that appeared highest number of times and calculating prediction accuracy followed by prediction with two transcripts having highest and next highest frequency and repeating the procedure by increasing the number of transcripts.
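Steps VIII to XI can be sketched, under stated assumptions, as counting how often each gene appears across the repeated feature lists, taking the union as the largest list, and forming successively larger lists in order of decreasing frequency. The gene names and the three short feature lists are hypothetical; in the method described, 200 lists are collected and the union contains 957 genes.

```python
from collections import Counter


def frequency_ranked_union(feature_lists):
    """Count in how many feature lists each gene appears and rank the union by that count."""
    counts = Counter(gene for features in feature_lists for gene in set(features))
    # Most frequent first; ties broken alphabetically so the order is reproducible.
    ranked = sorted(counts, key=lambda gene: (-counts[gene], gene))
    return ranked, counts


def nested_lists(ranked):
    """Successive candidate lists: top 1 gene, top 2 genes, and so on."""
    return [ranked[:k] for k in range(1, len(ranked) + 1)]


# Hypothetical feature lists from three repetitions (the method uses 200).
feature_lists = [["g1", "g2"], ["g1", "g3"], ["g1", "g2", "g4"]]

ranked, counts = frequency_ranked_union(feature_lists)
print(ranked)
print(nested_lists(ranked)[1])
```

Each nested list would then be used to predict the validation samples, and the list sizes giving the highest accuracy retained.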
In yet another embodiment the invention provides a method for screening therapeutic agents for head-and-neck tumour, comprising the steps:
i. administering a test substance to a non-human animal suffering from artificially induced head-and-neck tumour;
ii. applying the sample from step (i) onto the chip;
iii. comparing the expression level to the expression level of the mRNA (or corresponding cDNA) with control wherein test substance is not administered.
ABBREVIATIONS
HNSCC Head and Neck Squamous Cell Carcinoma
SCC Squamous Cell Carcinoma
N0 Lymph node metastases negative
N+ Neck lymph node metastases positive
T1 Tumour size <2 cm
T2 Tumour size 2-4 cm
mRNA Messenger ribo-nucleic acid
cRNA Complementary ribo-nucleic acid
cDNA Complementary deoxyribo-nucleic acid
T Tumour
C Control
ROC Receiver Operating Characteristics
UTP Uridine tri phosphate
RSN Robust Spline Normalization
PCR Polymerase Chain Reaction
PPV Positive predictive value
NPV Negative predictive value
BRIEF DESCRIPTION OF ACCOMPANYING DRAWINGS
Fig. 1 illustrates the flow chart of the analysis process followed. The model depicts the workflow and the numbers of samples analyzed and genes shortlisted at each analysis. 443 samples were received, of which 286 were tumor samples. Of these 286 samples, 50 were kept aside for the testing set and the rest were used as a training set for the discovery phase. Reiterative statistical and computer algorithms were used to generate a final gene list. The combined final list was then validated in the 50 samples that were kept aside for the testing phase.
Fig. 2 illustrates the Heat Map. The heat map depicts the differentially expressed transcripts (genes) between tumor and control tissues. In the present analysis 2980 differentially expressed transcripts with p value less than 0.05 and 1.5 fold differences were selected.
Fig. 3 illustrates the prediction accuracy curve. The upper panel illustrates the prediction accuracy produced by the combined transcript lists generated by the algorithm. It was observed that the accuracy level is higher than 95% in the training set and nearly 85% in the test set when using the two transcript lists, which contain n = 34 and n = 288 transcripts. The lower panel shows how the prediction accuracy varies across the different transcript lists.
Fig. 4 illustrates AUC - training sample. The panels depict the sensitivity of prediction both in the training set samples used for the discovery phase and the testing set samples used for the validation phase.
Fig. 5 illustrates the validation data of prediction of 50 test samples. The picture depicts the bar plot of the probability value generated when predicting the 50 validation samples by using the n = 34 transcripts. In the present analysis, 236 samples are used as the training set. The comparison plot shows agreement between the clinical diagnosis and the diagnosis predicted by the 34 transcripts for a given sample.
Fig. 6 illustrates prediction accuracy of 10 follow-up samples converted from N0 to N+. The picture depicts that prediction with the help of this algorithm accurately predicts the clinical progression of 10 samples. Samples which converted from N0 to N+ clinically after 18 months were predicted successfully by the transcript list generated.
Fig. 7 illustrates comparison of gene signatures derived from Indian and Caucasian head and neck patients. The graph illustrates that the transcript list generated is indigenous to the Indian population and is unique when compared to the Caucasian population. This shows the necessity of having an indigenous biomarker for tumor progression, as the allele pool is different between Indian and other populations. Panel A shows the Venn diagram between the transcript list produced from this work (n = 34) and the transcript list provided by Frank C P Holstege et al. (n = 102). The second Venn diagram is between the transcript list produced from this work (n = 288) and the transcript list provided by Frank C P Holstege et al. (n = 102). The third Venn diagram is between the transcript list produced from this work (n = 957) and the transcript list provided by Frank C P Holstege et al. (n = 825).
Panel B shows the log fold change value of transcripts (n = 102 and n= 825 provided by Frank C P Holstege et al.) generated by the differential expression between tumour and control samples of this work.
Panel C shows ROC curves by using the transcripts (n = 102 and n = 825) provided by Frank C P Holstege et al. where the train samples and the test samples are from this work.
Fig. 8 illustrates classifying an unknown sample as N0 or N+: For classifying the unknown tumour samples, the normalized expression value is used as input into the model generated from the 34 or 288 gene signature using the learning algorithm as shown.
BRIEF DESCRIPTION OF ACCOMPANYING TABLES
Table 1 illustrates the history of 293 patient volunteers whose tissues were subjected to gene-expression analysis for this invention.
Table 2 illustrates the clinical description of the samples used for gene-expression analysis.
Table 3 illustrates the differentially expressed [Tumour (T) versus Control (C)] 2980 genes or nucleotide sequences.
Table 4 illustrates the 288 gene signatures [N+ vs N0].
Table 5 illustrates the signature of 34 differentially (N+ versus N0) expressed genes or nucleotide sequences.
Table 6 illustrates the prediction accuracy/sensitivity.
Table 7 illustrates the signatures of 957 genes or nucleotide sequences [N+ vs N0].
DETAILED DESCRIPTION OF THE INVENTION
In accordance with this invention, paired tumor and normal tissues were collected in the hospital (Table 2) [Tissue material is taken from Prince Aly Khan Hospital (PAKH), Mumbai-400010] and preserved in RNA Later (Ambion) at -80°C. Total RNA was isolated from the tissue samples and its quality and quantity were determined. The isolated RNA samples were converted to cRNA and labelled with Cy3 (Cyanine™ 3) dye. The labelled cRNA was hybridized with the Illumina Human HT-12 v4 Expression Beadchip, the Illumina bioarrays were read in an Illumina iScan Reader, and the primary intensity data were obtained in standard file format using Genome Studio software.
The process of analysing the micro-array data comprises the following main steps:
I. Normalising the intensity output from the iScan for all Tumour and Normal samples (n=443; sample descriptions are provided in Table 2) using RSN (Robust Spline Normalization in Lumi, R Bioconductor package) method as described in Fig. 1;
II. Removing the Outlier samples (n=21) from the total sample by using K-means clustering method;
III. Randomly selecting from all samples 50 Tumour samples as validation set and generating the robust gene list from the remaining samples (n=372) consisting of Tumour samples (n=236) and Normal samples (n=136);
IV. Using the normalized intensity values of the two groups of samples (n=236 and n=136) for differential expression analysis using the Bioconductor packages limma and lumi and generating a list of 2980 differentially expressed genes (Table 3) that classify the tumor and normal samples with adjusted p value < 0.05 and fold change > 1.5 or < -1.5 (Histogram or Heat Map in Fig. 2);
V. Generating the robust feature list that would classify the N0 and N+ tumor samples using the 236 Tumour samples with 2980 genes wherein the tumour samples (n=236) are separated randomly in two parts; one part containing approximately 90% of the samples (n=216) and the other part containing approximately 10% of the samples (n=20) and denoting them as "Sample 1" and "Sample 2", respectively;
VI. Applying a Recursive Feature Elimination process on Sample 1 and identifying top 300 features (genes) as important ones;
VII. Predicting Sample 1 and Sample 2 separately by choosing top "i" transcripts (genes) from this list varying i=1,2,...,300, and finding the feature lists with the highest prediction accuracy in both sample sets;
VIII. Repeating Steps V to Step VII 200 times to remove the sample bias and collecting all 200 feature lists so obtained;
IX. Preparing a frequency distribution of genes found in all 200 feature lists;
X. Preparing a list of 957 genes from the above 200 feature lists such that each of these 957 transcripts or genes appears at least once in the 200 feature lists thus making it the largest list of which successive lists are subsets of this 957-gene list (Table 7);
XI. Predicting 50 validation samples set with the gene that appeared highest number of times in the 957-gene list and calculating prediction accuracy followed by prediction with two transcripts having highest and next highest frequency. This procedure was repeated by increasing the number of transcripts. The transcript list(s) comprising 288 nucleotide sequences (genes or transcripts) (Table 4) and 34 nucleotide sequences (genes or transcripts) (Table 5) showed highest prediction accuracy.
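The random partition of Step V and the search over the top "i" transcripts of Step VII can be sketched as follows. This is a minimal illustration under stated assumptions: the scoring function is a hypothetical stand-in for the prediction accuracy of an actual classifier, and the gene labels are invented. Only the sample counts (236, 216 and 20) come from the description.

```python
import random


def split_samples(sample_ids, n_holdout, rng):
    """Randomly partition samples into a large part ('Sample 1') and a small part ('Sample 2')."""
    shuffled = list(sample_ids)
    rng.shuffle(shuffled)
    return shuffled[n_holdout:], shuffled[:n_holdout]


def best_prefix(ranked_genes, score_fn):
    """Try the top-i genes for i = 1..len(ranked_genes); keep the shortest list with the best score."""
    best_i, best_score = 0, float("-inf")
    for i in range(1, len(ranked_genes) + 1):
        score = score_fn(ranked_genes[:i])
        if score > best_score:
            best_i, best_score = i, score
    return ranked_genes[:best_i], best_score


def toy_score(genes):
    """Hypothetical stand-in for classifier accuracy; peaks when three genes are used."""
    return 1.0 - abs(len(genes) - 3) / 10


rng = random.Random(0)
sample_1, sample_2 = split_samples(range(236), 20, rng)
print(len(sample_1), len(sample_2))

selected, score = best_prefix(list("abcde"), toy_score)
print(selected, score)
```

Repeating the split with a fresh random state on each of the 200 iterations yields the 200 feature lists that are later pooled.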
The biomarker information generated from the training set samples (n=236) was used for class prediction of 50 tumor samples that were initially put aside and are referred to as validation set samples. Analysis of these 50 samples with 288 nucleotide sequences (genes or transcripts) and also with the 34 nucleotide sequence signature, which is a subset of the 288 nucleotide sequences (genes or transcripts), gave overall prediction accuracy of 84% and 82% respectively in these samples (Fig. 5). Moreover, Fig. 3 illustrates that the nucleotide sequence signature of 288 nucleotide sequences (genes or transcripts) or a subset of 34 nucleotide sequences (genes or transcripts) gave overall prediction accuracy of > 95% in training samples. The specificity and sensitivity for both the test samples yielded acceptable results as demonstrated in Fig. 4 and Table 6. The ROC curves illustrated in Fig. 4 depict the percentage of false positives and false negatives.
It was observed that the prediction in 50 samples with the 34 nucleotide array set corroborated the clinical diagnosis, hence validating the process of analysing the microarray data (Fig. 1). Furthermore, 10 of the 50 validation samples that were N0 at the recruitment stage converted to N+ on 18-month follow-up. Both the 288 and 34 nucleotide sequence (genes or transcripts) signatures correctly predicted those samples as N+, with 100% accuracy, when they were clinically marked as N0 (Fig. 6).
The present invention includes a nucleotide sequence (mRNA or cDNA) signature of 288 genes with up-regulation (induction) of one or more genes selected from serial numbers 1-174 and concomitant down-regulation (suppression) of one or more genes from numbers 175-288 as listed in table 4 and a signature of 34 genes (a subset of 288 genes) as given in table 5 where up-regulation of one or more genes selected from genes 1-21 and concomitant down-regulation of one or more genes from numbers 22-34 are indicative of aggressive subset of head and neck cancer. Both the signature gene sets of 34 and 288 genes are subsets of 957 genes (Table 7).
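For reference, the quantities named here (sensitivity, specificity, PPV, NPV and overall accuracy) follow from the four cells of a confusion matrix, treating N+ as the positive class. The counts in the sketch below are hypothetical and are not the values of Table 6.

```python
def diagnostic_metrics(tp, fp, tn, fn):
    """Standard diagnostic measures from confusion-matrix counts (N+ taken as the positive class)."""
    return {
        "sensitivity": tp / (tp + fn),  # N+ samples correctly called N+
        "specificity": tn / (tn + fp),  # N0 samples correctly called N0
        "ppv": tp / (tp + fp),          # predicted N+ that are truly N+
        "npv": tn / (tn + fn),          # predicted N0 that are truly N0
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
    }


# Hypothetical counts chosen only to give round numbers.
metrics = diagnostic_metrics(tp=45, fp=5, tn=35, fn=15)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

For the decision to withhold neck dissection, NPV is the measure of most direct interest, since it expresses how often a predicted N0 is truly node negative.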
The method of hybridization, detection and classification of an unknown tumor sample is given below:
(a) preparing total RNA from a tumor sample (T);
(b) converting the said RNA from samples (T) to respective double strand cDNA;
(c) converting the said cDNA of sample (T) into cRNA by in vitro transcription;
(d) hybridizing purified cRNA with the microarray chip containing nucleotide sequence probes as mentioned above;
(e) washing the said chip after hybridization in order to remove excess, unbound cRNA;
(f) washing, blocking and streptavidin-Cy3 staining of the bound cRNAs;
(g) washing of the labelled chip with buffers under non-stringent condition to remove unbound conjugate;
(h) scanning of the labelled chip in a suitable scanner to collect the fluorescent intensity values;
(i) background correction and normalization of the fluorescent intensity values;
(j) using the normalized expression value as input into the model generated from the 34/288 gene signature using the learning algorithm as shown in Fig. 8 for classifying the unknown tumour samples.
Thus the present invention provides:
(i) A signature of 957 nucleotide sequences or genes or transcripts: any combinations of nucleotide sequences 1-957;
(ii) A signature of 288 nucleotide sequences or genes or transcripts: any combinations of nucleotide sequences 1-288;
(iii) A signature of 34 nucleotide sequences or genes or transcripts: any combinations of nucleotide sequences 1-34 that can be used for determining the expression level of the mRNA in tumor biopsy materials so as to yield a comparative expression profile between sample and control to determine presence of tumor and its metastatic potential;
(iv) A signature of 957 genes, wherein up-regulation (activation) of one or more genes is selected from serial numbers 1-555 and concomitant down-regulation (suppression) of one or more genes from numbers 556-957 as listed in table 7;
(v) A signature of 288 genes (subset of 957 genes), wherein up-regulation (activation) of one or more genes is selected from serial numbers 1-174 and concomitant down-regulation (suppression) of one or more genes from numbers 175-288 as listed in table 4;
(vi) A signature of 34 genes (subset of 288 genes) as given in table 5, wherein up-regulation of one or more genes is selected from genes 1-21 and concomitant down-regulation of one or more genes from numbers 22-34.
EXAMPLES
The following examples are given by way of illustration of the present invention and therefore should not be construed to limit the scope of the present invention.
Example 1: RNA Isolation
Since May 2011, 443 tumor and normal HNSCC tissue samples were weighed, preserved in RNA Later (Ambion) and stored at -80°C for further use. History was recorded and each case was followed up for 18 months (Tables 1 and 2).
About 30 mg of tissue from each of the 443 tissue samples, comprising tumor (n = 303) and normal (n = 140) samples, was subjected to total RNA extraction. Total RNA was isolated from the biopsy tissue preserved in RNA Later using a Qiagen RNeasy kit. Following extraction, the RNA was checked qualitatively and quantitatively. RNA yield was measured with a Nanodrop spectrophotometer. In addition to the concentration of RNA, protein and carbohydrate contamination were checked by analyzing the 260/280 and 260/230 absorbance ratios, respectively. 500 ng of RNA was used for further experiments. Each sample was qualitatively evaluated for RNA integrity on an Agilent 2100 Bioanalyser with the RNA 6000 Nano kit. Total RNA quality control criteria were set in concordance with the Tumor Analysis Best Practices Working Group, and samples with a RIN value below 7 were not used further.
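The RIN-based acceptance criterion can be expressed as a simple filter. The sample identifiers and RIN values below are hypothetical; only the cut-off of 7 comes from the description, and the sketch assumes that a RIN of exactly 7 is acceptable.

```python
def passes_rna_qc(rin, min_rin=7.0):
    """Accept a sample only if its RNA integrity number is not below the cut-off."""
    return rin >= min_rin


# Hypothetical sample identifiers and RIN values.
rin_by_sample = {"S1": 8.2, "S2": 6.5, "S3": 7.0}

accepted = [sample for sample, rin in rin_by_sample.items() if passes_rna_qc(rin)]
print(accepted)
```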
Example 2: cRNA synthesis and Labeling
cRNA was prepared using the Illumina TotalPrep RNA Amplification Kit from Ambion as per the manufacturer's instructions. First-strand cDNA synthesis was performed by reverse transcription of 500 ng of RNA using ArrayScript, followed by second-strand synthesis by DNA polymerase. The double-stranded cDNA was then purified and used as template for in vitro transcription at 37 °C for 16 hours to generate cRNA. During in vitro transcription, biotinylated 5-(3-aminoallyl)-UTP was incorporated into the single-stranded cRNA. The cRNA was then purified and the yield was quantified by Qubit spectrophotometry. 750 ng of cRNA was used for further hybridizations.
Example 3: Array Based Whole Genome Gene expression profiling using Illumina Human HT-12 v4 Expression Beadchip
The Direct Hybridization Whole-Gene Expression assay offers the highest multiplexing capabilities for whole genome gene expression, simultaneously profiling more than 47,000 transcripts (genes). The HumanHT-12 v4.0 Expression Beadchip supports a 12-sample format, which facilitates large-scale gene expression studies. This technique allows analysis of the genome-wide transcriptome profile, targeting more than 31,000 annotated genes per sample using 47,000 probe sets.
Example 4: Array Hybridization
Illumina Whole Genome Gene Expression Beadchips consist of oligonucleotides immobilized on beads held in microwells on the surface of an array substrate. The presence of a 29-mer array sequence on each bead helps in uniquely identifying the location of the bead on the array surface and hence helps in the hybridization-based procedure to map the array. Seven hundred and fifty nanograms (750 ng) of labeled sample cRNA were detected by hybridization to 50-mer probes on the bead chip. Subsequent steps included washing, blocking, and streptavidin-Cy3 staining, followed by serial non-stringent washing steps to remove unbound conjugate. Following the final rinse, the chips were dried by centrifugation and scanned. The Illumina bioarrays were read in an Illumina iScan Reader and the primary intensity data were obtained in standard file format using Genome Studio software.
Example 5: Selection of Predictive Nucleotide Sequence Array List
(i) Data Analysis Pipeline Setup for Transcriptome Profiling
Pre-processing of Expression Data
Raw background-subtracted data were extracted using Illumina GenomeStudio Software v3 and were further processed in the R statistical environment (http://www.r-project.org) using the lumi package. Raw data were pre-processed to force positive values and then transformed using the variance stabilization transformation of the R Bioconductor lumi package. The robust spline method was used for normalization and outliers were removed using the k-means method. Probes were retained only if they reached a detection p value < 0.01 in all samples. Differentially expressed nucleotide sequences were analyzed with the R Bioconductor limma package. A linear model was fitted for each nucleotide sequence, given a series of arrays, using the lmFit function. The p values were adjusted for multiple testing using the Benjamini and Hochberg method. Nucleotide sequences/probes with FC > 1.5 or FC < -1.5 and adjusted p value < 0.05 were considered to be differentially expressed between tumor and control samples.
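The final selection rule can be sketched outside R as a Benjamini and Hochberg adjustment followed by the fold-change and adjusted p value cut-offs. The sketch below is a plain Python illustration and not the limma implementation; the p values and fold changes are hypothetical, and it assumes the signed fold-change convention in which down-regulation is reported as a negative value.

```python
def bh_adjust(p_values):
    """Benjamini-Hochberg adjusted p values, returned in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down, enforcing monotonicity of the adjusted values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def is_differential(signed_fc, adj_p, fc_cut=1.5, p_cut=0.05):
    """Apply the cut-offs: |FC| > 1.5 (signed convention assumed) and adjusted p < 0.05."""
    return abs(signed_fc) > fc_cut and adj_p < p_cut


# Hypothetical per-probe results.
p_values = [0.01, 0.04, 0.03, 0.20]
fold_changes = [2.0, -1.8, 1.2, 3.0]

adj = bh_adjust(p_values)
flags = [is_differential(fc, p) for fc, p in zip(fold_changes, adj)]
print(adj)
print(flags)
```

In this toy example the second probe has a raw p value below 0.05 but fails after adjustment, which is the situation the multiple-testing correction is meant to catch.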
Comparison between Tumor and Control Samples
In order to elucidate the progression of N0 into N+ tumors in HNSCC patients, the differential expression of the transcriptome between tumor and normal samples was first compared. This was done by pooling all N0 and N+ tumor samples as one group (n=236) and their adjacent normal tissue (n=136) as another group.
It was observed from the aforesaid analysis that 2980 nucleotide sequences/probes were differentially expressed between tumor and the normal samples (Fig. 2 and Table 3). The threshold for the differential fold change in selecting these 2980 nucleotide sequences was set at 1.5. However, the same 2980 nucleotide sequences/probes did not show any significant expression difference between N0 and N+ tumors. This observation can be attributed to the fact that the difference in gene expression between the N0 and N+ groups would be very small. Thus, the Class Prediction method was used to analyze the difference, as opposed to directly calculating differential gene expression between the 112 N+ and 124 N0 tumor samples. Class prediction effectively predicts the samples according to their N0 and N+ status as detailed below. The 2980 nucleotide sequences differentially expressed between tumor and normal samples with a fold difference > 1.5 or < -1.5 (adjusted p value < 0.05) were used as probable predictor nucleotide sequences (Table 3).
Class Prediction Analysis
To delineate an effective set of classifier nucleotide sequences, a Class Prediction based on a training-testing method was developed. Three hundred and three (303) tumor samples were first normalized based on robust spline normalization and outliers (21 samples) were removed. From this group of 286 tumor samples, 50 randomly selected samples were kept aside for validation. The remaining 236 tumor samples were used for identification of the biomarker. The training-testing method was performed in 236 samples using the 2980 nucleotide sequences as input to predict N0 and N+ tumors (Table 3). The data from each sample were subjected to reiterative statistical analysis to produce a nucleotide sequence/probe list that could accurately classify early N0 and N+ HNSCC patients in the 236 training set samples. The resulting nucleotide sequence signature of 957 (Table 7) nucleotide sequences, or a subset of 288 (Table 4) nucleotide sequences, or a subset of 34 (Table 5) nucleotide sequences gave overall prediction accuracy of > 95% in training samples (Fig. 4).
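The sample counts quoted in this description can be reconciled with a short arithmetic check. All figures are taken from the text; the reading that the 21 outliers comprise 17 tumor and 4 normal samples is an inference from the stated totals (303 to 286 tumor, 140 to 136 normal) and is not stated explicitly.

```python
# Counts stated in the description.
total_samples = 443
tumours_received, normals_received = 303, 140
outliers_removed = 21
validation_tumours, training_tumours = 50, 236
normals_retained = 136
n_plus, n_zero = 112, 124

assert tumours_received + normals_received == total_samples

# 443 - 21 = 422 samples remain after outlier removal.
retained = total_samples - outliers_removed
tumours_retained = validation_tumours + training_tumours  # 286
assert retained == tumours_retained + normals_retained

# The training tumours split into the N+ and N0 groups.
assert training_tumours == n_plus + n_zero

# Inferred breakdown of the outliers (not stated explicitly in the text).
tumour_outliers = tumours_received - tumours_retained  # 17
normal_outliers = normals_received - normals_retained  # 4
assert tumour_outliers + normal_outliers == outliers_removed

print(retained, tumours_retained, tumour_outliers, normal_outliers)
```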
(ii) Validation Studies
The biomarker information generated from the discovery phase / training set samples (n=236) was used for class prediction of 50 tumor samples that were initially put aside and are referred to as validation set samples. Analysis of these 50 samples with 288 nucleotide sequences and also with the 34 nucleotide sequence signature, which is a subset of the 288 nucleotide sequences, gave overall prediction accuracy of 84% and 82% respectively in these samples (Fig. 3 and Fig. 5). The specificity and sensitivity for both the test samples yielded acceptable results as depicted in Fig. 4 and Table 6. The ROC curves illustrated in Fig. 4 depict the percentage of false positives and false negatives. The prediction in 50 samples with the 34 nucleotide array set corroborated the clinical diagnosis, hence validating the prediction process in this sample set (Fig. 1). Ten of the 50 validation samples, which were labeled as N0 at the recruitment stage, converted to N+ on 18-month follow-up. Both the 288 and 34 nucleotide sequence signatures correctly predicted those samples as N+, with 100% accuracy, when they were clinically marked as N0 (Fig. 6). The present invention includes a nucleotide sequence (mRNA or cDNA) signature of 288 genes with up-regulation (induction) of one or more genes selected from serial numbers 1-174 and concomitant down-regulation (suppression) of one or more genes from numbers 175-288 as listed in table 4 and a signature of 34 genes (a subset of 288 genes) as given in table 5 where up-regulation of one or more genes selected from genes 1-21 and concomitant down-regulation of one or more genes from numbers 22-34 are indicative of aggressive subset of head and neck cancer. Both the signature gene sets of 34 and 288 genes are subsets of 957 genes (Table 7).
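An ROC curve is commonly summarised by its area (AUC), which equals the probability that a randomly chosen N+ sample receives a higher predicted score than a randomly chosen N0 sample. The sketch below computes the AUC by this pairwise definition; the scores are hypothetical and do not reproduce Fig. 4.

```python
def roc_auc(scores_positive, scores_negative):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count as half."""
    wins = 0.0
    for pos in scores_positive:
        for neg in scores_negative:
            if pos > neg:
                wins += 1.0
            elif pos == neg:
                wins += 0.5
    return wins / (len(scores_positive) * len(scores_negative))


# Hypothetical predicted probabilities of being N+.
scores_n_plus = [0.9, 0.8, 0.4]
scores_n_zero = [0.7, 0.3, 0.2]

auc = roc_auc(scores_n_plus, scores_n_zero)
print(round(auc, 3))
```

An AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation of the two classes.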
Kit Design and Unknown Tumour Sample Testing
The list of biomarkers or genes, as provided in tables 4, 5, and 7, was used to identify the aggressive tumour subset amongst early HNSCC (T1, T2, N0) tumors by assaying the expression level of mRNAs of the corresponding genes in tumour biopsy materials by a hybridization-based assay on a microarray chip.
The microarray chip is prepared by spotting (printing) mRNA or biomarker specific oligonucleotide probe as listed in the sequence listing, along with few additional spots (5-10) of oligonucleotide unrelated to human samples for background correction and few oligonucleotides (3-5) for normalization purposes on the surface of a glass slide. The probes spotted on a chip “hybridize” with the corresponding mRNA or biomarker, listed in Tables 4, 5 and 7, present in the tumor sample of the subject.
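The roles of the background-correction spots and the normalization oligonucleotides described above can be sketched as follows. The text does not specify the arithmetic, so this sketch assumes a simple scheme: subtract the mean background signal, then divide by the mean background-corrected reference signal. The function name, the `floor` parameter and all intensity values are hypothetical.

```python
from statistics import mean

def correct_and_normalize(probe_signal, background_spots, reference_spots, floor=1.0):
    """Background-correct and normalize raw spot intensities.

    Assumed scheme (not taken from the source): subtract the mean of the
    non-human background spots, clamp at `floor` to avoid zero or negative
    values, then divide by the background-corrected mean of the
    normalization spots.
    """
    background = mean(background_spots)
    reference = max(mean(reference_spots) - background, floor)
    return {
        name: max(value - background, floor) / reference
        for name, value in probe_signal.items()
    }

# Hypothetical raw intensities for illustration only.
raw = {"gene_a": 1100.0, "gene_b": 150.0}
background_spots = [100.0, 100.0, 100.0, 100.0, 100.0]  # 5 unrelated oligonucleotides
reference_spots = [600.0, 600.0, 600.0]                 # 3 normalization oligonucleotides
normalized = correct_and_normalize(raw, background_spots, reference_spots)
```

Dividing by a reference signal makes values comparable between chips; other schemes, such as the spline-based normalization named in the claims, serve the same purpose.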
ADVANTAGES OF THE INVENTION
The invention relates to the development of a gene expression based biomarker that predicts the aggressive subset amongst early stage oral cavity squamous cell carcinoma of the head and neck (HNSCC). This gene expression based kit will aid the surgeon in deciding whether or not to dissect the lymph nodes of a patient with oral cavity cancer. Thus, this classification process will allow the surgeon to avoid both over-treatment and under-treatment.
Table 1
History of patient volunteers whose tissues were subjected to gene-expression analysis for this invention
Distribution of tobacco (n=293)
Figure imgf000021_0001
Distribution of alcohol (n=34)
Figure imgf000021_0002
Distribution of pattern of tobacco and alcohol use.
Figure imgf000021_0003
Tobacco and alcohol consumption in BM and Tongue cases (n=293)
Figure imgf000021_0004
Signs of oral cancer (n=293)
Figure imgf000021_0005
Table 2
Clinical Description of the Samples used for Transcriptome Analysis
(Table pages reproduced as images in the original publication: Figure imgf000022_0001 through Figure imgf000087_0001)

Claims

We Claim:
1. A chip for detection of head & neck cancer prognosis consisting of cDNA sequence complementary to the nucleotide sequence having Seq id no. 1-957.
2. The method for microarray-based gene-expression assay conducted on a microarray chip as claimed in claim 1 for quantifying the expression levels of signature biomarkers in a tumour biopsy material, such that the said method comprises the steps of:
(a) preparing total RNA from a tumour sample (T);
(b) purifying the said RNA from the sample (T) and converting it to the respective cDNA;
(c) characterized in converting the said cDNA of sample (T) into cRNA and labelling that cRNA with a fluorescent dye (selected from the colours red, green, blue and yellow) to obtain a fluorescent-labelled cRNA (fT);
(d) purifying the fluorescent labelled cRNAs (fT) from free, unreacted dye;
(e) hybridising the said fT as obtained in step (d) with oligonucleotide probes on the said microarray chip;
(f) washing the said chip after hybridization in order to remove excess, unbound fT, followed by processing and scanning using an array reader;
(g) the scanner generates a fluorescent coloured image of the resultant spots due to hybridization of fT to the probes on the chip;
(h) the fluorescent spots for the colour assigned to fT are normalized and corrected against the background and non-specific (undesired) hybridization;
(i) the normalized values of each fluorescent spot from the sample as obtained in (h) are matched with the pre-determined hyperplane approximation used as template (also called classifier template) to obtain the classified N0 and N+ samples.
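Step (i) can be illustrated with a minimal Python sketch of classification against a pre-determined hyperplane. The text does not give the hyperplane's form, so this sketch assumes a linear decision function (a weighted sum of normalized expression values plus a bias) with a positive score read as N+. The gene names, weights, bias and sign convention are hypothetical.

```python
def classify(expression, weights, bias):
    """Classify one sample against a linear hyperplane template.

    Assumption: score = sum(w_g * x_g) + bias, and score > 0 means N+.
    """
    score = sum(weights[gene] * expression[gene] for gene in weights) + bias
    label = "N+" if score > 0 else "N0"
    return label, score

# Hypothetical template: one up-regulated and one down-regulated gene.
weights = {"up_gene": 1.0, "down_gene": -1.0}
bias = 0.0

sample_1 = {"up_gene": 2.0, "down_gene": 0.5}  # illustrative values only
sample_2 = {"up_gene": 0.5, "down_gene": 2.0}

label_1, score_1 = classify(sample_1, weights, bias)
label_2, score_2 = classify(sample_2, weights, bias)
```

In this toy template a positive weight stands for a gene whose up-regulation pushes the sample towards N+, and a negative weight for a gene whose down-regulation does so.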
3. The method as claimed in claim 2, wherein the microarray chip has a biomarker specific oligonucleotide probe on the surface of a glass slide.
4. The method as claimed in claim 2, wherein the tumour is selected from a group consisting of HNSCC at N0, N+, T1, T2, T3 and T4 stages of the tongue and buccal mucosa.
5. The method as claimed in claim 2, wherein the cRNA as claimed in step (c) is a biomarker for detection of head & neck cancer prognosis having Seq id no. 1-957.
6. A kit for head & neck cancer prognosis consisting of:
I. a panel of cDNA sequence complementary to the nucleotide sequence having Seq id no. 1-957 on a chip as claimed in claim 1;
II. suitable reagents capable of detecting the cDNA singly or in combination;
III. an instruction manual for using the kit;
wherein up-regulation (induction) of genes having the nucleotide sequences of seq id no. 1-555 and concomitant down-regulation (suppression) of genes having the nucleotide sequences of seq id no. 556-957 yield a comparative expression profile between sample and control to determine the presence of tumour and its node metastatic potential.
7. An in-silico method for analysing the micro-array data obtained by using the chip as claimed in claim 1 comprising steps:
I. normalising the intensity output from the iScan for all Tumour and Normal samples (n=443) using the Robust Spline Normalization method of the lumi Bioconductor package;
II. removing the outlier samples (n=21) from the total samples by using the K-means clustering method;
III. randomly selecting from all samples 50 Tumour samples as validation set and generating a robust gene list from the remaining samples (n=372) consisting of Tumour samples (n=236) and Normal samples (n=136);
IV. using the normalized intensity values of the two groups of samples (n=372 and n=236) for differential expression analysis using the Bioconductor packages limma and lumi and generating a list of 2980 differentially expressed genes that classify the tumour and normal samples with adjusted p value < 0.05 and fold change > 1.5 or < -1.5;
V. generating the robust feature list that would classify the N0 and N+ tumour samples using the 236 Tumour samples with 2980 genes, wherein the tumour samples (n=236) are separated randomly in two parts, one part containing ~90% of the samples (n=216) and the other part containing ~10% of the samples (n=20), and denoting them as "Sample 1" and "Sample 2", respectively;
VI. applying a Recursive Feature Elimination process on Sample 1 and identifying the top 300 features (genes) as important ones;
VII. predicting Sample 1 and Sample 2 separately by choosing the top "i" transcripts (genes) from this list, varying i = 1, 2, ..., 300, and finding the feature lists with the highest prediction accuracy in both sample sets;
VIII. repeating Steps V to VII 200 times to remove the sample bias and collecting all 200 feature lists so obtained;
IX. preparing a frequency distribution of genes found in all 200 feature lists;
X. characterized in preparing a list of 957 genes from the above 200 feature lists such that each of these 957 transcripts or genes appears at least once in the 200 feature lists, thus making it the largest list, of which successive lists are subsets;
XI. predicting the 50-sample validation set with the gene that appeared the highest number of times and calculating prediction accuracy, followed by prediction with the two transcripts having the highest and next highest frequency, and repeating the procedure by increasing the number of transcripts.
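Steps IX to XI can be illustrated with a minimal Python sketch: count how often each gene appears across the collected feature lists, rank the genes by that frequency, and form the nested top-1, top-2, ... gene sets that are evaluated in turn. The three toy feature lists and the gene names are hypothetical stand-ins for the 200 lists, and breaking ties alphabetically is an assumption, since the text does not say how ties are ordered.

```python
from collections import Counter

def gene_frequencies(feature_lists):
    """Count in how many feature lists each gene appears (step IX)."""
    return Counter(gene for features in feature_lists for gene in set(features))

def rank_genes(freq):
    """Order genes from most to least frequent; ties broken by name (assumption)."""
    return sorted(freq, key=lambda gene: (-freq[gene], gene))

def nested_gene_sets(ranked):
    """Return the top-1, top-2, ... gene sets evaluated in turn (step XI)."""
    return [ranked[:k] for k in range(1, len(ranked) + 1)]

# Hypothetical feature lists standing in for the 200 lists of step VIII.
feature_lists = [
    ["GENE_A", "GENE_B", "GENE_C"],
    ["GENE_A", "GENE_C"],
    ["GENE_A", "GENE_D"],
]

freq = gene_frequencies(feature_lists)   # every gene here appears at least once (step X)
ranked = rank_genes(freq)
candidates = nested_gene_sets(ranked)
```

Each candidate set would then be scored on the validation samples. The full set of keys in `freq` plays the role of the 957-gene list, of which every candidate set is a subset.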
8. A method for screening therapeutic agents for head-and-neck tumour, comprising the steps:
i. administering a test substance to a non-human animal suffering from artificially induced head-and-neck tumour;
ii. applying the sample from step (i) onto the chip as claimed in claim 1;
iii. comparing the expression level of the mRNA (or corresponding cDNA) to the expression level in a control wherein the test substance is not administered.
PCT/IN2019/050386 2018-05-15 2019-05-14 A chip and a method for head & neck cancer prognosis WO2019220459A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
IN201811018135 2018-05-15

Publications (1)

Publication Number Publication Date
WO2019220459A1 true WO2019220459A1 (en) 2019-11-21

Family

ID=68539644

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/IN2019/050386 WO2019220459A1 (en) 2018-05-15 2019-05-14 A chip and a method for head & neck cancer prognosis

Country Status (1)

Country Link
WO (1) WO2019220459A1 (en)

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2006085746A2 (en) * 2005-01-06 2006-08-17 Umc Utrecht Holding B.V. Diagnosis of metastases in hnscc tumours

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
DATABASE GEO PLATFORM 17 June 2010 (2010-06-17), "Illumina HumanHT-12 V4.0 expression beadchip", retrieved from NCBI Database accession no. GPL10558 *

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19803610

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 19803610

Country of ref document: EP

Kind code of ref document: A1