US20160281175A1 - Lung cancer methylation markers - Google Patents

Lung cancer methylation markers

Publication number
US20160281175A1
Authority
US
United States
Prior art keywords
lung cancer
pitx2
sall3
hoxa10
genes
Legal status
Abandoned
Application number
US15/096,848
Inventor
Andreas WEINHÄUSEL
Rudolf Pichler
Christa Nöhammer
Current Assignee
AIT Austrian Institute of Technology GmbH
Original Assignee
AIT Austrian Institute of Technology GmbH
Tecnet Equity NO Technologiebeteiligungs Invest GmbH
Application filed by AIT Austrian Institute of Technology GmbH, Tecnet Equity NO Technologiebeteiligungs Invest GmbH
Priority to US15/096,848
Assigned to AIT AUSTRIAN INSTITUTE OF TECHNOLOGY GMBH reassignment AIT AUSTRIAN INSTITUTE OF TECHNOLOGY GMBH ASSIGNMENT OF ASSIGNORS INTEREST (SEE DOCUMENT FOR DETAILS). Assignors: NOHAMMER, CHRISTA, PICHLER, RUDOLF, WEINHAUSEL, ANDREAS
Assigned to TECNET EQUITY NO TECHNOLOGIEBETEILIGUNGS-INVEST GMBH, AIT AUSTRIAN INSTITUTE OF TECHNOLOGY GMBH reassignment TECNET EQUITY NO TECHNOLOGIEBETEILIGUNGS-INVEST GMBH ASSIGNMENT OF ASSIGNORS INTEREST (SEE DOCUMENT FOR DETAILS). Assignors: AIT AUSTRIAN INSTITUTE OF TECHNOLOGY GMBH
Publication of US20160281175A1
Assigned to AIT AUSTRIAN INSTITUTE OF TECHNOLOGY GMBH reassignment AIT AUSTRIAN INSTITUTE OF TECHNOLOGY GMBH ASSIGNMENT OF ASSIGNORS INTEREST (SEE DOCUMENT FOR DETAILS). Assignors: GUENTHER, ELKE, AGNETER, DORIS
Assigned to AIT AUSTRIAN INSTITUTE OF TECHNOLOGY GMBH reassignment AIT AUSTRIAN INSTITUTE OF TECHNOLOGY GMBH CORRECTIVE ASSIGNMENT TO CORRECT THE CONVEYING PARTY DATA PREVIOUSLY RECORDED AT REEL: 042688 FRAME: 0465. ASSIGNOR(S) HEREBY CONFIRMS THE ASSIGNMENT. Assignors: TECNET EQUITY NO TECHNOLOGIEBETEILIGUNGS-INVEST GMBH
Legal status: Abandoned (current)

Classifications

    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6876 Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes
    • C12Q1/6883 Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for diseases caused by alterations of genetic material
    • C12Q1/6886 Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for diseases caused by alterations of genetic material for cancer
    • C12Q2523/00 Reactions characterised by treatment of reaction samples
    • C12Q2523/10 Characterised by chemical treatment
    • C12Q2523/125 Bisulfite(s)
    • C12Q2600/00 Oligonucleotides characterized by their use
    • C12Q2600/112 Disease subtyping, staging or classification
    • C12Q2600/118 Prognosis of disease development
    • C12Q2600/154 Methylation markers
    • C12Q2600/16 Primer sets for multiplex assays

Definitions

  • The present invention relates to cancer diagnostic methods and means therefor.
  • Neoplasms and cancer are abnormal growths of cells. Cancer cells reproduce rapidly despite restrictions of space, the need to share nutrients with other cells, or signals sent from the body to stop reproduction. Cancer cells are often shaped differently from healthy cells, do not function properly, and can spread into many areas of the body. Abnormal growths of tissue, called tumors, are clusters of cells that are capable of growing and dividing uncontrollably. Tumors can be benign (noncancerous) or malignant (cancerous). Benign tumors tend to grow slowly and do not spread. Malignant tumors can grow rapidly, invade and destroy nearby normal tissues, and spread throughout the body. Malignant cancers can be both locally invasive and metastatic.
  • Cancers can invade the surrounding tissues by sending out "fingers" of cancerous cells into the normal tissue. Metastatic cancers can send cells into other tissues in the body, which may be distant from the original tumor. Cancers are classified according to the kind of fluid or tissue from which they originate, or according to the location in the body where they first developed. All of these parameters can influence the characteristics, development and progression of a cancer, and consequently also its treatment. Therefore, reliable methods to classify a cancer state or cancer type that take diverse parameters into consideration are desired. Since cancer is predominantly a genetic disease, classifying cancers by genetic parameters is one extensively studied route.
  • RNA-expression studies have been used in screening to identify genetic biomarkers. In recent years it has been shown that changes in the DNA-methylation pattern of genes can be used as biomarkers for cancer diagnostics. In line with the general strategy for identifying RNA-expression-based biomarkers, the most convenient and promising approach would start by identifying marker candidates through genome-wide screening of methylation changes.
  • RNA expression profiling is used to elucidate class differences, distinguishing the "good" from the "bad" situation, such as diseased vs. healthy, or clinical differences between groups of diseased patients.
  • RNA-based markers are more promising markers and are expected to give robust assays for diagnostics.
  • Many clinical markers in oncology are more or less DNA-based and well established, e.g. cytogenetic analyses for the diagnosis and classification of different tumor species. However, most of these markers are not accessible to cheap and efficient routine molecular-genetic PCR tests.
  • RNA-expression changes span several orders of magnitude and can easily be measured using genome-wide expression microarrays, which cover the entire translated transcriptome with 20,000-45,000 probes. Elucidating DNA changes via microarray techniques generally requires more probes, depending on the requested resolution: covering the entire 3 × 10^9 bp human genome requires one or more orders of magnitude more probes than standard expression profiling.
  • Another option for obtaining stable DNA-based biomarkers relies on elucidating the changes in the DNA methylation pattern in (malignant; neoplastic) disease.
  • Methylation affects exclusively the cytosine residues of CpG dinucleotides, which are clustered in CpG islands.
  • CpG islands are often associated with gene promoter sequences in the 5′-untranslated regions of genes and are unmethylated by default.
  • An unmethylated CpG island in the associated gene promoter permits active transcription, whereas methylation of the island blocks transcription.
  • The DNA methylation pattern is tissue- and clone-specific and almost as stable as the DNA itself.
  • DNA methylation is an early event in tumorigenesis, which makes it of interest for the early diagnosis of disease.
  • Screening for biomarkers suitable for answering clinical questions, including DNA-methylation-based approaches, would be most successful when starting with a genome-wide approach.
  • Microarrays for human genome-wide hybridization testing are known, e.g. the Affymetrix Human Genome U133A Array (NCBI database, Acc. No. GPL96).
  • Lung cancer is the third most common malignant neoplasm in the EU, following breast and colon cancers, and has the second worst 5-year survival figures, after pancreatic cancer. Thus, although it accounts for 14% of all cancer diagnoses, lung cancer is responsible for 22% of cancer deaths, reflecting the poor prognosis of this tumour type and the comparative lack of progress in treatment. Therapy is hampered by the tendency for lung cancer to be diagnosed at a late stage, hence the need to develop markers for early detection. Approximately 80% of lung cancer cases are of the non-small cell type (NSCLC), with squamous cell carcinoma and adenocarcinoma being the most frequent subtypes.
  • A goal of the present invention is to provide an alternative and more cost-efficient route to identify suitable markers for lung cancer diagnostics.
  • The present invention provides a set of nucleic acid primers or hybridization probes specific for a potentially methylated region of marker genes suitable for diagnosing or predicting lung cancer or a lung cancer type, the lung cancer type preferably being selected from adenocarcinoma or squamous cell carcinoma, the marker genes comprising WT1, SALL3, TERT, ACTB, CPEB4.
  • The set further comprises any one of the markers ABCB1, ACTB, AIM1L, APC, AREG, BMP2K, BOLL, C5AR1, C5orf4, CADM1, CDH13, CDX1, CLIC4, COL21A1, CPEB4, CXADR, DLX2, DNAJA4, DPH1, DRD2, EFS, ERBB2, ERCC1, ESR2, F2R, FAM43A, GABRA2, GAD1, GBP2, GDNF, GNA15, GNAS, HECW2, HIC1, HIST1H2AG, HLA-G, HOXA1, HOXA10, HSD17B4, HSPA2, IRAK2, ITGA4, JUB, KCNJ15, KCNQ1, KIF5B, KL, KRT14, KRT17, LAMC2, MAGEB2, MBD2, MSH4, MT1G, MT3, MTHFR, NEUROD1, NHLH2, NKX2-1, ONECUT2, PENK, PITX2, PLA
  • The present invention provides a method of determining a subset of diagnostic markers for potentially methylated genes from the genes of gene marker IDs 1-359 of table 1, suitable for the diagnosis or prognosis of lung cancer or a lung cancer type, comprising
  • The present invention provides a master set of 359 genetic markers, which has surprisingly been found to be highly relevant for aberrant methylation in the diagnosis or prognosis of lung cancer. A multitude of marker subsets can be determined from this master set and used to diagnose and differentiate between various lung cancer or tumor types, e.g. adenocarcinoma and squamous cell carcinoma.
  • The inventive 359 marker genes of table 1 are: NHLH2, MTHFR, PRDM2, MLLT11, S100A9 (control), S100A9, S100A8 (control), S100A8, S100A2, LMNA, DUSP23, LAMC2, PTGS2, MARK1, DUSP10, PARP1, PSEN2, CLIC4, RUNX3, AIM1L, SFN, RPA2, TP73, TP73 (p73), POU3F1, MUTYH, UQCRH, FAF1, TACSTD2, TNFRSF25, DIRAS3, MSH4, GBP2, GBP2, LRRC8C, F3, NANOS1, MGMT, EBF3, DCLRE1C, KIF5B, ZNF22, PGBD3, SRGN, GATA3, PTEN, MMS19, SFRP5, PGR, ATM, DRD2, CADM1, TEAD1, OPCML, CALCA, CTSD, MYOD1, IGF2, BDNF,
  • Table 1 lists some marker genes twice, e.g. for different loci or as control sequences. It should be understood that any methylation-specific region known to the person skilled in the art from prior publications or available databases (e.g. PubMeth at www.pubmeth.org) can be used according to the present invention. Of course, genes listed twice only need to be represented once in an inventive marker set (or set of probes or primers therefor), but preferably a second marker, such as a control region, is included (IDs given in the list above relate to the gene ID (or gene locus ID) given in table 1 of the example section).
  • One factor making DNA methylation an attractive target for biomarker development is that cell-free methylated DNA can be detected in body fluids such as serum, sputum, and urine from patients with cancerous neoplastic conditions and disease.
  • Clinical samples have to be available.
  • Archived (tissue) samples: preferably, these materials should fulfill the requirements for obtaining intact RNA and DNA, but most archives of clinical samples store formalin-fixed paraffin-embedded (FFPE) tissue blocks. This has been the clinicopathological routine for decades, but such fixed samples are, if at all, only suitable for extracting low-quality RNA.
  • Any such samples, including fixed samples, can be used for the method of generating an inventive subset.
  • The samples can be of lung tissue or any body fluid, e.g. sputum, bronchial lavage, or serum derived from peripheral blood or blood cells.
  • Blood or blood derived samples preferably have reduced, e.g. ⁇ 95%, or no leukocyte content but comprise DNA of the cancerous cells or tumor.
  • The inventive markers are human genes.
  • The samples are human samples.
  • The present invention provides a multiplexed methylation testing method which 1) outperforms genome-wide screening via RNA-expression profiling in "classification" success, 2) enables identification of biomarkers for a wide variety of diseases without the need to prescreen candidate markers on a genome-wide scale, 3) is suitable for minimally invasive testing, and 4) is easily scalable.
  • The invention presents a targeted multiplexed DNA-methylation test which outperforms genome-scale approaches (including RNA expression profiling) for disease diagnosis, classification, and prognosis.
  • The inventive set of 359 markers enables the selection of a subset of markers that is highly characteristic of lung cancer and of a given lung cancer type.
  • Further indicators differentiating between cancer types, or neoplastic conditions in general, are e.g. whether tumors or nodules are benign (non- or limited-proliferative) or malignant, and metastatic or non-metastatic. It is sometimes also possible to differentiate the sample type from which the methylated DNA is isolated, e.g. urine, blood or tissue samples.
  • The present invention is suitable for differentiating diseases, in particular neoplastic conditions, or tumor types. Diseases and neoplastic conditions are to be understood generally as including benign and malignant conditions. According to the present invention, benign nodules (being at least the potential onset of malignancy) are included in the definition of a disease. Once a malignancy has developed, the condition is a preferred disease to be diagnosed by the markers screened for or used according to the present invention.
  • The present invention is suitable for distinguishing benign and malignant tumors (both being considered a disease according to the present invention). In particular, the invention can provide markers (and their diagnostic or prognostic use) that distinguish a normal healthy state together with a benign state on the one hand from malignant states on the other hand.
  • A diagnosis of lung cancer may include identifying the difference from a normal healthy state, e.g. the absence of any neoplastic nodules or cancerous cells.
  • The present invention can also be used for the prognosis of lung cancer, in particular for predicting the progression of lung cancer or a lung cancer type.
  • A particularly preferred use of the invention is the diagnosis or prognosis of metastasizing lung cancer (as distinguished from non-metastasizing conditions).
  • Prognosis should not be understood in an absolute sense, i.e. as a certainty that an individual will develop lung cancer or a lung cancer type (including cancer progression), but as an increased risk of developing cancer or the lung cancer type, or of cancer progression.
  • Prognosis is also used in the context of predicting disease progression, in particular to predict the results of a certain therapy of the disease, in particular of neoplastic conditions or lung cancer types.
  • The prognosis of a therapy can e.g. be used to predict the chance of success (i.e. curing the disease) or the chance of reducing the severity of the disease to a certain level.
  • Markers screened for this purpose are preferably derived from sample data of patients treated according to the therapy to be predicted.
  • The inventive marker sets may also be used to monitor a patient for the emergence of therapeutic results or positive disease progression.
  • DNA methylation analyses in principle rely either on bisulfite deamination-based methylation detection or on using methylation sensitive restriction enzymes.
  • The restriction-enzyme-based strategy is used for elucidating DNA-methylation changes.
  • Further methods to determine methylated DNA are given e.g. in EP 1 369 493 A1 or U.S. Pat. No. 6,605,432.
  • Combining restriction digestion and multiplex PCR amplification with a targeted microarray hybridization is a particularly advantageous strategy for performing the inventive methylation test using the inventive marker sets (or subsets).
  • A microarray-hybridization step can be used for reading out the PCR results.
  • Statistical approaches for class comparison and class prediction can be used. Such statistical methods are known from the analysis of RNA-expression microarray data.
  • An amplification protocol can be used that enables selective amplification of the methylated DNA fraction prior to methylation testing. Subjecting these amplicons to the methylation test, it was possible to successfully distinguish DNA from sensitive cases from normal healthy controls. In addition, it was possible to distinguish lung-cancer patients from healthy normal controls using DNA from serum by the inventive methylation test upon preamplification. Both examples clearly illustrate that the inventive multiplexed methylation testing can be successfully applied when only limited amounts of DNA are available. Thus, this principle might be the preferred method for minimally invasive diagnostic testing.
  • Although the 359-marker test is not a genome-wide test and might be used as is for diagnostic testing, running a subset of markers (comprising the classifier that enables the best classification) would be easier for routine applications.
  • The test is easily scalable.
  • The selected subset of primers/probes could be applied directly to set up a lower-multiplexed test (or a single PCR test).
  • Serum DNA can be used to classify or distinguish healthy individuals from individuals with lung tumors. Only the specific primers comprising the gene classifier obtained from the methylation test may be set up together in multiplexed PCR reactions.
  • The inventive methylation test is a suitable tool for the differentiation and classification of neoplastic disease.
  • This assay can be used for diagnostic purposes and for defining biomarkers for clinically relevant issues, to improve the diagnosis of disease and to classify patients at risk of disease progression, thereby improving disease treatment and patient management.
  • The first step of the inventive method of generating a subset, step a) of obtaining data on the methylation status, preferably comprises determining the methylation status, preferably by methylation-specific PCR analysis or methylation-specific digestion analysis.
  • Methylation specific digestion analysis can include either or both of hybridization of suitable probes for detection to non-digested fragments or PCR amplification and detection of non-digested fragments.
  • The inventive selection can be made by any (known) classification method to obtain a set of markers with the given diagnostic (or prognostic) value to categorize lung cancer or a lung cancer type.
  • Classification methods include class comparisons in which a specific p-value threshold is selected, e.g. a p-value below 0.1, preferably below 0.08, more preferred below 0.06, particularly preferred below 0.05, below 0.04, below 0.02, most preferred below 0.01.
  • The correlated results for each gene b) are rated by their correct correlation to a lung cancer or lung cancer type positive state, preferably by a p-value, t-value or F-test.
  • Rated markers (best first, i.e. lowest p-value or highest absolute t-value) are then selected and added to the subset until a certain diagnostic value is reached, e.g. the herein mentioned correct classification of at least 70% (or more) of lung cancer or lung cancer type samples.
  • Class comparison procedures include the identification of genes that are differentially methylated between the two classes using a random-variance t-test.
  • The random-variance t-test is an improvement over the standard separate t-test, as it permits sharing information among genes about within-class variation without assuming that all genes have the same variance (Wright G. W. and Simon R, Bioinformatics 19:2448-2455, 2003).
  • Genes were considered statistically significant if their p-value was less than a certain value, e.g. 0.1 or 0.01.
  • A stringent significance threshold can be used to limit the number of false positive findings.
  • A global test can also be performed to determine whether the expression profiles differ between the classes, by permuting the labels that assign arrays to classes.
  • For each permutation, the p-values are re-computed and the number of genes significant at e.g. the 0.01 level is noted. The proportion of permutations that give at least as many significant genes as the actual data is the significance level of the global test. If there are more than two classes, the F-test should be used instead of the t-test.
  • Class prediction includes the step of specifying a significance level used to determine the genes that will be included in the subset. Genes that are differentially methylated between the classes at a univariate parametric significance level below the specified threshold are included in the set. The specified significance level need not be small enough to exclude most false discoveries: in some problems, better prediction can be achieved by being more liberal about the gene sets used as features. The sets may, however, be more biologically interpretable and clinically applicable if fewer genes are included. For cross-validation, gene selection is repeated for each training set created in the cross-validation process, in order to provide an unbiased estimate of prediction error. The final model and gene set for use with future data are those resulting from applying gene selection and classifier fitting to the full dataset.
  • Models utilizing gene methylation profiles to predict the class of future samples can also be used. These models may be based on the Compound Covariate Predictor (Radmacher et al. Journal of Computational Biology 9:505-511, 2002), Diagonal Linear Discriminant Analysis (Dudoit et al. Journal of the American Statistical Association 97:77-87, 2002), Nearest Neighbor Classification (also Dudoit et al.), and Support Vector Machines with linear kernel (Ramaswamy et al. PNAS USA 98:15149-54, 2001). The models incorporated genes that were differentially methylated between the classes at a given significance level (e.g.
  • The prediction error of each model is preferably estimated using cross-validation.
  • For each cross-validation training set, the entire model-building process is repeated, including the gene selection process. It may also be evaluated whether the cross-validated error rate estimate for a model is significantly lower than one would expect from random prediction.
  • To this end, the class labels can be randomly permuted and the entire leave-one-out cross-validation process repeated.
  • The significance level is the proportion of random permutations that give a cross-validated error rate no greater than the cross-validated error rate obtained with the real methylation data. Usually about 1000 random permutations are used.
  • Another classification method is the greedy-pairs method described by Bo and Jonassen (Genome Biology 3(4):research0017.1-0017.11, 2002).
  • The greedy-pairs approach starts by ranking all genes based on their individual t-scores on the training set.
  • The procedure selects the best-ranked gene g_i and finds the one other gene g_j that, together with g_i, provides the best discrimination, using as a measure the distance between the centroids of the two classes with regard to the two genes when projected onto the diagonal linear discriminant axis.
  • These two selected genes are then removed from the gene set and the procedure is repeated on the remaining set until the specified number of genes have been selected. This method attempts to select pairs of genes that work well together to discriminate the classes.
  • A binary tree classifier utilizing gene methylation profiles can be used to predict the class of future samples.
  • The first node of the tree incorporates a binary classifier that distinguishes two subsets of the total set of classes.
  • The individual binary classifiers were based on Support Vector Machines incorporating genes that were differentially expressed between the classes at the chosen significance level (e.g. 0.01, 0.05 or 0.1) as assessed by the random-variance t-test (Wright G. W. and Simon R. Bioinformatics 19:2448-2455, 2003). Classifiers for all possible binary partitions are evaluated, and the partition selected is the one for which the cross-validated prediction error is minimal. The process is then repeated successively for the two subsets of classes determined by the previous binary split.
  • The prediction error of the binary tree classifier can be estimated by cross-validating the entire tree-building process. This overall cross-validation includes re-selection of the optimal partitions at each node and re-selection of the genes used for each cross-validated training set, as described by Simon et al. (Journal of the National Cancer Institute 95:14-18, 2003). 10-fold cross-validation can be used: the samples are randomly partitioned into 10 test sets, one-tenth of the samples is withheld, a binary tree is developed on the remaining 9/10 of the samples, and class membership is predicted for the withheld 10%. This is repeated 10 times, each time withholding a different 10% of the samples (Simon R and Lam A. BRB-ArrayTools User Guide, version 3.2. Biometric Research Branch, National Cancer Institute).
  • The correlated results for each gene b) are rated by their correct correlation to a lung cancer or lung cancer type positive state, preferably by a p-value test. It is also possible to include a step in which the genes are selected d) in order of their rating.
  • The subset selection preferably results in a subset with at least 60%, preferably at least 65%, at least 70%, at least 75%, at least 80% or even at least 85%, at least 90%, at least 92%, at least 95%, particularly preferred 100% correct classification of test samples of lung cancer or lung cancer type.
  • Such levels can be reached, if necessary, by repeating c) steps a) and b) of the inventive method.
  • Marker genes with a significance value of at most 0.1, preferably at most 0.08, even more preferred at most 0.06, at most 0.05, at most 0.04, at most 0.02, or more preferred at most 0.01 are selected.
  • The at least 50 genes of step a) are preferably at least 70, more preferably at least 90, at least 100, at least 120, at least 140, at least 160, at least 180, at least 190, at least 200, at least 220, at least 240, at least 260, at least 280, at least 300, at least 320, at least 340, at least 350 or all genes.
  • Since the subset should be small, it is preferred that not more than 60, or not more than 40, preferably not more than 30, particularly preferred not more than 20 marker genes are selected in step d) for the subset.
  • the present invention provides a method of identifying lung cancer or lung cancer type in a sample comprising DNA from a patient, comprising providing a diagnostic subset of markers identified according to the method depicted above, determining the methylation status of the genes of the subset in the sample and comparing the methylation status with the status of a confirmed lung cancer or lung cancer type positive and/or negative state, thereby identifying lung cancer or lung cancer type in the sample.
  • the methylation status can be determined by any method known in the art including methylation dependent bisulfite deamination (and consequently the identification of mC—methylated C—changes by any known methods, including PCR and hybridization techniques).
  • the methylation status is determined by methylation specific PCR analysis, methylation specific digestion analysis and either or both of hybridisation analysis to non-digested or digested fragments or PCR amplification analysis of non-digested fragments.
  • the methylation status can also be determined by any probes suitable for determining the methylation status including DNA, RNA, PNA, LNA probes which optionally may further include methylation specific moieties.
  • the methylation status can in particular be determined by using hybridisation probes or amplification primers (preferably PCR primers) specific for methylated regions of the inventive marker genes. Discrimination between methylated and non-methylated genes, including the determination of the methylation amount or ratio, can be performed by using either of these tools.
  • the determination using only specific primers aims at specifically amplifying methylated (or alternatively non-methylated) DNA. This can be facilitated by using (methylation dependent) bisulfite deamination, methylation specific enzymes, or methylation specific nucleases that digest methylated (or alternatively non-methylated) regions, so that only the non-methylated (or alternatively methylated) DNA is obtained.
  • a genome chip or simply a gene chip including hybridization probes for all genes of interest such as all 359 marker genes
  • all amplification or non-digested products are detected, i.e. discrimination between methylated and non-methylated states as well as gene selection (the inventive set or subset) takes place before the step of detection on a chip.
  • Either set, a set of probes or a set of primers can be used to obtain the relevant methylation data of the genes of the present invention. Of course, both sets can be used.
  • the method according to the present invention may be performed by any method suitable for the detection of methylation of the marker genes.
  • the determination of the gene methylation is preferably performed with a DNA-chip, real-time PCR, or a combination thereof.
  • the DNA chip can be a commercially available general gene chip (also comprising a number of spots for the detection of genes not related to the present method) or a chip specifically designed for the method according to the present invention (which predominantly comprises marker gene detection spots).
  • the methylated DNA of the sample is detected by a multiplexed hybridization reaction.
  • a methylated DNA is preamplified prior to hybridization, preferably also prior to methylation specific amplification, or digestion.
  • the amplification reaction is multiplexed (e.g. multiplex PCR).
  • the inventive methods are particularly suitable to detect low amounts of methylated DNA of the inventive marker genes.
  • the DNA amount in the sample is below 500 ng, below 400 ng, below 300 ng, below 200 ng, below 100 ng, below 50 ng or even below 25 ng.
  • the inventive method is particularly suitable to detect low concentrations of methylated DNA of the inventive marker genes.
  • the DNA amount in the sample is below 500 ng, below 400 ng, below 300 ng, below 200 ng, below 100 ng, below 50 ng or even below 25 ng, per ml sample.
  • the present invention provides a subset comprising or consisting of nucleic acid primers or hybridization probes being specific for a potentially methylated region of at least marker genes selected from a set of nucleic acid primers or hybridization probes being specific for a potentially methylated region of marker genes being suitable to diagnose or predict lung cancer or a lung cancer type, preferably being selected from adenocarcinoma or squamous cell carcinoma, the marker genes comprising WT1, SALL3, TERT, ACTB, CPEB4 or any other subset selected from one of the following groups
  • the present inventive set also includes sets with at least 50% of the above markers for each set, since it is also possible to substitute parts of these subsets. In the case of binary conditions or differentiations (e.g. good or bad prognosis, or distinguishing between lung cancer and lung cancer types), one part of the subset points in one direction for a certain lung cancer type or cancer differentiation. It is possible to further complement the 50% part of the set by additional markers specific for diagnosing lung cancer, for determining the other part of the good or bad differentiation, or for differentiating between two lung cancer types. Methods to determine such complementing markers follow the general methods as outlined herein.
  • Each of these marker subsets is particularly suitable to diagnose lung cancer or lung cancer type or distinguish between certain cancers, samples or cancer types in a methylation specific assay of these genes.
  • the inventive primers or probes may be of any nucleic acid, including RNA, DNA, PNA (peptide nucleic acids), LNA (locked nucleic acids).
  • the probes might further comprise methylation specific moieties.
  • the present invention provides a (master) set of 360 marker genes, as well as the specific gene locations, defined by the PCR products of these genes, at which significant methylation can be detected, and subsets thereof with a certain diagnostic value to detect or diagnose lung cancer or to distinguish lung cancer type(s).
  • the set is optimized for a lung cancer or a lung cancer type.
  • Lung cancer types include, without being limited thereto, adenocarcinoma and squamous cell carcinoma.
  • Further indicators differentiating between disease(s), including the diagnosis of any type of lung cancer or lung tumor, or between tumor type(s) are e.g. benign (non (or limited) proliferative) or malignant, metastatic or non-metastatic.
  • the set can also be optimized for a specific sample type in which the methylated DNA is tested.
  • samples include blood, urine, saliva, hair, skin, tissues, in particular tissues of the cancer origin mentioned above, in particular lung tissue such as potentially affected or potentially cancerous lung tissue, or serum, sputum, bronchial lavage.
  • the sample may be obtained from a patient to be diagnosed.
  • the test sample to be used in the method of identifying a subset is of the same type as a sample to be used in the diagnosis.
  • probes specific for potentially aberrant methylated regions are provided, which can then be used for the diagnostic method.
  • primers suitable for specific amplification (e.g. PCR) of these regions are provided in order to perform a diagnostic test on the methylation state.
  • Such probes or primers are provided in the context of a set corresponding to the inventive marker genes or marker gene loci as given in table 1.
  • Such a set of primers or probes may have all 359 inventive markers present and can then be used for a multitude of different cancer detection methods. Of course, not all markers would have to be used to diagnose a lung cancer or lung cancer type. It is also possible to use certain subsets (or combinations thereof) with a limited number of marker probes or primers for diagnosis of certain categories of lung cancer.
  • the present invention provides sets of primers or probes comprising primers or probes for any single marker subset or any combination of marker subsets disclosed herein.
  • sets of marker genes should be understood to include sets of primer pairs and probes therefor, which can e.g. be provided in a kit.
  • the distinction or diagnosis can be made by using any sample as described above, including serum, sputum, bronchial lavage.
  • Set k (HOXA10, NEUROD1), or either HOXA10 or NEUROD1 alone, can be used to diagnose lung cancer and further to distinguish adenocarcinoma from squamous cell carcinoma.
  • the distinction or diagnosis can be made by using any sample as described above, including serum, sputum, bronchial lavage.
  • Set n (SALL3, PITX2, SPARC, F2R, TERT, RASSF1, HOXA10, CXADR, KL) and sets with at least 50%, preferably at least 60%, at least 70%, at least 80% or at least 90% of these markers can be used to diagnose lung cancer and to distinguish cancerous lung tissue from healthy lung tissue.
  • the distinction or diagnosis can be made by using any sample as described above, including serum, sputum, bronchial lavage.
  • Set t (HOXA10, RASSF1, F2R) and sets with at least 50%, preferably at least 60%, at least 70%, at least 80% or at least 90% of these markers can be used to diagnose lung cancer and to distinguish between adenocarcinoma and squamous cell carcinoma.
  • the distinction or diagnosis can be made by using any sample as described above, including serum, sputum, bronchial lavage.
  • subsets a) to t), in particular sets comprising markers of 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15 or more of these subsets, preferably for the lung cancer type, or preferably the complete sets a) to t).
  • One preferred set comprises gene markers WT1, SALL3, TERT, ACTB and CPEB4. These markers are common in a set for the diagnosis of lung cancer and suitable to distinguish normal from lung cancer samples. This set preferably is supplemented by the marker genes DLX2, TNFRSF25 or SMAD3.
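The "at least 50% of these markers" rule used for sets n and t, and the preferred core set above, can be checked mechanically. The following Python sketch is a minimal illustration under stated assumptions: panels are plain sets of gene symbols, the percentage is computed over the distinct markers of the reference set, and the names `SET_N`, `SET_T`, `CORE_SET`, `coverage`, `meets_fraction` and `contains_core` are hypothetical helpers, not part of the disclosure.

```python
# Hypothetical helpers for checking a candidate marker panel against the
# subsets named in the text; the gene symbols are copied from the source.
SET_N = frozenset({"SALL3", "PITX2", "SPARC", "F2R", "TERT",
                   "RASSF1", "HOXA10", "CXADR", "KL"})
SET_T = frozenset({"HOXA10", "RASSF1", "F2R"})
CORE_SET = frozenset({"WT1", "SALL3", "TERT", "ACTB", "CPEB4"})


def coverage(panel, reference):
    """Fraction of the reference markers that are present in the panel."""
    return len(set(panel) & reference) / len(reference)


def meets_fraction(panel, reference, min_fraction=0.5):
    """True if the panel contains at least min_fraction of the reference markers."""
    return coverage(panel, reference) >= min_fraction


def contains_core(panel):
    """True if all five markers of the preferred core set are in the panel."""
    return CORE_SET <= set(panel)


if __name__ == "__main__":
    panel = {"WT1", "SALL3", "TERT", "ACTB", "CPEB4", "PITX2", "KL"}
    print(coverage(panel, SET_N))         # 4 of the 9 set-n markers
    print(meets_fraction(panel, SET_N))   # below the 50% threshold
    print(contains_core(panel))           # core set fully present
```

Whether "50%" should be rounded up or down for odd-sized sets is not specified in the text; the sketch simply compares the exact fraction.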
  • the inventive set may comprise any one of the markers ABCB1, ACTB, AIM1L, APC, AREG, BMP2K, BOLL, C5AR1, C5orf4, CADM1, CDH13, CDX1, CLIC4, COL21A1, CPEB4, CXADR, DLX2, DNAJA4, DPH1, DRD2, EFS, ERBB2, ERCC1, ESR2, F2R, FAM43A, GABRA2, GAD1, GBP2, GDNF, GNA15, GNAS, HECW2, HIC1, HIST1H2AG, HLA-G, HOXA1, HOXA10, HSD17B4, HSPA2, IRAK2, ITGA4, JUB, KCNJ15, KCNQ1, KIF5B, KL, KRT14, KRT17, LAMC2, MAGEB2, MBD2, MSH4, MT1G, MT3, MTHFR, NEUROD1, NHLH2, NKX2-1, ONECUT2, PENK, PITX2, PLA
  • the methylation of at least two genes is determined.
  • if the present invention is provided as an array test system, at least ten, especially at least fifteen, genes are preferred.
  • test set-ups for example in microarrays (“gene-chips”)
  • preferably at least 20, even more preferred at least 30, especially at least 40 genes are provided as test markers.
  • these markers or the means to test the markers can be provided in a set of probes or a set of primers, preferably both.
  • the set comprises up to 100000, up to 90000, up to 80000, up to 70000, up to 60000 or 50000 probes or primer pairs (set of two primers for one amplification product), preferably up to 40000, up to 35000, up to 30000, up to 25000, up to 20000, up to 15000, up to 10000, up to 7500, up to 5000, up to 3000, up to 2000, up to 1000, up to 750, up to 500, up to 400, up to 300, or even more preferred up to 200 probes or primers of any kind, in particular in the case of immobilized probes on a solid surface such as a chip.
  • primer pairs and probes are specific for a methylated upstream region of the open reading frame of the marker genes.
  • the probes or primers are specific for a methylation in the genetic regions defined by SEQ ID NOs 1081 to 1440, including up to 500 adjacent base pairs, preferably up to 300, up to 200, up to 100, up to 50 or up to 10 adjacent base pairs, corresponding to gene marker IDs 1 to 359 of table 1, respectively.
  • probes or primers of the inventive set are specific for the regions and gene loci identified in table 1, last column with reference to the sequence listing, SEQ ID NOs: 1081 to 1440.
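The specificity rule above (a SEQ ID region plus up to 500 adjacent base pairs) amounts to an interval test. The Python sketch below assumes that each SEQ ID region has been mapped to genomic coordinates (0-based, half-open) and that a probe or primer counts as "inside" when its whole footprint falls within the region widened by the flank; the coordinates, the `Region` class and `within_extended_region` are illustrative assumptions only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    """A target region in 0-based, half-open genomic coordinates (hypothetical)."""
    chrom: str
    start: int
    end: int


def within_extended_region(region, chrom, probe_start, probe_end, flank=500):
    """True if [probe_start, probe_end) lies inside the region widened by `flank` bp on each side."""
    if probe_end <= probe_start:
        raise ValueError("probe must have positive length")
    return (chrom == region.chrom
            and probe_start >= max(0, region.start - flank)
            and probe_end <= region.end + flank)


if __name__ == "__main__":
    target = Region("chr1", 10_000, 10_300)  # made-up coordinates
    print(within_extended_region(target, "chr1", 9_600, 9_640))             # within a 500 bp flank
    print(within_extended_region(target, "chr1", 9_600, 9_640, flank=300))  # outside a 300 bp flank
```

The `flank` parameter maps onto the preferred values in the text (500, 300, 200, 100, 50 or 10 bp).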
  • these SEQ IDs correspond to a certain gene, the latter being a member of the inventive sets, in particular of the subsets a) to t), e.g.
  • the set of the present invention comprises probes or primers for at least one gene or gene product of the list according to table 1, wherein at least 10%, at least 15%, at least 20%, at least 25%, at least 30%, at least 35%, at least 40%, at least 45%, at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, especially preferred at least 100%, of the total probes or primers are probes or primers for genes of the list according to table 1.
  • the set, in particular in the case of a set of hybridization probes, is provided immobilized on a solid surface, preferably a chip or in the form of a microarray.
  • gene chips using DNA molecules for detection of methylated DNA in the sample
  • Such gene chips also allow detection of a large number of nucleic acids.
  • the set is provided on a solid surface, in particular a chip, whereon the primers or probes can be immobilized.
  • Solid surfaces or chips may be of any material suitable for the immobilization of biomolecules such as the moieties, including glass, modified glass (aldehyde modified) or metal chips.
  • the primers or probes can also be provided as such, including lyophilized forms or being in solution, preferably with suitable buffers.
  • the probes and primers can of course be provided in a suitable container, e.g. a tube or micro tube.
  • the present invention also relates to a method of identifying lung cancer or lung cancer type in a sample comprising DNA from a subject or patient, comprising obtaining a set of nucleic acid primers (or primer pairs) or hybridization probes as defined above (comprising each specific subset or combinations thereof), determining the methylation status of the genes in the sample for which the members of the set are specific, and comparing the methylation status of the genes with the status of a confirmed lung cancer or lung cancer type positive and/or negative state, thereby identifying the lung cancer or lung cancer type in the sample.
  • inventive method has been described above and all preferred embodiments of such methods also apply to the method using the set provided herein.
  • the inventive marker set, including certain disclosed subsets and further subsets that can be identified with the methods disclosed herein, is suitable to diagnose lung cancer and to distinguish between different lung cancer forms, in particular for diagnostic or prognostic uses.
  • the markers used in the inventive diagnostic or prognostic method, e.g. by utilizing primers or probes of the inventive set, may be fewer in number than those in the set (or kit) or chip as such, which may be designed for more than one fine-tuned diagnosis or prognosis.
  • the markers used for the diagnostic or prognostic method may be up to 100000, up to 90000, up to 80000, up to 70000, up to 60000 or 50000, preferably up to 40000, up to 35000, up to 30000, up to 25000, up to 20,000, up to 15000, up to 10000, up to 7500, up to 5000, up to 3000, up to 2000, up to 1000, up to 750, up to 500, up to 400, up to 300, up to 200, up to 100, up to 80, or even more preferred up to 60.
  • the inventive set of marker primers or probes can be employed in chip-based (immobilised) assays, products or methods, or in PCR-based kits or methods. Both PCR and hybridisation (e.g. on a chip) can be used to detect methylated genes.
  • the inventive marker set, including certain disclosed subsets which can be identified with the methods disclosed herein, is suitable to distinguish lung cancer from normal tissue, in particular for diagnostic or prognostic uses.
  • the inventive marker set, including certain disclosed subsets which can be identified with the methods disclosed herein, is suitable to distinguish adenocarcinoma from squamous cell carcinoma, in particular for diagnostic or prognostic uses.
  • FIG. 1 Cross-Validation ROC curve from the Bayesian Compound Covariate Predictor.
  • Samples from solid tumors were derived from initial surgical resection of primary tumors. Tumor tissue sections were derived from histopathology, and histopathological data as well as clinical data were monitored over the time of clinical management of the patients and/or collected from patient reports in the study center. Anonymised data and DNA were provided.
  • the invention assay is a multiplexed assay for DNA methylation testing of up to (or even more than) 360 methylation candidate markers, enabling convenient methylation analyses for tumor-marker definition.
  • the test is a combined multiplex-PCR and microarray hybridization technique for multiplexed methylation testing.
  • inventive marker genes, PCR primer sequences, hybridization probe sequences and expected PCR products are given in table 1, above.
  • methylation analysis is performed via methylation-sensitive restriction enzyme (MSRE) digestion of 500 ng of starting DNA.
  • a combination of several MSREs ensures complete digestion of unmethylated DNA. All targeted DNA regions have been selected such that sequences containing multiple MSRE sites are flanked by methylation-independent restriction enzyme sites.
  • This strategy enables pre-amplification of the methylated DNA fraction before methylation analyses.
  • the design and pre-amplification would enable methylation testing on serum, urine, stool etc. when DNA is limiting.
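The enrichment principle described above (unmethylated DNA is cut, methylated DNA survives digestion and can be amplified) can be modelled in a few lines. The following Python sketch is a simplification under assumptions: it scans only the given strand for AciI, HpaII and Hin6I recognition sites, treats every CpG as either fully methylated or unmethylated, and assumes complete digestion; `msre_cpg_positions` and `survives_digestion` are hypothetical names.

```python
import re

# Recognition sequences of the three MSREs used in the protocol, as read on
# the top strand. AciI recognizes the non-palindromic CCGC, so its site on
# the bottom strand appears as GCGG on the top strand.
MSRE_MOTIFS = {
    "AciI": ("CCGC", "GCGG"),
    "HpaII": ("CCGG",),
    "Hin6I": ("GCGC",),
}


def msre_cpg_positions(seq):
    """Return the 0-based positions of the C of every CpG inside an MSRE site."""
    seq = seq.upper()
    positions = set()
    for motifs in MSRE_MOTIFS.values():
        for motif in motifs:
            # The lookahead makes re.finditer report overlapping sites too.
            for match in re.finditer(f"(?={motif})", seq):
                # Each of these four-base motifs contains a single CpG at offset 1.
                positions.add(match.start() + 1)
    return positions


def survives_digestion(seq, methylated_cpgs):
    """Simplified model: the template stays intact (and amplifiable) only if
    every CpG lying in an MSRE site is methylated, which blocks cutting."""
    return msre_cpg_positions(seq) <= set(methylated_cpgs)


if __name__ == "__main__":
    amplicon = "AACCGGTTGCGCAA"  # made-up sequence with one HpaII and one Hin6I site
    print(sorted(msre_cpg_positions(amplicon)))
    print(survives_digestion(amplicon, {3, 9}))  # both sites methylated
    print(survives_digestion(amplicon, {3}))     # Hin6I site unmethylated
```

This also illustrates why a combination of enzymes is useful: the more MSRE sites an amplicon contains, the less likely an unmethylated template escapes cutting.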
  • the methylated DNA fraction is amplified within 16 multiplex PCRs and detected via microarray hybridization. Within these 16 multiplex PCR reactions, 360 different human DNA products can be amplified. Of these, about 20 amplicons serve as digestion and amplification controls and are derived either from known differentially methylated human DNA regions or from several regions without any sites for the MSREs used in this system.
  • the primer set used (every reverse primer is biotinylated) targets 347 different sites located in the 5′UTR of 323 gene regions.
  • PCR amplicons are pooled, and positives are detected using streptavidin-Cy3 via microarray hybridization.
  • because the melting temperature of CpG-rich DNA is very high, primer and probe design as well as hybridization conditions have been optimized; thus this assay enables unequivocal multiplexed methylation testing of human DNA samples.
  • the assay has been designed such that 24 samples can be run in parallel using 384-well PCR plates.
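The numbers in the assay design fit together exactly: 24 samples × 16 multiplex PCRs = 384 reactions, i.e. one full 384-well plate of 16 rows (A to P) by 24 columns. The Python sketch below shows one possible plate map, assuming one column per sample and one row per multiplex reaction; this layout and the names `well_for` and `plate_map` are illustrative assumptions, not a layout stated in the source.

```python
import string

# A 384-well plate has 16 rows (A..P) and 24 columns (1..24).
ROWS = string.ascii_uppercase[:16]
N_COLUMNS = 24
N_MULTIPLEX = 16  # multiplex PCRs per sample, as in the assay description


def well_for(sample_index, multiplex_index):
    """Hypothetical layout: column = sample, row = multiplex PCR (both 0-based)."""
    if not 0 <= sample_index < N_COLUMNS:
        raise ValueError("sample_index must be in 0..23")
    if not 0 <= multiplex_index < N_MULTIPLEX:
        raise ValueError("multiplex_index must be in 0..15")
    return f"{ROWS[multiplex_index]}{sample_index + 1}"


def plate_map():
    """Map every (sample, multiplex) pair to a distinct well."""
    return {(s, m): well_for(s, m)
            for s in range(N_COLUMNS) for m in range(N_MULTIPLEX)}


if __name__ == "__main__":
    layout = plate_map()
    print(len(layout), well_for(0, 0), well_for(23, 15))
```

Putting all multiplex reactions of one sample in a single column keeps each sample's reactions together for pooling before hybridization.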
  • the entire procedure enables the user to set up a specific PCR test and subsequent gel-based or hybridization-based testing of selected markers, using single primer pairs or primer subsets as provided herein or identified by the inventive method from the 360-marker set.
  • MSRE digestion of DNA (about 500 ng) was performed at 37° C overnight in a volume of 30 µl in 1× Tango restriction enzyme digestion buffer (MBI Fermentas) using 8 units of each of the MSREs AciI (New England Biolabs), Hin6I and HpaII (both from MBI Fermentas). Digestions were stopped by heat inactivation (10 min, 75° C) and subjected to PCR amplification.
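The digestion recipe above can be turned into a per-reaction pipetting scheme. The Python sketch below is a hypothetical calculator: the stock concentrations (enzymes at 10 U/µl, Tango buffer supplied as a 10× stock, DNA at 50 ng/µl) are assumptions for illustration and are not given in the source, and the 10% enzyme-volume limit is a common rule of thumb rather than part of the protocol.

```python
# Hypothetical reaction calculator for the MSRE digestion described above.
# Assumptions (not from the source): enzyme stocks at 10 U/µl, Tango buffer
# supplied as a 10x stock, and DNA at 50 ng/µl.
ENZYMES = ("AciI", "Hin6I", "HpaII")


def digestion_mix(dna_ng=500.0, dna_ng_per_ul=50.0, units_each=8.0,
                  enzyme_u_per_ul=10.0, buffer_stock_x=10.0, final_ul=30.0):
    """Return pipetting volumes (µl) for one digestion at 1x buffer."""
    volumes = {
        "buffer": final_ul / buffer_stock_x,
        "DNA": dna_ng / dna_ng_per_ul,
    }
    for name in ENZYMES:
        volumes[name] = units_each / enzyme_u_per_ul
    water = final_ul - sum(volumes.values())
    if water < 0:
        raise ValueError("components exceed the final volume; use a more concentrated DNA stock")
    volumes["water"] = water
    # Rule of thumb: keep total enzyme volume at or below 10% of the reaction
    # so that glycerol from the enzyme storage buffers stays low.
    enzyme_total = sum(volumes[name] for name in ENZYMES)
    if enzyme_total > 0.1 * final_ul:
        raise ValueError("enzyme volume exceeds 10% of the reaction")
    return {k: round(v, 2) for k, v in volumes.items()}


if __name__ == "__main__":
    for component, ul in digestion_mix().items():
        print(f"{component:>6}: {ul} µl")
```

With these assumed stocks, one reaction takes 3 µl buffer, 10 µl DNA, 0.8 µl of each enzyme and 14.6 µl water.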
  • Microarrays with the probes of the 360-marker set are blocked for 30 min in 3 M urea containing 0.1% SDS at room temperature, submerged in a stirred Coplin jar. After blocking, slides are washed in 0.1× SSC/0.2% SDS for 5 min, dipped into water and dried by centrifugation.
  • the PCR amplicon pool of each sample is mixed with an equal amount of 2× hybridization buffer (7× SSC, 0.6% SDS, 50% formamide), denatured for 5 min at 95° C and held at 70° C until an aliquot of 100 µl is loaded onto an array covered by a gasket slide (Agilent). Arrays are hybridized at maximum rotation speed in an Agilent hybridization oven for 16 h at 52° C.
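Mixing the amplicon pool with an equal volume of 2× hybridization buffer halves every buffer component, so the arrays are hybridized in 3.5× SSC, 0.3% SDS and 25% formamide. The small Python helper below makes this dilution arithmetic explicit for other mixing ratios; it assumes the amplicon pool contributes no SSC, SDS or formamide of its own, and the names `HYB_BUFFER_2X` and `final_hyb_conditions` are illustrative.

```python
# Composition of the 2x hybridization buffer given in the protocol.
HYB_BUFFER_2X = {"SSC (x)": 7.0, "SDS (%)": 0.6, "formamide (%)": 50.0}


def final_hyb_conditions(sample_ul, buffer_ul, buffer=HYB_BUFFER_2X):
    """Final component concentrations after mixing sample and buffer volumes.

    Assumes the sample (PCR amplicon pool) contains none of the buffer components.
    """
    if sample_ul < 0 or buffer_ul <= 0:
        raise ValueError("volumes must be non-negative and the buffer volume positive")
    dilution = buffer_ul / (sample_ul + buffer_ul)
    return {name: conc * dilution for name, conc in buffer.items()}


if __name__ == "__main__":
    # Equal volumes, e.g. 50 µl pool + 50 µl buffer for the 100 µl aliquot.
    print(final_hyb_conditions(50, 50))
```

Formamide lowers the effective melting temperature, which is one reason its final concentration matters for the 52° C hybridization of CpG-rich targets.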
  • After removal of the gasket slides, microarray slides are washed at room temperature in wash solution I (1× SSC, 0.2% SDS) for 5 min and in wash solution II (0.1× SSC, 0.2% SDS) for 5 min, followed by a final wash by dipping the slides 3 times into wash solution III (0.1× SSC); the slides are then dried by centrifugation.
  • streptavidin-Cy3 conjugate (Caltag Laboratories) is diluted 1:400 in PBST-MP (1× PBS, 0.1% Tween 20, 1% skimmed dry milk powder [Sucofin; Germany]), pipetted onto microarrays covered with a coverslip and incubated for 30 min at room temperature in the dark. The coverslips are then washed off the slides using PBST (1× PBS, 0.1% Tween 20), and the slides are washed in fresh PBST for 5 min, rinsed with water and dried by centrifugation.
  • The amount of DNA available is often limited. Although the inventive methylation test performs well with low amounts of DNA (see above), minimally invasive testing using cell-free DNA from serum, stool, urine and other body fluids is of particular diagnostic relevance.
  • Samples can be preamplified prior to methylation testing as follows: DNA was digested with the restriction enzyme FspI (and/or Csp6I, and/or MseI, and/or Tsp509I, or their isoschizomers), and after (heat) inactivation of the restriction enzyme the fragments were circularized using T4 DNA ligase. Ligation products were digested using a mixture of methylation-sensitive restriction enzymes. After enzyme inactivation, the entire mixture was amplified by rolling circle amplification (RCA) using phi29 phage polymerase. The RCA amplicons were then directly subjected to the multiplex PCRs of the inventive methylation test without any further need to digest the DNA prior to amplification.
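The first step of this preamplification protocol, fragmentation with methylation-independent enzymes before circularization, can be previewed in silico. The Python sketch below reports the top-strand fragments produced by any combination of FspI (TGC^GCA), MseI (T^TAA), Csp6I (G^TAC) and Tsp509I (^AATT); it ignores sticky-end overhangs, the bottom-strand cut and the subsequent ligation, MSRE and RCA steps, and `cut_positions` and `digest` are hypothetical names.

```python
import re

# Methylation-independent enzymes named in the preamplification protocol:
# recognition sequence and cut position on the top strand (offset from the
# start of the site). Overhangs and the bottom-strand cut are ignored here.
FRAGMENTING_ENZYMES = {
    "FspI": ("TGCGCA", 3),   # TGC^GCA (blunt)
    "MseI": ("TTAA", 1),     # T^TAA
    "Csp6I": ("GTAC", 1),    # G^TAC
    "Tsp509I": ("AATT", 0),  # ^AATT
}


def cut_positions(seq, enzyme_names):
    """Sorted top-strand cut positions for the chosen enzymes."""
    seq = seq.upper()
    cuts = set()
    for name in enzyme_names:
        site, offset = FRAGMENTING_ENZYMES[name]
        for match in re.finditer(f"(?={site})", seq):
            cuts.add(match.start() + offset)
    # Cuts at the very ends of the sequence do not create a new fragment.
    return sorted(c for c in cuts if 0 < c < len(seq))


def digest(seq, enzyme_names):
    """Split seq into the candidate fragments that ligation would circularize."""
    seq = seq.upper()
    bounds = [0] + cut_positions(seq, enzyme_names) + [len(seq)]
    return [seq[a:b] for a, b in zip(bounds, bounds[1:])]


if __name__ == "__main__":
    print(digest("AAATGCGCAGGGTTAACC", ["FspI", "MseI"]))
```

Fragment lengths matter in practice because very short fragments circularize poorly; a length filter could be added after `digest` if needed.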
  • the preamplified DNA, which is enriched for methylated DNA regions, can be directly subjected to fluorescent labelling, and the labeled products can be hybridized onto the microarrays using the same conditions as described above for hybridization of PCR products. In this case the streptavidin-Cy3 detection step is omitted, and slides should be scanned directly after the stringency washes and drying. Depending on the experimental design for microarray analyses, either single-labeled or dual-labeled hybridizations can be generated. In our experience, the single-label design was used successfully for class comparisons. Although the preamplification protocol enables analyses of minute amounts of DNA, it is also suited for performing genomic methylation screens.
  • Hybridizations performed on a chip with probes for the inventive 360 marker genes were scanned using a GenePix 4000A scanner (Molecular Devices, Ismaning, Germany) with a PMT setting of 700V/cm (equal for both wavelengths).
  • Raw image data were extracted using GenePix 6.0 software (Molecular Devices, Ismaning, Germany).
  • P-values (p) used for feature selection for classification and prediction were based on the univariate significance levels (alpha).
  • P-values (p) and mis-classification rate during cross validation (MCR) were given along the result data.
  • DNA methylation biomarkers suitable for distinguishing tumour and normal lung DNA, as well as DNA methylation profiles from blood DNA of healthy controls, were deduced. Diagnostic and prognostic marker subsets suitable for diagnostic testing and presymptomatic screening for early detection of lung cancer, in DNA derived from lung tissue but also in DNA extracted from patient materials other than lung tissue, like sputum, serum or plasma, were determined.
  • DNA Methylation testing results and data analyses of chip results as well as qPCR validation of a subset of markers derived from chip-based testing are provided.
  • DNA Samples analysed were from blood of 8 healthy individuals (PB), 19 tumours (AdenoCa, adenocarcinoma) and 19 normal lung tissue (N) of adenocarcinoma patients and 29 tumours (SqCCL, squamous cell carcinoma) and 29 normal lung tissue (N) of squamous cell carcinoma patients.
  • PB healthy individuals
  • AdenoCa adenocarcinoma
  • N normal lung tissue
  • SqCCL squamous cell carcinoma
  • the design of the test enables methylation testing on DNA directly derived from the biological source.
  • the test is also suitable for using a DNA preamplification upon MSRE digestion (as outlined above).
  • biomarker testing is feasible on small samples and limited amounts of DNA.
  • multiplexed PCR and methylation testing is easily performed on preamplified DNA obtained from these DNA samples. This strategy would improve also testing of serum, urine, stool, synovial fluid, sputum and other body fluids using the conceptual design of the methylation test.
  • preamplification also enables differential methylation hybridization of the preamplified DNA itself. This option is made possible by the design of the test and the probes.
  • probes of the methylation test, or the array, can be used for hybridization of labelled DNA after enrichment of either the methylated or the unmethylated DNA fraction of any DNA sample, i.e. for methylation testing that omits the multiplex PCR.
  • biomarkers described herein could be applied for methylation testing using alternative approaches, e.g. methylation sensitive PCR and strategies which are sodium-bisulfite DNA deamination based and not based on MSRE digestion of DNA. These sets of methylation markers are suitable markers for disease-monitoring, -progression, -prediction, therapy-decision and -response.
  • Sorted by t-value (sorted by gene pairs); Class 1: N; Class 2: T.
    No. | Parametric p-value | t-value | % CV support | Geom mean of intensities in class 1 | Geom mean of intensities in class 2 | Fold-change | Gene symbol
    1 | <1e-07 | -9.452 | 100 | 1411.8016 | 13554.578246 | 0.1041568 | WT1
    2 | <1e-07 | -7.222 | 100 | 85.5069224 | 1125.7940428 | 0.0759525 | DLX2
    3 | <1e-07 | -6.648 | 99 | 852.3850013 | 7392.282404 | 0.1153074 | SALL3
    4 | <1e-07 | -6.48 | 70 | 235.4745892 | 592.5077157 | 0.3974203 | TERT
    5 | 0.0017994 | 3.213 | 27 | 437.7037557 | 291.867223 | 1.4996674 | TNFRSF25
    6 | 5e-07 | 5.391 | 100 | 2993.3054637 | 1117.4218527 | 2.6787604 | ACTB
    7 | 4e-…
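In the table above the fold-change is the ratio of the geometric mean intensity in class 1 (N) to that in class 2 (T); for WT1, 1411.8016 / 13554.578246 ≈ 0.104, i.e. roughly tenfold higher signal in the tumour samples. A minimal Python sketch of that calculation follows, assuming raw (not log-transformed) per-sample intensities as input; `fold_change` is an illustrative name.

```python
from statistics import geometric_mean


def fold_change(class1_intensities, class2_intensities):
    """Ratio of geometric mean intensities, class 1 over class 2."""
    return geometric_mean(class1_intensities) / geometric_mean(class2_intensities)


# Geometric means taken from the table (class 1 = N, class 2 = T).
TABLE_GEOMEANS = {
    "WT1": (1411.8016, 13554.578246),
    "DLX2": (85.5069224, 1125.7940428),
    "ACTB": (2993.3054637, 1117.4218527),
}

if __name__ == "__main__":
    for gene, (gm_n, gm_t) in TABLE_GEOMEANS.items():
        # A single-value list has that value as its geometric mean.
        print(gene, fold_change([gm_n], [gm_t]))
```

The geometric mean is used because intensities are roughly log-normally distributed; it equals the back-transformed mean of the log intensities.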
  • Distinguishing PB, N and TU is of interest when minimally invasive testing for lung cancer has to be performed using serum or plasma from peripheral blood.
  • the markers distinguishing PB, N and TU will be best suited for this purpose.
  • Using “16 methylation markers” derived from the Recursive Feature Elimination method for class prediction with Diagonal Linear Discriminant Analysis enables 91% correct classification.
  • Distinguishing the grade of differentiation of the tumours could be also achieved by DNA methylation marker testing. Although the correct classification is only about 60% in this example, the lung tumour groups “AdenoCa” and “SqCCL” can be split and used separately for determining the grade of tumour-differentiation for better performance.
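Diagonal Linear Discriminant Analysis, named above, and the cross-validated misclassification rate (MCR) reported with the results can be sketched compactly. The Python code below is a minimal illustration, not the BRB-ArrayTools implementation referred to in the source: it assumes equal class priors, a pooled within-class variance per marker and leave-one-out cross-validation, and its function names are hypothetical. A faithful re-analysis would also repeat the feature selection (e.g. Recursive Feature Elimination) inside every cross-validation fold, which this sketch omits.

```python
from statistics import mean


def fit_dlda(samples, labels):
    """Fit DLDA: per-class means and a pooled within-class variance per marker."""
    classes = sorted(set(labels))
    n_markers = len(samples[0])
    means = {
        c: [mean(s[g] for s, lab in zip(samples, labels) if lab == c)
            for g in range(n_markers)]
        for c in classes
    }
    dof = len(samples) - len(classes)
    variances = []
    for g in range(n_markers):
        ss = sum((s[g] - means[lab][g]) ** 2 for s, lab in zip(samples, labels))
        # A tiny floor avoids division by zero for constant markers (simplification).
        variances.append(max(ss / dof, 1e-12))
    return means, variances


def predict_dlda(model, x):
    """Assign x to the class with the smallest variance-scaled squared distance."""
    means, variances = model

    def distance(c):
        return sum((xg - mg) ** 2 / vg for xg, mg, vg in zip(x, means[c], variances))

    return min(means, key=distance)


def loocv_misclassification_rate(samples, labels):
    """Leave-one-out cross-validated misclassification rate (MCR)."""
    errors = 0
    for i in range(len(samples)):
        model = fit_dlda(samples[:i] + samples[i + 1:], labels[:i] + labels[i + 1:])
        if predict_dlda(model, samples[i]) != labels[i]:
            errors += 1
    return errors / len(samples)
```

Ignoring correlations between markers (the "diagonal" part) keeps the model stable when there are far more markers than samples, as in the 360-marker screen.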
  • BinTreePred “Differentiation” AdenoCa, SqCCL, N PB
  • Binary Tree prediction (applicable for elucidation of markers for more than 2 classes) provides several sets of predictors which enable classification of PB, AdenoCa, SqCCL, N. These marker sets could be used alternatively for classification.
  • Quantitative PCR with primers for markers elucidated by microarray analysis were run on MSRE-digested DNAs from the same sample groups as analyzed on microarrays. Marker sets for SYBRGreen qPCR were from Example 10f and Example 10d.
  • Statistical testing of the transformed data was performed in the same manner as the microarray data using BRB-AT software.
  • Mean percent of correct classification (n = 48):
    Compound Covariate Predictor | Diagonal Linear Discriminant Analysis | 1-Nearest Neighbor | 3-Nearest Neighbors | Nearest Centroid | Support Vector Machines
    96 | 98 | 94 | 94 | 94 | 100
  • the prediction rule is defined by the weighted sum of the expression values (x_i) of the significant genes, using the weights (w_i).
  • the expression values are log ratios for dual-channel data and log intensities for single-channel data.
  • a sample is assigned to class N if the sum is greater than the threshold, that is, if Σ_i w_i x_i > threshold.
  • the threshold for the Compound Covariate predictor is -172.255.
  • the threshold for the Diagonal Linear Discriminant predictor is -15.376.
  • the threshold for the Support Vector Machine predictor is 0.838
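The linear prediction rule above can be sketched as follows. In a compound covariate predictor the weights w_i are usually the univariate two-sample t-statistics of the selected genes, and the threshold is placed midway between the mean scores of the two training classes. This Python sketch assumes that construction, a pooled-variance t-statistic and the class labels N and T; its function names are hypothetical. The thresholds quoted above come from the authors' fitted models and are not reproduced by this toy example.

```python
import math
from statistics import mean, variance


def pooled_t(a, b):
    """Two-sample t-statistic with pooled variance; positive if mean(a) > mean(b)."""
    na, nb = len(a), len(b)
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / math.sqrt(pooled * (1 / na + 1 / nb))


def score(weights, sample):
    """Compound covariate: weighted sum of the log intensities x_i."""
    return sum(w * x for w, x in zip(weights, sample))


def fit_compound_covariate(class_n, class_t):
    """Return (weights, threshold) from training samples of classes N and T.

    Each sample is a list of log intensities, one per selected gene.
    """
    n_genes = len(class_n[0])
    weights = [pooled_t([s[g] for s in class_n], [s[g] for s in class_t])
               for g in range(n_genes)]
    threshold = (mean(score(weights, s) for s in class_n)
                 + mean(score(weights, s) for s in class_t)) / 2
    return weights, threshold


def predict(weights, threshold, sample):
    """Assign class N if the weighted sum exceeds the threshold, otherwise T."""
    return "N" if score(weights, sample) > threshold else "T"
```

With class 1 = N in the t-statistic, a gene such as WT1 that is less methylated in N receives a negative weight, matching the sign of its t-value in the table.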
  • Bintree Prediction Histology (p < 0.05, unpaired samples), “Compound Covariate Classifier”:
    Node | Group 1 classes | Group 2 classes | Misclassification rate (%)
    1 | AdenoCa, SqCCL | N | 14.6
    2 | AdenoCa | SqCCL | 31.2

Abstract

The present invention discloses a method of diagnosing lung cancer by using methylation specific markers from a set, having diagnostic power for lung cancer diagnosis and distinguishing lung cancer types in diverse samples; as well as methods to identify sets of prognostic and diagnostic value.

Description

  • The present invention relates to cancer diagnostic methods and means therefor.
  • Neoplasms and cancer are abnormal growths of cells. Cancer cells rapidly reproduce despite restriction of space, nutrients shared by other cells, or signals sent from the body to stop reproduction. Cancer cells are often shaped differently from healthy cells, do not function properly, and can spread into many areas of the body. Abnormal growths of tissue, called tumors, are clusters of cells that are capable of growing and dividing uncontrollably. Tumors can be benign (noncancerous) or malignant (cancerous). Benign tumors tend to grow slowly and do not spread. Malignant tumors can grow rapidly, invade and destroy nearby normal tissues, and spread throughout the body. Malignant cancers can be both locally invasive and metastatic. Locally invasive cancers can invade the surrounding tissues by sending out “fingers” of cancerous cells into the normal tissue. Metastatic cancers can send cells into other tissues in the body, which may be distant from the original tumor. Cancers are classified according to the kind of fluid or tissue from which they originate, or according to the location in the body where they first developed. All of these parameters can influence the cancer characteristics, development and progression, and subsequently also cancer treatment. Therefore, reliable methods to classify a cancer state or cancer type that take diverse parameters into consideration are desired. Since cancer is predominantly a genetic disease, classifying cancers by genetic parameters is one extensively studied route.
  • Extensive efforts have been undertaken to discover genes relevant for the diagnosis, prognosis and management of (cancerous) disease. Mainly RNA expression studies have been used for screening to identify genetic biomarkers. In recent years it has been shown that changes in the DNA methylation pattern of genes can be used as biomarkers for cancer diagnostics. In line with the general strategy for identifying RNA-expression-based biomarkers, the most convenient and promising approach would start by identifying marker candidates through genome-wide screening of methylation changes.
  • The most versatile genome-wide approaches to date use microarray hybridization-based techniques. Although studies have been undertaken at the genomic level (and also at the single-gene level) to elucidate methylation changes in diseased versus normal tissue, a comprehensive test with a good success rate for identifying biomarkers is not yet available.
  • The development of biomarkers for disease (especially cancer) screening, diagnosis and treatment has been improved over the last decade by major advances in different technologies, which have made it easier to discover potential biomarkers through high-throughput screens. Among the so-called “OMICs” approaches, like genomics, proteomics, metabolomics and their derivatives, genomics is the best developed and most widely used for biomarker identification. Because of the dynamic nature of RNA expression, the ease of nucleic acid extraction and the detailed knowledge of the human genome, many studies have used RNA expression profiling to elucidate class differences for distinguishing the “good” from the “bad” situation, like diseased vs. healthy, or clinical differences between groups of diseased patients. Over the years, microarray-based expression profiling in particular has become a standard tool for research, and some approaches are currently under clinical validation for diagnostics. While the plasticity of RNA expression levels over a broad dynamic range is an advantage of using RNA and also a prerequisite for successful discrimination of classes, the low stability of RNA itself is often seen as a drawback. Because DNA is far more stable than RNA, DNA-based markers are more promising and are expected to give robust assays for diagnostics. Many clinical markers in oncology are more or less DNA based and well established, e.g. cytogenetic analyses for the diagnosis and classification of different tumor types. However, most of these markers are not accessible using cheap and efficient molecular-genetic PCR routine tests. This might be due to 1) the structural complexity of the changes, 2) the inter-individual differences of these changes at the DNA sequence level, and 3) the relatively low “quantitative” fold-changes of those “chromosomal” DNA changes.
In comparison, RNA-expression changes span several orders of magnitude, and these changes can easily be measured using genome-wide expression microarrays. These expression arrays cover the entire translated transcriptome with 20,000-45,000 probes. Elucidating DNA changes via microarray techniques generally requires more probes, depending on the requested resolution: one or more orders of magnitude more probes than for standard expression profiling are needed to cover the entire 3×10⁹ bp human genome. For the best resolution when screening for biomarkers at the structural genomic DNA level, genomic tiling arrays and SNP arrays are available today. Although the costs of these DNA-analysis techniques have decreased in recent years, biomarker screening requires testing many samples, so these tests remain cost-intensive.
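As a rough illustration of the probe-count gap, assume (hypothetically, this spacing is not given in the application) a tiling design with one probe every 100 bp and compare it with the 20,000-45,000 probes of an expression array:

```latex
\frac{3\times10^{9}\ \text{bp}}{100\ \text{bp per probe}} = 3\times10^{7}\ \text{probes},
\qquad
\frac{3\times10^{7}}{4.5\times10^{4}} \approx 670,
\qquad
\frac{3\times10^{7}}{2\times10^{4}} = 1500
```

Under that assumption, even a modest tiling resolution needs roughly three orders of magnitude more probes than an expression array.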
  • Another option for obtaining stable DNA-based biomarkers relies on elucidating the changes in the DNA methylation pattern of (malignant; neoplastic) disease. In the vertebrate genome, methylation predominantly affects the cytosine residues of CpG dinucleotides, many of which are clustered in CpG islands. CpG islands are often associated with gene-promoter sequences, are present in the 5′-untranslated gene regions, and are unmethylated by default. In a very simplified view, an unmethylated CpG island in the associated gene promoter enables active transcription, whereas methylation of the island blocks transcription. The DNA methylation pattern is tissue- and clone-specific and almost as stable as the DNA itself. DNA methylation is also known to be an early event in tumorigenesis, which makes it of interest for early and initial diagnosis of disease. In principle, screening for biomarkers suitable for answering clinical questions, including DNA-methylation-based approaches, would be most successful when starting with a genome-wide approach.
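The passage refers to CpG islands without defining them. A commonly used operational definition (Gardiner-Garden and Frommer, 1987; not part of this application) requires a window of at least 200 bp with a GC content above 50% and an observed/expected CpG ratio above 0.6. The Python sketch below computes these statistics for a single window; the function names and default thresholds are illustrative assumptions, and real island finders slide such a window along the sequence.

```python
def cpg_island_stats(seq: str) -> dict:
    """Length, GC fraction and observed/expected CpG ratio of one DNA window."""
    seq = seq.upper()
    n = len(seq)
    c, g = seq.count("C"), seq.count("G")
    cpg = seq.count("CG")  # "CG" cannot overlap itself, so str.count is exact
    gc_fraction = (c + g) / n if n else 0.0
    # Obs/Exp CpG = (#CpG * length) / (#C * #G)
    obs_exp = (cpg * n) / (c * g) if c and g else 0.0
    return {"length": n, "gc_fraction": gc_fraction, "obs_exp_cpg": obs_exp}


def looks_like_cpg_island(seq: str, min_len: int = 200,
                          min_gc: float = 0.5, min_obs_exp: float = 0.6) -> bool:
    """Apply the assumed Gardiner-Garden & Frommer thresholds to one window."""
    s = cpg_island_stats(seq)
    return (s["length"] >= min_len
            and s["gc_fraction"] > min_gc
            and s["obs_exp_cpg"] > min_obs_exp)


# A 200 bp CG repeat qualifies; an AT repeat of the same length does not.
print(looks_like_cpg_island("CG" * 100), looks_like_cpg_island("AT" * 100))  # True False
```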
  • Shames D et al. (PLOS Medicine 3(12) (2006): 2244-2262) identified multiple genes that are methylated with high penetrance in primary lung, breast, colon and prostate cancers.
  • Sato N et al. (Cancer Res 63(13) (2003): 3735-3742) identified potential targets of aberrant methylation in pancreatic cancer. These genes were tested by treatment with a demethylating agent (5-aza-2′-deoxycytidine) and/or the histone deacetylase inhibitor trichostatin A, after which transcription of certain genes increased.
  • Bibikova M et al. (Genome Res 16(3) (2006): 383-393) analysed lung cancer biopsy samples to identify methylated CpG sites that distinguish lung adenocarcinomas from normal lung tissue.
  • Yan P S et al. (Clin Cancer Res 6(4) (2000): 1432-1438) analysed CpG island hypermethylation in primary breast tumor.
  • Cheng Y et al. (Genome Res 16(2) (2006): 282-289) discussed DNA methylation in CpG islands associated with transcriptional silencing of tumor suppressor genes.
  • Ongenaert M et al. (Nucleic Acids Res 36 (2008) Database issue D842-D846) provided an overview of the methylation database “PubMeth”.
  • Microarrays for human genome-wide hybridization testing are known, e.g. the Affymetrix Human Genome U133A Array (NCBI GEO database, Acc. No. GPL96).
  • A substantial number of differentially methylated genes have been discovered over the years, more by chance than by rational design. Although some of these methylation changes have the potential to be useful markers for specifically defined diagnostic questions, they would lack the power to delineate various diagnostic constellations successfully. Thus, the rational approach would start with a genomic screen for distinguishing the “subtypes” and the diagnostically, prognostically and even therapeutically challenging constellations. These rational expectations are the basis for starting genomic (and other -omics) screens, but they do not guarantee a marker panel for all clinically relevant constellations that should be distinguished. Nor is it realistic to expect a universal approach (e.g. transcriptomics) that focuses on a single class of target molecules (e.g. RNA) to distinguish, for instance, all subtypes in all different malignancies. Rather, all omics approaches together would be necessary and could help to improve diagnostics and, finally, patient management.
  • Lung cancer is the third most common malignant neoplasm in the EU, following breast and colon cancers, and has the second-worst 5-year survival rate, after pancreatic cancer. Thus, although it accounts for 14% of all cancer diagnoses, lung cancer is responsible for 22% of cancer deaths, reflecting the poor prognosis of this tumour type and the comparative lack of progress in treatment. Therapy is hampered by the tendency of lung cancer to be diagnosed at a late stage, hence the need to develop markers for early detection. Approximately 80% of lung cancer cases are of the non-small cell type (NSCLC), with squamous cell carcinoma and adenocarcinoma being the most frequent subtypes. A goal of the present invention is to provide an alternative and more cost-efficient route to identify suitable markers for lung cancer diagnostics.
  • Therefore, in a first aspect, the present invention provides a set of nucleic acid primers or hybridization probes being specific for a potentially methylated region of marker genes being suitable to diagnose or predict lung cancer or a lung cancer type, preferably being selected from adenocarcinoma or squamous cell carcinoma, the marker genes comprising WT1, SALL3, TERT, ACTB, CPEB4. Preferably the set further comprises any one of the markers ABCB1, ACTB, AIM1L, APC, AREG, BMP2K, BOLL, C5AR1, C5orf4, CADM1, CDH13, CDX1, CLIC4, COL21A1, CPEB4, CXADR, DLX2, DNAJA4, DPH1, DRD2, EFS, ERBB2, ERCC1, ESR2, F2R, FAM43A, GABRA2, GAD1, GBP2, GDNF, GNA15, GNAS, HECW2, HIC1, HIST1H2AG, HLA-G, HOXA1, HOXA10, HSD17B4, HSPA2, IRAK2, ITGA4, JUB, KCNJ15, KCNQ1, KIF5B, KL, KRT14, KRT17, LAMC2, MAGEB2, MBD2, MSH4, MT1G, MT3, MTHFR, NEUROD1, NHLH2, NKX2-1, ONECUT2, PENK, PITX2, PLAGL1, PTTG1, PYCARD, RASSF1, S100A8, SALL3, SERPINB5, SERPINE1, SERPINI1, SFRP2, SLC25A31, SMAD3, SPARC, SPHK1, SRGN, TERT, THRB, TJP2, TMEFF2, TNFRSF10C, TNFRSF25, TP53, ZDHHC11, ZNF256, ZNF711, F2R, HOXA10, KL, SALL3, SPARC, TNFRSF25, WT1.
  • In a further aspect, the present invention provides a method of determining a subset of diagnostic markers for potentially methylated genes from the genes of gene marker IDs 1-359 of table 1, suitable for the diagnosis or prognosis of lung cancer or lung cancer type, comprising
      • a) obtaining data of the methylation status of at least 50 random genes selected from the 359 genes of gene marker IDs 1-359 in at least 1 sample, preferably 2, 3, 4 or at least 5 samples, of a confirmed lung cancer or lung cancer type state and at least one sample of a lung cancer or lung cancer type negative state,
      • b) correlating the results of the obtained methylation status with the lung cancer or lung cancer type,
      • c) optionally repeating the obtaining a) and correlating b) steps for a different combination of at least 50 random genes selected from the 359 genes of gene marker IDs 1-359 and
      • d) selecting those marker genes which in a classification analysis have a p-value of less than 0.1 in a random-variance t-test, or selecting as many marker genes as together give a correct lung cancer or lung cancer type prediction of at least 70% in a cross-validation test,
        wherein the selected markers form the subset of diagnostic markers.
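Steps a) to d) can be sketched in Python as follows, assuming a methylation matrix with one row per sample and one column per marker of table 1, and binary labels (1 = confirmed lung cancer or lung cancer type, 0 = negative). The function `select_markers` is hypothetical, and it substitutes SciPy's ordinary two-sample t-test (`scipy.stats.ttest_ind`) for the random-variance t-test named in step d), so its p-values will differ from those of the method described here. Step c) corresponds to calling it again with a different random seed.

```python
import numpy as np
from scipy import stats


def select_markers(X, y, gene_ids, n_random=50, alpha=0.1, seed=None):
    """Hypothetical sketch of steps a), b) and d).

    X: samples x genes methylation values; y: 1 = lung cancer (type), 0 = negative.
    Returns (gene_id, p_value) pairs with p < alpha, lowest p-value first.
    """
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    # a) a random combination of n_random genes from the master set
    cols = rng.choice(X.shape[1], size=n_random, replace=False)
    # b) correlate methylation with class (ordinary t-test, not random-variance)
    _, p = stats.ttest_ind(X[y == 1][:, cols], X[y == 0][:, cols], axis=0)
    # d) keep the genes below the significance threshold, best first
    order = np.argsort(p)
    return [(gene_ids[cols[k]], float(p[k])) for k in order if p[k] < alpha]


# c) repeat with another random combination by changing the seed, e.g.
# markers = select_markers(X, y, gene_ids, seed=1)
```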
  • The present invention provides a master set of 359 genetic markers which has been surprisingly found to be highly relevant for aberrant methylation in the diagnosis or prognosis of lung cancer. It is possible to determine a multitude of marker subsets from this master set which can be used to diagnose and differentiate between various lung cancer or tumor types, e.g. adenocarcinoma and squamous cell carcinoma.
  • The inventive 359 marker genes of table 1 (given in example 1 below) are: NHLH2, MTHFR, PRDM2, MLLT11, S100A9 (control), S100A9, S100A8 (control), S100A8, S100A2, LMNA, DUSP23, LAMC2, PTGS2, MARK1, DUSP10, PARP1, PSEN2, CLIC4, RUNX3, AIM1L, SFN, RPA2, TP73, TP73 (p73), POU3F1, MUTYH, UQCRH, FAF1, TACSTD2, TNFRSF25, DIRAS3, MSH4, GBP2, GBP2, LRRC8C, F3, NANOS1, MGMT, EBF3, DCLRE1C, KIF5B, ZNF22, PGBD3, SRGN, GATA3, PTEN, MMS19, SFRP5, PGR, ATM, DRD2, CADM1, TEAD1, OPCML, CALCA, CTSD, MYOD1, IGF2, BDNF, CDKN1C, WT1, HRAS, DDB1, GSTP1, CCND1, EPS8L2, PIWIL4, CHST11, UNG, CCDC62, CDK2AP1, CHFR, GRIN2B, CCND2, VDR, B4GALNT3, NTF3, CYP27B1, GPR92, ERCC5, GJB2, BRCA2, KL, CCNA1, SMAD9, C13orf15, DGKH, DNAJC15, RB1, RCBTB2, PARP2, APEX1, JUB, JUB (control NM 198086), EFS, BAZ1A, NKX2-1, ESR2, HSPA2, PSEN1, PGF, MLH3, TSHR, THBS1, MYO5C, SMAD6, SMAD3, NOX5, DNAJA4, CRABP1, BCL2A1 (ID NO: 111), BCL2A1 (ID NO: 112), BNC1, ARRDC4, SOCS1, ERCC4, NTHL1, PYCARD, AXIN1, CYLD, MT3, MT1A, MT1G, CDH1, CDH13, DPH1, HIC1, NEUROD2 (control), NEUROD2, ERBB2, KRT19, KRT14, KRT17, JUP, BRCA1, COL1A1, CACNA1G, PRKAR1A, SPHK1, SOX15, TP53 (TP53_CGI23_1 kb), TP53 (TP53_both_CGIs_1 kb), TP53 (TP53_CGI36_1 kb), TP53, NPTX1, SMAD2, DCC, MBD2, ONECUT2, BCL2, SERPINB5, SERPINB2 (control), SERPINB2, TYMS, LAMA1, SALL3, LDLR, STK11, PRDX2, RAD23A, GNA15, ZNF573, SPINT2, XRCC1, ERCC2, ERCC1, C5AR1 (NM_001736), C5AR1, POLD1, ZNF350, ZNF256, C3, XAB2, ZNF559, FHL2, IL1B, IL1B (control), PAX8, DDX18, GAD1, DLX2, ITGA4, NEUROD1, STAT1, TMEFF2, HECW2, BOLL, CASP8, SERPINE2, NCL, CYP1B1, TACSTD1, MSH2, MSH6, MXD1, JAG1, FOXA2, THBD, CTCFL, CTSZ, GATA5, CXADR, APP, TTC3, KCNJ15, RIPK4, TFF1, SEZ6L, TIMP3, BIK, VHL, IRAK2, PPARG, MBD4, RBP1, XPC, ATR, LXN, RARRES1, SERPINI1, CLDN1, FAM43A, IQCG, THRB, RARB, TGFBR2, MLH1, DLEC1, CTNNB1, ZNF502, SLC6A20, GPX1, RASSF1, FHIT, OGG1, PITX2, SLC25A31, FBXW7, SFRP2, CHRNA9, GABRA2, MSX1, IGFBP7, EREG, AREG, ANXA3, BMP2K, APC, HSD17B4 (ID No 249), HSD17B4 (ID No 250), LOX, TERT, NEUROG1, NR3C1, ADRB2, CDX1, SPARC, C5orf4, PTTG1, DUSP1, CPEB4, SCGB3A1, GDNF, ERCC8, F2R, F2RL1, VCAN, ZDHHC11, RHOBTB3, PLAGL1, SASH1, ULBP2, ESR1, RNASET2, DLL1, HIST1H2AG, HLA-G, MSH5, CDKN1A, TDRD6, COL21A1, DSP, SERPINE1 (ID No 283), SERPINE1 (ID No 284), FBXL13, NRCAM, TWIST1, HOXA1, HOXA10, SFRP4, IGFBP3, RPA3, ABCB1, TFPI2, COL1A2, ARPC1B, PILRB, GATA4, MAL2, DLC1, EPPK1, LZTS1, TNFRSF10B, TNFRSF10C, TNFRSF10D, TNFRSF10A, WRN, SFRP1, SNAI2, RDHE2, PENK, RDH10, TGFBR1, ZNF462, KLF4, CDKN2A, CDKN2B, AQP3, TPM2, TJP2 (ID NO 320), TJP2 (ID No 321), PSAT1, DAPK1, SYK, XPA, ARMCX2, RHOXF1, FHL1, MAGEB2, TIMP1, AR, ZNF711, CD24, ABL1, ACTB, APC, CDH1 (Ecad 1), CDH1 (Ecad2), FMR1, GNAS, H19, HIC1, IGF2, KCNQ1, GNAS, CDKN2A (P14), CDKN2B (P15), CDKN2A (P16_VL), PITXA, PITXB, PITXC, PITXD, RB1, SFRP2, SNRPN, XIST, IRF4, UNC13B, GSTP1. Table 1 lists some marker genes twice, e.g. for different loci and for control sequences. It should be understood that any methylation-specific region readily known to the person skilled in the art from prior publications or available databases (e.g. PubMeth at www.pubmeth.org) can be used according to the present invention. Of course, genes listed twice only need to be represented once in an inventive marker set (or in a set of probes or primers therefor), but preferably a second marker, such as a control region, is included (the IDs given in the list above relate to the gene ID (or gene locus ID) given in table 1 of the example section).
  • One advantage that makes DNA methylation an attractive target for biomarker development is the fact that cell-free methylated DNA can be detected in body fluids such as serum, sputum, and urine from patients with cancerous neoplastic conditions and disease. For the purpose of biomarker screening, clinical samples have to be available. To obtain a sufficient number of samples with clinical and “outcome” or survival data, the first step would be to use archived (tissue) samples. Preferably these materials should allow intact RNA and DNA to be obtained, but most archives of clinical samples store formalin-fixed paraffin-embedded (FFPE) tissue blocks. This has been clinicopathological routine for decades, but such fixed samples yield, if anything, only low-quality RNA. It has now been found that, according to the present invention, any such samples (as any sample comprising tumor DNA) can be used for the method of generating an inventive subset, including fixed samples. The samples can be of lung tissue or any body fluid, e.g. sputum, bronchial lavage, or serum derived from peripheral blood or blood cells. Blood or blood-derived samples preferably have a reduced (e.g. <95%) or no leukocyte content but comprise DNA of the cancerous cells or tumor. Preferably the inventive markers are of human genes. Preferably the samples are human samples.
  • The present invention provides a multiplexed methylation testing method which 1) outperforms genome-wide screening via RNA-expression profiling in “classification” success, 2) enables identification of biomarkers for a wide variety of diseases without the need to prescreen candidate markers on a genome-wide scale, 3) is suitable for minimally invasive testing, and 4) is easily scalable.
  • In contrast to the rational strategy for elucidation of biomarkers for differentiation of disease, the invention presents a targeted multiplexed DNA-methylation test which outperforms genome-scaled approaches (including RNA expression profiling) for disease diagnosis, classification, and prognosis.
  • The inventive set of 359 markers enables the selection of a subset of markers from this set that is highly characteristic of lung cancer and of a given lung cancer type. Further distinctions between cancer types, or neoplastic conditions in general, are e.g. benign (non-proliferative or of limited proliferation) versus malignant, and metastatic versus non-metastatic tumors or nodules. It is sometimes possible to differentiate the sample type from which the methylated DNA is isolated, e.g. urine, blood or tissue samples.
  • The present invention is suitable to differentiate diseases, in particular neoplastic conditions, or tumor types. Diseases and neoplastic conditions should be understood in general including benign and malignant conditions. According to the present invention benign nodules (being at least the potential onset of malignancy) are included in the definition of a disease. After the development of a malignancy the condition is a preferred disease to be diagnosed by the markers screened for or used according to the present invention. The present invention is suitable to distinguish benign and malignant tumors (both being considered a disease according to the present invention). In particular the invention can provide markers (and their diagnostic or prognostic use) distinguishing between a normal healthy state together with a benign state on one hand and malignant states on the other hand. A diagnosis of lung cancer may include identifying the difference to a normal healthy state, e.g. the absence of any neoplastic nodules or cancerous cells. The present invention can also be used for prognosis of lung cancer, in particular a prediction of the progression of lung cancer or lung cancer type. A particularly preferred use of the invention is to perform a diagnosis or prognosis of metastasizing lung cancer (distinguished from non-metastasizing conditions).
  • In the context of the present invention “prognosis”, “prediction” or “predicting” should not be understood in an absolute sense, as in a certainty that an individual will develop lung cancer or lung cancer type (including cancer progression), but as an increased risk to develop cancer or the lung cancer type or of cancer progression. “Prognosis” is also used in the context of predicting disease progression, in particular to predict therapeutic results of a certain therapy of the disease, in particular neoplastic conditions, or lung cancer types. The prognosis of a therapy can e.g. be used to predict a chance of success (i.e. curing a disease) or chance of reducing the severity of the disease to a certain level. As a general inventive concept, markers screened for this purpose are preferably derived from sample data of patients treated according to the therapy to be predicted. The inventive marker sets may also be used to monitor a patient for the emergence of therapeutic results or positive disease progressions.
  • Some of the inventive, rationally selected markers have been found to be methylated in some instances. DNA methylation analyses in principle rely either on bisulfite deamination-based methylation detection or on methylation-sensitive restriction enzymes. Preferably the restriction enzyme-based strategy is used for elucidating DNA-methylation changes. Further methods to determine methylated DNA are given e.g. in EP 1 369 493 A1 or U.S. Pat. No. 6,605,432. Combining restriction digestion and multiplex PCR amplification with a targeted microarray hybridization is a particularly advantageous strategy for performing the inventive methylation test using the inventive marker sets (or subsets). A microarray hybridization step can be used for reading out the PCR results. For the analysis of the hybridization data, statistical approaches for class comparison and class prediction can be used. Such statistical methods are known from the analysis of RNA-expression microarray data.
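To illustrate the restriction-enzyme principle, the sketch below assumes HpaII as the methylation-sensitive enzyme (an example choice, not one named in this passage): HpaII cuts at CCGG unless the internal cytosine is methylated. After digestion, an amplicon can only be amplified by PCR if every HpaII site inside it is methylated. The function names, and representing methylation as a set of site start positions, are hypothetical.

```python
import re

SITE = "CCGG"  # HpaII recognition site; assumed example enzyme


def hpaii_sites(amplicon: str) -> list:
    """0-based start positions of CCGG sites in the amplicon."""
    # CCGG cannot overlap itself, so non-overlapping matching finds every site
    return [m.start() for m in re.finditer(SITE, amplicon.upper())]


def amplifies_after_digest(amplicon: str, methylated_sites: set) -> bool:
    """True if every site is protected by methylation, so the template stays intact.

    An amplicon without any site always amplifies; such a fragment can serve
    as an undigested-input control rather than as a methylation readout."""
    return all(pos in methylated_sites for pos in hpaii_sites(amplicon))


amp = "AACCGGTTTTCCGGAA"
print(hpaii_sites(amp))                      # [2, 10]
print(amplifies_after_digest(amp, {2, 10}))  # True: both sites methylated
print(amplifies_after_digest(amp, {2}))      # False: site 10 is cut
```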
  • If only limited amounts of DNA are available for analysis, an amplification protocol can be used that enables selective amplification of the methylated DNA fraction prior to methylation testing. By subjecting these amplicons to the methylation test, it was possible to successfully distinguish DNA from sensitive cases from that of normal healthy controls. In addition, it was possible to distinguish lung cancer patients from healthy normal controls using serum DNA in the inventive methylation test upon preamplification. Both examples clearly illustrate that the inventive multiplexed methylation testing can be applied successfully when only limited amounts of DNA are available. Thus, this principle might be the preferred method for minimally invasive diagnostic testing.
  • In most situations several genes are necessary for classification. Although the 359-marker test is not a genome-wide test and might be used as it is for diagnostic testing, running only a subset of markers (the classifier that enables the best classification) would be easier for routine applications. The test is easily scalable: to test only the subset of markers comprising the classifier, the selected subset of primers/probes can be applied directly to set up a test with a lower degree of multiplexing (or a single PCR test). Serum DNA can be used to classify or distinguish healthy individuals from individuals with lung tumors. Only the specific primers comprising the gene classifier obtained from the methylation test may be set up together in multiplexed PCR reactions.
  • In summary the inventive methylation test is a suitable tool for differentiation and classification of neoplastic disease. This assay can be used for diagnostic purposes and for defining biomarkers for clinical relevant issues to improve diagnosis of disease, and to classify patients at risk for disease progression, thereby improving disease treatment and patient management.
  • The first step of the inventive method of generating a subset, step a) of obtaining data of the methylation status, preferably comprises determining the methylation status, preferably by methylation-specific PCR analysis or methylation-specific digestion analysis. Methylation-specific digestion analysis can include either or both of hybridization of suitable detection probes to non-digested fragments and PCR amplification and detection of non-digested fragments.
  • The inventive selection can be made by any (known) classification method to obtain a set of markers with the given diagnostic (or also prognostic) value to categorize a lung cancer or lung cancer type. Such methods include class comparisons wherein a specific p-value is selected, e.g. a p-value below 0.1, preferably below 0.08, more preferred below 0.06, in particular preferred below 0.05, below 0.04, below 0.02, most preferred below 0.01.
  • Preferably the correlated results for each gene b) are rated by their correct correlation to the lung cancer or lung cancer type positive state, preferably by p-value test, t-value test or F-test. The rated markers (best first, i.e. lowest p-value or highest absolute t-value) are then selected and added to the subset until a certain diagnostic value is reached, e.g. the herein mentioned at least 70% (or more) correct classification of lung cancer or lung cancer type.
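A minimal sketch of this rank-then-add procedure follows. It assumes samples x genes input, ranks genes by ordinary t-test p-value, and scores each growing subset with a nearest-centroid classifier under leave-one-out cross-validation; both the ranking test and the classifier are stand-ins chosen for brevity, not the specific classifier of this application, and `forward_select` and its 70% default target are illustrative. Because the ranking here uses all samples, the reported accuracy is optimistic; re-ranking inside each fold avoids that bias.

```python
import numpy as np
from scipy import stats


def loo_nearest_centroid_accuracy(X, y):
    """Leave-one-out accuracy of a Euclidean nearest-centroid classifier."""
    n = len(y)
    correct = 0
    for i in range(n):
        train = np.arange(n) != i
        c1 = X[train & (y == 1)].mean(axis=0)
        c0 = X[train & (y == 0)].mean(axis=0)
        pred = int(np.linalg.norm(X[i] - c1) < np.linalg.norm(X[i] - c0))
        correct += pred == y[i]
    return correct / n


def forward_select(X, y, target=0.70, max_genes=20):
    """Add genes in order of rating (lowest p-value first) until the target accuracy."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    _, p = stats.ttest_ind(X[y == 1], X[y == 0], axis=0)
    chosen, acc = [], 0.0
    for g in np.argsort(p)[:max_genes]:
        chosen.append(int(g))
        acc = loo_nearest_centroid_accuracy(X[:, chosen], y)
        if acc >= target:
            break
    return chosen, acc
```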
  • Class comparison procedures include identification of genes that are differentially methylated between the two classes using a random-variance t-test. The random-variance t-test is an improvement over the standard separate t-test, as it permits sharing information among genes about within-class variation without assuming that all genes have the same variance (Wright G. W. and Simon R, Bioinformatics 19:2448-2455, 2003). Genes are considered statistically significant if their p-value is less than a certain value, e.g. 0.1 or 0.01. A stringent significance threshold can be used to limit the number of false positive findings. A global test can also be performed to determine whether the methylation profiles differ between the classes, by permuting the labels that assign arrays to classes. For each permutation, the p-values are re-computed and the number of genes significant at, e.g., the 0.01 level is noted. The proportion of the permutations that give at least as many significant genes as the actual data is then the significance level of the global test. If there are more than 2 classes, the F-test should be used instead of the t-test.
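The permutation-based global test can be sketched as below. The sketch assumes samples x genes input and two classes, and it uses SciPy's ordinary t-test in place of the random-variance t-test, so counts may differ from the described procedure; `global_permutation_test` is a hypothetical helper.

```python
import numpy as np
from scipy import stats


def n_significant(X, y, alpha):
    """Number of genes whose two-sample t-test p-value is below alpha."""
    _, p = stats.ttest_ind(X[y == 1], X[y == 0], axis=0)
    return int(np.sum(p < alpha))


def global_permutation_test(X, y, alpha=0.01, n_perm=1000, seed=0):
    """Observed count of significant genes and its permutation significance level."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    rng = np.random.default_rng(seed)
    observed = n_significant(X, y, alpha)
    # shuffle class labels, re-compute p-values, count genes significant at alpha
    at_least_as_many = sum(
        n_significant(X, rng.permutation(y), alpha) >= observed
        for _ in range(n_perm)
    )
    return observed, at_least_as_many / n_perm
```

A small significance level means that few random relabelings produce as many significant genes as the real class assignment does.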
  • Class prediction includes the step of specifying a significance level to be used for determining the genes that will be included in the subset. Genes that are differentially methylated between the classes at a univariate parametric significance level less than the specified threshold are included in the set. The specified significance level need not be small enough to exclude false discoveries; in some problems, better prediction can be achieved by being more liberal about the gene sets used as features. The sets may, however, be more biologically interpretable and clinically applicable if fewer genes are included. As part of cross-validation, gene selection is repeated for each training set created in the cross-validation process, which provides an unbiased estimate of the prediction error. The final model and gene set for use with future data are those resulting from applying the gene selection and classifier fitting to the full dataset.
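Repeating gene selection inside every training set can be sketched as follows. The sketch assumes two classes, equal class priors, ordinary t-test p-values for selection and a diagonal linear discriminant (class means with one pooled variance per gene) as the classifier; all function names are hypothetical. Falling back to the single best gene when nothing passes the threshold is an added assumption so that every fold yields a model.

```python
import numpy as np
from scipy import stats


def dlda_fit(X, y):
    """Diagonal LDA: per-class means and one pooled variance per gene."""
    m1, m0 = X[y == 1].mean(axis=0), X[y == 0].mean(axis=0)
    resid = np.where((y == 1)[:, None], X - m1, X - m0)
    var = (resid ** 2).sum(axis=0) / (len(y) - 2)
    return m1, m0, var


def dlda_predict(model, x):
    """Assign x to the class with the smaller variance-scaled distance (equal priors)."""
    m1, m0, var = model
    d1 = np.sum((x - m1) ** 2 / var)
    d0 = np.sum((x - m0) ** 2 / var)
    return int(d1 < d0)


def loocv_error(X, y, alpha=0.01):
    """Leave-one-out error with gene selection repeated in every training set."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    n = len(y)
    errors = 0
    for i in range(n):
        train = np.arange(n) != i
        Xt, yt = X[train], y[train]
        # selection sees only the training samples, never the left-out one
        _, p = stats.ttest_ind(Xt[yt == 1], Xt[yt == 0], axis=0)
        genes = np.flatnonzero(p < alpha)
        if genes.size == 0:
            genes = np.array([np.argmin(p)])  # assumption: keep at least one gene
        model = dlda_fit(Xt[:, genes], yt)
        errors += dlda_predict(model, X[i, genes]) != y[i]
    return errors / n
```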
  • Models that use the gene methylation profile to predict the class of future samples can also be used. These models may be based on the Compound Covariate Predictor (Radmacher et al. Journal of Computational Biology 9:505-511, 2002), Diagonal Linear Discriminant Analysis (Dudoit et al. Journal of the American Statistical Association 97:77-87, 2002), Nearest Neighbor Classification (also Dudoit et al.), and Support Vector Machines with linear kernel (Ramaswamy et al. PNAS USA 98:15149-54, 2001). The models incorporate genes that are differentially methylated among the classes at a given significance level (e.g. 0.01, 0.05 or 0.1) as assessed by the random-variance t-test (Wright G. W. and Simon R. Bioinformatics 19:2448-2455, 2003). The prediction error of each model is preferably estimated using cross-validation, preferably leave-one-out cross-validation (Simon et al. Journal of the National Cancer Institute 95:14-18, 2003). For each leave-one-out cross-validation training set, the entire model-building process is repeated, including the gene selection process. It may also be evaluated whether the cross-validated error rate estimate for a model is significantly less than one would expect from random prediction. To do so, the class labels are randomly permuted and the entire leave-one-out cross-validation process is repeated. The significance level is the proportion of the random permutations that give a cross-validated error rate no greater than the cross-validated error rate obtained with the real methylation data. Usually about 1000 random permutations are used.
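A sketch of the compound covariate predictor with leave-one-out cross-validation and the label-permutation significance test described above. Assumptions: two classes, ordinary t-statistics as the compound-covariate weights (computed on each training set only), classification by the nearer class mean of the compound score (equivalent to a midpoint threshold), and hypothetical function names.

```python
import numpy as np
from scipy import stats


def ccp_loocv_error(X, y, alpha=0.01):
    """LOOCV error of a compound covariate predictor with in-fold gene selection."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    n = len(y)
    errors = 0
    for i in range(n):
        train = np.arange(n) != i
        Xt, yt = X[train], y[train]
        t, p = stats.ttest_ind(Xt[yt == 1], Xt[yt == 0], axis=0)
        genes = np.flatnonzero(p < alpha)
        if genes.size == 0:
            genes = np.array([np.argmin(p)])  # assumption: keep at least one gene
        w = t[genes]                           # t-statistics act as weights
        scores = Xt[:, genes] @ w              # compound covariate per training sample
        c1, c0 = scores[yt == 1].mean(), scores[yt == 0].mean()
        s = X[i, genes] @ w
        pred = int(abs(s - c1) < abs(s - c0))  # same as thresholding at the midpoint
        errors += pred != y[i]
    return errors / n


def permutation_significance(X, y, n_perm=1000, seed=0, alpha=0.01):
    """Share of label permutations whose LOOCV error is <= the real one."""
    y = np.asarray(y)
    rng = np.random.default_rng(seed)
    observed = ccp_loocv_error(X, y, alpha)
    as_good = sum(ccp_loocv_error(X, rng.permutation(y), alpha) <= observed
                  for _ in range(n_perm))
    return observed, as_good / n_perm
```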
  • Another classification method is the greedy-pairs method described by Bo and Jonassen (Genome Biology 3(4):research0017.1-0017.11, 2002). The greedy-pairs approach starts by ranking all genes based on their individual t-scores on the training set. The procedure selects the best-ranked gene gi and finds the one other gene gj that, together with gi, provides the best discrimination, using as a measure the distance between the centroids of the two classes with regard to the two genes when projected onto the diagonal linear discriminant axis. These two selected genes are then removed from the gene set, and the procedure is repeated on the remaining set until the specified number of genes has been selected. This method attempts to select pairs of genes that work well together to discriminate the classes.
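The greedy-pairs loop can be sketched as below, assuming two classes and samples x genes input. One assumption needs flagging: the raw distance between projected centroids depends on the scale of the discriminant weights, so this sketch divides it by the pooled within-class standard deviation of the projected values to make the pair score scale-free. Bo and Jonassen's exact scoring may differ, and the function names are hypothetical.

```python
import numpy as np
from scipy import stats


def pair_score(X, y, i, j):
    """Class separation after projecting genes i and j onto the DLDA axis.

    Assumption: centroid distance divided by the pooled within-class SD of the
    projections (simple average of the two class variances)."""
    sub = X[:, [i, j]]
    m1, m0 = sub[y == 1].mean(axis=0), sub[y == 0].mean(axis=0)
    v = (sub[y == 1].var(axis=0, ddof=1) + sub[y == 0].var(axis=0, ddof=1)) / 2
    w = (m1 - m0) / v                      # diagonal linear discriminant weights
    z = sub @ w                            # projection onto the discriminant axis
    sd = np.sqrt((z[y == 1].var(ddof=1) + z[y == 0].var(ddof=1)) / 2)
    return abs(z[y == 1].mean() - z[y == 0].mean()) / sd


def greedy_pairs(X, y, n_genes):
    """Select genes pairwise: best-ranked gene plus its best partner, repeatedly."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    t, _ = stats.ttest_ind(X[y == 1], X[y == 0], axis=0)
    remaining = list(np.argsort(-np.abs(t)))  # rank by individual |t| on training data
    selected = []
    while len(selected) < n_genes and len(remaining) >= 2:
        gi = remaining[0]
        gj = max(remaining[1:], key=lambda g: pair_score(X, y, gi, g))
        selected += [int(gi), int(gj)]
        remaining = [g for g in remaining if g not in (gi, gj)]
    return selected[:n_genes]
```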
  • Furthermore, a binary tree classifier using the gene methylation profile can be used to predict the class of future samples. The first node of the tree incorporates a binary classifier that distinguishes two subsets of the total set of classes. The individual binary classifiers are based on Support Vector Machines incorporating genes that are differentially methylated among the classes at a given significance level (e.g. 0.01, 0.05 or 0.1) as assessed by the random-variance t-test (Wright G. W. and Simon R. Bioinformatics 19:2448-2455, 2003). Classifiers for all possible binary partitions are evaluated, and the partition selected is the one for which the cross-validated prediction error is minimal. The process is then repeated successively for the two subsets of classes determined by the previous binary split. The prediction error of the binary tree classifier can be estimated by cross-validating the entire tree-building process. This overall cross-validation includes re-selection of the optimal partitions at each node and re-selection of the genes used for each cross-validated training set, as described by Simon et al. (Simon et al. Journal of the National Cancer Institute 95:14-18, 2003). 10-fold cross-validation can be used: one-tenth of the samples is withheld, a binary tree is developed on the remaining 9/10 of the samples, and class membership is then predicted for the 10% of the samples withheld. This is repeated 10 times, each time withholding a different 10% of the samples; the samples are randomly partitioned into 10 test sets (Simon R and Lam A. BRB-ArrayTools User Guide, version 3.2. Biometric Research Branch, National Cancer Institute).
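Two bookkeeping pieces of the binary tree procedure can be sketched in plain Python: enumerating every candidate split of the classes at a node (for K classes there are 2^(K-1) - 1 of them) and randomly partitioning the samples into 10 test sets. The class labels and function names below are hypothetical; training the per-node Support Vector Machines is not shown.

```python
import itertools
import random


def binary_partitions(classes):
    """All splits of the classes into two non-empty groups, each split listed once."""
    classes = list(classes)
    first, rest = classes[0], classes[1:]
    splits = []
    # keeping the first class on the left avoids listing {A}|{B} and {B}|{A} twice;
    # r stops below len(rest) so the right-hand group is never empty
    for r in range(len(rest)):
        for combo in itertools.combinations(rest, r):
            right = [c for c in rest if c not in combo]
            splits.append(([first, *combo], right))
    return splits


def k_fold_test_sets(n_samples, k=10, seed=0):
    """Randomly partition sample indices into k test sets of near-equal size."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    return [idx[f::k] for f in range(k)]


# Three hypothetical classes give 2**(3-1) - 1 = 3 candidate root splits.
for left, right in binary_partitions(["adenocarcinoma", "squamous", "normal"]):
    print(left, "|", right)
```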
  • Preferably the correlated results for each gene b) are rated by their correct correlation to the lung cancer or lung cancer type positive state, preferably by p-value test. It is also possible to include a step in which the genes are selected in step d) in order of their rating.
  • Independent of the method that is finally used to produce a subset with a certain diagnostic or predictive value, the subset selection preferably results in a subset with at least 60%, preferably at least 65%, at least 70%, at least 75%, at least 80% or even at least 85%, at least 90%, at least 92%, at least 95%, in particular preferred 100% correct classification of test samples of lung cancer or lung cancer type. Such levels can be reached, if necessary, by repeating steps a) and b) of the inventive method (step c)).
  • To avoid increasing the number of members of the subset, only marker genes with a significance value of at most 0.1, preferably at most 0.08, even more preferred at most 0.06, at most 0.05, at most 0.04, at most 0.02, or more preferred at most 0.01 are selected.
  • In particular preferred embodiments the at least 50 genes of step a) are at least 70, preferably at least 90, at least 100, at least 120, at least 140, at least 160, at least 180, at least 190, at least 200, at least 220, at least 240, at least 260, at least 280, at least 300, at least 320, at least 340, at least 350 or all, genes.
  • Since the subset should be small it is preferred that not more than 60, or not more than 40, preferably not more than 30, in particular preferred not more than 20, marker genes are selected in step d) for the subset.
  • In a further aspect the present invention provides a method of identifying lung cancer or lung cancer type in a sample comprising DNA from a patient, comprising providing a diagnostic subset of markers identified according to the method depicted above, determining the methylation status of the genes of the subset in the sample and comparing the methylation status with the status of a confirmed lung cancer or lung cancer type positive and/or negative state, thereby identifying lung cancer or lung cancer type in the sample.
  • The methylation status can be determined by any method known in the art, including methylation-dependent bisulfite deamination (and consequently the identification of mC (methylated C) changes by any known method, including PCR and hybridization techniques). Preferably, the methylation status is determined by methylation-specific PCR analysis, methylation-specific digestion analysis and either or both of hybridisation analysis to non-digested or digested fragments or PCR amplification analysis of non-digested fragments. The methylation status can also be determined by any probes suitable for determining the methylation status, including DNA, RNA, PNA and LNA probes, which optionally may further include methylation-specific moieties.
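Bisulfite-based detection rests on a simple sequence transformation: unmethylated cytosine is deaminated to uracil and read as thymine after PCR, while 5-methylcytosine is protected and still reads as cytosine. The Python sketch below models this on a single strand; the function names and the representation of methylated positions as a set of 0-based indices are illustrative assumptions, and the sketch ignores incomplete conversion and strand-specific primer design.

```python
def bisulfite_convert(seq: str, methylated: set) -> str:
    """In-silico bisulfite treatment of one strand.

    Unmethylated C becomes T (via U after PCR); C at the 0-based positions
    in `methylated` is protected and stays C."""
    out = []
    for i, base in enumerate(seq.upper()):
        if base == "C" and i not in methylated:
            out.append("T")
        else:
            out.append(base)
    return "".join(out)


def call_cpg_methylation(reference: str, converted: str) -> dict:
    """Per-CpG call: True if the C of the CpG survived conversion (methylated)."""
    reference = reference.upper()
    return {i: converted[i] == "C"
            for i in range(len(reference) - 1) if reference[i:i + 2] == "CG"}


ref = "ACGTCG"
bs = bisulfite_convert(ref, methylated={1})
print(bs)                             # ACGTTG
print(call_cpg_methylation(ref, bs))  # {1: True, 4: False}
```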
  • As further explained below, the methylation status can in particular be determined using hybridisation probes or amplification primers (preferably PCR primers) specific for methylated regions of the inventive marker genes. Discrimination between methylated and non-methylated genes, including the determination of the methylation amount or ratio, can be performed using either of these tools.
  • Determination using only specific primers aims at specifically amplifying methylated (or, alternatively, non-methylated) DNA. This can be facilitated by (methylation-dependent) bisulfite deamination, by methylation-specific enzymes, or by methylation-specific nucleases that digest methylated (or alternatively non-methylated) regions, so that only the non-methylated (or alternatively methylated) DNA is obtained. Using a genome chip (or simply a gene chip including hybridization probes for all genes of interest, such as all 359 marker genes), all amplification or non-digested products are detected. That is, discrimination between methylated and non-methylated states as well as gene selection (the inventive set or subset) takes place before the detection step on the chip.
  • Alternatively, it is possible to use universal primers to amplify a multitude of potentially methylated genetic regions (including the genetic markers of the invention), which are, as described, either amplified methylation-specifically or digested, and then to use a set of hybridisation probes for the characteristic markers, e.g. on a chip, for detection. That is, gene selection is performed on the chip.
  • Either set, a set of probes or a set of primers, can be used to obtain the relevant methylation data of the genes of the present invention. Of course, both sets can be used.
  • The method according to the present invention may be performed by any method suitable for the detection of methylation of the marker genes. In order to provide a robust and optionally re-useable test format, the determination of the gene methylation is preferably performed with a DNA-chip, real-time PCR, or a combination thereof. The DNA chip can be a commercially available general gene chip (also comprising a number of spots for the detection of genes not related to the present method) or a chip specifically designed for the method according to the present invention (which predominantly comprises marker gene detection spots).
  • Preferably the methylated DNA of the sample is detected by a multiplexed hybridization reaction. In further embodiments, the methylated DNA is preamplified prior to hybridization, preferably also prior to methylation-specific amplification or digestion. Preferably, the amplification reaction is also multiplexed (e.g. multiplex PCR).
  • The inventive methods (for the screening of subsets or for the diagnosis or prognosis of lung cancer or a lung cancer type) are particularly suitable to detect low amounts and low concentrations of methylated DNA of the inventive marker genes. Preferably the DNA amount in the sample is below 500 ng, below 400 ng, below 300 ng, below 200 ng, below 100 ng, below 50 ng or even below 25 ng. Likewise, the DNA concentration is preferably below 500 ng, below 400 ng, below 300 ng, below 200 ng, below 100 ng, below 50 ng or even below 25 ng per ml sample.
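  • The thresholds above are stated once as a total amount and once per ml, and the two readings can place the same sample in different tiers. A minimal sketch, assuming a hypothetical 4 ml serum sample containing 8 ng DNA per ml (the volume and concentration are illustrative only):

```python
from typing import Optional

# Threshold tiers listed in the text: ng (total) or ng per ml (concentration).
THRESHOLDS = (500, 400, 300, 200, 100, 50, 25)


def total_dna_ng(conc_ng_per_ml: float, volume_ml: float) -> float:
    """Total DNA mass in the sample, in ng."""
    return conc_ng_per_ml * volume_ml


def lowest_tier_below(value: float) -> Optional[int]:
    """Smallest listed threshold that the value falls below, or None if none."""
    below = [t for t in THRESHOLDS if value < t]
    return min(below) if below else None


conc, volume = 8.0, 4.0             # hypothetical sample
total = total_dna_ng(conc, volume)  # 32.0 ng in total
print(lowest_tier_below(conc))      # -> 25 (read as ng per ml)
print(lowest_tier_below(total))     # -> 50 (read as total ng)
```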
  • In another aspect the present invention provides a subset comprising or consisting of nucleic acid primers or hybridization probes being specific for a potentially methylated region of at least marker genes selected from a set of nucleic acid primers or hybridization probes being specific for a potentially methylated region of marker genes being suitable to diagnose or predict lung cancer or a lung cancer type, preferably being selected from adenocarcinoma or squamous cell carcinoma, the marker genes comprising WT1, SALL3, TERT, ACTB, CPEB4 or any other subset selected from one of the following groups
      • a) WT1, DLX2, SALL3, TERT, PITX2, HOXA10, F2R, CPEB4, NHLH2, SMAD3, ACTB, HOXA1, BOLL, APC, MT1G, PENK, SPARC, DNAJA4, RASSF1, HLA-G, ERCC1, ONECUT2, APC, ABCB1, ZNF573, KCNJ15, ZDHHC11, SFRP2, GDNF, PTTG1, SERPINI1, TNFRSF10C
      • b) WT1, PITX2, SALL3, F2R, DLX2, TERT, HOXA10, MSH4, NHLH2, GNA15, PENK, RASSF1, BOLL, HOXA1, ONECUT2, ABCB1, SPARC, MT1G, HSPA2, SFRP2, PYCARD, GAD1, C5orf4, C5AR1, GDNF, ZDHHC11, SERPINE1, NKX2-1, PITX2, C5AR1, ZNF256, FAM43A, SFRP2, MT3, SERPINE1, CLIC4, TNFRSF10C, GABRA2, MTHFR, ESR2, NEUROG1, PITX2, PLAGL1, TMEFF2, PTTG1, CADM1, S100A8, EFS, JUB, ITGA4, MAGEB2, ERBB2, SRGN, GNAS, TJP2, KCNJ15, SLC25A31, ZNF573, TNFRSF25, APC, KCNQ1, LAMC2, SPHK1, DNAJA4, APC, MBD2, ERCC1, HLA-G, CXADR, TP53, ACTB, KL, SMAD3, HIST1H2AG, CPEB4
      • c) WT1, DLX2, SALL3, TERT, TNFRSF25, ACTB, SMAD3, CPEB4
      • d) WT1, DLX2, SALL3, TERT, PITX2, TNFRSF25, KL, ACTB, SMAD3, CPEB4
      • e) WT1, PITX2, SALL3, DLX2, TERT, HOXA10, RASSF1, SPARC, IRAK2, ZNF711, DNAJA4, HLA-G, CXADR, TP53, ACTB, CPEB4
      • f) WT1, PITX2, SALL3, F2R, TERT, HOXA10, RASSF1, SPARC, IRAK2, ZNF711, DRD2, DNAJA4, CXADR, TP53, ACTB, CPEB4
      • g) WT1, ACTB, DLX2, PITX2, SALL3, HOXA10, TERT, CPEB4, HLA-G, SPARC, RASSF1, DNAJA4, CXADR, TP53, IRAK2, ZNF711
      • h) F2R, ZNF256, CDH13, SERPINB5, KRT14, DLX2, AREG, THRB, HSD17B4, SPARC, HECW2, COL21A1
      • i) KL, HIST1H2AG, TJP2, SRGN, CDX1, TNFRSF25, APC, HIC1, APC, GNA15, ACTB, WT1, KRT17, AIM1L, DPH1, PITX2, PITX2, KIF5B, BMP2K, GBP2, NHLH2, GDNF, BOLL
      • j) WT1, DLX2, SALL3, TERT, PITX2, HOXA10, F2R, CPEB4, NHLH2, SMAD3, ACTB, HOXA1, BOLL, APC, MT1G, PENK, SPARC, DNAJA4, RASSF1, HLA-G, ERCC1, ONECUT2, APC, ABCB1, ZNF573, KCNJ15, ZDHHC11, SFRP2, GDNF, PTTG1, SERPINI1, TNFRSF10C
      • k) HOXA10, NEUROD1
      • l) WT1, PITX2, SALL3, F2R, TERT, HOXA10, RASSF1, SPARC, IRAK2, ZNF711, DRD2, DNAJA4, CXADR, TP53, ACTB, CPEB4, DLX2, TNFRSF25, KL, SMAD3
      • m) TNFRSF25, SALL3, RASSF1, TERT, SPARC, F2R, HOXA10, ZNF711, PITX2
      • n) SALL3, PITX2, SPARC, F2R, TERT, RASSF1, HOXA10, CXADR, KL
      • o) SALL3, SPARC, PITX2, F2R, TERT, RASSF1, HOXA10, KL
      • p) SALL3, PITX2, SPARC, F2R, HOXA10, DRD2, ACTB, DNAJA4, CXADR, KL
      • q) SALL3, SPARC, PITX2, F2R, TERT, RASSF1, HOXA10, TNFRSF25, DNAJA4, TP53, CXADR, KL
      • r) SPARC, SALL3, F2R, PITX2, RASSF1, HOXA10, TERT, KL, TNFRSF25
      • s) SALL3, SPARC, PITX2, F2R, TERT, RASSF1, HOXA10, KL, TNFRSF25, CXADR
      • t) HOXA10, RASSF1, F2R
      • or
  • a set of at least 50%, preferably at least 60%, at least 70%, at least 80%, at least 90% or 100% of the markers of any one of the above a) to t). The present inventive set also includes sets with at least 50% of the above markers of each subset. This is because parts of these subsets can be substituted when they are specific for one side of a binary condition or differentiation (e.g. good or bad prognosis, or a distinction between lung cancer or lung cancer types), where one part of the subset points in one direction for a certain lung cancer type or cancer differentiation. The 50% part of the set can be further complemented by additional markers specific for diagnosing lung cancer, for determining the other side of the good/bad differentiation, or for the differentiation between two lung cancer types. Methods to determine such complementing markers follow the general methods outlined herein.
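  • The "at least 50% (60%, 70%, ...) of the markers" criterion is a simple coverage fraction. The following sketch compares markers by gene symbol only, which is an assumption. Some subsets repeat a name (e.g. APC or PITX2), and such repeats may denote different amplicons of the same gene, as with the several TP53 rows of table 1. This gene-level sketch deliberately collapses such repeats. The function names are hypothetical.

```python
# Subsets t) and o) as listed above.
SUBSET_T = ["HOXA10", "RASSF1", "F2R"]
SUBSET_O = ["SALL3", "SPARC", "PITX2", "F2R", "TERT", "RASSF1", "HOXA10", "KL"]


def coverage(candidate, subset) -> float:
    """Fraction of the subset's distinct genes contained in the candidate panel."""
    genes = set(subset)  # gene level: repeated symbols count once (assumption)
    return len(genes & set(candidate)) / len(genes)


def meets_threshold(candidate, subset, min_fraction=0.5) -> bool:
    """True if the candidate panel covers at least min_fraction of the subset."""
    return coverage(candidate, subset) >= min_fraction


panel = ["SALL3", "SPARC", "PITX2", "F2R"]
print(coverage(panel, SUBSET_O))                          # -> 0.5
print(meets_threshold(panel, SUBSET_O, 0.5))              # -> True
print(meets_threshold(panel, SUBSET_O, 0.6))              # -> False
print(meets_threshold(["HOXA10", "F2R"], SUBSET_T, 0.6))  # -> True (2 of 3)
```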
  • Each of these marker subsets is particularly suitable to diagnose lung cancer or lung cancer type or distinguish between certain cancers, samples or cancer types in a methylation specific assay of these genes.
  • The inventive primers or probes may be of any nucleic acid, including RNA, DNA, PNA (peptide nucleic acids), LNA (locked nucleic acids). The probes might further comprise methylation specific moieties.
  • The present invention provides a (master) set of 360 markers (the 359 marker genes and one control of table 1). It also provides the specific gene locations, defined by the PCR products of these genes, in which significant methylation can be detected, and subsets therefrom with a certain diagnostic value to detect or diagnose lung cancer or to distinguish lung cancer type(s). Preferably the set is optimized for a lung cancer or a lung cancer type. Lung cancer types include, without being limited thereto, adenocarcinoma and squamous cell carcinoma. Further indicators differentiating between diseases (including the diagnosis of any type of lung cancer or lung tumor) or between tumor types are, e.g., benign (non-proliferative or of limited proliferation) versus malignant, and metastatic versus non-metastatic. The set can also be optimized for the specific sample type in which the methylated DNA is tested. Such samples include blood, urine, saliva, hair, skin, tissues, in particular tissues of the cancer origin mentioned above, in particular lung tissue such as potentially affected or potentially cancerous lung tissue, or serum, sputum and bronchial lavage. The sample may be obtained from a patient to be diagnosed. In preferred embodiments, the test sample used in the method of identifying a subset is of the same type as the sample to be used in the diagnosis.
  • In practice, probes specific for potentially aberrant methylated regions are provided, which can then be used for the diagnostic method.
  • It is also possible to provide primers suitable for a specific amplification, like PCR, of these regions in order to perform a diagnostic test on the methylation state.
  • Such probes or primers are provided in the context of a set corresponding to the inventive marker genes or marker gene loci as given in table 1.
  • Such a set of primers or probes may have all 359 inventive markers present and can then be used for a multitude of different cancer detection methods. Of course, not all markers would have to be used to diagnose a lung cancer or lung cancer type. It is also possible to use certain subsets (or combinations thereof) with a limited number of marker probes or primers for diagnosis of certain categories of lung cancer.
  • Therefore, the present invention provides sets of primers or probes comprising primers or probes for any single marker subset or any combination of marker subsets disclosed herein. In the following, sets of marker genes should be understood to include sets of primer pairs and probes therefor, which can e.g. be provided in a kit.
  • Set a, WT1, DLX2, SALL3, TERT, PITX2, HOXA10, F2R, CPEB4, NHLH2, SMAD3, ACTB, HOXA1, BOLL, APC, MT1G, PENK, SPARC, DNAJA4, RASSF1, HLA-G, ERCC1, ONECUT2, APC, ABCB1, ZNF573, KCNJ15, ZDHHC11, SFRP2, GDNF, PTTG1, SERPINI1, TNFRSF10C and sets with at least 50%, preferably at least 60%, at least 70%, at least 80% or at least 90% of these markers are in particular suitable to detect lung cancer and to distinguish normal (non-cancerous) lung tissue from lung tumor tissue.
  • Set b, WT1, PITX2, SALL3, F2R, DLX2, TERT, HOXA10, MSH4, NHLH2, GNA15, PENK, RASSF1, BOLL, HOXA1, ONECUT2, ABCB1, SPARC, MT1G, HSPA2, SFRP2, PYCARD, GAD1, C5orf4, C5AR1, GDNF, ZDHHC11, SERPINE1, NKX2-1, PITX2, C5AR1, ZNF256, FAM43A, SFRP2, MT3, SERPINE1, CLIC4, TNFRSF10C, GABRA2, MTHFR, ESR2, NEUROG1, PITX2, PLAGL1, TMEFF2, PTTG1, CADM1, S100A8, EFS, JUB, ITGA4, MAGEB2, ERBB2, SRGN, GNAS, TJP2, KCNJ15, SLC25A31, ZNF573, TNFRSF25, APC, KCNQ1, LAMC2, SPHK1, DNAJA4, APC, MBD2, ERCC1, HLA-G, CXADR, TP53, ACTB, KL, SMAD3, HIST1H2AG, CPEB4 and sets with at least 50%, preferably at least 60%, at least 70%, at least 80% or at least 90% of these markers are also suitable to detect lung cancer and to distinguish between normal lung tissue and lung tumor tissue. The distinction or diagnosis can be made by using any sample as described above, including serum, sputum, bronchial lavage.
  • Set c, WT1, DLX2, SALL3, TERT, TNFRSF25, ACTB, SMAD3, CPEB4 and sets with at least 50%, preferably at least 60%, at least 70%, at least 80% or at least 90% of these markers are suitable to detect lung cancer and to distinguish normal (non-cancerous) lung tissue from lung tumor tissue. The distinction or diagnosis can be made by using any sample as described above, including serum, sputum, bronchial lavage.
  • Set d, WT1, DLX2, SALL3, TERT, PITX2, TNFRSF25, KL, ACTB, SMAD3, CPEB4 and sets with at least 50%, preferably at least 60%, at least 70%, at least 80% or at least 90% of these markers are in particular suitable to detect lung cancer and to distinguish normal (non-cancerous) lung tissue from lung tumor tissue. The distinction or diagnosis can be made by using any sample as described above, including serum, sputum, bronchial lavage.
  • Set e, WT1, PITX2, SALL3, DLX2, TERT, HOXA10, RASSF1, SPARC, IRAK2, ZNF711, DNAJA4, HLA-G, CXADR, TP53, ACTB, CPEB4 and sets with at least 50%, preferably at least 60%, at least 70%, at least 80% or at least 90% of these markers are also suitable to detect lung cancer and to distinguish normal (non-cancerous) lung tissue from lung tumor tissue. The distinction or diagnosis can be made by using any sample as described above, including serum, sputum, bronchial lavage.
  • Set f, WT1, PITX2, SALL3, F2R, TERT, HOXA10, RASSF1, SPARC, IRAK2, ZNF711, DRD2, DNAJA4, CXADR, TP53, ACTB, CPEB4 and sets with at least 50%, preferably at least 60%, at least 70%, at least 80% or at least 90% of these markers can be used to detect lung cancer and to distinguish normal (non-cancerous) lung tissue from lung tumor tissue. The distinction or diagnosis can be made by using any sample as described above, including serum, sputum, bronchial lavage.
  • Set g, WT1, ACTB, DLX2, PITX2, SALL3, HOXA10, TERT, CPEB4, HLA-G, SPARC, RASSF1, DNAJA4, CXADR, TP53, IRAK2, ZNF711 and sets with at least 50%, preferably at least 60%, at least 70%, at least 80% or at least 90% of these markers can be used to diagnose lung carcinoma, in particular using blood samples, e.g. to distinguish blood from healthy persons from tumor samples, including tumor tissue sample or blood from tumor patients. The distinction or diagnosis can be made by using any sample as described above, including serum, sputum, bronchial lavage.
  • Set h, F2R, ZNF256, CDH13, SERPINB5, KRT14, DLX2, AREG, THRB, HSD17B4, SPARC, HECW2, COL21A1 and sets with at least 50%, preferably at least 60%, at least 70%, at least 80% or at least 90% of these markers can be used to diagnose lung cancer and to distinguish the grade of differentiation (poorly, moderately or well differentiated). The distinction or diagnosis can be made by using any sample as described above, including serum, sputum, bronchial lavage.
  • Set i, KL, HIST1H2AG, TJP2, SRGN, CDX1, TNFRSF25, APC, HIC1, APC, GNA15, ACTB, WT1, KRT17, AIM1L, DPH1, PITX2, PITX2, KIF5B, BMP2K, GBP2, NHLH2, GDNF, BOLL and sets with at least 50%, preferably at least 60%, at least 70%, at least 80% or at least 90% of these markers can be used to diagnose lung cancer and to distinguish malignant states (in particular adenocarcinoma and squamous cell carcinoma) together with lung tissue against healthy blood or serum samples. The distinction or diagnosis can be made by using any sample as described above, including serum, sputum, bronchial lavage.
  • Set j, WT1, DLX2, SALL3, TERT, PITX2, HOXA10, F2R, CPEB4, NHLH2, SMAD3, ACTB, HOXA1, BOLL, APC, MT1G, PENK, SPARC, DNAJA4, RASSF1, HLA-G, ERCC1, ONECUT2, APC, ABCB1, ZNF573, KCNJ15, ZDHHC11, SFRP2, GDNF, PTTG1, SERPINI1, TNFRSF10C and sets with at least 50%, preferably at least 60%, at least 70%, at least 80% or at least 90% of these markers can be used to diagnose lung cancer and to distinguish malignant states, selected from adenocarcinoma and squamous cell carcinoma, from healthy lung tissue. The distinction or diagnosis can be made by using any sample as described above, including serum, sputum, bronchial lavage.
  • Set k, HOXA10, NEUROD1, and/or either HOXA10 or NEUROD1 alone, can be used to diagnose lung cancer and further to distinguish adenocarcinoma from squamous cell carcinoma.
  • Set l, WT1, PITX2, SALL3, F2R, TERT, HOXA10, RASSF1, SPARC, IRAK2, ZNF711, DRD2, DNAJA4, CXADR, TP53, ACTB, CPEB4, DLX2, TNFRSF25, KL, SMAD3 and sets with at least 50%, preferably at least 60%, at least 70%, at least 80% or at least 90% of these markers can be used to diagnose lung cancer and to distinguish cancerous lung tissue from healthy lung tissue.
  • Set m, TNFRSF25, SALL3, RASSF1, TERT, SPARC, F2R, HOXA10, ZNF711, PITX2 and sets with at least 50%, preferably at least 60%, at least 70%, at least 80% or at least 90% of these markers can be used to diagnose lung cancer and to distinguish cancerous lung tissue from healthy lung tissue. The distinction or diagnosis can be made by using any sample as described above, including serum, sputum, bronchial lavage.
  • Set n, SALL3, PITX2, SPARC, F2R, TERT, RASSF1, HOXA10, CXADR, KL and sets with at least 50%, preferably at least 60%, at least 70%, at least 80% or at least 90% of these markers can be used to diagnose lung cancer and to distinguish cancerous lung tissue from healthy lung tissue. The distinction or diagnosis can be made by using any sample as described above, including serum, sputum, bronchial lavage.
  • Set o, SALL3, SPARC, PITX2, F2R, TERT, RASSF1, HOXA10, KL and sets with at least 50%, preferably at least 60%, at least 70%, at least 80% or at least 90% of these markers can be used to diagnose lung cancer and to distinguish cancerous lung tissue from healthy lung tissue. The distinction or diagnosis can be made by using any sample as described above, including serum, sputum, bronchial lavage.
  • Set p, SALL3, PITX2, SPARC, F2R, HOXA10, DRD2, ACTB, DNAJA4, CXADR, KL and sets with at least 50%, preferably at least 60%, at least 70%, at least 80% or at least 90% of these markers can be used to diagnose lung cancer and to distinguish normal (non-cancerous) lung tissue from lung tumor tissue.
  • Set q, SALL3, SPARC, PITX2, F2R, TERT, RASSF1, HOXA10, TNFRSF25, DNAJA4, TP53, CXADR, KL and sets with at least 50%, preferably at least 60%, at least 70%, at least 80% or at least 90% of these markers can be used to diagnose lung cancer and to distinguish normal (non-cancerous) lung tissue from lung tumor tissue. The distinction or diagnosis can be made by using any sample as described above, including serum, sputum, bronchial lavage.
  • Set r, SPARC, SALL3, F2R, PITX2, RASSF1, HOXA10, TERT, KL, TNFRSF25 and sets with at least 50%, preferably at least 60%, at least 70%, at least 80% or at least 90% of these markers can be used to diagnose lung cancer and to distinguish between adenocarcinoma, healthy lung tissue and squamous cell carcinoma. The distinction or diagnosis can be made by using any sample as described above, including serum, sputum, bronchial lavage.
  • Set s, SALL3, SPARC, PITX2, F2R, TERT, RASSF1, HOXA10, KL, TNFRSF25, CXADR and sets with at least 50%, preferably at least 60%, at least 70%, at least 80% or at least 90% of these markers can be used to diagnose lung cancer and to distinguish adenocarcinoma and squamous cell carcinoma from healthy (benign) lung tissue. The distinction or diagnosis can be made by using any sample as described above, including serum, sputum, bronchial lavage.
  • Set t, HOXA10, RASSF1, F2R and sets with at least 50%, preferably at least 60%, at least 70%, at least 80% or at least 90% of these markers can be used to diagnose lung cancer and to distinguish between adenocarcinoma and squamous cell carcinoma. The distinction or diagnosis can be made by using any sample as described above, including serum, sputum, bronchial lavage.
  • Also provided are combinations of the above subsets a) to t), in particular sets comprising markers of 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15 or more of these subsets, preferably selected for the lung cancer type, or preferably the complete sets a) to t). One preferred set comprises the gene markers WT1, SALL3, TERT, ACTB and CPEB4. These markers are shared by the sets for the diagnosis of lung cancer and are suitable to distinguish normal from lung cancer samples. This set is preferably supplemented by the marker genes DLX2, TNFRSF25 or SMAD3. Furthermore, the inventive set may comprise any one of the markers ABCB1, ACTB, AIM1L, APC, AREG, BMP2K, BOLL, C5AR1, C5orf4, CADM1, CDH13, CDX1, CLIC4, COL21A1, CPEB4, CXADR, DLX2, DNAJA4, DPH1, DRD2, EFS, ERBB2, ERCC1, ESR2, F2R, FAM43A, GABRA2, GAD1, GBP2, GDNF, GNA15, GNAS, HECW2, HIC1, HIST1H2AG, HLA-G, HOXA1, HOXA10, HSD17B4, HSPA2, IRAK2, ITGA4, JUB, KCNJ15, KCNQ1, KIF5B, KL, KRT14, KRT17, LAMC2, MAGEB2, MBD2, MSH4, MT1G, MT3, MTHFR, NEUROD1, NHLH2, NKX2-1, ONECUT2, PENK, PITX2, PLAGL1, PTTG1, PYCARD, RASSF1, S100A8, SALL3, SERPINB5, SERPINE1, SERPINI1, SFRP2, SLC25A31, SMAD3, SPARC, SPHK1, SRGN, TERT, THRB, TJP2, TMEFF2, TNFRSF10C, TNFRSF25, TP53, ZDHHC11, ZNF256, ZNF711, F2R, HOXA10, KL, SALL3, SPARC, TNFRSF25, WT1 or any combination thereof. Particularly preferred are the markers ACTB, APC, CPEB4, CXADR, DLX2, DNAJA4, F2R, HOXA10, KL, PITX2, RASSF1, SALL3, SPARC, TERT, (either TNFRSF10C or TNFRSF25 or both), WT1 or any combination thereof. Even more preferred are the markers HOXA10, PITX2, RASSF1, SALL3, SPARC, TERT or any combination thereof. These markers are used in a marker set according to the present invention, in particular as additional markers for any one of sets a) to t), and especially for the marker set WT1, SALL3, TERT, ACTB and CPEB4.
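  • The core set named in this paragraph can be cross-checked against the subsets listed above. WT1, SALL3, TERT, ACTB and CPEB4 are exactly the markers that subsets c) to g) all share. The sketch below performs that check and also builds a combined set as a union. The dictionary copies those five subsets, and the helper names are hypothetical.

```python
# Subsets c) to g), copied from the list above.
SUBSETS = {
    "c": ["WT1", "DLX2", "SALL3", "TERT", "TNFRSF25", "ACTB", "SMAD3", "CPEB4"],
    "d": ["WT1", "DLX2", "SALL3", "TERT", "PITX2", "TNFRSF25", "KL", "ACTB",
          "SMAD3", "CPEB4"],
    "e": ["WT1", "PITX2", "SALL3", "DLX2", "TERT", "HOXA10", "RASSF1", "SPARC",
          "IRAK2", "ZNF711", "DNAJA4", "HLA-G", "CXADR", "TP53", "ACTB", "CPEB4"],
    "f": ["WT1", "PITX2", "SALL3", "F2R", "TERT", "HOXA10", "RASSF1", "SPARC",
          "IRAK2", "ZNF711", "DRD2", "DNAJA4", "CXADR", "TP53", "ACTB", "CPEB4"],
    "g": ["WT1", "ACTB", "DLX2", "PITX2", "SALL3", "HOXA10", "TERT", "CPEB4",
          "HLA-G", "SPARC", "RASSF1", "DNAJA4", "CXADR", "TP53", "IRAK2", "ZNF711"],
}


def shared_markers(names):
    """Markers present in every named subset (set intersection)."""
    return set.intersection(*(set(SUBSETS[n]) for n in names))


def combined_markers(names):
    """All distinct markers of the named subsets (set union)."""
    return set().union(*(SUBSETS[n] for n in names))


print(sorted(shared_markers("cdefg")))  # -> ['ACTB', 'CPEB4', 'SALL3', 'TERT', 'WT1']
print(len(combined_markers("cd")))      # -> 10 (subset c is contained in d)
```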
  • According to a preferred embodiment of the present invention, the methylation of at least two genes, preferably of at least three genes, especially of at least four genes, is determined. Specifically if the present invention is provided as an array test system, at least ten, especially at least fifteen genes, are preferred. In preferred test set-ups (for example in microarrays (“gene-chips”)) preferably at least 20, even more preferred at least 30, especially at least 40 genes, are provided as test markers. As mentioned above, these markers or the means to test the markers can be provided in a set of probes or a set of primers, preferably both.
  • In a further embodiment the set comprises up to 100000, up to 90000, up to 80000, up to 70000, up to 60000 or up to 50000 probes or primer pairs (a primer pair being two primers for one amplification product), preferably up to 40000, up to 35000, up to 30000, up to 25000, up to 20000, up to 15000, up to 10000, up to 7500, up to 5000, up to 3000, up to 2000, up to 1000, up to 750, up to 500, up to 400, up to 300, or even more preferred up to 200 probes or primers of any kind, in particular in the case of probes immobilized on a solid surface such as a chip.
  • In certain embodiments the primer pairs and probes are specific for a methylated upstream region of the open reading frame of the marker genes.
  • Preferably the probes or primers are specific for a methylation in the genetic regions defined by SEQ ID NOs 1081 to 1440, including up to 500 adjacent base pairs (preferably up to 300, up to 200, up to 100, up to 50 or up to 10 adjacent base pairs), corresponding to gene marker IDs 1 to 359 of table 1, respectively. That is, probes or primers of the inventive set (including the full 359 set, as well as subsets and combinations thereof) are specific for the regions and gene loci identified in the last column of table 1, with reference to the sequence listing, SEQ ID NOs: 1081 to 1440. As can be seen, each of these SEQ IDs corresponds to a certain gene, the latter being a member of the inventive sets, in particular of the subsets a) to t), e.g.
  • Examples of specific probes or primers are given in table 1 with reference to the sequence listing, SEQ ID NOs 1 to 1080, which form especially preferred embodiments of the invention.
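  • In the rows of table 1 reproduced here, the four SEQ ID NOs of each entry sit at a fixed offset from the gene ID. For example, WT1 (ID 61) has probe 61, primers 421 and 781, and PCR product 1141. The following lookup sketch assumes this pattern holds for all 360 rows, matching SEQ ID NOs 1081 to 1440 for the PCR products. The names are hypothetical.

```python
from typing import NamedTuple

N_ROWS = 360  # entries in table 1 (359 marker genes and one control)


class SeqIds(NamedTuple):
    probe: int        # hybridisation probe
    primer_lp: int    # primer 1 (lp)
    primer_rp: int    # primer 2 (rp)
    pcr_product: int  # PCR product (target region)


def seq_ids(gene_id: int) -> SeqIds:
    """SEQ ID NOs for a table 1 gene ID, assuming fixed 360-row offsets."""
    if not 1 <= gene_id <= N_ROWS:
        raise ValueError(f"gene ID must be in 1..{N_ROWS}, got {gene_id}")
    return SeqIds(gene_id, gene_id + N_ROWS, gene_id + 2 * N_ROWS,
                  gene_id + 3 * N_ROWS)


print(seq_ids(61))  # -> SeqIds(probe=61, primer_lp=421, primer_rp=781, pcr_product=1141)
```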
  • Preferably the set of the present invention comprises probes or primers for at least one gene or gene product of the list according to table 1, wherein at least 10%, at least 15%, at least 20%, at least 25%, at least 30%, at least 35%, at least 40%, at least 45%, at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, especially preferred 100%, of the total probes or primers are probes or primers for genes of the list according to table 1. Preferably the set, in particular in the case of a set of hybridization probes, is provided immobilized on a solid surface, preferably a chip, or in the form of a microarray. According to current technology, detection means for genes on a chip allow easier and more robust array design. Gene chips using DNA molecules (for the detection of methylated DNA in the sample) are therefore a preferred embodiment of the present invention. Such gene chips also allow the detection of a large number of nucleic acids.
  • Preferably the set is provided on a solid surface, in particular a chip, whereon the primers or probes can be immobilized. Solid surfaces or chips may be of any material suitable for the immobilization of biomolecules such as the moieties, including glass, modified glass (aldehyde modified) or metal chips.
  • The primers or probes can also be provided as such, including lyophilized forms or being in solution, preferably with suitable buffers. The probes and primers can of course be provided in a suitable container, e.g. a tube or micro tube.
  • The present invention also relates to a method of identifying lung cancer or a lung cancer type in a sample comprising DNA from a subject or patient. The method comprises obtaining a set of nucleic acid primers (or primer pairs) or hybridization probes as defined above (comprising each specific subset or combinations thereof), determining the methylation status of the genes in the sample for which the members of the set are specific, and comparing the methylation status of the genes with the status of a confirmed lung cancer or lung cancer type positive and/or negative state, thereby identifying the lung cancer or lung cancer type in the sample. The inventive method has been described in general above, and all preferred embodiments of such methods also apply to the method using the set provided herein.
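  • The comparison step can take many forms, and this paragraph does not fix one. One hypothetical illustration is a nearest-reference rule, which assigns a sample to whichever confirmed reference profile (positive or negative) lies closer. The per-gene methylation fractions below are invented toy values. This rule is a stand-in for illustration and is not the classifier used in the examples.

```python
import math


def distance(a: dict, b: dict, genes) -> float:
    """Euclidean distance between two methylation profiles over the given genes."""
    return math.sqrt(sum((a[g] - b[g]) ** 2 for g in genes))


def classify(sample: dict, positive_ref: dict, negative_ref: dict) -> str:
    """Assign the sample to the closer reference profile; ties go to 'negative'."""
    genes = sorted(set(sample) & set(positive_ref) & set(negative_ref))
    if not genes:
        raise ValueError("no genes shared by sample and references")
    d_pos = distance(sample, positive_ref, genes)
    d_neg = distance(sample, negative_ref, genes)
    return "positive" if d_pos < d_neg else "negative"


# Hypothetical fractions of methylated DNA per gene for the two reference states.
positive_ref = {"WT1": 0.80, "SALL3": 0.70, "TERT": 0.60}
negative_ref = {"WT1": 0.10, "SALL3": 0.10, "TERT": 0.20}
print(classify({"WT1": 0.75, "SALL3": 0.60, "TERT": 0.50},
               positive_ref, negative_ref))  # -> positive
print(classify({"WT1": 0.15, "SALL3": 0.20, "TERT": 0.10},
               positive_ref, negative_ref))  # -> negative
```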
  • The inventive marker set, including certain disclosed subsets and further subsets which can be identified with the methods disclosed herein, is suitable to diagnose lung cancer and to distinguish between different lung cancer forms, in particular for diagnostic or prognostic uses. Preferably, fewer markers (e.g. via primers or probes of the inventive set) are used for the inventive diagnostic or prognostic method than are contained in the set (or kit) or chip as such, which may be designed for more than one fine-tuned diagnosis or prognosis. The markers used for the diagnostic or prognostic method may number up to 100000, up to 90000, up to 80000, up to 70000, up to 60000 or up to 50000, preferably up to 40000, up to 35000, up to 30000, up to 25000, up to 20000, up to 15000, up to 10000, up to 7500, up to 5000, up to 3000, up to 2000, up to 1000, up to 750, up to 500, up to 400, up to 300, up to 200, up to 100, up to 80, or even more preferred up to 60. The inventive set of marker primers or probes can be employed in chip-based (immobilised) assays, products or methods, or in PCR-based kits or methods. Both PCR and hybridisation (e.g. on a chip) can be used to detect methylated genes.
  • The inventive marker set, including certain disclosed subsets which can be identified with the methods disclosed herein, is suitable to distinguish lung cancer from normal tissue, in particular for diagnostic or prognostic uses.
  • The inventive marker set, including certain disclosed subsets which can be identified with the methods disclosed herein, is suitable to distinguish adenocarcinoma from squamous cell carcinoma, in particular for diagnostic or prognostic uses.
  • The present invention is further illustrated by the following examples, without being restricted thereto.
  • FIGURES
  • FIG. 1: Cross-Validation ROC curve from the Bayesian Compound Covariate Predictor.
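  • The compound covariate predictor behind FIG. 1 can be sketched as follows. In its commonly described form, each gene is weighted by its two-sample t statistic, and a sample's score is the weighted sum of its methylation values. The Bayesian variant then turns this score into a class posterior using Gaussian score distributions per class. This sketch simplifies that form under several assumptions. It uses one pooled variance for both score distributions, whereas published variants may use class-specific variances. It performs no gene selection within folds and cross-validates by leave-one-out. The data are toy values. The ROC curve is summarised by its area (AUC), computed by pair counting.

```python
import math
from statistics import mean, variance


def pooled_t(a, b):
    """Two-sample t statistic (pooled variance) for one gene, class a minus b."""
    na, nb = len(a), len(b)
    sp2 = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / math.sqrt(sp2 * (1 / na + 1 / nb))


def score(weights, x):
    """Compound covariate: t-weighted sum of one sample's methylation values."""
    return sum(w * v for w, v in zip(weights, x))


def fit_ccp(x_pos, x_neg):
    """Fit gene weights and the Gaussian score model on training samples."""
    weights = [pooled_t([s[j] for s in x_pos], [s[j] for s in x_neg])
               for j in range(len(x_pos[0]))]
    s_pos = [score(weights, s) for s in x_pos]
    s_neg = [score(weights, s) for s in x_neg]
    # Assumption: one pooled SD for both class-conditional score distributions.
    n_p, n_n = len(s_pos), len(s_neg)
    sd = math.sqrt(((n_p - 1) * variance(s_pos) + (n_n - 1) * variance(s_neg))
                   / (n_p + n_n - 2))
    return weights, mean(s_pos), mean(s_neg), sd


def posterior_pos(x, model, prior_pos=0.5):
    """Posterior probability of the positive class via Bayes' rule on the score."""
    weights, m_pos, m_neg, sd = model
    s = score(weights, x)
    # Log-likelihood ratio of the two Gaussians (shared SD) plus prior log-odds.
    z = ((s - m_neg) ** 2 - (s - m_pos) ** 2) / (2 * sd ** 2)
    z += math.log(prior_pos / (1 - prior_pos))
    if z >= 0:  # numerically stable logistic function
        return 1 / (1 + math.exp(-z))
    e = math.exp(z)
    return e / (1 + e)


def loocv_posteriors(x_pos, x_neg):
    """Leave-one-out: each sample is scored by a model fitted without it."""
    post_pos = [posterior_pos(x_pos[i], fit_ccp(x_pos[:i] + x_pos[i + 1:], x_neg))
                for i in range(len(x_pos))]
    post_neg = [posterior_pos(x_neg[i], fit_ccp(x_pos, x_neg[:i] + x_neg[i + 1:]))
                for i in range(len(x_neg))]
    return post_pos, post_neg


def auc(pos_scores, neg_scores):
    """ROC area by pair counting (Mann-Whitney); ties count one half."""
    total = 0.0
    for p in pos_scores:
        for n in neg_scores:
            total += 1.0 if p > n else (0.5 if p == n else 0.0)
    return total / (len(pos_scores) * len(neg_scores))


# Toy methylation fractions for two genes (hypothetical data).
X_POS = [[0.8, 0.7], [0.7, 0.9], [0.9, 0.6], [0.75, 0.8]]   # "tumour"
X_NEG = [[0.2, 0.1], [0.1, 0.3], [0.3, 0.2], [0.15, 0.25]]  # "normal"
post_pos, post_neg = loocv_posteriors(X_POS, X_NEG)
print("cross-validated AUC:", auc(post_pos, post_neg))
```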
  • EXAMPLES
  • Example 1: Gene List
  • TABLE 1
    360 master set (with the 359 marker genes and one control) and sequence annotation
    Columns: gene ID, gene symbol, alt. gene symbol, hybridisation probe (SEQ ID NO:), primer 1 (lp) (SEQ ID NO:), primer 2 (rp) (SEQ ID NO:), PCR product (SEQ ID NO:)
    1 NHLH2 NHLH2 1 361 721 1081
    2 MTHFR MTHFR 2 362 722 1082
    3 PRDM2 RIZ1 (PRDM2) 3 363 723 1083
    4 MLLT11 MLLT11 4 364 724 1084
    5 S100A9 control_S100A9 5 365 725 1085
    6 S100A9 S100A9 6 366 726 1086
    7 S100A8 S100A8 7 367 727 1087
    8 S100A8 control_S100A8 8 368 728 1088
    9 S100A2 S100A2 9 369 729 1089
    10 LMNA LMNA 10 370 730 1090
    11 DUSP23 DUSP23 11 371 731 1091
    12 LAMC2 LAMC2 12 372 732 1092
    13 PTGS2 PTGS2 13 373 733 1093
    14 MARK1 MARK1 14 374 734 1094
    15 DUSP10 DUSP10 15 375 735 1095
    16 PARP1 PARP1 16 376 736 1096
    17 PSEN2 PSEN2 17 377 737 1097
    18 CLIC4 CLIC4 18 378 738 1098
    19 RUNX3 RUNX3 19 379 739 1099
    20 AIM1L NM_017977 20 380 740 1100
    21 SFN SFN 21 381 741 1101
    22 RPA2 RPA2 22 382 742 1102
    23 TP73 TP73 23 383 743 1103
    24 TP73 p73 24 384 744 1104
    25 POU3F1 01.10.06 25 385 745 1105
    26 MUTYH MUTYH 26 386 746 1106
    27 UQCRH UQCRH 27 387 747 1107
    28 FAF1 FAF1 28 388 748 1108
    29 TACSTD2 TACSTD2 29 389 749 1109
    30 TNFRSF25 TNFRSF25 30 390 750 1110
    31 DIRAS3 DIRAS3 31 391 751 1111
    32 MSH4 MSH4 32 392 752 1112
    33 GBP2 Control 33 393 753 1113
    34 GBP2 GBP2 34 394 754 1114
    35 LRRC8C LRRC8C 35 395 755 1115
    36 F3 F3 36 396 756 1116
    37 NANOS1 NM_001009553 37 397 757 1117
    38 MGMT MGMT 38 398 758 1118
    39 EBF3 EBF3 39 399 759 1119
    40 DCLRE1C DCLRE1C 40 400 760 1120
    41 KIF5B KIF5B 41 401 761 1121
    42 ZNF22 ZNF22 42 402 762 1122
    43 PGBD3 ERCC6 43 403 763 1123
    44 SRGN Control 44 404 764 1124
    45 GATA3 GATA3 45 405 765 1125
    46 PTEN PTEN 46 406 766 1126
    47 MMS19 MMS19L 47 407 767 1127
    48 SFRP5 SFRP5 48 408 768 1128
    49 PGR PGR 49 409 769 1129
    50 ATM ATM 50 410 770 1130
    51 DRD2 DRD2 51 411 771 1131
    52 CADM1 IGSF4 52 412 772 1132
    53 TEAD1 Control 53 413 773 1133
    54 OPCML OPCML 54 414 774 1134
    55 CALCA CALCA 55 415 775 1135
    56 CTSD CTSD 56 416 776 1136
    57 MYOD1 MYOD1 57 417 777 1137
    58 IGF2 IGF2 58 418 778 1138
    59 BDNF BDNF 59 419 779 1139
    60 CDKN1C CDKN1C 60 420 780 1140
    61 WT1 WT1 61 421 781 1141
    62 HRAS HRAS1 62 422 782 1142
    63 DDB1 DDB1 63 423 783 1143
    64 GSTP1 GSTP1 64 424 784 1144
    65 CCND1 CCND1 65 425 785 1145
    66 EPS8L2 EPS8L2 66 426 786 1146
    67 PIWIL4 PIWIL4 67 427 787 1147
    68 CHST11 CHST11 68 428 788 1148
    69 UNG UNG 69 429 789 1149
    70 CCDC62 CCDC62 70 430 790 1150
    71 CDK2AP1 CDK2AP1 71 431 791 1151
    72 CHFR CHFR 72 432 792 1152
    73 GRIN2B GRIN2B 73 433 793 1153
    74 CCND2 CCND2 74 434 794 1154
    75 VDR VDR 75 435 795 1155
    76 B4GALNT3 control (wrong chr of HRAS1) 76 436 796 1156
    77 NTF3 NTF3 77 437 797 1157
    78 CYP27B1 CYP27B1 78 438 798 1158
    79 GPR92 GPR92 79 439 799 1159
    80 ERCC5 ERCC5 80 440 800 1160
    81 GJB2 GJB2 81 441 801 1161
    82 BRCA2 BRCA2 82 442 802 1162
    83 KL KL 83 443 803 1163
    84 CCNA1 CCNA1 84 444 804 1164
    85 SMAD9 SMAD9 85 445 805 1165
    86 C13orf15 RGC32 86 446 806 1166
    87 DGKH DGKH 87 447 807 1167
    88 DNAJC15 DNAJC15 88 448 808 1168
    89 RB1 RB1 89 449 809 1169
    90 RCBTB2 RCBTB2 90 450 810 1170
    91 PARP2 PARP2 91 451 811 1171
    92 APEX1 APEX1 92 452 812 1172
    93 JUB JUB 93 453 813 1173
    94 JUB control_NM_19808 94 454 814 1174
    95 EFS EFS 95 455 815 1175
    96 BAZ1A BAZ1A 96 456 816 1176
    97 NKX2-1 TITF1 97 457 817 1177
    98 ESR2 ESR2 98 458 818 1178
    99 HSPA2 HSPA2 99 459 819 1179
    100 PSEN1 PSEN1 100 460 820 1180
    101 PGF PGF 101 461 821 1181
    102 MLH3 MLH3 102 462 822 1182
    103 TSHR TSHR 103 463 823 1183
    104 THBS1 THBS1 104 464 824 1184
    105 MYO5C MYO5C 105 465 825 1185
    106 SMAD6 SMAD6 106 466 826 1186
    107 SMAD3 SMAD3 107 467 827 1187
    108 NOX5 SPESP1 108 468 828 1188
    109 DNAJA4 DNAJA4 109 469 829 1189
    110 CRABP1 CRABP1 110 470 830 1190
    111 BCL2A1 BCL2A1 111 471 831 1191
    112 BCL2A1 BCL2A1 112 472 832 1192
    113 BNC1 BNC1 113 473 833 1193
    114 ARRDC4 ARRDC4 114 474 834 1194
    115 SOCS1 SOCS1 115 475 835 1195
    116 ERCC4 ERCC4 116 476 836 1196
    117 NTHL1 NTHL1 117 477 837 1197
    118 PYCARD PYCARD 118 478 838 1198
    119 AXIN1 AXIN1 119 479 839 1199
    120 CYLD NM_015247 120 480 840 1200
    121 MT3 MT3 121 481 841 1201
    122 MT1A MT1A 122 482 842 1202
    123 MT1G MT1G 123 483 843 1203
    124 CDH1 CDH1 124 484 844 1204
    125 CDH13 CDH13 125 485 845 1205
    126 DPH1 DPH1 126 486 846 1206
    127 HIC1 HIC1 127 487 847 1207
    128 NEUROD2 control_NEUROD2 128 488 848 1208
    129 NEUROD2 NEUROD2 129 489 849 1209
    130 ERBB2 ERBB2 130 490 850 1210
    131 KRT19 KRT19 131 491 851 1211
    132 KRT14 KRT14 132 492 852 1212
    133 KRT17 KRT17 133 493 853 1213
    134 JUP JUP 134 494 854 1214
    135 BRCA1 BRCA1 135 495 855 1215
    136 COL1A1 COL1A1 136 496 856 1216
    137 CACNA1G CACNA1G 137 497 857 1217
    138 PRKAR1A PRKAR1A 138 498 858 1218
    139 SPHK1 SPHK1 139 499 859 1219
    140 SOX15 SOX15 140 500 860 1220
    141 TP53 TP53_CGI23_1kb 141 501 861 1221
    142 TP53 TP53_bothCGIs_1kb 142 502 862 1222
    143 TP53 TP53_CGI36_1kb 143 503 863 1223
    144 TP53 TP53 144 504 864 1224
    145 NPTX1 NPTX1 145 505 865 1225
    146 SMAD2 SMAD2 146 506 866 1226
    147 DCC DCC 147 507 867 1227
    148 MBD2 MBD2 148 508 868 1228
    149 ONECUT2 ONECUT2 149 509 869 1229
    150 BCL2 BCL2 150 510 870 1230
    151 SERPINB5 SERPINB5 151 511 871 1231
    152 SERPINB2 Control 152 512 872 1232
    153 SERPINB2 SERPINB2 153 513 873 1233
    154 TYMS TYMS 154 514 874 1234
    155 LAMA1 LAMA1 155 515 875 1235
    156 SALL3 SALL3 156 516 876 1236
    157 LDLR LDLR 157 517 877 1237
    158 STK11 STK11 158 518 878 1238
    159 PRDX2 PRDX2 159 519 879 1239
    160 RAD23A RAD23A 160 520 880 1240
    161 GNA15 GNA15 161 521 881 1241
    162 ZNF573 ZNF573 162 522 882 1242
    163 SPINT2 SPINT2 163 523 883 1243
    164 XRCC1 XRCC1 164 524 884 1244
    165 ERCC2 ERCC2 165 525 885 1245
    166 ERCC1 ERCC1 166 526 886 1246
    167 C5AR1 NM_001736 167 527 887 1247
    168 C5AR1 C5AR1 168 528 888 1248
    169 POLD1 POLD1 169 529 889 1249
    170 ZNF350 ZNF350 170 530 890 1250
    171 ZNF256 ZNF256 171 531 891 1251
    172 C3 C3 172 532 892 1252
    173 XAB2 XAB2 173 533 893 1253
    174 ZNF559 ZNF559 174 534 894 1254
    175 FHL2 FHL2 175 535 895 1255
    176 IL1B IL1B 176 536 896 1256
    177 IL1B control_IL1B 177 537 897 1257
    178 PAX8 PAX8 178 538 898 1258
    179 DDX18 DDX18 179 539 899 1259
    180 GAD1 GAD1 180 540 900 1260
    181 DLX2 DLX2 181 541 901 1261
    182 ITGA4 ITGA4 182 542 902 1262
    183 NEUROD1 NEUROD1 183 543 903 1263
    184 STAT1 STAT1 184 544 904 1264
    185 TMEFF2 TMEFF2 185 545 905 1265
    186 HECW2 HECW2 186 546 906 1266
    187 BOLL BOLL 187 547 907 1267
    188 CASP8 CASP8 188 548 908 1268
    189 SERPINE2 SERPINE2 189 549 909 1269
    190 NCL NCL 190 550 910 1270
    191 CYP1B1 CYP1B1 191 551 911 1271
    192 TACSTD1 TACSTD1 192 552 912 1272
    193 MSH2 MSH2 193 553 913 1273
    194 MSH6 MSH6 194 554 914 1274
    195 MXD1 MXD1 195 555 915 1275
    196 JAG1 JAG1 196 556 916 1276
    197 FOXA2 FOXA2 197 557 917 1277
    198 THBD THBD 198 558 918 1278
    199 CTCFL BORIS 199 559 919 1279
    200 CTSZ CTSZ 200 560 920 1280
    201 GATA5 GATA5 201 561 921 1281
    202 CXADR CXADR 202 562 922 1282
    203 APP APP 203 563 923 1283
    204 TTC3 TTC3 204 564 924 1284
    205 KCNJ15 Control 205 565 925 1285
    206 RIPK4 RIPK4 206 566 926 1286
    207 TFF1 TFF1 207 567 927 1287
    208 SEZ6L SEZ6L 208 568 928 1288
    209 TIMP3 TIMP3 209 569 929 1289
    210 BIK BIK 210 570 930 1290
    211 VHL VHL 211 571 931 1291
    212 IRAK2 IRAK2 212 572 932 1292
    213 PPARG PPARG 213 573 933 1293
    214 MBD4 MBD4 214 574 934 1294
    215 RBP1 RBP1 215 575 935 1295
    216 XPC XPC 216 576 936 1296
    217 ATR ATR 217 577 937 1297
    218 LXN LXN 218 578 938 1298
    219 RARRES1 RARRES1 219 579 939 1299
    220 SERPINI1 SERPINI1 220 580 940 1300
    221 CLDN1 CLDN1 221 581 941 1301
    222 FAM43A FAM43A 222 582 942 1302
    223 IQCG IQCG 223 583 943 1303
    224 THRB THRB 224 584 944 1304
    225 RARB RARB 225 585 945 1305
    226 TGFBR2 TGFBR2 226 586 946 1306
    227 MLH1 MLH1 227 587 947 1307
    228 DLEC1 DLEC1 228 588 948 1308
    229 CTNNB1 CTNNB1 229 589 949 1309
    230 ZNF502 ZNF502 230 590 950 1310
    231 SLC6A20 SLC6A20 231 591 951 1311
    232 GPX1 GPX1 232 592 952 1312
    233 RASSF1 RASSF1A 233 593 953 1313
    234 FHIT FHIT 234 594 954 1314
    235 OGG1 OGG1 235 595 955 1315
    236 PITX2 PITX2 236 596 956 1316
    237 SLC25A31 SLC25A31 237 597 957 1317
    238 FBXW7 FBXW7 238 598 958 1318
    239 SFRP2 SFRP2 239 599 959 1319
    240 CHRNA9 CHRNA9 240 600 960 1320
    241 GABRA2 GABRA2 241 601 961 1321
    242 MSX1 MSX1 242 602 962 1322
    243 IGFBP7 IGFBP7 243 603 963 1323
    244 EREG EREG 244 604 964 1324
    245 AREG AREG 245 605 965 1325
    246 ANXA3 ANXA3 246 606 966 1326
    247 BMP2K BMP2K 247 607 967 1327
    248 APC APC 248 608 968 1328
    249 HSD17B4 HSD17B4 249 609 969 1329
    250 HSD17B4 HSD17B4 250 610 970 1330
    251 LOX LOX 251 611 971 1331
    252 TERT TERT 252 612 972 1332
    253 NEUROG1 NEUROG1 253 613 973 1333
    254 NR3C1 NR3C1 254 614 974 1334
    255 ADRB2 ADRB2 255 615 975 1335
    256 CDX1 CDX1 256 616 976 1336
    257 SPARC SPARC 257 617 977 1337
    258 C5orf4 Control 258 618 978 1338
    259 PTTG1 PTTG1 259 619 979 1339
    260 DUSP1 DUSP1 260 620 980 1340
    261 CPEB4 CPEB4 261 621 981 1341
    262 SCGB3A1 SCGB3A1 262 622 982 1342
    263 GDNF GDNF 263 623 983 1343
    264 ERCC8 ERCC8 264 624 984 1344
    265 F2R F2R 265 625 985 1345
    266 F2RL1 F2RL1 266 626 986 1346
    267 VCAN CSPG2 267 627 987 1347
    268 ZDHHC11 ZDHHC11 268 628 988 1348
    269 RHOBTB3 RHOBTB3 269 629 989 1349
    270 PLAGL1 PLAGL1 270 630 990 1350
    271 SASH1 SASH1 271 631 991 1351
    272 ULBP2 ULBP2 272 632 992 1352
    273 ESR1 ESR1 273 633 993 1353
    274 RNASET2 RNASET2 274 634 994 1354
    275 DLL1 DLL1 275 635 995 1355
    276 HIST1H2AG HIST1H2AG 276 636 996 1356
    277 HLA-G HLA-G 277 637 997 1357
    278 MSH5 MSH5 278 638 998 1358
    279 CDKN1A CDKN1A 279 639 999 1359
    280 TDRD6 TDRD6 280 640 1000 1360
    281 COL21A1 COL21A1 281 641 1001 1361
    282 DSP DSP 282 642 1002 1362
    283 SERPINE1 SERPINE1 283 643 1003 1363
    284 SERPINE1 SERPINE1 284 644 1004 1364
    285 FBXL13 FBXL13 285 645 1005 1365
    286 NRCAM NRCAM 286 646 1006 1366
    287 TWIST1 TWIST1 287 647 1007 1367
    288 HOXA1 HOXA1 288 648 1008 1368
    289 HOXA10 HOXA10 289 649 1009 1369
    290 SFRP4 SFRP4 290 650 1010 1370
    291 IGFBP3 IGFBP3 291 651 1011 1371
    292 RPA3 RPA3 292 652 1012 1372
    293 ABCB1 ABCB1 293 653 1013 1373
    294 TFPI2 TFPI2 294 654 1014 1374
    295 COL1A2 COL1A2 295 655 1015 1375
    296 ARPC1B ARPC1B 296 656 1016 1376
    297 PILRB PILRB 297 657 1017 1377
    298 GATA4 GATA4 298 658 1018 1378
    299 MAL2 NM_052886 299 659 1019 1379
    300 DLC1 DLC1 300 660 1020 1380
    301 EPPK1 NM_031308 301 661 1021 1381
    302 LZTS1 LZTS1 302 662 1022 1382
    303 TNFRSF10B TNFRSF10B 303 663 1023 1383
    304 TNFRSF10C TNFRSF10C 304 664 1024 1384
    305 TNFRSF10D TNFRSF10D 305 665 1025 1385
    306 TNFRSF10A TNFRSF10A 306 666 1026 1386
    307 WRN WRN 307 667 1027 1387
    308 SFRP1 SFRP1 308 668 1028 1388
    309 SNAI2 SNAI2 309 669 1029 1389
    310 RDHE2 RDHE2 310 670 1030 1390
    311 PENK PENK 311 671 1031 1391
    312 RDH10 RDH10 312 672 1032 1392
    313 TGFBR1 TGFBR1 313 673 1033 1393
    314 ZNF462 ZNF462 314 674 1034 1394
    315 KLF4 KLF4 315 675 1035 1395
    316 CDKN2A p14_CDKN2A 316 676 1036 1396
    317 CDKN2B CDKN2B 317 677 1037 1397
    318 AQP3 AQP3 318 678 1038 1398
    319 TPM2 TPM2 319 679 1039 1399
    320 TJP2 TJP2 320 680 1040 1400
    321 TJP2 TJP2 321 681 1041 1401
    322 PSAT1 PSAT1 322 682 1042 1402
    323 DAPK1 DAPK1 323 683 1043 1403
    324 SYK SYK 324 684 1044 1404
    325 XPA XPA 325 685 1045 1405
    326 ARMCX2 ARMCX2 326 686 1046 1406
    327 RHOXF1 OTEX 327 687 1047 1407
    328 FHL1 FHL1 328 688 1048 1408
    329 MAGEB2 MAGEB2 329 689 1049 1409
    330 TIMP1 TIMP1 330 690 1050 1410
    331 AR AR_humara 331 691 1051 1411
    332 ZNF711 ZNF6 332 692 1052 1412
    333 CD24 CD24 333 693 1053 1413
    334 ABL1 ABL 334 694 1054 1414
    335 ACTB Aktin_VL 335 695 1055 1415
    336 APC APC 336 696 1056 1416
    337 CDH1 Ecad1 337 697 1057 1417
    338 CDH1 Ecad2 338 698 1058 1418
    339 FMR1 FX 339 699 1059 1419
    340 GNAS GNASexAB 340 700 1060 1420
    341 H19 H19 341 701 1061 1421
    342 HIL1 Igf2 342 702 1062 1422
    343 IGF2 Igf2 343 703 1063 1423
    344 KCNQ1 LIT1 344 704 1064 1424
    345 GNAS NESP55 345 705 1065 1425
    346 CDKN2A P14 346 706 1066 1426
    347 CDKN2B P15 347 707 1067 1427
    348 CDKN2A P16_VL 348 708 1068 1428
    349 PITX2 PitxA 349 709 1069 1429
    350 PITX2 PitxB 350 710 1070 1430
    351 PITX2 PitxC 351 711 1071 1431
    352 PITX2 PitxD 352 712 1072 1432
    353 RB1 Rb 353 713 1073 1433
    354 SFRP2 SFRP2_VL 354 714 1074 1434
    355 SNRPN SNRPN 355 715 1075 1435
    356 XIST XIST 356 716 1076 1436
    357 IRF4 chr6_control 357 717 1077 1437
    358 UNC13B chr9_control 358 718 1078 1438
    359 GSTP1 GSTP1 360 720 1080 1440
    360 Lambda (control) lambda_PCR 359 719 1079 1439
  • Example 2 Samples
  • Samples from solid tumors were derived from the initial surgical resection of primary tumors. Tumor tissue sections were obtained from histopathology; histopathological as well as clinical data were monitored over the course of clinical management of the patients and/or collected from patient reports at the study center. Anonymised data and DNA were provided.
  • Example 3 Principle of the Assay and Design
  • The inventive assay is a multiplexed assay for DNA methylation testing of up to (or even more than) 360 candidate methylation markers, enabling convenient methylation analyses for tumor-marker definition. In its best mode the test combines multiplex PCR with microarray hybridization for multiplexed methylation testing. The inventive marker genes, PCR primer sequences, hybridization probe sequences and expected PCR products are given in table 1, above.
  • To target hypermethylated DNA regions of the inventive marker genes in several neoplasias, methylation analysis is performed via methylation-sensitive restriction enzyme (MSRE) digestion of 500 ng of starting DNA. A combination of several MSREs ensures complete digestion of unmethylated DNA. All targeted DNA regions were selected such that sequences containing multiple MSRE sites are flanked by sites of methylation-independent restriction enzymes. This strategy enables pre-amplification of the methylated DNA fraction before methylation analysis, so the design and pre-amplification would enable methylation testing of serum, urine, stool etc. when DNA is limiting.
  • When DNA is tested without pre-amplification, the methylated fraction of 500 ng of digested DNA is amplified in 16 multiplex PCRs and detected via microarray hybridization. These 16 multiplex PCR reactions can amplify 360 different human DNA products. Of these, about 20 amplicons serve as digestion and amplification controls; they are derived either from known differentially methylated human DNA regions or from regions without any sites for the MSREs used in this system. The primer set (every reverse primer is biotinylated) targets 347 different sites located in the 5′UTR of 323 gene regions.
  • After PCR, the amplicons are pooled and positives are detected via microarray hybridization using streptavidin-Cy3. Although the melting temperature of CpG-rich DNA is very high, primer and probe design as well as hybridization conditions have been optimized so that the assay enables unequivocal multiplexed methylation testing of human DNA samples. The assay has been designed such that 24 samples can be run in parallel in 384-well PCR plates.
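The 24-sample format follows from plate arithmetic: 24 samples × 16 multiplex PCRs = 384 reactions, i.e. one 384-well plate (16 rows × 24 columns). A minimal Python sketch of one possible layout, assuming one column per sample and one row per multiplex PCR; the layout and the names `well_for` and `plate_map` are hypothetical and are not specified in the text:

```python
import string

N_MULTIPLEX = 16                     # multiplex PCRs per sample (from the text)
N_SAMPLES = 24                       # samples per plate (from the text)
ROWS = string.ascii_uppercase[:16]   # a 384-well plate has rows A-P ...
N_COLUMNS = 24                       # ... and columns 1-24


def well_for(sample: int, multiplex: int) -> str:
    """Well name for a 1-based sample number (column) and multiplex number (row)."""
    if not (1 <= sample <= N_SAMPLES and 1 <= multiplex <= N_MULTIPLEX):
        raise ValueError("sample or multiplex number out of range")
    return f"{ROWS[multiplex - 1]}{sample}"


def plate_map() -> dict:
    """Map every (sample, multiplex) pair to its well (hypothetical layout)."""
    return {
        (s, m): well_for(s, m)
        for s in range(1, N_SAMPLES + 1)
        for m in range(1, N_MULTIPLEX + 1)
    }


layout = plate_map()
assert N_SAMPLES * N_MULTIPLEX == len(ROWS) * N_COLUMNS == 384
print(len(layout), well_for(1, 1), well_for(24, 16))  # 384 A1 P24
```

With this arrangement each column holds the complete 16-reaction set of one sample, which keeps the later per-sample pooling of the 16 amplicons simple.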
  • Many DNA samples can easily be handled in several plates in parallel, so that analyses can be completed within 1-2 days.
  • The entire procedure enables the user to set up a specific PCR test, with subsequent gel-based or hybridization-based testing of selected markers, using single primer pairs or primer subsets as provided herein or as identified by the inventive method from the 360-marker set.
  • Example 4 MSRE Digestion of DNA
  • MSRE digestion of DNA (about 500 ng) was performed at 37° C. overnight in a volume of 30 μl of 1× Tango restriction enzyme digestion buffer (MBI Fermentas) using 8 units of each of the MSREs AciI (New England Biolabs), Hin6I and HpaII (both from MBI Fermentas). Digestions were stopped by heat inactivation (10 min, 75° C.) and subjected to PCR amplification.
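Because AciI, Hin6I and HpaII cut only unmethylated recognition sites, an amplicon reports methylation only if it contains at least one such site; amplicons without any site behave like the digestion-independent controls mentioned above. A minimal sketch that counts these sites in a candidate amplicon, assuming the standard recognition sequences CCGC for AciI (not palindromic, so its reverse complement GCGG is also searched on the top strand), GCGC for Hin6I and CCGG for HpaII; the function names are hypothetical:

```python
import re

# Top-strand recognition sequences of the three MSREs used in the digest.
MSRE_SITES = {
    "AciI": ("CCGC", "GCGG"),   # non-palindromic: search both orientations
    "Hin6I": ("GCGC",),
    "HpaII": ("CCGG",),
}


def count_sites(seq: str) -> dict:
    """Count (possibly overlapping) MSRE recognition sites in a DNA sequence."""
    seq = seq.upper()
    return {
        enzyme: sum(len(re.findall(f"(?={motif})", seq)) for motif in motifs)
        for enzyme, motifs in MSRE_SITES.items()
    }


def informative(seq: str) -> bool:
    """True if unmethylated copies would be cut, so a PCR product implies methylation."""
    return sum(count_sites(seq).values()) > 0


print(count_sites("ATCCGCATGCGCTTCCGGA"))  # {'AciI': 1, 'Hin6I': 1, 'HpaII': 1}
print(informative("ATATATTTAAAC"))          # False
```

The zero-width lookahead `(?=...)` is used so that overlapping sites in CpG-dense sequence are each counted.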
  • Example 5 PCR Amplification
  • An aliquot of 20 μl of MSRE-digested DNA (or, when methylated DNA has been preamplified as described below, about 500 ng in a volume of 20 μl) was added to 280 μl of PCR premix (without primers). The premix contained all reagents at final concentrations of 1× HotStarTaq Buffer (Qiagen), 160 μM dNTPs, 5% DMSO and 0.6 U Hot Firepol Taq (Solis Biodyne) per 20 μl reaction. Alternatively, an equal amount of HotStarTaq (Qiagen) could be used. Eighteen (18) μl of the premix including digested DNA were aliquoted into 16 0.2 ml PCR tubes, and 2 μl of one of the primer premixes 1-16 (containing 0.83 pmol/μl of each primer) were added to each tube. PCR reactions were amplified using a thermal cycling profile of 15 min at 95° C., 40 cycles of 40 sec at 95° C., 40 sec at 65° C. and 1 min 20 sec at 72° C., and a final elongation of 7 min at 72° C.; the reactions were then cooled. After amplification, the 16 different multiplex PCR amplicons from each DNA sample were pooled. Successful amplification was checked using 10 μl of the pooled 16 PCR reactions per sample. Positive amplification yields a smear in the range of 100-300 bp on EtBr-stained agarose gels; negative amplification controls must not show a smear in this range.
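The volumes above fix the amount of template and primer in each reaction. A worked calculation, assuming the 20 μl aliquot is taken from the 30 μl digest of about 500 ng described in Example 4 and ignoring pipetting losses and ramp times (all names are illustrative):

```python
# Inputs from Examples 4 and 5; the 20 ul aliquot is assumed to come from the 30 ul digest.
DIGEST_DNA_NG = 500.0
DIGEST_VOLUME_UL = 30.0
ALIQUOT_UL = 20.0
PREMIX_UL = 280.0
MIX_PER_TUBE_UL = 18.0
PRIMER_MIX_UL = 2.0
PRIMER_STOCK_PMOL_PER_UL = 0.83   # each primer in a primer premix
N_TUBES = 16
N_CYCLES = 40


def per_reaction() -> dict:
    total_mix_ul = ALIQUOT_UL + PREMIX_UL                          # 300 ul
    dna_in_mix_ng = DIGEST_DNA_NG * ALIQUOT_UL / DIGEST_VOLUME_UL  # ~333 ng
    reaction_ul = MIX_PER_TUBE_UL + PRIMER_MIX_UL                  # 20 ul
    cycle_s = 40 + 40 + 80                                         # denature, anneal, extend
    return {
        "mix_needed_ul": MIX_PER_TUBE_UL * N_TUBES,
        "dna_ng": dna_in_mix_ng * MIX_PER_TUBE_UL / total_mix_ul,
        # 1 pmol/ul equals 1 uM, so multiply by 1000 to get nM
        "primer_nM": PRIMER_MIX_UL * PRIMER_STOCK_PMOL_PER_UL / reaction_ul * 1000,
        # hold times only: 15 min activation + cycling + 7 min final elongation
        "hold_minutes": 15 + N_CYCLES * cycle_s / 60 + 7,
    }


print({k: round(v, 1) for k, v in per_reaction().items()})
# {'mix_needed_ul': 288.0, 'dna_ng': 20.0, 'primer_nM': 83.0, 'hold_minutes': 128.7}
```

On these assumptions each 20 μl multiplex PCR receives roughly 20 ng of digested DNA and about 83 nM of each primer, and the 300 μl mix leaves a small surplus over the 288 μl dispensed.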
  • Example 6 Microarray Hybridization and Detection
  • Microarrays with the probes of the 360-marker set are blocked for 30 min at room temperature in 3 M urea containing 0.1% SDS, submerged in a stirred Coplin jar. After blocking, slides are washed in 0.1×SSC/0.2% SDS for 5 min, dipped into water and dried by centrifugation.
  • The PCR amplicon pool of each sample is mixed with an equal volume of 2× hybridization buffer (7×SSC, 0.6% SDS, 50% formamide), denatured for 5 min at 95° C. and held at 70° C. until an aliquot of 100 μl is loaded onto an array covered by a gasket slide (Agilent). Arrays are hybridized at maximum rotation speed in an Agilent hybridization oven for 16 h at 52° C. After removal of the gasket slides, the microarray slides are washed at room temperature in wash solution I (1×SSC, 0.2% SDS) for 5 min and wash solution II (0.1×SSC, 0.2% SDS) for 5 min, followed by a final wash by dipping the slides 3 times into wash solution III (0.1×SSC); the slides are then dried by centrifugation.
  • For detection of hybridized biotinylated PCR amplicons, streptavidin-Cy3 conjugate (Caltag Laboratories) is diluted 1:400 in PBST-MP (1×PBS, 0.1% Tween 20, 1% skimmed dry milk powder [Sucofin; Germany]), pipetted onto the microarrays, covered with a coverslip and incubated for 30 min at room temperature in the dark. The coverslips are then washed off the slides using PBST (1×PBS, 0.1% Tween 20), and the slides are washed in fresh PBST for 5 min, rinsed with water and dried by centrifugation.
  • Example 7 DNA Preamplification for Methylation Profiling (Optional)
  • In many situations the amount of DNA is limited. Although the inventive methylation test performs well with low amounts of DNA (see above), minimally invasive testing using cell-free DNA from serum, stool, urine and other body fluids is of particular diagnostic relevance.
  • Samples can be preamplified prior to methylation testing as follows: DNA was digested with the restriction enzyme FspI (and/or Csp6I, and/or MseI, and/or Tsp509I, or their isoschizomers), and after (heat) inactivation of the restriction enzyme the fragments were circularized using T4 DNA ligase. Ligation products were digested with a mixture of methylation-sensitive restriction enzymes. After enzyme inactivation, the entire mixture was amplified by rolling circle amplification (RCA) using phi29 phage polymerase. The RCA amplicons were then subjected directly to the multiplex PCRs of the inventive methylation test, without any need to digest the DNA prior to amplification.
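Which fragments carry methylation information after this procedure can be checked in silico. A simplified sketch, assuming MseI (T^TAA) as the fragmenting enzyme and assuming the MSRE mixture consists of the AciI, Hin6I and HpaII sites used in Example 4; it treats the sequence as a single top strand, ignores the 2-nt overhangs, and does not model circularization or RCA. Only fragments with at least one MSRE site can report methylation; the function names are hypothetical:

```python
import re

MSE_I = "TTAA"                                   # MseI site; top-strand cut is T^TAA
MSRE_MOTIFS = ("CCGC", "GCGG", "GCGC", "CCGG")   # AciI (both orientations), Hin6I, HpaII


def mse_fragments(seq: str) -> list:
    """Split a sequence at MseI sites, cutting after the first T of each TTAA."""
    seq = seq.upper()
    cuts = [m.start() + 1 for m in re.finditer(f"(?={MSE_I})", seq)]
    bounds = [0] + cuts + [len(seq)]
    return [seq[a:b] for a, b in zip(bounds, bounds[1:]) if b > a]


def has_msre_site(fragment: str) -> bool:
    return any(motif in fragment for motif in MSRE_MOTIFS)


def methylation_informative(seq: str) -> list:
    """Fragments whose survival of the MSRE digest depends on CpG methylation."""
    return [f for f in mse_fragments(seq) if has_msre_site(f)]


demo = "AACCGGTTAAGGGGTTAAATGCGCAA"
print(mse_fragments(demo))            # ['AACCGGT', 'TAAGGGGT', 'TAAATGCGCAA']
print(methylation_informative(demo))  # ['AACCGGT', 'TAAATGCGCAA']
```

A fragment such as `TAAGGGGT` has no MSRE site, so it survives the digest whether or not it is methylated and therefore cannot distinguish methylation states.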
  • Alternatively, the preamplified DNA, which is enriched for methylated DNA regions, can be fluorescently labelled directly and the labelled products hybridized onto the microarrays under the same conditions as described above for hybridization of PCR products. In that case the streptavidin-Cy3 detection step is omitted, and the slides should be scanned directly after the stringency washes and drying. Depending on the experimental design of the microarray analyses, either single-labelled or dual-labelled hybridizations can be generated; in our experience, the single-label design was used successfully for class comparisons. Besides enabling analysis of minute amounts of DNA, the preamplification protocol is also suited for genomic methylation screens.
  • To elucidate methylation biomarkers for prediction of metastasis risk on a genome-wide level, we subjected 500 ng of DNA derived from primary tumor samples to amplification of the methylated DNA using the procedure outlined above. RCA amplicons derived from metastasized and non-metastasized samples were labelled using the CGH Labeling Kit (Enzo, Farmingdale, N.Y.), and the labelled products were hybridized onto human 244K CpG island arrays (Agilent, Waldbronn, Germany). All manipulations were performed according to the manufacturers' instructions.
  • Example 8 Data Analysis
  • Hybridizations performed on a chip with probes for the inventive 360 marker genes were scanned using a GenePix 4000A scanner (Molecular Devices, Ismaning, Germany) with the PMT set to 700 V (equal for both wavelengths). Raw image data were extracted using GenePix 6.0 software (Molecular Devices, Ismaning, Germany).
  • Microarray data analyses were performed using BRB-ArrayTools, developed by Dr. Richard Simon and the BRB-ArrayTools Development Team. The software package (version 3.6; in the www at linus.nci.nih.gov/BRB-ArrayTools.html) was used according to the authors' recommendations, and the settings used for analyses are given with the results where appropriate. For every hybridization, background intensities were subtracted from foreground intensities for each spot. Global normalization was used to median-center the log-ratios on each array in order to adjust for differences in spot/label intensities.
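The two preprocessing steps named above, background subtraction and global median centering, can be sketched as follows. This is not BRB-ArrayTools code: it assumes single-channel (Cy3) data, uses the log2 background-corrected intensity as the per-spot value, and, as a simplifying assumption, returns `None` for spots whose corrected signal is not positive:

```python
import math
from statistics import median


def normalize_array(foreground, background):
    """Background-subtract, log2-transform and median-center one array.

    Spots with a non-positive corrected signal are returned as None
    (a simplifying assumption made for this sketch).
    """
    corrected = [f - b for f, b in zip(foreground, background)]
    logs = [math.log2(c) if c > 0 else None for c in corrected]
    center = median(v for v in logs if v is not None)   # global array median
    return [v - center if v is not None else None for v in logs]


print(normalize_array([1100, 300, 4100, 50], [100, 100, 100, 100]))
```

After centering, each array has a median log2 value of zero, so a spot value of 2.0 means four times the array's median signal regardless of overall brightness differences between hybridizations.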
  • P-values (p) used for feature selection for classification and prediction were based on the univariate significance levels (alpha). P-values (p) and misclassification rates during cross-validation (MCR) are given with the result data.
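Table 2 below reports a permutation p-value alongside the parametric one. A generic sketch of a two-sided permutation test for a single marker, assuming the absolute difference in mean (log) signal as the test statistic and random relabelling of samples; BRB-ArrayTools uses its own statistic and permutation scheme, so this only illustrates the principle:

```python
import random
from statistics import mean


def permutation_p_value(group1, group2, n_perm=2000, seed=0):
    """Two-sided permutation p-value for the difference in mean signal."""
    rng = random.Random(seed)                   # seeded for reproducibility
    observed = abs(mean(group1) - mean(group2))
    pooled = list(group1) + list(group2)
    n1 = len(group1)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)                     # random relabelling of samples
        diff = abs(mean(pooled[:n1]) - mean(pooled[n1:]))
        if diff >= observed - 1e-12:            # tolerance for float round-off
            hits += 1
    return (hits + 1) / (n_perm + 1)            # add-one so p is never exactly 0


print(permutation_p_value([1, 2, 3, 4, 5, 6], [10, 11, 12, 13, 14, 15]))
```

The add-one correction means that with 2000 permutations the smallest attainable p-value is about 5e-4; reporting values such as <1e−07 therefore requires far more permutations or an exact enumeration.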
  • Example 9 Lung Cancer Test
  • DNA from 96 samples, derived from normal and tumour lung tissue of 48 patients, and 8 DNA samples isolated from peripheral blood (PB) of healthy individuals were analysed for methylation deviations in the inventive set of 359 genes.
  • From this analysis, DNA methylation biomarkers suitable for distinguishing tumour from normal lung DNA, as well as from the DNA methylation profiles of blood DNA from healthy controls, were deduced. Diagnostic and prognostic marker subsets suitable for diagnostic testing and presymptomatic screening for early detection of lung cancer were determined, for use in DNA derived from lung tissue but also in DNA extracted from patient materials other than lung tissue, such as sputum, serum or plasma.
  • DNA methylation testing results and data analyses of chip results, as well as qPCR validation of a subset of markers derived from chip-based testing, are provided.
  • DNA samples analysed were from the blood of 8 healthy individuals (PB), 19 tumours (AdenoCa, adenocarcinoma) and 19 normal lung tissues (N) of adenocarcinoma patients, and 29 tumours (SqCCL, squamous cell carcinoma) and 29 normal lung tissues (N) of squamous cell carcinoma patients.
  • For DNA methylation testing, 600 ng of DNA were digested, and the data derived from DNA microarray hybridizations were analysed using the BRB-ArrayTools statistical software package. Class comparison and class prediction analyses were performed with respect to the sample groups listed above; for delineation of biomarkers for tumour samples, AdenoCa and SqCCL were treated as one tumour sample group (TU).
  • The design of the test enables methylation testing on DNA obtained directly from the biological source. The test is also suitable for DNA preamplification after MSRE digestion (as outlined above). Using methylation-specific preamplification of minute amounts of DNA, biomarker testing is therefore feasible on small samples and limited amounts of DNA, and multiplexed PCR and methylation testing can easily be performed on the preamplified DNA. This strategy would also improve testing of serum, urine, stool, synovial fluid, sputum and other body fluids using the conceptual design of the methylation test.
  • Preamplification also enables differential methylation hybridization of the preamplified DNA itself, an option made possible by the design of the test and the probes. Thus the probes of the methylation test (or the array) can be used to hybridize labelled DNA after enrichment of either the methylated or the unmethylated DNA fraction of any DNA sample, allowing methylation testing without the multiplex PCR.
  • In addition, the biomarkers described herein could be applied for methylation testing using alternative approaches, e.g. methylation-sensitive PCR and strategies based on sodium bisulfite deamination of DNA rather than on MSRE digestion. These sets of methylation markers are suitable for disease monitoring, disease progression, prediction, therapy decision and therapy response.
  • Example 10 Biomarkers from Microarray-Testing of Patient Samples
  • Example 10a CLASS COMPARISON: TU Vs. Normal: p<0.005, Unpaired Samples; 2 Fold Change
  • The methylation markers listed below were found to differ significantly (p<0.005) between TU and N using "unpaired" statistical testing of DNA methylation of 48 tumour samples versus 48 normal lung tissue samples. Markers significant at p<0.005 with at least a 2-fold difference in signal intensity between the two classes are listed.
  • TABLE 2
    Sorted by p-value of the univariate test.
    Class 1: N; Class 2: T.
    The 32 genes are significant at the nominal 0.005 level of the univariate test with a fold change of at least 2.
    Columns: No.; parametric p-value; FDR; permutation p-value; geometric mean of intensities in class 1; geometric mean of intensities in class 2; fold change (class 1/class 2); gene symbol.
    1 <1e−07 <1e−07 <1e−07 1411.8016 13554.578246 0.1041568 WT1
    2 <1e−07 <1e−07 <1e−07 85.5069224 1125.7940428 0.0759525 DLX2
    3 <1e−07 <1e−07   1e−07 852.3850013 7392.282404 0.1153074 SALL3
    4   1e−07 <1e−07   1e−07 235.4745892 592.5077157 0.3974203 TERT
    5 <1e−07 <1e−07 <1e−07 274.9097126 833.6648468 0.3297605 PITX2
    6 <1e−07 <1e−07 <1e−07 80.5286413 265.3042755 0.3035331 HOXA10
    7 <1e−07 <1e−07 <1e−07 112.6645619 855.6410585 0.1316727 F2R
    8   1e−07 4.5e−06  <1e−07 2002.2452679 266.6906343 7.507745 CPEB4
    9   4e−07 1.46e−05     1e−07 718.311462 4609.4380991 0.1558349 NHLH2
    10   4e−07 1.46e−05   <1e−07 10347.8184959 3603.9811381 2.8712188 SMAD3
    11   5e−07 1.65e−05   <1e−07 2993.3054637 1117.4218527 2.6787604 ACTB
    12 2.8e−06 8.49e−05     1e−07 296.6448711 3941.769913 0.0752568 HOXA1
    13 3.6e−06 0.0001008 <1e−07 2792.0699393 17199.6551909 0.1623329 BOLL
    14 5.9e−06 0.0001342 <1e−07 8664.2840567 2178.4607085 3.9772506 APC
    15 1.21e−05  0.0002591 <1e−07 96.7848387 472.6945117 0.2047513 MT1G
    16 1.36e−05  0.000275   1e−07 653.0579403 2188.6201533 0.298388 PENK
    17 1.97e−05  0.0003774 <1e−07 1710.9865406 4044.9737351 0.4229908 SPARC
    18 3.16e−05  0.0005751 <1e−07 1639.128227 811.4430136 2.0200164 DNAJA4
    19 3.85e−05  0.0006673 <1e−07 114.7065029 292.8694482 0.3916643 RASSF1
    20 4.28e−05  0.0007081 <1e−07 564.6571983 189.2105463 2.9842797 HLA-G
    21 4.98e−05  0.0007881   1e−04 1339.8175413 446.1370253 3.0031525 ERCC1
    22   6e−05 0.00091   1e−04 395.6248705 1158.1502714 0.3416006 ONECUT2
    23 6.58e−05  0.000958 <1e−07 2517.3232246 1024.0897145 2.4581081 APC
    24 8.45e−05  0.0011392 <1e−07 232.2537844 701.7843246 0.3309475 ABCB1
    25 0.0002382 0.0029898   1e−04 3027.5067641 1165.5391698 2.5975161 ZNF573
    26 0.0003469 0.003946 <1e−07 360.9888133 148.6109072 2.4290869 KCNJ15
    27 0.0003582 0.0039511   3e−04 1818.1186026 4147.2970277 0.4383864 ZDHHC11
    28 0.0012332 0.01192 0.0013 238.5488592 512.9101159 0.465089 SFRP2
    29 0.0019349 0.0176076 0.0015 310.5591882 1215.8855725 0.2554181 GDNF
    30 0.002818 0.0227945 0.0022 4930.1368809 2261.9370298 2.1796084 PTTG1
    31 0.0038228 0.0267596 0.0045 2402.9850212 974.5347994 2.4657765 SERPINI1
    32 0.0039256 0.0269326 0.0031 208.6539745 417.3186041 0.4999872 TNFRSF10C
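The fold change in Table 2 is the ratio of the geometric mean intensities of class 1 (N) to class 2 (T); for WT1, 1411.8016/13554.578246 ≈ 0.104. Values of 0.5 or below therefore mean at least twice the methylated-DNA signal in tumour, and values of 2 or above mean at least twice the signal in normal tissue. A minimal sketch of this calculation and of the combined p<0.005 and 2-fold filter (function names are hypothetical):

```python
from statistics import geometric_mean


def fold_change(class1, class2):
    """Ratio of geometric mean intensities, class 1 (N) over class 2 (T)."""
    return geometric_mean(class1) / geometric_mean(class2)


def passes_filter(p_value, fc, alpha=0.005, min_fold=2.0):
    """Significant at alpha and at least min_fold apart in either direction."""
    return p_value < alpha and (fc >= min_fold or fc <= 1.0 / min_fold)


# Geometric means from Table 2 (WT1, TNFRSF10C); GNA15 ratio and p-value from Table 3.
print(round(fold_change([1411.8016], [13554.578246]), 4))   # 0.1042
print(passes_filter(0.0039256, 208.6539745 / 417.3186041))  # True (ratio just below 0.5)
print(passes_filter(4e-07, 0.5405671))                      # False: fails the 2-fold criterion
```

With the filter as written, GNA15 fails the 2-fold criterion despite its very small p-value, which is consistent with it appearing in the class-prediction list of Table 3 but not in Table 2.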
  • Example 10b CLASS Prediction: TU Vs Normal: p<0.005, Unpaired Samples; 2Fold Change
  • Class prediction with several statistical methods was used to identify marker panels giving the best correct classification of TU and N (p<0.005).
  • Performance of Classifiers During Cross-Validation.
  • Mean percent of correct classification, by classifier:
    Compound Covariate Predictor: 100
    Diagonal Linear Discriminant Analysis: 100
    1-Nearest Neighbor: 98
    3-Nearest Neighbors: 98
    Nearest Centroid: 98
    Support Vector Machines: 98
  • TABLE 3
    Composition of classifier: sorted by t-value.
    Columns: No.; parametric p-value; t-value; % CV support; geometric mean of intensities (class N/class T); gene symbol.
    1 <1e−07 −10.859 100 0.1041568 WT1
    2 <1e−07 −7.903 100 0.3297605 PITX2
    3 <1e−07 −7.314 100 0.1153074 SALL3
    4 <1e−07 −7.063 100 0.1316727 F2R
    5 <1e−07 −7.028 100 0.0759525 DLX2
    6 <1e−07 −6.592 100 0.3974203 TERT
    7 <1e−07 −6.539 100 0.3035331 HOXA10
    8 <1e−07 −6.495 100 0.7772068 MSH4
    9 <1e−07 −6.357 100 0.1558349 NHLH2
    10   4e−07 −5.915 100 0.5405671 GNA15
    11   4e−07 −5.908 100 0.298388 PENK
    12  4.2e−06 −5.206 100 0.3916643 RASSF1
    13   5e−06 −5.155 100 0.1623329 BOLL
    14 1.05e−05 −4.935 100 0.0752568 HOXA1
    15  3.1e−05 −4.61 100 0.3416006 ONECUT2
    16 4.26e−05 −4.514 100 0.3309475 ABCB1
    17 4.59e−05 −4.491 100 0.4229908 SPARC
    18 4.96e−05 −4.467 100 0.2047513 MT1G
    19 8.53e−05 −4.301 100 0.6381881 HSPA2
    20 0.0002478 −3.966 100 0.465089 SFRP2
    21 0.0002786 −3.929 100 0.7532617 PYCARD
    22 0.0003286 −3.876 100 0.6491186 GAD1
    23 0.0004296 −3.789 100 0.8137828 C5orf4
    24 0.0004695 −3.76 100 0.7676414 C5AR1
    25 0.0004699 −3.76 100 0.2554181 GDNF
    26 0.0006369 −3.66 100 0.4383864 ZDHHC11
    27 0.0008023 −3.584 100 0.8171479 SERPINE1
    28 0.0009028 −3.544 100 0.6392075 NKX2-1
    29 0.0009179 −3.539 100 0.5993327 PITX2
    30 0.0010255 −3.501 100 0.7691876 C5AR1
    31 0.0011267 −3.47 100 0.5118859 ZNF256
    32 0.0014869 −3.375 100 0.5593175 FAM43A
    33 0.0015714 −3.356 100 0.6862518 SFRP2
    34 0.0019233 −3.287 100 0.3698669 MT3
    35 0.0019731 −3.278 100 0.7715219 SERPINE1
    36 0.0019838 −3.276 100 0.8088555 CLIC4
    37 0.0023911 −3.21 100 0.4999872 TNFRSF10C
    38 0.0027742 −3.158 92 0.8776257 GABRA2
    39 0.0028024 −3.154 92 0.7069999 MTHFR
    40 0.0030868 −3.12 81 0.6837301 ESR2
    41 0.0033263 −3.093 79 0.6327604 NEUROG1
    42 0.0036825 −3.057 67 0.6444277 PITX2
    43 0.0044243 −2.99 44 0.732542 PLAGL1
    44 0.004896  −2.953 40 0.4992372 TMEFF2
    45 0.0037996 3.046 65 2.1796084 PTTG1
    46 0.0034628 3.079 73 1.1394289 CADM1
    47 0.0024932 3.196 100 1.0870547 S100A8
    48 0.0024284 3.205 100 1.3497772 EFS
    49 0.0020087 3.271 100 1.2801593 JUB
    50 0.0017007 3.329 100 1.1823596 ITGA4
    51 0.0015061 3.371 100 1.5959594 MAGEB2
    52 0.0013429 3.41 100 1.294098 ERBB2
    53 0.0011103 3.475 100 1.3485708 SRGN
    54 0.0007894 3.589 100 1.3193821 GNAS
    55 0.0007437 3.609 100 1.9621539 TJP2
    56 0.000457  3.769 100 2.4290869 KCNJ15
    57 0.0004291 3.789 100 1.3004513 SLC25A31
    58 0.0001587 4.107 100 2.5975161 ZNF573
    59 0.0001331 4.163 100 1.4996674 TNFRSF25
    60 9.26e−05 4.276 100 2.4581081 APC
    61 4.88e−05 4.472 100 1.9612086 KCNQ1
    62 3.62e−05 4.564 100 1.4971047 LAMC2
    63 1.82e−05 4.77 100 1.5467277 SPHK1
    64 1.68e−05 4.794 100 2.0200164 DNAJA4
    65 1.45e−05 4.838 100 3.9772506 APC
    66   9e−06 4.979 100 1.388284 MBD2
    67  8.6e−06 4.994 100 3.0031525 ERCC1
    68  4.5e−06 5.182 100 2.9842797 HLA-G
    69  4.2e−06 5.202 100 1.7516486 CXADR
    70  1.4e−06 5.521 100 1.9112579 TP53
    71  1.1e−06 5.605 100 2.6787604 ACTB
    72   9e−07 5.647 100 1.9365988 KL
    73   6e−07 5.755 100 2.8712188 SMAD3
    74   2e−07 6.05 100 1.4368727 HIST1H2AG
    75   2e−07 6.115 100 7.507745 CPEB4
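The performance figures are obtained during cross-validation, and the "% CV support" column can be read as the percentage of cross-validation training sets in which a gene was selected. A minimal sketch of why such a column arises, assuming leave-one-out cross-validation, selection of the top-k genes by absolute t-statistic inside every fold, and a nearest-centroid classifier; BRB-ArrayTools' actual resampling and p-value-based selection may differ, and the demo data are synthetic:

```python
from math import sqrt
from statistics import mean, variance


def abs_t(a, b):
    """Absolute Welch t statistic (needs at least 2 values per class)."""
    se = sqrt(variance(a) / len(a) + variance(b) / len(b))
    return abs(mean(a) - mean(b)) / se if se > 0 else 0.0


def top_genes(X, y, k):
    """Indices of the k genes with the largest |t| between classes 0 and 1."""
    scores = []
    for g in range(len(X[0])):
        a = [row[g] for row, lab in zip(X, y) if lab == 0]
        b = [row[g] for row, lab in zip(X, y) if lab == 1]
        scores.append((abs_t(a, b), g))
    return [g for _, g in sorted(scores, reverse=True)[:k]]


def nearest_centroid(X, y, genes, x_new):
    """Label of the class whose centroid (over the selected genes) is closest."""
    best = None
    for label in sorted(set(y)):
        members = [row for row, lab in zip(X, y) if lab == label]
        centroid = [mean(row[g] for row in members) for g in genes]
        dist = sum((x_new[g] - c) ** 2 for g, c in zip(genes, centroid))
        if best is None or dist < best[0]:
            best = (dist, label)
    return best[1]


def loocv(X, y, k):
    """Leave-one-out CV with gene selection repeated inside every fold."""
    support = [0] * len(X[0])
    correct = 0
    for i in range(len(X)):
        X_train, y_train = X[:i] + X[i + 1:], y[:i] + y[i + 1:]
        genes = top_genes(X_train, y_train, k)   # held-out sample never used here
        for g in genes:
            support[g] += 1
        correct += nearest_centroid(X_train, y_train, genes, X[i]) == y[i]
    n = len(X)
    return 100.0 * correct / n, [100.0 * s / n for s in support]


# Synthetic demo: gene 0 separates the classes, genes 1 and 2 are noise.
X_demo = [[1.0, 2.0, 0.5], [1.2, 2.5, 0.7], [0.9, 1.5, 0.4], [1.1, 2.2, 0.9],
          [3.0, 2.1, 0.8], [3.2, 1.6, 1.0], [2.9, 2.4, 0.6], [3.1, 2.0, 0.7]]
y_demo = [0, 0, 0, 0, 1, 1, 1, 1]
accuracy, support = loocv(X_demo, y_demo, k=1)
print(accuracy, support)  # 100.0 [100.0, 0.0, 0.0]
```

Selecting genes inside each fold, rather than once on all samples, is what keeps the cross-validated accuracy honest; genes that are chosen in only some folds show a % CV support below 100.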
  • Example 10c 4 Greedy Pairs: 92% Correct Using SVM (Support Vector Machine)
  • Using "4 pairs of methylation markers" derived from greedy-pairs class prediction, support vector machines achieve 92% correct classification of TU and N.
  • Performance of Classifiers During Cross-Validation.
  • Mean percent of correct classification, by classifier:
    Compound Covariate Predictor: 90
    Diagonal Linear Discriminant Analysis: 90
    1-Nearest Neighbor: 90
    3-Nearest Neighbors: 89
    Nearest Centroid: 91
    Support Vector Machines: 92
  • Performance of the Support Vector Machine Classifier:
  • Class Sensitivity Specificity PPV NPV
    N 0.917 0.917 0.917 0.917
    T 0.917 0.917 0.917 0.917
  • TABLE 4
    Composition of classifier: sorted by t-value (sorted by gene pairs).
    Class 1: N; Class 2: T.
    Columns: No.; parametric p-value; t-value; % CV support; geometric mean of intensities in class 1; geometric mean of intensities in class 2; fold change; gene symbol.
    1 <1e−07 −9.452 100 1411.8016 13554.578246 0.1041568 WT1
    2 <1e−07 −7.222 100 85.5069224 1125.7940428 0.0759525 DLX2
    3 <1e−07 −6.648 99 852.3850013 7392.282404 0.1153074 SALL3
    4 <1e−07 −6.48 70 235.4745892 592.5077157 0.3974203 TERT
    5 0.0017994 3.213 27 437.7037557 291.867223 1.4996674 TNFRSF25
    6   5e−07 5.391 100 2993.3054637 1117.4218527 2.6787604 ACTB
    7   4e−07 5.474 76 10347.818495 3603.9811381 2.8712188 SMAD3
    8 <1e−07 5.832 98 2002.2452679 266.6906343 7.507745 CPEB4
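The greedy-pairs method selects genes two at a time: the best remaining single gene is paired with the partner that, together with it, best separates the classes, and both are removed before the next round. A simplified sketch, assuming Welch t-statistics for single genes and a pair score equal to the |t| of the samples projected onto the pair's diagonal-LDA axis; the exact scoring in BRB-ArrayTools may differ, and the gene names and values are synthetic:

```python
from math import copysign, inf, sqrt
from statistics import mean, variance


def split(values, labels):
    """Separate per-sample values into class 0 and class 1 lists."""
    return ([v for v, lab in zip(values, labels) if lab == 0],
            [v for v, lab in zip(values, labels) if lab == 1])


def t_stat(a, b):
    """Welch two-sample t statistic, class 1 (b) minus class 0 (a)."""
    diff = mean(b) - mean(a)
    se = sqrt(variance(a) / len(a) + variance(b) / len(b))
    if se == 0:
        return copysign(inf, diff) if diff else 0.0
    return diff / se


def pooled_var(a, b):
    return ((len(a) - 1) * variance(a) + (len(b) - 1) * variance(b)) / (len(a) + len(b) - 2)


def pair_score(data, labels, g, h):
    """|t| of samples projected onto the diagonal-LDA axis of genes g and h.
    Assumes non-zero pooled variance for both genes."""
    weights = {}
    for gene in (g, h):
        a, b = split(data[gene], labels)
        weights[gene] = (mean(b) - mean(a)) / pooled_var(a, b)
    projection = [weights[g] * xg + weights[h] * xh for xg, xh in zip(data[g], data[h])]
    return abs(t_stat(*split(projection, labels)))


def greedy_pairs(data, labels, n_pairs):
    """Pick the best single gene, pair it with its best partner, repeat."""
    remaining = sorted(data)  # sorted for deterministic tie-breaking
    pairs = []
    while len(pairs) < n_pairs and len(remaining) >= 2:
        best = max(remaining, key=lambda g: abs(t_stat(*split(data[g], labels))))
        remaining.remove(best)
        partner = max(remaining, key=lambda h: pair_score(data, labels, best, h))
        remaining.remove(partner)
        pairs.append((best, partner))
    return pairs


# Synthetic demo: 4 class-0 samples followed by 4 class-1 samples.
demo_labels = [0, 0, 0, 0, 1, 1, 1, 1]
demo_data = {
    "gene_a": [1, 2, 1, 2, 8, 9, 8, 9],
    "gene_b": [1, 2, 3, 4, 2, 3, 4, 5],
    "gene_c": [5, 5.5, 5, 5.5, 4, 4.5, 4, 4.5],
    "gene_d": [3, 1, 4, 1, 5, 9, 2, 6],
}
print(greedy_pairs(demo_data, demo_labels, 2))
```

Pairing lets a gene that is only moderately informative on its own enter the classifier if it complements the stronger gene, which is why pair members in Table 4 are not simply the top-ranked genes of Table 2.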
  • Example 10d (BRB v3.8) 5 Greedy Pairs
  • Using "5 pairs of methylation markers" derived from greedy-pairs class prediction, support vector machines achieve 95% correct classification of TU and N.
  • Performance of Classifiers During Cross-Validation:
  • Mean percent of correct classification, by classifier:
    Compound Covariate Predictor: 92
    Diagonal Linear Discriminant Analysis: 94
    1-Nearest Neighbor: 90
    3-Nearest Neighbors: 94
    Nearest Centroid: 92
    Support Vector Machines: 95
    Bayesian Compound Covariate Predictor: 95
    Note:
    NA denotes an unclassified sample. Such samples are excluded from the computation of the mean percent of correct classification.
  • Performance of the Support Vector Machine Classifier:
  • Class Sensitivity Specificity PPV NPV
    N 0.958 0.938 0.939 0.957
    T 0.938 0.958 0.957 0.939
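The four per-class figures follow from a 2×2 confusion matrix. Assuming all 48 N and 48 T samples were classified, the values above are reproduced by 46 N and 45 T samples called correctly; these counts are inferred from the reported ratios and are not stated in the text:

```python
def class_metrics(tp, fn, fp, tn):
    """Per-class metrics, treating the class of interest as 'positive'."""
    return {
        "sensitivity": tp / (tp + fn),   # members of the class called correctly
        "specificity": tn / (tn + fp),   # non-members called correctly
        "ppv": tp / (tp + fp),           # fraction of calls into the class that are right
        "npv": tn / (tn + fn),           # fraction of calls out of the class that are right
    }


# Inferred counts (assumption): 46 of 48 N and 45 of 48 T classified correctly.
normal = class_metrics(tp=46, fn=2, fp=3, tn=45)
tumour = class_metrics(tp=45, fn=3, fp=2, tn=46)
accuracy = (46 + 45) / 96

for name, metrics in (("N", normal), ("T", tumour)):
    print(name, {k: round(v, 3) for k, v in metrics.items()})
print(round(accuracy, 3))
```

Overall, 91 of 96 correct is about 95%, matching the mean percent reported for support vector machines; with 44 of 48 correct in each class, the same function gives the uniform 0.917 values of Example 10c.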
  • TABLE 5
    Composition of classifier: sorted by t-value (sorted by gene pairs).
    Class 1: N; Class 2: T.
    Columns: No.; parametric p-value; t-value; % CV support; geometric mean of intensities in class 1; geometric mean of intensities in class 2; fold change; gene symbol.
    1   <1e−07 −9.531 100 1378.5556347 13613.2679786 0.1012656 WT1
    2   <1e−07 −7.419 100 78.691453 1122.0211285 0.0701337 DLX2
    3   <1e−07 −6.702 100 832.1044249 7415.7421008 0.1122078 SALL3
    4   <1e−07 −6.625 100 223.339058 595.0731922 0.3753136 TERT
    5   <1e−07 −6.586 100 267.2568518 837.2745062 0.3191986 PITX2
    6 0.0029082 3.057 35 427.3964613 286.9546694 1.4894215 TNFRSF25
    7   1.26e−05 4.612 70 7297.8279144 3875.9637585 1.8828421 KL
    8     9e−07 5.255 99 2922.8174216 1122.2601272 2.6044028 ACTB
    9     9e−07 5.266 98 10104.1419624 3617.8969167 2.792822 SMAD3
    10     2e−07 5.603 100 1911.6531674 265.654275 7.1960188 CPEB4
  • Example 10e Recursive Feature Elimination Method
  • Using “16 methylation markers” derived from the Recursive Feature Elimination method for class prediction with Diagonal Linear Discriminant Analysis enables 100% correct classification of TU and N.
  • Performance of Classifiers During Cross-Validation.
  • Mean percent of correct classification, by classifier:
    Compound Covariate Predictor: 98
    Diagonal Linear Discriminant Analysis: 100
    1-Nearest Neighbor: 96
    3-Nearest Neighbors: 96
    Nearest Centroid: 94
    Support Vector Machines: 96
  • TABLE 6
    Composition of classifier: sorted by t-value.
    Columns: No.; parametric p-value; t-value; % CV support; geometric mean of intensities (class N/class T); gene symbol.
    1 <1e−07 −10.859 100 0.1041568 WT1
    2 <1e−07 −7.903 100 0.3297605 PITX2
    3 <1e−07 −7.314 98 0.1153074 SALL3
    4 <1e−07 −7.028 81 0.0759525 DLX2
    5 <1e−07 −6.592 98 0.3974203 TERT
    6 <1e−07 −6.539 98 0.3035331 HOXA10
    7 4.2e−06  −5.206 98 0.3916643 RASSF1
    8 4.59e−05 −4.491 94 0.4229908 SPARC
    9 0.0329896 −2.197 88 0.5237754 IRAK2
    10 0.0496307 −2.015 98 0.6640548 ZNF711
    11 1.68e−05 4.794 79 2.0200164 DNAJA4
    12 4.5e−06  5.182 79 2.9842797 HLA-G
    13 4.2e−06  5.202 79 1.7516486 CXADR
    14 1.4e−06  5.521 75 1.9112579 TP53
    15 1.1e−06  5.605 100 2.6787604 ACTB
    16  2e−07 6.115 100 7.507745 CPEB4
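Diagonal Linear Discriminant Analysis, the classifier used with the 16 markers above, models each marker with its own pooled variance and ignores correlations between markers. The following is a minimal sketch of the technique under these assumptions: equal class priors, log-scale marker values, and hypothetical toy data. It is not the BRB ArrayTools implementation that produced the results.

```python
import numpy as np

def fit_dlda(X, y):
    """Fit diagonal LDA: per-class means plus pooled within-class variance per marker."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    classes = np.unique(y)
    means = np.array([X[y == c].mean(axis=0) for c in classes])
    # Pooled variance for each marker; covariances between markers are ignored.
    resid = np.concatenate([X[y == c] - means[i] for i, c in enumerate(classes)])
    var = (resid ** 2).sum(axis=0) / (len(X) - len(classes))
    return classes, means, var

def predict_dlda(model, X):
    """Assign each sample to the class whose mean is nearest in variance-scaled distance."""
    classes, means, var = model
    X = np.atleast_2d(np.asarray(X, dtype=float))
    dist = np.stack([((X - m) ** 2 / var).sum(axis=1) for m in means], axis=1)
    return classes[dist.argmin(axis=1)].tolist()

# Hypothetical log-scale values for two markers (not data from this document).
X = [[0.0, 0.0], [0.2, 0.1], [3.0, 3.0], [3.1, 2.9]]
y = ["N", "N", "T", "T"]
model = fit_dlda(X, y)
print(predict_dlda(model, [[0.1, 0.0], [2.9, 3.0]]))  # ['N', 'T']
```

With equal priors, choosing the nearest mean under this diagonal, variance-scaled distance is equivalent to the DLDA decision rule.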
  • Example 10f (BRB v3.8) Recursive Feature Elimination Method
  • Because data import and normalisation differed when the data were collated again for statistical analysis (using BRB v. 3.8), the gene list calculated from the data differs slightly from that of Example 10e. It is given below:
  • Performance of Classifiers During Cross-Validation.
  • Classifier                               Mean percent of correct classification
    Compound Covariate Predictor             96
    Diagonal Linear Discriminant Analysis    100
    1-Nearest Neighbor                       96
    3-Nearest Neighbors                      96
    Nearest Centroid                         96
    Support Vector Machines                  96
  • TABLE 7
    Composition of classifier: Sorted by t-value
    Columns: No. | Parametric p-value | t-value | % CV support | Geometric mean of intensities (class N/class TU) | Gene symbol
    1 <1e−07 −10.777 100 0.1012656 WT1
    2 <1e−07 −8.046 88 0.3191986 PITX2
    3 <1e−07 −7.336 98 0.1122078 SALL3
    4 <1e−07 −7.232 85 0.1264427 F2R
    5 <1e−07 −6.712 100 0.3753136 TERT
    6 <1e−07 −6.524 98 0.2930706 HOXA10
    7 1.6e−06  −5.49 98 0.3695951 RASSF1
    8 3.87e−05 −4.543 83 0.4112493 SPARC
    9 0.0313421 −2.219 88 0.5143877 IRAK2
    10 0.0366617 −2.151 98 0.6452171 ZNF711
    11 0.3333009 0.978 58 1.1102014 DRD2
    12 4.91e−05 4.471 77 1.9749991 DNAJA4
    13 2.25e−05 4.707 75 1.7030259 CXADR
    14 7.4e−06  5.036 88 1.8582045 TP53
    15 2.1e−06  5.402 100 2.6044028 ACTB
    16  5e−07 5.815 100 7.1960188 CPEB4
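The "minor differences" between the gene lists of Example 10e (Table 6) and Example 10f (Table 7) can be made explicit with simple set operations. The symbols below are copied from the two tables.

```python
# Gene symbols as listed in Table 6 (Example 10e) and Table 7 (Example 10f).
TABLE_6 = {"WT1", "PITX2", "SALL3", "DLX2", "TERT", "HOXA10", "RASSF1", "SPARC",
           "IRAK2", "ZNF711", "DNAJA4", "HLA-G", "CXADR", "TP53", "ACTB", "CPEB4"}
TABLE_7 = {"WT1", "PITX2", "SALL3", "F2R", "TERT", "HOXA10", "RASSF1", "SPARC",
           "IRAK2", "ZNF711", "DRD2", "DNAJA4", "CXADR", "TP53", "ACTB", "CPEB4"}

shared = TABLE_6 & TABLE_7          # markers selected in both analyses
only_10e = sorted(TABLE_6 - TABLE_7)
only_10f = sorted(TABLE_7 - TABLE_6)
print(len(shared), only_10e, only_10f)  # 14 ['DLX2', 'HLA-G'] ['DRD2', 'F2R']
```

Fourteen of the sixteen markers are common to both classifiers.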
  • Example 10g Recursive Geneset for “PB-N-TU” Distinction Using CLASS Prediction
  • Distinguishing PB, N and TU is of interest when minimally invasive testing for lung cancer is to be performed on serum or plasma from peripheral blood. Markers that distinguish PB, N and TU are therefore best suited for such testing. Using the 16 methylation markers derived from the Recursive Feature Elimination method for class prediction with Diagonal Linear Discriminant Analysis enables 91% correct classification.
  • Performance of Classifiers During Cross-Validation:
  • Classifier                               Mean percent of correct classification
    Diagonal Linear Discriminant Analysis    91
    1-Nearest Neighbor                       89
    3-Nearest Neighbors                      87
    Nearest Centroid                         88
  • Performance of the Diagonal Linear Discriminant Analysis Classifier:
  • Class Sensitivity Specificity PPV NPV
    N 0.875 0.946 0.933 0.898
    PB 1 0.948 0.615 1
    T 0.938 0.982 0.978 0.948
  • Performance of the 1-Nearest Neighbor Classifier:
  • Class Sensitivity Specificity PPV NPV
    N 0.979 0.821 0.825 0.979
    PB 0.75 0.99 0.857 0.979
    T 0.833 1 1 0.875
  • Performance of the 3-Nearest Neighbors Classifier:
  • Class Sensitivity Specificity PPV NPV
    N 1 0.75 0.774 1
    PB 0.125 1 1 0.932
    T 0.854 1 1 0.889
  • Performance of the Nearest Centroid Classifier:
  • Class Sensitivity Specificity PPV NPV
    N 0.812 0.929 0.907 0.852
    PB 1 0.917 0.5 1
    T 0.917 0.982 0.978 0.932
  • TABLE 8
    Composition of classifier: Sorted by p-value
    Class 1: N; Class 2: PB; Class 3: T.
    Columns: No. | Parametric p-value | t-value | % CV support | Geom mean of intensities in class 1 | Geom mean of intensities in class 2 | Geom mean of intensities in class 3 | Gene symbol
    1   <1e−07 65.961 100 1411.8016 335.9542052 13554.578246 WT1
    2   <1e−07 34.742 100 2993.3054637 240.5599546 1117.4218527 ACTB
    3   <1e−07 30.862 100 85.5069224 70.3843498 1125.7940428 DLX2
    4   <1e−07 30.03 100 274.9097126 128.8159291 833.6648468 PITX2
    5   <1e−07 28.153 100 852.3850013 349.2428569 7392.282404 SALL3
    6   <1e−07 23.333 100 80.5286413 62.0661721 265.3042755 HOXA10
    7   <1e−07 21.159 100 235.4745892 296.8149796 592.5077157 TERT
    8     2e−07 17.8 100 2002.2452679 1697.5965438 266.6906343 CPEB4
    9    4.3e−06 13.991 100 564.6571983 1254.1750649 189.2105463 HLA-G
    10   1.54e−05 12.388 100 1710.9865406 1310.5286603 4044.9737351 SPARC
    11    1.9e−05 12.132 100 114.7065029 81.1382549 292.8694482 RASSF1
    12   6.55e−05 10.614 100 1639.128227 1576.0887022 811.4430136 DNAJA4
    13 0.0008203 7.63 100 1484.6917542 1429.9219493 847.5968076 CXADR
    14 0.0008501 7.589 100 11761.052468 9062.1655722 6153.5665863 TP53
    15 0.041843 3.276 100 105.5844903 94.1143599 201.5835284 IRAK2
    16 0.3946752 0.938 100 483.3048928 567.8776158 727.8087385 ZNF711
  • Example 10h Class Prediction “Differentiation”→Poor-Moderate-Well
  • The grade of tumour differentiation could also be determined by DNA methylation marker testing. In this example, correct classification reaches only 50 to 62%, depending on the classifier. For better performance, the lung tumour groups “AdenoCa” and “SqCCL” can be split and analysed separately when determining the grade of tumour differentiation.
  • Performance of Classifiers During Cross-Validation.
  • Classifier                               Mean percent of correct classification
    Diagonal Linear Discriminant Analysis    50
    1-Nearest Neighbor                       52
    3-Nearest Neighbors                      57
    Nearest Centroid                         62
  • TABLE 9
    Composition of classifier: Sorted by p-value
    Class 1: moderate; Class 2: poor; Class 3: well.
    Columns: No. | Parametric p-value | t-value | % CV support | Geom mean of intensities in class 1 | Geom mean of intensities in class 2 | Geom mean of intensities in class 3 | Gene symbol
    1 0.0002337 10.127 100 2426.5840626 190.6171197 840.042225 F2R
    2 0.002636 6.796 100 409.0809522 178.099004 3103.6338503 ZNF256
    3 0.0034931 6.432 100 67.1145733 81.4305823 63.5786575 CDH13
    4 0.0044626 6.118 100 30915.9294466 15055.465308 6829.1471271 SERPINB5
    5 0.0082321 5.35 100 289.011498 400.2767665 163.1721958 KRT14
    6 0.0092929 5.2 100 2890.2702155 418.2345934 211.3575002 DLX2
    7 0.0111512 4.977 100 68.3488191 83.3593382 60.6607364 AREG
    8 0.0286999 3.846 98 62.1904027 62.94364 74.3029102 THRB
    9 0.0326517 3.696 92 64.7904336 80.1596633 60.6607364 HSD17B4
    10 0.0414877 3.418 62 5631.0373836 2622.6315852 3310.1373187 SPARC
    11 0.0449927 3.325 79 894.5655128 1191.0908574 510.2671098 HECW2
    12 0.0480858 3.249 40 441.1103703 1018.9640546 852.4793505 COL21A1
  • Example 10i BinTreePred “Differentiation” AdenoCa, SqCCL, N, PB
  • Binary tree prediction is applicable for identifying markers when more than two classes are compared. It provides several sets of predictors, one for each node of the tree, which together enable classification of PB, AdenoCa, SqCCL and N. These marker sets could alternatively be used for classification.
  • Optimal Binary Tree: Cross-Validation Error Rates for a Fixed Tree Structure Shown Below
  • Node   Group 1 classes      Group 2 classes   Misclassification rate (%)
    1      AdenoCa, N, SqCCL    PB                0.0
    2      AdenoCa, SqCCL       N                 9.4
    3      AdenoCa              SqCCL             31.2
    Figure US20160281175A1-20160929-C00001
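A fixed binary tree of this kind classifies a sample by starting at node 1 and letting each node's two-group classifier choose a branch until a single class remains. The following is a minimal sketch of that routing. The sample representation (a dictionary of marker intensities) is an assumption. The single-marker cut-offs are HYPOTHETICAL stand-ins chosen only to point in the same direction as the group means in Tables 10 to 12. They are not the trained node classifiers.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional

Sample = Dict[str, float]  # hypothetical representation: marker symbol -> intensity

@dataclass
class Node:
    """Leaf if `label` is set; otherwise `in_group1` routes a sample to one of two subtrees."""
    label: Optional[str] = None
    in_group1: Optional[Callable[[Sample], bool]] = None
    group1: Optional["Node"] = None
    group2: Optional["Node"] = None

def predict(node: Node, sample: Sample) -> str:
    # Descend from the root, letting each node's binary classifier choose a branch.
    while node.label is None:
        node = node.group1 if node.in_group1(sample) else node.group2
    return node.label

# Tree structure as in the table above; the cut-offs are hypothetical illustrations.
tree = Node(
    in_group1=lambda s: s["KL"] > 1000,              # node 1: AdenoCa/N/SqCCL vs PB
    group1=Node(
        in_group1=lambda s: s["WT1"] > 4000,         # node 2: AdenoCa/SqCCL vs N
        group1=Node(
            in_group1=lambda s: s["HOXA10"] > 300,   # node 3: AdenoCa vs SqCCL
            group1=Node(label="AdenoCa"),
            group2=Node(label="SqCCL"),
        ),
        group2=Node(label="N"),
    ),
    group2=Node(label="PB"),
)

print(predict(tree, {"KL": 5000, "WT1": 13000, "HOXA10": 600}))  # AdenoCa
```

In the actual analysis, each node uses its own multi-gene classifier. A sample misrouted at an early node cannot be recovered further down the tree.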
  • Results of Classification, Node 1:
  • TABLE 10
    Composition of classifier (23 genes): Sorted by p-value
    Columns: No. | Parametric p-value | t-value | % CV support | Geom mean of intensities in group 1 | Geom mean of intensities in group 2 | Gene symbol
    1   <1e−07 11.494 100 5370.6044342 241.377309 KL
    2   <1e−07 13.624 100 15595.1182874 226.4099812 HIST1H2AG
    3   <1e−07 14.042 100 15562.4306923 62.0661607 TJP2
    4   <1e−07 20.793 100 36238.4478078 169.7749739 SRGN
    5   <1e−07 8.845 92 2847.6405879 176.5970582 CDX1
    6   <1e−07 7.452 100 357.4232278 64.4047416 TNFRSF25
    7   <1e−07 6.909 97 4344.5133099 90.5259025 APC
    8   <1e−07 6.607 100 38027.3831138 10046.5061814 HIC1
    9   <1e−07 6.428 100 1605.6039019 115.3436683 APC
    10     2e−07 5.611 100 439.58106 107.9138518 GNA15
    11     2e−07 5.53 100 1828.8750958 240.5597144 ACTB
    12   2.47e−05 4.42 100 4374.5147937 335.954606 WT1
    13   3.53e−05 −4.327 100 693.9070151 2419.282873 KRT17
    14   4.73e−05 −4.251 100 3086.6035554 8432.6551975 AIM1L
    15   5.58e−05 −4.207 100 11780.3636838 25260.4242674 DPH1
    16 0.0001755 3.895 96 2120.616338 688.5899191 PITX2
    17 0.0005056 3.593 100 478.7300449 128.8159563 PITX2
    18 0.0012022 −3.332 100 167.4354555 461.2140013 KIF5B
    19 0.0015431 −3.254 100 865.090709 2041.1567322 BMP2K
    20 0.0020491 −3.164 100 10857.4258468 26743.6730071 GBP2
    21 0.0023603 3.119 100 1819.6185255 218.3422479 NHLH2
    22 0.0040506 2.941 96 614.495327 62.0661607 GDNF
    23 0.0043281 2.918 98 6929.8366248 784.5416613 BOLL
  • Results of Classification, Node 2:
  • TABLE 11
    Composition of classifier (32 genes): Sorted by p-value
    Columns: No. | Parametric p-value | t-value | % CV support | Geom mean of intensities in group 1 | Geom mean of intensities in group 2 | Gene symbol
    1   <1e−07 9.452 92 13554.5792299 1411.801824 WT1
    2   <1e−07 7.222 92 1125.7939487 85.5069135 DLX2
    3   <1e−07 6.648 69 7392.2771156 852.3852836 SALL3
    4   <1e−07 6.48 92 592.5077475 235.4746794 TERT
    5   <1e−07 6.445 92 833.6646395 274.909652 PITX2
    6   <1e−07 6.123 92 265.3043233 80.5286481 HOXA10
    7   <1e−07 6.019 92 855.6411657 112.6645794 F2R
    8   <1e−07 −5.832 92 266.6907851 2002.2457379 CPEB4
    9     4e−07 5.482 92 4609.4395265 718.3111003 NHLH2
    10     4e−07 −5.474 92 3603.9808376 10347.8149677 SMAD3
    11     5e−07 −5.391 92 1117.4212918 2993.3062317 ACTB
    12    2.8e−06 4.984 92 3941.7717994 296.6448908 HOXA1
    13    3.6e−06 4.922 92 17199.6559171 2792.0695552 BOLL
    14    5.9e−06 −4.802 92 2178.4609569 8664.280092 APC
    15   1.21e−05 4.622 92 472.6943985 96.784825 MT1G
    16   1.36e−05 4.593 69 2188.6204084 653.0580827 PENK
    17   1.97e−05 4.497 92 4044.9730493 1710.9865557 SPARC
    18   3.16e−05 −4.373 92 811.4434055 1639.128128 DNAJA4
    19   3.85e−05 4.321 92 292.869462 114.7064501 RASSF1
    20   4.28e−05 −4.293 92 189.210499 564.6573579 HLA-G
    21   4.98e−05 −4.253 92 446.1371701 1339.8173509 ERCC1
    22     6e−05 4.203 92 1158.1503785 395.6249449 ONECUT2
    23   6.58e−05 −4.178 92 1024.089614 2517.3225611 APC
    24   8.45e−05 4.11 92 701.7840426 232.2538242 ABCB1
    25 0.0002382 −3.821 92 1165.5392514 3027.5052576 ZNF573
    26 0.0003469 −3.713 92 148.6108699 360.9887854 KCNJ15
    27 0.0003582 3.704 92 4147.2987214 1818.1188972 ZDHHC11
    28 0.0012332 3.332 46 512.9098469 238.5488699 SFRP2
    29 0.0019349 3.19 92 1215.8855046 310.5592635 GDNF
    30 0.002818 −3.068 92 2261.9371454 4930.1357863 PTTG1
    31 0.0038228 −2.966 92 974.5345902 2402.9849125 SERPINI1
    32 0.0039256 2.957 90 417.3184202 208.6541481 TNFRSF10C
  • Results of Classification, Node 3:
  • TABLE 12
    Composition of classifier (2 genes): Sorted by p-value
    Columns: No. | Parametric p-value | t-value | % CV support | Geom mean of intensities in group 1 | Geom mean of intensities in group 2 | Gene symbol
    1 0.000302 3.91 40 584.5327307 158.116767 HOXA10
    2 0.0038089 3.048 46 180.3474561 67.115885 NEUROD1
  • Example 11 qPCR Validation of Biomarkers
  • Quantitative PCR assays with primers for the markers identified by microarray analysis were run on MSRE-digested DNA from the same sample groups that were analysed on the microarrays. The marker sets for SYBR Green qPCR were taken from Example 10f and Example 10d.
  • TABLE 13
    Markers used for SYBR Green qPCR:
    Columns: Unique id | Gene symbol
    Ahy_61_chr11:32411664-32412266 +_401-464 WT1
    349_hy_35-PitxA_chr4:111777754-111778067 PITX2
    Ahy_156_chr18:74841510-74841935 +_336-389 SALL3
    Ahy_265_chr5:76046889-76047178 +_134-197 F2R
    Ahy_252_chr5:1348529-1348893 +_138-187 TERT
    Ahy_289_chr7:27180142-27180796 +_181-238 HOXA10
    Ahy_233_chr3:50352877-50353278 +_108-157 RASSF1
    Ahy_257_chr5:151046476-151047183 +_57-106 SPARC
    Ahy_212_chr3:10181572-10181986 +_249-298 IRAK2
    Ahy_332_chrX:84385510-84385717 +_42-106 ZNF711
    Ahy_51_chr11:112851438-112851650 +_57-107 DRD2
    Ahy_109_chr15:76343347-76343876 +_373-428 DNAJA4
    Ahy_202_chr21:17806218-17806561 +_104-167 CXADR
    Ahy_143_chr17:7532353-7532949 +_415-476 TP53
    335_hy_4-Aktin_VL_chr7:5538506-5538805 ACTB
    Ahy_261_chr5:173247753-173248208 +_350-404 CPEB4
    Ahy_181_chr2:172672873-172673656 +_177-227 DLX2
    Ahy_30_chr1:6448693-6448938 +_57-107 TNFRSF25
    Ahy_83_chr13:32489371-32489688 +_181-245 KL
    Ahy_107_chr15:65146236-65146654 +_305-366 SMAD3
  • Negative amplifications (no Cp value generated within 45 cycles of PCR amplification with SYBR Green) were set to Cp = 45. All qPCR Cp values were then subtracted from 45.01 to obtain transformed data directly comparable to the microarray data. Thus, the higher the transformed value, the more product was generated (corresponding to a lower Cp value). Statistical testing of the transformed data was performed in the same manner as for the microarray data, using BRB-AT software.
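The Cp transformation described above can be sketched as follows. Representing a missing Cp (negative amplification) as `None` is an assumption about data layout. Under this scheme, a marker that fails to amplify in every sample of a group receives a group mean of 0.01, which is consistent with the TERT mean reported for N in Table 14.

```python
from typing import List, Optional, Sequence

MAX_CYCLES = 45.0  # samples without a Cp value after 45 cycles are assigned Cp = 45
OFFSET = 45.01     # transformed value = 45.01 - Cp

def transform_cp(cps: Sequence[Optional[float]]) -> List[float]:
    """Convert Cp values so that a higher value means more PCR product (lower Cp)."""
    out = []
    for cp in cps:
        if cp is None:  # negative amplification: no Cp value was generated
            cp = MAX_CYCLES
        out.append(OFFSET - cp)
    return out

values = transform_cp([28.5, None, 33.0])
print([round(v, 2) for v in values])  # [16.51, 0.01, 12.01]
```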
  • Class comparison and various class-prediction methods applied to the qPCR data enable correct classification of the different sample groups. The qPCR conditions were not optimised and were run under our standard conditions. Even so, the successful classification of groups with markers derived from the microarray analysis confirms the reliability of the methylation markers.
  • TABLE 14
    Nine markers from Table 13 showing significant class differences (fold changes)
    Columns: Unique id | Gene symbol | Mean of log intensities for N | Mean of log intensities for T | FoldDiff (N/T)
    Ahy_30_chr1:6448693-6448938 +_57-107 TNFRSF25 7.40354 8.5125 0.46
    Ahy_156_chr18:74841510-74841935 +_336-389 SALL3 1.59063 7.04229 0.02
    Ahy_233_chr3:50352877-50353278 +_108-157 RASSF1 5.80167 7.95708 0.22
    Ahy_252_chr5:1348529-1348893 +_138-187 TERT 0.01 1.1725 0.45
    Ahy_257_chr5:151046476-151047183 +_57-106 SPARC 11.76 14.10521 0.20
    Ahy_265_chr5:76046889-76047178 +_134-197 F2R 0.70917 4.87917 0.06
    Ahy_289_chr7:27180142-27180796 +_181-238 HOXA10 1.67708 3.88125 0.22
    Ahy_332_chrX:84385510-84385717 +_42-106 ZNF711 4.635 6.48875 0.28
    349_hy_35-PitxA_chr4:111777754-111778067 PITX2 5.48854 8.61813 0.11
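The FoldDiff column appears to equal 2 raised to the difference of the class means, that is, the ratio of the N and T geometric means when the transformed qPCR values are treated as log2 intensities. This reading is an inference from the numbers, not a statement in the source. For example, it reproduces the SALL3 fold change of 0.0228 and, through 2^1.59063 ≈ 3.01, the SALL3 geometric mean for N given in Table 16. A minimal sketch:

```python
def fold_diff(mean_log_n: float, mean_log_t: float) -> float:
    """N/T fold difference from class means of log2-scale values."""
    return 2.0 ** (mean_log_n - mean_log_t)

def geometric_mean(mean_log: float) -> float:
    """Linear-scale geometric mean corresponding to a mean of log2 values."""
    return 2.0 ** mean_log

# Mean log intensities for N and T, copied from Table 14.
TABLE_14 = {"SALL3": (1.59063, 7.04229), "F2R": (0.70917, 4.87917), "PITX2": (5.48854, 8.61813)}
for gene, (n, t) in TABLE_14.items():
    print(gene, round(fold_diff(n, t), 2))  # SALL3 0.02, F2R 0.06, PITX2 0.11
```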
  • Example 11a CLASS Prediction: TU vs Normal: p<0.01, SVM 100%, Paired Samples
  • Performance of Classifiers During Cross-Validation:
  • Classifier                               Mean percent of correct classification (n = 48)
    Compound Covariate Predictor             96
    Diagonal Linear Discriminant Analysis    98
    1-Nearest Neighbor                       94
    3-Nearest Neighbors                      94
    Nearest Centroid                         94
    Support Vector Machines                  100
  • TABLE 15
    Composition of classifier: Sorted by t-value
    Columns: No. | Parametric p-value | t-value | % CV support | Geometric mean of intensities (class N/class T) | Gene symbol
    1 1e−07 −6.184 100 0.0228499 SALL3
    2 2e−07 −6.162 100 0.1142619 PITX2
    3 4e−07 −5.879 100 0.1967986 SPARC
    4 3.5e−06 −5.254 100 0.0555527 F2R
    5 8.08e−05   −4.318 100 0.4467377 TERT
    6 0.0009183 −3.538 100 0.2244683 RASSF1
    7 0.0011335 −3.468 100 0.21701 HOXA10
    8 0.0045818 2.978 100 1.7787126 CXADR
    9 0.0012761 3.427 100 3.3134481 KL
  • Example 11b CLASS Prediction: TU vs Normal: p<0.01
  • Performance of the Support Vector Machine Classifier:
  • Class Sensitivity Specificity PPV NPV
    N 0.917 0.875 0.88 0.913
    T 0.875 0.917 0.913 0.88
  • Performance of the Bayesian Compound Covariate Classifier:
  • Class Sensitivity Specificity PPV NPV
    N 0.792 0.604 0.667 0.744
    T 0.604 0.792 0.744 0.667
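Sensitivity, specificity, PPV and NPV follow from a two-class confusion matrix, with each class in turn taken as the positive class. The sketch below assumes 48 N and 48 T samples, which is consistent with n = 48 in Example 11a and n = 96 in Example 11e. It also assumes 44 N and 42 T samples classified correctly. These counts were back-calculated and are not reported in the source, but they reproduce the N row of the Support Vector Machine table above.

```python
def one_vs_rest_metrics(tp: int, fn: int, fp: int, tn: int) -> dict:
    """Standard diagnostic metrics for one class treated as 'positive'."""
    return {
        "sensitivity": tp / (tp + fn),  # fraction of true positives detected
        "specificity": tn / (tn + fp),  # fraction of true negatives rejected
        "PPV": tp / (tp + fp),          # positive predictive value
        "NPV": tn / (tn + fn),          # negative predictive value
    }

# Assumed counts with N as the positive class: TP=44, FN=4, FP=6, TN=42.
m = one_vs_rest_metrics(tp=44, fn=4, fp=6, tn=42)
print({k: round(v, 3) for k, v in m.items()})
# {'sensitivity': 0.917, 'specificity': 0.875, 'PPV': 0.88, 'NPV': 0.913}
```

Swapping the roles (T positive: TP=42, FN=6, FP=4, TN=44) gives the mirrored T row. This is why the two rows of each two-class table are transposes of each other.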
  • TABLE 16
    Composition of classifier: Sorted by t-value
    Class 1: N; Class 2: T.
    Columns: No. | Parametric p-value | t-value | % CV support | Geom mean of intensities in class 1 | Geom mean of intensities in class 2 | Fold-change | Unique id | Gene symbol
    1 <1e−07 −6.713 100 3.011798 131.8077746 0.0228499 Ahy_156_chr18:74841510-74841935 +_336-389 SALL3
    2 <1e−07 −6.491 100 3468.2688243 17623.4446406 0.1967986 Ahy_257_chr5:151046476-151047183 +_57-106 SPARC
    3 <1e−07 −6.208 100 44.8968301 392.9290497 0.1142619 349_hy_35-PitxA_chr4:111777754-111778067 PITX2
    4 1e−06 −5.248 100 1.6348595 29.429 0.0555527 Ahy_265_chr5:76046889-76047178 +_134-197 F2R
    5 3.91e−05 −4.318 100 1.0069555 2.2540195 0.4467377 Ahy_252_chr5:1348529-1348893 +_138-187 TERT
    6 0.0003748 −3.691 100 55.7796365 248.4967761 0.2244683 Ahy_233_chr3:50352877-50353278 +_108-157 RASSF1
    7 0.0009309 −3.419 100 3.1978081 14.7357642 0.21701 Ahy_289_chr7:27180142-27180796 +_181-238 HOXA10
    8 0.0009772 3.404 100 3114.5146028 939.9618007 3.3134481 Ahy_83_chr13:32489371-32489688 +_181-245 KL
  • TABLE 16b
    Prediction rule from the linear predictors
    Gene weights. Columns: No. | Unique id | Compound Covariate Predictor | Diagonal Linear Discriminant Analysis | Support Vector Machines
    1 Ahy_83_chr13:32489371-32489688 +_181-245 3.4041 0.2794 1.2796
    2 Ahy_156_chr18:74841510-74841935 +_336-389 −6.7126 −0.3444 −0.2136
    3 Ahy_233_chr3:50352877-50353278 +_108-157 −3.6907 −0.2633 0.0512
    4 Ahy_252_chr5:1348529-1348893 +_138-187 −4.3175 −0.6681 −1.1674
    5 Ahy_257_chr5:151046476-151047183 +_57-106 −6.4911 −0.7486 −0.7093
    6 Ahy_265_chr5:76046889-76047178 +_134-197 −5.2477 −0.2752 −0.0135
    7 Ahy_289_chr7:27180142-27180796 +_181-238 −3.419 −0.221 −0.3187
    8 349_hy_35-PitxA_chr4:111777754-111778067 −6.2083 −0.5132 −0.353
  • The prediction rule is defined by the inner product of the gene weights ($w_i$) and the expression values ($x_i$) of the significant genes, $\sum_i w_i x_i$. The expression values are log ratios for dual-channel data and log intensities for single-channel data.
  • A sample is classified as class N if this sum is greater than the threshold, that is, if $\sum_i w_i x_i > \text{threshold}$.
  • The threshold for the Compound Covariate predictor is −172.255
    The threshold for the Diagonal Linear Discriminant predictor is −15.376
    The threshold for the Support Vector Machine predictor is 0.838
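The following is a minimal sketch of applying this rule with the gene weights of Table 16b and the three thresholds above. It assumes the $x_i$ are log2-scale values (the transformed qPCR values), which is consistent with the fold changes and geometric means in Tables 14 and 16. Genes are keyed by symbol rather than unique id for brevity. The example inputs are the class geometric means of Table 16 converted to log2, not individual patient samples.

```python
import math

# Gene weights from Table 16b: (Compound Covariate, Diagonal LDA, SVM).
WEIGHTS = {
    "KL":     (3.4041, 0.2794, 1.2796),
    "SALL3":  (-6.7126, -0.3444, -0.2136),
    "RASSF1": (-3.6907, -0.2633, 0.0512),
    "TERT":   (-4.3175, -0.6681, -1.1674),
    "SPARC":  (-6.4911, -0.7486, -0.7093),
    "F2R":    (-5.2477, -0.2752, -0.0135),
    "HOXA10": (-3.419, -0.221, -0.3187),
    "PITX2":  (-6.2083, -0.5132, -0.353),
}
THRESHOLDS = {"CCP": -172.255, "DLDA": -15.376, "SVM": 0.838}
COLUMN = {"CCP": 0, "DLDA": 1, "SVM": 2}

def classify(x, predictor="DLDA"):
    """Return 'N' if sum_i w_i * x_i exceeds the predictor's threshold, otherwise 'T'."""
    col = COLUMN[predictor]
    score = sum(w[col] * x[gene] for gene, w in WEIGHTS.items())
    return "N" if score > THRESHOLDS[predictor] else "T"

# Class geometric means from Table 16, converted to log2 (assumed scale of x_i).
GEO_N = {"KL": 3114.5146028, "SALL3": 3.011798, "RASSF1": 55.7796365, "TERT": 1.0069555,
         "SPARC": 3468.2688243, "F2R": 1.6348595, "HOXA10": 3.1978081, "PITX2": 44.8968301}
GEO_T = {"KL": 939.9618007, "SALL3": 131.8077746, "RASSF1": 248.4967761, "TERT": 2.2540195,
         "SPARC": 17623.4446406, "F2R": 29.429, "HOXA10": 14.7357642, "PITX2": 392.9290497}
x_n = {g: math.log2(v) for g, v in GEO_N.items()}
x_t = {g: math.log2(v) for g, v in GEO_T.items()}
for p in ("CCP", "DLDA", "SVM"):
    print(p, classify(x_n, p), classify(x_t, p))  # each line: <predictor> N T
```

Because the weights for markers that are hypermethylated in tumours are negative, higher tumour-associated signal lowers the sum and pushes a sample below the threshold, towards T.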
  • Example 11c Recursive Feature Elimination (n=10) Prediction: TU vs Normal 98% Correct, Paired Samples
  • TABLE 17
    Composition of classifier: Sorted by t-value
    Columns: No. | Parametric p-value | t-value | % CV support | Geometric mean of intensities (class N/class T) | Gene symbol
    1 1e−07 −6.184 100 0.0228499 SALL3
    2 2e−07 −6.162 100 0.1142619 PITX2
    3 4e−07 −5.879 100 0.1967986 SPARC
    4 3.5e−06 −5.254 100 0.0555527 F2R
    5 0.0011335 −3.468 100 0.21701 HOXA10
    6 0.0188086 −2.434 92 0.5671786 DRD2
    7 0.3539709 0.936 94 1.2886257 ACTB
    8 0.1083921 1.637 100 1.8305684 DNAJA4
    9 0.0045818 2.978 98 1.7787126 CXADR
    10 0.0012761 3.427 100 3.3134481 KL
  • Example 11d Greedy Pairs (6) Prediction: TU vs Normal: 88% SVM, Unpaired Samples
  • Performance of the Support Vector Machine Classifier:
  • Class Sensitivity Specificity PPV NPV
    N 0.896 0.854 0.86 0.891
    T 0.854 0.896 0.891 0.86
  • Performance of the Bayesian Compound Covariate Classifier:
  • Class Sensitivity Specificity PPV NPV
    N 0.812 0.604 0.672 0.763
    T 0.604 0.812 0.763 0.672
  • TABLE 18
    Composition of classifier: Sorted by t-value (sorted by gene pairs)
    Class 1: N; Class 2: T.
    Columns: No. | Parametric p-value | t-value | % CV support | Geom mean of intensities in class 1 | Geom mean of intensities in class 2 | Fold-change | Gene symbol
    1   <1e−07 −6.713 100 3.011798 131.8077746 0.0228499 SALL3
    2   <1e−07 −6.491 100 3468.2688243 17623.4446406 0.1967986 SPARC
    3   <1e−07 −6.208 100 44.8968301 392.9290497 0.1142619 PITX2
    4     1e−06 −5.248 100 1.6348595 29.429 0.0555527 F2R
    5   3.91e−05 −4.318 100 1.0069555 2.2540195 0.4467377 TERT
    6 0.0003748 −3.691 100 55.7796365 248.4967761 0.2244683 RASSF1
    7 0.0009309 −3.419 100 3.1978081 14.7357642 0.21701 HOXA10
    8 0.0137274 −2.512 100 169.3121483 365.1891236 0.4636287 TNFRSF25
    9 0.1465343 1.464 98 4255.1669082 2324.5057894 1.8305684 DNAJA4
    10 0.1463194 1.465 50 326.8534389 203.1873409 1.6086309 TP53
    11 0.0176345 2.416 100 2588.5288498 1455.2822633 1.7787126 CXADR
    12 0.0009772 3.404 100 3114.5146028 939.9618007 3.3134481 KL

    The cross-validated ROC curve of the Bayesian Compound Covariate Predictor is shown in FIG. 1. The area under the curve is 0.944.
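The area under an ROC curve equals the probability that a randomly chosen positive sample scores higher than a randomly chosen negative one, with ties counted as one half. This is the normalised Mann-Whitney U statistic. The sketch below shows that standard estimate on hypothetical scores. It does not reproduce the 0.944 reported above, which was computed by the analysis software from the cross-validated predictions.

```python
from typing import Sequence

def auc(pos_scores: Sequence[float], neg_scores: Sequence[float]) -> float:
    """Rank-based AUC: share of (positive, negative) pairs ordered correctly; ties count 1/2."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))

# Hypothetical cross-validated scores, not data from this document.
print(auc([0.9, 0.8, 0.3], [0.1, 0.4, 0.2]))
```

Here 8 of the 9 positive/negative pairs are ordered correctly, so the estimate is 8/9.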
  • Example 11e CLASS Prediction: Histology: p<0.05 Using all qPCRs for Class Prediction Analysis of Tumor-Subtype Versus Normal Lung Tissue
  • TABLE 19
    Composition of classifier: Sorted by p-value
    Class 1: AdenoCa; Class 2: N; Class 3: SqCCL.
    Columns: No. | Parametric p-value | t-value | % CV support | Geom mean of intensities in class 1 | Geom mean of intensities in class 2 | Geom mean of intensities in class 3 | Gene symbol
    1   <1e−07 23.305 100 11832.9848147 3468.2688243 22878.8045137 SPARC
    2   <1e−07 22.546 100 98.6115161 3.011798 159.4048479 SALL3
    3     1e−07 19.146 100 7.6044403 1.6348595 71.4209691 F2R
    4     1e−07 19.124 100 359.9316118 44.8968301 416.1715345 PITX2
    5   2.81e−05 11.753 100 90.8736104 55.7796365 480.3462809 RASSF1
    6   3.15e−05 11.611 100 48.8581148 3.1978081 6.7191365 HOXA10
    7 0.0001543 9.66 100 1.9602703 1.0069555 2.4699516 TERT
    8 0.0042218 5.802 100 1047.8074626 3114.5146028 875.3966524 KL
    9 0.0233243 3.914 100 263.7738716 169.3121483 451.9439364 TNFRSF25
  • Performance of Classifiers During Cross-Validation
  • Mean Percent of Correct Classification, n=96:
  • Classifier                               Mean percent of correct classification
    Diagonal Linear Discriminant Analysis    72
    1-Nearest Neighbor                       74
    3-Nearest Neighbors                      74
    Nearest Centroid                         72
  • Example 11f Bintree Prediction: Histology: p<0.05, Unpaired Samples, “Compound Covariate Classifier”
  • Optimal Binary Tree: Cross-Validation Error Rates for a Fixed Tree Structure Shown Below
  • Node   Group 1 classes      Group 2 classes   Misclassification rate (%)
    1      AdenoCa, SqCCL       N                 14.6
    2      AdenoCa              SqCCL             31.2
  • Results of Classification, Node 1:
  • TABLE 20
    Composition of classifier (10 genes): Sorted by p-value
    Columns: No. | Parametric p-value | t-value | % CV support | Geom mean of intensities in group 1 | Geom mean of intensities in group 2 | Gene symbol
    1   <1e−07 6.713 100 131.8077753 3.011798 SALL3
    2   <1e−07 6.491 100 17623.4448347 3468.2687994 SPARC
    3   <1e−07 6.208 100 392.9290438 44.8968296 PITX2
    4     1e−06 5.248 100 29.4290011 1.6348595 F2R
    5   3.91e−05 4.317 100 2.2540195 1.0069556 TERT
    6 0.0003748 3.691 100 248.4967776 55.779638 RASSF1
    7 0.0009309 3.419 100 14.7357644 3.197808 HOXA10
    8 0.0009772 −3.404 100 939.9618108 3114.5147006 KL
    9 0.0137274 2.511 100 365.1891266 169.3121466 TNFRSF25
    10 0.0176345 −2.416 100 1455.2823102 2588.528822 CXADR
  • Results of Classification, Node 2:
  • TABLE 21
    Composition of classifier (3 genes): Sorted by p-value
    Columns: No. | Parametric p-value | t-value | % CV support | Geom mean of intensities in group 1 | Geom mean of intensities in group 2 | Gene symbol
    1 0.0058346 2.892 50 48.8581156 6.7191366 HOXA10
    2 0.0253305 −2.312 50 90.8736092 480.3462899 RASSF1
    3 0.0330755 −2.197 49 7.6044405 71.4209719 F2R
  • SEQUENCE LISTING
    SEQ ID NO: DNA-SEQUENCE
    1 CGGCCGGTCAGGAATCCCCATCCTGGAGCGCAGGCGGAGAGCCAGTGGCT
    2 CCAAAAAAGGTGACACTGCCCCCTCCCAGTGGCTCCATGCTCCTCAGCTATGGCTGTCCGGGCC
    3 CGCCCCGCCCCCGCCAACAACCGCCGCTCTGATTGGCCCGGCGCTTGTCTCTT
    4 AGCGGCCTCAGCCTGCGCACCCCAGGAGCGTGGATGACTACGGCCACCCC
    5 GCAGCCGAGAGGGTCAGGCCCCCATAGGTCCTCAGCCTGCTTCAACCTCAAAGGGGATGGGGG
    6 TCCTGGCAGCATTACCACACTGCTCACCTGTGAAGCAATCTTCCGGAGACAGGGCCAAAGGGCCA
    7 CTGACAAGAGACATGCAGGGCTGAGAGGCAGCTCCTTTTTATAGCGGTTAGGCTTGGCCAGCTGC
    8 TGGCATCCACTTGCTTGATCCAGCCAGATTCCCACTCCCATGCCCTCTCCACTATTGCGATTGC
    9 CTGCTTCGTGCCCTCTGGTGGCTAAGGCGTGTCATTGCAGTGCCGGCCTCCTGTCATCCTCC
    10 CCGGCGCACTCCGACTCCGAGCAGTCTCTGTCCTTCGACCCGAGCCCCGC
    11 TAGGTGGTGAGTTACTTGGCTCGGAGCGGGCGAGGGGACGCGTGGGCGGAGCG
    12 AACCACCTGATCAAGGAAAAGGAAGGCACAGCGGAGCGCAGAGTGAGAACCACCAACCGAGGCGC
    13 CGGGGGTAGGCTTTGCTGTCTGAGGGCGTCTGGCTGTGGAGCTGAAGGAGGCGCTGCTGAG
    14 GCCCCGCATCCCTAATGAGGGAATGAATGGAGAGGCCCCCTCGGCTGGCGCCC
    15 CGGGGCCACGCGCTAAGGGCCCGAACTTGGCAGCTGACCGTCCCGGACAG
    16 CCACCGAACACGCCGCACCGGCCACCGCCGTTCCCTGATAGATTGCTGATGC
    17 GAACTGGGTCGTGGAAGGATCGCGGGGAGCGGCCCTCAGGCCTTCGGCCTCACT
    18 CCAGCACTTTGGGAGGCCGAGGCGGGCGGATCACGAGGTCAGGAGATCGAGACCATCCT
    19 GGCGGCTGGTGCTTGGGTCTACGGGAATACGCATAACAGCGGCCGTCAGGGCGCC
    20 TCAGATTCCTCAGGGCCGCAGAGGTGTGGAGCTGGTTGGGCCGGTTCTTCACCCTCCTCCC
    21 CTGGCCGAGGTGGCCACCGGTGACGACAAGAAGCGCATCATTGACTCAGCCCGG
    22 CTAACCTTCCTCGCCGCCTTCCTGCGGGTGACCCCCAAACGCCCCAGCTCCGC
    23 CCGACTTGGACGCGGCCAGCTGGAGAGGCGGAGCGCCGGGAGGAGACCTT
    24 ACAGAGTCGGCACCGGCGTCCCCAGCTCTGCCGAAGATCGCGGTCGGGTC
    25 GGGGATGGAGAACTCTCCTCGCTTCGTCCTCTCTCCCGGGGAATCCCTAACCCCGCACTGCG
    26 GTGGCTCGGGTCCACCCGGGCTGCGAGCCGGCAGCACAGGCCAATAGGCAATTAG
    27 CTCACCCCGCGACTTACCCCACACCCCGCTCTCCAGAACCCCCATATGGGCGCTCACC
    28 ACACACCACTGCAGCGTTCAAACGCTGGGAAGAAGACTCCCTTGTGGCACCGGAAACCCACGAGG
    29 CCGCCACGAACTTGGGGTGCAGCCGATAGCGCTCGCGGAAGAGCCGCCTC
    30 CTCCATAGCCCTCCGACGGGCGCCCAGGGGCTTCCCGGCTCCGTGCTCTCT
    31 TGGACACCCCAAGAGCTCACTCCTCCGCGGTTTTATATTCCGACTTGCGCACAGGAGCGGGGTGC
    32 CCGCCCGTTTCAGCGGCGCAGCTTCTGTAGTTGGGCTACTGGAGGGGTCGCTCAGAAACCTCA
    33 AACCCAGGCTTGTCAGCCTAAGAACACGGGATCTCTTCACTGTGGTTCATGTGTAGAGTGGAGTTTCCA
    34 CAGTCCCCTGCCGTGCGCTCGCATTCCTCAGCCCTTGGGTGGTCCATGGGA
    35 CAGGTGGGCGTCTCAGGGGTGGGAGTGGCCGCGTCGTGAAGCGGAGAGAGGA
    36 CTGCAAGGGATGACTCACCCCAGTGATTCAACCGCGCCACCGAGCGCGGAGCTG
    37 TTGTATGGATTTCGCCCAGGGGAAAGCGCTCCAACGCGCGGTGCAAACGGAAGCCACTG
    38 GAGGACCAGGGCCGGCGTGCCGGCGTCCAGCGAGGATGCGCAGACTGCCT
    39 ACGCACCGCGGCTCCTCGCGTCCAGCCGCGGCCAAGGAAGTTACTACTCGCCCAAAT
    40 CGCTGCCTCGCCATTGGGCGGCCGAACGCAGCCACGTCCAATCAGAGGAGT
    41 GAGGTTCTGGGGACCGGGAGAGTGGCCACCTTCTTCCTCCTCGCGAAGAGCAGGCCGGG
    42 AGTGGGATTGGGGCACTTGGGGCGCTCGGGGCCTGCGTCGGATACTCGGGTC
    43 TCAAGCCGCCTCAGGTGAGCGCTCCTTGGCGCTACTTCCGGTCTCAGGTGAGGCCGC
    44 TTGTGACGTGTGTTCTGGGCAGGGTTTGAGGTTTTGGAACATTTTCTAAAAGGGACAGAGAGCACCCTGC
    45 CGGGGGGAGAAGTCCTGGAGCGGGTTTGGGTTGCAGTTTCCTTGTGCCGGGGATCCTGTCC
    46 GAGGATTATTCGTCTTCTCCCCATTCCGCTGCCGCCGCTGCCAGGCCTCTGGCTGCTGAGG
    47 CCCTCTCTCCCCTGGCCCGCAAAGTTTTGGCGGAGCCATCGCTGGGGCTGAGC
    48 CCAGGGGGAACTTGTGGCAGTGCAGCATCTCAGGCCAGGGGAAGCCGTAGGCCTCCATGA
    49 CGCCACCCAGAGCCCGAGGTTTGCCCTTCAGAAGCGGACCCGCAGACTCCTCGGACT
    50 CGCCGAAATGAAACCCGCCTCCGTTCGCCTTCGGAACTGTCGTCACTTCCGTCCTCAGACTTGGA
    51 TCCCTTGTTTTGAGGCGGGAACGCAACCCTCGACCGCCCACTGCGCTCCCA
    52 GGCAGCCGGGAAATCCCGTGTCCCCACTCGTGGCAGAGGACGCTGTGGGG
    53 CCCCCACAGTTTTCATGTGATCAGGAATTCAGCATAGGCTATAAGACGGAGTGCTCCATGTCAA
    54 GGGGTTGTCATGGCAGCAGCTCCATCCCTGACCGCCACTTTCTCCCGGTGCCG
    55 AAGTTCCGCCAGTGCACAGCAACCAATGGGCGGAGGGGTCCTTTGCCCCTGGGTTGC
    56 AGTTGGGCCGGATCAGCTGACCCGCGTGTTTGCACCCGGACCGGTCACGTG
    57 GGGCCGCTGCCTACTGTGGGCCTGCAAGGCGTGCAAGCGCAAGACCACCA
    58 ACCTCCCTGCTGCGTGTCGCAAACCGAACAGCGGGCGTTGGCCCTCCTGC
    59 GGGACCCGGAGCTCCAGGCTGCGCCTTGCGCCCGGGTCAGACATTATTTAGCTCTTCGGTTGAGC
    60 GGCCGTGCGGGGCTCACCGGAGATCAGAGGCCCGGACAGCTTCTTGATCGCC
    61 CCACTGCCTGCGGTAGAACCTGGTCCCGCATAGCTTGGACTCGGATAAGTCAAGTTCTCTTCCA
    62 GGGCCGCAGGCCCCTGAGGAGCGATGACGGAATATAAGCTGGTGGTGGTGGGC
    63 GCAGGACCCGGATGAGAGCGCACGCTTCGGGGTCTCCGGGAAGTCGCGGC
    64 AAGAGGGAAAGGCTTCCCCGGCCAGCTGCGCGGCGACTCCGGGGACTCCAG
    65 AGGGATGGCTTTTGGGCTCTGCCCCTCGCTGCTCCCGGCGTTTGGCGCCC
    66 GCCCGCTCTCGGGTGACTCCGCAACCTGTCGCTCAGGTTCCTCCTCTCCCGGCC
    67 TGCTGGACATCCACCGCCTCCAGGCAGTTTCGCCGTCACACCGTCGCCATCTGTAGC
    68 GGCCGCGAAGCGACTCCGATCCTCCCTCTGAGCCTTGCTCAGCTCTGCCCCGC
    69 CGCGCGTTCGCTGCCTCCTCAGCTCCAGGATGATCGGCCAGAAGACGCTCTACTCCTTTTTCTCC
    70 CGGGGGCGGAGGAAACACCTATGAACCCTCCGGCAGCCTTCCTTGCCGGGCG
    71 AGGGCCAGCCCTTGGGGGCTCCCAGATGGGGCGTCCACGTGACCCACTGC
    72 GTGAAAGGTCGGCGAAAGAGGAGTAAAGACGGCGAGACGCGTCCACGCAGGGGGAGTCTGTGCG
    73 GCGCTGAGGTGCAGCGCACGGGGCTTCACCTGCAACGTGTCGATTGGACG
    74 GAGGCCTCATGCCTCCGGGGAAAGGAAGGGGTGGTGGTGTTTGCGCAGGGGGAGC
    75 CGAAGTGGAAACCGGAGTTGCGTCATTGCTCCCACCCGATATCACCTTGGCAGCGACCGCG
    76 ATGGGGTGCTCATCTTCCTGGAGCTGAGGAGCTGGGACGGGCATGGGGTGCTCATCCTCCTG
    77 TTCCAGCCGGTGATTGCAATGGACACCGAACTGCTGCGACAACAGAGACGCTACAACTCACCGCG
    78 CAGAGAAGACTCACGCAGTGAGCAGTCCGCAAGCCCGCTGGCGGCAGCGGC
    79 GACACACCCACCTCAGCAGATCTCAGCCCATCCCTCCCAGCTCAGTGCACTCACCCAACCCCAC
    80 CGGAGTGCTGCAAGCGCAGAAAATATACGTCATGTGCGGAGGCGGAGCTTCCGCCCTGCG
    81 GGCCCAAGGACGTGTGTTGGTCCAGCCCCCCGGTTCCCCGAGACCCACGC
    82 ACCTCTGGAGCGGGTTAGTGGTGGTGGTAGTGGGTTGGGACGAGCGCGTCTTCCGCAGTCCCA
    83 CCCTTGGAAGGCGTGGAATTAGGAGAGAAATCCCTTAGTGGGCACACGAGTGAGTGCCCCTTGGA
    84 CCGGCCGCCTCCCAGGCTGGAATCCCTCGACACTTGGTCCTTCCCGCCCC
    85 TGCGTGGGTCGCCTCGCGTCTCTCTCTCCCACCCCACCTCTGAGATTTCTTGCCAGCACC
    86 GACTTCGCGTCGCCCTTCCACGAGCGCCACTTCCACTACGAGGAGCACCTGGAGCGCAT
    87 GAGGCTGCGAGCCTGGGCTCCCAGGGAGTTCGACTGGCAGAGGCGGGTGCAG
    88 CCATTCTCCTGCCTCAGCCTCCCAAGTAGCTGGGACTACAGGCGCCTGCCACCACTCCCGGC
    89 CAGGGGACGTTGAAATTATTTTTGTAACGGGAGTCGGGAGAGGACGGGGCGTGCCCCGACGTGCG
    90 ACCCTGGAACGACGCCAAACGCGACCCCTACCAGAGGACTCGCGCATGCGCAGC
    91 GTTCCCAAAGGGTTTCTGCAGTTTCACGGAGCTTTTCACATTCCACTCGG
    92 GAAAGACACCGCGGAACTCCCGCGAGCGGAGACCCGCCAAGGCCCCTCCAG
    93 CCCTCTCCGCCCCAAACAGCTCCCCACTCCCCCAGCCTGCCCCCACCCTC
    94 ATTGGGGCTACACTCACCACAAGAGCAGCAAACAAAGCACTGGGTGTGGTAGAGGCTGTCCAGGG
    95 CCCAGCGGGGCCCTTAGCAGAGCCTCTCCAATCCTCGGCGCCTCCCCTACACAGGGTTCG
    96 GCGCCCAAGGCCCTGCTTCTTCCCCCTTCCTCTTCCCCTTGCCCAGCCGCGACTTC
    97 CCCAGCCGAGCAGGGGGAAGCATCCCCAGCTCCCGCACCCAAGTCCCTGG
    98 GCCGCCACCTGTTGAGGAAAGCGAGCGCACCTCCTGCAGCTCAGGCTCCGGG
    99 CGGGAGCGGATTGGGTCTGGGAGTTCCCAGAGGCGGCTATAAGAACCGGGAACTGGGCGCG
    100 GGCGGGGAAGCGTATGTGCGTGATGGGGAGTCCGGGCAAGCCAGGAAGGCACC
    101 GGAGCCCGCAGTGCGTGCGAGGGGCTCTCGGCAGGTCCAGACGCCTCGCC
    102 CGCATCCGGCTCCGAAAGCTGCGCGCAGCCATCATCAGGGCCCTTCTGGTGTT
    103 GCCGCTGCCAGTCGACTCAACCACCGGAGTGGCCCCTGCAGTTGGATAGCAACGAGAATCCTCC
    104 GGCAGGAAAGGGCCCGAAGGCAGCGAAGGCGAACGCGGCGCACCAACCTG
    105 ACAGGGTCTTCCCACCCACAGGGCACCCAGGCGCAGCGGAGCCAGGAGGG
    106 ACCAGCCGCACAACTTTTGAAGGCTCGCCGGCCCATGTGGGGTCTTTCTGGCGGC
    107 CAGCCGGGCAGATAACAAAACACACCCCAAAGTGGGCCTCGCATCGGCCCTCGCATTCCTGT
    108 GGCCTCGACGCCGAGGGGTGTCCCTCTCCTCTCCTGGTCAGGGAACGCAGCAACTGA
    109 GGGCGGCAGTCAGAGCTGGAGCTCCGGGGAATCAGACGGGCAGCCAAAGGAGCAGA
    110 CGGAAGTGCCCCGGTCCTGGAGGGGGTGGAAGTTGGGGAGCCCAGGCAGGA
    111 CCGAGAGGGAAGAAAAAAATACCCTCTTTGGGCCAGGCACGGTGGCTCACCCCTGTAATCCCAGC
    112 TCCCAGCACTTTGGGAGGCTGAGGCGAGCGGATCACGAGATCAGAAGATCGAGACCATCCTGGC
    113 CCCCGGGACCGGATAACGCCCTAAATCAGCGCAGCTGAGGCGAGGCCGTGGCC
    114 CTCGCGACCCCGGCTCCGGGCCTCTGCCGACCTCAGGGGCAGGAAAGAGTC
    115 CCCGAGGCTCGCCCGACTCCTGGCTGCCCTGGACTCCCCTCCCTCCTCCCT
    116 CTCCAGCTGCACTGCCACCCAGCCTGCCTGGTGCTGGTGCTCAACACGCAGC
    117 CCGGCCTTTCCGCCAGAGGGCGGCACAGAACTACAACTCCCAGCAAGCTCCCAAGGCG
    118 GGGAAGGAGCCTCAGCTCCGCTCCAGGTCCTCCACCAGGTAGGACTGGGACTCCCTTAGGGCCTG
    119 GGGAGTGTCCTCCTCCGGGACAGCCGGACTCCCGCCGACTTCTGGGCGGC
    120 GGGGAGCGTGCGGGGTCGCCACCATCGGGACCCCCAGAGGAGAGAGGACTTG
    121 GACAGATGCAGTGCGTGCGCCGGAGCCCAAGCGCACAAACGGAAAGAGCGGG
    122 TCCTTTGCGTCCGGCCCTCTTTCCCCTGACCATAAAAGCAGCCGCTGGCTGCTGGGCC
    123 TGCGGCTTCTCTCACCCTGCCAGGCCTTCCCAGCTTCCCTGAGGTTGCCTGCTACACCCG
    124 GCCCCAGCCCTGCGCCCCTTCCTCTCCCGTCGTCACCGCTTCCCTTCTTCCA
    125 CCCGCACCCCTATTGTCCAGCCAGCTGGAGCTCCGGCCAGATCCCGGGCTG
    126 GCAGAGTTCGTGCAGGGAGTTCGCACATAGGAGAGCACCGGTCCGGGAGTGCCAGGCTCG
    127 CGGCCGGTGTGTGTCCCCGCAGGAGAGTGTGCTGGGCAGACGATGCTGGAC
    128 TTTTTGGGACAACCATGGAGGGGTCCTCCGTCTCGGCCTCTTCGCATATCCCCCTCCGTGATCC
    129 CGGCGGGTCAGATCTCGCTCCCTTTCGGACAACTTACCTCGGAGAGGAGTCAAGGGGAGAGGGGA
    130 CCCGGACGAGCTCTCCTATCCCGAAGTTGTGGACAGTCGAGACGCTCAGGGCAGCCGGGC
    131 CGGCCGGTGGAGGGGGGAAGGGAGGAATGGTGTCAGGGGCGGATATCTGAGCCCTGAG
    132 CACCAAAGCCACCACCCAAGCCAGCACCAAGGCCACCACCATATCCTCCCCCAAAGCCACTACCA
    133 CCGCCAGGCCCGCTGGGTGGAATGTGGTCATGTTTCAGACTGCCGATGGCTTCCA
    134 CCTGTCCGGATCCCTCCCCGCCTTGCTCAGATCTCTGGTTCGCGGAGCTCCGAGGC
    135 GCGCAGGGGCCCAGTTATCTGAGAAACCCCACAGCCTGTCCCCCGTCCAGGAAGTCTCAGCGAG
    136 TCCTGCCCCAGTAAGCGTTGGACCGGGAGACGCAGTGCTCAGCATCGGTCAGCAGGG
    137 GCGCCGAGGAGTCGGGACAGCCCCGGAGCTTCATGCGGCTCAACGACCTG
    138 GGCCCCAGCGGAGACTCGGCAGGGCTCAGGTTTCCTGGACCGGATGACTGACCTGAGC
    139 CGCCGGCTGCGAAGTTGAGCGAAAAGTTTGAGGCCGGAGGGAGCGAGGCCGG
    140 GGAGCCGCTTGGCCTCCTCCACGAAGGGCCGCTTCTCGTCCTCGTCCAGCAGC
    141 AAATGTGGAGCCAAACAATAACAGGGCTGCCGGGCCTCTCAGATTGCGACGGTCCTCCTCGGCC
    142 CCTCTCAGATTGCGACGGTCCTCCTCGGCCTGGCGGGCAAACCCCTGGTTTAGCACTTCTCA
    143 TCTCCCCACGCTTCCCCGATGAATAAAAATGCGGACTCTGAACTGATGCCACCGCCTCCCGA
    144 GCCCAATCGGAAGGTGGACCGAAATCCCGCGACAGCAAGAGGCCCGTAGCGACCCG
    145 CGTGGGGGGCTGTTTCCCGTCTGTCCAGCCGCGCCCACTTCTCAGGCCCAAAG
    146 GGGGCCCTCGTGTTGCTGAACGAGGGCGGGTTCGCGATGTAAATAAGCCCAGAGGTGGGGTC
    147 CCTGGGTCCCCTCGGCTCTCGGAAGAAAAACCAACAGCATCTCCAGCTCTCGCGCGGAATTGTC
    148 CATAAGATGCCCTCCTGCGGGCCCTCACCTTTTGACACTGCCTCCCACCGCACTGGGGTCAA
    149 ATCCCGCTGCACCACGCCATGAGCATGTCCTGCGACTCGTCTCCGCCTGGC
    150 GCGCGGTGAAGGGCGTCAGGTGCAGCTGGCTGGACATCTCGGCGAAGTCG
    151 CATTTCTTTCAATTGTGGACAAGCTGCCAAGAGGCTTGAGTAGGAGAGGAGTGCCGCCGAGGCGG
    152 AAGTTCACTGAGGGTTGTAAGAGTCAGAATGGACTCCATGGAAGTTATGGGGTGTGAATCAAACCTCACA
    153 CAGCACTTTGGGAGGCCGAGGTGGGCGGATTGCCTGAGGTCAGGAGTTTGAGACCAGCCTGG
    154 GGGCAACACACACAGCAGCGACAGCCGGGAGGTAAGCCGCGTCCCAGCGG
    155 CTGAGGGGAGGAGAAACTGGGCTGCGGGGGTCCGGGAGGGTGGATTCCGAGAAACTATGTGCCC
    156 GTGTCCCAGCGCGTTGACGCAGCCTGTGATCCCTCGCGAGGCGAGGAGAAGGTC
    157 AACCCCGACCTCAGGTGATCTGCCCAAAAGTGCTGGGATTACAGGCGTCAGCCACCGCGCC
    158 AGGACGAAGTTGACCCTGACCGGGCCGTCTCCCAGTTCTGAGGCCCGGGTCCCACTGGAACT
    159 GGAGACGCGTTGCCTTCGGCCGGGACCACTGCACCTGCCCGCGTGGGTAAT
    160 CACAAAGGCCAAGGAGGGAGTGCGCAGGTCACGTGCGCCGGTGGTCAGCG
    161 CTGACCTGGCGCTGCTGCCCCTGGTGCCTGACGGAGGATGAGAAGGCCGCC
    162 AAAAGTGGCTCGGAACCCCAAATCCCGGTTAGATTGCAGGCACCGCCGGACGCTGGCTCCC
    163 GTTCTGTTGGGGGCGAGGCCCGCGCAAGCCCCGCCTCTTCCCCGGCACCAG
    164 GCGTCGACACTGCGCAAGCCCAGTCGCGCCTCTCCAGAGCGGGAAGAGCG
    165 TGTCTGAGTATTGATCGAACCCAGGAGTTCGAGATCAGCTTGAGCAAGATAGCGAGAACCCCCGC
    166 GAAAGACTGCAGAGGGATCGAGGCGGCCCACTGCCAGCACGGCCAGCGTGG
    167 TTAGAGTCCCCTGGGTGTGTGCCCCGCAGAGGGAGCTCTGGCCTCAGTGCCCAGTGTGC
    168 TTAGAGTCCCCTGGGTGTGTGCCCCGCAGAGGGAGCTCTGGCCTCAGTGCCCAGTGTGC
    169 GGGGACGAGCAGGAAAAGGCCGGGGTGGGGGTGGAATTCCTCGGCGGGCAG
    170 GGGAGCCTGAGGCAGGAGAATCGCTTGAATCCGGGAGGCGGAGGTTGCAGTAAGCCGAGATCGC
    171 CTTTCGGAGGCCTCATTGGCTGAAGGTCGCCGTCGCCCAACGCAGGCCATTCTGGGT
    172 CCTCCTGGGGTCAAGTGATCATCCTGGCTCAACCACCCAAGTAGCCGGGACTACGGGTGGCCGC
    173 CCAATGCCCCAACGCAGGCCACCCCCGGCTCCTCTGTGGACTCACGAAGACAAGGTC
    174 CTCTGAGAGCCACAGTCAGGTCTGTCCTCAGGGGTCGAGGCGGCTGCGCTGGGGCCT
    175 GGACAGCCCGCTCGGGAGTCGGGCCTGGAAGCAGGCGGACAGCGTCACCT
    176 GCCAGGATGGTCTCGATCTCCTGACCTTGTGATCTGCCCGCCTCGGCCTCCCAAAGTGTTGGG
    177 TGCCCAGGGGAGCCCTCCATTTGTAGAATGAATGAGAGTCCAGGTTATGAACAGTGCCTGGAGTG
    178 GACCGGTTTTATCCCGCTGAGGCCCTGGGAGATGGGTCTGGCGAGGCTCGTAGGCCGC
    179 GCGGAACCTCAAATTGCGGCAGCGGAACCTAAAGTTTCAGGGTGAGATGCGTTGACTCGCGGTGG
    180 GCTCAGTCCCTCCGGTGTGCAGGACCCCGGAAGTCCTCCCCGCACAGCTCTCGCT
    181 CGGGCAGGCGGGACCGGGAGGTCAATAACTGCAGCGTCCGAGCTGAGCCCA
    182 CGCGGTGGGCCGACTTCCCCTCCTCTTCCCTCTCTCCTTCCTTTAGCCCGCTGGCGCC
    183 TCCCCGGCATGCGCCATATGGTCTTCCCGGTCCAGCCAAGAGCCTGGAACCACG
    184 CTCCGCGCTCAGCCAATTAGACGCGGCTGTTCCGTGGGCGCCACCGCCTC
    185 GCGAGAGGGTCGTCCGCTGAGAAGCTGCGCCGGAGACGCGGGAAGCTGCTG
    186 GACCCGCCTGCGTCGCCACCCTCTCGCCGCTCCCTGCCGCCACCTTCCTC
    187 GAGGGGTCCGGGACGAAGCCACCCGCGCGGTAGGGGGCGACTTAGCGGTTTCA
    188 CCCCGAACAAAAAATTCAAATGGGAAAGAGAGGCAGATGGCAGAGAACAGGGGAGGGGCTGGGCA
    189 GCGGCGAGGAGGGTCACAGCCGGAAAGAGGCAGCGGTGGCGCCTGCAGAC
    190 GGCGGTCTCCGGTTCGCCAATGTGGCTGGGTCCGTAGGCTTGGGCAGCCT
    191 CCTCCCCTTTGCGTGCGGAGCTGGGCTTTGCGTGCGCCGCTTCTGGAAAGTCG
    192 AGCCTACTCACTCCCCCAACTCCCGGGCGGTGACTCATCAACGAGCACCAGCGGCCAGA
    193 CAGGAGGTGAGGAGGTTTCGACATGGCGGTGCAGCCGAAGGAGACGCTGCAGTTGGAGAGCG
    194 AGATTTCCCGCCAGCAGGAGCCGCGCGGTAGATGCGGTGCTTTTAGGAGCTCCGTCCGACA
    195 CGGGCGTGGTGGTGGGCACCTGTAATCCCAGCTACTCAGAAGGTTGAGGCAGGAGAATCGCTTGA
    196 TCCCAAATCCGAGTCTGCGGAGCCTGGGAGGGCTCCCAGCTTCCTATCCAAACCGCGCC
    197 CCCTGGTCGAGCCCCCTTTCCTCCCGGGTCCACAGCGAGTCCCCTGAGGAAGGAGGG
    198 CAGGGACCCGCGAGTCCCTGGACACGCACTGGCCAACGCCAGACCCCATC
    199 CAAGCAGCCCTCGGCCAGACCAAGCACACTCCCTCGGAGGCCTGGCAGGG
    200 GAGAAGGAGCGACCCCCAAAACGAAGCGGCTGGATCTGACCTTCCAAGGCCTGTTGGCGACGC
    201 TTCTTCCCCGCAGGGTCAGCGCTGGGGCTCCGGCCGTAGAGCCACGTGACC
    202 ATTCATTTCTGTTATGGAACTGTCGCGGCACTACAAAGTCTCTATGTAGTTATAAATAAACGTT
    203 ACCGAGTGCGCTGCTGTGCGAGTGGGATCCGCCGCGTCCTTGCTCTGCCC
    204 GTGTGGTGAGTGTGGGTGTGTGCGCGTCTCCTCGCGTCCCTCGCTGAGGTGCCT
    205 GCCTGGGCTGCCAGACGTCGCCATCATTGTTCCATGCAGATCATGCCCATCCTGTGCAGAAG
    206 GCGGGTCCGAGGCGCAAGGCGAGCTGGAGACCCCGAAAACCAGGGCCACTC
    207 TCTCCATGGTGGCCATTGCCTCCTCTCTGCTCCAAAGGCGACCCCGAGTCAGGGATGAGAGGC
    208 CGCGGGACTCCGCGGGATCTCGCTGTTCCTCGCTCTGCTCCTGGGGAGCC
    209 CGCCCCCTTTTTGGAGGGCCGATGAGGTAATGCGGCTCTGCCATTGGTCTGAGGGGGC
    210 GTTCTGTTGCCAATGCCATTCAGACCCCAGTCCGGGATTCCGCGCTCGGGGTGCG
    211 TTTCCGCGAGCGCGTTCCATCCTCTACCGAGCGCGCGCGAAGACTACGGAGGTCGA
    212 ACCCGGGTTCAGCGGGTCCCGATCCGAGGGCGTGCGAGCTGAGCCTCCTG
    213 GAGAGTGGACGCGGGAAAGCCGGTGGCTCCCGCCGTGGGCCCTACTGTGC
    214 GGCTACAGCCGCCATTTCCACGCTCCACCAATCAAATCCATTCTCGAGGAAGACGCACCGCCCC
    215 AGCGCGCACAAAGCCTGCGGGAGGATCCATTGTAGCGGTCGCTCCTCCCCGCTTAGCG
    216 ATCGGGCGAAGCTCGCGGGAAACCGCTCTGGGTGCGCAGGACAAAGACGCG
    217 CGACGGAGCCGTGTGGAGGCCAAAACTCCTCCCGGAAGCCGCTACTGGCCCCG
    218 CGCCCCACTACTGCCTGCAGCGGGCTTCCTTACTCCGCCTGCTGGTTCCTACTGGAGGAGAGGCC
    219 GCACTCGTAGCGCGCTGGGCGAGCCGGACCGGAAGTTGAAGAAGTGAAGCGCCG
    220 TGAAGGGAGGGCTTGGTGTGGGGACTTGCACTGGGCAGAGGGGCAGCTTCCCTGAGAGCAGCTA
    221 CGGGAGCGCCCGGTTGGGGAACGCGCGGCTGGCGGCGTGGGGACCACCCG
    222 CAGCACCGGAGAGGGCGCACTGCAAAGGCGGGCAGCAGACCGTGGAGAGC
    223 GGCGCAGAGGCGTCACGCACTCCATGGTAACGACGCTCGGCCCGAAGATGGC
    224 GCCGCGTCTGCGAACCGGTGACCTGGTTTCCCCTCCAGCCCTCACGGCTG
    225 CGAGCTGTTTGAGGACTGGGATGCCGAGAACGCGAGCGATCCGAGCAGGGTTTGTCTGGGC
    226 GCAGCGCTGAGTTGAAGTTGAGTGAGTCACTCGCGCGCACGGAGCGACGACACCCC
    227 CGCGCGCTCGCCGTCCGCCACATACCGCTCGTAGTATTCGTGCTCAGCCTCGTAGTGGC
    228 CGGAAGGGGTGAGGCCGGAAGCCGAAGTGCCGCAGGGAGTTAGCGGCGTCTCG
    229 GGGGGCGTCGGGCTTGGGACAGGGGAGGATACCAGGGCCACCTTCCCCAACCC
    230 CGGGCTGGAGGGTTATCTGGGAAGTCAGCCCCGGCCTCGGTCCTCTCCACGTTGCTGC
    231 GGAACGAGGTGTCCTGGGAACACTCCCGGGTCTGTAACTTCGGACAAATCACGCTCGCTTTCCCG
    232 AAACGAGAGAGTAGCCAGACTCTCCGCGCATGGAGCCGACGGCACCCACCAGCACACCG
    233 TACTCACGCGCGCACTGCAGGCCTTTGCGCACGACGCCCCAGATGAAGTC
    234 TGACCGGACAGAGCAGAGCGGGGACTGCAATTCCCAGAAGACCCCACGGTAGGGGCGG
    235 AGACAATCCCGGAGGGGGAAAGGCGAGCAGCTGGCAGAGAGCCCAGTGCCGGCC
    236 GGCCGAAGAGTCGGGAGCCGGAGCCGGGAGAGCGAAAGGAGAGGGGACCTGGC
    237 CCAGGCTCCGCTCGTAGAAGTGCGCAGGCGTCACCGCGCATCCAGGAGCCAC
    238 CTCTGATGACGCTCCAAGGGAAGAGGAAGTGGGGATCGGCGAGCGGGTGGGTGCGC
    239 TGAAGGGTAATCCGAGGAGGGCTGGTCACTACTTTCTGGGTCTGGTTTTGCGTTGAGAATGCCCC
    240 CGGTCCTGCATGCAATGCAAGCCTGAGCTCTCCCGCCATAAGGCTGCAGCGGTGTGG
    241 CCTGGAGGAGGAGGAGTCAGGCCGGGTAGGAGGGCTAAGGAGGTTCCCGGGAAGGCAGGGCCC
    242 GCTGCTGACATGACTTCTTTGCCACTCGGTGTCAAAGTGGAGGACTCCGCCTTCGGCAAGCCGGC
    243 GAGCGGCGCAGGGTTGGAGAGGGAAGCGCTCGTGCCCACCTTGCTCGCAG
    244 CCGATGACCGCGGGGAGGAGGATGGAGATGCTCTGTGCCGGCAGGGTCCC
    245 GCCGCCCTACAGACGTTCGCACACCTGGGTGCCAGCGCCCCAGAGGTCCC
    246 GGGCCGCAATCAGGTGGAGTCGAGAGGCCGGAGGAGGGGCAGGAGGAAGGGGTG
    247 CGGCGGGACCATGAAGAAGTTCTCTCGGATGCCCAAGTCGGAGGGCGGCAGCGG
    248 GTGGGCGCACGTGACCGACATGTGGCTGTATTGGTGCAGCCCGCCAGGGTGT
    249 GAAAGAGCCGGAAACACCTGGTCTCTCAAGCAGGTACAGCCCGCTTCTCCCCAGCACCCCGGTG
    250 GCAGCCGCAGCTGAGGTCACCCCGCTGAGGTGGTGGGGAGGGGAATGGTT
    251 GGGCGGCCAGCGGTGACTCCAGATGAGCCGGCCGTCCGCGTTCGCGCCGC
    252 GGGCACCACGAATGCCGGACGTGAAGGGGAGGACGGAGGCGCGTAGACGC
    253 GAGGCCGCCATCGCCCCTCCCCCAACCCGGAGTGTGCCCGTAATTACCGCC
    254 CGCGGGGAACGATGCAACCTGTTGGTGACGCTTGGCAACTGCAGGGGCGC
    255 CTTGAGACCTCAAGCCGCGCAGGCGCCCAGGGCAGGCAGGTAGCGGCCAC
    256 CACACCGTCCTCGCCCGGAGCGCAGAGGCCGACGCCCTACGAGTGGATGC
    257 CCCTTGCACACGAGCTGACGGCGTGAACGGGGGTGTCGGGGTTGGTGCAA
    258 GCAAAGTGATACCTGGCCGTCCCACCCTCTGGTCCCAGAAGGAGCTCTCGCTGGAGCCAGGCA
    259 GGTTGGGGGACTGCCCGGGGCTTAGATGGCTCCGAGCCCGTTTGAGCGTGGTCTCG
    260 CGTTGAAAGCGAAGAAGGAGCGGCAGTCCAGCAGCAGGCATTGCGCCGCTCGCTC
    261 GAGTCCTCAACAACGACAGCGGGGACTGCGGGACCAGGGTAAAGCGGCGACGGCG
    262 GCTCCTGAGAAAGCCCTGCCCGCTCCGCTCACGGCCGTGCCCTGGCCAACTT
    263 GATGCTGCTGCCGGAGCTGAGGTCTTGCCTGGAGATCCGAACGAGACACCACGTCAACCGG
    264 TGGTGGCAGGAGAGCGATGAGACGGGAAAGTGTGGGGCAAAGCTTACAGTCATTGGTCCAGA
    265 CCACTCGCAGTCTGCGTGTGGGGGAAACGAGTGCCCGGCGTATGAAACGCCTAACTTCGCGAAA
    266 CAGGCGGCTCCCGCAGTCTAAGGGACCTGGCGCGAGTCCGGGAAGCGGAGG
    267 CTGCACGCGGTGCGAAGGGGCCAGCAGGGAAGGAGCAGAGGATGGGGGGT
    268 CGGGGCCACAGGACCCTGGGGCTTGAGTCACACAAGAATGTCTCTGGGAGACCCGAGAGACTCA
    269 CTTAGAGGAGGAGGAGCAGCGGCAGCGGCAGCAGGAGGCGACAGCTGCCAGCCG
    270 CTCATACCAGATAGGCGCGAACGCCTCTGGCAGCGGCGTCCAGGGGGTCCGGC
    271 GGGTGCTGGCACATCCGAGGCGTTCTCCCGACTCTGGACCGACGTGATGGGTATCCTGG
    272 CATGATAAGCCAGGGACCTCGCGGCGCAGGCGGAGGGAGGGAGAGCGTCGC
    273 CCCCCCACTCAACAGCGTGTCTCCGAGCCCGCTGATGCTACTGCACCCGCCG
    274 TCCCACCTGCTGCCCGAGGAAGACTTCCGGGAGAAACGCTGTCTCCGAGCCCCCG
    275 CCAGGTGAAGCCGAAGGGGAAGCGGATGGGGTTGCTGAACGCGGAGTCGGCG
    276 CAGTGGCCCTGCGCGACGTTCGGCGCTACCAGAACTCCGAGCTGCTGATCAGCAAGC
    277 AAGGATTACCTCGCCCTGAACGAGGACCTGCGCTCCTGGACCGCAGCGGACACTGCGG
    278 GCAGGCTCGTGGCGGTCGGTCAGCGGGGCGTTCTCCCACCTGTAGCGACTCAGGTTACTGAAAA
    279 GAGGGAAGTGCCCTCCTGCAGCACGCGAGGTTCCGGGACCGGCTGGCCTG
    280 GAAGCGCGACCTCGGGCGGTTGGAGGGGCTACCGGGTCTTACCAGTCCGTGGCG
    281 CCCAACCCGAGCAAGACCTGCGCTGAAACGGATTGGCTGCCCTCCGCCCG
    282 AGCCGCTCTCCCGATTGCCCGCCGACATGAGCTGCAACGGAGGCTCCCACC
    283 ACCACACGGCCAAGGGCACCTGACCCTGTCAAAACCCCAAATCCAGCTGGGCGCG
    284 CCGAGGCAGCCGGATCACGAAGTCAGGAGTTCGAGACCAGCCTGACCAACATGGTGAAACCCCGT
    285 CCGGCGTCTCCGCGTGGGGCGCACCGTCCGACCCCCCCCTCCCGGTGTGC
    286 GGCGCAGATGGCGCTCGCTGCGAGATGGATGCTCCAGGGCGGGTAATCACTCCTG
    287 CCAGGCCTCCTGGAAACGGTGCCGGTGCTGCAGAGCCCGCGAGGTGTCTG
    288 GGCGAGAGGTGAGAAGGGAAGAGGGCTCCCGGCTCTCTCGGGGCGGGAATCAGTGGGC
    289 GCCTGCCTCGCCTCTGCCCGAGCTGATGAGCGAGTCGACCAAAAAAGAGTTCGCGGCG
    290 CATTGCGGGACCCTATTTATCCCGACACCTCCCCTGACGTGGGCTCGGAACGCTCCCTTGGCAG
    291 CGAAGGCCGGAGCCACAGCGCTCGGTGTAGATGCCGCACGGCTGGCCCTC
    292 GGGCTGGATGAGTCCGGAAGTGGAGATTGGCTGCTTAGTGACGCGCGGCGTCCCGG
    293 CGCCAGTGCGATTCTCCCTCCCGGTTCCAGTCGCCGCGGACGATGCTTCCTC
    294 CGTCCGAGAAAGCGCCTGGCGGGAGGAGGTGCGCGGCTTTCTGCTCCAGG
    295 TCCGGCTGCGCCACGCTATCGAGTCTTCCCTCCCTCCTTCTCTGCCCCCTCCGCTCC
    296 CAGCCTCAGTTTCCCCATTGGTAAAGCATTGACGGTGGTTGCGGACGGCTTCTGCGGACAGAGCC
    297 CCTGAGACAGGCCGAACCCAACTCTTCACAGGGCCGAATTCTTTGCCCGCAGCCCAGCACC
    298 CAGAGGGGGGTGCCGGGGTCGCGGACTGCCACCAGGTTGAGGAAAGGAGGGG
    299 CGACATCCTGCGGACCTACTCGGGCGCCTTCGTCTGCCTGGAGATTGTAAGTGGGGCCGC
    300 ACCGCCTCCTCCCCGCTGTCTGGGTCGCAGGCCTTAGCGACGGGCTGTTCTCCG
    301 CTCGGGACTCCAGGGCTGTCCCTCCCGCAGGCTGTCCTTCCACCTCCACCCCA
    302 CGGCCGCTCCTCGTAGGCCAGGCTGGAGGCAAGCTCCTTCTCCTCAAAGCTGCGCTGC
    303 CATCTCTTCCCCCGACTCCGACGACTGGTGCGTCTTGCCCGGACATGCCCGG
    304 CCCAAGACCCTAAAGTTCGTCGTCGTCATCGTCGCGGTCCTGCTGCCAGTGAGTCCCGGCC
    305 CCCACTCTTCCCCTGACTCCGACGGCGGGTTCGTCCTGCCCAGACATGCCCG
    306 GTCCCCCTCTCTCTCTGCCCCCTCCCGGTGCCAGGCGCGCTTTTCCCCAGG
    307 CAGCCTGCTGAGGGGAAGAGGGGGTCTCCGCTCTTCCTCAGTGCACTCTCTGACTGAAGCCCGGC
    308 ACTGACTCCGGAGGCTGCAGGGCTGGAGTGCGCGGGGCTCCTACGGCCGAG
    309 GGCCAGGCTCGGGCAGGGGCCGTGCTCAGGTGCGGCAGACGGACGGGCCG
    310 CCGGGCTTCTGGGACGCTCAGCCGTGCGCTACCCGGTGCAGCTGCTTTCTCACC
    311 TTTAGGTAGACGTGGAGGCGACTCAGATCGCCTCGCGGTTCCCGGGATGGCGCGGTCG
    312 TGACCAGGACCGCAGGCAAGCACCGCGGCGACGGTTCCAGCCAGGAAAATGAG
    313 GGGCCGGACCCGGCCTCTGGCTCGCTCCTGCTCTTTCTCAAACATGGCGCG
    314 GCCGCGCTCCTCGCACCGCCTTCTCCGCAGGTCTTTATTCATCATCTCATCTCCCTCTTCCCC
    315 GAGCTGCGAACTGGTCGGCGGCGCAAGGCGCGGACTCCGGTGAGTTGTGT
    316 GCCCGCGTTCCTCTCCCTCCCGCCTACCGCCACTTTCCCGCCCTGTGTGC
    317 ACGCGTCGCGGAGTCCTCACTGCCCCGCCTCGCTCTGGCAGAGTGGGGAG
    318 GCGAGCAGCGGCCTCCAGCGCTGGTGGCTCCCTTTATAGGAGCGCTGGAGACACGGG
    319 GGGGAAGGCGGAGGGCGAGGGGAAGAGTCACTGAGCTGCGGGGCATAGGGGGTCC
    320 CTCTGCTCGCGTGCTGCTCTGAAGTTGTTCCCCGATGCGCCGTAGGAAGCTGGGATTCTCCCA
    321 AGGGAGGTCGTTTTCTTCAGCTCCCCAGGTGGTCTGTGCTGGGTGTGCTGACGGTCCTTTTGGGA
    322 GCCCCTGGCCCTGACTGCTGGTGCGAGGCAGTGCACGACTCAGCTGGCCG
    323 GGCCGGGTAACGGAGAGGGAGTCGCCAGGAATGTGGCTCTGGGGACTGCCTCGCTCG
    324 GGCCGGGACTTTCTGGTAAGGAGAGGAGGTTACGGGGAACGACGCGCTGCTTTCATGCCC
    325 CAGTCTGGGGACCGGGGAGGCGGGGAGAGGGAAGGGGAAAGCGCGGACGC
    326 GTCTATCAAAAGTCTTTTCGTTTCCCCCTCCCCCTTTCCCCACCGCCCACCAAAATGAGCCGCG
    327 ATGCCGCCATCGCGGTTCATGCCGTTCTCGTGGTTCACACCGCCCTCAGGG
    328 TCCCGGTCTTCGGATCCGAGCCGGTCCTCGGGAAAGAGCCTGCCACCGCGT
    329 TGAGAGGCTCCGGTAAAGCCGTCCGGCAATGTTCCACCTGGAAAGTTCCAGGGCAGGGGAAGGG
    330 CCCAGGGAGAGGGAGAGGAGGCGGGTGGGAGAGGAGGAGGGTGTATCTCCTTTCGTCGGCCCG
    331 CCCGTCTTCTCTCCCGCAGCTGCCTCAGTCGGCTACTCTCAGCCAACCCC
    332 GACCCCCCTTTGGCCCCCTACCCTGCAGCAAGGGTAGCGTGACGTAATGCAACCTCAGCATGTCA
    333 CCCCGAAGCCCTTGCTTTGTTCTGTGAGCGCCTCGTGTCAGCCAGGCGCAGTGAGCTCAC
    334 CGCGCGGCCTTCCCCCTGCGAGGATCGCCATTGGCCCGGGTTGGCTTTGGAAAGCGG
    335 CCACCCAGTTCAACGTTCCACGAACCCCCAGAACCAGCCCTCATCAACAGGCAGCAAGAAGGGCC
    336 AAGCAGCTGTGTAATCCGCTGGATGCGGACCAGGGCGCTCCCCATTCCCGTCGGGAG
    337 CCACGCACCCCCTCTCAGTGGCGTCGGAACTGCAAAGCACCTGTGAGCTTGCGGAAGTCAGT
    338 CCACGCACCCCCTCTCAGTGGCGTCGGAACTGCAAAGCACCTGTGAGCTTGCGGAAGTCAGT
    339 CCCTCCACCGGAAGTGAAACCGAAACGGAGCTGAGCGCCTGACTGAGGCCGAACCCCC
    340 TTGTCCCTTTTTCGTTTGCTCATCCTTTTTGGCGCTAACTCTTAGGCAGCCAGCCCAGCAGCCCG
    341 TTCTCAGGCCTATGCCGGAGCCTCGAGGGCTGGAGAGCGGGAAGACAGGCAGTGCTCGG
    342 CAGCGTTTCCTGTGGCCTCTGGGACCTCTTGGCCAGGGACAAGGACCCGTGACTTCCTTGCTTGC
    343 AGGCAGGCCCGCAAGCCGTGTGAGCCGTCGCAGCCGTGGCATCGTTGAGGAGTGCTGTTT
    344 GACTCTGGGTATGTTCTCGAAAGTTGTTACAACCCCAACCCAGGGTTGACCTCAAACACAGGAGG
    345 CTCTGGCTCTCCTGCTCCATCGCGCTCCTCCGCGCCCTTGCCACCTCCAACGCCCGT
    346 CGGGAGCGCGGCTGTTCCTGGTAGGGCCGTGTCAGGTGACGGATGTAGCTAGGGGGCG
    347 CCCCAAGCCGCAGAAGGACGACGGGAGGGTAATGAAGCTGAGCCCAGGTCTCCTAGGAAGGAGA
    348 GGGCTCTTCCGCCAGCACCGGAGGAAGAAAGAGGAGGGGCTGGCTGGTCACCAGAGGGTG
    349 TTCTCTTCCATCCCATCCTCCCTTCTGGTCCTCCTTTCCACAGTGGGAGTCCGTGCTCCTGCTCC
    350 CCGCCTCTGTGCCTCCGCCAACCCGACAACGCTTGCTCCCACCCCGATCCCCGCACC
    351 CCGCGCCACGTGAGGGCGGCAAGAGGGCACTGGCCCTGCGGCGAGGCCCCAGCGAGG
    352 CACTGCTGATAGGTGCAGGCAGGACAGTCCCTCCACCGCGGCTCGGGGCGTCCTGATT
    353 CGGGAGCCTCGCGGACGTGACGCCGCGGGCGGAAGTGACGTTTTCCCGCGGTTGGAC
    354 TGTCCTCCCGGTGTCCCGCTTCTCCGCGCCCCAGCCGCCGGCTGCCAGCTTTTCGGG
    355 GGTGTCGCGACAGGTCCTATTGCGGGTGTCTGCGGTGGGAAGGGCGGTGGTGACTGG
    356 ACATATGACAACGCCTGCCATATTGTCCCTGCGGCAAAACCCAACACGAAAAGCACACAGCA
    357 GGAAACCCTCACCCAGGAGATACACAGGAGCACTGGCTTTGGCAGCAGCTCACAATGAGAAAGA
    358 TTACCATTGGCTTAGGGAAAGGAGCTTACTGGGAACTGGGAGCTAGGTGGCCTGAGGAGACTGGG
    359 AAGAACAGGCACGCGTGCTGGCAGAAACCCCCGGTATGACCGTGAAAACGGCCCGCC
    360 ccggggactccagggcgcccctctgcggccgacgcccggggtgcagcggccgccggggctggggccggcgggagtccgcgggaccctccagaagagcggccggcgccgtgactca
    361 TCACGGGGGCGGGGAGACGC
    362 GCACAGGGTGGGGCAGGGAGCA
    363 accgggccttccgcgcccct
    364 TCCCACCTCCCCCAACATTCCAGTTCCT
    365 TCACAGAGCCAGGCAAGCATGGGTGA
    366 ggagcagcaggctcgctcgggga
    367 gcccaaagtgcggggccaaccc
    368 CGGAAAGAGGAAGGCATTTGCTGGGCAAT
    369 CCAGCGGCCCCGCGGGATTT
    370 ccgacagcgcccggcccaga
    371 TGGGCCAATCCCCGCGGCTG
    372 GGGCGGCTGCGGGGAGCGAT
    373 CGCCAGGACCGCGCACAGCA
    374 GCGGGCAAGAGAGCGCGggag
    375 AGCGCGCAGCCAGGGGCGAC
    376 CGTGCGCTCACCCAGCCGCAG
    377 TGAGGGCCCGGGGTGGGGCT
    378 ATATGCgcccggcgcggtgg
    379 CCGCAGGGGAAGGCCGGGGA
    380 TCCTGAGGCGGGGCCGTCCG
    381 GGAGGCCGGGGACGCCGAGA
    382 GCCGCCGGCTCCCCCGTATG
    383 GCAGGAGCGACGCGCGCCAA
    384 cgggggaaacgcaggcgtcgg
    385 ccccccaccctggacccgcag
    386 CGCCCGGCTTTCCGGCGCAC
    387 ccgctgggccgccccTTGCT
    388 CGCTTCTCCATAGCTCGCCACACACACAC
    389 TCCGCGCACGCGCAAGTCCA
    390 CGTCTCAACTCACCGCCGCCACCG
    391 GACAAATGCGCTGCTCGGAGAGACTGCC
    392 TGCGCCTGCGCAGTGCAGCTTAGTG
    393 gaagtcaagggctttcaacctcccctgcc
    394 tggatcccgcacaggggctgca
    395 GCCGCCTGTGGTTTTCCGCGCAT
    396 Gcgcgctctcccgcgcctct
    397 TTCCGGCCCAGCCCCAACCC
    398 TCCGGGTCAGGCGCACAGGGC
    399 GGGGGCGGTGCCTGCGCCATA
    400 GGCGCGGGCCCTCAGGTTCTCC
    401 gcgtccgcggcTCCTCAGCG
    402 GGGAGGCGCCCAGCGAGCCA
    403 GCGCGCAGGGGGCCTTATACAAAGTCG
    404 CCCCCACCCCCTTTCTTTCTGGGTTTTG
    405 CGCGCGTTCCCTCCCGTCCG
    406 gccggcggAGGCAGCCGTTC
    407 TGCCTGGTGCCCCGAGCGAGC
    408 CGGCGGCGGCGCTACCTGGA
    409 GTGGTGGCCAGCGGGGAGCG
    410 GGCGGCACTGAACTCGCGGCAA
    411 CCTCGGCGATCCCCGGCCTGA
    412 ACGCAGGGAGCGCGCGGAGG
    413 TGAAATACTCCCCCACAGTTTTCATGTG
    414 TCCGGGCGCACGGGGAGCTG
    415 ggcggcggcgTCCAGCCAGA
    416 AGGGTCGCCGAGGCCGTGCG
    417 CCGCGCCTGATGCACGTGGG
    418 gccgggagcgggcggaggaa
    419 AGGGGCGCACCGGGCTGGCT
    420 TGCCACGGGAGGAGGCGGGAA
    421 cgggcatcggcgcgggatga
    422 acaccgccggcgcccaccac
    423 CCCCCAACAGCGCGCAGCGA
    424 GCCCCGCTGGGGACCTGGGA
    425 TCCCGGGGGACCCACTCGAGGC
    426 GCCCGCGGAGGGGCACACCA
    427 GGCCCACGTGCTCGCGCCAA
    428 CGGCGGAGCGGCGAGGAGGA
    429 GCCTCGCCGGTTCCCGGGTG
    430 gcaggcgcgccgATGGCGTT
    431 CCTCCCGGCTTCTGCATCGAGGGC
    432 GCGGTCCGCGAGTGGGAGCG
    433 AGCAGCGCCGCCTCCCACCC
    434 CCGACCGTGCTGGCGGCGAC
    435 TCCCGGGCTCCGCTCGCCAA
    436 GCATGGGGTGCTCATCTTCCCGGAGC
    437 CCCGAGAGCCGGAGCGGGGA
    438 GCCGCTGCAGGGCGTCTGGG
    439 gcgctgccccaagctggcttcc
    440 TCAGGATGCCAGCGTGACGGAAGCAA
    441 GGGCGGTGCCATCGCGTCCA
    442 GGTGGGTCGCCGCCGGGAGA
    443 AGGCGGAGGGCCACGCAGGG
    444 GGTCCGGGGGCGCCGCTGAT
    445 GCGGCCTGCGGCTCGGTTCC
    446 CGGGAACCGTGGCGGCCCCT
    447 gcggggaaggcggggaaggc
    448 gcctcccggtttcaggcc
    449 CAGCCCGCGCACCGACCAGC
    450 CCCCCAGCCACACCAGACGTGGG
    451 tgggcttcctgccccatggttccct
    452 TCCGCGCTGGGCCGCAGCTTT
    453 gcatggcccggtggcctgca
    454 TGGGCAGGGGAGGGGAGTGCTTGA
    455 TCCCCGGCGCCTTCCTCCTCC
    456 TCCACCGCGCTTCCCGGCTATGC
    457 CCCGCATCTGACCGCAGGACCCC
    458 TGCGGACACGTGCTTTTCCCGCAT
    459 GGAGCTGGAAGAGTTTGTGAGGGCGGTCC
    460 CGGCCGCCAACGACGCCAGA
    461 AGCGCCCGGTCAGCCCGCAG
    462 TCCCGCCAGGCCCAGCCCCT
    463 CCGATTCTTCCCAGCAGATGGCCCCAA
    464 ACGCACACCGCCCCCAAGCG
    465 TAGGCCCCGAGGCCGGAGCG
    466 GGGGTTCGCGCGAGCGCTTTG
    467 GCCAGTCTCCCGCCCCCTGAGCA
    468 TGAGGAGGCAGCGGACCGGGGA
    469 GCCGGCTCCACGGACCCACG
    470 GCCGCCACCGCCACCATGCC
    471 TTGAGTAAGGATGATACCGAGAGGGAAGA
    472 tgggccaggcacggtggctca
    473 CCCGGCGAAGTGGGCGGCTC
    474 GGCGGCCTTACCCTGCCGCGAG
    475 ggtggggccggcgAGGGTCA
    476 TCGGCGCGGACCGGCTCCTCTA
    477 GGCCCATGCGGCCCCGTCAC
    478 TGGGATTGCCAGGGGCTGACCG
    479 CGCCGGAGCACGCGGCTACTCA
    480 CCCTCGGCGCCGGCCCGTTA
    481 GCACAGCGGCGGCGAGTGGG
    482 TCACCTCGGGCGGGGCGGAC
    483 GAGACGGGGCCGGGCGCAGA
    484 CGCATTCGGGCCGCAAGCTCC
    485 GGCCCGAAAGGGCCGGAGCG
    486 ACGGCGGCCGGGTGACCGAC
    487 TCCACCGGCGGCCGCTCACC
    488 GCGGTCAGGGACCCCCTTCCCC
    489 CGGCCGAAGCTGCCGCCCCT
    490 GGCGGCCTTGTGCCGCTGGG
    491 TCGCGGGAGGAGCGGCGAGG
    492 TGCCCACCAGAAGCccatcaccacc
    493 TGGGCCATGTGCCCCACCCC
    494 CCCGCCAGCCCAGGGCGAGA
    495 gccccctgtccctttcccgggact
    496 GGTGGGGGTCCGCACCCAGCAAT
    497 ggggcccccgggTTGCGTGA
    498 TGCCTGCACAGACGACAGCACCCC
    499 AGGCCGCGCCGGGCTCAGGT
    500 CGGGGTAGTCGCGCAGGTGTCGG
    501 tgcaggcggagaatagcagcctccctc
    502 ccggaaatgctgctgcaagaggca
    503 gcgtcggatccctgagaacttcgaagcca
    504 CCCGGCTCCGCGGGTTCCGT
    505 GCGTCGCCGGGGCTGGACGTT
    506 GGGGCCTGCCGCCTCGTCCA
    507 CGCACACCGCTGGCGGACACC
    508 CGCAAACCATCTTCCCCGACGCCTT
    509 GGGCCCTCCGCCGCCTCCAA
    510 CCACCACCGTGGCAAAGCGTCCC
    511 TCACAGCCCCTTCCTGCCCGAACA
    512 TGCTTGATGCTCACCACTGTTCTTGCTGC
    513 ggccaggcccggtggctcaca
    514 TGCGGGACGGGTGGCGGGAA
    515 gGCTTGGCCCCGCCACCCAG
    516 GGCGGGGAAGGCGACCGCAG
    517 ggcgcccaaccaccacgcc
    518 GAAAAGCCCCGGCCGGCCTCC
    519 CCGCAGGTGCGGGGGAGCGT
    520 CCCCGCCCACAGCGCGGAGTT
    521 AGCAGGGGCCCGGGGGCGAT
    522 CCATGACCGCGGTGGCTTGTGGG
    523 GGCAGGTGCTCAGCGGGCAGACG
    524 GGGTGCGCCCTGCGCTGGCT
    525 GAATTTGGTCCTCCTGCGCCTGCCA
    526 TGGCTTCCGCGGCGCCAATC
    527 GGCCAGGAGAGGGGCCGAGCCT
    528 cgagcgccggccccccttct
    529 CGGTTGCGAGGGCACCCTTTGGC
    530 tacccggacgcggtggcg
    531 GCGCCGCCGAGCCTCAGCCA
    532 tgcagcctcaacctcctgggg
    533 CCTTGCCGACCCAGCCTCGATCCC
    534 GGCGGCGTTCGGTGGTGTCCC
    535 CCCGGACTCCCCCGCGCAGA
    536 cggccccctgcaagttccgc
    537 TGCCCAGGGGAGCCCTCCA
    538 GCCGGCTGCAGGCCCTCACTGGT
    539 TGTCACACCTGCCGATGAAACTCCTGCG
    540 CCCCTGCGCACCCCTACCAGGCA
    541 TCCTGGGGGAGCGCGGTGGG
    542 AGTGGGGCCGGGCGAGTGCG
    543 GCGTCCAGGCTGTGCGctcccc
    544 GGCGCGGCGGTGCAGCCTCT
    545 gaggcggcggcggtggcagt
    546 CGCGCGACCCGCCGATTGTG
    547 CCGCGGACGCCGCTCTGCAC
    548 tgaacccgggaggcggaggttgc
    549 TCTCGGCGGCGCGGGGAGTC
    550 aggcggccacgggaggggga
    551 GGACCCGAGCGGGGCGGAGA
    552 AAGCACCTggggcggggcggag
    553 GCCGCTCGGGGGACGTGGGA
    554 CACCGCCAGCGTGCCAGCCC
    555 TATTCTTggccgggtgcggt
    556 CCGCTTCCCGCGAGCGAGCC
    557 CAGCCGGCGCTCCGCACCTG
    558 GCGGAGCGCGCTTGGCCTCA
    559 ggcctcgagcccacccagacttggc
    560 TGCCGCGCCGTAAGGGCCACC
    561 ACGGCGGTGGCGGTGGGTCG
    562 AACCTGCCCAGTTACTGCCCCACTCCG
    563 TCCAGCGCCCGAGCCGTCCA
    564 GCTGCTGCTGCCCGCGTCCG
    565 CACTGCTTAGGCCACACGATCCCCCAA
    566 GGCCGGACGCGCCTCCCAAG
    567 TCGGCCAGGGTGCCGAGGGC
    568 tccgcccgcccCACAGCCAG
    569 CGCGCCCCAGCCCACCCACT
    570 ccgtgctgggcgcaggggaa
    571 TGCGCACGCGCACAGCCTCC
    572 CGGTGAGTGCGGCCCGGGGA
    573 TGGCCGAGAGGGAGCCCCACACC
    574 CCCAGCGCCGCAACGCCCAG
    575 GCCACAAGCGGGCGGGACGG
    576 TCCTCTGGACAACGGGGAGCGGGAA
    577 CGCGGGTTCCCGGCGTCTCC
    578 GCGCCGCCCGTCCTGCTTGC
    579 ACGCGCGGCCCTCCTGCACC
    580 GGGCGGGGCAAGCCCTCACCTG
    581 GGGAGCGCCCCCTGGCGGTT
    582 GCGAATGGTTCGCGCCGGCCT
    583 TTTCCGCCGGCTGGGCCCTC
    584 TCTCCGGGTcccccgcgtgc
    585 GCAGCCCGGGTAGGGTTCACCGAAA
    586 GGGCGGAGAGAGGTCCTGCCCAGC
    587 CCCTCACCCCAGCCGCGACCCTT
    588 GCGATGACGGGATCCGAGAGAAAGGCA
    589 TCCGCAGGCCGCGGGAAAGG
    590 GGCCCCAGTCCACCTCTGGGAGCG
    591 GCTTGGCCGCCCCCGGGATG
    592 CCCTCCATGCGCAATCCCAAGGGC
    593 gcggcgactgcgctgcccct
    594 TGGGCTTGCCTCCCCGCCCCT
    595 GGCGGCCCAAGGAGGGCGAA
    596 gctgcgcggcTGGCGATCCA
    597 TCACCGCCTCCGGACCCCTCCC
    598 CCCTTCCAGCCACCCCGCCCTG
    599 GCGGGACACCGGGAGGACAGCG
    600 CCCTGGGTTCCCGGCTTCTCAGCCA
    601 TGGCGGTGATGGGCggaggagg
    602 CCAGCCCGCCCGGAGCCCAT
    603 TGCCCGCGGGGGAATCGCAG
    604 TGCCGCGAGCCCGTCTGCTCC
    605 TGCGGCCCCCTCCCGGCTGA
    606 GCAGCAGGGCGCGGCTTCCC
    607 GCCGCAGCACGCTCGGACGG
    608 TGCGGAGTGCGGGTCGGGAAGC
    609 GGCGCGGGGGCAGGTGAGCA
    610 ggcgcgggggcaggtgagcat
    611 CAGTGACGGGCGGTGGGCCTG
    612 CGGCGACCCTTTGGCCGCTGG
    613 CCGCGGCAGCCCGGGTGAA
    614 GGGCGAGCGAGCGGGACCGA
    615 TGGGGCAGTGCCGGTGTGCTG
    616 TCGCTGGCATTCGGGCCCCCT
    617 GGAGCCGTGATGGAGCCGGGAGG
    618 TGCCAGGGTGTCTTGGCTCTGGCCT
    619 CCGGCTCCGGCGGGGAAGGA
    620 GGCCAGGGTGCCGTCGCGCTT
    621 TCGGCTCGGTCCTGAGGAGAAGGACTCA
    622 GCGCGGGGAACCTGCGGCTG
    623 GCCGCCGCTGCTTTGGGTGGG
    624 CACCTGAGCCCGCGGGGGAAcc
    625 GAACGCCGGCCTCACCGGCA
    626 CCCGTGGTCCCAGCGCTCCTGCT
    627 GTGCGACCCGGCGCCCAAGC
    628 TGGCTCTGCGCTGCCTTTGGTGGC
    629 cgcgcgggcggcTCCTTTGT
    630 TGGCCCGTTGGCGAGGTTAGAGCG
    631 gacccggcatccgggcaggc
    632 GCCCGGACTGTAATCACGTCCACTGGGA
    633 CCGCCGCCAACGCGCAGGTC
    634 CGCTGCCAGCTGCCGCTCCG
    635 AGCGCCCACCTGCGCCTCGC
    636 GCGGGCCAGGGCGGCATGAA
    637 GGCTGCGACCTGGGGTCCGACG
    638 GGTTAGGAGGGCGGGGCGCGTG
    639 CAGCGCACCAACGCAGGCGAGG
    640 TCGGCTGGCCCCGCCCACTC
    641 CGGGGTTGCCGTCGCAGCCA
    642 TCCGCACTCCCGCCCGGTTCC
    643 ggaccccctgggcagcaccctg
    644 cgaggcagccggatcacg
    645 GGCGCGTGCGGGCGTTGTCC
    646 CCAGGATGCGGCAGCGCCCAC
    647 cgATGCGGCCCGCGGAGGAG
    648 CGTTCTGCGCGCGCCCGACTC
    649 CCCCGCCGTGGGCGTAGTAAccg
    650 AACCCGCCCGGGCAGCTCCA
    651 GCAGCGGTCGCGCCTCGTCG
    652 CGCAATCGCGCTGTCTCTGAAAGGGG
    653 GGAGCGCCCGCCGTTGATGCC
    654 CCATGGCCCGCTGCGCCCTC
    655 TGGGGGCGGGGTGCAGGGGT
    656 CCGACCCTGCGCCCGGCAGT
    657 CGGCTTCAAGTCCACGGCCCTGTGATG
    658 ACCCCACCTGCCCGCGCTGC
    659 ggcgcgcggagacgcagcag
    660 CGTGAGCCGGCGCTCCTGATGC
    661 CTGCCGCGGGGGTGCCAAGG
    662 CCTGCTGCGCGCGCTGGCTC
    663 CCTGGCGGCCCAGGTCGCTCCT
    664 GAGCGCCCCGGCCGCCTGAT
    665 CGCCGCACGGGACAGCCAGG
    666 GCCCGGACATGCCCCGCCAC
    667 cgggggccgccgcctgactt
    668 CCAGTGGCGGCCCTCGGCCT
    669 CGCCCGGCGCGGATAACGGTC
    670 TGCTCCGGGTGGGGAGGGAGGC
    671 TGCCTGGGCGCAGAACGGGGTC
    672 GGGTCCTAATCCCCAGGCTGCGCTGA
    673 TCCGCGTCCCCGGCTGCTCC
    674 GGGCAGGGCTGACGTTGGGAGCG
    675 GCCGTGGGCGCAGGGGCTGT
    676 cctgcgcacgcgggaagggc
    677 CGCGGACGCAGCCGAGCTCAA
    678 CGACCCATGGCGGGGCAGGC
    679 tccgctccccgcccctggct
    680 tgtgccgcgcggttgggagg
    681 TCACTCACGCTCTCAGCCCGGGGA
    682 CGGCAAGCGGGCTTCGGGAAGAA
    683 CCCCGCGGGCCGGGTGAGAA
    684 CGGCGGCGGCTGGAGAGCGA
    685 CGGGCCCCGGGACTCGGCTT
    686 GACGGAATGTGGGGTGCGGGCCT
    687 TGCGGCTGCTGCCGAGGCTCC
    688 ACCGCTGCGCGAGGGAgggg
    689 GGGGGTGCGGCGTCTGGTCAGC
    690 GGCCGGGGGAAATGCGGCCT
    691 tgcctggtaggactgacggctgcctttg
    692 AGCGCGGGCGCCTCGATCTCC
    693 TCCCGGCTGGTCGGCGCTCCT
    694 CCGGGGCTGGGACGGCGCTT
    695 GGGCGGGGTGGGGCTGGAGC
    696 GTGCGGTTGGGCGGGGCCCT
    697 GGCGGTGCCTCCGGGGCTCA
    698 GGCGGTGCCTCCGGGGCTCA
    699 CGGGAGCCCGCCCCCGAGAG
    700 TCCTGCCATCCGCGCCTTTGCA
    701 AGGCACAGGGGCAGCTCCGGCAC
    702 CGACCCCTCCGACCGTGCTTCCG
    703 CCCGCAGGGTGGCTGCGTCC
    704 GCGTCTGCCGGCCCCTCCCC
    705 TAGGCCGCCGGGCAGCCACC
    706 GGGGAGCGGGGACGCGAGCA
    707 GCCGGCTGGCTCCCCACTCTGC
    708 TCGCTCACGGCGTCCCCTTGCC
    709 TCCCCGCTGCCCTGGCGCTC
    710 GGCCAGAGGCAGGCCCGCAGC
    711 TGCCCGGGTCATCGGACGGGAG
    712 CCCAGTGCGCACGGCGAGGC
    713 AGCGTCCCAGCCCGCGCACC
    714 TGCTCCCCCGGGTCGGAGCC
    715 CGCTCGCATTGGGGCGCGTC
    716 TGCGGCAAGCCCGCCATGATG
    717 TCTTGAGCCTCAGGAGTGAAAAGGCCCCTTG
    718 GGACCATGAGTGTTTCCATGCTTGGCATCAGA
    719 tcagccactgcttcgcaggctgacg
    720 cggccagctgcgcggcgact
    721 TCGGAGAAGCGCGAGGGGTCCA
    722 GCCGGGTGGGGGCTGCCTTG
    723 tcctcgcccggcgcgattgg
    724 GGCCGTGCAGTTGGTCCCCTGGC
    725 GCGAGCCTGCTGCTCCTCTGGCACC
    726 gccagagctgtgcaggctcggcattt
    727 tgcccagcaaatgccttcctctttccg
    728 TGGCCTGACCACCAATGCAGGGGA
    729 TCCACCTGGGCTTCTGGGCAGGGA
    730 agctggcctgcgccccgctg
    731 AGCCGCGGCAGCGCCAGTCC
    732 GGGGCCGGGCCGCTCAGTCTCT
    733 GCAGTGAGCGTCAGGAGCACGTCCAGG
    734 cccgATCCCCCGGCGCGAAT
    735 GGCGTGACCGTGGCGCGGAA
    736 AGCGGCCCGCAGAGCTCCACCC
    737 GGCAGGCGGGCGCAGGGAAG
    738 tctgccccgggttcacgccat
    739 CGGGCGGGCCCTGGCGAGTA
    740 GCAAGCCCGCCACCCCAGGGAC
    741 GGCCCAGGCGGATGGGGTTGG
    742 TCCGAGAGGCGTGTGGTAGCGGGAGA
    743 AGGCGGCCGCGGGCGTTAGC
    744 aaggcagcgcgggccaccga
    745 ggcatcctgcccgccgcctg
    746 TGGGGCGGGGTCTCGCCGTC
    747 TCGGGCTCGCGCACCTCCCC
    748 CCAGGTGCGCGCTTCGCTCCC
    749 ACCTGCGCCACCGCCCCACC
    750 GCCGAGCAGAGGGGGCACCTGG
    751 TCGCGCCGCTCTGCGTTGGG
    752 CCGCCGGGGCAGAAGGCGAG
    753 TCcactggacaggggtgggagcctctg
    754 gcccaccggcgctgcgctct
    755 GCGGTGCCAGCCCCGCTGTG
    756 GACCCGCCTGCGTCCTCCAGGG
    757 CCCATCACAGCCGCCCAACCAGC
    758 GAGCggggcggagccgagga
    759 TGCAATTGTGCAGTGGCTGCGTTTGTTTC
    760 CCCGACCGGATGCTCCTTGACTTTGCC
    761 GCGAGCGCGCGCACCGATTG
    762 CACTCCGCCGGCCGCTCCTCA
    763 TCGGGGGTCCCGGCCGAATG
    764 GCTCTCCCAGCTGCACGCCAACTTCTTG
    765 GGAGGAGCCTGGCGCTGGCGAGT
    766 TGGCTCTGGACCGCAGCCGGGTA
    767 ACGGCGGCGTCCCGGGTCAA
    768 TGGCCAAGCGCTGCCACTCGGA
    769 CGCAGGCCGCTGCGGTGGAG
    770 GCGCCTGCGCCATGTCCACCA
    771 TGGTGCCTCCCGCAACCCTTGGC
    772 GCCCGGCTCCAGGCGGGGAA
    773 GCAATGCTGGCTGACCTGGACC
    774 CGCCCGCCCGTCGGGATGAG
    775 TGCCCCCACCATCCCCCACCA
    776 GGCGCGAGCGGCGGGAACTG
    777 GGCGCCGCTCGCGCATGGT
    778 CCCGCTCTGCCCCGTCGCAC
    779 GTAGCGCGGGCGAGCgggga
    780 AGCGCCGAGCAGGGCGCGAA
    781 ggcggcggccacgcaggttc
    782 ccctcccgcacgctgggttgc
    783 TCACGGCCGCATCCGCCACA
    784 CGGCGCCGGCCGCTCTTCTG
    785 CCGGCAGAGAATgggagcgggagg
    786 TCGGCCGGGGCGCCAGGTCT
    787 TGGGGCTGCGGGCGATGCCT
    788 GGCTGCGGGGACCGGGGTGT
    789 CGGCCCAAGCCGCGCCTCAC
    790 ccgcgcccggAACCGCTGCT
    791 TCGGCCGGGAGCGTGGGAGC
    792 TGCAGACATTGGCGCGTTCCTCCA
    793 GGACCCACGCGCCGAGCCCAT
    794 GGAGGGGGCGAGTGAGGGATTAGGTCCG
    795 TCCCCTCACGCCGATGCCACG
    796 CCATGCCCGCCCCAGCTCCTCA
    797 CCGCCGTGATGTTCTGTTCGCCACC
    798 CGTGGCTGCCCCTGCACTCGTCG
    799 tctggccagtccgtgaaggcctctga
    800 CCGGGGTGCAAGGGCCACGC
    801 cgccgcgcTTCCTCCCGACG
    802 AGCGACCCGGGGCGTGAGGC
    803 TGCGGAAACCTATCACCGCTTCCTTTCCA
    804 ggcagggcggggcagggttg
    805 GGGTCTCCAGACTGATGGGCCGGTGA
    806 CGCTGAAGCCGCTGCTGTCGCTGA
    807 TCCCACGCTCCCGCCGAGCC
    808 AAATATgccggacgcggtgg
    809 CGCCTTTCCGCGGCGGGAGC
    810 CCCAGCCCAGGCCGCAGGCA
    811 ccccgcaggggacctcataacccaa
    812 GAGTTGGCTCGGCGTCCCTGGCA
    813 tccctccgcctggtgggtcccc
    814 TGACCCCTGGCACATCAGGAAAGGGC
    815 TGCCCCGCAAGAACGGCCCAG
    816 GGCCTCGGAGTGCGACGCGAGC
    817 GCGCCAACCCAGACCCGCGCTT
    818 TGCAAGCGCGGAGGCTGCGA
    819 AGCCGGGCCACGGGCAGACA
    820 CCCGGGCGGCCACAAAGGGC
    821 CCCCATCCCAGGTGACCGCCCTG
    822 TGACTCTGGGGGAAGCACGCGACG
    823 GGTGCGGCCGAAGCCGTCGC
    824 TGCCCCTCGGGCCCTCGCTG
    825 GGCCACGGGGACCGGGGACA
    826 GGGCGCCGCAGGGCGACAAC
    827 GCAGCGCGCTTTGGGAAGGAAGGC
    828 GGGTTCCACCCGCGCCCACG
    829 TCGCGGCCCAGACCCCCGAC
    830 CGAGACCCGGTGCGCCTGGGAG
    831 aggtgcccgccaccatgc
    832 cgcccaggctggagtgcagtggc
    833 GCCGGCGAGGTCTCCGCGGTCT
    834 CGCAGGGCCACCGGCTCGGA
    835 GCCCCGGAGCATGCGCGAGA
    836 CCCCTGGGGACCCCTGCCATCCTT
    837 TTAccccgcgccgcgccacc
    838 GCGGGCCGAGCCCACCAACC
    839 GCGCGGTGGCCGCTTGGAGG
    840 CCCGCCAGCGGCctgtgcct
    841 CGCGCATGCCAAGCCCGCTG
    842 GGCGCAGGAGCAGTTGGGGTCCA
    843 TGGGGTAGGCGGAACGCCAAGGG
    844 CCCGCTTCACGCCCCCACCG
    845 GCAGCCCGGGTGGGCAAGGC
    846 TGCAGTTGCCCTTGCCCTGCGAC
    847 TGGCCGGGCGCCTCCATCGT
    848 GCCTGCGATGGGCTCGGTGGG
    849 CCGCGGTTCGCATGGCGCTC
    850 TGGGCCATCTCGAGCCGCTGCC
    851 TGGGGGAGTGCGGGTCGGAGC
    852 CTGCCGCGCCCCCAGCACCT
    853 GGCTGCTGGCGGGGCCGTCT
    854 GGGCGCGGCGACTTGGGGGT
    855 aaactgcgactgcgcggcgtgag
    856 TGCTGGGGCCGTGGGGGTGC
    857 TCCGCGCTGCCCGGGTCCTT
    858 GTGGCGGCCCCCGCGGATCT
    859 GGGGAGGCGCCACCGCCGTT
    860 GGAGCGGGAGGGCGCTGGGA
    861 tgaaggctgtcagtcgtggaagtgagaagtgc
    862 ggagaaaatccaattgaaggctgtcagtcgtgg
    863 ggggacaaccggggcggatccc
    864 CCCGGGAGGAGAGGCGAACAGCG
    865 AGTGCGCGGGTGCCGGGTGG
    866 TGGCATCCCCTACCCGGGCCCTA
    867 GAGGCTGGTTCCTTGTCGTCGGTTGGG
    868 GCGGGGTCAGGCCGGGGTCA
    869 GGCAGCGGCTGGAGCGGTGTCA
    870 GCCCGGGCACACGCCCCATC
    871 gcaccgccacgcccactgcc
    872 TGTCATGCTTCTTTCTCCCCACTGACTCA
    873 gcccaggctggggtgcaatggc
    874 CGCCTCGGGGGCCACGGCAT
    875 CGTGGGTCCTGGCCCGGGGA
    876 TCCCCGGGCGGCCATTAGGCA
    877 GGCGGGGGTGGGAGTGATCCC
    878 CGTCAGTCCCGGCTGCGAGTCCA
    879 CCGGGGTCCGCGCCATGCTG
    880 CATGGCGGGGCCCGAGCGAC
    881 CCGCCTCCTTGCCCCGACACCC
    882 TCGGACACGCCTTCGCCTCAGCC
    883 CGAGCTGGGCGCAGGCGCAA
    884 GCGGGGTTGTGTGTGGCGGAGG
    885 accgcgcccggccTGCAAAG
    886 GCGGGGCCAGAGAGGCCGGAA
    887 GCCCCAAGGGAAGATGCAGGGAGGAA
    888 gccccaagggaagatgcagggaggaa
    889 GCCCGCACGTGCACCACCCA
    890 GGGTGACGAAGTGGTGTCTTTACCGAgga
    891 CCGCCGTGCGCCTGTGGGAA
    892 ggctgctgcgggaggatcac
    893 TGGGCATCCAGAAAAATGGTGGTGATGGC
    894 gccgcgccgggccCTATGAG
    895 CCGCCATGCGGGCAGGGACC
    896 TGTTACAggctggacacggtggctc
    897 cggaacttgcagggggccga
    898 TGCAAAATCCTCCCCTTCCCGCACCC
    899 GCGCTGGAGCCACGCGACGA
    900 GGGGTCCGCTCCCGCGTTCG
    901 CGCCCCGGGCTGAGAGCTGGGT
    902 GGCCCTTCGGGGGCCGGGTT
    903 TGGCCACAAAGGGGCCGGAATGG
    904 ACCCCAGCGCGTGGGCGGAG
    905 GGGCTGCGGGGCGCCTTGAC
    906 GCACCGCGGCTGGAGCGGAC
    907 AGGCGATCCCAAGGCTGTTGGAGGC
    908 tccacccgccttggcctccca
    909 cggcgggaaggcggggcaag
    910 ggagccgcggcgtgagtgcg
    911 GGCCGGCACCCCACGCCAAG
    912 GCGGGGCGGAGCGCACACCT
    913 GCGGCCAGCAGCGCGTCCTC
    914 CCGACAGCCGGCAAGGCCCAA
    915 ttgtttttgtttgtttgttttgaaagggag
    916 CCCCGGTTTCCCCGCGCCTC
    917 GGCTGGACGCGCCCTCCGACA
    918 TCCCACGCGCCCGCCCCTAC
    919 cggccacgccttccgcggtg
    920 GGCTCCGCTGGGGCGCAGGT
    921 GCCGCCCCGTGTCGTGCGTC
    922 GGCGTCAGTTGGAGTGTGGGGTCGG
    923 CCGAGCGGGGTGGGCCGGAT
    924 CATCGCGCGGGACCCAACCCA
    925 CAGTGGGTGGATCTCACCTGCCTTCGG
    926 GAGGCCGCGGGGCTCCGACA
    927 GAGCCTGCCCTATAAAATCCGGGGCTCG
    928 TCCCGGCGGGTGGTGCCTGA
    929 TCTGAGCGCCCGCCGCCTGC
    930 GGCTGCCGGCGCGGGACCTA
    931 TCCGGGGCATTCCCTCCGCGAT
    932 TGGCGGCGGCCCCTGCTCGT
    933 cggcgCGCGACTGGGAGGGA
    934 GGCGCCAGCGCAACCAGAGCG
    935 CGAAGGTGGCGCGGCCTGGA
    936 CCCAGCGGGCTTCGCGGGAG
    937 CCCGCTTGCCCCGCCCCCTA
    938 CCCACACCTCCACCTGCTGGTGCCT
    939 ATGCAGCCCCGCCGGCAACG
    940 CCGGATGCCCGGTGTGCCTGG
    941 GCGAGCAGGGACGCAGCTCTGGTG
    942 CGCGCTCGGCCCGCTCAGTG
    943 TGGTGCCGGCAGGGAGGGGA
    944 GGGCGGTGGCGATGGCTGGC
    945 GGCTGTTGGTCTTTTTCCCAGCCCCGAA
    946 CCGGGCCGGCAGCGCAGATGT
    947 CGGAGGGCGATGGGGCCCTG
    948 GGGGCCGGGCTGCGAAGCTG
    949 TGCCTGGGCACCCCACGGACG
    950 GCCCTACGTCCGGGCAGCACGC
    951 CTGTGCGCGTCCCCGCCGTG
    952 TGCAGCGGCGCCTCGGACCC
    953 ccgctgggcgcgctgggaag
    954 GGCGCATGCTCTGCGCGTATTGGC
    955 GGGTGGGCGGGCCGTTCTGAGG
    956 GGGCTGCCGGGTTGGCGCAG
    957 GGCGCGTGCGGAAAAGCTGCG
    958 TCCAGGCCGCCCTCGGGTCA
    959 GGGGAGGGGGCGCAGCCAGA
    960 GGCAGCGTGGTCTTCCACTTCCCCCT
    961 GGGATCGAGGGATCGAGGCAGGGGA
    962 CGGCCATGAGCGCCTCCACGC
    963 CCCGGTGTGCGGCAGCGACG
    964 TTGGGGCGGCCGGAAGCCAG
    965 CGCAGCGGCGGCGTCTCGGT
    966 CCGCGACCTCCCCAAGCCACCC
    967 GGCGGCCGACCGCGAACACC
    968 CCCCATTTCCGAGTCCGGCAGCA
    969 CCCAGCCTGGCCTCTCCTCTCAGGCA
    970 cggctctttcctcctcaagagatgcggtg
    971 CGCCGCCGTCCCTGGTGCAG
    972 TGGGGACCCCTCGCCGCCTG
    973 GCGCCCAGCCCGCCCCAAGA
    974 caggggacgcgggcgtgcag
    975 CCGGGCGGGGCCCAACTGCT
    976 CCCGAGCAGGGCCGGAGCAGA
    977 CCCCTCCACATTCCCGCGGTCCT
    978 TCCTTTGTGGCCTGGGCAGGATGCAG
    979 GCAGCGCGCGGTTTGGGGCT
    980 GAGGCCTGCGGGCGCTGCTG
    981 TCACGGTTGCTGGGCCGTCGC
    982 CGGGGTGGGCCTCGCGGAGA
    983 GCCTGCGCTCCTGGCGCCCT
    984 CGCCTTCGGAGAGCAGAGTCAACACGGA
    985 TGCCCCTAAATGAGAAAGGGCCCTTGAG
    986 GCCACGCCCCGGGACCGGAA
    987 TCCCGCCCAGGGGCCTCCCA
    988 ccccgcgcccggccAAAGAA
    989 GGACCGCCGCACAGCCCCAA
    990 GGGCAGCGGTGGCCGTGCAT
    991 TTCCTGCGCCGCCCCCTCCC
    992 GGCGTCTCCCTGTCCCCGCCTG
    993 GCCGGCCTCGCGCACCGTGT
    994 CCCGGGACGTGCGCGCTTGG
    995 TGTCCCCCGAGCCGCCCTGC
    996 TCGCTCTCGTGCAGCGGCGTCA
    997 CCCGCGCGCTGCAGCATCTCC
    998 CCCCAGCTGCCGCCATCGCA
    999 GCCCGGGCCCGCCTCAAGGA
    1000 TGCCGGCGAGGCCTTTTCTCGG
    1001 GGCGGGTGGGGAGCGCGAAC
    1002 CCCGCCGCCGCTGGTCACCT
    1003 ccggctgcctcggcctccca
    1004 ggtgtgcaccaccacgcc
    1005 GGCGCGTCCCGGCGGCTTCT
    1006 AGTCCCTGCGCCCCGCCCTG
    1007 TGCCCCCAAACTTTCCGCCTGCAC
    1008 CTTGCGGCCACCCGGCGAGC
    1009 TCGCGCGGAAACTCTGGCTCGG
    1010 GCTGCGGCCCAGAGGGGGTGA
    1011 CGGCGGGCTTGGGTCCCGTG
    1012 TCCCCCGCCGCACCAGCACC
    1013 GCGCGGTGCGGGGACCTGCT
    1014 GCCGGACGCTCGCCCCGCAT
    1015 GAGTGCTCTGCAGCCCCGACATGGG
    1016 CCGCGCAGACGTCGGAGCCCAA
    1017 TGGCCGAGGCGCGTGGCGAG
    1018 GGCCGCGCTGCCCCAGGGAT
    1019 CCGGGGGCGGACGCAGAGGA
    1020 GGGGGCGGAGCCTGGGAATGGG
    1021 GGGCGGGCCCTGTGGGTGGA
    1022 CCGCTCCCCCATCTCCACGGACG
    1023 GACCCAGGGAGGCGCGGGGA
    1024 TGCCCGGCCGCAGGTGACCA
    1025 GCGCCGGGAGTGGGCAGGGA
    1026 ACCCAGGCCGGCGCGGGAAG
    1027 ttcccgccgcccggtcctca
    1028 CGCGCCGGTGACGGACGTGG
    1029 AACCCTCCCAGCCAAAACGGGCTCA
    1030 CGGGCGAGGCCGCCCTTTGG
    1031 GGCCGCGGACGCCCAGGAAA
    1032 CCGTTTGGAACGTGGCCCAAGAGGC
    1033 CCCGCCTCCGCTCCCCGCTT
    1034 ggtggcggcggcagaggagga
    1035 CGCGGGGAGCAGAGGCGGTG
    1036 gggcgcccgcgctgagggt
    1037 GGGCCTGGCCTCCCGGCGAT
    1038 CACCCGGCGTCCGCACCAGC
    1039 CGGCGCTGGTTTGGCGGCCT
    1040 ccaggagccccggaggccacg
    1041 GCGATCTCCTGCCCAGGTGTGTGCTC
    1042 ACTGCCCGGGCTCGCCGCAC
    1043 TGCGGCAACGGTGGCACCCC
    1044 GGAGCGAAGCTGGCGGAACCCACC
    1045 GGCGGCCGACGGGGCTTTGC
    1046 GGCCGCGGGTGCCTCGGTCT
    1047 GCGCTCCAGCCATGGCGCGTT
    1048 GCCGGACGGGCGTGGGGAGA
    1049 TCCCCCGCGACTGCCCCTCC
    1050 GGGTGGCAGCGGGTGCGGAA
    1051 gctcgcccgctcgcagccaa
    1052 CGAGGTTCCGCAGCCCGAGCCA
    1053 GCGCGGGGGACCGAAACCGTG
    1054 GCCGAGCCCGGCCCAAAGCC
    1055 TGCCAACGTTCACCCGGCTGGC
    1056 GACAGTGCGAGGGAAAACCACCTTCCCC
    1057 GGGTCGGGCCGGGCTGGAGC
    1058 GGGTCGGGCCGGGCTGGAGC
    1059 GCGGGGCCGAGGGGCTGAGC
    1060 GCCCGGCCACCTCGGGGAGC
    1061 ACTGTCTGCCAAGCCAGCCCCAGGG
    1062 GGATGGTGGCGCCGGGCTGC
    1063 TCCAGGAGGGCCAGGTCACAGCTGC
    1064 CGGCTGGCTCGCTTGGCTGGC
    1065 TCCGGCGCTGTTGGGCAGCC
    1066 CCTGCGCACGCGGGAAGGGC
    1067 TCTTCCCTTCTTTCCCACGCTGCTCCG
    1068 CAGCGCCCCCGCCTCCAGCA
    1069 GCTGCGCGGCTGGCGATCCA
    1070 GCCGACGACCGGAGGGCCCACT
    1071 TGCCCAGGCTGGCCCCTCGG
    1072 CGCGGCCCTCCCCAGCCCTC
    1073 CCCCGCCCGGCAACTGAGCG
    1074 AAGAGCCCGCGCGCCGAGCC
    1075 TGCCCACTGCGGTTACCCCGCAT
    1076 GCATGGTGGTGGACATGTGCGGTCA
    1077 CATAGAAGAGGAAGGCAAAGGCTGTGACAGGCA
    1078 TCATCCTAGACTTGCAGTCAAGATGCCTGCCC
    1079 agccagcggtgccggtgccc
    1080 gccccgctccgccccagtgc
    1081 CACGGGGGCGGGGAGACGCGGGGTGCACTTCTCGCCCCGAGGGCCTCCGGCGAAGCAACCCGGCAGC-
    CGCGGCGCCCGAGGGCCTGGCGCTGGTCTGGGGCTGCGCCGGGGGCGCCTGGCTCTGGGGTGCGGCCGGTCAG-
    GAATCCCCATCCTGGAGCGCAGGCGGAGAGCCAGTGGCTGGGGGCGGGAAGGCTTCTTGGACCCCTCGCGCTTC
    TCCGA
    1082 CACAGGGTGGGGCAGGGAGCATCAGGGGGCAGGCAGCCACACCCCCGACACATCAAGACACCTGAGT-
    GGCAGGTTCAAGCCGGAGGCGCTGTATTTCCACACAGGAAGAAGGCCAAAAAAGGTGACACTGC-
    CCCCTCCCAGTGGCTCCATGCTCCTCAGCTATGGCTGTCCGGGCCGCCTCACTCAAAGCCTTGCCCTCCGCTGC
    TGCCAGGCTCCTTGCATGCAAGGCAGCCCCCACCCGGC
    1083 accgggccttccgcgcccctcgccccacgccgcgggtgcggtcctccctccagcagagggttccgg-
    gcgccggcgcggcccgcacggggccgggagcccttcctgccggccgggtgcgcgcggcgccgccgacagct-
    gtttgccatcggcgccgctcccgcccgcgtcccggtgcgcgccccgcccccgccaacaaccgccgctctgattg
    gcccggcgcttgtctcttctctccccgcagccaatcgcgccggg
    1084 CCCACCTCCCCCAACATTCCAGTTCCTTCTTTTCCTTCTACTCTTCAGCGGCCTCAGCCTGCGCAC-
    CCCAGGAGCGTGGATGACTACGGCCACCCCGGGCGCGCACCCCTTTCCCACCACCCCAGCATCTCTGCAGC-
    CCAGGACACCCGCCTCCCCCACACCCCGCATCCGGTGTGTCTCCGCCTGGCCCGGCCGGCGCGGCAGGCGGGCC
    AGGGGACCAACTGCACGGCC
    1085 CACAGAGCCAGGCAAGCATGGGTGAGAGCTCAGACCATCCTTGTTGGACTAAAAGGAAGGG-
    GCAGACTGCCCATGGGGGGCAGCCGAGAGGGTCAGGCCCCCATAGGTCCTCAGCCTGCTTCAACCTCAAAGGG-
    GATGGGGGGCTGAGTGGTGCCAGAGGAGCAGCAGGCTCGC
    1086 ggagcagcaggctcgctcggggagagtagggccttaggatagaagggaaatgaactaaacaac-
    cagcttcctcccaaaccagtttcaggccagggctgggaatttcacaaaaaagcagaaggcgctctgtgaa-
    catttcctgccccgccccagcccccttcctggcagcattaccacactgctcacctgtgaagcaatcttccggag
    acagggccaaagggccaagtgccccagtcaggagctgcctataaatgc
    1087 gcccaaagtgcggggccaacccagacagtcccacttaccaggtcttctgaaagacagctgacaa-
    gagacatgcagggctgagaggcagctcctttttatagcggttaggcttggccagctgcccacagcttcaggc-
    catcagagacagcttctccctgccagagttgctacagtctctggtttctcaaccaggtgaatgtggcaatcact
    gtgcagaatgaaaattttgggtggggaggtaggagaagcggaaag
    1088 GGAAAGAGGAAGGCATTTGCTGGGCAATAGTGCCCAGAAGGAAAAAGCAGGTAGGGGG-
    GCTCTTTTTCTGGGCTGCTGGCATCCACTTGCTTGATCCAGCCAGATTCCCACTCCCATGCCCTCTCCACTAT-
    TGCGATTGCTAATCCCCTGCATTGGTGGTCAGGCCA
    1089 CAGCGGCCCCGCGGGATTTTGCCCAGCTGCTTCGTGCCCTCTGGTGGCTAAGGCGTGTCATTGCAGT-
    GCCGGCCTCCTGTCATCCTCCCTTTCTTGTCCGCCAGACCCTCTGGCGCCCTGCTTACGACTCAAACAG-
    GAGACAGTGCTGATTCATTTCCAAGCGGCCTTCCTACACCCACACCTGCTTCACATAGATGAGGTTTCCCGGAC
    AGTCCCTGCCCAGAAGCCCAGGTGGA
    1090 ccgacagcgcccggcccagatccccacgcctgccaggagcaagccgagagccagccggccg-
    gcgcactccgactccgagcagtctctgtccttcgacccgagccccgcgccctttccgggacccctgc-
    cccgcgggcagcgctgccaacctgccggccatggagaccccgtcccagcggcgcgccacccgcagcggggcgca
    ggccagct
    1091 GGGCCAATCCCCGCGGCTGGGCAGAGCGACCCGAGGGCGGCGCCCTGCAGACCACGTGGCCCGGGAG-
    GCGCCGAGGCCAGGTAGGTGGTGAGTTACTTGGCTCGGAGCGGGCGAGGGGACGCGTGGGCGGAGCGGGGCTG-
    GCCAGCCTCGGCCCCCATGACCCGCTGTCCTGTGCCCTTTCCCAGCGATGGGCGTGCAGCCCCCCAACTTCTCC
    TGGGTGCTTCCGGGCCGGCTGGCGGGACTGGCGCTGCCGCG
    1092 GGCGGCTGCGGGGAGCGATTTTCCAGCCCGGTTTGTGCTCTGTGTGTTTGTCTGCCTCTGGAGGGCT-
    GGGTCCTCCTTATTCACAGGTGAGTCACACCCTGAAACACAGGCTCTCTTCCTGTCAGGACTGAGTCAG-
    GTAGAAGAGTCGATAAAACCACCTGATCAAGGAAAAGGAAGGCACAGCGGAGCGCAGAGTGAGAACCACCAACC
    GAGGCGCCGGGCAGCGACCCCTGCAGCGGAGACAGAGACTGAGCG
    1093 GCCAGGACCGCGCACAGCAGCAGGGCGCGGGCGAGCATCGCAGCGGCGGGCAGGGCGCGGCGCGGGG-
    GTAGGCTTTGCTGTCTGAGGGCGTCTGGCTGTGGAGCTGAAGGAGGCGCTGCTGAGGAGTTCCTGGACGT-
    GCTCCTGACGCTCACTGC
    1094 CGGGCAAGAGAGCGCGggaggaggaggaggagaaaaaggaggaggaggaggaggaggaggCGGC-
    CCCGCATCCCTAATGAGGGAATGAATGGAGAGGCCCCCTCGGCTggcgcccgcccacccggcggcggccgc-
    cAAGTGCCTCTGGGCGCTGCGTGCCGCGCCCGCTGCTCCGCGCGCAGCCGGCTCGGGCCGCTCCTCCTGACTGA
    GGCGCGGCGGCGGCGGTGGCTGTGACCGCGCGGACCGAGCCGAGAC
    1095 GCGCGCAGCCAGGGGCGACGCTTCCGCTCCGAGCCGCGGCCCGGGGCCACGCGCTAAGGGC-
    CCGAACTTGGCAGCTGACCGTCCCGGACAGGGAGGCCCTTCAGCCTCGACGCGGCCTGCGTCCTCCGGAGGGC-
    CCTGCTCCGCCCGGGAAGCGTCCGCCTCCCGCCCGCCCGCCCGCAGATGTCGCTGCCCCTCTGGCTGTCCCGGC
    CTGACCGCCGCGCGCCGCCCTGCTGCTCACCTACTTCCGCGCCACGG
    1096 GTGCGCTCACCCAGCCGCAGGCGCCTGAGCGGCCAGAGCCGCCACCGAACACGCCGCACCGGCCAC-
    CGCCGTTCCCTGATAGATTGCTGATGCCTGGCCGCGGGAACGCCCACGGAACCCGCGTCCAcggggcggggc-
    cggcggcgcgcgcgccccctgccggccggggggcggAGTTTCCCGGGCGCCTGCCGGGTGGAGCTCTGCGGGCC
    GCT
    1097 GAGGGCCCGGGGTGGGGCTGCGCCCTGAGGGCCCTGCCCTGCCCTCCGCACGCCTCTGGCCACG-
    GTCCCTTCCCCGGCTGTGGGTCTGCGGCCCCTGCGTGCGCAGCGCTCCTGGCCTCTGCGGCCAGCGCGGGG-
    GCGGAGAGAGGAGAGTGCCCGGCAGGCGGCGGCTGGGCCGGCCCGGAACTGGGTCGTGGAAGGATCGCGGGGAG
    CGGCCCTCAGGCCTTCGGCCTCACTGCGTCCCCACTTCCCTGCGCC
    1098 TATGCgcccggcgcggtggctcacgcctgtaatcccagcactttgggaggccgaggcgggcggat-
    cacgaggtcaggagatcgagaccatcctgactaacacggtgaaaccccgtctc-
    tactaaaaatacaaaaattagccgggcgcggtggcgggtgcctgtagtcccagctacttgggaggctgaggcag
    gagaatggcgtgaacccggggcaga
    1099 CGCAGGGGAAGGCCGGGGAGGGAGGTGTGAAGCGGCGGCTGGTGCTTGGGTCTACGG-
    GAATACGCATAACAGCGGCCGTCAGGGCGCCGGGCAGGCGGAGACGGCGCGGCTTcccccgggggcggccg-
    gcgcgggcgccTCCTCGGCCGCCGCTGCCGCGAGAAGCGGGAAAGCAGAAgcggcggggcccgggcctcagggc
    gcagggggcggcgcccggccACTACTCGCCAGGGCCCGCCCG
    1100 CCTGAGGCGGGGCCGTCCGGCACCCTGTGATGGGGCGTGGCCCCTGGGGAGGCTCCCACCAGCCCT-
    CAGATTCCTCAGGGCCGCAGAGGTGTGGAGCTGGTTGGGCCGGTTCTTCACCCTCCTCCCCTGGTGCTTGCCT-
    GTGCCCCAGCAGGGTGACAGTGATGTAGTAGCGGGTCCTCCTGGAAGAGGGACGCGTGTGTAGGGTCTGGGCAG
    GCTCTGGCAAGGCAGTCCCTGGGGTGGCGGGCTTGC
    1101 GAGGCCGGGGACGCCGAGAGCCGGGTCTTCTACCTGAAGATGAAGGGTGACTACTACCGCTACCTG-
    GCCGAGGTGGCCACCGGTGACGACAAGAAGCGCATCATTGACTCAGCCCGGTCAGCCTACCAGGAGGCCATG-
    GACATCAGCAAGAAGGAGATGCCGCCCACCAACCCCATCCGCCTGGGCC
    1102 CCGCCGGCTCCCCCGTATGAGGAGCTGCCATAGCTTTCGAATCCACCTGTTTTGAACAACAG-
    GATTAGTGCCTGTGCCACGTCCCACGCCTCCGAGAAACCCGCAGGCTCCCGGAGGCTTCGC-
    CCCTTCAAACACTGCCCGAGTCTCCCTAACCTTCCTCGCCGCCTTCCTGCGGGTGACCCCCAAACGCCCCAGCT
    CCGCTCCCGCCCTTCCTCTCCCGCTACCACACGCCTCTCGGA
    1103 CAGGAGCGACGCGCGCCAAAAGGCGGCGGGAAGGAGGCGGGGCAGAGCGCGCCCGGGACCCCGACT-
    TGGACGCGGCCAGCTGGAGAGGCGGAGCGCCGGGAGGAGACCTTGGCCCCGCCGCGACTCGGTGGCCCGCGCT-
    GCCTTCCCGCGCGCCGGGCTAAAAAGGCGCTAACGCCCGCGGCCGCCT
    1104 cgggggaaacgcaggcgtcgggcacagagtcggcaccggcgtccccagctctgccgaagatcgcg-
    gtcgggtctggcccgcgggaggggccctggcgccggacctgcttcggccctgcgtgggcggcctcgccgg-
    gctctgcaggagcgacgcgcgccaaaaggcggcgggaaggaggcggggcagagcgcgcccgggaccccgacttg
    gacgcggccagctggagaggcggagcgccgggaggagaccttggcc
    1105 ccccccaccctggacccgcaggctcaggagtccacgcggggagaggggatg-
    gagaactctcctcgcttcgtcctctctcccggggaatccctaaccccgcactgcgttacctgtcgctttggg-
    gaggccgctgccgggatccggccccgaacagcccgggggggcaggggcgggggtcgtcgaggggatgggggcag
    agagcaggcggcgggcaggatgcc
    1106 GCCCGGCTTTCCGGCGCACTCCAGGGGGCGTGGCTCGGGTCCACCCGGGCTGCGAGCCG-
    GCAGCACAGGCCAATAGGCAATTAGCGCGCGCCAGGCTGCCTTCCCCGCGCCGGACCCGGGACGTCTGAACG-
    GAAGTTCGACCCATCGGCGACCCGACGGCGAGACCCCGCCCCA
    1107 cgctgggccgccccTTGCTCTTAGCCAGAGGTAGCCCCTCACCCCGCGACTTACCCCACAC-
    CCCGCTCTCCAGAACCCCCATATGGGCGCTCACCGCCCGCCCGCACAGCTCGAACAGGGCGGGGGGAGCGT-
    TGGGGCCCGAGGCCGAGCTCTTCGCTGGCGCCGCCTCCCGGGACGTGGCCTCCATGGTCGTTGCCGCCGCTACC
    TCACAGAACCAGCAACTCCGGGCGCGCCAGGCCTCGGGCGCCGCCATCT
    1108 GCTTCTCCATAGCTCGCCACACACACACACACACGCCACGCACCGTATAAAAGCCTAAATGACACAC-
    CACTGCAGCGTTCAAACGCTGGGAAGAAGACTCCCTTGTGGCACCGGAAACCCACGAGGTTGGAAGTGG-
    GAGGGGAAGAGGGCCAGATACTTCACCTGAAAATCCGCCAGGATCATCTCCCGGTCCATGTTGGACGCCATGGC
    GGCCGCCGAGTTCCGCGGCTCCGGGAGCGAAGCGCGCACCTGG
    1109 CCGCGCACGCGCAAGTCCAGGCCGCCGCGGCCCTGGAATAGAGACTCGCCCTTGAT-
    GTCCCTCTCGAAGTAGTAGGCGGCATCGCCGATATCCACGTCACCGGCGGCCTTCTGAGACGTGTTCTGC-
    CGCAGCTCGATCTGGATGGTGGGCTGCTCGTAGTGCACGGCCGCCACGAACTTGGGGTGCAGCCGATAGCGCTC
    GCGGAAGAGCCGCCTCAGCTCGGCGTCCAGGTCTGAGTGGTTGAAGGCGCCGGCG
    1110 GTCTCAACTCACCGCCGCCACCGCCGCGCAGCCCCGCGGCCGCTGCTCCATAGCCCTCCGACGG-
    GCGCCCAGGGGCTTCCCGGCTCCGTGCTCTCTGCCCGTCGTGGTTCCGCCTTCAgccccgcgcccgcagggc-
    ccgccccgcgccgtcgagaagggcccgcctggcgggcggggggaggcggggccgcccgAGCCCAACCGAGTCCG
    ACCAGGTGCCCCCTCTGCTCGGC
    1111 ACAAATGCGCTGCTCGGAGAGACTGCCGCGGCAACCAACTGGACACCCCAAGAGCT-
    CACTCCTCCGCGGTTTTATATTCCGACTTGCGCACAGGAGCGGGGTGCGGGGGCGCAGGGAGTGTGG-
    GTAACAGGCATAGATTCCGCTTGCGCAATACGTGGTAAGAAACCAGCTGTGAGGGGCTGGCCCAACGCAGAGCG
    GCGCGA
    1112 GCGCCTGCGCAGTGCAGCTTAGTGCGTCGGCGCGCAGTTCTCCCGCCCGTTTCAGCG-
    GCGCAGCTTCTGTAGTTGGGCTACTGGAGGGGTCGCTCAGAAACCTCATACTTCTCGGGTCAGGGAAG-
    GTTTGGGAGGATGCTGAGGCCTGAGATCTCATCAACCTCGCCTTCTGCCCCGGCGG
    1113 aagtcaagggctttcaacctcccctgccccattcatacagtggaaggtctaacccaggcttgt-
    cagcctaagaacacgggatctcttcactgtggttcatgtgtagagtggagtttccatgctgaga-
    gagacaagcaaagaagaccagaggctcccacccctgtccagtgGA
    1114 tggatcccgcacaggggctgcaggtggagctacctgccagtcccctgccgtgcgctcgcattcct-
    cagcccttgggtggtccatgggactgggcgccatggagcagggggtggtgcttgtcggggaggctggggc-
    cgcacaggagcccatggagtgggtgggaggctcaggcatggcgggctgcaggtccggagccctgccctgcggga
    acgcagctaaggctcggtgagaaatagagcgcagcgccggtgggc
    1115 CCGCCTGTGGTTTTCCGCGCATTGTGAGGGATGAGGGGTGGAGGTGGTATTAGACGCAGC-
    CGAATCCTCCCTCAGAGTCCGCCAGGTGGGCGTCTCAGGGGTGGGAGTGGCCGCGTCGTGAAGCGGAGAGAG-
    GATTTCTCTCCTGGTCCTGGAGAAGGCCCCCGGCGGCCGGCGGCATCCCTCGCTGGCGAGTCCCGGGAGCGAGG
    TGGTCTCTGCAGGGGAGGAAGTTCCCGGGCGGCGCGGCCTGCGTCACAG
    1116 cgcgctctcccgcgcctctgcccgcccccggcgcccgcccccgccgctcctcccgactccccgc-
    ccccggcccGGGTCACTTGCCGTCGCGGTGGGCGGCCCCCGGCGAGTCCACACCCCTGCCCCGCCTCCTCCCG-
    GTAGGAAACTCCGGGACCCTGCAAGGGATGACTCACCCCAGTGATTCAACCGCGCCACCGAGCGCGGAGCTGCC
    CTGGAGGACGCAGGCGGGTC
    1117 TCCGGCCCAGCCCCAACCCCGACCTAAGTAACCGGCTATCGGCCACCCATTGGCTGAAGTCCCT-
    GAGCACCTGTTGGGAGGAAGGCTGCTGCGTGCAGCCGGAAAGTCCTGCGTCCCTCCGCTCTTACCGCGGCAG-
    GAACCACAGCCTCCCCGAACCTCAGGGTTTGTATGGATTTCGCCCAGGGGAAAGCGCTCCAACGCGCGGTGCAA
    ACGGAAGCCACTGGCTGGTTGGGCGGCTGTGATGGG
    1118 CCGGGTCAGGCGCACAGGGCAGCGGCGCTGCCGGAGGACCAGGGCCGGCGTGCCGGCGTCCAGCGAG-
    GATGCGCAGACTGCCTCAGGCCCGGCGCCGCCGCACAGGGCATGCGCCGACCCGGTCGGGCGGGAACAc-
    cccgcccctcccgggctccgccccagctccgcccccgcgcgccccggccccgcccccgcgcgctctcttgcttt
    tctcaggtcctcggctccgccccGCTC
    1119 GGGGCGGTGCCTGCGCCATATATGGGAgcggccgcccctcgccgcgcccctcgccgccgccgccgc-
    cgcgctcgccgactgactgcctgacggcgccgcgagccggcccgagccccgcgagccccgcgagccccgccgc-
    cgccgagcgccaccgagcgccgccgccgccccccgccacgcaccgcggcTCCTCGCGTCCAGCCGCGGCCAAGG
    AAGTTACTACTCGCCCAAATAAATCTTGAAAAGAAACAAACG
    1120 GCGCGGGCCCTCAGGTTCTCCCTATCGAAGCGGTCTATGGAGATAGTTGGATACTCGGCCATCTGC-
    CCCTCGAAAGAACTCATAGCGCCGCCGATCCCAGAGTCCGGGACCCCAAAACCGCAGCTGAAGCCAAGGC-
    CAGCCCTGACCGCGCCGCCACTTCCGGGAAGCCGCGCGCTGCCTCGCCATTGGGCGGCCGAACGCAGCCACGTC
    CAATCAGAGGAGTCCGGAGACCGGGGGCAAAGTCAAGGAGCATCC
    1121 cgtccgcggcTCCTCAGCGTCCCCCTTTACGGTCTGGGCGGACTGCGGGGGCTGGGGAGGTTCTGGG-
    GACCGGGAGAGTGGCCACCTTCTTCCTCCTCGCGAAGAGCAGGCCGGGCCTACCCGTCCGCCCGCTCTGC-
    CGTCCGCTGGCCGGCCGACTGCTGCCCGATCACTCCTGAGGCCGCCGTTGGGCGACAGGGCGGTGCGGGAG-
    GAGGACTGCGCAGGCGCAGTGGGCCAGGCGGCCCGGCGACCAATCGG
    1122 GGAGGCGCCCAGCGAGCCAGAGTGGTGGCTGGTCCCGCGCGGTGAGTGGGATTGGGGCACTTGGG-
    GCGCTCGGGGCCTGCGTCGGATACTCGGGTCCGCTCGGGAGCGCGCTGGCCGCAACGAGGGCGGCGCGGGC-
    CCGGGCGATGGCGTGGCTTGCGTCTCCCGCCTccgggcagggcctggccgccgggcgggggcgggagggccacg
    cgggcccagggtggggccgcggcctgcgcggcgggcgggccgggt
    1123 CGCGCAGGGGGCCTTATACAAAGTCGGAGAAGTAGCTGGGTCGCTGGCCGGCCAGGGACTCAAGC-
    CGCCTCAGGTGAGCGCTCCTTGGCGCTACTTCCGGTCTCAGGTGAGGCCGCCGGAAGCGGGCACTTGGC-
    CCTAAGACCCGCTACAGTGCGTCCTCGCTGACAGGCTCAATCACCACGGCGAGGCCAAggcgcggggccgcggc
    ccgcccgAGAAGCCTGAGCTGGGCCCCGACACCCCCTGCCCGACATT
    1124 CCCCACCCCCTTTCTTTCTGGGTTTTGATGTGGATGTCTTTCTATTTGTTCAGGAAATTGTGACGT-
    GTGTTCTGGGCAGGGTTTGAGGTTTTGGAACATTTTCTAAAAGGGACAGAGAGCACCCTGCTA-
    CATTTCCTAATCAAGAAGTTGGCGTGCAGCTGGGAGAGC
    1125 GCGCGTTCCCTCCCGTCCGCCCCCAAgccccgcgggcctcgcccaccctgcccgccgcccctccgc-
    cggcggccgcccTCTGCGGCGCCCCTTTCCGGTCAGTGGAGGGGCGGGAGGAGGGGCGGGGGTGCGCGGG-
    GCGGGGGGAGAAGTCCTGGAGCGGGTTTGGGTTGCAGTTTCCTTGTGCCGGGGATCCTGTCCCCTACTCGCCAG
    CGCCAGGCTCCTCC
    1126 ccggcggAGGCAGCCGTTCGGAGGATTATTCGTCTTCTCCCCATTCCGCTGCCGCCGCTGCCAG-
    GCCTCTGGCTGCTGAGGAGAAGCAGGCCCAGTCGCTGCAACCATCCAGCAGCCGCCGCAGCAGCCATTACCCG-
    GCTGCGGTCCAGAGCCA
    1127 GCCTGGTGCCCCGAGCGAGCCGGGAGTAGCTGCGGCGGTGCCCGCCCCCTCTCTCCGC-
    CCCTCCAGCGGAGCTGGTCTCCGGCCGGGCACCGTCGCGGGCCCCCCTGGCCCGGCCACCTGGGACCGTGCT-
    GGGGAGTCTGCCACTTCCCTCTCTCCCCTGGCCCGCAAAGTTTTGGCGGAGCCATCGCTGGGGCTGAGCGCGCC
    CCCGGGGGGAGATCGGGGAGCGCCCGATGCCGGGCGGCCGGAGCCATTGAC
    1128 GGCGGCGGCGCTACCTGGAGGCGCGGTGGCGGGCAGGTGCCCGAACTGCACGGCGATGCAGAG-
    GTCGTTGTCCAGGGGGAACTTGTGGCAGTGCAGCATCTCAGGCCAGGGGAAGCCGTAGGCCTCCATGAGCG-
    GCGCGCAGCCGGCGCGCACGGCCTCGCACAGCGAGCGGCACGGGTAGATGGGCCGGTCGAGACAGACGGGCGCA
    AAGAGCGAGCACAGGAAGACCTGCGTATCCGAGTGGCAGCGCTTGGC
    1129 TGGTGGCCAGCGGGGAGCGCCCGGGCGCCATCGGCGCGTCCTGCTCCACCAGGGCGACCCTGG-
    GCGCTGAGAAGCGGGAATCTTCCTTGGGGACCAGGGCGACGCCTCCTGCTGCCGCCCCCGGCGGGACAGC-
    CGCGGCTCCTCCTCCAGCCGCCGCGCCACCCAGAGCCCGAGGTTTGCCCTTCAGAAGCGGACCCGCAGACTCCT
    CGGACTCAGAGCCATCCTCCTCCTCAACCTCCACCGCAGCGGCCTGCG
    1130 GCGGCACTGAACTCGCGGCAATTTGTCCCGCCTCTTTCGCTTCACGGCAGCCAATCGCTTCCGCCA-
    GAGAAAGAAAGGCGCCGAAATGAAACCCGCCTCCGTTCGCCTTCGGAACTGTCGTCACTTCCGTCCTCAGACT-
    TGGAGGGGCGGGGATGAGGAGGGCGGGGAGGACGACGAGGGCGAAGAGGGTGGGTGAGAGCCCCGGAGCCCGAG
    CCGAAGGGCGAGCCGCAAACGCTAAGTCGCTGGCCATTGGTG
    1131 CTCGGCGATCCCCGGCCTGAACGGGTAGGAGGGGTTGGGGGATTCCGCCATCCCTTGTTTTGAG-
    GCGGGAACGCAACCCTCGACCGCCCACTGCGCTCCCACCCACACCCAGAGTAATAAGCTGTGATTGCAGGCT-
    GGGTCCTCACCGTCTGCTCGCCAGTCTTCTCCTTTGAGGACTCAGAAGCCAAGGGTTGCGGGAGGCACCA
    1132 CGCAGGGAGCGCGCGGAGGCCCGCAGGGTGCCCGCCTGGCCGCAGAGGCCGCGACGCCCCCTCCGC-
    CACCCTCGGGCCGCCGAAAGAACGGGCAGCCGGGAAATCCCGTGTCCCCACTCGTGGCAGAGGACGCTGTGGG-
    GCGGGCGGGCTGCGGGCTCCCGGCGCCTTCCCGCAGAGGCGGCGACAgcggccgccccccccgcggggccgggc
    cggggAACTTTCCCCGCCTGGAGCCGGGC
    1133 GAAATACTCCCCCACAGTTTTCATGTGATCAGGAATTCAGCATAGGCTATAAGACGGAGTGCTCCAT-
    GTCAATAGAGAATATTTCCACAGGTGTGCTAGGCACTTGTGGTAGATGTTGCAGGGAAGTCAGGACTGGG-
    GACAGCTTGGTCCCTACTTCAAGGTTACAGTCTAGGAGCTGAGAGTGGCAAAGTGACCTGATTCTACAGGGTAA
    AAGCCCCAGAGATAAATGACATAGGTCCAGGTCAGCCAGCATTG
    1134 CCGGGCGCACGGGGAGCTGGGCGGACGGCGGCCCCCGCCTCCTCCGGGGACGCGGCAC-
    GAGACGCGGGGACGCGCGGACGCCACGCTCAGCGGCCGCCCCCGGCCTCCGCGCCGCCTTCCTCCCGG-
    GAGCAGCCCCGACGCGCGCGGGCCCGGACCGCCGGGGTTGTCATGGCAGCAGCTCCATCCCTGACCGCCACTTT
    CTCCCGGTGCCGCCTCGGAGCGAGCGGGCTGGCGGGCGGCGCGGACTGCGCGCTC
    1135 gcggcggcgTCCAGCCAGAGCCCTGTGGAAGCGGCGGCGACACTTGGGCTGGGCAGTGTCTCTGAT-
    GCCTCCCAGCGCCAGCGACTGCTCTTATTCCCGCCGCTGTGGGTCGGGAAAGTTCCGCCAGTGCACAGCAAC-
    CAATGGGCGGAGGGGTCCTTTGCCCCTGGGTTGCGTCACCCTCATGCTTCCAGAACCTGGAGGATCCAGCAGGA
    CCGTCCCACTTGTATTTGCATTGAGGTCATTGATGGAAATGGT
    1136 GGGTCGCCGAGGCCGTGCGCTTATAGCCGGGATGACGCCGCAGTTGGGCCGGATCAGCTGAC-
    CCGCGTGTTTGCACCCGGACCGGTCACGTgggcgcggccggcgtgcgcggggcggggcggagcggggcctg-
    gcctgggcggggcAACCTCGGCGCACGCGCACAgcgcccgggcggggggcggggTGGTGGTGCGCCTGCCGCGC
    CTACAGTTCCCGCCGCTCGCGCC
    1137 CGCGCCTGATGCACGTGGGCGCGCTCCTGAAACCCGAAGAGCACTCGCACTTCCCCGCGGCGGT-
    GCACCCGGCCCCGGGCGCACGTGAGGACGAGCATGTGCGCGCGCCCAGCGGGCACCACCAGGCGGGCCGCT-
    GCCTACTGTGGGCCTGCAAGGCGTGCAAGCGCAAGACCACCAACGCCGACCGCCGCAAGGCCGCCACCATGCGC
    GAGCGGCGCC
    1138 ccgggagcgggcggaggaagggccgggcgtccggcgcaagcccgcgccgccccagcccoggccccg-
    gcccggcccgcACACGCCGCTTACCTGGAAGCCGGCGACGCTGCCGCCCACCTCCCTGCTGCGTGTCGCAAAC-
    CGAACAGCGGGCGTTGGCCCTCCTGCCGGACACTCCTCTGCCAGCGCCGCTCTGGCCGAGTCGCGGGGGCCGAA
    TGTGCGACGGGGCAGAGCGGG
    1139 GGGGCGCACCGGGCTGGCTCCTCTGTCCGGCCCGGGAGCCCGAGGCGCTACGGGGTGCGCGG-
    GACAGCGAgcgggcgggtgcgcccgggcgcggcggcggcAGCGTCGGGGACCCGGAGCTCCAGGCTGCGCCT-
    TGCGCCCGGGTCAGACATTATTTAGCTCTTCGGTTGAGCTTCGATTGGTCAAACGGCGCCGccccccccccccc
    gccccccgccccccgctccccGCTCGCCCGCGCTAC
    1140 GCCACGGGAGGAGGCGGGAACCCAGCGAGGCCCCCGAgggctggggggaccggccggccg-
    gacaaagcggggccgggccgggccggggcggggccgtgcggggcTCACCGGAGATCAGAGGCCCG-
    GACAGCTTCTTGATCGCCGCGCCGTTGGCGCTGGCGGCCGCGGTGCCGGCCGCGGGACGTCCCGAAATCCCCGA
    GTGCAGCTGGTCAGCGAGAGGCTCCTGGCCGCGCTGCCCCTGGTTCGCGCCCTGCT
    1141 cgggcatcggcgcgggatgagaaaccaacctgatacttatcgtgtgccgagttccctcct-
    tgtatcctgactaagcacagcgaataaccctgtccttgttctaaccccaggtcttgaagaaatact-
    gtcccagctgagccccgcgtttacaagatgaagaggcgccccagatgcgctgaaagaaaggccaaagctcgtgc
    ctccttccactgcctgcggtagaacctggtcccgcatagcttggactcggataag
    1142 acaccgccggcgcccaccaccaccagcttatattccgtcatcgctcctcaggggcctgcggcccggg-
    gtcctcctacagggtctcctgccccacctgccaaggagggccctgctcagccaggcccaggcccagccccag-
    gccccacagggcagctgctggcagggccatctgaagggcaaacccacagcggtccctgggccccaacgccaggc
    agcaaggactgcagcgtgcctacctgtgcagctgcaacccag
    1143 CCCCAACAGCGCGCAGCGAACTCCACTGCCGCTGCCTCCGCCCCAGAGACACGTTGCAGGCCA-
    GAGCGGCCGGGGCGCGGGGCATCACGGGACGGCCTCACCTGGCCTCTTGGAGGACTCCCGAAGCCCGAGGC-
    CGCCAACCGAAGGAGGCCCCGCCCCCGGAGGCACCGCCTCGCCTCTTTCCGCCAGCGCCCGCAGGACCCGGATG
    AGAGCGCACGCTTCGGGGTCTCCGGGAAGTCGCGGCGCCTTCGGATG
    1144 CCCCGCTGGGGACCTGGGAAAGAGGGAAAGGCTTCCCCGGCCAGCTGCGCGGCGACTCCGGG-
    GACTCCAGGGCGCCCCTCTGCGGCCGACGCCCGGGGTGCAGCGGCCGCCGGGGCTGGGGCCGGCGG-
    GAGTCCGCGGGACCCTCCAGAAGAGCGGCCGGCGCCG
    1145 CCCGGGGGACCCACTCGAGGCGGACGGGGCCCCCTGCACCCCTCTTCCCTGGCGGGGAGAAAGGCT-
    GCAGCGGGGCGATTTGCATTTCTATGAAAACCGGACTACAGGGGCAACTCCGCCGCAGGGCAGGCGCG-
    GCGCCTCAGGGATGGCTTTTGGGCTCTGCCCCTCGCTGCTCCCGGCGTTTGGcgcccgcgccccctccccctgc
    gcccgcccccgcccccctcccgctcccATTCTCTGCCGG
    1146 CCCGCGGAGGGGCACACCAggcgggtgttggggaggacgcagagggctggggctggagcccag-
    gcggggcagggggcggggcggagctgggtccgaggccggCGGGGGCGCCTCCATCCCACGC-
    CCTCCTCCCCCGCGCGCCCGCCCGCTCTCGGGTGACTCCGCAACCTGTCGCTCAGGTTCCTCCTCTcccggccc
    cgccccggcccggccccgccgAGCGTCCCACCCGCCCGCGGGAGACCTGGCGCCCCG
    1147 GCCCACGTGCTCGCGCCAACCCCTACGCCCCAGCGCGCCTTCTCCACCCACGCACGGGCCTCG-
    GACGCATTTCCAGCCCCGGCGTTGGTTGTGGATGCTGGACATCCACCGCCTCCAGGCAGTTTCGCCGTCACAC-
    CGTCGCCATCTGTAGCCAAAGCAAAACATATCCTAACTGAGACTTTGCAGCTCTTGTGGCCACTCTGGGCTCAC
    CGGGAACATGAGTGGAAGAGCCCGAGTGAAGGCCAGAGGCATCGC
    1148 GGCGGAGCGGCGAGGAGGAGGAGCAGGAGCGCGCAGCCAGCGGGTCCACGCATCT-
    CAGCACTTCCAGACCAACTCCGGCACCTTCCACACCCCTGCCCGGGCTGGGGGCTCCGAGAGCGGC-
    CGCGAAGCGACTCCGATCCTCCCTCTGAGCCTTGCTCAGCTCTGCCCCGCGCCTCCCGGGCTCCGGTCCGCGCG
    GCGGGGTCCCTGCTCCTGCGCCCCGGGCGCGCTTCCCGGACACCCCGGTCCCCGCAGCC
    1149 CCTCGCCGGTTCCCGGGTGGCGCGCGTTCGCTGCCTCCTCAGCTCCAGGATGATCGGC-
    CAGAAGACGCTCTACTCCTTTTTCTCCCCCAGCCCCGCCAGGAAGCGACACGCCCCCAGCCCCGAGCCGGC-
    CGTCCAGGGGACCGGCGTGGCTGGGGTGCCTGAGGAAAGCGGAGATGCGGCGGTGAGGCGCGGCTTGGGCCG
    1150 caggcgcgccgATGGCGTTTCTGAGGTGACGCCGCCCACACCGGGCTTCTCCGGGGGCGGAG-
    GAAACACCTATGAACCCTCCGGCAGCCTTCCTTGCCGGGCGCCAGGTAAGCAGCGGTTccgggcgcgg
    1151 CTCCCGGCTTCTGCATCGAGGGCCTTCCAGGGCCAGCCCTTGGGGGCTCCCAGATGGG-
    GCGTCCACGTGACCCACTGCCCCCACGCCCGCGCGCGGGCCCCAGCAGCCCCAGAGCTGCGC-
    CAACTTCGTTCACTCCGCGCTCACCTTACGGGGGTCCCCGCGTGACCGCATGGGGTAGCCCCTGCTCCCACGCT
    CCCGGCCGA
    1152 CGGTCCGCGAGTGGGAGCGGCTGCTTGTGGGCAGGGTGGACGCGGGGCCACGTCTTGGCCG-
    GCGTTTTGCGGGGTCTTCCTGTTCTGAACGCGCGTAACTTTTGCCTCAGTATCTCACTTCTTGGAATCCGGCG-
    GCGTTCACGTGTGTGCTCCAGAGAAGGGCGCCAGAGGGTATTCCCTGAAAGTGAAAGGTCGGCGAAAGAGGAGT
    AAAGACGGCGAGACGCGTCCACGCAGGGGGAGTCTGTGCGGTTTGGA
    1153 GCAGCGCCGCCTCCCACCCCGGGCTTGTGCTGAATGGGTTCTGATTGTGCACGGGGTGCACACTGG-
    GCATTTCTTGGAAGGGGCACACTGacgcgcgcacacacgcccccgacgcgcacgcgccccgcgcgcact-
    cacactcacccccgcgcacactcacccccgcgcacactcacgcTGCCGCCGCGCTGAGGTGCAGCGCACGGGGC
    TTCACCTGCAACGTGTCGATTGGACGGATGGGCTCGGCGCGTGGGT
    1154 CGACCGTGCTGGCGGCGACTTCACCGCAGTCGGCTCCCAGGGAGAAAGCCTGGCGAGTGAG-
    GCGCGAAACCGGAGGGGTCGGCGAGGATGCGGGCGAAGGACCGAGCGTGGAGGCCTCATGCCTCCGGGGAAAG-
    GAAGGGGTGGTGGTGTTTGCGCAGGGGGAGCGAGGGGGAGCCGGACCTAATCCCTCACTCGCCCCCTCC
    1155 CCCGGGCTCCGCTCGCCAACCTGTTACTGCTGCAGAACGCCAGGAAGCTCAGCCTG-
    ATCCCACAGATTAGGGTAAAATATCCCGGGGGGCCGAAGTGGAAACCGGAGTTGCGTCATTGCTCCCAC-
    CCGATATCACCTTGGCAGCGACCGCGGCTGACCACGTTCCCGGCCTGTCGCGAATCTCACCCAAGGGAGCTGAG
    TCTCAGCTTCCCTGGTCCCTGGTCCCGAGTTCCGCCTTCCCCCCCCGCCCCGTGGC
    1156 CATGGGGTGCTCATCTTCCCGGAGCTGAGGAGCTGGGGCGGGCATGGGGTGCTCATCTTCCTG-
    GAGCTGAGGAGCTGGGACGGGCATGGGGTGCTCATCCTCCTGGAGCTGAGGATCTGGGGCGGGTGTGGGAT-
    GCTCATCCTCCTGGAGCTGAGGAGCTGGGGCGGGCATGGGGTGCTCATCTTCCCGGAGCTGAGGAGCTGGGGCG
    GGCATGGGGTGCTCATCTTCCCAGAGCTGAGGAGCTGGGGCGGGCAT
    1157 CCGAGAGCCGGAGCGGGGAGGGCCCGCCAAGTCAGCATTCCAGCCGGTGATTGCAATGGACAC-
    CGAACTGCTGCGACAACAGAGACGCTACAACTCACCGCGGGTCCTGCTGAGCGACAGCACCCCCTTGGAGC-
    CCCCGCCCTTGTATCTCATGGAGGATTACGTGGGCAGCCCCGTGGTGGCGAACAGAACATCACGGCGG
    1158 CCGCTGCAGGGCGTCTGGGCTTCTGGGGGCAGAGAAGACTCACGCAGTGAGCAGTCCGCAAGC-
    CCGCTGGCGGCAGCGGCGGTGCTCCGTCCAGGGCGAGAAGCTGCAGCGCTCGGGCCGGGGTCCCTCCT-
    GTCGCAGCAGCTCCTCGACGAGTGCAGGGGCAGCCACG
    1159 gcgctgccccaagctggcttccgctgcctgctctgggctgggctgggctgggctgggctggtag-
    gacctgctcccagggcgggaggggacacacccacctcagcagatctcagcccatccctcccagctcagt-
    gcactcacccaaccccacacgggccaaggagagagtgaagaggaagcattgccctcagaggccttcacggactg
    gccaga
    1160 CAGGATGCCAGCGTGACGGAAGCAAGTAACCACCAAGGCATCACCACTGGCGCTAAACTTCT-
    CACTTCCGGAGTGCTGCAAGCGCAGAAAATATACGTCATGTGCGGAGGCGGAGCTTCCGCCCT-
    GCGCGTCGTATTAGACGGAAACCGAGCGGGCCCATTTTTCATGGGTTTGCGGACCCACCAGCGAAGGCGGGAGG
    TGTCGCAGGGACATCTTCTGGCTGTTTCCGTCGCCTGCGTGGCCCTTGCACCCCGG
    1161 GGCGGTGCCATCGCGTCCACTTCCCCGGCCGCCCCATTCCAGCTCCGGAGCTCGGCCGCAGAAACGC-
    CCGCTCCAGAAggcggcccccgccccccggcccAAGGACGTGTGTTGGTCCAGCCCCCCGGTTCCCCGAGAC-
    CCACGCGGCCGGGCAACCGCTCTGGGTCTCGCGGTCCCTCCCCGCGCCAGGTTCCTGGCCGGGCAGTCCGGGGC
    CGGCGGGCTCACCTGCGTCGGGAGGAAgcgcggcg
    1162 GTGGGTCGCCGCCGGGAGAAGCGTGAGGGGACAGATTTGTGACCGGCGCGGTTTTTGTCAGCT-
    TACTCCGGCCAAAAAAGAACTGCACCTCTGGAGCGGGTTAGTGGTGGTGGTAGTGGGTTGGGAC-
    GAGCGCGTCTTCCGCAGTCCCAGTCCAGCGTGGCGGGGGAGCGCCTCACGCCCCGGGTCGCT
    1163 GGCGGAGGGCCACGCAGGGGAGACAGAGGGCCTCCACAGGGGCCAGGGGGAAGTGTGGGAACT-
    GAGTCTCCCCCAGACGAGGCTTCACTTGGACACGTGTATGTGGTCACCGGGGGAAACTGAGCAGTTCT-
    GACTTCCCTTGGAAGGCGTGGAATTAGGAGAGAAATCCCTTAGTGGGCACACGAGTGAGTGCCCCTTGGAGTCC
    ATCTGTGGAAAGGAAGCGGTGATAGGTTTCCGCA
    1164 GTCCGGGGGCGCCGCTGATTGGCCGATTCAACAGACGCGGGTGGGCAGCTCAGCCGCATCGCTAAGC-
    CCGGCCGCCTCCCAGGCTGGAATCCCTCGACACTTGGTCCTTcccgccccgcccttccgtgccctgc-
    ccttccctgcccttccccgccctgccccgcccggcccggcccggccctgcccaaccctgccccgccctgcc
    1165 CGGCCTGCGGCTCGGTTCCCGCCTCTTCCCCACCCCCAGCCCCGCGCTGCCCTCTCGGTCCCCCT-
    GCGCGACCCCAGGCTCGGCCCCTGCCCGGCCTGCCGGGGTGGCCCGGGGGTGGGGTGGGAGCCCTTTGTCT-
    GCGTGGGTCGCCTCGCGTCTCTCTCTCCCACCCCACCTCTGAGATTTCTTGCCAGCACCTGGAGCCCGAAACCA
    GAAGAGTTGTCAGCCCAACAAGAATATAGGATCACCGGCCCATCA
    1166 GGGAACCGTGGCGGCCCCTCCTGGCCCTGGGAGGTGGTCCCGCTGCCCCCCTGACTTCCGTGCACT-
    GAGCCCCTGGCCCTGCCCGCAGCCCCGGCCCTGGACTCGGCGGCCGCGGAGGACCTGTCGGACGCGCTGTGC-
    GAGTTTGACGCGGTGCTGGCCGACTTCGCGTCGCCCTTCCACGAGCGCCACTTCCACTACGAGGAGCACCTGGA
    GCGCATGAAGCGGCGCAGCAGCGCCAGTGTCAGCGACAGCAGC
    1167 cggggaaggcggggaaggcggggaaggcggggaaggcggggaaggcggggaaggcggggatggT-
    GAGACggtgaggcggggcggggcctggggcgcgggcggggcggggaggggtggggcggggcCCGGGGGCGCTG-
    GACCGCGGTGCTGCGGGACGGATTCCCGGCGGCTGCGCGGGAGGCTGCGAGCCTGGGCTCCCAGGGAGTTCGAC
    TGGCAGAGGCGGGTGCAGGGAACCCGCGGCTCGGCGGGAGCGTG
    1168 cctcccggtttcaggccattctcctgcctcagcctcccaagtagctgggactacaggcgcctgccac-
    cactcccggctaattttttgtatttttagtagagacgggggtttcaccgtgttagccaggatggtctcgatct-
    gcttacctcgtgatccgcccgcctcggcctcccaaagtgctgggattacaggcgtgagccaccgcgtccggcAT
    ATTT
    1169 AGCCCGCGCACCGACCAGCGCCCCAGTTCCCCACAGACGCCGGCGGGCCCGGGAGCCTCGCGGACGT-
    GACGCCGCGGGCGGAAGTGACGTTTTCCCGCGGTTGGACGCGGCGCTCAGTTGCCGGGCGGGGGAGG-
    GCGCGTCCGGTTTTTCTCAGGGGACGTTGAAATTATTTTTGTAACGGGAGTCGGGAGAGGACGGGGCGTGCCCC
    GACGTGCGCGCGCGTCGTCCTCCCCGGCGCTCCTCCACAGCTCGCTG
    1170 CCCCAGCCACACCAGACGTGGGAGCTTAGGATGAGAGCGGCCTCCGAGCAGATGATCACCCTG-
    GAACGACGCCAAACGCGACCCCTACCAGAGGACTCGCGCATGCGCAGCGCAGCCTGGGCCGGCGGCCTGG-
    GCAGGATGTAGTCGCGAGCAGCGCACCGGGCCCACGCCAGCGGAATTGCGCATGCGCAGGGCCGCCTCTGCCTG
    CGGCCTGGGCTGGG
    1171 tgggcttcctgccccatggttccctctgttcccaaagggtttctgcagtttcacggagcttttca-
    cattccactcggtttttttttttttgagactcgctctgtcgcccaggctggaatgcagtggcgcgatctcg-
    gctcactgcaagctccgcctcccgggttcacgccattctgcttcagcctcccaagtagctgggattataggcgc
    ccgccaccacgcccggctaatggctaattttttgtattttttttt
    1172 CCGCGCTGGGCCGCAGCTTTCCGGAGCGCAGAGGAAGCTGGCCAGCCTGCAGATAGCACTGG-
    GAAAGACACCGCGGAACTCCCGCGAGCGGAGACCCGCCAAGGCCCCTCCAGGGACCTGTCTTCCTAACTGC-
    CAGGGACGCCGAGCCAACTC
    1173 gcatggcccggtggcctgcactccagtgaggtggctgaactctgaccagccaagagaaaac-
    ccccctctccgccccaaacagctccccactcccccagcctgcccccaccctccccacattccagtctttcact-
    gtcgccccaggcaacttggctgcccaagaccaagccccaccaagaagctggagggccaggcaagtccaggatgg
    gcaagcagggaagcacgagagggagaaacagaggtgaggaaggaagg
    1174 GGGCAGGGGAGGGGAGTGCTTGAGTATTGGGGCTACACTCACCACAAGAGCAGCAAACAAAGCACT-
    GGGTGTGGTAGAGGCTGTCCAGGGCCTGGCAGGCATTGCTCTGCCCATAGATGCCTTTGTTGCACT-
    TGATACAGGTGCCTGAGAAGAGAAAAGTGTCACACTCTACTCCCCCAGGTCAAAACCAGG-
    GATTCCCAAGCTTTCCTGACTGCCCTTTCCTGATGTGCCAGGGGTCA
    1175 CCCCGGCGCCTTCCTCCTCCGGACTCCGCTGCATGCCTCGCTTGCGGTGGTCCGATCG-
    GCTTTCTCCGGGAGCTTTCCTCTCCCCGCCACGCCCCCGTCTCCCCGGCCGTCCCCGCGCCTCTCG-
    GCCTCCCTTTCATTAGCCCCACATCTGTCTTTCCCATGGGAGGGAGCGCGCGCCTTCCGCCCAGCGGGGCCCTT
    AGCAGAGCCTCTCCAATCCTCGGCGCCTCCCCTACACAGGGTTCGCTGGGCCGTTCT
    1176 CCACCGCGCTTCCCGGCTATGCGAAAGTGAAAACGAGGGGCGCCCAAGGCCCT-
    GCTTCTTCCCCCTTCCTCTTCCCCTTGCCCAGCCGCGACTTCTTCCTCACTGATCTCCCGGGGGCG-
    GAGACGCTGAGTTCCCCGGAGACGAGTTAGTCACCAAGAAGAGGCGGTGACAGAGAGCGCGGCTCGCGTCGCAC
    TCCGAGGCC
    1177 CCGCATCTGACCGCAGGACCCCAGCGCTACCAAGTGCCTGTTCTTGGACCCCCAGCCGAGCAGGGG-
    GAAGCATCCCCAGCTCCCGCACCCAAGTCCCTGGCGCCGCTGCCGGGCCGCCCTCCCTGATGC-
    CCAGCGCGCAGCCTGCCGGCGCCGCGCCTTCTGGACGGCTCTCGCCGCACCTCCTGAGCTCAGCCCGCGGCCCC
    GCAGTGGGGCGGCCTCACTTACTGGCGGGGAAGCGCGGGTCTGGGTTGGCGC
    1178 GCGGACACGTGCTTTTCCCGCATTAGGGGGGGTCTcccggcgcgcgccccgccgccACCTGTTGAG-
    GAAAGCGAGCGCACCTCCTGCAGCTCAGGCTCCGGGCGCCAGCCCTGCCCCGCAGCCCCAGAGC-
    CCGTCGCAGCTCGGGTGGTCCCTCCCCGGCCCAGCGCTCGCCGCCTGCTCTTCGCCCTGCAAGTTTCAAGAGGC
    AGTTATTTCTCGCAGCCTCCGCGCTTGCA
    1179 GAGCTGGAAGAGTTTGTGAGGGCGGTCCCGGGAGCGGATTGGGTCTGGGAGTTCCCAGAGGCG-
    GCTATAAGAACCGGGAACTGGGCGCGGGGAGCTGAGTTGCTGGTAGTGCCCGTGGTGCTTGGTTCGAGGTGGC-
    CGTTAGTTGACTCCGCGGAGTTCATCTCCCTGGTTTTCCCGTCCTAACGTCGCTCGCCTTTCAGTCAGGATGTC
    TGCCCGTGGCCCGGCT
    1180 GGCCGCCAACGACGCCAGAGCCGGAAATGACGACAACGGTGAGGGTTCTCGGGCGGGGCCTGG-
    GACAGGCAGCTCCGGGGTCCGCGGTTTCACATCGGAAACAAAACAGCGGCTGGTCTGGAAGGAACCTGAGC-
    TACGAGCCGCGGCGGCAGCGGGGCGGCGGGGAAGCGTATGTGCGTGATGGGGAGTCCGGGCAAGCCAGGAAGGC
    ACCGCGGACATGGGCGGCCGCGGGCAGGGCCCGGCCCTTTGTGGCCG
    1181 GCGCCCGGTCAGCCCGCAGCGCCCGGCCAGCCCGCAGCGCCGGAGCCCGCAGTGCGTGCGAGGG-
    GCTCTCGGCAGGTCCAGACGCCTCGCCGAGCCCAGCCCGCAGCTccccgggccgcgccgcgcccgcccACAGG-
    GCCCACAGCCCTGCTTCGGCTCTCAGGGCGGTCACCTGGGATGGGG
    1182 CCCGCCAGGCCCAGCCCCTCCCTGGCCAGCCCCGTCCTTGTCCCCAAACTgggcccgcccggccgc-
    caggccgccgggcctccggggcccTCGCGCATCCGGCTCCGAAAGCTGCGCGCAGCCATCATCAGGGC-
    CCTTCTGGTGTTAGAAGAGACCCCGGCATCATCTTTTCGTCGCGTGCTTCCCCCAGAGTCA
    1183 CGATTCTTCCCAGCAGATGGCCCCAAAGTTCAGTTCCTGAATTGCCTCGCGGAGCCGCGGGCT-
    GCAACGTGAGGCGGCCGCTGCCAGTCGACTCAACCACCGGAGTGGCCCCTGCAGTTGGATAGCAAC-
    GAGAATCCTCCAGGGGTGCAGGGCGACGGCTTCGGCCGCACC
    1184 CGCACACCGCCCCCAAGCGGCCGGCCGAGGGAGCGCCGCGGCAGCGGGAGAGGCGTCTCTGTGGGC-
    CCCCTGGCAGCCGCGGCAGGAAAGGGCCCGAAGGCAGCGAAGGCGAACGCGGCGCACCAACCTGCCGGC-
    CCCGCCGACGCCGCGCTCACCTCCCTCCGGGGCGGGCGTGGGGCCAGCTCAGGACAGGCGCTCGGGGGACGCGT
    GTCCTCACCCCACGGGGACGGTGGAGGAGAGTCAGCGAGGGCCCGA
    1185 AGGCCCCGAGGCCGGAGCGGCGGAGGGGGCGGCCCCTCCCACAGGGTCTTCCCACCCACAGGGCAC-
    CCAGGCGCAGCGGAGCCAGGAGGGGGCTTACCCGCGGGCAGGGACGGAGCACGCCGGGGCCCTGGAGGG-
    GCGACGCTCGCTCGTGTCCCCGGTCCCCGTGGCC
    1186 GGGTTCGCGCGAGCGCTTTGTGCTCATGGACCAGCCGCACAACTTTTGAAGGCTCGCCGGCCCATGT-
    GGGGTCTTTCTGGCGGCGCGCCGCCTGCAGCCCCCCTAAAGCGCGGGGGCTGGAGTTGTTGAGCAGCCCCGC-
    CGCTGTGGTCCATGTAGCCGCTGGCCGCGCGCGGACTGCGGCTCGGCGTGCGCGTGTTCCCGGCCGTCCCGCCT
    CGGCGAGCTCCCTCATGTTGTCGCCCTGCGGCGCCC
    1187 CCAGTCTCCCGCCCCCTGAGCATGCACGCACTTTGGTTGCAGTGCAATGCTCTGACTTCCAAATGG-
    GAGAGACAAGTGGCGGAAAATAGGGTCTTCTCCCACCTCCCACCCCCCCATCCCGACTCTTTTGC-
    CCTTCTTTTGGTCCAAGAGATTTTGAAACCGTGCAGAACGAGGGAGAGGGGCAGGCTGCAGCCGGGCAGATAAC
    AAAACACACCCCAAAGTGGGCCTCGCATCGGCCCTCGCATTCCTGTAGAG
    1188 GAGGAGGCAGCGGACCGGGGACACCCTGGGGGAACTTCCCGAGCTCCGCGACCTCGAAGCCTGGC-
    CCTTCCTTCTCCCTGGTCCTACATGCCTCCCTCCCCCACTGTCCGGGGTCCTGGCCTCGACGCCGAGGGGT-
    GTCCCTCTCCTCTCCTGGTCAGGGAACGCAGCAACTGAGGCGGCGCGGCCCAGATGAGACGGGAAGCGCCTGCG
    GGCCGTGGGCGCGGGTGGAACCC
    1189 CCGGCTCCACGGACCCACGGAAGGGCAAGGGGGCGGCCTCGGGGCGGCGGGACAGTTGTCGGAGG-
    GCGCCCTCCAGGCCCAAGCCGCCTTCTCCGGCCCCCGCCATGGCCCGGGGCGGCAGTCAGAGCTG-
    GAGCTCCGGGGAATCAGACGGGCAGCCAAAGGAGCAGACGCCCGAGAAGCCCAGGTGAGCGGCTGGGCCGCGCC
    GGACGGGCGTCGGGGGTCTGGGCCGCGA
    1190 CCGCCACCGCCACCATGCCCAACTTCGCCGGCACCTGGAAGATGCGCAGCAGCGAGAATTTCGAC-
    GAGCTGCTCAAGGCACTGGGTAAGCTGGTGCAGAGGGCGCGCCCCGACGGGGAGATGCGGCCCGGAGGTGC-
    CCTGGTCCCGGAAGTGCCCCGGTCCTGGAGGGGGTGGAAGTTGGGGAGCCCAGGCAGGAGGGAGTCCCCGGGGC
    AATAGATCGCCTTGTCTCCCAGGCGCACCGGGTCTCG
    1191 TGAGTAAGGATGATACCGAGAGGGAAGAAAAAAATACCCTCTTTGggccaggcacggtggctcac-
    ccctgtaatcccagcactttgggaggctgaggcgagcggatcacgagatcagaagatcgagaccatcctg-
    gctaacacagtgaaaccccatctctaccaaaaatacaaaaaattagccaggcatggtggcgggcacct
    1192 tgggccaggcacggtggctcacccctgtaatcccagcactttgggaggctgaggcgagcggatcac-
    gagatcagaagatcgagaccatcctggctaacacagtgaaaccccatctctaccaaaaatacaaaaaattagc-
    caggcatggtggcgggcacctgtagtcccagctacttgggaggctgaggcaggagaatcctttgaacccaggag
    gcggagcttgcagtgagctgagattgtgccactgcactccag
    1193 CCGGCGAAGTGGGCGGCTCCCCAAGCGCCCAGGCTGCGCAGCACGATggccgcccccgccgcgcac-
    cgcgtgtgcccgcacgcccgccccctgcgccccggggacgcctctccgcccctccccctgcccctccgcccac-
    cgcgcggtcgccccacgccgcgggcgctgcttcgccgcccgggaggccgcctcccgccccgggACCGGATAACG
    CCCTAAATCAGCGCAGCTGAGGCGAGGCCGTGGCCCCCGCAG
    1194 GCGGCCTTACCCTGCCGCGAGCGCCTGTGACAGCGGCGCCGCTGTGCTCGCGACCCCGGCTCCGG-
    GCCTCTGCCGACCTCAGGGGCAGGAAAGAGTCGCCCGGCGGGATGGGCGGGGAGGCTGGGTGCGCGGCGGC-
    CGTGGGTGCCGAGGGCCGCGTGAAGAGCCTGGGTCTGGTGTTCGAGGACGAGCGCAAGGGCTGCTATTCCAGCG
    GCGAGACAGTGGCCGGGCACGTGCTGCTGGAGGCGTCCGAGCCGG
    1195 gtggggccggcgAGGGTCAGGGGCATCGCGGCCGCGACCCCATTCTGCAGCCCCCGAGGCTCGC-
    CCGACTCCTGGCTGCCCTGGACTCCCCTCCCTCCTCCCTCCCGCCTCCTCGCCCAGGGCCCGGCTCACCTg-
    gcggcggggcgcgggacgccgcgggcgggacggcggggggctccggggcgctccggggcggcTCTCGCGCATGC
    TCCGGGGC
    1196 CGGCGCGGACCGGCTCCTCTACCACTTTCTCCAGCTGCACTGCCACCCAGCCTGCCTGGTGCTGGT-
    GCTCAACACGCAGCCGGCCGAGGAGGTGCGGCCGCGCTGGCGCGGGAGTGAGGGGACTCCGAGAGTGTTGAGG-
    GCCTCCTGAGCGGATGCGAGGCCTCTGACAGGGATGGAGGGGCTCTGAGGGGGATTCAGGCCCCTGACACTACG
    CGATGACACAGAGAAGGATGGCAGGGGTCCCCAGGGG
    1197 GCCCATGCGGCCCCGTCACGTGATGCAAGGATCGCCGGCCTTTCCGCCAGAGGGCGGCACAGAAC-
    TACAACTCCCAGCAAGCTCCCAAGGCGGCCCTCCGCGCAATGCCGCTACCGGAAGTGCGGGTCGCGCTTCCG-
    GCGGCGTCCCGGGGCCAGGGGGGTGCGCCTTTCTCCGCGTcggggcggcccggagcgcggtggcgcggcgcggg
    gTAA
    1198 GGGATTGCCAGGGGCTGACCGGAGTGTTGCTGGGAAGGAGCCTCAGCTCCGCTCCAGGTCCTCCAC-
    CAGGTAGGACTGGGACTCCCTTAGGGCCTGGAGGAGCAAGTCCTTGCAGGTCCAGTTCCAGGCTGGTGT-
    GAAACTGAAGAGCTTCCGCATCTTGCTTGGGTTGGTGGGCTCGGCCCGC
    1199 GCCGGAGCACGCGGCTACTCAGGCCGAACCCCGACCCGGACCCGGCACGCGGCCTCGGCGAGGGCGG-
    GCGGGAGTGTCCTCCTCCGGGACAGCCGGACTCCCGCCGACTTCTGGGCGGCGGGGAGGGCTCCAGGCCCG-
    GCTCTCCCGGGCCCCCGCACGCGATGCGCGGCCCCTGCAGCTGCTCCGTGCCCCGAGACGCGCCCGAGGCCTCG
    GACCTCCAAGCGGCCACCGCGC
    1200 CCTCGGCGCCGGCCCGTTAGTTgcccgggcccgagccggccgggcccgcgggTTGCCGAGCCCGCT-
    GACGTCAGCCCGGGTTTCCCCCCCCCACCGGGGCTTCCCCATCCCCCGAGGCTTCCCGGGAGGGCTGC-
    GAGTCCGGGGAGCGTGCGGGGTCGCCACCATCGGGACCCCCAGAGGAGAGAGGACTTGGGGCGGGAGCCGCGCG
    GGACGCTGTCCCCCTCCCGCCCCCCAccccatttacagattgggaga
    1201 CACAGCGGCGGCGAGTGGGTCGTGCACGCGGATGCGGGGTGGGAGTGGGGGCGCACGCGCGGGCGT-
    GGGCGAGCGGGCCCCGGCAGTGCACACACACGGCAGGGGCGGGCGACAGATGCAGTGCGTGCGCCGGAGC-
    CCAAGCGCACAAACGGAAAGAGCGGGCGCGGTGCGCAGGGGCGGGCGCCCAGCGGGCTTGGCATGCGCG
    1202 CACCTCGGGCGGGGCGGACTCGGCTGGGCGGACTCAGCGGGGCGGGCGCAGGCGCAGGGCGG-
    GTCCTTTGCGTCCGGCCCTCTTTCCCCTGACCATAAAAGCAGCCGCTGGCTGCTGGGCCCTAC-
    CAAGCCTTCCACGTGCGCCTTATAGCCTCTCAACTTCTTGCTTGGGATCTCCAACCTCACCGCGGCTCGAAATG
    GACCCCAACTGCTCCTGCGCC
    1203 AGACGGGGCCGGGCGCAGACGCCCCGCCCCGCCCTTGCACCCAGCCCGCTGAGTCCGCACCGC-
    CCGCGGTCCCGGCCTGGGCTGTGCGCAGGAGATGGGCCAAGTGCAAGGTCCCTTGAGCGCAGCTGG-
    GCGCACACCGCAGGACGGCCCCTTTCGCACCGGCTCGCGAGGGAGGCGCTGTGCCCCCCGTGTGCGGCTTCTCT
    CACCCTGCCAGGCCTTCCCAGCTTCCCTGAGGTTGCCTGCTACACCCGCCCC
    1204 GCATTCGGGCCGCAAGCTCCGCGCCCCAGCCCTGCGCCCCTTCCTCTCCCGTCGTCAC-
    CGCTTCCCTTCTTCCAAGAAAGTTCGGGTCCTGAGGAGCGGAGCGGCCTGGAAGCCTCGCGCGCTCCGGAC-
    CCCCCAGTGATGGGAGTGGGGGGTGGGTGGTGAGGGGCGAGCGCGGCTTTCCTGCCCCCTCCAGCGCAGACCGA
    GGCGGGGGCGTCTGGCCGCGGAGTCCGCGGGGTGGGCTCGCGCGGGCGGTGG
    1205 GCCCGAAAGGGCCGGAGCGTGTCCCCCGCCAGGGCGCAGGCCCCAGCCCCCCGCACCCCTAT-
    TGTCCAGCCAGCTGGAGCTCCGGCCAGATCCCGGGCTGCCGCCTCTGCTGCCTTCCCTGAGCGGGAGCG-
    GAGCGCAGAGAAAAGTTCAAGCCTTGCCCACCCGGGCTGC
    1206 CGGCGGCCGGGTGACCGACCACTGCTTACCAGGAGGGGAGACTGGCAGGGGGGGCTCAAGGAA-
    CATCTGGTGGGTGTCCCCTTCACAAGACTCGGCCTGCAGAGTTCGTGCAGGGAGTTCGCACATAGGAGAGCAC-
    CGGTCCGGGAGTGCCAGGCTCGTGCCCGGCCGGGGAGAGGAGTGGGAGACTAAGTCGCAGGGCAAGGGCAACT-
    GCA
    1207 CCACCGGCGGCCGCTCACCTCCTGCTCCTTCTCCTGGTCCGGGCGGGCCGGCCTGG-
    GCTCCCACTCCAGAGGGCAGCCGGTCCTTCGCCGGTGCCCAGGCCGCAGGGCTGATGCCCCCGCTCAGCT-
    GAGGGAAGGGGAAGTGGAGGGGAGAAGTGCCGGGCTGGGGCCAGGCGGCCAGGGCGCCGCACGGCTCTCACCCG
    GCCGGTGTGTGTCCCCGCAGGAGAGTGTGCTGGGCAGACGATGCTGGACACGATG
    1208 CGGTCAGGGACCCCCTTCCCCCTTCAAGCTGACTCCCTCCCACAAGGCTCTTCAGATCTCGT-
    TGTATTTTGGGATTGATGGGGGAAAAATCCAAATTTGTTTGTTTGCTTCCCTTTTTTCGGTGGTGGGGAAAG-
    GTGGCAGGCTTTTTGGGACAACCATGGAGGGGTCCTCCGTCTCGGCCTCTTCGCATATCCCCCTCCGTGATCCT
    GCCTTCCCCCCCCACCGAGCCCATCGCAGGC
    1209 GGCCGAAGCTGCCGCCCCTCCTCCCAACCGGCGGGTCAGATCTCGCTCCCTTTCGGACAACT-
    TACCTCggagaggagtcaaggggagaggggaggggagggggggagggggcaagagagagaggggggagaa-
    gaggGATCTTCTCGCTTATTTCATTGTTCCCCCATCTTCAGGGAGCGGGGGCAGCGGCTCCTCAAGGCGGCGGG
    CGCCGGCGTCTTCAGAGCGCCATGCGAACCGCGG
    1210 GCGGCCTTGTGCCGCTGGGGGCTCCTCCTCGCCCTCTTGCCCCCCGGAGCCGCGAGCACCCAAGGT-
    GGGTCTGGTGTGGGGAGGGGACGGAGCAGCGGCGGGACCCTGCCCTGTGGATGCCCCGCCGAGGTCCCGCGGC-
    CGGCGGGGCCAGAGGGGCCCGGACGAGCTCTCCTATCCCGAAGTTGTGGACAGTCGAGACGCTCAGGGCAGCCG
    GGCCCTGGGGCCCTCGGGCGGGAGGGGGCAGTTACACGGCAG
    1211 CGCGGGAGGAGCGGCGAGGCCCTCACCTGGCGCCTTTTATGCCCGCGGCCGGTGGAGGGGGGAAGG-
    GAGGAATGGTGTCAGGGGCGGATATCTGAGCCCTGAGGAATTTGCAGGCTCCTGAGAGCAAATATGG-
    GCTCTCTCCCCATTGGTCAATTCCCTCCCCTCCCAGAGACCAGAGGCCCCTGCCCTCCAGAGGTGCCCCGCCCC
    GGTCCGCGCAGAAGCTCCGACCCGCACTCCCCCA
    1212 GCCCACCAGAAGCccatcaccaccagcaaagccaccaccaaagccaccacccaagccagcaccaag-
    gccaccaccatatcctcccccaaagccactaccaAAGCTGCTGCTGCTGCTGCTGAAGCCACCGCCATAGC-
    CGCCCCCCAGCCCGCAGGCTCCCCCAGAGGAGAAGCGGGAGGATGAGACAGACAGGCCGCCCCCGTAGGTGCTG
    GGGGCGCGGCAG
    1213 GGGCCATGTGCCCCACCCCACAGCCCCACCCTGCCCTGCCCACCACCCCAAGCCCGGCCCTGG-
    GTCCCAGGGTCCCGCCAGGCCCGCTGGGTGGAATGTGGTCATGTTTCAGACTGCCGATG-
    GCTTCCACTTCCCAGACAGGCCCAGACGGCCCCGCCAGCAGCC
    1214 CCGCCAGCCCAGGGCGAGAGTCAGGGACGCGGCGTCGGGCGAGCTGCGCGGGCCCCGGGGGAG-
    GCGCGACCCCGGAGGCACCTGTCCGGATCCCTCCCCGCCTTGCTCAGATCTCTGGTTCGCGGAGCTCCGAG-
    GCGCGCTCGGCCCGAACCGCGCGACCCCCAAGTCGCCGCGCCC
    1215 gccccctgtccctttcccgggactctactacctttacccagagcagagggtgaaggcctcct-
    gagcgcaggggcccagttatctgagaaaccccacagcctgtcccccgtccaggaagtctcagcgagctcacgc-
    cgcgcagtcgcagttt
    1216 GTGGGGGTCCGCACCCAGCAATAACCCGGGTCTTCCCGCTCCGGCTCCTGCCCCAGTAAGCGTTG-
    GACCGGGAGACGCAGTGCTCAGCATCGGTCAGCAGGGGGCGCAAGGACCCCGCCCCGCCGAGTCCGCGC-
    CAAAGTTTCTCATCCTCCACCCGCCCACGCTCCGCACCCCCTCCGCGGCTGCCCAGCACCCCCACGGCCCCAGC
    A
    1217 gggcccccgggTTGCGTGAGGACACCTCCTCTGAGGGGCGCCGCTTGCCCCTCTCCGGATCGC-
    CCGGGGCCCCGGCTGGCCAGAGGATGGACGAGGAGGAGGATGGAGCGGGCGCCGAGGAGTCGGGACAGCCCCG-
    GAGCTTCATGCGGCTCAACGACCTGTcgggggccgggggccggccggggccggggTCAGCAGAAAAGGAC-
    CCGGGCAGCGCGGA
    1218 GCCTGCACAGACGACAGCACCCCCGGCGGGGGAGAGCGGCCCCAGCGGAGACTCGGCAGGGCTCAG-
    GTTTCCTGGACCGGATGACTGACCTGAgcccggggcccgggcggcgctggccgggcACAGGATGCGCGGCCCG-
    GAGAGCGCATCCCGGCCATCCGCCCGCGCTCGGCCCCGCAGCGCAGCTGCTGCAGATCCGCGGGGGCCGCCAC
    1219 GGCCGCGCCGGGCTCAGGTTCCACCCCCGGGAGCGCGGGGCGGAGCCAGGCCGGCGCCGAGGCT-
    CAGTGCCCTCCCCGCTCCGCGGCGCCGGCTGCGAAGTTGAGCGAAAAGTTTGAGGCCGGAGGGAGCGAGGC-
    CGGGGAGTCCGCTCCAGCGGGGCGCTCCAGTCCCTCAGACGTGGGCTGAGCTTGGGACGAGCTGCGTTCCGCCC
    CAGGCCACTGTAGGGAACGGCGGTGGCGCCTCCCC
    1220 GGGGTAGTCGCGCAGGTGTCGGGCGCGGAGCCGCTTGGCCTCCTCCACGAAGGGC-
    CGCTTCTCGTCCTCGTCCAGCAGCTTCCACTGCGCGCCCAGGCGCTTGGAGATCTCGGAGTTGTGCATCT-
    TGGGGTTCTGCTGCGCCATCTGGCGGCGCTGAGCGGAGCTCCACACCATGAACGCGTTCATCGGCCGCTTCACC
    TTCTCCAGGGGCAGCGTCCCGGGGGCCGCGGGGCTCCCAGCGCCCTCCCGCTCC
    1221 tgcaggcggagaatagcagcctccctctgccaagtaagaggaaccggcctaaagga-
    cattttctctctctctcctcccctctcatcgggtgaatagtgagctgctccggcaaaaagaaaccggaaat-
    gctgctgcaagaggcagaaatgtaaatgtggagccaaacaataacagggctgccgggcctctcagattgcgacg
    gtcctcctcggcctggcgggcaaacccctggtttagcacttctcacttccacga
    1222 ccggaaatgctgctgcaagaggcagaaatgtaaatgtggagccaaacaataacagggctgccgg-
    gcctctcagattgcgacggtcctcctcggcctggcgggcaaacccctggtttagcacttct-
    cacttccacgactgacagccttcaattggattttctcc
    1223 gcgtcggatccctgagaacttcgaagccatcctggctgaggctaatctccgctgtgcttcctct-
    gcagtatgaagactttggagactcaaccgttagctccggactgctgtccttcagaccaggacccagctccagc-
    ccatccttctccccacgcttccccgatgaataaaaatgcggactctgaactgatgccaccgcctcccgaaaggg
    gggatccgccccggttgtcccc
    1224 CCGGCTCCGCGGGTTCCGTGGGTCGCCCGCGAAATCTGATCCGGGATGCGGCGGCCCAATCGGAAG-
    GTGGACCGAAATCCCGCGACAGCAAGAGGCCCGTAGCGACCCGCGGTGCTAAGGAACACAGT-
    GCTTTCAAAAGAATTGGCGTCCGCTGTTCGCCTCTCCTCCCGGG
    1225 CGTCGCCGGGGCTGGACGTTCGCAGCGGCGCTTCGGAAGGGGGCCCCGCGGGAGCAGCCGC-
    CCGCGTCTCCAGCAGCTTCCCCTTGCCAGGCGCCGCGCGCGCCCGGTATCCCCGGGTGTCCACCTGTGCGT-
    GGGGGGCTGTTTCCCGTCTGTCCAGCCGCGCCCACTTCTCAGGCCCAAAGGCCAGCAGGAAGGGTCCCGGAGGT
    GGCTGGGGGCGTCCACCTGAGAAGCTCCGCTCTCGCTCAGACACCCCAC
    1226 GGGCCTGCCGCCTCGTCCACCGTCCGTCGTGAGGCCGGCAGCGGACACGTGCTCATCCCACGGGGAG-
    GCCCCGCGCAGCGCGGAGGACGCGCCTGAGAGAGAAAAGGGGTTCGGGAGAAGCCCGAGGACCCGGCCCGT-
    GACTGGGCGCGCCCTATGCAAATGAGCGGGCGGGGCCCTCGTGTTGCTGAACGAGGGCGGGTTCGCGATGTAAA
    TAAGCCCAGAGGTGGGGTCTTTGGAGAGCACTTAGGGCCCGGG
    1227 GCACACCGCTGGCGGACACCCCAGTAACAAGTGAGAGCGCTCCAC-
    CCCGCAGTCCCCCCCGCCTCTCCTCCCTGGGTCCCCTCGGCTCTCGGAAGAAAAAC-
    CAACAGCATCTCCAGCTCTCGCGCGGAATTGTCTCTTCAACTTTACCCAACCGACGACAAGGAACCAGCCTC
    1228 GCAAACCATCTTCCCCGACGCCTTCCACATAAGATGCCCTCCTGCGGGCCCTCACCTTTTGACACT-
    GCCTCCCACCGCACTGGGGTCAACTCTCACCCAAGGGTTCCGCCACCTTCCACCACCAAACCAGCCTGTCCCT-
    GCCACATGCCCCCCGGGCCCCAGCGCTCATCCTCTGCCCAGGCCCGCTCTTGACCCCTGACCCCGGCCTGAC-
    CCCGC
    1229 GGCCCTCCGCCGCCTCCAACCGCGCACCAGGAGCTGGGCAcggcggcagcggcggcagcggcg-
    gcgTCGCGCTCGGCCATGGTCACCAGCATGGCCTCGATCCTGGACGGCGGCGACTACCGGC-
    CCGAGCTCTCCATCCCGCTGCACCACGCCATGAGCATGTCCTGCGACTCGTCTCCGCCTGGCATGGGCATGAGC
    AACACCTACACCACGCTGACACCGCTCCAGCCGCTGCC
    1230 CACCACCGTGGCAAAGCGTCCCCGCGCGGTGAAGGGCGTCAGGTGCAGCTGGCTGGACATCTCG-
    GCGAAGTCGCGGCGGTAGCGGCGGGAGAAGTCGTCGCCGGCCTGGCGGAGGGTCAGGTGGACCACAGGTG-
    GCACCGGGCTGAGCGCAggccccgcggcggcgccgggggcagccggggTCTGCAGCGGCGAGGTCCTGGCGACC
    GGGTCCCGGGATGCGGCTGGATGGGGCGTGTGCCCGGGC
    1231 CACAGCCCCTTCCTGCCCGAACATGTTGGAGGCCTTTTGGAAGCTGT-
    GCAGACAACAGTAACTTCAGCCTGAATCATTTCTTTCAATTGTGGACAAGCTGCCAAGAGGCTTGAGTAGGA-
    GAGGAGTGCCGCcgaggcggggcggggcggggcgtggagctgggctggcagtgggcgtggcggtgc
    1232 GCTTGATGCTCACCACTGTTCTTGCTGCTCAAGGGAAACCAAGTATATATTTGTGGATAG-
    ATCCTAACTCAGATGATACTGTCAGAATATATAAGATTCCTATACCACATCCTGAACTCTGAAAGT-
    TGCAGTTCTACGTAGAAGTTCACTGAGGGTTGTAAGAGTCAGAATGGACTCCATGGAAGTTATGGGGTGTGAAT
    CAAACCTCACAGGTGAGTCAGTGGGGAGAAAGAAGCATGACA
    1233 ggccaggcccggtggctcacacctgtaatcccagcactttgggaggccgaggtgggcggattgcct-
    gaggtcaggagtttgagaccagcctggccaacatggtgaaaccccgtctctactaaaaataccaaaaattagc-
    cagtcgtagtggtgggcacctgtaatcccagctattcaggaggctgaggcaggaggatcacttgaacccaagag
    gcgggagttgcagtgagcagagatcacgccattgcaccccag
    1234 GCGGGACGGGTGGCGGGAAGGAGGGAGGCGCGGCTGGGGAGAGCGCTCGGGAGCTGCCGGGCGCT-
    GCGGaccccgtttagtcctaacctcaatcctgcgagggaggggacgcatcgtcctcctcgccttacagacgc-
    cgaaacggagggtcccattagggacgtgactggcgcgggcaacacacacagcagcgacagccgggaGGTAAGCC
    GCGTCCCAGCGGCTCCGCGGCCGGGCTCGCAGTCGCCCCAGTGA
    1235 GCTTGGCCCCGCCACCCAGACCCCTCCCCCGGGGGCGCCCAGCTTGGCCTCTGGGTCCCG-
    GCGCACGCGGACCCCAAGTCGGGGAGGCCGGGCTGACCGCGGCCGCCTCCCCGGCTCCGGGTAGGAGGTGG-
    GCAGAGAAGGTGGGCTGAGGGGAGGAGAAACTGGGCTGCGGGGGTCCGGGAGGGTGGATTCCGAGAAACTATGT
    GCCCAGCTGACCCTGCCCGCCCCGCCGCGGCCCTGCAGTCCCCGGGCCAG
    1236 GCGGGGAAGGCGACCGCAGCCCACCTACCGCTGGACGCGGGTTGGGGACCCCGCCGCCCGGC-
    CAGCTTTGTTcgggggcccgcggcccctcccgggcccccgcACCGCCTCGGGTGACCCGCGGT-
    GTCCCAGCGCGTTGACGCAGCCTGTGATCCCTCGCGAGGCGAGGAGAAGGTCGGGGGCTTGGCTCTGCCTAATG
    GCCGCCCGGGGA
    1237 gcgcccaaccaccacgcccgcctaatttttgtatttttagtagagacgggttttcaccattttggc-
    caggctggtctcgaaccccgacctcaggtgatctgcccaaaagtgctgggattacaggcgtcagccaccgcgc-
    ccggccGGGACCCTCTCTTCTAACTCGGAGCTGGGTGTGGGGACCTCCAGTCCTAAAACAAGGGATCACTCCCA
    CCCCCGCC
    1238 AAAAGCCCCGGCCGGCCTCCCCAGGGTCCCCGAGGACGAAGTTGACCCTGACCGGGC-
    CGTCTCCCAGTTCTGAGGCCCGGGTCCCACTGGAACTCGCGTCTGAGCCGCCGTCCCGGACCCCCGGTGC-
    CCGCCGGTCCGCAGACCCTGCACCGGGCTTGGACTCGCAGCCGGGACTGACG
    1239 CGCAGGTGCGGGGGAGCGTGCGGCCGGGTCCATGCGCCTGCGGGCGGCGGGGGGAGACGCGT-
    TGCCTTCGGCCGGGACCACTGCACCTGCCCGCGTGGGTAATGCGCCCGC-
    CGCAGACTCCGCGCACGACTCCGCCTGGGAGCGCGTTGGGGGCCGTTGGAGTCCAGCATGGCGCGGACCCCGG
    1240 CCCGCCCACAGCGCGGAGTTTAGTCTGCGCGTGCCTCGCTCGAGAACGCGCTCGTGCGCATGC-
    CCACAAAGGCCAAGGAGGGAGTGCGCAGGTCACGTGCGCCGGTGGTCAGCGCGCGCATTGCCTGCCCCG-
    GAAGTGGTcggcgcgcggcgcggcgcgccTGGGCGCTAAGATGGCGGCGGCGTGAGTTGCATGTTGTGTGAGGA
    TCCCGGGGCCGCCGCGTCGCTCGGGCCCCGCCATG
    1241 GCAGGGGCCCGGGGGCGATGCCACCCGGTGCCGACTGAGGCCACCGCACCATGGCCCGCTCGCT-
    GACCTGGCGCTGCTGCCCCTGGTGCCTGACGGAGGATGAGAAGGCCGCCGCCCGGGTGGACCAGGAGAT-
    CAACAGGATCCTCTTGGAGCAGAAGAAGCAGGACCGCGGGGAGCTGAAGCTGCTGCTTTTGGGTGAGTCCAGGG
    TCGGTGGGCGGTGGGTGGTGGGCAGTGGGCGGTGGCCAGCCGGCAGGG
    1242 CATGACCGCGGTGGCTTGTGGGAAAAGTGGCTCGGAACCCCAAATCCCGGTTAGATTGCAGGCAC-
    CGCCGGACGCTGGCTCCCGGAGGTTTTAGTTTTCCCTCTACCAGGAGTGTGAAGACACAGAGACTTAT-
    TGCGCTGGCGAAGATGGCTGAGGCGAAGGCGTGTCCGA
    1243 GCAGGTGCTCAGCGGGCAGACGCCCCGCCCCGCCCCGCCAGGTTCTGTTGGGGGCGAGGC-
    CCGCGCAAGCCCCGCCTCTTCCCCGGCACCAGGGGCGGGCCCAGGTGCGCCCAGGGCCGGGGAGCGGC-
    CGCGCAGGTGCCTGCCCTTTGCGCCTGCGCCCAGCTCG
    1244 GGTGCGCCCTGCGCTGGCTAAAGTGCGCAAGCGCGCGAGGCTCGGGCCTTTCAAAccccggcgcgc-
    cggcgccggcgTCGACACTGCGCAAGCCCAGTCGCGCCTCTCCAGAGCGGGAAGAGCGCTGCGTTCCT-
    TAGCAACGAGCGTTTCCTCCAGCCCCGCCTCCCTCCGCCACACACAACCCCGC
    1245 AATTTGGTCCTCCTGCGCCTGCCAAGATTGTCTgagtattgatcgaacccaggagttcgagat-
    cagcttgagcaagatagcgagaacccccgcccctccacctcgtctcaaaaaaaaaaaaaaaTCGTCT-
    CAGTAGCGAATAGTCTAACGGAGAATGACAGGGAAATTGGTGATCCTTTCTGGGCCCAAGAGTTAGAAATGGCT
    TTGCAggccgggcgcggt
    1246 GGCTTCCGCGGCGCCAATCTCCACCCGCAGTCTCCGCCTCCCGCACCTGTGGTCCGGGCCTCACG-
    GTTTCAGCGCCGCGAGGCCTCACCTGCTGGTCTTGGAGCCTCAAGGGAAAGACTGCAGAGGGATCGAGGCGGC-
    CCACTGCCAGCACGGCCAGCGTGGCCCAGGGCTCGCAGCACTTCCGGCCTCTCTGGCCCCGC
    1247 GCCAGGAGAGGGGCCGAGCCTGCACAGGAGCTTCCTCGGTTTTCCGAGCGCCGGCCCCCCTTCTCT-
    GCCTGGGAGGAGGTGGTTAGAGTCCCCTGGGTGTGTGCCCCGCAGAGGGAGCTCTGGCCTCAGTGCCCAGTGT-
    GCAGACCAATGAGAGCCCCAGAGAGAAAGACGGTCATTTCCTCCCTGCATCTTCCCTTGGGGC
    1248 cgagcgccggccccccttctctgcctgggaggaggtggttagagtcccctgggtgtgtgccccgca-
    gagggagctctggcctcagtgcccagtgtgcagaccaatgagagccccagagagaaagacggt-
    catttcctccctgcatcttcccttggggc
    1249 GGTTGCGAGGGCACCCTTTGGCCCGGGGGCGCGCAGGAGAGGGCAGGGGCCAGGGGTTTCCTGGGC-
    GAGGGCGCGGGGACGAGCAGGAAAAGGCCGGGGTGGGGGTGGAATTCCTCGGCGGGCAGGGGGCGCATGCGC-
    CGGGCACCGTGGGGCGGGACGTGGCCCGGGAGGAGCTGGGGGGACTGGGTGGTGCACGTGCGGGC
    1250 acccggacgcggtggcgcgcgcctgtaatcccagctactcgggagcctgaggcaggagaatcgct-
    tgaatccgggaggcggaggttgcagtaagccgagatcgcgccactgcaccccagcctgggcgaca-
    gagcaagactccTCGGTAAAGACACCACTTCGTCACCC
    1251 CGCCGCCGAGCCTCAGCCACGCCTCTGTGCAGCGGGGAAGACTCCTCTCGCGCCTTCTCAGTCAGT-
    CACGGATGATGCTGACCCAGCGCTCCGGGGCTTTCTACCAAGTAATCAGTCCAGACAAATGCCAAAACGAC-
    CGCCACAAGGAGGACAACGGAAGTCCCGCCGCGACCGCGCGTGCGCTTACGGAAACACCACCTTTCGGAGGCCT
    CATTGGCTGAAGGTCGCCGTCGCCCAACGCAGGCCATTCTGGGT
    1252 gcagcctcaacctcctggggtcaagtgatcatcctggctcaaccacccaagtagccgggactacgg-
    gtggccgccaccatgcccggataatttttttatttttgtggagatgggggtcccacgatgttgc-
    ccagtccagtcttgaactcctgggctcaagtgatcctcccgcagcagcc
    1253 CTTGCCGACCCAGCCTCGATCCCCTGCGGCGTCCAGGTCCCAATGCCCCAACGCAGGCCACCCCCG-
    GCTCCTCTGTGGACTCACGAAGACAAGGTCCGGCCGCTCGGGCCGCGAGAGTCGCGCCATCACCAC-
    CATTTTTCTGGATGCCCA
    1254 GCGGCGTTCGGTGGTGTCCCGGTGCAGCCACGCGAGAGTAGAAGGGTGGAAAGGGGAGGTGCCCAGT-
    GAAATGGAGCCTGTCCCGTGCACTTTCGGGCATTTCGAGCATCTTGTGGGCTCTCCCAAGTCGCGGC-
    CCCTCCTCTGAGAGCCACAGTCAGGTCTGTCCTCAGGGGTCGAGGCGGCTGCGCTGGGGCCTCGGCCCGGGAGG
    AGGCGGGGGGCACGGCCTTTCCATTTTCCCTGCTCCCCTCTGCAGAA
    1255 CCGGACTCCCCCGCGCAGACCACCGTGCCAGGACAGCCCGCTCGGGAGTCGGGCCTGGAAGCAGGCG-
    GACAGCGTCACCTCCCCGCAGCCGCCGGCTGGGACCCGCGGCCAGCCTTTACCCAGGCTCGCCCGGTCCCTGC-
    CCGCATGGCGG
    1256 ggccccctgcaagttccgcctcccgggttcacaccattctcctgcctcagcctccccagcagctgg-
    gactacaggcacctgccgccacgcccggctaattttttgtatttttagtagagacagggtttcaccatgt-
    tagccaggatggtctcgatctcctgaccttgtgatctgcccgcctcggcctcccaaagtgttgggattacaggc
    gtgagccaccgtgtccagccTGTAACA
    1257 GCCCAGGGGAGCCCTCCATTTGTAGAATGAATGAGAGTCCAGGTTATGAACAGTGCCTGGAGTGTAG-
    GAACACCCTCCTTTGCCTCTTTGACAGGTCTGCATCATAACACtttttttttttttttgagacagagtct-
    cactctgtcgcccaggctggagtgcagtggcacgatctcggccccctgcaagttccg
    1258 CCGGCTGCAGGCCCTCACTGGTTGGGTCCGCCCGCGAGGGTGCCCTGGGCCCGGT-
    GTCTCTCCTCCTTCTGAAGTTTGTTCCCATCCACCCGGCATCACCGACCGGTTTTATCCCGCTGAGGCCCTGG-
    GAGATGGGTCTGGCGAGGCTCGTAGGCCGCGGATTGGCTGGCTGGGTGCAGGGGGGTGCGGGAAGGGGAGGATT
    TTGCA
    1259 GTCACACCTGCCGATGAAACTCCTGCGTAAGAAGATCGAGAAGCGGAACCTCAAATTGCGGCAGCG-
    GAACCTAAAGTTTCAGGGTGAGATGCGTTGACTCGCGGTGGCTCAGAAGACCCACGCGCGAGCCCTG-
    GCGCGTTCGGGCGGCCGGGGGCCCAGCTGCTCTGTGTGACGGAGGCAGCTTCCCCTGCAGCGTGTGTGATTGGG
    GAGAGTGAAAAGGCAGCTTCCACTCGGGACCCGCGCTGCTGCCCACTC
    1260 CCCTGCGCACCCCTACCAGGCAGGCTCGCTGCCTTTCCTCCCTCTTGTCTCTCCAGAGCCG-
    GATCTTCAAGGGGAGCCTCCGTGCCCCCGGCTGCTCAGTCCCTCCGGTGTGCAGGACCCCG-
    GAAGTCCTCCCCGCACAGCTCTCGCTTCTCTTTGCAGCCTGTTTCTGCGCCGGACCAGTCGAGGACTCTGGACA
    GTAGAGGCCCCGGGACGACCGAGCTGATGGCGTCTTCGACCCCATCTTCGTCCGCAACC
    1261 CCTGGGGGAGCGCGGTGGGGGTAAGATAAGGGATGGGGGCTCCGAGGGCTGGGAACTGCAGGAAG-
    GAAAGAAGCGGCGGGGCCGCCCGGGTCAAGGGGCCACGTGGGGGAGGGCGGGCAGGCGGGACCGGGAGGT-
    CAATAACTGCAGCGTCCGAGCTGAGCCCAGGGGAGCGGGCGAGGAGAAAGAAGCCTCAGAGCGCCCGGGAAGCC
    TCGCGCGCCTGGGAGGCTTCCATCTCCCGGGACCCAGCTCTCAGCC
    1262 GTGGGGCCGGGCGAGTGCGCGGCATCCCAGGCCGGCCCGAACGCTCCGCCCGCGGTGGGC-
    CGACTTCCCCTCCTCTTCCCTCTCTCCTTCCTTTAGCCCGCTGGCGCCGGACACGCTGCGCCTCATCTCT-
    TGGGGCGTTCTTCCCCGTTGGCCAACCGTCGCATCCCGTGCAACTTTGGGGTAGTGGCCGTTTAGTGTTGAATG
    TTCCCCACCGAGAGCGCATGGCTTGGGAAGCGAGGCGCGAACCCGGCCCCC
    1263 CGTCCAGGCTGTGCGctccccgttctcccctcctccccacttctccccacgcct-
    tgctcgtctcccgccctcctccgacaaccgctcccctcaccctccacccctacccccgc-
    ccctcctccttcctccccGGCATGCGCCATATGGTCTTCCCGGTCCAGCCAAGAGCCTGGAACCACGTGACCTG
    CCCATTTGTATGCCGCGGAGCGCTCCATTCCGGCCCCTTTGTGGCCA
    1264 GCGCGGCGGTGCAGCCTCTCCCGAGCGCGCTGGGTCGCCTCTGCTCGGTCTGGGGTCTGCCAG-
    GCGCGATCCCCCCGGTGCAGCCGAGCCCCTCCGCAGACTCTGCGCAGGAAAGCGAAACTACCCGGCAG-
    GAGAAAAGGCAGCGCTGGCGCCCGGCCCCCTTCCGCCCCCACCAATCACCGGGCGGCTCCGCGCTCAGCCAATT
    AGACGCGGCTGTTCCGTGGGCGCCACCGCCTCCCTCTGCGGGCCGCTGCT
    1265 aggcggcggcggtggcagtggcacccggcggggaagcagcagcCAAACCCGCGCATGATCTCGA-
    GAGTTTCAGCAACATCCAGGGACTGGGCTCAGCCCCGGAGCGAGAGGGTCGTCCGCTGAGAAGCTGCGCCG-
GAGACGCGGGAAGCTGCTGCCATAAGGAGGGAGCTCTGGGAAGCCGGAGGACAGGAGGAGACGGGAGTCCAGGG
    GCAGACGAGTGGAGCCCGAGGAGGCAGGGTGGAGGGAGAGTCAAGG
    1266 GCGCGACCCGCCGATTGTGTCGAGTCAGCAGCGGCAGCGGGGACGCGCGAAGCCATGGCTCCCGC-
    CCGCGCTCGGGAGGGCGCCGGGGGTCCTGCGCCTCCGGGAGGTTTGTGGCCGAgcgcggcgcggccccgagcg-
    gccccgcagcgcccggctccccgccgcTCGCTCTCCAGGCGCCGACCCGCCTGCGTCGCCACCCTCTCGCCGCT
    CCCTGCCGCCACCTTCCTCCCGCCCGGGTGCCGGGCGTCCGCT
    1267 CGCGGACGCCGCTCTGCACCTGTTGCCGCCGTCACTCATCCCGCCAGGCGGGCGGGGCCGCGCGGGT-
    GGCTTGGTCAGGACCTGCCATTCAGCCCAGTCGGGCTCCGGTGCTCGCCCCGGACGGCGCCCCAAGCGG-
    GTCCCGGCCCCGCTGAGCACCTCCAGCAGTGGCACAGCCTCTGGAGGGGTCCGGGACGAAGCCACCCGCGCGGT
    AGGGGGCGACTTAGCGGTTTCAGCCTCCAACAGCCTTGGGATCGC
    1268 tgaacccgggaggcggaggttgctgtgagccgagatggcaccattgcactccagcctgggcaacaa-
    gagcgaaactccgtcccccgaacaaaaaattcaaatgggaaagagaggcagatggcagagaacaggggaggg-
    gctgggcaccgtggctcatgcctgtaatcccagcactttgggaggccaaggcgggtgga
    1269 CTCGGCGGCGCGGGGAGTCGGAGGACGCAGCCAAGCGGCGGCGGCGAGGAGGGTCACAGCCGGAAA-
    GAGGCAGCGGTGGCGCCTGCAGACGCCGCGCAGCCCGGGCAGCCCCACAGCGCAAGCTGGCTGCCGCGGCG-
    GCGGGGGCTTTATCGGCGGCGCCGCGCGGGCccccgccccttcctgccgcccccgcccccggcccgccttgccc
    cgccttcccgccg
    1270 aggcggccacgggagggggaggggctggcaacggcgccgtgggggcggggctcgctttgtgcaag-
    gtccgcgctgattgggccgtgggcgcgcgggtcccggcctgcgtcgtgggactggcgtttttggcgccggct-
    gtgaggggagcgcgggggtggtggaatcgggcggtctccggttcgccaatgtggctgggtccgtaggcttgggc
    agccttggagttcctcagagaccccgcgctcggtcccggcacgc
    1271 GACCCGAGCGGGGCGGAGAGTGGCAGGAGGAGGCGAATCTCCGCGCTCCGGCGAACTTTATCGGGT-
    TGAAGTTTCTGCTGTCGCCTCCCCTTTGCGTGCGGAGCTGGGCTTTGCGTGCGCCGCTTCTGGAAAGTCG-
    GCTCCAGTCATATCCCTGGGCGCTGCCTGCGGCCGCTCCTCCCGCGCTTCTCACGGCACCTGACACGCGGAGGC
    GGCGGCCGAGGGTGGGGTGCCGGCCACCACCACCCTTGGCGTGGG
    1272 AGCACCTggggcggggcggagcggggcgcgcgggcccACACCTGTGGAGAGGGCCGCGCCCCAACT-
    GCAGCGCCGGGGCTGGGGGAGGGGAGCCTACTCACTCCCCCAACTCCCGGGCGGTGACTCATCAACGAGCAC-
    CAGCGGCCAGAGGTGAGCAGTCCCGGGAAGGGGCCGAGAGGCGGGGCCGCCAGGTCGGGCAGGTGT-
    GCGCTCCGCCCCGC
    1273 CCGCTCGGGGGACGTGGGAGGGGAGGCGGGAAACAGCTTAGTGGGTGTGGGGTCGCGC-
    ATTTTCTTCAACCAGGAGGTGAGGAGGTTTCGACATGGCGGTGCAGCCGAAGGAGACGCTGCAGTTGGA-
    GAGCGCGGCCGAGGTCGGCTTCGTGCGCTTCTTTCAGGGCATGCCGGAGAAGCCGACCACCACAGTGCGCCTTT
    TCGACCGGGGCGACTTCTATACGGCGCACGGCGAGGACGCGCTGCTGGCCGC
    1274 ACCGCCAGCGTGCCAGCCCCGCCCCTACCCACCAGTGTGCCAGCCCCGCCCTTCCCCACGTcgc-
    cgcgcgcccgggggcggggcctggcgcgcaccgcccgcgcACGGCGAGGCGCCTGTTGATTGGCCACTGGGGC-
    CCGGGTTCCTCCGGCGGAGCGCGCCTCCCCCCAGATTTCCCGCCAGCAGGAGCCGCGCGGTAGATGCGGTGCTT
    TTAGGAGCTCCGTCCGACAGAACGGTTGGGCCTTGCCGGCTGTC
    1275 ATTCTTggccgggtgcggtggctcacgcctgtaatcccagcactttgggaggctgaggtgggtggat-
    cacctgaggtcaagagttcgagaccagcctggccaacatggtgaaaccccgtctc-
    tactaaaaatacaaaaattagccgggcgtggtggtgggcacctgtaatcccagctactcagaaggttgaggcag
    gagaatcgcttgaacccgggagaaggaggttgcagtgagccgagatcgcgccattgcac
    1276 CGCTTCCCGCGAGCGAGCCGCCCAGAGCGCTCTGCTGGCGGCAGAGGCGGCGGCGAGGCTG-
    GCGCGCTTGCCGCCGTCTGCTCGCCCCGCGGAGGCGACCTGGGCAGACGCTGCTGG-
    GAACTTTGAAAAACTTTCCTGGAGCCAGGCTTGCCGCAGATTCGAGGGGAAGCCTCGGCCGCGTCCCACCCCCT
    CCCAAATCCGAGTCTGCGGAGCCTGGGAGGGCTCCCAGCTTCCTATCCAAACCGCGCCGGGGCA
    1277 AGCCGGCGCTCCGCACCTGCCCCTCAGCGCCTGCCGTCCGCCCCACCGCCGCGGCGC-
    CCCGCACTCCTGGGCGGGCCAGGGGAGCGGGCTGGGCGGGCGATCGGGCACGCGGGATCCCTGGTCGAGC-
    CCCCTTTCCTCCCGGGTCCACAGCGAGTCCCCTGAGGAAGGAGGGACCTGGGAGGAAACCACCCTCTGGGGCGG
    CTCCGGCCTCCAGCCCCCGCCCCGTCTCATCGCGCCGGGCGCCCGGTGCGCCTG
    1278 CGGAGCGCGCTTGGCCTCACAGGACAGTGGGTGTGGCTGGGGTGACGGGGCAGGGTGGGGAAGACTG-
    GCCTAACACCAGCGCCCTCTGCCCCATGGCTGGCCAGGGACCCGCGAGTCCCTGGACACGCACTGGCCAACGC-
    CAGACCCCATCTCATCGGGTGGGGAAGTCGCGGGGACACTGTCAGGGCGCCGAAGTCCGGACCCGGCTCAGAGG
CGGTGGCAGGTGAATTGCTGCGGCGCCGGGTAGGGGCGGGC
    1279 ggcctcgagcccacccagacttggccaagcagccctcggccagaccaagcacactccctcggag-
    gcctggcagggcccctgctttaccctgccccccacgccccgccccgacccgaccctcccaggcagcccct-
    cagcgtctgccgcccgcccttgggcctttccggccagcccctccctccgcccacgcccagaacagcccatgctc
    ttggaggagagcaggtgggcttgaccgggactggcccctcaccgcgg
    1280 GCCGCGCCGTAAGGGCCACCCCCAGAGGCCGAGGAGGTGGGGCTGGCCTGGCTTTCTGGCCAGGT-
    GGGGCTTGTCCAACCCCACAAACATCAGGGCTCACCCTGGATGTGGAAGAGAAGGAGCGAC-
    CCCCAAAACGAAGCGGCTGGATCTGACCTTCCAAGGCCTGTTGGCGACGCAGGGCCCCCAGGAGGCAGAGCGCG
    CGCCTGGCCCGGGCGATGGGCCTCCCGTCCCCCCAGGGCTGCCTCCCCGCCGGTG
    1281 CGGCGGTGGCGGTGGGTCGGCGACCGGCGGGCCGAAGACTGGAAGCCCGGGCCGCTGAG-
    GCTCCGCAgccccctccgcgccgccccggcccgcccccgccgcgccgccccttccctccccgcgcccgc-
    cccTTCTTCCCCGCAGGGTCAGCGCTGGGGCTCCGGCCGTAGAGCCACGTGACCCTGGCAGGCCCTGCTCGCGG
    GGCTTGGCGACAAGGACGCACGACACGGGGCGGC
    1282 ACCTGCCCAGTTACTGCCCCACTCCGCGGAATAAGCTCTTACCCAC-
    CGCTCCTCTTCTTCAATTCATTTCTGTTATGGAACTGTCGCGGCACTACAAAGTCTCTATGTAGT-
    TATAAATAAACGTTATCTGGAAGAGCAGCCGACAACAACTTTCAAGATCTCCAATTCCCCGAC-
    CCCACACTCCAACTGACGCC
    1283 CCAGCGCCCGAGCCGTCCAGGCGGCCAGCAGGAGCAGTGCCAAACCGGGCAGCATCGCGACCCT-
    GCGCGGGGCACCGAGTGCGCTGCTGTGCGAGTGGGATCCGCCGCGTCCTTGCTCTGCCCGCGCCGCCACCGC-
    CGCCGTCTCCCGGGGCCCCCGCGCACGCTCCTCCGCGTGCTCTCGCCTACCGCTGCCGAGGAAACTGACGGAGC
    CCGAGCGCGGCGGCGGGGCTCAGAGCCAGGCGAGTCAGCTGATCC
    1284 CTGCTGCTGCCCGCGTCCGAGGCTCGCGGGCGGCGGGCCCGGGTGAGTGCACACCCGGCGCGCTGC-
    CGGGCTCCCGGATGTGTCACCTTGTCCCGCTGCAGCCGAGATGCCGGGGGAGCGGGGCCTTCCACAC-
    CCCCTCCGTGGGTGTGTGGTGAGTGTGGGTGTGTGCGCGTCTCCTCGCGTCCCTCGCTGAGGTGCCTACTGTGT
    CTGCATGGGTTGGGTCCCGCGCGATG
    1285 ACTGCTTAGGCCACACGATCCCCCAAGCCTGGGCTGCCAGACGTCGCCATCATTGTTCCATGCAGAT-
    CATGCCCATCCTGTGCAGAAGGTCACTATAGGAACACATGGCACAGGGAAGAAAACGCCCATAGAAATTCA-
    CATGGTGCTTGTCTAAACCGAAGGCAGGTGAGATCCACCCACTG
    1286 GCCGGACGCGCCTCCCAAGGGCGCGGGTCCGAGGCGCAAGGCGAGCTGGAGACCCCGAAAACCAGG-
    GCCACTCGGGGAGTGTCAGGAAGCACGACTGGGCGCCTTAGGACGTCCGGGCAGACGCGGCCCCCGAGGAGC-
    CCCAGAGGAGCCCCAGAGGAGCCGCCTGACCCGGCCCCGACGTGCGCGATCGAGCCCGGGCTCGCCAAAGCCCC
    CGCGCCCCTCCGGCCCGGACAGGCCGAGTGGACATTGTCGGAG
    1287 CGGCCAGGGTGCCGAGGGCCAGCATGGACACCAGGACCAGGGCGCAGATCACCTTGTTCTCCATGGT-
    GGCCATTGCCTCCTCTCTGCTCCAAAGGCGACCCCGAGTCAGGGATGAGAGGCCGCCCGAGCCCCG-
    GATTTTATAGGGCAGGCTC
    1288 ccgcccgcccCACAGCCAGCGGCTCCGCGCCCCCTGCAGCCACGATGCCCGCGGCCCGGCCGCCCGC-
    CGCGGGACTCCGCGGGATCTCGCTGTTCCTCGCTCTGCTCCTGGGGAGCCCGGCGGCAGCGCTGGAGCGAG-
    GTAAGCGCCCCGAGGGGCGGGGCGGGCAGGGGGCAAAGTTGCCGGGAGAGCGGGGCAGCCAGGGGTCGGGGCTG
    ACCAGGGCGACTCAGGCACCACCCGCCGGGA
    1289 GCGCCCCAGCCCACCCACTCGCGTGCCCACGGCGGCATTATTCCCTATAAGGATCTGAACG-
    ATCCGGGGGCGGCCCCGCCCCGTTACCCCTTGCCCCCGGCCCCGCCCCCTTTTTGGAGGGCCGATGAGGTAAT-
    GCGGCTCTGCCATTGGTCTGAGGGGGCGGGCCCCAACAGCCCGAGGCGGGGTCCCCGGGGGCCCAGCGCTATAT
    CACTCGGCCGCCCAGGCAGCGGCGCAGAGCGGGCAGCAGGCAGGCGG
    1290 cgtgctgggcgcaggggaaacagcgacgcacgggacaaaACAAGCTTGCAGAACAGCAGGGGGCAGA-
    GAGGCTGTAAACAAGCCAACGGGCTGCACTTGTAGCGGTTCTGTTGCCAATGCCATTCAGACCCCAGTCCGG-
    GATTCCGCGCTCGGGGTGCGAGAGGCCGCTCCcggggaggggcgggacccgggcggggcgggaggggcggggcg
    CCCGGGCCTATTAGGTCCCGCGCCGGCAGCC
    1291 GCGCACGCGCACAGCCTCCGGCCGGCTATTTCCGCGAGCGCGTTCCATCCTCTAC-
    CGAGCGCGCGCGAAGACTACGGAGGTCGACTCGGGAGCGCGCACGCAGCTCCGCCCCGCGTCCGACCCGCG-
    GATCCCGCGGCGTCCGGCCCGGGTGGTCTGGATCGCGGAGGGAATGCCCCGGA
    1292 GGTGAGTGCGGCCCGGGGAGGGGAGGGGACCAGGGCGACCGGAGCCCCCAGCGATCCCGCCTG-
    GAGCGGCCGCCAAGCTCCCTCGGGCACCCGGGTTCAGCGGGTCCCGATCCGAGGGCGTGCGAGCT-
    GAGCCTCCTGGACCGGGTCCGCCGCGGACCTCGGCCTGTCACCTGAAGGTGCCGCGTGGTCTCTGAGGACGTCT
    GTCGACGAGCAGGGGCCGCCGCCA
    1293 GGCCGAGAGGGAGCCCCACACCTCGGTCTCCCCAGACCGGCCCTGGCCGGGG-
    GCATCCCCCTAAACTTCGGATCCCTCCTCGGAAATGGGACCCTCTCTGGGCCGCCTCCCAGCGGTGGTGGC-
    GAGGAGCAAACGACACCAGGTAGCCTGCCGCGGGGCAGAGAGTGGACGCGGGAAAGCCGGTGGCTCCCGCCGTG
    GGCCCTACTGTgcgcgggcggcggccgagcccgggccgcTCCCTCCCAGTCGCGcgcc
    1294 CCAGCGCCGCAACGCCCAGGGTGTGGGGCGGAGTAAGATGTGAAACCTCTTCAGCTCACGGCACCGG-
    GCTGCAACCGAGGTCTGAATGTTGCGAAAGCGCCCCAGACGCCGCCGCTGCTTTCCGGCCGCCCCCTCGGC-
    TACAGCCGCCATTTCCACGCTCCACCAATCAAATCCATTCTCGAGGAAGACGCACCGCCCCCACACGCCCCGAC
    CAATCGCTCGCGCTCTGGTTGCGCTGGCGCC
    1295 CCACAAGCGGGCGGGACGGCTGGAGACTGCCGGGACAGCGGCTGCCGGTGCTACGCGGGTGGTGG-
    GCGGCCCGGAAATGAGCGCCCTCCGGGGACAGGGGGCTCTGCGGGGCGGCGACAGCTG-
    GATTCCCAGCGCGCACAAAGCCTGCGGGAGGATCCATTGTAGCGGTCGCTCCTCCCCGCTTAGCGAGGGCGGGC
    GCAGGGGCGGGGGATGTCGAAGGGTCAGGTTTGTCCAGGCCGCGCCACCTTCG
    1296 CCTCTGGACAACGGGGAGCGGGAAAAAAGCTACGCAGGAGCTTGGATCGGGCGAAGCTCGCGG-
    GAAACCGCTCTGGGTGCGCAGGACAAAGACGCGGGGACAGCGGGGAGGGCCGGCCGCAGCCTGCCGGGCTGC-
    CCCCACGGCGCGGAACGCGCGCAGCAACCTCCACCAGGCCTCCGCGTCTGGACTCCCGCCCTGCCTCTGGGCCT
    CCTCCGCCCACCGGCGGCGTCTCCCGCGAAGCCCGCTGGG
    1297 GCGGGTTCCCGGCGTCTCCAAAGCTACCGCTGCCGGAAGAGCGCGGCGCCCGACGGAGCCGTGTG-
    GAGGCCAAAACTCCTCCCGGAAGCCGCTACTGGCCCCGCTTGCCAGGCCCAGCGTCTTTTCTGCATAGGAC-
    CCGGGGGAAGCCGGGAAGCCGTTAGGGGGCGGGGCAAGCGGG
    1298 CGCCGCCCGTCCTGCTTGCTGCTGGGTCCGGTTGCCGAGGCGGAAAAGTCGCAAGCTCCTTCAGT-
    CAGTCTTCTTCCTCAGCTCCTTCCGACTCCGGAAGCTGCTGTTTGGGCCCAGGCTCCCTGCATCCGAGAGC-
    CCTGGGCTGACTGCTTCTGAGGCCCCGCCCCACTACTGCCTGCAGCGGGCTTCCTTACTCCGCCTGCTGGTTCC
    TACTGGAGGAGAGGCCAGCATGCTTGTCAGGCACCAGCAGGTGGA
    1299 CGCGCGGCCCTCCTGCACCTCGGCCAGCACTCGTAGCGCGCTGGGCGAGCCGGACCGGAAGT-
    TGAAGAAGTGAAGCGCCGCGCGCGCCGCCTGCTGCAGGAGCCTGCGCGGGACCCCAGCATCCTGAGGCTGC-
    CCAGGGTCGTCGGGGTCCCCGGACCCCGCGGGCGCCGCCACCGGGGCGAGCAACAGCAGCAGCGCGAGCAGCGG
    GGCGGTGGGGCGCGGGCCCCTGGGCCCGGACCAGGGAGCAGGCAGCCG
    1300 GGCGGGGCAAGCCCTCACCTGCGCCAATCAGGGTGCGGAGTAGGCCCCGCAGGCGCCTCACCCAT-
    TGAGGGGGCGGGCTGACAGAGCAGAGGAAGGAAGGGGGTGAGGGGCCTGTGGTGGGGATCCTGGGGCTGTCGG-
    GCTGAGTATGCCGTGTGGGTGGAGAGGAAGCCTCGGGGAAATCGCCCAGGTGAAGGGAGGGCTTGGTGTGGGGA
    CTTGCACTGGGCAGAGGGGCAGCTTCCCTGAGAGCAGCTAAGC
    1301 GGAGCGCCCCCTGGCGGTTTCAGGGCGGCTCACCGAGAGGGCGCCGGGAGCGCCCGGTTGGG-
    GAACGCGCGGCTGGCGGCGTGGGGACCACCCGGCAGGACCAGGCACCAGAGCTGCGTCCCTGCTCGC
    1302 CGAATGGTTCGCGCCGGCCTATATTTACCCGAGATCTTCCTCCCGGACGGCAAGGATGTGAGGCAG-
    GCGAGCCGGACGCCGCTCGCAGCACCGGAGAGGGCGCACTGCAAAGGCGGGCAGCAGACCGTGGAGAGCCCGG-
    GAGCGGAGCTGGACACCGCCTCGGAGGGAAGAAATGAGGTAGCGGCGGTTCCCGGACCCGGCCATGCCCGTCCC
    CTGTTCTCGGAGCCCAGCGCCGTCTCGGCCAGGCCAGCCCGG
    1303 TTCCGCCGGCTGGGCCCTCCGTCTACCCCCAGCGGCGAggggcggggccggcgcgggcgcAGAG-
    GCGTCACGCACTCCATGGTAACGACGCTCGGCCCGAAGATGGCGGCCGAATGGGGCGGAGGAGTGGGT-
    TACTCGGGCTCAGGCCCGGGCCGGAGCCGGTGGCGCTGGAGCGGGTCTGTGTGGGTCCGAAGCGTTTTACTCCT
    GTTGGGCGGGCTCCGGGCCAGCGCCACATCTACTCCCGTCTCCTTGGGC
    1304 CTCCGGGTcccccgcgtgcccggcccgccccggcccgcTTCCCGGGCGCTGTCTTACTCCGGGC-
    CCGGGGCGCCTGCTCCGCGCCGCGTCTGCGAACCGGTGACCTGGTTTCCCCTCCAGCCCTCACGGCT-
    GTCCGACTTGCGCGGCGGTGGCGGCGGCGGCCAAGAGCAGGCAAACCCGGCTCCGCCAGGGGCGCAGCGAGGAA
    ATGGCCTCCTGGCGCACACCCCGCCGCCGCCGCCAGCCATCGCCACCGCC
    1305 CAGCCCGGGTAGGGTTCACCGAAAGTTCACTCGCATATATTAGGCAATTCAATCTTTCATTCTGTGT-
    GACAGAAGTAGTAGGAAGTGAGCTGTTCAGAGGCAGGAGGGTCTATTCTTTGCCAAAGGGGGGAC-
    CAGAATTCCCCCATGCGAGCTGTTTGAGGACTGGGATGCCGAGAACGCGAGCGATCCGAGCAGGGTTTGTCTGG
    GCACCGTCGGGGTAGGATCCGGAACGCATTCGGAAGGCTTTTTGCAAGC
    1306 GGCGGAGAGAGGTCCTGCCCAGCTGTTGGCGAGGAGTTTCCTGTTTCCCCCGCAGCGCTGAGT-
    TGAAGTTGAGTGAGTCACTCGCGCGCACGGAGCGACGACACCCCCGCGCGTGCACCCGCTCGGGACAGGAGC-
    CGGACTCCTGTGCAGCTTCCCTCGGCCGCCGGGGGCCTCCCCGCGCCTCGCCGGCCTCCAGGCCCCCTCCTGGC
    TGGCGAGCGGGCGCCACATCTGGCCCGCACATCTGCGCTGCCGGCC
    1307 CCTCACCCCAGCCGCGACCCTTCAAGGCCAAGAGGCGGCAGAGCCCGAGGCCTGCAC-
    GAGCAGCTCTCTCTTCAGGAGTGAAGGAGGCCACGGGCAAGTCGCCCTGACGCAGACGCTCCACCAGGGC-
    CGCGCGCTCGCCGTCCGCCACATACCGCTCGTAGTATTCGTGCTCAGCCTCGTAGTGGCGCCTGACGTCGCGTT
    CGCGGGTAGCTACGATGAGGCGGCGACAGACCAGGCACAGGGCCCCATCGCCCT
    1308 CGATGACGGGATCCGAGAGAAAGGCAAGGCGGAAGGGGTGAGGCCGGAAGCCGAAGTGCCGCAGG-
    GAGTTAGCGGCGTCTCGGTTGCCATGGAGACCAGGAGCTCCAAAACGCGGAGGTCTTTAGCGTCCCGGAC-
    CAACGAGTGCCAGGGGACAATGTGGGCGCCAACTTCGCCACCAGCCGGGTCCAGCAGCCCCAGCCAGCCCACCT
    GGAAGTCCTCCTTGTATTCCTCCCTCGCCTACTCTGAGGCCTTCCA
    1309 CCGCAGGCCGCGGGAAAGGCGCGCCGAGTCCTGCAGCTGCTCTCCCGGTTCGGGAAACGCGCGGG-
    GCGGGGGCGTCGGGCTTGGGACAGGGGAGGATACCAGGGCCACCTTCCCCAACCCAGGCCGCGGGGGCCCG-
    GCCTCCCCGATGCAGACCACAGCGCCCTCACGGGCTGCCCTCAGGCCGCGCAGCGGGCAGCCGCCAGCCGTCAC
    CCCGGGGAGCGTCCGTGGGGTGCCCAGGCA
    1310 GCCCCAGTCCACCTCTGGGAGCGCCTGCGCCGCTCCGCGGAGAGTCCGTGGATCTCACAGTGAGC-
    GAGTTGGGACCCAGGGAGGGGAAAAGAGAGGACCCCGGCGAGCCATTGCTGGGGCGGCGGGCTGGAGGGT-
    TATCTGGGAAGTCAGCCCCGGCCTCGGTCCTCTCCACGTTGCTGCCTACGCGTGCTGCCCGGACGTAGGGC
    1311 CTTGGCCGCCCCCGGGATGGGGCGAGGGGTTCCCGAGGGCTtgggagggcggcttgggaga-
    gagctccggctccggaacgaggtgtcctgggaacactcccgggtctgtaacttcggacaaat-
    cacgctcgctttcccggcctcagtgtgccgttctgtaacttgggtctaaCCCCGGCTCGCACACACGGCGGGGA
    CGCGCACAG
    1312 CCTCCATGCGCAATCCCAAGGGCGGAGAGGAATTTCAGCAGCTACGAGCAACAGAAAGGAAACGAGA-
    GAGTAGCCAGACTCTCCGCGCATGGAGCCGACGGCACCCACCAGCACACCGCCGGCGCCCCCAGCCACTACT-
    GCACGTCCGCccccgccccgccccgctccgcccGGCGCACCTGATGCCCAAACTGGTTGCACGGGAAGCCGAGC
    ACCACCAGGCCCCGGGGTCCGAGGCGCCGCTGCA
    1313 gcggcgactgcgctgccccttggctgccccttccgctctcgtaggcgcgcggggccactact-
    cacgcgcgcactgcaggcctttgcgcacgacgccccagatgaagtcgccacagaggtcgcaccacgtgtgcgt-
    ggcgggccccgcgggctggaagcggtggccacggccagggaccagctgccgtgtggggttgcacgcggtgcccc
    gcgcgatgcgcagcgcgttggcacgctccagccgggtgcggccctt
    1314 GGGCTTGCCTCCCCGCCCCTACCTTCCAGGATGTTGACAGCTGGGAATGAAAGGCAGAGGGAGG-
    GAGCGCGGGGCCGGAGCGCCGCCTGGGAGTGTGCCCACTGGGTGGCCGCCTGAGGGACCCGGGAACAGAGG-
    GCAAAAAGTCCTGTGACCGGACAGAGCAGAGCGGGGACTGCAATTCCCAGAAGACCCCACGGTAGGGGCGGGAC
    CCAAGATGGCCGCTTGTCTGGGGACAGGAGCGGAGGCCAATACGCG
    1315 GCGGCCCAAGGAGGGCGAACGCCTAAGACTGCAAAGGCTCGGGGGAGAACGGCTCTCGGAGAACGG-
    GCTGGGGAAGGACGTGGCTCTGAAGACGGACAGCCCTGAGGAACCGCGGGGCGCCCAGATGGAACTCGT-
    TAGCGCCCCGAGTGCAGACAATCCCGGAGGGGGAAAGGCGAGCAGCTGGCAGAGAGCCCAGTGCCGGCCAACCG
    CGCGAGCGCCTCAGAACGGCCCGCCCACCC
    1316 ctgcgcggcTGGCGATCCAGGAGCGAGCACAGCGCCCGGGCGAGCGCCGGGGGGAGCGAGCAGGG-
    GCGACGAGAAACGAGGCAGGGGAGGGAAGCAGATGCCAGCGGGCCGAAGAGTCGGGAGCCGGAGCCGGGA-
    GAGCGAAAGGAGAGGGGACCTGGCGGGGCACTTAGGAGCCAACCGAGGAGCAGGAGCACGGACTCCCACTGTGG
    AAAGGAGGACCAGAAGGGAGGATGGGATGGAAGAGAAGAAAAAGCA
    1317 CACCGCCTCCGGACCCCTCCCTCATCAGAAAGCCCAGGCTCCGCTCGTAGAAGTGCGCAGGCGTCAC-
    CGCGCATCCAGGAGCCACGTGTCAGGAGTCACGTGTCAGGTGTCACGTGTCAGGCGTCACGTGGCTGGAGGC-
    CGTTGGAGCGCCTGCGCAGCTTTTCCGCACGCGCC
    1318 CCTTCCAGCCACCCCGCCCTGGGCGCCTCTGGCGCGCTCTGATGACGCTCCAAGGGAAGAGGAAGT-
    GGGGATCGGCGAGCGGGTGGGTGCGCCTCGGGCCGCGGGACTCGCAGCCGCCACCGCCGCTGCCGCCTCTACG-
    GCCGCGTCAGAACTGAAGAGAGGAAGGGGAGGAGCCGAGTCGAGCCTAAGCTGCCGCCCGATCTTACCCCTGAC
    CCGAGGGCGGCCTGGA
    1319 CGGGACACCGGGAGGACAGCGCGGGCGAGGCGCTGCAAGCCCGCGCGCAGCTCCGGGGGGCTCCGAC-
    CCGGGGGAGCAGAATGAGCCGTTGCTGGGGCACAGCCAGAGTTTTCTTGGCCTTTTTTATGCAAATCTGGAGG-
    GTGGGGGGAGCAAGGGAGGAGCCAATGAAGGGTAATCCGAGGAGGGCTGGTCACTACTTTCTGGGTCTGGTTTT
    GCGTTGAGAATGCCCCTCACGCGCTTGCTGGAAGGGAATTC
    1320 CCTGGGTTCCCGGCTTCTCAGCCACTGGAGCTGCCAGTCTCAAATTACCGGAGGGGAGGGAGGGCAG-
    GCCTGGATCTCAGGATCTCGGTCCTGCATGCAATGCAAGCCTGAGCTCTCCCGCCATAAGGCTGCAGCGGTGT-
    GGGCTCCTTGTGCCCAGATCCTTTGTATTCATAGGGGGAAGTGGAAGACCACGCTGCC
    1321 GGCGGTGATGGGCggaggaggaggaagaggaggaggaggaagaggaggagggggaAAACGATGACAG-
    GAGCTGGGGCCGGGGGGGGAAATTGGGGGGACGCGGGCGGAGGCGCGGTGCGCGCCGGCGGTGGCGGGCAC-
    GAGCCCCGCGCCTGGAGGAGGAGGAGTCAGGCCGGGTAGGAGGGCTAAGGAGGTTCCCGGGAAGGCAGGGcccc
    ccctcccccccctcccccccccccACACACACACACTCCCCTG
    1322 CAGCCCGCCCGGAGCCCATGCCCGGCGGCTGGCCAGTGCTGCGGCAGAAGGGGGGGCCCGGCTCTGC-
    ATGGCCCCGGCTGCTGACATGACTTCTTTGCCACTCGGTGTCAAAGTGGAGGACTCCGCCTTCGGCAAgccg-
    gcggggggaggcgcgggccaggcccccagcgccgccgcggccACGGCAGCCGCCATGGGCGCGGACGAGGAGGG
    GGCCAAGCCCAAAGTGTCCCCTTCGCTCCTGCCCTTCAGCGT
    1323 GCCCGCGGGGGAATCGCAGTGAGCAGCGCGGGGCGAGGCCGCCGCGGACGCCCCGTCGGATGTGC-
    CCTTCGCTGGGCCGAGCGGCGCAGGGTTGGAGAGGGAAGCGCTCGTGCCCACCTTGCTCGCAGGTGCCCT-
    TGCTGACCTGGGTGATGGCCTTCTCCCCGCGGCTCTCGGCCCTCTGGCTGGCGGCGCGCAGCTGGCAGCCGCTC
    GGGTAGGTGGTGCCGTCGCTGCCGCACACCGGG
    1324 GCCGCGAGCCCGTCTGCTCCCGCCCTGCCCGTGCACTCTCCGCAGCCGCCCTCCGCCAAGC-
    CCCAGCGCCCGCTCCCATCGCCGATGACCGCGGGGAGGAGGATGGAGATGCTCTGTGCCGGCAGGGTCCCT-
    GCGCTGCTGCTCTGCCTGGGTAAGTTCTCCCCCTCTGGCTTCCGGCCGCCCCAA
    1325 GCGGCCCCCTCCCGGCTGAGCCTATAAAGCGGCAGGTGCGCGCCGCCCTACAGACGTTCGCACACCT-
    GGGTGCCAGCGCCCCAGAGGTCCCGGGACAGCCCGAggcgccgcgcccgccgccccgAGCTCCCCAAGCCTTC-
    GAGAGCGGCGCACACTCCCGGTCTCCACTCGCTCTTCCAACACCCGCTCGTTTTGGCGGCAGCTCGTGTCCCAG
    AGACCGAGTTGCCCCAGAGACCGAGACGCCGCCGCTGCG
    1326 CAGCAGGGCGCGGCTTCCCTTTCCCGGGGCCTGGGGCCGCAATCAGGTGGAGTCGAGAGGCCGGAG-
    GAGGGGCAGGAGGAAGGGGTGCGGTCGCGATCCGGACCCGGAGCCAGCGCGGAGCACCTGCGCCCGCGGCT-
    GACACCTTCGCTCGCAGTTTGTTCGCAGTTTACTCGCACACCAGTTTCCCCCACCGCGCTTTGGGTAAGTTCAG
    CCTCCCGGCGCGTCCCCGCGAGCCTCGCCCACAGCCGCCTGCTG
    1327 CCGCAGCACGCTCGGACGGGCCAGGGGCGGCGACCCCTCGCGGACGCCCGGCTGCGCGCCGGGC-
    CGGGGACTTGCCCTTGCACGCTCCCTGCGCCCTCCAGCTCGCCGGCGGGACCATGAAGAAGTTCTCTCGGAT-
    GCCCAAGTCGGAGggcggcagcggcggcggagcggcgggtggcggggctggcggggccggggccggggccggct
    gcggctccggcggcTCGTCCGTGGGGGTCCGGGTGTTCGCGGTCG
    1328 GCGGAGTGCGGGTCGGGAAGCGGAGAGAGAAGCAGCTGTGTAATCCGCTGGATGCGGACCAGG-
    GCGCTCCCCATTCCCGTCGGGAGCCCGCCGATTGGCTGGGTGTGGGCGCACGTGACCGACATGTGGCTGTAT-
    TGGTGCAGCCCGCCAGGGTGTCACTGGAGACAGAATGGAGGTGCTGCCGGACTCGGAAATGGGG
    1329 GCGCGGGGGCAGGTGAGCATGCGAAGGTTGGAGGCCGCGCCCCTTGCTGAGGCGCAGCTGGCT-
    GCTCTTTTCGGGCCGGCATACGCGCGCAGCCGCAGCTGAGGTCACCCCGCTGAGGTGGTGGGGAGGGGAATG-
    GTTATTCTTGAGGCACCGCATCTCTTGAGGAGGAAAGAGCCGGAAACACCTGGTCTCTCAAGCAGGTACAGCCC
    GCTTCTCCCCAGCACCCCGGTGTGGGCTTCCCAAGGTCCTGCCTGA
    1330 ggcgcgggggcaggtgagcatgcgaaggttggaggccgcgccccttgctgaggcgcagctggct-
    gctcttttcgggccggcatacgcgcgcagccgcagctgaggtcaccccgctgaggtggtggggaggggaatg-
    gttattcttgaggcaccgcatctcttgaggaggaaagagccg
    1331 AGTGACGGGCGGTGGGCCTGGGGCGGCCAGCGGTGACTCCAGATGAGCCGGCCGTCCGCGTTCGCGC-
    CGCGGCGGTGCGGTTGTCGCGGATCAGCAGGATCGGAGTGCGGGGCTGCTGGGCGGAGGCGTTGGCTGCAC-
    CAGGGACGGCGGCG
    1332 GGCGACCCTTTGGCCGCTGGCCTGATCCGGAGACCCAGGGCTGCCTCCAGGTCCGGACGCGGG-
    GCGTCGGGCTCCGGGCACCACGAATGCCGGACGTGAAGGGGAGGACGGAGGCGCGTAGACGCGGCTGGG-
    GACGAACCCGAGGACGCATTGCTCCCTGGACGGGCACGCGGGACCTCCCGGAGTGCCTCCCTGCAACACTTCCC
    CGCGACTTGGGCTCCTTGACACAGGCCCGTCATTTCTCTTTGCAGGTTC
    1333 CGCGGCAGCCCGGGTGAATGGAGCGAGGCGGCAGGTCATCCCCGTGCAGCGCCCGG-
    GTATTTGCATAATTTATGCTCGCGGGAGGCCGCCATCGCCCCTCCCCCAACCCGGAGTGTGCCCGTAATTAC-
    CGCCGGCCAATCGGCGGCGTCGCGCGGCCCCGGGAGTCGGCTCGGGCTAAGCTGGCCAGGGCGTCTCCAGGCAG
    TGAAACAGAGGCGGGGTCGGCGGGCGATTAGCGGCCGAGGCACGCTCCTCTTG
    1334 GGCGAGCGAGCGGGACCGAGCGGGGAGCGGGTGGAGGCGGCGCCACG-
    GCGCGCACACACTCGCACACACGCGCTCCCACTCCAcccccggccgctccccgcccgaggggccgcgcggcg-
    gccgcggggAACGATGCAACCTGTTGGTGACGCTTGGCAACTGCAggggcgcccgcggtccctgcccccacgcc
    ctccgcgcgggccccgccaccccggccccgacggcgcctgcacgcccgcgtcccctg
    1335 GGGGCAGTGCCGGTGTGCTGCCCTCTGCCTTGAGACCTCAAGCCGCGCAGGCGCCCAGGGCAGGCAG-
    GTAGCGGCCACAGAAGAGCCAAAAGCTCCCGGGTTGGCTGGTAAGGACACCACCTCCAGCTTTAGCCCTCT-
    GGGGCCAGCCAGGGTAGCCGGGAAGCAGTGGTGGCCCGCCCTCCAGGGAGCAGTTGGGCCCCGCCCGG
    1336 CGCTGGCATTCGGGCCCCCTCCAGACTTTAGCCCGGTgccggcgccccctgggcccggcccgg-
    gcctcctggcgcagcccctcgggggcccgggcACACCGTCCTCGCCCGGAGCGCAGAGGCCGACGCCCTAC-
    GAGTGGATGCGGCGCAGCGTGGCGGCCGGAGGCGGCGGTGGCAGCGGTAAGGACCCTTCCCTCGCCCTGCGCCT
    CTGGACCTGCAGGTGCTCGGGCGCGGCCCAGGCCGCCCCCTGTCTGA
    1337 GAGCCGTGATGGAGCCGGGAGGAGAGGCGCATCCTCAGCAGAGCTTCCCTCCCTTGCACACGAGCT-
    GACGGCGTGAACGGGGGTGTCGGGGTTGGTGCAACTATAGAAGGGAAAGGCTGGGCGGGGGTCACACATACCT-
    CAGTGGCAGGCAGGCAGGCGGCAGGCAGAGCGCGCTCTCCGGGCAGTCTGAAGGACCGCGGGAATGTGGAGGGG
    1338 GCCAGGGTGTCTTGGCTCTGGCCTGAGTCGGGTATGTGAAAGCCTTTTGGGGCAGGAAGGG-
    GCAAAGTGATACCTGGCCGTCCCACCCTCTGGTCCCAGAAGGAGCTCTCGCTGGAGCCAG-
    GCAGCCTCCAGTCCCCCTCCTTTCAGCCTTGTCATTCTCTGCATCCTGCCCAGGCCACAAAGGA
    1339 CGGCTCCGGCGGGGAAGGAGGCgggctgcggctgcggctggggctgaagctggggctggggttgggg-
    GACTGCCCGGGGCTTAGATGGCTCCGAGCCCGTTTGAGCGTGGTCTCGGACTGCTAACTGGACCAACG-
    GCAACTGTCTGATGAGTGCCAGCCCCAAACCGCGCGCTGC
    1340 GCCAGGGTGCCGTCGCGCTTGGCGCCGTCCAGGGCGGCGCTGCGCTCGTCCAGCAACACCACGGCGT-
    GGTAGGCGCCGGCCAGCAGGCGGCCGCGGAGCTCGGCGTTGGGCACGATGTGCTCCAGGCCCATGGCGCCCT-
    TGGCCCGGCGCCGCACGATGGTGCTGAAGCGCACGTTGACAGAGCCGGCGATGTGGCCGGCGTTGAAAGCGAAG
    AAGGAGCGGCAGTCCAGCAGCAGGCATTGCGCCGCTCGCTCC
    1341 CGGCTCGGTCCTGAGGAGAAGGACTCAGCCGCGGCTGCGGGACCCGGGCACCGGGAggcggtggcg-
    gcggcggcggcggcagcagcggcgacagcagaggaggaagaggaggaagaaggaaagaaaaagaagaaCCAG-
    GAGGAGTCCTCAACAACGACAGCGGGGACTGCGGGACCAGGGTAAAGCGGCGACGGCGGCGACGGCCCAGCAAC
    CGTGA
    1342 CGCGGGGAACCTGCGGCTGCCCGGGCAAGGCCACGAGGCTTCTTATACCCGGTCCTCGC-
    CCCTCCAGCGCCGGCCTCGCCCGCGCTCCTGAGAAAGCCCTGCCCGCTCCGCTCACGGCCGTGCCCTGGC-
    CAACTTCCTGCTGCGGCCGGCGGGCCCTGGGAAGCCCGTGCCCCCTTCCCTGCCCGGGCCTCGAGGACTTCCTC
    TTGGCAGGCGCTGGGGCCCTCTGAGAGCAGGCAGGCCCGGCCTTTGTCTCCG
    1343 CCGCCGCTGCTTTGGGTGGGGGGCTGACAGGGCTGCGCGCGTCGCGCTCTTGGCTGGGGCTGCGCGG-
    GCCCGGGGCGCTGCGGGCGGCTCAGCGGCAGCTGCCGCGCTCTGCGCCTCCTCTGGGCGCACTGCCTGG-
    GAGCACGAGACTGGTTTGTCTGATGCTGCTGCCGGAGCTGAGGTCTTGCCTGGAGATCCGAACGAGACACCACG
TCAACCGGCGCGGGGAGTCCCGTGAAGACATGAGGGCGCCAGGAG
    1344 ACCTGAGCCCGCGGGGGAAccccccccccaccccoggggaaccccccccacccccgccgc-
    cccccgccTGCAAGTTGTTACCAGTAAATAAAAGGGATCCTATTTTAGCAAGCCACACAGCATTAGAGG-
    GCAAATAATAGTTTGGTGGCAGGAGAGCGATGAGACGGGAAAGTGTGGGGCAAAGCTTACAGTCATTGGTCCAG
    ATTCTAACTGGCCTGTTAGCCAAAAAGTAAGGTTTTCTTTACCTCCGTGTTG
    1345 AACGCCGGCCTCACCGGCAGACGCGCGCCCTCCTCCCAGATGCGCAGGTGACCCCGGCGGGCG-
    GCGCGGGAAAGGGAAGAGCTCCGCGAGGCCGCGCGGGGGGGAAGCGGGAGAAGC-
    CGCTCTTCCTATTCCACTCGCAGTCTGCGTGTGGGGGAAACGAGTGCCCGGCGTATGAAACGCCTAACTTCGCG
    AAATAAAGAGAGACGTATAAAAGTTCAAGAATTCTGTCCAGACTCAAGGGCCCTTTCTCATTTA
    1346 CCGTGGTCCCAGCGCTCCTGCTATTTGCATTCCAAAGCAGACACCTCATGCGCTCAACCCCGC-
    CCGCAGGCGGCTCCCGCAGTCTAAGGGACCTGGCGCGAGTCCGGGAAGCGGAGGGCGCAGCTGCGCAGG-
    GAAGGGGGCCGGGGGCGGGACCAGGGCGCGCGTTCCGGTCCCGGGGCGTGGC
    1347 TGCGACCCGGCGCCCAAGCAGCCTGGGACCTTGCGCGGACCTGACCCCTTCAGACCGCAGGCAGTCT-
    GGGAGGAGGTCCGGCCGGGGGAGGTGCAGGATCCCCGCCGTGTCTCTTTGACGACTTGGGGACTGTCACG-
    GTTCTCTCCCGGCGCCCCTGGGTTCTTTTGTCCTGCACGCGGTGCGAAGGGGCCAGCAGGGAAGGAGCAGAGGA
    TGGGGGGTGGGGTTGTTGGAGCCCCGCGGAGGTCTGGGAGGCCC
    1348 GGCTCTGCGCTGCCTTTGGTGGCTCCTCCCTGGTCCTCTAAATGTGACACCAGGCGGATGCGGGGC-
    CACAGGACCCTGGGGCTTGAGTCACACAAGAATGTCTCTGGGAGACCCGAGAGACTCACAGTTATGAAACAG-
    GACCATGGTTCTTTggccgggcgcgggg
    1349 gcgcgggcggcTCCTTTGTGTCCAGCCGCCGCCACCGGAGCTCCCGGGGCCTCCGCGGG-
    GAGCGCGTCCCCCGCATCCGCCCGACCCCCGGGGCTGGCACGTGCTGCGCCCGGTCCGCTGAGGGGGCGGAG-
    GCCCCGATCTCCCCGACCCCCCTTCTCTGCTTAGAGGAGGAGGAGCAGCGGCAGCGGCAGCAGGAGGCGACAGC
    TGCCAGCCGAGGAGGCGCGGCGGAGAGGGGACTGCGGTCAGCTGCGTCCA
    1350 GGCCCGTTGGCGAGGTTAGAGCGCCAGGTTGTAAGAATCGGGTCTGTGGACCTCATACCAGATAG-
    GCGCGAACGCCTCTGGCAGCGGCGTCCAGGGGGTCCGGCGGCACTCGCGGTGGGGCTGCCTGGGTTGCGGGT-
    GACGATCTGCGGGGTCCCGCACCCGGCCCCGCGGAGCCCGGACCCGCACGTAGGCGGCGCGGCAAAGGCACACC
    CTCCTCGCGGCCGCGAACCCAGCGCCGTCCTCGCAGCGCGGCAA
    1351 acccggcatccgggcaggctgcgcgcgggtgcggggcgagggcgccgcggggACTGGGACGCACGGC-
    CCGCGCGCGGGACACGGCCATGGAGGACGCGGGAGCAGCTGGCCCGGGGCCGGAGCCTGAGCCCGAGC-
    CCGAGCCGGAGCCCGAGCCCGCGCCGGAGCCGGAACCGGAGCCCAAGCCGGGTGCTGGCACATCCGAGGCGTTC
    TCCCGACTCTGGACCGACGTGATGGGTATCCTGGTAAGTTACCTGG
    1352 CCCGGACTGTAATCACGTCCACTGGGAACTGGCGCAGTAGTGGAGGGGACGCGATCAGGCCCGTG-
    GCTGCGCCCAGAGCATGATAAGCCAGGGACCTCGCGGCGCAGGCGGAGGGAGGGAGAGCGTCGCGGACCCAG-
    GCGGGGACAGGGAGACGCC
    1353 CGCCGCCAACGCGCAGGTCTACGGTCAGACCGGCCTCCCCTACGGCCCCGGGTCTGAGGCTGCG-
    GCGTTCGGCTCCAACGGCCTGGGGGGTTTCCCCCCACTCAACAGCGTGTCTCCGAGCCCGCTGATGCTACT-
    GCACCCGCCGCCGCAGCTGTCGCCTTTCCTGCAGCCCCACGGCCAGCAGGTGCCCTACTACCTGGAGAACGAGC
    CCAGCGGCTACACGGTGCGCGAGGCCGGC
    1354 GCTGCCAGCTGCCGCTCCGGCTCCCACTTCCCACCTGCTGCCCGAGGAAGACTTCCGG-
    GAGAAACGCTGTCTCCGAGCCCCCGCGCCGCCGCGCTCCCTCCGCTGCAGCAGCGGCCACCGGGTGCGCCCG-
    GAGCCCTGGGACGGCCTAAACCAGTATCTCGCGGGCCCCGCGCCGGGCTCCGGGAATGGCCGCAGCAGCCCTGG
    CGACCCGGGCCCCTCGGAGCTCCCCTTCAGGATCGTGCACCAAGCGCGCAC
    1355 GCGCCCACCTGCGCCTCGCGGGGTCCCCGAGGTCCCGCCACCGAGCGCCCAAGGCGG-
    GATCCCAGCGCGTCCTGCAGCCCGCCCAGCTTCAGGGCCGGCCCGGCGCGCGCAGGTGCGGCACTCACCGGC-
    CAGGTGAAGCCGAAGGGGAAGCGGATGGGGTTGCTGAACGCGGAGTCGGCGCCCCCGCCGTCGGGCAGACTGAA
    GGAGTCGACGCCCAGCACGGGGGTGACGGCGCTGCCGTAGGTGCAGGGCGGC
    1356 CGGGCCAGGGCGGCATGAAGAAGTCCCGCCGCTACGTGCCCGGCACAGTGGCCCTGCGCGACGTTCG-
    GCGCTACCAGAACTCCGAGCTGCTGATCAGCAAGCTGCCGCTCCTGCGAGAGCTCGGCGGTGACGCCGCT-
    GCACGAGAGCGA
    1357 GCTGCGACCTGGGGTCCGACGGACGCCTCCTCCGCGGGTATGAACAGTATGCCTACGATGGCAAG-
    GATTACCTCGCCCTGAACGAGGACCTGCGCTCCTGGACCGCAGCGGACACTGCGGCTCAG-
    ATCTCCAAGCGCAAGTGTGAGGCGGCCAATGTGGCTGAACAAAGGAGAGCCTACCTGGAGGGCACGTGCGTG-
    GAGTGGCTCCACAGATACCTGGAGAACGGGAAGGAGATGCTGCAGCGCGCGGG
    1358 GTTAGGAGGGCGGGGCGCGTGCGCGCGCACCTCGCTCACGCGCCGGCGCGCTCCTTTTGCAG-
    GCTCGTGGCGGTCGGTCAGCGGGGCGTTCTCCCACCTGTAGCGACTCAGGTTACTGAAAAGGCGG-
    GAAAACGCTGCGATGGCGGCAGCTGGGG
    1359 AGCGCACCAACGCAGGCGAGGGACTGGGGGAGGAGGGAAGTGCCCTCCTGCAGCACGCGAG-
    GTTCCGGGACCGGCTGGCCTGCTGGAACTCGGCCAGGCTCAGCTGGCTCGGCGCTGGGCAGCCAGGAGCCTGG-
    GCCCCGGGGAGGGCGGTCCCGGGCGGCGCGGTGGGCCGAGCGCGGGTCCCGCCTCCTTGAGGCGGGCCCGGGC
    1360 CGGCTGGCCCCGCCCACTCTCCGCGGCCGGAAGTGGCGGCGCCGAGTGAGGTAAATGCGTGCCCG-
    GAAGCGCGACCTCGGGCGGTTGGAGGGGCTACCGGGTCTTACCAGTCCGTGGCGGGAGTCCCGGAGGAC-
    CCTCGACGGGGGAGTTGCCGAGAAAAGGCCTCGCCGGCA
    1361 GGGGTTGCCGTCGCAGCCAGCTGAGTGTTGCGCCAGGGGGACAGGTATGTTCCAGGCAGTGGCAAGC-
    CCAACCCGAGCAAGACCTGCGCTGAAACGGATTGGCTGCCCTCCGCCCGGAGTCCGTTCTCCCTGCAGCGGC-
    CAGTGCAGAGCTCAGAGGCTCAGAAACTCGCTCTCAGCCCCCTGGAGGCGGAGCCCGGGAGATAAGGTTCGCGC
    TCCCCACCCGCC
    1362 CCGCACTCCCGCCCGGTTCCCCGGCCGTCCGCCTATCCTTGGCCCCCTCCGCTTTCTCCGCGCCGGC-
    CCGCCTCGCTTATGCCTCGGCGCTGAGCCGCTCTCCCGATTGCCCGCCGACATGAGCTGCAACGGAG-
    GCTCCCACCCGCGGATCAACACTCTGGGCCGCATGATCCGCGCCGAGTCTGGCCCGGACCTGCGCTACGAGGTG
    ACCAGCGGCGGCGGG
    1363 ggaccccctgggcagcaccctggccacccttccatccacaacatccagaccacacggccaagg-
    gcacctgaccctgtcaaaaccccaaatccagctgggcgcggtggctcatgcctgtaatcccagcatttgggag-
    gccgaggcagccgg
    1364 gaggcagccggatcacgaagtcaggagttcgagaccagcctgaccaacatggtgaaaccccgtctc-
    tactaaaatacaaaaattagccgggcgtggtggtgcacacc
    1365 GCGCGTGCGGGCGTTGTCCCGGCAACCAGGGGGCGGGGCTGGGCGTGGCACCGCCCCGCGCTCCGCT-
    GCCAGGGGCGGGAGGGAGGAATGGTTGCTTCACGCCCCGGGGGAAGAGACGGGAAGCTCGGCTCTGGGT-
    TGCGGGCCCCGGCGTCTCCGCGTGGGGCGCACCGTCCGACCCCCCCCTCCCGGTGTGCAGCGCCCCGCACCGCC
    CCGCCTCGCCTGGGAGAAGCCGCCGGGACGCGCC
    1366 CAGGATGCGGCAGCGCCCACCCGCGCGGCGTGGAGGGGGCCGGGGGCGGCGCTCGGCGCAGATG-
    GCGCTCGCTGCGAGATGGATGCTCCAGGGCGGGTAATCACTCCTGGCTCAACACAGCATCCCGGGCGGAGCG-
    GATGCCAGATCCCACCGCTAAGAGCCTGGGCTGGGAAAGCAATCTTTCCAGGCAGCCCCCAGCCCGGTGCGCCG
    GCCCCGACAAGTCCCAGCCCTCGGAGGCAGGGCGGGGCGCAGGGA
    1367 gATGCGGCCCGCGGAGGAGAGAGCAGGAGGACGGACGGGAGGGACCTCCGCGGGGAGG-
    GCGCGCgggggaggcggggagggaggcgggagggggaggggACGGTGTGGATGGCCCCGAG-
    GTCCAAAAAGAAAGCGCCCAACGGCTGGACGCACACCCCGCCAGGCCTCCTGGAAACGGTGCCGGTGCTGCAGA
    GCCCGCGAGGTGTCTGGGAGTTGGGCGAGAGCTGCAGACTTGGAGGCTCTTATACCTCCGTG
    1368 GTTCTGCGCGCGCCCGACTCCGCTGCCCGCCCCGCCAGGCCTCCGGGAGGTGGGGGCTGGGAG-
    GCGTCCCCCGCTCCCGCCCCCTCCCCACCGTTCAATGAAAGATGAACTGGCGAGAGGTGAGAAGGGAAGAGG-
    GCTCCCGGCTCTCTCGGGGCGGGAATCAGTGGGCCAGAGCTCGCCGGGTGGCCGCAAG
    1369 CCCGCCGTGGGCGTAGTAAccgccaccgccgccgccccccgcgccaccaccaccgccgccT-
    GCCTCGCCTCTGCCCGAGCTGATGAGCGAGTCGACCAAAAAAGAGTTCGCGGCGGGGCTCTCCGAGCATGA-
    CATTGTTGTGGGATAATTTGGCGAAGGGAGCAGATAGCCCTTTCTGGCTGACATTTCTTGTGCAAAACATGCT-
    GAATACGATTAGCAATCCCCCCGCACCGCGGCGGGCGCCCGCAGCCAATC
    1370 ACCCGCCCGGGCAGCTCCAGTCCCGGACTCCGCAGCTCGGAGCGCAGCCAGCCACGGCCATTGCGG-
    GACCCTATTTATCCCGACACCTCCCCTGACGTGGGCTCGGAACGCTCCCTTGGCAGCTGCAGCCGCGGCGCGG-
    GCTCCCCCTCGGCCGCCCCACCCCCAGGCCCGTCGGTGCAGAAGCGGTGACATCACCCCCTCTGGGCCGCAGC
    1371 CAGCGGTCGCGCCTCGTCGGGCGACGGCTGGCAGCGAAGGCCGGAGCCACAGCGCTCGGTGTAGAT-
    GCCGCACGGCTGGCCCTCGCTCAGTGCGCACGTCAGGCAGCAGCCGCAGCCCGGCTCGCGCAC-
    CAGCTCCGCGCACACGGCGGGCGGAGGCGCGCACTGGGCCAGTGCACGCGCGTCGCACGGCTCGCAGCGCACCA
    CGGGACCCAAGCCCGCCG
    1372 GCAATCGCGCTGTCTCTGAAAGGGGTGGAGAAGGGGCTGGATGAGTCCGGAAGTGGAGATTGGCT-
    GCTTAGTGACGCGCGGCGTCCCGGAAGTTGACAGATACAGGGCGAGAGGCAGTGGAGGCGGGACTTG-
    GATAGGGGCGGAACCTGAGACTACCTTTCTGCGATCACAGGATTCCCGGCGGTGACTTGACCCCGGAAGTGGGG
    TGTGAAGCTCCGGTGCTGGTGCGGCGGGGGA
    1373 GAGCGCCCGCCGTTGATGCCCCAGCTGCTCTGGCCGCGATGGGCACTGCAGGGGCTTTCCTGT-
    GCGCGGGGTCTCCAGCATCTCCACGAAGGCAGAGTTGGGGGTCTGGCAGCGCGTTCTGGACTTTGCCCGCCGC-
    CAGTGCGATTCTCCCTCCCGGTTCCAGTCGCCGCGGACGATGCTTCCTCCCACCCACCGCCCGCGGGCTCAGAG
    AGCAGGTCCCCGCACCGCGC
    1374 CATGGCCCGCTGCGCCCTCTCCGCCGGTTGGGGAGAGAAGCTCCTGGAGCGGCCAGATACCTGTTG-
    GCTCCTGAGCAGCATCGCCCAGTGCAGCCTCCGTCAGGAAAAGCAGCAGAATCGACAGCCCCAGGGGGC-
    GAGCGGGGTCCATGGTGCAGGGGGTCGGGCGGCCCGCTGGGCAAGGCGTCCGAGAAAGCGCCTGGCGGGAGGAG
    GTGCGCGGCTTTCTGCTCCAGGCGGCCCGGGTGCCCGCTTTATGCG
    1375 GGGGGCGGGGTGCAGGGGTGGAGGGGCGGGGAGGCGGGCTCCGGCTGCGCCACGCTATC-
    GAGTCTTCCCTCCCTCCTTCTCTGCCCCCTCCGCTCCCGCTGGAGCCCTCCACCCTACAAGTGGCCTACAGG-
    GCACAGGTGAGGCGGGACTGGACAGCTCCTGCTTTGATCGCCGGAGATCTGCAAATTCTGCCCATGTCGGGGCT
    GCAGAGCACTC
    1376 CGACCCTGCGCCCGGCAGTCCCCGGGGGCCGTGCGCCCGGCCCAGGCTCGGAGGTCCAGCCCAGCG-
    GCGGCTCAGGCTGCGCGCCTGGCTCCCAGCCTCAGTTTCCCCATTGGTAAAGCATTGACGGTGGTTGCGGACG-
    GCTTCTGCGGACAGAGCCTTGGGCTCCGACGTCTGCGCGG
    1377 GGCTTCAAGTCCACGGCCCTGTGATGGGATGTGGGCAGGGCCTGAGACAGGCCGAAC-
    CCAACTCTTCACAGGGCCGAATTCTTTGCCCGCAGCCCAGCACCCCGAAGGAGCTTGCCTCGGCTTCAAG-
    GCGCACCTAATGGGCACCGGATCGCTGGGGCGCTGAGGATGCCGCTCCGGGGCCTCCACGAGGCGGCCTCGCCA
    CGCGCCTCGGCCA
    1378 CCCCACCTGCCCGCGCTGCTTCTACCTGAAACTGGCCAAGGGCCCGAGCCCGGACCGGAGCCGT-
    GACTTCCCTCCGCCGGCCACGGGGCTGCCCGGATCCGCCGGGTTATGTCGCTTGGCTTTGGGCTCAGGGGT-
    CACCGTGGGCAGAGGGGGGTGCCGGGGTCGCGGACTGCCACCAGGTTGAGGAAAGGAGGGGCCTTTTGGCTGGG
    GAAAGAGCGTGGTGGGGGACCCGCGGCCGATGGAATCCCTGGGGCA
    1379 gcgcgcggagacgcagcagcggcagcggcagcATGTCGGCCGGCGGAGCGTCAGTCCCGCCGC-
    CCCCGAACCCCGCCGTGTCCTTCCCGCCGCCCCGGGGTCACCCTGCCCGCCGGCCCCGACATCCTGCG-
    GACCTACTCGGGCGCCTTCGTCTGCCTGGAGATTGTAAGTGGGGCCGCCGGAGCGAGGGTCGCGCGGGGAGCGA
    GGACAGGCGGCGGCATCCTTGTCCCCCGGGCTGTCTTCCTCTGCGTCCGC
    1380 GTGAGCCGGCGCTCCTGATGCGGAGAGGTGCGGCCATGTCCTGGCTGGGAGCGAAGCGC-
    CCTCGCTCGGGCAGTCGGAGCGAACTGTCTCCCGCGCGCTCCGCCAGCCGGGCCCTCCCGCTGGGCCCAC-
    CCCCCGAGGGGCGGGGCCAGAGCGGGCGGCACCGCCTCCTCCCCGCTGTCTGGGTCGCAGGCCTTAGCGACGGG
    CTGTTCTCCGGCCCCGCCCCATTCCCAGGCTCCGCCCCC
    1381 TGCCGCGGGGGTGCCAAGGGAAGTGCCAGCTCAGAGGGACCATGTGGGCGCAGGCACCCAGGCG-
    GCGCCGGGAGGCCTCTCGGGACTCCAGGGCTGTCCCTCCCGCAGGCTGTCCTTCCACCTCCACCCCAGGC-
    CAACGCCCTCCCGCCAGCCCAGGGTCCTGTGTCCTCGAGTCCTTCCTGGGCACCCTGGTCCCATCCTTAGCCCT
    GCCCGAGGGGCCCAGCCCTGCTCCAAAAGGGCTGTGGCTCCACCCAC
    1382 CTGCTGCGCGCGCTGGCTCTTCTGCGAGGCCTGCTTGAGCTTGTTGCCGCCTTTGGGCTCCGGGC-
    CCTCCAGCTCGTCCCTGCAGCGCCGCGGCCGCTCCTCGTAGGCCAGGCTGGAGGCAAGCTCCTTCTCCT-
    CAAAGCTGCGCTGCAGCTTCTGGAGGGCGCCCTCCCTCTCCAACAGCTTCTGCTCCAGCTCCTGGATGCTGCAC
    TCGTCCGTGGAGATGGGGGAGCGG
    1383 CTGGCGGCCCAGGTCGCTCCTGCCCAACCCGGGGACCCATCTCTTCCCCCGACTCCGACGACTGGT-
    GCGTCTTGCCCGGACATGCCCGGCCGCAGGCGACCCGGGCCACGCACCCCCGCCGTGTCCCCCTCTCTCCCT-
    GCCCTCTCCAGGCGCCAGGCACGCTCTTCCCCAGCCAGGGACCGCGGCGGGGACTCACCAACAGCAGGACCGCG
    GCGACAACGAGCACAAGGGTCTTGGGGACCCGGGGCCCAGGCC
    1384 AGCGCCCCGGCCGCCTGATGGCCGAGGCAGGGTGCGACCCAGGACCCAGGACGGCGTCGGGAAC-
    CATACCATGGCCCGGATCCCCAAGACCCTAAAGTTCGTCGTCGTCATCGTCGCGGTCCTGCTGCCAGT-
    GAGTCCCGGCCGCGGTCCCTGGCTGGGGAAGAGCGCACCTGGCGCCGGGAGGGGGCAGGGAGACGGGGACACGG
    CAGGGATGCCTGGCCCTGGTCACCTGCGGCCGGGCA
    1385 GCCGCACGGGACAGCCAGGGGGAGCGCGCGCTCTGCTCCCTCGCGGCCCGGTCGCTCCTGCCCAGC-
    CCGGGCACCCCACTCTTCCCCTGACTCCGACGGCGGGTTCGTCCTGCCCAGACATGCCCGGCCGCAGGCGAC-
    CCGGGCCAAGCATCCCCACCGTGTCCCCCTCTCTCCCTGCCCACTCCCGGCGC
    1386 CCCGGACATGCCCCGCCACAAGTGACCCGGGCCAGGCACCCCCGCCGCGTCCCCCTCTCTCTCTGC-
    CCCCTCCCGGTGCCAGGCGCGCTTTTCCCCAGGCAGGACCGCGGTGGGGACTCACCTGCAGCAGGAC-
    CCCGACGACGACAAACTTGAAGGTCTTGTGGACCCGGAGCCGAGGGCTGGCTTCCCGCGCCGGCCTGGGT
    1387 cgggggccgccgcctgacttcggacaccggccccgcacccgccaggaggggagggaaggggag-
    gcggggagagcgacggcggggggcgggcggtggaccccgcctcccccggcacagcctgctgaggggaa-
    gagggggtctccgctcttcctcagtgcactctctgactgaagcccggcgcgtggggtgcagcgggagtgcgagg
    ggactggacaggtgggaagatgggaatgaggaccgggcggcgggaa
    1388 CAGTGGCGGCCCTCGGCCTGCGGTCGGAGGCGGCGCGGGCGGGGAGGCGGCGCTGCGGGCTGGGT-
    GCGCCCCGGCTCCCGGAGGTGCGGCGAGCAGGAAggcgcggggcggcgggcgcgcggcACTGACTCCGGAG-
    GCTGCAGGGCTGGAGTGCGCGGGGCTCCTACGGCCGAGCCCTCGGAGCCGCCCCGCGCAGCCAATCAGCTCCCG
    GCGGGGCGAGCCGCACTCGTTACCACGTCCGTCACCGGCGCG
    1389 GCCCGGCGCGGATAACGGTCCGGCGGGAGGACACGGCGGTCCCTACAGCATCGCGGCGGGCCAG-
    GCTCGGGCAGGGGCCGTGCTCAGGTGCGGCAGACGGACGGGCCGGCGCCTCTGAAGTCACCCG-
    GCTCCTTTACGAACTGAGCCCGTTTTGGCTGGGAGGGTT
    1390 GCTCCGGGTGGGGAGGGAGGCTGGCAGCTCACCCCCGGGGGCGAGGGGTCTGCGTTAGCCGTAGC-
    CACGGGAGCCCGGGCTTCTGGGACGCTCAGCCGTGCGCTACCCGGTGCAGCTGCTTTCTCACCAGCTCGCGG-
    GTGGGTCCTGCCGCGGCTCGGCGACCCGCGCCCCCTTGCGAGCGACCCAGCGTGAAACCAGCCCAAAGGGCG-
    GCCTCGCCCG
    1391 GCCTGGGCGCAGAACGGGGTCCCTCGGCAGGACCCTCGCCGCGACAGCCTCAGCAGGGGATCGTC-
    GAGCAAAAGCCCGCAGGAATGCTCCTTTCTGGGGCCCCGCCCTCCCGGCCGACAGCTTTTAGGTAGACGTG-
    GAGGCGACTCAGATCGCCTCGCGGTTCCCGGGATGGCGCGGTCGCCCCCAACGCGAGGCTGCCTGGGGCACCCG
    GCTCTTTTCCTGGGCGTCCGCGGCC
    1392 GGTCCTAATCCCCAGGCTGCGCTGACAGGATTAGGCTCCGTTCCTCCCCATAATGTTCCCAGGAC-
    GAGCCTCATGGGGACGAACTACAAATCCCAGCATGCACCAGTCTTCGCCCGCCCGGCGGGAGGGCAACGGCT-
    GACCAGGACCGCAGGCAAGCACCGCGGCGACGGTTCCAGCCAGGAAAATGAGAGCCTCTTGGGCCACGTTCCAA
    ACGG
    1393 CCGCGTCCCCGGCTGCTCCTCCTCGTGCTggcggcggcggcggcggcggcggcggcgCT-
    GCTCCCGGGGGCGACGGGTGAgcggcggcgcggcgggcgggcgactgcggggcgcgcgggccggacccg-
    gccTCTGGCTCGCTCCTGCTCTTTCTCAAACATggcgcggggccgggggcgcaggtggcggcgccggggcccgg
    gccgggctctcgtggcgccgcgcggctcggcggctgccgggcgAACCGCAAGC
    1394 GGCAGGGCTGACGTTGGGAGCGCTATGAGCTGCCGGGCAGGGTCCTCACCGGGGGCTTCCTCTGCGG-
    GCCAGGGCTGCCGGGCGCCACCGGGACGCGAGCGCGCACGCCTCGGCCCGGCGGCCGCGCTCCTCGCAC-
    CGCCTTCTCCGCAGGTCTTTATTCATCATCTCATctccctcttccccttctccttctcctttgcctccttctcc
    tttgcctccttctcctcctcttcctccccctcctccaccaccacc
    1395 CCGTGGGCGCAGGGGCTGTGGCCGGGGCGGTGGGCGGGCGGTGCCGCCAGGTGAGACTGGCTGCCGT-
    GGCGCGGAGCTGCGAACTGGTCGGCGGCGCAAGGCGCGGACTCCGGTGAGTTGTGTGGAGCGCGCGCGGCCAT-
    GGGCGCGGGCCACGGGCGGGTGGGAGGGTGGGGGGCCAGAGGGGCGGGGGAGGGTCACTCGGCGGCTCCCGGTG
    CCGCCGCCGCCCGCCACCGCCTCTGCTCCCCGCG
    1396 cctgcgcacgcgggaagggctgccggaggcgcccgtagggaggcgcgcgcgcgggcggctcagggc-
    ccgcgttcctctccctcccgcctaccgccactttcccgccctgtgtgcgcccccacccccaccac-
    catcttcccaccctcagcgcgggcgccc
    1397 GCGGACGCAGCCGAGCTCAAAGCCGCTCTGGCCGCAGGGTGCGGACGCGTCGCGGAGTCCTCACTGC-
    CCCGCCTCGCTCTGGCAGAGTGGGGAGCCAGCCGGCAAAGAATTCCGTTTTCAGCTGGGCCAAGGGGCCG-
    GCGTCTCCCCACCCCCTTAGGCTCCGCCCCCTGTCCGCTGTGATCGCCGGGAGGCCAGGCCC
    1398 GACCCATGGCGGGGCAGGCGGCGGCGCTGTCGGGCGGGCAGGGGTGGCGGGAGGCGGTGGCGCAGC-
    GAGCAGCGGCCTCCAGCGCTGGTGGCTCCCTTTATAGGAGCGCTGGAGACACGGGCCCCGCCCGCCCTGCAGC-
    CCCGCCCTGCAGTCCCGGAGCGCCGAGGAGTGCGCGCCCCCTCGCCCCCGCCCCACCTCGGCTGGGAGGCTGGT
    GCGGACGCCGGGTG
    1399 ccgctccccgcccctggctccgcctggc-
    cccactcccctccgcgcgccttccctcttctcccccgctccccGCGGACGCTCCTCTCTTTCCCAGTGGGC-
    CAACTTTATGCTGAAATTTCTTTTCTGCCCTTTTTTGGGATGTTTCCCCATTGGGAGGCGGAGCCGGGCTGCGG
    CGGGGAAGGCGGAGGGCGAGGGGAAGAGTCACTGAGCTGCGGGGCATAGGGGGTCCGGGGCGAGGT-
    GCCTTCTCCCACCCAG
    1400 tgtgccgcgcggttgggaggagggtcgtgagcgtgagcgtgggagcgctgggggctctgctcgcgt-
    gctgctctgaagttgttccccgatgcgccgtaggaagctgggattctcccatccggacgtgggacgcaggg-
    gaggggtaggtttcaccgtccgggctgatgactcgtggcctccggggctcctgg
    1401 CACTCACGCTCTCAGCCCGGGGAATCCCAGCGGGGAGGAGGGAGGGAG-
    GTCGTTTTCTTCAGCTCCCCAGGTGGTCTGTGCTGGGTGTGCTGACGGTCCTTTTGGGAAAACAG-
    GTCCACCTTTGCCAGCGTAATTCAGAAAGAGATGTAATTTTCTGAGAGCACACACCTGGGCAGGAGATCGC
    1402 GGCAAGCGGGCTTCGGGAAGAATGCAGTTGGTGAGGAAGCTCGGCGAGGCGTGCCCGTGCAGCTGC-
    CCCTGGCCCTGACTGCTGGTGCGAGGCAGTGCACGACTCAGCTGGCCGGGGCCTGCTGTCCCGCCGGTGC-
    CACGCACCTGCAGACGCCCGGGCTGTGCCATCTCCTGGGCCGGTCCGGGGGCTGGGGCGGGGCGAAAAAGAAAA
    AGCTCTGATCTCTGCCTTCGCCTCGCGCAGCTGTGCGGCGAGCCC
    1403 CCCGCGGGCCGGGTGAGAACAGGTGGCGCCGGCCCGACCAGGCGCTTTGTGTCGGGGCGCGAG-
    GATCTGGAGCGAACTGCTGCGCCTCGGTGGGCCGCTCCCTTCCCTCCCTTGCTCCCCCGGGCGGCCGCACGC-
    CGGGTCGGCCGGGTAACGGAGAGGGAGTCGCCAGGAATGTGGCTCTGGGGACTGCCTCGCTCGGGGAAGGGGAG
    AGGGTGGCCACGGTGTTAGGAGAGGCGCGGGAGCCGAGAGGTGGCG
    1404 GGCGGCGGCTGGAGAGCGAGGAGGAGCGGGTGGCCCCGCGCTGCGCCCGCCCTCGCCTCACCTG-
    GCGCAGGTAGGTGTGGCCGCGTCCCCTACCCGGCCGGGACTTTCTGGTAAGGAGAGGAGGTTACGGG-
    GAACGACGCGCTGCTTTCATGCCCTTTCTTGTTCTACCTTCATCGGCCGAGGTAAAAGTGCTGAAACCATGTGA
    ATAAAATACAGGTGGGTTCCGCCAGCTTCGCTCC
    1405 GGGCCCCGGGACTCGGCTTGCACGAGCCAGTCTGGGGACCGGGGAGGCGGGGAGAGGGAAGGG-
    GAAAGCGCGGACGCGGCCCAAACCTCCAGTAGCCGCAGCCGCCGTCGCCGAGTAGGGCCGGGCAGCCAGCCGG-
    GCCTGGCGCAGCATCAGTGCCCGCTGCCGCTTCCGCTCGATACTCGCCCGCACCGAGGCAGGCAGCTCCGCGGG
    TTGCTCTAAAGCCGCCGCCTCCGGCAAAGCCCCGTCGGCCGCC
    1406 ACGGAATGTGGGGTGCGGGCCTGAATATTATAAACAAAACCAAAAAACACTGGCTGGAAAG-
    GAAGTAAGCGGATTCTTCGTAAAGTCTATCAAAAGTCTTTTCGTTTCCCCCTCCCCCTTTCCCCACCGCCCAC-
    CAAAATGAGCCGCGTTTGAGCACCTCAGGTCTGGAAAGCCGGCCAGGAGTGGGGGAGACCGAGGCACCCGCGGC
    C
    1407 GCGGCTGCTGCCGAGGCTCCTGGTTTCCACCGCCGCCCTCGGGGATCATGCCGCCATCGCGGTTCAT-
    GCCGTTCTCGTGGTTCACACCGCCCTCAGGGTTCATATTACCCATGAGGCCTGGAGCTCCTTGGCCAACATG-
    GCCTTCTGCGCTTGATGCTGCCCCCAGCTGAGGTGTGGGGCTTATTTTTACCTGGTATACACTCAGGCAGTAGA
    ACACGGTGTCGTGGACGAGCGAACGCGCCATGGCTGGAGCGC
    1408 CCGCTGCGCGAGGGAgggggcccgaggcgcccccggcccgcccTCCTCCCGGTCTTCGGATCCGAGC-
    CGGTCCTCGGGAAAGAGCCTGCCACCGCGTCCCCGCAGCCACCCTCTCCGCGTGCCCGGCCCTCTCCAGTG-
    GCGGGGGCACGTGGGCGCGCGGGGTGCGTGGCAAGCCGCCCCTCTCCCCACGCCCGTCCGGC
    1409 GGGGTGCGGCGTCTGGTCAGCCAGGGGTGAATTCTCAGGACTGGTCGGCAGTCAAGGTGAGGACCCT-
    GAGTGTAAACTGAAGAGACCACCCCCACCTGTAACAAAGAGGGCCCCACTAAGTCCCGCTTCTGCATTTG-
    GTCCTGAGAGGCTCCGGTAAAGCCGTCCGGCAATGTTCCACCTGGAAAGTTCCAGGGCAGGGGAAGGGTGGGGG
    GAGGGGCAGTCGCGGGGGA
    1410 GCCGGGGGAAATGCGGCCTCTAAGCTCTCCGCTGAGGCGGCTTGGAAGGAATAGTGACTGACGTg-
    gaggtgggggaggtggctggcccgggcgaggcccagggagagggagaggaggcgggtgggagaggaggagggT-
    GTATCTCCTTTCGTCGGCCCGCCCCTTGGCTTCTGCACTGATGGTGGGTGGATGAGTAATGCATCCAGGAAGCC
    TGGAGGCCTGTGGTTTCCGCACCCGCTGCCACCC
    1411 tgcctggtaggactgacggctgcctttgtcctcctcctctccaccccgcctccccccaccct-
    gccttccccccctcccccgtcttctctcccgcagctgcctcagtcggctactctcagccaacccccctcac-
    cacccttctccccacccgcccccccgcccccgtcggcccagcgctgccagcccgagtttgcagagag-
    gtaactccctttggctgcgagcgggcgagc
    1412 GCGCGGGCGCCTCGATCTCCCGCGCGCGCGCGTGCGCGAGACCCCCCTTTGGCCCCCTACCCT-
    GCAGCAAGGGTAGCGTGACGTAATGCAACCTCAGCATGTCAGCAGCAATATAAAGGAGAATGAGGCG-
    GCGCGCCTCCCAGACGCAGAGTAGATTGTGATTGGCTCGGGCTGCGGAACCTCG
    1413 CCCGGCTGGTCGGCGCTCCTCGCAGGCGGTGTCCCGGTCCGGAGCGATCTGCGCGCTCGGCCCCGCG-
    GCCGCGCCCTCCCCGAAGCCCTTGCTTTGTTCTGTGAGCGCCTCGTGTCAGCCAGGCGCAGTGAGCT-
    CACGGGGGCGTCCCGGGTCCGCATCCTCCCAGGAGCTGGGGAGCCGCTCGCTGGGCGCGGACCCGCTGCCTGAC
    GCTGCAAACTACACGGTTTCGGTCCCCCGCGC
    1414 CCGGGGCTGGGACGGCGCTTccaggcggagaaagacctccgcgggccgcgcgcggccttccccctgc-
    gaggatcgccattggcccgggttggctttggaaagcggcggtGGCTTTGGGCCGGGCTCGGC
    1415 GGGCGGGGTGGGGCTGGAGCtcctgtctcttggccagctgaatggaggcccagtggcaacacag-
    gtcctgcctggggatcaggtctgctctgcaccccaccttgctgcctggagccgcccacctgacaacctct-
    catccctgctctgcagatccggtcccatccccactgcccaccccacccccccagcactccacccagttcaacgt
    tccacgaacccccagaaccagccctcatcaacaggcagcaagaaggg
    1416 GTGCGGTTGGGCGGGGCCCTgtgccccactgcggagtgcgggtcgggaagcggagagagaagcagct-
    gtgtaatccgctggatgcggaccagggcgctccccattcccgtcgggagcccgccgattggctgggtgtgg-
    gcgcacgtgaccgacatgtggctgtattggtgcagcccgccagggtgtcactggagacagaatggaggtgctgc
    cggactcggaaatggggtaggtgctggagccaccatggccagg
    1417 GGCGGTGCCTCCGGGGCTCAcctggctgcagccacgcaccccctctcagtggcgtcggaact-
    gcaaagcacctgtgagcttgcggaagtcagttcagactccagcccGCTCCAGCCCGGCCCGACCC
    1418 GGCGGTGCCTCCGGGGCTCAcctggctgcagccacgcaccccctctcagtggcgtcggaact-
    gcaaagcacctgtgagcttgcggaagtcagttcagactccagcccGCTCCAGCCCGGCCCGACCC
    1419 CGGGAGCCCGCCCCCGAGAGgtgggctgcgggcgctcgaggcccagccgccgccgccgccgccgc-
    cgccgccgcctccgccgccgccgccgccgccgccgccgccgcgctgccgcacgccccctggcagcg-
    gcgcctccgtcaccgccgccgcccgcgctcgccgtcggcccgccgcccgctcagaggcggccctccaccggaag
    tgaaaccgaaacggagctgagcgcctgactgaggccgaacccccggcccg
    1420 TCCTGCCATCCGCGCCTTTGCActtttctttttgagttgacatttcttggtgctttttg-
    gtttctcgctgttgttgggtgctttttggtttgttcttgtccctttttcgtttgctcatcctttttg-
    gcgctaactcttaggcagccagcccagcagcccgaagcccgggcagccgcgctccgcggccccggggcagcgcg
    gcgggaaccgcagccaagccccccgacacggggcgcacgggggccgggcagcccg
    1421 AGGCACAGGGGCAGCTCCGGCACggctttctcaggcctatgccggagcctcgagggctggagagcgg-
    gaagacaggcagtgctcggggagttgcagcaggacgtcaccaggagggcgaagcggccacgggaggggggc-
    cccgggacattgcgcagcaaggaggctgcaggggctcggcctgcgggcgccggtcccacgaggcactgcggccc
    agggtctggtgcggagagggcccacagtggacttggtgacgct
    1422 CGACCCCTCCGACCGTGCTTCCGgtgagggtcctgggcccctttcccactctctagagaca-
    gagaaatagggcttcgggcgcccagcgtttcctgtggcctctgggacctcttggccagggacaaggacccgt-
    gacttccttgcttgctgtgtggcccgggagcagctcagacgctggctccttctgtccctctgcccgtggacatt
    agctcaagtcactgatcagtcacaggggtggcctgtcaggtcaggcgg
    1423 CCCGCAGGGTGGCTGCGTCCttccagggcctggcctgagggcaggggtg-
    gtttgctcccccttcagcctccgggggctggggtcagtgcggtgctaacacggctctctctgtgctgtgg-
    gacttccaggcaggcccgcaagccgtgtgagccgtcgcagccgtggcatcgttgaggagtgct-
    gtttccGCAGCTGTGACCTGGCCCTCCTGGA
    1424 GCGTCTGCCGGCCCCTCCCCttgtccgtcccctccgcgccgctggcgcgcgccttctgaatgc-
    caagcattgccataaactccggggacaaaagcctgggtcacaaaagccccctctagaagttcacaccctgag-
    gcttccctggcaaggctgggggccgtttggcccttccatgtggactgcaaaaacagtgttggaatgcaggactc
    tgggtatgttctcgaaagttgttacaaccccaacccagggttgacc
    1425 TAGGCCGCCGGGCAGCCACCgcgctcctctggctctcctgctccatcgcgctcctccgcgcccttgc-
    cacctccaacgcccgtgcccagcagcgcgcGGCTGCCCAACAGCGCCGGA
    1426 GGGGAGCGGGGACGCGAGCAgcaccagaatccgcgggagcgcggctgttcctggtagggccgtgt-
    caggtgacggatgtagctagggggcgagctgcctggagttgcgttccaggcgtccggcccctgggccgtcac-
    cgcggggcgcccgcgctgagggtgggaagatggtggtgggggtgggggcgcacacagggcgggaaagtggcggt
    aggcgggagggagaggaacgcgggccctgagccgcccgcgcgcg
    1427 GCCGGCTGGCTCCCCACTCTGCcagagcgaggcggggcagtgaggactccgcgacgcgtccgcac-
    cctgcggccagagcggctttgagctcggctgcgtccgcgctaggcgctttttcccagaagcaatccag-
    gcgcgcccgctggttcttgagcgccaggaaaagcccggagctaacgaccggccgctcggccactgcacggggcc
    ccaagccgcagaaggacgacgggagggtaatgaagctgagcccaggtc
    1428 TCGCTCACGGCGTCCCCTTGCCtggaaagataccgcggtccctccagaggatttgagggacagg-
    gtcggagggggctcttccgccagcaccggaggaagaaagaggaggggctggctggtcaccagagggtggggcg-
    gaccgcgtgcgctcggcggctgcggagagggggagagcaggcagcgggcggcggggagcagcatggagccggcg
    gcggggagcagcatggagccttcggctgactggctggccacggc
    1429 TCCCCGCTGCCCTGGCGCTCcccctttgatttattagggctgccgggttggcgcagat-
    tgctttttcttctcttccatcccatcctcccttctggtcctcctttccacagtgggagtccgtgctcct-
    gctcctcggttggctcctaagtgccccgccaggtcccctctcctttcgctctcccggctccggctcccgactct
    tcggcccgctggcatctgcttccctcccctgcctcgtttctcgtcgcccctgct
    1430 GGCCAGAGGCAGGCCCGCAGCtccctgccccgcctctgtgcctccgccaacccgacaacgct-
    tgctcccaccccgatccccgcacccgcgcgaAGTGGGCCCTCCGGTCGTCGGC
    1431 TGCCCGGGTCATCGGACGGGAGgccgcgccacgtgagggcggcaagagggcactggccctgcggc-
    gaggccccagcgaggggcgcttccCCGAGGGGCCAGCCTGGGCA
    1432 CCCAGTGCGCACGGCGAGGCagtagcccggccccgcactgctgataggtgcaggcag-
    gacagtccctccaccgcggctcggggcgtcctgattggtgcggagccacgtcagtcgcacccggagaagg-
    gtctgggaggaggcggaggcggaGAGGGCTGGGGAGGGCCGCG
    1433 AGCGTCCCAGCCCGCGCACCgaccagcgccccagttccccacagacgccggcgggcccgg-
    gagcctcgcggacgtgacgccgcgggcggaagtgacgttttcccgcggttggacgcggCGCTCAGTTGCCGG-
    GCGGGG
    1434 TGCTCCCCCGGGTCGGAGCCccccggagctgcgcgcgggcttgcagcgcctcgcccgcgct-
    gtcctcccggtgtcccgcttctccgcgccccagccgccggctgccagcttttcggggccccgagtcgcac-
    ccagcgaagagagcgggcccgggacaagctcgaactccggccgcctcgcccttccccggctccgctccctctgc
    cccctcggggtcgcgcgcccacgatgctgcagggccctggctcgctgctg
    1435 CGCTCGCATTGGGGCGCGTCccccatccgcccccaactgtggtgtcgcgacaggtcctattgcgggt-
    gtctgcggtgggaagggcggtggtgactgggagcATGCGGGGTAACCGCAGTGGGCA
    1436 TGCGGCAAGCCCGCCATGATGtccacgtgacaaaagccatgatatacatatgacaacgcctgccata-
    ttgtccctgcggcaaaacccaacacgaaaagcacacagcaaagacaaagaggcccgccatgttttacactgcg-
    gcaagaccttcagccgccatcttttcctgtgTGACCGCACATGTCCACCACCATGC
    1437 TCTTGAGCCTCAGGAGTGAAAAGGCCCCTTGggaaaccctcacccaggagatacacaggagcactg-
    gctttggcagcagctcacaatgagaaagaTGCCTGTCACAGCCTTTGCCTTCCTCTTCTATG
    1438 GGACCATGAGTGTTTCCATGCTTGGCATCAGAcatgtcttctacccctattcagtctgtcatccact-
    ggtcaagaatcccaaacattctaaaactgtgtccacatctcttctgggtaactcttatgattggagg-
    gcttcctgaggtgtgaagtctatcacagatccagtgactaacttctagcttcatcttattctcacttaggggag
    aagagttgaggcccaagcaaacctcttcttaccattggcttagggaa
    1439 tcagccactgcttcgcaggctgacgttactgacgtggtgccagcgacggagggcgagaacgc-
    cagcgcggcgcagccggacgtgaacgcgcagatcaccgcagcggttgcggcagaaaacagccgcattatggg-
    gatcctcaactgtgaggaggctcacggacgcgaagaacaggcacgcgtgctggcagaaacccccggtatgaccg
    tgaaaacggcccgccgcattctggccgcagcaccacagagtgcacag
    1440 cggccagctgcgcggcgactccggggactccagggcgcccctctgcggccgacgcccggggt-
    gcagcggccgccggggctggggccggcgggagtccgcgggaccctccagaagagcggccggcgccgtgact-
    cagcactggggcggagcggggc

Claims (21)

1.-15. (canceled)
16. A nucleic acid primer or hybridization probe set specific for at least one potentially methylated region of at least one marker gene suitable to diagnose or predict lung cancer or a lung cancer type.
17. The set of claim 16, wherein the at least one marker gene is further defined as WT1, SALL3, TERT, ACTB, or CPEB4.
18. The set of claim 16, wherein the lung cancer is adenocarcinoma or squamous cell carcinoma.
19. The set of claim 16, further comprising a nucleic acid primer or hybridization probe specific for at least one additional marker gene defined as ABCB1, ACTB, AIM1L, APC, AREG, BMP2K, BOLL, C5AR1, C5orf4, CADM1, CDH13, CDX1, CLIC4, COL21A1, CPEB4, CXADR, DLX2, DNAJA4, DPH1, DRD2, EFS, ERBB2, ERCC1, ESR2, F2R, FAM43A, GABRA2, GAD1, GBP2, GDNF, GNA15, GNAS, HECW2, HIC1, HIST1H2AG, HLA-G, HOXA1, HOXA10, HSD17B4, HSPA2, IRAK2, ITGA4, JUB, KCNJ15, KCNQ1, KIF5B, KL, KRT14, KRT17, LAMC2, MAGEB2, MBD2, MSH4, MT1G, MT3, MTHFR, NEUROD1, NHLH2, NKX2-1, ONECUT2, PENK, PITX2, PLAGL1, PTTG1, PYCARD, RASSF1, S100A8, SALL3, SERPINB5, SERPINE1, SERPINI1, SFRP2, SLC25A31, SMAD3, SPARC, SPHK1, SRGN, TERT, THRB, TJP2, TMEFF2, TNFRSF10C, TNFRSF25, TP53, ZDHHC11, ZNF256, ZNF711, F2R, HOXA10, KL, SALL3, SPARC, TNFRSF25, or WT1.
20. The set of claim 16, further defined as a nucleic acid primer or hybridization probe set comprising nucleic acid primers or hybridization probes being specific for potentially methylated regions of at least 50% of the marker genes in at least one of the following combinations:
WT1, DLX2, SALL3, TERT, PITX2, HOXA10, F2R, CPEB4, NHLH2, SMAD3, ACTB, HOXA1, BOLL, APC, MT1G, PENK, SPARC, DNAJA4, RASSF1, HLA-G, ERCC1, ONECUT2, APC, ABCB1, ZNF573, KCNJ15, ZDHHC11, SFRP2, GDNF, PTTG1, SERPINI1, and TNFRSF10C;
WT1, PITX2, SALL3, F2R, DLX2, TERT, HOXA10, MSH4, NHLH2, GNA15, PENK, RASSF1, BOLL, HOXA1, ONECUT2, ABCB1, SPARC, MT1G, HSPA2, SFRP2, PYCARD, GAD1, C5orf4, C5AR1, GDNF, ZDHHC11, SERPINE1, NKX2-1, PITX2, C5AR1, GDNF, ZDHHC11, SERPINE1, NKX2-1, PITX2, C5AR1, ZNF256, FAM43A, SFRP2, MT3, SERPINE1M, CLIC4, TNFRSF10C, GABRA2, MTHFR, ESR2, NEUROG1, PITX2, PLAGL1, TMEFF2, PTTG1, CADM1, S100A8, EFS, JUB, ITGA4, MAGEB2, ERBB2, SRGN, GNAS, TJP2, KCNJ15, SLC25A31, ZNF573, TNFRSF25, APC, KCNQ1, LAMC2, SPHK1, DNAJA4, APC, MBD2, ERCC1, HLA-G, CXADR, TP53, ACTB, KL, SMAD3, HIST1H2AG, and CPEB4;
WT1, DLX2, SALL3, TERT, TNFRSF25, ACTB, SMAD3, and CPEB4;
WT1, DLX2, SALL3, TERT, PITX2, TNFRSF25, KL, ACTB, SMAD3, and CPEB4;
WT1, PITX2, SALL3, DLX2, TERT, HOXA10, RASSF1, SPARC, IRAK2, ZNF711, DNAJA4, HLA-G, CXADR, TP53, ACTB, and CPEB4;
WT1, PITX2, SALL3, F2R, TERT, HOXA10, RASSF1, SPARC, IRAK2, ZNF711, DRD2, DNAJA4, CXADR, TP53, ACTB, and CPEB4;
WT1, ACTB, DLX2, PITX2, SALL3, HOXA10, TERT, CPEB4, HLA-G, SPARC, RASSF1, DNAJA4, CXADR, TP53, IRAK2, and ZNF711;
F2R, ZNF256, CDH13, SERPINB5, KRT14, DLX2, AREG, THRB, HSD17B4, SPARC, HECW2, and COL21A1;
KL, HIST1H2AG, TJP2, SRGN, CDX1, TNFRSF25, APC, HIC1, APC, GNA15, ACTB, WT1, KRT17, AIM1L, DPH1, PITX2, PITX2, KIF5B, BMP2K, GBP2, NHLH2, GDNF, and BOLL;
HOXA10 and NEUROD1;
WT1, PITX2, SALL3, F2R, TERT, HOXA10, RASSF1, SPARC, IRAK2, ZNF711, DRD2, DNAJA4, CXADR, TP53, ACTB, CPEB4, DLX2, TNFRSF25, KL, and SMAD3;
TNFRSF25, SALL3, RASSF1, TERT, SPARC, F2R, HOXA10, ZNF711, and PITX2;
SALL3, PITX2, SPARC, F2R, TERT, RASSF1, HOXA10, CXADR, and KL;
SALL3, SPARC, PITX2, F2R, TERT, RASSF1, HOXA10, and KL;
SALL3, PITX2, SPARC, F2R, HOXA10, DRD2, ACTB, DNAJA4, CXADR, and KL;
SALL3, SPARC, PITX2, F2R, TERT, RASSF1, HOXA10, TNFRSF25, DNAJA4, TP53, CXADR, and KL;
SPARC, SALL3, F2R, PITX2, RASSF1, HOXA10, TERT, KL, and TNFRSF25;
SALL3, SPARC, PITX2, F2R, TERT, RASSF1, HOXA10, KL, TNFRSF25, and CXADR; and
HOXA10, RASSF1, and F2R.
21. The set of claim 16, further defined as comprising not more than 100000 probes or primer pairs.
22. The set of claim 16, further defined as comprising not more than 100000 probes or primer pairs.
23. The set of claim 22, further defined as comprising immobilized probes on a solid surface.
24. The set of claim 22, wherein the primer pairs and probes are specific for a methylated upstream region of an open reading frame of the marker genes.
25. The set of claim 22, wherein the probes or primers are specific for methylation in the genetic regions defined by any of SEQ ID NOs 1081 to 1440, including up to 500 adjacent base pairs, corresponding to any of gene marker IDs 1 to 359.
26. The set of claim 25, wherein the probes or primers are of SEQ ID NOs 1 to 1080.
27. A method of identifying or predicting a lung cancer or a lung cancer type in a patient, comprising:
obtaining a set of nucleic acid primers or hybridization probes of claim 16;
using the set to determine the methylation status of genes for which the members of the set are specific in a sample of DNA from the patient; and
comparing the methylation status of the genes with the status of a confirmed lung cancer type positive and/or negative state, thereby identifying lung cancer or lung cancer type, if any, in the patient.
28. The method of claim 27, wherein the methylation status is determined by methylation-specific PCR analysis, methylation-specific digestion analysis, and either or both of hybridization analysis to non-digested or digested fragments or PCR amplification analysis of non-digested or digested fragments.
29. A method of determining a subset of diagnostic markers for potentially methylated genes from the genes of gene marker IDs 1-359 of Table 1, suitable for the diagnosis or prognosis of lung cancer or lung cancer type, comprising:
a) obtaining data of the methylation status of at least 50 random genes selected from the 359 genes of gene marker IDs 1-359 in at least 1 sample of a confirmed lung cancer or lung cancer type state and at least one sample of a lung cancer or lung cancer type negative state;
b) correlating the results of the obtained methylation status with the lung cancer or lung cancer type;
c) optionally repeating the obtaining a) and correlating b) steps for a different combination of at least 50 random genes selected from the 359 genes of gene marker IDs 1-359; and
d) selecting those marker genes that, in a classification analysis, have a p-value of less than 0.1 in a random-variance t-test, or selecting as many marker genes as together achieve a correct lung cancer or lung cancer type prediction of at least 70% in a cross-validation test;
wherein the selected markers form the subset of diagnostic markers.
30. The method of claim 29, wherein a) is further defined as comprising obtaining data of the methylation status of at least 50 random genes selected from the 359 genes of gene marker IDs 1-359 in at least 5 samples of a confirmed lung cancer or lung cancer type state.
31. The method of claim 29, wherein the correlated results for each gene in step b) are rated by their correct correlation to the disease or tumor type positive state, preferably by a p-value test, and selected in step d) in order of the rating.
32. The method of claim 29, wherein not more than 40 marker genes are selected in step d) for the subset.
33. The method of claim 29, wherein the step a) of obtaining data of the methylation status comprises determining data of the methylation status by methylation-specific PCR analysis, methylation-specific digestion analysis, or hybridization analysis to non-digested or digested fragments, or PCR amplification analysis of non-digested or digested fragments.
34. A method of identifying or predicting a lung cancer or a lung cancer type in a patient, comprising:
providing a set of a diagnostic subset of markers identified by a method of claim 29;
using the set to determine methylation status of genes for which the members of the set are specific in a sample comprising DNA from the patient; and
comparing the methylation status of the genes with the status of a confirmed lung cancer type positive and/or negative state, thereby identifying lung cancer or lung cancer type, if any, in the patient.
35. The method of claim 34, wherein the methylation status is determined by methylation-specific PCR analysis, methylation-specific digestion analysis, and either or both of hybridization analysis to non-digested or digested fragments or PCR amplification analysis of non-digested or digested fragments.
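The marker-subset selection of claims 29, 31 and 32, together with the 50% coverage condition of claim 20, can be sketched in Python as below. This is a hypothetical illustration, not the applicants' implementation. It assumes that methylation data arrive as per-gene lists of numeric scores for confirmed-positive and negative samples, and it implements the random-variance t-test following the random variance model of Wright and Simon, with the prior parameters a and b supplied by the caller (in practice they would be estimated across all genes). It covers only the p-value branch of step d), not the 70% cross-validation branch. The function names (_betacf, t_two_sided_p, rv_t_test, select_markers, covers_combination) are invented for this sketch; the p-value is computed with a standard continued-fraction evaluation of the regularized incomplete beta function rather than a statistics library.

```python
import math


def _betacf(a, b, x, max_iter=1000, eps=3e-14, tiny=1e-300):
    """Continued fraction for the regularized incomplete beta function (modified Lentz)."""
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even step of the continued fraction.
        aa = m * (b - m) * x / ((qam + m2) * (a + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        h *= d * c
        # Odd step of the continued fraction.
        aa = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def t_two_sided_p(t, df):
    """Two-sided Student's t p-value: P(|T| > |t|) = I_x(df/2, 1/2) with x = df / (df + t^2)."""
    x = df / (df + t * t)
    a, b = df / 2.0, 0.5
    if x >= 1.0:
        return 1.0
    if x <= 0.0:
        return 0.0
    bt = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                  + a * math.log(x) + b * math.log(1.0 - x))
    if x < (a + 1.0) / (a + b + 2.0):
        return bt * _betacf(a, b, x) / a
    return 1.0 - bt * _betacf(b, a, 1.0 - x) / b


def rv_t_test(cases, controls, a, b):
    """Random-variance t-test for one gene; returns (t, two-sided p)."""
    n1, n2 = len(cases), len(controls)
    m1, m2 = sum(cases) / n1, sum(controls) / n2
    # Pooled within-group sum of squares, i.e. (n - 2) * s^2.
    ss = sum((v - m1) ** 2 for v in cases) + sum((v - m2) ** 2 for v in controls)
    pooled_df = n1 + n2 - 2
    # Shrink toward the prior mean variance 1/(a*b), weighted by 2a extra degrees of freedom.
    var = (ss + 2.0 * a / (a * b)) / (pooled_df + 2.0 * a)
    t = (m1 - m2) / math.sqrt(var * (1.0 / n1 + 1.0 / n2))
    return t, t_two_sided_p(t, pooled_df + 2.0 * a)


def select_markers(data, a, b, alpha=0.1, max_markers=40):
    """Keep genes with p < alpha (claim 29 d), rank by p (claim 31), cap the count (claim 32).

    data maps gene name -> (case_scores, control_scores).
    """
    scored = []
    for gene, (cases, controls) in data.items():
        _, p = rv_t_test(cases, controls, a, b)
        if p < alpha:
            scored.append((p, gene))
    scored.sort()
    return [gene for _, gene in scored[:max_markers]]


def covers_combination(probe_genes, combination, fraction=0.5):
    """Claim 20 check: does the set target at least `fraction` of the combination's distinct genes?"""
    combo = set(combination)
    return len(combo & set(probe_genes)) >= fraction * len(combo)


# Hypothetical toy data: gene "A" separates cases from controls, gene "B" does not.
data = {
    "A": ([0.9, 0.85, 0.95, 0.8], [0.1, 0.2, 0.15, 0.05]),
    "B": ([0.5, 0.4, 0.6, 0.5], [0.5, 0.6, 0.4, 0.5]),
}
print(select_markers(data, a=2.0, b=100.0))  # ['A']
print(covers_combination(["HOXA10", "F2R"], ["HOXA10", "RASSF1", "F2R"]))  # True
```

Counting distinct genes in covers_combination is an assumption of the sketch, chosen because some combinations listed in claim 20 name the same gene more than once (for example APC and PITX2). The toy values for a and b are arbitrary and only serve to make the example run.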

Priority Applications (1)

Application Number Priority Date Filing Date Title
US15/096,848 US20160281175A1 (en) 2009-01-28 2016-04-12 Lung cancer methylation markers

Applications Claiming Priority (5)

Application Number Priority Date Filing Date Title
EP09450020A EP2233590A1 (en) 2009-01-28 2009-01-28 Methylation assay
EP09450020.4 2009-01-28
PCT/EP2010/051032 WO2010086388A1 (en) 2009-01-28 2010-01-28 Lung cancer methylation markers
US201113146901A 2011-07-28 2011-07-28
US15/096,848 US20160281175A1 (en) 2009-01-28 2016-04-12 Lung cancer methylation markers

Related Parent Applications (2)

Application Number Title Priority Date Filing Date
US13/146,901 Continuation US20110287967A1 (en) 2009-01-28 2010-01-28 Lung Cancer Methylation Markers
PCT/EP2010/051032 Continuation WO2010086388A1 (en) 2009-01-28 2010-01-28 Lung cancer methylation markers

Publications (1)

Publication Number Publication Date
US20160281175A1 true US20160281175A1 (en) 2016-09-29

Family

ID=40578874

Family Applications (3)

Application Number Title Priority Date Filing Date
US13/146,901 Abandoned US20110287967A1 (en) 2009-01-28 2010-01-28 Lung Cancer Methylation Markers
US13/146,903 Active US10718026B2 (en) 2009-01-28 2010-01-28 Methylation assay
US15/096,848 Abandoned US20160281175A1 (en) 2009-01-28 2016-04-12 Lung cancer methylation markers

Family Applications Before (2)

Application Number Title Priority Date Filing Date
US13/146,901 Abandoned US20110287967A1 (en) 2009-01-28 2010-01-28 Lung Cancer Methylation Markers
US13/146,903 Active US10718026B2 (en) 2009-01-28 2010-01-28 Methylation assay

Country Status (4)

Country Link
US (3) US20110287967A1 (en)
EP (4) EP2233590A1 (en)
CA (2) CA2750978A1 (en)
WO (2) WO2010086388A1 (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2019183440A1 (en) * 2018-03-22 2019-09-26 Ionis Pharmaceuticals, Inc. Methods for modulating fmr1 expression
US11410750B2 (en) 2018-09-27 2022-08-09 Grail, Llc Methylation markers and targeted methylation probe panel
US12024750B2 (en) 2020-10-01 2024-07-02 Grail, Llc Methylation markers and targeted methylation probe panel

Families Citing this family (58)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP2233590A1 (en) 2009-01-28 2010-09-29 AIT Austrian Institute of Technology GmbH Methylation assay
ES2658626T3 (en) * 2009-02-12 2018-03-12 Curna, Inc. Treatment of diseases related to glial cell-derived neurotrophic factor (GDNF) by inhibition of natural antisense transcript to GDNF
ES2633111T3 (en) 2010-03-26 2017-09-19 Hongzhi Zou Methods and materials to detect colorectal neoplasia
CA2804626C (en) 2010-07-27 2020-07-28 Genomic Health, Inc. Method for using expression of glutathione s-transferase mu 2 (gstm2) to determine prognosis of prostate cancer
JP5693895B2 (en) * 2010-08-26 2015-04-01 公益財団法人ヒューマンサイエンス振興財団 Cervical cancer test marker and cervical cancer test method
WO2012031329A1 (en) * 2010-09-10 2012-03-15 Murdoch Childrens Research Institute Assay for detection and monitoring of cancer
EP2630261B1 (en) 2010-10-19 2019-04-17 Oslo Universitetssykehus HF Methods and biomarkers for detection of bladder cancer
WO2013012781A2 (en) * 2011-07-15 2013-01-24 The Johns Hopkins University Genome-wide methylation analysis and use to identify genes specific to breast cancer hormone receptor status and risk of recurrence
ITPD20110324A1 (en) * 2011-10-12 2013-04-13 Innovation Factory S C A R L METHOD FOR DETECTION OF GALPHA15 AS A TUMOR MARKER IN PANCREATIC CARCINOMA
EP2816120B1 (en) 2012-02-13 2018-04-11 Beijing Institute for Cancer Research Method for in vitro estimation of tumorigenesis, metastasis, or life expectancy and artificial nucleotide used
EP2636752A1 (en) * 2012-03-06 2013-09-11 Universiteit Maastricht In vitro method for determining disease outcome in pulmonary carcinoids.
CN102787174B (en) * 2012-09-06 2013-08-07 南京大学 Kit for morbidity-related tumor suppressor gene epigenetic mutation detection of gastrointestinal neoplasms and application thereof
JP6392762B2 (en) * 2012-10-02 2018-09-19 シュピーンゴテック ゲゼルシャフト ミット ベシュレンクテル ハフツング Methods to help predict the risk of developing cancer in female subjects
WO2014062218A1 (en) * 2012-10-16 2014-04-24 University Of Southern California Colorectal cancer dna methylation markers
US9518989B2 (en) 2013-02-12 2016-12-13 Texas Tech University System Composition and method for diagnosis and immunotherapy of lung cancer
CN118028467A (en) 2013-03-14 2024-05-14 梅奥医学教育和研究基金会 Detection of neoplasms
EP2886659A1 (en) * 2013-12-20 2015-06-24 AIT Austrian Institute of Technology GmbH Gene methylation based colorectal cancer diagnosis
EP3126529B1 (en) 2014-03-31 2020-05-27 Mayo Foundation for Medical Education and Research Detecting colorectal neoplasm
US10184154B2 (en) 2014-09-26 2019-01-22 Mayo Foundation For Medical Education And Research Detecting cholangiocarcinoma
US10030272B2 (en) 2015-02-27 2018-07-24 Mayo Foundation For Medical Education And Research Detecting gastrointestinal neoplasms
CN115927612A (en) 2015-03-27 2023-04-07 精密科学公司 Detecting esophageal disorders
KR20180081042A (en) 2015-08-31 2018-07-13 메이오 파운데이션 포 메디칼 에쥬케이션 앤드 리써치 A method for detecting gastric neoplasia
CN105154542B (en) * 2015-09-01 2018-04-17 杭州源清生物科技有限公司 A set of genes for molecular typing of lung cancer and its application
CA3002196C (en) 2015-10-30 2024-05-14 Exact Sciences Development Company, Llc Multiplex amplification detection assay and isolation and detection of dna from plasma
US10913986B2 (en) 2016-02-01 2021-02-09 The Board Of Regents Of The University Of Nebraska Method of identifying important methylome features and use thereof
WO2017180886A1 (en) 2016-04-14 2017-10-19 Mayo Foundation For Medical Education And Research Detecting pancreatic high-grade dysplasia
US10370726B2 (en) 2016-04-14 2019-08-06 Mayo Foundation For Medical Education And Research Detecting colorectal neoplasia
WO2017192221A1 (en) * 2016-05-05 2017-11-09 Exact Sciences Corporation Detection of lung neoplasia by analysis of methylated dna
US11685955B2 (en) 2016-05-16 2023-06-27 Dimo Dietrich Method for predicting response of patients with malignant diseases to immunotherapy
DE102016005947B3 (en) * 2016-05-16 2017-06-08 Dimo Dietrich A method for estimating the prognosis and predicting the response to immunotherapy of patients with malignant diseases
WO2017210662A1 (en) * 2016-06-03 2017-12-07 Castle Biosciences, Inc. Methods for predicting risk of recurrence and/or metastasis in soft tissue sarcoma
WO2017212734A1 (en) * 2016-06-10 2017-12-14 国立研究開発法人国立がん研究センター Method for predicting effect of pharmacotherapy on cancer
WO2018009709A1 (en) * 2016-07-06 2018-01-11 Youhealth Biotech, Limited Lung cancer methylation markers and uses thereof
WO2018009703A1 (en) * 2016-07-06 2018-01-11 Youhealth Biotech, Limited Breast and ovarian cancer methylation markers and uses thereof
WO2018009696A1 (en) * 2016-07-06 2018-01-11 Youhealth Biotech, Limited Colon cancer methylation markers and uses thereof
US10093986B2 (en) 2016-07-06 2018-10-09 Youhealth Biotech, Limited Leukemia methylation markers and uses thereof
CN107847515B (en) * 2016-07-06 2021-01-29 优美佳生物技术有限公司 Solid tumor methylation marker and application thereof
EP3481953A4 (en) * 2016-07-06 2020-04-15 Youhealth Biotech, Limited Liver cancer methylation markers and uses thereof
CN106282347A (en) * 2016-08-17 2017-01-04 中南大学 Application of HoxC11 as a biomarker in the preparation of a pre-diagnostic reagent for lung adenocarcinoma
EP4293128A3 (en) * 2016-09-02 2024-03-20 Mayo Foundation for Medical Education and Research Detecting hepatocellular carcinoma
CN109963862B (en) 2016-09-07 2024-01-30 武汉华大吉诺因生物科技有限公司 Polypeptides and uses thereof
WO2018045509A1 (en) 2016-09-07 2018-03-15 武汉华大吉诺因生物科技有限公司 Polypeptide and application thereof
US10787666B2 (en) * 2016-12-14 2020-09-29 Shanghaitech University Compositions and methods for treating cancer by inhibiting PIWIL4
CN106755466A (en) * 2017-01-12 2017-05-31 宁夏医科大学 Specific DNA methylome, method for establishing the same, and application thereof
CA3054836A1 (en) 2017-02-28 2018-09-07 Mayo Foundation For Medical Education And Research Detecting prostate cancer
WO2018174860A1 (en) * 2017-03-21 2018-09-27 Mprobe Inc. Methods and compositions for detection early stage lung adenocarcinoma with rnaseq expression profiling
WO2018214249A1 (en) * 2017-05-22 2018-11-29 立森印迹诊断技术(无锡)有限公司 Imprinted gene grading model, system composed of same, and application of same
CN109528749B (en) * 2017-09-22 2023-04-28 上海交通大学医学院附属瑞金医院 Application of long-chain non-coding RNA-H19 in preparation of drug for treating pituitary tumor
AU2018374176A1 (en) 2017-11-30 2020-06-11 Exact Sciences Corporation Detecting breast cancer
CN112313345B (en) * 2018-05-18 2024-02-20 立森印迹诊断技术(无锡)有限公司 Method for diagnosing cancer by biopsy cell sample
CN110714075B (en) * 2018-07-13 2024-05-03 立森印迹诊断技术(无锡)有限公司 Grading model for detecting benign and malignant degrees of lung tumor and application thereof
KR102637032B1 (en) * 2020-01-28 2024-02-15 주식회사 젠큐릭스 Composition for diagnosing bladder cancer using CpG methylation status of specific gene and uses thereof
CN114075603A (en) * 2020-09-01 2022-02-22 闫池 Method for determining differential methylation sites of CpG island of AJUBA gene promoter
CN112301132A (en) * 2020-11-18 2021-02-02 中国医学科学院肿瘤医院 Kit for multi-gene combined detection of cancer and detection method thereof
CN113528667B (en) * 2021-07-20 2022-09-20 中国科学院上海营养与健康研究所 Diagnosis method of giant cell tumor of bone
KR102404750B1 (en) * 2021-10-14 2022-06-07 주식회사 엔도믹스 Methylation marker genes for colorectal diagnosis using cell-free dna and use thereof
CN114438188A (en) * 2021-11-29 2022-05-06 中国辐射防护研究院 Use of hypermethylated CDK2AP1 gene as molecular marker for alpha radiation damage prediction
CN114395623B (en) * 2021-12-16 2024-03-08 上海市杨浦区中心医院(同济大学附属杨浦医院) Gene methylation detection primer composition, kit and application thereof

Family Cites Families (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6582908B2 (en) * 1990-12-06 2003-06-24 Affymetrix, Inc. Oligonucleotides
US6605432B1 (en) 1999-02-05 2003-08-12 Curators Of The University Of Missouri High-throughput methods for detecting DNA methylation
DE10161625A1 (en) 2001-12-14 2003-07-10 Epigenomics Ag Methods and nucleic acids for the analysis of a pulmonary cell division disorder
ES2272636T3 (en) 2002-06-05 2007-05-01 Epigenomics Ag PROCEDURE FOR THE QUANTITATIVE DETERMINATION OF THE DEGREE OF METHYLATION OF CYTOSINES AT CPG POSITIONS.
WO2005027712A2 (en) * 2003-05-30 2005-03-31 Temple University - Of The Commonwealth System Of Higher Education Methods of diagnosing, prognosing and treating breast cancer
WO2006084051A2 (en) * 2005-02-01 2006-08-10 John Wayne Cancer Institute Use of id4 for diagnosis and treatment of cancer
CA2604689A1 (en) * 2005-04-15 2006-10-26 Oncomethylome Sciences, Inc. Methylation markers for diagnosis and treatment of cancers
US9512483B2 (en) * 2005-07-09 2016-12-06 Lovelace Respiratory Research Institute Gene methylation as a biomarker in sputum
WO2007032748A1 (en) * 2005-09-15 2007-03-22 Agency For Science, Technology & Research Method for detecting dna methylation
EP2004860A4 (en) * 2006-03-29 2009-12-30 Wayne John Cancer Inst Methylation of estrogen receptor alpha and uses thereof
US20100003189A1 (en) * 2006-07-14 2010-01-07 The Regents Of The University Of California Cancer biomarkers and methods of use thereof
US8911937B2 (en) * 2007-07-19 2014-12-16 Brainreader Aps Method for detecting methylation status by using methylation-independent primers
EP2233590A1 (en) 2009-01-28 2010-09-29 AIT Austrian Institute of Technology GmbH Methylation assay

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2019183440A1 (en) * 2018-03-22 2019-09-26 Ionis Pharmaceuticals, Inc. Methods for modulating fmr1 expression
US11661601B2 (en) 2018-03-22 2023-05-30 Ionis Pharmaceuticals, Inc. Methods for modulating FMR1 expression
US11410750B2 (en) 2018-09-27 2022-08-09 Grail, Llc Methylation markers and targeted methylation probe panel
US11685958B2 (en) 2018-09-27 2023-06-27 Grail, Llc Methylation markers and targeted methylation probe panel
US11725251B2 (en) 2018-09-27 2023-08-15 Grail, Llc Methylation markers and targeted methylation probe panel
US11795513B2 (en) 2018-09-27 2023-10-24 Grail, Llc Methylation markers and targeted methylation probe panel
US12024750B2 (en) 2020-10-01 2024-07-02 Grail, Llc Methylation markers and targeted methylation probe panel

Also Published As

Publication number Publication date
EP3124624A3 (en) 2017-04-12
EP2391729A1 (en) 2011-12-07
US10718026B2 (en) 2020-07-21
CA2750978A1 (en) 2010-08-05
US20110287967A1 (en) 2011-11-24
CA2750979A1 (en) 2010-08-05
EP3124624B1 (en) 2019-11-20
EP2391729B1 (en) 2016-09-14
US20110287968A1 (en) 2011-11-24
WO2010086388A1 (en) 2010-08-05
EP2233590A1 (en) 2010-09-29
EP2391728A1 (en) 2011-12-07
WO2010086389A1 (en) 2010-08-05
EP3124624A2 (en) 2017-02-01

Legal Events

Date Code Title Description
AS Assignment

Owner name: AIT AUSTRIAN INSTITUTE OF TECHNOLOGY GMBH, AUSTRIA

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:AIT AUSTRIAN INSTITUTE OF TECHNOLOGY GMBH;REEL/FRAME:039140/0674

Effective date: 20120419

Owner name: TECNET EQUITY NO TECHNOLOGIEBETEILIGUNGS-INVEST GM

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:AIT AUSTRIAN INSTITUTE OF TECHNOLOGY GMBH;REEL/FRAME:039140/0674

Effective date: 20120419

Owner name: AIT AUSTRIAN INSTITUTE OF TECHNOLOGY GMBH, AUSTRIA

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:WEINHAUSEL, ANDREAS;PICHLER, RUDOLF;NOHAMMER, CHRISTA;REEL/FRAME:039140/0561

Effective date: 20110810

AS Assignment

Owner name: AIT AUSTRIAN INSTITUTE OF TECHNOLOGY GMBH, AUSTRIA

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:AGNETER, DORIS;GUENTHER, ELKE;SIGNING DATES FROM 20170518 TO 20170522;REEL/FRAME:042688/0465

AS Assignment

Owner name: AIT AUSTRIAN INSTITUTE OF TECHNOLOGY GMBH, AUSTRIA

Free format text: CORRECTIVE ASSIGNMENT TO CORRECT THE CONVEYING PARTY DATA PREVIOUSLY RECORDED AT REEL: 042688 FRAME: 0465. ASSIGNOR(S) HEREBY CONFIRMS THE ASSIGNMENT;ASSIGNOR:TECNET EQUITY NO TECHNOLOGIEBETEILIGUNGS-INVEST GMBH;REEL/FRAME:043150/0617

Effective date: 20161101

STPP Information on status: patent application and granting procedure in general

Free format text: RESPONSE TO NON-FINAL OFFICE ACTION ENTERED AND FORWARDED TO EXAMINER

STPP Information on status: patent application and granting procedure in general

Free format text: FINAL REJECTION MAILED

STCB Information on status: application discontinuation

Free format text: ABANDONED -- FAILURE TO RESPOND TO AN OFFICE ACTION