WO2010086389A1 - Methylation assay - Google Patents

Methylation assay

Info

Publication number
WO2010086389A1
Authority
WO
WIPO (PCT)
Prior art keywords
genes
pitx2
methylation
disease
dna
Prior art date
Application number
PCT/EP2010/051033
Other languages
French (fr)
Inventor
Andreas WEINHÄUSEL
Rudolf Pichler
Christa NÖHAMMER
Original Assignee
Ait Austrian Institute Of Technology Gmbh
Priority date
Filing date
Publication date
Application filed by Ait Austrian Institute Of Technology Gmbh
Priority to EP16188579.3A (patent EP3124624B1)
Priority to CA2750979A (patent CA2750979A1)
Priority to US13/146,903 (patent US10718026B2)
Priority to EP10701378.1A (patent EP2391729B1)
Publication of WO2010086389A1

Classifications

    • C - CHEMISTRY; METALLURGY
    • C12 - BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q - MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00 - Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68 - Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6876 - Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes
    • C12Q1/6883 - Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for diseases caused by alterations of genetic material
    • C12Q1/6886 - Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for diseases caused by alterations of genetic material for cancer
    • C12Q2523/00 - Reactions characterised by treatment of reaction samples
    • C12Q2523/10 - Characterised by chemical treatment
    • C12Q2523/125 - Bisulfite(s)
    • C12Q2600/00 - Oligonucleotides characterized by their use
    • C12Q2600/112 - Disease subtyping, staging or classification
    • C12Q2600/118 - Prognosis of disease development
    • C12Q2600/154 - Methylation markers
    • C12Q2600/16 - Primer sets for multiplex assays

Definitions

  • the present invention relates to cancer diagnostic methods and means therefor.
  • Neoplasms and cancer are abnormal growths of cells. Cancer cells rapidly reproduce despite restriction of space, nutrients shared by other cells, or signals sent from the body to stop reproduction. Cancer cells are often shaped differently from healthy cells, do not function properly, and can spread into many areas of the body. Abnormal growths of tissue, called tumors, are clusters of cells that are capable of growing and dividing uncontrollably. Tumors can be benign (noncancerous) or malignant (cancerous). Benign tumors tend to grow slowly and do not spread. Malignant tumors can grow rapidly, invade and destroy nearby normal tissues, and spread throughout the body. Malignant cancers can be both locally invasive and metastatic.
  • Cancers can invade the surrounding tissues by sending out "fingers" of cancerous cells into the normal tissue. Metastatic cancers can send cells into other tissues in the body, which may be distant from the original tumor. Cancers are classified according to the kind of fluid or tissue from which they originate, or according to the location in the body where they first developed. All of these parameters can influence the cancer characteristics, development and progression, and subsequently also cancer treatment. Therefore, reliable methods to classify a cancer state or cancer type, taking diverse parameters into consideration, are desired. Since cancer is predominantly a genetic disease, classifying cancers by genetic parameters is one extensively studied route.
  • RNA-expression studies have been used for screening to identify genetic biomarkers. Over recent years it has been shown that changes in the DNA-methylation pattern of genes could be used as biomarkers for cancer diagnostics. In concordance with the general strategy for identifying RNA-expression-based biomarkers, the most convenient and promising approach would be to start identifying marker candidates by genome-wide screening of methylation changes.
  • RNA expression profiling for elucidation of class differences for distinguishing the "good" from the "bad" situation like diseased vs. healthy, or clinical differences between groups of diseased patients.
  • RNA based markers are more promising markers and expected to give robust assays for diagnostics.
  • Many clinical markers in oncology are more or less DNA based and are well established, e.g. cytogenetic analyses for diagnosis and classification of different tumor species. However, most of these markers are not accessible using the cheap and efficient molecular-genetic PCR routine tests.
  • RNA-expression changes range over several orders of magnitude and can be easily measured using genome-wide expression microarrays. These expression arrays cover the entire translated transcriptome with 20000-45000 probes. Elucidation of DNA changes via microarray techniques generally requires more probes, depending on the requested resolution. Covering the entire 3 x 10^9 bp human genome requires one or more orders of magnitude more probes than standard expression profiling.
  • DNA-based biomarkers rely on elucidation of the changes in the DNA methylation pattern of (malignant; neoplastic) disease.
  • methylation affects exclusively the cytosine residues of CpG dinucleotides, which are clustered in CpG islands.
  • CpG islands are often found associated with gene-promoter sequences, present in the 5' -untranslated gene regions and are per default unmethylated.
  • an unmethylated CpG island in the associated gene-promoter enables active transcription, but if the island is methylated, gene transcription is blocked.
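To make the notion of CpG clustering concrete, the following minimal Python sketch counts CpG dinucleotides in a sequence and reports the GC content and the widely used observed/expected CpG ratio, (number of CpG x length) / (number of C x number of G). The function name `cpg_stats` and the choice of this ratio are illustrative assumptions; the text above does not specify how CpG islands are delimited.

```python
def cpg_stats(seq):
    """Return (CpG count, GC content, observed/expected CpG ratio) for a DNA string.

    Hypothetical helper for illustration; not part of the described assay.
    """
    seq = seq.upper()
    n = len(seq)
    c_count = seq.count("C")
    g_count = seq.count("G")
    # "CG" cannot overlap with itself, so str.count gives the exact number.
    cpg_count = seq.count("CG")
    gc_content = (c_count + g_count) / n if n else 0.0
    if c_count and g_count:
        obs_exp = (cpg_count * n) / (c_count * g_count)
    else:
        obs_exp = 0.0
    return cpg_count, gc_content, obs_exp
```

A CpG-dense stretch yields a ratio well above that of bulk genomic sequence, which is one way such regions are typically flagged as candidate islands.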
  • the DNA methylation pattern is tissue- and clone-specific and almost as stable as the DNA itself. It is also known that DNA-methylation is an early event in tumorigenesis which would be of interest for early and initial diagnosis of disease.
  • Microarrays for human genome-wide hybridization testing are known, e.g. the Affymetrix Human Genome U133A Array (NCBI Database, Acc. No. GPL96).
  • a goal of the present invention is to provide an alternative and more cost-efficient route to identify suitable markers for cancer diagnostics.
  • the present invention provides a method of determining a subset of diagnostic markers for potentially methylated genes from the genes of gene IDs 1-359 in table 1, suitable for the diagnosis or prognosis of a disease or tumor type in a sample, comprising a) obtaining data of the methylation status of at least 50 random genes selected from the 359 genes of gene ID 1-359 in at least 1 sample, preferably 2, 3, 4 or at least 5 samples, of a confirmed disease or tumor type positive state and at least one sample of a disease or tumor type negative state, b) correlating the results of the obtained methylation status with the disease or tumor type states, c) optionally repeating the obtaining a) and correlating b) steps for at least partially different at least 50 random genes selected from the 359 genes of gene IDs 1-359, and d) selecting as many marker genes which in a classification analysis together yield at least a 65%, preferably at least 70%, correct classification of the disease or tumor type or have a p-value of less than
  • the present invention provides a master set of 359 genetic markers which has been surprisingly found to be highly relevant for aberrant methylation in the diagnosis or prognosis of diseases. It is possible to determine a multitude of marker subsets from this master set which can be used to differentiate between various disease or tumor types.
  • the inventive 359 marker genes of table 1 are: NHLH2, MTHFR, PRDM2, MLLT11, S100A9 (control), S100A9, S100A8 (control), S100A8, S100A2, LMNA, DUSP23, LAMC2, PTGS2, MARK1, DUSP10, PARP1, PSEN2, CLIC4, RUNX3, AIM1L, SFN, RPA2, TP73, TP73 (p73), POU3F1, MUTYH, UQCRH, FAF1, TACSTD2, TNFRSF25, DIRAS3, MSH4, GBP2, GBP2, LRRC8C, F3, NANOS1, MGMT, EBF3, DCLRE1C, KIF5B, ZNF22, PGBD3, SRGN, GATA3, PTEN, MMS19, SFRP5, PGR, ATM, DRD2, CADM1, TEAD1, OPCML, CALCA, CTSD, MYOD1,
  • TP53_CGI36_1kb TP53, NPTX1, SMAD2, DCC, MBD2, ONECUT2, BCL2, SERPINB5, SERPINB2 (control), SERPINB2, TYMS, LAMA1, SALL3, LDLR, STK11, PRDX2, RAD23A, GNA15, ZNF573, SPINT2, XRCC1, ERCC2, ERCC1, C5AR1 (NM_001736), C5AR1, POLD1, ZNF350, ZNF256, C3, XAB2, ZNF559, FHL2, IL1B, IL1B (control), PAX8, DDX18, GAD1, DLX2, ITGA4, NEUROD1, STAT1, TMEFF2, HECW2, BOLL, CASP8, SERPINE2, NCL, CYP1B1, TACSTD1, MSH2, MSH6, MXD1, JAG1, FOXA2, THBD
  • genes only need to be represented once in an inventive marker set (or set of probes or primers therefor), but preferably a second marker, such as a control region, is included (IDs given in the list above relate to the gene ID (or gene loci ID) given in table 1 of the example section).
  • what makes DNA methylation an attractive target for biomarker development is the fact that cell-free methylated DNA can be detected in body fluids like serum, sputum, and urine from patients with cancerous neoplastic conditions and disease.
  • clinical samples have to be available.
  • archived (tissue) samples. Preferably these materials should fulfill the requirements to obtain intact RNA and DNA, but most archives of clinical samples store formalin-fixed paraffin-embedded (FFPE) tissue blocks. This has been the clinico-pathological routine for decades, but such fixed samples are, if at all, only suitable for extraction of low-quality RNA.
  • FFPE formalin fixed paraffin embedded
  • any such samples can be used for the method of generating an inventive subset, including fixed samples.
  • the samples can be of lung, gastric, colorectal, brain, liver, bone, breast, prostate, ovarian, bladder, cervical, pancreas, kidney, thyroid, oesophageal, head and neck, neuroblastoma, skin, nasopharyngeal, endometrial, bile duct, oral, multiple myeloma, leukemia, soft tissue sarcoma, anal, gall bladder, endocrine, mesothelioma, Wilms tumor, testis, bone, duodenum, neuroendocrine, salivary gland, larynx, choriocarcinoma, cardial, small bowel, eye, germ cell cancer. These cancers can then be subsequently diagnosed by the inventive set (or subsets).
  • the present invention provides a multiplexed methylation testing method which 1) outperforms the "classification" success when compared to genome-wide screenings via RNA-expression profiling, 2) enables identification of biomarkers for a wide variety of diseases, without the need to prescreen candidate markers on a genome-wide scale, 3) is suitable for minimally invasive testing, and 4) is easily scalable.
  • the invention presents a targeted multiplexed DNA-methylation test which outperforms genome-scaled approaches (including RNA expression profiling) for disease diagnosis, classification, and prognosis.
  • the inventive set of 359 markers enables selection of a subset of markers from this 359 set which is highly characteristic of a given disease or tumor type.
  • the disease is a neoplastic condition.
  • not only can cancer be diagnosed with the inventive set or given selective subsets thereof, but a wide range of other diseases can also be detected via the DNA methylation changes of the patient.
  • Diseases can be genetic diseases of few, many or all cells in a subject patient (including cancer), or infectious diseases, which lead to altered gene regulation via DNA methylation, e.g. viral, in particular retroviral, infections.
  • the disease is a trisomy, such as trisomy 21.
  • neoplastic conditions, or tumor types include, without being limited thereto, cancer of different origin such as lung, gastric, colorectal, brain, liver, bone, breast, prostate, ovarian, bladder, cervical, pancreas, kidney, thyroid, oesophageal, head and neck, neuroblastoma, skin, nasopharyngeal, endometrial, bile duct, oral, multiple myeloma, leukemia, soft tissue sarcoma, anal, gall bladder, endocrine, mesothelioma, Wilms tumor, testis, bone, duodenum, neuroendocrine, salivary gland, larynx, choriocarcinoma, cardial, small bowel, eye, germ cell cancer.
  • neoplastic conditions or tumor types are e.g. benign (non (or limited) proliferative) or malignant, metastatic or non-metastatic tumors or nodules. It is sometimes possible to differentiate the sample type from which the methylated DNA is isolated, e.g. urine, blood, tissue samples.
  • the present invention is suitable to differentiate diseases, in particular neoplastic conditions, or tumor types. Diseases and neoplastic conditions should be understood in general including benign and malignant conditions. According to the present invention benign nodules (being at least the potential onset of malignancy) are included in the definition of a disease. After the development of a malignancy the condition is a preferred disease to be diagnosed by the markers screened for or used according to the present invention.
  • the present invention is suitable to distinguish benign and malignant tumors (both being considered a disease according to the present invention).
  • the invention can provide markers (and their diagnostic or prognostic use) distinguishing between a normal healthy state together with a benign state on one hand and malignant states on the other hand.
  • the invention is also suitable to differentiate between non-solid cancers including leukemia and healthy states.
  • a diagnosis of a disease may include identifying the difference to a normal healthy state, e.g. the absence of any neoplastic nodules or cancerous cells.
  • the present invention can also be used for prognosis of such conditions, in particular a prediction of the progression of a disease, such as a neoplastic condition, or tumor type.
  • a particularly preferred use of the invention is to perform a diagnosis or prognosis of a metastasising neoplastic disease (distinguished from non-metastasising conditions).
  • prognosis should not be understood in an absolute sense, as in a certainty that an individual will develop cancer or a disease or tumor type (including cancer progression), but as an increased risk to develop cancer or the disease or tumor type or of cancer progression.
  • Prognosis is also used in the context of predicting disease progression, in particular to predict therapeutic results of a certain therapy of the disease, in particular neoplastic conditions, or tumor types.
  • the prognosis of a therapy can e.g. be used to predict a chance of success (i.e. curing a disease) or chance of reducing the severity of the disease to a certain level.
  • markers screened for this purpose are preferably derived from sample data of patients treated according to the therapy to be predicted.
  • the inventive marker sets may also be used to monitor a patient for the emergence of therapeutic results or positive disease progressions.
  • DNA methylation analyses in principle rely either on bisulfite deamination-based methylation detection or on using methylation sensitive restriction enzymes.
  • the restriction enzyme-based strategy is used for elucidation of DNA-methylation changes.
  • Further methods to determine methylated DNA are e.g. given in EP 1 369 493 A1 or US 6,605,432.
  • Combining restriction digestion and multiplex PCR amplification with a targeted microarray-hybridization is a particularly advantageous strategy to perform the inventive methylation test using the inventive marker sets (or subsets).
  • a microarray-hybridization step can be used for reading out the PCR results.
  • statistical approaches for class comparisons and class prediction can be used. Such statistical methods are known from analysis of RNA-expression derived microarray data.
  • an amplification protocol can be used enabling selective amplification of the methylated DNA fraction prior to methylation testing. Subjecting these amplicons to the methylation test, it was possible to successfully distinguish DNA from sensitive cases, e.g. distinguishing leukemia (CML) from normal healthy controls. In addition, it was possible to distinguish breast-cancer patients from healthy normal controls using DNA from serum by the inventive methylation test upon preamplification. Both examples clearly illustrate that the inventive multiplexed methylation testing can be successfully applied when only limited amounts of DNA are available. Thus, this principle might be the preferred method for minimally invasive diagnostic testing.
  • CML leukemia
  • although the 359 marker set test is not a genome-wide test and might be used as it is for diagnostic testing, running a subset of markers (comprising the classifier which enables best classification) would be easier for routine applications.
  • the test is easily scalable.
  • the selected subset of primers/probes could be applied directly to set up the lower multiplexed test (or single PCR test). This was confirmed when serum DNA was tested using a classifier for distinguishing healthy females from individuals with breast tumors (or other specific tumors). Only the specific primers comprising the gene classifier obtained from the methylation test were set up together in multiplexed PCR reactions. Data derived upon hybridization of PCR amplicons were in line with the initial classification. Thus, correct classification with the down-scaled test using only a subset was possible.
  • the inventive methylation test is a suitable tool for differentiation and classification of neoplastic disease.
  • This assay can be used for diagnostic purposes and for defining biomarkers for clinically relevant issues to improve diagnosis of disease, and to classify patients at risk for disease progression, thereby improving disease treatment and patient management.
  • the first step of the inventive method of generating a subset, step a) of obtaining data of the methylation status, preferably comprises determining data of the methylation status, preferably by methylation specific PCR analysis or methylation specific digestion analysis.
  • Methylation specific digestion analysis can include either or both of hybridization of suitable probes for detection to non-digested fragments or PCR amplification and detection of non-digested fragments.
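The logic of detecting non-digested fragments can be sketched as follows: with a methylation-sensitive restriction enzyme, an amplicon remains intact (and can therefore be amplified or hybridized) only if every recognition site inside it is protected by methylation. The sketch below assumes HpaII-like behaviour (recognition site CCGG, cleavage blocked when the internal cytosine is methylated); the enzyme choice, the function name and the 0-based position convention are assumptions for illustration, since the text does not name a specific enzyme.

```python
def survives_digestion(amplicon, methylated_positions, site="CCGG", cpg_offset=1):
    """Return True if no recognition site in the amplicon would be cut.

    amplicon: DNA string covering the PCR target region.
    methylated_positions: set of 0-based indices of methylated cytosines.
    site / cpg_offset: recognition sequence and the offset of the cytosine
        whose methylation blocks cleavage (assumed HpaII-like: C*C*GG,
        internal C at offset 1).
    """
    amplicon = amplicon.upper()
    start = amplicon.find(site)
    while start != -1:
        if (start + cpg_offset) not in methylated_positions:
            return False  # an unprotected site is cut; no intact template remains
        start = amplicon.find(site, start + 1)
    return True  # all sites protected (or no site present at all)
```

An amplicon without any recognition site always "survives", so in this simplified model such a target carries no methylation information and would not be a useful assay region.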
  • the inventive selection can be made by any (known) classification method to obtain a set of markers with the given diagnostic (or also prognostic) value to categorize a certain disease or tumor type.
  • classification methods include class comparisons wherein a specific p-value is selected, e.g. a p-value below 0.1, preferably below 0.08, more preferred below 0.06, in particular preferred below 0.05, below 0.04, below 0.02, most preferred below 0.01.
  • the correlated results for each gene b) are rated by their correct correlation to the disease or tumor type positive state, preferably by p-value test or t-value test or F-test.
  • Rated (best first, i.e. low p- or t-value) markers are then subsequently selected and added to the subset until a certain diagnostic value is reached, e.g. the herein mentioned at least 70% (or more) correct classification of the disease or tumor type.
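The rate-then-add procedure described above can be sketched in a few lines of Python. Markers are sorted by p-value (lowest first) and appended to the subset one at a time until an accuracy estimate reaches the target. The function name, the `estimate_accuracy` callback and the default cap of 20 genes are assumptions for illustration; how the accuracy is estimated (e.g. by cross-validation) is left to the caller.

```python
def select_marker_subset(p_values, estimate_accuracy, target=0.70, max_genes=20):
    """Greedy forward selection of markers ranked by p-value.

    p_values: dict mapping gene name -> p-value from the class comparison.
    estimate_accuracy: callable taking a list of gene names and returning the
        fraction of correctly classified samples (hypothetical callback).
    Returns (subset, accuracy of that subset).
    """
    ranked = sorted(p_values, key=p_values.get)  # best (lowest p) first
    subset, accuracy = [], 0.0
    for gene in ranked[:max_genes]:
        subset.append(gene)
        accuracy = estimate_accuracy(subset)
        if accuracy >= target:
            break  # diagnostic value reached; stop adding markers
    return subset, accuracy
```

If the target is never reached, the function returns the largest subset it tried together with its accuracy, so the caller can decide whether to repeat the data-gathering steps with different genes.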
  • Class Comparison procedures include identification of genes that were differentially methylated among the two classes using a random-variance t-test.
  • the random-variance t-test is an improvement over the standard separate t-test as it permits sharing information among genes about within-class variation without assuming that all genes have the same variance (Wright G. W. and Simon R, Bioinformatics 19:2448-2455,2003).
  • Genes were considered statistically significant if their p value was less than a certain value, e.g. 0.1 or 0.01.
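As a rough illustration of a per-gene class comparison, the sketch below computes a pooled-variance two-sample t-statistic and a two-sided permutation p-value for one gene. This is the plain t-test, not the random-variance t-test of Wright and Simon cited above (which additionally shares variance information across genes); the function names and the permutation-based p-value are assumptions chosen to keep the example dependency-free. Each class needs at least two samples.

```python
import math
import random
from statistics import mean, variance

def pooled_t(a, b):
    """Two-sample t-statistic with pooled variance (equal-variance form)."""
    na, nb = len(a), len(b)
    sp2 = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    diff = mean(a) - mean(b)
    if sp2 == 0:
        # No within-class spread: the statistic is 0 or infinite.
        return 0.0 if diff == 0 else math.copysign(math.inf, diff)
    return diff / math.sqrt(sp2 * (1 / na + 1 / nb))

def permutation_p(values, labels, n_perm=2000, seed=0):
    """Two-sided permutation p-value for one gene; labels are 0 or 1."""
    rng = random.Random(seed)

    def stat(lab):
        a = [v for v, l in zip(values, lab) if l == 1]
        b = [v for v, l in zip(values, lab) if l == 0]
        return abs(pooled_t(a, b))

    observed = stat(labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # class sizes are preserved by shuffling
        if stat(shuffled) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)
```

A gene would then be called significant when its p-value falls below the chosen threshold, for example 0.1 or 0.01 as mentioned above.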
  • a stringent significance threshold can be used to limit the number of false positive findings.
  • a global test can also be performed to determine whether the expression profiles differed between the classes by permuting the labels of which arrays corresponded to which classes.
  • the p-values can be re-computed and the number of genes significant at the e.g. 0.01 level can be noted. The proportion of the permutations that give at least as many significant genes as with the actual data is then the significance level of the global test. If there are more than 2 classes, then the "F-test" instead of the "t-test" should be used.
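The global test can be sketched as a label-permutation loop: count the significant genes on the real labels, reshuffle the labels many times, and report the proportion of shuffles that yield at least as many significant genes. To stay self-contained, the sketch below replaces the per-gene p-value cut-off (e.g. 0.01) with a fixed cut-off on the absolute t-statistic; that substitution, the function names and the data layout are assumptions for illustration.

```python
import math
import random
from statistics import mean, variance

def t_stat(a, b):
    """Pooled-variance two-sample t-statistic."""
    na, nb = len(a), len(b)
    sp2 = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    diff = mean(a) - mean(b)
    if sp2 == 0:
        return 0.0 if diff == 0 else math.copysign(math.inf, diff)
    return diff / math.sqrt(sp2 * (1 / na + 1 / nb))

def count_significant(data, labels, t_cutoff):
    """data: dict gene -> list of methylation values, one per sample."""
    count = 0
    for values in data.values():
        a = [v for v, l in zip(values, labels) if l == 1]
        b = [v for v, l in zip(values, labels) if l == 0]
        if abs(t_stat(a, b)) >= t_cutoff:
            count += 1
    return count

def global_test(data, labels, t_cutoff=3.0, n_perm=1000, seed=0):
    """Return (observed count, proportion of permutations with >= that count)."""
    rng = random.Random(seed)
    observed = count_significant(data, labels, t_cutoff)
    shuffled = list(labels)
    at_least = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if count_significant(data, shuffled, t_cutoff) >= observed:
            at_least += 1
    return observed, at_least / n_perm
```

A small returned proportion indicates that the number of differentially methylated genes seen with the real class labels is unlikely to arise from random labelling.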
  • Class Prediction includes the step of specifying a significance level to be used for determining the genes that will be included in the subset. Genes that are differentially methylated between the classes at a univariate parametric significance level less than the specified threshold are included in the set. It doesn't matter whether the specified significance level is small enough to exclude enough false discoveries. In some problems better prediction can be achieved by being more liberal about the gene sets used as features. The sets may be more biologically interpretable and clinically applicable, however, if fewer genes are included. As part of cross-validation, gene selection is repeated for each training set created in the cross-validation process, in order to provide an unbiased estimate of prediction error. The final model and gene set for use with future data is the one resulting from application of the gene selection and classifier fitting to the full dataset.
  • Models for utilizing gene methylation profile to predict the class of future samples can also be used. These models may be based on the Compound Covariate Predictor (Radmacher et al. Journal of Computational Biology 9:505-511, 2002), Diagonal Linear Discriminant Analysis (Dudoit et al. Journal of the American Statistical Association 97:77-87, 2002), Nearest Neighbor Classification (also Dudoit et al.), and Support Vector Machines with linear kernel (Ramaswamy et al. PNAS USA 98:15149-54, 2001). The models incorporated genes that were differentially methylated among genes at a given significance level (e.g.
  • the prediction error of each model using cross validation is preferably estimated.
  • the entire model building process was repeated, including the gene selection process. It may also be evaluated whether the cross-validated error rate estimate for a model was significantly less than one would expect from random prediction.
  • the class labels can be randomly permuted and the entire leave-one-out cross-validation process is then repeated.
  • the significance level is the proportion of the random permutations that gave a cross-validated error rate no greater than the cross-validated error rate obtained with the real methylation data. About 1000 random permutations may be usually used.
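The cross-validation and permutation steps above can be sketched as follows. For every left-out sample the gene selection is redone on the remaining samples before a classifier is fitted, and the whole procedure is then repeated on randomly permuted labels to obtain a significance level. The nearest-centroid classifier and the mean-difference gene ranking are deliberately simple stand-ins (assumptions) for the predictors and the t-test named in the text; all function names are hypothetical. Each class needs at least two samples.

```python
import random
from statistics import mean

def select_genes(samples, labels, k):
    """Indices of the k genes with the largest absolute class-mean difference."""
    def gap(g):
        pos = [s[g] for s, l in zip(samples, labels) if l == 1]
        neg = [s[g] for s, l in zip(samples, labels) if l == 0]
        return abs(mean(pos) - mean(neg))
    return sorted(range(len(samples[0])), key=gap, reverse=True)[:k]

def nearest_centroid(train, labels, genes, query):
    """Predict the class whose centroid (on the selected genes) is closest."""
    best_label, best_dist = None, None
    for cls in (0, 1):
        members = [s for s, l in zip(train, labels) if l == cls]
        centroid = [mean(m[g] for m in members) for g in genes]
        dist = sum((query[g] - c) ** 2 for g, c in zip(genes, centroid))
        if best_dist is None or dist < best_dist:
            best_label, best_dist = cls, dist
    return best_label

def loocv_error(samples, labels, k=2):
    """Leave-one-out error rate; gene selection is repeated in every fold."""
    errors = 0
    for i in range(len(samples)):
        train = samples[:i] + samples[i + 1:]
        train_labels = labels[:i] + labels[i + 1:]
        genes = select_genes(train, train_labels, k)
        if nearest_centroid(train, train_labels, genes, samples[i]) != labels[i]:
            errors += 1
    return errors / len(samples)

def permutation_significance(samples, labels, k=2, n_perm=1000, seed=0):
    """Proportion of label permutations whose LOOCV error is <= the real one."""
    rng = random.Random(seed)
    real = loocv_error(samples, labels, k)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if loocv_error(samples, shuffled, k) <= real:
            hits += 1
    return real, hits / n_perm
```

Repeating the gene selection inside each fold is what keeps the error estimate unbiased; selecting genes once on the full data and then cross-validating only the classifier would leak information from the held-out sample.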
  • Another classification method is the greedy-pairs method described by Bo and Jonassen (Genome Biology 3(4): research0017.1-0017.11, 2002).
  • the greedy-pairs approach starts with ranking all genes based on their individual t-scores on the training set.
  • the procedure selects the best ranked gene g_i and finds the one other gene g_j that together with g_i provides the best discrimination, using as a measure the distance between centroids of the two classes with regard to the two genes when projected to the diagonal linear discriminant axis.
  • These two selected genes are then removed from the gene set and the procedure is repeated on the remaining set until the specified number of genes have been selected. This method attempts to select pairs of genes that work well together to discriminate the classes.
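A minimal sketch of the greedy-pairs loop follows. Genes are ranked by their individual t-scores, the top gene is paired with the partner that maximizes a pair score, both are removed, and the process repeats. The pair score used here is one plausible reading of "distance between centroids projected to the diagonal linear discriminant axis": the axis weights are the class-mean differences divided by the pooled variances, and the score is the length of the projected centroid difference. That reading, the function names and the data layout are assumptions, not taken from the cited paper.

```python
import math
from statistics import mean, variance

def class_stats(values, labels):
    """Return (class-mean difference, pooled within-class variance)."""
    a = [v for v, l in zip(values, labels) if l == 1]
    b = [v for v, l in zip(values, labels) if l == 0]
    pooled = ((len(a) - 1) * variance(a) + (len(b) - 1) * variance(b)) / (len(a) + len(b) - 2)
    return mean(a) - mean(b), max(pooled, 1e-12)  # guard against zero variance

def t_score(values, labels):
    """Absolute pooled-variance t-statistic of one gene (labels are 0 or 1)."""
    diff, pooled = class_stats(values, labels)
    n1 = sum(labels)
    n0 = len(labels) - n1
    return abs(diff) / math.sqrt(pooled * (1 / n1 + 1 / n0))

def pair_score(x, y, labels):
    """Centroid distance after projection onto the assumed DLD axis."""
    dx, vx = class_stats(x, labels)
    dy, vy = class_stats(y, labels)
    wx, wy = dx / vx, dy / vy
    norm = math.hypot(wx, wy)
    return 0.0 if norm == 0 else abs(wx * dx + wy * dy) / norm

def greedy_pairs(data, labels, n_pairs):
    """data: dict gene -> list of values per sample. Returns a list of gene pairs."""
    remaining = dict(data)
    pairs = []
    while len(pairs) < n_pairs and len(remaining) >= 2:
        best = max(remaining, key=lambda g: t_score(remaining[g], labels))
        partner = max((g for g in remaining if g != best),
                      key=lambda g: pair_score(remaining[best], remaining[g], labels))
        pairs.append((best, partner))
        del remaining[best], remaining[partner]
    return pairs
```

Because the partner is chosen for its joint behaviour with the top gene rather than for its own rank, a gene with a modest individual score can still be selected when it complements the leader well.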
  • a binary tree classifier for utilizing gene methylation profile can be used to predict the class of future samples.
  • the first node of the tree incorporated a binary classifier that distinguished two subsets of the total set of classes.
  • the individual binary classifiers were based on the "Support Vector Machines" incorporating genes that were differentially expressed among genes at the significance level (e.g. 0.01, 0.05 or 0.1) as assessed by the random variance t-test (Wright G. W. and Simon R. Bioinformatics 19:2448-2455, 2003). Classifiers for all possible binary partitions are evaluated and the partition selected is that for which the cross-validated prediction error is minimum. The process is then repeated successively for the two subsets of classes determined by the previous binary split.
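Evaluating "all possible binary partitions" at a tree node amounts to enumerating every split of the class set into two non-empty groups, of which there are 2^(n-1) - 1 for n classes. The sketch below enumerates them and picks the one with the lowest error according to a caller-supplied function; the names `binary_partitions`, `best_partition` and the `cv_error` callback are hypothetical, and the callback stands in for fitting and cross-validating a binary classifier.

```python
from itertools import combinations

def binary_partitions(classes):
    """Yield every split of `classes` into two non-empty groups, each once.

    The first class is pinned to the left group so that mirrored splits
    (left and right swapped) are not produced twice.
    """
    first, rest = classes[0], list(classes[1:])
    for r in range(len(rest)):  # r < len(rest) keeps the right group non-empty
        for combo in combinations(rest, r):
            left = (first,) + combo
            right = tuple(c for c in rest if c not in combo)
            yield left, right

def best_partition(classes, cv_error):
    """Return the partition minimizing cv_error(left, right)."""
    return min(binary_partitions(classes), key=lambda lr: cv_error(*lr))
```

Applying `best_partition` recursively to each resulting group that still contains more than one class yields the full binary tree described above.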
  • the prediction error of the binary tree classifier can be estimated by cross-validating the entire tree building process. This overall cross-validation included re-selection of the optimal partitions at each node and re-selection of the genes used for each cross-validated training set as described by Simon et al. (Simon et al. Journal of the National Cancer Institute 95:14-18, 2003). 10-fold cross-validation can be utilized, in which one-tenth of the samples is withheld, a binary tree is developed on the remaining 9/10 of the samples, and class membership is then predicted for the 10% of the samples withheld. This is repeated 10 times, each time withholding a different 10% of the samples. The samples are randomly partitioned into 10 test sets (Simon R and Lam A. BRB-ArrayTools User Guide, version 3.2. Biometric Research Branch, National Cancer Institute).
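The 10-fold scheme can be sketched as a random partition of the sample indices into ten test sets, followed by a loop that refits on the other nine tenths each time. The function names and the `fit_and_predict` callback are assumptions for illustration; the callback is where the complete model building (gene selection, partition selection, classifier fitting) would be rerun on the training indices only.

```python
import random

def k_fold_indices(n_samples, k=10, seed=0):
    """Randomly partition sample indices into k disjoint test sets."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]  # fold sizes differ by at most one

def cross_validated_error(labels, fit_and_predict, k=10, seed=0):
    """Estimate the prediction error by k-fold cross-validation.

    fit_and_predict(train_idx, test_idx) must build the whole model on the
    training indices and return one predicted label per test index
    (hypothetical callback).
    """
    n = len(labels)
    errors = 0
    for test_idx in k_fold_indices(n, k, seed):
        held_out = set(test_idx)
        train_idx = [i for i in range(n) if i not in held_out]
        predictions = fit_and_predict(train_idx, test_idx)
        errors += sum(p != labels[i] for p, i in zip(predictions, test_idx))
    return errors / n
```

Every sample is withheld exactly once, so the returned value is the fraction of samples misclassified when they were not part of the training data.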
  • the correlated results for each gene b) are rated by their correct correlation to the disease or tumor type positive state, preferably by p-value test. It is also possible to include a step in that the genes are selected d) in order of their rating.
  • the subset selection preferably results in a subset with at least 60%, preferably at least 65%, at least 70%, at least 75%, at least 80% or even at least 85%, at least 90%, at least 92%, at least 95%, in particular preferred 100% correct classification of test samples of the disease or tumor type.
  • Such levels can be reached, if necessary, by repeating steps a) and b) of the inventive method as provided in step c).
  • marker genes with a significance value of at most 0.1, preferably at most 0.08, even more preferred at most 0.06, at most 0.05, at most 0.04, at most 0.02, or more preferred at most 0.01 are selected.
  • the at least 50 genes of step a) are at least 70, preferably at least 90, at least 100, at least 120, at least 140, at least 160, at least 180, at least 190, at least 200, at least 220, at least 240, at least 260, at least 280, at least 300, at least 320, at least 340, at least 350 or all, genes.
  • since the subset should be small, it is preferred that not more than 60, or not more than 40, preferably not more than 30, in particular preferred not more than 20, marker genes are selected in step d) for the subset.
  • the present invention provides a method of identifying a disease or tumor type in a sample comprising DNA from a patient, comprising providing a diagnostic subset of markers identified according to the method depicted above, determining the methylation status of the genes of the subset in the sample and comparing the methylation status with the status of a confirmed disease or tumor type positive and/or negative state, thereby identifying the disease or tumor type in the sample.
  • the methylation status can be determined by any method known in the art including methylation dependent bisulfite deamination (and consequently the identification of mC - methylated C - changes by any known methods, including PCR and hybridization techniques).
  • the methylation status is determined by methylation specific PCR analysis, methylation specific digestion analysis and either or both of hybridisation analysis to non-digested or digested fragments or PCR amplification analysis of non-digested fragments.
  • the methylation status can also be determined by any probes suitable for determining the methylation status including DNA, RNA, PNA, LNA probes which optionally may further include methylation specific moieties.
  • methylation status can be particularly determined by using hybridisation probes or amplification primers (preferably PCR primers) specific for methylated regions of the inventive marker genes. Discrimination between methylated and non-methylated genes, including the determination of the methylation amount or ratio, can be performed by using e.g. either one of these tools.
  • the determination using only specific primers aims at specifically amplifying methylated (or in the alternative non-methylated) DNA. This can be facilitated by using (methylation dependent) bisulfite deamination, methylation specific enzymes or by using methylation specific nucleases to digest methylated (or alternatively non-methylated) regions, so that consequently only the non-methylated (or alternatively methylated) DNA is obtained.
  • a genome chip or simply a gene chip including hybridization probes for all genes of interest such as all 359 marker genes
  • all amplification or non-digested products are detected, i.e. discrimination between methylated and non-methylated states as well as gene selection (the inventive set or subset) is performed before the step of detection on a chip.
  • Either set, a set of probes or a set of primers can be used to obtain the relevant methylation data of the genes of the present invention. Of course, both sets can be used.
  • the method according to the present invention may be performed by any method suitable for the detection of methylation of the marker genes.
  • the determination of the gene methylation is preferably performed with a DNA-chip, real-time PCR, or a combination thereof.
  • the DNA chip can be a commercially available general gene chip (also comprising a number of spots for the detection of genes not related to the present method) or a chip specifically designed for the method according to the present invention (which predominantly comprises marker gene detection spots) .
  • the methylated DNA of the sample is detected by a multiplexed hybridization reaction.
  • a methylated DNA is preamplified prior to hybridization, preferably also prior to methylation specific amplification, or digestion.
  • the amplification reaction is multiplexed (e.g. multiplex PCR).
  • the inventive methods are particularly suitable to detect low amounts of methylated DNA of the inventive marker genes.
  • the DNA amount in the sample is below 500ng, below 400ng, below 300ng, below 200ng, below 100ng, below 50ng or even below 25ng.
  • the inventive method is particularly suitable to detect low concentrations of methylated DNA of the inventive marker genes.
  • the DNA amount in the sample is below 500ng, below 400ng, below 300ng, below 200ng, below 100ng, below 50ng or even below 25ng, per ml sample.
  • the present invention provides a subset comprising or consisting of nucleic acid primers or hybridization probes being specific for a potentially methylated region of at least marker genes selected from one of the following groups a) CHRNA9, RPA2, CPEB4, CASP8, MSH2, ACTB, CTCFL, TPM2, SERPINB5, PIWIL4, NTF3, CDK2AP1 b) IGF2, KCNQ1, SCGB3A1, EFS, BRCA1, ITGA4, H19, PTTG1 c) KRT17, IGFBP7, RHOXF1, CLIC4, TP53, DLX2, ITGA4, AIM1L, SERPINl, SERPIN2, TP53, XIST, TEAD1, CDKN2A, CTSD, OPCML, RPA2, BRCA2, CDH1, S100A9, SERPINB2, BCL2A1, UNC13B, ABL1, TIMP1, ATM, FBXW7, SFRP5,
  • TP53 k) ARRDC4, DUSP1, SMAD9, HOXA10, C3, ADRB2, BRCA2, SYK l) PITX2, MT3, RPA3, TNFRSF10D, PTEN, TP53, PAX8, TGFBR2,
  • HIC1 HIC1, CALCA, PSAT1, MBD2, NTF3, PLAGL1, F2R, GJB2, ARRDC4,
  • NTHL1 m) MT3, RPA3, TNFRSF10D, HOXA1, C13orf15, TGFBR2, HIC1, CALCA,
  • PSAT1 NTF3, PLAGL1, F2R, GJB2, ARRDC4, NTHL1 n) PITX2, PAX8, CD24, TP53, ESR1, TNFRSF10D, RAD23A, SCGB3A1.
  • RARB TP53, LZTS1 o) DUSP1, TFPI2, TJP2, S100A9, BAZ1A, CPEB4, AIM1L, CDKN2A,
  • PINB5 PIWIL4, NTF3, CDK2AP1 r
  • IGF2 KCNQ1
  • SCGB3A1 EFS
  • BRCA1 BRCA1
  • ITGA4 H19
  • PTTG1 s) KRT17
  • AQP3 TP53
  • ZNF462 NEUROG1
  • GATA3, MT1A JUP
  • RGC32 SPINT2, DUSP1 t) NCL, XPA, MYOD1, PITX2 u) SPARC, PIWIL4, SERPINB5, TEAD1, EREG, ZDHHC11, C5orf4 v) HSD17B4, DSP, SPARC, KRT17, SRGN, C5orf4, PIWIL4, SERPINB5,
  • ACTB EFS, PARP2, TP73, HIC1, BCL2A1, CRABP1, CXADR, BDNF, COL1A1 tt) EFS, ACTB, BCL2A1, TP73, HIC1, SERPINI1, CXADR uu) ACTB, TP73, SERPINI1, CXADR, HIC1, BCL2A1, EFS vv) FBXL13, PITX2, NKX2-1, IGF2, C5AR1, SPARC, RUNX3, CHST11, CHRNA9, ZNF462, HSD17B4, UNG, TJP2, ERBB2, SOX15, ERCC8, CDX1, ANXA3, CDH1, CHFR, TACSTD1, MT1A ww) TP53, PTTG1, VHL, TP53, S100A2, ZNF573, RDH10, TSHR, MYO5C, MBD2,
  • TNFRSF25 HIC1
  • LAMC2 LAMC2
  • SPARC WT1
  • PITX2 PITX2
  • GNA15 GNA15
  • ESR1 KL
  • HIC1 xx) HIC1, LAMC2, SPARC, WT1, PITX2, GNA15, KL, HIC1 yy) HIC1, KL, ESR1
  • the present inventive set also includes sets with at least 50% of the above markers for each set since it is also possible to substitute parts of these subsets being specific for - in the case of binary conditions/differentiations - e.g. good or bad prognosis or distinguish between diseases or tumor types, wherein one part of the subset points into one direction for a certain tumor type or disease/differentiation. It is possible to further complement the 50% part of the set by additional markers specific for determining the other part of the good or bad differentiation or differentiation between two diseases or tumor types. Methods to determine such complementing markers follow the general methods as outlined herein .
  • Each of these marker subsets is particularly suitable to diagnose a certain disease or tumor type or distinguish between a certain disease or tumor type in a methylation specific assay of these genes.
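The "at least 50% of the above markers" criterion can be illustrated with a short check. This is only a sketch: the function name `covers_set` and the example panels are hypothetical, and set yy (HIC1, KL, ESR1) is used purely because it is the shortest set listed.

```python
def covers_set(panel, marker_set, min_fraction=0.5):
    """Return True if `panel` contains at least `min_fraction` of `marker_set`.

    Illustrative helper (not part of the source text); markers are compared
    by gene symbol, and additional markers in `panel` are allowed.
    """
    markers = set(marker_set)
    if not markers:
        raise ValueError("marker_set must not be empty")
    shared = markers & set(panel)
    return len(shared) / len(markers) >= min_fraction


# Set yy as listed above; the panels below are made-up examples.
SET_YY = ["HIC1", "KL", "ESR1"]
print(covers_set(["HIC1", "KL", "TP53"], SET_YY))  # 2 of 3 markers shared
print(covers_set(["KL"], SET_YY))                  # 1 of 3 markers shared
```

The same check applies to the stricter 60% to 90% thresholds mentioned for the individual sets by changing `min_fraction`.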
  • nucleic acid primers or hybridization probes being specific for a potentially methylated region of marker genes selected from at least 180, preferably at least 200, more preferred at least 220, in particular preferred at least 240, even more preferred at least 260, most preferred at least 280, or even at least 300, preferably at least 320 or at least 340, or at least 360, marker genes of table 1.
  • the set may comprise even more primers or hybridization probes not given in table 1.
  • the inventive primers or probes may be of any nucleic acid, including RNA, DNA, PNA (peptide nucleic acids) , LNA (locked nucleic acids) .
  • the probes might further comprise methylation specific moieties.
  • the present invention provides a (master) set of 360 marker genes, further also specific gene locations by the PCR products of these genes wherein significant methylation can be detected, as well as subsets therefrom with a certain diagnostic value to distinguish specific disease or tumor type.
  • the set is optimized for a certain disease or tumor type.
  • Cancer types include, without being limited thereto, cancer of different origin such as leukemia, a soft tissue cancer, for example breast cancer, colorectal cancer, head or neck cancer, cervical, prostate, thyroid, brain, eye or pancreatic cancer. Further indicators differentiating between disease or tumor type are e.g. benign (non (or limited) proliferative) or malignant, metastatic or non-metastatic .
  • the set can also be optimized for a specific sample type in which the methylated DNA is tested.
  • samples include blood, urine, saliva, hair, skin, tissues, in particular tissues of the cancer origin mentioned above, in particular breast or thyroid tissue.
  • the sample may be obtained from a patient to be diagnosed.
  • the test sample to be used in the method of identifying a subset is from the same type as a sample to be used in the diagnosis.
  • probes specific for potentially aberrant methylated regions are provided, which can then be used for the diagnostic method.
  • primers suitable for a specific amplification like PCR, of these regions in order to perform a diagnostic test on the methylation state.
  • Such probes or primers are provided in the context of a set corresponding to the inventive marker genes or marker gene loci as given in table 1.
  • Such a set of primers or probes may have all 359 inventive markers present and can then be used for a multitude of different cancer detection methods. Of course, not all markers would have to be used to diagnose a certain disease or tumor type. It is also possible to use certain subsets (or combinations thereof) with a limited number of marker probes or primers for diagnosis of certain categories of cancer.
  • the present invention provides sets of primers or probes comprising primers or probes for any single marker subset or any combination of marker subsets disclosed herein.
  • sets of marker genes should be understood to include sets of primer pairs and probes therefor, which can e.g. be provided in a kit.
  • normal or benign states including struma nodosa and follicular adenoma
  • malign states in particular follicular thyroid carcinoma, papillary thyroid carcinoma
  • benign states including struma nodosa and follicular adenoma
  • malign states in particular follicular thyroid carcinoma, papillary thyroid carcinoma and medullary thyroid carcinoma
  • SMAD3, NANOS1, TERT, BCL2, SPARC, SFRP2, MGMT, MYOD1, LAMA1 and sets with at least 50%, preferably at least 60%, at least 70%, at least 80% or at least 90% of these markers can be used to diagnose thyroid carcinoma and distinguish between normal or benign states (including struma nodosa and follicular adenoma) together with malign states (in particular follicular thyroid carcinoma and papillary thyroid carcinoma) against medullary thyroid carcinoma.
  • benign states including struma nodosa and follicular adenoma
  • malign states in particular follicular thyroid carcinoma and papillary thyroid carcinoma
  • TJP2, CALCA, PITX2, TFPI2, CDKN2B and sets with at least 50%, preferably at least 60%, at least 70%, at least 80% or at least 90% of these markers can be used to diagnose thyroid carcinoma and distinguish between malign states (in particular follicular thyroid carcinoma and papillary thyroid carcinoma) together with follicular adenoma against struma nodosa.
  • Set k, ARRDC4, DUSP1, SMAD9, HOXA10, C3, ADRB2, BRCA2, SYK and sets with at least 50%, preferably at least 60%, at least 70%, at least 80% or at least 90% of these markers can be used to diagnose thyroid carcinoma and distinguish between follicular thyroid carcinoma and papillary thyroid carcinoma.
  • NTHL1 Set m, MT3, RPA3, TNFRSF10D, HOXA1, C13orf15, TGFBR2, HIC1, CALCA, PSAT1, NTF3, PLAGL1, F2R, GJB2, ARRDC4, NTHL1 and sets with at least 50%, preferably at least 60%, at least 70%, at least 80% or at least 90% of these markers can be used to diagnose thyroid carcinoma and distinguish between follicular adenoma (benign) and follicular thyroid carcinoma (malign).
  • Set r, IGF2, KCNQ1, SCGB3A1, EFS, BRCA1, ITGA4, H19, PTTG1 and sets with at least 50%, preferably at least 60%, at least 70%, at least 80% or at least 90% of these markers can be used to diagnose breast cancer, distinguish between breast cancer and healthy breast tissue and additionally to distinguish lobular breast carcinoma from ductal breast carcinoma.
  • Set t, NCL, XPA, MYOD1, PITX2 and sets with at least 50%, preferably at least 60%, at least 70%, at least 80% or at least 90% of these markers can be used to diagnose breast cancer, distinguish between breast cancer and healthy breast tissue and additionally to distinguish lobular breast carcinoma from ductal breast carcinoma.
  • TIMP1, COL21A1, COL1A2, KL, CDKN2A and sets with at least 50%, preferably at least 60%, at least 70%, at least 80% or at least 90% of these markers can be used to diagnose breast cancer and are additionally particularly suitable to distinguish between metastasising and non-metastasising cancer.
  • Set x, TIMP1, COL21A1, COL1A2 and sets with at least 50%, preferably at least 60%, at least 70%, at least 80% or at least 90% of these markers can be used to diagnose breast cancer and are additionally particularly suitable to distinguish between metastasising and non-metastasising cancer.
  • Set z, TDRD6, XIST, LZTS1, IRF4 and sets with at least 50%, preferably at least 60%, at least 70%, at least 80% or at least 90% of these markers can be used to diagnose breast cancer and are additionally particularly suitable to distinguish between metastasising and non-metastasising cancer.
  • Set aa, TIMP1, COL21A1, COL1A2, KL, CDKN2A and sets with at least 50%, preferably at least 60%, at least 70%, at least 80% or at least 90% of these markers can be used to diagnose cancerous metastases in bone, liver and lung and are additionally particularly suitable to distinguish between metastasising and non-metastasising cancer, in particular from primary breast cancer.
  • Set bb, DSP, AR, IGF2, MSX1, SERPINE1 and sets with at least 50%, preferably at least 60%, at least 70%, at least 80% or at least 90% of these markers can be used to diagnose cancerous metastases in bone, liver and lung and are additionally particularly suitable to distinguish metastasising cancer in liver from metastasising cancer in bone and lung, in particular from primary breast cancer.
  • Set cc, FHL1, LMNA, GDNF and sets with at least 50%, preferably at least 60%, at least 70%, at least 80% or at least 90% of these markers can be used to diagnose cancer in bone, liver and lung and to distinguish between metastasising and non-metastasising cancer, in particular to distinguish metastases in liver from metastases in bone and lung.
  • Set dd, FBXW7, GNAS, KRT14 and sets with at least 50%, preferably at least 60%, at least 70%, at least 80% or at least 90% of these markers can be used to diagnose cancer in bone, liver and lung and to distinguish between metastasising and non-metastasising cancer, in particular to distinguish metastases in liver and bone from metastases in lung.
  • Set ee, CHFR, AR, RBP1, MSX1, COL21A1, FHL1, RARB and sets with at least 50%, preferably at least 60%, at least 70%, at least 80% or at least 90% of these markers can be used to diagnose cancer in bone and liver and to distinguish between metastasising and non-metastasising cancer, in particular to distinguish metastases in bone from metastases in liver.
  • Set ff, DCLRE1C, MLH1, RARB, OGG1, SNRPN, ITGA4 and sets with at least 50%, preferably at least 60%, at least 70%, at least 80% or at least 90% of these markers can be used to diagnose cancer in liver and to distinguish between metastasising and non-metastasising cancer, in particular to distinguish metastasising liver cancer from non-metastasising cancer.
  • Set gg, FHL1, LMNA, GDNF and sets with at least 50%, preferably at least 60%, at least 70%, at least 80% or at least 90% of these markers can be used to diagnose cancer in bone, liver and lung and to distinguish between metastasising and non-metastasising cancer, in particular to distinguish metastases in liver from metastases in bone and lung.
  • Set hh, FBXW7, GNAS, KRT14 and sets with at least 50%, preferably at least 60%, at least 70%, at least 80% or at least 90% of these markers can be used to diagnose cancer in bone, liver and lung and to distinguish between metastasising and non-metastasising cancer, in particular to distinguish metastases in liver and bone from metastases in lung.
  • Set ii, CHFR, AR, RBP1, MSX1, COL21A1, FHL1, RARB and sets with at least 50%, preferably at least 60%, at least 70%, at least 80% or at least 90% of these markers can be used to diagnose cancer in bone and liver and to distinguish between metastasising and non-metastasising cancer, in particular to distinguish metastases in bone from metastases in liver.
  • Set ll, SFN, BAZ1A, DIRAS3, CTCFL, ARMCX2, GBP2, MAGEB2, NEUROD2 and sets with at least 50%, preferably at least 60%, at least 70%, at least 80% or at least 90% of these markers can be used to identify breast cancer, in particular in serum samples.
  • genes common to sets qq), rr), ss), tt) and uu) are used to diagnose trisomy, in particular trisomy 21.
  • this set is suitable to distinguish between cancerous cells of breast cancer and normal blood samples.
  • This set allows an easy blood test, which may comprise disseminated cancerous cells.
  • the present invention furthermore provides additional subsets suitable to detect and diagnose breast cancer by using any at least 3, at least 4, at least 5, at least 6, at least 7, at least 8, at least 9, at least 10 or more markers of the above set ww.
  • sub-subsets have been preferably validated according to any methods disclosed therein, in particular any cross-validation methods providing a positive classification for the diagnosis of breast cancer (in comparison to non cancerous samples) as mentioned above for step d), in particular having a p-value of less than 0.1, preferably less than 0.05, even more preferred less than 0.01, in a random-variance t-test.
  • this set is suitable to distinguish between cancerous cells of breast cancer and normal blood samples.
  • This set allows an easy blood test, which may comprise disseminated cancerous cells.
  • the set is used in a test together with control markers such as MARK1, PARP1, NHLH2, PSEN2, MTHFR, POS Biotin Control RET, DUSP10.
  • Set yy, HIC1, KL, ESR1 and sets with at least 50%, preferably at least 60%, at least 70%, at least 80% or at least 90% of these markers can be used to diagnose breast cancer, in particular by using blood samples or samples derived from blood, including serum.
  • this set is suitable to distinguish between cancerous cells of breast cancer and normal blood samples. This set allows an easy blood test, which may comprise disseminated cancerous cells.
  • subsets a) to yy) in particular sets comprising markers of 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15 or more of these subsets, preferably for the same disease or tumor type like breast, lung, liver, bone or thyroid cancer or trisomy 21 or arthritis, preferably complete sets a) to yy) .
  • the methylation of at least two genes is determined.
  • when the present invention is provided as an array test system, at least ten, especially at least fifteen, genes are preferred.
  • test set-ups for example in microarrays ("gene-chips")
  • preferably at least 20, even more preferred at least 30, especially at least 40 genes are provided as test markers.
  • these markers or the means to test the markers can be provided in a set of probes or a set of primers, preferably both.
  • the set comprises up to 100000, up to 90000, up to 80000, up to 70000, up to 60000 or 50000 probes or primer pairs (set of two primers for one amplification product), preferably up to 40000, up to 35000, up to 30000, up to 25000, up to 20000, up to 15000, up to 10000, up to 7500, up to 5000, up to 3000, up to 2000, up to 1000, up to 750, up to 500, up to 400, up to 300, or even more preferred up to 200 probes or primers of any kind, in particular in the case of immobilized probes on a solid surface such as a chip.
  • primer pairs and probes are specific for a methylated upstream region of the open reading frame of the marker genes.
  • the probes or primers are specific for a methylation in the genetic regions defined by SEQ ID NOs 1081 to 1440, including the adjacent up to 500 base pairs, preferably up to 300, up to 200, up to 100, up to 50 or up to 10 adjacent, corresponding to gene marker IDs 1 to 359 of table 1, respectively.
  • probes or primers of the inventive set are specific for the regions and gene loci identified in table 1, last column with reference to the sequence listing, SEQ ID NOs: 1081 to 1440.
  • these SEQ IDs correspond to a certain gene, the latter being a member of the inventive sets, in particular of the subsets a) to yy).
  • the set of the present invention comprises probes or primers for at least one gene or gene product of the list according to table 1, wherein at least 10%, at least 15%, at least 20%, at least 25%, at least 30%, at least 35%, at least 40%, at least 45%, at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, especially preferred 100%, of the total probes or primers are probes or primers for genes of the list according to table 1.
  • the set, in particular in the case of a set of hybridization probes, is provided immobilized on a solid surface, preferably a chip or in form of a microarray.
  • gene chips using DNA molecules for detection of methylated DNA in the sample
  • Such gene chips also allow detection of a large number of nucleic acids.
  • the set is provided on a solid surface, in particular a chip, whereon the primers or probes can be immobilized.
  • Solid surfaces or chips may be of any material suitable for the immobilization of biomolecules such as the moieties, including glass, modified glass (aldehyde modified) or metal chips .
  • the primers or probes can also be provided as such, including lyophilized forms or being in solution, preferably with suitable buffers.
  • the probes and primers can of course be provided in a suitable container, e.g. a tube or micro tube.
  • the present invention also relates to a method of identifying a disease or tumor type in a sample comprising DNA from a subject or patient, comprising obtaining a set of nucleic acid primers (or primer pairs) or hybridization probes as defined above (comprising each specific subset or combinations thereof), determining the methylation status of the genes in the sample for which the members of the set are specific, and comparing the methylation status of the genes with the status of a confirmed disease or tumor type positive and/or negative state, thereby identifying the disease or tumor type in the sample.
  • a set of nucleic acid primers (or primer pairs) or hybridization probes as defined above (comprising each specific subset or combinations thereof)
  • determining the methylation status of the genes in the sample for which the members of the set are specific, and comparing the methylation status of the genes with the status of a confirmed disease or tumor type positive and/or negative state, thereby identifying the disease or tumor type in the sample.
  • inventive marker set including certain disclosed subsets and subsets, which can be identified with the methods disclosed herein, are suitable to distinguish between lung, gastric, colorectal, brain, liver, bone, breast, prostate, ovarian, bladder, cervical, pancreas, kidney, thyroid, oesophageal, head and neck, neuroblastoma, skin, nasopharyngeal, endometrial, bile duct, oral, multiple myeloma, leukemia, soft tissue sarcoma, anal, gall bladder, endocrine, mesothelioma, Wilms tumor, testis, bone, duodenum, neuroendocrine, salivary gland, larynx, choriocarcinoma, cardial, small bowel, eye, germ cell cancer, cancer from benign conditions, in particular for diagnostic or prognostic uses.
  • the markers used (e.g. by utilizing primers or probes of the inventive set) for the inventive diagnostic or prognostic method may be used in smaller amounts than e.g. in the set (or kit) or chip as such, which may be designed for more than one fine tuned diagnosis or prognosis.
  • the markers used for the diagnostic or prognostic method may be up to 100000, up to 90000, up to 80000, up to 70000, up to 60000 or 50000, preferably up to 40000, up to 35000, up to 30000, up to 25000, up to 20,000, up to 15000, up to 10000, up to 7500, up to 5000, up to 3000, up to 2000, up to 1000, up to 750, up to 500, up to 400, up to 300, up to 200, up to 100, up to 80, or even more preferred up to 60.
  • inventive marker set including certain disclosed subsets, which can be identified with the methods disclosed herein, are suitable to distinguish thyroid cancer from benign thyroid tissue, in particular for diagnostic or prognostic uses.
  • inventive marker set including certain disclosed subsets, which can be identified with the methods disclosed herein, are suitable to distinguish breast cancer from normal tissue and benign breast tumors, in particular for diagnostic or prognostic uses.
  • inventive marker set including certain disclosed subsets, which can be identified with the methods disclosed herein, are suitable to distinguish hereditary from sporadic breast cancer, in particular for diagnostic or prognostic uses.
  • inventive marker set including certain disclosed subsets, which can be identified with the methods disclosed herein, are suitable to distinguish breast cancer responsive to herceptin treatment from likely non-responders, in particular for diagnostic or prognostic uses.
  • Figure 1 A 961-gene classifier derived from genome-wide expression profiling enables differentiation of a group of patients with (yes) and without (no) metastases during follow up of patients suffering from breast cancer upon analyses of primary tumor tissues. Dendrogram obtained from clustering experiments using centered correlation (values shown on the vertical axis).
  • Figure 2 Performance of expression profiling versus CpG360 methylation. Correct classification (%) using 7 different classification tests is depicted for a 961-gene classifier, a targeted set of 385 genes (Lauss 2007), and a 4-gene DNA-methylation classifier derived from the methylation test (CpG360A). Although consisting of only 4 genes, the methylation-based classifier performs best.
  • FIG. 3 Multidimensional scaling using the 19 gene classifier for serum testing of breast tumors illustrates good classification of tumor versus healthy controls.
  • Methylation data from DNA-samples of benign tumors (B), the breast cancer cell line MCF7, normal females (NormF) and males (NormM) and several breast cancer patients (Tu) were derived from DNA upon preamplification of the methylated DNA; several normal controls (Norm_direct) were tested without preamplification.
  • Figure 4 Shows class prediction using PAMR (predicting analysis of microarrays) to determine the minimum subset of the 359 marker genes of table 1.
  • the minimal set contains only 3 markers (set yy). Further combinations resulted in the same misclassification error of 0%.
  • Figure 5 Dendrogram for clustering experiments, using centered correlation and average linkage.
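For readers unfamiliar with the clustering terms used in the figure legends, the following sketch shows one common reading of them: "centered correlation" taken as the Pearson correlation of two profiles, with 1 minus that value as the distance, and "average linkage" taken as the mean pairwise distance between two clusters. The function names and toy profiles are illustrative assumptions, not taken from the source.

```python
import math


def centered_correlation(x, y):
    """Pearson (mean-centered) correlation between two equal-length profiles."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("profiles must have the same length (at least 2)")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return sxy / math.sqrt(sxx * syy)


def average_linkage(cluster_a, cluster_b, profiles):
    """Mean pairwise distance (1 - correlation) between two clusters of sample ids."""
    distances = [
        1.0 - centered_correlation(profiles[i], profiles[j])
        for i in cluster_a
        for j in cluster_b
    ]
    return sum(distances) / len(distances)


# Toy signal profiles (made up): "b" follows "a", "c" runs opposite to "a".
profiles = {"a": [1.0, 2.0, 3.0], "b": [2.0, 4.0, 6.0], "c": [3.0, 2.0, 1.0]}
print(average_linkage(["a"], ["b"], profiles))
print(average_linkage(["a"], ["b", "c"], profiles))
```

An agglomerative procedure would repeatedly merge the two clusters with the smallest average-linkage distance; the merge heights are what a dendrogram's vertical axis displays.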
  • Table 1 360 master set (with the 359 marker genes and one control) and sequence annotation
  • Samples from solid tumors were derived from initial surgical resection of primary tumors. Tumor tissue sections were derived from histopathology, and histopathological data as well as clinical data were monitored over the time of clinical management of the patients and/or collected from patient reports in the study center. Anonymised data were provided.
  • Tissue samples were homogenized in a FASTPREP homogenizer (MP Biomedicals, Eschwege, Germany) in lysis buffer provided with the Qiagen "All Prep" nucleic acid preparation kit (Qiagen, Hilden, Germany) .
  • DNA and RNA concentrations were measured on a Nanodrop photometer.
  • RNA quality was controlled using a BioAnalyser (Agilent, Waldbronn, Germany). All conditions were according to manufacturer's recommendations.
  • RNA samples derived from breast cancer tissue were analyzed with 44k human whole genome oligo microarrays (Agilent Technologies) .
  • RNA expression levels from different samples were analyzed on a single microarray using the Single-Color Low RNA Input Linear Amplification Kit PLUS (Agilent Technologies, Waldbronn, Germany) .
  • 200ng of total RNA were employed and amplified samples were prepared for hybridization using the Gene Expression Hybridization Kit (Agilent Technologies).
  • Hybridization was performed overnight at 65°C in a rotating hybridization oven (Agilent Technologies). Stringency washes, image acquisition and feature extraction were performed according to the manufacturer's protocol (Agilent Technologies, Waldbronn, Germany).
  • the inventive assay is a multiplexed assay for DNA methylation testing of up to (or even more than) 360 methylation candidate markers, enabling convenient methylation analyses for tumor-marker definition.
  • the test is a combined multiplex-PCR and microarray hybridization technique for multiplexed methylation testing.
  • inventive marker genes, PCR primer sequences, hybridization probe sequences and expected PCR products are given in table 1, above.
  • methylation analysis is performed via methylation-sensitive restriction enzyme (MSRE) digestion of 500ng of starting DNA.
  • a combination of several MSREs warrants complete digestion of unmethylated DNA. All targeted DNA regions have been selected such that sequences containing multiple MSRE sites are flanked by methylation independent restriction enzyme sites.
  • This strategy enables pre-amplification of the methylated DNA fraction before methylation analyses.
  • the design and pre-amplification would enable methylation testing on serum, urine, stool etc. when DNA is limiting.
  • the methylated DNA fraction is amplified within 16 multiplex PCRs and detected via microarray hybridization. Within these 16 multiplex-PCR reactions 360 different human DNA products can be amplified. From these about 20 amplicons serve as digestion & amplification controls and are either derived from known differentially methylated human DNA regions, or from several regions without any sites of MSREs used in this system.
  • the primer set (every reverse primer is biotinylated) used is targeting 347 different sites located in the 5'UTR of 323 gene regions.
  • PCR amplicons are pooled and positives are detected using streptavidin-Cy3 via microarray hybridization.
  • although the melting temperature of CpG-rich DNA is very high, primer and probe design as well as hybridization conditions have been optimized; this assay thus enables unequivocal multiplexed methylation testing of human DNA samples.
  • the assay has been designed such that 24 samples can be run in parallel using 384-well PCR plates.
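The plate capacity follows directly from the numbers given: 16 multiplex PCRs per sample times 24 samples fills the 384 wells of one plate. The sketch below restates that arithmetic and adds one possible well assignment. The layout (one plate row per multiplex reaction, one plate column per sample) is purely an assumption for illustration and is not stated in the source.

```python
ROWS = "ABCDEFGHIJKLMNOP"  # the 16 rows of a standard 384-well plate
COLUMNS = 24               # the 24 columns of a standard 384-well plate


def samples_per_plate(reactions_per_sample=16, wells=len(ROWS) * COLUMNS):
    """Number of samples that fit on one plate at the given reactions per sample."""
    return wells // reactions_per_sample


def well_for(sample_index, reaction_index):
    """Hypothetical layout: row = multiplex reaction (0-15), column = sample (0-23)."""
    if not 0 <= sample_index < COLUMNS:
        raise ValueError("sample_index must be between 0 and 23")
    if not 0 <= reaction_index < len(ROWS):
        raise ValueError("reaction_index must be between 0 and 15")
    return f"{ROWS[reaction_index]}{sample_index + 1}"


print(samples_per_plate())  # 384 wells / 16 reactions per sample
print(well_for(0, 0), well_for(23, 15))
```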
  • the entire procedure enables the user to set up a specific PCR test and subsequent gel-based or hybridization-based testing of selected markers using single primer-pairs or primer-subsets as provided herein or identified by the inventive method from the 360 marker set.
  • MSRE digestion of DNA (about 500ng) was performed at 37°C overnight in a volume of 30 µl in 1x Tango restriction enzyme digestion buffer (MBI Fermentas) using 8 units of each of the MSREs AciI (New England Biolabs), Hin6I and HpaII (both from MBI Fermentas). Digestions were stopped by heat inactivation (10 min, 75°C) and subjected to PCR amplification.
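To make the digestion logic concrete, the sketch below scans a target sequence for recognition sites of the three enzymes named above and applies the rule that a fragment is expected to remain amplifiable only if none of its sites is cut. The recognition sequences (AciI: CCGC, i.e. GCGG on the opposite strand; Hin6I: GCGC; HpaII: CCGG) are supplied from general knowledge rather than from this text, and the function names and example sequence are hypothetical.

```python
# Recognition sequences are an assumption added for illustration; they are
# not given in the text above.
MSRE_SITES = {
    "AciI": ("CCGC", "GCGG"),  # non-palindromic site, so both orientations
    "Hin6I": ("GCGC",),
    "HpaII": ("CCGG",),
}


def find_msre_sites(sequence):
    """Return sorted (position, enzyme, motif) tuples for every MSRE site."""
    seq = sequence.upper()
    hits = []
    for enzyme, motifs in MSRE_SITES.items():
        for motif in motifs:
            start = seq.find(motif)
            while start != -1:
                hits.append((start, enzyme, motif))
                start = seq.find(motif, start + 1)  # allow overlapping sites
    return sorted(hits)


def survives_digestion(site_is_methylated):
    """A fragment stays intact only if every MSRE site in it is methylated.

    A fragment without any site (empty list) also survives, which matches the
    role of site-free control amplicons.
    """
    return all(site_is_methylated)


print(find_msre_sites("AACCGGTTGCGCAA"))
print(survives_digestion([True, True]), survives_digestion([True, False]))
```

With several sites per target, a single unmethylated site is enough for the fragment to be cut, which is why combining enzymes makes carry-over of unmethylated DNA less likely.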
  • Example 8 Microarray hybridization and detection:
  • Microarrays with the probes of the 360 marker set are blocked for 30 min in 3M urea containing 0.1% SDS at room temperature, submerged in a stirred Coplin jar. After blocking, slides are washed in 0.1x SSC/0.2% SDS for 5 min, dipped into water and dried by centrifugation.
  • the PCR-amplicon-pool of each sample is mixed with an equal amount of 2x hybridization buffer (7x SSC, 0.6% SDS, 50% formamide), denatured for 5 min at 95°C and held at 70°C until loading an aliquot of 100 µl onto an array covered by a gasket slide (Agilent).
  • Arrays are hybridized under maximum speed of rotation in an Agilent hybridization oven for 16 h at 52°C. After removal of the gasket slides, the microarray slides are washed at room temperature in wash solution I (1x SSC, 0.2% SDS) for 5 min and in wash solution II (0.1x SSC, 0.2% SDS) for 5 min, followed by a final wash in which the slides are dipped 3 times into wash solution III (0.1x SSC). The slides are then dried by centrifugation.
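Because the amplicon pool is mixed with an equal amount of 2x hybridization buffer, each buffer component ends up at half its stock strength in the mix loaded onto the array. The small helper below works this out; its name and the dictionary keys are illustrative, and it assumes the amplicon pool itself contributes none of these components.

```python
def final_concentrations(stock, stock_volume, sample_volume):
    """Concentrations after mixing `stock_volume` of buffer with `sample_volume` of sample.

    Assumes the sample contributes none of the listed components.
    """
    if stock_volume <= 0 or sample_volume < 0:
        raise ValueError("volumes must be positive")
    factor = stock_volume / (stock_volume + sample_volume)
    return {name: concentration * factor for name, concentration in stock.items()}


# 2x hybridization buffer as stated in the protocol above.
buffer_2x = {"SSC_x": 7.0, "SDS_percent": 0.6, "formamide_percent": 50.0}

# Equal volumes, e.g. 50 µl buffer + 50 µl amplicon pool for a 100 µl aliquot.
print(final_concentrations(buffer_2x, 50.0, 50.0))
```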
  • streptavidin-Cy3-conjugate (Caltag Laboratories) is diluted 1:400 in PBST-MP (1x PBS, 0.1% Tween 20; 1% skimmed dry milk powder [Sucofin; Germany]), pipetted onto microarrays covered with a coverslip and incubated 30 min at room temperature in the dark. Then coverslips are washed off from the slides using PBST (1x PBS, 0.1% Tween 20) and then slides are washed in fresh PBST for 5 min, rinsed with water and dried by centrifugation.
  • DNA amount is limited. Although the inventive methylation test is performing well with low amounts of DNA (see above), especially minimal invasive testing using cell free DNA from serum, stool, urine, and other body fluids is of diagnostic relevance.
  • RCA by phi29-phage polymerase.
  • the RCA-amplicons were then directly subjected to the multiplex-PCRs of the inventive methylation test without further need of digestion of the DNA prior to amplification.
  • the preamplified DNA, which is enriched for methylated DNA regions, can be directly subjected to fluorescent labelling, and the labelled products can be hybridized onto the microarrays using the same conditions as described above for hybridization of PCR products. In that case the streptavidin-Cy3 detection step has to be omitted and slides should be scanned directly after stringency washes and drying. Depending on the experimental design for microarray analyses, either single-labelled or dual-labelled hybridizations might be generated. In our experience, the single-label design was used successfully for class comparisons. Although the preamplification protocol enables analyses of minute amounts of DNA, it is also suited for performing genomic methylation screens.
  • Hybridizations performed on a chip with probes for the inventive 360 marker genes were scanned using a GenePix 4000A scanner (Molecular Devices, Ismaning, Germany) with a PMT set- - 48 - ting to 700V/cm (equal for both wavelengths) .
  • Raw image data were extracted using GenePix 6.0 software (Molecular Devices, Ismaning, Germany) .
  • Hybridizations performed on whole genome arrays were scanned using an Agilent DNA microarray scanner and raw image data were extracted using the Agilent Feature Extraction Software (v9.5.3.1) .
  • P-values (p) used for feature selection for classification and prediction were based on the univariate significance levels (alpha) .
  • P-values (p) and mis-classification rate during cross validation (MCR) were given along the result data.
  • Example 11 Multiplexed methylation testing outperforms the "classification" success when compared to genomewide and targeted screenings via RNA expression profiling
  • Example 12 Multiplexed Methylation Testing enables identification of biomarkers for a wide variety of (neoplastic) diseases
  • Table 3 Composition of classifier (12 genes) - Sorted by p-value:
  • Table 4 Composition of classifier (8 genes) - Sorted by p-value:
  • Example 13 Classification of diseased versus healthy on minimal amounts of initial DNA samples upon preamplification confirms suitability of the test for diagnosis of neoplastic disease
  • CML chronic myeloid leukemia
  • DNA samples were derived from 8 CML patients at diagnosis, 13 patients within their chronic phase of disease, 3 patients were in the accelerated phase and 3 were blast crisis patients.
  • DNA (100 ng) from CML patients and controls was subjected to preamplification as outlined in example 6.
  • the amplicons derived from the preamplification procedure were directly subjected to the inventive methylation test.
  • Table 5 Composition of classifier (36 genes) - Sorted by p-value:
  • the minute amounts of serum DNA (about 10-100 ng/ml) derived from patients and controls were subjected to preamplification of the methylated DNA fraction as outlined in the methods.
  • Derived amplicons were subjected to methylation testing using the inventive methylation test.
  • This example illustrates that the test is suitable for classification of neoplastic disease, in this case breast cancer, from the serum of patients. In other words, the test enables minimally invasive diagnosis of malignancies.
  • Table 8 Composition of classifier - Sorted by t-value:
  • Example 14 Thyroid-Cancer-Diagnostics: diagnostic methylation markers for elucidation of nodular thyroid disease
  • FTA Follicular adenoma (benign)
  • FTC Follicular thyroid carcinoma (malignant)
  • PTC Papillary thyroid carcinoma (malignant)
  • MTC Medullary thyroid carcinoma (malignant)
  • MTC has been excluded from this class comparison due to its low frequency (about 5% of all thyroid malignancies) but is elucidated by the different genes in chapter 2.
  • MTC is distinguished from other entities (FTA, FTC, PTC, SN) as depicted in the "node 2" classification list. Although in 2) all classes are distinguished (sometimes with a rather low correct classification rate), those contrasts which are of utmost clinical/diagnostic relevance were analysed in detail for distinguishing
  • Table 9 Sorted by p-value of the univariate test.
  • Class 1 benign; Class 2: FTC or PTC.
  • the first 7 genes are significant at the nominal 0.01 level of the univariate test
  • the support vector machine classifier was used for class prediction. There were 5 nodes in the classification tree.
  • Table 10 Composition of classifier (5 genes) - Sorted by p-value:
  • Table 11 Composition of classifier (9 genes) - Sorted by p-value:
  • Table 12 Composition of classifier (5 genes) - Sorted by p-value:
  • Table 13 Composition of classifier (9 genes) - Sorted by p-value:
  • Table 14 Composition of classifier (8 genes) - Sorted by p-value
  • Table 15 Composition of classifier - Sorted by t-value (Sorted by gene pairs)
  • Class 1 FTA
  • Class 2 FTC
  • Table 16 Composition of classifier - Sorted by t-value: Class 1: FTA; Class 2: FTC
  • Table 17 Sorted by p-value of the univariate test.
  • Class 1 FTC
  • Class 2 SN.
  • Table 18 Genes which discriminate among classes - Sorted by p-value of the univariate test.
  • Class 1 PTC
  • Class 2 SN.
  • the first 16 genes are significant at the nominal 0.05 level of the univariate test
  • Metastasis Markers: elucidation and prediction of patients at risk of developing metastases using tissue specimens from the primary tumor at the time of initial surgery
  • organ of Metastases plus additional secondary affected metas organ (“liver_plus”, “lung_plus”, “bone_plus”)
  • N normal control individual; in this setting the group N contains 4 healthy females and 2 females with a confirmed benign tumor (fibroadenoma).
  • Table 19 Composition of classifier - Sorted by p-value
  • Table 20 Composition of classifier - Sorted by p-value
  • Table 21 Composition of classifier - Sorted by p-value
  • Table 22 Composition of classifier - Sorted by t-value
  • Table 23 Composition of classifier - Sorted by t-value: Class 1: m; Class 2: nm.
  • Table columns: Parametric p-value; t-value; % CV support; Geom mean of intensities in class 1; Geom mean of intensities in class 2; Ratio of geom means; Gene symbol
  • Table 24 Composition of classifier - Sorted by t-value: Class 1: m; Class 2: nm.
  • Table 26 Composition of classifier - Sorted by p-value
  • Table 28 Composition of classifier - Sorted by t-value: Class 1: m; Class 2: nm.
  • Example 17 genelists for prediction of organ of metastases 17.1. Organ of Metastases (Binary Tree Classification)
  • Table 29 Composition of classifier (6 genes) - Sorted by p-value:
  • Table 30 Composition of classifier (5 genes) - Sorted by p-value:
  • Table 31 Composition of classifier (3 genes) - Sorted by p-value:
  • Table 32 Composition of classifier (3 genes) - Sorted by p-value:
  • Table 33 Composition of classifier (7 genes) - Sorted by p-value:
  • Table 34 Composition of classifier (6 genes) - Sorted by p-value:
  • GENEFILTERS ON: Exclude a gene under any of the following conditions:
  • Table 35 Composition of classifier (3 genes) - Sorted by p-value:
  • Table 36 Composition of classifier (3 genes) - Sorted by p-value:
  • Table 37 Composition of classifier (7 genes) - Sorted by p-value:
  • Table 38 Composition of classifier (6 genes) - Sorted by p-value:
  • Example 17.4.1 Classifier defined using the inventive methylation test can be used for correct diagnosis and confirms scalability of the test
  • Table 39 Composition of classifier - Sorted by t-value: Class 1: Norm; Class 2: T.
  • Table 40 Composition of classifier - Sorted by t-value: Class 1: N; Class 2: T.
  • Table 41 Composition of classifier - Sorted by t-value: (Sorted by gene pairs): Class 1: control; Class 2: nodule.
  • Table 42 Composition of classifier - Sorted by t-value: (Sorted by gene pairs): Class 1: control; Class 2: nodule.
  • Tumor-DNA from patients should be tested by the following markers for elucidating metastases already present, which might be not detectable by routine clinical examination or imaging.
  • Table 43 Composition of classifier (5 genes) sorted by p-value:
  • the p-value in the table is testing the hypothesis if expression data is predictive of survival.
  • Table 45 Loading matrix of the significant genes and the correlations between the principal components and the significant genes:
  • a new sample is predicted as high (low) risk if its prognostic index is larger than (smaller than or equal to) 1.532975.
  • the prognostic index can be computed by the simple formula $\sum_i w_i x_i - 149.6498$, where $w_i$ and $x_i$ are the weight and logged gene expression for the i-th gene.
  • the Cox proportional hazards model is fitted using the principal components and clinical covariates from the training dataset.
  • the estimated coefficients are (-3.184, -20.948) for the principal components and (-0.709, 0.148) for the clinical covariates
  • the p-value in the table is testing the hypothesis if the expression data is predictive of survival over and above the covariates.
  • Example 19 Methylation Markers in Non-Tumor / Non-Neoplastic Disease: Trisomy diagnosis
  • DNA derived from Cytogen fixed cells of healthy controls (5 females, 46XX; 5 males, 46XY) and trisomy patients (5 females, 47XX+21; 6 males, 47XY+21; single samples with trisomy of chr13, 47XX+13, and trisomy of chr9, 47XX+9; and one blinded sample with trisomy) was used for DNA methylation testing.
  • Table 46 Composition of classifier (11 genes) - Sorted by p-value:
  • Table 47 Composition of classifier (19 genes) - Sorted by p-value:
  • Table 48 Composition of classifier (11 genes) - Sorted by p-value:
  • Table 49 Sorted by p-value of the univariate test.
  • Class 1 46; Class 2: 47.
  • the first 11 genes are significant at the nominal 0.01 level of the univariate test
  • Class 1 46; Class 2: 47.
  • the first 10 genes are significant at the nominal 0.01 level of the univariate test
  • Class 1 46; Class 2: 47.
  • the first 7 genes are significant at the nominal 0.01 level of the univariate test
  • Table 53 Composition of classifier - Sorted by t-value:
  • Class 1 46; Class 2: 47.
  • Leave-one-out cross-validation method was used to compute the misclassification rate.
  • Equal class prevalences are used in the Bayesian compound covariate predictor.
  • Negative Predictive Value: n22/(n12+n22)
  • Sensitivity is the probability for a class A sample to be correctly predicted as class A
  • PPV is the probability that a sample predicted as class A actually belongs to class A
  • NPV is the probability that a sample predicted as non class A actually does not belong to class A.
  • Table 54 Composition of classifier - Sorted by t-value: Class 1: 46; Class 2: 47.
  • the area under the curve is 0.882.
  • The classification rule used above is different from the class prediction.
  • If a sample's posterior probability is greater than the threshold, it is predicted as Class 1. Otherwise, it is predicted as Class 2.
  • Osteoarthritis (OA, also known as degenerative arthritis, degenerative joint disease) is a group of diseases and mechanical abnormalities involving degradation of joints, including articular cartilage and the subchondral bone next to it.
  • Table 55 Composition of classifier - Sorted by t-value
  • Example 21 Breast Cancer vs. blood DNA
  • Example 21.1 Class Prediction using "grid of alpha levels”: resulted in 100% correct classification
  • Leave-one-out cross-validation method was used to compute the misclassification rate.
  • Equal class prevalences are used in the Bayesian compound covariate predictor.
  • Table 56 - Composition of classifier Sorted by t-value Class 1: BrCa; Class 2: norm blood.
  • Table 57 Composition of classifiers from Class Prediction Analysis - Sorted by gene pairs
  • Table 58 Composition of classifier - Sorted by t-value
  • Example 21.3 Class Prediction using PAMR: 100% correct. Concept: defining a minimal set of genes using PAM (prediction analysis of microarrays) elucidates 3 genes sufficient for 100% correct diagnostic testing
  • Prediction Table a cross-tabulation of true (rows) versus predicted (columns) classes for the PAM fit (Fig. 4a and b)
  • Table 59 Composition of PAM classifier - 3 genes selected by PAM (threshold equal to 8.57)
  • Class 1 BrCa
  • Class 2 norm blood.
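
The sensitivity, PPV and NPV definitions listed above can be made concrete with a short sketch. It assumes a 2x2 cross-tabulation in which n11 and n12 are class A samples predicted as class A and as non-A, and n21 and n22 are non-A samples predicted as class A and as non-A; this layout is inferred from the NPV formula n22/(n12+n22) and is not spelled out in the text. The function name and the counts are hypothetical.

```python
def predictive_values(n11, n12, n21, n22):
    """Summary statistics of a 2x2 cross-tabulation (true class in rows).

    n11: class A samples predicted as class A
    n12: class A samples predicted as non-A
    n21: non-A samples predicted as class A
    n22: non-A samples predicted as non-A
    Assumes every denominator below is non-zero.
    """
    return {
        "sensitivity": n11 / (n11 + n12),
        "specificity": n22 / (n21 + n22),
        "ppv": n11 / (n11 + n21),
        "npv": n22 / (n12 + n22),  # formula given in the text
    }


# Hypothetical counts, for illustration only (not data from the document)
result = predictive_values(n11=9, n12=1, n21=2, n22=8)
for name, value in result.items():
    print(f"{name}: {value:.3f}")
```

With equal class prevalences, as stated for the Bayesian compound covariate predictor, PPV and NPV can be read directly from such a table; for other prevalences they would need to be re-weighted.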

Landscapes

  • Chemical & Material Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Organic Chemistry (AREA)
  • Proteomics, Peptides & Aminoacids (AREA)
  • Engineering & Computer Science (AREA)
  • Immunology (AREA)
  • Pathology (AREA)
  • Analytical Chemistry (AREA)
  • Zoology (AREA)
  • Wood Science & Technology (AREA)
  • Genetics & Genomics (AREA)
  • Hospice & Palliative Care (AREA)
  • Biochemistry (AREA)
  • Microbiology (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • Physics & Mathematics (AREA)
  • Oncology (AREA)
  • Biotechnology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)
  • Investigating Or Analysing Biological Materials (AREA)
  • Apparatus For Radiation Diagnosis (AREA)

Abstract

The present invention discloses a method of generating subsets of methylation specific markers from a set, having diagnostic power for various diseases, e.g. cancer of thyroid, breast, colon, or leukemia, in diverse samples; identified subsets of that set, as well as methods for the prognosis and diagnosis of diseases.

Description

Methylation Assay
The present invention relates to cancer diagnostic methods and means therefor.
Neoplasms and cancer are abnormal growths of cells. Cancer cells rapidly reproduce despite restriction of space, nutrients shared by other cells, or signals sent from the body to stop reproduction. Cancer cells are often shaped differently from healthy cells, do not function properly, and can spread into many areas of the body. Abnormal growths of tissue, called tumors, are clusters of cells that are capable of growing and dividing uncontrollably. Tumors can be benign (noncancerous) or malignant (cancerous). Benign tumors tend to grow slowly and do not spread. Malignant tumors can grow rapidly, invade and destroy nearby normal tissues, and spread throughout the body. Malignant cancers can be both locally invasive and metastatic. Locally invasive cancers can invade the tissues surrounding them by sending out "fingers" of cancerous cells into the normal tissue. Metastatic cancers can send cells into other tissues in the body, which may be distant from the original tumor. Cancers are classified according to the kind of fluid or tissue from which they originate, or according to the location in the body where they first developed. All of these parameters can influence the cancer characteristics, development and progression, and subsequently also cancer treatment. Therefore, reliable methods to classify a cancer state or cancer type, taking diverse parameters into consideration, are desired. Since cancer is predominantly a genetic disease, trying to classify cancers by genetic parameters is one extensively studied route.
Extensive efforts have been undertaken to discover genes relevant for diagnosis, prognosis and management of (cancerous) disease. Mainly RNA-expression studies have been used for screening to identify genetic biomarkers. Over recent years it has been shown that changes in the DNA-methylation pattern of genes could be used as biomarkers for cancer diagnostics. In concordance with the general strategy for identifying RNA-expression based biomarkers, the most convenient and promising approach would start by identifying marker candidates through genome-wide screening of methylation changes.
The most versatile genome-wide approaches up to now use microarray hybridization based techniques. Although studies have been undertaken at the genomic level (and also the single-gene level) for elucidating methylation changes in diseased versus normal tissue, a comprehensive test obtaining a good success rate for identifying biomarkers is not yet available.
Developing biomarkers for disease (especially cancer) screening, diagnosis, and treatment was improved over the last decade by major advances in different technologies which have made it easier to discover potential biomarkers through high-throughput screens. Comparing the so-called "OMICs" approaches like Genomics, Proteomics, Metabolomics, and derivatives of those, Genomics is best developed and most widely used for biomarker identification. Because of the dynamic nature of RNA expression, the ease of nucleic acid extraction and the detailed knowledge of the human genome, many studies have used RNA expression profiling for elucidation of class differences for distinguishing the "good" from the "bad" situation, like diseased vs. healthy, or clinical differences between groups of diseased patients. Over the years, microarray-based expression profiling in particular has become a standard tool for research, and some approaches are currently under clinical validation for diagnostics. While the plasticity over a broad dynamic range of RNA expression levels is an advantage of using RNA and also a prerequisite for successful discrimination of classes, the low stability of RNA itself is often seen as a drawback. Because the stability of DNA is tremendously higher than that of RNA, DNA-based markers are more promising and are expected to give robust assays for diagnostics. Many clinical markers in oncology are more or less DNA based and are well established, e.g. cytogenetic analyses for diagnosis and classification of different tumor-species. However, most of these markers are not accessible using the cheap and efficient molecular-genetic PCR routine tests. This might be due to 1) the structural complexity of changes, 2) the inter-individual differences of these changes at the DNA-sequence level, and 3) the relatively low "quantitative" fold-changes of those "chromosomal" DNA changes.
In comparison, RNA-expression changes range over several orders of magnitude, and these changes can be easily measured using genome-wide expression microarrays. These expression arrays cover the entire translated transcriptome with 20000-45000 probes. Elucidation of DNA changes via microarray techniques in general requires more probes, depending on the requested resolution. Even order(s) of magnitude more probes are required than for standard expression profiling to cover the entire 3x10^9 bp human genome. For obtaining the best resolution when screening biomarkers at the structural genomic DNA level, genomic tiling arrays and SNP arrays are available today. Although the costs of these DNA analysis techniques have decreased over recent years, many samples have to be tested for biomarker screening, and thus these tests are cost intensive.
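
The probe-count comparison can be illustrated with a little arithmetic. The sketch below takes the genome size (3x10^9 bp) and the expression-array probe range (20000-45000) from the text; the probe spacings of 10 kb, 1 kb and 100 bp are assumptions chosen only for illustration, and real tiling-array designs differ.

```python
GENOME_SIZE_BP = 3 * 10**9            # human genome size stated in the text
EXPRESSION_PROBES = (20_000, 45_000)  # expression-array probe range stated in the text


def probes_needed(spacing_bp):
    """Probes required to place one probe every `spacing_bp` base pairs (rounded up)."""
    return -(-GENOME_SIZE_BP // spacing_bp)


# Assumed spacings, for illustration only
for spacing in (10_000, 1_000, 100):
    n = probes_needed(spacing)
    low = n / EXPRESSION_PROBES[1]   # fold versus the largest expression array
    high = n / EXPRESSION_PROBES[0]  # fold versus the smallest expression array
    print(f"{spacing:>6} bp spacing: {n:>11,} probes "
          f"(about {low:.0f}x to {high:.0f}x an expression array)")
```

Under these assumptions, a 1 kb spacing already needs roughly two orders of magnitude more probes than an expression array.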
Another option for obtaining stable DNA-based biomarkers relies on elucidation of the changes in the DNA methylation pattern of (malignant; neoplastic) disease. In the vertebrate genome, methylation exclusively affects the cytosine residues of CpG dinucleotides, which are clustered in CpG islands. CpG islands are often found associated with gene-promoter sequences, present in the 5'-untranslated gene regions, and are unmethylated by default. In a very simplified view, an unmethylated CpG island in the associated gene promoter enables active transcription, whereas a methylated one blocks gene transcription. The DNA methylation pattern is tissue- and clone-specific and almost as stable as the DNA itself. It is also known that DNA methylation is an early event in tumorigenesis, which would be of interest for early and initial diagnosis of disease.
Shames D et al. (PLOS Medicine 3(12) (2006): 2244-2262) identified multiple genes that are methylated with high penetrance in primary lung, breast, colon and prostate cancers.
Sato N et al. (Cancer Res 63(13) (2003): 3735-3742) identified potential targets with aberrant methylation in pancreatic cancer. These genes were tested using a treatment with a demethylating agent (5-aza-2'-deoxycytidine and/or the histone deacetylase inhibitor trichostatin A), after which certain genes showed increased transcription.
Bibikova M et al. (Genome Res 16(3) (2006): 383-393) analysed lung cancer biopsy samples to identify methylated CpG sites to distinguish lung adenocarcinomas from normal lung tissues.
Yan P S et al. (Clin Cancer Res 6(4) (2000): 1432-1438) analysed CpG island hypermethylation in primary breast tumor. Cheng Y et al. (Genome Res 16(2) (2006): 282-289) discussed DNA methylation in CpG islands associated with transcriptional silencing of tumor suppressor genes.
Ongenaert M et al. (Nucleic Acids Res 36 (2008) Database issue D842-D846) provided an overview of the methylation database "PubMeth".
Microarrays for human genome-wide hybridization testing are known, e.g. the Affymetrix Human Genome U133A Array (NCBI Database, Acc. No. GPL96).
In principle, screening for biomarkers suitable for answering clinical questions, including DNA-methylation based approaches, would be most successful when starting with a genome-wide approach. A substantial number of differentially methylated genes has been discovered over the years by chance rather than by rational design. Although some of these methylation changes have the potential to be useful markers for specifically defined diagnostic questions, they would lack the power for successful delineation of various diagnostic constellations. Thus, the rational approach would start with a genomic screen for distinguishing the "subtypes" and diagnostically, prognostically and even therapeutically challenging constellations. These rational expectations are the basis for starting genomic (and also other -omics) screenings, but they do not guarantee obtaining a marker panel for all clinically relevant constellations which should be distinguished. This is not surprising when considering a universal approach (e.g. transcriptomics) meant to distinguish, for instance, all subtypes in all different malignancies by focusing on a single class of target molecules (e.g. RNA). Rather, all omics approaches together would be necessary and could help to improve diagnostics and ultimately patient management.
A goal of the present invention is to provide an alternative and more cost-efficient route to identify suitable markers for cancer diagnostics.
Therefore, in a first aspect, the present invention provides a method of determining a subset of diagnostic markers for potentially methylated genes from the genes of gene IDs 1-359 in table 1, suitable for the diagnosis or prognosis of a disease or tumor type in a sample, comprising a) obtaining data of the methylation status of at least 50 random genes selected from the 359 genes of gene ID 1-359 in at least 1 sample, preferably 2, 3, 4 or at least 5 samples, of a confirmed disease or tumor type positive state and at least one sample of a disease or tumor type negative state, b) correlating the results of the obtained methylation status with the disease or tumor type states, c) optionally repeating the obtaining a) and correlating b) steps for at least partially different at least 50 random genes selected from the 359 genes of gene IDs 1-359, and d) selecting as many marker genes which in a classification analysis together yield at least a 65%, preferably at least 70%, correct classification of the disease or tumor type or have a p-value of less than 0.1, preferably less than 0.05, even more preferred less than 0.01, in a random-variance t-test, wherein the selected markers form the subset of the diagnostic markers.
The present invention provides a master set of 359 genetic markers which has been surprisingly found to be highly relevant for aberrant methylation in the diagnosis or prognosis of diseases. It is possible to determine a multitude of marker subsets from this master set which can be used to differentiate between various diseases or tumor types.
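
Step d) can be sketched as a check of whether a candidate marker subset reaches the 65% correct-classification threshold. The example below is a minimal sketch only: it uses leave-one-out cross-validation with a nearest-centroid rule as a stand-in classifier, which is an assumption and not the classification analysis prescribed by the method, and it does not implement the random-variance t-test. All function names and the methylation values are hypothetical.

```python
def centroid(rows):
    """Column-wise mean of a non-empty list of equal-length numeric rows."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def loocv_correct_rate(samples, labels, gene_idx):
    """Leave-one-out correct-classification rate of a nearest-centroid rule
    restricted to the marker columns listed in gene_idx."""
    correct = 0
    for i, (row, label) in enumerate(zip(samples, labels)):
        # Build per-class training data without the held-out sample i
        train = {}
        for j, (other, other_label) in enumerate(zip(samples, labels)):
            if j != i:
                train.setdefault(other_label, []).append([other[g] for g in gene_idx])
        held_out = [row[g] for g in gene_idx]
        predicted = min(train, key=lambda c: sq_dist(held_out, centroid(train[c])))
        correct += predicted == label
    return correct / len(samples)


# Hypothetical methylation signals: rows are samples, columns are 3 candidate genes
samples = [
    [0.90, 0.20, 0.5], [0.80, 0.30, 0.1], [0.95, 0.25, 0.9],  # disease-positive
    [0.10, 0.20, 0.4], [0.20, 0.30, 0.8], [0.15, 0.25, 0.2],  # disease-negative
]
labels = ["positive"] * 3 + ["negative"] * 3

rate = loocv_correct_rate(samples, labels, gene_idx=[0])
print(f"subset [0]: {rate:.0%} correct, accepted: {rate >= 0.65}")
```

In this toy data only the first gene separates the two states, so a subset containing it alone passes the threshold; with real data the subset search would be repeated over many candidate gene combinations.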
The inventive 359 marker genes of table 1 (given in example 1 below) are: NHLH2, MTHFR, PRDM2, MLLT11, S100A9 (control), S100A9, S100A8 (control), S100A8, S100A2, LMNA, DUSP23, LAMC2, PTGS2, MARK1, DUSP10, PARP1, PSEN2, CLIC4, RUNX3, AIM1L, SFN, RPA2, TP73, TP73 (p73), POU3F1, MUTYH, UQCRH, FAF1, TACSTD2, TNFRSF25, DIRAS3, MSH4, GBP2, GBP2, LRRC8C, F3, NANOS1, MGMT, EBF3, DCLRE1C, KIF5B, ZNF22, PGBD3, SRGN, GATA3, PTEN, MMS19, SFRP5, PGR, ATM, DRD2, CADM1, TEAD1, OPCML, CALCA, CTSD, MYOD1, IGF2, BDNF, CDKN1C, WT1, HRAS, DDB1, GSTP1, CCND1, EPS8L2, PIWIL4, CHST11, UNG, CCDC62, CDK2AP1, CHFR, GRIN2B, CCND2, VDR, B4GALNT3, NTF3, CYP27B1, GPR92, ERCC5, GJB2, BRCA2, KL, CCNA1, SMAD9, C13orf15, DGKH, DNAJC15, RB1, RCBTB2, PARP2, APEX1, JUB, JUB (control_NM_198086), EFS, BAZ1A, NKX2-1, ESR2, HSPA2, PSEN1, PGF, MLH3, TSHR, THBS1, MYO5C, SMAD6, SMAD3, NOX5, DNAJA4, CRABP1, BCL2A1 (ID NO: 111), BCL2A1 (ID NO: 112), BNC1, ARRDC4, SOCS1, ERCC4, NTHL1, PYCARD, AXIN1, CYLD, MT3, MT1A, MT1G, CDH1, CDH13, DPH1, HIC1, NEUROD2 (control), NEUROD2, ERBB2, KRT19, KRT14, KRT17, JUP, BRCA1, COL1A1, CACNA1G, PRKAR1A, SPHK1, SOX15, TP53 (TP53_CGI23_1kb), TP53 (TP53_both_CGIs_1kb), TP53 (TP53_CGI36_1kb), TP53, NPTX1, SMAD2, DCC, MBD2, ONECUT2, BCL2, SERPINB5, SERPINB2 (control), SERPINB2, TYMS, LAMA1, SALL3, LDLR, STK11, PRDX2, RAD23A, GNA15, ZNF573, SPINT2, XRCC1, ERCC2, ERCC1, C5AR1 (NM_001736), C5AR1, POLD1, ZNF350, ZNF256, C3, XAB2, ZNF559, FHL2, IL1B, IL1B (control), PAX8, DDX18, GAD1, DLX2, ITGA4, NEUROD1, STAT1, TMEFF2, HECW2, BOLL, CASP8, SERPINE2, NCL, CYP1B1, TACSTD1, MSH2, MSH6, MXD1, JAG1, FOXA2, THBD, CTCFL, CTSZ, GATA5, CXADR, APP, TTC3, KCNJ15, RIPK4, TFF1, SEZ6L, TIMP3, BIK, VHL, IRAK2, PPARG, MBD4, RBP1, XPC, ATR, LXN, RARRES1, SERPINI1, CLDN1, FAM43A, IQCG, THRB, RARB, TGFBR2, MLH1, DLEC1, CTNNB1, ZNF502, SLC6A20, GPX1, RASSF1, FHIT, OGG1, PITX2, SLC25A31, FBXW7, SFRP2, CHRNA9, GABRA2, MSX1, IGFBP7, EREG, AREG, ANXA3, BMP2K, APC, HSD17B4 (ID No 249), HSD17B4 (ID No 250), LOX, TERT, NEUROG1, NR3C1, ADRB2, CDX1, SPARC, C5orf4, PTTG1, DUSP1, CPEB4, SCGB3A1, GDNF, ERCC8, F2R, F2RL1, VCAN, ZDHHC11, RHOBTB3, PLAGL1, SASH1, ULBP2, ESR1, RNASET2, DLL1, HIST1H2AG, HLA-G, MSH5, CDKN1A, TDRD6, COL21A1, DSP, SERPINE1 (ID No 283), SERPINE1 (ID No 284), FBXL13, NRCAM, TWIST1, HOXA1, HOXA10, SFRP4, IGFBP3, RPA3, ABCB1, TFPI2, COL1A2, ARPC1B, PILRB, GATA4, MAL2, DLC1, EPPK1, LZTS1, TNFRSF10B, TNFRSF10C, TNFRSF10D, TNFRSF10A, WRN, SFRP1, SNAI2, RDHE2, PENK, RDH10, TGFBR1, ZNF462, KLF4, CDKN2A, CDKN2B, AQP3, TPM2, TJP2 (ID NO 320), TJP2 (ID No 321), PSAT1, DAPK1, SYK, XPA, ARMCX2, RHOXF1, FHL1, MAGEB2, TIMP1, AR, ZNF711, CD24, ABL1, ACTB, APC, CDH1 (Ecad1), CDH1 (Ecad2), FMR1, GNAS, H19, HIC1, IGF2, KCNQ1, GNAS, CDKN2A (P14), CDKN2B (P15), CDKN2A (P16_VL), PITXA, PITXB, PITXC, PITXD, RB1, SFRP2, SNRPN, XIST, IRF4, UNC13B, GSTP1. Table 1 lists some marker genes twice, e.g. for different loci and control sequences. It should be understood that any methylation specific region which is readily known to the person skilled in the art from prior publications or available databases (e.g. PubMeth at www.pubmeth.org) can be used according to the present invention. Of course, double-listed genes only need to be represented once in an inventive marker set (or set of probes or primers therefor), but preferably a second marker, such as a control region, is included (IDs given in the list above relate to the gene ID (or gene loci ID) given in table 1 of the example section).
One advantage making DNA methylation an attractive target for biomarker development is the fact that cell-free methylated DNA can be detected in body fluids like serum, sputum, and urine from patients with cancerous neoplastic conditions and disease. For the purpose of biomarker screening, clinical samples have to be available. For obtaining a sufficient number of samples with clinical and "outcome" or survival data, the first step would be using archived (tissue) samples. Preferably these materials should fulfill the requirements to obtain intact RNA and DNA, but most archives of clinical samples store formalin-fixed paraffin-embedded (FFPE) tissue blocks. This has been the clinico-pathological routine for decades, but such fixed samples are, if at all, only suitable for extraction of low-quality RNA. It has now been found that, according to the present invention, any such samples can be used for the method of generating an inventive subset, including fixed samples. The samples can be of lung, gastric, colorectal, brain, liver, bone, breast, prostate, ovarian, bladder, cervical, pancreas, kidney, thyroid, oesophageal, head and neck, neuroblastoma, skin, nasopharyngeal, endometrial, bile duct, oral, multiple myeloma, leukemia, soft tissue sarcoma, anal, gall bladder, endocrine, mesothelioma, Wilms tumor, testis, bone, duodenum, neuroendocrine, salivary gland, larynx, choriocarcinoma, cardial, small bowel, eye, germ cell cancer. These cancers can then be subsequently diagnosed by the inventive set (or subsets).
The present invention provides a multiplexed methylation testing method which 1) outperforms the "classification" success when compared to genomewide screenings via RNA-expression profiling, 2) enables identification of biomarkers for a wide variety of diseases, without the need to prescreen candidate markers on a genomewide scale, 3) is suitable for minimally invasive testing, and 4) is easily scalable.
In contrast to the rational strategy for elucidation of biomarkers for differentiation of disease, the invention presents a targeted multiplexed DNA-methylation test which outperforms genome-scaled approaches (including RNA expression profiling) for disease diagnosis, classification, and prognosis.
The inventive set of 359 markers enables selection of a subset of markers from this set of 359 which is highly characteristic of a given disease or tumor type. Preferably the disease is a neoplastic condition. However, not only cancer can be diagnosed with the inventive set or given selective subsets thereof, but also a wide range of other diseases detectable via DNA methylation changes in the patient. Diseases can be genetic diseases of few, many or all cells in a subject patient (including cancer), or infectious diseases which lead to altered gene regulation via DNA methylation, e.g. viral, in particular retroviral, infections. Preferably the disease is a trisomy, such as trisomy 21. Diseases, in particular neoplastic conditions, or tumor types include, without being limited thereto, cancer of different origin such as lung, gastric, colorectal, brain, liver, bone, breast, prostate, ovarian, bladder, cervical, pancreas, kidney, thyroid, oesophageal, head and neck, neuroblastoma, skin, nasopharyngeal, endometrial, bile duct, oral, multiple myeloma, leukemia, soft tissue sarcoma, anal, gall bladder, endocrine, mesothelioma, Wilms tumor, testis, bone, duodenum, neuroendocrine, salivary gland, larynx, choriocarcinoma, cardial, small bowel, eye, germ cell cancer. Further indicators differentiating between diseases, neoplastic conditions or tumor types are e.g. benign (non-proliferative or of limited proliferation) or malignant, metastatic or non-metastatic tumors or nodules. It is sometimes possible to differentiate the sample type from which the methylated DNA is isolated, e.g. urine, blood, tissue samples.
The present invention is suitable to differentiate diseases, in particular neoplastic conditions, or tumor types. Diseases and neoplastic conditions should be understood in general as including benign and malignant conditions. According to the present invention, benign nodules (being at least the potential onset of malignancy) are included in the definition of a disease. After the development of a malignancy, the condition is a preferred disease to be diagnosed by the markers screened for or used according to the present invention. The present invention is suitable to distinguish benign and malignant tumors (both being considered a disease according to the present invention). In particular, the invention can provide markers (and their diagnostic or prognostic use) distinguishing between a normal healthy state together with a benign state on the one hand and malignant states on the other hand. The invention is also suitable to differentiate between non-solid cancers, including leukemia, and healthy states. A diagnosis of a disease may include identifying the difference to a normal healthy state, e.g. the absence of any neoplastic nodules or cancerous cells. The present invention can also be used for prognosis of such conditions, in particular a prediction of the progression of a disease, such as a neoplastic condition, or tumor type. A particularly preferred use of the invention is to perform a diagnosis or prognosis of a metastasising neoplastic disease (distinguished from non-metastasising conditions).
In the context of the present invention "prognosis", "prediction" or "predicting" should not be understood in an absolute sense, as in a certainty that an individual will develop cancer or a disease or tumor type (including cancer progression) , but as an increased risk to develop cancer or the disease or tumor type or of cancer progression. "Prognosis" is also used in the context of predicting disease progression, in particular to predict therapeutic results of a certain therapy of the disease, in particular neoplastic conditions, or tumor types. The prognosis of a therapy can e.g. be used to predict a chance of success (i.e. curing a disease) or chance of reducing the severity of the disease to a certain level. As a general inventive concept, markers screened for this purpose are preferably derived from sample data of patients treated according to the therapy to be predicted. The inventive marker sets may also be used to monitor a patient for the emergence of therapeutic results or positive disease progressions.
Some of the inventive, rationally selected markers have been found methylated in some instances. DNA methylation analyses in principle rely either on bisulfite deamination-based methylation detection or on using methylation sensitive restriction enzymes. Preferably the restriction enzyme-based strategy is used for elucidation of DNA-methylation changes. Further methods to determine methylated DNA are e.g. given in EP 1 369 493 A1 or US 6,605,432. Combining restriction digestion and multiplex PCR amplification with a targeted microarray-hybridization is a particularly advantageous strategy to perform the inventive methylation test using the inventive marker sets (or subsets). A microarray-hybridization step can be used for reading out the PCR results. For the analysis of the hybridization data statistical approaches for class comparisons and class prediction can be used. Such statistical methods are known from analysis of RNA-expression derived microarray data.
If only limited amounts of DNA are available for analysis, an amplification protocol can be used which enables selective amplification of the methylated DNA fraction prior to methylation testing. Subjecting these amplicons to the methylation test, it was possible to successfully distinguish DNA from sensitive cases, e.g. distinguishing leukemia (CML) from normal healthy controls. In addition it was possible to distinguish breast-cancer patients from healthy normal controls using DNA from serum by the inventive methylation test upon preamplification. Both examples clearly illustrate that the inventive multiplexed methylation testing can be successfully applied when only limited amounts of DNA are available. Thus, this principle might be the preferred method for minimally invasive diagnostic testing.
In most situations several genes are necessary for classification. Although the 359 marker set test is not a genome-wide test and might be used as it is for diagnostic testing, running a subset of markers - comprising the classifier which enables best classification - would be easier for routine applications. The test is easily scalable. Thus, to test only the subset of markers comprising the classifier, the selected subset of primers/probes could be applied directly to set up the lower-multiplexed test (or single PCR test). This was confirmed when serum DNA was tested using a classifier for distinguishing healthy females from individuals with breast tumors (or other specific tumors). Only the specific primers comprising the gene classifier obtained from the methylation test were set up together in multiplexed PCR reactions. Data derived upon hybridization of PCR amplicons were in line with the initial classification. Thus, correct classification with the down-scaled test using only a subset was possible.
In summary the inventive methylation test is a suitable tool for differentiation and classification of neoplastic disease. This assay can be used for diagnostic purposes and for defining biomarkers for clinically relevant issues to improve diagnosis of disease, and to classify patients at risk for disease progression, thereby improving disease treatment and patient management.
The first step of the inventive method of generating a subset, step a) of obtaining data of the methylation status, preferably comprises determining data of the methylation status, preferably by methylation specific PCR analysis or methylation specific digestion analysis. Methylation specific digestion analysis can include either or both of hybridization of suitable probes for detection to non-digested fragments or PCR amplification and detection of non-digested fragments.
The inventive selection can be made by any (known) classification method to obtain a set of markers with the given diagnostic (or also prognostic) value to categorize a certain disease or tumor type. Such methods include class comparisons wherein a specific p-value is selected, e.g. a p-value below 0.1, preferably below 0.08, more preferred below 0.06, in particular preferred below 0.05, below 0.04, below 0.02, most preferred below 0.01.
Preferably the correlated results for each gene b) are rated by their correct correlation to the disease or tumor type positive state, preferably by p-value test or t-value test or F-test. Rated (best first, i.e. low p- or t-value) markers are then successively selected and added to the subset until a certain diagnostic value is reached, e.g. the herein mentioned at least 70% (or more) correct classification of the disease or tumor type.
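The rating-and-adding procedure can be illustrated with a minimal sketch. It is not the implementation used for the invention: the function names, the Welch t-statistic as rating, the nearest-centroid classifier and the resubstitution accuracy are all assumptions chosen for brevity. Markers are ranked by decreasing absolute t-statistic (which corresponds to increasing p-value) and added until the target rate of correct classification is reached.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Welch t-statistic for two lists of methylation values (hypothetical rating)."""
    se = sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / se if se > 0 else 0.0


def accuracy(samples, labels, genes):
    """Fraction of samples assigned to the nearer class centroid (resubstitution)."""
    classes = sorted(set(labels))
    centroids = {
        c: [mean(s[g] for s, l in zip(samples, labels) if l == c) for g in genes]
        for c in classes
    }
    hits = 0
    for s, l in zip(samples, labels):
        dist = {
            c: sum((s[g] - m) ** 2 for g, m in zip(genes, centroids[c]))
            for c in classes
        }
        hits += min(dist, key=dist.get) == l
    return hits / len(samples)


def select_subset(samples, labels, target=0.7, max_genes=20):
    """Add markers best-first until the target classification rate is reached."""
    a_label, b_label = sorted(set(labels))  # assumes exactly two classes

    def score(g):
        a = [s[g] for s, l in zip(samples, labels) if l == a_label]
        b = [s[g] for s, l in zip(samples, labels) if l == b_label]
        return abs(welch_t(a, b))

    ranked = sorted(samples[0], key=score, reverse=True)
    subset = []
    for g in ranked[:max_genes]:
        subset.append(g)
        if accuracy(samples, labels, subset) >= target:
            break
    return subset
```

Here each sample is a dictionary mapping a marker name to a methylation signal. The defaults of 70% and 20 markers mirror values mentioned in the text. In practice the accuracy would be estimated by cross-validation rather than on the training samples themselves.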
Class Comparison procedures include identification of genes that were differentially methylated among the two classes using a random-variance t-test. The random-variance t-test is an improvement over the standard separate t-test as it permits sharing information among genes about within-class variation without assuming that all genes have the same variance (Wright G. W. and Simon R, Bioinformatics 19:2448-2455,2003). Genes were considered statistically significant if their p value was less than a certain value, e.g. 0.1 or 0.01. A stringent significance threshold can be used to limit the number of false positive findings. A global test can also be performed to determine whether the expression profiles differed between the classes by permuting the labels of which arrays corresponded to which classes. For each permutation, the p-values can be re-computed and the number of genes significant at the e.g. 0.01 level can be noted. The proportion of the permutations that give at least as many significant genes as with the actual data is then the significance level of the global test. If there are more than 2 classes, then the "F-test" instead of the "t-test" should be used.
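The global permutation test described above can be sketched as follows. This is an illustrative simplification under stated assumptions: an ordinary Welch t-statistic stands in for the random-variance t-test, and a fixed cut-off on |t| (the hypothetical parameter `cutoff`) stands in for the 0.01 significance level, so that no t-distribution routine is needed. All function names are illustrative.

```python
import random
from math import sqrt
from statistics import mean, variance


def t_stat(a, b):
    """Welch t-statistic; a stand-in for the random-variance t-test."""
    se = sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / se if se > 0 else 0.0


def n_significant(data, labels, cutoff):
    """Count genes whose |t| between class 0 and class 1 reaches the cut-off.

    data maps a gene name to a list of values, one per array.
    """
    count = 0
    for values in data.values():
        a = [v for v, l in zip(values, labels) if l == 0]
        b = [v for v, l in zip(values, labels) if l == 1]
        if abs(t_stat(a, b)) >= cutoff:
            count += 1
    return count


def global_test(data, labels, cutoff=3.0, n_perm=1000, seed=0):
    """Return the observed count and the permutation significance level."""
    rng = random.Random(seed)
    observed = n_significant(data, labels, cutoff)
    shuffled = list(labels)
    at_least = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # permute which array belongs to which class
        if n_significant(data, shuffled, cutoff) >= observed:
            at_least += 1
    return observed, at_least / n_perm
```

The second return value is the proportion of permutations that give at least as many significant genes as the actual data, i.e. the significance level of the global test.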
Class Prediction includes the step of specifying a significance level to be used for determining the genes that will be included in the subset. Genes that are differentially methylated between the classes at a univariate parametric significance level less than the specified threshold are included in the set. It doesn't matter whether the specified significance level is small enough to exclude enough false discoveries. In some problems better prediction can be achieved by being more liberal about the gene sets used as features. The sets may be more biologically interpretable and clinically applicable, however, if fewer genes are included. As part of cross-validation, gene selection is repeated for each training set created in the cross-validation process, for the purpose of providing an unbiased estimate of prediction error. The final model and gene set for use with future data is the one resulting from application of the gene selection and classifier fitting to the full dataset.
Models for utilizing gene methylation profile to predict the class of future samples can also be used. These models may be based on the Compound Covariate Predictor (Radmacher et al. Journal of Computational Biology 9:505-511, 2002), Diagonal Linear Discriminant Analysis (Dudoit et al. Journal of the American Statistical Association 97:77-87, 2002), Nearest Neighbor Classification (also Dudoit et al.), and Support Vector Machines with linear kernel (Ramaswamy et al. PNAS USA 98:15149-54, 2001). The models incorporated genes that were differentially methylated among genes at a given significance level (e.g. 0.01, 0.05 or 0.1) as assessed by the random variance t-test (Wright G. W. and Simon R. Bioinformatics 19:2448-2455, 2003). The prediction error of each model is preferably estimated using cross-validation, preferably leave-one-out cross-validation (Simon et al. Journal of the National Cancer Institute 95:14-18, 2003). For each leave-one-out cross-validation training set, the entire model building process was repeated, including the gene selection process. It may also be evaluated whether the cross-validated error rate estimate for a model was significantly less than one would expect from random prediction. The class labels can be randomly permuted and the entire leave-one-out cross-validation process is then repeated. The significance level is the proportion of the random permutations that gave a cross-validated error rate no greater than the cross-validated error rate obtained with the real methylation data. About 1000 random permutations may usually be used.
Another classification method is the greedy-pairs method described by Bo and Jonassen (Genome Biology 3(4):research0017.1-0017.11, 2002). The greedy-pairs approach starts with ranking all genes based on their individual t-scores on the training set. The procedure selects the best ranked gene g_i and finds the one other gene g_j that together with g_i provides the best discrimination, using as a measure the distance between centroids of the two classes with regard to the two genes when projected to the diagonal linear discriminant axis. These two selected genes are then removed from the gene set and the procedure is repeated on the remaining set until the specified number of genes have been selected. This method attempts to select pairs of genes that work well together to discriminate the classes.
Furthermore, a binary tree classifier for utilizing gene methylation profile can be used to predict the class of future samples. The first node of the tree incorporated a binary classifier that distinguished two subsets of the total set of classes. The individual binary classifiers were based on the "Support Vector Machines" incorporating genes that were differentially expressed among genes at the significance level (e.g. 0.01, 0.05 or 0.1) as assessed by the random variance t-test (Wright G. W. and Simon R. Bioinformatics 19:2448-2455, 2003). Classifiers for all possible binary partitions are evaluated and the partition selected was that for which the cross-validated prediction error was minimum. The process is then repeated successively for the two subsets of classes determined by the previous binary split. The prediction error of the binary tree classifier can be estimated by cross-validating the entire tree building process. This overall cross-validation included re-selection of the optimal partitions at each node and re-selection of the genes used for each cross-validated training set as described by Simon et al. (Journal of the National Cancer Institute 95:14-18, 2003). 10-fold cross-validation can be utilized, in which one-tenth of the samples is withheld, a binary tree is developed on the remaining 9/10 of the samples, and class membership is then predicted for the 10% of the samples withheld. This is repeated 10 times, each time withholding a different 10% of the samples. The samples are randomly partitioned into 10 test sets (Simon R and Lam A. BRB-ArrayTools User Guide, version 3.2. Biometric Research Branch, National Cancer Institute).
Preferably the correlated results for each gene b) are rated by their correct correlation to the disease or tumor type positive state, preferably by p-value test. It is also possible to include a step in which the genes are selected d) in order of their rating.
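The requirement that gene selection be repeated inside every leave-one-out training set can be sketched as follows. This is a minimal illustration, not one of the cited models: a nearest-centroid rule stands in for the Compound Covariate Predictor, DLDA and the other classifiers, a Welch t-statistic stands in for the random-variance t-test, and keeping the top `k` genes stands in for a significance threshold. All names are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def t_stat(a, b):
    """Welch t-statistic; a stand-in for the random-variance t-test."""
    se = sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / se if se > 0 else 0.0


def top_genes(rows, labels, k):
    """Rank genes (column indices) by |t| using the given training rows only."""
    def score(j):
        a = [r[j] for r, l in zip(rows, labels) if l == 0]
        b = [r[j] for r, l in zip(rows, labels) if l == 1]
        return abs(t_stat(a, b))
    return sorted(range(len(rows[0])), key=score, reverse=True)[:k]


def predict(train_rows, train_labels, genes, row):
    """Nearest-centroid prediction restricted to the selected genes."""
    best, best_d = None, float("inf")
    for c in (0, 1):
        members = [r for r, l in zip(train_rows, train_labels) if l == c]
        d = sum((row[j] - mean(m[j] for m in members)) ** 2 for j in genes)
        if d < best_d:
            best, best_d = c, d
    return best


def loocv_error(rows, labels, k=2):
    """Leave-one-out error with the gene selection repeated in every fold."""
    errors = 0
    for i in range(len(rows)):
        train_rows = rows[:i] + rows[i + 1:]
        train_labels = labels[:i] + labels[i + 1:]
        genes = top_genes(train_rows, train_labels, k)  # re-selected per fold
        errors += predict(train_rows, train_labels, genes, rows[i]) != labels[i]
    return errors / len(rows)
```

Selecting the genes once on all samples and only then cross-validating the classifier would let the withheld sample influence the feature set and bias the error estimate downwards, which is why the selection sits inside the loop. The permutation assessment of the error rate would wrap `loocv_error` in a loop over shuffled labels.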
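A minimal sketch of the greedy-pairs selection follows. It reflects one plausible reading of the pair measure and is not taken from the cited publication: the diagonal linear discriminant axis is taken as the vector of per-gene mean differences divided by pooled variances, and the pair score is the distance between the two class centroids after projection onto that axis. Function names and the data layout are assumptions.

```python
from math import sqrt
from statistics import mean, variance


def class_stats(values, labels):
    """Mean difference and pooled variance of one gene across classes 0 and 1."""
    a = [v for v, l in zip(values, labels) if l == 0]
    b = [v for v, l in zip(values, labels) if l == 1]
    pooled = ((len(a) - 1) * variance(a) + (len(b) - 1) * variance(b)) / (
        len(a) + len(b) - 2
    )
    return mean(a) - mean(b), max(pooled, 1e-12)


def pair_score(si, sj):
    """Centroid distance after projection onto the assumed DLD axis."""
    (di, vi), (dj, vj) = si, sj
    wi, wj = di / vi, dj / vj  # axis direction: mean difference / variance
    norm = sqrt(wi * wi + wj * wj)
    return abs(wi * di + wj * dj) / norm if norm > 0 else 0.0


def greedy_pairs(data, labels, n_pairs):
    """data maps gene name -> list of values; returns the selected gene pairs."""
    stats = {g: class_stats(v, labels) for g, v in data.items()}

    def t_like(g):
        # Proportional to the pooled-variance t-score (class sizes are shared).
        return abs(stats[g][0]) / sqrt(stats[g][1])

    remaining = sorted(stats, key=t_like, reverse=True)
    pairs = []
    while len(remaining) >= 2 and len(pairs) < n_pairs:
        best = remaining.pop(0)  # best-ranked gene still available
        partner = max(remaining, key=lambda g: pair_score(stats[best], stats[g]))
        remaining.remove(partner)  # both genes leave the candidate set
        pairs.append((best, partner))
    return pairs
```

Because both genes of a pair are removed before the next round, a gene that discriminates poorly on its own can still be chosen as a partner, which is the stated aim of selecting genes that work well together.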
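Two bookkeeping steps of the binary tree procedure lend themselves to a short sketch: enumerating all binary partitions of the classes at a node, and randomly partitioning the samples into 10 test sets. The helper names below are hypothetical, and fitting and scoring the classifier for each partition is deliberately left out.

```python
import random
from itertools import combinations


def binary_partitions(classes):
    """Yield every split of the classes into two non-empty groups, once each."""
    classes = list(classes)
    first, rest = classes[0], classes[1:]
    # Fixing the first class on the left avoids listing mirrored splits twice.
    for r in range(len(rest)):
        for extra in combinations(rest, r):
            left = (first, *extra)
            right = tuple(c for c in rest if c not in extra)
            yield left, right


def ten_fold_splits(n_samples, n_folds=10, seed=0):
    """Yield (train, test) index lists; each sample is withheld exactly once."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::n_folds] for i in range(n_folds)]
    for test in folds:
        held = set(test)
        train = [i for i in indices if i not in held]
        yield train, test
```

For n classes there are 2^(n-1) - 1 partitions to evaluate at the first node, for example 7 for four classes. For an unbiased error estimate the partition choice and gene selection would be repeated inside every training set produced by `ten_fold_splits`.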
Independent from the method that is finally used to produce a subset with certain diagnostic or predictive value, the subset selection preferably results in a subset with at least 60%, preferably at least 65%, at least 70%, at least 75%, at least 80% or even at least 85%, at least 90%, at least 92%, at least 95%, in particular preferred 100% correct classification of test samples of the disease or tumor type. Such levels can be reached by repeating c) steps a) and b) of the inventive method, if necessary.
To limit the number of members of the subset, only marker genes with a significance value (p-value) of at most 0.1, preferably at most 0.08, even more preferred at most 0.06, at most 0.05, at most 0.04, at most 0.02, or most preferred at most 0.01 are selected.
In particular preferred embodiments the at least 50 genes of step a) are at least 70, preferably at least 90, at least 100, at least 120, at least 140, at least 160, at least 180, at least 190, at least 200, at least 220, at least 240, at least 260, at least 280, at least 300, at least 320, at least 340, at least 350 or all, genes.
Since the subset should be small it is preferred that not more than 60, or not more than 40, preferably not more than 30, in particular preferred not more than 20, marker genes are selected in step d) for the subset.
In a further aspect the present invention provides a method of identifying a disease or tumor type in a sample comprising DNA from a patient, comprising providing a diagnostic subset of markers identified according to the method depicted above, determining the methylation status of the genes of the subset in the sample and comparing the methylation status with the status of a confirmed disease or tumor type positive and/or negative state, thereby identifying the disease or tumor type in the sample .
The methylation status can be determined by any method known in the art including methylation dependent bisulfite deamination (and consequently the identification of mC - methylated C - changes by any known methods, including PCR and hybridization techniques) . Preferably, the methylation status is determined by methylation specific PCR analysis, methylation specific digestion analysis and either or both of hybridisation analysis to non-digested or digested fragments or PCR amplification analysis of non-digested fragments. The methylation status can also be determined by any probes suitable for determining the methylation status including DNA, RNA, PNA, LNA probes which optionally may further include methylation specific moieties.
As further explained below the methylation status can be particularly determined by using hybridisation probes or amplification primers (preferably PCR primers) specific for methylated regions of the inventive marker genes. Discrimination between methylated and non-methylated genes, including the determination of the methylation amount or ratio, can be performed by using e.g. either one of these tools.
The determination using only specific primers aims at specifically amplifying methylated (or in the alternative non-methylated) DNA. This can be facilitated by using (methylation dependent) bisulfite deamination, methylation specific enzymes or by using methylation specific nucleases to digest methylated (or alternatively non-methylated) regions - and consequently only the non-methylated (or alternatively methylated) DNA is obtained. By using a genome chip (or simply a gene chip including hybridization probes for all genes of interest such as all 359 marker genes), all amplification or non-digested products are detected. I.e. discrimination between methylated and non-methylated states as well as gene selection (the inventive set or subset) takes place before the step of detection on a chip.
Alternatively it is possible to use universal primers and amplify a multitude of potentially methylated genetic regions (including the genetic markers of the invention) which are, as described either methylation specific amplified or digested, and then use a set of hybridisation probes for the characteristic markers on e.g. a chip for detection. I.e. gene selection is performed on the chip.
Either set, a set of probes or a set of primers, can be used to obtain the relevant methylation data of the genes of the present invention. Of course, both sets can be used.
The method according to the present invention may be performed by any method suitable for the detection of methylation of the marker genes. In order to provide a robust and optionally re-useable test format, the determination of the gene methylation is preferably performed with a DNA-chip, real-time PCR, or a combination thereof. The DNA chip can be a commercially available general gene chip (also comprising a number of spots for the detection of genes not related to the present method) or a chip specifically designed for the method according to the present invention (which predominantly comprises marker gene detection spots) .
Preferably the methylated DNA of the sample is detected by a multiplexed hybridization reaction. In further embodiments a methylated DNA is preamplified prior to hybridization, preferably also prior to methylation specific amplification, or digestion. Preferably, also the amplification reaction is multiplexed (e.g. multiplex PCR).
The inventive methods (for the screening of subsets or for diagnosis or prognosis of a disease or tumor type) are particularly suitable to detect low amounts of methylated DNA of the inventive marker genes. Preferably the DNA amount in the sample is below 500ng, below 400ng, below 300ng, below 200ng, below 100ng, below 50ng or even below 25ng. The inventive method is particularly suitable to detect low concentrations of methylated DNA of the inventive marker genes. Preferably the DNA amount in the sample is below 500ng, below 400ng, below 300ng, below 200ng, below 100ng, below 50ng or even below 25ng, per ml sample.
In another aspect the present invention provides a subset comprising or consisting of nucleic acid primers or hybridization probes being specific for a potentially methylated region of at least marker genes selected from one of the following groups
a) CHRNA9, RPA2, CPEB4, CASP8, MSH2, ACTB, CTCFL, TPM2, SERPINB5, PIWIL4, NTF3, CDK2AP1
b) IGF2, KCNQ1, SCGB3A1, EFS, BRCA1, ITGA4, H19, PTTG1
c) KRT17, IGFBP7, RHOXF1, CLIC4, TP53, DLX2, ITGA4, AIM1L, SERPINl, SERPIN2, TP53, XIST, TEAD1, CDKN2A, CTSD, OPCML, RPA2, BRCA2, CDH1, S100A9, SERPINB2, BCL2A1, UNC13B, ABL1, TIMP1, ATM, FBXW7, SFRP5, ACTB, MSX1, LOX, SOX15, DGKH, CYLD, XPA, XPC
d) NEUROD2, CTCFL, GBP2, SFN, MAGEB2, DIRAS3, ARMCX2, HRAS
e) SFN, DIRAS3, HRAS, ARMCX2, MAGEB2, GBP2, CTCFL, NEUROD2
f) PITX2, TJP2, CD24, ESR1, TNFRSF10D, PRA3, RASSF1
g) GATA5, RASSF1, HIST1H2AG, NPTX1, UNC13B
h) SMAD3, NANOS1, TERT, BCL2, SPARC, SFRP2, MGMT, MYOD1, LAMA1
i) TJP2, CALCA, PITX2, TFPI2, CDKN2B
j) PITX2, TNFRSF10D, PAX8, RAD23A, GJB2, F2R, TP53, NTHL1, TP53
k) ARRDC4, DUSP1, SMAD9, HOXA10, C3, ADRB2, BRCA2, SYK
l) PITX2, MT3, RPA3, TNFRSF10D, PTEN, TP53, PAX8, TGFBR2, HIC1, CALCA, PSAT1, MBD2, NTF3, PLAGL1, F2R, GJB2, ARRDC4, NTHL1
m) MT3, RPA3, TNFRSF10D, HOXA1, C13orf15, TGFBR2, HIC1, CALCA, PSAT1, NTF3, PLAGL1, F2R, GJB2, ARRDC4, NTHL1
n) PITX2, PAX8, CD24, TP53, ESR1, TNFRSF10D, RAD23A, SCGB3A1, RARB, TP53, LZTS1
o) DUSP1, TFPI2, TJP2, S100A9, BAZ1A, CPEB4, AIM1L, CDKN2A, PITX2, ARPC1B, RPA3, SPARC, SFRP4, LZTS1, MSH4, PLAGL1, ABCB1, C13orf15, XIST, TDRD6, CCDC62, HOXA1, IRF4, HSD12B4, S100A9, MT3, KCNJ15, BCL2A1, S100A8, PITX2, THBD, NANOS1, SYK, SMAD2, GNAS, HRAS, RARRES1, APEX1
p) TJP2, CALCA, PITX2, PITX2, ESR1, EFS, SMAD3, ARRDC4, CD24, FHL2, PITX2, RDHE2, KIF5B, C3, KRT17, RASSF1
q) CHRNA9, RPA2, CPEB4, CASP8, MSH2, ACTB, CTCFL, TPM2, SERPINB5, PIWIL4, NTF3, CDK2AP1
r) IGF2, KCNQ1, SCGB3A1, EFS, BRCA1, ITGA4, H19, PTTG1
s) KRT17, AQP3, TP53, ZNF462, NEUROG1, GATA3, MT1A, JUP, RGC32, SPINT2, DUSP1
t) NCL, XPA, MYOD1, Pitx2
u) SPARC, PIWIL4, SERPINB5, TEAD1, EREG, ZDHHC11, C5orf4
v) HSD17B4, DSP, SPARC, KRT17, SRGN, C5orf4, PIWIL4, SERPINB5, ZDHHC11, EREG
w) TIMP1, COL21A1, COL1A2, KL, CDKN2A
x) TIMP1, COL21A1, COL1A2
y) BCL2A1, SERPINB2, SERPINE1, CLIC4, BCL2A1, ZNF256, ZNF573, GNAS, SERPINB2
z) TDRD6, XIST, LZTS1, IRF4
aa) TIMP1, COL21A1, COL1A2, KL, CDKN2A, Lamda
bb) DSP, AR, IGF2, MSX1, SERPINE1
cc) FHL1, LMNA, GDNF
dd) FBXW7, GNAS, KRT14
ee) CHFR, AR, RBP1, MSX1, COL21A1, FHL1, RARB
ff) DCLRE1C, MLH1, RARB, OGG1, SNRPN, ITGA4
gg) FHL1, LMNA, GDNF
hh) FBXW7, GNAS, KRT14
ii) CHFR, AR, RBP1, MSX1, COL21A1, FHL1, RARB
jj) DCLRE1C, MLH1, RARB, OGG1, SNRPN, ITGA4
kk) SFN, DIRAS3, HRAS, ARMCX2, MAGEB2, GBP2, CTCFL, NEUROD2
ll) SFN, BAZ1A, DIRAS3, CTCFL, ARMCX2, GBP2, MAGEB2, NEUROD2
mm) DIRAS3, C5AR1, BAZ1A, SFN, ERCC1, SNRPN, PILRB, KRT17, CDKN2A, H19, EFS, TJP2, HRAS, NEUROD2, GBP2, CTCFL
nn) DIRAS3, C5AR1, SFN, BAZ1A, HIST1H2AG, XAB2, HOXA1, HIC1, GRIN2B, BRCA1, C13orf15, SLC25A31, CDKN2A, H19, EFS, TJP2, HRAS, NEUROD2, GBP2, CTCFL
oo) TFPI2, NEUROD2, DLX2, TTC3, TWIST1
pp) MAGEB2, MSH2, ARPC1B, NEUROD2, DDX18, PIWIL4, MSX1, COL1A2, ERCC4, GAD1, RDH10, TP53, APC, RHOXF1, ATM
qq) ACTB, EFS, CXADR, LAMC2, DNAJA4, CRABP1, PARP2, HIC1, MTHFR, S100A9, PTX2
rr) ACTB, EFS, CXADR, LAMC2, DNAJA4, PARP2, CRABP1, HIC1, SERPINI1, MTHFR, PITX2
ss) ACTB, EFS, PARP2, TP73, HIC1, BCL2A1, CRABP1, CXADR, BDNF, COL1A1
tt) EFS, ACTB, BCL2A1, TP73, HIC1, SERPINI1, CXADR
uu) ACTB, TP73, SERPINI1, CXADR, HIC1, BCL2A1, EFS
vv) FBXL13, PITX2, NKX2-1, IGF2, C5AR1, SPARC, RUNX3, CHST11, CHRNA9, ZNF462, HSD17B4, UNG, TJP2, ERBB2, SOX15, ERCC8, CDX1, ANXA3, CDH1, CHFR, TACSTD1, MT1A
ww) TP53, PTTG1, VHL, TP53, S100A2, ZNF573, RDH10, TSHR, MYO5C, MBD2, CPEB4, BRCA1, CD24, COL1A1, VDR, TP53, KLF4, ADRB2, ERCC2, SPINT2, XAB2, RB1, APEX1, RPA3, TP53, BRCA2, MSH2, BAZ1A, SPHK1, ERCC8, SERPINI1, RPA2, SCGB3A1, MLH3, CDK2AP1, MT1G, PITX2, SFRP5, ZNF711, TGFBR2, C5AR1, DPH1, CDX1, GRIN2B, C5orf4, BOLL, HOXA1, NEUROD2, BCL2A1, ZNF502, FOXA2, MYOD1, HOXA10, TMEFF2, IQCG, LXN, SRGN, PTGS2, ONECUT2, PENK, PITX2, DLX2, SALL3, APC, APC, HIST1H2AG, ACTB, RASSF1, S100A9, TERT, TNFRSF25, HIC1, LAMC2, SPARC, WT1, PITX2, GNA15, ESR1, KL, HIC1
xx) HIC1, LAMC2, SPARC, WT1, PITX2, GNA15, KL, HIC1
yy) HIC1, KL, ESR1
or a set of at least 50%, preferably at least 60%, at least 70%, at least 80%, at least 90% or 100% of the markers of any one of the above (a) to (yy). The present inventive set also includes sets with at least 50% of the above markers for each set since it is also possible to substitute parts of these subsets being specific for - in the case of binary conditions/differentiations - e.g. good or bad prognosis or distinguish between diseases or tumor types, wherein one part of the subset points into one direction for a certain tumor type or disease/differentiation. It is possible to further complement the 50% part of the set by additional markers specific for determining the other part of the good or bad differentiation or differentiation between two diseases or tumor types. Methods to determine such complementing markers follow the general methods as outlined herein.
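Whether a given panel of primers or probes contains at least 50% (or 60%, 70% and so on) of the markers of one of the listed sets can be checked with a few lines. This is an illustrative sketch only: the function names are hypothetical, and counting each gene once when a set lists it repeatedly (e.g. several loci of the same gene) is an assumption, not something the text specifies.

```python
def coverage(panel, marker_set):
    """Fraction of the distinct markers of marker_set that occur in panel."""
    unique = set(marker_set)  # assumption: repeated gene names count once
    return len(unique & set(panel)) / len(unique)


def meets_threshold(panel, marker_set, minimum=0.5):
    """True if the panel contains at least the given fraction of the set."""
    return coverage(panel, marker_set) >= minimum


# Set i) as listed above; the panel itself is a made-up example.
SET_I = ["TJP2", "CALCA", "PITX2", "TFPI2", "CDKN2B"]
panel = ["TJP2", "PITX2", "CDKN2B", "ESR1"]
print(coverage(panel, SET_I), meets_threshold(panel, SET_I))
```

Markers in the panel that are not part of the set, such as the complementing markers mentioned above, do not reduce the coverage.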
Each of these marker subsets is particularly suitable to diagnose a certain disease or tumor type or distinguish between a certain disease or tumor type in a methylation specific assay of these genes.
Also provided is a set of nucleic acid primers or hybridization probes being specific for a potentially methylated region of marker genes selected from at least 180, preferably at least 200, more preferred at least 220, in particular preferred at least 240, even more preferred at least 260, most preferred at least 280, or even at least 300, preferably at least 320 or at least 340, or at least 360, marker genes of table 1. Of course the set may comprise even more primers or hybridization probes not given in table 1.
The inventive primers or probes may be of any nucleic acid, including RNA, DNA, PNA (peptide nucleic acids) , LNA (locked nucleic acids) . The probes might further comprise methylation specific moieties.
The present invention provides a (master) set of 360 marker genes, further also specific gene locations by the PCR products of these genes wherein significant methylation can be detected, as well as subsets therefrom with a certain diagnostic value to distinguish specific disease or tumor type. Preferably the set is optimized for a certain disease or tumor type. Cancer types include, without being limited thereto, cancer of different origin such as leukemia, a soft tissue cancer, for example breast cancer, colorectal cancer, head or neck cancer, cervical, prostate, thyroid, brain, eye or pancreatic cancer. Further indicators differentiating between disease or tumor type are e.g. benign (non (or limited) proliferative) or malignant, metastatic or non-metastatic. The set can also be optimized for a specific sample type in which the methylated DNA is tested. Such samples include blood, urine, saliva, hair, skin, tissues, in particular tissues of the cancer origin mentioned above, in particular breast or thyroid tissue. The sample may be obtained from a patient to be diagnosed. In preferred embodiments the test sample to be used in the method of identifying a subset is of the same type as a sample to be used in the diagnosis.
In practice, probes specific for potentially aberrant methylated regions are provided, which can then be used for the diagnostic method.
It is also possible to provide primers suitable for a specific amplification, like PCR, of these regions in order to perform a diagnostic test on the methylation state.
Such probes or primers are provided in the context of a set corresponding to the inventive marker genes or marker gene loci as given in table 1.
Such a set of primers or probes may have all 359 inventive markers present and can then be used for a multitude of different cancer detection methods. Of course, not all markers would have to be used to diagnose a certain disease or tumor type. It is also possible to use certain subsets (or combinations thereof) with a limited number of marker probes or primers for diagnosis of certain categories of cancer.
Therefore, the present invention provides sets of primers or probes comprising primers or probes for any single marker subset or any combination of marker subsets disclosed herein. In the following sets of marker genes should be understood to include sets of primer pairs and probes therefor, which can e.g. be provided in a kit.
Set a, CHRNA9, RPA2, CPEB4, CASP8, MSH2, ACTB, CTCFL, TPM2, SERPINB5, PIWIL4, NTF3, CDK2AP1 and sets with at least 50%, preferably at least 60%, at least 70%, at least 80% or at least 90% of these markers are in particular suitable to detect breast cancer and to distinguish between normal breast tissue, ductal and lobular breast carcinomas.
Set b, IGF2, KCNQ1, SCGB3A1, EFS, BRCA1, ITGA4, H19, PTTG1 and sets with at least 50%, preferably at least 60%, at least 70%, at least 80% or at least 90% of these markers are also suitable to detect breast cancer and to distinguish between normal breast tissue, ductal and lobular breast carcinomas.
Set c, KRT17, IGFBP7, RHOXF1, CLIC4, TP53, DLX2, ITGA4, AIM1L, SERPINl, SERPIN2, TP53, XIST, TEAD1, CDKN2A, CTSD, OPCML, RPA2, BRCA2, CDH1, S100A9, SERPINB2, BCL2A1, UNC13B, ABL1, TIMP1, ATM, FBXW7, SFRP5, ACTB, MSX1, LOX, SOX15, DGKH, CYLD, XPA, XPC and sets with at least 50%, preferably at least 60%, at least 70%, at least 80% or at least 90% of these markers are suitable to diagnose neoplastic disease (chronic myeloid leukemia).
Set d, NEUROD2, CTCFL, GBP2, SFN, MAGEB2, DIRAS3, ARMCX2, HRAS and sets with at least 50%, preferably at least 60%, at least 70%, at least 80% or at least 90% of these markers are in particular suitable to detect minimal invasive cancer, in particular breast cancer.
Set e, SFN, DIRAS3, HRAS, ARMCX2, MAGEB2, GBP2, CTCFL, NEUROD2 and sets with at least 50%, preferably at least 60%, at least 70%, at least 80% or at least 90% of these markers are also suitable to detect cancer in limiting amounts of DNA, e.g. using minimal invasive testing using DNA from serum, in particular breast cancer.
Set f, PITX2, TJP2, CD24, ESR1, TNFRSF10D, PRA3, RASSF1 and sets with at least 50%, preferably at least 60%, at least 70%, at least 80% or at least 90% of these markers can be used to diagnose thyroid carcinoma and distinguish between normal or benign states (including struma nodosa and follicular adenoma) and malign states (in particular follicular thyroid carcinoma, papillary thyroid carcinoma).
Set g, GATA5, RASSF1, HIST1H2AG, NPTX1, UNC13B and sets with at least 50%, preferably at least 60%, at least 70%, at least 80% or at least 90% of these markers can be used to diagnose thyroid carcinoma and distinguish between normal tissue against the sum of benign states (including struma nodosa and follicular adenoma) and malign states (in particular follicular thyroid carcinoma, papillary thyroid carcinoma and medullary thyroid carcinoma).
Set h, SMAD3, NANOS1, TERT, BCL2, SPARC, SFRP2, MGMT, MYOD1, LAMA1 and sets with at least 50%, preferably at least 60%, at least 70%, at least 80% or at least 90% of these markers can be used to diagnose thyroid carcinoma and distinguish between normal or benign states (including struma nodosa and follicular adenoma) together with malign states (in particular follicular thyroid carcinoma and papillary thyroid carcinoma) against medullary thyroid carcinoma.
Set i, TJP2, CALCA, PITX2, TFPI2, CDKN2B and sets with at least 50%, preferably at least 60%, at least 70%, at least 80% or at least 90% of these markers can be used to diagnose thyroid carcinoma and distinguish between malign states (in particular follicular thyroid carcinoma and papillary thyroid carcinoma) together with follicular adenoma against struma nodosa.
Set j, PITX2, TNFRSF10D, PAX8, RAD23A, GJB2, F2R, TP53, NTHL1, TP53 and sets with at least 50%, preferably at least 60%, at least 70%, at least 80% or at least 90% of these markers can be used to diagnose thyroid carcinoma and distinguish between follicular adenoma (benign) and malign states selected from follicular thyroid carcinoma and papillary thyroid carcinoma.
Set k, ARRDC4, DUSP1, SMAD9, HOXA10, C3, ADRB2, BRCA2, SYK and sets with at least 50%, preferably at least 60%, at least 70%, at least 80% or at least 90% of these markers can be used to diagnose thyroid carcinoma and distinguish between follicular thyroid carcinoma and papillary thyroid carcinoma.
Set l, PITX2, MT3, RPA3, TNFRSF10D, PTEN, TP53, PAX8, TGFBR2, HIC1, CALCA, PSAT1, MBD2, NTF3, PLAGL1, F2R, GJB2, ARRDC4, NTHL1 and sets with at least 50%, preferably at least 60%, at least 70%, at least 80% or at least 90% of these markers can be used to diagnose thyroid carcinoma and distinguish between follicular adenoma (benign) and follicular thyroid carcinoma (malign).
Set m, MT3, RPA3, TNFRSF10D, HOXA1, C13orf15, TGFBR2, HIC1, CALCA, PSAT1, NTF3, PLAGL1, F2R, GJB2, ARRDC4, NTHL1 and sets with at least 50%, preferably at least 60%, at least 70%, at least 80% or at least 90% of these markers can be used to diagnose thyroid carcinoma and distinguish between follicular adenoma (benign) and follicular thyroid carcinoma (malign).
Set n, PITX2, PAX8, CD24, TP53, ESR1, TNFRSF10D, RAD23A, SCGB3A1, RARB, TP53, LZTS1 and sets with at least 50%, preferably at least 60%, at least 70%, at least 80% or at least 90% of these markers can be used to diagnose thyroid carcinoma and distinguish between follicular adenoma (benign) and papillary thyroid carcinoma (malign).
Set o, DUSP1, TFPI2, TJP2, S100A9, BAZ1A, CPEB4, AIM1L, CDKN2A, PITX2, ARPC1B, RPA3, SPARC, SFRP4, LZTS1, MSH4, PLAGL1, ABCB1, C13orf15, XIST, TDRD6, CCDC62, HOXA1, IRF4, HSD12B4, S100A9, MT3, KCNJ15, BCL2A1, S100A8, PITX2, THBD, NANOS1, SYK, SMAD2, GNAS, HRAS, RARRES1, APEX1 and sets with at least 50%, preferably at least 60%, at least 70%, at least 80% or at least 90% of these markers can be used to diagnose thyroid carcinoma and distinguish between struma nodosa (benign) and follicular thyroid carcinoma (malign).
Set p, TJP2, CALCA, PITX2, PITX2, ESR1, EFS, SMAD3, ARRDC4, CD24, FHL2, PITX2, RDHE2, KIF5B, C3, KRT17, RASSF1 and sets with at least 50%, preferably at least 60%, at least 70%, at least 80% or at least 90% of these markers can be used to diagnose thyroid carcinoma and distinguish between struma nodosa (benign) and papillary thyroid carcinoma (malign).
Set q, CHRNA9, RPA2, CPEB4, CASP8, MSH2, ACTB, CTCFL, TPM2, SERPINB5, PIWIL4, NTF3, CDK2AP1 and sets with at least 50%, preferably at least 60%, at least 70%, at least 80% or at least 90% of these markers can be used to diagnose breast cancer, distinguish between breast cancer and healthy breast tissue and additionally to distinguish non malignant breast tissue from lobular breast carcinoma and ductal breast carcinoma.
Set r, IGF2, KCNQ1, SCGB3A1, EFS, BRCA1, ITGA4, H19, PTTG1 and sets with at least 50%, preferably at least 60%, at least 70%, at least 80% or at least 90% of these markers can be used to diagnose breast cancer, distinguish between breast cancer and healthy breast tissue and additionally to distinguish lobular breast carcinoma from ductal breast carcinoma.
Set s, KRT17, AQP3, TP53, ZNF462, NEUROG1, GATA3, MT1A, JUP, RGC32, SPINT2, DUSP1 and sets with at least 50%, preferably at least 60%, at least 70%, at least 80% or at least 90% of these markers can be used to diagnose breast cancer, distinguish between breast cancer and healthy breast tissue and additionally to distinguish non malignant breast tissue from lobular breast carcinoma and ductal breast carcinoma.
Set t, NCL, XPA, MYOD1, PITX2 and sets with at least 50%, preferably at least 60%, at least 70%, at least 80% or at least 90% of these markers can be used to diagnose breast cancer, distinguish between breast cancer and healthy breast tissue and additionally to distinguish lobular breast carcinoma from ductal breast carcinoma.
Set u, SPARC, PIWIL4, SERPINB5, TEAD1, EREG, ZDHHC11, C5orf4 and sets with at least 50%, preferably at least 60%, at least 70%, at least 80% or at least 90% of these markers can be used to diagnose breast cancer and is additionally particularly suitable to distinguish between metastasising and non-metastasising cancer.
Set v, HSD17B4, DSP, SPARC, KRT17, SRGN, C5orf4, PIWIL4, SERPINB5, ZDHHC11, EREG and sets with at least 50%, preferably at least 60%, at least 70%, at least 80% or at least 90% of these markers can be used to diagnose breast cancer and is additionally particularly suitable to distinguish between metastasising and non-metastasising cancer.
Set w, TIMP1, COL21A1, COL1A2, KL, CDKN2A and sets with at least 50%, preferably at least 60%, at least 70%, at least 80% or at least 90% of these markers can be used to diagnose breast cancer and is additionally particularly suitable to distinguish between metastasising and non-metastasising cancer.
Set x, TIMP1, COL21A1, COL1A2 and sets with at least 50%, preferably at least 60%, at least 70%, at least 80% or at least 90% of these markers can be used to diagnose breast cancer and is additionally particularly suitable to distinguish between metastasising and non-metastasising cancer.
Set y, BCL2A1, SERPINB2, SERPINE1, CLIC4, BCL2A1, ZNF256, ZNF573, GNAS, SERPINB2 and sets with at least 50%, preferably at least 60%, at least 70%, at least 80% or at least 90% of these markers can be used to diagnose breast cancer and is additionally particularly suitable to distinguish between metastasising and non-metastasising cancer.
Set z, TDRD6, XIST, LZTS1, IRF4 and sets with at least 50%, preferably at least 60%, at least 70%, at least 80% or at least 90% of these markers can be used to diagnose breast cancer and is additionally particularly suitable to distinguish between metastasising and non-metastasising cancer.
Set aa, TIMP1, COL21A1, COL1A2, KL, CDKN2A and sets with at least 50%, preferably at least 60%, at least 70%, at least 80% or at least 90% of these markers can be used to diagnose cancerous metastases in bone, liver and lung and is additionally particularly suitable to distinguish between metastasising and non-metastasising cancer, in particular from primary breast cancer.
Set bb, DSP, AR, IGF2, MSX1, SERPINE1 and sets with at least 50%, preferably at least 60%, at least 70%, at least 80% or at least 90% of these markers can be used to diagnose cancerous metastases in bone, liver and lung and is additionally particularly suitable to distinguish metastasising cancer in liver from metastasising cancer in bone and lung, in particular from primary breast cancer.
Set cc, FHL1, LMNA, GDNF and sets with at least 50%, preferably at least 60%, at least 70%, at least 80% or at least 90% of these markers can be used to diagnose cancer in bone, liver and lung and to distinguish between metastasising and non-metastasising cancer, in particular to distinguish metastases in liver from metastases in bone and lung.
Set dd, FBXW7, GNAS, KRT14 and sets with at least 50%, preferably at least 60%, at least 70%, at least 80% or at least 90% of these markers can be used to diagnose cancer in bone, liver and lung and to distinguish between metastasising and non-metastasising cancer, in particular to distinguish metastases in liver and bone from metastases in lung.
Set ee, CHFR, AR, RBP1, MSX1, COL21A1, FHL1, RARB and sets with at least 50%, preferably at least 60%, at least 70%, at least 80% or at least 90% of these markers can be used to diagnose cancer in bone and liver and to distinguish between metastasising and non-metastasising cancer, in particular to distinguish metastases in bone from metastases in liver.
Set ff, DCLRE1C, MLH1, RARB, OGG1, SNRPN, ITGA4 and sets with at least 50%, preferably at least 60%, at least 70%, at least 80% or at least 90% of these markers can be used to diagnose cancer in liver and to distinguish between metastasising and non-metastasising cancer, in particular to distinguish metastasising liver cancer and non-metastasising cancer.
Set gg, FHL1, LMNA, GDNF and sets with at least 50%, preferably at least 60%, at least 70%, at least 80% or at least 90% of these markers can be used to diagnose cancer in bone, liver and lung and to distinguish between metastasising and non-metastasising cancer, in particular to distinguish metastases in liver from metastases in bone and lung.
Set hh, FBXW7, GNAS, KRT14 and sets with at least 50%, preferably at least 60%, at least 70%, at least 80% or at least 90% of these markers can be used to diagnose cancer in bone, liver and lung and to distinguish between metastasising and non-metastasising cancer, in particular to distinguish metastases in liver and bone from metastases in lung.
Set ii, CHFR, AR, RBP1, MSX1, COL21A1, FHL1, RARB and sets with at least 50%, preferably at least 60%, at least 70%, at least 80% or at least 90% of these markers can be used to diagnose cancer in bone and liver and to distinguish between metastasising and non-metastasising cancer, in particular to distinguish metastases in bone from metastases in liver.
Set jj, DCLRE1C, MLH1, RARB, OGG1, SNRPN, ITGA4 and sets with at least 50%, preferably at least 60%, at least 70%, at least 80% or at least 90% of these markers can be used to diagnose cancer in liver and to distinguish between metastasising and non-metastasising cancer, in particular to distinguish metastasising liver cancer and non-metastasising cancer.
Set kk, SFN, DIRAS3, HRAS, ARMCX2, MAGEB2, GBP2, CTCFL, NEUROD2 and sets with at least 50%, preferably at least 60%, at least 70%, at least 80% or at least 90% of these markers can be used to identify breast cancer in particular in serum samples.
Set ll, SFN, BAZ1A, DIRAS3, CTCFL, ARMCX2, GBP2, MAGEB2, NEUROD2 and sets with at least 50%, preferably at least 60%, at least 70%, at least 80% or at least 90% of these markers can be used to identify breast cancer in particular in serum samples.
Set mm, DIRAS3, C5AR1, BAZ1A, SFN, ERCC1, SNRPN, PILRB, KRT17, CDKN2A, H19, EFS, TJP2, HRAS, NEUROD2, GBP2, CTCFL and sets with at least 50%, preferably at least 60%, at least 70%, at least 80% or at least 90% of these markers can be used to identify breast cancer in particular in serum samples.
Set nn, DIRAS3, C5AR1, SFN, BAZ1A, HIST1H2AG, XAB2, HOXA1, HIC1, GRIN2B, BRCA1, C13orf15, SLC25A31, CDKN2A, H19, EFS, TJP2, HRAS, NEUROD2, GBP2, CTCFL and sets with at least 50%, preferably at least 60%, at least 70%, at least 80% or at least 90% of these markers can be used to distinguish between nodule positive conditions (malign and benign tumors) and normal controls, in particular in serum samples.
Set oo, TFPI2, NEUROD2, DLX2, TTC3, TWIST1 and sets with at least 50%, preferably at least 60%, at least 70%, at least 80% or at least 90% of these markers can be used to distinguish between no metastasis and present metastasis conditions in breast cancer.
Set pp, MAGEB2, MSH2, ARPC1B, NEUROD2, DDX18, PIWIL4, MSX1, COL1A2, ERCC4, GAD1, RDH10, TP53, APC, RHOXF1, ATM and sets with at least 50%, preferably at least 60%, at least 70%, at least 80% or at least 90% of these markers can be used to predict the emergence of metastasis in breast cancer patients, in particular in patients that are currently diagnosed not to have metastasis. The emergence of a different metastasis can be e.g. within four months, within six months, within eight months, within one year or within eighteen months.
Set qq, ACTB, EFS, CXADR, LAMC2, DNAJA4, CRABP1, PARP2, HIC1, MTHFR, S100A9, PTX2 and sets with at least 50%, preferably at least 60%, at least 70%, at least 80% or at least 90% of these markers can be used to diagnose trisomy 21, in particular in both male and female patients.
Set rr, ACTB, EFS, CXADR, LAMC2, DNAJA4, PARP2, CRABP1, HIC1, SERPINI1, MTHFR, PITX2 and sets with at least 50%, preferably at least 60%, at least 70%, at least 80% or at least 90% of these markers can be used to diagnose trisomy 21 and to distinguish between normal and trisomy samples.
Set ss, ACTB, EFS, PARP2, TP73, HIC1, BCL2A1, CRABP1, CXADR, BDNF, COL1A1 and sets with at least 50%, preferably at least 60%, at least 70%, at least 80% or at least 90% of these markers can be used to distinguish normal from trisomy patients, in particular trisomy 21 patients.
Set tt, EFS, ACTB, BCL2A1, TP73, HIC1, SERPINI1, CXADR and sets with at least 50%, preferably at least 60%, at least 70%, at least 80% or at least 90% of these markers can be used to distinguish normal from trisomy, in particular trisomy 21 patients.
Set uu, ACTB, TP73, SERPINI1, CXADR, HIC1, BCL2A1, EFS and sets with at least 50%, preferably at least 60%, at least 70%, at least 80% or at least 90% of these markers can be used to distinguish normal from trisomy, in particular trisomy 21 patients.
In preferred embodiments the genes common to sets qq), rr), ss), tt) and uu) are used to diagnose trisomy, in particular trisomy 21.
Set vv, FBXL13, PITX2, NKX2-1, IGF2, C5AR1, SPARC, RUNX3, CHST11, CHRNA9, ZNF462, HSD17B4, UNG, TJP2, ERBB2, SOX15, ERCC8, CDX1, ANXA3, CDH1, CHFR, TACSTD1, MT1A and sets with at least 50%, preferably at least 60%, at least 70%, at least 80% or at least 90% of these markers can be used to diagnose arthritis, in particular osteoarthritis, and to distinguish arthritic DNA from healthy (non-arthritic) DNA, in particular DNA from cartilage tissue, or bone samples, e.g. subchondral bone.
Set ww, TP53, PTTG1, VHL, TP53, S100A2, ZNF573, RDH10, TSHR, MYO5C, MBD2, CPEB4, BRCA1, CD24, COL1A1, VDR, TP53, KLF4, ADRB2, ERCC2, SPINT2, XAB2, RB1, APEX1, RPA3, TP53, BRCA2, MSH2, BAZ1A, SPHK1, ERCC8, SERPINI1, RPA2, SCGB3A1, MLH3, CDK2AP1, MT1G, PITX2, SFRP5, ZNF711, TGFBR2, C5AR1, DPH1, CDX1, GRIN2B, C5orf4, BOLL, HOXA1, NEUROD2, BCL2A1, ZNF502, FOXA2, MYOD1, HOXA10, TMEFF2, IQCG, LXN, SRGN, PTGS2, ONECUT2, PENK, PITX2, DLX2, SALL3, APC, APC, HIST1H2AG, ACTB, RASSF1, S100A9, TERT, TNFRSF25, HIC1, LAMC2, SPARC, WT1, PITX2, GNA15, ESR1, KL, HIC1 and sets with at least 50%, preferably at least 60%, at least 70%, at least 80% or at least 90% of these markers can be used to diagnose breast cancer, in particular by using blood samples or samples derived from blood, including serum. In particular, this set is suitable to distinguish between cancerous cells of breast cancer and normal blood samples. This set allows an easy test on blood, which may comprise disseminated cancerous cells. The present invention furthermore provides additional subsets suitable to detect and diagnose breast cancer by using at least 3, at least 4, at least 5, at least 6, at least 7, at least 8, at least 9, at least 10 or more markers of the above set ww. These sub-subsets have preferably been validated according to any of the methods disclosed herein, in particular any cross-validation methods providing a positive classification for the diagnosis of breast cancer (in comparison to non cancerous samples) as mentioned above for step d), in particular having a p-value of less than 0.1, preferably less than 0.05, even more preferred less than 0.01, in a random-variance t-test.
Set xx, HIC1, LAMC2, SPARC, WT1, PITX2, GNA15, KL, HIC1 and sets with at least 50%, preferably at least 60%, at least 70%, at least 80% or at least 90% of these markers can be used to diagnose breast cancer, in particular by using blood samples or samples derived from blood, including serum. In particular, this set is suitable to distinguish between cancerous cells of breast cancer and normal blood samples. This set allows an easy blood test, which may comprise disseminated cancerous cells. Preferably, the set is used in a test together with control markers such as MARK1, PARP1, NHLH2, PSEN2, MTHFR, POS Biotin Control RET, DUSP10.
Set yy, HIC1, KL, ESR1 and sets with at least 50%, preferably at least 60%, at least 70%, at least 80% or at least 90% of these markers can be used to diagnose breast cancer, in particular by using blood samples or samples derived from blood, including serum. In particular, this set is suitable to distinguish between cancerous cells of breast cancer and normal blood samples. This set allows an easy blood test, which may comprise disseminated cancerous cells.
Also provided are combinations of the above mentioned subsets a) to yy), in particular sets comprising markers of 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15 or more of these subsets, preferably for the same disease or tumor type like breast, lung, liver, bone or thyroid cancer or trisomy 21 or arthritis, preferably complete sets a) to yy).
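The recurring phrase "at least 50%, preferably at least 60%, at least 70%, at least 80% or at least 90% of these markers" can be turned into concrete marker counts for a given set. The following is a minimal sketch, assuming that "at least X%" is read as the smallest whole number of markers not below X% of the listed set size; the function names `min_markers` and `threshold_table` are hypothetical and serve illustration only.

```python
import math

def min_markers(set_size: int, fraction: float) -> int:
    """Smallest whole number of markers that is at least `fraction` of the set."""
    if set_size < 1 or not 0 < fraction <= 1:
        raise ValueError("set_size must be >= 1 and fraction in (0, 1]")
    # round() removes floating-point noise before rounding up
    return math.ceil(round(set_size * fraction, 9))

def threshold_table(markers, fractions=(0.5, 0.6, 0.7, 0.8, 0.9)):
    """Map each fraction to the minimum marker count for the given marker list."""
    return {f: min_markers(len(markers), f) for f in fractions}

set_x = ["TIMP1", "COL21A1", "COL1A2"]  # set x as listed above
print(threshold_table(set_x))
```

For the three markers of set x, the 50% and 60% thresholds both require two markers, while 70% and above require the complete set. Whether repeated entries in a listed set (for example TP53 in set j) count once or several times is left open here; the sketch simply counts list entries.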
According to a preferred embodiment of the present invention, the methylation of at least two genes, preferably of at least three genes, especially of at least four genes, is determined. Specifically if the present invention is provided as an array test system, at least ten, especially at least fifteen genes, are preferred. In preferred test set-ups (for example in microarrays ("gene-chips")) preferably at least 20, even more preferred at least 30, especially at least 40 genes, are provided as test markers. As mentioned above, these markers or the means to test the markers can be provided in a set of probes or a set of primers, preferably both.
In a further embodiment the set comprises up to 100000, up to 90000, up to 80000, up to 70000, up to 60000 or up to 50000 probes or primer pairs (set of two primers for one amplification product), preferably up to 40000, up to 35000, up to 30000, up to 25000, up to 20000, up to 15000, up to 10000, up to 7500, up to 5000, up to 3000, up to 2000, up to 1000, up to 750, up to 500, up to 400, up to 300, or even more preferred up to 200 probes or primers of any kind, in particular in the case of immobilized probes on a solid surface such as a chip.
In certain embodiments the primer pairs and probes are specific for a methylated upstream region of the open reading frame of the marker genes.
Preferably the probes or primers are specific for a methylation in the genetic regions defined by SEQ ID NOs 1081 to 1440, including the adjacent up to 500 base pairs, preferably up to 300, up to 200, up to 100, up to 50 or up to 10 adjacent, corresponding to gene marker IDs 1 to 359 of table 1, respectively. I.e. probes or primers of the inventive set (including the full 359 set, as well as subsets and combinations thereof) are specific for the regions and gene loci identified in table 1, last column with reference to the sequence listing, SEQ ID NOs: 1081 to 1440. As can be seen these SEQ IDs correspond to a certain gene, the latter being a member of the inventive sets, in particular of the subsets a) to yy).
Examples of specific probes or primers are given in table 1 with reference to the sequence listing, SEQ ID NOs 1 to 1080, which form especially preferred embodiments of the invention.
Preferably the set of the present invention comprises probes or primers for at least one gene or gene product of the list according to table 1, wherein at least 10%, at least 15%, at least 20%, at least 25%, at least 30%, at least 35%, at least 40%, at least 45%, at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, especially preferred 100%, of the total probes or primers are probes or primers for genes of the list according to table 1. Preferably the set, in particular in the case of a set of hybridization probes, is provided immobilized on a solid surface, preferably a chip or in form of a microarray. Since, according to current technology, detection means for genes on a chip allow easier and more robust array design, gene chips using DNA molecules (for detection of methylated DNA in the sample) are a preferred embodiment of the present invention. Such gene chips also allow detection of a large number of nucleic acids.
Preferably the set is provided on a solid surface, in particular a chip, whereon the primers or probes can be immobilized. Solid surfaces or chips may be of any material suitable for the immobilization of biomolecules such as the moieties, including glass, modified glass (aldehyde modified) or metal chips.
The primers or probes can also be provided as such, including lyophilized forms or being in solution, preferably with suitable buffers. The probes and primers can of course be provided in a suitable container, e.g. a tube or micro tube.
The present invention also relates to a method of identifying a disease or tumor type in a sample comprising DNA from a subject or patient, comprising obtaining a set of nucleic acid primers (or primer pairs) or hybridization probes as defined above (comprising each specific subset or combinations thereof), determining the methylation status of the genes in the sample for which the members of the set are specific and comparing the methylation status of the genes with the status of a confirmed disease or tumor type positive and/or negative state, thereby identifying the disease or tumor type in the sample. In general the inventive method has been described above and all preferred embodiments of such methods also apply to the method using the set provided herein.
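The comparison step of this method, i.e. comparing the methylation status of a sample with confirmed positive and negative states, can be illustrated by a simple nearest-centroid rule. This is only a sketch under stated assumptions: the classifier type, the function names `centroid` and `classify`, and all signal values are hypothetical and not taken from the present disclosure; the marker names are those of set yy.

```python
from statistics import mean

def centroid(profiles):
    """Mean signal per marker across a list of {marker: signal} profiles."""
    markers = profiles[0].keys()
    return {m: mean(p[m] for p in profiles) for m in markers}

def classify(sample, references):
    """Return the label of the reference centroid closest to the sample
    (squared Euclidean distance over the markers of that centroid)."""
    def dist(c):
        return sum((sample[m] - c[m]) ** 2 for m in c)
    return min(references, key=lambda label: dist(references[label]))

# Hypothetical methylation signals (arbitrary units) for confirmed states
positive = [{"HIC1": 0.9, "KL": 0.8, "ESR1": 0.7},
            {"HIC1": 0.8, "KL": 0.9, "ESR1": 0.9}]
negative = [{"HIC1": 0.1, "KL": 0.2, "ESR1": 0.1},
            {"HIC1": 0.2, "KL": 0.1, "ESR1": 0.2}]
references = {"positive": centroid(positive), "negative": centroid(negative)}

print(classify({"HIC1": 0.7, "KL": 0.6, "ESR1": 0.8}, references))
```

Any other class prediction method could take the place of the nearest-centroid rule; the sketch is only meant to make the "compare with confirmed states" step concrete.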
The inventive marker set, including certain disclosed subsets and subsets which can be identified with the methods disclosed herein, are suitable to distinguish between lung, gastric, colorectal, brain, liver, bone, breast, prostate, ovarian, bladder, cervical, pancreas, kidney, thyroid, oesophageal, head and neck, neuroblastoma, skin, nasopharyngeal, endometrial, bile duct, oral, multiple myeloma, leukemia, soft tissue sarcoma, anal, gall bladder, endocrine, mesothelioma, Wilms tumor, testis, bone, duodenum, neuroendocrine, salivary gland, larynx, choriocarcinoma, cardial, small bowel, eye and germ cell cancer, and to distinguish cancer from benign conditions, in particular for diagnostic or prognostic uses. Preferably the markers used (e.g. by utilizing primers or probes of the inventive set) for the inventive diagnostic or prognostic method may be used in smaller numbers than e.g. in the set (or kit) or chip as such, which may be designed for more than one fine tuned diagnosis or prognosis. The markers used for the diagnostic or prognostic method may be up to 100000, up to 90000, up to 80000, up to 70000, up to 60000 or up to 50000, preferably up to 40000, up to 35000, up to 30000, up to 25000, up to 20000, up to 15000, up to 10000, up to 7500, up to 5000, up to 3000, up to 2000, up to 1000, up to 750, up to 500, up to 400, up to 300, up to 200, up to 100, up to 80, or even more preferred up to 60.
The inventive marker set, including certain disclosed subsets, which can be identified with the methods disclosed herein, are suitable to distinguish thyroid cancer from benign thyroid tissue, in particular for diagnostic or prognostic uses.
The inventive marker set, including certain disclosed subsets, which can be identified with the methods disclosed herein, are suitable to distinguish breast cancer from normal tissue and benign breast tumors, in particular for diagnostic or prognostic uses.
The inventive marker set, including certain disclosed subsets, which can be identified with the methods disclosed herein, are suitable to distinguish hereditary from sporadic breast cancer, in particular for diagnostic or prognostic uses.
The inventive marker set, including certain disclosed subsets, which can be identified with the methods disclosed herein, are suitable to distinguish breast cancer responsive to Herceptin treatment from likely non-responders, in particular for diagnostic or prognostic uses.
The present invention is further illustrated by the following figures and examples, without being restricted thereto.
Figures:
Figure 1: A 961 gene classifier derived from genome-wide expression profiling enables differentiation of a group of patients with (yes) and without (no) metastases during follow up of patients suffering from breast cancer upon analyses of primary tumor tissues. Dendrogram obtained from clustering experiments using centered correlation (values shown on the vertical axis).
Figure 2: Performance of expression profiling versus CpG360 methylation. Correct classification (%) using 7 different classification tests is depicted for a 961 gene classifier, a targeted set of 385 genes (Lauss 2007), and a 4 gene DNA methylation classifier derived from the methylation test (CpG360A). Although consisting of only 4 genes, the methylation based classifier performs best.
Figure 3: Multidimensional scaling using the 19 gene classifier for serum testing of breast tumors illustrates good classification of tumor versus healthy controls. Methylation data from DNA samples of benign tumors (B), the breast cancer cell line MCF7, normal females (NormF) and males (NormM) and several breast cancer patients (Tu) were derived from DNA upon preamplification of the methylated DNA; several normal controls (Norm_direct) were tested without preamplification.
Figure 4: Class prediction using PAMR (prediction analysis of microarrays) to determine the minimum subset of the 359 marker genes of table 1. The minimal set contains only 3 markers (set yy). Further combinations resulted in the same misclassification error of 0%.
Figure 5: Dendrogram for clustering experiments, using centered correlation and average linkage.
Examples:
Example 1: gene list
Table 1: 360 master set (with the 359 marker genes and one control) and sequence annotation
[Table 1 is reproduced as page images in the original publication (images imgf000034_0001 to imgf000045_0001) and is not available as text here.]
Example 2: Samples
Samples from solid tumors were derived from initial surgical resection of primary tumors. Tumor tissue sections were derived from histopathology, and histopathological data as well as clinical data were monitored over the time of clinical management of the patients and/or collected from patient reports in the study center. Anonymised data were provided.
Example 3: DNA and RNA isolation
Tissue samples were homogenized in a FASTPREP homogenizer (MP Biomedicals, Eschwege, Germany) in lysis buffer provided with the Qiagen "All Prep" nucleic acid preparation kit (Qiagen, Hilden, Germany). DNA and RNA concentrations were measured on a Nanodrop photometer. RNA quality was controlled using a Bioanalyzer (Agilent, Waldbronn, Germany). All conditions were according to manufacturer's recommendations.
Example 4: Whole genome expression profiling
RNA samples derived from breast cancer tissue were analyzed with 44k human whole genome oligo microarrays (Agilent Technologies).
RNA expression levels from different samples were analyzed on a single microarray using the Single-Color Low RNA Input Linear Amplification Kit PLUS (Agilent Technologies, Waldbronn, Germany). For each amplification, 200ng of total RNA were employed and amplified samples were prepared for hybridization using the Gene Expression Hybridization Kit (Agilent Technologies). Hybridization was performed overnight at 65°C in a rotating hybridization oven (Agilent Technologies). Stringency washes, image acquisition and feature extraction were performed according to the manufacturer's protocol (Agilent Technologies, Waldbronn, Germany).
Example 5: Principle of the assay and design
The inventive assay is a multiplexed assay for DNA methylation testing of up to (or even more than) 360 methylation candidate markers, enabling convenient methylation analyses for tumor-marker definition. In its best mode the test is a combined multiplex-PCR and microarray hybridization technique for multiplexed methylation testing. The inventive marker genes, PCR primer sequences, hybridization probe sequences and expected PCR products are given in table 1, above.
Targeting hypermethylated DNA regions in the inventive marker genes in several neoplasias, methylation analysis is performed via methylation-sensitive restriction enzyme (MSRE) digestion of 500ng of starting DNA. A combination of several MSREs ensures complete digestion of unmethylated DNA. All targeted DNA regions have been selected such that sequences containing multiple MSRE sites are flanked by methylation independent restriction enzyme sites. This strategy enables pre-amplification of the methylated DNA fraction before methylation analyses. Thus, the design and pre-amplification would enable methylation testing on serum, urine, stool etc. when DNA is limiting.
When testing DNA without pre-amplification, upon digestion of 500ng the methylated DNA fraction is amplified within 16 multiplex PCRs and detected via microarray hybridization. Within these 16 multiplex-PCR reactions 360 different human DNA products can be amplified. Of these, about 20 amplicons serve as digestion and amplification controls and are either derived from known differentially methylated human DNA regions, or from several regions without any sites of the MSREs used in this system. The primer set used (every reverse primer is biotinylated) targets 347 different sites located in the 5'UTR of 323 gene regions.
After PCR, amplicons are pooled and positives are detected using streptavidin-Cy3 via microarray hybridization. Although the melting temperature of CpG rich DNA is very high, primer and probe design as well as hybridization conditions have been optimized, so that this assay enables unequivocal multiplexed methylation testing of human DNA samples. The assay has been designed such that 24 samples can be run in parallel using 384-well PCR plates.
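The plate capacity follows from the design: 24 samples times 16 multiplex PCRs per sample fill exactly 384 wells. As a sketch of one possible plate map, assuming (this assignment is not specified in the text) that each sample occupies one of the 24 columns and each primer premix one of the 16 rows A to P, with the hypothetical helper `well_for`:

```python
import string

ROWS = string.ascii_uppercase[:16]  # rows A..P of a 384-well plate
N_COLUMNS = 24                      # columns 1..24

def well_for(sample: int, premix: int) -> str:
    """Well name for a sample (1-24) and a primer premix (1-16).

    Assumed layout: one column per sample, one row per primer premix.
    """
    if not 1 <= sample <= N_COLUMNS:
        raise ValueError("sample must be in 1..24")
    if not 1 <= premix <= len(ROWS):
        raise ValueError("premix must be in 1..16")
    return f"{ROWS[premix - 1]}{sample}"

# Full plate map: every (sample, premix) pair gets its own well
layout = {(s, p): well_for(s, p)
          for s in range(1, N_COLUMNS + 1)
          for p in range(1, len(ROWS) + 1)}

print(len(layout), well_for(1, 1), well_for(24, 16))
```

With this layout the 16 reactions of one sample sit in a single column, which keeps the later pooling of the 16 amplicons per sample simple.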
Handling of many DNA samples in several plates in parallel can be easily performed, enabling completion of analyses within 1-2 days.
The entire procedure enables the user to set up a specific PCR test and subsequent gel-based or hybridization-based testing of selected markers using single primer pairs or primer subsets as provided herein or identified by the inventive method from the 360 marker set.
Example 6: MSRE digestion of DNA
MSRE digestion of DNA (about 500ng) was performed at 37°C overnight in a volume of 30μl in 1x Tango restriction enzyme digestion buffer (MBI Fermentas) using 8 units of each of the MSREs AciI (New England Biolabs), Hin6I and HpaII (both from MBI Fermentas). Digestions were stopped by heat inactivation (10 min, 75°C) and subjected to PCR amplification.
Example 7: PCR amplification
An aliquot of 20μl MSRE digested DNA (or, in case of preamplification of methylated DNA as described below, about 500ng in a volume of 20μl) was added to 280μl of PCR premix (without primers). The premix consisted of all reagents to obtain a final concentration of 1x HotStarTaq Buffer (Qiagen), 160μM dNTPs, 5% DMSO and 0.6U Hot Firepol Taq (Solis Biodyne) per 20μl reaction. Alternatively an equal amount of HotStarTaq (Qiagen) could be used. Eighteen (18) μl of the premix including digested DNA were aliquoted into each of 16 0.2ml PCR tubes, and to each PCR tube 2μl of the respective primer premix 1-16 (containing 0.83pmol/μl of each primer) were added. PCR reactions were amplified using a thermal cycling profile of 15min/95°C and 40 cycles of each 40sec/95°C, 40sec/65°C, 1min20sec/72°C and a final elongation of 7min/72°C, then reactions were cooled. After amplification the 16 different multiplex-PCR amplicons from each DNA sample were pooled. Successful amplification was controlled using 10μl of the pooled 16 different PCR reactions per sample. Positive amplification yielded a smear in the range of 100-300 bp on EtBr stained agarose gels; negative amplification controls must not show a smear in this range.
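The volumes and times stated in this example can be checked with a few lines of arithmetic. The sketch below uses only the figures given above; the variable names are hypothetical, and the cycling time ignores temperature ramping, which depends on the instrument.

```python
# Volumes as stated in Example 7 (all in microlitres)
DNA_UL = 20.0               # MSRE digested DNA
PREMIX_UL = 280.0           # PCR premix without primers
N_REACTIONS = 16            # one tube per primer premix
ALIQUOT_UL = 18.0           # premix plus DNA per tube
PRIMER_MIX_UL = 2.0         # primer premix per tube
PRIMER_PMOL_PER_UL = 0.83   # each primer in the primer premix

total_mix_ul = DNA_UL + PREMIX_UL                    # 300 ul prepared
spare_ul = total_mix_ul - N_REACTIONS * ALIQUOT_UL   # pipetting reserve
reaction_ul = ALIQUOT_UL + PRIMER_MIX_UL             # final reaction volume

# pmol/ul equals umol/l, so multiply by 1000 for nmol/l
primer_nM = PRIMER_MIX_UL * PRIMER_PMOL_PER_UL / reaction_ul * 1000

# Programmed hold time: 15 min + 40 x (40 s + 40 s + 80 s) + 7 min
hold_s = 15 * 60 + 40 * (40 + 40 + 80) + 7 * 60

print(f"reaction volume: {reaction_ul} ul, reserve: {spare_ul} ul")
print(f"each primer: {primer_nM:.0f} nM in the final reaction")
print(f"cycling hold time: {hold_s // 60} min {hold_s % 60} s")
```

The calculation gives a 20μl reaction, consistent with the "per 20μl reaction" reagent concentrations, about 83 nM of each primer, a 12μl reserve of premix, and a programmed hold time of 128 min 40 s.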
Example 8: Microarray hybridization and detection:
Microarrays with the probes of the 360 marker set are blocked for 30 min in 3M urea containing 0.1% SDS at room temperature, submerged in a stirred Coplin jar. After blocking, slides are washed in 0.1x SSC/0.2% SDS for 5 min, dipped into water and dried by centrifugation.
The PCR amplicon pool of each sample is mixed with an equal amount of 2x hybridization buffer (7xSSC, 0.6%SDS, 50% formamide), denatured for 5min at 95°C and held at 70°C until loading an aliquot of 100μl onto an array covered by a gasket slide (Agilent). Arrays are hybridized under maximum speed of rotation in an Agilent hybridization oven for 16h at 52°C. After removal of gasket slides, microarray slides are washed at room temperature in wash solution I (1xSSC, 0.2%SDS) for 5 min and wash solution II (0.1xSSC, 0.2%SDS) for 5 min, followed by a final wash by dipping the slides 3 times into wash solution III (0.1xSSC); the slides are then dried by centrifugation.
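Mixing the amplicon pool with an equal volume of 2x hybridization buffer halves each buffer component. The following sketch works this out, assuming that the amplicon pool itself contributes no SSC, SDS or formamide; the function name `dilute` is hypothetical.

```python
def dilute(stock: dict, stock_volume: float, sample_volume: float) -> dict:
    """Final concentration of each stock component after mixing with a sample
    that is assumed to contain none of these components."""
    factor = stock_volume / (stock_volume + sample_volume)
    return {name: conc * factor for name, conc in stock.items()}

# 2x hybridization buffer as stated above
buffer_2x = {"SSC (x)": 7.0, "SDS (%)": 0.6, "formamide (%)": 50.0}

# Equal volumes of buffer and amplicon pool (1 part each)
final = dilute(buffer_2x, stock_volume=1.0, sample_volume=1.0)
for name, conc in final.items():
    print(f"{name}: {conc:g}")
```

Under this assumption the hybridization takes place in 3.5x SSC, 0.3% SDS and 25% formamide.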
For detection of hybridized biotinylated PCR amplicons, streptavidin-Cy3-conjugate (Caltag Laboratories) is diluted 1:400 in PBST-MP (1xPBS, 0.1% Tween 20; 1% skimmed dry milk powder [Sucofin; Germany]), pipetted onto microarrays covered with a coverslip and incubated 30 min at room temperature in the dark. Then coverslips are washed off from the slides using PBST (1xPBS, 0.1% Tween 20) and then slides are washed in fresh PBST for 5 min, rinsed with water and dried by centrifugation.
Example 9: DNA Preamplification for methylation profiling (optional)
In many situations the amount of DNA is limited. Although the inventive methylation test performs well with low amounts of DNA (see above), minimally invasive testing using cell free DNA from serum, stool, urine and other body fluids is of particular diagnostic relevance.
In the present case only 10-100ng were obtained from 1ml of serum when testing cell free DNA from serum of breast cancer patients. From a set of patients with "chronic lymphatic leukemia" (CML) only limited amounts of about 100ng were available; thus those samples were also preamplified prior to methylation testing as follows: DNA was digested with restriction enzyme FspI (and/or Csp6I, and/or MseI, and/or Tsp509I; or their isoschizomers) and after (heat) inactivation of the restriction enzyme the fragments were circularized using T4 DNA ligase. Ligation products were digested using a mixture of methylation sensitive restriction enzymes. Upon enzyme inactivation the entire mixture was amplified using rolling circle amplification (RCA) by phi29-phage polymerase. The RCA amplicons were then directly subjected to the multiplex PCRs of the inventive methylation test without further need of digestion of the DNA prior to amplification.
Alternatively, the preamplified DNA, which is enriched for methylated DNA regions, can be directly subjected to fluorescent labelling, and the labelled products can be hybridized onto the microarrays using the same conditions as described above for hybridization of PCR products. In that case the streptavidin-Cy3 detection step is omitted and the slides are scanned directly after the stringency washes and drying. Depending on the experimental design for the microarray analyses, either single-labelled or dual-labelled hybridizations can be generated. In our experience the single-label design works well for class comparisons. Although the preamplification protocol enables analysis of minute amounts of DNA, it is also suited for performing genomic methylation screens.
To elucidate methylation biomarkers for prediction of metastasis risk on a genome-wide level, we subjected 500 ng of DNA derived from primary tumor samples to amplification of the methylated DNA using the procedure outlined above. RCA amplicons derived from metastasised and non-metastasised samples were labelled using the CGH Labeling Kit (Enzo, Farmingdale, NY) and the labelled products were hybridized onto human 244k CpG island arrays (Agilent, Waldbronn, Germany). All manipulations were performed according to the manufacturers' instructions.
Example 10: Data Analysis
Hybridizations performed on a chip with probes for the inventive 360 marker genes were scanned using a GenePix 4000A scanner (Molecular Devices, Ismaning, Germany) with a PMT setting of 700V/cm (equal for both wavelengths). Raw image data were extracted using GenePix 6.0 software (Molecular Devices, Ismaning, Germany).
Hybridizations performed on whole genome arrays were scanned using an Agilent DNA microarray scanner and raw image data were extracted using the Agilent Feature Extraction Software (v9.5.3.1).
Microarray data analyses were performed using BRB-ArrayTools developed by Dr. Richard Simon and the BRB-ArrayTools Development Team. The software package BRB-ArrayTools (version 3.6; on the web at linus.nci.nih.gov/BRB-ArrayTools.html) was used according to the authors' recommendations; the settings used for the analyses are given with the results where appropriate. For every hybridization, background intensities were subtracted from foreground intensities for each spot. Global normalization was used to median-center the log-ratios on each array in order to adjust for differences in spot/label intensities.
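The two preprocessing steps named above (per-spot background subtraction and median-centering of log-ratios) can be sketched as follows. This is a minimal illustration, not the BRB-ArrayTools implementation: the function names, the intensity floor of 1.0, the use of a reference channel for the ratio, and all intensity values are assumptions made for the example.

```python
import math
from statistics import median


def background_corrected(foreground, background, floor=1.0):
    # Subtract the local background from each spot's foreground intensity.
    # The floor is an assumption added here so that logarithms stay defined.
    return [max(f - b, floor) for f, b in zip(foreground, background)]


def median_centered_log_ratios(signal, reference):
    # log2 ratio per spot, shifted so that the median of the array is zero.
    log_ratios = [math.log2(s / r) for s, r in zip(signal, reference)]
    center = median(log_ratios)
    return [value - center for value in log_ratios]


# Hypothetical spot intensities for one array (illustrative values only).
fg_sample = [1200.0, 850.0, 400.0, 5000.0, 95.0]
bg_sample = [100.0, 90.0, 110.0, 120.0, 100.0]
fg_ref = [600.0, 900.0, 390.0, 1300.0, 300.0]
bg_ref = [100.0, 100.0, 90.0, 100.0, 100.0]

sample = background_corrected(fg_sample, bg_sample)
ref = background_corrected(fg_ref, bg_ref)
normalized = median_centered_log_ratios(sample, ref)
```

After centering, the median log-ratio of every array is zero, so arrays with different overall label intensity become comparable.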
P-values (p) used for feature selection for classification and prediction were based on the univariate significance levels (alpha). P-values (p) and the misclassification rate during cross-validation (MCR) are given alongside the result data.
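How a cross-validated misclassification rate (MCR) with univariate feature selection can be obtained is sketched below. It is a simplified stand-in and not the procedure of BRB-ArrayTools: a cutoff on the absolute t-statistic replaces the p-value threshold (alpha), a nearest-centroid rule replaces the classifiers named in the examples, and the data are invented. The point illustrated is that gene selection is repeated inside every leave-one-out fold.

```python
from statistics import mean, variance


def t_statistic(a, b):
    # Two-sample t-statistic with pooled variance.
    na, nb = len(a), len(b)
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / (pooled * (1 / na + 1 / nb)) ** 0.5


def select_genes(samples, labels, t_cutoff):
    # Keep gene indices whose |t| reaches the cutoff (stand-in for p < alpha).
    chosen = []
    for g in range(len(samples[0])):
        a = [s[g] for s, y in zip(samples, labels) if y == 0]
        b = [s[g] for s, y in zip(samples, labels) if y == 1]
        if abs(t_statistic(a, b)) >= t_cutoff:
            chosen.append(g)
    return chosen


def nearest_centroid(train, labels, genes, x):
    # Assign x to the class whose mean profile over the selected genes is closest.
    dist = {}
    for cls in (0, 1):
        rows = [s for s, y in zip(train, labels) if y == cls]
        dist[cls] = sum((x[g] - mean(r[g] for r in rows)) ** 2 for g in genes)
    return min(dist, key=dist.get)


def loocv_mcr(samples, labels, t_cutoff=3.0):
    # Leave-one-out cross-validation; gene selection is repeated in every fold.
    errors = 0
    for i in range(len(samples)):
        train = samples[:i] + samples[i + 1:]
        train_labels = labels[:i] + labels[i + 1:]
        genes = select_genes(train, train_labels, t_cutoff)
        genes = genes or list(range(len(samples[0])))  # fall back to all genes
        if nearest_centroid(train, train_labels, genes, samples[i]) != labels[i]:
            errors += 1
    return 100.0 * errors / len(samples)


# Hypothetical log-intensities: gene 0 separates the classes, gene 1 does not.
samples = [[1.0, 5.0], [1.2, 4.0], [0.9, 6.0], [1.1, 5.5],
           [3.0, 5.2], [3.2, 4.4], [2.9, 5.8], [3.1, 5.1]]
labels = [0, 0, 0, 0, 1, 1, 1, 1]
mcr_percent = loocv_mcr(samples, labels)
```

Selecting genes on the full data set before cross-validation would bias the MCR downwards, which is why the selection step sits inside the loop.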
Example 11: Multiplexed methylation testing outperforms the "classification" success when compared to genomewide and targeted screenings via RNA expression profiling
RNA and DNA breast cancer tissue samples of the primary tumor from patients were used for genomic expression profiling and DNA methylation analyses, respectively, for elucidation of biomarkers to predict metastasis during follow-up of disease. From the 44k expression analyses of patient samples with (n=6) and without (n=6) metastases, class prediction elucidated 961 different RNA expression markers suitable for classification of either group (Figure 1). Cross-validation yielded 83% correct classification for prediction of development of metastases during follow-up of breast cancer patients.
In addition, expression data of a subset of 385 biomarkers elucidated by Lauss 2007 (Lauss M, Kriegner A, Vierlinger K, Visne I, Yildiz A, Dilaveroglu E, Noehammer C. Consensus genes of the literature to predict breast cancer recurrence. Breast Cancer Res Treat 2008; 110: 235-44) from the 44k Agilent expression arrays were used as a second comparison for class prediction and yielded 67% correct classification of patients with and without metastasis.
Using the inventive DNA methylation data of the same primary tumor samples as used for class prediction via expression profiling, good classification of both primary tumor groups by only a few genes (n=4; p=0.01) was obtained. Class prediction using these classifiers gave a correct classification of more than 83% with different statistical tests. The best classification, 100%, was obtained using diagonal linear discriminant analysis. Figure 2 presents the performance of genome-scale expression profiling and of "targeted" expression profiling of a predefined marker set (Lauss 2007) versus the inventive methylation testing for predicting the risk of metastasis in breast cancer patients when analysing primary tumor tissue.
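For readers unfamiliar with diagonal linear discriminant analysis, the following sketch shows the general technique: each gene gets a class mean and one pooled within-class variance, and a sample is assigned to the class with the smallest variance-scaled squared distance. Equal class priors are assumed, the function names are illustrative, and the two-gene data are invented; the sketch does not reproduce the 4-gene classifier reported above.

```python
from statistics import mean


def fit_dlda(samples, labels):
    # Per-class gene means plus one pooled within-class variance per gene.
    classes = sorted(set(labels))
    n_genes = len(samples[0])
    means = {
        c: [mean(s[g] for s, y in zip(samples, labels) if y == c)
            for g in range(n_genes)]
        for c in classes
    }
    pooled = []
    for g in range(n_genes):
        ss = sum((s[g] - means[y][g]) ** 2 for s, y in zip(samples, labels))
        pooled.append(ss / (len(samples) - len(classes)))
    return means, pooled


def predict_dlda(model, x):
    # Choose the class with the smallest variance-scaled squared distance.
    means, pooled = model

    def score(c):
        return sum((xg - mg) ** 2 / v for xg, mg, v in zip(x, means[c], pooled))

    return min(means, key=score)


# Hypothetical log-intensities for two genes; "m" = metastasis, "nm" = none.
train = [[8.0, 2.0], [8.4, 2.5], [7.8, 1.8],
         [6.0, 2.2], [6.3, 2.6], [5.9, 1.9]]
train_labels = ["m", "m", "m", "nm", "nm", "nm"]
model = fit_dlda(train, train_labels)
```

Because covariances between genes are ignored, the method needs few parameters, which suits small sample sets such as n=6 per group.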
Example 12: Multiplexed Methylation Testing enables identification of biomarkers for a wide variety of (neoplastic) diseases
12.1 Classification of Tumor vs Normal & histologically different tumor subgroups exemplified using breast cancer patient tissue
Although prediction of the risk of metastasis is a major challenge and would be of great interest for therapeutic intervention, it is also of interest to distinguish histological entities of primary breast tumors and to distinguish normal tissue from tumor tissue. Therefore DNA derived from several ductal (n=8) and lobular (n=8) primary tumors was subjected to the methylation test. From several patients normal tissue (n=4) adjacent to the primary tumor was also available for analysis. Class prediction using the binary tree algorithm within BRB-AT gave good classification (MCR=12.5%) of the histopathologically distinct subgroups of lobular and ductal primary breast tumors with an 8-gene classifier (p<0.005). Although normal tissue adjacent to the neoplastic nodes was available from only 4 patients, 12 methylation markers enable its distinction from tumors (p<0.005; MCR=30%; table 3).
Binary tree prediction for classification of normal (Bre) breast tissue, and ductal (Duct) and lobular (Lob) breast carcinomas. Gene classifiers discriminating nodes 1) and 2) of the binary tree are listed in subtables 1) & 2), respectively.
Optimal Binary Tree:
Table 2: Cross-validation error rates for a fixed tree structure shown below
Figure imgf000052_0001
Node 1
Table 3: Composition of classifier (12 genes) - Sorted by p-value:
Figure imgf000052_0002
Figure imgf000053_0001
Node 2
Table 4: Composition of classifier (8 genes)- Sorted by p-value:
Figure imgf000053_0002
To test the usability of the inventive methylation test on neoplasias other than breast cancer, several solid tumor entities of the thyroid and brain as well as leukemia (ALL, CML) samples were tested. Different clinically relevant classes were analysed for each setting, and all samples and most subgroups could be successfully classified.
Example 13: Classification of diseased versus healthy on minimal amounts of initial DNA samples upon preamplification confirms suitability of the test for diagnosis of neoplastic disease
13.1 Classification upon preamplification exemplified by distinguishing chronic myeloid leukemia (CML) and normal DNA
The methylation patterns of a set of 28 different DNA samples derived from patients suffering from chronic myeloid leukemia versus 12 normal controls were analysed. DNA samples were derived from 8 CML patients at diagnosis, 13 patients in the chronic phase of disease, 3 patients in the accelerated phase and 3 blast crisis patients.
Because only limited amounts of DNA were available from patients, DNA (100 ng) from CML patients and controls was subjected to the preamplification outlined in example 6.
The amplicons derived from the preamplification procedure were directly subjected to the inventive methylation test.
Binary Tree Prediction of leukemias versus normal controls performed well in distinguishing leukemia at the different stages of disease from normal controls with a 36-gene classifier (p<0.005; MCR=12.5%). Although more specific analyses were also performed to distinguish different subtypes, this example illustrates that the test is suitable for classification of neoplastic disease upon selective preamplification of methylated DNA. Thus, even if only limited amounts of sample DNA are available, the inventive methylation test can be applied successfully after preamplification.
Table 5: Composition of classifier (36 genes) - Sorted by p-value:
Figure imgf000054_0001
Figure imgf000055_0001
Figure imgf000056_0001
13.2 Classification of diseased versus healthy individuals using DNA samples derived from serum confirms suitability of the test for minimally invasive diagnostic testing of cancer (breast cancer)
DNA was isolated from serum of breast cancer patients (n=16) at initial diagnosis, of healthy female controls (n=6) and of two patients with benign tumors. The minute amounts of serum DNA (about 10-100 ng/ml) derived from patients and controls were subjected to preamplification of the methylated DNA fraction as outlined in the methods. The resulting amplicons were subjected to methylation testing using the inventive methylation test. Different statistical methods for class prediction successfully elucidated classifiers for distinguishing patients with malign tumors (n=18) from benign and healthy controls (n=8). Binary Tree Prediction of serum from tumors versus controls performed well in distinguishing diseased from normal individuals with a 9-gene classifier (p<0.005; MCR=16.7%). This example illustrates that the test is suitable for classification of neoplastic disease, in this case breast cancer, from the serum of patients. In other words, the test enables minimally invasive diagnosis of malignancies.
Table 6: Comparison of T (malign tumors) /N (benign node & normal), Composition of classifier (9 genes) - Sorted by p-value:
Figure imgf000056_0002
Figure imgf000057_0001
Table 7: Performance of classifiers during cross-validation
Figure imgf000057_0002
Performance of the 3-Nearest Neighbours Classifier:
Figure imgf000058_0001
Table 8: Composition of classifier - Sorted by t-value:
Figure imgf000058_0002
Example 14: Thyroid-Cancer-Diagnostics: diagnostic methylation markers for elucidation of nodular thyroid disease
6 histological classes were used:
SD... normal thyroid tissue
SN... Struma nodosa (benign)
FTA... Follicular adenoma (benign)
FTC... Follicular thyroid carcinoma (malign)
PTC... Papillary thyroid carcinoma (malign)
MTC... Medullary thyroid carcinoma (malign)
1. Of diagnostic importance would be to distinguish "benign" vs "malign" entities.
MTC has been excluded within this class comparison due to its low frequency (about 5% of all thyroid malignancies) but is elucidated by the different genes in chapter 2.
2. Within the "binary tree prediction approach" MTC is distinguished from other entities (FTA, FTC, PTC, SN) as depicted in the "node 2" classification list. Although in 2) all classes are distinguished (sometimes with a rather low correct classification rate), those contrasts which are of utmost clinical/diagnostic relevance were analysed in detail for distinguishing:
3.1. FTC vs FTA using "Class Prediction" for defining an 18-gene classifier (100% correct classification)
3.2. FTC vs FTA using another feature selection strategy resulting in a 15-gene classifier (97% correct classification)
3.3. PTC vs FTA
3.4. FTC vs SN
3.5. PTC vs SN
14.1. Benign (SN, FTA) vs Malign (FTC, PTC)
Table 9: Sorted by p-value of the univariate test.
Class 1: benign; Class 2: FTC or PTC.
The first 7 genes are significant at the nominal 0.01 level of the univariate test
Figure imgf000059_0001
The support vector machine classifier was used for class prediction. There were 5 nodes in the classification tree.
Cross-validation error rates for a fixed optimal binary tree structure shown below
Figure imgf000060_0001
Results of classification, Node 1: Cross-validation results for a fixed tree structure: patients correctly classified
FTA 16/17
FTC 20/20
MTC 7/7
PTC 18/18
SD 0/5
SN 18/18
Percent correctly classified: 93%
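The overall percentage for Node 1 is consistent with pooling the per-class counts listed above (79 of 85 samples). A small sketch of that arithmetic, with the counts copied from the list and the function name chosen here for illustration:

```python
# Node 1 cross-validation counts as listed above: class -> (correct, total).
node1 = {
    "FTA": (16, 17),
    "FTC": (20, 20),
    "MTC": (7, 7),
    "PTC": (18, 18),
    "SD": (0, 5),
    "SN": (18, 18),
}


def percent_correct(results):
    # Pool all classes: total correct calls divided by total samples.
    correct = sum(c for c, _ in results.values())
    total = sum(n for _, n in results.values())
    return 100.0 * correct / total


overall = percent_correct(node1)
```

Note that the pooled figure hides the fact that none of the 5 SD samples was classified correctly at this node.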
Table 10: Composition of classifier (5 genes) - Sorted by p-value:
Figure imgf000060_0002
Results of classification, Node 2: Cross-validation results for a fixed tree structure
Composition of classifier (9 genes): patients correctly classified
FTA 16/17
FTC 19/20
MTC 4/8
PTC 19/19
SN 19/19
Percent correctly classified: 93%
Table 11: Composition of classifier (9 genes) - Sorted by p-value:
Figure imgf000061_0001
Results of classification, Node 3:
Cross-validation results for a fixed tree structure: patients correctly classified
FTA 17/17
FTC 18/20
PTC 18/19
SN 7/18
Percent correctly classified: 80%
Table 12: Composition of classifier (5 genes) - Sorted by p-value:
Figure imgf000061_0002
Figure imgf000062_0001
Results of classification, Node 4:
Cross-validation results for a fixed tree structure: patients correctly classified
FTA 5/17
FTC 15/20
PTC 16/18
Percent correctly classified: 66%
Table 13: Composition of classifier (9 genes) - Sorted by p-value:
Figure imgf000062_0002
Results of classification, Node 5:
Cross-validation results for a fixed tree structure: patients correctly classified
FTC 12/20
PTC 13/19
Percent correctly classified: 64%
Table 14: Composition of classifier (8 genes) - Sorted by p-value:
Figure imgf000063_0001
Example 15: Specific Diagnostically Challenging Contrasts
15.1 FTC/FTA using an 18-gene list derived from the test obtained 100% correct classification
Table 15: Composition of classifier - Sorted by t-value (Sorted by gene pairs)
Class 1: FTA; Class 2: FTC
Figure imgf000063_0002
Figure imgf000064_0001
15.2 FTC/FTA 15 gene list/97% Correct Classification Performance of the Support Vector Machine Classifier
Table 16: Composition of classifier - Sorted by t-value: Class 1: FTA; Class 2: FTC
Figure imgf000064_0002
15.3 PTC vs FTA
Figure imgf000065_0001
15.4 FTC vs. SN
Table 17: Sorted by p-value of the univariate test. Class 1: FTC; Class 2: SN. The first 38 genes which discriminate among classes and are significant at the nominal 0.05 level of the univariate test
Figure imgf000065_0002
Figure imgf000066_0001
Figure imgf000067_0001
15.5 PTC/SN
Table 18: Genes which discriminate among classes - Sorted by p-value of the univariate test.
Class 1: PTC; Class 2: SN. The first 16 genes are significant at the nominal 0.05 level of the univariate test
Figure imgf000067_0002
Example 16: DNA Methylation Biomarkers for Breast Cancer Diagnostics
1. distinguishing Breast Cancer (BrCa) from healthy breast tissue
2. Metastasis Markers: elucidation and prediction of patients at risk of developing metastases using tissue specimens from the primary tumor at the time of initial surgery
2.1. ARC-CpG 360 test on original tumor DNA
2.2. ARC-CpG 360 test on original tumor DNA (using housekeeping genes normalisation)
2.3. ARC-CpG 360 test on original tumor DNA (using multiplex-normalisation)
2.4. distinguishing Metastasis/non-Metastasis applying the ARC-CpG 360 test on APA (adapter-primed amplification) products of original tumor DNA
2.5. applying the original DNA (APA-template) into the test
3. genelists for prediction of organ of metastases
3.1. Organ of Metastases
3.2. Organ of Metastases plus additional secondary affected metastasised organ ("liver_plus", "lung_plus", "bone_plus")
4. Breast Cancer (BrCa) diagnosis using DNA derived from serum of patients
- methylated DNA from serum of breast cancer patients was RCA-preamplified and subjected to ARC-CpG360 testing
4.1. Identification of BrCa patients - compound covariate predictor: patient (T) vs controls (N)
4.2. Identification of BrCa patients - support vector machines predictor: patient (T) vs controls (N)
4.3. Identification of BrCa patients - greedy pairs & Compound Covariate Predictor = 96% correct
4.4. Identification of BrCa patients - final combined list - greedy pairs = 100% correct
Abbreviations
lob... lobular breast carcinoma
duct... ductal breast carcinoma
bre or healthy... non-malignant breast tissue
ben... breast tissue derived from benign nodular disease (fibroadenoma)
m... patient samples (initial diagnosis) developing metastases during follow-up
nm... patient samples (initial diagnosis) with NO metastases during follow-up
T... tumor patient
N... normal control individual; in this setting the group N contains 4 healthy females and 2 females with a confirmed benign tumor (fibroadenoma).
16.1. distinguishing Breast Cancer (BrCa) from healthy breast tissue
16.1.1. lob/duct/healthy
Binary tree prediction
Cross-validation error rate for a fixed binary tree shown below:
Figure imgf000069_0001
Node 1
Table 19: Composition of classifier - Sorted by p-value
Figure imgf000069_0002
Node 2
Table 20: Composition of classifier - Sorted by p-value
Figure imgf000070_0001
16.1.2. lob/duct/healthy [derived from analyses using non-mixed hybridization conditions]
binary tree prediction
Node | Group 1 Classes | Group 2 Classes | Misclassification rate (%)
1 | ductal, lobular | healthy breast tissue | 20
2 | ductal | lobular | 25
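The two-node structure above (tumor vs healthy first, then ductal vs lobular) can be pictured as a small routing tree in which each node holds its own two-group classifier. The sketch below only illustrates that control flow: the `Node` type, the threshold rules standing in for the per-node gene classifiers, and the sample values are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass
class Node:
    # classify returns 1 or 2: the group of this node the sample falls into.
    classify: Callable[[Sequence[float]], int]
    group1: object  # either a further Node or a final class label
    group2: object


def predict(node, sample):
    # Walk down the tree until a class label (a plain string) is reached.
    while isinstance(node, Node):
        node = node.group1 if node.classify(sample) == 1 else node.group2
    return node


# Hypothetical single-marker rules standing in for the per-node classifiers.
node2 = Node(lambda s: 1 if s[1] > 500 else 2, "ductal", "lobular")
node1 = Node(lambda s: 1 if s[0] > 1000 else 2, node2, "healthy breast tissue")
```

Each node is trained and cross-validated on its own contrast, which is why a separate misclassification rate is reported per node.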
Node 1
Table 21: Composition of classifier - Sorted by p-value
Figure imgf000070_0002
Figure imgf000071_0001
Node 2
Table 22: Composition of classifier - Sorted by t-value
Figure imgf000071_0002
16.2. distinguishing Breast Cancer (BrCa) from benign breast tissue
16.2.1. Metastasis Markers:
16.2.1.1. NM vs M via Class prediction (88% correct classif; SVM)
Table 23: Composition of classifier - Sorted by t-value: Class 1: m; Class 2: nm.
# | Parametric p-value | t-value | % CV support | Geom mean of intensities in class 1 | Geom mean of intensities in class 2 | Ratio of geom means | Gene symbol
1 | 0.0046263 | -3.152 | 92 | 4618.6739493 | 22964.4785573 | 0.2011225 | SPARC
2 | 0.0053288 | 3.394 | 88 | 1444.9276316 | 646.777971 | 2.2340396 | PIWIL4
3 | 0.0004492 | 4.555 | 100 | 27438.2922955 | 13506.1416447 | 2.0315419 | SERPINB5
4 | 0.0001677 | 4.728 | 100 | 783.1275118 | 498.3793173 | 1.5713483 | TEAD1
5 | 0.0005684 | 4.962 | 100 | 9591.5219686 | 1035.8974395 | 9.2591425 | EREG
6 | 8e-07 | 8.333 | 100 | 7422.5296339 | 2919.4183827 | 2.5424686 | ZDHHC11
7 | 1.3e-06 | 12.86 | 100 | 4921.0334002 | 682.5406118 | 7.2098763 | C5orf4
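The "Ratio of geom means" values in Table 23 are consistent with dividing the class 1 geometric mean by the class 2 geometric mean (for SPARC, 4618.67 / 22964.48 ≈ 0.2011). A short sketch of both quantities follows; the function names are chosen here for illustration.

```python
import math


def geometric_mean(values):
    # Geometric mean = exponential of the mean of the logged intensities.
    return math.exp(sum(math.log(v) for v in values) / len(values))


def ratio_of_geom_means(class1, class2):
    # Values below 1 mean lower signal in class 1 than in class 2.
    return geometric_mean(class1) / geometric_mean(class2)


# Check against the SPARC row of Table 23 (class means taken from the table).
sparc_ratio = ratio_of_geom_means([4618.6739493], [22964.4785573])
```

Since intensities are analysed on a log scale, the geometric mean is the natural per-class summary: the log of the ratio equals the difference of the mean log-intensities.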
16.2.1.2. NM vs M via Class prediction (alternatively normalised upon "housekeeping genes" 79% correct classif; SVM)
Table 24: Composition of classifier - Sorted by t-value: Class 1: m; Class 2: nm.
# | Parametric p-value | t-value | % CV support | Geom mean of intensities in class 1 | Geom mean of intensities in class 2 | Fold-change | Gene symbol
1 | 0.0001415 | -14.234 | 96 | 816.9876694 | 22923.064881 | 0.0356404 | HSD17B4
2 | 0.0083213 | -4.853 | 79 | 666.243811 | 1526.6299468 | 0.4364147 | DSP
3 | 0.007109 | -2.968 | 29 | 9716.9365382 | 49855.3171528 | 0.1949027 | SPARC
4 | 0.0064307 | 3.132 | 67 | 1836.3963711 | 722.3717675 | 2.5421763 | KRT17
5 | 0.002913 | 3.348 | 92 | 59846.4011636 | 52021.1516203 | 1.1504244 | SRGN
6 | 0.0032404 | 3.599 | 92 | 5491.6264846 | 757.0224107 | 7.2542456 | C5orf4
7 | 0.000719 | 4.31 | 100 | 2679.7639487 | 996.300545 | 2.6897144 | PIWIL4
8 | 0.0003903 | 4.629 | 100 | 50887.0698062 | 27462.471286 | 1.8529676 | SERPINB5
9 | 0.0001157 | 5.283 | 100 | 13765.8286072 | 5936.1478811 | 2.3189834 | ZDHHC11
10 | 3.19e-05 | 6.444 | 100 | 23438.6348936 | 1484.7197091 | 15.7865722 | EREG
16.2.1.3. NM vs M upon multiplex normalisation Class prediction (binary tree prediction 83% correct classified)
Table 25: Composition of classifier - Sorted by p-value: Class 1: m (n=18); Class 2: nm (n=5)
# | Parametric p-value | t-value | % CV support | Geom mean of intensities in group 1 | Geom mean of intensities in group 2 | Ratio of geom means | Gene symbol
1 | < 1e-07 | -1e+07 | 87 | 885.547163 | 1982.5964469 | 0.4466603 | TIMP1
2 | < 1e-07 | 1e+07 | 100 | 1453.0368811 | 685.1212898 | 2.1208462 | COL21A1
3 | < 1e-07 | 1e+07 | 91 | 658.5014611 | 551.9419886 | 1.1930628 | COL1A2
4 | 0.0014608 | -5.539 | 87 | 471.8007783 | 761.7592596 | 0.6193568 | KL
5 | 0.0031096 | 3.87 | 100 | 1080.7271447 | 802.5161476 | 1.3466734 | CDKN2A
Table 26: Composition of classifier - Sorted by p-value
# | Parametric p-value | t-value | % CV support | Geom mean of intensities in group 1 | Geom mean of intensities in group 2 | Ratio of geom means | Gene symbol
1 | < 1e-07 | -1e+07 | 87 | 885.547163 | 1982.5964469 | 0.4466603 | TIMP1
2 | < 1e-07 | 1e+07 | 100 | 1453.0368811 | 685.1212898 | 2.1208462 | COL21A1
3 | < 1e-07 | 1e+07 | 91 | 658.5014611 | 551.9419886 | 1.1930628 | COL1A2
16.2.1.4. NM vs M upon APA Class prediction (Diagonal Linear Discriminant=100% correct classif; SVM =92%)
Table 27: Composition of classifier - Sorted by t-value: Class 1: m; Class 2: nm. [n=6 per group]
# | Parametric p-value | t-value | % CV support | Geom mean of intensities in class 1 | Geom mean of intensities in class 2 | Fold-change | Gene symbol
1 | 6.8e-06 | -8.508 | 100 | 699.2454811 | 3384.966489 | 0.2065738 | BCL2A1
2 | 1.24e-05 | -7.956 | 100 | 1144.4907092 | 6068.1628967 | 0.1886058 | SERPINB2
3 | 4.68e-05 | -6.81 | 100 | 1612.7663831 | 6041.3773778 | 0.2669534 | SERPINE1
4 | 5.75e-05 | -6.644 | 100 | 2910.0453562 | 9519.5437319 | 0.3056917 | CLIC4
5 | 0.0002064 | -5.671 | 100 | 599.5692432 | 5858.0250012 | 0.1023501 | BCL2A1
6 | 0.0009722 | 4.605 | 17 | 196.0758645 | 122.9028821 | 1.5953724 | ZNF256
7 | 0.0003679 | 5.26 | 75 | 329.0570275 | 139.2977392 | 2.3622568 | ZNF573
8 | 5.61e-05 | 6.664 | 100 | 1752.8553582 | 626.6081244 | 2.7973709 | GNAS
9 | 5.32e-05 | 6.706 | 100 | 360.9191684 | 110.8183685 | 3.2568533 | SERPINB2
16.2.1.5. NM vs M using the APA-template for Class prediction (SVM =92%)
Table 28: Composition of classifier - Sorted by t-value: Class 1: m; Class 2: nm.
# | Parametric p-value | t-value | % CV support | Geom mean of intensities in class 1 | Geom mean of intensities in class 2 | Fold-change | Gene symbol
1 | 2.9e-05 | -7.206 | 100 | 4505.2826317 | 10969.8418903 | 0.4106971 | TDRD6
2 | 0.0001947 | -5.713 | 100 | 966.693001 | 4664.3939694 | 0.2072494 | XIST
3 | 0.0006817 | -4.84 | 17 | 1735.0546548 | 10070.7577787 | 0.1722864 | LZTS1
4 | 0.0009291 | -4.635 | 8 | 1817.5569529 | 4443.2023065 | 0.4090646 | IRF4
Example 17: genelists for prediction of organ of metastases
17.1. Organ of Metastases (Binary Tree Classification)
Optimal Binary Tree: Cross-validation error rates for a fixed tree structure shown below
Node | Group 1 Classes | Group 2 Classes | Mis-classification rate (%)
1 | bone, liver, lung | nm | 17.4
2 | bone, lung | liver | 38.
Node 1
Table 29: Composition of classifier (6 genes) - Sorted by p-value:
Figure imgf000074_0001
Figure imgf000075_0001
Node 2
Table 30: Composition of classifier (5 genes) - Sorted by p-value:
Figure imgf000075_0002
17.2. Organ of Metastases plus additional metastasised organ (Binary Tree Classification)
Optimal Binary Tree:
Cross-validation error rates for a fixed tree structure shown below
Figure imgf000076_0001
17.2.1. Results of classification, Node 1 :
Table 31: Composition of classifier (3 genes) - Sorted by p-value:
Figure imgf000076_0002
17.2.2. Results of classification, Node 2:
Table 32: Composition of classifier (3 genes) - Sorted by p-value:
Figure imgf000076_0003
17.2.3. Results of classification, Node 3:
Table 33: Composition of classifier (7 genes) - Sorted by p-value:
Figure imgf000077_0001
17.2.4. Results of classification, Node 4:
Table 34: Composition of classifier (6 genes) - Sorted by p-value:
Figure imgf000077_0002
17.3. Organ of Metastases plus additional metastasised organ (Binary Tree Classification) -Genefilters on
GENEFILTERS ON = Exclude a gene under any of the following conditions :
Less than 20% of methylation data have at least a 1.5-fold change in either direction from the gene's median value
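The gene filter stated above can be expressed in a few lines. This sketch assumes that the values are positive intensities on a linear scale; the function name and the example values are illustrative, and the exact handling of boundary cases in BRB-ArrayTools may differ.

```python
from statistics import median


def passes_fold_change_filter(values, fold=1.5, min_fraction=0.20):
    # A gene is kept when at least `min_fraction` of its values lie at least
    # `fold`-fold above or below the gene's own median.
    med = median(values)
    changed = sum(1 for v in values if v >= fold * med or v <= med / fold)
    return changed / len(values) >= min_fraction


# Hypothetical linear-scale intensities for two genes across ten samples.
flat_gene = [100.0] * 9 + [160.0]              # only 10% of values deviate
variable_gene = [100.0] * 7 + [40.0, 160.0, 200.0]  # 30% of values deviate
```

Filtering out genes with little variation across samples reduces the number of univariate tests before classifier construction.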
Optimal Binary Tree:
Figure imgf000078_0001
Cross-validation error rates for a fixed tree structure shown below
Figure imgf000078_0002
17.3.1. Results of classification, Node 1 :
Table 35: Composition of classifier (3 genes) - Sorted by p-value:
Figure imgf000078_0003
17.3.2. Results of classification, Node 2:
Table 36: Composition of classifier (3 genes) - Sorted by p-value:
17.3.3. Results of classification, Node 3 :
Table 37: Composition of classifier (7 genes) - Sorted by p-value:
Figure imgf000079_0002
17.3.4. Results of classification, Node 4 :
Table 38: Composition of classifier (6 genes) - Sorted by p-value:
Figure imgf000079_0003
Figure imgf000080_0001
Example 17.4. Breast Cancer (BrCa) diagnosis using DNA derived from serum of patients
Example 17.4.1: Classifier defined using the inventive methylation test can be used for correct diagnosis and confirms scalability of the test
To design a practical test including only diagnostically relevant classifiers, the performance of different feature extraction strategies was evaluated by cross-validation, starting from candidate markers derived from the methylation test of all 360 markers.
The different feature extraction strategies were based on p-values (p<0.005), a "Greedy Pairs" approach (n=10 greedy pairs), and a recursive feature elimination method. From these approaches a final marker panel for serum testing was chosen that achieved 100% correct classification during cross-validation with statistical tests such as Compound Covariate Predictor, Diagonal Linear Discriminant Analysis, 1-Nearest Neighbour Centroid, and Bayesian Compound Covariate Predictor; other approaches such as 3-Nearest Neighbours and Support Vector Machines resulted in 95% correct classification during cross-validation.
Only 19 selected biomarkers derived from feature extraction of all 360 marker-candidates were used in a separate assay and serum-DNA samples from patients and controls were analyzed. Using the 19 methylation markers 100% correct classification of tumor-samples (n=9) versus controls (n=9; Figure 3) was obtained.
17.4.2. T vs N (Compound Covariate Predictor = 83% correct)
Table 39: Composition of classifier - Sorted by t-value: Class 1: Norm; Class 2: T.
Figure imgf000081_0001
17.4.3. T vs N (SVM= 82% correct; p<0.005)
Genes significantly different between the classes at 0.005 significance level were used for class prediction.
Table 40: Composition of classifier - Sorted by t-value: Class 1: N; Class 2: T.
Figure imgf000081_0002
17.4.4. T vs N - (Compound Covariate Predictor=96% correct; greedy pairs)
Table 41: Composition of classifier - Sorted by t-value (Sorted by gene pairs): Class 1: control; Class 2: nodule.
Figure imgf000082_0001
17.4.5. Nodule pos vs control - (final combined list=100% correct; greedy pairs)
Table 42: Composition of classifier - Sorted by t-value: (Sorted by gene pairs): Class 1: control; Class 2: nodule.
Figure imgf000082_0002
Figure imgf000083_0001
Example 18: Breast cancer methylation markers
18.1. Diagnosis of existing metastases
Tumor-DNA from patients should be tested by the following markers for elucidating metastases already present, which might be not detectable by routine clinical examination or imaging.
patient groups:
0... no metastasis at diagnosis and during follow-up
1... metastasis during follow-up
2... metastasis at diagnosis
The Binary Tree Classification algorithm was used. Feature selection was based on the univariate significance level (alpha = 0.01). The support vector machine classifier was used for class prediction. There were 2 nodes in the classification tree.
Optimal Binary Tree:
Figure imgf000084_0001
NODE 1
Table 43: Composition of classifier (5 genes) sorted by p-value:
# | Parametric p-value | t-value | % CV support | Geom mean of intensities in group 1 | Geom mean of intensities in group 2 | Gene symbol
1 | 0.0002927 | -3.92 | 98 | 149.7469303 | 1031.3845804 | TFPI2
2 | 0.0049604 | 2.952 | 98 | 221.1562041 | 133.2523039 | NEUROD2
3 | 0.0057474 | -2.897 | 94 | 639.3980244 | 6450.0516594 | DLX2
4 | 0.006399 | -2.857 | 92 | 99.6970101 | 112.7940118 | TTC3
5 | 0.0066379 | -2.843 | 98 | 99.6970101 | 109.6402212 | TWIST1
18.2. Prediction of Metastases in Lymph-node-negative patients at initial diagnosis
Survival Risk Prediction using BRB-ArrayTools
Table 44: Genes used in classifier of risk groups:
15 genes selected by fitting Cox proportional hazards models (alpha equals to 0.05)
The coefficients of the fitted Cox proportional hazards model using the principal components from the training dataset are (25.622, -19.237)
The percent of variability explained by the first 2 principal components is 75.797
The p-value in the table is testing the hypothesis if expression data is predictive of survival.
Figure imgf000084_0002
1 | 0.0037508 | 100 | AhY_329_chrX:30143481-30143982_+_299-362 | MAGEB2
2 | 0.0062305 | 100 | AhY_193_chr2:47483597-47484030_+_217-278 | MSH2
3 | 0.0078116 | 100 | AhY_296_chr7:98809925-98810139_+_127-191 | ARPC1B
4 | 0.0096053 | 100 | AhY_128_chr17:35016970-35017711_+_250-313 | NEUROD2
5 | 0.0156618 | 100 | AhY_179_chr2:118288805-118289169_+_64-128 | DDX18
6 | 0.0182671 | 94.44 | AhY_67_chr11:93939956-93940471_+_256-312 | PIWIL4
7 | 0.0196289 | 94.44 | AhY_242_chr4:4911767-4913093_+_771-835 | MSX1
8 | 0.0220021 | 88.89 | AhY_295_chr7:93861567-93861950_+_62-118 | COL1A2
9 | 0.0362384 | 55.56 | AhY_116_chr16:13921618-13921939_+_51-102 | ERCC4
10 | 0.0370792 | 55.56 | AhY_180_chr2:171383096-171383604_+_124-178 | GAD1
11 | 0.0415033 | 44.44 | AhY_312_chr8:74368624-74368884_+_181-233 | RDH10
12 | 0.0460605 | 38.89 | AhY_144_chr17:7532353-7532949_+_96-151 | TP53
13 | 0.0465973 | 38.89 | 336_hY_5-APC chr5:112101294+112101593 | APC
14 | 0.0468988 | 38.89 | AhY_327_chrX:119133199-119133871_+_406-456 | RHOXF1
15 | 0.0492264 | 27.78 | AhY_50_chr11:107598519-107599317_+_128-192 | ATM
Table 45: Loading matrix of the significant genes and the correlations between the principal components and the significant genes:
A new sample is predicted as high (low) risk if its prognostic index is larger than (smaller than or equal to) 1.532975. The prognostic index can be computed by the simple formula $\sum_i w_i x_i - 149.6498$, where $w_i$ and $x_i$ are the weight and logged gene expression for the i-th gene.
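The prognostic index rule can be sketched as below. The offset 149.6498 and the cut-off 1.532975 are taken from the text; the gene weights belong to Table 45 and are not reproduced here, so the two weights and the logged values used in the example are purely hypothetical placeholders.

```python
def prognostic_index(weights, logged_values, offset=149.6498):
    # Weighted sum of logged gene values minus the constant from the text.
    return sum(w * x for w, x in zip(weights, logged_values)) - offset


def risk_group(index, threshold=1.532975):
    # High risk only if the index is strictly larger than the threshold.
    return "high" if index > threshold else "low"


# Hypothetical weights and logged values for a two-gene toy case.
example_weights = [10.0, 5.0]
index_a = prognostic_index(example_weights, [12.0, 7.0])  # 155 - 149.6498
index_b = prognostic_index(example_weights, [12.0, 6.0])  # 150 - 149.6498
```

With the real classifier the sum would run over all genes listed in Table 44, using their weights from Table 45.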
Figure imgf000086_0001
Genes used in classifier of risk groups:
26 genes selected by fitting Cox proportional hazards models (alpha equals to 0.05)
The Cox proportional hazards model is fitted using the principal components and clinical covariates from the training dataset. The estimated coefficients are (-3.184, -20.948) for the principal components and (-0.709, 0.148) for the clinical covariates
The percent of variability explained by the first 2 principal components is 64.388
The p-value in the table tests the hypothesis that the expression data is predictive of survival over and above the covariates.
Example 19: Methylation Markers in Non-Tumor / Non-Neoplastic Disease: Trisomy diagnosis
DNA derived from Cytogen-fixed cells of healthy controls (5 females... 46XX; 5 males... 46XY) and trisomy patients (5 females... 47XX+21; 6 males... 47XY+21; single samples with trisomy of chr13... 47XX+13 and trisomy of chr9... 47XX+9; and one blinded sample with trisomy) was used for DNA methylation testing.
The following data analysis exemplifies successful class distinction of normal individuals (class label... "46") and Down syndrome patients (trisomy of chr21, class label... "47").
The entire set of DNAs was amplified within the 359 marker set by Multiplex PCRs on 2 different PCR machines and data derived from both runs were used either together for analysis or separately. When a set of data was used from only the "Biorad"-PCR- machine, which was used for standard-analysis, this is indicated as "biorad+21".
Surprisingly, it was found that not only genes of the triplicated chromosome were affected but also genes which are not located on the additional chromosome are aberrantly methylated and serve as markers for detection of syndromal disease.
This is of relevance for diagnostic testing of patients suspected of suffering from disease and also for prenatal testing (DNA derived from amniocentesis, chorionic villi, and DNA derived from fetal cells or free DNA in serum of peripheral blood of pregnant women).
Optimal Binary Tree: BinTree pred. (p<0.01)
Figure imgf000087_0001
Results of classification, NODE 1:
Cross-validation results for a fixed tree structure: Percent correctly classified: 90, n=42
Table 46: Composition of classifier (11 genes) - Sorted by p-value:
Figure imgf000088_0001
Results of classification, NODE 2:
Cross-validation results for a fixed tree structure: Percent correctly classified: 100, n=20
Table 47: Composition of classifier (19 genes) - Sorted by p-value:
Figure imgf000088_0002
Figure imgf000089_0001
Results of classification, NODE 3:
Cross-validation results for a fixed tree structure: Percent correctly classified: 82, n=22
Table 48: Composition of classifier (11 genes) - Sorted by p-value:
Figure imgf000089_0002
9 0.0063042 3.051 50 616.5029552 52.8096314 TIMP1
10 0.0079761 -2.947 45 53.8713383 160.551704 TFPI2
11 0.0085383 2.916 52 110.7146251 46.2232459 MLH1
Example 19.1. ClassComparison "46 vs 47+21". (p<0.01)
Genes which discriminate among classes:
Table 49: Sorted by p-value of the univariate test.
Class 1: 46; Class 2: 47.
The first 11 genes are significant at the nominal 0.01 level of the univariate test
Figure imgf000090_0001
Example 19.2. ClassComparison "46 vs 47+21: biorad+21 only". (p<0.01)
Genes which discriminate among classes:
Table 50: Sorted by p-value of the univariate test:
Class 1: 46; Class 2: 47.
The first 10 genes are significant at the nominal 0.01 level of the univariate test
Figure imgf000091_0001
Example 19.3. ClassComparison "46 vs 47+21: biorad+21 only". (p<0.01)
Genes which discriminate among classes:
Table 51: Sorted by p-value of the univariate test:
Class 1: 46; Class 2: 47.
The first 7 genes are significant at the nominal 0.01 level of the univariate test
Figure imgf000091_0002
Figure imgf000092_0001
Example 19.4. ClassPrediction "46 vs 47+21: biorad+21 only". (p<0.05 & 0.005)
Correct classification: 90%
At p<0.05 many genes were selected; with the threshold set to p<0.005, correct classification was 90% (with most methods).
Table 53: Composition of classifier - Sorted by t-value:
Class 1: 46; Class 2: 47.
Figure imgf000092_0002
Example 19.5. ClassPred "46vs47_prediction". (p<0.005)
Feature selection criteria:
Genes significantly different between the classes at 0.005 significance level were used for class prediction.
Cross-validation method:
The leave-one-out cross-validation method was used to compute the mis-classification rate.
T-values used for the (Bayesian) compound covariate predictor were truncated at the abs(t)=10 level.
Equal class prevalences are used in the Bayesian compound covariate predictor.
The threshold of predicted probability for a sample being predicted to a class by the Bayesian compound covariate predictor is 0.8.
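The cross-validation scheme described above can be sketched as follows. This is a minimal sketch under stated assumptions, not the software actually used: it implements the plain (non-Bayesian) compound covariate predictor, selects markers by a cutoff on abs(t) instead of a p-value, and repeats the marker selection inside every leave-one-out fold. All function names and the toy data are hypothetical.

```python
from statistics import mean, variance


def t_statistic(a, b):
    """Two-sample t-statistic with pooled variance (mean of a minus mean of b)."""
    na, nb = len(a), len(b)
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    diff = mean(a) - mean(b)
    if pooled == 0:
        return 0.0 if diff == 0 else (float("inf") if diff > 0 else float("-inf"))
    return diff / (pooled * (1 / na + 1 / nb)) ** 0.5


def score(weights, sample):
    """Compound covariate: sum of t-value weights times marker values."""
    return sum(w * sample[g] for g, w in weights.items())


def fit_ccp(samples, labels, t_cutoff, t_cap=10.0):
    """Select markers by abs(t) on the training data and build the predictor."""
    first, second = sorted(set(labels))  # assumes exactly two classes
    weights = {}
    for g in range(len(samples[0])):
        a = [s[g] for s, y in zip(samples, labels) if y == first]
        b = [s[g] for s, y in zip(samples, labels) if y == second]
        t = t_statistic(a, b)
        if abs(t) >= t_cutoff:
            weights[g] = max(-t_cap, min(t_cap, t))  # truncate at abs(t) = t_cap
    mean_first = mean(score(weights, s) for s, y in zip(samples, labels) if y == first)
    mean_second = mean(score(weights, s) for s, y in zip(samples, labels) if y == second)
    return weights, (mean_first + mean_second) / 2, (first, second)


def predict_ccp(model, sample):
    """Assign the class on whose side of the midpoint the sample's score falls."""
    weights, midpoint, (first, second) = model
    return first if score(weights, sample) > midpoint else second


def loocv_error(samples, labels, t_cutoff):
    """Leave-one-out mis-classification rate; selection is redone in every fold."""
    errors = 0
    for i in range(len(samples)):
        model = fit_ccp(samples[:i] + samples[i + 1:], labels[:i] + labels[i + 1:], t_cutoff)
        errors += predict_ccp(model, samples[i]) != labels[i]
    return errors / len(samples)


# Hypothetical toy data: two markers, three samples per class.
toy_x = [[5.0, 1.0], [5.5, 1.2], [6.1, 0.9], [1.0, 1.1], [1.4, 0.8], [0.7, 1.0]]
toy_y = ["47", "47", "47", "46", "46", "46"]
print(loocv_error(toy_x, toy_y, t_cutoff=3.0))
```

Redoing the marker selection inside each fold keeps the left-out sample from influencing which markers enter the predictor, which would otherwise bias the estimated mis-classification rate downwards.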
Performance of classifiers during cross-validation
Figure imgf000093_0001
Performance of classifiers during cross-validation:
Let, for some class A,
n11 = number of class A samples predicted as A
n12 = number of class A samples predicted as non-A
n21 = number of non-A samples predicted as A
n22 = number of non-A samples predicted as non-A
Then the following parameters can characterize the performance of classifiers:
Sensitivity = n11/(n11+n12)
Specificity = n22/(n21+n22)
Positive Predictive Value (PPV) = n11/(n11+n21)
Negative Predictive Value (NPV) = n22/(n12+n22)
Sensitivity is the probability for a class A sample to be correctly predicted as class A,
Specificity is the probability for a non class A sample to be correctly predicted as non-A,
PPV is the probability that a sample predicted as class A actually belongs to class A,
NPV is the probability that a sample predicted as non class A actually does not belong to class A.
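The four performance parameters defined above can be computed directly from the four counts, written here as n11, n12, n21 and n22. The function name and the example counts are hypothetical and chosen only for illustration; they are not results from the tables.

```python
def classifier_performance(n11, n12, n21, n22):
    """Return sensitivity, specificity, PPV and NPV for one class A.

    n11: class A samples predicted as A      n12: class A samples predicted as non-A
    n21: non-A samples predicted as A        n22: non-A samples predicted as non-A
    """
    def ratio(numerator, denominator):
        # An empty denominator (e.g. no sample predicted as A) gives NaN.
        return numerator / denominator if denominator else float("nan")

    return {
        "sensitivity": ratio(n11, n11 + n12),
        "specificity": ratio(n22, n21 + n22),
        "ppv": ratio(n11, n11 + n21),
        "npv": ratio(n22, n12 + n22),
    }


# Hypothetical counts, for illustration only.
print(classifier_performance(n11=9, n12=1, n21=1, n22=10))
```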
For each classification method and each class, these parameters are listed in the tables below.
Performance of the Compound Covariate Predictor Classifier:
Figure imgf000094_0001
Performance of the Diagonal Linear Discriminant Analysis Classifier:
Figure imgf000094_0002
Performance of the 1-Nearest Neighbor Classifier:
Figure imgf000094_0003
Performance of the 3-Nearest Neighbors Classifier:
Figure imgf000094_0004
Performance of the Nearest Centroid Classifier:
Figure imgf000094_0005
Performance of the Support Vector Machine Classifier:
Figure imgf000095_0001
Performance of the Bayesian Compound Covariate Classifier:
Figure imgf000095_0002
Predictions of classifiers for new samples:
Table 54: Composition of classifier - Sorted by t-value: Class 1: 46; Class 2: 47.
Figure imgf000095_0003
Cross-Validation ROC curve from the Bayesian Compound Covariate Predictor
The area under the curve is 0.882.
Note: the classification rule used above is different from the class prediction. Here, if a sample's posterior probability is greater than the threshold, it is predicted as Class 1. Otherwise, it is predicted as Class 2.
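The area under a ROC curve and the threshold rule in the note above can be illustrated with the following sketch. It assumes that each sample carries a posterior probability of belonging to Class 1 and computes the area as the fraction of (Class 1, Class 2) sample pairs in which the Class 1 sample has the higher posterior, with ties counted as one half. The function names and the values are hypothetical.

```python
def roc_auc(posteriors, is_class1):
    """Area under the ROC curve from Class 1 posteriors and true class flags."""
    pos = [p for p, flag in zip(posteriors, is_class1) if flag]
    neg = [p for p, flag in zip(posteriors, is_class1) if not flag]
    # Count pairs where the Class 1 sample ranks higher; ties count one half.
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def classify(posterior, threshold):
    """Threshold rule from the note: above the threshold is Class 1, otherwise Class 2."""
    return "Class 1" if posterior > threshold else "Class 2"


# Hypothetical posteriors, for illustration only.
posteriors = [0.9, 0.2, 0.8, 0.1]
truth = [True, True, False, False]
print(roc_auc(posteriors, truth))
print([classify(p, 0.5) for p in posteriors])
```

Sweeping the threshold from 1 down to 0 traces out the ROC curve; the pairwise count gives the same area without constructing the curve explicitly.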
Example 20: Osteoarthritis
Osteoarthritis (OA, also known as degenerative arthritis or degenerative joint disease) is a group of diseases and mechanical abnormalities involving degradation of joints, including articular cartilage and the subchondral bone next to it.
Paired arthritic and healthy cartilage DNA samples from 6 patients (N=12) and corresponding PB samples (N=6) were used for enrichment of the methylated DNA fraction using restriction enzymes and Rolling-Circle Amplification (RCA). RCA amplicons (n=18) and unamplified DNA from PB (n=6, methylation-sensitive digested) were subjected to the ARC-CpG360 assay (Fig. 5).
Class Prediction: A) PAIRED - CARTILAGE
Performance of classifiers during cross-validation, n=6
Figure imgf000096_0001
Figure imgf000097_0001
Cross-validation identified a Diagonal Linear Discriminant Analysis classifier which enables correct classification of DNA from healthy versus diseased cartilage tissue in 83% of samples.
Table 55: Composition of classifier - Sorted by t-value
Figure imgf000097_0002
Example 21: Breast Cancer vs. blood DNA
Example 21.1. Class Prediction using "grid of alpha levels": resulted in 100% correct classification
47 breast cancer ("BrCa") samples and 30 samples of normal blood ("norm_blood") were compared.
Feature selection criteria:
Genes significantly different between the classes at the 0.01, 0.005, 0.001 and 0.0005 significance levels were used to build four predictors. The predictor with the lowest cross-validation mis-classification rate was selected. The best predictor consisted of genes significantly different between the classes at the 5e-04 significance level.
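The selection over a grid of significance levels can be sketched as below. The grid is the one stated above; the function name, the tie-breaking rule (the first level tried wins) and the example error rates are assumptions made only for illustration and are not taken from the analysis.

```python
ALPHA_GRID = (0.01, 0.005, 0.001, 0.0005)


def select_alpha(cv_error, alphas=ALPHA_GRID):
    """Return (alpha, error) for the level with the lowest cross-validated error.

    cv_error is any callable that builds a predictor from the genes significant
    at the given level and returns its cross-validation mis-classification rate.
    Ties go to the level that appears first in the grid (an assumption).
    """
    results = [(cv_error(alpha), index, alpha) for index, alpha in enumerate(alphas)]
    error, _, best_alpha = min(results)
    return best_alpha, error


# Hypothetical error rates per significance level, for illustration only.
example_errors = {0.01: 0.04, 0.005: 0.03, 0.001: 0.01, 0.0005: 0.0}
print(select_alpha(example_errors.get))
```

Because the best level is itself chosen on cross-validation results, the error of the selected predictor is an optimistic estimate unless the grid search is nested inside an outer cross-validation loop.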
Cross-validation method:
The leave-one-out cross-validation method was used to compute the mis-classification rate.
T-values used for the (Bayesian) compound covariate predictor were truncated at the abs(t)=10 level.
Equal class prevalences are used in the Bayesian compound covariate predictor.
The threshold of predicted probability for a sample being predicted to a class by the Bayesian compound covariate predictor is 0.8.
Performance of classifiers during cross-validation.
Table 56 - Composition of classifier: Sorted by t-value
Class 1: BrCa; Class 2: norm_blood.
Figure imgf000098_0001
Figure imgf000099_0001
Figure imgf000100_0001
78 < 1e-07 15.745 100 2646.4240937 79.6479263 33.2265285 ESR1
79 < 1e-07 16.193 100 8358.7943596 230.2362876 36.3052864 KL
80 < 1e-07 17.733 100 22781.9857745 663.1165477 34.3559301 HIC1
Example 21.2. Class Prediction: gene-pairs 100% correct
Table 57: Composition of classifiers from Class Prediction Analysis - Sorted by gene pairs
Figure imgf000101_0001
Table 58: Composition of classifier - Sorted by t-value
Rows 1-8 in the table contain control genes, and rows 9-16 contain diagnostic genes suitable for class prediction (= elucidation of Breast Cancer).
Figure imgf000102_0001
Example 21.3. Class Prediction using PAMR → 100% correct
Concept: defining a minimal set of genes using PAM (prediction analysis of microarrays) identifies 3 genes sufficient for 100% correct diagnostic testing.
Cross-validation mis-classification rate as a function of the threshold parameter. Threshold 8.57 was selected.
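The role of the threshold parameter can be illustrated with the soft-thresholding step of the nearest shrunken centroid method on which PAM is based: each gene's standardized class-centroid difference is shrunk toward zero by the threshold, and a gene whose differences all reach zero drops out of the classifier. The sketch below shows only this step. The function names, gene names and difference values are hypothetical; only the threshold 8.57 is taken from the text.

```python
def soft_threshold(d, delta):
    """Shrink a standardized centroid difference d toward zero by delta."""
    magnitude = abs(d) - delta
    if magnitude <= 0:
        return 0.0
    return magnitude if d > 0 else -magnitude


def surviving_genes(d_scores, delta):
    """Genes that keep at least one nonzero shrunken difference."""
    return [
        gene
        for gene, differences in d_scores.items()
        if any(soft_threshold(d, delta) != 0.0 for d in differences)
    ]


# Hypothetical standardized differences (one value per class), for illustration only.
example_d = {
    "gene_A": (9.10, -9.10),
    "gene_B": (2.00, -2.00),
    "gene_C": (8.60, -8.60),
}
print(surviving_genes(example_d, delta=8.57))
```

Raising the threshold removes more genes, which is why the cross-validation error is examined as a function of the threshold before a value is selected.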
Prediction Table: a cross-tabulation of true (rows) versus predicted (columns) classes for the PAM fit (Fig. 4a and b)
Figure imgf000102_0002
Cross-validation mis-classification rate: 0 percent. These parameters are listed in the table below
Figure imgf000103_0001
Table 59: Composition of PAM classifier - 3 genes selected by PAM (threshold equal to 8.57)
Class 1: BrCa; Class 2: norm_blood.
Figure imgf000103_0002
SEQUENCE LISTING
SEQ ID NO: DNA-SEQUENCE
1 CGGCCGGTCAGGAATCCCCATCCTGGAGCGCAGGCGGAGAGCCAGTGGCT
2 CCAAAAAAGGTGACACTGCCCCCTCCCAGTGGCTCCATGCTCCTCAGCTATGGCTGTCCGGGCC
3 CGCCCCGCCCCCGCCAACAACCGCCGCTCTGATTGGCCCGGCGCTTGTCTCTT
4 AGCGGCCTCAGCCTGCGCACCCCAGGAGCGTGGATGACTACGGCCACCCC
5 GCAGCCGAGAGGGTCAGGCCCCCATAGGTCCTCAGCCTGCTTCAACCTCAAAGGGGATGGGGG
6 TCCTGGCAGCATTACCACACTGCTCACCTGTGAAGCAATCTTCCGGAGACAGGGCCAAAGGGCCA
7 CTGACAAGAGACATGCAGGGCTGAGAGGCAGCTCCTTTTTATAGCGGTTAGGCTTGGCCAGCTGC
8 TGGCATCCACTTGCTTGATCCAGCCAGATTCCCACTCCCATGCCCTCTCCACTATTGCGATTGC
9 CTGCTTCGTGCCCTCTGGTGGCTAAGGCGTGTCATTGCAGTGCCGGCCTCCTGTCATCCTCC
10 CCGGCGCACTCCGACTCCGAGCAGTCTCTGTCCTTCGACCCGAGCCCCGC
11 TAGGTGGTGAGTTACTTGGCTCGGAGCGGGCGAGGGGACGCGTGGGCGGAGCG
12 AACCACCTGATCAAGGAAAAGGAAGGCACAGCGGAGCGCAGAGTGAGAACCACCAACCGAGGCGC
13 CGGGGGTAGGCTTTGCTGTCTGAGGGCGTCTGGCTGTGGAGCTGAAGGAGGCGCTGCTGAG
14 GCCCCGCATCCCTAATGAGGGAATGAATGGAGAGGCCCCCTCGGCTGGCGCCC
15 CGGGGCCACGCGCTAAGGGCCCGAACTTGGCAGCTGACCGTCCCGGACAG
16 CCACCGAACACGCCGCACCGGCCACCGCCGTTCCCTGATAGATTGCTGATGC
17 GAACTGGGTCGTGGAAGGATCGCGGGGAGCGGCCCTCAGGCCTTCGGCCTCACT
18 CCAGCACTTTGGGAGGCCGAGGCGGGCGGATCACGAGGTCAGGAGATCGAGACCATCCT
19 GGCGGCTGGTGCTTGGGTCTACGGGAATACGCATAACAGCGGCCGTCAGGGCGCC
20 TCAGATTCCTCAGGGCCGCAGAGGTGTGGAGCTGGTTGGGCCGGTTCTTCACCCTCCTCCC
21 CTGGCCGAGGTGGCCACCGGTGACGACAAGAAGCGCATCATTGACTCAGCCCGG
22 CTAACCTTCCTCGCCGCCTTCCTGCGGGTGACCCCCAAACGCCCCAGCTCCGC
23 CCGACTTGGACGCGGCCAGCTGGAGAGGCGGAGCGCCGGGAGGAGACCTT
24 ACAGAGTCGGCACCGGCGTCCCCAGCTCTGCCGAAGATCGCGGTCGGGTC
25 GGGGATGGAGAACTCTCCTCGCTTCGTCCTCTCTCCCGGGGAATCCCTAACCCCGCACTGCG
26 GTGGCTCGGGTCCACCCGGGCTGCGAGCCGGCAGCACAGGCCAATAGGCAATTAG
27 CTCACCCCGCGACTTACCCCACACCCCGCTCTCCAGAACCCCCATATGGGCGCTCACC
28 ACACACCACTGCAGCGTTCAAACGCTGGGAAGAAGACTCCCTTGTGGCACCGGAAACCCACGAGG
29 CCGCCACGAACTTGGGGTGCAGCCGATAGCGCTCGCGGAAGAGCCGCCTC
30 CTCCATAGCCCTCCGACGGGCGCCCAGGGGCTTCCCGGCTCCGTGCTCTCT
31 TGGACACCCCAAGAGCTCACTCCTCCGCGGTTTTATATTCCGACTTGCGCACAGGAGCGGGGTGC
32 CCGCCCGTTTCAGCGGCGCAGCTTCTGTAGTTGGGCTACTGGAGGGGTCGCTCAGAAACCTCA
33 AACCCAGGCTTGTCAGCCTAAGAACACGGGATCTCTTCACTGTGGTTCATGTGTAGAGTGGAGTTTCCA
34 CAGTCCCCTGCCGTGCGCTCGCATTCCTCAGCCCTTGGGTGGTCCATGGGA
35 CAGGTGGGCGTCTCAGGGGTGGGAGTGGCCGCGTCGTGAAGCGGAGAGAGGA
36 CTGCAAGGGATGACTCACCCCAGTGATTCAACCGCGCCACCGAGCGCGGAGCTG
37 TTGTATGGATTTCGCCCAGGGGAAAGCGCTCCAACGCGCGGTGCAAACGGAAGCCACTG
38 GAGGACCAGGGCCGGCGTGCCGGCGTCCAGCGAGGATGCGCAGACTGCCT
39 ACGCACCGCGGCTCCTCGCGTCCAGCCGCGGCCAAGGAAGTTACTACTCGCCCAAAT
40 CGCTGCCTCGCCATTGGGCGGCCGAACGCAGCCACGTCCAATCAGAGGAGT
41 GAGGTTCTGGGGACCGGGAGAGTGGCCACCTTCTTCCTCCTCGCGAAGAGCAGGCCGGG
42 AGTGGGATTGGGGCACTTGGGGCGCTCGGGGCCTGCGTCGGATACTCGGGTC
43 TCAAGCCGCCTCAGGTGAGCGCTCCTTGGCGCTACTTCCGGTCTCAGGTGAGGCCGC
44 TTGTGACGTGTGTTCTGGGCAGGGTTTGAGGTTTTGGAACATTTTCTAAAAGGGACAGAGAGCACCCTGC
45 CGGGGGGAGAAGTCCTGGAGCGGGTTTGGGTTGCAGTTTCCTTGTGCCGGGGATCCTGTCC
46 GAGGATTATTCGTCTTCTCCCCATTCCGCTGCCGCCGCTGCCAGGCCTCTGGCTGCTGAGG
47 CCCTCTCTCCCCTGGCCCGCAAAGTTTTGGCGGAGCCATCGCTGGGGCTGAGC
48 CCAGGGGGAACTTGTGGCAGTGCAGCATCTCAGGCCAGGGGAAGCCGTAGGCCTCCATGA
49 CGCCACCCAGAGCCCGAGGTTTGCCCTTCAGAAGCGGACCCGCAGACTCCTCGGACT
50 CGCCGAAATGAAACCCGCCTCCGTTCGCCTTCGGAACTGTCGTCACTTCCGTCCTCAGACTTGGA
51 TCCCTTGTTTTGAGGCGGGAACGCAACCCTCGACCGCCCACTGCGCTCCCA
52 GGCAGCCGGGAAATCCCGTGTCCCCACTCGTGGCAGAGGACGCTGTGGGG
53 CCCCCACAGTTTTCATGTGATCAGGAATTCAGCATAGGCTATAAGACGGAGTGCTCCATGTCAA
54 GGGGTTGTCATGGCAGCAGCTCCATCCCTGACCGCCACTTTCTCCCGGTGCCG
55 AAGTTCCGCCAGTGCACAGCAACCAATGGGCGGAGGGGTCCTTTGCCCCTGGGTTGC
56 AGTTGGGCCGGATCAGCTGACCCGCGTGTTTGCACCCGGACCGGTCACGTG
57 GGGCCGCTGCCTACTGTGGGCCTGCAAGGCGTGCAAGCGCAAGACCACCA
58 ACCTCCCTGCTGCGTGTCGCAAACCGAACAGCGGGCGTTGGCCCTCCTGC
59 GGGACCCGGAGCTCCAGGCTGCGCCTTGCGCCCGGGTCAGACATTATTTAGCTCTTCGGTTGAGC
60 GGCCGTGCGGGGCTCACCGGAGATCAGAGGCCCGGACAGCTTCTTGATCGCC
61 CCACTGCCTGCGGTAGAACCTGGTCCCGCATAGCTTGGACTCGGATAAGTCAAGTTCTCTTCCA
62 GGGCCGCAGGCCCCTGAGGAGCGATGACGGAATATAAGCTGGTGGTGGTGGGC
63 GCAGGACCCGGATGAGAGCGCACGCTTCGGGGTCTCCGGGAAGTCGCGGC
64 AAGAGGGAAAGGCTTCCCCGGCCAGCTGCGCGGCGACTCCGGGGACTCCAG
65 AGGGATGGCTTTTGGGCTCTGCCCCTCGCTGCTCCCGGCGTTTGGCGCCC
66 GCCCGCTCTCGGGTGACTCCGCAACCTGTCGCTCAGGTTCCTCCTCTCCCGGCC
67 TGCTGGACATCCACCGCCTCCAGGCAGTTTCGCCGTCACACCGTCGCCATCTGTAGC
68 GGCCGCGAAGCGACTCCGATCCTCCCTCTGAGCCTTGCTCAGCTCTGCCCCGC
69 CGCGCGTTCGCTGCCTCCTCAGCTCCAGGATGATCGGCCAGAAGACGCTCTACTCCTTTTTCTCC
70 CGGGGGCGGAGGAAACACCTATGAACCCTCCGGCAGCCTTCCTTGCCGGGCG
71 AGGGCCAGCCCTTGGGGGCTCCCAGATGGGGCGTCCACGTGACCCACTGC
72 GTGAAAGGTCGGCGAAAGAGGAGTAAAGACGGCGAGACGCGTCCACGCAGGGGGAGTCTGTGCG
73 GCGCTGAGGTGCAGCGCACGGGGCTTCACCTGCAACGTGTCGATTGGACG
74 GAGGCCTCATGCCTCCGGGGAAAGGAAGGGGTGGTGGTGTTTGCGCAGGGGGAGC
75 CGAAGTGGAAACCGGAGTTGCGTCATTGCTCCCACCCGATATCACCTTGGCAGCGACCGCG
76 ATGGGGTGCTCATCTTCCTGGAGCTGAGGAGCTGGGACGGGCATGGGGTGCTCATCCTCCTG
77 TTCCAGCCGGTGATTGCAATGGACACCGAACTGCTGCGACAACAGAGACGCTACAACTCACCGCG
78 CAGAGAAGACTCACGCAGTGAGCAGTCCGCAAGCCCGCTGGCGGCAGCGGC
79 GACACACCCACCTCAGCAGATCTCAGCCCATCCCTCCCAGCTCAGTGCACTCACCCAACCCCAC
80 CGGAGTGCTGCAAGCGCAGAAAATATACGTCATGTGCGGAGGCGGAGCTTCCGCCCTGCG
81 GGCCCAAGGACGTGTGTTGGTCCAGCCCCCCGGTTCCCCGAGACCCACGC
82 ACCTCTGGAGCGGGTTAGTGGTGGTGGTAGTGGGTTGGGACGAGCGCGTCTTCCGCAGTCCCA
83 CCCTTGGAAGGCGTGGAATTAGGAGAGAAATCCCTTAGTGGGCACACGAGTGAGTGCCCCTTGGA
84 CCGGCCGCCTCCCAGGCTGGAATCCCTCGACACTTGGTCCTTCCCGCCCC
85 TGCGTGGGTCGCCTCGCGTCTCTCTCTCCCACCCCACCTCTGAGATTTCTTGCCAGCACC
86 GACTTCGCGTCGCCCTTCCACGAGCGCCACTTCCACTACGAGGAGCACCTGGAGCGCAT
87 GAGGCTGCGAGCCTGGGCTCCCAGGGAGTTCGACTGGCAGAGGCGGGTGCAG
88 CCATTCTCCTGCCTCAGCCTCCCAAGTAGCTGGGACTACAGGCGCCTGCCACCACTCCCGGC
89 CAGGGGACGTTGAAATTATTTTTGTAACGGGAGTCGGGAGAGGACGGGGCGTGCCCCGACGTGCG
90 ACCCTGGAACGACGCCAAACGCGACCCCTACCAGAGGACTCGCGCATGCGCAGC
91 GTTCCCAAAGGGTTTCTGCAGTTTCACGGAGCTTTTCACATTCCACTCGG
92 GAAAGACACCGCGGAACTCCCGCGAGCGGAGACCCGCCAAGGCCCCTCCAG
93 CCCTCTCCGCCCCAAACAGCTCCCCACTCCCCCAGCCTGCCCCCACCCTC
94 ATTGGGGCTACACTCACCACAAGAGCAGCAAACAAAGCACTGGGTGTGGTAGAGGCTGTCCAGGG
95 CCCAGCGGGGCCCTTAGCAGAGCCTCTCCAATCCTCGGCGCCTCCCCTACACAGGGTTCG
96 GCGCCCAAGGCCCTGCTTCTTCCCCCTTCCTCTTCCCCTTGCCCAGCCGCGACTTC
97 CCCAGCCGAGCAGGGGGAAGCATCCCCAGCTCCCGCACCCAAGTCCCTGG
98 GCCGCCACCTGTTGAGGAAAGCGAGCGCACCTCCTGCAGCTCAGGCTCCGGG
99 CGGGAGCGGATTGGGTCTGGGAGTTCCCAGAGGCGGCTATAAGAACCGGGAACTGGGCGCG
100 GGCGGGGAAGCGTATGTGCGTGATGGGGAGTCCGGGCAAGCCAGGAAGGCACC
101 GGAGCCCGCAGTGCGTGCGAGGGGCTCTCGGCAGGTCCAGACGCCTCGCC
102 CGCATCCGGCTCCGAAAGCTGCGCGCAGCCATCATCAGGGCCCTTCTGGTGTT
103 GCCGCTGCCAGTCGACTCAACCACCGGAGTGGCCCCTGCAGTTGGATAGCAACGAGAATCCTCC
104 GGCAGGAAAGGGCCCGAAGGCAGCGAAGGCGAACGCGGCGCACCAACCTG
105 ACAGGGTCTTCCCACCCACAGGGCACCCAGGCGCAGCGGAGCCAGGAGGG
106 ACCAGCCGCACAACTTTTGAAGGCTCGCCGGCCCATGTGGGGTCTTTCTGGCGGC
107 CAGCCGGGCAGATAACAAAACACACCCCAAAGTGGGCCTCGCATCGGCCCTCGCATTCCTGT
108 GGCCTCGACGCCGAGGGGTGTCCCTCTCCTCTCCTGGTCAGGGAACGCAGCAACTGA
109 GGGCGGCAGTCAGAGCTGGAGCTCCGGGGAATCAGACGGGCAGCCAAAGGAGCAGA
110 CGGAAGTGCCCCGGTCCTGGAGGGGGTGGAAGTTGGGGAGCCCAGGCAGGA
111 CCGAGAGGGAAGAAAAAAATACCCTCTTTGGGCCAGGCACGGTGGCTCACCCCTGTAATCCCAGC
112 TCCCAGCACTTTGGGAGGCTGAGGCGAGCGGATCACGAGATCAGAAGATCGAGACCATCCTGGC
113 CCCCGGGACCGGATAACGCCCTAAATCAGCGCAGCTGAGGCGAGGCCGTGGCC
114 CTCGCGACCCCGGCTCCGGGCCTCTGCCGACCTCAGGGGCAGGAAAGAGTC
115 CCCGAGGCTCGCCCGACTCCTGGCTGCCCTGGACTCCCCTCCCTCCTCCCT
116 CTCCAGCTGCACTGCCACCCAGCCTGCCTGGTGCTGGTGCTCAACACGCAGC
117 CCGGCCTTTCCGCCAGAGGGCGGCACAGAACTACAACTCCCAGCAAGCTCCCAAGGCG
118 GGGAAGGAGCCTCAGCTCCGCTCCAGGTCCTCCACCAGGTAGGACTGGGACTCCCTTAGGGCCTG
119 GGGAGTGTCCTCCTCCGGGACAGCCGGACTCCCGCCGACTTCTGGGCGGC
120 GGGGAGCGTGCGGGGTCGCCACCATCGGGACCCCCAGAGGAGAGAGGACTTG
121 GACAGATGCAGTGCGTGCGCCGGAGCCCAAGCGCACAAACGGAAAGAGCGGG
122 TCCTTTGCGTCCGGCCCTCTTTCCCCTGACCATAAAAGCAGCCGCTGGCTGCTGGGCC
123 TGCGGCTTCTCTCACCCTGCCAGGCCTTCCCAGCTTCCCTGAGGTTGCCTGCTACACCCG
124 GCCCCAGCCCTGCGCCCCTTCCTCTCCCGTCGTCACCGCTTCCCTTCTTCCA
125 CCCGCACCCCTATTGTCCAGCCAGCTGGAGCTCCGGCCAGATCCCGGGCTG
126 GCAGAGTTCGTGCAGGGAGTTCGCACATAGGAGAGCACCGGTCCGGGAGTGCCAGGCTCG
127 CGGCCGGTGTGTGTCCCCGCAGGAGAGTGTGCTGGGCAGACGATGCTGGAC
128 TTTTTGGGACAACCATGGAGGGGTCCTCCGTCTCGGCCTCTTCGCATATCCCCCTCCGTGATCC
129 CGGCGGGTCAGATCTCGCTCCCTTTCGGACAACTTACCTCGGAGAGGAGTCAAGGGGAGAGGGGA
130 CCCGGACGAGCTCTCCTATCCCGAAGTTGTGGACAGTCGAGACGCTCAGGGCAGCCGGGC
131 CGGCCGGTGGAGGGGGGAAGGGAGGAATGGTGTCAGGGGCGGATATCTGAGCCCTGAG
132 CACCAAAGCCACCACCCAAGCCAGCACCAAGGCCACCACCATATCCTCCCCCAAAGCCACTACCA
133 CCGCCAGGCCCGCTGGGTGGAATGTGGTCATGTTTCAGACTGCCGATGGCTTCCA
134 CCTGTCCGGATCCCTCCCCGCCTTGCTCAGATCTCTGGTTCGCGGAGCTCCGAGGC
135 GCGCAGGGGCCCAGTTATCTGAGAAACCCCACAGCCTGTCCCCCGTCCAGGAAGTCTCAGCGAG
136 TCCTGCCCCAGTAAGCGTTGGACCGGGAGACGCAGTGCTCAGCATCGGTCAGCAGGG
137 GCGCCGAGGAGTCGGGACAGCCCCGGAGCTTCATGCGGCTCAACGACCTG
138 GGCCCCAGCGGAGACTCGGCAGGGCTCAGGTTTCCTGGACCGGATGACTGACCTGAGC
139 CGCCGGCTGCGAAGTTGAGCGAAAAGTTTGAGGCCGGAGGGAGCGAGGCCGG
140 GGAGCCGCTTGGCCTCCTCCACGAAGGGCCGCTTCTCGTCCTCGTCCAGCAGC
141 AAATGTGGAGCCAAACAATAACAGGGCTGCCGGGCCTCTCAGATTGCGACGGTCCTCCTCGGCC
142 CCTCTCAGATTGCGACGGTCCTCCTCGGCCTGGCGGGCAAACCCCTGGTTTAGCACTTCTCA
143 TCTCCCCACGCTTCCCCGATGAATAAAAATGCGGACTCTGAACTGATGCCACCGCCTCCCGA
144 GCCCAATCGGAAGGTGGACCGAAATCCCGCGACAGCAAGAGGCCCGTAGCGACCCG
145 CGTGGGGGGCTGTTTCCCGTCTGTCCAGCCGCGCCCACTTCTCAGGCCCAAAG
146 GGGGCCCTCGTGTTGCTGAACGAGGGCGGGTTCGCGATGTAAATAAGCCCAGAGGTGGGGTC
147 CCTGGGTCCCCTCGGCTCTCGGAAGAAAAACCAACAGCATCTCCAGCTCTCGCGCGGAATTGTC
148 CATAAGATGCCCTCCTGCGGGCCCTCACCTTTTGACACTGCCTCCCACCGCACTGGGGTCAA
149 ATCCCGCTGCACCACGCCATGAGCATGTCCTGCGACTCGTCTCCGCCTGGC
150 GCGCGGTGAAGGGCGTCAGGTGCAGCTGGCTGGACATCTCGGCGAAGTCG
151 CATTTCTTTCAATTGTGGACAAGCTGCCAAGAGGCTTGAGTAGGAGAGGAGTGCCGCCGAGGCGG
152 AAGTTCACTGAGGGTTGTAAGAGTCAGAATGGACTCCATGGAAGTTATGGGGTGTGAATCAAACCTCACA
153 CAGCACTTTGGGAGGCCGAGGTGGGCGGATTGCCTGAGGTCAGGAGTTTGAGACCAGCCTGG
154 GGGCAACACACACAGCAGCGACAGCCGGGAGGTAAGCCGCGTCCCAGCGG
155 CTGAGGGGAGGAGAAACTGGGCTGCGGGGGTCCGGGAGGGTGGATTCCGAGAAACTATGTGCCC
156 GTGTCCCAGCGCGTTGACGCAGCCTGTGATCCCTCGCGAGGCGAGGAGAAGGTC
157 AACCCCGACCTCAGGTGATCTGCCCAAAAGTGCTGGGATTACAGGCGTCAGCCACCGCGCC
158 AGGACGAAGTTGACCCTGACCGGGCCGTCTCCCAGTTCTGAGGCCCGGGTCCCACTGGAACT
159 GGAGACGCGTTGCCTTCGGCCGGGACCACTGCACCTGCCCGCGTGGGTAAT
160 CACAAAGGCCAAGGAGGGAGTGCGCAGGTCACGTGCGCCGGTGGTCAGCG
161 CTGACCTGGCGCTGCTGCCCCTGGTGCCTGACGGAGGATGAGAAGGCCGCC
162 AAAAGTGGCTCGGAACCCCAAATCCCGGTTAGATTGCAGGCACCGCCGGACGCTGGCTCCC
163 GTTCTGTTGGGGGCGAGGCCCGCGCAAGCCCCGCCTCTTCCCCGGCACCAG
164 GCGTCGACACTGCGCAAGCCCAGTCGCGCCTCTCCAGAGCGGGAAGAGCG
165 TGTCTGAGTATTGATCGAACCCAGGAGTTCGAGATCAGCTTGAGCAAGATAGCGAGAACCCCCGC
166 GAAAGACTGCAGAGGGATCGAGGCGGCCCACTGCCAGCACGGCCAGCGTGG
167 TTAGAGTCCCCTGGGTGTGTGCCCCGCAGAGGGAGCTCTGGCCTCAGTGCCCAGTGTGC
168 TTAGAGTCCCCTGGGTGTGTGCCCCGCAGAGGGAGCTCTGGCCTCAGTGCCCAGTGTGC
169 GGGGACGAGCAGGAAAAGGCCGGGGTGGGGGTGGAATTCCTCGGCGGGCAG
170 GGGAGCCTGAGGCAGGAGAATCGCTTGAATCCGGGAGGCGGAGGTTGCAGTAAGCCGAGATCGC
171 CTTTCGGAGGCCTCATTGGCTGAAGGTCGCCGTCGCCCAACGCAGGCCATTCTGGGT
172 CCTCCTGGGGTCAAGTGATCATCCTGGCTCAACCACCCAAGTAGCCGGGACTACGGGTGGCCGC
173 CCAATGCCCCAACGCAGGCCACCCCCGGCTCCTCTGTGGACTCACGAAGACAAGGTC
174 CTCTGAGAGCCACAGTCAGGTCTGTCCTCAGGGGTCGAGGCGGCTGCGCTGGGGCCT
175 GGACAGCCCGCTCGGGAGTCGGGCCTGGAAGCAGGCGGACAGCGTCACCT
176 GCCAGGATGGTCTCGATCTCCTGACCTTGTGATCTGCCCGCCTCGGCCTCCCAAAGTGTTGGG
177 TGCCCAGGGGAGCCCTCCATTTGTAGAATGAATGAGAGTCCAGGTTATGAACAGTGCCTGGAGTG
178 GACCGGTTTTATCCCGCTGAGGCCCTGGGAGATGGGTCTGGCGAGGCTCGTAGGCCGC
179 GCGGAACCTCAAATTGCGGCAGCGGAACCTAAAGTTTCAGGGTGAGATGCGTTGACTCGCGGTGG
180 GCTCAGTCCCTCCGGTGTGCAGGACCCCGGAAGTCCTCCCCGCACAGCTCTCGCT
181 CGGGCAGGCGGGACCGGGAGGTCAATAACTGCAGCGTCCGAGCTGAGCCCA
182 CGCGGTGGGCCGACTTCCCCTCCTCTTCCCTCTCTCCTTCCTTTAGCCCGCTGGCGCC
183 TCCCCGGCATGCGCCATATGGTCTTCCCGGTCCAGCCAAGAGCCTGGAACCACG
184 CTCCGCGCTCAGCCAATTAGACGCGGCTGTTCCGTGGGCGCCACCGCCTC
185 GCGAGAGGGTCGTCCGCTGAGAAGCTGCGCCGGAGACGCGGGAAGCTGCTG
186 GACCCGCCTGCGTCGCCACCCTCTCGCCGCTCCCTGCCGCCACCTTCCTC
187 GAGGGGTCCGGGACGAAGCCACCCGCGCGGTAGGGGGCGACTTAGCGGTTTCA
188 CCCCGAACAAAAAATTCAAATGGGAAAGAGAGGCAGATGGCAGAGAACAGGGGAGGGGCTGGGCA
189 GCGGCGAGGAGGGTCACAGCCGGAAAGAGGCAGCGGTGGCGCCTGCAGAC
190 GGCGGTCTCCGGTTCGCCAATGTGGCTGGGTCCGTAGGCTTGGGCAGCCT
191 CCTCCCCTTTGCGTGCGGAGCTGGGCTTTGCGTGCGCCGCTTCTGGAAAGTCG
192 AGCCTACTCACTCCCCCAACTCCCGGGCGGTGACTCATCAACGAGCACCAGCGGCCAGA
193 CAGGAGGTGAGGAGGTTTCGACATGGCGGTGCAGCCGAAGGAGACGCTGCAGTTGGAGAGCG
194 AGATTTCCCGCCAGCAGGAGCCGCGCGGTAGATGCGGTGCTTTTAGGAGCTCCGTCCGACA
195 CGGGCGTGGTGGTGGGCACCTGTAATCCCAGCTACTCAGAAGGTTGAGGCAGGAGAATCGCTTGA
196 TCCCAAATCCGAGTCTGCGGAGCCTGGGAGGGCTCCCAGCTTCCTATCCAAACCGCGCC
197 CCCTGGTCGAGCCCCCTTTCCTCCCGGGTCCACAGCGAGTCCCCTGAGGAAGGAGGG
198 CAGGGACCCGCGAGTCCCTGGACACGCACTGGCCAACGCCAGACCCCATC
199 CAAGCAGCCCTCGGCCAGACCAAGCACACTCCCTCGGAGGCCTGGCAGGG
200 GAGAAGGAGCGACCCCCAAAACGAAGCGGCTGGATCTGACCTTCCAAGGCCTGTTGGCGACGC
201 TTCTTCCCCGCAGGGTCAGCGCTGGGGCTCCGGCCGTAGAGCCACGTGACC
202 ATTCATTTCTGTTATGGAACTGTCGCGGCACTACAAAGTCTCTATGTAGTTATAAATAAACGTT
203 ACCGAGTGCGCTGCTGTGCGAGTGGGATCCGCCGCGTCCTTGCTCTGCCC
204 GTGTGGTGAGTGTGGGTGTGTGCGCGTCTCCTCGCGTCCCTCGCTGAGGTGCCT
205 GCCTGGGCTGCCAGACGTCGCCATCATTGTTCCATGCAGATCATGCCCATCCTGTGCAGAAG
206 GCGGGTCCGAGGCGCAAGGCGAGCTGGAGACCCCGAAAACCAGGGCCACTC
207 TCTCCATGGTGGCCATTGCCTCCTCTCTGCTCCAAAGGCGACCCCGAGTCAGGGATGAGAGGC
208 CGCGGGACTCCGCGGGATCTCGCTGTTCCTCGCTCTGCTCCTGGGGAGCC
209 CGCCCCCTTTTTGGAGGGCCGATGAGGTAATGCGGCTCTGCCATTGGTCTGAGGGGGC
210 GTTCTGTTGCCAATGCCATTCAGACCCCAGTCCGGGATTCCGCGCTCGGGGTGCG
211 TTTCCGCGAGCGCGTTCCATCCTCTACCGAGCGCGCGCGAAGACTACGGAGGTCGA
212 ACCCGGGTTCAGCGGGTCCCGATCCGAGGGCGTGCGAGCTGAGCCTCCTG
213 GAGAGTGGACGCGGGAAAGCCGGTGGCTCCCGCCGTGGGCCCTACTGTGC
214 GGCTACAGCCGCCATTTCCACGCTCCACCAATCAAATCCATTCTCGAGGAAGACGCACCGCCCC
215 AGCGCGCACAAAGCCTGCGGGAGGATCCATTGTAGCGGTCGCTCCTCCCCGCTTAGCG
216 ATCGGGCGAAGCTCGCGGGAAACCGCTCTGGGTGCGCAGGACAAAGACGCG
217 CGACGGAGCCGTGTGGAGGCCAAAACTCCTCCCGGAAGCCGCTACTGGCCCCG
218 CGCCCCACTACTGCCTGCAGCGGGCTTCCTTACTCCGCCTGCTGGTTCCTACTGGAGGAGAGGCC
219 GCACTCGTAGCGCGCTGGGCGAGCCGGACCGGAAGTTGAAGAAGTGAAGCGCCG
220 TGAAGGGAGGGCTTGGTGTGGGGACTTGCACTGGGCAGAGGGGCAGCTTCCCTGAGAGCAGCTA
221 CGGGAGCGCCCGGTTGGGGAACGCGCGGCTGGCGGCGTGGGGACCACCCG
222 CAGCACCGGAGAGGGCGCACTGCAAAGGCGGGCAGCAGACCGTGGAGAGC
223 GGCGCAGAGGCGTCACGCACTCCATGGTAACGACGCTCGGCCCGAAGATGGC
224 GCCGCGTCTGCGAACCGGTGACCTGGTTTCCCCTCCAGCCCTCACGGCTG
225 CGAGCTGTTTGAGGACTGGGATGCCGAGAACGCGAGCGATCCGAGCAGGGTTTGTCTGGGC
226 GCAGCGCTGAGTTGAAGTTGAGTGAGTCACTCGCGCGCACGGAGCGACGACACCCC
227 CGCGCGCTCGCCGTCCGCCACATACCGCTCGTAGTATTCGTGCTCAGCCTCGTAGTGGC
228 CGGAAGGGGTGAGGCCGGAAGCCGAAGTGCCGCAGGGAGTTAGCGGCGTCTCG
229 GGGGGCGTCGGGCTTGGGACAGGGGAGGATACCAGGGCCACCTTCCCCAACCC
230 CGGGCTGGAGGGTTATCTGGGAAGTCAGCCCCGGCCTCGGTCCTCTCCACGTTGCTGC
231 GGAACGAGGTGTCCTGGGAACACTCCCGGGTCTGTAACTTCGGACAAATCACGCTCGCTTTCCCG
232 AAACGAGAGAGTAGCCAGACTCTCCGCGCATGGAGCCGACGGCACCCACCAGCACACCG
233 TACTCACGCGCGCACTGCAGGCCTTTGCGCACGACGCCCCAGATGAAGTC
234 TGACCGGACAGAGCAGAGCGGGGACTGCAATTCCCAGAAGACCCCACGGTAGGGGCGG
235 AGACAATCCCGGAGGGGGAAAGGCGAGCAGCTGGCAGAGAGCCCAGTGCCGGCC
236 GGCCGAAGAGTCGGGAGCCGGAGCCGGGAGAGCGAAAGGAGAGGGGACCTGGC
237 CCAGGCTCCGCTCGTAGAAGTGCGCAGGCGTCACCGCGCATCCAGGAGCCAC
238 CTCTGATGACGCTCCAAGGGAAGAGGAAGTGGGGATCGGCGAGCGGGTGGGTGCGC
239 TGAAGGGTAATCCGAGGAGGGCTGGTCACTACTTTCTGGGTCTGGTTTTGCGTTGAGAATGCCCC
240 CGGTCCTGCATGCAATGCAAGCCTGAGCTCTCCCGCCATAAGGCTGCAGCGGTGTGG
241 CCTGGAGGAGGAGGAGTCAGGCCGGGTAGGAGGGCTAAGGAGGTTCCCGGGAAGGCAGGGCCC
242 GCTGCTGACATGACTTCTTTGCCACTCGGTGTCAAAGTGGAGGACTCCGCCTTCGGCAAGCCGGC
243 GAGCGGCGCAGGGTTGGAGAGGGAAGCGCTCGTGCCCACCTTGCTCGCAG
244 CCGATGACCGCGGGGAGGAGGATGGAGATGCTCTGTGCCGGCAGGGTCCC
245 GCCGCCCTACAGACGTTCGCACACCTGGGTGCCAGCGCCCCAGAGGTCCC
246 GGGCCGCAATCAGGTGGAGTCGAGAGGCCGGAGGAGGGGCAGGAGGAAGGGGTG
247 CGGCGGGACCATGAAGAAGTTCTCTCGGATGCCCAAGTCGGAGGGCGGCAGCGG
248 GTGGGCGCACGTGACCGACATGTGGCTGTATTGGTGCAGCCCGCCAGGGTGT
249 GAAAGAGCCGGAAACACCTGGTCTCTCAAGCAGGTACAGCCCGCTTCTCCCCAGCACCCCGGTG
250 GCAGCCGCAGCTGAGGTCACCCCGCTGAGGTGGTGGGGAGGGGAATGGTT
251 GGGCGGCCAGCGGTGACTCCAGATGAGCCGGCCGTCCGCGTTCGCGCCGC
252 GGGCACCACGAATGCCGGACGTGAAGGGGAGGACGGAGGCGCGTAGACGC
253 GAGGCCGCCATCGCCCCTCCCCCAACCCGGAGTGTGCCCGTAATTACCGCC
254 CGCGGGGAACGATGCAACCTGTTGGTGACGCTTGGCAACTGCAGGGGCGC
255 CTTGAGACCTCAAGCCGCGCAGGCGCCCAGGGCAGGCAGGTAGCGGCCAC
256 CACACCGTCCTCGCCCGGAGCGCAGAGGCCGACGCCCTACGAGTGGATGC
257 CCCTTGCACACGAGCTGACGGCGTGAACGGGGGTGTCGGGGTTGGTGCAA
258 GCAAAGTGATACCTGGCCGTCCCACCCTCTGGTCCCAGAAGGAGCTCTCGCTGGAGCCAGGCA
259 GGTTGGGGGACTGCCCGGGGCTTAGATGGCTCCGAGCCCGTTTGAGCGTGGTCTCG
260 CGTTGAAAGCGAAGAAGGAGCGGCAGTCCAGCAGCAGGCATTGCGCCGCTCGCTC
261 GAGTCCTCAACAACGACAGCGGGGACTGCGGGACCAGGGTAAAGCGGCGACGGCG
262 GCTCCTGAGAAAGCCCTGCCCGCTCCGCTCACGGCCGTGCCCTGGCCAACTT
263 GATGCTGCTGCCGGAGCTGAGGTCTTGCCTGGAGATCCGAACGAGACACCACGTCAACCGG
264 TGGTGGCAGGAGAGCGATGAGACGGGAAAGTGTGGGGCAAAGCTTACAGTCATTGGTCCAGA
265 CCACTCGCAGTCTGCGTGTGGGGGAAACGAGTGCCCGGCGTATGAAACGCCTAACTTCGCGAAA
266 CAGGCGGCTCCCGCAGTCTAAGGGACCTGGCGCGAGTCCGGGAAGCGGAGG
267 CTGCACGCGGTGCGAAGGGGCCAGCAGGGAAGGAGCAGAGGATGGGGGGT
268 CGGGGCCACAGGACCCTGGGGCTTGAGTCACACAAGAATGTCTCTGGGAGACCCGAGAGACTCA
269 CTTAGAGGAGGAGGAGCAGCGGCAGCGGCAGCAGGAGGCGACAGCTGCCAGCCG
270 CTCATACCAGATAGGCGCGAACGCCTCTGGCAGCGGCGTCCAGGGGGTCCGGC
271 GGGTGCTGGCACATCCGAGGCGTTCTCCCGACTCTGGACCGACGTGATGGGTATCCTGG
272 CATGATAAGCCAGGGACCTCGCGGCGCAGGCGGAGGGAGGGAGAGCGTCGC
273 CCCCCCACTCAACAGCGTGTCTCCGAGCCCGCTGATGCTACTGCACCCGCCG
274 TCCCACCTGCTGCCCGAGGAAGACTTCCGGGAGAAACGCTGTCTCCGAGCCCCCG
275 CCAGGTGAAGCCGAAGGGGAAGCGGATGGGGTTGCTGAACGCGGAGTCGGCG
276 CAGTGGCCCTGCGCGACGTTCGGCGCTACCAGAACTCCGAGCTGCTGATCAGCAAGC
277 AAGGATTACCTCGCCCTGAACGAGGACCTGCGCTCCTGGACCGCAGCGGACACTGCGG
278 GCAGGCTCGTGGCGGTCGGTCAGCGGGGCGTTCTCCCACCTGTAGCGACTCAGGTTACTGAAAA
279 GAGGGAAGTGCCCTCCTGCAGCACGCGAGGTTCCGGGACCGGCTGGCCTG
280 GAAGCGCGACCTCGGGCGGTTGGAGGGGCTACCGGGTCTTACCAGTCCGTGGCG
281 CCCAACCCGAGCAAGACCTGCGCTGAAACGGATTGGCTGCCCTCCGCCCG
282 AGCCGCTCTCCCGATTGCCCGCCGACATGAGCTGCAACGGAGGCTCCCACC
283 ACCACACGGCCAAGGGCACCTGACCCTGTCAAAACCCCAAATCCAGCTGGGCGCG
284 CCGAGGCAGCCGGATCACGAAGTCAGGAGTTCGAGACCAGCCTGACCAACATGGTGAAACCCCGT
285 CCGGCGTCTCCGCGTGGGGCGCACCGTCCGACCCCCCCCTCCCGGTGTGC
286 GGCGCAGATGGCGCTCGCTGCGAGATGGATGCTCCAGGGCGGGTAATCACTCCTG
287 CCAGGCCTCCTGGAAACGGTGCCGGTGCTGCAGAGCCCGCGAGGTGTCTG
288 GGCGAGAGGTGAGAAGGGAAGAGGGCTCCCGGCTCTCTCGGGGCGGGAATCAGTGGGC
289 GCCTGCCTCGCCTCTGCCCGAGCTGATGAGCGAGTCGACCAAAAAAGAGTTCGCGGCG
290 CATTGCGGGACCCTATTTATCCCGACACCTCCCCTGACGTGGGCTCGGAACGCTCCCTTGGCAG
291 CGAAGGCCGGAGCCACAGCGCTCGGTGTAGATGCCGCACGGCTGGCCCTC
292 GGGCTGGATGAGTCCGGAAGTGGAGATTGGCTGCTTAGTGACGCGCGGCGTCCCGG
293 CGCCAGTGCGATTCTCCCTCCCGGTTCCAGTCGCCGCGGACGATGCTTCCTC
294 CGTCCGAGAAAGCGCCTGGCGGGAGGAGGTGCGCGGCTTTCTGCTCCAGG
295 TCCGGCTGCGCCACGCTATCGAGTCTTCCCTCCCTCCTTCTCTGCCCCCTCCGCTCC
296 CAGCCTCAGTTTCCCCATTGGTAAAGCATTGACGGTGGTTGCGGACGGCTTCTGCGGACAGAGCC
297 CCTGAGACAGGCCGAACCCAACTCTTCACAGGGCCGAATTCTTTGCCCGCAGCCCAGCACC
298 CAGAGGGGGGTGCCGGGGTCGCGGACTGCCACCAGGTTGAGGAAAGGAGGGG
299 CGACATCCTGCGGACCTACTCGGGCGCCTTCGTCTGCCTGGAGATTGTAAGTGGGGCCGC
300 ACCGCCTCCTCCCCGCTGTCTGGGTCGCAGGCCTTAGCGACGGGCTGTTCTCCG
301 CTCGGGACTCCAGGGCTGTCCCTCCCGCAGGCTGTCCTTCCACCTCCACCCCA
302 CGGCCGCTCCTCGTAGGCCAGGCTGGAGGCAAGCTCCTTCTCCTCAAAGCTGCGCTGC
303 CATCTCTTCCCCCGACTCCGACGACTGGTGCGTCTTGCCCGGACATGCCCGG
304 CCCAAGACCCTAAAGTTCGTCGTCGTCATCGTCGCGGTCCTGCTGCCAGTGAGTCCCGGCC
305 CCCACTCTTCCCCTGACTCCGACGGCGGGTTCGTCCTGCCCAGACATGCCCG
306 GTCCCCCTCTCTCTCTGCCCCCTCCCGGTGCCAGGCGCGCTTTTCCCCAGG
307 CAGCCTGCTGAGGGGAAGAGGGGGTCTCCGCTCTTCCTCAGTGCACTCTCTGACTGAAGCCCGGC
308 ACTGACTCCGGAGGCTGCAGGGCTGGAGTGCGCGGGGCTCCTACGGCCGAG
309 GGCCAGGCTCGGGCAGGGGCCGTGCTCAGGTGCGGCAGACGGACGGGCCG
310 CCGGGCTTCTGGGACGCTCAGCCGTGCGCTACCCGGTGCAGCTGCTTTCTCACC
311 TTTAGGTAGACGTGGAGGCGACTCAGATCGCCTCGCGGTTCCCGGGATGGCGCGGTCG
312 TGACCAGGACCGCAGGCAAGCACCGCGGCGACGGTTCCAGCCAGGAAAATGAG
313 GGGCCGGACCCGGCCTCTGGCTCGCTCCTGCTCTTTCTCAAACATGGCGCG
314 GCCGCGCTCCTCGCACCGCCTTCTCCGCAGGTCTTTATTCATCATCTCATCTCCCTCTTCCCC
315 GAGCTGCGAACTGGTCGGCGGCGCAAGGCGCGGACTCCGGTGAGTTGTGT
316 GCCCGCGTTCCTCTCCCTCCCGCCTACCGCCACTTTCCCGCCCTGTGTGC
317 ACGCGTCGCGGAGTCCTCACTGCCCCGCCTCGCTCTGGCAGAGTGGGGAG
318 GCGAGCAGCGGCCTCCAGCGCTGGTGGCTCCCTTTATAGGAGCGCTGGAGACACGGG
319 GGGGAAGGCGGAGGGCGAGGGGAAGAGTCACTGAGCTGCGGGGCATAGGGGGTCC
320 CTCTGCTCGCGTGCTGCTCTGAAGTTGTTCCCCGATGCGCCGTAGGAAGCTGGGATTCTCCCA
321 AGGGAGGTCGTTTTCTTCAGCTCCCCAGGTGGTCTGTGCTGGGTGTGCTGACGGTCCTTTTGGGA
322 GCCCCTGGCCCTGACTGCTGGTGCGAGGCAGTGCACGACTCAGCTGGCCG
323 GGCCGGGTAACGGAGAGGGAGTCGCCAGGAATGTGGCTCTGGGGACTGCCTCGCTCG
324 GGCCGGGACTTTCTGGTAAGGAGAGGAGGTTACGGGGAACGACGCGCTGCTTTCATGCCC
325 CAGTCTGGGGACCGGGGAGGCGGGGAGAGGGAAGGGGAAAGCGCGGACGC
326 GTCTATCAAAAGTCTTTTCGTTTCCCCCTCCCCCTTTCCCCACCGCCCACCAAAATGAGCCGCG
327 ATGCCGCCATCGCGGTTCATGCCGTTCTCGTGGTTCACACCGCCCTCAGGG
328 TCCCGGTCTTCGGATCCGAGCCGGTCCTCGGGAAAGAGCCTGCCACCGCGT
329 TGAGAGGCTCCGGTAAAGCCGTCCGGCAATGTTCCACCTGGAAAGTTCCAGGGCAGGGGAAGGG
330 CCCAGGGAGAGGGAGAGGAGGCGGGTGGGAGAGGAGGAGGGTGTATCTCCTTTCGTCGGCCCG
331 CCCGTCTTCTCTCCCGCAGCTGCCTCAGTCGGCTACTCTCAGCCAACCCC
332 GACCCCCCTTTGGCCCCCTACCCTGCAGCAAGGGTAGCGTGACGTAATGCAACCTCAGCATGTCA
333 CCCCGAAGCCCTTGCTTTGTTCTGTGAGCGCCTCGTGTCAGCCAGGCGCAGTGAGCTCAC
334 CGCGCGGCCTTCCCCCTGCGAGGATCGCCATTGGCCCGGGTTGGCTTTGGAAAGCGG
335 CCACCCAGTTCAACGTTCCACGAACCCCCAGAACCAGCCCTCATCAACAGGCAGCAAGAAGGGCC
336 AAGCAGCTGTGTAATCCGCTGGATGCGGACCAGGGCGCTCCCCATTCCCGTCGGGAG
337 CCACGCACCCCCTCTCAGTGGCGTCGGAACTGCAAAGCACCTGTGAGCTTGCGGAAGTCAGT
338 CCACGCACCCCCTCTCAGTGGCGTCGGAACTGCAAAGCACCTGTGAGCTTGCGGAAGTCAGT
339 CCCTCCACCGGAAGTGAAACCGAAACGGAGCTGAGCGCCTGACTGAGGCCGAACCCCC
340 TTGTCCCTTTTTCGTTTGCTCATCCTTTTTGGCGCTAACTCTTAGGCAGCCAGCCCAGCAGCCCG
341 TTCTCAGGCCTATGCCGGAGCCTCGAGGGCTGGAGAGCGGGAAGACAGGCAGTGCTCGG
342 CAGCGTTTCCTGTGGCCTCTGGGACCTCTTGGCCAGGGACAAGGACCCGTGACTTCCTTGCTTGC
343 AGGCAGGCCCGCAAGCCGTGTGAGCCGTCGCAGCCGTGGCATCGTTGAGGAGTGCTGTTT
344 GACTCTGGGTATGTTCTCGAAAGTTGTTACAACCCCAACCCAGGGTTGACCTCAAACACAGGAGG
345 CTCTGGCTCTCCTGCTCCATCGCGCTCCTCCGCGCCCTTGCCACCTCCAACGCCCGT
346 CGGGAGCGCGGCTGTTCCTGGTAGGGCCGTGTCAGGTGACGGATGTAGCTAGGGGGCG
347 CCCCAAGCCGCAGAAGGACGACGGGAGGGTAATGAAGCTGAGCCCAGGTCTCCTAGGAAGGAGA
348 GGGCTCTTCCGCCAGCACCGGAGGAAGAAAGAGGAGGGGCTGGCTGGTCACCAGAGGGTG
349 TTCTCTTCCATCCCATCCTCCCTTCTGGTCCTCCTTTCCACAGTGGGAGTCCGTGCTCCTGCTCC
350 CCGCCTCTGTGCCTCCGCCAACCCGACAACGCTTGCTCCCACCCCGATCCCCGCACC
351 CCGCGCCACGTGAGGGCGGCAAGAGGGCACTGGCCCTGCGGCGAGGCCCCAGCGAGG
352 CACTGCTGATAGGTGCAGGCAGGACAGTCCCTCCACCGCGGCTCGGGGCGTCCTGATT
353 CGGGAGCCTCGCGGACGTGACGCCGCGGGCGGAAGTGACGTTTTCCCGCGGTTGGAC
354 TGTCCTCCCGGTGTCCCGCTTCTCCGCGCCCCAGCCGCCGGCTGCCAGCTTTTCGGG
355 GGTGTCGCGACAGGTCCTATTGCGGGTGTCTGCGGTGGGAAGGGCGGTGGTGACTGG
356 ACATATGACAACGCCTGCCATATTGTCCCTGCGGCAAAACCCAACACGAAAAGCACACAGCA
357 GGAAACCCTCACCCAGGAGATACACAGGAGCACTGGCTTTGGCAGCAGCTCACAATGAGAAAGA
358 TTACCATTGGCTTAGGGAAAGGAGCTTACTGGGAACTGGGAGCTAGGTGGCCTGAGGAGACTGGG
359 AAGAACAGGCACGCGTGCTGGCAGAAACCCCCGGTATGACCGTGAAAACGGCCCGCC
360 ccggggactccagggcgcccctctgcggccgacgcccggggtgcagcggccgccggggctggggccggcgggagtccgcgggaccctccagaagagcggccggcgccgtgactca
361 TCACGGGGGCGGGGAGACGC
362 GCACAGGGTGGGGCAGGGAGCA
363 accgggccttccgcgcccct
364 TCCCACCTCCCCCAACATTCCAGTTCCT
365 TCACAGAGCCAGGCAAGCATGGGTGA
366 ggagcagcaggctcgctcgggga
367 gcccaaagtgcggggccaaccc
368 CGGAAAGAGGAAGGCATTTGCTGGGCAAT
369 CCAGCGGCCCCGCGGGATTT
370 ccgacagcgcccggcccaga
371 TGGGCCAATCCCCGCGGCTG
372 GGGCGGCTGCGGGGAGCGAT
373 CGCCAGGACCGCGCACAGCA
374 GCGGGCAAGAGAGCGCGggag
375 AGCGCGCAGCCAGGGGCGAC
376 CGTGCGCTCACCCAGCCGCAG
377 TGAGGGCCCGGGGTGGGGCT
378 ATATGCgcccggcgcggtgg
379 CCGCAGGGGAAGGCCGGGGA
380 TCCTGAGGCGGGGCCGTCCG
381 GGAGGCCGGGGACGCCGAGA
382 GCCGCCGGCTCCCCCGTATG
383 GCAGGAGCGACGCGCGCCAA
384 cgggggaaacgcaggcgtcgg
385 ccccccaccctggacccgcag
386 CGCCCGGCTTTCCGGCGCAC
387 ccgctgggccgccccTTGCT
388 CGCTTCTCCATAGCTCGCCACACACACAC
389 TCCGCGCACGCGCAAGTCCA
390 CGTCTCAACTCACCGCCGCCACCG
391 GACAAATGCGCTGCTCGGAGAGACTGCC
392 TGCGCCTGCGCAGTGCAGCTTAGTG
393 gaagtcaagggctttcaacctcccctgcc
394 tggatcccgcacaggggctgca
395 GCCGCCTGTGGTTTTCCGCGCAT
396 Gcgcgctctcccgcgcctct
397 TTCCGGCCCAGCCCCAACCC
398 TCCGGGTCAGGCGCACAGGGC
399 GGGGGCGGTGCCTGCGCCATA
400 GGCGCGGGCCCTCAGGTTCTCC
401 gcgtccgcggcTCCTCAGCG
402 GGGAGGCGCCCAGCGAGCCA
403 GCGCGCAGGGGGCCTTATACAAAGTCG
404 CCCCCACCCCCTTTCTTTCTGGGTTTTG
405 CGCGCGTTCCCTCCCGTCCG
406 gccggcggAGGCAGCCGTTC
407 TGCCTGGTGCCCCGAGCGAGC
408 CGGCGGCGGCGCTACCTGGA
409 GTGGTGGCCAGCGGGGAGCG
410 GGCGGCACTGAACTCGCGGCAA
411 CCTCGGCGATCCCCGGCCTGA
412 ACGCAGGGAGCGCGCGGAGG
413 TGAAATACTCCCCCACAGTTTTCATGTG
414 TCCGGGCGCACGGGGAGCTG
415 ggcggcggcgTCCAGCCAGA
416 AGGGTCGCCGAGGCCGTGCG
417 CCGCGCCTGATGCACGTGGG
418 gccgggagcgggcggaggaa
419 AGGGGCGCACCGGGCTGGCT
420 TGCCACGGGAGGAGGCGGGAA
421 cgggcatcggcgcgggatga
422 acaccgccggcgcccaccac
423 CCCCCAACAGCGCGCAGCGA
424 GCCCCGCTGGGGACCTGGGA
425 TCCCGGGGGACCCACTCGAGGC
426 GCCCGCGGAGGGGCACACCA
427 GGCCCACGTGCTCGCGCCAA
428 CGGCGGAGCGGCGAGGAGGA
429 GCCTCGCCGGTTCCCGGGTG
430 gcaggcgcgccgATGGCGTT
431 CCTCCCGGCTTCTGCATCGAGGGC
432 GCGGTCCGCGAGTGGGAGCG
433 AGCAGCGCCGCCTCCCACCC
434 CCGACCGTGCTGGCGGCGAC
435 TCCCGGGCTCCGCTCGCCAA
436 GCATGGGGTGCTCATCTTCCCGGAGC
437 CCCGAGAGCCGGAGCGGGGA
438 GCCGCTGCAGGGCGTCTGGG
439 gcgctgccccaagctggcttcc
440 TCAGGATGCCAGCGTGACGGAAGCAA
441 GGGCGGTGCCATCGCGTCCA
442 GGTGGGTCGCCGCCGGGAGA
443 AGGCGGAGGGCCACGCAGGG
444 GGTCCGGGGGCGCCGCTGAT
445 GCGGCCTGCGGCTCGGTTCC
446 CGGGAACCGTGGCGGCCCCT
447 gcggggaaggcggggaaggc
448 gcctcccggtttcaggcc
449 CAGCCCGCGCACCGACCAGC
450 CCCCCAGCCACACCAGACGTGGG
451 tgggcttcctgccccatggttccct
452 TCCGCGCTGGGCCGCAGCTTT
453 gcatggcccggtggcctgca
454 TGGGCAGGGGAGGGGAGTGCTTGA
455 TCCCCGGCGCCTTCCTCCTCC
456 TCCACCGCGCTTCCCGGCTATGC
457 CCCGCATCTGACCGCAGGACCCC
458 TGCGGACACGTGCTTTTCCCGCAT
459 GGAGCTGGAAGAGTTTGTGAGGGCGGTCC
460 CGGCCGCCAACGACGCCAGA
461 AGCGCCCGGTCAGCCCGCAG
462 TCCCGCCAGGCCCAGCCCCT
463 CCGATTCTTCCCAGCAGATGGCCCCAA
464 ACGCACACCGCCCCCAAGCG
465 TAGGCCCCGAGGCCGGAGCG
466 GGGGTTCGCGCGAGCGCTTTG
467 GCCAGTCTCCCGCCCCCTGAGCA
468 TGAGGAGGCAGCGGACCGGGGA
469 GCCGGCTCCACGGACCCACG
470 GCCGCCACCGCCACCATGCC
471 TTGAGTAAGGATGATACCGAGAGGGAAGA
472 tgggccaggcacggtggctca
473 CCCGGCGAAGTGGGCGGCTC
474 GGCGGCCTTACCCTGCCGCGAG
475 ggtggggccggcgAGGGTCA
476 TCGGCGCGGACCGGCTCCTCTA
477 GGCCCATGCGGCCCCGTCAC
478 TGGGATTGCCAGGGGCTGACCG
479 CGCCGGAGCACGCGGCTACTCA
480 CCCTCGGCGCCGGCCCGTTA
481 GCACAGCGGCGGCGAGTGGG
482 TCACCTCGGGCGGGGCGGAC
483 GAGACGGGGCCGGGCGCAGA
484 CGCATTCGGGCCGCAAGCTCC
485 GGCCCGAAAGGGCCGGAGCG
486 ACGGCGGCCGGGTGACCGAC
487 TCCACCGGCGGCCGCTCACC
488 GCGGTCAGGGACCCCCTTCCCC
489 CGGCCGAAGCTGCCGCCCCT
490 GGCGGCCTTGTGCCGCTGGG
491 TCGCGGGAGGAGCGGCGAGG
492 TGCCCACCAGAAGCccatcaccacc
493 TGGGCCATGTGCCCCACCCC
494 CCCGCCAGCCCAGGGCGAGA
495 gccccctgtccctttcccgggact
496 GGTGGGGGTCCGCACCCAGCAAT
497 ggggcccccgggTTGCGTGA
498 TGCCTGCACAGACGACAGCACCCC
499 AGGCCGCGCCGGGCTCAGGT
500 CGGGGTAGTCGCGCAGGTGTCGG
501 tgcaggcggagaatagcagcctccctc
502 ccggaaatgctgctgcaagaggca
503 gcgtcggatccctgagaacttcgaagcca
504 CCCGGCTCCGCGGGTTCCGT
505 GCGTCGCCGGGGCTGGACGTT
506 GGGGCCTGCCGCCTCGTCCA
507 CGCACACCGCTGGCGGACACC
508 CGCAAACCATCTTCCCCGACGCCTT
509 GGGCCCTCCGCCGCCTCCAA
510 CCACCACCGTGGCAAAGCGTCCC
511 TCACAGCCCCTTCCTGCCCGAACA
512 TGCTTGATGCTCACCACTGTTCTTGCTGC
513 ggccaggcccggtggctcaca
514 TGCGGGACGGGTGGCGGGAA
515 gGCTTGGCCCCGCCACCCAG
516 GGCGGGGAAGGCGACCGCAG
517 ggcgcccaaccaccacgcc
518 GAAAAGCCCCGGCCGGCCTCC
519 CCGCAGGTGCGGGGGAGCGT
520 CCCCGCCCACAGCGCGGAGTT
521 AGCAGGGGCCCGGGGGCGAT
522 CCATGACCGCGGTGGCTTGTGGG
523 GGCAGGTGCTCAGCGGGCAGACG
524 GGGTGCGCCCTGCGCTGGCT
525 GAATTTGGTCCTCCTGCGCCTGCCA
526 TGGCTTCCGCGGCGCCAATC
527 GGCCAGGAGAGGGGCCGAGCCT
528 cgagcgccggccccccttct
529 CGGTTGCGAGGGCACCCTTTGGC
530 tacccggacgcggtggcg
531 GCGCCGCCGAGCCTCAGCCA
532 tgcagcctcaacctcctgggg
533 CCTTGCCGACCCAGCCTCGATCCC
534 GGCGGCGTTCGGTGGTGTCCC
535 CCCGGACTCCCCCGCGCAGA
536 cggccccctgcaagttccgc
537 TGCCCAGGGGAGCCCTCCA
538 GCCGGCTGCAGGCCCTCACTGGT
539 TGTCACACCTGCCGATGAAACTCCTGCG
540 CCCCTGCGCACCCCTACCAGGCA
541 TCCTGGGGGAGCGCGGTGGG
542 AGTGGGGCCGGGCGAGTGCG
543 GCGTCCAGGCTGTGCGctcccc
544 GGCGCGGCGGTGCAGCCTCT
545 gaggcggcggcggtggcagt
546 CGCGCGACCCGCCGATTGTG
547 CCGCGGACGCCGCTCTGCAC
548 tgaacccgggaggcggaggttgc
549 TCTCGGCGGCGCGGGGAGTC
550 aggcggccacgggaggggga
551 GGACCCGAGCGGGGCGGAGA
552 AAGCACCTggggcggggcggag
553 GCCGCTCGGGGGACGTGGGA
554 CACCGCCAGCGTGCCAGCCC
555 TATTCTTggccgggtgcggt
556 CCGCTTCCCGCGAGCGAGCC
557 CAGCCGGCGCTCCGCACCTG
558 GCGGAGCGCGCTTGGCCTCA
559 ggcctcgagcccacccagacttggc
560 TGCCGCGCCGTAAGGGCCACC
561 ACGGCGGTGGCGGTGGGTCG
562 AACCTGCCCAGTTACTGCCCCACTCCG
563 TCCAGCGCCCGAGCCGTCCA
564 GCTGCTGCTGCCCGCGTCCG
565 CACTGCTTAGGCCACACGATCCCCCAA
566 GGCCGGACGCGCCTCCCAAG
567 TCGGCCAGGGTGCCGAGGGC
568 tccgcccgcccCACAGCCAG
569 CGCGCCCCAGCCCACCCACT
570 ccgtgctgggcgcaggggaa
571 TGCGCACGCGCACAGCCTCC
572 CGGTGAGTGCGGCCCGGGGA
573 TGGCCGAGAGGGAGCCCCACACC
574 CCCAGCGCCGCAACGCCCAG
575 GCCACAAGCGGGCGGGACGG
576 TCCTCTGGACAACGGGGAGCGGGAA
577 CGCGGGTTCCCGGCGTCTCC
578 GCGCCGCCCGTCCTGCTTGC
579 ACGCGCGGCCCTCCTGCACC
580 GGGCGGGGCAAGCCCTCACCTG
581 GGGAGCGCCCCCTGGCGGTT
582 GCGAATGGTTCGCGCCGGCCT
583 TTTCCGCCGGCTGGGCCCTC
584 TCTCCGGGTcccccgcgtgc
585 GCAGCCCGGGTAGGGTTCACCGAAA
586 GGGCGGAGAGAGGTCCTGCCCAGC
587 CCCTCACCCCAGCCGCGACCCTT
588 GCGATGACGGGATCCGAGAGAAAGGCA
589 TCCGCAGGCCGCGGGAAAGG
590 GGCCCCAGTCCACCTCTGGGAGCG
591 GCTTGGCCGCCCCCGGGATG
592 CCCTCCATGCGCAATCCCAAGGGC
593 gcggcgactgcgctgcccct
594 TGGGCTTGCCTCCCCGCCCCT
595 GGCGGCCCAAGGAGGGCGAA
596 gctgcgcggcTGGCGATCCA
597 TCACCGCCTCCGGACCCCTCCC
598 CCCTTCCAGCCACCCCGCCCTG
599 GCGGGACACCGGGAGGACAGCG
600 CCCTGGGTTCCCGGCTTCTCAGCCA
601 TGGCGGTGATGGGCggaggagg
602 CCAGCCCGCCCGGAGCCCAT
603 TGCCCGCGGGGGAATCGCAG
604 TGCCGCGAGCCCGTCTGCTCC
605 TGCGGCCCCCTCCCGGCTGA
606 GCAGCAGGGCGCGGCTTCCC
607 GCCGCAGCACGCTCGGACGG
608 TGCGGAGTGCGGGTCGGGAAGC
609 GGCGCGGGGGCAGGTGAGCA
610 ggcgcgggggcaggtgagcat
611 CAGTGACGGGCGGTGGGCCTG
612 CGGCGACCCTTTGGCCGCTGG
613 CCGCGGCAGCCCGGGTGAA
614 GGGCGAGCGAGCGGGACCGA
615 TGGGGCAGTGCCGGTGTGCTG
616 TCGCTGGCATTCGGGCCCCCT
617 GGAGCCGTGATGGAGCCGGGAGG
618 TGCCAGGGTGTCTTGGCTCTGGCCT
619 CCGGCTCCGGCGGGGAAGGA
620 GGCCAGGGTGCCGTCGCGCTT
621 TCGGCTCGGTCCTGAGGAGAAGGACTCA
622 GCGCGGGGAACCTGCGGCTG
623 GCCGCCGCTGCTTTGGGTGGG
624 CACCTGAGCCCGCGGGGGAAcc
625 GAACGCCGGCCTCACCGGCA
626 CCCGTGGTCCCAGCGCTCCTGCT
627 GTGCGACCCGGCGCCCAAGC
628 TGGCTCTGCGCTGCCTTTGGTGGC
629 cgcgcgggcggcTCCTTTGT
630 TGGCCCGTTGGCGAGGTTAGAGCG
631 gacccggcatccgggcaggc
632 GCCCGGACTGTAATCACGTCCACTGGGA
633 CCGCCGCCAACGCGCAGGTC
634 CGCTGCCAGCTGCCGCTCCG
635 AGCGCCCACCTGCGCCTCGC
636 GCGGGCCAGGGCGGCATGAA
637 GGCTGCGACCTGGGGTCCGACG
638 GGTTAGGAGGGCGGGGCGCGTG
639 CAGCGCACCAACGCAGGCGAGG
640 TCGGCTGGCCCCGCCCACTC
641 CGGGGTTGCCGTCGCAGCCA
642 TCCGCACTCCCGCCCGGTTCC
643 ggaccccctgggcagcaccctg
644 cgaggcagccggatcacg
645 GGCGCGTGCGGGCGTTGTCC
646 CCAGGATGCGGCAGCGCCCAC
647 cgATGCGGCCCGCGGAGGAG
648 CGTTCTGCGCGCGCCCGACTC
649 CCCCGCCGTGGGCGTAGTAAccg
650 AACCCGCCCGGGCAGCTCCA
651 GCAGCGGTCGCGCCTCGTCG
652 CGCAATCGCGCTGTCTCTGAAAGGGG
653 GGAGCGCCCGCCGTTGATGCC
654 CCATGGCCCGCTGCGCCCTC
655 TGGGGGCGGGGTGCAGGGGT
656 CCGACCCTGCGCCCGGCAGT
657 CGGCTTCAAGTCCACGGCCCTGTGATG
658 ACCCCACCTGCCCGCGCTGC
659 ggcgcgcggagacgcagcag
660 CGTGAGCCGGCGCTCCTGATGC
661 CTGCCGCGGGGGTGCCAAGG
662 CCTGCTGCGCGCGCTGGCTC
663 CCTGGCGGCCCAGGTCGCTCCT
664 GAGCGCCCCGGCCGCCTGAT
665 CGCCGCACGGGACAGCCAGG
666 GCCCGGACATGCCCCGCCAC
667 cgggggccgccgcctgactt
668 CCAGTGGCGGCCCTCGGCCT
669 CGCCCGGCGCGGATAACGGTC
670 TGCTCCGGGTGGGGAGGGAGGC
671 TGCCTGGGCGCAGAACGGGGTC
672 GGGTCCTAATCCCCAGGCTGCGCTGA
673 TCCGCGTCCCCGGCTGCTCC
674 GGGCAGGGCTGACGTTGGGAGCG
675 GCCGTGGGCGCAGGGGCTGT
676 cctgcgcacgcgggaagggc
677 CGCGGACGCAGCCGAGCTCAA
678 CGACCCATGGCGGGGCAGGC
679 tccgctccccgcccctggct
680 tgtgccgcgcggttgggagg
681 TCACTCACGCTCTCAGCCCGGGGA
682 CGGCAAGCGGGCTTCGGGAAGAA
683 CCCCGCGGGCCGGGTGAGAA
684 CGGCGGCGGCTGGAGAGCGA
685 CGGGCCCCGGGACTCGGCTT
686 GACGGAATGTGGGGTGCGGGCCT
687 TGCGGCTGCTGCCGAGGCTCC
688 ACCGCTGCGCGAGGGAgggg
689 GGGGGTGCGGCGTCTGGTCAGC
690 GGCCGGGGGAAATGCGGCCT
691 tgcctggtaggactgacggctgcctttg
692 AGCGCGGGCGCCTCGATCTCC
693 TCCCGGCTGGTCGGCGCTCCT
694 CCGGGGCTGGGACGGCGCTT
695 GGGCGGGGTGGGGCTGGAGC
696 GTGCGGTTGGGCGGGGCCCT
697 GGCGGTGCCTCCGGGGCTCA
698 GGCGGTGCCTCCGGGGCTCA
699 CGGGAGCCCGCCCCCGAGAG
700 TCCTGCCATCCGCGCCTTTGCA
701 AGGCACAGGGGCAGCTCCGGCAC
702 CGACCCCTCCGACCGTGCTTCCG
703 CCCGCAGGGTGGCTGCGTCC
704 GCGTCTGCCGGCCCCTCCCC
705 TAGGCCGCCGGGCAGCCACC
706 GGGGAGCGGGGACGCGAGCA
707 GCCGGCTGGCTCCCCACTCTGC
708 TCGCTCACGGCGTCCCCTTGCC
709 TCCCCGCTGCCCTGGCGCTC
710 GGCCAGAGGCAGGCCCGCAGC
711 TGCCCGGGTCATCGGACGGGAG
712 CCCAGTGCGCACGGCGAGGC
713 AGCGTCCCAGCCCGCGCACC
714 TGCTCCCCCGGGTCGGAGCC
715 CGCTCGCATTGGGGCGCGTC
716 TGCGGCAAGCCCGCCATGATG
717 TCTTGAGCCTCAGGAGTGAAAAGGCCCCTTG
718 GGACCATGAGTGTTTCCATGCTTGGCATCAGA
719 tcagccactgcttcgcaggctgacg
720 cggccagctgcgcggcgact
721 TCGGAGAAGCGCGAGGGGTCCA
722 GCCGGGTGGGGGCTGCCTTG
723 tcctcgcccggcgcgattgg
724 GGCCGTGCAGTTGGTCCCCTGGC
725 GCGAGCCTGCTGCTCCTCTGGCACC
726 gccagagctgtgcaggctcggcattt
727 tgcccagcaaatgccttcctctttccg
728 TGGCCTGACCACCAATGCAGGGGA
729 TCCACCTGGGCTTCTGGGCAGGGA
730 agctggcctgcgccccgctg
731 AGCCGCGGCAGCGCCAGTCC
732 GGGGCCGGGCCGCTCAGTCTCT
733 GCAGTGAGCGTCAGGAGCACGTCCAGG
734 cccgATCCCCCGGCGCGAAT
735 GGCGTGACCGTGGCGCGGAA
736 AGCGGCCCGCAGAGCTCCACCC
737 GGCAGGCGGGCGCAGGGAAG
738 tctgccccgggttcacgccat
739 CGGGCGGGCCCTGGCGAGTA
740 GCAAGCCCGCCACCCCAGGGAC
741 GGCCCAGGCGGATGGGGTTGG
742 TCCGAGAGGCGTGTGGTAGCGGGAGA
743 AGGCGGCCGCGGGCGTTAGC
744 aaggcagcgcgggccaccga
745 ggcatcctgcccgccgcctg
746 TGGGGCGGGGTCTCGCCGTC
747 TCGGGCTCGCGCACCTCCCC
748 CCAGGTGCGCGCTTCGCTCCC
749 ACCTGCGCCACCGCCCCACC
750 GCCGAGCAGAGGGGGCACCTGG
751 TCGCGCCGCTCTGCGTTGGG
752 CCGCCGGGGCAGAAGGCGAG
753 TCcactggacaggggtgggagcctctg
754 gcccaccggcgctgcgctct
755 GCGGTGCCAGCCCCGCTGTG
756 GACCCGCCTGCGTCCTCCAGGG
757 CCCATCACAGCCGCCCAACCAGC
758 GAGCggggcggagccgagga
759 TGCAATTGTGCAGTGGCTGCGTTTGTTTC
760 CCCGACCGGATGCTCCTTGACTTTGCC
761 GCGAGCGCGCGCACCGATTG
762 CACTCCGCCGGCCGCTCCTCA
763 TCGGGGGTCCCGGCCGAATG
764 GCTCTCCCAGCTGCACGCCAACTTCTTG
765 GGAGGAGCCTGGCGCTGGCGAGT
766 TGGCTCTGGACCGCAGCCGGGTA
767 ACGGCGGCGTCCCGGGTCAA
768 TGGCCAAGCGCTGCCACTCGGA
769 CGCAGGCCGCTGCGGTGGAG
770 GCGCCTGCGCCATGTCCACCA
771 TGGTGCCTCCCGCAACCCTTGGC
772 GCCCGGCTCCAGGCGGGGAA
773 GCAATGCTGGCTGACCTGGACC
774 CGCCCGCCCGTCGGGATGAG
775 TGCCCCCACCATCCCCCACCA
776 GGCGCGAGCGGCGGGAACTG
777 GGCGCCGCTCGCGCATGGT
778 CCCGCTCTGCCCCGTCGCAC
779 GTAGCGCGGGCGAGCgggga
780 AGCGCCGAGCAGGGCGCGAA
781 ggcggcggccacgcaggttc
782 ccctcccgcacgctgggttgc
783 TCACGGCCGCATCCGCCACA
784 CGGCGCCGGCCGCTCTTCTG
785 CCGGCAGAGAATgggagcgggagg
786 TCGGCCGGGGCGCCAGGTCT
787 TGGGGCTGCGGGCGATGCCT
788 GGCTGCGGGGACCGGGGTGT
789 CGGCCCAAGCCGCGCCTCAC
790 ccgcgcccggAACCGCTGCT
791 TCGGCCGGGAGCGTGGGAGC
792 TGCAGACATTGGCGCGTTCCTCCA
793 GGACCCACGCGCCGAGCCCAT
794 GGAGGGGGCGAGTGAGGGATTAGGTCCG
795 TCCCCTCACGCCGATGCCACG
796 CCATGCCCGCCCCAGCTCCTCA
797 CCGCCGTGATGTTCTGTTCGCCACC
798 CGTGGCTGCCCCTGCACTCGTCG
799 tctggccagtccgtgaaggcctctga
800 CCGGGGTGCAAGGGCCACGC
801 cgccgcgcTTCCTCCCGACG
802 AGCGACCCGGGGCGTGAGGC
803 TGCGGAAACCTATCACCGCTTCCTTTCCA
804 ggcagggcggggcagggttg
805 GGGTCTCCAGACTGATGGGCCGGTGA
806 CGCTGAAGCCGCTGCTGTCGCTGA
807 TCCCACGCTCCCGCCGAGCC
808 AAATATgccggacgcggtgg
809 CGCCTTTCCGCGGCGGGAGC
810 CCCAGCCCAGGCCGCAGGCA
811 ccccgcaggggacctcataacccaa
812 GAGTTGGCTCGGCGTCCCTGGCA
813 tccctccgcctggtgggtcccc
814 TGACCCCTGGCACATCAGGAAAGGGC
815 TGCCCCGCAAGAACGGCCCAG
816 GGCCTCGGAGTGCGACGCGAGC
817 GCGCCAACCCAGACCCGCGCTT
818 TGCAAGCGCGGAGGCTGCGA
819 AGCCGGGCCACGGGCAGACA
820 CCCGGGCGGCCACAAAGGGC
821 CCCCATCCCAGGTGACCGCCCTG
822 TGACTCTGGGGGAAGCACGCGACG
823 GGTGCGGCCGAAGCCGTCGC
824 TGCCCCTCGGGCCCTCGCTG
825 GGCCACGGGGACCGGGGACA
826 GGGCGCCGCAGGGCGACAAC
827 GCAGCGCGCTTTGGGAAGGAAGGC
828 GGGTTCCACCCGCGCCCACG
829 TCGCGGCCCAGACCCCCGAC
830 CGAGACCCGGTGCGCCTGGGAG
831 aggtgcccgccaccatgc
832 cgcccaggctggagtgcagtggc
833 GCCGGCGAGGTCTCCGCGGTCT
834 CGCAGGGCCACCGGCTCGGA
835 GCCCCGGAGCATGCGCGAGA
836 CCCCTGGGGACCCCTGCCATCCTT
837 TTAccccgcgccgcgccacc
838 GCGGGCCGAGCCCACCAACC
839 GCGCGGTGGCCGCTTGGAGG
840 CCCGCCAGCGGCctgtgcct
841 CGCGCATGCCAAGCCCGCTG
842 GGCGCAGGAGCAGTTGGGGTCCA
843 TGGGGTAGGCGGAACGCCAAGGG
844 CCCGCTTCACGCCCCCACCG
845 GCAGCCCGGGTGGGCAAGGC
846 TGCAGTTGCCCTTGCCCTGCGAC
847 TGGCCGGGCGCCTCCATCGT
848 GCCTGCGATGGGCTCGGTGGG
849 CCGCGGTTCGCATGGCGCTC
850 TGGGCCATCTCGAGCCGCTGCC
851 TGGGGGAGTGCGGGTCGGAGC
852 CTGCCGCGCCCCCAGCACCT
853 GGCTGCTGGCGGGGCCGTCT
854 GGGCGCGGCGACTTGGGGGT
855 aaactgcgactgcgcggcgtgag
856 TGCTGGGGCCGTGGGGGTGC
857 TCCGCGCTGCCCGGGTCCTT
858 GTGGCGGCCCCCGCGGATCT
859 GGGGAGGCGCCACCGCCGTT
860 GGAGCGGGAGGGCGCTGGGA
861 tgaaggctgtcagtcgtggaagtgagaagtgc
862 ggagaaaatccaattgaaggctgtcagtcgtgg
863 ggggacaaccggggcggatccc
864 CCCGGGAGGAGAGGCGAACAGCG
865 AGTGCGCGGGTGCCGGGTGG
866 TGGCATCCCCTACCCGGGCCCTA
867 GAGGCTGGTTCCTTGTCGTCGGTTGGG
868 GCGGGGTCAGGCCGGGGTCA
869 GGCAGCGGCTGGAGCGGTGTCA
870 GCCCGGGCACACGCCCCATC
871 gcaccgccacgcccactgcc
872 TGTCATGCTTCTTTCTCCCCACTGACTCA
873 gcccaggctggggtgcaatggc
874 CGCCTCGGGGGCCACGGCAT
875 CGTGGGTCCTGGCCCGGGGA
876 TCCCCGGGCGGCCATTAGGCA
877 GGCGGGGGTGGGAGTGATCCC
878 CGTCAGTCCCGGCTGCGAGTCCA
879 CCGGGGTCCGCGCCATGCTG
880 CATGGCGGGGCCCGAGCGAC
881 CCGCCTCCTTGCCCCGACACCC
882 TCGGACACGCCTTCGCCTCAGCC
883 CGAGCTGGGCGCAGGCGCAA
884 GCGGGGTTGTGTGTGGCGGAGG
885 accgcgcccggccTGCAAAG
886 GCGGGGCCAGAGAGGCCGGAA
887 GCCCCAAGGGAAGATGCAGGGAGGAA
888 gccccaagggaagatgcagggaggaa
889 GCCCGCACGTGCACCACCCA
890 GGGTGACGAAGTGGTGTCTTTACCGAgga
891 CCGCCGTGCGCCTGTGGGAA
892 ggctgctgcgggaggatcac
893 TGGGCATCCAGAAAAATGGTGGTGATGGC
894 gccgcgccgggccCTATGAG
895 CCGCCATGCGGGCAGGGACC
896 TGTTACAggctggacacggtggctc
897 cggaacttgcagggggccga
898 TGCAAAATCCTCCCCTTCCCGCACCC
899 GCGCTGGAGCCACGCGACGA
900 GGGGTCCGCTCCCGCGTTCG
901 CGCCCCGGGCTGAGAGCTGGGT
902 GGCCCTTCGGGGGCCGGGTT
903 TGGCCACAAAGGGGCCGGAATGG
904 ACCCCAGCGCGTGGGCGGAG
905 GGGCTGCGGGGCGCCTTGAC
906 GCACCGCGGCTGGAGCGGAC
907 AGGCGATCCCAAGGCTGTTGGAGGC
908 tccacccgccttggcctccca
909 cggcgggaaggcggggcaag
910 ggagccgcggcgtgagtgcg
911 GGCCGGCACCCCACGCCAAG
912 GCGGGGCGGAGCGCACACCT
913 GCGGCCAGCAGCGCGTCCTC
914 CCGACAGCCGGCAAGGCCCAA
915 ttgtttttgtttgtttgttttgaaagggag
916 CCCCGGTTTCCCCGCGCCTC
917 GGCTGGACGCGCCCTCCGACA
918 TCCCACGCGCCCGCCCCTAC
919 cggccacgccttccgcggtg
920 GGCTCCGCTGGGGCGCAGGT
921 GCCGCCCCGTGTCGTGCGTC
922 GGCGTCAGTTGGAGTGTGGGGTCGG
923 CCGAGCGGGGTGGGCCGGAT
924 CATCGCGCGGGACCCAACCCA
925 CAGTGGGTGGATCTCACCTGCCTTCGG
926 GAGGCCGCGGGGCTCCGACA
927 GAGCCTGCCCTATAAAATCCGGGGCTCG
928 TCCCGGCGGGTGGTGCCTGA
929 TCTGAGCGCCCGCCGCCTGC
930 GGCTGCCGGCGCGGGACCTA
931 TCCGGGGCATTCCCTCCGCGAT
932 TGGCGGCGGCCCCTGCTCGT
933 cggcgCGCGACTGGGAGGGA
934 GGCGCCAGCGCAACCAGAGCG
935 CGAAGGTGGCGCGGCCTGGA
936 CCCAGCGGGCTTCGCGGGAG
937 CCCGCTTGCCCCGCCCCCTA
938 CCCACACCTCCACCTGCTGGTGCCT
939 ATGCAGCCCCGCCGGCAACG
940 CCGGATGCCCGGTGTGCCTGG
941 GCGAGCAGGGACGCAGCTCTGGTG
942 CGCGCTCGGCCCGCTCAGTG
943 TGGTGCCGGCAGGGAGGGGA
944 GGGCGGTGGCGATGGCTGGC
945 GGCTGTTGGTCTTTTTCCCAGCCCCGAA
946 CCGGGCCGGCAGCGCAGATGT
947 CGGAGGGCGATGGGGCCCTG
948 GGGGCCGGGCTGCGAAGCTG
949 TGCCTGGGCACCCCACGGACG
950 GCCCTACGTCCGGGCAGCACGC
951 CTGTGCGCGTCCCCGCCGTG
952 TGCAGCGGCGCCTCGGACCC
953 ccgctgggcgcgctgggaag
954 GGCGCATGCTCTGCGCGTATTGGC
955 GGGTGGGCGGGCCGTTCTGAGG
956 GGGCTGCCGGGTTGGCGCAG
957 GGCGCGTGCGGAAAAGCTGCG
958 TCCAGGCCGCCCTCGGGTCA
959 GGGGAGGGGGCGCAGCCAGA
960 GGCAGCGTGGTCTTCCACTTCCCCCT
961 GGGATCGAGGGATCGAGGCAGGGGA
962 CGGCCATGAGCGCCTCCACGC
963 CCCGGTGTGCGGCAGCGACG
964 TTGGGGCGGCCGGAAGCCAG
965 CGCAGCGGCGGCGTCTCGGT
966 CCGCGACCTCCCCAAGCCACCC
967 GGCGGCCGACCGCGAACACC
968 CCCCATTTCCGAGTCCGGCAGCA
969 CCCAGCCTGGCCTCTCCTCTCAGGCA
970 cggctctttcctcctcaagagatgcggtg
971 CGCCGCCGTCCCTGGTGCAG
972 TGGGGACCCCTCGCCGCCTG
973 GCGCCCAGCCCGCCCCAAGA
974 caggggacgcgggcgtgcag
975 CCGGGCGGGGCCCAACTGCT
976 CCCGAGCAGGGCCGGAGCAGA
977 CCCCTCCACATTCCCGCGGTCCT
978 TCCTTTGTGGCCTGGGCAGGATGCAG
979 GCAGCGCGCGGTTTGGGGCT
980 GAGGCCTGCGGGCGCTGCTG
981 TCACGGTTGCTGGGCCGTCGC
982 CGGGGTGGGCCTCGCGGAGA
983 GCCTGCGCTCCTGGCGCCCT
984 CGCCTTCGGAGAGCAGAGTCAACACGGA
985 TGCCCCTAAATGAGAAAGGGCCCTTGAG
986 GCCACGCCCCGGGACCGGAA
987 TCCCGCCCAGGGGCCTCCCA
988 ccccgcgcccggccAAAGAA
989 GGACCGCCGCACAGCCCCAA
990 GGGCAGCGGTGGCCGTGCAT
991 TTCCTGCGCCGCCCCCTCCC
992 GGCGTCTCCCTGTCCCCGCCTG
993 GCCGGCCTCGCGCACCGTGT
994 CCCGGGACGTGCGCGCTTGG
995 TGTCCCCCGAGCCGCCCTGC
996 TCGCTCTCGTGCAGCGGCGTCA
997 CCCGCGCGCTGCAGCATCTCC
998 CCCCAGCTGCCGCCATCGCA
999 GCCCGGGCCCGCCTCAAGGA
1000 TGCCGGCGAGGCCTTTTCTCGG
1001 GGCGGGTGGGGAGCGCGAAC
1002 CCCGCCGCCGCTGGTCACCT
1003 ccggctgcctcggcctccca
1004 ggtgtgcaccaccacgcc
1005 GGCGCGTCCCGGCGGCTTCT
1006 AGTCCCTGCGCCCCGCCCTG
1007 TGCCCCCAAACTTTCCGCCTGCAC
1008 CTTGCGGCCACCCGGCGAGC
1009 TCGCGCGGAAACTCTGGCTCGG
1010 GCTGCGGCCCAGAGGGGGTGA
1011 CGGCGGGCTTGGGTCCCGTG
1012 TCCCCCGCCGCACCAGCACC
1013 GCGCGGTGCGGGGACCTGCT
1014 GCCGGACGCTCGCCCCGCAT
1015 GAGTGCTCTGCAGCCCCGACATGGG
1016 CCGCGCAGACGTCGGAGCCCAA
1017 TGGCCGAGGCGCGTGGCGAG
1018 GGCCGCGCTGCCCCAGGGAT
1019 CCGGGGGCGGACGCAGAGGA
1020 GGGGGCGGAGCCTGGGAATGGG
1021 GGGCGGGCCCTGTGGGTGGA
1022 CCGCTCCCCCATCTCCACGGACG
1023 GACCCAGGGAGGCGCGGGGA
1024 TGCCCGGCCGCAGGTGACCA
1025 GCGCCGGGAGTGGGCAGGGA
1026 ACCCAGGCCGGCGCGGGAAG
1027 ttcccgccgcccggtcctca
1028 CGCGCCGGTGACGGACGTGG
1029 AACCCTCCCAGCCAAAACGGGCTCA
1030 CGGGCGAGGCCGCCCTTTGG
1031 GGCCGCGGACGCCCAGGAAA
1032 CCGTTTGGAACGTGGCCCAAGAGGC
1033 CCCGCCTCCGCTCCCCGCTT
1034 ggtggcggcggcagaggagga
1035 CGCGGGGAGCAGAGGCGGTG
1036 gggcgcccgcgctgagggt
1037 GGGCCTGGCCTCCCGGCGAT
1038 CACCCGGCGTCCGCACCAGC
1039 CGGCGCTGGTTTGGCGGCCT
1040 ccaggagccccggaggccacg
1041 GCGATCTCCTGCCCAGGTGTGTGCTC
1042 ACTGCCCGGGCTCGCCGCAC
1043 TGCGGCAACGGTGGCACCCC
1044 GGAGCGAAGCTGGCGGAACCCACC
1045 GGCGGCCGACGGGGCTTTGC
1046 GGCCGCGGGTGCCTCGGTCT
1047 GCGCTCCAGCCATGGCGCGTT
1048 GCCGGACGGGCGTGGGGAGA
1049 TCCCCCGCGACTGCCCCTCC
1050 GGGTGGCAGCGGGTGCGGAA
1051 gctcgcccgctcgcagccaa
1052 CGAGGTTCCGCAGCCCGAGCCA
1053 GCGCGGGGGACCGAAACCGTG
1054 GCCGAGCCCGGCCCAAAGCC
1055 TGCCAACGTTCACCCGGCTGGC
1056 GACAGTGCGAGGGAAAACCACCTTCCCC
1057 GGGTCGGGCCGGGCTGGAGC
1058 GGGTCGGGCCGGGCTGGAGC
1059 GCGGGGCCGAGGGGCTGAGC
1060 GCCCGGCCACCTCGGGGAGC
1061 ACTGTCTGCCAAGCCAGCCCCAGGG
1062 GGATGGTGGCGCCGGGCTGC
1063 TCCAGGAGGGCCAGGTCACAGCTGC
1064 CGGCTGGCTCGCTTGGCTGGC
1065 TCCGGCGCTGTTGGGCAGCC
1066 CCTGCGCACGCGGGAAGGGC
1067 TCTTCCCTTCTTTCCCACGCTGCTCCG
1068 CAGCGCCCCCGCCTCCAGCA
1069 GCTGCGCGGCTGGCGATCCA
1070 GCCGACGACCGGAGGGCCCACT
1071 TGCCCAGGCTGGCCCCTCGG
1072 CGCGGCCCTCCCCAGCCCTC
1073 CCCCGCCCGGCAACTGAGCG
1074 AAGAGCCCGCGCGCCGAGCC
1075 TGCCCACTGCGGTTACCCCGCAT
1076 GCATGGTGGTGGACATGTGCGGTCA
1077 CATAGAAGAGGAAGGCAAAGGCTGTGACAGGCA
1078 TCATCCTAGACTTGCAGTCAAGATGCCTGCCC
1079 agccagcggtgccggtgccc
1080 gccccgctccgccccagtgc
1081 CACGGGGGCGGGGAGACGCGGGGTGCACTTCTCGCCCCGAGGGCCTCCGGCGAAGCAACCCGGCAGC-
Figure imgf000131_0001
TCCGA
1082 CACAGGGTGGGGCAGGGAGCATCAGGGGGCAGGCAGCCACACCCCCGACACATCAAGACACCTGAGT-
GGCAGGTTCAAGCCGGAGGCGCTGTATTTCCACACAGGAAGAAGGCCAAAAAAGGTGACACTGC-
CCCCTCCCAGTGGCTCCATGCTCCTCAGCTATGGCTGT(
TGCCAGGCTCCTTGCATGCAAGGCAGCCCCCACCCGGC
1083 accgggccttccgcgcccctcgccccacgccgcgggtgcggtcctccctccagcagagggttccgg- gcgccggcgcggcccgcacggggccgggagcccttcctgccggccgggtgcgcgcggcgccgccgacagct- gtttgccatcggcgccgctcccgcccgcgtcccggtgcgcgccccgcccccgccaacaaccgccgctctgattg gcccggcgcttgtctcttctctccccgcagccaatcgcgccggg
1084 CCCACCTCCCCCAACATTCCAGTTCCTTCTTTTCCTTCTACTCTTCAGCGGCCTCAGCCTGCGCAC-
Figure imgf000131_0002
AGGGGACCAACTGCACGGCC
1085 CACAGAGCCAGGCAAGCATGGGTGAGAGCTCAGACCATCCTTGTTGGACTAAAAGGAAGGG-
GCAGACTGCCCATGGGGGGCAGCCGAGAGGGTCAGGCCCC(
GATGGGGGGCTGAGTGGTGCCAGAGGAGCAGCAGGCTCGC
1086 ggagcagcaggctcgctcggggagagtagggccttaggatagaagggaaatgaactaaacaac- cagcttcctcccaaaccagtttcaggccagggctgggaatttcacaaaaaagcagaaggcgctctgtgaa- catttcctgccccgccccagcccccttcctggcagcattaccacactgctcacctgtgaagcaatcttccggag acagggccaaagggccaagtgccccagtcaggagctgcctataaatgc
1087 gcccaaagtgcggggccaacccagacagtcccacttaccaggtcttctgaaagacagctgacaa- gagacatgcagggctgagaggcagctcctttttatagcggttaggcttggccagctgcccacagcttcaggc- catcagagacagcttctccctgccagagttgctacagtctctggtttctcaaccaggtgaatgtggcaatcact gtgcagaatgaaaattttgggtggggaggtaggagaagcggaaag
1088 GGAAAGAGGAAGGCATTTGCTGGGCAATAGTGCCCAGAAGGAAAAAGCAGGTAGGGGG-
GCTCTTTTTCTGGGCTGCTGGCATCCACTTGCTTGA1
TGCGATTGCTAATCCCCTGCATTGGTGGTCAGGCCA
1089 CAGCGGCCCCGCGGGATTTTGCCCAGCTGCTTCGTGCCCTCTGGTGGCTAAGGCGTGTCATTGCAGT-
AGTCCCTGCCCAGAAGCCCAGGTGGA
1090 ccgacagcgcccggcccagatccccacgcctgccaggagcaagccgagagccagccggccg- gcgcactccgactccgagcagtctctgtccttcgacccgagccccgcgccctttccgggacccctgc- cccgcgggcagcgctgccaacctgccggccatggagaccccgtcccagcggcgcgccacccgcagcggggcgca ggccagct
1091 GGGCCAATCCCCGCGGCTGGGCAGAGCGACCCGAGGGCGGCGCCCTGCAGACCACGTGGCCCGGGAG-
Figure imgf000132_0001
TGGGTGCTTCCGGGCCGGCTGGCGGGACTGGCGCTGCCGCG
1092 GGCGGCTGCGGGGAGCGATTTTCCAGCCCGGTTTGTGCTCTGTGTGTTTGTCTGCCTCTGGAGGGCT-
Figure imgf000132_0002
GAGGCGCCGGGCAGCGACCCCTGCAGCGGAGACAGAGACTGAGCG
1093 GCCAGGACCGCGCACAGCAGCAGGGCGCGGGCGAGCATCGCAGCGGCGGGCAGGGCGCGGCGCGGGG-
GTAGGCTTTGCTGTCTGA(
GCTCCTGACGCTCACTGC
1094 CGGGCAAGAGAGCGCGggaggaggaggaggagaaaaaggaggaggaggaggaggaggaggCGGC-
CCCGCATCCCTAATGAGGGAATGAATGGAGAGGCCCCCTCGGCTggcgcccgcccacccggcggcggccgc- cAAGTGCCTCTGGGCGCTGCGTGCCGCGCCCGCTGCTCCGCGCGCA(
GGCGCGGCGGCGGCGGTGGCTGTGACCGCGCGGACCGAGCCGAGAC
1095 GCGCGCAGCCAGGGGCGACGCTTCCGCTCCGAGCCGCGGCCCGGGGCCACGCGCTAAGGGC-
Figure imgf000132_0003
CTGACCGCCGCGCGCCGCCCTGCTGCTCACCTACTTCCGCGCCACGG
1096 GTGCGCTCACCCAGCCGCAGGCGCCTGAGCGGCCAGAGCCGCCACCGAACACGCCGCACCGGCCAC-
CGCCGTTCCCTGATAGATTGCTGATGCCTGGCCGCGGGAACGCCCACGGAACCCGCGTCCAcggggcggggc- cggcggcgcgcgcgccccctgccggccggggggcggAGTTTCCCGGGCGCCTGCCGGGTGGAGCTCTGCGGGCC
GCT
1097 GAGGGCCCGGGGTGGGGCTGCGCCCTGAGGGCCCTGCCCTGCCCTCCGCACGCCTCTGGCCACG-
Figure imgf000132_0004
CGGCCCTCAGGCCTTCGGCCTCACTGCGTCCCCACTTCCCTGCGCC
1098 TATGCgcccggcgcggtggctcacgcctgtaatcccagcactttgggaggccgaggcgggcggat- cacgaggtcaggagatcgagaccatcctgactaacacggtgaaaccccgtctc- tactaaaaatacaaaaattagccgggcgcggtggcgggtgcctgtagtcccagetacttgggaggctgaggcag gagaatggcgtgaacccggggcaga
1099 CGCAGGGGAAGGCCGGGGAGGGAGGTGTGAAGCGGCGGCTGGTGCTTGGGTCTACGG-
GAATACGCATAACAGCGGCCGTCAGGGCGCCGGGCAGGCGGAGACGGCGCGGCTTcccccgggggcggccg- gcgcgggcgccTCCTCGGCCGCCGCTGCCGCGAGAAGCGGGAAAGCAGAAgcggcggggcccgggcctcagggc gcagggggcggcgcccggccACTACTCGCCAGGGCCCGCCCG
1100 CCTGAGGCGGGGCCGTCCGGCACCCTGTGATGGGGCGTGGCCCCTGGGGAGGCTCCCACCAGCCCT-
Figure imgf000133_0001
GCTCTGGCAAGGCAGTCCCTGGGGTGGCGGGCTTGC
1101 GAGGCCGGGGACGCCGAGAGCCGGGTCTTCTACCTGAAGATGAAGGGTGACTACTACCGCTACCTG- GCCGAGGTGGCCACCGGTGACGACAAGAAGCGCATCATTGACTCAGCCC( GACATCAGCAAGAAGGAGATGCCGCCCACCAACCCCATCCGCCTGGGCC
1102 CCGCCGGCTCCCCCGTATGAGGAGCTGCCATAGCTTTCGAATCCACCTGTTTTGAACAACAG- GATTAGTGCCTGTGCCACGTCCCACGCCTCCGAGAAACCCGCAGGCTCCCGGAGGCTTCGC- CCCTTCAAACACTGCCCGAGTCTCCCTAACCTTCCTCGCCGCC CCGCTCCCGCCCTTCCTCTCCCGCTACCACACGCCTCTCGGA
1103 CAGGAGCGACGCGCGCCAAAAGGCGGCGGGAAGGAGGCGGGGCAGAGCGCGCCCGGGACCCCGACT- TGGACGCGGCCAGCTGGAGAGGCGGAGCGCCGGGAGGAGACCTTGGCC( GCCTTCCCGCGCGCCGGGCTAAAAAGGCGCTAACGCCCGCGGCCGCCT
1104 cgggggaaacgcaggcgtcgggcacagagtcggcaccggcgtccccagctctgccgaagatcgcg- gtcgggtctggcccgcgggaggggccctggcgccggacctgcttcggccctgcgtgggcggcctcgccgg- gctctgcaggagcgacgcgcgccaaaaggcggcgggaaggaggcggggcagagcgcgcccgggaccccgacttg gacgcggccagctggagaggcggagcgccgggaggagaccttggcc
1105 ccccccaccctggacccgcaggctcaggagtccacgcggggagaggggatg- gagaactctcctcgcttcgtcctctctcccggggaatccctaaccccgcactgcgttacctgtcgctttggg- gaggccgctgccgggatccggccccgaacagcccgggggggcaggggcgggggtcgtcgaggggatgggggcag agagcaggcggcgggcaggatgcc
1106 GCCCGGCTTTCCGGCGCACTCCAGGGGGCGTGGCTCGGGTCCACCCGGGCTGCGAGCCG-
GCAGCACAGGCCAATAGGCAATTAGCGCGCGCCAGGCTGCCTTC
GAAGTTCGACCCATCGGCGACCCGACGGCGAGACCCCGCCCCA
1107 cgctgggccgccccTTGCTCTTAGCCAGAGGTAGCCCCTCACCCCGCGACTTACCCCACAC-
Figure imgf000133_0002
TCACAGAACCAGCAACTCCGGGCGCGCCAGGCCTCGGGCGCCGCCATCT
1108 GCTTCTCCATAGCTCGCCACACACACACACACACGCCACGCACCGTATAAAAGCCTAAATGACACAC-
Figure imgf000133_0003
GGCCGCCGAGTTCCGCGGCTCCGGGAGCGAAGCGCGCACCTGG
1109 CCGCGCACGCGCAAGTCCAGGCCGCCGCGGCCCTGGAATAGAGACTCGCCCTTGAT-
Figure imgf000133_0004
GCGGAAGAGCCGCCTCAGCTCGGCGTCCAGGTCTGAGTGGTTGAAGGCGCCGGCG
1110 GTCTCAACTCACCGCCGCCACCGCCGCGCAGCCCCGCGGCCGCTGCTCCATAGCCCTCCGACGG-
GCGCCCAGGGGCTTCCCGGCTCCGTGCTCTCTGCCCGTCGTGGTTCCGCCTTCAgccccgcgcccgcagggc- ccgccccgcgccgtcgagaagggcccgcctggcgggcggggggaggcggggccgcccgAGCCCAACCGAGTCCG
ACCAGGTGCCCCCTCTGCTCGGC
1111 ACAAATGCGCTGCTCGGAGAGACTGCCGCGGCAACCAACTGGACACCCCAAGAGCT- CACTCCTCCGCGGTTTTATATTCCGACTTGCGCACAGGAGCGGGGTGCGGGGGCGCAGGGAGTGTGG-
GTAACAC GCGCGA
1112 GCGCCTGCGCAGTGCAGCTTAGTGCGTCGGCGCGCAGTTCTCCCGCCCGTTTCAGCG- GCGCAGCTTCTGTAGTTGGGCTACTGGAGGGGTCGCTCAGAAACCTCATACTTCTCGGGTCAGGGAAG- GTTTGGGAGGATGCTGAGGCCTGAGATCTCATCAACCTCGCCTTCTGCCCCGGCGG
1113 aagtcaagggctttcaacctcccctgccccattcatacagtggaaggtctaacccaggcttgt- cagcctaagaacacgggatctcttcactgtggttcatgtgtagagtggagtttccatgctgaga- gagacaagcaaagaagaccagaggctcccacccctgtccagtgGA
1114 tggatcccgcacaggggctgcaggtggagctacctgccagtcccctgccgtgcgctcgcattcct- cagcccttgggtggtccatgggactgggcgccatggagcagggggtggtgcttgtcggggaggctggggc- cgcacaggagcccatggagtgggtgggaggctcaggcatggcgggctgcaggtccggagccctgccctgcggga acgcagctaaggctcggtgagaaatagagcgcagcgccggtgggc
1115 CCGCCTGTGGTTTTCCGCGCATTGTGAGGGATGAGGGGTGGAGGTGGTATTAGACGCAGC-
Figure imgf000134_0001
TGGTCTCTGCAGGGGAGGAAGTTCCCGGGCGGCGCGGCCTGCGTCACAG
1116 cgcgctctcccgcgcctctgcccgcccccggcgcccgcccccgccgctcctcccgactccccgc- ccccggcccGGGTCACTTGCCGTCGCGGTGGGCGGCCCCCGGCGAGTCCACACCCCTGCCCCGCCTCCTCCCG- GTAGGAAACTCCGGGACCCTGCAAGGGATGACTCACCCCAGTGATTCAACCGCGCCACCGAGCGCGGAGCTGCC CTGGAGGACGCAGGCGGGTC
1117 TCCGGCCCAGCCCCAACCCCGACCTAAGTAACCGGCTATCGGCCACCCATTGGCTGAAGTCCCT-
Figure imgf000134_0002
ACGGAAGCCACTGGCTGGTTGGGCGGCTGTGATGGG
1118 CCGGGTCAGGCGCACAGGGCAGCGGCGCTGCCGGAGGACCAGGGCCGGCGTGCCGGCGTCCAGCGAG-
cccgcccctcccgggctccgccccagctccgcccccgcgcgccccggccccgcccccgcgcgctctcttgcttt tctcaggtcctcggctccgccccGCTC
1119 GGGGCGGTGCCTGCGCCATATATGGGAgcggccgcccctcgccgcgcccctcgccgccgccgccgc- cgcgctcgccgactgactgcctgacggcgccgcgagccggcccgagccccgcgagccccgcgagccccgccgc- cgccgagcgccaccgagcgccgccgccgccccccgccacgcaccgcggcTCCTCGCGTCCAGCCGCGGCCAAGG AAGTTACTACTCGCCCAAATAAATCTTGAAAAGAAACAAACG
1120 GCGCGGGCCCTCAGGTTCTCCCTATCGAAGCGGTCTATGGAGATAGTTGGATACTCGGCCATCTGC-
Figure imgf000134_0003
CAATCAGAGGAGTCCGGAGACCGGGGGCAAAGTCAAGGAGCATCC
1121 cgtccgcggcTCCTCAGCGTCCCCCTTTACGGTCTGGGCGGACTGCGGGGGCTGGGGAGGTTCTGGG-
Figure imgf000135_0001
GAGGACTGCGCAGGCGCAGTGGGCCAGGCGGCCCGGCGACCAATCGG
1122 GGAGGCGCCCAGCGAGCCAGAGTGGTGGCTGGTCCCGCGCGGTGAGTGGGATTGGGGCACTTGGG-
CCGGGCGATGGCGTGGCTTGCGTCTCCCGCCTccgggcagggcctggccgccgggcgggggcgggagggccacg cgggcccagggtggggccgcggcctgcgcggcgggcgggccgggt
1123 CGCGCAGGGGGCCTTATACAAAGTCGGAGAAGTAGCTGGGTCGCTGGCCGGCCAGGGACTCAAGC-
CCTAAGACCCGCTACAGTGCGTCCTCGCTGACAGGCTCAATCACCACGGCGAGGCCAAggcgcggggccgcggc ccgcccgAGAAGCCTGAGCTGGGCCCCGACACCCCCTGCCCGACATT
1124 CCCCACCCCCTTTCTTTCTGGGTTTTGATGTGGATGTCTTTCTATTTGTTCAGGAAATTGTGACGT- GTGTTCTGGGCAGGGTTTGAGGTTTTGGAACATTTTCTAAAAGGGACAGAGAGCACCCTGCTA- CATTTCCTAATCAAGAAGTTGGCGTGCAGCTGGGAGAGC
1125 GCGCGTTCCCTCCCGTCCGCCCCCAAgccccgcgggcctcgcccaccctgcccgccgcccctccgc- cggcggccgcccTCTGCGGCGCCCCTTTCCGGTCAGTGGAGGGGCGGGAGGAGGGGCGGGGGTGCGCGGG-
CGCCAGGCTCCTCC
1126 ccggcggAGGCAGCCGTTCGGAGGATTATTCGTCTTCTCCCCATTCCGCTGCCGCCGCTGCCAG-
GCTGCGGTCCAGAGCCA
1127 GCCTGGTGCCCCGAGCGAGCCGGGAGTAGCTGCGGCGGTGCCCGCCCCCTCTCTCCGC-
Figure imgf000135_0002
CCCGGGGGGAGATCGGGGAGCGCCCGATGCCGGGCGGCCGGAGCCATTGAC
1128 GGCGGCGGCGCTACCTGGAGGCGCGGTGGCGGGCAGGTGCCCGAACTGCACGGCGATGCAGAG-
Figure imgf000135_0003
AAGAGCGAGCACAGGAAGACCTGCGTATCCGAGTGGCAGCGCTTGGC
1129 TGGTGGCCAGCGGGGAGCGCCCGGGCGCCATCGGCGCGTCCTGCTCCACCAGGGCGACCCTGG-
Figure imgf000135_0004
CGGACTCAGAGCCATCCTCCTCCTCAACCTCCACCGCAGCGGCCTGCG
1130 GCGGCACTGAACTCGCGGCAATTTGTCCCGCCTCTTTCGCTTCACGGCAGCCAATCGCTTCCGCCA-
Figure imgf000135_0005
CCGAAGGGCGAGCCGCAAACGCTAAGTCGCTGGCCATTGGTG
1131 CTCGGCGATCCCCGGCCTGAACGGGTAGGAGGGGTTGGGGGATTCCGCCATCCCTTGTTTTGAG-
1132 CGCAGGGAGCGCGCGGAGGCCCGCAGGGTGCCCGCCTGGCCGCAGAGGCCGCGACGCCCCCTCCGC-
GCGGGCGGGCTGCGGGCTCCCGGCGCCTTCCCGCAGAGGCGGCGACAgcggccgccccccccgcggggccgggc cggggAACTTTCCCCGCCTGGAGCCGGGC
1133 GAAATACTCCCCCACAGTTTTCATGTGATCAGGAATTCAGCATAGGCTATAAGACGGAGTGCTCCAT-
Figure imgf000136_0001
AAGCCCCAGAGATAAATGACATAGGTCCAGGTCAGCCAGCATTG
1134 CCGGGCGCACGGGGAGCTGGGCGGACGGCGGCCCCCGCCTCCTCCGGGGACGCGGCAC-
GAGACGCGGGGACGCGCGGACGCCACGCTCAGCGGCCGCCCCCGGCCTCCGCGCCGCCTTCCTCCCGG-
GAGCAGCCCCGACGCGCGCGGGCCCGGACCGCCGGGGTTGTCATGGCAGCAGCTCCATCCCTGACCGCCACT
CTCCCGGTGCCGCCTCGGAGCGAGCGGGCTGGCGGGCGGCGCGGACTGCGCGCTC
1135 gcggcggcgTCCAGCCAGAGCCCTGTGGAAGCGGCGGCGACACTTGGGCTGGGCAGTGTCTCTGAT-
Figure imgf000136_0002
CCGTCCCACTTGTATTTGCATTGAGGTCATTGATGGAAATGGT
1136 GGGTCGCCGAGGCCGTGCGCTTATAGCCGGGATGACGCCGCAGTTGGGCCGGATCAGCTGAC-
CCGCGTGTTTGCACCCGGACCGGTCACGTgggcgcggccggcgtgcgcggggcggggcggagcggggcctg- gcctgggcggggcAACCTCGGCGCACGCGCACAgcgcccgggcggggggcggggTGGTGGTGCGCCTGCCGCGC
CTACAGTTCCCGCCGCTCGCGCC
1137 CGCGCCTGATGCACGTGGGCGCGCTCCTGAAACCCGAAGAGCACTCGCACTTCCCCGCGGCGGT-
Figure imgf000136_0003
GAGCGGCGCC
1138 ccgggagcgggcggaggaagggccgggcgtccggcgcaagcccgcgccgccccagccccggccccg- gcccggcccgcACACGCCGCTTACCTGGAAGCCGGCGACGCTGCCGCCCACCTCCCTGCTGCGTGTCGCAAAC-
TGTGCGACGGGGCAGAGCGGG
1139 GGGGCGCACCGGGCTGGCTCCTCTGTCCGGCCCGGGAGCCCGAGGCGCTACGGGGTGCGCGG- GACAGCGAgcgggcgggtgcgcccgggcgcggcggcggcAGCGTCGGGGACCCGGAGCTCCAGGCTGCGCCT-
gccccccgccccccgctccccGCTCGCCCGCGCTAC
1140 GCCACGGGAGGAGGCGGGAACCCAGCGAGGCCCCCGAgggctggggggaccggccggccg- gacaaagcggggccgggccgggccggggcggggccgtgcggggcTCACCGGAGATCAGAGGCCCG-
GTGCAGCTGGTCAGCGAGAGGCTCCTGGCCGCGCTGCCCCTGGTTCGCGCCCTGCT
1141 cgggcatcggcgcgggatgagaaaccaacctgatacttatcgtgtgccgagttccctcct- tgtatcctgactaagcacagcgaataaccctgtccttgttctaaccccaggtcttgaagaaatact- gtcccagctgagccccgcgtttacaagatgaagaggcgccccagatgcgctgaaagaaaggccaaagctcgtgc ctccttccactgcctgcggtagaacctggtcccgcatagcttggactcggataag
1142 acaccgccggcgcccaccaccaccagcttatattccgtcatcgctcctcaggggcctgcggcccggg- gtcctcctacagggtctcctgccccacctgccaaggagggccctgctcagccaggcccaggcccagccccag- gccccacagggcagctgctggcagggccatctgaagggcaaacccacagcggtccctgggccccaacgccaggc agcaaggactgcagcgtgcctacctgtgcagctgcaacccag
1143 CCCCAACAGCGCGCAGCGAACTCCACTGCCGCTGCCTCCGCCCCAGAGACACGTTGCAGGCCA-
Figure imgf000137_0001
AGAGCGCACGCTTCGGGGTCTCCGGGAAGTCGCGGCGCCTTCGGATG
1144 CCCCGCTGGGGACCTGGGAAAGAGGGAAAGGCTTCCCCGGCCAGCTGCGCGGCGACTCCGGG-
GACTCCAGGGCGCCCCTCTGCGGCCGACGCCCGGGGTGCAGCGGCCGCCGGGGCTGGGGCCGGCGG-
GAGTCCGCGGGACCCTCCAGAAGAGCGGCCGGCGCCG
1145 CCCGGGGGACCCACTCGAGGCGGACGGGGCCCCCTGCACCCCTCTTCCCTGGCGGGGAGAAAGGCT-
GCAGCGGGGCGATTTGCATTTCTATGAAAACCGGACTACAGGGGCAACTCCGCCGCAGGGCAGGCGCG-
GCGCCTCAGGGATGGCTTTTGGGCTCTGCCCCTCGCTGCTCCCGGCGTTTGGcgcccgcgccccctccccctgc gcccgcccccgcccccctcccgctcccATTCTCTGCCGG
1146 CCCGCGGAGGGGCACACCAggcgggtgttggggaggacgcagagggctggggctggagcccag- gcggggcagggggcggggcggagctgggtccgaggccggCGGGGGCGCCTCCATCCCACGC-
CCTCCTCCCCCGCGCGCCCGCCCGCTCTCGGGTGACTCCGCAACCTGTCGCTCAGGTTCCTCCTCTcccggccc cgccccggcccggccccgccgAGCGTCCCACCCGCCCGCGGGAGACCTGGCGCCCCG
1147 GCCCACGTGCTCGCGCCAACCCCTACGCCCCAGCGCGCCTTCTCCACCCACGCACGGGCCTCG-
Figure imgf000137_0002
CGGGAACATGAGTGGAAGAGCCCGAGTGAAGGCCAGAGGCATCGC
1148 GGCGGAGCGGCGAGGAGGAGGAGCAGGAGCGCGCAGCCAGCGGGTCCACGCATCT-
CAGCACTTCCAGACCAACTCCGGCACCTTCCACACCCCTGCCCGGGCTGGGGGCTCCGAGAGCGGC-
CGCGAAGCGACTCCGATCCTCCCTCTGAGCCTTGCTCAGCTCTGCCCCGCGCCTCCCGGGCTCCGG^
GCGGGGTCCCTGCTCCTGCGCCCCGGGCGCGCTTCCCGGACACCCCGGTCCCCGCAGCC
1149 CCTCGCCGGTTCCCGGGTGGCGCGCGTTCGCTGCCTCCTCAGCTCCAGGATGATCGGC-
Figure imgf000137_0003
1150 caggcgcgccgATGGCGTTTCTGAGGTGACGCCGCCCACACCGGGCTTCTCCGGGGGCGGAG- GAAACACCTATGAACCCTCCGGCAGCCTTCCTTGCCGGGCGCCAGGTAAGCAGCGGTTccgggcgcgg
1151 CTCCCGGCTTCTGCATCGAGGGCCTTCCAGGGCCAGCCCTTGGGGGCTCCCAGATGGG- GCGTCCACGTGACCCACTGCCCCCACGCCCGCGCGCGGGCCCCAGCAGCCCCAGAGCTGCGC- CAACTTCGTT
CCCGGCCGA
1152 CGGTCCGCGAGTGGGAGCGGCTGCTTGTGGGCAGGGTGGACGCGGGGCCACGTCTTGGCCG-
Figure imgf000137_0004
AAAGACGGCGAGACGCGTCCACGCAGGGGGAGTCTGTGCGGTTTGGA
1153 GCAGCGCCGCCTCCCACCCCGGGCTTGTGCTGAATGGGTTCTGATTGTGCACGGGGTGCACACTGG- GCATTTCTTGGAAGGGGCACACTGacgcgcgcacacacgcccccgacgcgcacgcgccccgcgcgcact- cacactcacccccgcgcacactcacccccgcgcacactcacgcTGCCGCCGCGCTGAGGTGCAGCGCACGGGGC TTCACCTGCAACGTGTCGATTGGACGGATGGGCTCGGCGCGTGGGT
1154 CGACCGTGCTGGCGGCGACTTCACCGCAGTCGGCTCCCAGGGAGAAAGCCTGGCGAGTGAG-
Figure imgf000138_0001
1155 CCCGGGCTCCGCTCGCCAACCTGTTACTGCTGCAGAACGCCAGGAAGCTCAGCCTG-
Figure imgf000138_0002
TCTCAGCTTCCCTGGTCCCTGGTCCCGAGTTCCGCCTTCCCCCCCCGCCCCGTGGC
1156 CATGGGGTGCTCATCTTCCCGGAGCTGAGGAGCTGGGGCGGGCATGGGGTGCTCATCTTCCTG-
Figure imgf000138_0003
GGCATGGGGTGCTCATCTTCCCAGAGCTGAGGAGCTGGGGCGGGCAT
1157 CCGAGAGCCGGAGCGGGGAGGGCCCGCCAAGTCAGCATTCCAGCCGGTGATTGCAATGGACAC-
CCCCGCCCTTGTATCTCATGGAGGATTACGTGGGCAGCCCCGTGGTGGCGAACAGAACATCACGGCGG
1158 CCGCTGCAGGGCGTCTGGGCTTCTGGGGGCAGAGAAGACTCACGCAGTGAGCAGTCCGCAAGC- CCGCTGGCGGCAGCGGCGGTGCTCCGTCCAGGGCGAGAAGCTGCAGCGCTCGGGCCGGGGTCCCTCCT- GTCGCAGCAGCTCCTCGACGAGTGCAGGGGCAGCCACG
1159 gcgctgccccaagctggcttccgctgcctgctctgggctgggctgggctgggctgggctggtag- gacctgctcccagggcgggaggggacacacccacctcagcagatctcagcccatccctcccagctcagt- gcactcacccaaccccacacgggccaaggagagagtgaagaggaagcattgccctcagaggccttcacggactg gccaga
1160 CAGGATGCCAGCGTGACGGAAGCAAGTAACCACCAAGGCATCACCACTGGCGCTAAACTTCT- CACTTCCGGAGTGCTGCAAGCGCAGAAAATATACGTCATGTGCGGAGGCGGAGCTTCCGCCCT-
TGTCGCAGGGACATCTTCTGGCTGTTTCCGTCGCCTGCGTGGCCCTTGCACCCCGG
1161 GGCGGTGCCATCGCGTCCACTTCCCCGGCCGCCCCATTCCAGCTCCGGAGCTCGGCCGCAGAAACGC-
CCGCTCCAGAAggcggcccccgccccccggcccAAGGACGTGTGTTGGTCCAGCCCCCCGGTTCCCCGAGAC-
CGGCGGGCTCACCTGCGTCGGGAGGAAgcgcggcg
1162 GTGGGTCGCCGCCGGGAGAAGCGTGAGGGGACAGATTTGTGACCGGCGCGGTTTTTGTCAGCT- TACTCCGGCCAAAAAAGAACTGCACCTCTGGAGCGGGTTAGTGGTGGTGGTAGTGGGTTGGGAC- GAGCGCGTCTTCCGCAGTCCCAGTCCAGCGTGGCGGGGGAGCGCCTCACGCCCCGGGTCGCT
1163 GGCGGAGGGCCACGCAGGGGAGACAGAGGGCCTCCACAGGGGCCAGGGGGAAGTGTGGGAACT- GAGTCTCCCCCAGACGAGGCTTCACTTGGACACGTGTATGTGGTCACCGGGGGAAACTGAGCAGTTCT- GACTTCCCTTGGAAGGCGTGGAATTAGGAGAGAAI ATCTGTGGAAAGGAAGCGGTGATAGGTTTCCGCA
1164 GTCCGGGGGCGCCGCTGATTGGCCGATTCAACAGACGCGGGTGGGCAGCTCAGCCGCATCGCTAAGC- CCGGCCGCCTCCCAGGCTGGAATCCCTCGACACTTGGTCCTTcccgccccgcccttccgtgccctgc- ccttccctgcccttccccgccctgccccgcccggcccggcccggccctgcccaaccctgccccgccctgcc
1165 CGGCCTGCGGCTCGGTTCCCGCCTCTTCCCCACCCCCAGCCCCGCGCTGCCCTCTCGGTCCCCCT-
Figure imgf000139_0001
GAAGAGTTGTCAGCCCAACAAGAATATAGGATCACCGGCCCATCA
1166 GGGAACCGTGGCGGCCCCTCCTGGCCCTGGGAGGTGGTCCCGCTGCCCCCCTGACTTCCGTGCACT-
Figure imgf000139_0002
GCGCATGAAGCGGCGCAGCAGCGCCAGTGTCAGCGACAGCAGC
1167 cggggaaggcggggaaggcggggaaggcggggaaggcggggaaggcggggaaggcggggatggT- GAGACggtgaggcggggcggggcctggggcgcgggcggggcggggaggggtggggcggggcCCGGGGGCGCTG-
TGGCAGAGGCGGGTGCAGGGAACCCGCGGCTCGGCGGGAGCGTG
1168 cctcccggtttcaggccattctcctgcctcagcctcccaagtagctgggactacaggcgcctgccac- cactcccggctaattttttgtatttttagtagagacgggggtttcaccgtgttagccaggatggtctcgatct- gcttacctcgtgatccgcccgcctcggcctcccaaagtgctgggattacaggcgtgagccaccgcgtccggcAT ATTT
1169 AGCCCGCGCACCGACCAGCGCCCCAGTTCCCCACAGACGCCGGCGGGCCCGGGAGCCTCGCGGACGT- GACGCCGCGGGCGGAAGTGACGTTTTCCCGCGGTTGGACGCGGCGCTCAGTTGCCGGGCGGGGGAGG- GCGCGTCCGGTTTTTCTCAGGGGACGTTGAAATTATTTTTGTAACGGC GACGTGCGCGCGCGTCGTCCTCCCCGGCGCTCCTCCACAGCTCGCTG
1170 CCCCAGCCACACCAGACGTGGGAGCTTAGGATGAGAGCGGCCTCCGAGCAGATGATCACCCTG-
Figure imgf000139_0003
CGGCCTGGGCTGGG
1171 tgggcttcctgccccatggttccctctgttcccaaagggtttctgcagtttcacggagcttttca- cattccactcggtttttttttttttgagactcgctctgtcgcccaggctggaatgcagtggcgcgatctcg- gctcactgcaagctccgcctcccgggttcacgccattctgcttcagcctcccaagtagctgggattataggcgc ccgccaccacgcccggctaatggctaattttttgtattttttttt
1172 CCGCGCTGGGCCGCAGCTTTCCGGAGCGCAGAGGAAGCTGGCCAGCCTGCAGATAGCACTGG- GAAAGACACCGCGGAACTCCC
CAGGGACGCCGAGCCAACTC
1173 gcatggcccggtggcctgcactccagtgaggtggctgaactctgaccagccaagagaaaac- ccccctctccgccccaaacagctccccactcccccagcctgcccccaccctccccacattccagtctttcact- gtcgccccaggcaacttggctgcccaagaccaagccccaccaagaagctggagggccaggcaagtccaggatgg gcaagcagggaagcacgagagggagaaacagaggtgaggaaggaagg
1174 GGGCAGGGGAGGGGAGTGCTTGAGTATTGGGGCTACACTCACCACAAGAGCAGCAAACAAAGCACT- GGGTGTGGTAGAGGCTGTCCAGGGCCTGGCAGGCATTGCTCTGCCCATAGATGCCTTTGTTGCACT-
TGATACAGGTGCCTGAGAAGAGAAAAGTGTCACACTCTACTCCCCCAGGTCAAAACCAGG-
GATTCCCAAGCTTTCCTGACTGCCCTTTCCTGATGTGCCAGGGGTCA
1175 CCCCGGCGCCTTCCTCCTCCGGACTCCGCTGCATGCCTCGCTTGCGGTGGTCCGATCG-
GCTTTCTCCGGGAGCTTTCCTCTCCCCGCCACGCCCCCGTCTCCCCGGCCGTCCCCGCGCCTCTCG-
GCCTCCCTTTCATTAGCCCCACATCTGTCTTTCCCATGGGAGGGAGCGCGCGCCTTCCGCCCAGCGGGGC(
AGCAGAGCCTCTCCAATCCTCGGCGCCTCCCCTACACAGGGTTCGCTGGGCCGTTCT
1176 CCACCGCGCTTCCCGGCTATGCGAAAGTGAAAACGAGGGGCGCCCAAGGCCCT-
GCTTCTTCCCCCTTCCTCTTCCCCTTGCCCAGCCGCGACTTCTTCCTCACTGATCTCCCGGGGGCG-
GAGACGCTGi
TCCGAGGCC
1177 CCGCATCTGACCGCAGGACCCCAGCGCTACCAAGTGCCTGTTCTTGGACCCCCAGCCGAGCAGGGG- GAAGCATCCCCAGCTCCCGCACCCAAGTCCCTGGCGCCGCTGCCGGGCCGCCCTCCCTGATGC-
CCAGCGCGCAGCCTGCCGGCGCCGCGCCTTCTGGACGGCTCTCGCCGCACCTCCTGAGCTCAGCCCGCGGCCI GCAGTGGGGCGGCCTCACTTACTGGCGGGGAAGCGCGGGTCTGGGTTGGCGC
1178 GCGGACACGTGCTTTTCCCGCATTAGGGGGGGTCTcccggcgcgcgccccgccgccACCTGTTGAG- GAAAGCGAGCGCACCTCCTGCAGCTCAGGCTCCGGGCGCCAGCCCTGCCCCGCAGCCCCAGAGC- CCGTCGCAGCTCGGGTGGTCCCTCCCCGGC AGTTATTTCTCGCAGCCTCCGCGCTTGCA
1179 GAGCTGGAAGAGTTTGTGAGGGCGGTCCCGGGAGCGGATTGGGTCTGGGAGTTCCCAGAGGCG-
Figure imgf000140_0001
TGCCCGTGGCCCGGCT
1180 GGCCGCCAACGACGCCAGAGCCGGAAATGACGACAACGGTGAGGGTTCTCGGGCGGGGCCTGG-
Figure imgf000140_0002
ACCGCGGACATGGGCGGCCGCGGGCAGGGCCCGGCCCTTTGTGGCCG
1181 GCGCCCGGTCAGCCCGCAGCGCCCGGCCAGCCCGCAGCGCCGGAGCCCGCAGTGCGTGCGAGGG-
GCTCTCGGCAGGTCCAGACGCCTCGCCGAGCCCAGCCCGCAGCTccccgggccgcgccgcgcccgcccACAGG- GCCCACAGCCCTGCTTCGGCTCTCAGGGCGGTCACCTGGGATGGGG
1182 CCCGCCAGGCCCAGCCCCTCCCTGGCCAGCCCCGTCCTTGTCCCCAAACTgggcccgcccggccgc- caggccgccgggcctccggggcccTCGCGCATCCGGCTCCGAAAGCTGCGCGCAGCCATCATCAGGGC- CCTTCTGGTGTTAGAAGAGACCCCGGCATCATCTTTTCGTCGCGTGCTTCCCCCAGAGTCA
1183 CGATTCTTCCCAGCAGATGGCCCCAAAGTTCAGTTCCTGAATTGCCTCGCGGAGCCGCGGGCT- GCAACGTGAGGCGGCCGCTGCCAGTCGACTCAACCACCGGAGTGGCCCCTGCAGTTGGATAGCAAC- GAGAATCCTCCAGGGGTGCAGGGCGACGGCTTCGGCCGCACC
1184 CGCACACCGCCCCCAAGCGGCCGGCCGAGGGAGCGCCGCGGCAGCGGGAGAGGCGTCTCTGTGGGC-
Figure imgf000140_0003
GTCCTCACCCCACGGGGACGGTGGAGGAGAGTCAGCGAGGGCCCGA
1185 AGGCCCCGAGGCCGGAGCGGCGGAGGGGGCGGCCCCTCCCACAGGGTCTTCCCACCCACAGGGCAC-
:GC
GCGACGCTCGCTCGTGTCCCCGGTCCCCGTGGCC
1186 GGGTTCGCGCGAGCGCTTTGTGCTCATGGACCAGCCGCACAACTTTTGAAGGCTCGCCGGCCCATGT-
Figure imgf000141_0001
CGGCGAGCTCCCTCATGTTGTCGCCCTGCGGCGCCC
1187 CCAGTCTCCCGCCCCCTGAGCATGCACGCACTTTGGTTGCAGTGCAATGCTCTGACTTCCAAATGG-
GAGAGACAAGTGGCGGAAAATAGGGTCTTCTCCCACCTCCCACCCCCCCATCCCGACTCTTTTGC-
CCTTCTTTTGGTCCAAGAGATTTTGAAACCGTGCAGAACGAGGGAGAGGGC
AAAACACACCCCAAAGTGGGCCTCGCATCGGCCCTCGCATTCCTGTAGAG
1188 GAGGAGGCAGCGGACCGGGGACACCCTGGGGGAACTTCCCGAGCTCCGCGACCTCGAAGCCTGGC-
Figure imgf000141_0002
GGCCGTGGGCGCGGGTGGAACCC
1189 CCGGCTCCACGGACCCACGGAAGGGCAAGGGGGCGGCCTCGGGGCGGCGGGACAGTTGTCGGAGG-
GCGCCCTCCAGGCCCAAGCCGCCTTCTCCGGCCCCCGCCATGGCCCGGGGCGGCAGTCAGAGCTG-
GAGCTCCGGGGAATCAGACGGGCAGCCA;
GGACGGGCGTCGGGGGTCTGGGCCGCGA
1190 CCGCCACCGCCACCATGCCCAACTTCGCCGGCACCTGGAAGATGCGCAGCAGCGAGAATTTCGAC-
Figure imgf000141_0003
AATAGATCGCCTTGTCTCCCAGGCGCACCGGGTCTCG
1191 TGAGTAAGGATGATACCGAGAGGGAAGAAAAAAATACCCTCTTTGggccaggcacggtggctcac- ccctgtaatcccagcactttgggaggctgaggcgagcggatcacgagatcagaagatcgagaccatcctg- gctaacacagtgaaaccccatctctaccaaaaatacaaaaaattagccaggcatggtggcgggcacct
1192 tgggccaggcacggtggctcacccctgtaatcccagcactttgggaggctgaggcgagcggatcac- gagatcagaagatcgagaccatcctggctaacacagtgaaaccccatctctaccaaaaatacaaaaaattagc- caggcatggtggcgggcacctgtagtcccagctacttgggaggctgaggcaggagaatcctttgaacccaggag gcggagcttgcagtgagctgagattgtgccactgcactccag
1193 CCGGCGAAGTGGGCGGCTCCCCAAGCGCCCAGGCTGCGCAGCACGATggccgcccccgccgcgcac- cgcgtgtgcccgcacgcccgccccctgcgccccggggacgcctctccgcccctccccctgcccctccgcccac- cgcgcggtcgccccacgccgcgggcgctgcttcgccgcccgggaggccgcctcccgccccgggACCGGATAACG CCCTAAATCAGCGCAGCTGAGGCGAGGCCGTGGCCCCCGCAG
1194 GCGGCCTTACCCTGCCGCGAGCGCCTGTGACAGCGGCGCCGCTGTGCTCGCGACCCCGGCTCCGG-
Figure imgf000141_0004
GCGAGACAGTGGCCGGGCACGTGCTGCTGGAGGCGTCCGAGCCGG
1195 gtggggccggcgAGGGTCAGGGGCATCGCGGCCGCGACCCCATTCTGCAGCCCCCGAGGCTCGC-
CCGACTCCTGGCTGCCCTGGACTCCCCTCCCTCCTCCCTCCCGCCTCCTCGCCCAGGGCCCGGCTCACCTg- gcggcggggcgcgggacgccgcgggcgggacggcggggggctccggggcgctccggggcggcTCTCGCGCATGC
TCCGGGGC
1196 CGGCGCGGACCGGCTCCTCTACCACTTTCTCCAGCTGCACTGCCACCCAGCCTGCCTGGTGCTGGT-
Figure imgf000142_0001
CGATGACACAGAGAAGGATGGCAGGGGTCCCCAGGGG
1197 GCCCATGCGGCCCCGTCACGTGATGCAAGGATCGCCGGCCTTTCCGCCAGAGGGCGGCACAGAAC-
GCGGCGTCCCGGGGCCAGGGGGGTGCGCCTTTCTCCGCGTcggggcggcccggagcgcggtggcgcggcgcggg gTAA
1198 GGGATTGCCAGGGGCTGACCGGAGTGTTGCTGGGAAGGAGCCTCAGCTCCGCTCCAGGTCCTCCAC-
GAAACTGAAGAGCTTCCGCATCTTGCTTGGGTTGGTGGGCTCGGCCCGC
1199 GCCGGAGCACGCGGCTACTCAGGCCGAACCCCGACCCGGACCCGGCACGCGGCCTCGGCGAGGGCGG-
Figure imgf000142_0002
GACCTCCAAGCGGCCACCGCGC
1200 CCTCGGCGCCGGCCCGTTAGTTgcccgggcccgagccggccgggcccgcgggTTGCCGAGCCCGCT-
GACGTCAGCCCGGGTTTCCCCCCCCCACCGGGGCTTCCCCATCCCCCGAGGCTTCCCGGGAGGGCTGC-
GAGTCCGGGGAGCGTGCGGGGTCGCCACCATCGGGACCCCCAGAGGAC
GGACGCTGTCCCCCTCCCGCCCCCCAccccatttacagattgggaga
1201 CACAGCGGCGGCGAGTGGGTCGTGCACGCGGATGCGGGGTGGGAGTGGGGGCGCACGCGCGGGCGT-
Figure imgf000142_0003
1202 CACCTCGGGCGGGGCGGACTCGGCTGGGCGGACTCAGCGGGGCGGGCGCAGGCGCAGGGCGG-
GTCCTTTGCGTCCGGCCCTCTTTCCCCTGACCATAAAAGCAGCCGCTGGCTGCTGGGCCCTAC-
CAAGCCTTCCACGTGCGCCTTI
GACCCCAACTGCTCCTGCGCC
1203 AGACGGGGCCGGGCGCAGACGCCCCGCCCCGCCCTTGCACCCAGCCCGCTGAGTCCGCACCGC-
CCGCGGTCCCGGCCTGGGCTGTGCGCAGGAGATGGGCCAAGTGCAAGGTCCCTTGAGCGCAGCTGG-
GCGCACACCGCAGGACGGCCCCTTTCGCACCGGCTCGCGAGGGAGGCGCTGTGCCCCCCGTGTGCGGCT1
CACCCTGCCAGGCCTTCCCAGCTTCCCTGAGGTTGCCTGCTACACCCGCCCC
1204 GCATTCGGGCCGCAAGCTCCGCGCCCCAGCCCTGCGCCCCTTCCTCTCCCGTCGTCAC-
Figure imgf000142_0004
GGCGGGGGCGTCTGGCCGCGGAGTCCGCGGGGTGGGCTCGCGCGGGCGGTGG
1205 GCCCGAAAGGGCCGGAGCGTGTCCCCCGCCAGGGCGCAGGCCCCAGCCCCCCGCACCCCTAT-
TGTCCAGCCAGCTGGAGCTCCGGCCAGATCCCGGGCTGCC(
GAGCGCAGAGAAAAGTTCAAGCCTTGCCCACCCGGGCTGC
1206 CGGCGGCCGGGTGACCGACCACTGCTTACCAGGAGGGGAGACTGGCAGGGGGGGCTCAAGGAA-
^CCGGGAGTGCCAGGCTCGTGCCCGGCCGG GCA
1207 CCACCGGCGGCCGCTCACCTCCTGCTCCTTCTCCTGGTCCGGGCGGGCCGGCCTGG-
Figure imgf000143_0001
GCCGGTGTGTGTCCCCGCAGGAGAGTGTGCTGGGCAGACGATGCTGGACACGATG
1208 CGGTCAGGGACCCCCTTCCCCCTTCAAGCTGACTCCCTCCCACAAGGCTCTTCAGATCTCGT-
Figure imgf000143_0002
GCCTTCCCCCCCCACCGAGCCCATCGCAGGC
1209 GGCCGAAGCTGCCGCCCCTCCTCCCAACCGGCGGGTCAGATCTCGCTCCCTTTCGGACAACT-
TACCTCggagaggagtcaaggggagaggggaggggagggggggagggggcaagagagagaggggggagaa- gaggGATCTTCTCGCTTATTTCATTGTTCCCCCATCTTCAGGGAGCGGGGGCAGCGGCTCCTCAAGGCGGCGGG
CGCCGGCGTCTTCAGAGCGCCATGCGAACCGCGG
1210 GCGGCCTTGTGCCGCTGGGGGCTCCTCCTCGCCCTCTTGCCCCCCGGAGCCGCGAGCACCCAAGGT-
Figure imgf000143_0003
GGCCCTGGGGCCCTCGGGCGGGAGGGGGCAGTTACACGGCAG
1211 CGCGGGAGGAGCGGCGAGGCCCTCACCTGGCGCCTTTTATGCCCGCGGCCGGTGGAGGGGGGAAGG- GAGGAATGGTGTCAGGGGCGGATATCTGAGCCCTGAGGAATTTGCAGGCTCCTGAGAGCAAATATGG- GCTCTCTCCCCATTGGTCAATTCCCTCCCCTCCCI GGTCCGCGCAGAAGCTCCGACCCGCACTCCCCCA
1212 GCCCACCAGAAGCccatcaccaccagcaaagccaccaccaaagccaccacccaagccagcaccaag- gccaccaccatatcctcccccaaagccactaccaAAGCTGCTGCTGCTGCTGCTGAAGCCACCGCCATAGC-
CGCCCCCCAGCCC GGGGCGCGGCAG
1213 GGGCCATGTGCCCCACCCCACAGCCCCACCCTGCCCTGCCCACCACCCCAAGCCCGGCCCTGG- GTCCCAGGGTCCCGCCAGGCCCGCTGGGTGGAATGTGGTCATGTTTCAGACTGCCGATG- GCTTCCACTTCCCAGACAGGCCCAGACGGCCCCGCCAGCAGCC
1214 CCGCCAGCCCAGGGCGAGAGTCAGGGACGCGGCGTCGGGCGAGCTGCGCGGGCCCCGGGGGAG- GCGCGACCCCGGAGGCACCTGTCCGGATCCCTCCCCGCCTTGCT GCGCGCTCGGCCCGAACCGCGCGACCCCCAAGTCGCCGCGCCC
1215 gccccctgtccctttcccgggactctactacctttacccagagcagagggtgaaggcctcct- gagcgcaggggcccagttatctgagaaaccccacagcctgtcccccgtccaggaagtctcagcgagctcacgc- cgcgcagtcgcagttt
1216 GTGGGGGTCCGCACCCAGCAATAACCCGGGTCTTCCCGCTCCGGCTCCTGCCCCAGTAAGCGTTG-
Figure imgf000143_0004
A
1217 gggcccccgggTTGCGTGAGGACACCTCCTCTGAGGGGCGCCGCTTGCCCCTCTCCGGATCGC-
JAGGAGGATC
GAGCTTCATGCGGCTCAACGACCTGT cgggggccgggggccggccggggc cggggTCAGCAGAAAAGGAC- CCGGGCAGCGCGGA
1218 GCCTGCACAGACGACAGCACCCCCGGCGGGGGAGAGCGGCCCCAGCGGAGACTCGGCAGGGCTCAG- GTTTCCTGGACCGGATGACTGACCTGAgcccggggcccgggcggcgctggccgggcACAGGATGCGCGGCCCG-
1219 GGCCGCGCCGGGCTCAGGTTCCACCCCCGGGAGCGCGGGGCGGAGCCAGGCCGGCGCCGAGGCT-
Figure imgf000144_0001
CAGGCCACTGTAGGGAACGGCGGTGGCGCCTCCCC
1220 GGGGTAGTCGCGCAGGTGTCGGGCGCGGAGCCGCTTGGCCTCCTCCACGAAGGGC-
Figure imgf000144_0002
TTCTCCAGGGGCAGCGTCCCGGGGGCCGCGGGGCTCCCAGCGCCCTCCCGCTCC
1221 tgcaggcggagaatagcagcctccctctgccaagtaagaggaaccggcctaaagga- cattttctctctctctcctcccctctcatcgggtgaatagtgagctgctccggcaaaaagaaaccggaaat- gctgctgcaagaggcagaaatgtaaatgtggagccaaacaataacagggctgccgggcctctcagattgcgacg gtcctcctcggcctggcgggcaaacccctggtttagcacttctcacttccacga
1222 ccggaaatgctgctgcaagaggcagaaatgtaaatgtggagccaaacaataacagggctgccgg- gcctctcagattgcgacggtcctcctcggcctggcgggcaaacccctggtttagcacttct- cacttccacgactgacagccttcaattggattttctcc
1223 gcgtcggatccctgagaacttcgaagccatcctggctgaggctaatctccgctgtgcttcctct- gcagtatgaagactttggagactcaaccgttagctccggactgctgtccttcagaccaggacccagctccagc- ccatccttctccccacgcttccccgatgaataaaaatgcggactctgaactgatgccaccgcctcccgaaaggg gggatccgccccggttgtcccc
1224 CCGGCTCCGCGGGTTCCGTGGGTCGCCCGCGAAATCTGATCCGGGATGCGGCGGCCCAATCGGAAG-
GTGGACCGAAATCCCGCGACAGCAAGAGGCCCGTAGCGACCCGCGGTGCTAAGGAACACAGT-
GCTTTCAAAAGAATTGGCGTCCGCTGTTCGCCTCTCCTCCCGGG
1225 CGTCGCCGGGGCTGGACGTTCGCAGCGGCGCTTCGGAAGGGGGCCCCGCGGGAGCAGCCGC-
Figure imgf000144_0003
GGCTGGGGGCGTCCACCTGAGAAGCTCCGCTCTCGCTCAGACACCCCAC
1226 GGGCCTGCCGCCTCGTCCACCGTCCGTCGTGAGGCCGGCAGCGGACACGTGCTCATCCCACGGGGAG-
Figure imgf000144_0004
TAAGCCCAGAGGTGGGGTCTTTGGAGAGCACTTAGGGCCCGGG
1227 GCACACCGCTGGCGGACACCCCAGTAACAAGTGAGAGCGCTCCAC-
CCCGCAGTCCCCCCCGCCTCTCCTCCCTGGGTCCCCTCGGCTCTCGGAAGAAAAAC-
1228 GCAAACCATCTTCCCCGACGCCTTCCACATAAGATGCCCTCCTGCGGGCCCTCACCTTTTGACACT-
Figure imgf000145_0001
CCCGC
1229 GGCCCTCCGCCGCCTCCAACCGCGCACCAGGAGCTGGGCAcggcggcagcggcggcagcggcg- gcgTCGCGCTCGGCCATGGTCACCAGCATGGCCTCGATCCTGGACGGCGGCGACTACCGGC-
AACACCTACACCACGCTGACACCGCTCCAGCCGCTGCC
1230 CACCACCGTGGCAAAGCGTCCCCGCGCGGTGAAGGGCGTCAGGTGCAGCTGGCTGGACATCTCG-
GCACCGGGCTGAGCGCAggccccgcggcggcgccgggggcagccggggTCTGCAGCGGCGAGGTCCTGGCGACC
GGGTCCCGGGATGCGGCTGGATGGGGCGTGTGCCCGGGC
1231 CACAGCCCCTTCCTGCCCGAACATGTTGGAGGCCTTTTGGAAGCTGT-
GAGGAGTGCCGCcgaggcggggcggggcggggcgtggagctgggctggcagtgggcgtggcggtgc
1232 GCTTGATGCTCACCACTGTTCTTGCTGCTCAAGGGAAACCAAGTATATATTTGTGGATAG-
ATCCTAACTCAGATGATACTGTCAGAATATATAAGATTCCTATACCACATCCTGAACTCTGAAAGT-
TGCAGTTCTACGTAGAAGTTCACTGAGGGTTGTAAGAGTCAGI
CAAACCTCACAGGTGAGTCAGTGGGGAGAAAGAAGCATGACA
1233 ggccaggcccggtggctcacacctgtaatcccagcactttgggaggccgaggtgggcggattgcct- gaggtcaggagtttgagaccagcctggccaacatggtgaaaccccgtctctactaaaaataccaaaaattagc- cagtcgtagtggtgggcacctgtaatcccagctattcaggaggctgaggcaggaggatcacttgaacccaagag gcgggagttgcagtgagcagagatcacgccattgcaccccag
1234 GCGGGACGGGTGGCGGGAAGGAGGGAGGCGCGGCTGGGGAGAGCGCTCGGGAGCTGCCGGGCGCT-
GCGGaccccgtttagtcctaacctcaatcctgcgagggaggggacgcatcgtcctcctcgccttacagacgc- cgaaacggagggtcccattagggacgtgactggcgcgggcaacacacacagcagcgacagccgggaGGTAAGCC
GCGTCCCAGCGGCTCCGCGGCCGGGCTCGCAGTCGCCCCAGTGA
1235 GCTTGGCCCCGCCACCCAGACCCCTCCCCCGGGGGCGCCCAGCTTGGCCTCTGGGTCCCG-
Figure imgf000145_0002
GCCCAGCTGACCCTGCCCGCCCCGCCGCGGCCCTGCAGTCCCCGGGCCAG
1236 GCGGGGAAGGCGACCGCAGCCCACCTACCGCTGGACGCGGGTTGGGGACCCCGCCGCCCGGC-
CAGCTTTGTTcgggggcccgcggcccctcccgggcccccgcACCGCCTCGGGTGACCCGCGGT-
GCCGCCCGGGGA
1237 gcgcccaaccaccacgcccgcctaatttttgtatttttagtagagacgggttttcaccattttggc- caggctggtctcgaaccccgacctcaggtgatctgcccaaaagtgctgggattacaggcgtcagccaccgcgc- ccggccGGGACCCTCTCTTCTAACTCGGAGCTGGGTGTGGGGACCTCCAGTCCTAAAACAAGGGATCACTCCCA
CCCCCGCC
1238 AAAAGCCCCGGCCGGCCTCCCCAGGGTCCCCGAGGACGAAGTTGACCCTGACCGGGC-
CCGCCGGTCCGCAGACCCTGCACCGGGCTTGGACTCGCAGCCGGGACTGACG
1239 CGCAGGTGCGGGGGAGCGTGCGGCCGGGTCCATGCGCCTGCGGGCGGCGGGGGGAGACGCGT-
TGCCTTCGGCCGGGACCACTGCACCTGCCCGCGTGGGTAATGCGCCCGC-
1240 CCCGCCCACAGCGCGGAGTTTAGTCTGCGCGTGCCTCGCTCGAGAACGCGCTCGTGCGCATGC-
GAAGTGGTcggcgcgcggcgcggcgcgccTGGGCGCTAAGATGGCGGCGGCGTGAGTTGCATGTTGTGTGAGGA
TCCCGGGGCCGCCGCGTCGCTCGGGCCCCGCCATG
1241 GCAGGGGCCCGGGGGCGATGCCACCCGGTGCCGACTGAGGCCACCGCACCATGGCCCGCTCGCT-
Figure imgf000146_0001
TCGGTGGGCGGTGGGTGGTGGGCAGTGGGCGGTGGCCAGCCGGCAGGG
1242 CATGACCGCGGTGGCTTGTGGGAAAAGTGGCTCGGAACCCCAAATCCCGGTTAGATTGCAGGCAC-
CGCCGGACGCTGGCTCCCGGAGGTTTTAGTTTTCCCTCTACCAGGAGTGTGAAGACACAGAGACTTAT-
TGCGCTGGCGAAGATGGCTGAGGCGAAGGCGTGTCCGA
1243 GCAGGTGCTCAGCGGGCAGACGCCCCGCCCCGCCCCGCCAGGTTCTGTTGGGGGCGAGGC-
CCGCGCAAGCCCCGCCTCTTCCCCGGCACCAGGGGCGGGCCCAGGTGCGCCCAGGGCCGGGGAGCGGC-
CGCGCAGGTGCCTGCCCTTTGCGCCTGCGCCCAGCTCG
1244 GGTGCGCCCTGCGCTGGCTAAAGTGCGCAAGCGCGCGAGGCTCGGGCCTTTCAAAccccggcgcgc- cggcgccggcgTCGACACTGCGCAAGCCCAGTCGCGCCTCTCCAGAGCGGGAAGAGCGCTGCGTTCCT-
TAGCAACGAGCGTTTCCTCCAGCCCCGCCTCCCTCCGCCACACACAACCCCGC
1245 AATTTGGTCCTCCTGCGCCTGCCAAGATTGTCTgagtattgatcgaacccaggagttcgagat- cagcttgagcaagatagcgagaacccccgcccctccacctcgtctcaaaaaaaaaaaaaaaTCGTCT-
TTGCAggccgggcgcggt
1246 GGCTTCCGCGGCGCCAATCTCCACCCGCAGTCTCCGCCTCCCGCACCTGTGGTCCGGGCCTCACG-
GTTTCAGCGCCGCGAGGCCTCACCTGCTGGTCTTGGAGCCTCAAGGGAAAGACTGCAGAGGGATCGAGGCGGI
CCACTGCCAGCACGGCCAGCGTGGCCCAGGGCTCGCAGCACTTCCGGCCTCTCTGGCCCCGC
1247 GCCAGGAGAGGGGCCGAGCCTGCACAGGAGCTTCCTCGGTTTTCCGAGCGCCGGCCCCCCTTCTCT-
GCCTGGGAGGAGGTGGTTAGAGTCCCCTGGGTGTGTGCCCCGCAGAGGGAGCTCTGGCCTCAGTGCCCAGTG1
GCAGACCAATGAGAGCCCCAGAGAGAAAGACGGTCATTTCCTCCCTGCATCTTCCCTTGGGGC
1248 cgagcgccggccccccttctctgcctgggaggaggtggttagagtcccctgggtgtgtgccccgca- gagggagctctggcctcagtgcccagtgtgcagaccaatgagagccccagagagaaagacggt- catttcctccctgcatcttcccttggggc
1249 GGTTGCGAGGGCACCCTTTGGCCCGGGGGCGCGCAGGAGAGGGCAGGGGCCAGGGGTTTCCTGGGC-
CGGGCACCGTGGGGCGGGACGTGGCCCGGGAGGAGCTGGGGGGACTGGGTGGTGCACGTGCGGGC
1250 acccggacgcggtggcgcgcgcctgtaatcccagctactcgggagcctgaggcaggagaatcgct- tgaatccgggaggcggaggttgcagtaagccgagatcgcgccactgcaccccagcctgggcgaca- gagcaagactccTCGGTAAAGACACCACTTCGTCACCC
1251 CGCCGCCGAGCCTCAGCCACGCCTCTGTGCAGCGGGGAAGACTCCTCTCGCGCCTTCTCAGTCAGT-
Figure imgf000147_0001
CATTGGCTGAAGGTCGCCGTCGCCCAACGCAGGCCATTCTGGGT
1252 gcagcctcaacctcctggggtcaagtgatcatcctggctcaaccacccaagtagccgggactacgg- gtggccgccaccatgcccggataatttttttatttttgtggagatgggggtcccacgatgttgc- ccagtccagtcttgaactcctgggctcaagtgatcctcccgcagcagcc
1253 CTTGCCGACCCAGCCTCGATCCCCTGCGGCGTCCAGGTCCCAATGCCCCAACGCAGGCCACCCCCG-
GCTCCTCTGTGGACTCACGAAGACAAGGTCCGGCCGCTCGGGCCGCGAGAGTCGCGCCATCACCAC-
CATTTTTCTGGATGCCCA
1254 GCGGCGTTCGGTGGTGTCCCGGTGCAGCCACGCGAGAGTAGAAGGGTGGAAAGGGGAGGTGCCCAGT-
GAAATGGAGCCTGTCCCGTGCACTTTCGGGCATTTCGAGCATCTTGTGGGCTCTCCCAAGTCGCGGC-
CCCTCCTCTGAGAGCCACAGTCAGGTCTGTCCTCAGGGGTCGAGGCGG
AGGCGGGGGGCACGGCCTTTCCATTTTCCCTGCTCCCCTCTGCAGAA
1255 CCGGACTCCCCCGCGCAGACCACCGTGCCAGGACAGCCCGCTCGGGAGTCGGGCCTGGAAGCAGGCG-
GACAGCGTCACC
CCGCATGGCGG
1256 ggccccctgcaagttccgcctcccgggttcacaccattctcctgcctcagcctccccagcagctgg- gactacaggcacctgccgccacgcccggctaattttttgtatttttagtagagacagggtttcaccatgt- tagccaggatggtctcgatctcctgaccttgtgatctgcccgcctcggcctcccaaagtgttgggattacaggc gtgagccaccgtgtccagccTGTAACA
1257 GCCCAGGGGAGCCCTCCATTTGTAGAATGAATGAGAGTCCAGGTTATGAACAGTGCCTGGAGTGTAG-
GAACACCCTCCTTTGCCTCTTTGACAGGTCTGCATCATAACACtttttttttttttttgagacagagtct- cactctgtcgcccaggctggagtgcagtggcacgatctcggccccctgcaagttccg
1258 CCGGCTGCAGGCCCTCACTGGTTGGGTCCGCCCGCGAGGGTGCCCTGGGCCCGGT-
Figure imgf000147_0002
TTGCA
1259 GTCACACCTGCCGATGAAACTCCTGCGTAAGAAGATCGAGAAGCGGAACCTCAAATTGCGGCAGCG-
GAACCTAAAGTTTCAGGGTGAGATGCGTTGACTCGCGGTGGCTCAGAAGACCCACGCGCGAGCCCTG-
GCGCGTTCGGGCGGCCGGGGGCCCAGCTGCTCTGTGTGACGGAGGCAG(
GAGAGTGAAAAGGCAGCTTCCACTCGGGACCCGCGCTGCTGCCCACTC
1260 CCCTGCGCACCCCTACCAGGCAGGCTCGCTGCCTTTCCTCCCTCTTGTCTCTCCAGAGCCG-
GATCTTCAAGGGGAGCCTCCGTGCCCCCGGCTGCTCAGTCCCTCCGGTGTGCAGGACCCCG-
GAAGTCCTCCCCGCACAGCTCTCGCTTCTCTTTGCAGCCTGTTTCTGCGCCGGACCAGTCGAGGACTCTGG.
GTAGAGGCCCCGGGACGACCGAGCTGATGGCGTCTTCGACCCCATCTTCGTCCGCAACC
1261 CCTGGGGGAGCGCGGTGGGGGTAAGATAAGGGATGGGGGCTCCGAGGGCTGGGAACTGCAGGAAG-
Figure imgf000147_0003
TCGCGCGCCTGGGAGGCTTCCATCTCCCGGGACCCAGCTCTCAGCC
1262 GTGGGGCCGGGCGAGTGCGCGGCATCCCAGGCCGGCCCGAACGCTCCGCCCGCGGTGGGC-
Figure imgf000148_0001
TTCCCCACCGAGAGCGCATGGCTTGGGAAGCGAGGCGCGAACCCGGCCCCC
1263 CGTCCAGGCTGTGCGctccccgttctcccctcctccccacttctccccacgcct- tgctcgtctcccgccctcctccgacaaccgctcccctcaccctccacccctacccccgc- ccctcctccttcctccccGGCATGCGCCATATGGTCTTCCCGGTCCAGCCAAGAGCCTGGAACCACGTGi
CCCATTTGTATGCCGCGGAGCGCTCCATTCCGGCCCCTTTGTGGCCA
1264 GCGCGGCGGTGCAGCCTCTCCCGAGCGCGCTGGGTCGCCTCTGCTCGGTCTGGGGTCTGCCAG-
GCGCGATCCCCCCGGTGCAGCCGAGCCCCTCCGCAGACTCTGCGCAGGAAAGCGAAACTACCCGGCAG-
GAGAAAAGGCAGCGCTGGCGCCCGGCCCCCTTCCGCCCCCACCAATCACC(
AGACGCGGCTGTTCCGTGGGCGCCACCGCCTCCCTCTGCGGGCCGCTGCT
1265 aggcggcggcggtggcagtggcacccggcggggaagcagcagcCAAACCCGCGCATGATCTCGA-
Figure imgf000148_0002
GCAGACGAGTGGAGCCCGAGGAGGCAGGGTGGAGGGAGAGTCAAGG
1266 GCGCGACCCGCCGATTGTGTCGAGTCAGCAGCGGCAGCGGGGACGCGCGAAGCCATGGCTCCCGC-
CCGCGCTCGGGAGGGCGCCGGGGGTCCTGCGCCTCCGGGAGGTTTGTGGCCGAgcgcggcgcggccccgagcg- gccccgcagcgcccggctccccgccgcTCGCTCTCCAGGCGCCGACCCGCCTGCGTCGCCACCCTCTCGCCGCT
CCCTGCCGCCACCTTCCTCCCGCCCGGGTGCCGGGCGTCCGCT
1267 CGCGGACGCCGCTCTGCACCTGTTGCCGCCGTCACTCATCCCGCCAGGCGGGCGGGGCCGCGCGGGT-
Figure imgf000148_0003
AGGGGGCGACTTAGCGGTTTCAGCCTCCAACAGCCTTGGGATCGC
1268 tgaacccgggaggcggaggttgctgtgagccgagatggcaccattgcactccagcctgggcaacaa- gagcgaaactccgtcccccgaacaaaaaattcaaatgggaaagagaggcagatggcagagaacaggggaggg- gctgggcaccgtggctcatgcctgtaatcccagcactttgggaggccaaggcgggtgga
1269 CTCGGCGGCGCGGGGAGTCGGAGGACGCAGCCAAGCGGCGGCGGCGAGGAGGGTCACAGCCGGAAA-
GCGGGGGCTTTATCGGCGGCGCCGCGCGGGCccccgccccttcctgccgcccccgcccccggcccgccttgccc cgccttcccgccg
1270 aggcggccacgggagggggaggggctggcaacggcgccgtgggggcggggctcgctttgtgcaag- gtccgcgctgattgggccgtgggcgcgcgggtcccggcctgcgtcgtgggactggcgtttttggcgccggct- gtgaggggagcgcgggggtggtggaatcgggcggtctccggttcgccaatgtggctgggtccgtaggcttgggc agccttggagttcctcagagaccccgcgctcggtcccggcacgc
1271 GACCCGAGCGGGGCGGAGAGTGGCAGGAGGAGGCGAATCTCCGCGCTCCGGCGAACTTTATCGGGT-
Figure imgf000148_0004
GGCGGCCGAGGGTGGGGTGCCGGCCACCACCACCCTTGGCGTGGG
1272 AGCACCTggggcggggcggagcggggcgcgcgggcccACACCTGTGGAGAGGGCCGCGCCCCAACT-
:ACTCCCCC/
CAGCGGCCAGAGGTGAGCAGTCCCGGGAAGGGGCCGAGAGGCGGGGCCGCCAGGTCGGGCAGGTGT- GCGCTCCGCCCCGC
1273 CCGCTCGGGGGACGTGGGAGGGGAGGCGGGAAACAGCTTAGTGGGTGTGGGGTCGCGC-
Figure imgf000149_0001
TCGACCGGGGCGACTTCTATACGGCGCACGGCGAGGACGCGCTGCTGGCCGC
1274 ACCGCCAGCGTGCCAGCCCCGCCCCTACCCACCAGTGTGCCAGCCCCGCCCTTCCCCACGTCGC- cgcgcgcccgggggcggggcctggcgcgcaccgcccgcgcACGGCGAGGCGCCTGTTGATTGGCCACTGGGGC-
TTAGGAGCTCCGTCCGACAGAACGGTTGGGCCTTGCCGGCTGTC
1275 ATTCTTggccgggtgcggtggctcacgcctgtaatcccagcactttgggaggctgaggtgggtggat- cacctgaggtcaagagttcgagaccagcctggccaacatggtgaaaccccgtctc- tactaaaaatacaaaaattagccgggcgtggtggtgggcacctgtaatcccagctactcagaaggttgaggcag gagaatcgcttgaacccgggagaaggaggttgcagtgagccgagatcgcgccattgcac
1276 CGCTTCCCGCGAGCGAGCCGCCCAGAGCGCTCTGCTGGCGGCAGAGGCGGCGGCGAGGCTG- GCGCGCTTGCCGCCGTCTGCTCGCCCCGCGGAGGCGACCTGGGCAGACGCTGCTGG-
CCCAAATCCGAGTCTGCGGAGCCTGGGAGGGCTCCCAGCTTCCTATCCAAACCGCGCCGGGGCA
1277 AGCCGGCGCTCCGCACCTGCCCCTCAGCGCCTGCCGTCCGCCCCACCGCCGCGGCGC-
Figure imgf000149_0002
CTCCGGCCTCCAGCCCCCGCCCCGTCTCATCGCGCCGGGCGCCCGGTGCGCCTG
1278 CGGAGCGCGCTTGGCCTCACAGGACAGTGGGTGTGGCTGGGGTGACGGGGCAGGGTGGGGAAGACTG-
Figure imgf000149_0003
CGGTGGCAGGTGAATTGCTGCGGCGCCGGGTAGGGGCGGGC
1279 ggcctcgagcccacccagacttggccaagcagccctcggccagaccaagcacactccctcggag- gcctggcagggcccctgctttaccctgccccccacgccccgccccgacccgaccctcccaggcagcccct- cagcgtctgccgcccgcccttgggcctttccggccagcccctccctccgcccacgcccagaacagcccatgctc ttggaggagagcaggtgggcttgaccgggactggcccctcaccgcgg
1280 GCCGCGCCGTAAGGGCCACCCCCAGAGGCCGAGGAGGTGGGGCTGGCCTGGCTTTCTGGCCAGGT-
GGGGCTTGTCCAACCCCACAAACATCAGGGCTCACCCTGGATGTGGAAGAGAAGGAGCGAC-
CGCCTGGCCCGGGCGATGGGCCTCCCGTCCCCCCAGGGCTGCCTCCCCGCCGGTG
1281 CGGCGGTGGCGGTGGGTCGGCGACCGGCGGGCCGAAGACTGGAAGCCCGGGCCGCTGAG-
GCTCCGCAgccccctccgcgccgccccggcccgcccccgccgcgccgccccttccctccccgcgcccgc-
GGCTTGGCGACAAGGACGCACGACACGGGGCGGC
1282 ACCTGCCCAGTTACTGCCCCACTCCGCGGAATAAGCTCTTACCCAC-
CGCTCCTCTTCTTCAATTCATTTCTGTTATGGAACTGTCGCGGCACTACAAAGTCTCTATGTAGT-
TATAAATAAACGTTATCTGGAAGAGCAGCCGACAACAACTTTCAAGATCTCCAATTCCCCGAC-
CCCACACTCCAACTGACGCC
1283 CCAGCGCCCGAGCCGTCCAGGCGGCCAGCAGGAGCAGTGCCAAACCGGGCAGCATCGCGACCCT-
Figure imgf000150_0001
CCGAGCGCGGCGGCGGGGCTCAGAGCCAGGCGAGTCAGCTGATCC
1284 CTGCTGCTGCCCGCGTCCGAGGCTCGCGGGCGGCGGGCCCGGGTGAGTGCACACCCGGCGCGCTGC-
CGGGCTCCCGGATGTGTCACCTTGTCCCGCTGCAGCCGAGATGCCGGGGGAGCGGGGCCTTCCACAC-
CCCCTCCGTGGGTGTGTGGTGAGTGTC
CTGCATGGGTTGGGTCCCGCGCGATG
1285 ACTGCTTAGGCCACACGATCCCCCAAGCCTGGGCTGCCAGACGTCGCCATCATTGTTCCATGCAGAT-
CATGCCCATCCTGTGCAGAAGGTCACTATAGGAACACATGGCACI
CATGGTGCTTGTCTAAACCGAAGGCAGGTGAGATCCACCCACTG
1286 GCCGGACGCGCCTCCCAAGGGCGCGGGTCCGAGGCGCAAGGCGAGCTGGAGACCCCGAAAACCAGG-
Figure imgf000150_0002
CGCGCCCCTCCGGCCCGGACAGGCCGAGTGGACATTGTCGGAG
1287 CGGCCAGGGTGCCGAGGGCCAGCATGGACACCAGGACCAGGGCGCAGATCACCTTGTTCTCCATGGT-
GGCCATTGCCTCCTCTCTGCTCCAAAGGCGACCCCGAGTCAGGGATGAGAGGCCGCCCGAGCCCCG-
GATTTTATAGGGCAGGCTC
1288 ccgcccgcccCACAGCCAGCGGCTCCGCGCCCCCTGCAGCCACGATGCCCGCGGCCCGGCCGCCCGC-
Figure imgf000150_0003
ACCAGGGCGACTCAGGCACCACCCGCCGGGA
1289 GCGCCCCAGCCCACCCACTCGCGTGCCCACGGCGGCATTATTCCCTATAAGGATCTGAACG-
Figure imgf000150_0004
CACTCGGCCGCCCAGGCAGCGGCGCAGAGCGGGCAGCAGGCAGGCGG
1290 cgtgctgggcgcaggggaaacagcgacgcacgggacaaaACAAGCTTGCAGAACAGCAGGGGGCAGA-
GATTCCGCGCTCGGGGTGCGAGAGGCCGCTCCcggggaggggcgggacccgggcggggcgggaggggcggggcg
CCCGGGCCTATTAGGTCCCGCGCCGGCAGCC
1291 GCGCACGCGCACAGCCTCCGGCCGGCTATTTCCGCGAGCGCGTTCCATCCTCTAC-
GATCCCGCGGCGTCCGGCCCGGGTGGTCTGGATCGCGGAGGGAATGCCCCGGA
1292 GGTGAGTGCGGCCCGGGGAGGGGAGGGGACCAGGGCGACCGGAGCCCCCAGCGATCCCGCCTG-
GAGCGGCCGCCAAGCTCCCTCGGGCACCCGGGTTCAGCGGGTCCCGATCCGAGGGCGTGCGAGCT-
GAGCCTCCTGGACCGGGTCCGCCGC
GTCGACGAGCAGGGGCCGCCGCCA
1293 GGCCGAGAGGGAGCCCCACACCTCGGTCTCCCCAGACCGGCCCTGGCCGGGG-
Figure imgf000151_0001
GGCCCTACTGTgcgcgggcggcggccgagcccgggccgcTCCCTCCCAGTCGCGcgcc
1294 CCAGCGCCGCAACGCCCAGGGTGTGGGGCGGAGTAAGATGTGAAACCTCTTCAGCTCACGGCACCGG-
Figure imgf000151_0002
CAATCGCTCGCGCTCTGGTTGCGCTGGCGCC
1295 CCACAAGCGGGCGGGACGGCTGGAGACTGCCGGGACAGCGGCTGCCGGTGCTACGCGGGTGGTGG-
GCGGCCCGGAAATGAGCGCCCTCCGGGGACAGGGGGCTCTGCGGGGCGGCGACAGCTG-
GATTCCCAGCGCGCACAAAGCCTGCGGGAGGATCCATTGTAGCGGTCGCTCCTCCCCGCTTAGCGAGGGCGI
GCAGGGGCGGGGGATGTCGAAGGGTCAGGTTTGTCCAGGCCGCGCCACCTTCG
1296 CCTCTGGACAACGGGGAGCGGGAAAAAAGCTACGCAGGAGCTTGGATCGGGCGAAGCTCGCGG-
Figure imgf000151_0003
CCTCCGCCCACCGGCGGCGTCTCCCGCGAAGCCCGCTGGG
1297 GCGGGTTCCCGGCGTCTCCAAAGCTACCGCTGCCGGAAGAGCGCGGCGCCCGACGGAGCCGTGTG-
GAGGCCAAAACTCCTCCCGGAAGCCGCTACTGGCCCCGCTTGC
CCGGGGGAAGCCGGGAAGCCGTTAGGGGGCGGGGCAAGCGGG
1298 CGCCGCCCGTCCTGCTTGCTGCTGGGTCCGGTTGCCGAGGCGGAAAAGTCGCAAGCTCCTTCAGT-
Figure imgf000151_0004
TACTGGAGGAGAGGCCAGCATGCTTGTCAGGCACCAGCAGGTGGA
1299 CGCGCGGCCCTCCTGCACCTCGGCCAGCACTCGTAGCGCGCTGGGCGAGCCGGACCGGAAGT-
Figure imgf000151_0005
GGCGGTGGGGCGCGGGCCCCTGGGCCCGGACCAGGGAGCAGGCAGCCG
1300 GGCGGGGCAAGCCCTCACCTGCGCCAATCAGGGTGCGGAGTAGGCCCCGCAGGCGCCTCACCCAT-
Figure imgf000151_0006
CTTGCACTGGGCAGAGGGGCAGCTTCCCTGAGAGCAGCTAAGC
1301 GGAGCGCCCCCTGGCGGTTTCAGGGCGGCTCACCGAGAGGGCGCCGGGAGCGCCCGGTTGGG-
GAACGCGCGGCTGGCGGCGTGGGGACCACCCGGCAGGACCAGGCACCAGAGCTGCGTCCCTGCTCGC
1302 CGAATGGTTCGCGCCGGCCTATATTTACCCGAGATCTTCCTCCCGGACGGCAAGGATGTGAGGCAG-
Figure imgf000151_0007
CTGTTCTCGGAGCCCAGCGCCGTCTCGGCCAGGCCAGCCCGG
1303 TTCCGCCGGCTGGGCCCTCCGTCTACCCCCAGCGGCGAggggcggggccggcgcgggcgcAGAG- GCGTCACGCACTCCATGGTAACGACGCTCGGCCCGAAGATGGCGGCCGAATGGGGCGGAGGAGTGGGT-
GTTGGGCGGGCTCCGGGCCAGCGCCACATCTACTCCCGTCTCCTTGGGC
1304 CTCCGGGTcccccgcgtgcccggcccgccccggcccgcTTCCCGGGCGCTGTCTTACTCCGGGC-
CCGGGGCGCCTGCTCCGCGCCGCGTCTGCGAACCGGTGACCTGGTTTCCCCTCCAGCCCTCACGGCT-
GTCCGACTTGCGCGGCGGTGGCGGCGGCGGCCAAGAGCAGGCAAACCCGG(
ATGGCCTCCTGGCGCACACCCCGCCGCCGCCGCCAGCCATCGCCACCGCC
1305 CAGCCCGGGTAGGGTTCACCGAAAGTTCACTCGCATATATTAGGCAATTCAATCTTTCATTCTGTGT-
GACAGAAGTAGTAGGAAGTGAGCTGTTCAGAGGCAGGAGGGTCTATTCTTTGCCAAAGGGGGGAC-
CAGAATTCCCCCATGCGAGCTGTTTGAGGACTGGGATGCCGAGAACGCGI
GCACCGTCGGGGTAGGATCCGGAACGCATTCGGAAGGCTTTTTGCAAGC
1306 GGCGGAGAGAGGTCCTGCCCAGCTGTTGGCGAGGAGTTTCCTGTTTCCCCCGCAGCGCTGAGT-
Figure imgf000152_0001
TGGCGAGCGGGCGCCACATCTGGCCCGCACATCTGCGCTGCCGGCC
1307 CCTCACCCCAGCCGCGACCCTTCAAGGCCAAGAGGCGGCAGAGCCCGAGGCCTGCAC-
Figure imgf000152_0002
CGCGGGTAGCTACGATGAGGCGGCGACAGACCAGGCACAGGGCCCCATCGCCCT
1308 CGATGACGGGATCCGAGAGAAAGGCAAGGCGGAAGGGGTGAGGCCGGAAGCCGAAGTGCCGCAGG-
Figure imgf000152_0003
GGAAGTCCTCCTTGTATTCCTCCCTCGCCTACTCTGAGGCCTTCCA
1309 CCGCAGGCCGCGGGAAAGGCGCGCCGAGTCCTGCAGCTGCTCTCCCGGTTCGGGAAACGCGCGGG-
Figure imgf000152_0004
CCCGGGGAGCGTCCGTGGGGTGCCCAGGCA
1310 GCCCCAGTCCACCTCTGGGAGCGCCTGCGCCGCTCCGCGGAGAGTCCGTGGATCTCACAGTGAGC-
Figure imgf000152_0005
1311 CTTGGCCGCCCCCGGGATGGGGCGAGGGGTTCCCGAGGGCTtgggagggcggcttgggaga- gagctccggctccggaacgaggtgtcctgggaacactcccgggtctgtaacttcggacaaat- cacgctcgctttcccggcctcagtgtgccgttctgtaacttgggtctaaCCCCGGCTCGCACACACGGCGGGGA CGCGCACAG
1312 CCTCCATGCGCAATCCCAAGGGCGGAGAGGAATTTCAGCAGCTACGAGCAACAGAAAGGAAACGAGA-
GCACGTCCGCccccgccccgccccgctccgcccGGCGCACCTGATGCCCAAACTGGTTGCACGGGAAGCCGAGC ACCACCAGGCCCCGGGGTCCGAGGCGCCGCTGCA
1313 gcggcgactgcgctgccccttggctgccccttccgctctcgtaggcgcgcggggccactact- cacgcgcgcactgcaggcctttgcgcacgacgccccagatgaagtcgccacagaggtcgcaccacgtgtgcgt- ggcgggccccgcgggctggaagcggtggccacggccagggaccagctgccgtgtggggttgcacgcggtgcccc gcgcgatgcgcagcgcgttggcacgctccagccgggtgcggccctt
1314 GGGCTTGCCTCCCCGCCCCTACCTTCCAGGATGTTGACAGCTGGGAATGAAAGGCAGAGGGAGG-
Figure imgf000153_0001
CCAAGATGGCCGCTTGTCTGGGGACAGGAGCGGAGGCCAATACGCG
1315 GCGGCCCAAGGAGGGCGAACGCCTAAGACTGCAAAGGCTCGGGGGAGAACGGCTCTCGGAGAACGG-
Figure imgf000153_0002
CGCGAGCGCCTCAGAACGGCCCGCCCACCC
1316 ctgcgcggcTGGCGATCCAGGAGCGAGCACAGCGCCCGGGCGAGCGCCGGGGGGAGCGAGCAGGG-
Figure imgf000153_0003
AAAGGAGGACCAGAAGGGAGGATGGGATGGAAGAGAAGAAAAAGCA
1317 CACCGCCTCCGGACCCCTCCCTCATCAGAAAGCCCAGGCTCCGCTCGTAGAAGTGCGCAGGCGTCAC- CGCGCATCCAGGAGCCACGTGTCAGGAGTCACGTGT CGTTGGAGCGCCTGCGCAGCTTTTCCGCACGCGCC
1318 CCTTCCAGCCACCCCGCCCTGGGCGCCTCTGGCGCGCTCTGATGACGCTCCAAGGGAAGAGGAAGT-
Figure imgf000153_0004
CCGAGGGCGGCCTGGA
1319 CGGGACACCGGGAGGACAGCGCGGGCGAGGCGCTGCAAGCCCGCGCGCAGCTCCGGGGGGCTCCGAC-
Figure imgf000153_0005
GCGTTGAGAATGCCCCTCACGCGCTTGCTGGAAGGGAATTC
1320 CCTGGGTTCCCGGCTTCTCAGCCACTGGAGCTGCCAGTCTCAAATTACCGGAGGGGAGGGAGGGCAG-
GGGCTCCTTGTGCCCAGATCCTTTGTATTCATAGGGGGAAGTGGAAGACCACGCTGCC
1321 GGCGGTGATGGGCggaggaggaggaagaggaggaggaggaagaggaggagggggaAAACGATGACAG-
Figure imgf000153_0006
ccctcccccccctcccccccccccACACACACACACTCCCCTG
1322 CAGCCCGCCCGGAGCCCATGCCCGGCGGCTGGCCAGTGCTGCGGCAGAAGGGGGGGCCCGGCTCTGC- ATGGCCCCGGCTGCTGACATGACTTCTTTGCCACTCGGTGTCAAAGTGGAGGACTCCGCCTTCGGCAAgccg- gcggggggaggcgcgggccaggcccccagcgccgccgcggccACGGCAGCCGCCATGGGCGCGGACGAGGAGGG GGCCAAGCCCAAAGTGTCCCCTTCGCTCCTGCCCTTCAGCGT
1323 GCCCGCGGGGGAATCGCAGTGAGCAGCGCGGGGCGAGGCCGCCGCGGACGCCCCGTCGGATGTGC-
Figure imgf000153_0007
GGGTAGGTGGTGCCGTCGCTGCCGCACACCGGG
1324 GCCGCGAGCCCGTCTGCTCCCGCCCTGCCCGTGCACTCTCCGCAGCCGCCCTCCGCCAAGC-
GCGCTGCTGCTCTGCCTGGGTAAGTTCTCCCCCTCTGGCTTCCGGCCGCCCCAA
1325 GCGGCCCCCTCCCGGCTGAGCCTATAAAGCGGCAGGTGCGCGCCGCCCTACAGACGTTCGCACACCT-
GGGTGCCAGCGCCCCAGAGGTCCCGGGACAGCCCGAggcgccgcgcccgccgccccgAGCTCCCCAAGCCTTC-
AGACCGAGTTGCCCCAGAGACCGAGACGCCGCCGCTGCG
1326 CAGCAGGGCGCGGCTTCCCTTTCCCGGGGCCTGGGGCCGCAATCAGGTGGAGTCGAGAGGCCGGAG-
Figure imgf000154_0001
CCTCCCGGCGCGTCCCCGCGAGCCTCGCCCACAGCCGCCTGCTG
1327 CCGCAGCACGCTCGGACGGGCCAGGGGCGGCGACCCCTCGCGGACGCCCGGCTGCGCGCCGGGC-
GCCCAAGTCGGAGggcggcagcggcggcggagcggcgggtggcggggctggcggggccggggccggggccggct gcggctccggcggcTCGTCCGTGGGGGTCCGGGTGTTCGCGGTCG
1328 GCGGAGTGCGGGTCGGGAAGCGGAGAGAGAAGCAGCTGTGTAATCCGCTGGATGCGGACCAGG-
TGGTGCAGCCCGCCAGGGTGTCACTGGAGACAGAATGGAGGTGCTGCCGGACTCGGAAATGGGG
1329 GCGCGGGGGCAGGTGAGCATGCGAAGGTTGGAGGCCGCGCCCCTTGCTGAGGCGCAGCTGGCT-
Figure imgf000154_0002
GCTTCTCCCCAGCACCCCGGTGTGGGCTTCCCAAGGTCCTGCCTGA
1330 ggcgcgggggcaggtgagcatgcgaaggttggaggccgcgccccttgctgaggcgcagctggct- gctcttttcgggccggcatacgcgcgcagccgcagctgaggtcaccccgctgaggtggtggggaggggaatg- gttattcttgaggcaccgcatctcttgaggaggaaagagccg
1331 AGTGACGGGCGGTGGGCCTGGGGCGGCCAGCGGTGACTCCAGATGAGCCGGCCGTCCGCGTTCGCGC- CGCGGCGGTGCGGTT
CAGGGACGGCGGCG
1332 GGCGACCCTTTGGCCGCTGGCCTGATCCGGAGACCCAGGGCTGCCTCCAGGTCCGGACGCGGG-
Figure imgf000154_0003
CGCGACTTGGGCTCCTTGACACAGGCCCGTCATTTCTCTTTGCAGGTTC
1333 CGCGGCAGCCCGGGTGAATGGAGCGAGGCGGCAGGTCATCCCCGTGCAGCGCCCGG-
Figure imgf000154_0004
TGAAACAGAGGCGGGGTCGGCGGGCGATTAGCGGCCGAGGCACGCTCCTCTTG
1334 GGCGAGCGAGCGGGACCGAGCGGGGAGCGGGTGGAGGCGGCGCCACG-
GCGCGCACACACTCGCACACACGCGCTCCCACTCCAcccccggccgctccccgcccgaggggccgcgcggcg- gccgcggggAACGATGCAACCTGTTGGTGACGCTTGGCAACTGCAggggcgcccgcggtccctgcccccacgcc ctccgcgcgggccccgccaccccggccccgacggcgcctgcacgcccgcgtcccctg
1335 GGGGCAGTGCCGGTGTGCTGCCCTCTGCCTTGAGACCTCAAGCCGCGCAGGCGCCCAGGGCAGGCAG-
GGGGCCAGCCAGGGTAGCCGGGAAGCAGTGGTGGCCCGCCCTCCAGGGAGCAGTTGGGCCCCGCCCGG
1336 CGCTGGCATTCGGGCCCCCTCCAGACTTTAGCCCGGTgccggcgccccctgggcccggcccgg- gcctcctggcgcagcccctcgggggcccgggcACACCGTCCTCGCCCGGAGCGCAGAGGCCGACGCCCTAC-
GAGTGGATGCGGCGCAGCGTGGCGGCCGGAGGCGGCGGTGGCAGCGGTAAGGACCCTTCCCTCGCCCTGCGC(
CTGGACCTGCAGGTGCTCGGGCGCGGCCCAGGCCGCCCCCTGTCTGA
1337 GAGCCGTGATGGAGCCGGGAGGAGAGGCGCATCCTCAGCAGAGCTTCCCTCCCTTGCACACGAGCT-
Figure imgf000155_0001
1338 GCCAGGGTGTCTTGGCTCTGGCCTGAGTCGGGTATGTGAAAGCCTTTTGGGGCAGGAAGGG-
GCAAAGTGATACCTGGCCGTCCCACCCTCTGGTCCCAGAAGGAGCTCTCGCTGGAGCCAG-
GCAGCCTCCAGTCCCCCTCCTTTCAGCCTTGTCATTCTCTGCATCCTGCCCAGGCCACAAAGGA
1339 CGGCTCCGGCGGGGAAGGAGGCgggctgcggctgcggctggggctgaagctggggctggggttgggg-
GACTGCCCGGGGCTTAGATGGCTCCGAGCCCGTTTGAGCGTGGTCTCGGACTGCTAACTGGACCAACG-
GCAACTGTCTGATGAGTGCCAGCCCCAAACCGCGCGCTGC
1340 GCCAGGGTGCCGTCGCGCTTGGCGCCGTCCAGGGCGGCGCTGCGCTCGTCCAGCAACACCACGGCGT-
Figure imgf000155_0002
AAGGAGCGGCAGTCCAGCAGCAGGCATTGCGCCGCTCGCTCC
1341 CGGCTCGGTCCTGAGGAGAAGGACTCAGCCGCGGCTGCGGGACCCGGGCACCGGGAggcggtggcg- gcggcggcggcggcagcagcggcgacagcagaggaggaagaggaggaagaaggaaagaaaaagaagaaCCAG-
GAGGAGTCCTCAACAACGACAGCGGGGACTGCGGGACCAGGGTAAAGCGGCGACGGCGGCGACGGCCCAGCAA
CGTGA
1342 CGCGGGGAACCTGCGGCTGCCCGGGCAAGGCCACGAGGCTTCTTATACCCGGTCCTCGC-
Figure imgf000155_0003
TTGGCAGGCGCTGGGGCCCTCTGAGAGCAGGCAGGCCCGGCCTTTGTCTCCG
1343 CCGCCGCTGCTTTGGGTGGGGGGCTGACAGGGCTGCGCGCGTCGCGCTCTTGGCTGGGGCTGCGCGG-
Figure imgf000155_0004
TCAACCGGCGCGGGGAGTCCCGTGAAGACATGAGGGCGCCAGGAG
1344 ACCTGAGCCCGCGGGGGAAccccccccccacccccggggaaccccccccacccccgccgc- cccccgccTGCAAGTTGTTACCAGTAAATAAAAGGGATCCTATTTTAGCAAGCCACACAGCATTAGAGG-
GCAAATAATAGTTTGGTGGCAGGAGAGCGATGAGACGGGAAAGTGTGGGGCAAAGCTTACAGTCATTGGT
ATTCTAACTGGCCTGTTAGCCAAAAAGTAAGGTTTTCTTTACCTCCGTGTTG
1345 AACGCCGGCCTCACCGGCAGACGCGCGCCCTCCTCCCAGATGCGCAGGTGACCCCGGCGGGCG-
GCGCGGGAAAGGGAAGAGCTCCGCGAGGCCGCGCGGGGGGGAAGCGGGAGAAGC-
CGCTCTTCCTATTCCACTCGCAGTCTGCGTGTGGGGGAAACGAGTGCCCGGCGTATGAAACGCCTAACTT
AAATAAAGAGAGACGTATAAAAGTTCAAGAATTCTGTCCAGACTCAAGGGCCCTTTCTCATTTA
1346 CCGTGGTCCCAGCGCTCCTGCTATTTGCATTCCAAAGCAGACACCTCATGCGCTCAACCCCGC-
GAAGGGGGCCGGGGGCGGGACCAGGGCGCGCGTTCCGGTCCCGGGGCGTGGC
1347 TGCGACCCGGCGCCCAAGCAGCCTGGGACCTTGCGCGGACCTGACCCCTTCAGACCGCAGGCAGTCT-
Figure imgf000156_0001
TGGGGGGTGGGGTTGTTGGAGCCCCGCGGAGGTCTGGGAGGCCC
1348 GGCTCTGCGCTGCCTTTGGTGGCTCCTCCCTGGTCCTCTAAATGTGACACCAGGCGGATGCGGGGC-
CACAGGACCCTGGGGCTTGAGTCACACA/
GACCATGGTTCTTTggccgggcgcgggg
1349 gcgcgggcggcTCCTTTGTGTCCAGCCGCCGCCACCGGAGCTCCCGGGGCCTCCGCGGG-
Figure imgf000156_0002
TGCCAGCCGAGGAGGCGCGGCGGAGAGGGGACTGCGGTCAGCTGCGTCCA
1350 GGCCCGTTGGCGAGGTTAGAGCGCCAGGTTGTAAGAATCGGGTCTGTGGACCTCATACCAGATAG-
Figure imgf000156_0003
CTCCTCGCGGCCGCGAACCCAGCGCCGTCCTCGCAGCGCGGCAA
1351 acccggcatccgggcaggctgcgcgcgggtgcggggcgagggcgccgcggggACTGGGACGCACGGC- CCGCGCGCGGGACACGGCCATGGAGGACGCGGGAGCAGCTGGCCCGGGGCCGGAGCCTGAGCCCGAGC- CCGAGCCGGAGCCCGAGCCCGCGCCGGAGCCGGAACCGGAGCCCAA( TCCCGACTCTGGACCGACGTGATGGGTATCCTGGTAAGTTACCTGG
1352 CCCGGACTGTAATCACGTCCACTGGGAACTGGCGCAGTAGTGGAGGGGACGCGATCAGGCCCGTG- GCTGCGCCCAGAGCATGATI
GCGGGGACAGGGAGACGCC
1353 CGCCGCCAACGCGCAGGTCTACGGTCAGACCGGCCTCCCCTACGGCCCCGGGTCTGAGGCTGCG-
Figure imgf000156_0004
CCAGCGGCTACACGGTGCGCGAGGCCGGC
1354 GCTGCCAGCTGCCGCTCCGGCTCCCACTTCCCACCTGCTGCCCGAGGAAGACTTCCGG-
Figure imgf000156_0005
CGACCCGGGCCCCTCGGAGCTCCCCTTCAGGATCGTGCACCAAGCGCGCAC
1355 GCGCCCACCTGCGCCTCGCGGGGTCCCCGAGGTCCCGCCACCGAGCGCCCAAGGCGG-
Figure imgf000156_0006
GGAGTCGACGCCCAGCACGGGGGTGACGGCGCTGCCGTAGGTGCAGGGCGGC
1356 CGGGCCAGGGCGGCATGAAGAAGTCCCGCCGCTACGTGCCCGGCACAGTGGCCCTGCGCGACGTTCG-
GCGCTACCAGAAC
GCACGAGAGCGA
1357 GCTGCGACCTGGGGTCCGACGGACGCCTCCTCCGCGGGTATGAACAGTATGCCTACGATGGCAAG-
GATTACCTCGCCCTGAACGAGGACCTGCGCTCCTGGACCGCAGCGGACACTGCGGCTCAG-
JGCTGAACA/
GAGTGGCTCCACAGATACCTGGAGAACGGGAAGGAGATGCTGCAGCGCGCGGG
1358 GTTAGGAGGGCGGGGCGCGTGCGCGCGCACCTCGCTCACGCGCCGGCGCGCTCCTTTTGCAG- GCTCGTGGCGGTCGGTCAGCGGGGCGTTCTCCCACCTGTAGCGACTCAGGTTACTGAAAAGGCGG- GAAAACGCTGCGATGGCGGCAGCTGGGG
1359 AGCGCACCAACGCAGGCGAGGGACTGGGGGAGGAGGGAAGTGCCCTCCTGCAGCACGCGAG-
Figure imgf000157_0001
1360 CGGCTGGCCCCGCCCACTCTCCGCGGCCGGAAGTGGCGGCGCCGAGTGAGGTAAATGCGTGCCCG-
CCTCGACGGGGGAGTTGCCGAGAAAAGGCCTCGCCGGCA
1361 GGGGTTGCCGTCGCAGCCAGCTGAGTGTTGCGCCAGGGGGACAGGTATGTTCCAGGCAGTGGCAAGC-
Figure imgf000157_0002
TCCCCACCCGCC
1362 CCGCACTCCCGCCCGGTTCCCCGGCCGTCCGCCTATCCTTGGCCCCCTCCGCTTTCTCCGCGCCGGC-
CCGCCTCGCTTATGCCTCGGCGCTGAGCCGCTCTCCCGATTGCCCGCCGACATGAGCTGCAACGGAG-
GCTCCCACCCGCGGAT
ACCAGCGGCGGCGGG
1363 ggaccccctgggcagcaccctggccacccttccatccacaacatccagaccacacggccaagg- gcacctgaccctgtcaaaaccccaaatccagctgggcgcggtggctcatgcctgtaatcccagcatttgggag- gccgaggcagccgg
1364 gaggcagccggatcacgaagtcaggagttcgagaccagcctgaccaacatggtgaaaccccgtctc- tactaaaatacaaaaattagccgggcgtggtggtgcacacc
1365 GCGCGTGCGGGCGTTGTCCCGGCAACCAGGGGGCGGGGCTGGGCGTGGCACCGCCCCGCGCTCCGCT-
Figure imgf000157_0003
CCGCCTCGCCTGGGAGAAGCCGCCGGGACGCGCC
1366 CAGGATGCGGCAGCGCCCACCCGCGCGGCGTGGAGGGGGCCGGGGGCGGCGCTCGGCGCAGATG-
Figure imgf000157_0004
GCCCCGACAAGTCCCAGCCCTCGGAGGCAGGGCGGGGCGCAGGGA
1367 GATGCGGCCCGCGGAGGAGAGAGCAGGAGGACGGACGGGAGGGACCTCCGCGGGGAGG-
GCGCGCgggggaggcggggagggaggcgggagggggaggggACGGTGTGGATGGCCCCGAG-
GTCCAAAAAGAAAGCGCCCAACGGCTGGACGCACACCCCGCCAGGCCTCCTGGAAACGGTGCCGGTGCT(
GCCCGCGAGGTGTCTGGGAGTTGGGCGAGAGCTGCAGACTTGGAGGCTCTTATACCTCCGTG
1368 GTTCTGCGCGCGCCCGACTCCGCTGCCCGCCCCGCCAGGCCTCCGGGAGGTGGGGGCTGGGAG-
GCGTCCCCCGCTCCCGCCCCCTCCCCACCGTTCAATGAAAGATGAACTGGCGAGAGGTGAGAAGGGAAG,
GCTCCCGGCTCTCTCGGGGCGGGAATCAGTGGGCCAGAGCTCGCCGGGTGGCCGCAAG
1369 CCCGCCGTGGGCGTAGTAAccgccaccgccgccgccccccgcgccaccaccaccgccgccT-
Figure imgf000158_0001
GAATACGATTAGCAATCCCCCCGCACCGCGGCGGGCGCCCGCAGCCAATC
1370 ACCCGCCCGGGCAGCTCCAGTCCCGGACTCCGCAGCTCGGAGCGCAGCCAGCCACGGCCATTGCGG-
Figure imgf000158_0002
1371 CAGCGGTCGCGCCTCGTCGGGCGACGGCTGGCAGCGAAGGCCGGAGCCACAGCGCTCGGTGTAGAT- GCCGCACGGCTGGCCCTCGCTCAGTGCGCACGTCAGGCAGCAGCCGCAGCCCGGCTCGCGCAC- CAGCTCCGCGCACACGGCC
CGGGACCCAAGCCCGCCG
1372 GCAATCGCGCTGTCTCTGAAAGGGGTGGAGAAGGGGCTGGATGAGTCCGGAAGTGGAGATTGGCT- GCTTAGTGACGCGCGGCGTCCCGGAAGTTGACAGATACAGGGCGAGAGGCAGTGGAGGCGGGACTTG- GATAGGGGCGGAACCTGAGACTACCTTTCTGC TGTGAAGCTCCGGTGCTGGTGCGGCGGGGGA
1373 GAGCGCCCGCCGTTGATGCCCCAGCTGCTCTGGCCGCGATGGGCACTGCAGGGGCTTTCCTGT-
Figure imgf000158_0003
AGCAGGTCCCCGCACCGCGC
1374 CATGGCCCGCTGCGCCCTCTCCGCCGGTTGGGGAGAGAAGCTCCTGGAGCGGCCAGATACCTGTTG-
Figure imgf000158_0004
GTGCGCGGCTTTCTGCTCCAGGCGGCCCGGGTGCCCGCTTTATGCG
1375 GGGGGCGGGGTGCAGGGGTGGAGGGGCGGGGAGGCGGGCTCCGGCTGCGCCACGCTATC-
Figure imgf000158_0005
GCAGAGCACTC
1376 CGACCCTGCGCCCGGCAGTCCCCGGGGGCCGTGCGCCCGGCCCAGGCTCGGAGGTCCAGCCCAGCG-
GCGGCTCAGGCTGCGCGCCTGGCTCCCAGCCTCAGTTTCCC
GCTTCTGCGGACAGAGCCTTGGGCTCCGACGTCTGCGCGG
1377 GGCTTCAAGTCCACGGCCCTGTGATGGGATGTGGGCAGGGCCTGAGACAGGCCGAAC-
Figure imgf000158_0006
CGCGCCTCGGCCA
1378 CCCCACCTGCCCGCGCTGCTTCTACCTGAAACTGGCCAAGGGCCCGAGCCCGGACCGGAGCCGT-
Figure imgf000158_0007
GAAAGAGCGTGGTGGGGGACCCGCGGCCGATGGAATCCCTGGGGCA
1379 gcgcgcggagacgcagcagcggcagcggcagcATGTCGGCCGGCGGAGCGTCAGTCCCGCCGC- CCCCGAACCCCGCCGTGTCCTTCCCGCCGCCCCGGGGTCACCCTGCCCGCCGGCCCCGACATCCTGCG-
GGACAGGCGGCGGCATCCTTGTCCCCCGGGCTGTCTTCCTCTGCGTCCGC
1380 GTGAGCCGGCGCTCCTGATGCGGAGAGGTGCGGCCATGTCCTGGCTGGGAGCGAAGCGC-
Figure imgf000159_0001
CTGTTCTCCGGCCCCGCCCCATTCCCAGGCTCCGCCCCC
1381 TGCCGCGGGGGTGCCAAGGGAAGTGCCAGCTCAGAGGGACCATGTGGGCGCAGGCACCCAGGCG-
Figure imgf000159_0002
GCCCGAGGGGCCCAGCCCTGCTCCAAAAGGGCTGTGGCTCCACCCAC
1382 CTGCTGCGCGCGCTGGCTCTTCTGCGAGGCCTGCTTGAGCTTGTTGCCGCCTTTGGGCTCCGGGC-
Figure imgf000159_0003
TCGTCCGTGGAGATGGGGGAGCGG
1383 CTGGCGGCCCAGGTCGCTCCTGCCCAACCCGGGGACCCATCTCTTCCCCCGACTCCGACGACTGGT-
Figure imgf000159_0004
GCGACAACGAGCACAAGGGTCTTGGGGACCCGGGGCCCAGGCC
1384 AGCGCCCCGGCCGCCTGATGGCCGAGGCAGGGTGCGACCCAGGACCCAGGACGGCGTCGGGAAC-
CATACCATGGCCCGGATCCCCAAGACCCTAAAGTTCGTCGTCGTCATCGTCGCGGTCCTGCTGCCAGT-
GAGTCCCGGCCGCGGTCCCTGGCTGGGGAAGAGCGC/
CAGGGATGCCTGGCCCTGGTCACCTGCGGCCGGGCA
1385 GCCGCACGGGACAGCCAGGGGGAGCGCGCGCTCTGCTCCCTCGCGGCCCGGTCGCTCCTGCCCAGC-
CCGGGCCAAGCATCCCCACCGTGTCCCCCTCTCTCCCTGCCCACTCCCGGCGC
1386 CCCGGACATGCCCCGCCACAAGTGACCCGGGCCAGGCACCCCCGCCGCGTCCCCCTCTCTCTCTGC-
CCCCTCCCGGTGCCAGGCGCGCTTTTCCCCAGGCAGGACCGCGGTGGGGACTCACCTGCAGCAGGAC-
1387 cgggggccgccgcctgacttcggacaccggccccgcacccgccaggaggggagggaaggggag- gcggggagagcgacggcggggggcgggcggtggaccccgcctcccccggcacagcctgctgaggggaa- gagggggtctccgctcttcctcagtgcactctctgactgaagcccggcgcgtggggtgcagcgggagtgcgagg ggactggacaggtgggaagatgggaatgaggaccgggcggcgggaa
1388 CAGTGGCGGCCCTCGGCCTGCGGTCGGAGGCGGCGCGGGCGGGGAGGCGGCGCTGCGGGCTGGGT-
GCGCCCCGGCTCCCGGAGGTGCGGCGAGCAGGAAggcgcggggcggcgggcgcgcggcACTGACTCCGGAG-
GCTGCAGGGCTGGAGTGCGCGGGGCTCCTACGGCCGAGCCCTC
GCGGGGCGAGCCGCACTCGTTACCACGTCCGTCACCGGCGCG
1389 GCCCGGCGCGGATAACGGTCCGGCGGGAGGACACGGCGGTCCCTACAGCATCGCGGCGGGCCAG-
GCTCGGGCAGGGGCCGTGCTCAGGTGCGGCAGACGGACGGGCCGGCGCCTCTGAAGTCACCCG-
GCTCCTTTACGAACTGAGCCCGTTTTGGCTGGGAGGGTT
1390 GCTCCGGGTGGGGAGGGAGGCTGGCAGCTCACCCCCGGGGGCGAGGGGTCTGCGTTAGCCGTAGC-
:CGCGGCTCGGCGACCCGCGCCC GCCTCGCCCG
1391 GCCTGGGCGCAGAACGGGGTCCCTCGGCAGGACCCTCGCCGCGACAGCCTCAGCAGGGGATCGTC-
Figure imgf000160_0001
GCTCTTTTCCTGGGCGTCCGCGGCC
1392 GGTCCTAATCCCCAGGCTGCGCTGACAGGATTAGGCTCCGTTCCTCCCCATAATGTTCCCAGGAC-
Figure imgf000160_0002
ACGG
1393 CCGCGTCCCCGGCTGCTCCTCCTCGTGCTggcggcggcggcggcggcggcggcggcgCT-
GCTCCCGGGGGCGACGGGTGAgcggcggcgcggcgggcgggcgactgcggggcgcgcgggccggacccg- gccTCTGGCTCGCTCCTGCTCTTTCTCAAACATggcgcggggccgggggcgcaggtggcggcgccggggcccgg gccgggctctcgtggcgccgcgcggctcggcggctgccgggcgAACCGCAAGC
1394 GGCAGGGCTGACGTTGGGAGCGCTATGAGCTGCCGGGCAGGGTCCTCACCGGGGGCTTCCTCTGCGG-
CGCCTTCTCCGCAGGTCTTTATTCATCATCTCATctccctcttccccttctccttctcctttgcctccttctcc tttgcctccttctcctcctcttcctccccctcctccaccaccacc
1395 CCGTGGGCGCAGGGGCTGTGGCCGGGGCGGTGGGCGGGCGGTGCCGCCAGGTGAGACTGGCTGCCGT-
Figure imgf000160_0003
CCGCCGCCGCCCGCCACCGCCTCTGCTCCCCGCG
1396 cctgcgcacgcgggaagggctgccggaggcgcccgtagggaggcgcgcgcgcgggcggctcagggc- ccgcgttcctctccctcccgcctaccgccactttcccgccctgtgtgcgcccccacccccaccac- catcttcccaccctcagcgcgggcgccc
1397 GCGGACGCAGCCGAGCTCAAAGCCGCTCTGGCCGCAGGGTGCGGACGCGTCGCGGAGTCCTCACTGC-
GCGTCTCCCCACCCCCTTAGGCTCCGCCCCCTGTCCGCTGTGATCGCCGGGAGGCCAGGCCC
1398 GACCCATGGCGGGGCAGGCGGCGGCGCTGTCGGGCGGGCAGGGGTGGCGGGAGGCGGTGGCGCAGC-
Figure imgf000160_0004
GCGGACGCCGGGTG
1399 ccgctccccgcccctggctccgcctggc- cccactcccctccgcgcgccttccctcttctcccccgctccccGCGGACGCTCCTCTCTTTCCCAGTGGGC- CAACTTTATGCTGAAATTTCTTTTCTGCCCTTTTTTGGGATGTTTCCCCATTGGGAGGCGGAGCCGGGCTGC( CGGGGAAGGCGGAGGGCGAGGGGAAGAGTCACTGAGCTGCGGGGCATAGGGGGTCCGGGGCGAGGT- GCCTTCTCCCACCCAG
1400 tgtgccgcgcggttgggaggagggtcgtgagcgtgagcgtgggagcgctgggggctctgctcgcgt- gctgctctgaagttgttccccgatgcgccgtaggaagctgggattctcccatccggacgtgggacgcaggg- gaggggtaggtttcaccgtccgggctgatgactcgtggcctccggggctcctgg
1401 CACTCACGCTCTCAGCCCGGGGAATCCCAGCGGGGAGGAGGGAGGGAG- GTCGTTTTCTTCAGCTCCCCAGGTGGTCTGTGCTGGGTGTGCTGACGGTCCTTTTGGGAAAACAG-
1402 GGCAAGCGGGCTTCGGGAAGAATGCAGTTGGTGAGGAAGCTCGGCGAGGCGTGCCCGTGCAGCTGC-
Figure imgf000161_0001
AGCTCTGATCTCTGCCTTCGCCTCGCGCAGCTGTGCGGCGAGCCC
1403 CCCGCGGGCCGGGTGAGAACAGGTGGCGCCGGCCCGACCAGGCGCTTTGTGTCGGGGCGCGAG-
Figure imgf000161_0002
AGGGTGGCCACGGTGTTAGGAGAGGCGCGGGAGCCGAGAGGTGGCG
1404 GGCGGCGGCTGGAGAGCGAGGAGGAGCGGGTGGCCCCGCGCTGCGCCCGCCCTCGCCTCACCTG-
GCGCAGGTAGGTGTGGCCGCGTCCCCTACCCGGCCGGGACTTTCTGGTAAGGAGAGGAGGTTACGGG-
GAACGACGCGCTGCTTTCATGCCCTTTCTTGTTC:
ATAAAATACAGGTGGGTTCCGCCAGCTTCGCTCC
1405 GGGCCCCGGGACTCGGCTTGCACGAGCCAGTCTGGGGACCGGGGAGGCGGGGAGAGGGAAGGG-
Figure imgf000161_0003
TTGCTCTAAAGCCGCCGCCTCCGGCAAAGCCCCGTCGGCCGCC
1406 ACGGAATGTGGGGTGCGGGCCTGAATATTATAAACAAAACCAAAAAACACTGGCTGGAAAG-
Figure imgf000161_0004
C
1407 GCGGCTGCTGCCGAGGCTCCTGGTTTCCACCGCCGCCCTCGGGGATCATGCCGCCATCGCGGTTCAT-
Figure imgf000161_0005
ACACGGTGTCGTGGACGAGCGAACGCGCCATGGCTGGAGCGC
1408 CCGCTGCGCGAGGGAgggggcccgaggcgcccccggcccgcccTCCTCCCGGTCTTCGGATCCGAGC-
GCGGGGGCACGTGGGCGCGCGGGGTGCGTGGCAAGCCGCCCCTCTCCCCACGCCCGTCCGGC
1409 GGGGTGCGGCGTCTGGTCAGCCAGGGGTGAATTCTCAGGACTGGTCGGCAGTCAAGGTGAGGACCCT-
Figure imgf000161_0006
GAGGGGCAGTCGCGGGGGA
1410 GCCGGGGGAAATGCGGCCTCTAAGCTCTCCGCTGAGGCGGCTTGGAAGGAATAGTGACTGACGTg- gaggtgggggaggtggctggcccgggcgaggcccagggagagggagaggaggcgggtgggagaggaggagggT- GTATCTCCTTTCGTCGGCCCGCCCCTTGGCTTCTGCACTGATGGTGGGTGGATGAGTAATGCATCCAGGAAGC( TGGAGGCCTGTGGTTTCCGCACCCGCTGCCACCC
1411 tgcctggtaggactgacggctgcctttgtcctcctcctctccaccccgcctccccccaccct- gccttccccccctcccccgtcttctctcccgcagctgcctcagtcggctactctcagccaacccccctcac- cacccttctccccacccgcccccccgcccccgtcggcccagcgctgccagcccgagtttgcagagag- gtaactccctttggctgcgagcgggcgagc
1412 GCGCGGGCGCCTCGATCTCCCGCGCGCGCGCGTGCGCGAGACCCCCCTTTGGCCCCCTACCCT- GCAGCAAGGGTAGCGTGACGTAATGCAACCTCAGCATGTCAGCAGCAATATAAAGGAGAATGAGGCG- GCGCGCCTCCCAGACGCAGAGTAGATTGTGATTGGCTCGGGCTGCGGAACCTCG
1413 CCCGGCTGGTCGGCGCTCCTCGCAGGCGGTGTCCCGGTCCGGAGCGATCTGCGCGCTCGGCCCCGCG- GCCGCGCCCTCCCCGAAGCCCTTGCTTTGTTCTGTGAGCGCCTCGTGTCAGCCAGGCGCAGTGAGCT-
GCTGCAAACTACACGGTTTCGGTCCCCCGCGC
1414 CCGGGGCTGGGACGGCGCTTccaggcggagaaagacctccgcgggccgcgcgcggccttccccctgc- gaggatcgccattggcccgggttggctttggaaagcggcggtGGCTTTGGGCCGGGCTCGGC
1415 GGGCGGGGTGGGGCTGGAGCtcctgtctcttggccagctgaatggaggcccagtggcaacacag- gtcctgcctggggatcaggtctgctctgcaccccaccttgctgcctggagccgcccacctgacaacctct- catccctgctctgcagatccggtcccatccccactgcccaccccacccccccagcactccacccagttcaacgt tccacgaacccccagaaccagccctcatcaacaggcagcaagaaggg
1416 GTGCGGTTGGGCGGGGCCCTgtgccccactgcggagtgcgggtcgggaagcggagagagaagcagct- gtgtaatccgctggatgcggaccagggcgctccccattcccgtcgggagcccgccgattggctgggtgtgg- gcgcacgtgaccgacatgtggctgtattggtgcagcccgccagggtgtcactggagacagaatggaggtgctgc cggactcggaaatggggtaggtgctggagccaccatggccagg
1417 GGCGGTGCCTCCGGGGCTCAcctggctgcagccacgcaccccctctcagtggcgtcggaact- gcaaagcacctgtgagcttgcggaagtcagttcagactccagcccGCTCCAGCCCGGCCCGACCC
1418 GGCGGTGCCTCCGGGGCTCAcctggctgcagccacgcaccccctctcagtggcgtcggaact- gcaaagcacctgtgagcttgcggaagtcagttcagactccagcccGCTCCAGCCCGGCCCGACCC
1419 CGGGAGCCCGCCCCCGAGAGgtgggctgcgggcgctcgaggcccagccgccgccgccgccgccgc- cgccgccgcctccgccgccgccgccgccgccgccgccgccgcgctgccgcacgccccctggcagcg- gcgcctccgtcaccgccgccgcccgcgctcgccgtcggcccgccgcccgctcagaggcggccctccaccggaag tgaaaccgaaacggagctgagcgcctgactgaggccgaacccccggcccg
1420 TCCTGCCATCCGCGCCTTTGCActtttctttttgagttgacatttcttggtgctttttg- gtttctcgctgttgttgggtgctttttggtttgttcttgtccctttttcgtttgctcatcctttttg- gcgctaactcttaggcagccagcccagcagcccgaagcccgggcagccgcgctccgcggccccggggcagcgcg gcgggaaccgcagccaagccccccgacacggggcgcacgggggccgggcagcccg
1421 AGGCACAGGGGCAGCTCCGGCACggctttctcaggcctatgccggagcctcgagggctggagagcgg- gaagacaggcagtgctcggggagttgcagcaggacgtcaccaggagggcgaagcggccacgggaggggggc- cccgggacattgcgcagcaaggaggctgcaggggctcggcctgcgggcgccggtcccacgaggcactgcggccc agggtctggtgcggagagggcccacagtggacttggtgacgct
1422 CGACCCCTCCGACCGTGCTTCCGgtgagggtcctgggcccctttcccactctctagagaca- gagaaatagggcttcgggcgcccagcgtttcctgtggcctctgggacctcttggccagggacaaggacccgt- gacttccttgcttgctgtgtggcccgggagcagctcagacgctggctccttctgtccctctgcccgtggacatt agctcaagtcactgatcagtcacaggggtggcctgtcaggtcaggcgg
1423 CCCGCAGGGTGGCTGCGTCCttccagggcctggcctgagggcaggggtg- gtttgctcccccttcagcctccgggggctggggtcagtgcggtgctaacacggctctctctgtgctgtgg- gacttccaggcaggcccgcaagccgtgtgagccgtcgcagccgtggcatcgttgaggagtgct- gtttccGCAGCTGTGACCTGGCCCTCCTGGA
1424 GCGTCTGCCGGCCCCTCCCCttgtccgtcccctccgcgccgctggcgcgcgccttctgaatgc- caagcattgccataaactccggggacaaaagcctgggtcacaaaagccccctctagaagttcacaccctgag- gcttccctggcaaggctgggggccgtttggcccttccatgtggactgcaaaaacagtgttggaatgcaggactc tgggtatgttctcgaaagttgttacaaccccaacccagggttgacc
1425 TAGGCCGCCGGGCAGCCACCgcgctcctctggctctcctgctccatcgcgctcctccgcgcccttgc- cacctccaacgcccgtgcccagcagcgcgcGGCTGCCCAACAGCGCCGGA
1426 GGGGAGCGGGGACGCGAGCAgcaccagaatccgcgggagcgcggctgttcctggtagggccgtgt- caggtgacggatgtagctagggggcgagctgcctggagttgcgttccaggcgtccggcccctgggccgtcac- cgcggggcgcccgcgctgagggtgggaagatggtggtgggggtgggggcgcacacagggcgggaaagtggcggt aggcgggagggagaggaacgcgggccctgagccgcccgcgcgcg
1427 GCCGGCTGGCTCCCCACTCTGCcagagcgaggcggggcagtgaggactccgcgacgcgtccgcac- cctgcggccagagcggctttgagctcggctgcgtccgcgctaggcgctttttcccagaagcaatccag- gcgcgcccgctggttcttgagcgccaggaaaagcccggagctaacgaccggccgctcggccactgcacggggcc ccaagccgcagaaggacgacgggagggtaatgaagctgagcccaggtc
1428 TCGCTCACGGCGTCCCCTTGCCtggaaagataccgcggtccctccagaggatttgagggacagg- gtcggagggggctcttccgccagcaccggaggaagaaagaggaggggctggctggtcaccagagggtggggcg- gaccgcgtgcgctcggcggctgcggagagggggagagcaggcagcgggcggcggggagcagcatggagccggcg gcggggagcagcatggagccttcggctgactggctggccacggc
1429 TCCCCGCTGCCCTGGCGCTCcccctttgatttattagggctgccgggttggcgcagat- tgctttttcttctcttccatcccatcctcccttctggtcctcctttccacagtgggagtccgtgctcct- gctcctcggttggctcctaagtgccccgccaggtcccctctcctttcgctctcccggctccggctcccgactct tcggcccgctggcatctgcttccctcccctgcctcgtttctcgtcgcccctgct
1430 GGCCAGAGGCAGGCCCGCAGCtccctgccccgcctctgtgcctccgccaacccgacaacgct- tgctcccaccccgatccccgcacccgcgcgaAGTGGGCCCTCCGGTCGTCGGC
1431 TGCCCGGGTCATCGGACGGGAGgccgcgccacgtgagggcggcaagagggcactggccctgcggc- gaggccccagcgaggggcgcttccCCGAGGGGCCAGCCTGGGCA
1432 CCCAGTGCGCACGGCGAGGCagtagcccggccccgcactgctgataggtgcaggcag- gacagtccctccaccgcggctcggggcgtcctgattggtgcggagccacgtcagtcgcacccggagaagg- gtctgggaggaggcggaggcggaGAGGGCTGGGGAGGGCCGCG
1433 AGCGTCCCAGCCCGCGCACCgaccagcgccccagttccccacagacgccggcgggcccgg- gagcctcgcggacgtgacgccgcgggcggaagtgacgttttcccgcggttggacgcggCGCTCAGTTGCCGG-
GCGGGG
1434 TGCTCCCCCGGGTCGGAGCCccccggagctgcgcgcgggcttgcagcgcctcgcccgcgct- gtcctcccggtgtcccgcttctccgcgccccagccgccggctgccagcttttcggggccccgagtcgcac- ccagcgaagagagcgggcccgggacaagctcgaactccggccgcctcgcccttccccggctccgctccctctgc cccctcggggtcgcgcgcccacgatgctgcagggccctggctcgctgctg
1435 CGCTCGCATTGGGGCGCGTCccccatccgcccccaactgtggtgtcgcgacaggtcctattgcgggt- gtctgcggtgggaagggcggtggtgactgggagcATGCGGGGTAACCGCAGTGGGCA
1436 TGCGGCAAGCCCGCCATGATGtccacgtgacaaaagccatgatatacatatgacaacgcctgccata- ttgtccctgcggcaaaacccaacacgaaaagcacacagcaaagacaaagaggcccgccatgttttacactgcg- gcaagaccttcagccgccatcttttcctgtgTGACCGCACATGTCCACCACCATGC
1437 TCTTGAGCCTCAGGAGTGAAAAGGCCCCTTGggaaaccctcacccaggagatacacaggagcactg- gctttggcagcagctcacaatgagaaagaTGCCTGTCACAGCCTTTGCCTTCCTCTTCTATG
1438 GGACCATGAGTGTTTCCATGCTTGGCATCAGAcatgtcttctacccctattcagtctgtcatccact- ggtcaagaatcccaaacattctaaaactgtgtccacatctcttctgggtaactcttatgattggagg- gcttcctgaggtgtgaagtctatcacagatccagtgactaacttctagcttcatcttattctcacttaggggag aagagttgaggcccaagcaaacctcttcttaccattggcttagggaa
1439 tcagccactgcttcgcaggctgacgttactgacgtggtgccagcgacggagggcgagaacgc- cagcgcggcgcagccggacgtgaacgcgcagatcaccgcagcggttgcggcagaaaacagccgcattatggg- gatcctcaactgtgaggaggctcacggacgcgaagaacaggcacgcgtgctggcagaaacccccggtatgaccg tgaaaacggcccgccgcattctggccgcagcaccacagagtgcacag
1440 cggccagctgcgcggcgactccggggactccagggcgcccctctgcggccgacgcccggggt- gcagcggccgccggggctggggccggcgggagtccgcgggaccctccagaagagcggccggcgccgtgact- cagcactggggcggagcggggc

Claims

1. Method of determining a subset of diagnostic markers for potentially methylated genes from the genes of gene marker IDs 1-359 of table 1, suitable for the diagnosis or prognosis of a disease or tumor type in a sample, comprising
a) obtaining data of the methylation status of at least 50 random genes selected from the 359 genes of gene marker IDs 1-359 in at least 1 sample, preferably 2, 3, 4 or at least 5 samples, of a confirmed disease or tumor type positive state and at least one sample of a disease or tumor type negative state,
b) correlating the results of the obtained methylation status with the disease or tumor type,
c) optionally repeating the obtaining a) and correlating b) steps for a different combination of at least 50 random genes selected from the 359 genes of gene marker IDs 1-359 and
d) selecting as many marker genes which in a classification analysis have a p-value of less than 0.1 in a random-variance t-test, or selecting as many marker genes which in a classification analysis together have a correct disease or tumor type prediction of at least 70% in a cross-validation test,
wherein the selected markers form the subset of diagnostic markers.
2. The method of claim 1, characterized in that the correlated results for each gene b) are rated by their correct correlation to the disease or tumor type positive state, preferably by p-value test, and selected in step d) in order of the rating.
3. The method of claim 1 or 2, characterized in that the at least 50 genes of step a) are at least 70, preferably at least 90, at least 100, at least 120, at least 140, at least 160, at least 180, at least 200, at least 220, at least 240, at least 250 or all, genes.
4. The method of any one of claims 1 to 3, characterized in that not more than 40, preferably not more than 30, in particular preferred not more than 20, marker genes are selected in step d) for the subset.
5. The method of any one of claims 1 to 4, characterized in that the step a) of obtaining data of the methylation status comprises determining data of the methylation status, preferably by methylation specific PCR analysis, methylation specific digestion analysis, or hybridization analysis to non-digested or digested fragments, or PCR amplification analysis of non-digested or digested fragments.
6. Method of identifying a disease or tumor type in a sample comprising DNA from a patient, comprising the step of providing a diagnostic subset of markers identified according to any one of claims 1 to 5, determining the methylation status of the genes of the subset in the sample and comparing the methylation status with the status of a confirmed disease or tumor type positive and/or negative state, thereby identifying the disease or tumor type in the sample.
7. A set of nucleic acid primers or hybridization probes being specific for a potentially methylated region of marker genes selected from at least 180, preferably at least 200, more preferred at least 220, in particular preferred at least 240, even more preferred at least 260, most preferred at least 280, marker genes of table 1.
8. A set of nucleic acid primers or hybridization probes being specific for a potentially methylated region of marker genes selected from at least one of the following combinations
a) CHRNA9, RPA2, CPEB4, CASP8, MSH2, ACTB, CTCFL, TPM2, SERPINB5, PIWIL4, NTF3, CDK2AP1
b) IGF2, KCNQ1, SCGB3A1, EFS, BRCA1, ITGA4, H19, PTTG1
c) KRT17, IGFBP7, RHOXF1, CLIC4, TP53, DLX2, ITGA4, AIM1L, SERPINl, SERPIN2, TP53, XIST, TEAD1, CDKN2A, CTSD, OPCML, RPA2, BRCA2, CDH1, S100A9, SERPINB2, BCL2A1, UNC13B, ABL1, TIMP1, ATM, FBXW7, SFRP5, ACTB, MSX1, LOX, SOX15, DGKH, CYLD, XPA, XPC
d) NEUROD2, CTCFL, GBP2, SFN, MAGEB2, DIRAS3, ARMCX2, HRAS
e) SFN, DIRAS3, HRAS, ARMCX2, MAGEB2, GBP2, CTCFL, NEUROD2
f) PITX2, TJP2, CD24, ESR1, TNFRSF10D, PRA3, RASSF1
g) GATA5, RASSF1, HIST1H2AG, NPTX1, UNC13B
h) SMAD3, NANOS1, TERT, BCL2, SPARC, SFRP2, MGMT, MYOD1, LAMA1
i) TJP2, CALCA, PITX2, TFPI2, CDKN2B
j) PITX2, TNFRSF10D, PAX8, RAD23A, GJB2, F2R, TP53, NTHL1, TP53
k) ARRDC4, DUSP1, SMAD9, HOXA10, C3, ADRB2, BRCA2, SYK
l) PITX2, MT3, RPA3, TNFRSF10D, PTEN, TP53, PAX8, TGFBR2, HIC1, CALCA, PSAT1, MBD2, NTF3, PLAGL1, F2R, GJB2, ARRDC4, NTHL1
m) MT3, RPA3, TNFRSF10D, HOXA1, C13orf15, TGFBR2, HIC1, CALCA, PSAT1, NTF3, PLAGL1, F2R, GJB2, ARRDC4, NTHL1
n) PITX2, PAX8, CD24, TP53, ESR1, TNFRSF10D, RAD23A, SCGB3A1, RARB, TP53, LZTS1
o) DUSP1, TFPI2, TJP2, S100A9, BAZ1A, CPEB4, AIM1L, CDKN2A, PITX2, ARPC1B, RPA3, SPARC, SFRP4, LZTS1, MSH4, PLAGL1, ABCB1, C13orf15, XIST, TDRD6, CCDC62, HOXA1, IRF4, HSD12B4, S100A9, MT3, KCNJ15, BCL2A1, S100A8, PITX2, THBD, NANOS1, SYK, SMAD2, GNAS, HRAS, RARRES1, APEX1
p) TJP2, CALCA, PITX2, PITX2, ESR1, EFS, SMAD3, ARRDC4, CD24, FHL2, PITX2, RDHE2, KIF5B, C3, KRT17, RASSF1
q) CHRNA9, RPA2, CPEB4, CASP8, MSH2, ACTB, CTCFL, TPM2, SERPINB5, PIWIL4, NTF3, CDK2AP1
r) IGF2, KCNQ1, SCGB3A1, EFS, BRCA1, ITGA4, H19, PTTG1
s) KRT17, AQP3, TP53_CGI23_1kb, ZNF462, NEUROG1, GATA3, MT1A, JUP, RGC32, SPINT2, DUSP1
t) NCL, XPA, MYOD1, Pitx2
u) SPARC, PIWIL4, SERPINB5, TEAD1, EREG, ZDHHC11, C5orf4
v) HSD17B4, DSP, SPARC, KRT17, SRGN, C5orf4, PIWIL4, SERPINB5, ZDHHC11, EREG
w) TIMP1, COL21A1, COL1A2, KL, CDKN2A
x) TIMP1, COL21A1, COL1A2
y) BCL2A1, SERPINB2, SERPINE1, CLIC4, BCL2A1, ZNF256, ZNF573, GNAS, SERPINB2
z) TDRD6, XIST, LZTS1, IRF4
aa) TIMP1, COL21A1, COL1A2, KL, CDKN2A, Lamda
bb) DSP, AR, IGF2, MSX1, SERPINE1
cc) FHL1, LMNA, GDNF
dd) FBXW7, GNAS, KRT14
ee) CHFR, AR, RBP1, MSX1, COL21A1, FHL1, RARB
ff) DCLRE1C, MLH1, RARB, OGG1, SNRPN, ITGA4
gg) FHL1, LMNA, GDNF
hh) FBXW7, GNAS, KRT14
ii) CHFR, AR, RBP1, MSX1, COL21A1, FHL1, RARB
jj) DCLRE1C, MLH1, RARB, OGG1, SNRPN, ITGA4
kk) SFN, DIRAS3, HRAS, ARMCX2, MAGEB2, GBP2, CTCFL, NEUROD2
ll) SFN, BAZ1A, DIRAS3, CTCFL, ARMCX2, GBP2, MAGEB2, NEUROD2
mm) DIRAS3, C5AR1, BAZ1A, SFN, ERCC1, SNRPN, PILRB, KRT17, CDKN2A, H19, EFS, TJP2, HRAS, NEUROD2, GBP2, CTCFL
nn) DIRAS3, C5AR1, SFN, BAZ1A, HIST1H2AG, XAB2, HOXA1, HIC1, GRIN2B, BRCA1, C13orf15, SLC25A31, CDKN2A, H19, EFS, TJP2, HRAS, NEUROD2, GBP2, CTCFL
oo) TFPI2, NEUROD2, DLX2, TTC3, TWIST1
pp) MAGEB2, MSH2, ARPC1B, NEUROD2, DDX18, PIWIL4, MSX1, COL1A2, ERCC4, GAD1, RDH10, TP53, APC, RHOXF1, ATM
qq) ACTB, EFS, CXADR, LAMC2, DNAJA4, CRABP1, PARP2, HIC1, MTHFR, S100A9, PTX2
rr) ACTB, EFS, CXADR, LAMC2, DNAJA4, PARP2, CRABP1, HIC1, SERPINI1, MTHFR, PITX2
ss) ACTB, EFS, PARP2, TP73, HIC1, BCL2A1, CRABP1, CXADR, BDNF, COL1A1
tt) EFS, ACTB, BCL2A1, TP73, HIC1, SERPINI1, CXADR
uu) ACTB, TP73, SERPINI1, CXADR, HIC1, BCL2A1, EFS
vv) FBXL13, PITX2, NKX2-1, IGF2, C5AR1, SPARC, RUNX3, CHST11, CHRNA9, ZNF462, HSD17B4, UNG, TJP2, ERBB2, SOX15, ERCC8, CDX1, ANXA3, CDH1, CHFR, TACSTD1, MT1A
ww) TP53, PTTG1, VHL, TP53, S100A2, ZNF573, RDH10, TSHR, MYO5C, MBD2, CPEB4, BRCA1, CD24, COL1A1, VDR, TP53, KLF4, ADRB2, ERCC2, SPINT2, XAB2, RB1, APEX1, RPA3, TP53, BRCA2, MSH2, BAZ1A, SPHK1, ERCC8, SERPINI1, RPA2, SCGB3A1, MLH3, CDK2AP1, MT1G, PITX2, SFRP5, ZNF711, TGFBR2, C5AR1, DPH1, CDX1, GRIN2B, C5orf4, BOLL, HOXA1, NEUROD2, BCL2A1, ZNF502, FOXA2, MYOD1, HOXA10, TMEFF2, IQCG, LXN, SRGN, PTGS2, ONECUT2, PENK, PITX2, DLX2, SALL3, APC, APC, HIST1H2AG, ACTB, RASSF1, S100A9, TERT, TNFRSF25, HIC1, LAMC2, SPARC, WT1, PITX2, GNA15, ESR1, KL, HIC1
xx) HIC1, LAMC2, SPARC, WT1, PITX2, GNA15, KL, HIC1
yy) HIC1, KL, ESR1
or a set of at least 50%, preferably at least 60%, at least 70%, at least 80%, at least 90%, 100% of the markers of any one of the above a) to yy).
9. Set according to claim 7 or 8, characterized in that the set comprises not more than 100000 probes or primer pairs, in particular in the case of immobilized probes on a solid surface.
10. Set according to any one of claims 7 to 9, characterized in that the primer pairs and probes are specific for a methylated upstream region of the open reading frame of the marker genes.
11. Set according to any one of claims 7 to 10, characterized in that the probes or primers are specific for a methylation in the genetic regions defined by SEQ ID NOs 1081 to 1440 including the adjacent up to 500, preferably up to 300, up to 200, up to 100, up to 50 or up to 10 adjacent base pairs corresponding to gene marker IDs 1 to 359, respectively.
12. Set according to claim 11, characterized in that the probes or primers are of SEQ ID NOs 1 to 1080.
13. Method of identifying or predicting a disease or tumor type in a sample comprising DNA from a patient, comprising obtaining a set of nucleic acid primers or hybridization probes according to claims 7 to 12, determining the methylation status of the genes in the sample which the members of the set are specific for and comparing the methylation status of the genes with the status of a confirmed cancer type positive and/or negative state, thereby identifying the disease or tumor type in the sample.
14. Method of claim 6 or 13, characterized in that the methylation status is determined by methylation specific PCR analysis, methylation specific digestion analysis and either or both of hybridization analysis to non-digested or digested fragments or PCR amplification analysis of non-digested or digested fragments.
15. The method of claims 13 or 14, characterized in that the disease or tumor type is selected from lung, gastric, colorectal, brain, liver, bone, breast, prostate, ovarian, bladder, cervical, pancreas, kidney, thyroid, oesophageal, head and neck, neuroblastoma, skin, nasopharyngeal, endometrial, bile duct, oral, multiple myeloma, leukemia, soft tissue sarcoma, anal, gall bladder, endocrine, mesothelioma, Wilms tumor, testis, bone, duodenum, neuroendocrine, salivary gland, larynx, choriocarcinoma, cardial, small bowel, eye, germ cell cancer, preferably thyroid or breast cancer or a trisomy, in particular trisomy 21.
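
Illustrative sketch (not part of the claims): the selection logic of claim 1 step d), the rating order of claim 2 and the size limit of claim 4 might be sketched in Python as follows. This is a minimal sketch under stated assumptions. The claims name a random-variance t-test; the code below swaps in an exact two-sided permutation test on the difference of group means, purely so that the example runs with the standard library alone. The function names, the data layout (one list of methylation values per marker, one boolean label per sample) and the marker names are hypothetical.

```python
from itertools import combinations
from statistics import mean


def permutation_p_value(positive, negative):
    """Exact two-sided permutation p-value for the difference in group means.

    Stand-in for the random-variance t-test named in claim 1 (assumption).
    """
    pooled = list(positive) + list(negative)
    n_pos = len(positive)
    n_neg = len(pooled) - n_pos
    observed = abs(mean(positive) - mean(negative))
    total = sum(pooled)
    extreme = 0
    count = 0
    # Enumerate every way of relabelling n_pos samples as "positive".
    for idx in combinations(range(len(pooled)), n_pos):
        pos_sum = sum(pooled[i] for i in idx)
        diff = abs(pos_sum / n_pos - (total - pos_sum) / n_neg)
        if diff >= observed - 1e-12:  # tolerance for floating-point ties
            extreme += 1
        count += 1
    return extreme / count


def select_markers(methylation, labels, p_cutoff=0.1, max_markers=20):
    """Return markers with p < p_cutoff, best-rated first, capped at max_markers.

    methylation: dict mapping marker name -> list of values, one per sample.
    labels: list of bool, True for a disease-positive sample.
    """
    rated = []
    for marker, values in methylation.items():
        pos = [v for v, is_pos in zip(values, labels) if is_pos]
        neg = [v for v, is_pos in zip(values, labels) if not is_pos]
        p = permutation_p_value(pos, neg)
        if p < p_cutoff:
            rated.append((p, marker))
    rated.sort()  # lowest p-value first (rating order of claim 2)
    return [marker for _, marker in rated[:max_markers]]


if __name__ == "__main__":
    # Invented toy values for two hypothetical markers, 4 positive + 4 negative samples.
    labels = [True] * 4 + [False] * 4
    data = {
        "marker_A": [0.9, 0.8, 0.85, 0.95, 0.1, 0.2, 0.15, 0.05],
        "marker_B": [0.5, 0.4, 0.6, 0.5, 0.5, 0.6, 0.4, 0.5],
    }
    print(select_markers(data, labels))
```

The exhaustive enumeration is only practical for small sample counts; with larger cohorts a sampled permutation test or the t-test variant named in the claim would be the more natural choice.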
PCT/EP2010/051033 2009-01-28 2010-01-28 Methylation assay WO2010086389A1 (en)

Priority Applications (4)

Application Number Priority Date Filing Date Title
EP16188579.3A EP3124624B1 (en) 2009-01-28 2010-01-28 Methylation assay
CA2750979A CA2750979A1 (en) 2009-01-28 2010-01-28 Methylation assay
US13/146,903 US10718026B2 (en) 2009-01-28 2010-01-28 Methylation assay
EP10701378.1A EP2391729B1 (en) 2009-01-28 2010-01-28 Methylation assay

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
EP09450020A EP2233590A1 (en) 2009-01-28 2009-01-28 Methylation assay
EP09450020.4 2009-01-28

Publications (1)

Publication Number Publication Date
WO2010086389A1 true WO2010086389A1 (en) 2010-08-05

Family

ID=40578874

Family Applications (2)

Application Number Title Priority Date Filing Date
PCT/EP2010/051032 WO2010086388A1 (en) 2009-01-28 2010-01-28 Lung cancer methylation markers
PCT/EP2010/051033 WO2010086389A1 (en) 2009-01-28 2010-01-28 Methylation assay

Family Applications Before (1)

Application Number Title Priority Date Filing Date
PCT/EP2010/051032 WO2010086388A1 (en) 2009-01-28 2010-01-28 Lung cancer methylation markers

Country Status (4)

Country Link
US (3) US10718026B2 (en)
EP (4) EP2233590A1 (en)
CA (2) CA2750978A1 (en)
WO (2) WO2010086388A1 (en)

Cited By (21)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2010093906A2 (en) 2009-02-12 2010-08-19 Curna, Inc. Treatment of glial cell derived neurotrophic factor (gdnf) related diseases by inhibition of natural antisense transcript to gdnf
JP2012044924A (en) * 2010-08-26 2012-03-08 Japan Health Science Foundation Marker and method for cervix uteri cancer screening
WO2012031329A1 (en) * 2010-09-10 2012-03-15 Murdoch Childrens Research Institute Assay for detection and monitoring of cancer
EP2886659A1 (en) 2013-12-20 2015-06-24 AIT Austrian Institute of Technology GmbH Gene methylation based colorectal cancer diagnosis
CN105143465A (en) * 2013-03-14 2015-12-09 梅奥医学教育和研究基金会 Detecting neoplasm
US9797016B2 (en) 2010-10-19 2017-10-24 Oslo Universitetssykehus Hf Methods and biomarkers for detection of bladder cancer
WO2017212734A1 (en) * 2016-06-10 2017-12-14 国立研究開発法人国立がん研究センター Method for predicting effect of pharmacotherapy on cancer
US10006093B2 (en) 2015-08-31 2018-06-26 Mayo Foundation For Medical Education And Research Detecting gastric neoplasm
US10030272B2 (en) 2015-02-27 2018-07-24 Mayo Foundation For Medical Education And Research Detecting gastrointestinal neoplasms
US10184154B2 (en) 2014-09-26 2019-01-22 Mayo Foundation For Medical Education And Research Detecting cholangiocarcinoma
US10260104B2 (en) 2010-07-27 2019-04-16 Genomic Health, Inc. Method for using gene expression to determine prognosis of prostate cancer
CN109803978A (en) * 2016-09-07 2019-05-24 武汉华大吉诺因生物科技有限公司 Polypeptide and its application
US10301680B2 (en) 2014-03-31 2019-05-28 Mayo Foundation For Medical Education And Research Detecting colorectal neoplasm
US10370726B2 (en) 2016-04-14 2019-08-06 Mayo Foundation For Medical Education And Research Detecting colorectal neoplasia
US10435753B2 (en) 2010-03-26 2019-10-08 Mayo Foundation For Medical Education And Research Methods for detecting colorectal cancer using a DNA marker of exfoliated epithelia and a fecal blood marker
US10435755B2 (en) 2015-03-27 2019-10-08 Exact Sciences Development Company, Llc Detecting esophageal disorders
US10913986B2 (en) 2016-02-01 2021-02-09 The Board Of Regents Of The University Of Nebraska Method of identifying important methylome features and use thereof
US10934594B2 (en) 2017-11-30 2021-03-02 Mayo Foundation For Medical Education And Research Detecting breast cancer
US10934592B2 (en) 2017-02-28 2021-03-02 Mayo Foundation For Medical Education And Research Detecting prostate cancer
US11078543B2 (en) 2016-04-14 2021-08-03 Mayo Foundation For Medical Education And Research Detecting pancreatic high-grade dysplasia
US11674168B2 (en) 2015-10-30 2023-06-13 Exact Sciences Corporation Isolation and detection of DNA from plasma

Families Citing this family (40)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP2233590A1 (en) 2009-01-28 2010-09-29 AIT Austrian Institute of Technology GmbH Methylation assay
WO2013012781A2 (en) * 2011-07-15 2013-01-24 The Johns Hopkins University Genome-wide methylation analysis and use to identify genes specific to breast cancer hormone receptor status and risk of recurrence
ITPD20110324A1 (en) * 2011-10-12 2013-04-13 Innovation Factory S C A R L METHOD FOR DETECTION OF GALFA15 AS A TUMOR MARKER IN PANCREATIC CARCINOMA
JP6224010B2 (en) 2012-02-13 2017-11-01 ベイジン インスティチュート フォー キャンサー リサーチ Methods for in vitro assessment of tumor development, metastasis or life expectancy and artificial nucleotides used
EP2636752A1 (en) * 2012-03-06 2013-09-11 Universiteit Maastricht In vitro method for determining disease outcome in pulmonary carcinoids.
CN102787174B (en) * 2012-09-06 2013-08-07 南京大学 Kit for morbidity-related tumor suppressor gene epigenetic mutation detection of gastrointestinal neoplasms and application thereof
PL2904398T3 (en) * 2012-10-02 2018-11-30 Sphingotec Gmbh A method for predicting the risk of getting cancer in a female subject
WO2014062218A1 (en) * 2012-10-16 2014-04-24 University Of Southern California Colorectal cancer dna methylation markers
US9518989B2 (en) 2013-02-12 2016-12-13 Texas Tech University System Composition and method for diagnosis and immunotherapy of lung cancer
CN105154542B (en) * 2015-09-01 2018-04-17 杭州源清生物科技有限公司 A group of genes for molecular typing of lung cancer and application thereof
WO2017192221A1 (en) * 2016-05-05 2017-11-09 Exact Sciences Corporation Detection of lung neoplasia by analysis of methylated dna
US11685955B2 (en) 2016-05-16 2023-06-27 Dimo Dietrich Method for predicting response of patients with malignant diseases to immunotherapy
DE102016005947B3 (en) 2016-05-16 2017-06-08 Dimo Dietrich A method for estimating the prognosis and predicting the response to immunotherapy of patients with malignant diseases
WO2017210662A1 (en) * 2016-06-03 2017-12-07 Castle Biosciences, Inc. Methods for predicting risk of recurrence and/or metastasis in soft tissue sarcoma
US10093986B2 (en) 2016-07-06 2018-10-09 Youhealth Biotech, Limited Leukemia methylation markers and uses thereof
WO2018009709A1 (en) * 2016-07-06 2018-01-11 Youhealth Biotech, Limited Lung cancer methylation markers and uses thereof
EP3481953A4 (en) * 2016-07-06 2020-04-15 Youhealth Biotech, Limited Liver cancer methylation markers and uses thereof
US11396678B2 (en) * 2016-07-06 2022-07-26 The Regent Of The University Of California Breast and ovarian cancer methylation markers and uses thereof
WO2018009707A1 (en) 2016-07-06 2018-01-11 Youhealth Biotech, Limited Solid tumor methylation markers and uses thereof
EP3481951A4 (en) * 2016-07-06 2020-08-05 Youhealth Biotech, Limited Colon cancer methylation markers and uses thereof
CN106282347A (en) * 2016-08-17 2017-01-04 中南大学 Application of HoxC11 as a biomarker in preparing a pre-diagnostic reagent for lung adenocarcinoma
CN116064795A (en) * 2016-09-02 2023-05-05 梅约医学教育与研究基金会 Methods and kits for determining methylation status of differentially methylated regions
CN109963862B (en) 2016-09-07 2024-01-30 武汉华大吉诺因生物科技有限公司 Polypeptides and uses thereof
WO2018107381A1 (en) * 2016-12-14 2018-06-21 Shanghaitech University Compositions and methods for treating cancer by inhibiting piwil4
CN106755466A (en) * 2017-01-12 2017-05-31 宁夏医科大学 Specific DNA methylation profile, method for establishing same, and application thereof
WO2018174860A1 (en) * 2017-03-21 2018-09-27 Mprobe Inc. Methods and compositions for detection early stage lung adenocarcinoma with rnaseq expression profiling
WO2018214249A1 (en) * 2017-05-22 2018-11-29 立森印迹诊断技术(无锡)有限公司 Imprinted gene grading model, system composed of same, and application of same
CN109528749B (en) * 2017-09-22 2023-04-28 上海交通大学医学院附属瑞金医院 Application of long-chain non-coding RNA-H19 in preparation of drug for treating pituitary tumor
US11661601B2 (en) 2018-03-22 2023-05-30 Ionis Pharmaceuticals, Inc. Methods for modulating FMR1 expression
CA3094717A1 (en) 2018-04-02 2019-10-10 Grail, Inc. Methylation markers and targeted methylation probe panels
EP3795693A4 (en) * 2018-05-18 2022-01-19 Lisen Imprinting Diagnostics, Inc. Method for diagnosing cancer by means of biopsy cell sample
CN110714075B (en) * 2018-07-13 2024-05-03 立森印迹诊断技术(无锡)有限公司 Grading model for detecting benign and malignant degrees of lung tumor and application thereof
CN113286881A (en) 2018-09-27 2021-08-20 格里尔公司 Methylation markers and targeted methylation probe panels
KR102637032B1 (en) * 2020-01-28 2024-02-15 주식회사 젠큐릭스 Composition for diagnosing bladder cancer using CpG methylation status of specific gene and uses thereof
CN114075603A (en) * 2020-09-01 2022-02-22 闫池 Method for determining differential methylation sites of CpG island of AJUBA gene promoter
CN112301132A (en) * 2020-11-18 2021-02-02 中国医学科学院肿瘤医院 Kit for multi-gene combined detection of cancer and detection method thereof
CN113528667B (en) * 2021-07-20 2022-09-20 中国科学院上海营养与健康研究所 Diagnosis method of giant cell tumor of bone
KR102404750B1 (en) * 2021-10-14 2022-06-07 주식회사 엔도믹스 Methylation marker genes for colorectal diagnosis using cell-free dna and use thereof
CN114438188A (en) * 2021-11-29 2022-05-06 中国辐射防护研究院 Use of hypermethylated CDK2AP1 gene as molecular marker for alpha radiation damage prediction
CN114395623B (en) * 2021-12-16 2024-03-08 上海市杨浦区中心医院(同济大学附属杨浦医院) Gene methylation detection primer composition, kit and application thereof

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6605432B1 (en) 1999-02-05 2003-08-12 Curators Of The University Of Missouri High-throughput methods for detecting DNA methylation
EP1369493A1 (en) 2002-06-05 2003-12-10 Epigenomics AG Quantitative determination method for the degree of methylation of cytosines in CpG positions

Family Cites Families (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6582908B2 (en) * 1990-12-06 2003-06-24 Affymetrix, Inc. Oligonucleotides
DE10161625A1 (en) 2001-12-14 2003-07-10 Epigenomics Ag Methods and nucleic acids for the analysis of a pulmonary cell division disorder
EP1635694A4 (en) * 2003-05-30 2008-05-21 Univ Temple Methods of diagnosing, prognosing and treating breast cancer
US7588894B2 (en) * 2005-02-01 2009-09-15 John Wayne Cancer Institute Use of ID4 for diagnosis and treatment of cancer
US20100035970A1 (en) * 2005-04-15 2010-02-11 Oncomethylome Sciences, S.A. Methylation Markers for Diagnosis and Treatment of Cancers
US9512483B2 (en) * 2005-07-09 2016-12-06 Lovelace Respiratory Research Institute Gene methylation as a biomarker in sputum
WO2007032748A1 (en) * 2005-09-15 2007-03-22 Agency For Science, Technology & Research Method for detecting dna methylation
EP2004860A4 (en) * 2006-03-29 2009-12-30 Wayne John Cancer Inst Methylation of estrogen receptor alpha and uses thereof
EP2450710B1 (en) * 2006-07-14 2020-09-02 The Regents of The University of California Cancer biomarkers and methods of use thereof
US8911937B2 (en) * 2007-07-19 2014-12-16 Brainreader Aps Method for detecting methylation status by using methylation-independent primers
EP2233590A1 (en) 2009-01-28 2010-09-29 AIT Austrian Institute of Technology GmbH Methylation assay

Non-Patent Citations (21)

* Cited by examiner, † Cited by third party
Title
BIBIKOVA M ET AL., GENOME RES, vol. 16, no. 3, 2006, pages 383 - 393
BIBIKOVA MARINA ET AL: "High-throughput DNA methylation profiling using universal bead arrays", GENOME RESEARCH, COLD SPRING HARBOR LABORATORY PRESS, WOODBURY, NY, US, vol. 16, no. 3, 1 March 2006 (2006-03-01), pages 383 - 393, XP002445638, ISSN: 1088-9051 *
BO, JONASSEN, GENOME BIOLOGY, vol. 3, no. 4, 2002
CHENG Y ET AL., GENOME RES, vol. 16, no. 2, 2006, pages 282 - 289
CHENG YU-WEI ET AL: "Multiplexed profiling of candidate genes for CpG island methylation status using a flexible PCR/LDR/Universal Array assay", GENOME RESEARCH,, vol. 16, no. 2, 1 February 2006 (2006-02-01), pages 282 - 289, XP002522613 *
DATABASE GEO [online] NCBI; 11 March 2002 (2002-03-11), "[HG-U133A] Affymetrix Human Genome U133A Array", XP002527544, retrieved from NCBI Database accession no. GPL96 *
DUDOIT ET AL., JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION, vol. 97, 2002, pages 77 - 87
LAUSS M, KRIEGNER A, VIERLINGER K, VISNE I, YILDIZ A, DILAVEROGLU E, NOEHAMMER C.: "Consensus genes of the literature to predict breast cancer recurrence", BREAST CANCER RES TREAT, vol. 110, 2008, pages 235 - 44, XP019600682
ONGENAERT M ET AL., NUCLEIC ACIDS RES, vol. 36, 2008, pages D842 - D846
ONGENAERT MATÉ ET AL: "PubMeth: a cancer methylation database combining text-mining and expert annotation.", NUCLEIC ACIDS RESEARCH JAN 2008, vol. 36, no. Database issue, January 2008 (2008-01-01), pages D842 - D846, XP002527543, ISSN: 1362-4962 *
RADMACHER ET AL., JOURNAL OF COMPUTATIONAL BIOLOGY, vol. 9, 2002, pages 505 - 511
RAMASWAMY ET AL., PNAS USA, vol. 98, 2001, pages 15149 - 54
SATO N ET AL., CANCER RES, vol. 63, no. 13, 2003, pages 3735 - 3742
SATO N ET AL: "Discovery of novel targets for aberrant methylation in pancreatic carcinoma using high-throughput microarrays", CANCER RESEARCH, AMERICAN ASSOCIATION FOR CANCER RESEARCH, BALTIMORE, MD., US, vol. 63, no. 13, 1 July 2003 (2003-07-01), pages 3735 - 3742, XP002318399, ISSN: 0008-5472 *
SHAMES D ET AL., PLOS MEDICINE, vol. 3, no. 12, 2006, pages 2244 - 2262
SHAMES DAVID S ET AL: "A genome-wide screen for promoter methylation in lung cancer identifies novel methylation markers for multiple malignancies.", PLOS MEDICINE DEC 2006, vol. 3, no. 12, December 2006 (2006-12-01), pages E486, XP002527542, ISSN: 1549-1676 *
SIMON ET AL., JOURNAL OF THE NATIONAL CANCER INSTITUTE, vol. 95, 2003, pages 14 - 18
WRIGHT G.W., SIMON R, BIOINFORMATICS, vol. 19, 2003, pages 2448 - 2455
WRIGHT G.W., SIMON R., BIOINFORMATICS, vol. 19, 2003, pages 2448 - 2455
YAN P S ET AL., CLIN CANCER RES, vol. 6, no. 4, 2000, pages 1432 - 1438
YAN P S ET AL: "CpG island arrays: an application toward deciphering epigenetic signatures of breast cancer", CLINICAL CANCER RESEARCH, THE AMERICAN ASSOCIATION FOR CANCER RESEARCH, US, vol. 6, no. 4, 1 April 2000 (2000-04-01), pages 1432 - 1438, XP002245955, ISSN: 1078-0432 *

Cited By (44)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP2396408B1 (en) * 2009-02-12 2017-09-20 CuRNA, Inc. Treatment of glial cell derived neurotrophic factor (gdnf) related diseases by inhibition of natural antisense transcript to gdnf
EP2396408A2 (en) * 2009-02-12 2011-12-21 Opko Curna, LLC Treatment of glial cell derived neurotrophic factor (gdnf) related diseases by inhibition of natural antisense transcript to gdnf
WO2010093906A2 (en) 2009-02-12 2010-08-19 Curna, Inc. Treatment of glial cell derived neurotrophic factor (gdnf) related diseases by inhibition of natural antisense transcript to gdnf
US10435753B2 (en) 2010-03-26 2019-10-08 Mayo Foundation For Medical Education And Research Methods for detecting colorectal cancer using a DNA marker of exfoliated epithelia and a fecal blood marker
US10260104B2 (en) 2010-07-27 2019-04-16 Genomic Health, Inc. Method for using gene expression to determine prognosis of prostate cancer
JP2012044924A (en) * 2010-08-26 2012-03-08 Japan Health Science Foundation Marker and method for cervix uteri cancer screening
WO2012031329A1 (en) * 2010-09-10 2012-03-15 Murdoch Childrens Research Institute Assay for detection and monitoring of cancer
US9797016B2 (en) 2010-10-19 2017-10-24 Oslo Universitetssykehus Hf Methods and biomarkers for detection of bladder cancer
US9506116B2 (en) 2013-03-14 2016-11-29 Mayo Foundation For Medical Education And Research Detecting neoplasm
CN105143465A (en) * 2013-03-14 2015-12-09 梅奥医学教育和研究基金会 Detecting neoplasm
US10683555B2 (en) 2013-03-14 2020-06-16 Mayo Foundation For Medical Education And Research Detecting neoplasm
US9982310B2 (en) 2013-03-14 2018-05-29 Mayo Foundation For Medical Education And Research Detecting neoplasm
US9994911B2 (en) 2013-03-14 2018-06-12 Mayo Foundation For Medical Education And Research Detecting neoplasm
US11821039B2 (en) 2013-03-14 2023-11-21 Mayo Foundation For Medical Education And Research Detecting neoplasm
EP3572529A2 (en) 2013-12-20 2019-11-27 AIT Austrian Institute of Technology GmbH Gene methylation based colorectal cancer diagnosis
WO2015091979A3 (en) * 2013-12-20 2015-08-13 Ait Austrian Institute Of Technology Gmbh Gene methylation based colorectal cancer diagnosis
EP2886659A1 (en) 2013-12-20 2015-06-24 AIT Austrian Institute of Technology GmbH Gene methylation based colorectal cancer diagnosis
US11987847B2 (en) 2014-03-31 2024-05-21 Mayo Foundation For Medical Education And Research Detecting colorectal neoplasm
US10301680B2 (en) 2014-03-31 2019-05-28 Mayo Foundation For Medical Education And Research Detecting colorectal neoplasm
US11365451B2 (en) 2014-03-31 2022-06-21 Mayo Foundation For Medical Education And Research Detecting colorectal neoplasm
US11078539B2 (en) 2014-03-31 2021-08-03 Mayo Foundation For Medical Education And Research Detecting colorectal neoplasm
US10883144B2 (en) 2014-03-31 2021-01-05 Mayo Foundation For Medical Education And Research Detecting colorectal neoplasm
US10184154B2 (en) 2014-09-26 2019-01-22 Mayo Foundation For Medical Education And Research Detecting cholangiocarcinoma
US10900090B2 (en) 2014-09-26 2021-01-26 Mayo Foundation For Medical Education And Research Detecting cholangiocarcinoma
US10704107B2 (en) 2015-02-27 2020-07-07 Mayo Foundation For Medical Education And Research Detecting gastrointestinal neoplasms
US11384401B2 (en) 2015-02-27 2022-07-12 Mayo Foundation For Medical Education And Research Detecting gastrointestinal neoplasms
US10030272B2 (en) 2015-02-27 2018-07-24 Mayo Foundation For Medical Education And Research Detecting gastrointestinal neoplasms
US10435755B2 (en) 2015-03-27 2019-10-08 Exact Sciences Development Company, Llc Detecting esophageal disorders
US11104960B2 (en) 2015-03-27 2021-08-31 Exact Sciences Development Company, Llc Detecting esophageal disorders
US10006093B2 (en) 2015-08-31 2018-06-26 Mayo Foundation For Medical Education And Research Detecting gastric neoplasm
US10597733B2 (en) 2015-08-31 2020-03-24 Mayo Foundation For Medical Education And Research Detecting gastric neoplasm
US11859254B2 (en) 2015-08-31 2024-01-02 Mayo Foundation For Medical Education And Research Detecting gastric neoplasm
US11674168B2 (en) 2015-10-30 2023-06-13 Exact Sciences Corporation Isolation and detection of DNA from plasma
US10913986B2 (en) 2016-02-01 2021-02-09 The Board Of Regents Of The University Of Nebraska Method of identifying important methylome features and use thereof
US11078543B2 (en) 2016-04-14 2021-08-03 Mayo Foundation For Medical Education And Research Detecting pancreatic high-grade dysplasia
US11542557B2 (en) 2016-04-14 2023-01-03 Mayo Foundation For Medical Education And Research Detecting colorectal neoplasia
US10370726B2 (en) 2016-04-14 2019-08-06 Mayo Foundation For Medical Education And Research Detecting colorectal neoplasia
WO2017212734A1 (en) * 2016-06-10 2017-12-14 国立研究開発法人国立がん研究センター Method for predicting effect of pharmacotherapy on cancer
CN109803978B (en) * 2016-09-07 2022-07-01 武汉华大吉诺因生物科技有限公司 Polypeptide and application thereof
CN109803978A (en) * 2016-09-07 2019-05-24 武汉华大吉诺因生物科技有限公司 Polypeptide and its application
US11697853B2 (en) 2017-02-28 2023-07-11 Mayo Foundation For Medical Education And Research Detecting prostate cancer
US10934592B2 (en) 2017-02-28 2021-03-02 Mayo Foundation For Medical Education And Research Detecting prostate cancer
US10975443B2 (en) 2017-11-30 2021-04-13 Mayo Foundation For Medical Education And Research Detecting breast cancer
US10934594B2 (en) 2017-11-30 2021-03-02 Mayo Foundation For Medical Education And Research Detecting breast cancer

Also Published As

Publication number Publication date
US10718026B2 (en) 2020-07-21
EP3124624B1 (en) 2019-11-20
EP2391729A1 (en) 2011-12-07
WO2010086388A1 (en) 2010-08-05
US20160281175A1 (en) 2016-09-29
CA2750978A1 (en) 2010-08-05
US20110287967A1 (en) 2011-11-24
EP3124624A2 (en) 2017-02-01
EP2391728A1 (en) 2011-12-07
EP2391729B1 (en) 2016-09-14
EP3124624A3 (en) 2017-04-12
EP2233590A1 (en) 2010-09-29
CA2750979A1 (en) 2010-08-05
US20110287968A1 (en) 2011-11-24

Similar Documents

Publication Publication Date Title
US10718026B2 (en) Methylation assay
Radpour et al. Hypermethylation of tumor suppressor genes involved in critical regulatory pathways for developing a blood-based test in breast cancer
JP7481804B2 (en) Detection of high-grade pancreatic dysplasia
EP3083993B1 (en) Gene methylation based colorectal cancer diagnosis
CN116064795A (en) Methods and kits for determining methylation status of differentially methylated regions
US11384401B2 (en) Detecting gastrointestinal neoplasms
EP1996729A2 (en) Molecular assay to predict recurrence of Dukes' B colon cancer
Fang et al. Genome-wide analysis of aberrant DNA methylation for identification of potential biomarkers in colorectal cancer patients
AU2020211461A1 (en) Detecting endometrial cancer
AU2018229294B2 (en) Detecting prostate cancer
WO2016044142A1 (en) Bladder cancer detection and monitoring
KR101504069B1 (en) Methods and Methylation Markers for detecting or diagnosing cholangiocarcinoma

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application

Ref document number: 10701378

Country of ref document: EP

Kind code of ref document: A1

WWE WIPO information: entry into national phase

Ref document number: 2750979

Country of ref document: CA

NENP Non-entry into the national phase

Ref country code: DE

WWE WIPO information: entry into national phase

Ref document number: 13146903

Country of ref document: US

WWE WIPO information: entry into national phase

Ref document number: 6552/DELNP/2011

Country of ref document: IN

REEP Request for entry into the european phase

Ref document number: 2010701378

Country of ref document: EP

WWE WIPO information: entry into national phase

Ref document number: 2010701378

Country of ref document: EP