AU2008100463A4

AU2008100463A4 - Genome-based Diagnosis for Cancer

Info

Publication number: AU2008100463A4
Application number: AU2008100463A
Authority: AU
Inventors: Ilene Chen; Ling-Hong Tseng
Original assignee: Individual
Current assignee: Individual
Priority date: 2008-05-21
Filing date: 2008-05-21
Publication date: 2008-07-03
Anticipated expiration: 2016-05-21

Description

Genome-based Diagnosis for Cancer

(Please find enclosed a research note entitled "Genome-based Diagnosis for Cancer".)

Genome-based Diagnosis for Cancer

Ling-Hong Tseng1*, Ilene Chen2

1 Department of Obstetrics and Gynecology, Chang Gung Memorial Hospital and University of Chang Gung School of Medicine, Taiwan
2 School of Electrical and Information Engineering, University of Sydney, Australia
* Ling-Hong Tseng and Ilene Chen contributed equally to this paper.

Correspondence should be addressed to Ilene Chen (ilene@ee.usyd.edu.au; TEL: +61 2 93517221; FAX: +61 2 93513847).

ABSTRACT

Cancer is a large cluster of diseases, of which at least one hundred distinct types have been recognized. The treatment of patients with cancer depends on establishing accurate diagnoses with a combination of clinical and pathological information. To determine whether the diagnosis of multiple common adult malignancies could be achieved by molecular classification, we subjected 218 tumor samples, spanning 14 common tumor types, and 90 normal tissue samples to DNA microarray gene expression analysis. The expression levels of 16,063 genes and expressed sequence tags were used to assess the accuracy of a multiclass classifier based on a Support Vector Machine algorithm. Overall classification accuracy was 78%, far exceeding the accuracy of random classification. Poorly differentiated cancers resulted in low-confidence predictions and could not be accurately classified according to their tissue of origin, indicating that they are molecularly distinct entities with dramatically different gene expression patterns compared with their well differentiated counterparts. Overall, these results demonstrate the feasibility of accurate, multiclass molecular cancer classification and suggest a strategy for clinical implementation of cancer diagnostics on this genomic scale.

INTRODUCTION

Cancer is a large cluster of diseases, of which at least one hundred distinct types have been recognized. Cancer classification relies on the subjective interpretation of both clinical and pathological information with an eye toward placing tumors in currently accepted categories based on the tissue of origin of the tumor. However, clinical information can be incomplete or misleading. In addition, there is a wide spectrum in cancer morphology, and many tumors are atypical or lack morphologic features that are useful for differential diagnosis. These difficulties can cause diagnostic confusion, prompting calls for mandatory second opinions in all surgical pathology cases. In the aggregate, these are limitations that may hinder patient care, add expenses, and confound the results of clinical trials.

Molecular diagnostics offer the promise of precise, objective, and systematic human cancer classification, but these tests are not widely used because characteristic molecular markers for most solid tumors have yet to be identified. Recently, DNA microarray-based tumor gene expression profiles have been used for cancer diagnosis. But studies have been limited to a few cancer types and have spanned multiple technology platforms, complicating comparison among different datasets. The feasibility of cancer diagnosis across all of the common malignancies based on a single reference database has not been explored. In addition, comprehensive gene expression databases have yet to be developed, and there are no established analytical methods capable of solving complex, multiclass, gene expression-based classification problems.

To address these challenges, we have created a gene expression database containing the expression profiles of 218 tumor samples representing 14 common human cancer classes. By using an innovative analytical method, we show that accurate multiclass cancer classification is possible, suggesting the feasibility of molecular cancer diagnosis by means of comparison with a commonly accessible catalog of gene expression profiles.

METHODS

Snap-frozen human tumor and normal tissue specimens, spanning 14 different tumor classes, were obtained from the National Cancer Institute/Cooperative Human Tissue Network, Massachusetts General Hospital Tumor Bank, Dana-Farber Cancer Institute, Brigham and Women's Hospital, Children's Hospital (all in Boston), and Memorial Sloan-Kettering Cancer Center (New York). Tissue was collected and studied under an anonymous discarded tissue protocol approved by the Dana-Farber Cancer Institute Institutional Review Board.

Initial diagnoses were made at university hospital referral centers by using all available clinical and pathological information. Tissues underwent centralized clinical and pathology review at the Dana-Farber Cancer Institute and Brigham and Women's Hospital or at Memorial Sloan-Kettering Cancer Center (reviewers including E.L.) to confirm the initial diagnosis of site of origin and histological type. All tumors were biopsy specimens from primary sites (except where noted) obtained before any treatment and were enriched in malignant cells but otherwise unselected.

Normal tissue RNA (Biochain, Hayward, CA) was from snap-frozen autopsy specimens collected through the International Tissue Collection Network.

"Hybridization targets" were prepared with RNA from whole tumors by using published methods. Targets were hybridized sequentially to oligonucleotide microarrays [Hu6800 and Hu35KsubA GeneChips (Affymetrix, Santa Clara, CA)] containing a total of 16,063 probe sets representing 14,030 GenBank and 475 The Institute for Genomic Research (TIGR) accession nos., and arrays were scanned by using standard Affymetrix protocols and scanners. For subsequent analysis, each probe set was considered as a separate gene. Expression values for each gene were calculated by using Affymetrix GENECHIP analysis software.

Of 314 tumor and 98 normal tissue samples processed, 218 tumor and 90 normal tissue samples passed quality control criteria and were used for subsequent data analysis. The remaining 104 samples either failed quality control measures of the amount and quality of RNA, as assessed by spectrophotometric measurement of OD and agarose gel electrophoresis, or yielded poor-quality scans. Scans were rejected if mean chip intensity exceeded 2 SDs from the average mean intensity for the entire scan set, if the proportion of "present" calls was less than 10%, or if microarray artifacts were visible. The resulting dataset contained approximately 5 million gene expression values (308 samples x 16,063 probe sets).
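The two numeric scan-rejection rules above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the function name `scans_passing_qc`, its parameters, and the use of the population SD are assumptions, and the visual check for microarray artifacts is not modeled.

```python
from statistics import mean, pstdev

def scans_passing_qc(chip_means, present_fractions, n_sd=2.0, min_present=0.10):
    """Return one boolean per scan: True if the scan passes both numeric rules.

    chip_means        -- mean intensity of each scan (hypothetical input)
    present_fractions -- fraction of "present" calls for each scan
    Assumption: the SD is the population SD over the whole scan set.
    """
    grand_mean = mean(chip_means)
    spread = pstdev(chip_means)
    keep = []
    for chip_mean, present in zip(chip_means, present_fractions):
        # Rule 1: mean chip intensity within n_sd SDs of the scan-set average.
        intensity_ok = abs(chip_mean - grand_mean) <= n_sd * spread
        # Rule 2: at least min_present of probe sets called "present".
        present_ok = present >= min_present
        keep.append(intensity_ok and present_ok)
    return keep
```

A scan is kept only when both conditions hold; scans failing either rule would be excluded before the expression matrix is assembled.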

Clustering. Gene expression data were subjected to a variation filter that excluded genes showing minimal variation across the samples as follows: genes were excluded if they exhibited less than 5-fold and 500 units absolute variation across the dataset after a threshold of 20 units and ceiling of 16,000 units was applied. Of 16,063 expression values considered, 11,322 passed this filter and were used for clustering.
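As a minimal sketch of the variation filter, assuming that a gene is retained only when it shows both at least 5-fold variation (max/min) and at least 500 units of absolute variation (max minus min) after clipping to the 20-unit threshold and 16,000-unit ceiling. The function names and the treatment of values exactly at the cutoffs are illustrative assumptions.

```python
def clip(value, floor=20.0, ceiling=16000.0):
    """Apply the threshold (floor) and ceiling to one expression value."""
    return max(floor, min(ceiling, value))

def passes_variation_filter(expression, min_fold=5.0, min_delta=500.0):
    """True if a gene varies enough across samples to be kept for clustering."""
    clipped = [clip(v) for v in expression]
    lowest, highest = min(clipped), max(clipped)
    # lowest is at least 20 after clipping, so the ratio is always defined.
    return highest / lowest >= min_fold and highest - lowest >= min_delta

def variation_filter(matrix):
    """Return the row indices (genes) of a genes x samples matrix that pass."""
    return [i for i, row in enumerate(matrix) if passes_variation_filter(row)]
```

Clipping first keeps very small or saturated readings from inflating the fold change.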

The dataset was normalized by standardizing each row (gene) to mean 0 and variance 1. Average-linkage hierarchical clustering was performed by using CLUSTER and TREEVIEW software. Self-organizing map analysis was performed by using our GENECLUSTER analysis package (available at www-genome.wi.mit.edu/MPR) (12).
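The row standardization step can be illustrated with a short sketch. The helper name `standardize_row` is hypothetical, and the use of the population variance and the handling of constant rows are assumptions not stated in the text.

```python
from statistics import mean, pstdev

def standardize_row(row):
    """Rescale one gene's expression values to mean 0 and variance 1."""
    mu = mean(row)
    sd = pstdev(row)  # assumption: population SD
    if sd == 0:
        # A constant gene carries no variation; map it to all zeros.
        return [0.0] * len(row)
    return [(value - mu) / sd for value in row]
```

After this step every gene contributes on the same scale, so distance-based clustering is not dominated by highly expressed genes.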

Support Vector Machine (SVM) Algorithm and One vs. All (OVA) Classification Scheme. The SVM experiments described in this article were performed by using an implementation of SVM-FU (available at www.ai.mit.edu/projects/cbcl). This linear SVM algorithm maximizes the distance between a hyperplane, w, and the closest samples to the hyperplane from two tumor classes, with the constraint that the samples from the two classes lie on separate sides of the hyperplane. This distance is calculated in 16,063-dimensional gene space, corresponding to the total number of expression values considered. This geometric property can be imposed by means of the following optimization problem: $\min_{w,b} \tfrac{1}{2}\lVert w \rVert^{2}$ subject to $y_i (w \cdot x_i + b) \geq 1$ for all $i$, where $x_i$ is the expression profile of training sample $i$ and $y_i \in \{-1, +1\}$ is its class label. An unknown test sample's position relative to the hyperplane determines its class, and the confidence of each SVM prediction is based on the distance of a test sample from the hyperplane. In going from binary to multiclass classification, we used an OVA approach (described in Results). Given m classes and m trained classifiers, a new sample takes the class of the classifier with the largest real valued output, class $= \arg\max_{i=1,\dots,m} f_i$, where $f_i$ is the real valued output of the ith classifier. A positive prediction strength corresponds to a test sample being assigned to a single class rather than to the "all other" class.
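The OVA decision rule can be sketched as below, assuming each trained classifier is available as a weight vector and bias. The dictionary layout and the names `decision_value` and `ova_predict` are illustrative and are not part of SVM-FU. The real valued output w·x + b is used directly as the confidence; it is proportional to the signed distance from the hyperplane.

```python
def decision_value(w, b, x):
    """Real valued output f(x) = w . x + b of one linear classifier."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def ova_predict(classifiers, x):
    """Assign x to the class whose OVA classifier gives the largest output.

    classifiers -- dict mapping class name to (w, b); hypothetical layout
    Returns (predicted class, its confidence, all per-class outputs).
    """
    scores = {name: decision_value(w, b, x) for name, (w, b) in classifiers.items()}
    best = max(scores, key=scores.get)
    return best, scores[best], scores
```

A negative winning confidence means every classifier placed the sample on its "all other" side, which corresponds to a low-confidence call.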

Recursive Feature Elimination. This feature selection method recursively removes features based on the absolute magnitude of each hyperplane element. Given microarray data with n genes per sample, each OVA SVM classifier outputs a hyperplane, w, that can be thought of as a vector with n elements, each corresponding to the expression of a particular gene. Assuming that the expression values of each gene have similar ranges, the absolute magnitude of each element $w_i$ of w determines its importance in classifying a sample, because $f(x) = \sum_{i=1}^{n} w_i x_i + b$ and the class label is $\operatorname{sign}[f(x)]$. Each OVA SVM classifier is first trained with all genes, then genes corresponding to $|w_i|$ in the bottom 10% are removed, and each classifier is retrained with the smaller gene set. This procedure is repeated iteratively to study prediction accuracy as a function of gene number.
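The elimination loop can be sketched independently of the underlying classifier. In the sketch below, `fit` is any function that returns one weight per feature; the `mean_difference_fit` stand-in (difference of class means) is an assumption used only to keep the example self-contained and is not an SVM. Dropping at least one feature per round is also an assumption for small feature counts.

```python
def mean_difference_fit(X, y):
    """Stand-in for a linear classifier: weight = class-mean difference per feature."""
    pos = [row for row, label in zip(X, y) if label == 1]
    neg = [row for row, label in zip(X, y) if label == -1]
    return [
        sum(r[j] for r in pos) / len(pos) - sum(r[j] for r in neg) / len(neg)
        for j in range(len(X[0]))
    ]

def recursive_feature_elimination(X, y, fit, drop_fraction=0.10, min_features=1):
    """Repeatedly retrain and drop the features with the smallest |w|.

    Returns a list of (active feature indices, weights) for each round.
    """
    active = list(range(len(X[0])))
    history = []
    while True:
        X_sub = [[row[j] for j in active] for row in X]
        w = fit(X_sub, y)
        history.append((list(active), list(w)))
        if len(active) <= min_features:
            break
        # Drop the bottom drop_fraction by |w|, but always at least one feature.
        n_drop = max(1, int(len(active) * drop_fraction))
        n_drop = min(n_drop, len(active) - min_features)
        order = sorted(range(len(active)), key=lambda k: abs(w[k]))
        to_drop = set(order[:n_drop])
        active = [j for k, j in enumerate(active) if k not in to_drop]
    return history
```

Evaluating prediction accuracy at each entry of the returned history gives accuracy as a function of gene number.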

Statistical Analysis. A class-proportional random predictor was used to determine the number of correct classifications that would be expected by chance for multiclass prediction. Associated P values were calculated based on the likelihood that the observed classification accuracy could be arrived at by chance, as described (14).
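One way to make the chance baseline concrete: a class-proportional random predictor that draws each label with probability equal to that class's frequency is correct with probability equal to the sum of squared class proportions. The binomial tail used below for the P value is an assumption for illustration; the text only states that P values were calculated "as described (14)".

```python
from math import comb

def chance_accuracy(class_counts):
    """Expected accuracy of a class-proportional random predictor."""
    total = sum(class_counts)
    return sum((count / total) ** 2 for count in class_counts)

def binomial_tail(n, k, p):
    """P(X >= k) for X ~ Binomial(n, p): chance of k or more correct calls."""
    return sum(comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(k, n + 1))
```

For 14 equally sized classes the expected chance accuracy is 1/14, which is the kind of baseline the observed multiclass accuracy is compared against.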

Genes that correlate with each tumor class were identified by sorting all of the genes on the array according to their signal-to-noise (S2N) values [$(\mu_0 - \mu_1)/(\sigma_0 + \sigma_1)$, where $\mu$ and $\sigma$ represent the mean and SD of expression, respectively, for each class], as published. For the permutation tests, 1,000 permutations of the sample labels (tumor type) were performed on the dataset, and the S2N ratio was recalculated for each gene for each class label permutation. A gene is considered a statistically significant class-specific marker if the observed S2N exceeds the permuted S2N at least 99% of the time (P < 0.01).
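A minimal sketch of the S2N statistic and the label-permutation test for a single gene, assuming a one-class-versus-rest comparison. The function names, the use of the population SD, the fixed random seed, and the handling of a zero denominator are assumptions made for illustration.

```python
import math
import random
from statistics import mean, pstdev

def s2n(values, labels, target):
    """Signal-to-noise ratio of one gene for class `target` vs. all other samples."""
    inside = [v for v, l in zip(values, labels) if l == target]
    outside = [v for v, l in zip(values, labels) if l != target]
    diff = mean(inside) - mean(outside)
    denom = pstdev(inside) + pstdev(outside)
    if denom == 0:
        # Assumption: treat zero within-class spread as an unbounded ratio.
        return 0.0 if diff == 0 else math.copysign(float("inf"), diff)
    return diff / denom

def permutation_p_value(values, labels, target, n_perm=1000, seed=0):
    """Fraction of label permutations whose S2N is at least the observed S2N."""
    rng = random.Random(seed)
    observed = s2n(values, labels, target)
    shuffled = list(labels)
    at_least = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if s2n(values, shuffled, target) >= observed:
            at_least += 1
    return at_least / n_perm
```

Under this sketch a gene would be called a class-specific marker when the returned fraction falls below the chosen cutoff.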

RESULTS

Fig. 1. Clustering of tumor gene expression data and identification of tumor-specific molecular markers.

Fig. 2. Multiclass classification scheme.

Fig. 3. Multiclass classification result.

Fig. 4. Multiclass classification error analysis.

Fig. 5. Multiclass classification as a function of gene number.

We have studied the gene expression profiles of 144 primary tumors by using oligonucleotide microarrays containing 16,063 oligonucleotide probe sets. Tumor samples were primarily solid tumors of epithelial origin, spanning 14 common tumor classes that account for approximately 80% of new cancer diagnoses in the U.S. (Fig. 1).

We explored two fundamentally different approaches to data analysis. The first, unsupervised learning, often referred to as clustering, allows the dominant structure in a dataset to dictate the separation of samples into clusters based on overall similarity in gene expression, without prior knowledge of sample identity. Fig. 1 shows the results of both hierarchical and self-organizing map clustering of this dataset.

Although some tumor types [lymphoma, leukemia, and central nervous system (CNS)] formed relatively discrete clusters with both methods, others, in particular the epithelial tumors, were largely intermixed. This finding indicates that unsupervised learning does not adequately capture the tissue of origin distinctions among these molecularly complex tumors. This result possibly reflects the large degree of biological variability in gene expression data. In addition, because tumor specimens were unselected with regard to percentage of stromal infiltration or inflammation, these clustering results might reflect contributions from nonneoplastic cellular elements to gene expression signatures that confound tissue of origin distinctions.

Alternatively, the hierarchical tree structure might reflect bona fide previously unrecognized relationships among tumors that transcend tissue of origin distinctions.

The second approach to this classification problem is to use a supervised learning method. This method involves "training" a classifier to recognize distinctions among the 14 clinically defined tumor classes based on gene expression patterns, and testing the accuracy of the classifier in a blinded manner. Supervised learning has been used to make pairwise distinctions with gene expression data [e.g., the distinction between acute lymphoblastic leukemia (ALL) and acute myeloid leukemia (AML); ref. 4].

However, making multiclass distinctions can be a considerably more difficult challenge. For this purpose, we devised an analytical scheme, depicted in Fig. 2. First, we divide the multiclass problem into a series of 14 OVA pairwise comparisons. Each test sample is presented sequentially to these 14 pairwise classifiers, each of which either claims or rejects that sample as belonging to a single class. This method results in 14 separate OVA classifications per sample, each with an associated confidence. Each test sample is assigned to the class with the highest OVA classifier confidence.

We evaluated several classification algorithms for these OVA pairwise classifiers, including weighted voting, k-nearest neighbors, and SVM, all of which yielded significant prediction accuracy. Because the SVM algorithm consistently out-performed other algorithms, these results are described in detail (Figs. 3, 4, and 5). The SVM algorithm was used recently for pairwise gene expression-based classification (17, 18) and has a strong theoretical foundation (19, 20). This algorithm considers all profiled genes to create descriptions of samples in this high-dimensional space, and then defines a hyperplane that best separates samples from two classes. The position of an unknown sample relative to the hyperplane determines its membership in one or the other class (e.g., "breast cancer" vs. "not breast cancer").

Fourteen separate OVA classifiers classify each sample. The confidence of each OVA SVM prediction is based on the distance of the test sample to each hyperplane, with a value of 0 indicating that a sample falls directly on a hyperplane. The overall multiclass classifier assigns a sample to the class with the highest confidence among the 14 pairwise OVA analyses.

The accuracy of this multiclass SVM-based classifier in cancer diagnosis was first evaluated by cross-validation in a set of 144 training samples. This method involves withholding 1 of the 144 primary tumor samples, building a predictor based only on the remaining samples, then predicting the class of the withheld sample. The process is repeated for each sample, and the cumulative error rate is calculated. As shown in Fig. 3, the majority (80%) of the 144 calls were high confidence, and these had an accuracy of 90%, using the patient's clinical diagnosis as the "gold standard." The remaining 20% of the tumors had low-confidence calls, and these predictions had an accuracy of 28%.

Overall, the multiclass prediction corresponded to the correct assignment for 78% of the tumors. For half of the errors, the correct classification corresponded to the second- or third-most confident OVA prediction.
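The leave-one-out cross-validation procedure can be sketched with the classifier abstracted behind `train` and `predict` callables. The nearest-centroid classifier used here is a simple stand-in chosen to keep the example self-contained; it is an assumption and not the OVA SVM used in the study.

```python
def train_centroids(X, y):
    """Stand-in training step: one mean expression profile per class."""
    centroids = {}
    for label in set(y):
        rows = [x for x, l in zip(X, y) if l == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return centroids

def predict_centroid(centroids, x):
    """Assign x to the class with the closest centroid (squared Euclidean)."""
    return min(
        centroids,
        key=lambda c: sum((a - b) ** 2 for a, b in zip(centroids[c], x)),
    )

def leave_one_out_accuracy(samples, labels, train, predict):
    """Withhold each sample in turn, train on the rest, and score the prediction."""
    correct = 0
    for i in range(len(samples)):
        train_x = samples[:i] + samples[i + 1:]
        train_y = labels[:i] + labels[i + 1:]
        model = train(train_x, train_y)
        if predict(model, samples[i]) == labels[i]:
            correct += 1
    return correct / len(samples)
```

Because the withheld sample never contributes to training, the resulting accuracy estimates how the classifier would perform on unseen tumors from the same classes.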

We confirmed these results by training the multiclass SVM classifier on the entire set of 144 samples and applying this classifier without further modification to an independent test set of 54 tumor samples. Overall prediction accuracy on this test set was 78%, a result similar to cross-validation accuracy and highly statistically significant when compared with class-proportional random prediction. The majority (78%) of these 54 predictions were high confidence, with an accuracy of 83%, whereas low-confidence calls were made on the remaining 22% of tumors with an accuracy of 58%. Again, for one-half of the errors, the correct classification corresponded to the second- or third-best prediction. Of note, classification of 100 random splits of a combined training and test dataset gave similar results, confirming the stability of prediction for this collection of samples.

Among these 54 test samples were 8 metastatic samples, 6 of which were correctly classified despite the classifier having been trained solely with gene expression data derived from primary tumors (P < 0.005 vs. random multiclass assignment). This finding implies that prediction is being driven by cancer-intrinsic gene expression patterns rather than by gene expression signatures derived from contaminating nonmalignant tissue elements. These results further indicate that many cancers retain their tissue of origin identity throughout metastatic evolution, suggesting that gene expression-based approaches to the diagnosis of clinically problematic metastases of unknown primary origin (21) may be feasible.

We next investigated the number of genes contributing to the high accuracy of the SVM OVA classifier. The SVM algorithm considers all 16,063 input genes and naturally utilizes all genes that contain information for each OVA distinction. Genes are assigned weights based on their relative contribution to the determination of each hyperplane, and genes that do not contribute to a distinction are weighted at zero.

Virtually all genes on the array were assigned weakly positive and negative weights in each OVA classifier (data not shown), indicating that thousands of genes potentially carry information relevant for the 14 OVA class distinctions. To determine whether the inclusion of this large number of genes was actually required for the observed high-accuracy predictions, we examined the relationship between classification accuracy and gene number by using recursive feature elimination. As shown in Fig. 5, maximal classification accuracy is achieved when the predictor utilizes all genes for each OVA distinction. Nevertheless, significant prediction can still be achieved by using smaller gene numbers. Alternate feature selection methods with different properties, such as S2N, radius-margin scaling, and gene shaving, also resulted in reduced classification accuracy (data not shown).

Our gene expression dataset is also useful for biological discovery. Many genes already in routine clinical use for cancer diagnosis were identified, including prostate-specific antigen (prostate cancer), carcinoembryonic antigen (colon cancer), CD20 (lymphoid cancers), S100 (melanoma), and estrogen receptor (uterine cancer). In addition, many previously unrecognized markers were discovered, the vast majority of which are tissue-specific genes, reflecting a recurring onco-developmental connection that has been described for many cancers. For example, a search for colorectal adenocarcinoma-specific markers revealed 27 that were statistically significant (P < 0.01) based on permutation testing. These genes include intestine-specific transcription factors, cytoskeletal and adhesion molecules, signaling molecules, and membrane-bound tumor markers. Notably, the two transcription factors, Cdx-1 and Bteb-2, are both downstream targets of the Wnt-1/β-Catenin signaling pathway, which is mutated in most colorectal cancers (25-27). The other statistically significant colon adenocarcinoma marker genes are thus also candidates for being under Wnt-1/β-Catenin control. This observation suggests that the gene expression database described here may be useful not only for cancer diagnosis, but also for the generation of new biological hypotheses into the pathogenesis of cancer.

The significant degree of shared gene expression between tumors and their normal tissue counterparts prompted us to ask whether supervised learning could be used to distinguish 210 primary tumors considered as a single class from a collection of normal tissues. By using the S2N metric, we were unable to identify single gene markers that are uniformly expressed only in cancer and not normal tissue.

Nevertheless, using the SVM algorithm in cross-validation, we were able to make this pairwise distinction with high accuracy, indicating the presence of a cancer-specific gene expression fingerprint common to all tumors.

We next considered the 28 samples that yielded low-confidence predictions in cross-validation, as these samples are generally misclassified by the multiclass predictor. We found that a large number (17 of 28) were moderately or poorly differentiated (high-grade) carcinomas. It can be difficult to classify such tumors with traditional methods because they often lack the characteristic morphological hallmarks of the organ from which they arise. It has been assumed that these tumors are nonetheless fundamentally molecularly similar to their better-differentiated counterparts, apart from a few differences that might account for their clinically aggressive nature. We directly tested this hypothesis by applying our multiclass classifier, trained on the original 144-tumor dataset, to an independent set of poorly differentiated tumors.

Gene expression data were collected from 20 poorly differentiated adenocarcinomas (14 primary and 6 metastatic), representing 5 tumor types: breast, lung, colon, ovary, and uterus. The technical quality of this dataset was indistinguishable from the other samples in the study. However, these tumors could not be accurately classified according to their tissues of origin, compared with the high overall accuracy seen with lower-grade tumors. Overall, only 6/20 samples were correctly classified, which is statistically no better than what one would expect by chance alone (P = 0.38). Because the classifier relies on the expression of thousands of similarly weighted tissue-specific molecular markers to determine the class of a tumor, these findings indicate that poorly differentiated tumors do not simply lack a few key markers of differentiation, but rather have fundamentally distinct gene expression patterns. This result has significant implications for the future management of patients with these cancers.

DISCUSSION

We report here the creation of a gene expression database from 308 common human cancers and normal tissues by using oligonucleotide microarrays and demonstrate that multiclass cancer diagnosis is feasible by means of comparison of an unknown sample to this reference database. Notably, molecularly complex solid tumors can be distinguished with this method despite the presence of varying proportions of nonneoplastic elements in clinical specimens. These findings suggest a new strategy for the future uniform and comprehensive molecular classification of primary and metastatic tumors.

The multiclass classifier that we describe is highly accurate, but is not perfect. That errors were evenly distributed throughout most solid tumor classes and that half of the errors were "close calls" imply that improved accuracy might be possible by increasing the number of samples from these classes in the training set, beyond the modest number used in this study.

Our findings also imply that information useful for multiclass tumor classification is encoded in complex gene expression patterns not adequately captured by a small number of genes. Although pairwise distinctions can be made between select tumor classes using fewer genes, multiclass distinctions among highly related tumor types (e.g., adenocarcinomas) are intrinsically more difficult. The effects of biological and measurement noise, contaminating nonmalignant tumor components, and inclusion of genetically heterogeneous samples within clinically defined tumor classes may all effectively decrease predictive power in the multiclass setting. Increased gene number likely allows for highly accurate prediction despite these factors. A greater variety and larger number of tumors with detailed clinico-pathological characterization will be required to fully explore the true limitations of gene expression-based multiclass classification. In addition, the SVM-based classification strategy used here may not be the optimal method for every type of multiclass problem. Other classification schemes, classification algorithms, or novel marker selection methods might also be useful for making multiclass distinctions.

The results for poorly differentiated tumors were unexpected. The poorly differentiated tumors studied here could not be classified according to their tissues of origin, despite the classifier's use of thousands of tissue-specific molecular markers. We had expected that these tumors would have fundamentally similar gene expression patterns compared with their well differentiated counterparts, with only minor differences. To the contrary, our data indicate that poorly differentiated tumors have a very different gene expression program. On a fundamental level, this finding raises the possibilities that poorly differentiated tumors arise from distinct cellular precursors, have different molecular mechanisms of transformation, or have unique natural histories in some other respect. This finding also has important clinical implications in that it suggests that these tumors should be classified distinctly, rather than lumped with well differentiated tumors arising from the same organ. Given the clinically aggressive nature of poorly differentiated cancers, some markers of poorly differentiated tumors might prove generally useful for predicting poorer clinical outcome.

Expression-based multiclass cancer classification is not a substitute for traditional diagnostics, but it represents a potentially important adjunct. Molecular characteristics of a tumor sample may remain intact despite atypical clinical or histological features.

Classification occurs through an algorithmic rather than subjective approach in which classification confidence is quantified. In addition, all samples are evaluated by a uniform method that can be standardized throughout the medical community.

Currently, diagnostic advances are disseminated into clinical practice in a slow and uneven fashion. By contrast, a centralized classification database may allow classification accuracy to rapidly improve as the classification algorithm "learns" from an ever-growing database. As robust molecular correlates of stage, natural history, and treatment response in multiple tumor classes are discovered (28, 29), computational methods for making multiclass distinctions using gene expression or proteomic data will take on increasing importance.

Clinical trials will be required to determine how best to integrate genomics-based diagnostics into standard patient care. This study provides insight into the form such molecular diagnosis might take. A future challenge is to directly apply this approach to the diagnosis of clinically ambiguous tumors. In addition, many have assumed that DNA microarrays will be useful for the high-throughput discovery of tumor-specific marker genes, but that clinical implementation will use routine immunohistochemistry or other traditional methods. Indeed, some of the markers that we describe may prove useful in this realm. However, our results indicate that optimal multiclass molecular classification may require gene numbers that are beyond the scope of traditional molecular diagnostics such as immunohistochemistry. This finding suggests that the successful clinical deployment of comprehensive molecular-based classification may require the introduction of highly parallel platforms such as DNA microarrays into the clinical setting.

REFERENCES

1. Ramaswamy S, Osteen R T, Shulman L N. Clinical Oncology. Lenhard R E, Osteen R T, Gansler T, editors. Atlanta: Am. Cancer Soc.; 2001. pp. 711-719.
2. Tomaszewski J E, LiVolsi V A. Cancer. 1999;86:2198-2200.
3. Connolly J L, Schnitt S J, Wang H H, Dvorak A M, Dvorak H F. Cancer Medicine. Holland J F, Frei E, Bast R C, Kufe D W, Morton D L, Weichselbaum R R, editors. Baltimore: Williams & Wilkins; 1997. pp. 533-555.
4. Golub T R, Slonim D K, Tamayo P, Huard C, Gaasenbeek M, Mesirov J P, Coller H, Loh M L, Downing J R, Caligiuri M A, et al. Science. 1999;286:531-537.
5. Alizadeh A A, Eisen M B, Davis R E, Ma C, Lossos I S, Rosenwald A, Boldrick J C, Sabet H, Tran T, Yu X, et al. Nature (London). 2000;403:503-511.
6. Bittner M, Meltzer P, Chen Y, Jiang Y, Seftor E, Hendrix M, Radmacher M, Simon R, Yakhini Z, Ben-Dor A, et al. Nature (London). 2000;406:536-540.
7. Perou C M, Sorlie T, Eisen M B, van de Rijn M, Jeffrey S S, Rees C A, Pollack J R, Ross D T, Johnsen H, Akslen L A, et al. Nature (London). 2000;406:747-752.
8. Hedenfalk I, Duggan D, Chen Y, Radmacher M, Simon R, Meltzer P, Gusterson B, Esteller M, Kallioniemi O P, Wilfond B, et al. N Engl J Med. 2001;344:539-548.
9. Khan J, Wei J S, Ringner M, Saal L H, Ladanyi M, Westermann F, Berthold F, Schwab M, Antonescu C R, Peterson C, et al. Nat Med. 2001;7:673-679.
10. Dhanasekaran S M, Barrette T R, Ghosh D, Shah R, Varambally S, Kurachi K, Pienta K J, Rubin M A, Chinnaiyan A M. Nature. 2001;412:822-826.
11. Eisen M B, Spellman P T, Brown P O, Botstein D. Proc Natl Acad Sci USA. 1998;95:14863-14868.
12. Tamayo P, Slonim D, Mesirov J, Zhu Q, Kitareewan S, Dmitrovsky E, Lander E S, Golub T R. Proc Natl Acad Sci USA. 1999;96:2907-2912.
13. Guyon I, Weston J, Barnhill S, Vapnik V. Mach Learn. 2002;46:389-422.
14. Hair J F, Anderson R E, Tatham R L, Black W C. Multivariate Data Analysis. Englewood Cliffs, NJ: Prentice-Hall; 1998.
15. Slonim D K. Proceedings of the Fourth Annual International Conference on Computational Molecular Biology. Tokyo: Universal Acad. Press; 2000. pp. 263-272.
16. Dasarathy B V. Nearest Neighbor (NN) Norms: NN Pattern Classification Techniques. Los Alamitos, CA: IEEE Comp. Soc. Press; 1991.
17. Brown M P, Grundy W N, Lin D, Cristianini N, Sugnet C W, Furey T S, Ares M Jr, Haussler D. Proc Natl Acad Sci USA. 2000;97:262-267.
18. Furey T S, Cristianini N, Duffy N, Bednarski D W, Schummer M, Haussler D. Bioinformatics. 2000;16:906-914.
19. Vapnik V N. Statistical Learning Theory. New York: Wiley; 1998.
20. Evgeniou T, Pontil M, Poggio T. Adv Comput Math. 2000;13:1-50.
21. Hainsworth J D, Greco F A. N Engl J Med. 1993;329:257-261.
22. Chapelle O, Vapnik V, Bousquet O, Mukherjee S. Mach Learn. 2002;30:161-190.
23. Hastie T, Tibshirani R, Eisen M, Alizadeh A, Levy R, Staudt L, Chan W C, Botstein D, Brown P. Genome Biol. 2000;1:RESEARCH0003.
24. Taipale J, Beachy P A. Nature (London). 2001;411:349-354.
25. Lickert H, Domon C, Huls G, Wehrle C, Duluc I, Clevers H, Meyer B I, Freund J N, Kemler R. Development (Cambridge, UK). 2000;127:3805-3813.
26. Ziemer L T, Pennica D, Levine A J. Mol Cell Biol. 2001;21:562-574.
27. Bienz M, Clevers H. Cell. 2000;103:311-320.
28. Scherf U, Ross D T, Waltham M, Smith L H, Lee J K, Tanabe L, Kohn K W, Reinhold W C, Myers T G, Andrews D T, et al. Nat Genet. 2000;24:236-244.
29. Staunton J E, Slonim D K, Coller H A, Tamayo P, Angelo M J, Park J, Scherf U, Lee J K, Reinhold W O, Weinstein J N, et al. Proc Natl Acad Sci USA. 2001;98:10787-10792.
30. Su A I, Welsh J B, Sapinoso L M, Kern S G, Dimitrov P, Lapp H, Schultz P G, Powell S M, Moskaluk C A, Frierson H F Jr, Hampton G M. Cancer Res. 2001;61:7388-7393.