WO2005068657A2

WO2005068657A2 - Genetically profiled cell lines (gpcl) and methods of utilizing same for genetic dissection of cellular phenotypes

Info

Publication number: WO2005068657A2
Application number: PCT/IL2005/000077
Authority: WO
Inventors: Ariel Darvasi; Sagiv Shifman
Original assignee: Yissum Research Development Company Of The Hebrew University Of Jerusalem
Priority date: 2004-01-20
Filing date: 2005-01-20
Publication date: 2005-07-28
Also published as: WO2005068657A3

Abstract

Genetically profiled cell cultures are provided along with a dataset representing the genetic profile of the cells. Also provided are methods and kits using same for the identification of the genetic basis of a trait, a disease or a state which can be directly or indirectly measured in a cell-based assay. In addition, the methods provided by the present invention can be used to: (i) identify novel drugs for the treatment and/or prevention of diseases (ii) determine population variation to drug responsiveness, efficacy, toxicity and adverse effects, and (iii) to optimize the drug lead process.

Description

GENETICALLY PROFILED CELL LINES (GPCL) AND METHODS OF UTILIZING SAME FOR GENETIC DISSECTION OF CELLULAR PHENOTYPES

FIELD AND BACKGROUND OF THE INVENTION

The present invention relates to genetically profiled cell cultures which can be used to identify the genetic basis of a trait, a disease or a state which can be directly or indirectly measured in a cell-based assay. In addition, the present invention can be used to: (i) identify novel drugs for the treatment and/or prevention of diseases (ii) determine population variation to drug responsiveness, efficacy, toxicity and adverse effects, and (iii) to optimize the drug lead process.

The process of drug approval involves testing potential drug molecules in cell cultures (i.e., in vitro) and laboratory animals (i.e., in vivo) and recording the positive and negative effects generated by the drug molecules on the cells or animals. Once a drug molecule is found to be useful and safe in laboratory animals it enters into clinical trials to determine therapeutic dosage and overall effectiveness. In these trials the positive and negative effects of the drug are compared with the effects generated by placebos. For most drug molecules, the overall effectiveness varies between 50 and 75 % in the entire patient population (Spear, B.B. et al. 2001. Clinical application of pharmacogenetics. Trends Mol Med 7: 201-204). In addition, the adverse effects generated by drug molecules in certain individuals are often unpredictable and can result in death; in 1994, 100,000 death cases in the USA were attributed to adverse reactions to various drugs (Lazarou J, et al., 1998. Incidence of adverse drug reactions in hospitalized patients: a meta-analysis of prospective studies. JAMA 279: 1200-1205). Thus, many drug candidate molecules fail during the drug discovery process, with enormous financial implications for the pharmaceutical industry (Jefferys, D.B. et al. 1998. New active substances authorized in the United Kingdom between 1972 and 1994. Br. J. Clin. Pharmacol. 45: 151-156). Twin and family studies revealed a high degree of heritability in the rate of drug metabolism (Vesell, E.S. 1989. Pharmacogenetic perspectives gained from twin and family studies. Pharmacol. Ther. 41:535-552), suggesting a genetic basis for drug effectiveness and adverse effects. Further studies have demonstrated that the differences among normal human subjects in the efficacy and safety of therapeutic agents are caused by genetic polymorphisms in drug-metabolizing enzymes, drug transporters, and drug receptors (Vesell, E.S. 2000. Advances in pharmacogenetics and pharmacogenomics. J. Clin. Pharmacol. 40: 930-8; Evans, W.E. and Relling, M.V. 1999. Pharmacogenomics: translating functional genomics into rational therapeutics. Science 286: 487-491; Ishikawa, T., et al., 2004. The genetic polymorphism of drug transporters: functional analysis approaches. Pharmacogenomics. 5: 67-99).

Thus, to increase drug effectiveness and to avoid drug-related toxicity and adverse effects, the drug molecule should either be designed specifically for a group of patients sharing the same genetic make-up, or be shown to have the desired effect across different genetic backgrounds. Several pharmacogenomic studies attempted to reveal the genetic basis of specific differences in susceptibility to drugs which were used to treat hypertension (Cadman, P.E., and O'Connor, D.T. 2003. Pharmacogenomics of hypertension. Curr. Opin. Nephrol. Hypertens. 12: 61-70), cardiovascular diseases (Mehraban, F., and Tomlinson, J.E. 2001. Application of industrial scale genomics to discovery of therapeutic targets in heart failure. Eur J Heart Fail. 3: 641-50), arrhythmias (Woosley, R.L. and Roden, D.M., 1987. Pharmacologic causes of arrhythmogenic actions of antiarrhythmic drugs. Am. J. Cardiol. 59: 19E-25E) and Alzheimer's disease (Issa AM, Keyserlingk EW. 2000. Apolipoprotein E genotyping for pharmacogenetic purposes in Alzheimer's disease: emerging ethical issues. Can J Psychiatry. 45: 917-22). However, disease-related pharmacogenomic studies require genotyping of many thousands of genetic markers such as single nucleotide polymorphisms (SNPs) in numerous patients for each disease in order to achieve statistical significance. There is thus a widely recognized need for, and it would be highly advantageous to have, a method of predicting drug effectiveness, toxicity and adverse effects devoid of the above limitations.

SUMMARY OF THE INVENTION

According to one aspect of the present invention there is provided a kit for predicting efficacy of drug treatment on an individual, the kit comprising: (a) a collection of cell cultures derived from cells of individuals being representative of at least one population, and; (b) a dataset representing a genetic profile of cells of each of the cell cultures. According to another aspect of the present invention there is provided a method of associating phenotypic characteristics with a genetic profile of an individual, the method comprising: (a) providing a collection of cell cultures derived from cells of individuals, the individuals being representative of at least one population; (b) qualifying genomes, transcriptomes and/or proteomes of cells of each of the cell cultures thereby generating a dataset representing genetic profiles of cells of the cell cultures; (c) exposing the cell cultures to an agent or a condition, and; (d) associating the dataset with an alteration in a phenotype in cells of at least one cell culture of the cell cultures resultant from the exposing to the agent or the condition thereby associating phenotypic characteristics with the genetic profile of the individual.
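
By way of non-limiting illustration only, the associating step (d) recited above can be pictured as a per-marker comparison between genotype and a measured phenotype across the cell cultures. The following Python sketch is not the method of the invention; it assumes (i) genotypes coded as the number of rare-allele copies (0, 1 or 2), (ii) a quantitative phenotype measured after exposure to the agent, and (iii) a simple Pearson correlation as the association statistic. The function name and the data are hypothetical.

```python
from statistics import mean


def genotype_phenotype_correlation(genotypes, phenotypes):
    """Pearson correlation between rare-allele counts and a measured phenotype.

    genotypes:  one value (0, 1 or 2) per cell culture for a single marker.
    phenotypes: one measured value per cell culture, in the same order.
    """
    if len(genotypes) != len(phenotypes):
        raise ValueError("one phenotype value is needed per genotyped culture")
    g_mean, p_mean = mean(genotypes), mean(phenotypes)
    cov = sum((g - g_mean) * (p - p_mean) for g, p in zip(genotypes, phenotypes))
    g_var = sum((g - g_mean) ** 2 for g in genotypes)
    p_var = sum((p - p_mean) ** 2 for p in phenotypes)
    if g_var == 0 or p_var == 0:
        raise ValueError("no variation in genotype or phenotype; correlation undefined")
    return cov / (g_var * p_var) ** 0.5


# Hypothetical data: six cell cultures, one marker, one drug-response readout.
marker_genotypes = [0, 1, 2, 0, 1, 2]
drug_response = [1.0, 2.0, 3.0, 1.0, 2.0, 3.0]
r = genotype_phenotype_correlation(marker_genotypes, drug_response)
```

In practice such a statistic would be computed for every marker in the dataset and corrected for multiple testing; the choice of statistic here is an assumption made for brevity.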

According to yet another aspect of the present invention there is provided a method of determining a predisposition of an individual having a specific genetic profile to a disease comprising: (a) establishing cell cultures from cells derived from individuals representative of at least one population; (b) qualifying genomes, transcriptomes and/or proteomes of cells of each of the cell cultures thereby generating a dataset representing genetic profiles of cells of the cell cultures; (c) associating the dataset with the presence or absence of the disease in at least one individual of the at least one population thereby determining the predisposition of the individual having a specific genetic profile to the disease. According to still another aspect of the present invention there is provided a collection of genetically profiled cell cultures comprising cell cultures derived from cells of individuals being representative of at least one population, and a dataset representing a genetic profile of cells of each of the cell cultures. According to further features in preferred embodiments of the invention described below, the cell cultures are capable of proliferation. According to still further features in the described preferred embodiments the cell cultures include cells selected from the group consisting of lymphoblastoid cells, fibroblastoid cells, and hematopoietic stem cells.
According to still further features in the described preferred embodiments the at least one population is selected from the group consisting of a Finnish population, a Sardinian population, an Ashkenazi Jew population, a South-West-Netherlands population, a Danish population, an Icelander population, a Swedish population, a Kizilcaboluk-Denizli Turkish population, a Brazilian Amondava population, a Newfoundland Canadian population, a Bedouin of Kuwait population, a Saguenay- Lac-Saint-Jean (SLSJ) Quebec Canadian population, a Salandra Italian population, a Caucasian population, an African population, an Asian population, and an Hispanic population. According to still further features in the described preferred embodiments the genetic profile represents a DNA polymorphism profile. According to still further features in the described preferred embodiments the DNA polymorphism is selected from the group consisting of single nucleotide polymorphism, micro-deletion, micro-insertion, short deletions and insertions, multinucleotide changes, short tandem repeats (STR), and variable number of tandem repeats (VNTR). According to still further features in the described preferred embodiments the genetic profile represents an RNA and/or protein expression and/or activity pattern. According to still further features in the described preferred embodiments the cell cultures are arranged in a form of a multi-well plate format. According to still further features in the described preferred embodiments the agent is a drug and/or a chemical agent. According to still further features in the described preferred embodiments the qualifying a genome is effected by identifying at least one sequence modification in a DNA sequence of the cells. 
According to still further features in the described preferred embodiments the identifying is effected by a method selected from the group consisting of restriction fragment length polymorphism (RFLP) analysis, allele specific oligonucleotide (ASO) analysis, Denaturing/Temperature Gradient Gel Electrophoresis (DGGE/TGGE), Single-Strand Conformation Polymorphism (SSCP) analysis, Dideoxy fingerprinting (ddF), pyrosequencing analysis, acycloprime analysis, Reverse dot blot, GeneChip microarrays, Dynamic allele-specific hybridization (DASH), Peptide nucleic acid (PNA) and locked nucleic acids (LNA) probes, TaqMan, Molecular Beacons, Intercalating dye, FRET primers, AlphaScreen, SNPstream, genetic bit analysis (GBA), Multiplex minisequencing, SNaPshot, MassEXTEND, MassArray, GOOD assay, Microarray miniseq, arrayed primer extension (APEX), Microarray primer extension, Tag arrays, Coded microspheres, Template-directed incorporation (TDI), fluorescence polarization, Colorimetric oligonucleotide ligation assay (OLA), Sequence-coded OLA, Microarray ligation, Ligase chain reaction, Padlock probes, Rolling circle amplification, and Invader assay.

According to still further features in the described preferred embodiments the phenotype is selected from the group consisting of a morphological phenotype, a viability phenotype, a molecular phenotype, a differentiation phenotype, a proliferation phenotype, a cell behavioral phenotype, a susceptibility phenotype, and a resistance phenotype.

According to still further features in the described preferred embodiments the phenotype is detected using histological stains, flow cytometry analysis, biochemical assays, immunological assays, cell proliferation assays, cell differentiation assays, and/or RNA assays. The present invention successfully addresses the shortcomings of the presently known configurations by providing genetically profiled cell cultures and a dataset representing the genetic profile of the cells. Unless otherwise defined, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. Although methods and materials similar or equivalent to those described herein can be used in the practice or testing of the present invention, suitable methods and materials are described below. In case of conflict, the patent specification, including definitions, will control. In addition, the materials, methods, and examples are illustrative only and not intended to be limiting.

BRIEF DESCRIPTION OF THE DRAWINGS

The invention is herein described, by way of example only, with reference to the accompanying drawings. With specific reference now to the drawings in detail, it is stressed that the particulars shown are by way of example and for purposes of illustrative discussion of the preferred embodiments of the present invention only, and are presented in the cause of providing what is believed to be the most useful and readily understood description of the principles and conceptual aspects of the invention. In this regard, no attempt is made to show structural details of the invention in more detail than is necessary for a fundamental understanding of the invention, the description taken with the drawings making apparent to those skilled in the art how the several forms of the invention may be embodied in practice. In the drawings:

FIG. 1 illustrates the approximate genome coverage as a function of the number of SNPs tested using random selection (light blue line) or optimal selection (dark blue line). Note that approximately 100,000-150,000 tagged SNPs might be sufficient to significantly cover the human genome.

FIGs. 2a-b illustrate the approximate number of cell cultures (i.e., sample size) required to significantly identify a given gene effect. The sample size required for analysis (N) for different LD levels between the tested marker and the functional allele (r² = 0.25, 0.5 or 1) is plotted as a function of the gene effect. The gene effect is expressed in units of odds ratio (OR, Figure 2a) or heritability (h², Figure 2b). The allele frequencies of the marker and the functional variant are assumed to be 0.25. An overall type I error (α) of 0.05 is assumed with 100,000 independent tests and power of 0.8. Figure 2a - calculated sample size for a dichotomous trait. The ratio between the numbers of cell lines exhibiting two different phenotypes is assumed to be 2:1. Figure 2b - sample size calculated for a quantitative trait locus with different extent of heritability (h²).
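
For orientation only, the dependence of sample size on the stated parameters (overall α of 0.05 across 100,000 tests, power of 0.8, LD level r² and heritability h²) can be sketched with a standard normal-approximation formula for a quantitative trait. This is an assumption made for illustration and is not necessarily the calculation used to generate Figure 2b; the Bonferroni correction and the function name are likewise assumptions.

```python
from statistics import NormalDist


def qtl_sample_size(h2, r2, alpha=0.05, n_tests=100_000, power=0.8):
    """Approximate number of cell cultures needed to detect a quantitative trait locus.

    h2: fraction of phenotypic variance explained by the functional variant.
    r2: LD (squared correlation) between the genotyped marker and that variant.
    """
    if not (0 < h2 * r2 < 1):
        raise ValueError("h2 * r2 must lie strictly between 0 and 1")
    per_test_alpha = alpha / n_tests                    # Bonferroni correction (assumed)
    z_alpha = NormalDist().inv_cdf(1 - per_test_alpha / 2)  # two-sided threshold
    z_beta = NormalDist().inv_cdf(power)
    explained = h2 * r2                                 # variance captured at the marker
    return (z_alpha + z_beta) ** 2 * (1 - explained) / explained


# Sample size for the three LD levels named in the figure legend, at h2 = 0.05.
sizes = {r2: qtl_sample_size(0.05, r2) for r2 in (0.25, 0.5, 1.0)}
```

Under this approximation the required sample size grows roughly in proportion to 1/r², which is why weaker LD between marker and functional allele calls for more cell cultures.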

DESCRIPTION OF THE PREFERRED EMBODIMENTS

The present invention relates to genetically profiled cell cultures which can be used to identify the genetic basis of a trait, a disease or a state which can be directly or indirectly measured by a cell-based assay. In addition, the present invention can be used to: (i) identify novel drugs for the treatment and/or prevention of diseases (ii) determine population variation to drug responsiveness, efficacy, toxicity and adverse effects, and (iii) to optimize the drug lead process. More particularly, the present invention relates to the generation of cell cultures from cells of individuals being representative of a population or populations, determining the genotype of single nucleotide polymorphisms in the genomes of the cells and measuring gene expression in response to treatment, to thereby identify the genetic basis of the trait and to determine the population variation of the trait. Before explaining at least one embodiment of the invention in detail, it is to be understood that the invention is not limited in its application to the details set forth in the following description or exemplified by the Examples. The invention is capable of other embodiments or of being practiced or carried out in various ways. Also, it is to be understood that the phraseology and terminology employed herein is for the purpose of description and should not be regarded as limiting.

Most drugs used in clinical applications have various degrees of effectiveness. Thus, while in some patients a specific drug can effectively reduce the symptoms of the disease, in other patients the same drug can either exert no positive effects or cause minor to severe negative side effects. Indeed, undesired side effects are not only difficult to predict, but can also lead to death in certain individuals. The reasons for such variability in drug responsiveness include the general health status of the treated individual, the degree of the disease, the genetic makeup of the individual and environmental factors.

Thus, it is beneficial in some cases of drug discovery and development to rely on pharmacogenomic studies. Pharmacogenomic studies apply genomics technologies such as gene sequencing, statistical genetics, and gene expression analysis to drugs in clinical development and on the market. The aim of pharmacogenomics is to identify or match therapeutic agents with specific genetic backgrounds. Such agents, which are designed according to a specific genetic make-up, are predicted to be more effective in treating the disease and are anticipated to cause fewer side effects. In order to develop pharmacogenomic drugs the genetic make-up characterizing a group of individuals has to be resolved. Specifically, the frequency of DNA polymorphisms within the group of individuals needs to be determined and correlated with the individuals' clinical records and response to drug treatments. Previous attempts to reveal the genetic basis for drug susceptibility involved screening of multiple patients for each disease tested (e.g., hypertension, cardiovascular disease, arrhythmias, Alzheimer's disease) and genotyping of many thousands of genetic markers [e.g., single nucleotide polymorphisms (SNPs)] in order to achieve statistical significance (Cadman, P.E., and O'Connor, D.T. 2003. Curr. Opin. Nephrol. Hypertens. 12: 61-70; Mehraban, F. and Tomlinson, J.E. 2001. Eur. J. Heart. Fail. 3: 641-50; Woosley, R.L. and Roden, D.M., 1987. Am. J. Cardiol. 59: 19E-25E; Issa AM, Keyserlingk EW. 2000. Can. J. Psychiatry. 45: 917-22).

While reducing the present invention to practice, the present inventors have devised a novel approach for identifying new drugs, predicting drug responsiveness and determining predisposition to a disease. This approach, which is described in detail in the Examples section, involves construction of a whole-genome profiled cell culture system which includes a genetic profile of all or part of the genes in the genome of a group of individuals being representative of a population or of several populations. Thus, according to one aspect of the present invention there is provided a collection of genetically profiled cell cultures of cells of individuals being representative of a population or populations, and a dataset representing the genetic profile of cells of each of the cell cultures.

As used herein the phrase "genetically profiled" refers to allelic determination of polymorphisms in the genome, transcriptome and/or proteome of the cells. As used herein the term "population" refers in general to any group of people which share a common feature. Such people can be of the same ethnicity, ancestry origin, place of birth, and the like.

As used herein, the phrase "individuals representative of a population" refers to individuals which represent a statistically significant sample of the population. Determining the number of individuals (sample size) necessary to represent a specific population is central to successful implementation of the present methodology and as such, parameters which must be taken into consideration when determining a sample size are described in detail herein below. The population can be any desired population. Preferably, the population used by the present invention is an isolated population which has been geographically, historically or ethnically isolated for several generations and displays an optimum ratio between homogeneity and heterogeneity. Isolated populations demonstrate an increased level of linkage disequilibrium (LD), reduced genetic heterogeneity, and low levels of false positives due to population stratification (Shifman, S. and Darvasi, A. 2001. The value of isolated populations. Nat. Genet. 28: 309-310). The process of population bottleneck, isolation, and rapid growth resulted in genetically homogeneous populations in which most of the rare variants were eliminated by genetic drift. It will be appreciated that since isolated populations share common variants with the world population they can be used to identify genes which are relevant to all populations. In addition, while the identification of rare variants affecting a disease is practically impossible with all populations, in isolated populations, where many of those variants are not present, the signal-to-noise ratio of the common variants is significantly larger. Thus, the use of an appropriate isolated population increases the statistical power of detection of correlation between a measured phenotype and a genetic variant.
Non-limiting examples of populations suitable for use with the present invention include the Finnish population, the Sardinian population, the Ashkenazi Jew population, the South-West-Netherlands population, the Danish population, the Icelander population, the Swedish population, the Kizilcaboluk-Denizli Turkish population, the Brazilian Amondava population, the Newfoundland Canadian population, the Bedouin of Kuwait population, the Saguenay-Lac-Saint-Jean (SLSJ) Quebec Canadian population, and the Salandra Italian population. It will be appreciated that the present invention can also be applied to defined heterogeneous populations such as Caucasians, Afro-Americans, Africans, Asians, Hispanics, and the like. The invention can also be used with a panel of samples with mixed ethnicities representative of the entire human species.

The cells of the cultures can be any cells which can be maintained under culturing and/or storage conditions. Preferably, the cells are capable of proliferating in culture. Non-limiting examples of cells suitable for use along with the present invention include lymphoblastoid cells, fibroblastoid cells, and hematopoietic stem cells. The cell cultures of the present invention can be prepared using methods known in the art.

For example, lymphoblastoid cell lines are prepared from lymphocytes isolated from peripheral blood samples. Briefly, 15 ml of blood samples are mixed with 25 ml of phosphate-buffered saline (PBS) and 10 ml of Ficoll-Hypaque solution (Bio-Whittaker) and centrifuged for 20 minutes at 800 x g. Following centrifugation the interface layer including the lymphocytes is collected and centrifuged again for 10 minutes at 600 x g. The resulting lymphocyte pellet is washed and resuspended in a complete culture medium at a concentration of 1 x 10⁶ cells/ml, and 5 ml of the lymphocyte solution are placed in a 25-cm² tissue culture flask kept in an upright position. Prior to transformation cells are incubated for 1 hour with 10 μg/ml of anti-CD3 antibody following which 0.5 ml of the EBV-containing B95-8 supernatant is added to the flask. Lymphocytes are cultured for 1-2 weeks in the upright position allowing transformation to occur. It will be appreciated that following transformation the color of the medium changes to an orange/yellow color and small clumps of cells are visible. The transformed lymphoblastoid cells are then propagated by adding 5 ml of fresh complete medium for another 2-3 days of culturing following which 5 ml of the supernatant are replaced with 5 ml of fresh complete medium until the total cell number exceeds 5 x 10⁶ cells. Cells are then transferred to 75-cm² flasks containing 50 ml of complete medium and incubated until cell concentration is > 1 x 10⁶ cells/ml. Lymphoblast cultures are maintained in culture by splitting the cultures to 1 x 10⁵ cells/ml and propagating the culture until a concentration of 1 x 10⁶ cells/ml is achieved.

Fibroblasts can be prepared from skin biopsies using methods known in the art. Briefly, a skin biopsy of approximately 0.1 cm² is taken, for example, from under the arm and is placed in tissue-culture medium (e.g., Ham's F12 medium, Gibco Laboratories, Grand Island, New York, USA). The skin tissue is cut into small tissue chunks and approximately 10 pieces are placed on a wet surface of a tissue culture flask. The tissue culture flask is turned upside down, closed tight and left for 24 hours at room temperature. The flask is then inverted back and the tissue chunks remain fixed to the bottom of the flask. Fresh tissue culture medium [e.g., Ham's F12 medium with 10 % fetal bovine serum (FBS), penicillin and streptomycin] is added and the flasks are incubated at 37 °C for approximately one week. Following one week of incubation fresh medium is added and subsequently changed every several days. After an additional two weeks in culture, a monolayer of fibroblasts emerges. The fibroblast cultures are propagated by trypsinization of the monolayer and passage into larger flasks.

Hematopoietic stem cells can be provided from bone marrow cells, mobilized peripheral blood cells or cord blood cells. Bone marrow cells can be obtained from the donor by standard bone marrow aspiration techniques known in the art, for example by aspiration of marrow from the iliac crest. Peripheral blood stem cells are obtained after stimulation of the donor with a single or several doses of a suitable cytokine, such as granulocyte colony-stimulating factor (G-CSF), granulocyte/macrophage colony-stimulating factor (GM-CSF) and interleukin-3 (IL-3). In order to harvest desirable amounts of stem cells from the peripheral blood cells, leukapheresis is performed by conventional techniques (Caspar, C.B. et al., 1993. Effective stimulation of donors for granulocyte transfusions with recombinant methionyl granulocyte colony-stimulating factor. Blood. 81: 2866-71) and the final product is tested for mononuclear cells. Cord blood cells are obtained from newborn individuals. Nucleated cells are separated from erythrocytes using methods known in the art such as a bag system and separation by agglutination (see International Publication No. WO 96/17514). CD34-expressing hematopoietic stem cells are enriched using combinations of density centrifugation, immuno-magnetic bead purification, affinity chromatography, and fluorescence-activated cell sorting (FACS). CD34+ enriched stem cells are then cultured in the presence of growth factors such as IL-3 and stem cell factor.

To obtain the genetic profile dataset, cells of each culture of the collection are subjected to allelic determination of DNA polymorphisms, RNA polymorphisms and/or protein polymorphisms.

The term "DNA polymorphism" refers to the occurrence of two or more genetically determined variant forms (alleles) of a particular nucleic acid at a frequency where the rarer (or rarest) form could not be maintained by recurrent mutation alone. According to preferred embodiments of the present invention the DNA polymorphisms of the present invention can be single nucleotide polymorphism (SNP), microdeletion and/or microinsertion of at least one nucleotide, short deletions and insertions, multinucleotide changes, short tandem repeats (STR), and variable number of tandem repeats (VNTR). Preferably, the DNA polymorphism of the present invention is a SNP. The genetic profile of DNA polymorphism indicates presence of specific alleles of a set of DNA polymorphisms in the genome of at least one of the cells. For example, in a biallelic SNP, a certain genome can be homozygote to the common variant form, i.e., having two copies of the common allele, heterozygote, i.e., having one copy of the common allele and one copy of the rare allele, or homozygote to the rare variant form, i.e., having two copies of the rare allele. Thus, for a set of SNPs, a specific genome can have a specific genetic profile. The genetic profile of RNA polymorphism refers to the expression pattern of various genes in the cells and indicates the presence or absence of specific RNA transcripts and/or the expression of specific allelic polymorphisms (i.e., RNA transcripts expressing one polymorphic allele and not the other). For example, the expression of certain genes in specific cell types (e.g., lymphocytes) is polymorphic, i.e., certain individuals express the gene in such cells and other individuals do not. In addition, certain genes are subjected to alternative splicing resulting in several RNA transcripts from a single gene. 
The expression of such alternatively spliced transcripts can be polymorphic in the population i.e., certain RNA splice forms are expressed in a specific cell type of one individual but not in the same cell type of another individual. The genetic profile of protein polymorphism refers to the expression and/or activity pattern of various proteins in the cells and indicates the presence or absence, rate of production, degradation, and/or activity of specific protein allelic polymorphisms. According to preferred embodiments the dataset of the present invention also includes information about the individuals from which the cells have been derived. Such information is obtained from questionnaires completed by the individual and including, but not limited to individual's age, sex, health status (i.e., the presence or absence of a disease), clinical characterization, medication in use including positive and negative effects.
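
Purely as an illustration of how the three genotype classes of a biallelic SNP described above (homozygous common, heterozygous, homozygous rare) might be stored in a dataset, the following Python sketch codes each genotype as the number of rare-allele copies. The coding scheme, the function names and the SNP identifiers are hypothetical and are not part of the disclosure.

```python
def encode_genotype(allele_pair, rare_allele):
    """Return the number of rare-allele copies (0, 1 or 2) in a diploid genotype."""
    if len(allele_pair) != 2:
        raise ValueError("a diploid genotype has exactly two alleles")
    return sum(1 for allele in allele_pair if allele == rare_allele)


def genetic_profile(genotypes, rare_alleles):
    """Map each SNP identifier to its rare-allele count for one cell culture."""
    return {snp: encode_genotype(pair, rare_alleles[snp]) for snp, pair in genotypes.items()}


# Hypothetical SNP identifiers and alleles, for illustration only.
rare_alleles = {"snp_1": "T", "snp_2": "G", "snp_3": "A"}
cell_line = {"snp_1": ("C", "T"), "snp_2": ("G", "G"), "snp_3": ("C", "C")}
profile = genetic_profile(cell_line, rare_alleles)
```

Here a value of 0 denotes two copies of the common allele, 1 a heterozygote, and 2 two copies of the rare allele, so that a set of SNPs yields one numeric profile per cell culture.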

According to preferred embodiments of the present invention, genetic profiling of DNA polymorphism is effected via SNP analysis. The type and number of SNPs which are genotyped according to the present invention depend on the SNP selection method, the population used, the statistical power and the gene effect as described hereinbelow and in Example 1 of the Examples section which follows. According to one preferred embodiment of the present invention, the SNPs used for genetic profiling according to the present invention are selected in constant intervals across the genome. For example, a selection of a SNP every 10 Kb in the genome translates to 300,000 SNPs, located at most 5 Kb away from any variant in the genome. The advantage of using this SNP selection method along with the present invention is that linkage disequilibrium (LD) information is not required and that no assumption is made as to the type or location of the disease-susceptibility variants. As used herein, the term "linkage disequilibrium" refers to the non-random association between alleles at different adjacent loci.
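
The arithmetic behind the constant-interval example can be sketched as follows. The genome size of approximately 3 x 10⁹ base pairs is an assumption implied by, but not stated in, the figures given above; the function name is hypothetical.

```python
GENOME_SIZE_BP = 3_000_000_000  # assumed approximate size of the human genome


def evenly_spaced_snps(interval_bp, genome_size_bp=GENOME_SIZE_BP):
    """Return (number of SNPs, largest distance from any position to the nearest SNP)."""
    if interval_bp <= 0:
        raise ValueError("interval must be positive")
    n_snps = genome_size_bp // interval_bp
    # A position midway between two neighbouring SNPs is the farthest from both.
    max_distance_bp = interval_bp // 2
    return n_snps, max_distance_bp


n_snps, max_distance = evenly_spaced_snps(10_000)
```

With one SNP every 10 Kb this gives 300,000 SNPs, and no position lies more than 5 Kb from the nearest selected SNP.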

According to another preferred embodiment of the present invention, the SNPs used for genetic profiling according to the present invention are selected from tag SNPs which capture most of the genetic variability in the population. As used herein the phrase "tag SNPs" refers to a subset of SNPs which are sufficient to capture the full haplotype information in regions of high LD i.e., in SNP haplotype blocks (Johnson, G.C. et al. 2001. Haplotype tagging for the identification of common disease genes. Nat. Genet. 29: 233-237; Ke, X. and Cardon, L.R, 2003. Efficient selective screening of haplotype tag SNPs. Bioinformatics. 19: 287-8). SNP haplotype blocks are groups of SNP locations that do not appear to recombine independently and that can be grouped together in blocks of SNPs. It will be appreciated that such a method uses only common haplotypes which are captured by the haplotype tagged SNPs (htSNPs) and that the haplotype distribution must be identified in advance. It has been estimated that 150,000 tag SNPs are sufficient to screen the genome with European samples (Goldstein, D.B., 2003. Genome scans and candidate gene approaches in the study of common diseases and variable drug responses. Trends Genet. 19: 615-622). Indeed, as is shown in Figure 1 and in the Examples section which follows, approximately 100,000-150,000 tagged SNPs are sufficient to cover 80-90 % of the human genome. On the other hand, if the SNPs are randomly selected at constant intervals, more than 200,000 SNPs will cover only 80 % of the human genome. Different populations have been shown to have different levels of LD (Shifman et al. Hum. Mol. Genet. 2003, 12: 771-6). LD-based approaches therefore require different numbers of SNPs. For example, on the order of 150,000 or 300,000 SNPs are required if the European population or the African-American population, respectively, is used.
On the other hand, the number of SNPs required can decline to 75,000 or even 15,000 if a small stable population such as the Saami of Scandinavia or another young and homogenous population is used.

According to yet another preferred embodiment of the present invention the SNPs used by the present invention are selected using a sequence-based (or gene-based) strategy. This strategy consists of identifying variants in genes, together with their regulatory regions, which may possess a functional effect. Thus, the sequence-based strategy requires prior knowledge concerning the type of variants which are predicted to be associated with the disease. It will be appreciated that a genetic profile based on this strategy requires the analysis of approximately 50,000-100,000 SNPs, in any population (Botstein, D., and Risch, N. 2003. Discovering genotypes underlying human phenotypes: past successes for Mendelian disease, future approaches for complex disease. Nat. Genet. 33 Suppl: 228-237). The public database available via the National Center for Biotechnology Information (http://www.ncbi.nlm.nih.gov) currently includes 50,000 SNPs in coding regions, of which approximately 25,000 SNPs are non-synonymous SNPs (i.e., cause a change in an amino acid) and are thus of high priority for use with the present invention. In addition, there are about 250,000 SNPs in untranslated regions (UTR) of gene transcripts. The SNPs in the UTR may possess functional properties (e.g., at the level of transcript stability), and thus are also considered for use with the method of the present invention.
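The tag SNP idea described above, in which a subset of SNPs stands in for others that are in high LD with it, can be illustrated with a simple greedy selection. This is a minimal sketch and not the procedure of the cited references: the greedy rule, the r² threshold of 0.8, the 0/1 haplotype coding and all names are assumptions introduced for illustration.

```python
def r_squared(a: list[int], b: list[int]) -> float:
    """r^2 between two biallelic SNPs from phased haplotypes coded 0/1."""
    n = len(a)
    pa = sum(a) / n
    pb = sum(b) / n
    pab = sum(1 for x, y in zip(a, b) if x == 1 and y == 1) / n
    denom = pa * (1 - pa) * pb * (1 - pb)
    if denom == 0:
        return 0.0  # a monomorphic SNP carries no LD information
    return (pab - pa * pb) ** 2 / denom


def greedy_tag_snps(haplotypes: dict[str, list[int]],
                    threshold: float = 0.8) -> list[str]:
    """Repeatedly pick the untagged SNP that tags the most untagged SNPs."""
    def tagged_by(snp: str, pool: set[str]) -> set[str]:
        return {t for t in pool
                if t == snp
                or r_squared(haplotypes[snp], haplotypes[t]) >= threshold}

    untagged = set(haplotypes)
    tags = []
    while untagged:
        # sorted() makes tie-breaking deterministic (alphabetical order).
        best = max(sorted(untagged), key=lambda s: len(tagged_by(s, untagged)))
        tags.append(best)
        untagged -= tagged_by(best, untagged)
    return tags


# Hypothetical haplotypes: snpA, snpB and snpC are in complete LD, snpD is not.
haps = {
    "snpA": [0, 0, 1, 1, 0, 1, 0, 1],
    "snpB": [0, 0, 1, 1, 0, 1, 0, 1],
    "snpC": [1, 1, 0, 0, 1, 0, 1, 0],
    "snpD": [0, 1, 0, 1, 0, 1, 0, 1],
}
print(greedy_tag_snps(haps))
```

In this toy data two tags suffice for four SNPs, which mirrors why populations with higher LD need fewer genotyped SNPs.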

The SNPs of the present invention can be identified using a variety of methods. One option is to determine the entire gene sequence of a PCR reaction product. Alternatively, a given segment of nucleic acid may be characterized on several other levels. At the lowest resolution, the size of the molecule can be determined by electrophoresis by comparison to a known standard run on the same gel. A more detailed picture of the molecule may be achieved by cleavage with combinations of restriction enzymes prior to electrophoresis, to allow construction of an ordered map. The presence of specific sequences within the fragment can be detected by hybridization of a labeled probe, or the precise nucleotide sequence can be determined by partial chemical degradation or by primer extension in the presence of chain-terminating nucleotide analogs. Following is a non-limiting list of SNP detection methods which can be used along with the present invention.

Restriction fragment length polymorphism (RFLP): This method uses a change in a single nucleotide (the SNP nucleotide) which modifies a recognition site for a restriction enzyme, resulting in the creation or destruction of an RFLP. Single nucleotide mismatches in DNA heteroduplexes are also recognized and cleaved by some chemicals, providing an alternative strategy to detect single base substitutions, generically named the "Mismatch Chemical Cleavage" (MCC) (Gogos et al., Nucl. Acids Res., 18:6807-6817, 1990). However, this method requires the use of osmium tetroxide and piperidine, two highly noxious chemicals which are not suited for use in a clinical laboratory.
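The RFLP principle described above, in which a single-nucleotide change creates or destroys a restriction site, can be sketched in a few lines. The sequences below are invented for illustration and the function names are hypothetical. EcoRI and its recognition sequence GAATTC are used only as a familiar example, and only one strand is scanned, which is adequate for a palindromic site.

```python
def site_positions(sequence: str, site: str) -> list[int]:
    """0-based start positions of every occurrence of a recognition site."""
    positions = []
    start = sequence.find(site)
    while start != -1:
        positions.append(start)
        start = sequence.find(site, start + 1)
    return positions


def rflp_effect(reference: str, variant: str, site: str) -> str:
    """Compare the number of recognition sites in two alleles."""
    n_ref = len(site_positions(reference, site))
    n_var = len(site_positions(variant, site))
    if n_var < n_ref:
        return "site destroyed"
    if n_var > n_ref:
        return "site created"
    return "no change"


ECORI_SITE = "GAATTC"
reference = "ACGTGAATTCAGGT"   # hypothetical allele carrying the site
variant = "ACGTGAGTTCAGGT"     # hypothetical allele with an A>G change in the site

print(site_positions(reference, ECORI_SITE))
print(rflp_effect(reference, variant, ECORI_SITE))
```

An allele that loses the site yields one longer fragment after digestion instead of two shorter ones, which is what the gel resolves.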

Allele specific oligonucleotide (ASO): In this method an allele-specific oligonucleotide (ASO) is designed to hybridize in proximity to the polymorphic nucleotide, such that a primer extension or ligation event can be used as the indicator of a match or a mismatch. Hybridization with radioactively labeled allele-specific oligonucleotides (ASO) has also been applied to the detection of specific SNPs (Conner et al., Proc. Natl. Acad. Sci., 80:278-282, 1983). The method is based on the differences in the melting temperature of short DNA fragments differing by a single nucleotide. Stringent hybridization and washing conditions can differentiate between mutant and wild-type alleles.

Denaturing/Temperature Gradient Gel Electrophoresis (DGGE/TGGE): Two other methods rely on detecting changes in electrophoretic mobility in response to minor sequence changes. One of these methods, termed "Denaturing Gradient Gel Electrophoresis" (DGGE) is based on the observation that slightly different sequences will display different patterns of local melting when electrophoretically resolved on a gradient gel. In this manner, variants can be distinguished, as differences in melting properties of homoduplexes versus heteroduplexes differing in a single nucleotide can detect the presence of SNPs in the target sequences because of the corresponding changes in their electrophoretic mobilities. The fragments to be analyzed, usually PCR products, are "clamped" at one end by a long stretch of G-C base pairs (30-80) to allow complete denaturation of the sequence of interest without complete dissociation of the strands. The attachment of a GC "clamp" to the DNA fragments increases the fraction of mutations that can be recognized by DGGE (Abrams et al., Genomics 7:463-475, 1990). Attaching a GC clamp to one primer is critical to ensure that the amplified sequence has a low dissociation temperature (Sheffield et al., Proc. Natl. Acad. Sci., 86:232-236, 1989; and Lerman and Silverstein, Meth. Enzymol., 155:482-501, 1987). Modifications of the technique have been developed, using temperature gradients (Wartell et al., Nucl. Acids Res., 18:2699-2701, 1990), and the method can be also applied to RNA:RNA duplexes (Smith et al., Genomics 3:217-223, 1988). Limitations on the utility of DGGE include the requirement that the denaturing conditions must be optimized for each type of DNA to be tested. Furthermore, the method requires specialized equipment to prepare the gels and maintain the needed high temperatures during electrophoresis. 
The expense associated with the synthesis of the clamping tail on one oligonucleotide for each sequence to be tested is also a major consideration. In addition, long running times are required for DGGE. The long running time of DGGE was shortened in a modification of DGGE called constant denaturant gel electrophoresis (CDGE) (Borrensen et al., Proc. Natl. Acad. Sci. USA 88:8405, 1991). CDGE requires that gels be performed under different denaturant conditions in order to reach high efficiency for the detection of SNPs. A technique analogous to DGGE, termed temperature gradient gel electrophoresis (TGGE), uses a thermal gradient rather than a chemical denaturant gradient (Scholz, et al., Hum. Mol. Genet. 2:2155, 1993). TGGE requires the use of specialized equipment which can generate a temperature gradient perpendicularly oriented relative to the electrical field. TGGE can detect mutations in relatively small fragments of DNA; therefore, scanning of large gene segments requires the use of multiple PCR products prior to running the gel.

Single-Strand Conformation Polymorphism (SSCP): Another common method, called "Single-Strand Conformation Polymorphism" (SSCP), was developed by Hayashi, Sekiya and colleagues (reviewed by Hayashi, PCR Meth. Appl., 1:34-38, 1991) and is based on the observation that single strands of nucleic acid can take on characteristic conformations in non-denaturing conditions, and these conformations influence electrophoretic mobility. The complementary strands assume sufficiently different structures that one strand may be resolved from the other. Changes in sequences within the fragment will also change the conformation, consequently altering the mobility and allowing this to be used as an assay for sequence variations (Orita, et al., Genomics 5:874-879, 1989).

The SSCP process involves denaturing a DNA segment (e.g., a PCR product) that is labeled on both strands, followed by slow electrophoretic separation on a non-denaturing polyacrylamide gel, so that intra-molecular interactions can form and not be disturbed during the run. This technique is extremely sensitive to variations in gel composition and temperature. A serious limitation of this method is the relative difficulty encountered in comparing data generated in different laboratories, under apparently similar conditions.

Dideoxy fingerprinting (ddF): Dideoxy fingerprinting (ddF) is another technique developed to scan genes for the presence of mutations (Liu and Sommer, PCR Methods Appl., 4:97, 1994). The ddF technique combines components of Sanger dideoxy sequencing with SSCP. A dideoxy sequencing reaction is performed using one dideoxy terminator and then the reaction products are electrophoresed on nondenaturing polyacrylamide gels to detect alterations in mobility of the termination segments as in SSCP analysis. While ddF is an improvement over SSCP in terms of increased sensitivity, ddF requires the use of expensive dideoxynucleotides and this technique is still limited to the analysis of fragments of the size suitable for SSCP (i.e., fragments of 200-300 bases for optimal detection of mutations).

In addition to the above limitations, all of these methods are limited as to the size of the nucleic acid fragment that can be analyzed. For the direct sequencing approach, sequences of greater than 600 base pairs require cloning, with the consequent delays and expense of either deletion sub-cloning or primer walking, in order to cover the entire fragment. SSCP and DGGE have even more severe size limitations. Because of reduced sensitivity to sequence changes, these methods are not considered suitable for larger fragments.
Although SSCP is reportedly able to detect 90 % of single-base substitutions within a 200 base-pair fragment, the detection drops to less than 50 % for 400 base-pair fragments. Similarly, the sensitivity of DGGE decreases as the length of the fragment reaches 500 base-pairs. The ddF technique, as a combination of direct sequencing and SSCP, is also limited by the relatively small size of the DNA that can be screened.

Pyrosequencing™ analysis (Pyrosequencing, Inc. Westborough, MA, USA): This technique is based on the hybridization of a sequencing primer to a single stranded, PCR-amplified, DNA template in the presence of DNA polymerase, ATP sulfurylase, luciferase and apyrase enzymes and the adenosine 5' phosphosulfate (APS) and luciferin substrates. In the second step the first of four deoxynucleotide triphosphates (dNTP) is added to the reaction and the DNA polymerase catalyzes the incorporation of the deoxynucleotide triphosphate into the DNA strand, if it is complementary to the base in the template strand. Each incorporation event is accompanied by release of pyrophosphate (PPi) in a quantity equimolar to the amount of incorporated nucleotide. In the last step the ATP sulfurylase quantitatively converts PPi to ATP in the presence of adenosine 5' phosphosulfate. This ATP drives the luciferase-mediated conversion of luciferin to oxyluciferin that generates visible light in amounts that are proportional to the amount of ATP. The light produced in the luciferase-catalyzed reaction is detected by a charge coupled device (CCD) camera and seen as a peak in a pyrogram™. Each light signal is proportional to the number of nucleotides incorporated.
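The relationship between nucleotide dispensation and peak height in the pyrosequencing description above can be sketched as a small simulation. This is an idealized illustration only: it assumes complete incorporation and complete removal of unused nucleotide between dispensations, and the function name, input sequence and dispensation order are hypothetical.

```python
def simulate_pyrogram(new_strand: str, dispensation: str) -> list[tuple[str, int]]:
    """Idealized pyrogram for a strand being synthesized.

    new_strand: bases to be added to the primer, in 5'->3' order
                (i.e., the complement of the template downstream of the primer).
    dispensation: order in which the four dNTPs are added, one at a time.
    Each peak height equals the number of nucleotides incorporated,
    so a homopolymer run of length k gives a peak of height k.
    """
    peaks = []
    position = 0
    for nucleotide in dispensation:
        incorporated = 0
        while position < len(new_strand) and new_strand[position] == nucleotide:
            incorporated += 1
            position += 1
        peaks.append((nucleotide, incorporated))
    return peaks


for nucleotide, height in simulate_pyrogram("GGAT", "ACGTACGT"):
    print(nucleotide, "#" * height)
```

At a SNP position, the two alleles produce peaks at different dispensations, and a heterozygous sample shows both peaks at roughly half height.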

Acycloprime™ analysis (Perkin Elmer, Boston, Massachusetts, USA): This technique is based on fluorescent polarization (FP) detection. Following PCR amplification of the sequence containing the SNP of interest, excess primer and dNTPs are removed through incubation with shrimp alkaline phosphatase (SAP) and exonuclease I. Once the enzymes are heat inactivated, the Acycloprime-FP process uses a thermostable polymerase to add one of two fluorescent terminators to a primer that ends immediately upstream of the SNP site. The terminator(s) added are identified by their increased FP and represent the allele(s) present in the original DNA sample. The Acycloprime process uses AcycloPol™, a novel mutant thermostable polymerase from the Archaeon family, and a pair of AcycloTerminators™ labeled with R110 and TAMRA, representing the possible alleles for the SNP of interest. AcycloTerminator™ non-nucleotide analogs are biologically active with a variety of DNA polymerases. Similarly to 2',3'-dideoxynucleotide-5'-triphosphates, the acyclic analogs function as chain terminators. The analog is incorporated by the DNA polymerase in a base-specific manner onto the 3'-end of the DNA chain, and since there is no 3'-hydroxyl, is unable to function in further chain elongation. It has been found that AcycloPol has a higher affinity and specificity for derivatized AcycloTerminators than various Taq mutants have for derivatized 2',3'-dideoxynucleotide terminators.

Reverse dot blot: This technique uses labeled sequence specific oligonucleotide probes and unlabeled nucleic acid samples. Activated primary amine-conjugated oligonucleotides are covalently attached to carboxylated nylon membranes. After hybridization and washing, the labeled probe, or a labeled fragment of the probe, can be released using oligomer restriction, i.e., the digestion of the duplex hybrid with a restriction enzyme.
Circular spots or lines are visualized colorimetrically after hybridization through the use of streptavidin horseradish peroxidase incubation followed by development using tetramethylbenzidine and hydrogen peroxide, or via chemiluminescence after incubation with avidin alkaline phosphatase conjugate and a luminous substrate susceptible to enzyme activation, such as CSPD, followed by exposure to x-ray film.

It will be appreciated that advances in the field of SNP detection have provided additional accurate, easy, and inexpensive large-scale SNP genotyping techniques, such as dynamic allele-specific hybridization (DASH, Howell, W.M. et al., 1999. Dynamic allele-specific hybridization (DASH). Nat. Biotechnol. 17: 87-8), microplate array diagonal gel electrophoresis [MADGE, Day, I.N. et al., 1995. High-throughput genotyping using horizontal polyacrylamide gels with wells arranged for microplate array diagonal gel electrophoresis (MADGE). Biotechniques. 19: 830-5], the TaqMan system (Holland, P.M. et al., 1991. Detection of specific polymerase chain reaction product by utilizing the 5'→3' exonuclease activity of Thermus aquaticus DNA polymerase. Proc Natl Acad Sci U S A. 88: 7276-80), as well as various DNA "chip" technologies such as the GeneChip microarrays (e.g., Affymetrix SNP chips) which are disclosed in U.S. Pat. Appl. No. 6,300,063 to Lipshutz, et al. 2001, which is fully incorporated herein by reference, Genetic Bit Analysis (GBA™) which is described by Goelet, P. et al. (PCT Appl. No. 92/15712), peptide nucleic acid (PNA, Ren B, et al., 2004. Nucleic Acids Res. 32: e42) and locked nucleic acids (LNA, Latorra D, et al., 2003. Hum. Mutat. 22: 79-85) probes, Molecular Beacons (Abravaya K, et al., 2003. Clin Chem Lab Med. 41: 468-74), intercalating dye [Germer, S. and Higuchi, R. Single-tube genotyping without oligonucleotide probes. Genome Res. 9:72-78 (1999)], FRET primers (Solinas A et al., 2001. Nucleic Acids Res.
29: E96), AlphaScreen (Beaudet L, et al., Genome Res. 2001, 11(4): 600-8), SNPstream (Bell PA, et al., 2002. Biotechniques. Suppl.: 70-2, 74, 76-7), Multiplex minisequencing (Curcio M, et al., 2002. Electrophoresis. 23: 1467-72), SnaPshot (Turner D, et al., 2002. Hum Immunol. 63: 508-13), MassEXTEND (Cashman JR, et al., 2001. Drug Metab Dispos. 29: 1629-37), GOOD assay (Sauer S, and Gut IG. 2003. Rapid Commun. Mass. Spectrom. 17: 1265-72), Microarray minisequencing (Liljedahl U, et al., 2003. Pharmacogenetics. 13: 7-17), arrayed primer extension (APEX) (Tonisson N, et al., 2000. Clin. Chem. Lab. Med. 38: 165-70), Microarray primer extension (O'Meara D, et al., 2002. Nucleic Acids Res. 30: e75), Tag arrays (Fan JB, et al., 2000. Genome Res. 10: 853-60), Template-directed incorporation (TDI) (Akula N, et al., 2002. Biotechniques. 32: 1072-8), fluorescence polarization (Hsu TM, et al., 2001. Biotechniques. 31: 560, 562, 564-8), Colorimetric oligonucleotide ligation assay (OLA, Nickerson DA, et al., 1990. Proc. Natl. Acad. Sci. USA. 87: 8923-7), Sequence-coded OLA (Gasparini P, et al., 1999. J. Med. Screen. 6: 67-9), Microarray ligation, Ligase chain reaction, Padlock probes, Rolling circle amplification, Invader assay (reviewed in Shi MM. 2001. Enabling large-scale pharmacogenetic studies by high-throughput mutation detection and genotyping technologies. Clin Chem. 47: 164-72), coded microspheres (Rao KN et al., 2003. Nucleic Acids Res. 31: e66) and MassArray (Leushner J, Chiu NH, 2000. Mol Diagn. 5: 341-80).
According to preferred embodiments of the present invention the SNPs used by the present invention are selected from the National Center for Biotechnology Information (http://www.ncbi.nlm.nih.gov) SNP database and included in the 12 CD-ROMs which are incorporated herein as detailed in the Examples section which follows: CD-ROM1 (SNPs from chromosomes 1 and 22), CD-ROM2 (SNPs from chromosomes 2, 21 and Y), CD-ROM3 (SNPs from chromosomes 3 and 14), CD-ROM4 (SNPs from chromosomes 4 and 17), CD-ROM5 (SNPs from chromosomes 5 and 18), CD-ROM6 (SNPs from chromosomes 6 and 19), CD-ROM7 (SNPs from chromosomes 7 and 12), CD-ROM8 (SNPs from chromosomes 8 and 9), CD-ROM9 (SNPs from chromosomes 10 and 11), CD-ROM10 [SNPs from chromosomes 13, 15, 16 and from more than one chromosome (multi)], CD-ROM11 [SNPs from chromosome 20 and from repetitive elements (masked)], CD-ROM12 (SNPs from chromosome X).

As is mentioned hereinabove, the genetic profile of the cells can also be effected via analysis of cell transcriptomes.

The expression level of the RNA in the cells of the present invention can be determined using methods known in the art.

Northern Blot analysis: This method involves the detection of a particular RNA in a mixture of RNAs. An RNA sample is denatured by treatment with an agent (e.g., formaldehyde) that prevents hydrogen bonding between base pairs, ensuring that all the RNA molecules have an unfolded, linear conformation. The individual RNA molecules are then separated according to size by gel electrophoresis and transferred to a nitrocellulose or a nylon-based membrane to which the denatured RNAs adhere. The membrane is then exposed to labeled DNA probes. Probes may be labeled using radio-isotopes or enzyme linked nucleotides. Detection may be by autoradiography, colorimetric reaction or chemiluminescence. This method allows both quantitation of an amount of particular RNA molecules and determination of its identity by a relative position on the membrane which is indicative of a migration distance in the gel during electrophoresis.

RT-PCR analysis: This method uses PCR amplification of relatively rare RNA molecules. First, RNA molecules are purified from the cells and converted into complementary DNA (cDNA) using a reverse transcriptase enzyme (such as an MMLV-RT) and primers such as oligo dT, random hexamers or gene specific primers. Then by applying gene specific primers and Taq DNA polymerase, a PCR amplification reaction is carried out in a PCR machine. Those of skill in the art are capable of selecting the length and sequence of the gene specific primers and the PCR conditions (i.e., annealing temperatures, number of cycles and the like) which are suitable for detecting specific RNA molecules. It will be appreciated that a semi-quantitative RT-PCR reaction can be employed by adjusting the number of PCR cycles and comparing the amplification product to known controls.

RNA in situ hybridization stain: In this method DNA or RNA probes are attached to the RNA molecules present in the cells. Generally, the cells are first fixed to microscopic slides to preserve the cellular structure and to prevent the RNA molecules from being degraded and then are subjected to hybridization buffer containing the labeled probe. The hybridization buffer includes reagents such as formamide and salts (e.g., sodium chloride and sodium citrate) which enable specific hybridization of the DNA or RNA probes with their target mRNA molecules in situ while avoiding non-specific binding of probe. Those of skill in the art are capable of adjusting the hybridization conditions (i.e., temperature, concentration of salts and formamide and the like) to specific probes and types of cells. Following hybridization, any unbound probe is washed off and the slide is subjected to either a photographic emulsion which reveals signals generated using radio-labeled probes or to a colorimetric reaction which reveals signals generated using enzyme-linked labeled probes.

In situ RT-PCR stain: This method is described in Nuovo GJ, et al. [Intracellular localization of polymerase chain reaction (PCR)-amplified hepatitis C cDNA. Am J Surg Pathol. 1993, 17: 683-90] and Komminoth P, et al. [Evaluation of methods for hepatitis C virus detection in archival liver biopsies. Comparison of histology, immunohistochemistry, in situ hybridization, reverse transcriptase polymerase chain reaction (RT-PCR) and in situ RT-PCR. Pathol Res Pract. 1994, 190: 1017-25]. Briefly, the RT-PCR reaction is performed on fixed cells by incorporating labeled nucleotides into the PCR reaction. The reaction is carried out using a specific in situ RT-PCR apparatus such as the laser-capture microdissection PixCell I LCM system available from Arcturus Engineering (Mountain View, CA).

Although cell profiling methods which analyze the genome or transcriptome are preferred for their accuracy and high throughput capabilities, it will be appreciated that the present invention can also utilize protein analysis tools for profiling the cells of the cultures.

Expression and/or activity level of proteins expressed in the cells of the cultures of the present invention can be determined using methods known in the art.

Enzyme linked immunosorbent assay (ELISA): This method involves fixation of a sample (e.g., fixed cells or a proteinaceous solution) containing a protein substrate to a surface such as a well of a microtiter plate. A substrate specific antibody coupled to an enzyme is applied and allowed to bind to the substrate. Presence of the antibody is then detected and quantitated by a colorimetric reaction employing the enzyme coupled to the antibody. Enzymes commonly employed in this method include horseradish peroxidase and alkaline phosphatase. If well calibrated and within the linear range of response, the amount of substrate present in the sample is proportional to the amount of color produced. A substrate standard is generally employed to improve quantitative accuracy.

Western blot: This method involves separation of a substrate from other proteins by means of an acrylamide gel followed by transfer of the substrate to a membrane (e.g., nylon or PVDF). Presence of the substrate is then detected by antibodies specific to the substrate, which are in turn detected by antibody binding reagents. Antibody binding reagents may be, for example, protein A, or other antibodies. Antibody binding reagents may be radiolabeled or enzyme linked as described hereinabove. Detection may be by autoradiography, colorimetric reaction or chemiluminescence. This method allows both quantitation of an amount of substrate and determination of its identity by a relative position on the membrane which is indicative of a migration distance in the acrylamide gel during electrophoresis.
Radio-immunoassay (RIA): In one version, this method involves precipitation of the desired protein (i.e., the substrate) with a specific antibody and radiolabeled antibody binding protein (e.g., protein A labeled with ¹²⁵I) immobilized on a precipitable carrier such as agarose beads. The number of counts in the precipitated pellet is proportional to the amount of substrate. In an alternate version of the RIA, a labeled substrate and an unlabeled antibody binding protein are employed. A sample containing an unknown amount of substrate is added in varying amounts. The decrease in precipitated counts from the labeled substrate is proportional to the amount of substrate in the added sample.

Fluorescence activated cell sorting (FACS): This method involves detection of a substrate in situ in cells by substrate specific antibodies. The substrate specific antibodies are linked to fluorophores. Detection is by means of a cell sorting machine which reads the wavelength of light emitted from each cell as it passes through a light beam. This method may employ two or more antibodies simultaneously.

Immunohistochemical analysis: This method involves detection of a substrate in situ in fixed cells by substrate specific antibodies. The substrate specific antibodies may be enzyme linked or linked to fluorophores. Detection is by microscopy and subjective or automatic evaluation. If enzyme linked antibodies are employed, a colorimetric reaction may be required. It will be appreciated that immunohistochemistry is often followed by counterstaining of the cell nuclei using, for example, Hematoxylin or Giemsa stain.

In situ activity assay: According to this method, a chromogenic substrate is applied on the cells containing an active enzyme and the enzyme catalyzes a reaction in which the substrate is decomposed to produce a chromogenic product visible by a light or a fluorescent microscope.

In vitro activity assays: In these methods the activity of a particular enzyme is measured in a protein mixture extracted from the cells. The activity can be measured in a spectrophotometer well using colorimetric methods or can be measured in a non-denaturing acrylamide gel (i.e., activity gel). Following electrophoresis the gel is soaked in a solution containing a substrate and colorimetric reagents. The resulting stained band corresponds to the enzymatic activity of the protein of interest. If well calibrated and within the linear range of response, the amount of enzyme present in the sample is proportional to the amount of color produced. An enzyme standard is generally employed to improve quantitative accuracy.

As is mentioned hereinabove, in order to represent a population, an appropriate number of individuals from which cells are derived has to be selected. The number of individuals from which cells are derived depends on the gene effect (i.e., how much the variant influences the trait), the allele frequencies of the marker allele (i.e., the tested allele of a specific SNP) and the functional allele (i.e., the allele contributing to the phenotype of the trait) and the LD level between the marker allele and the functional allele. For example, the identification of an association between a trait and a gene becomes easier as the sample size, the effect size and the LD between the marker allele and the functional allele increase.

According to preferred embodiments the measured trait of the present invention is a dichotomous trait, i.e., when the trait may have only two possible outcomes: trait type 1 and trait type 2.

As is described in Example 1 of the Examples section which follows, the statistical power for dichotomous traits is calculated as $\Pr\{Z < z_\beta\}$. As used herein "Pr" refers to probability, "Z" refers to the test statistic standardized under the alternative hypothesis and "$z_\beta$" refers to the test threshold standardized under the alternative hypothesis. The test threshold standardized under the alternative hypothesis ($z_\beta$) is calculated using the following equation:

where $m_1$ and $m_2$ are the proportions of the marker allele in cell cultures with trait type 1 and 2 respectively, $N$ is the sample size (i.e., the number of individuals representative of a population), and $x$ is the ratio between the number of cell cultures with trait type 1 and type 2 (McGinnis, R., et al., 2002. Power and efficiency of the TDT and case-control design for association scans. Behav. Genet. 32: 135-144; Schork, N.J. 2002. Power calculations for genetic association studies using estimated probability distributions. Am. J. Hum. Genet. 70: 1480-1489). $\bar{m}$ is the average proportion of the tested allele of the marker in both groups and is defined as $\bar{m} = \dfrac{x\,m_1 + m_2}{1 + x}$.

It will be appreciated that the statistical power is calculated for a given gene effect and for different levels of linkage disequilibrium (LD).
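As a rough illustration of how power depends on the sample size and on the difference between the marker allele proportions in the two trait groups, the following minimal sketch uses the textbook normal approximation for comparing two proportions. It is not the equation of Example 1. The function name, the two-sided significance level `alpha`, and the per-group numbers of sampled alleles `n1` and `n2` are assumptions introduced here.

```python
from math import sqrt
from statistics import NormalDist


def approx_power(m1: float, m2: float, n1: int, n2: int,
                 alpha: float = 0.05) -> float:
    """Approximate power of a two-sided test that m1 differs from m2.

    m1, m2: marker allele proportions in trait type 1 and type 2 groups.
    n1, n2: number of alleles sampled in each group (assumed inputs).
    Only the tail in the direction of the true difference is counted.
    """
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha / 2)
    # Pooled proportion, used for the standard error under the null.
    m_bar = (n1 * m1 + n2 * m2) / (n1 + n2)
    se_null = sqrt(m_bar * (1 - m_bar) * (1 / n1 + 1 / n2))
    se_alt = sqrt(m1 * (1 - m1) / n1 + m2 * (1 - m2) / n2)
    return std_normal.cdf((abs(m1 - m2) - z_alpha * se_null) / se_alt)


for n in (200, 500, 1000, 2000):
    print(n, round(approx_power(0.30, 0.36, n, n), 3))
```

Under this approximation the power rises with the sample size and with the difference between the two proportions, which matches the qualitative statement that larger effects and tighter LD require fewer individuals.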

As used herein the phrase "gene effect" refers to the proportion of genetic variance which can be explained by the gene as a part of the total phenotypic variance. The gene effect can be expressed by odds ratio (OR). As used herein the phrase "odds ratio" refers to the ratio of the odds of the risk factor in a diseased group and in a non-diseased (control) group. The odds are the ratio of the probability that the event of interest occurs relative to the probability that it does not. The OR is calculated as follows:

$$\mathrm{OR} = \frac{p_1\,(1 - p_2)}{(1 - p_1)\,p_2},$$

where $p_1$ and $p_2$ are the proportions of the functional allele in cell cultures with trait type 1 and 2 respectively. The LD level can be measured by

$$r^2 = \frac{(p_m - p\,m)^2}{p\,(1 - p)\,m\,(1 - m)},$$

where $p_m$ is the frequency of the haplotype containing both the functional allele and the tested allele of the marker and $p$ and $m$ are the frequencies of these alleles in the population.

As is shown in Figure 2a and in Example 1 of the Examples section which follows, using the above described equation for the gene effect (i.e., by calculating the OR) the sample size needed for a given statistical power (i.e., at different levels of LD) can be calculated. For example, if the marker is in tight association with the gene (i.e., LD equals 1) and the gene effect is high (e.g., OR equals 2) then the number of individuals' cells needed to identify such a gene is less than 300. On the other hand, if the marker and the gene are not in tight association (e.g., LD = 0.25), and the gene effect is relatively low (e.g., OR = 1.3) then the number of individuals' cells needed to identify such a gene is about 7000.

According to preferred embodiments of the present invention the measured trait of the present invention is a quantitative trait with a continuous distribution such as a normal distribution.
For quantitative traits, the mean trait can be compared between the different genotype groups by analysis of variance (ANOVA). The test is based on the significance calculated using the F distribution of the expected mean square between marker genotypes ($EMS_b$) against the expected mean square within marker genotypes ($EMS_w$). The gene effect may be expressed in terms of gene heritability ($h^2$), which is the proportion of genetic variance which can be explained by the gene as a part of the total phenotypic variance. The power of the F statistical test is predicted from the probability $\Pr\{F_{v_1,v_2}(\delta_F) > F_{\alpha;v_1,v_2}\}$, where $F_{v_1,v_2}(\delta_F)$ is the noncentral F variable with $v_1$ (number of genotype groups − 1) and $v_2$ ($N - 1$) degrees of freedom and the noncentrality parameter $\delta_F$ (Luo, Z.W. Detecting linkage disequilibrium between a polymorphic marker locus and a trait locus in natural populations. Heredity, 80: 198-208, 1998).
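The odds ratio and the r² measure of LD used above for dichotomous traits can be computed directly from allele and haplotype frequencies. The sketch below uses their conventional definitions; the function and argument names are hypothetical, and the example frequencies are invented for illustration.

```python
def odds_ratio(p1: float, p2: float) -> float:
    """Odds ratio from the functional allele proportions in the
    trait type 1 (p1) and trait type 2 (p2) groups."""
    return (p1 * (1 - p2)) / ((1 - p1) * p2)


def ld_r_squared(hap_freq: float, p: float, m: float) -> float:
    """r^2 between a functional allele (frequency p) and a marker allele
    (frequency m), given the frequency of the haplotype carrying both."""
    return (hap_freq - p * m) ** 2 / (p * (1 - p) * m * (1 - m))


# Invented example values.
print(odds_ratio(0.40, 0.25))          # functional allele enriched in group 1
print(ld_r_squared(0.30, 0.30, 0.30))  # marker always travels with the allele
print(ld_r_squared(0.09, 0.30, 0.30))  # alleles independent of each other
```

An r² near 1 means the genotyped marker is an almost perfect proxy for the functional allele, whereas an r² near 0 means the marker carries essentially no information about it.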

As is shown in Figure 2b and in Example 1 of the Examples section which follows, the sample size (i.e., the number of individuals required to generate cell cultures thereof) can be calculated for different degrees of LD and heritability levels.
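For a quantitative trait, the comparison of mean trait values between genotype groups can be illustrated with the textbook one-way ANOVA F statistic. This is a minimal sketch and not the power calculation of Example 1: it uses the conventional N − k within-group degrees of freedom, and the function name and the trait values are assumptions.

```python
def one_way_anova_f(groups: list[list[float]]) -> float:
    """F statistic: mean square between groups / mean square within groups."""
    k = len(groups)                       # number of genotype groups
    n = sum(len(g) for g in groups)       # total number of cell lines
    means = [sum(g) / len(g) for g in groups]
    grand_mean = sum(sum(g) for g in groups) / n
    ss_between = sum(len(g) * (mean - grand_mean) ** 2
                     for g, mean in zip(groups, means))
    ss_within = sum((value - mean) ** 2
                    for g, mean in zip(groups, means) for value in g)
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n - k)
    return ms_between / ms_within


# Hypothetical trait measurements for cell lines grouped by marker genotype.
trait_by_genotype = {
    "AA": [1.0, 2.0, 3.0],
    "AG": [2.0, 3.0, 4.0],
    "GG": [3.0, 4.0, 5.0],
}
print(one_way_anova_f(list(trait_by_genotype.values())))
```

A larger share of trait variance explained by the marker genotype (higher heritability or tighter LD) increases the between-group mean square and therefore the F statistic, which is why fewer individuals are needed in those cases.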

For example, the detection of a gene with 5-10 % heritability using a marker that is in tight association with the gene (i.e., LD equals 1) will require 350-750 individuals' cells. On the other hand, the detection of a gene with 30 % heritability using the same marker will require approximately 100 individuals' cells.

Following is a non-limiting example of a construction of a collection of genetically profiled cell cultures according to the teachings of the present invention. First, a group of 1000 individuals who are descendants of an isolated homogenous population such as the Finnish population is selected. The individuals complete a questionnaire including information regarding their age, sex and health status. The questionnaire also includes information regarding medications in use and the positive and negative effects generated by such medications.

Blood samples are collected from the selected individuals, and lymphocytes are isolated from the blood and transformed using Epstein-Barr Virus to generate lymphoblastoid cell lines. Once the cell lines are established, the DNA, RNA and protein molecules of the cells are extracted. For determination of the genotypes, 100,000 SNPs are randomly selected at constant intervals. It will be appreciated that 72 % of all the variants in the genome are expected to be in LD of r² > 0.333 with at least one of the 100,000 SNPs selected. Thus, there is a statistical power of 80 % to identify variants (i.e., genes) which have a 7 % effect on a specific trait in 72 % of the genome. Accordingly, the DNA which was prepared from the cells is subjected to a SNP analysis using, e.g., the Acycloprime™ technology, and the resulting genotype for each SNP is recorded in a dataset. Next, the genetic profile of the RNA in the cells is determined using, for example, RT-PCR analysis with primers specific for part or all of the genes in the genome. Additionally or alternatively, the genetic profile of part or all of the proteins in the genome is determined using, e.g., Western Blot and/or activity assays. The genetic profile of the RNA and/or protein is recorded in the dataset and assigned to each of the cell lines.

The profiled cell cultures of the present invention can form a part of a kit. Such a kit includes the cell cultures and the dataset representing the genetic profile of the cells. The dataset, which is provided on a magnetic, optical or optico-magnetic disk, also includes information on the population, the individuals and the types of cells included. It will be appreciated that the genetically profiled cell cultures of the present invention which are part of the kit can be contained within any suitable tissue culture container known in the art, such as flasks, plates, tubes, vials and/or an array. Preferably, the cells of the present invention are contained and arranged using a tissue culture array (Biran I, Walt DR. 2002. Optical imaging fiber-based single live cell arrays: a high-density cell assay platform. Anal. Chem. 74: 3046-54).

According to preferred embodiments of the present invention, the kit also includes a software application which is designed for accessing, analyzing and interpreting results included in the dataset of the present invention. For example, the software can enable the user to retrieve the genotyping status of each of the SNPs tested along with the expression profile of the gene from which the SNP is derived or with which it is associated. The software can also enable the user to correlate between a specific genetic profile and any trait measured in the cells or the health status of the individual from which the cells are derived. Thus, the information regarding drug effects reported by the individuals can also be correlated with the genetic profile of the same individuals.
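As a rough illustration of the kind of lookup such a software application might support, the following Python sketch stores SNP genotypes and per-gene expression values for each cell line and retrieves a SNP genotype together with the expression level of its associated gene. The class names, field names, SNP identifier and gene name are all hypothetical and are not part of the dataset described herein.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple


@dataclass
class CellLineProfile:
    """Genetic profile of one cell line (all field names are hypothetical)."""
    line_id: str
    genotypes: Dict[str, str] = field(default_factory=dict)     # SNP id -> genotype, e.g. "AG"
    expression: Dict[str, float] = field(default_factory=dict)  # gene name -> expression level
    health_status: Optional[str] = None                         # taken from the donor questionnaire


@dataclass
class ProfileDataset:
    """Collection of cell-line profiles plus a SNP-to-gene annotation."""
    profiles: Dict[str, CellLineProfile] = field(default_factory=dict)
    snp_to_gene: Dict[str, str] = field(default_factory=dict)

    def add(self, profile: CellLineProfile) -> None:
        self.profiles[profile.line_id] = profile

    def genotype_with_expression(
        self, line_id: str, snp_id: str
    ) -> Tuple[Optional[str], Optional[float]]:
        """Return (genotype at the SNP, expression of the gene associated with it)."""
        profile = self.profiles[line_id]
        gene = self.snp_to_gene.get(snp_id)
        expr = profile.expression.get(gene) if gene is not None else None
        return profile.genotypes.get(snp_id), expr


# Toy example with made-up identifiers and values.
dataset = ProfileDataset(snp_to_gene={"snp_0001": "GENE_A"})
dataset.add(CellLineProfile("line_001", {"snp_0001": "AG"}, {"GENE_A": 2.5}, "healthy"))
result = dataset.genotype_with_expression("line_001", "snp_0001")
```

A SNP that is absent from the annotation simply yields `(None, None)`, so a caller can distinguish "not genotyped" from an actual genotype without handling exceptions.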

The kit also includes the appropriate instructions for use and labels indicating FDA approval for use in vitro. Since the profiled cell cultures described hereinabove enable accurate association between phenotypic changes in cells of the culture and specific alleles, such profiled cell cultures or a kit containing same can be utilized to, for example, associate a specific genotype to responsiveness of an individual or a population to an agent (e.g., drug) or condition (e.g., irradiation).

Thus, according to another aspect of the present invention there is provided a method of associating phenotypic characteristics with a genetic profile of an individual.

As used herein the phrase "phenotypic characteristics" refers to a morphological phenotype (e.g., size and shape of cells), a viability phenotype (e.g., live and dead cells), a molecular phenotype (e.g., an expression pattern, an activity pattern), a differentiation phenotype (e.g., cell differentiation rate), a proliferation phenotype (e.g., cell proliferation rate), a cell behavioral phenotype (e.g., chemotactic movement), a susceptibility phenotype (e.g., susceptibility to pathogens or toxins), and a resistance phenotype (e.g., drug resistance).

The method is effected by exposing the genetically profiled cell cultures of the present invention to an agent or a condition and associating the resultant alteration in a phenotype in the cells with the dataset representing the genetic profile of the cells. According to preferred embodiments the agent of the present invention is a drug molecule. Such molecules can interact with receptors, antigens, secreted proteins and the like and cause an alteration in the phenotype in the cells.

As used herein, the phrase "alteration in a phenotype" refers to changes in the molecular or cellular phenotype of the cells. For example, the agent of the present invention can bind to a receptor on the cell and induce intracellular reactions which lead to upregulation or downregulation of certain genes. Thus, according to preferred embodiments the alteration in the phenotype is detected using histological stains (e.g., May-Grünwald-Giemsa stain, Giemsa stain, Papanicolaou stain, Hematoxylin stain and/or DAPI stain), flow cytometry analysis of membrane-bound markers using, e.g., a fluorescence-activated cell sorter (FACS), biochemical assays (e.g., using enzymatic assays), immunological assays (e.g., using specific antibodies), and/or RNA assays (e.g., using RT-PCR, Northern blot, RNA in situ hybridization and in situ RT-PCR).
It will be appreciated that the interaction of the agent of the present invention with the cells can result in changes in the size and shape of the cells and/or the cellular compartments (e.g., nucleus, cytoplasm, nucleolus) which can be visualized using a light microscope (e.g., for histologically stained cells), an inverted microscope (e.g., for unstained and/or live cells), a fluorescent microscope (e.g., for fluorescently-labeled cells) and/or an electron microscope (e.g., for gold-labeled markers). In addition, the interaction of the agent of the present invention with cellular components can result in changes in proliferation and/or differentiation processes of the cells. For example, activation of certain receptors on hematopoietic cells can result in increased proliferation of those cells. According to preferred embodiments the alteration in the phenotype is detected using cell proliferation assays [e.g., using an MTT-based cell proliferation assay (Hayon, T. et al., 2003. Appraisal of the MTT-based assay as a useful tool for predicting drug chemosensitivity in leukemia. Leuk Lymphoma. 44: 1957-62)], cell differentiation assays (Kohler, T., et al., 2000. Cytokine-driven differentiation of blasts from patients with acute myelogenous and lymphoblastic leukemia into dendritic cells. Stem Cells 18: 139-47), apoptosis assays [e.g., using the Ethidium homodimer-1 (Molecular Probes, Inc., Eugene, OR, USA), the TUNEL assay (Roche, Basel, Switzerland), the live/dead viability/cytotoxicity two-color fluorescence assay (L-3224, Molecular Probes)], flow cytometry analysis [Lodish, H. et al., "Molecular Cell Biology", W.H. Freeman (Ed.), 2000], and the like.

The association of a phenotype with a genotype is effected by comparing the genotypes of the individuals having the same phenotypic alterations as a result of an agent or a condition and identifying the patterns of genotypes which are common to such individuals.
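The comparison described above can be sketched, for a single SNP, as a tally of allele frequencies in cell lines that showed the phenotypic alteration versus those that did not. The following Python sketch is illustrative only: the function names, cell-line identifiers, genotypes and response labels are hypothetical toy values, and a real analysis would repeat the comparison over all genotyped SNPs and apply a significance test.

```python
from typing import Dict, List, Tuple


def allele_frequency(genotypes: List[str], allele: str) -> float:
    """Frequency of `allele` among diploid genotypes such as "AA", "AG", "GG"."""
    total_alleles = 2 * len(genotypes)
    if total_alleles == 0:
        raise ValueError("no genotypes supplied")
    return sum(g.count(allele) for g in genotypes) / total_alleles


def compare_groups(
    genotype_by_line: Dict[str, str],
    responded: Dict[str, bool],
    allele: str,
) -> Tuple[float, float]:
    """Allele frequency in lines that showed the alteration vs. lines that did not."""
    with_change = [g for line, g in genotype_by_line.items() if responded[line]]
    without_change = [g for line, g in genotype_by_line.items() if not responded[line]]
    return allele_frequency(with_change, allele), allele_frequency(without_change, allele)


# Toy data: genotype at one SNP, and whether each line's phenotype changed after exposure.
genotype_by_line = {"L1": "AA", "L2": "AG", "L3": "GG", "L4": "GG", "L5": "AG", "L6": "AA"}
responded = {"L1": True, "L2": True, "L3": False, "L4": False, "L5": False, "L6": True}

freq_responders, freq_non_responders = compare_groups(genotype_by_line, responded, "A")
```

In this toy data the "A" allele is carried on five of the six chromosomes in the responding lines and on one of six in the non-responding lines, which is the kind of shared genotype pattern the method looks for.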
It will be appreciated that the genetically profiled cell cultures of the present invention can be used to identify novel drugs for the treatment and/or prevention of complex diseases. For example, a candidate drug molecule can be applied to the cells and the alteration in the phenotype observed in the cells can be analyzed and correlated with the dataset representing the genetic profile of the cells. Thus, the positive and negative effects exerted by the drug molecule can be attributed to the genetic make-up of the individual from which the cells are derived. In addition, taking into consideration the health status (e.g., the presence or absence of a disease) of the individual from which the cells have been derived, the observed alteration in the phenotype can be used to predict the efficacy of drug treatment on an individual.

It will be appreciated that the collection of genetically profiled cell cultures of the present invention or the kit including same can be used to determine a predisposition of an individual having a specific genetic profile to a disease. As used herein, the term "predisposition" refers to a situation of susceptibility to develop a disorder or a disease. An individual with a predisposition to a disorder or disease is more likely than an individual without the predisposition to develop the disorder or disease. Methods of "determining a predisposition" of an individual to a disorder or disease are used in genetic counseling, especially prior to making a decision to conceive a child. Also, methods for detecting genotypes of SNPs in a particular region of the genome can be applied to the parents of an individual with a particular disorder. The method according to this aspect of the present invention is effected by associating the dataset representing the genetic profile of the cells with the occurrence of the disease.

As used herein the term "about" refers to ± 10%. Additional objects, advantages, and novel features of the present invention will become apparent to one ordinarily skilled in the art upon examination of the following examples, which are not intended to be limiting. Additionally, each of the various embodiments and aspects of the present invention as delineated hereinabove and as claimed in the claims section below finds experimental support in the following examples.

EXAMPLES

Reference is now made to the following examples, which together with the above descriptions, illustrate the invention in a non-limiting fashion.

Generally, the nomenclature used herein and the laboratory procedures utilized in the present invention include molecular, biochemical, microbiological and recombinant DNA techniques. Such techniques are thoroughly explained in the literature. See, for example, "Molecular Cloning: A Laboratory Manual" Sambrook et al., (1989); "Current Protocols in Molecular Biology" Volumes I-III Ausubel, R. M., ed. (1994); Ausubel et al., "Current Protocols in Molecular Biology", John Wiley and Sons, Baltimore, Maryland (1989); Perbal, "A Practical Guide to Molecular Cloning", John Wiley & Sons, New York (1988); Watson et al., "Recombinant DNA", Scientific American Books, New York; Birren et al. (Eds.) "Genome Analysis: A Laboratory Manual Series", Vols. 1-4, Cold Spring Harbor Laboratory Press, New York (1998); methodologies as set forth in U.S. Pat. Nos. 4,666,828; 4,683,202; 4,801,531; 5,192,659 and 5,272,057; "Cell Biology: A Laboratory Handbook", Volumes I-III Celis, J. E., ed. (1994); "Culture of Animal Cells - A Manual of Basic Technique" by Freshney, Wiley-Liss, N. Y. (1994), Third Edition; "Current Protocols in Immunology" Volumes I-III Coligan J. E., ed. (1994); Stites et al. (Eds.), "Basic and Clinical Immunology" (8th Edition), Appleton & Lange, Norwalk, CT (1994); Mishell and Shiigi (Eds.), "Selected Methods in Cellular Immunology", W. H. Freeman and Co., New York (1980); available immunoassays are extensively described in the patent and scientific literature, see, for example, U.S. Pat. Nos. 3,791,932; 3,839,153; 3,850,752; 3,850,578; 3,853,987; 3,867,517; 3,879,262; 3,901,654; 3,935,074; 3,984,533; 3,996,345; 4,034,074; 4,098,876; 4,879,219; 5,011,771 and 5,281,521; "Oligonucleotide Synthesis" Gait, M. J., ed. (1984); "Nucleic Acid Hybridization" Hames, B. D., and Higgins S. J., eds. (1985); "Transcription and Translation" Hames, B. D., and Higgins S. J., eds. (1984); "Animal Cell Culture" Freshney, R. I., ed. (1986); "Immobilized Cells and Enzymes" IRL Press, (1986); "A Practical Guide to Molecular Cloning" Perbal, B., (1984) and "Methods in Enzymology" Vol. 1-317, Academic Press; "PCR Protocols: A Guide To Methods And Applications", Academic Press, San Diego, CA (1990); Marshak et al., "Strategies for Protein Purification and Characterization - A Laboratory Course Manual" CSHL Press (1996); "Approaches to Gene Mapping in Complex Human Diseases" Jonathan L. Haines and Margaret A. Pericak-Vance eds., Wiley-Liss (1998); "Genetic Dissection of Complex Traits" D.C. Rao and Michael A. Province eds., Academic Press (1999); "Introduction to Quantitative Genetics" D.S. Falconer and Trudy F.C. Mackay, Addison Wesley Longman Limited (1996); all of which are incorporated by reference as if fully set forth herein. Other general references are provided throughout this document. The procedures therein are believed to be well known in the art and are provided for the convenience of the reader. All the information contained therein is incorporated herein by reference.

EXAMPLE 1

GENERATION OF GENETICALLY PROFILED CELL LINES

To generate genetically profiled cell lines, biological samples (e.g., blood) are collected from a large number of individuals and cell lines (e.g., lymphoblastoid cell cultures) are established. DNA, RNA and protein samples extracted from the established cell lines are subjected to genotyping and gene expression analysis, and the data generated using these analyses are arranged in databases using the appropriate software.

Subject collection - Subjects are randomly collected from a population with an optimum balance between homogeneity, which increases statistical power and linkage disequilibrium (LD), and heterogeneity, which increases the proportion of variation explained by genetic factors and the mapping resolution due to the breakdown of linkage disequilibrium. Suitable populations for use with the present invention include isolated Caucasian populations, such as the populations of Finland or Sardinia, which were founded by a small number of founders and expanded rapidly while being isolated from other populations. Heterogeneous populations, such as Caucasians, Africans, African-Americans, Asians and Hispanics, are also suitable for this invention.

Establishment of lymphoblastoid cell cultures - Lymphocytes are isolated from peripheral blood samples (15-45 ml) and transformed with Epstein-Barr Virus (EBV), which confers on them an unlimited expansion capacity in culture, essentially as described elsewhere (Bonifacino 1998, Current protocols in cell biology; John Wiley, New York).

Isolation of lymphocytes from peripheral blood samples - For the establishment of a lymphoblastoid cell line, peripheral blood samples (15 ml) are placed in 50-ml conical centrifuge tubes and mixed with 25 ml of phosphate-buffered saline (PBS) and 10 ml of Ficoll-Hypaque solution (Bio-Whittaker).
The blood solution is centrifuged in a JOUAN centrifuge (CR 4-22, JOUAN S.A., Saint-Herblain, France) using the "brake off" mode at 20 °C for 20 minutes at 800 x g. Following centrifugation three distinct layers and a red pellet are formed: the upper layer which includes the plasma and platelets, the interface layer which includes the lymphocytes, and the lower layer with the red pellet which includes the granulocyte and erythrocyte cell fractions. For lymphocyte isolation, the upper layer is aspirated and the interface band is collected along with a maximum of 5 ml of the lower layer into a 50-ml conical centrifuge tube. The volume of the lymphocyte layer collected from 2-3 aliquots (of 15 ml) of the same blood sample is adjusted to 50 ml using PBS and the lymphocyte solution is centrifuged using the "brake on" mode at 20 °C for 10 minutes at 600 x g. Following centrifugation the supernatant is aspirated and the pellet is resuspended in 10 ml PBS and centrifuged again at 20 °C for 15 minutes at 300 x g. Sedimented lymphocytes are resuspended again in Lymphocyte culture medium (LCM) and the number of viable cells is determined using trypan blue exclusion (Sigma, St Louis, MO, USA).

Preparation of Epstein-Barr Virus (EBV) transformed cells - B95-8 EBV-transformed marmoset cells are suspended in a complete culture medium at a concentration of 1 x 10⁶ cells/ml and 50 ml of cells are cultured in 75-cm² tissue culture flasks for 3 days until at least 95 % of cells are viable and in the exponential growth phase. Confluent cultures are centrifuged at 4 °C for 10 minutes at 600 x g, following which the supernatant is filtered through 0.45 μm sterile filters, divided into 0.6-ml aliquots and stored at -70 °C.

Transformation of isolated lymphocytes with EBV - Isolated lymphocytes are resuspended in a complete culture medium at a concentration of 1 x 10⁶ cells/ml, and 5 ml of the lymphocyte solution are placed in an upright 25-cm² tissue culture flask.
Prior to transformation, cells are incubated for 1 hour with 10 μg/ml of anti-CD3 antibody, following which 0.5 ml of the EBV-containing B95-8 supernatant is added to the flask. Lymphocytes are cultured for 1-2 weeks in the upright position, allowing transformation to occur. It will be appreciated that following transformation the color of the medium changes to an orange/yellow color and small clumps of cells are visible. The transformed lymphoblastoid cells are then propagated by adding 5 ml of fresh complete medium for another 2-3 days of culturing, following which 5 ml of the supernatant are replaced with 5 ml of fresh complete medium until the total cell number exceeds 5 x 10 cells. Cells are then transferred to 75-cm² flasks containing 50 ml of complete medium and incubated until cell concentration is > 1 x 10 cells/ml. Lymphoblast cultures are maintained in culture by splitting the cultures to 1 x 10⁵ cells/ml and propagating the culture until a concentration of 1 x 10⁶ cells/ml is achieved. Alternatively, for long-term storage, aliquots of 1 x 10⁶ cells/ml lymphoblastoid cells are mixed with an equal volume of a cryoprotective medium (available from BioWhittaker-Cambrex, Baltimore MD, USA) and are kept in liquid nitrogen until further use.

SNP selection - The three main strategies for SNP selection in whole-genome scans include a map-based selection (i.e., selection of SNPs at constant intervals); an LD-based selection (i.e., selection of tag SNPs based on LD patterns); and a sequence-based selection (i.e., selection of SNPs within genes which are likely to be functional). The three strategies differ in the prior knowledge required in each case.

The map-based strategy - This strategy requires only the location of the SNPs across the genome. The main advantage of this approach is that LD information is not required and that no assumption is made on the type or location of the disease-susceptibility variants.
The number of SNPs required under this approach is, however, higher than under the other strategies. For example, a selection of a SNP every 10 Kb in the genome translates to 300,000 SNPs, located at most 5 Kb away from any variant in the genome. Based on the LD information from the study by Shifman et al. (Linkage disequilibrium patterns of the human genome across populations. Hum. Mol. Genet. 2003, 12: 771-6), using a marker density of a single SNP every 10 Kb, 95 % of the common variants (>5 %) in the genome are, on average, in LD of r² > 0.6.

The LD-based strategy - This strategy is based on tag SNPs which capture most of the genetic variability. The few common haplotypes in a region of high LD (haplotype blocks) are represented by haplotype tag SNPs (htSNPs; Johnson, G.C. et al. 2001. Haplotype tagging for the identification of common disease genes. Nat. Genet. 29: 233-237). The main disadvantage of this approach is that only common haplotypes are captured by the htSNPs. In addition, the haplotype distribution must be identified in advance. It has been estimated that 150,000 tag SNPs will be sufficient to screen the genome with European samples (Goldstein, D.B., 2003. Genome scans and candidate gene approaches in the study of common diseases and variable drug responses. Trends Genet. 19: 615-622). Furthermore, different populations have been shown to have different levels of LD (Shifman et al. Hum. Mol. Genet. 2003, 12: 771-6). LD-based approaches therefore require different numbers of SNPs in different populations: on the order of 150,000 SNPs are required for Europeans, but double that number is required if Africans or African-Americans are used. On the other hand, with a young homogeneous population that number may be reduced by up to one order of magnitude (i.e., to 15,000).

The sequence-based (or gene-based) strategy - This strategy consists of identifying variants in genes, together with their regulatory regions, which may possess a functional effect.
Since regions or polymorphisms which do not appear to be functional are not tested, hidden regulatory regions or unknown genes can be missed. Although the sequence-based strategy requires prior knowledge concerning the type of variants which are predicted to be associated with the disease, it has been estimated that this strategy requires a less dense SNP distribution, between 50,000 and 100,000 SNPs (Botstein, D., and Risch, N. 2003. Discovering genotypes underlying human phenotypes: past successes for Mendelian disease, future approaches for complex disease. Nat. Genet. 33 Suppl: 228-237). Currently, the public databases include about 50,000 SNPs in the coding regions. About half of them are non-synonymous SNPs (i.e., cause a change in an amino acid) and are thus of high priority for gene-based screens. In addition, there are about 250,000 SNPs in untranslated regions (UTR) of gene transcripts. The SNPs in the UTR may possess functional properties, and thus are also considered in gene-based approaches.
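The arithmetic behind the map-based marker counts can be sketched as follows. The Python snippet below assumes a genome length of approximately 3 x 10⁹ base pairs (the value implied by the correspondence between a 10 Kb spacing and 300,000 SNPs); the function names are hypothetical. With evenly spaced markers, any position lies within half the spacing of its nearest marker.

```python
GENOME_SIZE_BP = 3_000_000_000  # assumed approximate genome length in base pairs


def snps_for_spacing(spacing_bp: int, genome_bp: int = GENOME_SIZE_BP) -> int:
    """Number of evenly spaced SNPs needed to place one marker every `spacing_bp`."""
    return genome_bp // spacing_bp


def spacing_for_snps(n_snps: int, genome_bp: int = GENOME_SIZE_BP) -> float:
    """Average interval (bp) between markers when `n_snps` are spread evenly."""
    return genome_bp / n_snps


def max_distance_to_nearest_snp(spacing_bp: float) -> float:
    """Largest possible distance (bp) from any position to its nearest marker."""
    return spacing_bp / 2


n_snps_10kb = snps_for_spacing(10_000)            # one SNP every 10 Kb
spacing_100k = spacing_for_snps(100_000)          # interval for a 100,000-SNP panel
worst_case_10kb = max_distance_to_nearest_snp(10_000)
```

Under the same assumed genome length, a 100,000-SNP panel corresponds to one marker roughly every 30 Kb, which is three times sparser than the 10 Kb map.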

Statistical power for different gene effects and sample sizes - The identification of variants which influence a measured trait depends on several parameters: (i) the sample size (i.e., the number of different cell cultures); (ii) the effect size (i.e., how much the variant influences the trait); (iii) the LD level between the tested variant and the functional variant; and (iv) the allele frequencies of the marker and the functional polymorphism. It will be appreciated that the identification of an association between a trait and a gene is easier when the sample size, the effect size and the LD increase. The measured trait can be a dichotomous trait, i.e., a trait with only two possible outcomes, or a quantitative trait with a continuous distribution such as a normal distribution. In the first case, the allele frequencies of all available SNPs are compared between the two groups of cell cultures stratified according to the trait. In the second case, the cell cultures are stratified by the different genotypes of all available SNPs and the mean trait is compared between the different genotype classes.

For dichotomous traits, the statistical power ($\Pr\{Z < Z_\beta\}$) can be calculated using the following equations [equations not reproduced in this text], where $m_1$ and $m_2$ are the proportions of the marker allele in cell cultures with trait type 1 and 2 respectively, N is the number of cell cultures, and x is the ratio between the number of cell cultures with trait type 1 and type 2 (McGinnis, R., et al., 2002. Power and efficiency of the TDT and case-control design for association scans. Behav. Genet. 32: 135-144; Schork, N.J. 2002. Power calculations for genetic association studies using estimated probability distributions. Am. J. Hum. Genet. 70: 1480-1489). $\bar{m}$ is the average proportion of the tested allele in both groups [definition not reproduced in this text]. The statistical power is calculated for a given gene effect and for different LD levels. The gene effect can be expressed by the odds ratio

$$\mathrm{OR} = \frac{p_1 \times (1 - p_2)}{(1 - p_1) \times p_2},$$

where $p_1$ and $p_2$ are the proportions of the functional allele in cell cultures with trait type 1 and 2 respectively. The LD level can be measured by

$$r^2 = \frac{(\mathit{pm} - p \times m)^2}{p \times (1 - p) \times m \times (1 - m)},$$

where $\mathit{pm}$ is the frequency of the haplotype containing both the functional allele and the tested allele of the marker, and $p$ and $m$ are the frequencies of these alleles in the population. One can use the above equations to calculate the sample size needed for a given statistical power (Figure 1a).
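The odds ratio and r² measures defined above can be computed directly, as in the Python sketch below. The power function is an assumption: it uses a common textbook normal approximation for comparing two allele proportions (two alleles per diploid cell culture, allele-weighted pooled proportion) and is not claimed to be the equation of the references cited above. All function and parameter names are hypothetical.

```python
from statistics import NormalDist


def odds_ratio(p1: float, p2: float) -> float:
    """OR = p1(1 - p2) / ((1 - p1) p2) for functional-allele proportions p1, p2."""
    return (p1 * (1 - p2)) / ((1 - p1) * p2)


def r_squared(pm: float, p: float, m: float) -> float:
    """LD between functional allele (freq p) and marker allele (freq m);
    pm is the frequency of the haplotype carrying both alleles."""
    d = pm - p * m
    return d * d / (p * (1 - p) * m * (1 - m))


def power_two_groups(m1: float, m2: float, n_cultures: int,
                     x: float = 1.0, alpha: float = 0.05) -> float:
    """Approximate power of a two-sided comparison of marker-allele proportions.

    Assumption: textbook normal approximation, not the source's own equation.
    x is the ratio of cultures with trait type 1 to cultures with trait type 2.
    """
    n1 = n_cultures * x / (1 + x)      # cultures with trait type 1
    n2 = n_cultures / (1 + x)          # cultures with trait type 2
    a1, a2 = 2 * n1, 2 * n2            # alleles sampled per group (diploid)
    m_bar = (a1 * m1 + a2 * m2) / (a1 + a2)   # assumed allele-weighted average
    se_null = (m_bar * (1 - m_bar) * (1 / a1 + 1 / a2)) ** 0.5
    se_alt = (m1 * (1 - m1) / a1 + m2 * (1 - m2) / a2) ** 0.5
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    return NormalDist().cdf((abs(m1 - m2) - z_alpha * se_null) / se_alt)


example_or = odds_ratio(0.4, 0.25)
example_r2 = r_squared(pm=0.3, p=0.3, m=0.3)   # marker and functional allele always co-occur
power_small = power_two_groups(0.4, 0.3, n_cultures=200)
power_large = power_two_groups(0.4, 0.3, n_cultures=1000)
```

Consistent with the text, the approximate power rises with the number of cell cultures for a fixed difference in allele proportions; evaluating the function over a range of `n_cultures` values gives a rough sample-size estimate for a target power.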

For quantitative traits, the mean trait can be compared between the different genotype groups by analysis of variance (ANOVA). The test is based on the significance calculated using the F distribution of the expected mean square between marker genotypes (EMS_β) against the expected mean square within marker genotypes (EMS_ω). The gene effect may be expressed in terms of gene heritability (h²), which is the proportion of genetic variance which can be explained by the gene as a part of the total phenotypic variance. The power of the F statistical test is predicted from the probability $\Pr\{F_{\nu_1,\nu_2}(\delta_F) > F_{\alpha;\nu_1,\nu_2}\}$, where $F_{\nu_1,\nu_2}(\delta_F)$ is the noncentral F-variable with $\nu_1$ (number of genotype groups - 1) and $\nu_2$ (N - 1) degrees of freedom and the noncentrality parameter $\delta_F$ [equation not reproduced in this text]. Following the equation of Luo, Z.W. (Detecting linkage disequilibrium between a polymorphic marker locus and a trait locus in natural populations. Heredity, 80: 198-208, 1998), the sample size can be calculated for different degrees of LD and heritability levels (see Figure 1b).
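The between-genotype versus within-genotype comparison can be illustrated with a minimal one-way ANOVA in Python. This is a sketch under stated assumptions: it uses the textbook within-group degrees of freedom (N minus the number of genotype groups), which may differ from the convention of the cited reference, and it reports the fraction of the total sum of squares explained by genotype as a crude sample analogue of the gene heritability. The function name and trait values are hypothetical.

```python
from typing import Dict, List, Tuple


def one_way_anova(groups: Dict[str, List[float]]) -> Tuple[float, float]:
    """Return (F statistic, fraction of trait variance explained by genotype)."""
    values = [v for g in groups.values() for v in g]
    n, k = len(values), len(groups)
    grand_mean = sum(values) / n

    # Between-genotype sum of squares: group means around the grand mean.
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups.values())
    # Within-genotype sum of squares: values around their own group mean.
    ss_within = sum((v - sum(g) / len(g)) ** 2 for g in groups.values() for v in g)

    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n - k)   # textbook within-group degrees of freedom
    return ms_between / ms_within, ss_between / (ss_between + ss_within)


# Toy trait measurements for cell cultures stratified by genotype at one SNP.
trait_by_genotype = {
    "AA": [1.0, 2.0, 3.0],
    "AG": [2.0, 3.0, 4.0],
    "GG": [3.0, 4.0, 5.0],
}
f_stat, variance_explained = one_way_anova(trait_by_genotype)
```

The resulting F statistic would then be compared against the F distribution with the corresponding degrees of freedom to obtain a significance level.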

It is appreciated that certain features of the invention, which are, for clarity, described in the context of separate embodiments, may also be provided in combination in a single embodiment. Conversely, various features of the invention, which are, for brevity, described in the context of a single embodiment, may also be provided separately or in any suitable subcombination. Although the invention has been described in conjunction with specific embodiments thereof, it is evident that many alternatives, modifications and variations will be apparent to those skilled in the art. Accordingly, it is intended to embrace all such alternatives, modifications and variations that fall within the spirit and broad scope of the appended claims. All publications, patents and patent applications mentioned in this specification are herein incorporated in their entirety by reference into the specification, to the same extent as if each individual publication, patent or patent application was specifically and individually indicated to be incorporated herein by reference. In addition, citation or identification of any reference in this application shall not be construed as an admission that such reference is available as prior art to the present invention.

CD-ROM Content

The following lists the file content of the 12 CD-ROMs which are enclosed herewith and filed with the application. File information is provided as: File name/byte size/date of creation/operating system/machine format.

CD-ROM1 (2 Files):
1. rs_ch1.txt / 582,840,000 bytes / October 14, 2003 / Microsoft Windows XP Professional / PC.
2. rs_ch22.txt / 131,514,000 bytes / October 14, 2003 / Microsoft Windows XP Professional / PC.

CD-ROM2 (3 Files):
1. rs_ch2.txt / 557,045,000 bytes / January 12, 2005 / Microsoft Windows XP Professional / PC.
2. rs_ch21.txt / 104,030,000 bytes / January 12, 2005 / Microsoft Windows XP Professional / PC.
3. rs_chY.txt / 26,879,000 bytes / January 12, 2005 / Microsoft Windows XP Professional / PC.

CD-ROM3 (2 Files):
1. rs_ch3.txt / 491,759,000 bytes / January 5, 2005 / Microsoft Windows XP Professional / PC.
2. rs_ch14.txt / 219,737,000 bytes / January 5, 2005 / Microsoft Windows XP Professional / PC.

CD-ROM4 (2 Files):
1. rs_ch4.txt / 500,009,000 bytes / January 12, 2005 / Microsoft Windows XP Professional / PC.
2. rs_ch17.txt / 177,032,000 bytes / January 12, 2005 / Microsoft Windows XP Professional / PC.

CD-ROM5 (2 Files):
1. rs_ch5.txt / 493,708,000 bytes / January 12, 2005 / Microsoft Windows XP Professional / PC.
2. rs_ch18.txt / 182,130,000 bytes / January 12, 2005 / Microsoft Windows XP Professional / PC.

CD-ROM6 (2 Files):
1. rs_ch6.txt / 460,888,000 bytes / October 15, 2003 / Microsoft Windows XP Professional / PC.
2. rs_ch19.txt / 192,608,000 bytes / October 14, 2003 / Microsoft Windows XP Professional / PC.

CD-ROM7 (2 Files):
1. rs_ch7.txt / 393,318,000 bytes / October 13, 2003 / Microsoft Windows XP Professional / PC.
2. rs_ch12.txt / 279,901,000 bytes / October 15, 2003 / Microsoft Windows XP Professional / PC.

CD-ROM8 (2 Files):
1. rs_ch8.txt / 372,008,000 bytes / October 13, 2003 / Microsoft Windows XP Professional / PC.
2. rs_ch9.txt / 326,637,000 bytes / October 15, 2003 / Microsoft Windows XP Professional / PC.

CD-ROM9 (2 Files):
1. rs_ch10.txt / 317,340,000 bytes / April 26, 2004 / Microsoft Windows XP Professional / PC.
2. rs_ch11.txt / 393,008,000 bytes / October 13, 2003 / Microsoft Windows XP Professional / PC.

CD-ROM10 (4 Files):
1. rs_ch13.txt / 193,858,000 bytes / October 15, 2003 / Microsoft Windows XP Professional / PC.
2. rs_ch15.txt / 204,143,000 bytes / October 15, 2003 / Microsoft Windows XP Professional / PC.
3. rs_ch16.txt / 211,145,000 bytes / October 14, 2003 / Microsoft Windows XP Professional / PC.
4. rs_chMulti.txt / 63,990,000 bytes / October 14, 2003 / Microsoft Windows XP Professional / PC.

CD-ROM11 (2 Files):
1. rs_ch20.txt / 336,914,000 bytes / October 14, 2003 / Microsoft Windows XP Professional / PC.
2. rs_cb_Masked.txt / 331,684,000 bytes / October 15, 2003 / Microsoft Windows XP Professional / PC.

CD-ROM12 (1 File):
1. rs_chX.txt / 416,441,000 bytes / October 14, 2003 / Microsoft Windows XP Professional / PC.

Claims

WHAT IS CLAIMED IS:

1. A kit for predicting efficacy of drug treatment on an individual, the kit comprising:

(a) a collection of cell cultures derived from cells of individuals being representative of at least one population, and;

(b) a dataset representing a genetic profile of cells of each of said cell cultures.

2. The kit of claim 1, wherein said cell cultures are capable of proliferation.

3. The kit of claim 1, wherein said cell cultures include cells selected from the group consisting of lymphoblastoid cells, fibroblastoid cells, and hematopoietic stem cells.

4. The kit of claim 1, wherein said at least one population is selected from the group consisting of a Finnish population, a Sardinian population, an Ashkenazi Jew population, a South-West-Netherlands population, a Danish population, an Icelander population, a Swedish population, a Kizilcaboluk-Denizli Turkish population, a Brazilian Amondava population, a Newfoundland Canadian population, a Bedouin of Kuwait population, a Saguenay-Lac-Saint-Jean (SLSJ) Quebec Canadian population, a Salandra Italian population, a Caucasian population, an African population, an Asian population, and an Hispanic population.

5. The kit of claim 1, wherein said genetic profile represents a DNA polymorphism profile.

6. The kit of claim 5, wherein said DNA polymorphism is selected from the group consisting of single nucleotide polymorphism, micro-deletion, micro-insertion, short deletions and insertions, multinucleotide changes, short tandem repeats (STR), and variable number of tandem repeats (VNTR).

7. The kit of claim 1, wherein said genetic profile represents an RNA and/or protein expression and/or activity pattern.

8. The kit of claim 1, wherein said cell cultures are arranged in a form of a multi-well plate format.

9. A method of associating phenotypic characteristics with a genetic profile of an individual, the method comprising:

(a) providing a collection of cell cultures derived from cells of individuals, said individuals being representative of at least one population;

(b) qualifying genomes, transcriptomes and/or proteomes of cells of each of said cell cultures thereby generating a dataset representing genetic profiles of cells of said cell cultures;

(c) exposing said cell cultures to an agent or a condition, and;

(d) associating said dataset with an alteration in a phenotype in cells of at least one cell culture of said cell cultures resultant from said exposing to said agent or said condition thereby associating phenotypic characteristics with the genetic profile of the individual.

10. The method of claim 9, wherein said agent is a drug and/or a chemical agent.

11. The method of claim 9, wherein said cell cultures are capable of proliferation.

12. The method of claim 9, wherein said cell cultures include cells selected from the group consisting of lymphoblastoid cells, fibroblastoid cells, and hematopoietic stem cells.

13. The method of claim 9, wherein said at least one population is selected from the group consisting of a Finnish population, a Sardinian population, an Ashkenazi Jew population, a South-West-Netherlands population, a Danish population, an Icelander population, a Swedish population, a Kizilcaboluk-Denizli Turkish population, a Brazilian Amondava population, a Newfoundland Canadian population, a Bedouin of Kuwait population, a Saguenay-Lac-Saint-Jean (SLSJ) Quebec Canadian population, a Salandra Italian population, a Caucasian population, an African population, an Asian population, and an Hispanic population.

14. The method of claim 9, wherein said genetic profile represents a DNA polymorphism profile.

15. The method of claim 14, wherein said DNA polymorphism is selected from the group consisting of single nucleotide polymorphism, micro-deletion, micro-insertion, short deletions and insertions, multinucleotide changes, short tandem repeats (STR), and variable number of tandem repeats (VNTR).

16. The method of claim 9, wherein said genetic profile represents an RNA and/or protein expression and/or activity pattern.

17. The method of claim 9, wherein said qualifying a genome is effected by identifying at least one sequence modification in a DNA sequence of said cells.

18. The method of claim 17, wherein said identifying is effected by a method selected from the group consisting of restriction fragment length polymorphism (RFLP analysis), allele specific oligonucleotide (ASO) analysis, Denaturing/Temperature Gradient Gel Electrophoresis (DGGE/TGGE), Single-Strand Conformation Polymorphism (SSCP) analysis, Dideoxy fingerprinting (ddF), pyrosequencing analysis, acycloprime analysis, Reverse dot blot, GeneChip microarrays, Dynamic allele-specific hybridization (DASH), Peptide nucleic acid (PNA) and locked nucleic acids (LNA) probes, TaqMan, Molecular Beacons, Intercalating dye, FRET primers, AlphaScreen, SNPstream, genetic bit analysis (GBA), Multiplex minisequencing, SNaPshot, MassEXTEND, MassArray, GOOD assay, Microarray miniseq, arrayed primer extension (APEX), Microarray primer extension, Tag arrays, Coded microspheres, Template-directed incorporation (TDI), fluorescence polarization, Colorimetric oligonucleotide ligation assay (OLA), Sequence-coded OLA, Microarray ligation, Ligase chain reaction, Padlock probes, Rolling circle amplification, and Invader assay.

19. The method of claim 9, wherein said phenotype is selected from the group consisting of a morphological phenotype, a viability phenotype, a molecular phenotype, a differentiation phenotype, a proliferation phenotype, a cell behavioral phenotype, a susceptibility phenotype, and a resistance phenotype.

20. The method of claim 19, wherein said phenotype is detected using histological stains, flow cytometry analysis, biochemical assays, immunological assays, cell proliferation assays, cell differentiation assays, and/or RNA assays.

21. A method of determining a predisposition of an individual having a specific genetic profile to a disease comprising: (a) establishing cell cultures from cells derived from individuals representative of at least one population; (b) qualifying genomes, transcriptomes and/or proteomes of cells of each of said cell cultures thereby generating a dataset representing genetic profiles of cells of said cell cultures; (c) associating said dataset with the presence or absence of the disease in at least one individual of said at least one population thereby determining the predisposition of the individual having a specific genetic profile to the disease.

22. The method of claim 21, wherein said cell cultures are capable of proliferation.

23. The method of claim 21, wherein said cell cultures include cells selected from the group consisting of lymphoblastoid cells, fibroblastoid cells, and hematopoietic stem cells.

24. The method of claim 21, wherein said at least one population is selected from the group consisting of a Finnish population, a Sardinian population, an Ashkenazi Jew population, a South-West-Netherlands population, a Danish population, an Icelander population, a Swedish population, a Kizilcaboluk-Denizli Turkish population, a Brazilian Amondava population, a Newfoundland Canadian population, a Bedouin of Kuwait population, a Saguenay-Lac-Saint-Jean (SLSJ) Quebec Canadian population, a Salandra Italian population, a Caucasian population, an African population, an Asian population, and an Hispanic population.

25. The method of claim 21, wherein said genetic profile represents a DNA polymorphism profile.

26. The method of claim 25, wherein said DNA polymorphism is selected from the group consisting of single nucleotide polymorphism, micro-deletion, micro-insertion, short deletions and insertions, multinucleotide changes, short tandem repeats (STR), and variable number of tandem repeats (VNTR).

27. The method of claim 21, wherein said genetic profile represents an RNA and/or protein expression and/or activity pattern.

28. The method of claim 21, wherein said qualifying a genome is effected by identifying at least one sequence modification in a DNA sequence of said cells.

29. The method of claim 28, wherein said identifying is effected by a method selected from the group consisting of restriction fragment length polymorphism (RFLP analysis), allele specific oligonucleotide (ASO) analysis, Denaturing/Temperature Gradient Gel Electrophoresis (DGGE/TGGE), Single-Strand Conformation Polymorphism (SSCP) analysis, Dideoxy fingerprinting (ddF), pyrosequencing analysis, acycloprime analysis, Reverse dot blot, GeneChip microarrays, Dynamic allele-specific hybridization (DASH), Peptide nucleic acid (PNA) and locked nucleic acids (LNA) probes, TaqMan, Molecular Beacons, Intercalating dye, FRET primers, AlphaScreen, SNPstream, genetic bit analysis (GBA), Multiplex minisequencing, SNaPshot, MassEXTEND, MassArray, GOOD assay, Microarray miniseq, arrayed primer extension (APEX), Microarray primer extension, Tag arrays, Coded microspheres, Template-directed incorporation (TDI), fluorescence polarization, Colorimetric oligonucleotide ligation assay (OLA), Sequence-coded OLA, Microarray ligation, Ligase chain reaction, Padlock probes, Rolling circle amplification, and Invader assay.

30. A collection of genetically profiled cell cultures comprising cell cultures derived from cells of individuals being representative of at least one population, and a dataset representing a genetic profile of cells of each of said cell cultures.

31. The collection of claim 30, wherein said cell cultures are capable of proliferation.

32. The collection of claim 30, wherein said cell cultures include cells selected from the group consisting of lymphoblastoid cells, fibroblastoid cells, and hematopoietic stem cells.

33. The collection of claim 30, wherein said at least one population is selected from the group consisting of a Finnish population, a Sardinian population, an Ashkenazi Jew population, a South-West-Netherlands population, a Danish population, an Icelander population, a Swedish population, a Kizilcaboluk-Denizli Turkish population, a Brazilian Amondava population, a Newfoundland Canadian population, a Bedouin of Kuwait population, a Saguenay-Lac-Saint-Jean (SLSJ) Quebec Canadian population, a Salandra Italian population, a Caucasian population, an African population, an Asian population, and an Hispanic population.

34. The collection of claim 30, wherein said genetic profile represents a DNA polymorphism profile.

35. The collection of claim 34, wherein said DNA polymorphism is selected from the group consisting of single nucleotide polymorphism, micro-deletion, micro-insertion, short deletions and insertions, multinucleotide changes, short tandem repeats (STR), and variable number of tandem repeats (VNTR).

36. The collection of claim 30, wherein said genetic profile represents an RNA and/or protein expression and/or activity pattern.

37. The collection of claim 30, wherein said cell cultures are arranged in a form of a multi-well plate format.
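
By way of illustration only, and not as part of the claims, the associating step (d) of claim 9 might be sketched as follows. The sketch assumes that each cell culture has been genotyped at a few markers (coded as minor-allele counts 0, 1 or 2) and that a single quantitative phenotype, here a viability fraction after exposure to an agent, has been measured per culture. The culture names, marker names, values, and the use of a Pearson correlation as the association measure are all hypothetical choices made for the example; the claims do not prescribe any particular statistic.

```python
from statistics import mean, pstdev

# Hypothetical dataset: one entry per cell culture, genotypes coded as the
# number of copies of the minor allele at each marker (0, 1 or 2).
genotypes = {
    "culture_01": {"snp_A": 0, "snp_B": 0},
    "culture_02": {"snp_A": 0, "snp_B": 2},
    "culture_03": {"snp_A": 1, "snp_B": 1},
    "culture_04": {"snp_A": 1, "snp_B": 1},
    "culture_05": {"snp_A": 2, "snp_B": 2},
    "culture_06": {"snp_A": 2, "snp_B": 0},
}

# Hypothetical phenotype: fraction of viable cells after exposure to an agent.
phenotype = {
    "culture_01": 0.90, "culture_02": 0.88,
    "culture_03": 0.60, "culture_04": 0.62,
    "culture_05": 0.30, "culture_06": 0.32,
}


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    mx, my = mean(xs), mean(ys)
    sx, sy = pstdev(xs), pstdev(ys)
    if sx == 0 or sy == 0:
        return 0.0
    cov = mean([(x - mx) * (y - my) for x, y in zip(xs, ys)])
    return cov / (sx * sy)


def rank_markers(genotypes, phenotype):
    """Return (marker, correlation) pairs, strongest absolute association first."""
    cultures = sorted(genotypes)
    markers = sorted(next(iter(genotypes.values())))
    ys = [phenotype[c] for c in cultures]
    scores = {
        m: pearson([genotypes[c][m] for c in cultures], ys) for m in markers
    }
    return sorted(scores.items(), key=lambda kv: abs(kv[1]), reverse=True)


ranked = rank_markers(genotypes, phenotype)
for marker, r in ranked:
    print(f"{marker}: r = {r:+.3f}")
```

In this toy data the marker labelled snp_A tracks the viability phenotype closely while snp_B does not, so snp_A is ranked first. A real analysis would require many more cultures, a significance test, and correction for multiple markers.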