MXPA00006875A - Multiplex VGID

Multiplex VGID

Info

Publication number
MXPA00006875A
Authority
MX
Mexico
Prior art keywords
dna
immobilized
unbound
separating
contacting
Application number
MXPA/A/2000/006875A
Other languages
Spanish (es)
Inventor
Francois J M Iris
Jeanlouis Pourny
Original Assignee
Valigen Inc
Application filed by Valigen Inc

Abstract

The present invention relates generally to the field of genomics. More particularly, the present invention relates to a method for gene identification beginning with user-selected input phenotypes. The method is referred to generally as the ValiGene℠ Gene Identification method, or the VGID℠ method. When more than two source populations of nucleic acids are simultaneously compared, the method may be referred to as multiplex VGID℠. The method employs nucleic acid mismatch binding protein chromatography to effect a molecular comparison of one phenotype with others. Genes are identified as having a specified function, or as causing or contributing to the cause or pathogenesis of a specified disease, or as associated with a specific phenotype, by virtue of their selection by the method. Identified genes may be used in the development of reagents, drugs and/or combinations thereof useful in clinical or other settings for prognosis, diagnosis and/or treatment of diseases, disorders and/or conditions. The method is equally suited for gene identification for agricultural, bio-engineering, medical, veterinary, and many other applications.

Description

MULTIPLEX VGID
This application is a continuation-in-part of U.S. Patent Application Serial No. 09/007,905 (Attorney Docket No. 9408-003), entitled "METHOD FOR IDENTIFYING GENES UNDERLYING DEFINED PHENOTYPES", filed on January 15, 1998, which is incorporated herein by reference in its entirety.
1. FIELD OF THE INVENTION
The present invention relates generally to the field of genomics. More particularly, the present invention relates to a method for gene identification beginning with user-selected input phenotypes. The method is referred to generally as the ValiGene℠ Gene Identification method, or the VGID℠ method. The method employs nucleic acid mismatch binding protein chromatography to effect a molecular comparison of one phenotype with others. Genes are identified as having a specified function, or as causing or contributing to the cause or pathogenesis of a specified disease, or as associated with a specific phenotype, by virtue of their selection by the method. The identified genes can be used in the development of reagents, drugs and/or combinations thereof useful in clinical or other settings for prognosis, diagnosis and/or treatment of diseases, disorders and/or conditions. The method is equally suited for gene identification for agricultural, bioengineering, medical, veterinary, and many other applications. When more than two source populations of nucleic acids are compared simultaneously, the method may be referred to as multiplex VGID℠.
2. BACKGROUND OF THE INVENTION
The identification of a particular genotype responsible for a given phenotype is an essential goal underlying gene-based medicine, because it provides a rational starting point for the development of successful strategies for disease management, therapy, and cure.
While, by a recent estimate, only two percent (2%) of the human genome has been sequenced, perhaps more than 50% of expressed human genes are at least partially represented in existing databases (Duboule, D., October 24, 1997, Editorial: The Evolution of Genomics, Science 278, 555). It is therefore quite clear that understanding the functional interactions between the products of expressed genes represents the next great challenge in medicine and biology. This understanding has been called "functional genomics", although the term is perhaps too broad to have a clear meaning (Hieter, P. and Boguski, M., October 24, 1997, Functional Genomics: It's All How You Read It, Science 278, 601-602). Nevertheless, the general view is that functional genomics describes "... a transition or expansion from the mapping and sequencing of genomes toward an emphasis on the function of the genome." (Id.). In addition, this new emphasis will require "... creative thinking in the development of innovative technologies that make use of the extensive resources of structural genomic information." Perhaps the best definition of functional genomics is "... the development and application of global (genome-scale or system-scale) experimental approaches to evaluate the function of a gene, making use of the information provided by structural genomics." (Id., emphasis added).
One of the main advantages of the present invention is that it avoids large-scale sequencing to determine functional relationships between genes. The VGID℠ method of the present invention is a direct and very powerful genetic comparison, or subtraction, technique. Functional information is obtained from a global (that is, genome-scale) comparison between the expressed genes of two or more user-defined phenotypes using mismatch binding protein chromatography.
With the VGID℠ method, disease genes can be identified in a matter of weeks, rather than the years required to succeed using positional cloning.
2.1 DISEASE CHARACTERISTICS AND OTHER PHENOTYPES
Genetic diseases and other genetically determined phenotypes, regardless of the mode of inheritance, may be due to single or multiple lesions (i.e., mutations) that affect one gene or more than one gene simultaneously. Genetic heterogeneity (i.e., DNA sequence difference), by definition, characterizes all diseases that have a genetic component. Genetic diseases can be classified into four broad genotypic groups, as described below.
A monoallelic disease is characterized by a change in a single allele of a single gene. This group of diseases is the simplest in terms of genetic analysis, since monoallelic diseases arise, by definition, from a single lesion that affects a single gene. Monoallelic diseases have also been described as showing "molecular monomorphism", which is another way of saying that a single molecular defect in a single gene is responsible for the disease phenotype. Because such a genetic lesion is unique, it invariably "causes" the disease in question. In the case of a monoallelic disease, only a few affected individuals need to undergo genetic analysis in order to attribute a given mutation to a disease phenotype. That is, large family studies are not required to identify the gene that causes the disease. Only a few examples of such diseases are known. One example is sickle cell anemia, which is due to a single base substitution (i.e., A → T) in the gene encoding hemoglobin. This base substitution changes the respective codon from GAG to GTG, which ultimately results in the replacement of glutamate by valine at position six of the hemoglobin β chain and in the characteristic, devastating sickling of erythrocytes.
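The codon change described for sickle cell anemia can be checked with a short sketch. The two-entry codon table and the helper `substitute_base` are illustrative names introduced here and are not part of the original text; the codon assignments (GAG for glutamate, GTG for valine) follow the standard genetic code.

```python
# Illustrative sketch: all names below are hypothetical, not from the source.
# Only the two codons named in the text are included (standard genetic code).
CODON_TO_AMINO_ACID = {"GAG": "glutamate", "GTG": "valine"}


def substitute_base(codon: str, index: int, new_base: str) -> str:
    """Return a copy of `codon` with the base at `index` (0-based) replaced."""
    return codon[:index] + new_base + codon[index + 1:]


normal_codon = "GAG"
# The A -> T substitution affects the middle base of the codon.
sickle_codon = substitute_base(normal_codon, 1, "T")

print(normal_codon, "->", CODON_TO_AMINO_ACID[normal_codon])
print(sickle_codon, "->", CODON_TO_AMINO_ACID[sickle_codon])
```

A single-base change is enough to alter the encoded amino acid, which is why one lesion in one gene can account for the whole disease phenotype in the monoallelic case.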
A polyallelic disease is characterized by several different mutations that arise independently in a single gene. Here, each independent mutation gives rise to a different disease allele. A significant proportion of all genetic diseases is believed to arise in this way. Because such de novo mutations are frequent, polyallelism is a very common feature of genetic disease. Duchenne muscular dystrophy (DMD), Becker's myopathy, and cystic fibrosis (CF) are well-known examples of polyallelic diseases (see, for example, McKusick, Mendelian Inheritance in Man, Catalog of Autosomal Dominant, Autosomal Recessive, and X-Linked Phenotypes, tenth edition, 1992, The Johns Hopkins University Press, Baltimore, Maryland). Polyallelism can arise in at least two ways. First, each new case of a disease can arise from an independent mutation event in the target gene. For example, in DMD, at least 30% of cases present novel mutations in the dystrophin gene that differ from all previously characterized mutations. Second, selective fixation of different founder-effect mutations contributes to the emergence of polyallelism. An example is β-thalassemia, in which the worldwide population of affected individuals shows markedly high polyallelism, while local populations are characterized by limited allelic heterogeneity.
A non-allelic genetic disease is characterized by having more than one candidate gene. Here, a clinically well-defined genetic disease may be due to a lesion (mutation) in any one of several candidate genes. For example, osteogenesis imperfecta is caused by a lesion in any of five different type 1 collagen genes. However, identifying the candidate genes for a non-allelic genetic disease is more difficult when the candidate genes, unlike the collagen genes, are not related in sequence. For example, pituitary dwarfism is physiologically due to hypofunction of the anterior pituitary gland.
In a minority of cases of pituitary dwarfism, the causative lesion has been traced to the growth hormone gene complex (Kaplan and Delpech, 1993, in Molecular Biology and Medicine, second edition, Médecine-Sciences Flammarion, Paris, chapter 12, pages 307-308). In the vast majority of cases, however, these genes are completely normal and the causative disease loci are not even linked to the growth hormone complex (as shown by polymorphism linkage studies, Id.). Accordingly, other, unidentified genes comprising alleles unrelated to growth hormone account for most cases of pituitary dwarfism. Such non-allelic diseases clearly require more than linkage analysis alone to identify all the genes involved. The VGID℠ method of the present invention offers a quick and rational way to attack this problem.
A polygenic disease is characterized by several abnormal genes that act concurrently to produce a pathological phenotype. This group includes many genetic diseases often described as "multifactorial disorders". Examples include diabetes mellitus, hypertension, atherosclerosis, and autoimmune disorders. For most polygenic diseases, the metabolic complexities are so great that a rational basis on which candidate genes could be identified may not have existed before the invention presented here. In the few cases in which a candidate gene has been suggested, this knowledge has been largely inadequate to identify susceptible individuals or to explain the pathogenesis.
The last two groups of genetic disorders described above (i.e., non-allelic and polygenic diseases) represent the biggest challenge facing human and veterinary medicine today. Owing to the absence of sufficient biochemical and physiological data, credible candidate genes have largely not been identified.
This absence of credible candidate genes has in turn eliminated the possibility of identifying susceptible individuals and of attempting preventive intervention before the onset of symptoms. The invention presented here offers a way to overcome these limitations by identifying credible candidate genes.
2.2 IDENTIFICATION OF GENES BY POSITIONAL CLONING
Several known methods are available to identify candidate disease genes and to select further, from among the identified candidates, genes systematically associated with a given pathology. These methods include differential expression analysis (e.g., differential display, serial analysis of gene expression or SAGE) and positional cloning methods. In positional cloning approaches, the initial steps are quite similar or identical; frequently only the final steps differ (see, for example, Rommens et al., 1989, Science 245, 1059-1065; Duyk et al., 1990, Proc. Natl. Acad. Sci. U.S.A. 87, 8995-8999). The main drawbacks of positional cloning methods generally include: (a) the slow pace of discovery, which often requires several years to reach success; (b) the high complexity of the techniques involved, which require highly trained individuals who must pay careful attention to detail to achieve satisfactory results; (c) the labor-intensive nature of the techniques, which frequently require huge amounts of sequencing; and (d) the extreme expense associated with any slow, complex, labor-intensive effort. Positional cloning can be viewed as four discrete steps well known in the art. Each of these steps is described below.
2.2.1 LINKAGE MAPPING
The first step in using positional cloning for disease gene identification is the search for genetic linkage between a locus involved in pathogenesis and several polymorphic genotypic markers. This step involves segregation analysis in affected families.
Linkage mapping takes advantage of the fact that the closer two genetic loci are to each other, the lower the chance that an independent recombination event will separate them. Therefore, the goal is to find a specific fragment of genomic DNA, bounded by two known markers, that is systematically present in all affected members of a family but rarely present in unaffected members. If such a genomic fragment can be identified, the pathogenic locus will lie between the markers.
Linkage mapping presents difficulties that vary according to the mode of inheritance of a disease. In an ideal linkage map, all carriers of an abnormal gene would be identified. In the case of an autosomal dominant disease, this is possible, even in theory, only if: (a) all carriers show the disease phenotype (i.e., penetrance is complete); and (b) the onset of the disease is early. In the case of autosomal recessive disorders, it is only possible to detect homozygotes (all affected) and obligate heterozygotes (parents). Therefore, when mapping an autosomal recessive disorder, it is essential to have access to families in which there are at least two living, affected, homozygous siblings.
In some fortunate cases of linkage mapping and analysis, one can easily rule out specific chromosomes as carrying the disease gene of interest; in these rare cases, the gene search quickly becomes more focused. For example, DMD is a recessive disorder that is very rare in women. As a result, the search for the DMD gene can be safely limited to the X chromosome. In most cases, however, such a simplified approach is not available. One such case is CF, where five years of intensive effort were required merely to identify the chromosome associated with the disease.
2.2.2 CHROMOSOMAL LOCALIZATION
The genomic fragment identified in the previous step is often very large (i.e., several million bases) and totally unknown in terms of the number and identity of the genes it encodes.
Accordingly, it is frequently essential to localize the genomic fragment to a specific chromosome in order to take advantage of other known markers that may not yet be associated with the fragment. Chromosomal localization can be carried out using polymorphic markers (for example, microsatellites) identified in genomic DNA or in large genomic fragments cloned in yeast artificial chromosomes (YACs) that have been assigned to specific human chromosomes. Chromosomal localization can also be carried out by fluorescence labeling of a large identified genomic fragment (e.g., 100 kilobases) for hybridization and karyotype analysis (Dauwerse et al., 1992, Hum. Mol. Genet. 1, 593-598).
2.2.3 ADDITIONAL REFINEMENT
Once the identified genomic fragment has been localized to a specific chromosome, the largest possible number of polymorphic markers is used to delimit the smallest possible region (i.e., locus) that encodes the gene of interest. This step can yield genomic fragments that are still very large, that is, from half a million to a million bases long. Since the average length of a gene is of the order of seventy thousand bases, this region will probably encode many different genes. In addition, this approach does not distinguish between monogenic and polygenic disorders. If an apparent lack of genetic heterogeneity cannot be clinically determined, then the actual degree of heterogeneity should be assessed by a systematic comparison of different families. In this very frequent case, the results from each family should be analyzed separately to determine whether they are consistent with a "single locus" hypothesis. This is a complex problem, since genetic heterogeneity may not be clinically detectable (for example, pituitary dwarfism, see above).
Alternatively, apparent clinical heterogeneity may lead to the erroneous conclusion that different genes are involved when, in fact, different allelic forms of the same gene are involved (e.g., Becker's myopathy and DMD, see above).
2.2.4 FROM LOCUS TO GENE
Once a genetic locus for a disease-associated gene has been defined using the above methods, much remains to be done before the gene itself can finally be identified. The identification problem involves two main difficulties. First, new markers must be generated for further refinement of the map. The new markers must be located as close as possible to the gene in question and, ultimately, within that gene. Second, it must be demonstrated that the identified gene is the real cause of the disease. These two tasks require the use, in parallel, of a great variety of methods. Two of the most frequently used approaches are briefly described below.
Exon trapping involves cloning short fragments generated from a fully identified locus into retroviral vectors that have been engineered to reveal the presence of exons (i.e., coding sequences) within a short fragment. Any positive clone (i.e., a clone containing an exon) serves as a new marker and must subsequently be sequenced and mapped at the locus in order to define the relative position of each such positive clone. The exon trapping approach is labor-intensive in that it requires massive amounts of DNA sequencing and produces a substantial number of false positives and false negatives. Obviously, the exon map generated includes exons of any gene within the locus and is not specific for exons of the disease gene of interest. Therefore, additional work is required.
Complementary DNA (cDNA) subtraction assays employ cDNA libraries constructed from the cells of an affected individual and from the cells of a healthy individual. The procedure has two successive phases.
In phase one, the cDNA inserts from the library of the healthy individual are immobilized on a membrane and used to trap (subtract) the homologous cDNA inserts present in the library of the affected individual. In phase two, the procedure is inverted: that is, the cDNA inserts from the library of the affected individual are immobilized and used to subtract the homologous inserts from the healthy cDNA library. Accordingly, these two phases provide DNA fragments that are totally unique to the affected individual or to the healthy individual, respectively. However, any fragment that is homologous (similar but not identical) to a sequence present in the immobilized library is still trapped. Accordingly, this approach frequently results in total loss of the gene of interest.
Clones obtained by the exon trapping or cDNA subtraction approaches are frequently used for direct hybridization to: (a) yeast artificial chromosome contiguous segments (YAC contigs) that cover the locus of interest; (b) mRNA preparations obtained from affected and healthy individuals; and/or (c) enriched genomic libraries obtained from the same healthy and affected individuals. Any "positive" hybridization signal is then analyzed by sequencing.
In the last stage of positional cloning, i.e., gene identification, one frequently faces results that cannot pinpoint the relevant gene. The only remaining approach is then to fully sequence and analyze the smallest defined genomic region of the locus, which may still be in the range of 300 to 700 kilobases.
The problematic nature of positional cloning for disease gene identification is further illustrated below by some of the realities associated with this approach. Positional cloning projects require so much work that they have been undertaken, in most cases, only by large consortia of international research groups comprising at least three laboratories.
Each laboratory in such a consortium, in turn, typically consists of five or more researchers who devote essentially all of their time and effort to the project. For example, identification of the CF gene required a total of eight years, finding the gene for polycystic kidney disease type 1 (PKD1) required six years, and finding the ataxia-telangiectasia gene required more than five years. Many other examples could be mentioned, and many positional cloning efforts have not yet identified the target gene. Notably, these are all monogenic diseases; that is, only one gene is responsible for the disease, and it is the same gene in all cases of the disease.
The difficulties increase in the context of polygenic or multifactorial disorders. Here, little progress has been made in gene identification. For example, after more than fifteen years of intensive research by a considerable number of research teams, the genetic causes of diabetes mellitus (type I and type II) remain largely unknown. The same can be said for chronic renal failure (CRF), multiple sclerosis (MS), atherosclerosis, and many other diseases. This list includes only some of the most prevalent polygenic or multifactorial disorders.
One of the main reasons for this situation is that, in the absence of any information suggesting likely candidate genes, the loci associated with the disorder must first be mapped to specific chromosomal regions before there is any opportunity to isolate the genes in question by positional cloning (see above). Obviously, it would be considerably easier to avoid mapping altogether and to work from the mRNA transcripts of genes expressed in affected tissues. However, this approach has proven virtually impossible using past methods. This is due, at least in part, to the fact that tissues and cells express a large number of genes. In addition, the genes associated with pathologies are frequently expressed at very low levels.
Accordingly, the few relevant disease mRNA transcripts can be lost among a huge number of other transcripts. Adding to the identification problem, the disease transcripts may differ widely between affected individuals. The intrinsic limitations of prior positional and subtraction methodologies are such that very small amounts of mRNA cannot be used. The VGID℠ method for gene identification presented here offers a simple solution to this enormous problem. It allows phenotype-associated genes to be identified, in monogenic as well as polygenic contexts, in a matter of weeks rather than years and at very low cost.
2.3 MISMATCH REPAIR
DNA mismatch repair genes encode one or more mechanisms by which high-fidelity DNA replication is maintained in cells under physiological conditions. Over the years, many researchers have manipulated one or more of these genes for various purposes. First described in bacteria, the mismatch repair system is engaged when the MutS gene product recognizes and binds a mismatched base pair (see Cox, E.C., 1997, MutS, Proofreading and Cancer, Genetics 146, 443-446). MutS works in conjunction with the products of the MutH and MutL genes; these three proteins together form the MutHLS mismatch repair system. A recent review has provided a detailed description of this system in eukaryotes (see Kolodner, R., 1996, Biochemistry And Genetics Of Eukaryotic Mismatch Repair, Genes Dev. 10, 1433-1442). Hereditary non-polyposis colon cancer (HNPCC) arises from mutations in the hMSH2 gene, the human homolog of the bacterial MutS gene, as shown by studies conducted in two laboratories in 1993 (see Fishel, R. et al., 1993, The Human Mutator Gene Homolog MSH2 And Its Association With Hereditary Nonpolyposis Colon Cancer, Cell 75, 1027-1038; Leach, F.S. et al., 1993, Mutations Of A MutS Homolog In Hereditary Nonpolyposis Colorectal Cancer, Cell 75, 1215-1225).
The human MSH2 protein also functions by binding to mismatched DNA (Fishel, R. et al., 1994, Binding Of Mismatched Microsatellite DNA Sequences By The Human MSH2 Protein, Science 266, 1403-1405; Fishel, R. et al., 1994, Purified Human MSH2 Protein Binds To DNA Containing Mismatched Nucleotides, Cancer Res. 54, 5539-5542). Another human homolog of bacterial MutS has recently been linked to cancer susceptibility (Edelmann, W. et al., November 14, 1997, Mutation In The Mismatch Repair Gene Msh6 Causes Cancer Susceptibility, Cell 91, 467-477).
The mismatch repair system has been manipulated in several ways. For example, a method for in vitro recombination of mismatched sequences has been described in which MutS-deficient E. coli is used (Resnick, M.A. and Radman, M., August 2, 1994, System For Isolating And Producing New Genes, Gene Products And DNA Sequences, U.S. Patent No. 5,334,522). Others have described the use of MutS protein to detect DNA mismatches in vitro with antibodies (Wagner, R.E., Jr., and Radman, M., April 2, 1997, Method For Detection Of Mutations, European Patent EP 0 596 028 B1). Others have exploited the inability of the system to repair loops of five nucleotides or more in vivo to design a system capable of detecting a single mismatch in a DNA fragment of up to 10 kilobases (see Faham, M. and Cox, D.R., 1997, A Novel In Vivo Method To Detect DNA Sequence Variation, Genome Research 5, 474-482).
3. SUMMARY OF THE INVENTION
This invention provides a method for identifying a gene or allele, or several genes or alleles, underlying a phenotype of interest. In this regard, genes or alleles are identified as having a specified function, or as causing or contributing to the cause or pathogenesis of a specified disease, or as associated with a specific phenotype, by virtue of their selection by the method.
This invention is based, at least in part, on the recognition that comparing one population of nucleic acid molecules with one or more other populations of nucleic acid molecules, in order to isolate genes underlying specific phenotypic traits, is greatly facilitated by first taking measures to ensure the internal homogenization of one or more of the populations to be compared before carrying out the external comparison of two or more populations. In this regard, internal homogenization is carried out through a first round of hybridization and sorting of matched DNA duplexes from mismatched DNA duplexes. Similarly, the external comparison is made through a second round of hybridization and sorting of matched DNA duplexes from mismatched DNA duplexes, as described in detail below.
This invention provides a method for identifying one or more genes underlying a defined phenotype, comprising the following steps in the order stated: (a) removing mismatched duplex nucleic acid molecules formed from hybridization within each of two source populations of nucleic acids; and (b) retaining mismatched duplex nucleic acid molecules formed from hybridization between the two source populations, wherein the molecules retained in step (b) comprise the gene or genes underlying the defined phenotype.
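As a rough illustration of the two-step logic of steps (a) and (b), the sketch below models each source population as a mapping from a gene name to the set of sequence variants found for that gene. This is a toy set model introduced here as an assumption: the function names, gene names, and sequences are hypothetical, and nothing in it simulates actual hybridization or mismatch binding protein chromatography.

```python
# Toy model (assumption, not the patented procedure): a population maps each
# gene name to the set of sequence variants observed for that gene.
from typing import Dict, Set, Tuple

Population = Dict[str, Set[str]]


def homogenize(population: Population) -> Dict[str, str]:
    """Step (a): drop genes whose copies differ within the population.

    Copies that differ would form mismatched duplexes on self-hybridization,
    so only genes with a single variant are kept.
    """
    return {
        gene: next(iter(variants))
        for gene, variants in population.items()
        if len(variants) == 1
    }


def external_comparison(
    first: Dict[str, str], second: Dict[str, str]
) -> Dict[str, Tuple[str, str]]:
    """Step (b): retain genes whose sequence differs between the populations."""
    shared = first.keys() & second.keys()
    return {g: (first[g], second[g]) for g in shared if first[g] != second[g]}


# Hypothetical data: geneB varies inside the affected group (removed in step
# (a)); geneC is uniform in each group but differs between groups.
affected: Population = {
    "geneA": {"ACGT"},
    "geneB": {"TTGA", "TTGC"},
    "geneC": {"GGCC"},
}
unaffected: Population = {
    "geneA": {"ACGT"},
    "geneB": {"TTGA"},
    "geneC": {"GGCA"},
}

candidates = external_comparison(homogenize(affected), homogenize(unaffected))
print(candidates)
```

In this model, homogenizing first keeps within-group variation (geneB) from being mistaken for a between-group difference, so only geneC survives as a candidate.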
In addition, this invention provides a method for identifying one or more genes underlying a defined phenotype, comprising the following steps in the order stated: (a) removing mismatched duplex nucleic acid molecules formed from hybridization within a first source population of nucleic acids; and (b) retaining mismatched duplex nucleic acid molecules formed from hybridization between the first source population and a second source population of nucleic acids, wherein the molecules retained in step (b) comprise the gene or genes underlying the defined phenotype.
The source populations of nucleic acids can be derived from many different sources. In one embodiment, the first source population and the second source population are each populations of nucleic acids derived from at least two consanguineous individuals. In another embodiment, the first source population and the second source population are each populations of nucleic acids derived from more than two consanguineous individuals. In one embodiment, the first source population and the second source population are each populations of nucleic acids derived from two to six consanguineous individuals. In another embodiment, the first source population and the second source population are each populations of nucleic acids derived from three consanguineous individuals. In another embodiment, each source population is a cell line.
In addition, the source populations of nucleic acids can be manipulated in various ways in order to facilitate gene identification. In one embodiment, the source populations are normalized cDNA libraries, in order to facilitate the identification of rare transcripts. In another embodiment, the source populations are linearized cDNA libraries, to facilitate hybridization. In another embodiment, the source populations are both normalized and linearized.
In addition, the source populations of nucleic acids can be manipulated in various ways in order to facilitate the removal of unwanted cDNAs. In one embodiment, the two source populations are DNA, the DNA of one source population is labeled, and the hybridization in step (b) is carried out using an excess of labeled DNA. In another embodiment, the excess of labeled DNA is a three-fold excess.
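To give a feel for why an excess of labeled DNA helps, the sketch below estimates what share of unlabeled strands would end up paired with a labeled partner. It rests on an assumption that is not stated in the source: strands pair purely at random, so the chance of meeting a labeled partner equals the labeled share of the pool. The function name `labeled_partner_fraction` is hypothetical.

```python
from fractions import Fraction


def labeled_partner_fraction(excess: int) -> Fraction:
    """Share of unlabeled strands expected to pair with a labeled partner.

    Assumption (not from the source): pairing is random, so with `excess`
    parts labeled DNA per one part unlabeled DNA the share is
    excess / (excess + 1).
    """
    if excess < 0:
        raise ValueError("excess must be non-negative")
    return Fraction(excess, excess + 1)


# Three parts labeled DNA to one part unlabeled DNA.
three_fold = labeled_partner_fraction(3)
print(three_fold)
```

Under this simplified model, a three-fold excess would leave about one quarter of the unlabeled strands paired with an unlabeled partner; real hybridization kinetics may differ.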
The genes underlying virtually any defined phenotype can be identified using the method of the invention. In a preferred embodiment, the defined phenotype is selected from the group consisting of a plant resistance phenotype, a microorganism resistance phenotype, cancer, osteoporosis, obesity, type II diabetes, and a prion-related disease. Additional examples of preferred defined phenotypes appear below. Defined plant phenotypes include, but are not limited to, herbicide resistance, resistance to predatory insects, resistance to fungal infections, increased yield, frost resistance, resistance to dehydration, increased stem strength, and many other characteristics. Defined microorganism phenotypes include, but are not limited to, susceptibility or resistance to antibiotics, and detoxification of liquids, soil, solids and/or gases contaminated by pollutants or toxic compounds (for example, dioxin, nitrous oxides, carbon monoxide, sulfur dioxide, free radicals, etc.). Defined animal and/or veterinary phenotypes include, but are not limited to, resistance to neurological disorders such as prion-related diseases, infectious disorders (e.g., swine plague), foot-and-mouth disease, and many others. Defined human phenotypes include, but are not limited to, cancer susceptibility, autoimmune diseases, neurological disorders, metabolic disorders (e.g., diabetes, obesity), systemic diseases (e.g., osteoporosis), and many others.
This invention provides a method for identifying one or more genes underlying a defined phenotype that is exhibited by a cell or individual from which a first cDNA library is derived, but not exhibited by a cell or individual from which a second cDNA library is derived.
The method comprises the steps of (a) hybridizing the insert DNA from the first cDNA library with itself, (b) hybridizing the insert DNA from the second cDNA library with itself, (c) contacting the DNA hybridized in step (a) with a first immobilized mismatch binding protein, (d) contacting the DNA hybridized in step (b) with a second immobilized mismatch binding protein, (e) separating the unbound DNA from the bound DNA contacted in step (c), (f) separating the unbound DNA from the bound DNA contacted in step (d), (g) labeling the unbound DNA separated in step (f) with a label capable of binding a partner molecule or agent immobilized on a substrate, (h) hybridizing the DNA labeled in step (g) with the unbound DNA separated in step (e), (i) contacting the DNA hybridized in step (h) with a third immobilized mismatch binding protein, (j) separating the unbound DNA from the bound DNA contacted in step (i), (k) contacting the unbound DNA separated in step (j) with the partner molecule or agent immobilized on the substrate and capable of binding the label, and (l) separating the unbound DNA from the bound DNA contacted in step (k), wherein said unbound DNA separated in step (l) encodes one or more genes identified as forming the basis of the defined phenotype. In addition, this invention offers a method for identifying one or more genes that form the basis of a defined phenotype from organisms that are consanguineous.
The method comprises the steps of (a) hybridizing the insert DNA from a first collection of cDNA libraries derived from organisms having the defined phenotype with itself, (b) contacting the DNA hybridized in step (a) with a first immobilized mismatch binding protein, (c) separating the unbound DNA from the bound DNA contacted in step (b), (d) labeling the unbound DNA separated in step (c) with a label capable of binding a partner molecule or agent immobilized on a substrate, (e) hybridizing the DNA labeled in step (d) with insert DNA from a second collection of cDNA libraries derived from organisms that do not have the defined phenotype, (f) contacting the DNA hybridized in step (e) with a second immobilized mismatch binding protein, (g) separating the unbound DNA from the bound DNA contacted in step (f), (h) contacting the unbound DNA separated in step (g) with the partner molecule or agent immobilized on the substrate and capable of binding the label, and (i) separating the unbound DNA from the bound DNA contacted in step (h), wherein said unbound DNA separated in step (i) encodes identified genes that form the basis of the defined phenotype. In a preferred embodiment, the DNA labeled in step (d) corresponds to unwanted material that is labeled for removal. In addition, this invention offers a method for identifying one or more alleles that form the basis of a defined phenotype presented by a cell or individual from which a first cDNA library is derived, but not presented by a cell or individual from which a second cDNA library is derived.
The method comprises the steps of (a) hybridizing the insert DNA from the first cDNA library with itself, (b) hybridizing the insert DNA from the second cDNA library with itself, (c) contacting the DNA hybridized in step (a) with a first immobilized mismatch binding protein, (d) contacting the DNA hybridized in step (b) with a second immobilized mismatch binding protein, (e) separating the unbound DNA from the bound DNA contacted in step (c), (f) separating the unbound DNA from the bound DNA contacted in step (d), (g) labeling the unbound DNA separated in step (f) with a label capable of binding a partner molecule or agent immobilized on a substrate, (h) hybridizing the DNA labeled in step (g) with the unbound DNA separated in step (e), (i) contacting the DNA hybridized in step (h) with a third immobilized mismatch binding protein, (j) separating the unbound DNA from the bound DNA contacted in step (i), (k) releasing the bound DNA separated in step (j) from the third immobilized mismatch binding protein, (l) contacting the DNA released in step (k) with the partner molecule or agent immobilized on the substrate and capable of binding the label, (m) denaturing the DNA contacted in step (l), and (n) separating the unbound DNA from the bound DNA denatured in step (m), wherein said unbound DNA separated in step (n) encodes one or more identified alleles that form the basis of the defined phenotype. In addition, this invention offers a method for identifying one or more alleles that form the basis of a defined phenotype from organisms that are consanguineous.
The method comprises the steps of (a) hybridizing the insert DNA from a first collection of cDNA libraries derived from organisms having the defined phenotype with itself, (b) contacting the DNA hybridized in step (a) with a first immobilized mismatch binding protein, (c) separating the unbound DNA from the bound DNA contacted in step (b), (d) labeling the unbound DNA separated in step (c) with a label capable of binding a partner molecule or agent immobilized on a substrate, (e) hybridizing the DNA labeled in step (d) with insert DNA from a second collection of cDNA libraries derived from organisms that do not have the defined phenotype, (f) contacting the DNA hybridized in step (e) with a second immobilized mismatch binding protein, (g) separating the unbound DNA from the bound DNA contacted in step (f), (h) releasing the bound DNA separated in step (g) from the second immobilized mismatch binding protein, (i) contacting the DNA released in step (h) with the partner molecule or agent immobilized on the substrate and capable of binding the label, (j) denaturing the DNA contacted in step (i), and (k) separating the bound DNA from the unbound DNA denatured in step (j), wherein said bound DNA separated in step (k) encodes one or more alleles that form the basis of the defined phenotype. Collections of cDNA libraries vary according to the specific attributes of the sample source. In one embodiment, the first collection of cDNA libraries and the second collection of cDNA libraries are each populations of nucleic acids derived from at least two individuals that are consanguineous. In another embodiment, the first collection of cDNA libraries and the second collection of cDNA libraries are each populations of nucleic acids derived from more than two individuals that are consanguineous. In one embodiment, the first collection of cDNA libraries and the second collection of cDNA libraries are each populations of nucleic acids derived from two to six individuals that are consanguineous.
In another embodiment, the first collection of cDNA libraries and the second collection of cDNA libraries are each populations of nucleic acids derived from three individuals that are consanguineous. A population of nucleic acid samples can be left unlabeled or can be labeled with a unique label in several ways. In one embodiment, labeling is effected by polymerase chain reaction using a 5' biotinylated primer. In another embodiment, labeling is effected by polymerase chain reaction using a 5' peptide-labeled primer. In a preferred embodiment, labeling using a 5' biotinylated primer is carried out when one unlabeled sample population and one labeled sample population are employed. In another preferred embodiment, labeling using a 5' peptide-labeled primer is carried out when multiplexing is performed, i.e., when three or more populations of nucleic acid samples are used. A sample population of labeled nucleic acids can be sorted in several ways. In one embodiment, the substrate for binding the biotin label is streptavidin. In another embodiment, the substrate for binding the peptide label is an antibody. In another embodiment, the antibody is an anti-peptide antibody. In another embodiment, the anti-peptide antibody is monoclonal. Various recombinant and wild-type mismatch binding proteins can be used to effect sorting (i.e., binding and release) of DNA duplexes containing mismatches.
In one embodiment, the mismatch binding protein is MutS from E. coli. In another embodiment, the mismatch binding protein is hMSH2. In another embodiment, the mismatch binding protein is an hMSH2-hMSH6 protein complex. This invention offers a method for identifying one or more genes that form the basis of a defined phenotype presented by a cell or an individual from which a first cDNA library is derived, but not presented by a cell or an individual from which a second cDNA library is derived. The method comprises the steps of (a) amplifying the insert DNA from the first cDNA library by polymerase chain reaction, (b) amplifying the insert DNA from the second cDNA library by polymerase chain reaction, (c) hybridizing the DNA amplified in step (a) with itself, (d) hybridizing the DNA amplified in step (b) with itself, (e) contacting the DNA hybridized in step (c) with a first immobilized MutS, (f) contacting the DNA hybridized in step (d) with a second immobilized MutS, (g) separating the unbound DNA from the bound DNA contacted in step (e), (h) separating the unbound DNA from the bound DNA contacted in step (f), (i) amplifying the unbound DNA separated in step (g) by polymerase chain reaction using unlabeled primers, (j) amplifying and labeling the unbound DNA separated in step (h) by polymerase chain reaction using 5' biotinylated primers, (k) hybridizing the DNA amplified and labeled in step (j) with the DNA amplified in step (i), (l) contacting the DNA hybridized in step (k) with a third immobilized MutS, (m) separating the unbound DNA from the bound DNA contacted in step (l), (n) contacting the unbound DNA separated in step (m) with immobilized streptavidin, and (o) separating the unbound DNA from the bound DNA contacted in step (n), wherein said unbound DNA separated in step (o) encodes one or more identified genes that form the basis of the defined phenotype.
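The final sorting in steps (l) through (o) above amounts to two successive affinity separations in which only the unbound fraction is kept. A minimal sketch follows, assuming each duplex can be described by two flags (whether it contains a mismatch, and whether either strand carries the 5' biotin label); the class, function names and example duplexes are hypothetical and only illustrate the selection logic, not laboratory behavior.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Duplex:
    name: str           # hypothetical identifier for the cDNA species
    mismatched: bool    # True if the duplex contains a mismatch (assumed to bind MutS)
    biotinylated: bool  # True if at least one strand carries a 5' biotin label


def flow_through(duplexes, is_retained_on_column):
    """Return the duplexes that are NOT retained by a column (the unbound fraction)."""
    return [d for d in duplexes if not is_retained_on_column(d)]


def select_gene_candidates(hybridized):
    # Steps (l)-(m): immobilized MutS retains mismatched duplexes; keep the unbound DNA.
    unbound_muts = flow_through(hybridized, lambda d: d.mismatched)
    # Steps (n)-(o): immobilized streptavidin retains biotinylated duplexes; keep the unbound DNA.
    return flow_through(unbound_muts, lambda d: d.biotinylated)


# Illustrative pool of duplexes formed in step (k).
pool = [
    Duplex("only_in_first_library", mismatched=False, biotinylated=False),
    Duplex("shared_identical", mismatched=False, biotinylated=True),
    Duplex("shared_with_variant", mismatched=True, biotinylated=True),
    Duplex("only_in_second_library", mismatched=False, biotinylated=True),
]

candidates = [d.name for d in select_gene_candidates(pool)]
print(candidates)
```

In this toy pool, only the unlabeled, perfectly matched duplex passes both columns, which corresponds to DNA found in the first library without a counterpart in the second.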
In addition, this invention offers a method for identifying one or more genes that form the basis of a disease phenotype from healthy and affected individuals that are consanguineous. The method comprises the steps of (a) amplifying the insert DNA from a first collection of cDNA libraries derived from affected individuals by polymerase chain reaction, (b) hybridizing the DNA amplified in step (a) with itself, (c) contacting the DNA hybridized in step (b) with a first immobilized MutS, (d) separating the unbound DNA from the bound DNA contacted in step (c), (e) amplifying and labeling the unbound DNA separated in step (d) by polymerase chain reaction using 5' biotinylated primers, (f) amplifying the insert DNA from a second collection of cDNA libraries derived from healthy individuals by polymerase chain reaction, (g) hybridizing the DNA amplified and labeled in step (e) with the DNA amplified in step (f), (h) contacting the DNA hybridized in step (g) with a second immobilized MutS, (i) separating the unbound DNA from the bound DNA contacted in step (h), (j) contacting the unbound DNA separated in step (i) with immobilized streptavidin, and (k) separating the unbound DNA from the bound DNA contacted in step (j), wherein said unbound DNA separated in step (k) encodes one or more identified genes that form the basis of the disease phenotype. In addition, this invention provides a method for identifying one or more alleles that form the basis of a defined phenotype presented by a cell or an individual from which a first cDNA library is derived, but not presented by a cell or individual from which a second cDNA library is derived.
The method comprises the steps of (a) amplifying the insert DNA from the first cDNA library by polymerase chain reaction, (b) amplifying the insert DNA from the second cDNA library by polymerase chain reaction, (c) hybridizing the DNA amplified in step (a) with itself, (d) hybridizing the DNA amplified in step (b) with itself, (e) contacting the DNA hybridized in step (c) with a first immobilized MutS, (f) contacting the DNA hybridized in step (d) with a second immobilized MutS, (g) separating the unbound DNA from the bound DNA contacted in step (e), (h) separating the unbound DNA from the bound DNA contacted in step (f), (i) amplifying the unbound DNA separated in step (g) by polymerase chain reaction using unlabeled primers, (j) amplifying and labeling the unbound DNA separated in step (h) by polymerase chain reaction using 5' biotinylated primers, (k) hybridizing the DNA amplified and labeled in step (j) with the DNA amplified in step (i), (l) contacting the DNA hybridized in step (k) with a third immobilized MutS, (m) separating the unbound DNA from the bound DNA contacted in step (l), (n) releasing the bound DNA separated in step (m) from the third immobilized MutS, (o) contacting the DNA released in step (n) with immobilized streptavidin, (p) denaturing the DNA contacted in step (o), and (q) separating the unbound DNA from the bound DNA denatured in step (p), wherein said unbound DNA separated in step (q) encodes one or more identified alleles that form the basis of the defined phenotype. In one embodiment, the release of bound DNA from the third immobilized MutS in step (n) is carried out using ATP or proteinase K. Additionally, this invention offers a method for identifying one or more affected alleles that form the basis of a disease phenotype from healthy and affected individuals that are consanguineous.
The method comprises the steps of (a) amplifying the insert DNA from a first collection of cDNA libraries derived from the affected individuals by polymerase chain reaction, (b) hybridizing the DNA amplified in step (a) with itself, (c) contacting the DNA hybridized in step (b) with a first immobilized MutS, (d) separating the unbound DNA from the bound DNA contacted in step (c), (e) amplifying and labeling the unbound DNA separated in step (d) by polymerase chain reaction using 5' biotinylated primers, (f) amplifying the insert DNA from a second collection of cDNA libraries derived from healthy individuals by polymerase chain reaction, (g) hybridizing the DNA amplified and labeled in step (e) with the DNA amplified in step (f), (h) contacting the DNA hybridized in step (g) with a second immobilized MutS, (i) separating the unbound DNA from the bound DNA contacted in step (h), (j) releasing the bound DNA separated in step (i) from the second immobilized MutS, (k) contacting the DNA released in step (j) with immobilized streptavidin, (l) denaturing the DNA contacted in step (k), and (m) separating the bound DNA from the unbound DNA denatured in step (l), wherein said bound DNA separated in step (m) encodes one or more affected alleles that form the basis of the disease phenotype. In one embodiment, the release of bound DNA from the second immobilized MutS in step (j) is carried out using ATP or proteinase K. In addition, this invention offers a method for identifying one or more genes that form the basis of a defined phenotype presented by a cell or an individual from which a first cDNA library is derived, but not presented by a cell or individual from which several additional cDNA libraries are derived.
The method comprises the steps of (a) hybridizing the insert DNA from each cDNA library with itself, (b) contacting each separate population of DNA hybridized in step (a) individually with an immobilized mismatch binding protein, (c) separating the unbound DNA from the bound DNA individually contacted in step (b), (d) labeling each separate population of unbound DNA separated in step (c) with a different label capable of binding a partner molecule immobilized on a substrate, (e) hybridizing the DNA labeled in step (d), (f) contacting the DNA hybridized in step (e) with an immobilized mismatch binding protein, and (g) separating the unbound DNA from the bound DNA contacted in step (f). Furthermore, this invention offers a method for identifying one or more genes that form the basis of a defined phenotype presented by a cell or individual from which a first cDNA library is derived, but not presented by a cell or individual from which several additional cDNA libraries are derived.
The method comprises the steps of (a) amplifying the insert DNA from each cDNA library by polymerase chain reaction, (b) hybridizing each separate population of DNA amplified in step (a) with itself, (c) contacting each separate population of DNA hybridized in step (b) individually with immobilized MutS, (d) separating the unbound DNA from the bound DNA contacted in step (c), (e) labeling each separate population of unbound DNA separated in step (d) by polymerase chain reaction using a different 5' peptide-labeled primer capable of binding a partner molecule immobilized on a substrate, (f) hybridizing the DNA labeled in step (e), (g) contacting the DNA hybridized in step (f) with immobilized MutS, and (h) separating the unbound DNA from the bound DNA contacted in step (g). In addition, this invention offers a method for identifying one or more alleles that form the basis of a defined phenotype presented by a cell or individual from which a first cDNA library is derived, but not presented by a cell or individual from which several additional cDNA libraries are derived. The method comprises the steps of (a) hybridizing the insert DNA from each cDNA library with itself, (b) contacting each separate population of DNA hybridized in step (a) individually with an immobilized mismatch binding protein, (c) separating the unbound DNA from the bound DNA contacted in step (b), (d) labeling each separate population of unbound DNA separated in step (c) with a different label capable of binding a partner molecule immobilized on a substrate, (e) hybridizing the DNA labeled in step (d), (f) contacting the DNA hybridized in step (e) with an immobilized mismatch binding protein, and (g) separating the unbound DNA from the bound DNA contacted in step (f).
In addition, this invention offers a method for identifying one or more alleles that form the basis of a defined phenotype presented by a cell or individual from which a first cDNA library is derived, but not presented by a cell or individual from which several additional cDNA libraries are derived. The method comprises the steps of (a) amplifying the insert DNA from each cDNA library by polymerase chain reaction, (b) hybridizing the DNA amplified from each library in step (a) with itself, (c) contacting the DNA from each library hybridized in step (b) individually with an immobilized mismatch binding protein, (d) separating the unbound DNA from the bound DNA contacted in step (c), (e) amplifying and labeling each separate population of unbound DNA separated in step (d) by polymerase chain reaction using a different 5' peptide-labeled primer, (f) hybridizing the DNA amplified and labeled in step (e), (g) contacting the DNA hybridized in step (f) with an immobilized mismatch binding protein, (h) separating the unbound DNA from the bound DNA contacted in step (g), (i) releasing the bound DNA separated in step (h), and (j) separating the DNA released in step (i) into individual strands. In addition, this invention provides a method for identifying one or more alleles that form the basis of a defined phenotype comprising the following steps in the indicated order: (a) removing mismatched duplex nucleic acid molecules formed from hybridization within each of several source populations of nucleic acids; (b) retaining mismatched duplex nucleic acid molecules formed from hybridization between the various source populations; and (c) separating the strands of the mismatched duplexes retained in step (b), said separated strands comprising one or more alleles that form the basis of the defined phenotype. This invention offers a method for identifying one or more genes that form the basis of a defined phenotype.
The method comprises the steps of (a) removing mismatched duplex nucleic acid molecules formed from hybridization within each of several nucleic acid source populations, and (b) retaining mismatched duplex nucleic acid molecules formed from hybridization between the various source populations, wherein the molecules retained in step (b) comprise one or more genes that form the basis of the defined phenotype. In one embodiment, the various source populations comprise at least one normalized cDNA library. In another embodiment, the various source populations comprise at least one linearized cDNA library. In another embodiment, the various source populations consist of DNA, the DNA of each of the source populations is labeled with a different label, and the hybridization in step (b) is carried out using an excess of labeled DNA from one or more source populations. In one embodiment, the excess of labeled DNA is a three-fold excess. In yet another embodiment, each of the source populations is derived from a cell line. This invention also provides a method for identifying one or more genes that form the basis of a defined phenotype present in a cell or individual from which a first cDNA library is derived, but not present in a cell or individual from which several additional cDNA libraries are derived.
The method comprises the steps of (a) hybridizing the insert DNA from the first cDNA library with itself, (b) hybridizing the insert DNA from each of the several additional cDNA libraries with itself, (c) contacting the DNA hybridized in step (a) with an immobilized mismatch binding protein, (d) contacting each separate population of cDNA hybridized in step (b) individually with an immobilized mismatch binding protein, (e) separating the unbound DNA from the bound DNA contacted in step (c), (f) separating the unbound DNA from the bound DNA individually contacted in step (d), (g) labeling each separate population of the unbound DNA separated in step (f) with a distinguishable label capable of binding a partner molecule immobilized on a substrate, (h) hybridizing the DNA separately labeled in step (g) with the unbound DNA separated in step (e), (i) contacting the DNA hybridized in step (h) with an immobilized mismatch binding protein, (j) separating the unbound DNA from the bound DNA contacted in step (i), (k) contacting the unbound DNA separated in step (j) with the partner molecule of each different label, and (l) separating the unbound DNA from the bound DNA contacted in step (k), wherein said unbound DNA separated in step (l) encodes one or more identified genes that form the basis of the defined phenotype. In one embodiment, one or more cDNA libraries are normalized. In another embodiment, one or more cDNA libraries are linearized. In another embodiment, labeling is carried out by polymerase chain reaction using a 5' peptide-labeled primer. In another embodiment, at least one immobilized partner molecule is an antibody. In another embodiment, the antibody is an anti-peptide antibody. In another embodiment, the hybridization in step (h) is carried out using an excess of labeled DNA. In another embodiment, the excess of labeled DNA is a three-fold excess. In another embodiment, an immobilized mismatch binding protein is MutS.
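When several additional libraries each carry a distinguishable label, steps (i) through (l) above can be pictured as one mismatch column followed by one capture column per label. The sketch below is a toy model under that assumption; the tag names (`pepA`, `pepB`, `pepC`), the `Duplex` class and the example pool are hypothetical placeholders, not reagents or data from the specification.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Duplex:
    name: str         # hypothetical identifier for the cDNA species
    mismatched: bool  # True if the duplex contains a mismatch
    tags: frozenset   # labels present on either strand; empty for unlabeled DNA


def flow_through(duplexes, is_retained):
    """Return the unbound fraction, i.e. duplexes the column does not retain."""
    return [d for d in duplexes if not is_retained(d)]


def multiplex_select(hybridized, partner_tags):
    # Steps (i)-(j): keep DNA not bound by the immobilized mismatch binding protein.
    pool = flow_through(hybridized, lambda d: d.mismatched)
    # Steps (k)-(l): pass the pool over the partner molecule of each label in turn.
    for tag in partner_tags:
        pool = flow_through(pool, lambda d, tag=tag: tag in d.tags)
    return pool


pool = [
    Duplex("first_library_only", False, frozenset()),
    Duplex("shared_with_library_2", False, frozenset({"pepA"})),
    Duplex("variant_vs_library_3", True, frozenset({"pepB"})),
    Duplex("libraries_3_and_4_only", False, frozenset({"pepB", "pepC"})),
]

selected = [d.name for d in multiplex_select(pool, ["pepA", "pepB", "pepC"])]
print(selected)
```

The loop makes the design point explicit: in this model a labeled duplex is removed only if the partner molecule for one of its labels is actually present, so omitting one partner lets that population's matched duplexes through.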
In one embodiment, the defined phenotype is selected from the group consisting of a plant phenotype, a microorganism phenotype, and a pathological phenotype. In another embodiment, the defined phenotype is a pathological phenotype selected from the group consisting of cancer, osteoporosis, obesity, type II diabetes, and a prion-related disease. This invention further provides a method for identifying one or more genes that form the basis of a defined phenotype presented by a cell or individual from which a first cDNA library is derived, but not presented by a cell or individual from which several additional cDNA libraries are derived. The method comprises the steps of (a) amplifying the insert DNA from the first cDNA library by polymerase chain reaction, (b) amplifying the insert DNA from each of the several additional cDNA libraries by polymerase chain reaction, (c) hybridizing the DNA amplified in step (a) with itself, (d) hybridizing each separate population of DNA amplified in step (b) with itself, (e) contacting the DNA hybridized in step (c) with immobilized MutS, (f) contacting each separate population of DNA hybridized in step (d) individually with immobilized MutS, (g) separating the unbound DNA from the bound DNA contacted in step (e), (h) separating the unbound DNA from the bound DNA contacted in step (f), (i) amplifying the unbound DNA separated in step (g) by polymerase chain reaction using unlabeled primers, (j) labeling each separate population of unbound DNA separated in step (h) by polymerase chain reaction using a primer having a distinguishable 5' peptide label capable of binding a partner molecule immobilized on a substrate, (k) hybridizing the DNA amplified in step (i) with the DNA labeled in step (j), (l) contacting the DNA hybridized in step (k) with immobilized MutS, (m) separating the unbound DNA from the bound DNA contacted in step (l), (n) contacting the unbound DNA separated in step (m) with one or more partner molecules capable of binding the distinguishable 5' peptide-labeled primers, and (o) separating the unbound DNA from the bound DNA contacted in step (n), wherein said unbound DNA separated in step (o) encodes one or more identified genes that form the basis of the defined phenotype.
The invention provides a method for identifying one or more alleles that form the basis of a defined phenotype presented by a cell or individual from which a first cDNA library is derived, but not presented by a cell or individual from which several additional cDNA libraries are derived. The method comprises the steps of (a) hybridizing the insert DNA from the first cDNA library with itself, (b) hybridizing the insert DNA from each of the several additional cDNA libraries with itself, (c) contacting the DNA hybridized in step (a) with an immobilized mismatch binding protein, (d) contacting each separate population of cDNA hybridized in step (b) individually with an immobilized mismatch binding protein, (e) separating the unbound DNA from the bound DNA contacted in step (c), (f) separating the unbound DNA from the bound DNA contacted in step (d), (g) labeling each separate population of the unbound DNA separated in step (f) with a distinguishable label capable of binding a partner molecule immobilized on a substrate, (h) hybridizing the DNA separately labeled in step (g) with the unbound DNA separated in step (e), (i) contacting the DNA hybridized in step (h) with an immobilized mismatch binding protein, (j) separating the unbound DNA from the bound DNA contacted in step (i), (k) releasing the bound DNA separated in step (j) from the immobilized mismatch binding protein, (l) contacting the DNA released in step (k) with one or more partner molecules capable of binding the different labels, (m) denaturing the DNA contacted in step (l), and (n) separating the unbound DNA from the bound DNA denatured in step (m), wherein said unbound DNA separated in step (n) encodes one or more identified alleles that form the basis of the defined phenotype. In one embodiment, at least one cDNA library is normalized. In another embodiment, at least one cDNA library is linearized. In one embodiment, labeling is carried out by polymerase chain reaction using 5' peptide-labeled primers. In another embodiment, at least one immobilized partner molecule is an antibody. In another embodiment, the antibody is an anti-peptide antibody. In another embodiment, the hybridization in step (h) is carried out using an excess of labeled DNA. In another embodiment, the excess of labeled DNA is a three-fold excess. In another embodiment, at least one of the immobilized mismatch binding proteins is MutS. This invention offers a method for identifying one or more alleles that form the basis of a defined phenotype presented by a cell or an individual from which a first cDNA library is derived, but not presented by a cell or individual from which several additional cDNA libraries are derived.
The method comprises the steps of (a) amplifying the insert DNA from the first cDNA library by polymerase chain reaction, (b) amplifying the insert DNA from each of the several additional cDNA libraries by polymerase chain reaction, (c) hybridizing the DNA amplified in step (a) with itself, (d) hybridizing the DNA amplified from each library in step (b) with itself, (e) contacting the DNA hybridized in step (c) with immobilized MutS, (f) contacting each population of DNA hybridized in step (d) individually with immobilized MutS, (g) separating the unbound DNA from the bound DNA contacted in step (e), (h) separating the unbound DNA from the bound DNA contacted in step (f), (i) amplifying the unbound DNA separated in step (g) by polymerase chain reaction using unlabeled primers, (j) amplifying and labeling each population of unbound DNA separated in step (h) by polymerase chain reaction using a primer labeled with a distinguishable 5' peptide, (k) hybridizing the DNA amplified and labeled in step (j) with the DNA amplified in step (i), (l) contacting the DNA hybridized in step (k) with immobilized MutS, (m) separating the unbound DNA from the bound DNA contacted in step (l), (n) releasing the bound DNA separated in step (m) from the immobilized MutS, (o) contacting the DNA released in step (n) with one or more immobilized antibodies specific for each distinguishable 5' peptide-labeled primer, (p) denaturing the DNA contacted in step (o), and (q) separating the unbound DNA denatured in step (p), wherein said unbound DNA separated in step (q) encodes one or more identified alleles that form the basis of the defined phenotype. In one embodiment, the release of the bound DNA from the immobilized MutS in step (n) is carried out using ATP or proteinase K. In another embodiment, the method further comprises a step of using the identified gene(s) or allele(s) to carry out a prognosis or diagnosis.
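The allele-oriented steps (l) through (q) above differ from the gene-oriented methods in that the MutS-bound fraction is kept, captured through its peptide label, and then denatured so that only the unlabeled strand is released. A minimal strand-level sketch follows, assuming a duplex is mismatched exactly when its two strands carry different alleles; the classes, tag names and example duplexes are hypothetical illustrations rather than material from the specification.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Strand:
    gene: str
    allele: str
    tag: Optional[str]  # 5' peptide tag, or None for DNA amplified with unlabeled primers


@dataclass(frozen=True)
class Duplex:
    top: Strand
    bottom: Strand

    @property
    def mismatched(self) -> bool:
        # Assumption: strands carrying different alleles form a mismatched duplex.
        return self.top.allele != self.bottom.allele


def select_allele_candidates(hybridized, antibody_tags):
    # Steps (l)-(n): keep the MutS-bound (mismatched) duplexes and release them.
    released = [d for d in hybridized if d.mismatched]
    # Step (o): immobilized antibodies capture duplexes carrying a recognized tag.
    captured = [
        d for d in released
        if any(s.tag in antibody_tags for s in (d.top, d.bottom))
    ]
    # Steps (p)-(q): after denaturation, tagged strands stay bound; collect the rest.
    return [
        s for d in captured for s in (d.top, d.bottom)
        if s.tag not in antibody_tags
    ]


pool = [
    Duplex(Strand("gene1", "G", None), Strand("gene1", "A", "pep1")),    # cross-library variant
    Duplex(Strand("gene2", "C", None), Strand("gene2", "C", "pep1")),    # identical in both
    Duplex(Strand("gene3", "T", None), Strand("gene3", "C", None)),      # unlabeled mismatch
    Duplex(Strand("gene4", "A", "pep1"), Strand("gene4", "G", "pep2")),  # both strands tagged
]

alleles = select_allele_candidates(pool, {"pep1", "pep2"})
print(alleles)
```

In this toy pool, only the unlabeled strand of the cross-library mismatched duplex is recovered, which corresponds to an allele from the first library that differs from its counterpart in a labeled library.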
In one embodiment, the identified gene(s) or allele(s), or a protein encoded by them, is a target for pharmacological intervention. In another embodiment, the various source populations number from three to twelve source populations. In another embodiment, the various source populations number from three to six source populations. In another embodiment, the plurality of source populations consists of four source populations. This invention offers a method for identifying one or more genes that form the basis of a defined phenotype presented by a cell or individual from which a first cDNA library is derived, but not presented by a cell or individual from which several additional cDNA libraries are derived. The method comprises the steps of (a) hybridizing the insert DNA from each cDNA library with itself, (b) contacting each separate population of DNA hybridized in step (a) individually with an immobilized mismatch binding protein, (c) separating the unbound DNA from the bound DNA individually contacted in step (b), (d) labeling each separate population of unbound DNA separated in step (c) with a distinguishable label capable of binding a partner molecule immobilized on a substrate, (e) hybridizing the DNA separately labeled in step (d), (f) contacting the DNA hybridized in step (e) with an immobilized mismatch binding protein, and (g) separating the unbound DNA from the bound DNA contacted in step (f).
This invention offers a method for identifying one or more genes that form the basis of a defined phenotype presented by a cell or individual from which a first cDNA library is derived, but not presented by a cell or individual from which several additional cDNA libraries are derived. The method comprises the steps of (a) amplifying the insert DNA from each cDNA library by polymerase chain reaction, (b) hybridizing each separate population of DNA amplified in step (a) with itself, (c) contacting each separate population of DNA hybridized in step (b) individually with immobilized MutS, (d) separating the unbound DNA from the bound DNA contacted in step (c), (e) labeling each separate population of unbound DNA separated in step (d) by polymerase chain reaction using a primer having a distinguishable 5' peptide label capable of binding a partner molecule immobilized on a substrate, (f) hybridizing the DNA labeled in step (e), (g) contacting the DNA hybridized in step (f) with immobilized MutS, and (h) separating the unbound DNA from the bound DNA contacted in step (g).
This invention offers a method for identifying one or more alleles that form the basis of a defined phenotype presented by a cell or an individual from which a first cDNA library is derived, but not presented by a cell or individual from which several additional cDNA libraries are derived. The method comprises the steps of (a) hybridizing the insert DNA from each cDNA library with itself, (b) contacting each separate population of DNA hybridized in step (a) individually with an immobilized mismatch binding protein, (c) separating the unbound DNA from the bound DNA contacted in step (b), (d) labeling each separate population of unbound DNA separated in step (c) with a distinguishable label capable of binding a partner molecule immobilized on a substrate, (e) hybridizing the DNA labeled in step (d), (f) contacting the DNA hybridized in step (e) with an immobilized mismatch binding protein, and (g) separating the unbound DNA from the bound DNA contacted in step (f). This invention offers a method for identifying one or more alleles that form the basis of a defined phenotype presented by a cell or individual from which a first cDNA library is derived, but not presented by a cell or individual from which several additional cDNA libraries are derived.
The method comprises the steps of (a) amplifying the insert DNA from each cDNA library by polymerase chain reaction, (b) hybridizing the DNA from each library amplified in step (a) with itself, (c) contacting the DNA from each library hybridized in step (b) individually with an immobilized mismatch binding protein, (d) separating the unbound DNA from the bound DNA contacted in step (c), (e) amplifying and labeling each separate population of unbound DNA separated in step (d) by polymerase chain reaction using a primer labeled with a different 5' peptide, (f) hybridizing the DNA amplified and labeled in step (e), (g) contacting the DNA hybridized in step (f) with an immobilized mismatch binding protein, (h) separating the unbound DNA from the bound DNA contacted in step (g), (i) releasing the bound DNA separated in step (h), and (j) separating the DNA released in step (i) into individual strands.
This invention provides a method for identifying one or more alleles underlying a defined phenotype. The method comprises the steps of (a) removing the mismatched duplex nucleic acid molecules formed by hybridization within each of several nucleic acid source populations, (b) retaining the mismatched duplex nucleic acid molecules formed by hybridization between the several source populations, and (c) separating the strands of the mismatched duplexes retained in step (b), said separated strands comprising one or more alleles underlying the defined phenotype.
4. BRIEF DESCRIPTION OF THE DRAWINGS
Figure 1 is a schematic representation of a VGIDSM approach for phenotype samples obtained from sources without at least one common ancestor (e.g., cell line samples; healthy and diseased nodules within a single tissue sample). PCR is polymerase chain reaction.
Figure 2 is a schematic representation of a VGIDSM approach for phenotype samples obtained from sources that have at least one common ancestor (e.g., tissue samples from healthy and disease-affected siblings).
Figure 3 is a flow chart depicting the phenotype selection process to be employed before using the VGIDSM method of the invention.
Figure 4 is a schematic map of five hDinP clones isolated using the VGIDSM method with cell line samples as input phenotypes. The VGIDSM approach employed is the approach illustrated in Figure 1. A lymphoblast cell line was chosen as cell line no. 1 since it expresses a specific alteration in a DNA repair pathway (i.e., "with phenotype" in Figure 1); a hepatocyte cell line was selected as cell line no. 2 (i.e., "without phenotype" in Figure 1).
Figures 5A-B show the BLASTX search results and computerized analysis for the hDinP clone listed in SEQ ID NO: 1 (No. 1).
Figure 6 shows the BLASTX search result and computerized analysis for the hDinP clone listed in SEQ ID NO: 2 (Tor-M).
Figures 7A-B show the BLASTX search results and computerized analysis for the hDinP clone listed in SEQ ID NO: 3 (No. 3).
Figures 8A-B show the BLASTX search results and computerized analysis for the hDinP clone listed in SEQ ID NO: 4.
Figures 9A-B show the BLASTX search results and computerized analysis for the hDinP clone listed in SEQ ID NO: 5 (No. 2).
5. DETAILED DESCRIPTION OF THE INVENTION
The present invention provides a method, generally referred to as the ValiGeneSM Gene Identification method, or the VGIDSM method, for identifying a gene or several genes linked to a user-specified phenotype. In this regard, genes linked to a phenotype include genes that cause the phenotype of interest, genes that merely contribute to a phenotype that is due partly to genetic factors and partly to environmental factors, and structurally altered genes that arise as an effect of a phenotype. The VGIDSM methodology can be used to carry out a function-based analysis of the protein-coding genome of any organism, regardless of biological kingdom. In addition, the VGIDSM method can simultaneously identify several alleles of the gene of interest associated with multiple phenotypes, including disease genotypes. Accordingly, phenotype-specific diagnostic tools are provided by the genes identified using the VGIDSM method. In particular, such diagnostic tools can be used as an indication of the presence of the phenotype of interest. In addition, phenotype-specific prognostic tools are provided by genes identified using the VGIDSM method; such prognostic tools can be used to indicate or predict the course and/or outcome of a disease for various disease phenotypes.
The VGIDSM methodology rests on one constant underlying principle, namely the ability to specifically trap, and subsequently release, artificial mismatched DNA hybrids formed by annealing between cDNAs that come from phenotypically different sources. Thus, the VGIDSM methodology is a powerful molecular comparison tool that does not require global sequence information. Instead, a comparison between phenotypic groups is achieved by annealing followed by sorting of matched and mismatched hybrids. The details of the VGIDSM method vary according to the "comparison" to be made. This is because transcripts derived from different sources vary in their "complexity" (i.e., genetic heterogeneity), and therefore must be subjected to slightly different processing approaches, as described in detail below. In one embodiment, the VGIDSM method is used to isolate transcripts that are identical between phenotypically distinct groups. In another embodiment, the VGIDSM method is used to isolate transcripts that differ between such groups. The VGIDSM method can be applied broadly to identify the genes underlying specific functions. Common input phenotypes for use with the VGIDSM method are healthy (normal) phenotypes and affected (patient) phenotypes. Other common input phenotypes are susceptible and resistant phenotypes (e.g., viruses susceptible and resistant to antiviral agents, microbes susceptible and resistant to antibiotics, plants susceptible and resistant to herbicides, insects susceptible and resistant to insecticides). In this regard, a person skilled in the art will recognize that the VGIDSM method can be applied virtually anywhere that two or more input phenotypes can be identified, regardless of biological kingdom. Guidelines for the selection of input phenotypes are provided in Section 5.4 below.
The VGIDSM method uses nucleic acids obtained or derived from at least two source groups as starting material. In a preferred embodiment, the nucleic acid is cDNA made from messenger RNA (mRNA), preferably total poly(A) RNA from the source groups. Small amounts of mRNA are sufficient for the VGIDSM method. This flexibility in input amount allows meaningful genetic analysis of rare patient tissue samples. The lower limit on the amount of source nucleic acid required is the minimum amount needed to construct a cDNA library (i.e., from approximately 1 ng to 1 μg per source with most cDNA library construction techniques). At its most basic level, the VGIDSM method can be regarded as a technique for subtracting expressed genes. The VGIDSM method relies on two rounds of highly efficient mismatch binding protein chromatography to trap (e.g., by binding to immobilized MutS): (a) internally heterologous nucleic acids (round 1; see upper columns in Figures 1 and 2); and (b) externally heterologous nucleic acids (round 2; see lower columns in Figures 1 and 2), as described below. In this regard, internally heterologous nucleic acids are heterologous nucleic acids (i.e., nucleic acids having no identical counterparts) within each of two or more source groups, and externally heterologous nucleic acids are nucleic acids that are heterologous between the source groups. In the first round, it is generally the untrapped material from the input phenotypes that is of primary interest (this untrapped material is said to be "homogenized"). In the second round, by contrast, the material of interest is often the trapped material. This trapped material must necessarily be a hybrid duplex, artificially formed from similar but not identical cDNA strands, each strand originating from material left untrapped in the first subtraction step.
For use in the VGIDSM method, nucleic acids are obtained from at least two sources. The best results are obtained when most of the nucleic acids are structurally identical between the different sources, since this results in the most effective subtraction in the second round. This situation is most likely to arise when the input source groups are phenotypically identical except for the phenotype of interest. Therefore, the choice of input sources ultimately determines whether the expressed gene of interest is identified. Frequently, the most appropriate samples are obtained from large families that contain several affected and unaffected individuals. In the context of positional cloning, the reasons for this were explained above. In the VGIDSM method, families (particularly families where consanguinity, that is, relationship through a common ancestor, is known) can also provide the most appropriate samples, but for entirely different reasons. Inbreeding causes direct, non-recombined inheritance of genetic elements that, alone or in association with other factors, may have pathogenic effects. This property of consanguinity can be turned into a considerable advantage in the search for genes directly associated with pathologies. In the presence of consanguinity, all affected individuals taken from three generations within the same disease-transmitting (or otherwise inbred) family would be expected to carry the same disease-causing locus, and therefore to be identical by descent at this locus.
5.1 GENETIC HETEROGENEITY
The various cell lines available for a given cell type (e.g., hepatocytes) are characterized by functional similarities and differences (i.e., phenotype) as well as by structural genomic similarities and differences (i.e., genotype). In each cell line the phenotype arises from a single source, that is, the genes expressed by that cell line. Samples of a given tissue that come from different individuals are also characterized by phenotype and genotype.
However, in tissue samples, the phenotype arises from the combined contributions of genes expressed by several different cell types, and these contributions cannot be isolated individually. The consequences of the above for gene identification according to the present invention are twofold. First, tissues are most useful for isolating genes linked to a broadly defined phenotype, such as the presence of a disorder that affects individual A but not individual B. Tissue samples are less useful for isolating unknown genes associated with a more narrowly defined phenotype. Second, cell lines are most useful for isolating genes linked to a very clearly defined molecular function (for example, a particular form of DNA repair such as that performed by hDinP; see below). The specific methods described below for isolating unknown genes from tissues and from cell lines are therefore different. To elaborate further, it is useful to compare genetic constitution and function in tissues and cell lines. Regarding function, it should be noted that all cells of a cell line population are of clonal origin. That is, they not only descend from a single individual cell, but are true copies of the cell of origin. All the cells of a cell line are therefore identical from a functional perspective. In contrast, a tissue sample is composed of many different types of cells that perform different functions, and each population of a given cell type is made up of groups of functionally similar cells that have different lineages. As for genetic constitution, different cell lines are of totally different origin. That is, the ancestral cells that gave rise to different cell lines come from different individuals. Therefore the cell lines carry totally different genomes. In contrast, the various types of cells that make up a given tissue all share the same genome, regardless of the functional differences between the cell types within the tissue.
The members of a given cell line population initially have very high internal consanguinity and very high functional identity. However, because of rapid growth under artificial conditions, the absence (or near absence) of selection pressures, and the impossibility of eradicating aberrant cells (i.e., the absence of an immune system), the members are free to accumulate mutations and transmit them to their direct descendants (insofar as these mutations do not compromise basic metabolism). Therefore, a cell line population potentially carries a wide variety of newly acquired mutations. This not only reduces the structural genomic homogeneity of the population but also results in different members of the population expressing different forms of a given gene (i.e., mutant alleles), as well as genes that are not expressed by other members of the cell line population (since a mutation in one gene could affect the expression of other genes). These effects may result in a wider spectrum of transcripts than would initially be expected from a population of homogeneous cells. Awareness of these effects allows them to be controlled to some extent by paying attention to growth conditions and the number of cell passages. These effects are exacerbated when functionally different cell lines are used concurrently. The original allelic forms and the distribution of genes in the genome of a first cell line will differ from those found in the second cell line, and neither cell line will be subject to enforced internal genomic homogeneity. In addition, since the two cell lines are functionally different, the spectrum of transcripts expressed in one population will differ from the spectrum present in the other population. Tissue samples comprising many cell types, on the other hand, have very high internal consanguinity but very high functional diversity.
In tissue samples, unlike cell lines, genomic homogeneity is maintained by the individual's immune system, since most aberrant cells are eradicated immediately. This enforcement of genomic homogeneity by the immune system works to reduce the spectrum of transcripts found in tissues. However, the wide variety of cell types within a given tissue usually more than compensates for this effect. For example, different cell types frequently express different isoforms of a gene family represented by several gene copies in the genome (a phenomenon known as differentiation-specific expression). The net result is an increasing spectrum of different transcripts expressed in tissues as the number of cell types rises.
5.1.1 GENETIC HETEROGENEITY IN CELL LINES
When starting with cell line samples, the genes of interest to be identified may already have been defined in terms of their precise molecular function (for example, see Section 6 below). The sources of genetic heterogeneity in cell lines are quite different from those in tissues. First, there is heterogeneity associated with genetic differences internal to each cell line. Second, there is heterogeneity associated with the functional characteristics of each cell line. Third, there is heterogeneity associated with genetic differences between cell lines. It is the resolution (i.e., removal) of the internal sources of heterogeneity in the first step of the VGIDSM method, together with the retention and full use of the other two sources of heterogeneity in the second step, that leads to the direct isolation of the target genes of interest with the VGIDSM approach presented in Figure 1. That is, by retaining only transcripts that are structurally identical within each cell line, internal heterogeneity is removed.
Then, by removing all transcripts that are identical between the two cell lines, only transcripts specific to the essential functions associated with the cell line expressing the phenotype of interest are left. The choice of appropriate cell lines is therefore crucial. The practical aspects of isolating unknown genes from cell line samples are thus fully defined, as described in detail in Section 5.2.1 below. The first step in the approach used for cell lines isolates, separately from each cell line, the nucleic acids (e.g., transcripts) that are structurally identical internally. The second step employs the nucleic acids (e.g., transcripts) from the non-specialized cell line (i.e., "without phenotype" in Figure 1) to subtract their homologues (i.e., those structurally identical externally) from the specialized cell line (i.e., "with phenotype" of interest; see Figure 1). The second step uses MutS, together with another collection system (for example, beads coated with streptavidin; see below), to recognize only the material that comes from the non-specialized cell line (i.e., hybrid as well as native duplexes). The material remaining at the end of the operation corresponds to the few nucleic acids (i.e., transcripts) that are totally specific (i.e., differentiation-specific) to the specialized cell line.
5.1.2 GENETIC HETEROGENEITY IN TISSUES
When starting with tissue samples, unlike cell lines, genes of interest are usually defined only in terms of their phenotypic effects (i.e., as the presence or absence of a disease or trait). In addition, there is no complete certainty that, in genetically different individuals, the same phenotypic traits do not have totally different causes.
To further complicate the problem, the material used (for example, mRNA) according to the second approach of the VGIDSM method comes from a complex source (as explained in detail above) where: (a) the tissues are made up of different types of cells that cannot be separated; and (b) the tissue samples come from different individuals. For tissue samples, there are three sources of genetic heterogeneity to be considered in the isolation of the genes of interest, including disease (affected) genes. First, there is the heterogeneity associated with a target tissue that comprises several types of cells. Second, there is heterogeneity associated with phenotypic differences between normal and affected individuals that do not cause disease. Third, there is heterogeneity associated with genetic differences between normal and affected individuals that cause disease. It is the resolution (i.e., removal) of the first and second sources of heterogeneity that directly leads to the isolation of disease genes using the VGIDSM approach presented in Figure 2. By selecting as tissue donors several affected and several healthy members of the same genetic group (i.e., blood relatives) and then combining the tissue extracts into only two pools, three objectives are achieved. First, the genetic differences between affected individuals and unaffected individuals are greatly reduced. Second, the phenotypic homogeneity among affected individuals is greatly increased. Third, the genetic heterogeneities within each pool of samples are homogenized. The practical aspects of isolating unknown genes from tissue samples are thus fully defined, as described in detail in Section 5.2.2 below. The first step uses mismatch binding chromatography to isolate transcripts that are structurally identical among affected individuals (i.e., the column flow-through; see upper column in Figure 2).
These structurally identical transcripts are then used to isolate their structurally different counterparts from the unaffected pool in a second round of mismatch binding chromatography (see lower column in Figure 2). In this way, none of the transcripts that are structurally identical between the affected and unaffected pools will be trapped by mismatch binding, and none of the transcripts that are structurally different within the unaffected (healthy) pool will be selectively recovered from the material released from binding.
5.2 TWO APPROACHES TO THE VGIDSM METHOD
The VGIDSM method is designed to identify genes by isolating nucleic acids derived from transcripts associated with a given phenotype in the total absence of relevant molecular information. In this context, a phenotype corresponds to a detectable biological difference between otherwise comparable tissue or cell population samples. The biological differences can range from narrow, well-defined metabolic functions (e.g., DNA repair) to less well-defined clinical observations (e.g., schizophrenia or Alzheimer's disease). Unlike other methods for isolating expressed transcripts (for example, cDNA subtraction technologies), the VGIDSM process does not require subtraction steps based on known sequences. In addition, the VGIDSM process does not require any molecular choice by the user. Instead, the user of VGIDSM need only select the input phenotypes for comparison. The operating principle of the VGIDSM process makes use of the fact that any detectable biological difference between two or more otherwise similar samples almost always depends, at least partially, on the presence of concomitant transcript differences between these samples. To isolate transcripts associated with a phenotype of interest using the VGIDSM method, no speculation is made as to the possible structures that must be isolated or discarded.
Instead, the input phenotypes are simply selected for use in the VGIDSM comparison. While the VGIDSM method cannot directly identify (i.e., by mismatch binding) promoter-associated mutations contained in non-transcribed portions of genes, any transcript that is under-expressed or over-expressed as a result of such mutations can be identified (e.g., see the first approach described in Section 5.3.1 and the example in Section 6 below). In summary, the VGIDSM process allows the isolation and identification of over-expressed, under-expressed or mutated transcripts that differ specifically between two (or more) populations of transcripts. The VGIDSM method can be applied to any group of two or more nucleic acid source populations. The nucleic acid source populations employed in the VGIDSM method are derived from transcript sources (i.e., messenger RNA from cellular sources), preferably by converting the mRNA to double-stranded cDNA. Transcript sources include, but are not limited to, animals, plants and microorganisms, including viruses. For example, the VGIDSM method can be applied to the isolation of microbial genes that provide resistance to toxic compounds or metabolites. As another example, the VGIDSM method can be applied to isolate plant genes that provide desirable traits for crop production. In the case of tissues and cell lines, the sources of transcripts may include, but are not limited to: (a) tissue nodules within a single tissue sample (first approach); (b) cell line samples (first approach); (c) tissue samples that come from family groups that have consanguinity (second approach). The first and second approaches of VGIDSM that would most commonly be employed for these various sources of transcripts are described in detail below.
5.2.1 FIRST APPROACH: CELL LINES OR SINGLE TISSUE SAMPLE
This approach is especially suitable for the study of genes with specific metabolic functions (for example, in a cell line that exhibits the phenotype of interest) or of disease processes in which affected tissue samples are limited and control tissue from healthy individuals cannot be obtained. This approach also allows a comparative study of sporadic forms versus familial forms of a given pathology.
Three different but complementary transcript isolations can be carried out using the first approach of VGIDSM, as follows: (i) isolation of transcripts over-expressed (or unilaterally expressed) in the presence of the phenotype of interest; (ii) isolation of transcripts under-expressed (or unilaterally repressed) in the presence of the phenotype of interest; and (iii) isolation of transcript variants (i.e., mutants) associated with the phenotype of interest. The overall experimental scheme for using the VGIDSM method in the first approach is illustrated in Figure 1. The first approach identifies a gene or several genes underlying a defined phenotype in two steps: first, by removing the mismatched duplex nucleic acid molecules formed by hybridization within each of the two source populations and, second, by retaining the mismatched duplex nucleic acid molecules formed by hybridization between the two populations. The following is a preferred embodiment of the first approach; several modifications can be made that will be apparent to a person skilled in the art (for example, see Section 5.3 below). The selection of input phenotypes is made by the user and can be performed as desired. However, preferred guidelines for the selection of phenotypes (i.e., selection of transcript sources) are presented in Section 5.4 below. After selection of the phenotypes, an independent (i.e., separate) cDNA library is generated for each of the two or more transcript sources that differ in the phenotype of interest. Cell lines can be used as transcript sources. Alternatively, a single tissue sample from an affected individual (i.e., a patient) can be used. In this latter case, different cellular nodules are isolated from within the single tissue sample, each nodule representing a different pathological stage or a different phenotypic state.
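The two steps of the first approach can be sketched as a toy set-based model. This is only an illustration under simplifying assumptions that are not part of the original text: each source population is represented as a mapping from a transcript name to the set of sequence variants found among its cells, any transcript with more than one variant inside a source is assumed to be fully trapped in the first step, and the names `first_step`, `second_step`, `geneA` and so on are hypothetical.

```python
from typing import Dict, List, Set

# Hypothetical toy representation: transcript name -> variants present in the source.
Population = Dict[str, Set[str]]


def first_step(population: Population) -> Dict[str, str]:
    """Model self-annealing of one source followed by mismatch trapping.

    Transcripts with more than one variant inside the source form mismatched
    heteroduplexes and are assumed to be trapped; only transcripts with a
    single variant remain in the flow-through.
    """
    return {
        transcript: next(iter(variants))
        for transcript, variants in population.items()
        if len(variants) == 1
    }


def second_step(source_1: Dict[str, str], source_2: Dict[str, str]) -> Dict[str, List[str]]:
    """Model annealing of the two flow-throughs followed by mismatch trapping."""
    result: Dict[str, List[str]] = {
        "trapped": [],        # expressed by both sources, variants differ
        "shared": [],         # expressed by both sources, identical sequence
        "only_source_1": [],  # no counterpart in source 2
        "only_source_2": [],  # no counterpart in source 1
    }
    for transcript in sorted(set(source_1) | set(source_2)):
        if transcript in source_1 and transcript in source_2:
            same = source_1[transcript] == source_2[transcript]
            key = "shared" if same else "trapped"
        elif transcript in source_1:
            key = "only_source_1"
        else:
            key = "only_source_2"
        result[key].append(transcript)
    return result


# Illustrative data only: geneB carries a random internal mutation in cell line 1.
cell_line_1 = {"geneA": {"A1"}, "geneB": {"B1", "B2"}, "geneC": {"C1"}, "geneD": {"D1"}}
cell_line_2 = {"geneA": {"A1"}, "geneC": {"C2"}, "geneE": {"E1"}}

outcome = second_step(first_step(cell_line_1), first_step(cell_line_2))
print(outcome)
```

In this sketch, `geneB` is removed in the first step because it is internally heterologous, `geneC` is trapped in the second step because the two sources carry different variants, and `geneD` and `geneE` remain in the flow-through as transcripts without a counterpart in the other source.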
In one embodiment, samples from transcript sources are processed in pairs, each member of a pair representing a different phenotype. In another embodiment, samples are processed in groups of three or more (i.e., the multiplex VGIDSM methodology). The two steps of the method can be subdivided into several parts for purposes of clarity. For example, in the preferred embodiment described below, the first step comprises parts 1-3 and the second step comprises parts 4-6, as follows.
Part 1. Each cDNA library originating from each independent source (for example, a cell line or tissue nodule) is subjected to limited polymerase chain reaction amplification (15-20 cycles) in order to linearize the cDNA inserts.
Part 2. The polymerase chain reaction products obtained from each source are denatured and reannealed independently (i.e., without yet combining materials from different sources). After parts 1 and 2, transcripts that exhibit structural differences within each source population will give rise to mismatched heteroduplex molecules. The heterologous transcripts in these heteroduplex molecules arise from random mutations not associated with the phenotype of interest. This heteroduplex formation occurs because the mutations in question are common to only some of the individual cells, and not to all of the individual cells, within each source population.
Part 3. The reannealed polymerase chain reaction products from each source are independently subjected to a first round of mismatch binding column chromatography (see the upper of the two columns in Figure 1). For example, the columns may be packed with glass beads coated with MutS to trap heteroduplexes containing mismatches. In part 3, the mismatched heteroduplexes are trapped on the column. After several cycles of denaturation, random reannealing and trapping of mismatched heteroduplexes, the column flow-through contains primarily transcripts that are structurally common to all cells within the source; that is, heterologous transcripts within the source are largely removed from the material analyzed during part 3 (see the upper trash can in Figure 1, labeled "removal of heterologous transcripts").
Part 4. The cDNA inserts present in the flow-through obtained from each cell line are independently amplified and labeled by polymerase chain reaction. In part 4, polymerase chain reaction amplification has two objectives. First, it increases the number of copies of the remaining individual cDNA inserts that come from each source population. Second, and more importantly, it allows independent labeling of the inserts that come from each source population. In this way, inserts that come from a given source population can be selectively removed or recovered. For example, Figure 1 illustrates the use of two cell lines as source populations, with cell line no. 1 exhibiting the phenotype of interest ("with phenotype" in Figure 1). Here, the inserts that are not to be part of the final analysis are labeled for removal (i.e., transcripts not associated with the phenotype of interest; see the trash can at the lower right, "removal of all molecules with cell line no. 2 strands"; see also below). The labels used in part 4 are attached to the primers used in the relevant polymerase chain reaction.
Suitable labels include molecules that can be specifically bound and then removed from solution along with their attached polymerase chain reaction products. For example, the labels can be: (a) biotin molecules recognized by streptavidin coated on a solid support; or (b) short peptides recognized by specific monoclonal antibodies fixed on solid supports. The solid supports used can be beads, resins, nitrocellulose paper or others well known to those skilled in the art.
Part 5. The polymerase chain reaction-amplified DNAs obtained from the independent sources and subjected to parts 1-4 are now combined, denatured and reannealed.
Part 6. The reannealed polymerase chain reaction products are then subjected to a second round of mismatch binding column chromatography (see the lower column in Figure 1). In the approach of Figure 1, the material trapped on the lower column is primarily mismatched heteroduplexes composed of an unlabeled strand that comes from cell line no. 1 and a labeled strand that comes from cell line no. 2. This material therefore represents transcripts that are expressed by both cell lines but carry cell line-specific mutations, and can either be discarded (see the trash can at the lower right of Figure 1) or recovered, cloned and analyzed. It should be noted that if parts 2 and 3 had not been carried out previously, the material trapped after combining the sources in parts 5 and 6 would not be worth recovering, since it would be heavily contaminated by the random heterologies present in each cell line source. The recovery of heteroduplexes trapped on a MutS mismatch binding column can be carried out in at least two ways. First, the column can be filled with a buffer that contains ATP. The presence of ATP allows the ATPase activity of MutS to release the trapped heteroduplexes.
The ATP concentration suitable to effect the release ranges from about 1 mM to about 6 mM; the optimum concentration is approximately 3 mM (see, for example, Allen et al., 1997, EMBO J. 16, 4467-4476). Recovering the trapped heteroduplexes with ATP has the additional advantage of regenerating the column for subsequent use. Second, recovery can be effected using a protease (some proteases may not be suitable for use with certain short peptide labels). For example, the column can be treated with a buffer containing a protease (for example, proteinase K), which destroys the MutS protein molecules immobilized in the column and thereby releases the trapped heteroduplexes. The trapped material from the lower column in the example of Figure 1 is composed of one labeled strand and one unlabeled strand. This material can be discarded if one is interested only in the transcripts of cell line number 1 (see the trash can at the lower right of Figure 1). Alternatively, this material can be specifically recovered (for example, by using beads coated with antibody or with streptavidin, according to the label used in part 4 above) in order to examine the genetic differences in the transcripts expressed by both input cell lines. If this specific recovery is desired, the isolated material is amplified by polymerase chain reaction for a few cycles to produce clonable fragments having unlabeled 5' ends. It is worth noting that this recovery retains the original structures specific to each cell line, since in the polymerase chain reaction each strand of the original mismatched heteroduplex independently gives rise to a perfectly matched homoduplex. It is also possible to clone separately the transcripts that come from each source cell line.
This is achieved by denaturing the heteroduplexes that were released from the column and subsequently bound to the label-binding agent (e.g., streptavidin beads), separating the pellet (which contains the labeled strands) from the supernatant (which contains the unlabeled strands), and carrying out two polymerase chain reactions using the material in the pellet and in the supernatant as separate templates. The material not trapped by the lower column in the scheme of Figure 1 (i.e., the column flow-through) potentially contains three types of mismatch-free duplex DNA molecules, as follows. First, it may contain unlabeled homoduplexes that primarily represent transcripts overexpressed or expressed exclusively by the "unlabeled" cell line (i.e., transcripts that have no counterparts, or very few counterparts, in the "labeled" cell line). Second, it may contain homoduplexes labeled on both strands that primarily represent transcripts overexpressed or expressed exclusively by the "labeled" cell line (i.e., transcripts that have no counterparts, or very few counterparts, in the "unlabeled" cell line). Third, it may contain hybrid homoduplexes labeled on one strand only, which represent transcripts common to both cell lines and expressed at comparable levels. As in the case of the mismatched material, the singly labeled hybrid homoduplexes as well as the double-labeled homoduplexes can be specifically removed from the solution, leaving behind the transcripts from the unlabeled cell line that have no counterparts in the labeled cell line. It will be noted that transcripts specific to the labeled cell line (i.e., double-labeled homoduplexes) cannot be separated from transcripts common to both cell lines (i.e., singly labeled hybrid homoduplexes) under the scheme illustrated in Figure 1. To isolate these transcripts, the labeling strategy is reversed and the experiment is repeated.
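The sorting of duplexes by the second column in the scheme of Figure 1 can be summarized in a short sketch. This is only an illustrative model under simplifying assumptions: the `Duplex` class, its field names and the category strings are hypothetical and do not appear in the method itself, every mismatched duplex is assumed to be trapped, and every labeled duplex is assumed to be removed completely by the label-binding agent.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Duplex:
    """Hypothetical model of one reannealed duplex entering the lower column."""
    strand_a_labeled: bool  # True if this strand carries the part-4 label
    strand_b_labeled: bool
    has_mismatch: bool      # True if the two strands differ in sequence


def classify(duplex: Duplex) -> str:
    """Assign a duplex to the trapped fraction or to a flow-through category."""
    if duplex.has_mismatch:
        # Idealization: immobilized MutS retains every mismatched duplex.
        return "trapped"
    labeled_strands = int(duplex.strand_a_labeled) + int(duplex.strand_b_labeled)
    if labeled_strands == 0:
        return "flow-through: unlabeled homoduplex"
    if labeled_strands == 2:
        return "flow-through: double-labeled homoduplex"
    return "flow-through: singly labeled hybrid homoduplex"


def recover_unlabeled(duplexes):
    """Model the label-removal step: only fully unlabeled homoduplexes remain."""
    return [d for d in duplexes if classify(d) == "flow-through: unlabeled homoduplex"]
```

In this model the double-labeled and singly labeled homoduplexes are removed together, which mirrors the observation that transcripts specific to the labeled cell line cannot be separated from the common transcripts without reversing the labeling.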
Alternatively, a totally different labeling strategy (i.e., a two-label strategy) may be employed, in which the transcripts from cell line number 1 are not left unlabeled. Here the "labeling" step represented in Figure 1 is carried out on both column flow-throughs, using a different label for each. Thus, in a single experiment, transcripts specific to a cell line (or tissue node) can be isolated from transcripts carrying mutations specific to a cell line (or to a node). It will further be noted that by using two or more different labeling agents (e.g., biotin and one or more short peptides), the approach can be multiplexed. That is, by using multiple labels, several different cell lines or tissue nodes can be analyzed concurrently, and the transcripts specific to each can be isolated individually. Multiplexing is limited only by the number of labels available and by the user's imagination in selecting input phenotypes.

5.2.2 SECOND APPROACH: SAMPLES FROM ORGANISMS HAVING CONSANGUINITY

The second approach to employing the VGIDSM method is especially appropriate for isolating founder-effect mutations from samples of populations having consanguinity, i.e., sharing at least one recent common ancestor. Preferred embodiments of the second approach are presented below; various modifications will be apparent to a person skilled in the art (for example, see Section 5.3 below). In a preferred embodiment, the individuals of a population have a common parent or grandparent. In another embodiment, the individuals of a population share a common ancestor within three generations (i.e., a great-grandparent). In another embodiment, the individuals of a population share a common ancestor within 10 generations. Here, obtaining control tissue samples from healthy family members is an absolute requirement.
While the overall procedure illustrated in Figure 2 is similar to the first approach in that it employs mismatch binding, there are important differences in the second approach, as described below.

First, as just mentioned, the healthy and affected individuals providing tissue samples should all share at least one recent common ancestor (i.e., be consanguineous). Obviously, the population labels "affected" (or "diseased") and "healthy" (or "control") are arbitrary: all that is required is that the two input populations differ in a phenotype of interest (not necessarily a disease) and that all individuals contributing to the input populations be consanguineous.

Second, at least two (2) affected relatives and two (2) healthy relatives must be sampled. For best results, samples should be collected from 5 to 6 affected individuals and from an equal number of healthy individuals. The individuals do not all have to come from the same nuclear family (defined here as having a common mother or father) and do not have to be matched for age.

Third, the cDNA libraries constructed from each tissue sample should be normalized in order to reduce the chances of losing rare transcripts. Library normalization techniques include any of the known techniques, for example those described in Section 5.3 below.

Fourth, a homogenization step is carried out on the samples obtained from affected individuals. The homogenization is carried out in the following manner: a sample is obtained from each affected individual and is used to construct an independent (i.e., separate) cDNA library; each library is then amplified by polymerase chain reaction, and the resulting products from all affected individuals are mixed. Denaturation, reannealing and trapping of mismatched duplexes on a column of immobilized MutS are then carried out (see the upper column in Figure 2).
Even though this homogenization step will result in a 50% reduction in the frequency of heterozygous mutant transcripts in the flow-through material, the step is preferred in order to ensure the isolation of transcripts structurally common to all affected individuals (see the upper column flow-through in Figure 2). The material recovered in the upper column flow-through is then labeled by polymerase chain reaction as described above.

Fifth, a homogenization step like that performed on the affected samples is not applied to the healthy (i.e., control) samples. Instead, the polymerase chain reaction material obtained from each of the control cDNA libraries is mixed together and then added directly to the labeled polymerase chain reaction products of the affected individuals obtained from the upper column flow-through illustrated in Figure 2. This complex mixture is then denatured and randomly reannealed before exposure to the lower MutS column illustrated in Figure 2. The main reason for this step is to provide an efficient counterbalance to the effects of inbreeding. The more closely related the affected individuals are, the greater the number of structurally identical loci they have in common. As a result, the set of transcripts that remains after the homogenization step can be quite large, but it is likely that only some of these are relevant to the disorder. In addition, familial genetic disorders are frequently related to specific mutations that are common among the affected members of the family transmitting the disease. However, this does not mean that unaffected individuals are free of mutations at the loci in question. It simply means that the unaffected individuals have inherited polymorphisms other than those associated with the disease, and there could be several silent polymorphisms of this type.
The net result of the above considerations is that while the affected individuals within a disease-transmitting family most likely share the same mutations, the healthy members of the family do not necessarily have the same "healthy" alleles. Therefore, to identify the mutant loci associated with a familial disorder (together with the healthy allelic forms), it is advisable first to isolate the transcripts structurally common to all affected individuals (i.e., to reduce complexity by homogenization). At the same time, it is advisable to maintain as much diversity as possible within the control samples in order to maximize the chances of isolating all healthy allelic variants. Therefore, in the second approach, the mismatched heteroduplexes trapped by the second column (i.e., the lower column in Figure 2) potentially have two sources: (a) unlabeled heteroduplexes with both strands originating from healthy individuals; and (b) hybrid heteroduplexes labeled on one strand, which comes from affected individuals, representing transcripts structurally common to all affected individuals that are also present, with sequence differences, in their healthy relatives. Thus, any mutant allele associated with the disease, as well as its "healthy" counterpart, will be found in the trapped material. After release of the trapped material with either ATP or proteinase K as described above, the labeled strand can be specifically removed from the solution. The material flowing through the second (lower) column in Figure 2 potentially contains: (a) singly labeled mismatch-free duplexes representing transcripts structurally common to affected and unaffected relatives; (b) double-labeled mismatch-free duplexes representing transcripts structurally common to affected relatives only; and (c) unlabeled mismatch-free duplexes representing transcripts present in unaffected relatives only.
The unlabeled duplexes can be specifically recovered by removing all labeled mismatch-free duplexes from the solution using a label-binding agent (e.g., streptavidin-coated beads). It will be noted that transcripts specific to affected individuals only (i.e., double-labeled mismatch-free duplexes) cannot be recovered directly from the flow-through of the lower (second) column of Figure 2 using this approach. To isolate transcripts of this type, it would be necessary to reverse the labeling strategy and repeat the experiment (i.e., to label during the lower left "polymerase chain reaction" step shown in Figure 2). When only the healthy individuals are labeled, however, the mutations associated with the disease cannot be isolated from the mismatched heteroduplexes trapped in the lower MutS column (singly labeled mismatched hybrids); in addition, the vast majority of the trapped material will come from healthy individuals only, because no selective recovery of transcripts structurally common to all healthy individuals (i.e., homogenization) has been performed. Moreover, a selective recovery step (i.e., a parallel first column) to homogenize the nucleic acid population cannot be carried out on healthy relatives without a serious risk of losing the relevant alleles: the silent polymorphisms present will generate numerous mismatched heteroduplexes in the denaturation-reannealing step, and these would remain trapped (upper trash can of Figure 2) in the first round of MutS chromatography. Alternatively, of course, as described for the first approach in Section 5.2.1 above, more than one label may be used. It will be observed that the higher the level of inbreeding in the families contributing the normal and diseased samples, the lower the number of mismatched loci ultimately obtained.
Even though all the mismatched loci identified in this way will serve as markers to differentiate healthy individuals from diseased individuals, it will also be observed that silent genetic polymorphisms (i.e., changes in the DNA that are not harmful and not associated with the disease) will be identified as well. Therefore, the best results in identifying disease genes will be obtained using highly inbred populations, since inbreeding reduces the number of silent genetic polymorphisms between the input sources to a minimum. Genetic loci identified by the above procedure can be used as probes in population studies performed by the standard immobilized MutS genotyping approach on genomic DNA obtained from affected and healthy individuals (see Wagner et al., 1995, Nucl. Acids Res. 23, 3944-3948). Subsequent statistical analyses, well known to those skilled in the art, will readily identify the loci and alleles associated with susceptibility and resistance to the disease.

In summary, the second VGIDSM approach offers numerous advantages in searching for disease-causing genes from consanguineous sample populations. First, the approach turns highly inbred populations into an asset rather than a liability. Second, it allows rapid identification of genes in cases where the lack of physiological and/or biological information is such that there is no basis for proposing candidate genes. Third, it allows rapid identification of genes and of all alleles directly and indirectly associated with susceptibility and resistance to disease. Fourth, the approach can be applied to any consanguineous population, in contexts ranging from the search for susceptibility or resistance genes associated with multifactorial diseases to the search for rare genes conferring desirable monogenic traits.

The number of clones sequenced from the output obtained under either approach of the VGIDSM method is as desired by the user.
Optionally, one or several clones (for example, five or six) among those initially identified are sequenced to sample the results. The nucleic acid sequences are then analyzed by computer to determine open reading frames, and are used to drive a protein database search to determine whether any portion corresponds to a known protein motif. Preferably, this search is carried out by translating the nucleotide sequences in all six possible reading frames (three in each direction) in order to detect any protein already present in the database. Obviously, the VGIDSM method will also identify genes not yet represented in any database. In this case, the function of the gene can be inferred from the functional differences observed between the input phenotypes.

5.3 MISCELLANEOUS METHODS USED IN COMBINATION WITH THE VGIDSM METHOD

Under the VGIDSM method, nucleic acids (for example, mRNA) are extracted and cDNA is synthesized using techniques well known to those skilled in the art. For example, Gibco-BRL Trizol kits can be used for the preparation of mRNA and Promega Universal Riboclone kits can be used for the synthesis of cDNA, both in accordance with the manufacturers' protocols. The synthesized cDNA can be size-selected by any of the techniques well known to those skilled in the art. For example, agarose gel electrophoresis, sucrose density gradient centrifugation, molecular sieve chromatography or high performance liquid chromatography can be used. The cDNA fragments subsequently cloned can range from 100 bases up to 10 kilobases or more. However, it will be recognized that the optimal size for an error-free polymerase chain reaction is approximately 600 bases. It will further be noted that the optimal size for error-free reverse transcription is approximately 400 bases. A suitable viral reverse transcriptase is that obtained from Moloney murine leukemia virus (MMLV).
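The six-frame translation mentioned above for the protein database search can be illustrated with a minimal sketch. It assumes the standard genetic code and uppercase or lowercase A/C/G/T input; the function names and the frame keys ("+1" to "-3") are illustrative choices, and codons containing any other character are rendered as "X". It shows only the translation step, not the database search itself.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order at each position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def translate(seq: str) -> str:
    """Translate complete codons from position 0; '*' marks a stop codon."""
    seq = seq.upper()
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def six_frame_translations(seq: str) -> dict:
    """Translate a sequence in three frames on each strand."""
    frames = {}
    for strand, strand_seq in (("+", seq.upper()), ("-", reverse_complement(seq))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(strand_seq[offset:])
    return frames
```

For a clone insert, each of the six resulting peptide strings would then be submitted to whatever protein database search the user prefers.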
If the cDNA is fractionated by agarose gel electrophoresis, it can be recovered from gel slices using any of several well-known techniques. For example, the fragments can be collected by overnight diffusion into a small volume, or by using one of several commercially available kits, for example Gel-Clean (Promega) or QIAquick (Qiagen). There are no special considerations in selecting a vector for a cDNA library. The VGIDSM method will work independently of the specific library vector used. Frequently, the best vector will be the one with which the user is most familiar.
Obviously, the most important consideration for best results is to make sure that the libraries constructed represent rare transcripts as well as abundant ones, for example through normalization of the libraries. The library inserts are amplified by polymerase chain reaction using oligonucleotide primers (oligos) specific for the cloning vector. Labeled oligos are used as appropriate for the particular experimental design employed. For example, in the first VGIDSM approach (see Figure 1), the oligos used for the cell line number 2 library are labeled with biotin (see the example in Section 6). Any thermostable polymerase can be used, but polymerases with the lowest available error rates are preferred in order to reduce the number of mismatches created during the polymerase chain reaction. Examples of suitable enzymes are Taq DNA polymerase and Pfu DNA polymerase. It is important to remember that large numbers of cycles are not required, since the goal is simply to produce linearized (and, if necessary, labeled) fragments from the library. The polymerase chain reaction products are column purified, thermally denatured, reannealed, and cooled to room temperature. Heteroduplex DNA is subtracted from the cooled, reannealed polymerase chain reaction products using mismatch-binding chromatography. This can conveniently be carried out in various formats, including a test tube format, a column format, or any other user-selected format that allows heteroduplex DNA to bind to an immobilized mismatch binding protein. For example, the DNA in a reaction buffer (for example, 350 ng in 100 μl) can be placed in a container (for example, a 0.5 ml Eppendorf tube) containing MutS (for example, 10 μg) adsorbed onto glass beads (for example, 100 μm diameter, acid washed, from Sigma Chemical Co.). The incubation is carried out for a period of time sufficient to allow mismatch binding to occur (for example, 15-55 min).
The incubation time may vary according to the measures taken to increase the contact surface area between the immobilized mismatch binding protein and the reannealed cDNA. Such measures may include slow rotation of the container or placement of the container in a horizontal position. The unbound reannealed polymerase chain reaction products left free in solution can be recovered as column flow-through (in a column format) or as supernatant after centrifugation (in a test tube format). It is often helpful to repeat the mismatch binding protein-mediated trapping operation, using fresh immobilized protein, a total of two to four times to ensure the removal of all mismatched heteroduplexes. The optimal number of repetitions will depend primarily on the relative amount of mismatched heteroduplexes to be trapped and on the amount of protein available for trapping in each round. It will be helpful, in some cases, to use an excess of DNA from one source when a subtraction is carried out. For example, before carrying out the second round of trapping in the approach illustrated in Figure 1, an excess of DNA from the source without the phenotype of interest (i.e., cell line number 2) over DNA from the source with the phenotype of interest (i.e., cell line number 1) can be used in order to ensure the complete removal of all transcripts that are identical between the two sources. In this respect, the source without the phenotype of interest can be regarded as a molecular broom that sweeps away the unwanted transcripts. The ratio of excess DNA can vary over a wide range, from 1.01:1 to 100:1. Frequently it will be within a range of 1.1:1 to 10:1. Most often it will be within a range of 1.5:1 to 6:1. A recommended starting ratio is 3:1. For best results in obtaining cDNA representing rare transcripts encoding the phenotype of interest, normalized cDNA libraries are prepared from each source of input mRNA.
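The excess-DNA ratios stated above lend themselves to a small planning calculation. The sketch below is a convenience only: the function and parameter names are hypothetical, the ratio is assumed to be a mass ratio (the text does not state whether mass or molar excess is intended), and the band descriptions simply restate the ranges given in the text.

```python
def excess_dna_ng(with_phenotype_ng: float, ratio: float = 3.0) -> float:
    """Mass of DNA (ng) from the source WITHOUT the phenotype of interest.

    `ratio` is the excess over the DNA from the source with the phenotype;
    the default of 3.0 is the recommended starting ratio (3:1).
    """
    if with_phenotype_ng <= 0:
        raise ValueError("DNA mass must be positive")
    if not 1.01 <= ratio <= 100.0:
        raise ValueError("ratio outside the stated 1.01:1 to 100:1 range")
    return with_phenotype_ng * ratio


def ratio_band(ratio: float) -> str:
    """Report the narrowest stated range into which a ratio falls."""
    if 1.5 <= ratio <= 6.0:
        return "most often used (1.5:1 to 6:1)"
    if 1.1 <= ratio <= 10.0:
        return "frequently used (1.1:1 to 10:1)"
    if 1.01 <= ratio <= 100.0:
        return "within the wide range (1.01:1 to 100:1)"
    return "outside the stated range"
```

For instance, under these assumptions 100 ng of cell line number 1 DNA at the default ratio would be combined with 300 ng of cell line number 2 DNA.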
A portion of each input library should be retained in non-normalized form for further analysis, if desired. Known normalization techniques include, but are not limited to, those described in the following documents: Soares and Efstratiadis, June 10, 1997, Normalized cDNA libraries, U.S. Patent No. 5,637,685; Patanjali et al., March 1991, Construction of a uniform-abundance (normalized) cDNA library, Proc. Natl. Acad. Sci. USA 88, 1943-1947; and Ko, 1990, An "equalized cDNA library" by the reassociation of short double-stranded cDNAs, Nucl. Acids Res. 18, 5709.

Suitable mismatch binding proteins that may be employed have been described previously (see, for example, Wagner, May 11, 1995, Immobilized mismatch binding protein for detection or purification of mutations or polymorphisms, International Publication No. WO 95/12689). A preferred mismatch binding protein is characterized by its ability to bind DNA-DNA duplexes containing mispaired or unpaired bases (Id. at 13). For example, in addition to E. coli MutS, the mismatch binding protein can be human MSH2 (Fishel et al., 1994, Science 266, 1403-1405; Fishel et al., 1994, Cancer Res. 54, 5539-5542; Mello et al., 1996, Chem. Biol. 579-589), an hMSH2-hMSH6 protein complex (Acharya et al., 1996, Proc. Natl. Acad. Sci. USA 93, 13629-13634; Gradia et al., 1997, Cell 91, 995-1005), or homologues from various other organisms such as yeast (Miret et al., 1993, J. Biol. Chem. 268, 3507-3513).

Suitable conditions for reannealing (i.e., hybridization) reactions have been described, for example, by Sambrook et al., 1989, in Molecular Cloning, A Laboratory Manual, second edition, Cold Spring Harbor Laboratory Press, Cold Spring Harbor, New York. The separation of labeled strands from unlabeled strands, or from differently labeled strands, is carried out using standard techniques.
For example, biotin-labeled strands bound to streptavidin-coated beads can be placed in a first container and thermally denatured into single strands. After denaturation, the supernatant is removed from the first container and transferred to a second container, which separates the labeled strands from the unlabeled or otherwise labeled strands. Each set of strands can now be independently amplified by polymerase chain reaction (for a few cycles), cloned and sequenced. Suitable nucleic acid labels and their partner molecules or agents (i.e., binding partners) include, but are not limited to, biotin and streptavidin, and short peptide labels and monoclonal antibodies. These are discussed in Sections 5.8 and 5.9 below. Suitable methods for linearizing the inserts in a cDNA library include, but are not limited to, polymerase chain reaction and digestion with restriction enzyme(s). Suitable methods of cDNA amplification include, but are not limited to, polymerase chain reaction and propagation in bacteria.

5.3.1 AMPLIFICATION OF DNA

The polymerase chain reaction (PCR) can be used in connection with the invention to amplify a desired sequence from a source (e.g., a tissue sample, or a genomic or cDNA library). Oligonucleotides representing known sequences can be used as primers in the polymerase chain reaction. The polymerase chain reaction is typically carried out using a thermal cycler (e.g., from Perkin-Elmer Cetus) and a thermostable polymerase (e.g., Gene Amp™ brand Taq polymerase). The nucleic acid template to be amplified may include, but is not limited to, mRNA, cDNA or genomic DNA of any species. The polymerase chain reaction amplification method is well known in the art (see, for example, U.S. Patent Nos. 4,683,202, 4,683,195 and 4,889,818; Gyllensten et al., 1988, Proc. Nat'l. Acad. Sci. U.S.A. 85, 7652-7656; Ochman et al., 1988, Genetics 120, 621-623; Loh et al., 1989, Science 243, 217-220).
Any prokaryotic cell, eukaryotic cell, or virus can serve as a source of nucleic acids. For example, nucleic acid sequences can be obtained from the following sources: human, porcine, bovine, feline, avian, equine, canine, insect (e.g., Drosophila), invertebrate (e.g., C. elegans), plant, etc. The DNA can be obtained by standard procedures known in the art (see, for example, Sambrook et al., 1989, Molecular Cloning, A Laboratory Manual, second edition, Cold Spring Harbor Laboratory Press, Cold Spring Harbor, New York; Glover (ed.), 1985, DNA Cloning: A Practical Approach, IRL Press, Ltd., Oxford, U.K., Vol. 1, 2).

5.3.2 ADJUSTMENT OF STRINGENCY CONDITIONS

Other methods available for use in connection with the methods of this invention include hybridization of nucleic acids under conditions of low, moderate or high stringency (for example, Northern and Southern blotting). Methods for adjusting the stringency of hybridization are known in the art (see, for example, Sambrook et al., 1989, Molecular Cloning, A Laboratory Manual, second edition, Cold Spring Harbor Laboratory Press, Cold Spring Harbor, New York; see also Ausubel et al., eds., in the Current Protocols in Molecular Biology series of laboratory technique manuals, 1987-1994, Current Protocols, 1994-1997, John Wiley and Sons, Inc.; see, especially, Dyson, N.J., 1991, Immobilization of nucleic acids and hybridization analysis, in: Essential Molecular Biology: A Practical Approach, Vol. , T.A. Brown, ed., pp. 111-156, IRL Press at Oxford University Press, Oxford, UK). Each of these references is hereby incorporated by reference in its entirety. Salt concentration, melting temperature, the presence or absence of denaturing agents, and the type and length of nucleic acid to be hybridized (for example, DNA, RNA, PNA) are some of the variables that must be taken into account when adjusting the stringency of a particular hybridization reaction according to methods known in the art.

Low-stringency conditions, by way of example and not limitation, may be the following (see also Shilo and Weinberg, 1981, Proc. Natl. Acad. Sci. U.S.A. 78, 6789-6792). Filters containing DNA are pretreated for 6 hours at 40°C in a solution containing 35% formamide, 5X SSC, 50 mM Tris-HCl (pH 7.5), 5 mM EDTA, 0.1% PVP, 0.1% Ficoll, 1% BSA, and 500 μg/ml of denatured salmon sperm DNA. Hybridizations are carried out in the same solution with the following modifications: 0.02% PVP, 0.02% Ficoll, 0.02% BSA, 100 μg/ml of salmon sperm DNA, 10% (weight/volume) dextran sulfate, and 5-20 x 10^6 cpm of 32P-labeled probe. The filters are incubated in the hybridization mixture for 18-20 hours at 40°C, and then washed for 1.5 hours at 55°C in a solution containing 2X SSC, 25 mM Tris-HCl (pH 7.4), 5 mM EDTA, and 0.1% SDS. The wash solution is replaced with fresh solution and the filters are incubated for an additional 1.5 hours at 60°C. The filters are blotted dry and exposed for autoradiography. If necessary, the filters are washed a third time at 65-68°C and re-exposed to film.

High-stringency conditions, by way of example and not limitation, may be the following.
Prehybridization of the DNA-containing filters is carried out for 8 hours to overnight at 65°C in a buffer composed of 6X SSC, 50 mM Tris-HCl (pH 7.5), 1 mM EDTA, 0.02% PVP, 0.02% Ficoll, 0.02% BSA, and 500 μg/ml of denatured salmon sperm DNA. The filters are washed at 37°C for one hour in a solution containing 2X SSC, 0.01% PVP, 0.01% Ficoll, and 0.01% BSA. This is followed by a wash in 0.1X SSC at 50°C for 45 minutes before autoradiography.

5.3.3 OLIGONUCLEOTIDE ANALOGS

Oligonucleotides used in connection with the invention are frequently within a range of 10 to about 50 nucleotides in length. In specific aspects, an oligonucleotide is 10 nucleotides, 15 nucleotides, 20 nucleotides or 50 nucleotides in length. An oligonucleotide may be DNA or RNA, or chimeric mixtures or derivatives or modified versions thereof, and may be single-stranded, double-stranded, or partially double-stranded. An oligonucleotide can be modified in the base moiety, the sugar moiety, or the phosphate backbone, or a combination thereof. An oligonucleotide may include other appended groups such as biotin, fluorophores, or peptides.
An oligonucleotide may comprise at least one modified base moiety selected from the group including, but not limited to: 5-fluorouracil, 5-bromouracil, 5-chlorouracil, 5-iodouracil, hypoxanthine, xanthine, 4-acetylcytosine, 5-(carboxyhydroxylmethyl)uracil, 5-carboxymethylaminomethyl-2-thiouridine, 5-carboxymethylaminomethyluracil, dihydrouracil, beta-D-galactosylqueosine, inosine, N6-isopentenyladenine, 1-methylguanine, 1-methylinosine, 2,2-dimethylguanine, 2-methyladenine, 2-methylguanine, 3-methylcytosine, 5-methylcytosine, N6-adenine, 7-methylguanine, 5-methylaminomethyluracil, 5-methoxyaminomethyl-2-thiouracil, beta-D-mannosylqueosine, 5'-methoxycarboxymethyluracil, 5-methoxyuracil, 2-methylthio-N6-isopentenyladenine, uracil-5-oxyacetic acid (v), pseudouracil, queosine, 2-thiocytosine, 5-methyl-2-thiouracil, 2-thiouracil, 4-thiouracil, 5-methyluracil, uracil-5-oxyacetic acid methyl ester, uracil-5-oxyacetic acid (v), 5-methyl-2-thiouracil, 3-(3-amino-3-N-2-carboxypropyl)uracil, and 2,6-diaminopurine.

An oligonucleotide may comprise at least one modified phosphate backbone selected from the group including, but not limited to, a phosphorothioate, a phosphorodithioate, a phosphoramidothioate, a phosphoramidate, a phosphordiamidate, a methylphosphonate, an alkyl phosphotriester, and a formacetal or analog thereof.

An oligonucleotide or derivative thereof used in connection with the methods of this invention can be synthesized by any method known in the art, for example by use of an automated DNA synthesizer (such as those commercially available from Biosearch, Applied Biosystems, etc.). For example, phosphorothioate oligonucleotides can be synthesized by the method of Stein et al. (Stein et al., 1988, Nucl. Acids Res. 16, 3209), methylphosphonate oligonucleotides can be prepared by use of controlled pore glass polymer supports (Sarin et al., 1988, Proc. Natl. Acad. Sci. USA 85, 7448-7451), etc.
An oligonucleotide can be an alpha-anomeric oligonucleotide. An alpha-anomeric oligonucleotide forms specific double-stranded hybrids with complementary RNA in which, contrary to the usual beta units, the strands run parallel to each other (see Gautier et al., 1987, Nucl. Acids Res. 15, 6625-6641). Oligonucleotides can be synthesized using any method known in the art (e.g., standard phosphoramidite chemistry on an Applied Biosystems 392/394 DNA synthesizer). In addition, reagents for synthesis can be obtained from any of the commercial suppliers. Spacer phosphoramidite molecules can be used during oligonucleotide synthesis, for example, to bridge sections of oligonucleotides where base pairing is not desired or to position labels away from the portion of an oligonucleotide that is subject to base pairing. The length of the spacer can be varied by consecutive additions of spacer phosphoramidites. The spacer phosphoramidite molecules can be used as 5' or 3' oligonucleotide modifiers. Such spacers include Spacer Phosphoramidite 9 (i.e., 9-O-dimethoxytrityl-triethylene glycol, 1-[(2-cyanoethyl)-(N,N-diisopropyl)]-phosphoramidite) and Spacer Phosphoramidite 18 (i.e., 18-O-dimethoxytrityl-hexaethylene glycol, 1-[(2-cyanoethyl)-(N,N-diisopropyl)]-phosphoramidite), both available from Glen Research (Sterling, Va.). Other spacers are available for use in standard oligonucleotide synthesis. Spacer Phosphoramidite C3 and dSpacer Phosphoramidite can be used to destabilize undesirable self-hybridization events within capture oligonucleotides or to destabilize false hybridization events between mismatched template/probe complexes. When placed at the 3' end of an oligonucleotide, these spacers will also prevent the generation of incorrect extension products when included in a polymerase chain reaction mixture.
A spacer available from Glen Research, Spacer Phosphoramidite C3 (i.e., 3-O-dimethoxytrityl-propyl-1-[(2-cyanoethyl)-(N,N-diisopropyl)]-phosphoramidite), can be added to replace an unknown base within an oligonucleotide sequence. A branching spacer can be employed as a means of increasing the incorporation of a marker into an oligonucleotide. Such a branching spacer can also be used to increase hybridization signals through multiply branched capture probes or polymerase chain reaction primers. Branching spacers are available, for example, from Glen Research.
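The statement that spacer length can be varied by consecutive additions of spacer phosphoramidites can be illustrated with a small calculation. The following is only a sketch: the atom counts per unit are an assumption read off the product names and chemical descriptions (triethylene glycol for Spacer 9, hexaethylene glycol for Spacer 18, a propyl chain for Spacer C3), and the names `SPACER_ATOMS` and `spacer_arm_length` are hypothetical.

```python
# Assumed number of chain atoms contributed by one unit of each spacer.
# These values are inferred from the spacer names, not stated in the text.
SPACER_ATOMS = {
    "Spacer 9": 9,    # triethylene glycol unit
    "Spacer 18": 18,  # hexaethylene glycol unit
    "Spacer C3": 3,   # propyl unit
}


def spacer_arm_length(units):
    """Return the total chain length, in atoms, of consecutively added spacers."""
    return sum(SPACER_ATOMS[name] for name in units)


# Two Spacer 18 additions followed by one Spacer 9 addition.
arm = spacer_arm_length(["Spacer 18", "Spacer 18", "Spacer 9"])
print(arm)
```

Under these assumptions, each additional coupling cycle lengthens the arm by a fixed increment, which is why the distance between a marker and the base-pairing region can be tuned in steps.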
Biotinylated oligonucleotides are well known in the art. An oligonucleotide can be biotinylated using a biotin-NHS ester procedure. Alternatively, biotin can be attached during oligonucleotide synthesis using a biotin phosphoramidite (Cocuzza, 1989, Tetrahedron Lett. 30, 6287-6290). One such biotin phosphoramidite available from Glen Research is 1-dimethoxytrityloxy-2-(N-biotinyl-4-aminobutyl)-propyl-3-O-(2-cyanoethyl)-(N,N-diisopropyl)-phosphoramidite. This compound also has a branching point to allow subsequent additions. The branched spacer used in this biotin phosphoramidite has been described by Nelson et al. (Nelson et al., 1992, Nucl. Acids Res. 20, 6253-6259). Another biotin phosphoramidite, a 5' biotin phosphoramidite, namely [1-N-(4,4'-dimethoxytrityl)-biotinyl-6-aminohexyl]-2-cyanoethyl-(N,N-diisopropyl)-phosphoramidite, can be used to biotinylate an oligonucleotide. This compound is sold by Glen Research under license from Zeneca PLC. Fluorescent dyes can also be incorporated into an oligonucleotide using dye-labeled phosphoramidites. Two markers of this type are 5'-Hexachloro-Fluorescein Phosphoramidite (HEX) and 5'-Tetrachloro-Fluorescein Phosphoramidite (TET), both available from Glen Research.

5.4 SELECTION OF PHENOTYPE TO OPTIMIZE THE VGIDSM METHOD
The best results are obtained with the VGIDSM method when special attention is paid to phenotype selection (see Figure 3). The following is a preferred but not limiting phenotype selection method. The process of phenotype selection usually begins with a review of the literature. This includes a review of the biological literature, medical literature, chemical literature, published bioassays, and clinical data relating to a phenotype of interest and to any phenotype to be compared with (i.e., subtracted from) the phenotype of interest. In this regard, the most current edition of a catalog of known genetic diseases can be consulted to begin the literature review. For example, one catalog of human phenotypes is: McKusick, Victor A., Mendelian Inheritance in Man, Catalog of Autosomal Dominant, Autosomal Recessive, and X-Linked Phenotypes (10th edition, 1992, The Johns Hopkins University Press, Baltimore, Maryland) (hereinafter "MIM®"). MIM® is also available in a continuously updated online version (hereinafter "OMIM®"), which can be accessed at no cost by contacting OMIM® user support, Welch Memorial Library, 1830 East Monument Street, Third Floor, Baltimore, Maryland 21205, or via email at omimhelp@welch.jhu.edu. In general, MIM® and OMIM® comprise a catalog with one entry per human gene locus, whether or not that gene has been associated with a particular disease. Each entry, usually one or two paragraphs, offers information with the following components (when the information is available): (a) title, including synonyms in parentheses; (b) description of the phenotype or gene product; (c) nature of the basic defect in any associated disorder; (d) diagnostic description and disease management, when applicable; (e) genetic aspects, including map information; (f) allelic variants; (g) references. Search aids in MIM® include an author index and a title index.
The content of OMIM®, besides being the most current data available in these catalogs, can be searched in its entirety by computer. There are almost 6,000 entries in the twelfth edition of MIM® (1992). Therefore, if one accepts the usual estimate that there are perhaps 100,000 human genes, this catalog represents only 6% of the total.
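The 6% figure follows directly from the two numbers quoted above. As a minimal check, using the text's round figures of about 6,000 catalog entries and an assumed 100,000 human genes (the function name `catalog_coverage` is hypothetical):

```python
def catalog_coverage(entries, estimated_genes):
    """Fraction of the estimated gene count represented by catalog entries."""
    return entries / estimated_genes


# Round figures quoted in the text: ~6,000 entries, ~100,000 genes.
coverage = catalog_coverage(6_000, 100_000)
print(f"{coverage:.0%}")  # prints 6%
```

Both inputs are estimates, so the result is an order-of-magnitude indication rather than a precise coverage value.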
Therefore, the vast majority of genes identified using the VGIDSM method will not be represented. Nevertheless, existing entries may include related phenotypes and/or references that offer information as to the genetic nature of the phenotype of interest. Other useful sources of information include computerized journal collections such as Medline. In the case of disease phenotypes, internal medicine textbooks can also be consulted (see Isselbacher et al., eds., Harrison's Principles of Internal Medicine, thirteenth edition, McGraw-Hill, Inc., New York). The practitioner skilled in the relevant art generally knows which literature sources should be reviewed. Of course, a phenotype need not be recognized in the literature as having a genetic component for the VGIDSM method to identify genes associated with that phenotype. In fact, it may be the absence of such published recognition that leads the practitioner to ask which genes are identified using the VGIDSM method. In this respect, the knowledge or belief of the practitioner is an important factor to take into account when selecting the phenotype. A given phenotype of interest can be quite complex and will often be polygenic (see comments in Section 2.1, above). In one embodiment, the VGIDSM method may involve one of the two approaches presented above or a combination of the two. Recall that one approach, presented schematically in Figure 1, involves carrying out the VGIDSM method using phenotypic groups defined by sources not known to share a common ancestor (for example, most cell line samples). The other approach, presented schematically in Figure 2, involves carrying out the VGIDSM method using phenotypic groups defined by samples obtained from sources known to share at least one common ancestor (i.e., inbred sources). An overview of the phenotype selection process is presented in Figure 3.
In this figure, the word PRACTITIONER represents a person skilled in the relevant biological art (i.e., a specialist in genetics, microbiology, virology, endocrinology, plant molecular biology, pathology, or physiology, a surgeon, a postdoctoral fellow, a graduate student, or a research technician); LITERATURE SEARCH represents a review of the relevant literature made by the practitioner; and PHENOTYPE SELECTION represents the identification of appropriate biological samples by the practitioner after taking into account the literature search and the practitioner's personal knowledge.

5.4.1 TISSUE SAMPLE COLLECTION

Tissue samples are typically collected using methods well known to those skilled in the art. For example, to identify genes involved in colon cancer, the gastroenterologist or endoscopy specialist can collect healthy or diseased biopsy specimens using an endoscope. Common sense is the guiding principle here. The VGIDSM method will provide the best results when normal and diseased samples are systematically and fully defined using objective criteria.

5.4.2 CELL CULTURE

When the VGIDSM method is used with cell lines as input phenotypes, the utmost care is advised. The specific culture conditions used will profoundly influence gene expression in virtually all cell lines. For example, the steroid hormone aldosterone influences the expression of genes important for salt absorption by epithelia such as the A6 cell line derived from Xenopus laevis. The concentration of hormones and growth factors can vary over a wide range in commonly used supplements (for example, fetal calf serum or newborn calf serum). Therefore, attention must be paid to the control of these variables. If problems arise, reserving specific batches of all the ingredients used should be considered.
In addition, chemical analyses of specific components may also be required as part of the standardization process. This concern regarding the control of growth conditions is not limited to hormones and growth factors. Gene expression can be influenced by basic parameters such as the duration between passages, incubation temperature, pH, and the like. Therefore, the identification of genes with the VGIDSM method using cell lines as input sources will be optimized and improved through careful attention to the definition and constant maintenance of the culture conditions of the cells associated with the phenotype of interest. This is also true for the phenotype or phenotypes to be compared (i.e., subtracted). Animal cell culture has been described in numerous references in the literature. The literature search carried out to select input phenotypes should focus, in part, on determining the optimal cell culture conditions. For a broad overview of cell culture techniques and other relevant considerations, see Freshney, R.I., 1994, Culture of Animal Cells, A Manual of Basic Technique, third edition, John Wiley and Sons, Inc., New York, New York.

5.5 TROUBLESHOOTING THE VGIDSM METHOD

If a given phenotype of interest is initially resistant to the approaches described above for employing the VGIDSM method to identify a gene of interest, the following troubleshooting discussion may be useful. A resistant phenotype of interest can be indicated by the identification of no genes, or the identification of an excess of genes (for example, more than 100), in the appropriate set in the selected experimental design. Consider the case in which an initial screening does not identify a genetic component associated with a given phenotype of interest. In this case, special care must be taken to redefine the nucleic acid populations defined by the input phenotypes.
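The two failure signals named above (no genes identified, or an excess of genes) can be expressed as a simple triage rule. This is a sketch only: the function name `classify_screen` and the returned labels are hypothetical, and the threshold of 100 is the example value given in the text, not a fixed limit of the method.

```python
def classify_screen(num_genes, excess_threshold=100):
    """Triage a screening result by the number of genes it identified.

    excess_threshold defaults to 100, the example figure from the text;
    a practitioner may choose another value for a given design.
    """
    if num_genes < 0:
        raise ValueError("num_genes must be non-negative")
    if num_genes == 0:
        return "resistant: no genes identified"
    if num_genes > excess_threshold:
        return "resistant: excess genes identified"
    return "acceptable"


for count in (0, 12, 250):
    print(count, classify_screen(count))
```

When a screen falls in the first category, the input populations themselves are the first thing to re-examine.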
For example, a synergistic effect between one or more genes and an environmental factor may be required for the manifestation of the phenotype of interest. In this case, it is desirable to identify and control any environmental factors present. In this way, a weak genetic determinant for a given phenotype of interest can be reinforced by careful modification of the criteria for inclusion in a phenotypic group. Various biological assays can also be used to further define a phenotypic group. Examples of such assays are presented in the next section.

5.6 ASSAYS FOR PHENOTYPE SELECTION

Enzymatic and receptor-based biological assays can also be used to further define a phenotype that is initially resistant to gene identification with the VGIDSM method. This definition is directed towards excluding from the population those individuals who cannot contribute to the genotype and whom it could therefore be beneficial to exclude from the gene identification assay.
The eventual therapeutic use or uses resulting from the identification of the genes may serve as a guide for selecting relevant biological assays known in the art. For example, bioassays selected for the further definition of the schizophrenia phenotype may involve a panel of central nervous system receptors involved in this disease. Many sources describing enzymatic or receptor assays are available. One example is the series Methods in Enzymology published by Academic Press. One skilled in the art will know the most appropriate assays for defining the input phenotype. For example, to employ the VGIDSM method in a neurological disorder with a genetic component, the relevant bioassays may include assays to determine the activity of adrenergic receptors, cholinergic receptors, dopamine receptors, GABA receptors, glutamate receptors, monoamine oxidase, nitric oxide synthetase, opiate receptors, or serotonin receptors. In the case of cardiovascular disorders, appropriate assays may include adenosine A1 receptors, adrenergic receptors (including α1, α2, β1), inhibition of angiotensin I, platelet aggregation, blockade of ion channels (e.g., calcium channels, chloride channels), and measurement of cardiac arrhythmia, blood pressure, heart rate, contraction, or hypoxia. In the case of a metabolic disorder, the following bioassays can be used: serum cholesterol, serum HDL, serum HDL/cholesterol ratio, HDL/LDL ratio, serum glucose, kaliuresis, saluresis, or change in urine volume. In the case of an allergic or inflammatory disorder, the following bioassays can be used: Arthus reaction, passive cutaneous anaphylaxis, bradykinin B2, contraction of the trachea, histamine H1 antagonism, effects of carrageenan on macrophage migration, leukotriene D antagonism, neurokinin NK1 antagonism, or cytokine assays (e.g., interleukins or macrophage inhibitory proteins).
In the case of gastrointestinal disorders, the following bioassays may be used: cholecystokinin CCKA antagonism, cholinergic antagonism, gastric acidity, or serotonin 5-HT3 antagonism. The foregoing lists provide merely exemplary assays. One skilled in the art may choose a relevant bioassay or set of bioassays for use in the definition of a phenotype.

5.7 DISEASES, DISORDERS, AND OTHER PHENOTYPES

The various phenotypes for which genes can be identified using the VGIDSM method include, but are not limited to, the following disorders, diseases, and phenotypes. Examples of disease states include the following: acquired immunodeficiency syndrome (AIDS), angina, arteriosclerosis, arthritis, asthma, high blood pressure or low blood pressure, bronchitis, cancer, cholesterol imbalance, cerebral circulation, coagulation disorders, cirrhosis, depression, dermatological disease, diabetes, diarrhea, diuresis, dysmenorrhea, dyspepsia, emphysema, gastrointestinal depression, hemorrhoids, hepatitis, hyperprolactinemia, hypertension, immunomodulation, resistance to bacterial infection, resistance to viral infection, inflammation, insomnia, lactation, lipidemia, migraine, prevention of pain or pain management, peripheral vascular disease, platelet aggregation, pre-menstrual syndrome, prostatic disorder, elevated triglycerides, respiratory tract infection, retinopathy, sinusitis, rheumatic disease, wound healing, tinnitus, urinary tract infection, and venous insufficiency. Other phenotypes include, but are not limited to, cardiovascular disorders, nervous system disorders, memory improvement, hypercholesterolemia, immune system stimulation, anti-inflammatory, antipyretic, analgesic, agents to slow down the aging process, accelerated convalescence, anemia, indigestion, impotence, and menstrual disorders. Preferred phenotypes include, but are not limited to, plant resistance phenotypes (e.g., resistance to herbicides or predatory insects), microorganism resistance phenotypes (e.g., resistance to antibiotics), cancer (e.g., breast, prostate), osteoporosis, obesity, type II diabetes, and prion-related diseases (for example, bovine spongiform encephalopathy, Creutzfeldt-Jakob disease).

5.8 LINKING OLIGONUCLEOTIDES TO SPECIFIC BINDING LIGANDS

The present invention incorporates a method for the use of identification markers (i.e., specific binding ligands recognized by specific receptors) to facilitate the identification and isolation of sequences of interest from a complex mixture. In order to sort unknown sequences specific to a given cell or phenotype out of a mixture of sequences from different sources, it is important to be able to identify the sequences from each source. One way to mark each different source of cDNA with a unique identification marker is through the use of labeled PCR oligonucleotides. The polymerase chain reaction oligonucleotide primers used to generate the starting material used in the VGIDSM assays are labeled with a specific binding ligand. Labeling includes attaching the ligand stably to the oligonucleotide. In a preferred embodiment, the ligand is attached to the primer via a covalent bond. Methods for attaching the ligand to the primer are well known in the art. Oligonucleotide markers include biotin, avidin or streptavidin and derivatives thereof, lectin, carbohydrate, peptide, hapten, or immunological material. Oligonucleotides can be labeled with a wide variety of markers for use in the various embodiments of the invention. For example, European Patent Publication No. EP 0 370 694 A2, entitled "Diagnostic Kit and Method Using a Solid Phase Capture Means for Detecting Nucleic Acid", by Burdick and Oakes, publication date May 30, 1990, discloses methods for linking markers to oligonucleotides. In a preferred embodiment, the oligonucleotides are labeled with peptides.
Methods for attaching peptides to oligonucleotides are well known to those of ordinary skill in the art. See, for example: 1) preparation and characterization of antisense oligonucleotide-peptide hybrids containing viral fusion peptides, Soukchareun et al., 1995, Bioconjug. Chem. 6 (1), 43-53; 2) preparation of oligonucleotide-peptide conjugates, Tung et al., 1991, Bioconjug. Chem. 2 (6), 464-465; 3) template-directed ligation of peptides to oligonucleotides, Bruick et al., 1996, Chem. Biol. 3 (1), 49-56; 4) dual-specificity interaction of HIV-1 TAR RNA with Tat peptide-oligonucleotide conjugates, Tung et al., 1995, Bioconjug. Chem. 6 (3), 292-295; 5) synthesis and enzymatic stability of phosphodiester-linked peptide-oligonucleotide hybrids, Robles et al., 1997, Bioconjug. Chem. 8 (6), 785-788; and 6) covalent protein-oligonucleotide conjugates for efficient delivery of antisense molecules, Rajur et al., 1997, Bioconjug. Chem. 8 (6), 935-940. Oligonucleotides linked to various peptides for use in the methods of this invention can be obtained, for example, from Cybergene S.A. (11 rue Claude Bernard, ZI Nord, 35400, Saint-Malo, France) and Glen Research (22825 Davis Drive, Sterling, Virginia 20164). Additional information on Glen Research can be obtained through its website (www.glenres.com). A specific method for linking a peptide to an oligonucleotide recommended by Glen Research is as follows (see also www.glenres.com). A heterobifunctional crosslinking reagent is employed to bind a synthetic peptide having an N-terminal lysine residue to a 5'-thiol-modified oligonucleotide. This crosslinking reagent is N-maleimido-6-aminocaproyl-(2'-nitro,4'-sulfonic acid)-phenyl ester (mal-sac-HNSA). The sodium salt of mal-sac-HNSA is available from Bachem Bioscience. Conveniently, the reaction of the mal-sac-HNSA crosslinker with an amino group liberates a dianion phenolate (i.e., 1-hydroxy-2-nitro-4-benzenesulfonic acid).
This dianion phenolate is also a yellow chromophore. The chromophore provides (i) a means to quantify the extent of the coupling reaction (a greater intensity of yellow color corresponds to a more complete coupling reaction), and (ii) an aid for monitoring the extent of separation of an activated peptide (i.e., a peptide crosslinked to mal-sac-HNSA and ready to be brought into contact with a 5'-thiol-modified oligonucleotide) from the free cross-linking reagent during gel filtration. A procedure employing the mal-sac-HNSA cross-linker may be as follows. First, a peptide having an N-terminal lysine is synthesized. Alternatively, a peptide having an internal lysine may be employed, since the lysine epsilon-amino group is also available to react with the cross-linker. Second, an oligonucleotide having a 5'-thiol group is synthesized using methods known in the art. Third, the peptide is reacted with an excess of mal-sac-HNSA in a sodium phosphate buffer (pH 7.1). Fourth, the peptide-mal-sac conjugate is separated from the free cross-linker and the buffer is exchanged for sodium phosphate (pH 6) using a gel filtration column (e.g., NAP-5, Pharmacia, Uppsala, Sweden). Fifth, a thiol-modified oligonucleotide is activated, the salt is removed, and the buffer is exchanged for sodium phosphate (pH 6) on a gel filtration column. Sixth, the activated peptide is reacted with the thiol-modified oligonucleotide. Finally, the peptide-oligonucleotide conjugate is purified by ion exchange chromatography (for example, Nucleogen DEAE-500-10 or equivalent). The order of elution from the ion exchange column is as follows: free peptide first, then peptide-labeled oligonucleotide, and finally free oligonucleotide. In a preferred embodiment, peptide-labeled oligonucleotides are recognized by specific receptors, such as antibodies, for sorting and isolating particular nucleic acids from a complex mixture.
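The coupling procedure and the elution order described above can be written down as a small checklist structure, which may help when tracking a run at the bench. This is an illustrative sketch, not part of the disclosed method: the names `COUPLING_STEPS`, `ELUTION_ORDER`, and `species_for_peak` are hypothetical, and the step wording paraphrases the text.

```python
# Each entry: (step description, buffer pH if the text states one, else None).
COUPLING_STEPS = [
    ("synthesize peptide with an N-terminal (or internal) lysine", None),
    ("synthesize oligonucleotide with a 5'-thiol group", None),
    ("react peptide with excess mal-sac-HNSA in sodium phosphate", 7.1),
    ("gel-filter activated peptide away from free cross-linker", 6.0),
    ("activate thiol oligonucleotide, desalt, exchange buffer", 6.0),
    ("react activated peptide with thiol-modified oligonucleotide", None),
    ("purify conjugate by ion exchange chromatography", None),
]

# Elution order from the ion exchange column, as stated in the text.
ELUTION_ORDER = [
    "free peptide",
    "peptide-labeled oligonucleotide",
    "free oligonucleotide",
]


def species_for_peak(peak_number):
    """Return the species expected in the given 1-based elution peak."""
    if not 1 <= peak_number <= len(ELUTION_ORDER):
        raise ValueError(f"expected a peak number from 1 to {len(ELUTION_ORDER)}")
    return ELUTION_ORDER[peak_number - 1]


for number, (description, ph) in enumerate(COUPLING_STEPS, start=1):
    suffix = f" (pH {ph})" if ph is not None else ""
    print(f"{number}. {description}{suffix}")
print("Product peak:", species_for_peak(2))
```

The sketch assumes three cleanly resolved peaks; a real chromatogram may of course show additional or overlapping species.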
Since a peptide-labeled oligonucleotide primer can be subjected to high temperatures during the polymerase chain reaction, in a preferred embodiment the peptide is sufficiently short (i.e., no more than five amino acids) to resist irreversible denaturation. In addition, the peptides can be made resistant to different classes of proteases by avoiding the inclusion of protease-sensitive peptide bonds such as those involving serine and/or threonine. Numerous methods have been employed to add a single primary aliphatic amino group to oligonucleotides. See, for example, Agrawal et al., 1986, Nucleic Acids Res. 14, 6227-6245; Chollet et al., 1985, Nucleic Acids Res. 13, 1529-1541; Wachter et al., 1986, Nucleic Acids Res. 14, 7985-7994; Sproat et al., 1987, Nucleic Acids Res. 15, 6181-6196; Li et al., 1987, Nucleic Acids Res. 15, 5275-5286; and Smith et al., 1985, Nucleic Acids Res. 13, 2439-2502. Several methods can be used to synthesize oligonucleotides containing several amino groups attached to the oligonucleotide through a linker arm: Haralambidis et al., 1990, Nucleic Acids Res. 18, 493-499; Haralambidis et al., 1987, Nucleic Acids Res. 15, 4857-4876; Ruth et al., 1985, DNA 4, 93; Ruth, 1984, DNA 3, 123; Draper, 1984, Nucleic Acids Res. 12, 989-1002. The attachment of a ligand such as biotin to the oligonucleotide is described in Kempe et al., 1985, Nucleic Acids Res. 13, 45-57. The use of an alkylating intercalation moiety as the attachment is described in U.S. Patent No. 4,582,789. Other standard methods of coupling peptides to derivatized oligonucleotides can also be employed (see, for example, EPA 0 370 694). The standard procedures of the above references are incorporated herein by reference in their entireties.

5.9 ANTIBODIES AND DERIVATIVES THEREOF

In a preferred embodiment, antibodies can be used to specifically recognize one or more peptide-labeled oligonucleotides used to label several populations of nucleic acids (e.g., cDNA libraries).
Such antibodies include, but are not limited to, polyclonal, monoclonal, humanized, and chimeric antibodies, single chain antibodies, Fab fragments and F(ab')2 fragments, fragments produced by a Fab expression library, anti-idiotypic (anti-Id) antibodies, and epitope-binding fragments of any of the above. Such antibodies can be used as ligands in one of the screening steps of the present invention. Polyclonal antibodies that can be employed in the methods of the invention are heterogeneous populations of antibody molecules derived from the sera of immunized animals. Various methods well known in the art can be employed for the production of polyclonal antibodies to an antigen of interest. For example, for the production of polyclonal antibodies, various host animals, including, but not limited to, rabbits, mice, rats, etc., can be immunized by injection with the antigen of interest or a derivative thereof. Various adjuvants can be used to increase the immune response, depending on the host species, including, but not limited to, Freund's adjuvant (complete and incomplete), mineral gels such as aluminum hydroxide, surface-active substances such as lysolecithin, pluronic polyols, polyanions, peptides, oil emulsions, keyhole limpet hemocyanin, dinitrophenol, and potentially useful human adjuvants such as BCG (Bacillus Calmette-Guerin) and Corynebacterium parvum. Such adjuvants are well known in the art.
Monoclonal antibodies that can be employed in the methods of the invention are homogeneous populations of antibodies to a particular antigen. A monoclonal antibody (mAb) to an antigen of interest can be prepared using any known technique that provides for the production of antibody molecules by continuous cell lines in culture. These include, but are not limited to, the hybridoma technique originally described by Kohler and Milstein, 1975, Nature 256, 495-497, the more recent human B-cell hybridoma technique (Kozbor et al., 1983, Immunology Today 4, 72), and the EBV-hybridoma technique (Cole et al., 1985, Monoclonal Antibodies and Cancer Therapy, Alan R. Liss, Inc., pages 77-96). Such antibodies can be of any immunoglobulin class, including IgG, IgM, IgE, IgA, IgD, and any subclass thereof. The hybridoma producing the mAbs for use in this invention can be cultured in vitro or in vivo. Monoclonal antibodies that can be employed in the methods of the invention include, but are not limited to, human monoclonal antibodies or chimeric human-mouse (or other species) monoclonal antibodies. Human monoclonal antibodies can be made by any of several known techniques (e.g., Teng et al., 1983, Proc. Natl. Acad. Sci. USA 80, 7308-7312; Kozbor et al., 1983, Immunology Today 4, 72-79; Olsson et al., 1982, Meth. Enzymol., 3-16). In addition, humanized monoclonal antibodies can be employed. Briefly, humanized antibodies are antibody molecules from non-human species that have one or more complementarity determining regions (CDRs) from the non-human species and a framework region from a human immunoglobulin molecule. Several techniques have been developed for the production of humanized antibodies (see, for example, Queen, U.S. Patent No. 5,585,089, which is incorporated herein by reference in its entirety).
An immunoglobulin light or heavy chain variable region consists of a "framework" region interrupted by three hypervariable regions, known as complementarity determining regions (CDRs). The extent of the framework region and of the CDRs has been precisely defined (see Kabat et al., 1983, "Sequences of Proteins of Immunological Interest", U.S. Department of Health and Human Services). A chimeric antibody is a molecule in which different portions are derived from different animal species, such as those having a variable region derived from a murine mAb and a human immunoglobulin constant region. Techniques have been developed for the production of "chimeric antibodies" (Morrison et al., 1984, Proc. Natl. Acad. Sci. U.S.A. 81, 6851-6855; Neuberger et al., 1984, Nature 312, 604-608; Takeda et al., 1985, Nature 314, 452-454) by splicing the genes from a mouse antibody molecule of appropriate antigen specificity together with genes from a human antibody molecule of appropriate biological activity. Alternatively, techniques described for the production of single chain antibodies (U.S. Patent No. 4,946,778; Bird, 1988, Science 242, 423-426; Huston et al., 1988, Proc. Natl. Acad. Sci. USA 85, 5879-5883; and Ward et al., 1989, Nature 334, 544-546) can be adapted to produce single chain antibodies against the peptide portion of peptide-labeled oligonucleotides useful in the methods of the invention. Single chain antibodies are formed by linking the heavy and light chain fragments of the Fv region through an amino acid bridge, resulting in a single chain polypeptide. Antibody fragments that recognize specific epitopes can be generated by known techniques. For example, such fragments include, but are not limited to: F(ab')2 fragments, which can be produced by pepsin digestion of the antibody molecule, and Fab fragments, which can be generated by reducing the disulfide bridges of the F(ab')2 fragments.
Alternatively, Fab expression libraries can be constructed (Huse et al., 1989, Science 246, 1275-1281) to allow rapid and easy identification of monoclonal Fab fragments with the desired specificity. Antibodies to the peptide portion of a peptide-labeled oligonucleotide can, in turn, be used to generate anti-idiotype antibodies that "mimic" the peptide, employing techniques well known to those skilled in the art (see, for example, Greenspan & Bona, 1993, FASEB J. 7 (5), 437-444; and Nisonoff, 1991, J. Immunol. 147 (8), 2429-2438). For example, antibodies that bind to the peptide and competitively inhibit the binding of the peptide to its receptor can be used to generate anti-idiotypes that "mimic" the peptide receptor and, therefore, bind the peptide.
A molecular clone of an antibody to an antigen of interest can be prepared by several well-known techniques. Recombinant DNA methodology (see, for example, Maniatis et al., 1982, Molecular Cloning, A Laboratory Manual, Cold Spring Harbor Laboratory, Cold Spring Harbor, New York) can be used to construct nucleic acid sequences that encode a monoclonal antibody molecule, or an antigen binding region thereof. Antibody molecules can be purified by many well-known techniques, for example, immunoabsorption or immunoaffinity chromatography, chromatographic methods such as HPLC (high performance liquid chromatography), or a combination thereof, etc. The methods for the production and use of the antibodies used herein can be, for example, those described in Harlow and Lane (Harlow, E. and Lane, D., 1988, Antibodies: A Laboratory Manual, Cold Spring Harbor Laboratory Press, Cold Spring Harbor, New York), which is incorporated herein by reference in its entirety. The single-letter amino acid codes that correspond to the three-letter amino acid codes of the Sequence Listing are as follows: A, Ala; R, Arg; N, Asn; D, Asp; B, Asx; C, Cys; Q, Gln; E, Glu; Z, Glx; G, Gly; H, His; I, Ile; L, Leu; K, Lys; M, Met; F, Phe; P, Pro; S, Ser; T, Thr; W, Trp; Y, Tyr; and V, Val. Antibodies suitable for use with the methods of this invention include the following, available from Affinity Bioreagents, Inc., 79, rue des Morillons, 75015, Paris, France.
1) Catalog No. PA 1-047 (affinity-purified rabbit IgG). The corresponding peptide recognized by this antibody is KFSREKKAAKT (SEQ ID NO: 71).
2) Catalog No. PA 1-039 (affinity-purified rabbit immunoglobulin). The corresponding peptide recognized by this antibody is DQKRYHEDIFG (SEQ ID NO: 72).
3) Catalog No. PA 1-036 (purified rabbit IgG). The corresponding peptide recognized by this antibody is DLKEEKDINNNVKKT (SEQ ID NO: 73).
4) Catalog No. PA 1-014 (purified rabbit antibody). The corresponding peptide recognized by this antibody is CTGEEDTSE (SEQ ID NO: 74).
5) Catalog No. PA 3-013 (affinity-purified IgG). The corresponding peptide recognized by this antibody is PEETQTQDQPM (SEQ ID NO: 75).
6) Catalog No. PA 1-815 (rabbit antiserum). The corresponding peptide recognized by this antibody is QKSDQGVEGPGAT (SEQ ID NO: 76).
7) Catalog No. PA 3-034 (rabbit polyclonal serum IgG). The corresponding peptide recognized by this antibody is DIGQSIKKFSKV (SEQ ID NO: 77). This polyclonal antibody also recognizes QRADSLSSHL (SEQ ID NO: 78).
In addition, antibodies for use with the methods of this invention can be obtained from Medical & Biological Laboratories Co., Ltd., 440 Arsenal Street, Watertown, Massachusetts 02171, United States of America. They include the following:
1) Code No. 561 (rabbit IgG from antiserum). The corresponding peptide recognized by this antibody is YPYDVPDYA (SEQ ID NO: 79).
2) Code No. 562 (rabbit IgG from antiserum). The corresponding peptide recognized by this antibody is EQKLISEEDL (SEQ ID NO: 80).
3) Code No. 563 (rabbit IgG from antiserum). The corresponding peptide recognized by this antibody is YTDIEMNKLGK (SEQ ID NO: 81).

5.10 ANTIBODY COLUMNS FOR SORTING NUCLEIC ACIDS

An antibody specific for a peptide marker used to identify nucleic acids from different sources can be bound to a solid phase support or carrier material to facilitate separation, using several well-known techniques.
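As a reading aid for the peptide markers listed in Section 5.9, which are written in single-letter amino acid codes, the code table given there can be applied mechanically. The sketch below assumes the standard one-letter/three-letter correspondence (including the ambiguity codes B and Z mentioned in the text); the names `ONE_TO_THREE` and `to_three_letter` are hypothetical.

```python
# Single-letter to three-letter amino acid codes, as listed in Section 5.9.
ONE_TO_THREE = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "B": "Asx",
    "C": "Cys", "Q": "Gln", "E": "Glu", "Z": "Glx", "G": "Gly",
    "H": "His", "I": "Ile", "L": "Leu", "K": "Lys", "M": "Met",
    "F": "Phe", "P": "Pro", "S": "Ser", "T": "Thr", "W": "Trp",
    "Y": "Tyr", "V": "Val",
}


def to_three_letter(peptide, sep="-"):
    """Convert a single-letter peptide string to three-letter notation."""
    try:
        return sep.join(ONE_TO_THREE[residue] for residue in peptide.upper())
    except KeyError as err:
        raise ValueError(f"unknown amino acid code: {err.args[0]}") from None


# The peptide recognized by Code No. 561 (SEQ ID NO: 79).
print(to_three_letter("YPYDVPDYA"))
# prints Tyr-Pro-Tyr-Asp-Val-Pro-Asp-Tyr-Ala
```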
Here, "solid phase support or carrier" means any support capable of binding an antigen or an antibody. Well-known supports or carriers include glass, polystyrene, polypropylene, polyethylene, dextran, nylon, amylases, natural and modified celluloses, polyacrylamides, gabbros, and magnetite. The carrier can be either soluble or insoluble for the purposes of the present invention. The solid phase support or carrier material can have virtually any structural configuration, so long as the coupled molecule is capable of binding to an antigen or antibody. Thus, the support configuration can be spherical, as in a bead, or cylindrical, as in the inside surface of a test tube or the external surface of a rod. Alternatively, the surface may be flat, such as a sheet, test strip, etc. Preferred supports include polystyrene beads. Those skilled in the art will know many other suitable carriers for binding antibodies or antigens, or will be able to ascertain them by routine experimentation.

For example, the solid phase support or carrier material can be pellets, polymer particles, or other materials, so long as the coupled molecule is capable of binding to an antigen or antibody. Such solid phase supports are readily apparent to a person of ordinary skill in the art. Particularly useful solid phase supports or carrier materials are polymeric beads with an average particle size of 0.1 to 10 mm.

Antibodies against specific peptides can be attached to any of the solid phase supports or carrier materials described above in various ways. Two of the most preferred modes of attachment are adsorption and covalent attachment. The methods for making and using the antibody columns employed herein can be, for example, the methods described in Harlow and Lane (Harlow and Lane, 1988, Antibodies: A Laboratory Manual, Cold Spring Harbor Laboratory Press, Cold Spring Harbor, New York
), which is incorporated herein by reference in its entirety. In addition, examples of suitable antibodies and peptide tags can be found, for example, in U.S. Patent Application No. 09/174,328, entitled "METHODS FOR MANIPULATING COMPLEX NUCLEIC ACID POPULATIONS USING PEPTIDE-LABELED OLIGONUCLEOTIDES", by Iris and Pourny (Attorney Docket No. 9408-025), filed on October 16, 1998, which is incorporated herein by reference in its entirety.

5.11. DETECTION OF ANTIBODIES AGAINST PEPTIDE-LABELED OLIGONUCLEOTIDES

Antibodies that recognize peptide-labeled oligonucleotides can also be detectably labeled using any method known to the person skilled in the art. Many such methods are known. For example, one way in which an anti-peptide antibody can be detectably labeled is by linking it to an enzyme for use in an enzyme immunoassay (EIA) (Voller, "The Enzyme Linked Immunosorbent Assay (ELISA)", 1978, Diagnostic Horizons 2, 1-7; Voller et al., 1978, J. Clin. Pathol. 31, 507-520; Butler, 1981, Meth. Enzymol. 73, 482-523; Maggio, 1980, Enzyme Immunoassay, CRC Press, Boca Raton, FL; Ishikawa et al., 1981, Enzyme Immunoassay, Igaku-Shoin, Tokyo). The enzyme bound to the antibody will react with an appropriate substrate, preferably a chromogenic substrate, in such a manner as to produce a chemical moiety that can be detected, for example, by spectrophotometric, fluorimetric or visual means.
Enzymes that can be used to detectably label the antibody include, but are not limited to, malate dehydrogenase, staphylococcal nuclease, delta-5-steroid isomerase, yeast alcohol dehydrogenase, alpha-glycerophosphate dehydrogenase, triose phosphate isomerase, horseradish peroxidase, alkaline phosphatase, asparaginase, glucose oxidase, beta-galactosidase, ribonuclease, urease, catalase, glucose-6-phosphate dehydrogenase, glucoamylase and acetylcholinesterase. Detection can be accomplished by colorimetric methods that employ a chromogenic substrate for the enzyme. Detection can also be accomplished by visual comparison of the extent of the enzymatic reaction of a substrate with that of similarly prepared standards.

Detection can also be accomplished using any of several other immunoassays. For example, by radioactively labeling the antibodies or antibody fragments, it is possible to detect the peptide portion of the peptide-labeled oligonucleotide through the use of a radioimmunoassay (RIA) (see, for example, Weintraub, 1986, Principles of Radioimmunoassays, Seventh Training Course on Radioligand Assay Techniques, The Endocrine Society, which is incorporated herein by reference). The radioactive isotope can be detected by such means as a gamma counter or a scintillation counter, or by autoradiography.

It is also possible to label the antibody with a fluorescent compound. When the fluorescently labeled antibody is exposed to light of the proper wavelength, its presence can be detected by its fluorescence. Among the most commonly used fluorescent labeling compounds are fluorescein isothiocyanate, rhodamine, phycoerythrin, phycocyanin, allophycocyanin, o-phthaldehyde and fluorescamine.

The antibody can also be detectably labeled using fluorescence-emitting metals such as 152Eu, or others of the lanthanide series.
These metals can be attached to the antibody using metal-chelating groups such as diethylenetriaminepentaacetic acid (DTPA) or ethylenediaminetetraacetic acid (EDTA).

The antibody can also be detectably labeled by coupling it to a chemiluminescent compound. The presence of the chemiluminescent-labeled antibody is then determined by detecting the luminescence that arises during the course of a chemical reaction. Examples of particularly useful chemiluminescent labeling compounds are luminol, isoluminol, acridinium ester, imidazole, acridinium salt and oxalate ester.

Likewise, a bioluminescent compound can be used to label the antibody of the present invention. Bioluminescence is a type of chemiluminescence found in biological systems, in which a catalytic protein increases the efficiency of the chemiluminescent reaction. The presence of a bioluminescent protein is determined by detecting the presence of luminescence. Important bioluminescent compounds for labeling purposes are green fluorescent protein, luciferin, luciferase and aequorin.

6. EXAMPLE: USE OF THE VGID® METHOD TO IDENTIFY hDinP GENES

6.1. INTRODUCTION

Two human DinP genes (hDinP) have been identified using the VGID® method applied to cell line sources, as described below. The VGID® approach employed was the approach described above for cell line samples (see Figure 1). The objective of this example was to isolate the human homolog(s) of the bacterial DinP gene. In bacteria and yeast, the product of the DinP gene (i.e., DinP) is central to the inducible DNA damage repair pathway known as the "SOS repair system". Even though this DNA repair pathway is known to exist in humans, its inducibility has never been demonstrated. The components of this pathway are known to be directly involved in the appearance of secondary cancers after radiation therapy or chemotherapy in humans.
However, the human genes encoding the components of the pathway have not been previously identified. The VGID® example described below isolated, in less than three weeks, a total of five independent human cDNA clones. The clones were analyzed by DNA sequencing, translation in all six reading frames, and protein database searching (BLASTX). The translations of the five clones showed high amino acid sequence homology with the bacterial DinP protein, thus confirming the identification of human homologs of bacterial DinP. It should be noted that low-stringency hybridization of a bacterial DinP probe to a human library would not have identified these clones, since the nucleic acid sequence homology is too low to allow detection of this kind.

6.2 MATERIALS AND METHODS

Cell lines (phenotype selection). To isolate hDinP transcripts by MutS-mediated selective subtraction, two input cell lines that differ in their ability to perform DNA repair were used. The phenotype of interest (i.e., human DinP activity) was provided by putatively "high expression" cells from a defined clonal lymphoblastoid line (lymphoblasts) (i.e., cell line number 1 in Figure 1). Those cells were harvested at a time corresponding to the apex of their in vitro growth curve (i.e., 84 hours after the start of the growth phase). The competitor cDNA providers (i.e., cell line number 2 in Figure 1) were hepatocytes cultured in standard medium for 60 hours before harvest. Those two cell lines came from different sources and therefore have a very low probability of consanguinity (i.e., a very low probability of having a common ancestor). However, the rapid growth rate of these cell lines carries with it the possibility of substantial levels of acquired mutation.

mRNA extraction and cDNA synthesis. These procedures were performed using Gibco-BRL "Trizol" (mRNA preparation) and Promega "Universal Riboclone" kits (cDNA synthesis) in accordance with the manufacturers' protocols.
The synthesized cDNA was size-fractionated by electrophoresis on an agarose gel (0.08%); fragments within a range of 300 to 600 base pairs were excised from the gel. The cDNA was extracted from the gel slices using Promega Gel-Clean kits in accordance with the manufacturer's protocol.

cDNA libraries. A library was constructed for each cell line by blunt-end ligation of the size-selected cDNA into the QuanTox® Blunt plasmid vector (Quantum Biotechnologies). The ligation products were transformed into competent E. coli DH5α cells. Those cells were grown overnight in liquid medium containing ampicillin. The cells were then harvested and insert-containing plasmid vectors were recovered using Qiagen plasmid purification kits.

Amplification of cDNA inserts. The inserts present in the cDNA library obtained from the lymphoblasts were amplified by polymerase chain reaction (PCR) using oligonucleotide primers specific for the vector cloning cassette. The polymerase enzymes used were Pfu DNA polymerase (1.5 U/100 μl reaction) and the Stoffel fragment of DNA polymerase I (0.5 U/100 μl reaction). The cycling protocol was as follows: 97 °C, 3 min; 58 °C, 5 sec; 70 °C, 1 min; then 93 °C, 30 sec; 58 °C, 5 sec; 70 °C, 1 min for 15 cycles. The PCR products were purified on Qiagen columns. The purified PCR products were heat-denatured at 98 °C for 5 minutes, incubated at 65 °C for 20 minutes, and cooled to room temperature.
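
For illustration only, the cycling protocol stated above can be written out as a short Python sketch that totals the programmed hold times. The grouping into one initial round followed by 15 repeated cycles is an assumption about how the protocol is to be read, the names used are hypothetical, and ramp times between temperatures are not included.

```python
# Each step is (temperature in degrees C, hold time in seconds), as listed in the text.
INITIAL_ROUND = [(97, 180), (58, 5), (70, 60)]
REPEATED_CYCLE = [(93, 30), (58, 5), (70, 60)]
N_CYCLES = 15  # assumption: "for 15 cycles" applies to the second set of steps only


def hold_seconds(steps):
    """Sum the hold times of a list of (temperature, seconds) steps."""
    return sum(seconds for _temperature, seconds in steps)


def total_hold_seconds(initial, cycle, n_cycles):
    """Total programmed hold time: one initial round plus n repeated cycles."""
    return hold_seconds(initial) + n_cycles * hold_seconds(cycle)


total = total_hold_seconds(INITIAL_ROUND, REPEATED_CYCLE, N_CYCLES)
print(f"{total} s ({total / 60:.1f} min)")  # 1670 s (27.8 min)
```

Under this reading, the amplification occupies a little under half an hour of hold time before the separate denaturation and reannealing step.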
The denatured and cooled PCR products (350 ng in 90 μl) were equilibrated in an equal volume of 2x reaction buffer (40 mM Tris-HCl, pH 7.6, 0.02 mM EDTA, 10 mM MgCl2, 0.2 mM DTT) and exposed for 35 minutes to MutS adsorbed on glass beads packed in an Eppendorf tube pierced with a small hole in the bottom of the tube. During the incubation phase, the Eppendorf tubes were placed in a horizontal position to increase the contact surface area between the beads and the reannealed DNA. Unbound reannealed PCR products that were free in solution were recovered in the supernatant after centrifugation of the beads (8000 x g, 30 sec). The MutS-mediated capture was repeated twice more with fresh beads; the supernatant was recovered at the end of this operation and stored at 4 °C until use. This supernatant contains only transcripts that are structurally identical within all lymphoblasts in the cell line with the phenotype of interest.

The procedure followed to isolate transcripts structurally common to all the hepatocytes was identical to the procedure described above, except that the primers used for PCR amplification (corresponding to vector sequences encoding the T3 and T7 promoters) were biotinylated at the 5' end. The final supernatant was also stored at 4 °C until use.

Isolation of cDNA encoding hDinP. An aliquot of the stored hepatocyte supernatant was then mixed with an aliquot of the stored lymphoblast supernatant in a 3:1 ratio (hepatocytes:lymphoblasts). This mixture was denatured, reannealed and exposed to MutS-coated beads, as above, to remove all mismatched heteroduplexes.
The supernatant of this mixture was then exposed to streptavidin-coated beads (Dynabeads M-180, Dynal, used in accordance with the manufacturer's protocol) in order to trap all non-mismatched hybrid duplexes formed from one hepatocyte strand and one lymphoblast strand (i.e., transcripts structurally identical in hepatocytes and lymphoblasts), as well as all remaining hepatocyte-specific transcripts. This capture step was performed by incubating the supernatant recovered after the MutS binding reaction (150 μl) with 150 μl of streptavidin beads. After recovery of the supernatant from the streptavidin beads, the beads were rinsed twice in 1X reaction buffer to recover all unbound material. The washes were recovered by centrifugation, combined with the streptavidin bead supernatant, and stored at 4 °C.

The combined streptavidin bead supernatant, which theoretically contained only lymphoblast-specific transcripts that are structurally identical in all lymphoblast cells, was then desalted and concentrated using Qiaex II DNA purification kits (Qiagen) in accordance with the manufacturer's protocol. The purified material was blunt-ended by 3' extension (Boehringer-Mannheim DNA tailing kit), purified on Qiagen columns and cloned into the QuanTox® Blunt vector as described above.

The twelve recombinant colonies obtained were then individually tested for the presence of inserts by: (i) PCR amplification; and (ii) hybridization at several levels of stringency with a labeled, PCR-generated fragment of the E. coli DinP gene.
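
The selection logic of the MutS and streptavidin steps described above can be illustrated with a minimal Python sketch. It is a simplified model and not part of the experimental protocol: the `Duplex` class and both function names are hypothetical, each reannealed fragment is reduced to the sources of its two strands plus a mismatch flag, and single strands and incomplete binding are ignored.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Duplex:
    """A reannealed double-stranded cDNA fragment (illustrative model only)."""
    strand_sources: tuple  # e.g. ("lymphoblast", "hepatocyte")
    mismatched: bool       # True if the two strands differ in sequence


def muts_pass(duplexes):
    """MutS beads retain mismatched duplexes; the supernatant keeps the rest."""
    return [d for d in duplexes if not d.mismatched]


def streptavidin_pass(duplexes, biotinylated_source="hepatocyte"):
    """Streptavidin beads retain any duplex carrying a biotinylated strand."""
    return [d for d in duplexes if biotinylated_source not in d.strand_sources]


mixture = [
    Duplex(("lymphoblast", "lymphoblast"), False),  # lymphoblast-specific
    Duplex(("lymphoblast", "hepatocyte"), False),   # identical in both cell lines
    Duplex(("lymphoblast", "hepatocyte"), True),    # diverged between cell lines
    Duplex(("hepatocyte", "hepatocyte"), False),    # hepatocyte-specific
]

# Apply the two depletion steps in the order used in the text.
recovered = streptavidin_pass(muts_pass(mixture))
print(recovered)
```

In this model only the duplex whose two strands both come from the lymphoblast library remains in the final supernatant, which matches the stated expectation that the combined supernatant contains lymphoblast-specific transcripts.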
Under low-stringency hybridization conditions (i.e., 40 °C overnight in 3X SSC, 1X Denhardt's solution, 20 mM sodium phosphate (pH 6.5), 10% dextran sulfate, 100 μg/ml salmon sperm DNA; 3 washes in 3X SSC and 1% SDS at 40 °C for 15 minutes each), signals were obtained from all 12 clones, but the signals were slightly stronger for the 5 clones that were identified, after sequencing and computer analysis, as deriving from 2 hDinP genes (see below). In contrast, under moderately stringent hybridization conditions (i.e., 50 °C overnight in the same buffer used for low stringency plus 25% deionized formamide; 3 washes in 2X SSC and 1% SDS at 50 °C for 15 minutes each), weak signals were obtained from all 12 clones, with no apparent differences in signal intensity. Finally, under highly stringent hybridization conditions (i.e., 60 °C overnight in the same buffer used for moderately stringent conditions; 3 washes in 1X SSC and 1% SDS at 60 °C for 15 minutes each), no signal was obtained from any of the 12 clones.

These hybridization results suggest that at least 5 of the 12 isolated clones contain inserts. The results further suggest that no hDinP clone could have been isolated by simply screening a human library directly with a labeled fragment of the E. coli DinP gene; the hybridization signal in a library screen of this type could not be distinguished from background.

Of the 12 clones isolated by the method of the invention, 5 were sequenced, namely those that gave the slightly stronger signal under low-stringency conditions relative to the remaining 7. These 5 clones were then used as query sequences in individual BLASTX protein database searches after translation in all 6 reading frames (see Figures 5-9). The single-letter amino acid codes that appear in the computer analyses provided by the BLASTX searches (see Figures 5-9), and that correspond to the three-letter amino acid codes of the Sequence Listing, appear in Section 5.9 above.

6.3 RESULTS
This VGID® example isolated a total of 5 overlapping human cDNA inserts (see Figure 4 for a map of the overlapping sequences), which appear in the Sequence Listing as SEQ ID NOs: 1-5. A BLASTX protein database search and computer analysis were carried out on each of the 5 identified sequences after translation in all 6 reading frames (see Figures 5-9 for BLASTX results). The results revealed high amino acid sequence homology exclusively with the bacterial DinP protein and its close relatives, such as the UV protection protein MucB (see Figure 5A). Based on the overlapping sequences, these 5 inserts were assigned to 2 separate hDinP homolog genes, as described below. Three of the five overlapping inserts (SEQ ID NOs: 1-3) span approximately the predicted length of a full-length hDinP transcript after assembly into a composite sequence (SEQ ID NO: 6). The other 2 inserts (SEQ ID NOs: 4-5), which correspond to a cumulative length of 386 bases, also overlap with each other and with the composite sequence of SEQ ID NO: 6. However, these 2 inserts provide evidence for the existence of 2 hDinP genes, as described below in more detail.
This result is consistent with other characterized human DNA repair genes, which are known to be encoded by multiple genes. That SEQ ID NOs: 4-5 represent transcripts derived from different genes encoding hDinP isoforms is suggested by the limited internal sequence divergence at positions 237-252 and 274-279.

6.4 COMMENTS

The two novel hDinP genes identified above provide a significant basis for our understanding of the genes involved in DNA repair. In addition, the new genes will be useful in the development of various prognostic and diagnostic tests, as well as therapeutic interventions for the treatment of disease, especially cancer. This is true, in part, because DNA repair pathways have been so strongly linked to cancer-causing mechanisms (see, for example, Fishel et al., 1993, Cell 75(5), 1027-1038).

The protein sequences encoded by the 5 human clones and their corresponding bacterial relatives are set forth in SEQ ID NOs: 7-70. The search analyses for the 5 clones listed in SEQ ID NOs: 1-5 appear in Figures 5-9, respectively. It is worth noting that the 5 independent clones each encode a homolog of the E. coli DinP protein; that is, hDinP clones (5 out of 12) were identified by the VGID® method in this experiment. This result dramatically demonstrates the high specificity for gene identification that can be obtained with the VGID® method. This specificity correlates directly with the well-defined input phenotypes employed.

The protein translation results for SEQ ID NO: 1 (#1) are listed in SEQ ID NOs: 7-24. The protein translation results for SEQ ID NO: 2 (Tor-M) are listed in SEQ ID NOs: 25-29. The protein translation results for SEQ ID NO: 3 (#3) are listed in SEQ ID NOs: 30-41. The protein translation results for SEQ ID NO: 4 (*1) are listed in SEQ ID NOs: 42-58.
The protein translation results for SEQ ID NO: 5 (*2) are listed in SEQ ID NOs: 59-70.

6.5 APPLICATION OF THE VGID® METHOD TO A COMPLEX, MULTIPLE-STAGE SYSTEM - MULTIPLEX VGID® (VGGT®)

The VGID® method can be applied to complex multi-stage systems, such as cancer. In the case of cancer, several stages that have different phenotypes can coexist, for example, within a single biopsy sample, or within primary cell cultures propagated therefrom. When applied to such a system, however, the VGID® method requires carrying out a large number of assays (i.e., an unlabeled cDNA library and a biotin-labeled cDNA library for the pairwise comparison of each stage with all the other stages).

The multiplex VGID® system, also known as ValiGene Gene Trapping (VGGT®), offers an alternative approach for the analysis of complex, multi-stage systems. VGGT® does not require carrying out a large number of pairwise comparisons. Instead, the number of assays performed is considerably reduced, saving the user time and money. VGGT® (multiplex VGID®) allows simultaneous analysis of more than two phenotypes (represented by more than two cDNA libraries) by labeling the inserts from each library by PCR using primers bearing a unique tag. In a preferred embodiment, the unique tag is a peptide tag. In another preferred embodiment, the unique peptide tag is recognized by an antibody specific for the tag. In this way, cDNA fragments derived from each of several libraries subjected to a multiplex VGID® analysis can be specifically identified and retrieved as desired by the user. In addition, such identification and retrieval can be performed at any point in the multiplex VGID® analysis by virtue of the unique library tags.
For example, fragments from any number of cDNA libraries can be mixed, denatured, reannealed, and subjected to one or several rounds of MutS chromatography and/or antibody affinity chromatography, all without losing track of the phenotypic source (i.e., cDNA library) from which a fragment of particular interest originated.

In a typical multiplex VGID® assay, reannealing occurs under conditions of high complexity (i.e., labeled fragments from more than two cDNA libraries are mixed, denatured and reannealed). After reannealing, sorting of the various cDNAs is carried out using MutS chromatography and/or antibody affinity chromatography as desired by the user. The tagging scheme allows the user to isolate any desired cDNA fragment for cloning, use as a probe, or further sorting as desired.

In summary, peptide-labeled oligonucleotide PCR primers are used to differentially label cDNA fragments that come from different libraries. The tags allow the identification of sequences common to more than two libraries and the isolation of sequences specific to each library by chromatography. In addition, the tags allow the recovery of cDNA fragments of interest from a mixture of nucleic acids of different origins. Overall, the multiplex VGID® process allows expressed nucleotide sequences that represent defined phenotypes present in complex mixtures to be differentially labeled, sorted and isolated.

6.5.1. EXAMPLE OF A COMPLEX SYSTEM - BREAST CANCER

An example of a complex system that can be subjected to the multiplex VGID® method of the invention is breast cancer.
In this example, congenic cell lines obtained from the same individual (HBL 100, HH9, MCF-7 and MCF-7 ras) can be used to represent four different stages of cancer (i.e., four different phenotypes), as follows: (a) HBL 100 (pre-cancerous stage); (b) HH9 (pre-metastatic, hormone-sensitive stage); (c) MCF-7 (metastatic, hormone-dependent stage); and (d) MCF-7 ras (aggressive metastatic stage, hormone sensitive). Analysis of this system by VGID®, comparing two phenotypes at a time, would require six different experiments; that is, (a) would be compared with (b), (a) with (c), (a) with (d), (b) with (c), (b) with (d), and (c) with (d).

In addition, to gain knowledge of the metabolic pathways that distinguish each stage, and to identify key elements (for example, optimal points of intervention) within a pathway of interest, the cell growth of each stage of cancer could be analyzed under different conditions. For example, four suitable conditions could be the following: (a) standard growth conditions (for example, cell culture medium containing fetal calf serum); (b) steroid-free conditions (for example, serum-free conditions, or serum-containing conditions in which the serum has been treated to remove all steroids); (c) estradiol conditions (for example, steroid-free conditions to which a defined concentration of estradiol has been added back as the only steroid present); and (d) estradiol plus tamoxifen conditions (for example, estradiol conditions to which a defined concentration of tamoxifen has also been added). For a complete analysis of this system using VGID®, where complete is defined as comparing four different phenotypes with one another under four different conditions, a total of twenty-four VGID® assays would be carried out.

6.5.2. ANALYSIS OF A COMPLEX SYSTEM USING MULTIPLEX VGID®

Each pairwise VGID® assay just described would provide a readout of the expressed genes distinguishing the two samples under analysis.
Twenty-four such pairwise comparisons would then have to be compared with one another. The efficiency of comparing phenotypes in this way is limited by the large number of independent VGID® assays required.

An alternative approach is multiplex VGID®. In the preceding example, HBL 100, HH9, MCF-7 and MCF-7 ras would each be labeled with a different peptide tag. It will be understood that, in the multiplex VGID® process, the tags serve not only to identify the source library of a particular cDNA insert, but also to provide a specific means of recovery. Using this approach in the preceding example, the recovery of cDNA fragments that characterize each stage of cancer relative to the others and relative to the applied environmental conditions (i.e., cell culture growth conditions) would require a much smaller number of assays. For example, four assays would be performed in which cDNA fragments from each of the four cell lines are compared (i.e., mixed, denatured, reannealed, and sorted by MutS and/or antibody column chromatography) under each of the four growth conditions set forth above. Thus, a complete analysis of the system can be carried out in only four assays instead of the twenty-four assays required by the pairwise comparison approach.

The invention described and claimed herein is not to be limited in scope by the specific embodiments disclosed herein, since these embodiments are intended to illustrate several aspects of the invention. Any equivalent embodiments are intended to be within the scope of this invention. Indeed, various modifications of the invention in addition to those shown and described herein will become apparent to those skilled in the art from the foregoing description. Such modifications are also intended to fall within the scope of the appended claims.

Throughout this application various publications and patents are cited. Their contents are hereby incorporated by reference into the present application in their entireties.
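
As a worked check on the assay counts given in Sections 6.5.1 and 6.5.2, the following Python sketch enumerates the assays needed by each approach. The cell lines and growth conditions are those named in the text; the function names and the short condition labels are hypothetical and serve only as illustration.

```python
from itertools import combinations

# Cell lines and growth conditions as named in Sections 6.5.1 and 6.5.2.
CELL_LINES = ["HBL 100", "HH9", "MCF-7", "MCF-7 ras"]
CONDITIONS = ["standard", "steroid-free", "estradiol", "estradiol + tamoxifen"]


def pairwise_assays(cell_lines, conditions):
    """One assay per pair of cell lines per growth condition (pairwise VGID)."""
    return [
        (condition, line_a, line_b)
        for condition in conditions
        for line_a, line_b in combinations(cell_lines, 2)
    ]


def multiplex_assays(cell_lines, conditions):
    """One assay per growth condition, with all tagged libraries mixed together."""
    return [(condition, tuple(cell_lines)) for condition in conditions]


pairs_per_condition = len(list(combinations(CELL_LINES, 2)))
pairwise = pairwise_assays(CELL_LINES, CONDITIONS)
multiplex = multiplex_assays(CELL_LINES, CONDITIONS)
print(pairs_per_condition, len(pairwise), len(multiplex))  # 6 24 4
```

The six pairs per condition correspond to comparisons (a) with (b) through (c) with (d); repeating them under four conditions gives the twenty-four pairwise assays, against four multiplex assays.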
fifteen • twenty LIST OF SEQUENCES < 110 > Iris, Francois J. -M. Pourny, Jean-Louis • < 120 > VGID MULTIPLE 5 < 130 > 9408-024 < 140 > Pending assignment < 141 > 1999-01-15 < 150 > 09/007, 905 < 151 > 1998-01-15 10 < 160 > 81 < 170 > Patentln Ver. 2.0 < 210 > 1 < 211 > 329 < 212 > DNA 15 < 213 > Escherichia coli < 400 > 1 gggccaccgc ttcaattttt ggcgtaattg tccgaaaaac ggcatgaatt tgcttagaaa 60 tttggcaaaa tggctttata tcaggggtca gaaagatccc gtctggcgct aaacgccgcg 120 cttctgcgga tcgcatggcc gaatgaatgc caagttggcg cgcgacatag ttcgccgtcg 180 20 tcaccacccc acgaccccca gtttctgctg gatcacgcga aataattaat ggctggtggc 240 gtaatgccgg attgtcacgc atctcgactt gggcatagaa ggcatcgata tcaacatggn 300 ggaattttac gtgtatcagt tgtcaataa 329 < 210 > 2 < 211 > 256 25 < 212 > DNA < 213 > Escherichia coli < 400 > 2 ccgggcgttt aggcagacgg gatctttctg acccctgatt ttgccaaata taaagccatt 60 tctaagcaaa ttcatgccgt ttttcggaca attacgccaa aaattgagcc ggtggtgatt 120 gatgaggctt acttagatgt gaccgccaat gcgttgtcag gcgcactgct ggccgcacag 180 ttacggcatg acatttatat acacacacga ttactctagt tcggcgggtg tatcgtatac 240 256 catactatta gcgatg < 210 > 3 < 211 > 248 < 212 > DNA < 213 > Escherichia coli < 400 > 3 gggatgaggc ttacttagat gtgaccgaca atgcgttgtc aggcgcaatn ctggccgcac 60 agttacggca tgacatttat aaacaancac gnttaactag ttcggtgggt gtatcgtata 120 acaaactatt agcgaagttg ggatctgant ttaataagcc aaacggtgtg acggtgatta 180 cgncggaaaa ccgcctggnt tttttagntc atttnccgat tggtgaattt cgcggggtcg gtgagaaa 240 248 < 210 > 4 < 211 > 387 < 212 > DNA < 213 > Escherichia coil < 400 >; 4 gccaacttgg cattcattcg gccatgcgat ccgcagaagc gcggcgttta gcgccagacg 60 ggatctttct gacccctgat tttgccaaat ataaagccat ttctaagcaa attcatgccg 120 tttttcggac aattacgcca aaaattgaag cggtggccct tgatgaggct tacttagatg 180 tgcgttgtca tgaccgccaa tggccgcaca ggcgcactgc gacatttata gttacggcat 240 tacacacacg attactctag ttcggtgggt gtatcgtata ccatactatt agcgaggttg 300 ggatctgatt taataagcca aacggtgtga cggtgattac gcggaaaacc gcctggtttt 360 ttagtcattt ccgattggtg aatttcg 387 < 210 > 5 < 211 > 381 < 212 > 
DNA < 213 > Escherichia coli < 400 > 5 gccaacttgg cattcattcg gccatgcgat ccgaagcgcg gcgtttaggc agcagacggg 60 atctttctga cccctgattt tgccaaatat aaagccattt ctaagcaaat tcatgccgtt 120 ttacgccaaa tttcggacaa aattgaagcg gtggtgattg atgaggctta cttagatgtg 180 accgccaatg cgttgtcagg cgcaatctgg ccgcacagtt acggcatgac atttataaac 240 aacacgttaa ctagttcggt gggtgtatcg tataacaaac tattagcgaa gttgggatct 300 gatttaataa gccaaacggt ttacgcggaa gtgacggtga aaccgcctgg ttttttagtc atttccgatt ggtgaatttc g 381 360 < 210 > 6 < 211 > 567 < 212 > DNA < 213 > Escherichia coli < 400 > 6 ttattgacaa ctgatacacg taaaattccc catgttgata tcgatgcctt ctatgcccaa 60 gtcgagatgc gtgacaatcc ggcattacgc caccagccat taattatttc gcgtgatcca 120 gcagaaactg ggggtcgtgg ggtggtgacg acggcgaact atgtcgcgcc aacttggcat 180 tcattcggcc atgcgatccg cagaagcgcc gggcgtttag gcagacggga tctttctgac 240 ccctgatttt gccaaatata aagccatttc taagcaaatt catgccgttt ttcggacaat 300 tacgccaaaa attgagccgg tggtgattga tgaggcttac ttagatgtga ccgccaatgc 360 gttgtcaggc gcactgctgg ccgcacagtt acggcatgac atttataaac aacacgttaa 420 ctagttcggt gggtgtatcg tataacaaac tattagcgaa gttgggatct gatttaataa 480 gccaaacggt gtgacggtga ttacgcggaa aaccgcctgg ttttttagtc atttccgatt 540 ggtgaatttc gcggggtcgg tgagaaa 567 < 210 > 7 < 211 > 63 < 212 > PRT < 213 > Escherichia coli < 400 > 7 Arg Gly Val Val Thr Thr Ala Asn Tyr Val Ala Arg Leu Gly He His 1 5 10 15 Be Wing Met Arg Wing Wing Glu Wing Arg Arg Leu Wing Pro Asp Gly He '20 25 30 Phe Leu Thr Pro Asp Phe Wing Lys Tyr Lys Wing He Ser Lys Gln He 35 40 45 His Ma Val Phe Arg Thr He Thr Pro Lys He Glu Ma Val Ma 50 55 '60 < 210 > 8 < 211 > 64 < 212 > PRT < 213 > Escherichia coli < 400 > 8 Arg Gly Val He Ser Thr Ma Asn Tyr Pro Ma Arg Lys Phe Gly Val 1 5 10 15 Arg Ser Ma Met Pro Thr Gly Met Ma Leu Lys Leu Cys Pro His Leu 25 30 Thr Leu Leu Pro Gly Arg Phe Asp Wing Tyr Lys Glu Wing Being Asn His 35 40 45 He Arg Glu He Phe Ser Arg Tyr Thr Ser Arg He Glu Pro Leu Ser 50 55 60 < 210 > 9 < 211 > 4 < 212 > PRT < 213 > Escherichia coli < 400 > 9 Thr 
Ma Asn Tyr 1 < 210 > 10 < 211 > 29 < 212 > PRT < 213 > Escherichia coli < 400 > 10 Lys Phe Xaa His Val Asp He Asp Ma Phe Tyr Ma Gln Val Glu Met 1 5 10 15 Arg Asp Asn Pro Ma Leu Arg His Gln Pro Leu He He 20 25 < 210 > 11 < 211 > 29 < 212 > PRT < 213 > Escherichia coli < 400 > 11 Lys He He His Val Asp Met Asp Cys Phe Phe Ma Ala Val Glu Met 1 5 10 15 Arg Asp Asn Pro Ma Leu Arg Asp He Pro He Wing He 20 25 < 210 > 12 < 211 > 10 < 212 > PKT < 213 > Escherichia coli < 400 > 12 Val Glu Met Arg Asp Asn Pro Ala Leu Arg 1 5 10 < 210 > 13 < 211 > 55 < 212 > PRT < 213 > Escherichia coli < 400 > 13 Arg Gly Val Val Thr Thr Ala Asn Tyr Val Ma Arg Gln Leu Gly He 1 5 10 15 His Ser Ma Met Arg Ser Wing Glu Ma Arg Arg Leu Ma Pro Asp Gly 25 30 He Phe Leu Thr Pro Asp Phe Ma Lys Tyr Lys Ma He Ser Lys Gln 35 40 45 He His Ma Val Phe Arg Thr 50 55 < 210 > 14 < 211 > 55 < 212 > PRT < 213 > Escherichia coli < 400 > 14 Arg Ser Val Val Ser Thr Cys Asn Tyr Val Ma Arg Ser Tyr Gly He 1 5 10 15 Arg Ser Gly Met Ser He Leu Lys Ma Leu Glu Leu Cys Pro Asn Ma 20 25 30 He Phe Wing His Ser Asn Phe Arg Asn Tyr Arg Lys His Ser Lys Arg 40 45 He Phe Ser Val He Glu Ser 50 55 < 210 > 15 < 211 > 5 < 212 > PRT < 213 > Escherichia coli < 400 > 15 Asn Tyr Val Ma Arg 1 5 < 210 > 16 < 211 > 28 < 212 > PRT < 213 > Escherichia coil < 400 > 16 Phe Xaa His Val Asp He Asp Ma Phe Tyr Ala Gln Val Glu Met Arg 1 5 10 15 Asp Asn Pro Ma Leu Arg His Gln Pro Leu He He 20 25 < 210 > 17 < 211 > 28 < 212 > PRT < 213 > Escherichia coli < 400 > 17 Phe Leu Tyr Phe Asp Phe Asp Ma Phe Phe Wing Ser Val Glu Glu Leu 1 5 10 15 Glu Asn Pro Glu Leu Val Asn Gln Pro Leu He Val 20 25 < 210 > 18 < 211 > 4 < 212 > PRT < 213 > Escherichia coli < 400 > 18 Gln Pro Leu He < 210 > 19 < 211 > 34 < 212 > PRT < 213 > Escherichia coli < 400 > 19 Val Asp He Asp Ma Phe Tyr Ma Gln Val Glu Met Arg Asp Asn Pro 1 5 10 15 Ma Leu Arg His Gln Pro Leu He He Ser Arg Asp Pro Wing Glu Thr 25 30 Gly Gly < 210 > 20 < 211 > 34 < 212 > PRT < 213 > Escherichia coli < 400 > 20 
Val. Asp Met Gln Ser Phe Tyr Ala Ser Val Glu Lys Ala Glu Asn Pro 1 5 10 15 His Leu Lys Asn Arg Pro Val He Val Ser Gly Asp Pro Glu Lys Arg 20 25 30 Gly Gly < 210 > 21 < 211 > 59 < 212 > PRT < 213 > Escherichia coli < 400 > 21 Gly Val Val Thr Thr Ala Asn Tyr Val Ma Arg Gln Leu Gly He His 1 5 10 15 Ser Ala Met Arg Ser Ala Glu Ma Arg Arg Leu Ma Pro Asp Gly He 20 25 30 Phe Leu Thr Pro Phe Ala Lys Tyr Lys Ma I Am Lys Gln He His 40 45 Wing Val Phe Arg Thr He Thr Pro Lys He Glu 50 55 < 210 > 22 < 211 > 60 < 212 > PRT < 213 > Escherichia coli < 400 > 22 Gly Val Val Leu Ma Ma Cys Pro Leu Ma Lys Gln Lys Gly Val Val 1 5 10 15 Asn Ma Ser Arg Leu Trp Glu Ma Gln Glu Lys Cys Pro Glu Ma Val 20 25 30 Val Leu Arg Pro Arg Met Gln Arg Tyr He Asp Val Ser Leu Gln He 35 40 45 Thr Ma He Leu Glu Glu Tyr Thr Asp Leu Val Glu 50 55 60 < 210 > 23 < 211 > 60 < 212 > PRT < 213 > Escherichia coli < 400 > 23 Arg Gly Val Val Thr Thr Ala Asn Tyr Val Ma Arg Gln Leu Gly He 1 5 10 15 His Ser Ma Met Arg Ser Ma Glu Ma Arg Arg Leu Ma Pro Asp Gly 25 30 He Phe Leu Thr Pro Asp Phe Ma Lys Tyr Lys Ma He Ser Lys Gln 35 40 45 He His Ma Val Phe Arg Thr He Thr Pro Lys He 50 55 - 60 < 210 > 24 < 211 > 60 < 212 > PRT < 213 > Escherichia coli < 400 > 24 Lys Gly He Val Val Thr Cys Ser Tyr Glu Ma Arg Ma Arg Gly Val 1 5 10 15 Lys Thr Thr Met Pro Val Trp Gln Ma Lys Arg His Cys Pro Glu Leu 20 25 30 He Val Leu Pro Pro Asn Phe Asp Arg Tyr Arg Asn Ser Ser Arg Ala 40 45 Met Phe Thr He Leu Arg Glu Tyr Thr Asp Leu Val 50 55 60 < 210 > 25 < 211 > 35 < 212 > PRT < 213 > Escherichia coli < 400 > 25 He Phe Leu Tyr Lys Ma He Ser Lys Gln He His Wing Val Phe Arg 10 15 Thr He Thr Pro Lys He Glu Pro Val Val He Asp Glu Ma Tyr Leu 25 30 Asp Val Thr 35 < 210 > 26 < 211 > 50 < 212 > PRT < 213 > Escherichia coli < 400 > 26 He Val Leu Pro Pro Asn Phe Asp Arg Tyr Arg Asn Ser Ser Arg Ala 1 5 10 15 Met Phe Thr He Leu Arg Glu Tyr Thr Asp Leu Val Glu Pro Val Ser 20 25 30 He Asp Glu Gly Tyr Met Asp Met Thr Asp Thr Pro Tyr Ser Ser Arg 35 40 45 Ala 
Leu 50 < 210 > 27 < 211 > 35 < 212 > PRT < 213 > Escherichia coli < 400 > 27 Phe Ma Lys Tyr Lys Ma He Ser Lys Gln He His Ma Val Phe Arg 1 5 10 15 Thr He Thr Pro Lys He Glu Pro Val Val He Asp Glu Ma Tyr Leu 25 30 Asp Val Thr 35 < 210 > 28 < 211 > 35 < 212 > PRT < 213 > Escherichia coli < 400 > 28 Phe Asp Ma Tyr Lys Glu Wing Being Asn His He Arg Glu He Phe Ser 1 5 10 15 Arg Tyr Thr Ser Arg He Glu Pro Leu Ser Leu Asp Glu Ma Tyr Leu 20 25 30 Asp Val Thr 35 < 210 > 29 < 211 > 8 < 212 > PRT < 213 > Escherichia coli < 400 > 29 Asp Glu Ma Tyr Leu Asp Val Thr 1 5 < 210 > 30 < 211 > 69 < 212 > PRT < 213 > Escherichia coli < 400 > 30 Ser Gly Ma Xaa Leu Wing Wing Gly Leu Arg His Asp He Tyr Lys Gln 1 5 10 15 Xaa Arg Leu Thr Being Ser Val Gly Val Being Tyr Asn Lys Leu Leu Ma 25 30 Lys Leu Gly Ser Xaa Phe Asn Lys Pro Asn Gly Val Thr Val He Thr 35 40 45 Xaa Glu Asn Arg Leu Xaa Phe Leu Xaa His Xaa Pro He Gly Glu Phe 50 55 60 Arg Gly Val Gly Glu 65 < 210 > 31 < 211 > 69 < 212 > PRT < 213 > Escherichia coli < 400 > 31 Ser Ma Thr Leu He Ma Gln Glu He Arg Gln Thr He Phe Asn Glu 1 5 10 15 Leu Gln Leu Thr Ma Ser Wing Gly Val Wing Pro Val Lys Phe Leu Wing 25 30 Lys He Wing Being Asp Met Asn Lys Pro Asn Gly Gln Phe Val He Thr 35 40 45 Pro Wing Glu Val Pro Ma Phe Leu Gln Thr Leu Pro Leu Wing Lys He 50 55 60 Pro Gly Val Gly Lys 65 < 210 > 32 < 211 > 5 < 212 > PRT < 213 > Escherichia coli < 400 > 32 Asn Lys Pro Asn Gly 1 5 < 210 > 33 < 211 > 10 < 212 > PRT < 213 > Escherichia coli < 400 > 33 Asp Glu Ala Tyr Leu Asp Val Thr Asp Asn 1 5 10 < 210 > 34 < 211 > 10 < 212 > PRT < 213 > Escherichia coli < 400 > 34 Asp Glu Allah Tyr Leu Asp Val Thr Asp Ser 1 5 10 < 210 > 35 < 211 > 9 < 212 > PRT < 213 > Escherichia coli < 400 > 35 Asp Glu Ma Tyr Leu Asp Val Thr Asp 1 5 < 210 > 36 < 211 > 65 < 212 > PRT < 213 > Escherichia coli < 400 > 36 Ma Ma Gln Leu Arg His Asp He Tyr Lys Gln Xaa Arg Leu Thr Ser 1 5 10 15 Ser Val Gly Val Ser Tyr Asn Lys Leu Leu Wing Lys Leu Gly Ser Xaa 20 25 30 Phe Asn Lys Pro Asn 
Gly Val Thr Val He Thr Xaa Glu Asn Arg He 40 45 Xaa Phe Leu Xaa His Xaa Pro He Gly Glu Phe Arg Gly Val Gly Glu 50 55 60 Lys 65 < 210 > 37 < 211 > 65 < 212 > PRT < 213 > Escherichia coli < 400 > 37 Ma Lys Glu He Gln Ser Arg Leu Gln Lys Glu Leu Leu Pro Ser 1 5 10 15 Be He Gly He Ma Pro Asn Lys Phe Leu Ma Lys Met Wing Be Asp 20 25 30 Met Lys Lys Pro Leu Gly He Thr He Leu Arg Lys Arg Gln Val Pro 40 45 Asp He Leu Trp Pro Leu Pro Val Gly Glu Met His Gly Val Gly Lys 50 55 60 Lys 65 < 210 > 38 < 211 > 17 < 212 > PRT < 213 > Escherichia coli < 400 > 38 Asp Glu Ma Tyr Leu Asp Val Thr Asp Asn Ma Leu Ser Gly Ma Xaa 1 5 10 15 Leu < 210 > 39 < 211 > 17 < 212 > PRT < 213 > Escherichia coli < 400 > 39 Asp Glu Gly Tyr Met Asp Met Thr Asp Thr Pro Tyr Ser Ser Arg Ala 10 15 Leu < 210 > 40 < 211 > 66 < 212 > PRT < 213 > Escherichia coli < 400 > 40 Leu Ma Ma Gln Leu Arg His Asp He Tyr Lys Gln Xaa Arg Leu Thr 1 5 10 15 Ser Ser Val Gly Val Ser Tyr Asn Lys Leu Leu Ma Lys Leu Gly Ser 20 25 30 Xaa Phe Asn Lys Pro Asn Gly Val Thr Val He Thr Xaa Glu Asn Arg 40 45 Leu Xaa Phe Leu Xaa Glu Xaa Pro He Gly Glu Phe Arg Gly Val Gly 50 55 60 Glu Lys 65 < 210 > 41 < 211 > 66 < 212 > PRT < 213 > Escherichia coli < 400 > 41 He Ma Lys Lys He Lys Asn Phe Val Phe Gln Asn Leu Arg He Lys 10 15 I Have Been Gly He Being Asp His Phe Leu He Ma Lys He Phe Ser 20 25 30 *. om «. lr. Pto, he ßly 1 ty > to be v > ? * «J •» »p *. or. , ro u. "01u Ilß, r," and ".
Glu Lys 65 < 210 > 42 < 211 > 61 < 212 > PRT < 213 > Escherichia coli < 400 > 42 Ola Leu Gly He His Ser Ma 1 Met Arg Ser Ala ßla Al. Arg Arg Leu 10 15 ^ - ^^ - Phe Leu Thr Pro ASP Phe A1a Lys ^ Ly, A! A 25 30 I have and Ql »lle His Wing Val Phe Ar * Thr t«. 35 Ars Thr Ile Thr Pro Lys He 4S «. «« «I a, tau ^ 01" ", ^ u 50 < 210 > 43 < 211 > 61 < 212 > PRT < 213 > Escherichia coli < 400 > 43 Lys Phe Gly Val Arg Ser Ma Met Pro Thr Gly Met Ma Leu Lys Leu 1 S 10 15 Cyß Pro Hit Leu Thr Leu Leu Pro Oly Arg Phe Asp Ma Tyr Lys Glu 20 25 30 Ma Ser Asa His He Arg Glu He Phe Ser Arg Tyr Thr Ser Arg He 40 45 Glu Pro Leu Ser Leu Asp Glu Ma Tyr Leu Asp Val Thr 50 55 60 < 210 > 44 < 211 > 9 < 212 > PRT < 213 > Escherichia coli < 400 > 44 Leu Asp Glu Ma Tyr Leu Asp Val Thr 1 5 < 210 > 45 < 211 > 39 < 212 > PRT < 213 > Escherichia coli < 400 > 45 Gln Leu Arg His Asp He Tyr He His Thr Arg Leu Leu Phe Gly Gly 10 S Cys lie Val Tyr His Thr lie Ser Glu Val Gly He Phe Asn Lyß Pro 20 25 30 Asn Gly Val Thr Val He Thr 35 < 210 > 46 < 211 > 41 < 212 > PRT < 213 > Escherichia coli < 400 > 46 Glu He Arg Gln Thr He Phe Asn Glu Leu Gln Leu Thr Ma Ser Ma 1 5 0 15 Gly Val Ma Pro Val Lys Phe Leu Ma Lys He Ma Sex Asp Met Asn 20 25 30 Lys Pro Asn Gly Gln Phe Val He Thr 35 40 < 210 > 47 < 211 > 5 < 212 > PRT < 213 > Escherichia coli < 400 > 47 Asn Lys Pro Asn Gly 1 5 < 210 > 48 < 211 > 38 < 212 > PRT < 213 > Escherichia coli < 400 > 48 Ser Gly Ma Leu Leu Wing His Ser Tyr Gly Met Thr Phe He Tyr Thr 1 5 10 15 His Asp Tyr Ser Ser Val Vally Ser Tyr Thr He Leu Leu Wing 20 25 30 Lys Leu Gly Ser Asp Leu 35 < 210 > 49 < 211 > 38 < 212 > PRT < 213 > Escherichia coli < 400 > 49 Ser Ma Thr Leu He Ma Gln Glu He Arg Gln Thr He Phe Asn Glu 15 Leu Gln Leu Thr Ala Ser Ala Gly Val Ala Pro Val Lys Phe Leu Ala 20 25 30 Lys He Ala Ser Asp Net 35 < 210 > 50 < 211 > 68 < 212 > PRT < 213 > Escherichia coli < 400 > fifty ßly He His Ser Wing Met Arg Ser Ma Glu to Arg Arg Leu Ala Pro 1 5 10 15 Asp Gly He Phe Leu Thr Pro Asp Phe 
Ma Lys Tyr Lys Ma He Ser 20 25 30 Lys Gln He Asx Wing Val Phe Arg Thr He Thr Pro Lys He Glu Wing 35 40 45 Val Ala Leu Asp Glu Ma Tyr Leu Asp Val Thr Ma Asn Ala Leu Ser 50 55 60 Gly Ma Leu Leu SS < 210 > 51 < 211 > 68 < 212 > PRT < 213 > Escherichia coli < 400 > 51 Gly Val Lys Thr Thr Met Pro Val Trp Gln Ala Lys Arg His Cys Pro 1 5 10 15 slu Leu He Val Leu Pro Pro Asn Phe Asp Arg Tyr Arg Asn Ser Ser 20 25 30 Arg Ma Met Phe Thr He Leu Arg slu Tyr Thr Asp Uu Val Glu Pro 35 40 45 Val Ser He Asp Glu Gly Tyr Met Asp Met Thr Asp Thr Pro Tyr Ser 50 55 60 Ser Arg Ala Leu 65 < 210 > 52 < 211 > 18 < 212 > PRT < 213 > Escherichia coli < 400 > 52 Ser Ser Val Gly Val Ser Tyr Thr He Leu Leu Ma Lys Leu Gly Ser 1 5 10 15? Sp Leu < 210 > 53 < 211 > 18 < 212 > PRT < 213 > Escherichia coli < 400 > 53 Ser Ser He Gly He Ma Pro Asn Lys Phe Leu Ma Lys Met Ma Ser 1 5 10 15 Asp Met < 210 > 54 < 211 > 61 < 212 > PRT < 213 > Escherichia coli < 400 > 54 Gln Leu Gly He His Ser Ma Met Arg Ser Wing Glu Ma Arg Arg Leu 5 10 15 Ma Pro Asp Gly He Phe Leu Thr Pro Asp Phe Ma Lys Tyr Lys Ma 20 25 30 He Ser Lys Gln He His Ma Val Phe Arg Thr He Thr Pro Lys He 40 45 Glu Ala Val Ala Leu Asp Glu Ala Tyr Leu Asp Val Thr 50 5S 60 < 210 > 55 < 211 > 61 < 212 > PRT < 213 > Escherichia coli < 400 > 55 Lys Leu Gly Val Lys Ma Gly Met Pro He He Lys Ma Met Gln He 1 5 10 15 Ma Pro Ser Wing He Tyr Val Pro Met Arg Lys Pro He Tyr Glu Ma 20 25 30 Phe Ser Asn Arg He Met Asn Leu Leu Asn Lys His Wing Asp Lys He 35 40 45 Glu Val Ma Ser He Asp Glu Ma Tyr Leu Asp Val Thr 60 < 210 > 56 < 211 > 8 < 212 > PRT < 213 > Escherichia coli < 400 > 56 Asp Glu Ma Tyr Leu Asp Val Thr 1 5 < 210 > 57 < 211 > 43 < 212 > PRT < 213 > Escherichia coli < 400 > 57 Val Thr Ma Asn Ala Leu Ser Gly Ma Leu Leu Ma His Ser Tyr * - Gly 10 15 Met Thr Phe He Tyr Thr His Asp Tyr Ser Ser Ser Val G Gllyy Val Ser 20 25 30 Tyr Thr He Leu Leu Ma Lys Leu Gly be Asp 35 40 < 210 > 58 < 211 > 43 < 212 > PRT < 213 > Escherichia coli < 400 > 58 Val Glu Gly Asn Phe ßlu 
Asn Gly He Glu Leu Ma Arg Lys He Lys 10 15 Gla Glu He Leu Glu Lyß slu Lys He Thr Val Thr Val Gly Val Ma 20 25 30 Pro? Sn Lys He Leu Ma Lys He lie Ma Asp 35 40 < 210 > 59 < 211 > 20 < 212 > PRT < 213 > Escherichia coli < 400 > 59 Leu Thr Ser Ser Val Gly Val Ser Tyr Asn Lys Leu Leu Ma Lys Leu 1 5 10 15 Gly Ser Asp Leu 20 < 210 > 60 < 211 > 20 < 212 > PRT < 213 > Escherichia coli < 400 > 60 Leu Pro Be Ser He Gly He Ma Pro Asn Lys Phe Leu Ma Lys Het 5 10 15 Ma Ser Asp Met 20 < 210 > 61 < 211 > 35 < 212 > PRT < 213 > Escherichia coli < 400 > 61 Phe Ma Lys Tyr Lys Ma He Ser Lys Gln He Hiß Ma Val Phe Arg 1 5 10 15 Thr He Thr Pro Lys He Glu Ma Val Val He Asp Glu Wing Tyr Leu 20 25 30 Asp Val Thr 35 < 210 > 62 < 211 > 35 < 212 > PRT < 213 > Escherichia coli < 400 > 62 Phß A «p Ala Tyr Lys Glu Ala Ser Asn His He Arg Glu He Phe Ser 1 5 10? S Arg Tyr Thr Ser Arg He Glu Pro Leu Ser He Asp slu Ma Tyr Leu 20 25 30 Asp Val Thr 35 < 210 > 63 < 211 > 8 < 212 > PRT < 213 > Escherichia coli < 400 > 63 Asp Glu Ala Tyr Leu Asp Val Thr 1 S < 210 > 64 < 211 > 20 < 212 > PRT < 213 > Escherichia coli < 400 > 64 Leu Thr Ser Ser Val Gly Val Ser Tyr Asn Lys Leu Leu Ma Lys Leu 5 10 15 ßly Ser Asp Leu 20 < 210 > 65 < 211 > 20 < 212 > PRT < 213 > Escherichia coli < 400 > 65 Leu Thr Ma Ser Ma Gly Val Ma Pro Val Lys Phe Leu Ma Lys He 1 5 10 15 Ma Ser Asp Met 20 < 210 > 66 < 211 > 17 < 212 > PRT < 213 > Escherichia coli < 400 > 66 He Be Glu Val Gly He Phe Asn Lys Pro Asn Gly Val Thr Val He 1 5 10 15 Thr < 210 > 67 < 211 > 18 < 212 > PRT < 213 > Escherichia coli < 400 > 67 Leu Ma Lys He Wing Being Asp Met Asn Lys Pro Aen Gly Gln Phe Val 1 5 10 15 He Thr < 210 > 68 < 211 > 5 < 212 > PRT < 213 > Escherichia ccli < 400 > 68 Asn Lys Pro Asn Gly 1 5 < 210 > 69 < 211 > 41 < 212 > PRT < 213 > Escherichia ccli < 400 > 69 He Phe Leu Thr Pro Asp Pbe Wing Lys Tyr Lys Ma He Ser Lys Gln 1 5? 
O 15 He His Ma Val Phe Arg Thr He Thr Pro Lys He Glu Ma Val Val 20 25 3rd He Asp Glu Ma Tyr Leu Asp Val Thr 35 40 < 210 > 70 < 211 > 41 < 212 > PRT < 213 > Escherichia ccli < 400 > 70 He Val Leu Pro Pro Asn Phe Asp Arg Tyr Arg Asn Ser Ser Arg Ma 1 5 10 15 Met Phe Thr He Leu Arg Glu Tyr Thr Asp Leu Val Glu Pro Val Ser 20 25 30 He Asp Glu Gly Tyr Met Asp Met Thr 35 40 < 210 > 71 < 211 > 11 < 212 > PRT < 213 > Oryctolagus cuniculus < 400 > 71 Lys Phe Ser Arg Glu Lys Lys Ma Ma Lys Thr 1 5 10 < 400 > 72 < 211 > 11 < 212 > PRT < 213 > Oryctolagus cuniculus < 400 > 72 Asp Gln Lys? Rg Tyr His Glu Asp He Phe Gly 1 5 10 < 210 > 73 < 211 > 15 < 212 > PRT < 213 > Oryctolagus cuniculus < 400 > 73 Asp Leu Lys Glu slu Lys Asp He Asn Asn Asn Val Lys Lys Thr 1 5 10 5 < 210 > 74 < 211 > 9 < 212 > PRT < 213 > Oryctolagus cuniculus < 400 > 74 Cyß Thr Gly Glu Glu Asp Thr Ser Glu 1 5 < 210 > 75 < 211 > 11 < 212 > PRT < 213 > Oryctolagus cuniculus < 400 > 75 Pro Glu Glu Thr Gln Thr Gln Asp Gln Pro Met 1 5 10 < 210 > 76 < 211 > 13 < 212 > PRT < 213 > Oryctolagus cuniculus < 400 > 76 Gln Lys Ser Asp Gln Gly Val Glu Gly Pro Gly Ma Thr 1 5? O < 210 > 77 < 211 > 12 < 212 > PRT < 213 > Oryctolagus cuniculus < 400 > 77 Asp He Oly Gln Ser He Lys Lys Phe Ser Lys Val 5? O < 210 > 78 < 211 > 10 < 212 > PRT < 213 > Oryctolagus cuniculus < 400 > 78 Gln Arg Ala Asp Ser Leu Ser Ser His Leu 1 5 0 < 210 > 79 < 211 > 9 < 212 > PRT < 213 > Oryctolagus cuniculus < 400 > 79 * yr Pro Tyr Asp Val Pro Asp Tyr Ma < 210 > 80 < 211 > 10 < 212 > PRT < 213 > Oryctolagus cuniculus < 400 > 80 Glu Gln Lys Leu He Ser Glu Glu Asp Leu 1 5 10 < 210 > 81 < 211 > 11 < 212 > PRT < 213 > Oryctolagus cuniculus < 400 > 81 Tyr Thr Asp He Glu Met Asn Lys Leu Gly Lys 10
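The entries above use the numeric identifiers of the WIPO ST.25 sequence-listing layout: `<210>` is the SEQ ID NO, `<211>` the declared length, `<212>` the molecule type, `<213>` the organism, and `<400>` introduces the residues. The following is a minimal sketch, assuming the flattened one-line text form shown here, of how such entries could be split apart and checked. The names `parse_listing`, `ENTRY`, and `VALID_RESIDUES` are hypothetical and are not part of the patent or of any standard tool.

```python
import re

# Three-letter residue codes assumed valid for an ST.25 protein listing:
# the 20 standard amino acids plus Asx, Glx and Xaa for ambiguous positions.
VALID_RESIDUES = {
    "Ala", "Arg", "Asn", "Asp", "Cys", "Gln", "Glu", "Gly", "His", "Ile",
    "Leu", "Lys", "Met", "Phe", "Pro", "Ser", "Thr", "Trp", "Tyr", "Val",
    "Asx", "Glx", "Xaa",
}

# One entry: <210> SEQ ID, <211> length, <212> type, <213> organism,
# <400> SEQ ID again, then residues up to the next <210> or end of text.
ENTRY = re.compile(
    r"<\s*210\s*>\s*(\d+)\s*"
    r"<\s*211\s*>\s*(\d+)\s*"
    r"<\s*212\s*>\s*(\w+)\s*"
    r"<\s*213\s*>\s*(.*?)\s*"
    r"<\s*400\s*>\s*\d+\s*"
    r"(.*?)(?=<\s*210\s*>|$)",
    re.S,
)


def parse_listing(text):
    """Split a flattened listing into entries and check each declared length."""
    entries = []
    for seq_id, length, mol_type, organism, body in ENTRY.findall(text):
        # Position counters (1, 5, 10, ...) are interleaved with residues; drop them.
        residues = [tok for tok in body.split() if not tok.isdigit()]
        entries.append({
            "seq_id": int(seq_id),
            "declared_length": int(length),
            "type": mol_type,
            "organism": organism,
            "residues": residues,
            # True when the residue count matches the <211> value.
            "length_ok": len(residues) == int(length),
            # Tokens that are not recognised three-letter codes (likely text damage).
            "unknown": [r for r in residues if r not in VALID_RESIDUES],
        })
    return entries
```

A check of this kind separates two sorts of damage in a listing like the one above: tokens that are not residue codes at all (for example `Ma` or `Wing`) appear under `unknown`, while dropped or merged residues show up as a failed `length_ok`.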

Claims (12)

  1. A method for identifying one or several genes that serve as a basis for a defined phenotype, comprising the following steps in the indicated order: (a) removing mismatched duplex nucleic acid molecules formed from hybridization within each of several nucleic acid source populations; and (b) retaining mismatched duplex nucleic acid molecules formed from hybridization between the several source populations; wherein the molecules retained in step (b) comprise the gene or genes that form the basis of the defined phenotype.
  2. The method according to claim 1, wherein the plural source populations comprise at least one normalized cDNA library.
  3. The method according to claim 1, wherein the plural source populations comprise at least one linearized cDNA library.
  4. The method according to claim 1, wherein the several source populations consist of DNA, the DNA of each of the source populations is labeled with a different label, and the hybridization in step (b) is carried out using an excess of labeled DNA from one or several source populations.
  5. The method according to claim 4, wherein the excess of labeled DNA is a threefold excess.
  6. The method according to claim 1, wherein each of the source populations is derived from a cell line.
  7. A method for identifying one or several genes that form the basis of a defined phenotype presented by a cell or an individual from which a first cDNA library is derived, but not presented by a cell or an individual from which several additional cDNA libraries are derived, comprising: (a) hybridizing insert DNA from the first cDNA library to itself; (b) hybridizing insert DNA from each of the several additional cDNA libraries to itself; (c) contacting the DNA hybridized in step (a) with a first immobilized mismatch binding protein; (d) contacting each separate population of DNA hybridized in step (b) individually with a second immobilized mismatch binding protein; (e) separating the unbound DNA from the bound DNA contacted in step (c); (f) separating the unbound DNA from the bound DNA individually contacted in step (d); (g) labeling each separate population of the unbound DNA separated in step (f) with a distinguishable label capable of binding a partner molecule immobilized on a substrate; (h) hybridizing the DNA labeled in step (g) with the unbound DNA separated in step (e); (i) contacting the DNA hybridized in step (h) with a third immobilized mismatch binding protein; (j) separating the unbound DNA from the bound DNA contacted in step (i); (k) contacting the unbound DNA separated in step (j) with the partner molecule for each distinguishable label; and (l) separating the unbound DNA from the bound DNA contacted in step (k); wherein said unbound DNA separated in step (l) encodes the one or several identified genes that serve as a basis for the defined phenotype.
  8. The method according to claim 7, wherein one or more of the cDNA libraries is or are normalized.
  9. The method according to claim 7, wherein one or more of the cDNA libraries is or are linearized.
  10. The method according to claim 7, wherein the labeling is carried out by polymerase chain reaction using a 5' peptide-labeled primer.
  11. The method according to claim 7, wherein at least one immobilized partner molecule is an antibody.
  12. The method according to claim 11, wherein the antibody is an anti-peptide antibody.
  13. The method according to claim 7, wherein the hybridization in step (h) is carried out using an excess of labeled DNA.
  14. The method according to claim 13, wherein the excess of labeled DNA is a threefold excess.
  15. The method according to claim 7, wherein the first, second or third immobilized mismatch binding protein is MutS.
  16. The method according to claim 1, wherein the defined phenotype is selected from the group consisting of a plant phenotype, a microorganism phenotype, and a pathological phenotype.
  17. The method according to claim 16, wherein the defined phenotype is a pathological phenotype selected from the group consisting of cancer, osteoporosis, obesity, type II diabetes, and a prion-related disease.
  18. A method for identifying one or several genes that serve as a basis for a defined phenotype presented by a cell or an individual from which a first cDNA library is derived, but not presented by a cell or an individual from which several additional cDNA libraries are derived, comprising: (a) amplifying the insert DNA from the first cDNA library by polymerase chain reaction; (b) amplifying the insert DNA from each of several additional cDNA libraries by polymerase chain reaction; (c) hybridizing the DNA amplified in step (a) with itself; (d) hybridizing each separate population of DNA amplified in step (b) with itself; (e) contacting the DNA hybridized in step (c) with immobilized MutS; (f) contacting each separate population of DNA hybridized in step (d) individually with immobilized MutS; (g) separating the unbound DNA from the bound DNA contacted in step (e); (h) separating the unbound DNA from the bound DNA contacted in step (f); (i) labeling the unbound DNA separated in step (g) by polymerase chain reaction using unlabeled primers; (j) labeling each separate population of unbound DNA separated in step (h) by polymerase chain reaction using at least one primer having a distinguishable 5' peptide tag capable of binding a partner molecule immobilized on a substrate; (k) hybridizing the DNA labeled in step (i) with the DNA labeled in step (j); (l) contacting the DNA hybridized in step (k) with immobilized MutS; (m) separating the unbound DNA from the bound DNA contacted in step (l); (n) contacting the unbound DNA separated in step (m) with one or more partner molecules capable of binding the distinguishable 5' peptide-labeled primers; and (o) separating the unbound DNA from the bound DNA contacted in step (n); wherein said unbound DNA separated in step (o) encodes the one or several identified genes that form the basis of the defined phenotype.
  19. A method for identifying one or more alleles that serve as a basis for a defined phenotype presented by a cell or an individual from which a first cDNA library is derived, but not presented by a cell or individual from which several additional cDNA libraries are derived, comprising: (a) hybridizing the insert DNA from the first cDNA library to itself; (b) hybridizing the insert DNA from each of several additional cDNA libraries to itself; (c) contacting the DNA hybridized in step (a) with a first immobilized mismatch binding protein; (d) contacting each separate population of DNA hybridized in step (b) individually with a second immobilized mismatch binding protein; (e) separating the unbound DNA from the bound DNA contacted in step (c); (f) separating the unbound DNA from the bound DNA contacted in step (d); (g) labeling each separate population of unbound DNA separated in step (f) with a distinguishable label capable of binding a partner molecule immobilized on a substrate; (h) hybridizing the DNA labeled in step (g) with the unbound DNA separated in step (e); (i) contacting the DNA hybridized in step (h) with a third immobilized mismatch binding protein; (j) separating the unbound DNA from the bound DNA contacted in step (i); (k) releasing the bound DNA separated in step (j) from the immobilized mismatch binding protein; (l) contacting the DNA released in step (k) with one or more partner molecules capable of binding the distinguishable labels; (m) denaturing the DNA contacted in step (l); and (n) separating the unbound DNA from the bound DNA denatured in step (m); wherein said unbound DNA separated in step (n) encodes the one or more identified alleles that form the basis of the defined phenotype.
  20. The method according to claim 18 or claim 19, wherein at least one cDNA library is normalized.
  21. The method according to claim 18 or claim 19, wherein at least one cDNA library is linearized.
  22. The method according to claim 19, wherein the labeling is performed by polymerase chain reaction using 5' peptide-labeled primers.
  23. The method according to claim 19, wherein at least one immobilized partner molecule is an antibody.
  24. The method according to claim 23, wherein the antibody is an anti-peptide antibody.
  25. The method according to claim 19, wherein the hybridization in step (h) is carried out using an excess of labeled DNA.
  26. The method according to claim 19, wherein the excess of labeled DNA is a threefold excess.
  27. The method according to claim 19, wherein at least one of the immobilized mismatch binding proteins is MutS.
  28. A method for identifying one or more alleles that form the basis of a defined phenotype presented by a cell or an individual from which a first cDNA library is derived, but not presented by a cell or individual from which several additional cDNA libraries are derived, comprising: (a) amplifying the insert DNA from the first cDNA library by polymerase chain reaction; (b) amplifying the insert DNA from each of several additional cDNA libraries by polymerase chain reaction; (c) hybridizing the DNA amplified in step (a) with itself; (d) hybridizing the DNA amplified from each library in step (b) with itself; (e) contacting the DNA hybridized in step (c) with immobilized MutS; (f) contacting each population of DNA hybridized in step (d) individually with immobilized MutS; (g) separating the unbound DNA from the bound DNA contacted in step (e); (h) separating the unbound DNA from the bound DNA contacted in step (f); (i) amplifying the unbound DNA separated in step (g) by polymerase chain reaction using unlabeled primers; (j) amplifying and labeling each population of unbound DNA separated in step (h) by polymerase chain reaction using a distinguishable 5' peptide-labeled primer; (k) hybridizing the DNA amplified and labeled in step (j) with the DNA amplified in step (i); (l) contacting the DNA hybridized in step (k) with immobilized MutS; (m) separating the unbound DNA from the bound DNA contacted in step (l); (n) releasing the bound DNA separated in step (m) from the immobilized MutS; (o) contacting the DNA released in step (n) with one or more immobilized antibodies specific for each distinguishable 5' peptide-labeled primer; (p) denaturing the DNA contacted in step (o); and (q) separating the unbound DNA from the bound DNA denatured in step (p); wherein said unbound DNA separated in step (q) encodes the one or more identified alleles that form the basis of the defined phenotype.
  29. The method according to claim 28, wherein the release of the bound DNA from the immobilized MutS in step (n) is carried out using ATP or proteinase K.
  30. The method of any of claims 1, 7, 18, 19 and 28, further comprising using one or more of the identified genes or alleles to carry out a prognosis or diagnosis.
  31. The method according to claim 30, wherein the identified gene or allele, or the several identified genes or alleles, or a protein encoded by them, is a target for pharmacological intervention.
  32. The method according to claim 1, wherein the plural source populations are from three to twelve source populations.
  33. The method according to claim 1, wherein the plural source populations are from three to six source populations.
  34. The method according to claim 1, wherein the plural source populations consist of four source populations.
  35. A method for identifying one or several genes that serve as a basis for a defined phenotype presented by a cell or individual from which a first cDNA library is derived, but not presented by a cell or individual from which several additional cDNA libraries are derived, comprising: (a) hybridizing the insert DNA from each cDNA library with itself; (b) contacting each population of the DNA hybridized in step (a) individually with a first immobilized mismatch binding protein; (c) separating the unbound DNA from the bound DNA individually contacted in step (b); (d) labeling each separate population of the unbound DNA separated in step (c) with a distinguishable label capable of binding a partner molecule immobilized on a substrate; (e) hybridizing the DNA separately labeled in step (d); (f) contacting the DNA hybridized in step (e) with a second immobilized mismatch binding protein; and (g) separating the unbound DNA from the bound DNA contacted in step (f).
  36. A method for identifying one or several genes that form the basis of a defined phenotype presented by a cell or individual from which a first cDNA library is derived, but not presented by a cell or individual from which several additional cDNA libraries are derived, comprising: (a) amplifying the insert DNA from each cDNA library by polymerase chain reaction; (b) hybridizing each separate population of the DNA amplified in step (a) with itself; (c) contacting each separate population of the DNA hybridized in step (b) individually with immobilized MutS; (d) separating the unbound DNA from the bound DNA contacted in step (c); (e) labeling each separate population of the unbound DNA separated in step (d) by polymerase chain reaction using at least one primer having a distinguishable 5' peptide tag capable of binding a partner molecule immobilized on a substrate; (f) hybridizing the DNA labeled in step (e); (g) contacting the DNA hybridized in step (f) with immobilized MutS; and (h) separating the unbound DNA from the bound DNA contacted in step (g).
  37. A method for identifying one or more alleles that form the basis of a defined phenotype presented by a cell or individual from which a first cDNA library is derived, but not presented by a cell or individual from which several additional cDNA libraries are derived, comprising: (a) hybridizing the insert DNA from each cDNA library with itself; (b) contacting each separate population of the DNA hybridized in step (a) individually with a first immobilized mismatch binding protein; (c) separating the bound DNA from the unbound DNA contacted in step (b); (d) labeling each separate population of the unbound DNA separated in step (c) with a distinguishable label capable of binding a partner molecule immobilized on a substrate; (e) hybridizing the DNA separately labeled in step (d); (f) contacting the DNA hybridized in step (e) with a second immobilized mismatch binding protein; and (g) separating the unbound DNA from the bound DNA contacted in step (f).
  38. A method for identifying one or more alleles that form the basis of a defined phenotype presented by a cell or individual from which a first cDNA library is derived, but not presented by a cell or individual from which several additional cDNA libraries are derived, comprising: (a) amplifying the insert DNA from each cDNA library by polymerase chain reaction; (b) hybridizing the DNA amplified from each library in step (a) with itself; (c) contacting the DNA of each library hybridized in step (b) individually with a first immobilized mismatch binding protein; (d) separating the unbound DNA from the bound DNA contacted in step (c); (e) amplifying and labeling each separate population of the unbound DNA separated in step (d) by polymerase chain reaction using at least one primer having a distinguishable 5' peptide tag; (f) hybridizing the DNA amplified and labeled in step (e); (g) contacting the DNA hybridized in step (f) with a second immobilized mismatch binding protein; (h) separating the unbound DNA from the bound DNA contacted in step (g); (i) releasing the bound DNA separated in step (h); and (j) separating the DNA released in step (i) into individual strands.
  39. A method for identifying one or more alleles that form the basis of a defined phenotype, comprising the following steps in the indicated order: (a) removing mismatched duplex nucleic acid molecules formed by hybridization within each of several source populations of nucleic acids; (b) retaining mismatched duplex nucleic acid molecules formed from hybridization between the several source populations; and (c) separating the mismatched strands retained in step (b), said separated strands comprising the one or several alleles that form the basis of the defined phenotype.
  40. A method for identifying one or several genes that form the basis of a defined phenotype, comprising the following steps in the indicated order: (a) removing mismatched duplex nucleic acid molecules formed from hybridization within each of two source populations of nucleic acids; and (b) retaining mismatched duplex nucleic acid molecules formed from hybridization between the two source populations; wherein the molecules retained in step (b) comprise the gene or genes that form the basis of the defined phenotype.
  41. A method for identifying one or several genes that form the basis of a defined phenotype, comprising the following steps in the indicated order: (a) removing mismatched duplex nucleic acid molecules formed from hybridization within a first source population of nucleic acids; and (b) retaining mismatched duplex nucleic acid molecules formed from hybridization between the first source population and a second source population of nucleic acids; wherein the molecules retained in step (b) comprise the gene or genes that form the basis of the defined phenotype.
  42. A method for identifying one or several genes that form the basis of a defined phenotype presented by a cell or an individual from which a first cDNA library is derived, but not presented by a cell or individual from which a second cDNA library is derived, comprising: (a) hybridizing the insert DNA of the first cDNA library with itself; (b) hybridizing the insert DNA of the second cDNA library with itself; (c) contacting the DNA hybridized in step (a) with a first immobilized mismatch binding protein; (d) contacting the DNA hybridized in step (b) with a second immobilized mismatch binding protein; (e) separating the unbound DNA from the bound DNA contacted in step (c); (f) separating the unbound DNA from the bound DNA contacted in step (d); (g) labeling the unbound DNA separated in step (f) with a label capable of binding a partner molecule immobilized on a substrate; (h) hybridizing the DNA labeled in step (g) with the unbound DNA separated in step (e); (i) contacting the DNA hybridized in step (h) with a third immobilized mismatch binding protein; (j) separating the unbound DNA from the bound DNA contacted in step (i); (k) contacting the unbound DNA separated in step (j) with the partner molecule, immobilized on a substrate, that is capable of binding the label; and (l) separating the unbound DNA from the bound DNA contacted in step (k); wherein said unbound DNA separated in step (l) encodes the one or several identified genes that form the basis of the defined phenotype.
  43. A method for identifying one or more genes that form the basis of a defined phenotype from consanguineous organisms, comprising: (a) hybridizing to itself insert DNA from a first collection of cDNA libraries derived from organisms that have the defined phenotype; (b) contacting the DNA hybridized in step (a) with a first immobilized mismatch binding protein; (c) separating the unbound DNA from the bound DNA contacted in step (b); (d) labeling the unbound DNA separated in step (c) with a label capable of binding a partner molecule immobilized on a substrate; (e) hybridizing the DNA labeled in step (d) with insert DNA from a second collection of cDNA libraries derived from organisms that do not have the defined phenotype; (f) contacting the DNA hybridized in step (e) with a second immobilized mismatch binding protein; (g) separating the unbound DNA from the bound DNA contacted in step (f); (h) contacting the unbound DNA separated in step (g) with the partner molecule, immobilized on the substrate, that is capable of binding the label; and (i) separating the unbound DNA from the bound DNA contacted in step (h); wherein said unbound DNA separated in step (i) encodes the one or several genes that form the basis of the defined phenotype.
A method for identifying one or more genes that form the basis of a defined phenotype exhibited by a cell or individual from which a first cDNA library is derived, but not exhibited by a cell or individual from which a second cDNA library is derived, comprising: (a) amplifying the insert DNA from the first cDNA library by polymerase chain reaction; (b) amplifying the insert DNA from the second cDNA library by polymerase chain reaction; (c) hybridizing the DNA amplified in step (a) with itself; (d) hybridizing the DNA amplified in step (b) with itself; (e) contacting the DNA hybridized in step (c) with a first immobilized MutS; (f) contacting the DNA hybridized in step (d) with a second immobilized MutS; (g) separating the unbound DNA from the bound DNA contacted in step (e); (h) separating the unbound DNA from the bound DNA contacted in step (f); (i) amplifying the unbound DNA separated in step (g) by polymerase chain reaction using unlabeled primers; (j) amplifying and labeling the unbound DNA separated in step (h) by polymerase chain reaction using 5'-biotinylated primers; (k) hybridizing the DNA amplified and labeled in step (j) with the DNA amplified in step (i); (l) contacting the DNA hybridized in step (k) with a third immobilized MutS; (m) separating the unbound DNA from the bound DNA contacted in step (l); (n) contacting the unbound DNA separated in step (m) with immobilized streptavidin; and (o) separating the unbound DNA from the bound DNA contacted in step (n); wherein said unbound DNA separated in step (o) encodes the one or more identified genes that form the basis of the defined phenotype.
A method for identifying one or more genes that form the basis of a disease phenotype from healthy and affected individuals related by consanguinity, comprising: (a) amplifying, by polymerase chain reaction, the insert DNA from a first collection of cDNA libraries derived from affected individuals; (b) hybridizing the DNA amplified in step (a) with itself; (c) contacting the DNA hybridized in step (b) with a first immobilized MutS; (d) separating the unbound DNA from the bound DNA contacted in step (c); (e) amplifying and labeling the unbound DNA separated in step (d) by polymerase chain reaction using 5'-biotinylated primers; (f) amplifying, by polymerase chain reaction, the insert DNA from a second collection of cDNA libraries derived from healthy individuals; (g) hybridizing the DNA amplified and labeled in step (e) with the DNA amplified in step (f); (h) contacting the DNA hybridized in step (g) with a second immobilized MutS; (i) separating the unbound DNA from the bound DNA contacted in step (h); (j) contacting the unbound DNA separated in step (i) with immobilized streptavidin; and (k) separating the unbound DNA from the bound DNA contacted in step (j); wherein said unbound DNA separated in step (k) encodes the one or more identified genes that form the basis of the disease phenotype.
46. A method for identifying one or more alleles that form the basis of a defined phenotype exhibited by a cell or individual from which a first cDNA library is derived, but not exhibited by a cell or individual from which a second cDNA library is derived, comprising: (a) hybridizing the insert DNA of the first cDNA library with itself; (b) hybridizing the insert DNA of the second cDNA library with itself; (c) contacting the DNA hybridized in step (a) with a first immobilized mismatch binding protein; (d) contacting the DNA hybridized in step (b) with a second immobilized mismatch binding protein; (e) separating the unbound DNA from the bound DNA contacted in step (c); (f) separating the unbound DNA from the bound DNA contacted in step (d); (g) labeling the unbound DNA separated in step (f) with a label capable of binding a partner molecule immobilized on a substrate; (h) hybridizing the DNA labeled in step (g) with the unbound DNA separated in step (e); (i) contacting the DNA hybridized in step (h) with a third immobilized mismatch binding protein; (j) separating the unbound DNA from the bound DNA contacted in step (i); (k) releasing the bound DNA separated in step (j) from the third immobilized mismatch binding protein; (l) contacting the DNA released in step (k) with the partner molecule immobilized on the substrate capable of binding the label; (m) denaturing the DNA contacted in step (l); and (n) separating the unbound DNA from the bound DNA denatured in step (m); wherein said unbound DNA separated in step (n) encodes the one or more alleles that form the basis of the defined phenotype.
A method for identifying one or more alleles that form the basis of a defined phenotype from organisms related by consanguinity, comprising: (a) hybridizing with itself insert DNA from a first collection of cDNA libraries derived from organisms that have the defined phenotype; (b) contacting the DNA hybridized in step (a) with a first immobilized mismatch binding protein; (c) separating the unbound DNA from the bound DNA contacted in step (b); (d) labeling the unbound DNA separated in step (c) with a label capable of binding a partner molecule immobilized on a substrate; (e) hybridizing the DNA labeled in step (d) with insert DNA from a second collection of cDNA libraries derived from organisms that do not have the defined phenotype; (f) contacting the DNA hybridized in step (e) with a second immobilized mismatch binding protein; (g) separating the unbound DNA from the bound DNA contacted in step (f); (h) releasing the bound DNA separated in step (g) from the second immobilized mismatch binding protein; (i) contacting the DNA released in step (h) with the partner molecule immobilized on the substrate capable of binding the label; (j) denaturing the DNA contacted in step (i); and (k) separating the bound DNA from the unbound DNA denatured in step (j); wherein said bound DNA separated in step (k) encodes the one or more identified alleles that form the basis of the defined phenotype.
A method for identifying one or more alleles that form the basis of a defined phenotype exhibited by a cell or individual from which a first cDNA library is derived, but not exhibited by a cell or individual from which a second cDNA library is derived, comprising: (a) amplifying the insert DNA from the first cDNA library by polymerase chain reaction; (b) amplifying the insert DNA from the second cDNA library by polymerase chain reaction; (c) hybridizing the DNA amplified in step (a) with itself; (d) hybridizing the DNA amplified in step (b) with itself; (e) contacting the DNA hybridized in step (c) with a first immobilized MutS; (f) contacting the DNA hybridized in step (d) with a second immobilized MutS; (g) separating the unbound DNA from the bound DNA contacted in step (e); (h) separating the unbound DNA from the bound DNA contacted in step (f); (i) amplifying the unbound DNA separated in step (g) by polymerase chain reaction using unlabeled primers; (j) amplifying and labeling the unbound DNA separated in step (h) by polymerase chain reaction using 5'-biotinylated primers; (k) hybridizing the DNA amplified and labeled in step (j) with the DNA amplified in step (i); (l) contacting the DNA hybridized in step (k) with a third immobilized MutS; (m) separating the unbound DNA from the bound DNA contacted in step (l); (n) releasing the bound DNA separated in step (m) from the third immobilized MutS; (o) contacting the DNA released in step (n) with immobilized streptavidin; (p) denaturing the DNA contacted in step (o); and (q) separating the unbound DNA from the bound DNA denatured in step (p); wherein said unbound DNA separated in step (q) encodes the one or more identified alleles that form the basis of the defined phenotype.
A method for identifying one or more affected alleles that form the basis of a disease phenotype from healthy and affected individuals related by consanguinity, comprising: (a) amplifying, by polymerase chain reaction, the insert DNA from a first collection of cDNA libraries derived from affected individuals; (b) hybridizing the DNA amplified in step (a) with itself; (c) contacting the DNA hybridized in step (b) with a first immobilized MutS; (d) separating the unbound DNA from the bound DNA contacted in step (c); (e) amplifying and labeling the unbound DNA separated in step (d) by polymerase chain reaction using 5'-biotinylated primers; (f) amplifying, by polymerase chain reaction, the insert DNA from a second collection of cDNA libraries derived from healthy individuals; (g) hybridizing the DNA amplified and labeled in step (e) with the DNA amplified in step (f); (h) contacting the DNA hybridized in step (g) with a second immobilized MutS; (i) separating the unbound DNA from the bound DNA contacted in step (h); (j) releasing the bound DNA separated in step (i) from the second immobilized MutS; (k) contacting the DNA released in step (j) with immobilized streptavidin; (l) denaturing the DNA contacted in step (k); and (m) separating the bound DNA from the unbound DNA denatured in step (l); wherein said bound DNA separated in step (m) encodes the one or more identified affected alleles that form the basis of the disease phenotype.
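The selection logic recited in the claims above can be illustrated with a toy model. The following Python sketch is not part of the claimed method and makes simplifying assumptions that the claims do not state: each library is reduced to one sequence per gene, the labeled DNA is assumed to be in excess so that every shared sequence pairs with a labeled strand, and the initial self-hybridization clean-up steps are not modeled. The function names, gene names and sequences are hypothetical.

```python
def select_genes(first_library, second_library):
    """Toy model of the gene-identification claims.

    DNA from the first library that finds no labeled partner from the
    second library forms no mismatch and carries no label, so it stays
    unbound on both the mismatch-binding column and the label-partner
    substrate. (Assumption: labeled DNA is in excess.)
    """
    return {gene for gene in first_library if gene not in second_library}


def select_alleles(first_library, second_library):
    """Toy model of the allele-identification claims.

    A gene present in both libraries with differing sequences forms a
    labeled/unlabeled heteroduplex, which the mismatch binding protein
    retains. After release, capture of the labeled strand and
    denaturation, the unlabeled first-library strand is recovered.
    """
    return {
        gene: sequence
        for gene, sequence in first_library.items()
        if gene in second_library and second_library[gene] != sequence
    }


# Hypothetical libraries: gene name -> representative insert sequence.
phenotype_library = {"geneA": "ACGT", "geneB": "TTGA", "geneC": "GGCC"}
reference_library = {"geneA": "ACGT", "geneB": "TTGC"}

print(select_genes(phenotype_library, reference_library))
print(select_alleles(phenotype_library, reference_library))
```

In this sketch, geneC is selected as a candidate gene because it has no counterpart in the reference library, and geneB is selected as a candidate allele because its sequence differs between the two libraries. In the consanguinity claims the label is placed on the phenotype-derived DNA instead, so the strand of interest is the bound one rather than the unbound one.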
MXPA/A/2000/006875A 1998-01-15 2000-07-13 Multiplex vgid MXPA00006875A (en)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
US09007905 1998-01-15

Publications (1)

Publication Number Publication Date
MXPA00006875A (en) 2002-06-05
