CA2314992A1 - Nucleotide polymorphisms in soybean - Google Patents
- Publication number
- CA2314992A1
- Authority
- CA
- Canada
- Prior art keywords
- nucleic acid
- loci
- locus
- group
- probe
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Abandoned
Classifications
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12Q—MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
- C12Q1/00—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
- C12Q1/68—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
- C12Q1/6876—Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes
- C12Q1/6888—Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for detection or identification of organisms
- C12Q1/6895—Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for detection or identification of organisms for plants, fungi or algae
-
- C—CHEMISTRY; METALLURGY
- C07—ORGANIC CHEMISTRY
- C07K—PEPTIDES
- C07K14/00—Peptides having more than 20 amino acids; Gastrins; Somatostatins; Melanotropins; Derivatives thereof
- C07K14/415—Peptides having more than 20 amino acids; Gastrins; Somatostatins; Melanotropins; Derivatives thereof from plants
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12Q—MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
- C12Q2600/00—Oligonucleotides characterized by their use
- C12Q2600/156—Polymorphic or mutational markers
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12Q—MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
- C12Q2600/00—Oligonucleotides characterized by their use
- C12Q2600/172—Haplotypes
Landscapes
- Chemical & Material Sciences (AREA)
- Life Sciences & Earth Sciences (AREA)
- Organic Chemistry (AREA)
- Proteomics, Peptides & Aminoacids (AREA)
- Health & Medical Sciences (AREA)
- Engineering & Computer Science (AREA)
- Analytical Chemistry (AREA)
- Genetics & Genomics (AREA)
- General Health & Medical Sciences (AREA)
- Biophysics (AREA)
- Zoology (AREA)
- Biochemistry (AREA)
- Botany (AREA)
- Molecular Biology (AREA)
- Biotechnology (AREA)
- Wood Science & Technology (AREA)
- Microbiology (AREA)
- Immunology (AREA)
- Bioinformatics & Cheminformatics (AREA)
- General Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Mycology (AREA)
- Gastroenterology & Hepatology (AREA)
- Medicinal Chemistry (AREA)
- Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)
- Investigating Or Analysing Biological Materials (AREA)
Abstract
Sequence polymorphisms at 63 loci throughout the soybean genome are described. These polymorphisms are used to fingerprint soybean varieties, to map genes and QTL in selective breeding experiments, and to identify flanking nucleic acids that map near genes and QTL.
Description
NUCLEOTIDE POLYMORPHISMS IN SOYBEAN
FIELD OF THE INVENTION
The invention is in the field of agricultural technology, particularly marker assisted selection of soybean.
CROSS-REFERENCE TO RELATED APPLICATIONS
This application is a continuation-in-part of USSN 60/068,185 "NUCLEOTIDE POLYMORPHISMS IN SOYBEAN" filed December 19, 1997 by Jessen et al., which is incorporated herein by reference.
BACKGROUND OF THE INVENTION
Genetic mutations are antecedent to evolution and genetic diversity. Along with mutations that change the expression of genes, and consequently the function and structure of organisms, comparative sequencing within a species reveals a plethora of neutral mutations in non-coding regions of genomic DNA. Whether or not mutations affect organisms, they constitute diversity in the nucleotide sequence and, if known, can be exploited as genetic markers on chromosomes.
To be genetically stable, a mutation in a base in one DNA strand typically requires a complementary base change in the opposite strand. Thus, the polymorphism, or change, between a wildtype and a mutant includes both strands of the DNA molecule.
If a change allows the organism to be viable and fecund, any base can substitute for any other base at one or more nucleotide positions in a DNA sequence, or any length of DNA bases can be inserted or deleted. Mutations that are selectively neutral or provide an advantage in a particular environment may then proliferate within a population.
Genetic markers represent (mark the location of) specific loci in the genome of a species or closely related species, and sampling of different genotypes at these marker loci reveals genetic variation. The genetic variation at marker loci can then be described and applied to genetic studies, commercial breeding, diagnostics, cladistic analysis of variance, or genotyping of samples.
Genetic markers have the greatest utility when they are highly heritable, multi-allelic, and numerous. Most genetic markers are highly heritable because their alleles are determined by the nucleotide sequence of DNA, which is highly conserved from one generation to the next, and the detection of their alleles is unaffected by the natural environment. Markers have multiple alleles because, in the evolutionary process, rare, genetically-stable mutations in DNA sequences defining marker loci arose and were disseminated through the generations along with other existing alleles. The highly conserved nature of DNA combined with the rare occurrence of stable mutations allows genetic markers to be both predictable and discerning of different genotypes. The repertoire of genetic-marker technologies today allows multiple technologies to be used simultaneously in the same project. The invention of each new genetic-marker technology and each new DNA polymorphism adds additional utility to genetic markers.
Many genetic-marker technologies exist, each with its own particular basis for detecting polymorphisms in DNA sequence. They include restriction-fragment-length polymorphism (RFLP) (Botstein et al. (1980) Am J Hum Genet 32:314-331); single strand conformation polymorphism (SSCP) (Fischer et al. (1983) Proc Natl Acad Sci USA 80:1579-1583; Orita et al. (1989) Genomics 5:874-879); amplified fragment-length polymorphism (AFLP) (Vos et al. (1995) Nucleic Acids Res 23:4407-4414); microsatellite or simple-sequence repeat (SSR) (Weber JL and May PE (1989) Am J Hum Genet 44:388-396); random-amplified polymorphic DNA (RAPD) (Williams et al. (1990) Nucleic Acids Res 18:6531-6535); sequence tagged site (STS) (Olson et al. (1989) Science 245:1434-1435); genetic-bit analysis (GBA) (Nikiforov et al. (1994) Nucleic Acids Res 22:4167-4175); allele-specific polymerase chain reaction (ASPCR) (Gibbs et al. (1989) Nucleic Acids Res 17:2437-2448; Newton et al. (1989) Nucleic Acids Res 17:2503-2516); nick-translation PCR (e.g., TaqMan™) (Lee et al. (1993) Nucleic Acids Res 21:3761-3766); and allele-specific hybridization (ASH) (Wallace et al. (1979) Nucleic Acids Res 6:3543-3557; Sheldon et al. (1993) Clinical Chemistry 39(4):718-719), among others.
The development of polymorphic genetic markers has made it possible for quantitative and molecular geneticists to investigate what Edwards et al., Genetics 115:113 (1987), referred to as "quantitative trait loci" (QTL), as well as their numbers, magnitudes and distributions. QTL include genes that control, to some degree, numerically representable phenotypic traits (disease resistance, crop yield, resistance to environmental extremes, etc.) that are distributed within a family of individuals as well as within a population of families of individuals. An experimental paradigm has been developed to identify and analyze QTL. This paradigm involves crossing two inbred lines, genotyping multiple marker loci, and evaluating one to several quantitative phenotypic traits among the progeny of the cross. QTL are then identified, and ultimately selected for, based on significant statistical associations between the genotypic values determined by genetic marker technology and the phenotypic variability among the segregating progeny.
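The marker-trait association step of this paradigm can be illustrated with a minimal single-marker test. This is a sketch under stated assumptions, not the analysis used in this disclosure: it assumes homozygous progeny (for example, recombinant inbred lines) scored as parental class "A" or "B" at one marker, and it uses Welch's t statistic as one common choice of test. Interval mapping and other QTL methods model the data differently. The function name and the data are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def single_marker_test(genotypes, phenotypes):
    """Compare mean trait values between the two parental genotype classes at one marker.

    Hypothetical helper. Assumes homozygous progeny scored "A" or "B" at the marker;
    `phenotypes` holds each individual's trait value in the same order.
    Returns (mean_A, mean_B, welch_t).
    """
    class_a = [p for g, p in zip(genotypes, phenotypes) if g == "A"]
    class_b = [p for g, p in zip(genotypes, phenotypes) if g == "B"]
    if len(class_a) < 2 or len(class_b) < 2:
        raise ValueError("need at least two individuals in each genotype class")
    # Welch's t statistic: difference in means over its unpooled standard error.
    se = sqrt(variance(class_a) / len(class_a) + variance(class_b) / len(class_b))
    return mean(class_a), mean(class_b), (mean(class_a) - mean(class_b)) / se


# Invented scores for six progeny at one marker locus.
m_a, m_b, t = single_marker_test(["A", "A", "A", "B", "B", "B"], [10, 11, 12, 5, 6, 7])
print(m_a, m_b, round(t, 2))
```

A large |t| suggests the marker is linked to a QTL affecting the trait; when many markers are tested, the significance threshold has to be adjusted accordingly. If both classes have zero variance the standard error is zero and the division fails.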
Unfortunately, complete sets of genetic markers are not available for a variety of important crops, making it difficult to quickly assess the genotype of any particular individual. For example, although soybeans are a major cash crop which provide most of the world's protein and vegetable oils, complete sets of genetic markers which span the soybean genome are not available. Accordingly, there exists a need to develop genetic markers for genotyping, marker assisted selection, positional cloning of nucleic acids and the like, e.g., in soybean. This invention provides these and many other features.
SUMMARY OF THE INVENTION
New sequence polymorphisms at 63 different loci are described.
Identification of these alleles provides compositions and methods for rapidly determining the complete genotype of a soybean plant. This ability to determine, accurately and quickly, the genotype of a soybean plant provides for improved methods of marker assisted selection in plant breeding and in analysis of transgenic soybean cells and plants.
Example technologies which may be used to detect the loci include allele-specific hybridization (ASH), the polymerase chain reaction (PCR), random-amplified polymorphic DNA (RAPD), restriction-fragment-length polymorphism (RFLP), single strand conformation polymorphism (SSCP), allele-specific polymerase chain reaction (ASPCR), genetic-bit analysis (GBA), nick-translation PCR (TaqMan™), hybridization to solid phase arrays (e.g., very large scale immobilized polymer arrays (VLSIPS arrays)), and the like.
In one embodiment, methods of detecting one or more genetic nucleotide polymorphisms in a biological sample from a soybean plant are provided by hybridizing a probe nucleic acid to one of the loci described herein. For example, a biological sample derived from a soybean plant is provided, and a probe nucleic acid is hybridized to a target nucleic acid including a nucleotide polymorphism from the locus.
Preferred loci include pA060A, pA077A, pA086A, pA169A, pA280A, pA378A, pA505A, pA519A, pA588A, pA947B, pB032A, pB032B, pB039A, pBLT24A, pBLT65A, php02265A, php02301A, php02361A, php02370C, php02387A, php02388A, php02393A, php02396A, php02636A, php03522A, php05219A, php05233A, php05278A, php05342A, php07659A, php08584A, php12105A, php02340B, php05264A, php10355B, pK069A, pK079A, pK401A, pK418A, pK644B, pL058A, pL183A, pR045A, pR153A, pT005A, pT155A, pA059A, pA064A, pBLT15A, pA593A, pA882A, pB320E, SOYBPSP, php02329A, php02371A, pA343A, pA748B, php05290A, pA858A, php02376A, pG17.3, pB132A, and php10078A. Particularly preferred "php" loci include php02265A, php02301A, php02361A, php02370C, php02387A, php02388A, php02393A, php02396A, php02636A, php03522A, php05219A, php05233A, php05278A, php05342A, php07659A, php08584A, php12105A, php02340B, php05264A, php10355B, php02329A, php02371A, php05290A, php02376A, and php10078A. In certain embodiments where more than one locus is detected, at least one of the detected loci will typically be a locus with a "php" designation. One newly discovered advantage of all of the loci noted above is that probes which specifically hybridize to a selected locus do not specifically hybridize to additional loci in the soybean genome, because each of the loci is unique in the soybean genome. In preferred aspects, the loci and the polymorphic nucleotides they include are in linkage disequilibrium with a Quantitative Trait Locus (QTL) such as resistance to soybean cyst nematode, brown stem rot, phytophthora rot or the like. Accordingly, the presence or absence of the detected locus corresponds to the presence or absence of a particular QTL. A variety of probe nucleic acids which hybridize to the loci are provided by the invention. The probes include the amplicons, PCR primers, and the like, described herein, which are used to identify and detect the loci.
Hybridization of a probe to a locus is detected to confirm the presence of the locus and typically to determine whether a particular polymorphic nucleotide is present at the locus. This detection is performed directly or indirectly.
Direct methods of detecting hybridization include Southern analysis, northern analysis, array-dependent nucleic acid hybridization on a nucleic acid polymer array, in situ hybridization, or other methods which directly monitor the hybridization.
Indirect detection includes, e.g., detection of an amplification product which is dependent on hybridization of the probe to the target nucleic acid. For example, the polymerase chain reaction (PCR) and/or the ligase chain reaction (LCR) are used to monitor hybridization, e.g., by detecting formation of an amplicon which is synthesized only if a probe (e.g., a PCR primer) hybridizes to the target. Similarly, the probe or target is optionally amplified prior to detection. Preferred amplification methods include PCR, LCR, and cloning of the target nucleic acid.
In several embodiments, it will be desirable to detect more than one locus. For example, detection of multiple loci in a biological sample provides an overall genotype of the sample. Thus, in one embodiment, a second probe nucleic acid is hybridized to a second target nucleic acid linked to a second nucleotide polymorphism in a second locus selected from the loci noted above. Similarly, a plurality of probes (a third, fourth, fifth... nth probe) are hybridized to target nucleic acids linked to a plurality of polymorphic nucleotides (a third, fourth, fifth... nth) in the loci noted above. In one embodiment, a majority of the noted loci are detected. In another preferred embodiment, all of the loci noted above are detected, thereby providing a comprehensive genotype. Similarly, in alternate embodiments, 10%, 20%, 30%, 40%, 50%, 60%, 70%, 80%, 90%, 100%, or any intermediate percentage of the loci are detected.
The methods are applicable to detection of loci and genotyping in a variety of biological samples, including a soybean plant, a soybean plant extract, an isolated soybean plant tissue, an isolated plant tissue extract, a soybean plant cell culture, a soybean plant cell culture extract, a recombinant cell comprising a nucleic acid derived from a soybean plant, a soybean plant seed, and an extract of a recombinant cell comprising a nucleic acid derived from a soybean plant.
The target nucleic acid which is detected can include the first polymorphic nucleotide, or it may be proximal to the polymorphic nucleotide. In typical embodiments, the target nucleic acid includes the polymorphic nucleotide to be detected. However, depending on how the target nucleic acid is detected, it can also be convenient to detect nucleotides proximal to the polymorphic nucleotide. For example, when LCR is used, the presence or absence of a polymorphic nucleotide is detected by amplifying nucleotide regions flanking the polymorphic nucleotide.
In one aspect, the present invention includes marker-assisted selection of soybean plants, e.g., by detecting any of the loci noted above and selecting a plant based upon the presence or absence of one or more desired polymorphic nucleotides.
In another aspect, nucleic acids corresponding to nucleotides proximal to or including the marker nucleic acids are cloned. In a particularly preferred aspect, a nucleic acid flanked by two nucleic acid loci is cloned. Typically, this cloned nucleic acid includes a coding sequence. The cloned nucleic acid is optionally transduced into cells or plants, e.g., to make transgenic plants (e.g., soybean) expressing the coding sequence.
In one aspect, nucleotide polymorphisms proximal to the selected loci are identified and mapped, e.g., by genetic mapping or nucleotide sequencing of nucleic acid regions genetically linked to the selected locus from genetically diverse strains of soybean. The identification of these additional polymorphisms provides additional marker regions which are used to identify the source of a soybean nucleic acid.
Similar to the detection methods above, nucleotide polymorphisms are also detected by separating nucleic acids having the polymorphisms by size and/or charge. For example, single-strand conformation polymorphism analysis can be performed on two or more nucleic acids on a polyacrylamide gel.
Amplification methods and compositions for detecting nucleic acids linked to loci are also provided. Typical amplification methods of the present invention include PCR, asymmetric PCR, and LCR. For example, methods are provided of amplifying a nucleic acid by hybridizing a first primer nucleic acid to a template nucleic acid and amplifying a portion of the template nucleic acid with a template-dependent polymerase enzyme or a ligase enzyme. The primer hybridizes under stringent conditions to a locus nucleic acid from one of the loci described above. Typical amplification primers are less than 100 nucleotides long, e.g., between about 10 and 50 nucleotides, typically between about 15 and 25 nucleotides, although primers as long as or longer than 100-200 nucleotides can also be used. Where the primer is a PCR primer, the primer provides a polymerase-extendible substrate and the polymerase extends the primer. In one aspect, the primer is an allele-specific primer. In typical LCR amplification methods, the first primer hybridizes adjacent to a second primer on the template nucleic acid and the first and second primers are ligated with a ligase enzyme, thereby amplifying the portion of the template hybridized to the first and second primers.
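The allele-specific primer and LCR arrangements described above can be made concrete with a short sketch. It assumes that the top-strand sequence of a locus and the index of its polymorphic nucleotide are already known; the helper names, default oligo lengths, and example template are hypothetical. It shows only one strand (a full LCR assay also uses a complementary oligo pair for the other strand) and ignores melting temperature, secondary structure, and other design constraints used in practice.

```python
def allele_specific_primer(top_strand, snp_index, allele, length=20):
    """Hypothetical helper: forward primer whose 3'-terminal base is the polymorphic nucleotide.

    The primer copies the top strand (5'->3'), so it anneals to the bottom strand; a
    polymerase extends it efficiently only when its 3' base pairs with the template.
    """
    if snp_index + 1 < length:
        raise ValueError("not enough upstream sequence for a primer of this length")
    return top_strand[snp_index - length + 1 : snp_index] + allele


def lcr_oligo_pair(top_strand, snp_index, allele, left_len=20, right_len=20):
    """Hypothetical helper: two abutting oligos whose junction falls at the polymorphism.

    Ligation is inefficient when the left oligo's 3' base (the allele) is mismatched,
    so a ligation product indicates that the template carries that allele.
    """
    left = allele_specific_primer(top_strand, snp_index, allele, left_len)
    right = top_strand[snp_index + 1 : snp_index + 1 + right_len]
    if len(right) < right_len:
        raise ValueError("not enough downstream sequence for the right oligo")
    return left, right


# Invented 21-base template with the polymorphic site at index 10.
template = "ACGTACGTAC" + "G" + "TTGCAAGGCC"
print(allele_specific_primer(template, 10, "A", length=5))
print(lcr_oligo_pair(template, 10, "G", left_len=5, right_len=4))
```

Placing the discriminating base at the 3' end follows the general ASPCR and LCR principle that mismatches at the extension or ligation point are the ones most strongly penalized.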
In PCR methods, the method includes hybridizing a second primer to the template, wherein the first and second primer hybridize to complementary strands of the template nucleic acid.
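Because the two PCR primers hybridize to complementary strands, the reverse primer's reverse complement must appear downstream of the forward primer on the top strand. The following is a minimal in-silico sketch of that relationship, assuming exact primer matches and a single occurrence of each site (consistent with the loci being unique in the genome, although the code does not verify uniqueness). The function names and the example sequence are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Sequence of the opposite strand, written 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]


def in_silico_amplicon(top_strand, forward, reverse):
    """Return the region a primer pair would amplify, or None if it would not.

    `forward` matches the top strand; `reverse` matches the bottom strand, so its
    reverse complement must appear downstream on the top strand. Exact matches only.
    """
    start = top_strand.find(forward)
    rc_start = top_strand.find(reverse_complement(reverse))
    if start == -1 or rc_start == -1 or rc_start < start:
        return None
    return top_strand[start : rc_start + len(reverse)]


template = "AAAACCCCGGGGTTTTACGT"  # invented example sequence
print(in_silico_amplicon(template, "AAAACC", "GTAAAA"))
```

The returned region spans both primer sites, which is what a gel or probe would detect as the amplicon.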
Amplification mixtures for practicing the amplification methods are also provided. For example, a PCR reaction mixture is provided having, e.g., a polymerase enzyme, deoxynucleotides, a template nucleic acid comprising a polymorphic nucleotide which hybridizes under stringent conditions to a locus as above, and primers which specifically hybridize to the template nucleic acid. Primers include the PCR primers described herein, and additional primers selected to amplify portions of the amplicons described herein. As noted, the primers are optionally allele-specific primers to facilitate quantitative PCR.
PCR amplicons are also provided, including nucleic acids having a polymorphic nucleotide. The amplicon hybridizes under stringent conditions to a locus selected from a group of loci consisting of those set forth above. Exemplary amplicons are described herein. Particularly preferred amplicons include php11138 and php11627.
A variety of additional compositions are also provided by the present invention. One class of compositions has a first recombinant nucleic acid which differentially hybridizes under allele-specific hybridization conditions to a first allele from a locus in the soybean genome selected from the above loci, where the first recombinant nucleic acid shows decreased hybridization affinity for a second allele from the selected locus. The composition optionally includes one or more additional recombinant nucleic acids (i.e., additional probes) which differentially hybridize under allele-specific hybridization conditions to a second allele from a selected locus, wherein the second nucleic acid shows decreased hybridization affinity for the first allele from the selected locus. For example, multi-color nucleic acid probe hybridization techniques such as comparative genomic hybridization (CGH) or fluorescence in situ hybridization (FISH) can be used to detect different alleles on different chromosomes.
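The differential affinity of an allele-specific probe can be roughly illustrated by counting base-pair mismatches between a probe and each allele, together with a crude melting-temperature estimate. This sketch assumes equal-length probe and target and uses the "2 + 4" rule of thumb (2 °C per A/T, 4 °C per G/C), which is only a rough guide for short oligonucleotides; real allele-specific hybridization conditions are set empirically. The helper names and sequences are invented.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def wallace_tm(oligo):
    """Rule-of-thumb melting temperature in deg C for short oligos: 2*(A+T) + 4*(G+C)."""
    at = oligo.count("A") + oligo.count("T")
    gc = oligo.count("G") + oligo.count("C")
    return 2 * at + 4 * gc


def mismatches(probe, target):
    """Count mismatched base pairs when `probe` anneals antiparallel to `target` (both 5'->3')."""
    perfect_target = reverse_complement(probe)  # the target a perfect match would have
    if len(perfect_target) != len(target):
        raise ValueError("probe and target must be the same length")
    return sum(1 for x, y in zip(perfect_target, target) if x != y)


allele_1 = "ACGTTGCAAC"  # invented target, allele 1
allele_2 = "ACGTTACAAC"  # same target with a G->A change at position 5
probe = reverse_complement(allele_1)  # probe designed against allele 1
print(probe, wallace_tm(probe), mismatches(probe, allele_1), mismatches(probe, allele_2))
```

A single central mismatch, as for `allele_2` here, is the situation in which stringent allele-specific conditions let the probe bind one allele but not the other.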
In another aspect, a composition including a recombinant nucleic acid which specifically hybridizes to a first allele-specific probe and a second allele-specific probe is provided. The recombinant nucleic acid can be a probe, target nucleic acid, chromosomal nucleic acid, recombinant nucleic acid or the like. The first and second allele-specific probes hybridize under allele-specific hybridization conditions to a first haplotype of a locus in the soybean genome noted above. The composition optionally comprises additional materials such as allele-specific probes for the detection of the nucleic acid, or the like.
Sets of nucleic acid probes are also provided, including sets having a plurality of probe nucleic acids which specifically hybridize to a plurality of target nucleic acids which hybridize under stringent conditions to a plurality of the loci noted above. The sets may be in any of a variety of physical arrangements, including arrays, containers, or the like. In a particularly preferred embodiment, the set is in kit form, i.e., having the set of nucleic acids, and optionally comprising one or more additional components such as a container, instructional materials, one or more control target nucleic acids, and recombinant cells comprising one or more target nucleic acids.
Transgenic plants are provided. In particular, a transgenic plant having a recombinant nucleic acid which hybridizes under stringent conditions to a target nucleic acid is provided. The target nucleic acid is genetically linked to (and preferably comprises) a nucleotide polymorphism from a locus selected from the group of loci noted above. In a preferred embodiment, the recombinant nucleic acid comprises a coding sequence encoded by a gene in linkage disequilibrium with a Quantitative Trait Locus (QTL). Example QTL include a QTL for resistance to soybean cyst nematode, a QTL for resistance to brown stem rot, and a QTL for resistance to phytophthora rot.
Definitions
A "polymorphism" is a change or difference between two related nucleic acids. A "nucleotide polymorphism" refers to a nucleotide which is different in one sequence when compared to a related sequence when the two nucleic acids are aligned for maximal correspondence. A "genetic nucleotide polymorphism" refers to a nucleotide which is different in one sequence when compared to a related sequence when the two nucleic acids are aligned for maximal correspondence, where the two nucleic acids are genetically related, i.e., homologous, e.g., where the nucleic acids are isolated from different strains of a soybean plant, or from different alleles of a single strain, or the like.
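The definition of a nucleotide polymorphism can be expressed as a scan over two sequences that have already been aligned for maximal correspondence. This sketch assumes the alignment itself was produced elsewhere and is supplied as two equal-length strings with "-" marking gaps; the function name and example sequences are hypothetical.

```python
def nucleotide_polymorphisms(aligned_a, aligned_b):
    """Return (column, base_a, base_b) for every alignment column where two sequences differ.

    Assumes the sequences were already aligned for maximal correspondence (equal length,
    '-' for gaps), so a column containing '-' reports an insertion/deletion.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    return [
        (column, a, b)
        for column, (a, b) in enumerate(zip(aligned_a.upper(), aligned_b.upper()))
        if a != b
    ]


# Invented alignment of the same locus from two soybean strains.
print(nucleotide_polymorphisms("ACGT-A", "ACCTTA"))
```

Column 2 is a substitution and column 4 an insertion/deletion; both count as nucleotide polymorphisms under the definition above.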
A "biological sample" is a portion of material isolated from a biological source such as a plant, isolated plant tissue, or plant cell, or a portion of material made from such a source such as a cell extract or the like.
A "probe nucleic acid" is an RNA or DNA or analogue thereof. The probe may be of any length. Typical probes include PCR primers, PCR amplicons, cloned genomic nucleic acids encoding a genetic locus of interest, and the like.
"Marker assisted selection" refers to the process of selecting a desired trait or desired traits in a plant or plants by detecting one or more nucleic acids from the plant, where the nucleic acid is associated with the desired trait.
A "locus" is a nucleic acid region where a polymorphic nucleic acid resides.
A "genetic marker" is a region on a genomic nucleic acid mapped by a marker nucleic acid. A "marker nucleic acid" is a nucleic acid which is an indicator for the presence of a marker locus. The marker can be either a probe nucleic acid which identifies a target nucleic acid genetically linked to the locus, or a sequence hybridized by the probe, i.e., a genomic nucleic acid linked to the locus. Typically, a probe will be used to hybridize to or amplify the locus. Example markers include isolated nucleic acids from the locus, cloned nucleic acids comprising the locus, PCR primers for amplifying the locus, and the like.
Two nucleic acid sequences are "genetically linked" when the sequences are in linkage disequilibrium.
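Linkage disequilibrium is commonly quantified with the coefficient D = p_AB - p_A * p_B and its normalized form r² = D² / (p_A(1 - p_A) p_B(1 - p_B)). These standard measures are not specified in this disclosure; the sketch below assumes two biallelic loci and phased haplotype counts, and its function name and counts are hypothetical.

```python
def linkage_disequilibrium(n_AB, n_Ab, n_aB, n_ab):
    """Return (D, r_squared) for two biallelic loci from counts of the four haplotypes."""
    n = n_AB + n_Ab + n_aB + n_ab
    p_AB = n_AB / n
    p_A = (n_AB + n_Ab) / n  # frequency of allele A at locus 1
    p_B = (n_AB + n_aB) / n  # frequency of allele B at locus 2
    d = p_AB - p_A * p_B  # zero when the loci are in linkage equilibrium
    denom = p_A * (1 - p_A) * p_B * (1 - p_B)
    # r^2 is undefined if either locus is monomorphic; report 0.0 in that case.
    r_squared = d * d / denom if denom > 0 else 0.0
    return d, r_squared


print(linkage_disequilibrium(50, 0, 0, 50))   # only parental haplotypes observed
print(linkage_disequilibrium(25, 25, 25, 25))  # alleles combined at random
```

An r² near 1 means the allele at one locus predicts the allele at the other, which is what makes a marker useful as a proxy for a linked gene or QTL.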
A "vector" is a carrier composition which assists in transducing, transforming or infecting a cell with a nucleic acid, thereby causing the cell to express vector associated nucleic acids and, optionally, proteins other than those native to the cell, or in a manner not native to the cell. The term vector includes nucleic acid (ordinarily RNA or DNA) to be expressed by the cell (a "vector nucleic acid").
A vector optionally includes materials to aid in achieving entry of the nucleic acid into the cell, such as a retroviral particle, liposome, protein coating or the like.
A "promoter" is an array of nucleic acid control sequences which directs transcription of a nucleic acid. As used herein, a promoter includes necessary nucleic acid sequences near the start site of transcription, such as, in the case of a polymerase II type promoter, a TATA element. A promoter also optionally includes distal enhancer or repressor elements which can be located as much as several thousand base pairs from the start site of transcription. A "constitutive" promoter is a promoter which is active in a selected organism under most environmental and developmental conditions. An "inducible" promoter is a promoter which is under environmental or developmental regulation in a selected organism.
The terms "isolated" or "biologically pure" refer to material which is substantially or essentially free from components which normally accompany it as found in its native state.
WO 99/31964 PCT/US98/26935
DETAILED DISCUSSION OF THE INVENTION
The present invention resides in part in the identification of new marker loci for soybean plants. The loci include polymorphic nucleotides which vary depending on the particular strain of soybean considered. These loci are used in plant breeding projects, e.g., for marker-assisted selection, for positional cloning of linked nucleic acid regions of soybean chromosomal nucleic acids, and the like. Because the sequences of the loci and surrounding regions are provided, it is possible to easily select appropriate probes for detection of particular polymorphic nucleotides, e.g., by allele-specific hybridization, PCR amplification, or the like. The polymorphic nucleotides are detected directly, or by detecting nucleic acids in linkage disequilibrium with the loci.
The described loci which are prefixed by "php" were previously completely unknown, with no information being publicly available regarding the loci.
Some of the other loci (those not prefaced by "php") were previously identified by binding to RFLP probes; however, no sequence information regarding the loci was available and it was, therefore, not possible to design probes which hybridize specifically to polymorphic nucleotides. The locus SOYBPSP was previously sequenced in an intron of a gene, but no polymorphism was previously identified at the locus. Clones comprising loci not prefaced by "php", i.e., comprising publicly available RFLP probes, are publicly available from Biogenetic Services, Inc. (Brookings SD) and PE AgGen, Inc. (formerly Linkage Genetics) (Salt Lake City, UT).
Uses for Nucleotide Polymorphisms
The nucleotide polymorphisms described here are used, e.g., for DNA-fingerprinting soybean varieties, genetic-linkage mapping of the soybean genome, marker association with specific genes or quantitative-trait loci (QTL) affecting phenotypic traits, marker-assisted selection for preferred genotypes in soybean breeding, positional cloning of genes from the soybean genome and other purposes that will be apparent upon complete review of this disclosure.
DNA Fingerprinting
DNA fingerprinting is the application of multiple genetic markers to DNA extracted from an individual or pooled group of related individuals, such as an inbred line or variety, so that the cumulative marker allele profile provides a description of the variety's overall genotype. Comparisons of these marker allele profiles can be made among samples of different varieties, or among different samples of the same variety, and estimates can be made regarding the genetic relationships among these samples. These estimates can be obtained without knowing the pedigrees of the fingerprinted varieties.
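One simple way to turn two marker allele profiles into a relationship estimate is the proportion of commonly scored loci at which they carry the same allele (a simple matching coefficient). This is an illustrative choice, not a statistic prescribed by this disclosure. The sketch assumes inbred, essentially homozygous varieties with one allele call per locus; the locus names are taken from this document, while the allele calls and function name are invented.

```python
def profile_similarity(profile_a, profile_b):
    """Fraction of loci scored in both fingerprints at which the two share an allele.

    Profiles map locus name -> allele call (None = not scored). Assumes inbred,
    essentially homozygous varieties, so each locus carries a single allele call.
    """
    common = [
        locus
        for locus in profile_a.keys() & profile_b.keys()
        if profile_a[locus] is not None and profile_b[locus] is not None
    ]
    if not common:
        raise ValueError("no loci scored in both profiles")
    matches = sum(profile_a[locus] == profile_b[locus] for locus in common)
    return matches / len(common)


# Locus names come from this disclosure; the allele calls are invented.
variety_x = {"pA060A": "1", "pB032B": "2", "php02301A": "1", "pK079A": None}
variety_y = {"pA060A": "1", "pB032B": "1", "php02301A": "1", "pK079A": "2"}
print(profile_similarity(variety_x, variety_y))
```

Heterozygous or pooled samples would need an allele-sharing measure instead; with many well-distributed loci, a similarity near 1 indicates closely related or identical varieties.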
DNA fingerprints are used to determine whether a soybean variety is described and used within guidelines established, e.g., by Plant Variety Protection and patent laws. Seed of one variety may be sold by two different vendors using different variety names, or a variety may have been genetically bred from another variety by repeated cycles of backcrossing and selection to the extent that the new variety was essentially derived from the original variety. In these situations, questions about ownership may need to be answered. DNA-fingerprint profiles can be obtained from each variety or seed source and compared. If done properly, the data will show with a high probability whether or not the different samples are genetically alike.
The polymorphisms at the 63 polymorphic loci described herein provide a basis for quickly and reliably comparing DNA-fingerprint profiles in soybean. These 63 loci are well distributed around the soybean genome and provide an adequate number of loci to make reasonable conclusions regarding variety or seed-source identities and relationships.
DNA fingerprinting is used by soybean breeders to identify diverse breeding parents, to create novel recombinant populations, to better understand the structure of the soybean genome, and to better understand the history of pedigree breeding. The 63 example loci herein provide sufficient polymorphisms to estimate genetic relationships among soybean varieties.
DNA fingerprinting is used by soybean seed companies to estimate genetic purity of a seedlot for quality control and labeling of their product, and to identify the variety source of any contamination found among fingerprinted samples. These 63 loci provide sufficient polymorphisms to estimate genetic purity within a seedlot.
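A seedlot purity estimate of this kind can be sketched by fingerprinting individual seeds and counting those that disagree with the variety's reference profile. The rule used here (a seed is off-type if any scored locus differs from the reference, and unscored loci are ignored) is an assumption for illustration, as are the function name, reference profile, and seed data; a real purity test would also account for sample size and genotyping error.

```python
def seedlot_purity(reference, seed_profiles):
    """Return (purity, off_type_indices) for single-seed fingerprints versus a variety profile.

    A seed is counted as an off-type if any locus it was scored at carries an allele
    different from the reference; unscored loci (missing or None) are ignored.
    """
    if not seed_profiles:
        raise ValueError("no seeds were fingerprinted")
    off_types = [
        i
        for i, seed in enumerate(seed_profiles)
        if any(seed.get(locus) not in (None, allele) for locus, allele in reference.items())
    ]
    return 1 - len(off_types) / len(seed_profiles), off_types


reference = {"pA060A": "1", "pT155A": "2"}  # hypothetical variety profile
seeds = [
    {"pA060A": "1", "pT155A": "2"},
    {"pA060A": "2", "pT155A": "2"},  # differs at pA060A
    {"pA060A": "1"},                 # pT155A not scored
    {"pA060A": "1", "pT155A": "2"},
]
print(seedlot_purity(reference, seeds))
```

The off-type indices can then be compared against fingerprints of other varieties to identify the likely source of the contamination.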
Genetic Mapping of the Soybean Genome
Genetic mapping is done by finding polymorphic markers that are genetically linked to each other (in linkage groups) or linked to genes or QTL affecting phenotypic traits of interest within a segregating population. The alignment of markers into linkage groups is useful as a reference for future use of the markers and for accurately positioning genes or QTL relative to the markers. The nucleotide polymorphisms described here for 63 exemplary loci provide a means to utilize these loci in genetic mapping studies in soybean. Many of these loci have multiple sub-loci and haplotypes across the sub-loci. Each haplotype provides a different allele composition within a locus, thereby expanding the utility of these marker loci to more soybean mapping studies than would be possible with only two alleles per locus. Many of these 63 loci were intentionally selected for polymorphism development because they were widely dispersed among soybean genetic linkage groups and would therefore collectively maximize their utility for mapping the soybean genome. Because each of these 63 loci was designed to be a discrete individual locus, these loci cannot be confused with duplicate loci having similar sequence elsewhere in the genome. They therefore are excellent reference loci on any soybean genetic map and can be used to reliably align the same linkage groups from different maps.
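Placing two markers on a linkage map starts from the recombination fraction between them, which a mapping function converts into a map distance. The sketch below assumes backcross-type data (each progeny scored "P1" or "P2" at each marker, so each individual represents one informative meiosis) and uses Haldane's mapping function, d = -50 ln(1 - 2r) centimorgans; other populations and mapping functions (for example, Kosambi's) require different formulas. The function names and data are hypothetical.

```python
from math import log


def recombination_fraction(marker_1, marker_2):
    """Proportion of progeny whose calls at two markers come from different parents.

    Assumes backcross-type data in which each progeny is scored "P1" or "P2" at each
    marker (None = missing), so every scored individual is one informative meiosis.
    """
    scored = [(a, b) for a, b in zip(marker_1, marker_2) if a is not None and b is not None]
    if not scored:
        raise ValueError("no progeny scored at both markers")
    return sum(a != b for a, b in scored) / len(scored)


def haldane_cm(r):
    """Haldane map distance in centimorgans: d = -50 * ln(1 - 2r), for 0 <= r < 0.5."""
    if not 0 <= r < 0.5:
        raise ValueError("recombination fraction must be in [0, 0.5)")
    return -50 * log(1 - 2 * r)


# Invented calls for ten backcross progeny; the fifth individual is a recombinant.
m1 = ["P1"] * 5 + ["P2"] * 5
m2 = ["P1"] * 4 + ["P2"] * 6
r = recombination_fraction(m1, m2)
print(r, round(haldane_cm(r), 2))
```

Multi-allelic haplotypes at a locus help here because more parental combinations can be told apart, so more crosses yield informative progeny.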
Many of these marker loci were selected to develop additional nucleotide polymorphisms because they were found to be genetically linked to important QTL for disease resistance. The loci php05219A, php07659A, php10355B, and pK069A were found to cluster around a resistance QTL on group G; the loci pT155A, pBLT24A, and pBLT65A were clustered around a resistance QTL on group A; and the locus php02301A was near a resistance QTL on group M. These three QTL provide resistance to soybean cyst nematode (Heterodera glycines Ichinohe) (see also Webb et al. (1995) Theor Appl Genet 91:574-581). The loci php02636A on group C, php08584A on group S, and pK079A on group L26 were all linked to additional QTL for resistance to soybean cyst nematode. The locus pB032B on group J was near Rbs3 for resistance to brown stem rot. The loci pK418A and pA280A on group N were near Rps1, the locus pR045A on group F was linked to Rps3, the loci pA378A and pL183A on group G were near Rps4, and the locus pT005A on group G was near Rps5, all providing resistance to phytophthora rot.
Marker-Assisted Selection in Soybean Improvement
After genes or a QTL and a marker or markers are mapped together and found to be in linkage disequilibrium, it is possible to use those markers to select for the desired alleles of those genes or QTL, a process called marker-assisted selection (MAS).
In brief, a nucleic acid corresponding to the marker nucleic acid is detected in a biological sample from a plant to be selected. This detection can take the form of hybridization of a probe nucleic acid to a marker, e.g., using allele-specific hybridization, Southern analysis, northern analysis, in situ hybridization, hybridization of primers followed by PCR amplification of a region of the marker, or the like. A variety of procedures for detecting markers are described herein. After the presence (or absence) of a particular marker in the biological sample is verified, the plant is selected, i.e., used to make progeny plants by selective breeding.
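The selection step itself can be sketched minimally. The sketch assumes marker calls are already available as the set of alleles detected per marker in each plant, for example from co-dominant ASH probes. Plant names, allele labels, and the `require_homozygous` option are hypothetical. The marker names come from the loci listed above and are used purely as labels.

```python
from typing import Dict, List, Set

# Hypothetical data model: plant -> marker -> set of alleles detected in that plant
# ({"R"} = homozygous, {"R", "S"} = heterozygous when co-dominant probes are used).
Calls = Dict[str, Dict[str, Set[str]]]


def _carries(detected: Set[str], allele: str, require_homozygous: bool) -> bool:
    # Homozygous selection demands that only the desired allele was detected.
    return detected == {allele} if require_homozygous else allele in detected


def select_plants(calls: Calls, targets: Dict[str, str],
                  require_homozygous: bool = False) -> List[str]:
    """Return (sorted) plants that carry the desired allele at every target marker.

    A marker missing from a plant's calls is treated as not carrying the allele.
    """
    return sorted(
        plant for plant, markers in calls.items()
        if all(_carries(markers.get(marker, set()), allele, require_homozygous)
               for marker, allele in targets.items())
    )
```

Because all target markers are checked from the same per-plant calls, several resistance loci can be screened together, which matches the single-DNA-sample advantage described below.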
Nucleotide polymorphisms were developed at markers near numerous resistance loci in soybean that are effective against soybean cyst nematode, Phytophthora sojae (phytophthora rot), and Phialophora gregata (brown stem rot). These are among the most damaging pathogens to soybeans in North America.
Soybean breeders need to combine disease resistance loci with genes for high yield and other desirable traits to develop improved soybean varieties.
Disease screening for large numbers of samples can be expensive, time consuming, and unreliable. Use of the nucleotide polymorphisms described here and genetically-linked nucleotides as genetic markers for disease resistance loci is an effective method of selecting resistant varieties in breeding programs. When a population is segregating for multiple loci affecting multiple diseases, the efficiency of MAS compared to phenotypic screening becomes even greater because all the loci can be processed in the lab together from a single sample of DNA. Another advantage over field evaluations for disease reaction is that MAS can be done at any time of year regardless of the growing season.
Moreover, environmental effects are irrelevant to marker-assisted selection.
Another use of MAS in plant and animal breeding is to assist the recovery of the recurrent parent genotype by backcross breeding. Backcross breeding is the process of crossing a progeny back to one of its parents. Backcrossing is usually done for the purpose of introgressing one or a few loci from a donor parent into an otherwise desirable genetic background from the recurrent parent. The more cycles of backcrossing that are done, the greater the genetic contribution of the recurrent parent to the resulting variety. This is often necessary because resistant plants may be otherwise undesirable, e.g., due to low yield, low fecundity, or the like. In contrast, strains which are the result of intensive breeding programs may have excellent yield, fecundity or the like, merely being deficient in one desired trait such as resistance to a particular pathogen.
The 63 marker loci described in the Examples below are distributed around the soybean genome and are used to select for the recurrent-parent genotype.
MAS for the recurrent-parent genotype can be combined with MAS for the disease resistance loci using these markers. Accordingly, the markers of the invention can be used to introduce disease resistance QTL into plant varieties having an otherwise desirable genetic background, both by selecting for the QTL and by selecting for the otherwise desirable background.
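Two pieces support background selection: the expected recurrent-parent share after repeated backcrossing, and an observed share that could be read from background markers. The sketch below computes both. The expectation ignores linkage drag and selection. The observed score assumes diploid, co-dominant calls at each background marker. The function names and the allele labels are illustrative assumptions.

```python
from typing import Dict, Tuple


def expected_recurrent_fraction(n_backcrosses: int) -> float:
    """Expected recurrent-parent genome share after n backcrosses to that parent.

    The F1 (n = 0) carries 1/2; each backcross halves the remaining donor share,
    so BC_n is expected to carry 1 - (1/2)**(n + 1).
    """
    if n_backcrosses < 0:
        raise ValueError("n_backcrosses must be >= 0")
    return 1.0 - 0.5 ** (n_backcrosses + 1)


def recurrent_parent_score(calls: Dict[str, Tuple[str, str]],
                           rp_alleles: Dict[str, str]) -> float:
    """Observed share of background-marker alleles that match the recurrent parent.

    calls: marker -> the two allele calls of one plant (hypothetical format).
    rp_alleles: marker -> the recurrent parent's allele at that marker.
    Markers not scored in the plant are skipped.
    """
    loci = [locus for locus in rp_alleles if locus in calls]
    if not loci:
        raise ValueError("no background marker was scored")
    matching = sum(allele == rp_alleles[locus] for locus in loci for allele in calls[locus])
    return matching / (2 * len(loci))
```

In a combined scheme, progeny would first be screened for the resistance-linked marker alleles. Those carriers would then be ranked by `recurrent_parent_score`, and the highest-scoring plants would be kept for the next backcross.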
Positional Cloning in Soybean
Positional gene cloning uses the proximity of a mapped gene and its linked markers to physically define a cloned chromosomal fragment that contains a desired gene. If two or more markers flanking the gene are physically close to each other, they may hybridize to the same DNA fragment, thereby identifying a clone on which the gene is located. If flanking markers are more distant from each other, a fragment containing the gene may be identified by constructing a contig of overlapping clones.
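The two cases just described can be illustrated with a small search. In one case a single clone hybridizes to both flanking markers. In the other, a chain of overlapping clones links the two markers. This is a simplified sketch under stated assumptions. Each clone is represented only by the set of probes (markers or clone-end probes) it hybridizes to, and two clones are treated as overlapping when they share a probe. Real contig building also has to contend with repeats and false overlaps. Clone and probe names are hypothetical.

```python
from collections import deque
from typing import Dict, List, Optional, Set


def walk_between_markers(clone_probes: Dict[str, Set[str]],
                         left: str, right: str) -> Optional[List[str]]:
    """Breadth-first search for the shortest chain of overlapping clones that starts
    at a clone hybridizing to `left` and ends at a clone hybridizing to `right`.

    Returns a one-clone list when a single clone carries both markers, or None
    if the clones do not connect the two markers.
    """
    starts = sorted(c for c, probes in clone_probes.items() if left in probes)
    queue = deque([c] for c in starts)
    seen = set(starts)
    while queue:
        path = queue.popleft()
        current = path[-1]
        if right in clone_probes[current]:
            return path
        for other in sorted(clone_probes):
            # Shared probe = assumed physical overlap between the two clones.
            if other not in seen and clone_probes[current] & clone_probes[other]:
                seen.add(other)
                queue.append(path + [other])
    return None
```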
Recently, BAC (bacterial artificial chromosome) and YAC (yeast artificial chromosome) libraries containing large fragments of soybean DNA have been constructed (Funke and Kolchinsky (1994) CRC Press, Boca Raton, FL, pp. 125-308; Marek and Shoemaker (1996) Soybean Genet Newsl 23:126-129; Danish et al. (1997) Soybean Genet Newsl 24:196-198). These libraries and advances in genetic mapping make positional cloning of soybean genes feasible using the markers identified herein.
A marker is ideally locus-specific, so that it reliably identifies a clone from the targeted chromosomal region. The soybean genome (Glycine subgenus Soja) is highly duplicated (Shoemaker et al. (1996) Genetics 144:329-338), but each nucleotide polymorphism and its PCR primers described here are specific to a single locus in the soybean genome. They therefore correctly identify soybean clones that hybridize to a probe DNA sequence corresponding to a particular target genomic location. Some of these marker loci are closely linked to agronomically important genes, such as genes for resistance to soybean cyst nematode and fungal pathogens, and are used as locus-specific reference points in positional cloning efforts for these genes.
Making and Using Markers for Detection of Polymorphic Nucleic Acids
The ability to characterize an individual by its genome is due to the inherent variability of genetic information. Although DNA sequences which code for necessary proteins are well conserved across a species, there are regions of DNA which are non-coding or which code for portions of proteins that do not have critical functions; absolute conservation of nucleic acid sequence is therefore not strongly selected for in these regions.
These variable regions are identified by genetic markers. Typically, genetic markers are bound by probes such as oligonucleotides or amplicons which bind to variable regions of the genome. In some instances, the presence or absence of binding to a genetic marker identifies individuals by their unique nucleic acid sequence. In other instances, a marker binds to nucleic acid sequences of all individuals but the individual is identified by the position in the genome bound by a marker probe.
The major causes of genetic variability are addition, deletion, or point mutations, recombination, and transposable elements within the genome of individuals in a plant population.
Point mutations are typically the result of inaccuracy in DNA replication. During meiosis in the creation of germ cells or in mitosis to create clones, DNA polymerase "switches" bases, either transitionally (i.e., a purine for a purine and a pyrimidine for a pyrimidine) or transversionally (i.e., purine to pyrimidine and vice versa). The base switch is maintained if the exonuclease function of DNA polymerase does not correct the mismatch. At germination, or at the next cell division (in clonal cells), the DNA strand with the point mutation becomes the template for a complementary strand and the base switch is incorporated into the genome. Transposable elements are sequences of DNA which have the ability to move or to jump to new locations within a genome, and several examples of transposons are known in the art.
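The transition/transversion distinction above follows directly from base class, so it can be applied mechanically when cataloguing SNP alleles. The following is a small illustrative helper with a hypothetical function name. It handles only single-base substitutions and does not address insertions, deletions, or ambiguity codes.

```python
PURINES = frozenset("AG")
PYRIMIDINES = frozenset("CT")


def classify_substitution(ref: str, alt: str) -> str:
    """Label a single-base substitution as a transition or a transversion."""
    ref, alt = ref.upper(), alt.upper()
    if ref not in PURINES | PYRIMIDINES or alt not in PURINES | PYRIMIDINES:
        raise ValueError("expected single bases A, C, G or T")
    if ref == alt:
        raise ValueError("reference and alternate bases are identical")
    # Transition: purine <-> purine or pyrimidine <-> pyrimidine.
    same_class = {ref, alt} <= PURINES or {ref, alt} <= PYRIMIDINES
    return "transition" if same_class else "transversion"
```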
Given the sequences herein, one of skill can generate probe nucleic acids for detecting markers, including probes which are PCR primers, allele-specific probes, PCR amplicons and the like, for the detection of polymorphic nucleotides at the loci disclosed herein, as well as genetically-linked sequences.
Cloning methodologies for replicating nucleic acids and sequencing methods to verify the sequence of nucleic acids are well known in the art.
Examples of appropriate cloning and sequencing techniques, and instructions sufficient to direct persons of skill through many cloning exercises, are found in Berger and Kimmel, Guide to Molecular Cloning Techniques, Methods in Enzymology volume 152, Academic Press, Inc., San Diego, CA (Berger); Sambrook et al. (1989) Molecular Cloning - A Laboratory Manual (2nd ed.) Vol. 1-3, Cold Spring Harbor Laboratory, Cold Spring Harbor Press, NY (Sambrook); and Current Protocols in Molecular Biology, F. M. Ausubel et al., eds., Current Protocols, a joint venture between Greene Publishing Associates, Inc. and John Wiley & Sons, Inc. (through and including the 1997 Supplement) (Ausubel). A catalogue of bacteria and bacteriophages useful for cloning is provided, e.g., by the ATCC, e.g., The ATCC Catalogue of Bacteria and Bacteriophage (1992) Gherna et al. (eds.), published by the ATCC. Additional basic procedures for sequencing, cloning and other aspects of molecular biology, and underlying theoretical considerations, are also found in Lewin (1995) Genes V, Oxford University Press Inc., NY (Lewin); and Watson et al. (1992) Recombinant DNA, Second Edition, Scientific American Books, NY.
Product information from manufacturers of biological reagents and experimental equipment also provides information useful in known biological methods. Such manufacturers include the Sigma Chemical Company (Saint Louis, MO); New England Biolabs (Beverly, MA); R&D Systems (Minneapolis, MN); Pharmacia LKB Biotechnology (Piscataway, NJ); CLONTECH Laboratories, Inc. (Palo Alto, CA); ChemGenes Corp. (Waltham, MA); Aldrich Chemical Company (Milwaukee, WI); Glen Research, Inc. (Sterling, VA); GIBCO BRL Life Technologies, Inc. (Gaithersburg, MD); Fluka Chemica-Biochemika Analytika (Fluka Chemie AG, Buchs, Switzerland); Invitrogen (San Diego, CA); Perkin Elmer (Foster City, CA); and Stratagene; as well as many other commercial sources known to one of skill. As described previously, genetic markers and some RFLP probes are available from Biogenetic Services, Inc. (Brookings, SD) and Linkage Genetics (Salt Lake City, UT; a subsidiary of Perkin Elmer, Branchburg, NJ).
The nucleic acid compositions of this invention, whether DNA, RNA, cDNA, genomic DNA, or analogues thereof, or a hybrid of these molecules, are isolated from biological sources or synthesized in vitro. The nucleic acids of the invention are present in transfected whole cells, in transfected cell lysates, in transgenic plants (especially soybean) or in partially purified or substantially pure form.
In vitro amplification techniques suitable for amplifying sequences for use as molecular probes, or for generating nucleic acid fragments for subsequent subcloning, are known. Examples of techniques sufficient to direct persons of skill through such in vitro amplification methods, including the polymerase chain reaction (PCR), the ligase chain reaction (LCR), Qβ-replicase amplification and other RNA polymerase mediated techniques (e.g., NASBA), are found in Berger, Sambrook, and Ausubel, as well as Mullis et al. (1987) U.S. Patent No. 4,683,202; PCR Protocols: A Guide to Methods and Applications (Innis et al., eds.) Academic Press Inc., San Diego, CA (1990) (Innis); Arnheim & Levinson (October 1, 1990) C&EN 36-47; The Journal Of NIH Research (1991) 3, 81-94; Kwoh et al. (1989) Proc. Natl. Acad. Sci. USA 86, 1173; Guatelli et al. (1990) Proc. Natl. Acad. Sci. USA 87, 1874; Lomeli et al. (1989) J. Clin. Chem. 35, 1826; Landegren et al. (1988) Science 241, 1077-1080; Van Brunt (1990) Biotechnology 8, 291-294; Wu and Wallace (1989) Genomics 4, 560; Barringer et al. (1990) Gene 89, 117; and Sooknanan and Malek (1995) Biotechnology 13:563-564. Improved methods of cloning in vitro amplified nucleic acids are described in Wallace et al., U.S. Pat. No. 5,426,039. One of skill will appreciate that essentially any RNA can be converted into a double-stranded DNA suitable for restriction digestion, PCR amplification and sequencing using reverse transcriptase and a polymerase. See Ausubel, Sambrook and Berger, all supra.
Oligonucleotides for use as probes, e.g., in in vitro amplification methods, for use as gene probes, or as inhibitor components (e.g., ribozymes) are typically synthesized chemically according to the solid phase phosphoramidite triester method described by Beaucage and Caruthers (1981) Tetrahedron Letts. 22(20):1859-1862, e.g., using an automated synthesizer, as described in Needham-VanDevanter et al. (1984) Nucleic Acids Res. 12:6159-6168. Oligonucleotides can also be custom made and ordered from a variety of commercial sources known to persons of skill.
Purification of oligonucleotides, where necessary, is typically performed either by native acrylamide gel electrophoresis or by anion-exchange HPLC as described in Pearson and Regnier (1983) J. Chrom. 255:137-149. The sequence of the synthetic oligonucleotides can be verified using the chemical degradation method of Maxam and Gilbert (1980) in Grossman and Moldave (eds.) Academic Press, New York, Methods in Enzymology 65:499-560.
Providing Large Nucleic Acid Templates
In certain applications it is advantageous to make or clone large nucleic acids which encompass multiple loci, or to detect, clone, or isolate nucleic acids linked to polymorphic nucleotides. For example, as described supra, in one embodiment positional cloning is used to isolate nucleic acids proximal to polymorphic nucleotides, e.g., at more than one locus. These nucleic acids are in linkage disequilibrium with the polymorphic nucleotides, i.e., they are genetically linked to the polymorphic nucleotides on a chromosomal nucleic acid. It will be appreciated that a nucleic acid genetically linked to a polymorphic nucleotide optionally resides up to about 50 centimorgans from the polymorphic nucleic acid, although the precise physical distance will vary depending on the cross-over frequency of the particular chromosomal region. Typical distances from a polymorphic nucleotide are in the range of 1-50 centimorgans, for example, less than 1-5, about 1-5, 1, 5, 10, 15, 20, 25, 30, 35, 40, 45 or 50 centimorgans, etc.
Many methods of making large recombinant RNA and DNA nucleic acids, including recombinant plasmids, recombinant lambda phage, cosmids, yeast artificial chromosomes (YACs), P1 artificial chromosomes, bacterial artificial chromosomes (BACs), and the like, are known. A general introduction to YACs, BACs, PACs and MACs as artificial chromosomes is given in Monaco and Larin (1994) Trends Biotechnol 12(7):280-286. Examples of appropriate cloning techniques for making large nucleic acids, and instructions sufficient to direct persons of skill through many cloning exercises, are found in Berger and Kimmel, Sambrook, and Ausubel, all supra.
In one aspect, nucleic acids hybridizing to the polymorphic nucleic acids disclosed herein (or linked to such nucleic acids) are cloned into large nucleic acids such as YACs, or are detected in YAC genomic libraries cloned from soybean. The construction of YACs and YAC libraries is known. See Berger, supra, and Burke et al. (1987) Science 236:806-812. Gridded libraries of YACs are described in Anand et al. (1989) Nucleic Acids Res. 17:3425-3433, and Anand et al. (1990) Nucleic Acids Res. 18:1951-1956. Riley (1990) Nucleic Acids Res. 18(10):2887-2890 and the references therein describe cloning of YACs and related technologies. YAC libraries containing large fragments of soybean DNA have been constructed. See Funke and Kolchinsky (1994) CRC Press, Boca Raton, FL, pp. 125-308; Marek and Shoemaker (1996) Soybean Genet Newsl 23:126-129; Danish et al. (1997) Soybean Genet Newsl 24:196-198. See also Ausubel, chapter 13, for a description of procedures for making YAC libraries.
Similarly, cosmids or other molecular vectors such as BAC and P1 constructs are also useful for isolating or cloning nucleic acids linked to polymorphic nucleic acids. Cosmid cloning is also known. See, e.g., Ausubel, chapter 1.10.11 (supplement 13) and the references therein. See also Ish-Horowitz and Burke (1981) Nucleic Acids Res. 9:2989-2998; Murray (1983) Phage Lambda and Molecular Cloning, in Lambda II (Hendrix et al., eds.) 395-432, Cold Spring Harbor Laboratory, NY; Frischauf et al. (1983) J. Mol. Biol. 170:827-842; and Dunn and Blattner (1987) Nucleic Acids Res. 15:2677-2698, and the references cited therein. Construction of BAC and P1 libraries is known; see, e.g., Ashworth et al. (1995) Anal Biochem 224(2):564-571; Wang et al. (1994) Genomics 24(3):527-534; Kim et al. (1994) Genomics 22(2):336-9; Rouquier et al. (1994) Anal Biochem 217(2):205-9; Shizuya et al. (1992) Proc Natl Acad Sci USA 89(18):8794-7; Woo et al. (1994) Nucleic Acids Res 22(23):4922-31; Wang et al. (1995) Plant (3):525-33; Cai (1995) Genomics 29(2):413-25; Schmitt et al. (1996) Genomics 33(1):9-20; Kim et al. (1996) Genomics 34(2):213-8; Kim et al. (1996) Proc Natl Acad Sci USA (13):6297-301; Pusch et al. (1996) Gene 183(1-2):29-33; and Wang et al. (1996) Genome Res 6(7):612-9.
Improved methods of in vitro amplification to amplify large nucleic acids linked to the polymorphic nucleic acids herein are summarized in Cheng et al. (1994) Nature 369:684-685 and the references therein.
In addition, any of the cloning or amplification strategies described above are useful for creating contigs of overlapping clones, thereby providing overlapping nucleic acids which show the physical relationship at the molecular level for genetically linked nucleic acids. A common example of this strategy is found in whole organism sequencing projects, in which overlapping clones are sequenced to provide the entire sequence of a chromosome. In this procedure, a library of the organism's cDNA or genomic DNA is made according to standard procedures described, e.g., in the references above. Individual clones are isolated and sequenced, and overlapping sequence information is ordered to provide the sequence of the organism. See also Tomb et al. (1997) Nature 539-547, describing the whole genome random sequencing and assembly of the complete genomic sequence of Helicobacter pylori; Fleischmann et al. (1995) Science 269:496-512, describing whole genome random sequencing and assembly of the complete Haemophilus influenzae genome; Fraser et al. (1995) Science 270:397-403, describing whole genome random sequencing and assembly of the complete Mycoplasma genitalium genome; and Bult et al. (1996) Science 273:1058-1073, describing whole genome random sequencing and assembly of the complete Methanococcus jannaschii genome. Recently, Hagiwara and Curtis (1996) Nucleic Acids Research 24(12):2460-2461 developed a "long distance sequencer" PCR protocol for generating overlapping nucleic acids from very large clones to facilitate sequencing, and methods of amplifying and tagging the overlapping nucleic acids into suitable sequencing templates.
The methods can be used in conjunction with shotgun sequencing techniques to improve the efficiency of shotgun methods typically used in whole organism sequencing projects.
As applied to the present invention, the techniques are useful for identifying and sequencing genomic nucleic acids genetically linked to the loci described.
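Ordering overlapping sequence information into a contig can be illustrated with a deliberately simplified greedy assembler. This is a toy sketch, not the method used in the cited projects. It assumes error-free reads from a single strand, merges the pair with the longest suffix-prefix overlap first, and does not handle repeats or reverse complements. `min_len` is an illustrative parameter.

```python
from typing import List


def overlap(a: str, b: str, min_len: int) -> int:
    """Length of the longest suffix of `a` equal to a prefix of `b`; 0 if below min_len."""
    for length in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:length]):
            return length
    return 0


def greedy_assemble(reads: List[str], min_len: int = 3) -> List[str]:
    """Toy greedy assembly: repeatedly merge the pair of contigs with the longest overlap."""
    if min_len < 1:
        raise ValueError("min_len must be >= 1")
    unique = list(dict.fromkeys(reads))  # drop exact duplicates, keep input order
    # Reads wholly contained in another read add no new sequence.
    contigs = [r for r in unique if not any(r != o and r in o for o in unique)]
    while len(contigs) > 1:
        best_len, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    olen = overlap(a, b, min_len)
                    if olen > best_len:
                        best_len, best_i, best_j = olen, i, j
        if best_len == 0:
            break  # no remaining overlaps: the gaps between contigs stay open
        merged = contigs[best_i] + contigs[best_j][best_len:]
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)] + [merged]
    return contigs
```

For example, the reads "ATGGCGT", "GCGTACG" and "TACGGAT" each overlap the next by four bases and collapse into the single contig "ATGGCGTACGGAT".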
Hybridization Strategies
In a preferred aspect, a labeled probe nucleic acid is specifically hybridized to a marker nucleic acid from a biological sample and the label is detected, thereby determining that the marker nucleic acid is present in the sample. For example, a marker comprising a polymorphic nucleic acid can be detected by allele-specific hybridization of a probe to the region of the marker comprising the polymorphic nucleic acid. Similarly, a marker can be detected by Southern analysis, northern analysis, in situ analysis, or the like.
Two single-stranded nucleic acids "hybridize" when they form a double-stranded duplex. The region of double-strandedness can include the full length of one or both of the single-stranded nucleic acids, or all of one single-stranded nucleic acid and a subsequence of the other, or it can include a subsequence of each nucleic acid. "Stringent hybridization conditions" in the context of nucleic acid hybridization are sequence dependent and differ under different environmental parameters. An extensive guide to the hybridization of nucleic acids is found in Tijssen (1993), id. Generally, stringent conditions are selected to be about 5°C lower than the thermal melting point (Tm) for the specific sequence at a defined ionic strength and pH. The Tm is the temperature (under defined ionic strength and pH) at which 50% of the target sequence hybridizes to a perfectly matched probe. Highly stringent conditions are selected to be equal to the Tm for a particular probe. Sometimes the term "Td" is used to define the temperature at which at least half of the probe dissociates from a perfectly matched target nucleic acid. In any case, a variety of techniques for estimating the Tm or Td are available, and are generally described in Tijssen, id. Typically, G-C base pairs in a duplex are estimated to contribute about 3°C to the Tm, while A-T base pairs are estimated to contribute about 2°C, up to a theoretical maximum of about 80-100°C. However, more sophisticated models of Tm and Td are available and appropriate, in which G-C stacking interactions, solvent effects, the desired assay temperature and the like are taken into account. In one example, PCR primers were designed to have a dissociation temperature (Td) of approximately 60°C, using the formula:
Td = (((((3 x #GC) + (2 x #AT)) x 37) - 562) / #bp) - 5
where #GC, #AT, and #bp are the number of guanine-cytosine base pairs, the number of adenine-thymine base pairs, and the total number of base pairs, respectively, involved in the annealing of the primer to the template DNA.
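The Td formula above translates directly into code. The sketch below assumes that every base of the primer anneals to the template, so that #bp equals the primer length. The function name is illustrative.

```python
def primer_td(primer: str) -> float:
    """Dissociation temperature (degrees C) from the formula given in the text:

        Td = (((3 x #GC + 2 x #AT) x 37) - 562) / #bp - 5

    Assumes the whole primer anneals to the template, so #bp = len(primer).
    """
    seq = primer.upper()
    if not seq or set(seq) - set("ACGT"):
        raise ValueError("primer must be a non-empty A/C/G/T string")
    n_gc = seq.count("G") + seq.count("C")
    n_at = seq.count("A") + seq.count("T")
    return ((3 * n_gc + 2 * n_at) * 37 - 562) / len(seq) - 5
```

For example, a 20-mer with 10 G/C and 10 A/T bases gives ((30 + 20) x 37 - 562) / 20 - 5 = 59.4°C. That is close to the approximately 60°C design target mentioned above, so candidate primers could be lengthened or trimmed until `primer_td` falls near that value.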
An example of stringent hybridization conditions for hybridization of complementary nucleic acids that have more than 100 complementary residues on a filter in a Southern or northern blot is 50% formamide with 1 mg of heparin at 42°C, with the hybridization being carried out overnight. An example of stringent wash conditions for a Southern blot of such nucleic acids is a 0.2x SSC wash at 65°C for 15 minutes (see Sambrook, supra, for a description of SSC buffer). Often the high stringency wash is preceded by a low stringency wash to remove background probe signal. An example low stringency wash is 2x SSC at 40°C for 15 minutes.
In general, a signal at least twice (2x) that observed for an unrelated probe in the particular hybridization assay indicates detection of a specific hybridization. For highly specific hybridization strategies such as allele-specific hybridization, an allele-specific probe is usually hybridized under highly stringent conditions to a marker nucleic acid (e.g., a genomic nucleic acid, an amplicon, or the like) comprising a polymorphic nucleotide.
Allele-Specific Hybridization (ASH)
One especially preferred example of a hybridization technology for detecting marker nucleic acids is allele-specific hybridization, or "ASH."
This technology is based on the stable annealing of a short, single-stranded oligonucleotide probe to a single-stranded target nucleic acid only when base pairing is completely complementary. The hybridization can then be detected from a radioactive or non-radioactive label on the probe (methods of labeling probes and other nucleic acids are set forth in detail below).
ASH markers are polymorphic when their base composition at one or a few nucleotide positions in a segment of DNA is different among different genotypes.
For each polymorphism, two or more different ASH probes are designed to have identical DNA sequences except at the polymorphic nucleotide(s). Each probe will have exact homology with one allele sequence, so that the set of probes can distinguish all the alternative allele sequences. Each probe is hybridized against the target DNA. With appropriate probe design and stringency conditions, a single-base mismatch between the probe and target DNA will prevent hybridization and the unbound probe will wash away. In this manner, only one of the alternative probes will hybridize to a target sample that is homozygous or homogeneous for an allele (an allele is defined by the DNA homology between the probe and target). Samples that are heterozygous or heterogeneous for two alleles will hybridize to both of two alternative probes. Having a probe for each allele allows the polymorphism to be genetically co-dominant, which is useful in determining zygosity. In addition, a co-dominant ASH system is useful when hybridization does not occur for either of two alternative probes, so that control experiments can be directed towards verifying insufficient target DNA or the occurrence of a new allele.
ASH markers are used as dominant markers where the presence or absence of only one allele is determined from hybridization or lack of hybridization by only one probe. The alternative allele may be inferred from the lack of hybridization.
Heterogeneous target nucleic acids (i.e., chromosomal DNA from a multiallelic plant) are detected by monitoring simultaneous hybridization of two or more probes comprising different polymorphic nucleotides to a genomic nucleic acid.
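The co-dominant and dominant readouts described above can be combined into one small calling routine. The sketch below applies the 2x-over-unrelated-probe rule of thumb from the hybridization section. The signal units, the return labels, and the function name are illustrative assumptions, not part of the disclosed method.

```python
from typing import Dict


def call_ash(signals: Dict[str, float], unrelated_probe: float,
             min_ratio: float = 2.0) -> str:
    """Call one sample from allele-specific probe signals (e.g. dot-blot intensities).

    signals: allele label -> signal of that allele's probe on this sample.
    A probe counts as hybridized when its signal is at least `min_ratio` times
    the signal of an unrelated probe.
    """
    if unrelated_probe <= 0:
        raise ValueError("unrelated-probe signal must be positive")
    hits = sorted(a for a, s in signals.items() if s >= min_ratio * unrelated_probe)
    if len(signals) == 1:
        # Dominant use: one probe only, so zygosity cannot be resolved and the
        # alternative allele is inferred from a lack of hybridization.
        return "present" if hits else "absent"
    if not hits:
        # Co-dominant use: check for insufficient target DNA or a new allele.
        return "no call"
    if len(hits) == 1:
        return f"{hits[0]}/{hits[0]}"  # homozygous (or homogeneous) for one allele
    return "/".join(hits)  # heterozygous (or heterogeneous) sample
```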
Allele-specific hybridization was first described by Wallace et al. (1979). They showed that the duplex formed between an oligonucleotide probe and bacteriophage target DNA dissociated at a temperature about 10°C lower when the probe and target sequences had a single base-pair mismatch than when the probe and target DNA had perfect homology. This difference in thermal stability allowed ASH probes to discriminate the two alleles determined by a single-nucleotide polymorphism between the wild-type sequence and a point mutation in the am-3 bacteriophage.
Later it was shown that a mixture of ASH probes, designed from the possible degenerate DNA sequences coding for a known amino acid sequence, could be used to identify clones containing the rabbit β-globin DNA that coded for that protein (Wallace et al. (1981) Nucleic Acids Res 9:879-894). They also showed that the only probe that hybridized to the clones had exact homology to the clone, whereas three probes that did not hybridize to the clones had a single base-pair mismatch with the target DNA.
ASH markers have been developed to diagnose susceptibility to human diseases caused by point mutations in DNA sequence. Examples include the βS-globin allele that can cause sickle-cell anemia (Conner et al. (1983) Proc Natl Acad Sci USA 80:278-282), the β0-thalassemia allele that can cause β-thalassemia (Pirastu et al. (1983) New England J Med 309:284-287), the α1-antitrypsin allele that can cause liver cirrhosis and pulmonary emphysema (Kidd (1983) Nature 304:230-234), the HLA-DR haplotypes associated with immune response (Angelini et al. (1986) Proc Natl Acad Sci USA 83:4489-4493), and the A985G allele that can cause medium-chain acyl-CoA dehydrogenase deficiency (Iitia et al. (1994) BioTechniques 17:566-571).
ASH markers have also been developed to identify strains of fungi that are resistant to the fungicide benzimidazole because of specific point mutations in the β-tubulin gene, in Venturia inaequalis (Koenraadt and Jones (1992) Phytopathology 82:1354-1358) and Rhynchosporium secalis (Wheeler et al. (1995) Pestic Sci 43:201-209).
An ASH probe is designed to form a stable duplex with a nucleic acid target only when base pairing is completely complementary. One or more base-pair mismatches between the probe and target prevent stable hybridization. This holds true for numerous variations of the process: the probe and target molecules are optionally either RNA or denatured DNA; the target molecule(s) can extend any number of nucleotides beyond the sequence that is complementary to the probe; the probe is designed to hybridize with either strand of a DNA target; the probe ranges in size to conform to variously stringent hybridization conditions; etc.
The polymerase chain reaction (PCR) (see, e.g., Mullis and Faloona (1987) Methods Enzymol 155:335-350 and references supra) allows the target sequence for ASH to be amplified from low concentrations of nucleic acid in relatively small volumes (Koenraadt and Jones (1992) Phytopathology 82:1354-1358; Iitia et al. (1994) BioTechniques 17:566-571). Otherwise, the target sequence from genomic DNA is digested with a restriction endonuclease and size separated by gel electrophoresis (Conner et al. 1983). Hybridizations typically occur with the target sequence bound to the surface of a membrane or, as described in U.S. Patent 5,468,613, the ASH probe sequence may be bound to a membrane.
Utilizing the nucleotide alleles and polymorphisms described here, ASH data were obtained by amplifying nucleic acid fragments (amplicons) from genomic DNA using PCR, transferring the amplicon target DNA to a membrane in a dot-blot format, hybridizing a labeled oligonucleotide probe to the amplicon target, and observing the hybridization dots by autoradiography. These genetic markers have utility in the improvement of soybean, an important crop plant that supplies much of the world's oil and protein.
Solid Phase Arrays
In one variant, ASH technologies are adapted to solid phase arrays for the rapid and specific detection of multiple polymorphic nucleotides. Typically, an ASH probe is linked to a solid support and a target nucleic acid (e.g., a genomic nucleic acid, or an amplicon) is hybridized to the probe. Either the probe, or the target, or both, can be labeled, typically with a fluorophore. Where the target is labeled, hybridization is detected by detecting bound fluorescence. Where the probe is labeled, hybridization is typically detected by quenching of the label. Where both the probe and the target are labeled, detection of hybridization is typically performed by monitoring a color shift resulting from proximity of the two bound labels. A variety of labeling strategies, labels, and the like, particularly for fluorescence-based applications, are described, supra.
In one embodiment, an array of ASH probes is synthesized on a solid support. Using chip masking technologies and photoprotective chemistry, it is possible to generate ordered arrays of nucleic acid probes. These arrays, which are known, e.g., as "DNA chips" or as very large scale immobilized polymer arrays ("VLSIPS™" arrays), can include millions of defined probe regions on a substrate having an area of about 1 cm² to several cm².
The construction and use of solid phase nucleic acid arrays to detect target nucleic acids is well described in the literature. See Fodor et al. (1991) Science 251:767-777; Sheldon et al. (1993) Clinical Chemistry 39(4):718-719; Kozal et al. (1996) Nature Medicine 2(7):753-759; and Hubbell U.S. Pat. No. 5,571,639. See also Pinkel et al. PCT/US95/16155 (WO 96/17958). In brief, a combinatorial strategy allows for the synthesis of arrays containing a large number of probes using a minimal number of synthetic steps. For instance, it is possible to synthesize and attach all possible DNA 8-mer oligonucleotides ($4^8$, or 65,536, possible combinations) using only 32 chemical synthetic steps. In general, VLSIPS™ procedures provide a method of producing $4^n$ different oligonucleotide probes on an array using only $4n$ synthetic steps.
Light-directed combinatorial synthesis of oligonucleotide arrays on a glass surface is performed with automated phosphoramidite chemistry and chip masking techniques similar to photoresist technologies in the computer chip industry.
Typically, a glass surface is derivatized with a silane reagent containing a functional group, e.g., a hydroxyl or amine group blocked by a photolabile protecting group. Photolysis through a photolithographic mask is used to selectively expose functional groups, which are then ready to react with incoming 5'-photoprotected nucleoside phosphoramidites.
The phosphoramidites react only with those sites which are illuminated (and thus exposed by removal of the photolabile blocking group). Thus, the phosphoramidites add only to those areas selectively exposed in the preceding step. These steps are repeated until the desired array of sequences has been synthesized on the solid surface.
Which oligonucleotide analogue is synthesized at each location on the array is determined by the pattern of illumination during synthesis and the order of addition of coupling reagents. Hybridization of target nucleic acids to the array is typically monitored with fluorescence microscopes or laser scanning microscopes.
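The arithmetic behind the claim that 4ⁿ probes need only 4n synthetic steps can be checked with a small simulation. The sketch below assumes an idealized process in which each coupling step adds one base at one chain position, and a hypothetical "mask" exposes exactly those sites whose intended probe carries that base at that position. The function name `masked_synthesis` is illustrative and does not come from any array-design software.

```python
from itertools import product

BASES = "ACGT"


def masked_synthesis(n):
    """Idealized light-directed synthesis of all 4**n probes of length n.

    Returns (chains, steps). Each site is keyed by the probe it is meant
    to carry, and each step couples one base at one chain position.
    """
    targets = ["".join(p) for p in product(BASES, repeat=n)]
    chains = {t: "" for t in targets}  # chain grown so far at each site
    steps = 0
    for position in range(n):
        for base in BASES:
            # The mask illuminates only sites that need `base` at `position`,
            # deprotecting them so the incoming phosphoramidite can couple.
            illuminated = [t for t in targets if t[position] == base]
            for site in illuminated:
                chains[site] += base
            steps += 1
    return chains, steps


chains, steps = masked_synthesis(3)
print(len(chains), steps)  # 64 probes from 12 coupling steps
```

With `n = 8`, the same loop yields 65,536 probes in 32 steps, matching the figures given earlier. The number of steps grows linearly with probe length, while the number of distinct probes grows exponentially.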
In addition to being able to design, build and use probe arrays using available techniques, one of skill is also able to order custom-made arrays and array-reading devices from manufacturers specializing in array manufacture. For example, Affymetrix Corp. in Santa Clara, CA manufactures DNA VLSIPS™ arrays.
It will be appreciated that probe design is influenced by the intended application. For example, where several allele-specific probe-target interactions are to be detected in a single assay, e.g., on a single DNA chip, it is desirable for all of the probes to have similar melting temperatures. Accordingly, the lengths of the probes are adjusted so that the melting temperatures of all of the probes on the array are closely similar (probes with different GC contents may need different lengths to reach a particular Tm).
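The following rough illustration of length adjustment uses the Wallace rule, Tm ≈ 2(A+T) + 4(G+C) °C, a rule of thumb for short oligonucleotides. This formula is an assumption chosen for simplicity and is not the method of the text; nearest-neighbor thermodynamic models would normally be used instead. The hypothetical helper `length_for_tm` extends a probe from the start of a given region until the probe reaches a target Tm. It shows why GC-rich probes can be shorter than AT-rich ones. Centering the polymorphic nucleotide in the probe is ignored here.

```python
def wallace_tm(seq):
    """Rule-of-thumb melting temperature (deg C): 2*(A+T) + 4*(G+C)."""
    seq = seq.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc


def length_for_tm(region, target_tm, min_len=8):
    """Shortest prefix of `region` (at least min_len nt) reaching target_tm.

    Returns None if the whole region is too short to reach the target.
    """
    for length in range(min_len, len(region) + 1):
        probe = region[:length]
        if wallace_tm(probe) >= target_tm:
            return probe
    return None


# Hypothetical regions: one entirely G/C, one entirely A/T.
gc_rich = length_for_tm("GCGGCCGCGGCGCCGGCGCC", 48)
at_rich = length_for_tm("ATTATAATTTATAAATATTATTAT", 48)
print(len(gc_rich), len(at_rich))
```

Under this rule, the GC-rich probe reaches 48 °C at 12 nt, whereas the AT-rich probe needs 24 nt to reach the same value.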
Although melting temperature is a primary consideration in probe design, other factors are optionally used to further adjust probe construction.
Chromosome Painting Technologies--In Situ Hybridization
In one aspect, a marker is used as a chromosome probe to cytogenetically detect the presence of a polymorphic nucleic acid or a region linked to the nucleic acid.
This can be especially useful because cytogenetic identification of a chromosomal region provides a way of determining the physical location of the region hybridized by the probe, i.e., in reference to other known markers.
Typically, a probe which hybridizes to a polymorphic nucleotide or a linked nucleic acid is chemically linked to a colorimetric label or fluorophore. The probe is used to paint the chromosome with the color label, thereby identifying regions which are hybridized by the label. Chromosome painting refers to the staining of specific metaphase or prophase chromosomes, or regions of chromosomes, with probe mixtures, e.g., probes hybridizing to the polymorphic nucleic acids of the invention and, optionally, additional probes hybridizing to additional regions. The painting signal is preferably obtained by fluorescence in situ hybridization (FISH) of such mixtures with the target genome. A variety of staining technologies for the detection of chromosomal differences (typically abnormalities) are known. See Jauch et al., Hum. Genet. 85:145-150 (1990); Wier, Chromosoma 100:371-376 (1991); Van-den-Engh et al., Cytometry 6:92-100 (1988); Kaltoft et al., Arch. Dermatol. Res. 279:293-298 (1987); Sealey et al., Nucleic Acids Res. 13:1905 (1985); Landegent et al., Hum. Genet. 77:366 (1987); Nisson et al., BRL Focus 13:42 (1991).
Comparative genomic hybridization (CGH) is also a known approach for identifying the presence and localization of sequences in a genome compared to a reference genome. See Kallioniemi et al. (1992) Science 258:818. CGH can provide a quantitative estimate of copy number and also provides information regarding the localization of amplified or deleted sequences in a normal chromosome.
Many in situ detection techniques are known and can be adapted to the present invention. These include fluorescence in situ hybridization (FISH), reverse chromosome painting, FISH on DAPI-stained chromosomes, generation of alphoid DNA probes for FISH using PCR, PRINS labeling of DNA, free chromatin mapping, spectral karyotyping, and a variety of other techniques described, e.g., in Tijssen (1993) Laboratory Techniques in Biochemistry and Molecular Biology--Hybridization with Nucleic Acid Probes, Parts I and II, Elsevier, New York, and in Choo (ed.) (1994) Methods in Molecular Biology, Volume 33: In Situ Hybridization Protocols, Humana Press Inc., New Jersey (see also other books in the Methods in Molecular Biology series).
These color-labeling strategies are useful for distinguishing the presence or absence of a chromosomal nucleic acid. They are also useful for the detection of multiple probes with multiple labels. In particular, chromosomes are optionally stained with multiple probes, optionally having multiple color labels. In this way, it is possible to quickly provide a genetic map of a sample at the molecular level.
Furthermore, it is possible to determine whether two polymorphic nucleotides from the same locus are present. For example, if two allele-specific probes with different color labels are hybridized to a chromosomal sample under allele-specific hybridization conditions, both polymorphic nucleotides can be specifically detected. Suppose a first probe has a "blue" label and a second probe has a "yellow" label. A sample that is homozygous for the polymorphic nucleotide specifically bound by the first probe will look "blue" to an observer. A sample that is homozygous for the polymorphic nucleotide specifically bound by the second probe will look "yellow". A sample that is heterozygous and binds both probes will appear "green". It will be appreciated that many color combinations are possible.
For example, where the first fluorophore emits "blue" light and the second fluorophore emits "yellow" light, the observer sees a combined "green" signal. It will be appreciated that a wide variety of emission characteristics can be monitored. Even when the fluorophores emit non-visible wavelengths of light, a combination color can be assigned to the ratio between any two wavelengths of light, or between more than two where more than two probes are used in an assay.
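The ratio idea can be made concrete for a two-probe assay. In the sketch below, `signal_1` and `signal_2` stand for background-corrected intensities measured for the first ("blue") and second ("yellow") probe labels. The log-ratio cutoff `het_window` is a hypothetical parameter; in practice it would be calibrated against samples of known genotype.

```python
import math


def call_genotype(signal_1, signal_2, het_window=1.0):
    """Classify a sample from two allele-specific probe signals.

    A log2 ratio far above zero means only probe 1 bound ("blue"), far
    below zero means only probe 2 bound ("yellow"), and a ratio near zero
    means both bound ("green", heterozygous).
    """
    if signal_1 <= 0 and signal_2 <= 0:
        return "no call"
    if signal_2 <= 0:
        return "homozygous allele 1"
    if signal_1 <= 0:
        return "homozygous allele 2"
    log_ratio = math.log2(signal_1 / signal_2)
    if log_ratio > het_window:
        return "homozygous allele 1"
    if log_ratio < -het_window:
        return "homozygous allele 2"
    return "heterozygous"


for s1, s2 in [(1000, 50), (60, 900), (500, 450)]:
    print(s1, s2, call_genotype(s1, s2))
```

A log ratio is used here so that the two homozygous classes sit symmetrically around zero. This makes a single cutoff sufficient in both directions.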
Amplification Detection Strategies
In a preferred embodiment, a polymorphic nucleotide is detected by amplifying the polymorphic nucleotide and detecting the resulting amplicon. Several variations on this strategy are used to detect polymorphic nucleic acids, depending on the materials available and the like.
(1) PCR
In one embodiment, nucleic acid primers which hybridize to regions of a genomic nucleic acid that flank a polymorphic nucleotide to be detected are used in PCR or LCR reactions to generate an amplicon comprising the polymorphic nucleotide. A variety of PCR and LCR strategies are known in the art and are found in Berger, Sambrook, Ausubel, and Innis, all supra. See also Mullis et al. (1987) U.S. Patent No. 4,683,202. In brief, a nucleic acid comprising a polymorphic nucleotide to be detected (a genomic DNA, a genomic clone, a genomic amplicon or the like) is hybridized to primers which flank the polymorphic nucleotide (e.g., nucleotide polymorphisms at a locus such as pA060A, pA077A, pA086A, pA169A, pA280A, pA378A, pA505A, pA519A, pA588A, pA947B, pB032A, pB032B, pB039A, pBLT24A, pBLT65A, php02265A, php02301A, php02361A, php02370C, php02387A, php02388A, php02393A, php02396A, php02636A, php03522A, php05219A, php05233A, php05278A, php05342A, php07659A, php08584A, php12105A, php02340B, php05264A, php10355B, pK069A, pK079A, pK401A, pK418A, pK644B, pL058A, pL183A, pR045A, pR153A, pT005A, pT155A, php02329A, php02371A, pA343A, pA748B, php05290A, pA858A, php02376A, pG17.3, pB132A, php10078A and/or SOYBPSP, all described supra). Example primers which amplify the polymorphic nucleic acids are provided in the examples section below. The primers are extended in a PCR reaction (typically including a thermostable polymerase enzyme such as Taq, deoxynucleotides, Mg++ and the like; see Ausubel, Innis, Berger or Sambrook for typical PCR conditions). The resulting PCR amplicons comprise the polymorphic nucleic acid to be detected. Exemplary amplicons include pha12105, pha12390, pha12391, pha12392, pha12393, pha12394, pha12395, pha12396, pha10634, pha10623, pha10624, pha10649, pha11135, pha10792, pha10635, pha10638, pha10648, pha10621, pha11071, pha11073, pha10640, pha11076, pha10653, pha10598, pha10615, pha10646, pha10618, pha10620, pha10782, pha11131, pha11132, pha10650, pha10651, pha11138, pha10637, pha11078, pha11079, pha11139, pha10655, pha11701, pha11627, pha10633, pha11074, pha11075, pha10632, pha11628, pha11133, pha10641, pha11136, pha10658, pha10636, pha10783, pha10647, pha08230, pha13070, pha13071, pha13072, pha13073, pha13074, pha13158, pha13560, pha13561, pha14257 and pha14395.
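The relationship between flanking primers and the amplicon they define can be sketched in silico. The function below is a hypothetical helper that assumes exact primer matches on a single template strand written 5'→3'. Real primers tolerate some mismatch at the 5' end, and real tools also search the other strand. The function returns the predicted product, which spans the polymorphic nucleotide whenever the primers flank it.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def in_silico_amplicon(template, forward, reverse):
    """Predicted PCR product for a primer pair, or None if a primer fails.

    `forward` must occur verbatim in `template`; `reverse` anneals to the
    opposite strand, so its reverse complement must occur downstream.
    """
    start = template.find(forward)
    if start == -1:
        return None
    reverse_site = reverse_complement(reverse)
    end = template.find(reverse_site, start + len(forward))
    if end == -1:
        return None
    return template[start:end + len(reverse_site)]


# Hypothetical template with a C/T SNP (lower case) between the primer sites.
template = "AAAAACGTACGTACTTGcATGAGATTACAGGCCCCC"
forward = "ACGTACGTAC"
reverse = reverse_complement("GATTACAGGC")
print(in_silico_amplicon(template, forward, reverse))
```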
Methods of detecting PCR amplicons are known and can easily be adapted to detecting the amplicons of the invention. Detection is typically performed by running PCR reaction products on an acrylamide or agarose gel and detecting the size of the reaction products. Alternatively, the products can be detected by allele-specific hybridization, by allele-specific hybridization to a polymer array as described supra, or by sequencing the PCR amplicons (using standard Sanger dideoxy or Maxam-Gilbert methods). The polymorphic nucleotides in amplicons are optionally detected by cleaving the amplicon with a restriction enzyme whose recognition site spans the polymorphic nucleotide, in an adaptation of standard RFLP analysis.
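The restriction-enzyme variant can be illustrated with a hypothetical pair of 100-nt amplicons that differ at one nucleotide inside an EcoRI site (G^AATTC). The sketch assumes a single-strand view, complete digestion, and a known cut offset within the recognition site. It reports the fragment lengths one would expect to see on a gel for each allele.

```python
def digest(seq, site, cut_offset):
    """Fragment lengths after complete digestion at every copy of `site`.

    cut_offset is the top-strand cut position within the site
    (EcoRI cuts G^AATTC, so its offset is 1).
    """
    cuts = []
    pos = seq.find(site)
    while pos != -1:
        cuts.append(pos + cut_offset)
        pos = seq.find(site, pos + 1)
    bounds = [0] + cuts + [len(seq)]
    return [end - start for start, end in zip(bounds, bounds[1:])]


# Hypothetical alleles: an A/G SNP destroys the EcoRI site in allele 2.
allele_1 = "A" * 40 + "GAATTC" + "T" * 54
allele_2 = "A" * 40 + "GAGTTC" + "T" * 54
print(digest(allele_1, "GAATTC", 1))  # [41, 59]: site present, cut
print(digest(allele_2, "GAATTC", 1))  # [100]: site lost, uncut
```

A heterozygous sample would show all three fragment sizes (41, 59 and 100 nt).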
In addition to the example primer nucleic acids in the examples section below, one of skill is easily able to select a variety of other primers which can be used in PCR amplification of nucleic acids comprising or proximal to polymorphic nucleotides.
In particular, methods of amplifying large nucleic acids by PCR are summarized in Cheng et al. (1994) Nature 369:684-685 and the references therein, in which PCR amplicons of up to 40 kb are generated. More typically, standard PCR is used to create amplicons of between about 100 and about 5,000 nucleotides in length, e.g., using the techniques described in Ausubel and Innis, supra. In any case, primers that hybridize to essentially any region of an amplicon made using the primers of the invention are designed by reference to the sequence of the amplicon. The primer sequences are selected to hybridize to the desired regions of the amplicon.
Amplicons are sequenced by any of a variety of protocols. Most DNA sequencing today is carried out by chain termination methods, the most popular of which are variants of the dideoxynucleotide-mediated chain termination method of Sanger. See Sanger et al. (1977) Proc. Natl. Acad. Sci. USA 74:5463-5467. For a simple introduction to dideoxy sequencing, see Current Protocols in Molecular Biology, F. M. Ausubel et al., eds., Current Protocols, a joint venture between Greene Publishing Associates, Inc. and John Wiley & Sons, Inc. (Supplement 37, current through 1997) (Ausubel), Chapter 7.
Thousands of laboratories employ dideoxynucleotide chain termination techniques. Commercial kits containing the reagents most typically used for these methods of DNA sequencing are available and widely used.
In addition to the Sanger methods of chain termination, new PCR exonuclease digestion methods are available for DNA sequencing of PCR amplicons.
A method for direct sequencing of PCR-generated amplicons has been developed (Porter et al. (1997) Nucleic Acids Research 25(8):1611-1617). In this method, boronated, nuclease-resistant nucleotides are selectively incorporated into the amplicons during PCR, and the amplicons are then digested with a nuclease to produce sized template fragments. PCR reactions on a template are performed in which one of the nucleotide triphosphates in the reaction mixture is partially substituted with a 2'-deoxynucleoside 5'-α-[P-borano]-triphosphate. The boronated nucleotide is stochastically incorporated into PCR products at varying positions along the PCR amplicon. An exonuclease which is blocked by incorporated boronated nucleotides is used to cleave the PCR amplicons. The cleaved amplicons are then separated by size using polyacrylamide gel electrophoresis, providing the sequence of the amplicon. An advantage of this method is that sequencing an amplicon requires fewer biochemical manipulations than standard Sanger-style sequencing of PCR amplicons.
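How a sequence is read from the boronated fragments can be sketched with an idealized simulation loosely modelled on the description above. The simulation assumes one reaction per base and stochastic incorporation of the boronated analogue at positions where that base occurs. It also assumes a 3'→5' exonuclease whose digestion stops at the 3'-most boronated nucleotide. The incorporation probability `p`, the copy number, and the function names are illustrative assumptions. A real gel would be read from band positions rather than from a Python set.

```python
import random


def boronated_lane(amplicon, base, copies=500, p=0.3, seed=0):
    """Fragment lengths left in one lane after exonuclease digestion.

    Each amplicon copy carries the boronated analogue of `base` at a random
    subset of that base's positions (probability p per position). Digestion
    from the 3' end (assumed) stops at the 3'-most boronated nucleotide, so
    the surviving fragment length equals that 1-based position.
    """
    rng = random.Random(seed)
    positions = [i + 1 for i, b in enumerate(amplicon) if b == base]
    lengths = set()
    for _ in range(copies):
        boronated = [pos for pos in positions if rng.random() < p]
        if boronated:
            lengths.add(max(boronated))
    return lengths


def read_ladder(lanes):
    """Read the sequence by sorting fragment lengths across all lanes."""
    by_length = {}
    for base, lengths in lanes.items():
        for length in lengths:
            by_length[length] = base
    return "".join(by_length[length] for length in sorted(by_length))


amplicon = "ACGTTGCAAGCTTCGATCGA"  # hypothetical 20-nt amplicon
lanes = {base: boronated_lane(amplicon, base) for base in "ACGT"}
print(read_ladder(lanes))
```

With enough copies, every position appears as a fragment in exactly one lane, so sorting the lengths recovers the amplicon sequence. Any position the simulation never sampled would silently drop out, much as a missing band would on a real gel.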
Once an amplicon is sequenced, the sequence is optionally used to select primers complementary to the amplicon, i.e., primers which will hybridize to the amplicon. It is expected that one of skill is thoroughly familiar with the theory and practice of nucleic acid hybridization and primer selection. The following provide a basic guide to nucleic acid hybridization: Gait, ed., Oligonucleotide Synthesis: A Practical Approach, IRL Press, Oxford (1984); W.H.A. Kuijpers, Nucleic Acids Research 18(17):5197 (1994); K.L. Dueholm, J. Org. Chem. 59, 5767- (1994); S. Agrawal (ed.), Methods in Molecular Biology, Volume 20; and Tijssen (1993) Laboratory Techniques in Biochemistry and Molecular Biology--Hybridization with Nucleic Acid Probes, e.g., Part I, Chapter 2, "Overview of principles of hybridization and the strategy of nucleic acid probe assays", Elsevier, New York. Innis, supra, provides an overview of primer selection.
One of skill will recognize that the 3' end of an amplification primer is more important for PCR than the 5' end. Investigators have reported PCR products where only a few nucleotides at the 3' end of an amplification primer were complementary to the DNA to be amplified. Accordingly, nucleotides at the 5' end of a primer can incorporate structural features unrelated to the target nucleic acid. For instance, in one embodiment, a sequencing primer hybridization site (or a complement to such a primer, depending on the application) is incorporated into the amplification primer. Here, the sequencing primer is derived from a primer used in a standard sequencing kit, such as one using a biotinylated or dye-labeled universal M13 or SP6 primer. The primers are typically selected so that there is no complementarity between any known target sequence and any constant primer region. One of skill will appreciate that constant regions in primer sequences are optional.
Typically, all primer sequences are selected to hybridize only to a perfectly complementary DNA. The nearest potential mismatch hybridization to known DNA sequences typically has at least about 50 to 70% hybridization mismatches, and preferably 100% mismatches for the terminal 5 nucleotides at the 3' end of the primer.
The primers are selected so that no secondary structure forms within the primer. Self-complementary primers have poor hybridization properties, because the complementary portions of the primers self-hybridize (i.e., form hairpin structures).
Primers are selected to have minimal cross-hybridization. This prevents competition between individual primers and a template nucleic acid, duplex formation of the primers in solution, and possible concatenation of the primers during PCR. If there is more than one constant region in the primer, the constant regions are selected so that they do not self-hybridize or form hairpin structures.
One of skill will recognize that there are a variety of possible ways of performing the above selection steps, and that variations on the steps are appropriate.
Most typically, the selection steps outlined above are performed using simple computer programs; however, all of the steps can optionally be performed manually. One available computer program for primer selection is the MacVector™ program from Kodak. In addition to programs for primer selection, one of skill can easily design simple programs for any or all of the preferred selection steps.
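The following sketch shows one such simple program. It screens a candidate primer using plain string comparisons and checks three things. The first is the longest run of the primer that could pair with itself (a hairpin or self-dimer risk). The second is the longest run that could pair with a partner primer (a primer-dimer risk). The third is whether the primer's 3'-terminal five nucleotides match either strand of known off-target sequences. The thresholds (`max_run=4`, `tail=5`) and all function names are hypothetical. Dedicated primer-design programs use thermodynamic models rather than exact matching.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of an upper-case DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def longest_common_substring(x, y):
    """Length of the longest string occurring in both x and y (dynamic programming)."""
    best = 0
    prev = [0] * (len(y) + 1)
    for i in range(1, len(x) + 1):
        cur = [0] * (len(y) + 1)
        for j in range(1, len(y) + 1):
            if x[i - 1] == y[j - 1]:
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best


def complementary_run(a, b):
    """Longest stretch of `a` that can base-pair antiparallel with `b`."""
    return longest_common_substring(a, revcomp(b))


def screen_primer(primer, partner="", off_targets=(), max_run=4, tail=5):
    """Return a list of problems found for `primer` (empty list = passes)."""
    problems = []
    if complementary_run(primer, primer) > max_run:
        problems.append("self-complementary (hairpin/self-dimer risk)")
    if partner and complementary_run(primer, partner) > max_run:
        problems.append("complementary to partner primer (primer-dimer risk)")
    end = primer[-tail:]
    for seq in off_targets:
        # The 3' end can anneal to either strand of a double-stranded off-target.
        if end in seq or end in revcomp(seq):
            problems.append("3' end matches off-target " + seq)
    return problems


print(screen_primer("GAATTCGAATTC"))  # palindromic, so flagged
print(screen_primer("AAAAACCCCC", partner="GGGGGTTTTT"))
```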
One of skill will recognize that a wide variety of amplicons are provided by the present invention. In particular, amplicons are generated with the primers described herein. The amplicons can be generated by exponential amplification as described in the examples herein, or by linear amplification using a single specific primer, or by using one of the example primers below in conjunction with a set of random primers.
It will be appreciated that the amplicons are characterized by a variety of physicochemical properties, including, but not limited to, the following.
First, the amplicons of the invention are produced in an amplification reaction using the primers as described above, with genomic soybean nucleic acid as a template (or a derivative thereof, such as a cloned or in vitro amplified genomic nucleic acid). Second, single-stranded forms of the amplicons (e.g., denatured amplicons) hybridize under stringent conditions to the template nucleic acid. Conditions for specific hybridization of nucleic acids, including amplicon nucleic acids, are described above.
A third physicochemical property of amplicons of the invention is that they specifically hybridize to one or more of the primers in the examples section below. In particular, the primers used to make the amplicon will hybridize to the amplicon; indeed, in PCR amplification strategies, hybridization of the primers to the amplicon is usually required for amplification. Additional physicochemical properties of the amplicons are described in the examples section, where example amplicons are described with reference, e.g., to size and hybridization to particular primers.
(2) LCR
In another embodiment, LCR is used to specifically amplify a polymorphic nucleic acid. Detecting the amplification product confirms the presence of the polymorphic nucleotide. Detection is typically performed by running LCR reaction products on an acrylamide or agarose gel and detecting the size of the reaction products. Alternatively, the products can be detected by allele-specific hybridization, by allele-specific hybridization to a polymer array as described supra, or by sequencing the LCR amplicons (using standard Sanger dideoxy or Maxam-Gilbert methods).
Detection techniques such as PCR amplification or other in vitro amplification methods are also used to detect LCR products.
The ligation chain reaction (LCR; sometimes denoted the "ligation amplification reaction" or "LAR") and related techniques are used as diagnostic methods for detecting single nucleotide variations in target nucleic acids. LCR provides a mechanism for linear or exponential amplification of a target nucleic acid via ligation of complementary oligonucleotides hybridized to a target. Because this amplification can distinguish target nucleic acids that differ by a single nucleotide, it provides a powerful tool for the analysis of genetic variation in the present invention, i.e., for distinguishing polymorphic nucleotides.
The principle underlying LCR is straightforward: oligonucleotides which are complementary to adjacent segments of a target nucleic acid are brought into proximity by hybridization to the target and ligated using a ligase. To achieve linear amplification of the nucleic acid, a single pair of oligonucleotides which hybridize to adjoining areas of the target sequence is employed. The oligonucleotides are ligated and denatured from the template, and the reaction is repeated. To achieve exponential amplification of the target nucleic acid, two (or more) pairs of oligonucleotides are used, each pair hybridizing to complementary sequences on, e.g., a double-stranded target polynucleotide. After ligation and denaturation, the target and each of the ligated oligonucleotide pairs serve as templates for hybridization and ligation of the complementary oligonucleotides. The ligase enzyme used in performing LCR is typically thermostable, allowing for repeated denaturation of the template and ligated oligonucleotide complex by heating the ligation reaction.
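The difference between the linear and exponential formats can be seen from an idealized count of ligation products. The count assumes that every cycle ligates one oligonucleotide pair on every available template with 100% efficiency and that reagents never run out; real reactions plateau. In the linear format, only the original target strands act as templates. In the exponential format, each ligated product also becomes a template, so the template pool doubles each cycle. The function `lcr_products` is a hypothetical illustration.

```python
def lcr_products(target_strands, cycles, exponential):
    """Idealized number of ligated products after `cycles` rounds of LCR."""
    if exponential:
        templates = target_strands
        products = 0
        for _ in range(cycles):
            new = templates       # one ligation event per template
            products += new
            templates += new      # products serve as templates next round
        return products
    return target_strands * cycles  # only the original strands are templates


for n in (5, 10, 20):
    print(n, lcr_products(100, n, False), lcr_products(100, n, True))
```

In the exponential case, the count after c cycles equals T(2^c - 1) for T starting strands. This is why exponential LCR reaches detectable product levels in far fewer cycles than the linear format.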
LCR is useful as a diagnostic tool in the detection of genetic variation.
Using LCR methods, it is possible to distinguish between target polynucleotides which differ by a single nucleotide at the site of ligation. Ligation occurs only between oligonucleotides hybridized to a target polynucleotide where the complementarity between the oligonucleotides and the target is perfect, enabling differentiation between allelic variants of a gene or other chromosomal sequence. The specificity of ligation during LCR can be increased by substituting the more specific NAD+-dependent ligases, such as E. coli ligase and (thermostable) Taq ligase, for the less specific T4 DNA ligase. The use of NAD analogues in the ligation reaction further increases the specificity of the ligation reaction. See U.S. Pat. No. 5,508,179 to Wallace et al.
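The single-nucleotide discrimination at the ligation site can be illustrated with an idealized model. In this model, the ligase joins two adjacent oligonucleotides only when the bases on both sides of the nick pair perfectly with the target. All sequences, the `flank` parameter, and the function name are hypothetical. Sequences are written 5'→3', and the ligated product must be the reverse complement of the target region it covers.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of an upper-case DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def ligates(target, start, left, right, flank=1):
    """Idealized LCR check: can `left` and `right` be joined on `target`?

    `target` (5'->3') is the strand the oligos anneal to; the ligated product
    left + right covers target[start:start + len(left) + len(right)]. The
    junction is modelled as requiring perfect pairing for `flank` nucleotides
    on each side of the nick.
    """
    region = target[start:start + len(left) + len(right)]
    expected = revcomp(region)   # product that would pair perfectly
    product = left + right
    nick = len(left)
    window = slice(nick - flank, nick + flank)
    return product[window] == expected[window]


allele_1 = "AAATTGCATGACCAAA"   # hypothetical target, T at the SNP
allele_2 = "AAATTGCACGACCAAA"   # same region, C at the SNP
right = "TGCAA"
print(ligates(allele_1, 3, "GGTCA", right))  # True: perfect junction
print(ligates(allele_2, 3, "GGTCA", right))  # False: mismatch at the nick
print(ligates(allele_2, 3, "GGTCG", right))  # True: allele-2-specific oligo
```

Placing the polymorphic nucleotide at the 3' end of one oligonucleotide, right at the nick, is what lets a single mismatch block ligation in this model.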
Finally, multiple LCR reactions can be run simultaneously in a single reaction, or in parallel reactions for simultaneous detection of any or all of the nucleotide polymorphisms described herein.
(3) TAS, 3SR and Qβ amplification
Nucleotide polymorphisms are also detected using other in vitro detection methods, including TAS, 3SR and Qβ amplification. The transcription-based amplification system (TAS), the self-sustained sequence replication system (3SR) and the Qβ replicase amplification system (Qβ) are reviewed in The Journal of NIH Research (1991) 3:81-94. The present invention may be practiced in conjunction with TAS (Kwoh et al. (1989) Proc. Natl. Acad. Sci. USA 86:1173) or the related 3SR (Guatelli et al. (1990) Proc. Natl. Acad. Sci. USA 87:1874) for detecting single-base alterations in target nucleic acids. This is done by transcribing the target, annealing oligonucleotide primers to the transcript, and ligating the annealed primers. Qβ replication (Lomell et al. (1989) J. Clin. Chem. 35:1826) may also be used in conjunction with the ligation methods of the present invention to detect mismatches by performing Qβ amplification on DNA ligated by the methods of the present invention.
Labeling and Detecting Probes
A probe for use in any of the following can be labeled with any composition detectable by spectroscopic, photochemical, biochemical, immunochemical, electrical, optical or chemical means: an in situ detection procedure, an in vitro amplification procedure (PCR, LCR, NASBA, etc.), a hybridization technique (allele-specific hybridization, in situ analysis, Southern analysis, northern analysis, etc.), or any other detection procedure herein. Useful labels in the present invention include spectral labels such as fluorescent dyes (e.g., fluorescein isothiocyanate, Texas red, rhodamine, digoxigenin, biotin, and the like), radiolabels (e.g., ³H, ¹²⁵I, ³⁵S, ¹⁴C, ³²P, ³³P, etc.), enzymes (e.g., horseradish peroxidase, alkaline phosphatase, etc.), and spectral colorimetric labels such as colloidal gold or colored glass or plastic (e.g., polystyrene, polypropylene, latex, etc.) beads. The label may be coupled directly or indirectly to a component of the detection assay (e.g., a probe, primer, amplicon, YAC, BAC or the like) according to methods well known in the art. As indicated above, a wide variety of labels may be used. The choice of label depends on the sensitivity required, ease of conjugation with the compound, stability requirements, available instrumentation, and disposal provisions. In general, a detector which monitors a probe-target nucleic acid hybridization is adapted to the particular label which is used. Typical detectors include spectrophotometers, phototubes and photodiodes, microscopes, scintillation counters, cameras, film and the like, as well as combinations thereof. Examples of suitable detectors are widely available from a variety of commercial sources known to persons of skill.
Commonly, an optical image of a substrate comprising a nucleic acid array with a particular set of probes bound to the array is digitized for subsequent computer analysis.
Because incorporation of radiolabeled nucleotides into nucleic acids is straightforward, radiolabeling represents a preferred labeling strategy.
Exemplary technologies for incorporating radiolabels include end-labeling with a kinase or phosphatase enzyme, nick translation, incorporation of radioactive nucleotides with a polymerase, and many other well-known strategies.
Fluorescent labels are also preferred labels, having the advantage of requiring fewer precautions in handling. Preferred labels are typically characterized by one or more of the following: high sensitivity, high stability, low background, low environmental sensitivity and high specificity in labeling. Fluorescent moieties which are incorporated into the labels of the invention are generally known. They include Texas red, digoxigenin, biotin, 1- and 2-aminonaphthalene, p,p'-diaminostilbenes, pyrenes, quaternary phenanthridine salts, 9-aminoacridines, p,p'-diaminobenzophenone imines, anthracenes, oxacarbocyanine, merocyanine, 3-aminoequilenin, perylene, bis-benzoxazole, bis-p-oxazolyl benzene, 1,2-benzophenazin, retinol, bis-3-aminopyridinium salts, hellebrigenin, tetracycline, sterophenol, benzimidazolylphenylamine, 2-oxo-3-chromen, indole, xanthen, 7-hydroxycoumarin, phenoxazine, salicylate, strophanthidin, porphyrins, triarylmethanes and flavin.
Individual fluorescent compounds which have functionalities for linking to an element desirably detected in an apparatus or assay of the invention, or which can be modified to incorporate such functionalities, include, e.g., dansyl chloride; fluoresceins such as 3,6-dihydroxy-9-phenylxanthydrol; rhodamineisothiocyanate; N-phenyl 1-amino-8-sulfonatonaphthalene; N-phenyl 2-amino-6-sulfonatonaphthalene; 4-acetamido-4-isothiocyanato-stilbene-2,2'-disulfonic acid; pyrene-3-sulfonic acid; 2-toluidinonaphthalene-6-sulfonate; N-phenyl-N-methyl-2-aminonaphthalene-6-sulfonate; ethidium bromide; atebrine; auramine-O; 2-(9'-anthroyl)palmitate; dansyl phosphatidylethanolamine; N,N'-dioctadecyl oxacarbocyanine; N,N'-dihexyl oxacarbocyanine; merocyanine; 4-(3'-pyrenyl)stearate; d-3-aminodesoxy-equilenin; 12-(9'-anthroyl)stearate; 2-methylanthracene; 9-vinylanthracene; 2,2'-(vinylene-p-phenylene)bisbenzoxazole; p-bis[2-(4-methyl-5-phenyl-oxazolyl)]benzene; 6-dimethylamino-1,2-benzophenazin; retinol; bis(3'-aminopyridinium) 1,10-decandiyl diiodide; sulfonaphthylhydrazone of hellebrigenin; chlorotetracycline; N-(7-dimethylamino-4-methyl-2-oxo-3-chromenyl)maleimide; N-(p-(2-benzimidazolyl)-phenyl)maleimide; N-(4-fluoranthyl)maleimide; bis(homovanillic acid); resazurin; 4-chloro-7-nitro-2,1,3-benzooxadiazole; merocyanine 540; resorufin; rose bengal; and 2,4-diphenyl-3(2H)-furanone. Many fluorescent tags are commercially available from the following sources, as well as from other commercial sources known to one of skill: SIGMA Chemical Company (Saint Louis, MO), Molecular Probes, R&D Systems (Minneapolis, MN), Pharmacia LKB Biotechnology (Piscataway, NJ), CLONTECH Laboratories, Inc. (Palo Alto, CA), Chem Genes Corp., Aldrich Chemical Company (Milwaukee, WI), Glen Research, Inc., GIBCO BRL Life Technologies, Inc. (Gaithersburg, MD), Fluka Chemica-Biochemika Analytika (Fluka Chemie AG, Buchs, Switzerland), and Applied Biosystems (Foster City, CA).
In one embodiment, nucleic acids are labeled by culturing recombinant cells which encode the nucleic acid in a growth medium containing fluorescent or radioactive nucleotide analogues, resulting in the production of labeled nucleic acids. Similarly, labeled nucleic acids can be synthesized in vitro using a primer and a DNA polymerase such as Taq. For example, Hawkins et al., U.S. Pat. No. 5,525,711, describes pteridine nucleotide analogs for use in fluorescent DNA probes, including PCR amplicons.
The label is coupled directly or indirectly to a molecule to be detected (a product, substrate, enzyme, or the like) according to methods well known in the art. As indicated above, a wide variety of labels are used. The choice of label depends on the sensitivity required, ease of conjugation of the compound, stability requirements, available instrumentation, and disposal provisions. Nonradioactive labels are often attached by indirect means. Generally, a ligand molecule (e.g., biotin) is covalently bound to a nucleic acid such as a probe, primer, amplicon, YAC, BAC or the like. The ligand then binds to an anti-ligand (e.g., streptavidin) molecule, which is either inherently detectable or covalently bound to a signal system, such as a detectable enzyme, a fluorescent compound, or a chemiluminescent compound. A number of ligands and anti-ligands can be used. Where a ligand has a natural anti-ligand (for example, biotin, thyroxine, and cortisol), it can be used in conjunction with labeled anti-ligands.
Alternatively, any haptenic or antigenic compound can be used in combination with an antibody. Labels can also be conjugated directly to signal generating compounds, e.g., by conjugation with an enzyme or fluorophore or chromophore. Enzymes of interest as labels will primarily be hydrolases, particularly phosphatases, esterases and glycosidases, or oxidoreductases, particularly peroxidases. Fluorescent compounds include fluorescein and its derivatives, rhodamine and its derivatives, dansyl, umbelliferone, etc.
Chemiluminescent compounds include luciferin and 2,3-dihydrophthalazinediones, e.g., luminol. Means of detecting labels are well known to those of skill in the art. Thus, for example, where the label is a radioactive label, means for detection include a scintillation counter or photographic film as in autoradiography.
Making Transgenic Plants With Nucleic Acids Linked to Selected Loci
Nucleic acids which are genetically linked to the loci described herein are optionally cloned and transduced into cells, especially to make transgenic plants. In particular, nucleic acids linked to the following loci are cloned and transduced into plants: pA060A, pA077A, pA086A, pA169A, pA280A, pA378A, pA505A, pA519A, pA588A, pA947B, pB032A, pB032B, pB039A, pBLT24A, pBLT65A, php02265A, php02301A, php02361A, php02370C, php02387A, php02388A, php02393A, php02396A, php02636A, php03522A, php05219A, php05233A, php05278A, php05342A, php07659A, php08584A, php12105A, php02340B, php05264A, php10355B, pK069A, pK079A, pK401A, pK418A, pK644B, pL058A, pL183A, pR045A, pR153A, pT005A, pT155A, php02329A, php02371A, pA343A, pA748B, php05290A, pA858A, php02376A, pG17.3, pB132A, php10078A or SOYBPSP. The cloned sequences are useful as molecular tags for selected plant strains, and are further useful for encoding polypeptides. Often, these polypeptides are encoded by a QTL and are responsible for the phenotypic effects of the QTL.
The nucleic acids linked to a selected locus or selected loci are introduced into plant cells, either in culture or in organs of a plant, e.g., leaves, stems, fruit, seed, etc. Natural or synthetic nucleic acids linked to polymorphic nucleic acids can be expressed by operably linking a nucleic acid of interest to a promoter, incorporating the construct into an expression vector, and introducing the vector into a suitable host cell. Alternatively, an endogenous promoter linked to the nucleic acids can be used.
Typical vectors contain transcription and translation terminators, transcription and translation initiation sequences, and promoters useful for regulation of the expression of the particular nucleic acid. The vectors optionally comprise generic expression cassettes containing at least one independent terminator sequence, sequences permitting replication of the cassette in eukaryotes, prokaryotes, or both (e.g., shuttle vectors) and selection markers for both prokaryotic and eukaryotic systems.
Vectors are suitable for replication and integration in prokaryotes, eukaryotes, or preferably both.
See Giliman & Smith, Gene 8:81 (1979); Roberts, et al., Nature 328:731 (1987); Schneider, B., et al., Protein Expr. Purif. 6435:10 (1995); Berger, Sambrook, Ausubel (all supra).
Cloning of QTL Linked Sequences into Bacterial Hosts
There are several well-known methods of introducing nucleic acids into bacterial cells, any of which may be used in the present invention. These include fusion of the recipient cells with bacterial protoplasts containing the DNA, electroporation, projectile bombardment, and infection with viral vectors. Bacterial cells are often used to amplify (i.e., increase the number of) plasmids containing DNA constructs of this invention. The bacteria are grown to log phase, and the plasmids within the bacteria can be isolated by a variety of methods known in the art (see, for instance, Sambrook). In addition, many kits are commercially available for the purification of plasmids from bacteria; they should be used according to the manufacturer's instructions (see, for example, EasyPrep™ and FlexiPrep™, both from Pharmacia Biotech; StrataClean™, from Stratagene; and QIAexpress Expression System, Qiagen). The isolated and purified plasmids are then further manipulated to produce other plasmids, used to transfect plant cells, or incorporated into Agrobacterium tumefaciens to infect plants.
Nucleic acids can be delivered in vitro to any bacterial host cell grown in culture. When carried out in vitro, contact between the cells and the genetically engineered nucleic acid constructs takes place in a biologically compatible medium.
The concentration of nucleic acid varies widely depending on the particular application, but is generally between about 1 µM and about 10 mM. Treatment of the cells with the nucleic acid is generally carried out at physiological temperatures (about 37 °C) for periods of about 1 to 48 hours.
Alternatively, a nucleic acid operably linked to a promoter to form a fusion gene is expressed in bacteria such as E. coli and its gene product isolated and purified.
Transfecting Plant Cells
To use isolated sequences in the above techniques, recombinant DNA vectors suitable for transformation of plant cells are prepared. Techniques for transforming a wide variety of higher plant species are well known and described in the technical and scientific literature. See, for example, Weising, et al., Ann. Rev. Genet. 22:421-477 (1988). A DNA sequence coding for the desired mRNA, polypeptide, or non-expressed tagging sequence is transduced into the plant. Where the sequence is expressed, it is optionally combined with transcriptional and translational initiation regulatory sequences which direct the transcription of the sequence from the gene in the intended tissues of the transformed plant.
Promoters in nucleic acids linked to the above loci are identified, e.g., by analyzing the 5' sequences upstream of a coding sequence in linkage disequilibrium with the loci. Optionally, such nucleic acids will be associated with a QTL.
Sequences characteristic of promoter sequences can be used to identify the promoter.
Sequences controlling eukaryotic gene expression have been extensively studied. For instance, promoter sequence elements include the TATA box consensus sequence (TATAAA), which is usually 20 to 30 base pairs upstream of a transcription start site.
In most instances the TATA box aids in accurate transcription initiation. In plants, further upstream from the TATA box, at positions -80 to -100, there is typically a promoter element with a series of adenines surrounding the trinucleotide G (or T) N G.
See, e.g., J. Messing, et al., in GENETIC ENGINEERING IN PLANTS, pp. 221-227 (Kosuge, Meredith and Hollaender, eds., 1983).
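As a rough illustration of identifying a promoter from characteristic sequences, the Python sketch below scans the region 20 to 30 bp upstream of a known transcription start site for a TATA-like motif. It assumes the commonly used eukaryotic consensus TATAWAW (W = A or T), a 0-based start-site index, and only the given strand; the function name `find_tata_boxes` is hypothetical, and a motif match alone does not establish a functional promoter.

```python
import re

# Lookahead lets overlapping matches be reported; W (A or T) is written as [AT].
TATA_PATTERN = re.compile(r"(?=(TATA[AT]A[AT]))")


def find_tata_boxes(seq: str, tss: int, min_up: int = 20, max_up: int = 30) -> list[int]:
    """Return 0-based start positions of TATA-like motifs that begin between
    min_up and max_up bp upstream of the transcription start site (tss)."""
    hits = []
    for match in TATA_PATTERN.finditer(seq.upper()):
        start = match.start(1)
        if min_up <= tss - start <= max_up:
            hits.append(start)
    return hits


# Hypothetical sequence with a TATAAAT motif 25 bp upstream of position 35.
example = "G" * 10 + "TATAAAT" + "G" * 30
print(find_tata_boxes(example, tss=35))
```

The window bounds are parameters so that other spacing assumptions can be tried; the further upstream adenine-rich element mentioned above is not searched for here.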
A number of methods are known to those of skill in the art for identifying and characterizing promoter regions in plant genomic DNA. See, e.g., Jordano, et al., Plant Cell 1:855-866 (1989); Bustos, et al., Plant Cell 1:839-854 (1989); Green, et al., EMBO J. 7:4035-4044 (1988); Meier, et al., Plant Cell 3:309-316 (1991); and Zhang, et al., Plant Physiology 110:1069-1079 (1996).
In constructing recombinant expression cassettes of the invention, a plant promoter fragment that directs expression of the gene in all tissues of a regenerated plant is optionally employed. Such promoters are referred to herein as "constitutive" promoters and are active under most environmental conditions and states of development or cell differentiation. Examples of constitutive promoters include the cauliflower mosaic virus (CaMV) 35S transcription initiation region, the 1'- or 2'-promoter derived from T-DNA of Agrobacterium tumefaciens, and other transcription initiation regions from various plant genes known to those of skill.
Alternatively, the plant promoter may direct expression of the polynucleotide of the invention in a specific tissue (tissue-specific promoters) or may be otherwise under more precise environmental control (inducible promoters).
Examples of tissue-specific promoters under developmental control include promoters that initiate transcription only in certain tissues, such as fruit, seeds, or flowers.
If polypeptide expression is desired, a polyadenylation region at the 3'-end of the coding region is typically included. The polyadenylation region can be derived from the natural gene, from a variety of other plant genes, or from T-DNA.
The vector comprising the sequences (e.g., promoters or coding regions) from genes of the invention typically comprises a marker gene which confers a selectable phenotype on plant cells. For example, the marker can encode biocide resistance, particularly antibiotic resistance, such as resistance to kanamycin, G418, bleomycin or hygromycin, or herbicide resistance, such as resistance to chlorsulfuron or Basta.
Introduction of the Nucleic Acids into Plant Cells
The DNA constructs of the invention are introduced into plant cells, either in culture or in the organs of a plant, by a variety of conventional techniques. For example, the DNA construct can be introduced directly into the genomic DNA of the plant cell using techniques such as electroporation and microinjection of plant cell protoplasts, or the DNA constructs can be introduced directly into plant cells using ballistic methods, such as DNA particle bombardment. Alternatively, the DNA constructs are combined with suitable T-DNA flanking regions and introduced into a conventional Agrobacterium tumefaciens host vector. The virulence functions of the Agrobacterium tumefaciens host direct the insertion of the construct and adjacent markers into the plant cell DNA when the cell is infected by the bacteria.
Microinjection techniques are known in the art and well described in the scientific and patent literature. The introduction of DNA constructs using polyethylene glycol precipitation is described in Paszkowski, et al., EMBO J. 3:2717 (1984).
WO 99/31964 PCT/US98/26935
Electroporation techniques are described in Fromm, et al., Proc. Nat'l. Acad. Sci. USA 82:5824 (1985). Ballistic transformation techniques are described in Klein, et al., Nature 327:70-73 (1987).
Agrobacterium tumefaciens-mediated transformation techniques, including disarming and use of binary vectors, are also well described in the scientific literature. See, for example, Horsch, et al., Science 223:496-498 (1984), and Fraley, et al., Proc. Nat'l. Acad. Sci. USA 80:4803 (1983). Agrobacterium-mediated transformation is a preferred method of transformation of dicots.
Generation of Transgenic Plants
Transformed plant cells derived by any of the above transformation techniques can be cultured to regenerate a whole plant which possesses the transformed genotype and thus the desired phenotype. Such regeneration techniques rely on manipulation of certain phytohormones in a tissue culture growth medium, typically relying on a biocide and/or herbicide marker which has been introduced together with the desired nucleotide sequences. Plant regeneration from cultured protoplasts is described in Evans, et al., Protoplasts Isolation and Culture, Handbook of Plant Cell Culture, pp. 124-176, Macmillan Publishing Company, New York (1983); and Binding, REGENERATION OF PLANTS, PLANT PROTOPLASTS, pp. 21-73, CRC Press, Boca Raton (1985). Regeneration can also be obtained from plant callus, explants, somatic embryos (Dandekar, et al., J. Tissue Cult. Meth. 12:145 (1989); McGranahan, et al., Plant Cell Rep. 8:512 (1990)), organs, or parts thereof. Such regeneration techniques are described generally in Klee, et al., Ann. Rev. of Plant Phys. 38:467-486 (1987).
One of skill will recognize that after the expression cassette is stably incorporated in transgenic plants and confirmed to be operable, it can be introduced into other plants by sexual crossing. Any of a number of standard breeding techniques can be used, depending upon the species to be crossed.
Discussion of the Accompanying Sequence Listing
The accompanying sequence listing provides complete or partial sequences for a number of amplicons comprising the marker loci herein, and for various primer and probe sequences useful in allele-specific hybridization, PCR and the like. The information is presented as DNA sequences. One of skill will readily understand that each sequence also fully describes the complementary strand of the provided DNA, i.e., by using standard base-pairing rules, the sequence of the complementary DNA is provided and can be written out by any competent practitioner in the art. RNAs having the same sequence are provided by substituting "U" residues for "T" residues, and RNAs corresponding to the complementary strand are similarly provided. A variety of conservatively modified variations of the sequences are also fully provided.
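The base-pairing and T-to-U rules just described are mechanical, as this minimal Python sketch shows. It assumes only unambiguous A/C/G/T bases (IUPAC ambiguity codes would need extra table entries), and the function names are illustrative rather than part of the disclosure.

```python
# Watson-Crick pairing for unambiguous DNA bases, preserving letter case.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def complement_strand(dna: str) -> str:
    """Return the complementary strand written 5'->3' (the reverse complement)."""
    return dna.translate(_COMPLEMENT)[::-1]


def to_rna(dna: str) -> str:
    """Return the RNA having the same sequence (each T replaced by U)."""
    return dna.replace("T", "U").replace("t", "u")


print(complement_strand("ATGCC"))          # complementary DNA strand
print(to_rna(complement_strand("AACG")))   # RNA of the complementary strand
```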
For example, coding regions denoted by open reading frames, beginning with the start codon "ATG" coding for methionine and optionally ending with a stop codon (TAA, TAG or TGA), can generally be modified by substituting codons which equivalently code for the same amino acid. The genetic code is well known and is found in essentially all modern textbooks on molecular biology. See, e.g., Lewin (1995) Genes V, Oxford University Press Inc., NY (Lewin); and Watson et al. (1992) Recombinant DNA, Second Edition, Scientific American Books, NY. Accordingly, although the given sequences are preferred because of their hybridization characteristics (i.e., because the given sequences hybridize to genomic soybean DNA at polymorphic loci), one of skill will recognize that, for coding purposes, any coding sequence can be equivalently represented by any sequence having equivalent codons, and that recitation of a single sequence provides all of these coding sequences. In the interest of not providing clearly redundant information, all possible coding sequences are not written out separately.
However, one of skill can easily do so with the information provided, by simple reference to the genetic code. Simple computer programs can also be used to list any or all such nucleic acids, given the provided sequence. For example, in coding regions where the codon TTT (coding for phenylalanine) appears, it is optionally substituted with TTC, and vice versa. The codons TTA, TTG, CTT, CTC, CTA and CTG (all coding for leucine) are optionally substituted for one another, in any combination. The codons ATT, ATC and ATA (all coding for isoleucine) are optionally substituted for one another. The codons GTT, GTC, GTA and GTG (all coding for valine) are optionally substituted for one another. The codons TCT, TCC, TCA, TCG, AGT and AGC (all coding for serine) are optionally substituted for one another. The codons CCT, CCC, CCA and CCG (all coding for proline) are optionally substituted for one another. The codons ACT, ACC, ACA and ACG (all coding for threonine) are optionally substituted for one another. The codons GCT, GCC, GCA and GCG (all coding for alanine) are optionally substituted for one another. The codons TAT and TAC (both coding for tyrosine) are optionally substituted for one another. The stop codons TAA, TAG and TGA are optionally substituted for one another. The codons CAT and CAC (both coding for histidine) are optionally substituted for one another. The codons CAA and CAG (both coding for glutamine) are optionally substituted for one another. The codons AAT and AAC (both coding for asparagine) are optionally substituted for one another. The codons AAA and AAG (both coding for lysine) are optionally substituted for one another. The codons GAT and GAC (both coding for aspartic acid) are optionally substituted for one another. The codons GAA and GAG (both coding for glutamic acid) are optionally substituted for one another. The codons TGT and TGC (both coding for cysteine) are optionally substituted for one another. The codons CGT, CGC, CGA, CGG, AGA and AGG (all coding for arginine) are optionally substituted for one another. The codons GGT, GGC, GGA and GGG (all coding for glycine) are optionally substituted for one another.
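The "simple computer programs" mentioned above can be as short as the following Python sketch, which lists the coding sequences that translate to the same protein as a given one. It assumes the standard nuclear genetic code (NCBI translation table 1) and an in-frame sequence; the helper names are illustrative. Because the number of variants is the product of the codon counts at each position, a `limit` argument keeps enumeration practical.

```python
from itertools import product
from typing import Iterator, Optional

BASES = "TCAG"
# Standard genetic code, codons enumerated in TCAG order with the first base
# varying slowest; "*" marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TO_AA = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Reverse map: amino acid (or "*") -> every codon that encodes it.
AA_TO_CODONS: dict = {}
for _codon, _aa in CODON_TO_AA.items():
    AA_TO_CODONS.setdefault(_aa, []).append(_codon)


def codons(cds: str) -> list:
    """Split an in-frame coding sequence into codons."""
    cds = cds.upper()
    if len(cds) % 3:
        raise ValueError("coding sequence length must be a multiple of 3")
    return [cds[i:i + 3] for i in range(0, len(cds), 3)]


def translate(cds: str) -> str:
    return "".join(CODON_TO_AA[c] for c in codons(cds))


def count_synonymous_variants(cds: str) -> int:
    """Number of coding sequences (including cds itself) encoding the same protein."""
    total = 1
    for c in codons(cds):
        total *= len(AA_TO_CODONS[CODON_TO_AA[c]])
    return total


def synonymous_variants(cds: str, limit: Optional[int] = None) -> Iterator[str]:
    """Yield synonymous coding sequences, stopping after `limit` if given."""
    choices = [AA_TO_CODONS[CODON_TO_AA[c]] for c in codons(cds)]
    for n, combo in enumerate(product(*choices)):
        if limit is not None and n >= limit:
            return
        yield "".join(combo)


print(list(synonymous_variants("ATGTTT")))   # Met-Phe
print(count_synonymous_variants("CTGGGT"))   # Leu (6 codons) x Gly (4 codons)
```

For the Met-Phe example this yields ATGTTT and ATGTTC, matching the TTT/TTC interchange described above.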
Additional conservative substitutions are also provided by the given sequences. With respect to particular nucleic acid sequences, conservatively modified variants are those nucleic acids which encode identical or essentially identical amino acid sequences, or where the nucleic acid does not encode an amino acid sequence, to essentially identical sequences.
As to amino acid sequences, one of skill will recognize that individual substitutions, deletions or additions to a nucleic acid, peptide, polypeptide, or protein sequence which alter, add or delete a single amino acid or a small percentage of amino acids in the encoded sequence constitute a "conservatively modified variant" where the alteration results in the substitution of an amino acid with a chemically similar amino acid. Conservative substitution tables providing functionally similar amino acids are well known in the art. The following six groups each contain amino acids that are conservative substitutions for one another:
1) Alanine (A), Serine (S), Threonine (T);
2) Aspartic acid (D), Glutamic acid (E);
3) Asparagine (N), Glutamine (Q);
4) Arginine (R), Lysine (K);
5) Isoleucine (I), Leucine (L), Methionine (M), Valine (V); and
6) Phenylalanine (F), Tyrosine (Y), Tryptophan (W).
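The six groups above translate directly into a lookup. The following Python sketch classifies the differences between two equal-length protein sequences as conservative or not under exactly those groups; it does not handle insertions or deletions, and the function names are illustrative.

```python
# The six conservative-substitution groups listed above (one-letter codes).
CONSERVATIVE_GROUPS = ("AST", "DE", "NQ", "RK", "ILMV", "FYW")


def is_conservative(ref: str, alt: str) -> bool:
    """True if ref -> alt is an identity or a swap within one group."""
    ref, alt = ref.upper(), alt.upper()
    if len(ref) != 1 or len(alt) != 1:
        raise ValueError("expected single-letter amino acid codes")
    if ref == alt:
        return True
    return any(ref in group and alt in group for group in CONSERVATIVE_GROUPS)


def classify_substitutions(ref_seq: str, alt_seq: str) -> list:
    """List (0-based position, ref, alt, conservative?) for every mismatch."""
    if len(ref_seq) != len(alt_seq):
        raise ValueError("sequences must have equal length")
    return [
        (i, r, a, is_conservative(r, a))
        for i, (r, a) in enumerate(zip(ref_seq.upper(), alt_seq.upper()))
        if r != a
    ]


print(classify_substitutions("MDKL", "MEKW"))
```

Here D to E falls within group 2 and is reported as conservative, while L to W crosses groups 5 and 6 and is not.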
Regarding nucleic acids, the amplicons and probes in the sequence listing are typically used in hybridization experiments, e.g., for marker-assisted selection.
Accordingly, amplicons and probes which are substantially similar or identical and which can be used in the methods herein (e.g., intra-specific alleles, genetically engineered nucleic acids made, e.g., by modification of the provided sequences, and the like) are provided by the accompanying sequences. In particular, bases may be added, deleted or changed without substantially altering the hybridization properties of the nucleic acid.
For example, bases which do not hybridize to a given probe can be modified without altering the hybridization properties of the probe to the given sequence.
Modifications in such non-hybridizing regions (e.g., flanking regions) result in a nucleic acid which is essentially the same as the written sequence (i.e., it has the same desired physicochemical properties and hybridizes to the same probe). For example, it will be appreciated that the amplicons are optionally larger than the probes which hybridize to them.
Accordingly, the regions which are not involved in hybridization are not essential for hybridization to a probe. Thus, where the nucleotide to be detected is a polymorphic nucleotide, the regions of an amplicon flanking the polymorphic nucleotide which do not hybridize to a probe (e.g., an allele-specific probe, as described, supra) are not critical for hybridization to the probe. One of skill will recognize that these regions are optionally modified.
One of skill will further recognize that the sequences in the sequence listing are optionally part of larger sequences, e.g., the nucleic acids can be cloned into vectors known in the art. See, Sambrook, Ausubel, Berger and Innis, all supra.
Furthermore, subsequences of the given sequences are easily constructed, either synthetically or recombinantly, by joining nucleotides to yield the subsequence.
Typical subsequences are at least about 10 nucleotides, often at least about 20 nucleotides, often at least about 30 nucleotides, and optionally any length, e.g., 40, 50, 60, 70, 80, 90, 100, 150, 200, 250, 300, 350, 400, 450, 500, 550, 600, 650, 700, 750, 800, 850, 900 or the like, and can, of course, be full-length. Similarly, a subsequence can include, e.g., 5%, 10%, 15%, 20%, 25%, 30%, 35%, 40%, 45%, 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95% or 100%, or any percentage between those listed, of a particular full-length nucleic acid.
The subsequences are characterized by the ability to specifically hybridize to the complement of the full-length sequence, and by sequence identity with a sequence in the sequence listing over a selected comparison window. A "comparison window", as used herein, includes reference to a segment of any one of a number of contiguous positions selected from the group consisting of from 10 to 1000, usually about 20 to about 200, more usually about 100 to about 150, in which a sequence may be compared to a reference sequence of the same number of contiguous positions after the two sequences are optimally aligned. Methods of aligning sequences for comparison are well known in the art. Optimal alignment of sequences for comparison may be conducted by the local homology algorithm of Smith & Waterman, Adv. Appl. Math. 2:482 (1981); by the homology alignment algorithm of Needleman & Wunsch, J. Mol. Biol. 48:443 (1970); by the search for similarity method of Pearson & Lipman, Proc. Natl. Acad. Sci. USA 85:2444 (1988); or by computerized implementations of these algorithms (including, but not limited to, CLUSTAL in the PC/Gene program by Intelligenetics, Mountain View, California, and GAP, BESTFIT, BLAST, FASTA, and TFASTA in the Wisconsin Genetics Software Package, Genetics Computer Group (GCG), 575 Science Dr., Madison, Wisconsin, USA). The CLUSTAL program is well described by Higgins & Sharp, Gene 73:237-244 (1988); Higgins & Sharp, CABIOS 5:151-153 (1989); Corpet et al., Nucleic Acids Research 16:10881-10890 (1988); Huang et al., Computer Applications in the Biosciences 8:155-165 (1992); and Pearson et al., Methods in Molecular Biology 24:307-331 (1994). Alignment is also often performed by inspection and manual alignment. One of skill can easily select a variety of nucleic acids which are identical to the given nucleic acids over any selected comparison window size.
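To make the local homology approach cited above concrete, here is a minimal Python sketch of Smith-Waterman alignment with a linear gap penalty, followed by a percent-identity calculation over the aligned window. The scoring values (match +2, mismatch -1, gap -2) are arbitrary assumptions for illustration, not parameters used in this disclosure, and percent identity is computed here as identical columns divided by alignment length, one of several conventions in use.

```python
def smith_waterman(a: str, b: str, match: int = 2, mismatch: int = -1, gap: int = -2):
    """Best local alignment of a and b; returns (score, aligned_a, aligned_b)."""
    a, b = a.upper(), b.upper()

    def sub(x: str, y: str) -> int:
        return match if x == y else mismatch

    # H[i][j] = best score of a local alignment ending at a[i-1] and b[j-1].
    H = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    best, best_i, best_j = 0, 0, 0
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            H[i][j] = max(
                0,
                H[i - 1][j - 1] + sub(a[i - 1], b[j - 1]),  # align the two bases
                H[i - 1][j] + gap,                           # gap in b
                H[i][j - 1] + gap,                           # gap in a
            )
            if H[i][j] > best:
                best, best_i, best_j = H[i][j], i, j

    # Trace back from the best cell until a zero cell ends the local alignment.
    i, j = best_i, best_j
    out_a, out_b = [], []
    while i > 0 and j > 0 and H[i][j] > 0:
        if H[i][j] == H[i - 1][j - 1] + sub(a[i - 1], b[j - 1]):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))


def percent_identity(aln_a: str, aln_b: str) -> float:
    """Identical aligned columns / alignment length x 100; gaps count as differences."""
    if not aln_a:
        return 0.0
    same = sum(x == y for x, y in zip(aln_a, aln_b))
    return 100.0 * same / len(aln_a)


score, x, y = smith_waterman("TTACGTTT", "GGACGTGG")
print(score, x, y, percent_identity(x, y))
```

The shared ACGT core is recovered as the local alignment, illustrating how a comparison window can be restricted to the region of interest while flanking sequence is ignored.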
It will be recognized that where the window includes a nucleotide region to be detected (e.g., a locus), the remainder of the nucleic acid may be deleted or modified.
One of skill will also understand that sequencing technology is imperfect.
Typical error rates for DNA sequencing are on the order of 1-5%, depending on the particular technology used for sequencing. Accordingly, one of skill will recognize that the amplicons of the sequence listing are most preferably obtained by amplifying a genomic nucleic acid (e.g., a genomic clone from a library or genomic nucleic acid isolated from a plant) using the primers described for the particular amplicon. However, the amplicons are also obtained by other means, including synthetic creation of a nucleic acid having the indicated sequence, or any other method described herein.
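The practical weight of a 1-5% per-base error rate is easy to see with a little arithmetic. The Python sketch below assumes errors occur independently at each base (a simplification) and uses a hypothetical 500-bp read; the function names are illustrative.

```python
def expected_errors(read_length: int, error_rate: float) -> float:
    """Mean number of miscalled bases in one read of the given length."""
    return read_length * error_rate


def prob_error_free(read_length: int, error_rate: float) -> float:
    """Chance that a read contains no miscalled base, assuming independent errors."""
    return (1.0 - error_rate) ** read_length


for rate in (0.01, 0.05):
    print(f"{rate:.0%}: {expected_errors(500, rate):.0f} expected errors per 500 bp")
```

Under these assumptions a single 500-bp read carries about 5 to 25 miscalled bases, which is why amplifying the actual genomic template with the listed primers is preferred over relying on a written sequence alone.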
Modifying Nucleic Acids in the Sequence Listing
One of skill will appreciate that many conservative variations of the nucleic acids in the sequence listing can be made using common techniques. For example, due to the degeneracy of the genetic code, "silent substitutions" (i.e., substitutions of a nucleic acid sequence which do not result in an alteration in an encoded polypeptide) are an implied feature of every nucleic acid sequence which encodes an amino acid. Similarly, "conservative amino acid substitutions," in which one or a few amino acids in an amino acid sequence are substituted with different amino acids having highly similar properties (see above), are also readily identified as being highly similar to a disclosed sequence. Such conservatively substituted variations of each explicitly disclosed sequence are a feature of the present invention. Substantially identical nucleotides such as allelic variants or recombinantly engineered nucleic acids comprising all or a portion of a given sequence are also made using these standard techniques.
One of skill will recognize many ways of generating alterations in a given nucleic acid construct. Such well-known methods include site-directed mutagenesis, PCR amplification using degenerate oligonucleotides, exposure of cells containing the nucleic acid to mutagenic agents or radiation, chemical synthesis of a desired oligonucleotide (e.g., in conjunction with ligation and/or cloning to generate large nucleic acids) and other well-known techniques. See Giliman and Smith (1979) Gene 8:81-97; Roberts et al. (1987) Nature 328:731-734; and Sambrook, Innis, Ausubel, Berger, Needham VanDevanter and Mullis (all supra). See also Kilbey et al. (eds.) (1984) Handbook of Mutagenicity Test Procedures, Second Edition, Elsevier, New York.
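One of the methods listed above, PCR with degenerate oligonucleotides, uses a primer pool in which some positions are ambiguous. The Python sketch below expands such a primer into its individual sequences and counts the pool size, assuming the standard IUPAC nucleotide ambiguity codes; the example primers are hypothetical.

```python
from itertools import product

# Standard IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def expand_degenerate(oligo: str) -> list:
    """Every concrete sequence represented by a degenerate oligonucleotide."""
    return ["".join(p) for p in product(*(IUPAC[base] for base in oligo.upper()))]


def degeneracy(oligo: str) -> int:
    """Number of distinct sequences in the degenerate pool."""
    n = 1
    for base in oligo.upper():
        n *= len(IUPAC[base])
    return n


print(expand_degenerate("ARY"))
print(degeneracy("NNK"))
```

Because pool size multiplies with each ambiguous position, degeneracy is usually kept low so that each sequence remains at a useful concentration.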
One of skill can select a desired nucleic acid of the invention based upon the sequences provided and upon general knowledge in the art.
The specific effects of many mutations on hybridization, for instance, can be determined with virtual certainty, even in the absence of experimental information on a hybridization. Moreover, general knowledge regarding the nature of proteins and nucleic acids allows one of skill to select appropriate sequences with activity similar or equivalent to the nucleic acids in the sequence listings herein. Finally, most modifications to nucleic acids are evaluated by routine screening techniques in suitable assays for the desired characteristic. For instance, changes in the immunological character of encoded polypeptides can be detected by an appropriate immunological assay. Modifications of other properties such as nucleic acid hybridization to a complementary nucleic acid, redox or thermal stability of encoded proteins, hydrophobicity, susceptibility to proteolysis, or the tendency to aggregate are all assayed according to standard techniques, all of which are well suited to high-throughput selection.
EXAMPLES
The following examples are offered by way of illustration, and are not intended to be limiting. One of skill will immediately recognize a variety of alternate procedures, compositions, reagents and the like which can be substituted for those exemplified below.
Example 1: Identification of Marker Loci
Marker loci must have multiple alleles to be used for genetic studies, genetic selection, and genetic identification within a species. We compared the DNA sequences among different soybean varieties and found new sequence polymorphisms at 63 loci throughout the soybean genome. The alleles for these polymorphisms are described in this example using allele-specific hybridization (ASH) as an example marker technology. We designed locus-specific oligonucleotide primers to amplify each locus by PCR, and designed allele-specific oligonucleotide probes to hybridize with and distinguish each allele. These polymorphisms provide new opportunities to develop genetic-marker technology and to expand genetic-marker applications for the improvement of soybean.
A. Materials and Methods
Identification of Sequence Polymorphisms
We selected soybean markers for conversion to ASH based on their genetic map locations. The objective was to have ASH markers well distributed around the soybean genome, to maximize their general utility as markers, and to have ASH markers near genes of interest for marker-assisted selection and positional cloning of genetically linked genes or other nucleic acids of interest. Map locations of RFLP markers were identified on Pioneer Hi-Bred International proprietary soybean marker maps and the USDA/ISU public soybean marker map (Shoemaker RC, Olson TC (1993) (Glycine max L. Merr.). pp. 6.131-6.138. In O'Brien SJ (ed.) Genetic maps: Locus maps of complex genomes. Cold Spring Harbor Laboratory Press, New York). The linkage groups on the Pioneer Hi-Bred maps were named according to the USDA/ISU public map by cross-referencing markers common to both maps.
Some marker loci selected for ASH development were homologous to cloned genomic soybean DNA. For these loci, fragments of genomic DNA were ligated into the PstI restriction site of the commonly available pBS+ vector, transformed into E. coli strain DH5α using established protocols (Keim P and Shoemaker RC (1988) Soybean Genet Newsl 15:147-148), and mapped as RFLP markers prior to selection for use as ASH probes. For a few other loci, genomic DNA fragments were ligated into the Lambda ZAP II vector, packaged, plated on E. coli strain XL1-Blue MRF', selected based on homology to a DNA probe, and excised in pBluescript SK(-) phagemid in E. coli strain SOLR™ using ExAssist™ helper phage (Stratagene, La Jolla, California 92037 U.S.A.). Alternatively, selected DNA fragments were ligated into the pCR™ vector and transformed into E. coli according to the TA Cloning® Kit V2.1 (Invitrogen, San Diego, California 92121 U.S.A.), or cloned into the LIC vector (PharMingen, San Diego, California 92121 U.S.A.) and transformed into E. coli strain DH5α.
Plasmid DNA was isolated using the Magic™ or Wizard® Miniprep systems (Promega, Madison, Wisconsin 53711 U.S.A.) or Nucleobond AX kits (Macherey-Nagel, P.O. Box 10 13 52, D-52313 Düren, Germany). Minipreps were precipitated in 0.1 vol 7.5 M NH4Ac and 2.0 vols EtOH at -20 °C for 20 min, spun in a microcentrifuge for 20 min, washed in 70% EtOH, dried, and dissolved in 10 mM Tris/HCl pH 8.5. Insert DNA was sequenced using the Taq DyeDeoxy terminator cycle sequencing reaction and a Perkin Elmer ABI 373 or 377 DNA sequencer. Sequencing primers were designed from the vector.
DNA sequence from each locus was then used to design forward and reverse PCR primers to amplify the locus from different soybean varieties. The primers were designed to have a dissociation temperature (Td) of approximately 60 °C, using the formula:
Td = (((((3 x #GC) + (2 x #AT)) x 37) - 562) / #bp) - 5, where #GC, #AT, and #bp are the number of guanine-cytosine base pairs, the number of adenine-thymine base pairs, and the number of total base pairs, respectively, involved in the annealing of the primer to the template DNA. They were synthesized with a Perkin Elmer ABI 394 DNA/RNA Synthesizer using cyanoethyl phosphoramidite chemistry with the dimethoxytrityl protecting group removed, and were purified by desalting over a Pharmacia NAP10 column.
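The Td formula above can be checked quickly in code. This Python sketch assumes that every base of the primer anneals to the template, so #GC, #AT and #bp are simply counted from the primer itself; the function name and example primers are illustrative.

```python
def dissociation_temp(primer: str) -> float:
    """Td (deg C) = ((((3 x #GC) + (2 x #AT)) x 37) - 562) / #bp - 5.

    Assumes a perfectly matched primer, so every base contributes to the count.
    """
    seq = primer.upper()
    gc = seq.count("G") + seq.count("C")   # #GC
    at = seq.count("A") + seq.count("T")   # #AT
    if not seq or gc + at != len(seq):
        raise ValueError("primer must be a non-empty A/C/G/T sequence")
    bp = len(seq)                          # #bp
    return ((((3 * gc) + (2 * at)) * 37) - 562) / bp - 5


print(round(dissociation_temp("ACGTACGTACGTACGTACGT"), 1))
```

For a 20-mer with 10 G/C and 10 A/T positions this gives (1850 - 562) / 20 - 5 = 59.4 °C, close to the approximately 60 °C design target.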
The PCR reaction mixture to amplify each desired locus consisted of 1.0 µM of the forward primer and 1.0 µM of the reverse primer, 1X buffer (20 mM tripotassium citrate, 20 mM MgSO4, 40 mM Tris base, 10 mM glycine, 5 mM L-histidine, and 0.01% Triton X-100), 240 µM of each dNTP, 0.4 U µL⁻¹ of Hot Tub DNA polymerase (Amersham Life Science, Arlington Heights, IL 60005 U.S.A.), and 1.0 ng µL⁻¹ of template DNA, all diluted in HPLC-grade H2O. The PCR reaction was done in 33 cycles with an initial denaturing for 2 min at 94 °C, subsequent denaturing for 30 seconds at 94 °C, annealing for 2 min at 58 °C, and extension for 2.5 min at 70 °C. A final extension was done for 3 min at 70 °C. An aliquot of each PCR product was run on a 1-2% agarose gel and stained with EtBr for viewing under ultraviolet light.
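Writing the cycling program as data makes it easy to check and reuse. This Python sketch assumes the 2-min initial denaturation runs once before the 33 cycles (the description can also be read otherwise) and ignores ramp times between temperatures; `Step` and `total_hold_seconds` are illustrative names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    name: str
    temp_c: float
    seconds: int


INITIAL = [Step("initial denaturation", 94.0, 120)]
CYCLE = [
    Step("denaturation", 94.0, 30),
    Step("annealing", 58.0, 120),
    Step("extension", 70.0, 150),
]
FINAL = [Step("final extension", 70.0, 180)]
N_CYCLES = 33


def total_hold_seconds(initial, cycle, n_cycles, final) -> int:
    """Sum of programmed hold times; ramping between temperatures is ignored."""
    return (
        sum(s.seconds for s in initial)
        + n_cycles * sum(s.seconds for s in cycle)
        + sum(s.seconds for s in final)
    )


total = total_hold_seconds(INITIAL, CYCLE, N_CYCLES, FINAL)
print(f"{total} s = {total / 60:.0f} min")
```

Under these assumptions the programmed holds total 120 + 33 × 300 + 180 = 10,200 s, or 170 min, before instrument ramp time is added.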
Each amplicon was purified for sequencing using a QIAquick-spin PCR Purification Kit according to the manufacturer's protocol (Qiagen, Chatsworth, California 91311 U.S.A.). If multiple bands were amplified from one PCR, all were individually extracted from the gel and purified using the QIAquick Gel Extraction Kit (Qiagen, Chatsworth, California 91311 U.S.A.). These products were then sequenced using the Taq DyeDeoxy terminator cycle sequencing reaction on a Perkin Elmer ABI 373 or 377 DNA sequencer.
Each oligonucleotide primer used to amplify a DNA fragment by PCR was also used to prime the sequencing of that fragment from one end. When a DNA fragment was too large to sequence in one pass from each side, additional sequencing primers were designed from the initial amplicon sequence and used to obtain more sequence further inside the fragment. When multiple bands were produced from an original set of PCR primers, the sequences among these different fragments were compared and locus-specific primers were designed to amplify only the desired locus.
Eight soybean varieties (Table 1) used to compare DNA sequences and identify single-nucleotide polymorphisms were progenitors to many modern North American soybean varieties (Gizlice et al. (1994) Crop Sci 34:1143-1151) and should, therefore, represent most alleles existing for each locus. Two varieties, BSR101 and PI437.654, were also sequenced for each locus because they were the parents to a population of recombinant-inbred soybean lines used to map many of the markers. These 10 varieties represent a broad range of maturity types and were considered genetically diverse by RFLP fingerprinting analysis (unpublished data). The sequences of these 10 varieties were aligned and compared for differences at each locus.
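The comparison step can be pictured as scanning an alignment column by column. The Python sketch below assumes the sequences for a locus have already been aligned to equal length (the alignment itself is not shown), treats N and other ambiguity codes as no-calls, and counts gaps as alleles so that small indels are also flagged; variety names and sequences in the example are hypothetical.

```python
def polymorphic_sites(aligned: dict) -> list:
    """Return (0-based column, {variety: base}) for every alignment column
    in which the varieties carry more than one allele."""
    names = list(aligned)
    if not names:
        return []
    length = len(aligned[names[0]])
    if any(len(seq) != length for seq in aligned.values()):
        raise ValueError("sequences must already be aligned to equal length")
    sites = []
    for col in range(length):
        column = {name: aligned[name][col].upper() for name in names}
        # Only definite bases and gaps count as alleles; N etc. are no-calls.
        alleles = {base for base in column.values() if base in "ACGT-"}
        if len(alleles) > 1:
            sites.append((col, column))
    return sites


example = {"variety_1": "ACGTA", "variety_2": "ACTTA", "variety_3": "ACGTN"}
print(polymorphic_sites(example))
```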
Table 1. Soybean varieties used to compare DNA sequence, their estimated contributions to modern North American varieties, and their relative maturity groups.
BSR101 and PI437.654 were included to identify marker alleles that could be genetically mapped in a recombinant-inbred population.
Variety              Percentage contribution*   Maturity group
Mandarin (Ottawa)
Lincoln              17.9                       III
A.K. (Harrow)        4.9                        III
S100                 7.5                        V
Ogden                4.9                        VI
CNS                  9.4                        VII
Tokyo                3.8                        VII
Jackson              3.3                        VII
PI437.654            -                          II
* Percentages from Gizlice et al. (1994).
Design and Testing of ASH Markers
When single-nucleotide polymorphisms or other polymorphisms were found among the DNA sequences at each locus, several ASH oligonucleotide probes were designed, synthesized, and tested for each allele. The ASH probes were designed to have a dissociation temperature of about 37 °C. They were synthesized with a Perkin Elmer ABI 394 DNA/RNA Synthesizer using β-cyanoethyl phosphoramidite chemistry with the dimethoxytrityl protecting group removed, and were purified by desalting over a Pharmacia NAP10 column. The probes were tested against the 10 sequenced soybean varieties, 12 additional North American ancestor varieties, and 72 recombinant-inbred lines of the BSR101 X PI437.654 mapping population. Probes were also tested for signal (hybridization of the correct probe) to noise (hybridization of the incorrect probe) ratios for each locus, using the amplified target DNA from one variety in a dilution series. The dilution series were made using about 100 ng, 10 ng, and 1 ng of PCR-product (target) DNA.
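One simple way to arrive at a probe near a target dissociation temperature is to grow a window around the polymorphic base until the target is reached. The Python sketch below does this under stated assumptions: it reuses the primer Td formula given earlier in this example (the text states only the ~37 °C target for probes, not the formula), alternates growth to the right and left so the SNP stays near the centre, and writes probes on the template strand. The function names and the example template are hypothetical, and the test-and-redesign loop described here is not modeled.

```python
def td(seq: str) -> float:
    """Td (deg C) using the formula given above for the PCR primers."""
    gc = sum(base in "GC" for base in seq)
    at = sum(base in "AT" for base in seq)
    return ((((3 * gc) + (2 * at)) * 37) - 562) / len(seq) - 5


def design_ash_probe(template: str, snp: int, target_td: float = 37.0, max_len: int = 30):
    """Grow a window around position snp, alternating right and left,
    until Td >= target_td. Returns half-open (start, end) coordinates."""
    template = template.upper()
    start, end = snp, snp + 1
    grow_right = True
    while td(template[start:end]) < target_td:
        if end - start >= max_len:
            raise ValueError("target Td not reached within max_len")
        if grow_right and end < len(template):
            end += 1
        elif start > 0:
            start -= 1
        elif end < len(template):
            end += 1
        else:
            raise ValueError("template too short for target Td")
        grow_right = not grow_right
    return start, end


def allele_probe_pair(template: str, snp: int, alt_base: str, **kwargs):
    """Reference-allele probe plus the same probe carrying the alternative base."""
    start, end = design_ash_probe(template, snp, **kwargs)
    ref_probe = template[start:end].upper()
    k = snp - start  # offset of the SNP inside the probe
    alt_probe = ref_probe[:k] + alt_base.upper() + ref_probe[k + 1:]
    return ref_probe, alt_probe


print(allele_probe_pair("ACGT" * 10, 20, "G"))
```

For this hypothetical template the window stops at 12 bases, the first length at which the formula reaches 37 °C; the alternative-allele probe may fall slightly above or below the target and would be checked separately.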
Target DNA was attached to a Hybond N+ nylon membrane (Amersham Life Science, Arlington Heights, IL 60005 U.S.A.) in the following manner. The membrane was soaked briefly in water and the excess water was removed from the membrane by blotting. Two microliters of the PCR product or diluted PCR product were pipetted onto the moist membrane. The membrane was placed, DNA side up, on blotter paper saturated with a DNA denaturing solution (0.6 M NaCl, 0.4 M NaOH) for two minutes. The membrane was then transferred to blotter paper saturated with a neutralizing solution (0.5 M Tris pH 7.5, 1.5 M NaCl) for 10 minutes. The membrane was then baked at 85-90 °C for 1-2 hours and UV-crosslinked at 20,000 µJ.
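For preparing the denaturing and neutralizing solutions above, the mass of each solute follows from molarity × volume × molar mass. This Python sketch uses standard molar masses (NaCl 58.44, NaOH 40.00, Tris base 121.14 g/mol) and a 1-L batch as an example; the pH adjustment of the Tris solution to 7.5 (typically with HCl) is not calculated, and the names are illustrative.

```python
# Standard molar masses in g/mol.
MOLAR_MASS = {"NaCl": 58.44, "NaOH": 40.00, "Tris base": 121.14}

RECIPES = {
    "denaturing": {"NaCl": 0.6, "NaOH": 0.4},          # molar concentrations
    "neutralizing": {"Tris base": 0.5, "NaCl": 1.5},   # pH 7.5 adjustment not included
}


def grams_needed(compound: str, molarity: float, volume_l: float) -> float:
    """Mass of solute (g) for the given molarity (mol/L) and volume (L)."""
    return MOLAR_MASS[compound] * molarity * volume_l


for name, recipe in RECIPES.items():
    parts = ", ".join(f"{grams_needed(c, m, 1.0):.2f} g {c}" for c, m in recipe.items())
    print(f"{name} (1 L): {parts}")
```

For 1 L this gives 35.06 g NaCl and 16.00 g NaOH for the denaturing solution, and 60.57 g Tris base and 87.66 g NaCl for the neutralizing solution.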
Prior to hybridization, each membrane was soaked in a hybridization-washing buffer (0.75 M Na+, 0.5 M PO4, 1.0 mM disodium EDTA, and 1% sarkosyl) for 30 min at 65 °C, then in fresh buffer for 30 min to overnight at room temperature.
Each ASH probe was end-labeled with 32P transferred from the γ position of ATP in a T4 polynucleotide kinase reaction, following the kinase manufacturer's protocol (New England Biolabs, Beverly, Massachusetts 01915 U.S.A.).
Hybridization of probe and target DNA was carried out for at least one hour at room temperature with shaking at 60 rpm. Afterwards, the hybridization solution was discarded and the membrane was washed sequentially, each time in fresh solution, once for 2 min, once for (illegible) min, and twice for 30 min, with shaking at 60 rpm.
The hybridized membrane was exposed to X-ray film for 30 minutes to 18 hours at -80 °C. The film was developed and the hybridization characteristics of the probe were evaluated. If necessary, the probe was redesigned and retested to increase its signal-to-noise ratio or signal strength.
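The dilution-series evaluation can be expressed as a ratio of correct-probe signal to incorrect-probe signal at each target amount. A minimal sketch, assuming the film signals have been quantified (for example by densitometry) in arbitrary units; the signal values and the acceptance cutoff of 5 below are hypothetical, not values from the source.

```python
def signal_to_noise(correct: dict, incorrect: dict) -> dict:
    """Ratio of correct-probe to incorrect-probe signal per target amount (ng).

    Both dicts map target DNA amount (ng) to a quantified film signal.
    A zero noise reading is reported as an infinite ratio.
    """
    ratios = {}
    for ng in sorted(correct, reverse=True):
        noise = incorrect[ng]
        ratios[ng] = float("inf") if noise == 0 else correct[ng] / noise
    return ratios


def lowest_discriminating_amount(ratios: dict, min_ratio: float = 5.0):
    """Smallest target amount whose ratio meets min_ratio (hypothetical cutoff), else None."""
    passing = [ng for ng, ratio in ratios.items() if ratio >= min_ratio]
    return min(passing) if passing else None


if __name__ == "__main__":
    # Hypothetical signals for the ~100, 10 and 1 ng dilution series.
    correct = {100: 5200.0, 10: 1300.0, 1: 240.0}
    incorrect = {100: 410.0, 10: 150.0, 1: 120.0}
    ratios = signal_to_noise(correct, incorrect)
    for ng, ratio in ratios.items():
        print(f"{ng:>4} ng: S/N = {ratio:.1f}")
    print("lowest amount passing:", lowest_discriminating_amount(ratios))
```

With these made-up numbers the ratio drops to 2.0 at 1 ng, so `lowest_discriminating_amount` returns 10; a probe failing at the amounts of interest would be the kind flagged for redesign.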
Genetic Mapping of Soybean ASH Markers
Two segregating soybean populations were used to confirm the map location of each ASH marker relative to other markers in its linkage group. Each population consisted of about 300 recombinant-inbred lines, from the crosses PI437.654 X BSR101 and Bell X YB17E, respectively. The mapping procedure and the population history for PI437.654 X BSR101 were as previously described (Webb et al. (1995) Theor Appl Genet 91:574-581; Keim et al. (1997) Crop Sci 37:537-543).
The population from Bell X YB17E consisted of F3,6 lines. When possible, the linkage groups were identified with and named according to the USDA/Iowa State University public soybean map (Shoemaker and Olson 1993).
Results
We identified DNA-sequence polymorphisms at 63 independent loci in the soybean genome by comparing DNA sequences among different soybean genotypes.
These 63 loci were named (Table 2) according to pre-existing nomenclature. All were named as RFLP marker loci except the following: loci php08320E, php08584A, php10078A, and php10355B, which were originally mapped as AFLP marker loci; the locus php, which was mapped only as an ASH marker; and the locus SOYBPSP, a soybean gene sequence in GenBank (accession M13759) for a 7S seed storage protein (Doyle et al. (1986) J Biol Chem 261:9228-9238). Of these 63 loci, only SOYBPSP was a known gene sequence, and its nucleotide polymorphism (previously unknown and identified by our sequencing) was located in an intron.
We designed forward and reverse PCR primers (Table 3) that amplified a region (the amplified product of the PCR reaction is an "amplicon") of each locus containing at least one nucleotide polymorphism. Most of these loci had multiple polymorphic nucleotide positions separated by monomorphic nucleotides. When the distance between two polymorphic nucleotides was greater than the size of a typical oligonucleotide hybridization probe, we considered each polymorphic nucleotide position to be an independent sub-locus (Table 2). When the distance between two polymorphic nucleotide positions was within the size of one oligonucleotide probe, we considered those polymorphisms to be dependent and part of one sub-locus. We named each amplicon using the prefix 'pha' as an acronym for Pioneer Hi-Bred amplicon, followed by a unique identification number (Tables 2 and 3). Locus or amplicon names can both be used to refer to a DNA region containing specific polymorphic sub-loci.
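The sub-locus rule can be sketched as a greedy grouping of polymorphic positions along an amplicon. This assumes a fixed probe length of 15 nt, a hypothetical stand-in for "the size of a typical oligonucleotide hybridization probe" (the Table 2 probes are roughly 10 to 17 nt), and groups positions that one probe starting at the first member could span. The positions in the example are made up.

```python
def group_sub_loci(positions, probe_len: int = 15) -> list:
    """Group polymorphic positions (bp along an amplicon) into sub-loci.

    Positions are scanned in order. A position joins the current sub-locus
    if a single probe of probe_len nucleotides starting at that sub-locus's
    first position would also cover it (dependent polymorphisms); otherwise
    it starts a new, independent sub-locus.
    """
    sub_loci = []
    for pos in sorted(positions):
        if sub_loci and pos - sub_loci[-1][0] < probe_len:
            sub_loci[-1].append(pos)
        else:
            sub_loci.append([pos])
    return sub_loci


if __name__ == "__main__":
    snps = [120, 12, 131, 40, 15, 125]  # hypothetical positions
    for number, members in enumerate(group_sub_loci(snps), start=1):
        print(f"sub-locus {number}: {members}")
```

Anchoring each window at its first position keeps every sub-locus coverable by one probe; a plain pairwise-distance rule could instead chain distant polymorphisms together through intermediate ones.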
The primer pairs described here produced amplicons ranging in length from 86 to 1880 base pairs. About half of the amplicon sizes were estimated from bands viewed on agarose gels, and about half were obtained by sequencing the entire amplicon region. Estimates from gels were accurate to within about 10% of the amplicon size. The loci php05219A (amplicon pha11138) and php10355B (amplicon pha11627) each produced two fragment sizes because each locus carried an insertion-deletion that varied among soybean genotypes (Table 4).
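The source does not say how band sizes were read from the gels. One common approach, sketched here as an assumption, fits log10(size) against migration distance for a size ladder and interpolates unknown bands along that line. The ladder distances below are hypothetical.

```python
import math


def fit_semilog(ladder):
    """Least-squares fit of log10(size in bp) against migration distance.

    ladder: list of (distance_mm, size_bp) pairs for the marker bands.
    Returns (slope, intercept) with log10(bp) ~= slope * distance + intercept.
    """
    xs = [distance for distance, _ in ladder]
    ys = [math.log10(size) for _, size in ladder]
    n = len(ladder)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def estimate_size(distance_mm: float, fit) -> float:
    """Interpolate a band's size (bp) from its migration distance."""
    slope, intercept = fit
    return 10 ** (slope * distance_mm + intercept)


if __name__ == "__main__":
    # Hypothetical ladder: band size halves every 8 mm of migration.
    ladder = [(10.0, 2000), (18.0, 1000), (26.0, 500), (34.0, 250)]
    fit = fit_semilog(ladder)
    print(round(estimate_size(22.0, fit)))  # band midway between 1000 and 500 bp
```

Real gels deviate from a straight semi-log line, especially across a range as wide as 86 to 1880 bp, which is one reason gel-based sizes are only approximate compared with sequencing.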
One locus, pK079A, had two non-overlapping amplicons, pha11074 and pha11075. Two amplicons were needed because the DNA sequences flanking its two sub-loci were also conserved at a second pK079 locus. A region of DNA between the two sub-loci differed from the second pK079 locus, so PCR primers specific to pK079A were designed for two amplicons, one for each sub-locus.
The polymorphic nucleotides for each allele of each sub-locus were designated by upper-case letters within the probe sequences shown in Table 2.
Each oligonucleotide probe detects one allele at one sub-locus of a locus. A public soybean variety representative of each allele is listed in Table 2 as an example.
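Because Table 2 marks polymorphic nucleotides in upper case, the allele each probe detects can be read off programmatically. A minimal sketch; the function names are illustrative, and offsets are relative to each probe, so they are comparable only between probes that start at the same base, as the three probes used below do.

```python
def polymorphic_bases(probe: str) -> list:
    """Return (0-based offset, base) for each upper-case nucleotide in a probe."""
    return [(i, base) for i, base in enumerate(probe) if base.isupper()]


def allele_code(probe: str) -> str:
    """Concatenate the upper-case (polymorphic) bases of a probe."""
    return "".join(base for _, base in polymorphic_bases(probe))


if __name__ == "__main__":
    # Three allele probes for one sub-locus in Table 2 (SEQ ID NOs 29-31).
    probes = {
        "BSR101": "tcatCtGtgataa",
        "Jackson": "tcatCtCtgataa",
        "PI437.654": "tcatGtGtgataa",
    }
    for variety, probe in probes.items():
        print(variety, polymorphic_bases(probe), allele_code(probe))
```

Multi-base upper-case runs, such as the insertion in the probe `tgaATCTGAtttc`, come out as a single allele code (`ATCTGA`).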
The primer pairs and probes presented here enable the use of allele-specific hybridization (ASH) and many other techniques for detecting polymorphisms at these loci. ASH was used here as an example marker technology for each polymorphism. Other genetic marker technologies are equally effective means of exploiting these polymorphisms in soybean improvement programs. In addition, DNA sequences other than those specifically shown here can be used as primers and probes to detect these polymorphisms, e.g., as described supra. Any DNA sequences flanking these polymorphic nucleotides and within the size range amplifiable by PCR (up to 40 kb using long-distance PCR methods, although more typically about 2 kb or less using standard PCR methods) can be used as PCR primers to amplify the polymorphic DNA, and the forward and reverse primers can be designed from either DNA strand. Any DNA sequence that contains the polymorphic nucleotides and lies within the discriminatory capability of DNA hybridization conditions can be used as a hybridization probe to detect the polymorphism.
The probes can be designed from either DNA strand.
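The statement that any sequence containing the polymorphic nucleotide, from either strand, can serve as a probe can be made concrete by enumerating every fixed-length window that covers the site. A minimal sketch, assuming a single-nucleotide polymorphism and a 13-nt probe length chosen only for illustration; a real design would then filter these candidates, for example by dissociation temperature and by the signal-to-noise testing described earlier. The input sequence in the example is hypothetical.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA string, preserving upper/lower case."""
    return seq.translate(COMPLEMENT)[::-1]


def candidate_probes(seq: str, snp_index: int, allele: str, length: int = 13) -> list:
    """All length-nt windows covering snp_index for one allele, on both strands.

    The allele is written in upper case and the rest in lower case,
    following the Table 2 convention. Returns (strand, probe) tuples.
    """
    variant = seq[:snp_index].lower() + allele.upper() + seq[snp_index + 1:].lower()
    first = max(0, snp_index - length + 1)
    last = min(snp_index, len(variant) - length)
    probes = []
    for start in range(first, last + 1):
        window = variant[start:start + length]
        probes.append(("+", window))
        probes.append(("-", reverse_complement(window)))
    return probes


if __name__ == "__main__":
    # Hypothetical 21-nt sequence with the polymorphic site at index 10.
    for strand, probe in candidate_probes("aacttgtgatcaatatggcaa", 10, "C"):
        print(strand, probe)
```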
The genetic map locations, by linkage group, of 43 of these 63 loci were established by mapping them as ASH markers in segregating populations (Table 2).
Mapping showed the alleles of each locus to be heritable and to segregate normally in a recombinant-inbred population. When possible, the linkage-group names used here are the same as those of a reference map for soybean (Shoemaker and Olson, 1993, supra). Thirty-eight of these 63 loci mapped as ASH markers to 17 linkage groups that correspond to the public reference map, and 5 loci mapped to linkage groups on three other genetic maps that have not yet been identified with linkage groups on the public reference map. These latter groups were named using a combination of a letter (B, L, or Z) and a number. Twenty markers, pA059A, pA064A, pA077A, pA593A, pBLTISA, pK401A, pR153A, php02340B, php02396A, php05264A, pA343A, pA748B, pA858A, pB132A, pG17.3A, php02371A, php05290A, php02329A, php02376A, and php08320E, were not mapped as ASH markers, but sixteen of their loci have been mapped as either RFLP or AFLP markers. Four loci, php02396A, php05264A, php02329A, and php02376A, have not been mapped as markers of any kind. Regardless of their map status, the 63 loci described here reside on most, and possibly all, of the 20 chromosome pairs in soybean.
Table 2. Soybean marker loci: their genetic linkage group, sub-loci, polymorphic nucleotides, probe sequences for allele-specific hybridization, and a representative soybean variety for each allele.
Locus name   Linkage group†   Amplicon name   Sub-locus   Probe sequence‡   Soybean variety   SEQ ID NO:
p U 9 p a ttgtga caatata arrow (4) 1 ttgtgaCcaatat CNS 2 P p a actaaat tatacc arrow 3 (2) 1 ggtataCatttag BSR101 ~ 4 p p a gttgc tgggt ( 1 ) 1 ttgcCtgggtt AKHarrow 6 2 ttttcTTttgttag Mandarin 7 2 ttttttcttgttag AKHarrow 8 3 tttccAaaggtg Mandarin 9 3 ttttccGaaggt AKHarrow 10 P p a 0 ggatt atacta 11 (3) 1 tggattTTatacta Ogden 12 2 atcatgatttcag PI437.654 13 2 tgaATCTGAtttc CNS 14 3 gCAAGTATCAtg CNS 15 3 tttcagTTtgattt PI437.654 16 4 atgttCgggga BSR101 I7 4 aatgitTgggg Mandarin 18 5 . agaacaCggaat BSR101 19 5 gaacaTggaatg Mandarin 20 6 tgcaaCggcat BSR101 21 6 tgcaaTggcatt Ogden 22 P p a tttgaa ctttat 23 ( 1 ) 1 tttgaaTctttatc PI437.654 24 p p a tcatc aatcac 25 ( 1 ) 1 ttcatcAaatcac Jackson 26 2 GatgaCgatttg BSR101 27 2 aCatgaTgatttg PI437.654 28 3 tcatCtGtgataa BSR101 29 3 tcatCtCtgataa Jackson 30 3 tcatGtGtgataa PI437.654 31 P ~ P a tgcact taaatta 32 (1,2) 1 ttgcacttaaatta Lincoln 33 2 ttccttcTtttttt BSR101 34 2 tccttcCtttttt PI437.654 35 3 agagaGatactc BSR101 36 3 agagaCatactc Columbus 37 4 gaccCgctc BSR101 38 4 gaccTgctcc Lincoln 39 5 gaaatcCcaaaaa BSR101 40 5 gaaatcAcaaaaaa Lincoln 41 6 tttatcAtttttgg PI437.654 42 6 atttatcTtttttg BSR101 43 pA343A D pha13072 1 cagcaAtgaaag CNS 44 (1) 1 tcagcatgaaag AKHarrow 45 2 ttaactTgccag AKHarrow 46 2 ttaaciAgccag CNS 47 3 accctCaatatg AKHarrow 48 3 accctAaatatg CNS 49 p G p a ttggaa tatact anc a 50 (1) 1 ttggaaGAtatac PI437.654 51 1 ttggaaGTtatac BSR101~ 52 2 agcacAagtgg PI437.654 53 2 agcacTagtgg PI86050 54 p U p a attaggggcag 55 (1,2) 1 attaggGggca PI437.654 56 2 atatgtaAcaaaag PI437.654 57 2 tatgtaCcaaaag BSR101 58 P y p a tctttt cataatg 59 (1,2) 1 cattatgCaaaag CNS 60 2 gtactaTtatttg PI437.654 61 2 gtactaGtatttg BSR101 62 3 ttgagGatttag BSRI01 63 3 ttgagCatttag AKHarrow 64 4 aaggaGgttgc BSR101 65 4 taaggaAgttgc AKHarrow 66 5 ttagttGagagg AKHarrow 67 .
5 ttagttAagagga BSR101 68 P y p a gttttt tt ataaatarrow 69 (1) 1 gtttttAttATTATTaCNS 70 2 aaatatAtatatatataAICHarrow 7I
2 aatatGtataAtatatCNS 72 3 atatatTtaataaatatCNS 73 3 tatatatAtaTATATATAICHarrow 74 4 aaaaTaaaaAtaaaagAICHarrow 75 4 aaaAaaaaCtaaaag CNS 76 5 atctttGatgagt CNS 77 5 atctttCatgagt AICHarrow 78 6 tTtmtCtttttac CNS 79 6 ttAtttttAttmac AICHarrow 80 P ~~ p a agaaat gtaagt 81 (1) 1 agaaattTgtaagt PI437.654 82 2 aatcttTtttaaag BSR101 83 2 aatcttCtttaaag PI437.654 84 P ~B p a tctat ctgaag an arln 85 (1) 1 tctatActgaag CNS 86 2 aatgatAatttagt CNS 87 2 aatgatCatttag Mandarin 88 P ~ ~ H p ctacatt mttg gg (1) 1 tacattGtttttg AKHarrow 90 2 mttgTtagaga CNS 91 2 mttgCtagag AKHarrow 92 3 acactGcttac CNS 93 3 tacactActtac AKHarrow 94 p 88 p a taaggtaatgttg 95 (1) 1 aggtGGTaatgt CNS 96 2 ggcttAtgcatt CNS 97 2 ggcttCtgcat AKHarrow 98 p W p a aggg ctctg . 99 ( 1, 1 tagggTctctg BS R 101 100 2) ~
2 atacttgtactct Ogden 101 2 tacttCgtactc BSR101 102 3 tatggagtaattg Ogden 103 p K p a ttgaat cccct 105 (1) 1 ttgaatCcccc PI437.654106 2 atgmCgaagc BSR101 107 2 . atgtttTgaagca PI437.654108 3 cggttTtattag PI437.654109 3 cggttCtattag BSR101 110 4 attccTgcccc BSR101 111 4 attccAgcccc Tokyo 112 pB p a accg gcaac 113 ( 1 1 ttgcCcggtg PI437. 114 ) 654 2 . tgtaaTgcGtg BSR101 lI5 2 tgtaaCgcGtg PI437.654116 2 tgtaaCgcAtgt Cook 117 3 tcaccTggatc BSR101 118 3 atcaccCggat PI437.654119 p p a ttcagt aaacc 120 (1) 1 ttcagtTaaacca PI437.654121 p~ p a tcttaa aggct o 0 122 ( 1 . tcttaaAaggct CNS 123 ) p~ p a gatca cccaa arrow 124 A
( 1 1 atcaaGcccaa CNS 125 ) p~ p a carat ccacaa . 126 A
( 1 1 cacatAccacaa BSR 101 127 ) 2 attattAttttcac PI437.654128 2 attattTttttcac BSR101 129 3 aggagtAgtaatt PI437.654130 3 ggagtGgtaatt BSR101 131 p~ p a ttggac attaata . 132 A
'(1) 1 ttggacAattaata BSR101 133 ~
2 taatatcTtatgca BSR101 134 2 aatatcGtatgca PI437.654135 p ~ p ttctcg gcc arrow 136 (4) 1 ttctcgCgcc CNS 137 2 ttctgataaaaaaa CNS 138 2 tctgatGaaaaaa AKHarrow 139 p p p a gaatga tttga 140 (1) 1 gaatgaTtttgac Mandarin 141 p M p a tcattc ttcatg . 142 p lA
(1) 1 tcattcTttcatg BSR101 143-p p . p a catata tagtag an arm 144 1 acatataAtagtag CNS 145 p p a aaaaaaaatgagg 146 OB
( 1 1 aaaaaaaTatgagg CNS 147 ) p pU p a tgtgacaaccga 148 lA
(1) 1 gtgacACaacc PI437.654 149 ph p p a tctcc gaaaca 150 OC
( 1 1 gtttcGggag PI437.654 151 ) 2 aagaagAtgatg BSR101 152 2 agaagCtgatg PI437.654 153 P P p a - ggca tttt arrow 154 c A
( 1 1 ggcaaGttttAc CNS 155 ) 2 ttttCcGgtgc AKHarrow 156 2 GttttAcAgtgc CNS 157 3 aataaaCaagagg AKHarrow 158 3 aataaaTaagagga CNS 159 4 tcgtttGcaatc AKHarrow 160 4 . tcgtttCcaatc CNS 161 5 aaatgatTccttg AKHarrow 162 5 aatgatCccttg CNS 163 6 tattatgTttttgatAKHarrow 164 6 tattatgCttttga CNS 165 7 ggaaagGattttt CNS 166 7 ggaaagAattttta AKHarrow 167 8 ttttgtATctgtat CNS 168 8 tttgtGGctgta AKHarrow 169 p . p a attaaa cccag an arm 170 P
1 attaaaTcccagt CNS 171 2 ttttgtTagagag Mandarin 172 2 ttttgtCagaga CNS 173 3 ttttctTctgtca Mandarin 174 3 ttttctCctgtc CNS 175 4 acaacTAAtaagg Mandarin 176 4 tacaactaaggta CNS 177 p p0 p a atcacg atcac . 178 (1) 1 gtgatCcgtg BSR101 179 P P p a cgcac atatg . 180 ( 1 ) 1 gcacaGatatg BSR 101 181 2 atactgAtttctg PI437.654 182 2 . tactgGtttctg _ 183 ~ BSR101 3 gttagaAtgttag PI437.654 184 3 ttagaGtgttagt BSR101 185 4 aatataaAtaaggg PI437.654 186 4 atataaGtaaggg BSR101 187 5 aaactcagttattt PI437.654 188 5 aactcAAagttatt BSR101 189 6 tttcttTgttattc PI437.654 190 6 atttcttCgttatt BSR101 191 7 attctgCtattatt PI437.654 192 7 tattctgTtattattBSR101 193 8 tgcacaTattact PI437.654 194 8 gcacaCattact BSR101 195 9 atcccatGttca PI437.654 I96 9 tcccatAttcag BSR101 197 P P~ p a . tttaag agagtt . 198 (1,2) 1 gtttaagAagagt BSR101 199 2 tgctttGatttg BSR101 200 2 tgctttAatttgg' Mandarin 201 3 atctctaTaaaca PI437.654 202 3 tctctaCaaacaa BSR101 203 4 tctcaaCttgga PI437.654 204 4 tctcaaTttggaa BSR101 205 5 tttgcAtgcaac PI437.654 206 5 tttgcTtgcaac BSR101 207 6 cacatCcatttg Ogden 208 6 cacatAcatttg PI437.654 209 P P~ ~ p a tacaaaa aaggtt arrow 210 1 acaaaaCaaggtt BSR101 211 2 ggcAtgTgagt AKHarrow 212 2 ggcTtgCgag BSR 101 2 i 3 tgatatATTcttca AKHarrow 214 3 tgatatGcttcaa BSR101 215 4 agttaacAtgaag AKHarrow 216 4 gttaacGtgaag BSR101 . 217 5 agcagtGaagta AKHarrow 21$
5 gcagtAaagtac BSR101 219 6 aatatatctctttttttAKHarrow 220 6 atatctcCtttttt CNS 221 7 aaaaaaGtagctaaAKIiarrow 222 7 aaaaaaCtagctaaCNS 223 8 tatttgCattagg AKHarrow 224 8 ttatttgTattaggCNS 225 9 tcttacaTctttg AKHarrow 226 9 cttacaGctttg CNS 227 10 cggtAgagatt AKHarrow 228 10 cggtGgagatt CNS 229 11 gggcacAaaag AKHarrow 230 11 ggcacGaaagt CNS 231 P P~ p a ctatat ttggtg 232 (I) 1 ctatatAttggtg PI437.654 233 2 aataatGattgtg PI437.654 234 2 taataatAattgtgtBSR101 235 3 ttattatCttttgtWilliams 236 3 ttattatTttttgtCNS 237 4 ttaAaatcTagtagCNS 238 4 attaAaatcCagtaPI437.654 239 4 attaTaatcCagtaWilliams 240 5 agattaAcaggc CNS 241 5 tagattaCcagg BSR101 242 p p K p a aagaga ggcta 243 (1) I aagagaTggcta PI437.654 244 2 tggaaAccttaat BSR101 245 2 tggaaGccttaa PI437.654 246 3 tacctcAagtgt BSR101 247 3 acctcGagtgt PI437.654 248 4 actaagAattttg PI437.654 249 4 actaagCattttg BSR101 250 P PO p a ttatag cacttg 251 (1,2) 1 tatagGcacttg Mukden 252 2 tggtaCgttatg Mukden 253 2 ttggtaAgttatg BSR101 254 3 ataaaccAtatatgPI437.654 255 3 ataaaccTtatatgBSR101 256 4 agtgtgtT"ITtttBSR101 257 4 gtgtgtAAGtttt PI437.654 258 5 ttggtCcaagg PI437.654 259 5 ttggtTcaaggt BSR101 260 P P~ p a atagg aaaagg 261 (1) 1 aataggTaaaagg PI437.654. 262 2 aaccttTctgtc BSR101 263 2 accttGctgtc PI437.654 264 p p ~ p a ctaca atgaag 265 1 tacatGatgaag AKHarrow 266 2 aatcttGctgtg BSR101 267 2 aatcttActgtga AKHarrow 268 3 atgtgGcattga BSR101 269 3 catgtgAcattg AKHarrow 270 4 tatacaaTatctaaaBSR101 271 4 atacaaCatctaaa AKHarrow 272 5 tcaatgAtggata BSR101 273 5 tcaatgTtggata AKHarrovi~ 274 p p p a aaaat ttggatc 275 (1) 1 aaaaatAttggat PI437.654 276 p p0 p a gaatgaatttttc arrow 277 OA
( 1 ) 1 . aatgaaCtttttcCNS 278 2 aaggagGgaaaa AKHarrow 279 2 aaggagAgaaaaa CNS 280 3 aaatgaaCaaaaaaaCNS 281 3 aaatgaaAaaaaaaaAKHarrow 282 p p p a tttgtt tttatga . 283 ~
(1) 1 tttgttCtttatga BSR101 284 2 atctatGtatatta BSR101 285 2 . tatctatAtatattaPI437.654 286 3 gttgcCaaatca BSR101 287 3 tgttgcAaaatca PI437.654 288 P P~ p a agtgc tgaaa 289 (1,2) 1 agtgcACtgaaa PI437.654 290 2 gaggaGatgtag BSR101 291 2 gaggaAatgtag PI437.654 292 3 gatgaTtttagc BSR101 293 3 gatgaGtttagc PI437.654 294 4 taggGatttgg BSR101 295 4 aataggAatttgg PI437.654 296 5 ccatgTttggtt BSR101 297 5 ccatgCttggt PI437.654 298 p P a agga tat cc 299 OE
(1) 1 aggaAtatTccc PI437.654 300 2 agaaAAtcgcttt BSR101 301 2 gaaGGtcgGTC PI437.654 302 3 gttttttGactttt BSR101 303 3 gttttttTacttttg PI437.654 304 4 tatgatGtttcct BSR101 305 4 atatgatAtttcct PI437.654 306 P P~~ P a gg~c ggag 307 (1,2) 1 tggtacTggag PI437.654 308 2 tgcatAGcaaga BSR101 309 2 tgcatGAcaaga PI437.654 310 3 aactcGttgatg BSR101 311 3 aaactcAttgatg PI437.654 312 4 agtctGagmg PI437.654 313 4 aagtctAagmg BSR101 314 5 gtgtaaAcggg PI437.654 315
5 tgtaaGcggga BSR101 316 6 tgaaGaaaaaTatg BSR101 317 6 gaaCaaaaaGatg PI437.654 318 7 atgggAcgttg BSR101 319 7 atgggGcgtt PI437.654 320 8 gctttgttgttg BSR101 321 8 gctttgGTAttg PI437.654 322 9 ttagtGgacagt CNS 323 9 gttagtTgacag PI437.654 324 10 gccaaaGcaaata CNS 325
10 gccaaaTcaaataa PI437.654 326 11 agtgaTactgg CNS 327 11 agtgaGactgg PI437.654 328 p a 1 gtcac gagaa 329 (1) 12 gtcacGTgaga PI437.654 330 13 taatccAgggaa BSR101 331 13 taatccTgggaa PI437.654 332 14 attaattTagaagg BSR101 333 14 ttaattGagaagg PI437.654 334 15 gtctgCatatga CNS 335
15 tgtctgTatatga PI437.654 336 16 atgtgCtttggt CNS 337 16 atgtgGtttggt PI437.654 338 17 cggTaagAtgaa BSR101 339 17 cggCaagGtg CNS 340 18 ctcgcAccttc BSR101 341 18 tcgcGccttc CNS 342 p p p a tttttta agaaaaga 343 (2) 1 tttttaGagaaag Mandarin 344 2 aaaaaaaTttaagac Bell 345
2 aaaaaaaAttaagac Mandarin 346 3 ttttttttgtagtg PI437.654 347 3 tttttttGtgtagt Bell 348 4 ataatatATgaaam PI437.654 349 4 ataatatgaatttc Bell 350 5 aatttcGtatgta Bell 351 5 aatttcAtatgtac Mandarin 352
6 acatttGgattaa Mandarin 353 6 acamAgattaaa Bell 354 7 taaaaaaAtattact Bell 355
7 taaaaaaGtattact PI437.654 356 8 gaacaaTttgtaat Mandarin 357 8 gaacaaCttgtaa Bell 358 9 cttcaCgaagg Mandarin 359 9 tcttcaTgaagg Bell 360 P P p a gtttctcttat ac 361 SB
(1,2) 1 tcTTATCttatGac BSR101 362 1 tcTTATCttatAac S100 363 2 gmcTgataac PI437.654 364 2 gmcAgataac BSR101 365 P P ~ p a ttggg atgatg 366 SA
(1) 1 tgggGatgatg PI437.654 367 P ~ p a atctaa atttagt PI437.654 368 (1,2) 1 aatctaaAatttagt BSR101 369 P ~ y ~ p acttc agtgga 370 (1) 1 cttcGagtgga PI437.654 371 P a ~ gacaat taaaaa 372 (1) 2 gacaatTtaaaaa PI437.654 373 p p a aagaat ttccta o 0 374 (4) 1 aaagaatAttccta BSR101 375 2 atgtgtttggttt Tokyo 376 2 tgtgtGTttggt BSR101 377 3 tatmtaaatcg Tokyo 378 3 tamttTaaatcg BSR101 379 p~ ~ N p a ttaatta cttaag 380 (1,2) 1 ttaattaTcttaag Archer 381 2 tgcaaaaaaataaag BSR101 382 2 tgcaaaAaaaataaa Archer 383 2 actttatCttmg PI437.654 384 3 aagtCcaGcac Bell 385 3 , aagtCcaAcact PI437.654 386 3 aagtTcaGcact BSR101 387 4 atgCcCttttgt PI437.654 388 4 gatgCcAttttg BSR101 389 4 ggatgTcAtttt PI340.046 390 5 tttgctTtgtatg PI437.654 391 5 ttgctCtgtatg BSR101 392 6 agaattCgcatat PI437.654 393 6 gaattTgcatatc BSR101 394 7 tcatttAcccaaa PI437.654 395 7 tcatttGcccaa BSR101 396 8 tgtgctTatgag P437.654 397 8 gtgctGatgag BSR101 398 9 taattTAgcttaag PI437.654.399 9 ttaattCGgcttaa BSR101 400 10 ctgaaTgaggg PI437.654.401 10 ctgaaCgaggg BSR101 402 11 aattgtaAtcattg PI437.654 403 11 attgtaGtcattg BSR101 404 12 gataacCactca BSR101 405 12 gataacactcatt Sanga 406 13 agaataatatgcg PI103.091 408 14 attatatCacatgt PI340.046 409 14 attatatTacatgta BSRi01 410 15 tatataGggctg PI437.654 411 15 atatataTggctg Sanga 412 p p a gtataaa aaagg . 
413 (2) 1 gtataaaaaaggg Lincoln 414 2 atccaCtaaatg PI437.654 415 2 atccaGtaaatg Lincoln 416 3 tcgataActtattt Lincoln 417 3 tcgataGcttatt PI437.654 418 4 atacGTTAGaaga Ogden 419 4 tatacTCaagact PI437.654 420 4 acTCTAGaagac Lincoln 421 4 tacTCTGGaaga CNS 422 5 aaatagTtaagatt PI437.654 423 5 aaatagCtaagatt Ogden 424 6 taacttaaAataaaa PI437.654 425 6 aacttaaGataaaa CNS 426 7 ttTcAtattaacc PI437.654 427 7 ~ ttGcGtCCTCA Ogden 428 8 aatcttCtataatc PI437.654 429 8 taatcttTtataatc Ogden 430 9 ttacaaGtttgag Ogden 431 9 ttacaaAtttgagt PI437.654 432 10 aaagaCtacttaa PI437.654 433 10 aaagaTtacttaaa Ogden 434 11 agattcAtatgttt PI437.654 435 11 ~ attcGTATGtatg Ogden 436 12 atacatACaaataa PI437.654 437 12 atacatATaaataag Ogden 438 12 atacatGCaaataa Lincoln 439 13 tttgttTGagaaaa Ogden 440 13 ttttgttCAagaaa PI437.654 441 13 tttgtiTAagaaaat Lincoln 442 14 ttatttCtttAttattPI437.654 443 14 ttatttTtttAttattLincoln 444 14 tatttCtttGttatt Ogden 445 15 aaatggCaaattg PI437.654 446 15 aaatggTaaattgt Lincoln 447 16 agttgGtctttg PI437.654 448 16 tagttgAtctttg Lincoln 449 17 attaacaGtaaagt PI37.654 450 17 tattaacataaaAgt Lincoln 451 18 ataaaagaatatat PI437.654 452 18 ataaaaAaatatatLincoln 453 19 cttctttCatttttPI437.654 454 19 cttctttAatttttaLincoln 455 20 tttactatGaaagaPI437.654 456 20 tttactatAaaagaLincoln 457 21 aacattCactataaLincoln 458 21 gaacattAactataPI437.654 459 22 taacattTgcataaLincoln 460 22 aacattCgcataa PI437.654 46-1 23 ttatataAtacataaPI437.654 462 23 ttatataTtacataaLincoln 463 24 cctcaTctaatg PI437.654 464 24 cctcaActaatg Lincoln 465 25 atccttGtttttg Lincoln 466 25 aatccttTtttttgPI437.654 467 26 aatccCcagaaa Lincoln 468 26 taatccTcagaaa PI437.654 469 27 atggaAgcgtc PI437.654 470 27 tggaGgcgtc Ogden 471 28 ggttgTggcg Ogden 472 28 ggttgAggcg PI437.654 473 P ~ p a actgc tataaca 474 (2) 1 actgcCtataac AKHarrow 47S
2 aaaattGccacg BSR101 476 2 aaaattAccacgt AKHarrow 477 3 ttctttTgtgaca BSR101 478 3 ttctttGgtgac AKHarrow 479 P 8 p a ggtga aaaaag 480 (1) 1 ggtgaAaaaaagt Resnik 481 2 agtcaCattattc PI437.654 482 2 tagtcaTattattc Resnik 483 3 taggtaCaaagtt PI437.654 484 3 aggtaTaaagttc Resnik 485 4 ttctctGtgttg PI437.654 486 4 ttctctCtgttg BSR101 487 5 ttgtgaAcataca PI437.654 488 5 tgtgaGcataca BSR101 489 6 tgataGatcttca Lincoln 490 6 gtgataAatcttc PI437.654 491 7 tacctactatgat PI437.654 492 7 tacctaTActatg BSR101 493 8 agaattAtgttgtt PI437.654 494 8 agaattGtgttgt BSR101 495 9 agtatttAtaccaa BSR101 496 9 agtatttTtaccaa PI437.654 497 10 ataaaaaGgatgaa BSR101 498 10 taaaaaTgatgaag PI437.654 499 11 tatattAgaggat Resnik 500 11 tatattTgaggat PI86050 501 12 atttatAcgagga PI437.654 502 12 atttatGcgagg Resnik 503 13 ttgtattATcaaatt Lincoln 504 13 tgtattTACcaaat PI437.654 505 14 gmcaAacaca Resnik 506 14 gtttcaCacaca PI86050 507 15 gatccGtatcc Resnik 508 15 agatccAtatcc PI86050 509 16 tatccGaccca _ 510 16 tatccAacccat Resnik 511 Pn~ P a ggaac ttacc 512 (1) 1 tggaacAttacc PI340.046 513 2 atgcttAactaac Burlison 514 2 tgcttGactaac BSR101 515 3 aaacacATaaaatg BSR101 516 3 aaaacacaaaatga Burlison 517 4 tcctgtAttttag BSR101 518 4 cctgtGmtag PI88788 519 5 cgttttAAaaatm PI88788 520 5 cgttttTTaaatttt BSR101 521 6 tgataaaGttattta BSR101 522 6 tgataaaAttattta PI88788 523 7 tatttacTggttt BSR101 524 7 tatttacAggttt PI88788 525 8 aattttaTatttatg PI88788 526 8 aattttaGatttatg BSR101 527 9 aatttaGcacttc PI88788 528 9 aatttaAcacttc BSR101 529 10 gtgcGTtaTgc Burlison 530 10 gtgcTAtaAgct Proto 531 10 gtgcTTtaAgct BSR101 532 11 atttttTGgtatg PI88788 533 11 atttttTAgtatgg Burlison 534 11 aamttGAgtatg BSR101 535 12 gaaaaatCaaggt BSR101 536 12 gaaaaatAaaggta PI88788 537 13 ggtttTgccga PI88788 538 13 ggtttGgccg BSR101 539 14 tcaattCtcttag BSR101 540 14 tcaattTtcttagt Proto 541 15 tttatttTTaaaaaaa BSR101 542 15 tttatttTAaaaaaaa Proto 543 15 tttatttATaaaaaaa Burlison 544 15
tttatttAAaaaaaaaPI88788 545 16 atmtGaaaattc BSR101 546 16 cattmTaaaattc PI88788 547 17 mcaatTtgtcat PI88788 548 17 ttcaatGtgtcat BSR101 549 18 acagaaCtcaac PI88788 550 18 acagaaTtcaaca BSR101 551 19 ctatTgaAggttt PI88788 552 19 tatGgaGggm BSR101 553 20 gactaaAgtgag Burlison 554 20 gactaaTgtgag BSR101 555 21 taacacTtaacac Burlison 556 21 taacacAtaacac BSR101 557 22 gtaaagttcaag Burlison 558 22 taaAGTTagttcaa BSR101 559 -23 agaaaaCaactgt BSR101 560 23 tagaaaaTaactgt Burlison 561 24 gccAAAACAGTt BSR101 562 24 caagcctaatag PI340.046 563 pK p a actata ttcgc . 564 ( 1 ) 1 actataCttcgc Lincoln 565 p 00 p a atgcc gcatg 566 (2) 1 atgccTgcatg AKHarrow 567 2 gtgtatGgttgt BSR101 568 2 tgtgtatAgttgt AKHarrow 569 3 gcatGtcgac ~ BSR101 570 3 gcatCtcgac AKHarrow 571 4 aaaaagTatgagg BSR101 572 4 aaaaaagCatgag AKHarrow 573 5 . gtgggcattag Williams 574 5 gtgggCcatta Mandarin 575 p p a 064 tgtgaa attagc . 576 ( I ) 1 gtgaaCattagc BSR 101 577 2 tgtatcGtaaatc BSR101 578 2 tgtatctaaatctt PI437.654 579 p a taaaatt ttggtt 580 P
(1) 1 taaaattAttggtt PI437.654 581
† Soybean populations used to confirm the map location of each locus as an ASH marker were (1) PI437.654 X BSR101, (2) Bell X YB17E, (3) B152 X Century 84, and (4) Shoemaker and Olson 1993. ~ indicates the marker's map location as an RFLP or AFLP marker, which was not confirmed by ASH.
‡ Upper-case letters show polymorphic nucleotides in soybean.
Table 3. Forward and reverse PCR primer sequences for soybean marker loci.
Amplicon   Forward primer   Reverse primer
pnausi3uagaaggttgtaagagtctcggtcgtcg caaacgctcccaacagcucagaatctc p ggaatcaacttcacgtgagtgggagac 8 BggaB~B~aB~tgac p U6 tggtggctatggaaatctcatgtgtgga ctctcatttaccaaactccaacaatgatcacc p gaagaagattccacccagatcatcatcagtag cacactggtagatgggaagcaagaatagg p gcagtgcagctccaatcggtgtaac gtgcaatccaagacatctggacggac phalOti~cagctaaaccttacaaggatgattggtcaag ccctggactgaagngccataa"tgtatc
p a gctggttgggagaaagcacucc gtccagggctgtgaagagcaatg -p a gagggactatgtgaaatggagaggagtg gtatgctaaaagaggagacttgactggigag p gatgaaggaaccaacacttgcataacaamg tgtcagcactcctcacccamgccga S p a gagaacaaggacaaatcaataggtgagacgaagaaa cacttaacaggagtgcccctgatcaccag p a ctcatctgctcagaaccttcagtcagtc cggatcatgtctagtacactagagatgcttgtg p a ctacacttctaatgcctatttaggtgtgcttg gtcatatctagggagatttctaaccagttgtc p a taggcagcgtgacaaactgagcatagg gggttgat tcc at g g gggtaaatgaagttg p a gcactaatactttagttgactmgaggtggagag gccatgtggtagaagtatatgaaaaggtagatgacag ~
1 p a gaatatactagcttgatgcctatttgmctaaaccc agcagtcatacaatgctctttattgtggtgaag-p a catcttgaccagccaccaatctgagtacag ggcttcagtgagaaaggtggatcaaatgga p a ccaatatggaatgcagaaagtggatctatgttcag tcggtgtgttcaactatgacmggactg p a cccccaacaaaactaaaaatagaaccctcaacaacc ggtccaagaacattatgatcttgaacaaccttcac p a gcagaagaagaagataacgtacaagcatcaatcaagc caactcgtgcgatcctgagaaaactcc 15 p cattgggcatgattcttgaatagcctttttacc tggccagagcatcaacacatctataccttc p aaggtcggcttggtggttaaaggcag gagggctatguttcttctccagatgtgag p cgcataaccaaaacagtaacatcaatggaacag ccaaggaaggttggggatatagactagtg p a gaaactaaccttctgagaaaaccacacgttg gccttttatgcacattmcctggggatctaac p a caaaaatagcactggcagtacctggaacac gaaaaggmgttatgcttcgtactctgtctc p catatgtgctcagtgcctcc taccaattcaacacactgctac p a gatcgctactatgtataactatttgac tcctccgttactccaatcag p a gattacattaattaccgctatgactatatcttgggac gctaccactccattgcttctatgtattggtc p gtctcgctaatgggagtgaaac actagggtgtaaggaaaaggattc p a gagattggaaattgtagctctctttacttgctg cmgaggactcantggttgttattataggcamgg $ p a gaccgggctcgtaaaagcaatgggaatcat cctcctacagaattgtcaggcnatcatcg p a ctmcctacaggattgtcaggcttatcgtca gggctcgtaagtgtaatgggcatcag p a tttatagtgtcggtgcgtagactctgttactc caccaacttcatgcagctaggtcccaa p a ccaggtgtgtacatataaaaccaaacaggctc gaagagtaacagagtctacgcaccgac p ttatgggtttccttgtgttgcncttttctcc cccaaccgcttcctatatatcaaaatgatttcac 3~
p a gtgcaatctcatactggcttttcacgatgatag gctctgccctttgtcaacatcctaatgg p a gggaagaagaagaacactcggtacagtag aagctaggaaatccacactcaaattatcgacttgtgt p a cctgcttcttaatcttgttattctg cgaacaatatgatgagttctacaaag p a cactaagaattcgctcgctgta catcatcctgtccccctcct p a gaacracatggttattttatatgctac gatcctctccaatggcacacc p a gatgggaagatttacttctggtagac ggctctaaaccagttttattgacaag p a gcmgcatgaggattgataaggca gcugacatatagaaaatctgtgactga p a caccatcccttatttaattagtgcttaaaagattcttc ggagggtgcttatgtaaatgatgtaaagascat p a ctgcmtggactctgttgggataaacttctc catgaagctccaccamgctagtacatgaaac P caattcatggtttctcttat gtaacgaccacggttatc 40 p a gagttgcttcaaugctcatac ccuagaggcgagagtgtgg p a cttgcctcgcaccttctc ctagccttctccacaugttc P gattcggtagagtaccagatgat agcaaaacgccaccamcc p a tgttggcaggatctttgtgac atgctggaugtcttttctaccc P a gctgccaggttaagtgtttc cagattgacacgactggaag p a t gcaggacattcacagtcattg ctgctgtattacgttccattcttac P cagggtgggatgtccaatg c agactagagtatgaataatctacctc P a ctgaaccacttaaagagtcaacagc ctctcccatagatgacaaag P a gaatttgatcactcagaaaagggt atatgaatatgaatcaaccg P a a acggttgaccactacacag a gacgcaggatatggtgaag $
0 p a gacaacataaaagagtctataagacctccaagg taaatgacgagtatttatatgaagtgcagg P g ccgtgtaagcgtgtuaccaatctagttgc gctactgcaagttatcagtcaagagattattcc P a g tgctcaucttcacagagaag c aaactt t tt g B B ggBaB~ag P a c tcatgtaaccaactctctatgaagmgagatccac tctaatcggatttggtgtttcacacggtaag P a g atggctgtcattgccacagaggagtatcg tgactccaaaggaaagagasatgtttcaaaatcatc $$
P g gatagcaagtcaatttcatgccttgtgatagg caggacatgaagatgtacttagtgaatgtgaag P a ctggaagagggtgcaagggaatctgg agaacctagtctaccaccataccacgaac P c tcagtcgcattgcattaacatca caatcttcctccaccttctc p a caacgatctctccatgtccaac cggatgtaagtaatatgttatgttgg p a cacgggtacatgcatccttc gagaaattnagtggcaatgagtc p gccctacaatgacraaattacgtgtg gactgcttggctmgtctttcaagaagag
Table 4. Sequences of nucleotide polymorphism regions.
pha08230 (SEQ ID NO:712)
TTGAAGGTTG TAAGAGTCTC GGTCGTCGT TGTTCACCAA GGTAAGAATG
GCAGTCCCT GTAATTAACA AAACAAATTC AATAACATAA TTTTATGAGA
AGTAAAAGCA TGTGATGATG TTTATTAATT AATTAAATAA GTGTCATAGT
AAGATTAGTT TTTATAAAAT T[G/A]TTGG TTTTGTTTTT ATTTATTTAA TAATTTTTCT
TAATCCGCAT AAAATAATGA CATTTATTTT ATTTTGAGAG GAAAGGAATA
CTAACCGTTA AGGATAACGA TGAGGTAATC AGCGTCAGCA TGGTGGGGGA
GAAGAAGGGT GTTGGGTTTG GAGTTGAACT CCAAAATGCG GTAGTCTCG
GAGATTCTGA AGCTGTTGGG AGCGTTTGT TGAACCTCTG GAGGACGCGA
ACGTGGCCAT ATTGGTTTTT GAAGAGAGTT TGGAACCTTT TAGAGTTGAA
GTGAAAAGGG TTCTTATTCT TATGTCTTCG TGGTTCTCTT TGAGACTCAG
AACCTTCACT TTCTTGGCTC TCTTTGTCTT GCTCCTCATC CTCGTCTTGG
TCTTCTTCTT CTTCTTCACT TTCCTTTCCT TGGTGCTTTT CCTGCTTGTG
TTGCCATTCG TGCTTTTCCT CTTCCTTTTG ATGAGGTTGG TGTG
pha10598 (SEQ ID NO:713)
CTGCAGCTCC TTCTGCATCT TCCACCTCAA TAGTGATGGN TAACTCTATG
GAAGATGCAG TGGTAGTGTG GTGACCAGAT GTTCCTCCAC AATAATGCTC
CATTTCTTTG ACCTCAGGGA TTCCAGTGAT CTCCATGTCT TGACCTCTGC
CCACATTCCA TTGTCTTGAC CTCCTCCCAC ATTCCATTGT CTTGACCTCC
TCCCCCTCTT GTGATGTTAT CAAAGGTGGC TCCTCGTTTG TCCAAGAATA
AACAGCTACA TACATCACTG CATTCTGAAT CTGAAACACT CTCCTCTACT
CTTGTTCACA TGGTCTATTG AATGGGAAT CAACTTCTTC GTGAGTGGGT
TGTTCATTT TTGACCTTGT CCCCAAACTT GATGCCTTAA TTTTGCATCC
ACCTTTCCAC TCACTGCATG ATTTCTTCAA TTCCCCTGAC CTTGATATAG
AGGGTTGTGC CTCTGCCCCT TCTCAGGAAT GA[C/T]TTT GACTCTTCAA
TAGCTTGTGC TGCATTTGCT TCAACTCTTA AATGTTTACC TTGTCAGAA
CAAGGCTTCA ACCCACAAGA ATTGGCTAG CTCCTTCAGG GACTCTGAAC
TCCGTTGGGG TCACTGCTCG ATTCTTGTTT GGAACTAAGC ATTAGCACTG CAG
pha10615 (SEQ ID NO:714, 715)
CTGCAGAGAG GAGTGGTGTT CATGCTTTCC CTGCTGGGGT GTGCCCACTG
TGGGTGCTG GTGGCCACTT CCTGGTGGT GGCTATGGAA ATCTCATGTG
TGGATAATA TCATTGATGC AAGATTGGTT GATGTAAACG GTAACATACT
TGACAGAAAG TCAATGGGGG AAGATCAGTT TTGGGCCATA AGAGGAGGTG
GTGGGGGAAG TTTTGGTGTC ATTC[A/T]T TCATGGAAGA TCAAGTTTGT
TTTTGTGACT CCAAAAGTGA CTGTTTTCAA AGTGATGAGA AACTTGGAGT
TGGAAGATGG TGCAAAGGGT CTTGTTTACA AGTGGCAATT GATTGCAACA
AAATTGCATG AAGATCTTTT CATAAGAGTG ATGCATGATG TGGTTGATGG
CACTCAAAAT GCCAATAAGA AGACCATTCA GGTTACTTTT ATTGGTTTGT
TCTTGGCAA GGTGATCAAA TGTTGGAGTT TGGTAAATGA GAGTNTCCT
GAATTGGGTT TGGAGCAAAG TGACTGCATT GAAATGCCAT GGATCAACTC
CACCCTTTAT TGGTTCAATT ACCCAATTGG GACCCCCATT GTTTGATGTG
CCCAAAGAGC CCCTTTCACA TAGCTTCAAA ACCATGTCAG ATTATGTGtIA
GAGGCCCATT AGAGAAACTG CTCTTAAGTC CATTATGATT AAAAGTGAGA
GAGAGTGTGA GGATGGAATG GAATCCTTAT GGTGGAAAGA TGCATGAGAT
TTCACCATCA GAAACTCCAT TTCCTCATAG AGCAGGGAAC TTGTTCTTGA
TTGAGTACTT AACATCTTGG GGGCAAGATG GGTTTGGATG CAGGTAATCC
GTTACCTAAA CATTTCAAGG TCA...(~1300 bp)...AACATTAAGA CTATATTTAG
ATTTTTTTTG AAAGACAAAA GAATATTATT GATGATGGTT TGGCCCGCAA
GGTGTACAAA AACATGAGGA TAACAGTTTT TCCTCCCAAA ACAAAAAGCT
CGAGCACCAT ATGACTGAAA AGTGAAAACA AGACATATAC AATGCAAGTT
AAATTTAGAG ATATATCTCT AAATGCTGCA ACTAAGCCCG GAAAGTCAAA
CAAAAGCTAA CATATATCCA TCTGCAG
pha10618 (SEQ ID NO:716, 717)
CTGCAGCACT CGAAAGTGAC CTGCATCAGC CTTCAACCTT CACCCTTATG
CAGCAACAAC CAAAACCATA ATTTAAAACC TATTTTGAAA CATATCCATG
TACATTTTCC CGCTATCAAG CCTGTTCTTT TAACAATAAT CATTATTCTC
TTGTCCAAAA ACTGGAAGCA ACTGAGATCA AGACAAATTG AGTGCAAAAT
CTACACAATC TATGTTCAAA ATATTGAGTG CAGGTCTCTT TCTTGCAATT
CCAAGTCGTA TCTATTGGGG TCACACACAA ATGGAATATC ACTAAAGCTA
TCGAAGAA GATTCCACCC AGATCATCAT CAGTAGCTA TTTGTGTCTG TTGATCCTTT
GTGGCTTCCC CATCA[T/G] CTTCTTCTCC [A/C]GAAAC ATTTGAATTC TTTTCAACT
AATTCCCTAA ATAACGA...(~410 bp)...TGATGGTATG TGGAGATTTA
ACTTACTGTA GGTTCCAAGG TAGAGATATT TGTTTCCCAA AAACCCTT_C
CTATTCTTGC TTCCCATCTA CCAGTGTGG TGGTGCCTGC AAAACATTCA
CCATGTGTTA GTGATTCATC TTCGCAAGCA ACAATTCAGA AGCCAATTCG
AGTCGTATTC ATCAGAAATA TCCACCTCGC AACACCCCTA TACTTTGATA
CACCTCTTGA AAAGCCACTG CTCTTTCTGA TCAATTTCAA AATAACCAAA
GTCAAAAAAT CATATATAAT ATCAAATTCT GTACAACTTT ATATGCTGCA
AACACTACTA CTATAGTGCT ATTGCTATCT CTTATAAAAT ATAAAACTGC AG
pha10620 (SEQ ID NO:718) ACTGATGCTG GGCGTGTTCA TAACCTTGCT CCCATACTAA CAGAGCATCT
GTTTTCCTTC CCAATGCAGA AAGGGCGTGA CCTAGATGAC AGAATCAACC
AGTGCTAAGT TCAAAAGACA AGGCATTTGC CTCTATGACT CTTTCTATGG
TAAAATTTAA ACAAAACTGT CGCGTGATGA TAATAATCCA TAAACAGAAC
GAATAAATGA TAAATCCATC AAAATATAAA TTCCAAGTTC CAACACGGTT
CTTAACTTTT TATTCATCAA CCAGTAACCA CCTCTGTTAT TCTTAAAAGA
ACTTAAATCG CACTTCCCAG GGTACAAAAA GTTAAAAAGA TTCATCAAGA
ATTGGATGAA TTTAACAAGC AAGAGACCTT TGAGAATGTA GGCTTGAAGT
CGCGAGGGGT CAAGCTGAAG CGCTTTGTTG CAATCCTTAA TTACGTGCT
TGTGCAGCTC CAATCGGCTG TAACAGAAG GCCCTGTTAC TGAAAGCAAG
CATTGAACAT CACATTAGCA TTCACGGAAG AGAATTAAAC TTAATCGTGA
T[C/T]CGTG ATGAAAAAGA AACAGAATCG GAGAGGAATG GAGGTCCGAA
CCAGATGTCT TGGATTGCA CCAGACTGA GAAACGAGCG AATCGAGGAC
TCGAATGGCC TTGGACCAGT CCTTGGAG
pha10621 (SEQ ID NO:719) CAGCTAAACC TTACAAGGAT GATTGGTCAA GAAAAAGGA ATTCCCAGAA
ACGAGGAACT CACTCAGGCT ATGATAGTGA TGATAACATG TGTGAACAAC
GTTCTCTTTC AAAGAATAAC ATAAATGCAA TTCGTCATTC ACAAGTAGGG
[T/C]CTCT GAATCTATTT TCTTCATACT T[C]GTACTC TATGATAATC ATACTTTCC
TGTTCCCTAT CTTTCACAAC ATAAAATCTT GAGGATTAAG TGGGGAGAT
TTTGGCTACT ATTTATATAT CTAATAATC ACTATGAATA TTGAATATGC
AGTCTGGTGT GTGACACACA TTTGGGCTAA AACCAAACTC ATATTTAGTT
GAAAAAAATA AGACTCGTGG ATCTTCTATT TATAATATGG GCATACATTC
TTATATGGAG TA[TAATCTT TTCAGGTTCC ACGAGTA]AT TGATACATT
ATGGCAACTT CAGTCCAGGG ATATGCTGC CAGCTATpha 10623 AATT GAAATCTATT
ATCAATGGTG AATTCCCTG CTGGTTGGGA GAAAGCACTT CCGGTGAGT
AAATTCTAAA CTCACAGGTT TTTCTTACAT GTTTGTAACT TTTCAAGGTT
TGGATT[AC/TT]ATACTAT CCAAAGTTGA TCATGA[ATC/TGA]TTTCAG [TT/CAAGTA
TCA]TGATTT GGTAGTATGG GGCTTCAAAA AGTTATTCCA AATATTAAAA
CCTGGCTTCC AATTTGATTT CAGACATACA CTCCAGAGAG CCCAGCGGAT
GCCACCAGAA ACCTGTCTCA AACAAACCTT AATGCCCTTG CAAAGGTTCT
TCCCGGTCTG CTTGGTGGCA GTGCAGATCT TGCTTCTTCC AACATGACCT
TGCTCAAAAT GTT[T/C]GG GGACTTCCAA AAGGATACTC CAGCAGAGCG
TAATGTTAGA TTCGGTGTTA GAGAACA[C/T]GGAATGGG AGCTATCTGC AA[C/T]GG
CATTGCTCTT CACAGCCCTG GACTGATTC CATATTGTGC AACCTTCTTT GG
pha10624 (SEQ ID NO:721, 722) CTGCAGGTAC TTTTTATGAA TCAACTGCTT CTATTTCACA GCTTGTTTTC
ATTTATCTCT TCTAAAAGGG GAAGAGGGAA AAATTGTATT ATTTGCTAAA
ATATGATTTT CTTTTATCTA GAGGGACTA TGTGAAATGG AGAGGAGTG
CTATTTCTTC GGTTCCTTTT GTCACTTGTT GACGAATCAG AAAAGATAAG
GCAGTTAGCG GATTTTCTCT TTGGAAATAT TTTGAAAGGT TTGAA[G/T]
CTTTATCTTG ATAGTTATAC TTGCCATTAT ATATAAATTG GAAAATATGT
TTGTCATTGG ACTTTAATTA TTGTGTTTTT CTCACCAGT CAAGTCTCCT
CTTTTAGCAT ACAATAGTT TTGTTGAGGC TGTTTTTGTT CTGAACGACT
GTCATGTCCA TAATGGGCAT CGTGAGTCTC AAGGATCACG AA...(~970 bp)...AAATCTTGAA AGAATTAATA ATAAAAAAAC TATTTTATGG GGACAAAATA
CTTATTTAAG CCTTCTAATT ATAACATTAA TTGTGGGATC AGGATGCTTT
TCAAATTCTC GGCTGTAAAG AGATACGCAT TTCATCCACT CGTGCATCAT
CTGAGTCAGC AGATGTAGAG GAGGpAGGGG GAGA
pha10632 (SEQ ID NO:723) ACAAGCAGAG TACACCACGT TAGCAATCAG CAATCAATGT CAGAGCACAA
GAAACTGAAT CTAGAAAGAC CCTATCAACT TTCCTGTTTC CATTGCATGG
CCACAGATTG TGTTTTAATA AAGTTCAGTA ATTGATCACA CTCATTGTTT
GTACAATCTA TTATTCAAAA CTACTAGGCC TACAAATCAT GTTTAGCAGA
ATATACACTA AATAACCCAT CTGACAGACT CAGAATGCAA CTAAATGAT
GAAGGAACCA ACACTTGCAT AACAATTTG ATCACACGGT GTTTCAAATT
TCACATTGTA AAGGTAACCA AAAAAGAAGA AAAAGAAT[A /C]TTCCTAG
ATGAAATTTA CAAGCAAGTG TAAAACAACA CATGGTTAAA TGTGT[GT]T
TGGTTTTTCT TTGTTATTAT TATTATTTTT [T]AAATCG GCAAATGAGT GAGGAGTGCT
GACA[A/G] CACACTTATT ACTTGTTAAA ATTTATGGAA AACTAAAAAA
ATCATAAGAT TTAGTGAGAC TAACAAAATT TAGTCAATAA AAAAAA
pha10633 (SEQ ID NO: 724) CTGCAGTGGC ATTAGGCTCA TNATCACAAA ACAAATCCAT AACAGAAAAT
TACCGAGTCT GAGAACAAG GACAAATCAA TAGGTGAGAC GAAGAAAGN
AAGGACAGGC AATTNATGAT TTTTAAAAAG GAAAAGCAAA GATAGATGTT
AAATAAATTC CAGTGTTGTG GCCTCGANAG ANAAATTGCT AAGATAAAAT
CTAA[A/C]A TTTAGTACTA TAGCAAGAAA CACATCCCCA ACATACGTTT
GTTGAATATC TATGATATAC TTCTTACATT GATCAGTTAC TGTCTTCCAT
TGATTTGGGG TCCATTAAAA GCAGTATTC TGGTGATCAG GAGCACTCCT
GTTAAGTGG GGAAAAAAGT ATCAATTCTT TAAATGCTGA
pha10634 (SEQ ID NO: 725) TTCTATCATT AGCTCCGGAA TTGTTAACTC TTGATAACAA GTTTTCTGTA
CTTACAGGTG CAGCCCTCTT GCTGTCAATG CTGCTCAAAG TTATCTCTTC
WO 99/31964 ~0 PCT/US98/26935 CCTTGGTCCC AGTTGAGTAT TTATCAGTTT TTCCTGAATA AAAAGCAATC
AAGTATTCAA TATCACAGAA TACCTATGAA ATGTCACAGT ATAAAGGAAT
ATTATGGGAA CAGTACCCTC AATGCAGAAT TTTCCATCC TCATCTGCTC
AGAACCTTCA GTCAGTCGA GTTATTTCTG ACTTCAGAGA CACATTCTCA
GCAGTTAACA TCTCAACTTT TCGTGCCAAT TCTTCAGTCT CGGCCTAAAC
AGAGGAAATA TGAATATTAG TTATTAACTT TCCTCTTGTC AGAACAATCC
TTTTCCTTT TTAGTTATTT TATTTTATTT TTTGAGGCTA CATATGTGTA
CATGCAACAA GTGAATAAAC ATGTGATAGT AATAAAGAAA CTAAACAATT
TGTCTTTTTA TCAGACAAGA CACGATGGAA GGTTTGCAAG TCAACAAACT
AAAT[T/G]T ATACCTGCTT CCTCAGCCTG GACCTTCTAG CAGATTCACG
GTTAGATTGT TTTCTCCTCT CCCGTTTCAG CTCACGCTCA TTCTAAGGAA
TTAAAAATAC ACACAGGAAT TAATTTACTG TCAGTAAAAG ATGGACGGTA
TTGCATAATT GTTCTAATGT GTTTCACAA GCATCTCTAA TGTACTAGAC
ATGATCCGA GACATTAGAA CAAACCATTA TACCTGTAAC CAAGTTTCAT
TACGCACAAC TGCACAAGGT TGTGCGGCAC TTGTGGAATT TGCCTTAGAA
TGAACAGTCG AAGGGTTCCT CAGCTCCAGT GCTGTGGCCA TACCTGAAGA
AACTACAGGT CCAACTAATG TTCCTGCAAC ACTAGCTGGG TAGCTGACAC
AATCTTTTGG AAGATGCAGT CTCTTGGAAG CCCGGACCAT TTGTAGCTCA
GTTTTCCCTT CTGCATCTGC TTGTACAAGA TTGATATAAT ATATAACCCT
CAAAACACAT ATTATGTTGA CAACTTAATT CCTAAAATTA ATAGCTGAAG
TTACTTAAAT ATCACATTCT GCATGCAAGT ATGACATTTT CTTGATAATT
GCATACAAAG AACACCAAAA GCTTGACTGT ATTAAGTCTG AAAACCAGGT
AAAAACCCAT TCAAGTATCC CATTTGTCAA ACTAATCAGA AAGTTAAAAC
TTGGGTAATT TCATTAACTG TGCATAAGTA ATAGGATAAT ATCAACTTAT
CAAGACCATA GTACCATACT CAATGATCTA GTTGACAACA TTGCACAGTC
ATGACACCAA AAGTATAACT GTAATCCAAC ACAACTCTTT TGTTCATCCA
GATGCAACAG TGAATAGGAG TCATACCAGT GATCGGTGTT CCTTCTCGGC
TTCTTTTCCT TTTTGTTTGA TTAGCCTGAG CATAAAATTA ACAAATAACA
TAATTTAGTG CTTCAATTTG TCAGATCATT TAAATAANAT AGACTGCTAG
TGCACACAAA CAACAGATGT TAATGGTGAG AAAGTTCTCA CCCTGCAG
pha10635 (SEQ ID NO:726) CTGCAGTTTT TCCCAGAAAA AAAATGGTTA CGATTTAGAA CTAATGATGT
AGCACAAATA ATTGTTCAAA TCATTATGCT TCCAGCAACA AAAACTCAGA
TGCAACAATG GAGATGCCCA TGGCGAATCA AAATGCACTG GAAAATAGAT
TGGCACGATC ATAAAAGATC TTCTACACT TCTAATGCCT ATTTAGGTGT
GCTTGATAA AGATGAATTA GG[G]GGCAG GGATAGTTTT AAGAAATGAA
AAATGTTAAC TTGTTACTAT ATGTA[A/C] CAAAAGTAGA AACATCATCA
TTTCATTACA CCAAATGTTT TCATCTCTAT TTTAAGATGA CAATGAAGTG
AACATATAAT AAAAGAGCTC AATTTCGATA CACTTGGTAT AAAAGTTTTT
ATATTGACA ACTGGTTAGA AATCTCCCTA GATATGACT AAAATAATTT
TTACAAAAGT CAACAATTTT TAATTATGTG ACAATTTATG ATTGGATAAC
AGTGTAAAAA GAACATTTGT ATTGTTTGTG TATTTCCTAT TAAATTCATA
AAATAACATT TCTTTCATGC TTATTTTTAT GTTTTTCTTG GAAATACTTT
TCACCTATCC AACCATGCAC ATTTTTAACT AAAACCAAAA ATAGAAGCAA
TAACATACTT GAATTGATTG ATTCCATCAG CAAGAGCCGA ACCAAAAAGG
ACCATAATCT TGTCTGAGAT GCCAAAGGCA ATATTTTTCC TTGACACTAC
AGCAATCTGC AG
pha10636 (SEQ ID NO:727, 728) CTGCAGATTC ACTTCAGGAA TGGAACAAAA ACCACCCAAC CAAGCACTGC
TCACAAAAGG AATTTCATCT GTAAGTCGTC TAAGTCCACA TTCTCATTTT
TTATTTTTTT GCTAATTATT TTCAACATCT TCAAGGTTCT TTTGTTTTTA
ACTAGTAATA GCGCCACTAA CTACCATTTT TTAAGGGGAA AAAAATTGGT
TACTGTGTGT TTTTTCTTCT GTGGTCTTGA TCCAGACTAG TTTACATGTT
TTGGCGTTTT GGGTGTTGCA A[T]TTTTTT TTTGGAAAGG TTGTTATAGT
GGACTAGTGG CCATGGATGG TATTATGAGA GTTGATTAAG TAGGGGAAAA
AAAGGGTTTT TGTGGTTTTA AAAACTAGG CAGCGTGACA AACTGAGCAT
AGGGTTCCA TTATTTTTAA CT~TTTAT TGAGAAAAAA AATAATATTT
TTGAGCCGTT TGGTTCCACT ATA[C/G]TT CGCCATCATG AAAAGCCGTG
TCCTTTGTGG CTTTCTTACA ATTCTACTTT ACTCTTACTC TACTCAAGAG
TTAAGATTCC TTTTTCAAAG CAACTTCAT TTACCCATCG GACATCAA C
CTATTTATT AACTATGTTT CCATGTCGAC CTTTGATGTA CAACACAAAA
ACAAACCAT AGCCATGCCC TTGTATTCTT GGCCAAAACA AGAGAGAGAA
AGAGAGAGAG TAAAAATCTA TGCTTTTCCT CCTCGGGAGT ATCAAAATGT
TTGTCGGACT CTCCCAAAAG TCATCAGTCA CCTTTTAAAC TTTTAAGATA
TACATGATAT TATCCACAAC AACGGTAAAA TTATGCATTG TATGTCGGGT
TACTCCACTT AACTATATTT TATTAACTTC TAAAAAACAC TATTTACACT
CGTCACTTCT TTACATCGAA AATCTGTAGT ATTATACATA CATTATAGTA
ACAAATTTCA CATTAAAAAT AAGAGTATAA TATAAGAAAG TATGCACCTT
TAATAACGCG TAAACGTGAA TTCTTTTTTG GCTTTTGAAG CGTTGAGCCA
ACCTTTAAGC AAAATATTCA AATTTAACTC AAGTCATTGT CGGTAGTTAG
CTTATTTTGG TTATGATATG ATAGCCAGTT TAACGACAAT TCAAAATGCC
GACAGCATGA AA.....TCC AGTTTGGAAT AGTGACCAAG CCNAATTATA
ACTTTACGGA TCATACCAAN CCAAAGAACA TTGAGGTTTA ATATATATAT
ATATATAGCT TATGTCACTT TAATCAAAGC AGCAATTGCT GAAATTGCTG
GAGACAAGAA GTTGTTTAAT GTGGTGGCTT TTTGAGAAAT AACAGACAAC
CCCAATCTAA AAAAGTCCAG GAAGATATTG TTTACGTGTT GGGATTGAGA
TTGGAAGAAG AAGGTGAGAA TGTAAGAGCT GATACAGATT AGACTTGAAT
AGGTTGGGGA TTCCACTTGA TATATGGTGA TTATAAGGGC TNCNAAATTT
TGCTAACTTC AAGAAATAAA AACGTATTAA CTGATAAAAT GGAAGTTAAG
TCAACTTTNC TCTGCAG
pha10637 (SEQ ID NO:729) CTGCAGTTAC GAGTAATTGG GTAGGTTACT GCACTAATA CTTTAGTTGA
CTTTTGAGGT GGTTGAGAT ACTAGCTATC TATTAATCTG TTCTATAAAA
TACAGGCATA GCTGTTCTCT TGAACTATAG AATTTACATA GCCTTTT[C/
A]CCTATTAT TAATAACACC AAGACACACA TATACTGCTA GACACTTCTG
TATATGGGAT CTCTCAAAGA TCCTTTCATC TTTCTACATT TTATTTGGAT
ATGGGAATCA GGTGATCAAC CCTAACAGTT AGGGTTTGGA TGTGTTCTCC
AACCACCCAT GTTTGGTTGT GTCCTATAGC TCACACATTT TGAGACCTTT
TATCCTTTGA AGTTTGATTG ATTCCAACAT TCAGATTGTC TTTCCTCGAT
CAATTGGCCT ACTCAACCTT [T/G]CTGT CATCTACCTT TTCATATACT
TCTACCACAT AGC
pha10638 (SEQ ID NO:730, 731) CTGCAGCAAC AACCCCCAAA CCCCGTTGCC AGAAACAAGA CAATTTGTCA
AATGCAGAGC ATATATAAAA AATAGAGAGT AGGATCATCA CCTTTGCTGG
TTACTGGAAT ATTTTCTAAG TCCATTGAAC TCATTTTCTC TGATGCAGAG
CTCAAAGACC TGAAAAGCCA TTGTTTGTAA TAAGAACAGT AGACCATATA
TTTACTACCA TCAAGGTCTA AAAACACAAA ACAAAGTTCA CCTGCTAGCA
CCCCAAACAT TTGAGTTGCT TGATGAAAAA TATCATCTAC AATTGAATA
TACTAGCTTG ATGCCTATTT GTTTCTAAAC CCATATCCA AATA[C/A]T
AGTACCTCTC ACAATAGCTT TAAAGTAAAC ATGGAAATTC ATCAACCCCT
AAAT[G/C]C TCAACCTTAA TTGCCTCAGA GGCCATTTTA TTAATATAGA
TGGAAACCAG AAACATGTCC AACTATTACC TTGAAAAACT ATCATATTAG
CAACTAAGGA [G/A]GTTGC TGTATTTATC CTCT[T/C]A ACTAACCTGG
TTTGTGAACA AAGAAAATTC CAATTTTCTT TCAACAATAA AAATTCAGTT
TGGTTCCTAA TCACATTATG [T/C]AAAAG AGTAAACAAA ATGAGTTACA
AATAAAAGGT CAAATTTTAT CAAAAAATT CTTCACCACA ATAAAGAGCA
TTGTATGACT GCTCTTTTA AATTAATTTT CAATCCTTTT AAGTTGCAAC
AATGATAAAG TATCAATTAC CTAAGAGTTC CAGGGAAAAT TAATAAGAGC
AAATAATTTA AGGATAATAG TTTATATGTC ACATACAACT AAAGCAGTGA
TTCTTAAGGA GGTTTTCCAA GATTGCACTA TGGTTTTACA GCTAAAGAAG
AGGTTATGAA TTTATTAAAC AGAAGCTGGG ACATCAATGG ATAAAATAAA
TGGATCCAGA AAGGTACAAA CCCCATTATC ATCTGTTTGG AAAAATTTGA
AAGCAAAATC ...(~50 bp)...CCAACC ACATATAGAC TGCTTATTAA
NACCCATCCA AGATTTCTAC TGAACCAAAA CGAATATGAT CCTAAATATN
CATTGAATTA ATCTTACCCA GATGGGGGGT CATTGATTGT AGTCCCCTTT
CCAAAAACCG GGGGGTTCCT AAATAAATTG AAAATAAAAA AGATTAGGAA
GCTAACAGAA ATTTGAATAT AACGCTCCAA AGTTCACACC ATATCAGTCC
ATATTCCCCT TGAAATAGAA AACTAATACA ATAGAAAGTG TCACACTACC
TCAAATTCCT CAAATTCTTT GTCCTCAATT GTTTCAACAC TTGGCATGGT
ACTAGGTGGT GGATGAAGCA AGTCTTGTGC TTCTTCCCAT GTAATTCTCA
ACTCCACAGC ATCTTCATTA TGTGTCAGCA ACCTCTTACT CTTTGGTGCA
ATATTTAGAG TCTTCTCAGA AACTGAAATT GTTTGCTGC AG
pha10640 (SEQ ID NO: 732, 733) CTGCAGA ATTTTTCTCA TAATACAAGT TTAGAATTAG TTTCTCAAAA
AAACAAAAT TAAGAAAAGA AAAGTCAACA ATAAAGAAGT TTGATCCAAC
CTCTTTTATC ATGGACCCAT CTGCAAGCAT GTCCAATGGA AGCATCATTA
AATCACTTAA AGCATTAAGG AATTQGAAAG GTTTGAAGGA TGATTCACAT
TTGGATTCAT TGTTTTCATT GCTAACTTCA CGAGAGTCAC TATCATCAAT
GCTAAATAAA TCTGAAAGC CATCTTGACC AGTCACCAAT CTGAGTACAG
ATAATAAAA AGTTACTCAA GAGAACGAAC ATCATATATG ACATATATAT
CCTTCATGCA ATAGCAACCA AACTAATTTT ATTAGGTAAG GTTGGCTACA
TGGATGTATC TACGACATTA AGCTCTATCA AGCACCAAAT TTTCAGT[C/
T]AAACCATT TAGGGTAAGG TCTTACTAGA TGNTTTCTCA AGTTTTCTTA
GGTCTTTCTC CACCTTTTTT GATTGAACTA ACTTCCATT TGATCCACCT
TTCTCACTGA AGCCTCTAC TGGTCTTCTT CTCACATGAT CAAACTAAGC
GAGTTTCCAT CATCTTCTCT TCAGTTGATG CTATTATAAC ACTGATGGAT
ACAT...(~610 bp)...ACATCCTAAT GCATATTGCT GTGGGTGCAA TGGAAGTAAG
ACTTACAGTG AAAAATATAT AAAGATTAAG ATTAATATTG TGTTATTTAA
AAGAAGAAAG AATGCATTAA ACAACTTCAG TTTCATAACT CACAGCATTC
TTCAGTTGTG CACCAGCCCC AAAGCCTGAT TTTCCAGCTG GAATAGGAAG
AACCATAGAA TCACTAATGG GATCTGATAT AGGATCCATG GGCATCTCTT
CAGCGGATTC ACGAAGAATA GCATTGAACA TTGCCACATC AAGTCTACTA
ACCAACTGCT CCATTACCTA CAACCACAAC AACAGTCCTA TTTCAATCAG
GCTGCAG
pha10641 (SEQ ID NO: 734) CTGCAGTCAA TCATAATATT CAACCATGAA ATCGAAGTAG TATCTAACCT
GTCTGCTCTC TTTTGCATTT CTGATGAGTC CATCAAGACA AGTATTGATG
ACCTTTAGGT CATCCTGAAA CTTTCTTTGC CTTGGGATTA TCCACCTTGC
CAATGGAATT TTCCAATAT GGAATGTAGA AAGTGGATCT ATGTTCAGC
TTCAAAAAGA GTGCCATAGA CTGC[C/T]T ATAACAGACT TAAAAGCAGA
GGATATATGA GCTCTCCTAA CCAAAGGTAA ACAGAACAAA TAAAAAAGTA
AGGTCTTGAA AATCATATTC CCAGCATGGC CTAAAATTTT GAGAATTAAC
TAAGGTGTAT TATGTACTTA TATTGATGAT GATAAGTCCT TTTTTACCCA
CACAATTGTT TGTAGAAAAC TAGTAATTAT CTTCAACGTG TAAAATCAAG
AGATTGAGCT TCTGGAGTAC ATATTAATAA AGAAGTACAC TAAATAGGTA
GCTAGACACT TTTAAGGAGG CCAAATATAA GAAAGAAAAT T[A/G]CCAC
GTCTACTTTC ATTTACAATC AATACATCAA TATACACTTG AAATGCATAA
CCTTTAAGTC AAACCTTAAT AACTGGAGAT TCTTT[T/G] GTGACAGAA
CCAAAGTCAT AGTTGAACAC ACCGATCCC AATAATATCA AGAGCCAAAC
TAGAAAACTC TGCCTCAAGA TCCAATTCAA TTGAGTTAGG TCCATCATAA
CCCTCTCCTT CAAGAAGCTT ATTAAACTTC AATATTGTTC GTTCTGATTA
AGTTGTGAAT ATTTTGACCA TAGCTTCCAA GTATGAGTTA TGGAAAGCCA
GAGCAATGAC TGCATATTTT ATAAATATCA AGTAAAAGAA AAAATAATAT
TAGGTTACAT TCATAAAACA CTTGACATTA TTAGCACAGG AAAGACAAGC
TATTGAAACC ACTGATTATT CTTTAAAAGT TATTTAGACT CCTAAGTTGT
AACAGAATGC TGGTATTAAT TTAGTTAATT ACTGCAG
pha10646 (SEQ ID NO:616, 735) CC CCCAACAAAA CTAAAAATAG AACCCTCAAC AACC...(~45 bp)...
TCACACCAC ACAAAACCCA TATAATGGAG AAATTATTAC ATGGTCCAAG
ACACTTGTC TATAATTCTA ATTCTATGTG AC[AC]AACC GATTTTCTTG
ATGTATCAT GTGAAGATT GTTCAAGATC ATAATGTTCT TGGACCATG
TAAATAA
pha10647 (SEQ ID NO: 736, 737) CTGCAGAAGAA GAAGATAACG TACAAGCATC AATCAAGCT CTCACTGCTA
TTCCCCAAA ATATTTCAAA AAACAAAACA AGAAGAAGAA AAAACTGCAA
TGTATGGAA AAGTGGTTC TCTCGTTCAT CTTCGTGCTC TTGCTCGTCT
TCTCTTACTC AATTTTCATT GGCACCCTCG ACATTCGATC TTATTTCTTC
CCTCGCCTAA AGTTACCCGC GGCTGCACCT GCTCCCTGTG CACCCGAGCC
TCCTCTTCGG GTTTTCATGT ACGATCTCCC TCGCCGATTC AACGTCGGCA
TGATTGACCG CCGGAGCGCG TCGGAGACGC CGTCACGTTG AGGACTGGCC
GGCGTGG...(~255 bp)...AACTCGCC CAAGCGTTCT TCGTGCCGTT
CTTCTCGNCG CTCAGCTTCA ACACGCACGG CCACACCATG AAGGATCCCG
CCACGCAGAT TGATCGCCAA TTGCAGGTGC GTCGCTAGGT TTCTCGATTG
TTCCGAATTG ATTGGTGAAT CANTCAAATG CTATGCTATG AGAGTTATTT
TCACAGCAGG TTTTGCATGT TTAGCCTTAG AACGCGATTA GTATGATCAC
TTGAAGATTT A[C]GATACA GGATGCTTGT TGAACGACAT TAGATGCTNA
AAATCATAAG AAATTTGATG AATGCTAAT[A/G]TTCACA CCAGTCTAGA
AGGTAGGAAT ATCCTAATG GAGTTTTCTC AGGATCGCAC GAGTTGAAC
TCTGCTTATG TACCTGCAG
pha10648 (SEQ ID NO: 738, 739) CTGCAGCTTT GTAAAAATGT TTATGGCCA TTG GGCATG ATTC TTGAA
TAGCCTTTTT ACCAAGAAA CTGCATCATC ATTCTAGAAA TT[A/T]GTA
AGTTTTGTCT GATCACTTCA CAAGAGACCT TTAAA[A/G] AAGATTTATA
TGGCCCAAGA ATTTTGGAAA TGCATGCGCA AACCCATAAC TGAATGTAAA
GAAATTTACT AGTTTAATTG TAGTTCTCGG CTGAGTTTGT AGCCCTGCCT
TTGTTTTGTT TTGTTTTAGA CCGTTTTGTT CGGGCCTCTT TTGTTACTCA
AATGTTCACC GTTATTAATG GATTTTATTA TTTGTAGTTT AATTTCTCTG
ATAACCATTA TTGTTATTAT TTGTTCTTAC CATTATTAAT GGATTTTATT
TTCAATAAAA CAAAAAAAAG TTGAAATGTG TGTGTGCGCT TGTGTATTAT
ATAATGTGTC TGAGCTAT...(~15 bp)...AATACAG GACACGTTTA
AATCACAAGA TATTTATCGG GAAGGTATA GATGTGTTGA TGCTCTAGCC
AACCATGCA CATTAGATGA ACATAGACTT TATGATGTTT GAGCAACCTC
CTACTATGTT GAGCCATTTT GATTGCTGAT TTAATAAATT TGGACTTGTG
TCCATCCTGG TAACAAGCAC TCCAAAATTC CTTCTTCTCT TCCCTATTGG
GGCACTCCCT CATAAGTCAT GGAAATTTCC AAATACACCC CAAACTTTTA
CTTTTGCACC ACCTCTCTTA TTTTTAAATT GTTATTACAA AACTACCCTC
AATTTCTTTA ACCTGTCTAT TTTGCAGACC TGCAG
pha10649 (SEQ ID NO:740) TGATTTTCTT TTGGATCCTG AGAACAAGCC CCCCAGAATT TCACTCATG
CATGTGCCAG AGTTCAGGAA TAGTCTTCCA AGGGAAACAA AGAAGCCCT
CATCTTGTTT CTCAAATTCA TCTTCATTTG GAATCAAAGG AGGCCTGAC
TGACTTTGTA GTAGGAGGAG GCCTTTGCAT TGCTGTGCTA CAAATGGTG
CTCCTTTCTT CCTTGAAGGA GCTCTTGGAT CCTGAAAATG AAGTAAAAT
TATTATATTT TCTAAGGGCT ATAAGGTCG GCTTGGTGGT TAAAGG CAG
AAGGGGGGA GGGAGAGGTC CGTGGTTTTG AGTTTGGAAT TTCTCTT AC
AAACAAACTA ACAACTAGTA TTGACCGATA AAAAAAAACT GATATTTC C
AGTTTATATA CTAGATGTTT AATAGGCATC ACAAGTCAGA TGCAAAGTT
TTTATATTT GTTTACTAAT TAGAAATCAT TCTTAGTATA ACTTTTAAGG
TAATTGTTA TAAAAGTCAA CAAACTTATC A[C/G]A[G/
C]ATGA[C/T]GATTTGTGAT T[C/T]GATG AACAGTGCAA AACACTTTTT
AACTCTCATT ACATGTATTAT TTATTCATAT ACTATAACAT GTTTGATAAA
CATATATTCT GTTCTGTGA TCAAGAATGC AGAAAACTAT TCAACTTTAG
GATATACAGG TTATTTAGGAC TTAATTAACT TAGACCAAGG AAGAAATTTA
TACCTCAACC AAATGTTTC TTTGAAATTT GTAGGTTTTT AAACTTGTTA
CAATGGTTCA GGACAACATGC TACGGAAGGA AATAATTGAC AAGTGTTA_C
TCACATCTGG AGAAGAAAAC ATAGCCCTC ACCCTCCA
pha10650 (SEQ ID NO:741) CGCAATTAAC CCTCACTAAA GGGAACAAAA GCTTGCATGC CTGCAGCAAT
ATAACCAGGA TTCAGAATTA ATCTAGTTAG TATATCATAC AATGCAGGGC
CAGTTACAAT ACATACATA CGCATAACCA AAACAGTAAC ATCAATGGAA
CAGTAATAG GACACAAT[T/C]ATTATTA T[T/C]TTTT TGTTAAGGAA
ATTTCTAAGA AAAACACAAC CATTTGTACA AAAAAGGTAT TAATACATAG
CTACATGGAA GAAACCTACA TTA[A/T]AA TC[C/T]AGT AGTGAGAAAA
GATGGGGGCA ATTATGATAA TCTCGGAAAG CCTCTGCCAA GGGTCAGCAT
TCAAAATTGA GTTCCTTAGC CTGCTGTCTG CATATGCTTA TCCACAAGGA
ATATTGTCTC CGTGAGGATT AACCAAAAGC ATACCTCAAT GGGTCCAGAT
ATCCTGAAGA TAGCGCCCAA TTTGCTGAGC ACCAA[A/T] ATATAGGGCA
TCGGCAACGA GAAAGACTCC AATCTACGCC ACAAATGTCA AACTTGTGAA
TGTCAAGGTT AAGAAATAAG ATTTACAATT GAAGGTCTAC AGCAGATAGA
TTA[C/A]CA GGCGTGCAGA AAA_CACTAA TCTATATCCC CAAC CCT
T ,~G.CTGCAG GTCGACTCTA GAGGATCC
C CGGGTACCGA GCTCGAATTC GCCCTATAGT GAGTCGTATT ACACCCTATA
GTGAGTCGTA TTACGCCCTA TAGTGAGTCG TATTAC
pha10651 (SEQ ID NO:742, 743) AGCTTTCCAT GTTTATTCAC GGCGCCATTG ACAGCTTGTT TCTTATTTTT
TGTGTTCTGG CCATCTTCTT CATCTTCACT GTACTCACTC TCATAGTCAT
CATCATACTC TTCATCCTCG TTGTCCTCAT CCTCCTCAAC TCCATCCTCC
ATGATAGGTC CTTCATCAGC CAAAACAGCA CCCCTGGTGC CAGCCCGAGT
TCGTTTCAAA ACTTTGAGCT TTAAATTTGT CCTTCTGTCA CAGCCCAACC
AAGAATCACA CAAACAATAG CAAGTCAAAT TGTAGTTACC CTCTGATGGG
GCCTGGAACT TGCCCAATAC TAATCTAGAG CCCCCTTTCA CCTTCTCAAC
TGCTTCTGCA ACTACCTTAC TGGTCTCCTT CACATTNGNT CCTGACCCCT
CCATAGACTC CTCAATTGCC TTAGATGCAG CAGTTACAGC AGCAGCTTCA
TCCATGAAA CTAACCTTCT GAGAAAACCA CACGTTGTT TGAAACAGAA
TCCGCAAGCA AAAACCAGTA GTTTTCTTCC TTATGAAATG GGTAGTAGGG
GGCATGTGGA AGAGCACCAA TCAGGCTATT ACCCCTTTTA ACATTTATCC
AAGCATGGAG AGTCACAATG TCCCCCTCTT GTTTACCCTC TTCACCTTCA
GTCTCACAAG TTACCTC[G/A]AGTGTCAA GGAAGGCATC ATGTCCAGTA
CCGTCTCAAT GTCTTCCACT TCAGCGGAAG ATAACCCACC TGTTTGAATA
AGAAGGTCAG CTCGCTCCTG AGAGTCCATG TCATGAAGTT CCTGAAATGT
TCTCACTTTC TGCATTATCA AGAAGACAAA AT[T/G]CTT AGTTCCAACA
CTCAATGTCA TGGGATCAAT TTGGAAATTC AATGACCAAG TAAAAAAATT
CAACATTTGA TTTTCAAATA GCCAATAGTC ACTATACAGT ACATACAACT
TTGGGTTAAC TGTGCATAAA AGAGA[T/A] GGCTATGCAG TTTACGAAAG
AAAAAAAA...(~40 bp)...CAAAAT CAGACATTAA GG[T/C]TTC CACTTTCAAA CTCTAACCTA
AATTAAAAGG GGGGAAAAAG ACCAATCAAC ATATCATAGA CGCATTCTTT
GTTTCTAGAT TTTCTTAATT ATTTATGCAT TTGTCAATCT TGAGGACACT
TAGCAGAAAT AAAAATTCCA TTAAATTGGA TCCAACAAGT GGCTAATTAA
CTAGTTAGA TCCCCAGGAA AAATGTGCAT AAAAGGCTT GTTGACATTT
TATCTGAGGT AAATCTTAAA GACCTCTAGA CCATGTAACT ACAGGTAAAC
AGCATGGACA TGAAAATCCA GATCCAGACC AGGGAAATTT AATCATTTAC
AATTGGATGG CAAGGTCATG AATAATGCTG ACATACTTGC GGGCCACCTT
CTTAATAAT
pha10653 (SEQ ID NO:744, 745) GGAATGACTG CAACCTGAGA AAGACGCCCA TTATCCAAAG CTTGACGAAA
TCTAGATTGC GATGCCTCAG CAACAGCTTT TACTTCTTTC TCGGGCACAG
CAAAGCATAC AGAATGCTCA CTACTAGCCT ACATAAATAC TGTTAATGAT
TAATGCCATT TCTTATATAT CAGCGTGGAC AACTAGAAAA ATTGAAAAAA
GTTATAAGTG CACCTGAGAT ATCATGATAA CATTAGCTCC AACATCTTTT
ACTGCACCA AAAATAGCAC TGGCAGTACC TGGAACACC AGCCATTCCA
GTTCTGCAAA AAAGCATCAA AGAAAANTTT ATTGGAATCT ACAACTTGGA
C[T/A]ATTA ATATTGGTTA AAGAAAACCT TA...(~155 bp)...TGGGAATGTT
CCTTATCATA ATGGGTATGC CATATCGCAT CACAGGAATA ATTGTGCGGG
GATGCAAGAC ATTGGCACCC AAATAAGACT GTACAACAGA CAATTTGCAA
GTTAATCTCC TTAATTTTAC AAGCAGAAGG AGCATTGAGG CTTTCCAGCA
TCTAACTCAC CATTTCCCAA GCCTCTTGAT AAGACAGTGT CTTCCAAATC
ACAGCCTCAC TAACTGCATA [C/A]GATAT TATTGTTTAT ATTTAATCCA
AACATCATGT TATGGCAGTC AAACACAACA CAAAAGAATC ATCAATATGT
CAGAGCTAGA ACTCCCTCTG GTTACTAAAA ACCAATTCTC ATGATCCAGT
CCACTCATTG TTTAACTCA GAGACAGAGT ACGAAG ATA ACAAACCTTT
TCTAGGATC TGCACTATAC ACACCATCAA CATCTGTCCA AATTGTGACC
TGACGAGCCT TAAATAGAG CACCCATAAT TGCTGCCGAG AAGTCACTTC
CATCTCTCTT CAGTGTGGTA GGAATGTTTT GAGGTGTGCT TGCAATGAAT
CCAGTGGCAA TGATTACCTT ACATGGATTC AAAGAGTACC ATTTCTCAAG
TCTCTTCTCA GATTCCA
pha10655 (SEQ ID NO:746) AATAATATCA CAGTAAGAAA AAGACAACAG CTGTGGATGT CAGGAGGGAT
TGATCCGCAC ATGCATATGG ATCCAATCGA TAAGGTGAAG TTTCTCTTGG
CTTGCCTCGC ACCTTCTCAA TTAATATATA TATGTATGTG GTTTGGTGTC
TGTATATGA CATATGTGCT CAGTGCCTCC GCCAAA[G/T]CAAATAAA
ATGGATACTG AAATGGTTGA GAGCATTGAT GTCACAGATG ATGCAAGGAG
TTTTAGCTGG GCAGTGGACA GCGCCATAAG GAGTGA[T/G]ACTGGGTCC
ATTTTTGGAG AATCTCGTGT CCGTAGCTTT GCTTCGTTAG T[G/T]GACA
GTGCCATAAG GAGTAGTAGT GCTACTGAAG CAGATTTGGA ATCATCCCTT
GTCCAGGCAG AAGACAGGGC GATGAGGACT GTTGCAGCAC GATTAACAAA
AGCCGTGTCA AATGCAAAAA GACATTTTGC TAAAGGATTA TTTACCCGGG
CTGAGCTTAT ACCAGTCACC CAGGCTGAGC TTGAACCAGC CACCAACAAT
TTCCCAGGAT TATTATTTAC CCTGGATGAG CTTAAAGCAG CCACCAATAA
TTTCTCATTT GACAACAAGA TTGGTAC[T/C]GGAGGCTT TG[GTA]TTG
TTGAGTACAG AGGCAAACTC [A/G]TTGAT GGTCGTGAGG TTGCAATCAA
GAGGGGTAAA ACCTGGTCAA ACTCATTTGG TAAGTCT[G/A]AGTTTGCC
TTGTTCTCCC G[T/C]TTAC ACCACAAGAA TTTGGTTGGG CTGGTTGGAT
TTTGTGAA[G/C]AAAAA[G/T]ATGAAAG GCTCTTGGTG TATGAGTACA
TGAAGAATGG G[G/A]CGTT GTATCATCAT TTGCAT[GA/AG]CAAGAAG
GGTAGCAGT GTGTTGAATT GGTA
pha10658 (SEQ ID NO:747) CTGCAGGGTT TGATCGCTA CTATGTATAA CTATTTGAC TTAGAGGAAC
TCATCTTAAT TAATGTCTTC TACACCGTCT GCCTGCAACA GAGTATGTAT
GTATATATAA ACAACATTCC AGGGCCAGTG TCATACAAGT TACAGCCAAT
TTTCAAAGTA ATTAAGTGAT GCTAAACTAA ATCAAGACTG AGTTTAAATT
TTCAAAACTA ACTCATTTCA CATATATTTT GAGTTATTTA GGCCAACTCT
TGCTAACCAA AGTTGGGTGA ATATTCTAAC TGGTCCCCAA AACTGTGAAG
CGTAGTCACT TTAGTCTCTG AAATAATAAA ATTCAAAAAA GTTTCTAAAA
TTACAATCCA TATGTACTCA C[A/T]TTAG TCCAATATTC AAAAACTTGT
GTGATTTAAA TGAAAATATT AACATACTTT CAGGACAAAA GTGACAATAA
AATGACAACA TAAAGACCTA TTCTGAATCT TCTTACTTTC AGGGACAAAT
GGGATAACAC [A/T]TAACA CTTTTAGGAC TAAAATGGTA GTTTATCTAA
TCAAAGTTTT TATGGTAA[A GTT]AGTTCA AGCCCAACAA TAACACACTT
GAAATTCACA TAGAAAA[C/T]AACTGTTC AAGTACAAAA TAGGAGGAAT
CCAACAAATT AGCAAGCC[A AAACAGT]TA ATAGGTACCT GATTTTCTGG
TAGAACTTTC ATCATCCATC CTCATCCCAC AGGTTGGAGC AAAAACTGCC
CATTGTACCT TTGTCCCACA ACATTGTGGA AATATCTCTA ACATTGCATT
CAAACACTTC CTGCACTCAT CACTTGGAAG ATCCCTATCA CACTGCACCC
ATCCATACAA TGTCTCATTG TCACCCCAAT CGAACTCCTC CACTGCCCAA
AACTGCTTGG TCTCTATAGT TGCTTTTGTA ATCAAACC[C/T]TC[C/
A]ATAGCATC CTCAATT[C/T]TCTTAGTT TCTGTTGA[A/G]TTCTGTG
TTTGGACCAC GATCTTTGTG AGACAAATTC AATTTTTAAG TGACTTTAAA
ATATGACA[C/A]ATTGAAA TGACCTGTTT GGATATTAAA TTTAAAGAAT
TTT[C/A]AA AAATGATGAG AAATTTATTG GAATTTTTTT T[T/A][A/T
]AAATAAAAT AGAATTACAA AACGCTGACA AGGTGTTTTG GTAGGGAGAG
ATTCTTTTCA ATTTCTTAGA AATCTTGGAA GTG[T/C]TA AATTCCTATA
TTTGATATAA CTATTTTAAT GATCATTTTC ATAAAT[C/A]TAAAATTCA
CAAGAATCAT TTTTGAATAA GTCTTCTACT AAACGTGGTT CGGTGGAGAG
ACTCTACAAA ATGAGGTCAG ACATCGTAGG ATGTTAGT[C/T]AAGCATC
GGC[C/A]AA ACC[A/T]GT AAATACTTCA TATCATATCA ATCATATGAT
GAT[A]AAAA AAACGCTTTC TTAAGACACC GTGAACTCTA GAAAACAC[A
T]ATAAAATG AAATCTGCAC AAGC[T/A]T A[A/T][A/C]GCACATGAC
TAAATAA[C/T]TTTATCAA AATAAAAAAC TAAAA[T/C] ACAGGAAAGT
TGAATTGCTT TATCCAAATA AAATTT[AA/TT]AAAACGA AAGAAGTTTA
ATTTGCAAAT AGCTTGAATT TTTCAAATAC CATAC[TC/TA/CA]AAAAA
TTACCTT[G/T]ATTTTTCT GGGTCCGGTA A[C/T]GTTC CACGTTGGGC
TTAAACTAAC TTTGCCATGG AAACTCTGA TTGGAGTAAC GGAGGATGC
ACACATCATA CCATAGGATA GCGGTAACAC TGTCAGGACA CAGCCGAGAG
ATTTCATTGA CAGCAGTGGT GAGACAGAAC TGGCAGAAGT ATCCTGTGAT
GTCGTATCTG CAG
pha10782 (SEQ ID NO:748) CTGCAGGTTA GTCGAACTCT TGTCATTTTT TCTTCAAGTG TTGACTGATT
AGCTGTTACA AATTTTTAGA GATTACATT AATTACCGCT ATGACTATAT
CTTGGGACT ATATTGATTA ATAGTTGTCT TTGAAACTTT TGGAAAGTCT
AAAGATCTAT TATGTTGGAG AATAAACCTT TGTGCACTCA CGCACA[A/G
]ATATGAACA GAGAATATGT TTCAATGAAG CTTGTGTGTA TTTTCTTATC
AAGCACTTAC ACATATGCAC CTTTATATAG TGCTTTTATC AAAAACTGCT
ACAGCAGAAA ACAGAAA[T/C]CAGTATAA AACTAACA[T/C]TCTAACC
AACTACTACT AATAATACCC TTA[T/C]TT ATATTTGAAT TAAACTC[AA
]AGTTATTTC TT[T/C]GTT ATTCTG[C/T]TATTATTAA AGTTATTTCA
TTTGTTGCTT CTTCTTGTTT AATGCACA[T/C]ATTACTT TGTCTTGTAA
AGGTCGAGGA TATGATTCTG TTATTCCCTC TATTCCTGAT GATGTAAGTG
GCGAATCCCA T[G/A]TTCA GACCAAAATA TTTGCTGATA AAGTTGTGGC
ATATATAGA CCAATACATA GAAGCAATGG AGAAGGTAGC CTATCCCCC
TTTTGAAACA ATATGGCATT GTAATTACTA GACAGTTTGA TGACATTTCT
TTTTGCAATT TGTTAAGGTT AAACTTAAGC AAGGATTGAA AACTGCAATG
AGCATATCCA GCGAAGGAAA TGCAG
pha10783 (SEQ ID NO:749) CTGCAGAGT CTCGCTAATG GGAGTGAAAC CTCAGTTTC TACTTTCTGA
AGTGTAGTAG ATTGCATGC[T/A]GGCATG AGCACATGCT TATGAATTAT
TGATAGATAC TGTTCAATAA GGGCATCACA TACATGAAAA CAACTACACT
TGTGACCTTT TTCTCTTAGT CTTGAAAAAC ATCGTTGAAA ACAAATTAAG
GAAATGCTAC CTATGTGAAA TTTTGATCAA GTTATGTGTG CAACTGTGTA
T[G/A]GTTG TGGATCGTT TATTCTTTCA AGAAGCCTAG ATTCACCTAC
ATGCATCTCA AGCATCAATA TTGAGTTGGA TCAAATGGCA AAGTTGATTT
CTTCTTTAGC TTGTTTTTAA ATGGATTTAT GATGAAGGGG CCACATTTCT
GGTTCGCAT[G/C]TCGACC TCAT[A/G]C TTTTTTCTTT CTTGCAAAAT
ATTTGATTGG TGAACTGGTT CAAATGATTC ACGAGGCAGT GTAATTTACT
CTAATAAGTT TTAGTAGCTC AATGTTTGTG GGCC[C]ATT AGGTTCAATG
TAGGGCCAGG ATTGTGGTGG AAGAGAGCTG TTGAGAAATG TTGCAGCAGG
AGGAAGAGGA TGGAAGAAAA TTTTGAAATT TGAAGGAGTC ACGTTGCCTT
GCCAAAAACA TGATAGACAA GTAGTGTACC GTTGCAGTAG AAATTACCTG
AAATGCTATG TGTCCGTTGC ATTCTTCCCC TACATAAGGA TTGCTTCCTC
CACTCTAGTT TTTCAATTTC ATTTCAATTC TA'1"ITDTGAG TTCTTTGCAA
AAATAAAAAC CTGCATCCCT TTTTTGGGGT GTATGGGTTG ATGGAAGTGA
GTGAAAGTGA TAGATACGGT AGATGACGTG ATGAAATTGG ATGCTTGAAT
AATAGATCGA AGAATCCTT TTCCTTACAC CCTAGTTAG GCCCTACTAT
TTATGCTCTC AATTCCATTC CTCAATTTCC AGAAATCTAT GTCTAAAGTC
ATGTCTGATG TCTGTCACTA TATATTAGTT AATTCATATA TGTTTATGAA
TATTGCATTG ACCCTCGTTA TATATGGATC AATCAAGCCC ATTGCTGCAG
pha10792 (SEQ ID NO:750, 751) CTGCAGCAAC CGGTCTTCTA TCAAAACTAA CACATTTGAA TCTGTCTTGG
AATGATCTTC ATTCACAGAT GCCTCCAGAA TTTGGCCTCC TTCAGAACCT
GGCAGTTTTG GATCTTCGCA ATAGTGCCTT GCACGGTTCA ATTCCAGCAG
ATATATGTGA CTCAGGCAAT TTAGCTGTCC TCCAACTTGA TGGAAATTCA
TTTGAAGGGA ATATTCCGTC CGAGATTGG AAATTGTAGC TCTCTTTACT
TGCTGTATG CATCCTATTT CTCAAGCTCT AGTGTCATTT CTCTTAACAC
ATCTTTTTGG AA[GA/GT/TA]TATACTAA TTCCTATCTA TTTTATGCAG
GAGTTTGTCT CACAATAATT TGACTGGTTC AATTCCAAAG TCCATGTCAA
AGCTAAACAA GCTCAAAATC CTCAAGCTGG AATTCAATGA ACTAAGTGG
AGAGATACCA ATGGAGCTTG GAATGCTTCA GAGTCTTCTT GCTGTAAACA
TATCATACAA CAGGCTCACA GGAAGGCTTC CTACAAGTAG CATATTTCA
GAACTTGGAC AAAAGTTCCT TGGAAGGAAA CCTGGGTCTT TGTTCACCCT
TGTTGAAGGG TCCATGTAAG ATGAATGTCC CCAAACCACT [A/T]GTGCT
TGACCCAAA TGCCTATAAC AACCAAATAA GTCCTCAAAG GCAAACAAA
CGAATCATCT GAGTCTGGCC CAGTCCATCG CCACAGGTTC CTTAGTGTAT
CTGCTATTGT AGCAATATCT GCATCCTTTG TCATTGTATT AGGAGTGATT
GCTGTTAGCC TACTTAATGT TTCTGTAAGG AGAAGCTAAC ATTTTTGGAT
AATGCTTTGG AAAGCATGTG CTCGAGCTCT TCAAGATCGG GAAGTCCAGC
CACAGGAAAG CTTATCCTGT TTGA...(~175 bp)...CTTCGGGAAA GCAAGGCACC CAAATCTAAT AGCATTGAAA GGATACTATT GGATTCTTCA ATTACAACTT
TTAGTGACTG AGTTTGCACC AAATGGTAGC TTGCAAGCCA AGCTACATGA
AAGGCTTCCT TCAAGTCCTC CTCTTTCTTG GGCTATAAGG TTCAAAATCT
TGCTTGGAAC AGCAAAGGGG CTTGCTCATT TGCACCACTC TTTCCGTCCA
CCGATCATCC ACTACAACAT AAAGCCAAGT AACATTTTGC TTGACGAAAA
TTACAACGCC AAGATCTCAG ATTTTGGGTT GGCTCGGCTT CTGACAAAGC
TGGACCGGCA TGTGATGAGC AACAGGTTCC AGAGTGCACT AGGATATGTG
GCACCAGAAT TAGCATGCCA GAGCTTAAGG GTCAATGAGA AATGTGATGT
GTATGGTTTT GGGGTGATGA TCCTTGAGCT GGTGACAGGT AGGAGACCAG
TGGAGTATGG AGAAGACAAT GTGCTGATAC TGAATGACCA TGTGAGGGTG
CTGCTTGAGC AAGGGAATGT GTTGGAGTGT GTGGATCAAA GCATGAGTGA
GTATCCTGAA GATGAGGTAT TGCCTGTTCT GAAGCTAGCA ATGGTATGCA
CCTCTCAAAT TCCTTCTAGC AGGCCTACTA TGGCTGAAGT GGTGCAAATA
CTGCAG
pha11071 (SEQ ID NO:752) AGA ATCTGTGCAG CAAACGACAC CTTCATCCCC TGACGTTCCA TTTGCTCAG
TTGTTGGCAT CTTCACTGG ACCGGGCTCG TAAAAGTAAT GGGAATCAT
AAGTTTCCA TTATACAATT ATGAATTTCA TCCTTATCA ACAATATCCT
GGAAGCCCA GGTGGCCAGC TCATATCACC GGGATCAGC ATTTTCAACT
TCTGGTACTT CAACCCCATT CCCTGATAGA CCCCCTACTC TTGAAT[T/C
]CCCCTTTCC CAAAGGGGAA ACACCAAAGA TCTTGGGTTT TGAACACTTC
TCCACTCGAA G~TGGGGTTC AAGACTAGGA TCTGGGTCGT TGACGCCAGA
CGGTGCATGG CAAGGTTCAA GACTAGGCTC GGGATCGTTG ACTCCTGATG
GTATTGGGCT TGCTTCACGA TTAGGCTCTG GGTGTGTGAC ACCTGATGGT
CTGGGGC[A/ T]GGAATCCA GGTTAGGTTC TGGCTGTTTG ACACCTGACA
GTGCTGGGCC AATCAATCAA AACAACATCT CTGTGCAAAA CCAGATATCT
AAGGAAGCAA CTCTTGCAGA TACGGACAAT GGACATTCAA GTAATGCAAC
ATTGATTGAT CACAGAGTTT CATTTGAATT AACCGGGGAA GATGTTGCCC
GCTGTCTTGC ~TAATA[G/A] AACCGGGGTA TTGCTTC[G/A]AAACATGT
CAGGGTCTTC ACAAGGTATA CTTTCCAAAG ACCCTGTTGA CAGAGAAAGG
GTGCAAAAAG ACACCGATAC ATGTACAGAG AAAACCGAT GATAAGCCTG
ACAATTCTGT AG AGGAGA GCAATGCCTT CACAAGCAAA ATTCTGTAAA
TTCTTCCAAA GAATTCAATT TTGACAACAG GAAAGATGAT GTTTCTGTTA
CTGCTGGCAG TGGCT
pha11073 (SEQ ID NO:753) CATGTTGCAA TACACTAAAA CCTCAATTCA TTTCTGACTG TAACATTGGG
AAGAAAGCCC AGCTGTTGGC TGATCTACCT TCCTTCCCAG CAACCTTCCT
GTTAGTCCAC CACTCATAGC CACTGCCAGT AGTAACAGAA ACATCACCTT
TCCTGTTGTC AAAATTGAAT TCTTTGGAAG AATTTACAGA ATTTTGCTTG
TGAAGGCATT GCTCTCCTT TTCCTACAGG ATTGTCAGGC TTATCGTCA
GTTTTCTCTG TACA[C/T]G C[G/A]TTAC AGCTACTATT GGTGTCTATT
TGCACCCCTT TCTCTTGTCA ACAGGGTCTT TGGTCAGTAT ACCTTGTGA
AGACCCTGAC ATGTTTCGAA GCAATACCCC AGTTTTATTT GCAAGACACC
G[G/A]GCAA CATCTTCCCC GGTTAATTCA AATGAAACTC TGTGATCAAC
CAATGTTGCA TTGCTTGGAT GTCCATTGTC TGAATCTGCA AGAGTTGCTT
CCTTAGAAAT CTGGTTTTGC ACAGAGATGT TGTTTTGATT GGTTGGCCCA
GCACTGTCAG GTGTCAAACA GCCAGAACCT AACCTGGATT CCTGCCCCAG
ACCATCAGGT GTCACACACC CAGAGCCTAA TCGCGAAGCA AGCCCAACAC
CATCAGGAGT CAAGGATCC CGAACCTAGT CTTGAACCTT GCCATGCACT
GTCTGGCGTC AATGATCCAG ATCCCAGTCT AGAACCCCAT CTTCGGGTGG
AGNAGTGTTC AACACCCAAG ATCTTTGGTG TTTCCCCTTT GGGGAATTCA
AGAGTAGGGG GTCTATCAGG GAATGGGGTT GAGGTGCCAG AAGTTGAAAA
TGCTGATCC[G/A]GGTGAT ATGAACTGGC CACCTGGGCT TCCAGGATAT
TGTTGATAAG GATGAAATTC ATAATTGTAT AATGGAAACT TCTGATGCC
CATTACACTT ACGAGCCCG GTCCAGTGAA GATGCCAACA ATTGAGCAAA
TGGAACGTCA GGGGACGAAG GTGTCGTTTG CTGCACAGAT TCAGGTCGGG
GAG
pha11074, pha11075 (SEQ ID NO: 754) CTGCAGCTAT TTGCTTATGA TACTGTTAAT AAGAACCTCT CGCCAA AGC
CAGGGGAGCA GCCCAAACTC CCTATTCCAG CATCATTGAT AGCAGG TGC
TTGTGCTGGA GTTTGTTCAA CTATATGCAC ATATCCTCTA GAGTTGCTAA
AGACTCGACT AACTATCCA GGTGTGTACA TATAAAACCA AACAGGCTC
ATGTTTCCCA CATGAGTTTA ATTTCTATTA TTGTCAGTAT AAAAGTTT~"T
ACATTATTAA TCAAGAAAAA ATCATGTTAG CTATGACTTT TAAAGAGTTA
TTTTAAAAGT CAACGAACAT ACTATGCATG ATGATTTCTG ATTAGTTGAC
AAT[T/C]TA AAAACTCTT TATAGTGT CGGTGCGTAG ACTCTGTTAC TCTTC( CTATGTATCT TGTATCATCA TACCAAAAAT AAAAGCAAAA TTTAAGTTGA
ATTAGTAATT TTGCAGAGGG GTGTTTATGA TGGTCTACTA GATGCATTCC
TGAAAATAGT TAGAGAAGAG GGTGCAGGAG AACTTTACAG AGGTCTTACT
CCGAGTCTGA TTGGAGTAAT TCCATATTCT GCCACCAATT ACTTTGCCTA
TGACACCTTG AGGAAAGCAT ACAGAAAAAT TTTCAAAAAA GAGAAGATTG
GCAACATTGA AACCCTTTTG ATAGGATCAG CAGCTGGTGC ATTTTCAAGT
AGTGCTACCT TTCCACT[C/T]GAAGTGGC TCGCAAACAC ATGCAAGTGG
GAGCCCTCAG TGGAAGGCAA GTTTACAAAA ATGTGATTCA TGCCCTTGCA
AGCATTCTTG AGCAAGAAGG GATCCAAGGA TTATATAAAG GGTTGGGAC
CTAGCTGCAT GAAGTTGGTG CCAGCTGCAG
pha11076 (SEQ ID NO:755) TAGTTCCAGC ATATCAACTC ATCGTTCCAT ATTGGACCTT GACCTAACCA
AGTTTACCAC ACAGAAACAG GTGTCTTCAC TGTTCCAACT ATGGAAGAGT
GAGCATGGAC GTGTCTACCA TAACCACGAA GAAGAGGCAA AGAGACTTGA
GATTTTCAAG AATAACTCGA ACTATATCAG GGACATGAAT GCAAACAGAA
AATCACCCCA TTCTCATCGT TTAGGATTGA ACAAGTTTGC TGACATCACT
CCTCAAGAGT TCAGCAAAAA GTACTTGCAA GCTCCCAAGG ATGTGTCGCA
GCAAATCAAA ATGGCCAACA AGAAAATGAA GAAGGAACAA TATTCTTGTG
ACCATCCACC TGCATCATGG GATTGGAGGA AAAAAGGTGT CATCACCCAA
GTAAAGTACC AAGGGGGCTG TGGTATGTGA AACCATTAAT TGTTTTACCA
CGTAATTAC[T/C]ACTCCT TCTATCTTAA AATAAACTGT TGTCTTAAAT
TGTTTTATAT AAATTAAAAA TAAAATAAAA TAAAATAATA TTTTTACCAA
ACTAACCTTA TTAGTGAAAG AGATTACCAT TAATACTAGA TTTAAAAACT
AAAAATATAT TTATTAGAGT TATTGGTGAA AA[T/A]AAT AATTAATTTT
ACGTTGAAAA GTTAAATGTG ATATTTATTT TGCTACAATT TTTTTTAGTG
TGACACTTAT TTTGGAATGG GAAAATAATT ATTAACACTA ACTTTTTTTC
TTTGTCTTGT GG[T/A]AT GTGAAATCAT TTTGATATAT AGGAAGGGGT
TGGGCGTTT TCTGCCACGG GAGCCATAGA CCAGCACATG CAATAGCAAC
AGGAGACCTT GTTAGCCTTT CTGAACAAGA ACTCGTAGAC TGTGTGGAAG
AAAGCGAAGG TTGTTACAAT GGATGGCACT ATCAATCGTT CGAATGGGTT
TTAGAACATG GTGGTATTGC CACTGATGAT GATTATCCTT ACAGAGCTAA
AGAGGGTAGA TGCAAAGCCA ATAAGATACA AGACAAGGTT ACAATTGACG
GATATGAAAC TGTAATAATG TCAGATGAGA GTACAGAATC AGAGACAGAG
CAAGCGTTCT TAAGCGCCAT CCTTGAGCAA CCAATTAGTG TCTCAATTGA
TGCAAAAGAT TTTCATTTAT ACACCGGGGG AATTTATGAT GGAGAAAACT
GTACAAGTCC GTATGGGATT AATCACTTTG TTTTACTTGT GGGTTATGGT
TCAGCGGATG GTGTAGATTA CTGGATAGCG AAAAATTCAT GGGGAGAAGA
TTGGGGAGAA GATGGTTACA TTTGGATCCA AAGAAACACG GGTAATTTAT
TAGGAGTGTG TGGGATGAAT TATTTCGCTT CATACCCAAC CAAAGAGGAA
TCAGAAACAC TGGTGTCTG CTCGCGTTAA AGGTCATCGA AGAGTTGATC
ACTCTCCTCT TTGAAGCCGT AAAGGTTCAA TACAACGAGT GCTTGTTTTC
TTAGGGACAA GCATTGTACT TATGTATGAT TCTGTGTAAC CATGAGTCTC
CACGTTGTAC TAATGTGAAG GGCAAAAATA AAACACACAA CAAGTTCGTT
TTTCTCAAT
pha11078 (SEQ ID NO:756-758) CTGCAGAAAC TGTCGGTTTT ATTAGTAATT CAGCACAAAT TCTTCGGTGG
AGTTTTI"CI~T TTNTTCTGTT TTTGGCTGCG TTTACGTGTG TTTGTTTAGT
TTCCTGGAAA TTGTTCAAAT TGGTTAAAGG ATCTTTATAA GTGGACATTT
CAGTTGGAGT TAATTGCGCA ATTGAAAGTT AAGGCATAAA TGTCTCTATT
TAGGGAAGTA TTTGAATGCA ATGCTGCTCA TTTATGTTCT GTTAAATCTG
TATATTGATA TGCATTGATC CTTCTTATTT ATGTTATTGG CCATTATTTT
TTACATTTAA GTGTGGTTCA TAGCTTATAT ATGACAAAAA TATATGATTT
ATACTTCCTA TCGTGTTGGG ATGCAGATGG TCATGGGAGG TTTCCTATAT
TGCCAATTGG TGAATGTATT GTGCCTTTGG TCCAT...(~210 bp)...
TGTCTTGTGA CGAAAGTACT ATAGTGTAG TGCAATCTCA TACTGGCTTT
TCACGATGAT AGGTTAAAA ATGTTTTCTA GCAAAGCACA TCAGTAATAA
TGATCCAA [G/T]ATI"ITTA ATGCCTAAGC TTGATAATTT TCTAAATTCT
TACTCGTTCC ATGAAGTGTG ...(~50 bp)...TITITG GAAGCAACTA
TGACCAATCT TATCTTGTTT CATTTAAAAA TAATTCTAAA TCCTCTTGTA
TCATATTCTT ACTAATTCTC CTATTTACTA AACTAAAAAA GTATTTAGTG
TITITTATGT TTTCAAGATA TTAAATTCAT TTACATTTTC CTTCCTTCAT
ATTTTAAAGC ATTGTTGTTG TTTACAGAGT TTGGCCATT AGGATGTTGA
CAAAGGGCAG AGCTCCTTC ACGTATTACT AAGGAAATGC AAGTAAAGAT
TGGAAAATGC TCAGAGTC
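Several records (pha11078, for example) replace omitted stretches with an approximate-length marker set between runs of dots. A small helper could separate the sequenced pieces from those gaps before any site parsing. This is a sketch under assumptions: the name `split_at_gaps` and the exact marker pattern are illustrative, and the pattern is deliberately permissive about spacing and the dash or tilde prefix because the markers vary from record to record.

```python
import re
from typing import List, Tuple

# Hypothetical pattern for gap markers such as "...(~210 bp)...".
# Accepts an optional "~" or "-" prefix, free spacing, and "bp" as the unit
# (also tolerates "by", a common OCR misreading of "bp").
GAP_MARKER = re.compile(r"\.{2,}\s*\(\s*[~-]*\s*(\d+)\s*b[py]\s*\)\s*\.{2,}")


def split_at_gaps(record: str) -> Tuple[List[str], List[int]]:
    """Split a record into contiguous pieces and approximate gap lengths."""
    parts = GAP_MARKER.split(record)
    # With one capturing group, re.split alternates piece, gap, piece, ...
    pieces = ["".join(p.split()) for p in parts[0::2]]
    gaps = [int(g) for g in parts[1::2]]
    return pieces, gaps


pieces, gaps = split_at_gaps("GTGCCTTTGG TCCAT...(~210 bp)...TGTCTTGTGA")
print(pieces, gaps)  # prints ['GTGCCTTTGGTCCAT', 'TGTCTTGTGA'] [210]
```

Keeping the gap sizes as separate integers, rather than padding with N, avoids implying a precise length where the source gives only an approximation.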
pha11079 (SEQ ID NO:759) CTGCAGGTAA TTTCTTAATG AATGGGTTGT TCTT'ITGCTC CTCTGCAAAA
TCAAGTTATC TTTGATTAAG CGACTGATTG AAGACCTTTT TTTAATTTAT
TCGGGAAGA AGAAGAACAC TCGGTACAGT AGCAGGAAA ACATGGTGTA
GCCTATATGT GATAACTTAT TAACATGTGT CAATGCCATA TGGAATAAAA
AAATCTTCTT CTTCAGATAC ATTATTGAAC TITTTGACGT TTAATTTCTT
GTCTTCACAT GTCCTACTGA GTAGCCCAAT TTGTT[C/A] TTTATGATGT
GGCTGAAGAT TGAAGGGCCA CTCAGTTGTA TCTAT[G/A] TATATTAATA
TATGTATCAT TATTTCATAT ACCATATCGT GCATTTTCAC GGCTTATAAT
ATGTAAATTG GCTTCAAATA TTGCCAGATG CTACAAGAAA ACATGATTGA
TTATATGGAT ATGTGAATGC TGAATAGTTT CACAATAGCA ACTGAGATGT
CACAAACTTT TTACCTTCTT TACATTAAAT AAACTCCAGT CTCTTAAGAT
GTACCCCACC AATAAAGTGA TTGCATAAAT TATTCTAGCA AAAATGTATA
AGGTTAGCAT AGAGGTAAGG GTATGAAAGT TAAGCTTATG C
TTACTATTAT TATTATTAGC ATATCAAAAA CAACATTAAG TATTCTGGTG
CCTGCTTAAA ACTTCAAATT TTGAAAACTC TTCTCAGTAT GCTCCCCTCA
TATTCTTTTA GCTTCAAAAA TCATTTACAC GTGAGTTCTG AATATATGAT
TT[G/T]GCA ACAGATAGAT TGGTGTTTTC TCAACACAA GTCGATAATT
TGAGTGTGGA TTTCCTAGCT TCAAACAAT ATCTATACAT GAACGTCCTT
GATAGATCAA CTGCAG
pha11131 (SEQ ID NO:760, 761) CTGCAGTGCC ACCATCAGCA GTATCAGCAG ATGACACTGC AAATCAAGCT
GTACACTGTC CATCTCTATT GTTGATGCTC ATAGCCTGC TTCTTAATCT
TGTTATTCTG ATATGTTTC TATACACAAC TTCATTCTTT TTCTCTCAAT
ATTGATGTAT AATCAGTTAC CATI~TTTfA TATGTGGTGT GTGAATTACT
TTTTGCGCAA AAACAAAAAA TTGGCTATGT GCCTGGTAGT TTAAG[G/A]
AGAGTTCTAT TGATCAATTT ATGTAATGAA ATTACGTTTG CCAAGTGAGC
CTTAGTAAAT GGCTAACTTC TTACTGCTTC ATCATATGCT TT[G/A]ATT
TGGTTTGACT GAAATCTTCA TACCCTATTG CACAAAGATC CATAAAGCAT
GCCTTGCGTG CATTTT'TT"T GTTGCCATTC GTTGCGGGGG TGCTTCTGTC
ATTCATCCAT TCTTACTTAA TGTTGCATAC GAATGAATGT GTAAAACATT
TCTCAGAATC TCTA[T/C]A AACAAGTGCT TGCTTATAAT CTCGAAAAAC
CAACCACTGA ACATGAATGA GTGTGCATAT TAATTCTCAA [C/T]TTGGA
ACGTGAATCA GTGAGTGGTC TCTTTGC[A/T]TGCAACTT GAGATGGTTG
TGAAACTCCT GAAACAAATA GTGTTGCAAA TG[T/G]ATG TGCCGACTAT
TTTTCTCTAC ATTATTGATT AAAAAATTTC CTITTTTATG AGAAGTCATG
TACTAAATAA CGTI"ITITAT ATGTTATTC TTTGTAGAAC TCATCATATT
GTTCGTTAA CGAACTCAAA TCTAAGTCGT TTATTTCAAT TTACTTTTTG
CACTAAT...(~1200 bp)...ACTAAAT TTGAGTATGT ATCACAGGTA
GTATGACTCA CATTGCCAAC AATCTTTGCT CCAAACCATG GAGTATCAGC
ACCATATGGA GGAGGTATGA CTCTAATTCC ATTGGACACG TAGGGAGGAA
GTAAGGCATG GAGTTCCTTC TCTATCCTTT CTGTACCAGA ATGGTCCCAA
AAGTAGTAGA ATTAGAAGTA GCCATTACCC CACTACAGGT GTTTAATTTC
ATTTCATCAT TTCGCAAAGA TATAAAATGT CATGTCACAA ATAAAAAAGA
ATATTTACCA GCCAAACCAG GTAAGCATGC AGTCCCCCCT GATAAGACTA
TAGTCTTGTA CCAATCACTG TCACATGCTA AATCTGCAG
pha11132 (SEQ ID NO:762) CTGCAGCGTG TAAATGAGCA TTGTGGGTTC TAATTTCCTG CTITITCCTC
TCCT>fiTGAT CTTTCAACCA CCTACCCATT GTCCTACCTC TTAGCAAGCT
CCTGTAAAGC TGGTGAGCAA TAATTTGCAC CAATAATAAA ATAAATGAAA
AAAAATACTT AGTAATAGAT TGAGTTGGTA TTTGAGCATA ACTGATGAAG
ATAATCATGA GGTTGATTGC AAGATACCCC ATTTCTGAGC AAGTTTTGAC
TGGAAAGGAA TTCTGGATTG AGTGCTTGGT GGAGTAAAAG TAAATCCTGT
ACATGACATA TTGCTATGTT ATTAATTTAT ACACACTAA GAATTCGCTC
GCTGTACAA AA[A/C]AAG GTTTGGTAAT GGTGGC[A/T]TG[T/C]GA
GTGTTTGTCA TTGATGGTTC TCAGCTATTG GCATGAGGAA ATTATCAACT
TCACAACAAA AAGCAGTTAG TCATGAAAGC AGGCTAAGGC CAAATGATAT
[ATT/G]TTC TTCAAATTTG AAACTGAAAA ATGGGCCAAA TTGTTTCTTC
TGCATATGAT CATATTTAT CACTATATAA GTTAAC[A/G]TGAAGCAGT
[A/G]AAGTA CCTTCTTTTC ATTACTGTCT TTTGGCAAAA ATGGAGGGAA
ACACCCGTTA GATAACTGCA CAAAAATCTT GCATCAGAAG TTCAGAACCA
ATATTAACTT ACAAAAGCAA TAGACACACA CAGCAATAAT AGATGCTTCT
TTATATTGAA GAATCCTAGG ACCAGAACAA AAGGGGTCTA GGTTATGGAA
AAATTC'1'TTT CCGTGAAACT CCTTTGGAAG GCTTTGAATA TCTC[C]TTT
'1?TI"TCTATT TCCCCAATGC AGGTGTGTGT GT[GTGT]GG CATAGAAATA
GAATATCTTT TAAGAGAACT AACATAGCTT TTGCATGATT GTCAGTTTAC
ATGCTGCATT GCCAAGTAAG AAAAAA[G/C]TAGCTAAAA GAAGTGCTAA
TGACCTAAT[G/A]CAAATA ACTCTTAAAT TAAACACAGC CAATGTTCTT
ACA[T/G]CT TTGGAAGCAG TGGAACTCTT GGTATCAAAT TGGTGGCCTG
AAGGGCAC[A/G]AAAGTGG CATCTCTATA CCGGT[A/G] GAGATTGTAA
CGTTGGTGCT GTGCAGTGCT TTGGAGAGTT CCATGGCTGA TAGACTCCAT
GATCTAGCAA GGAACTCCAT AGATTCAGTT GGAGTATCA GGAGGGGGAC
AGGATGATG CAGGCCAATC TGCAG
pha11133 (SEQ ID NO:763, 764) CTGCAGAGAC CATTCAAGAT CGACTGAAGA ATTTGATTGC CTCGTCCCCT
GTGATGCTGT TCATGAAGGG TACCCCAGAT GCACCAAGAT GTGGTTTTAG
TTCCAGAGTT GCTGATGCCC TTCGACAAGA GGGCTTGAAT TTTGGGTCCT
TTGATATATT GACTGATGAG GAAGTGAGAC AGGGATTGAA GGTATACTCA
AATTGGCCAA CCTATCCTCA ACTCTACTAC AAAAGTGAGC TGATTGGTGG
TCATGATATT GTGATGGAGC TGCGAAATAA TGGGGAGCTG AAGTCGACTT
TATCTGAGTA GGATTATTAT TATTCCTTCA AATAACATGT GTTATGTCCT
AGAAGCCATT TTGGGAGGTT GTGTTTGATG TTCATTAAT GAACTACATG
GTTATTTTAT ATGCTACCG TCAGTGATTT TGAAAATTGT TAGTGTGGAG
CCTCA[T/A] CTAATGGTA TACTGAACAT GCATGAATTC CAATCAGATT
AGAAATTGTT ATATA[T/A]TACATAATT ATTTGGTGGA AACCTCTCAT
TGGTACTGCC AAACAAATT AATTCAACAC GTGTGAGCCC CCTCGAAGTT
GCTTGCCCTT CAAAGTTTAT GTTTTGGACG TTATGCCAAT CCTT[T/G]T
TTTTGGTCTA GACTTTCACT CAGACAAGGA ACATT[C/A] ACTATAAAAT
TAGATTCTGA AACTTCGATA AAAAGAATGC 'ITI"TTAATAT ATAACGAACA
GTTTAAATTT TATAATTAAA AAAATCTTGA AATATAGTTT AGATTAAAAA
AATCTTGAAA TATGGTTTAG AGATTAAAAA AAAATGG[C/T]AAATTGTG
ACTATTAACA GAATGATACA AAGA[C/T]C AACTATACAA ATAAAAGAAT
GATAATTTAA AATGATTGAG AAGTT'1CIT1~T GGTAGATATT AACA[G]TAA
A[A]GTATAA A[A]AAAGGG AGCAATTAGT TACAAATTAG GTCTGTAGTA
TTTTGAATTC TCATGATGAT TTATTAAATG CATCAATCTA ATCATATCAT
GTATTTCAAA TTTAAATCAT CCA[C/G]TA AATGCATCAA TCTAATCATA
TCATATATTT CAGATTTAAA TCATCC[AGT AAATGCATCA ATCTAATCAT
ATCATATATT TCAGATTTAA ATCATCC]AC TCGATA[G/A]CTTATTT[GT/AT/GC]
AT GTATCATGTA TGCATGCTGT ATCTT[T/G] C[A/G]T
[CCTCATAGAATA]ATTAACCTA ATCTT[C/T] TATAATCTTT TATTT[C/T]
TTT[A/G]TT ATTAGAAAAA GACTAAATAG GAAAGGATAG ATCATATATA
C[TCTAG/TC TGGITC/GTT AG]AAGACTC AAA[T/C]TT GTAACTTAA[A/G]
ATAAAA [G/A]AATAT ATAAAAATAA ATTAACCTGA TCTTCTTT[C/A]
ATI"I"I"TA ATZ"ITIT[T] ACTAT[G/A] AAAGA[C/T] TACTTAAATG TATTTTGTT[CA/TG/TA]
A GAAAATAG[T/C]TAAGATT C[A/GTATG] TATGTTTGAA
TTAAATTTTC TTTTATATAA CATT[T/C] GCATAAATTT TTTTAACCAA
AAATAAGGAT AGAATTACTA TCAGCTGTT ACAACTTAGA AACATTCTAC
CAGATTCTCT TCCTTCTCCT TTTGCACTGT TCTTCTGAG TGTAATCC[T/C]
CAGAAAA GATGGAAGTA ATTCTTGAGATTCTG GGAGGAACT...(~185 bp)...CCTCCTAGTC GCCCCCACCT GATGACGCAA ACCTTGCGCC GCC[T/A]C
AACCGACGCT GAGACTCGCC ATGGCACACA CGTCCGTTCG GGATGCCGGA
GTTTCCACC GTCGTTGCAT CTCAATATA ATACCACACG TTTACGATC
CGTTTGGATA TGGAATGCCA AATATACCGGAG TTTTCATCGT TAACGGAGTT
GAGGCCGATGC CACAGATGCC GCAACAGCTTC CGTCATCATC AACGGAGGAT
ATGCTGGTGC CGTCAAATAG CAATGAGTT TGAAGAACCGC TGCCTGACA
TGATGGA[A/G]GCGTCACC GCCGTCAGAG ACGGCGGGGA GAGTGACGTT
GAACGTGAAA GTGCATGAGA TCCAAGATCG CATCCCTATA GAAATGGAGT
TTGAAGACAC CGTTCTGAAG GTGAAAGAGA AGATAGTTGC CCGCGAAGAC
ATGCGAGGT GTGCCATTGG AGAGGATCG CGTTGCAGTC GCATTCAGCG
GGTGTGGAAT TGCTTGACCA TCAGGTTCTG CAG
pha11135 (SEQ ID NO:765) CTGCAGCTG ATGGGAAGAT TTACTTCTGG TAGACTATT TTAAAATCTT
GTAGTTTTCG TTTGGTTCCA AATTCATTCC AGAACTTTAT TTGCAATTTT
GCATAAATTT TTAAAATGTT TCTTCCGTGA ACATGATGAA AAGTAATTTG
CTTGTCAGCT TCCATTTGAC ACTTGTTCAA AATTAACTAA AAGTGATGTG
ATTATGGCCA TTACTATTTA ATTTTCCTGT GTTATTTGGA TGAAAAACAT
AGTTCCAAGA AAATTTCCAG ATTATACTTA TTCTATGGGG AGTGGTI'TIT
CTTATTTCTA AGTCTACCCT GTCTTTCTTT TGTAAAATTT TCATTCTGTT
TGGAGCTTTA TCCCGCATCC CAATCTTTAT TATTTTGTCT TCTAGAATAT
GTTCAAGGAA GTCGATGGGG TTGACTTAGA AATGGCATAT ACTAGTACAG
AATATTGATA GAAACCCAGA TTTTGAATAC AGGAGAGTGT GAATAATTCC
CAGTTGAAAA ACTTATCTCA GAAGAGCAAA TAAGGATGTA TTTATCCTTG
AATTTTAGTG GGTTAAGATT ACGTTGCACT [T]TAAATTA AATCAAGATG
GACTTGCCCT TACTTGGAAC ATTTAAGAGA [G/C]ATACT CACTTTATAG
GAGCCTTGCC CAATTAGAAA AAGAATAATG GAAAGGAAAC ACAAGGAAAA
ATITACGGAA GTAGACTCTC CCATAACAAA TAGAAAGATA ATGGAAGCTC
TCAATATGTA AGAGAAGAAA TGATAGCACA TATCATTCAA GAGATTTATA
AATCTAATCT TATCCTGCAC CCTGTTCTTA CTTGTCTCTC CTCTCCACCC
TGTC~T ATTGCTCCCA AGTTACTGAG TCTGCCCTTC TATTCCTCCT
GCCATGTGAG CTGTGTGGAC C[C/T]GCTC CTTTCCCCTG TCATCAAACA
TATTCTTTCA ATGGTACATG TGTTTTCTTA ATAGGATTAT TAGGAAGAGA
AATGTTAGAG ACATGCTCTT AGACATTATC TTTCTAACAC TCTGTGATTG
GCTGAATTTT GTTAAAAATC ACCAAGTTTG GGGGTCCCAC TTATCATTTA
ATGATTTTAT CTCCTAGTTT AGATGTGGGG TCCACCAAAA TTAGTGATTT
TCAATAAATT TCATTCAATC ACAATGAGAG TATTAGAGAG AGAGAGTTGT
TAGCATTCCT CCAGTAGGA ATTGTTTGTG AAAAACTGGT TTGGTGCCTG
AAGAACATTG ATTCTTCTTT GAAGGAAAAA AAAAA[A/G] GAAGGAAAGC
ATTITTTCCC ACTCTTTTCA TACTTAGTAT CAAATTAAAA TTGTCCTCAT
TATGATATTG ATCATTTCTT TGTTTGGAGT ATGCTTGAGG AATTGGTGCA
TGTTTTAATA GTGTCATATA TTAAGTACTG TAGGAGTAGA ATGTACTCTG
T1TTCATTGA TATATGTATT TATC[T/A]T TTTTG[G/T] GATTTCAATA
ACAGGGGATG TTAAGAAACT CCCTTGTCA ATAAAACTGG TTTAGAGCC
TGCAG
pha11136 (SEQ ID NO:766, 767) CTGCAGGGTG TCAGGGCAGA GCTTTAACAG ACTCTATTTG ATGAAGGCT
TTGCATGAGG ATTGATAAGG CAATITI"TG TTTGTCATTC TGATGGTTTG
ATCCAATTT CTTITCCATA TAGGATGAGA ATTTTATGGC GAATTTGGTA
GTATATT[A/T]GAGGATCA CGCTTGTAAG GCGACGTAAT TTTTGTTTAT
TTTATGAAAC CCATCCAATT GTATTGAGTA TAATCTGTTA TTGAATATTT
AGAGATACAT GCAGTGCCAC TTGCTGCTTT TTATCTATGT TACAAAATTT
GGCATCCAGT TATATTAACC AACTACTTTA GTTAATTTCT GTGGAAAGTA
TTT[A/T]TA CCAATCACAA TGCGAATAGT GAAACAATCA CAACTAGAAG
TTCTCGCTTC ATC[C/A]TT TTTATTCGGA GCATCGTTTC AAACACGTTA
GTTTTGCATT TTGCAATTTT TTTTTT[T]A AATCAATAGG CTCTGTTCTG
CTGAGGAATA CGTGTTTAAG GAATAACTAC CCGAAAACAA GCAAATCCCA
AACAACTTGT ATTAACTTCT GAATATTTTA CGGGATGGTT TITfTAAATG
TAAATTTTGA GCATTCCTTG AAA...(~200 bp)...TGGGT[C/T] GGATA[T/C]
GGATCTTGAA AAGAAGTTCT TTTATTTCAT GGATTGCACT TATCTGTGT[G/T]
TGAAAC GCGACTTGGT GAGTCGTGCC TAAAACATGG TGGATAATAC
TCATCCCAGC TTGTGTATCT TATGTATG[T/C]TCACAAT CTCTCTTGAT
GAAGAT[T/C]TATCACTGC TTCCAACA[C/G]AGAGAAC TTT[G/A]TA
CCTA[TA]CT ATGATCGGAA AAGAAAAAGA TTTTGTGCTG ATTTITTAAT
ATAAAATAAT ATTGATAAAA CAATTTCTTG AATTCGATTT TGCTCACATG
AAAACTTTCC TTCCCTAAAA CTITI"T[C/T]TCACCCAA AAAATAATAA
CAACA[T/C] AATTCTGTAT TCATGCCTAA GA'CI"ITTAGA TTTGTTCATT
CCAATGAATA AT[G/A]TGA CTACITI"TAA TCATGTGGCT ACATTGCTAT
AGTTTGTAAG TTGTAATATT ATTAAAATTT G[AT/GTA]A ATACAATCGA
TAAACTGAAA TCAGTAAAAG AGAAAACAAG AGTTGCGAAT AACGGATTTA
TATGATAGTG ATTTCCATTT AT[A/G]CGA GGAAAATCTA CTTGGACATA
AAACAAGAGT CTACATAAAC CAAATAGCCA AGGAAATTAA AAAGGATACA
CAACATAAAG GAAAAAAGAA AAGAAAAAAA TCTCATCAG TCACAGATTT
TCTATATGTC AAGCTAGTT CAACAGAGAA GCATATCTCA TCAAAATACT
TCCATCCTTT GAGGAACTGG GTTTTATGAC CTGCAG
pha11138 (SEQ ID NO:768) CTGGATTTGA GATCACCAT CCCTTATTTA ATTAGTGCTT AAAAGATTCT
TCTGGAAAA GCCAAATATA GTTGGT[C/T]CAAGGTTTT GCCATTAATA
TCCGATTAGT TAAGGGAGAG ACCATCATAC TTAAGGGGAG AAATTTATAG
ACTATIT'TTT ATTTCTTGAA GTAGCTAAAT TAGCAAACCA ATAGTCTAAA
TGATATAAAA AGACACAAAA CAAAATCTAT TTCAGGACAA TAATAAGTGA
TAGAAATAGT TCAAGTTTGG CAAGGGAAGG AATCCTTACC CCAAATGCTA
GCTTTGCTGG AAGGCAATGG GCGAACCAAT AAGTTAGCAG CATGTGGATC
ACAGTGTACA AACCCATGCT TGAACATCAT TTCAGCAAAA GTTTGACTAA
CCTGCAAAAT GTTTATACAT TTTACTAAAT TACTCCATAA ATGAAAGACA
GCCTATGTGT AACACTTAAA AGTTATCACC ATGTGACCAT GAAGTCACAG
GTTTGAGTTG TGGAATCAGC CTCTTGAAAA TGCAAGGAAA GGCTGTCTAC
TATACAGTAT AAACCCACAC AGAGATCTGA CCGTTCTGTG GGCAGGAGGT
TTATGAGTTT ATGGTACATG CCTCCCTTTA TTTTGTTGAT AAACC[A/T]
TATATGGTAC TAATAGCATG AGGGAATCAA AATGGTCATC ATTAACTTGC
TTAACTTCAT TGACGTATGT TATCTTGTCA AAGATATCAT GAGGCATATA
ATAA[AAACT TCGTACCCCA TTGCCCAGAG GCTCTTCGCT ATGCGAAGGT
ATGGGGGAGG GACGTTGTAC GCAGCCTTAC CCTTGNATAT GCAAAGAGGC
TGTTTCCGGA TTCGAACCCA TGACCAACAA GTCACCAAGG AACAACTTTA
CCGCTGCACA GGGCTCGCCC ATCATGAGGC ATATAATAA] TGTTATAAAA
AA[CTT/AAA]ACACACTAT CAACACCTAT TGGTGACCAT TGTACAAGTG
[C/T]CTATA ATAAAGCCTG TAAACATAA C[G/T]TACC AATGTTGAAA
GTTCATGCAG GTTGATCCCA AGTTTCCGA ATGGTCTTTA CATCATTTAC
ATAAGCACCC TCC
pha11139 (SEQ ID NO:769) GAATTCCATT GAAGTTGTAG AAGTCAAACT TAATTGTAGG TTTAGGACAG
CCTAGTTTGG TATTGGTGTT TGTAAZTIZ"T TTGTATATTG TGTAGAAGTT
ACTGTTCTGA ACCGCTTGCT CTTZ"TAACCG GAATGAAAGA ACAGAGTTGA
CTTTGTTTCA CATGCGCAGT GGGATTGTGT TGCTTTATGA AAATGGAA_C
TGCTTTTGGA CTCTGTTGGG ATAAACTTCT CTATAAATA CTTATAGGAG
AAGAAAACAA TAAGGAAAAA TTAAATAACG TTCTTCCATA GACTAAAATT
AGCTTATGTA TAAGTTAAAA TCATCTTTTG GAGAAGCTAA ATGAGAAAAC
CTTTGCAAAT TAACTTGTGC ATAAGCTAAT TTTAGTGAAG CTAATTTTAT
TTTTGCTTCT TATCTTATGG AGAAGTTTGT TTAAATAGG[G/A]ATTTGG
TGTAAATAGT TGTGCTGTTG TTAACTGGGT CATITTTCAG TTTTAACCTT
GTCTGAAACC AA[A/G]CAT GGCTAACAGA ATGTTAACAT TTTCCAAAAT
TATATTCACA TCAGAAATTT TTTTAGAAGT TTTTCCGGAT GA[T/G]TTT
AGCCCAAGAA TTTGTTTGAA TTTCCAAGAT CCATGCTATT TGCTTCAATT
GTAAATTTAA ATCCAACGTC TTTCTTGTTA TAAATGCAAC CAAACATCAA
ATTGAGAATT TTG'ITTITCT CTGTGAATTA TTAGTTAAGT GTGTTTTCTT
TGTTTAAAAT AGATCTAATT TGTTAAGCTT TATTTACCAG AGTTACCAAA
CCATCTGTGA GAAATATCCT TCATTCCGTG AAAGATCTGA AAATGTTGAT
CTCGTGGTGG AAATTTCTCT GCAACCATGG CATGTTTZTA AGCCCGATGG
AGTAAGATAA CTCTCTGGCA CTCTTTAATT TATTTATGCT TTTTCAGGCA
TTTZTCA[TC/GT]GCACTT TTTGTGATAT GGTTCTGATT TTGTTGCCAT
TAACAATTTG CTTGGTACTT ATTCTGTGAC AGGTGATTTT ATTCTCAGAC
ATTCTTACCC CACTTTCTGG AATGAATATA CCCTTTGATA TTGTGAAGGG
TAAGGGTCCT GTTATATTTG ATCCTATTCA CACAGCTGCC CAGGTTGATC
AAGTGAGGGA GTTTATTCCT GAAGAATCAG TTCCATATGT TGGTGAAGCA
CTGACAATTT TGAGGAAAG AGGTCAGACT GATCCTATGT TCTGTTGCAT
TTCTGCTTTT GTTCATTTTA CCCCCCACTT CCTTCTTATA GCAGAAGCAT
GTCTGTTTAG TCAAGTTCCA TAAGACAGGA AATTCTTTAT GTTTGTTTGA
CTTAAAAAGA ACTCTGTATT TTATGGATGC TCATTGCTTG TGAATCATCA
TTGGGGGGTT GTAATAGCAT GAAATGATGG AGAACAGGTG ATTGAAGGCA
TATCTCATGT GGTACATGGG GGAACATATA CCTTGAAAAC TCACAGAACT
ATATATTTTA TTGAACTGTC TATCCTGTAG GTGGCTATTA AGATGCTGAG
TTAATCTACA T[C/T]TCCT CTTTGTATTA AAAAGTTGT TTCATGTACT
AGCAAATGGT GGAGCTTCAT GGAAACATA CCCGACATCT AAATTTCTTA
TTATTGTTGC TTTATCATTA TTTTCTGTGG CCCAATTGAT GTATTATGCC
CCTATGTTTT ACTATCTTAT TGAATT~
pha11627 (SEQ ID NO:770) CAATTCATGG TTTCTCfTTA TC]TTAT[G/A]ACATTGT TGCCAAGTAA
TACTACTAT ATAAATTCAG ATTTGGGTTT C[A/T]GAT AACCGTGGTC GTTAC
pha11628 (SEQ ID NO:771) CTGCAGTGTT GTCTCTCGG AGTTGCTTCA ATTGCTCATA CTCTTTGGG
ATAAC[C]AC TCATTTCAAA GATGTACTAG TTTAAAACAT GCAAA[A]AA
[G/A]ATAAA GTTAATGTGT ATTTTGTATG TTGTAGGGAA GCACAAAGTA
TCTTGATTGA ATTAGGAAGA TTACACGAGC CGTATGCATC AGAATAA[AT
GGTTTGTGGG AGGTTAGATT TTCTGAACGA AGATGAAGA] TATGC[A/G]
AATTCTITTC AAATTAATTT TGGG[C/T]A AATGATGAAG CTAGACTGAT
AATTGATTAA TTTTGGGCAA ATAATATTAT AT[T/C]ACA TGTATGAGAT
TGATTTTAAG TGTATATGCA TACATGAAGC AATAGACTTA ATTTAATTA[C/T]
CTTAAG GAGTG[C/T] TG[G/A]ACT TTTGAGGATG [C/T]C[C/A]TTTTGTGCT
[G/T]ATGAG CCCTCCATGG TTGACATACA [A/G]AGCAA ATTGCAGGGT
GTCTTTAGCT GAGGTIrI"ITG CTGCTTCGAA GTGGCAATT GAATCAGCTC
CGTTGGACAG TGACATGGTG ATGGTGGTGA TAATTAATT[CG/TA]GCTT
AAGGGTAAGT ACAACTTCTT AGCTCTGTAA GCAAAGGATG CCTTGTGGAG
TTGGTTCATC TAATCCACGT ATATATA[G/T]GGCTGAA[C/T]GAGGGA
ACAAGAGTTT TCAATCAATG A[T/C]TACA ATTCCACAC TCTCGCCTCT
AAAGTGCAT CCCTCACATT GAAGCATCCT CCAAATCCCA AAATATTATT
ATTACCACTT AAAGCTATTA CAAATCAGAA AACACTGCAG
pha11701 (SEQ ID NO:772) TTCTCTTGG CTTGCCTCGC ACCTTCTCA ATAATATCAC AGTAAGAAAA
AGAAGACAGC TGTGGATGTT AGGAGTGATT GACCCGCACA TGCATATGGA
TCCAATCGG[T/C]AAG[A/G]TGAAGTTT CTCTTGGCTT GCCTCGC[A/G]
CCTTCT[A/C]AATTAAT ATAT[ATATAT]ATATGTAT GTG[G/C]TT
TGGTGTCTG[T/C]ATATGA CATATGTGCT CAGTGCCTCC GCCAAAGCAA
ATAAAATGGA TACTGAAATG GTTGAGAGCA TTGATGTCAC AGATGATGCA
AGGAGTTTTA GCTGGGCAGT GGACAGCGCC ATAAGGAGTG AGACTGGGTC
CATITITGGA GAATCTCGTG TCCGTAGTTT TGATTCGTTA GTGGACAGTG
CCATAAGGAG TAGTAGTGCT ACTGAAGAAG ATTTGGGATC ATCCCCGGTC
CATGTTTZ"fG CTATGGCAGA AGACAGGGCG ATGAGGAGTA GTAGTGGAGC
TGATTTGGGA ACATTCCCTT TTCATGTTTT TGCTACGGCT GCGGATATGA
GGACTATTGC AGCACGATTA ACAAAAGCCG TGTCGAATGC AAAAAGACAT
TTTGCTGAAG GATTATTTAC CCGGGCTGAG CTTATACCAG TCACCCAGGC
TGAGCTTGAA CCAGCCACCA ACAATTTCCC [A/T]GGATT ATTATTTACC
CTGGCTGAGC TTAAAGCAGC CACCAATAAT TTCTC[AC/TA]GTGACAAC
ATGATTGATT TTTATGTGTA CAGAGGCAAA CTCGTTGATG GTCGTGAGGT
TGCAATCAAA AAGAGGATAG CAACCAGGGG AGACTCGTTT GGTAAGTCTG
AATTTGCCAT CTTTTCCCGT TTACATCACC GGAACTTGGT TGGGCTGGTT
GGGTTCTGCA AAAACAGAAA TGAAAGGCTG CTGGTGTATG AGTACATGAA
GAATGGGTCG TTGCATGATC ATTTGCATGA CAAGAACAA TGTGGAGAAG
GCTAGCAGT GTGTTGAATT
pha12105 (SEQ ID NO:773) ACGTGGCACA AATCCAAGGA CGTGGCGCGG AGAATCCTC GATTCGGTAG
AGTACCAGAT GATGAGGTG CACGTACACC TTAGGTTTAG GAGAGGCGAA
TCTCGCGGGT AAGAAGATAT TCCTTTACGA CGCCGTTTGC AGGCCGAGCG
AGATTCACTC GTTGGAGACG ACGCCGTTTG ATTACGTGGG GAACTGCGAG
AACAAGACGC TGCACGCGAC GCAGCAGATC GCGGAGTGTT GGACGCGCGC
GGTGAGGAAG CTGCTGGAGA GAGTGGCGGA GTCGGTGGAG AGAAAAACGT
TGGAGAAGGC GGCGAGGGAG TGTCACGCGG TGGAGCGGAT CTGGAAGTTG
TTAACGGAGG TTGAGGACGT GCACGTGATG ATGGATCCGG AGGATTTCTT
GAGGTTGAAG AAGGAGTTGG G[G/A]ATGA TGAGAAATTG CGGGGAAAT
GGTGGCGTTT TGCTTCAGG TCGAGGGAGC YCGTGGAGGT GGCGAGGATG
TGTAGGGATC TGAGGCAGAA GGTGCCGGAG ATATTGG
pha12390 (SEQ ID NO:774) CTGCAGTCTT TAGTTGGCCA AAGCCCAGAT TTGATTGTTT TTCTATCGCT
AGTTGAGAGA TGTTGGCAG GATCTTTGTG ACCAAATGT TGATGATTCG
TGGTATAGCC TTGTTTCTGG ATAGGACGTA TGTGAAACAA ACAACAAATG
TACAGTCATT ATGGGACATG GGTTTGCAAC TTTTCTGCAA ATATCTTTCT
CTATCTCCAG AAGTAGAACA TAAAACTGTT ACTGGTCTTC TTCGTATGAT
CGGAAGTGAA AGGTAATTTA TATTTTGCAA TCTCAGTTAT GAAAATGACC
AGTACAGTGT ATGTGCAGGT TGTTCTTCAT GTGTTCTGTA TATGCATTCT
AGAATTCATG TGATGGGAAC CAGTTGTTAC TGATAAAACC AACGAAGACT
AGTTCTTTAC AAGAACATAA GTGCAGAATA AAAGATAAAT CTAGAATTTT
GCAAAATGCA GATATACTGA ACCTCTAGCC CAAGCCCATT CCCTTTZ"TGC
CCTCACTTCA TGCCTTCTAT GCACTACATA TCTTTCCCTC AT[A]TTITT
TiZTfGTTTT TCCCTAATTA TTTTCCACCT GCGGGACCTC TCACTATTCC
TTGTCAGTCC AGCTCTTGGG TTATGCAGAA AACTTAACAA TGTGGATATT
CCCTTTATTT TTTATTCGGT CCTCCTCACC CATGTGTGTT TACGCTITI'I' ACTTCCCATT CCCTTCCTTG TAGCAATTAC CTTATTGGCC ATCCAATTTT
CTTAATATTC CAAGCAGAG TGACCATTTG TTTGGGGAAA CCTTAAGTTG CCACTCTGCT
TTTTATTCTG TAAAATCAGG AACCATCACT CTGACATGAG GGAACTAGAA
TTGAGATTTC TGAATGGGTT TGGAATGGAG TTCTTGTTAG AGATGGGCTT
TTTCATAATT ATI'IZTGCTG ATCTTATTGT TTTTGAAAGA AATAATCCTT
GAACTACCTG CAAAACCATC TTCTACCAAT TGCCTACTGT TGTAGTCAAT
GTACAATGTT AAAGGTGCTG CCTTGTGACC TGAAAATCTA TATCTTGCCA
AAAAATGGGT CCATCAGTGT ATAATTAAAG TAATAATTTC AGAATAGTGA
TATATAATAA ACACCATATA GAGATTCCTA TGGTGATAAT TGTTGGAAAT
GGGAGTATTT TAGGATATTA GAGCTTCTTT AATGTTTGTT TTTCATTGTG
GCCTAGATAC TTGGTATAGC TAGGTTCAAT GCATTTTAGG AAGTGGTAGT
AGATCTGAAT TGCTCTIZTG TAAATTGTTT TAGCATTAAT TATCTTTGTA
ATACTCTTGA ATATAAATAG GCCATTCTGC TGGGTAGAA AAGACAATCC
AGCATATTA AGAAAAATTT TGTTTCTCAT CTTCCACAAT AATTATGTGA
CTTAGCCAGT AATTTTCAAT CTTACTGCAG
pha12391 (SEQ ID NO:775) CTGCAGTGCC ACTTGTACAA GCTCTGCTG CCAGGTTAAG TGTTTCTAA
TAAATTGTGC TTGGTTTTGG TATCAGATTG ACTACAT[A/G]ATGAAGCC
ATTCTGACAT CATTCTGAAA TAATGAAATT TGGAATTGGA AATCTT[G/A]
CTGTGATCT TI'TT"TCCCCC TATTTGAGGG AAGCAAATAC TAGTTTGGCT
TAATTACACT TTTAGTTCCT TTATTTTAGC CTATGCACGA TTTTGGTTCC
CCTAGTTCTA ATTGCTTGCA TTTAGTCTTT GTAGTTACTC AATTGTTAAA
ATTAAGTCCC TCACCTCATA TTTTGTCCAA TATTTAGTGG AAATCTCACA
AGGTGTGATG TAAGTCATGT G[G/A]CATT GATTTGTGTA GATTGTTGTA
GGCAAAATAT ATACAA[T/C]ATCTAAATA AAATATTTCA TATATTTGTT
TCCTTTGTTA ATCTATACAA AAGTATAAAT CAATG[A/T] TGGATAAATT
CTGGCTGTCT TTTTCAATGA ATAATGTCTG GCCTTGCAGC TTTCTCAAAC
ACACTAGCCC CAATTTGTTC TAGTCATAGC ATGGCCATGT TCTCTCTAGT
ATGTCCAGCT GCTTCAAATT CCATTGTGTC AATCTCTATG TGTCAAATCT
CCAACTATAT CAACTTCCA GTCGTGTCAA TCTGCAG
pha12392 (SEQ ID NO:776) CTGCAGGAC ATTCACAGTC ATTGCCGCA GTGGAAGGAA TTAACAAAGA
TAGTAACAG GCAATCTAAC TGCACTGTCA GACTGTGATA GAACAGTCAA
TGCAGGA TG TAGTGGT GGACTTGCGG GACTATGCCT TAGAGTTCAT
TATCAACAAT GGTGGCATTG ACACCGAAGA GGATTACCCC TTTCAAGGTG
CTGTTGGTAT TTGTGA[T/C]CAATATAAG GTTAGTTTTA CCTTTGATTC
TTTGATAAAT TAGTAAATGT TTCTAATAGT TTCATTATAA TACTTATATA
ATTI'I"TTTCT TATCTATAGA TAAATGCAGT TGATGGTTAC GAACGTGTTC
CTGCCTATGA TGAATTAGCC TTGAAAAAGG CGGTAGCAAA TCAACCAGTG
AGCGTTGCCA TATATTGAAG CATATGGCAA AGAGTTTCAA TTATATGAAT
CAGTAAGTTT TTTAATCAAC TTTACTTGAA AAGTAAAGAA CTAAAGGAAC
CTACAATGTG AGTAGAGAGA CGGACTTAAA GTTAGCAATG CATTACATTT
AAACTAATCT AATCTAAAGA GATACTCTCT CCAAATTTAA ATATAAACAA
ATTTAACTAT CAAACATATG AATTAAAAAA TTTAGTTAAT AATATTTAAT
TCAAAAATCT TCCTCAAATA TITTTCATAA TGATGCTTGT AAAATAAAAA
TAAATGATTT ATGAGAAATA AAATTTTATT AAATGAATCA CATAATAAGA
ATATTATCAT TAAATAAAAT AAAAATTAAT TAGTATTTAA TTTATTTATA
TTTAGATCTA AT'I"I"T'I"ITI"1' TATTGCTTAT ACTTAAGTCC AGAGAGAATA
CATTAAAAAT TGTTCAATTC TCAAATATAA ATTACAATAA CTTTACTTCT
TATTTACTGA CTTAATCTAG TATATTTTAT TTGTCCTAAT TTTCCAGGGT
ATATTCACAG GAAAATGTGG CACGTCAATA GACCATGGTG TTACAGCTGT
TGGGTACGGA ACAGAAAATG GAATTGATTA TTGGATTGTT AAGAATTCAT
GGGGTGAGAA TTGGGGTGAG GCAGGTTAT GTAAGAATGG AACGTAATAC
AGCAGAAGA CACTGCAG
pha12393 (SEQ ID NO:777) CTGCAGGGT GGGATGTCCA ATGAATTAT ACAACACTAT TG CCCGTGT AAC
TGATGG AATTTATGAA GGTACTGCCC AGGTTTATTT TTCTGTATTT
CTTAATGATG GTTTTGAATT ATATTGTAAT ATCCTGTTTT GCTACAGGCA
TTGCCATTGG TGGAGATGTT TTCCCAGGTT CCACACTTTC TGACCATGTT
TTGCGGTTTA ACAACATACC ACAGGTAAAA TTTCACTTTT ATCTTGGGTA
CTGTATATAA TATCTGCAAC TCATTAAGAA CTGCAAATGA TTTGGAGTAT
GTCCTTTTCT TGTGAATGAT CAGTTTGTAT TTACATTAAA CCTCAGGTGA
AAATGATGGT AGTACTTGGG GAACTTGGTG GGCGTGATGA GTATTCTCTA
GTGGAAGCCC TAAAACAAGG GAAAGTGACT AAACCAGTTG TTGC[T/C]T
GGGTTAGCGG AACCTGTGCA CGACTCTTCA AATCTGAAGT ACAATTTGGT
CATGCTGTAT GTCAGTGGCC CAATATTI"IT TTACTAATTC TCAGTTATAG
CATTTATAAT AAGGGAGCCA AATTCCTGAT GTTCAGTGGC ACTTGTATAA
ATTAGGGAGC TAAAAGTGGT GGTGAGATGG AGTCTGCTCA AGCAAAGAAT
CAGGCACTAA AAGAAGCTGG AGCTGTTGTT CCCACTTCAT TTGAAGCTTT
TGAAGACGCA ATAAAGGAAA CATTTGACAA ATTGGTTAGT TTATCTTGAA
A'I"I"T'I"TCATC TCTCTAACTG ATAATTTTAC CTTGTCCCCA ACTCCCCATC
TCCCAAAGGG AGCAATGTTT TAAAGAAGTG TCTTTAGTTT TTAGTTGCCT
AGATTAGCTT GAATATCATA GTTGTITI"I"T C[TT]TTGTT AGGTTCAAGA
AGGGAACATC ACACCTTTTA AAGAGTTTAC TGCACCGCCA ATCCCTGAGG
ACCTTAACAC AGCAATTAGG AGTGGAAAAG TACGTGCTCC AACTCACATT
ATTTCCACCA TCTCTGATGA CAGAGGTATG TTGGGATCCT TCGAATTTAT
AAGTTGTAAC ATGGAACTGA GGTGTCCTAG ACACATTTGA TAGCTAAATT
AAACTTTTCT TCCATTATTT TTCC[G/A]A AGGTGAGGAG CCATGCTATG
CTGGTGTACC AATGTCTACC ATTATTGAAA ATGGTTATGG TGTGGGTGAT
GTAATCTCTC TTTTGTGGTT CAAACGCAGC CTTCCCCGTT ACTGTACTCA
ATTTATTGA GGTAGATTAT TCATACTCTA GTCTGCAG
pha12394 (SEQ ID NO:778) CTGCAGCCTC AAAGCCTGGC TGCATTATCT GATGGAAATG CAAGGATTG
GTGCTCATC ATGTTCCCCC ACAAGATGCT GTTGTCGACA AGGGAAAGAA
ACCTATATCA CCTCAAGTTA CTCCCAGAGG GAGAAGGTCC CTTTCTGAA
CCACTTAAAG AGTCAACAG TTGAAGGCCG AGCTGCTCTG TTGGCAAATA
ACAAAATGCC TCATCCATTT ATTTTGATCA A[G/A]CCCA AGGATGAGCC
TGTTGATGAT ATACCAGATT ATGAGATTCC CCTTGCAGTG ATTCCTCCTG
GTATTCAACT CACTATGCCT TGTTGAGAAG CTTCTTTCTT TCTGTCCATT
TTTTGTTGTT GTTGTTGTGG CATCTTGGTG TTGACTTTTA CTTTCATGGT
CCACTTTTAT TGAATTCAAT GATGAGTACT GTATTTTGTT GGTAGTATAA
TGCTGCGGAG TAGGTGGGTT ATTCTTCGGA ATATATAGGA AAAGTCTTTA
ATTCCAACAT ACCTCAAAAG AAGAAAAGAA TTTAATGCAC AATTTTGGCT
GAGCTGAGAA ATAATTTTAG TTGAATAGAA CCTGATTAAT TTTCCCTCCC
TGGAGAAAGG ATGACATTTC ACCAGAGCCT ACCATGTGCT TCCATTTTGC
ACACACAACA AGTCTGTAAT TGCATACAGT AGTAACAGTA TAACITI"TGC
TTTCCCTGAA AATATATGAT CATCTCTGTT TGCTGTTGCT TCTTTATGTG
GAAGATCCCT GGTTCCTGGA ATCAGAGATT CAGAAATGCC CTTGTATTTT
AGACTTGAGC CAAAGAAATG TTACTTCTGT GCCATTTTAT TGTTTTTGTT
ATTCTGTTTC TAACCCCACA TTAAACTCCA TTGATGAGTT CTATTTTGAT
TTATTTTAGA CTCTCCAATG GGTGCAGTTG AAAAGCAAGA TGTCCATGAC
ACTGTTGTAT CACAATGCAG AGATGAAGAC GTTGAACATG AAGATGTTTT
TCCTTCCTCA AATGAAGAAG CAACTTCTAA TGTATATGTA GCTTTGTCA
TCTATGGGAG AGGTAAAAA TTTCTCTGAG CTGCAG
pha12395 (SEQ ID NO:779) CTGCAGTCAA TGCTGATGCC ATTTTCAGA ATTTGATCAC TCAGAAAAGG
GTGATGCAT TATATGGTAA GT"ITI"T[T/A]TT[T/A][TTATT]ATAAA TAT[A/G]TA
TA[A]TATAT AT[A/T]TA[14(TA)/15(TA)/16(TA)]ATAAATATT GAACTTTTA[T/G]
TTTT[A/T]TITIT[A/C]TITI"TAC CTTGGTTTGC TCATCTTT[C/G]ATGAGTG
TTGTTTTGTC TATTATTCGT GAATTTGAAT GGTTTCTAT GATTATACTT
ACTATTTCT GTCTCGCCCT ATTTCGCAGC AATGGAACTT GCATTATCTT
TGGAAAGGCT GAATAATGAG AAGCTTCTAA ATTTACACAG CGTACGTCAC
TTCTATAGTT CTATTTTCTC AGTACTTTTG TGAATTCAAC TCAAACCTTT
TTAATTCCAC TACCTCCCAA TGGAATCTTA CTATGTTGTT GAATGCCAAT
AACTAGTTTT TATCACTATA TATGCAGTTA GCAAATGAAA ACAATGATGT
GCAATTTGTT GACTTTCTTG AAAGCGAGTT TTTGGTTGGT CAGGTAAATC
AATGTCCAGT AGTGTGATTC TTTCTATTGC TGTCAAGAGT AGTTGAGTAG
CATGGGAGAA ACATTGTTCT CATTAATTTC '1"f'IZ"TGTTGC TTAAGGTGGA
AGACATTAAA AAGATCTCAG AATATGTGGC CCAGTTAAGA AGAATGGGAA
AAGGACATGG TATAATATAT GTTAACTACT TGCATCTTAT AAACCAAA_C
GGTTGATTCA TATTCATATA CTTTGTGGT GAAATTAAAG TATTTTATTA
TTGGTGGANC TGCAG
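One site in pha12395 is written as a list of copy numbers, 14(TA)/15(TA)/16(TA), which appears to describe a (TA) dinucleotide repeat present in 14, 15 or 16 copies. Reading it that way is an assumption, and the helper name `expand_repeat_site` is hypothetical; under that reading, each allele can be expanded into an explicit sequence whose length differs by one motif per step:

```python
import re
from typing import List

# Assumed form of one allele: a copy number followed by the repeat unit in
# parentheses, e.g. "14(TA)" meaning the motif "TA" repeated 14 times.
REPEAT_ALLELE = re.compile(r"(\d+)\(([ACGT]+)\)")


def expand_repeat_site(site: str) -> List[str]:
    """Expand a site such as "[14(TA)/15(TA)/16(TA)]" into allele sequences."""
    inner = "".join(site.split()).strip("[]")  # drop OCR spacing and brackets
    alleles = []
    for token in inner.split("/"):
        m = REPEAT_ALLELE.fullmatch(token)
        if m is None:
            raise ValueError(f"not a copy-number allele: {token!r}")
        copies, motif = int(m.group(1)), m.group(2)
        alleles.append(motif * copies)
    return alleles


alleles = expand_repeat_site("[14(TA)/15(TA)/16(TA)]")
print([len(a) for a in alleles])  # prints [28, 30, 32]
```

A length-based readout like this matches how such repeat polymorphisms are usually scored (fragment size), whereas the bracketed single-base sites elsewhere in the listing are scored by identity.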
pha12396 (SEQ ID NO:780, 781) CTGCAGGATT TGGCATCTTC TGTTTAATCA AGCATAAATG ATATGGTTGA
TTAATTAGTG ATATATCATG TTGGGATGCT GCTAACTGA ACGGTTGACC
ACTACACAG TGTGATGTTT TAATTAACTT TGCTTCAACA TT[ACC]ACC
TTATATTTAA AGAATTAGGA ATAGATTTAC CAGTAGGCCC TTATGATAAG
AAAAATAAAA TAAATTTACT TTCTCCTCAA TTAAAATCGG CTT[C/A]TG
CATTTCATTT TTTAAGAGGT TAAATGAGAG AATTTCTCTA TATAAAAATA
GATAAGTTAA TCACAAAATT TCAATTTCTT TTTCTITTCT TCTTI-TATAA
TTCTGTiTCT ATTTAAACAT TATTCTTCTG ATTAAGTGGG TTTGTGAATA
TGGATTGCCC AACCTAATTT ACCTAAAAAT TTGGAATAAA CTAAGTAAAC
TTATTTTGTC ATCATTAATT AGTTTAATGT GCATTACTTT GATTACTTGA
GGATAAAAGT GTGTTTTACA TATCTTTATC ATATAGGGAC CATTCGATGG
GCAAAATAAG ATAAGGTTTA ATAGGATAAA TTATTATATT CTI'TT"TCAGT
CAAZ"I"CI"TGA TACACTATTA AAATATAATT AGTTTATTTT ACCATCTATT
ATATCATGCT GTATCAGTGT AGCTTTTATC CCACCCATCT TAAGAAAAAA
TATATGTTGG AGAAAAAAGA AAAATATAAG AAAAAGTAAT GACAGAAGTG
ATAAATTTTG AGAAAAGTTC TTTGATGTTA ATGAAGAAAT TTGTTT...(~550 bp)...TCCACGCT CGCCTCATCG TGCACTCGGT TACACCCGAT AACTTGTTCG
TCTCCAAACT CATCCTCTTC TACTCCAAAT CTAACCACGC GCACTTCGCG
CGCAAGGTGT TGGACGCGAC CCCCAACAGA AACACCTTCA CGTCATGTTC
CGCCACGCGC TCAACCTCTT CGCGTCATTC ACTITITCCA CAACCCCCAA
CGCCTCCCCC GATAACTTC ACCATATCCT GCGTCTTGA AAGCCTTAGC
TTCGTCTTTT TGCAGCCCCC ATTGGCGAAA GAGGTTCACT GTTTAATCCT
TCGACGCGGG CGGATTGGAC TCTGATATAT TTGTTCTCAA CACGTTGATC
ACGTGTTACT GCAG
pha12436 (SEQ ID NO:782) CAATTCAAGA AAAAAGACA ACATAAAAGA GTCTATAAGA CCTCC AAG
GA[A/G]TAT [C/T]CCCTA GAA[AA/GG] TCG[GTCG]C TTTCACAAAA GT[C/A]AAA
AAACATATGA T[A/G]TTT CCTGCACTTC ATATAAATAC TCGTCATTTAC
P13070 (SEQ ID NO:784) CTGCAGAATG CTGATATAAG TTGGCAAAGT CGTTTGCATT TTGGCCAGAA
GTTTGCATGC TTGAATAGCC ATTTTCAGCA ATAAAATATA ATCCCATGTG
TCTCTTI"TTG CTTCCATTCT CAGCCTTACT TTTTACCATT TGAAATTTCT
TTTGACATGT ATAAGCATTT ACATTCGAAC AAAATCTTGA AACTCAAGTT
TTAATAATTG ATTATGAGAT GGACTTTGAC TCTAACCTGA ACTTATTGTT
GACCAGTAAC TGTATI"1'T'fC ACTTAGAATC AGAACATAGC TAGAATTATC
GTGAGTATAC TCTTT...(~650 bp)...GGG GCTITT'I"1"i'T CATGATCATT
CATGTATCAT GTTACATATT TGACATAGCA GTAGGCGAGT TCTGAAATTG
TTCTTTGGTC AGGTTGTCAG GCACAAAAGG GATGTCAATC GATCATGTTC
TTAGAAGCTC AAGCAGGTAT AAAAAGGCCA GAACAACGGC CTCTCAATTG
GAAGAGACTC AAAGCCAGCC CAAATTTGTT CCAGACAGCC TCGCCGAGTA
ATCACACATT TCAGCAGAAG AAAGGAAAAA GGACACGTAA CCACAACAG't' GGTATI"I'TTG TTCATITT'TG TGTCCTTACA GAGCTCCGTC TCTCTGTTTT
TGTTGCCCTT TTACTTGTAG GTTTATCTCG TTCTTAACAT TCCAAAATCC
AAACAGAAAA CTATTTT'TG CCGTGTAA GCGTGTTTAC CAATCTAGTT
GGTGATTTA TGTCACGAAT CTAGTAAAGG AGAACAATGT GATATATTCT
ACTA[C/T]T ATATGTATGT TACACTAAAT TACTTATATC CCTAGTGTGA
ATTGTGTGAA GTATCTTTCA ATTCCATAGA GGAATAATC TCTTGACTGA
TAACTTGCAG TAGCGTTTT CTTITI"TAAC CACATITITT TCTATCTACC
AGAATGAAAC TTGAAACTTC CTTTAAGGAA TATGGGCCTA GTCAATAAGA
GAGATAATAT TTATATTTAG AGAGAGAGAG AGCAATAGGT ACAAGTTACA
AGCAAGAAGT GGTACAAGAA AAAGGCATTG GAAGCGGGTG CACAAAATTC
CATTCCCTTA GCATTAGCAT AGCAAAGTCA CTACTCTGAT CTCATTCTGT
TAAGCTGTTG AGTGCCGGTT GCAAAGTACA TAACCCAAAC AATCGCAGGT
GTAACCACAA TAAGCAAAAG CTGTCCCCTG TTGTCACTGG CGCAGCATCA
GCAATCACAG CCTCACTAGC CGATGCATCT GAGCCAAAAA GCATGCCCGA
TGCGCCCAGC CCAGCCCAAG CCCAATGATG ACCCCTTGCT GCTACGCATG
CGGTTCAGTT GGTTCAGCGC TGGCTGCAG
P13071 (SEQ ID NO:785, 786) CTGCAGTCCT TAGGACAGTA TTTGTTTGGT TCAACCAGCA ACCATCCTAG TC
TACAAA TCTTTGTTCT AACTATGGAT AATGGATGCC TTCAACCACA ATTAGCA
AAGCCAGGA T AGCTACTT GGTTCAAGCT TGTTGATTAC TGCATAATCC TGGT
AATT TTGACAGATG AGTCTGAATG AAATGTTTCA GAATCACAAC CTGTTGAG
TCTTGAGACT GGTATATTCT AGCTCTGCAT AGTAGACAGA AACAAAAAGT
CACTCCGG TAAAGACCAG AAACTGGATA TCTCAATAGT CAATTGTGTC
AAAGTATAAA TTATACTTAG TTTTCCAGAG AAAAATCTTT GGACCATCTG
AATCCAAATT TACTAACTAA TAAATGGAAC CTCCAATACA ACTACATCAT
TAAAGAATTC TGCTGGACAA TATTCTGTAA AAATCCATTA ACTTAATTGA
TATACTCACT CATAATTGTG CTATCTTGTT CGATGTTTGT TAATITITTC
AAATTGGGAT GTTGGTTTCC ATTATATCTT TTGCAAAAAA TTTGAGCATT
ACAAAATAAA ATTATAGAAT AAGATAAGCA GTTGGGGATA GAGAATCTCA
TITCI'CTGCA TTTCGTTAGA GATATTACGT TATATTAGAT ACTTCAGTGT
GGTTATGACA AGAAACTCTA TACTTTCTGG CATCTGAACC AAAGTTGTGG
TAGACCCACA CTAACATCAT ACCAAAGTG CTCATTCTTC ACAGAGTTAG
GAAAAACTA TACTTAACTA TAAAAACTCT GGCTACTCAC AAGAATGCTA
AGCTAGTTTT CTAATATTAA TTCAAAATAC AGAAAATTAT GTTTAATCTT
ACACTGGAAA TCAACGAAAA GAATGGAAAA AATAAAAATG AGGAAGAGCG
TATTGGATT GTTCACTTTA ACATCAACTT ATATCTCAGG AATTGTAAAG
CATCCCTGCA TCTCTATATG AGATTTTAAA CTAATCCAAA CAAGCTAACC
CTTCCTAGTC CTAGAACTCT TATCACATG CTGAAAGGTA TACACTCACT
ACTAACTAAA TAACATGAAA ATGATGTACA CTTTGATTAC AAACATTTTA
AGTTTTAACT GGTTTAAAAA TAGTTCATAA AAGAGAATAA TATATAGGTT
CAACAGTTTT CCAGCTAGTC CGCACACCTA TTTGCCACTT TGAAGCTGTG
TGTCTATTTC CGCTTTGAAG CTGTGTTATG AGATAAGACA ACACTCCAAC
TTTTATTCTA CAAATGCTGA TTTTCTTAAC AAAATCTTGT TTCTACTCTT
ATGTTGCTTG CTAATTAATA CCTTACCATG TCAACATCTA ACTTGAGAAG
ATCTATTIT'T CTAAATAAAT TATGTTTCTT CTTTATATAA TCGGCCTTCG
CTAATTAAAC TTTATTGTCC TTCC'IZTI~TA NTGCTGTTCT ACTAAGGAGC
AAGGGTAAAA TTGCGGTTCA TATGCACCAA CTA...(~300 bp)...GGCAGTAT
TATTGGCTTG GGAGGCCCGC CCATGAGAGT TCTGCAAAGT AAGCAGCGTC
AACTCACGGC CAGCCTATGA CCCCTCCAAA AACTTCTTGC ACATCACAAT
GAGTGCCGCA AGCAGCTCCC ACTGCTTGCT TTGTTTTCCT TACTCATTAG
TTTAAGGCAA [A/G]TTTT[C/A]C[G/A] GTGCTGCTTT CCTCTT[G/A]TTTATTGAC
TAGATGTTGG GTCGATACCT CGTTT[G/C] CAATCGATGT GGGGACTCTT
TATTCGAAGT TTGCTGAACT TGTGCTTTGC TATCTAAGCC TTGCACATCA
GCTTTTGTTA AATGAT[T/C]CCTTGCACC TTGAATCATG ATCACATTTT
AGATTCTGAT ATTAGTATTA TG[C/T]TTT TGATGGAATG CAAGGAAAG[G/A]
AT1T1~T AATACAG[CC/AT]ACAAAA TGATACCAAA AGGCTTTATC
AATTGATTAA AAAAAGTGCA AGTGTTATTA TGCATGCACT TGAATTTGGT
TACAACTTGA TITTTTACAC AGCTTGGAAT TTACTGTTTG TAACAACTT
TCTCCCAACA CAAGTTTCG CTCAGTTTTG AATCATAAGG TTAGTAATCA
TTCAGCTGCT TTCAGTTATT TGGATCCTTC TTTGGCATGA ATGACTTTTG
GTTTTGGACA TACTTACAGG AATAATAATA TAAGCTGGGA TGCCAAAGAT
AATTGGAAAA GTACTCTTCT GCCTTCATGG TACTAGGCTT TTZTI"TCTGT
GAAAAATAGT GACAGTACAA GAGACTTTTA AATTGTATCA AACTGAAAAA
TCTTTGAGGC GATTTCTATT ACCTTCAGCT TCTTTAAGCT CAACCAACTG CAG
P13072 (SEQ ID NO:787) CTGCAGCACC CCTAGCAAGA CCTGCTTATA TACAGAAGCA AATTTAATCT
ATGAGGCTCT GATTI"ITCTT TCTTTCTCAT ATTTATGTGA ACATCAAGCT
GAAAATAATC CTCATGTAA CCAACTCTCT ATGAAGTTTG AGATCCAAA
CCTGTTTI"TC TGAAAGATGG AAAAGAAAAT GAAAGTTAAA AAGATAGAAT
CAAACATGTA GCTATATTCA ATTGTAAGTG CATAATTCAG CA[A]TGAAA
GGTGATACAA TCAAAATTAC CCACCTCGGG TACAAGGGAA CATAGTCTCC
ATGTTGATCT GGC[A/T]AG TTAAATAAAA TTGAAACGTT AGACAAATAC
ATAAAATGTT ATCCCAAACA CTCAGAAAAA TAAATGAGTT ATTTGAACCC
T[C/A]AATA TGCTGATAA TACCAAAAGG AAAATAACTT AGAGGCATTG CTCTCTATTC
AGAATTGCAG GTTCATCTTG ATAATTCATG TTATAAAGAT TTGCATTAAA
TATTAGAAAT TTAGAATAG CTTACCGAAG TGAAACACCA AATCCGATTA
GAGAACAAC ATCAAAAACA CCTAAGGAGT TGTATCTAAG TTTTGTCCAC
ATCAAAAAAT ACATTGATGA TGCGAAAAGG AGAGCATACC TCTTCCTGGA
ATGAAAACAG TCCCACTCTC TTTGCTGAAA TTGACACCAG GATTGTAACG
CTGCAG
P13073 (SEQ ID NO:788) CTGCAGCATG GGCACTATAC AAGGCTCAAG AAGAGCTCAT AAAGGTTGCA
AAAGAGTTTG GTGTTAAGCT CACAATGTTC CATGGCAGAG GAGGAACTGT
ATACTATTCA TGGCTCACTT CGGGTAACAG TGCAAGGTGA AGTTATTGAA
CAGTCATTTG GAGAGGAGCA CTTGTGCTTC AGAACACTTC AGCGCTTCAC
TGCTGCTACA CTTGAGCATG GAATGCACCC TCCTGTGGCA CCAAAACCAG
AGTGGCGTGC CCTCATGGAT GAGATGGCT GTCATTGCTA CAGAGGACiTA
TCGCTCCAT TGTTTTCCAG GAACCCCGTT TCGTTGAGTA CTTCCGATGT
GTAAGTATTG TTGAATACTT CAG[T/A]AT AGAAAGATGT CCTTGAAAAT
CTAGCAGTTT AAGTGGCATA TTTACAAAAA TGAT[A/C]A TTTAGTTAGC
ATGATTAACT AAAATGCAAT TGTTTCCAAT CAAGACAAAA TTCCTTTAGC
ATTATTGATG TTAAAATAAA TCGTTAATAA TGTTTTACCA TTTT[T]TTT
TCTTCCCCAA TCTTGTGAAT ATATTATTAT CAGTTGCAAA AATTCTGATT
CAACTGGAAT ACAATTATAT TTCTGATGA TTTAAGAAAC ATTTCTCTTT
CCTTTGGAGT CACAACTAT TTATCTTGCA ATTTATTTCC TTATTTTCTT
TCCTTGCTTT AATGCTGAAT ATTGTAAACT GCAG
P13074 (SEQ ID NO:789) CTGCAGGATA TGGAAAGTGG AAAACTAGAA AACGGAAAGC CTTTTAG TG
ATAGTGTGAT ATACTACTAT ACGTAAGTTA CATTCATTCA CCAACAAAAA
AAAACGTAAG TTACATGCAT TAGTTTTCCT TCTTTAAGGG ATAAAGTGAT
CTTTGAGGTA ACGATGGCCC AATCAAATGA GGTAACGATT GAGCTAAAAT
GCAAATGACA CAAATGCAAA TAGTAGTAAT TGCTTGAGAT TTAGGGGATT
AGTTTAGCAA GCATAGTGAT CATCATTTTA TAAAATTAAT AATGATATAT
TGAGGGACTT TTTAAAATCA TAAAACGTTT TAAATTCCAA CAGAAAT_GG
ATAGCAAGTC AATTTCATGC CTTGTGATAG AAAAAGAAA AGTCGTAGTA
AATTATATTT TGCTTTGTAT ATGCTCAAAT CACATTTITT ATTTTCTTTT
CTTTTAAAAA TTTAAAATTT AAAATATACT GTGTAATAGT ATGCTACACA
TTCATCATTT TAGAAAAGTC AAATGAATTA TTTGATITI"T TATTATATTT
TTTTATTCAA TTTGATTATT TATCTI'T'TAA AAAGTTTAAT TTGA
ATCTTATTTT TTGGTTTAAT TTAATTCTTT ATCTTTTAAA AAAATTAGTT
ATTTATCTTT TTTTAAAGTT TTATCATTCG ATATTTAAGA TTGACGTCAT
TAATCATTT AAGAATGAA[C]TI'TTTCAG TTAAAAATGA AGTGGAAAGG
AG[A/G]GAA AAAATGAA[A/C]AAAAAAA ATTTATTTGT TAACGATGTC
AATTTTAAAT AGATTAAAAG ATAAAAAAAA ATCAAATAAA TCATTTAGTC
TTTTAGAAAT GAATGAGTTA TATTCAAATT TTTTAATATA GATGAACAAA
TGAATGAGTT ATACTTAAAT TITT'TAATAT AAAAGAACA TTTTTAAGGC
AGATGAAACT TAGGCCCTTT TCTGAAAACA GTTTATTAAC CCTTTI'CTGA
AAACATTCT TCACATTCAC TAAGTACATC TTCATGTCCT GCAG
P13158 (SEQ ID NO:790, 791) CTGCAGCAAC ATCAACAGTG TCCCGAATCG AAGCGTCTCG ATCCGGGTTC
CAACGCTCGC CGGCCTGGAC ACCGGAGATC GGTATCCGCC GGAGCTCCGT
TAATCTACTC CGGCGGAGCC ACCTTACTCC CAAGCGGGAA CATTTGTCCG
TCCGGGAAGA TCCTCAAACC GGGCTTGCCC TCGCGCGGGT CGAACCGGAC
TGATGTGTTG GGCTCCGGCA CCGTGAAACT ACGGCCGGGG CAGCATAGTG
CGAGGCGTCT CGGGCAATAT TCCGGTGCCC GTGGGCGCAC TGCCGCCTAC
GGTGAAGCGC GCGCTCAGCG GCTCCGATCC CGAGGAGTTG AAGAGGGCTG
GGAATGAGTT GTATAGAGGC GGGAACTTTG CGGAGGCGCT GGCATTGTAC
GATCGCGCCG TCGCCATCTC GCCGGGAAAC GCCGCATGCC GAAGCAACCG
CGCGGCGGCG CTTACGGCGC TCGGGAGGCT CGCCGAGGCC GCGAGGGAGT
GCCTCGAGGC GGTGAAGCTG GACCTTGCTT ATGCCAGAGC GCACAAGAGA
CTTGCTTCTC TTTATCTAAG GTAATGTATT AATGGAAAAA TTTGGATTTG
GATTTGCATT TGAATCTGAG TTTGAGTTTA GTTTTGTTGA GATTGGATTG
GAACCAAGAA ACTTGAGTTT AGAGCTAGTC AAACTTGATT ATGGCTTTGG
NCAACGTGTT TGGTACTCCC TGTTAACGTG ATTAGTGGAG ... (about 50 bp) ... TGACT CTGTGTTGAA TGTTGATGCT ACTTTCAGTT GCTTCTGTAT
CCAAAGAAAC GTGACTCGTG ATATATCACT TTTGTGCAGG TTTGGACAGG
TTGAGAATTC GCGGCAGCAC CTGTGTCTCT CTGGGGTTCA AGAGGATAAG
TCTGAGGAGC AGAAGCTGGT GTTGTTGGAG AAGCATTTGA ATCGGTGCGC
TGATGCGCGG AAAGTTGGTG ACTGGAAGA GGGTGCTTAG GGAATCTGA
GGCTGCCATT GCTGTTGGAG CAGATT'ITTC GCCTCAGGTA GTTTTGAATT
GAAATTTCTG ATGTTACCAT TGTCTACATT [G/T]TI"I"IT G[C/T]TAGA
GATGTCATAT GAAATTATTA GCGTGTCTTT GGTTAATAGC ATAAAGTTTT
AGGATAGCTA GGGTGTGATT GGTTTCTGTT TTCAAATAAC TGTITTTAGT
TTCCAAAATA CAACTAAACA AGGTTGCTCT GTTGGCTGGT GCAGAGTGTA
GTGCAAGGGA CAGAAAAGTG TGGAATTGAT GGGGAAATTT AACAGGTTTT
AATAATTTCT TTTGTTTGTT TI"1"I"TGGTTT TTGGTI"T'I"TA GAATACTTTT
[T]TTTTGAA ACAATITTTA GAACTTAATA GATTTTGGAT GGTAAATTGA
TTGGTAAG[C/T]AGTGTAA AATTATTI"I'G GGAAACTGTT TTTAAAATCA
AAAAGTGAGG AGAATTAATG AGGTCCTTA GTTCGTGGTA TGGTGGTAGA
CTAGATTCTC TACAATCAA GTATAAAATC ATCCTCGGGT TTTTACTTGA
TAGTTTAAGC TTTTAGGATA ATTGGTTTGT GACAATTTGT ATTGGGCTAA
CTTGTTGGAT TGCTACTTAC AGA -TTGTTG CTTGCAAGGT GGAAGCCTAT TTAAAACTGC ATCAACTTGA
AGATGCTGAA TCAAGTCTCT CAAATGTTCC GAAGTTGGAA GGTTGTCCTC
CAGAGTGCTC TCAGACCAAG TTCTTTGGTA TGGTTGGTGA AGCCTATGTT
CCTI"ITGTGT GTGCACAGGT TGAGATGGCC TTGGGGAGGT AAACACTAAA
AACCTTAGGC TTGAAATCCA AAGCTAAGTA AAACTTTTGA GTGAGGAACA
AGTAATGGAT TGTTGCAGGT TTGAGAATGC TGTT
P13560 (SEQ ID NO:792) CTGCAGAACT GGTGGTGGTA CTCAGTCGC ATTGCATTAA CATCTATAC
TATCTGTTCC TGCTGCACTG ATTTCAGTAA AGGATCTAAA AGCTTTGAGA
CTGGG[A/G] TTTAATATGG AGCTTATAGC TATTGGATGT TCAGTAAGAA
TAGACATGAT ACTATTTCCT ATGTCAACAT AATTCTTCTT CTTCTGACTT
GGAAAGTTAA CATGATCTTC TTCTGAATGC AGGCAATTTT TGTCTTATCC
TTTCGAGGTG TTATCCATAT ATGGATCATG GGGAAGAGGG GCCCTGTCTA
TGTTGCAATG TTTAAGCCAC TCGAAATTGT CTTCGCAGTC ATCTTGGGGG
TTACTTTTCT TGGGGACTCT CTTTATATTG GAAGGTATAA CTCAGTGTTT
TGT[C/T]AG AGAGTTATTT TCTTCTTACA TACTTCACAT TATTTTGTTA
AAATCCATTT TCT[C/T]CT GTCACTCTAT ATCACACTTT TCAGCTTATT
TTATACTTCT TTCTTTTCTC CATTTATGTC TACCTTA[TT A]GTTGTATA
AGAAGTTGTA AAAACCGTAG GTAGGAATAT AATTTCTCTT ATGTTAACAT
GTTCAATTTA AAACATTTGT TTCGGGTCTA ATTACAAGGG TCCAACTTTA
TGTACAGTGT GATCGGAGCT GCCATAATAG TTGTTGGTTT TTATGCTGTT
ATTTGGGGGA AAAGTCAAG AGAAGGTGGA GGAAGATTGT ACAGTCTGC AG
P13561 (SEQ ID NO:793) GAATTCTTAC AATCTCTTGA TTCATGTAAC TGACTTTATC AGAATAGTTC
AGTATACATT TTGATAACTT CACAATCTAA AGGATTCATC ATATAAGCAT
ATCAAAGAAA AGGTATGAAG GTAAAGTGGT TAAAGATAAT AAATATCTGA
CCTCAGGTAG TCTGGAAGCA AAATATTTTA TATAATCCCG TCCAATGTTC
TTCACAATAC TGTCTAAGAG ATAAAGAGAT GGCAGTITI'T GATCACTCGG
AACCTGGAAA GTCATGAACT CATTTCAATT AAACAAGACC TITTITCCAT
AATACAAAGA CGCCATGAGA GAAATAAGAT TTCCCTGACA TATGAAAACA
AGGGAAACAG CTGCAATCAA CGATATCTAA TCCAATTAAA AGTTAGTATC
AAATTTCACC AAAAGTGGAC TTTAGCACTG ATTGAGATTG GAAACTTTTA
GGGTTCACTT CTCTCTTGTA AATGGAAAAA TCCATGTTTA CCAGGATTCT
CAATCCCCTG TGCATACCAA AACCACTCCC CAATGGTGCT AATTACCAGT
TACACACGTA GTCAAAATAC CAAGCTACCG TAATTTGGCC AAAGGACAGT
ATTTGTTTT TCTTTGTAAC TCTGATTGGA AAACTAAATA TGTTTIZ'CGT
TCTTTTAAGG TGCAGGGCAG AGTGTACCAC TTAAGAACCA ATCTACATCT
ACAGGGCCAA TGAAATGAGA TGAAATTCAA TTTGAACAAA CTATGAATGC
CCZTiZTCTG ACTAGCATTC ATTTCCATCT AAGTACAACA CTAACCCAAT
TCATCAACGC ATAGCATGAC AAACAATAAA ACTAATTGAA TTGCTCTTTC
TTCATTCCAA ATCCATTACT ATTCAATTCA ATAAACTTGA CATAAACAAC
TGCATATTTT GTTCACCTCT ATAATATTAG CACAAACGGT GGCAGCAATT
GCCTTGGCAG CAGACAAGTT CTCTCCAGCA ATAATAGTCA AGTTGGTAAT
TATTGGCTTC GAATTGAAAG TGAGCTCAGC AAGCGCGGTC TTGTACTGAA
TCACAAGCTC TTGGTGCGGC GGAGGCTGCG GCTGATACCC TCCGCCGCCG
CCGGAGTCTC TGTCATATGC TCGGAACCTC GTAGACGGCA ACGTAGTAAC
TGCCGCAGGG CGTAGAGGTA ATTGTCGGGC GCTGAGTTCC TCGATTAATC
GAGGCTTCTT GGGACCGGGT TCTCTCGATC TGTCCAACG ATCTCTCCAT
GTTCATTCG AATCAAATGC AAACGAATTT AAAACCCTAA TCCTAACCTT
TATTGGTTCT CG[C/G]GCC GGTTTTATTG GGGAGGGGGA ATGAATCGAA
GAGAAGCTCG GATTTAAAAT TGTAGAGGGC GAATTGAGAT AAACCCTAAT
CCTAATTTAC ATGAATTAAT AAAAAATAAA AAATAAAAGA GAGAGATGGA
AGAAGGTGGA AGGAGGTGGA GTGTTTATAA GTAGGCTGTG ATCTTGGTTG
GAAAAAAAAA [A]CGAAGAA GAAAGAGTTC AAAAACTTAT GATGGAGTTC
TTATATTITT TAACGGTTTC CTAAAGCTTG ATTTTAAACG ATAAAATTTT
ATTAAAGTAA AATAAACATT TTCTGAT[G] AAAAAAAAAT ATCACTI"I"IT
TGTGTAAAGA ATATATCACG TTTAAATGTT TAAAATTAAC CTTAAAATTA
AATA'1~I'T"1'TA AAAATCTTTT TAGATTAGGA GTGTTTGGAT AGGATTTAAG
ATGAAATAAA ATTCAAATCA ATCGTATTTA AATAAATTAA TTTAGTGTTT
TAAAGTATCT TCAGTCGATC TAAATCAAAT CAATGTAATA ATATTAATAT
TATGTTGTTT TGATACTTTA ATTTCCAAC ATAACATATT ACTTACATCC
GACTTAAAT ATAATTAATA ATAAGTTGTT TTCATGCTAG GAATTTTATT
ACAGTATCAA AACACTAACA TGATTTAAAT ATCCTTATGA GATAAGGCCA
AAAAATTCCA ACGTGTAAGT GATACCCAAC TTCTTATTCG TGATGTCTTT
GTTGATGGCC ATTGGCGTTG GAAATATTTT TTCCTCCATT ATTCTAATTG
ATGTTAAGCA TAAGATGATG AACTTAATTC TTGATGAAAA TAGTCATGGT
GTGATCATTT CGGGGCATAT CCAATCTGGT ATCTACACTG CCAAGTCCAC
CCATGCAAAT GATTGATTCA CGAGTCTGCT AATCATAGTT CACTTACTGA
GAGATTGTAG CAAGGAAACT CAAATATGGA GTCTTATAAA TATGGATTCC
AAGTGTTGAC TTCTTCATTG TGAATTAAGA CATTTGGTTC GCAACTCAAC
TGAAACGACA TAGAAATCTT TTTGCTTGTC TTGCTGGGTG GTTGATTTGG
ATATCAAGGT ACGTGGAGAT CTTTGAGAAA CGCATTTGGT CTACATGG
P14257 (SEQ ID NO:794) CTGCAGAGGA TGATTGATGA GGGCACCTCT GAACATGTGC TAATAGTGTT
CAATGGAGTC ATAGATGATG TTGTTGAGCT TATGGTGGAC CCTTTTGGCA
ACTACCTTG TGCAGAAGTT GCTTGATGTG GGCGGAGATG ATGAAAGGTT G
CAGGTTGT GTCAATGTTG ACAAAAGAAC CAGGGCAGCT AATCAAAACC
TCTTTGAATA TACACGGGT ATATGCATCC TTCTGTTGA ACTGAAAGAT
TGTTCTTTTT TTCTTITTCT ATATCATAAT GAACATGTTT TCTIZTrCTT
TTTGACATGC TGAATTTGAT AGTGTTGCTG TATCAGGACT CGGGTGGTTC
AGAAGCTGAT CACGACTGTC GACTCTAGAA AACAAATTGC AATGCTTATG
TCTGCTATTC AATCTGGTTT TCTTGCTCTT ATTAAGGATC TAAATGGGAA
TCATGTCATA CAGCGTTGCT TGCAATACTT TAGCTGTAAA GATAATGAGG
TATAACACTT ATCTCTTTTG CTTGCAATTT TATTGTAGGT TTATTTTCCC
ATTATTCACT AACTTCAACA GTACATAGCC ATATAGTTAT TCCATACTAT
TCAGTTAAAT TTATAAGAGA ACAAGAACAA CCATTAGCTG TCCACTTTAC
AATTTTGTTC TAAGCTGTAT GAAGGTATCA TTACACATGA TGACACAAAA
TGGTCTTTAG TCAACACTGC TTAGAGCAGA GGGCATTCCT GTTTTATTAA
TGTTTCATAA ATTGGTTTAT TCATGGTTTT GTGAAGCGTT GGCTTCTTAT
AGTGTGAGCT CTCCTGCCTC TTTTGGTCAT TTGGTGTCAA CCAGATGAAA
TATATTTCAC AAAAGGGTCT TAAATCTTAC ACCTTGTTGA AACTTTTTTA
ATTGCATGGT AATCTCCCAG TCTATTTTAT GCACACTACT TCAGAATATC
CTGACTTTTC AAACTAATTT CATTATTCAG CAGTTTCTAA ATTTGCTGGA
TACTTCATCT CTGGGATTTT CATATTTTCA AGCCTATGTT TATCTCTGGT
CACTTCAGAA ATTTCTAAAA GGTAAGCCAA CTGTGGAAGG AACAAAATTC
ACCATCTATT GAGGGATACT ATCTTATGAC CATAATGGTG AAGCATTCTA
TATTTGGCCA GTTAGCCTGA ATATGTTCCT TATTTTCTTT CCCAATTAGG
AATTCTTAA[T/A]AGGCTA ACTCTCTACT ATATAAGAAG CAATGGAAAT
AAATGTTCTG GTTGAGAATG TAATGATTAT TGACCATTGA CATGGGACTG
AAAAAGTTAG TAATAAATTC GGGTGATCCT TTTCATTTTG TGCCAGCCTC
TAATAAATTA AATTATTTG ACTCATTGCC ACTAAAATTT CTCTGTTTC _ ATTTATTACT TACCACGATA AATATTAAAT ATAACGTAGG TACTGCAG
P14395 (SEQ ID NO:795) AAAGATTGA GCCCTACAAT GACTAAATTA CGTGTGTAC CGATTAATAT
GTTTTAGAAA AATAATCACT TTTC'T[A/C] TAAAAAAA[A/T]TTAAGAC
AGTTTCGGAA AAACAAATAT TTACTCAAA TTTATTAATT TCCAAACATA
TATGTTTCGT TTATATATAG TTCACAAAGA AAACACCGAA GTTTGAAAGC
AAACACTACA [C]AAAAAAA AAAAAAAAAA AAAGGAGCGA AGTGCTAGGA
CAAACCCTAC AAAATAATTA ACCAGGAAGG GTAGGTGGTT ACGTTGTTAA
CTCCAAAGGA TGAAGTTTCA ATAATTGATC ATTTCCTTZT TGCCCAATTG
GCATAAAAAG TAAATTTTAT GACAAATATC TAACGAGGAT ATGGCTCGGT
GATTGGAGCC ATAATTGATG ATTCTTGAGC AGATTGACTT CCTGAATAAC
CTGACCTTGT TACCTCTGAA TGTTGCCCAC CAGACAGCAT AGCCGTAGGC
AAAGAACTCT TATCACGCAT TAAAAATGCA GGTTCAGAAG GTTTAGCAAG
TGGGAAAGAG TCACTGTTAA GCATCAGTAA AACGGTATTC ATAGTTGGTC
TATCAGCTAT ATCTTCCTGT ACACACAGTA ATCCAATGTG AATGCATCTC
CTTATTTCAT TCCAAGAATA ATCCT'fTAAT GTGTCATCTA CAATATTTGA
AACTGTCCCT CCCCTCCAAT TTTTCCATGC CTGCAAAAAA ATTGCTCAAC
TGAAAATAAA ACGTTTATAA TAT[AT]GAA ATTTCTTAAA TGTGGAAAAT
TTC[A/G]TA TGTACAACTA TAAGCTTCCG AACTTCAAAT AAATAGGGAA
ATTAACTATT AACTAATAGT CGTGAAAAGG TATCACACTT ACAAAGCTTA
ATAGATCTTG TGCATTTTCC TCGCTACCAC GAATCTCACT GTTTCTTTGT
CCGCATACAA TTTCCAGAAT CATTACGCCA AAACTAAAGA CATCTGACTT
GACTGAAAAC TGTCCATATT TAATGTACTC AGGAGCCATA TATCCACTGT
ATATCACAAT AAGCAAGCAC ATTT[A/G]G ATTAAAAAA[A/G]TATTAC
TTCACCATTC ATGTTTCACC TATTGTI"T'I"T TTAAGGGCCT ATTGGCTTTT
GTTCTAAAGA AAATCACTAA TTGAAAATAT GGAATATCTT ATTAACTCTC
AAGTGTTGTT ATACTTACAA GGTCCCGACA ATTGTATTTG TACTGGCTTG
AGTTTGATTG ATCTCAAATA ATCTTGCCAT GCCAAAATCT GATATTTTAG
GGTTCAACTC TTCATCTAAC AAAATGTTAC TTGTTTTGAG ATCACGATGA
ACAA[C/T]T TGTAATCGAG AATCTTCA[C/T]GAAGGTA AAGAAGACCT
CGAGCAATAC CCCTTATAAT ATTATAGCGT CTTTCCCAAT TCAAATTCAC
ACGATTGTTT GGATCTACAT AAATACCCAA GCCACCATAG ACATGTATAG
TCTTTCACAA TTTAATAAAT TGTTCAATAA TATCTCTTTA TATCATCAAA
GGTAAAGTCA AAGCAAAAAT TTGTAACTTA CCAAATATGA AATAATCAAG
GCTTTTATTG GGAACCAATT CATATATCAA TAACCTTTC TCTTCTTGAA
AAACAAAAGC CAAGCAGTC TAACTAAGTT TCGGTGTTGA AGCTTCCCTG TTAC
In the above, ambiguous bases are represented by an N. All bases noted within brackets are polymorphic in some manner, i.e., they differ between genotypes (but the sequences are known and represented). Bracketed sequences that do not contain slashes represent insertion/deletion events, i.e., the bracketed sequence occurs in some genotypes but is deleted in others. Bases before and after a slash within brackets are substituted in different genotypes; e.g., the [AC/TT] notation indicates that some genotypes exhibit an AC sequence at this position and others exhibit a TT. Similarly, [TCTAG/TCTGG/TC/GTTAG] indicates that genotypes exhibited one of these four different sequences at this position, i.e., the site displays a combination of insertion events and single base polymorphisms.
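The bracket notation described above is regular enough to parse mechanically. The following is a minimal sketch, assuming well-formed brackets (OCR-damaged entries in the listing would need manual repair first); `parse_polymorphisms` and `Site` are hypothetical names, not part of the disclosure. It drops the 10-base block spacing (including stray spaces inside brackets), returns a bracket-free sequence that keeps the first listed allele at each site, and classifies each site as a substitution (slash-separated alleles of equal length), a mixed site (slash-separated alleles of unequal length), or an indel (no slash).

```python
import re
from dataclasses import dataclass

# One bracketed polymorphism, e.g. "[A/T]" (substitution) or "[TTA]" (indel).
BRACKET = re.compile(r"\[([^\[\]]+)\]")

@dataclass
class Site:
    position: int        # 0-based offset in the bracket-free sequence
    alleles: list[str]   # alternative sequences observed at the site
    kind: str            # "substitution", "mixed", or "indel"

def parse_polymorphisms(annotated: str):
    """Return (consensus, sites) for a sequence written in the bracket notation."""
    seq = "".join(annotated.split())  # remove all whitespace, even inside brackets
    consensus, sites, last = [], [], 0
    for m in BRACKET.finditer(seq):
        consensus.append(seq[last:m.start()])
        pos = sum(len(part) for part in consensus)
        body = m.group(1)
        if "/" in body:
            alleles = body.split("/")
            same_length = len({len(a) for a in alleles}) == 1
            kind = "substitution" if same_length else "mixed"
        else:
            alleles = [body, ""]  # present in some genotypes, deleted in others
            kind = "indel"
        sites.append(Site(pos, alleles, kind))
        consensus.append(alleles[0])  # keep the first listed allele
        last = m.end()
    consensus.append(seq[last:])
    return "".join(consensus), sites
```

For example, `parse_polymorphisms("ATGTTGATCT GGC[A/T]AG TTAAATAAAA")`, taken from the listing above, yields one substitution site at offset 13 with alleles `["A", "T"]`.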
The nucleotide polymorphisms described by this invention can reduce the expense and time required to exploit genetic markers in soybean improvement, seed production, and the protection of proprietary rights. We describe the use of allele-specific hybridization (ASH) as one technology that can be used to detect these polymorphisms. Using these polymorphisms with a technology such as ASH is less expensive and less time consuming than other genetic marker methods. The polymorphisms described here have the genetic advantages of being co-dominant and locus specific, and have the operational advantages of being obtained by PCR amplification from small quantities of DNA template and being detected without the use of gel electrophoresis.
Accordingly, the disclosures and descriptions herein are intended to be illustrative, but not limiting, of the scope of the invention, which is set forth in the following claims. One of skill will recognize many modifications which fall within the scope of the following claims. For example, all of the methods and compositions herein may be used in different combinations to achieve results selected by one of skill. All publications and patent applications cited herein are incorporated by reference in their entirety for all purposes, as if each were specifically indicated to be incorporated by reference.
FIELD OF THE INVENTION
The invention is in the field of agricultural technology, particularly marker assisted selection of soybean.
CROSS-REFERENCE TO RELATED APPLICATIONS
This application is a continuation-in-part of USSN 60/068,185, "NUCLEOTIDE POLYMORPHISMS IN SOYBEAN," filed December 19, 1997 by Jessen et al., which is incorporated herein by reference.
BACKGROUND OF THE INVENTION
Genetic mutations are antecedent to evolution and genetic diversity. Along with mutations that change the expression of genes, and consequently the function and structure of organisms, comparative sequencing within a species reveals a plethora of neutral mutations in non-coding regions of genomic DNA. Whether or not mutations affect organisms, they constitute diversity in the nucleotide sequence and, if known, can be exploited as genetic markers on chromosomes.
To be genetically stable, a mutation in a base in one DNA strand typically requires a complementary base change in the opposite strand. Thus, the polymorphism, or change, between a wildtype and a mutant includes both strands of the DNA molecule.
If a change allows the organism to be viable and fecund, any base can substitute for any other base at one or more nucleotide positions in a DNA sequence, or any length of DNA bases can be inserted or deleted. Mutations that are selectively neutral or provide an advantage in a particular environment may then proliferate within a population.
Genetic markers represent (mark the location of) specific loci in the genome of a species or closely related species, and sampling of different genotypes at these marker loci reveals genetic variation. The genetic variation at marker loci can then be described and applied to genetic studies, commercial breeding, diagnostics, cladistic analysis of variance, or genotyping of samples.
Genetic markers have the greatest utility when they are highly heritable, multi-allelic, and numerous. Most genetic markers are highly heritable because their alleles are determined by the nucleotide sequence of DNA, which is highly conserved from one generation to the next, and the detection of their alleles is unaffected by the natural environment. Markers have multiple alleles because, in the evolutionary process, rare, genetically stable mutations in DNA sequences defining marker loci arose and were disseminated through the generations along with other existing alleles. The highly conserved nature of DNA combined with the rare occurrence of stable mutations allows genetic markers to be both predictable and discerning of different genotypes. The repertoire of genetic-marker technologies available today allows multiple technologies to be used simultaneously in the same project. Each new genetic-marker technology and each new DNA polymorphism adds utility to genetic markers.
Many genetic-marker technologies exist, including restriction-fragment-length polymorphism (RFLP), Botstein et al. (1980) Am J Hum Genet 32:314-331; single strand conformation polymorphism (SSCP), Fischer et al. (1983) Proc Natl Acad Sci USA 80:1579-1583, Orita et al. (1989) Genomics 5:874-879; amplified fragment-length polymorphism (AFLP), Vos et al. (1995) Nucleic Acids Res 23:4407-4414; microsatellite or simple-sequence repeat (SSR), Weber JL and May PE (1989) Am J Hum Genet 44:388-396; random-amplified polymorphic DNA (RAPD), Williams et al. (1990) Nucleic Acids Res 18:6531-6535; sequence tagged site (STS), Olson et al. (1989) Science 245:1434-1435; genetic-bit analysis (GBA), Nikiforov et al. (1994) Nucleic Acids Res 22:4167-4175; allele-specific polymerase chain reaction (ASPCR), Gibbs et al. (1989) Nucleic Acids Res 17:2437-2448, Newton et al. (1989) Nucleic Acids Res 17:2503-2516; nick-translation PCR (e.g., TaqMan™), Lee et al. (1993) Nucleic Acids Res 21:3761-3766; and allele-specific hybridization (ASH), Wallace et al. (1979) Nucleic Acids Res 6:3543-3557, Sheldon et al. (1993) Clinical Chemistry 39(4):718-719; among others. Each technology has its own particular basis for detecting polymorphisms in DNA sequence.
The development of polymorphic genetic markers has made it possible for quantitative and molecular geneticists to investigate what Edwards et al., Genetics 115:113 (1987), referred to as "quantitative trait loci" (QTL), as well as their numbers, magnitudes and distributions. QTL include genes that control, to some degree, numerically representable phenotypic traits (disease resistance, crop yield, resistance to environmental extremes, etc.) that are distributed within a family of individuals as well as within a population of families of individuals. An experimental paradigm has been developed to identify and analyze QTL. This paradigm involves crossing two inbred lines, genotyping multiple marker loci, and evaluating one to several quantitative phenotypic traits among the progeny of the cross. QTL are then identified, and ultimately selected for, based on significant statistical associations between the genotypic values determined by genetic marker technology and the phenotypic variability among the segregating progeny.
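The statistical association step of this paradigm can be illustrated with the simplest approach, single-marker analysis: group the progeny by genotype at each marker and ask whether the phenotype means differ. The sketch below computes a one-way ANOVA F statistic per marker and ranks markers by it. It is an illustrative assumption, not the method of Edwards et al. or of this invention; a real study would convert F to a p-value, correct for multiple testing, and often use interval mapping instead. The function names and the toy data are hypothetical.

```python
from statistics import mean

def single_marker_f(genotypes, phenotypes):
    """One-way ANOVA F statistic of phenotype grouped by genotype class at one marker."""
    groups = {}
    for g, y in zip(genotypes, phenotypes):
        groups.setdefault(g, []).append(y)
    k, n = len(groups), len(phenotypes)
    if k < 2 or n <= k:
        return 0.0  # monomorphic marker or too few progeny to test
    grand = mean(phenotypes)
    ss_between = sum(len(v) * (mean(v) - grand) ** 2 for v in groups.values())
    ss_within = sum((y - mean(v)) ** 2 for v in groups.values() for y in v)
    if ss_within == 0:
        return float("inf") if ss_between > 0 else 0.0
    return (ss_between / (k - 1)) / (ss_within / (n - k))

def scan_markers(marker_genotypes, phenotypes):
    """Score every marker and rank them, strongest association first."""
    scores = {m: single_marker_f(g, phenotypes) for m, g in marker_genotypes.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)
```

With six progeny scored for a trait, `scan_markers({"linked": ["AA","AA","AA","BB","BB","BB"], "unlinked": ["AA","BB","AA","BB","AA","BB"]}, [10, 11, 10, 20, 21, 19])` ranks the marker whose genotype classes separate the phenotypes ("linked") first.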
Unfortunately, complete sets of genetic markers are not available for a variety of important crops, making it difficult to quickly assess the genotype of any particular individual. For example, although soybeans are a major cash crop which provide most of the world's protein and vegetable oils, complete sets of genetic markers which span the soybean genome are not available. Accordingly, there exists a need to develop genetic markers for genotyping, marker assisted selection, positional cloning of nucleic acids and the like, e.g., in soybean. This invention provides these and many other features.
SUMMARY OF THE INVENTION
New sequence polymorphisms at 63 different loci are described.
Identification of these alleles provides compositions and methods for rapidly determining the complete genotype of a soybean plant. This ability to determine, accurately and quickly, the genotype of a soybean plant provides for improved methods of marker assisted selection in plant breeding and in analysis of transgenic soybean cells and plants.
Example technologies which may be used to detect the loci include allele-specific hybridization (ASH), the polymerase chain reaction (PCR), random-amplified polymorphic DNA (RAPD), restriction-fragment-length polymorphism (RFLP), single strand conformation polymorphism (SSCP), allele-specific polymerase chain reaction (ASPCR), genetic-bit analysis (GBA), nick-translation PCR (TaqMan™), hybridization to solid phase arrays (e.g., very large scale immobilized polymer arrays (VLSIPS arrays)), and the like.
In one embodiment, methods of detecting one or more genetic nucleotide polymorphisms in a biological sample from a soybean plant are provided, in which a probe nucleic acid is hybridized to one of the loci described herein. For example, a biological sample derived from a soybean plant is provided, and a probe nucleic acid is hybridized to a target nucleic acid including a nucleotide polymorphism from the locus.
Preferred loci include pA060A, pA077A, pA086A, pA169A, pA280A, pA378A, pA505A, pA519A, pA588A, pA947B, pB032A, pB032B, pB039A, pBLT24A, pBLT65A, php02265A, php02301A, php02361A, php02370C, php02387A, php02388A, php02393A, php02396A, php02636A, php03522A, php05219A, php05233A, php05278A, php05342A, php07659A, php08584A, php12105A, php02340B, php05264A, php10355B, pK069A, pK079A, pK401A, pK418A, pK644B, pL058A, pL183A, pR045A, pR153A, pT005A, pT155A, pA059A, pA064A, pBLT15A, pA593A, pA882A, pB320E, SOYBPSP, php02329A, php02371A, pA343A, pA748B, php05290A, pA858A, php02376A, pG17.3, pB132A, and php10078A. Particularly preferred "php" loci include php02265A, php02301A, php02361A, php02370C, php02387A, php02388A, php02393A, php02396A, php02636A, php03522A, php05219A, php05233A, php05278A, php05342A, php07659A, php08584A, php12105A, php02340B, php05264A, php10355B, php02329A, php02371A, php05290A, php02376A, and php10078A. In certain embodiments where more than one locus is detected, at least one of the detected loci will typically be a locus with a "php" designation. One newly discovered advantage of all of the loci noted above is that probes which specifically hybridize to a selected locus do not specifically hybridize to additional loci in the soybean genome, because each of the loci is unique in the soybean genome. In preferred aspects, the loci and the polymorphic nucleotides they contain are in linkage disequilibrium with a Quantitative Trait Locus (QTL), such as a QTL for resistance to soybean cyst nematode, brown stem rot, Phytophthora rot, or the like. Accordingly, the presence or absence of the detected locus corresponds to the presence or absence of a particular QTL. A variety of probe nucleic acids which hybridize to the loci are provided by the invention. The probes include the amplicons, PCR primers, and the like described herein, which are used to identify and detect the loci.
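Linkage disequilibrium between a marker locus and a QTL is commonly quantified with the standard two-locus statistics D, D' and r². The sketch below computes them from phased haplotype counts for two biallelic loci. It assumes the haplotypes are already known (for example, from largely homozygous inbred lines); the function name and the allele labels A/a (marker) and B/b (QTL) are hypothetical and do not come from the patent.

```python
def linkage_disequilibrium(n_AB, n_Ab, n_aB, n_ab):
    """Return (D, |D'|, r^2) for two biallelic loci from phased haplotype counts."""
    n = n_AB + n_Ab + n_aB + n_ab
    p_A = (n_AB + n_Ab) / n          # frequency of allele A at the marker locus
    p_B = (n_AB + n_aB) / n          # frequency of allele B at the second locus
    D = n_AB / n - p_A * p_B         # deviation from independent assortment
    denom = p_A * (1 - p_A) * p_B * (1 - p_B)
    r2 = D * D / denom if denom > 0 else float("nan")
    # D' scales D by its maximum possible magnitude given the allele frequencies.
    if D >= 0:
        d_max = min(p_A * (1 - p_B), (1 - p_A) * p_B)
    else:
        d_max = min(p_A * p_B, (1 - p_A) * (1 - p_B))
    d_prime = abs(D) / d_max if d_max > 0 else float("nan")
    return D, d_prime, r2
```

For 100 haplotypes split 40 AB, 10 Ab, 10 aB, 40 ab, this gives D = 0.15, D' = 0.6 and r² = 0.36; with only AB and ab haplotypes the two loci are in complete disequilibrium (D' = r² = 1).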
Hybridization of a probe to a locus is detected to confirm the presence of the locus and typically to determine whether a particular polymorphic nucleotide is present at the locus. This detection is performed directly or indirectly.
Direct methods of detecting hybridization include Southern analysis, northern analysis, array-dependent nucleic acid hybridization on a nucleic acid polymer array, in situ hybridization, or other methods which directly monitor the hybridization.
Indirect detection includes, e.g., detection of an amplification product whose formation depends on hybridization of the probe to the target nucleic acid. For example, the polymerase chain reaction (PCR) and/or the ligase chain reaction (LCR) are used to monitor hybridization, e.g., by detecting formation of an amplicon which is synthesized only if a probe (e.g., a PCR primer) hybridizes to the target. Similarly, the probe or target is optionally amplified prior to detection. Preferred amplification methods include PCR, LCR, and cloning of the target nucleic acid.
In several embodiments, it will be desirable to detect more than one locus. For example, detection of multiple loci from a biological sample provides an overall genotype of the biological sample. Thus, in one embodiment, a second probe nucleic acid is hybridized to a second target nucleic acid linked to a second nucleotide polymorphism in a second locus selected from the loci noted above. Similarly, a plurality of probes (a third, fourth, fifth... nth probe) are hybridized to a plurality of polymorphic nucleotides (a third, fourth, fifth... nth) in the loci noted above. In one embodiment, a majority of the noted loci are detected. In another preferred embodiment, all of the loci noted above are detected, thereby providing a comprehensive genotype. Similarly, 10%, 20%, 30%, 40%, 50%, 60%, 70%, 80%, 90%, 100%, or any intermediate percentage of the loci are detected in alternate embodiments.
The methods are applicable to detection of loci and genotyping in a variety of biological samples, including a soybean plant, a soybean plant extract, an isolated soybean plant tissue, an isolated plant tissue extract, a soybean plant cell culture, a soybean plant cell culture extract, a recombinant cell comprising a nucleic acid derived from a soybean plant, a soybean plant seed, and an extract of a recombinant cell comprising a nucleic acid derived from a soybean plant.
The target nucleic acid which is detected can include the first polymorphic nucleotide, or it may be proximal to the polymorphic nucleotide. In typical embodiments, the target nucleic acid includes the polymorphic nucleotide to be detected. However, depending on how the target nucleic acid is detected, it can also be convenient to detect nucleotides proximal to the polymorphic nucleotide. For example, when LCR is used, the presence or absence of a polymorphic nucleotide is detected by amplifying nucleotide regions flanking the polymorphic nucleotide.
In one aspect, the present invention includes marker-assisted selection of soybean plants, e.g., by detecting any of the loci noted above and selecting a plant based upon the presence or absence of one or more desired polymorphic nucleotides.
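Once genotype calls are available, the selection step itself is a simple filter. The sketch below assumes diploid calls stored as allele pairs per locus and keeps plants carrying a desired allele at every target locus, optionally requiring homozygosity. The plant IDs, the allele labels, and the function names are hypothetical illustrations; only the locus names (php02265A, pA060A) come from the list of loci above.

```python
def carries(call, allele, homozygous=False):
    """True if a diploid genotype call (a tuple of two alleles) contains the allele."""
    if call is None:
        return False  # missing data: do not select on it
    return call == (allele, allele) if homozygous else allele in call

def select_plants(genotype_table, desired, homozygous=False):
    """Return sorted IDs of plants carrying the desired allele at every target locus.

    genotype_table: {plant_id: {locus: (allele1, allele2)}}
    desired:        {locus: allele}
    """
    return sorted(
        plant
        for plant, calls in genotype_table.items()
        if all(carries(calls.get(locus), allele, homozygous)
               for locus, allele in desired.items())
    )
```

Treating a missing call as a failure is a conservative design choice; a breeding program might instead re-genotype such plants before discarding them.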
In another aspect, nucleic acids corresponding to nucleotides proximal to or including the marker nucleic acids are cloned. In a particularly preferred aspect, a nucleic acid flanked by two nucleic acid loci is cloned. Typically, this cloned nucleic acid includes a coding sequence. The cloned nucleic acid is optionally transduced into cells or plants, e.g., to make transgenic plants (e.g., soybean) expressing the coding sequence.
In one aspect, nucleotide polymorphisms proximal to the selected loci are identified and mapped, e.g., by genetic mapping or nucleotide sequencing of nucleic acid regions genetically linked to the selected locus from genetically diverse strains of soybean. The identification of these additional polymorphisms provides additional marker regions which are used to identify the source of a soybean nucleic acid.
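When the same region has been sequenced from two strains, the new polymorphisms are the columns where the sequences differ once they are aligned for maximal correspondence. The sketch below assumes the alignment has already been produced (for example, by a standard pairwise aligner) with "-" marking gaps; it reports single-base substitutions and merges runs of gap columns into one insertion/deletion event. The function name and the toy sequences are hypothetical.

```python
def call_polymorphisms(seq1, seq2, gap="-"):
    """Compare two sequences already aligned for maximal correspondence.

    Returns (column, kind, allele_in_seq1, allele_in_seq2) tuples; adjacent gap
    columns are merged into a single insertion/deletion event.
    """
    if len(seq1) != len(seq2):
        raise ValueError("aligned sequences must have equal length")
    events, i = [], 0
    while i < len(seq1):
        a, b = seq1[i], seq2[i]
        if a == b:
            i += 1
        elif a == gap or b == gap:
            j = i
            # extend the indel while columns keep differing and contain a gap
            while j < len(seq1) and seq1[j] != seq2[j] and gap in (seq1[j], seq2[j]):
                j += 1
            events.append((i, "indel", seq1[i:j].replace(gap, ""), seq2[i:j].replace(gap, "")))
            i = j
        else:
            events.append((i, "substitution", a, b))
            i += 1
    return events
```

For the aligned pair `"ATGGCAAG--TTAAA"` and `"ATGGCTAGCCTTAAA"`, the sketch reports an A/T substitution at column 5 and a two-base CC insertion/deletion at column 8.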
As with the detection methods above, nucleotide polymorphisms can also be detected by separating nucleic acids carrying the polymorphisms by size and/or charge. For example, single-strand conformation polymorphism (SSCP) analysis can be performed on two or more nucleic acids on a polyacrylamide gel.
Amplification methods and compositions for detecting nucleic acids linked to loci are also provided. Typical amplification methods of the present invention include PCR, asymmetric PCR, and LCR. For example, methods of amplifying a nucleic acid are provided in which a first primer nucleic acid is hybridized to a template nucleic acid and a portion of the template nucleic acid is amplified with a template-dependent polymerase enzyme or a ligase enzyme. The primer hybridizes under stringent conditions to a locus nucleic acid from one of the loci described above. Typical amplification primers are less than 100 nucleotides long, e.g., between about 10 and 50 nucleotides, typically between about 15 and 25 nucleotides, although they may be as long as or longer than 100-200 nucleotides. Where the primer is a PCR primer, the primer provides a polymerase-extendible substrate and the template-dependent polymerase extends the primer. In one aspect, the primer is an allele-specific primer. In typical LCR amplification methods, the first primer hybridizes adjacent to a second primer on the template nucleic acid and the first and second primers are ligated with a ligase enzyme, thereby amplifying the portion of the template hybridized to the first and second primers.
In PCR methods, the method includes hybridizing a second primer to the template, wherein the first and second primers hybridize to complementary strands of the template nucleic acid.
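Because the two PCR primers anneal to complementary strands, the reverse primer's reverse complement must appear on the top strand downstream of the forward primer. The sketch below turns that rule into an exact-match "in silico PCR" check that predicts the amplicon a primer pair would give. It assumes perfect primer matches and ignores mismatch tolerance, melting temperature, and product-length limits. The function names are hypothetical, and the 15-nucleotide primers in the test are simply taken from the ends of the first 50 bases of P13073 above for illustration; they are not primers disclosed in this document.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")

def revcomp(seq):
    """Reverse complement of a DNA sequence (N stays N)."""
    return seq.upper().translate(COMPLEMENT)[::-1]

def in_silico_amplicon(template, fwd, rev):
    """Predict the amplicon for primers written 5'->3', or None if either fails to match.

    Exact-match model: fwd occurs on the given (top) strand, and the reverse
    primer, which anneals to the top strand, appears there as its reverse complement.
    """
    t = template.upper()
    start = t.find(fwd.upper())
    if start == -1:
        return None
    rc = revcomp(rev)
    end = t.find(rc, start + len(fwd))  # must lie downstream of the forward primer
    if end == -1:
        return None
    return t[start:end + len(rc)]
```

Allele-specific PCR could be sketched on the same model by placing the polymorphic base at a primer's 3' end, so that only the matching allele yields an amplicon.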
Amplification mixtures for practicing the amplification methods are also provided. For example, a PCR reaction mixture is provided having, e.g., a polymerase enzyme, deoxynucleotides, a template nucleic acid comprising a polymorphic nucleotide which hybridizes under stringent conditions to a locus as above, and primers which specifically hybridize to the template nucleic acid. Primers include the PCR primers described herein, and additional primers selected to amplify portions of the amplicons described herein. As noted, the primers are optionally allele-specific primers to facilitate quantitative PCR.
PCR amplicons are also provided, including nucleic acids having a polymorphic nucleotide. The amplicon hybridizes under stringent conditions to a locus selected from the group of loci set forth above. Exemplary amplicons are described herein. Particularly preferred amplicons include php11138 and php11627.
A variety of additional compositions are also provided by the present invention. One class of compositions has a first recombinant nucleic acid which differentially hybridizes under allele-specific hybridization conditions to a first allele from a locus in the soybean genome selected from the above loci, where the first recombinant nucleic acid shows decreased hybridization affinity for a second allele from the selected locus. The composition optionally includes one or more additional recombinant nucleic acids (i.e., additional probes) which differentially hybridize under allele-specific hybridization conditions to a second allele from a selected locus, wherein the second nucleic acid shows decreased hybridization affinity for the first allele from the selected locus. For example, multi-color nucleic acid probe hybridization techniques such as comparative genomic hybridization (CGH) or fluorescence in situ hybridization (FISH) can be used to detect different alleles on different chromosomes.
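Allele-specific hybridization relies on a short probe that pairs perfectly with one allele but carries a mismatch against the other. The sketch below builds one candidate probe per allele by centering the polymorphic base between fixed flanks, counts its mismatches against the other allele's probe, and estimates melting temperature with the Wallace rule, Tm ≈ 2(A+T) + 4(G+C) °C, a rough approximation for short oligonucleotides. This is an assumed design heuristic, not the probe-design procedure of this invention; actual allele-specific conditions are established empirically. Function names are hypothetical, and the flanks come from the [A/T] site listed earlier ("ATCTGGC[A/T]AGTTAAA"). Probes are written in the same sense as the sequence listing, so each would hybridize to the complementary strand.

```python
def wallace_tm(oligo):
    """Rough melting temperature (deg C) of a short oligo: 2*(A+T) + 4*(G+C)."""
    s = oligo.upper()
    return 2 * (s.count("A") + s.count("T")) + 4 * (s.count("G") + s.count("C"))

def mismatches(a, b):
    """Number of mismatched positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a.upper(), b.upper()))

def design_ash_probes(left_flank, alleles, right_flank):
    """One candidate probe per allele, with the variable base(s) centered between flanks.

    Returns {allele: (probe, wallace_tm, {other_allele: mismatches})}.
    """
    probes = {a: left_flank + a + right_flank for a in alleles}
    return {
        a: (p, wallace_tm(p), {b: mismatches(p, q) for b, q in probes.items() if b != a})
        for a, p in probes.items()
    }
```

Centering the polymorphic base is a common heuristic because a mismatch near the middle of a short probe destabilizes the duplex more than one near an end, which helps the two alleles hybridize differentially.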
In another aspect, a composition including a recombinant nucleic acid which specifically hybridizes to a first allele-specific probe and a second allele-specific probe is provided. The recombinant nucleic acid can be a probe, target nucleic acid, chromosomal nucleic acid, recombinant nucleic acid or the like. The first and second allele-specific probes hybridize under allele-specific hybridization conditions to a first haplotype of a locus in the soybean genome noted above. The composition optionally comprises additional materials such as allele-specific probes for the detection of the nucleic acid, or the like.
Sets of nucleic acid probes are also provided, including sets having a plurality of probe nucleic acids which specifically hybridize to a plurality of target nucleic acids which hybridize under stringent conditions to a plurality of the loci noted above. The sets may be in any of a variety of physical arrangements, including arrays, containers, or the like. In a particularly preferred embodiment, the set is in kit form, i.e., having the set of nucleic acids and optionally comprising one or more additional components such as a container, instructional materials, one or more control target nucleic acids, and recombinant cells comprising one or more target nucleic acids.
Transgenic plants are provided. In particular, a transgenic plant having a recombinant nucleic acid which hybridizes under stringent conditions to a target nucleic acid is provided. The target nucleic acid is genetically linked to (and preferably comprises) a nucleotide polymorphism from a locus selected from the group of loci noted above. In a preferred embodiment, the recombinant nucleic acid comprises a coding sequence encoded by a gene in linkage disequilibrium with a Quantitative Trait Locus (QTL). Example QTL include a QTL for resistance to soybean cyst nematode, a QTL for resistance to brown stem rot, and a QTL for resistance to Phytophthora rot.
Definitions
A "polymorphism" is a change or difference between two related nucleic acids. A "nucleotide polymorphism" refers to a nucleotide which is different in one sequence when compared to a related sequence when the two nucleic acids are aligned for maximal correspondence. A "genetic nucleotide polymorphism" refers to a nucleotide which is different in one sequence when compared to a related sequence when the two nucleic acids are aligned for maximal correspondence, where the two nucleic acids are genetically related, i.e., homologous, e.g., where the nucleic acids are isolated from different strains of a soybean plant, or from different alleles of a single strain, or the like.
A "biological sample" is a portion of material isolated from a biological source such as a plant, isolated plant tissue, or plant cell, or a portion of material made from such a source, such as a cell extract or the like.
A "probe nucleic acid" is an RNA or DNA or analogue thereof. The probe may be of any length. Typical probes include PCR primers, PCR amplicons, cloned genomic nucleic acids encoding a genetic locus of interest, and the like.
"Marker-assisted selection" refers to the process of selecting a desired trait or desired traits in a plant or plants by detecting one or more nucleic acids from the plant, where the nucleic acid is associated with the desired trait.
A "locus" is a nucleic acid region where a polymorphic nucleic acid resides.
A "genetic marker" is a region on a genomic nucleic acid mapped by a marker nucleic acid. A "marker nucleic acid" is a nucleic acid which is an indicator for the presence of a marker locus. The marker can be either a probe nucleic acid which identifies a target nucleic acid genetically linked to the locus, or a sequence hybridized by the probe, i.e., a genomic nucleic acid linked to the locus. Typically, a probe will be used to hybridize to or amplify the locus. Example markers include isolated nucleic acids from the locus, cloned nucleic acids comprising the locus, PCR primers for amplifying the locus, and the like.
Two nucleic acid sequences are "genetically linked" when the sequences are in linkage disequilibrium.
A "vector" is a carrier composition which assists in transducing, transforming or infecting a cell with a nucleic acid, thereby causing the cell to express vector associated nucleic acids and, optionally, proteins other than those native to the cell, or in a manner not native to the cell. The term vector includes nucleic acid (ordinarily RNA or DNA) to be expressed by the cell (a "vector nucleic acid").
A vector optionally includes materials to aid in achieving entry of the nucleic acid into the cell, such as a retroviral particle, liposome, protein coating or the like.
A "promoter" is an array of nucleic acid control sequences which directs transcription of a nucleic acid. As used herein, a promoter includes necessary nucleic acid sequences near the start site of transcription, such as, in the case of a polymerase II type promoter, a TATA element. A promoter also optionally includes distal enhancer or repressor elements which can be located as much as several thousand base pairs from the start site of transcription. A "constitutive" promoter is a promoter which is active in a selected organism under most environmental and developmental conditions. An "inducible" promoter is a promoter which is under environmental or developmental regulation in a selected organism.
The terms "isolated" or "biologically pure" refer to material which is substantially or essentially free from components which normally accompany it as found in its native state.
DETAILED DISCUSSION OF THE INVENTION
The present invention resides in part in the identification of new marker loci for soybean plants. The loci include polymorphic nucleotides which vary depending on the particular strain of soybean considered. These loci are used in plant breeding projects, e.g., for marker-assisted selection, for positional cloning of linked nucleic acid regions of soybean chromosomal nucleic acids, and the like. Because the sequences of the loci and surrounding regions are provided, it is possible to easily select appropriate probes for detection of particular polymorphic nucleotides, e.g., by allele-specific hybridization, PCR amplification, or the like. The polymorphic nucleotides are detected directly, or by detecting nucleic acids in linkage disequilibrium with the loci.
The described loci which are prefixed by "php" were previously completely unknown, with no information being publicly available regarding the loci.
Some of the other loci (those not prefaced by "php") were previously identified by binding to RFLP probes; however, no sequence information regarding these loci was available, and it was therefore not possible to design probes which hybridize specifically to polymorphic nucleotides. The locus SOYBPSP was previously sequenced in an intron of a gene, but no polymorphism had previously been identified at the locus. Clones comprising the loci not prefaced by "php" (i.e., comprising publicly available RFLP probes) are publicly available from Biogenetic Services, Inc. (Brookings, SD) and PE AgGen, Inc. (formerly Linkage Genetics) (Salt Lake City, UT).
Uses for Nucleotide Polymorphisms
The nucleotide polymorphisms described here are used, e.g., for DNA fingerprinting of soybean varieties, genetic-linkage mapping of the soybean genome, marker association with specific genes or quantitative-trait loci (QTL) affecting phenotypic traits, marker-assisted selection for preferred genotypes in soybean breeding, positional cloning of genes from the soybean genome, and other purposes that will be apparent upon complete review of this disclosure.
DNA Fingerprinting
DNA fingerprinting is the application of multiple genetic markers to DNA extracted from an individual or pooled group of related individuals, such as an inbred line or variety, so that the cumulative marker allele profile provides a description of the variety's overall genotype. Comparisons of these marker allele profiles can be made among samples of different varieties, or among different samples of the same variety, and estimates can be made regarding the genetic relationships among these samples.
These estimates can be obtained without knowing the pedigrees of the fingerprinted varieties.
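As a minimal sketch of how two marker allele profiles might be compared, the Python below assumes each profile is stored as a mapping from locus name to a single allele call (None for a locus that could not be scored) and uses simple matching, the proportion of jointly scored loci with identical calls, as the similarity measure. The allele calls are invented for illustration, and simple matching is only one of several similarity measures a breeder might choose.

```python
from typing import Dict, Optional

# Hypothetical representation: locus name -> allele call, or None when the
# locus failed to amplify or could not be scored in that sample.
Profile = Dict[str, Optional[str]]


def profile_similarity(a: Profile, b: Profile) -> float:
    """Simple-matching similarity: the fraction of loci scored in both
    samples at which the two samples carry the same allele call."""
    shared = [locus for locus, call in a.items()
              if call is not None and b.get(locus) is not None]
    if not shared:
        raise ValueError("no loci were scored in both profiles")
    matches = sum(1 for locus in shared if a[locus] == b[locus])
    return matches / len(shared)


# Invented allele calls at four of the loci named in this document.
variety_x = {"php05219A": "A", "pK069A": "G", "pT155A": "C", "pB032B": None}
variety_y = {"php05219A": "A", "pK069A": "T", "pT155A": "C", "pB032B": "A"}
print(profile_similarity(variety_x, variety_y))  # 2 of 3 shared loci match
```

Skipping loci that are missing in either sample keeps failed reactions from being counted as differences; no pedigree information is needed for the comparison.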
DNA fingerprints are used to determine whether a soybean variety is described and used within guidelines established, e.g., by Plant Variety Protection and patent laws. Seed of one variety may be sold by two different vendors using different variety names, or a variety may have been genetically bred from another variety by repeated cycles of backcrossing and selection to the extent that the new variety was essentially derived from the original variety. In these situations, questions about ownership may need to be answered. DNA-fingerprint profiles can be obtained from each variety or seed source and compared. If done properly, the data will show with a high probability whether or not the different samples are genetically alike.
The polymorphisms at the 63 polymorphic loci described herein provide a basis for quickly and reliably comparing DNA-fingerprint profiles in soybean. These 63 loci are well distributed around the soybean genome and provide an adequate number of loci to make reasonable conclusions regarding variety or seed-source identities and relationships.
DNA fingerprinting is used by soybean breeders to identify diverse breeding parents, to create novel recombinant populations, to better understand the structure of the soybean genome, and to better understand the history of pedigree breeding. The 63 example loci herein provide sufficient polymorphisms to estimate genetic relationships among soybean varieties.
DNA fingerprinting is used by soybean seed companies to estimate genetic purity of a seedlot for quality control and labeling of their product, and to identify the variety source of any contamination found among fingerprinted samples. These 63 loci provide sufficient polymorphisms to estimate genetic purity within a seedlot.
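One way a purity estimate could be computed is sketched below, assuming each fingerprinted seed yields a locus-to-allele mapping and that the labelled variety has a known reference profile. The `max_mismatches` tolerance is a hypothetical allowance for genotyping error, not a threshold taken from this disclosure, and the allele calls are invented.

```python
from typing import Dict, List


def seedlot_purity(reference: Dict[str, str],
                   seeds: List[Dict[str, str]],
                   max_mismatches: int = 0) -> float:
    """Estimate genetic purity as the fraction of fingerprinted seeds that
    match the reference profile of the labelled variety.

    A seed counts as true-to-type if it differs from the reference at no more
    than `max_mismatches` of the loci scored in both (a hypothetical
    tolerance for genotyping error).
    """
    if not seeds:
        raise ValueError("no seeds were fingerprinted")
    true_to_type = 0
    for seed in seeds:
        mismatches = sum(1 for locus, call in seed.items()
                         if locus in reference and reference[locus] != call)
        if mismatches <= max_mismatches:
            true_to_type += 1
    return true_to_type / len(seeds)


reference = {"pA280A": "A", "pR045A": "G", "pT005A": "C"}  # invented calls
seeds = [dict(reference), dict(reference), dict(reference),
         {"pA280A": "T", "pR045A": "C", "pT005A": "C"}]     # one off-type seed
print(seedlot_purity(reference, seeds))  # 0.75
```

Off-type seeds identified this way could then be compared against the profiles of candidate varieties to trace the source of the contamination.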
Genetic Mapping of the Soybean Genome
Genetic mapping is done by finding polymorphic markers that are genetically linked to each other (in linkage groups) or linked to genes or QTL affecting phenotypic traits of interest within a segregating population. The alignment of markers into linkage groups is useful as a reference for future use of the markers and for accurately positioning genes or QTL relative to the markers. The nucleotide polymorphisms described here for 63 exemplar loci provide a means to utilize these loci in genetic mapping studies in soybean. Many of these loci have multiple sub-loci and haplotypes across the sub-loci. Each haplotype provides a different allele composition within a locus, thereby expanding the utility of these marker loci to more soybean mapping studies than would be possible with only two alleles per locus. Many of these 63 loci were intentionally selected for polymorphism development because they were widely dispersed among soybean genetic linkage groups and would therefore collectively maximize their utility for mapping the soybean genome. Because each of these 63 loci was designed to be a discrete individual locus, these loci cannot be confused with duplicate loci having similar sequence elsewhere in the genome. They are therefore excellent reference loci on any soybean genetic map and can be used to reliably align the same linkage groups from different maps.
Many of these marker loci were selected to develop additional nucleotide polymorphisms because they were found to be genetically linked to important QTL for disease resistance. The loci php05219A, php07659A, php10355B, and pK069A were found to cluster around a resistance QTL on group G; the loci pT155A, pBLT24A, and pBLT65A were clustered around a resistance QTL on group A; and the locus php02301A was near a resistance QTL on group M. These three QTL provide resistance to soybean cyst nematode (Heterodera glycines Ichinohe) (see also Webb et al. (1995) Theor Appl Genet 91:574-581). The loci php02636A on group C, php08584A on group S, and pK079A on group L26 were all linked to additional QTL for resistance to soybean cyst nematode. The locus pB032B on group J was near Rbs3 for resistance to brown stem rot. The loci pK418A and pA280A on group N were near Rps1, the locus pR045A on group F was linked to Rps3, the loci pA378A and pL183A on group G were near Rps4, and the locus pT005A on group G was near Rps5, all providing resistance to phytophthora rot.
Marker-Assisted Selection in Soybean Improvement
After genes or a QTL and a marker or markers are mapped together and found to be in linkage disequilibrium, it is possible to use those markers to select for the desired alleles of those genes or QTL, a process called marker-assisted selection (MAS).
In brief, a nucleic acid corresponding to the marker nucleic acid is detected in a biological sample from a plant to be selected. This detection can take the form of hybridization of a probe nucleic acid to a marker, e.g., using allele-specific hybridization, Southern analysis, northern analysis, in situ hybridization, hybridization of primers followed by PCR amplification of a region of the marker, or the like. A variety of procedures for detecting markers are described herein. After the presence (or absence) of a particular marker in the biological sample is verified, the plant is selected, i.e., used to make progeny plants by selective breeding.
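The selection step can be sketched as a filter over marker calls. This assumes co-dominant calls (two alleles per plant per marker) and that the allele linked to the favourable QTL allele is already known from mapping; the allele letters and plant names are hypothetical, since this section does not state which allele at each locus marks resistance.

```python
from typing import Dict, List, Optional, Tuple

Genotype = Tuple[str, str]  # the two allele calls at a co-dominant marker


def carries(genotype: Optional[Genotype], allele: str, min_copies: int) -> bool:
    """True if the genotype was scored and has at least `min_copies` of `allele`."""
    return genotype is not None and genotype.count(allele) >= min_copies


def select_plants(plants: Dict[str, Dict[str, Genotype]],
                  favourable: Dict[str, str],
                  require_homozygous: bool = True) -> List[str]:
    """Keep plants carrying the favourable allele at every linked marker.
    Plants with a missing call at any of these markers are not selected."""
    min_copies = 2 if require_homozygous else 1
    return [plant_id for plant_id, calls in plants.items()
            if all(carries(calls.get(marker), allele, min_copies)
                   for marker, allele in favourable.items())]


favourable = {"php05219A": "A", "pK069A": "G"}  # hypothetical resistance-linked alleles
plants = {
    "plant1": {"php05219A": ("A", "A"), "pK069A": ("G", "G")},
    "plant2": {"php05219A": ("A", "T"), "pK069A": ("G", "G")},
    "plant3": {"php05219A": ("A", "A")},  # pK069A not scored
}
print(select_plants(plants, favourable))                            # ['plant1']
print(select_plants(plants, favourable, require_homozygous=False))  # ['plant1', 'plant2']
```

Requiring the allele at more than one marker flanking a QTL reduces the chance that a recombination between a single marker and the QTL leads to selecting the wrong plant.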
Nucleotide polymorphisms were developed at markers near numerous resistance loci in soybean that are effective against soybean cyst nematode, Phytophthora sojae (phytophthora rot), and Phialophora gregata (brown stem rot). These are among the most damaging pathogens of soybeans in North America.
Soybean breeders need to combine disease resistance loci with genes for high yield and other desirable traits to develop improved soybean varieties.
Disease screening for large numbers of samples can be expensive, time-consuming, and unreliable. Use of the nucleotide polymorphisms described here, and of genetically linked nucleotides, as genetic markers for disease resistance loci is an effective method of selecting resistant varieties in breeding programs. When a population is segregating for multiple loci affecting multiple diseases, the efficiency of MAS compared to phenotypic screening becomes even greater, because all the loci can be processed in the lab together from a single sample of DNA. Another advantage over field evaluations for disease reaction is that MAS can be done at any time of year regardless of the growing season.
Moreover, environmental effects are irrelevant to marker-assisted selection.
Another use of MAS in plant and animal breeding is to assist the recovery of the recurrent parent genotype by backcross breeding. Backcross breeding is the process of crossing a progeny back to one of its parents. Backcrossing is usually done for the purpose of introgressing one or a few loci from a donor parent into an otherwise desirable genetic background from the recurrent parent. The more cycles of backcrossing that are done, the greater the genetic contribution of the recurrent parent to the resulting variety. This is often necessary because resistant plants may be otherwise undesirable, e.g., due to low yield, low fecundity, or the like. In contrast, strains which are the result of intensive breeding programs may have excellent yield, fecundity or the like, merely being deficient in one desired trait such as resistance to a particular pathogen.
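The statement that more backcross cycles increase the recurrent parent's contribution can be made concrete with the standard expectation: the F1 is half recurrent parent, and each backcross to the recurrent parent halves the remaining donor share. The sketch below computes that average; it is a textbook expectation rather than a result from this disclosure, and it ignores linkage drag around the selected donor locus and any marker-assisted background selection.

```python
def expected_recurrent_parent_fraction(backcrosses: int) -> float:
    """Expected (average) share of the genome derived from the recurrent
    parent after `backcrosses` cycles of backcrossing (0 = the F1).

    Each backcross halves the remaining donor share: 1 - (1/2) ** (n + 1).
    """
    if backcrosses < 0:
        raise ValueError("backcrosses must be zero or positive")
    return 1 - 0.5 ** (backcrosses + 1)


for n in range(6):
    print(n, expected_recurrent_parent_fraction(n))
# 0.5, 0.75, 0.875, 0.9375, 0.96875, 0.984375 for n = 0..5
```

Individual progeny vary around these averages, which is why markers spread across the genome are useful for picking the plants that have recovered the most recurrent-parent genome in each cycle.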
The 63 marker loci described in the Examples below are distributed around the soybean genome and are used to select for the recurrent-parent genotype.
MAS for the recurrent-parent genotype can be combined with MAS for the disease resistance loci using these markers. Accordingly, the markers of the invention can be used to introduce disease resistance QTL into plant varieties having an otherwise desirable genetic background, both by selecting for the QTL and by selecting for the otherwise desirable background.
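Combining the two kinds of selection might look like the sketch below: plants are first required to carry the donor allele at a QTL-linked marker (foreground selection), and survivors are then ranked by the fraction of background markers at which they are homozygous for the recurrent-parent allele. The allele letters and plant names are hypothetical; the background markers were picked from linkage groups other than G, where the example QTL marker lies.

```python
from typing import Dict, List, Tuple

Genotype = Tuple[str, str]


def rank_backcross_progeny(plants: Dict[str, Dict[str, Genotype]],
                           donor_qtl_alleles: Dict[str, str],
                           recurrent_alleles: Dict[str, str]) -> List[Tuple[str, float]]:
    """Foreground selection on QTL-linked markers, then ranking on background markers."""
    ranked = []
    for plant_id, calls in plants.items():
        # Foreground: drop plants lacking the donor allele at any QTL-linked marker.
        if not all(marker in calls and allele in calls[marker]
                   for marker, allele in donor_qtl_alleles.items()):
            continue
        # Background: fraction of scored markers homozygous for the recurrent allele.
        scored = [m for m in recurrent_alleles if m in calls]
        if not scored:
            continue
        homozygous = sum(1 for m in scored
                         if calls[m] == (recurrent_alleles[m], recurrent_alleles[m]))
        ranked.append((plant_id, homozygous / len(scored)))
    return sorted(ranked, key=lambda item: item[1], reverse=True)


donor_qtl_alleles = {"php05219A": "T"}  # hypothetical donor (resistance) allele
recurrent_alleles = {"pA280A": "A", "pR045A": "G", "pB032B": "C", "pK079A": "T"}
plants = {
    "bc1": {"php05219A": ("A", "T"), "pA280A": ("A", "A"), "pR045A": ("G", "C"),
            "pB032B": ("C", "C"), "pK079A": ("T", "T")},
    "bc2": {"php05219A": ("A", "A"), "pA280A": ("A", "A"), "pR045A": ("G", "G"),
            "pB032B": ("C", "C"), "pK079A": ("T", "T")},  # lost the donor allele
    "bc3": {"php05219A": ("T", "A"), "pA280A": ("A", "A"), "pR045A": ("G", "G"),
            "pB032B": ("C", "C"), "pK079A": ("T", "T")},
}
print(rank_backcross_progeny(plants, donor_qtl_alleles, recurrent_alleles))
# [('bc3', 1.0), ('bc1', 0.75)]
```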
Positional Cloning in Soybean
Positional gene cloning uses the proximity of a mapped gene and its linked markers to physically define a cloned chromosomal fragment that contains a desired gene. If two or more markers flanking the gene are physically close to each other, they may hybridize to the same DNA fragment, thereby identifying a clone on which the gene is located. If flanking markers are more distant from each other, a fragment containing the gene may be identified by constructing a contig of overlapping clones.
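The first case, where both flanking markers hybridize to the same clone, reduces to a set intersection over library screening results. The sketch below assumes the screen is recorded as a mapping from clone name to the set of markers whose probes hybridized to it; the clone names and results are invented, and the two marker names are used only as hypothetical flanking markers.

```python
from typing import Dict, List, Set


def clones_spanning(hits: Dict[str, Set[str]], left: str, right: str) -> List[str]:
    """Clones that hybridized to both flanking markers and so should contain
    the interval between them (and the gene, if it lies inside)."""
    return sorted(clone for clone, markers in hits.items() if {left, right} <= markers)


# Invented screening results: clone name -> markers whose probes hybridized to it.
hits = {
    "BAC_001": {"pT155A"},
    "BAC_017": {"pT155A", "pBLT24A"},
    "BAC_042": {"pBLT24A"},
}
print(clones_spanning(hits, "pT155A", "pBLT24A"))  # ['BAC_017']
```

An empty result corresponds to the second case in the text: the markers are too far apart for one clone, and clones hit by each marker become the starting points for building a contig of overlapping clones.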
Recently, BAC (bacterial artificial chromosome) and YAC (yeast artificial chromosome) libraries containing large fragments of soybean DNA have been constructed (Funke and Kolchinsky (1994) CRC Press, Boca Raton, FL, pp. 125-308; Marek and Shoemaker (1996) Soybean Genet Newsl 23:126-129; Danish et al. (1997) Soybean Genet Newsl 24:196-198). These libraries and advances in genetic mapping make positional cloning of soybean genes feasible using the markers identified herein.
A marker is ideally locus-specific so that it reliably identifies a clone from the targeted chromosomal region. The soybean genome is highly duplicated (Shoemaker et al. (1996) (Glycine subgenus soja) Genetics 144:329-338), but each nucleotide polymorphism and its PCR primers described here are specific to a single locus in the soybean genome and therefore correctly identify soybean clones that hybridize to a probe DNA sequence corresponding to a particular target genomic location. Some of these marker loci are closely linked to agronomically important genes, such as genes for resistance to soybean cyst nematode and fungal pathogens, and are used as locus-specific reference points in positional cloning efforts for these genes.
Making and Using Markers for Detection of Polymorphic Nucleic Acids
The ability to characterize an individual by its genome is due to the inherent variability of genetic information. Although DNA sequences which code for necessary proteins are well conserved across a species, there are regions of DNA which are non-coding or code for portions of proteins which do not have critical functions, and therefore absolute conservation of nucleic acid sequence is not strongly selected for.
These variable regions are identified by genetic markers. Typically, genetic markers are bound by probes such as oligonucleotides or amplicons which bind to variable regions of the genome. In some instances, the presence or absence of binding to a genetic marker identifies individuals by their unique nucleic acid sequence. In other instances, a marker binds to nucleic acid sequences of all individuals but the individual is identified by the position in the genome bound by a marker probe.
The major causes of genetic variability are addition, deletion, or point mutations; recombination; and transposable elements within the genome of individuals in a plant population.
Point mutations are typically the result of inaccuracy in DNA replication.
During meiosis in the creation of germ cells, or in mitosis to create clones, DNA polymerase "switches" bases, either transitionally (i.e., a purine for a purine and a pyrimidine for a pyrimidine) or transversionally (i.e., purine to pyrimidine and vice versa). The base switch is maintained if the exonuclease (proofreading) function of DNA polymerase does not correct the mismatch. At germination, or at the next cell division (in clonal cells), the DNA strand with the point mutation becomes the template for a complementary strand and the base switch is incorporated into the genome. Transposable elements are sequences of DNA which have the ability to move or to jump to new locations within a genome, and several examples of transposons are known in the art.
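The transition/transversion distinction above can be expressed directly: a substitution is a transition when both bases are purines (A, G) or both are pyrimidines (C, T), and a transversion otherwise. The helper below is an illustrative sketch, not part of the described method.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def classify_substitution(ref: str, alt: str) -> str:
    """Classify a single-base substitution as a transition or a transversion."""
    ref, alt = ref.upper(), alt.upper()
    if ref == alt or not {ref, alt} <= PURINES | PYRIMIDINES:
        raise ValueError(f"not a single-base substitution: {ref} -> {alt}")
    same_class = {ref, alt} <= PURINES or {ref, alt} <= PYRIMIDINES
    return "transition" if same_class else "transversion"


for ref, alt in [("A", "G"), ("C", "T"), ("A", "C"), ("G", "T")]:
    print(ref, "->", alt, classify_substitution(ref, alt))
```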
Given the sequences herein, one of skill can generate probe nucleic acids for detecting markers, including probes which are PCR primers, allele-specific probes, PCR amplicons and the like, for the detection of polymorphic nucleotides at the loci disclosed herein, as well as genetically linked sequences.
Cloning methodologies for replicating nucleic acids and sequencing methods to verify the sequence of nucleic acids are well known in the art.
Examples of appropriate cloning and sequencing techniques, and instructions sufficient to direct persons of skill through many cloning exercises, are found in Berger and Kimmel, Guide to Molecular Cloning Techniques, Methods in Enzymology volume 152, Academic Press, Inc., San Diego, CA (Berger); Sambrook et al. (1989) Molecular Cloning - A Laboratory Manual (2nd ed.) Vol. 1-3, Cold Spring Harbor Laboratory, Cold Spring Harbor Press, NY (Sambrook); and Current Protocols in Molecular Biology, F. M. Ausubel et al., eds., Current Protocols, a joint venture between Greene Publishing Associates, Inc. and John Wiley & Sons, Inc. (through and including the 1997 Supplement) (Ausubel). A catalogue of bacteria and bacteriophages useful for cloning is provided, e.g., by the ATCC, e.g., The ATCC Catalogue of Bacteria and Bacteriophage (1992) Gherna et al. (eds.), published by the ATCC. Additional basic procedures for sequencing, cloning and other aspects of molecular biology, and underlying theoretical considerations, are also found in Lewin (1995) Genes V, Oxford University Press Inc., NY (Lewin); and Watson et al. (1992) Recombinant DNA, Second Edition, Scientific American Books, NY.
Product information from manufacturers of biological reagents and experimental equipment also provides information useful in known biological methods. Such manufacturers include the Sigma Chemical Company (Saint Louis, MO); New England Biolabs (Beverly, MA); R&D Systems (Minneapolis, MN); Pharmacia LKB Biotechnology (Piscataway, NJ); CLONTECH Laboratories, Inc. (Palo Alto, CA); ChemGenes Corp. (Waltham, MA); Aldrich Chemical Company (Milwaukee, WI); Glen Research, Inc. (Sterling, VA); GIBCO BRL Life Technologies, Inc. (Gaithersburg, MD); Fluka Chemica-Biochemika Analytika (Fluka Chemie AG, Buchs, Switzerland); Invitrogen (San Diego, CA); Perkin Elmer (Foster City, CA); and Stratagene; as well as many other commercial sources known to one of skill. As described previously, genetic markers and some RFLP probes are available from Biogenetic Services, Inc. (Brookings, SD) and Linkage Genetics (Salt Lake City, UT; a subsidiary of Perkin Elmer, Branchburg, NJ).
The nucleic acid compositions of this invention, whether DNA, RNA, cDNA, genomic DNA, or analogues thereof, or a hybrid of these molecules, are isolated from biological sources or synthesized in vitro. The nucleic acids of the invention are present in transfected whole cells, in transfected cell lysates, in transgenic plants (especially soybean) or in partially purified or substantially pure form.
In vitro amplification techniques suitable for amplifying sequences for use as molecular probes or for generating nucleic acid fragments for subsequent subcloning are known. Examples of techniques sufficient to direct persons of skill through such in vitro amplification methods, including the polymerase chain reaction (PCR), the ligase chain reaction (LCR), Qβ-replicase amplification and other RNA polymerase mediated techniques (e.g., NASBA), are found in Berger, Sambrook, and Ausubel, as well as Mullis et al. (1987) U.S. Patent No. 4,683,202; PCR Protocols A Guide to Methods and Applications (Innis et al., eds.) Academic Press Inc., San Diego, CA (1990) (Innis); Arnheim & Levinson (October 1, 1990) C&EN 36-47; The Journal Of NIH Research (1991) 3, 81-94; Kwoh et al. (1989) Proc. Natl. Acad. Sci. USA 86, 1173; Guatelli et al. (1990) Proc. Natl. Acad. Sci. USA 87, 1874; Lomeli et al. (1989) J. Clin. Chem. 35, 1826; Landegren et al. (1988) Science 241, 1077-1080; Van Brunt (1990) Biotechnology 8, 291-294; Wu and Wallace (1989) Gene 4, 560; Barringer et al. (1990) Gene 89, 117; and Sooknanan and Malek (1995) Biotechnology 13:563-564. Improved methods of cloning in vitro amplified nucleic acids are described in Wallace et al., U.S. Pat. No. 5,426,039. One of skill will appreciate that essentially any RNA can be converted into a double-stranded DNA suitable for restriction digestion, PCR expansion and sequencing using reverse transcriptase and a polymerase. See Ausubel, Sambrook and Berger, all supra.
Oligonucleotides for use as probes, e.g., in in vitro amplification methods, for use as gene probes, or as inhibitor components (e.g., ribozymes) are typically synthesized chemically according to the solid phase phosphoramidite triester method described by Beaucage and Caruthers (1981), Tetrahedron Letts., 22(20):1859-1862, e.g., using an automated synthesizer, as described in Needham-VanDevanter et al. (1984) Nucleic Acids Res., 12:6159-6168. Oligonucleotides can also be custom made and ordered from a variety of commercial sources known to persons of skill.
Purification of oligonucleotides, where necessary, is typically performed by either native acrylamide gel electrophoresis or by anion-exchange HPLC as described in Pearson and Regnier (1983) J. Chrom. 255:137-149. The sequence of the synthetic oligonucleotides can be verified using the chemical degradation method of Maxam and Gilbert (1980) in Grossman and Moldave (eds.) Academic Press, New York, Methods in Enzymology 65:499-560.
Providing Large Nucleic Acid Templates
In certain applications it is advantageous to make or clone large nucleic acids which encompass multiple loci, or to detect, clone, or isolate nucleic acids linked to polymorphic nucleotides. For example, as described supra, in one embodiment positional cloning is used to isolate nucleic acids proximal to polymorphic nucleotides, e.g., at more than one locus. These nucleic acids are in linkage disequilibrium with the polymorphic nucleotides, i.e., they are genetically linked to the polymorphic nucleotides on a chromosomal nucleic acid. It will be appreciated that a nucleic acid genetically linked to a polymorphic nucleotide optionally resides up to about 50 centimorgans from the polymorphic nucleic acid, although the precise physical distance will vary depending on the cross-over frequency of the particular chromosomal region. Typical distances from a polymorphic nucleotide are in the range of 1-50 centimorgans, for example, less than 1-5, about 1-5, 1, 5, 10, 15, 20, 25, 30, 35, 40, 45 or 50 centimorgans, etc.
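Map distances in centimorgans are inferred from observed recombination (cross-over) frequencies, and the relationship is not linear at larger distances. The sketch below uses Haldane's mapping function, a standard model that this disclosure does not itself specify and that assumes no crossover interference; other mapping functions (e.g., Kosambi's) give somewhat different values.

```python
import math


def haldane_cm(r: float) -> float:
    """Map distance in centimorgans for a recombination fraction r (0 <= r < 0.5),
    under Haldane's no-interference model: d = -50 * ln(1 - 2r)."""
    if not 0.0 <= r < 0.5:
        raise ValueError("recombination fraction must be in [0, 0.5)")
    return -50.0 * math.log(1.0 - 2.0 * r)


def haldane_r(cm: float) -> float:
    """Recombination fraction expected between loci `cm` centimorgans apart."""
    if cm < 0:
        raise ValueError("map distance cannot be negative")
    return 0.5 * (1.0 - math.exp(-cm / 50.0))


for cm in (1, 5, 10, 25, 50):
    print(f"{cm:>2} cM -> r = {haldane_r(cm):.3f}")
```

Under this model a locus 50 centimorgans from a polymorphic nucleotide still recombines with it in only about 32% of meioses, so linkage remains detectable, though much weaker than at a few centimorgans.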
Many methods of making large recombinant RNA and DNA nucleic acids, including recombinant plasmids, recombinant lambda phage, cosmids, yeast artificial chromosomes (YACs), P1 artificial chromosomes, Bacterial Artificial Chromosomes (BACs), and the like, are known. A general introduction to YACs, BACs, PACs and MACs as artificial chromosomes is described in Monaco and Larin (1994) Trends Biotechnol 12(7):280-286. Examples of appropriate cloning techniques for making large nucleic acids, and instructions sufficient to direct persons of skill through many cloning exercises, are found in Berger and Kimmel, Sambrook, and Ausubel, all supra.
In one aspect, nucleic acids hybridizing to the polymorphic nucleic acids disclosed herein (or linked to such nucleic acids) are cloned into large nucleic acids such as YACs, or are detected in YAC genomic libraries cloned from soybean. The construction of YACs and YAC libraries is known. See Berger, supra, and Burke et al. (1987) Science 236:806-812. Gridded libraries of YACs are described in Anand et al. (1989) Nucleic Acids Res. 17:3425-3433, and Anand et al. (1990) Nucleic Acids Res. 18:1951-1956. Riley (1990) Nucleic Acids Res. 18(10):2887-2890 and the references therein describe cloning of YACs and related technologies. YAC libraries containing large fragments of soybean DNA have been constructed. See Funke and Kolchinsky (1994) CRC Press, Boca Raton, FL, pp. 125-308; Marek and Shoemaker (1996) Soybean Genet Newsl 23:126-129; Danish et al. (1997) Soybean Genet Newsl 24:196-198. See also Ausubel, chapter 13, for a description of procedures for making YAC libraries.
Similarly, cosmids or other molecular vectors such as BAC and P1 constructs are also useful for isolating or cloning nucleic acids linked to polymorphic nucleic acids. Cosmid cloning is also known. See, e.g., Ausubel, chapter 1.10.11 (supplement 13) and the references therein. See also Ish-Horowitz and Burke (1981) Nucleic Acids Res. 9:2989-2998; Murray (1983) Phage Lambda and Molecular Cloning in Lambda II (Hendrix et al., eds.) 395-432, Cold Spring Harbor Laboratory, NY; Frischauf et al. (1983) J. Mol. Biol. 170:827-842; and Dunn and Blattner (1987) Nucleic Acids Res. 15:2677-2698, and the references cited therein. Construction of BAC and P1 libraries is known; see, e.g., Ashworth et al. (1995) Anal Biochem 224(2):564-571; Wang et al. (1994) Genomics 24(3):527-534; Kim et al. (1994) Genomics 22(2):336-9; Rouquier et al. (1994) Anal Biochem 217(2):205-9; Shizuya et al. (1992) Proc Natl Acad Sci U S A 89(18):8794-7; Woo et al. (1994) Nucleic Acids Res 22(23):4922-31; Wang et al. (1995) Plant (3):525-33; Cai (1995) Genomics 29(2):413-25; Schmitt et al. (1996) Genomics 33(1):9-20; Kim et al. (1996) Genomics 34(2):213-8; Kim et al. (1996) Proc Natl Acad Sci U S A (13):6297-301; Pusch et al. (1996) Gene 183(1-2):29-33; and Wang et al. (1996) Genome Res 6(7):612-9.
Improved methods of in vitro amplification to amplify large nucleic acids linked to the polymorphic nucleic acids herein are summarized in Cheng et al. (1994) Nature 369:684-685 and the references therein.
In addition, any of the cloning or amplification strategies described above are useful for creating contigs of overlapping clones, thereby providing overlapping nucleic acids which show the physical relationship at the molecular level for genetically linked nucleic acids. A common example of this strategy is found in whole organism sequencing projects, in which overlapping clones are sequenced to provide the entire sequence of a chromosome. In this procedure, a library of the organism's cDNA or genomic DNA is made according to standard procedures described, e.g., in the references above. Individual clones are isolated and sequenced, and overlapping sequence information is ordered to provide the sequence of the organism. See also Tomb et al. (1997) Nature 539-547, describing the whole genome random sequencing and assembly of the complete genomic sequence of Helicobacter pylori; Fleischmann et al. (1995) Science 269:496-512, describing whole genome random sequencing and assembly of the complete Haemophilus influenzae genome; Fraser et al. (1995) Science 270:397-403, describing whole genome random sequencing and assembly of the complete Mycoplasma genitalium genome; and Bult et al. (1996) Science 273:1058-1073, describing whole genome random sequencing and assembly of the complete Methanococcus jannaschii genome. Recently, Hagiwara and Curtis (1996) Nucleic Acids Research 24(12):2460-2461 developed a "long distance sequencer" PCR protocol for generating overlapping nucleic acids from very large clones to facilitate sequencing, and methods of amplifying and tagging the overlapping nucleic acids into suitable sequencing templates.
The methods can be used in conjunction with shotgun sequencing techniques to improve the efficiency of shotgun methods typically used in whole organism sequencing projects.
As applied to the present invention, the techniques are useful for identifying and sequencing genomic nucleic acids genetically linked to the loci described.
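The idea of ordering overlapping sequence information into a contig can be illustrated with a toy greedy merger: repeatedly join the two fragments with the longest suffix-prefix overlap until no overlap of at least `min_len` bases remains. This is a deliberately simplified sketch, not the method used in the cited sequencing projects; it ignores sequencing errors, reverse complements, fragments contained inside others, and repeats (a real concern in the highly duplicated soybean genome). The fragments are invented.

```python
from typing import List


def overlap(a: str, b: str, min_len: int) -> int:
    """Length of the longest suffix of `a` equal to a prefix of `b` (>= min_len), else 0."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if k > 0 and a.endswith(b[:k]):
            return k
    return 0


def greedy_contigs(fragments: List[str], min_len: int = 3) -> List[str]:
    """Greedily merge the pair of fragments with the largest overlap until none remain."""
    contigs = list(fragments)
    while True:
        best_k, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            return contigs
        merged = contigs[best_i] + contigs[best_j][best_k:]
        contigs = [c for n, c in enumerate(contigs) if n not in (best_i, best_j)]
        contigs.append(merged)


print(greedy_contigs(["ATGCGTAC", "GTACCTTA", "CTTAGGCA"]))  # ['ATGCGTACCTTAGGCA']
```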
Hybridization Strategies
In a preferred aspect, a labeled probe nucleic acid is specifically hybridized to a marker nucleic acid from a biological sample and the label is detected, thereby determining that the marker nucleic acid is present in the sample. For example, a marker comprising a polymorphic nucleic acid can be detected by allele-specific hybridization of a probe to the region of the marker comprising the polymorphic nucleic acid. Similarly, a marker can be detected by Southern analysis, northern analysis, in situ analysis, or the like.
Two single-stranded nucleic acids "hybridize" when they form a double-stranded duplex. The region of double-strandedness can include the full length of one or both of the single-stranded nucleic acids, or all of one single-stranded nucleic acid and a subsequence of the other single-stranded nucleic acid, or the region of double-strandedness can include a subsequence of each nucleic acid. "Stringent hybridization conditions" in the context of nucleic acid hybridization are sequence dependent and are different under different environmental parameters. An extensive guide to the hybridization of nucleic acids is found in Tijssen (1993), id. Generally, stringent conditions are selected to be about 5°C lower than the thermal melting point (Tm) for the specific sequence at a defined ionic strength and pH. The Tm is the temperature (under defined ionic strength and pH) at which 50% of the target sequence hybridizes to a perfectly matched probe. Highly stringent conditions are selected to be equal to the Tm for a particular probe. Sometimes the term "Td" is used to define the temperature at which at least half of the probe dissociates from a perfectly matched target nucleic acid. In any case, a variety of techniques for estimating the Tm or Td are available, and are generally described in Tijssen, id. Typically, G-C base pairs in a duplex are estimated to contribute about 3°C to the Tm, while A-T base pairs are estimated to contribute about 2°C, up to a theoretical maximum of about 80-100°C. However, more sophisticated models of Tm and Td are available and appropriate, in which G-C stacking interactions, solvent effects, the desired assay temperature and the like are taken into account. In one example, PCR primers were designed to have a dissociation temperature (Td) of approximately 60°C, using the formula:
Td = (((((3 x #GC) + (2 x #AT)) x 37) - 562) / #bp) - 5;
where #GC, #AT, and #bp are the number of guanine-cytosine base pairs, the number of adenine-thymine base pairs, and the number of total base pairs, respectively, involved in the annealing of the primer to the template DNA.
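The Td formula, and the simpler per-base-pair estimate mentioned before it, can be evaluated as in the sketch below. It assumes the whole primer anneals to a perfectly matched template, so #bp is the primer length and #GC and #AT are counted from the primer sequence; the 20-mer is invented to show a primer landing near the approximately 60°C design target.

```python
def td_celsius(primer: str) -> float:
    """Dissociation temperature (deg C) from the empirical formula above:
    Td = (((3 x #GC + 2 x #AT) x 37) - 562) / #bp - 5.
    Assumes every base of the primer pairs with a perfectly matched template."""
    seq = primer.upper()
    if not seq or set(seq) - set("ACGT"):
        raise ValueError("primer must be a non-empty string of A, C, G, T")
    n_gc = seq.count("G") + seq.count("C")
    n_at = seq.count("A") + seq.count("T")
    return ((3 * n_gc + 2 * n_at) * 37 - 562) / len(seq) - 5


def rule_of_thumb_tm(primer: str) -> int:
    """The simpler estimate: about 3 deg C per G-C pair and 2 deg C per A-T pair."""
    seq = primer.upper()
    return 3 * (seq.count("G") + seq.count("C")) + 2 * (seq.count("A") + seq.count("T"))


primer = "GCGCGCGCGCATATATATAT"  # invented 20-mer: 10 G-C and 10 A-T pairs
print(round(td_celsius(primer), 1))  # 59.4
print(rule_of_thumb_tm(primer))      # 50
```

The two estimates differ noticeably for the same primer, which illustrates why the text recommends choosing a model suited to the assay rather than relying on one rule of thumb.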
An example of stringent hybridization conditions for hybridization of complementary nucleic acids that have more than 100 complementary residues on a filter in a Southern or northern blot is 50% formamide with 1 mg of heparin at 42°C, with the hybridization being carried out overnight. An example of stringent wash conditions for a Southern blot of such nucleic acids is a 0.2x SSC wash at 65°C for 15 minutes (see Sambrook, supra, for a description of SSC buffer). Often the high stringency wash is preceded by a low stringency wash to remove background probe signal. An example low stringency wash is 2x SSC at 40°C for 15 minutes.
In general, a signal at least twice (2x) that observed for an unrelated probe in the particular hybridization assay indicates detection of a specific hybridization. For highly specific hybridization strategies such as allele-specific hybridization, an allele-specific probe is usually hybridized under highly stringent conditions to a marker nucleic acid (e.g., a genomic nucleic acid, an amplicon, or the like) comprising a polymorphic nucleotide.
Allele-Specific Hybridization (ASH)
One especially preferred example of a hybridization technology for detecting marker nucleic acids is allele-specific hybridization, or "ASH."
This technology is based on the stable annealing of a short, single-stranded oligonucleotide probe to a single-stranded target nucleic acid only when base pairing is completely complementary. The hybridization can then be detected from a radioactive or non-radioactive label on the probe (methods of labeling probes and other nucleic acids are set forth in detail below).
ASH markers are polymorphic when their base composition at one or a few nucleotide positions in a segment of DNA is different among different genotypes.
For each polymorphism, two or more different ASH probes are designed to have identical DNA sequences except at the polymorphic nucleotide(s). Each probe has exact homology with one allele sequence, so that the set of probes can distinguish all the alternative allele sequences. Each probe is hybridized against the target DNA. With appropriate probe design and stringency conditions, a single-base mismatch between the probe and target DNA will prevent hybridization, and the unbound probe will wash away. In this manner, only one of the alternative probes will hybridize to a target sample that is homozygous or homogeneous for an allele (an allele is defined by the DNA homology between the probe and target). Samples that are heterozygous or heterogeneous for two alleles will hybridize to both of the two alternative probes. Having a probe for each allele allows the polymorphism to be scored as a genetically co-dominant marker, which is useful in determining zygosity. In addition, a co-dominant ASH system is useful when hybridization does not occur for either of the two alternative probes, because control experiments can then be directed towards verifying insufficient target DNA or the occurrence of a new allele.
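The scoring logic of a co-dominant, two-probe ASH marker can be sketched as follows. This is a hypothetical illustration, not the procedure used to generate the data in this document: it treats a dot as positive when its signal is at least twice that of an unrelated probe (the 2x criterion given above), and the function names and signal units are assumptions.

```python
def is_specific(signal: float, unrelated_probe_signal: float,
                min_ratio: float = 2.0) -> bool:
    """A probe signal counts as specific hybridization if it is at least
    `min_ratio` times the signal of an unrelated (control) probe."""
    if unrelated_probe_signal <= 0:
        raise ValueError("control signal must be positive")
    return signal >= min_ratio * unrelated_probe_signal


def call_codominant(signal_a: float, signal_b: float,
                    unrelated_probe_signal: float) -> str:
    """Score a marker with one probe for allele A and one for allele B."""
    a = is_specific(signal_a, unrelated_probe_signal)
    b = is_specific(signal_b, unrelated_probe_signal)
    if a and b:
        return "A/B"      # heterozygous (or heterogeneous) sample
    if a:
        return "A/A"      # only the allele-A probe hybridized
    if b:
        return "B/B"      # only the allele-B probe hybridized
    return "no call"      # check target DNA amount, or suspect a new allele


if __name__ == "__main__":
    print(call_codominant(1000, 50, 100))   # A/A
    print(call_codominant(900, 800, 100))   # A/B
```

Used with only one probe, the same `is_specific` test gives the dominant-marker reading described next, in which the alternative allele is inferred from the absence of signal.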
ASH markers can also be used as dominant markers, where the presence or absence of only one allele is determined from hybridization or lack of hybridization by only one probe. The alternative allele may be inferred from the lack of hybridization.
Heterogeneous target nucleic acids (i.e., chromosomal DNA from a multiallelic plant) are detected by monitoring simultaneous hybridization of two or more probes comprising different polymorphic nucleotides to a genomic nucleic acid.
Allele-specific hybridization was first described by Wallace et al. (1979), who showed that the hybridization between an oligonucleotide probe and bacteriophage target DNA dissociated at a temperature about 10°C lower when the probe and target sequences had a single base-pair mismatch than when the probe and target DNA had perfect homology. This difference in thermal stability allowed ASH probes to discriminate the two alleles determined by a single-nucleotide polymorphism between the wild-type sequence and a point mutation in the am-3 bacteriophage.
Later it was shown that a mixture of ASH probes, designed from the possible degenerate DNA sequences coding for a known amino acid sequence, could be used to identify clones containing the rabbit β-globin DNA that coded for that protein (Wallace et al. (1981) Nucleic Acids Res 9:879-894). They also showed that the only probe that hybridized to the clones had exact homology to the clone, whereas three probes that did not hybridize to the clones had a single base-pair mismatch with the target DNA.
ASH markers have been developed to diagnose susceptibility to human diseases caused by point mutations in DNA sequence. Examples include markers for the βS-globin allele that can cause sickle-cell anemia (Conner et al. (1983) Proc Natl Acad Sci USA 80:278-282), the β-thalassemia allele that can cause β-thalassemia (Pirastu et al. (1983) New England J Med 309:284-287), the α1-antitrypsin allele that can cause liver cirrhosis and pulmonary emphysema (Kidd (1983) Nature 304:230-234), the HLA-DR haplotypes associated with immune response (Angelini et al. (1986) Proc Natl Acad Sci USA 83:4489-4493), and the A985G allele that can cause medium-chain acyl-CoA dehydrogenase deficiency (Iitia A et al. (1994) BioTechniques 17:566-571).
ASH markers have also been developed to identify strains of fungi resistant to the fungicide benzimidazole because of specific point mutations in the β-tubulin gene in Venturia inaequalis (Koenraadt and Jones (1992) Phytopathology 82:1354-1358) and Rhynchosporium secalis (Wheeler et al. (1995) Pestic Sci 43:201-209).
An ASH probe is designed to form a stable duplex with a nucleic acid target only when base pairing is completely complementary. One or more base-pair mismatches between the probe and target prevent stable hybridization. This holds true for numerous variations of the process: the probe and target molecules are optionally either RNA or denatured DNA; the target molecule(s) can be of any length beyond the sequence that is complementary to the probe; the probe can be designed to hybridize with either strand of a DNA target; the probe size can be varied to suit hybridization conditions of differing stringency; etc.
The polymerase chain reaction (PCR) (see, e.g., Mullis KB and Faloona F (1987) Methods Enzymol 155:335-350 and references supra) allows the target sequence for ASH to be amplified from low concentrations of nucleic acid in relatively small volumes (Koenraadt H and Jones AR (1992) Phytopathology 82:1354-1358; Iitia et al. (1994) BioTechniques 17:566-571). Otherwise, the target sequence from genomic DNA is digested with a restriction endonuclease and size-separated by gel electrophoresis (Conner et al. 1983). Hybridizations typically occur with the target sequence bound to the surface of a membrane or, as described in U.S. Patent 5,468,613, with the ASH probe sequence bound to a membrane.
Utilizing the nucleotide alleles and polymorphisms described here, ASH data were obtained by amplifying nucleic acid fragments (amplicons) from genomic DNA using PCR, transferring the amplicon target DNA to a membrane in a dot-blot format, hybridizing a labeled oligonucleotide probe to the amplicon target, and observing the hybridization dots by autoradiography. These genetic markers have utility in the improvement of soybean, an important crop plant that supplies much of the world's oil and protein.
Solid Phase Arrays
In one variant, ASH technologies are adapted to solid phase arrays for the rapid and specific detection of multiple polymorphic nucleotides. Typically, an ASH probe is linked to a solid support and a target nucleic acid (e.g., a genomic nucleic acid or an amplicon) is hybridized to the probe. Either the probe, or the target, or both, can be labeled, typically with a fluorophore. Where the target is labeled, hybridization is detected by detecting bound fluorescence. Where the probe is labeled, hybridization is typically detected by quenching of the label. Where both the probe and the target are labeled, detection of hybridization is typically performed by monitoring a color shift resulting from proximity of the two bound labels. A variety of labeling strategies, labels, and the like, particularly for fluorescence-based applications, are described supra.
In one embodiment, an array of ASH probes is synthesized on a solid support. Using chip masking technologies and photoprotective chemistry, it is possible to generate ordered arrays of nucleic acid probes. These arrays, which are known, e.g., as "DNA chips" or as very large scale immobilized polymer arrays (VLSIPS™ arrays), can include millions of defined probe regions on a substrate having an area of about 1 cm² to several cm².
The construction and use of solid phase nucleic acid arrays to detect target nucleic acids is well described in the literature. See Fodor et al. (1991) Science 251:767-777; Sheldon et al. (1993) Clinical Chemistry 39(4):718-719; Kozal et al. (1996) Nature Medicine 2(7):753-759; and Hubbell U.S. Pat. No. 5,571,639. See also Pinkel et al. PCT/US95/16155 (WO 96/17958). In brief, a combinatorial strategy allows for the synthesis of arrays containing a large number of probes using a minimal number of synthetic steps. For instance, it is possible to synthesize and attach all possible DNA 8-mer oligonucleotides (4^8, or 65,536, possible combinations) using only 32 chemical synthetic steps. In general, VLSIPS™ procedures provide a method of producing 4^n different oligonucleotide probes on an array using only 4n synthetic steps.
Light-directed combinatorial synthesis of oligonucleotide arrays on a glass surface is performed with automated phosphoramidite chemistry and chip masking techniques similar to photoresist technologies in the computer chip industry.
Typically, a glass surface is derivatized with a silane reagent containing a functional group, e.g., a hydroxyl or amine group blocked by a photolabile protecting group. Photolysis through a photolithographic mask is used to selectively expose functional groups, which are then ready to react with incoming 5'-photoprotected nucleoside phosphoramidites.
The phosphoramidites react only with those sites which are illuminated (and thus exposed by removal of the photolabile blocking group). Thus, the phosphoramidites only add to those areas selectively exposed in the preceding step. These steps are repeated until the desired array of sequences has been synthesized on the solid surface.
Combinatorial synthesis of different oligonucleotide analogues at different locations on the array is determined by the pattern of illumination during synthesis and the order of addition of coupling reagents. Monitoring of hybridization of target nucleic acids to the array is typically performed with fluorescence microscopes or laser scanning microscopes.
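The 4n step count follows from running one mask-and-couple step per base (A, C, G, T) at each of the n positions. The toy simulation below illustrates that bookkeeping. It is an idealized sketch that assumes perfect deprotection and coupling, and the array layout (one site per possible n-mer, in lexicographic order) and the function name are hypothetical.

```python
from itertools import product

BASES = "ACGT"


def light_directed_synthesis(n: int) -> tuple[list[str], int]:
    """Simulate building all 4**n n-mers on a 4**n-site array.

    Returns the sequence grown at each site and the number of
    mask/coupling steps used.
    """
    # Hypothetical layout: site k is meant to carry the k-th n-mer.
    targets = ["".join(p) for p in product(BASES, repeat=n)]
    grown = [""] * len(targets)
    steps = 0
    for position in range(n):
        for base in BASES:
            # The mask illuminates (deprotects) only sites whose target
            # sequence has `base` at this position ...
            lit = [i for i, t in enumerate(targets) if t[position] == base]
            # ... so the incoming photoprotected nucleotide couples only there.
            for i in lit:
                grown[i] += base
            steps += 1
    return grown, steps


if __name__ == "__main__":
    probes, steps = light_directed_synthesis(8)
    print(len(set(probes)), steps)   # 65536 distinct 8-mers in 32 steps
```

Every site is illuminated exactly once per position, so each cycle of four masks extends every probe by one base; the number of steps grows linearly with n while the number of distinct probes grows as 4^n.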
In addition to being able to design, build and use probe arrays using available techniques, one of skill is also able to order custom-made arrays and array-reading devices from manufacturers specializing in array manufacture. For example, Affymetrix Corp. in Santa Clara, CA manufactures DNA VLSIPS™ arrays.
It will be appreciated that probe design is influenced by the intended application. For example, where several allele-specific probe-target interactions are to be detected in a single assay, e.g., on a single DNA chip, it is desirable to have similar melting temperatures for all of the probes. Accordingly, the lengths of the probes are adjusted so that the melting temperatures for all of the probes on the array are closely similar (it will be appreciated that different lengths for different probes may be needed to achieve a particular Tm where different probes have different GC contents).
Although melting temperature is a primary consideration in probe design, other factors are optionally used to further adjust probe construction.
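One simple way to equalize melting temperatures across an array is to grow each probe outward from its polymorphic nucleotide until an estimated Tm reaches a common target. The sketch below assumes the rough rule quoted earlier in this section (about 3°C per G-C pair and 2°C per A-T pair); a real design would use a more sophisticated Tm model, and the function names and target value are hypothetical.

```python
def rough_tm(probe: str) -> int:
    """Rule-of-thumb Tm: about 3 deg C per G/C and 2 deg C per A/T."""
    p = probe.upper()
    return 3 * (p.count("G") + p.count("C")) + 2 * (p.count("A") + p.count("T"))


def design_probe(template: str, snp_index: int, target_tm: int) -> str:
    """Extend a window around the polymorphic base at `snp_index`,
    alternating right and left, until its rough Tm reaches `target_tm`."""
    start, end = snp_index, snp_index + 1
    extend_right = True
    while rough_tm(template[start:end]) < target_tm:
        can_right = end < len(template)
        can_left = start > 0
        if not (can_right or can_left):
            raise ValueError("template too short to reach the target Tm")
        if (extend_right and can_right) or not can_left:
            end += 1
        else:
            start -= 1
        extend_right = not extend_right
    return template[start:end]


if __name__ == "__main__":
    # An A/T-rich region needs a longer probe than a G/C-rich one
    # to reach the same estimated Tm.
    print(len(design_probe("A" * 40, 20, 40)),
          len(design_probe("G" * 40, 20, 40)))   # 20 14
```

Keeping the polymorphic nucleotide near the middle of the probe, as the alternating extension does, is a common choice because a central mismatch destabilizes the duplex more than a terminal one.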
Chromosome Painting Technologies--In Situ Hybridization
In one aspect, a marker is used as a chromosome probe to cytogenetically detect the presence of a polymorphic nucleic acid or a region linked to the nucleic acid.
This can be especially useful because cytogenetic identification of a chromosomal region provides a way of determining the physical location of the region hybridized by the probe, i. e., in reference to other known markers.
Typically, a probe which hybridizes to a polymorphic nucleotide or a linked nucleic acid is chemically linked to a colorimetric label or fluorophore. The probe is used to paint the chromosome with the color label, thereby identifying regions which are hybridized by the label. Chromosome painting refers to the staining of specific metaphase or prophase chromosomes or regions of chromosomes with probe mixtures, e.g., probes hybridizing to the polymorphic nucleic acids of the invention and, optionally, additional probes hybridizing to additional regions. The painting signal is preferably obtained by fluorescence in situ hybridization (FISH) of such mixtures with the target genome. A variety of staining technologies for the detection of chromosomal differences (typically abnormalities) are known. See Jauch et al., Hum. Genet., 85:145-150 (1990); Wier Chromosoma, 100:371-376 (1991); Van-den-Engh et al., Cytometry 6:92-100 (1988); Kaltoft et al. Arch. Dermatol. Res., 279:293-298 (1987); Sealey et al. Nucleic Acids Res. 13:1905 (1985); Landegent et al. Hum. Genet., 77:366 (1987); Nisson et al., BRL Focus, 13:42 (1991).
Comparative genomic hybridization (CGH) is also a known approach for identifying the presence and localization of sequences in a genome compared to a reference genome. See Kallioniemi et al. (1992) Science 258:818. CGH can provide a quantitative estimate of copy number and also provides information regarding the localization of amplified or deleted sequences in a normal chromosome.
Many in situ detection techniques are known and can be adapted to the present invention, including fluorescent in situ hybridization (FISH), reverse chromosome painting, FISH on DAPI-stained chromosomes, generation of alphoid DNA probes for FISH using PCR, PRINS labeling of DNA, free chromatin mapping, spectral karyotyping and a variety of other techniques described, e.g., in Tijssen (1993) Laboratory Techniques in Biochemistry and Molecular Biology--Hybridization with Nucleic Acid Probes, Parts I and II, Elsevier, New York, and Choo (ed) (1994) Methods in Molecular Biology Volume 33: In Situ Hybridization Protocols, Humana Press Inc., New Jersey (see also other books in the Methods in Molecular Biology series).
These color-labeling strategies are useful for distinguishing the presence or absence of a chromosomal nucleic acid. They are also useful for the detection of multiple probes with multiple labels. In particular, chromosomes are optionally stained with multiple probes, optionally having multiple color labels. In this way, it is possible to quickly provide a genetic map of a sample at the molecular level.
Furthermore, it is possible to determine whether two polymorphic nucleotides from the same locus are present. For example, if two allele-specific probes with different color labels are hybridized to a chromosomal sample under allele-specific hybridization conditions, it is possible to specifically detect both polymorphic nucleotides. For example, where a first probe has a "blue" label and a second probe has a "yellow" label, a sample which is homozygous for the polymorphic nucleotide specifically bound by the first probe will look "blue" to an observer, a sample which is homozygous for the polymorphic nucleotide specifically bound by the second probe will look "yellow" to an observer, while a sample which is heterozygous and binds both probes will appear "green" to an observer. It will be appreciated that many color combinations are possible.
For example, where the first fluorophore emits "blue" light and the second fluorophore emits "yellow" light, the combined effect to the observer is a "green" signal. It will be appreciated that a wide variety of emission characteristics can be monitored; indeed, even when the fluorophores emit non-visible wavelengths of light, a combination color can be assigned to a ratio between any two (or more, e.g., where more than two probes are used in an assay) wavelengths of light.
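Assigning a combination color from the ratio of two emission intensities can be expressed as a small decision rule. In this hypothetical sketch the two channel intensities are compared on a log2 scale; the thresholds (a two-fold ratio and a minimum total signal of 100 arbitrary units) and the function name are illustrative assumptions, not values from this document.

```python
import math


def two_color_call(blue: float, yellow: float, min_total: float = 100.0,
                   log2_threshold: float = 1.0) -> str:
    """Assign a combination 'color' from the ratio of two probe signals.

    blue   -- intensity measured in the probe-1 fluorophore channel
    yellow -- intensity measured in the probe-2 fluorophore channel
    """
    if blue + yellow < min_total:
        return "no call"                     # too little signal to score
    # A pseudocount of 1 avoids division by zero when a channel is empty.
    log_ratio = math.log2((blue + 1.0) / (yellow + 1.0))
    if log_ratio >= log2_threshold:
        return "blue"      # homozygous for the allele bound by probe 1
    if log_ratio <= -log2_threshold:
        return "yellow"    # homozygous for the allele bound by probe 2
    return "green"         # both probes bound: heterozygous


if __name__ == "__main__":
    for b, y in ((1000, 10), (10, 1000), (500, 450), (20, 30)):
        print(b, y, two_color_call(b, y))
```

Working with a ratio rather than absolute intensities makes the call less sensitive to overall differences in labeling efficiency or sample amount between preparations.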
Amplification Detection Strategies
In a preferred embodiment, a polymorphic nucleotide is detected by amplifying the polymorphic nucleotide and detecting the resulting amplicon. Many variations on this strategy are used to detect polymorphic nucleic acids, depending on the materials available and the like.
(1) PCR
In one embodiment, nucleic acid primers which hybridize to regions of a genomic nucleic acid that flank a polymorphic nucleotide to be detected are used in PCR or LCR reactions to generate an amplicon comprising the polymorphic nucleotide. A variety of PCR and LCR strategies are known in the art and are found in Berger, Sambrook, Ausubel, and Innis, all supra. See also Mullis et al. (1987) U.S. Patent No. 4,683,202. In brief, a nucleic acid having a polymorphic nucleic acid to be detected (a genomic DNA, a genomic clone, a genomic amplicon or the like) is hybridized to primers which flank the polymorphic nucleotide to be detected (e.g., nucleotide polymorphisms at a locus such as pA060A, pA077A, pA086A, pA169A, pA280A, pA378A, pASOSA, pASl9A, pAS88A, pA947B, pB032A, pB032B, pB039A, pBLT24A, pBLT6SA, php0226SA, php02301A, php02361A, php02370C, php02387A, php02388A, php02393A, php02396A, php02636A, php03S22A, php0S219A, phpOS233A, php0S278A, phpOS342A, php076S9A, php08S84A, php1210SA, php02340B, phpOS264A, php103SSB, pK069A, pK079A, pK401A, pK4I8A, pK644B, pLOSBA, pL183A, pR04SA, pRlS3A, pT00SA, pTISSA, php02329A, php02371A, pA343A, pA748B, phpOS290A, pA8S8A, php02376A, pG17.3, pB132A, php10078A and/or SOYBPSP, all described supra). Example primers which amplify the polymorphic nucleic acids are provided in the examples section below. The primers are extended in a PCR reaction (typically including a thermostable polymerase enzyme such as Taq, deoxynucleotides, Mg++ and the like; see Ausubel, Innis, Berger or Sambrook for typical PCR conditions). The resulting PCR amplicons comprise the polymorphic nucleic acid to be detected. Exemplary amplicons include pha12105, pha12390, pha12391, pha12392, pha12393, pha12394, pha12395, pha12396, pha10634, pha10623, pha10624, pha10649, pha11135, pha10792, pha10635, pha10638, pha10648, pha10621, pha11071, pha11073, pha10640, pha11076, pha10653, pha10598, pha10615, pha10646, pha10618, pha10620, pha10782, pha11131, pha11132, pha10650, pha10651, pha11138, pha10637, pha11078, pha11079, pha11139, pha10655, pha11701, pha11627, pha10633, pha11074, pha11075, pha10632, pha11628, pha11133, pha10641, pha11136, pha10658, pha10636, pha10783, pha10647, pha08230, pha13070, pha13071, pha13072, pha13073, pha13074, pha13158, pha13560, pha13561, pha14257 and pha14395.
Methods of detecting PCR amplicons are known and can easily be adapted to detecting the amplicons of the invention. Detection is typically performed by running PCR reaction products out on an acrylamide or agarose gel and detecting the size of the reaction products; alternatively, the products can be detected by allele-specific hybridization, by allele-specific hybridization to a polymer array as described supra, or by sequencing the PCR amplicons (using standard Sanger dideoxy or Maxam-Gilbert methods). The polymorphic nucleotides in amplicons are optionally detected by cleaving the amplicon with a restriction enzyme that recognizes the polymorphic nucleic acid, in an adaptation of standard RFLP analysis.
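The restriction-enzyme adaptation of RFLP analysis reduces to predicting the fragment sizes expected for each allele. The sketch below assumes a hypothetical amplicon in which a single-nucleotide change destroys an EcoRI site (GAATTC, cut after the G on the top strand). It scans only the top strand, which is sufficient for palindromic recognition sites such as this one, and the function name is hypothetical.

```python
def digest(amplicon: str, site: str, cut_offset: int) -> list[int]:
    """Return top-strand fragment lengths after cutting `amplicon` at every
    occurrence of `site`, `cut_offset` bases into the recognition sequence."""
    seq, site = amplicon.upper(), site.upper()
    cuts = []
    pos = seq.find(site)
    while pos != -1:
        cuts.append(pos + cut_offset)
        pos = seq.find(site, pos + 1)
    bounds = [0] + cuts + [len(seq)]
    return [right - left for left, right in zip(bounds, bounds[1:])]


if __name__ == "__main__":
    allele_1 = "AAAAGAATTCAAAA"   # hypothetical allele carrying the site
    allele_2 = "AAAAGATTTCAAAA"   # hypothetical allele in which the SNP removes it
    print(digest(allele_1, "GAATTC", 1))   # [5, 9]
    print(digest(allele_2, "GAATTC", 1))   # [14]
```

On a gel, the first allele would appear as two smaller bands and the second as a single uncut band, while a heterozygote would show all three.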
In addition to the example primer nucleic acids in the examples section below, one of skill is easily able to select a variety of other primers which can be used in PCR amplification of nucleic acids comprising or proximal to polymorphic nucleotides.
In particular, methods of amplifying large nucleic acids by PCR are summarized in Cheng et al. (1994) Nature 369:684-685 and the references therein, in which PCR amplicons of up to 40 kb are generated. More typically, standard PCR is used to create amplicons of between about 100 and about 5,000 nucleotides in length, e.g., using the techniques described in Ausubel and Innis, supra. In any case, primers that hybridize to essentially any region of an amplicon made using the primers of the invention are designed by reference to the sequence of the amplicon; the primer sequences are selected to hybridize to regions of the amplicon.
Amplicons are sequenced by any of a variety of protocols. Most DNA sequencing today is carried out by chain termination methods of DNA sequencing. The most popular chain termination methods of DNA sequencing are variants of the dideoxynucleotide-mediated chain termination method of Sanger. See Sanger et al. (1977) Proc. Natl. Acad. Sci. USA 74:5463-5467. For a simple introduction to dideoxy sequencing, see Current Protocols in Molecular Biology, F. M. Ausubel et al., eds., Current Protocols, a joint venture between Greene Publishing Associates, Inc. and John Wiley & Sons, Inc. (Supplement 37, current through 1997) (Ausubel), Chapter 7.
Thousands of laboratories employ dideoxynucleotide chain termination techniques. Commercial kits containing the reagents most typically used for these methods of DNA sequencing are available and widely used.
In addition to the Sanger methods of chain termination, new PCR exonuclease digestion methods are available for DNA sequencing of PCR amplicons.
Direct sequencing of PCR-generated amplicons has been developed in which boronated, nuclease-resistant nucleotides are selectively incorporated into the amplicons during PCR and the amplicons are then digested with a nuclease to produce sized template fragments (Porter et al. (1997) Nucleic Acids Research 25(8):1611-1617). In these methods, reactions on a template are performed in which one of the nucleotide triphosphates in the PCR reaction mixture is partially substituted with a 2'-deoxynucleoside 5'-α-[P-borano]-triphosphate. The boronated nucleotide is stochastically incorporated into PCR products at varying positions along the PCR amplicon. An exonuclease which is blocked by incorporated boronated nucleotides is used to cleave the PCR amplicons. The cleaved amplicons are then separated by size using polyacrylamide gel electrophoresis, providing the sequence of the amplicon. An advantage of this method is that it requires fewer biochemical manipulations for sequencing an amplicon than standard Sanger-style sequencing of PCR amplicons.
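How a sequence is read from boronated-nucleotide digestion ladders can be illustrated with an idealized model. The sketch assumes a 3'-to-5' exonuclease that halts at the first boronated nucleotide it reaches, so every surviving fragment ends at a position occupied by the boronated base, and that enough molecules are present for every such position to appear in the ladder. The function names are hypothetical, and real data would come from band positions on a gel rather than from a known sequence.

```python
def boronated_ladder(amplicon: str, base: str) -> list[int]:
    """Fragment sizes expected when `base` is the partially boronated
    nucleotide: one fragment ending at each position where `base` occurs."""
    return sorted(i + 1 for i, b in enumerate(amplicon.upper()) if b == base)


def read_from_ladders(ladders: dict[str, list[int]]) -> str:
    """Reconstruct a sequence from four ladders (one reaction per base)."""
    base_at = {}
    for base, sizes in ladders.items():
        for size in sizes:
            if size in base_at:
                raise ValueError(f"two bases called at position {size}")
            base_at[size] = base
    length = max(base_at)
    missing = [n for n in range(1, length + 1) if n not in base_at]
    if missing:
        raise ValueError(f"no band at positions {missing}")
    return "".join(base_at[n] for n in range(1, length + 1))


if __name__ == "__main__":
    amplicon = "GATTACA"
    ladders = {b: boronated_ladder(amplicon, b) for b in "ACGT"}
    print(ladders["A"])                 # [2, 5, 7]
    print(read_from_ladders(ladders))   # GATTACA
```

This mirrors how the four lanes of a Sanger gel are read: each lane reports where one base occurs, and merging the lanes by fragment size yields the sequence.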
Once an amplicon is sequenced, the sequence is optionally used to select primers complementary to the amplicon, i.e., primers which will hybridize to the amplicon. It is expected that one of skill is thoroughly familiar with the theory and practice of nucleic acid hybridization and primer selection. Gait, ed. Oligonucleotide Synthesis: A Practical Approach, IRL Press, Oxford (1984); W.H.A. Kuijpers Nucleic Acids Research 18(17), 5197 (1994); K. L. Dueholm J. Org. Chem. 59, 5767- (1994); S. Agrawal (ed.) Methods in Molecular Biology, volume 20; and Tijssen (1993) Laboratory Techniques in Biochemistry and Molecular Biology--Hybridization with Nucleic Acid Probes, e.g., Part I, Chapter 2, "Overview of principles of hybridization and the strategy of nucleic acid probe assays", Elsevier, New York provide a basic guide to nucleic acid hybridization. Innis, supra, provides an overview of primer selection.
One of skill will recognize that the 3' end of an amplification primer is more important for PCR than the 5' end. Investigators have reported PCR products where only a few nucleotides at the 3' end of an amplification primer were complementary to a DNA to be amplified. In this regard, nucleotides at the 5' end of a primer can incorporate structural features unrelated to the target nucleic acid; for instance, in one embodiment, a sequencing primer hybridization site (or a complement to such a primer, depending on the application) is incorporated into the amplification primer, where the sequencing primer is derived from a primer used in a standard sequencing kit, such as one using a biotinylated or dye-labeled universal M13 or SP6 primer. The primers are typically selected so that there is no complementarity between any known target sequence and any constant primer region. One of skill will appreciate that constant regions in primer sequences are optional.
Typically, all primer sequences are selected to hybridize only to a perfectly complementary DNA, with the nearest mismatch hybridization possibility from known DNA sequence typically having at least about 50 to 70% hybridization mismatches, and preferably 100% mismatches for the terminal 5 nucleotides at the 3' end of the primer.
The primers are selected so that no secondary structure forms within the primer. Self-complementary primers have poor hybridization properties, because the complementary portions of the primers self-hybridize (i.e., form hairpin structures).
Primers are selected to have minimal cross-hybridization, thereby preventing competition between individual primers and a template nucleic acid and preventing duplex formation of the primers in solution, and possible concatenation of the primers during PCR. If there is more than one constant region in the primer, the constant regions of the primer are selected so that they do not self-hybridize or form hairpin structures.
One of skill will recognize that there are a variety of possible ways of performing the above selection steps, and that variations on the steps are appropriate.
Most typically, selection steps are performed using simple computer programs to perform the selection as outlined above; however, all of the steps are optionally performed manually. One available computer program for primer selection is the MacVector™ program from Kodak. In addition to programs for primer selection, one of skill can easily design simple programs for any or all of the preferred selection steps.
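Two of the selection checks described above, hairpin formation and 3'-end primer-dimer formation, can be automated with a few lines of code. The sketch below uses exact Watson-Crick matching only; the minimum loop size, the 5-base 3' window and the function names are illustrative assumptions rather than parameters from this document, and dedicated tools also score partial complementarity and duplex free energies.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of an A/C/G/T sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def longest_hairpin_stem(primer: str, min_loop: int = 3) -> int:
    """Length of the longest perfectly paired stem: a k-mer whose reverse
    complement occurs further 3' with at least `min_loop` bases between."""
    p = primer.upper()
    best = 0
    for k in range(2, len(p) // 2 + 1):
        for i in range(len(p) - k + 1):
            if p.find(revcomp(p[i:i + k]), i + k + min_loop) != -1:
                best = k
                break
    return best


def three_prime_dimer(primer_a: str, primer_b: str, window: int = 5) -> bool:
    """True if the 3'-terminal `window` bases of primer_a can pair perfectly
    with some stretch of primer_b (pass primer_a twice for self-dimers)."""
    tail = primer_a.upper()[-window:]
    return revcomp(tail) in primer_b.upper()


if __name__ == "__main__":
    print(longest_hairpin_stem("GCGCAAAAGCGC"))           # 4
    print(three_prime_dimer("ACGTAGGATCC", "GGATCCTTT"))  # True
```

Candidate primers whose stem length or 3' pairing exceeds a chosen limit would be discarded, leaving the 3'-mismatch and cross-hybridization criteria to be checked against known sequences.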
One of skill will recognize that a wide variety of amplicons are provided by the present invention. In particular, amplicons are generated with the primers described herein. The amplicons can be generated by exponential amplification as described in the examples herein, or by linear amplification using a single specific primer, or by using one of the example primers below in conjunction with a set of random primers.
It will be appreciated that the amplicons are characterized by a variety of physicochemical properties, including, but not limited to, the following.
First, the amplicons of the invention are produced in an amplification reaction using the primers as described above, with genomic soybean nucleic acid as a template (or a derivative thereof, such as a cloned or in vitro amplified genomic nucleic acid). Second, single stranded forms of the amplicons (e.g., denatured amplicons) hybridize under stringent conditions to the template nucleic acid. Conditions for specific hybridization of nucleic acids, including amplicon nucleic acids are described above.
A third physicochemical property of amplicons of the invention is that they specifically hybridize to one or more of the primers in the examples section below. In particular, the primers used to make the amplicon will hybridize to the amplicon; indeed, in PCR amplification strategies, hybridization of the primers to the amplicon is usually required for amplification. Additional physicochemical properties of the amplicons are described in the examples section, where example amplicons are described with reference, e.g., to size and hybridization to particular primers.
(2) LCR
In another embodiment, LCR is used to specifically amplify a polymorphic nucleic acid. By detecting the amplification product, the presence of the polymorphic nucleotide is confirmed. Detection is typically performed by running LCR reaction products out on an acrylamide or agarose gel and detecting the size of the reaction products; alternatively, the products can be detected by allele-specific hybridization, by allele-specific hybridization to a polymer array as described supra, or by sequencing the LCR amplicons (using standard Sanger dideoxy or Maxam-Gilbert methods).
Detection techniques such as PCR amplification or other in vitro amplification methods are also used to detect LCR products.
The ligation chain reaction (LCR; sometimes denoted the "ligation amplification reaction" or "LAR") and related techniques are used as diagnostic methods for detecting single nucleotide variations in target nucleic acids. LCR provides a mechanism for linear or exponential amplification of a target nucleic acid via ligation of complementary oligonucleotides hybridized to a target. This amplification is performed to distinguish target nucleic acids that differ by a single nucleotide, providing a powerful tool for the analysis of genetic variation in the present invention, i.e., for distinguishing polymorphic nucleotides.
The principle underlying LCR is straightforward: oligonucleotides which are complementary to adjacent segments of a target nucleic acid are brought into proximity by hybridization to the target, and ligated using a ligase. To achieve linear amplification of the nucleic acid, a single pair of oligonucleotides which hybridize to adjoining areas of the target sequence is employed: the oligonucleotides are ligated, denatured from the template and the reaction is repeated. To achieve exponential amplification of the target nucleic acid, two (or more) pairs of oligonucleotides are used, each pair hybridizing to complementary sequences on, e.g., a double-stranded target polynucleotide. After ligation and denaturation, the target and each of the ligated oligonucleotide pairs serve as templates for hybridization of the complementary oligonucleotides to achieve ligation. The ligase enzyme used in performing LCR is typically thermostable, allowing for repeated denaturation of the template and ligated oligonucleotide complex by heating the ligation reaction.
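The difference between the linear and exponential modes of LCR can be made concrete with an idealized count of ligation products. This sketch assumes 100% hybridization and ligation efficiency in every cycle, which real reactions do not achieve, and the function name is hypothetical.

```python
def lcr_products(targets: int, cycles: int, exponential: bool) -> int:
    """Idealized number of ligated oligonucleotide pairs after `cycles` cycles.

    Linear mode (one oligonucleotide pair): only the original target strands
    act as templates, so each cycle adds `targets` new products.
    Exponential mode (two complementary pairs): every strand present at the
    start of a cycle, target or earlier product, templates one new product,
    so the total strand count doubles each cycle.
    """
    if targets < 0 or cycles < 0:
        raise ValueError("targets and cycles must be non-negative")
    if exponential:
        return targets * (2 ** cycles - 1)
    return targets * cycles


if __name__ == "__main__":
    for n in (5, 10, 20):
        print(n, lcr_products(1, n, False), lcr_products(1, n, True))
```

After 20 idealized cycles, one target strand yields 20 products in linear mode but 1,048,575 in exponential mode, which is why the two-pair format is used when product must be detectable on a gel.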
LCR is useful as a diagnostic tool in the detection of genetic variation.
Using LCR methods, it is possible to distinguish between target polynucleotides which differ by a single nucleotide at the site of ligation. Ligation occurs only between oligonucleotides hybridized to a target polynucleotide where the complementarity between the oligonucleotides and the target is perfect, enabling differentiation between allelic variants of a gene or other chromosomal sequence. The specificity of ligation during LCR can be increased by substituting the more specific NAD+-dependent ligases, such as E. coli ligase and (thermostable) Taq ligase, for the less specific T4 DNA ligase. The use of NAD analogues in the ligation reaction further increases the specificity of the ligation reaction. See U.S. Pat. No. 5,508,179 to Wallace et al.
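The allele discrimination described above can be sketched with a deliberately simple model: two oligonucleotides are scored as ligated only if, placed end to end, they match an adjacent stretch of the template perfectly. In the hypothetical example below, the upstream oligonucleotide ends at its 3' terminus, next to the ligation junction, on the polymorphic nucleotide. The sequences and function name are invented for illustration, and the all-or-nothing rule ignores the residual mismatch ligation that the choice of ligase and NAD analogues is meant to suppress.

```python
def ligates(sense: str, upstream: str, downstream: str) -> bool:
    """Idealized LCR ligation rule.

    `upstream` and `downstream` are written 5'->3' in the same sense as
    `sense` and anneal to its complementary strand. Ligation is scored as
    occurring only when they abut with no gap and every base pairs, i.e.
    when their concatenation occurs verbatim in `sense`.
    """
    return (upstream + downstream).upper() in sense.upper()


if __name__ == "__main__":
    allele_a = "TTGACCTGAAGTCCAT"        # hypothetical allele, A at the SNP
    allele_g = "TTGACCTGGAGTCCAT"        # hypothetical allele, G at the SNP
    probe_a, probe_g = "GACCTGA", "GACCTGG"   # 3' terminal base = SNP base
    common = "AGTCCA"                     # shared downstream oligonucleotide
    for name, template in (("A", allele_a), ("G", allele_g)):
        print(name, ligates(template, probe_a, common),
              ligates(template, probe_g, common))
```

Each template supports ligation of only the upstream oligonucleotide whose 3' base matches its allele, so the presence of product identifies the polymorphic nucleotide.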
Finally, multiple LCR reactions can be run simultaneously in a single reaction, or in parallel reactions for simultaneous detection of any or all of the nucleotide polymorphisms described herein.
(3) TAS, 3SR and Qβ amplification
Nucleotide polymorphisms are also detected using other in vitro detection methods, including TAS, 3SR and Qβ amplification. The transcription-based amplification system (TAS), the self-sustained sequence replication system (3SR) and the Qβ replicase amplification system (Qβ) are reviewed in The Journal Of NIH Research (1991) 3, 81-94. The present invention may be practiced in conjunction with TAS (Kwoh et al. (1989) Proc. Natl. Acad. Sci. USA 86, 1173) or the related 3SR (Guatelli et al. (1990) Proc. Natl. Acad. Sci. USA 87, 1874) for detecting single-base alterations in target nucleic acids by transcribing the target, annealing oligonucleotide primers to the transcript and ligating the annealed primers. Qβ replication (Lomeli et al. (1989) Clin. Chem. 35, 1826) may also be used in conjunction with the ligation methods of the present invention to detect mismatches by performing Qβ amplification on DNA ligated by the methods of the present invention.
Labeling and Detecting Probes
A probe for use in an in situ detection procedure, an in vitro amplification procedure (PCR, LCR, NASBA, etc.), hybridization techniques (allele-specific hybridization, in situ analysis, Southern analysis, northern analysis, etc.) or any other detection procedure herein can be labeled with any composition detectable by spectroscopic, photochemical, biochemical, immunochemical, electrical, optical or chemical means. Useful labels in the present invention include spectral labels such as fluorescent dyes (e.g., fluorescein isothiocyanate, Texas red, rhodamine, digoxigenin, biotin, and the like), radiolabels (e.g., 3H, 125I, 35S, 14C, 32P, 33P, etc.), enzymes (e.g., horseradish peroxidase, alkaline phosphatase, etc.), and spectral colorimetric labels such as colloidal gold or colored glass or plastic (e.g., polystyrene, polypropylene, latex, etc.) beads. The label may be coupled directly or indirectly to a component of the detection assay (e.g., a probe, primer, amplicon, YAC, BAC or the like) according to methods well known in the art. As indicated above, a wide variety of labels may be used, with the choice of label depending on sensitivity required, ease of conjugation with the compound, stability requirements, available instrumentation, and disposal provisions. In general, a detector which monitors a probe-target nucleic acid hybridization is adapted to the particular label which is used. Typical detectors include spectrophotometers, phototubes and photodiodes, microscopes, scintillation counters, cameras, film and the like, as well as combinations thereof. Examples of suitable detectors are widely available from a variety of commercial sources known to persons of skill.
Commonly, an optical image of a substrate comprising a nucleic acid array with a particular set of probes bound to the array is digitized for subsequent computer analysis.
Because incorporation of radiolabeled nucleotides into nucleic acids is straightforward, radiolabeling represents a preferred labeling strategy.
Exemplary technologies for incorporating radiolabels include end-labeling with a kinase or phosphatase enzyme, nick translation, incorporation of radioactive nucleotides with a polymerase, and many other well-known strategies.
Fluorescent labels are also preferred labels, having the advantage of requiring fewer precautions in handling. Preferred labels are typically characterized by one or more of the following: high sensitivity, high stability, low background, low environmental sensitivity and high specificity in labeling. Fluorescent moieties which are incorporated into the labels of the invention are generally known, including Texas red, digoxigenin, biotin, 1- and 2-aminonaphthalene, p,p'-diaminostilbenes, pyrenes, quaternary phenanthridine salts, 9-aminoacridines, p,p'-diaminobenzophenone imines, anthracenes, oxacarbocyanine, merocyanine, 3-aminoequilenin, perylene, bis-benzoxazole, bis-p-oxazolyl benzene, 1,2-benzophenazin, retinol, bis-3-aminopyridinium salts, hellebrigenin, tetracycline, sterophenol, benzimidazolylphenylamine, 2-oxo-3-chromen, indole, xanthen, 7-hydroxycoumarin, phenoxazine, salicylate, strophanthidin, porphyrins, triarylmethanes and flavin.
Individual fluorescent compounds which have functionalities for linking to an element desirably detected in an apparatus or assay of the invention, or which can be modified to incorporate such functionalities, include, e.g., dansyl chloride; fluoresceins such as 3,6-dihydroxy-9-phenylxanthydrol; rhodamine isothiocyanate; N-phenyl 1-amino-8-sulfonatonaphthalene; N-phenyl 2-amino-6-sulfonatonaphthalene; 4-acetamido-4-isothiocyanato-stilbene-2,2'-disulfonic acid; pyrene-3-sulfonic acid; 2-toluidinonaphthalene-6-sulfonate; N-phenyl-N-methyl-2-aminonaphthalene-6-sulfonate; ethidium bromide; atebrine; auramine O; 2-(9'-anthroyl)palmitate; dansyl phosphatidylethanolamine; N,N'-dioctadecyl oxacarbocyanine; N,N'-dihexyl oxacarbocyanine; merocyanine; 4-(3'-pyrenyl)stearate; d-3-aminodesoxy-equilenin; 12-(9'-anthroyl)stearate; 2-methylanthracene; 9-vinylanthracene; 2,2'-(vinylene-p-phenylene)bisbenzoxazole; p-bis(2-(4-methyl-5-phenyl-oxazolyl))benzene; 6-dimethylamino-1,2-benzophenazin; retinol; bis(3'-aminopyridinium) 1,10-decandiyl diiodide; sulfonaphthylhydrazone of hellebrigenin; chlorotetracycline; N-(7-dimethylamino-4-methyl-2-oxo-3-chromenyl)maleimide; N-(p-(2-benzimidazolyl)-phenyl)maleimide; N-(4-fluoranthyl)maleimide; bis(homovanillic acid); resazurin; 4-chloro-7-nitro-2,1,3-benzooxadiazole; merocyanine 540; resorufin; rose bengal; and 2,4-diphenyl-3(2H)-furanone. Many fluorescent tags are commercially available from Sigma Chemical Company (Saint Louis, MO), Molecular Probes, R&D Systems (Minneapolis, MN), Pharmacia LKB Biotechnology (Piscataway, NJ), CLONTECH Laboratories, Inc. (Palo Alto, CA), Chem Genes Corp., Aldrich Chemical Company (Milwaukee, WI), Glen Research, Inc., GIBCO BRL Life Technologies, Inc. (Gaithersburg, MD), Fluka Chemica-Biochemika Analytika (Fluka Chemie AG, Buchs, Switzerland), and Applied Biosystems (Foster City, CA), as well as other commercial sources known to one of skill.
In one embodiment, nucleic acids are labeled by culturing recombinant cells which encode the nucleic acid in a growth medium containing fluorescent or radioactive nucleotide analogues, resulting in the production of fluorescently or radioactively labeled nucleic acids. Similarly, labeled nucleic acids are synthesized in vitro using a primer and a DNA polymerase such as Taq. For example, Hawkins et al., U.S. Pat. No. 5,525,711, describes pteridine nucleotide analogs for use in fluorescent DNA probes, including PCR amplicons.
The label is coupled directly or indirectly to a molecule to be detected (a product, substrate, enzyme, or the like) according to methods well known in the art. As indicated above, a wide variety of labels are used, with the choice of label depending on the sensitivity required, ease of conjugation of the compound, stability requirements, available instrumentation, and disposal provisions. Nonradioactive labels are often attached by indirect means. Generally, a ligand molecule (e.g., biotin) is covalently bound to a nucleic acid such as a probe, primer, amplicon, YAC, BAC or the like. The ligand then binds to an anti-ligand molecule (e.g., streptavidin), which is either inherently detectable or covalently bound to a signal system, such as a detectable enzyme, a fluorescent compound, or a chemiluminescent compound. A number of ligands and anti-ligands can be used. Where a ligand has a natural anti-ligand (for example, biotin, thyroxine, and cortisol), it can be used in conjunction with labeled anti-ligands.
Alternatively, any haptenic or antigenic compound can be used in combination with an antibody. Labels can also be conjugated directly to signal generating compounds, e.g., by conjugation with an enzyme or fluorophore or chromophore. Enzymes of interest as labels will primarily be hydrolases, particularly phosphatases, esterases and glycosidases, or oxidoreductases, particularly peroxidases. Fluorescent compounds include fluorescein and its derivatives, rhodamine and its derivatives, dansyl, umbelliferone, etc.
Chemiluminescent compounds include luciferin and 2,3-dihydrophthalazinediones, e.g., luminol. Means of detecting labels are well known to those of skill in the art. Thus, for example, where the label is a radioactive label, means for detection include a scintillation counter or photographic film as in autoradiography.
Making Transgenic Plants With Nucleic Acids Linked to Selected Loci
Nucleic acids which are genetically linked to the loci described herein are optionally cloned and transduced into cells, especially to make transgenic plants. In particular, nucleic acids linked to the loci pA060A, pA077A, pA086A, pA169A, pA280A, pA378A, pA505A, pA519A, pA588A, pA947B, pB032A, pB032B, pB039A, pBLT24A, pBLT65A, php02265A, php02301A, php02361A, php02370C, php02387A, php02388A, php02393A, php02396A, php02636A, php03522A, php05219A, php05233A, php05278A, php05342A, php07659A, php08584A, php12105A, php02340B, php05264A, php10355B, pK069A, pK079A, pK401A, pK418A, pK644B, pL058A, pL183A, pR045A, pR153A, pT005A, pT155A, php02329A, php02371A, pA343A, pA748B, php05290A, pA858A, php02376A, pG17.3, pB132A, php10078A or SOYBPSP are cloned and transduced into plants. The cloned sequences are useful as molecular tags for selected plant strains, and are further useful for encoding polypeptides.
Often, these polypeptides are encoded by a QTL and are responsible for the phenotypic effects of the QTL.
The nucleic acids linked to a selected locus or selected loci are introduced into plant cells, either in culture or in organs of a plant, e.g., leaves, stems, fruit, seed, etc. Natural or synthetic nucleic acids encoded by sequences linked to the polymorphic nucleic acids can be expressed by operably linking a nucleic acid of interest to a promoter, incorporating the construct into an expression vector, and introducing the vector into a suitable host cell. Alternatively, an endogenous promoter linked to the nucleic acids can be used.
Typical vectors contain transcription and translation terminators, transcription and translation initiation sequences, and promoters useful for regulation of the expression of the particular nucleic acid. The vectors optionally comprise generic expression cassettes containing at least one independent terminator sequence, sequences permitting replication of the cassette in eukaryotes, prokaryotes, or both (e.g., shuttle vectors) and selection markers for both prokaryotic and eukaryotic systems.
Vectors are suitable for replication and integration in prokaryotes, eukaryotes, or preferably both.
See Gillam & Smith, Gene 8:81 (1979); Roberts, et al., Nature 328:731 (1987); Schneider, B., et al., Protein Expr. Purif. 6435:10 (1995); Berger, Sambrook, Ausubel (all supra).
Cloning of QTL Linked Sequences into Bacterial Hosts
There are several well-known methods of introducing nucleic acids into bacterial cells, any of which may be used in the present invention. These include fusion of the recipient cells with bacterial protoplasts containing the DNA, electroporation, projectile bombardment, infection with viral vectors, etc. Bacterial cells are often used to amplify (increase the number of) plasmids containing DNA constructs of this invention. The bacteria are grown to log phase, and the plasmids within the bacteria can be isolated by a variety of methods known in the art (see, for instance, Sambrook). In addition, a plethora of kits are commercially available for the purification of plasmids from bacteria; for their proper use, follow the manufacturer's instructions (see, for example, EasyPrep™ and FlexiPrep™, both from Pharmacia Biotech; StrataClean™, from Stratagene; and QIAexpress Expression System, Qiagen). The isolated and purified plasmids are then further manipulated to produce other plasmids, used to transfect plant cells, or incorporated into Agrobacterium tumefaciens to infect plants.
Nucleic acids can be delivered in vitro to any bacterial host cell grown in culture. When carried out in vitro, contact between the cells and the genetically engineered nucleic acid constructs takes place in a biologically compatible medium.
The concentration of nucleic acid varies widely depending on the particular application, but is generally between about 1 µM and about 10 mM. Treatment of the cells with the nucleic acid is generally carried out at physiological temperatures (about 37 °C) for about 1 to 48 hours.
Alternatively, a nucleic acid operably linked to a promoter to form a fusion gene is expressed in bacteria such as E. coli and its gene product isolated and purified.
Transfecting Plant Cells
To use isolated sequences in the above techniques, recombinant DNA vectors suitable for transformation of plant cells are prepared. Techniques for transforming a wide variety of higher plant species are well known and described in the technical and scientific literature. See, for example, Weising, et al., Ann. Rev. Genet. 22:421-477 (1988). A DNA sequence coding for the desired mRNA, polypeptide, or non-expressed tagging sequence is transduced into the plant. Where the sequence is expressed, it is optionally combined with transcriptional and translational initiation regulatory sequences which will direct its transcription in the intended tissues of the transformed plant.
Promoters in nucleic acids linked to the above loci are identified, e.g., by analyzing the 5' sequences upstream of a coding sequence in linkage disequilibrium with the loci. Optionally, such nucleic acids will be associated with a QTL.
Sequences characteristic of promoter sequences can be used to identify the promoter.
Sequences controlling eukaryotic gene expression have been extensively studied. For instance, promoter sequence elements include the TATA box consensus sequence (TATAAT), which is usually 20 to 30 base pairs upstream of a transcription start site.
In most instances the TATA box aids in accurate transcription initiation. In plants, further upstream from the TATA box, at positions -80 to -100, there is typically a promoter element with a series of adenines surrounding the trinucleotide G (or T) N G.
See, e.g., J. Messing, et al., in GENETIC ENGINEERING IN PLANTS, pp. 221-227 (Kosuge, Meredith and Hollaender, eds. (1983)).
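The positional rules above lend themselves to a simple computational screen of 5' upstream sequence. The sketch below is illustrative only and is not a method described in this application. It assumes an upstream sequence string that ends immediately before the transcription start site, uses the TATAAT consensus exactly as stated above (another consensus, such as TATAAA, can be passed in), and reports exact matches beginning 20 to 30 bp upstream. It also flags G (or T) N G trinucleotides at positions -80 to -100 but does not model the surrounding adenines. All function names are hypothetical.

```python
import re


def motif_distances(upstream: str, pattern: str, min_dist: int, max_dist: int) -> list[int]:
    """Distances (bp upstream of the TSS) of regex matches inside a positional window.

    Assumes `upstream` ends immediately before the transcription start site, so a
    match starting at index i begins len(upstream) - i bp upstream (position -dist).
    The zero-width lookahead lets overlapping matches be reported.
    """
    seq = upstream.upper()
    n = len(seq)
    hits = []
    for m in re.finditer(f"(?={pattern})", seq):
        dist = n - m.start()
        if min_dist <= dist <= max_dist:
            hits.append(dist)
    return hits


def tata_candidates(upstream: str, consensus: str = "TATAAT") -> list[int]:
    # Consensus taken from the text; matched exactly, 20-30 bp upstream of the TSS.
    return motif_distances(upstream, re.escape(consensus.upper()), 20, 30)


def gng_candidates(upstream: str) -> list[int]:
    # Trinucleotide G (or T) N G, searched at positions -80 to -100.
    return motif_distances(upstream, "[GT][ACGT]G", 80, 100)


if __name__ == "__main__":
    demo = "G" * 10 + "TATAAT" + "C" * 19  # hypothetical 35-bp upstream region
    print(tata_candidates(demo))  # the TATAAT match begins 25 bp upstream
```

A real promoter analysis would use position weight matrices and experimental mapping of the start site (see the methods cited below); exact string matching is used here only to make the positional logic explicit.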
A number of methods are known to those of skill in the art for identifying and characterizing promoter regions in plant genomic DNA. See, e.g., Jordano, et al., Plant Cell 1:855-866 (1989); Bustos, et al., Plant Cell 1:839-854 (1989);
Green, et al., EMBO J. 7:4035-4044 (1988); Meier, et al., Plant Cell 3:309-316 (1991); and Zhang, et al., Plant Physiology 110:1069-1079 (1996).
In construction of recombinant expression cassettes of the invention, a plant promoter fragment is optionally employed which directs expression of the gene in all tissues of a regenerated plant. Such promoters are referred to herein as "constitutive"
promoters and are active under most environmental conditions and states of development or cell differentiation. Examples of constitutive promoters include the cauliflower mosaic virus (CaMV) 35S transcription initiation region, the 1'- or 2'-promoter derived from T-DNA of Agrobacterium tumefaciens, and other transcription initiation regions from various plant genes known to those of skill.
Alternatively, the plant promoter may direct expression of the polynucleotide of the invention in a specific tissue (tissue-specific promoters) or may be otherwise under more precise environmental control (inducible promoters).
Examples of tissue-specific promoters under developmental control include promoters that initiate transcription only in certain tissues, such as fruit, seeds, or flowers.
If polypeptide expression is desired, a polyadenylation region at the 3'-end of the coding region is typically included. The polyadenylation region can be derived from the natural gene, from a variety of other plant genes, or from T-DNA.
The vector comprising the sequences (e.g., promoters or coding regions) from genes of the invention will typically comprise a marker gene which confers a selectable phenotype on plant cells. For example, the marker can encode biocide resistance, particularly antibiotic resistance, such as resistance to kanamycin, G418, bleomycin or hygromycin, or herbicide resistance, such as resistance to chlorsulfuron or Basta.
Introduction of the Nucleic Acids into Plant Cells
The DNA constructs of the invention are introduced into plant cells, either in culture or in the organs of a plant, by a variety of conventional techniques. For example, the DNA construct can be introduced directly into the genomic DNA of the plant cell using techniques such as electroporation and microinjection of plant cell protoplasts, or the DNA constructs can be introduced directly to plant cells using ballistic methods, such as DNA particle bombardment. Alternatively, the DNA constructs are combined with suitable T-DNA flanking regions and introduced into a conventional Agrobacterium tumefaciens host vector. The virulence functions of the Agrobacterium tumefaciens host direct the insertion of the construct and adjacent markers into the plant cell DNA when the cell is infected by the bacteria.
Microinjection techniques are known in the art and well described in the scientific and patent literature. The introduction of DNA constructs using polyethylene glycol precipitation is described in Paszkowski, et al., EMBO J. 3:2717 (1984).
Electroporation techniques are described in Fromm, et al., Proc. Nat'l. Acad. Sci. USA 82:5824 (1985). Ballistic transformation techniques are described in Klein, et al., Nature 327:70-73 (1987).
Agrobacterium tumefaciens-mediated transformation techniques, including disarming and use of binary vectors, are also well described in the scientific literature.
See, for example, Horsch, et al., Science 223:496-498 (1984), and Fraley, et al., Proc. Nat'l. Acad. Sci. USA 80:4803 (1983). Agrobacterium-mediated transformation is a preferred method of transformation of dicots.
Generation of Transgenic Plants
Transformed plant cells which are derived by any of the above transformation techniques can be cultured to regenerate a whole plant which possesses the transformed genotype and thus the desired phenotype. Such regeneration techniques rely on manipulation of certain phytohormones in a tissue culture growth medium, typically relying on a biocide and/or herbicide marker which has been introduced together with the desired nucleotide sequences. Plant regeneration from cultured protoplasts is described in Evans, et al., Protoplasts Isolation and Culture, Handbook of Plant Cell Culture, pp. 124-176, Macmillan Publishing Company, New York (1983); and Binding, REGENERATION OF PLANTS, PLANT PROTOPLASTS, pp. 21-73, CRC Press, Boca Raton (1985). Regeneration can also be obtained from plant callus, explants, somatic embryos (Dandekar, et al., J. Tissue Cult. Meth. 12:145 (1989); McGranahan, et al., Plant Cell Rep. 8:512 (1990)), organs, or parts thereof. Such regeneration techniques are described generally in Klee, et al., Ann. Rev. of Plant Phys. 38:467-486 (1987).
One of skill will recognize that after the expression cassette is stably incorporated in transgenic plants and confirmed to be operable, it can be introduced into other plants by sexual crossing. Any of a number of standard breeding techniques can be used, depending upon the species to be crossed.
Discussion of the Accompanying Sequence Listing
The accompanying sequence listing provides complete or partial sequences for a number of amplicons comprising the marker loci herein and for various primers and probe sequences useful in allele-specific hybridization, PCR and the like. The information is presented as DNA sequences. One of skill will readily understand that each sequence also fully describes the complementary strand of the provided DNA, i.e., by using standard base-pairing rules, the sequence of the complementary DNA is provided and can be written out by any competent practitioner in the art. RNAs having the same sequence are provided by replacing "T" residues with "U" residues, and RNAs corresponding to the complementary strand are similarly provided. A variety of conservatively modified variations of the sequences are also fully provided.
For example, coding regions denoted by open reading frames, beginning with the start codon "ATG" coding for methionine and optionally ending with a stop codon (TAA, TAG or TGA), can generally be modified by substituting codons which equivalently code for the same amino acid. The genetic code is well known and is found in essentially all modern textbooks on molecular biology. See, e.g., Lewin (1995) Genes V, Oxford University Press Inc., NY (Lewin); and Watson et al. (1992) Recombinant DNA, Second Edition, Scientific American Books, NY. Accordingly, although the given sequences are preferred because of their hybridization characteristics (i.e., because the given sequences hybridize to genomic soybean DNA at polymorphic loci), one of skill will recognize that, for coding purposes, any coding sequence can be equivalently represented by any sequence having equivalent codons, and that recitation of a single sequence provides all of these coding sequences. In the interest of not providing clearly redundant information, all possible coding sequences are not written out separately. However, one of skill can easily do so with the information provided, by simple reference to the genetic code. Simple computer programs can also be used to list any or all such nucleic acids, given the provided sequence. For example, coding regions where the nucleotides TTT (coding for phenylalanine) appear are optionally substituted with TTC, and vice versa. The codons TTA, TTG, CTT, CTC, CTA and CTG (all coding for leucine) are optionally substituted for one another, in any combination. The codons ATT, ATC and ATA (all coding for isoleucine) are optionally substituted for one another. The codons GTT, GTC, GTA and GTG (all coding for valine) are optionally substituted for one another. The codons TCT, TCC, TCA, TCG, AGT and AGC (all coding for serine) are optionally substituted for one another. The codons CCT, CCC, CCA and CCG (all coding for proline) are optionally substituted for one another. The codons ACT, ACC, ACA and ACG (all coding for threonine) are optionally substituted for one another. The codons GCT, GCC, GCA and GCG (all coding for alanine) are optionally substituted for one another. The codons TAT and TAC (both coding for tyrosine) are optionally substituted for one another. The stop codons TAA, TAG and TGA are optionally substituted for one another. The codons CAT and CAC (both coding for histidine) are optionally substituted for one another. The codons CAA and CAG (both coding for glutamine) are optionally substituted for one another. The codons AAT and AAC (both coding for asparagine) are optionally substituted for one another. The codons AAA and AAG (both coding for lysine) are optionally substituted for one another. The codons GAT and GAC (both coding for aspartic acid) are optionally substituted for one another. The codons GAA and GAG (both coding for glutamic acid) are optionally substituted for one another. The codons TGT and TGC (both coding for cysteine) are optionally substituted for one another. The codons CGT, CGC, CGA, CGG, AGA and AGG (all coding for arginine) are optionally substituted for one another. The codons GGT, GGC, GGA and GGG (all coding for glycine) are optionally substituted for one another.
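The "simple computer programs" mentioned above can be sketched in a few lines. The example below is a minimal illustration, not software described in this application. It assumes the standard nuclear genetic code, an unambiguous A/C/G/T reading frame whose length is a multiple of three, and treats the three stop codons as interchangeable, as the text does. The function names are hypothetical.

```python
from itertools import product
from math import prod

# Standard genetic code; codons enumerated in TCAG order (first, second, third base).
BASES = "TCAG"
AMINO_ACIDS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TO_AA = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

# Invert the table: amino acid (or '*' for stop) -> all synonymous codons.
AA_TO_CODONS: dict[str, list[str]] = {}
for codon, aa in CODON_TO_AA.items():
    AA_TO_CODONS.setdefault(aa, []).append(codon)


def codons(orf: str) -> list[str]:
    """Split a reading frame into codons; reject frames not divisible by three."""
    orf = orf.upper()
    if len(orf) % 3:
        raise ValueError("reading frame length must be a multiple of 3")
    return [orf[i:i + 3] for i in range(0, len(orf), 3)]


def translate(orf: str) -> str:
    return "".join(CODON_TO_AA[c] for c in codons(orf))


def count_synonymous_variants(orf: str) -> int:
    # Product of the number of synonymous codons at each position.
    return prod(len(AA_TO_CODONS[CODON_TO_AA[c]]) for c in codons(orf))


def synonymous_variants(orf: str):
    """Yield every sequence that encodes the same protein as `orf`."""
    choices = [AA_TO_CODONS[CODON_TO_AA[c]] for c in codons(orf)]
    for combo in product(*choices):
        yield "".join(combo)
```

For example, `list(synonymous_variants("ATGTTT"))` yields `["ATGTTT", "ATGTTC"]`, and `count_synonymous_variants("CTGGGA")` is 24 (six leucine codons times four glycine codons). Because the count grows multiplicatively with length, it is prudent to call `count_synonymous_variants` before enumerating a long open reading frame.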
Additional conservative substitutions are also provided by the given sequences. With respect to particular nucleic acid sequences, conservatively modified variants are those nucleic acids which encode identical or essentially identical amino acid sequences, or where the nucleic acid does not encode an amino acid sequence, to essentially identical sequences.
As to amino acid sequences, one of skill will recognize that individual substitutions, deletions or additions to a nucleic acid, peptide, polypeptide, or protein sequence which alters, adds or deletes a single amino acid or a small percentage of amino acids in the encoded sequence is a "conservatively modified variant"
where the alteration results in the substitution of an amino acid with a chemically similar amino acid. Conservative substitution tables providing functionally similar amino acids are well known in the art. The following six groups each contain amino acids that are conservative substitutions for one another: 1) Alanine (A), Serine (S), Threonine (T); 2) Aspartic acid (D), Glutamic acid (E); 3) Asparagine (N), Glutamine (Q); 4) Arginine (R), Lysine (K); 5) Isoleucine (I), Leucine (L), Methionine (M), Valine (V);
and 6) Phenylalanine (F), Tyrosine (Y), and Tryptophan (W).
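The six groups above can be encoded directly to screen amino acid variants. This is a minimal sketch that assumes one-letter amino acid codes and sequences of equal length (substitutions only; insertions or deletions would require an alignment first). It uses exactly the grouping given in the text rather than any scoring matrix, and the function names are hypothetical.

```python
# The six conservative-substitution groups listed above (one-letter codes).
CONSERVATIVE_GROUPS = [set("AST"), set("DE"), set("NQ"), set("RK"), set("ILMV"), set("FYW")]


def is_conservative(a: str, b: str) -> bool:
    """True if residues a and b are identical or fall in the same group."""
    a, b = a.upper(), b.upper()
    return a == b or any(a in g and b in g for g in CONSERVATIVE_GROUPS)


def classify_substitutions(ref: str, variant: str) -> list[tuple[int, str, str, bool]]:
    """List (1-based position, ref residue, variant residue, conservative?) per change."""
    if len(ref) != len(variant):
        raise ValueError("sequences must be the same length (substitutions only)")
    return [(i + 1, r, v, is_conservative(r, v))
            for i, (r, v) in enumerate(zip(ref.upper(), variant.upper())) if r != v]
```

For instance, comparing `"MADKL"` with `"MSEKW"` reports A→S and D→E as conservative and L→W as non-conservative.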
Regarding nucleic acids, the amplicons and probes in the sequence listing are typically used in hybridization experiments, e.g., for marker-assisted selection.
Accordingly, amplicons and probes which are substantially similar or identical, which can be used in the methods herein (e.g., intra-specific alleles, genetically engineered nucleic acids made, e.g., by modification of the provided sequences, and the like) are provided by the accompanying sequences. In particular, bases may be added, deleted or changed without substantially altering the hybridization properties of the nucleic acid.
For example, bases which do not hybridize to a given probe can be modified without altering the hybridization properties of the probe to the given sequence.
Modifications in such non-hybridizing regions (e.g., flanking regions) result in a nucleic acid which is essentially the same as the written sequence (i.e., it has the same desired physicochemical properties and hybridizes to the same probe). For example, it will be appreciated that the amplicons are optionally larger than probes which hybridize to them.
Accordingly, the regions which are not involved in hybridization are not essential for hybridization to a probe. Thus, where the nucleotide to be detected is a polymorphic nucleotide, the regions of an amplicon flanking the polymorphic nucleotide which do not hybridize to a probe (e.g., an allele-specific probe, as described, supra) are not critical for hybridization to the probe. One of skill will recognize that these regions are optionally modified.
One of skill will further recognize that the sequences in the sequence listing are optionally part of larger sequences, e.g., the nucleic acids can be cloned into vectors known in the art. See, Sambrook, Ausubel, Berger and Innis, all supra.
Furthermore, subsequences of the given sequences are easily constructed by synthetically or recombinantly joining nucleotides to yield the subsequence.
Typical subsequences are at least about 10 nucleotides, often at least about 20 nucleotides, frequently at least about 30 nucleotides, and optionally any length, e.g., 40, 50, 60, 70, 80, 90, 100, 150, 200, 250, 300, 350, 400, 450, 500, 550, 600, 650, 700, 750, 800, 850, 900 or the like, and can, of course, be full-length. Similarly, a subsequence can include, e.g., 5%, 10%, 15%, 20%, 25%, 30%, 35%, 40%, 45%, 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95% or 100% of a particular full-length nucleic acid, or any percentage between those listed.
The subsequences are characterized by the ability to specifically hybridize to the complement of the full-length sequence, and by sequence identity with a sequence in the sequence listing over a selected comparison window. A "comparison window", as used herein, includes reference to a segment of any one of the number of contiguous positions selected from the group consisting of from 10 to 1000, usually about 20 to about 200, more usually about 100 to about 150, in which a sequence may be compared to a reference sequence of the same number of contiguous positions after the two sequences are optimally aligned. Methods of alignment of sequences for comparison are well known in the art. Optimal alignment of sequences for comparison may be conducted by the local homology algorithm of Smith & Waterman, Adv. Appl. Math. 2:482 (1981); by the homology alignment algorithm of Needleman & Wunsch, J. Mol. Biol. 48:443 (1970); by the search for similarity method of Pearson & Lipman, Proc. Natl. Acad. Sci. USA 85:2444 (1988); or by computerized implementations of these algorithms (including, but not limited to, CLUSTAL in the PC/Gene program by Intelligenetics, Mountain View, California, and GAP, BESTFIT, BLAST, FASTA, and TFASTA in the Wisconsin Genetics Software Package, Genetics Computer Group (GCG), 575 Science Dr., Madison, Wisconsin, USA). The CLUSTAL program is well described by Higgins & Sharp, Gene 73:237-244 (1988); Higgins & Sharp, CABIOS 5:151-153 (1989); Corpet et al., Nucleic Acids Research 16:10881-10890 (1988); Huang et al., Computer Applications in the Biosciences 8:155-165 (1992); and Pearson et al., Methods in Molecular Biology 24:307-331 (1994). Alignment is also often performed by inspection and manual alignment. One of skill can easily select a variety of nucleic acids which are identical to the given nucleic acids over any selected comparison window size.
It will be recognized that where the window includes a nucleotide region to be detected (e.g., a locus), the remainder of the nucleic acid may be deleted or modified.
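To make the comparison-window idea concrete, the sketch below implements a textbook Needleman-Wunsch global alignment with a linear gap penalty, then computes percent identity over the aligned window. It is an illustrative stand-in, not the GAP, BESTFIT or BLAST implementations cited above. The scoring values (match +1, mismatch -1, gap -2) are arbitrary assumptions, and counting gap columns in the identity denominator is only one of several conventions in use. The function names are hypothetical.

```python
def needleman_wunsch(a: str, b: str, match: int = 1, mismatch: int = -1,
                     gap: int = -2) -> tuple[str, str, int]:
    """Global alignment with a linear gap penalty; returns (aligned_a, aligned_b, score)."""
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap

    def sub(x: str, y: str) -> int:
        return match if x == y else mismatch

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]),
                              score[i - 1][j] + gap,   # gap in b
                              score[i][j - 1] + gap)   # gap in a

    # Trace back from the bottom-right cell, preferring diagonal moves on ties.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b)), score[n][m]


def percent_identity(aln_a: str, aln_b: str) -> float:
    """Identical, ungapped columns divided by all aligned columns (gaps included)."""
    if not aln_a:
        return 0.0
    same = sum(x == y and x != "-" for x, y in zip(aln_a, aln_b))
    return 100.0 * same / len(aln_a)
```

For example, `needleman_wunsch("ACGTT", "ACGT")` returns `("ACGTT", "ACG-T", 2)`, and `percent_identity` on that pair gives 80.0. Production tools add affine gap penalties and substitution matrices, so their scores will differ from this sketch.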
One of skill will also understand that sequencing technology is imperfect.
Typical error rates for DNA sequencing are on the order of 1-5%, depending on the particular technology used for sequencing. Accordingly, one of skill will recognize that the amplicons of the sequence listing are most preferably obtained by amplifying a genomic nucleic acid (e.g., a genomic clone from a library or genomic nucleic acid isolated from a plant) using the primers described for the particular amplicon. However, the amplicons are also obtained by other means, including synthetic creation of a nucleic acid having the indicated sequence, or any other method described herein.
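As a rough illustration of why amplified material is preferred over a written sequence alone, the arithmetic below assumes that sequencing errors occur independently at a uniform per-base rate p (a simplification, since real error profiles vary along a read). Under that assumption an L-base read carries L·p expected errors and is error-free with probability (1 - p)^L. The 300-bp read length is a hypothetical example, and the function names are illustrative.

```python
def expected_errors(length: int, error_rate: float) -> float:
    """Expected number of miscalled bases, assuming independent per-base errors."""
    return length * error_rate


def prob_error_free(length: int, error_rate: float) -> float:
    """Probability that no base is miscalled, under the same independence assumption."""
    return (1.0 - error_rate) ** length


if __name__ == "__main__":
    for rate in (0.01, 0.05):  # the 1-5% range cited in the text
        print(f"{rate:.0%}: {expected_errors(300, rate):.1f} expected errors, "
              f"P(error-free) = {prob_error_free(300, rate):.3g}")
```

At a 1% per-base error rate, a 300-base read is expected to carry about three errors and has only about a 5% chance of being error-free, which is consistent with the preference stated above for obtaining amplicons by amplification with the listed primers.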
Modifying Nucleic Acids in the Sequence Listing
One of skill will appreciate that many conservative variations of the nucleic acids in the sequence listing can be made using common techniques. For example, due to the degeneracy of the genetic code, "silent substitutions" (i.e., substitutions of a nucleic acid sequence which do not result in an alteration in an encoded polypeptide) are an implied feature of every nucleic acid sequence which encodes an amino acid. Similarly, "conservative amino acid substitutions," in which one or a few amino acids in an amino acid sequence of a packaging or packageable construct are substituted with different amino acids with highly similar properties (see above), are also readily identified as being highly similar to a disclosed construct. Such conservatively substituted variations of each explicitly disclosed sequence are a feature of the present invention. Substantially identical nucleic acids such as allelic variants or recombinantly engineered nucleic acids comprising all or a portion of a given sequence are also made using these standard techniques.
One of skill will recognize many ways of generating alterations in a given nucleic acid construct. Such well-known methods include site-directed mutagenesis, PCR amplification using degenerate oligonucleotides, exposure of cells containing the nucleic acid to mutagenic agents or radiation, chemical synthesis of a desired oligonucleotide (e.g., in conjunction with ligation and/or cloning to generate large nucleic acids) and other well-known techniques. See Gillam and Smith (1979) Gene 8:81-97, Roberts et al. (1987) Nature 328:731-734 and Sambrook, Innis, Ausubel, Berger, Needham VanDevanter and Mullis (all supra). See also Kilbey et al. (eds) (1984) Handbook of Mutagenicity Test Procedures, Second Edition, Elsevier, New York.
One of skill can select a desired nucleic acid of the invention based upon the sequences provided and upon knowledge in the art regarding retroviruses generally.
The specific effects of many mutations on hybridization, for instance, can be determined with virtual certainty, even in the absence of experimental information on a hybridization. Moreover, general knowledge regarding the nature of proteins and nucleic acids allows one of skill to select appropriate sequences with activity similar or equivalent to the nucleic acids in the sequence listings herein. Finally, most modifications to nucleic acids are evaluated by routine screening techniques in suitable assays for the desired characteristic. For instance, changes in the immunological character of encoded polypeptides can be detected by an appropriate immunological assay. Modifications of other properties such as nucleic acid hybridization to a complementary nucleic acid, redox or thermal stability of encoded proteins, hydrophobicity, susceptibility to proteolysis, or the tendency to aggregate are all assayed according to standard techniques, all of which are well suited to high-throughput selection.
EXAMPLES
The following examples are offered by way of illustration, and are not intended to be limiting. One of skill will immediately recognize a variety of alternate procedures, compositions, reagents and the like which can be substituted for those exemplified below.
Example 1: Identification of Marker Loci
Marker loci must have multiple alleles to be used for genetic studies, genetic selection, and genetic identification within a species. We compared the DNA sequences among different soybean varieties and found new sequence polymorphisms at 63 loci throughout the soybean genome. The alleles for these polymorphisms are described in this example using allele-specific hybridization (ASH) as an example marker technology. We designed locus-specific oligonucleotide primers to amplify each locus by PCR, and designed allele-specific oligonucleotide probes to hybridize with and distinguish each allele. These polymorphisms provide new opportunities to develop genetic-marker technology and to expand genetic-marker applications for the improvement of soybean.
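Allele-specific hybridization is a wet-lab assay, but the logic of reading out an ASH marker can be mimicked in silico. The sketch below is a simplification offered for illustration only. It treats a probe as detecting an allele when the probe sequence, or its reverse complement, occurs exactly within the amplicon, and it ignores hybridization stringency, partial matches and probe-length effects. The probe names and sequences are hypothetical and are not taken from the sequence listing.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Complementary strand written 5' to 3' (A/C/G/T only)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def call_alleles(amplicon: str, probes: dict[str, str]) -> list[str]:
    """Names of probes that match the amplicon exactly on either strand."""
    plus = amplicon.upper()
    minus = reverse_complement(plus)
    return sorted(name for name, probe in probes.items()
                  if probe.upper() in plus or probe.upper() in minus)


# Hypothetical probe pair for a single A/G polymorphism (second base of each probe).
PROBES = {"allele_A": "GATTACA", "allele_G": "GGTTACA"}
```

With these hypothetical probes, `call_alleles("CCGATTACATT", PROBES)` returns `["allele_A"]` and `call_alleles("CCGGTTACATT", PROBES)` returns `["allele_G"]`. A heterozygous plant would be represented by two amplicon sequences, one detected by each probe.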
A. Materials and Methods
Identification of Sequence Polymorphisms
We selected soybean markers for conversion to ASH based on their genetic map locations. The objective was to have ASH markers well distributed around the soybean genome to maximize their general utility as markers, and to have ASH markers near genes of interest for marker-assisted selection and positional cloning of genetically linked genes or other nucleic acids of interest. Map locations of RFLP markers were identified on Pioneer Hi-Bred International proprietary soybean marker maps and on the USDA/ISU public soybean marker map (Shoemaker RC, Olson TC (1993) Glycine max L. Merr. pp. 6.131-6.138. In O'Brien SJ (ed.) Genetic Maps: Locus Maps of Complex Genomes. Cold Spring Harbor Laboratory Press, New York). The linkage groups on the Pioneer Hi-Bred maps were named according to the USDA/ISU public map by cross-referencing markers in common between the maps.
Some marker loci selected for ASH development were homologous to cloned genomic soybean DNA. For these loci, fragments of genomic DNA were ligated into the PstI restriction site of the commonly available pBS+ vector, transformed into E. coli strain DH5α using established protocols (Keim P and Shoemaker RC (1988) Soybean Genet Newsl 15:147-148), and mapped as RFLP markers prior to selection for use as ASH probes. For a few other loci, genomic DNA fragments were ligated into the Lambda ZAP II vector, packaged, plated on E. coli strain XL1-Blue MRF', selected based on homology to a DNA probe, and excised as pBluescript SK(-) phagemids in E. coli strain SOLR™ using ExAssist™ helper phage (Stratagene, La Jolla, California 92037 U.S.A.).
Alternatively, selected DNA fragments were ligated into the pCR vector and transformed into E. coli according to the TA Cloning Kit V2.1 (Invitrogen, San Diego, California 92121 U.S.A.), or cloned into an LIC vector (PharMingen, San Diego, California 92121 U.S.A.) and transformed into E. coli strain DH5α.
Plasmid DNA was isolated using the Magic or Wizard Miniprep systems (Promega, Madison, Wisconsin 53711 U.S.A.) or Nucleobond AX kits (Macherey-Nagel, P.O. Box 10 13 52, D-52313 Düren, Germany). Minipreps were precipitated in 0.1 vol 7.5 M NH4Ac and 2.0 vol EtOH at -20 °C for 20 min, spun in a microcentrifuge for 20 min, washed in 70% EtOH, dried, and dissolved in 10 mM Tris/HCl pH 8.5. Insert DNA was sequenced using the Taq DyeDeoxy terminator cycle sequencing reaction and a Perkin Elmer ABI 373 or 377 DNA sequencer. Sequencing primers were designed from the vector.
DNA sequence from each locus was then used to design forward and reverse PCR primers to amplify the locus from different soybean varieties. The primers were designed to have a dissociation temperature (Td) of approximately 60 °C, using the formula:
Td = (((((3 x #GC) + (2 x #AT)) x 37) - 562) / #bp) - 5
where #GC, #AT, and #bp are the number of guanine-cytosine base pairs, the number of adenine-thymine base pairs, and the total number of base pairs, respectively, involved in the annealing of the primer to the template DNA. The primers were synthesized with a Perkin Elmer ABI 394 DNA/RNA Synthesizer using cyanoethyl phosphoramidite chemistry with the dimethoxytrityl protecting group removed, and were purified by desalting over a Pharmacia NAP10 column.
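The Td formula above is easy to check in code. The following is a minimal Python sketch, assuming every base of the primer anneals to the template (so #bp is simply the primer length); the function name `dissociation_temperature` is hypothetical and is not part of any software used in the original work.

```python
def dissociation_temperature(primer: str) -> float:
    """Td (degrees C) = ((((3 * #GC) + (2 * #AT)) * 37) - 562) / #bp - 5.

    Assumes the whole primer anneals perfectly to the template DNA.
    """
    seq = primer.upper()
    n_gc = seq.count("G") + seq.count("C")
    n_at = seq.count("A") + seq.count("T")
    n_bp = len(seq)
    if n_bp == 0 or n_gc + n_at != n_bp:
        raise ValueError("primer must be a non-empty A/C/G/T sequence")
    return ((3 * n_gc + 2 * n_at) * 37 - 562) / n_bp - 5


# A hypothetical 20-mer with 10 G/C and 10 A/T bases.
print(round(dissociation_temperature("ACGT" * 5), 1))
```

For that 20-mer the formula gives ((50 x 37) - 562) / 20 - 5 = 59.4 °C, close to the approximately 60 °C design target.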
The PCR reaction mixture to amplify each desired locus consisted of 1.0 µM of the forward primer and 1.0 µM of the reverse primer, 1X buffer (20 mM tripotassium citrate, 20 mM MgSO4, 40 mM Tris base, 10 mM glycine, 5 mM L-histidine, and 0.01% Triton X-100), 240 µM of each dNTP, 0.4 U µL⁻¹ of Hot Tub DNA polymerase (Amersham Life Science, Arlington Heights, IL 60005 U.S.A.), and 1.0 ng µL⁻¹ of template DNA, all diluted in HPLC-grade H2O. The PCR was run for 33 cycles, with an initial denaturation for 2 min at 94 °C, followed in each cycle by denaturation for 30 seconds at 94 °C, annealing for 2 min at 58 °C, and extension for 2.5 min at 70 °C. A final extension was done for 3 min at 70 °C. An aliquot of each PCR product was run on a 1-2% agarose gel and stained with EtBr for viewing under ultraviolet light.
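The cycling program above can be written down as data, which makes the total run time easy to check. This is a sketch under two assumptions: the 2-min initial denaturation occurs once before the 33 cycles, and ramp times between temperatures are ignored (real instruments add them). The `Step` class and constant names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    name: str
    temp_c: float
    seconds: int


INITIAL = (Step("initial denaturation", 94, 120),)
CYCLE = (
    Step("denaturation", 94, 30),
    Step("annealing", 58, 120),
    Step("extension", 70, 150),
)
FINAL = (Step("final extension", 70, 180),)
N_CYCLES = 33


def total_hold_minutes(initial, cycle, n_cycles, final) -> float:
    """Sum of hold times only; ramping between temperatures is not counted."""
    seconds = sum(s.seconds for s in initial)
    seconds += n_cycles * sum(s.seconds for s in cycle)
    seconds += sum(s.seconds for s in final)
    return seconds / 60


print(total_hold_minutes(INITIAL, CYCLE, N_CYCLES, FINAL))
```

The hold times add up to 2 + 33 x (0.5 + 2 + 2.5) + 3 = 170 min per run.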
Each amplicon was purified for sequencing using a QIAquick-spin PCR Purification Kit according to the manufacturer's protocol (Qiagen, Chatsworth, California 91311 U.S.A.). If multiple bands were amplified from one PCR, each was individually extracted from the gel and purified using the QIAquick Gel Extraction Kit (Qiagen, Chatsworth, California 91311 U.S.A.). These products were then sequenced using the Taq DyeDeoxy terminator cycle sequencing reaction on a Perkin Elmer ABI 373 or 377 DNA sequencer.
Each oligonucleotide primer used to amplify a DNA fragment by PCR was also used to prime sequencing of that fragment from one end. When a DNA fragment was too large to sequence in one pass from each end, additional sequencing primers were designed from the initial amplicon sequence and used to obtain sequence further inside the fragment. When multiple bands were produced from an original set of PCR primers, the sequences of the different fragments were compared and locus-specific primers were designed to amplify only the desired locus.
Eight soybean varieties (Table 1) used to compare DNA sequences and identify single-nucleotide polymorphisms were progenitors of many modern North American soybean varieties (Gizlice et al. (1994) Crop Sci 34:1143-1151) and should, therefore, represent most alleles existing at each locus. Two varieties, BSR101 and PI437.654, were also sequenced at each locus because they were the parents of a population of recombinant-inbred soybean lines used to map many of the markers. These 10 varieties represent a broad range of maturity groups and were considered genetically diverse by RFLP fingerprinting analysis (unpublished data). The sequences of these 10 varieties were aligned and compared for differences at each locus.
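Comparing the aligned sequences of the 10 varieties amounts to scanning each alignment column for more than one distinct character. This is a minimal sketch, assuming the sequences have already been aligned to equal length with gaps written as '-' (so indels are reported as well as substitutions); the function name and the toy sequences below are hypothetical, not data from this study.

```python
def polymorphic_positions(alignment: dict[str, str]) -> dict[int, dict[str, str]]:
    """Return {position: {variety: base}} for every alignment column that varies.

    Positions are 0-based. A gap ('-') counts as a character, so indels are reported.
    """
    seqs = list(alignment.values())
    if not seqs:
        return {}
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to the same length")
    polymorphic = {}
    for i in range(length):
        column = {variety: seq[i].upper() for variety, seq in alignment.items()}
        if len(set(column.values())) > 1:
            polymorphic[i] = column
    return polymorphic


toy = {"BSR101": "ACGTTA", "PI437.654": "ACGCTA", "CNS": "ACGTTG"}
print(polymorphic_positions(toy))
```

On the toy alignment, columns 3 (T/C) and 5 (A/G) are reported; all other columns are monomorphic.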
Table 1. Soybean varieties used to compare DNA sequence, their estimated contributions to modern North American varieties, and their relative maturity groups.
BSR101 and PI437.654 were included to identify marker alleles that could be genetically mapped in a recombinant-inbred population.
Variety | Percentage contribution† | Maturity group
Mandarin (Ottawa) | |
Lincoln | 17.9 | III
A.K. (Harrow) | 4.9 | III
S100 | 7.5 | V
Ogden | 4.9 | VI
CNS | 9.4 | VII
Tokyo | 3.8 | VII
Jackson | 3.3 | VII
PI437.654 | - | II
† Percentages from Gizlice et al. (1994).
Design and Testing of ASH Markers
When single-nucleotide polymorphisms or other polymorphisms were found among the DNA sequences at a locus, several ASH oligonucleotide probes were designed, synthesized, and tested for each allele. The ASH probes were designed to have a dissociation temperature of about 37 °C. They were synthesized with a Perkin Elmer ABI 394 DNA/RNA Synthesizer using β-cyanoethyl phosphoramidite chemistry with the dimethoxytrityl protecting group removed, and were purified by desalting over a Pharmacia NAP10 column. The probes were tested against the 10 sequenced soybean varieties, 12 additional North American ancestral varieties, and 72 recombinant-inbred lines of the BSR101 X PI437.654 mapping population. Probes were also tested at each locus for their signal-to-noise ratio, where signal is hybridization of the correct probe and noise is hybridization of the incorrect probe, using amplified target DNA from one variety in a dilution series. The dilution series contained about 100 ng, 10 ng, and 1 ng of PCR-product (target) DNA.
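The signal-to-noise test can be expressed as the ratio of the correct probe's hybridization signal to the incorrect probe's signal at each target amount. This sketch assumes the spots have already been quantified as numbers (for example, by densitometry of the X-ray film); the intensity values and the minimum ratio of 5 are hypothetical, since the text gives no numeric acceptance threshold.

```python
from typing import Sequence

TARGET_NG = (100, 10, 1)  # approximate target DNA amounts in the dilution series


def signal_to_noise(signal: Sequence[float], noise: Sequence[float]) -> dict[int, float]:
    """Correct-probe signal divided by incorrect-probe signal at each target amount."""
    if len(signal) != len(TARGET_NG) or len(noise) != len(TARGET_NG):
        raise ValueError("expected one reading per dilution")
    ratios = {}
    for ng, s, n in zip(TARGET_NG, signal, noise):
        ratios[ng] = s / n if n > 0 else float("inf")
    return ratios


def needs_redesign(ratios: dict[int, float], min_ratio: float = 5.0) -> bool:
    """Hypothetical acceptance rule: every dilution must reach min_ratio."""
    return any(r < min_ratio for r in ratios.values())


ratios = signal_to_noise((5000.0, 800.0, 90.0), (250.0, 40.0, 30.0))
print(ratios, needs_redesign(ratios))
```

With these made-up readings the probe discriminates well at 100 ng and 10 ng (ratio 20) but not at 1 ng (ratio 3), so under this rule it would be flagged for redesign.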
Target DNA was attached to a Hybond N+ nylon membrane (Amersham Life Science, Arlington Heights, IL 60005 U.S.A.) as follows. The membrane was soaked briefly in water, and excess water was removed by blotting. Two microliters of the PCR product or diluted PCR product were pipetted onto the moist membrane. The membrane was placed, DNA side up, on blotter paper saturated with a DNA-denaturing solution (0.6 M NaCl, 0.4 M NaOH) for two minutes. The membrane was then transferred to blotter paper saturated with a neutralizing solution (0.5 M Tris pH 7.5, 1.5 M NaCl) for 10 minutes. Finally, the membrane was baked at 85-90 °C for 1-2 hours and UV-crosslinked at 20,000 µJ.
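Preparing the denaturing and neutralizing solutions is a molarity-to-mass calculation: grams = molarity (mol/L) x volume (L) x molar mass (g/mol). The sketch below uses standard molar masses for NaCl, NaOH, and Tris base; bringing the neutralizing solution to pH 7.5 (for example, with HCl) is an assumed extra step, since the text gives only the final pH.

```python
MOLAR_MASS_G_PER_MOL = {"NaCl": 58.44, "NaOH": 40.00, "Tris base": 121.14}

DENATURING = {"NaCl": 0.6, "NaOH": 0.4}  # mol/L
NEUTRALIZING = {"Tris base": 0.5, "NaCl": 1.5}  # mol/L; pH 7.5 adjusted separately


def grams_for(recipe: dict[str, float], volume_l: float) -> dict[str, float]:
    """Mass (g) of each solute = molarity (mol/L) x volume (L) x molar mass (g/mol)."""
    return {
        solute: molarity * volume_l * MOLAR_MASS_G_PER_MOL[solute]
        for solute, molarity in recipe.items()
    }


print(grams_for(DENATURING, 1.0))
print(grams_for(NEUTRALIZING, 1.0))
```

For 1 L, this gives about 35.06 g NaCl and 16.0 g NaOH for the denaturing solution, and about 60.57 g Tris base and 87.66 g NaCl for the neutralizing solution.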
Prior to hybridization, each membrane was soaked in a hybridization-washing buffer (0.75 M Na, 0.5 M PO4, 1.0 mM disodium EDTA, and 1% sarkosyl) for 30 min at 65 °C, then in fresh buffer for 30 min to overnight at room temperature.
Each ASH probe was end-labeled with ³²P transferred from the γ position of ATP by the T4 polynucleotide kinase reaction, according to the kinase manufacturer's protocol (New England Biolabs, Beverly, Massachusetts 01915 U.S.A.).
Hybridization of the probe and target DNA was done for at least one hour while shaking at 60 rpm at room temperature. Afterwards, the hybridization solution was discarded and the membrane was sequentially washed, each time in fresh solution, once for 2 min, once for min, and twice for 30 min while shaking at 60 rpm.
The hybridized membrane was exposed to X-ray film for 30 minutes to 18 hours at -80 °C. The film was developed and the probe was evaluated for its hybridization characteristics. If necessary, the probe was redesigned and retested to increase the signal-to-noise ratio or signal strength.
Genetic Mapping of Soybean ASH Markers
Two segregating soybean populations were used to confirm the map location of each ASH marker relative to other markers in linkage groups. Each population consisted of about 300 recombinant-inbred lines, from the crosses PI437.654 X BSR101 and Bell X YB17E, respectively. The mapping procedure and the population history for PI437.654 X BSR101 were as previously described (Webb et al. (1995) Theor Appl Genet 91:574-581; Keim et al. (1997) Crop Sci 37:537-543). The population from Bell X YB17E consisted of F3:6 lines. When possible, the linkage groups were identified with and named according to the USDA/Iowa State University public soybean map (Shoemaker and Olson 1993).
Results
We identified DNA-sequence polymorphisms at 63 independent loci in the soybean genome by comparing the DNA sequences of different soybean genotypes.
These 63 loci were named (Table 2) according to pre-existing nomenclature. All were named as RFLP marker loci except the following: the loci php08320E, php08584A, php10078A, and php10355B, which were originally mapped as AFLP marker loci; the locus php, which was only mapped as an ASH marker; and the locus SOYBPSP, a soybean gene sequence in GenBank (accession M13759) for a 7S seed storage protein (Doyle et al. (1986) J Biol Chem 261:9228-9238). Of these 63 loci, only SOYBPSP was a known gene sequence, and its nucleotide polymorphism (previously unknown and identified by our sequencing) was found in an intron.
We designed forward and reverse PCR primers (Table 3) that amplify a region of each locus containing at least one nucleotide polymorphism; the amplified product of each PCR is called an "amplicon". Most of these loci had multiple polymorphic nucleotide positions separated by monomorphic nucleotides. When the distance between two polymorphic nucleotides was greater than the length of a typical oligonucleotide hybridization probe, we considered each polymorphic nucleotide position to be an independent sub-locus (Table 2). When two polymorphic nucleotide positions lay within the length of one oligonucleotide probe, we considered those polymorphisms to be dependent and part of one sub-locus. We named each amplicon using the prefix 'pha', an acronym for Pioneer Hi-Bred amplicon, followed by a unique identification number (Tables 2 and 3). Either the locus name or the amplicon name can be used to refer to a DNA region containing specific polymorphic sub-loci.
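The sub-locus rule above (polymorphic positions within one probe length belong together; positions farther apart are independent) can be sketched as a greedy grouping. This assumes a fixed probe length of 13 nt, a hypothetical value chosen because the probes in Table 2 are short oligonucleotides of roughly that size. Each group starts at its leftmost position, and a later position joins only if the whole group still fits inside one probe window; this is one reasonable reading of the rule, not necessarily the authors' exact procedure.

```python
def group_sub_loci(positions: list[int], probe_length: int = 13) -> list[list[int]]:
    """Group polymorphic positions so that every group fits inside one probe."""
    groups: list[list[int]] = []
    for pos in sorted(set(positions)):
        # Join the current group only if the span from its first position
        # to this one is still shorter than one probe.
        if groups and pos - groups[-1][0] < probe_length:
            groups[-1].append(pos)
        else:
            groups.append([pos])
    return groups


print(group_sub_loci([100, 10, 14, 40, 105]))
```

Here positions 10 and 14 form one dependent sub-locus, 40 stands alone, and 100 and 105 form another, giving three sub-loci.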
The primer pairs described here produced amplicons ranging in length from 86 to 1880 base pairs. About half of the amplicon sizes were estimated from bands viewed on agarose gels, and about half were obtained by sequencing the entire amplicon region. Size estimates from gels were accurate to within about 10% of the actual size. The loci php05219A (amplicon pha11138) and php10355B (amplicon pha11627) each produced two fragment sizes because each locus had an insertion-deletion polymorphism that varied among soybean genotypes (Table 4).
One locus, pK079A, had two non-overlapping amplicons, pha11074 and pha11075. Two amplicons were needed for this locus because the DNA sequences flanking its two sub-loci were conserved at a second pK079 locus. The DNA region between the two sub-loci differed from the second pK079 locus, so PCR primers specific to pK079A were designed for two amplicons, one for each sub-locus.
The polymorphic nucleotides for each allele of each sub-locus were designated by upper-case letters within the probe sequences shown in Table 2.
Each oligonucleotide probe distinguishes one allele of a sub-locus and locus. A public soybean variety representative of each allele is listed in Table 2 as an example.
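Writing an allele-specific probe in the style of Table 2 means taking a short window of one allele's sequence around the polymorphic position(s), lower case except the polymorphic bases. This is a minimal sketch, assuming a fixed flank of 6 nt on each side (hypothetical); in practice the window would still be adjusted to reach the ~37 °C dissociation temperature and an acceptable signal-to-noise ratio. The two allele sequences in the usage example are hypothetical.

```python
def allele_probe(allele_seq: str, polymorphic: list[int], flank: int = 6) -> str:
    """Probe window around 0-based polymorphic positions; only those bases upper case."""
    seq = allele_seq.lower()
    start = max(0, min(polymorphic) - flank)
    end = min(len(seq), max(polymorphic) + flank + 1)
    probe = list(seq[start:end])
    for pos in polymorphic:
        probe[pos - start] = probe[pos - start].upper()
    return "".join(probe)


# Two hypothetical alleles differing at position 9 (T versus C).
print(allele_probe("gcattttgttagagagcta", [9]))
print(allele_probe("gcattttgtcagagagcta", [9]))
```

The pair of probes, `ttttgtTagagag` and `ttttgtCagagag`, differ only at the upper-case base, which is what lets each hybridize preferentially to its own allele.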
The primer pairs and probes presented here enable the use of allele-specific hybridization (ASH) and many other techniques for detecting polymorphisms at these loci. ASH was used here as an example marker technology for each polymorphism. Other genetic-marker technologies are equally effective means of exploiting these polymorphisms in soybean improvement programs. In addition, DNA sequences other than those specifically shown here are used as primers and probes to detect these polymorphisms, e.g., as described supra. Any DNA sequences flanking these polymorphic nucleotides and within the size range amplifiable by PCR (up to 40 kb using long-distance PCR methods, although more typically about 2 kb or less using standard PCR methods) are used as PCR primers to amplify the polymorphic DNA, and the forward and reverse primers are designed from either DNA strand. Any DNA sequence containing the polymorphic nucleotides and within the discriminatory capability of DNA hybridization conditions is used as a hybridization probe to detect the polymorphism. The probes are designed from either DNA strand.
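Because primers and probes can be designed from either DNA strand, an opposite-strand probe is simply the reverse complement of the original. A short Python sketch that also keeps the upper-case convention for polymorphic bases; the function name is hypothetical, and IUPAC ambiguity codes (e.g. N, R, Y) are not handled.

```python
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Reverse complement that keeps upper-case (polymorphic) bases upper case."""
    return seq.translate(_COMPLEMENT)[::-1]


print(reverse_complement("ttttgtTagagag"))
```

For the example probe this gives `ctctctAacaaaa`, which detects the same polymorphism from the opposite strand (the polymorphic T becomes an A).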
The genetic map locations, by linkage group, of 43 of these 63 loci were established by mapping them as ASH markers in segregating populations (Table 2). Mapping showed the alleles of each locus to be heritable and to segregate normally in a recombinant-inbred population. When possible, the linkage-group names used here are the same as those of a reference map for soybean (Shoemaker and Olson, 1993, supra). Thirty-eight of these 63 loci mapped as ASH markers to 17 linkage groups that correspond to the public reference map, and 5 loci mapped to linkage groups on three other genetic maps that have not yet been identified with linkage groups on the public reference map. These latter groups were named using a combination of a letter (B, L, or Z) and a number. Twenty markers, pA059A, pA064A, pA077A, pA593A, pBLTISA, pK401A, pR153A, php02340B, php02396A, php05264A, pA343A, pA748B, pA858A, pB132A, pG17.3A, php02371A, php05290A, php02329A, php02376A, and php08320E, were not mapped as ASH markers, but sixteen of these loci have been mapped as either RFLP or AFLP markers. Four loci, php02396A, php05264A, php02329A, and php02376A, have not been mapped as markers of any kind. Regardless of their map status, the 63 loci described here reside on most, and possibly all, of the 20 chromosome pairs in soybean.
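The text does not say which statistic was used to judge normal segregation. One common check for a recombinant-inbred population, in which each parental allele is expected in about half the lines, is a 1:1 chi-square goodness-of-fit test with one degree of freedom. This sketch assumes that test, counts only lines homozygous for one parental allele (heterozygous or missing lines excluded), and uses hypothetical counts. The p-value uses the identity P(X > x) = erfc(sqrt(x / 2)) for a chi-square variable X with 1 degree of freedom.

```python
import math


def segregation_chi_square(n_allele_a: int, n_allele_b: int) -> tuple[float, float]:
    """1:1 chi-square test for one marker in recombinant-inbred lines.

    Returns (chi-square statistic, p-value with 1 degree of freedom).
    """
    n = n_allele_a + n_allele_b
    if n == 0:
        raise ValueError("no scored lines")
    expected = n / 2
    chi2 = ((n_allele_a - expected) ** 2 + (n_allele_b - expected) ** 2) / expected
    p_value = math.erfc(math.sqrt(chi2 / 2))  # upper tail of chi-square, 1 df
    return chi2, p_value


chi2, p = segregation_chi_square(160, 140)  # hypothetical counts from ~300 lines
print(round(chi2, 3), round(p, 3))
```

A 160:140 split gives a chi-square of about 1.33 (p of about 0.25), which would not be considered a significant departure from 1:1 segregation at the usual 0.05 level.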
WO 99/31964 PCT/US98/26935
Table 2. Soybean marker loci: their genetic linkage group, sub-loci, polymorphic nucleotides, probe sequences for allele-specific hybridization, and a representative soybean variety for each allele.
Locus Name | Linkage Group† | Amplicon Name | Sub-Locus | Probe Sequence‡ | Soybean Variety | SEQ ID NO:
p U 9 p a ttgtga caatata arrow (4) 1 ttgtgaCcaatat CNS 2 P p a actaaat tatacc arrow 3 (2) 1 ggtataCatttag BSR101 ~ 4 p p a gttgc tgggt ( 1 ) 1 ttgcCtgggtt AKHarrow 6 2 ttttcTTttgttag Mandarin 7 2 ttttttcttgttag AKHarrow 8 3 tttccAaaggtg Mandarin 9 3 ttttccGaaggt AKHarrow 10 P p a 0 ggatt atacta 11 (3) 1 tggattTTatacta Ogden 12 2 atcatgatttcag PI437.654 13 2 tgaATCTGAtttc CNS 14 3 gCAAGTATCAtg CNS 15 3 tttcagTTtgattt PI437.654 16 4 atgttCgggga BSR101 I7 4 aatgitTgggg Mandarin 18 5 . agaacaCggaat BSR101 19 5 gaacaTggaatg Mandarin 20 6 tgcaaCggcat BSR101 21 6 tgcaaTggcatt Ogden 22 P p a tttgaa ctttat 23 ( 1 ) 1 tttgaaTctttatc PI437.654 24 p p a tcatc aatcac 25 ( 1 ) 1 ttcatcAaatcac Jackson 26 2 GatgaCgatttg BSR101 27 2 aCatgaTgatttg PI437.654 28 3 tcatCtGtgataa BSR101 29 3 tcatCtCtgataa Jackson 30 3 tcatGtGtgataa PI437.654 31 P ~ P a tgcact taaatta 32 (1,2) 1 ttgcacttaaatta Lincoln 33 2 ttccttcTtttttt BSR101 34 2 tccttcCtttttt PI437.654 35 3 agagaGatactc BSR101 36 3 agagaCatactc Columbus 37 4 gaccCgctc BSR101 38 4 gaccTgctcc Lincoln 39 5 gaaatcCcaaaaa BSR101 40 5 gaaatcAcaaaaaa Lincoln 41 6 tttatcAtttttgg PI437.654 42 6 atttatcTtttttg BSR101 43 pA343A D pha13072 1 cagcaAtgaaag CNS 44 (1) 1 tcagcatgaaag AKHarrow 45 2 ttaactTgccag AKHarrow 46 2 ttaaciAgccag CNS 47 3 accctCaatatg AKHarrow 48 3 accctAaatatg CNS 49 p G p a ttggaa tatact anc a 50 (1) 1 ttggaaGAtatac PI437.654 51 1 ttggaaGTtatac BSR101~ 52 2 agcacAagtgg PI437.654 53 2 agcacTagtgg PI86050 54 p U p a attaggggcag 55 (1,2) 1 attaggGggca PI437.654 56 2 atatgtaAcaaaag PI437.654 57 2 tatgtaCcaaaag BSR101 58 P y p a tctttt cataatg 59 (1,2) 1 cattatgCaaaag CNS 60 2 gtactaTtatttg PI437.654 61 2 gtactaGtatttg BSR101 62 3 ttgagGatttag BSRI01 63 3 ttgagCatttag AKHarrow 64 4 aaggaGgttgc BSR101 65 4 taaggaAgttgc AKHarrow 66 5 ttagttGagagg AKHarrow 67 .
5 ttagttAagagga BSR101 68 P y p a gttttt tt ataaatarrow 69 (1) 1 gtttttAttATTATTaCNS 70 2 aaatatAtatatatataAICHarrow 7I
2 aatatGtataAtatatCNS 72 3 atatatTtaataaatatCNS 73 3 tatatatAtaTATATATAICHarrow 74 4 aaaaTaaaaAtaaaagAICHarrow 75 4 aaaAaaaaCtaaaag CNS 76 5 atctttGatgagt CNS 77 5 atctttCatgagt AICHarrow 78 6 tTtmtCtttttac CNS 79 6 ttAtttttAttmac AICHarrow 80 P ~~ p a agaaat gtaagt 81 (1) 1 agaaattTgtaagt PI437.654 82 2 aatcttTtttaaag BSR101 83 2 aatcttCtttaaag PI437.654 84 P ~B p a tctat ctgaag an arln 85 (1) 1 tctatActgaag CNS 86 2 aatgatAatttagt CNS 87 2 aatgatCatttag Mandarin 88 P ~ ~ H p ctacatt mttg gg (1) 1 tacattGtttttg AKHarrow 90 2 mttgTtagaga CNS 91 2 mttgCtagag AKHarrow 92 3 acactGcttac CNS 93 3 tacactActtac AKHarrow 94 p 88 p a taaggtaatgttg 95 (1) 1 aggtGGTaatgt CNS 96 2 ggcttAtgcatt CNS 97 2 ggcttCtgcat AKHarrow 98 p W p a aggg ctctg . 99 ( 1, 1 tagggTctctg BS R 101 100 2) ~
2 atacttgtactct Ogden 101 2 tacttCgtactc BSR101 102 3 tatggagtaattg Ogden 103 p K p a ttgaat cccct 105 (1) 1 ttgaatCcccc PI437.654106 2 atgmCgaagc BSR101 107 2 . atgtttTgaagca PI437.654108 3 cggttTtattag PI437.654109 3 cggttCtattag BSR101 110 4 attccTgcccc BSR101 111 4 attccAgcccc Tokyo 112 pB p a accg gcaac 113 ( 1 1 ttgcCcggtg PI437. 114 ) 654 2 . tgtaaTgcGtg BSR101 lI5 2 tgtaaCgcGtg PI437.654116 2 tgtaaCgcAtgt Cook 117 3 tcaccTggatc BSR101 118 3 atcaccCggat PI437.654119 p p a ttcagt aaacc 120 (1) 1 ttcagtTaaacca PI437.654121 p~ p a tcttaa aggct o 0 122 ( 1 . tcttaaAaggct CNS 123 ) p~ p a gatca cccaa arrow 124 A
( 1 1 atcaaGcccaa CNS 125 ) p~ p a carat ccacaa . 126 A
( 1 1 cacatAccacaa BSR 101 127 ) 2 attattAttttcac PI437.654128 2 attattTttttcac BSR101 129 3 aggagtAgtaatt PI437.654130 3 ggagtGgtaatt BSR101 131 p~ p a ttggac attaata . 132 A
'(1) 1 ttggacAattaata BSR101 133 ~
2 taatatcTtatgca BSR101 134 2 aatatcGtatgca PI437.654135 p ~ p ttctcg gcc arrow 136 (4) 1 ttctcgCgcc CNS 137 2 ttctgataaaaaaa CNS 138 2 tctgatGaaaaaa AKHarrow 139 p p p a gaatga tttga 140 (1) 1 gaatgaTtttgac Mandarin 141 p M p a tcattc ttcatg . 142 p lA
(1) 1 tcattcTttcatg BSR101 143-p p . p a catata tagtag an arm 144 1 acatataAtagtag CNS 145 p p a aaaaaaaatgagg 146 OB
( 1 1 aaaaaaaTatgagg CNS 147 ) p pU p a tgtgacaaccga 148 lA
(1) 1 gtgacACaacc PI437.654 149 ph p p a tctcc gaaaca 150 OC
( 1 1 gtttcGggag PI437.654 151 ) 2 aagaagAtgatg BSR101 152 2 agaagCtgatg PI437.654 153 P P p a - ggca tttt arrow 154 c A
( 1 1 ggcaaGttttAc CNS 155 ) 2 ttttCcGgtgc AKHarrow 156 2 GttttAcAgtgc CNS 157 3 aataaaCaagagg AKHarrow 158 3 aataaaTaagagga CNS 159 4 tcgtttGcaatc AKHarrow 160 4 . tcgtttCcaatc CNS 161 5 aaatgatTccttg AKHarrow 162 5 aatgatCccttg CNS 163 6 tattatgTttttgatAKHarrow 164 6 tattatgCttttga CNS 165 7 ggaaagGattttt CNS 166 7 ggaaagAattttta AKHarrow 167 8 ttttgtATctgtat CNS 168 8 tttgtGGctgta AKHarrow 169 p . p a attaaa cccag an arm 170 P
1 attaaaTcccagt CNS 171 2 ttttgtTagagag Mandarin 172 2 ttttgtCagaga CNS 173 3 ttttctTctgtca Mandarin 174 3 ttttctCctgtc CNS 175 4 acaacTAAtaagg Mandarin 176 4 tacaactaaggta CNS 177 p p0 p a atcacg atcac . 178 (1) 1 gtgatCcgtg BSR101 179 P P p a cgcac atatg . 180 ( 1 ) 1 gcacaGatatg BSR 101 181 2 atactgAtttctg PI437.654 182 2 . tactgGtttctg _ 183 ~ BSR101 3 gttagaAtgttag PI437.654 184 3 ttagaGtgttagt BSR101 185 4 aatataaAtaaggg PI437.654 186 4 atataaGtaaggg BSR101 187 5 aaactcagttattt PI437.654 188 5 aactcAAagttatt BSR101 189 6 tttcttTgttattc PI437.654 190 6 atttcttCgttatt BSR101 191 7 attctgCtattatt PI437.654 192 7 tattctgTtattattBSR101 193 8 tgcacaTattact PI437.654 194 8 gcacaCattact BSR101 195 9 atcccatGttca PI437.654 I96 9 tcccatAttcag BSR101 197 P P~ p a . tttaag agagtt . 198 (1,2) 1 gtttaagAagagt BSR101 199 2 tgctttGatttg BSR101 200 2 tgctttAatttgg' Mandarin 201 3 atctctaTaaaca PI437.654 202 3 tctctaCaaacaa BSR101 203 4 tctcaaCttgga PI437.654 204 4 tctcaaTttggaa BSR101 205 5 tttgcAtgcaac PI437.654 206 5 tttgcTtgcaac BSR101 207 6 cacatCcatttg Ogden 208 6 cacatAcatttg PI437.654 209 P P~ ~ p a tacaaaa aaggtt arrow 210 1 acaaaaCaaggtt BSR101 211 2 ggcAtgTgagt AKHarrow 212 2 ggcTtgCgag BSR 101 2 i 3 tgatatATTcttca AKHarrow 214 3 tgatatGcttcaa BSR101 215 4 agttaacAtgaag AKHarrow 216 4 gttaacGtgaag BSR101 . 217 5 agcagtGaagta AKHarrow 21$
5 gcagtAaagtac BSR101 219 6 aatatatctctttttttAKHarrow 220 6 atatctcCtttttt CNS 221 7 aaaaaaGtagctaaAKIiarrow 222 7 aaaaaaCtagctaaCNS 223 8 tatttgCattagg AKHarrow 224 8 ttatttgTattaggCNS 225 9 tcttacaTctttg AKHarrow 226 9 cttacaGctttg CNS 227 10 cggtAgagatt AKHarrow 228 10 cggtGgagatt CNS 229 11 gggcacAaaag AKHarrow 230 11 ggcacGaaagt CNS 231 P P~ p a ctatat ttggtg 232 (I) 1 ctatatAttggtg PI437.654 233 2 aataatGattgtg PI437.654 234 2 taataatAattgtgtBSR101 235 3 ttattatCttttgtWilliams 236 3 ttattatTttttgtCNS 237 4 ttaAaatcTagtagCNS 238 4 attaAaatcCagtaPI437.654 239 4 attaTaatcCagtaWilliams 240 5 agattaAcaggc CNS 241 5 tagattaCcagg BSR101 242 p p K p a aagaga ggcta 243 (1) I aagagaTggcta PI437.654 244 2 tggaaAccttaat BSR101 245 2 tggaaGccttaa PI437.654 246 3 tacctcAagtgt BSR101 247 3 acctcGagtgt PI437.654 248 4 actaagAattttg PI437.654 249 4 actaagCattttg BSR101 250 P PO p a ttatag cacttg 251 (1,2) 1 tatagGcacttg Mukden 252 2 tggtaCgttatg Mukden 253 2 ttggtaAgttatg BSR101 254 3 ataaaccAtatatgPI437.654 255 3 ataaaccTtatatgBSR101 256 4 agtgtgtT"ITtttBSR101 257 4 gtgtgtAAGtttt PI437.654 258 5 ttggtCcaagg PI437.654 259 5 ttggtTcaaggt BSR101 260 P P~ p a atagg aaaagg 261 (1) 1 aataggTaaaagg PI437.654. 262 2 aaccttTctgtc BSR101 263 2 accttGctgtc PI437.654 264 p p ~ p a ctaca atgaag 265 1 tacatGatgaag AKHarrow 266 2 aatcttGctgtg BSR101 267 2 aatcttActgtga AKHarrow 268 3 atgtgGcattga BSR101 269 3 catgtgAcattg AKHarrow 270 4 tatacaaTatctaaaBSR101 271 4 atacaaCatctaaa AKHarrow 272 5 tcaatgAtggata BSR101 273 5 tcaatgTtggata AKHarrovi~ 274 p p p a aaaat ttggatc 275 (1) 1 aaaaatAttggat PI437.654 276 p p0 p a gaatgaatttttc arrow 277 OA
( 1 ) 1 . aatgaaCtttttcCNS 278 2 aaggagGgaaaa AKHarrow 279 2 aaggagAgaaaaa CNS 280 3 aaatgaaCaaaaaaaCNS 281 3 aaatgaaAaaaaaaaAKHarrow 282 p p p a tttgtt tttatga . 283 ~
(1) 1 tttgttCtttatga BSR101 284 2 atctatGtatatta BSR101 285 2 . tatctatAtatattaPI437.654 286 3 gttgcCaaatca BSR101 287 3 tgttgcAaaatca PI437.654 288 P P~ p a agtgc tgaaa 289 (1,2) 1 agtgcACtgaaa PI437.654 290 2 gaggaGatgtag BSR101 291 2 gaggaAatgtag PI437.654 292 3 gatgaTtttagc BSR101 293 3 gatgaGtttagc PI437.654 294 4 taggGatttgg BSR101 295 4 aataggAatttgg PI437.654 296 5 ccatgTttggtt BSR101 297 5 ccatgCttggt PI437.654 298 p P a agga tat cc 299 OE
( 1 ) 1 aggaAtatTccc PI437.654 300 2 agaaAAtcgcttt BSRIOI 301 2 gaaGGtcgGTC PI437.654 302 3 gttttttGactttt BSR101 303 3 gttttttTacttttgPI437.654 304 4 tatgatGtttcct BSR101 305 4 atatgatAtttcct PI437.654 306 P P~~ P a gg~c ggag 307 (1,2) 1 tggtacTggag PI437.654 308 WO 99/31964 6o PCT/US98/26935 2 tgcatAGcaaga BSR101 309 2 tgcatGAcaaga PI437.654 310 3 aactcGttgatg BSR101 311 3 aaactcAttgatg PI437.654 312 S 4 agtctGagmg PI437.654 313 4 aagtctAagmg BSR101 314 S gtgtaaAcggg PI437.654 31S
S tgtaaGcggga BSR101 316 6 tgaaGaaaaaTatg BSR101 31-7 6 gaaCaaaaaGatg PI437.654 318 7 atgggAcgttg BSR101 319 7 atgggGcgtt PI437.654 320 8 gctttgttgttg BSR101 321 8 gctttgGTAttg PI437.654 322 1S 9 ttagtGgacagt CNS 323 9 gttagtTgacag PI437.654 324 10 gccaaaGcaaata CNS 32S
10 gccaaaTcaaataa PI437.654 326 11 agtgaTactgg CNS 327 11 agtgaGactgg PI437.654 328 p a 1 gtcac gagaa 329 (1) 12 gtcacGTgaga PI437.654 330 13 taatccAgggaa BSR101 331 13 _ taatccTgggaa PI437.654 332 2S 14 attaattTagaagg BSR101 333 14 ttaattGagaagg PI437.654 334 1S gtctgCatatga CNS 33S
1S tgtctgTatatga PI437.654 336 16 atgtgCtttggt CNS 337 16 atgtgGtttggt PI437.654 338 17 cggTaagAtgaa BSR101 339 17 cggCaagGtg CNS 340 18 ctcgcAccttc BSR101 341 18 tcgcGccttc CNS 342 3S p p p a tttttta agaaaaga 343 (2) 1 tttttaGagaaag Mandarin 344 2 aaaaaaaTttaagacBell 34S
2 aaaaaaaAttaagacMandarin 346 3 ttttttttgtagtg PI437.654 347 3 tttttttGtgtagt Bell 348 4 ataatatATgaaam PI437.654 349 4 ataatatgaatttc Bell 3S0 S aatttcGtatgta Bell 3S1 4S S aatttcAtatgtac Mandarin 3S2 .
6 acatttGgattaa Mandarin 3S3 6 acamAgattaaa Bell 3S4 7 taaaaaaAtattactBell 3SS
7 taaaaaaGtattactPI437.654 3S6 8 gaacaaTttgtaat Mandarin 357 8 gaacaaCttgtaa Bell 358 9 cttcaCgaagg Mandarin 359 9 tcttcaTgaagg Bell 360 P P p a gtttctcttat ac . 361 SB
(1,2) 1 tcTTATCttatGac BSR101 362 1 tcTTATCttatAac S 100 363 2 gmcTgataac PI437.654 364 -2 gmcAgataac BSR101 365 P P ~ p a ttggg atgatg 366 SA
(1) 1 tgggGatgatg PI437.654 367 P ~ p a atctaa atttagt PI437.654 368 (1,2) 1 aatctaaAatttagt BSR101 369 P ~ y ~ p acttc agtgga 370 (1) 1 cttcGagtgga PI437.654 371 P a ~ gacaat taaaaa 372 (1) 2 gacaatTtaaaaa PI437.654 373 p p a aagaat ttccta o 0 374 (4) 1 aaagaatAttccta BSR101 375 2 atgtgtttggttt Tokyo 376 2 tgtgtGTttggt BSR101 377 3 tatmtaaatcg Tokyo 378 3 tamttTaaatcg BSR101 379 p~ ~ N p a ttaatta cttaag 380 (1,2) 1 ttaattaTcttaag Archer 381 2 tgcaaaaaaataaag BSR101 382 2 tgcaaaAaaaataaa Archer 383 2 actttatCttmg PI437.654 384 3 aagtCcaGcac Bell 385 3 , aagtCcaAcact PI437.654 386 3 aagtTcaGcact BSR101 387 4 atgCcCttttgt PI437.654 388 4 gatgCcAttttg BSR101 389 4 ggatgTcAtttt PI340.046 390 5 tttgctTtgtatg PI437.654 391 5 ttgctCtgtatg BSR101 392 6 agaattCgcatat PI437.654 393 6 gaattTgcatatc BSR101 394 7 tcatttAcccaaa PI437.654 395 7 tcatttGcccaa BSR101 396 8 tgtgctTatgag P437.654 397 8 gtgctGatgag BSR101 398 9 taattTAgcttaag PI437.654.399 9 ttaattCGgcttaa BSR101 400 10 ctgaaTgaggg PI437.654.401 10 ctgaaCgaggg BSR101 402 11 aattgtaAtcattg PI437.654 403 11 attgtaGtcattg BSR101 404 12 gataacCactca BSR101 405 12 gataacactcatt Sanga 406 13 agaataatatgcg PI103.091 408 14 attatatCacatgt PI340.046 409 14 attatatTacatgta BSRi01 410 15 tatataGggctg PI437.654 411 15 atatataTggctg Sanga 412 p p a gtataaa aaagg . 
413 (2) 1 gtataaaaaaggg Lincoln 414 2 atccaCtaaatg PI437.654 415 2 atccaGtaaatg Lincoln 416 3 tcgataActtattt Lincoln 417 3 tcgataGcttatt PI437.654 418 4 atacGTTAGaaga Ogden 419 4 tatacTCaagact PI437.654 420 4 acTCTAGaagac Lincoln 421 4 tacTCTGGaaga CNS 422 5 aaatagTtaagatt PI437.654 423 5 aaatagCtaagatt Ogden 424 6 taacttaaAataaaa PI437.654 425 6 aacttaaGataaaa CNS 426 7 ttTcAtattaacc PI437.654 427 7 ~ ttGcGtCCTCA Ogden 428 8 aatcttCtataatc PI437.654 429 8 taatcttTtataatc Ogden 430 9 ttacaaGtttgag Ogden 431 9 ttacaaAtttgagt PI437.654 432 10 aaagaCtacttaa PI437.654 433 10 aaagaTtacttaaa Ogden 434 11 agattcAtatgttt PI437.654 435 11 ~ attcGTATGtatg Ogden 436 12 atacatACaaataa PI437.654 437 12 atacatATaaataag Ogden 438 12 atacatGCaaataa Lincoln 439 13 tttgttTGagaaaa Ogden 440 13 ttttgttCAagaaa PI437.654 441 13 tttgtiTAagaaaat Lincoln 442 14 ttatttCtttAttattPI437.654 443 14 ttatttTtttAttattLincoln 444 14 tatttCtttGttatt Ogden 445 15 aaatggCaaattg PI437.654 446 15 aaatggTaaattgt Lincoln 447 16 agttgGtctttg PI437.654 448 16 tagttgAtctttg Lincoln 449 17 attaacaGtaaagt PI37.654 450 17 tattaacataaaAgt Lincoln 451 18 ataaaagaatatat PI437.654 452 18 ataaaaAaatatatLincoln 453 19 cttctttCatttttPI437.654 454 19 cttctttAatttttaLincoln 455 20 tttactatGaaagaPI437.654 456 20 tttactatAaaagaLincoln 457 21 aacattCactataaLincoln 458 21 gaacattAactataPI437.654 459 22 taacattTgcataaLincoln 460 22 aacattCgcataa PI437.654 46-1 23 ttatataAtacataaPI437.654 462 23 ttatataTtacataaLincoln 463 24 cctcaTctaatg PI437.654 464 24 cctcaActaatg Lincoln 465 25 atccttGtttttg Lincoln 466 25 aatccttTtttttgPI437.654 467 26 aatccCcagaaa Lincoln 468 26 taatccTcagaaa PI437.654 469 27 atggaAgcgtc PI437.654 470 27 tggaGgcgtc Ogden 471 28 ggttgTggcg Ogden 472 28 ggttgAggcg PI437.654 473 P ~ p a actgc tataaca 474 (2) 1 actgcCtataac AKHarrow 47S
2 aaaattGccacg BSR101 476 2 aaaattAccacgt AKHarrow 477 3 ttctttTgtgaca BSR101 478 3 ttctttGgtgac AKHarrow 479 P 8 p a ggtga aaaaag . 480 (1) 1 ggtgaAaaaaagt Resnick 481 2 agtcaCattattc PI437.654 482 2 tagtcaTattattcResnick 483 3 taggtaCaaagtt PI437.654 484 3 aggtaTaaagttc Resnick 485 4 ttctctGtgttg PI437.654 486 4 ttctctCtgttg BSR101 487 5 ttgtgaAcataca PI437.654 488 5 tgtgaGcataca BSR101 489 6 tgataGatcttca Lincoln 490 6 gtgataAatcttc PI437.654 491 7 tacctactatgat PI437.654 492 7 tacctaTActatg BSR101 493 8 agaattAtgttgttPI437.654 494 8 agaattGtgttgt BSR101 495 9 agtatttAtaccaaBSR101 496 9 agtatttTtaccaaPI437.654 497 10 ataaaaaGgatgaaBSR101 498 10 taaaaaTgatgaagPI437.654 499 11 tatattAgaggat Resnik 500 11 tatattTgaggat PI86050 501 WO 99/31964 ~ PCT/US98/26935 12 atttatAcgagga PI437.654 502 12 atttatGcgagg Resnik 503 13 ttgtattATcaaattLincoln 504 13 tgtattTACcaaat PI437.654 505 14 gmcaAacaca Resnik 506 14 gtttcaCacaca PI86050 507 15 gatccGtatcc Resnik 508 15 agatccAtatcc PI86050 509 16 tatccGaccca _ 510 16 tatccAacccat Resnik 511 Pn~ P a ggaac ttacc 512 ( 1 ) 1 tggaacAttacc PI340.046 513 2 atgcttAactaac Burlison 514 2 tgcttGactaac BSR101 515 3 aaacacATaaaatg BSR101 516 3 aaaacacaaaatga Burlison 517 4 tcctgtAttttag BSR101 518 4 cctgtGmtag PI88788 519 5 cgttttAAaaatm PI88788 520 5 cgttttTTaaattttBSR101 521 6 tgataaaGttatttaBSR101 522 6 tgataaaAttatttaPI88788 523 7 tatttacTggttt BSR101 524 7 tatttacAggttt PI88788 525 8 aattttaTatttatgPI88788 526 8 aattttaGatttatgBSR101 527 9 aatttaGcacttc PI88788 528 9 aatttaAcacttc BSR101 529 10 gtgcGTtaTgc Burlison 530 10 gtgcTAtaAgct Proto 531 10 gtgcTTtaAgct BSR101 532 11 atttttTGgtatg PI88788 533 11 atttttTAgtatgg Burlison 534 11 aamttGAgtatg BSR101 535 12 gaaaaatCaaggt BSR101 536 12 gaaaaatAaaggta PI88788 537 13 ggtttTgccga PI88788 538 13 ggtttGgccg BSR101 539 14 tcaattCtcttag BSR101 540 14 tcaattTtcttagt Proto 541 15 tttatttTTaaaaaaaBSR101 542 15 tttatttTAaaaaaaaProto 543 15 tttatttATaaaaaaaBurlison 544 15 
tttatttAAaaaaaaaPI88788 545 16 atmtGaaaattc BSR101 546 16 cattmTaaaattc PI88788 547 17 mcaatTtgtcat PI88788 548 17 ttcaatGtgtcat BSR101 549 18 acagaaCtcaac PI88788 550 18 acagaaTtcaaca BSR101 551 19 ctatTgaAggttt PI88788 552 19 tatGgaGggm BSR101 553 20 gactaaAgtgag Burlison 554 20 gactaaTgtgag BSR101 555 21 taacacTtaacac Burlison 556 21 taacacAtaacac BSR101 557 22 gtaaagttcaag Burlison 558 22 taaAGTTagttcaa BSR101 559 -23 agaaaaCaactgt BSR101 560 23 tagaaaaTaactgt Burlison 561 24 gccAAAACAGTt BSR101 562 24 caagcctaatag PI340.046 563 pK p a actata ttcgc . 564 ( 1 ) 1 actataCttcgc Lincoln 565 p 00 p a atgcc gcatg 566 (2) 1 atgccTgcatg AKHarrow 567 2 gtgtatGgttgt BSR101 568 2 tgtgtatAgttgt AKHarrow 569 3 gcatGtcgac ~ BSR101 570 3 gcatCtcgac AKHarrow 571 4 aaaaagTatgagg BSR101 572 4 aaaaaagCatgag AKHarrow 573 5 . gtgggcattag Williams 574 5 gtgggCcatta Mandarin 575 p p a 064 tgtgaa attagc . 576 ( I ) 1 gtgaaCattagc BSR 101 577 2 tgtatcGtaaatc BSR101 578 2 tgtatctaaatctt PI437.654 579 p a taaaatt ttggtt 580 P
(1) 1 taaaattAttggtt PI437.654 581
† Soybean populations used to confirm the map location of each marker by ASH were (1) PI437.654 X BSR101, (2) Bell X YB17E, (3) B152 X Century 84, and (4) Shoemaker and Olson 1993. ~ indicates the marker's map location as an RFLP or AFLP marker, not confirmed by ASH.
‡ Upper-case letters show the polymorphic nucleotides in soybean.
Table 3. Forward and reverse PCR primer sequences for soybean marker loci.
mp icon ~'F"o'rw~ runer everse runer pnausi3uagaaggttgtaagagtctcggtcgtcg caaacgctcccaacagcucagaatctc p ggaatcaacttcacgtgagtgggagac 8 BggaB~B~aB~tgac p U6 tggtggctatggaaatctcatgtgtgga ctctcatttaccaaactccaacaatgatcacc p gaagaagattccacccagatcatcatcagtag cacactggtagatgggaagcaagaatagg p gcagtgcagctccaatcggtgtaac gtgcaatccaagacatctggacggac phalOti~cagctaaaccttacaaggatgattggtcaag ccctggactgaagngccataa"tgtatc _"
p a gctggttgggagaaagcacttcc gtccagggctgtgaagagcaatg p a gagggactatgtgaaatggagaggagtg gtatgctaaaagaggagacttgactggtgag p gatgaaggaaccaacacttgcataacaatttg tgtcagcactcctcacccatttgccga p a gagaacaaggacaaatcaataggtgagacgaagaaa cacttaacaggagtgcccctgatcaccag p a ctcatctgctcagaaccttcagtcagtc cggatcatgtctagtacactagagatgcttgtg p a ctacacttctaatgcctatttaggtgtgcttg gtcatatctagggagatttctaaccagttgtc p a taggcagcgtgacaaactgagcatagg gggttgat tcc at g g gggtaaatgaagttg p a gcactaatactttagttgactmgaggtggagag gccatgtggtagaagtatatgaaaaggtagatgacag
p a gaatatactagcttgatgcctatttgtttctaaaccc agcagtcatacaatgctctttattgtggtgaag p a catcttgaccagccaccaatctgagtacag ggcttcagtgagaaaggtggatcaaatgga p a ccaatatggaatgcagaaagtggatctatgttcag tcggtgtgttcaactatgactttggactg p a cccccaacaaaactaaaaatagaaccctcaacaacc ggtccaagaacattatgatcttgaacaaccttcac p a gcagaagaagaagataacgtacaagcatcaatcaagc caactcgtgcgatcctgagaaaactcc p cattgggcatgattcttgaatagcctttttacc tggccagagcatcaacacatctataccttc p aaggtcggcttggtggttaaaggcag gagggctatgttttcttctccagatgtgag p cgcataaccaaaacagtaacatcaatggaacag ccaaggaaggttggggatatagactagtg p a gaaactaaccttctgagaaaaccacacgttg gccttttatgcacatttttcctggggatctaac p a caaaaatagcactggcagtacctggaacac gaaaaggtttgttatgcttcgtactctgtctc p catatgtgctcagtgcctcc taccaattcaacacactgctac p a gatcgctactatgtataactatttgac tcctccgttactccaatcag p a gattacattaattaccgctatgactatatcttgggac gctaccactccattgcttctatgtattggtc p gtctcgctaatgggagtgaaac actagggtgtaaggaaaaggattc p a gagattggaaattgtagctctctttacttgctg cmgaggactcantggttgttattataggcamgg p a gaccgggctcgtaaaagcaatgggaatcat cctcctacagaattgtcaggcnatcatcg p a ctmcctacaggattgtcaggcttatcgtca gggctcgtaagtgtaatgggcatcag p a tttatagtgtcggtgcgtagactctgttactc caccaacttcatgcagctaggtcccaa p a ccaggtgtgtacatataaaaccaaacaggctc gaagagtaacagagtctacgcaccgac p ttatgggtttccttgtgttgcncttttctcc cccaaccgcttcctatatatcaaaatgatttcac
p a gtgcaatctcatactggcttttcacgatgatag gctctgccctttgtcaacatcctaatgg p a gggaagaagaagaacactcggtacagtag aagctaggaaatccacactcaaattatcgacttgtgt p a cctgcttcttaatcttgttattctg cgaacaatatgatgagttctacaaag p a cactaagaattcgctcgctgta catcatcctgtccccctcct p a gaacracatggttattttatatgctac gatcctctccaatggcacacc p a gatgggaagatttacttctggtagac ggctctaaaccagttttattgacaag p a gcmgcatgaggattgataaggca gcugacatatagaaaatctgtgactga p a caccatcccttatttaattagtgcttaaaagattcttc ggagggtgcttatgtaaatgatgtaaagascat p a ctgcmtggactctgttgggataaacttctc catgaagctccaccamgctagtacatgaaac P caattcatggtttctcttat gtaacgaccacggttatc 40 p a gagttgcttcaaugctcatac ccuagaggcgagagtgtgg p a cttgcctcgcaccttctc ctagccttctccacaugttc P gattcggtagagtaccagatgat agcaaaacgccaccamcc p a tgttggcaggatctttgtgac atgctggaugtcttttctaccc P a gctgccaggttaagtgtttc cagattgacacgactggaag p a t gcaggacattcacagtcattg ctgctgtattacgttccattcttac P cagggtgggatgtccaatg c agactagagtatgaataatctacctc P a ctgaaccacttaaagagtcaacagc ctctcccatagatgacaaag P a gaatttgatcactcagaaaagggt atatgaatatgaatcaaccg P a a acggttgaccactacacag a gacgcaggatatggtgaag $
0 p a gacaacataaaagagtctataagacctccaagg taaatgacgagtatttatatgaagtgcagg P g ccgtgtaagcgtgtuaccaatctagttgc gctactgcaagttatcagtcaagagattattcc P a g tgctcaucttcacagagaag c aaactt t tt g B B ggBaB~ag P a c tcatgtaaccaactctctatgaagmgagatccac tctaatcggatttggtgtttcacacggtaag P a g atggctgtcattgccacagaggagtatcg tgactccaaaggaaagagasatgtttcaaaatcatc $$
P g gatagcaagtcaatttcatgccttgtgatagg caggacatgaagatgtacttagtgaatgtgaag P a ctggaagagggtgcaagggaatctgg agaacctagtctaccaccataccacgaac P c tcagtcgcattgcattaacatca caatcttcctccaccttctc p a caacgatctctccatgtccaac cggatgtaagtaatatgttatgttgg p a cacgggtacatgcatccttc gagaaattnagtggcaatgagtc p gccctacaatgacraaattacgtgtg gactgcttggctmgtctttcaagaagag
Table 4: Sequences of nucleotide polymorphism regions:
pha08230 (SEQ ID NO:712) TTGAAGGTTG TAAGAGTCTC GGTCGTCGT TGTTCACCAA GGTAAGAATG
GCAGTCCCT GTAATTAACA AAACAAATTC AATAACATAA TTTTATGAGA
AGTAAAAGCA TGTGATGATG TTTATTAATT AATTAAATAA GTGTCATAGT
AAGATTAGTT TTTATAAAAT T[G/A]TTGG TTTTGTI"I'T'T ATTTATTTAA TAATZTI"T'CT
TAATCCGCAT AAAATAATGA CATTTATTTT ATTTTGAGAG GAAAGGAATA
CTAACCGTTA AGGATAACGA TGAGGTAATC AGCGTCAGCA TGGTGGGGGA
GAAGAAGGGT GTTGGGTTTG GAGTTGAACT CCAAAATGCG GTAGTCTCG
GAGATTCTGA AGCTGTTGGG AGCGTTTGT TGAACCTCTG GAGGACGCGA
ACGTGGCCAT ATTGGTI7"I"T GAAGAGAGTT TGGAACCTTT TAGAGTTGAA
GTGAAAAGGG TTCTTATTCT TATGTCTTCG TGGTTCTCTT TGAGACTCAG
AACCTTCACT TTCTTGGCTC TCTTTGTCTT GCTCCTCATC CTCGTCTTGG
TCTTCTTCTT CTTCTTCACT TTCCTTTCCT TGGTGCTITT' CCTGCTTGTG
TTGCCATTCG TGCTTTTCCT CTTCCTTTTG ATGAGGTTGG TGTG
pha10598 (SEQ ID NO:713) CTGCAGCTCC TTCTGCATCT TCCACCTCAA TAGTGATGGN TAACTCTATG
GAAGATGCAG TGGTAGTGTG GTGACCAGAT GTTCCTCCAC AATAATGCTC
CATTTCTTTG ACCTCAGGGA TTCCAGTGAT CTCCATGTCT TGACCTCTGC
CCACATTCCA TTGTCTTGAC CTCCTCCCAC ATTCCATTGT CTTGACCTCC
TCCCCCTCTT GTGATGTTAT CAAAGGTGGC TCCTCGTTTG TCCAAGAATA
AACAGCTACA TACATCACTG CATTCTGAAT CTGAAACACT CTCCTCTACT
CTTGTTCACA TGGTCTATTG AATGGGAAT CAACTTCTTC GTGAGTGGGT
TGTTCATTT TTGACCTTGT CCCCAAACTT GATGCCTTAA TTTTGCATCC
ACCTTTCCAC TCACTGCATG ATTTCTTCAA TTCCCCTGAC CTTGATATAG
AGGGTTGTGC CTCTGCCCCT TCTCAGGAAT GA[C/T]TTT GACTCTTCAA
TAGCTTGTGC TGCATTTGCT TCAACTCTTA AATGTTTACC TTGTCAGAA
CAAGGCTTCA ACCCACAAGA ATTGGCTAG CTCCTTCAGG GACTCTGAAC
TCCGTTGGGG TCACTGCTCG ATTCTTGTTT GGAACTAAGC ATTAGCACTG CAG
pha10615 (SEQ ID NO:714, 715) CTGCAGAGAG GAGTGGTGTT CATGCTTTCC CTGCTGGGGT GTGCCCACTG
TGGGTGCTG GTGGCCACTT CCTGGTGGT GGCTATGGAA ATCTCATGTG
TGGATAATA TCATTGATGC AAGATTGGTT GATGTAAACG GTAACATACT
TGACAGAAAG TCAATGGGGG AAGATCAGTT TTGGGCCATA AGAGGAGGTG
GTGGGGGAAG TTTTGGTGTC ATTC[A/T]T TCATGGAAGA TCAAGTTTGT
TTTTGTGACT CCAAAAGTGA CTGTTTTCAA AGTGATGAGA AACTTGGAGT
TGGAAGATGG TGCAAAGGGT CTTGTTTACA AGTGGCAATT GATTGCAACA
AAATTGCATG AAGATCTI'T'T CATAAGAGTG ATGCATGATG TGGTTGATGG
CACTCAAAAT GCCAATAAGA AGACCATTCA GGTTAC'ITT"T ATTGGTTTGT
TCTTGGCAA GGTGATCAAA TGTTGGAGTT TGGTAAATGA GAGTNTCCT
GAATTGGGTT TGGAGCAAAG TGACTGCATT GAAATGCCAT GGATCAACTC
CACCCTTTAT TGGTTCAATT ACCCAATTGG GACCCCCATT GTTTGATGTG
CCCAAAGAGC CCCTTTCACA TAGCTTCAAA ACCATGTCAG ATTATGTGtIA
GAGGCCCATT AGAGAAACTG CTCTTAAGTC CATTATGATT AAAAGTGAGA
GAGAGTGTGA GGATGGAATG GAATCCTTAT GGTGGAAAGA TGCATGAGAT
TTCACCATCA GAAACTCCAT TTCCTCATAG AGCAGGGAAC TTGTTCTTGA
TTGAGTACTT AACATCTTGG GGGCAAGATG GGTTTGGATG CAGGTAATCC
GTTACCTAAA CATTTCAAGG TCA...(- 1300 bp)...AACATTAAGA CTATATTTAG
ATT'IZ'T'1'T"TG AAAGACAAAA GAATATTATT GATGATGGTT TGGCCCGCAA
GGTGTACAAA AACATGAGGA TAACAGTTTT TCCTCCCAAA ACAAAAAGCT
CGAGCACCAT ATGACTGAAA AGTGAAAACA AGACATATAC AATGCAAGTT
AAATTTAGAG ATATATCTCT AAATGCTGCA ACTAAGCCCG GAAAGTCAAA
CAAAAGCTAA CATATATCCA TCTGCAG
pha10618 (SEQ ID NO:716, 717) CTGCAGCACT CGAAAGTGAC CTGCATCAGC CTTCAACCTT CACCCTTATG
CAGCAACAAC CAAAACCATA ATTTAAAACC TATTTTGAAA CATATCCATG
TACATTTTCC CGCTATCAAG CCTGTTCTTT TAACAATAAT CATTATTCTC
TTGTCCAAAA ACTGGAAGCA ACTGAGATCA AGACAAATTG AGTGCAAAAT
CTACACAATC TATGTTCAAA ATATTGAGTG CAGGTCTCTT TCTTGCAATT
CCAAGTCGTA TCTATTGGGG TCACACACAA ATGGAATATC ACTAAAGCTA
TCGAAGAA GATTCCACCC AGATCATCAT CAGTAGCTA TTTGTGTCTG TTGATCCTTT
GTGGCTTCCC CATCA[T/G] CTTCTTCTCC [A/C]GAAAC ATTTGAATTC TTTTCAACT
AATTCCCTAA ATAACGA...(-- 410 bp)...TGATGGTATG TGGAGATTTA
ACTTACTGTA GGTTCCAAGG TAGAGATATT TGTTTCCCAA AAACCCTT_C
CTATTCTTGC TTCCCATCTA CCAGTGTGG TGGTGCCTGC AAAACATTCA
CCATGTGTTA GTGATTCATC TTCGCAAGCA ACAATTCAGA AGCCAATTCG
AGTCGTATTC ATCAGAAATA TCCACCTCGC AACACCCCTA TACTTTGATA
CACCTCTTGA AAAGCCACTG CTCTTTCTGA TCAATTTCAA AATAACCAAA
GTCAAAAAAT CATATATAAT ATCAAATTCT GTACAACTTT ATATGCTGCA
AACACTACTA CTATAGTGCT ATTGCTATCT CTTATAAAAT ATAAAACTGC AG
pha10620 (SEQ ID NO:718) ACTGATGCTG GGCGTGTTCA TAACCTTGCT CCCATACTAA CAGAGCATCT
GTTTTCCTTC CCAATGCAGA AAGGGCGTGA CCTAGATGAC AGAATCAACC
AGTGCTAAGT TCAAAAGACA AGGCATTTGC CTCTATGACT CTTTCTATGG
TAAAATTTAA ACAAAACTGT CGCGTGATGA TAATAATCCA TAAACAGAAC
GAATAAATGA TAAATCCATC AAAATATAAA TTCCAAGTTC CAACACGGTT
CTTAACTTTT TATTCATCAA CCAGTAACCA CCTCTGTTAT TCTTAAAAGA
ACTTAAATCG CACTTCCCAG GGTACAAAAA GTTAAAAAGA TTCATCAAGA
ATTGGATGAA TTTAACAAGC AAGAGACCTT TGAGAATGTA GGCTTGAAGT
CGCGAGGGGT CAAGCTGAAG CGCTTTGTTG CAATCCTTAA TTACGTGCT
TGTGCAGCTC CAATCGGCTG TAACAGAAG GCCCTGTTAC TGAAAGCAAG
CATTGAACAT CACATTAGCA TTCACGGAAG AGAATTAAAC TTAATCGTGA
T[C/T]CGTG ATGAAAAAGA AACAGAATCG GAGAGGAATG GAGGTCCGAA
CCAGATGTCT TGGATTGCA CCAGACTGA GAAACGAGCG AATCGAGGAC
TCGAATGGCC TTGGACCAGT CCTTGGAG
pha10621 (SEQ ID NO:719) CAGCTAAACC TTACAAGGAT GATTGGTCAA GAAAAAGGA ATTCCCAGAA
ACGAGGAACT CACTCAGGCT ATGATAGTGA TGATAACATG TGTGAACAAC
GTTCTCTTTC AAAGAATAAC ATAAATGCAA TTCGTCATTC ACAAGTAGGG
[T/C]CTCT GAATCTATTT TCTTCATACT T[C]GTACTC TATGATAATC ATACTTTCC
TGTTCCCTAT CTTTCACAAC ATAAAATCTT GAGGATTAAG TGGGGAGAT
TTTGGCTACT ATTTATATAT CTAATAATC ACTATGAATA TTGAATATGC
AGTCTGGTGT GTGACACACA TTTGGGCTAA AACCAAACTC ATATTTAGTT
GAAAAAAATA AGACTCGTGG ATCTTCTATT TATAATATGG GCATACATTC
TTATATGGAG TA[TAATCTT TTCAGGTTCC ACGAGTA]AT TGATACATT
ATGGCAACTT CAGTCCAGGG ATATGCTGC CAGCTAT
pha10623 AATT GAAATCTATT
ATCAATGGTG AATTCCCTG CTGGTTGGGA GAAAGCACTT CCGGTGAGT
AAATTCTAAA CTCACAGGTT TTTCTTACAT GTTTGTAACT TTTCAAGGTT
TGGATT[AC/ TT]ATACTAT CCAAAGTTGA TCATGA[ATC TGA]TTTCAG [TT/CAAGTA
TCA]TGATTT GGTAGTATGG GGCTTCAAAA AGTTATTCCA AATATTAAAA
CCTGGCTTCC AATTTGATTT CAGACATACA CTCCAGAGAG CCCAGCGGAT
GCCACCAGAA ACCTGTCTCA AACAAACCTT AATGCCCTTG CAAAGGTTCT
TCCCGGTCTG CTTGGTGGCA GTGCAGATCT TGCTTCTTCC AACATGACCT
TGCTCAAAAT GTT[T/C]GG GGACTTCCAA AAGGATACTC CAGCAGAGCG
TAATGTTAGA TTCGGTGTTA GAGAACA[C/ T]GGAATGGG AGCTATCTGC AA[C/T]GG
CATTGCTCTT CACAGCCCTG GACTGATTC CATATTGTGC AACCTTCTTT GG
pha10624 (SEQ ID NO:721, 722) CTGCAGGTAC TTITTATGAA TCAACTGCTT CTATTTCACA GCTTGTTTTC
ATTTATCTCT TCTAAAAGGG GAAGAGGGAA AAATTGTATT ATTTGCTAAA
ATATGATTTT CTTTTATCTA GAGGGACTA TGTGAAATGG AGAGGAGTG
CTATTTCTTC GGTTCCTI"I"T GTCACTTGTT GACGAATCAG AAAAGATAAG
GCAGTTAGCG GATTTTCTCT TTGGAAATAT TTTGAAAGGT TTGAA[G/T]
CTTTATCTTG ATAGTTATAC TTGCCATTAT ATATAAATTG GAAAATATGT
TTGTCATTGG ACTTTAATTA TTGTGTZTI"T CTCACCAGT CAAGTCTCCT
CTTTTAGCAT ACAATAGTT TTGTTGAGGC TGTI"ITTGTT CTGAACGACT
GTCATGTCCA TAATGGGCAT CGTGAGTCTC AAGGATCACG AA...(-- 970 bp)...AAATCTTGAA AGAATTAATA ATAAAAAAAC TATTTTATGG GGACAAAATA
CTTATTTAAG CCTTCTAATT ATAACATTAA TTGTGGGATC AGGATGCTTT
TCAAATTCTC GGCTGTAAAG AGATACGCAT TTCATCCACT CGTGCATCAT
CTGAGTCAGC AGATGTAGAG GAGGpAGGGG GAGA
pha10632 (SEQ ID NO:723) ACAAGCAGAG TACACCACGT TAGCAATCAG CAATCAATGT CAGAGCACAA
GAAACTGAAT CTAGAAAGAC CCTATCAACT TTCCTGTTTC CATTGCATGG
CCACAGATTG TGTTTTAATA AAGTTCAGTA ATTGATCACA CTCATTGTTT
GTACAATCTA TTATTCAAAA CTACTAGGCC TACAAATCAT GTTTAGCAGA
ATATACACTA AATAACCCAT CTGACAGACT CAGAATGCAA CTAAATGAT
GAAGGAACCA ACACTTGCAT AACAATTTG ATCACACGGT GTTTCAAATT
TCACATTGTA AAGGTAACCA AAAAAGAAGA AAAAGAAT[A /C]TTCCTAG
ATGAAATTTA CAAGCAAGTG TAAAACAACA CATGGTTAAA TGTGT[GT]T
TGGZT11TCT TTGTTATTAT TATTATTTT'T [T]AAATCG GCAAATGAGT GAGGAGTGCT
GACA[A/G] CACACTTATT ACTTGTTAAA ATTTATGGAA AACTAAAAAA
ATCATAAGAT TTAGTGAGAC TAACAAAATT TAGTCAATAA AAAAAA
pha10633 (SEQ ID NO: 724) CTGCAGTGGC ATTAGGCTCA TNATCACAAA ACAAATCCAT AACAGAAAAT
TACCGAGTCT GAGAACAAG GACAAATCAA TAGGTGAGAC GAAGAAAGN
AAGGACAGGC AATTNATGAT TTTTAAAAAG GAAAAGCAAA GATAGATGTT
AAATAAATTC CAGTGTTGTG GCCTCGANAG ANAAATTGCT AAGATAAAAT
CTAA[A/C]A TTTAGTACTA TAGCAAGAAA CACATCCCCA ACATACGTTT
GTTGAATATC TATGATATAC TTCTTACATT GATCAGTTAC TGTCTTCCAT
TGATTTGGGG TCCATTAAAA GCAGTATTC TGGTGATCAG GAGCACTCCT
GTTAAGTGG GGAAAAAAGT ATCAATTCTT TAAATGCTGA
pha10634 (SEQ ID NO: 725) TTCTATCATT AGCTCCGGAA TTGTTAACTC TTGATAACAA GTTTTCTGTA
CTTACAGGTG CAGCCCTCTT GCTGTCAATG CTGCTCAAAG TTATCTCTTC
CCTTGGTCCC AGTTGAGTAT TTATCAGTTT TTCCTGAATA AAAAGCAATC
AAGTATTCAA TATCACAGAA TACCTATGAA ATGTCACAGT ATAAAGGAAT
ATTATGGGAA CAGTACCCTC AATGCAGAAT TTTCCATCC TCATCTGCTC
AGAACCTTCA GTCAGTCGA GTTATTTCTG ACTTCAGAGA CACATTCTCA
GCAGTTAACA TCTCAACTTT TCGTGCCAAT TCTTCAGTCT CGGCCTAAAC
AGAGGAAATA TGAATATTAG TTATTAACTT TCCTCTTGTC AGAACAATCC
TTTTCCTTT TTAGTTATTT TATTTTATTT TTTGAGGCTA CATATGTGTA
CATGCAACAA GTGAATAAAC ATGTGATAGT AATAAAGAAA CTAAACAATT
TGTCTTTT"TA TCAGACAAGA CACGATGGAA GGTTTGCAAG TCAACAAACT
AAAT[T/G]T ATACCTGCTT CCTCAGCCTG GACCTTCTAG CAGATTCACG
GTTAGATTGT TTTCTCCTCT CCCGTTTCAG CTCACGCTCA TTCTAAGGAA
TTAAAAATAC ACACAGGAAT TAATTTACTG TCAGTAAAAG ATGGACGGTA
TTGCATAATT GTTCTAATGT GTTTCACAA GCATCTCTAA TGTACTAGAC
ATGATCCGA GACATTAGAA CAAACCATTA TACCTGTAAC CAAGTTTCAT
TACGCACAAC TGCACAAGGT TGTGCGGCAC TTGTGGAATT TGCCTTAGAA
TGAACAGTCG AAGGGTTCCT CAGCTCCAGT GCTGTGGCCA TACCTGAAGA
AACTACAGGT CCAACTAATG TTCCTGCAAC ACTAGCTGGG TAGCTGACAC
AATCTTTTGG AAGATGCAGT CTCTTGGAAG CCCGGACCAT TTGTAGCTCA
GTTTTCCCTT CTGCATCTGC TTGTACAAGA TTGATATAAT ATATAACCCT
CAAAACACAT ATTATGTTGA CAACTTAATT CCTAAAATTA ATAGCTGAAG
TTACTTAAAT ATCACATTCT GCATGCAAGT ATGACATTTT CTTGATAATT
GCATACAAAG AACACCAAAA GCTTGACTGT ATTAAGTCTG AAAACCAGGT
AAAAACCCAT TCAAGTATCC CATTTGTCAA ACTAATCAGA AAGTTAAAAC
TTGGGTAATT TCATTAACTG TGCATAAGTA ATAGGATAAT ATCAACTTAT
CAAGACCATA GTACCATACT CAATGATCTA GTTGACAACA TTGCACAGTC
ATGACACCAA AAGTATAACT GTAATCCAAC ACAACTCTTT TGTTCATCCA
GATGCAACAG TGAATAGGAG TCATACCAGT GATCGGTGTT CCTTCTCGGC
TTCTITTCCT TTTTGTTTGA TTAGCCTGAG CATAAAATTA ACAAATAACA
TAATTTAGTG CTTCAATTTG TCAGATCATT TAAATAANAT AGACTGCTAG
TGCACACAAA CAACAGATGT TAATGGTGAG AAAGTTCTCA CCCTGCAG
pha10635 (SEQ ID NO:726) CTGCAGTTTT TCCCAGAAAA AAAATGGTTA CGATTTAGAA CTAATGATGT
AGCACAAATA ATTGTTCAAA TCATTATGCT TCCAGCAACA AAAACTCAGA
TGCAACAATG GAGATGCCCA TGGCGAATCA AAATGCACTG GAAAATAGAT
TGGCACGATC ATAAAAGATC TTCTACACT TCTAATGCCT ATTTAGGTGT
GCTTGATAA AGATGAATTA GG[G]GGCAG GGATAGTTTT AAGAAATGAA
AAATGTTAAC TTGTTACTAT ATGTA[A/C] CAAAAGTAGA AACATCATCA
TTTCATTACA CCAAATGTTT TCATCTCTAT TTTAAGATGA CAATGAAGTG
AACATATAAT AAAAGAGCTC AATTTCGATA CACTTGGTAT AAAAG'ITITI' ATATTGACA ACTGGTTAGA AATCTCCCTA GATATGACT AAAATAATTT
TTACAAAAGT CAACAATTTT TAATTATGTG ACAATTTATG ATTGGATAAC
AGTGTAAAAA GAACATTTGT ATTGTTTGTG TATTTCCTAT TAAATTCATA
AAATAACATT TCTTTCATGC TTATI"I~'AT G'ITI"1~TCTTG GAAATACTTT
TCACCTATCC AACCATGCAC ATTTI'T'AACT AAAACCAAAA ATAGAAGCAA
TAACATACTT GAATTGATTG ATTCCATCAG CAAGAGCCGA ACCAAAAAGG
ACCATAATCT TGTCTGAGAT GCCAAAGGCA ATATTTTTCC TTGACACTAC
AGCAATCTGC AG
pha10636 (SEQ ID NO:727, 728) CTGCAGATTC ACTTCAGGAA TGGAACAAAA ACCACCCAAC CAAGCACTGC
TCACAAAAGG AATTTCATCT GTAAGTCGTC TAAGTCCACA TTCTCATTTT
TTAZTI"I"TTT GCTAATTATT TTCAACATCT TCAAGGTTCT TTTGTTTTTA
ACTAGTAATA GCGCCACTAA CTACCATTTT TTAAGGGGAA AAAAATTGGT
TACTGTGTGT ZTI'T'TCTTCT GTGGTCTTGA TCCAGACTAG TTTACATGTT
TTGGCGTTTT GGGTGTTGCA A[T]TTTTTT TTTGGAAAGG TTGTTATAGT
GGACTAGTGG CCATGGATGG TATTATGAGA GTTGATTAAG TAGGGGAAAA
AAAGGGTTTT TGTGGTTTTA AAAACTAGG CAGCGTGACA AACTGAGCAT
AGGGTTCCA TTATI'T"1"TAA CT~TTTAT TGAGAAAAAA AATAATATTT
TTGAGCCGTT TGGTTCCACT ATA[C/G]TT CGCCATCATG AAAAGCCGTG
TCCTTTGTGG CTTTCTTACA ATTCTACTTT ACTCTTACTC TACTCAAGAG
TTAAGATTCC TTZ"I'I'CAAAG CAACTTCAT TTACCCATCG GACATCAA C
CTATTTATT AACTATGTTT CCATGTCGAC CTTTGATGTA CAACACAAAA
ACAAACCAT AGCCATGCCC TTGTATTCTT GGCCAAAACA AGAGAGAGAA
AGAGAGAGAG TAAAAATCTA TGCTTTTCCT CCTCGGGAGT ATCAAAATGT
TTGTCGGACT CTCCCAAAAG TCATCAGTCA CCTI"I"TAAAC TTTTAAGATA
TACATGATAT TATCCACAAC AACGGTAAAA TTATGCATTG TATGTCGGGT
TACTCCACTT AACTATATTT TATTAACTTC TAAAAAACAC TATTTACACT
CGTCACTTCT TTACATCGAA AATCTGTAGT ATTATACATA CATTATAGTA
ACAAATTTCA CATTAAAAAT AAGAGTATAA TATAAGAAAG TATGCACCTT
TAATAACGCG TAAACGTGAA TTC'ITITI"TG GCTTTTGAAG CGTTGAGCCA
ACCTTTAAGC AAAATATTCA AATTTAACTC AAGTCATTGT CGGTAGTTAG
CTTATTTTGG TTATGATATG ATAGCCAGTT TAACGACAAT TCAAAATGCC
GACAGCATGA AA.....TCC AGTTTGGAAT AGTGACCAAG CCNAATTATA
ACTTTACGGA TCATACCAAN CCAAAGAACA TTGAGGTTTA ATATATATAT
ATATATAGCT TATGTCACTT TAATCAAAGC AGCAATTGCT GAAATTGCTG
GAGACAAGAA GTTGTTTAAT GTGGTGGCTT TTTGAGAAAT AACAGACAAC
CCCAATCTAA AAAAGTCCAG GAAGATATTG TTTACGTGTT GGGATTGAGA
TTGGAAGAAG AAGGTGAGAA TGTAAGAGCT GATACAGATT AGACTTGAAT
AGGTTGGGGA TTCCACTTGA TATATGGTGA TTATAAGGGC TNCNAAATTT
TGCTAACTTC AAGAAATAAA AACGTATTAA CTGATAAAAT GGAAGTTAAG
TCAACTTTNC TCTGCAG
pha10637 (SEQ ID NO:729) CTGCAGTTAC GAGTAATTGG GTAGGTTACT GCACTAATA CTTTAGTTGA
CTTIZ'GAGGT GGTTGAGAT ACTAGCTATC TATTAATCTG TTCTATAAAA
TACAGGCATA GCTGTTCTCT TGAACTATAG AATTTACATA GCCTTZT[C/
A]CCTATTAT TAATAACACC AAGACACACA TATACTGCTA GACACTTCTG
TATATGGGAT CTCTCAAAGA TCCTTTCATC TTTCTACATT TTATTTGGAT
ATGGGAATCA GGTGATCAAC CCTAACAGTT AGGGTTTGGA TGTGTTCTCC
AACCACCCAT GTTTGGTTGT GTCCTATAGC TCACACATTT TGAGACCTTT
TATCCTTTGA AGTTTGATTG ATTCCAACAT TCAGATTGTC TTTCCTCGAT
CAATTGGCCT ACTCAACCTT [T/G]CTGT CATCTACCTT TTCATATACT
TCTACCACAT AGC
pha10638 (SEQ ID NO:730, 731) CTGCAGCAAC AACCCCCAAA CCCCGTTGCC AGAAACAAGA CAATTTGTCA
AATGCAGAGC ATATATAAAA AATAGAGAGT AGGATCATCA CCTTTGCTGG
TTACTGGAAT ATTTTCTAAG TCCATTGAAC TCATTTTCTC TGATGCAGAG
CTCAAAGACC TGAAAAGCCA TTGTTTGTAA TAAGAACAGT AGACCATATA
TTTACTACCA TCAAGGTCTA AAAACACAAA ACAAAGTTCA CCTGCTAGCA
CCCCAAACAT TTGAGTTGCT TGATGAAAAA TATCATCTAC AATTGAATA
TACTAGCTTG ATGCCTATTT GTTTCTAAAC CCATATCCA AATA[C/A]T
AGTACCTCTC ACAATAGCTT TAAAGTAAAC ATGGAAATTC ATCAACCCCT
AAAT[G/C]C TCAACCTTAA TTGCCTCAGA GGCCATTTTA TTAATATAGA
TGGAAACCAG AAACATGTCC AACTATTACC TTGAAAAACT ATCATATTAG
CAACTAAGGA [G/A]GTTGC TGTATTTATC CTCT[T/C]A ACTAACCTGG
TTTGTGAACA AAGAAAATTC CAATTTTCTT TCAACAATAA AAATTCAGTT
TGGTTCCTAA TCACATTATG [T/C]AAAAG AGTAAACAAA ATGAGTTACA
AATAAAAGGT CAAATTTTAT CAAAAAATT CTTCACCACA ATAAAGAGCA
TTGTATGACT GCTCTTTTA AATTAATTTT CAATCCTIZT AAGTTGCAAC
AATGATAAAG TATCAATTAC CTAAGAGTTC CAGGGAAAAT TAATAAGAGC
AAATAATTTA AGGATAATAG TTTATATGTC ACATACAACT AAAGCAGTGA
TTCTTAAGGA GGTTTTCCAA GATTGCACTA TGGTTTTACA GCTAAAGAAG
AGGTTATGAA TTTATTAAAC AGAAGCTGGG ACATCAATGG ATAAAATAAA
TGGATCCAGA AAGGTACAAA CCCCATTATC ATCTGTTTGG AAAAATTTGA
AAGCAAAATC ...(--- 50 bp)...CCAACC ACATATAGAC TGCTTATTAA
NACCCATCCA AGATTTCTAC TGAACCAAAA CGAATATGAT CCTAAATATN
CATTGAATTA ATCTTACCCA GATGGGGGGT CATTGATTGT AGTCCCCTTT
CCAAAAACCG GGGGGTTCCT AAATAAATTG AAAATAAAAA AGATTAGGAA
GCTAACAGAA ATTTGAATAT AACGCTCCAA AGTTCACACC ATATCAGTCC
ATATTCCCCT TGAAATAGAA AACTAATACA ATAGAAAGTG TCACACTACC
TCAAATTCCT CAAATTCTTT GTCCTCAATT GTTTCAACAC TTGGCATGGT
ACTAGGTGGT GGATGAAGCA AGTCTTGTGC TTCTTCCCAT GTAATTCTCA
ACTCCACAGC ATCTTCATTA TGTGTCAGCA ACCTCTTACT CTTTGGTGCA
ATATTTAGAG TCTTCTCAGA AACTGAAATT GTTTGCTGC AG
pha10640 (SEQ ID NO: 732, 733) CTGCAGA ATTITTCTCA TAATACAAGT TTAGAATTAGTTTCTCAAAA
AAACAAAAT TAAGAAAAGA AAAGTCAACA ATAAAGAAGT TTGATCCAAC
CTCTIZTATC ATGGACCCAT CTGCAAGCAT GTCCAATGGA AGCATCATTA
AATCACTTAA AGCATTAAGG AATTQGAAAG GTTTGAAGGA TGATTCACAT
TTGGATTCAT TGTTTTCATT GCTAACTTCA CGAGAGTCAC TATCATCAAT
GCTAAATAAA TCTGAAAGC CATCTTGACC AGTCACCAAT CTGAGTACAG
ATAATAAAA AGTTACTCAA GAGAACGAAC ATCATATATG ACATATATAT
CCTTCATGCA ATAGCAACCA AACTAATTTT ATTAGGTAAG GTTGGCTACA
TGGATGTATC TACGACATTA AGCTCTATCA AGCACCAAAT TTTCAGT[C/
T]AAACCATT TAGGGTAAGG TCTTACTAGA TGNTTTCTCA AGTTTTCTTA
GGTCTTTCTC CACCTI7TI"T GATTGAACTA ACTTCCATT TGATCCACCT
TTCTCACTGA AGCCTCTAC TGGTCTTCTT CTCACATGAT CAAACTAAGC
GAGTTTCCAT CATCTTCTCT TCAGTTGATG CTATTATAAC ACTGATGGAT
ACAT...(-- 610 bp)...ACATCCTAAT GCATATTGCT GTGGGTGCAA TGGAAGTAAG
ACTTACAGTG AAAAATATAT AAAGATTAAG ATTAATATTG TGTTATTTAA
AAGAAGAAAG AATGCATTAA ACAACTTCAG TTTCATAACT CACAGCATTC
TTCAGTTGTG CACCAGCCCC AAAGCCTGAT TTTCCAGCTG GAATAGGAAG
AACCATAGAA TCACTAATGG GATCTGATAT AGGATCCATG GGCATCTCTT
CAGCGGATTC ACGAAGAATA GCATTGAACA TTGCCACATC AAGTCTACTA
ACCAACTGCT CCATTACCTA CAACCACAAC AACAGTCCTA TTTCAATCAG
GCTGCAG
pha10641 (SEQ ID NO: 734) CTGCAGTCAA TCATAATATT CAACCATGAA ATCGAAGTAG TATCTAACCT
GTCTGCTCTC TTTTGCATTT CTGATGAGTC CATCAAGACA AGTATTGATG
ACCTTTAGGT CATCCTGAAA CTTTCTTTGC CTTGGGATTA TCCACCTTGC
CAATGGAATT TTCCAATAT GGAATGTAGA AAGTGGATCT ATGTTCAGC
TTCAAAAAGA GTGCCATAGA CTGC[C/T]T ATAACAGACT TAAAAGCAGA
GGATATATGA GCTCTCCTAA CCAAAGGTAA ACAGAACAAA TAAAAAAGTA
AGGTCTTGAA AATCATATTC CCAGCATGGC CTAAAATTTT GAGAATTAAC
TAAGGTGTAT TATGTACTTA TATTGATGAT GATAAGTCCT TZTI-TACCCA
CACAATTGTT TGTAGAAAAC TAGTAATTAT CTTCAACGTG TAAAATCAAG
AGATTGAGCT TCTGGAGTAC ATATTAATAA AGAAGTACAC TAAATAGGTA
GCTAGACACT TTTAAGGAGG CCAAATATAA GAAAGAAAAT T[A/G]CCAC
GTCTACTTTC ATTTACAATC AATACATCAA TATACACTTG AAATGCATAA
CCTTTAAGTC AAACCTTAAT AACTGGAGAT TCTTT[T/G] GTGACAGAA
CCAAAGTCAT AGTTGAACAC ACCGATCCC AATAATATCA AGAGCCAAAC
TAGAAAACTC TGCCTCAAGA TCCAATTCAA TTGAGTTAGG TCCATCATAA
CCCTCTCCTT CAAGAAGCTT ATTAAACTTC AATATTGTTC GTTCTGATTA
AGTTGTGAAT ATTTTGACCA TAGCTTCCAA GTATGAGTTA TGGAAAGCCA
GAGCAATGAC TGCATATTTT ATAAATATCA AGTAAAAGAA AAAATAATAT
TAGGTTACAT TCATAAAACA CTTGACATTA TTAGCACAGG AAAGACAAGC
TATTGAAACC ACTGATTATT CTTTAAAAGT TATTTAGACT CCTAAGTTGT
AACAGAATGC TGGTATTAAT TTAGTTAATT ACTGCAG
pha10646 (SEQ ID NO:616, 735) CC CCCAACAAAA CTAAAAATAG AACCCTCAAC AACC...(-45 bp)...
TCACACCAC ACAAAACCCA TATAATGGAG AAATTATTAC ATGGTCCAAG
ACACTTGTC TATAATTCTA ATTCTATGTG AC[AC]AACC GATTTTCTTG
ATGTATCAT GTGAAGATT GTTCAAGATC ATAATGTTCT TGGACCATG
TAAATAA
pha10647 (SEQ ID NO: 736, 737) CTGCAGAAGAA GAAGATAACG TACAAGCATC AATCAAGCT CTCACTGCTA
TTCCCCAAA ATATTTCAAA AAACAAAACA AGAAGAAGAA AAAACTGCAA
TGTATGGAA AAGTGGTTC TCTCGTTCAT CTTCGTGCTC TTGCTCGTCT
TCTCTTACTC AATTTTCATT GGCACCCTCG ACATTCGATC TTATTTCTTC
CCTCGCCTAA AGTTACCCGC GGCTGCACCT GCTCCCTGTG CACCCGAGCC
TCCTCTTCGG GTTTTCATGT ACGATCTCCC TCGCCGATTC AACGTCGGCA
TGATTGACCG CCGGAGCGCG TCGGAGACGC CGTCACGTTG AGGACTGGCC
GGCGTGG...(-255 bp)...AACTCGCC CAAGCGTTCT TCGTGCCGTT
CTTCTCGNCG CTCAGCTTCA ACACGCACGG CCACACCATG AAGGATCCCG
CCACGCAGAT TGATCGCCAA TTGCAGGTGC GTCGCTAGGT TTCTCGATTG
TTCCGAATTG ATTGGTGAAT CANTCAAATG CTATGCTATG AGAGTTATTT
TCACAGCAGG TTTTGCATGT TTAGCCTTAG AACGCGATTA GTATGATCAC
TTGAAGATTT A[C]GATACA GGATGCTTGT TGAACGACAT TAGATGCTNA
AAATCATAAG AAATTTGATG AATGCTAAT[ A/G]TTCACA CCAGTCTAGA
AGGTAGGAAT ATCCTAATG GAGTTTTCTC AGGATCGCAC GAGTTGAAC
TCTGCTTATG TACCTGCAG
pha10648 (SEQ ID NO: 738, 739) CTGCAGCTTT GTAAAAATGT TTATGGCCA TTG GGCATG ATTC TTGAA
TAGCCt'TITf ACCAAGAAA CTGCATCATC ATTCTAGAAA TT[A/T]GTA
AGTT1TGTCT GATCACTTCA CAAGAGACCT TTAAA[A/G] AAGATTTATA
TGGCCCAAGA ATTTTGGAAA TGCATGCGCA AACCCATAAC TGAATGTAAA
GAAATTTACT AGTTTAATTG TAGTTCTCGG CTGAGTTTGT AGCCCTGCCT
TTGTT~"I'GTT TTGTTTTAGA CCGTTTTGTT CGGGCCTCTT TTGTTACTCA
AATGTTCACC GTTATTAATG GATTTTATTA TTTGTAGTTT AATTTCTCTG
ATAACCATTA TTGTTATTAT TTGTTCTTAC CATTATTAAT GGATTTTATT
TTCAATAAAA CAAAAAAAAG TTGAAATGTG TGTGTGCGCT TGTGTATTAT
ATAATGTGTC TGAGCTAT...( - 15 bp)... AATACAG GACACGTTTA
AATCACAAGA TATTTATCGG GAAGGTATA GATGTGTTGA TGCTCTAGCC
AACCATGCA CATTAGATGA ACATAGACTT TATGATGTTT GAGCAACCTC
CTACTATGTT GAGCCATTTT GATTGCTGAT TTAATAAATT TGGACTTGTG
TCCATCCTGG TAACAAGCAC TCCAAAATTC CTTCTTCTCT TCCCTATTGG
GGCACTCCCT CATAAGTCAT GGAAATTTCC AAATACACCC CAAACTIZ"TA
CTTTTGCACC ACCTCTCTTA TITT'TAAATT GTTATTACAA AACTACCCTC
AATTTCTTTA ACCTGTCTAT TTTGCAGACC TGCAG
pha10649 (SEQ ID NO:740) TGATTTTCTT TTGGATCCTG AGAACAAGCC CCCCAGAATT TCACTCATG
CATGTGCCAG AGTTCAGGAA TAGTCTTCCA AGGGAAACAA AGAAGCCCT
CATCTTGTTT CTCAAATTCA TCTTCATTTG GAATCAAAGG AGGCCTGAC
TGACTTTGTA GTAGGAGGAG GCCTTTGCAT TGCTGTGCTA CAAATGGTG
CTCCTTTCTT CCTTGAAGGA GCTCTTGGAT CCTGAAAATG AAGTAAAAT
TATTATATTT TCTAAGGGCT ATAAGGTCG GCTTGGTGGT TAAAGG CAG
AAGGGGGGA GGGAGAGGTC CGTGGTTTTG AGTTTGGAAT TTCTCTT AC
AAACAAACTA ACAACTAGTA TTGACCGATA AAAAAAAACT GATATTTC C
AGTTTATATA CTAGATGTTT AATAGGCATC ACAAGTCAGA TGCAAAGTT
TTTATATTT GTTTACTAAT TAGAAATCAT TCTTAGTATA ACTTTTAAGG
TAATTGTTA TAAAAGTCAA CAAACTTATC A[C/G]A[G/
C]ATGA[C/T]GATTTGTGAT T[C/T]GATG AACAGTGCAA AACACTITIT
AACTCTCATT ACATGTATTAT TTATTCATAT ACTATAACAT GTTTGATAAA
CATATATTCT GTTCTGTGA TCAAGAATGC AGAAAACTAT TCAACTTTAG
GATATACAGG TTATTTAGGAC TTAATTAACT TAGACCAAGG AAGAAATTTA
TACCTCAACC AAATGTTTC TTTGAAATTT GTAGGTI~ITT AAACTTGTTA
CAATGGTTCA GGACAACATGC TACGGAAGGA AATAATTGAC AAGTGTTA_C
TCACATCTGG AGAAGAAAAC ATAGCCCTC ACCCTCCA
pha10650 (SEQ ID NO:741) CGCAATTAAC CCTCACTAAA GGGAACAAAA GCTTGCATGC CTGCAGCAAT
ATAACCAGGA TTCAGAATTA ATCTAGTTAG TATATCATAC AATGCAGGGC
CAGTTACAAT ACATACATA CGCATAACCA AAACAGTAAC ATCAATGGAA
CAGTAATAG GACACAAT[T/C]ATTATTA T[T/C]TTTT TGTTAAGGAA
ATTTCTAAGA AAAACACAAC CATTTGTACA AAAAAGGTAT TAATACATAG
CTACATGGAA GAAACCTACA TTA[A/T]AA TC[C/T]AGT AGTGAGAAAA
GATGGGGGCA ATTATGATAA TCTCGGAAAG CCTCTGCCAA GGGTCAGCAT
TCAAAATTGA GTTCCTTAGC CTGCTGTCTG CATATGCTTA TCCACAAGGA
ATATTGTCTC CGTGAGGATT AACCAAAAGC ATACCTCAAT GGGTCCAGAT
ATCCTGAAGA TAGCGCCCAA TTTGCTGAGC ACCAA[A/T] ATATAGGGCA
TCGGCAACGA GAAAGACTCC AATCTACGCC ACAAATGTCA AACTTGTGAA
TGTCAAGGTT AAGAAATAAG ATTTACAATT GAAGGTCTAC AGCAGATAGA
TTA[C/A]CA GGCGTGCAGA AAA_CACTAA TCTATATCCC CAAC CCT
T ,~G.CTGCAG GTCGACTCTA GAGGATCC
C CGGGTACCGA GCTCGAATTC GCCCTATAGT GAGTCGTATT ACACCCTATA
GTGAGTCGTA TTACGCCCTA TAGTGAGTCG TATTAC
pha10651 (SEQ ID NO:742, 743) AGCTTTCCAT GTTTATTCAC GGCGCCATTG ACAGCTTGTT TCTTATTT1'T
TGTGTTCTGG CCATCTTCTT CATCTTCACT GTACTCACTC TCATAGTCAT
CATCATACTC TTCATCCTCG TTGTCCTCAT CCTCCTCAAC TCCATCCTCC
ATGATAGGTC CTTCATCAGC CAAAACAGCA CCCCTGGTGC CAGCCCGAGT
TCGTTTCAAA ACTTTGAGCT TTAAATTTGT CCTTCTGTCA CAGCCCAACC
AAGAATCACA CAAACAATAG CAAGTCAAAT TGTAGTTACC CTCTGATGGG
GCCTGGAACT TGCCCAATAC TAATCTAGAG CCCCCTTTCA CCTTCTCAAC
TGCTTCTGCA ACTACCTTAC TGGTCTCCTT CACATTNGNT CCTGACCCCT
CCATAGACTC CTCAATTGCC TTAGATGCAG CAGTTACAGC AGCAGCTTCA
TCCATGAAA CTAACCTTCT GAGAAAACCA CACGTTGTT TGAAACAGAA
TCCGCAAGCA AAAACCAGTA GTTTTCTTCC TTATGAAATG GGTAGTAGGG
GGCATGTGGA AGAGCACCAA TCAGGCTATT ACCCCTI"ITA ACATTTATCC
AAGCATGGAG AGTCACAATG TCCCCCTCTT GTTTACCCTC TTCACCTTCA
GTCTCACAAG TTACCTC[G/ A]AGTGTCAA GGAAGGCATC ATGTCCAGTA
CCGTCTCAAT GTCTTCCACT TCAGCGGAAG ATAACCCACC TGTTTGAATA
AGAAGGTCAG CTCGCTCCTG AGAGTCCATG TCATGAAGTT CCTGAAATGT
TCTCACTTTC TGCATTATCA AGAAGACAAA AT[T/G]CTT AGTTCCAACA
CTCAATGTCA TGGGATCAAT TTGGAAATTC AATGACCAAG TAAAAAAATT
CAACATTTGA TTTTCAAATA GCCAATAGTC ACTATACAGT ACATACAACT
TTGGGTTAAC TGTGCATAAA AGAGA[T/A] GGCTATGCAG TTTACGAAAG
AAAAAAAA...(-- 40 bp)...CAAAAT CAGACATTAA GG[T/C]TTC CACTTTCAAA CTCTAACCTA
AATTAAAAGG GGGGAAAAAG ACCAATCAAC ATATCATAGA CGCATTCTTT
GTTTCTAGAT TTTCTTAATT ATTTATGCAT TTGTCAATCT TGAGGACACT
TAGCAGAAAT AAAAATTCCA TTAAATTGGA TCCAACAAGT GGCTAATTAA
CTAGTTAGA TCCCCAGGAA AAATGTGCAT AAAAGGCTT GTTGACATTT
TATCTGAGGT AAATCTTAAA GACCTCTAGA CCATGTAACT ACAGGTAAAC
AGCATGGACA TGAAAATCCA GATCCAGACC AGGGAAATTT AATCATTTAC
AATTGGATGG CAAGGTCATG AATAATGCTG ACATACTTGC GGGCCACCTT
CTTAATAAT
pha10653 (SEQ ID NO:744, 745) GGAATGACTG CAACCTGAGA AAGACGCCCA TTATCCAAAG CTTGACGAAA
TCTAGATTGC GATGCCTCAG CAACAGCTTT TACTTCTTTC TCGGGCACAG
CAAAGCATAC AGAATGCTCA CTACTAGCCT ACATAAATAC TGTTAATGAT
TAATGCCATT TCTTATATAT CAGCGTGGAC AACTAGAAAA ATTGAAAAAA
GTTATAAGTG CACCTGAGAT ATCATGATAA CATTAGCTCC AACATCTTT"T
ACTGCACCA AAAATAGCAC TGGCAGTACC TGGAACACC AGCCATTCCA
GTTCTGCAAA AAAGCATCAA AGAAAANTTT ATTGGAATCT ACAACTTGGA
C[T/A]ATTA ATATTGGTTA AAGAAAACCT TA...(--155 bp)...TGGGAATGTT
CCTTATCATA ATGGGTATGC CATATCGCAT CACAGGAATA ATTGTGCGGG
GATGCAAGAC ATTGGCACCC AAATAAGACT GTACAACAGA CAATTTGCAA
GTTAATCTCC TTAATTTTAC AAGCAGAAGG AGCATTGAGG CTTTCCAGCA
TCTAACTCAC CATTTCCCAA GCCTCTTGAT AAGACAGTGT CTTCCAAATC
ACAGCCTCAC TAACTGCATA [C/A]GATAT TATTGTTTAT ATTTAATCCA
AACATCATGT TATGGCAGTC AAACACAACA CAAAAGAATC ATCAATATGT
CAGAGCTAGA ACTCCCTCTG GTTACTAAAA ACCAATTCTC ATGATCCAGT
CCACTCATTG TTTAACTCA_GAGACAGAGT ACGAAG ATA ACAAACCTTT
TCTAGGATC TGCACTATAC ACACCATCAA CATCTGTCCA AATTGTGACC
TGACGAGCCT TAAATAGAG CACCCATAAT TGCTGCCGAG AAGTCACTTC
CATCTCTCTT CAGTGTGGTA GGAATGTTTT GAGGTGTGCT TGCAATGAAT
CCAGTGGCAA TGATTACCTT ACATGGATTC AAAGAGTACC ATTTCTCAAG
TCTCTTCTCA GATTCCA
pha10655 (SEQ ID NO:746) AATAATATCA CAGTAAGAAA AAGACAACAG CTGTGGATGT CAGGAGGGAT
TGATCCGCAC ATGCATATGG ATCCAATCGA TAAGGTGAAG TTTCTCTTGG
CTTGCCTCGC ACCTTCTCAA TTAATATATA TATGTATGTG GTTTGGTGTC
TGTATATGA CATATGTGCT CAGTGCCTCC GCCAAA[G/ T]CAAATAAA
ATGGATACTG AAATGGTTGA GAGCATTGAT GTCACAGATG ATGCAAGGAG
TTTTAGCTGG GCAGTGGACA GCGCCATAAG GAGTGA[T/G ]ACTGGGTCC
ATI'TT"TGGAG AATCTCGTGT CCGTAGCTTT GCTTCGTTAG T[G/T]GACA
GTGCCATAAG GAGTAGTAGT GCTACTGAAG CAGATTTGGA ATCATCCCTT
GTCCAGGCAG AAGACAGGGC GATGAGGACT GTTGCAGCAC GATTAACAAA
AGCCGTGTCA AATGCAAAAA GACATTTTGC TAAAGGATTA TTTACCCGGG
CTGAGCTTAT ACCAGTCACC CAGGCTGAGC TTGAACCAGC CACCAACAAT
TTCCCAGGAT TATTATTTAC CCTGGATGAG CTTAAAGCAG CCACCAATAA
TTTCTCATTT GACAACAAGA TTGGTAC[T/ C]GGAGGCTT TG[GTA]TTG
TTGAGTACAG AGGCAAACTC [A/G]TTGAT GGTCGTGAGG TTGCAATCAA
GAGGGGTAAA ACCTGGTCAA ACTCATTTGG TAAGTCT[G/ A]AGTTTGCC
TTGTTCTCCC G[T/C]TTAC ACCACAAGAA TTTGGTTGGG CTGGTTGGAT TTTGTGAA[G /C]AAAAA[G /T]ATGAAAG GCTCTTGGTG TATGAGTACA
TGAAGAATGG G[G/A]CGTT GTATCATCAT TTGCAT[GA/ AG]CAAGAAG
GGTAGCAGT GTGTTGAATT GGTA
pha10658 (SEQ ID NO:747) CTGCAGGGTT TGATCGCTA CTATGTATAA CTATTTGAC TTAGAGGAAC
TCATCTTAAT TAATGTCTTC TACACCGTCT GCCTGCAACA GAGTATGTAT
GTATATATAA ACAACATTCC AGGGCCAGTG TCATACAAGT TACAGCCAAT
TTTCAAAGTA ATTAAGTGAT GCTAAACTAA ATCAAGACTG AGTTTAAATT
TTCAAAACTA ACTCATTTCA CATATATTTT GAGTTATTTA GGCCAACTCT
TGCTAACCAA AGTTGGGTGA ATATTCTAAC TGGTCCCCAA AACTGTGAAG
CGTAGTCACT TTAGTCTCTG AAATAATAAA ATTCAAAAAA GTTTCTAAAA
TTACAATCCA TATGTACTCA C[A/T]TTAG TCCAATATTC AAAAACTTGT
GTGATTTAAA TGAAAATATT AACATACTTT CAGGACAAAA GTGACAATAA
AATGACAACA TAAAGACCTA TTCTGAATCT TCTTACTTTC AGGGACAAAT
GGGATAACAC [A/T]TAACA CTTTTAGGAC TAAAATGGTA GTTTATCTAA
TCAAAGTTTT TATGGTAA[A GTT]AGTTCA AGCCCAACAA TAACACACTT
GAAATTCACA TAGAAAA[C/ T]AACTGTTC AAGTACAAAA TAGGAGGAAT
CCAACAAATT AGCAAGCC[A AAACAGT]TA ATAGGTACCT GATTTTCTGG
TAGAACTTTC ATCATCCATC CTCATCCCAC AGGTTGGAGC AAAAACTGCC
CATTGTACCT TTGTCCCACA ACATTGTGGA AATATCTCTA ACATTGCATT
CAAACACTTC CTGCACTCAT CACTTGGAAG ATCCCTATCA CACTGCACCC
ATCCATACAA TGTCTCATTG TCACCCCAAT CGAACTCCTC CACTGCCCAA
AACTGCTTGG TCTCTATAGT TGCTIZ'TGTA ATCAAACC[C/T]TC[C/
A]ATAGCATC CTCAATT[C/ T]TCTTAGTT TCTGTTGA[A /G]TTCTGTG
TTTGGACCAC GATCTTTGTG AGACAAATTC AATTZTI'AAG TGACTTTAAA
ATATGACA[C /A]ATTGAAA TGACCTGTTT GGATATTAAA TTTAAAGAAT
TTT[C/A]AA AAATGATGAG AAATTTATTG GAATT'IZTIT T[T/A][A/T
]AAATAAAAT AGAATTACAA AACGCTGACA AGGTGTTTTG GTAGGGAGAG
ATTCZTI"TCA ATTTCTTAGA AATCTTGGAA GTG[T/C]TA AATTCCTATA
TTTGATATAA CTATTTTAAT GATCATTTTC ATAAAT[C/A ]TAAAATTCA
CAAGAATCAT TTTTGAATAA GTCTTCTACT AAACGTGGTT CGGTGGAGAG
ACTCTACAAA ATGAGGTCAG ACATCGTAGG ATGTTAGT[C /T]AAGCATC
GGC[C/A]AA ACC[A/T]GT AAATACTTCA TATCATATCA ATCATATGAT
GAT[A]AAAA AAACGCTTTC TTAAGACACC GTGAACTCTA GAAAACAC[A
T]ATAAAATG AAATCTGCAC AAGC[T/A]T A[A/T][A/C ]GCACATGAC
TAAATAA[C/ T]TTTATCAA AATAAAAAAC TAAAA[T/C] ACAGGAAAGT
TGAATTGCTT TATCCAAATA AAATTT[AA/ TT]AAAACGA AAGAAGTTTA
ATTTGCAAAT AGCTTGAATT TTTCAAATAC CATAC[TC/T A/CA]AAAAA
TTACCTT[G/ T]ATITITCT GGGTCCGGTA A[C/T]GTTC CACGTTGGGC
TTAAACTAAC TTTGCCATGG AAACTCTGA TTGGAGTAAC GGAGGATGC
ACACATCATA CCATAGGATA GCGGTAACAC TGTCAGGACA CAGCCGAGAG
ATTTCATTGA CAGCAGTGGT GAGACAGAAC TGGCAGAAGT ATCCTGTGAT
GTCGTATCTG CAG
pha10782 (SEQ ID NO:748) CTGCAGGTTA GTCGAACTCT TGTCATITTT TCTTCAAGTG TTGACTGATT
AGCTGTTACA AATITITAGA GATTACATT AATTACCGCT ATGACTATAT
CTTGGGACT ATATTGATTA ATAGTTGTCT TTGAAACTTT TGGAAAGTCT
AAAGATCTAT TATGTTGGAG AATAAACCTT TGTGCACTCA CGCACA[A/G
]ATATGAACA GAGAATATGT TTCAATGAAG CTTGTGTGTA TTTTCTTATC
AAGCACTTAC ACATATGCAC CTTTATATAG TGCTTTTATC AAAAACTGCT
ACAGCAGAAA ACAGAAA[T/ C]CAGTATAA AACTAACA[T /C]TCTAACC
AACTACTACT AATAATACCC TTA[T/C]TT ATATTTGAAT TAAACTC[AA
]AGTTATTTC TT[T/C]GTT ATTCTG[C/T ]TATTATTAA AGTTATTTCA
TTTGTTGCTT CTTCTTGTTT AATGCACA[T /C]ATTACTT TGTCTTGTAA
AGGTCGAGGA TATGATTCTG TTATTCCCTC TATTCCTGAT GATGTAAGTG
GCGAATCCCA T[G/A]TTCA GACCAAAATA TTTGCTGATA AAGTTGTGGC
ATATATAGA CCAATACATA GAAGCAATGG AGAAGGTAGC CTATCCCCC
TTTTGAAACA ATATGGCATT GTAATTACTA GACAGTTTGA TGACATTTCT
TTTTGCAATT TGTTAAGGTT AAACTTAAGC AAGGATTGAA AACTGCAATG
AGCATATCCA GCGAAGGAAA TGCAG
pha10783 (SEQ ID NO:749) CTGCAGAGT CTCGCTAATG GGAGTGAAAC CTCAGTTTC TACTTTCTGA
AGTGTAGTAG ATTGCATGC[ T/A]GGCATG AGCACATGCT TATGAATTAT
TGATAGATAC TGTTCAATAA GGGCATCACA TACATGAAAA CAACTACACT
TGTGACCTTT TTCTCTTAGT CTTGAAAAAC ATCGTTGAAA ACAAATTAAG
GAAATGCTAC CTATGTGAAA TTTTGATCAA GTTATGTGTG CAACTGTGTA
T[G/A]GTTG TGGATCGTT TATTCTTTCA AGAAGCCTAG ATTCACCTAC
ATGCATCTCA AGCATCAATA TTGAGTTGGA TCAAATGGCA AAGTTGATTT
CTTCTTTAGC TTGTITITAA ATGGATTTAT GATGAAGGGG CCACATTTCT
GGTTCGCAT[ G/C]TCGACC TCAT[A/G]C TITTITCTTT CTTGCAAAAT
ATTTGATTGG TGAACTGGTT CAAATGATTC ACGAGGCAGT GTAATTTACT
CTAATAAGTT TTAGTAGCTC AATGTTTGTG GGCC[C]ATT AGGTTCAATG
TAGGGCCAGG ATTGTGGTGG AAGAGAGCTG TTGAGAAATG TTGCAGCAGG
AGGAAGAGGA TGGAAGAAAA TTTTGAAATT TGAAGGAGTC ACGTTGCCTT
GCCAAAAACA TGATAGACAA GTAGTGTACC GTTGCAGTAG AAATTACCTG
AAATGCTATG TGTCCGTTGC ATTCTTCCCC TACATAAGGA TTGCTTCCTC
CACTCTAGTT TTTCAATTTC ATTTCAATTC TA'1"ITDTGAG TTCTTTGCAA
AAATAAAAAC CTGCATCCCT TITI"I'GGGGT GTATGGGTTG ATGGAAGTGA
GTGAAAGTGA TAGATACGGT AGATGACGTG ATGAAATTGG ATGCTTGAAT
AATAGATCGA AGAATCCTT TTCCTTACAC CCTAGTTAG GCCCTACTAT
TTATGCTCTC AATTCCATTC CTCAATTTCC AGAAATCTAT GTCTAAAGTC
ATGTCTGATG TCTGTCACTA TATATTAGTT AATTCATATA TGTTTATGAA
TATTGCATTG ACCCTCGTTA TATATGGATC AATCAAGCCC ATTGCTGCAG
pha10792 (SEQ ID NO:750, 751) CTGCAGCAAC CGGTCTTCTA TCAAAACTAA CACATTTGAA TCTGTCTTGG
AATGATCTTC ATTCACAGAT GCCTCCAGAA TTTGGCCTCC TTCAGAACCT
GGCAGTTTTG GATCTTCGCA ATAGTGCCTT GCACGGTTCA ATTCCAGCAG
ATATATGTGA CTCAGGCAAT TTAGCTGTCC TCCAACTTGA TGGAAATTCA
TTTGAAGGGA ATATTCCGTC CGAGATTGG AAATTGTAGC TCTCTTTACT
TGCTGTATG CATCCTATTT CTCAAGCTCT AGTGTCATTT CTCTTAACAC
ATCTTTTTGG AA[GA/GT/TA]TATACTAA TTCCTATCTA TTTTATGCAG
GAGTTTGTCT CACAATAATT TGACTGGTTC AATTCCAAAG TCCATGTCAA
AGCTAAACAA GCTCAAAATC CTCAAGCTGG AATTCAATGA ACTAAGTGG
AGAGATACCA ATGGAGCTTG GAATGCTTCA GAGTCTTCTT GCTGTAAACA
TATCATACAA CAGGCTCACA GGAAGGCTTC CTACAAGTAG CATATTTCA
GAACTTGGAC AAAAGTTCCT TGGAAGGAAA CCTGGGTCTT TGTTCACCCT
TGTTGAAGGG TCCATGTAAG ATGAATGTCC CCAAACCACT [A/T]GTGCT
TGACCCAAA TGCCTATAAC AACCAAATAA GTCCTCAAAG GCAAACAAA
CGAATCATCT GAGTCTGGCC CAGTCCATCG CCACAGGTTC CTTAGTGTAT
CTGCTATTGT AGCAATATCT GCATCCTTTG TCATTGTATT AGGAGTGATT
GCTGTTAGCC TACTTAATGT TTCTGTAAGG AGAAGCTAAC ATTTTTGGAT
AATGCTTTGG AAAGCATGTG CTCGAGCTCT TCAAGATCGG GAAGTCCAGC
CACAGGAAAG CTTATCCTGT TTGA...(~175 bp)...CTTCGGGAAA GCAAGGCACC CAAATCTAAT AGCATTGAAA GGATACTATT GGATTCTTCA ATTACAACTT
TTAGTGACTG AGTTTGCACC AAATGGTAGC TTGCAAGCCA AGCTACATGA
AAGGCTTCCT TCAAGTCCTC CTCTTTCTTG GGCTATAAGG TTCAAAATCT
TGCTTGGAAC AGCAAAGGGG CTTGCTCATT TGCACCACTC TTTCCGTCCA
CCGATCATCC ACTACAACAT AAAGCCAAGT AACATTTTGC TTGACGAAAA
TTACAACGCC AAGATCTCAG ATTTTGGGTT GGCTCGGCTT CTGACAAAGC
TGGACCGGCA TGTGATGAGC AACAGGTTCC AGAGTGCACT AGGATATGTG
GCACCAGAAT TAGCATGCCA GAGCTTAAGG GTCAATGAGA AATGTGATGT
GTATGGTTTT GGGGTGATGA TCCTTGAGCT GGTGACAGGT AGGAGACCAG
TGGAGTATGG AGAAGACAAT GTGCTGATAC TGAATGACCA TGTGAGGGTG
CTGCTTGAGC AAGGGAATGT GTTGGAGTGT GTGGATCAAA GCATGAGTGA
GTATCCTGAA GATGAGGTAT TGCCTGTTCT GAAGCTAGCA ATGGTATGCA
CCTCTCAAAT TCCTTCTAGC AGGCCTACTA TGGCTGAAGT GGTGCAAATA
CTGCAG
pha11071 (SEQ ID NO:752) AGA ATCTGTGCAG CAAACGACAC CTTCATCCCC TGACGTTCCA TTTGCTCAG
TTGTTGGCAT CTTCACTGG ACCGGGCTCG TAAAAGTAAT GGGAATCAT
AAGTTTCCA TTATACAATT ATGAATTTCA TCCTTATCA ACAATATCCT
GGAAGCCCA GGTGGCCAGC TCATATCACC GGGATCAGC ATTTTCAACT
TCTGGTACTT CAACCCCATT CCCTGATAGA CCCCCTACTC TTGAAT[T/C]
CCCCTTTCC CAAAGGGGAA ACACCAAAGA TCTTGGGTTT TGAACACTTC
TCCACTCGAA G~TGGGGTTC AAGACTAGGA TCTGGGTCGT TGACGCCAGA
CGGTGCATGG CAAGGTTCAA GACTAGGCTC GGGATCGTTG ACTCCTGATG
GTATTGGGCT TGCTTCACGA TTAGGCTCTG GGTGTGTGAC ACCTGATGGT
CTGGGGC[A/T]GGAATCCA GGTTAGGTTC TGGCTGTTTG ACACCTGACA
GTGCTGGGCC AATCAATCAA AACAACATCT CTGTGCAAAA CCAGATATCT
AAGGAAGCAA CTCTTGCAGA TACGGACAAT GGACATTCAA GTAATGCAAC
ATTGATTGAT CACAGAGTTT CATTTGAATT AACCGGGGAA GATGTTGCCC
GCTGTCTTGC TAATA[G/A] AACCGGGGTA TTGCTTC[G/A]AAACATGT
CAGGGTCTTC ACAAGGTATA CTTTCCAAAG ACCCTGTTGA CAGAGAAAGG
GTGCAAAAAG ACACCGATAC ATGTACAGAG AAAACCGAT GATAAGCCTG
ACAATTCTGT AG AGGAGA GCAATGCCTT CACAAGCAAA ATTCTGTAAA
TTCTTCCAAA GAATTCAATT TTGACAACAG GAAAGATGAT GTTTCTGTTA
CTGCTGGCAG TGGCT
pha11073 (SEQ ID NO:753) CATGTTGCAA TACACTAAAA CCTCAATTCA TTTCTGACTG TAACATTGGG
AAGAAAGCCC AGCTGTTGGC TGATCTACCT TCCTTCCCAG CAACCTTCCT
GTTAGTCCAC CACTCATAGC CACTGCCAGT AGTAACAGAA ACATCACCTT
TCCTGTTGTC AAAATTGAAT TCTTTGGAAG AATTTACAGA ATTTTGCTTG
TGAAGGCATT GCTCTCCTT TTCCTACAGG ATTGTCAGGC TTATCGTCA
GTTTTCTCTG TACA[C/T]G C[G/A]TTAC AGCTACTATT GGTGTCTATT
TGCACCCCTT TCTCTTGTCA ACAGGGTCTT TGGTCAGTAT ACCTTGTGA
AGACCCTGAC ATGTTTCGAA GCAATACCCC AGTTTTATTT GCAAGACACC
G[G/A]GCAA CATCTTCCCC GGTTAATTCA AATGAAACTC TGTGATCAAC
CAATGTTGCA TTGCTTGGAT GTCCATTGTC TGAATCTGCA AGAGTTGCTT
CCTTAGAAAT CTGGTTTTGC ACAGAGATGT TGTTTTGATT GGTTGGCCCA
GCACTGTCAG GTGTCAAACA GCCAGAACCT AACCTGGATT CCTGCCCCAG
ACCATCAGGT GTCACACACC CAGAGCCTAA TCGCGAAGCA AGCCCAACAC
CATCAGGAGT CAAGGATCC CGAACCTAGT CTTGAACCTT GCCATGCACT
GTCTGGCGTC AATGATCCAG ATCCCAGTCT AGAACCCCAT CTTCGGGTGG
AGNAGTGTTC AACACCCAAG ATCTTTGGTG TTTCCCCTTT GGGGAATTCA
AGAGTAGGGG GTCTATCAGG GAATGGGGTT GAGGTGCCAG AAGTTGAAAA
TGCTGATCC[G/A]GGTGAT ATGAACTGGC CACCTGGGCT TCCAGGATAT
TGTTGATAAG GATGAAATTC ATAATTGTAT AATGGAAACT TCTGATGCC
CATTACACTT ACGAGCCCG GTCCAGTGAA GATGCCAACA ATTGAGCAAA
TGGAACGTCA GGGGACGAAG GTGTCGTTTG CTGCACAGAT TCAGGTCGGG
GAG
pha11074, pha11075 (SEQ ID NO:754) CTGCAGCTAT TTGCTTATGA TACTGTTAAT AAGAACCTCT CGCCAA AGC
CAGGGGAGCA GCCCAAACTC CCTATTCCAG CATCATTGAT AGCAGG TGC
TTGTGCTGGA GTTTGTTCAA CTATATGCAC ATATCCTCTA GAGTTGCTAA
AGACTCGACT AACTATCCA GGTGTGTACA TATAAAACCA AACAGGCTC
ATGTTTCCCA CATGAGTTTA ATTTCTATTA TTGTCAGTAT AAAAGTTT~"T
ACATTATTAA TCAAGAAAAA ATCATGTTAG CTATGACTTT TAAAGAGTTA
TTTTAAAAGT CAACGAACAT ACTATGCATG ATGATTTCTG ATTAGTTGAC
AAT[T/C]TA AAAACTCTT TATAGTGT CGGTGCGTAG ACTCTGTTAC TCTTC( CTATGTATCT TGTATCATCA TACCAAAAAT AAAAGCAAAA TTTAAGTTGA
ATTAGTAATT TTGCAGAGGG GTGTTTATGA TGGTCTACTA GATGCATTCC
TGAAAATAGT TAGAGAAGAG GGTGCAGGAG AACTTTACAG AGGTCTTACT
CCGAGTCTGA TTGGAGTAAT TCCATATTCT GCCACCAATT ACTTTGCCTA
TGACACCTTG AGGAAAGCAT ACAGAAAAAT TTTCAAAAAA GAGAAGATTG
GCAACATTGA AACCCTTTTG ATAGGATCAG CAGCTGGTGC ATTTTCAAGT
AGTGCTACCT TTCCACT[C/T]GAAGTGGC TCGCAAACAC ATGCAAGTGG
GAGCCCTCAG TGGAAGGCAA GTTTACAAAA ATGTGATTCA TGCCCTTGCA
AGCATTCTTG AGCAAGAAGG GATCCAAGGA TTATATAAAG GGTTGGGAC
CTAGCTGCAT GAAGTTGGTG CCAGCTGCAG
pha11076 (SEQ ID NO:755) TAGTTCCAGC ATATCAACTC ATCGTTCCAT ATTGGACCTT GACCTAACCA
AGTTTACCAC ACAGAAACAG GTGTCTTCAC TGTTCCAACT ATGGAAGAGT
GAGCATGGAC GTGTCTACCA TAACCACGAA GAAGAGGCAA AGAGACTTGA
GATTTTCAAG AATAACTCGA ACTATATCAG GGACATGAAT GCAAACAGAA
AATCACCCCA TTCTCATCGT TTAGGATTGA ACAAGTTTGC TGACATCACT
CCTCAAGAGT TCAGCAAAAA GTACTTGCAA GCTCCCAAGG ATGTGTCGCA
GCAAATCAAA ATGGCCAACA AGAAAATGAA GAAGGAACAA TATTCTTGTG
ACCATCCACC TGCATCATGG GATTGGAGGA AAAAAGGTGT CATCACCCAA
GTAAAGTACC AAGGGGGCTG TGGTATGTGA AACCATTAAT TGTTTTACCA
CGTAATTAC[T/C]ACTCCT TCTATCTTAA AATAAACTGT TGTCTTAAAT
TGTTTTATAT AAATTAAAAA TAAAATAAAA TAAAATAATA TTTTTACCAA
ACTAACCTTA TTAGTGAAAG AGATTACCAT TAATACTAGA TTTAAAAACT
AAAAATATAT TTATTAGAGT TATTGGTGAA AA[T/A]AAT AATTAATTTT
ACGTTGAAAA GTTAAATGTG ATATTTATTT TGCTACAATT TTTTTTAGTG
TGACACTTAT TTTGGAATGG GAAAATAATT ATTAACACTA ACTTTTTTTC
TTTGTCTTGT GG[T/A]AT GTGAAATCAT TTTGATATAT AGGAAGGGGT
TGGGCGTTT TCTGCCACGG GAGCCATAGA CCAGCACATG CAATAGCAAC
AGGAGACCTT GTTAGCCTTT CTGAACAAGA ACTCGTAGAC TGTGTGGAAG
AAAGCGAAGG TTGTTACAAT GGATGGCACT ATCAATCGTT CGAATGGGTT
TTAGAACATG GTGGTATTGC CACTGATGAT GATTATCCTT ACAGAGCTAA
AGAGGGTAGA TGCAAAGCCA ATAAGATACA AGACAAGGTT ACAATTGACG
GATATGAAAC TGTAATAATG TCAGATGAGA GTACAGAATC AGAGACAGAG
CAAGCGTTCT TAAGCGCCAT CCTTGAGCAA CCAATTAGTG TCTCAATTGA
TGCAAAAGAT TTTCATTTAT ACACCGGGGG AATTTATGAT GGAGAAAACT
GTACAAGTCC GTATGGGATT AATCACTTTG TTTTACTTGT GGGTTATGGT
TCAGCGGATG GTGTAGATTA CTGGATAGCG AAAAATTCAT GGGGAGAAGA
TTGGGGAGAA GATGGTTACA TTTGGATCCA AAGAAACACG GGTAATTTAT
TAGGAGTGTG TGGGATGAAT TATTTCGCTT CATACCCAAC CAAAGAGGAA
TCAGAAACAC TGGTGTCTG CTCGCGTTAA AGGTCATCGA AGAGTTGATC
ACTCTCCTCT TTGAAGCCGT AAAGGTTCAA TACAACGAGT GCTTGTTTTC
TTAGGGACAA GCATTGTACT TATGTATGAT TCTGTGTAAC CATGAGTCTC
CACGTTGTAC TAATGTGAAG GGCAAAAATA AAACACACAA CAAGTTCGTT
TTTCTCAAT
pha11078 (SEQ ID NO:756-758) CTGCAGAAAC TGTCGGTTTT ATTAGTAATT CAGCACAAAT TCTTCGGTGG
AGTTTTTCTT TTNTTCTGTT TTTGGCTGCG TTTACGTGTG TTTGTTTAGT
TTCCTGGAAA TTGTTCAAAT TGGTTAAAGG ATCTTTATAA GTGGACATTT
CAGTTGGAGT TAATTGCGCA ATTGAAAGTT AAGGCATAAA TGTCTCTATT
TAGGGAAGTA TTTGAATGCA ATGCTGCTCA TTTATGTTCT GTTAAATCTG
TATATTGATA TGCATTGATC CTTCTTATTT ATGTTATTGG CCATTATTTT
TTACATTTAA GTGTGGTTCA TAGCTTATAT ATGACAAAAA TATATGATTT
ATACTTCCTA TCGTGTTGGG ATGCAGATGG TCATGGGAGG TTTCCTATAT
TGCCAATTGG TGAATGTATT GTGCCTTTGG TCCAT...(~210 bp)...
TGTCTTGTGA CGAAAGTACT ATAGTGTAG TGCAATCTCA TACTGGCTTT
TCACGATGAT AGGTTAAAA ATGTTTTCTA GCAAAGCACA TCAGTAATAA
TGATCCAA [G/T]ATI"ITTA ATGCCTAAGC TTGATAATTT TCTAAATTCT
TACTCGTTCC ATGAAGTGTG ...(~50 bp)...TTTTTG GAAGCAACTA
TGACCAATCT TATCTTGTTT CATTTAAAAA TAATTCTAAA TCCTCTTGTA
TCATATTCTT ACTAATTCTC CTATTTACTA AACTAAAAAA GTATTTAGTG
TTTTTTATGT TTTCAAGATA TTAAATTCAT TTACATTTTC CTTCCTTCAT
ATTTTAAAGC ATTGTTGTTG TTTACAGAGT TTGGCCATT AGGATGTTGA
CAAAGGGCAG AGCTCCTTC ACGTATTACT AAGGAAATGC AAGTAAAGAT
TGGAAAATGC TCAGAGTC
pha11079 (SEQ ID NO:759) CTGCAGGTAA TTTCTTAATG AATGGGTTGT TCTTTTGCTC CTCTGCAAAA
TCAAGTTATC TTTGATTAAG CGACTGATTG AAGACCTTTT TTTAATTTAT
TCGGGAAGA AGAAGAACAC TCGGTACAGT AGCAGGAAA ACATGGTGTA
GCCTATATGT GATAACTTAT TAACATGTGT CAATGCCATA TGGAATAAAA
AAATCTTCTT CTTCAGATAC ATTATTGAAC TTTTTGACGT TTAATTTCTT
GTCTTCACAT GTCCTACTGA GTAGCCCAAT TTGTT[C/A] TTTATGATGT
GGCTGAAGAT TGAAGGGCCA CTCAGTTGTA TCTAT[G/A] TATATTAATA
TATGTATCAT TATTTCATAT ACCATATCGT GCATTTTCAC GGCTTATAAT
ATGTAAATTG GCTTCAAATA TTGCCAGATG CTACAAGAAA ACATGATTGA
TTATATGGAT ATGTGAATGC TGAATAGTTT CACAATAGCA ACTGAGATGT
CACAAACTTT TTACCTTCTT TACATTAAAT AAACTCCAGT CTCTTAAGAT
GTACCCCACC AATAAAGTGA TTGCATAAAT TATTCTAGCA AAAATGTATA
AGGTTAGCAT AGAGGTAAGG GTATGAAAGT TAAGCTTATG C
TTACTATTAT TATTATTAGC ATATCAAAAA CAACATTAAG TATTCTGGTG
CCTGCTTAAA ACTTCAAATT TTGAAAACTC TTCTCAGTAT GCTCCCCTCA
TATTCTTTTA GCTTCAAAAA TCATTTACAC GTGAGTTCTG AATATATGAT
TT[G/T]GCA ACAGATAGAT TGGTGTTTTC TCAACACAA GTCGATAATT
TGAGTGTGGA TTTCCTAGCT TCAAACAAT ATCTATACAT GAACGTCCTT
GATAGATCAA CTGCAG
pha11131 (SEQ ID NO:760, 761) CTGCAGTGCC ACCATCAGCA GTATCAGCAG ATGACACTGC AAATCAAGCT
GTACACTGTC CATCTCTATT GTTGATGCTC ATAGCCTGC TTCTTAATCT
TGTTATTCTG ATATGTTTC TATACACAAC TTCATTCTTT TTCTCTCAAT
ATTGATGTAT AATCAGTTAC CATI~TTTfA TATGTGGTGT GTGAATTACT
TTTTGCGCAA AAACAAAAAA TTGGCTATGT GCCTGGTAGT TTAAG[G/A]
AGAGTTCTAT TGATCAATTT ATGTAATGAA ATTACGTTTG CCAAGTGAGC
CTTAGTAAAT GGCTAACTTC TTACTGCTTC ATCATATGCT TT[G/A]ATT
TGGTTTGACT GAAATCTTCA TACCCTATTG CACAAAGATC CATAAAGCAT
GCCTTGCGTG CATTTT'TT"T GTTGCCATTC GTTGCGGGGG TGCTTCTGTC
ATTCATCCAT TCTTACTTAA TGTTGCATAC GAATGAATGT GTAAAACATT
TCTCAGAATC TCTA[T/C]A AACAAGTGCT TGCTTATAAT CTCGAAAAAC
CAACCACTGA ACATGAATGA GTGTGCATAT TAATTCTCAA [C/T]TTGGA
ACGTGAATCA GTGAGTGGTC TCTTTGC[A/T]TGCAACTT GAGATGGTTG
TGAAACTCCT GAAACAAATA GTGTTGCAAA TG[T/G]ATG TGCCGACTAT
TTTTCTCTAC ATTATTGATT AAAAAATTTC CTTTTTTATG AGAAGTCATG
TACTAAATAA CGTTTTTTAT ATGTTATTC TTTGTAGAAC TCATCATATT
GTTCGTTAA CGAACTCAAA TCTAAGTCGT TTATTTCAAT TTACTTTTTG
CACTAAT...(~1200 bp)...ACTAAAT TTGAGTATGT ATCACAGGTA
GTATGACTCA CATTGCCAAC AATCTTTGCT CCAAACCATG GAGTATCAGC
ACCATATGGA GGAGGTATGA CTCTAATTCC ATTGGACACG TAGGGAGGAA
GTAAGGCATG GAGTTCCTTC TCTATCCTTT CTGTACCAGA ATGGTCCCAA
AAGTAGTAGA ATTAGAAGTA GCCATTACCC CACTACAGGT GTTTAATTTC
ATTTCATCAT TTCGCAAAGA TATAAAATGT CATGTCACAA ATAAAAAAGA
ATATTTACCA GCCAAACCAG GTAAGCATGC AGTCCCCCCT GATAAGACTA
TAGTCTTGTA CCAATCACTG TCACATGCTA AATCTGCAG
pha11132 (SEQ ID NO:762) CTGCAGCGTG TAAATGAGCA TTGTGGGTTC TAATTTCCTG CTTTTTCCTC
TCCTTTTGAT CTTTCAACCA CCTACCCATT GTCCTACCTC TTAGCAAGCT
CCTGTAAAGC TGGTGAGCAA TAATTTGCAC CAATAATAAA ATAAATGAAA
AAAAATACTT AGTAATAGAT TGAGTTGGTA TTTGAGCATA ACTGATGAAG
ATAATCATGA GGTTGATTGC AAGATACCCC ATTTCTGAGC AAGTTTTGAC
TGGAAAGGAA TTCTGGATTG AGTGCTTGGT GGAGTAAAAG TAAATCCTGT
ACATGACATA TTGCTATGTT ATTAATTTAT ACACACTAA GAATTCGCTC
GCTGTACAA AA[A/C]AAG GTTTGGTAAT GGTGGC[A/T]TG[T/C]GA
GTGTTTGTCA TTGATGGTTC TCAGCTATTG GCATGAGGAA ATTATCAACT
TCACAACAAA AAGCAGTTAG TCATGAAAGC AGGCTAAGGC CAAATGATAT
[ATT/G]TTC TTCAAATTTG AAACTGAAAA ATGGGCCAAA TTGTTTCTTC
TGCATATGAT CATATTTAT CACTATATAA GTTAAC[A/G]TGAAGCAGT
[A/G]AAGTA CCTTCTTTTC ATTACTGTCT TTTGGCAAAA ATGGAGGGAA
ACACCCGTTA GATAACTGCA CAAAAATCTT GCATCAGAAG TTCAGAACCA
ATATTAACTT ACAAAAGCAA TAGACACACA CAGCAATAAT AGATGCTTCT
TTATATTGAA GAATCCTAGG ACCAGAACAA AAGGGGTCTA GGTTATGGAA
AAATTCTTTT CCGTGAAACT CCTTTGGAAG GCTTTGAATA TCTC[C]TTT
TTTTTCTATT TCCCCAATGC AGGTGTGTGT GT[GTGT]GG CATAGAAATA
GAATATCTTT TAAGAGAACT AACATAGCTT TTGCATGATT GTCAGTTTAC
ATGCTGCATT GCCAAGTAAG AAAAAA[G/C]TAGCTAAAA GAAGTGCTAA
TGACCTAAT[G/A]CAAATA ACTCTTAAAT TAAACACAGC CAATGTTCTT
ACA[T/G]CT TTGGAAGCAG TGGAACTCTT GGTATCAAAT TGGTGGCCTG
AAGGGCAC[A/G]AAAGTGG CATCTCTATA CCGGT[A/G] GAGATTGTAA
CGTTGGTGCT GTGCAGTGCT TTGGAGAGTT CCATGGCTGA TAGACTCCAT
GATCTAGCAA GGAACTCCAT AGATTCAGTT GGAGTATCA GGAGGGGGAC
AGGATGATG CAGGCCAATC TGCAG
pha11133 (SEQ ID NO:763, 764) CTGCAGAGAC CATTCAAGAT CGACTGAAGA ATTTGATTGC CTCGTCCCCT
GTGATGCTGT TCATGAAGGG TACCCCAGAT GCACCAAGAT GTGGTTTTAG
TTCCAGAGTT GCTGATGCCC TTCGACAAGA GGGCTTGAAT TTTGGGTCCT
TTGATATATT GACTGATGAG GAAGTGAGAC AGGGATTGAA GGTATACTCA
AATTGGCCAA CCTATCCTCA ACTCTACTAC AAAAGTGAGC TGATTGGTGG
TCATGATATT GTGATGGAGC TGCGAAATAA TGGGGAGCTG AAGTCGACTT
TATCTGAGTA GGATTATTAT TATTCCTTCA AATAACATGT GTTATGTCCT
AGAAGCCATT TTGGGAGGTT GTGTTTGATG TTCATTAAT GAACTACATG
GTTATTTTAT ATGCTACCG TCAGTGATTT TGAAAATTGT TAGTGTGGAG
CCTCA[T/A] CTAATGGTA TACTGAACAT GCATGAATTC CAATCAGATT
AGAAATTGTT ATATA[T/A]TACATAATT ATTTGGTGGA AACCTCTCAT
TGGTACTGCC AAACAAATT AATTCAACAC GTGTGAGCCC CCTCGAAGTT
GCTTGCCCTT CAAAGTTTAT GTTTTGGACG TTATGCCAAT CCTT[T/G]T
TTTTGGTCTA GACTTTCACT CAGACAAGGA ACATT[C/A] ACTATAAAAT
TAGATTCTGA AACTTCGATA AAAAGAATGC TTTTTAATAT ATAACGAACA
GTTTAAATTT TATAATTAAA AAAATCTTGA AATATAGTTT AGATTAAAAA
AATCTTGAAA TATGGTTTAG AGATTAAAAA AAAATGG[C/T]AAATTGTG
ACTATTAACA GAATGATACA AAGA[C/T]C AACTATACAA ATAAAAGAAT
GATAATTTAA AATGATTGAG AAGTT'1CIT1~T GGTAGATATT AACA[G]TAA
A[A]GTATAA A[A]AAAGGG AGCAATTAGT TACAAATTAG GTCTGTAGTA
TTTTGAATTC TCATGATGAT TTATTAAATG CATCAATCTA ATCATATCAT
GTATTTCAAA TTTAAATCAT CCA[C/G]TA AATGCATCAA TCTAATCATA
TCATATATTT CAGATTTAAA TCATCC[AGT AAATGCATCA ATCTAATCAT
ATCATATATT TCAGATTTAA ATCATCC]AC TCGATA[G/A]CTTATTT[GT/AT/GC]
AT GTATCATGTA TGCATGCTGT ATCTT[T/G] C[A/G]T[CC
TCATAGAATA]ATTAACCTA ATCTT[C/T] TATAATCTTT TATTT[C/T]
TTT[A/G]TT ATTAGAAAAA GACTAAATAG GAAAGGATAG ATCATATATA
C[TCTAG/TC TGGTTC/GTT AG]AAGACTC AAA[T/C]TT GTAACTTAA[A/G]
ATAAAA [G/A]AATAT ATAAAAATAA ATTAACCTGA TCTTCTTT[C/A]
ATI"I"I"TA ATZ"ITIT[T] ACTAT[G/A] AAAGA[C/T] TACTTAAATG TATTTTGTT[CA/TG/TA]
A GAAAATAG[T/C]TAAGATT C[A/GTATG] TATGTTTGAA
TTAAATTTTC TTTTATATAA CATT[T/C] GCATAAATTT TTTTAACCAA
AAATAAGGAT AGAATTACTA TCAGCTGTT ACAACTTAGA AACATTCTAC
CAGATTCTCT TCCTTCTCCT TTTGCACTGT TCTTCTGAG TGTAATCC[T/C]
CAGAAAA GATGGAAGTA ATTCTTGAGATTCTG GGAGGAACT...(~1$5 bp)...CCTCCTAGTC GCCCCCACCT GATGACGCAA ACCTTGCGCC GCC[T/A]C
AACCGACGCT GAGACTCGCC ATGGCACACA CGTCCGTTCG GGATGCCGGA
GTTTCCACC GTCGTTGCAT CTCAATATA ATACCACACG TTTACGATC
CGTTTGGATA TGGAATGCCA AATATACCGGAG TTTTCATCGT TAACGGAGTT
GAGGCCGATGC CACAGATGCC GCAACAGCTTC CGTCATCATC AACGGAGGAT
ATGCTGGTGC CGTCAAATAG CAATGAGTT TGAAGAACCGC TGCCTGACA
TGATGGA[A/G]GCGTCACC GCCGTCAGAG ACGGCGGGGA GAGTGACGTT
GAACGTGAAA GTGCATGAGA TCCAAGATCG CATCCCTATA GAAATGGAGT
TTGAAGACAC CGTTCTGAAG GTGAAAGAGA AGATAGTTGC CCGCGAAGAC
ATGCGAGGT GTGCCATTGG AGAGGATCG CGTTGCAGTC GCATTCAGCG
GGTGTGGAAT TGCTTGACCA TCAGGTTCTG CAG
pha11135 (SEQ ID NO:765) CTGCAGCTG ATGGGAAGAT TTACTTCTGG TAGACTATT TTAAAATCTT
GTAGTTTTCG TTTGGTTCCA AATTCATTCC AGAACTTTAT TTGCAATTTT
GCATAAATTT TTAAAATGTT TCTTCCGTGA ACATGATGAA AAGTAATTTG
CTTGTCAGCT TCCATTTGAC ACTTGTTCAA AATTAACTAA AAGTGATGTG
ATTATGGCCA TTACTATTTA ATTTTCCTGT GTTATTTGGA TGAAAAACAT
AGTTCCAAGA AAATTTCCAG ATTATACTTA TTCTATGGGG AGTGGTTTTT
CTTATTTCTA AGTCTACCCT GTCTTTCTTT TGTAAAATTT TCATTCTGTT
TGGAGCTTTA TCCCGCATCC CAATCTTTAT TATTTTGTCT TCTAGAATAT
GTTCAAGGAA GTCGATGGGG TTGACTTAGA AATGGCATAT ACTAGTACAG
AATATTGATA GAAACCCAGA TTTTGAATAC AGGAGAGTGT GAATAATTCC
CAGTTGAAAA ACTTATCTCA GAAGAGCAAA TAAGGATGTA TTTATCCTTG
AATTTTAGTG GGTTAAGATT ACGTTGCACT [T]TAAATTA AATCAAGATG
GACTTGCCCT TACTTGGAAC ATTTAAGAGA [G/C]ATACT CACTTTATAG
GAGCCTTGCC CAATTAGAAA AAGAATAATG GAAAGGAAAC ACAAGGAAAA
ATTTACGGAA GTAGACTCTC CCATAACAAA TAGAAAGATA ATGGAAGCTC
TCAATATGTA AGAGAAGAAA TGATAGCACA TATCATTCAA GAGATTTATA
AATCTAATCT TATCCTGCAC CCTGTTCTTA CTTGTCTCTC CTCTCCACCC
TGTC~T ATTGCTCCCA AGTTACTGAG TCTGCCCTTC TATTCCTCCT
GCCATGTGAG CTGTGTGGAC C[C/T]GCTC CTTTCCCCTG TCATCAAACA
TATTCTTTCA ATGGTACATG TGTTTTCTTA ATAGGATTAT TAGGAAGAGA
AATGTTAGAG ACATGCTCTT AGACATTATC TTTCTAACAC TCTGTGATTG
GCTGAATTTT GTTAAAAATC ACCAAGTTTG GGGGTCCCAC TTATCATTTA
ATGATTTTAT CTCCTAGTTT AGATGTGGGG TCCACCAAAA TTAGTGATTT
TCAATAAATT TCATTCAATC ACAATGAGAG TATTAGAGAG AGAGAGTTGT
TAGCATTCCT CCAGTAGGA ATTGTTTGTG AAAAACTGGT TTGGTGCCTG
AAGAACATTG ATTCTTCTTT GAAGGAAAAA AAAAA[A/G] GAAGGAAAGC
ATTTTTTCCC ACTCTTTTCA TACTTAGTAT CAAATTAAAA TTGTCCTCAT
TATGATATTG ATCATTTCTT TGTTTGGAGT ATGCTTGAGG AATTGGTGCA
TGTTTTAATA GTGTCATATA TTAAGTACTG TAGGAGTAGA ATGTACTCTG
TTTTCATTGA TATATGTATT TATC[T/A]T TTTTG[G/T] GATTTCAATA
ACAGGGGATG TTAAGAAACT CCCTTGTCA ATAAAACTGG TTTAGAGCC
TGCAG
pha11136 (SEQ ID NO:766, 767) CTGCAGGGTG TCAGGGCAGA GCTTTAACAG ACTCTATTTG ATGAAGGCT
TTGCATGAGG ATTGATAAGG CAATITI"TG TTTGTCATTC TGATGGTTTG
ATCCAATTT CTTTTCCATA TAGGATGAGA ATTTTATGGC GAATTTGGTA
GTATATT[A/T]GAGGATCA CGCTTGTAAG GCGACGTAAT TTTTGTTTAT
TTTATGAAAC CCATCCAATT GTATTGAGTA TAATCTGTTA TTGAATATTT
AGAGATACAT GCAGTGCCAC TTGCTGCTTT TTATCTATGT TACAAAATTT
GGCATCCAGT TATATTAACC AACTACTTTA GTTAATTTCT GTGGAAAGTA
TTT[A/T]TA CCAATCACAA TGCGAATAGT GAAACAATCA CAACTAGAAG
TTCTCGCTTC ATC[C/A]TT TTTATTCGGA GCATCGTTTC AAACACGTTA
GTTTTGCATT TTGCAATTTT TTTTTT[T]A AATCAATAGG CTCTGTTCTG
CTGAGGAATA CGTGTTTAAG GAATAACTAC CCGAAAACAA GCAAATCCCA
AACAACTTGT ATTAACTTCT GAATATTTTA CGGGATGGTT TTTTTAAATG
TAAATTTTGA GCATTCCTTG AAA...(~200 bp)...TGGGT[C/T] GGATA[T/C]
GGATCTTGAA AAGAAGTTCT TTTATTTCAT GGATTGCACT TATCTGTGT[G/T]
TGAAAC GCGACTTGGT GAGTCGTGCC TAAAACATGG TGGATAATAC
TCATCCCAGC TTGTGTATCT TATGTATG[T/C]TCACAAT CTCTCTTGAT
GAAGAT[T/C]TATCACTGC TTCCAACA[C/G]AGAGAAC TTT[G/A]TA
CCTA[TA]CT ATGATCGGAA AAGAAAAAGA TTTTGTGCTG ATTTTTTAAT
ATAAAATAAT ATTGATAAAA CAATTTCTTG AATTCGATTT TGCTCACATG
AAAACTTTCC TTCCCTAAAA CTITI"T[C/T]TCACCCAA AAAATAATAA
CAACA[T/C] AATTCTGTAT TCATGCCTAA GACTTTTAGA TTTGTTCATT CCAATGAATA AT[G/A]TGA CTACTTTTAA TCATGTGGCT ACATTGCTAT
AGTTTGTAAG TTGTAATATT ATTAAAATTT G[AT/GTA]A ATACAATCGA
TAAACTGAAA TCAGTAAAAG AGAAAACAAG AGTTGCGAAT AACGGATTTA
TATGATAGTG ATTTCCATTT AT[A/G]CGA GGAAAATCTA CTTGGACATA
AAACAAGAGT CTACATAAAC CAAATAGCCA AGGAAATTAA AAAGGATACA
CAACATAAAG GAAAAAAGAA AAGAAAAAAA TCTCATCAG TCACAGATTT
TCTATATGTC AAGCTAGTT CAACAGAGAA GCATATCTCA TCAAAATACT
TCCATCCTTT GAGGAACTGG GTTTTATGAC CTGCAG
pha11138 (SEQ ID NO:768) CTGGATTTGA GATCACCAT CCCTTATTTA ATTAGTGCTT AAAAGATTCT
TCTGGAAAA GCCAAATATA GTTGGT[C/T]CAAGGTTTT GCCATTAATA
TCCGATTAGT TAAGGGAGAG ACCATCATAC TTAAGGGGAG AAATTTATAG
ACTATTTTTT ATTTCTTGAA GTAGCTAAAT TAGCAAACCA ATAGTCTAAA
TGATATAAAA AGACACAAAA CAAAATCTAT TTCAGGACAA TAATAAGTGA
TAGAAATAGT TCAAGTTTGG CAAGGGAAGG AATCCTTACC CCAAATGCTA
GCTTTGCTGG AAGGCAATGG GCGAACCAAT AAGTTAGCAG CATGTGGATC
ACAGTGTACA AACCCATGCT TGAACATCAT TTCAGCAAAA GTTTGACTAA
CCTGCAAAAT GTTTATACAT TTTACTAAAT TACTCCATAA ATGAAAGACA
GCCTATGTGT AACACTTAAA AGTTATCACC ATGTGACCAT GAAGTCACAG
GTTTGAGTTG TGGAATCAGC CTCTTGAAAA TGCAAGGAAA GGCTGTCTAC
TATACAGTAT AAACCCACAC AGAGATCTGA CCGTTCTGTG GGCAGGAGGT
TTATGAGTTT ATGGTACATG CCTCCCTTTA TTTTGTTGAT AAACC[A/T]
TATATGGTAC TAATAGCATG AGGGAATCAA AATGGTCATC ATTAACTTGC
TTAACTTCAT TGACGTATGT TATCTTGTCA AAGATATCAT GAGGCATATA
ATAA[AAACT TCGTACCCCA TTGCCCAGAG GCTCTTCGCT ATGCGAAGGT
ATGGGGGAGG GACGTTGTAC GCAGCCTTAC CCTTGNATAT GCAAAGAGGC
TGTTTCCGGA TTCGAACCCA TGACCAACAA GTCACCAAGG AACAACTTTA
CCGCTGCACA GGGCTCGCCC ATCATGAGGC ATATAATAA] TGTTATAAAA
AA[CTT/AAA]ACACACTAT CAACACCTAT TGGTGACCAT TGTACAAGTG
[C/T]CTATA ATAAAGCCTG TAAACATAA C[G/T]TACC AATGTTGAAA
GTTCATGCAG GTTGATCCCA AGTTTCCGA ATGGTCTTTA CATCATTTAC
ATAAGCACCC TCC
pha11139 (SEQ ID NO:769) GAATTCCATT GAAGTTGTAG AAGTCAAACT TAATTGTAGG TTTAGGACAG
CCTAGTTTGG TATTGGTGTT TGTAATTTTT TTGTATATTG TGTAGAAGTT
ACTGTTCTGA ACCGCTTGCT CTTTTAACCG GAATGAAAGA ACAGAGTTGA
CTTTGTTTCA CATGCGCAGT GGGATTGTGT TGCTTTATGA AAATGGAAC
TGCTTTTGGA CTCTGTTGGG ATAAACTTCT CTATAAATA CTTATAGGAG
AAGAAAACAA TAAGGAAAAA TTAAATAACG TTCTTCCATA GACTAAAATT
AGCTTATGTA TAAGTTAAAA TCATCTTTTG GAGAAGCTAA ATGAGAAAAC
CTTTGCAAAT TAACTTGTGC ATAAGCTAAT TTTAGTGAAG CTAATTTTAT
TTTTGCTTCT TATCTTATGG AGAAGTTTGT TTAAATAGG[G/A]ATTTGG
TGTAAATAGT TGTGCTGTTG TTAACTGGGT CATTTTTCAG TTTTAACCTT
GTCTGAAACC AA[A/G]CAT GGCTAACAGA ATGTTAACAT TTTCCAAAAT
TATATTCACA TCAGAAATTT TTTTAGAAGT TTTTCCGGAT GA[T/G]TTT
AGCCCAAGAA TTTGTTTGAA TTTCCAAGAT CCATGCTATT TGCTTCAATT
GTAAATTTAA ATCCAACGTC TTTCTTGTTA TAAATGCAAC CAAACATCAA
ATTGAGAATT TTGTTTTTCT CTGTGAATTA TTAGTTAAGT GTGTTTTCTT
TGTTTAAAAT AGATCTAATT TGTTAAGCTT TATTTACCAG AGTTACCAAA
CCATCTGTGA GAAATATCCT TCATTCCGTG AAAGATCTGA AAATGTTGAT
CTCGTGGTGG AAATTTCTCT GCAACCATGG CATGTTTTTA AGCCCGATGG
AGTAAGATAA CTCTCTGGCA CTCTTTAATT TATTTATGCT TTTTCAGGCA
TTTTTCA[TC/GT]GCACTT TTTGTGATAT GGTTCTGATT TTGTTGCCAT
TAACAATTTG CTTGGTACTT ATTCTGTGAC AGGTGATTTT ATTCTCAGAC
ATTCTTACCC CACTTTCTGG AATGAATATA CCCTTTGATA TTGTGAAGGG
TAAGGGTCCT GTTATATTTG ATCCTATTCA CACAGCTGCC CAGGTTGATC
AAGTGAGGGA GTTTATTCCT GAAGAATCAG TTCCATATGT TGGTGAAGCA
CTGACAATTT TGAGGAAAG AGGTCAGACT GATCCTATGT TCTGTTGCAT
TTCTGCTTTT GTTCATTTTA CCCCCCACTT CCTTCTTATA GCAGAAGCAT
GTCTGTTTAG TCAAGTTCCA TAAGACAGGA AATTCTTTAT GTTTGTTTGA
CTTAAAAAGA ACTCTGTATT TTATGGATGC TCATTGCTTG TGAATCATCA
TTGGGGGGTT GTAATAGCAT GAAATGATGG AGAACAGGTG ATTGAAGGCA
TATCTCATGT GGTACATGGG GGAACATATA CCTTGAAAAC TCACAGAACT
ATATATTTTA TTGAACTGTC TATCCTGTAG GTGGCTATTA AGATGCTGAG
TTAATCTACA T[C/T]TCCT CTTTGTATTA AAAAGTTGT TTCATGTACT
AGCAAATGGT GGAGCTTCAT GGAAACATA CCCGACATCT AAATTTCTTA
TTATTGTTGC TTTATCATTA TTTTCTGTGG CCCAATTGAT GTATTATGCC
CCTATGTTTT ACTATCTTAT TGAATT
pha11627 (SEQ ID NO:770) CAATTCATGG TTTCTCTTTA TC]TTAT[G/A]ACATTGT TGCCAAGTAA
TACTACTAT ATAAATTCAG ATTTGGGTTT C[A/T]GAT AACCGTGGTC GTTAC
pha11628 (SEQ ID NO:771) CTGCAGTGTT GTCTCTCGG AGTTGCTTCA ATTGCTCATA CTCTTTGGG
ATAAC[C]AC TCATTTCAAA GATGTACTAG TTTAAAACAT GCAAA[A]AA
[G/A]ATAAA GTTAATGTGT ATTTTGTATG TTGTAGGGAA GCACAAAGTA
TCTTGATTGA ATTAGGAAGA TTACACGAGC CGTATGCATC AGAATAA[AT
GGTTTGTGGG AGGTTAGATT TTCTGAACGA AGATGAAGA] TATGC[A/G]
AATTCTTTTC AAATTAATTT TGGG[C/T]A AATGATGAAG CTAGACTGAT
AATTGATTAA TTTTGGGCAA ATAATATTAT AT[T/C]ACA TGTATGAGAT
TGATTTTAAG TGTATATGCA TACATGAAGC AATAGACTTA ATTTAATTA[C/T]
CTTAAG GAGTG[C/T] TG[G/A]ACT TTTGAGGATG [C/T]C[C/A]TTTTGTGCT
[G/T]ATGAG CCCTCCATGG TTGACATACA [A/G]AGCAA ATTGCAGGGT
GTCTTTAGCT GAGGTTTTTG CTGCTTCGAA GTGGCAATT GAATCAGCTC
CGTTGGACAG TGACATGGTG ATGGTGGTGA TAATTAATT[ CG/TA]GCTT
AAGGGTAAGT ACAACTTCTT AGCTCTGTAA GCAAAGGATG CCTTGTGGAG
TTGGTTCATC TAATCCACGT ATATATA[G/T]GGCTGAA[C/T]GAGGGA
ACAAGAGTTT TCAATCAATG A[T/C]TACA ATTCCACAC TCTCGCCTCT
AAAGTGCAT CCCTCACATT GAAGCATCCT CCAAATCCCA AAATATTATT
ATTACCACTT AAAGCTATTA CAAATCAGAA AACACTGCAG
pha11701 (SEQ ID NO:772) TTCTCTTGG CTTGCCTCGC ACCTTCTCA ATAATATCAC AGTAAGAAAA
AGAAGACAGC TGTGGATGTT AGGAGTGATT GACCCGCACA TGCATATGGA
TCCAATCGG[T/C]AAG[A/G]TGAAGTTT CTCTTGGCTT GCCTCGC[A/G]
CCTTCT[A/C]AATTAAT ATAT[ATATA T]ATATGTAT GTG[G/C]TT
TGGTGTCTG[T/C]ATATGA CATATGTGCT CAGTGCCTCC GCCAAAGCAA
ATAAAATGGA TACTGAAATG GTTGAGAGCA TTGATGTCAC AGATGATGCA
AGGAGTTTTA GCTGGGCAGT GGACAGCGCC ATAAGGAGTG AGACTGGGTC
CATTTTTGGA GAATCTCGTG TCCGTAGTTT TGATTCGTTA GTGGACAGTG
CCATAAGGAG TAGTAGTGCT ACTGAAGAAG ATTTGGGATC ATCCCCGGTC
CATGTTTTTG CTATGGCAGA AGACAGGGCG ATGAGGAGTA GTAGTGGAGC
TGATTTGGGA ACATTCCCTT TTCATGTTTT TGCTACGGCT GCGGATATGA
GGACTATTGC AGCACGATTA ACAAAAGCCG TGTCGAATGC AAAAAGACAT
TTTGCTGAAG GATTATTTAC CCGGGCTGAG CTTATACCAG TCACCCAGGC
TGAGCTTGAA CCAGCCACCA ACAATTTCCC [A/T]GGATT ATTATTTACC
CTGGCTGAGC TTAAAGCAGC CACCAATAAT TTCTC[AC/TA]GTGACAAC
ATGATTGATT TTTATGTGTA CAGAGGCAAA CTCGTTGATG GTCGTGAGGT
TGCAATCAAA AAGAGGATAG CAACCAGGGG AGACTCGTTT GGTAAGTCTG
AATTTGCCAT CTTTTCCCGT TTACATCACC GGAACTTGGT TGGGCTGGTT
GGGTTCTGCA AAAACAGAAA TGAAAGGCTG CTGGTGTATG AGTACATGAA
GAATGGGTCG TTGCATGATC ATTTGCATGA CAAGAACAA TGTGGAGAAG
GCTAGCAGT GTGTTGAATT
pha12105 (SEQ ID NO:773) ACGTGGCACA AATCCAAGGA CGTGGCGCGG AGAATCCTC GATTCGGTAG
AGTACCAGAT GATGAGGTG CACGTACACC TTAGGTTTAG GAGAGGCGAA
TCTCGCGGGT AAGAAGATAT TCCTTTACGA CGCCGTTTGC AGGCCGAGCG
AGATTCACTC GTTGGAGACG ACGCCGTTTG ATTACGTGGG GAACTGCGAG
AACAAGACGC TGCACGCGAC GCAGCAGATC GCGGAGTGTT GGACGCGCGC
GGTGAGGAAG CTGCTGGAGA GAGTGGCGGA GTCGGTGGAG AGAAAAACGT
TGGAGAAGGC GGCGAGGGAG TGTCACGCGG TGGAGCGGAT CTGGAAGTTG
TTAACGGAGG TTGAGGACGT GCACGTGATG ATGGATCCGG AGGATTTCTT
GAGGTTGAAG AAGGAGTTGG G[G/A]ATGA TGAGAAATTG CGGGGAAAT
GGTGGCGTTT TGCTTCAGG TCGAGGGAGC YCGTGGAGGT GGCGAGGATG
TGTAGGGATC TGAGGCAGAA GGTGCCGGAG ATATTGG
pha12390 (SEQ ID NO:774) CTGCAGTCTT TAGTTGGCCA AAGCCCAGAT TTGATTGTTT TTCTATCGCT
AGTTGAGAGA TGTTGGCAG GATCTTTGTG ACCAAATGT TGATGATTCG
TGGTATAGCC TTGTTTCTGG ATAGGACGTA TGTGAAACAA ACAACAAATG
TACAGTCATT ATGGGACATG GGTTTGCAAC TTTTCTGCAA ATATCTTTCT
CTATCTCCAG AAGTAGAACA TAAAACTGTT ACTGGTCTTC TTCGTATGAT
CGGAAGTGAA AGGTAATTTA TATTTTGCAA TCTCAGTTAT GAAAATGACC
AGTACAGTGT ATGTGCAGGT TGTTCTTCAT GTGTTCTGTA TATGCATTCT
AGAATTCATG TGATGGGAAC CAGTTGTTAC TGATAAAACC AACGAAGACT
AGTTCTTTAC AAGAACATAA GTGCAGAATA AAAGATAAAT CTAGAATTTT
GCAAAATGCA GATATACTGA ACCTCTAGCC CAAGCCCATT CCCTTTTTGC
CCTCACTTCA TGCCTTCTAT GCACTACATA TCTTTCCCTC AT[A]TTTTT
TTTTTGTTTT TCCCTAATTA TTTTCCACCT GCGGGACCTC TCACTATTCC
TTGTCAGTCC AGCTCTTGGG TTATGCAGAA AACTTAACAA TGTGGATATT
CCCTTTATTT TTTATTCGGT CCTCCTCACC CATGTGTGTT TACGCTTTTT
ACTTCCCATT CCCTTCCTTG TAGCAATTAC CTTATTGGCC ATCCAATTTT
CTTAATATTC CAAGCAGAG TGACCATTTG TTTGGGGAAA CCTTAAGTTG CCACTCTGCT
TTTTATTCTG TAAAATCAGG AACCATCACT CTGACATGAG GGAACTAGAA
TTGAGATTTC TGAATGGGTT TGGAATGGAG TTCTTGTTAG AGATGGGCTT
TTTCATAATT ATTTTTGCTG ATCTTATTGT TTTTGAAAGA AATAATCCTT
GAACTACCTG CAAAACCATC TTCTACCAAT TGCCTACTGT TGTAGTCAAT
GTACAATGTT AAAGGTGCTG CCTTGTGACC TGAAAATCTA TATCTTGCCA
AAAAATGGGT CCATCAGTGT ATAATTAAAG TAATAATTTC AGAATAGTGA
TATATAATAA ACACCATATA GAGATTCCTA TGGTGATAAT TGTTGGAAAT
GGGAGTATTT TAGGATATTA GAGCTTCTTT AATGTTTGTT TTTCATTGTG
GCCTAGATAC TTGGTATAGC TAGGTTCAAT GCATTTTAGG AAGTGGTAGT
AGATCTGAAT TGCTCTTTTG TAAATTGTTT TAGCATTAAT TATCTTTGTA
ATACTCTTGA ATATAAATAG GCCATTCTGC TGGGTAGAA AAGACAATCC
AGCATATTA AGAAAAATTT TGTTTCTCAT CTTCCACAAT AATTATGTGA
CTTAGCCAGT AATTTTCAAT CTTACTGCAG
pha12391 (SEQ ID NO:775) CTGCAGTGCC ACTTGTACAA GCTCTGCTG CCAGGTTAAG TGTTTCTAA
TAAATTGTGC TTGGTTTTGG TATCAGATTG ACTACAT[A/G]ATGAAGCC
ATTCTGACAT CATTCTGAAA TAATGAAATT TGGAATTGGA AATCTT[G/A]
CTGTGATCT TTTTTCCCCC TATTTGAGGG AAGCAAATAC TAGTTTGGCT
TAATTACACT TTTAGTTCCT TTATTTTAGC CTATGCACGA TTTTGGTTCC
CCTAGTTCTA ATTGCTTGCA TTTAGTCTTT GTAGTTACTC AATTGTTAAA
ATTAAGTCCC TCACCTCATA TTTTGTCCAA TATTTAGTGG AAATCTCACA
AGGTGTGATG TAAGTCATGT G[G/A]CATT GATTTGTGTA GATTGTTGTA
GGCAAAATAT ATACAA[T/C]ATCTAAATA AAATATTTCA TATATTTGTT
TCCTTTGTTA ATCTATACAA AAGTATAAAT CAATG[A/T] TGGATAAATT
CTGGCTGTCT TTTTCAATGA ATAATGTCTG GCCTTGCAGC TTTCTCAAAC
ACACTAGCCC CAATTTGTTC TAGTCATAGC ATGGCCATGT TCTCTCTAGT
ATGTCCAGCT GCTTCAAATT CCATTGTGTC AATCTCTATG TGTCAAATCT
CCAACTATAT CAACTTCCA GTCGTGTCAA TCTGCAG
pha12392 (SEQ ID NO:776) CTGCAGGAC ATTCACAGTC ATTGCCGCA GTGGAAGGAA TTAACAAAGA
TAGTAACAG GCAATCTAAC TGCACTGTCA GACTGTGATA GAACAGTCAA
TGCAGGATG TAGTGGT GGACTTGCGG GACTATGCCT TAGAGTTCAT
TATCAACAAT GGTGGCATTG ACACCGAAGA GGATTACCCC TTTCAAGGTG
CTGTTGGTAT TTGTGA[T/C]CAATATAAG GTTAGTTTTA CCTTTGATTC
TTTGATAAAT TAGTAAATGT TTCTAATAGT TTCATTATAA TACTTATATA
ATTTTTTTCT TATCTATAGA TAAATGCAGT TGATGGTTAC GAACGTGTTC
CTGCCTATGA TGAATTAGCC TTGAAAAAGG CGGTAGCAAA TCAACCAGTG
AGCGTTGCCA TATATTGAAG CATATGGCAA AGAGTTTCAA TTATATGAAT
CAGTAAGTTT TTTAATCAAC TTTACTTGAA AAGTAAAGAA CTAAAGGAAC
CTACAATGTG AGTAGAGAGA CGGACTTAAA GTTAGCAATG CATTACATTT
AAACTAATCT AATCTAAAGA GATACTCTCT CCAAATTTAA ATATAAACAA
ATTTAACTAT CAAACATATG AATTAAAAAA TTTAGTTAAT AATATTTAAT
TCAAAAATCT TCCTCAAATA TTTTTCATAA TGATGCTTGT AAAATAAAAA
TAAATGATTT ATGAGAAATA AAATTTTATT AAATGAATCA CATAATAAGA
ATATTATCAT TAAATAAAAT AAAAATTAAT TAGTATTTAA TTTATTTATA
TTTAGATCTA ATTTTTTTTT TATTGCTTAT ACTTAAGTCC AGAGAGAATA
CATTAAAAAT TGTTCAATTC TCAAATATAA ATTACAATAA CTTTACTTCT
TATTTACTGA CTTAATCTAG TATATTTTAT TTGTCCTAAT TTTCCAGGGT
ATATTCACAG GAAAATGTGG CACGTCAATA GACCATGGTG TTACAGCTGT
TGGGTACGGA ACAGAAAATG GAATTGATTA TTGGATTGTT AAGAATTCAT
GGGGTGAGAA TTGGGGTGAG GCAGGTTAT GTAAGAATGG AACGTAATAC
AGCAGAAGA CACTGCAG
pha12393 (SEQ ID NO:777) CTGCAGGGT GGGATGTCCA ATGAATTAT ACAACACTAT TG CCCGTGT AAC
TGATGG AATTTATGAA GGTACTGCCC AGGTTTATTT TTCTGTATTT
CTTAATGATG GTTTTGAATT ATATTGTAAT ATCCTGTTTT GCTACAGGCA
TTGCCATTGG TGGAGATGTT TTCCCAGGTT CCACACTTTC TGACCATGTT
TTGCGGTTTA ACAACATACC ACAGGTAAAA TTTCACTTTT ATCTTGGGTA
CTGTATATAA TATCTGCAAC TCATTAAGAA CTGCAAATGA TTTGGAGTAT
GTCCTTTTCT TGTGAATGAT CAGTTTGTAT TTACATTAAA CCTCAGGTGA
AAATGATGGT AGTACTTGGG GAACTTGGTG GGCGTGATGA GTATTCTCTA
GTGGAAGCCC TAAAACAAGG GAAAGTGACT AAACCAGTTG TTGC[T/C]T
GGGTTAGCGG AACCTGTGCA CGACTCTTCA AATCTGAAGT ACAATTTGGT
CATGCTGTAT GTCAGTGGCC CAATATTTTT TTACTAATTC TCAGTTATAG
CATTTATAAT AAGGGAGCCA AATTCCTGAT GTTCAGTGGC ACTTGTATAA
ATTAGGGAGC TAAAAGTGGT GGTGAGATGG AGTCTGCTCA AGCAAAGAAT
CAGGCACTAA AAGAAGCTGG AGCTGTTGTT CCCACTTCAT TTGAAGCTTT
TGAAGACGCA ATAAAGGAAA CATTTGACAA ATTGGTTAGT TTATCTTGAA
ATTTTTCATC TCTCTAACTG ATAATTTTAC CTTGTCCCCA ACTCCCCATC
TCCCAAAGGG AGCAATGTTT TAAAGAAGTG TCTTTAGTTT TTAGTTGCCT
AGATTAGCTT GAATATCATA GTTGTTTTTT C[TT]TTGTT AGGTTCAAGA
AGGGAACATC ACACCTTTTA AAGAGTTTAC TGCACCGCCA ATCCCTGAGG
ACCTTAACAC AGCAATTAGG AGTGGAAAAG TACGTGCTCC AACTCACATT
ATTTCCACCA TCTCTGATGA CAGAGGTATG TTGGGATCCT TCGAATTTAT
AAGTTGTAAC ATGGAACTGA GGTGTCCTAG ACACATTTGA TAGCTAAATT
AAACTTTTCT TCCATTATTT TTCC[G/A]A AGGTGAGGAG CCATGCTATG
CTGGTGTACC AATGTCTACC ATTATTGAAA ATGGTTATGG TGTGGGTGAT
GTAATCTCTC TTTTGTGGTT CAAACGCAGC CTTCCCCGTT ACTGTACTCA
ATTTATTGA GGTAGATTAT TCATACTCTA GTCTGCAG
pha12394 (SEQ ID NO:778) CTGCAGCCTC AAAGCCTGGC TGCATTATCT GATGGAAATG CAAGGATTG
GTGCTCATC ATGTTCCCCC ACAAGATGCT GTTGTCGACA AGGGAAAGAA
ACCTATATCA CCTCAAGTTA CTCCCAGAGG GAGAAGGTCC CTTTCTGAA
CCACTTAAAG AGTCAACAG TTGAAGGCCG AGCTGCTCTG TTGGCAAATA
ACAAAATGCC TCATCCATTT ATTTTGATCA A[G/A]CCCA AGGATGAGCC
TGTTGATGAT ATACCAGATT ATGAGATTCC CCTTGCAGTG ATTCCTCCTG
GTATTCAACT CACTATGCCT TGTTGAGAAG CTTCTTTCTT TCTGTCCATT
TTTTGTTGTT GTTGTTGTGG CATCTTGGTG TTGACTTTTA CTTTCATGGT
CCACTTTTAT TGAATTCAAT GATGAGTACT GTATTTTGTT GGTAGTATAA
TGCTGCGGAG TAGGTGGGTT ATTCTTCGGA ATATATAGGA AAAGTCTTTA
ATTCCAACAT ACCTCAAAAG AAGAAAAGAA TTTAATGCAC AATTTTGGCT
GAGCTGAGAA ATAATTTTAG TTGAATAGAA CCTGATTAAT TTTCCCTCCC
TGGAGAAAGG ATGACATTTC ACCAGAGCCT ACCATGTGCT TCCATTTTGC
ACACACAACA AGTCTGTAAT TGCATACAGT AGTAACAGTA TAACTTTTGC
TTTCCCTGAA AATATATGAT CATCTCTGTT TGCTGTTGCT TCTTTATGTG
GAAGATCCCT GGTTCCTGGA ATCAGAGATT CAGAAATGCC CTTGTATTTT
AGACTTGAGC CAAAGAAATG TTACTTCTGT GCCATTTTAT TGTTTTTGTT
ATTCTGTTTC TAACCCCACA TTAAACTCCA TTGATGAGTT CTATTTTGAT
TTATTTTAGA CTCTCCAATG GGTGCAGTTG AAAAGCAAGA TGTCCATGAC
ACTGTTGTAT CACAATGCAG AGATGAAGAC GTTGAACATG AAGATGTTTT
TCCTTCCTCA AATGAAGAAG CAACTTCTAA TGTATATGTA GCTTTGTCA
TCTATGGGAG AGGTAAAAA TTTCTCTGAG CTGCAG
pha12395 (SEQ ID NO:779) CTGCAGTCAA TGCTGATGCC ATTTTCAGA ATTTGATCAC TCAGAAAAGG
GTGATGCAT TATATGGTAA GT"ITI"T[T/A]TT[T/A][T TATT]ATAAA TAT[A/G]TA
TA[A]TATAT AT[A/T]TA[14(TA)/15(TA)/16(TA)]ATAAATATT GAACTTTTA[T/G]
TTTT[A/T]TTTTT[A/C]TITI"TAC CTTGGTTTGC TCATCTTT[C/G]ATGAGTG
TTGTTTTGTC TATTATTCGT GAATTTGAAT GGTTTCTAT GATTATACTT ACTATTTCT
GTCTCGCCCT ATTTCGCAGC AATGGAACTT GCATTATCTT
TGGAAAGGCT GAATAATGAG AAGCTTCTAA ATTTACACAG CGTACGTCAC
TTCTATAGTT CTATTTTCTC AGTACTTTTG TGAATTCAAC TCAAACCTTT
TTAATTCCAC TACCTCCCAA TGGAATCTTA CTATGTTGTT GAATGCCAAT
AACTAGTTTT TATCACTATA TATGCAGTTA GCAAATGAAA ACAATGATGT
GCAATTTGTT GACTTTCTTG AAAGCGAGTT TTTGGTTGGT CAGGTAAATC
AATGTCCAGT AGTGTGATTC TTTCTATTGC TGTCAAGAGT AGTTGAGTAG
CATGGGAGAA ACATTGTTCT CATTAATTTC TTTTTGTTGC TTAAGGTGGA
AGACATTAAA AAGATCTCAG AATATGTGGC CCAGTTAAGA AGAATGGGAA
AAGGACATGG TATAATATAT GTTAACTACT TGCATCTTAT AAACCAAAC
GGTTGATTCA TATTCATATA CTTTGTGGT GAAATTAAAG TATTTTATTA
TTGGTGGANC TGCAG
pha12396 (SEQ ID NO:780, 781) CTGCAGGATT TGGCATCTTC TGTTTAATCA AGCATAAATG ATATGGTTGA
TTAATTAGTG ATATATCATG TTGGGATGCT GCTAACTGA ACGGTTGACC
ACTACACAG TGTGATGTTT TAATTAACTT TGCTTCAACA TT[ACC]ACC
TTATATTTAA AGAATTAGGA ATAGATTTAC CAGTAGGCCC TTATGATAAG
AAAAATAAAA TAAATTTACT TTCTCCTCAA TTAAAATCGG CTT[C/A]TG
CATTTCATTT TTTAAGAGGT TAAATGAGAG AATTTCTCTA TATAAAAATA
GATAAGTTAA TCACAAAATT TCAATTTCTT TTTCTTTTCT TCTTTTATAA
TTCTGTTTCT ATTTAAACAT TATTCTTCTG ATTAAGTGGG TTTGTGAATA
TGGATTGCCC AACCTAATTT ACCTAAAAAT TTGGAATAAA CTAAGTAAAC
TTATTTTGTC ATCATTAATT AGTTTAATGT GCATTACTTT GATTACTTGA
GGATAAAAGT GTGTTTTACA TATCTTTATC ATATAGGGAC CATTCGATGG
GCAAAATAAG ATAAGGTTTA ATAGGATAAA TTATTATATT CTI'TT"TCAGT
CAAZ"I"CI"TGA TACACTATTA AAATATAATT AGTTTATTTT ACCATCTATT
ATATCATGCT GTATCAGTGT AGCTTTTATC CCACCCATCT TAAGAAAAAA
TATATGTTGG AGAAAAAAGA AAAATATAAG AAAAAGTAAT GACAGAAGTG
ATAAATTTTG AGAAAAGTTC TTTGATGTTA ATGAAGAAAT TTGTTT...(~550 bp)...TCCACGCT CGCCTCATCG TGCACTCGGT TACACCCGAT AACTTGTTCG
TCTCCAAACT CATCCTCTTC TACTCCAAAT CTAACCACGC GCACTTCGCG
CGCAAGGTGT TGGACGCGAC CCCCAACAGA AACACCTTCA CGTCATGTTC
CGCCACGCGC TCAACCTCTT CGCGTCATTC ACTTTTTCCA CAACCCCCAA
CGCCTCCCCC GATAACTTC ACCATATCCT GCGTCTTGA AAGCCTTAGC
TTCGTCTTTT TGCAGCCCCC ATTGGCGAAA GAGGTTCACT GTTTAATCCT
TCGACGCGGG CGGATTGGAC TCTGATATAT TTGTTCTCAA CACGTTGATC
ACGTGTTACT GCAG
pha12436 (SEQ ID NO:782) CAATTCAAGA AAAAAGACA ACATAAAAGA GTCTATAAGA CCTCC AAG
GA[A/G]TAT [C/T]CCCTA GAA[AA/GG] TCG[GTCG]C TTTCACAAAA GT[C/A]AAA
AAACATATGA T[A/G]TTT CCTGCACTTC ATATAAATAC TCGTCATTTAC
P13070 (SEQ ID NO:784) CTGCAGAATG CTGATATAAG TTGGCAAAGT CGTTTGCATT TTGGCCAGAA
GTTTGCATGC TTGAATAGCC ATTTTCAGCA ATAAAATATA ATCCCATGTG
TCTCTTI"TTG CTTCCATTCT CAGCCTTACT TTTTACCATT TGAAATTTCT
TTTGACATGT ATAAGCATTT ACATTCGAAC AAAATCTTGA AACTCAAGTT
TTAATAATTG ATTATGAGAT GGACTTTGAC TCTAACCTGA ACTTATTGTT
GACCAGTAAC TGTATI"1'T'fC ACTTAGAATC AGAACATAGC TAGAATTATC
GTGAGTATAC TCTTT...(~650 bp)...GGG GCTITT'I"1"i'T CATGATCATT
CATGTATCAT GTTACATATT TGACATAGCA GTAGGCGAGT TCTGAAATTG
TTCTTTGGTC AGGTTGTCAG GCACAAAAGG GATGTCAATC GATCATGTTC
TTAGAAGCTC AAGCAGGTAT AAAAAGGCCA GAACAACGGC CTCTCAATTG
GAAGAGACTC AAAGCCAGCC CAAATTTGTT CCAGACAGCC TCGCCGAGTA
ATCACACATT TCAGCAGAAG AAAGGAAAAA GGACACGTAA CCACAACAG't' GGTATI"I'TTG TTCATITT'TG TGTCCTTACA GAGCTCCGTC TCTCTGTTTT
TGTTGCCCTT TTACTTGTAG GTTTATCTCG TTCTTAACAT TCCAAAATCC
AAACAGAAAA CTATTTT'TG CCGTGTAA GCGTGTTTAC CAATCTAGTT
GGTGATTTA TGTCACGAAT CTAGTAAAGG AGAACAATGT GATATATTCT
ACTA[C/T]T ATATGTATGT TACACTAAAT TACTTATATC CCTAGTGTGA
ATTGTGTGAA GTATCTTTCA ATTCCATAGA GGAATAATC TCTTGACTGA
TAACTTGCAG TAGCGTTTT CTTITI"TAAC CACATITITT TCTATCTACC
AGAATGAAAC TTGAAACTTC CTTTAAGGAA TATGGGCCTA GTCAATAAGA
GAGATAATAT TTATATTTAG AGAGAGAGAG AGCAATAGGT ACAAGTTACA
AGCAAGAAGT GGTACAAGAA AAAGGCATTG GAAGCGGGTG CACAAAATTC
CATTCCCTTA GCATTAGCAT AGCAAAGTCA CTACTCTGAT CTCATTCTGT
TAAGCTGTTG AGTGCCGGTT GCAAAGTACA TAACCCAAAC AATCGCAGGT
GTAACCACAA TAAGCAAAAG CTGTCCCCTG TTGTCACTGG CGCAGCATCA
GCAATCACAG CCTCACTAGC CGATGCATCT GAGCCAAAAA GCATGCCCGA
TGCGCCCAGC CCAGCCCAAG CCCAATGATG ACCCCTTGCT GCTACGCATG
CGGTTCAGTT GGTTCAGCGC TGGCTGCAG
P13071 (SEQ ID NO:785, 786) CTGCAGTCCT TAGGACAGTA TTTGTTTGGT TCAACCAGCA ACCATCCTAG TC
TACAAA TCTTTGTTCT AACTATGGAT AATGGATGCC TTCAACCACA ATTAGCA
AAGCCAGGA T AGCTACTT GGTTCAAGCT TGTTGATTAC TGCATAATCC TGGT
AATT TTGACAGATG AGTCTGAATG AAATGTTTCA GAATCACAAC CTGTTGAG
TCTTGAGACT GGTATATTCT AGCTCTGCAT AGTAGACAGA AACAAAAAGT
CACTCCGG TAAAGACCAG AAACTGGATA TCTCAATAGT CAATTGTGTC
AAAGTATAAA TTATACTTAG TTTTCCAGAG AAAAATCTTT GGACCATCTG
AATCCAAATT TACTAACTAA TAAATGGAAC CTCCAATACA ACTACATCAT
TAAAGAATTC TGCTGGACAA TATTCTGTAA AAATCCATTA ACTTAATTGA
TATACTCACT CATAATTGTG CTATCTTGTT CGATGTTTGT TAATTTTTTC
AAATTGGGAT GTTGGTTTCC ATTATATCTT TTGCAAAAAA TTTGAGCATT
ACAAAATAAA ATTATAGAAT AAGATAAGCA GTTGGGGATA GAGAATCTCA
TITCI'CTGCA TTTCGTTAGA GATATTACGT TATATTAGAT ACTTCAGTGT
GGTTATGACA AGAAACTCTA TACTTTCTGG CATCTGAACC AAAGTTGTGG
TAGACCCACA CTAACATCAT ACCAAAGTG CTCATTCTTC ACAGAGTTAG
GAAAAACTA TACTTAACTA TAAAAACTCT GGCTACTCAC AAGAATGCTA
AGCTAGTTTT CTAATATTAA TTCAAAATAC AGAAAATTAT GTTTAATCTT
ACACTGGAAA TCAACGAAAA GAATGGAAAA AATAAAAATG AGGAAGAGCG
TATTGGATT GTTCACTTTA ACATCAACTT ATATCTCAGG AATTGTAAAG
CATCCCTGCA TCTCTATATG AGATTTTAAA CTAATCCAAA CAAGCTAACC
CTTCCTAGTC CTAGAACTCT TATCACATG CTGAAAGGTA TACACTCACT
ACTAACTAAA TAACATGAAA ATGATGTACA CTTTGATTAC AAACATTTTA
AGTTTTAACT GGTTTAAAAA TAGTTCATAA AAGAGAATAA TATATAGGTT
CAACAGTTTT CCAGCTAGTC CGCACACCTA TTTGCCACTT TGAAGCTGTG
TGTCTATTTC CGCTTTGAAG CTGTGTTATG AGATAAGACA ACACTCCAAC
TTTTATTCTA CAAATGCTGA TTTTCTTAAC AAAATCTTGT TTCTACTCTT
ATGTTGCTTG CTAATTAATA CCTTACCATG TCAACATCTA ACTTGAGAAG
ATCTATTIT'T CTAAATAAAT TATGTTTCTT CTTTATATAA TCGGCCTTCG
CTAATTAAAC TTTATTGTCC TTCC'IZTI~TA NTGCTGTTCT ACTAAGGAGC
AAGGGTAAAA TTGCGGTTCA TATGCACCAA CTA...(~300 bp)...GGCAGTAT
TATTGGCTTG GGAGGCCCGC CCATGAGAGT TCTGCAAAGT AAGCAGCGTC
AACTCACGGC CAGCCTATGA CCCCTCCAAA AACTTCTTGC ACATCACAAT
GAGTGCCGCA AGCAGCTCCC ACTGCTTGCT TTGTTTTCCT TACTCATTAG
TTTAAGGCAA [A/G]TTTT[C/A]C[G/A] GTGCTGCTTT CCTCTT[G/A]TTTATTGAC
TAGATGTTGG GTCGATACCT CGTTT[G/C] CAATCGATGT GGGGACTCTT
TATTCGAAGT TTGCTGAACT TGTGCTTTGC TATCTAAGCC TTGCACATCA
GCTTTTGTTA AATGAT[T/C]CCTTGCACC TTGAATCATG ATCACATTTT
AGATTCTGAT ATTAGTATTA TG[C/T]TTT TGATGGAATG CAAGGAAAG[G/A]
AT1T1~T AATACAG[CC/AT]ACAAAA TGATACCAAA AGGCTTTATC
AATTGATTAA AAAAAGTGCA AGTGTTATTA TGCATGCACT TGAATTTGGT
TACAACTTGA TTTTTTACAC AGCTTGGAAT TTACTGTTTG TAACAACTT
TCTCCCAACA CAAGTTTCG CTCAGTTTTG AATCATAAGG TTAGTAATCA
TTCAGCTGCT TTCAGTTATT TGGATCCTTC TTTGGCATGA ATGACTTTTG
GTTTTGGACA TACTTACAGG AATAATAATA TAAGCTGGGA TGCCAAAGAT
AATTGGAAAA GTACTCTTCT GCCTTCATGG TACTAGGCTT TTZTI"TCTGT
GAAAAATAGT GACAGTACAA GAGACTTTTA AATTGTATCA AACTGAAAAA
TCTTTGAGGC GATTTCTATT ACCTTCAGCT TCTTTAAGCT CAACCAACTG CAG
P13072 (SEQ ID NO:787) CTGCAGCACC CCTAGCAAGA CCTGCTTATA TACAGAAGCA AATTTAATCT
ATGAGGCTCT GATTI"ITCTT TCTTTCTCAT ATTTATGTGA ACATCAAGCT
GAAAATAATC CTCATGTAA CCAACTCTCT ATGAAGTTTG AGATCCAAA
CCTGTTTI"TC TGAAAGATGG AAAAGAAAAT GAAAGTTAAA AAGATAGAAT
CAAACATGTA GCTATATTCA ATTGTAAGTG CATAATTCAG CA[A]TGAAA
GGTGATACAA TCAAAATTAC CCACCTCGGG TACAAGGGAA CATAGTCTCC
ATGTTGATCT GGC[A/T]AG TTAAATAAAA TTGAAACGTT AGACAAATAC
ATAAAATGTT ATCCCAAACA CTCAGAAAAA TAAATGAGTT ATTTGAACCC
T[C/A]AATA TG
CTGATAA TACCAAAAGG AAAATAACTT AGAGGCATTG CTCTCTATTC
AGAATTGCAG GTTCATCTTG ATAATTCATG TTATAAAGAT TTGCATTAAA
TATTAGAAAT TTAGAATAG CTTACCGAAG TGAAACACCA AATCCGATTA
GAGAACAAC ATCAAAAACA CCTAAGGAGT TGTATCTAAG TTTTGTCCAC
ATCAAAAAAT ACATTGATGA TGCGAAAAGG AGAGCATACC TCTTCCTGGA
ATGAAAACAG TCCCACTCTC TTTGCTGAAA TTGACACCAG GATTGTAACG
CTGCAG
P13073 (SEQ ID NO:788) CTGCAGCATG GGCACTATAC AAGGCTCAAG AAGAGCTCAT AAAGGTTGCA
AAAGAGTTTG GTGTTAAGCT CACAATGTTC CATGGCAGAG GAGGAACTGT
ATACTATTCA TGGCTCACTT CGGGTAACAG TGCAAGGTGA AGTTATTGAA
CAGTCATTTG GAGAGGAGCA CTTGTGCTTC AGAACACTTC AGCGCTTCAC
TGCTGCTACA CTTGAGCATG GAATGCACCC TCCTGTGGCA CCAAAACCAG
AGTGGCGTGC CCTCATGGAT GAGATGGCT GTCATTGCTA CAGAGGACiTA
TCGCTCCAT TGTTTTCCAG GAACCCCGTT TCGTTGAGTA CTTCCGATGT
GTAAGTATTG TTGAATACTT CAG[T/A]AT AGAAAGATGT CCTTGAAAAT
CTAGCAGTTT AAGTGGCATA TTTACAAAAA TGAT[A/C]A TTTAGTTAGC
ATGATTAACT AAAATGCAAT TGTTTCCAAT CAAGACAAAA TTCCTTTAGC
ATTATTGATG TTAAAATAAA TCGTTAATAA TGTTTTACCA TTTT[T]TTT
TCTTCCCCAA TCTTGTGAAT ATATTATTAT CAGTTGCAAA AATTCTGATT
CAACTGGAAT ACAATTATAT TTCTGATGA TTTAAGAAAC ATTTCTCTTT
CCTTTGGAGT CACAACTAT TTATCTTGCA ATTTATTTCC TTATTTTCTT
TCCTTGCTTT AATGCTGAAT ATTGTAAACT GCAG
P13074 (SEQ ID NO:789) CTGCAGGATA TGGAAAGTGG AAAACTAGAA AACGGAAAGC CTTTTAG TG
ATAGTGTGAT ATACTACTAT ACGTAAGTTA CATTCATTCA CCAACAAAAA
AAAACGTAAG TTACATGCAT TAGTTTTCCT TCTTTAAGGG ATAAAGTGAT
CTTTGAGGTA ACGATGGCCC AATCAAATGA GGTAACGATT GAGCTAAAAT
GCAAATGACA CAAATGCAAA TAGTAGTAAT TGCTTGAGAT TTAGGGGATT
AGTTTAGCAA GCATAGTGAT CATCATTTTA TAAAATTAAT AATGATATAT
TGAGGGACTT TTTAAAATCA TAAAACGTTT TAAATTCCAA CAGAAAT_GG
ATAGCAAGTC AATTTCATGC CTTGTGATAG AAAAAGAAA AGTCGTAGTA
AATTATATTT TGCTTTGTAT ATGCTCAAAT CACATTTTTT ATTTTCTTTT
CTTTTAAAAA TTTAAAATTT AAAATATACT GTGTAATAGT ATGCTACACA
TTCATCATTT TAGAAAAGTC AAATGAATTA TTTGATITI"T TATTATATTT
TTTTATTCAA TTTGATTATT TATCTI'T'TAA AAAGTTTAAT TTGA
ATCTTATTTT TTGGTTTAAT TTAATTCTTT ATCTTTTAAA AAAATTAGTT
ATTTATCTTT TTTTAAAGTT TTATCATTCG ATATTTAAGA TTGACGTCAT
TAATCATTT AAGAATGAA[C]TI'TTTCAG TTAAAAATGA AGTGGAAAGG
AG[A/G]GAA AAAATGAA[A/C]AAAAAAA ATTTATTTGT TAACGATGTC
AATTTTAAAT AGATTAAAAG ATAAAAAAAA ATCAAATAAA TCATTTAGTC
TTTTAGAAAT GAATGAGTTA TATTCAAATT TTTTAATATA GATGAACAAA
TGAATGAGTT ATACTTAAAT TITT'TAATAT AAAAGAACA TTTTTAAGGC
AGATGAAACT TAGGCCCTTT TCTGAAAACA GTTTATTAAC CCTTTI'CTGA
AAACATTCT TCACATTCAC TAAGTACATC TTCATGTCCT GCAG
P13158 (SEQ ID NO:790, 791) CTGCAGCAAC ATCAACAGTG TCCCGAATCG AAGCGTCTCG ATCCGGGTTC
CAACGCTCGC CGGCCTGGAC ACCGGAGATC GGTATCCGCC GGAGCTCCGT
TAATCTACTC CGGCGGAGCC ACCTTACTCC CAAGCGGGAA CATTTGTCCG
TCCGGGAAGA TCCTCAAACC GGGCTTGCCC TCGCGCGGGT CGAACCGGAC
TGATGTGTTG GGCTCCGGCA CCGTGAAACT ACGGCCGGGG CAGCATAGTG
CGAGGCGTCT CGGGCAATAT TCCGGTGCCC GTGGGCGCAC TGCCGCCTAC
GGTGAAGCGC GCGCTCAGCG GCTCCGATCC CGAGGAGTTG AAGAGGGCTG
GGAATGAGTT GTATAGAGGC GGGAACTTTG CGGAGGCGCT GGCATTGTAC
GATCGCGCCG TCGCCATCTC GCCGGGAAAC GCCGCATGCC GAAGCAACCG
CGCGGCGGCG CTTACGGCGC TCGGGAGGCT CGCCGAGGCC GCGAGGGAGT
GCCTCGAGGC GGTGAAGCTG GACCTTGCTT ATGCCAGAGC GCACAAGAGA
CTTGCTTCTC TTTATCTAAG GTAATGTATT AATGGAAAAA TTTGGATTTG
GATTTGCATT TGAATCTGAG TTTGAGTTTA GTTTTGTTGA GATTGGATTG
GAACCAAGAA ACTTGAGTTT AGAGCTAGTC AAACTTGATT ATGGCTTTGG
NCAACGTGTT TGGTACTCCC TGTTAACGTG ATTAGTGGAG...(~50 bp)...TGACT CTGTGTTGAA TGTTGATGCT ACTTTCAGTT GCTTCTGTAT
CCAAAGAAAC GTGACTCGTG ATATATCACT TTTGTGCAGG TTTGGACAGG
TTGAGAATTC GCGGCAGCAC CTGTGTCTCT CTGGGGTTCA AGAGGATAAG
TCTGAGGAGC AGAAGCTGGT GTTGTTGGAG AAGCATTTGA ATCGGTGCGC
TGATGCGCGG AAAGTTGGTG ACTGGAAGA GGGTGCTTAG GGAATCTGA
GGCTGCCATT GCTGTTGGAG CAGATT'ITTC GCCTCAGGTA GTTTTGAATT
GAAATTTCTG ATGTTACCAT TGTCTACATT [G/T]TI"I"IT G[C/T]TAGA
GATGTCATAT GAAATTATTA GCGTGTCTTT GGTTAATAGC ATAAAGTTTT
AGGATAGCTA GGGTGTGATT GGTTTCTGTT TTCAAATAAC TGTTTTTAGT
TTCCAAAATA CAACTAAACA AGGTTGCTCT GTTGGCTGGT GCAGAGTGTA
GTGCAAGGGA CAGAAAAGTG TGGAATTGAT GGGGAAATTT AACAGGTTTT
AATAATTTCT TTTGTTTGTT TI"1"I"TGGTTT TTGGTI"T'I"TA GAATACTTTT
[T]TTTTGAA ACAATTTTTA GAACTTAATA GATTTTGGAT GGTAAATTGA
TTGGTAAG[C/T]AGTGTAA AATTATTI"I'G GGAAACTGTT TTTAAAATCA
AAAAGTGAGG AGAATTAATG AGGTCCTTA GTTCGTGGTA TGGTGGTAGA
CTAGATTCTC TACAATCAA GTATAAAATC ATCCTCGGGT TTTTACTTGA
TAGTTTAAGC TTTTAGGATA ATTGGTTTGT GACAATTTGT ATTGGGCTAA
CTTGTTGGAT TGCTACTTAC AGA -TTGTTG CTTGCAAGGT GGAAGCCTAT TTAAAACTGC ATCAACTTGA
AGATGCTGAA TCAAGTCTCT CAAATGTTCC GAAGTTGGAA GGTTGTCCTC
CAGAGTGCTC TCAGACCAAG TTCTTTGGTA TGGTTGGTGA AGCCTATGTT
CCTI"ITGTGT GTGCACAGGT TGAGATGGCC TTGGGGAGGT AAACACTAAA
AACCTTAGGC TTGAAATCCA AAGCTAAGTA AAACTTTTGA GTGAGGAACA
AGTAATGGAT TGTTGCAGGT TTGAGAATGC TGTT
P13560 (SEQ ID NO:792) CTGCAGAACT GGTGGTGGTA CTCAGTCGC ATTGCATTAA CATCTATAC
TATCTGTTCC TGCTGCACTG ATTTCAGTAA AGGATCTAAA AGCTTTGAGA
CTGGG[A/G] TTTAATATGG AGCTTATAGC TATTGGATGT TCAGTAAGAA
TAGACATGAT ACTATTTCCT ATGTCAACAT AATTCTTCTT CTTCTGACTT
GGAAAGTTAA CATGATCTTC TTCTGAATGC AGGCAATTTT TGTCTTATCC
TTTCGAGGTG TTATCCATAT ATGGATCATG GGGAAGAGGG GCCCTGTCTA
TGTTGCAATG TTTAAGCCAC TCGAAATTGT CTTCGCAGTC ATCTTGGGGG
TTACTTTTCT TGGGGACTCT CTTTATATTG GAAGGTATAA CTCAGTGTTT
TGT[C/T]AG AGAGTTATTT TCTTCTTACA TACTTCACAT TATTTTGTTA
AAATCCATTT TCT[C/T]CT GTCACTCTAT ATCACACTTT TCAGCTTATT
TTATACTTCT TTCTTTTCTC CATTTATGTC TACCTTA[TTA]GTTGTATA
AGAAGTTGTA AAAACCGTAG GTAGGAATAT AATTTCTCTT ATGTTAACAT
GTTCAATTTA AAACATTTGT TTCGGGTCTA ATTACAAGGG TCCAACTTTA
TGTACAGTGT GATCGGAGCT GCCATAATAG TTGTTGGTTT TTATGCTGTT
ATTTGGGGGA AAAGTCAAG AGAAGGTGGA GGAAGATTGT ACAGTCTGC AG
P13561 (SEQ ID NO:793) GAATTCTTAC AATCTCTTGA TTCATGTAAC TGACTTTATC AGAATAGTTC
AGTATACATT TTGATAACTT CACAATCTAA AGGATTCATC ATATAAGCAT
ATCAAAGAAA AGGTATGAAG GTAAAGTGGT TAAAGATAAT AAATATCTGA
CCTCAGGTAG TCTGGAAGCA AAATATTTTA TATAATCCCG TCCAATGTTC
TTCACAATAC TGTCTAAGAG ATAAAGAGAT GGCAGTITI'T GATCACTCGG
AACCTGGAAA GTCATGAACT CATTTCAATT AAACAAGACC TTTTTTCCAT
AATACAAAGA CGCCATGAGA GAAATAAGAT TTCCCTGACA TATGAAAACA
AGGGAAACAG CTGCAATCAA CGATATCTAA TCCAATTAAA AGTTAGTATC
AAATTTCACC AAAAGTGGAC TTTAGCACTG ATTGAGATTG GAAACTTTTA
GGGTTCACTT CTCTCTTGTA AATGGAAAAA TCCATGTTTA CCAGGATTCT
CAATCCCCTG TGCATACCAA AACCACTCCC CAATGGTGCT AATTACCAGT
TACACACGTA GTCAAAATAC CAAGCTACCG TAATTTGGCC AAAGGACAGT
ATTTGTTTT TCTTTGTAAC TCTGATTGGA AAACTAAATA TGTTTIZ'CGT
TCTTTTAAGG TGCAGGGCAG AGTGTACCAC TTAAGAACCA ATCTACATCT
ACAGGGCCAA TGAAATGAGA TGAAATTCAA TTTGAACAAA CTATGAATGC
CCZTiZTCTG ACTAGCATTC ATTTCCATCT AAGTACAACA CTAACCCAAT
TCATCAACGC ATAGCATGAC AAACAATAAA ACTAATTGAA TTGCTCTTTC
TTCATTCCAA ATCCATTACT ATTCAATTCA ATAAACTTGA CATAAACAAC
TGCATATTTT GTTCACCTCT ATAATATTAG CACAAACGGT GGCAGCAATT
GCCTTGGCAG CAGACAAGTT CTCTCCAGCA ATAATAGTCA AGTTGGTAAT
TATTGGCTTC GAATTGAAAG TGAGCTCAGC AAGCGCGGTC TTGTACTGAA
TCACAAGCTC TTGGTGCGGC GGAGGCTGCG GCTGATACCC TCCGCCGCCG
CCGGAGTCTC TGTCATATGC TCGGAACCTC GTAGACGGCA ACGTAGTAAC
TGCCGCAGGG CGTAGAGGTA ATTGTCGGGC GCTGAGTTCC TCGATTAATC
GAGGCTTCTT GGGACCGGGT TCTCTCGATC TGTCCAACG ATCTCTCCAT
GTTCATTCG AATCAAATGC AAACGAATTT AAAACCCTAA TCCTAACCTT
TATTGGTTCT CG[C/G]GCC GGTTTTATTG GGGAGGGGGA ATGAATCGAA
GAGAAGCTCG GATTTAAAAT TGTAGAGGGC GAATTGAGAT AAACCCTAAT
CCTAATTTAC ATGAATTAAT AAAAAATAAA AAATAAAAGA GAGAGATGGA
AGAAGGTGGA AGGAGGTGGA GTGTTTATAA GTAGGCTGTG ATCTTGGTTG
GAAAAAAAAA [A]CGAAGAA GAAAGAGTTC AAAAACTTAT GATGGAGTTC
TTATATTTTT TAACGGTTTC CTAAAGCTTG ATTTTAAACG ATAAAATTTT
ATTAAAGTAA AATAAACATT TTCTGAT[G] AAAAAAAAAT ATCACTI"I"IT
TGTGTAAAGA ATATATCACG TTTAAATGTT TAAAATTAAC CTTAAAATTA
AATA'1~I'T"1'TA AAAATCTTTT TAGATTAGGA GTGTTTGGAT AGGATTTAAG
ATGAAATAAA ATTCAAATCA ATCGTATTTA AATAAATTAA TTTAGTGTTT
TAAAGTATCT TCAGTCGATC TAAATCAAAT CAATGTAATA ATATTAATAT
TATGTTGTTT TGATACTTTA ATTTCCAAC ATAACATATT ACTTACATCC
GACTTAAAT ATAATTAATA ATAAGTTGTT TTCATGCTAG GAATTTTATT
ACAGTATCAA AACACTAACA TGATTTAAAT ATCCTTATGA GATAAGGCCA
AAAAATTCCA ACGTGTAAGT GATACCCAAC TTCTTATTCG TGATGTCTTT
GTTGATGGCC ATTGGCGTTG GAAATATTTT TTCCTCCATT ATTCTAATTG
ATGTTAAGCA TAAGATGATG AACTTAATTC TTGATGAAAA TAGTCATGGT
GTGATCATTT CGGGGCATAT CCAATCTGGT ATCTACACTG CCAAGTCCAC
CCATGCAAAT GATTGATTCA CGAGTCTGCT AATCATAGTT CACTTACTGA
GAGATTGTAG CAAGGAAACT CAAATATGGA GTCTTATAAA TATGGATTCC
AAGTGTTGAC TTCTTCATTG TGAATTAAGA CATTTGGTTC GCAACTCAAC
TGAAACGACA TAGAAATCTT TTTGCTTGTC TTGCTGGGTG GTTGATTTGG
ATATCAAGGT ACGTGGAGAT CTTTGAGAAA CGCATTTGGT CTACATGG
P14257 (SEQ ID NO:794) CTGCAGAGGA TGATTGATGA GGGCACCTCT GAACATGTGC TAATAGTGTT
CAATGGAGTC ATAGATGATG TTGTTGAGCT TATGGTGGAC CCTTTTGGCA
ACTACCTTG TGCAGAAGTT GCTTGATGTG GGCGGAGATG ATGAAAGGTT G
CAGGTTGT GTCAATGTTG ACAAAAGAAC CAGGGCAGCT AATCAAAACC
TCTTTGAATA TACACGGGT ATATGCATCC TTCTGTTGA ACTGAAAGAT
TGTTCTTTTT TTCTTITTCT ATATCATAAT GAACATGTTT TCTIZTrCTT
TTTGACATGC TGAATTTGAT AGTGTTGCTG TATCAGGACT CGGGTGGTTC
AGAAGCTGAT CACGACTGTC GACTCTAGAA AACAAATTGC AATGCTTATG
TCTGCTATTC AATCTGGTTT TCTTGCTCTT ATTAAGGATC TAAATGGGAA
TCATGTCATA CAGCGTTGCT TGCAATACTT TAGCTGTAAA GATAATGAGG
TATAACACTT ATCTCTTTTG CTTGCAATTT TATTGTAGGT TTATTTTCCC
ATTATTCACT AACTTCAACA GTACATAGCC ATATAGTTAT TCCATACTAT
TCAGTTAAAT TTATAAGAGA ACAAGAACAA CCATTAGCTG TCCACTTTAC
AATTTTGTTC TAAGCTGTAT GAAGGTATCA TTACACATGA TGACACAAAA
TGGTCTTTAG TCAACACTGC TTAGAGCAGA GGGCATTCCT GTTTTATTAA
TGTTTCATAA ATTGGTTTAT TCATGGTTTT GTGAAGCGTT GGCTTCTTAT
AGTGTGAGCT CTCCTGCCTC TTTTGGTCAT TTGGTGTCAA CCAGATGAAA
TATATTTCAC AAAAGGGTCT TAAATCTTAC ACCTTGTTGA AACTTTTTTA
ATTGCATGGT AATCTCCCAG TCTATTTTAT GCACACTACT TCAGAATATC
CTGACTTTTC AAACTAATTT CATTATTCAG CAGTTTCTAA ATTTGCTGGA
TACTTCATCT CTGGGATTTT CATATTTTCA AGCCTATGTT TATCTCTGGT
CACTTCAGAA ATTTCTAAAA GGTAAGCCAA CTGTGGAAGG AACAAAATTC
ACCATCTATT GAGGGATACT ATCTTATGAC CATAATGGTG AAGCATTCTA
TATTTGGCCA GTTAGCCTGA ATATGTTCCT TATTTTCTTT CCCAATTAGG
AATTCTTAA[T/A]AGGCTA ACTCTCTACT ATATAAGAAG CAATGGAAAT
AAATGTTCTG GTTGAGAATG TAATGATTAT TGACCATTGA CATGGGACTG
AAAAAGTTAG TAATAAATTC GGGTGATCCT TTTCATTTTG TGCCAGCCTC
TAATAAATTA AATTATTTG ACTCATTGCC ACTAAAATTT CTCTGTTTC _ ATTTATTACT TACCACGATA AATATTAAAT ATAACGTAGG TACTGCAG
P14395 (SEQ ID NO:795) AAAGATTGA GCCCTACAAT GACTAAATTA CGTGTGTAC CGATTAATAT
GTTTTAGAAA AATAATCACT TTTC'T[A/C] TAAAAAAA[A/T]TTAAGAC
AGTTTCGGAA AAACAAATAT TTACTCAAA TTTATTAATT TCCAAACATA
TATGTTTCGT TTATATATAG TTCACAAAGA AAACACCGAA GTTTGAAAGC
AAACACTACA [C]AAAAAAA AAAAAAAAAA AAAGGAGCGA AGTGCTAGGA
CAAACCCTAC AAAATAATTA ACCAGGAAGG GTAGGTGGTT ACGTTGTTAA
CTCCAAAGGA TGAAGTTTCA ATAATTGATC ATTTCCTTZT TGCCCAATTG
GCATAAAAAG TAAATTTTAT GACAAATATC TAACGAGGAT ATGGCTCGGT
GATTGGAGCC ATAATTGATG ATTCTTGAGC AGATTGACTT CCTGAATAAC
CTGACCTTGT TACCTCTGAA TGTTGCCCAC CAGACAGCAT AGCCGTAGGC
AAAGAACTCT TATCACGCAT TAAAAATGCA GGTTCAGAAG GTTTAGCAAG
TGGGAAAGAG TCACTGTTAA GCATCAGTAA AACGGTATTC ATAGTTGGTC
TATCAGCTAT ATCTTCCTGT ACACACAGTA ATCCAATGTG AATGCATCTC
CTTATTTCAT TCCAAGAATA ATCCT'fTAAT GTGTCATCTA CAATATTTGA
AACTGTCCCT CCCCTCCAAT TTTTCCATGC CTGCAAAAAA ATTGCTCAAC
TGAAAATAAA ACGTTTATAA TAT[AT]GAA ATTTCTTAAA TGTGGAAAAT
TTC[A/G]TA TGTACAACTA TAAGCTTCCG AACTTCAAAT AAATAGGGAA
ATTAACTATT AACTAATAGT CGTGAAAAGG TATCACACTT ACAAAGCTTA
ATAGATCTTG TGCATTTTCC TCGCTACCAC GAATCTCACT GTTTCTTTGT
CCGCATACAA TTTCCAGAAT CATTACGCCA AAACTAAAGA CATCTGACTT
GACTGAAAAC TGTCCATATT TAATGTACTC AGGAGCCATA TATCCACTGT
ATATCACAAT AAGCAAGCAC ATTT[A/G]G ATTAAAAAA[A/G]TATTAC
TTCACCATTC ATGTTTCACC TATTGTI"T'I"T TTAAGGGCCT ATTGGCTTTT
GTTCTAAAGA AAATCACTAA TTGAAAATAT GGAATATCTT ATTAACTCTC
AAGTGTTGTT ATACTTACAA GGTCCCGACA ATTGTATTTG TACTGGCTTG
AGTTTGATTG ATCTCAAATA ATCTTGCCAT GCCAAAATCT GATATTTTAG
GGTTCAACTC TTCATCTAAC AAAATGTTAC TTGTTTTGAG ATCACGATGA
ACAA[C/T]T TGTAATCGAG AATCTTCA[C/T]GAAGGTA AAGAAGACCT
CGAGCAATAC CCCTTATAAT ATTATAGCGT CTTTCCCAAT TCAAATTCAC
ACGATTGTTT GGATCTACAT AAATACCCAA GCCACCATAG ACATGTATAG
TCTTTCACAA TTTAATAAAT TGTTCAATAA TATCTCTTTA TATCATCAAA
GGTAAAGTCA AAGCAAAAAT TTGTAACTTA CCAAATATGA AATAATCAAG
GCTTTTATTG GGAACCAATT CATATATCAA TAACCTTTC TCTTCTTGAA
AAACAAAAGC CAAGCAGTC TAACTAAGTT TCGGTGTTGA AGCTTCCCTG TTAC
In the above, ambiguous bases are represented by an N. All bases noted within brackets are polymorphic in some manner, i.e., they differ between genotypes (but the sequences are known and represented). Bracketed sequence notations that do not contain slashes represent insertion/deletion events, i.e., the bracketed sequences occur in some genotypes but are deleted in others. Bracketed bases before and after slashes are substituted for one another in different genotypes; e.g., the [AC/TT] notation indicates that some genotypes exhibit an AC sequence at this position and others exhibit a TT. Similarly, [TCTAG/TCTGG/TC/GTTAG] indicates that genotypes exhibited one of these four different sequences at this position, i.e., the site displays a combination of insertion events and single-base polymorphisms.
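The bracket notation above can be read mechanically. The following is a minimal Python sketch, not part of the original disclosure, that expands annotated sequence text into explicit alleles. It assumes brackets are never nested, that spaces are only the 10-base grouping used in the listing, and that a token such as `14(TA)` (as in the `[14(TA)/15(TA)/16(TA)]` site of pha12395) means 14 tandem copies of `TA`; the text does not define that shorthand, so this reading is an assumption. All function names are hypothetical.

```python
import itertools
import re

# One polymorphic site: "[...]" with no nested brackets inside (assumed).
SITE = re.compile(r"\[([^\[\]]*)\]")
# Assumed repeat shorthand: "14(TA)" read as 14 tandem copies of "TA".
REPEAT = re.compile(r"^(\d+)\(([ACGTN]+)\)$")


def expand_allele(text):
    """Normalize one allele string and expand the n(XY) repeat shorthand."""
    text = text.replace(" ", "").upper()
    match = REPEAT.match(text)
    if match:
        return match.group(2) * int(match.group(1))
    return text


def site_alleles(body):
    """Return the alleles encoded by the contents of one bracketed site."""
    body = body.replace(" ", "")
    if "/" in body:
        # Substitution (or mixed) site: alternatives are separated by slashes.
        return [expand_allele(a) for a in body.split("/")]
    # No slash: an insertion/deletion, present in some genotypes, absent in others.
    return [expand_allele(body), ""]


def split_sequence(seq):
    """Split annotated text into choice lists; fixed runs get a single choice."""
    parts, pos = [], 0
    for match in SITE.finditer(seq):
        parts.append([seq[pos:match.start()].replace(" ", "")])
        parts.append(site_alleles(match.group(1)))
        pos = match.end()
    parts.append([seq[pos:].replace(" ", "")])
    return parts


def haplotypes(seq):
    """Enumerate every sequence obtained by picking one allele at each site."""
    return ["".join(choice) for choice in itertools.product(*split_sequence(seq))]
```

For example, `haplotypes("TT[ACC]ACC")` returns `["TTACCACC", "TTACC"]` (the indel present and absent), and `site_alleles("TCTAG/TCTGG/TC/GTTAG")` returns the four alternatives of the mixed site. Enumerating all combinations grows multiplicatively with the number of sites, so this is only practical for short windows around a few polymorphisms.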
The nucleotide polymorphisms described by this invention can reduce the expense and time required to exploit genetic markers in soybean improvement, seed production, and the protection of proprietary rights. We describe the use of allele-specific hybridization (ASH) as one technology that can be used to detect these polymorphisms. Using these polymorphisms with a technology such as ASH is less expensive and less time-consuming than other genetic marker methods. The polymorphisms described here have the genetic advantages of being co-dominant and locus-specific, and the operational advantages of being obtainable by PCR amplification from small quantities of DNA template and detectable without gel electrophoresis.
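To illustrate why a pair of allele-specific probes yields a co-dominant call, here is a toy Python sketch. It models stringent hybridization as an exact match between the probe's reverse complement and the target, which is a simplifying assumption rather than a model of duplex thermodynamics or of any particular ASH protocol. The probe windows in the usage note are taken around the `CTT[C/A]TG` site of pha12396 purely for illustration, and all function names are hypothetical.

```python
# Watson-Crick complement for unambiguous bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def hybridizes(probe, target, max_mismatches=0):
    """Crude stand-in for stringent hybridization: the sequence the probe pairs
    with (its reverse complement) must occur in the target with at most
    max_mismatches substitutions."""
    site = reverse_complement(probe)
    target = target.upper()
    k = len(site)
    for i in range(len(target) - k + 1):
        mismatches = sum(a != b for a, b in zip(site, target[i:i + k]))
        if mismatches <= max_mismatches:
            return True
    return False


def call_genotype(probe_allele1, probe_allele2, amplicons):
    """Co-dominant call: report which allele-specific probes detect a target."""
    hit1 = any(hybridizes(probe_allele1, a) for a in amplicons)
    hit2 = any(hybridizes(probe_allele2, a) for a in amplicons)
    if hit1 and hit2:
        return "heterozygous"
    if hit1:
        return "homozygous allele 1"
    if hit2:
        return "homozygous allele 2"
    return "no call"
```

With probes `reverse_complement("CGGCTTCTGCATT")` (C allele) and `reverse_complement("CGGCTTATGCATT")` (A allele), an amplicon carrying only the C allele is called homozygous for allele 1, and a sample containing both amplicons is called heterozygous, because each probe reports its own allele independently. Allowing one mismatch makes the A-allele probe also "hybridize" to the C-allele target, which is the loss of allele specificity that stringent conditions are meant to prevent.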
Accordingly, the disclosures and descriptions herein are intended to be illustrative, but not limiting, of the scope of the invention, which is set forth in the following claims. One of skill will recognize many modifications which fall within the scope of the following claims. For example, all of the methods and compositions herein may be used in different combinations to achieve results selected by one of skill. All publications and patent applications cited herein are incorporated by reference in their entirety for all purposes, as if each were specifically indicated to be incorporated by reference.
Claims (68)
1. A method of selecting a first plant by marker-assisted selection of a nucleotide polymorphism comprising the steps of:
(i) detecting a first marker nucleic acid from the first plant which is genetically linked to a locus selected from the group consisting of pA060A, pA077A, pA086A, pA169A, pA280A, pA378A, pA505A, pA519A, pA588A, pA947B, pB032A, pB032B, pB039A, php02265A, php02370C, php02387A, php02388A, php02393A, php02396A, php02636A, php03522A, php05233A, php07659A, php08584A, php10355B, pK079A, pK401A, pK418A, pK644B, pL058A, pL183A, pR045A, pR153A, pT005A, php12105A, php02340B, php05264A, pA059A, pA064A, pBLT15A, pA593A, pA882A, p8320E, php02329A, php02371A, pA343A, pA748B, php05290A, php02376A, pG17.3, pB132A, php10078A and SOYBPSP, which loci comprise the nucleotide polymorphism; and, (ii) selecting the first plant comprising the marker nucleic acid, thereby selecting for the polymorphic nucleotide.
2. The method of claim 1, wherein the locus is selected from the group of loci consisting of php02265A, php02370C, php02387A, php02388A, php02393A, php02396A, php02636A, php03522A, php05219A, php05233A, php05278A, php05342A, php07659A, php08584A, php12105A, php02340B, php05264A, php10355B, php02329A, php02371A, php05290A, php02376A, and php10078A.
3. The method of claim 1, wherein at least one additional marker nucleic acid linked to a locus from the group consisting of pA060A, pA077A, pA086A, pA169A, pA280A, pA378A, pA505A, pA519A, pA588A, pA947B, pB032A, pB032B, pB039A, pBLT24A, pBLT65A, php02265A, php02301A, php02361A, php02370C, php02387A, php02388A, php02393A, php02396A, php02636A, php03522A, php05219A, php05233A, php05278A, php05342A, php07659A, php08584A, php12105A, php02340B, php05264A, php10355B, php02329A, php02371A, pA343A, pA748B, php05290A, pA858A, php02376A, pG17.3, pB132A, php10078A, pK069A, pK079A, pK401A, pK418A, pK644B, pL058A, pL183A, pR045A, pR153A, pT005A, pT155A, pA059A, pA064A, pBLT15A, pA593A, pA882A, p8320E and SOYBPSP is detected and the plant is selected for the at least one additional locus.
4. The method of claim 3, wherein a majority of the additional loci are detected.
5. The method of claim 1, wherein the marker is detected by amplifying a nucleic acid comprising the selected locus.
6. The method of claim 5, wherein the amplified nucleic acid is an amplicon selected from the group consisting of pha12105, pha12390, pha12391, pha12392, pha12393, pha12394, pha12394, pha12395, pha12396, pha10634, pha10623, pha10624, pha10649, pha11135, pha10792, pha10635, pha10638, pha10648, pha10621, pha11071, pha11073, pha10640, pha10598, pha10618, pha10620, pha10782, pha11131, pha11132, pha10650, pha10651, pha10637, pha11139, pha10655, pha11701, pha11627, pha11074, pha11075, pha10632, pha11628, pha11133, pha10641, pha11136, pha10658, pha10636, pha10783, pha08230, pha13070, pha13071, pha13072, pha13073, pha13074, pha13560, pha13561, pha14257 and pha14395.
7. The method of claim 1, wherein the marker nucleic acid comprises the polymorphic nucleotide.
8. A method of detecting a genetic nucleotide polymorphism in a biological sample from a soybean plant, comprising the steps of:
(i) providing the biological sample;
(ii) providing a first probe nucleic acid which hybridizes to a first target nucleic acid linked to a first nucleotide polymorphism in a first locus selected from the first group of loci consisting of php02265A, php02370C, php02387A, php02388A, php02393A, php02396A, php02636A, php03522A, php05219A, php05233A, php05278A, php05342A, php07659A, php08584A, php12105A, php02340B, php05264A, php10355B, php02329A, php02371A, php05290A, php02376A, and php10078A;
(iii) contacting the first probe to the first target nucleic acid; and, (iv) detecting hybridization of the first probe and the first target nucleic acid, thereby detecting the first nucleotide polymorphism in the biological sample.
9. The method of claim 8, further comprising detecting a second target nucleic acid linked to a second nucleotide polymorphism in a second locus selected from the second group of loci consisting of pA060A, pA077A, pA086A, pA169A, pA280A, pA378A, pA505A, pA519A, pA588A, pA947B, pB032A, pB032B, pB039A, pBLT24A, pBLT65A, php02265A, php02301A, php02361A, php02370C, php02387A, php02388A, php02393A, php02396A, php02636A, php03522A, php05219A, php05233A, php05278A, php05342A, php07659A, php08584A, php12105A, php02340B, php05264A, php10355B, php02329A, php02371A, pA343A, pA748B, php05290A, pA858A, php02376A, pG17.3, pB132A, php10078A, pK069A, pK079A, pK401A, pK418A, pK644B, pL058A, pL183A, pR045A, pR153A, pT005A, pT155A, pA059A, pA064A, pBLT15A, pA593A, pA882A, p8320E and SOYBPSP.
10. The method of claim 9, wherein the method further comprises detecting marker polymorphic nucleotides at a majority of the first and second groups of loci, thereby providing a comprehensive genotype of the biological tissue.
11. The method of claim 8, wherein the first probe specifically hybridizes to a single locus of a soybean genome.
12. The method of claim 8, wherein step (iv) consists of indirect detection of the hybridization of the first probe to the first target nucleic acid.
13. The method of claim 12, wherein indirect detection of the hybridization comprises detecting a hybridization dependent PCR amplicon selected from the group consisting of pha12105, pha12390, pha12391, pha12392, pha12393, pha12394, pha12394, pha12395, pha12396, pha10634, pha10623, pha10624, pha10649, pha11135, pha10792, pha10635, pha10638, pha10648, pha10621, pha11071, pha11073, pha10640, pha11076, pha10653, pha10598, pha10615, pha10646, pha10618, pha10620, pha10782, pha11131, pha11132, pha10650, pha10651, pha11138, pha10637, pha11078, pha11079, pha11139, pha10655, pha11701, pha11627, pha10633, pha11074, pha11075, pha10632, pha11628, pha11133, pha10641, pha11136, pha10658, pha10636, pha10783, pha10647, pha08230, pha13070, pha13071, pha13072, pha13073, pha13074, pha13158, pha13560, pha13561, pha14257 and pha14395.
14. The method of claim 13, wherein the PCR amplicon is made using the first probe as a primer for polymerase-dependent amplification, wherein the first nucleic acid is a template for the polymerase dependent amplification, and wherein detection of hybridization between the first probe and the first nucleic acid is performed by detecting the PCR amplicon.
15. The method of claim 8, wherein the method comprises amplification of the first target nucleic acid, or amplification of the first probe.
16. The method of claim 8, wherein step (iv) comprises direct detection of the hybridization of the first probe to the first target nucleic acid.
17. The method of claim 16, wherein hybridization is detected using a technique selected from the group consisting of Southern blotting, northern blotting and array-dependent nucleic acid hybridization on a nucleic acid polymer array.
18. The method of claim 8, the method further comprising detection of a plurality of target nucleic acids linked to polymorphic nucleotides in a plurality of loci selected from the group of loci consisting of pA060A, pA077A, pA086A, pA169A, pA280A, pA378A, pA505A, pA519A, pA588A, pA947B, pB032A, pB032B, pB039A, pBLT24A, pBLT65A, php02265A, php02301A, php02361A, php02370C, php02387A, php02388A, php02393A, php02396A, php02636A, php03522A, php05219A, php05233A, php05278A, php05342A, php07659A, php08584A, php12105A, php02340B, php05264A, php10355B, php02329A, php02371A, pA343A, pA748B, php05290A, pA858A, php02376A, pG17.3, pB132A, php10078A, pK069A, pK079A, pK401A, pK418A, pK644B, pL058A, pL183A, pR045A, pR153A, pT005A, pT155A, pA059A, pA064A, pBLT15A, pA593A, pA882A, p8320E and SOYBPSP.
19. The method of claim 8, wherein the biological sample is selected from the group consisting of a soybean plant, a soybean plant extract, an isolated soybean plant tissue, an isolated plant tissue extract, a soybean plant cell culture, a soybean plant cell culture extract, a recombinant cell comprising a nucleic acid derived from a soybean plant, a soybean plant seed, and an extract of a recombinant cell comprising a nucleic acid derived from a soybean plant.
20. The method of claim 8, wherein the first target nucleic acid comprises the first polymorphic nucleotide.
21. The method of claim 8, wherein the target nucleic acid is amplified prior to detection by an amplification technique selected from the group consisting of: PCR, LCR, and cloning of the target nucleic acid.
22. The method of claim 8, further comprising marker-assisted selection of a soybean plant comprising the detected nucleotide polymorphism.
23. The method of claim 8, further comprising cloning a nucleic acid proximal to a locus selected from the group consisting of pA060A, pA077A, pA086A, pA169A, pA280A, pA378A, pA505A, pA519A, pA588A, pA947B, pB032A, pB032B, pB039A, pBLT24A, pBLT65A, php02265A, php02301A, php02361A, php02370C, php02387A, php02388A, php02393A, php02396A, php02636A, php03522A, php05219A, php05233A, php05278A, php05342A, php07659A, php08584A, php12105A, php02340B, php05264A, php10355B, php02329A, php02371A, pA343A, pA748B, php05290A, pA858A, php02376A, pG17.3, pB132A, php10078A, pK069A, pK079A, pK401A, pK418A, pK644B, pL058A, pL183A, pR045A, pR153A, pT005A, pT155A, pA059A, pA064A, pBLT15A, pA593A, pA882A, p8320E, pA059A, pA064A, pBLT15A, pA593A, pA882A, p8320E and SOYBPSP.
24. The method of claim 8, wherein a second polymorphic nucleotide is detected, the first and second polymorphic nucleotides corresponding to different polynucleotide positions, the method further comprising positional cloning of a clonable nucleic acid which hybridizes under stringent conditions to a genomic nucleic acid located between the two polymorphic nucleotides.
25. The method of claim 24, wherein the clonable nucleic acid encodes a polypeptide expressed in a soybean plant.
26. The method of claim 24, wherein the clonable nucleic acid is operably linked to a heterologous promoter.
27. The method of claim 24, further comprising transducing the clonable nucleic acid into a cell.
28. The method of claim 8, further comprising map identification of a second nucleotide polymorphism proximal to the selected loci, optionally by sequencing a nucleic acid comprising the second nucleotide polymorphism.
29. The method of claim 8, further comprising (v) transducing a nucleic acid in linkage disequilibrium with a polymorphic nucleotide from the selected loci into a soybean plant, thereby providing a transgenic soybean plant.
30. The method of claim 29, wherein step (iv) is performed after step (v).
31. The method of claim 8, wherein the first probe comprises a nucleic acid selected from the group of nucleic acids consisting of (PROBE SEQS).
32. The method of claim 8, wherein the polymorphic nucleotide is in linkage disequilibrium with a Quantitative Trait Locus (QTL) selected from the group consisting of a QTL for resistance to soybean cyst nematode, a QTL for resistance to brown stem rot, and a QTL for phytophthora rot.
33. A method of separating two nucleic acids comprising an allele-specific nucleotide polymorphism on a locus selected from the group of loci consisting of pA060A, pA077A, pA086A, pA169A, pA280A, pA378A, pA505A, pA519A, pA588A, pA947B, pB032A, pB032B, pB039A, pBLT24A, pBLT65A, php02265A, php02301A, php02361A, php02370C, php02387A, php02388A, php02393A, php02396A, php02636A, php03522A, php05219A, php05233A, php05278A, php05342A, php07659A, php08584A, php12105A, php02340B, php05264A, php10355B, php02329A, php02371A, pA343A, pA748B, php05290A, pA858A, php02376A, pG17.3, pB132A, php10078A, pK069A, pK079A, pK401A, pK418A, pK644B, pL058A, pL183A, pR045A, pR153A, pT005A, pT155A, pA059A, pA064A, pBLT15A, pA593A, pA882A, p8320E, pA059A, pA064A, pBLT15A, pA593A, pA882A, p8320E and SOYBPSP, wherein at least one of the loci is selected from the group of loci consisting of pA060A, pA077A, pA086A, pA169A, pA280A, pA378A, pA505A, pA519A, pA588A, pA947B, pB032A, pB032B, pB039A, php02265A, php02370C, php02387A, php02388A, php02393A, php02396A, php02636A, php03522A, php05233A, php07659A, php08584A, php10355B, pK079A, pK401A, pK418A, pK644B, pL058A, pL183A, pR045A, pR153A, pT005A, php12105A, php02340B, php05264A, pA059A, pA064A, pBLT15A, pA593A, pA882A, p8320E, php02329A, php02371A, pA343A, pA748B, php05290A, php02376A, pG17.3, pB132A, php10078A and SOYBPSP; said method consisting of separating the two nucleic acids by size or charge and detecting the two separated nucleic acids.
34. The method of claim 33, wherein the two nucleic acids are separated by single-strand conformation polymorphism on a polyacrylamide gel.
35. The method of claim 33, wherein the loci comprise one or more loci selected from the group consisting of php02265A, php02301A, php02361A, php02370C, php02387A, php02388A, php02393A, php02396A, php02636A, php03522A, php05219A, php05233A, php05278A, php05342A, php07659A, php08584A, php12105A, php02340B, php05264A, php10355B, php02329A, php02371A, php05290A, php02376A, and php10078A.
36. A method of amplifying a nucleic acid, comprising providing a first primer nucleic acid which hybridizes under stringent conditions to a locus nucleic acid from a locus selected from the group of loci consisting of pA060A, pA077A, pA086A, pA169A, pA280A, pA378A, pA505A, pA519A, pA588A, pA947B, pB032A, pB032B, pB039A, pBLT24A, pBLT65A, php02265A, php02301A, php02361A, php02370C, php02387A, php02388A, php02393A, php02396A, php02636A, php03522A, php05219A, php05233A, php05278A, php05342A, php07659A, php08584A, php12105A, php02340B, php05264A, php10355B, php02329A, php02371A, pA343A, pA748B, php05290A, pA858A, php02376A, pG17.3, pB132A, php10078A, pK069A, pK079A, pK401A, pK418A, pK644B, pL058A, pL183A, pR045A, pR153A, pT005A, pT155A, pA059A, pA064A, pBLT15A, pA593A, pA882A, p8320E and SOYBPSP;
providing a template nucleic acid which hybridizes to the selected locus under stringent conditions;
hybridizing the primer to the template; and, amplifying a portion of the template nucleic acid with a template-dependent polymerase enzyme or a ligase enzyme.
37. The method of claim 36, wherein the primer is less than 100 nt in length and provides a polymerase extendible substrate and the primer-dependent polymerase extends the primer.
38. The method of claim 36, wherein the primer is an allele-specific primer.
39. The method of claim 36, wherein the first primer hybridizes adjacent to a second primer on the template nucleic acid and the first and second primers are ligated with a ligase enzyme, thereby amplifying the portion of the template hybridized to the first and second primers.
40. The method of claim 36, wherein the polymerase is a thermostable polymerase and the portion of the template is amplified by PCR using the first primer as a PCR primer.
41. The method of claim 40, wherein the method further comprises hybridizing a second primer to the template, wherein the first and second primer hybridize to complementary strands of the template nucleic acid, and wherein the first and second primers are PCR primers for the PCR.
42. The method of claim 36, wherein the template is amplified by a technique selected from the group consisting of PCR, asymmetric PCR, and LCR.
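Claims 36 to 42 cover amplifying a locus with primers, and claim 38 covers an allele-specific primer. The sketch below shows an in-silico version of both ideas. It makes several simplifying assumptions: primer sites must match exactly, and allele specificity follows a simplified ARMS-style rule in which the polymerase extends only when the primer's 3'-terminal base pairs with the polymorphic nucleotide. All function names and sequences are hypothetical. They are not the primers of Table 3.

```python
from typing import Optional

# Maps each base to its Watson-Crick partner (upper and lower case).
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def in_silico_pcr(template: str, fwd: str, rev: str) -> Optional[str]:
    """Predict the PCR product of `template` (top strand, 5'->3').

    `fwd` is identical to the top strand at its site. `rev` is written 5'->3'
    and anneals to the top strand, so it appears there as its reverse
    complement. Only exact sites, with the reverse site downstream of the
    forward site, are considered. Returns None when no product is predicted.
    """
    template = template.upper()
    start = template.find(fwd.upper())
    if start == -1:
        return None
    rev_site = reverse_complement(rev.upper())
    end = template.find(rev_site, start + len(fwd))
    if end == -1:
        return None
    return template[start:end + len(rev_site)]


def allele_specific_extends(site: str, primer: str, max_body_mismatches: int = 0) -> bool:
    """Simplified allele-specific (ARMS-style) extension rule.

    `site` is the top-strand sequence under a forward primer, the same length
    as the primer, with the polymorphic nucleotide at its 3' end. Extension is
    predicted only if the primer's 3'-terminal base matches that nucleotide and
    the rest of the primer has at most `max_body_mismatches` mismatches.
    """
    if len(site) != len(primer):
        raise ValueError("site and primer must have the same length")
    site, primer = site.upper(), primer.upper()
    if site[-1] != primer[-1]:
        return False
    body_mismatches = sum(a != b for a, b in zip(site[:-1], primer[:-1]))
    return body_mismatches <= max_body_mismatches


# Hypothetical template and primers.
template = "TTGATTACAGATCCGGAATTTGCATGCAAGTT"
print(in_silico_pcr(template, "GATTACAGAT", "CTTGCATGCA"))  # GATTACAGATCCGGAATTTGCATGCAAG
print(allele_specific_extends("GATTACAGAT", "GATTACAGAC"))  # False: 3' mismatch
```

Real allele-specific primers are often designed with an extra deliberate mismatch near the 3' end to improve discrimination. This sketch does not model that, and it does not model LCR or asymmetric PCR.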
43. A composition comprising a first recombinant nucleic acid which differentially hybridizes under allele-specific hybridization conditions to a first allele from a locus in the soybean genome selected from the group of loci consisting of pA060A, pA077A, pA086A, pA169A, pA280A, pA378A, pA505A, pA519A, pA588A, pA947B, pB032A, pB032B, pB039A, php02265A, php02370C, php02387A, php02388A, php02393A, php02396A, php02636A, php03522A, php05233A, php07659A, php08584A, php10355B, pK079A, pK401A, pK418A, pK644B, pL058A, pL183A, pR045A, pR153A, pT005A, php12105A, php02340B, php05264A, pA059A, pA064A, pBLT15A, pA593A, pA882A, p8320E, php02329A, php02371A, pA343A, pA748B, php05290A, php02376A, pG17.3, pB132A, php10078A and SOYBPSP, wherein the first recombinant nucleic acid shows decreased hybridization affinity for a second allele from the selected locus.
44. The composition of claim 43, wherein the isolated nucleic acid is an oligonucleotide probe selected from the group of probes consisting of those probes listed in Table 2.
45. The composition of claim 43, further comprising a second recombinant nucleic acid which differentially hybridizes under allele-specific hybridization conditions to the second allele from the selected locus, wherein the second nucleic acid shows decreased hybridization affinity for the first allele from the selected locus, wherein the second locus is selected from pA060A, pA077A, pA086A, pA169A, pA280A, pA378A, pA505A, pA519A, pA588A, pA947B, pB032A, pB032B, pB039A, pBLT24A, pBLT65A, php02265A, php02301A, php02361A, php02370C, php02387A, php02388A, php02393A, php02396A, php02636A, php03522A, php05219A, php05233A, php05278A, php05342A, php07659A, php08584A, php10355B, pK069A, pK079A, pK401A, pK418A, pK644B, pL058A, pL183A, pR045A, pR153A, pT005A, pT155A, php12105A, php02340B, php05264A, pA059A, pA064A, pBLT15A, pA593A, pA882A, p8320E, php02329A, php02371A, pA343A, pA748B, php05290A, pA858A, php02376A, pG17.3, pB132A, php10078A and SOYBPSP.
46. A composition comprising a recombinant nucleic acid which specifically hybridizes to a first allele-specific probe and a second allele-specific probe, which first and second allele-specific probes hybridize under allele-specific hybridization conditions to a first haplotype of a locus in the soybean genome selected from the group of loci consisting of pA060A, pA077A, pA086A, pA169A, pA280A, pA378A, pA505A, pA519A, pA588A, pA947B, pB032A, pB032B, pB039A, pBLT24A, pBLT65A, php02265A, php02301A, php02361A, php02370C, php02387A, php02388A, php02393A, php02396A, php02636A, php03522A, php05219A, php05233A, php05278A, php05342A, php07659A, php08584A, php12105A, php02340B, php05264A, php10355B, php02329A, php02371A, pA343A, pA748B, php05290A, pA858A, php02376A, pG17.3, pB132A, php10078A, pK069A, pK079A, pK401A, pK418A, pK644B, pL058A, pL183A, pR045A, pR153A, pT005A, pT155A, pA059A, pA064A, pBLT15A, pA593A, pA882A, p8320E and SOYBPSP.
47. The composition of claim 46, the composition further comprising an allele-specific probe which hybridizes to the recombinant nucleic acid.
48. The composition of claim 46, the composition comprising an allele-specific probe which hybridizes to a locus selected from the group consisting of php02265A, php02301A, php02361A, php02370C, php02387A, php02388A, php02393A, php02396A, php02636A, php03522A, php05219A, php05233A, php05278A, php05342A, php07659A, php08584A, php12105A, php02340B, php05264A, php02329A, php02371A, php05290A, php02376A, php10078A and php10355B.
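Claims 43 to 48 define allele-specific nucleic acids by their "decreased hybridization affinity" for the other allele. The sketch below is a crude proxy that uses the mismatch count between a probe's binding site and each allele. It assumes the given allele sequences are the strand the probe pairs with. Actual discrimination depends on hybridization conditions, where the mismatch falls, and probe length, and none of these are modeled here. The sequences are hypothetical and do not come from Table 2.

```python
# Maps each base to its Watson-Crick partner.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def binding_site(probe: str) -> str:
    """Return the target-strand sequence (5'->3') that a probe pairs with perfectly."""
    return probe.upper().translate(COMPLEMENT)[::-1]


def min_mismatches(probe: str, target: str) -> int:
    """Fewest mismatches between the probe's binding site and any window of `target`."""
    site, target = binding_site(probe), target.upper()
    if len(target) < len(site):
        raise ValueError("target is shorter than the probe")
    return min(
        sum(a != b for a, b in zip(site, target[i:i + len(site)]))
        for i in range(len(target) - len(site) + 1)
    )


def discriminates(probe: str, allele_1: str, allele_2: str) -> bool:
    """True if the probe pairs perfectly with allele_1 but not with allele_2."""
    return min_mismatches(probe, allele_1) == 0 and min_mismatches(probe, allele_2) > 0


# Hypothetical alleles differing by an A/T polymorphism at position 8.
allele_1 = "AAGCTTGACGATCC"
allele_2 = "AAGCTTGTCGATCC"
probe = "ATCGTCAAG"  # complementary to CTTGACGAT in allele_1
print(discriminates(probe, allele_1, allele_2))  # True
```

In this model, a second probe for the other allele (claim 45) would be built in the same way from the allele_2 window. In this example, the single mismatch sits near the center of the 9-mer, which is where a mismatch usually destabilizes a short duplex most.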
49. A PCR reaction mixture comprising:
a polymerase enzyme;
deoxynucleotides;
a template nucleic acid comprising a polymorphic nucleotide, which template nucleic acid hybridizes under stringent conditions to a locus selected from a group of loci consisting of pA060A, pA077A, pA086A, pA169A, pA280A, pA378A, pA505A, pA519A, pA588A, pA947B, pB032A, pB032B, pB039A, php02265A, php02370C, php02387A, php02388A, php02393A, php02396A, php02636A, php03522A, php05233A, php07659A, php08584A, php10355B, pK079A, pK401A, pK418A, pK644B, pL058A, pL183A, pR045A, pR153A, pT005A, php12105A, php02340B, php05264A, pA059A, pA064A, pBLT15A, pA593A, pA882A, p8320E, php02329A, php02371A, pA343A, pA748B, php05290A, php02376A, pG17.3, pB132A, php10078A and SOYBPSP; and, primers which specifically hybridize to the template nucleic acid, which primers are extendible by the polymerase under selected PCR reaction conditions.
50. The PCR reaction mixture of claim 49, wherein the primers are selected from the group of primers consisting of the PCR primers of Table 3.
51. The PCR reaction mixture of claim 49, wherein the primers are allele-specific primers.
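Claims 49 to 51 recite the components of a PCR reaction mixture. To estimate how much product such a mixture yields, a common first approximation is the idealized exponential relationship N = N0 (1 + E)^n. Here N0 is the starting template copy number, E is the per-cycle efficiency, and n is the cycle count. The sketch below assumes that model. It ignores the plateau phase and does not cover asymmetric PCR (claim 42), where one primer is limiting and single-stranded product accumulates roughly linearly once that primer is exhausted.

```python
def expected_copies(initial_copies: float, cycles: int, efficiency: float = 1.0) -> float:
    """Idealized exponential PCR yield: N = N0 * (1 + E) ** n.

    `efficiency` E is the fraction of templates copied per cycle (1.0 = perfect
    doubling). Real reactions plateau as primers, dNTPs or polymerase activity
    run out, which this formula ignores.
    """
    if not 0.0 <= efficiency <= 1.0:
        raise ValueError("efficiency must be between 0 and 1")
    if cycles < 0:
        raise ValueError("cycles must be non-negative")
    return initial_copies * (1.0 + efficiency) ** cycles


print(expected_copies(1000, 30))    # 1073741824000.0
print(expected_copies(10, 3, 0.5))  # 33.75
```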
52. A PCR amplicon, which amplicon comprises a nucleic acid comprising a polymorphic nucleotide, which amplicon hybridizes under stringent conditions to a locus selected from a group of loci consisting of pha12105, pha12390, pha12391, pha12392, pha12393, pha12394, pha12395, pha12396, pha10634, pha10623, pha10624, pha10649, pha11135, pha10792, pha10635, pha10638, pha10648, pha10621, pha11071, pha11073, pha10640, pha10598, pha10618, pha10620, pha10782, pha11131, pha11132, pha10650, pha10651, pha10637, pha11139, pha10655, pha11701, pha11627, pha11074, pha11075, pha10632, pha11628, pha11133, pha10641, pha11136, pha10658, pha10636, pha10783, pha08230, pha13070, pha13071, pha13072, pha13073, pha13074, pha13560, pha13561, pha14257 and pha14395.
53. The PCR amplicon of claim 52, wherein the locus is selected from the group consisting of php02265A, php02370C, php02387A, php02388A, php02393A, php02396A, php02636A, php03522A, php05219A, php05233A, php05278A, php05342A, php07659A, php08584A, php12105A, php02340B, php05264A, php02329A, php02371A, php05290A, php02376A, php10078A and php10355B.
54. The PCR amplicon of claim 52, wherein the amplicon is selected from the group consisting of pha12105, pha12390, pha12391, pha12392, pha12393, pha12394, pha12395, pha12396, pha10634, pha10623, pha10624, pha10649, pha11135, pha10792, pha10635, pha10638, pha10648, pha10621, pha11071, pha11073, pha10640, pha10598, pha10618, pha10620, pha10782, pha11131, pha11132, pha10650, pha10651, pha10637, pha11139, pha10655, pha11701, pha11627, pha11074, pha11075, pha10632, pha11628, pha11133, pha10641, pha11136, pha10658, pha10636, pha10783, pha08230, pha13070, pha13071, pha13072, pha13073, pha13074, pha13560, pha13561, pha14257 and pha14395.
55. The PCR amplicon of claim 52, wherein the amplicon is selected from the group consisting of php11138 and php11627.
56. The PCR amplicon of claim 52, wherein the amplicon is cloned into a vector.
57. An isolated or recombinant nucleic acid which hybridizes under stringent conditions to the PCR amplicon of claim 52.
58. The nucleic acid of claim 57, which nucleic acid is a DNA clone.
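Claims 52 to 58 describe amplicons that contain a polymorphic nucleotide. One simple way to locate that nucleotide is to compare the same amplicon sequenced from two soybean lines. The sketch below assumes the two sequences are already aligned and equal in length. That means it finds substitutions (SNPs) only, not insertions or deletions. The sequences are hypothetical.

```python
from typing import List, Tuple


def polymorphic_positions(amplicon_1: str, amplicon_2: str) -> List[Tuple[int, str, str]]:
    """List (1-based position, base in amplicon_1, base in amplicon_2) for each difference.

    Assumes the two allelic amplicons are pre-aligned and of equal length,
    i.e. the polymorphisms are substitutions rather than insertions/deletions.
    """
    if len(amplicon_1) != len(amplicon_2):
        raise ValueError("amplicons must be pre-aligned and of equal length")
    return [
        (i + 1, a, b)
        for i, (a, b) in enumerate(zip(amplicon_1.upper(), amplicon_2.upper()))
        if a != b
    ]


# Hypothetical amplicon sequences from two soybean lines.
print(polymorphic_positions("GATTACAGATCC", "GATTGCAGATCA"))  # [(5, 'A', 'G'), (12, 'C', 'A')]
```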
59. A set of nucleic acid probes comprising a plurality of probe nucleic acids which specifically hybridize to a plurality of target nucleic acids which hybridize under stringent conditions to a plurality of loci consisting of pA060A, pA077A, pA086A, pA169A, pA280A, pA378A, pA505A, pA519A, pA588A, pA947B, pB032A, pB032B, pB039A, php02265A, php02370C, php02387A, php02388A, php02393A, php02396A, php02636A, php03522A, php05233A, php07659A, php08584A, php10355B, pK079A, pK401A, pK418A, pK644B, pL058A, pL183A, pR045A, pR153A, pT005A, php12105A, php02340B, php05264A, pA059A, pA064A, pBLT15A, pA593A, pA882A, p8320E, php02329A, php02371A, pA343A, pA748B, php05290A, php02376A, pG17.3, pB132A, php10078A and SOYBPSP, or to an amplicon selected from the group consisting of pha12105, pha12390, pha12391, pha12392, pha12393, pha12394, pha12395, pha12396, pha10634, pha10623, pha10624, pha10649, pha11135, pha10792, pha10635, pha10638, pha10648, pha10621, pha11071, pha11073, pha10640, pha10598, pha10618, pha10620, pha10782, pha11131, pha11132, pha10650, pha10651, pha10637, pha11139, pha10655, pha11701, pha11627, pha11074, pha11075, pha10632, pha11628, pha11133, pha10641, pha11136, pha10658, pha10636, pha10783, pha08230, pha13070, pha13071, pha13072, pha13073, pha13074, pha13560, pha13561, pha14257 and pha14395.
60. The set of claim 59, wherein the set hybridizes to a locus selected from the group of loci consisting of php02265A, php02370C, php02387A, php02388A, php02393A, php02396A, php02636A, php03522A, php05219A, php05233A, php05278A, php05342A, php07659A, php08584A, php12105A, php02340B, php05264A, php10355B, php02329A, php02371A, php05290A, php02376A and php10078A.
61. The set of claim 59, wherein the probe nucleic acids are arranged in an array.
62. The set of claim 59, wherein the set of nucleic acids is in kit form, said kit optionally comprising one or more components selected from the components consisting of a container, instructional materials, one or more control target nucleic acids, and recombinant cells comprising one or more target nucleic acids.
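When a probe set like that of claims 59 to 62 is arranged as an array, each locus is typically read out by a pair of allele-specific probes. The sketch below shows one common way to turn the two signals into a genotype call: compare the allele-1 signal fraction against a heterozygote band. The thresholds (`min_total`, `het_band`) and the intensities are hypothetical placeholders, not values from the patent. In practice they would be calibrated against the control target nucleic acids mentioned in claim 62.

```python
from typing import Tuple


def call_genotype(signal_1: float, signal_2: float,
                  min_total: float = 100.0,
                  het_band: Tuple[float, float] = (0.3, 0.7)) -> str:
    """Call a genotype from the signals of two allele-specific probes at one locus.

    The allele-1 fraction f = s1 / (s1 + s2) is compared with a heterozygote band:
    f above the band -> "1/1", below it -> "2/2", inside it -> "1/2".
    Spots whose combined signal is below `min_total` are left uncalled.
    """
    total = signal_1 + signal_2
    if total < min_total:
        return "no call"
    fraction = signal_1 / total
    low, high = het_band
    if fraction > high:
        return "1/1"
    if fraction < low:
        return "2/2"
    return "1/2"


# Hypothetical array intensities for two of the claimed loci.
signals = {"pA060A": (900.0, 100.0), "php02265A": (480.0, 520.0)}
for locus, (s1, s2) in signals.items():
    print(locus, call_genotype(s1, s2))
```

Soybean lines are largely inbred, so most calls would be expected to fall into the two homozygous classes. A frequent "1/2" call at a locus may point to a probe problem rather than a true heterozygote.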
63. A recombinant plant comprising a recombinant nucleic acid which hybridizes under stringent conditions to a target nucleic acid comprising a nucleotide polymorphism from a locus selected from the group of loci consisting of pA060A, pA077A, pA086A, pA169A, pA280A, pA378A, pA505A, pA519A, pA588A, pA947B, pB032A, pB032B, pB039A, php02265A, php02370C, php02387A, php02388A, php02393A, php02396A, php02636A, php03522A, php05233A, php07659A, php08584A, php10355B, pK079A, pK401A, pK418A, pK644B, pL058A, pL183A, pR045A, pR153A, pT005A, php12105A, php02340B, php05264A, pA059A, pA064A, pBLT15A, pA593A, pA882A, p8320E, php02329A, php02371A, pA343A, pA748B, php05290A, php02376A, pG17.3, pB132A, php10078A and SOYBPSP, or to an amplicon selected from the group consisting of pha12105, pha12390, pha12391, pha12392, pha12393, pha12394, pha12395, pha12396, pha10634, pha10623, pha10624, pha10649, pha11135, pha10792, pha10635, pha10638, pha10648, pha10621, pha11071, pha11073, pha10640, pha10598, pha10618, pha10620, pha10782, pha11131, pha11132, pha10650, pha10651, pha10637, pha11139, pha10655, pha11701, pha11627, pha11074, pha11075, pha10632, pha11628, pha11133, pha10641, pha11136, pha10658, pha10636, pha10783, pha08230, pha13070, pha13071, pha13072, pha13073, pha13074, pha13560, pha13561, pha14257 and pha14395.
64. The recombinant plant of claim 63, wherein the recombinant nucleic acid comprises a coding sequence encoded by a gene in linkage disequilibrium with a Quantitative Trait Locus (QTL).
65. The recombinant plant of claim 64, wherein the QTL is selected from the group consisting of a QTL for resistance to soybean cyst nematode, a QTL for resistance to brown stem rot, and a QTL for resistance to Phytophthora rot.
66. A method of selecting a first plant by marker-assisted selection of one or more nucleotide polymorphisms comprising the steps of:
(i) detecting a majority of marker nucleic acids from the first plant which are genetically linked to a locus selected from the group consisting of pA060A, pA077A, pA086A, pA169A, pA280A, pA378A, pA505A, pA519A, pA588A, pA947B, pB032A, pB032B, pB039A, pBLT24A, pBLT65A, php02265A, php02301A, php02361A, php02370C, php02387A, php02388A, php02393A, php02396A, php02636A, php03522A, php05219A, php05233A, php05278A, php05342A, php07659A, php08584A, php10355B, pK069A, pK079A, pK401A, pK418A, pK644B, pL058A, pL183A, pR045A, pR153A, pT005A, pT155A, php12105A, php02340B, php05264A, pA059A, pA064A, pBLT15A, pA593A, pA882A, p8320E, php02329A, php02371A, pA343A, pA748B, php05290A, pA858A, php02376A, pG17.3, pB132A, php10078A and SOYBPSP, which loci comprise the nucleotide polymorphism; and, (ii) selecting the first plant to comprise at least one of the marker nucleic acids, thereby selecting for the at least one polymorphic nucleotide.
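Claim 66 describes marker-assisted selection in two steps: detect marker alleles in a plant, then keep plants that carry the desired markers. The sketch below shows the selection step only. It assumes genotypes have already been called per locus, for example by one of the detection methods claimed above. The genotypes, favorable alleles and the `min_markers` threshold are all hypothetical, and the locus names are used purely as labels.

```python
from typing import Dict, List

# locus name -> allele observed at that locus (e.g. a SNP base)
Genotype = Dict[str, str]


def select_plants(plants: Dict[str, Genotype],
                  favorable: Dict[str, str],
                  min_markers: int = 1) -> List[str]:
    """Return IDs of plants carrying the favorable allele at >= min_markers target loci.

    Loci that were not scored for a plant simply do not count as hits.
    """
    selected = []
    for plant_id, genotype in plants.items():
        hits = sum(1 for locus, allele in favorable.items()
                   if genotype.get(locus) == allele)
        if hits >= min_markers:
            selected.append(plant_id)
    return selected


# Hypothetical genotypes; the favorable alleles are illustrative only.
plants = {
    "line1": {"pA060A": "T", "php02265A": "G"},
    "line2": {"pA060A": "C", "php02265A": "G"},
    "line3": {"pA060A": "C"},
}
favorable = {"pA060A": "T", "php02265A": "G"}
print(select_plants(plants, favorable))                 # ['line1', 'line2']
print(select_plants(plants, favorable, min_markers=2))  # ['line1']
```

The default `min_markers=1` matches the claim's "at least one of the marker nucleic acids". A breeder stacking several linked markers around a QTL would raise it.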
67. A PCR amplicon selected from the group of amplicons consisting of: pha12105, pha12390, pha12391, pha12392, pha12393, pha12394, pha12395, pha12396, pha10634, pha10623, pha10624, pha10649, pha11135, pha10792, pha10635, pha10638, pha10648, pha10621, pha11071, pha11073, pha10640, pha11076, pha10653, pha10598, pha10615, pha10646, pha10618, pha10620, pha10782, pha11131, pha11132, pha10650, pha10651, pha11138, pha10637, pha11078, pha11079, pha11139, pha10655, pha11701, pha11627, pha10633, pha11074, pha11075, pha10632, pha11628, pha11133, pha10641, pha11136, pha10658, pha10636, pha10783, pha10647, pha08230, pha13070, pha13071, pha13072, pha13073, pha13074, pha13158, pha13560, pha13561, pha14257 and pha14395.
68. A composition comprising a plurality of nucleic acids, which plurality of nucleic acids hybridizes to a majority of PCR amplicons selected from the group of amplicons consisting of: pha12105, pha12390, pha12391, pha12392, pha12393, pha12394, pha12395, pha12396, pha10634, pha10623, pha10624, pha10649, pha11135, pha10792, pha10635, pha10638, pha10648, pha10621, pha11071, pha11073, pha10640, pha11076, pha10653, pha10598, pha10615, pha10646, pha10618, pha10620, pha10782, pha11131, pha11132, pha10650, pha10651, pha11138, pha10637, pha11078, pha11079, pha11139, pha10655, pha11701, pha11627, pha10633, pha11074, pha11075, pha10632, pha11628, pha11133, pha10641, pha11136, pha10658, pha10636, pha10783, pha10647, pha08230, pha13070, pha13071, pha13072, pha13073, pha13074, pha13158, pha13560, pha13561, pha14257 and pha14395.
Applications Claiming Priority (3)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US6818597P | 1997-12-19 | 1997-12-19 | |
US60/068,185 | 1997-12-19 | | |
PCT/US1998/026935 WO1999031964A1 (en) | 1997-12-19 | 1998-12-18 | Nucleotide polymorphisms in soybean |
Publications (1)
Publication Number | Publication Date |
---|---|
CA2314992A1 (en) | 1999-07-01 |
Family ID: 22080955
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CA002314992A (abandoned; published as CA2314992A1) | 1997-12-19 | 1998-12-18 | Nucleotide polymorphisms in soybean |
Country Status (5)
Country | Link |
---|---|
AR (1) | AR017917A1 (en) |
AU (1) | AU1927699A (en) |
BR (1) | BR9813805A (en) |
CA (1) | CA2314992A1 (en) |
WO (1) | WO1999031964A1 (en) |
Families Citing this family (13)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
PL361279A1 (en) | 2000-07-17 | 2004-10-04 | Her Majesty The Queen In Right Of Canada As Represented By The Minister Of Agriculture And Agri-Food | Map-based genome mining method for identifying gene regulatory loci |
US7973212B2 (en) | 2003-08-01 | 2011-07-05 | Pioneer Hi-Bred International, Inc. | Soybean plants having superior agronomic performance and methods for their production |
PT1885176T (en) | 2005-05-27 | 2016-11-28 | Monsanto Technology Llc | Soybean event mon89788 and methods for detection thereof |
WO2010135324A1 (en) | 2009-05-18 | 2010-11-25 | Monsanto Technology Llc | Use of glyphosate for disease suppression and yield enhancement in soybean |
US9204603B2 (en) | 2011-12-21 | 2015-12-08 | The Curators Of The University Of Missouri | Soybean variety S05-11482 |
WO2013096818A1 (en) | 2011-12-21 | 2013-06-27 | The Curators Of The University Of Missouri | Soybean variety s05-11268 |
US9493843B2 (en) | 2012-12-20 | 2016-11-15 | Pioneer Hi-Bred International, Inc. | Genetic loci associated with Phytophthora tolerance in soybean and methods of use |
US9464330B2 (en) | 2012-12-21 | 2016-10-11 | Pioneer Hi-Bred International, Inc. | Genetic loci associated with soybean cyst nematode resistance and methods of use |
US20140178866A1 (en) * | 2012-12-21 | 2014-06-26 | Pioneer Hi-Bred International, Inc. | Genetic loci associated with soybean cyst nematode resistance and methods of use |
US20160272997A1 (en) | 2013-10-25 | 2016-09-22 | Pioneer Hi-Bred International, Inc. | Stem canker tolerant soybeans and methods of use |
US11096344B2 (en) | 2016-02-05 | 2021-08-24 | Pioneer Hi-Bred International, Inc. | Genetic loci associated with brown stem rot resistance in soybean and methods of use |
CN113795597B (en) * | 2021-02-08 | 2023-11-17 | 中国农业科学院作物科学研究所 (Institute of Crop Sciences, Chinese Academy of Agricultural Sciences) | Soybean SNP (single nucleotide polymorphism) typing detection chip and application thereof in molecular breeding and basic research |
CN114107554B (en) * | 2022-01-24 | 2022-04-15 | 华智生物技术有限公司 (Huazhi Biotechnology Co., Ltd.) | Primer group for detecting purity of soybean variety and application thereof |
Family Cites Families (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5015580A (en) * | 1987-07-29 | 1991-05-14 | Agracetus | Particle-mediated transformation of soybean plants and lines |
US5491081A (en) * | 1994-01-26 | 1996-02-13 | Pioneer Hi-Bred International, Inc. | Soybean cyst nematode resistant soybeans and methods of breeding and identifying resistant plants |
US5689035A (en) * | 1995-09-26 | 1997-11-18 | Pioneer Hi-Bred International, Inc. | Brown stem rot resistance in soybeans |
1998
- 1998-12-18: WO application PCT/US1998/026935, published as WO1999031964A1 (active, application filing)
- 1998-12-18: BR application BR9813805-7A, published as BR9813805A (not active, IP right cessation)
- 1998-12-18: AU application AU19276/99A, published as AU1927699A (not active, abandoned)
- 1998-12-18: CA application CA002314992A, published as CA2314992A1 (not active, abandoned)
- 1998-12-18: AR application ARP980106514A, published as AR017917A1 (not active, application discontinuation)
Also Published As
Publication number | Publication date |
---|---|
AU1927699A (en) | 1999-07-12 |
AR017917A1 (en) | 2001-10-24 |
WO1999031964A1 (en) | 1999-07-01 |
BR9813805A (en) | 2000-10-03 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
FZDE | Dead |