DESCRIPTION
GENE MARKER ASSOCIATED WITH SWINE PROLIFERACY
I. FIELD OF THE INVENTION
The invention relates to methods and compositions useful in swine breeding. In particular, a method for determining genetic markers for swine proliferacy has been used to identify a gene marker that is associated with increased litter size. This marker can be used to assist traditional breeding programs designed to increase the proliferacy of pigs.
II. BACKGROUND OF THE INVENTION
Reproductive efficiency, which is defined as the number of piglets produced per breeding female, is an important factor in the efficient production of pork. Clearly, the goal of the pork producer is to increase the number of piglets born per female. Presently, the average number of pigs born per litter in the USA is about 9.5. Therefore, females which produce numbers of offspring in excess of this figure are highly desirable in pork production.
Currently, female reproductive efficiency measurements are based on performance data. A more reliable measurement would be to determine, preferably at the DNA level, whether or not a given female carries genes associated with increased litter size. Such genes may contribute to increased fertility (number of eggs produced, fertilized and implanted) or gestation performance (percentage of implanted eggs resulting in live offspring). At this time, however, little information is available as to which loci influence reproductive performance.
Chinese breeds of pigs are known for reaching puberty at an early age and for their large litter size. American breeds, by way of contrast, are known for greater growth rates and for leanness. If possible, it would be desirable to combine the litter size of Chinese breeds with the growth rate and meat quality of American pigs. This task could be accomplished much more easily if markers could be used to identify the pigs that are particularly predisposed to producing large litters.
Because of the role of steroid hormones in mammalian reproduction, it has been hypothesized that the genes for some of these hormones, or their cognate receptors, are involved in various aspects of proliferacy. A prime candidate for this sort of analysis is estrogen (or the estrogen receptor), which has profound effects on the reproductive cycle. One study has suggested that a particular polymorphism of the estrogen receptor gene is associated with increased litter size. Rothschild et al. (1992). It is likely that other markers exist, however, that will prove valuable in effecting breeding programs designed to increase proliferacy.
One general approach to search for genetic markers is restriction fragment length polymorphism ("RFLP") analysis. The use of RFLP's to create genetic "fingerprints" has been well-documented for a variety of different organisms, including both higher plants and animals and, in particular, pigs. For example, polymorphisms in the swine leukocyte antigen (SLA) Class I genes have been documented by RFLP. Jung et al., Theor. Appl. Genet. 77:271-274 (1989). Hoganson et al., ABSTRACT FOR ANNUAL MEETING OF MIDWESTERN SECTION OF THE AMERICAN SOCIETY FOR ANIMAL SCIENCE, March 26-28, 1990, reported on polymorphisms in the swine major histocompatibility locus. Another report examined SLA Class I genes in boars using RFLP's. Jung et al., Animal Genetics 20:79-91 (1989). Such RFLP patterns could, in theory, be applied to an analysis of animals possessing and lacking various desirable characteristics. In this way, it is possible that markers for the desirable characteristics can be determined.
At present, while tools are available through which genetic analyses may be undertaken, there remains a need for further information on the genetic background of swine that produce larger litters.
III. SUMMARY OF THE INVENTION
Therefore, it is an object of the present invention to identify swine genetic markers that serve to identify animals that exhibit or do not exhibit a tendency towards large litter size.
It also is an object of the present invention to provide a general method for determining the association of swine genetic markers for litter size.
In fulfilling these objects, there is provided a marker, designated 122.2, that is associated with large pig litter size.
In another embodiment, there is provided a nucleic acid that corresponds to the 122.2 marker.
In yet another embodiment, there is provided a method for determining the association of a swine genetic marker with litter size using a cross-breeding program and RAPD analysis.
In still yet another embodiment, there is provided a method for breeding swine to produce female animals with larger litter size.
In particular, there is provided a method of screening for large litter size in pigs comprising the steps of:
(a) obtaining a nucleic acid sample from a pig; and
(b) determining the presence or absence of a marker for band 122.2 in said nucleic acid sample.
Another specific embodiment is a method of increasing litter size in pigs comprising the steps of:
(a) crossing a first pig exhibiting the trait of large litter size with a second pig exhibiting at least one other desirable trait;
(b) obtaining a nucleic acid sample from progeny of said cross;
(c) identifying at least one progeny of said cross whose nucleic acid sample lacks a marker corresponding to band 122.2; and
(d) selecting progeny lacking the marker corresponding to band 122.2.
Yet another specific embodiment is a method of effecting a litter size breeding decision for pigs comprising the steps of:
(a) crossing a first pig exhibiting the trait of large litter size with a second pig exhibiting at least one other desirable trait;
(b) obtaining a nucleic acid sample from progeny of said cross;
(c) determining the presence or absence of a marker corresponding to band 122.2 in said nucleic acid sample; and
(d) selecting for further breeding at least one progeny of said cross which lacks said marker corresponding to band 122.2.
In another embodiment, the marker has the sequence
GTAGACGAGC^GTTTCCΑAAGG&TGCTATTTTGGGGΑΠ CΑTCTGCT∞TTCΑAAGGTTTCCTACGΑTAAAACCCCT
GAGOΛCTCTC ΓGAAAGGCTTTATGAAGTTTGG( ATCTGGGGAC^ CTCGTTGAGAGAACTTTC∞AAATACTTCΑAACCGTAGAC
AAGCGTTCCΆCTGCTAAACΆAAC&AACΆCTAAAGAGTGGGAATATC^ TTCGCΆAGGTGACGATTTGTTTGTTTGTGATTTCT
AGAAAGAAAAATATCTGTGATGAAAATTATGTCTACCCCCAGACGAATGGGAGAAGACAT T ITTCTTTTTATAGACACRACRITTTAATACΑGATGGGGGTCT^
CCTGGGCTCGACTAC GGACCCGAGCTGATG.
Particular embodiments are provided where the marker is a nucleic acid molecule of from 10 to 12 bases or 13 to 15 bases or 15 to 20 bases or 21 to 50 bases comprising the sequence 5'-GTAGACGAGC-3'.
Also provided is an isolated nucleic acid comprising at least 10 consecutive bases derived from the sequence
GTAGACGAGCΆAGTTTCCΆAAGGATGCTATTTTGGGGATTCT^ (^TCTGCTCGTTCY^GGTTTCCTACGATAAAACCCCTAAGAAC^^
GΑGαVACTCTCTTGAAftGGCTTTA^ CTCGTTGAGAGAACTTTCCGAAATACπα ^C
AAGCGTTCCACTGCTAAAC&AACAAACACTAAAGAGTGGGAATATC^ TTCGCAA∞TGACGΑTTTGTTTGTTΓGTGAT^
AGAAAGAAAAATATCTGTGATGAAAATTATGTCTACCCCCAGACGAATGGGAGAAGACAT T ITΓCTTTTTATAGACΆCTACTTTTAATACΆGATGGGGGTCTG
CCTGGGCTCGACTAC GGACCCGAGCTGATG.
In yet another specific embodiment, there is provided a method of identifying a marker for litter size breeding in pigs comprising the steps of:
(a) crossing a first pig exhibiting the trait of large litter size with a second pig exhibiting at least one other desirable trait;
(b) breeding female progeny;
(c) determining the litter size for said female progeny;
(d) obtaining a nucleic acid sample from progeny of said cross; and
(e) identifying the presence or absence of a marker in animals with large or small litter size.
Also provided are kits for use with any of the foregoing methods.
Other objects, features and advantages of the present invention will become apparent from the following detailed description. It should be understood, however, that the detailed description and the specific examples, while indicating preferred embodiments of the invention, are given by way of illustration only, since various changes and modifications within the spirit and scope of the invention will become apparent to those skilled in the art from this detailed description.
IV. BRIEF DESCRIPTION OF THE DRAWINGS
FIG. 1. The nucleic acid sequence of the marker 122.2 is provided.
V. DETAILED DESCRIPTION OF THE INVENTION
As stated above, the use of RFLP's to create genetic fingerprints has been well-documented for a variety of different organisms, including both higher plants and animals. The principle underlying this technology is that small variations in the genomes of related organisms may reflect important differences in the phenotype of these organisms. The small genetic variations of interest are those that result in changes in the size of genomic DNA fragments. These changes may be caused by insertions or deletions of genetic material or, alternatively, may result from the addition or loss of recognition sites for restriction enzymes, which cleave DNA at specific points. Any of these alterations can result in a change in the molecular weight of DNA fragments generated by cleavage with a given restriction enzyme.
RFLP analysis is, for a variety of reasons, rather cumbersome and has been supplanted by similar, but more advanced, technology. For example, RAPD ("random amplification of polymorphic DNA") relies on short, randomly generated oligonucleotide primers that hybridize to multiple points within the genome of any organism. When a given primer happens to bind to both strands of the genomic sequence within about 2000 base pairs, such that one primer binds 3' with respect to the other member of the pair, the primers will support polymerase amplification of the intervening sequences.
By adjusting the length and composition of the RAPD primer and the hybridization conditions, it is possible to create the desired number of "localized" hybridization events (about ten to fifty per genome). If the resulting amplification products are integrally labeled, for example, with radio- or chromophore-labeled deoxynucleotides, the products can be identified following size separation, much like restriction fragments. Alternatively, unlabeled products may be visualized with a nucleic acid dye such as ethidium bromide. This provides a fingerprint similar to that produced according to RFLP analysis.
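The RAPD principle described above can be illustrated with a minimal in silico sketch. It assumes exact primer matches only, which real annealing does not require. The function names and the 2000-base default are illustrative choices made for this sketch and are not part of the disclosed method. A product is predicted wherever the primer sequence occurs on one strand and its reverse complement occurs downstream within the size limit.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A/C/G/T only)."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_all(text: str, pattern: str) -> list[int]:
    """Return the 0-based start of every (possibly overlapping) exact match."""
    hits = []
    start = text.find(pattern)
    while start != -1:
        hits.append(start)
        start = text.find(pattern, start + 1)
    return hits


def rapd_products(template: str, primer: str, max_len: int = 2000) -> list[tuple[int, int]]:
    """Predict RAPD products as (start, length) pairs on the given strand.

    A forward site is an exact copy of the primer on the given strand; a
    reverse site is an exact copy of its reverse complement, i.e. the place
    where the same primer anneals to this strand and points back. Each
    forward/reverse pair no more than max_len bases apart gives one product.
    """
    template, primer = template.upper(), primer.upper()
    forward_sites = find_all(template, primer)
    reverse_sites = find_all(template, reverse_complement(primer))
    products = []
    for f in forward_sites:
        for r in reverse_sites:
            length = r + len(primer) - f  # product spans both primer sites
            if r >= f + len(primer) and length <= max_len:
                products.append((f, length))
    return sorted(products)
```

Sorting the predicted lengths gives a crude stand-in for the banding pattern seen after size separation. Mismatch tolerance, which strongly affects real RAPD patterns, is deliberately left out.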
The present invention relies on RAPD technology to establish fingerprints for swine breeds with varying degrees of proliferacy. By analyzing these fingerprints, one can identify bands whose presence or absence correlates with large or small litter size. These bands correspond to sequences in the genomes of the animals under study, the genomic sequences being designated as "markers." Then, breeding programs can be implemented in which the identified markers are used to predict the proliferacy of offspring, thereby permitting the introgression of desirable litter size genes into the genetic background of swine breeds possessing other desirable traits.
As a first step in determining the existence of genetic markers that are linked to litter size, it is necessary to set up reference families involving genetically disparate parents. Chinese breeds of pigs are known for reaching puberty at an early age and, also, for their large litter size. For purposes of this application, a large litter is greater than twelve. In contrast, American breeds are known for their greater growth rates and leanness but tend to produce smaller litters. Combining the characteristics of these two breeds would be of great economic importance to pork producers. The offspring of a particular Chinese x American cross should allow genetic loci involved with litter size to be identified. Female progeny producing above or below average litter sizes are screened for their RAPD fingerprint patterns. Markers consistently segregating with larger or smaller litter sizes imply genetic linkage of the marker and genes involved with proliferacy. A variety of pig breeds may be used. Preferred breeds are Meishan, Fengjing, Minzhu, Duroc, Hampshire, Landrace, Large White, Yorkshire, Spotted Poland China, Berkshire, Poland China and Chester White.
Genomic DNA from swine tissue, blood or semen is used as the template for the amplification. Genomic DNA can be prepared by standard methods using a commercial preparatory reagent such as Triazol (Life Technologies, Inc., Gaithersburg, MD). For example, Triazol™-treated tissue samples are extracted into a Tris-saline-EDTA-SDS buffer and the extract treated with RNAse A and Proteinase K to digest RNA and proteins. After phenol:chloroform extraction, the DNA is ethanol precipitated by standard methods and resuspended in TE buffer. For a general guide, see Kawasaki, In PCR PROTOCOLS, Academic Press, New York (1990) pp 142-152.
Another tool necessary for RAPD analysis is an appropriate oligonucleotide primer. The primer used in this research project was a 10 base-pair, random oligonucleotide having the sequence 5'-GTAGACGAGC-3' obtained from the Biotechnology Laboratory at the University of British Columbia (Vancouver, BC, Canada). Aliquots of 10 μg were provided in dry form and resuspended in 200 μl of TE. Recommended storage is at -20°C.
In theory, any random primer may be used in a RAPD format. While the size of a particular primer may vary somewhat, a desirable size range is 8-15 base-pairs; the preferred size is 10 base-pairs. The optimal length for each primer will be determined empirically. Once the amplification conditions have been adjusted to achieve the appropriate number of amplification products, it is then possible to examine the resulting fingerprints from genetically disparate breeds, and their crosses, to determine whether there are any identifiable differences between large and small litter-producing females.
A number of template-dependent processes are available to amplify the target sequences present in a given template sample. One of the best known amplification methods is the polymerase chain reaction (referred to as PCR), which is described in detail in U.S. Patents 4,683,195, 4,683,202 and 4,800,159, and in Innis et al., PCR Protocols, Academic Press, Inc., San Diego CA, 1990, each of which is incorporated herein by reference in its entirety. Briefly, in PCR, two primer sequences are prepared which are complementary to regions on opposite complementary strands of the target sequence.
An excess of deoxynucleoside triphosphates is added to a reaction mixture along with a DNA polymerase, e.g., Taq polymerase. If the target sequence is present in a sample, the primers will bind to the target and the polymerase will cause the primers to be extended along the target sequence by adding on nucleotides. By raising and lowering the temperature of the reaction mixture, the extended primers will dissociate from the target to form reaction products, excess primers will bind to the target and to the reaction products, and the process is repeated. Preferably, a reverse transcriptase PCR amplification procedure may be performed in order to quantify the amount of mRNA amplified. Polymerase chain reaction methodologies are well known in the art.
Another method for amplification is the ligase chain reaction ("LCR"), disclosed in EPA No. 320 308, incorporated herein by reference in its entirety. In LCR, two complementary probe pairs are prepared, and in the presence of the target sequence, each pair will bind to opposite complementary strands of the target such that they abut. In the presence of a ligase, the two probe pairs will link to form a single unit. By temperature cycling, as in PCR, bound ligated units dissociate from the target and then serve as "target sequences" for ligation of excess probe pairs. U.S. Patent 4,883,750 describes a method similar to LCR for binding probe pairs to a target sequence.
Qbeta Replicase, described in PCT Application No. PCT/US87/00880, may also be used as still another amplification method in the present invention. In this method, a replicative sequence of RNA which has a region complementary to that of a target is added to a sample in the presence of an RNA polymerase. The polymerase will copy the replicative sequence which can then be detected.
An isothermal amplification method, in which restriction endonucleases and ligases are used to achieve the amplification of target molecules that contain nucleotide 5'-[alpha-thio]-triphosphates in one strand of a restriction site, may also be useful in the amplification of nucleic acids in the present invention. Walker et al., Proc. Nat'l Acad. Sci. USA 89:392-396 (1992), incorporated herein by reference in its entirety.
Strand Displacement Amplification (SDA) is another method of carrying out isothermal amplification of nucleic acids which involves multiple rounds of strand displacement and synthesis, i.e., nick translation. A similar method, called Repair Chain Reaction (RCR), involves annealing several probes throughout a region targeted for amplification, followed by a repair reaction in which only two of the four bases are present. The other two bases can be added as biotinylated derivatives for easy detection. A similar approach is used in SDA. Target specific sequences can also be detected using a cyclic probe reaction (CPR). In CPR, a probe having 3' and 5' sequences of non-specific DNA and a middle sequence of specific RNA is hybridized to DNA which is present in a sample. Upon hybridization, the reaction is treated with RNase H, and the products of the probe are identified as distinctive products which are released after digestion. The original template is annealed to another cycling probe and the reaction is repeated.
Still other amplification methods, described in GB Application No. 2 202 328 and in PCT Application No. PCT/US89/01025, each of which is incorporated herein by reference in its entirety, may be used in accordance with the present invention. In the former application, "modified" primers are used in a PCR-like, template- and enzyme-dependent synthesis. The primers may be modified by labelling with a capture moiety (e.g., biotin) and/or a detector moiety (e.g., enzyme). In the latter application, an excess of labeled probes is added to a sample. In the presence of the target sequence, the probe binds and is cleaved catalytically. After cleavage, the target sequence is released intact to be bound by excess probe. Cleavage of the labelled probe signals the presence of the target sequence.
Other nucleic acid amplification procedures include transcription-based amplification systems (TAS), including nucleic acid sequence based amplification (NASBA) and 3SR. Kwoh et al., Proc. Nat. Acad. Sci. USA 86:1173 (1989); Gingeras et al., PCT Application WO 88/10315, incorporated herein by reference in their entirety. In NASBA, the nucleic acids can be prepared for amplification by standard phenol/chloroform extraction, heat denaturation of a clinical sample, treatment with lysis buffer and minispin columns for isolation of DNA and RNA, or guanidinium chloride extraction of RNA. These amplification techniques involve annealing a primer which has target specific sequences. Following polymerization, DNA/RNA hybrids are digested with RNase H while double stranded DNA molecules are heat denatured again. In either case the single stranded DNA is made fully double stranded by addition of a second target specific primer, followed by polymerization. The double stranded DNA molecules are then multiply transcribed by a polymerase such as T7 or SP6. In an isothermal cyclic reaction, the RNAs are reverse transcribed into double stranded DNA, and transcribed once again with a polymerase such as T7 or SP6. The resulting products, whether truncated or complete, indicate target specific sequences.
Davey et al., EPA No. 329 822 (incorporated herein by reference in its entirety) disclose a nucleic acid amplification process involving cyclically synthesizing single-stranded RNA ("ssRNA"), ssDNA, and double-stranded DNA ("dsDNA"), which may be used in accordance with the present invention. The ssRNA is a first template for a first primer oligonucleotide, which is elongated by reverse transcriptase (RNA-dependent DNA polymerase).
The RNA is then removed from the resulting DNA:RNA duplex by the action of ribonuclease H (RNase H, an RNase specific for RNA in duplex with either DNA or RNA). The resultant ssDNA is a second template for a second primer, which also includes the sequences of an RNA polymerase promoter (exemplified by T7 RNA polymerase) 5' to its homology to the template. This primer is then extended by DNA polymerase (exemplified by the large "Klenow" fragment of E. coli DNA polymerase I), resulting in a double-stranded DNA ("dsDNA") molecule having a sequence identical to that of the original RNA between the primers and having additionally, at one end, a promoter sequence. This promoter sequence can be used by the appropriate RNA polymerase to make many RNA copies of the DNA. These copies can then re-enter the cycle, leading to very swift amplification. With proper choice of enzymes, this amplification can be done isothermally without addition of enzymes at each cycle. Because of the cyclical nature of this process, the starting sequence can be chosen to be in the form of either DNA or RNA.
Miller et al., PCT Application WO 89/06700 (incorporated herein by reference in its entirety) disclose a nucleic acid sequence amplification scheme based on the hybridization of a promoter/primer sequence to a target single-stranded DNA ("ssDNA") followed by transcription of many RNA copies of the sequence. This scheme is not cyclic, i.e., new templates are not produced from the resultant RNA transcripts. Other amplification methods include "RACE" and "one-sided PCR." Frohman, M.A., In: PCR PROTOCOLS: A GUIDE TO METHODS AND APPLICATIONS, Academic Press, N.Y. (1990) and Ohara et al., Proc. Nat'l Acad. Sci. USA, 86:5673-5677 (1989), each herein incorporated by reference in their entirety.
Methods based on ligation of two (or more) oligonucleotides in the presence of nucleic acid having the sequence of the resulting "di-oligonucleotide", thereby amplifying the di-oligonucleotide, may also be used in the amplification step of the present invention. Wu et al., Genomics 4:560 (1989), incorporated herein by reference in its entirety.
PCR amplification products are analyzed by agarose, agarose-acrylamide or polyacrylamide gel electrophoresis using standard methods. See Sambrook et al., supra. In a preferred embodiment, the gel is a 1% agarose gel that is stained with ethidium bromide and visualized under UV light. Alternatively, the amplification products can be integrally labeled with radio- or fluorometrically-labeled nucleotides. Gels can then be exposed to x-ray film or visualized under the appropriate stimulating spectra, respectively.
As stated above, reference breeds (e.g., Chinese x American) will be crossed in order to produce F1 offspring. These F1 offspring are then crossed to themselves to produce F2's. The litter size of these F2's is then measured and the fingerprints compared. A variety of different approaches may be taken to compare the fingerprints. For example, the females can be divided on the basis of low and high proliferacy and individual markers correlated with the groups. Alternatively, the average litter size obtained with the absence or presence of a given band may be determined. Once identified, such markers can be used in a variety of breeding programs to effect introgression of desirable proliferacy traits into certain breeds.
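The two comparison approaches just described can be sketched as follows. This is a minimal illustration under stated assumptions: the record layout, the function names and the example values are hypothetical and are not data from the study. The threshold of twelve follows the definition of a large litter given earlier in this description.

```python
from statistics import mean


def mean_litter_size_by_band(records):
    """Average litter size for females with and without a given band.

    records: list of (band_present, litter_size) pairs, one per female.
    Returns (mean_with_band, mean_without_band); an empty group gives None.
    """
    with_band = [size for present, size in records if present]
    without_band = [size for present, size in records if not present]
    return (
        mean(with_band) if with_band else None,
        mean(without_band) if without_band else None,
    )


def band_frequency_by_group(records, threshold=12):
    """Fraction of band carriers among high- and low-proliferacy females.

    Females whose litter size exceeds threshold are counted as "high";
    all others are counted as "low". An empty group gives None.
    """
    high = [present for present, size in records if size > threshold]
    low = [present for present, size in records if size <= threshold]

    def frequency(group):
        return sum(group) / len(group) if group else None

    return frequency(high), frequency(low)


# Made-up records for illustration only: (band present?, litter size).
example = [(True, 8), (True, 9), (False, 13), (False, 12), (True, 10), (False, 14)]
```

A band whose carriers show a clearly lower (or higher) mean litter size, or whose frequency differs sharply between the two groups, would be a candidate marker. A formal significance test on adequately sized families would still be needed before drawing conclusions.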
As discussed below in the examples, a marker has been identified using this approach. The marker corresponds to a 255 base-pair band generated following RAPD analysis of swine DNA with a random oligonucleotide having the sequence 5'-GTAGACGAGC-3'. The band and marker have arbitrarily been designated as 122.2. Marker 122.2 tends to be associated with animals having smaller litters and is absent from animals with larger litters.
The markers identified as correlating with large or small litter size can be cloned and sequenced. Methods by which cloning and sequencing may be accomplished are well known to those of skill in the art. For example, random primers generating the markers of interest can be modified to include additional nucleotide sequences containing restriction endonuclease cleavage sites. The presence of such sites allows for the directional cloning of PCR products into suitable cloning vectors after treatment with an appropriate restriction enzyme. See Finney, "Molecular Cloning of PCR Products" in CURRENT PROTOCOLS IN MOLECULAR BIOLOGY, Ausubel et al., Eds., John Wiley & Sons, New York (1987), p. 15.7.1.
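As a small illustration of the primer modification mentioned above, the sketch below prepends a restriction enzyme recognition sequence to a primer. The choice of enzymes, the two-base 5' clamp and the function name are assumptions made for this example only; the source does not specify which sites or flanking bases were used.

```python
# Recognition sequences of two widely used restriction enzymes.
RESTRICTION_SITES = {"EcoRI": "GAATTC", "BamHI": "GGATCC"}


def add_restriction_site(primer: str, enzyme: str, clamp: str = "GC") -> str:
    """Return the primer with a 5' clamp and a restriction site prepended.

    The clamp is a short run of extra bases (an assumed convention here)
    placed 5' of the site so that the site does not sit at the very end
    of the amplified product.
    """
    site = RESTRICTION_SITES[enzyme]
    return clamp.upper() + site + primer.upper()
```

For example, `add_restriction_site("GTAGACGAGC", "EcoRI")` yields the 18-base oligonucleotide `GCGAATTCGTAGACGAGC`, whose 3' end is still the original 10-base primer.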
With the possession of more sequences, it will be useful to design probes that will be specific for the marker sequences. These probes can be used in an RFLP-type of analysis. This may permit the identification of polymorphisms that more accurately correspond to the trait of interest.
Following cloning, it will be of interest to look for open reading frames within the cloned fragment and flanking sequences and, further, to compare any significant open reading frames with those known through databases such as GenBank. With additional sequences from the marker available, it also will be possible to design other primers that will be specific for the marker.
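A search for open reading frames of the kind described above can be sketched as a six-frame scan. This is a simplified illustration, assuming the standard stop codons and an ATG start; the function name and the minimum-length cutoff are hypothetical choices, not part of the disclosed method.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A/C/G/T only)."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_orfs(seq: str, min_codons: int = 30):
    """Return (strand, frame, start, end) for each ATG-to-stop ORF.

    All six reading frames are scanned. start and end are 0-based,
    end-exclusive offsets on the scanned strand, and the stop codon is
    included in the span. min_codons counts the codons before the stop.
    """
    seq = seq.upper()
    orfs = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i  # first in-frame start codon opens an ORF
                elif start is not None and codon in STOP_CODONS:
                    if (i - start) // 3 >= min_codons:
                        orfs.append((strand, frame, start, i + 3))
                    start = None  # look for the next ORF in this frame
    return orfs
```

Candidate ORFs found this way would then be translated and compared against a database such as GenBank. On a fragment as short as a 255 base-pair marker, any ORF is likely to be partial, so flanking sequence would normally be included.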
Primer sets are prepared in both the sense and antisense orientation. Suitable oligonucleotide primers can be synthesized using commercial synthesizers, such as those supplied by Applied Biosystems (Foster City, CA). PCR-type methods can be used to sequence regions from the genome that are adjacent to the marker and, further, to clone the associated genomic sequences.
It also will be of interest to determine whether mRNA transcripts corresponding to the markers are synthesized. Methods for carrying out this determination are well known in the art. See, for example, Sambrook et al., MOLECULAR CLONING: A LABORATORY MANUAL, 2d Ed., Cold Spring Harbor Laboratory Press, Cold Spring Harbor, NY (1989). Northern analysis may be performed directly on RNA preparations. Alternatively, mRNA preparations can be used as templates for cDNA synthesis using poly(dT) or random hexamer primers by standard techniques. See Sambrook et al., supra. In a particularly preferred embodiment, cDNA synthesis is carried out using a commercially available kit (Pharmacia). The cDNA can then be used directly for PCR using the method of Saiki et al., Science 239:487 (1988). In a particularly preferred embodiment, the cDNA is packaged into bacteriophage particles using a commercially available kit (Promega, Madison, WI). The packaged cDNA is then transfected into E. coli to produce a cDNA library.
Once the DNA sequence encoding an entire coding region is known, it can be used to prepare non-degenerate primers corresponding to that sequence, optionally containing restriction enzyme recognition sequences to aid in cloning of various DNA products. Alternative methods for carrying out this PCR analysis include use of the 5' or 3' RACE methods using commercially available kits, such as those manufactured by Life Technologies (Gaithersburg, MD) or Clontech (Palo Alto, CA). Primers for this method are selected according to the manufacturer's directions.
Gene fragments can be excised from the cloning vector by restriction enzyme digestion, labeled with 32P by conventional methods and used as probes to identify the complete gene encoding the marker-associated polypeptide from within a cDNA library. In a preferred embodiment, the probe is chosen such that it is long enough to ensure hybridization specificity, while remaining short enough to allow reasonable rates of hybridization to the target gene. Such probes can be used to screen genomic or cDNA libraries of related or unrelated organisms.
Once the entire coding sequence of a marker-associated gene has been determined, various probes and primers can be designed around that sequence. Primers may be of any length but, typically, are 10-20 bases in length. By assigning numeric values to a sequence, for example, the first residue is 1, the second residue is 2, etc., an algorithm defining all primers can be proposed:
n to n + y
where n is an integer from 1 to the last number of the sequence and y is the length of the primer minus one (9 to 19) , where n + y does not exceed the last number of the sequence. For marker 122.2, n is 1 to 255. Thus, for a 10-mer, the probes correspond to bases 1 to 10, 2 to 11, 3 to 12 ... and 246 to 255. For a 15-mer, the probes correspond to bases 1 to 15, 2 to 16, 3 to 17 ... and 241 to 255. For a 20-mer, the probes correspond to bases 1 to 20, 2 to 21, 3 to 22 ... and 236 to 255.
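The "n to n + y" rule can be written out directly. The sketch below is only an illustration of that rule; the function name is hypothetical, and positions are 1-based and inclusive to match the numbering used in the text.

```python
def primer_windows(seq_len: int, primer_len: int) -> list[tuple[int, int]]:
    """Return every (n, n + y) window, 1-based and inclusive.

    y is the primer length minus one, and n runs over every start position
    for which n + y does not exceed the last position of the sequence.
    """
    y = primer_len - 1
    return [(n, n + y) for n in range(1, seq_len - y + 1)]


# Windows for the 255-base marker at the three lengths discussed above.
ten_mers = primer_windows(255, 10)
fifteen_mers = primer_windows(255, 15)
twenty_mers = primer_windows(255, 20)
```

For the 255-base marker this gives 246 possible 10-mers ending at bases 246 to 255, 241 possible 15-mers ending at 241 to 255, and 236 possible 20-mers ending at 236 to 255, in agreement with the ranges listed above.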
Screening of libraries is carried out by conventional methods. See Sambrook et al., supra. Clones which hybridize to the probe are purified and their sequences determined. To facilitate sequencing, nested deletions in the clones can be created using standard protocols, or by commercially available kits such as Erase-a-base (Promega, Madison, WI) or The Deletion Factory (Life Technologies, Gaithersburg, MD), following the manufacturer's instructions. The sequences obtained are analyzed for the presence of open reading frames by conventional methods. In a preferred embodiment, cDNA libraries are prepared by both random hexamer and poly(dT) priming.
Once the entire coding sequence of a marker-associated gene has been determined, the gene can be inserted into an appropriate expression system. The gene can be expressed in any number of different recombinant DNA expression systems to generate large amounts of the polypeptide product, which can then be purified and used to vaccinate animals to generate antisera with which further studies may be conducted.
Examples of expression systems known to the skilled practitioner in the art include bacteria such as E. coli, yeast such as Pichia pastoris, baculovirus, and mammalian expression systems such as in Cos or CHO cells. In a preferred embodiment, polypeptides are expressed in E. coli and in baculovirus expression systems. A complete gene can be expressed or, alternatively, fragments of the gene encoding portions of the polypeptide can be produced.
In a preferred embodiment, the gene sequence encoding the polypeptide is analyzed to detect putative transmembrane sequences. Such sequences are typically very hydrophobic and are readily detected by the use of standard sequence analysis software, such as MacVector (IBI, New Haven, CT). The presence of transmembrane sequences is often deleterious when a recombinant protein is synthesized in many expression systems, especially E. coli, as it leads to the production of insoluble aggregates which are difficult to renature into the native conformation of the protein. Deletion of transmembrane sequences typically does not significantly alter the conformation of the remaining protein structure.
Moreover, transmembrane sequences, being by definition embedded within a membrane, are inaccessible. Antibodies to these sequences will not, therefore, prove useful in in vivo or in situ studies. Deletion of transmembrane-encoding sequences from the genes used for expression can be achieved by standard techniques. See Ausubel et al., supra, Chapter 8. For example, fortuitously-placed restriction enzyme sites can be used to excise the desired gene fragment, or PCR-type amplification can be used to amplify only the desired part of the gene.
In a preferred embodiment, computer sequence analysis is used to determine the location of the predicted major antigenic determinant epitopes of the polypeptide. Software capable of carrying out this analysis is readily available commercially, for example MacVector (IBI, New Haven, CT) . The software typically uses standard algorithms such as the Kyte/Doolittle or Hopp/Woods methods for locating hydrophilic sequences which are characteristically found on the surface of proteins and are, therefore, likely to act as antigenic determinants.
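The kind of hydropathy analysis referred to above can be sketched with a sliding-window average over the Kyte/Doolittle scale. This is an illustrative sketch only and is not the implementation used by any commercial package; the window size and function names are assumptions. On this scale, strongly negative window averages indicate hydrophilic stretches (candidate surface determinants), while strongly positive averages indicate hydrophobic stretches such as putative transmembrane segments.

```python
# Kyte/Doolittle hydropathy values for the twenty standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(protein: str, window: int = 7) -> list[float]:
    """Return the mean hydropathy of every window along the protein."""
    scores = [KYTE_DOOLITTLE[aa] for aa in protein.upper()]
    return [
        sum(scores[i:i + window]) / window
        for i in range(len(scores) - window + 1)
    ]


def most_hydrophilic_window(protein: str, window: int = 7) -> tuple[int, str]:
    """Return (0-based start, peptide) of the lowest-scoring window."""
    profile = hydropathy_profile(protein, window)
    start = min(range(len(profile)), key=profile.__getitem__)
    return start, protein[start:start + window]
```

A longer window is usually chosen when looking for membrane-spanning segments and a shorter one when looking for surface-exposed stretches; the appropriate values would be set empirically for the polypeptide in question.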
Once this analysis is made, polypeptides can be prepared which contain at least the essential features of the antigenic determinant and which can be employed in the generation of antisera against the polypeptide. Minigenes or gene fusions encoding these determinants can be constructed and inserted into expression vectors by standard methods, for example, using PCR cloning methodology.
The gene or gene fragment encoding a polypeptide can be inserted into an expression vector by standard subcloning techniques. In a preferred embodiment, an E. coli expression vector is used which produces the recombinant polypeptide as a fusion protein, allowing rapid affinity purification of the protein. Examples of such fusion protein expression systems are the glutathione S-transferase system (Pharmacia, Piscataway, NJ), the maltose binding protein system (NEB, Beverly, MA), the FLAG system (IBI, New Haven, CT), and the 6xHis system (Qiagen, Chatsworth, CA).
Some of these systems produce recombinant polypeptides bearing only a small number of additional amino acids, which are unlikely to affect the antigenic ability of the recombinant polypeptide. For example, both the FLAG system and the 6xHis system add only short sequences, both of which are known to be poorly antigenic and which do not adversely affect folding of the polypeptide to its native conformation. Other fusion systems produce a polypeptide from which it is desirable to excise the fusion partner. In a preferred embodiment, the fusion partner is linked to the recombinant polypeptide by a peptide sequence containing a specific recognition sequence for a protease. Examples of suitable sequences are those recognized by the Tobacco Etch Virus protease (Life Technologies, Gaithersburg, MD) or Factor Xa (New England Biolabs, Beverly, MA).
In another preferred embodiment, the expression system used is one driven by the baculovirus polyhedrin promoter. The gene encoding the polypeptide can be manipulated by standard techniques in order to facilitate cloning into the baculovirus vector. See Ausubel et al., supra. A preferred baculovirus vector is the pBlueBac vector (Invitrogen, Sorrento, CA). The vector carrying the gene for the polypeptide is transfected into Spodoptera frugiperda (Sf9) cells by standard protocols, and the cells are cultured and processed to produce the recombinant antigen. See Summers et al., A MANUAL OF METHODS FOR BACULOVIRUS VECTORS AND INSECT CELL CULTURE PROCEDURES, Texas Agricultural Experiment Station.
As an alternative to recombinant polypeptides, synthetic peptides corresponding to the antigenic determinants can be prepared. Such peptides are at least six amino acid residues long, and may contain up to approximately 35 residues, which is the approximate upper length limit of automated peptide synthesis machines, such as those available from Applied Biosystems (Foster City, CA). Use of such small peptides for vaccination typically requires conjugation of the peptide to an immunogenic carrier protein such as hepatitis B surface antigen. Methods for performing this conjugation are well known in the art.
In a preferred embodiment, amino acid sequence variants of the polypeptide can be prepared. These may, for instance, be minor sequence variants of the polypeptide which arise due to natural variation within the population or they may be homologues found in other species. They also may be sequences which do not occur naturally but which are sufficiently similar that they function similarly and/or elicit an immune response that cross-reacts with natural forms of the polypeptide. Sequence variants can be prepared by standard methods of site-directed mutagenesis such as those described above for removing the transmembrane sequence.
Amino acid sequence variants of the polypeptide can be substitutional, insertional or deletion variants.
Deletion variants lack one or more residues of the native protein which are not essential for function or immunogenic activity, and are exemplified by the variants lacking a transmembrane sequence described above. Another common type of deletion variant is one lacking secretory signal sequences or signal sequences directing a protein to bind to a particular part of a cell. An example of the latter sequence is the SH2 domain, which induces protein binding to phosphotyrosine residues.
Substitutional variants typically contain the exchange of one amino acid for another at one or more sites within the protein, and may be designed to modulate one or more properties of the polypeptide such as stability against proteolytic cleavage. Substitutions preferably are conservative, that is, one amino acid is replaced with one of similar shape and charge. Conservative substitutions are well known in the art and include, for example, the changes of: alanine to serine; arginine to lysine; asparagine to glutamine or histidine; aspartate to glutamate; cysteine to serine; glutamine to asparagine; glutamate to aspartate; glycine to proline; histidine to asparagine or glutamine; isoleucine to leucine or valine; leucine to valine or isoleucine; lysine to arginine, glutamine, or glutamate; methionine to leucine or isoleucine; phenylalanine to tyrosine, leucine or methionine; serine to threonine; threonine to serine; tryptophan to tyrosine; tyrosine to tryptophan or phenylalanine; and valine to isoleucine or leucine.
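As an illustrative sketch only, the conservative substitutions listed above can be encoded as a lookup table keyed by the native residue (one-letter codes). The function names and the example sequences are hypothetical. The table is directional, exactly as the list is written: for example, alanine to serine is listed but serine to alanine is not.

```python
# Conservative substitutions as listed in the text: native residue -> allowed replacements.
CONSERVATIVE = {
    "A": {"S"}, "R": {"K"}, "N": {"Q", "H"}, "D": {"E"}, "C": {"S"},
    "Q": {"N"}, "E": {"D"}, "G": {"P"}, "H": {"N", "Q"}, "I": {"L", "V"},
    "L": {"V", "I"}, "K": {"R", "Q", "E"}, "M": {"L", "I"},
    "F": {"Y", "L", "M"}, "S": {"T"}, "T": {"S"}, "W": {"Y"},
    "Y": {"W", "F"}, "V": {"I", "L"},
}


def is_conservative(native_residue, new_residue):
    """True if the change appears in the list of conservative substitutions."""
    return new_residue in CONSERVATIVE.get(native_residue, set())


def classify_substitutions(native, variant):
    """Compare two equal-length sequences.

    Returns one tuple (position, native_residue, new_residue, conservative)
    per changed site, with positions numbered from 1.
    """
    if len(native) != len(variant):
        raise ValueError("substitutional variants must have the same length")
    return [
        (i + 1, a, b, is_conservative(a, b))
        for i, (a, b) in enumerate(zip(native, variant))
        if a != b
    ]


# Hypothetical four-residue example: K2R is conservative, A4D is not.
changes = classify_substitutions("MKTA", "MRTD")
```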
Insertional variants include fusion proteins such as those used to allow rapid purification of the polypeptide and also can include hybrid proteins containing sequences from other proteins and polypeptides which are homologues of the polypeptide. For example, an insertional variant could include portions of the amino acid sequence of the polypeptide from one species, together with portions of the homologous polypeptide from another species. Other insertional variants can include those in which additional amino acids are introduced within the coding sequence of the polypeptide. These typically are smaller insertions than the fusion proteins described above and are introduced, for example, to disrupt a protease cleavage site.
In a preferred embodiment, major antigenic determinants of the polypeptide are identified by an empirical approach in which portions of the gene encoding the polypeptide are expressed in a recombinant host, and the resulting proteins are tested for their ability to elicit an immune response. For example, PCR can be used as described above to prepare a range of peptides lacking successively longer fragments of the C-terminus of the protein. The immunoprotective activity of each of these peptides then identifies those fragments or domains of the polypeptide which are essential for this activity. Further experiments in which only a small number of amino acids are removed at each iteration then allow the antigenic determinants of the polypeptide to be located.
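The truncation series described above can be sketched as follows. This is a minimal illustration at the amino acid level; the function name, the step sizes and the placeholder sequence are assumptions, and the minimum length of six residues is borrowed from the peptide length mentioned earlier rather than prescribed for this procedure.

```python
def c_terminal_truncations(protein, step, min_length=6):
    """Yield nested fragments lacking successively longer C-terminal stretches."""
    if step < 1:
        raise ValueError("step must be a positive number of residues")
    length = len(protein) - step
    while length >= min_length:
        yield protein[:length]
        length -= step


# Hypothetical 12-residue placeholder sequence (letters stand in for residues).
coarse = list(c_terminal_truncations("ABCDEFGHIJKL", step=3))
```

A first pass with a large step narrows down the region essential for activity. A second pass over that region with a small step (for example, one or two residues) corresponds to the finer mapping mentioned in the text.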
Another preferred embodiment for the preparation of the polypeptides according to the invention is the use of peptide mimetics. Mimetics are peptide-containing molecules which mimic elements of protein secondary structure. See, for example, Johnson et al., "Peptide Turn Mimetics" in BIOTECHNOLOGY AND PHARMACY, Pezzuto et al., Eds., Chapman and Hall, New York (1993). The underlying rationale behind the use of peptide mimetics is that the peptide backbone of proteins exists chiefly to orient amino acid side chains in such a way as to facilitate molecular interactions, such as those of antibody and antigen. A peptide mimetic is expected to permit molecular interactions similar to the natural molecule.
Successful applications of the peptide mimetic concept have thus far focussed on mimetics of β-turns within proteins, which are known to be highly antigenic. Likely β-turn structure within a polypeptide can be predicted by computer-based algorithms as discussed above. Once the component amino acids of the turn are determined, peptide mimetics can be constructed to achieve a similar spatial orientation of the essential elements of the amino acid side chains, as discussed in Johnson et al., supra.
VI. EXAMPLES
Example 1: Preparation of Swine DNA
Blood is collected and, if desired, samples may be stored at 4°C prior to use. Blood collection tubes are placed on an orbital mixer for at least three minutes at room temperature to mix whole blood completely. Nine hundred μl of cold, sterile TE buffer (0.009 g EDTA and 0.395 g Tris/HCl in 250 ml of sterile H2O) is added to a 1.5 ml microfuge tube. One hundred μl of whole blood is added to the tubes, which are then capped and labeled.
After five minutes, the tubes are rocked gently twice by hand. Tubes are spun at 10K g for fifteen seconds and the supernatant decanted. One ml of cold, sterile TE buffer is added and the tube is then vortexed for 2-5 sec to resuspend the pellet. The process is repeated at least one more time or until red color is absent from the pellet.
The pellet produced according to the preceding protocol is treated with PCR buffer including a nonionic detergent and protease K. PCR buffer comprises, in 100 ml of sterile H2O:
0.37 g KCl
1.0 ml Tris/HCl (pH 8.3)
0.024 g MgCl2
0.01 g Gelatin
0.45 ml NP40
0.45 ml Tween 20
The tube is gently mixed or vortexed for five sec. The tubes are then incubated in a 37°C drying oven on an orbital mixer for 15-18 hours, or at 56°C for 90 min. The protease is deactivated in a 97°C water bath for 10 min. The sample is spun down and frozen at -20°C prior to use.
Where solid tissues are used as the source for genomic DNA, the "EASY-DNA"™ kit (Invitrogen, San Diego, CA) is employed.
Example 2: Polymerase Chain Reaction
A 25 μl PCR reaction mixture is prepared containing 25 ng of genomic DNA (template) isolated from pigs. To the template is added 0.2 μM primer (the synthetic oligodeoxynucleotide 5'-GTAGACGAGC-3'), 2 mM MgCl2, 50 μM each of dATP, dTTP, dGTP and dCTP, 10 mM Tris-HCl pH 8.3 (at 25°C), 50 mM KCl, 0.001% gelatin, 5% DMSO and 1 unit of Taq polymerase. The reaction mixture is first heated to 94°C for 10 min, then subjected to 45 cycles of PCR-type amplification under the following regime: 1 minute at 94°C, 1 minute at 35°C and 2 minutes at 72°C.
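As a worked illustration of the arithmetic implied by this protocol, the following sketch converts the stated final concentrations into absolute amounts for a 25 μl reaction and sums the programmed hold times of the thermal regime. The function and variable names are assumptions introduced for this example, and ramp times between temperatures are not included because the text does not give them.

```python
REACTION_VOLUME_UL = 25.0

# Thermal regime from the text: (temperature in deg C, minutes held).
INITIAL_DENATURATION_MIN = 10
CYCLES = 45
CYCLE_STEPS = [(94, 1), (35, 1), (72, 2)]


def picomoles(concentration_um, volume_ul):
    """Amount in pmol; numerically, micromolar x microlitres = picomoles."""
    return concentration_um * volume_ul


def total_hold_minutes(initial_min, cycles, steps):
    """Programmed hold time only; instrument ramping is not modelled."""
    return initial_min + cycles * sum(minutes for _, minutes in steps)


primer_pmol = picomoles(0.2, REACTION_VOLUME_UL)       # 0.2 uM primer
each_dntp_pmol = picomoles(50.0, REACTION_VOLUME_UL)   # 50 uM of each dNTP
run_minutes = total_hold_minutes(INITIAL_DENATURATION_MIN, CYCLES, CYCLE_STEPS)
```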
Example 3: Analysis of Amplification Products
The amplified reaction mixture is then loaded onto an electrophoresis gel consisting of 1.4% agarose in a buffer of 0.089 M Tris base, 0.089 M boric acid and 0.002 M EDTA. The gel is run using the same buffer and an electric field is applied with the amplified bands moving in the direction of the anode. The gel is run until the bromophenol blue tracking dye has migrated at least 5 cm.
For detection of bands, the gel is stained in an ethidium bromide solution (0.5 ml of EtBr (1 mg/ml in dH2O) added to 100 ml of gel buffer) and then examined on a UV light box. A band designated as 122.2 is found in amplification of DEKALB swine DNA and is 255 base pairs long. This band is missing from the amplification products generated from Chinese swine DNA.
Example 4: Statistical Correlation in Breeding Program
The animals used for analyzing the relevance of the 122.2 marker, with respect to litter size, consist of a cross between a DEKALB female and a Chinese male. From the original cross, the F1's produced (50% DEKALB/50% Chinese) are intercrossed to produce an F2. The F2 females are bred and the litter size data collected and analyzed. Table 1, below, shows the relevant data:
TABLE 1
Data Set (1)   AVG-   NO-   AVG+   NO+   TOTAL   t-value
A1             14.3   10    11.2   13    23      1.696692
(1) A1, A2 and D are different resource families
As is evident from these data, animals lacking the 122.2 marker had an average litter size that was some 23.4% higher than animals possessing the marker.
A second study was conducted using a composite line identified as the H line. For this study, 57 females were genotyped for the 122.2 marker. Fertility data (at least two litters) were available for each of the 57 females. These data are shown below in Table 2:
TABLE 2
Data Set     AVG-   NO-   AVG+   NO+   TOTAL   t-value   prob. (p)
1st Litter   12.8   31    11.2   26    57      2.28      <0.01
2nd Litter   11.7   31    9.9    26    57      2.60      <0.02-0.05
The data indicate a significant association of the absence of the 122.2 marker with large litter size (t-value = 2.28 and 2.60 for the first and second litter, respectively).
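For readers who wish to see how a t-value maps to a probability, the following is a minimal sketch using only the Python standard library. It assumes a two-sided test and 55 degrees of freedom (31 + 26 - 2, as for a pooled two-sample comparison). The specification does not state which test or how many degrees of freedom were used, so both are assumptions, as are the function names.

```python
import math


def t_pdf(x, df):
    """Probability density of Student's t distribution with df degrees of freedom."""
    log_coef = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    coef = math.exp(log_coef) / math.sqrt(df * math.pi)
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)


def two_sided_p(t, df, steps=2000):
    """Two-sided p-value via Simpson integration of the density over [0, |t|]."""
    t = abs(t)
    if t == 0:
        return 1.0
    h = t / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central_area = 2.0 * total * h / 3.0
    return max(0.0, 1.0 - central_area)


DF_ASSUMED = 31 + 26 - 2  # assumption: pooled two-sample t-test
p_first = two_sided_p(2.28, DF_ASSUMED)
p_second = two_sided_p(2.60, DF_ASSUMED)
```

A one-sided probability, if that is what was intended, would be half of the value returned here.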
Example 5: Sequencing of Marker 122.2
The 255 base pair sequence of marker 122.2 was obtained using the "Silver Sequence-DNA" sequencing system (Promega, Madison, WI). This system is a non-radioactive method which combines thermal cycle sequencing with a sensitive silver staining protocol to detect bands in a DNA sequencing gel. The protocol comprises the following steps. First, 2 μl of d/ddNTP mix was added to each of four tubes labeled G, A, T and C. In a separate tube, 1-2 pmol of plasmid DNA, 5 μl of DNA sequencing buffer (5x), 4-5 pmol of primer and sterile H2O were combined to a final volume of 16 μl. One μl of sequencing grade Taq polymerase was added to the 16 μl mixture, and 4 μl of the enzyme/template mixture was distributed to each of the tubes labeled G, A, T and C, followed by brief microcentrifugation. The tubes were then placed in a thermalcycler set for the following cycling:
95°C for 2 min, 95°C for 30 sec, 42°C for 30 sec, 70°C for 1 min
This cycle was repeated for 55 cycles, followed by incubation at 4°C. Following completion of thermalcycling, 3 μl of DNA sequencing stop solution was added and the tubes were heated to 70°C for 2 min immediately prior to loading 3-3.5 μl of each mixture onto a 4-6% polyacrylamide (19:1 acrylamide:bis-acrylamide) sequencing gel (0.4 mm). The gel was run at 1600 volts constant. The gel was silver stained and the sequence read. The sequence is provided in FIG. 1.
Example 6: Analysis of Nucleotide and Predicted Amino Acid Sequence of Marker 122.2
The sequence of marker 122.2 was subjected to a homology search in Genbank at the National Center for Biotechnology Information (National Library of Medicine, National Institutes of Health, Bethesda, MD). There was only one significant match of 63%, over 77 residues (136-212) of marker 122.2, with the gene for Plasmodium falciparum merozoite 190 kD precursor protein (p190).
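A match percentage of this kind is conventionally a percent identity over an aligned window. The sketch below shows one simple way to compute it for two pre-aligned sequences. The function name and the toy sequences are hypothetical, gapped positions are skipped by assumption, and the scoring used by the actual database search is not described in the text.

```python
def percent_identity(aligned_a, aligned_b):
    """Percentage of aligned, ungapped positions at which two sequences agree."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [
        (a, b)
        for a, b in zip(aligned_a.upper(), aligned_b.upper())
        if a != "-" and b != "-"
    ]
    if not pairs:
        raise ValueError("no ungapped positions to compare")
    matches = sum(a == b for a, b in pairs)
    return 100.0 * matches / len(pairs)


# The window quoted in the text, residues 136 to 212 inclusive.
window_length = 212 - 136 + 1

# Hypothetical toy alignment with 8 identical positions out of 10.
toy_identity = percent_identity("ACGTACGTAC", "ACGTTCGTAA")
```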
SEQUENCE LISTING
(1) GENERAL INFORMATION:
(i) APPLICANT:
(A) NAME: Northern Illinois University
(B) STREET:
(C) CITY: DeKalb
(D) STATE: Illinois
(E) COUNTRY : USA
(F) POSTAL CODE (ZIP) : 60115-2874
(A) NAME: DeKalb Swine Breeders
(B) STREET: 3100 Sycamore Road
(C) CITY: DeKalb
(D) STATE: Illinois
(E) COUNTRY : USA
(F) POSTAL CODE (ZIP) : 60115
(ii) TITLE OF INVENTION: GENE MARKER ASSOCIATED WITH
SWINE PROLIFERACY
(iii) NUMBER OF SEQUENCES: 2
(iv) COMPUTER READABLE FORM:
(A) MEDIUM TYPE: Floppy disk
(B) COMPUTER: IBM PC compatible
(C) OPERATING SYSTEM: PC-DOS/MS-DOS
(D) SOFTWARE: PatentIn Release #1.0, Version #1.30 (EPO)
(vi) PRIOR APPLICATION DATA:
(A) APPLICATION NUMBER: US 08/423,572
(B) FILING DATE: 17-APR-1995
(2) INFORMATION FOR SEQ ID NO: 1:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 255 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 1:
GTAGACGAGC AAGTTTCCAA AGGATGCTAT TTTGGGGATT CTTGGCTTTC CTTGAGACCA 60
GAGCAACTCT CTTGAAAGGC TTTATGAAGT TTGGCATCTG GGGACTATTT CTGTTGAAGA 120
AAGCGTTCCA CTGCTAAACA AACAAACACT AAAGAGTGGG AATATCAAGA CTTTTCTCAT 180
AGAAAGAAAA ATATCTGTGA TGAAAATTAT GTCTACCCCC AGACGAATGG GAGAAGACAT 240
CCTGGGCTCG ACTAC 255
(2) INFORMATION FOR SEQ ID NO: 2:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 255 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 2:
CATCTGCTCG TTCAAAGGTT TCCTACGATA AAACCCCTAA GAACCGAAAG GAACTCTGGT 60
CTCGTTGAGA GAACTTTCCG AAATACTTCA AACCGTAGAC CCCTGATAAA GACAACTTCT 120
TTCGCAGAGT GACGATTTGT TTGTTTGTGA TTTCTCACCC TTATAGTTCT GAAAAGAGTA 180
TCTTTCTTTT TATAGACACT ACTTTTAATA CAGATGGGGG TCTGCTTACC CTCTTCTGTA 240
GGACCCGAGC TGATG 255
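The relationship between the two listed sequences can be checked with a few lines of code. The sketch below uses only the first 60 bases of each sequence as given in the listing and verifies that SEQ ID NO: 2 is the base-by-base complement of SEQ ID NO: 1 over that stretch, and that SEQ ID NO: 1 begins with the primer of Example 2. The helper names are assumptions introduced for this illustration.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def complement(seq):
    """Base-by-base complement, written in the same left-to-right order."""
    return seq.translate(COMPLEMENT)


def reverse_complement(seq):
    """Complement read in the opposite direction (the opposite strand, 5' to 3')."""
    return complement(seq)[::-1]


PRIMER = "GTAGACGAGC"  # primer from Example 2

# First 60 bases of each sequence, copied from the listing above.
SEQ1_FIRST_60 = (
    "GTAGACGAGC" "AAGTTTCCAA" "AGGATGCTAT"
    "TTTGGGGATT" "CTTGGCTTTC" "CTTGAGACCA"
)
SEQ2_FIRST_60 = (
    "CATCTGCTCG" "TTCAAAGGTT" "TCCTACGATA"
    "AAACCCCTAA" "GAACCGAAAG" "GAACTCTGGT"
)

seq2_is_complement = complement(SEQ1_FIRST_60) == SEQ2_FIRST_60
starts_with_primer = SEQ1_FIRST_60.startswith(PRIMER)
```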