EP1007739A2 - Selection of PCR primer pairs for amplifying a group of nucleotide sequences

Selection of PCR primer pairs for amplifying a group of nucleotide sequences

Info

Publication number
EP1007739A2
EP1007739A2 EP98945882A EP98945882A EP1007739A2 EP 1007739 A2 EP1007739 A2 EP 1007739A2 EP 98945882 A EP98945882 A EP 98945882A EP 98945882 A EP98945882 A EP 98945882A EP 1007739 A2 EP1007739 A2 EP 1007739A2
Authority
EP
European Patent Office
Prior art keywords
primers
nucleotide sequences
subset
group
related nucleotide
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
EP98945882A
Other languages
German (de)
English (en)
Inventor
Michael Mcclelland
Graziano Pesole
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Sidney Kimmel Cancer Center
Original Assignee
Sidney Kimmel Cancer Center
Application filed by Sidney Kimmel Cancer Center

Classifications

    • C: CHEMISTRY; METALLURGY
    • C12: BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q: MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00: Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68: Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6876: Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes
    • C12Q1/6844: Nucleic acid amplification reactions
    • C12Q1/686: Polymerase chain reaction [PCR]

Definitions

  • the present invention relates generally to methods of amplifying nucleotide sequences and more specifically to methods of identifying sets of primer pairs sufficient to accomplish the amplification.
  • DNA deoxyribonucleic acid
  • DNA is made of two strands of nucleotide building blocks; the two strands bind, or hybridize, much like a zipper and form a double helix.
  • Genes are discrete segments of the DNA and provide the information required to generate a new organism. Even simple organisms, such as bacteria, contain thousands of genes, and the number is manyfold greater in complex organisms such as humans. Understanding the complexities of the development and functioning of living organisms requires knowledge of these genes. However, the amount of DNA that can be isolated for study has often been limiting. A major breakthrough in the study of genes was the development of the polymerase chain reaction (PCR).
  • PCR polymerase chain reaction
  • PCR "amplifies" genes or portions of genes by making many identical copies, allowing isolation of genes from very tiny amounts of DNA.
  • PCR requires the design of primers composed of short stretches of nucleotides that bind, or hybridize, to discrete segments of the gene.
  • the gene or portion of the gene that is adjacent to the bound primers is then copied many times, generating large quantities of material that can be used for further studies, such as identifying genes expressed abnormally in cancer cells.
  • The design of PCR primers is relatively straightforward when the sequence of the gene of interest is known or when the number of sequences to be amplified is small. However, particular circumstances can make the design of PCR primers a difficult task. For example, it would be advantageous to be able to identify new related genes where the sequences of some related genes are already known. However, design of PCR primers requires that the sequence to be amplified be known. The length of primers is also critical for successful PCR amplification. For example, primers of sufficient length will selectively amplify a known gene but will likely be too specific to amplify related family members. On the other hand, if primers are too short, they will amplify many unrelated genes and generate a high background.
  • the present invention provides subsets of primers sufficient to amplify a group of related nucleotide sequences.
  • the subset of primers can be less than the maximum number of two primers per nucleotide sequence required to amplify a group of related nucleotide sequences.
  • the invention provides a subset of primers where the number of primers is less than or equal to the number of related nucleotide sequences in the group.
  • the primers in the subset can be limited to a specific length or range of G+C content, or the subset of primers can exclude primers that amplify an undesirable nucleotide sequence.
  • the present invention also provides a method of determining a set of primer pairs for amplifying a group of related nucleotide sequences.
  • the invention provides a method of determining a set of primer pairs for amplifying a group of structurally related nucleotide sequences that encode members of the human nuclear receptor family.
  • a method of the invention is performed by identifying a group of related nucleotide sequences; generating a set of primers that match each of the related nucleotide sequences; determining for each systematic pairing of each primer which of the related nucleotide sequences are amplified; and selecting from the systematic pairings a subset of primers sufficient to amplify all of the related nucleotide sequences.
  • the invention additionally provides a method of identifying an amplified nucleotide sequence that is related to an original group of related nucleotide sequences.
  • a set of primers that samples a known group of nucleotide sequences that is induced by TGF-β can be used to identify previously unknown related nucleotide sequences induced by this agent.
  • the invention further provides a computer apparatus comprising a processor, main memory in communication with the processor, and a primer pair selector in communication with main memory for carrying out the computer-executed steps of identifying a group of related nucleotide sequences; generating a set of primers that match each of the related nucleotide sequences; determining for each systematic pairing of each primer which of the related nucleotide sequences are amplified; and selecting from the systematic pairings a subset of primers which is sufficient to amplify all of the group of related nucleotide sequences.
  • the invention also provides a computer program product for determining a set of primer pairs sufficient to amplify a group of related nucleotide sequences comprising means for identifying a group of related nucleotide sequences; means for generating a set of primers that match each of the related nucleotide sequences; means for determining for each systematic pairing of each primer which of the related nucleotide sequences are amplified; means for selecting from the systematic pairings a subset of primers which can amplify all of the related nucleotide sequences; and signal-bearing media containing the means for the identifying, generating, determining and selecting.
  • the present invention is best carried out as software operating in a computer, but one skilled in the art will recognize that it can be carried out as hardware or as a combination of hardware and software.
  • Figure 1 shows a block diagram of the computer system of the preferred embodiment.
  • Figure 2 shows the flowchart that describes the operation of the present invention.
  • Figures 3 and 4 show the flowcharts that depict the computer-executed steps for carrying out the method of the invention.
  • Figure 5 shows a program product for performing the method of the invention.
  • a method of the invention is performed by identifying a group of related nucleotide sequences; generating a set of primers that match each of the related nucleotide sequences; determining for each systematic pairing of each primer which of the related nucleotide sequences are amplified, for example, by generating a matrix that ranks the primers and nucleotide sequences by the number of matches to the other; and selecting from the systematic pairings a subset of primers that is sufficient to amplify all of the related nucleotide sequences.
  • the term "primer" refers to an oligonucleotide sequence of any size that can be used to amplify a nucleotide sequence.
  • the primers are about 5 to about 50 nucleotides in length, generally about 8 to about 12 nucleotides in length.
  • the term "group of related nucleotide sequences" refers to DNA or RNA molecules that share a common feature.
  • the common feature is a feature of interest to an investigator, for example, a group of related nucleotide sequences that share one or more common structural features, such as the DNA binding domains shared by members of the human nuclear receptor family.
  • a group of related nucleotide sequences also can share a common feature such as being involved in DNA repair or apoptosis or being induced in response to a stimulus such as TGF-β.
  • a shared common feature of a group of related nucleotide sequences can define a large group of nucleotide sequences such as human mRNA sequences, which share the common feature of being expressed in a particular human cell.
  • a group of related nucleotide sequences can be compiled as a list (see, for example, Table II, Table V, Table VII and Table IX).
  • a set of primers can be compiled as a list.
  • a common feature of a group of related nucleotide sequences also can be a common function.
  • a group of related nucleotide sequences can encode proteins that share a common function, which can be any biological function.
  • proteins include receptors, for example cell surface receptors, cytoplasmic receptors such as steroid hormone receptors, and nuclear receptors; secreted proteins; non-secreted proteins; hormones such as peptide hormones; and signal transduction proteins.
  • a group of related nucleotide sequences can be genes regulated during certain conditions, such as genes regulated during cell growth or regulating cell growth; as well as genes regulated during development, during pathogenesis, by pathogens, by drugs, by stress, by radiation exposure, during tissue repair, during senescence or during aging.
  • tumor suppressor genes, DNA repair genes, DNA replication genes, or DNA repair and replication genes can be groups of related nucleotide sequences.
  • genes responsible for resistance to therapy, resistance to drugs or for increased sensitivity to therapy or drugs also can be a group of related nucleotide sequences.
  • groups of related nucleotide sequences such as those described herein can be broadly applicable to plant biology, agriculture, medicine and reproduction, veterinary medicine, microbiology and environmental sciences.
  • Primers identified by a method of the invention match with specific regions of nucleotide sequences.
  • the term "match" refers to a primer that is 100% identical to a region of a nucleotide sequence.
  • the primers identified herein can themselves function, for example, as PCR primers in a test tube or reaction vessel, or can be part of a longer oligonucleotide sequence.
  • a primer selected using the disclosed method can comprise a longer oligonucleotide that, in fact, is used to perform PCR.
  • the subset of primers that amplify the human nuclear receptor family shown in Table III can be synthesized and used to amplify nucleotide sequences related to this family.
  • additional oligonucleotide bases can be added to an end of the primers shown in Table III. These additional bases, which can be arbitrary bases, aid in stabilizing the hybridization of PCR primers to the cDNA sequence.
  • the subset of primers identified using a method of the invention provides a core sequence, identical to a region of a DNA or RNA molecule, that allows amplification of such a nucleotide sequence, even though additional oligonucleotide bases can be added to the ends of these identified primers to facilitate performing PCR in a test tube.
  • the primers disclosed herein can be used to amplify nucleotide sequences by any method that amplifies nucleotide sequences.
  • other amplification methods, including the ligase chain reaction (LCR), self-sustained sequence replication (3SR), beta replicase reaction, for example, Q-beta replicase reaction, phage terminal binding protein reaction, strand displacement amplification (SDA) or nucleic acid sequence-based amplification (NASBA), also can be used to amplify nucleotide sequences using the primers of the invention (Trippler et al., J. Viral. Hepat.
  • the present invention is based on the understanding that a group of related nucleotide sequences share common stretches of nucleic acid sequence.
  • nucleotide sequences encoding structurally related proteins share regions of homology that encode conserved domains.
  • the related nucleotide sequences in such a group are all structurally related.
  • a group of related nucleotide sequences need not encode proteins that are structurally similar, but also can be nucleotide sequences that are commonly induced in response to a stimulus, for example, DNA damage, or exposure of a cell or organism to a drug or other chemical agent. In this case, some or all of the related nucleotide sequences in a group are not structurally related.
  • the present invention recognizes that there is a statistical probability that a group of related nucleotide sequences share one or a few common short stretches of nucleotide sequence. For example, a primer eight nucleotides in length would be expected to occur, statistically, about once every 65,000 base pairs (4^8 = 65,536). Using a method as disclosed herein, a sequence corresponding to an 8-mer primer occurred 34 times in the identified group of 44 human nuclear receptors (Table II), whereas statistics would have predicted that the 8-mer would occur only once in this list of related nucleotide sequences. This result demonstrates that a group of related nucleotide sequences share common short stretches of nucleotide sequence, which a method of the invention identifies.
  • the invention provides a means to identify a set of primers that samples a group of related nucleotide sequences.
  • the term "samples" refers to the ability of a primer pair to hybridize to and amplify a nucleotide sequence by PCR.
  • the set of primers can be a minimal or near minimal set, the minimal set being a set of primers containing the fewest number of primers sufficient to amplify a group of related nucleotide sequences.
  • the set of primers is selected from all possible primers of a given size or range of sizes. For example, 8-mers would generate 65,536 (4^8) possible primers, whereas 10-mers would generate 1,048,576 (4^10) possible primers.
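The primer counts quoted above follow from the four-letter nucleotide alphabet: there are 4^k possible primers of length k. A minimal Python sketch of that arithmetic is given here; the function names are illustrative and not taken from the patent, and the expected-frequency estimate assumes random sequence with uniform base composition on a single strand.

```python
from itertools import product

BASES = "ACGT"

def count_possible_primers(k):
    """Number of distinct primers of length k over A, C, G, T (4**k)."""
    return len(BASES) ** k

def all_primers(k):
    """Lazily yield every possible k-mer, so large k does not exhaust memory."""
    for letters in product(BASES, repeat=k):
        yield "".join(letters)

def expected_occurrences(k, total_bases):
    """Approximate expected hits of one specific k-mer in random sequence.

    Assumption: uniform base composition, one strand, edge effects ignored.
    """
    return total_bases / count_possible_primers(k)

print(count_possible_primers(8))   # 65536
print(count_possible_primers(10))  # 1048576
```

Generating the candidates lazily keeps the sketch usable for 10-mers, where the full list would hold over a million strings.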
  • the disclosed method provides a means for identifying a group of related nucleotide sequences; generating a set of primers that matches each of the group of related nucleotide sequences, for example by selecting a primer size such as an 8-mer and determining for each primer which of the related nucleotide sequences are amplified; determining for each systematic pairing of each primer, which of the related nucleotide sequences are amplified, for example, by generating a matrix that ranks the primer pairs and nucleotide sequences by the number of matches to the other; and selecting from the systematic pairing a subset of primers sufficient to amplify all or most of the group of related nucleotide sequences.
  • the term "systematic pairing" refers to pairing any given primer in a set with all primers in the set, including itself.
  • the invention provides a method to amplify genes in the human nuclear receptor family (see Table II).
  • the invention provides a means to identify a subset of primers that amplify a group of related nucleotide sequences.
  • Visual inspection of a group of related nucleotide sequences allows identification of a set of primers sufficient to amplify some groups of related nucleotide sequences.
  • visual inspection for identification of PCR primers that amplify a group of related nucleotide sequences is efficient only when the number of related nucleotide sequences is small, for example, 5 nucleotide sequences or fewer, and the nucleotide sequences are relatively short.
  • When a group of related nucleotide sequences contains a large number of nucleotide sequences or if the nucleotide sequences are long, however, a computer process conveniently performs the identification of the set of primers.
  • the term "computer process" refers to a method for carrying out computer-executed steps. For example, if the group of nucleotide sequences contains 10 nucleotide sequences or more, or 25 nucleotide sequences or more, particularly 50 nucleotide sequences or more, a computer process provides an efficient method to identify primers that amplify the group of related nucleotide sequences.
  • a computer process can compile a list of all possible primers composed of short nucleotide sequences that can function as PCR primers.
  • the primers range in size from about 5 to about 50 nucleotides, for example, about 8 to about 12 bases long, and are generally about 8 or 9 bases in length.
  • the computer process provides an advantage in that a large number of possible primers can be considered, for example, the 65,536 (4^8) possible 8-mers, and the primers can be restricted by specific criteria, such as restricted to a predetermined range of G+C content or restricted to exclude matches to undesirable nucleotide sequences, for example, ribosomal RNA sequences.
  • Primers as short as 5 bases have been used for PCR (Caetano Anolles et al., Biotechnology 9:553 (1991)).
  • a large group of related nucleotide sequences can have common 5-mer primers that amplify all or most of the related nucleotide sequences.
  • Although very short primer sequences will occur frequently in related nucleotide sequences, they also are statistically likely to occur frequently in other unrelated nucleotide sequences. Therefore, very short primer sequences do not always provide the desired selectivity for amplifying the group of related nucleotide sequences.
  • the primer length is also constrained by the statistical frequency of any given nucleotide sequence.
  • a 10-mer would occur about once every 1,000,000 base pairs (4^10).
  • the most common 10-mers occurred only a few times and were confined to regions of conserved domains.
  • the primers were limited to lengths no greater than 9-mers.
  • primers that match undesirable nucleotide sequences can be eliminated from the compiled list of possible primers.
  • Abundant nucleotide sequences such as mitochondrial DNA and ribosomal RNA or dispersed repetitive elements can lead to high background if primers in the set hybridize to these sequences during PCR.
  • primers that match such abundant sequences can be eliminated from the compiled list of possible primers.
  • Ribosomal RNA sequences and mitochondrial RNA sequences constitute a substantial and variable proportion of the RNA population of a cell, even after poly(A) RNA selection.
  • a list of human ribosomal RNA sequences, human mitochondrial DNA sequences, and eight representative Alu elements was compiled (see Table I).
  • a list of 99 mRNA sequences that carry fragments of LINE elements was also compiled (Hattori et al., Nature 321:625 (1986)), (see Table I).
  • LINE elements generally are found in a 5' truncated form of mRNA sequences and, where LINE elements occur, they generally are found in the 3' untranslated region of mRNA sequences.
  • Primers that match such sequences can be eliminated from the compiled list of possible primers.
  • all primers that matched with either strand of the human nuclear receptor sequences were compared to the ribosomal RNA, mitochondrial DNA and Alu sequences listed in Table I and those primers that matched with either strand of these abundant nucleotide sequences were removed from the compiled list of possible primers.
  • any primer that occurred three or more times in the list of LINE elements in Table I was removed from the compiled list of possible primers. After subtraction of abundant nucleotide sequences, only primers greater than 7-mers remained in the compiled list of possible primers. Because the vast majority of the 16,384 (4^7) possible 7-mer primers occurred in the sequences listed in
  • the primers were limited to lengths no shorter than 8-mers for determining a subset of primer pairs sufficient to amplify the human nuclear receptor gene family, the human G-protein coupled receptor gene family, human apoptosis-associated genes and human DNA repair and replication genes.
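The subtraction of primers that match abundant sequences, as described above, can be sketched in Python as follows. This is an illustration under stated assumptions, not the patent's program: all names are hypothetical, matching is exact, abundant sequences are checked on both strands, and LINE occurrences are counted on the given strand only (the text does not specify the strand for that count).

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]

def count_occurrences(primer, seq):
    """Count exact matches of primer in seq, including overlapping ones."""
    count, start = 0, seq.find(primer)
    while start != -1:
        count += 1
        start = seq.find(primer, start + 1)
    return count

def matches_either_strand(primer, seq):
    """True if the primer matches seq or its opposite strand exactly."""
    return primer in seq or reverse_complement(primer) in seq

def filter_primers(primers, abundant_seqs, line_seqs, line_threshold=3):
    """Drop primers that hit abundant sequences or recur in LINE elements."""
    kept = []
    for primer in primers:
        if any(matches_either_strand(primer, s) for s in abundant_seqs):
            continue  # e.g. ribosomal RNA, mitochondrial DNA, Alu
        line_hits = sum(count_occurrences(primer, s) for s in line_seqs)
        if line_hits >= line_threshold:
            continue  # occurs three or more times in the LINE list
        kept.append(primer)
    return kept

# Toy data, not real sequences
print(filter_primers(["AAAC", "GGGT", "CATG"],
                     abundant_seqs=["TTGTTTCC"],
                     line_seqs=["GGGTGGGT", "AGGGTA"]))  # ['CATG']
```

In the toy run, "AAAC" is dropped because its reverse complement "GTTT" occurs in the abundant sequence, and "GGGT" is dropped because it occurs three times across the LINE list.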
  • Other undesirable nucleotide sequences can include abundant mRNA sequences.
  • primers that match known abundant mRNA sequences that are constitutively expressed can be subtracted from the compiled list of possible primers.
  • High background of specific sized PCR products of the abundant mRNA sequences can obscure detection of similar sized PCR products of less abundant mRNA sequences. Removal of background contributed by the abundant mRNA sequences, whose PCR products would have a similar size compared to the less abundant mRNA sequences, would increase the likelihood of detecting amplified products from the less abundant mRNA sequences.
  • Accumulating information on the relative rank of abundance of mRNAs in cells can allow the identification of the top 100 or 500 mRNAs that occur in most cell types in an organism, for example, humans.
  • a list of such genes can be used to exclude primer pairs from consideration if, for example, more than one of these more abundant RNAs was likely to give a PCR product with a primer pair.
  • most of the publicly available information is confined to genes that are differentially expressed. The level of expression of genes that are not differentially expressed has not been made publicly available.
  • information from projects such as the Cancer Genome Anatomy Project and the Human Genome Initiatives EST project can be used to determine expression of genes in an undifferentiated state. Such information can be used to develop a list of the 100 most consistently highly expressed genes, which can then be excluded from a list of primers.
  • primers that do not perfectly match with a common gene but that could still be used to amplify the common gene can be excluded from the list of primers.
  • primers mismatched with the 5'-most specified base can be excluded. If such a primer pair samples more than a threshold number of these common genes, it can be excluded.
  • the use of primers selected by the method of the invention increases the probability that the related nucleotide sequences of interest are given an opportunity to be sampled but will not exclude other nucleotide sequences from being sampled in the same mixture of PCR products.
  • the invention is not intended to select primers that have perfect matches exclusively with a number of the sequences of interest while having no matches in other sequences. Rather, the methods of the invention are directed to ensure that the related nucleotide sequences of interest are among the best matches so as to maximize the chance that these sequences will occur in a mixture of PCR products.
  • PCR reaction mixtures can be effective probes in differential hybridization experiments against clones for the expected mRNAs because the complexity of the probe is much lower than total cDNA.
  • Such PCR reaction mixtures are of particular interest in strategies that array clones or oligonucleotides on chips where the complexity of the probe can be a limiting factor in detecting rare transcripts.
  • a window of G+C content of the primers can be imposed, such as requiring a G+C content of about 20 to about 100%, in particular a G+C content of about 50% to about 90%.
  • the term "window of G+C content" refers to a range of percent composition for the nucleotides in the primer. Using a window of G+C content provides the advantage that the melting temperature of a pair of primers can be more closely matched to allow both primers to participate in the PCR reaction with similar efficiency.
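A window of G+C content amounts to a simple range check on each candidate primer. The Python sketch below assumes the 50% to 90% window mentioned above as its default; the function names and the choice to express the window as fractions are illustrative assumptions.

```python
def gc_fraction(primer):
    """Fraction of bases in the primer that are G or C."""
    return sum(base in "GC" for base in primer.upper()) / len(primer)

def within_gc_window(primer, low=0.5, high=0.9):
    """True if the primer's G+C fraction lies inside [low, high]."""
    return low <= gc_fraction(primer) <= high

candidates = ["GGCCAATT", "AATTAATT", "GGGGCCCC", "GCGCGCAT"]
print([p for p in candidates if within_gc_window(p)])
# ['GGCCAATT', 'GCGCGCAT']
```

Primers that pass the same window have more similar melting temperatures, which is the stated reason for applying it.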
  • a computer process can rank the primers in order of the number of related nucleotide sequences to which the primer matches. For example, the computer process generates a set of primers that amplifies all or most of a group of related nucleotide sequences by selecting a specified number of primers that matches the largest number of nucleotide sequences in the group of related nucleotide sequences and creating a set of primers composed of the selected number of primers and the complement of those primers.
  • selecting the top-ranked primers, for example, the 30 primers that match the largest number of nucleotide sequences, thus yields a list containing a set of 60 primers composed of the 30 top-ranked primers and their complements.
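The ranking step can be sketched in Python as below. This is a hedged illustration rather than the patent's implementation: the names are hypothetical, each sequence is counted once per primer regardless of how many times the primer occurs in it, only the given strand is scanned, and "complement" is interpreted as the reverse complement.

```python
from collections import Counter

_COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]

def rank_primers(sequences, k):
    """Rank k-mers by the number of sequences in which they occur.

    sequences maps a name to a DNA string. Ties are broken alphabetically
    so the ranking is deterministic.
    """
    counts = Counter()
    for seq in sequences.values():
        kmers = {seq[i:i + k] for i in range(len(seq) - k + 1)}
        counts.update(kmers)  # each sequence contributes at most 1 per k-mer
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))

def top_primers_with_complements(sequences, k, n):
    """Take the n top-ranked primers and append their reverse complements."""
    top = [primer for primer, _ in rank_primers(sequences, k)[:n]]
    result = list(top)
    for primer in top:
        rc = reverse_complement(primer)
        if rc not in result:  # avoid duplicates, e.g. palindromic primers
            result.append(rc)
    return result

toy = {"s1": "ACGTAC", "s2": "TTACGT", "s3": "GGACGG"}
print(rank_primers(toy, 3)[0])               # ('ACG', 3)
print(top_primers_with_complements(toy, 3, 1))  # ['ACG', 'CGT']
```

Because duplicates are dropped, the resulting set can hold fewer than 2n primers when a top-ranked primer is its own reverse complement or the reverse complement of another top-ranked primer.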
  • Each primer in the set of primers is paired with each other primer in the set of primers, and all of the theoretical PCR products are determined for each pair of primers and each nucleotide sequence.
  • the shortest PCR product is used, although the computer process records all PCR products generated.
  • a matrix is generated that ranks 1) the related nucleotide sequences by the number of primer pairs that generate a simulated PCR product, and 2) the primer pairs by the number of related nucleotide sequences that are successfully sampled. The matrix is used to generate a subset of primers that can amplify all, or nearly all, of the group of related nucleotide sequences.
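The simulated PCR step and the resulting matrix can be sketched as follows. This Python example is an assumption-laden illustration: names are hypothetical, primers must match exactly, a product requires the forward primer site to lie upstream of a non-overlapping reverse primer site on a linear template, and only the shortest product per pair and sequence is kept, as described above.

```python
from itertools import combinations_with_replacement

_COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]

def find_all(seq, pattern):
    """Start indices of every exact (possibly overlapping) match."""
    starts, i = [], seq.find(pattern)
    while i != -1:
        starts.append(i)
        i = seq.find(pattern, i + 1)
    return starts

def product_length(seq, fwd, rev):
    """Shortest product with fwd on the given strand and rev on the other."""
    rev_site = reverse_complement(rev)  # how rev's site reads on this strand
    lengths = [j + len(rev_site) - i
               for i in find_all(seq, fwd)
               for j in find_all(seq, rev_site)
               if j >= i + len(fwd)]  # sites must not overlap
    return min(lengths) if lengths else None

def shortest_product(seq, p, q):
    """Shortest simulated product for a pair, trying both orientations."""
    lengths = [n for n in (product_length(seq, p, q),
                           product_length(seq, q, p)) if n is not None]
    return min(lengths) if lengths else None

def build_matrix(sequences, primers):
    """Map each primer pair (self-pairs included) to the sequences it samples."""
    matrix = {}
    for pair in combinations_with_replacement(primers, 2):
        hits = {name for name, seq in sequences.items()
                if shortest_product(seq, *pair) is not None}
        if hits:
            matrix[pair] = hits
    return matrix

toy = {"s1": "AACCGATTTTTTACGTCA", "s2": "CCCCCCCC"}
print(shortest_product(toy["s1"], "AACC", "TGAC"))  # 18
print(build_matrix(toy, ["AACC", "TGAC"]))  # {('AACC', 'TGAC'): {'s1'}}
```

From such a mapping, sequences can be ranked by the number of pairs that sample them and pairs by the number of sequences they sample.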
  • the invention entails determining for each systematic pairing of each primer which of the related nucleotide sequences is amplified.
  • a nucleotide sequence in the matrix is selected by predetermined criteria as the first nucleotide sequence submitted to determine a primer pair that samples the nucleotide sequence.
  • the term "submitted" refers to analyzing a particular sequence using, for example, a computer process.
  • the primer pair that generates a simulated PCR product with the first nucleotide sequence is compared to all other related nucleotide sequences in the matrix. If more than one primer pair can generate PCR products from the first selected nucleotide sequence, the primer pair that samples the largest number of related nucleotide sequences is selected.
  • the primer pair that samples the first selected nucleotide sequence and all of the related nucleotide sequences sampled by this primer pair are removed from the matrix.
  • the next nucleotide sequence is selected, and the primer pair that samples the second selected nucleotide sequence and all other related nucleotide sequences sampled by this primer pair are removed from the matrix. This process is repeated until a subset of primers is selected that would theoretically PCR amplify all of the related nucleotide sequences.
  • a set of primer pairs that amplifies the human nuclear receptor family is shown in Table III.
  • the computer process can require that the PCR products be limited to a predetermined size, for example greater than 100 base pairs and less than 1000 base pairs. Furthermore, the computer process can require that primer pairs generate PCR products that differ by a predetermined size range, such as ≥3 base pairs. Such primer pairs can be advantageous, for example, if the PCR products are to be resolved by gel electrophoresis. Thus, PCR products of sizes that are impractical to separate using predetermined analytical techniques will not be generated. Primer pairs also can be limited to those primer pairs that generate a minimum number of different sized PCR products such as three or more different sized PCR products.
  • PCR products can be limited to PCR products derived from different nucleotide sequences and can be required to amplify a minimum number of nucleotide sequences, such as at least three nucleotide sequences.
  • PCR products can be limited to those with different primers at each end.
  • the analysis of PCR products on a gel is limited by the available resolution of the gel, by the production of artifactual products from internal priming of the products of interest, and by spurious priming from other genes that are not of interest.
  • primers to be used to probe arrays need not be limited to a given size differential of PCR products.
  • the methods of the invention are still advantageous for identifying primers for analyzing PCR products on arrays because the number of primers can be significantly reduced as described below.
  • the systematic pairing of primers maximizes the number of related nucleotide sequences that can be sampled with a set of primers.
  • the nucleotide sequence recognized by the largest number of primer pairs can be selected as the first nucleotide sequence submitted to determine a primer pair that samples the nucleotide sequence.
  • the primer pair that generates simulated PCR products with the first nucleotide sequence is compared to all other related nucleotide sequences in the matrix.
  • the primer pair that samples the first selected nucleotide sequence and all of the related nucleotide sequences in the matrix sampled by this primer pair are removed from the matrix.
  • the next nucleotide sequence is selected; in this case, the second selected nucleotide sequence is the nucleotide sequence remaining in the matrix that is sampled by the largest number of primer pairs.
  • the primer pair that samples the second nucleotide sequence and all other related nucleotide sequences sampled by this primer pair are removed from the matrix. This process is repeated until a subset of primer pairs is compiled that theoretically would result in PCR amplification of all, or nearly all, of the related nucleotide sequences.
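The iterative selection described above can be sketched as a greedy loop over the matrix. This is a minimal sketch under stated assumptions: the matrix is represented as a hypothetical dictionary mapping each primer pair to the set of sequences it samples, ties are broken alphabetically, and, because the text does not say which of several primer pairs sampling the selected sequence is taken, the sketch assumes the one covering the most remaining sequences.

```python
def select_primer_pairs(matrix):
    """matrix maps each primer pair to the set of sequences it samples (assumed layout)."""
    remaining = set().union(*matrix.values())
    chosen = []
    while remaining:
        # For each remaining sequence, count how many primer pairs sample it.
        counts = {seq: sum(seq in hits for hits in matrix.values()) for seq in remaining}
        # Select the sequence sampled by the largest number of primer pairs.
        target = max(sorted(remaining), key=counts.get)
        # Assumption: among pairs sampling the target, take the one that
        # covers the most sequences still in the matrix.
        candidates = sorted(p for p, hits in matrix.items() if target in hits)
        best = max(candidates, key=lambda p: len(matrix[p] & remaining))
        chosen.append(best)
        # Remove every sequence sampled by the chosen pair.
        remaining -= matrix[best]
    return chosen


# Hypothetical matrix: primer pairs (as tuples of primer names) -> sampled sequences.
example = {
    ("p1", "p2"): {"A", "B", "C"},
    ("p1", "p3"): {"C", "D"},
    ("p2", "p4"): {"D"},
    ("p3", "p4"): {"C", "E"},
}
print(select_primer_pairs(example))
```

The loop always terminates because each chosen pair removes at least the selected sequence from the remaining set.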
  • An advantage of the invention is that a subset of primers can be identified that provides amplification of a maximal number of related nucleotide sequences, while minimizing the number of primer pairs required for the amplification.
  • An additional advantage is that primer pairs in the subset can be limited by specific criteria. For example, a set of primer pairs can be identified that generates more than one PCR product for each nucleotide sequence, thus assuring that at least two PCR products will be amplified for each nucleotide sequence and providing maximal opportunities to identify PCR products specific for a nucleotide sequence.
  • a set of primer pairs that amplifies the nucleotide sequences for at least that minimum number of times is generated. For example, any nucleotide sequences that are not amplified, or are amplified only once, by the set of primer pairs generated as outlined above can be identified and compiled into a new list of nucleotide sequences for analysis, which can be by a computer process. A new set of primers is generated that will match this new list of related nucleotide sequences.
  • This new set of primers can be added to the first set of primers and submitted to determine a primer pair that samples a nucleotide sequence in the original list of related nucleotide sequences to generate a new matrix that amplifies all, or nearly all, of the group of related nucleotide sequences.
  • This process of identifying nucleotide sequences that are not amplified or are amplified only once by the set of primer pairs can be repeated until all of the related nucleotide sequences have been amplified. Any nucleotide sequence for which primer pairs have not been identified by some number of repeats of this process can be removed from the list or a specific primer or primer pair can be designed, for example, by visual inspection.
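The step of compiling a new list of sequences that are not amplified, or are amplified only once, can be sketched as a simple count. The function name `under_sampled` and the data layout are hypothetical, and the sketch assumes that each primer pair contributes one PCR product per sequence it amplifies.

```python
from collections import Counter


def under_sampled(all_sequences, chosen_pairs, matrix, min_products=2):
    """List sequences amplified fewer than min_products times by the chosen pairs."""
    hits = Counter()
    for pair in chosen_pairs:
        # Assumption: one product per (pair, sequence) combination.
        hits.update(matrix[pair])
    return sorted(s for s in all_sequences if hits[s] < min_products)


# Hypothetical data: two chosen primer pairs and the sequences each amplifies.
matrix = {"pair1": {"A", "B", "C"}, "pair2": {"C", "D"}}
print(under_sampled(["A", "B", "C", "D", "E"], ["pair1", "pair2"], matrix))
```

The returned list would then be the input for generating the additional set of primers described above.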
  • Another approach to maximizing the likelihood that all of the related nucleotide sequences will be amplified involves selecting the nucleotide sequence recognized by the fewest number of primer pairs as the first nucleotide sequence submitted to determine a primer pair that amplifies the nucleotide sequence.
  • the method of identifying and removing sampled nucleotide sequences from the matrix is used to identify primer pairs that theoretically amplify all or most of the group of related nucleotide sequences.
  • the identified primer pairs define a subset of primers, that can be a minimal or near minimal set, sufficient to amplify a group of related nucleotide sequences.
  • the process of identifying any nucleotide sequences that are not amplified, or are amplified only once, by the set of primer pairs and of generating a new matrix that allows amplification of these related nucleotide sequences can be repeated until all of the related nucleotide sequences have been amplified, or the process can be terminated at an appropriate point.
  • Using such an approach generally has led to slightly less redundancy in amplifying the same nucleotide sequence but also sampled slightly fewer nucleotide sequences in the group.
  • the invention provides a method of selecting a subset of primers sufficient to amplify all of a group of related nucleotide sequences.
  • the subset of primers need not amplify all members of the group of related nucleotide sequences.
  • the number of repetitions of the process of identifying any nucleotide sequences not amplified, or amplified only once by the set of primer pairs can be limited.
  • the identified subset of primers can amplify a desired percentage of the group of related nucleotide sequences, which generally will be at least 80% of the group of nucleotide sequences, but can be 90%, 95% or 98% of the group, particularly 99% of the group.
  • the number of related nucleotide sequences in a group can vary.
  • a group can contain about 10 or more, about 20 or more, about 30 or more, about 40 or more, and generally about 50 or more, particularly about 75 or more or about 100 or more related nucleotide sequences.
  • a large group of related nucleotide sequences can contain, for example, greater than about 100 sequences, generally greater than about 200 sequences, and particularly greater than about 400 sequences.
  • the run time required for the computer process to identify a set of primers that amplify all of the nucleotide sequences can exceed a desirable length of time. Therefore, if desired, the computer process can be limited to a specified number of repeats of the process of identifying any nucleotide sequences not amplified, or amplified only once by the set of primer pairs. However, even when sampling only a portion of the nucleotide sequences, the method of the invention provides an advantage over the previously used arbitrary primers, since a subset of such primers is obtained.
  • the subset of primers that amplify a group of related nucleotide sequences can be a minimal or near minimal set. Because the computer process has the ability to systematically identify all possible primers of a given size or range of sizes and match those primers with all of the group of related nucleotide sequences, a minimal set of primers containing the fewest number of primers sufficient to amplify all of the group of related nucleotide sequences can be selected as the subset of primers. However, if desired, the subset of primers need not be the minimal set.
  • the subset of primers can be a near minimal set. For example, the identified subset of primers can be a near minimal set that is 20% more than the minimal number, generally 10% more than the minimal number, particularly 5% more than the minimal number.
  • a desirable subset of primers is one in which a reduction in the maximal number of primers sufficient to amplify a group of nucleotide sequences is achieved using methods of the invention.
  • Two primers, which can be different primers, are required to generate a PCR product. Therefore, the maximum number of primers sufficient to amplify a group of related nucleotide sequences is twice the number of related nucleotide sequences in the group.
  • a desirable subset of primers is one that is sufficient to amplify a group of related nucleotide sequences and contains about 50% of the maximum number of primers sufficient to amplify a group. In the case of a subset of primers containing 50% of the maximum number of primers sufficient to amplify a group, the number of primers in the subset is less than or equal to the number of related nucleotide sequences in the group. In addition, a desirable subset of primers can be one that is sufficient to amplify a group of related nucleotide sequences and that contains about 25% of the maximum number of primers, and particularly about 10% of the maximum number of primers.
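The arithmetic behind these percentages can be illustrated as follows. The function name `primer_budget` is hypothetical; the group size of 44 is the number of human nuclear receptor sequences given elsewhere in this document, and rounding down to a whole primer is an assumption.

```python
def primer_budget(n_sequences, percent):
    """Return (maximum primers, primers allowed at the given percentage)."""
    # Two primers per sequence gives the maximum number of primers.
    max_primers = 2 * n_sequences
    # Integer arithmetic; rounding down is an illustrative choice.
    return max_primers, max_primers * percent // 100


for pct in (50, 25, 10):
    print(pct, primer_budget(44, pct))
```

For a group of 44 sequences the maximum is 88 primers, so a 50% subset contains 44 primers, which equals the number of sequences in the group.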
  • the computer process selects a predetermined number of top ranked primers that match related nucleotide sequences in a group.
  • the set of top ranked primers may not include one or more primers that, although not occurring in the largest number of nucleotide sequences, is required to select the minimal set of primers to amplify all of a group of related nucleotide sequences. Consideration of all primers that match a group of related nucleotide sequences rather than the top ranked primers is required to assure that the subset of primers selected is the minimal set.
  • the run time required for the computer process to select the minimal set of primers sufficient to amplify all of the group of related nucleotide sequences can exceed a desirable length of time. Therefore, if desired, the computer process can be limited to a specified number of repeats of the process of identifying any nucleotide sequences not amplified, or amplified only once, by the set of primer pairs and generating a new set of top ranked primers.
  • the method provides an advantage.
  • the number of primers required to amplify a group of related nucleotide sequences is reduced significantly over the use of two primers for each nucleotide sequence, since the method identifies primer pairs that amplify multiple nucleotide sequences.
  • the invention also provides a method of using a set of primer pairs, which amplify an originally identified group of related nucleotide sequences, to amplify a population of nucleotide sequences related to the original group of nucleotide sequences.
  • the primers in the selected primer subset, which are identified using a method of the invention, are synthesized by standard oligonucleotide synthesis techniques, which can be automated synthesis on a DNA synthesizer, such as an Applied Biosystems DNA synthesizer, or can be manual synthesis.
  • the synthesized primers are useful reagents for amplifying a population of nucleotide sequences related to the original group of related nucleotide sequences.
  • the primers listed in Table III can be synthesized and used to amplify a population of nucleotide sequences related to the human nuclear receptor family.
  • the primers listed in Table III can be synthesized with additional nucleotide bases added to one or both ends.
  • the amplified nucleotide sequences are used to identify nucleotide sequences related to the original group of human nuclear receptors shown in Table II. Identification of the amplified nucleotide sequences is useful for characterizing a specimen with respect to a given set of conditions.
  • the term "specimen" refers to any biological material of interest, such as cultured cells, a cell lysate, or a whole organism, which can be prokaryotic or eukaryotic, for example, cultured insect cells or mammalian cells or whole organisms such as mice or humans.
  • the expression of members of the human nuclear receptor family can be assessed using the nucleotide sequences amplified with the primers of Table III.
  • the expression of human nuclear receptors upon differentiation of P19 cells treated with retinoic acid can be examined.
  • the expression of human nuclear receptors also can be examined in tissue specimens taken at different stages of development, such as during development of a neonatal human to an adult.
  • a subset of primers that amplify a group of related nucleotide sequences, as identified by a method of the invention, is useful as a reagent for examining expression of members of a group of related nucleotide sequences.
  • the disclosed method generates a subset of primer pairs that amplifies a group of related nucleotide sequences, for example, a set of primers that amplifies the human nuclear receptor gene family. Furthermore, the disclosed method provides a subset of primers that amplifies a population of nucleotide sequences and identifies amplified nucleotide sequences related to the original group of related nucleotide sequences.
  • the amplified nucleotide sequence is a member of the group of related nucleotide sequences or is not a member of the original group of related nucleotide sequences.
  • the invention provides a method to determine expression of a group of related nucleotide sequences, for example the expression of the related nucleotide sequences after exposure of cells to a drug or chemical agent.
  • the invention also provides a method to systematically select a subset of primers to isolate and identify new nucleotide sequences, such as identifying nucleotide sequences related to the human nuclear receptor gene family or the human G-protein coupled receptor gene family.
  • DNA sequence databases were used to generate a list of genes or partial gene sequences classified as members of the human nuclear receptor gene family (see Table II), the human G-protein coupled receptor gene family (see Table V), human apoptosis-associated genes (see Table VII) and human DNA repair and replication genes (see Table IX).
  • Duplications in the list of genes or partial gene sequences can be removed.
  • the term "duplication" refers to a region of nucleotide sequence common to more than one nucleotide sequence with a predetermined level of identity. For example, the shorter of any two sequences that were at least 95% identical was removed from the list of genes or partial gene sequences of the human nuclear receptor family (see Table II), the human G-protein coupled receptor gene family (see Table V), human apoptosis-associated genes (see Table VII) and human DNA repair and replication genes (see Table IX). The shorter of any two sequences that overlap, where the overlap meets or exceeds a predetermined identity, was also removed.
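The removal of duplications can be sketched as below. This is a simplified illustration, not the procedure actually used: the identity measure here is an ungapped, sliding comparison (a hypothetical stand-in for a real sequence alignment), the function names are invented for the example, and sequences are assumed to be non-empty strings.

```python
def best_ungapped_identity(short, long_):
    """Highest fraction of matching positions when sliding short along long_ (no gaps)."""
    if not short:
        return 0.0
    best = 0.0
    for offset in range(len(long_) - len(short) + 1):
        window = long_[offset:offset + len(short)]
        matches = sum(a == b for a, b in zip(short, window))
        best = max(best, matches / len(short))
    return best


def remove_duplicates(sequences, threshold=0.95):
    """sequences maps name -> sequence; drop the shorter member of any duplicate pair."""
    kept = {}
    # Visit longest first, so the longer member of a duplicate pair is the one kept.
    for name, seq in sorted(sequences.items(), key=lambda kv: -len(kv[1])):
        if all(best_ungapped_identity(seq, k) < threshold for k in kept.values()):
            kept[name] = seq
    return kept


# Hypothetical toy sequences: "sub" is contained in "long" and is removed.
toy = {
    "long": "ACGTACGTACGTACGTACGTAAAA",
    "sub": "ACGTACGTACGTACGTACGT",
    "other": "TTTTTGGGGGCCCCCAAAAA",
}
print(sorted(remove_duplicates(toy)))
```

In practice a gapped alignment tool would replace `best_ungapped_identity`; only the keep-the-longer rule and the 95% threshold come from the text.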
  • a list of 44 human nuclear receptor genes or partial gene sequences was used as a group of related nucleotide sequences for identification of a set of primers that can amplify all members of the identified group of human nuclear receptor genes (see Example II).
  • a list of 113 human G-protein coupled receptor genes or partial gene sequences was used as a group of related nucleotide sequences for identification of a set of primers that can amplify all members of the identified group of human G-protein coupled receptor genes (see Example III).
  • degenerate primers derived from back translation of conserved amino acid motifs have been used to find new members of gene families (Carlberg et al.,
  • the invention provides a method to identify related but unidentified members of a group of related nucleotide sequences, which also contain the conserved structural regions that define the family. Using sets of primer pairs that include these conserved regions allows the amplification of all, or nearly all, related nucleotide sequences. From this pool of amplified sequences, the sequences of unknown family members containing the conserved region can be identified. For example, RNA samples are isolated from different tissues of an organism, in this case a human, and used as specimens to amplify a population of nucleotide sequences related to the human nuclear receptor family or the human G-protein coupled receptor family.
  • the amplified population of nucleotide sequences is cloned into a plasmid capable of replicating in bacteria, such as pBluescript (Stratagene; San Diego CA), and cloned nucleic acid molecules are sequenced using standard techniques to identify nucleotide sequences in the population that were not members of the original group of related nucleotide sequences.
  • the method can also be used to amplify sequences that are related by a common feature, such as genes induced in response to a drug or chemical agent, genes involved in apoptosis, or genes involved in DNA repair and replication (see Examples IV and V).
  • a group of related nucleotide sequences can be identified that are genes induced upon stimulation of cells with retinoic acid.
  • the group of nucleotide sequences related by a common feature can contain members that are not structurally related.
  • treating P19 cells with retinoic acid causes the cells to differentiate into neurons and glial cells and induces genes such as Mash1, a basic helix-loop-helix protein, EGF receptor, a cell surface hormone receptor tyrosine kinase, and choline acetyl transferase. While these genes share the common feature of being induced by retinoic acid, the proteins encoded by these genes carry out completely different functions in different compartments of a cell.
  • Functional protein domains are encoded by regions of nucleotide sequence that are conserved between structurally related proteins.
  • the term "functional domain of a protein" refers to a region or a portion of a protein that allows the protein to perform its biological purpose, such as an enzymatic domain, a structural domain, or any domain used by the protein to carry out its biological function.
  • Primers that match with regions of a nucleotide sequence encoding such conserved domains also can amplify nucleotide sequences encoding structurally related family members. For example, primers that match with regions of the conserved kinase domain of the EGF receptor can amplify other tyrosine kinases.
  • regions of nucleotide sequence encoding conserved functional protein domains can be excluded from the list of all primers. These regions of nucleotide sequence encoding conserved functional domains can be added, for example, to a list of abundant nucleotide sequences to be excluded from the list of primers.
  • a method of the invention can provide a subset of primers that amplifies a group of related nucleotide sequences in which all members are not structurally related.
  • an amplified population of related nucleotide sequences that share a common feature but are not all structurally related is a source from which structurally related sequences can be obtained.
  • Proteins generally contain multiple functional domains, some of which are more highly conserved than others. For example, while the EGF receptor shares significant identity with the insulin receptor in its tyrosine kinase domain, which functions in intracellular signaling, the ligand binding domains do not share significant identity.
  • Excluding primers that match with highly conserved functional domains, such as the kinase domain of the EGF receptor or the basic helix-loop-helix domain of Mash1, allows selection of a subset of primers that amplifies nucleotide sequences containing conserved structural domains. These nucleotide sequences containing conserved structural domains are related to an original group of related nucleotide sequences that are not all structurally related. For example, a subset of primers is selected that excludes the basic helix-loop-helix domain of Mash1 from the subset of primers that amplifies Mash1, EGF receptor and choline acetyl transferase.
  • Such a subset of primers is used to amplify populations of nucleotide sequences in undifferentiated and differentiated P19 cells that contain members of the Mash1 family but not other nucleotide sequences that share the basic helix-loop-helix domain.
  • a subset of primers is selected that allows identification of nucleotide sequences that are structurally related to, but not members of, an original group of nucleotide sequences that are not all structurally related.
  • a group of related nucleotide sequences can be related by a common biological function such as being induced, for example, by apoptosis.
  • members of the group of related nucleotide sequences will include nucleotide sequences that are structurally related as well as nucleotide sequences that are not structurally related.
  • a model system such as HaCaT cells can be used to determine if a given set of genes of interest are expressed.
  • apoptosis can be induced in HaCaT cells by the addition of sulindac (Hanif et al., Biochem. Pharmacol. 52:237-245 (1996); Lu et al., Proc. Natl.
  • Apoptotic cells such as sulindac treated HaCaT cells can be used to identify genes induced by apoptosis or as a model system to confirm that a given subset of primers can amplify genes of interest induced by apoptosis.
  • primer pairs can be selected to perform PCR fingerprinting of a specimen of interest.
  • the computer process can require that primer pairs generate PCR products that differ by at least a predetermined size range, such as 3 base pairs.
  • PCR products of sizes that are impractical to separate using predetermined analytical techniques are not generated.
  • Primers from these conserved regions, therefore, are predicted to generate similar sized PCR products. Eliminating these primers from the matrix maximizes the generation of dissimilar sized PCR products, which can be used to fingerprint a sample with respect to a given set of genes.
  • the invention provides a means to perform PCR fingerprinting.
  • PCR fingerprinting refers to using PCR to generate a set of nucleotide products of differing sizes that can be used to discriminate between a predetermined set of specimens.
  • PCR fingerprinting has been used to detect polymorphisms in related genomes (Welsh et al., Nucleic Acids Res. 19:303 (1991); Welsh and McClelland, Nucleic Acids Res. 19:861 (1991); Woods et al., J. Clin. Microbiol. 31:1927 (1993)).
  • PCR fingerprinting also has been used to study differential expression of arbitrarily sampled RNAs (Welsh et al., Nucleic Acids Res.
  • the products of a PCR fingerprinting reaction also can be used as probes for differential hybridization.
  • the products of a PCR fingerprinting reaction are enriched for the related nucleotide sequences of interest. These PCR fingerprinting products will be effective as probes in differential hybridization experiments examining the related nucleotide sequences of interest because the complexity of the probe is much lower than for total cDNA.
  • primers can be designed to generate PCR fingerprints that can be used to quantitate the level of a particular nucleotide sequence relative to other nucleotide sequences within the group of related nucleotide sequences.
  • a method of the invention is used to isolate 3' ends of mRNA sequences, if the 3' ends are known.
  • Anchored oligo-(dT) primers have been used with arbitrarily selected primers or with a primer selected to match perfectly in one known gene to amplify the 3' end of that gene (Liang and Pardee, supra, 1992).
  • anchor primer means a primer that can amplify the 3' end of an mRNA, generally by hybridizing to the poly(A) tail of an mRNA.
  • an anchor primer will generally be an oligo(dT) primer comprising a nucleotide sequence of the formula Tₙ, where n designates the number of Ts in the primer.
  • a method of the invention which can be a computer process, is used to identify a smaller number of primers than the number of related nucleotide sequences in a group or even a single primer that hybridizes to the opposite strand relative to the anchored oligo-(dT) primer.
  • This identified primer with the anchored oligo-(dT) primer can generate PCR products from a known set of mRNA sequences, allowing one primer pair to amplify the 3' ends of multiple mRNA sequences.
  • using anchored primers to sample the 3' end of mRNAs has several advantages. For example, where EST libraries are being probed, many EST libraries are naturally 3' biased, so a probe of a given complexity that is also 3' biased should hybridize to more clones than a probe derived from internal sampling of mRNAs. In contrast, when using two arbitrary primers, many of the resulting PCR products will be derived from the middle or 5' end of mRNAs. These internal and 5' products are less likely to occur in a sample that is 3' biased, for example, a 3' biased EST library, and such internal and 5' products might contribute to the background. Thus, with a 3' biased probe, the overall probe complexity can be lower for any given throughput of positive signals. However, when screening randomly primed EST libraries, internal priming is more desirable.
  • another potential advantage of 3' sampling is that the 3' ends are generally the most divergent part of an mRNA. Thus, if there are closely related family members in a group of related nucleotide sequences that need to be distinguished, this distinction of closely related members is most easily achieved in the 3' noncoding region of the mRNA.
  • because an oligo(dT) primer can prime in multiple registers on the poly(A) tail, the PCR products can vary in size and not generate a discrete fingerprint.
  • an oligo(dT) primer combined with one or more specific primers, which can amplify more than one member of a group of related nucleotide sequences, can be used to generate a probe.
  • anchor primers that amplify the 3' end of a nucleotide sequence can still be used.
  • Such a gel analysis, which is convenient, fast, and economical, can be used to determine the quality and reproducibility of the probe at multiple RNA concentrations before using it as a probe.
  • oligo(dT) probes such as TₙV, where V is G, C or A, have been used successfully to generate stable fingerprints.
  • Additional oligo(dT) probes that have been used to successfully generate fingerprints include TₙG, TₙA, and TₙAC.
  • the invention also provides a subset of primers sufficient to amplify a group of related nucleotide sequences, wherein the subset comprises at least one anchor primer of the formula TₙXₘ, wherein X is selected from the group consisting of G, A, C and T, n is a number between 10 and 20 and m is a number between 0 and 3; and wherein the subset comprises one or more second primers, wherein the second primer combined with the anchor primer amplifies two or more related nucleotide sequences in the group. In such a case, the second primer is not an anchor primer.
  • the anchor primer can also have the formula TₙXₘ, where n is a number between 5 and 50 and m is a number between 0 and 10.
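The anchor primer formula, n thymines followed by m anchoring bases X, can be enumerated with a short sketch. The function name `anchor_primers` and its arguments are hypothetical; restricting the bases to G, C and A reproduces the TV-type anchors mentioned above.

```python
from itertools import product


def anchor_primers(n, m, bases="GACT"):
    """Enumerate anchor primers made of n Ts followed by m bases drawn from bases."""
    # product(..., repeat=0) yields one empty tuple, so m = 0 gives plain oligo(dT).
    return ["T" * n + "".join(tail) for tail in product(bases, repeat=m)]


print(anchor_primers(12, 1, bases="GCA"))   # one anchoring base chosen from G, C, A
print(len(anchor_primers(10, 2)))           # all two-base anchors over G, A, C, T
```

The number of distinct anchors grows as the number of allowed bases raised to the power m, which is one reason short anchors are convenient.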
  • a method of the invention is used to identify degenerate primers.
  • the term "degenerate primer" refers to a primer sequence that contains at least one position, X, that has more than one nucleotide, where X is A, G, T, C or a modified base such as inosine (I).
  • a set of primers contains GATACCGT and GATTCCGT.
  • the set of primers contains CATCGAGG, CATCTAGG, CATCCAGG, CATCIAGG.
  • the statistical frequency of matching a given nucleotide sequence using degenerate primers increases with the number of primers in the set representing that degenerate primer, in these examples 2 and 4 primers, respectively, representing the degenerate primer.
  • the method provides an advantage in that longer primers, which would normally occur at low frequency in a group of related nucleotide sequences, can be identified.
  • the use of degenerate primers allows, for example, some primers 10 or more bases long to occur many times in small groups of related nucleotide sequences.
  • An example would be primers of the form XXZXXZXXZXX, where each X is a different specified base and Z is either R (purine) or Y (pyrimidine).
  • An example of such a primer is GAYTCRTCYCC.
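The set of primers represented by a degenerate primer can be listed by expanding each ambiguous position. The sketch below uses the standard IUPAC codes R (A or G), Y (C or T), W (A or T) and N (any base); the function name `expand` is hypothetical, and modified bases such as inosine are not handled.

```python
from itertools import product

# Subset of the standard IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "W": "AT", "N": "ACGT",
}


def expand(degenerate):
    """Return every specific primer represented by a degenerate primer."""
    return ["".join(p) for p in product(*(IUPAC[base] for base in degenerate))]


print(expand("GATWCCGT"))          # two primers, differing at the fourth position
print(len(expand("GAYTCRTCYCC")))  # three two-fold positions give eight primers
```

Each two-fold degenerate position doubles the number of specific primers, so the primer GAYTCRTCYCC stands for eight sequences.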
  • the potential advantage of using such a degenerate primer is that the primer can hybridize to nucleotides encoding clusters of amino acids that are common among members of a group of related nucleotide sequences, even if the group is not from phylogenetically related genes.
  • a method of the invention is used to sample a large number of nucleotide sequences.
  • Public databases, such as the THC database of The Institute for Genomic Research, contain in excess of 50,000 nucleotide sequences from the 3' ends of human mRNA sequences.
  • Practical limitations to a predetermined analytical technique can require that limits be placed on the number of PCR products generated when large numbers of nucleotide sequences are to be analyzed. Therefore, minimizing the number of PCR products generated from an individual nucleotide sequence can be desirable.
  • a subset of primer pairs, where each primer pair generates 40 to 60 PCR products from a large group of related nucleotide sequences such as human mRNA sequences, is identified using the computer process in order to minimize the number of PCR products generated from an individual nucleotide sequence.
  • a matrix is generated that ranks the nucleotide sequences with respect to the number of primer pairs that can amplify a nucleotide sequence.
  • the nucleotide sequence amplified by the fewest number of primer pairs is selected.
  • the primer pair that amplifies the selected nucleotide sequence and the most other nucleotide sequences, for example, up to 60 nucleotide sequences, is selected.
  • nucleotide sequences amplified by this selected primer pair are removed from the matrix, for example, up to 60 nucleotide sequences are removed.
  • a new matrix is generated that ranks the remaining nucleotide sequences with respect to the number of primer pairs that can amplify a nucleotide sequence.
  • the new matrix will contain, for example, up to 60 fewer nucleotide sequences than in the first matrix.
  • the nucleotide sequence remaining in the matrix that is amplified by the fewest number of primer pairs is selected.
  • a primer pair that amplifies the second selected nucleotide sequence and that amplifies the most other nucleotide sequences is selected.
  • nucleotide sequences amplified by this selected primer pair are removed from the matrix. This process is repeated until a subset of primer pairs is identified that amplifies all of the related nucleotide sequences. This subset of primer pairs minimizes redundant sampling of the nucleotide sequences in the original list due to removal of large numbers of nucleotide sequences, for example, up to 60 nucleotide sequences, at each iteration of the process. This subset of primer pairs produces the fewest number of PCR products from any single nucleotide sequence. An identical procedure, selecting a different starting primer pair that amplifies the first selected nucleotide sequence, can be used to generate a different subset of primer pairs that will also minimize redundant sampling of the original list of nucleotide sequences.
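The variant described above, which starts from the sequence amplified by the fewest primer pairs, can be sketched as follows. The function name, the dictionary layout and the tie-breaking rules are assumptions; the 40 to 60 product window comes from the example in the text and is exposed as parameters so that small test data can be used.

```python
def select_pairs_rarest_first(matrix, min_products=40, max_products=60):
    """matrix maps each primer pair to the set of sequences it amplifies (assumed layout)."""
    # Only pairs whose product count lies inside the window are eligible.
    eligible = {p: set(h) for p, h in matrix.items()
                if min_products <= len(h) <= max_products}
    remaining = set().union(*eligible.values())
    chosen = []
    while remaining:
        # Rank remaining sequences by the number of eligible pairs amplifying them.
        counts = {s: sum(s in h for h in eligible.values()) for s in remaining}
        # Select the sequence amplified by the fewest primer pairs.
        target = min(sorted(remaining), key=counts.get)
        # Among pairs amplifying it, take the one amplifying the most other sequences.
        candidates = sorted(p for p, h in eligible.items() if target in h)
        best = max(candidates, key=lambda p: len(eligible[p] & remaining))
        chosen.append(best)
        remaining -= eligible[best]
    return chosen


# Hypothetical toy matrix with a 2-to-3 product window instead of 40 to 60.
toy = {
    "pA": {"s1", "s2", "s3"},
    "pB": {"s3", "s4"},
    "pC": {"s4", "s5"},
    "pD": {"s1", "s2", "s3", "s4", "s5", "s6", "s7"},
}
print(select_pairs_rarest_first(toy, min_products=2, max_products=3))
```

Sequences amplified only by pairs outside the window are never covered in this sketch; they would have to be handled separately.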
  • the 3' ends of nucleotide sequences derived from mRNA sequences can be isolated using anchored primers such as oligo(dT)C, oligo(dT)G, or oligo(dT)A (Liang and Pardee, supra, 1992).
  • the computer process is used to identify primers that would amplify, with one of the anchor primers, a predetermined range of PCR products, for example, 40 to 60 PCR products, from a database, such as one containing 3' expressed sequence tags (ESTs).
  • Identifying primer pairs capable of amplifying all nucleotide sequences from a large group of nucleotide sequences requires generation of a very large number of PCR primers.
  • the invention provides a method to identify a subset of primer pairs that amplifies all of the sequences in a large group of related nucleotide sequences, while minimizing the number of primer pairs sufficient to amplify all of the nucleotide sequences. For example, using a database containing a large group of nucleotide sequences, such as 50,000 mRNA sequences, primers are identified that can generate a predetermined number of PCR products, such as 40 to 60 PCR products, when paired with one of the anchor primers.
  • a matrix is generated that ranks the large group of nucleotide sequences, 50,000 in this example, with respect to the number of primer pairs that can amplify a nucleotide sequence, where the primer pairs are derived from the identified primers and the anchor primers that generate 40 to 60 PCR products.
  • the nucleotide sequence amplified by the fewest number of primer pairs is selected.
  • the primer pair that amplifies the selected nucleotide sequence and the most other nucleotide sequences, up to 60 nucleotide sequences, is selected.
  • All nucleotide sequences amplified by this selected primer pair are removed from the matrix, in this example up to 60 nucleotide sequences are removed from the matrix.
  • a new matrix is generated that ranks the remaining nucleotide sequences with respect to the number of primer pairs that can amplify a nucleotide sequence.
  • the process of identifying nucleotide sequences amplified by a primer pair and removing those identified nucleotide sequences from the list is repeated until all nucleotide sequences have been removed from the list and a set of primer pairs has been generated that can amplify all nucleotide sequences in the list. Because this set of primer pairs minimizes redundant sampling of the nucleotide sequences in the original list, the number of primers sufficient to amplify all of the nucleotide sequences has been minimized. In this example using 50,000 nucleotide sequences, a set of about 1500 primers can be identified that amplifies the entire list of 50,000 nucleotide sequences.
  • ESTs: expressed sequence tags
  • GenBank: http://www.ncbi.nlm.nih.gov/dbEST/index.html
  • a list of ESTs compiled from available databases can be used to exclude primers that match those ESTs that are highly expressed in many cell types.
  • such a list of ESTs can be used to develop primer sets that will allow the sampling of virtually all known ESTs or a user specified subset of ESTs.
  • primer sets that selectively amplified large sets of genes were determined by randomly pairing a set of primers using a Monte Carlo method (Lopez-Nieto and Nigam, Nature BioTechnology 14:857 (1996)). The primer sets were selective for protein-coding regions of structurally related genes.
  • a method of the invention provides a systematic approach to identifying primer pairs that can sample a group of related nucleotide sequences, using a variety of predetermined criteria.
  • the group of related nucleotide sequences need not be structurally related.
  • the method disclosed herein systematically maximizes the number of nucleotide sequences that can be amplified with a set of primers by systematically pairing primers using a matrix.
  • the invention disclosed herein provides a method to impose specified criteria on primers and on the PCR products generated by those primers.
  • the method allows imposing exclusion of primers that match undesirable nucleotide sequences, requiring a minimum number of PCR products be produced by a primer pair, exclusion of primers that match conserved structural regions that encode functional protein domains, and requiring that primer pairs generate different sized PCR products.
  • a method of the invention disclosed herein allows generating primers that match to any region of an mRNA sequence, not just the coding region.
  • Primer sets that sample open reading frames (ORFs) in mammalian mRNA sequences have been described (Lopez-Nieto and Nigam, supra, 1996).
  • the ORF-specific 8-mer primers sampled 14% of the human nuclear receptor genes, 28% of the human G-protein coupled receptor genes, 40% of the human apoptosis-associated genes and 31% of the human DNA repair and replication genes identified as groups of related nucleotide sequences in the Examples herein.
  • the methods of the invention provided identification of sets of primer pairs that sampled 100% of the human nuclear receptor genes and human G-protein coupled receptor genes as well as 93% of the human apoptosis-associated genes and 98% of the human DNA repair and replication genes.
  • the invention also provides a subset of primers sufficient to amplify a group of related nucleotide sequences comprising nuclear hormone receptor genes.
  • a subset of such primers can be selected from the group consisting of TGCAAGGG; TGCAGGAG; CAGCAGCG; GGCTGCAA; GCCTCCAG; TCCTGGAG; CTGCCTGG; CCTTCCTC; CTCCCTGG; CTGCCCTG; AGGGCTGC; CTGCTGGA; CCGCTGCC; GGAGGCAG; AGCCTGGA; GGGCAGAG; GGCAGCTG; GAGGAAGG; CAGCTGCC; GATTCCAC; GATGAGCT; CTTCTGGA; and CTGGAGCT.
  • the invention also provides the primers shown in Table III
  • the invention additionally provides a subset of primers sufficient to amplify a group of related nucleotide sequences comprising G-protein coupled receptors.
  • a subset of such primers can be selected from the group consisting of GCTGGCCA; TCTGCTGG; CTGTGCTG; CTGGCCAG; CTGGCCAC; CTGCCTCC; TGTGGCCC; GGCTATGT; TCCAGTCC; TGGCCAGC; CAGCACAG; CAGCAGCG; and CAGCCAGC.
  • the invention also provides the primers shown in Table
  • the invention further provides a subset of primers sufficient to amplify a group of related nucleotide sequences comprising apoptosis-associated genes.
  • a subset of such primers can be selected from the group consisting of CTGGAGGA; TCATCCAG; CTGGAGAA; GCTGCAGC; CTGCTGGA; GAACAGGA; GCTCCTGG; GCCCCTGG;
  • the invention also provides the primers shown in Table VIII.
  • the invention additionally provides a subset of primers sufficient to amplify a group of related nucleotide sequences comprising DNA repair and replication genes.
  • a subset of such primers can be selected from the group consisting of GGAAGGAG; TGCAGGAG; CTGGCTGA; CTTCCTCA; TCATCCAG; AGCAGCAA; AGGCTGGG; CTTCCTGA; CCTCCTGG; TGCTCTGG; CTGCTGAA; GCTGCTGA; TGGAGAGA; CTGATGAC; GAGATGGA; AGATGCTG; GCTGGAAG;
  • the invention also provides the primers shown in Table X.
  • the invention also provides a computer apparatus comprising a processor, main memory in communication with the processor, and a primer pair selector in communication with the main memory for carrying out the computer-executed steps of identifying a group of related nucleotide sequences; generating a set of primers that matches each of the related nucleotide sequences; determining for each systematic pairing of each primer which of the related nucleotide sequences are amplified; and selecting from the systematic pairings a subset of primers sufficient to amplify the group of related nucleotide sequences.
  • the invention also provides a computer program product for determining a set of primer pairs sufficient to amplify a group of related nucleotide sequences comprising means for identifying a group of related nucleotide sequences; means for generating a set of primers that match each of the related nucleotide sequences; means for determining for each systematic pairing of each primer which of the related nucleotide sequences are amplified; means for selecting from the systematic pairings a subset sufficient to amplify the group of the related nucleotide sequences; and signal-bearing media containing the means for the identifying, generating, determining and selecting.
  • Computer system 10 has operating system 11, processor 12, main memory 14, primer pair selector 16, display screen 20, input device 22, media drive 24, and disk storage 26, each of which is connected to system unit 10.
  • Operating system 11 is an operating system such as UNIX, MS-DOS, Windows, or OS/2.
  • the processor 12 is a general purpose programmable processor such as an Intel PENTIUM processor or a Motorola 68000 processor, suitable for a mid-size computer such as DEC or IBM.
  • the main memory 14 can be well known random access memory (RAM) that is sufficiently large to hold the necessary programming and data structures.
  • the primer pair selector 16 in communication with main memory carries out the computer-executable steps of identifying a group of related nucleotide sequences; generating a set of primers that matches each of the related nucleotide sequences; determining for each systematic pairing of each primer, which of the related nucleotide sequences is amplified; and selecting from the systematic pairings a subset sufficient to amplify all of the related nucleotide sequences.
  • the display screen 20 is a screen for visualizing, for example, input data.
  • the input device 22 is a mouse or a keyboard, or a combination thereof, or any other device to input information.
  • the media drive 24 is a drive, such as a tape drive, a disk drive or a CD drive, that provides the computer system 10 access to the primer pair selector 16.
  • the disk storage 26 is a device, such as magnetic tape or Zip disk, that provides storage capacity for data.
  • Step 100 starts the implementation of the present invention.
  • In step 102, a group of related nucleotide sequences is identified.
  • In step 104, a set of primers that matches each of the group of related nucleotide sequences is generated.
  • In step 106, the group of related nucleotide sequences amplified by a primer pair is determined for each systematic pairing of each primer.
  • In step 108, a subset of primers sufficient to amplify all of the related nucleotide sequences is selected from said systematic pairings.
  • the method steps related to selecting a subset of primers sufficient to amplify a group of related nucleotide sequences end in step 110.
  • step 200 starts the implementation of the present invention.
  • In step 202, a group of related nucleotide sequences, referred to as sequences in Figures 3 and 4, is identified. If desired, duplications can be removed from a list containing the group of related nucleotide sequences. For example, the shorter of two sequences that have greater than 95% identity can be removed from the list.
  • In step 204, a list of all possible primers is compiled.
  • the list of primers can be constrained to predetermined criteria, for example, a window of G+C content can be imposed, primers can be limited to a specified length or range of lengths, or any combination of these and other criteria can be imposed.
  • the primers are ranked by the number of nucleotide sequences that the primer is capable of sampling.
  • a set of top ranked primers is generated by selecting the top ranked primers and adding the top ranked primers and their complements to the set.
  • the top ranked primers are those that sample the largest number of nucleotide sequences. For example, the top 30 primers can be selected, generating a list of 60 primers, including the selected primers and their complements.
  • the list of primers can be constrained to predetermined criteria, for example, primers hybridizing to undesirable nucleotide sequences can be excluded.
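  • the primer-list steps above (compile, constrain, rank, add complements, exclude) can be sketched as follows. This is a minimal sketch under stated assumptions: primers match by exact substring, "complement" is taken to mean the reverse complement, and a primer is excluded if it or its reverse complement occurs in an undesirable sequence. The function names, parameter names and toy sequences are hypothetical and are not taken from the source.

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(primer: str) -> str:
    """Reverse complement of a DNA string over the alphabet A, C, G, T."""
    return primer.translate(_COMPLEMENT)[::-1]


def gc_fraction(primer: str) -> float:
    """Fraction of G and C bases in a primer."""
    return (primer.count("G") + primer.count("C")) / len(primer)


def top_ranked_primers(
    sequences: Dict[str, str],
    k: int = 8,
    top_n: int = 30,
    gc_window: Tuple[float, float] = (0.0, 1.0),
    excluded: Sequence[str] = (),
) -> List[str]:
    """Rank k-mers by the number of sequences they occur in, keep the top_n
    that pass the filters, and add their reverse complements."""
    counts: Counter = Counter()
    for seq in sequences.values():
        # Count each k-mer once per sequence, not once per occurrence.
        counts.update({seq[i:i + k] for i in range(len(seq) - k + 1)})

    low, high = gc_window
    candidates = [
        (kmer, n)
        for kmer, n in counts.items()
        if low <= gc_fraction(kmer) <= high
        and not any(
            kmer in bad or reverse_complement(kmer) in bad for bad in excluded
        )
    ]
    # Most widely shared first; alphabetical order breaks ties.
    candidates.sort(key=lambda item: (-item[1], item[0]))

    primers: List[str] = []
    for kmer, _ in candidates[:top_n]:
        for p in (kmer, reverse_complement(kmer)):
            if p not in primers:  # palindromic k-mers are added only once
                primers.append(p)
    return primers


# Toy sequences, invented for the example.
toy = {"a": "ACGTACGT", "b": "TTACGTAA", "c": "GGGACGTC"}
print(top_ranked_primers(toy, k=4, top_n=2))  # ['ACGT', 'CGTA', 'TACG']
```

  • with the defaults of the sketch, selecting the top 30 primers yields at most 60 entries; the list is shorter when a primer equals its own reverse complement or when a complement is already present.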
  • each primer in the top ranked primer set is paired to all primers in the set.
  • the related nucleotide sequences which are sampled by primer pairs are determined for each primer pair. If desired, additional constraints can be imposed on the primer pairs.
  • the computer process can require that the PCR products be limited to a predetermined size, for example greater than 100 base pairs or less than 1000 base pairs. Furthermore, the computer process can require that primer pairs generate PCR products that differ by a predetermined size range, such as ±3 base pairs. Thus, PCR products of sizes that are impractical to separate using predetermined analytical techniques will not be generated. Primer pairs also can be limited to those primer pairs that generate a minimum number of different sized PCR products such as three or more different sized PCR products.
  • the different PCR products can be limited to PCR products derived from different nucleotide sequences.
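  • the size constraints on simulated PCR products can be illustrated with a short Python sketch. It assumes exact matching of the forward primer on the given strand and of the reverse primer on the opposite strand, and it interprets the size-difference criterion as requiring any two product sizes to be at least a minimum number of base pairs apart. The function names, the default limits and the toy template are assumptions for the example; only the two primer sequences are taken from this document.

```python
from typing import Iterable, List

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA string over the alphabet A, C, G, T."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_all(text: str, pattern: str) -> List[int]:
    """Start positions of every (possibly overlapping) exact match."""
    positions, start = [], text.find(pattern)
    while start != -1:
        positions.append(start)
        start = text.find(pattern, start + 1)
    return positions


def simulated_product_sizes(
    template: str,
    forward: str,
    reverse: str,
    min_size: int = 100,
    max_size: int = 1000,
) -> List[int]:
    """Sizes of simulated PCR products for one primer pair on one template."""
    # The reverse primer anneals where its reverse complement appears
    # on the strand that carries the forward primer sequence.
    site = reverse_complement(reverse)
    sizes = set()
    for i in find_all(template, forward):
        for j in find_all(template, site):
            size = j + len(site) - i
            if j >= i and min_size <= size <= max_size:
                sizes.add(size)
    return sorted(sizes)


def resolvable(sizes: Iterable[int], min_gap: int = 3) -> bool:
    """True if all product sizes are at least min_gap base pairs apart."""
    ordered = sorted(sizes)
    return all(b - a >= min_gap for a, b in zip(ordered, ordered[1:]))


# Toy template, invented for the example: forward primer, spacer, reverse site.
template = "GGCTGCAA" + "AT" * 60 + reverse_complement("GCCTCCAG")
print(simulated_product_sizes(template, "GGCTGCAA", "GCCTCCAG"))  # [136]
```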
  • a matrix is generated that ranks the nucleotide sequences by the number of primer pairs that generate simulated PCR products and the primer pairs by the number of sequences sampled.
  • a nucleotide sequence is selected by a predetermined criterion. For example, the nucleotide sequence sampled by the largest number of primers can be selected.
  • a primer pair that amplifies the selected sequence is identified.
  • predetermined criteria can be imposed on the primer pairs.
  • the primer pairs can be required to generate PCR products of a specified range, such as greater than 100 base pairs or less than 1000 base pairs.
  • the primer pairs can be required to sample a minimum number of nucleotide sequences, such as more than two different nucleotide sequences.
  • the primer pairs can be required to generate PCR products that differ by a predetermined value, such as the PCR products generated must differ by at least ±3 base pairs.
  • the primer pair that samples the selected nucleotide sequence and that samples the largest number of other nucleotide sequences in the group of related nucleotide sequences is identified. If desired, any one of the above mentioned criteria, or any other criteria, alone or in any combination, can be imposed on the primer pairs.
  • a subgroup is created by identifying all nucleotide sequences in the group of related nucleotide sequences that are amplified by the identified primer pair.
  • the identified primer pair and those nucleotide sequences in the subgroup that are sampled by the primer pair are removed from the matrix.
  • a new matrix is created that contains the remaining primer pairs and remaining nucleotide sequences.
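  • one possible in-memory form of the matrix, and of the step that derives a new matrix after a primer pair has been chosen, is sketched below. The representation as a dictionary from primer pair label to a set of sequence identifiers, and the names `rank_matrix` and `reduce_matrix`, are assumptions made for illustration; the source does not specify a data structure.

```python
from typing import Dict, List, Set, Tuple

Matrix = Dict[str, Set[str]]  # primer pair label -> IDs of sequences it samples


def rank_matrix(matrix: Matrix) -> Tuple[List[Tuple[str, int]], List[Tuple[str, int]]]:
    """Rank sequences by number of sampling primer pairs, and primer pairs
    by number of sequences sampled (largest first, ties by label)."""
    seq_counts: Dict[str, int] = {}
    for seqs in matrix.values():
        for s in seqs:
            seq_counts[s] = seq_counts.get(s, 0) + 1
    ranked_seqs = sorted(seq_counts.items(), key=lambda kv: (-kv[1], kv[0]))
    ranked_pairs = sorted(
        ((p, len(s)) for p, s in matrix.items()), key=lambda kv: (-kv[1], kv[0])
    )
    return ranked_seqs, ranked_pairs


def reduce_matrix(matrix: Matrix, chosen_pair: str) -> Matrix:
    """New matrix without the chosen pair and without the sequences it samples.
    Primer pairs left with no sequences are dropped."""
    removed = matrix[chosen_pair]
    return {
        p: s - removed
        for p, s in matrix.items()
        if p != chosen_pair and s - removed
    }


# Toy matrix, invented for the example.
m = {"P1": {"s1", "s2", "s3"}, "P2": {"s3", "s4"}, "P3": {"s4"}}
seqs, pairs = rank_matrix(m)
print(seqs)   # [('s3', 2), ('s4', 2), ('s1', 1), ('s2', 1)]
print(pairs)  # [('P1', 3), ('P2', 2), ('P3', 1)]
```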
  • In step 226, an inquiry is performed to determine if there are any nucleotide sequences remaining in the new matrix that are sampled by any primer pairs remaining in the new matrix.
  • In step 228, if the answer is "yes", a nucleotide sequence remaining in the new matrix is selected by the predetermined criterion (go to step 216).
  • In step 230, if the answer is "no", a subset of primers from all primer pairs identified in step 218 that sample all of the group of related nucleotide sequences is selected. If desired, any nucleotide sequences that are not sampled by primer pairs derived from the set of primers selected in step 208 can be identified.
  • nucleotide sequences which are not sampled by the original set of top ranked primers can be used to generate a new set of primers that match the related nucleotide sequences that are not sampled.
  • the new set of top ranked primers is generated by compiling a new list of all primers that match the nucleotide sequences not sampled.
  • the top ranked primers in the new list and their complements are combined with the top ranked primers identified in the original list to generate a new set of top ranked primers as in step 208.
  • any nucleotide sequences that are not sampled by the new set of top ranked primers can be identified and a new list of primers can be compiled.
  • the process can be repeated until all of the nucleotide sequences in the original group of related nucleotide sequences are amplified. If desired, a limit to the number of repeats of identifying nucleotide sequences not sampled by the primers in the top ranked primer list can be imposed, such as limiting the number of repeats to three.
  • the subset of primers selected by the method can amplify all of the group of related nucleotide sequences. However, if desired, the subset of primers need not amplify all members of the group of related nucleotide sequences.
  • the number of repeats of the process of identifying any nucleotide sequences not sampled, or for example, sampled only once, can be limited such that the identified subset of primers amplifies a desired percentage of the group of related nucleotide sequences.
  • the method steps related to selecting a subset of primers sufficient to amplify a group of related nucleotide sequences end in step 232.
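  • the optional repetition over nucleotide sequences that remain unsampled, with a limit on the number of repeats, can be sketched as an outer loop. In this sketch the work of one round (generating primers for the unsampled sequences and selecting primer pairs) is delegated to a caller-supplied function. The names `select_with_repeats` and `select_round`, and the returned coverage fraction, are assumptions introduced for the example.

```python
from typing import Callable, List, Set, Tuple

# One round: given the still-unsampled sequence IDs, return the primer pairs
# selected in that round and the set of sequence IDs those pairs sample.
Round = Callable[[Set[str]], Tuple[List[str], Set[str]]]


def select_with_repeats(
    sequences: Set[str], select_round: Round, max_repeats: int = 3
) -> Tuple[List[str], Set[str], float]:
    """Repeat selection on unsampled sequences, at most max_repeats times."""
    unsampled = set(sequences)
    chosen: List[str] = []
    for _ in range(max_repeats):
        if not unsampled:
            break  # every sequence is already sampled
        pairs, sampled = select_round(unsampled)
        chosen.extend(pairs)
        unsampled -= sampled
    total = len(sequences)
    fraction = (total - len(unsampled)) / total if total else 0.0
    return chosen, unsampled, fraction


# Stub rounds, invented for the example: round 1 samples s1 and s2,
# round 2 samples s3, and s4 is never sampled.
_rounds = iter([(["P1"], {"s1", "s2"}), (["P2"], {"s3"})])
result = select_with_repeats(
    {"s1", "s2", "s3", "s4"}, lambda unsampled: next(_rounds), max_repeats=2
)
print(result)  # (['P1', 'P2'], {'s4'}, 0.75)
```

  • returning the unsampled sequences alongside the coverage fraction makes it straightforward to stop at a desired percentage rather than at full coverage.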
  • the program product 700 includes a computer readable medium 702, such as a floppy disk, readable by media drive 24 containing signals 704 to 710 recorded thereon.
  • Identifier signals 704 are means for identifying a group of related nucleotide sequences.
  • Identifier signals 706 are means for generating a set of primers that matches each of a group of related nucleotide sequences.
  • Identifier signals 708 are means for determining, for each systematic pairing of each primer, which of the related nucleotide sequences are amplified.
  • Identifier signals 710 are means for selecting from the systematic pairings a subset of primers which can amplify all of the related nucleotide sequences.
  • the medium 702 is signal bearing media containing said means for identifying, generating, determining, and selecting as described above.
  • This example demonstrates the selection of a set of PCR primers that amplify a group of related nucleotide sequences.
  • the program WORDUP was adapted to generate a list of the most common 9-mer, 8-mer and 7-mer primers from an identified group of related nucleotide sequences (Pesole et al., supra (1992)).
  • the program was written in C language on a UNIX operating system.
  • the appropriate upper limit of primer length was determined.
  • a primer 10 nucleotides in length is expected to occur about once every 1,000,000 (4^10) base pairs.
  • the most common 10-mers in a group of related nucleotide sequences were identified. If the 10-mers occurred only a few times and were confined to regions encoding conserved protein domains, the primers were limited to 9-mers or shorter. For the groups of related nucleotide sequences used in these Examples, the primers were limited to 9-mers or shorter.
  • a list of accession numbers of abundant nucleotide sequences was compiled. The list contained human ribosomal RNA, human mitochondrial DNA, Alu elements and mRNA sequences carrying fragments of LINE elements (Table I).
  • primers matching the ribosomal RNA, mitochondrial DNA, Alu elements or LINE elements shown in Table I were excluded.
  • primers were limited to those primers greater than seven nucleotides in length.
  • Primers eight or nine nucleotides in length were considered for further analysis.
  • the typical 8-mer has a frequency of about one in 65,000 (4^8) base pairs.
  • the typical 9-mer has a frequency of about one in 262,000 (4^9) base pairs.
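  • the rounded frequencies quoted above follow from the number of distinct sequences of length k over four bases, 4^k, under the simplifying assumption that the four bases are equally likely and independent. The short Python check below uses hypothetical helper names and is only a worked calculation.

```python
def expected_spacing(k: int) -> int:
    """Average number of base pairs per occurrence of one specific k-mer,
    assuming equally likely, independent bases."""
    return 4 ** k


def expected_occurrences(total_bases: int, k: int) -> float:
    """Approximate expected number of occurrences of one specific k-mer
    in total_bases of sequence under the same assumption."""
    return total_bases / 4 ** k


for k in (8, 9, 10):
    print(k, expected_spacing(k))
# 8 65536
# 9 262144
# 10 1048576
```

  • these exact values correspond to the approximate figures of 65,000, 262,000 and 1,000,000 base pairs used in the text.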
  • a group of related nucleotide sequences was identified. Duplicate sequences with greater than 95% identity were identified and the longest of the nucleotide sequences in the identified group was retained in the group using the computer process CLEANUP (Grillo et al., Comput. Applic. Biosci. 12:1 (1996)), which is incorporated herein by reference. The shorter of the duplicate nucleotide sequences was removed from the list. Nucleotide sequences less than 800 bases in length were also removed from the list.
  • the remaining 8-mer and 9-mer primers were ranked in the order of the number of related nucleotide sequences to which the primers matched.
  • the top 50 primers matching the largest number of related nucleotide sequences were selected and these 50 primers and their complements were compiled into a list of primers.
  • Simulated PCR products from 50 to 1000 base pairs were determined for each primer pair combination.
  • a matrix was generated that ranked the related nucleotide sequences by the number of primer pairs that can generate a simulated PCR product and the primer pairs by the number of related nucleotide sequences that can be sampled.
  • the most frequently sampled related nucleotide sequence was selected as the first sequence to consider.
  • the primer pair that sampled the first selected nucleotide sequence was used to identify the remaining related nucleotide sequences which the selected primer pair sampled.
  • the primer pairs were required to sample at least two different nucleotide sequences.
  • the primer pair and the first selected nucleotide sequence sampled by the primer pair were removed from the matrix.
  • the most frequently sampled related nucleotide sequence remaining in the list was selected, and the primer pair that sampled this second selected related nucleotide sequence was selected and used to identify the remaining related sequences which the selected primer pair sampled.
  • the primer pair and the identified nucleotide sequences sampled by the primer pair were removed from the matrix.
  • the process of selecting a related nucleotide sequence and the primer pairs that generated simulated PCR products was repeated until a set of primer pairs was identified that sampled the entire list of related nucleotide sequences. If a first iteration of the above procedure did not yield a set of primer pairs that sampled all of the related nucleotide sequences, those related nucleotide sequences not sampled in the first matrix were compiled into a new list to repeat the procedure. In these examples, the number of iterations was limited to two.
  • additional criteria for the simulated PCR products were applied. If more than one PCR product was found for a given nucleotide sequence, the shorter of the PCR products was chosen. Also, the minimum number of PCR products for each nucleotide sequence was set at three to maximize sampling of each nucleotide sequence. PCR products also were limited to those that have different PCR primers at each end.
  • This example demonstrates that a subset of PCR primers that amplifies the human nuclear receptor gene family can be selected using the disclosed method.
  • a computer process was used to generate a set of primer pairs that amplifies the human nuclear receptor gene family (see Example I and Figure 2).
  • the typical 8-mer has a frequency of about one in 65,000 (4^8) base pairs, or about once in the list of
  • the most prevalent 8-mer primer occurred 34 times in the list of human nuclear receptor sequences shown in Table II.
  • the typical 9-mer has a frequency of about one in 262,000 (4^9) base pairs.
  • the most prevalent 9-mer primer occurred 20 times in the list of human nuclear receptor sequences shown in Table II.
  • PCR primers sampling abundant nucleotide sequences were removed from consideration. After removal of abundant nucleotide sequences, 26 of the 100 top ranked 8-mer primers sampling the human nuclear receptor gene family remained in the list of primers. After removal of abundant nucleotide sequences, 50 of the 100 top ranked 9-mer primers sampling the human nuclear receptor gene family remained in the list of primers.
  • A0 and R0' GGCTGCAA and GCCTCCAG
  • the set of 8-mer primers that sampled all (100%) of the identified human nuclear receptor genes contained 21 primers that formed 13 primer pairs. These primers generated 75 simulated PCR products.
  • a set of 9-mer primers that sampled the human nuclear receptor family was also identified and contained 21 primers that formed 12 primer pairs. These 9-mer primers generated 48 simulated PCR products and sampled 30 of 44 (68%) of the human nuclear receptor family.
  • This example demonstrates that a subset of PCR primers that amplifies the human G-protein coupled receptor gene family can be selected using the disclosed method.
  • a computer process was used to generate a set of primer pairs that amplifies the human G-protein coupled receptor gene family (see Example I and Figure 2).
  • the typical 8-mer has a frequency of about one in 65,000 (4^8) base pairs.
  • the most prevalent 8-mer primer occurred 68 times in the list of human G-protein coupled receptor sequences shown in Table V.
  • the typical 9-mer has a frequency of about one in 262,000 (4^9) base pairs.
  • the most prevalent 9-mer primer occurred 40 times in the list of human G-protein coupled receptor sequences shown in Table V.
  • PCR primers sampling abundant nucleotide sequences were removed from consideration. After removal of abundant nucleotide sequences, 28 of the 100 top ranked 8-mer primers sampling the human G-protein coupled receptor gene family remained in the list of primers. After removal of abundant nucleotide sequences, 45 of the 100 top ranked 9-mer primers sampling the human G-protein coupled receptor gene family remained in the list of primers.
  • Y0 and FI TCCTGGTG and CTGGGCCA O and A0 : TGCTGGGC and GCTGGCCA
  • the set of 8-mer primers that sampled all (100%) of the identified human G-protein coupled receptor genes contained 45 primers that formed 29 primer pairs. These primers generated 240 simulated PCR products.
  • a set of 9-mer primers that sampled the human G-protein coupled receptor gene family was also identified and contained 53 primers that formed 37 primer pairs. These 9-mer primers generated 178 simulated PCR products and sampled 101 of 113 (89%) of the G-protein coupled receptor gene family.
  • This example demonstrates that a subset of PCR primers that amplifies human apoptosis-associated genes can be selected using the disclosed method.
  • a computer process was used to generate a set of primer pairs that amplifies human apoptosis-associated genes (see Example I and Figure 2).
  • the typical 8-mer has a frequency of about one in 65,000 (4^8) base pairs.
  • the most prevalent 8-mer primer occurred 30 times in the list of human apoptosis-associated gene sequences shown in Table VII.
  • the typical 9-mer has a frequency of about one in 262,000 (4^9) base pairs.
  • the most prevalent 9-mer primer occurred 17 times in the list of human apoptosis-associated gene sequences shown in Table VII.
  • PCR primers sampling abundant nucleotide sequences were removed from consideration. After removal of abundant nucleotide sequences, 15 of the 100 top ranked 8-mer primers sampling the human apoptosis-associated genes remained in the list of primers. After removal of abundant nucleotide sequences, 30 of the 100 top ranked 9-mer primers sampling the human apoptosis-associated genes remained in the list of primers.
  • a set of 8-mer primer pairs that amplify 56 of the 60 human apoptosis-associated genes was identified (see Table VIII). Genes 11, 13, 25 and 54 from the list in Table VII were not amplified.
  • the set of 8-mer primers that sampled 56 of 60 (93%) of the identified human apoptosis-associated genes contained 42 primers that formed 24 primer pairs. These primers generated 106 simulated PCR products.
  • DO and EO' TGAAGAGC and CTGCTGGA
  • A0 and AO' CCTGGGAG and CTCCCAGG
  • Tl and H2 CAGCTGGA and CAGCCGCC
  • This example demonstrates that a subset of PCR primers that amplifies human DNA repair and replication genes can be selected using the disclosed method.
  • a computer process was used to generate a set of primer pairs that amplifies human DNA repair and replication genes (see Example I and Figure 2).
  • Human DNA repair and replication genes were identified using the PubMed database (http://www4.ncbi.nlm.nih.gov/PubMed/), which is incorporated herein by reference. A list of 169 mRNA and mRNA fragments of human DNA repair and replication genes was compiled. Following removal of duplicates and nucleotide sequences less than 800 bases, 65 human DNA repair and replication gene mRNA sequences remained in the list (see Table IX). The typical 8-mer has a frequency of about one in 65,000 (4^8) base pairs. The most prevalent 8-mer primer occurred 36 times in the list of human DNA repair and replication gene sequences shown in Table IX. The typical 9-mer has a frequency of about one in 262,000 (4^9) base pairs. The most prevalent 9-mer primer occurred 22 times in the list of human DNA repair and replication gene sequences shown in Table IX.
  • Numbers 1, 2, 3, etc. represent arbitrary labeling of the 65 nucleotide sequences.
  • PCR primers sampling abundant nucleotide sequences were removed from consideration. After removal of abundant nucleotide sequences, 15 of the 100 top ranked 8-mer primers sampling the human DNA repair and replication genes remained in the list of primers. After removal of abundant nucleotide sequences, 38 of the 100 top ranked 9-mer primers sampling the human DNA repair and replication genes remained in the list of primers. A set of 8-mer primer pairs that amplify 64 of the 65 human DNA repair and replication genes was identified (see Table X). Gene 38 from the list in Table IX was not amplified.
  • G2 and 02 TGGAGAGA and CTGATGAC
  • the set of 8-mer primers that sampled 64 of 65 (98%) of the identified human DNA repair and replication genes contained 44 primers that formed 25 primer pairs. These primers generated 120 simulated PCR products.
  • a set of 9-mer primers that sampled the human DNA repair and replication genes was also identified and contained 28 primers that formed 15 primer pairs. These 9-mer primers generated 51 simulated PCR products and sampled 35 of 65 (54%) of the human DNA repair and replication genes.
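  • the coverage percentages quoted in the Examples can be reproduced from the reported counts. The following Python check uses only counts stated in this document; the dictionary layout and the helper name `percent` are assumptions for the example.

```python
# (sequences sampled, sequences in the group) as reported in the Examples.
reported = {
    "nuclear receptor genes, 9-mers": (30, 44),
    "G-protein coupled receptor genes, 9-mers": (101, 113),
    "apoptosis-associated genes, 8-mers": (56, 60),
    "DNA repair and replication genes, 8-mers": (64, 65),
    "DNA repair and replication genes, 9-mers": (35, 65),
}


def percent(sampled: int, total: int) -> int:
    """Coverage as a whole-number percentage."""
    return round(100 * sampled / total)


for label, (sampled, total) in reported.items():
    print(f"{label}: {sampled}/{total} = {percent(sampled, total)}%")
```

  • the computed values (68%, 89%, 93%, 98% and 54%) agree with the percentages given in the text.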

Abstract

The present invention relates to a method for determining a set of primer pairs for amplifying a group of related nucleotide sequences. The method of the invention consists of identifying a group of related nucleotide sequences; generating a set of primers that matches each of the related nucleotide sequences; determining, for each systematic pairing of each primer, which of the related nucleotide sequences are amplified; and selecting from the systematic pairings a subset that amplifies all of the related nucleotide sequences. The invention also relates to a method of using a set of primer pairs, which consists of amplifying a group of related nucleotide sequences in order to identify nucleotide sequences related to the original group of nucleotide sequences. The invention further relates to a computer apparatus for carrying out the computer-executed steps of the invention. Finally, the invention relates to a computer program product comprising signal-bearing media for implementing the above method.
EP98945882A 1997-09-05 1998-09-04 Selection de paires d'amorces pcr destinees a l'amplification d'un groupe de sequences nucleotidiques Withdrawn EP1007739A2 (fr)

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
US92581697A 1997-09-05 1997-09-05
US925816 1997-09-05
PCT/US1998/018392 WO1999011823A2 (fr) 1997-09-05 1998-09-04 Selection de paires d'amorces pcr destinees a l'amplification d'un groupe de sequences nucleotidiques

Publications (1)

Publication Number Publication Date
EP1007739A2 true EP1007739A2 (fr) 2000-06-14

Family

ID=25452285

Family Applications (1)

Application Number Title Priority Date Filing Date
EP98945882A Withdrawn EP1007739A2 (fr) 1997-09-05 1998-09-04 Selection de paires d'amorces pcr destinees a l'amplification d'un groupe de sequences nucleotidiques

Country Status (3)

Country Link
EP (1) EP1007739A2 (fr)
AU (1) AU9302798A (fr)
WO (1) WO1999011823A2 (fr)

Families Citing this family (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
AU785425B2 (en) 2001-03-30 2007-05-17 Genetic Technologies Limited Methods of genomic analysis
US6898531B2 (en) 2001-09-05 2005-05-24 Perlegen Sciences, Inc. Algorithms for selection of primer pairs
US6740510B2 (en) 2001-09-05 2004-05-25 Perlegen Sciences, Inc. Methods for amplification of nucleic acids
WO2003021259A1 (fr) * 2001-09-05 2003-03-13 Perlegen Sciences, Inc. Selection de paires d'amorces
EP1941058A4 (fr) * 2005-10-27 2010-01-20 Rosetta Inpharmatics Llc Amplification d'acides nucléiques au moyen d'amorces non aléatoires
DE602006018352D1 (de) 2005-12-06 2010-12-30 Ambion Inc Rückübertragungs-primer und verfahren zu deren entwurf
CN102124126A (zh) * 2007-10-26 2011-07-13 生命技术公司 使用非随机引物的cdna合成
US20100159533A1 (en) * 2008-11-24 2010-06-24 Helicos Biosciences Corporation Simplified sample preparation for rna analysis
WO2012032510A1 (fr) 2010-09-07 2012-03-15 Yeda Research And Development Co. Ltd. Amorces pour l'amplification d'adn et procédés de sélection de ces dernières

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5437975A (en) * 1991-02-25 1995-08-01 California Institute Of Biological Research Consensus sequence primed polymerase chain reaction method for fingerprinting genomes
EP0592626B1 (fr) * 1992-03-11 2003-01-29 Dana-Farber Cancer Institute, Inc. Procedes de clonage d'arn messager
EP0642590A1 (fr) * 1992-05-27 1995-03-15 AMERSHAM INTERNATIONAL plc Analyse "fingerprint" d'arn

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
See references of WO9911823A3 *

Also Published As

Publication number Publication date
AU9302798A (en) 1999-03-22
WO1999011823A2 (fr) 1999-03-11
WO1999011823A3 (fr) 1999-06-10

Similar Documents

Publication Publication Date Title
To Identification of differential gene expression by high throughput analysis
EP0743989B1 (en) Method for identifying differentially expressed genes
Meyers et al. Methods for transcriptional profiling in plants. Be fruitful and replicate
Alba et al. ESTs, cDNA microarrays, and gene expression profiling: tools for dissecting plant physiology and development
Xiang et al. cDNA microarray technology and its applications
US6309834B1 (en) Method for simultaneous identification of differentially expressed mRNAs and measurement of relative concentrations
US7691614B2 (en) Method of genome-wide nucleic acid fingerprinting of functional regions
Lennon High-throughput gene expression analysis for drug discovery
Martin et al. [14] Principles of differential display
CA2470965C (en) Systems biology approach: high-throughput, multi-dimensional screening platforms
JP2005502365A (en) Genetic analysis of biological samples in arrayed expanded representations of their nucleic acids
US20070020623A1 (en) Method for determining homeostasis of the skin
Stanton Methods to profile gene expression
Byers et al. Subtractive hybridization–genetic takeaways and the search for meaning
Rafalski et al. New experimental and computational approaches to the analysis of gene expression.
US20060024705A1 (en) Molecular analysis of hair follicles for disease
Ibrahim et al. A comparative analysis of transcript abundance using SAGE and Affymetrix arrays
WO1999011823A2 (en) Selection of PCR primer pairs for amplifying a group of nucleotide sequences
JPH10510981A (en) Methods, apparatus and compositions for characterizing nucleotide sequences
Li et al. RNA amplification, fidelity and reproducibility of expression profiling
Spetman et al. Microarray mapping of nucleosome position
US6861219B2 (en) Preferential display
Del Rio et al. Genomics and neurological phenotypes: applications for seizure-induced damage
US5595870A (en) Identifying nucleic acids by restriction digestion and hybridization with random or pseudorandom oligonucleotides
Auesukaree cDNA microarray technology for the analysis of gene expression

Legal Events

Date Code Title Description
PUAI Public reference made under Article 153(3) EPC to a published international application that has entered the European phase
Free format text: ORIGINAL CODE: 0009012
17P Request for examination filed
Effective date: 20000404
AK Designated contracting states
Kind code of ref document: A2
Designated state(s): AT BE CH CY DE DK ES FI FR GB GR IE IT LI LU MC NL PT SE
STAA Information on the status of an EP patent application or granted EP patent
Free format text: STATUS: THE APPLICATION IS DEEMED TO BE WITHDRAWN
18D Application deemed to be withdrawn
Effective date: 20020403