US20070059743A1 - Method for the design of oligonucleotides for molecular biology techniques - Google Patents

Info

Publication number
US20070059743A1
Authority
US
United States
Prior art keywords
oligonucleotides
oligonucleotide
sequences
primers
database
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
US11/506,089
Other versions
US7853408B2 (en
Inventor
Alejandro Maass Sepulveda
Andres Aravena Duarte
Mauricio Gonzalez Canales
Servet Martinez Aguilera
Pilar Parada Valdecantos
Katia Ehrenfeld Stolzenbach
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Biosigma SA
Original Assignee
Biosigma SA
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Biosigma SA
Assigned to BIOSIGMA S.A. Assignment of assignors interest (see document for details). Assignors: ARAVENA DUARTE, ANDRES OCTAVIO; EHRENFELD STOLZENBACH, KATIA NICOLE; GONZALEZ CANALES, MAURICIO ALEJANDRO; MAASS SEPULVEDA, ALEJANDRO EDUARDO; MARTINEZ AGUILERA, SERVET; PARADA VALDECANTOS, PILAR ANGELICA
Publication of US20070059743A1
Application granted
Publication of US7853408B2
Expired - Fee Related
Adjusted expiration

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B25/00 ICT specially adapted for hybridisation; ICT specially adapted for gene or protein expression
    • G16B25/20 Polymerase chain reaction [PCR]; Primer or probe design; Probe optimisation

Definitions

  • The oligonucleotide design method comprises the selection or construction of a database of reference sequences, the selection of a subset of sequences belonging to target organisms, the selection of candidate oligonucleotides from such sequences, the depuration of these candidate oligonucleotides according to hybridization specificity and thermodynamic stability criteria, which yields a list of designed oligonucleotides, and, optionally, the sorting of such oligonucleotides according to their taxonomical specificity.
  • This method variant comprises the construction or selection of a database of reference sequences, the selection of a subset of sequences belonging to target organisms, the selection of two sets of candidate oligonucleotides from such sequences, the depuration of each set of candidate oligonucleotides according to hybridization specificity and thermodynamic stability criteria, the elaboration of a list of oligonucleotide pairs or primers, each formed by one element from each set, that satisfy physical and thermodynamic requirements, and the sorting of such oligonucleotide pairs according to taxonomical specificity.
  • FIG. 1 shows the results of a PCR performed with oligonucleotide pairs or primers designed using the method of the invention to carry out a specific PCR for A. thiooxidans, amplifying 16S rDNA from 5 samples of A. ferrooxidans and 2 samples of A. thiooxidans; the content of each lane is specified in the Examples section, in Table 5.
  • the method for the design of oligonucleotides herein described takes a database of DNA or RNA sequences as an input. Depending on experimental requirements being considered, these sequences may be complete genomes or fragments from each genome. For instance, all known sequences of a given gene or genomic region could be considered. In a preferred embodiment of the present invention, a database designed by us is considered, which contains all known sequences of gene 16S. A requirement to be met by the database under use is that every sequence must have been taxonomically classified.
  • An example of database that can be used as input with the method of this invention is GenBank, from NCBI (Benson, D. A., Boguski, M. S., Lipman, D. J., Ostell, J. (1997). GenBank. Nucleic Acids Res. Jan 1; 25(1):1-6). This selected or constructed database is called “evaluation database”. From this database, the sequence subset corresponding to the organism(s) to be identified is extracted. This subset is called “design database”.
  • each sequence in the design database is optimally aligned to a reference sequence, which may be a gene that is homologous to the analyzed one, using the Needleman-Wunsch algorithm (Needleman, S. B., Wunsch, C. D. (1970). A general method applicable to the search for similarities in the amino acid sequence of two proteins. J Mol Biol. Mar; 48(3):443-53). In said case, these aligned sequences form the design database.
  • the oligonucleotide set therein contained is established.
  • This oligonucleotide set is built by considering every subsequence of a defined size (typically between 18 and 50 letters), hereinafter called a ‘word’, that is contained in each sequence or in the sequence that is reverse-complementary to it.
  • the words that are present more than once in some sequence are discarded, considering also a number of substitutions within the word, which typically could be up to 15% of the letters contained in the word. For instance, in a 20-letter word, 15% corresponds to 3 substitutions, so if a word having length 20 is coincident in 17 or more letters with another word of the same sequence, both words are discarded.
  • This procedure is efficiently performed if the hereinbelow described algorithm is followed, taking as inputs the design database, the size of the oligonucleotides being designed (n) and the maximum number of allowed letter substitutions (u).
  • the selection of candidate oligonucleotides is performed by taking into account all subsequences of defined length that are present in the target sequences and their corresponding reverse-complementary sequences.
  • Each word or subsequence generated by the described algorithm is called “candidate oligonucleotide”.
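  • As an illustration only, the word-enumeration step described above can be sketched in Python. This is a brute-force sketch under stated assumptions, not the algorithm referred to in the text (which is not reproduced here). The function names `reverse_complement`, `hamming`, `candidate_words` and `candidates` are hypothetical; `n` and `u` follow the inputs named above (word size and maximum number of allowed letter substitutions).

```python
from itertools import combinations

# Complement table for the alphabet {A, C, T, G}.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse-complementary sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def hamming(a, b):
    """Number of positions at which two equal-length words differ."""
    return sum(x != y for x, y in zip(a, b))


def candidate_words(seq, n, u):
    """Words of length n in seq, dropping every word that matches another
    word of the same sequence with at most u substitutions."""
    words = [seq[i:i + n] for i in range(len(seq) - n + 1)]
    repeated = set()
    # Quadratic pairwise comparison: simple, but slow for long sequences.
    for i, j in combinations(range(len(words)), 2):
        if hamming(words[i], words[j]) <= u:
            repeated.update((i, j))
    return [w for k, w in enumerate(words) if k not in repeated]


def candidates(seq, n, u):
    """Candidate words from a sequence and from its reverse complement."""
    return candidate_words(seq, n, u), candidate_words(reverse_complement(seq), n, u)


forward, reverse = candidates("ACGTACGTTT", 4, 0)
print(forward)  # "ACGT" occurs twice in the sequence, so it is absent here
```

  • With n = 20 and u = 3 this matches the 20-letter, 3-substitution case given in the text. As a consistency check on `reverse_complement`, applying it to Oligonucleotide 1 of Example 1 (TACAGACTCTTTACGCCCAG) gives the sequence listed as Oligonucleotide 7.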
  • a large quantity of candidate oligonucleotides is obtained, which are submitted to the selection criteria of the method.
  • the first evaluation is the determination of the Gibbs free energy for the smallest energy secondary structure. This means that Gibbs free energy difference is calculated for all the spatial conformations where the oligonucleotide hybridizes with itself, until the structure with the smallest energy difference, i.e. the most stable structure, is found. If this value, which is called ΔGhmin as it defines the Gibbs free energy difference for hairpin formation of the oligonucleotide, is smaller than a threshold value, defined in a first attempt as the best quartile, the candidate oligonucleotide is discarded.
  • ΔGhmin: a larger ΔGhmin could be selected, which may be, e.g., −7 kcal/mole.
  • the ΔGhmin threshold value preferably used according to the invention is −1.5 kcal/mole.
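  • A minimal sketch of the threshold test, assuming the free energies have already been computed by an external folding tool (that computation is not shown here). The names `passes_stability` and `filter_candidates` and their parameters are hypothetical. The default thresholds are the −1.5 kcal/mole hairpin value stated above and the −7 kcal/mole value applied to ΔGd in Example 1; the record layout and the energies in the usage lines are invented for illustration.

```python
def passes_stability(dg_h, dg_d, min_dg_h=-1.5, min_dg_d=-7.0):
    """Return True when neither free energy (kcal/mole) is below its threshold.

    dg_h: most stable hairpin free energy of the oligonucleotide.
    dg_d: second free energy (written ΔGd in Example 1), checked the same way.
    """
    return dg_h >= min_dg_h and dg_d >= min_dg_d


def filter_candidates(records, **thresholds):
    """Keep (name, dg_h, dg_d) records that pass the stability test."""
    return [r for r in records if passes_stability(r[1], r[2], **thresholds)]


# Hypothetical records: names and energies are placeholders, not measured data.
records = [
    ("oligo_a", 0.4, -3.1),
    ("oligo_b", -4.2, -9.0),
]
kept = filter_candidates(records)
print([name for name, _, _ in kept])
```

  • Passing the thresholds as parameters keeps the sketch usable with a data-driven cut-off, such as the quartile-based first attempt mentioned in the text.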
  • oligonucleotides that have not been discarded in the former stage are oligonucleotides designed by the method and are useful in molecular biology procedures.
  • the evaluation database is analyzed looking for each oligonucleotide, registering the taxonomical group of the sequence in which the oligonucleotide appears.
  • This operation generates, for each oligonucleotide, a table indicating the number of sequences belonging to each taxonomical group to which said oligonucleotide hybridizes.
  • This table allows the calculation of two taxonomical specificity indexes given the target taxonomical group for each oligonucleotide, said indexes being described as follows:
  • Sensitivity = T/N
  • the target taxonomical group is Escherichia coli
  • the oligonucleotide allows the identification of 75% of the sequences belonging to E. coli. Of all the recognized sequences, 50% belong to E. coli.
  • oligonucleotides for identification procedures should simultaneously maximize both indexes.
  • the following step in the method of this invention is the selection of the oligonucleotide with the largest Sensitivity and Selectivity, simultaneously. This can be achieved by, for example, obtaining the product of both indexes for each oligonucleotide and choosing the largest value thus obtained; this product is called “Rate”.
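  • The two indexes and their product can be sketched as follows. This is an illustrative reading inferred from the E. coli example above, not a definition taken from the text: Sensitivity is assumed to be the fraction of target-group sequences that the oligonucleotide recognizes, and Selectivity the fraction of all recognized sequences that belong to the target group. The function names and the dictionary layout are hypothetical.

```python
def specificity_indexes(hits_by_group, group_sizes, target):
    """Return (sensitivity, selectivity, rate) for one oligonucleotide.

    hits_by_group: number of recognized sequences per taxonomical group.
    group_sizes: total number of sequences per group in the evaluation database.
    """
    t = hits_by_group.get(target, 0)   # recognized sequences in the target group
    n = group_sizes[target]            # all sequences in the target group
    m = sum(hits_by_group.values())    # all recognized sequences
    sensitivity = t / n if n else 0.0
    selectivity = t / m if m else 0.0
    return sensitivity, selectivity, sensitivity * selectivity


def best_by_rate(tables, group_sizes, target):
    """Name of the oligonucleotide whose Rate (product of indexes) is largest."""
    return max(tables, key=lambda name: specificity_indexes(tables[name], group_sizes, target)[2])


# Hypothetical counts reproducing the 75% / 50% example from the text.
sizes = {"Escherichia coli": 4, "other": 10}
hits = {"Escherichia coli": 3, "other": 3}
print(specificity_indexes(hits, sizes, "Escherichia coli"))  # (0.75, 0.5, 0.375)
```

  • Using the product as the ranking key means that an oligonucleotide scoring poorly on either index cannot rank highly, which is one way of maximizing both simultaneously.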
  • oligonucleotides designed and selected according to this method are useful in molecular biology procedures intended to determine the presence of a target taxonomical group in a complex sample. Generally, they are produced by chemical synthesis and could be labeled by any known labeling technique, e.g. radioactive, fluorescent or chemiluminescent labeling.
  • this complementary stage requires the definition of a maximum and minimum size for the desired PCR product and a limit for the melting temperature (Tm) difference between both oligonucleotides.
  • oligonucleotides designed according to Algorithm 1 are considered.
  • oligonucleotides pairs or primers formed by oligonucleotides that hybridize to the sequence are considered, in such a way that the first oligonucleotide hybridizes to the forward strand and the second oligonucleotide hybridizes to the reverse strand.
  • the “amplification size” is calculated as the difference between the hybridization positions of the second oligonucleotide minus the first one. Pairs of oligonucleotides having amplification sizes outside the pre-established range are discarded.
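  • A short sketch of the pair-filtering step, assuming each oligonucleotide already carries its hybridization position, strand (1 or −1) and melting temperature. The `Oligo` record, the function name `valid_pairs` and all example values are hypothetical; the amplification size is taken, as in the text, as the position of the second oligonucleotide minus that of the first.

```python
from typing import NamedTuple


class Oligo(NamedTuple):
    name: str
    position: int   # hybridization position on the target sequence
    strand: int     # 1 for the forward strand, -1 for the reverse strand
    tm: float       # melting temperature, computed elsewhere


def valid_pairs(oligos, min_size, max_size, max_tm_diff):
    """Return (forward, reverse, size) for pairs within the size and Tm limits."""
    forward = [o for o in oligos if o.strand == 1]
    reverse = [o for o in oligos if o.strand == -1]
    pairs = []
    for f in forward:
        for r in reverse:
            size = r.position - f.position  # amplification size
            if min_size <= size <= max_size and abs(f.tm - r.tm) <= max_tm_diff:
                pairs.append((f.name, r.name, size))
    return pairs


# Hypothetical oligonucleotides; positions and temperatures are placeholders.
oligos = [
    Oligo("F1", 100, 1, 62.0),
    Oligo("R1", 544, -1, 60.0),
    Oligo("R2", 900, -1, 61.0),
]
print(valid_pairs(oligos, 200, 500, 3.0))  # [('F1', 'R1', 444)]
```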
  • Tm is calculated using the method described by Le Novère N. (2001). MELTING, computing the melting temperature of nucleic acid duplex. Bioinformatics. 2001 Dec; 17(12):1226-7.
  • thermodynamic stability of the oligonucleotide pair is evaluated by determining the minimal energy structure formed by both oligonucleotides when hybridizing to each other. If this energy, which is called ΔGxmin as it defines the Gibbs free energy difference for cross-hybridization between both oligonucleotides, is smaller than a threshold value, defined in a first attempt as the best quartile, the oligonucleotide pair is discarded. In all cases, such threshold should not be lower than −12 kcal/mole.
  • Oligonucleotides pairs or primers that fulfill size restrictions for the amplification product, melting temperature difference restrictions and thermodynamic stability restrictions should be evaluated according to their taxonomical specificity.
  • the abovementioned Selectivity and Sensitivity parameters are evaluated but for each oligonucleotide pair member.
  • An oligonucleotide pair is considered to hybridize to a target sequence if both oligonucleotides hybridize to said sequence. That is, the set of sequences to which the oligonucleotide pair hybridizes is the intersection of the sets of sequences to which each oligonucleotide hybridizes.
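  • The intersection rule above can be sketched as follows. The function name `pair_specificity`, the `taxonomy` mapping (sequence identifier to taxonomical group) and the example identifiers are hypothetical, and the index definitions are the same assumed reading as for single oligonucleotides (fraction of target sequences recognized, and fraction of recognized sequences that are targets).

```python
def pair_specificity(hits_first, hits_second, taxonomy, target):
    """Return (recognized, sensitivity, selectivity, rate) for a pair.

    A sequence counts as recognized by the pair only if both
    oligonucleotides hybridize to it (set intersection).
    """
    recognized = set(hits_first) & set(hits_second)
    t = sum(1 for seq_id in recognized if taxonomy[seq_id] == target)
    n = sum(1 for group in taxonomy.values() if group == target)
    sensitivity = t / n if n else 0.0
    selectivity = t / len(recognized) if recognized else 0.0
    return recognized, sensitivity, selectivity, sensitivity * selectivity


# Hypothetical evaluation database with two groups, A (target) and B.
taxonomy = {"s1": "A", "s2": "A", "s3": "B", "s4": "B"}
first = {"s1", "s2", "s3"}
second = {"s1", "s3", "s4"}
recognized, sens, sel, rate = pair_specificity(first, second, taxonomy, "A")
print(sorted(recognized), sens, sel, rate)  # ['s1', 's3'] 0.5 0.5 0.25
```

  • Because the pair recognizes at most what each member recognizes, its Sensitivity can never exceed that of either oligonucleotide alone.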
  • the algorithm described below allows the identification of oligonucleotide pairs or primers that satisfy the requirements described above. It should be taken into account that for each of them the strand to which it hybridizes (1 or −1) and the melting temperature, called Tm, have already been determined in the oligonucleotide design stage.
  • the oligonucleotide pair that maximizes the “Rate” parameter is selected.
  • the oligonucleotides pairs or primers designed and selected according to this method are useful in molecular biology procedures, such as PCR, intended to determine the presence of a target taxonomical group in a complex sample.
  • Example 1 Design of a specific oligonucleotide for bacteria belonging to Leptospirillum genus.
  • a new database was obtained with data comprising only 16S sequences selected from the public NCBI GenBank database. This new database is the “evaluation database”.
  • the set of 20-letter oligonucleotides that are present in each of the sequences was determined, discarding those words appearing more than once within each sequence, considering up to 3 substitutions, using Algorithm 1.
  • These oligonucleotides are the “candidate oligonucleotides”, which were evaluated according to their thermodynamic stability using the algorithm described in M. Zuker (2003). Mfold web server for nucleic acid folding and hybridization prediction. Nucleic Acids Res. 31(13), 3406-15. All candidate oligonucleotides with ΔGh values lower than −1.5 kcal/mole or with ΔGd values lower than −7 kcal/mole were discarded.
  • Oligonucleotide 1 TACAGACTCTTTACGCCCAG
  • Oligonucleotide 2 CTACAGACTCTTTACGCCCA
  • Oligonucleotide 3 CCTACAGACTCTTTACGCCC
  • Oligonucleotide 4 ACCTACAGACTCTTTACGCC
  • Oligonucleotide 5 CACCTACAGACTCTTTACGC
  • Oligonucleotide 6 CCACCTACAGACTCTTTACG
  • Oligonucleotide 7 CTGGGCGTAAAGAGTCTGTA
  • Oligonucleotide 8 TGGGCGTAAAGAGTCTGTAG
  • Oligonucleotide 9 GGGCGTAAAGAGTCTGTAGG
  • Oligonucleotide 10 GGCGTAAAGAGTCTGTAGGT
  • Oligonucleotide 11 GCGTAAAGAGTCTGTAGGTG Oligonucleotide
  • oligonucleotides 1 and 7 were selected as best candidates. Both were synthesized, fluorescently labeled with Cy5 and used as probes to identify bacteria belonging to Leptospirillum genus in a metagenomic sample by using the FISH technique. To assess that what was detected corresponds only to Leptospirillum, controls were carried out with pure Leptospirillum ferrooxidans cultures, and a specific detection was found.
  • Example 3 Design of a specific oligonucleotide pair for bacteria belonging to Sulfobacillus thermosulfidooxidans species.
  • Two sets of oligonucleotides that have 19 to 21 letters present in each of the sequences were determined, discarding those words appearing more than once within each sequence, considering up to 3 substitutions, using Algorithm 1. All oligonucleotides with a ΔGh value lower than −1.5 kcal/mole and with a ΔGd value lower than −7 kcal/mole were discarded. The first set considers all sub-sequences with 19 to 21 nucleotides in the target sequences and the second set considers the corresponding reverse-complementary sequences. Then, oligonucleotide pairs or primers that have hybridization sites with 200-500 nucleotides between them were determined.
  • the described method has been used for the design of many primer pairs specific for different taxons, like Acidithiobacillus thiooxidans, Acidithiobacillus ferrooxidans, Leptospirillum sp. and Acidiphilium sp.
  • Table 3 shows the thermodynamic and specificity requirements corresponding to 4 oligonucleotide pairs or primers that were designed using the method of the invention to perform a specific PCR for each microorganism indicated in said Table, namely A. ferrooxidans, A. thiooxidans, Leptospirillum sp. and Acidiphilium sp.
  • These oligonucleotides are useful as PCR primers for said taxons.
  • Table 4 shows sequences of selected specific primers.
  • TABLE 3
    Organism | Sense primer | Sense Tm | Sense ΔGd | Sense ΔGh | Antisense primer | Antisense Tm | Antisense ΔGd | Antisense ΔGh | Size Min | Size Max | Dimer ΔGx | Sensit | Select | Rate | Org | Sp
    A. ferrooxidans | TH674.5F | 62 | −3.44 | 1.53 | TH1116.11R | 60 | −3.23 | 0.57 | 444 | 444 | −9.98 | 4.00 | 100.00 | 4.00 | 1 | 1
    A. thiooxidans | TH1143.9F | 68 | −2.84 | −0.79 | TH1393.1R | 62 | −2.18 | 0.36 | 250 | 250 | −7.53 | 20.00 | 100.00 | 20.00 | 1 | 1
    Leptospirillum sp.
  • the method of the present invention was used to design specific primers for Acidithiobacillus thiooxidans to be used in a metagenomic sample.
  • Primer pairs were designed using the described method and the 4 primer pairs having the best “Rate” indexes were selected.
  • PCR tests were carried out using 16S rDNA from 5 Acidithiobacillus ferrooxidans samples and 2 Acidithiobacillus thiooxidans samples, which were amplified using each designed primer pair.
  • FIG. 1 shows PCR results, the lanes in FIG. 1 having the following load:
    TABLE 5
    Lane | Sample
    1 | A. ferrooxidans DSM 16786
    2 | A. ferrooxidans ATCC 23270
    3 | A. ferrooxidans DSM 14882
    4 | A. ferrooxidans ATCC 19859
    5 | A. ferrooxidans ATCC 33020
    6 | A. thiooxidans sp.
    7 | A. thiooxidans DSM 504
    (-) | Sterile water
  • Primer Pair C specifically amplifies A. thiooxidans, with no amplification of A. ferrooxidans. This means that Pair C could allow the specific determination of the presence of A. thiooxidans in a metagenomic sample, even when A. ferrooxidans is present in said sample, as is usually the case.

Abstract

The present invention discloses a method that can be used to identify one DNA sequence or one specific group of DNA sequences from a complex biological sample. Diverse molecular biology methods require the use of short DNA sequences, called oligonucleotides, that are artificially synthesized from a description of their composing bases. The disclosed method allows the design of oligonucleotides useful for said molecular biology procedures, like probe design procedures, and is characterized by the construction of a database of reference sequences, the selection of a subset of sequences belonging to target organisms, the selection of candidate oligonucleotides from such sequences, the depuration of these candidate oligonucleotides according to hybridization specificity and thermodynamic stability criteria, and the sorting of such oligonucleotides according to their taxonomic specificity. In a second aspect, a method is disclosed to design oligonucleotide pairs or primers, which are required in certain molecular biology techniques, like polymerase chain reaction (PCR) techniques. This method is similar to the first aspect of the invention, but evaluates thermodynamically compatible oligonucleotide pairs or primers that hybridize to the same sequence at a distance within a given range.

Description

    FIELD OF THE INVENTION
  • The present invention discloses a method for the design of oligonucleotides useful in molecular biology techniques as PCR primers or in other techniques as identification and/or quantification probes. Specifically, a method is disclosed to design specific oligonucleotides for the identification of a determined sequence in a metagenomic sample.
    BACKGROUND OF THE INVENTION
  • Many methods in molecular biology require the use of short DNA sequences (oligonucleotides) satisfying given physicochemical and biological requirements to assess the presence of a certain organism or group of organisms. Among these methods, fluorescent in situ hybridization (FISH), denaturing gradient gel electrophoresis (DGGE), conjugation with specific markers, like detection or quantification probes for certain microorganisms, genes or sequences, and polymerase chain reaction (PCR), where two oligonucleotides are used as primers for the reaction, could be mentioned. This invention could be applied in said cases or in other cases wherein specific oligonucleotides are required.
  • Usually, oligonucleotides are artificially synthesized according to the description of their composing bases. The determination of the specific sequences that are suitable for each particular procedure is called “oligonucleotide design”. According to the involved procedure, certain thermodynamic restrictions could limit the set of valid oligonucleotides. Oligonucleotides resulting from this design procedure will be completely determined by the nucleotide sequences used in their synthesis, which could be characterized as words having finite length in the alphabet {A, C, T, G}.
  • Traditional oligonucleotide design methods, among which Primer3 (Rozen S., Skaletsky, H. (2000). Primer3 on the WWW for general users and for biologist programmers. In: Krawetz S, Misener S (eds) Bioinformatics Methods and Protocols: Methods in Molecular Biology. Humana Press, Totowa, N.J., pp 365-386) can be mentioned, allow the design of oligonucleotide pairs or primers for PCR amplification, validating a series of thermodynamic requirements. However, these methods only allow the design of oligonucleotides for a particular sequence, not considering the case where many sequences from different organisms are to be recognized. The traditionally used method in this case requires performing a multiple alignment of all the sequences that are to be recognized, by means of a computer program such as CLUSTALW (Higgins D., Thompson J., Gibson T. (1994). CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res. 22:4673-4680). This alignment allows the determination of conserved regions among all the sequences to be recognized and therefore the design of oligonucleotides within these regions. However, performing these alignments is expensive and could be prohibitive when the number of sequences is large. Moreover, multiple alignments require the determination of penalty parameters derived from some evolutionary model of the sequences. The result depends on the values chosen for these penalties and may not be robust when confronted with small changes in these values.
  • Among other methods for oligonucleotide design that have been developed in recent years, document US2003097223 (Nakae & Ihara, 22/05/2003), which protects a new primer design method, could be mentioned. This method automatically designs primer pairs, which are then selected according to certain requirements, namely oligonucleotide length, GC content percentage and Tm (melting temperature). Besides the basic aspects of primer design, well known to someone skilled in the art, the method of the present invention includes a thermodynamic analysis of the designed primers, which adds an advantage over the method described in US2003097223, as the stability of the designed primers is guaranteed, improving the probability of success when said primers are used. Another difference between the former document and the invention herein disclosed is that said document aims at finding primers useful for many exons of a genome, whereas in one aspect of the invention all the microorganisms belonging to a certain taxon are to be amplified. This fact constitutes a difference by itself, but the strategy used to find primers or oligonucleotides that could recognize more than one template is also different: in document US2003097223 a plurality of primers is designed (indicated as step 701) using bioinformatics means from a database comprising different exons (step 700), then PCR-amplified DNA fragments are analyzed together with the designed primers, and primers amplifying target exons are empirically determined. Conversely, in the present invention primers present in the maximum number of target sequences are identified from the design database (which includes the target sequences), and the primers to be synthesized and used are chosen based on this information.
  • Another document belonging to a related field in the art is the paper of Wang and Seed: “A PCR primer bank for quantitative gene expression analysis”, Nucleic Acids Research, 2003, vol. 31, No. 24, e154, where an algorithm is validated for the identification of specific transcription primers for PCR; the authors have created an online database with primers that fulfill said requirements for human and mouse genes. The algorithm described by Wang and Seed differs significantly from the method proposed in the present invention, firstly because it does not contemplate the possibility of choosing an oligonucleotide or a primer pair common to a determined taxon, as specific primers are chosen for only one target sequence, and secondly because in its oligonucleotide selection procedure ΔG is evaluated only for the last 5 residues at the 3′ end of the molecule and the candidate is rejected when such value is less than −9 kcal/mole. In the present invention, ΔG is evaluated for all the candidate oligonucleotides and the selection criterion is much more stringent, as preferentially only oligonucleotides having ΔGhmin not lower than −1.5 kcal/mole (ΔG for hairpin formation) are selected. In order to predict the formation of hairpins in the referred paper, sequence auto-complementarity is evaluated and only 5 non-contiguous matches are allowed. In the same way, to avoid the formation of primer dimers, the presence of complementary sequences in 4 residues at the 3′ end of the molecule is evaluated in the same primer (to avoid dimers) and in the other primer (to avoid cross-reactivity). In the present invention, secondary structure formation is addressed in a different and more efficient way than the simple sequence complementarity comparison: differences in Gibbs free energies are evaluated for all possible conformations, and the probability of each selected oligonucleotide to form secondary structures is determined based on the most stable conformation.
  • As can be appreciated, the method of the invention shows indisputable technical advantages over other existing methods in the state of the art.
  • In summary, to date no oligonucleotide design method has been disclosed that is fast and economical and that allows the design of specific oligonucleotides for a target sequence when said sequence is part of a metagenomic sample, or the design of oligonucleotides that simultaneously recognize various sequences belonging to different organisms.
  • In this disclosure, said problems of the existing technique have been solved, creating a method for the design of specific oligonucleotides for a given sequence or group of sequences, that considers not only the information of the genetic material to be identified but also the information of all the genetic material that could be present in a metagenomic sample over which the method will be applied.
  • Another common problem in the field of oligonucleotide design is the fact that even when an oligonucleotide meeting the required specificity could be available, in practice of molecular biology procedures said oligonucleotide is not efficient. Explanations for this inefficiency are formation of secondary structures within the oligonucleotide sequence (hairpins) or auto-hybridization, which decreases the active concentration of the oligonucleotide in the reaction mix. In the case of PCR technique, where an oligonucleotide pair is simultaneously used, a cross-hybridization between both oligonucleotides could be possible, besides auto-hybridization and hairpin formation, which also sequesters oligonucleotides in the reaction mix and makes said reaction inefficient.
  • In order to overcome this technical problem, the method of the invention includes a step wherein the designed oligonucleotides are thermodynamically evaluated to discard formation of hairpins, auto-hybridization or cross-hybridization between two primers. For each of these situations, Gibbs free energy differences are calculated for all the possible conformations, the most stable conformation being selected; if said most stable conformation has a ΔG value less than a certain threshold, said oligonucleotide is discarded, thus guaranteeing the availability of the designed oligonucleotides.
  • Thus, the method of the present invention allows solving all the problems existing in the field of oligonucleotide design for Molecular Biology techniques.
  • SUMMARY OF THE INVENTION
  • As previously described, the present invention discloses a method that can be used to identify one DNA or RNA sequence or one specific group of DNA or RNA sequences from a complex biological sample.
  • Diverse molecular biology methods require the presence of short DNA sequences, called oligonucleotides, that are artificially synthesized from a description of their composing bases.
  • The oligonucleotide design method comprises the selection or construction of a database of reference sequences, the selection of a subset of sequences belonging to target organisms, the selection of candidate oligonucleotides from such sequences, the depuration of these candidate oligonucleotides according to hybridization specificity and thermodynamic stability criteria, which allows to obtain a list of designed oligonucleotides and, optionally, the sorting of such oligonucleotides according to their taxonomical specificity.
  • The extension of this method to the case in which oligonucleotides pairs are required is also disclosed, as could be the case of polymerase chain reaction (PCR) procedures. This method variant comprises the construction or selection of a database of reference sequences, the selection of a subset of sequences belonging to target organisms, the selection of two sets of candidate oligonucleotides from such sequences, the depuration of each set of candidate oligonucleotides according to hybridization specificity and thermodynamic stability criteria, the elaboration of a list of oligonucleotides pairs or primers formed by one element from each set that satisfy physical and thermodynamic requirements and the sorting of such oligonucleotides pairs according to taxonomical specificity.
  • BRIEF DESCRIPTION OF THE DRAWINGS
  • FIG. 1 shows the results of a PCR performed with oligonucleotides pairs or primers designed using the method of the invention to carry out a specific PCR for A. thiooxidans amplifying 16S rDNA from 5 samples of A. ferrooxidans and 2 samples of A. thiooxidans; the content of each lane is specified in the Examples section, in Table 5.
  • DETAILED DESCRIPTION OF THE INVENTION
  • Oligonucleotide Design.
  • The method for the design of oligonucleotides herein described takes a database of DNA or RNA sequences as an input. Depending on experimental requirements being considered, these sequences may be complete genomes or fragments from each genome. For instance, all known sequences of a given gene or genomic region could be considered. In a preferred embodiment of the present invention, a database designed by us is considered, which contains all known sequences of gene 16S. A requirement to be met by the database under use is that every sequence must have been taxonomically classified. An example of database that can be used as input with the method of this invention is GenBank, from NCBI (Benson, D. A., Boguski, M. S., Lipman, D. J., Ostell, J. (1997). GenBank. Nucleic Acids Res. Jan 1; 25(1):1-6). This selected or constructed database is called “evaluation database”. From this database, the sequence subset corresponding to the organism(s) to be identified is extracted. This subset is called “design database”.
  • In some cases, considering the fact that partial sequences of the target genes could be found in public databases, it is convenient to normalize the relative positions within each sequence, and so each sequence in the design database is optimally aligned to a reference sequence, which may be a gene that is homologous to the analyzed one, using the Needleman-Wunsch algorithm (Needleman, S. B., Wunsch, C. D. (1970). A general method applicable to the search for similarities in the amino acid sequence of two proteins. J Mol Biol. Mar; 48(3):443-53). In said case, these aligned sequences form the design database.
  • Once the design database has been defined, the set of oligonucleotides contained in it is established. This set is built by considering every subsequence, hereinafter called a ‘word’, that has a defined size (typically between 18 and 50 letters) and is contained in a sequence of the design database or in its reverse-complementary sequence. Words that are present more than once in some sequence are discarded, where two occurrences are also counted as coincident if they differ by no more than a given number of substitutions, typically up to 15% of the letters in the word. For instance, in a 20-letter word, 15% corresponds to 3 substitutions, so if a word of length 20 coincides in 17 or more letters with another word of the same sequence, both words are discarded. This procedure is performed efficiently by the algorithm described below, which takes as inputs the design database, the size of the oligonucleotides being designed (n) and the maximum number of allowed letter substitutions (u). The selection of candidate oligonucleotides thus takes into account all subsequences of defined length that are present in the target sequences and in their reverse-complementary sequences.
  • Algorithm 1
  • For each sequence Si in the design database
      • Consider each word Pij being a previously unseen subsequence of Si with length n
      • For each sequence Sk, Ck is defined as the number of times the oligonucleotide appears within each candidate sequence, wherein subscript k is the sequence number.
        • Ck←0
        • For each word Pkl being a subsequence of Sk with length n
          • If word Pkl coincides with Pij in at least n-u letters
            • Ck←Ck+1
          • If word Pkl coincides with Pij in exactly n letters
            • Mark word Pkl as previously seen
            • Remember that Pij hybridizes with Sk on strand +1
        • For each word P′kl being a reverse-complementary sequence of Pkl
          • If word P′kl coincides with Pij in at least n-u letters
            • Ck←Ck+1
          • If word P′kl, coincides with Pij in exactly n letters
            • Mark word Pkl as previously seen
            • Remember that Pij hybridizes with Sk on strand −1
        • If Ck is greater than 1
          • Mark word Pij as discarded
      • If Pij is not discarded
        • Print word Pij as oligonucleotide candidate on strand +1
      • Consider each word P′ij being a previously unseen reverse-complementary sequence of Pij
      • For each sequence Sk
        • Ck←0
        • For each word Pkl being a subsequence of Sk with length n
          • If word Pkl coincides with P′ij in at least n-u letters
            • Ck←Ck+1
          • If word Pkl coincides with P′ij in exactly n letters
            • Mark word Pkl as previously seen
            • Remember that P′ij hybridizes with Sk on strand +1
        • For each word P′kl being a reverse-complementary sequence of Pkl
          • If word P′kl coincides with P′ij in at least n-u letters
            • Ck←Ck+1
          • If word P′kl coincides with P′ij in exactly n letters
            • Mark word P′kl as previously seen
            • Remember that P′ij hybridizes with Sk on strand −1
        • If Ck is greater than 1
          • Mark word P′ij as discarded
      • If P′ij is not discarded
        • Print word P′ij as oligonucleotide candidate on strand −1
  • Each word or subsequence generated by the described algorithm is called “candidate oligonucleotide”. In this first attempt a large quantity of candidate oligonucleotides is obtained, which are submitted to the selection criteria of the method.
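  • The selection of candidate words can be illustrated with a short Python sketch. It is a simplified reading of Algorithm 1, not the implementation used by the inventors: it scans every window by brute force, it uses a set to skip words already seen, and it rejects a word when it matches more than one site (on either strand) of any design sequence. The function names and the restriction to the letters A, C, G and T are assumptions made for the illustration.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA string (letters A, C, G, T only)."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def matches(a, b, u):
    """True if the equal-length words a and b differ in at most u positions."""
    return sum(x != y for x, y in zip(a, b)) <= u


def count_sites(word, seq, u):
    """Count the windows, on both strands of seq, that match word within u substitutions."""
    n = len(word)
    count = 0
    for strand in (seq, reverse_complement(seq)):
        for i in range(len(strand) - n + 1):
            if matches(word, strand[i:i + n], u):
                count += 1
    return count


def candidate_words(design_db, n, u):
    """Yield (word, strand) for words of length n that match at most one site
    (within u substitutions) in every sequence of design_db."""
    seen = set()
    for seq in design_db:
        for strand, text in ((+1, seq), (-1, reverse_complement(seq))):
            for i in range(len(text) - n + 1):
                word = text[i:i + n]
                if word in seen:
                    continue  # already evaluated from an earlier position
                seen.add(word)
                if all(count_sites(word, s, u) <= 1 for s in design_db):
                    yield word, strand
```

  • For real 16S databases this brute-force scan would be slow; an indexed search would be needed in practice, which is outside the scope of this sketch.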
  • These candidates are then evaluated by their thermodynamic stability. The first evaluation is the determination of the Gibbs free energy for the smallest energy secondary structure. This means that Gibbs free energy difference is calculated for all the spatial conformations where the oligonucleotide hybridizes with itself, until the structure with the smallest energy difference, i.e. the most stable structure, is found. If this value, which is called ΔGhmin as it defines the Gibbs free energy difference for hairpin formation of the oligonucleotide, is smaller than a threshold value, defined in a first attempt as the best quartile, the candidate oligonucleotide is discarded. If it is desired to reduce even more the number of candidate oligonucleotides, a larger ΔGhmin could be selected, which may be, e.g., −7 kcal/mole. The ΔGhmin threshold value preferably used according to the invention is −1.5 kcal/mole.
  • For certain procedures where designed oligonucleotide concentration would be too high, as in polymerase chain reaction (PCR) or fluorescent in situ hybridization (FISH), a second validation should be performed, which requires the evaluation of the smallest Gibbs free energy of all the structures formed by two copies of the candidate oligonucleotide. Analogously, if this energy does not surpass a threshold value for ΔGdmin, which defines the Gibbs free energy difference for the formation of oligonucleotide dimers, the oligonucleotide is discarded. In a first approach, the threshold is defined as the best quartile and, if a stricter bound for the oligonucleotide number is desired, a larger ΔGdmin can be selected. The ΔGdmin threshold value preferentially used according to this invention is −7 kcal/mole.
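  • The two thermodynamic checks described above reduce to threshold comparisons once the free energies are known. The following Python sketch assumes that ΔGh and ΔGd (in kcal/mole) have already been computed by an external folding tool; the function name and argument names are hypothetical, and the default thresholds are the preferred values stated in the text.

```python
def passes_thermodynamic_filter(dg_hairpin, dg_dimer=None,
                                dg_hairpin_min=-1.5, dg_dimer_min=-7.0):
    """Return True if a candidate oligonucleotide is kept.

    dg_hairpin: free energy of the most stable hairpin (kcal/mole).
    dg_dimer:   free energy of the most stable self-dimer (kcal/mole), or None
                when the second validation is not required.
    Both energies are assumed to be supplied by an external calculation.
    """
    if dg_hairpin < dg_hairpin_min:
        return False  # hairpin too stable: candidate discarded
    if dg_dimer is not None and dg_dimer < dg_dimer_min:
        return False  # self-dimer too stable: candidate discarded
    return True
```

  • For example, a candidate with ΔGh = −0.5 and ΔGd = −3.0 kcal/mole is kept, whereas one with ΔGh = −2.0 kcal/mole is discarded regardless of its dimer energy.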
  • Methods to calculate these minimal energies are well known and have been described in literature, for instance:
  • Bommarito S., Peyret N., SantaLucia J. Jr. (2000). Thermodynamic parameters for DNA sequences with dangling ends. Nucleic Acids Res. May 1; 28(9):1929-34.
  • D. H. Mathews, J. Sabina, M. Zuker & D. H. Turner. (1999) Expanded Sequence Dependence of Thermodynamic Parameters Improves Prediction of RNA Secondary Structure. J. Mol. Biol. 288, 911-940.
  • M. Zuker. (2003) Mfold web server for nucleic acid folding and hybridization prediction. Nucleic Acids Res. 31 (13), 3406-15.
  • Ivo L. Hofacker, Walter Fontana, Peter F. Stadler, L. Sebastian Bonhoeffer, Manfred Tacker, and Peter Schuster (1994). Fast Folding and Comparison of RNA Secondary Structures. Monatsh. Chem. 125: 167-188.
  • All the oligonucleotides that have not been discarded in the former stage are oligonucleotides designed by the method and are useful in molecular biology procedures.
  • Identification of Taxonomical Groups With Designed Oligonucleotides.
  • In identification procedures, it is desired to mark the presence of a specific taxonomical group in the sample. For this purpose, the evaluation database is analyzed looking for each oligonucleotide, registering the taxonomical group of the sequence in which the oligonucleotide appears. This operation generates, for each oligonucleotide, a table indicating the number of sequences belonging to each taxonomical group to which said oligonucleotide hybridizes. This table allows the calculation of two taxonomical specificity indexes given the target taxonomical group for each oligonucleotide, said indexes being described as follows:
  • Let N be the number of sequences belonging to the target taxonomical group that are present in the evaluation database. Let T be the number of sequences belonging to the target taxonomical group to which said oligonucleotide hybridizes; and let R be the total number of sequences to which said oligonucleotide hybridizes. We use “Sensitivity” to designate the percentage or ratio of target sequences effectively found. That is:
    Sensitivity=T/N
  • Analogously, we use “Selectivity” to designate the percentage or ratio of found sequences belonging to the target group. That is:
    Selectivity=T/R
  • For instance, if the target taxonomical group is Escherichia coli, there are N=80 sequences in the evaluation database belonging to this species, and the oligonucleotide hybridizes to R=120 sequences, of which T=60 belong to E. coli, then the Sensitivity of this oligonucleotide is
    Sensitivity=T/N=60/80=0.75
  • Whereas the Selectivity is
    Selectivity=T/R=60/120=0.5
  • In other words, the oligonucleotide allows the identification of 75% of the sequences belonging to E. coli. Of all the recognized sequences, 50% belong to E. coli.
  • Most suitable oligonucleotides for identification procedures should simultaneously maximize both indexes. The following step in the method of this invention is the selection of the oligonucleotide with the largest Sensitivity and Selectivity, simultaneously. This can be achieved by, for example, obtaining the product of both indexes for each oligonucleotide and choosing the largest value thus obtained; this product is called “Rate”.
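  • Because the product of the two indexes simplifies to a single expression in T, N and R, the Rate of the Escherichia coli example above can be worked out directly. The symbols are those already defined; nothing new is assumed.

```latex
\[
\text{Rate} = \text{Sensitivity} \times \text{Selectivity}
            = \frac{T}{N} \cdot \frac{T}{R}
            = \frac{T^{2}}{N \cdot R}
            = \frac{60^{2}}{80 \cdot 120}
            = 0.375
\]
```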
  • The following algorithm describes the procedure to calculate these indexes for an oligonucleotide, represented as O, as a function of the number of letters forming O (represented by n) and the maximum number of permitted substitutions, represented by u:
    Algorithm 2
    Let T ← 0
    Let N ← 0
    Let R ← 0
    For each sequence Si in the reference database
      If Si belongs to the target taxonomical group
        N ← N+1
      For each word Pij being a subsequence of Si with length n
        If Pij coincides with O in at least n-u letters
          R ← R+1
          If Si belongs to the target taxonomical group
            T ← T+1
    Finally,
    Sensitivity ← T/N
    Selectivity ← T/R
    Rate ← T²/(N·R)
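  • A minimal Python sketch of this calculation follows. It is an illustration under stated assumptions rather than the patented implementation: the database is modeled as a list of (taxon, sequence) pairs, only the forward strand is scanned, and each sequence is counted at most once in R and T, in line with the definition of R as the number of sequences to which the oligonucleotide hybridizes. The function names are hypothetical.

```python
def hybridizes(oligo, seq, u):
    """True if some window of seq matches oligo with at most u substitutions."""
    n = len(oligo)
    return any(
        sum(a != b for a, b in zip(oligo, seq[i:i + n])) <= u
        for i in range(len(seq) - n + 1)
    )


def specificity(oligo, database, target_taxon, u=0):
    """Return (sensitivity, selectivity, rate) for oligo.

    database is an iterable of (taxon, sequence) pairs (an assumed layout).
    """
    N = T = R = 0
    for taxon, seq in database:
        is_target = taxon == target_taxon
        if is_target:
            N += 1                      # target sequences present in the database
        if hybridizes(oligo, seq, u):
            R += 1                      # sequences recognized by the oligonucleotide
            if is_target:
                T += 1                  # recognized sequences that are targets
    sensitivity = T / N if N else 0.0
    selectivity = T / R if R else 0.0
    return sensitivity, selectivity, sensitivity * selectivity
```

  • The candidate with the largest third value (the Rate) would then be chosen, for instance with max(candidates, key=lambda o: specificity(o, database, taxon)[2]).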
  • The oligonucleotides designed and selected according to this method are useful in molecular biology procedures intended to determine the presence of a target taxonomical group in a complex sample. Generally, they are produced by chemical synthesis and could be labeled by any known labeling technique, e.g. radioactive, fluorescent or chemiluminiscent labeling.
  • Design of Oligonucleotides Pairs or Primers.
  • Certain types of molecular biology procedures require the simultaneous presence of many different oligonucleotides. For instance, polymerase chain reaction (PCR) requires the presence of two oligonucleotides that satisfy certain requirements. The subject method of this invention is complemented in this case by the following steps.
  • Further to the abovementioned elements, this complementary stage requires the definition of a maximum and minimum size for the desired PCR product and a limit for the melting temperature (Tm) difference between both oligonucleotides. To start, oligonucleotides designed according to Algorithm 1 are considered. For each sequence in the design database, oligonucleotides pairs or primers formed by oligonucleotides that hybridize to the sequence are considered, in such a way that the first oligonucleotide hybridizes to the forward strand and the second oligonucleotide hybridizes to the reverse strand. The “amplification size” is calculated as the difference between the hybridization positions of the second oligonucleotide minus the first one. Pairs of oligonucleotides having amplification sizes outside the pre-established range are discarded.
  • For each oligonucleotide, Tm is calculated using the method described by Le Novére N. (2001). MELTING, computing the melting temperature of nucleic acid duplex. Bioinformatics. 2001 Dec; 17(12):1226-7.
  • Pairs of oligonucleotides having a melting temperature difference over the pre-established temperature difference, which is preferably less than 4° C., are discarded.
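  • The size and melting-temperature restrictions can be expressed as a single predicate. The sketch below assumes that hybridization positions and Tm values have been obtained beforehand (Tm by an external calculation such as the cited MELTING method); the function and parameter names are hypothetical, and the default of 4° C. is the preferred limit mentioned above.

```python
def pair_passes(pos_forward, pos_reverse, tm_forward, tm_reverse,
                min_size, max_size, max_tm_diff=4.0):
    """Return True if a primer pair satisfies the size and Tm restrictions.

    pos_forward, pos_reverse: hybridization positions on the forward and
    reverse strands of the same sequence.
    tm_forward, tm_reverse: melting temperatures in degrees Celsius.
    """
    size = pos_reverse - pos_forward          # amplification size
    if not (min_size <= size <= max_size):
        return False                          # product outside the allowed range
    return abs(tm_forward - tm_reverse) <= max_tm_diff
```

  • With illustrative positions 100 and 544, Tm values of 62 and 60, and an allowed product size of 200 to 500, the pair is kept; raising the reverse position to 700 or the forward Tm to 68 causes it to be discarded.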
  • Once a list of oligonucleotide pairs fulfilling the established requirements is obtained using the described method, the thermodynamic stability of each oligonucleotide pair is evaluated by determining the minimal energy structure formed by both oligonucleotides when hybridizing to each other. If this energy, which is called ΔGxmin as it defines the Gibbs free energy difference for cross-hybridization between both oligonucleotides, is smaller than a threshold value, defined in a first attempt as the best quartile, the oligonucleotide pair is discarded. In all cases, such threshold should not be lower than −12 kcal/mole.
  • The method used to calculate ΔGxmin is described in M. Zuker. (2003) Mfold web server for nucleic acid folding and hybridization prediction. Nucleic Acids Res. 31 (13), 3406-15.
  • Oligonucleotide pairs or primers that fulfill the size restrictions for the amplification product, the melting temperature difference restrictions and the thermodynamic stability restrictions should be evaluated according to their taxonomical specificity. The abovementioned Selectivity and Sensitivity parameters are evaluated, but now for each oligonucleotide pair. An oligonucleotide pair is considered to hybridize to a target sequence if both oligonucleotides hybridize to said sequence. That is, the set of sequences to which the oligonucleotide pair hybridizes is the intersection of the sets of sequences to which each oligonucleotide hybridizes.
  • Once the set of sequences to which the oligonucleotide pair hybridizes has been determined, the corresponding Selectivity and Sensitivity indexes are calculated, and pairs that maximize both criteria are selected for the molecular biology procedure.
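  • The intersection rule for pairs maps directly onto set operations. The following Python sketch assumes that, for each primer, the set of identifiers of the sequences it hybridizes to has already been computed; the function name and the use of integer identifiers are assumptions made for the illustration.

```python
def pair_specificity(hits_forward, hits_reverse, target_ids):
    """Return (sensitivity, selectivity, rate) for a primer pair.

    hits_forward, hits_reverse: sets of sequence identifiers recognized by
    each primer. target_ids: set of identifiers of the target taxonomical group.
    """
    pair_hits = hits_forward & hits_reverse   # sequences recognized by both primers
    T = len(pair_hits & target_ids)           # recognized sequences that are targets
    R = len(pair_hits)                        # all recognized sequences
    N = len(target_ids)                       # all target sequences
    sensitivity = T / N if N else 0.0
    selectivity = T / R if R else 0.0
    rate = (T * T) / (N * R) if N and R else 0.0
    return sensitivity, selectivity, rate
```

  • For instance, with targets {1, 2, 3, 4}, a forward primer recognizing {1, 2, 3, 8, 9} and a reverse primer recognizing {2, 3, 4, 8, 9}, the pair recognizes {2, 3, 8, 9}, so T = 2, R = 4 and N = 4.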
  • The algorithm described below identifies oligonucleotide pairs or primers that satisfy the requirements described above. It should be taken into account that, for each oligonucleotide, the strand to which it hybridizes (1 or −1) and its melting temperature, called Tm, have already been determined in the oligonucleotide design stage.
    Algorithm 3
    For each sequence Si in the design database
      For each oligonucleotide Fij that hybridizes to Si at position j of strand 1
        For each oligonucleotide Rik that hybridizes to Si at position k of strand −1
          Size ← k-j (size of the amplified region)
          If Size is within the specified range
            If Tm(Fij) differs from Tm(Rik) by no more than 2 degrees
              ΔGd ← Heterodimer free energy
              If ΔGd > ΔGdmin
                Let T ← 0
                Let N ← 0
                Let R ← 0
                For each sequence Sm in the reference database
                  If Sm belongs to the target taxonomical group
                    N ← N+1
                  If Sm simultaneously contains Fij and Rik
                    R ← R+1
                    If Sm belongs to the target taxon
                      T ← T+1
                Sensitivity ← T/N
                Selectivity ← T/R
                Rate ← T²/(N·R)
                Print Fij, Rik, Sensitivity, Selectivity, Rate
    The oligonucleotide pair that maximizes “Rate” is selected
  • The oligonucleotide pair that maximizes the “Rate” parameter is selected. The oligonucleotides pairs or primers designed and selected according to this method are useful in molecular biology procedures, such as PCR, intended to determine the presence of a target taxonomical group in a complex sample.
  • EXAMPLES
  • Example 1. Design of a specific oligonucleotide for bacteria belonging to Leptospirillum genus.
  • A new database was obtained with data comprising only 16S sequences selected from the public NCBI GenBank database. This new database is the “evaluation database”.
  • All sequences that come from bacteria belonging to Leptospirillum genus, 44 sequences in this case, were selected to be the “design database”.
  • The set of 20-letter oligonucleotides that are present in each of the sequences was determined using Algorithm 1, discarding those oligonucleotides appearing more than once within a sequence, considering up to 3 substitutions. These oligonucleotides are the “candidate oligonucleotides”, which were evaluated according to their thermodynamic stability using the algorithm described in M. Zuker. (2003) Mfold web server for nucleic acid folding and hybridization prediction, Nucleic Acids Res. 31 (13), 3406-15. All candidate oligonucleotides with ΔGh values lower than −1.5 kcal/mole or with ΔGd values lower than −7 kcal/mole were discarded. This analysis provided a total of 14785 oligonucleotides that were present in at least one of the 44 sequences in the design database. None of them is present in all the design sequences. Oligonucleotides present in most of the sequences were considered. This reduced the list to 12 oligonucleotides, which are the oligonucleotides designed by the method and have the following sequences:
    Oligonucleotide 1: TACAGACTCTTTACGCCCAG
    Oligonucleotide 2: CTACAGACTCTTTACGCCCA
    Oligonucleotide 3: CCTACAGACTCTTTACGCCC
    Oligonucleotide 4: ACCTACAGACTCTTTACGCC
    Oligonucleotide 5: CACCTACAGACTCTTTACGC
    Oligonucleotide 6: CCACCTACAGACTCTTTACG
    Oligonucleotide 7: CTGGGCGTAAAGAGTCTGTA
    Oligonucleotide 8: TGGGCGTAAAGAGTCTGTAG
    Oligonucleotide 9: GGGCGTAAAGAGTCTGTAGG
    Oligonucleotide 10: GGCGTAAAGAGTCTGTAGGT
    Oligonucleotide 11: GCGTAAAGAGTCTGTAGGTG
    Oligonucleotide 12: CGTAAAGAGTCTGTAGGTGG
  • Example 2
  • Identification of bacteria belonging to Leptospirillum genus in a metagenomic sample.
  • The reference database was searched looking for said 12 oligonucleotides designed in Example 1, and the following Sensitivity and Selectivity values were obtained:
    TABLE 1
    Oligo R T N Sensitivity Selectivity Rate
    1 54 37 44 84.1% 68.5% 57.6%
    2 68 37 44 84.1% 54.4% 45.8%
    3 66 37 44 84.1% 56.1% 47.1%
    4 58 37 44 84.1% 63.8% 53.6%
    5 57 37 44 84.1% 64.9% 54.6%
    6 56 37 44 84.1% 66.1% 55.6%
    7 54 37 44 84.1% 68.5% 57.6%
    8 68 37 44 84.1% 54.4% 45.8%
    9 66 37 44 84.1% 56.1% 47.1%
    10 58 37 44 84.1% 63.8% 53.6%
    11 57 37 44 84.1% 64.9% 54.6%
    12 56 37 44 84.1% 66.1% 55.6%

    N = number of sequences belonging to the target taxonomical group that are present in the evaluation database.

    T = number of sequences belonging to the target taxonomical group to which said oligonucleotide hybridizes; and

    R = total number of sequences to which said oligonucleotide hybridizes.
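  • The percentages in Table 1 can be checked against the definitions of the indexes. The short Python sketch below reproduces them when 44 is taken as N (the number of Leptospirillum sequences in the design database), 37 as T, and the count that varies from row to row as R; this assignment of the three count columns is an inference from the reported percentages, and the function name is hypothetical.

```python
def table_row(T, N, R):
    """Format Sensitivity, Selectivity and Rate as one-decimal percentages."""
    sensitivity = T / N
    selectivity = T / R
    rate = sensitivity * selectivity
    return f"{sensitivity:.1%}", f"{selectivity:.1%}", f"{rate:.1%}"


# One line per distinct value of R appearing in Table 1.
for R in (54, 68, 66, 58, 57, 56):
    print(R, *table_row(T=37, N=44, R=R))
```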
  • According to these results, oligonucleotides 1 and 7 were selected as the best candidates. Both were synthesized, fluorescently labeled with Cy5 and used as probes to identify bacteria belonging to Leptospirillum genus in a metagenomic sample by using the FISH technique. To verify that the detected signal corresponds only to Leptospirillum, controls were carried out with pure Leptospirillum ferrooxidans cultures, and a specific detection was found.
  • Example 3. Design of a specific oligonucleotide pair for bacteria belonging to Sulfobacillus thermosulfidooxidans species.
  • In the “evaluation database” obtained in Example 1, existing sequences for Sulfobacillus thermosulfidooxidans bacteria were selected, 8 sequences in this case, which form the “design database”.
  • Two sets of oligonucleotides of 19 to 21 letters that are present in each of the sequences were determined using Algorithm 1, discarding those oligonucleotides appearing more than once within a sequence, considering up to 3 substitutions. All oligonucleotides with a ΔGh value lower than −1.5 kcal/mole or with a ΔGd value lower than −7 kcal/mole were discarded. The first set considers all subsequences with 19 to 21 nucleotides in the target sequences and the second set considers the corresponding reverse-complementary sequences. Then, oligonucleotide pairs or primers that have hybridization sites with 200-500 nucleotides between them were determined. These primer pairs were evaluated according to their thermodynamic stability using the criteria described in M. Zuker. (2003) Mfold web server for nucleic acid folding and hybridization prediction. Nucleic Acids Res. 31 (13), 3406-15, and all pairs having a cross-hybridization energy ΔGxmin lower than −12 kcal/mole were discarded. This analysis provided a total of 237,223 oligonucleotide pairs or primers that were present in the 8 sequences in the design database. These 237,223 oligonucleotide pairs or primers constitute the “candidate oligonucleotide pairs or primers” designed by the method.
  • For each of these pairs the taxonomical specificity was evaluated in terms of their “Sensitivity”, “Selectivity” and “Rate” indexes. The first 5 primer pairs selected according to these criteria are shown in Table 2.
    TABLE 2
    Pair Sense Primer Antisense Primer
    1 GCTTGGCAACAGGCGCTCA GGCTTCCTCCGTCGGTACCG
    2 TGAGTGGGGGATATCGGGCC TTTGCAGGGGCTTCCTCCGT
    3 AGGCGCTCACAGGGGAGCTC GCGGCTGCTGGCACGTAGTT
    4 GGTGAGGAACACGTGAGTG CCGGAGGCTTAAAACCGCT
    5 CGGGCTGTGAGTGGGGGAT GGGGCTTCCTCCGTCGGTA
  • Example 4.
  • Other Results.
  • Primer design for different taxons.
  • The described method has been used for the design of many primer pairs specific for different taxa, such as Acidithiobacillus thiooxidans, Acidithiobacillus ferrooxidans, Leptospirillum sp. and Acidiphilium sp. Table 3 shows the thermodynamic and specificity parameters of 4 oligonucleotide pairs or primers that were designed using the method of the invention to perform a specific PCR for each microorganism indicated in said Table, namely A. ferrooxidans, A. thiooxidans, Leptospirillum sp. and Acidiphilium sp. These oligonucleotides are useful as PCR primers for said taxa.
  • Table 4 shows sequences of selected specific primers.
    TABLE 3
    Sense Antisense Size Dimer Specificity
    Primer Tm ΔGd ΔGh Primer Tm ΔGd ΔGh Min Max ΔGx Sensit Select Rate Org Sp
    A. ferrooxidans Primers
    TH674.5F 62 −3.44 1.53 TH1116.11R 60 −3.23 0.57 444 444 −9.98 4.00 100.00 4.00 1 1
    A. thiooxidans Primers
    TH1143.9F 68 −2.84 −0.79 TH1393.1R 62 −2.18 0.36 250 250 −7.53 20.00 100.00 20.00 1 1
    Leptospirillum sp. Primers
    LP1233.1F 60 −4.49 1.07 LP1472.1R 58 −4.88 1.47 241 241 −11.38 18.18 100.00 18.18 8 8
    Acidiphilium sp. Primers
    AP202.6F 62 −2.65 1.40 AP626.2R 62 −2.27 −1.18 373 373 −11.27 2.63 100.00 2.63 1 1
  • TABLE 4
    Primer Sequence
    TH674.5F GAATTCCAGGTGTAGCGGTG
    TH1116.11R AACCGCTGCAACTAAGGACA
    TH1143.9F GGGACTCAGTGGAGACCGCC
    TH1393.1R GTGTGACGGGCGGTGTGTA
    LP129.4F GATCTGCCCTGGAGATGGGG
    LP381.1R CGTTGCTGCGTCAGGGTTG
    AP112.1F GGTGAGTAACGCGTAGGAA
    AP363.1R TCGCCCATTGTCCAATATT
  • Design of specific primers for A. thiooxidans useful in a metagenomic sample.
  • In another study, the method of the present invention was used to design specific primers for Acidithiobacillus thiooxidans to be used in a metagenomic sample. Primer pairs were designed using the described method and the 4 primer pairs having the best “Rate” indexes were selected. PCR tests were carried out using 16S rDNA from 5 Acidithiobacillus ferrooxidans samples and 2 Acidithiobacillus thiooxidans samples, which were amplified using each designed primer pair.
  • The PCR protocol used was as follows:
      • 1. 95° C. for 5 minutes
      • 2. 95° C. for 30 seconds
      • 3. 62° C. for 30 seconds
      • 4. 72° C. for 20 seconds
      • 5. Return to step (2) 29 more times
      • 6. 10° C. until tubes were removed
  • FIG. 1 shows PCR results, the lanes in FIG. 1 having the following load:
    TABLE 5
    Lane Sample
    1 A. ferrooxidans DSM 16786
    2 A. ferrooxidans ATCC23270
    3 A. ferrooxidans DSM 14882
    4 A. ferrooxidans ATCC 19859
    5 A. ferrooxidans ATCC33020
    6 A. thiooxidans sp.
    7 A. thiooxidans DSM 504
    (-) Sterile water
  • As can be appreciated in FIG. 1, all the designed primer pairs amplified A. thiooxidans, but only primer Pair C did so specifically, with no amplification of A. ferrooxidans. This means that Pair C could allow the specific determination of the presence of A. thiooxidans in a metagenomic sample, even when A. ferrooxidans is present in said sample, as is usually the case.

Claims (40)

1. Method for the design of oligonucleotides that can be used in molecular biology procedures, wherein it comprises the steps of:
(a) selecting or building a reference sequence database,
(b) selecting a subset of sequences corresponding to target organisms to build a reference database,
(c) selecting candidate oligonucleotides from these sequences,
(d) depurating these candidate oligonucleotides according to hybridization specificity and thermodynamic stability criteria,
(e) obtaining a list of oligonucleotides that fulfill all these requirements, which constitute the oligonucleotides designed by the method.
2. Method according to claim 1, wherein the reference database is built from a public database on step (a).
3. Method according to claim 2, wherein the reference database is built from a public database of 16S genes.
4. Method according to claim 1, wherein a reference database is built on step (b) from the public nucleotide sequence database defined in (a), from which sequences are selected corresponding to target genes.
5. Method according to claim 1, wherein a subset of target sequences is built on step (b) according to a taxonomical classification.
6. Method according to claim 1, wherein the selection of candidate oligonucleotides is performed on step (c) by taking into account all subsequences of defined length that are present in the target sequences and their reverse-complementary sequences.
7. Method according to claim 1, wherein, during the depuration of candidate oligonucleotides on step (d), an oligonucleotide is considered to be valid only when said oligonucleotide appears less than twice within each target sequence.
8. Method according to claim 7, wherein an oligonucleotide is considered to appear more than once if the same sequence is repeated or if the major part of the nucleotides belonging to two possible sequences is coincident.
9. Method according to claim 8, wherein the number of coincident nucleotides is 85% of the sequence or more.
10. Method according to claim 1, wherein, during the depuration of candidate oligonucleotides on step (d), an oligonucleotide is considered to be valid only when said oligonucleotide forms secondary structures that have a Gibbs free energy difference over a predetermined threshold.
11. Method according to claim 10, wherein the Gibbs free energy difference threshold is at least −7 kcal/mole.
12. Method according to claim 11, wherein the Gibbs free energy difference threshold is preferably −1.5 kcal/mole.
13. Method according to claim 1, wherein, during the depuration of candidate oligonucleotides on step (d), an oligonucleotide is considered to be valid only when said oligonucleotide self-hybridizes forming a secondary structure that has a Gibbs free energy difference over a predetermined threshold.
14. Method according to claim 13, wherein the Gibbs free energy difference threshold is −7 kcal/mole.
15. Method according to claim 1, wherein the method optionally includes the step of:
(f) sorting out the designed oligonucleotides according to their taxonomical specificity.
16. Method according to claim 15, wherein the sorting is carried out looking for the designed oligonucleotide in the database of step (a) and registering the number of sequences belonging to each taxonomical group to which said oligonucleotide hybridizes.
17. Method according to claim 15, wherein the method comprises sorting out the designed oligonucleotides according to their taxonomical specificity as represented by the Sensitivity of the oligonucleotide under consideration with respect to the group of target sequences, wherein the Sensitivity corresponds to the ratio between target sequences that were found and target sequences that exist in the database.
18. Method according to claim 15, wherein the sorting of candidate oligonucleotides is further carried out according to the Selectivity of the oligonucleotide under consideration with respect to the group of target sequences, wherein the Selectivity corresponds to the proportion of target sequences that were found belonging to the target taxon.
19. Method according to claim 15, wherein the sorting of candidate oligonucleotides is carried out according to the product of the Sensitivity and the Selectivity of the oligonucleotide under consideration with respect to the group of target sequences, those oligonucleotides with the higher product being selected.
20. Method according to claim 15, wherein the method comprises the synthesis of the selected oligonucleotides, the labeling of said oligonucleotides using any known labeling technique, and the use of said oligonucleotides to identify the presence of microorganisms that belong to the target taxon in a metagenomic sample.
21. Use of an oligonucleotide designed using the method according to claim 15, wherein said oligonucleotide is useful to identify the presence of given microorganisms in a metagenomic sample.
22. Method for the design of oligonucleotide pairs or primers that can be used in molecular biology procedures, wherein the method comprises the steps of:
(a) selecting or building a reference sequence database,
(b) selecting a subset of sequences corresponding to target organisms,
(c) selecting two sets of candidate oligonucleotides from these sequences,
(d) depurating each set of candidate oligonucleotides according to hybridization specificity and thermodynamic stability criteria,
(e) obtaining a list of oligonucleotide pairs or primers formed by one element from each set satisfying said physical and thermodynamic requirements, which constitute the oligonucleotide pairs or primers designed by the method, and optionally
(f) sorting out these oligonucleotide pairs or primers according to their taxonomical specificity.
23. Method according to claim 22, wherein a reference database is built from a public database on step (a).
24. Method according to claim 23, wherein the reference database is built from a public database of 16S genes.
25. Method according to claim 22, wherein a reference database is built on step (b) from the public nucleotide sequence database defined in (a), from which sequences are selected corresponding to target genes.
26. Method according to claim 25, wherein a subset of target sequences is built on step (b) according to a taxonomical classification.
27. Method according to claim 22, wherein during the selection of two sets of candidate oligonucleotides on step (c) the first subset comprises all subsequences of defined length that are present in the target sequences and the second subset comprises their corresponding reverse-complementary sequences.
28. Method according to claim 22, wherein, during the depuration of candidate oligonucleotides on step (d), an oligonucleotide is considered to be valid only when said oligonucleotide appears less than twice within each target sequence.
29. Method according to claim 28, wherein an oligonucleotide is considered to appear more than once if the same sequence is repeated or if the major part of the nucleotides belonging to two possible sequences is coincident.
30. Method according to claim 29, wherein the number of coincident nucleotides is 85% of the sequence or more.
31. Method according to claim 22, wherein a set of oligonucleotide pairs or primers formed by one element from each set established on step (c) is selected on step (d) by assessing that the distance between the hybridization positions of the first oligonucleotide and the second oligonucleotide in a given sequence is within a pre-established range.
32. Method according to claim 31, wherein the selection of a set of oligonucleotide pairs or primers formed by one element from each set established on step (c) considers that the difference between the predicted melting temperatures (Tm) for both oligonucleotides is within a pre-established range.
33. Method according to claim 31, wherein the selection of a set of oligonucleotide pairs or primers formed by one element from each set established on step (c) considers that the most stable structure that could be formed by both oligonucleotides has a Gibbs free energy difference over a pre-established threshold.
34. Method according to claim 33, wherein the Gibbs free energy difference threshold is −12 kcal/mole.
35. Method according to claim 22, wherein the sorting by taxonomical specificity of the oligonucleotide pairs or primers designed using the method is determined by intersecting the sets of sequences in which each of the oligonucleotides of the pair hybridizes to the database of step (a).
36. Method according to claim 22, wherein the method optionally comprises the step of sorting out the designed oligonucleotide pairs or primers according to their taxonomical specificity as represented by the Sensitivity of the oligonucleotide pair under consideration with respect to the group of target sequences, wherein the Sensitivity corresponds to the ratio between target sequences that contain both oligonucleotides and target sequences that exist in the database.
37. Method according to claim 36, wherein the sorting of candidate oligonucleotides is further carried out according to the Selectivity of the oligonucleotide pairs or primers under consideration with respect to the group of target sequences, wherein the Selectivity corresponds to the proportion of target sequences that contain both oligonucleotides belonging to the target taxon.
38. Method according to claim 36, wherein the sorting of candidate oligonucleotides is carried out according to the product of the Sensitivity and the Selectivity of the oligonucleotide pairs or primers under consideration with respect to the group of target sequences, those oligonucleotides with the higher product being selected.
39. Method according to claim 36, wherein the method comprises the synthesis of the selected oligonucleotide pairs or primers and the use of said oligonucleotides as PCR primers to detect the presence of microorganisms belonging to the target taxon in a metagenomic sample.
40. Use of an oligonucleotide pair designed using the method according to claim 36, wherein said oligonucleotide is useful to identify the presence of given microorganisms in a metagenomic sample.
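
The candidate selection of claim 6, the repeated-occurrence test of claims 7 to 9, and the Sensitivity and Selectivity ranking of claims 17 to 19 can be illustrated with a minimal Python sketch. It is not the implementation disclosed in this application. All function names, the database layout (a dictionary mapping a sequence identifier to a taxon and a sequence), and the treatment of hybridization as an exact match on either strand are assumptions made for illustration only, and the thermodynamic criteria of claims 10 to 14 are not modeled. Selectivity is taken here as the fraction of all matched sequences that belong to the target taxon, which is one reading of claim 18.

```python
# Illustrative sketch only; names and data layout are hypothetical.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse-complementary sequence of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def candidates(target_seqs, k):
    """Claim 6: every subsequence of length k found in the target
    sequences or in their reverse-complementary sequences."""
    found = set()
    for seq in target_seqs:
        for strand in (seq, reverse_complement(seq)):
            for i in range(len(strand) - k + 1):
                found.add(strand[i:i + k])
    return found


def near_match_count(oligo, seq, min_identity=0.85):
    """Count ungapped windows of seq in which at least min_identity of
    the positions coincide with oligo (0.85 follows claim 9)."""
    k = len(oligo)
    count = 0
    for i in range(len(seq) - k + 1):
        same = sum(a == b for a, b in zip(oligo, seq[i:i + k]))
        if same >= min_identity * k:
            count += 1
    return count


def appears_less_than_twice(oligo, target_seqs):
    """Claim 7: valid only if the oligonucleotide appears less than
    twice within each target sequence."""
    return all(near_match_count(oligo, s) < 2 for s in target_seqs)


def score(oligo, database, target_taxon):
    """Claims 17 to 19: Sensitivity, Selectivity and their product.
    Assumption: 'hybridizes' means an exact match on either strand."""
    rc = reverse_complement(oligo)
    hits = [taxon for taxon, seq in database.values()
            if oligo in seq or rc in seq]
    target_hits = sum(t == target_taxon for t in hits)
    n_targets = sum(t == target_taxon for t, _ in database.values())
    sensitivity = target_hits / n_targets if n_targets else 0.0
    selectivity = target_hits / len(hits) if hits else 0.0
    return sensitivity, selectivity, sensitivity * selectivity
```

In such a sketch, the oligonucleotides returned by `candidates` would first be filtered with `appears_less_than_twice` and then ranked by the third value returned by `score`. A practical implementation would replace the exact-match test with a hybridization model and add the Gibbs free energy filters recited in the claims.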
US11/506,089 2005-08-17 2006-08-17 Method for the design of oligonucleotides for molecular biology techniques Expired - Fee Related US7853408B2 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CL2102-2005 2005-08-17
CL2005002102 2005-08-17

Publications (2)

Publication Number Publication Date
US20070059743A1 (en) 2007-03-15
US7853408B2 (en) 2010-12-14

Family

ID=40636643

Family Applications (1)

Application Number Title Priority Date Filing Date
US11/506,089 Expired - Fee Related US7853408B2 (en) 2005-08-17 2006-08-17 Method for the design of oligonucleotides for molecular biology techniques

Country Status (8)

Country Link
US (1) US7853408B2 (en)
AR (1) AR054923A1 (en)
AU (1) AU2006203551B2 (en)
BR (1) BRPI0604215A (en)
FR (1) FR2889845A1 (en)
MX (1) MXPA06009317A (en)
PE (1) PE20070356A1 (en)
ZA (1) ZA200606828B (en)

Families Citing this family (20)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP2379595A2 (en) 2008-12-23 2011-10-26 AstraZeneca AB Targeted binding agents directed to 5 1 and uses thereof
EP4276116A3 (en) 2015-04-17 2024-01-17 Amgen Research (Munich) GmbH Bispecific antibody constructs for cdh3 and cd3
TWI829617B (en) 2015-07-31 2024-01-21 德商安美基研究(慕尼黑)公司 Antibody constructs for flt3 and cd3
TWI744242B (en) 2015-07-31 2021-11-01 德商安美基研究(慕尼黑)公司 Antibody constructs for egfrviii and cd3
TWI796283B (en) 2015-07-31 2023-03-21 德商安美基研究(慕尼黑)公司 Antibody constructs for msln and cd3
EP3411404B1 (en) 2016-02-03 2022-11-09 Amgen Research (Munich) GmbH Psma and cd3 bispecific t cell engaging antibody constructs
KR20180103084A (en) 2016-02-03 2018-09-18 암젠 리서치 (뮌헨) 게엠베하 BCMA and CD3 bispecific T cell engrafting antibody constructs
MX2022014636A (en) 2020-05-19 2023-02-23 Amgen Inc Mageb2 binding constructs.
IL300314A (en) 2020-10-08 2023-04-01 Affimed Gmbh Trispecific binders
JP2023547662A (en) 2020-11-06 2023-11-13 アムジェン リサーチ (ミュニック) ゲゼルシャフト ミット ベシュレンクテル ハフツング Polypeptide constructs that selectively bind to CLDN6 and CD3
AU2021374839A1 (en) 2020-11-06 2023-06-08 Amgen Inc. Multitargeting bispecific antigen-binding molecules of increased selectivity
WO2022096704A1 (en) 2020-11-06 2022-05-12 Amgen Inc. Antigen binding domain with reduced clipping rate
BR112023008670A2 (en) 2020-11-06 2024-02-06 Amgen Inc POLYPEPTIDE CONSTRUCTS LINKED TO CD3
AR125290A1 (en) 2021-04-02 2023-07-05 Amgen Inc MAGEB2 JOINING CONSTRUCTIONS
EP4334358A1 (en) 2021-05-06 2024-03-13 Amgen Research (Munich) GmbH Cd20 and cd22 targeting antigen-binding molecules for use in proliferative diseases
AU2022320948A1 (en) 2021-07-30 2024-01-18 Affimed Gmbh Duplexbodies
CA3233696A1 (en) 2021-11-03 2023-05-11 Joachim Koch Bispecific cd16a binders
WO2023079493A1 (en) 2021-11-03 2023-05-11 Affimed Gmbh Bispecific cd16a binders
WO2023218027A1 (en) 2022-05-12 2023-11-16 Amgen Research (Munich) Gmbh Multichain multitargeting bispecific antigen-binding molecules of increased selectivity
WO2024059675A2 (en) 2022-09-14 2024-03-21 Amgen Inc. Bispecific molecule stabilizing composition

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20030097223A1 (en) * 1999-12-14 2003-05-22 Hitachi, Ltd. Primer design system

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2008119084A1 (en) * 2007-03-28 2008-10-02 The Children's Mercy Hospital Method for identifying and selecting low copy nucleic acid segments
US20140194316A1 (en) * 2010-03-04 2014-07-10 Miacom Diagnostics Gmbh Enhanced multiplex fish
US11866769B2 (en) * 2010-03-04 2024-01-09 Miacom Diagnostics Gmbh Enhanced multiplex fish
WO2012122571A1 (en) * 2011-03-10 2012-09-13 Gen-Probe Incorporated Methods and compositions for the selection and optimization of oligonucleotide tag sequences
EP3498864A1 (en) * 2011-03-10 2019-06-19 Gen-Probe Incorporated Methods and compositions for the selection and optimization of oligonucleotide tag sequences
US10385476B2 (en) 2011-03-10 2019-08-20 Gen-Probe Incorporated Methods and compositions for the selection and optimization of oligonucleotide tag sequences
USRE48732E1 (en) 2011-03-10 2021-09-14 Gen-Probe Incorporated Methods and compositions for the selection and optimization of oligonucleotide tag sequences
WO2020175966A3 (en) * 2019-02-28 2020-11-26 Seegene, Inc. Methods for determining a designable region of oligonucleotides
WO2024015860A3 (en) * 2022-07-15 2024-03-21 The Regents Of The University Of Michigan Analyte detection using fluorogenic probes or multiplex technologies

Also Published As

Publication number Publication date
ZA200606828B (en) 2008-03-26
MXPA06009317A (en) 2008-10-10
PE20070356A1 (en) 2007-04-05
US7853408B2 (en) 2010-12-14
BRPI0604215A (en) 2007-04-10
AU2006203551B2 (en) 2011-06-02
FR2889845A1 (en) 2007-02-23
AR054923A1 (en) 2007-07-25
AU2006203551A1 (en) 2007-03-08

Similar Documents

Publication Publication Date Title
US7853408B2 (en) Method for the design of oligonucleotides for molecular biology techniques
McLoughlin Microarrays for pathogen detection and analysis
Wolf et al. Homology modeling revealed more than 20,000 rRNA internal transcribed spacer 2 (ITS2) secondary structures
He et al. Empirical establishment of oligonucleotide probe design criteria
Marchais et al. Single-pass classification of all noncoding sequences in a bacterial genome using phylogenetic profiles
US20050136427A1 (en) Methods for controlling cross-hybridization in analysis of nucleic acid sequences
US20110105346A1 (en) Universal fingerprinting chips and uses thereof
CA2597947C (en) Methods of genetic analysis involving the amplification of complementary duplicons
Li et al. In search of RNase P RNA from microbial genomes
US6892141B1 (en) Primer design system
Qu et al. Selecting specific PCR primers with MFEprimer
US20150324518A1 (en) Genetic Affinity of Microorganisms and Viruses
US7565248B2 (en) Computer system for designing oligonucleotides used in biochemical methods
EP1136932B1 (en) Primer design system
US7085652B2 (en) Methods for searching polynucleotide probe targets in databases
Mouratidis et al. Nucleic Quasi-Primes: Identification of the Shortest Unique Oligonucleotide Sequences in a Species
Chen et al. Design of multiplex PCR primers using heuristic algorithm for sequential deletion applications
JP2011239708A (en) Design method for probe for nucleic acid standard substrate detection, probe for nucleic acid standard substrate detection and nucleic acid detecting system having the same
Garbarine et al. An information theoretic method of microarray probe design for genome classification
JP2001258568A (en) Primer design system
Kiryanova et al. The method of generation barcode for DNA certification of plants and organisms
JP2003052385A (en) Probe sequence determination system for dna array
Class et al. Patent application title: Genetic Affinity of Microorganisms and Viruses Inventors: George E. Fox (Houston, TX, US) Richard C. Willson, Iii (Houston, TX, US) Zhengdong Zhang (Houston, TX, US)
AU2006214800B2 (en) Methods of genetic analysis involving the amplification of complementary duplicons
EP1148123B1 (en) Method for determining base sequence of analytical oligonucleotides for the detection of nucleic acids

Legal Events

Date Code Title Description
AS Assignment

Owner name: BIOSIGMA S.A., CHILE

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:MAASS SEPULVEDA, ALEJANDRO EDUARDO;ARAVENA DUARTE, ANDRES OCTAVIO;GONZALEZ CANALES, MAURICIO ALEJANDRO;AND OTHERS;REEL/FRAME:018893/0135

Effective date: 20061020

FPAY Fee payment

Year of fee payment: 4

FEPP Fee payment procedure

Free format text: MAINTENANCE FEE REMINDER MAILED (ORIGINAL EVENT CODE: REM.)

LAPS Lapse for failure to pay maintenance fees

Free format text: PATENT EXPIRED FOR FAILURE TO PAY MAINTENANCE FEES (ORIGINAL EVENT CODE: EXP.); ENTITY STATUS OF PATENT OWNER: SMALL ENTITY

STCH Information on status: patent discontinuation

Free format text: PATENT EXPIRED DUE TO NONPAYMENT OF MAINTENANCE FEES UNDER 37 CFR 1.362

FP Lapsed due to failure to pay maintenance fee

Effective date: 20181214