WO2017209575A1 - Evaluation of specificity of oligonucleotides - Google Patents

Evaluation of specificity of oligonucleotides

Info

Publication number
WO2017209575A1
Authority
WO
WIPO (PCT)
Prior art keywords
oligonucleotide
formula
sequence
nucleotide sequences
reference nucleotide
Prior art date
Application number
PCT/KR2017/005818
Other languages
French (fr)
Inventor
Jong Yoon Chun
Gi-Seok YOON
Original Assignee
Seegene, Inc.
Priority date
Filing date
Publication date
Application filed by Seegene, Inc.
Priority to KR1020197000224A (patent KR102189358B1)
Publication of WO2017209575A1

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B30/00 ICT specially adapted for sequence analysis involving nucleotides or amino acids
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B50/00 ICT programming tools or database systems specially adapted for bioinformatics
    • G16B50/10 Ontologies; Annotations
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6811 Selection methods for production or design of target specific oligonucleotides or binding molecules
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6813 Hybridisation assays
    • C12Q1/6816 Hybridisation assays characterised by the detection means
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B30/00 ICT specially adapted for sequence analysis involving nucleotides or amino acids
    • G16B30/10 Sequence alignment; Homology search
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B50/00 ICT programming tools or database systems specially adapted for bioinformatics
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B50/00 ICT programming tools or database systems specially adapted for bioinformatics
    • G16B50/30 Data warehousing; Computing architectures
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q2525/00 Reactions involving modified oligonucleotides, nucleic acids, or nucleotides
    • C12Q2525/10 Modifications characterised by
    • C12Q2525/117 Modifications characterised by incorporating modified base

Definitions

  • the present invention relates to evaluation of specificity of oligonucleotides.
  • Nucleic acid amplification is a pivotal process for a wide variety of methods in molecular biology, such that various amplification methods have been proposed. For example, Miller, H. I. et al. (WO 89/06700) amplified a nucleic acid sequence based on the hybridization of a promoter/primer sequence to a target single-stranded DNA ("ssDNA") followed by transcription of many RNA copies of the sequence.
  • Other known nucleic acid amplification procedures include transcription-based amplification systems (Kwoh, D. et al., Proc. Natl. Acad. Sci. U.S.A., 86:1173 (1989); and Gingeras T.R. et al., WO 88/10315).
  • PCR polymerase chain reaction
  • the real-time PCR generally uses oligonucleotides such as primers and/or probes which hybridize specifically with target nucleic acid sequences.
  • methods using hybridization between labeled probes and target nucleic acid sequences include the Molecular beacon method using dual-labeled probes capable of hairpin structure (Tyagi et al, Nature Biotechnology v. 14 Mar. 1996), the HyBeacon method (French DJ et al., Mol.
  • the PCR and real-time PCR generally utilize primers and/or probes for amplifying or detecting a desired target nucleic acid sequence from a mixture of various nucleic acids.
  • the primer and/or probe has high specificity for target nucleic acid sequence so as to obtain accurate amplification or detection results.
  • DSO dual specificity oligonucleotide
  • DPO dual priming oligonucleotide
  • the DSO has three different portions within the oligonucleotide molecule: 5'-high Tm specificity portion, 3'-low Tm specificity portion and separation portion, wherein the hybridization specificity is determined dually by the two portions (5'-high Tm specificity portion and 3'-low Tm specificity portion) separated by the separation portion consisting of universal bases.
  • a target discriminative probe capable of discriminating target nucleic acid sequences from non-target nucleic acid sequences was also developed by the present inventors ( see WO 2011/028041).
  • the TD probe comprises three unique portions within the oligonucleotide molecule: 5'-second hybridization portion, 3'-first hybridization portion and separation portion, wherein the hybridization specificity of the TD probe is determined dually by the 5'-second hybridization portion and the 3'-first hybridization portion separated by the separation portion consisting of universal bases.
  • oligonucleotides used in PCR and real-time PCR are designed and prepared to hybridize or match target nucleic acid sequences.
  • even elaborately designed oligonucleotides may hybridize with non-target nucleic acid sequences that were not identified during their design. Accordingly, it is necessary to check whether the designed oligonucleotides hybridize only to the intended target and not to any unintended targets. This is generally referred to as a specificity evaluation (checking) process.
  • the specificity evaluation process may involve: searching the designed oligonucleotides against a database of known nucleotide sequences (e.g., GenBank) using any sequence alignment algorithm or program (e.g., BLAST) to find homologous sequences (homology search), and analyzing the resulting homologous sequences to check whether the designed oligonucleotide hybridizes only to a desired target nucleic acid sequence.
  • BLAST is one of the most widely used sequence similarity search tools, which compares a nucleotide query sequence against a nucleotide sequence database and finds sequences in the database that are similar to the query. This program is offered free of charge by the National Center for Biotechnology Information (NCBI): http://www.ncbi.nih.gov.
  • the BLAST program is basically a string-matching program. Biological string matching looks for similarity as an indication of homology. Similarity between the query and the sequences in the database may be measured by the percent identity, or the number of bases in the query that exactly match a corresponding region of a sequence from the database.
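As a rough illustration of the percent-identity measure mentioned above, the following minimal Python sketch compares a query with an equally long, ungapped region of a database sequence. The function name `percent_identity` is a hypothetical helper introduced here for illustration; it is not part of BLAST, which also handles gaps and local alignment.

```python
def percent_identity(query: str, subject_region: str) -> float:
    """Percentage of positions where two equal-length, ungapped sequences agree.

    Illustrative only: BLAST itself performs gapped local alignment.
    """
    if len(query) != len(subject_region):
        raise ValueError("sequences must have the same length for an ungapped comparison")
    if not query:
        raise ValueError("sequences must not be empty")
    # Count positions carrying the identical base (case-insensitive).
    matches = sum(q == s for q, s in zip(query.upper(), subject_region.upper()))
    return 100.0 * matches / len(query)


# Three of the four bases agree.
print(percent_identity("ACGT", "ACGA"))
```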
  • the output of a BLAST search reports a set of scores and statistics on the matches it has found based on the raw score S, various parameters of the scoring algorithm, and properties of the query and database.
  • the raw score S is a measure of similarity and the size of the match.
  • the BLAST output lists the hits ranked by their E value.
  • the E (expect) value of a match measures, roughly, the chances that the string matching (allowing for gaps) occurs in a randomly generated database of the same size and composition. The closer the E value is to 0, the less likely it is that the match occurred by chance. In other words, the lower the E value, the better the match. It can be used as a measure of how well the primer matches the target nucleic acid sequence.
  • although BLAST provides comparatively good results for typical oligonucleotides, it is not suitable for non-typical oligonucleotides, such as those containing several contiguous universal bases, non-natural bases, or the like, within the sequence.
  • BLAST produces results for only one of the portions separated by the universal bases, even though a complete sequence is inputted as a query. Also, BLAST does not provide individual mismatch results for the 5' portion and the 3' portion, each of which is an important consideration in the design of oligonucleotides containing a plurality of consecutive universal bases within the sequence.
  • the BLAST treats the universal base or the degenerate base as mismatch, regardless of its specific type.
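The effect described above can be illustrated with a minimal Python sketch. It assumes, purely for illustration, that a universal base is written as the single letter `I` and that the oligonucleotide and the database region are compared position by position in the same orientation; the helper `count_mismatches` and its parameters are hypothetical and do not reproduce BLAST's scoring.

```python
def count_mismatches(oligo: str, region: str, universal: str = "I",
                     universal_aware: bool = True) -> int:
    """Count mismatching positions between an oligonucleotide and a region.

    With universal_aware=False every universal base is scored as a mismatch,
    mimicking a plain string comparison. With universal_aware=True positions
    holding the universal base are skipped.
    """
    if len(oligo) != len(region):
        raise ValueError("oligo and region must have the same length")
    mismatches = 0
    for o, r in zip(oligo.upper(), region.upper()):
        if universal_aware and o == universal:
            continue  # a universal base pairs with any opposing base
        if o != r:
            mismatches += 1
    return mismatches


oligo = "ACGIIITG"   # three consecutive universal bases in the middle
region = "ACGTCATG"
print(count_mismatches(oligo, region, universal_aware=False))  # plain comparison
print(count_mismatches(oligo, region, universal_aware=True))   # universal bases skipped
```

In this toy case the plain comparison reports three mismatches that stem only from the universal bases, whereas the universal-aware comparison reports none.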
  • the present inventors have endeavored to develop a method for evaluating specificity of oligonucleotides, particularly non-typical oligonucleotides containing consecutive bases, each of which is not involved in Watson-Crick base pairs, within the sequence.
  • the present inventors have developed a novel method comprising comparing the sequence of the oligonucleotide against a database of nucleotide sequences, extracting reference nucleotide sequences comprising a region homologous to the oligonucleotide, and analyzing portion-by-portion match/mismatch between the oligonucleotide and each of the reference nucleotide sequences to provide individual match results in two portions separated by the consecutive bases not involved in Watson-Crick base pairs.
  • oligonucleotides such as represented by 5'-X-Y-Z-3' (wherein Y represents a separation portion comprising two or more consecutive bases, each of which is not involved in Watson-Crick base pairs)
  • conventional sequence alignment algorithms or programs may provide only the match/mismatch results of either the X portion or the Z portion.
  • the method of the present invention provides the match/mismatch results of both the X portion and the Z portion by using the homology region and its flanking regions in an extracted reference nucleotide sequence.
  • the method of the present invention allows accurate evaluation of specificity, particularly annealing specificity of the oligonucleotides having the atypical structure, and helps to select an appropriate oligonucleotide considering the importance of the X and Z portions.
  • the method of the present invention provides results of classification of reference nucleotide sequences according to the numbers of mismatched bases in the portions X and Z, as well as biological features thereof, thereby allowing the user to evaluate specificity, particularly target specificity of the oligonucleotides in a simple and intuitive manner.
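One possible way to picture the classification described above is to group reference sequences by their pair of mismatch counts in the portions X and Z. The Python sketch below is an assumption-laden illustration: the record layout `(reference_id, organism, x_mismatches, z_mismatches)`, the function `classify_hits`, and the sample identifiers are all hypothetical and are not taken from the disclosure.

```python
from collections import defaultdict


def classify_hits(hits):
    """Group reference sequences by (mismatches in X, mismatches in Z).

    hits: iterable of (reference_id, organism, x_mismatches, z_mismatches).
    Returns a dict keyed by (x_mismatches, z_mismatches), sorted by key.
    """
    groups = defaultdict(list)
    for ref_id, organism, x_mm, z_mm in hits:
        groups[(x_mm, z_mm)].append((ref_id, organism))
    return dict(sorted(groups.items()))


# Hypothetical records for illustration only.
hits = [
    ("REF_A", "virus 1", 0, 0),
    ("REF_B", "virus 1", 1, 0),
    ("REF_C", "virus 2", 0, 0),
]
for key, members in classify_hits(hits).items():
    print(key, members)
```

Grouping by the pair rather than by a total count keeps the X and Z results separate, which is the point of a portion-by-portion report.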
  • Fig. 1 is a flow chart illustrating a process for evaluating specificity of an oligonucleotide according to an embodiment of the present invention.
  • Fig. 2 is a schematic representation of evaluating specificity of an oligonucleotide (DPO primer) according to an embodiment of the present invention.
  • the sequence of the portion X (query) in the DPO primer represented by 5'-X-Y-Z-3' is compared against a database using the BLAST, and a plurality of reference nucleotide sequences comprising a region homologous to the X portion are extracted. Afterwards, the portion-by-portion match/mismatch between the complete sequence of the DPO primer and a homologous region and its flanking regions of each of the reference nucleotide sequences is analyzed, and the numbers of mismatched bases in the portions X and Z are provided.
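The portion-by-portion analysis outlined above can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the disclosed implementation: the homology search itself is not performed, and the hypothetical argument `hit_start` stands in for the 0-based position at which a search tool such as BLAST reported the region homologous to the portion X. The names `portion_mismatches` and `PortionResult` are likewise invented for this sketch, and gaps are not handled.

```python
from typing import NamedTuple


class PortionResult(NamedTuple):
    x_mismatches: int
    z_mismatches: int


def _hamming(a: str, b: str) -> int:
    """Number of differing positions between two equal-length strings."""
    return sum(p != q for p, q in zip(a, b))


def portion_mismatches(x: str, y_len: int, z: str,
                       reference: str, hit_start: int) -> PortionResult:
    """Count mismatches in portions X and Z of a 5'-X-Y-Z-3' oligonucleotide.

    The window covers the region homologous to X plus the flanking region
    that faces Y and Z. Positions opposite the separation portion Y are
    skipped, because its bases are not involved in Watson-Crick base pairs.
    """
    total = len(x) + y_len + len(z)
    if hit_start < 0 or hit_start + total > len(reference):
        raise ValueError("reference does not cover the complete oligonucleotide")
    window = reference[hit_start:hit_start + total].upper()
    ref_x = window[:len(x)]            # region homologous to X
    ref_z = window[len(x) + y_len:]    # flanking region facing Z
    return PortionResult(_hamming(x.upper(), ref_x), _hamming(z.upper(), ref_z))


# Toy example: one mismatch in X, none in Z.
reference = "TT" + "ACGTACCTAC" + "AAA" + "GGATCC" + "TT"
print(portion_mismatches("ACGTACGTAC", 3, "GGATCC", reference, hit_start=2))
```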
  • Fig. 3 shows the results of portion-by-portion match/mismatch analysis (sequence alignment) between the complete sequence of an exemplary DPO primer represented by 5'-X-Y-Z-3' (top row) and a reference nucleotide sequence extracted by one embodiment of the present invention (bottom row).
  • the DPO primer was found to match the minus (-) strand of the reference nucleotide sequence, and to have one (1) mismatched base in the portion X and zero (0) mismatched bases in the portion Z.
  • the information is denoted by "- 1
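A strand-aware result of the kind described for Fig. 3 (strand, mismatches in X, mismatches in Z) can be sketched as below. The sketch replaces the database search with a brute-force scan of both strands of a single reference, which is a simplification chosen for illustration. The function names are hypothetical, the returned tuple is not the notation used in the figure, and the convention that a "-" result means the oligonucleotide reads like the reverse complement of the given reference is an assumption borrowed from common BLAST usage.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def _mismatches(a: str, b: str) -> int:
    return sum(p != q for p, q in zip(a, b))


def best_strand_hit(x: str, y_len: int, z: str, reference: str):
    """Scan both strands and return (strand, x_mismatches, z_mismatches)
    for the placement with the fewest total mismatches in X and Z."""
    x, z = x.upper(), z.upper()
    total_len = len(x) + y_len + len(z)
    best = None
    for strand, seq in (("+", reference.upper()), ("-", reverse_complement(reference))):
        for start in range(len(seq) - total_len + 1):
            window = seq[start:start + total_len]
            x_mm = _mismatches(x, window[:len(x)])
            z_mm = _mismatches(z, window[len(x) + y_len:])  # positions under Y are skipped
            if best is None or x_mm + z_mm < best[1] + best[2]:
                best = (strand, x_mm, z_mm)
    if best is None:
        raise ValueError("reference is shorter than the oligonucleotide")
    return best


# Toy reference whose minus strand carries the binding site with one change in X.
site = "GATTACTGGC" + "AAA" + "TCCGA"
reference = "CC" + reverse_complement(site) + "GG"
print(best_strand_hit("GATTACAGGC", 3, "TCCGA", reference))
```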
  • a method for evaluating specificity of an oligonucleotide comprising the steps of:
  • X represents a portion comprising a hybridizing nucleotide sequence hybridized to a target nucleic acid sequence
  • Y represents a separation portion comprising two or more consecutive bases, each of which is not involved in Watson-Crick base pairs
  • Z represents a portion comprising a hybridizing nucleotide sequence hybridized to a target nucleic acid sequence
  • specificity encompasses “annealing or hybridization specificity” and “target specificity”.
  • annealing or hybridization specificity refers to the fidelity of hybridization to be made between completely or perfectly complementary bases. The term is used to describe the relationship between two nucleic acid sequences. According to the definition, oligonucleotides having high specificity can hybridize to another oligonucleotide or polynucleotide under certain conditions, while oligonucleotides having low specificity cannot.
  • target specificity refers to a property of an oligonucleotide that matches, hybridizes to, amplifies, or detects a target nucleic acid sequence of interest, but not any other nucleic acid sequences (non-target nucleic acid sequences), which can be used interchangeably with the term “target specificity”, “specificity to target nucleic acid”, or “specific to target nucleic acid sequence”.
  • oligonucleotides having high specificity can amplify or detect only a desired target nucleic acid sequence from a sample containing a mixture of various nucleic acids by PCR or real-time PCR method; while oligonucleotides having low specificity may amplify or detect non-targets as well as the target of interest, resulting in reduction of target amplification efficiency and false-positive results.
  • specificity may mean either or both of annealing specificity and target specificity.
  • although the specificity is dependent on several factors such as the hybridization conditions (e.g., temperature), it can be determined primarily by the homology between the oligonucleotide sequence and the reference nucleotide sequence. In other words, the specificity may depend on the match/mismatches between the oligonucleotide and the reference nucleotide sequence. Those skilled in the art will be able to ascertain whether the designed oligonucleotide can hybridize to a nucleic acid sequence under certain conditions to selectively amplify or detect it, based on the match/mismatches between the designed oligonucleotide and the nucleotide sequence.
  • the term "information on specificity" as used herein refers to any information conducive to evaluation of specificity of an oligonucleotide.
  • the information on specificity as used herein refers to information obtained by analyzing the similarity between the oligonucleotide sequence and the reference nucleotide sequence, i.e., the match/mismatches therebetween. Information on specificity will be described in detail below.
  • the term “evaluating specificity” or “evaluation of specificity” as used herein includes determining the specificity of an oligonucleotide based on the provided information, i.e., the match/mismatches between an oligonucleotide sequence and the reference nucleotide sequence.
  • oligonucleotide can hybridize to a specific nucleic acid sequence under certain conditions, based on the match/mismatches.
  • the designed oligonucleotide can hybridize only to the target nucleic acid sequence under certain conditions to selectively amplify or detect it, based on the match/mismatches between the designed oligonucleotide and the reference nucleotide sequence.
  • the present invention is directed to a method for evaluating specificity of a non-typical oligonucleotide comprising two or more consecutive bases, each of which is not involved in Watson-Crick base pairs, within the sequence.
  • the present invention provides individual match/mismatch results in two portions (portions X and Z) separated by the consecutive bases not involved in Watson-Crick base pairs.
  • the user can accurately evaluate the specificity of the oligonucleotide through the mismatch results in each of the portions provided by the present invention. Therefore, the present method is particularly useful in evaluating specificity of such non-typical oligonucleotides.
  • Fig. 1 is a flow chart illustrating a process for evaluating specificity of an oligonucleotide according to an exemplary embodiment of the present invention. The method of the present invention 100 will be described in more detail with reference to Fig. 1 as follows:
  • an oligonucleotide to be evaluated for specificity is provided in this step 110.
  • the oligonucleotide is a primer or a probe, which is used for amplifying or detecting a target nucleic acid sequence.
  • target nucleic acid sequence refers to a nucleic acid sequence to be amplified or detected using the oligonucleotide of the present invention.
  • the target nucleic acid sequence may be double-stranded or single-stranded.
  • the target nucleic acid sequence may be either one strand or both strands of the double stranded nucleic acids, i.e., (+) strand (coding strand, sense strand, non-template strand) or (-) strand (non-coding strand, antisense strand, template strand).
  • the target nucleic acid sequence may be one polynucleotide sequence comprising a region capable of hybridizing with the oligonucleotide of the present invention.
  • the target nucleic acid sequence may be at least two polynucleotide sequences comprising a consensus region that can be hybridized with an oligonucleotide of the present invention.
  • the target nucleic acid sequence may be a nucleotide sequence having genetic diversity.
  • the target nucleic acid sequence may be a group consisting of genetically identical gene families, i.e., a gene and variants thereof.
  • the target nucleic acid sequence may be a group of a gene and its subtypes which belong to the gene according to conventionally known classification criteria. For example, if an oligonucleotide is intended to amplify or detect human papillomavirus (HPV) type 16, the target nucleic acid sequence may be composed of a plurality of genes belonging to HPV type 16.
  • HPV human pap
  • non-target nucleic acid sequence refers to a nucleic acid sequence other than the target nucleic acid sequence to be amplified or detected using the oligonucleotide of the present invention.
  • the non-target nucleic acid sequence also includes nucleic acid sequences that are not intended to be amplified or detected, but can be accidentally amplified or detected using the oligonucleotide of the present invention.
  • oligonucleotide refers to a short polynucleotide to be evaluated for its specificity.
  • the oligonucleotide may be referred to as “query” or “query sequence”.
  • the oligonucleotide includes linear oligomers of natural or modified monomers or linkages, including deoxyribonucleotides and ribonucleotides, capable of specifically hybridizing to a target nucleic acid sequence, which can be naturally occurring or artificially synthesized.
  • the oligonucleotide is preferably single stranded for maximum efficiency in amplification.
  • the oligonucleotide is an oligodeoxyribonucleotide.
  • the oligonucleotide of this invention can be comprised of naturally occurring dNMPs (i.e., dAMP, dGMP, dCMP and dTMP), modified nucleotides, or non-natural nucleotides.
  • the oligonucleotide can also include ribonucleotides.
  • the oligonucleotide of this invention may include nucleotides with backbone modifications such as peptide nucleic acid (PNA) (M. Egholm et al., Nature, 365:566-568(1993)), phosphorothioate DNA, phosphorodithioate DNA, phosphoramidate DNA, amide-linked DNA, MMI-linked DNA, 2'-O-methyl RNA, alpha-DNA and methylphosphonate DNA, nucleotides with sugar modifications such as 2'-O-methyl RNA, 2'-fluoro RNA, 2'-amino RNA, 2'-O-alkyl DNA, 2'-O-allyl DNA, 2'-O-alkynyl DNA, hexose DNA, pyranosyl RNA, and anhydrohexitol DNA, and nucleotides having base modifications such as C-5 substituted
  • the oligonucleotide of this invention may include a base other than natural bases (A, T, C or G).
  • the oligonucleotide to be evaluated for specificity in the method of the present invention is a primer or a probe.
  • primer refers to an oligonucleotide, which is capable of acting as a point of initiation of synthesis when placed under conditions in which synthesis of primer extension product which is complementary to a nucleic acid strand (template) is induced, i.e., in the presence of nucleotides and an agent for polymerization, such as DNA polymerase, and at a suitable temperature and pH.
  • the primer must be sufficiently long to prime the synthesis of extension products in the presence of the agent for polymerization. The exact length of the primers will depend on many factors, including temperature, application, and source of primer.
  • probe refers to a single-stranded nucleic acid molecule comprising a portion or portions that are substantially complementary to a target nucleic acid sequence.
  • the probe may contain a label capable of generating a signal for detection of a target nucleic acid sequence.
  • the 3'-end of the probe may be "blocked" to prohibit its extension.
  • the blocking may be achieved in accordance with conventional methods. For instance, the blocking may be performed by adding to the 3'-hydroxyl group of the last nucleotide a chemical moiety such as biotin, labels, a phosphate group, alkyl group, non-nucleotide linker, phosphorothioate or alkane-diol. Alternatively, the blocking may be carried out by removing the 3'-hydroxyl group of the last nucleotide or using a nucleotide with no 3'-hydroxyl group such as dideoxynucleotide.
  • annealing or “priming” as used herein refers to the apposition of an oligodeoxynucleotide or nucleic acid to a template nucleic acid, whereby the apposition enables the polymerase to polymerize nucleotides into a nucleic acid molecule which is complementary to the template nucleic acid or a portion thereof.
  • hybridizing used herein refers to the formation of a double-stranded nucleic acid from complementary single stranded nucleic acids. There is no intended distinction between the terms “annealing” and “hybridizing”, and these terms will be used interchangeably.
  • the oligonucleotide to be evaluated for specificity in the present invention is an oligonucleotide represented by the following Formula (I): 5'-X-Y-Z-3' (I), wherein:
  • X represents a portion comprising a hybridizing nucleotide sequence hybridized to a target nucleic acid sequence
  • Y represents a separation portion comprising two or more consecutive bases, each of which is not involved in Watson-Crick base pairs
  • Z represents a portion comprising a hybridizing nucleotide sequence hybridized to a target nucleic acid sequence.
  • the oligonucleotide of Formula (I) has three different portions with distinct properties, and its annealing specificity to a target sequence is dually determined by its two separate portions, i.e., the portion X and the portion Z.
  • the annealing specificities of conventional (typical) primers or probes are governed by their complete sequences.
  • the annealing specificity of the oligonucleotide of Formula (I) is dually determined by two separate portions, i.e., the portion X and the portion Z separated by the portion Y.
  • the portion Y comprises two or more consecutive bases, each of which is not involved in Watson-Crick base pairs.
  • a Watson-Crick base pair means that adenine (A) binds to thymine (T) or uracil (U), whilst guanine (G) binds to cytosine (C).
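The pairing rule stated above is small enough to express as a lookup. The Python sketch below is illustrative only; the function name `is_watson_crick` and the single-letter symbols (including `I` for inosine in the usage lines) are assumptions made for this example.

```python
# The three Watson-Crick pairs named in the text: A-T, A-U and G-C.
WATSON_CRICK_PAIRS = {frozenset("AT"), frozenset("AU"), frozenset("GC")}


def is_watson_crick(base: str, opposing: str) -> bool:
    """True if the two bases form one of the Watson-Crick pairs listed above."""
    return frozenset((base.upper(), opposing.upper())) in WATSON_CRICK_PAIRS


print(is_watson_crick("A", "T"))  # a natural pair
print(is_watson_crick("I", "C"))  # inosine opposite cytosine is not in the table
```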
  • the base not involved in Watson-Crick base pairs refers to any base which does not form a Watson-Crick base pair with an opposing base in a target nucleic acid sequence.
  • the base not involved in Watson-Crick base pairs includes any base showing a lower strength (low melting temperature) of base pairing between the base and an opposing base in a target nucleic acid sequence than that of the base pairing between natural bases.
  • the portion Y is designed to have the lowest Tm value among the three portions when the oligonucleotide anneals to a target nucleic acid sequence.
  • Examples of the base not involved in Watson-Crick base pairs include: (i) non-natural bases; (ii) universal bases; and (iii) mismatched bases.
  • the bases comprised in the separation portion Y are selected from non-natural bases; universal bases; mismatched bases and combinations thereof.
  • non-natural base refers to derivatives of natural bases such as adenine (A), guanine (G), thymine (T), cytosine (C) and uracil (U), which are capable of forming hydrogen-bonding base pairs with each other (see U.S. Pat. No. 8,440,406).
  • non-natural base includes bases having different base pairing patterns from natural bases as mother compounds, as described, for example, in U.S. Pat. Nos. 5,432,272, 5,965,364, 6,001,983, 6,037,120, and 8,440,406.
  • the base pairing between non-natural bases involves two or three hydrogen bonds, as with natural bases.
  • the base pairing between non-natural bases is also formed in a specific manner.
  • a non-natural base contained in an oligonucleotide of Formula (I) is not involved in Watson-Crick base pairs, if an opposing base in a target nucleic acid sequence is a natural base.
  • the base pairing between a non-natural base and an opposing base in a target nucleic acid sequence has a low strength (low melting temperature) compared to the base pairing between natural bases.
  • base pairing serves to generate a bubble structure and to separate the portions X and Z.
  • non-natural bases include the following bases in base pair combinations: iso-C/iso-G, iso-dC/iso-dG, K/X, H/J, and M/N (see U.S. Pat. Nos. 7,422,850 and 8,440,406).
  • universal base refers to one capable of forming base pairs with each of the natural DNA/RNA bases with little discrimination between them, the base pairs being not involved in Watson-Crick base pairs.
  • the base pairing between a universal base contained in the oligonucleotide of Formula (I) and an opposing base contained in the target nucleic acid sequence has a low strength (low melting temperature) compared to the base pairing between natural bases.
  • base pairing serves to generate a bubble structure and to separate the portions X and Z.
  • Examples of the universal base include deoxyinosine, inosine, 7-deaza-2'-deoxyinosine, 2-aza-2'-deoxyinosine, 2'-OMe inosine, 2'-F inosine, deoxy 3-nitropyrrole, 3-nitropyrrole, 2'-OMe 3-nitropyrrole, 2'-F 3-nitropyrrole, 1-(2'-deoxy-beta-D-ribofuranosyl)-3-nitropyrrole, deoxy 5-nitroindole, 5-nitroindole, 2'-OMe 5-nitroindole, 2'-F 5-nitroindole, deoxy 4-nitrobenzimidazole, 4-nitrobenzimidazole, deoxy 4-aminobenzimidazole, 4-aminobenzimidazole, deoxy nebularine, 2'-F nebularine, 2'-F 4-nitrobenzimidazole, PNA-5-nitroindole, PNA-nebularine,
  • the universal base is deoxyinosine, inosine, 1-(2'-deoxy-beta-D-ribofuranosyl)-3-nitropyrrole, or 5-nitroindole, more particularly, deoxyinosine or inosine.
  • mismatched base refers to a base which is not capable of forming hydrogen-bonding base pairs with an opposing base in a target nucleic acid sequence (see WO 2013/123552 and WO 2014/124290).
  • the type of the mismatched base may vary depending upon the type of an opposing base in a target nucleic acid.
  • the portion Y comprising the mismatched bases serves to generate a bubble structure and to separate the portions X and Z.
  • the portion Y may have two consecutive bases not involved in Watson-Crick base pairs, preferably three, four, five, six, seven, or more consecutive bases not involved in Watson-Crick base pairs.
  • the portion Y has 2-10, 2-9, 2-8, 2-7, 2-6, 2-5, 2-4 or 2-3 consecutive bases not involved in Watson-Crick base pairs, more particularly 3-10, 3-9, 3-8, 3-7, 3-6, 3-5 or 3-4 consecutive bases not involved in Watson-Crick base pairs, most particularly 4-10, 4-9, 4-8, 4-7, 4-6 or 4-5 consecutive bases not involved in Watson-Crick base pairs.
  • the portion Y has two consecutive non-natural bases, preferably three, four, five, six, seven, eight or more consecutive non-natural bases.
  • the portion Y has two consecutive universal bases, preferably three, four, five, six, seven, eight or more consecutive universal bases.
  • the portion Y has two consecutive mismatched bases, preferably three, four, five, six, seven, eight or more consecutive mismatched bases.
  • the portion Y has two, preferably three, four, five, six, seven, eight or more consecutive bases, each base being independently selected from non-natural bases, universal bases and mismatched bases.
  • the portions X and Z are portions, each having a hybridizing nucleotide sequence to a target nucleic acid sequence, i.e., portions, each having a hybridizing nucleotide sequence complementary to a site on a template nucleic acid to hybridize therewith.
  • oligonucleotide are sufficiently complementary to hybridize selectively to a target nucleic acid sequence under the designated annealing conditions or stringent conditions, encompassing the terms “substantially complementary” and “perfectly complementary”, preferably perfectly complementary.
  • portion X and/or the portion Z in the oligonucleotide of Formula (I) may have one or more mismatches to a template (target nucleic acid sequence) to an extent that it can serve as primer or probe.
  • the portion X and/or the portion Z in the oligonucleotide of Formula (I) can have 1-2, 1-3 or 1-4 non-complementary nucleotides.
  • portion X and/or the portion Z in the oligonucleotide of Formula (I) have a nucleotide sequence perfectly complementary to a site on a template, i.e., no mismatches.
  • the length of the portion X and the portion Z may be in the range from 3 to 50 nucleotide residues, respectively.
  • the portion X is longer than the portion Z.
  • the length of the portion X is 15 to 50, 15 to 40, 15 to 30, or 15 to 25 nucleotide residues, more particularly, 17 to 50, 17 to 40, 17 to 30, or 17 to 25 nucleotide residues, and most particularly, 20 to 50, 20 to 40, 20 to 30, or 20 to 25 nucleotide residues.
  • the length of the portion Z is 3 to 15, 3 to 12, or 3 to 10 nucleotide residues, more particularly, 5 to 15, 5 to 12, or 5 to 10 nucleotide residues, most particularly, 6 to 12 nucleotide residues.
  • the portion Z is longer than the portion X.
  • the length of the portion Z is 15 to 50, 15 to 40, 15 to 30, or 15 to 25 nucleotide residues, more particularly, 17 to 50, 17 to 40, 17 to 30, or 17 to 25 nucleotide residues, and most particularly, 20 to 50, 20 to 40, 20 to 30, or 20 to 25 nucleotide residues.
  • the length of the portion X is 3 to 15, 3 to 12, or 3 to 10 nucleotide residues, more particularly, 5 to 15, 5 to 12, or 5 to 10 nucleotide residues, most particularly, 6 to 12 nucleotide residues.
  • the Tm of each of the portions X and Z ranges from 6°C to 80°C, 6°C to 70°C, 6°C to 60°C, 6°C to 50°C, 6°C to 40°C, 10°C to 80°C, 10°C to 70°C, 10°C to 60°C, 10°C to 50°C, 10°C to 40°C, 20°C to 80°C, 20°C to 70°C, 20°C to 60°C, 20°C to 50°C, 20°C to 40°C, 30°C to 80°C, 30°C to 70°C, 30°C to 60°C, 30°C to 50°C, or 30°C to 40°C.
  • the Tm of the portion Y ranges from 1°C to 15°C, 1°C to 10°C, 1°C to 5°C, 2°C to 15°C, 2°C to 10°C, 2°C to 5°C, 3°C to 15°C, 3°C to 10°C, or 3°C to 5°C. In an embodiment, the Tm of the portion Y is lower than that of each of the portions X and Z.
  • the Tm of the portion X is higher than that of the portion Z. In a particular embodiment, the Tm of the portion X is 5°C, 10°C, 15°C, 20°C or 25°C higher than that of the portion Z. In another embodiment, the Tm of the portion Z is higher than that of the portion X. In a particular embodiment, the Tm of the portion Z is 5°C, 10°C, 15°C, 20°C or 25°C higher than that of the portion X.
  • either or both of the X and Z portions may comprise at least one universal base or degenerate base.
  • the universal bases are not present contiguously in the oligonucleotide sequence, but are present separately.
  • the Y portion also contains two or more consecutive universal bases
  • the two or more universal bases contained in either or both of the X portion and the Z portion are distinguished from two or more consecutive universal bases in the Y portion, in that these are present separately in the sequence.
  • the universal bases are present contiguously in the sequence of the oligonucleotide.
  • the Y portion also contains two or more consecutive universal bases
  • the two or more universal bases contained in either or both of the X portion and the Z portion are not distinguished from two or more consecutive universal bases in the Y portion. In this case, any one of them may be treated or regarded as the Y portion.
  • universal bases closer to the 5' end may be treated as the Y portion, and a portion at the 5' end around the Y portion is treated as the X portion and a portion at the 3' end around the Y portion is treated as the Z portion.
  • a region distant from (distal to) the 5' end may be treated as the Y portion, and a portion at the 5' end around the Y portion is treated as the X portion and a portion at the 3' end around the Y portion is treated as the Z portion.
  • a region having more universal bases is treated as the Y portion, and a portion at the 5' end around the Y portion is treated as the X portion and a portion at the 3' end around the Y portion is treated as the Z portion.
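The rules above for choosing which run of universal bases is treated as the Y portion can be sketched as follows. This is a minimal illustration, not the method of the source: it assumes the oligonucleotide is written 5' to 3' as a plain string, that the universal base is encoded by the single letter `I` (for deoxyinosine), and the function name `split_xyz` and the policy names are hypothetical.

```python
import re


def split_xyz(oligo, universal="I", policy="longest"):
    """Split a 5'->3' oligo string into (X, Y, Z).

    Y is a run of two or more consecutive universal bases. When several
    runs exist, `policy` picks one of them (all names are illustrative):
      "nearest_5prime"  - the run closest to the 5' end
      "farthest_5prime" - the run most distant from the 5' end
      "longest"         - the run with the most universal bases
    `universal` is assumed to be a single character.
    """
    pattern = re.escape(universal) + "{2,}"
    runs = [m.span() for m in re.finditer(pattern, oligo)]
    if not runs:
        raise ValueError("no run of two or more consecutive universal bases")
    if policy == "nearest_5prime":
        start, end = runs[0]
    elif policy == "farthest_5prime":
        start, end = runs[-1]
    elif policy == "longest":
        # On a tie, max() keeps the first (most 5') run.
        start, end = max(runs, key=lambda r: r[1] - r[0])
    else:
        raise ValueError(f"unknown policy: {policy}")
    # Everything 5' of Y is X, everything 3' of Y is Z.
    return oligo[:start], oligo[start:end], oligo[end:]


x, y, z = split_xyz("ACGTACGTACGTACGTACGTIIIIIACGTACGT")
print(len(x), y, len(z))
```

Isolated universal bases (single `I` characters) are left inside X or Z, which mirrors the distinction drawn above between separately placed universal bases and the consecutive ones of the Y portion.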
  • degenerate base means that any of the four bases (A, C, G or T) or a specific subset of the four bases (2 or 3 bases) may be present at the indicated position. In other words, the term means that more than one base is possible at a particular position. An oligonucleotide can be synthesized with multiple bases at the same position; such a position is termed a degenerate base, and is also sometimes referred to as a "wobble" position or "mixed base".
  • the degenerate bases may have different extent of degeneracy.
  • extent of degeneracy refers to the number of bases that can occupy a given nucleotide position. "Full degeneracy” results when any of the four bases (A, C, G or T) can occupy a given degenerate position.
  • four oligonucleotides, composed of an oligonucleotide having the base A at a given degenerate position, an oligonucleotide having the base C at the position, an oligonucleotide having the base G at the position, and an oligonucleotide having the base T at the position, may be used together.
  • "partial degeneracy" results when a given degenerate position can be occupied by a specific subset of four bases (2-3) such as A/G, C/T, A/C/G, A/T/G, or the like.
  • the IUB degenerate codes for nucleotide bases are used herein.
  • R means either of the purine bases A or G
  • Y means either of the pyrimidine bases C or T
  • M means either of the amino bases A or C
  • K means either of the keto bases G or T
  • S means either of the stronger hydrogen bonding partners C or G
  • W means either of the weaker hydrogen bonding partners A or T
  • H means A, C or T
  • B means G, T or C
  • V means G, C or A
  • D means G, A or T
  • N means G, A, C or T.
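The degenerate codes listed above, and the notion of extent of degeneracy, can be illustrated with a short sketch. The table simply transcribes the codes given above; the function names `degeneracy` and `expand` are hypothetical. Note that the code letter `Y` in this table is the pyrimidine code, not the portion Y of Formula (I).

```python
from itertools import product

# Degenerate codes as listed above (single letter -> allowed bases).
IUB = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "M": "AC", "K": "GT", "S": "CG", "W": "AT",
    "H": "ACT", "B": "CGT", "V": "ACG", "D": "AGT", "N": "ACGT",
}


def degeneracy(seq):
    """Number of distinct concrete sequences encoded by a degenerate sequence."""
    total = 1
    for base in seq:
        total *= len(IUB[base])
    return total


def expand(seq):
    """List every concrete sequence encoded by a degenerate sequence."""
    return ["".join(combo) for combo in product(*(IUB[b] for b in seq))]


print(degeneracy("ACGRN"))  # R has 2 options, N has 4
print(expand("AY"))
```

A fully degenerate position (`N`) multiplies the number of oligonucleotides in the mixture by four, and a partially degenerate position by two or three.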
  • the oligonucleotide represented by Formula (I) is a dual specificity oligonucleotide (referred to as DSO or DPO) as disclosed in WO 2006/095981. Details of the dual specificity oligonucleotide are found supra.
  • the oligonucleotide represented by Formula (I) is a target discriminative (TD) probe as disclosed in WO 2011/028041. Details of the target discriminative probe are found supra.
  • the oligonucleotide of Formula (I) provided in this step may be a pre-existing oligonucleotide (primer or probe).
  • the oligonucleotide of Formula (I) provided in this step may be an oligonucleotide designed based on the target nucleic acid sequence to be amplified or detected.
  • the oligonucleotide may be one which is designed manually or by using a design program well known in the art.
  • primer/probe design programs include, without limitation, Primer3 (http://frodo.wi.mit.edu/), Visual OMP™ software (DNA Software, Inc., Ann Arbor, Mich.), Integrated DNA Technology (IDT) OligoAnalyzer 3.0 program (http://scitools.idtdna.com/Analvzer/oligocalc.asp), DINAmelt™ program (http://dinamelt.bioinfo.rpi.edu/), OLIGO 7 (Wojciech Rychlik (2007) "OLIGO 7 Primer Analysis Software", Methods Mol. Biol. 402: 35-60) and Primer Express 3.0 software (Applied Biosystems USA).
  • the oligonucleotide of Formula (I) is designed such that its X and Z portions have a sequence that can be substantially hybridized to the target nucleic acid sequence.
  • the X and Z portions in the oligonucleotide of Formula (I) are designed to match (have a significant sequence similarity to) a specific region of the target nucleic acid sequence.
  • When the oligonucleotide of Formula (I) is intended for amplifying or detecting a plurality of target nucleic acid sequences (for example, a nucleotide sequence having genetic diversity; a group consisting of genetically identical gene families, i.e., a gene and variants thereof; a group of a gene and its subtypes), the oligonucleotide may be prepared by aligning the plurality of target nucleic acid sequences, finding a common sequence, i.e., a conserved region, and designing an oligonucleotide sequence to match the conserved region.
  • the oligonucleotide of Formula (I) may be designed to have 100% identity with a plurality of target nucleic acid sequences.
  • the oligonucleotide of Formula (I) may be designed to have a few mismatches for a plurality of target nucleic acid sequences, as long as it can be hybridized to the target nucleic acid sequences under controlled hybridization conditions (e.g., temperature).
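The design route described above (align several target sequences, find a conserved region, design the oligonucleotide against it) can be sketched as follows. This is only an illustration under stated assumptions: the input is assumed to be an already computed multiple alignment given as equal-length strings with `-` for gaps, a column counts as conserved only when every sequence carries the same base there, and the name `conserved_windows` is hypothetical.

```python
def conserved_windows(aligned, min_len):
    """Return half-open (start, end) column ranges of at least `min_len`
    columns in which all aligned sequences carry the same, non-gap base."""
    if len({len(s) for s in aligned}) != 1:
        raise ValueError("aligned sequences must all have the same length")
    length = len(aligned[0])
    windows = []
    start = None
    # Iterate one column past the end so a trailing window is closed too.
    for i in range(length + 1):
        conserved = (
            i < length
            and aligned[0][i] != "-"
            and len({s[i] for s in aligned}) == 1
        )
        if conserved and start is None:
            start = i
        elif not conserved and start is not None:
            if i - start >= min_len:
                windows.append((start, i))
            start = None
    return windows


targets = ["ACGTACGTTT", "ACGTACGATT", "ACGTACGCTT"]
print(conserved_windows(targets, min_len=5))
```

Relaxing the all-identical requirement (for example, allowing one variant base per column) would correspond to designing an oligonucleotide with a few mismatches or with degenerate bases at the variable positions.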
  • the oligonucleotide of Formula (I) may be one of a plurality of candidate oligonucleotides designed based on a target nucleic acid sequence(s).
  • One of skill in the art can design a plurality of candidate oligonucleotides of Formula (I) based on a known target nucleic acid sequence(s), and the oligonucleotide of Formula (I) used in the method of the present invention may be one of the plurality of candidate oligonucleotides.
  • the oligonucleotide of Formula (I) may be one of the oligonucleotides used in multiplex amplification or detection.
  • the oligonucleotide of Formula (I) may be one of a plurality of oligonucleotides (or candidate oligonucleotides) for amplifying or detecting a plurality of target nucleic acid sequences.
  • the oligonucleotide of Formula (I) may be one of a pair of primers (i.e., a forward primer and a reverse primer) for amplifying a target nucleic acid sequence.
  • the oligonucleotide of Formula (I) is one which can be used for PCR or real-time PCR.
  • the oligonucleotide of Formula (I) is one which is useful in a variety of fields, for example (i) Miller, H. I method (WO 89/06700) and Davey, C.
  • oligonucleotide of the present invention is one which can be applied to various nucleic acid amplification, sequencing, and hybridization-related techniques.
  • the complete or a partial sequence of the oligonucleotide of Formula (I) is compared against at least one database of nucleotide sequences, and reference nucleotide sequences comprising a region homologous to the complete or a partial sequence of the oligonucleotide of Formula (I) are extracted from the database 120 .
  • database of nucleotide sequences refers to a set or collection of data relating to two or more nucleotide sequences derived from various sources.
  • the database of nucleotide sequences may comprise information related to nucleotide sequences, for example, their specific sequences and identities.
  • the database may be publicly available, commercially available, or generated by the inventor.
  • the database is a collection arranged for ease and speed of search and retrieval by a computer.
  • databases well known in the art include, but are not limited to, a GenBank database, an EST database, an EMBL nucleotide sequence database, an Entrez nucleotide database, and a LIFESEQ™ database.
  • the database of nucleotide sequences herein may also be referred to as a "reference database”.
  • the database to be compared with the oligonucleotide of Formula (I) herein may be any of the databases described above, or a combination thereof.
  • the comparison of the complete or a partial sequence of the oligonucleotide of Formula (I) against at least one database of nucleotide sequences in this step (b) involves searching the database using a sequence alignment algorithm or program. Also, the comparison of the complete or a partial sequence of the oligonucleotide of Formula (I) against at least one database of nucleotide sequences in this step (b) involves aligning the complete or a partial sequence of the oligonucleotide with each of the nucleotide sequences in the database using a sequence alignment algorithm or program.
  • the comparison of the complete or a partial sequence of the oligonucleotide of Formula (I) against at least one database of nucleotide sequences in this step (b) involves aligning the complete or a partial sequence of the oligonucleotide with each of the nucleotide sequences in the database and analyzing the alignments.
  • the comparison of the complete or a partial sequence of the oligonucleotide of Formula (I) against at least one database of nucleotide sequences in this step (b) involves aligning the complete or a partial sequence of the oligonucleotide with each of the nucleotide sequences in the database and determining homology or similarity therebetween.
  • the comparison between two sequences, i.e., between the complete or a partial sequence of the oligonucleotide of Formula (I) and nucleotide sequences in a database, may be performed using a sequence alignment algorithm or program.
  • sequence alignment algorithm or program is well known in the art.
  • sequence alignment algorithms or programs include the local homology algorithm of Smith and Waterman (1981, Adv. Appl. Math. 2: 482), the homology alignment algorithm of Needleman and Wunsch (1970, J. Mol. Biol.), the search for similarity via the method of Pearson and Lipman (1988, Proc. Nat'l. Acad. Sci. USA 85: 2444), computerized implementations of these algorithms (GAP, BESTFIT, FASTA and TFASTA in the Wisconsin Genetics Software Package Release 7.0, Genetic Computer Group, 575, Science Drive, Madison, Wisconsin), manual alignment, and inspection.
  • sequence alignment algorithm or program is selected from the group consisting of Smith & Waterman, Needleman-Wunsch, BLAST, and FASTA algorithm or program.
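As an illustration of one of the algorithms named above, the following is a compact sketch of Smith-Waterman local alignment with a linear gap penalty. The scoring values (match 2, mismatch -1, gap -2) are arbitrary assumptions chosen for the example, not values taken from the source, and production tools such as BLAST use heuristics rather than this exhaustive dynamic programming.

```python
def smith_waterman(query, subject, match=2, mismatch=-1, gap=-2):
    """Return (best_score, aligned_query, aligned_subject) for the best
    local alignment, using a linear gap penalty."""
    rows, cols = len(query) + 1, len(subject) + 1
    score = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)

    def sub(i, j):
        # Substitution score for query[i-1] against subject[j-1].
        return match if query[i - 1] == subject[j - 1] else mismatch

    # Fill the matrix; local alignment never lets a cell drop below zero.
    for i in range(1, rows):
        for j in range(1, cols):
            score[i][j] = max(
                0,
                score[i - 1][j - 1] + sub(i, j),
                score[i - 1][j] + gap,
                score[i][j - 1] + gap,
            )
            if score[i][j] > best:
                best, best_pos = score[i][j], (i, j)

    # Trace back from the best cell until a zero cell is reached.
    i, j = best_pos
    q_aln, s_aln = [], []
    while i > 0 and j > 0 and score[i][j] > 0:
        if score[i][j] == score[i - 1][j - 1] + sub(i, j):
            q_aln.append(query[i - 1])
            s_aln.append(subject[j - 1])
            i, j = i - 1, j - 1
        elif score[i][j] == score[i - 1][j] + gap:
            q_aln.append(query[i - 1])
            s_aln.append("-")
            i -= 1
        else:
            q_aln.append("-")
            s_aln.append(subject[j - 1])
            j -= 1
    return best, "".join(reversed(q_aln)), "".join(reversed(s_aln))


print(smith_waterman("ACGT", "TTACGTTT"))
```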
  • sequence alignment algorithms or programs use appropriate parameters to find a region homologous to an oligonucleotide (query sequence).
  • sequence alignment algorithm or program used in the method of the present invention may employ parameters set to default values, or may employ parameters adjusted appropriately by those skilled in the art.
  • a representative sequence alignment algorithm or program, the BLAST algorithm uses parameters such as E-value, Reward/penalty, Gap penalty, Gap creation, Word size, Scoring matrix, PSSM, Filter, and the like.
  • the parameters in the sequence alignment algorithm or program may be appropriately adjusted by one skilled in the art, in order to control the amount (number) of reference nucleotide sequences to be extracted, through regulation of the degree (extent) of homology (homology cutoff) between the complete or a partial sequence of the oligonucleotide of Formula (I) and each of the reference nucleotide sequences in the database.
  • when the oligonucleotide of Formula (I) is short in length, it is preferable to decrease the word size and to increase the E value as compared with their default values, in order to increase match probability.
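A sketch of how such parameter adjustment might look in practice, assuming the NCBI BLAST+ command-line program `blastn` is the tool in use. The flags shown (`-task blastn-short`, `-word_size`, `-evalue`, `-query`, `-db`, `-outfmt`) are BLAST+ options; the file name, database name, and the particular values 7 and 1000 are illustrative assumptions, not values prescribed by the source. The function only builds the argument list and does not run BLAST.

```python
def blastn_short_command(query_fasta, database, word_size=7, evalue=1000.0):
    """Build a blastn argument list tuned for a short oligonucleotide query.

    A smaller word size and a larger E-value cutoff than the defaults make
    it more likely that short matches are reported. Values are illustrative.
    """
    return [
        "blastn",
        "-task", "blastn-short",   # task preset intended for short queries
        "-query", query_fasta,
        "-db", database,
        "-word_size", str(word_size),
        "-evalue", str(evalue),
        "-outfmt", "6",            # tabular output, one hit per line
    ]


# Hypothetical file and database names.
cmd = blastn_short_command("oligo.fasta", "nt")
print(" ".join(cmd))
```

The resulting list could be passed to `subprocess.run` on a machine where BLAST+ and a formatted database are installed.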
  • the sequence alignment algorithm or program used in the present invention may be an algorithm or program developed by the present inventors.
  • the algorithm or program may be one developed to evaluate specificity of an oligonucleotide comprising two or more contiguous bases, each of which is not involved in Watson-Crick base pairs, or optionally comprising non-contiguous universal bases or degenerate bases within its sequence.
  • the algorithm or program may not consider the sequence of the Y portion in the oligonucleotide of Formula (I).
  • the algorithm or program does not consider the homology between the sequence of the Y portion in the oligonucleotide of Formula (I) and a corresponding reference nucleotide sequence in the database. That is, the comparison using the above algorithm or program may include determination of homology in two portions X and Z except for the portion Y.
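The effect of leaving the Y portion out of the homology determination can be shown with a small sketch. It assumes the oligonucleotide and the reference region are already aligned base-for-base without gaps and are written in the same strand sense, that `I` encodes a universal base, and that homology is measured as simple percent identity; the function names are hypothetical.

```python
def percent_identity(a, b):
    """Percent of positions at which two equal-length sequences agree."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    matches = sum(1 for p, q in zip(a, b) if p == q)
    return 100.0 * matches / len(a)


def identity_excluding_y(oligo, region, y_start, y_end):
    """Percent identity over the X and Z portions only.

    The half-open column range [y_start, y_end) holding the Y portion is
    removed from both sequences before comparison.
    """
    return percent_identity(
        oligo[:y_start] + oligo[y_end:],
        region[:y_start] + region[y_end:],
    )


oligo = "ACGTACGTAC" + "IIIII" + "GTACG"    # X (10) + Y (5) + Z (5)
region = "ACGTACGTAC" + "TTGCA" + "GTACG"   # reference region of equal length
print(percent_identity(oligo, region))          # Y counted as mismatches
print(identity_excluding_y(oligo, region, 10, 15))
```

In this example the universal bases drag the naive identity down even though X and Z match perfectly, which is the distortion that ignoring the Y portion is meant to avoid.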
  • reference nucleotide sequences comprising a region homologous to the complete or a partial sequence of the oligonucleotide of Formula (I) are extracted from the database.
  • reference nucleotide sequence refers to a sequence within a database, which comprises a region homologous to the complete or a partial sequence of the oligonucleotide of Formula (I).
  • the number of reference nucleotide sequences extracted may be at least one.
  • Each of the reference nucleotide sequences comprises a homologous region and optionally its flanking regions.
  • region homologous refers to a particular region within a reference nucleotide sequence from a database, which is identical or similar to the complete or a partial sequence of the oligonucleotide of Formula (I).
  • the homologous region refers to a specific region within a reference nucleotide sequence that matches the complete or a partial sequence of the oligonucleotide of Formula (I).
  • the extracted reference nucleotide sequences may have homologous sequences of different sizes.
  • the homologous region is the same length as the oligonucleotide provided in step (a).
  • the reference nucleotide sequences extracted by the BLAST algorithm may include a homologous region of the same length as the oligonucleotide provided in step (a).
  • the homologous region is the same length as and has homology with the complete sequence of the oligonucleotide provided in step (a).
  • the homologous region is shorter than the oligonucleotide provided in step (a).
  • the oligonucleotide provided in step (a) comprises a relatively large number of contiguous bases not involved in Watson-Crick base pairs (e.g. , four, five or six or more universal bases)
  • the reference nucleotide sequences extracted by the BLAST may include a homologous region shorter than the oligonucleotide provided in step (a).
  • an oligonucleotide represented by 5'-X-Y-Z-3' is compared against a database using the BLAST
  • reference nucleotide sequences comprising a region homologous to only the X portion (the homologous region having the same length as the portion X) may be extracted.
  • the homologous region is shorter than the complete sequence of the oligonucleotide provided in step (a), and has homology with a partial sequence of the oligonucleotide, i.e. , the X portion.
  • a region homologous to the complete or a partial sequence of the oligonucleotide indicates a region within a reference nucleotide sequence, which has a substantial homology (similarity) to the complete or a partial sequence of the oligonucleotide.
  • the substantial homology indicates that the homology between the region within the reference nucleotide sequence and the complete or a partial sequence of the oligonucleotide is higher than a defined or selected degree of homology (a certain threshold).
  • the defined degree of homology refers to a criterion or threshold for extracting, from a database, reference nucleotide sequences having high similarity or homology with a designed oligonucleotide.
  • the defined degree of homology may be 50% or more, 60% or more, 70% or more, 80% or more, 90% or more, 91% or more, 92% or more, 93% or more, 94% or more, 95% or more, 96% or more, 97% or more, 98% or more, or 99% or more, based on the total number of bases in either of two aligned nucleotide sequences.
  • the defined degree of homology between the sequence in either of the portions X and Z of the oligonucleotide and a corresponding reference nucleotide sequence is 90% or more, 91% or more, 92% or more, 93% or more, 94% or more, 95% or more, 96% or more, 97% or more, 98% or more, or 99% or more, based on the total number of bases in either of two aligned nucleotide sequences.
  • the defined degree of homology between the sequence in the portion X of the oligonucleotide and a corresponding reference nucleotide sequence is at least 90% or more, 91% or more, 92% or more, 93% or more, 94% or more, 95% or more, 96% or more, 97% or more, 98% or more, or 99% or more, and the degree of homology between the sequence in the portion Z of the oligonucleotide and the homologous region in the corresponding reference nucleotide sequence is at least 90% or more, 91% or more, 92% or more, 93% or more, 94% or more, 95% or more, 96% or more, 97% or more, 98% or more, or 99% or more, based on the total number of bases in either of two aligned nucleotide sequences.
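Applying a defined degree of homology separately to the X and Z portions amounts to a simple filter over candidate hits. The sketch below assumes per-portion percent identities have already been computed; the `Hit` record, the accession strings, and the 90% defaults are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    accession: str      # hypothetical identifier of a reference sequence
    x_identity: float   # percent identity over the X portion
    z_identity: float   # percent identity over the Z portion


def passes_cutoff(hit, x_cutoff=90.0, z_cutoff=90.0):
    """True when both portions meet their own homology threshold."""
    return hit.x_identity >= x_cutoff and hit.z_identity >= z_cutoff


hits = [
    Hit("REF_A", 100.0, 100.0),
    Hit("REF_B", 95.0, 66.7),
    Hit("REF_C", 90.0, 91.7),
]
extracted = [h.accession for h in hits if passes_cutoff(h)]
print(extracted)
```

Raising or lowering the cutoffs controls how many reference nucleotide sequences are extracted.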
  • the complete sequence of the oligonucleotide of Formula (I) is used in the comparison of step (b).
  • reference nucleotide sequences comprising a region homologous to the complete sequence of the oligonucleotide of Formula (I) may be extracted from a database in step (b).
  • the complete sequence of an oligonucleotide consisting of 30 nucleotide residues is compared against a GenBank database, and reference nucleotide sequences, each comprising a homologous region of 30 nucleotides in length, may be extracted from the database in step (b).
  • reference nucleotide sequences comprising a region homologous to a partial sequence of the oligonucleotide of Formula (I) (e.g., the portion X, the portion Z or part thereof) may be extracted from the database in step (b).
  • the complete sequence of an oligonucleotide of 30 nucleotides in length is compared against a GenBank database, and reference sequences comprising a homologous region of less than 30 nucleotides in length may be extracted from the database in step (b).
  • a partial sequence of the oligonucleotide of Formula (I) is used in the comparison of step (b).
  • the partial sequence of the oligonucleotide of Formula (I) used in the comparison of step (b) may be the portion X, the portion Z, or a part thereof.
  • reference nucleotide sequences comprising a region homologous to the partial sequence of the oligonucleotide of Formula (I) may be extracted from a database in step (b). For example, only the portion X consisting of 15 nucleotide residues is compared against a GenBank database, and reference nucleotide sequences comprising a homologous region of 15 nucleotides in length may be extracted from the database in step (b).
  • reference nucleotide sequences comprising a region homologous to a part of the partial sequence of the oligonucleotide of Formula (I) may be extracted from a database in step (b). For example, only the portion X consisting of 15 nucleotide residues is compared against a GenBank database, and reference nucleotide sequences comprising a homologous region of less than 15 nucleotides in length may be extracted from the database in step (b).
  • only the sequence of the X portion in the oligonucleotide is compared against at least one database of nucleotide sequences, and then reference nucleotide sequences comprising a region homologous to the X portion may be extracted from the database in step (b).
  • only the sequence of the Z portion in the oligonucleotide is compared against at least one database of nucleotide sequences, and then reference nucleotide sequences comprising a region homologous to the Z portion may be extracted from the database in step (b).
  • only the sequence of the part of the X portion in the oligonucleotide is compared against at least one database of nucleotide sequences, and then reference nucleotide sequences comprising a region homologous to the part of the X portion may be extracted from the database in step (b).
  • only the sequence of the part of the Z portion in the oligonucleotide is compared against at least one database of nucleotide sequences, and then reference nucleotide sequences comprising a region homologous to the part of the Z portion may be extracted from the database in step (b).
  • the comparison i.e. , homology determination
  • the homology determination is characterized by using a partial sequence, particularly a partial sequence except for the Y portion.
  • a partial sequence rather than the complete sequence of the oligonucleotide prevents the Y portion from adversely affecting the homology determination, thereby enabling the extraction of reference nucleotide sequences with more precise homology.
  • the use of a partial sequence of the oligonucleotide makes it possible to avoid the problem that the homologous region is misjudged because of the bases not involved in Watson-Crick base pairs, contained in the Y portion.
  • the reference nucleotide sequences extracted according to any of the above embodiments are those comprising a region homologous to the sequence in the X or Z portion, or a part thereof.
  • An exemplary procedure is illustrated in Fig. 2, in which only the sequence of the X portion in the oligonucleotide is compared against a database of nucleotide sequences, and then reference nucleotide sequences comprising a region homologous to the sequence of the X portion are extracted from the database.
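The X-portion-only comparison can be mimicked on a toy scale with an ungapped sliding-window search. This is a stand-in for a real alignment program, not the procedure of Fig. 2 itself: the "database" is a small in-memory dictionary with made-up names and sequences, the X portion is compared in the same strand sense as the references, and the 90% identity cutoff is an assumed value.

```python
def find_homologous_regions(x_portion, database, min_identity=90.0):
    """Scan every reference for windows of len(x_portion) whose percent
    identity to the X portion is at least `min_identity`.

    Returns a list of (reference_name, start_index, percent_identity).
    """
    n = len(x_portion)
    hits = []
    for name, seq in database.items():
        for start in range(len(seq) - n + 1):
            window = seq[start:start + n]
            matches = sum(1 for a, b in zip(x_portion, window) if a == b)
            identity = 100.0 * matches / n
            if identity >= min_identity:
                hits.append((name, start, identity))
    return hits


# Toy database; names and sequences are invented for illustration.
toy_db = {
    "ref1": "TTTTACGTACGTACGGGG",
    "ref2": "CCCCCCCCCCCCCCCC",
    "ref3": "GGACGTACGAACGTT",
}
for hit in find_homologous_regions("ACGTACGTAC", toy_db):
    print(hit)
```

The start index of each hit is what later allows the flanking regions of the reference to be lined up against the Y and Z portions.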
  • portion-by-portion match/mismatch between the oligonucleotide of Formula (I) and each of the reference nucleotide sequences is analyzed, and (i) the number or ratio of matched or mismatched bases between the portion X of the oligonucleotide of Formula (I) and each of the reference nucleotide sequences and individually (ii) the number or ratio of matched or mismatched bases between the portion Z of the oligonucleotide of Formula (I) and each of the reference nucleotide sequences are provided 130 .
  • a match/mismatch between the oligonucleotide of Formula (I) provided in step (a) and each of the reference nucleotide sequences extracted in step (b) is analyzed portion-by-portion.
  • portion-by-portion match/mismatch refers to the match/mismatch in each portion of the oligonucleotide of Formula (I).
  • the term is used interchangeably with "local match/mismatch”.
  • analyzing portion-by-portion match/mismatch indicates analyzing a match/mismatch per each portion of the oligonucleotide of Formula (I).
  • analyzing portion-by-portion match/mismatch between the oligonucleotide of Formula (I) and each of the reference nucleotide sequences indicates analyzing match/mismatch between the sequence of each of the portions X and Z in the oligonucleotide of Formula (I) and the sequence of a corresponding portion in each of the reference nucleotide sequences.
  • portion-by-portion match/mismatch involves: comparing the sequence of the portion X in the oligonucleotide of Formula (I) with a corresponding sequence in each of the reference nucleotide sequences to calculate match/mismatches therebetween and comparing the sequence of the portion Z in the oligonucleotide of Formula (I) with a corresponding sequence in each of the reference nucleotide sequences to calculate match/mismatches therebetween.
  • the numbers or ratios of matched or mismatched bases in the portions X and Z are conducive to evaluation of specificity of the oligonucleotides of Formula (I). Thus, these are collectively referred to herein as information on specificity.
  • oligonucleotide comprising consecutive universal bases in a sequence, such as a dual specificity oligonucleotide
  • the specificity is determined dually by the X portion and the Z portion separated by the consecutive universal bases. Therefore, it is very important to check the annealing specificity in each of the X and Z portions of the oligonucleotide, for evaluation of specificity of the oligonucleotide.
  • the method of the present invention provides individual match/mismatch results in the X and Z portions.
  • the user can evaluate specificity of the oligonucleotide in a more accurate manner based on the results.
  • the numbers or ratios of matched or mismatched bases in each of the X and Z portions are provided for all extracted reference nucleotide sequences.
  • the user can ascertain whether the designed oligonucleotide is hybridized only to a target nucleic acid sequence.
  • the presence of mismatches between the Z portion of the oligonucleotide and the target nucleic acid sequence provides a strong basis for the user to select other oligonucleotides instead of the designed oligonucleotide.
  • the presence of mismatches in the X portion provides a hint for the user to decide whether to use the oligonucleotide in view of hybridization conditions, since the oligonucleotide even with mismatched base pairs in the X portion may hybridize to a target nucleic acid sequence under certain conditions.
  • match/mismatch results of the X and Z portions are very useful in evaluating specificity of oligonucleotides of Formula (I).
  • the numbers or ratios of matched or mismatched bases provided in this step may be calculated by comparing the sequence of the X portion in the oligonucleotide with the corresponding sequence in each of the reference nucleotide sequences and comparing the sequence of the Z portion in the oligonucleotide with the corresponding sequence in each of the reference nucleotide sequences.
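The portion-by-portion calculation described above reduces to counting matches separately over the X columns and the Z columns of an alignment. The sketch below assumes an ungapped, base-for-base alignment between the complete oligonucleotide and a reference stretch of the same length, written in the same strand sense; the function name and the dictionary layout of the report are hypothetical.

```python
def portion_report(oligo, region, x_len, y_len):
    """Count matched and mismatched bases separately for X and Z.

    `region` is the reference stretch aligned base-for-base with the whole
    oligo (X, then Y, then Z). The Y columns are skipped entirely.
    """
    if len(oligo) != len(region):
        raise ValueError("oligo and region must have the same length")
    z_start = x_len + y_len

    def count(a, b):
        matched = sum(1 for p, q in zip(a, b) if p == q)
        return {
            "matched": matched,
            "mismatched": len(a) - matched,
            "match_ratio": matched / len(a),
        }

    return {
        "X": count(oligo[:x_len], region[:x_len]),
        "Z": count(oligo[z_start:], region[z_start:]),
    }


oligo = "ACGTACGTAC" + "IIIII" + "GTACGT"    # X (10) + Y (5) + Z (6)
region = "ACGTACCTAC" + "AAAAA" + "GTACGA"   # one mismatch in X, one in Z
report = portion_report(oligo, region, x_len=10, y_len=5)
print(report["X"])
print(report["Z"])
```

Reporting the two portions separately lets a user see, for instance, that a single mismatch falls in the short Z portion rather than in the longer X portion.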
  • the complete sequence of the oligonucleotide of Formula (I) is aligned (arranged) with each of the extracted reference nucleotide sequences on the basis of the homology region thereof, and the numbers or ratios of matched or mismatched bases are then analyzed in the X and Z portions.
  • alignment information can be obtained when a reference nucleotide sequence is extracted.
  • when the complete sequence of the oligonucleotide of Formula (I) is compared against at least one database of nucleotide sequences and reference nucleotide sequences comprising a region homologous to the complete sequence of the oligonucleotide of Formula (I) are extracted in step (b), the portion-by-portion match/mismatch between the complete sequence of the oligonucleotide of Formula (I) and the homologous region in each of the reference nucleotide sequences is analyzed, and the numbers or ratios of matched or mismatched bases in the portions X and Z are provided.
  • when the complete sequence of an oligonucleotide of Formula (I) of 40 nucleotides in length is compared against at least one database of nucleotide sequences and reference nucleotide sequences comprising a region (40 nucleotides in length) homologous to the complete sequence of the oligonucleotide of Formula (I) are extracted in step (b), the numbers or ratios of matched or mismatched bases in the portions X and Z can be directly calculated, because the homologous region already contains sequences corresponding to the portions X and Z in the oligonucleotide of Formula (I).
  • when the complete sequence of the oligonucleotide of Formula (I) is compared to at least one database of nucleotide sequences and reference nucleotide sequences comprising a region homologous to a partial sequence of the oligonucleotide of Formula (I) are extracted in step (b), the portion-by-portion match/mismatch between the complete sequence of the oligonucleotide of Formula (I) and the homologous region and its flanking regions of each of the reference nucleotide sequences is analyzed, and the numbers or ratios of matched or mismatched bases in the portions X and Z are provided.
  • when an oligonucleotide of Formula (I) of 40 nucleotides in length is compared against at least one database of nucleotide sequences and reference nucleotide sequences comprising a region (e.g., 10-15, 10-20, 10-30 or 10-35 nucleotides in length) homologous to a partial sequence of the oligonucleotide of Formula (I) are extracted in step (b), the numbers or ratios of matched or mismatched bases in the portions X and Z cannot be directly calculated, because the homologous region may not contain sequences corresponding to the portions X and Z in the oligonucleotide of Formula (I).
  • flanking regions are further used for the calculation of the numbers or ratios of matched or mismatched bases in the portions X and Z.
  • the complete sequence of the oligonucleotide of Formula (I) is compared with a corresponding sequence in each of the reference nucleotide sequences comprising the homologous region and its flanking regions, to calculate the numbers or ratios of matched or mismatched bases in the portions X and Z.
  • flanking regions refer to the remaining regions except for the homologous region in the reference nucleotide sequence.
  • the flanking regions include a region corresponding to the Y portion and a region corresponding to the Z portion.
  • the flanking regions include a region corresponding to the Y portion and a region corresponding to the X portion.
  • a partial sequence of the oligonucleotide of Formula (I) is compared to at least one database of nucleotide sequences and the reference nucleotide sequences comprising a region homologous to the partial sequence of the oligonucleotide of Formula (I) are extracted in step (b), the portion-by-portion match/mismatch between the complete sequence of the oligonucleotide of Formula (I) and the homologous region and its flanking regions of each of the reference nucleotide sequences is analyzed, and the numbers or ratios of matched or mismatched bases in the portions X and Z are provided.
  • a partial sequence (e.g., 10-15, 10-20, 10-30 or 10-35 nucleotides in length)
  • oligonucleotide of Formula (I) of 40 nucleotides in length is compared against at least one database of nucleotide sequences, and then reference nucleotide sequences comprising a region (e.g.
  • step (b) the numbers or ratios of matched or mismatched bases in the portions X and Z cannot be directly calculated using only the homologous regions, because the homologous region may not contain sequences corresponding to the portions X and Z in the oligonucleotide of Formula (I).
  • its flanking regions are further used for the calculation of the numbers or ratios of matched or mismatched bases in the portions X and Z.
  • the complete sequence of the oligonucleotide of Formula (I) is compared with a corresponding sequence in each of the reference nucleotide sequences comprising the homologous region and its flanking regions, to calculate the numbers or ratios of matched or mismatched bases in the portions X and Z.
  • the homologous region in each of the reference nucleotide sequences may be the same or shorter in length compared to the oligonucleotide of Formula (I) provided in step (a).
  • the complete sequence of the oligonucleotide of Formula (I) is compared against a database of nucleotide sequences and the number of the bases not involved in Watson-Crick base pairs in the Y portion is relatively small
  • reference nucleotide sequences comprising a region homologous to the complete sequence of the oligonucleotide of Formula (I) may be extracted.
  • reference nucleotide sequences comprising a region homologous to a partial sequence of the oligonucleotide of Formula (I) may be extracted. Further, when a partial sequence of the oligonucleotide of Formula (I) is used for comparison, reference nucleotide sequences comprising a region homologous to the partial sequence of the oligonucleotide of Formula (I) may be extracted.
  • Such comparison or analysis may also be referred to as "extension of the comparison", in that the comparison in step (b) utilizes a partial sequence of the oligonucleotide while the comparison in step (c) utilizes the complete sequence of the oligonucleotide.
  • the homologous region may be extended and then the numbers or ratios of matched or mismatched bases in the portions X and Z can be calculated. Extending the homologous region to calculate the numbers or ratios of matched or mismatched bases indicates that the homologous region is extended over a sequence corresponding to the complete sequence of the oligonucleotide, and then the numbers or ratios of matched or mismatched bases are calculated. In other words, it indicates that the sequences of the flanking regions are taken (or restored) from the extracted nucleic acid sequence or database to calculate the numbers or ratios of matched or mismatched bases in the portions X and Z.
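The extension of the comparison described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation of the present method: the function name `count_portion_mismatches`, the parameter `ref_start` (the reference position aligned to the first base of the oligonucleotide) and the toy sequences are hypothetical, and alignment gaps are ignored.

```python
def count_portion_mismatches(oligo, reference, ref_start, p, q, r):
    """Count mismatches in portions X and Z over the full oligonucleotide length.

    oligo     : complete oligonucleotide 5'-X-Y-Z-3' of length p + q + r
    reference : reference sequence in the same orientation as the oligonucleotide
    ref_start : hypothetical 0-based reference index aligned to oligo position 0,
                i.e. the partial hit extended back to the start of the oligo
    """
    if len(oligo) != p + q + r:
        raise ValueError("oligo length must equal p + q + r")
    if ref_start < 0 or ref_start + len(oligo) > len(reference):
        raise ValueError("flanking region is not available in the reference")

    # Restore the homologous region together with its flanking regions.
    window = reference[ref_start:ref_start + len(oligo)]

    # The Y portion (positions p .. p+q-1) is skipped entirely.
    x_mismatches = sum(a != b for a, b in zip(oligo[:p], window[:p]))
    z_mismatches = sum(a != b for a, b in zip(oligo[p + q:], window[p + q:]))
    return x_mismatches, z_mismatches


# Toy example: X = 8 nt, Y = 2 universal bases, Z = 4 nt.
oligo = "ACGTACGT" + "II" + "GGCC"
reference = "TT" + "ACGTACCT" + "AA" + "GGCA" + "TT"
print(count_portion_mismatches(oligo, reference, ref_start=2, p=8, q=2, r=4))  # (1, 1)
```

Only the positions belonging to X and Z are compared, so the bases opposite the Y portion never contribute to either count.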
  • the bases contained in the Y portion hybridize to corresponding bases in the target nucleic acid sequence at a relatively low affinity as compared to the bases that form Watson-Crick base pairs. That is, when the oligonucleotide of Formula (I) is hybridized to the target nucleic acid sequence, the Y portion can form a loop structure. This loop formation of the Y portion may narrow the space between a region to which the X portion hybridizes and a region to which the Z portion hybridizes.
  • flanking region opposite to a portion X or Z of interest for the calculation of the number or ratio of matched or mismatched bases is determined by considering a portion of interest and its possible opposing regions.
  • the flanking region opposite to the Z portion is generally a region which is 5 nucleotides apart from the homologous region to which the X portion is hybridized, but it may also be a region which is 4 nucleotides or 3 nucleotides apart from the homologous region to which the X portion is hybridized, due to the loop formation on the Y portion.
  • the calculation of the number or ratio of matched or mismatched bases may be made between the Z portion and a region which is 5 nucleotides from the region to which the X portion is hybridized, between the Z portion and a region which is 4 nucleotides from the region to which the X portion is hybridized, and between the Z portion and a region which is 3 nucleotides from the region to which the X portion is hybridized.
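The evaluation of the Z portion at several spacings from the X-hybridized region can be sketched as below. The function name `z_mismatches_by_gap`, the parameter `x_end` and the toy sequences are assumptions made for illustration only; the gaps of 5, 4 and 3 nucleotides are the values given in the text above.

```python
def z_mismatches_by_gap(z_portion, reference, x_end, gaps=(5, 4, 3)):
    """Count Z-portion mismatches for each candidate spacing after the X region.

    x_end : hypothetical 0-based index just past the last reference base
            to which the X portion is hybridized
    gaps  : candidate distances (in nucleotides) between the X-hybridized
            region and the region opposite to the Z portion
    """
    results = {}
    for gap in gaps:
        start = x_end + gap
        region = reference[start:start + len(z_portion)]
        if len(region) < len(z_portion):
            continue  # reference too short for this spacing
        results[gap] = sum(a != b for a, b in zip(z_portion, region))
    return results


# Toy reference: X pairs with positions 0-7; only four bases separate X and Z here.
reference = "ACGTACGT" + "TTTT" + "GAGG" + "AC"
print(z_mismatches_by_gap("GAGG", reference, x_end=8))  # {5: 3, 4: 0, 3: 3}
```

In this toy case the Z portion matches perfectly only at a spacing of 4 nucleotides, which is the kind of situation the loop formation on the Y portion is described as permitting.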
  • the ratio of the number of mismatched bases to the number of matched bases in each of the portions X and Z is provided.
  • the ratio of the number of mismatched bases to the total number of nucleotides in each of the portions X and Z is provided.
  • the ratio of the number of matched bases to the number of mismatched bases in each of the portions X and Z is provided.
  • the ratio of the number of matched bases to the total number of nucleotides in each of the portions X and Z is provided.
  • the method of the present invention may change the criterion for treating the universal bases or degenerate bases as a match or a mismatch, followed by providing the numbers or ratios of matched or mismatched bases based on the changed criterion in step (c).
  • the universal base when either or both of the portions X and Z in the oligonucleotide of Formula (I) comprise at least one universal base, the universal base may not be counted as a mismatched base in step (c). That is, when there is at least one universal base in either or both of the portions X and Z in the oligonucleotide of Formula (I), the universal base is treated as a matched base, regardless of the type of the corresponding nucleotide in each of the reference nucleotide sequences. For example, if there are three mismatched bases and one additional universal base in the X portion consisting of 15 nucleotides, then one embodiment of the invention may determine the total number of mismatched bases to be three (3).
  • the universal base may or may not be counted as matched bases. For example, if there are three mismatched bases and one additional universal base in the X portion of 15 nucleotides in length, the total number of matched bases in the X portion may be determined to be twelve (12). Alternatively, the total number of matched bases in the X portion may be determined to be eleven (11).
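The two ways of counting a universal base can be illustrated with a short Python sketch. The function name `count_with_universal`, the flag `universal_is_match` and the 15-nucleotide toy sequences are hypothetical; deoxyinosine is written as "I" here as an assumption of this sketch.

```python
def count_with_universal(oligo_portion, ref_portion, universal="I",
                         universal_is_match=True):
    """Return (matched, mismatched) for one portion of the oligonucleotide.

    A universal base is never counted as a mismatch. It is counted as a
    match only when universal_is_match is True; otherwise it is left out
    of both counts.
    """
    matched = mismatched = 0
    for oligo_base, ref_base in zip(oligo_portion, ref_portion):
        if oligo_base == universal:
            if universal_is_match:
                matched += 1
        elif oligo_base == ref_base:
            matched += 1
        else:
            mismatched += 1
    return matched, mismatched


# 15-nt X portion with one universal base and three mismatches (positions 0, 12, 14).
x_portion = "ACGTACGIACGTACG"
ref_region = "TCGTACGAACGTTCC"
print(count_with_universal(x_portion, ref_region))                            # (12, 3)
print(count_with_universal(x_portion, ref_region, universal_is_match=False))  # (11, 3)
```

The two calls reproduce the counts of twelve and eleven matched bases given in the example above, with three mismatched bases in both cases.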
  • the method of the present invention takes into account the match between the degenerate base and a corresponding base in the reference nucleotide sequence. That is, when a degenerate base is present in either or both of the portions X and Z in the oligonucleotide of Formula (I), the degenerate base may or may not be counted as a mismatched base in step (c), depending on the type of the degenerate base (i.e., depending on the bases represented by the degenerate base).
  • the degenerate base when either or both of the portions X and Z in the oligonucleotide of Formula (I) comprise at least one degenerate base, the degenerate base is not counted as mismatched bases in step (c), with a proviso that any one of bases represented by the degenerate base matches the corresponding base in the reference nucleotide sequence.
  • Conventional sequence alignment algorithms or programs such as BLAST treat the degenerate base as mismatch, regardless of its type.
  • the present method is characterized in determining a match/mismatch based on the type of the degenerate base.
  • the present method treats the degenerate base as match.
  • the corresponding base in the reference nucleotide sequence to be compared is cytosine (C) or thymine (T)
  • the present method treats the degenerate bases as mismatch.
  • the degenerate base is converted into each of the bases encompassed by said degenerate base, and then performing steps (b) and (c).
  • a degenerate base “R” (either of the purine bases A or G) in the oligonucleotide of Formula (I)
  • a first oligonucleotide in which the degenerate base “R” is converted into adenine (A) and a second oligonucleotide in which the degenerate base “R” is converted into guanine (G) are prepared, and are subjected to the present method, respectively.
  • This method can prevent the degenerate bases from being judged as a mismatch and thus affecting the extraction of nucleotide sequences having a homology region.
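Both treatments of degenerate bases, namely matching by the set of bases a symbol represents and converting the degenerate oligonucleotide into each concrete sequence, can be sketched as follows. The table uses the standard IUPAC nucleotide ambiguity codes; the function names `is_match` and `expand_degenerate` are hypothetical and chosen only for this illustration.

```python
from itertools import product

# Standard IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def is_match(oligo_base, ref_base):
    """A degenerate base matches if any base it represents equals ref_base."""
    return ref_base in IUPAC[oligo_base]


def expand_degenerate(oligo):
    """Convert a degenerate oligonucleotide into every sequence it encompasses."""
    return ["".join(bases) for bases in product(*(IUPAC[b] for b in oligo))]


print(is_match("R", "A"), is_match("R", "G"))  # True True
print(is_match("R", "C"), is_match("R", "T"))  # False False
print(expand_degenerate("ARC"))                # ['AAC', 'AGC']
```

Each sequence returned by `expand_degenerate` could then be subjected to steps (b) and (c) separately, as described for the first and second oligonucleotides above. The number of expanded sequences grows multiplicatively with the number of degenerate positions.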
  • the numbers or ratios of matched or mismatched bases between each of the portions X and Z in the oligonucleotide of Formula (I) and each of the reference nucleotide sequences can be expressed in a variety of ways.
  • the number of mismatched bases in the X portion and the number of mismatched bases in the Z portion may be collectively presented as Xm
  • 0" indicates that the number of mismatched bases between the X portion and the reference nucleotide sequence is zero, and the number of mismatched bases between the Z portion and the reference nucleotide sequence is zero.
  • the notation means that oligonucleotide of Formula (I) except for the Y portion perfectly matches the reference nucleotide sequence.
  • 0" indicates that the number of mismatched bases between the X portion and the reference nucleotide sequence is 1 and the number of mismatched bases between the Z portion and the reference nucleotide sequence is zero.
  • 1” indicates that the number of mismatched bases between the X portion and the reference nucleotide sequence is zero and the number of mismatched bases between the Z portion and the reference nucleotide sequence is 1.
  • the total number of nucleotides in each of the portions X and Z or the number of the matched bases in each of the portions X and Z can be indicated additionally.
  • the numbers of mismatched bases are highly associated with the specificity of the oligonucleotide of Formula (I).
  • the numbers of mismatched bases in the X and Z portions may affect the evaluation of specificity differently, particularly the annealing specificity of the oligonucleotide of Formula (I), while the Y portion has no effect on the evaluation of the specificity.
  • the number of mismatched bases in the X portion and the number of mismatched bases in the Z portion have a negative effect on the specificity of the oligonucleotide to a different degree.
  • the method of the present invention may impart different weights to the two values, i.e., the number or ratio of mismatched bases in the X portion and the number or ratio of mismatched bases in the Z portion, for a more accurate evaluation of the specificity of the oligonucleotide of Formula (I).
  • the match in the Z portion is more important than that in the X portion upon the determination of the specificity (for example, a dual specificity oligonucleotide disclosed in WO 2006/095981).
  • oligonucleotides having one mismatch in the Z portion can be evaluated to have poorer specificity than oligonucleotides having one mismatch in the X portion.
  • oligonucleotides having one mismatch in the Z portion may be evaluated to have poorer specificity than oligonucleotides having two, three or four mismatches in the X portion.
  • the weight to be given to the number of mismatched bases in the Z portion may be greater than the weight to be given to the number of mismatched bases in the X portion.
  • the weights can be given by a person skilled in the art in various ways.
  • the match in the X portion is more important than that in the Z portion upon the determination of the specificity (see, for example, a target discriminative (TD) probe disclosed in WO 2011/028041).
  • the weight to be given to the number of mismatched bases in the X portion may be greater than the weight to be given to the number of mismatched bases in the Z portion.
  • an embodiment of the present invention may impart a penalty score to the oligonucleotide of Formula (I) based on the numbers of mismatched bases in the X and Z portions.
  • the penalty score is a value reflecting the degradation of the specificity of the oligonucleotide of Formula (I).
  • the penalty score may be given per mismatched base.
  • the penalty score to be given per mismatched base in the X portion and the penalty score to be given per mismatched base in the Z portion may be different from each other.
  • the penalty score to be given per mismatched base in the X portion may be smaller than the penalty score to be given per mismatched base in the Z portion.
  • Such difference in penalty scores can be achieved by giving a weighted penalty score. For example, assuming that the specificity of the oligonucleotide of Formula (I) in which no mismatched base is present in both the X and Z portions (i.e.
  • the portions X and Z in the oligonucleotide perfectly matched with a target nucleic acid sequence is "100"
  • a penalty score of "10" may be given per mismatched base in the X portion
  • a penalty score of "20", "30", "40", "50", or "60" per mismatched base in the Z portion.
  • the specificity of the oligonucleotide having one mismatched base pair in the Z portion will be "80", "70", "60", "50", or "40", respectively.
  • the present invention allows for accurate specificity evaluation by imparting different weighted penalty scores to the portion X and the portion Z, depending on the numbers of mismatched bases in the portions X and Z.
  • the penalty score to be given per mismatched base in the Z portion is smaller than that in the X portion.
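The weighted penalty scoring described above amounts to simple arithmetic, sketched below. The function name `specificity_score` and its parameter names are hypothetical; the default values (a perfect score of 100, 10 per X mismatch, 20 per Z mismatch) are taken from the example in the text, and the subtraction from the perfect score is an assumption consistent with that example rather than a prescribed formula.

```python
def specificity_score(x_mismatches, z_mismatches,
                      x_penalty=10, z_penalty=20, perfect=100):
    """Subtract a per-mismatch penalty, weighted separately for X and Z.

    The Y portion is not an input because it is not considered in the
    evaluation of specificity.
    """
    return perfect - x_penalty * x_mismatches - z_penalty * z_mismatches


# One mismatch in the Z portion, with Z penalties of 20, 30, 40, 50 and 60.
print([specificity_score(0, 1, z_penalty=w) for w in (20, 30, 40, 50, 60)])
# [80, 70, 60, 50, 40]

# Alternative embodiment: the X portion is weighted more heavily than Z.
print(specificity_score(1, 1, x_penalty=30, z_penalty=10))  # 60
```

Keeping the two penalties as separate parameters covers both embodiments: a larger `z_penalty` for oligonucleotides in which the Z portion dominates specificity, and a larger `x_penalty` for those in which the X portion does.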
  • the Y portion does not affect the evaluation of the specificity of the oligonucleotide, so that the Y portion is not considered in the evaluation of the specificity.
  • the present invention provides the match/mismatch results in the X and Z portions of the oligonucleotide individually, the specificity of each of X and Z portions can be evaluated individually by the match/mismatch results for each portion.
  • the specificity of each of X and Z portions may be determined by assessing the match/mismatch results in each portion based on different criteria (e.g. different match/mismatch threshold).
  • the specificity of the X portion it is determined whether there are two or fewer mismatches between the X portion and a reference nucleotide sequence
  • the specificity of the Z portion it is determined whether there is one or fewer mismatches between the Z portion and a reference nucleotide sequence.
  • the specificity of the oligonucleotide can be evaluated by combining the specificity evaluation in each portion, thereby determining the nucleotide sequences to which the oligonucleotide anneals or hybridizes.
  • the number of match/mismatches per each portion can be predefined for evaluating specificity, and then the coverage, inclusivity and exclusivity of the oligonucleotide can be evaluated. Further, the coverage, inclusivity and exclusivity of the oligonucleotide can be modulated, if needed, by adjusting the hybridization conditions and the like.
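Applying a separate mismatch threshold to each portion and combining the two results can be sketched as follows. The thresholds of two mismatches for X and one for Z are the example values given above; the function names and the reference identifiers are hypothetical.

```python
def passes_thresholds(x_mismatches, z_mismatches, max_x=2, max_z=1):
    """Combine the per-portion evaluations: both portions must pass."""
    return x_mismatches <= max_x and z_mismatches <= max_z


def predicted_hits(hits, max_x=2, max_z=1):
    """Return IDs of reference sequences the oligonucleotide is predicted to bind.

    hits : iterable of (reference_id, x_mismatches, z_mismatches)
    """
    return [ref_id for ref_id, xm, zm in hits
            if passes_thresholds(xm, zm, max_x, max_z)]


# Hypothetical reference identifiers with their X and Z mismatch counts.
hits = [("ref_A", 0, 0), ("ref_B", 2, 1), ("ref_C", 3, 0), ("ref_D", 0, 2)]
print(predicted_hits(hits))                    # ['ref_A', 'ref_B']
print(predicted_hits(hits, max_x=0, max_z=0))  # ['ref_A']
```

Tightening or relaxing `max_x` and `max_z` changes which reference sequences are predicted to be bound, which is one way the predefined match/mismatch numbers per portion could feed into an assessment of coverage, inclusivity and exclusivity.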
  • the present method may further provide the direction of the match between the oligonucleotide of Formula (I) and each of the reference nucleotide sequences.
  • the direction of the match may be provided to distinguish the oligonucleotide of the Formula (I) matched to the (+) strand (coding strand, sense strand, non-template strand) of the reference nucleotide sequence from the oligonucleotide of the Formula (I) matched to the (-) strand (non-coding strand, antisense strand, template strand) of the reference nucleotide sequence.
  • oligonucleotide of Formula (I) matches the (+) strand of the reference nucleotide sequence, an indication such as "F” or “+” may be presented, and otherwise an indication such as “R” or “-” may be presented.
  • the direction of the match may be presented in combination of the numbers of mismatched bases in the X and Z portions described above.
  • notations such as "F Xm
  • 0" indicates that the oligonucleotide of Formula (I) matches the (-) strand of the reference nucleotide sequence, and that the oligonucleotide of Formula (I) has one mismatched base in the X portion and no mismatched base in the Z portion, in a simple and intuitive manner.
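Determining the direction of the match can be sketched with a reverse-complement check. This sketch assumes that "matches the (+) strand" means the oligonucleotide sequence occurs in the reference sequence as written (comparable to a Plus/Plus hit in BLAST output), and it uses exact substring search on plain A/C/G/T sequences for brevity; the function names are hypothetical.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an A/C/G/T sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def match_direction(probe, reference):
    """Return '+' or '-' for the matched strand, or None if neither matches.

    Assumption of this sketch: '+' means the probe occurs in the reference
    as written, '-' means its reverse complement does.
    """
    if probe in reference:
        return "+"
    if reverse_complement(probe) in reference:
        return "-"
    return None


print(reverse_complement("GATTC"))               # GAATC
print(match_direction("GATTC", "TTGATTCAA"))     # +
print(match_direction("GATTC", "TTGAATCAA"))     # -
```

The returned symbol could then be shown alongside the X and Z mismatch counts, as the combined notation above describes.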
  • the present method may further provide biological features of the reference nucleotide sequences.
  • the biological features of the reference nucleotide sequences comprise the sources, gene IDs, or descriptions of the reference nucleotide sequences extracted.
  • the biological features of the reference nucleotide sequences may include the position of a region corresponding to the oligonucleotide (for example, the position numbers of the nucleotides at the 5' end and the 3' end).
  • the biological features of the reference nucleotide sequences may include a list of reference nucleotide sequences having some homology with the designed oligonucleotide.
  • the biological features of the reference nucleotide sequences may include one or more features provided in the sequence alignment algorithm or program, such as the conventional BLAST algorithm.
  • the biological features of the reference nucleotide sequences may be useful in evaluating specificity of an oligonucleotide.
  • the user analyzes the list of reference nucleotide sequences comprising a region homologous to the designed oligonucleotide and their specific sequence information, thereby determining whether the designed oligonucleotide amplifies or detects (or hybridizes to) only the target nucleic acid sequence, but not the non-target nucleic acid sequence. Furthermore, it is possible to control the degree of mismatch of the oligonucleotide, specifically, the degree of mismatch in the X and Z portions with regard to a target nucleic acid sequence.
  • the presence of a target nucleic acid sequence and the absence of non-target nucleic acid sequences in the list of reference nucleotide sequences indicate that the oligonucleotide is suitable for amplification or detection of a target nucleic acid sequence.
  • the presence of non-target nucleic acid sequences in the list of reference nucleotide sequences indicates that the oligonucleotide is not suitable for amplification or detection of a target nucleic acid sequence, which becomes a strong basis for selecting other oligonucleotides.
  • the biological features of the reference nucleotide sequences include information conducive to determination of the target coverage of the oligonucleotide.
  • the present method may further provide results of classification of the reference nucleotide sequences according to the number of mismatched bases in the portion X and the number of mismatched bases in the portion Z.
  • the user needs to identify reference nucleotide sequences which are homologous to the designed oligonucleotide, and thus the provision of such results of classification is very useful in determining the specificity of the designed oligonucleotide.
  • the results of classification of the reference nucleotide sequences are those obtained by grouping (sorting) the reference nucleotide sequences on the basis of the number of mismatched bases in the X portion and the number of mismatched bases in the Z portion, including, for example, a list and the number of reference nucleotide sequences belonging to each group, and the biological properties of each of the reference nucleotide sequences.
  • Primers or probes may also hybridize with reference nucleotide sequences with a few mismatches under certain hybridization conditions. Therefore, in order to evaluate the suitability or workability of the designed primer or probe, it is necessary to check perfectly matched reference nucleotide sequences as well as partially matched reference nucleotide sequences. To this end, the method of the present invention provides a list and the number of reference nucleotide sequences belonging to each group, and biological properties of each of the reference nucleotide sequences in a simple and intuitive manner.
  • 0" (the number of mismatched bases in the X portion is zero and the number of mismatched bases in the Z portion is zero) with regard to the oligonucleotide of Formula (I) may be provided.
  • 0" (the number of mismatched bases in the X portion is 1 and the number of mismatched bases in the Z portion is zero), “0
  • 0" is provided as “30”, it means that there are 30 reference nucleotide sequences which are 100% identical to the X and Z portions in the oligonucleotide of Formula (I).
  • Those skilled in the art will take into account information about the reference nucleotide sequences corresponding to "1
  • 0" are highly likely to be amplified or detected using the oligonucleotide of Formula (I).
  • the user may design another oligonucleotide to avoid amplification or detection of the non-target nucleic acid sequence, or may ignore the amplification or detection of the non-target nucleic acid sequences if the number or the importance of the non-target nucleic acid sequences is low.
  • the target nucleic acid sequences are likely not to be amplified or detected using the oligonucleotide of Formula (I).
  • the user can modify the sequence of the oligonucleotide of Formula (I) (e.g., by incorporating a degenerate base) or design another oligonucleotide, in order to cover the target nucleic acid sequence belonging to the mismatch type "0
  • non-target nucleic acid sequences are present among the reference nucleotide sequences corresponding to the mismatch type "0
  • the results of classification of the reference nucleotide sequences based on the number of mismatched bases in the X portion and the number of mismatched bases in the Z portion are useful in evaluating the specificity of the designed oligonucleotide in a more simple and intuitive manner.
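Grouping the reference nucleotide sequences by their mismatch counts can be sketched as below. The function name `classify_by_mismatch`, the tuple layout of `hits` and the reference identifiers are hypothetical; each group is keyed by the pair (mismatches in X, mismatches in Z).

```python
from collections import defaultdict


def classify_by_mismatch(hits):
    """Group reference IDs by (X mismatches, Z mismatches).

    hits : iterable of (reference_id, x_mismatches, z_mismatches)
    """
    groups = defaultdict(list)
    for ref_id, xm, zm in hits:
        groups[(xm, zm)].append(ref_id)
    return dict(groups)


# Hypothetical reference identifiers with their X and Z mismatch counts.
hits = [("ref_A", 0, 0), ("ref_B", 1, 0), ("ref_C", 0, 0), ("ref_D", 0, 1)]
groups = classify_by_mismatch(hits)

# Print each mismatch type with the number and list of its members.
for (xm, zm), members in sorted(groups.items()):
    print(xm, zm, len(members), members)
```

The per-group member lists and their lengths correspond to the list and the number of reference nucleotide sequences belonging to each group; further information on each reference sequence could be attached to the members in the same structure.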
  • the results of classification may further comprise information of each reference nucleotide sequence.
  • the information provided may be used to determine whether the oligonucleotide exhibits the same match results as those that have been reviewed at the time of initial design. For example, assuming that the oligonucleotide of Formula (I) was designed to match five target nucleic acid sequences in the mismatch type "0
  • the further results of classification may be used to verify the coverage of the oligonucleotide of Formula (I).
  • the user can analyze the results of classification and identify the target nucleic acid sequences to be amplified or detected using the designed oligonucleotide, so that the results of classification can be used to verify the coverage of the oligonucleotide of Formula (I).
  • the method of the present invention may further provide information on the sequence similarity between the oligonucleotide and each of the reference nucleotide sequences.
  • the information on the similarity can be displayed in various ways.
  • the information on the similarity can be expressed as the number of matched nucleotides relative to the total number of nucleotides of the designed oligonucleotide, or a percent-identity score thereof.
  • the information on the similarity may be calculated by excluding the similarity between the portion Y of the oligonucleotide and a corresponding region of the reference nucleotide sequence.
  • the similarity (%) may be calculated by [(total number of nucleotides matched in the X portion and the Z portion) / ( p + r )] * 100.
  • the information on the similarity may be calculated by assuming that the portion Y of the oligonucleotide and the corresponding portion of the reference nucleotide sequence match each other.
  • the similarity (%) may be calculated by [(total number of nucleotides matched in the X portion and the Z portion + q ) / ( p + q + r )] * 100.
  • the similarity between the X portion of the oligonucleotide and a corresponding portion of the reference nucleotide sequence, and the similarity between the Z portion of the oligonucleotide and a corresponding portion of the reference nucleotide sequence are provided separately.
  • the sequence similarity can be determined by treating the universal base or degenerate base in the same manner as the treatment of the universal base or degenerate base in the calculation of the number of mismatched bases.
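The two similarity formulas above can be written out directly. In this sketch p, q and r denote the lengths of the X, Y and Z portions as in the formulas; the function names are hypothetical, and the numeric example uses a 25/5/10 nucleotide layout with one mismatch in X purely as an illustration.

```python
def similarity_excluding_y(matched_xz, p, r):
    """[(matched bases in X and Z) / (p + r)] * 100, ignoring the Y portion."""
    return matched_xz / (p + r) * 100


def similarity_assuming_y_matches(matched_xz, p, q, r):
    """[(matched bases in X and Z + q) / (p + q + r)] * 100, treating Y as matched."""
    return (matched_xz + q) / (p + q + r) * 100


# Illustration: p = 25, q = 5, r = 10, one mismatch in X, none in Z.
p, q, r = 25, 5, 10
matched_xz = (p - 1) + r  # 34 matched bases across X and Z
print(round(similarity_excluding_y(matched_xz, p, r), 2))            # 97.14
print(round(similarity_assuming_y_matches(matched_xz, p, q, r), 2))  # 97.5
```

Treating the Y portion as matched raises the percentage slightly, because q matched positions are added to both the numerator and the denominator.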
  • the method of the present invention provides information on the specificity of an oligonucleotide in a variety of ways, allowing a user to analyze the homology of the oligonucleotide with target and non-target nucleic acid sequences in an easier, faster and more intuitive manner.
  • the method of the present invention is characterized by providing information on the specificity of oligonucleotides, it may also be referred to as a method of providing information on the specificity of oligonucleotides.
  • the method of the present invention may further comprise the step of evaluating specificity of the oligonucleotide of Formula (I) using the information provided in the step (c).
  • Evaluating specificity of the oligonucleotide of Formula (I) using the information provided in the step (c) may be accomplished by determining inclusivity and exclusivity of the oligonucleotide of Formula (I).
  • the method of the present invention may be used to evaluate workability of the oligonucleotide, particularly the oligonucleotide represented by the Formula (I), as a primer or a probe.
  • the match/mismatch results in the portions X and Z provided in step (c)
  • step (c) makes it possible to ascertain whether the oligonucleotide is hybridized to a particular target nucleic acid sequence.
  • the method of the present invention can be used to determine that the oligonucleotide will act as a primer or probe for a particular target nucleic acid sequence.
  • the methods as described above may be embodied on a computer by software including instructions for implementing a process for executing the methods.
  • a computer readable storage medium containing instructions to configure a processor to perform a method for evaluating specificity of an oligonucleotide, the method comprising:
  • X represents a portion comprising a hybridizing nucleotide sequence hybridized to a target nucleic acid sequence
  • Y represents a separation portion comprising two or more consecutive bases, each of which is not involved in Watson-Crick base pairs
  • Z represents a portion comprising a hybridizing nucleotide sequence hybridized to a target nucleic acid sequence
  • a computer program to be stored on a computer readable storage medium to configure a processor to perform a method for evaluating specificity of an oligonucleotide, the method comprising:
  • X represents a portion comprising a hybridizing nucleotide sequence hybridized to a target nucleic acid sequence
  • Y represents a separation portion comprising two or more consecutive bases, each of which is not involved in Watson-Crick base pairs
  • Z represents a portion comprising a hybridizing nucleotide sequence hybridized to a target nucleic acid sequence
  • the program instructions are operative, when performed by the processor, to cause the processor to perform the present method described above.
  • the program instructions for performing the present method may comprise (i) an instruction to compare the complete or a partial sequence of the oligonucleotide of Formula (I) against at least one database of nucleotide sequences; (ii) an instruction to extract reference nucleotide sequences having a region homologous to the complete or a partial sequence of the oligonucleotide of Formula (I) from the database; (iii) an instruction to analyze the portion-by-portion match/mismatch between the oligonucleotide of Formula (I) and each of the reference nucleotide sequences to provide (i) the number or ratio of matched or mismatched bases between the portion X of the oligonucleotide of Formula (I) and each of the reference nucleotide sequences and individually (ii) the number or ratio of matched or mismatched bases between the portion Z of the oligonucleotide
  • the present method described above is implemented in a processor, such as a processor in a stand-alone computer, a network attached computer or a data acquisition device such as a real-time PCR machine.
  • the types of the computer readable storage medium include various storage media such as CD-R, CD-ROM, DVD, flash memory, floppy disk, hard drive, portable HDD, USB, magnetic tape, MINIDISC, nonvolatile memory card, EEPROM, optical disk, optical storage medium, RAM, ROM, system memory and web server.
  • the oligonucleotide of Formula (I) for amplifying or detecting a target nucleic acid sequence may be provided in various ways.
  • the sequence of the oligonucleotide of Formula (I) may be provided to a separate system such as a desktop computer system via a network connection (e.g., LAN, VPN, intranet and Internet) or direct connection (e.g., USB or other direct wired or wireless connection), or provided on a portable medium such as a CD, DVD, floppy disk, portable HDD or the like.
  • the instructions to configure the processor to perform the present invention may be included in a logic system.
  • the instructions may be downloaded and stored in a memory module (e.g., hard drive or other memory such as a local or attached RAM or ROM), although the instructions can be provided on any software storage medium such as a portable HDD, USB, floppy disk, CD and DVD.
  • a computer code for implementing the present invention may be implemented in a variety of coding languages such as C, C++, Java, Visual Basic, VBScript, JavaScript, Perl and XML.
  • a variety of languages and protocols may be used in external and internal storage and transmission of data and commands according to the present invention.
  • a device for evaluating specificity of an oligonucleotide comprising (a) a computer processor and (b) the computer readable storage medium described above coupled to the computer processor.
  • the processor may be configured in such a manner that a single processor performs several functions.
  • the processor unit may be configured in such a manner that several processors each perform one of the several functions.
  • EXAMPLE 1 Evaluation of specificity of an oligonucleotide according to one embodiment of the present invention
  • a DPO primer (SEQ ID NO: 1) was designed to amplify a 16S ribosomal RNA of Bacteroides fragilis (Genbank Accession No: HM352993.1) as a target nucleic acid sequence.
  • the nucleotide sequence of the designed DPO primer is shown below:
  • the DPO primer has three distinct portions: (i) a portion "X" at its 5' end: GACTCTAGAGAGACTGCCGTCGTAA; (ii) a separation portion "Y" consisting of five (5) deoxyinosine (I) (as highlighted in bold) as a universal base: IIIII; (iii) a portion "Z" at its 3' end: GAGGAAGGTG.
  • the portion X in the DPO primer (i.e. , 5'-GACTCTAGAGAGACTGCCGTCGTAA-3') was compared against the GenBank database using the BLAST for homology analysis.
  • the parameters used in the BLAST algorithm are as follows:
  • the extracted reference nucleotide sequences were each compared with the complete sequence of the DPO primer, to obtain the number of mismatched bases between the portion X of the DPO primer and each of the reference nucleotide sequences as well as the number of mismatched bases between the portion Z of the DPO primer and each of the reference nucleotide sequences (see Fig. 2).
  • the DPO primer has one mismatched base in the portion X and zero (0) mismatched bases in the portion Z with regard to the exemplary reference nucleotide sequence.
  • the DPO primer was found to match the (-) strand of the reference nucleotide sequence.
  • D refers to the direction of the match of the oligonucleotide of interest relative to the reference nucleotide sequence.
  • + means that the oligonucleotide of interest matches the (+) strand of the reference nucleotide sequence
  • - means that the oligonucleotide of interest matches the (-) strand of the reference nucleotide sequence.
  • Xm indicates the number of mismatched bases in the portion X
  • Zm indicates the number of mismatched bases in the portion Z. The result was presented as "- 1 | 0".
  • the reference nucleotide sequences were then sorted according to the number of mismatched bases in the portion X and the number of mismatched bases in the portion Z. The results are shown in Table 1 below.
  • 0" 230 reference nucleotide sequences
  • 0” 422 reference nucleotide sequences
  • 1” 10 reference nucleotide sequences
  • the designed DPO primer has specificity to the nucleic acid sequence of Bacteroides fragilis .
  • the results also provide information on the coverage of target nucleic acid sequences to be amplified using the designed DPO primer, depending on hybridization conditions. Specifically, from the results above, one of skill in the art will recognize that target nucleic acid sequences included in the mismatch type "0
  • results provide information about whether the DPO primers have an annealing specificity for each of the extracted reference nucleotides.
  • the designed oligonucleotide can be evaluated for its specificity in a more straightforward and intuitive manner.
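The sorting of reference nucleotide sequences by mismatch type can be sketched as follows. This is a minimal illustration and not the implementation used in the Example: the record layout, the function names and all accessions and counts are hypothetical, and only the idea of grouping by the pair (Xm, Zm) comes from the text.

```python
from collections import Counter

# Hypothetical per-reference results: (accession, strand D, Xm, Zm).
# Accessions and values are invented for illustration only.
results = [
    ("REF_A", "-", 0, 0),
    ("REF_B", "+", 0, 0),
    ("REF_C", "-", 1, 0),
    ("REF_D", "+", 0, 1),
    ("REF_E", "-", 1, 0),
]


def mismatch_type(xm, zm):
    """Build a label in the 'Xm | Zm' style (assumed label format)."""
    return f"{xm} | {zm}"


def tally_by_mismatch_type(rows):
    """Count reference sequences per (Xm, Zm) pair, fewest mismatches first."""
    counts = Counter((xm, zm) for _, _, xm, zm in rows)
    return [(mismatch_type(xm, zm), n) for (xm, zm), n in sorted(counts.items())]


table = tally_by_mismatch_type(results)
for label, n in table:
    print(f"{label}: {n} reference nucleotide sequences")
```

Grouping on the pair rather than on the total keeps mismatches in the portion X and in the portion Z visible separately, which is the point of the portion-by-portion analysis.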

Abstract

The present invention relates to a method for evaluating specificity of oligonucleotides represented by 5'-X-Y-Z-3'. The present invention comprises comparing the complete or a partial sequence of the oligonucleotide of Formula (I) against at least one database of nucleotide sequences, and extracting reference nucleotide sequences comprising a region homologous to the complete or a partial sequence of the oligonucleotide of Formula (I) from the database; and analyzing portion-by-portion match/mismatch between the oligonucleotide of Formula (I) and each of the reference nucleotide sequences to provide (i) the number or ratio of matched or mismatched bases between the portion X of the oligonucleotide of Formula (I) and each of the reference nucleotide sequences and individually (ii) the number or ratio of matched or mismatched bases between the portion Z of the oligonucleotide of Formula (I) and each of the reference nucleotide sequences.

Description

EVALUATION OF SPECIFICITY OF OLIGONUCLEOTIDES
The present invention relates to evaluation of specificity of oligonucleotides.
Nucleic acid amplification is a pivotal process for a wide variety of methods in molecular biology, and various amplification methods have therefore been proposed. For example, Miller, H. I. et al. (WO 89/06700) amplified a nucleic acid sequence based on the hybridization of a promoter/primer sequence to a target single-stranded DNA ("ssDNA") followed by transcription of many RNA copies of the sequence. Other known nucleic acid amplification procedures include transcription-based amplification systems (Kwoh, D. et al., Proc. Natl. Acad. Sci. U.S.A., 86:1173 (1989); and Gingeras T.R. et al., WO 88/10315).
The most predominant process for a nucleic acid amplification known as polymerase chain reaction (hereinafter referred to as "PCR") is based on repeated cycles of denaturation of double-stranded DNA, followed by oligonucleotide primer annealing to the DNA template, and primer extension by a DNA polymerase (Mullis et al. U.S. Pat. Nos. 4,683,195, 4,683,202, and 4,800,159; Saiki et al., (1985) Science 230, 1350-1354).
Recently, real-time PCR techniques have been widely used to detect the amplification of a target nucleic acid sequence in real time. Real-time PCR generally uses oligonucleotides such as primers and/or probes which hybridize specifically with target nucleic acid sequences. Examples of methods using hybridization between labeled probes and target nucleic acid sequences include the Molecular beacon method using dual-labeled probes capable of forming a hairpin structure (Tyagi et al., Nature Biotechnology, v. 14, Mar. 1996), the HyBeacon method (French DJ et al., Mol. Cell Probes, 15(6):363-374 (2001)), the Hybridization probe method using two probes singly labeled with a donor or an acceptor (Bernad et al., Clin Chem 2000; 46:147-148) and the Lux method using single-labeled oligonucleotides (U.S. Pat. No. 7,537,886). In addition, the TaqMan method, which uses cleavage of a dual-labeled probe by the 5'-nuclease activity of DNA polymerases as well as hybridization of the dual-labeled probe, has been widely employed (U.S. Pat. Nos. 5,210,015 and 5,538,848).
The PCR and real-time PCR generally utilize primers and/or probes for amplifying or detecting a desired target nucleic acid sequence from a mixture of various nucleic acids. Thus, it is required that the primer and/or probe has high specificity for target nucleic acid sequence so as to obtain accurate amplification or detection results.
In this regard, the present inventors have developed a dual specificity oligonucleotide (DSO), also referred to as DPO (dual priming oligonucleotide), that performs a template-dependent reaction with higher specificity (see WO 2006/095981). The DSO has three different portions within the oligonucleotide molecule: 5'-high Tm specificity portion, 3'-low Tm specificity portion and separation portion, wherein the hybridization specificity is determined dually by the two portions (5'-high Tm specificity portion and 3'-low Tm specificity portion) separated by the separation portion consisting of universal bases.
A target discriminative probe (TD probe) capable of discriminating target nucleic acid sequences from non-target nucleic acid sequences was also developed by the present inventors (see WO 2011/028041). The TD probe comprises three unique portions within the oligonucleotide molecule: 5'-second hybridization portion, 3'-first hybridization portion and separation portion, wherein the hybridization specificity of the TD probe is determined dually by the 5'-second hybridization portion and the 3'-first hybridization portion separated by the separation portion consisting of universal bases.
Generally, oligonucleotides used in PCR and real-time PCR are designed and prepared to hybridize or match target nucleic acid sequences. However, even elaborately designed oligonucleotides may hybridize with non-target nucleic acid sequences that were not identified during their design. Accordingly, it is necessary to check whether the designed oligonucleotides hybridize only to the intended target and not to any unintended targets. This is generally referred to as a specificity evaluation (checking) process.
The specificity evaluation process may involve: searching the designed oligonucleotides against a database of known nucleotide sequences (e.g., GenBank) using any sequence alignment algorithm or program (e.g., BLAST) to find homologous sequences (homology search), and analyzing the resulting homologous sequences to check whether the designed oligonucleotide hybridizes only to a desired target nucleic acid sequence. Such specificity evaluation process has become a very useful tool for assessing the suitability or workability of primers and probes.
Various sequence alignment algorithms or programs have been utilized to evaluate the specificity of oligonucleotides. Among them, BLAST is one of the most widely used sequence similarity search tools; it compares a nucleotide query sequence against a nucleotide sequence database and finds sequences in the database that are similar to the query. This program is offered free of charge by the National Center for Biotechnology Information (NCBI): http://www.ncbi.nih.gov.
The BLAST program is basically a string-matching program. Biological string matching looks for similarity as an indication of homology. Similarity between the query and the sequences in the database may be measured by the percent identity, or the number of bases in the query that exactly match a corresponding region of a sequence from the database.
The output of a BLAST search reports a set of scores and statistics on the matches it has found based on the raw score S, various parameters of the scoring algorithm, and properties of the query and database. The raw score S is a measure of similarity and the size of the match. The BLAST output lists the hits ranked by their E value. The E (expect) value of a match measures, roughly, the chances that the string matching (allowing for gaps) occurs in a randomly generated database of the same size and composition. The closer the E value is to 0, the less likely it is that the match occurred by chance. In other words, the lower the E value, the better the match. It can be used as a measure of the match of the primer to the target nucleic acid sequence.
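For orientation, the dependence of the E value on the raw score S is commonly written in the Karlin-Altschul form given in BLAST documentation. The formula is not stated in this document; the symbols m, n, K and λ are introduced here only for illustration.

```latex
E = K \, m \, n \, e^{-\lambda S}
```

Here m and n denote the (effective) lengths of the query and of the database, and K and λ are scaling parameters that depend on the scoring system. For a fixed query and database, E falls exponentially as S increases, which is why a lower E value indicates a better match.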
While BLAST provides comparatively good results for typical oligonucleotides, it is not suitable for non-typical oligonucleotides, such as those containing several contiguous universal bases, non-natural bases, or the like, within the sequence.
In particular, in the case of oligonucleotides containing a plurality of consecutive universal bases, such as a dual specificity oligonucleotide developed by the present inventors, BLAST produces results for only one of the portions separated by the universal bases, even when the complete sequence is input as a query. Also, BLAST does not provide individual mismatch results for the 5' portion and the 3' portion, both of which are important considerations in the design of oligonucleotides containing a plurality of consecutive universal bases within the sequence.
In addition, BLAST treats a universal base or a degenerate base as a mismatch, regardless of its specific type.
Thus, in view of the fact that conventional sequence alignment algorithms or programs are not suitable for evaluating specificity of non-typical oligonucleotides, there remains a need to develop a novel method for evaluating specificity of non-typical oligonucleotides in a more accurate manner.
Throughout this application, various patents and publications are referenced and citations are provided in parentheses. The disclosures of these patents and publications in their entirety are hereby incorporated by reference into this application in order to more fully describe this invention and the state of the art to which this invention pertains.
The present inventors have endeavored to develop a method for evaluating specificity of oligonucleotides, particularly non-typical oligonucleotides containing consecutive bases, each of which is not involved in Watson-Crick base pairs, within the sequence. As a result, the present inventors have developed a novel method comprising comparing the sequence of the oligonucleotide against a database of nucleotide sequences, extracting reference nucleotide sequences comprising a region homologous to the oligonucleotide, and analyzing portion-by-portion match/mismatch between the oligonucleotide and each of the reference nucleotide sequences to provide individual match results in two portions separated by the consecutive bases not involved in Watson-Crick base pairs.
Accordingly, it is an object of this invention to provide a method for evaluating specificity of an oligonucleotide.
It is another object of this invention to provide a computer readable storage medium containing instructions to configure a processor to perform a method for evaluating specificity of an oligonucleotide.
It is still another object of this invention to provide a device for evaluating specificity of an oligonucleotide.
It is a further object of this invention to provide a computer program to be stored on a computer readable storage medium to configure a processor to perform a method for evaluating specificity of an oligonucleotide.
Other objects and advantages of the present invention will become apparent from the detailed description to follow taken in conjunction with the appended claims and drawings.
The features and advantages of this invention will be summarized as follows:
(a) Conventional sequence alignment algorithms or programs do not provide the match/mismatch results over the complete sequence of non-typical oligonucleotides, such as those represented by 5'-X-Y-Z-3' (wherein Y represents a separation portion comprising two or more consecutive bases, each of which is not involved in Watson-Crick base pairs). In contrast, the method of the present invention provides the match/mismatch results in the portions X and Z individually, allowing a user to evaluate specificity, particularly annealing specificity, with a different weight given to each of the portions X and Z.
(b) For non-typical oligonucleotides, such as those represented by 5'-X-Y-Z-3' (wherein Y represents a separation portion comprising two or more consecutive bases, each of which is not involved in Watson-Crick base pairs), conventional sequence alignment algorithms or programs may provide only the match/mismatch results of either the X portion or the Z portion. In contrast, the method of the present invention provides the match/mismatch results of both the X portion and the Z portion by using the homologous region and its flanking regions in an extracted reference nucleotide sequence. Therefore, the method of the present invention allows accurate evaluation of specificity, particularly annealing specificity, of oligonucleotides having the atypical structure, and helps to select an appropriate oligonucleotide considering the importance of the X and Z portions.
(c) For oligonucleotides containing contiguous or non-contiguous universal bases within the sequence, conventional sequence alignment algorithms or programs determine the universal bases as mismatches. On the other hand, the method according to one embodiment of the present invention determines the universal bases as matches, thereby allowing accurate evaluation of specificity of the oligonucleotides.
(d) For oligonucleotides containing degenerate base(s) within the sequence, conventional sequence alignment algorithms or programs determine the degenerate base(s) as mismatch, irrespective of the type of its corresponding nucleotide. On the other hand, the method according to one embodiment of the present invention determines the match/mismatch according to the type of bases represented by the degenerate base, thereby allowing accurate evaluation of specificity of the oligonucleotides.
(e) The method of the present invention provides results of classification of reference nucleotide sequences according to the numbers of mismatched bases in the portions X and Z, as well as biological features thereof, thereby allowing the user to evaluate specificity, particularly target specificity of the oligonucleotides in a simple and intuitive manner.
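The match rules in items (c) and (d) can be sketched as a single base-comparison function. This is an illustrative assumption, not the implementation described in this document: the table uses the standard IUPAC nucleotide codes, the universal base is assumed to be written as "I" (deoxyinosine), the oligonucleotide is compared against the reference strand it is identical to, and the function names are hypothetical.

```python
# Standard IUPAC nucleotide codes mapped to the natural bases each represents.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

# Assumption: the universal base (deoxyinosine) is written as "I".
UNIVERSAL = {"I"}


def bases_match(oligo_base, ref_base):
    """Return True if the oligonucleotide base is counted as a match."""
    o, r = oligo_base.upper(), ref_base.upper()
    if o in UNIVERSAL:
        return True  # item (c): universal bases are counted as matches
    # item (d): a degenerate base matches any base it represents
    return r in IUPAC.get(o, "")


def count_mismatches(oligo, ref):
    """Count mismatched positions between two equal-length sequences."""
    if len(oligo) != len(ref):
        raise ValueError("sequences must have the same length")
    return sum(not bases_match(o, r) for o, r in zip(oligo, ref))
```

With these rules, "R" in the oligonucleotide matches "A" or "G" in the reference but not "C" or "T", whereas a tool that compares characters literally would report a mismatch in every case.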
Fig. 1 is a flow chart illustrating a process for evaluating specificity of an oligonucleotide according to an embodiment of the present invention.
Fig. 2 is a schematic representation of evaluating specificity of an oligonucleotide (DPO primer) according to an embodiment of the present invention. The sequence of the portion X (query) in the DPO primer represented by 5'-X-Y-Z-3' is compared against a database using the BLAST, and a plurality of reference nucleotide sequences comprising a region homologous to the X portion are extracted. Afterwards, the portion-by-portion match/mismatch between the complete sequence of the DPO primer and a homologous region and its flanking regions of each of the reference nucleotide sequences is analyzed, and the numbers of mismatched bases in the portions X and Z are provided.
Fig. 3 shows the results of portion-by-portion match/mismatch analysis (sequence alignment) between the complete sequence of an exemplary DPO primer represented by 5'-X-Y-Z-3' (top row) and a reference nucleotide sequence extracted by one embodiment of the present invention (bottom row). As depicted in Fig. 3, the DPO primer was found to match the minus (-) strand of the reference nucleotide sequence, and to have one (1) mismatched base in the portion X and zero (0) mismatched base in the portion Z. The information is denoted by "- 1 | 0" in Fig. 3.
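A result label of the kind shown in Fig. 3 could be produced along the following lines. This is a minimal sketch under stated assumptions: "+" is taken to mean that the oligonucleotide is identical to the plus strand of the reference and "-" that it is identical to the reverse complement, the strand with fewer total mismatches is reported, positions under the portion Y are ignored, and all names and toy sequences are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def portion_mismatches(x, y_len, z, window):
    """Count mismatches in X and in Z; positions under Y are skipped."""
    xm = sum(a != b for a, b in zip(x, window[:len(x)]))
    zm = sum(a != b for a, b in zip(z, window[len(x) + y_len:]))
    return xm, zm


def strand_label(x, y_len, z, plus_window):
    """Return a 'D Xm | Zm' label for a plus-strand reference window."""
    if len(plus_window) != len(x) + y_len + len(z):
        raise ValueError("window must be as long as the oligonucleotide")
    candidates = {
        "+": portion_mismatches(x, y_len, z, plus_window.upper()),
        "-": portion_mismatches(x, y_len, z, reverse_complement(plus_window)),
    }
    # Report the orientation with the fewest total mismatches.
    d, (xm, zm) = min(candidates.items(), key=lambda kv: sum(kv[1]))
    return f"{d} {xm} | {zm}"


# Toy example: the minus strand carries one mismatch in X and none in Z.
toy_plus_window = reverse_complement("ACCTACTTGGA")
label = strand_label("ACGTAC", 2, "GGA", toy_plus_window)
print(label)
```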
I. Evaluation of Specificity of Oligonucleotide
In one aspect of this invention, there is provided a method for evaluating specificity of an oligonucleotide, comprising the steps of:
(a) providing an oligonucleotide represented by the following Formula (I):
5'-X-Y-Z-3' (I)
wherein X represents a portion comprising a hybridizing nucleotide sequence hybridized to a target nucleic acid sequence, Y represents a separation portion comprising two or more consecutive bases, each of which is not involved in Watson-Crick base pairs, and Z represents a portion comprising a hybridizing nucleotide sequence hybridized to a target nucleic acid sequence;
(b) comparing the complete or a partial sequence of the oligonucleotide of Formula (I) against at least one database of nucleotide sequences, and extracting reference nucleotide sequences comprising a region homologous to the complete or a partial sequence of the oligonucleotide of Formula (I) from the database;
(c) analyzing portion-by-portion match/mismatch between the oligonucleotide of Formula (I) and each of the reference nucleotide sequences to provide (i) the number or ratio of matched or mismatched bases between the portion X of the oligonucleotide of Formula (I) and each of the reference nucleotide sequences and individually (ii) the number or ratio of matched or mismatched bases between the portion Z of the oligonucleotide of Formula (I) and each of the reference nucleotide sequences.
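Steps (b) and (c) can be sketched for the case in which only the portion X is used as the query. The sketch assumes that a homology search has already returned the start position of the region homologous to X on the plus strand of a reference sequence; the window is then extended into the 3' flanking region so that the portion Z can be compared as well. Function names, the dictionary keys and the toy sequences are hypothetical, and plain character equality stands in for whatever match rule is actually applied.

```python
def extend_hit_window(reference, x_start, x_len, y_len, z_len):
    """Slice the homologous region plus the flank covering Y and Z."""
    end = x_start + x_len + y_len + z_len
    if x_start < 0 or end > len(reference):
        return None  # the flank runs off the end of the reference
    return reference[x_start:end]


def analyze_portions(x, y, z, reference, x_start):
    """Return counts and ratios of mismatched bases for X and for Z."""
    window = extend_hit_window(reference, x_start, len(x), len(y), len(z))
    if window is None:
        return None
    ref_x = window[:len(x)]
    ref_z = window[len(x) + len(y):]
    xm = sum(a != b for a, b in zip(x, ref_x))
    zm = sum(a != b for a, b in zip(z, ref_z))
    return {
        "Xm": xm,
        "Zm": zm,
        "X_ratio": xm / len(x),
        "Z_ratio": zm / len(z),
    }


# Toy reference: one substitution inside the region homologous to X.
toy_reference = "TTTT" + "ACCTAC" + "CC" + "GGA" + "AAAA"
report = analyze_portions("ACGTAC", "II", "GGA", toy_reference, x_start=4)
print(report)
```

Reporting the two portions separately, as counts or as ratios, leaves the choice of how to weigh the portions X and Z to the user.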
As used herein, the term "specificity" encompasses "annealing or hybridization specificity" and "target specificity".
The term "annealing or hybridization specificity" refers to the fidelity of hybridization to be made between completely or perfectly complementary bases. The term is used to describe the relationship between two nucleic acid sequences. According to the definition, oligonucleotides having high specificity can hybridize to another oligonucleotide or polynucleotide under certain conditions, while oligonucleotides having low specificity cannot.
The term "target specificity" refers to a property of an oligonucleotide that matches, hybridizes to, amplifies, or detects a target nucleic acid sequence of interest, but not any other nucleic acid sequences (non-target nucleic acid sequences), and can be used interchangeably with the terms "specificity to target nucleic acid" or "specific to target nucleic acid sequence". According to the definition, oligonucleotides having high specificity can amplify or detect only a desired target nucleic acid sequence from a sample containing a mixture of various nucleic acids by a PCR or real-time PCR method, while oligonucleotides having low specificity may amplify or detect non-targets as well as the target of interest, resulting in reduced target amplification efficiency and in false-positive results.
The term specificity as used herein may mean either or both of annealing specificity and target specificity.
While the specificity is dependent on several factors such as the hybridization conditions (e.g., temperature), the specificity can be determined primarily by the homology between the oligonucleotide sequence and the reference nucleotide sequence. In other words, the specificity may depend on the match/mismatches between the oligonucleotide and the reference nucleotide sequence. Those skilled in the art will be able to ascertain whether the designed oligonucleotide can hybridize to a nucleic acid sequence under certain conditions to selectively amplify or detect it, based on the match/mismatches between the designed oligonucleotide and the nucleotide sequence.
In addition, the term "information on specificity" as used herein refers to any information conducive to evaluation of specificity of an oligonucleotide. As described above, the information on specificity as used herein refers to information obtained by analyzing the similarity between the oligonucleotide sequence and the reference nucleotide sequence, i.e., the match/mismatches therebetween. Information on specificity will be described in detail below.
Also, the term "evaluating specificity" or "evaluation of specificity" as used herein includes determining the specificity of an oligonucleotide based on the provided information, i.e., the match/mismatches between an oligonucleotide sequence and the reference nucleotide sequence.
Those skilled in the art will be able to ascertain whether a designed oligonucleotide can hybridize to a specific nucleic acid sequence under certain conditions, based on the match/mismatches.
Further, those skilled in the art will be able to ascertain whether the designed oligonucleotide can hybridize only to the target nucleic acid sequence under certain conditions to selectively amplify or detect it, based on the match/mismatches between the designed oligonucleotide and the reference nucleotide sequence.
The present invention is directed to a method for evaluating specificity of a non-typical oligonucleotide comprising two or more consecutive bases, each of which is not involved in Watson-Crick base pairs, within the sequence. The present invention provides individual match/mismatch results in two portions (portions X and Z) separated by the consecutive bases not involved in Watson-Crick base pairs.
In particular, in the case of an oligonucleotide comprising two portions with different effects on the specificity, the user can accurately evaluate the specificity of the oligonucleotide through the mismatch results in each of the portions provided by the present invention. Therefore, the present method is particularly useful in evaluating specificity of such non-typical oligonucleotides.
Fig. 1 is a flow chart illustrating a process for evaluating specificity of an oligonucleotide according to an exemplary embodiment of the present invention. The method of the present invention 100 will be described in more detail with reference to Fig. 1 as follows:
Step (a): Providing an oligonucleotide 110
First, an oligonucleotide to be evaluated for specificity is provided in this step 110. The oligonucleotide is a primer or a probe, which is used for amplifying or detecting a target nucleic acid sequence.
The term "target nucleic acid sequence", "target sequence", or "target" as used herein refers to a nucleic acid sequence to be amplified or detected using the oligonucleotide of the present invention. The target nucleic acid sequence may be double-stranded or single-stranded. The target nucleic acid sequence may be either one strand or both strands of the double stranded nucleic acids, i.e., (+) strand (coding strand, sense strand, non-template strand) or (-) strand (non-coding strand, antisense strand, template strand). The target nucleic acid sequence may be one polynucleotide sequence comprising a region capable of hybridizing with the oligonucleotide of the present invention. Alternatively, the target nucleic acid sequence may be at least two polynucleotide sequences comprising a consensus region that can be hybridized with an oligonucleotide of the present invention. The target nucleic acid sequence may be a nucleotide sequence having genetic diversity. The target nucleic acid sequence may be a group consisting of genetically identical gene families, i.e., a gene and variants thereof. The target nucleic acid sequence may be a group of a gene and its subtypes which belong to the gene according to conventionally known classification criteria. For example, if an oligonucleotide is intended to amplify or detect human papillomavirus (HPV) type 16, the target nucleic acid sequence may be composed of a plurality of genes belonging to HPV type 16.
On the other hand, the term "non-target nucleic acid sequence", "non-target sequence", or "non-target" as used herein refers to a nucleic acid sequence other than the target nucleic acid sequence to be amplified or detected using the oligonucleotide of the present invention. The non-target nucleic acid sequence also includes nucleic acid sequences that are not intended to be amplified or detected, but can be accidentally amplified or detected using the oligonucleotide of the present invention.
The term "oligonucleotide" as used herein refers to a short polynucleotide to be evaluated for its specificity. The oligonucleotide may be referred to as "query" or "query sequence".
The oligonucleotide includes linear oligomers of natural or modified monomers or linkages, including deoxyribonucleotides and ribonucleotides, capable of specifically hybridizing to a target nucleic acid sequence, which can be naturally occurring or artificially synthesized. The oligonucleotide is preferably single stranded for maximum efficiency in amplification. Preferably, the oligonucleotide is an oligodeoxyribonucleotide. The oligonucleotide of this invention can be comprised of naturally occurring dNMP (i.e., dAMP, dGMP, dCMP and dTMP), modified nucleotide, or non-natural nucleotide. The oligonucleotide can also include ribonucleotides. For example, the oligonucleotide of this invention may include nucleotides with backbone modifications such as peptide nucleic acid (PNA) (M. Egholm et al., Nature, 365:566-568(1993)), phosphorothioate DNA, phosphorodithioate DNA, phosphoramidate DNA, amide-linked DNA, MMI-linked DNA, 2'-O-methyl RNA, alpha-DNA and methylphosphonate DNA, nucleotides with sugar modifications such as 2'-O-methyl RNA, 2'-fluoro RNA, 2'-amino RNA, 2'-O-alkyl DNA, 2'-O-allyl DNA, 2'-O-alkynyl DNA, hexose DNA, pyranosyl RNA, and anhydrohexitol DNA, and nucleotides having base modifications such as C-5 substituted pyrimidines (substituents including fluoro-, bromo-, chloro-, iodo-, methyl-, ethyl-, vinyl-, formyl-, ethynyl-, propynyl-, alkynyl-, thiazolyl-, imidazolyl-, pyridyl-), 7-deazapurines with C-7 substituents (substituents including fluoro-, bromo-, chloro-, iodo-, methyl-, ethyl-, vinyl-, formyl-, alkynyl-, alkenyl-, thiazolyl-, imidazolyl-, pyridyl-), inosine, and diaminopurine.
For example, the oligonucleotide of this invention may include a base other than natural bases (A, T, C or G).
The oligonucleotide to be evaluated for specificity in the method of the present invention is a primer or a probe.
The term "primer" as used herein refers to an oligonucleotide, which is capable of acting as a point of initiation of synthesis when placed under conditions in which synthesis of primer extension product which is complementary to a nucleic acid strand (template) is induced, i.e., in the presence of nucleotides and an agent for polymerization, such as DNA polymerase, and at a suitable temperature and pH. The primer must be sufficiently long to prime the synthesis of extension products in the presence of the agent for polymerization. The exact length of the primers will depend on many factors, including temperature, application, and source of primer.
The term "probe" as used herein refers to a single-stranded nucleic acid molecule comprising a portion or portions that are substantially complementary to a target nucleic acid sequence. The probe may contain a label capable of generating a signal for detection of a target nucleic acid sequence. The 3'-end of the probe may be "blocked" to prohibit its extension. The blocking may be achieved in accordance with conventional methods. For instance, the blocking may be performed by adding to the 3'-hydroxyl group of the last nucleotide a chemical moiety such as biotin, labels, a phosphate group, alkyl group, non-nucleotide linker, phosphorothioate or alkane-diol. Alternatively, the blocking may be carried out by removing the 3'-hydroxyl group of the last nucleotide or using a nucleotide with no 3'-hydroxyl group such as dideoxynucleotide.
The term "annealing" or "priming" as used herein refers to the apposition of an oligodeoxynucleotide or nucleic acid to a template nucleic acid, whereby the apposition enables the polymerase to polymerize nucleotides into a nucleic acid molecule which is complementary to the template nucleic acid or a portion thereof. The term "hybridizing" as used herein refers to the formation of a double-stranded nucleic acid from complementary single stranded nucleic acids. There is no intended distinction between the terms "annealing" and "hybridizing", and these terms will be used interchangeably.
The oligonucleotide to be evaluated for specificity in the present invention is an oligonucleotide represented by the following Formula (I):
5'-X-Y-Z-3' (I)
wherein X represents a portion comprising a hybridizing nucleotide sequence hybridized to a target nucleic acid sequence, Y represents a separation portion comprising two or more consecutive bases, each of which is not involved in Watson-Crick base pairs, and Z represents a portion comprising a hybridizing nucleotide sequence hybridized to a target nucleic acid sequence.
The oligonucleotide of Formula (I) has three different portions with distinct properties, and its annealing specificity to a target sequence is dually determined by its two separate portions, i.e., the portion X and the portion Z.
In general, the annealing specificities of conventional (typical) primers or probes are governed by their complete sequences. In contrast, the annealing specificity of the oligonucleotide of Formula (I) is dually determined by two separate portions, i.e., the portion X and the portion Z, which are separated by the portion Y.
In the oligonucleotide of Formula (I), the portion Y comprises two or more consecutive bases, each of which is not involved in Watson-Crick base pairs.
As used herein, a Watson-Crick base pair means that adenine (A) binds to thymine (T) or uracil (U), whilst guanine (G) binds to cytosine (C).
Thus, the base not involved in Watson-Crick base pairs refers to any base which does not form a Watson-Crick base pair with an opposing base in a target nucleic acid sequence. Particularly, the base not involved in Watson-Crick base pairs includes any base showing a lower strength (low melting temperature) of base pairing between the base and an opposing base in a target nucleic acid sequence than that of the base pairing between natural bases.
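By way of illustration only, the pairing rule defined above can be sketched as a small lookup. The function names and the convention that the target site is written so that its position i faces position i of the oligonucleotide are assumptions made for this sketch; they are not part of the disclosure. Any symbol absent from the table (for example "I" used here for inosine) is simply treated as not forming a Watson-Crick base pair.

```python
from typing import List

# Watson-Crick partners as defined in the text: A pairs with T or U, G pairs with C.
WATSON_CRICK = {
    "A": {"T", "U"},
    "T": {"A"},
    "U": {"A"},
    "G": {"C"},
    "C": {"G"},
}


def is_watson_crick(base: str, opposing: str) -> bool:
    """Return True if `base` and `opposing` form a Watson-Crick base pair."""
    return opposing.upper() in WATSON_CRICK.get(base.upper(), set())


def non_pairing_positions(oligo: str, target_site: str) -> List[int]:
    """List the 0-based positions whose bases do not form a Watson-Crick pair.

    Assumption: `target_site` is written so that its i-th base is the base
    opposing the i-th base of `oligo`.
    """
    if len(oligo) != len(target_site):
        raise ValueError("oligo and target_site must have the same length")
    return [
        i
        for i, (base, opposing) in enumerate(zip(oligo, target_site))
        if not is_watson_crick(base, opposing)
    ]
```

For instance, `non_pairing_positions("ACIIG", "TGCAC")` reports positions 2 and 3, i.e., the two consecutive bases that would make up a separation portion in this toy example.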
In an embodiment, the portion Y is designed to have the lowest Tm value among the three portions when the oligonucleotide anneals to a target nucleic acid sequence.
These bases not involved in Watson-Crick base pairs generate a bubble structure during annealing (hybridization) or amplification, particularly under the condition that the portion X and/or the portion Z specifically anneals (hybridizes) to a target nucleic acid sequence, and thereby separate the portions X and Z, enhancing the annealing specificity of the primer or probe to a target sequence.
Examples of the base not involved in Watson-Crick base pairs include: (i) non-natural bases; (ii) universal bases; and (iii) mismatched bases. In an embodiment, the bases comprised in the separation portion Y are selected from non-natural bases; universal bases; mismatched bases and combinations thereof.
The term "non-natural base" as used herein refers to derivatives of natural bases such as adenine (A), guanine (G), thymine (T), cytosine (C) and uracil (U), which are capable of forming hydrogen-bonding base pairs with each other (see U.S. Pat. No. 8,440,406). The term "non-natural base" as used herein includes bases having different base pairing patterns from natural bases as mother compounds, as described, for example, in U.S. Pat. Nos. 5,432,272, 5,965,364, 6,001,983, 6,037,120, and 8,440,406. The base pairing between non-natural bases involves two or three hydrogen bonds, as with natural bases. The base pairing between non-natural bases is also formed in a specific manner.
A non-natural base contained in an oligonucleotide of Formula (I) is not involved in Watson-Crick base pairs, if an opposing base in a target nucleic acid sequence is a natural base. The base pairing between a non-natural base and an opposing base in a target nucleic acid sequence has a low strength (low melting temperature) compared to the base pairing between natural bases. Thus, such base pairing serves to generate a bubble structure and to separate the portions X and Z.
Specific examples of non-natural bases include the following bases in base pair combinations: iso-C/iso-G, iso-dC/iso-dG, K/X, H/J, and M/N (see U.S. Pat. Nos. 7,422,850 and 8,440,406).
The term "universal base" as used herein refers to one capable of forming base pairs with each of the natural DNA/RNA bases with little discrimination between them, the base pairs being not involved in Watson-Crick base pairs.
The base pairing between a universal base contained in the oligonucleotide of Formula (I) and an opposing base contained in the target nucleic acid sequence has a low strength (low melting temperature) compared to the base pairing between natural bases. Thus, such base pairing serves to generate a bubble structure and to separate the portions X and Z.
Examples of the universal base include deoxyinosine, inosine, 7-deaza-2'-deoxyinosine, 2-aza-2'-deoxyinosine, 2'-OMe inosine, 2'-F inosine, deoxy 3-nitropyrrole, 3-nitropyrrole, 2'-OMe 3-nitropyrrole, 2'-F 3-nitropyrrole, 1-(2'-deoxy-beta-D-ribofuranosyl)-3-nitropyrrole, deoxy 5-nitroindole, 5-nitroindole, 2'-OMe 5-nitroindole, 2'-F 5-nitroindole, deoxy 4-nitrobenzimidazole, 4-nitrobenzimidazole, deoxy 4-aminobenzimidazole, 4-aminobenzimidazole, deoxy nebularine, 2'-F nebularine, 2'-F 4-nitrobenzimidazole, PNA-5-nitroindole, PNA-nebularine, PNA-inosine, PNA-4-nitrobenzimidazole, PNA-3-nitropyrrole, morpholino-5-nitroindole, morpholino-nebularine, morpholino-inosine, morpholino-4-nitrobenzimidazole, morpholino-3-nitropyrrole, phosphoramidate-5-nitroindole, phosphoramidate-nebularine, phosphoramidate-inosine, phosphoramidate-4-nitrobenzimidazole, phosphoramidate-3-nitropyrrole, 2'-O-methoxyethyl inosine, 2'-O-methoxyethyl nebularine, 2'-O-methoxyethyl 5-nitroindole, 2'-O-methoxyethyl 4-nitrobenzimidazole, 2'-O-methoxyethyl 3-nitropyrrole, and combinations thereof. In particular, the universal base is deoxyinosine, inosine, 1-(2'-deoxy-beta-D-ribofuranosyl)-3-nitropyrrole, or 5-nitroindole, more particularly, deoxyinosine or inosine.
The term "mismatched base" as used herein refers to a base which is not capable of forming hydrogen-bonding base pairs with an opposing base in a target nucleic acid sequence (see WO 2013/123552 and WO 2014/124290). The type of the mismatched base may vary depending upon the type of an opposing base in a target nucleic acid.
Since the mismatched base contained in the oligonucleotide of Formula (I) does not form a base pair with an opposing base contained in the target nucleic acid, the portion Y comprising the mismatched bases serves to generate a bubble structure and to separate the portions X and Z.
The portion Y may have two consecutive bases not involved in Watson-Crick base pairs, preferably three, four, five, six, seven, or more consecutive bases not involved in Watson-Crick base pairs. According to a particular embodiment, the portion Y has 2-10, 2-9, 2-8, 2-7, 2-6, 2-5, 2-4 or 2-3 consecutive bases not involved in Watson-Crick base pairs, more particularly 3-10, 3-9, 3-8, 3-7, 3-6, 3-5 or 3-4 consecutive bases not involved in Watson-Crick base pairs, most particularly 4-10, 4-9, 4-8, 4-7, 4-6 or 4-5 consecutive bases not involved in Watson-Crick base pairs.
In an embodiment, the portion Y has two consecutive non-natural bases, preferably three, four, five, six, seven, eight or more consecutive non-natural bases. In another embodiment, the portion Y has two consecutive universal bases, preferably three, four, five, six, seven, eight or more consecutive universal bases. In still another embodiment, the portion Y has two consecutive mismatched bases, preferably three, four, five, six, seven, eight or more consecutive mismatched bases. In still another embodiment, the portion Y has two, preferably three, four, five, six, seven, eight or more consecutive bases, each base being independently selected from non-natural bases, universal bases and mismatched bases.
In the oligonucleotide of Formula (I), the portions X and Z are portions, each having a hybridizing nucleotide sequence to a target nucleic acid sequence, i.e., portions, each having a hybridizing nucleotide sequence complementary to a site on a template nucleic acid to hybridize therewith.
The term "complementary" is used herein to mean that oligonucleotides are sufficiently complementary to hybridize selectively to a target nucleic acid sequence under the designated annealing conditions or stringent conditions, encompassing the terms "substantially complementary" and "perfectly complementary", preferably perfectly complementary.
It will be appreciated that the portion X and/or the portion Z in the oligonucleotide of Formula (I) may have one or more mismatches to a template (target nucleic acid sequence) to an extent that it can serve as primer or probe. For example, the portion X and/or the portion Z in the oligonucleotide of Formula (I) can have 1-2, 1-3 or 1-4 non-complementary nucleotides.
Most particularly, the portion X and/or the portion Z in the oligonucleotide of Formula (I) have a nucleotide sequence perfectly complementary to a site on a template, i.e., no mismatches.
The length of the portion X and the portion Z may be in the range from 3 to 50 nucleotide residues, respectively.
In an embodiment, the portion X is longer than the portion Z. Specifically, the length of the portion X is 15 to 50, 15 to 40, 15 to 30, or 15 to 25 nucleotide residues, more particularly, 17 to 50, 17 to 40, 17 to 30, or 17 to 25 nucleotide residues, and most particularly, 20 to 50, 20 to 40, 20 to 30, or 20 to 25 nucleotide residues. The length of the portion Z is 3 to 15, 3 to 12, or 3 to 10 nucleotide residues, more particularly, 5 to 15, 5 to 12, or 5 to 10 nucleotide residues, most particularly, 6 to 12 nucleotide residues.
In another embodiment, the portion Z is longer than the portion X. Specifically, the length of the portion Z is 15 to 50, 15 to 40, 15 to 30, or 15 to 25 nucleotide residues, more particularly, 17 to 50, 17 to 40, 17 to 30, or 17 to 25 nucleotide residues, and most particularly, 20 to 50, 20 to 40, 20 to 30, or 20 to 25 nucleotide residues. The length of the portion X is 3 to 15, 3 to 12, or 3 to 10 nucleotide residues, more particularly, 5 to 15, 5 to 12, or 5 to 10 nucleotide residues, most particularly, 6 to 12 nucleotide residues.
In an embodiment, the Tm of each of the portions X and Z ranges from 6℃ to 80℃, 6℃ to 70℃, 6℃ to 60℃, 6℃ to 50℃, 6℃ to 40℃, 10℃ to 80℃, 10℃ to 70℃, 10℃ to 60℃, 10℃ to 50℃, 10℃ to 40℃, 20℃ to 80℃, 20℃ to 70℃, 20℃ to 60℃, 20℃ to 50℃, 20℃ to 40℃, 30℃ to 80℃, 30℃ to 70℃, 30℃ to 60℃, 30℃ to 50℃, or 30℃ to 40℃. In an embodiment, the Tm of the portion Y ranges from 1℃ to 15℃, 1℃ to 10℃, 1℃ to 5℃, 2℃ to 15℃, 2℃ to 10℃, 2℃ to 5℃, 3℃ to 15℃, 3℃ to 10℃, or 3℃ to 5℃. In an embodiment, the Tm of the portion Y is lower than that of each of the portions X and Z.
In an embodiment, the Tm of the portion X is higher than that of the portion Z. In a particular embodiment, the Tm of the portion X is 5℃, 10℃, 15℃, 20℃ or 25℃ higher than that of the portion Z. In another embodiment, the Tm of the portion Z is higher than that of the portion X. In a particular embodiment, the Tm of the portion Z is 5℃, 10℃, 15℃, 20℃ or 25℃ higher than that of the portion X.
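The Tm relationship between the portions X and Z may be checked numerically. The description does not prescribe a particular Tm calculation, so the following minimal sketch assumes the simple Wallace rule, Tm = 2℃ × (A+T) + 4℃ × (G+C), which gives only a rough estimate for short sequences; the function names are hypothetical, and a nearest-neighbor model could be substituted.

```python
def wallace_tm(seq: str) -> int:
    """Rough Tm estimate in degrees Celsius using the Wallace rule.

    Assumption: the Wallace rule is used purely for illustration; it is not
    the Tm method of the source text, which leaves the method open.
    """
    seq = seq.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    if at + gc != len(seq):
        raise ValueError("only A, C, G and T are supported in this sketch")
    return 2 * at + 4 * gc


def tm_gap(x_portion: str, z_portion: str) -> int:
    """Tm of portion X minus Tm of portion Z (positive if X is the higher)."""
    return wallace_tm(x_portion) - wallace_tm(z_portion)
```

Under this rule, a 20-nucleotide X portion with ten G/C residues is estimated at 60℃ and the 8-nucleotide Z portion "GATTACAG" at 22℃, so the X portion exceeds the Z portion by more than 25℃.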
In the oligonucleotide of Formula (I), either or both of the X and Z portions may comprise at least one universal base or degenerate base.
In an embodiment, when either or both of the portions X and Z in the oligonucleotide of Formula (I) comprise two or more universal bases, the universal bases are not present contiguously in the oligonucleotide sequence, but are present separately. Where the Y portion also contains two or more consecutive universal bases, the two or more universal bases contained in either or both of the X portion and the Z portion are distinguished from two or more consecutive universal bases in the Y portion, in that these are present separately in the sequence.
In another embodiment, when either or both of the portions X and Z in the oligonucleotide of Formula (I) comprise two or more universal bases, the universal bases are present contiguously in the sequence of the oligonucleotide. Where the Y portion also contains two or more consecutive universal bases, the two or more universal bases contained in either or both of the X portion and the Z portion are not distinguished from two or more consecutive universal bases in the Y portion. In this case, any one of them may be treated or regarded as the Y portion. As one example, universal bases closer to the 5' end may be treated as the Y portion, and a portion at the 5' end around the Y portion is treated as the X portion and a portion at the 3' end around the Y portion is treated as the Z portion. As another example, a region distant from (distal to) the 5' end may be treated as the Y portion, and a portion at the 5' end around the Y portion is treated as the X portion and a portion at the 3' end around the Y portion is treated as the Z portion. As still another example, a region having more universal bases is treated as the Y portion, and a portion at the 5' end around the Y portion is treated as the X portion and a portion at the 3' end around the Y portion is treated as the Z portion.
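The three alternatives described above for designating the Y portion can be sketched as follows. In this illustrative sketch, the universal base is written as the letter "I" and the strategy names ("five_prime", "distal", "longest") are hypothetical labels introduced here; only runs of two or more consecutive universal bases are considered as candidates for the Y portion.

```python
import re
from typing import NamedTuple


class Portions(NamedTuple):
    x: str  # 5'-side hybridizing portion
    y: str  # separation portion
    z: str  # 3'-side hybridizing portion


def split_formula_i(oligo: str, universal: str = "I",
                    strategy: str = "five_prime") -> Portions:
    """Split a 5'->3' oligonucleotide string into X, Y and Z portions.

    strategy:
      "five_prime" - the run closest to the 5' end is treated as Y
      "distal"     - the run most distant from the 5' end is treated as Y
      "longest"    - the run with the most universal bases is treated as Y
    """
    pattern = f"{re.escape(universal)}{{2,}}"
    runs = [m.span() for m in re.finditer(pattern, oligo)]
    if not runs:
        raise ValueError("no run of two or more consecutive universal bases")
    if strategy == "five_prime":
        start, end = runs[0]
    elif strategy == "distal":
        start, end = runs[-1]
    elif strategy == "longest":
        start, end = max(runs, key=lambda r: r[1] - r[0])
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    x, y, z = oligo[:start], oligo[start:end], oligo[end:]
    if not x or not z:
        raise ValueError("the Y portion must be flanked by X and Z portions")
    return Portions(x, y, z)
```

For the toy sequence "ACGTACGTIIACGTIIIIACGT", the "five_prime" strategy yields X = "ACGTACGT", Y = "II" and Z = "ACGTIIIIACGT", whereas the "distal" and "longest" strategies both yield X = "ACGTACGTIIACGT", Y = "IIII" and Z = "ACGT".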
The term "degenerate base" as used herein means that any of the four bases (A, C, G or T) or a specific subset of the four bases (2 or 3 bases) may be present at the indicated position. In other words, the term means that more than one base is possible at a particular position. An oligonucleotide sequence can be synthesized with multiple bases at the same position; such a position is termed a degenerate base and is sometimes also referred to as a "wobble" position or "mixed base".
The degenerate bases may have different extents of degeneracy. The term "extent of degeneracy" refers to the number of bases that can occupy a given nucleotide position. "Full degeneracy" results when any of the four bases (A, C, G or T) can occupy a given degenerate position. In this case, four oligonucleotides may be used together, namely an oligonucleotide having the base A at the given degenerate position, an oligonucleotide having the base C at that position, an oligonucleotide having the base G at that position, and an oligonucleotide having the base T at that position. On the other hand, "partial degeneracy" results when a given degenerate position can be occupied by a specific subset of the four bases (2 or 3 bases), such as A/G, C/T, A/C/G, A/T/G, or the like.
With regard to the indication of the degenerate base, the IUB degenerate codes for nucleotide bases are used herein. In these codes, R means either of the purine bases A or G; Y means either of the pyrimidine bases C or T; M means either of the amino bases A or C; K means either of the keto bases G or T; S means either of the stronger hydrogen bonding partners C or G; W means either of the weaker hydrogen bonding partners A or T; H means A, C or T; B means G, T or C; V means G, C or A; D means G, A or T; and N means G, A, C or T.
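As an illustration, the IUB codes listed above can be held in a lookup table and used both to report the extent of degeneracy of a position and to enumerate the individual oligonucleotides represented by a degenerate sequence. The function names below are hypothetical and the sketch handles DNA symbols only.

```python
from itertools import product
from typing import List

# IUB degenerate codes as listed in the text (DNA alphabet).
IUB_CODES = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "M": "AC", "K": "GT", "S": "CG", "W": "AT",
    "H": "ACT", "B": "CGT", "V": "ACG", "D": "AGT",
    "N": "ACGT",
}


def extent_of_degeneracy(code: str) -> int:
    """Number of bases that can occupy a position written with `code`."""
    return len(IUB_CODES[code.upper()])


def expand_degenerate(seq: str) -> List[str]:
    """Enumerate every non-degenerate sequence represented by `seq`."""
    choices = [IUB_CODES[symbol] for symbol in seq.upper()]
    return ["".join(combination) for combination in product(*choices)]
```

For example, `expand_degenerate("AYT")` gives the two sequences "ACT" and "ATT", and a sequence containing one R and one N position represents 2 × 4 = 8 individual oligonucleotides.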
According to a particular embodiment of the present invention, the oligonucleotide represented by Formula (I) is a dual specificity oligonucleotide (referred to as DSO or DPO) as disclosed in WO 2006/095981. Details of the dual specificity oligonucleotide are found supra.
According to another particular embodiment of the present invention, the oligonucleotide represented by Formula (I) is a target discriminative (TD) probe as disclosed in WO 2011/028041. Details of the target discriminative probe are found supra.
The oligonucleotide of Formula (I) provided in this step may be a pre-existing oligonucleotide (primer or probe).
Alternatively, the oligonucleotide of Formula (I) provided in this step may be an oligonucleotide designed based on the target nucleic acid sequence to be amplified or detected.
The oligonucleotide may be one which is designed manually or by using a design program well known in the art. Examples of conventional primer/probe design programs include, without limitation, Primer3 (http://frodo.wi.mit.edu/), Visual OMP™ software (DNA Software, Inc., Ann Arbor, Mich.), Integrated DNA Technology (IDT) OligoAnalyzer 3.0 program (http://scitools.idtdna.com/Analvzer/oligocalc.asp), DINAmelt™ program (http://dinamelt.bioinfo.rpi.edu/), OLIGO 7 (Wojciech Rychlik (2007) "OLIGO 7 Primer Analysis Software", Methods Mol. Biol. 402: 35-60) and Primer Express 3.0 software (Applied Biosystems USA).
The oligonucleotide of Formula (I) is designed such that its X and Z portions have a sequence that can be substantially hybridized to the target nucleic acid sequence. For this purpose, the X and Z portions in the oligonucleotide of Formula (I) are designed to match (have a significant sequence similarity to) a specific region of the target nucleic acid sequence.
When the oligonucleotide of Formula (I) is intended for amplifying or detecting a plurality of target nucleic acid sequences (for example, a nucleotide sequence having genetic diversity; a group consisting of genetically identical gene families, i.e., a gene and variants thereof; a group of a gene and its subtypes), the oligonucleotide may be prepared by aligning the plurality of target nucleic acid sequences, finding a common sequence, i.e., a conserved region, and designing an oligonucleotide sequence to match the conserved region. The oligonucleotide of Formula (I) may be designed to have 100% identity with a plurality of target nucleic acid sequences. Alternatively, the oligonucleotide of Formula (I) may be designed to have a few mismatches for a plurality of target nucleic acid sequences, as long as it can be hybridized to the target nucleic acid sequences under controlled hybridization conditions (e.g., temperature).
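The search for a conserved region among a plurality of target nucleic acid sequences can be sketched as below. This is a minimal illustration which assumes that the sequences have already been aligned to equal length by a separate alignment tool (with "-" as the gap symbol) and which treats only columns that are identical in every sequence as conserved; the function name is hypothetical.

```python
from typing import List, Tuple


def longest_conserved_region(aligned: List[str]) -> Tuple[int, str]:
    """Return (start column, sequence) of the longest fully conserved block.

    Assumption: `aligned` holds pre-aligned sequences of equal length, and a
    column counts as conserved only if all sequences carry the same non-gap
    base there. An empty string is returned if no column is conserved.
    """
    if not aligned:
        raise ValueError("at least one sequence is required")
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("aligned sequences must have equal length")

    best_start, best_len, run_start = 0, 0, None
    # Iterate one column past the end so that a trailing run is closed.
    for col in range(length + 1):
        conserved = (
            col < length
            and aligned[0][col] != "-"
            and all(seq[col] == aligned[0][col] for seq in aligned)
        )
        if conserved:
            if run_start is None:
                run_start = col
        elif run_start is not None:
            if col - run_start > best_len:
                best_start, best_len = run_start, col - run_start
            run_start = None
    return best_start, aligned[0][best_start:best_start + best_len]
```

A candidate oligonucleotide matching the returned block would have 100% identity with every aligned target; tolerating a few mismatches, as also contemplated above, would require relaxing the all-identical column test.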
The oligonucleotide of Formula (I) may be one of a plurality of candidate oligonucleotides designed based on a target nucleic acid sequence(s). One of skill in the art can design a plurality of candidate oligonucleotides of Formula (I) based on a known target nucleic acid sequence(s), and the oligonucleotide of Formula (I) used in the method of the present invention may be one of the plurality of candidate oligonucleotides.
The oligonucleotide of Formula (I) may be one of the oligonucleotides used in multiplex amplification or detection. The oligonucleotide of Formula (I) may be one of a plurality of oligonucleotides (or candidate oligonucleotides) for amplifying or detecting a plurality of target nucleic acid sequences.
In addition, the oligonucleotide of Formula (I) may be one of a pair of primers (i.e., a forward primer and a reverse primer) for amplifying a target nucleic acid sequence.
The oligonucleotide of Formula (I) is one which can be used for PCR or real-time PCR. The oligonucleotide of Formula (I) is also useful in a variety of fields, for example: (i) primer-involving nucleic acid amplification methods, such as the methods of Miller, H. I. (WO 89/06700) and Davey, C. et al. (EP 329,822), ligase chain reaction (LCR, Wu, DY et al., Genomics 4: 560 (1989)), polymerase ligase chain reaction (Barany, PCR Methods and Applic., 1: 5-16 (1991)), gap-LCR (WO 90/01069), repair chain reaction (EP 439,182), 3SR (Kwoh et al., PNAS, USA, 86: 1173 (1989)) and NASBA (U.S. Pat. No. 5,130,238); (ii) primer extension-related techniques, such as cycle sequencing (Kretz et al., (1994) Cycle sequencing. PCR Methods Appl. 3: S107-S112) and pyrosequencing (Ronaghi et al., (1996) Anal. Biochem., 242: 84-89 and (1998) Science 281: 363-365); and (iii) hybridization-related techniques, such as the detection of a target nucleotide sequence using an oligonucleotide microarray. The oligonucleotide of the present invention is one which can be applied to various nucleic acid amplification, sequencing, and hybridization-related techniques.
Step (b): Comparing against a nucleotide sequence database and extracting reference nucleotide sequences comprising a homologous region 120
In this step, the complete or a partial sequence of the oligonucleotide of Formula (I) is compared against at least one database of nucleotide sequences, and reference nucleotide sequences comprising a region homologous to the complete or a partial sequence of the oligonucleotide of Formula (I) are extracted from the database 120.
The term "database of nucleotide sequences", "nucleotide sequence database", "nucleotide database", or "database" as used herein refers to a set or collection of data relating to two or more nucleotide sequences derived from various sources. The database of nucleotide sequences may comprise information related to nucleotide sequences, for example, their specific sequences and identities. The database may be publicly available, commercially available, or generated by the inventor. The database is a collection arranged for ease and speed of search and retrieval by a computer.
Examples of databases well known in the art include, but are not limited to, a GenBank database, an EST database, an EMBL nucleotide sequence database, an Entrez nucleotide database, and a LIFESEQ™ database. The database of nucleotide sequences herein may also be referred to as a "reference database".
The database to be compared with the oligonucleotide of Formula (I) herein may be any of the databases described above, or a combination thereof.
The comparison of the complete or a partial sequence of the oligonucleotide of Formula (I) against at least one database of nucleotide sequences in this step (b) involves searching the database using a sequence alignment algorithm or program. Also, the comparison of the complete or a partial sequence of the oligonucleotide of Formula (I) against at least one database of nucleotide sequences in this step (b) involves aligning the complete or a partial sequence of the oligonucleotide with each of the nucleotide sequences in the database using a sequence alignment algorithm or program. Further, the comparison of the complete or a partial sequence of the oligonucleotide of Formula (I) against at least one database of nucleotide sequences in this step (b) involves aligning the complete or a partial sequence of the oligonucleotide with each of the nucleotide sequences in the database and analyzing the alignments. Moreover, the comparison of the complete or a partial sequence of the oligonucleotide of Formula (I) against at least one database of nucleotide sequences in this step (b) involves aligning the complete or a partial sequence of the oligonucleotide with each of the nucleotide sequences in the database and determining homology or similarity therebetween.
In this step, the comparison between two sequences, i.e., between the complete or a partial sequence of the oligonucleotide of Formula (I) and nucleotide sequences in a database, may be performed using a sequence alignment algorithm or program.
The sequence alignment algorithm or program is well known in the art. Examples of sequence alignment algorithms or programs include the local homology algorithm of Smith and Waterman (1981, Adv. Appl. Math. 2: 482), the homology alignment algorithm of Needleman and Wunsch (1970, J. Mol. Biol.), the search for similarity via the method of Pearson and Lipman (1988, Proc. Nat'l. Acad. Sci. USA 85: 2444), computerized implementations of these algorithms (GAP, BESTFIT, FASTA and TFASTA in the Wisconsin Genetics Software Package Release 7.0, Genetics Computer Group, 575 Science Drive, Madison, Wisconsin), manual alignment, and inspection.
Other examples of algorithms or programs for determining homology include a BLAST program (Basic Local Alignment Search Tool at the National Center for Biotechnology Information), ALIGN, AMAS (Analysis of Multiply Aligned Sequences), AMPS (Protein Multiple Sequence Alignment), ASSET (Aligned Segment Statistical Evaluation Tool), BANDS, BESTSCOR, BIOSCAN (Biological Sequence Comparative Analysis Node), BLIMPS (BLocks IMProved Searcher), FASTA, Intervals & Points, BMB, CLUSTAL V, CLUSTAL W, CONSENSUS, LCONSENSUS, WCONSENSUS, Smith-Waterman algorithm, DARWIN, Las Vegas algorithm, FNAT (Forced Nucleotide Alignment Tool), Framealign, Framesearch, DYNAMIC, FILTER, FSAP (Fristensky Sequence Analysis Package), GAP (Global Alignment Program), GENAL, GIBBS, GenQuest, ISSC (Sensitive Sequence Comparison), LALIGN (Local Sequence Alignment), LCP (Local Content Program), MACAW (Multiple Alignment Construction & Analysis Workbench), MAP (Multiple Alignment Program), MBLKP, MBLKN, PIMA (Pattern-Induced Multi-sequence Alignment), SAGA (Sequence Alignment by Genetic Algorithm) and WHAT-IF. In particular, the sequence alignment algorithm or program is selected from the group consisting of the Smith-Waterman, Needleman-Wunsch, BLAST, and FASTA algorithms or programs.
The sequence alignment algorithms or programs use appropriate parameters to find a region homologous to an oligonucleotide (query sequence). The sequence alignment algorithm or program used in the method of the present invention may employ parameters set to default values, or may employ parameters adjusted appropriately by those skilled in the art. For example, a representative sequence alignment algorithm or program, the BLAST algorithm, uses parameters such as E-value, Reward/penalty, Gap penalty, Gap creation, Word size, Scoring matrix, PSSM, Filter, and the like. The parameters in the sequence alignment algorithm or program may be appropriately adjusted by one skilled in the art, in order to control the amount (number) of reference nucleotide sequences to be extracted, through regulation of the degree (extent) of homology (homology cutoff) between the complete or a partial sequence of the oligonucleotide of Formula (I) and each of the reference nucleotide sequences in the database. In particular, in view that the oligonucleotide of Formula (I) is short in length, it is preferable to decrease the word size and to increase the E value as compared with their default values, in order to increase match probability.
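For instance, a search with a decreased word size and an increased E-value could be set up with the NCBI BLAST+ command-line program blastn, as sketched below. The file names, the database name and the specific values (word size 7, E-value 1000) are assumptions chosen for illustration and are not values taught by the description; the sketch only assembles the argument list and does not run the search.

```python
from typing import List


def build_blastn_command(query_fasta: str, database: str, out_path: str,
                         word_size: int = 7,
                         evalue: float = 1000.0) -> List[str]:
    """Assemble a blastn argument list suited to a short query sequence.

    Assumptions: NCBI BLAST+ is installed and `database` names a formatted
    nucleotide database. The default values here are illustrative only.
    """
    return [
        "blastn",
        "-task", "blastn-short",       # task preset intended for short queries
        "-query", query_fasta,
        "-db", database,
        "-word_size", str(word_size),  # smaller word size than the blastn default
        "-evalue", str(evalue),        # larger E-value than the blastn default
        "-outfmt", "6",                # tabular output, one hit per line
        "-out", out_path,
    ]
```

The returned list could then be passed to `subprocess.run(command, check=True)`, for example with the hypothetical file names "oligo.fasta" and "hits.tsv".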
In an embodiment of the invention, the sequence alignment algorithm or program used in the present invention may be an algorithm or program developed by the present inventors. The algorithm or program may be one developed to evaluate specificity of an oligonucleotide comprising two or more contiguous bases, each of which is not involved in Watson-Crick base pairs, or optionally comprising non-contiguous universal bases or degenerate bases within its sequence. The algorithm or program may not consider the sequence of the Y portion in the oligonucleotide of Formula (I). For example, the algorithm or program does not consider the homology between the sequence of the Y portion in the oligonucleotide of Formula (I) and a corresponding reference nucleotide sequence in the database. That is, the comparison using the above algorithm or program may include determination of homology in two portions X and Z except for the portion Y.
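The inventors' algorithm or program is not reproduced in this description. Purely as a hypothetical illustration of a comparison that does not consider the Y portion, the following sketch slides the oligonucleotide along a reference sequence without gaps, scores only the positions facing the X and Z portions, and skips the positions facing the Y portion. It assumes that the reference is given in the same 5' to 3' sense as the oligonucleotide; all names are introduced here for illustration.

```python
from typing import Optional, Tuple


def best_hit_ignoring_y(x: str, y_len: int, z: str,
                        reference: str) -> Optional[Tuple[int, float, float]]:
    """Find the ungapped placement with the highest combined X and Z identity.

    Returns (start, x_identity, z_identity), with identities as fractions
    between 0 and 1, or None if the reference is shorter than the oligo.
    The y_len positions between X and Z are skipped and never scored.
    """
    if not x or not z:
        raise ValueError("the X and Z portions must not be empty")
    total = len(x) + y_len + len(z)
    best = None
    for start in range(len(reference) - total + 1):
        window = reference[start:start + total]
        x_matches = sum(a == b for a, b in zip(x, window[:len(x)]))
        z_matches = sum(a == b for a, b in zip(z, window[len(x) + y_len:]))
        hit = (start, x_matches / len(x), z_matches / len(z))
        # Keep the first placement that achieves the highest combined score.
        if best is None or hit[1] + hit[2] > best[1] + best[2]:
            best = hit
    return best
```

With X = "ACGTAC", a Y portion of three bases and Z = "GGA", the reference "TTTACGTACCCCGGATT" gives a best placement at position 3 with full identity in both X and Z, regardless of which bases the reference carries opposite the Y portion.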
After performing the comparison as described above, reference nucleotide sequences comprising a region homologous to the complete or a partial sequence of the oligonucleotide of Formula (I) are extracted from the database.
As used herein, the term "reference nucleotide sequence" refers to a sequence within a database, which comprises a region homologous to the complete or a partial sequence of the oligonucleotide of Formula (I). The number of reference nucleotide sequences extracted may be at least one.
Each of the reference nucleotide sequences comprises a homologous region and optionally its flanking regions.
As used herein, the term "region homologous", "homologous region" or "homology region" with regard to the complete or a partial sequence of the oligonucleotide of Formula (I) refers to a particular region within a reference nucleotide sequence from a database, which is identical or similar to the complete or a partial sequence of the oligonucleotide of Formula (I). In other words, the homologous region refers to a specific region within a reference nucleotide sequence that matches the complete or a partial sequence of the oligonucleotide of Formula (I).
The extracted reference nucleotide sequences may have homologous sequences of different sizes.
In an embodiment, the homologous region is the same length as the oligonucleotide provided in step (a). For example, where the oligonucleotide provided in step (a) comprises a relatively small number of contiguous bases not involved in Watson-Crick base pairs (e.g., two or three universal bases), the reference nucleotide sequences extracted by the BLAST algorithm may include a homologous region of the same length as the oligonucleotide provided in step (a). In this case, the homologous region is the same length as and has homology with the complete sequence of the oligonucleotide provided in step (a).
In another embodiment, the homologous region is shorter than the oligonucleotide provided in step (a). For example, where the oligonucleotide provided in step (a) comprises a relatively large number of contiguous bases not involved in Watson-Crick base pairs (e.g., four, five or six or more universal bases), the reference nucleotide sequences extracted by the BLAST algorithm may include a homologous region shorter than the oligonucleotide provided in step (a). Specifically, where an oligonucleotide represented by 5'-X-Y-Z-3' (particularly, one having a relatively large number of contiguous bases not involved in Watson-Crick base pairs in the Y portion) is compared against a database using the BLAST algorithm, reference nucleotide sequences comprising a region homologous to only the X portion (the homologous region having the same length as the portion X) may be extracted. In this case, the homologous region is shorter than the complete sequence of the oligonucleotide provided in step (a), and has homology with a partial sequence of the oligonucleotide, i.e., the X portion.
The phrase "a region homologous to the complete or a partial sequence of the oligonucleotide" indicates a region within a reference nucleotide sequence, which has a substantial homology (similarity) to the complete or a partial sequence of the oligonucleotide. The substantial homology indicates that the homology between the region within the reference nucleotide sequence and the complete or a partial sequence of the oligonucleotide is higher than a defined or selected degree of homology (a certain threshold). The defined degree of homology refers to a criterion or threshold for extracting, from a database, reference nucleotide sequences having high similarity or homology with a designed oligonucleotide. For example, the defined degree of homology may be 50% or more, 60% or more, 70% or more, 80% or more, 90% or more, 91% or more, 92% or more, 93% or more, 94% or more, 95% or more, 96% or more, 97% or more, 98% or more, or 99% or more, based on the total number of bases in either of two aligned nucleotide sequences. In one embodiment of the present invention, the defined degree of homology between the sequence in either of the portions X and Z of the oligonucleotide and a corresponding reference nucleotide sequence is 90% or more, 91% or more, 92% or more, 93% or more, 94% or more, 95% or more, 96% or more, 97% or more, 98% or more, or 99% or more, based on the total number of bases in either of two aligned nucleotide sequences. 
In another embodiment of the present invention, the defined degree of homology between the sequence in the portion X of the oligonucleotide and a corresponding reference nucleotide sequence is 90% or more, 91% or more, 92% or more, 93% or more, 94% or more, 95% or more, 96% or more, 97% or more, 98% or more, or 99% or more, and the defined degree of homology between the sequence in the portion Z of the oligonucleotide and the homologous region in the corresponding reference nucleotide sequence is 90% or more, 91% or more, 92% or more, 93% or more, 94% or more, 95% or more, 96% or more, 97% or more, 98% or more, or 99% or more, based on the total number of bases in either of two aligned nucleotide sequences.
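The use of a defined degree of homology as a cutoff for each of the portions X and Z can be illustrated as follows. In this minimal sketch, homology is taken to be the percentage of identical positions in an ungapped alignment of two equal-length sequences, which is a simplifying assumption, and the function names are hypothetical.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percentage of identical positions between two aligned sequences.

    Assumption: the two sequences are aligned without gaps and therefore
    have the same length, which is used as the total number of bases.
    """
    if not seq_a or len(seq_a) != len(seq_b):
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(a == b for a, b in zip(seq_a, seq_b))
    return 100.0 * matches / len(seq_a)


def passes_cutoff(x: str, ref_x: str, z: str, ref_z: str,
                  cutoff: float = 90.0) -> bool:
    """True if both the X and the Z portion meet the defined degree of homology."""
    return (percent_identity(x, ref_x) >= cutoff
            and percent_identity(z, ref_z) >= cutoff)
```

With a cutoff of 90%, a 20-nucleotide X portion having one mismatch (95%) and a 10-nucleotide Z portion having one mismatch (90%) would both qualify, whereas a second mismatch in the Z portion (80%) would exclude the reference nucleotide sequence.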
In an embodiment, the complete sequence of the oligonucleotide of Formula (I) is used in the comparison of step (b).
In a particular embodiment, when the complete sequence of the oligonucleotide of Formula (I) is compared against at least one database of nucleotide sequences, reference nucleotide sequences comprising a region homologous to the complete sequence of the oligonucleotide of Formula (I) may be extracted from a database in step (b). For example, the complete sequence of an oligonucleotide consisting of 30 nucleotide residues is compared against a GenBank database, and reference nucleotide sequences, each comprising a homologous region of 30 nucleotides in length, may be extracted from the database in step (b).
In another particular embodiment, when the complete sequence of the oligonucleotide of Formula (I) is compared against at least one database of nucleotide sequences, reference nucleotide sequences comprising a region homologous to a partial sequence of the oligonucleotide of Formula (I) (e.g., the portion X, the portion Z or a part thereof) may be extracted from the database in step (b). For example, the complete sequence of an oligonucleotide of 30 nucleotides in length is compared against a GenBank database, and reference sequences comprising a homologous region of less than 30 nucleotides in length may be extracted from the database in step (b).
In another embodiment, a partial sequence of the oligonucleotide of Formula (I) is used in the comparison of step (b).
The partial sequence of the oligonucleotide of Formula (I) used in the comparison of step (b) may be the portion X, the portion Z, or a part thereof.
In a particular embodiment, when a partial sequence of the oligonucleotide of Formula (I) is compared against at least one database of nucleotide sequences, reference nucleotide sequences comprising a region homologous to the partial sequence of the oligonucleotide of Formula (I) may be extracted from a database in step (b). For example, only the portion X consisting of 15 nucleotide residues is compared against a GenBank database, and reference nucleotide sequences comprising a homologous region of 15 nucleotides in length may be extracted from the database in step (b).
In another particular embodiment, when a partial sequence of the oligonucleotide of Formula (I) is compared against at least one database of nucleotide sequences, reference nucleotide sequences comprising a region homologous to a part of the partial sequence of the oligonucleotide of Formula (I) may be extracted from a database in step (b). For example, only the portion X consisting of 15 nucleotide residues is compared against a GenBank database, and reference nucleotide sequences comprising a homologous region of less than 15 nucleotides in length may be extracted from the database in step (b).
In an embodiment of the present invention, only the sequence of the X portion in the oligonucleotide is compared against at least one database of nucleotide sequences, and then reference nucleotide sequences comprising a region homologous to the X portion may be extracted from the database in step (b).
In another embodiment of the present invention, only the sequence of the Z portion in the oligonucleotide is compared against at least one database of nucleotide sequences, and then reference nucleotide sequences comprising a region homologous to the Z portion may be extracted from the database in step (b).
In still another embodiment of the present invention, only the sequence of the part of the X portion in the oligonucleotide is compared against at least one database of nucleotide sequences, and then reference nucleotide sequences comprising a region homologous to the part of the X portion may be extracted from the database in step (b).
In yet another embodiment of the present invention, only the sequence of the part of the Z portion in the oligonucleotide is compared against at least one database of nucleotide sequences, and then reference nucleotide sequences comprising a region homologous to the part of the Z portion may be extracted from the database in step (b).
According to embodiments using a partial sequence of the oligonucleotide of Formula (I), the comparison (i.e., homology determination) between the oligonucleotide and a database of nucleotide sequences is made between the sequence of the X or Z portion, or a part thereof in the oligonucleotide and reference nucleotide sequences in a database. That is, the homology determination is characterized by using a partial sequence, particularly a partial sequence except for the Y portion.
The use of a partial sequence rather than the complete sequence of the oligonucleotide prevents the Y portion from adversely affecting the homology determination, thereby enabling the extraction of reference nucleotide sequences with more precise homology. Namely, the use of a partial sequence of the oligonucleotide makes it possible to avoid the problem that the homologous region is misjudged because of the bases in the Y portion that are not involved in Watson-Crick base pairs.
The reference nucleotide sequences extracted according to any of the above embodiments are those comprising a region homologous to the sequence in the X or Z portion, or a part thereof.
An exemplary procedure is illustrated in Fig. 2, in which only the sequence of the X portion in the oligonucleotide is compared against a database of nucleotide sequences, and then reference nucleotide sequences comprising a region homologous to the sequence of the X portion are extracted from the database.
Step (c): Analyzing match/mismatch 130
Afterwards, portion-by-portion match/mismatch between the oligonucleotide of Formula (I) and each of the reference nucleotide sequences is analyzed, and (i) the number or ratio of matched or mismatched bases between the portion X of the oligonucleotide of Formula (I) and each of the reference nucleotide sequences and individually (ii) the number or ratio of matched or mismatched bases between the portion Z of the oligonucleotide of Formula (I) and each of the reference nucleotide sequences are provided 130.
In this step, a match/mismatch between the oligonucleotide of Formula (I) provided in step (a) and each of the reference nucleotide sequences extracted in step (b) is analyzed portion-by-portion.
The term "portion-by-portion match/mismatch" as used herein refers to the match/mismatch in each portion of the oligonucleotide of Formula (I). The term is used interchangeably with "local match/mismatch".
Also, the phrase "analyzing portion-by-portion match/mismatch" as used herein indicates analyzing a match/mismatch per each portion of the oligonucleotide of Formula (I). Thus, "analyzing portion-by-portion match/mismatch between the oligonucleotide of Formula (I) and each of the reference nucleotide sequences" indicates analyzing match/mismatch between the sequence of each of the portions X and Z in the oligonucleotide of Formula (I) and the sequence of a corresponding portion in each of the reference nucleotide sequences.
The analysis of portion-by-portion match/mismatch involves: comparing the sequence of the portion X in the oligonucleotide of Formula (I) with a corresponding sequence in each of the reference nucleotide sequences to calculate match/mismatches therebetween and comparing the sequence of the portion Z in the oligonucleotide of Formula (I) with a corresponding sequence in each of the reference nucleotide sequences to calculate match/mismatches therebetween.
As a result, (i) the number or ratio of matched or mismatched bases between the portion X of the oligonucleotide of Formula (I) and each of the reference nucleotide sequences and individually (ii) the number or ratio of matched or mismatched bases between the portion Z of the oligonucleotide of Formula (I) and each of the reference nucleotide sequences are provided.
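The portion-by-portion analysis described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the oligonucleotide is supplied as three separate strings in the order X, Y, Z; the reference window has already been cut to the full oligonucleotide length and is written in the same orientation as the oligonucleotide; and "I" is used merely as a placeholder symbol for bases in the Y portion. The function name is hypothetical.

```python
from typing import Tuple


def portion_mismatches(x: str, y: str, z: str, ref_window: str) -> Tuple[int, int]:
    """Return (Xm, Zm): mismatch counts for the X and Z portions only.

    The Y portion is skipped entirely; it only determines where the
    Z-corresponding region of the reference window begins.
    """
    if len(ref_window) != len(x) + len(y) + len(z):
        raise ValueError("reference window must span the complete oligonucleotide")
    ref_x = ref_window[:len(x)]             # region opposite the X portion
    ref_z = ref_window[len(x) + len(y):]    # region opposite the Z portion
    xm = sum(1 for a, b in zip(x, ref_x) if a != b)
    zm = sum(1 for a, b in zip(z, ref_z) if a != b)
    return xm, zm
```

Keeping the two counts separate, rather than summing them, is what allows the X and Z portions to be assessed individually in the later evaluation.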
The numbers or ratios of matched or mismatched bases in the portions X and Z are conducive to evaluation of the specificity of the oligonucleotides of Formula (I). Thus, these are collectively referred to herein as information on specificity.
In the case of an oligonucleotide comprising consecutive universal bases in a sequence, such as a dual specificity oligonucleotide, the specificity is determined dually by the X portion and the Z portion separated by the consecutive universal bases. Therefore, it is very important to check the annealing specificity in each of the X and Z portions of the oligonucleotide, for evaluation of specificity of the oligonucleotide.
However, conventional sequence alignment algorithms or programs do not provide individual mismatch information for each of the X and Z portions as described above. Also, if the homology scores between the reference nucleotide sequences and the complete sequence of the oligonucleotide are somewhat low, conventional sequence alignment algorithms or programs may only provide match/mismatch results for a partial sequence of the oligonucleotide, not a complete sequence of the oligonucleotide. For example, when an oligonucleotide of 20 nucleotide residues is BLAST searched, the BLAST algorithm may also provide match/mismatch results for less than 20 nucleotides in length. In such a case, it may be impossible to obtain match/mismatch results in either or both of the portions X and Z.
In contrast, the method of the present invention provides individual match/mismatch results in the X and Z portions. Thus, the user can evaluate specificity of the oligonucleotide in a more accurate manner based on the results.
According to the present invention, the numbers or ratios of matched or mismatched bases in each of the X and Z portions are provided for all extracted reference nucleotide sequences. Thus, based on the results, the user can ascertain whether the designed oligonucleotide is hybridized only to a target nucleic acid sequence.
For oligonucleotides where the match in the Z portion is more important than the match in the X portion in terms of specificity, the presence of mismatches between the Z portion of the oligonucleotide and the target nucleic acid sequence provides a strong basis for the user to select other oligonucleotides instead of the designed oligonucleotide. On the other hand, the presence of mismatches in the X portion provides a hint for the user to decide whether to use the oligonucleotide in view of hybridization conditions, since the oligonucleotide even with mismatched base pairs in the X portion may hybridize to a target nucleic acid sequence under certain conditions. As such, match/mismatch results of the X and Z portions are very useful in evaluating specificity of oligonucleotides of Formula (I).
The numbers or ratios of matched or mismatched bases provided in this step may be calculated by comparing the sequence of the X portion in the oligonucleotide with the corresponding sequence in each of the reference nucleotide sequences and comparing the sequence of the Z portion in the oligonucleotide with the corresponding sequence in each of the reference nucleotide sequences.
In an embodiment, the complete sequence of the oligonucleotide of Formula (I) is aligned (arranged) with each of the extracted reference nucleotide sequences on the basis of the homologous region thereof, and the numbers or ratios of matched or mismatched bases are then analyzed in the X and Z portions. In an embodiment, such alignment information (or result) can be obtained when a reference nucleotide sequence is extracted.
In one embodiment of the present invention, when the complete sequence of the oligonucleotide of Formula (I) is compared against at least one database of nucleotide sequences and reference nucleotide sequences comprising a region homologous to the complete sequence of the oligonucleotide of Formula (I) are extracted in step (b), the portion-by-portion match/mismatch between the complete sequence of the oligonucleotide of Formula (I) and the homologous region in each of the reference nucleotide sequences is analyzed, and the numbers or ratios of matched or mismatched bases in the portions X and Z are provided.
For example, when the complete sequence of an oligonucleotide of Formula (I) of 40 nucleotides in length is compared against at least one database of nucleotide sequences and reference nucleotide sequences comprising a region (40 nucleotides in length) homologous to the complete sequence of the oligonucleotide of Formula (I) are extracted in step (b), the numbers or ratios of matched or mismatched bases in the portions X and Z can be directly calculated, because the homologous region already contains sequences corresponding to the portions X and Z in the oligonucleotide of Formula (I).
In another embodiment of the present invention, when the complete sequence of the oligonucleotide of Formula (I) is compared to at least one database of nucleotide sequences and reference nucleotide sequences comprising a region homologous to a partial sequence of the oligonucleotide of Formula (I) are extracted in step (b), the portion-by-portion match/mismatch between the complete sequence of the oligonucleotide of Formula (I) and the homologous region and its flanking regions of each of the reference nucleotide sequences is analyzed, and the numbers or ratios of matched or mismatched bases in the portions X and Z are provided.
For example, when the complete sequence of an oligonucleotide of Formula (I) of 40 nucleotides in length is compared against at least one database of nucleotide sequences and reference nucleotide sequences comprising a region (e.g., 10-15, 10-20, 10-30 or 10-35 nucleotides in length) homologous to a partial sequence of the oligonucleotide of Formula (I) are extracted in step (b), the numbers or ratios of matched or mismatched bases in the portions X and Z cannot be directly calculated, because the homologous region may not contain sequences corresponding to the portions X and Z in the oligonucleotide of Formula (I). In this case, in addition to the homologous region, its flanking regions are further used for the calculation of the numbers or ratios of matched or mismatched bases in the portions X and Z. In other words, the complete sequence of the oligonucleotide of Formula (I) is compared with a corresponding sequence in each of the reference nucleotide sequences comprising the homologous region and its flanking regions, to calculate the numbers or ratios of matched or mismatched bases in the portions X and Z.
The flanking regions refer to the remaining regions except for the homologous region in the reference nucleotide sequence. For example, when the homologous region is homologous to the portion X in the oligonucleotide of Formula (I), the flanking regions include a region corresponding to the Y portion and a region corresponding to the Z portion. When the homologous region is homologous to the portion Z in the oligonucleotide of Formula (I), the flanking regions include a region corresponding to the Y portion and a region corresponding to the X portion.
In still another embodiment of the present invention, when a partial sequence of the oligonucleotide of Formula (I) is compared to at least one database of nucleotide sequences and the reference nucleotide sequences comprising a region homologous to the partial sequence of the oligonucleotide of Formula (I) are extracted in step (b), the portion-by-portion match/mismatch between the complete sequence of the oligonucleotide of Formula (I) and the homologous region and its flanking regions of each of the reference nucleotide sequences is analyzed, and the numbers or ratios of matched or mismatched bases in the portions X and Z are provided.
For example, when a partial sequence (e.g., 10-15, 10-20, 10-30 or 10-35 nucleotides in length) of an oligonucleotide of Formula (I) of 40 nucleotides in length is compared against at least one database of nucleotide sequences, and then reference nucleotide sequences comprising a region (e.g., 10-15, 10-20, 10-30 or 10-35 nucleotides in length) homologous to the partial sequence of the oligonucleotide of Formula (I) are extracted in step (b), the numbers or ratios of matched or mismatched bases in the portions X and Z cannot be directly calculated using only the homologous regions, because the homologous region may not contain sequences corresponding to the portions X and Z in the oligonucleotide of Formula (I). In this case, in addition to the homologous region, its flanking regions are further used for the calculation of the numbers or ratios of matched or mismatched bases in the portions X and Z. In other words, the complete sequence of the oligonucleotide of Formula (I) is compared with a corresponding sequence in each of the reference nucleotide sequences comprising the homologous region and its flanking regions, to calculate the numbers or ratios of matched or mismatched bases in the portions X and Z.
As noted above, the homologous region in each of the reference nucleotide sequences may be the same or shorter in length compared to the oligonucleotide of Formula (I) provided in step (a). Specifically, when the complete sequence of the oligonucleotide of Formula (I) is compared against a database of nucleotide sequences and the number of the bases not involved in Watson-Crick base pairs in the Y portion is relatively small, reference nucleotide sequences comprising a region homologous to the complete sequence of the oligonucleotide of Formula (I) may be extracted. In contrast, when the number of bases not involved in Watson-Crick base pairs in the Y portion is relatively large, reference nucleotide sequences comprising a region homologous to a partial sequence of the oligonucleotide of Formula (I) may be extracted. Further, when a partial sequence of the oligonucleotide of Formula (I) is used for comparison, reference nucleotide sequences comprising a region homologous to the partial sequence of the oligonucleotide of Formula (I) may be extracted.
Such comparison or analysis may also be referred to as "extension of the comparison", in that the comparison in step (b) utilizes a partial sequence of the oligonucleotide while the comparison in step (c) utilizes the complete sequence of the oligonucleotide.
When reference nucleotide sequences comprising a region homologous to a partial sequence of the oligonucleotide of Formula (I) are extracted, the homologous region may be extended and then the numbers or ratios of matched or mismatched bases in the portions X and Z can be calculated. Extending the homologous region to calculate the numbers or ratios of matched or mismatched bases indicates that the homologous region is extended over a sequence corresponding to the complete sequence of the oligonucleotide, and then the numbers or ratios of matched or mismatched bases are calculated. In other words, it indicates that the sequences of the flanking regions are taken (or restored) from the extracted nucleic acid sequence or database to calculate the numbers or ratios of matched or mismatched bases in the portions X and Z.
The process using a partial sequence of the oligonucleotide of Formula (I) to obtain match/mismatch results of the portions X and Z is illustrated in Fig. 2.
As shown in Fig. 2, when a reference nucleotide sequence comprising a region homologous to the X portion of the oligonucleotide of Formula (I) is extracted, its flanking regions opposite to the Z portion are taken from the database or the extracted reference nucleotide sequence to calculate the numbers of mismatched bases in the portions X and Z. Conversely, when a reference nucleotide sequence comprising a region homologous to the Z portion of the oligonucleotide of Formula (I) is extracted, its flanking regions opposite to the X portion are taken from the database or the extracted reference nucleotide sequence to calculate the numbers of mismatched bases in the portions X and Z.
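The extension of a homologous region over its flanking regions might be sketched as below. The sketch assumes 0-based coordinates, an oligonucleotide laid out in the order X, Y, Z, a reference written in the same orientation as the oligonucleotide, and a gap-free hit covering exactly one portion; the function name `extend_window` and its parameters are hypothetical and are not taken from the present disclosure.

```python
from typing import Optional


def extend_window(reference: str, hit_start: int, hit_portion: str,
                  len_x: int, len_y: int, len_z: int) -> Optional[str]:
    """Restore the reference region opposite the complete oligonucleotide.

    hit_start   -- 0-based start of the homologous region in the reference
    hit_portion -- "X" or "Z": which portion the homologous region matches
    Returns None when the extended window would run past either end of the
    reference sequence.
    """
    if hit_portion == "X":
        start = hit_start                    # flanks lie downstream (Y and Z)
    elif hit_portion == "Z":
        start = hit_start - len_x - len_y    # flanks lie upstream (X and Y)
    else:
        raise ValueError("hit_portion must be 'X' or 'Z'")
    end = start + len_x + len_y + len_z
    if start < 0 or end > len(reference):
        return None
    return reference[start:end]
```

The returned window can then be compared against the X and Z portions to obtain the mismatch counts, regardless of which portion produced the original hit.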
In the case of the oligonucleotides of Formula (I), the bases contained in the Y portion hybridize to corresponding bases in the target nucleic acid sequence at a relatively low affinity as compared to the bases that form Watson-Crick base pairs. That is, when the oligonucleotide of Formula (I) is hybridized to the target nucleic acid sequence, the Y portion can form a loop structure. This loop formation of the Y portion may narrow the space between a region to which the X portion hybridizes and a region to which the Z portion hybridizes.
Thus, given such hybridization variability, it is preferred that the flanking region opposite to a portion X or Z of interest, which is used for the calculation of the number or ratio of matched or mismatched bases, is determined by considering the portion of interest and its possible opposing regions.
For example, assuming that a total of 5 bases are contained in the Y portion, when a reference nucleotide sequence comprising a region homologous to the X portion of the oligonucleotide of Formula (I) is extracted, the flanking region opposite to the Z portion is generally a region which is 5 nucleotides apart from the homologous region to which the X portion is hybridized, but it may also be a region which is 4 nucleotides or 3 nucleotides apart from the homologous region to which the X portion is hybridized, due to the loop formation on the Y portion.
For example, when a total of 5 bases are contained in the Y portion, the calculation of the number or ratio of matched or mismatched bases may be made between the Z portion and a region which is 5 nucleotides from the region to which the X portion is hybridized, between the Z portion and a region which is 4 nucleotides from the region to which the X portion is hybridized, and between the Z portion and a region which is 3 nucleotides from the region to which the X portion is hybridized.
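The evaluation of several candidate spacings can be sketched as follows. This is an illustrative sketch only: `x_end` is assumed to be the 0-based index immediately after the region to which the X portion hybridizes, the reference is assumed to be in the same orientation as the oligonucleotide, and the function name is hypothetical.

```python
from typing import Dict, Iterable


def z_mismatches_by_gap(reference: str, x_end: int, z: str,
                        gaps: Iterable[int]) -> Dict[int, int]:
    """Count Z-portion mismatches for each candidate spacing after the X region.

    A gap equal to the length of the Y portion corresponds to no loop;
    smaller gaps model loop formation in the Y portion.
    """
    results = {}
    for gap in gaps:
        start = x_end + gap
        window = reference[start:start + len(z)]
        if len(window) == len(z):  # skip windows that run off the reference
            results[gap] = sum(1 for a, b in zip(z, window) if a != b)
    return results
```

For a Y portion of 5 bases, calling the function with gaps of 5, 4 and 3 gives one mismatch count per spacing, from which, for example, the smallest count could be reported.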
In an embodiment, there is provided the number of matched bases in each of the portions X and Z.
In an embodiment, there is provided the ratio of the number of mismatched bases to the number of matched bases in each of the portions X and Z.
In an embodiment, there is provided the ratio of the number of mismatched bases to the total number of nucleotides in each of the portions X and Z.
In an embodiment, there is provided the ratio of the number of matched bases to the number of mismatched bases in each of the portions X and Z.
In an embodiment, there is provided the ratio of the number of matched bases to the total number of nucleotides in each of the portions X and Z.
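The ratios listed in the embodiments above can be computed from the matched and mismatched counts of a single portion, for example as in the following sketch. The dictionary keys and the choice to return `None` when a denominator is zero are assumptions made for this illustration.

```python
from typing import Dict, Optional


def portion_ratios(matched: int, mismatched: int) -> Dict[str, Optional[float]]:
    """Ratios for one portion (X or Z) from its matched/mismatched base counts."""
    total = matched + mismatched

    def ratio(numerator: int, denominator: int) -> Optional[float]:
        # A zero denominator leaves the ratio undefined.
        return numerator / denominator if denominator else None

    return {
        "mismatched_to_matched": ratio(mismatched, matched),
        "mismatched_to_total": ratio(mismatched, total),
        "matched_to_mismatched": ratio(matched, mismatched),
        "matched_to_total": ratio(matched, total),
    }
```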
When either or both of the portions X and Z comprise at least one universal base or degenerate base, the method of the present invention may change the criterion for treating the universal bases or degenerate bases as a match or a mismatch, followed by providing the numbers or ratios of matched or mismatched bases based on the changed criterion in step (c).
In one embodiment of the present invention, when either or both of the portions X and Z in the oligonucleotide of Formula (I) comprise at least one universal base, the universal base may not be counted as a mismatched base in step (c). That is, when there is at least one universal base in either or both of the portions X and Z in the oligonucleotide of Formula (I), the universal base is treated as a matched base, regardless of the type of a corresponding nucleotide in each of the reference nucleotide sequences. For example, if there are three mismatched bases and one additional universal base in the X portion consisting of 15 nucleotides, then one embodiment of the invention may determine the total number of mismatched bases to be three (3).
In an embodiment of providing the number of matched bases in the portions X and Z, the universal base may or may not be counted as matched bases. For example, if there are three mismatched bases and one additional universal base in the X portion of 15 nucleotides in length, the total number of matched bases in the X portion may be determined to be twelve (12). Alternatively, the total number of matched bases in the X portion may be determined to be eleven (11).
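The two counting conventions for universal bases can be illustrated with the sketch below. It assumes that the universal base is written as "I" (for example, deoxyinosine) in the oligonucleotide string; that symbol, the function name, and the flag `universal_as_match` are assumptions for illustration only.

```python
from typing import Tuple

UNIVERSAL = "I"  # assumed placeholder symbol for a universal base


def count_with_universal(portion: str, ref: str,
                         universal_as_match: bool = True) -> Tuple[int, int]:
    """Return (matched, mismatched) for one portion.

    A universal base is never counted as a mismatch. It is counted as a
    match only when universal_as_match is True; otherwise it is left out
    of both counts.
    """
    if len(portion) != len(ref):
        raise ValueError("portion and reference region must have equal length")
    matched = mismatched = 0
    for p, r in zip(portion, ref):
        if p == UNIVERSAL:
            if universal_as_match:
                matched += 1
        elif p == r:
            matched += 1
        else:
            mismatched += 1
    return matched, mismatched
```

For a 15-nucleotide X portion with three mismatched bases and one universal base, this yields three mismatches in both modes, and either twelve or eleven matched bases depending on the flag, consistent with the example above.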
In one embodiment of the present invention, when either or both of the portions X and Z in the oligonucleotide of Formula (I) comprise at least one degenerate base, the method of the present invention takes into account the match between the degenerate base and a corresponding base in the reference nucleotide sequence. That is, when a degenerate base is present in either or both of the portions X and Z in the oligonucleotide of Formula (I), the degenerate base may or may not be counted as a mismatched base in step (c), depending on the type of the degenerate base (i.e., depending on the bases represented by the degenerate base).
In a particular embodiment, when either or both of the portions X and Z in the oligonucleotide of Formula (I) comprise at least one degenerate base, the degenerate base is not counted as a mismatched base in step (c), with the proviso that any one of the bases represented by the degenerate base matches the corresponding base in the reference nucleotide sequence. Conventional sequence alignment algorithms or programs such as BLAST treat the degenerate base as a mismatch, regardless of its type. In contrast, the present method is characterized by determining a match/mismatch based on the type of the degenerate base. For example, if there is a degenerate base "R" (either of the purine bases A or G) in the oligonucleotide and the corresponding base in the reference nucleotide sequence to be compared is adenine (A) or guanine (G), the present method treats the degenerate base as a match. On the other hand, when the corresponding base in the reference nucleotide sequence to be compared is cytosine (C) or thymine (T), the present method treats the degenerate base as a mismatch. Thus, the method of the present invention can produce more accurate match/mismatch results even in the presence of degenerate bases within the oligonucleotide of Formula (I), compared to conventional sequence alignment algorithms or programs.
In another embodiment of the present invention, when either or both of the portions X and Z in the oligonucleotide of Formula (I) comprise at least one degenerate base, the degenerate base is converted into each of the bases encompassed by said degenerate base, and steps (b) and (c) are then performed.
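The type-dependent treatment of degenerate bases can be sketched with a lookup table of the standard IUPAC nucleotide ambiguity codes. The function names are hypothetical, and the sketch assumes that degenerate bases occur only in the oligonucleotide and that the reference contains only A, C, G and T.

```python
# Standard IUPAC nucleotide ambiguity codes and the bases they represent.
IUPAC = {
    "A": {"A"}, "C": {"C"}, "G": {"G"}, "T": {"T"},
    "R": {"A", "G"}, "Y": {"C", "T"}, "S": {"C", "G"}, "W": {"A", "T"},
    "K": {"G", "T"}, "M": {"A", "C"},
    "B": {"C", "G", "T"}, "D": {"A", "G", "T"},
    "H": {"A", "C", "T"}, "V": {"A", "C", "G"},
    "N": {"A", "C", "G", "T"},
}


def base_matches(oligo_base: str, ref_base: str) -> bool:
    """A degenerate base matches if any base it represents equals the reference base."""
    return ref_base in IUPAC[oligo_base]


def count_mismatches(portion: str, ref: str) -> int:
    """Mismatch count for one portion, honoring degenerate bases in the oligonucleotide."""
    if len(portion) != len(ref):
        raise ValueError("portion and reference region must have equal length")
    return sum(1 for p, r in zip(portion, ref) if not base_matches(p, r))
```

With this table, "R" opposite A or G is a match, whereas "R" opposite C or T is a mismatch, as in the example above.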
For example, when there is a degenerate base "R" (either of the purine bases A or G) in the oligonucleotide of Formula (I), a first oligonucleotide in which the degenerate base "R" is converted into adenine (A) and a second oligonucleotide in which the degenerate base "R" is converted into guanine (G) are prepared, and are subjected to the present method, respectively. This method can prevent the degenerate bases from being judged as a mismatch and thus affecting the extraction of nucleotide sequences having a homology region.
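The conversion of a degenerate oligonucleotide into its non-degenerate variants could be sketched as follows, using `itertools.product` from the Python standard library. The table lists the standard IUPAC ambiguity codes; the function name `expand_degenerate` is hypothetical, and symbols not in the table are passed through unchanged as an assumption of this sketch.

```python
from itertools import product
from typing import List

# Standard IUPAC ambiguity codes; the bases for each code are listed alphabetically.
DEGENERATE = {
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def expand_degenerate(oligo: str) -> List[str]:
    """List every non-degenerate sequence encoded by a degenerate oligonucleotide."""
    choices = [DEGENERATE.get(base, base) for base in oligo]
    return ["".join(combo) for combo in product(*choices)]
```

Each resulting sequence can then be subjected to steps (b) and (c) separately. Note that the number of variants grows multiplicatively with the number of degenerate positions.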
According to one embodiment, the numbers or ratios of matched or mismatched bases between each of the portions X and Z in the oligonucleotide of Formula (I) and each of the reference nucleotide sequences can be expressed in a variety of ways.
For example, the number of mismatched bases in the X portion and the number of mismatched bases in the Z portion may be collectively presented as Xm | Zm, (Xm, Zm), Xm-Zm, Xm & Zm, and the like; wherein "Xm" represents the number of mismatched bases in the X portion, and "Zm" represents the number of mismatched bases in the Z portion.
For example, the notation "0 | 0" indicates that the number of mismatched bases between the X portion and the reference nucleotide sequence is zero, and the number of mismatched bases between the Z portion and the reference nucleotide sequence is zero. In other words, the notation means that the oligonucleotide of Formula (I), except for the Y portion, perfectly matches the reference nucleotide sequence. Likewise, "1 | 0" indicates that the number of mismatched bases between the X portion and the reference nucleotide sequence is 1 and the number of mismatched bases between the Z portion and the reference nucleotide sequence is zero. Conversely, "0 | 1" indicates that the number of mismatched bases between the X portion and the reference nucleotide sequence is zero and the number of mismatched bases between the Z portion and the reference nucleotide sequence is 1.
In addition to the notations above, it will be appreciated by those skilled in the art that the numbers of mismatched bases in the portions X and Z can be expressed in other ways.
In an embodiment, the total number of nucleotides in each of the portions X and Z or the number of matched bases in each of the portions X and Z can be indicated additionally.
The numbers of mismatched bases are highly associated with the specificity of the oligonucleotide of Formula (I).
The numbers of mismatched bases in the X and Z portions may differently affect the evaluation of specificity, particularly annealing specificity, of the oligonucleotide of Formula (I), while the Y portion has no effect on the evaluation of the specificity. As discussed above, the number of mismatched bases in the X portion and the number of mismatched bases in the Z portion have a negative effect on the specificity of the oligonucleotide to different degrees. Taking this difference into account, the method of the present invention may impart different weights to the two values, i.e., the number or ratio of mismatched bases in the X portion and the number or ratio of mismatched bases in the Z portion, for a more accurate evaluation of the specificity of the oligonucleotide of Formula (I).
According to one embodiment, the match in the Z portion is more important than that in the X portion upon the determination of the specificity (for example, a dual specificity oligonucleotide disclosed in WO 2006/095981). In this case, oligonucleotides having one mismatch in the Z portion can be evaluated to have poor specificity compared to oligonucleotides having one mismatch in the X portion. Further, oligonucleotides having one mismatch in the Z portion may be evaluated to have poor specificity compared to oligonucleotides having two, three or four mismatches in the X portion. Considering that the number of mismatched bases in the X portion and the number of mismatched bases in the Z portion differently affect the evaluation of the specificity as described above, the weight to be given to the number of mismatched bases in the Z portion may be greater than the weight to be given to the number of mismatched bases in the X portion. The weights can be given by a person skilled in the art in various ways.
According to another embodiment, the match in the X portion is more important than that in the Z portion upon the determination of the specificity (see, for example, a target discriminative (TD) probe disclosed in WO 2011/028041). In this case, the weight to be given to the number of mismatched bases in the X portion may be greater than the weight to be given to the number of mismatched bases in the Z portion.
Furthermore, an embodiment of the present invention may impart a penalty score to the oligonucleotide of Formula (I) based on the numbers of mismatched bases in the X and Z portions. The penalty score is a value reflecting the degradation of the specificity of the oligonucleotide of Formula (I).
The penalty score may be given per mismatched base. The penalty score to be given per mismatched base in the X portion and the penalty score to be given per mismatched base in the Z portion may be different from each other.
In one embodiment, if the match in the Z portion is more important than that in the X portion upon the determination of the specificity, the penalty score to be given per mismatched base in the X portion may be smaller than the penalty score to be given per mismatched base in the Z portion. Such difference in penalty scores can be achieved by giving a weighted penalty score. For example, assuming that the specificity of the oligonucleotide of Formula (I) in which no mismatched base is present in both the X and Z portions (i.e., the portions X and Z in the oligonucleotide perfectly matched with a target nucleic acid sequence) is "100", a penalty score of "10" may be given per a mismatched base in the X portion, and a penalty score of "20", "30", "40", "50", or "60" per a mismatched base in the Z portion. In this case, the specificity of the oligonucleotide having one mismatched base in the X portion will be "90" (= 100 - 10), and the specificity of the oligonucleotide having one mismatched base pair in the Z site will be "80", "70", "60", "50", or "40", respectively. As such, the present invention allows for accurate specificity evaluation by imparting different weighted penalty scores to the portion X and the portion Z, depending on the numbers of mismatched bases in the portions X and Z.
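The weighted penalty scheme in the example above can be written as a short Python sketch. The function name and default values simply mirror the illustrative numbers in the text (a base score of 100, a penalty of 10 per X-portion mismatch, and a selectable penalty per Z-portion mismatch); the absence of a lower bound on the score is an assumption of this sketch.

```python
def specificity_score(xm: int, zm: int,
                      x_penalty: int = 10, z_penalty: int = 20,
                      base_score: int = 100) -> int:
    """Specificity score after subtracting weighted per-mismatch penalties.

    xm, zm -- numbers of mismatched bases in the X and Z portions
    The Y portion does not contribute to the score.
    """
    return base_score - x_penalty * xm - z_penalty * zm
```

Swapping the two penalty values gives the converse weighting for oligonucleotides in which the match in the X portion is the more important one.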
In another embodiment, if the match in the X portion is more important than that in the Z portion upon the determination of the specificity, the penalty score to be given per a mismatched base in the Z portion is smaller than that in the X portion.
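The weighted penalty arithmetic described above can be sketched as follows. This is a minimal illustration under stated assumptions rather than the implementation of the invention: the function name `specificity_score` and its parameter names are hypothetical, and the default values (a perfect score of 100, a penalty of 10 per X mismatch and 20 per Z mismatch) are simply the figures used in the worked example.

```python
def specificity_score(x_mismatches, z_mismatches,
                      x_penalty=10, z_penalty=20, perfect_score=100):
    """Subtract a per-base penalty for every mismatch in X and in Z.

    The Y portion is deliberately absent: it is not considered
    in the evaluation of specificity.
    """
    return perfect_score - x_penalty * x_mismatches - z_penalty * z_mismatches


# One mismatch in X only, then one mismatch in Z only with two different weights.
print(specificity_score(1, 0))
print(specificity_score(0, 1))
print(specificity_score(0, 1, z_penalty=60))
```

Swapping the two penalty values gives the other embodiment, in which the match in the X portion carries more weight than that in the Z portion.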
Meanwhile, unlike the X and Z portions described above, the Y portion does not affect the evaluation of the specificity of the oligonucleotide, so that the Y portion is not considered in the evaluation of the specificity.
Since the present invention provides the match/mismatch results in the X and Z portions of the oligonucleotide individually, the specificity of each of X and Z portions can be evaluated individually by the match/mismatch results for each portion.
In an embodiment, the specificity of each of X and Z portions may be determined by assessing the match/mismatch results in each portion based on different criteria (e.g. different match/mismatch threshold).
For example, with regard to the specificity of the X portion, it is determined whether there are two or fewer mismatches between the X portion and a reference nucleotide sequence, and with regard to the specificity of the Z portion, it is determined whether there is one or fewer mismatch between the Z portion and a reference nucleotide sequence.
The specificity of the oligonucleotide can be evaluated by combining the specificity evaluation in each portion, thereby determining the nucleotide sequences to which the oligonucleotide is annealed or hybridized.
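Evaluating each portion against its own criterion and then combining the two results can be sketched as below. The function name `passes_portion_thresholds` is hypothetical, and the default thresholds (at most two mismatches in X, at most one in Z) are only the values of the example above; other criteria may equally be chosen.

```python
def passes_portion_thresholds(x_mismatches, z_mismatches, max_x=2, max_z=1):
    """Return True when both portions satisfy their own mismatch threshold."""
    x_ok = x_mismatches <= max_x   # criterion applied to the X portion only
    z_ok = z_mismatches <= max_z   # criterion applied to the Z portion only
    return x_ok and z_ok


print(passes_portion_thresholds(2, 1))  # both portions within their limits
print(passes_portion_thresholds(1, 2))  # Z portion exceeds its limit
```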
In an embodiment, the number of match/mismatches per each portion can be predefined for evaluating specificity, and then the coverage, inclusivity and exclusivity of the oligonucleotide can be evaluated. Further, the coverage, inclusivity and exclusivity of the oligonucleotide can be modulated, if needed, by adjusting the hybridization conditions and the like.
In addition to the numbers or ratios of matched or mismatched bases in the portions X and Z, the present method may further provide the direction of the match between the oligonucleotide of Formula (I) and each of the reference nucleotide sequences. Specifically, the direction of the match may be provided to distinguish the oligonucleotide of the Formula (I) matched to the (+) strand (coding strand, sense strand, non-template strand) of the reference nucleotide sequence from the oligonucleotide of the Formula (I) matched to the (-) strand (non-coding strand, antisense strand, template strand) of the reference nucleotide sequence. For example, if the oligonucleotide of Formula (I) matches the (+) strand of the reference nucleotide sequence, an indication such as "F" or "+" may be presented, and otherwise an indication such as "R" or "-" may be presented. The direction of the match may be presented in combination with the numbers of mismatched bases in the X and Z portions described above. For example, notations such as "F Xm | Zm", "+ Xm | Zm", "R Xm | Zm", "- Xm | Zm", and the like can be used. The above notation is illustrated in Fig. 3. As shown in Fig. 3, the notation "- 1 | 0" indicates, in a simple and intuitive manner, that the oligonucleotide of Formula (I) matches the (-) strand of the reference nucleotide sequence, and that the oligonucleotide of Formula (I) has one mismatched base in the X portion and no mismatched base in the Z portion.
The present method may further provide biological features of the reference nucleotide sequences.
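The notation described above can be produced by a very small formatting routine. The sketch below is illustrative only; the function name `format_match_notation` is hypothetical, and it assumes the "+" / "-" style of direction indicator rather than "F" / "R".

```python
def format_match_notation(strand, x_mismatches, z_mismatches):
    """Build a 'D Xm | Zm' string, where D is '+' or '-'."""
    if strand not in ("+", "-"):
        raise ValueError("strand must be '+' or '-'")
    return f"{strand} {x_mismatches} | {z_mismatches}"


# Match to the (-) strand with one mismatch in X and none in Z.
print(format_match_notation("-", 1, 0))
```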
The biological features of the reference nucleotide sequences comprise the sources, gene IDs, or descriptions of the reference nucleotide sequences extracted. In addition, the biological features of the reference nucleotide sequences may include the position of a region corresponding to the oligonucleotide (for example, the position numbers of the nucleotides at the 5' end and the 3' end). Further, the biological features of the reference nucleotide sequences may include a list of reference nucleotide sequences having some homology with the designed oligonucleotide. The biological features of the reference nucleotide sequences may include one or more features provided in the sequence alignment algorithm or program, such as the conventional BLAST algorithm.
The biological features of the reference nucleotide sequences may be useful in evaluating specificity of an oligonucleotide. The user analyzes the list of reference nucleotide sequences comprising a region homologous to the designed oligonucleotide and their specific sequence information, thereby determining whether the designed oligonucleotide amplifies or detects (or hybridizes to) only the target nucleic acid sequence, but not the non-target nucleic acid sequence. Furthermore, it is possible to control the degree of mismatch of the oligonucleotide, specifically, the degree of mismatch in the X and Z portions with regard to a target nucleic acid sequence. The presence of a target nucleic acid sequence and the absence of non-target nucleic acid sequences in the list of reference nucleotide sequences indicate that the oligonucleotide is suitable for amplification or detection of a target nucleic acid sequence. In contrast, the presence of non-target nucleic acid sequences in the list of reference nucleotide sequences indicates that the oligonucleotide is not suitable for amplification or detection of a target nucleic acid sequence, which becomes a strong basis for selecting other oligonucleotides.
The biological features of the reference nucleotide sequences include information conducive to determination of the target coverage of the oligonucleotide.
The present method may further provide results of classification of the reference nucleotide sequences according to the number of mismatched bases in the portion X and the number of mismatched bases in the portion Z.
The user needs to identify reference nucleotide sequences which are homologous to the designed oligonucleotide, and thus the provision of such results of classification is very useful in determining the specificity of the designed oligonucleotide.
The results of classification of the reference nucleotide sequences are those obtained by grouping (sorting) the reference nucleotide sequences on the basis of the number of mismatched bases in the X portion and the number of mismatched bases in the Z portion, including, for example, a list and the number of reference nucleotide sequences belonging to each group, and biological properties of each of the reference nucleotide sequences.
Primers or probes may also hybridize with reference nucleotide sequences with a few mismatches under certain hybridization conditions. Therefore, in order to evaluate the suitability or workability of the designed primer or probe, it is necessary to check perfectly matched reference nucleotide sequences as well as partially matched reference nucleotide sequences. To this end, the method of the present invention provides a list and the number of reference nucleotide sequences belonging to each group, and biological properties of each of the reference nucleotide sequences in a simple and intuitive manner.
Specifically, the number of reference nucleotide sequences having the mismatch of "0 | 0" (the number of mismatched bases in the X portion is zero and the number of mismatched bases in the Z portion is zero) with regard to the oligonucleotide of Formula (I) may be provided. Further, the number of reference nucleotide sequences having the mismatch of "1 | 0" (the number of mismatched bases in the X portion is 1 and the number of mismatched bases in the Z portion is zero), "0 | 1", "1 | 2", "2 | 2", "3 | 0", "3 | 1", "3 | 2", and the like, with regard to the oligonucleotide of Formula (I) may be provided.
For example, if the number of reference nucleotide sequences belonging to the mismatch type "0 | 0" is provided as "30", it means that there are 30 reference nucleotide sequences which are 100% identical to the X and Z portions in the oligonucleotide of Formula (I). Those skilled in the art will take into account information about the reference nucleotide sequences corresponding to "1 | 0", "0 | 1", "1 | 1", "2 | 0", "2 | 1", "0 | 2", "2 | 2", "3 | 0", "3 | 1", "3 | 2", or the like for accurate evaluation of specificity of the oligonucleotide of Formula (I).
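Grouping the reference nucleotide sequences by mismatch type, and reporting the list and number in each group, can be sketched as follows. The function name `classify_by_mismatch_type`, the tuple layout of a hit, and the identifiers `ref_A` to `ref_D` are all hypothetical and serve only to illustrate the grouping.

```python
from collections import defaultdict


def classify_by_mismatch_type(hits):
    """Group (sequence_id, x_mismatches, z_mismatches) tuples by 'Xm | Zm'."""
    groups = defaultdict(list)
    for sequence_id, x_mismatches, z_mismatches in hits:
        groups[f"{x_mismatches} | {z_mismatches}"].append(sequence_id)
    return dict(groups)


# Made-up hits for illustration; not data from the specification.
hits = [("ref_A", 0, 0), ("ref_B", 1, 0), ("ref_C", 0, 0), ("ref_D", 1, 1)]
groups = classify_by_mismatch_type(hits)
counts = {mismatch_type: len(ids) for mismatch_type, ids in groups.items()}
print(groups)
print(counts)
```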
If the match in the Z portion is more important than that in the X portion upon evaluation of specificity, reference nucleotide sequences corresponding to the mismatch type "1 | 0" are highly likely to be amplified or detected using the oligonucleotide of Formula (I). Thus, when there is a non-target nucleic acid sequence among the reference nucleotide sequences belonging to the mismatch type "1 | 0", the user may design another oligonucleotide to avoid amplification or detection of the non-target nucleic acid sequence, or may ignore the amplification or detection of the non-target nucleic acid sequences if the number or the importance of the non-target nucleic acid sequences is low. If there is a target nucleic acid sequence within the mismatch type "0 | 1", the target nucleic acid sequence is unlikely to be amplified or detected using the oligonucleotide of Formula (I). Thus, the user can modify the sequence of the oligonucleotide of Formula (I) (e.g., by incorporating a degenerate base) or design another oligonucleotide, in order to cover the target nucleic acid sequence belonging to the mismatch type "0 | 1". Also, if non-target nucleic acid sequences are present among the reference nucleotide sequences corresponding to the mismatch type "0 | 1", it is preferable to check whether the non-target nucleic acid sequences are to be amplified or detected, and then determine whether to use the oligonucleotide. The results of classification of the reference nucleotide sequences based on the number of mismatched bases in the X portion and the number of mismatched bases in the Z portion are useful in evaluating the specificity of the designed oligonucleotide in a simpler and more intuitive manner.
The results of classification may further comprise information of each reference nucleotide sequence.
Further, the information provided may be used to determine whether the oligonucleotide exhibits the same match results as those that have been reviewed at the time of initial design. For example, assuming that the oligonucleotide of Formula (I) was designed to match five target nucleic acid sequences in the mismatch type "0 | 0" (the number of mismatched bases in the X portion is zero and the number of mismatched bases in the Z portion is zero), to match three target nucleic acid sequences in the mismatch type "1 | 0" and to match two target nucleic acid sequences in the mismatch type "1 | 1", it is possible to determine whether the same results are obtained as the mismatch results considered in the design, by comparing the designed oligonucleotide against a database containing only the target nucleic acid sequences, and identifying the numbers of target nucleic acid sequences belonging to the mismatch types "0 | 0", "1 | 0", and "1 | 1", respectively.
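The design check described above amounts to comparing the observed number of target sequences per mismatch type with the numbers assumed at design time. A minimal sketch follows; the function name `matches_design` is hypothetical, and the expected counts (five, three and two) are those of the example in the preceding paragraph, while the observed list is made up.

```python
from collections import Counter


def matches_design(expected_counts, observed_types):
    """Compare per-type counts assumed at design time with observed hit types."""
    return Counter(observed_types) == Counter(expected_counts)


expected_counts = {"0 | 0": 5, "1 | 0": 3, "1 | 1": 2}
# Hypothetical mismatch types obtained against a target-only database.
observed_types = ["0 | 0"] * 5 + ["1 | 0"] * 3 + ["1 | 1"] * 2
print(matches_design(expected_counts, observed_types))
```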
In addition, the further results of classification may be used to verify the coverage of the oligonucleotide of Formula (I). The user can analyze the results of classification and identify the target nucleic acid sequences to be amplified or detected using the designed oligonucleotide, so that the results of classification can be used to verify the coverage of the oligonucleotide of Formula (I).
Meanwhile, the method of the present invention may further provide information on the sequence similarity between the oligonucleotide and each of the reference nucleotide sequences.
The information on the similarity can be displayed in various ways. In one embodiment, the information on the similarity can be expressed as the number of matched nucleotides relative to the total number of nucleotides of the designed oligonucleotide, or a percent-identity score thereof.
In particular, the information on the similarity may be calculated by excluding the similarity between the portion Y of the oligonucleotide and a corresponding region of the reference nucleotide sequence. For example, assuming that the X portion is p nucleotides in length, the Y portion is q nucleotides in length, and the portion Z is r nucleotides in length, the similarity (%) may be calculated by [(total number of nucleotides matched in the X portion and the Z portion) / (p + r)] * 100.
Alternatively, the information on the similarity may be calculated by assuming that the portion Y of the oligonucleotide and the corresponding portion of the reference nucleotide sequence match each other. For example, assuming that the X portion is p nucleotides in length, the Y portion is q nucleotides in length, and the portion Z is r nucleotides in length, the similarity (%) may be calculated by [(total number of nucleotides matched in the X portion and the Z portion + q) / (p + q + r)] * 100.
As another alternative, the similarity between the X portion of the oligonucleotide and a corresponding portion of the reference nucleotide sequence, and the similarity between the Z portion of the oligonucleotide and a corresponding portion of the reference nucleotide sequence are provided separately.
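The two similarity formulas given above can be written out directly. In the sketch below the function names are hypothetical, the symbols p, q and r are the portion lengths defined above, and the sample values (p = 25, q = 5, r = 10, with 34 matched bases in X and Z together) are chosen only for illustration.

```python
def similarity_excluding_y(matched_in_x_and_z, p, r):
    """Similarity (%) computed over the X and Z portions only."""
    return matched_in_x_and_z / (p + r) * 100


def similarity_assuming_y_matches(matched_in_x_and_z, p, q, r):
    """Similarity (%) computed over the whole oligonucleotide, counting Y as matched."""
    return (matched_in_x_and_z + q) / (p + q + r) * 100


# X of 25 bases, Y of 5 bases, Z of 10 bases, one mismatch somewhere in X or Z.
print(round(similarity_excluding_y(34, 25, 10), 2))
print(round(similarity_assuming_y_matches(34, 25, 5, 10), 2))
```

Because the second formula adds q to both numerator and denominator, it never yields a lower value than the first for the same match count.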
Meanwhile, when at least one universal base or degenerate base is present in either or both of the portions X and Z in the oligonucleotide of Formula (I), the sequence similarity can be determined by treating the universal base or degenerate base in the same manner as the treatment of the universal base or degenerate base in the calculation of the number of mismatched bases.
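One way to treat universal and degenerate bases consistently in both the mismatch count and the similarity calculation is a single base-comparison routine. The sketch below is an assumption-laden illustration: the function name `is_mismatch` is hypothetical, degenerate bases are represented by the standard IUPAC nucleotide codes, and deoxyinosine (written "I") is taken as the only universal base. A universal base is never counted as a mismatch, and a degenerate base is not counted as a mismatch when any of the bases it represents equals the reference base.

```python
# Standard IUPAC nucleotide codes mapped to the bases they represent.
IUPAC_BASES = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}
UNIVERSAL_BASES = {"I"}  # assumption: deoxyinosine written as "I"


def is_mismatch(oligo_base, reference_base):
    """Return True when the oligonucleotide base counts as a mismatch."""
    if oligo_base in UNIVERSAL_BASES:
        return False  # universal bases are never counted as mismatched
    return reference_base not in IUPAC_BASES[oligo_base]


print(is_mismatch("R", "G"))  # R represents A or G
print(is_mismatch("R", "C"))
print(is_mismatch("I", "T"))
```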
As described above, the method of the present invention provides information on the specificity of an oligonucleotide in a variety of ways, allowing a user to analyze the homology of the oligonucleotide with target and non-target nucleic acid sequences in an easier, faster and more intuitive manner.
Since the method of the present invention is characterized by providing information on the specificity of oligonucleotides, it may also be referred to as a method of providing information on the specificity of oligonucleotides.
Those skilled in the art can evaluate specificity of the designed oligonucleotide using the information provided by the method of the present invention. Thus, the method of the present invention may further comprise the step of evaluating specificity of the oligonucleotide of Formula (I) using the information provided in the step (c).
Evaluating specificity of the oligonucleotide of Formula (I) using the information provided in the step (c) may be accomplished by determining inclusivity and exclusivity of the oligonucleotide of Formula (I).
The method of the present invention may be used to evaluate workability of the oligonucleotide, particularly the oligonucleotide represented by the Formula (I), as a primer or a probe.
The match/mismatch results in the portions X and Z provided in step (c) make it possible to ascertain whether the oligonucleotide is hybridized to a particular target nucleic acid sequence. Thus, the method of the present invention can be used to determine whether the oligonucleotide will act as a primer or probe for a particular target nucleic acid sequence.
The methods as described above may be embodied on a computer by software including instructions for implementing a process for executing the methods.
II. Storage medium, Computer program and Device
Since the storage medium, the device and the computer program of the present invention described hereinbelow are intended to perform the present methods in a computer, the common descriptions between them are omitted in order to avoid undue redundancy leading to the complexity of this specification.
In another aspect of this invention, there is provided a computer readable storage medium containing instructions to configure a processor to perform a method for evaluating specificity of an oligonucleotide, the method comprising:
(a) providing an oligonucleotide represented by the following Formula (I):
5'-X-Y-Z-3' (I)
wherein X represents a portion comprising a hybridizing nucleotide sequence hybridized to a target nucleic acid sequence, Y represents a separation portion comprising two or more consecutive bases, each of which is not involved in Watson-Crick base pairs, and Z represents a portion comprising a hybridizing nucleotide sequence hybridized to a target nucleic acid sequence;
(b) comparing the complete or a partial sequence of the oligonucleotide of Formula (I) against at least one database of nucleotide sequences, and extracting reference nucleotide sequences comprising a region homologous to the complete or a partial sequence of the oligonucleotide of Formula (I) from the database;
(c) analyzing portion-by-portion match/mismatch between the oligonucleotide of Formula (I) and each of the reference nucleotide sequences to provide (i) the number or ratio of matched or mismatched bases between the portion X of the oligonucleotide of Formula (I) and each of the reference nucleotide sequences and individually (ii) the number or ratio of matched or mismatched bases between the portion Z of the oligonucleotide of Formula (I) and each of the reference nucleotide sequences.
In still another aspect of this invention, there is provided a computer program to be stored on a computer readable storage medium to configure a processor to perform a method for evaluating specificity of an oligonucleotide, the method comprising:
(a) providing an oligonucleotide represented by the following Formula (I):
5'-X-Y-Z-3' (I)
wherein X represents a portion comprising a hybridizing nucleotide sequence hybridized to a target nucleic acid sequence, Y represents a separation portion comprising two or more consecutive bases, each of which is not involved in Watson-Crick base pairs, and Z represents a portion comprising a hybridizing nucleotide sequence hybridized to a target nucleic acid sequence;
(b) comparing the complete or a partial sequence of the oligonucleotide of Formula (I) against at least one database of nucleotide sequences, and extracting reference nucleotide sequences comprising a region homologous to the complete or a partial sequence of the oligonucleotide of Formula (I) from the database;
(c) analyzing portion-by-portion match/mismatch between the oligonucleotide of Formula (I) and each of the reference nucleotide sequences to provide (i) the number or ratio of matched or mismatched bases between the portion X of the oligonucleotide of Formula (I) and each of the reference nucleotide sequences and individually (ii) the number or ratio of matched or mismatched bases between the portion Z of the oligonucleotide of Formula (I) and each of the reference nucleotide sequences.
The program instructions are operative, when performed by the processor, to cause the processor to perform the present method described above. The program instructions for performing the present method may comprise (i) an instruction to compare the complete or a partial sequence of the oligonucleotide of Formula (I) against at least one database of nucleotide sequences; (ii) an instruction to extract reference nucleotide sequences having a region homologous to the complete or a partial sequence of the oligonucleotide of Formula (I) from the database; and (iii) an instruction to analyze portion-by-portion match/mismatch between the oligonucleotide of Formula (I) and each of the reference nucleotide sequences to provide (i) the number or ratio of matched or mismatched bases between the portion X of the oligonucleotide of Formula (I) and each of the reference nucleotide sequences and individually (ii) the number or ratio of matched or mismatched bases between the portion Z of the oligonucleotide of Formula (I) and each of the reference nucleotide sequences.
The present method described above is implemented in a processor, such as a processor in a stand-alone computer, a network attached computer or a data acquisition device such as a real-time PCR machine.
The types of the computer readable storage medium include various storage media such as CD-R, CD-ROM, DVD, flash memory, floppy disk, hard drive, portable HDD, USB, magnetic tape, MINIDISC, nonvolatile memory card, EEPROM, optical disk, optical storage medium, RAM, ROM, system memory and web server.
The oligonucleotide of Formula (I) for amplifying or detecting a target nucleic acid sequence may be provided in various ways. For example, the sequence of the oligonucleotide of Formula (I) may be provided to a separate system such as a desktop computer system via a network connection (e.g., LAN, VPN, intranet and Internet) or direct connection (e.g., USB or other direct wired or wireless connection), or provided on a portable medium such as a CD, DVD, floppy disk, portable HDD or the like.
The instructions to configure the processor to perform the present invention may be included in a logic system. The instructions may be downloaded and stored in a memory module (e.g., hard drive or other memory such as a local or attached RAM or ROM), although the instructions can be provided on any software storage medium such as a portable HDD, USB, floppy disk, CD and DVD. A computer code for implementing the present invention may be implemented in a variety of coding languages such as C, C++, Java, Visual Basic, VBScript, JavaScript, Perl and XML. In addition, a variety of languages and protocols may be used in external and internal storage and transmission of data and commands according to the present invention.
In a further aspect of this invention, there is provided a device for evaluating specificity of an oligonucleotide, comprising (a) a computer processor and (b) the computer readable storage medium described above coupled to the computer processor.
The processor may be configured in such a manner that a single processor performs several functions. Alternatively, the processor unit may be configured in such a manner that several processors each perform one of the several functions.
The present invention will now be described in further detail by examples. It would be obvious to those skilled in the art that these examples are intended to be more concretely illustrative and the scope of the present invention as set forth in the appended claims is not limited to or by the examples.
EXAMPLES
EXAMPLE 1: Evaluation of specificity of an oligonucleotide according to one embodiment of the present invention
<1-1> Design of dual priming oligonucleotide ( DPO )
By referring to the disclosure of WO 2006/095981, a DPO primer (SEQ ID NO: 1) was designed to amplify a 16S ribosomal RNA of Bacteroides fragilis (Genbank Accession No: HM352993.1) as a target nucleic acid sequence. The nucleotide sequence of the designed DPO primer is shown below:
5'-GACTCTAGAGAGACTGCCGTCGTAAIIIIIGAGGAAGGTG-3' (SEQ ID NO: 1)
As shown above, the DPO primer has three distinct portions: (i) a portion "X" at its 5' end: GACTCTAGAGAGACTGCCGTCGTAA; (ii) a separation portion "Y" consisting of five (5) deoxyinosine (I) (as highlighted in bold) as a universal base: IIIII; (iii) a portion "Z" at its 3' end: GAGGAAGGTG.
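The division of SEQ ID NO: 1 into its three portions can be reproduced with a short routine. This is only a sketch: the function name `split_dpo` is hypothetical, and it assumes that the separation portion Y is a single run of two or more deoxyinosines written as "I", as in this example; separation portions built from other non-pairing bases would need a different rule.

```python
import re


def split_dpo(primer):
    """Split a 5'-X-Y-Z-3' primer into (X, Y, Z), taking Y as a run of 'I'."""
    match = re.fullmatch(r"([ACGT]+)(I{2,})([ACGT]+)", primer)
    if match is None:
        raise ValueError("primer does not have the form X-(I...I)-Z")
    return match.groups()


x, y, z = split_dpo("GACTCTAGAGAGACTGCCGTCGTAAIIIIIGAGGAAGGTG")
print(x, len(x))
print(y, len(y))
print(z, len(z))
```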
<1-2> Evaluation of specificity using BLAST
In order to evaluate specificity of the DPO primer, the portion X in the DPO primer (i.e., 5'-GACTCTAGAGAGACTGCCGTCGTAA-3') was compared against the GenBank database using BLAST for homology analysis. The parameters used in the BLAST algorithm are as follows:
- query: primer sequence filename in FASTA format
- db: nucleotide database filename
- out: filename to be saved
- evalue: 1000
- word_size: 4
- perc_identity: 60
- num_alignments: 1000000
- num_descriptions: 1000000
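The parameters listed above correspond to command-line options of the same names in NCBI BLAST+. The sketch below only assembles an argument list and does not run a search. It rests on several assumptions that the text does not state: that the nucleotide search program is `blastn`, and that the three file names are placeholders chosen here for illustration. The function name `build_blast_command` is likewise hypothetical.

```python
def build_blast_command(query_fasta, database, output_file):
    """Assemble a blastn argument list using the parameter values listed above."""
    return [
        "blastn",                        # assumed program; the text says only "BLAST"
        "-query", query_fasta,           # primer sequence file in FASTA format
        "-db", database,                 # nucleotide database name
        "-out", output_file,             # file in which the result is saved
        "-evalue", "1000",
        "-word_size", "4",
        "-perc_identity", "60",
        "-num_alignments", "1000000",
        "-num_descriptions", "1000000",
    ]


# Placeholder file names, not taken from the specification.
command = build_blast_command("dpo_x_portion.fasta", "nt_db", "blast_result.txt")
print(" ".join(command))
```

The permissive E-value, short word size and low identity cutoff fit the purpose of the step, which is to extract as many candidate regions as possible for the later portion-by-portion analysis.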
As a result, a total of 2387 reference nucleotide sequences, each of which contains a region homologous to the portion X in the DPO primer, were extracted. Each of the extracted reference nucleotide sequences contained a homologous region and optionally flanking regions which can be used for comparison with the Z portion.
The extracted reference nucleotide sequences, each containing a homologous region and its flanking regions, were each compared with the complete sequence of the DPO primer, to obtain the number of mismatched bases between the portion X of the DPO primer and each of the reference nucleotide sequences as well as the number of mismatched bases between the portion Z of the DPO primer and each of the reference nucleotide sequences (see Fig. 2).
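The portion-by-portion comparison can be sketched as a position-wise count over X and Z that skips Y. The sketch makes simplifying assumptions not stated in the text: the reference region has already been oriented to the same strand as the primer, the alignment is ungapped and of equal length, and no degenerate bases are present in X or Z. The function name `count_portion_mismatches` and the reference region used below are hypothetical.

```python
def count_portion_mismatches(primer, region, x_length, y_length):
    """Return (Xm, Zm): mismatches in the X and Z portions, ignoring Y."""
    if len(primer) != len(region):
        raise ValueError("primer and reference region must have equal length")
    z_start = x_length + y_length
    x_mismatches = sum(a != b for a, b in zip(primer[:x_length], region[:x_length]))
    z_mismatches = sum(a != b for a, b in zip(primer[z_start:], region[z_start:]))
    return x_mismatches, z_mismatches


primer = "GACTCTAGAGAGACTGCCGTCGTAAIIIIIGAGGAAGGTG"
# Made-up reference region: last base of X changed (A to T), arbitrary bases under Y.
region = "GACTCTAGAGAGACTGCCGTCGTAT" + "ACGTA" + "GAGGAAGGTG"
print(count_portion_mismatches(primer, region, x_length=25, y_length=5))
```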
The result of the comparison between the DPO primer and one of the reference nucleotide sequences is shown in Fig. 3.
It can be seen from Fig. 3 that the DPO primer has one mismatched base in the portion X and zero (0) mismatched bases in the portion Z with regard to the exemplary reference nucleotide sequence. In addition, the DPO primer was found to match the (-) strand of the reference nucleotide sequence.
The information was displayed as the notation "D Xm | Zm". In the above notation, "D" refers to the direction of the match of the oligonucleotide of interest relative to the reference nucleotide sequence. Specifically, "+" means that the oligonucleotide of interest matches the (+) strand of the reference nucleotide sequence, and "-" means that the oligonucleotide of interest matches the (-) strand of the reference nucleotide sequence. Also, "Xm" indicates the number of mismatched bases in the portion X, and "Zm" indicates the number of mismatched bases in the portion Z. The result was presented as "- 1 | 0" in Fig. 3.
The reference nucleotide sequences were then sorted according to the number of mismatched bases in the portion X and the number of mismatched bases in the portion Z. The results are shown in Table 1 below.
Table 1. Mismatch types and number of reference nucleotide sequences included in each mismatch type
Type   0|0  1|0  0|1  1|1  2|0  2|1  0|2  1|2  2|2  3|0  3|1  3|2  0|3  1|3  2|3  3|3
Count  230  422  -    10   -    -    -    -    -    -    -    -    -    -    -    -
Type   4|0  4|1  4|2  4|3  0|4  1|4  2|4  3|4  4|4  5|0  5|1  5|2  5|3  5|4  0|5  1|5
Count  -    -    -    -    -    -    -    -    -    -    -    -    -    -    -    -
Type   2|5  3|5  4|5  5|5  6|0  6|1  6|2  6|3  6|4  6|5  7|0  7|1  7|2  7|3  7|4  etc.
Count  -    -    -    -    -    -    -    -    -    -    -    1    -    -    -    1724
Hit (match/mismatch) number: 2387
Among the above mismatch types, the reference nucleotide sequences included in the mismatch types "0 | 0" (230 reference nucleotide sequences), "1 | 0" (422 reference nucleotide sequences) and "1 | 1" (10 reference nucleotide sequences), which are likely to hybridize with the DPO primer of the present invention, were examined for their sources. As a result, it was revealed that all the reference nucleotide sequences included in such mismatch types were derived from Bacteroides fragilis. This indicates that the designed DPO primer has specificity to the nucleic acid sequence of Bacteroides fragilis.
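The source check described above can be sketched as a filter over the hits that fall in the mismatch types likely to hybridize. Everything named in the sketch is hypothetical: the function `find_non_target_hits`, the dictionary keys, and the three sample records, which are not data from this example. An empty result means that no non-target sequence was found in those mismatch types.

```python
def find_non_target_hits(hits, target_organism, likely_types):
    """Return hits in the given mismatch types whose source is not the target."""
    return [hit for hit in hits
            if hit["mismatch_type"] in likely_types
            and hit["source"] != target_organism]


# Hypothetical records for illustration only.
hits = [
    {"id": "ref_1", "mismatch_type": "0 | 0", "source": "Bacteroides fragilis"},
    {"id": "ref_2", "mismatch_type": "1 | 0", "source": "Bacteroides fragilis"},
    {"id": "ref_3", "mismatch_type": "7 | 1", "source": "another organism"},
]
likely_types = {"0 | 0", "1 | 0", "1 | 1"}
off_target = find_non_target_hits(hits, "Bacteroides fragilis", likely_types)
print(off_target)
```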
The results also provide information on the coverage of target nucleic acid sequences to be amplified using the designed DPO primer, depending on hybridization conditions. Specifically, from the results above, one of skill in the art will recognize that target nucleic acid sequences included in the mismatch type "0 | 0", "1 | 0" and "1 | 1" can be amplified by adjustment of the hybridization conditions.
Further, the results provide information about whether the DPO primer has an annealing specificity for each of the extracted reference nucleotide sequences.
As such, the designed oligonucleotide can be evaluated for its specificity in a more straightforward and intuitive manner.
Having described a preferred embodiment of the present invention, it is to be understood that variants and modifications thereof falling within the spirit of the invention may become apparent to those skilled in this art, and the scope of this invention is to be determined by appended claims and their equivalents.

Claims (17)

  1. A method for evaluating specificity of an oligonucleotide, comprising the steps of:
    (a) providing an oligonucleotide represented by the following Formula (I):
    5'-X-Y-Z-3' (I)
    wherein X represents a portion comprising a hybridizing nucleotide sequence hybridized to a target nucleic acid sequence, Y represents a separation portion comprising two or more consecutive bases, each of which is not involved in Watson-Crick base pairs, and Z represents a portion comprising a hybridizing nucleotide sequence hybridized to a target nucleic acid sequence;
    (b) comparing the complete or a partial sequence of the oligonucleotide of Formula (I) against at least one database of nucleotide sequences, and extracting reference nucleotide sequences comprising a region homologous to the complete or a partial sequence of the oligonucleotide of Formula (I) from the database;
    (c) analyzing portion-by-portion match/mismatch between the oligonucleotide of Formula (I) and each of the reference nucleotide sequences to provide (i) the number or ratio of matched or mismatched bases between the portion X of the oligonucleotide of Formula (I) and each of the reference nucleotide sequences and individually (ii) the number or ratio of matched or mismatched bases between the portion Z of the oligonucleotide of Formula (I) and each of the reference nucleotide sequences.
  2. The method of claim 1, wherein when the complete sequence of the oligonucleotide of Formula (I) is compared to at least one database of nucleotide sequences and reference nucleotide sequences comprising a region homologous to a partial sequence of the oligonucleotide of Formula (I) are extracted in step (b), the portion-by-portion match/mismatch between the complete sequence of the oligonucleotide of Formula (I) and the homologous region and its flanking regions of each of the reference nucleotide sequences is analyzed, and the numbers or ratios of matched or mismatched bases in the portions X and Z are provided.
  3. The method of claim 1, wherein when a partial sequence of the oligonucleotide of Formula (I) is compared to at least one database of nucleotide sequences and the reference nucleotide sequences comprising a region homologous to the partial sequence of the oligonucleotide of Formula (I) are extracted in step (b), the portion-by-portion match/mismatch between the complete sequence of the oligonucleotide of Formula (I) and the homologous region and its flanking regions of each of the reference nucleotide sequences is analyzed, and the numbers or ratios of matched or mismatched bases in the portions X and Z are provided.
  4. The method of claim 3, wherein the partial sequence of the oligonucleotide of Formula (I) used in the comparison of step (b) is the portion X, the portion Z, or a part thereof.
  5. The method of claim 3, wherein the partial sequence of the oligonucleotide of Formula (I) used in the comparison of step (b) is the portion X, the portion Z, or a part thereof.
  6. The method of claim 5, wherein the sequence alignment algorithm or program is selected from the group consisting of Smith & Waterman, Needleman-Wunsch, BLAST, and FASTA.
  7. The method of claim 1, wherein either or both of the X and Z portions comprise at least one universal base or degenerate base.
  8. The method of claim 7, wherein the universal base is not counted as mismatched bases in step (c).
  9. The method of claim 7, wherein the degenerate base is not counted as mismatched bases in step (c), with a proviso that any one of bases represented by the degenerate base matches a corresponding base in the reference nucleotide sequence.
  10. The method of claim 1, which further provides biological features of each of the reference nucleotide sequences in step (c).
  11. The method of claim 1, which further provides results of classification of the reference nucleotide sequences according to the number of mismatched bases in the portion X and the number of mismatched bases in the portion Z.
  12. The method of claim 1, wherein the oligonucleotide of formula (I) is a primer or a probe.
  13. The method of claim 1, wherein the oligonucleotide of formula (I) is a primer or a probe.
  14. The method of claim 1, wherein the bases comprised in the separation portion Y are selected from non-natural bases; universal bases; mismatched bases and combinations thereof.
  15. A computer readable storage medium containing instructions to configure a processor to perform a method for evaluating specificity of an oligonucleotide, the method comprising:
    (a) providing an oligonucleotide represented by the following Formula (I):
    5'-X-Y-Z-3' (I)
    wherein X represents a portion comprising a hybridizing nucleotide sequence hybridized to a target nucleic acid sequence, Y represents a separation portion comprising two or more consecutive bases, each of which is not involved in Watson-Crick base pairs, and Z represents a portion comprising a hybridizing nucleotide sequence hybridized to a target nucleic acid sequence;
    (b) comparing the complete or a partial sequence of the oligonucleotide of Formula (I) against at least one database of nucleotide sequences, and extracting reference nucleotide sequences comprising a region homologous to the complete or a partial sequence of the oligonucleotide of Formula (I) from the database;
    (c) analyzing portion-by-portion match/mismatch between the oligonucleotide of Formula (I) and each of the reference nucleotide sequences to provide (i) the number or ratio of matched or mismatched bases between the portion X of the oligonucleotide of Formula (I) and each of the reference nucleotide sequences and individually (ii) the number or ratio of matched or mismatched bases between the portion Z of the oligonucleotide of Formula (I) and each of the reference nucleotide sequences.
  16. A device for evaluating specificity of an oligonucleotide, comprising (a) a computer processor and (b) the computer readable storage medium of claim 15 coupled to the computer processor.
  17. A computer program to be stored on a computer readable storage medium to configure a processor to perform a method for evaluating specificity of an oligonucleotide, the method comprising:
    (a) providing an oligonucleotide represented by the following Formula (I):
    5'-X-Y-Z-3' (I)
    wherein X represents a portion comprising a hybridizing nucleotide sequence hybridized to a target nucleic acid sequence, Y represents a separation portion comprising two or more consecutive bases, each of which is not involved in Watson-Crick base pairs, and Z represents a portion comprising a hybridizing nucleotide sequence hybridized to a target nucleic acid sequence;
    (b) comparing the complete or a partial sequence of the oligonucleotide of Formula (I) against at least one database of nucleotide sequences, and extracting reference nucleotide sequences comprising a region homologous to the complete or a partial sequence of the oligonucleotide of Formula (I) from the database;
    (c) analyzing portion-by-portion match/mismatch between the oligonucleotide of Formula (I) and each of the reference nucleotide sequences to provide (i) the number or ratio of matched or mismatched bases between the portion X of the oligonucleotide of Formula (I) and each of the reference nucleotide sequences and individually (ii) the number or ratio of matched or mismatched bases between the portion Z of the oligonucleotide of Formula (I) and each of the reference nucleotide sequences.
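
The portion-by-portion analysis of step (c), together with the universal and degenerate base handling of claims 8 and 9 and the classification of claim 11, can be illustrated with a minimal Python sketch. It is not the applicant's implementation. It assumes that each reference region has already been extracted and aligned without gaps, is given in the same orientation as the oligonucleotide, and has the same length as X-Y-Z. The function names, the use of "I" to mark a universal base, and the example sequences are hypothetical. The separation portion Y is skipped because its bases are not involved in Watson-Crick base pairing.

```python
from collections import defaultdict

# IUPAC nucleotide codes mapped to the bases each one represents.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}
UNIVERSAL = "I"  # assumption: "I" marks a universal base in the oligonucleotide


def base_matches(oligo_base, ref_base):
    """Return True if the oligonucleotide base is not a mismatch (claims 8 and 9)."""
    if oligo_base == UNIVERSAL:
        return True  # a universal base is never counted as mismatched
    # a degenerate base matches if any base it represents equals the reference base
    return ref_base in IUPAC.get(oligo_base, "")


def count_mismatches(portion, ref_part):
    """Count mismatched positions between one portion and its reference segment."""
    if len(portion) != len(ref_part):
        raise ValueError("portion and reference segment must have equal length")
    return sum(not base_matches(o, r) for o, r in zip(portion, ref_part))


def analyze(x, y, z, ref_region):
    """Report mismatch counts and ratios for X and Z individually; Y is ignored."""
    if len(ref_region) != len(x) + len(y) + len(z):
        raise ValueError("reference region must span the whole X-Y-Z oligonucleotide")
    ref_x = ref_region[:len(x)]
    ref_z = ref_region[len(x) + len(y):]
    mx = count_mismatches(x, ref_x)
    mz = count_mismatches(z, ref_z)
    return {"X": mx, "Z": mz, "X_ratio": mx / len(x), "Z_ratio": mz / len(z)}


def classify(x, y, z, references):
    """Group reference names by (mismatches in X, mismatches in Z), as in claim 11."""
    groups = defaultdict(list)
    for name, region in references.items():
        result = analyze(x, y, z, region)
        groups[(result["X"], result["Z"])].append(name)
    return dict(groups)


# Hypothetical example: X ends in the degenerate base R (A or G), Y is three universal bases.
X, Y, Z = "ACGTR", "III", "GGC"
refs = {
    "ref1": "ACGTATTTGGC",
    "ref2": "ACGTCAAAGGA",
    "ref3": "TCGTGCCCGGC",
}
table = classify(X, Y, Z, refs)
```

In this sketch the reference sequences are keyed by the pair of mismatch counts, so sequences that match X perfectly but differ in Z fall into a different group from those that differ in X only. A real pipeline would first obtain the homologous regions and their flanking regions from a database search such as BLAST, which is outside the scope of this sketch.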
PCT/KR2017/005818 2016-06-03 2017-06-02 Evaluation of specificity of oligonucleotides WO2017209575A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
KR1020197000224A KR102189358B1 (en) 2016-06-03 2017-06-02 Evaluation of the specificity of oligonucleotides

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
KR20160069487 2016-06-03
KR10-2016-0069487 2016-06-03

Publications (1)

Publication Number Publication Date
WO2017209575A1 (en) 2017-12-07

Family

ID=60477701

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/KR2017/005818 WO2017209575A1 (en) 2016-06-03 2017-06-02 Evaluation of specificity of oligonucleotides

Country Status (2)

Country Link
KR (1) KR102189358B1 (en)
WO (1) WO2017209575A1 (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2021261924A1 (en) * 2020-06-24 2021-12-30 Seegene, Inc. Computer-implemented method for providing coverage of oligonucleotide set for plurality of nucleic acid sequences
WO2022232395A3 (en) * 2021-04-28 2022-12-08 Q-State Biosciences, Inc. Therapeutic compositions for treating pain via multiple targets

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2024136630A1 (en) * 2022-12-23 2024-06-27 Seegene, Inc. Method for sequence homology search of nucleotide database

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20030097223A1 (en) * 1999-12-14 2003-05-22 Hitachi, Ltd. Primer design system
US6898531B2 (en) * 2001-09-05 2005-05-24 Perlegen Sciences, Inc. Algorithms for selection of primer pairs
US20050164184A1 (en) * 2001-12-08 2005-07-28 Jong-Yoon Chun Hybridization portion control oligonucleotide and its uses

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR100977186B1 (en) * 2005-03-05 2010-08-23 Seegene, Inc. Processes Using Dual Specificity Oligonucleotide and Dual Specificity Oligonucleotide

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
LOAKES ET AL.: "3-Nitropyrrole and 5-nitroindole as universal bases in primers for DNA sequencing and PCR", NUCLEIC ACIDS RESEARCH, vol. 23, no. 13, 11 July 1995 (1995-07-11), pages 2361 - 2366, XP002109690 *
YE ET AL.: "Primer-BLAST: a tool to design target-specific primers for polymerase chain reaction", BMC BIOINFORMATICS, vol. 13, no. 134, 18 June 2012 (2012-06-18), pages 1 - 11, XP021132324 *

Also Published As

Publication number Publication date
KR102189358B1 (en) 2020-12-09
KR20190003868A (en) 2019-01-09

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 17807068

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

ENP Entry into the national phase

Ref document number: 20197000224

Country of ref document: KR

Kind code of ref document: A

122 Ep: pct application non-entry in european phase

Ref document number: 17807068

Country of ref document: EP

Kind code of ref document: A1