CA2593916A1 - Probes, libraries and kits for analysis of mixtures of nucleic acids and methods for constructing the same
- Publication number
- CA2593916A1
- Authority
- CA
- Canada
- Prior art keywords
- library
- probe
- probes
- sequences
- target
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Abandoned
- QONQRTHLHBTMGP-UHFFFAOYSA-N digitoxigenin Natural products CC12CCC(C3(CCC(O)CC3CC3)C)C3C11OC1CC2C1=CC(=O)OC1 QONQRTHLHBTMGP-UHFFFAOYSA-N 0.000 description 2
- SHIBSTMRCDJXLN-KCZCNTNESA-N digoxigenin Chemical compound C1([C@@H]2[C@@]3([C@@](CC2)(O)[C@H]2[C@@H]([C@@]4(C)CC[C@H](O)C[C@H]4CC2)C[C@H]3O)C)=CC(=O)OC1 SHIBSTMRCDJXLN-KCZCNTNESA-N 0.000 description 2
- OGPBJKLSAFTDLK-UHFFFAOYSA-N europium atom Chemical compound [Eu] OGPBJKLSAFTDLK-UHFFFAOYSA-N 0.000 description 2
- 238000010195 expression analysis Methods 0.000 description 2
- 239000013604 expression vector Substances 0.000 description 2
- 239000012530 fluid Substances 0.000 description 2
- 239000006260 foam Substances 0.000 description 2
- 125000002485 formyl group Chemical group [H]C(*)=O 0.000 description 2
- 230000006870 function Effects 0.000 description 2
- 238000007429 general method Methods 0.000 description 2
- 229930182470 glycoside Natural products 0.000 description 2
- 150000002338 glycosides Chemical class 0.000 description 2
- 238000003306 harvesting Methods 0.000 description 2
- 125000005223 heteroarylcarbonyl group Chemical group 0.000 description 2
- 125000005553 heteroaryloxy group Chemical group 0.000 description 2
- 125000005842 heteroatom Chemical group 0.000 description 2
- DRAVOWXCEBXPTN-UHFFFAOYSA-N isoguanine Chemical compound NC1=NC(=O)NC2=C1NC=N2 DRAVOWXCEBXPTN-UHFFFAOYSA-N 0.000 description 2
- 238000002955 isolation Methods 0.000 description 2
- 125000001921 locked nucleotide group Chemical group 0.000 description 2
- 125000001434 methanylylidene group Chemical group [H]C#[*] 0.000 description 2
- 125000000956 methoxy group Chemical group [H]C([H])([H])O* 0.000 description 2
- 125000002496 methyl group Chemical group [H]C([H])([H])* 0.000 description 2
- 239000002808 molecular sieve Substances 0.000 description 2
- 125000000371 nucleobase group Chemical group 0.000 description 2
- 238000002515 oligonucleotide synthesis Methods 0.000 description 2
- 210000000056 organ Anatomy 0.000 description 2
- 239000012074 organic phase Substances 0.000 description 2
- 239000012071 phase Substances 0.000 description 2
- 125000002467 phosphate group Chemical group [H]OP(=O)(O[H])O[*] 0.000 description 2
- 210000002381 plasma Anatomy 0.000 description 2
- 239000013612 plasmid Substances 0.000 description 2
- 229920000136 polysorbate Polymers 0.000 description 2
- 239000002244 precipitate Substances 0.000 description 2
- 125000006239 protecting group Chemical group 0.000 description 2
- 125000001725 pyrenyl group Chemical group 0.000 description 2
- UMJSCPRVCHMLSP-UHFFFAOYSA-N pyridine Natural products COC1=CC=CN=C1 UMJSCPRVCHMLSP-UHFFFAOYSA-N 0.000 description 2
- 238000010992 reflux Methods 0.000 description 2
- 230000003252 repetitive effect Effects 0.000 description 2
- PYWVYCXTNDRMGF-UHFFFAOYSA-N rhodamine B Chemical compound [Cl-].C=12C=CC(=[N+](CC)CC)C=C2OC2=CC(N(CC)CC)=CC=C2C=1C1=CC=CC=C1C(O)=O PYWVYCXTNDRMGF-UHFFFAOYSA-N 0.000 description 2
- 239000003161 ribonuclease inhibitor Substances 0.000 description 2
- 229910052707 ruthenium Inorganic materials 0.000 description 2
- 229920006395 saturated elastomer Polymers 0.000 description 2
- 230000035945 sensitivity Effects 0.000 description 2
- 210000002966 serum Anatomy 0.000 description 2
- 239000000741 silica gel Substances 0.000 description 2
- 229910002027 silica gel Inorganic materials 0.000 description 2
- URGAHOPLAPQHLN-UHFFFAOYSA-N sodium aluminosilicate Chemical compound [Na+].[Al+3].[O-][Si]([O-])=O.[O-][Si]([O-])=O URGAHOPLAPQHLN-UHFFFAOYSA-N 0.000 description 2
- HPALAKNZSZLMCH-UHFFFAOYSA-M sodium;chloride;hydrate Chemical compound O.[Na+].[Cl-] HPALAKNZSZLMCH-UHFFFAOYSA-M 0.000 description 2
- 125000000547 substituted alkyl group Chemical group 0.000 description 2
- MPLHNVLQVRSVEE-UHFFFAOYSA-N texas red Chemical compound [O-]S(=O)(=O)C1=CC(S(Cl)(=O)=O)=CC=C1C(C1=CC=2CCCN3CCCC(C=23)=C1O1)=C2C1=C(CCC1)C3=[N+]1CCCC3=C2 MPLHNVLQVRSVEE-UHFFFAOYSA-N 0.000 description 2
- 125000002221 trityl group Chemical group [H]C1=C([H])C([H])=C([H])C([H])=C1C([*])(C1=C(C(=C(C(=C1[H])[H])[H])[H])[H])C1=C([H])C([H])=C([H])C([H])=C1[H] 0.000 description 2
- 210000002700 urine Anatomy 0.000 description 2
- 238000010200 validation analysis Methods 0.000 description 2
- DFUSDJMZWQVQSF-XLGIIRLISA-N (2r)-2-methyl-2-[(4r,8r)-4,8,12-trimethyltridecyl]-3,4-dihydrochromen-6-ol Chemical class OC1=CC=C2O[C@@](CCC[C@H](C)CCC[C@H](C)CCCC(C)C)(C)CCC2=C1 DFUSDJMZWQVQSF-XLGIIRLISA-N 0.000 description 1
- QFLWZFQWSBQYPS-AWRAUJHKSA-N (3S)-3-[[(2S)-2-[[(2S)-2-[5-[(3aS,6aR)-2-oxo-1,3,3a,4,6,6a-hexahydrothieno[3,4-d]imidazol-4-yl]pentanoylamino]-3-methylbutanoyl]amino]-3-(4-hydroxyphenyl)propanoyl]amino]-4-[1-bis(4-chlorophenoxy)phosphorylbutylamino]-4-oxobutanoic acid Chemical group CCCC(NC(=O)[C@H](CC(O)=O)NC(=O)[C@H](Cc1ccc(O)cc1)NC(=O)[C@@H](NC(=O)CCCCC1SC[C@@H]2NC(=O)N[C@H]12)C(C)C)P(=O)(Oc1ccc(Cl)cc1)Oc1ccc(Cl)cc1 QFLWZFQWSBQYPS-AWRAUJHKSA-N 0.000 description 1
- 125000006700 (C1-C6) alkylthio group Chemical group 0.000 description 1
- MXQACMJEKYMFOM-UHFFFAOYSA-N 1,7-dihydropyrrolo[2,3-d]pyrimidin-2-one Chemical compound N1C(=O)N=C2NC=CC2=C1 MXQACMJEKYMFOM-UHFFFAOYSA-N 0.000 description 1
- HASUWNAFLUMMFI-UHFFFAOYSA-N 1,7-dihydropyrrolo[2,3-d]pyrimidine-2,4-dione Chemical compound O=C1NC(=O)NC2=C1C=CN2 HASUWNAFLUMMFI-UHFFFAOYSA-N 0.000 description 1
- IFXXWBRWYVGOJH-UHFFFAOYSA-N 1-[4-(2-hydroxyethyl)anilino]anthracene-9,10-dione Chemical compound C1=CC(CCO)=CC=C1NC1=CC=CC2=C1C(=O)C1=CC=CC=C1C2=O IFXXWBRWYVGOJH-UHFFFAOYSA-N 0.000 description 1
- FDFVVBKRHGRRFY-UHFFFAOYSA-N 1-hydroxy-2,2,5,5-tetramethylpyrrolidine Chemical compound CC1(C)CCC(C)(C)N1O FDFVVBKRHGRRFY-UHFFFAOYSA-N 0.000 description 1
- HWPZZUQOWRWFDB-UHFFFAOYSA-N 1-methylcytosine Chemical compound CN1C=CC(N)=NC1=O HWPZZUQOWRWFDB-UHFFFAOYSA-N 0.000 description 1
- VUZNLSBZRVZGIK-UHFFFAOYSA-N 2,2,6,6-Tetramethyl-1-piperidinol Chemical compound CC1(C)CCCC(C)(C)N1O VUZNLSBZRVZGIK-UHFFFAOYSA-N 0.000 description 1
- QXHDYMUPPXAMPQ-UHFFFAOYSA-N 2-(4-aminophenyl)ethanol Chemical compound NC1=CC=C(CCO)C=C1 QXHDYMUPPXAMPQ-UHFFFAOYSA-N 0.000 description 1
- XQCZBXHVTFVIFE-UHFFFAOYSA-N 2-amino-4-hydroxypyrimidine Chemical compound NC1=NC=CC(O)=N1 XQCZBXHVTFVIFE-UHFFFAOYSA-N 0.000 description 1
- VKIGAWAEXPTIOL-UHFFFAOYSA-N 2-hydroxyhexanenitrile Chemical compound CCCCC(O)C#N VKIGAWAEXPTIOL-UHFFFAOYSA-N 0.000 description 1
- 125000000094 2-phenylethyl group Chemical group [H]C1=C([H])C([H])=C(C([H])=C1[H])C([H])([H])C([H])([H])* 0.000 description 1
- NKDFYOWSKOHCCO-YPVLXUMRSA-N 20-hydroxyecdysone Chemical compound C1[C@@H](O)[C@@H](O)C[C@]2(C)[C@@H](CC[C@@]3([C@@H]([C@@](C)(O)[C@H](O)CCC(C)(O)C)CC[C@]33O)C)C3=CC(=O)[C@@H]21 NKDFYOWSKOHCCO-YPVLXUMRSA-N 0.000 description 1
- LOJNBPNACKZWAI-UHFFFAOYSA-N 3-nitro-1h-pyrrole Chemical group [O-][N+](=O)C=1C=CNC=1 LOJNBPNACKZWAI-UHFFFAOYSA-N 0.000 description 1
- 238000004679 31P NMR spectroscopy Methods 0.000 description 1
- 125000002103 4,4'-dimethoxytriphenylmethyl group Chemical group [H]C1=C([H])C([H])=C(C([H])=C1[H])C(*)(C1=C([H])C([H])=C(OC([H])([H])[H])C([H])=C1[H])C1=C([H])C([H])=C(OC([H])([H])[H])C([H])=C1[H] 0.000 description 1
- WCKQPPQRFNHPRJ-UHFFFAOYSA-N 4-[[4-(dimethylamino)phenyl]diazenyl]benzoic acid Chemical compound C1=CC(N(C)C)=CC=C1N=NC1=CC=C(C(O)=O)C=C1 WCKQPPQRFNHPRJ-UHFFFAOYSA-N 0.000 description 1
- FWMNVWWHGCHHJJ-SKKKGAJSSA-N 4-amino-1-[(2r)-6-amino-2-[[(2r)-2-[[(2r)-2-[[(2r)-2-amino-3-phenylpropanoyl]amino]-3-phenylpropanoyl]amino]-4-methylpentanoyl]amino]hexanoyl]piperidine-4-carboxylic acid Chemical compound C([C@H](C(=O)N[C@H](CC(C)C)C(=O)N[C@H](CCCCN)C(=O)N1CCC(N)(CC1)C(O)=O)NC(=O)[C@H](N)CC=1C=CC=CC=1)C1=CC=CC=C1 FWMNVWWHGCHHJJ-SKKKGAJSSA-N 0.000 description 1
- WXNZTHHGJRFXKQ-UHFFFAOYSA-N 4-chlorophenol Chemical compound OC1=CC=C(Cl)C=C1 WXNZTHHGJRFXKQ-UHFFFAOYSA-N 0.000 description 1
- DGYMBCUDYNBMHT-UHFFFAOYSA-N 4-pyren-1-ylbutane-1,2,3-triol Chemical class C1=C2C(CC(O)C(O)CO)=CC=C(C=C3)C2=C2C3=CC=CC2=C1 DGYMBCUDYNBMHT-UHFFFAOYSA-N 0.000 description 1
- LQLQRFGHAALLLE-UHFFFAOYSA-N 5-bromouracil Chemical compound BrC1=CNC(=O)NC1=O LQLQRFGHAALLLE-UHFFFAOYSA-N 0.000 description 1
- ZOXBWJMCXHTKNU-UHFFFAOYSA-N 5-methyl-2-benzofuran-1,3-dione Chemical compound CC1=CC=C2C(=O)OC(=O)C2=C1 ZOXBWJMCXHTKNU-UHFFFAOYSA-N 0.000 description 1
- ZLAQATDNGLKIEV-UHFFFAOYSA-N 5-methyl-2-sulfanylidene-1h-pyrimidin-4-one Chemical compound CC1=CNC(=S)NC1=O ZLAQATDNGLKIEV-UHFFFAOYSA-N 0.000 description 1
- 108091027075 5S-rRNA precursor Proteins 0.000 description 1
- PLUDYDNNASPOEE-UHFFFAOYSA-N 6-(aziridin-1-yl)-1h-pyrimidin-2-one Chemical compound C1=CNC(=O)N=C1N1CC1 PLUDYDNNASPOEE-UHFFFAOYSA-N 0.000 description 1
- SXQMWXNOYLLRBY-UHFFFAOYSA-N 6-(methylamino)purin-8-one Chemical compound CNC1=NC=NC2=NC(=O)N=C12 SXQMWXNOYLLRBY-UHFFFAOYSA-N 0.000 description 1
- CJIJXIFQYOPWTF-UHFFFAOYSA-N 7-hydroxycoumarin Natural products O1C(=O)C=CC2=CC(O)=CC=C21 CJIJXIFQYOPWTF-UHFFFAOYSA-N 0.000 description 1
- MSSXOMSJDRHRMC-UHFFFAOYSA-N 9H-purine-2,6-diamine Chemical compound NC1=NC(N)=C2NC=NC2=N1 MSSXOMSJDRHRMC-UHFFFAOYSA-N 0.000 description 1
- VHUUQVKOLVNVRT-UHFFFAOYSA-N Ammonium hydroxide Chemical compound [NH4+].[OH-] VHUUQVKOLVNVRT-UHFFFAOYSA-N 0.000 description 1
- 125000006519 CCH3 Chemical group 0.000 description 1
- OKTJSMMVPCPJKN-UHFFFAOYSA-N Carbon Chemical compound [C] OKTJSMMVPCPJKN-UHFFFAOYSA-N 0.000 description 1
- 239000004215 Carbon black (E152) Substances 0.000 description 1
- 241000282693 Cercopithecidae Species 0.000 description 1
- 108091026890 Coding region Proteins 0.000 description 1
- 108020004705 Codon Proteins 0.000 description 1
- 108020004635 Complementary DNA Proteins 0.000 description 1
- HMFHBZSHGGEWLO-SOOFDHNKSA-N D-ribofuranose Chemical class OC[C@H]1OC(O)[C@H](O)[C@@H]1O HMFHBZSHGGEWLO-SOOFDHNKSA-N 0.000 description 1
- 238000007400 DNA extraction Methods 0.000 description 1
- 230000004543 DNA replication Effects 0.000 description 1
- 238000001712 DNA sequencing Methods 0.000 description 1
- 102100031780 Endonuclease Human genes 0.000 description 1
- 108010042407 Endonucleases Proteins 0.000 description 1
- 241001635598 Enicostema Species 0.000 description 1
- 108700024394 Exon Proteins 0.000 description 1
- GHASVSINZRGABV-UHFFFAOYSA-N Fluorouracil Chemical compound FC1=CNC(=O)NC1=O GHASVSINZRGABV-UHFFFAOYSA-N 0.000 description 1
- 208000034826 Genetic Predisposition to Disease Diseases 0.000 description 1
- 206010056740 Genital discharge Diseases 0.000 description 1
- 102000002812 Heat-Shock Proteins Human genes 0.000 description 1
- 108010004889 Heat-Shock Proteins Proteins 0.000 description 1
- 241000282412 Homo Species 0.000 description 1
- 101000610640 Homo sapiens U4/U6 small nuclear ribonucleoprotein Prp3 Proteins 0.000 description 1
- UGQMRVRMYYASKQ-UHFFFAOYSA-N Hypoxanthine nucleoside Natural products OC1C(O)C(CO)OC1N1C(NC=NC2=O)=C2N=C1 UGQMRVRMYYASKQ-UHFFFAOYSA-N 0.000 description 1
- 101710203526 Integrase Proteins 0.000 description 1
- 241000124008 Mammalia Species 0.000 description 1
- 241001599018 Melanogaster Species 0.000 description 1
- 206010027626 Milia Diseases 0.000 description 1
- 241001430197 Mollicutes Species 0.000 description 1
- 206010028980 Neoplasm Diseases 0.000 description 1
- 108091005461 Nucleic proteins Proteins 0.000 description 1
- 229910003849 O-Si Inorganic materials 0.000 description 1
- 241000283973 Oryctolagus cuniculus Species 0.000 description 1
- 229910003872 O—Si Inorganic materials 0.000 description 1
- RPDUDBYMNGAHEM-UHFFFAOYSA-N PROXYL Chemical compound CC1(C)CCC(C)(C)N1[O] RPDUDBYMNGAHEM-UHFFFAOYSA-N 0.000 description 1
- 241000282579 Pan Species 0.000 description 1
- 239000004952 Polyamide Substances 0.000 description 1
- WUGQZFFCHPXWKQ-UHFFFAOYSA-N Propanolamine Chemical compound NCCCO WUGQZFFCHPXWKQ-UHFFFAOYSA-N 0.000 description 1
- 108010066717 Q beta Replicase Proteins 0.000 description 1
- 108091081062 Repeated sequence (DNA) Proteins 0.000 description 1
- 241000606701 Rickettsia Species 0.000 description 1
- 101001110823 Saccharomyces cerevisiae (strain ATCC 204508 / S288c) 60S ribosomal protein L6-A Proteins 0.000 description 1
- 101000712176 Saccharomyces cerevisiae (strain ATCC 204508 / S288c) 60S ribosomal protein L6-B Proteins 0.000 description 1
- 101100171058 Saccharomyces cerevisiae (strain ATCC 204508 / S288c) POL5 gene Proteins 0.000 description 1
- 229910052772 Samarium Inorganic materials 0.000 description 1
- QAOWNCQODCNURD-UHFFFAOYSA-N Sulfuric acid Chemical compound OS(O)(=O)=O QAOWNCQODCNURD-UHFFFAOYSA-N 0.000 description 1
- 101710137500 T7 RNA polymerase Proteins 0.000 description 1
- 241000589500 Thermus aquaticus Species 0.000 description 1
- 241000534944 Thia Species 0.000 description 1
- 229910052775 Thulium Inorganic materials 0.000 description 1
- 239000007983 Tris buffer Substances 0.000 description 1
- 229920004890 Triton X-100 Polymers 0.000 description 1
- 239000013504 Triton X-100 Substances 0.000 description 1
- 102100040374 U4/U6 small nuclear ribonucleoprotein Prp3 Human genes 0.000 description 1
- 241000700605 Viruses Species 0.000 description 1
- 238000002441 X-ray diffraction Methods 0.000 description 1
- MGPYJVWEJNTXLC-UHFFFAOYSA-N [6-[6-[2-cyanoethoxy-[di(propan-2-yl)amino]phosphanyl]oxyhexylcarbamoyl]-6'-(2,2-dimethylpropanoyloxy)-3-oxospiro[2-benzofuran-1,9'-xanthene]-3'-yl] 2,2-dimethylpropanoate Chemical compound C12=CC=C(OC(=O)C(C)(C)C)C=C2OC2=CC(OC(=O)C(C)(C)C)=CC=C2C11OC(=O)C2=CC=C(C(=O)NCCCCCCOP(N(C(C)C)C(C)C)OCCC#N)C=C21 MGPYJVWEJNTXLC-UHFFFAOYSA-N 0.000 description 1
- 230000009102 absorption Effects 0.000 description 1
- 238000010521 absorption reaction Methods 0.000 description 1
- 125000002777 acetyl group Chemical group [H]C([H])([H])C(*)=O 0.000 description 1
- 150000001251 acridines Chemical class 0.000 description 1
- 239000012190 activator Substances 0.000 description 1
- 239000011543 agarose gel Substances 0.000 description 1
- 150000001447 alkali salts Chemical class 0.000 description 1
- 150000001336 alkenes Chemical class 0.000 description 1
- 150000001345 alkine derivatives Chemical class 0.000 description 1
- 125000004453 alkoxycarbonyl group Chemical group 0.000 description 1
- 125000004644 alkyl sulfinyl group Chemical group 0.000 description 1
- 125000004390 alkyl sulfonyl group Chemical group 0.000 description 1
- 125000004414 alkyl thio group Chemical group 0.000 description 1
- 150000001412 amines Chemical class 0.000 description 1
- 125000004397 aminosulfonyl group Chemical group NS(=O)(=O)* 0.000 description 1
- QGZKDVFQNNGYKY-UHFFFAOYSA-N ammonia Natural products N QGZKDVFQNNGYKY-UHFFFAOYSA-N 0.000 description 1
- BKNBVEKCHVXGPH-UHFFFAOYSA-N anthracene-1,4,9,10-tetrol Chemical compound C1=CC=C2C(O)=C3C(O)=CC=C(O)C3=C(O)C2=C1 BKNBVEKCHVXGPH-UHFFFAOYSA-N 0.000 description 1
- 230000001093 anti-cancer Effects 0.000 description 1
- 230000003466 anticipated effect Effects 0.000 description 1
- 230000000692 anti-sense effect Effects 0.000 description 1
- PYMYPHUHKUWMLA-UHFFFAOYSA-N arabinose Natural products OCC(O)C(O)C(O)C=O PYMYPHUHKUWMLA-UHFFFAOYSA-N 0.000 description 1
- 125000003435 aroyl group Chemical group 0.000 description 1
- 125000005239 aroylamino group Chemical group 0.000 description 1
- 238000003491 array Methods 0.000 description 1
- 125000004659 aryl alkyl thio group Chemical group 0.000 description 1
- 125000001769 aryl amino group Chemical group 0.000 description 1
- 125000005160 aryl oxy alkyl group Chemical group 0.000 description 1
- 125000004391 aryl sulfonyl group Chemical group 0.000 description 1
- 125000005110 aryl thio group Chemical group 0.000 description 1
- 238000011948 assay development Methods 0.000 description 1
- 230000009286 beneficial effect Effects 0.000 description 1
- ZYGHJZDHTFUPRJ-UHFFFAOYSA-N benzo-alpha-pyrone Natural products C1=CC=C2OC(=O)C=CC2=C1 ZYGHJZDHTFUPRJ-UHFFFAOYSA-N 0.000 description 1
- 125000003236 benzoyl group Chemical group [H]C1=C([H])C([H])=C(C([H])=C1[H])C(*)=O 0.000 description 1
- 125000001797 benzyl group Chemical group [H]C1=C([H])C([H])=C(C([H])=C1[H])C([H])([H])* 0.000 description 1
- SRBFZHDQGSBBOR-UHFFFAOYSA-N beta-D-Pyranose-Lyxose Natural products OC1COC(O)C(O)C1O SRBFZHDQGSBBOR-UHFFFAOYSA-N 0.000 description 1
- 239000012620 biological material Substances 0.000 description 1
- 239000000090 biomarker Substances 0.000 description 1
- 238000009835 boiling Methods 0.000 description 1
- 238000007470 bone biopsy Methods 0.000 description 1
- 210000001185 bone marrow Anatomy 0.000 description 1
- 210000004556 brain Anatomy 0.000 description 1
- 125000001246 bromo group Chemical group Br* 0.000 description 1
- 238000011088 calibration curve Methods 0.000 description 1
- 101150039352 can gene Proteins 0.000 description 1
- 230000000711 cancerogenic effect Effects 0.000 description 1
- 125000002837 carbocyclic group Chemical group 0.000 description 1
- 150000001721 carbon Chemical group 0.000 description 1
- CREMABGTGYGIQB-UHFFFAOYSA-N carbon carbon Chemical compound C.C CREMABGTGYGIQB-UHFFFAOYSA-N 0.000 description 1
- 239000011203 carbon fibre reinforced carbon Substances 0.000 description 1
- 125000002915 carbonyl group Chemical group [*:2]C([*:1])=O 0.000 description 1
- 231100000357 carcinogen Toxicity 0.000 description 1
- 239000003183 carcinogenic agent Substances 0.000 description 1
- 210000000845 cartilage Anatomy 0.000 description 1
- 239000006143 cell culture medium Substances 0.000 description 1
- 230000010261 cell growth Effects 0.000 description 1
- 239000013592 cell lysate Substances 0.000 description 1
- 230000001413 cellular effect Effects 0.000 description 1
- 210000003850 cellular structure Anatomy 0.000 description 1
- 238000005119 centrifugation Methods 0.000 description 1
- 210000001175 cerebrospinal fluid Anatomy 0.000 description 1
- 239000005081 chemiluminescent agent Substances 0.000 description 1
- 239000013611 chromosomal DNA Substances 0.000 description 1
- 238000010367 cloning Methods 0.000 description 1
- 238000004440 column chromatography Methods 0.000 description 1
- 230000002301 combined effect Effects 0.000 description 1
- 238000004891 communication Methods 0.000 description 1
- 230000000052 comparative effect Effects 0.000 description 1
- 238000004883 computer application Methods 0.000 description 1
- 239000003636 conditioned culture medium Substances 0.000 description 1
- 230000021615 conjugation Effects 0.000 description 1
- 238000011109 contamination Methods 0.000 description 1
- 235000001671 coumarin Nutrition 0.000 description 1
- 150000004775 coumarins Chemical class 0.000 description 1
- 125000004122 cyclic group Chemical group 0.000 description 1
- 230000001351 cycling effect Effects 0.000 description 1
- 125000000753 cycloalkyl group Chemical group 0.000 description 1
- 230000009089 cytolysis Effects 0.000 description 1
- 238000013500 data storage Methods 0.000 description 1
- 239000005549 deoxyribonucleoside Substances 0.000 description 1
- 239000005547 deoxyribonucleotide Substances 0.000 description 1
- 125000002637 deoxyribonucleotide group Chemical group 0.000 description 1
- 238000010511 deprotection reaction Methods 0.000 description 1
- 238000003745 diagnosis Methods 0.000 description 1
- 238000010790 dilution Methods 0.000 description 1
- 239000012895 dilution Substances 0.000 description 1
- 238000010494 dissociation reaction Methods 0.000 description 1
- 230000005593 dissociations Effects 0.000 description 1
- 229940079593 drug Drugs 0.000 description 1
- 239000003814 drug Substances 0.000 description 1
- 238000009510 drug design Methods 0.000 description 1
- 239000003596 drug target Substances 0.000 description 1
- 238000001035 drying Methods 0.000 description 1
- 239000000428 dust Substances 0.000 description 1
- 230000005670 electromagnetic radiation Effects 0.000 description 1
- 238000000804 electron spin resonance spectroscopy Methods 0.000 description 1
- 239000003480 eluent Substances 0.000 description 1
- 238000010828 elution Methods 0.000 description 1
- 230000007613 environmental effect Effects 0.000 description 1
- 230000002255 enzymatic effect Effects 0.000 description 1
- 238000006911 enzymatic reaction Methods 0.000 description 1
- IINNWAYUJNWZRM-UHFFFAOYSA-L erythrosin B Chemical compound [Na+].[Na+].[O-]C(=O)C1=CC=CC=C1C1=C2C=C(I)C(=O)C(I)=C2OC2=C(I)C([O-])=C(I)C=C21 IINNWAYUJNWZRM-UHFFFAOYSA-L 0.000 description 1
- 239000004174 erythrosine Substances 0.000 description 1
- 229940011411 erythrosine Drugs 0.000 description 1
- 235000012732 erythrosine Nutrition 0.000 description 1
- 229940093499 ethyl acetate Drugs 0.000 description 1
- 235000019439 ethyl acetate Nutrition 0.000 description 1
- 210000003527 eukaryotic cell Anatomy 0.000 description 1
- 238000011156 evaluation Methods 0.000 description 1
- 238000002875 fluorescence polarization Methods 0.000 description 1
- 229960002949 fluorouracil Drugs 0.000 description 1
- 238000002825 functional assay Methods 0.000 description 1
- 210000001035 gastrointestinal tract Anatomy 0.000 description 1
- 230000002068 genetic effect Effects 0.000 description 1
- 239000011521 glass Substances 0.000 description 1
- ZJYYHGLJYGJLLN-UHFFFAOYSA-N guanidinium thiocyanate Chemical compound SC#N.NC(N)=N ZJYYHGLJYGJLLN-UHFFFAOYSA-N 0.000 description 1
- 125000005843 halogen group Chemical group 0.000 description 1
- 125000005226 heteroaryloxycarbonyl group Chemical group 0.000 description 1
- 125000005150 heteroarylsulfinyl group Chemical group 0.000 description 1
- 125000005143 heteroarylsulfonyl group Chemical group 0.000 description 1
- 125000005368 heteroarylthio group Chemical group 0.000 description 1
- 150000002391 heterocyclic compounds Chemical class 0.000 description 1
- 238000004128 high performance liquid chromatography Methods 0.000 description 1
- 229930195733 hydrocarbon Natural products 0.000 description 1
- 230000003301 hydrolyzing effect Effects 0.000 description 1
- 125000002768 hydroxyalkyl group Chemical group 0.000 description 1
- 125000001841 imino group Chemical group [H]N=* 0.000 description 1
- 238000012606 in vitro cell culture Methods 0.000 description 1
- 230000002779 inactivation Effects 0.000 description 1
- 238000011534 incubation Methods 0.000 description 1
- 230000006698 induction Effects 0.000 description 1
- 230000001939 inductive effect Effects 0.000 description 1
- 230000005764 inhibitory process Effects 0.000 description 1
- 230000000977 initiatory effect Effects 0.000 description 1
- 125000000959 isobutyl group Chemical group [H]C([H])([H])C([H])(C([H])([H])[H])C([H])([H])* 0.000 description 1
- 125000001449 isopropyl group Chemical group [H]C([H])([H])C([H])(*)C([H])([H])[H] 0.000 description 1
- 230000000155 isotopic effect Effects 0.000 description 1
- 238000011862 kidney biopsy Methods 0.000 description 1
- 238000012417 linear regression Methods 0.000 description 1
- 238000012317 liver biopsy Methods 0.000 description 1
- 210000002751 lymph Anatomy 0.000 description 1
- 210000004880 lymph fluid Anatomy 0.000 description 1
- 210000004698 lymphocyte Anatomy 0.000 description 1
- 239000012139 lysis buffer Substances 0.000 description 1
- 238000007403 mPCR Methods 0.000 description 1
- 230000005389 magnetism Effects 0.000 description 1
- 239000000463 material Substances 0.000 description 1
- 238000005259 measurement Methods 0.000 description 1
- 230000001404 mediated effect Effects 0.000 description 1
- 230000002503 metabolic effect Effects 0.000 description 1
- GDOPTJXRTPNYNR-UHFFFAOYSA-N methyl-cyclopentane Natural products CC1CCCC1 GDOPTJXRTPNYNR-UHFFFAOYSA-N 0.000 description 1
- GRVDJDISBSALJP-UHFFFAOYSA-N methyloxidanyl Chemical group [O]C GRVDJDISBSALJP-UHFFFAOYSA-N 0.000 description 1
- 230000000813 microbial effect Effects 0.000 description 1
- 241000264288 mixed libraries Species 0.000 description 1
- 239000003068 molecular probe Substances 0.000 description 1
- 125000002950 monocyclic group Chemical group 0.000 description 1
- 238000001964 muscle biopsy Methods 0.000 description 1
- 230000035772 mutation Effects 0.000 description 1
- 125000000449 nitro group Chemical group [O-][N+](*)=O 0.000 description 1
- 108091027963 non-coding RNA Proteins 0.000 description 1
- 102000042567 non-coding RNA Human genes 0.000 description 1
- 238000007899 nucleic acid hybridization Methods 0.000 description 1
- 150000003833 nucleoside derivatives Chemical class 0.000 description 1
- 229940124276 oligodeoxyribonucleotide Drugs 0.000 description 1
- 238000005457 optimization Methods 0.000 description 1
- 125000001181 organosilyl group Chemical group [SiH3]* 0.000 description 1
- 230000003204 osmotic effect Effects 0.000 description 1
- 125000004043 oxo group Chemical group O=* 0.000 description 1
- 125000005740 oxycarbonyl group Chemical group [*:1]OC([*:2])=O 0.000 description 1
- 210000000496 pancreas Anatomy 0.000 description 1
- 230000005298 paramagnetic effect Effects 0.000 description 1
- LCCNCVORNKJIRZ-UHFFFAOYSA-N parathion Chemical compound CCOP(=S)(OCC)OC1=CC=C([N+]([O-])=O)C=C1 LCCNCVORNKJIRZ-UHFFFAOYSA-N 0.000 description 1
- 239000002245 particle Substances 0.000 description 1
- 239000008188 pellet Substances 0.000 description 1
- 230000000704 physical effect Effects 0.000 description 1
- 229920002647 polyamide Polymers 0.000 description 1
- 229920000642 polymer Polymers 0.000 description 1
- 102000054765 polymorphisms of proteins Human genes 0.000 description 1
- 230000037452 priming Effects 0.000 description 1
- 108090000765 processed proteins & peptides Proteins 0.000 description 1
- 210000001236 prokaryotic cell Anatomy 0.000 description 1
- 125000006308 propyl amino group Chemical group 0.000 description 1
- 210000001938 protoplast Anatomy 0.000 description 1
- IGFXRKMLLMBKSA-UHFFFAOYSA-N purine Chemical compound N1=C[N]C2=NC=NC2=C1 IGFXRKMLLMBKSA-UHFFFAOYSA-N 0.000 description 1
- 150000003212 purines Chemical class 0.000 description 1
- 150000003220 pyrenes Chemical class 0.000 description 1
- 150000003233 pyrroles Chemical class 0.000 description 1
- 238000012207 quantitative assay Methods 0.000 description 1
- 230000000171 quenching effect Effects 0.000 description 1
- 230000005855 radiation Effects 0.000 description 1
- 229910052761 rare earth metal Inorganic materials 0.000 description 1
- 150000002910 rare earth metals Chemical class 0.000 description 1
- 238000011897 real-time detection Methods 0.000 description 1
- 230000003362 replicative effect Effects 0.000 description 1
- 230000004044 response Effects 0.000 description 1
- 210000001995 reticulocyte Anatomy 0.000 description 1
- 238000005096 rolling process Methods 0.000 description 1
- 150000003839 salts Chemical class 0.000 description 1
- KZUNJOHGWZRPMI-UHFFFAOYSA-N samarium atom Chemical compound [Sm] KZUNJOHGWZRPMI-UHFFFAOYSA-N 0.000 description 1
- 238000013515 script Methods 0.000 description 1
- 239000000377 silicon dioxide Substances 0.000 description 1
- 235000012239 silicon dioxide Nutrition 0.000 description 1
- 238000003201 single nucleotide polymorphism genotyping Methods 0.000 description 1
- 210000003491 skin Anatomy 0.000 description 1
- 238000007390 skin biopsy Methods 0.000 description 1
- 239000007790 solid phase Substances 0.000 description 1
- 239000002904 solvent Substances 0.000 description 1
- 238000011895 specific detection Methods 0.000 description 1
- 125000003003 spiro group Chemical group 0.000 description 1
- 239000003381 stabilizer Substances 0.000 description 1
- 238000010561 standard procedure Methods 0.000 description 1
- 238000003860 storage Methods 0.000 description 1
- 125000003107 substituted aryl group Chemical group 0.000 description 1
- 235000011149 sulphuric acid Nutrition 0.000 description 1
- 210000001179 synovial fluid Anatomy 0.000 description 1
- 230000002194 synthesizing effect Effects 0.000 description 1
- 229940042055 systemic antimycotics triazole derivative Drugs 0.000 description 1
- 210000001138 tear Anatomy 0.000 description 1
- 125000000999 tert-butyl group Chemical group [H]C([H])([H])C(*)(C([H])([H])[H])C([H])([H])[H] 0.000 description 1
- 230000002381 testicular Effects 0.000 description 1
- WGTODYJZXSJIAG-UHFFFAOYSA-N tetramethylrhodamine chloride Chemical compound [Cl-].C=12C=CC(N(C)C)=CC2=[O+]C2=CC(N(C)C)=CC=C2C=1C1=CC=CC=C1C(O)=O WGTODYJZXSJIAG-UHFFFAOYSA-N 0.000 description 1
- 125000003396 thiol group Chemical class [H]S* 0.000 description 1
- 125000000464 thioxo group Chemical group S=* 0.000 description 1
- 210000001541 thymus gland Anatomy 0.000 description 1
- 230000002110 toxicologic effect Effects 0.000 description 1
- 231100000027 toxicology Toxicity 0.000 description 1
- 239000003053 toxin Substances 0.000 description 1
- 231100000765 toxin Toxicity 0.000 description 1
- 238000011222 transcriptome analysis Methods 0.000 description 1
- 150000003852 triazoles Chemical group 0.000 description 1
- 239000001226 triphosphate Substances 0.000 description 1
- 235000011178 triphosphate Nutrition 0.000 description 1
- 125000002264 triphosphate group Chemical class [H]OP(=O)(O[H])OP(=O)(O[H])OP(=O)(O[H])O* 0.000 description 1
- LENZDBCJOHFCAS-UHFFFAOYSA-N tris Chemical compound OCC(N)(CO)CO LENZDBCJOHFCAS-UHFFFAOYSA-N 0.000 description 1
- 238000013024 troubleshooting Methods 0.000 description 1
- ORHBXUUXSCNDEV-UHFFFAOYSA-N umbelliferone Chemical compound C1=CC(=O)OC2=CC(O)=CC=C21 ORHBXUUXSCNDEV-UHFFFAOYSA-N 0.000 description 1
- HFTAFOQKODTIJY-UHFFFAOYSA-N umbelliferone Natural products Cc1cc2C=CC(=O)Oc2cc1OCC=CC(C)(C)O HFTAFOQKODTIJY-UHFFFAOYSA-N 0.000 description 1
- 210000004291 uterus Anatomy 0.000 description 1
- 239000013598 vector Substances 0.000 description 1
- 239000003643 water by type Substances 0.000 description 1
- 229940075420 xanthine Drugs 0.000 description 1
- 239000007222 ypd medium Substances 0.000 description 1
Classifications
- C—CHEMISTRY; METALLURGY
- C07—ORGANIC CHEMISTRY
- C07H—SUGARS; DERIVATIVES THEREOF; NUCLEOSIDES; NUCLEOTIDES; NUCLEIC ACIDS
- C07H21/00—Compounds containing two or more mononucleotide units having separate phosphate or polyphosphate groups linked by saccharide radicals of nucleoside groups, e.g. nucleic acids
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12Q—MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
- C12Q1/00—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
- C12Q1/68—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
- C12Q1/6809—Methods for determination or identification of nucleic acids involving differential detection
Landscapes
- Chemical & Material Sciences (AREA)
- Organic Chemistry (AREA)
- Life Sciences & Earth Sciences (AREA)
- Health & Medical Sciences (AREA)
- Biochemistry (AREA)
- Molecular Biology (AREA)
- Engineering & Computer Science (AREA)
- Analytical Chemistry (AREA)
- Wood Science & Technology (AREA)
- General Health & Medical Sciences (AREA)
- Biotechnology (AREA)
- Genetics & Genomics (AREA)
- Zoology (AREA)
- Proteomics, Peptides & Aminoacids (AREA)
- Physics & Mathematics (AREA)
- Microbiology (AREA)
- Immunology (AREA)
- Biophysics (AREA)
- Bioinformatics & Cheminformatics (AREA)
- General Engineering & Computer Science (AREA)
- Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)
- Investigating Or Analysing Materials By The Use Of Chemical Reactions (AREA)
Abstract
The invention relates to nucleic acid probes, nucleic acid probe libraries, and kits for detecting, classifying, or quantifying components in a complex mixture of nucleic acids, such as a transcriptome, and methods of using the same. The invention also relates to methods of identifying nucleic acid probes useful in the probe libraries and to methods of identifying a means for detection of a given nucleic acid.
Description
JUMBO APPLICATIONS/PATENTS
THIS SECTION OF THE APPLICATION/PATENT CONTAINS MORE THAN ONE VOLUME.
NOTE: For additional volumes, please contact the Canadian Patent Office.
PROBES, LIBRARIES AND KITS FOR ANALYSIS OF MIXTURES OF NUCLEIC ACIDS AND
METHODS FOR CONSTRUCTING THE SAME
FIELD OF THE INVENTION
The invention relates to nucleic acid probes, nucleic acid probe libraries, and kits for detecting, classifying, or quantifying components in a complex mixture of nucleic acids, such as a transcriptome, and methods of using the same.
BACKGROUND OF THE INVENTION
With the advent of microarrays for profiling the expression of thousands of genes, such as GeneChip™ arrays (Affymetrix, Inc., Santa Clara, CA), correlations between expressed genes and cellular phenotypes may be identified at a fraction of the cost and labour necessary for traditional methods, such as Northern- or dot-blot analysis. Microarrays permit the development of multiple parallel assays for identifying and validating biomarkers of disease and drug targets which can be used in diagnosis and treatment. Gene expression profiles can also be used to estimate and predict metabolic and toxicological consequences of exposure to an agent (e.g., a drug, a potential toxin or carcinogen, etc.) or a condition (e.g., temperature, pH, etc.).
Microarray experiments often yield redundant data, only a fraction of which has value for the experimenter. Additionally, because of the highly parallel format of microarray-based assays, conditions may not be optimal for individual capture probes. For these reasons, microarray experiments are most often followed up by, or sequentially replaced by, confirmatory studies using single-gene homogeneous assays. These are most often quantitative PCR-based methods such as the 5' nuclease assay or other types of dual-labelled probe quantitative assays.
However, these assays are still time-consuming, single-reaction assays that are hampered by high costs and time-consuming probe design procedures. Further, 5' nuclease assay probes are relatively large (e.g., 15-30 nucleotides). Thus, the limitations in homogeneous assay systems currently known create a bottleneck in the validation of microarray findings, and in focused target validation procedures.
An approach to avoid this bottleneck is to omit the expensive dual-labelled indicator probes used in 5' nuclease assay procedures and molecular beacons and instead use non-sequence-specific DNA intercalating dyes, such as SYBR Green, that fluoresce upon binding to double-stranded but not single-stranded DNA. Using such dyes, it is possible to universally detect any amplified sequence in real time. However, this technology is hampered by several problems. For example, non-specific priming during the PCR amplification process can generate unintentional non-target amplicons that will contribute to the quantification signal. Further, interactions between PCR primers in the reaction to form "primer-dimers" are common. Due to the high concentration of primers typically used in a PCR reaction, this can lead to significant amounts of short double-stranded non-target amplicons that also bind intercalating dyes. Therefore, the preferred method of quantifying mRNA by real-time PCR uses sequence-specific detection probes.
One approach for avoiding the problem of random amplification and the formation of primer-dimers has been described by Simeonov, et al. (Nucleic Acids Research 30(17): 91, 2002; U.S. Patent Publication 20020197630). It uses generic detection probes that can detect a large number of different types of nucleic acid molecules while retaining some sequence specificity, and involves a library of probes comprising more than 10% of all possible sequences of a given length (or lengths). The library can include various non-natural nucleobases and other modifications to stabilize binding of probes/primers in the library to a target sequence. Even so, a minimal length of at least 8 bases is required for most sequences to attain a degree of stability that is compatible with most assay conditions relevant for applications such as real-time PCR. Because a universal library of all possible 8-mers contains 65,536 different sequences, even the smallest library previously considered by Simeonov, et al. contains more than 10% of all possibilities, i.e., at least 6554 sequences, which is impractical to handle and vastly expensive to construct.
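The library-size arithmetic above can be checked with a short calculation. The following Python sketch is illustrative only and is not part of the original disclosure; the function names are hypothetical, and it assumes a four-letter nucleotide alphabet.

```python
def universal_library_size(length: int, alphabet_size: int = 4) -> int:
    """Number of distinct sequences of a given length (4 bases assumed)."""
    return alphabet_size ** length


def min_probes_for_percent(length: int, percent: int) -> int:
    """Smallest whole number of probes that reaches `percent` % of the
    universal library for the given probe length (integer ceiling)."""
    total = universal_library_size(length)
    return (total * percent + 99) // 100


if __name__ == "__main__":
    # All possible 8-mers, and the 10% threshold discussed in the text.
    print(universal_library_size(8))      # 65536
    print(min_probes_for_percent(8, 10))  # 6554
```

For 8-mers this reproduces the two figures quoted in the text: 4^8 = 65,536 possible sequences, of which 10% rounds up to 6554 probes.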
From a practical point of view, several factors limit the ease of use and accessibility of contemporary homogeneous assay applications. The problems encountered by users of conventional assay technologies include:
- Costs are prohibitively high when attempting to detect many different genes in a few samples, because the price to purchase a probe for each transcript is high.
- The synthesis of labelled probes is time-consuming, and the time from order to receipt from the manufacturer is often more than 1 week.
- User-designed kits may not work the first time, and validated kits are expensive per assay.
- It is difficult to quickly test for a new target or iteratively improve probe design.
- The exact probe sequence of commercial validated probes may be unknown to the customer, resulting in problems with evaluation of results and suitability for scientific publication.
- When assay conditions or components are obscure, it may be impossible to order reagents from an alternative source.
The described invention addresses these practical problems and aims to ensure rapid and inexpensive development of accurate and specific assays for quantification of gene transcripts.
SUMMARY OF THE INVENTION
It is desirable to be able to quantify the expression of most genes (e.g., >98%) in, e.g., the human transcriptome using a limited number of oligonucleotide detection probes in a homogeneous assay system. The present invention solves the problems faced by contemporary approaches to homogeneous assays outlined above. This is done by providing a method for construction of generic multi-probes with sufficient sequence specificity that they are unlikely to detect a randomly amplified sequence fragment or primer-dimers, yet are still capable of detecting many different target sequences each. Such probes are usable in different assays and may be combined in small probe libraries (50 to 500 probes) that, when combined with a target-specific primer set, can be used to detect and/or quantify individual components in complex mixtures composed of thousands of different nucleic acids (e.g., individual transcripts in the human transcriptome, which is composed of >30,000 different nucleic acids).
Each multi-probe comprises two elements: 1) a detection element or detection moiety consisting of one or more labels to detect the binding of the probe to the target; and 2) a recognition element or recognition sequence tag ensuring the binding to the specific target(s) of interest. The detection element can be any of a variety of detection principles used in homogeneous assays. The detection of binding is either direct, by a measurable change in the properties of one or more of the labels following binding to the target (e.g., a molecular beacon type assay with or without stem structure), or indirect, by a subsequent reaction following binding (e.g., cleavage by the 5' nuclease activity of the DNA polymerase in 5' nuclease assays).
Each detection element may include a quencher selected from the quenchers disclosed in European patent applications 04078170 and 03759288. In that context, all disclosures relating to the quenchers disclosed in these two patent applications relate mutatis mutandis to quenchers forming part of oligonucleotide probes that are part of the libraries of the present invention and both disclosures are therefore incorporated by reference herein.
The quencher preferably has formula I
[Formula I: chemical structure drawing not reproduced in this text] (I)
wherein one or two of R1, R4, R5 and R8 independently is/are a bond or selected from a substituted or non-substituted amino group, which constitute(s) the linker(s) to the remainder of the oligonucleotide probe, and wherein the remaining R1 to R8 groups are each, independently, hydrogen or substituted or non-substituted hydroxy, amino, alkyl, aryl, arylalkyl or alkoxy. The substitution of the amino group can be with an alkyl, alkylaryl or aryl group.
The term "alkyl" is used herein in the context of formula I to refer to a branched or unbranched, saturated or unsaturated, monovalent hydrocarbon radical, generally having from about 1-30 carbons and preferably from 1-6 carbons. Suitable alkyl radicals include, for example, structures containing one or more methylene, methine and/or methyne groups. Branched structures have a branching motif similar to iso-propyl, t-butyl, i-butyl, 2-ethylpropyl, etc. As used herein, the term encompasses "substituted alkyls" and "cyclic alkyl". "Substituted alkyl" refers to alkyl as just described including one or more substituents such as, for example, C1-C6-alkyl, aryl, acyl, halogen (i.e., alkylhalos, e.g., CF3), hydroxy, amino, alkoxy, alkylamino, acylamino, thioamido, acyloxy, aryloxy, aryloxyalkyl, mercapto, thia, aza, oxo, both saturated and unsaturated cyclic hydrocarbons, heterocycles and the like. These groups may be attached to any carbon or substituent of the alkyl moiety. Additionally, these groups may be pendent from, or integral to, the alkyl chain.
The term "alkylaryl" in this context means a radical obtained by combining an alkyl and an aryl group. Typical alkylaryl groups include phenethyl, ethyl phenyl and the like.
The term "alkylamino" in this context means amino substituted with alkyl. In a preferred embodiment, the amino group is attached to the anthraquinone structure.
The term "alkylarylamino" in this context means amino substituted with alkylaryl. In a preferred embodiment, the amino group is attached to the anthraquinone structure.
The term "arylamino" in this context means amino substituted with aryl. In a preferred embodiment, the amino group is attached to the anthraquinone structure.
Especially preferred examples of quenchers used in the invention include 1,4-bis-(3-hydroxy-propylamino)-anthraquinone, 1-(3-(4,4'-dimethoxy-trityloxy)propylamino)-4-(3-hydroxypropylamino)-anthraquinone, 1,5-bis-(3-hydroxy-propylamino)-anthraquinone, 1-(3-hydroxypropylamino)-5-(3-(4,4'-dimethoxy-trityloxy)propylamino)-anthraquinone, 1,4-bis-(4-(2-hydroxyethyl)phenylamino)-anthraquinone, 1-(4-(2-(4,4'-dimethoxy-trityloxy)ethyl)phenylamino)-4-(4-(2-hydroxyethyl)phenylamino)-anthraquinone, 1,8-bis-(3-hydroxy-propylamino)-anthraquinone, 1,4-bis(3-hydroxypropylamino)-6-methylanthraquinone, 1-(3-(4,4'-dimethoxy-trityloxy)propylamino)-4-(3-hydroxypropylamino)-6(7)-methyl-anthraquinone, 1,4-bis(4-(2-hydroxyethyl)phenylamino)-6-methyl-anthraquinone, 1,4-bis(4-methyl-phenylamino)-6-carboxy-anthraquinone, 1,4-bis(4-methyl-phenylamino)-6-(N-(6,7-dihydroxy-4-oxo-heptane-1-yl))carboxamido-anthraquinone, 1,4-bis(4-methyl-phenylamino)-6-(N-(7-dimethoxytrityloxy-6-hydroxy-4-oxo-heptane-1-yl))carboxamido-anthraquinone, 1,4-bis(propylamino)-6-carboxy-anthraquinone, 1,4-bis(propylamino)-6-(N-(6,7-dihydroxy-4-oxo-heptane-1-yl))carboxamido-anthraquinone, 1,4-bis(propylamino)-6-(N-(7-dimethoxytrityloxy-6-hydroxy-4-oxo-heptane-1-yl))carboxamido-anthraquinone, 1,5-bis(4-(2-hydroxyethyl)phenylamino)-anthraquinone, 1-(4-(2-hydroxyethyl)phenylamino)-5-(4-(2-(4,4'-dimethoxy-trityloxy)ethyl)phenylamino)-anthraquinone, 1,8-bis(3-hydroxypropylamino)-anthraquinone, 1-(3-hydroxypropylamino)-8-(3-(4,4'-dimethoxy-trityloxy)propylamino)-anthraquinone, 1,8-bis(4-(2-hydroxyethyl)phenylamino)-anthraquinone, and 1-(4-(2-hydroxyethyl)phenylamino)-8-(4-(2-(4,4'-dimethoxy-trityloxy)ethyl)phenylamino)-anthraquinone.
One especially preferred quencher is compound 11 of Example 21, i.e. 1,4-Bis(2-hydroxyethylamino)-6-methylanthraquinone.
The recognition element also contributes to the novelty of the present invention. It comprises a short oligonucleotide moiety whose sequence has been selected to enable detection of a large subset of target nucleotides in a given complex sample mixture. The novel probes, each designed to detect many different target molecules, are referred to as multi-probes. The concept of designing a probe for multiple targets and exploiting the recurrence of a short recognition sequence by selecting the most frequently encountered sequences is novel and contrary to conventional probes, which are designed to be as specific as possible for a single target sequence. The surrounding primers and the choice of probe sequence in combination subsequently ensure the specificity of the multi-probes. The novel design principles arising from attempts to address the largest number of targets with the smallest number of probes are likewise part of the invention. This is enabled by the discovery that very short 8- to 9-mer LNA-containing oligonucleotide probes are compatible with PCR-based assays. In one aspect of the present invention, modified or analogue nucleobases, nucleosidic bases or nucleotides are incorporated in the recognition element, possibly together with minor groove binders and other modifications, all of which aim to stabilize the duplex formed between the probe and the target molecule so that the shortest possible probe sequence with the widest range of targets can be used. In a preferred aspect of the invention, the modifications are incorporation of LNA residues to reduce the length of the recognition element to 8 or 9 nucleotides while maintaining sufficient stability of the formed duplex to be detectable under ordinary assay conditions. Typically, less than 20% of the oligonucleotide probes of said library have a guanidyl (G) residue in the 5' and/or 3' position of the recognition element, but it is preferred that less than 10% of the oligonucleotide probes have a G in the 5' end of the recognition element, such as less than 5%. Especially preferred are libraries where the recognition elements do not have a G in the 5' end.
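The design principle described above (choosing frequently recurring short sequences so that few probes address many targets) can be illustrated with a greedy coverage heuristic. The Python sketch below is an assumption-laden illustration, not the selection procedure of the invention: the function names, the toy sequences, the greedy strategy, and the simplification of matching each recognition sequence against the target strand as written are all hypothetical choices made for this example.

```python
from collections import defaultdict


def kmers(seq: str, k: int) -> set:
    """All distinct length-k substrings of a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def greedy_probe_selection(targets: dict, k: int, max_probes: int) -> list:
    """Repeatedly pick the k-mer found in the most still-uncovered targets.

    Returns (k-mer, number of newly covered targets) pairs in pick order.
    Ties are broken alphabetically so the result is deterministic.
    """
    index = defaultdict(set)  # k-mer -> names of targets containing it
    for name, seq in targets.items():
        for km in kmers(seq.upper(), k):
            index[km].add(name)

    uncovered = set(targets)
    chosen = []
    while uncovered and len(chosen) < max_probes:
        best = max(sorted(index), key=lambda km: len(index[km] & uncovered))
        gain = len(index[best] & uncovered)
        if gain == 0:  # remaining targets are shorter than k
            break
        chosen.append((best, gain))
        uncovered -= index[best]
    return chosen


def fraction_with_5prime_g(probes: list) -> float:
    """Share of recognition sequences that start with G at the 5' end."""
    return sum(p.startswith("G") for p in probes) / len(probes) if probes else 0.0


if __name__ == "__main__":
    # Toy sequences invented for this example (hypothetical data).
    toy_targets = {"t1": "ACGTACGT", "t2": "TTACGTAA", "t3": "GGGCCCAA"}
    picks = greedy_probe_selection(toy_targets, k=4, max_probes=5)
    print(picks)
    print(fraction_with_5prime_g([p for p, _ in picks]))
```

On the toy input, the first pick covers two of the three sequences and the second covers the remaining one. A real library design would also have to weigh duplex stability, both strands, and the 5' G preference stated above, none of which this sketch models beyond reporting the 5' G fraction.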
Preferably, the multi-probes are modified in order to increase the binding affinity of the probe for a target sequence by at least two-fold compared to a probe of the same sequence without the modification, under the same conditions for detection, e.g., PCR conditions or stringent hybridization conditions. The preferred modifications include, but are not limited to, inclusion of nucleobases, nucleosidic bases or nucleotides that have been modified by a chemical moiety or replaced by an analogue (e.g., including a ribose or deoxyribose analogue), or use of internucleotide linkages other than phosphodiester linkages (such as non-phosphate internucleotide linkages), all to increase the binding affinity. The preferred modifications may also include attachment of duplex-stabilizing agents, e.g., minor-groove-binders (MGB) or intercalating nucleic acids (INA). Additionally, the preferred modifications may include addition of non-discriminatory bases, e.g., 5-nitroindole, which are capable of stabilizing duplex formation regardless of the nucleobase at the opposing position on the target strand. Indeed, a preferred embodiment entails that all probes in the inventive library include at least one 5-nitroindole residue (and most preferred: all probes include one single 5-nitroindole residue). Finally, multi-probes composed of a non-sugar-phosphate backbone, e.g., PNA, that are capable of binding sequence-specifically to a target sequence are also considered a modification. All the different binding-affinity-increasing modifications mentioned above will in the following be referred to as "the stabilizing modification(s)", and the ensuing multi-probe will in the following also be referred to as "modified oligonucleotide".
More preferably the binding affinity of the modified oligonucleotide is at least about 3-fold, 4-fold, 5-fold, or 20-fold higher than the binding of a probe of the same sequence but without the stabilizing modification(s).
Most preferably, the stabilizing modification(s) is inclusion of one or more LNA nucleotide analogs. Probes of from 6 to 12 nucleotides according to the invention may comprise from 1 to 8 stabilizing nucleotides, such as LNA nucleotides. When at least two LNA nucleotides are included, these may be consecutive or separated by one or more non-LNA nucleotides. In one aspect, LNA nucleotides are alpha and/or xylo LNA nucleotides.
R R 1 1 (1) wherein one or two of Rl, R4 , RS and R8 independently is/are a bond or selected from substituted or non-substituted amino group, which constitute(s) the linker(s) to the remainder of the oligonucleotide probe, and wherein the remaining Rl to R8 groups are each, independently hydrogen or substituted or non-substituted hydroxy, amino, alkyl, aryl, arylalkyl or alkoxy The substitution of the amino group can be with an alkyl, alkylaryl or aryl group.
The term "alkyl" is used herein in the context of formula I to refer to a branched or unbranched, saturated or unsaturated, monovalent hydrocarbon radical, generally having from about 1-30 carbons and preferably, from 1-6 carbons. Suitable alkyl radicals include, for example, structures containing one or more methylene, methine and/or methyne groups.
Branched structures have a branching motif similar to iso-propyl, t-butyl, i-butyl, 2-ethylpropyl, etc. As used herein, the term encompasses "substituted alkyls"
and "cyclic alkyl". "Substituted alkyl" refers to alkyl as just described including one or more substituents such as, for example, Cl-C6-alkyl, aryl, acyl, halogen (i.e. alkylhalos, e.g., CF3), hydroxy, amino, alkoxy, alkylamino, acylamino, thioamido, acyloxy, aryloxy, aryloxyalkyl, mercapto, thia, aza, oxo, both saturated and unsaturated cyclic hydrocarbons, heterocycles and the like.
These groups may be attached to any carbon or substituent of the alkyl moiety.
Additionally, these groups may be pendent from, or integral to, the alkyl chain.
The term "alkylaryl" in this context means a radical obtained by combining an alkyl and an aryl group. Typical alkylaryl groups include phenethyl, ethyl phenyl and the like.
The term "alkylamino" in this context means amino substituted with alkyl. In a preferred embodiment, the amino group is attached to the anthraquinone structure.
The term "lalkylarylamino" in this context means amino substituted with alkylaryl. In a preferred embodiment, the amino group is attached to the anthraquinone structure.
The term "arylamino" in this context means amino substituted with aryl. In a preferred embodiment, the amino group is attached to the anthraquinone structure.
Especially preferred examples of quenchers used in the invention include 1,4-bis-(3-hydroxy-propylamino)-anthraquinone, 1-(3-(4,4'-dimethoxy-trityloxy)propylamino)-4-(3-hydroxypropylamino)-anthraquinone, 1,5-bis-(3-hydroxy-propylamino)-anthraquinone, 1-(3-hydroxypropylamino)-5-(3-(4,4'-dimethoxy-trityloxy)propylamino)-anthraquinone, 1,4-bis-5 (4-(2-hydroxyethyl)phenylamino)-anthraquinone, 1-(4-(2-(4,4'-dimethoxy-trityloxy)ethyl)phenylamino)-4-(4-(2-hydroethyl)phenylamino)-anthraquinone, 1,8-bis-(3-hydroxy-propylamino)-anthraquinone, 1,4-bis(3-hydroxypropylamino)-6-methylanthraquinone, 1-(3-(4,4'-dimethoxy-trityloxy)propylamino)-4-(3-hydroxypropylamino)-6(7)-methyl-anthraquinone, 1,4-bis(4-(2-hydroethyl)phenylamino)-6-methyl-anthraquinone, 1,4-bis(4-methyl-phenylamino)-6-carboxy-anthraquinone, 1,4-bis(4-methyl-phenylamino)-6-(N-(6,7-dihydroxy-4-oxo-heptane-1-yl))carboxamido-anthraquinone, 1,4-bis(4-methyl-phenylamino)-6-(N-(7-dimethoxytrityloxy-6-hydroxy-4-oxo-heptane-l-yl))carboxamido-anthraquinone, 1,4-bis(propylamino)-6-carboxy-anthraquinone, 1,4-bis(propylamino)-6-(N-(6,7-dihydroxy-4-oxo-heptane-l-yl))carboxamido-anthraquinone, 1,4-bis(propylamino)-6-(N-(7-dimethoxytrityloxy-6-hydroxy-4-oxo-heptane-l-yl))carboxamido-anthraquinone, 1,5-bis(4-(2-hydroethyl)phenylamino)-anthraquinone, 1-(4-(2-hydroethyl)phenylamino)-5-(4-(2-(4,4'-dimethoxy-trityloxy)ethyl)phenylamino)-anthraquinone, 1,8-bis(3-hydroxypropylamino)-anthraquinone, 1-(3-hydroxypropylamino)-8-(3-(4,4'-dimethoxy-trityloxy)propylamino)-anthraquinone, 1,8-bis(4-(2-hydroethyl)phenylamino)-anthraquinone, and 1-(4-(2-hydroethyl)phenylamino)-8-(4-(2-(4,4'-dimethoxy-trityloxy)ethyl)phenylamino)-anthraquinone.
One especially preferred quencher is compound 11 of Example 21, i.e. 1,4-Bis(2-hydroxyethylamino)-6-methylanthraquinone.
The recognition element also contributes to the novelty of the present invention. It comprises a short oligonucleotide moiety whose sequence has been selected to enable detection of a large subset of target nucleotides in a given complex sample mixture. The novel probes, each designed to detect many different target molecules, are referred to as multi-probes. The concept of designing a probe for multiple targets and exploiting the recurrence of a short recognition sequence by selecting the most frequently encountered sequences is novel and contrary to conventional probes, which are designed to be as specific as possible for a single target sequence. The surrounding primers and the choice of probe sequence in combination subsequently ensure the specificity of the multi-probes. The novel design principles arising from attempts to address the largest number of targets with the smallest number of probes are likewise part of the invention. This is enabled by the discovery that very short 8-9 mer LNA-containing oligonucleotide probes are compatible with PCR-based assays. In one aspect of the present invention modified or analogue nucleobases, nucleosidic bases or nucleotides are incorporated in the recognition element, possibly together with minor groove binders and other modifications, all of which aim to stabilize the duplex formed between the probe and the target molecule so that the shortest possible probe sequence with the widest range of targets can be used. In a preferred aspect of the invention the modifications are incorporation of LNA residues to reduce the length of the recognition element to 8 or 9 nucleotides while maintaining sufficient stability of the formed duplex to be detectable under ordinary assay conditions. Typically, less than 20% of the oligonucleotide probes of said library have a guanidyl (G) residue in the 5' and/or 3' position of the recognition element, but it is preferred that less than 10% of the oligonucleotide probes have a G in the 5' end of the recognition element, such as less than 5%. Especially preferred are libraries where the recognition elements do not have a G in the 5' end.
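The terminal-G limits stated above can be checked for any candidate library with a few lines of code. The following is a minimal sketch, assuming recognition sequences are non-empty strings written 5' to 3'; the function name and the example sequences are hypothetical and are not taken from the specification.

```python
def terminal_g_fractions(recognition_sequences):
    """Return (fraction with a 5' G, fraction with a G at the 5' and/or 3' end).

    Sequences are assumed to be non-empty strings written 5' -> 3'.
    """
    total = len(recognition_sequences)
    if total == 0:
        return 0.0, 0.0
    five_prime = sum(1 for s in recognition_sequences if s[0].upper() == "G")
    either_end = sum(
        1 for s in recognition_sequences
        if s[0].upper() == "G" or s[-1].upper() == "G"
    )
    return five_prime / total, either_end / total


# Hypothetical 8-mer recognition sequences, for illustration only.
example_library = ["GACCTGAT", "ACCTGATG", "CTGGAGCA", "TTCCTGCA", "CAGCCTCC"]
five_g, any_g = terminal_g_fractions(example_library)
print(f"5' G: {five_g:.0%}, G at either end: {any_g:.0%}")
```

A library meeting the especially preferred criterion would give a 5' G fraction of zero.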
Preferably, the multi-probes are modified in order to increase the binding affinity of the probe for a target sequence by at least two-fold compared to a probe of the same sequence without the modification, under the same conditions for detection, e.g., PCR conditions or stringent hybridization conditions. The preferred modifications include, but are not limited to, inclusion of nucleobases, nucleosidic bases or nucleotides that have been modified by a chemical moiety or replaced by an analogue (e.g. including a ribose or deoxyribose analogue), or use of internucleotide linkages other than phosphodiester linkages (such as non-phosphate internucleotide linkages), all to increase the binding affinity. The preferred modifications may also include attachment of duplex stabilizing agents, e.g., minor-groove-binders (MGB) or intercalating nucleic acids (INA). Additionally, the preferred modifications may include addition of non-discriminatory bases, e.g., 5-nitroindole, which are capable of stabilizing duplex formation regardless of the nucleobase at the opposing position on the target strand. Indeed, a preferred embodiment entails that all probes in the inventive library include at least one 5-nitroindole residue (and most preferred: all probes include one single 5-nitroindole residue). Finally, multi-probes composed of a non-sugar-phosphate backbone, e.g. PNA, that are capable of binding sequence specifically to a target sequence are also considered as modifications. All the different binding-affinity-increasing modifications mentioned above will in the following be referred to as "the stabilizing modification(s)", and the ensuing multi-probe will in the following also be referred to as "modified oligonucleotide".
More preferably the binding affinity of the modified oligonucleotide is at least about 3-fold, 4-fold, 5-fold, or 20-fold higher than the binding of a probe of the same sequence but without the stabilizing modification(s).
Most preferably, the stabilizing modification(s) is inclusion of one or more LNA nucleotide analogs. Probes of from 6 to 12 nucleotides according to the invention may comprise from 1 to 8 stabilizing nucleotides, such as LNA nucleotides. When at least two LNA nucleotides are included, these may be consecutive or separated by one or more non-LNA nucleotides. In one aspect, LNA nucleotides are alpha and/or xylo LNA nucleotides.
The invention also provides an oligomer multi-probe library useful under conditions used in NASBA-based assays.
NASBA is a specific, isothermal method of nucleic acid amplification suited for the amplification of RNA. Nucleic acid isolation is achieved via lysis with guanidine thiocyanate plus Triton X-100 and ending with purified nucleic acid being eluted from silicon dioxide particles. Amplification by NASBA involves the coordinated activities of three enzymes, AMV Reverse Transcriptase, RNase H, and T7 RNA Polymerase. Quantitative detection is achieved by way of internal calibrators, added at isolation, which are co-amplified and subsequently identified along with the wild type of RNA using electrochemiluminescence.
The invention also provides an oligomer multi-probe library comprising multi-probes, at least one of which has stabilizing modifications as defined above. Preferably, the probes are less than about 20 nucleotides in length, more preferably less than 12 nucleotides, and most preferably about 8 or 9 nucleotides. Also preferably, the library comprises less than about 3000 probes, more preferably less than 500 probes, and most preferably about 100 probes. The libraries containing labelled multi-probes may be used in a variety of applications depending on the type of detection element attached to the recognition element. These applications include, but are not limited to, dual or single labelled assays such as 5' nuclease assay, molecular beacon applications (see, e.g., Tyagi and Kramer Nat. Biotechnol. 14: 303-308, 1996) and other FRET-based assays.
In one aspect of the invention the multi-probes described above are designed together to complement each other as a predefined subset of all possible sequences of the given lengths, selected to be able to detect/characterize/quantify the largest number of nucleic acids in a complex mixture using the smallest number of multi-probe sequences. These predesigned small subsets of all possible sequences constitute a multi-probe library. The multi-probe libraries described by the present invention attain this functionality at a greatly reduced complexity by deliberately selecting the most commonly occurring oligomers of a given length or lengths while attempting to diversify the selection to get the best possible coverage of the complex nucleic acid target population. In one preferred aspect, probes of the library hybridize with more than about 60% of a target population of nucleic acids, such as a population of human mRNAs. More preferably, the probes hybridize with greater than 70%, greater than 80%, greater than 90%, greater than 95% and even greater than 98% of all target nucleic acid molecules in a population of target molecules (see, e.g., Fig. 1).
In a most preferred aspect of the invention, a probe library (e.g. about 100 multi-probes) comprising about 0.1% of all possible sequences of the selected probe length(s) is capable of detecting, classifying, and/or quantifying more than 98% of mRNA transcripts in the transcriptome of any specific species, particularly mammals and more particularly humans (i.e., > 35,000 different mRNA sequences). In fact, it is preferred that at least 85% of all target nucleic acids in a target population are covered by a multi-probe library of the invention.
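The "about 0.1%" figure can be read as an order-of-magnitude statement about sequence space. As a brief illustrative calculation (the helper name is hypothetical), a library of 100 probes is compared against the 4^k possible recognition sequences of length k for the preferred lengths of 8 and 9 nucleotides:

```python
def library_fraction(n_probes: int, length: int) -> float:
    """Fraction of all 4**length possible sequences represented by n_probes."""
    return n_probes / 4 ** length


for k in (8, 9):
    total = 4 ** k  # number of distinct DNA sequences of length k
    share = 100 * library_fraction(100, k)
    print(f"{k}-mers: {total} possible sequences, 100 probes = {share:.3f}%")
```

With 65,536 possible 8-mers and 262,144 possible 9-mers, 100 probes correspond to roughly 0.15% and 0.04% of the respective sequence spaces, which is consistent with the stated figure of about 0.1%.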
The problems with existing homogeneous assays mentioned above are addressed by the use of a multi-probe library according to the invention consisting of a minimal set of short detection probes selected so as to recognize or detect a majority of all expressed genes in a given cell type from a given organism. In one aspect, the library comprises probes that detect each transcript in a transcriptome of greater than about 10,000 genes, greater than about 15,000 genes, greater than about 20,000 genes, greater than about 25,000 genes, greater than about 30,000 genes or greater than about 35,000 genes or equivalent numbers of different mRNA transcripts. In one preferred aspect, the library comprises probes that detect mammalian transcript sequences, e.g., mouse, rat, rabbit, monkey, or human sequences.
By providing a cost-efficient multi-probe set useful for rapid development of quantitative real-time and end-point PCR assays, the present invention overcomes the limitations discussed above for contemporary homogeneous assays. The detection element of the multi-probes according to the invention may be single or doubly labelled (e.g. by comprising a label at each end of the probe, or at an internal position). Thus, probes according to the invention can be adapted for use in 5' nuclease assays, molecular beacon assays, FRET assays, and other similar assays. In one aspect, the detection multi-probe comprises two labels capable of interacting with each other to produce a signal or to modify a signal, such that a signal or a change in a signal may be detected when the probe hybridizes to a target sequence. A particular aspect is when the two labels comprise a quencher and a reporter molecule.
In another aspect, the probe comprises a target-specific recognition segment capable of specifically hybridizing to a plurality of different nucleic acid molecules comprising the complementary recognition sequence. A particular detection aspect of the invention referred to as a "molecular beacon with a stem region" is when the recognition segment is flanked by first and second complementary hairpin-forming sequences which may anneal to form a hairpin.
A reporter label is attached to the end of one complementary sequence and a quenching moiety is attached to the end of the other complementary sequence. The stem formed when the first and second complementary sequences are hybridized (i.e., when the probe recognition segment is not hybridized to its target) keeps these two labels in close proximity to each other, causing a signal produced by the reporter to be quenched by fluorescence resonance energy transfer (FRET). The proximity of the two labels is reduced when the probe is hybridized to a target sequence and the change in proximity produces a change in the interaction between the labels. Hybridization of the probe thus results in a signal (e.g. fluorescence) being produced by the reporter molecule, which can be detected and/or quantified.
In another aspect, the multi-probe comprises a reporter and a quencher molecule at opposing ends of the short recognition sequence, so that these moieties are in sufficient proximity to each other that the quencher substantially reduces the signal produced by the reporter molecule. This is the case both when the probe is free in solution and when it is bound to the target nucleic acid. A particular detection aspect of the invention referred to as a "5' nuclease assay" is when the multi-probe may be susceptible to cleavage by the 5' nuclease activity of the DNA polymerase. This reaction may result in separation of the quencher molecule from the reporter molecule and the production of a detectable signal.
Thus, such probes can be used in amplification-based assays to detect and/or quantify the amplification process for a target nucleic acid.
In a first aspect, the present invention relates to libraries of multi-probes as discussed above.
In such a library of oligonucleotide probes, each probe comprises a detection element and a recognition segment having a length of about 8-9 nucleotides, where some or all of the nucleobases in said oligonucleotides are substituted by non-natural bases having the effect of increasing binding affinity compared to natural nucleobases, and/or some or all of the nucleotide units of the oligonucleotide probe are modified with a chemical moiety to increase binding affinity, and/or where said oligonucleotides are modified with a chemical moiety to increase binding affinity, such that the probe has sufficient stability for binding to the target sequence under conditions suitable for detection, and wherein the number of different recognition segments comprises less than 10% of all possible segments of the given length, and wherein more than 90% of the probes can detect more than one complementary target in a target population of nucleic acids such that the library of oligonucleotide probes can detect a substantial fraction of all target sequences in a target population of nucleic acids.
The invention therefore relates to a library of oligonucleotide probes wherein each probe in the library consists of a recognition sequence tag and a detection moiety wherein at least one monomer in each oligonucleotide probe is a modified monomer analogue, increasing the binding affinity for the complementary target sequence relative to the corresponding unmodified oligonucleotide (which may e.g. be an unmodified oligodeoxyribonucleotide or oligoribonucleotide), such that the library probes have sufficient stability for sequence-specific binding and detection of a substantial fraction of a target nucleic acid in any given target population and wherein the number of different recognition sequences comprises less than 10% of all possible sequence tags of a given length(s).
The invention further relates to a library of oligonucleotide probes wherein the recognition sequence tag segments of the probes in the library have been modified in at least one of the following ways:
i) substitution with at least one non-naturally occurring nucleotide; and
ii) substitution with at least one chemical moiety to increase the stability of the probe.
Further, the invention relates to a library of oligonucleotide probes wherein the recognition sequence tag has a length of 6 to 12 nucleotides (i.e. 6, 7, 8, 9, 10, 11 or 12), and wherein the preferred length is 8 or 9 nucleotides.
Further, the invention relates to recognition sequence tags that are substituted with LNA nucleotides.
Also part of the invention is an oligonucleotide probe comprising a quencher of formula I and a 5-nitroindole residue. It is believed that such useful multi-probes are inventive in their own right. Preferred such probes are free from a 5' guanidyl residue, and in general such inventive probes are disclosed in the present specification and claims.
Especially preferred probes are those set forth in Table 1, Table 1A, Fig. 13, or Fig. 14.
Moreover, the invention relates to libraries of the invention where more than 90% of the oligonucleotide probes can bind and detect at least two target sequences in a nucleic acid population, preferably because the bound target sequences are complementary to the recognition sequence of the probes.
Also preferably, the probe is capable of detecting more than one target in a target population of nucleic acids, e.g., the probe is capable of hybridizing to a plurality of different nucleic acid molecules contained within the target population of nucleic acids.
The invention also provides a method, system and computer program embedded in a computer readable medium ("a computer program product") for designing multi-probes comprising at least one stabilizing nucleobase. The method comprises querying a database of target sequences (e.g., a database of expressed sequences) and designing a small set of probes (e.g. 50 or 100 or 200 or 300 or 500) which: i) have sufficient binding stability to bind their respective target sequences under PCR conditions, ii) have limited propensity to form duplex structures with themselves, and iii) are capable of binding to and detecting/quantifying at least about 60%, at least about 70%, at least about 80%, at least about 90% or at least about 95% of all the sequences in the given database of sequences, such as a database of expressed sequences.
Probes are designed in silico by enumerating all possible combinations of nucleotides of a given length, which form a database of virtual candidate probes. These virtual probes are queried against the database of target sequences to identify the probes able to detect the largest number of different target sequences in the database ("optimal probes"). Optimal probes so identified are removed from the virtual probe database. Additionally, target nucleic acids which were identified by the previous set of optimal probes are subtracted from the target nucleic acid database. The remaining probes are then queried against the remaining target sequences to identify a second set of optimal probes. The process is repeated until a set of probes is identified which can provide the desired coverage of the target sequence database. The set may be stored in a database as a source of sequences for transcriptome analysis. Multi-probes may be synthesized having recognition sequences which correspond to those in the database to generate a library of multi-probes.
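The iterative selection described above resembles the greedy heuristic for the set-cover problem. The following is a minimal sketch under simplifying assumptions that are not taken from the specification: a probe is deemed to "detect" a target when its recognition sequence occurs as an exact substring of the target, strand orientation is ignored, one probe is selected per round, and ties are broken alphabetically. Only k-mers that actually occur in a target are indexed, since any other candidate would detect nothing. All names and the toy sequences are hypothetical.

```python
def kmers_in(seq, k):
    """Set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def greedy_probe_set(targets, k, desired_coverage):
    """Pick probes greedily until desired_coverage (0..1) of targets is hit.

    targets: dict mapping target name -> sequence.
    Returns (chosen probes in selection order, achieved coverage fraction).
    """
    if not targets:
        return [], 0.0
    # Candidate probe -> names of the targets that contain it.
    hits = {}
    for name, seq in targets.items():
        for kmer in kmers_in(seq, k):
            hits.setdefault(kmer, set()).add(name)

    remaining = set(targets)   # targets not yet detected by a chosen probe
    covered = set()
    chosen = []
    while len(covered) < desired_coverage * len(targets):
        # Query remaining probes against remaining targets.
        best = max(sorted(hits), key=lambda p: len(hits[p] & remaining),
                   default=None)
        if best is None or not hits[best] & remaining:
            break              # no candidate adds coverage
        chosen.append(best)
        covered |= hits[best] & remaining
        remaining -= hits[best]   # subtract detected targets
        del hits[best]            # remove probe from the candidate pool
    return chosen, len(covered) / len(targets)


# Toy target database, for illustration only.
toy_targets = {"t1": "ACGTAC", "t2": "ACGTTT", "t3": "GGGACG", "t4": "TTTTTT"}
print(greedy_probe_set(toy_targets, 3, 1.0))
```

In this toy database the 3-mer shared by three targets is selected first, and a second probe is then needed only for the one target left uncovered.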
In one preferred aspect, the target sequence database comprises nucleic acid sequences corresponding to human mRNA (e.g., mRNA molecules, cDNAs, and the like).
In another aspect, the method further comprises calculating stability based on the assumption that the recognition sequence comprises at least one stabilizing nucleotide, such as an LNA molecule. In one preferred aspect the calculated stability is used to eliminate probe recognition sequences with inadequate stability from the database of virtual candidate probes prior to the initial query against the database of target sequences to initiate the identification of optimal probe recognition sequences.
In another aspect, the method further comprises calculating the propensity for a given probe recognition sequence to form a duplex structure with itself based on the assumption that the recognition sequence comprises at least one stabilizing nucleotide, such as an LNA molecule. In one preferred aspect the calculated propensity is used to eliminate probe recognition sequences that are likely to form probe duplexes from the database of virtual candidate probes prior to the initial query against the database of target sequences to initiate the determination of optimal probe recognition sequences.
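The specification does not give the formula used to calculate self-duplex propensity. As an illustrative stand-in only, the sketch below uses a purely sequence-based proxy: the length of the longest stretch within a probe whose reverse complement also occurs in the same probe. It does not model LNA-dependent duplex stability, and the threshold `max_stretch` and all function names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an upper-case DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def longest_self_complementary_stretch(probe):
    """Length of the longest substring whose reverse complement is in probe."""
    n = len(probe)
    for length in range(n, 0, -1):          # try longest stretches first
        for start in range(n - length + 1):
            stretch = probe[start:start + length]
            if reverse_complement(stretch) in probe:
                return length
    return 0


def drop_self_pairing(candidates, max_stretch=4):
    """Keep candidates whose self-complementary stretch is <= max_stretch."""
    return [p for p in candidates
            if longest_self_complementary_stretch(p) <= max_stretch]


# Hypothetical candidates: a full palindrome, a homopolymer, and a probe
# containing the 6-nucleotide palindrome GAATTC.
print(drop_self_pairing(["ACGTACGT", "AAAAAAAA", "GAATTCAA"]))
```

Applying such a filter before the first query against the target database keeps sequences that are prone to probe-probe duplexes out of the selection rounds.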
In another aspect, the method further comprises evaluating the general applicability of a given candidate probe recognition sequence for inclusion in the growing set of optimal probe candidates by both a query against the remaining target sequences and a query against the original set of target sequences. In one preferred aspect only probe recognition sequences that are frequently found in both the remaining target sequences and in the original target sequences are added to the growing set of optimal probe recognition sequences. In a most preferred aspect this is accomplished by calculating the product of the scores from these queries and selecting the probe recognition sequence with the highest product that is still among the probe recognition sequences with the 20% best scores in the query against the current targets.
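One way to read this selection rule is sketched below. The interpretation is an assumption: "20% best score" is taken to mean the top 20% of candidates when ranked by their score against the current (remaining) targets, with at least one candidate always retained, and each score is taken to be the number of targets containing the recognition sequence. The function name and the example counts are hypothetical.

```python
import math


def pick_next_probe(current_hits, original_hits, top_fraction=0.2):
    """Select the next probe recognition sequence.

    current_hits:  dict probe -> number of remaining targets it detects.
    original_hits: dict probe -> number of original targets it detects.
    Returns the probe with the highest product of the two scores among the
    top_fraction of probes ranked by current_hits, or None if there are none.
    """
    if not current_hits:
        return None
    # Rank by score against current targets; ties broken alphabetically.
    ranked = sorted(current_hits, key=lambda p: (-current_hits[p], p))
    keep = max(1, math.ceil(top_fraction * len(ranked)))
    shortlist = ranked[:keep]
    return max(shortlist, key=lambda p: current_hits[p] * original_hits[p])


# Hypothetical scores for ten candidate 4-mers.
current = {"AAAA": 10, "AAAC": 9, "AAAG": 8, "AAAT": 7, "AACA": 6,
           "AACC": 5, "AACG": 4, "AACT": 3, "AAGA": 2, "AAGC": 1}
original = {"AAAA": 10, "AAAC": 50, "AAAG": 1000, "AAAT": 7, "AACA": 6,
            "AACC": 5, "AACG": 4, "AACT": 3, "AAGA": 2, "AAGC": 1}
print(pick_next_probe(current, original))
```

Here the third-ranked candidate has by far the largest product but falls outside the top 20% against the current targets, so the choice is made between the two best-ranked candidates only.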
The invention also provides a computer program embedded in a computer readable medium comprising instructions for searching a database comprising a plurality of different target sequences and for identifying a set of probe recognition sequences capable of identifying at least about 60%, about 70%, about 80%, about 90% or about 95% of the sequences within the database. In one aspect, the program provides instructions for executing the method described above. In another aspect, the program provides instructions for implementing an algorithm as shown in Fig. 2. The invention further provides a system wherein the system comprises a memory for storing a database comprising sequence information for a plurality of different target sequences and also comprises an application program for executing the program instructions for searching the database for a set of probe recognition sequences which is capable of hybridizing to at least about 60%, about 70%, about 80%, about 90% or about 95% of the sequences within the database.
Another aspect of the invention relates to an oligonucleotide probe comprising a detection element and a recognition segment each independently having a length of about 1 to 8 or 9 nucleotides, wherein some or all of the nucleotides in the oligonucleotides are substituted by non-natural bases or base analogues having the effect of increasing binding affinity compared to natural nucleobases, and/or some or all of the nucleotide units of the oligonucleotide probe are modified with a chemical moiety or replaced by an analogue to increase binding affinity, and/or where said oligonucleotides are modified with a chemical moiety or are oligonucleotide analogues to increase binding affinity, such that the probe has sufficient stability for binding to the target sequence under conditions suitable for detection, and wherein the probe is capable of detecting more than one complementary target in a target population of nucleic acids.
A preferred embodiment of the invention is a kit for the characterization or detection or quantification of target nucleic acids comprising samples of a library of multi-probes. In one aspect, the kit comprises in silico protocols for their use. In another aspect, the kit comprises information relating to suggestions for obtaining inexpensive DNA primers. The probes contained within these kits may have any or all of the characteristics described above. In one preferred aspect, a plurality of probes comprises at least one stabilizing nucleotide, such as an LNA nucleotide. In another aspect, the plurality of probes comprises a nucleotide coupled to or stably associated with at least one chemical moiety for increasing the stability of binding of the probe. In a further preferred aspect, the kit comprises about 100 different probes. The kits according to the invention allow a user to quickly and efficiently develop an assay for thousands of different nucleic acid targets.
Additionally, target nucleic acids that were identified by the previous set of optimal probes are subtracted from the target nucleic acid database. The remaining probes are then queried against the remaining target sequences to identify a second set of optimal probes. The process is repeated until a set of probes is identified which can provide the desired coverage of the target sequence database. The set may be stored in a database as a source of sequences for transcriptome analysis. Multi-probes may be synthesized having recognition sequences which correspond to those in the database, to generate a library of multi-probes.
In one preferred aspect, the target sequence database comprises nucleic acid sequences corresponding to human mRNA (e.g., mRNA molecules, cDNAs, and the like).
In another aspect, the method further comprises calculating stability based on the assumption that the recognition sequence comprises at least one stabilizing nucleotide, such as an LNA molecule. In one preferred aspect the calculated stability is used to eliminate probe recognition sequences with inadequate stability from the database of virtual candidate probes prior to the initial query against the database of target sequences to initiate the identification of optimal probe recognition sequences.
In another aspect, the method further comprises calculating the propensity for a given probe recognition sequence to form a duplex structure with itself based on the assumption that the recognition sequence comprises at least one stabilizing nucleotide, such as an LNA molecule.
In one preferred aspect the calculated propensity is used to eliminate probe recognition sequences that are likely to form probe duplexes from the database of virtual candidate probes prior to the initial query against the database of target sequences to initiate the determination of optimal probe recognition sequences.
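As a rough illustration of screening candidates for self-duplex formation, the sketch below counts the longest run of consecutive Watson-Crick pairs that a sequence can form with a second, antiparallel copy of itself. This is a simplified stand-in, not the calculation used by the invention: it ignores LNA substitution, mismatches, wobble pairs and thermodynamics, and the function name `longest_self_pairing` is hypothetical.

```python
# Simplified self-complementarity screen (assumption: plain Watson-Crick
# pairing only; no LNA-specific or thermodynamic model is applied).
PAIRS = {("a", "t"), ("t", "a"), ("c", "g"), ("g", "c")}


def longest_self_pairing(seq):
    """Longest run of consecutive base pairs formed when two copies of
    `seq` are aligned antiparallel at any relative offset."""
    seq = seq.lower()
    rev = seq[::-1]          # second copy, read in the opposite direction
    n = len(seq)
    best = 0
    for shift in range(-(n - 1), n):
        run = 0
        for i in range(n):
            j = i + shift
            if 0 <= j < n and (seq[i], rev[j]) in PAIRS:
                run += 1
                best = max(best, run)
            else:
                run = 0
    return best


def drop_self_pairing(candidates, max_run=3):
    """Keep only candidates whose longest self-pairing run is <= max_run.
    The threshold is an illustrative value, not taken from the source."""
    return [c for c in candidates if longest_self_pairing(c) <= max_run]
```

A candidate list would be passed through `drop_self_pairing` before the first query against the target database, mirroring the order of operations described above.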
In another aspect, the method further comprises evaluating the general applicability of a given candidate probe recognition sequence for inclusion in the growing set of optimal probe candidates by both a query against the remaining target sequences and a query against the original set of target sequences. In one preferred aspect only probe recognition sequences that are frequently found in both the remaining target sequences and the original target sequences are added to the growing set of optimal probe recognition sequences.
In a most preferred aspect this is accomplished by calculating the product of the scores from these queries and selecting the probe recognition sequence with the highest product that is still among the probe recognition sequences with the 20% best scores in the query against the current targets.
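The product-based choice described above can be sketched as follows. This is a minimal illustration rather than the program of Fig. 17: the function name `pick_next_probe`, the dictionary inputs, and the reading of "the 20% best scores" as the top fifth of candidates ranked by their score against the current targets are all assumptions.

```python
import math


def pick_next_probe(current_hits, original_hits, top_fraction=0.2):
    """Choose the next probe recognition sequence.

    current_hits:  dict, candidate -> number of remaining targets matched
    original_hits: dict, candidate -> number of original targets matched
    Returns the candidate with the highest product of the two scores,
    restricted to the best `top_fraction` of candidates by current score.
    """
    if not current_hits:
        return None
    # Rank candidates by their score against the remaining (current) targets.
    ranked = sorted(current_hits, key=lambda s: current_hits[s], reverse=True)
    # Keep the best 20% of candidates, and always at least one.
    keep = max(1, math.ceil(len(ranked) * top_fraction))
    shortlist = ranked[:keep]
    # Within the shortlist, maximise (current score) x (original score).
    return max(shortlist, key=lambda s: current_hits[s] * original_hits[s])
```

Restricting the product ranking to a shortlist keeps a sequence that was common in the original database, but is nearly exhausted in the remaining targets, from being selected late in the process.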
The invention also provides a computer program embedded in a computer readable medium comprising instructions for searching a database comprising a plurality of different target sequences and for identifying a set of probe recognition sequences capable of identifying at least about 60%, about 70%, about 80%, about 90% or about 95% of the sequences within the database. In one aspect, the program provides instructions for executing the method described above. In another aspect, the program provides instructions for implementing an algorithm as shown in Fig. 2. The invention further provides a system wherein the system comprises a memory for storing a database comprising sequence information for a plurality of different target sequences and also comprises an application program for executing the program instructions for searching the database for a set of probe recognition sequences which is capable of hybridizing to at least about 60%, about 70%, about 80%, about 90% or about 95% of the sequences within the database.
Another aspect of the invention relates to an oligonucleotide probe comprising a detection element and a recognition segment each independently having a length of about 1 to 8 or 9 nucleotides, wherein some or all of the nucleotides in the oligonucleotides are substituted by non-natural bases or base analogues having the effect of increasing binding affinity compared to natural nucleobases, and/or some or all of the nucleotide units of the oligonucleotide probe are modified with a chemical moiety or replaced by an analogue to increase binding affinity, and/or where said oligonucleotides are modified with a chemical moiety or are oligonucleotide analogues to increase binding affinity, such that the probe has sufficient stability for binding to the target sequence under conditions suitable for detection, and wherein the probe is capable of detecting more than one complementary target in a target population of nucleic acids.
A preferred embodiment of the invention is a kit for the characterization or detection or quantification of target nucleic acids comprising samples of a library of multi-probes. In one aspect, the kit comprises in silico protocols for their use. In another aspect, the kit comprises information relating to suggestions for obtaining inexpensive DNA primers. The probes contained within these kits may have any or all of the characteristics described above. In one preferred aspect, a plurality of probes comprises at least one stabilizing nucleotide, such as an LNA nucleotide. In another aspect, the plurality of probes comprises a nucleotide coupled to or stably associated with at least one chemical moiety for increasing the stability of binding of the probe. In a further preferred aspect, the kit comprises about 100 different probes. The kits according to the invention allow a user to quickly and efficiently develop an assay for thousands of different nucleic acid targets.
The invention further provides a multi-probe comprising one or more LNA nucleotides, which has a reduced length of about 8 or 9 nucleotides. By selecting commonly occurring 8- and 9-mers as targets it is possible to detect many different genes with the same probe. Each 8- or 9-mer probe can be used to detect more than 7000 different human mRNA sequences. The necessary specificity is then ensured by the combined effect of inexpensive DNA primers for the target gene and by the 8- or 9-mer probe sequence targeting the amplified DNA (Fig. 1).
In a preferred embodiment the present invention relates to an oligonucleotide multi-probe library comprising LNA-substituted octamers and nonamers of less than about 1000 sequences, preferably less than about 500 sequences, or more preferably less than about 200 sequences, such as consisting of about 100 different sequences selected so that the library is able to recognize more than about 90%, more preferably more than about 95% and most preferably more than about 98% of mRNA sequences of a target organism or target organ.
Positive control samples:
A recurring problem in designing real-time PCR detection assays for multiple genes is that the success rate of these de novo designs is less than 100%. Troubleshooting a non-functional assay can be cumbersome since, ideally, a target-specific template is needed for each probe to test the functionality of the detection probe. Furthermore, a target-specific template can be useful as a positive control if it is unknown whether the target is available in the test sample. When operating with a limited number of detection probes in a probe library kit as described in the present invention (e.g. 90), it is feasible to also provide positive control targets in the form of PCR-amplifiable templates containing all possible targets for the limited number of probes (e.g. 90). This feature allows users to evaluate the function of each probe, and is not feasible for non-recurring probe-based assays, and thus constitutes a further beneficial feature of the invention. For the suggested preferred probe recognition sequences listed in Fig. 13, we have designed concatamers of control sequences for all probes, containing a PCR-amplifiable target for every probe in the first 40 probes.
Probe sequence selection
An important aspect of the present invention is the selection of optimal probe target sequences in order to target as many targets with as few probes as possible, given a set of target selection criteria. This may be achieved by deliberately selecting target sequences that occur more frequently than would be expected from a random distribution.
The invention therefore relates in one aspect to a method of selecting oligonucleotide sequences useful in a multi-probe library of the invention, the method comprising
a) providing a first list of all possible oligonucleotides of a predefined number of nucleotides, N (typically an integer selected from 6, 7, 8, 9, 10, 11, and 12, preferably 8 or 9), said oligonucleotides having a melting temperature, Tm, of at least 50 °C (preferably at least 60 °C, such as at least 62 °C),
b) providing a second list of target nucleic acid sequences (such as a list of a target nucleic acid population discussed herein),
c) identifying and storing, for each member of said first list, the number of members from said second list which include a sequence complementary to said each member,
d) selecting a member of said first list which, in the identification in step c, matches the maximum number, identified in step c, of members from said second list,
e) adding the member selected in step d to a third list consisting of the selected oligonucleotides useful in the library according to the invention,
f) subtracting the member selected in step d from said first list to provide a revised first list,
m) repeating steps d through f until said third list consists of members which together will be complementary to at least 30% of the members on the list of target nucleic acid sequences from step b (normally the percentage will be higher, such as at least 40%, at least 50%, at least 60%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, or even higher such as at least 97%, at least 98% and even as high as at least 99%).
As a further feature, the method has a bias against including a member in the third list that has a 5' guanidyl (G) and/or a bias against including members in the third list that have a 3' guanidyl (G). This is the consequence of the surprising finding that the probes of the present invention are by far more effective in assays when they are free from a 5' guanidyl residue, but it has also been shown that omission of 3' guanidyl provides for advantages under assay conditions.
So, it is preferred that guanidyl is avoided as the 5' residue in all oligonucleotide sequences in said third list. It is also preferred that the first list only includes oligonucleotides incapable of self-hybridization, in order to render a subsequent use of the probes less prone to false positives.
The selection method may include a number of steps after step f, but before step m:
g) subtraction of all members from said second list which include a sequence complementary to the member selected in step d to obtain a revised second list,
h) identification and storing of, for each member of said revised first list, the number of members from said revised second list which include a sequence complementary to said each member,
i) selecting a member of said first list which, in the identification in step h, matches the maximum number, identified in step h, of members from said second list, or selecting a member of said first list which provides the maximum number obtained by multiplying the number identified in step h with the number identified in step c,
j) addition of the member selected in step i to said third list,
k) subtraction of the member selected in step i from said revised first list, and
l) subtraction of all members from said revised second list which include a sequence complementary to the member selected in step i.
The above-mentioned avoidance of guanidyl as the 5' residue is preferably achieved by i) reducing the list of step a to include only those that do not include a 5' guanidyl residue, and/or ii) avoiding selection in step d and/or i of those sequences which include a 5' guanidyl residue, and/or iii) omitting step e and/or j for those sequences that include a 5' guanidyl residue.
The selection in step d after step c is conveniently preceded by identification of those members of said first list which hybridize to more than a selected percentage (60% or higher, such as the preferred 80%) of the maximum number of members from said second list, so that only those members so identified are subjected to the selection in step d.
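The iterative selection of steps a through m, including the optional subtraction of already covered targets (steps g through l) and the exclusion of candidates with a 5' guanidyl residue, can be sketched as a greedy coverage loop. This is a minimal sketch under stated assumptions, not the source code of Fig. 17: the function names are hypothetical, the Tm and self-hybridization filters of step a are omitted, targets are plain lowercase strings, and each iteration simply takes the candidate covering the most remaining targets.

```python
from itertools import product

COMPLEMENT = str.maketrans("acgt", "tgca")


def reverse_complement(seq):
    """Reverse complement of a lowercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def select_probes(targets, n=8, coverage=0.9, avoid_5prime_g=True):
    """Greedily pick n-mer probes until `coverage` of the targets is reached.

    Returns (selected probes, fraction of targets covered).
    """
    if not targets:
        return [], 0.0
    # Step a: first list of all possible n-mers (Tm filter omitted here).
    candidates = ["".join(p) for p in product("acgt", repeat=n)]
    if avoid_5prime_g:
        candidates = [c for c in candidates if not c.startswith("g")]
    # Steps b-c: for each candidate, the targets containing its complement.
    hits = {c: {i for i, t in enumerate(targets)
                if reverse_complement(c) in t} for c in candidates}
    remaining = set(range(len(targets)))   # revised second list
    selected = []                          # third list
    covered = 0
    while candidates and covered < coverage * len(targets):
        # Steps d / i: candidate matching the most remaining targets.
        best = max(candidates, key=lambda c: len(hits[c] & remaining))
        gained = hits[best] & remaining
        if not gained:
            break                          # nothing left that can be covered
        selected.append(best)              # steps e / j
        candidates.remove(best)            # steps f / k
        remaining -= gained                # steps g / l
        covered += len(gained)
    return selected, covered / len(targets)
```

For example, `select_probes(["aaacccg", "tttaaac", "gggcccc", "acgtacg"], n=3, coverage=1.0)` first picks `ttt`, which is complementary to a site in the first two targets, and then needs one further probe for each of the remaining two. Enumerating all 4^n candidates is practical only for short n; for N = 8 or 9 against a real transcriptome an index of observed n-mers would be the more usual design.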
The method of the invention can also include the feature that it is ensured that members are not entered on the third list if such members have previously failed qualitatively as useful probes. Or, in simpler terms, after design of a library, the individual members are tested for their usefulness, and probes which are found to behave suboptimally in a relevant assay are included in a 'negative list' which is checked when later designing new probes and probe libraries. To avoid inclusion in the third list of oligonucleotide sequences that have previously failed qualitatively, it is possible to i) reduce the list of step a to include only those that have not previously failed qualitatively, and/or ii) avoid selection in step d or i of those sequences that have previously failed qualitatively, and/or iii) omit step e or j for those sequences that have previously failed qualitatively.
In the practical implementation of the selection method, said first, second and third lists are stored in the memory of a computer system, preferably in a database. The memory (also termed "computer readable medium") can be both volatile and non-volatile, i.e. any memory device conventionally used in computer systems: a random access memory (RAM), a read-only memory (ROM), a data storage device such as a hard disk, a CD-ROM, DVD-ROM, and any other known memory device.
The invention also provides a computer program product providing instructions for implementing the selection method, embedded in a computer-readable medium (defined as above). That is, the computer program may be compiled and loaded in an active computer memory, or it may be loaded on a non-volatile storage device (optionally in a compressed format) from where it can be executed. Consequently, the invention also includes a system comprising a database of target sequences and an application program for executing the computer program. A source code for such a computer program is set forth in Fig. 17.
In a randomly distributed nucleic acid population, the occurrence of selected sequences of a given length will follow a statistical distribution defined by:
$N_1$ = the complete length of the given nucleic acid population (e.g. 76,002,917 base pairs as in the June 30, 2003 release of RefSeq).
$N_2$ = the number of fragments comprising the nucleic acid population (e.g. 38,556 genes in the June 30, 2003 release of RefSeq).
$N_3$ = the length of the recognition sequence (e.g. 9 base pairs).
$N_4$ = the occurrence frequency:
$$N_4 = \frac{N_1 - \left((N_3 - 1) \times 2 \times N_2\right)}{4^{N_3}}$$
E.g.
$$\frac{76{,}002{,}917 - 8 \times 2 \times 38{,}556}{4^{9}} \approx 287 \text{ occurrences of 9-mer sequences}$$
or
$$\frac{76{,}002{,}917 - 7 \times 2 \times 38{,}556}{4^{8}} \approx 1{,}151 \text{ occurrences of 8-mer sequences}$$
Hence, as described in the example given above, a random 8-mer and 9-mer sequence would on average occur 1,151 and 287 times, respectively, in a random population of the described 38,556 mRNA sequences.
In the example above, the 76,002,917 base pairs originating from 38,556 genes would correspond to an average transcript length of 1971 bp, each transcript containing 1971 - 16 = 1955 9-mer target sequences. Thus, as a statistical minimum, 38,556/(1955/287), or approximately 5671, 9-mer probes would be needed for one probe to target each gene.
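The arithmetic of this example can be reproduced with a few lines of code. The sketch below is illustrative only: the function and variable names are hypothetical, and the final figure assumes the grouping 38,556/(1955/N4) with the unrounded value of N4, which is the reading that reproduces the quoted 5671.

```python
def expected_occurrences(total_bases, n_fragments, k):
    """N4 = (N1 - (N3 - 1) * 2 * N2) / 4**N3 for a random population."""
    return (total_bases - (k - 1) * 2 * n_fragments) / 4 ** k


N1 = 76_002_917   # total bases in the population (RefSeq example above)
N2 = 38_556       # number of sequences (genes)

occ9 = expected_occurrences(N1, N2, 9)   # about 287.6
occ8 = expected_occurrences(N1, N2, 8)   # about 1151.5

avg_len = N1 / N2                        # about 1971 bp per transcript
sites_per_transcript = 1971 - 16         # 1955, as in the text

# Assumed grouping: genes / (sites per transcript / occurrences per 9-mer)
probes_needed = N2 / (sites_per_transcript / occ9)

print(int(occ9), int(occ8), int(avg_len), round(probes_needed))
```

The text quotes the truncated values 287 and 1,151; the unrounded results are slightly higher.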
However, the occurrence of 9-mer sequences is not randomly distributed. In fact, a small subset of sequences occurs at surprisingly high prevalence, up to over 30 times the prevalence anticipated from a random distribution. In a specific target population selected according to preferred criteria, the most common sequences should preferably be selected to increase the coverage of a selected library of probe target sequences. As described previously, selection should be step-wise, such that the selection of the most common target sequences is evaluated both in the starting target population and in the population remaining after each selection step.
In a preferred embodiment of the invention the targets for the probe library are the entire expressed transcriptome.
Because the success rate of the reverse transcriptase reaction diminishes with the distance from the RT-primer used, and since using a poly-T primer targeting the poly-A tract in mRNAs is common, the above-mentioned target can further be restricted to only include the 1000 most proximal bases in each mRNA. This may result in the selection of another set of optimal probe target sequences for optimal coverage.
Likewise the above-mentioned target may be restricted to include only the 50 bp of coding region sequence flanking the introns of a gene, to ensure assays that preferably only monitor mRNA and not genomic DNA; or to only include regions not containing di-, tri- or tetra-nucleotide repeat sequences, to avoid repetitive binding of probes or primers; or to only include regions not containing known allelic variation, to avoid primer or probe mis-annealing due to sequence variations in target sequences; or to exclude regions of extremely high GC content, to avoid inhibition of PCR amplification.
The optimal set of probes may vary with each target selection, depending on the prevalence of target sequences in that selection.
Examples of probe libraries
Human genomic: A set of genomic sequences can be extracted from a genome, such as the human genome, by dividing the genomic sequence into pieces of 500 nucleotides in length. Such a Probe Library can be used to measure any genomic sequence, including regulatory sequences, introns, repetitive sequences and other genomic sequences. The following library has been identified by means of the methods disclosed herein, cf. Fig. 17.
Table of oligos that are suitable for the human genome.
# no  dnaID   n  nmer       newhit  cover  sum    p   tm  sc  self  lnaID     ok  oligo
  1   18805   8  cagcctcc   9059    9059   9059   15  69  60  36    3365869   1   cAGCCTCC
  2   21671   8  cccaggct   3786    8143   12845  22  66  56  38    2543023   1   ccCAGGCT
  3   23888   8  cctcccaa   2446    8442   15291  26  63  56  8     3660644   1   cCTCCCAA
  4   54564   8  tcccagca   1858    7179   17149  30  68  58  28    7788972   1   tCCCAGCA
  5   55191   8  tcctgcct   1729    7024   18878  33  68  58  28    7798127   1   tCCTGCCT
  6   30615   8  ctctgcct   1744    4737   20622  36  65  56  28    4128111   1   cTCTGCCT
  7   64852   8  tttcccca   1820    2853   22442  39  63  54  8     8379244   1   tTTCCCCA
  8   63383   8  ttctgcct   1603    2969   24045  42  62  54  28    8322415   1   tTCTGCCT
  9   244667  9  tgtgtgtgt  1647    2570   25692  45  66  59  32    64978423  1   tGTGTGTGT
  10  21781   8  ccccaccc   1457    2710   27149  47  68  60  0     2546029   1   ccCCACCC
  11  54741   8  tccctccc   1142    2618   28291  49  63  60  0     7788397   1   tCCCtCCC
  12  20964   8  ccactgca   933     6626   29224  51  65  54  38    3563432   1   cCACTGCa
  13  32117   8  cttcctcc   1046    2428   30270  53  63  56  0     4185069   1   cTTCCTCC
  14  55157   8  tcctctcc   1084    2175   31354  55  64  58  0     7797741   1   tCCTCTCC
  15  24029   8  cctctctc   911     2335   32265  56  62  56  0     3661693   1   cCTCTCTC
  16  57172   8  tcttccca   908     2163   33173  58  62  54  8     7863148   1   tCTTCCCA
  17  57255   8  tcttggct   697     3146   33870  59  65  54  36    7863727   1   tCTTGGCT
  18  65365   8  ttttcccc   708     2604   34578  60  62  54  0     8387437   1   tTTTCCCC
19 18807 8 cagcctct 628 2511 35206 61 64 56 36 3365871 1 cAGCCTCT
20 59351 8 tgcttcct 712 2128 35918 63 62 54 28 8060783 1 tGCTTCCT
21 63380 8 ttctgcca 730 1955 36648 64 63 54 36 8322412 1 tTCTGCCA
22 24407 8 ccttccct 621 2226 37269 65 65 56 0 3668847 1 cCTTCCCT
23 56696 8 tctcctga 530 2944 37799 66 63 54 33 7855092 1 tCTCCTGA
24 57239 8 tcttgcct 636 2062 38435 67 63 54 28 7863663 1 tCTTGCCT
25 32084 8 cttcccca 593 2028 39028 68 65 56 8 4184940 1 cTTCCCCA
26 62951 8 ttcctgct 577 2011 39605 69 62 54 28 8314799 1 tTCCTGCT
27 59895 8 tggcttct 577 1892 40182 70 64 54 36 8085487 1 tGGCTTCT
28 30167 8 ctcctcct 458 2258 40640 71 62 56 0 4120431 1 cTCCTCCT
29 65108 8 tttgccca 525 1846 41165 72 65 54 33 8383340 1 tTTGCCCA
30 31639 8 ctgtgcct 452 2046 41617 73 66 56 36 4160879 1 cTGTGCCT
31 55252 8 tccttcca 457 1910 42074 74 62 54 8 7798636 1 tCCTTCCA
32 62792 8 ttcccaga 454 1831 42528 74 62 54 30 8313652 1 tTCCCAGA
33 58516 8 tgcagcca 399 1993 42927 75 65 54 38 6999404 1 tgCAGCCA
34 59323 8 tgctgtgt 396 1916 43323 76 62 54 32 8060407 1 tGCTGTGT
35 58871 8 tgccttct 359 2052 43682 76 62 54 28 8052719 1 tGCCTTCT
36 62840 8 ttccctga 398 1776 44080 77 64 54 30 8313844 1 tTCCCTGA
37 65195 8 tttggggt 421 1613 44501 78 69 54 20 8383927 1 tTTGGGGT
38 260055 9 tttcttcct 371 1733 44872 79 62 55 0 67043183 1 tTTCTTCCT
39 30551 8 ctctccct 288 2391 45160 79 62 56 0 4127599 1 cTCTCCCT
40 14715 8 atgcctgt 275 4214 45435 79 63 54 28 2055159 1 aTGCCTGT
41 56660 8 tctcccca 287 1963 45722 80 68 58 8 7854956 1 tCTCCCCA
42 59381 8 tgctttcc 324 1689 46046 81 63 54 28 8060909 1 tGCTTTCC
43 229239 9 tctttctct 300 1731 46346 81 62 55 0 62913519 1 tCTTTCTCT
44 59348 8 tgcttcca 296 1711 46642 82 64 54 28 8060780 1 tGCTTCCA
45 59892 8 tggcttca 286 1703 46928 82 66 54 36 8085484 1 tGGCTTCA
46 59320 8 tgctgtga 287 1603 47215 83 64 54 32 8060404 1 tGCTGTGA
47 30021 8 ctcccacc 216 3033 47431 83 67 60 8 4119341 1 cTCCCACC
48 30887 8 ctgaggct 217 1972 47648 83 66 56 36 4148655 1 cTGAGGCT
49 55176 8 tcctgaga 243 1668 47891 84 64 54 36 7798068 1 tCCTGAGA
50 15083 8 atggtggt 196 2182 48087 84 65 54 10 2060215 1 aTGGTGGT
51 57063 8 tctgtgct 238 1644 48325 85 63 54 36 7860143 1 tCTGTGCT
52 63399 8 ttctggct 214 1766 48539 85 62 54 36 8322479 1 tTCTGGCT
53 54655 8 tccccttt 204 1753 48743 85 63 54 0 7789567 1 tCCCCTTT
54 31368 8 ctgggaga 172 2023 48915 86 65 54 22 3108148 1 ctGGGAGA
55 55289 8 tcctttgc 190 1750 49105 86 64 54 28 7798773 1 tCCTTTGC
56 259575 9 tttccttct 199 1627 49304 86 62 55 0 67035119 1 tTTCCTTCT
57 57317 8 tctttgcc 196 1600 49500 87 64 54 28 7864237 1 tCTTTGCC
58 30612 8 ctctgcca 164 1806 49664 87 66 56 36 4128108 1 cTCTGCCA
59 61087 8 tgtggctt 180 1569 49844 87 65 54 36 8121727 1 tGTGGCTT
60 53855 8 tcagcctt 155 1798 49999 88 62 54 36 7760767 1 tCAGCCTT
61 58877 8 tgcctttc 155 1692 50154 88 63 54 28 8052733 1 tGCCTTTC
62 30164 8 ctcctcca 146 1760 50300 88 63 56 8 4120428 1 cTCCTCCA
63 244479 9 tgtggtttt 166 1450 50466 88 67 55 16 64974847 1 tGTGGTTTT
64 58751 8 tgcccttt 151 1472 50617 89 64 54 28 8051711 1 tGCCCTTT
65 261495 9 ttttcctct 164 1261 50781 89 62 55 0 67099631 1 tTTTCCTCT
66 260085 9 tttctttcc 143 1379 50924 89 62 55 0 67043309 1 tTTCTTTCC
67 259935 9 tttctcctt 140 1356 51064 89 62 55 0 67042175 1 tTTCTCCTT
68 251901 9 ttccttttc 145 1239 51209 90 62 55 0 66519037 1 tTCCTTTTC
69 65191 8 tttgggct 136 1289 51345 90 68 54 36 8383919 1 tTTGGGCT
70 58868 8 tgccttca 123 1578 51468 90 64 54 28 8052716 1 tGCCTTCA
71 4583 8 acactgct 122 1495 51590 90 63 54 36 1466287 1 aCACTGCT
72 227199 9 tctctcttt 116 1652 51706 91 62 55 0 62847999 1 tCTCTCTTT
73 31300 8 ctggcaca 113 1487 51819 91 65 54 38 4156200 1 cTGGCACa
74 59901 8 tggctttc 113 1456 51932 91 64 54 36 8085501 1 tGGCTTTC
75 19796 8 catcccca 110 1496 52042 91 64 56 16 3398508 1 cATCCCCA
76 24039 8 cctctgct 100 1949 52142 91 64 56 28 3661743 1 cCTCTGCT
77 10199 8 agcttcct 95 1717 52237 91 62 54 38 1769327 1 aGCTTCCT
78 61112 8 tgtggtga 99 1540 52336 92 66 54 12 8121844 1 tGTGGTGA
79 58543 8 tgcaggtt 106 1381 52442 92 64 54 38 8048063 1 tGCAGGTT
80 22493 8 cccttctc 90 1719 52532 92 63 56 0 3604349 1 cCCTTCTC
81 61397 8 tgtttccc 92 1538 52624 92 62 54 14 8126317 1 tGTTTCCC
82 59256 8 tgctctga 95 1423 52719 92 64 54 36 8059892 1 tGCTCTGA
83 7911 8 actgtgct 93 1413 52812 92 64 54 36 1568687 1 aCTGTGCT
84 10196 8 agcttcca 91 1426 52903 93 63 54 38 1769324 1 aGCTTCCA
85 251895 9 ttcctttct 82 1411 52985 93 62 55 0 66519023 1 tTCCTTTCT
86 63867 8 ttgcctgt 81 1506 53066 93 62 54 28 8346615 1 tTGCCTGT
87 7655 8 actctgct 86 1260 53152 93 63 54 28 1564591 1 aCTCTGCT
88 234487 9 tgcatttct 84 1242 53236 93 62 55 38 64389103 1 tGCATTTCT
89 64119 8 ttggctct 75 1425 53311 93 62 54 36 8350703 1 tTGGCTCT
90 59284 8 tgctgcca 71 1512 53382 93 67 54 38 7011692 1 tgCTGCCA
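The newhit, cover and sum columns in the table above are consistent with a greedy selection in which each successive oligo is the one that adds the most not-yet-covered target pieces. That reading is an interpretation of the table, not a description of the program of Fig. 17. The sketch below illustrates such a greedy pass over 500-nt genome pieces; the function names are hypothetical, only one strand is searched, and the filters on melting temperature and self-complementarity are omitted.

```python
def split_genome(genome, piece_len=500):
    """Divide a genomic sequence into consecutive pieces of piece_len nt."""
    return [genome[i:i + piece_len] for i in range(0, len(genome), piece_len)]


def kmers(seq, k):
    """Set of all k-mers occurring in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def greedy_library(pieces, k=8, size=90):
    """Greedily pick up to `size` k-mers, each maximising newly covered pieces.

    Returns rows of (kmer, newhit, cover, cumulative_sum).
    """
    # Map every k-mer to the set of pieces (by index) that contain it.
    index = {}
    for i, piece in enumerate(pieces):
        for kmer in kmers(piece.lower(), k):
            index.setdefault(kmer, set()).add(i)

    covered, rows = set(), []
    while index and len(rows) < size:
        # Ties are broken alphabetically so that the result is deterministic.
        best = max(index, key=lambda kmer: (len(index[kmer] - covered), kmer))
        hits = index.pop(best)
        newhit = len(hits - covered)
        if newhit == 0:
            break  # no remaining k-mer covers anything new
        covered |= hits
        rows.append((best, newhit, len(hits), len(covered)))
    return rows
```

For example, `greedy_library(["aaaacc", "aaaagg", "ttttcc"], k=4)` first picks `aaaa`, which covers two pieces, and then a k-mer from the third piece.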
Bacteria: 199 bacterial and archaeal genomes can be downloaded from NCBI: ftp.ncbi.nih.gov. The genomes can be classified according to their use of nucleotides. Nucleotide use is even if every nucleotide (a, c, g, t) is used 25% of the time. Deviation from even usage can, for example, be taken as any usage that differs from this by more than 3%. Following this criterion, the 199 genomes divide into: 91 AT rich, 44 GC rich, 28 with no >3% skewness, 21 A rich, and 15 in other categories.
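The classification just described can be sketched as follows. The text gives only the 25% baseline and the 3% deviation; how over-used nucleotides map onto the named categories is an assumption made for this illustration (a genome is labelled by the set of nucleotides used more than 28% of the time), and the function names are hypothetical.

```python
from collections import Counter


def nucleotide_usage(genome):
    """Return the fraction of a, c, g and t in the genome."""
    counts = Counter(base for base in genome.lower() if base in "acgt")
    total = sum(counts.values())
    if total == 0:
        raise ValueError("genome contains no a, c, g or t")
    return {base: counts[base] / total for base in "acgt"}


def classify(genome, threshold=0.03):
    """Label a genome by which nucleotides exceed even usage by > threshold."""
    usage = nucleotide_usage(genome)
    deviating = {b for b, f in usage.items() if abs(f - 0.25) > threshold}
    if not deviating:
        return "no >3% skewness"
    over_used = frozenset(b for b in deviating if usage[b] > 0.25)
    # Assumed mapping from the over-used set to the categories named in the text.
    labels = {
        frozenset("at"): "AT rich",
        frozenset("cg"): "GC rich",
        frozenset("a"): "A rich",
    }
    return labels.get(over_used, "other")
```

Applying `classify` to each downloaded genome and counting the labels would give a breakdown of the same kind as the one quoted above, although the exact counts depend on the category definitions actually used.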
Bacteria can be highly AT rich. This explains why probes from a human probe library do not give good coverage. Designing probes for an AT rich organism is a challenge because of the low melting temperature. The probes must be longer to achieve the required melting temperature, but this lowers the coverage. A probe library for mainly AT rich genomes is given in the following "bacteria table" (also identified by means of the program set forth in Fig. 17).
# no dnaID n nmer newhit cover sum p tm sc self lnaID ok oligo
1 64235 8 ttggtggt 15138 15138 15138 5 64 54 12 8351671 1 tTGGTGGT
2 63976 8 ttgctgga 12289 13631 27427 10 68 54 36 8347572 1 tTGCTGGA
3 228852 9 tcttcttca 11067 12888 38494 14 63 55 8 62906348 1 tCTTCTTCA
4 64099 8 ttggcgat 10164 13063 48658 18 63 54 38 8350631 1 tTGGCGAT
5 64232 8 ttggtgga 9220 13163 57878 22 69 54 12 8351668 1 tTGGTGGA
6 63721 8 ttgatggc 8466 12948 66344 25 64 54 28 8343477 1 tTGATGGC
7 237565 9 tgctttttc 8295 12487 74639 28 66 55 28 64487421 1 tGCTTTTTC
8 62951 8 ttcctgct 7481 12549 82120 31 62 54 28 8314799 1 tTCCTGCT
9 63956 8 ttgctcca 6847 12608 88967 34 63 54 30 8347500 1 tTGCTCCA
10 228855 9 tcttcttct 6418 12133 95385 36 62 55 0 62906351 1 tCTTCTTCT
11 65369 8 ttttccgc 6217 11950 101602 38 62 54 28 8387445 1 tTTTCCGC
12 253945 9 ttcttttgc 5716 11886 107318 41 65 55 28 66584565 1 tTCTTTTGC
13 16057 8 attggtgc 5223 12364 112541 43 66 54 36 2092533 1 aTTGGTGC
14 63843 8 ttgccgat 5032 11970 117573 45 62 54 38 8346535 1 tTGCCGAT
15 53833 8 tcagcagc 4631 12189 122204 46 62 54 38 7744309 1 tCAgCAGC
16 57321 8 tctttggc 4344 12242 126548 48 66 54 28 7864245 1 tCTTTGGC
17 63380 8 ttctgcca 4173 11996 130721 50 63 54 36 8322412 1 tTCTGCCA
18 55679 8 tcgccttt 3935 11760 134656 51 62 54 28 7822335 1 tCGCCTTT
19 261961 9 tttttcagc 3809 11550 138465 53 63 55 28 67107637 1 tTTTTCAGC
20 15689 8 attccagc 3267 12463 141732 54 62 54 28 2087733 1 aTTCCAGC
21 57317 8 tctttgcc 3366 11301 145098 55 64 54 28 7864237 1 tCTTTGCC
22 64916 8 tttcgcca 3161 11512 148259 56 63 54 28 8379756 1 tTTCGCCA
23 58249 8 tgatgagc 3063 11204 151322 57 62 54 28 8027445 1 tGATGAGC
24 63717 8 ttgatgcc 2792 11450 154114 59 62 54 28 8343469 1 tTGATGCC
25 57172 8 tcttccca 2957 10260 157071 60 62 54 8 7863148 1 tCTTCCCA
26 5759 8 accgcttt 2572 11074 159643 61 65 54 28 1502207 1 aCCGCTTT
27 65209 8 tttggtgc 2413 11267 162056 62 63 54 36 8383989 1 tTTGGTGC
28 57236 8 tcttgcca 2393 10890 164449 62 65 54 36 7863660 1 tCTTGCCA
29 55796 8 tcgcttca 2299 10806 166748 63 62 54 28 7823340 1 tCGCTTCA
30 61332 8 tgttgcca 2138 11233 168886 64 64 54 36 8125804 1 tGTTGCCA
31 98292 9 cctttttca 2135 10703 171021 65 65 53 8 29360108 1 cCTTTTTCA
32 237439 9 tgcttcttt 2102 10423 173123 66 65 55 28 64486399 1 tGCTTCTTT
33 97791 9 ccttctttt 2143 9728 175266 67 64 53 0 29351935 1 cCTTCTTTT
34 65429 8 ttttgccc 1855 10845 177121 67 64 54 28 8387949 1 tTTTGCCC
35 59348 8 tgcttcca 1844 10290 178965 68 64 54 28 8060780 1 tGCTTCCA
36 98295 9 cctttttct 1911 9610 180876 69 64 53 0 29360111 1 cCTTTTTCT
37 59325 8 tgctgttc 1687 10619 182563 69 62 54 28 8060413 1 tGCTGTTC
38 63855 8 ttgccgtt 1597 10785 184160 70 62 54 28 8346559 1 tTGCCGTT
39 63959 8 ttgctcct 1691 9861 185851 71 62 54 28 8347503 1 tTGCTCCT
40 14973 8 atggcttc 1439 10673 187290 71 65 54 36 2059261 1 aTGGCTTC
41 55935 8 tcggcttt 1432 10401 188722 72 65 54 36 7826431 1 tCGGCTTT
42 15083 8 atggtggt 1394 10337 190116 72 65 54 10 2060215 1 aTGGTGGT
43 261501 9 ttttccttc 1531 9094 191647 73 62 55 0 67099645 1 tTTTCCTTC
44 58345 8 tgattggc 1286 10495 192933 73 65 54 28 8028085 1 tGATTGGC
45 40831 9 agcttcttt 1366 9482 194299 74 65 55 38 14154751 1 aGCTTCTTT
46 60409 8 tggtttgc 1221 10407 195520 74 65 54 28 8093685 1 tGGTTTGC
47 65365 8 ttttcccc 1329 9259 196849 75 62 54 0 8387437 1 tTTTCCCC
48 64932 8 tttcggca 1152 10181 198001 75 64 54 36 8379820 1 tTTCGGCA
49 32244 9 acttcttca 1206 9405 199207 76 65 55 8 12574700 1 aCTTCTTCA
50 54911 8 tccgcttt 1024 10796 200231 76 62 54 28 7793663 1 tCCGCTTT
51 64125 8 ttggcttc 1005 10701 201236 77 64 54 36 8350717 1 tTGGCTTC
52 55805 8 tcgctttc 1084 9724 202320 77 62 54 28 7823357 1 tCGCTTTC
53 57305 8 tctttcgc 958 10624 203278 77 62 54 28 7864181 1 tCTTTCGC
54 261621 9 ttttcttcc 1086 8914 204364 78 62 55 0 67100653 1 tTTTCTTCC
55 60047 8 tggggatt 1010 9349 205374 78 68 54 24 8088895 1 tGGGGATT
56 6047 8 acctgctt 922 10045 206296 78 65 54 28 1506687 1 aCCTGCTT
57 56953 8 tctgctgc 847 10447 207143 79 64 54 38 7842805 1 tCTgCTGC
58 14565 8 atgatgcc 854 10029 207997 79 63 54 28 2052013 1 aTGATGCC
59 32247 9 acttcttct 891 9333 208888 79 64 55 5 12574703 1 aCTTCTTCT
60 63969 8 ttgctgac 802 10101 209690 80 62 54 28 8347557 1 tTGCTGAC
61 253941 9 ttcttttcc 841 9306 210531 80 62 55 0 66584557 1 tTCTTTTCC
62 63465 8 ttcttggc 788 9701 211319 80 64 54 28 8322997 1 tTCTTGGC
63 65001 8 tttctggc 738 10120 212057 81 64 54 36 8380341 1 tTTCTGGC
64 131028 9 ctttttcca 776 9397 212833 81 64 53 8 33554284 1 cTTTTTCCA
65 59371 8 tgcttggt 681 10310 213514 81 65 54 36 8060855 1 tGCTTGGT
66 7805 8 actgcttc 673 10218 214187 82 64 54 28 1567741 1 aCTGCTTC
67 59856 8 tggctcaa 658 10072 214845 82 63 54 38 8085348 1 tGGCTCAA
68 86004 9 ccattttca 739 8750 215584 82 62 53 16 28573676 1 cCATTTTCA
69 63869 8 ttgccttc 626 9973 216210 82 62 54 28 8346621 1 tTGCCTTC
70 1695 8 aacggctt 637 9529 216847 83 63 54 36 1240447 1 aACGGCTT
71 59901 8 tggctttc 623 9558 217470 83 64 54 36 8085501 1 tGGCTTTC
72 65161 8 tttggagc 629 9307 218099 83 65 54 28 8383797 1 tTTGGAGC
73 8057 8 acttctgc 592 9719 218691 83 64 54 28 1571829 1 aCTTCTGC
74 65449 8 ttttgggc 643 8621 219334 83 68 54 28 8388021 1 tTTTGGGC
75 228861 9 tcttctttc 632 8621 219966 84 63 55 0 62906365 1 tCTTCTTTC
76 262005 9 tttttctcc 652 8108 220618 84 62 55 0 67107821 1 tTTTTCTCC
77 5369 8 accattgc 523 9894 221141 84 63 54 36 1495029 1 aCCATTGC
78 60395 8 tggttggt 511 9801 221652 84 66 54 24 8093623 1 tGGTTGGT
79 62969 8 ttccttgc 577 8431 222229 85 62 54 28 8314869 1 tTCCTTGC
80 58341 8 tgattgcc 485 9841 222714 85 63 54 28 8028077 1 tGATTGCC
81 8009 8 acttcagc 483 9585 223197 85 62 54 33 1571637 1 aCTTCAGC
82 61341 8 tgttgctc 475 9458 223672 85 62 54 28 8125821 1 tGTTGCTC
83 55289 8 tcctttgc 519 8560 224191 85 64 54 28 7798773 1 tCCTTTGC
84 61413 8 tgtttgcc 481 8906 224672 86 63 54 28 8126381 1 tGTTTGCC
85 261757 9 ttttgcttc 455 9306 225127 86 65 55 28 67103741 1 tTTTGCTTC
86 65179 8 tttggcgt 428 9562 225555 86 64 54 36 8383863 1 tTTGGCGT
87 122877 9 ctctttttc 479 8379 226034 86 62 53 0 33030141 1 cTCTTTTTC
88 59381 8 tgctttcc 462 8539 226496 86 63 54 28 8060909 1 tGCTTTCC
89 257917 9 ttgttcttc 471 8098 226967 86 62 55 17 66845693 1 tTGTTCTTC
90 60392 8 tggttgga 429 8764 227396 87 70 54 20 8093620 1 tGGTTGGA
Selection of detection means and identification of single nucleic acids
Another part of the invention relates to identification of a means for detection of a target nucleic acid, the method comprising
A) inputting, into a computer system, data that uniquely identifies the nucleic acid sequence of said target nucleic acid, wherein said computer system comprises a database holding information of the composition of at least one library of nucleic acid probes of the invention, and wherein the computer system further comprises a database of target nucleic acid sequences for each probe of said at least one library and/or further comprises means for acquiring and comparing nucleic acid sequence data,
B) identifying, in the computer system, a probe from the at least one library, wherein the sequence of the probe exists in the target nucleic acid sequence or a sequence complementary to the target nucleic acid sequence,
C) identifying, in the computer system, a primer that will amplify the target nucleic acid sequence, and
D) providing, as identification of the specific means for detection, an output that points out the probe identified in step B and the sequences of the primers identified in step C.
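Step B of this method amounts to a substring search on both strands of the target. The following is a minimal sketch under that reading; the function names and the example target sequences are hypothetical, and primer identification (step C) is not attempted here.

```python
def revcomp(seq):
    """Reverse complement of a DNA sequence (lower-case a, c, g, t)."""
    return seq.lower().translate(str.maketrans("acgt", "tgca"))[::-1]


def matching_probes(target, library):
    """Return library probes whose sequence occurs in the target or its complement."""
    target = target.lower()
    complement_strand = revcomp(target)
    hits = []
    for probe in library:
        sequence = probe.lower()
        if sequence in target or sequence in complement_strand:
            hits.append(probe)
    return hits


# The first three recognition sequences of the human table, used as a toy library.
library = ["cagcctcc", "cccaggct", "cctcccaa"]
print(matching_probes("ttgacagcctccaatg", library))  # made-up target sequence
```

Because the probes are stored as plain recognition sequences, the same lookup works for any library whose composition is known to the system, for example via a product code.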
The above-outlined method has several advantages in the event it is desired to rapidly and specifically identify a particular nucleic acid. If the researcher already has acquired a suitable multi-probe library of the invention, the method makes it possible within seconds to acquire information relating to which of the probes in the library one should use for a subsequent assay, and which primers one should synthesize. The time factor is important, since synthesis of a primer pair can be accomplished overnight, whereas synthesis of the probe would normally be quite time-consuming and cumbersome.
To facilitate use of the method, the probe library can be identified (e.g. by means of a product code which essentially tells the computer system how the probe library is composed).
Step A then comprises inputting, into the computer system, data that identifies the at least one library of nucleic acids from which it is desired to select a member for use in the specific means for detection.
The preferred inputting interface is an internet-based web-interface, because the method is conveniently stored on a web server to allow access from users who have acquired a probe library of the present invention. However, the method also would be useful as part of an installable computer application, which could be installed on a single computer or on a local area network.
In preferred embodiments of this method, the primers identified in step C are chosen so as to minimize the chance of amplifying genomic nucleic acids in a PCR reaction.
This is of course only relevant where the sample is likely to contain genomic material. One simple way to minimize the chance of amplification of genomic nucleic acids is to include, in at least one of the primers, a nucleotide sequence which in genomic DNA is interrupted by an intron. In this way, the primer will only prime amplification of transcripts where the intron has been spliced out.
Alternatively, one can choose primer pairs that cannot amplify genomic DNA or other transcripts. Such primers can be identified by doing a computerized search with the primers against the genome and transcriptome, i.e. an in silico PCR. Such a search must find and filter primer pairs where the left and right primer can match the DNA within the distance of a typical amplicon length, which can be 600 nucleotides or several thousand nucleotides. The left and right primer can match in four different ways:
1: The left primer and the reverse complement of the right primer.
2: The left primer and the reverse complement of the left primer.
3: The right primer and the reverse complement of the left primer.
4: The right primer and the reverse complement of the right primer.
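The four match combinations can be checked with a small exact-match in silico PCR. This is a sketch only: the function names are hypothetical, mismatches are not tolerated, and a real search would run against an indexed genome and transcriptome rather than a single string.

```python
def revcomp(seq):
    """Reverse complement of a DNA sequence (lower-case a, c, g, t)."""
    return seq.lower().translate(str.maketrans("acgt", "tgca"))[::-1]


def find_all(text, pattern):
    """Start positions of every (possibly overlapping) occurrence of pattern."""
    positions, start = [], text.find(pattern)
    while start != -1:
        positions.append(start)
        start = text.find(pattern, start + 1)
    return positions


def in_silico_pcr(template, left, right, max_len=600):
    """List (forward, reverse, start, end) for every predicted product.

    A product is predicted when one primer matches the template as written
    and the reverse complement of a primer matches downstream of it, with
    the whole span no longer than max_len.
    """
    template = template.lower()
    primers = {"left": left.lower(), "right": right.lower()}
    products = []
    for fwd_name, fwd in primers.items():          # primer matching as written
        for rev_name, rev in primers.items():      # primer matching as revcomp
            rev_site = revcomp(rev)
            for start in find_all(template, fwd):
                for rev_start in find_all(template, rev_site):
                    end = rev_start + len(rev_site)
                    if rev_start >= start + len(fwd) and end - start <= max_len:
                        products.append((fwd_name, rev_name, start, end))
    return products
```

A primer pair would be rejected if this search reports any product on genomic DNA or on a transcript other than the intended target.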
A further optimization of the method is to choose the primers in step C so as to minimize the length of amplicons obtained from PCR performed on the target nucleic acid sequence. It is also preferred to select the primers so as to optimize the GC content for performing a subsequent PCR.
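One possible way to combine the two preferences is to rank candidate primer pairs first by amplicon length and then by GC content. In this sketch the 50% GC target, the tuple layout of a candidate and the function names are all assumptions made for illustration; the text does not specify how GC content is to be optimized.

```python
def gc_fraction(seq):
    """Fraction of g and c nucleotides in seq (0.0 for an empty sequence)."""
    seq = seq.lower()
    return (seq.count("g") + seq.count("c")) / len(seq) if seq else 0.0


def rank_primer_pairs(candidates, target_gc=0.5):
    """Sort (left, right, amplicon_length) tuples, best candidate first.

    Shorter amplicons come first; ties are broken by how far the two
    primers deviate from the assumed target GC fraction.
    """
    def score(candidate):
        left, right, amplicon_length = candidate
        gc_deviation = (abs(gc_fraction(left) - target_gc)
                        + abs(gc_fraction(right) - target_gc))
        return (amplicon_length, gc_deviation)

    return sorted(candidates, key=score)
```

Ranking by a tuple keeps the two criteria separate, so the GC term only decides between pairs that give amplicons of equal length.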
As for the probe selection method, the selection method for detection means can be provided to the end-user as a computer program product providing instructions for implementing the method, embedded in a computer-readable medium. Consequently, the invention also provides for a system comprising a database of nucleic acid probes of the invention and an application program for executing this computer program.
The method, the computer programs and the system allow for quantitative or qualitative determination of the presence of a target nucleic acid in a sample, comprising
i) identifying, by means of the detection means selection method of the invention, a specific means for detection of the target nucleic acid, where the specific means for detection comprises an oligonucleotide probe and a set of primers,
ii) obtaining the primers and the oligonucleotide probe identified in step i),
iii) subjecting the sample to a molecular amplification procedure in the presence of the primers and the oligonucleotide probe from step ii), and
iv) determining the presence of the target nucleic acid based on the outcome of step iii).
Conveniently, primers obtained in step ii) are obtained by synthesis and it is preferred that the oligonucleotide probe is obtained from a library of the present invention.
The molecular amplification method is typically a PCR or a NASBA procedure, but any in vitro method for specific amplification (and, possibly, detection) of a nucleic acid is useful. The preferred PCR procedure is a qPCR (also known as real-time reverse transcription PCR or kinetic RT-PCR).
Other aspects of the invention are discussed infra.
BRIEF DESCRIPTION OF THE DRAWINGS
Fig. 1 illustrates the use of conventional long probes in panel (A) as well as the properties and use of short multi-probes (B) from a library constructed according to the invention. The short multi-probes comprise a recognition segment chosen so that each probe sequence may be used to detect and/or quantify several different target sequences comprising the complementary recognition sequence. Fig. 1A shows a method according to the prior art. Fig. 1B shows a method according to one aspect of the invention.
Fig. 2 is a flow chart showing a method for designing multi-probe sequences for a library according to one aspect of the invention. The method can be implemented by executing instructions provided by a computer program embedded in a computer readable medium. In one aspect, the program instructions are executed by a system, which comprises a database of sequences such as expressed sequences.
Fig. 3 is a graph illustrating the redundancy of probes targeting each gene within a 100-probe library according to one aspect of the invention. The y-axis shows the number of genes in the human transcriptome that are targeted by different numbers of probes in the library. It is apparent that a majority of all genes are targeted by several probes. The average number of probes per gene is 17.4.
Fig. 4 shows the theoretical coverage of the human transcriptome by a selection of hyper-abundant oligonucleotides of a given length. The graphs show the percentage of approximately 38,000 human mRNA sequences that can be detected by an increasing number of well-chosen short multi-probes of different length. The graph illustrates the theoretical coverage of the human transcriptome by optimally chosen (i.e. hyper-abundant, non-self complementary and thermally stable) short multi-probes of different lengths. The Homo sapiens transcriptome sequence was obtained from European Bioinformatics Institute (EMBL-EBI). A region of 1000 nt proximal to the 3' end of each mRNA sequence was used for the analysis (from 50 nt to 1050 nt upstream from the 3' end). As each sequence is amplified by PCR, both strands of the amplified duplex were considered valid targets for multi-probes in the probe library. Probe sequences that even with LNA substitutions have inadequate Tm, as well as self-complementary probe sequences, are excluded.
Fig. 5 shows the MALDI-MS spectrum of the oligonucleotide probe EQ13992, showing [M-H]- = 4121.3 Da.
Fig. 6 shows representative real time PCR curves for 9-mer multi-probes detecting target sequences in a dual labelled probe assay. Results are from real time PCR reactions with 9 nt long LNA enhanced dual labelled probes targeting different 9-mer sequences within the same gene. Each of the three different dual labelled probes was analysed in PCRs generating the 469, the 570 or the 671 SSA4 amplicons (each between 81 and 95 nt long). Dual labelled probes 469, 570, and 671 are shown in panels a, b, and c, respectively. Each probe only detects the amplicon it was designed to detect. The Ct values were 23.7, 23.2, and 23.4 for the dual labelled probes 469, 570, and 671, respectively. 2 x 10' copies of the SSA4 cDNA were added as template. The high similarity between results despite differences in both probe sequences and their individual primer pairs indicates that the assays are very robust.
Fig. 7 shows examples of real time PCR curves for Molecular Beacons with a 9-mer and a 10-mer recognition site. Panel (A): Molecular beacon probe with a 10-mer recognition site detecting the 469 SSA4 amplicon. Signal was only obtained in the sample where SSA4 cDNA was added (2 x 10' copies). A Ct value of 24.0 was obtained. A similar experiment with a molecular beacon having a 9-mer recognition site detecting the 570 SSA4 amplicon is shown in panel (B). Signal was only obtained when SSA4 cDNA was added (2 x 10' copies).
Fig. 8 shows an example of a real time PCR curve for a SYBR-probe with a 9-mer recognition site targeting the 570 SSA4 amplicon. Signal was only obtained in the sample where SSA4 cDNA was added (2 x 10' copies), whereas no signal was detected without addition of template.
Fig. 9 shows a calibration curve for three different 9-mer multi-probes using a dual labelled probe assay principle. Detection of different copy number levels of the SSA4 cDNA by the three dual labelled probes is shown. The threshold cycle number defines the cycle number at which signal was first detected for the respective PCR. Slope (a) and correlation coefficients (R2) of the three linear regression lines are: a = -3.456 & R2 = 0.9999 (Dual-labelled-469), a = -3.468 & R2 = 0.9981 (Dual-labelled-570), and a = -3.499 & R2 = 0.9993 (Dual-labelled-671).
Fig. 10 shows the use of 9-mer dual labelled multi-probes to quantify a heat shock protein before and after exposure to heat shock in a wild type yeast strain as well as a mutant strain where the corresponding gene has been deleted. Real time detection of SSA4 transcript levels in wild type (wt) yeast and in the SSA4 knockout mutant with the Dual-labelled-570 probe is shown. The different strains were either cultured at 30 °C till harvest (- HS) or exposed to 40 °C for 30 minutes prior to harvest (+ HS). The transcript was only detected in the wt strain, where it was most abundant in the + HS culture. Ct values were 26.1 and 30.3 for the + HS and the - HS culture, respectively.
Fig. 11 shows an example of how more than one gene can be detected by the same 9-mer probe while nucleic acid molecules without the probe target sequence (i.e. complementary to the recognition sequence) will not be detected. In (a) Dual-labelled-469 detects both the SSA4 (469 amplicon) and the POL5 transcript with Ct values of 29.7 and 30.1, respectively. No signal was detected from the APG9 and HSP82 transcripts. In (b) Dual-labelled-570 detects both the SSA4 (570 amplicon) and the APG9 transcript with Ct values of 31.3 and 29.2, respectively. No signal was detected from the POL5 and HSP82 transcripts. In (c) Dual-labelled-671 detects both the SSA4 (671 amplicon) and the HSP82 transcript with Ct values of 29.8 and 25.6, respectively. No signal was detected from the POL5 and APG9 transcripts.
25 57172 8 tcttccca 2957 10260 157071 60 62 54 8 7863148 1 tCTTCCCA
26 5759 8 accgcttt 2572 11074 159643 61 65 54 28 1502207 1 aCCGCTTT
27 65209 8 tttggtgc 2413 11267 162056 62 63 54 36 8383989 1 tTTGGTGC
28 57236 8 tcttgcca 2393 10890 164449 62 65 54 36 7863660 1 tCTTGCCA
29 55796 8 tcgcttca 2299 10806 166748 63 62 54 28 7823340 1 tCGCTTCA
30 61332 8 tgttgcca 2138 11233 168886 64 64 54 36 8125804 1 tGTTGCCA
31 98292 9 cctttttca 2135 10703 171021 65 65 53 8 29360108 1 cCTTTTTCA
32 237439 9 tgcttcttt 2102 10423 173123 66 65 55 28 64486399 1 tGCTTCTTT
33 97791 9 ccttctttt 2143 9728 175266 67 64 53 0 29351935 1 cCTTCTTTT
34 65429 8 ttttgccc 1855 10845 177121 67 64 54 28 8387949 1 tTTTGCCC
35 59348 8 tgcttcca 1844 10290 178965 68 64 54 28 8060780 1 tGCTTCCA
36 98295 9 cctttttct 1911 9610 180876 69 64 53 0 29360111 1 cCTTTTTCT
37 59325 8 tgctgttc 1687 10619 182563 69 62 54 28 8060413 1 tGCTGTTC
38 63855 8 ttgccgtt 1597 10785 184160 70 62 54 28 8346559 1 tTGCCGTT
39 63959 8 ttgctcct 1691 9861 185851 71 62 54 28 8347503 1 tTGCTCCT
40 14973 8 atggcttc 1439 10673 187290 71 65 54 36 2059261 1 aTGGCTTC
41 55935 8 tcggcttt 1432 10401 188722 72 65 54 36 7826431 1 tCGGCTTT
42 15083 8 atggtggt 1394 10337 190116 72 65 54 10 2060215 1 aTGGTGGT
43 261501 9 ttttccttc 1531 9094 191647 73 62 55 0 67099645 1 tTTTCCTTC
44 58345 8 tgattggc 1286 10495 192933 73 65 54 28 8028085 1 tGATTGGC
45 40831 9 agcttcttt 1366 9482 194299 74 65 55 38 14154751 1 aGCTTCTTT
46 60409 8 tggtttgc 1221 10407 195520 74 65 54 28 8093685 1 tGGTTTGC
47 65365 8 ttttcccc 1329 9259 196849 75 62 54 0 8387437 1 tTTTCCCC
48 64932 8 tttcggca 1152 10181 198001 75 64 54 36 8379820 1 tTTCGGCA
49 32244 9 acttcttca 1206 9405 199207 76 65 55 8 12574700 1 aCTTCTTCA
50 54911 8 tccgcttt 1024 10796 200231 76 62 54 28 7793663 1 tCCGCTTT
51 64125 8 ttggcttc 1005 10701 201236 77 64 54 36 8350717 1 tTGGCTTC
52 55805 8 tcgctttc 1084 9724 202320 77 62 54 28 7823357 1 tCGCTTTC
53 57305 8 tctttcgc 958 10624 203278 77 62 54 28 7864181 1 tCTTTCGC
54 261621 9 ttttcttcc 1086 8914 204364 78 62 55 0 67100653 1 tTTTCTTCC
55 60047 8 tggggatt 1010 9349 205374 78 68 54 24 8088895 1 tGGGGATT
56 6047 8 acctgctt 922 10045 206296 78 65 54 28 1506687 1 aCCTGCTT
57 56953 8 tctgctgc 847 10447 207143 79 64 54 38 7842805 1 tCTgCTGC
58 14565 8 atgatgcc 854 10029 207997 79 63 54 28 2052013 1 aTGATGCC
59 32247 9 acttcttct 891 9333 208888 79 64 55 5 12574703 1 aCTTCTTCT
60 63969 8 ttgctgac 802 10101 209690 80 62 54 28 8347557 1 tTGCTGAC
61 253941 9 ttcttttcc 841 9306 210531 80 62 55 0 66584557 1 tTCTTTTCC
62 63465 8 ttcttggc 788 9701 211319 80 64 54 28 8322997 1 tTCTTGGC
63 65001 8 tttctggc 738 10120 212057 81 64 54 36 8380341 1 tTTCTGGC
64 131028 9 ctttttcca 776 9397 212833 81 64 53 8 33554284 1 cTTTTTCCA
65 59371 8 tgcttggt 681 10310 213514 81 65 54 36 8060855 1 tGCTTGGT
66 7805 8 actgcttc 673 10218 214187 82 64 54 28 1567741 1 aCTGCTTC
67 59856 8 tggctcaa 658 10072 214845 82 63 54 38 8085348 1 tGGCTCAA
68 86004 9 ccattttca 739 8750 215584 82 62 53 16 28573676 1 cCATTTTCA
69 63869 8 ttgccttc 626 9973 216210 82 62 54 28 8346621 1 tTGCCTTC
70 1695 8 aacggctt 637 9529 216847 83 63 54 36 1240447 1 aACGGCTT
71 59901 8 tggctttc 623 9558 217470 83 64 54 36 8085501 1 tGGCTTTC
72 65161 8 tttggagc 629 9307 218099 83 65 54 28 8383797 1 tTTGGAGC
73 8057 8 acttctgc 592 9719 218691 83 64 54 28 1571829 1 aCTTCTGC
74 65449 8 ttttgggc 643 8621 219334 83 68 54 28 8388021 1 tTTTGGGC
75 228861 9 tcttctttc 632 8621 219966 84 63 55 0 62906365 1 tCTTCTTTC
76 262005 9 tttttctcc 652 8108 220618 84 62 55 0 67107821 1 tTTTTCTCC
77 5369 8 accattgc 523 9894 221141 84 63 54 36 1495029 1 aCCATTGC
78 60395 8 tggttggt 511 9801 221652 84 66 54 24 8093623 1 tGGTTGGT
79 62969 8 ttccttgc 577 8431 222229 85 62 54 28 8314869 1 tTCCTTGC
80 58341 8 tgattgcc 485 9841 222714 85 63 54 28 8028077 1 tGATTGCC
81 8009 8 acttcagc 483 9585 223197 85 62 54 33 1571637 1 aCTTCAGC
82 61341 8 tgttgctc 475 9458 223672 85 62 54 28 8125821 1 tGTTGCTC
83 55289 8 tcctttgc 519 8560 224191 85 64 54 28 7798773 1 tCCTTTGC
84 61413 8 tgtttgcc 481 8906 224672 86 63 54 28 8126381 1 tGTTTGCC
85 261757 9 ttttgcttc 455 9306 225127 86 65 55 28 67103741 1 tTTTGCTTC
86 65179 8 tttggcgt 428 9562 225555 86 64 54 36 8383863 1 tTTGGCGT
87 122877 9 ctctttttc 479 8379 226034 86 62 53 0 33030141 1 cTCTTTTTC
88 59381 8 tgctttcc 462 8539 226496 86 63 54 28 8060909 1 tGCTTTCC
89 257917 9 ttgttcttc 471 8098 226967 86 62 55 17 66845693 1 tTGTTCTTC
90 60392 8 tggttgga 429 8764 227396 87 70 54 20 8093620 1 tGGTTGGA
Selection of detection means and identification of single nucleic acids
Another part of the invention relates to identification of a means for detection of a target nucleic acid, the method comprising
A) inputting, into a computer system, data that uniquely identifies the nucleic acid sequence of said target nucleic acid, wherein said computer system comprises a database holding information of the composition of at least one library of nucleic acid probes of the invention, and wherein the computer system further comprises a database of target nucleic acid sequences for each probe of said at least one library and/or further comprises means for acquiring and comparing nucleic acid sequence data,
B) identifying, in the computer system, a probe from the at least one library, wherein the sequence of the probe exists in the target nucleic acid sequence or a sequence complementary to the target nucleic acid sequence,
C) identifying, in the computer system, a primer that will amplify the target nucleic acid sequence, and
D) providing, as identification of the specific means for detection, an output that points out the probe identified in step B and the sequences of the primers identified in step C.
The above-outlined method has several advantages when it is desired to rapidly and specifically identify a particular nucleic acid. If the researcher has already acquired a suitable multi-probe library of the invention, the method makes it possible within seconds to learn which of the probes in the library to use for a subsequent assay and which primers to synthesize. The time factor is important, since synthesis of a primer pair can be accomplished overnight, whereas synthesis of the probe would normally be quite time-consuming and cumbersome.
To facilitate use of the method, the probe library can be identified (e.g. by means of a product code which essentially tells the computer system how the probe library is composed). Step A then comprises inputting, into the computer system, data that identifies the at least one library of nucleic acids from which it is desired to select a member for use in the specific means for detection.
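The probe lookup of step B can be illustrated with a minimal sketch. It assumes exact string matching, a library held as a plain dictionary, and hypothetical names (`find_probes`, `probe_25`, `probe_26`, the example target); none of these are taken from the program of the invention. The two 8-mer sequences are borrowed from the probe table for illustration only.

```python
# Illustrative sketch of step B: which library probes occur in the target
# sequence or in its complementary strand? All names here are hypothetical.
COMPLEMENT = str.maketrans("acgt", "tgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (lower case)."""
    return seq.lower().translate(COMPLEMENT)[::-1]


def find_probes(target, library):
    """Return (probe_id, strand, position) for each probe whose sequence is
    found in the target ('+') or in its reverse complement ('-')."""
    target = target.lower()
    strands = {"+": target, "-": reverse_complement(target)}
    hits = []
    for probe_id, probe_seq in library.items():
        for strand, seq in strands.items():
            pos = seq.find(probe_seq.lower())
            if pos != -1:
                hits.append((probe_id, strand, pos))
    return hits


# Assumed toy data: two probe sequences and a made-up target.
library = {"probe_25": "tcttccca", "probe_26": "accgcttt"}
target = "ggatcttcccaagtcaaagcggtcc"
hits = find_probes(target, library)
print(hits)
```

A real implementation would also report the primers of step C; here only the probe identification is shown.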
The preferred inputting interface is an internet-based web-interface, because the method is conveniently stored on a web server to allow access from users who have acquired a probe library of the present invention. However, the method also would be useful as part of an installable computer application, which could be installed on a single computer or on a local area network.
In preferred embodiments of this method, the primers identified in step C are chosen so as to minimize the chance of amplifying genomic nucleic acids in a PCR reaction. This is of course only relevant where the sample is likely to contain genomic material. One simple way to minimize the chance of amplification of genomic nucleic acids is to include, in at least one of the primers, a nucleotide sequence which in genomic DNA is interrupted by an intron. In this way, the primer will only prime amplification of transcripts where the intron has been spliced out.
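Whether a primer covers a site that is interrupted by an intron in genomic DNA can be checked from the exon lengths alone. The sketch below is an assumption-laden illustration: the function names and the `min_overhang` parameter (the minimum number of primer nucleotides required on each side of the junction) are hypothetical and not specified in the text.

```python
from itertools import accumulate


def exon_junctions(exon_lengths):
    """Positions in the spliced transcript where one exon ends and the next begins."""
    return list(accumulate(exon_lengths))[:-1]


def spans_junction(primer_start, primer_length, exon_lengths, min_overhang=3):
    """True if the primer covers an exon-exon junction with at least
    `min_overhang` nucleotides on each side (an assumed design rule)."""
    primer_end = primer_start + primer_length
    return any(
        primer_start + min_overhang <= junction <= primer_end - min_overhang
        for junction in exon_junctions(exon_lengths)
    )


# Assumed toy transcript with three exons of 120, 80 and 200 nt.
exons = [120, 80, 200]
print(spans_junction(110, 20, exons))  # primer covering positions 110..129
print(spans_junction(10, 20, exons))   # primer entirely inside exon 1
```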
Alternatively, one can choose primer pairs that cannot amplify genomic DNA or other transcripts. Such primers can be identified by a computerized search with the primers against the genome and transcriptome, i.e. an in silico PCR. Such a search must find and filter out primer pairs where the left and right primer can match the DNA within the distance of a typical amplicon length, which can be 600 nucleotides or several thousand nucleotides. The left and right primer can match in four different ways:
1: The left primer and the reverse complement of the right primer.
2: The left primer and the reverse complement of the left primer.
3: The right primer and the reverse complement of the left primer.
4: The right primer and the reverse complement of the right primer.
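The four-way search can be sketched as follows. This is a simplified illustration that assumes exact matches only and a single linear sequence; the function names and the `max_amplicon` default of 600 nt are hypothetical choices, not the implementation used in the invention.

```python
COMPLEMENT = str.maketrans("acgt", "tgca")


def reverse_complement(seq):
    return seq.lower().translate(COMPLEMENT)[::-1]


def find_all(text, pattern):
    """All start positions of pattern in text (overlapping matches allowed)."""
    positions, start = [], text.find(pattern)
    while start != -1:
        positions.append(start)
        start = text.find(pattern, start + 1)
    return positions


def in_silico_pcr(dna, left, right, max_amplicon=600):
    """List predicted products as (forward_primer, reverse_primer, start, end).
    All four primer combinations are tested: left/right, left/left,
    right/left and right/right."""
    dna = dna.lower()
    primers = {"left": left.lower(), "right": right.lower()}
    products = []
    for fwd_name, fwd in primers.items():
        for rev_name, rev in primers.items():
            rev_site = reverse_complement(rev)  # site as seen on the forward strand
            for start in find_all(dna, fwd):
                for rev_start in find_all(dna, rev_site):
                    end = rev_start + len(rev_site)
                    downstream = rev_start >= start + len(fwd)
                    if downstream and end - start <= max_amplicon:
                        products.append((fwd_name, rev_name, start, end))
    return products


# Assumed toy sequence containing one left site and one right site.
dna = "tttt" + "acgtacgg" + "ctctctctct" + "ggatgcaa" + "tttt"
print(in_silico_pcr(dna, "acgtacgg", "ttgcatcc"))
```

A primer pair would be rejected if this search returned any product on genomic DNA or on a non-target transcript.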
A further optimization of the method is to choose the primers in step C so as to minimize the length of the amplicons obtained from PCR performed on the target nucleic acid sequence. It is also preferred to select the primers so as to optimize the GC content for performing a subsequent PCR.
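One way to combine the two criteria is to rank candidate primer pairs by amplicon length first and by GC content second. The sketch below is only an assumption about how such a ranking could look: the target GC fraction of 0.5, the tie-breaking order and all names are hypothetical.

```python
def gc_fraction(seq):
    """Fraction of G and C bases in a sequence."""
    seq = seq.lower()
    return (seq.count("g") + seq.count("c")) / len(seq)


def rank_pairs(candidates, gc_target=0.5):
    """Sort (left, right, amplicon_length) tuples: shortest amplicon first,
    ties broken by total deviation of primer GC content from `gc_target`."""
    def key(candidate):
        left, right, length = candidate
        gc_penalty = (abs(gc_fraction(left) - gc_target)
                      + abs(gc_fraction(right) - gc_target))
        return (length, gc_penalty)
    return sorted(candidates, key=key)


# Assumed toy candidates.
candidates = [
    ("acgtacgtac", "ggccggccgg", 90),
    ("acgtacgtac", "tgcatgcatg", 90),
    ("aaaaattttt", "acgtacgtac", 70),
]
for pair in rank_pairs(candidates):
    print(pair)
```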
As for the probe selection method, the selection method for detection means can be provided to the end-user as a computer program product providing instructions for implementing the method, embedded in a computer-readable medium. Consequently, the invention also provides for a system comprising a database of nucleic acid probes of the invention and an application program for executing this computer program.
The method and the computer programs and system allow for quantitative or qualitative determination of the presence of a target nucleic acid in a sample, comprising
i) identifying, by means of the detection means selection method of the invention, a specific means for detection of the target nucleic acid, where the specific means for detection comprises an oligonucleotide probe and a set of primers,
ii) obtaining the primers and the oligonucleotide probe identified in step i),
iii) subjecting the sample to a molecular amplification procedure in the presence of the primers and the oligonucleotide probe from step ii), and
iv) determining the presence of the target nucleic acid based on the outcome of step iii).
Conveniently, primers obtained in step ii) are obtained by synthesis and it is preferred that the oligonucleotide probe is obtained from a library of the present invention.
The molecular amplification method is typically a PCR or a NASBA procedure, but any in vitro method for specific amplification (and, possibly, detection) of a nucleic acid is useful. The preferred PCR procedure is a qPCR (also known as real-time reverse transcription PCR or kinetic RT-PCR).
Other aspects of the invention are discussed infra.
BRIEF DESCRIPTION OF THE DRAWINGS
Fig. 1 illustrates the use of conventional long probes in panel (A) as well as the properties and use of short multi-probes (B) from a library constructed according to the invention. The short multi-probes comprise a recognition segment chosen so that each probe sequence may be used to detect and/or quantify several different target sequences comprising the complementary recognition sequence. Fig. 1A shows a method according to the prior art. Fig. 1B shows a method according to one aspect of the invention.
Fig. 2 is a flow chart showing a method for designing multi-probe sequences for a library according to one aspect of the invention. The method can be implemented by executing instructions provided by a computer program embedded in a computer readable medium. In one aspect, the program instructions are executed by a system, which comprises a database of sequences such as expressed sequences.
Fig. 3 is a graph illustrating the redundancy of probes targeting each gene within a 100-probe library according to one aspect of the invention. The y-axis shows the number of genes in the human transcriptome that are targeted by different numbers of probes in the library. It is apparent that a majority of all genes are targeted by several probes. The average number of probes per gene is 17.4.
Fig. 4 shows the theoretical coverage of the human transcriptome by a selection of hyper-abundant oligonucleotides of a given length. The graphs show the percentage of approximately 38,000 human mRNA sequences that can be detected by an increasing number of well-chosen short multi-probes of different lengths. The graph illustrates the theoretical coverage of the human transcriptome by optimally chosen (i.e. hyper-abundant, non-self-complementary and thermally stable) short multi-probes of different lengths. The Homo sapiens transcriptome sequence was obtained from the European Bioinformatics Institute (EMBL-EBI). A region of 1000 nt proximal to the 3' end of each mRNA sequence was used for the analysis (from 50 nt to 1050 nt upstream from the 3' end). Because each sequence is amplified by PCR, both strands of the amplified duplex were considered valid targets for multi-probes in the probe library. Probe sequences that have inadequate Tm even with LNA substitutions, as well as self-complementary probe sequences, are excluded.
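The coverage calculation described for Fig. 4 can be approximated with a short sketch. The analysis window (50 nt to 1050 nt upstream of the 3' end) and the use of both strands follow the text; the greedy selection strategy, the function names and the toy transcripts are assumptions made for illustration and are not claimed to be the procedure actually used.

```python
COMPLEMENT = str.maketrans("acgt", "tgca")


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def analysis_window(mrna, skip=50, length=1000):
    """Region from `skip` nt to `skip + length` nt upstream of the 3' end."""
    end = max(len(mrna) - skip, 0)
    start = max(end - length, 0)
    return mrna[start:end]


def targets_of(probe, transcripts):
    """Names of transcripts whose window contains the probe on either strand."""
    hits = set()
    for name, mrna in transcripts.items():
        window = analysis_window(mrna.lower())
        if probe in window or probe in reverse_complement(window):
            hits.add(name)
    return hits


def greedy_library(candidates, transcripts, size):
    """Pick up to `size` probes, each time taking the probe that adds the most
    uncovered transcripts (an assumed heuristic). Returns (probes, coverage)."""
    covered, library = set(), []
    remaining = {p: targets_of(p, transcripts) for p in candidates}
    for _ in range(size):
        best = max(remaining, key=lambda p: len(remaining[p] - covered), default=None)
        if best is None or not remaining[best] - covered:
            break
        library.append(best)
        covered |= remaining.pop(best)
    return library, len(covered) / len(transcripts)


# Assumed toy transcripts; the 50 nt poly-a tail is skipped by the window.
tail = "a" * 50
transcripts = {
    "t1": "ggg" + "tcttccca" + tail,            # probe on the sense strand
    "t2": "cc" + "tgggaaga" + tail,             # probe on the opposite strand
    "t3": "accgcttt" + tail,
    "t4": "ggggg" + "a" * 42 + "tcttccca",      # probe only inside the skipped 3' end
}
print(greedy_library(["tcttccca", "accgcttt"], transcripts, size=2))
```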
Fig. 5 shows the MALDI-MS spectrum of the oligonucleotide probe EQ13992, showing [M-H]- = 4121.3 Da.
Fig. 6 shows representative real time PCR curves for 9-mer multi-probes detecting target sequences in a dual labelled probe assay. Results are from real time PCR reactions with 9 nt long LNA enhanced dual labelled probes targeting different 9-mer sequences within the same gene. Each of the three different dual labelled probes was analysed in PCRs generating the 469, the 570 or the 671 SSA4 amplicons (each between 81 and 95 nt long). Dual labelled probes 469, 570, and 671 are shown in Panels a, b, and c, respectively. Each probe only detects the amplicon it was designed to detect. The Ct values were 23.7, 23.2, and 23.4 for the dual labelled probes 469, 570, and 671, respectively. 2 x 10' copies of the SSA4 cDNA were added as template. The high similarity between results despite differences in both probe sequences and their individual primer pairs indicates that the assays are very robust.
Fig. 7 shows examples of real time PCR curves for Molecular Beacons with a 9-mer and a 10-mer recognition site. Panel (A): Molecular beacon probe with a 10-mer recognition site detecting the 469 SSA4 amplicon. Signal was only obtained in the sample where SSA4 cDNA was added (2 x 10' copies). A Ct value of 24.0 was obtained. A similar experiment with a molecular beacon having a 9-mer recognition site detecting the 570 SSA4 amplicon is shown in panel (B). Signal was only obtained when SSA4 cDNA was added (2 x 10' copies).
Fig. 8 shows an example of a real time PCR curve for a SYBR-probe with a 9-mer recognition site targeting the 570 SSA4 amplicon. Signal was only obtained in the sample where SSA4 cDNA was added (2 x 10' copies), whereas no signal was detected without addition of template.
Fig. 9 shows a calibration curve for three different 9-mer multi-probes using a dual labelled probe assay principle. Different copy number levels of the SSA4 cDNA were detected by the three dual labelled probes. The threshold cycle number is the cycle number at which signal was first detected for the respective PCR. The slopes (a) and correlation coefficients (R²) of the three linear regression lines are: a = -3.456 and R² = 0.9999 (Dual-labelled-469), a = -3.468 and R² = 0.9981 (Dual-labelled-570), and a = -3.499 and R² = 0.9993 (Dual-labelled-671).
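The slopes reported for Fig. 9 can be related to PCR amplification efficiency. The sketch below assumes that the calibration curve plots threshold cycle against log10 of the template copy number, in which case the commonly used relation E = 10^(-1/slope) - 1 applies; that assumption and the function name are not stated in the text.

```python
# Slopes taken from the Fig. 9 description.
slopes = {
    "Dual-labelled-469": -3.456,
    "Dual-labelled-570": -3.468,
    "Dual-labelled-671": -3.499,
}


def amplification_efficiency(slope):
    """Efficiency per cycle, assuming Ct is regressed on log10(copy number).
    A value of 1.0 corresponds to perfect doubling in every cycle."""
    return 10 ** (-1.0 / slope) - 1.0


for probe, slope in slopes.items():
    print(f"{probe}: efficiency = {amplification_efficiency(slope):.3f}")
```

Under that assumption all three assays come out at roughly 93 to 95 % efficiency, which is in line with the statement that the assays behave similarly.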
Fig. 10 shows the use of 9-mer dual labelled multi-probes to quantify a heat shock protein transcript before and after exposure to heat shock in a wild type yeast strain as well as in a mutant strain where the corresponding gene has been deleted. Real time detection of SSA4 transcript levels in wild type (wt) yeast and in the SSA4 knockout mutant with the Dual-labelled-570 probe is shown. The different strains were either cultured at 30 °C until harvest (- HS) or exposed to 40 °C for 30 minutes prior to harvest (+ HS). The transcript was only detected in the wt strain, where it was most abundant in the + HS culture. Ct values were 26.1 and 30.3 for the + HS and the - HS culture, respectively.
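The Ct difference reported for Fig. 10 can be converted into an approximate fold difference in transcript level. The sketch assumes equal input in both reactions and a constant per-cycle amplification factor; the factor of 2 (perfect doubling) and the alternative factor derived from the Fig. 9 slope of the Dual-labelled-570 assay are both illustrative assumptions.

```python
ct_plus_hs = 26.1   # + HS culture (from the Fig. 10 description)
ct_minus_hs = 30.3  # - HS culture


def fold_difference(ct_low, ct_high, amplification_factor=2.0):
    """Relative abundance of the sample with the lower Ct versus the higher Ct,
    assuming the same amplification factor per cycle in both reactions."""
    return amplification_factor ** (ct_high - ct_low)


# Perfect doubling per cycle (assumed).
ideal = fold_difference(ct_plus_hs, ct_minus_hs)

# Amplification factor estimated from the Fig. 9 slope for Dual-labelled-570,
# assuming Ct was regressed on log10(copy number).
factor_570 = 10 ** (-1.0 / -3.468)
adjusted = fold_difference(ct_plus_hs, ct_minus_hs, factor_570)

print(f"ideal: {ideal:.1f}-fold, adjusted: {adjusted:.1f}-fold")
```

Either way, the 4.2-cycle difference corresponds to an increase of well over tenfold in SSA4 transcript after heat shock.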
Fig. 11 shows an example of how more than one gene can be detected by the same 9-mer probe while nucleic acid molecules without the probe target sequence (i.e. complementary to the recognition sequence) will not be detected. In (a) Dual-labelled-469 detects both the SSA4 (469 amplicon) and the POL5 transcript with Ct values of 29.7 and 30.1, respectively. No signal was detected from the APG9 and HSP82 transcripts. In (b) Dual-labelled-570 detects both the SSA4 (570 amplicon) and the APG9 transcript with Ct values of 31.3 and 29.2, respectively. No signal was detected from the POL5 and HSP82 transcripts. In (c) probe Dual-labelled-671 detected both the SSA4 (671 amplicon) and the HSP82 transcript with Ct values of 29.8 and 25.6, respectively. No signal was detected from the POL5 and APG9 transcripts. The amplicon produced in the different PCRs is indicated in the legend. The same amount of cDNA was used as in the experiments depicted in Fig. 10. Only cDNA from non-heat shocked wild type yeast was used.
Fig. 12 shows agarose gel electrophoresis of a fraction of the amplicons generated in the PCR reactions shown in the example of Fig. 11, demonstrating that the probes are specific for target sequences comprising the recognition sequence but do not hybridize to nucleic acid molecules which do not comprise the target sequence. Lane 1 contains the SSA4-469 amplicon (81 bp), lane 2 contains the POL5 amplicon (94 bp), lane 3 contains the APG9 amplicon (97 bp) and lane 4 contains the HSP82 amplicon (88 bp). Lane M contains a 50 bp ladder as size indicator. It is clear that a product was formed in all four cases; however, only amplificates containing the correct multi-probe target sequence (i.e. SSA4-469 and POL5) were detected by the dual labelled probe 469. That two different amplificates were indeed produced and detected is evident from the size difference in the detected fragments from lanes 1 and 2.
Fig. 13: Preferred target sequences.
Fig. 14: Further Preferred target sequences.
Fig. 15: Longmers (positive controls). The sequences are set forth in SEQ ID NOs. 32-46.
Fig. 16: Procedure for the selection of probes and the designing of primers for qPCR.
Fig. 17: Source code for the program used in the calculation of a multi-probe dataset.
Fig. 18: The result from performing real time PCR with a probe carrying the Q4 quencher together with the fluorescein dye.
Fig. 19: The result from performing real time PCR with a dual labelled probe carrying a 3'-Nitroindole.
Fig. 20: The result from performing real time PCR with a probe having perfect match or a single mismatch relative to the amplified target sequence. As control, a PCR without addition of template was included in the experiment.
DETAILED DESCRIPTION
The present invention relates to short oligonucleotide probes or multi-probes, chosen and designed to detect, classify or characterize, and/or quantify many different target nucleic acid molecules. These multi-probes comprise at least one non-natural modification (e.g. an LNA nucleotide) for increasing the binding affinity of the probes for a recognition sequence, which is a subsequence of the target nucleic acid molecules. The target nucleic acid molecules are otherwise different outside of the recognition sequence.
In one aspect, the multi-probes comprise at least one nucleotide modified with a chemical moiety for increasing binding affinity of the probes for a recognition sequence, which is a subsequence of the target nucleic acid sequence. In another aspect, the probes comprise both at least one non-natural nucleotide and at least one nucleotide modified with a chemical moiety. In a further aspect, the at least one non-natural nucleotide is modified by the chemical moiety. The invention also provides kits, libraries and other compositions comprising the probes.
The invention further provides i) methods for choosing and designing suitable oligonucleotide probes for a given mixture of target sequences, ii) individual probes with these abilities, and iii) libraries of such probes chosen and designed to be able to detect, classify, and/or quantify the largest number of target nucleotides with the smallest number of probe sequences. Each probe according to the invention is thus able to bind many different targets, but may be used to create a specific assay when combined with a set of specific primers in PCR assays.
Preferred oligonucleotides of the invention are comprised of about 8 to 9 nucleotide units, a substantial portion of which comprises stabilizing nucleotides, such as LNA nucleotides. A preferred library contains approximately 100 of these probes chosen and designed to characterize a specific pool of nucleic acids, such as mRNA, cDNA or genomic DNA. Such a library may be used in a wide variety of applications, e.g., gene expression analyses, SNP detection, and the like. (See, e.g., Fig. 1).
Definitions
The following definitions are provided for specific terms, which are used in the disclosure of the present invention:
As used herein, the singular forms "a", "an" and "the" include plural references unless the context clearly dictates otherwise. For example, the term "a cell" includes a plurality of cells, including mixtures thereof. The term "a nucleic acid molecule" includes a plurality of nucleic acid molecules.
As used herein, the term "transcriptome" refers to the complete collection of transcribed elements of the genome of any species.
In addition to mRNAs, it also represents non-coding RNAs which are used for structural and regulatory purposes.
As used herein, the term "amplicon" refers to small, replicating DNA fragments.
As used herein, a "sample" refers to a sample of tissue or fluid isolated from an organism or organisms, including but not limited to, for example, skin, plasma, serum, spinal fluid, lymph fluid, synovial fluid, urine, tears, blood cells, organs, tumours, and also to samples of in vitro cell culture constituents (including but not limited to conditioned medium resulting from the growth of cells in cell culture medium, recombinant cells and cell components).
As used herein, an "organism" refers to a living entity, including but not limited to, for example, human, mouse, rat, Drosophila (e.g. D. melanogaster), C. elegans, yeast, Arabidopsis (e.g. A. thaliana), zebra fish, primates (e.g. chimpanzees), domestic animals, etc.
By the term "SBC nucleobases" is meant "Selective Binding Complementary" nucleobases, i.e. modified nucleobases that can make stable hydrogen bonds to their complementary nucleobases, but are unable to make stable hydrogen bonds to other SBC nucleobases. As an example, the SBC nucleobase A' can make a stable hydrogen bonded pair with its complementary unmodified nucleobase, T. Likewise, the SBC nucleobase T' can make a stable hydrogen bonded pair with its complementary unmodified nucleobase, A. However, the SBC nucleobases A' and T' will form an unstable hydrogen bonded pair as compared to the basepairs A'-T and A-T'. Likewise, an SBC nucleobase of C is designated C' and can make a stable hydrogen bonded pair with its complementary unmodified nucleobase G, and an SBC nucleobase of G is designated G' and can make a stable hydrogen bonded pair with its complementary unmodified nucleobase C, yet C' and G' will form an unstable hydrogen bonded pair as compared to the basepairs C'-G and C-G'. A stable hydrogen bonded pair is obtained when 2 or more hydrogen bonds are formed, e.g. the pair between A' and T, A and T', C and G', and C' and G. An unstable hydrogen bonded pair is obtained when one or no hydrogen bond is formed, e.g. the pair between A' and T', and C' and G'.
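The pairing rules for SBC nucleobases can be summarised in a few lines of code. The sketch is a deliberately simplified model of the definition given here: bases are written as A, C, G, T with a trailing apostrophe for the SBC variant, and the function name is hypothetical.

```python
WATSON_CRICK = {"A": "T", "T": "A", "C": "G", "G": "C"}


def is_stable_pair(x, y):
    """True if bases x and y form a stable hydrogen bonded pair under the
    SBC definition: complementary bases pair stably unless both are SBC
    nucleobases (marked with a trailing apostrophe)."""
    if x.endswith("'") and y.endswith("'"):
        return False  # two SBC nucleobases do not pair stably with each other
    return WATSON_CRICK.get(x.rstrip("'")) == y.rstrip("'")


for pair in [("A'", "T"), ("A", "T'"), ("A'", "T'"), ("C'", "G"), ("C'", "G'")]:
    print(pair, is_stable_pair(*pair))
```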
Especially interesting SBC nucleobases are 2,6-diaminopurine (A', also called D) together with 2-thio-uracil (U', also called 2SU)(2-thio-4-oxo-pyrimidine) and 2-thio-thymine (T', also called 2ST)(2-thio-4-oxo-5-methyl-pyrimidine). Fig. 4 illustrates that the pairs A-2ST and D-T have 2 or more than 2 hydrogen bonds whereas the D-2ST pair forms a single (unstable) hydrogen bond. Likewise the SBC nucleobases pyrrolo-[2,3-d]pyrimidine-2(3H)-one (C', also called PyrroloPyr) and hypoxanthine (G', also called I)(6-oxo-purine) are shown in Fig. 9 where the pairs PyrroloPyr-G and C-I have 2 hydrogen bonds each whereas the PyrroloPyr-I pair forms a single hydrogen bond.
By "SBC LNA oligomer" is meant a "LNA oligomer" containing at least one "LNA unit" where the nucleobase is a "SBC nucleobase". By "LNA unit with an SBC nucleobase" is meant a "SBC LNA monomer". Generally speaking SBC LNA oligomers include oligomers that besides the SBC LNA monomer(s) contain other modified or naturally-occurring nucleotides or nucleosides. By "SBC monomer" is meant a non-LNA monomer with a SBC nucleobase. By "isosequential oligonucleotide" is meant an oligonucleotide with the same sequence in a Watson-Crick sense as the corresponding modified oligonucleotide e.g. the sequence agTtcATg is equal to agTscD 2SUg where s is equal to the SBC DNA monomer 2-thio-t or 2-thio-u, D is equal to the SBC LNA monomer LNA-D and 2SU is equal to the SBC LNA monomer LNA
As used herein, the terms "nucleic acid", "polynucleotide" and "oligonucleotide" refer to primers, probes, oligomer fragments to be detected, oligomer controls and unlabelled blocking oligomers and shall be generic to polydeoxyribonucleotides (containing 2-deoxy-D-ribose), to polyribonucleotides (containing D-ribose), and to any other type of polynucleotide which is an N-glycoside of a purine or pyrimidine base, or modified purine or pyrimidine bases. There is no intended distinction in length between the terms "nucleic acid", "polynucleotide" and "oligonucleotide", and these terms will be used interchangeably. These terms refer only to the primary structure of the molecule. Thus, these terms include double- and single-stranded DNA, as well as double- and single-stranded RNA. The oligonucleotide is comprised of a sequence of approximately at least 3 nucleotides, preferably at least about 6 nucleotides, and more preferably at least about 8 - 30 nucleotides corresponding to a region of the designated nucleotide sequence. "Corresponding" means identical to or complementary to the designated sequence.
The oligonucleotide is not necessarily physically derived from any existing or natural sequence but may be generated in any manner, including chemical synthesis, DNA replication, reverse transcription or a combination thereof. The terms "oligonucleotide" or "nucleic acid" intend a polynucleotide of genomic DNA or RNA, cDNA, semi-synthetic, or synthetic origin which, by virtue of its origin or manipulation: (1) is not associated with all or a portion of the polynucleotide with which it is associated in nature; and/or (2) is linked to a polynucleotide other than that to which it is linked in nature; and (3) is not found in nature.
Because mononucleotides are reacted to make oligonucleotides in a manner such that the 5' phosphate of one mononucleotide pentose ring is attached to the 3' oxygen of its neighbour in one direction via a phosphodiester linkage, an end of an oligonucleotide is referred to as the "5' end" if its 5' phosphate is not linked to the 3' oxygen of a mononucleotide pentose ring and as the "3' end" if its 3' oxygen is not linked to a 5' phosphate of a subsequent mononucleotide pentose ring. As used herein, a nucleic acid sequence, even if internal to a larger oligonucleotide, may also be said to have 5' and 3' ends.
When two different, non-overlapping oligonucleotides anneal to different regions of the same linear complementary nucleic acid sequence, the 3' end of one oligonucleotide points toward the 5' end of the other; the former may be called the "upstream" oligonucleotide and the latter the "downstream" oligonucleotide.
The term "primer" may refer to more than one primer and refers to an oligonucleotide, whether occurring naturally, as in a purified restriction digest, or produced synthetically, which is capable of acting as a point of initiation of synthesis along a complementary strand when placed under conditions in which synthesis of a primer extension product which is complementary to a nucleic acid strand is catalyzed. Such conditions include the presence of four different deoxyribonucleoside triphosphates and a polymerization-inducing agent such as DNA polymerase or reverse transcriptase, in a suitable buffer ("buffer" includes substituents which are cofactors, or which affect pH, ionic strength, etc.), and at a suitable temperature. The primer is preferably single-stranded for maximum efficiency in amplification.
As used herein, the terms "PCR reaction", "PCR amplification", "PCR" and "real-time PCR" are interchangeable terms used to signify use of a nucleic acid amplification system, which multiplies the target nucleic acids being detected. Examples of such systems include the polymerase chain reaction (PCR) system and the ligase chain reaction (LCR) system. Other methods recently described and known to the person of skill in the art are the nucleic acid sequence based amplification (NASBA™, Cangene, Mississauga, Ontario) and Q Beta Replicase systems. The products formed by said amplification reaction may be monitored in real time or only after the reaction as an end-point measurement.
The complement of a nucleic acid sequence as used herein refers to an oligonucleotide which, when aligned with the nucleic acid sequence such that the 5' end of one sequence is paired with the 3' end of the other, is in "antiparallel association." Bases not commonly found in natural nucleic acids may be included in the nucleic acids of the present invention and include, for example, inosine and 7-deazaguanine. Complementarity may not be perfect; stable duplexes may contain mismatched base pairs or unmatched bases. Those skilled in the art of nucleic acid technology can determine duplex stability empirically considering a number of variables including, for example, the length of the oligonucleotide, percent concentration of cytosine and guanine bases in the oligonucleotide, ionic strength, and incidence of mismatched base pairs.
Stability of a nucleic acid duplex is measured by the melting temperature, or "Tm". The Tm of a particular nucleic acid duplex under specified conditions is the temperature at which half of the base pairs have disassociated.
As used herein, the term "probe" refers to a labelled oligonucleotide, which forms a duplex structure with a sequence in the target nucleic acid, due to complementarity of at least one sequence in the probe with a sequence in the target region. The probe, preferably, does not contain a sequence complementary to sequence(s) used to prime the polymerase chain reaction. Generally the 3' terminus of the probe will be "blocked" to prohibit incorporation of the probe into a primer extension product. "Blocking" may be achieved by using non-complementary bases or by adding a chemical moiety such as biotin or even a phosphate group to the 3' hydroxyl of the last nucleotide, which may, depending upon the selected moiety, serve a dual purpose by also acting as a label.
The term "label" as used herein refers to any atom or molecule which can be used to provide a detectable (preferably quantifiable) signal, and which can be attached to a nucleic acid or protein. Labels may provide signals detectable by fluorescence, radioactivity, colorimetry, X-ray diffraction or absorption, magnetism, enzymatic activity, and the like.
As defined herein, "5'→3' nuclease activity" or "5' to 3' nuclease activity" refers to that activity of a template-specific nucleic acid polymerase including either a 5'→3' exonuclease activity traditionally associated with some DNA polymerases whereby nucleotides are removed from the 5' end of an oligonucleotide in a sequential manner, (i.e., E. coli DNA polymerase I has this activity whereas the Klenow fragment does not), or a 5'→3' endonuclease activity wherein cleavage occurs more than one nucleotide from the 5' end, or both.
As used herein, the term "thermo stable nucleic acid polymerase" refers to an enzyme which is relatively stable to heat when compared, for example, to nucleotide polymerases from E. coli and which catalyzes the polymerization of nucleosides. Generally, the enzyme will initiate synthesis at the 3'-end of the primer annealed to the target sequence, and will proceed in the 5'-direction along the template, and, if it possesses a 5' to 3' nuclease activity, will hydrolyze or displace intervening, annealed probe to release both labelled and unlabelled probe fragments or intact probe, until synthesis terminates. A representative thermo stable enzyme isolated from Thermus aquaticus (Taq) is described in U.S. Pat. No. 4,889,818 and a method for using it in conventional PCR is described in Saiki et al., (1988), Science 239:487.
The term "nucleobase" covers the naturally occurring nucleobases adenine (A), guanine (G), cytosine (C), thymine (T) and uracil (U) as well as non-naturally occurring nucleobases such as xanthine, diaminopurine, 8-oxo-N6-methyladenine, 7-deazaxanthine, 7-deazaguanine, N4,N4-ethanocytosin, N6,N6-ethano-2,6-diaminopurine, 5-methylcytosine, 5-(C3-C6)-alkynyl-cytosine, 5-fluorouracil, 5-bromouracil, pseudoisocytosine, 2-hydroxy-5-methyl-4-triazolopy-ridin, isocytosine, isoguanine, inosine and the "non-naturally occurring"
nucleobases de-scribed in Benner et al., U.S. Patent No. 5,432,272 and Susan M. Freier and Karl-Heinz Altmann, Nucleic Acid Research,25: 4429-4443, 1997. The term "nucleobase" thus includes not only the known purine and pyrimidine heterocycles, but also heterocyclic analogues and tautomers thereof. Further naturally and non naturally occurring nucleobases include those disclosed in U.S. Patent No. 3,687,808; in chapter 15 by Sanghvi, in Antisense Research and Application, Ed. S. T. Crooke and B. Lebleu, CRC Press, 1993; in Englisch, et al., Angewandte Chemie, International Edition, 30: 613-722, 1991 (see, especially pages 622 and 623, and in the Concise Encyclopedia of Polymer Science and Engineering, J. I. Kroschwitz Ed., John Wiley & Sons, pages 858-859, 1990, Cook, Anti-Cancer DrugDesign 6: 585-607, 1991, each of which are hereby incorporated by reference in their entirety).
The term "nucleosidic base" or "nucleobase analogue" is further intended to include hetero-cyclic compounds that can serve as nucleosidic bases including certain "universal bases" that are not nucleosidic bases in the most classical sense but serve as nucleosidic bases. Es-pecially mentioned as a universal base is 3-nitropyrrole and 5-nitroindole.
Other preferred compounds include pyrene and pyridyloxazole derivatives, pyrenyl, pyrenylmethylglycerol derivatives and the like. Other preferred universal bases include, pyrrole, diazole or triazole derivatives, including those universal bases known in the art.
By "universal base" is meant a naturally-occurring or desirably a non-naturally occurring compound or moiety that can pair with a natural base (e.g., adenine, guanine, cytosine, uracil, and/or thymine), and that has a Tm differential of 15, 12, 10, 8, 6, 4, or 2 C or less as described herein.
By "oligonucleotide," "oligomer," or "oligo" is meant a successive chain of monomers (e.g., glycosides of heterocyclic bases) connected via internucleoside linkages. The linkage be-tween two successive monomers in the oligo consist of 2 to 4, desirably 3, groups/atoms selected from -CH2-, -0-, -S-, -NR"-, >C=O, >C=NR", >C=S, -Si(R")2-, -SO-, -S(O)Z-, -P(O)2-, -PO(BH3)-, -P(O,S)-, -P(S)2-, -PO(R")-, -PO(OCH3)-, and -PO(NHR")-, where R" is selected from hydrogen and C1_4-alkyl, and R" is selected from Cl_6-alkyl and phenyl. Illustra-tive examples of such linkages are -CH2-CH2-CH2-, -CH2-CO-CH2-, -CH2-CHOH-CH2-, -O-CH2-0-, -O-CHa-CHZ-, -O-CHa-CH= (including R5 when used as a linkage to a succeeding mono-mer), -CHZ-CH2-O-, -NR"-CHZ-CHz-, -CH2-CHZ-NR"-, -CHa-NR"-CHz-, -O-CH2-CH2-NR"--NR"-CO-O-, -NR"-CO-NR"-, -NR"-CS-NR"-, -NR"-C(=NR")-NR"-, -NR"-CO-CHZ-NR"-, -O-, -O-CO-CHZ-O-, -O-CHZ-CO-O-, -CHZ-CO-NR"-, -O-CO-NR"-, -NR"-CO-CH2-, -O-CHz-CO-NR"-, -O-CHZ-CHZ-NR"-, -CH=N-O-, -CH2-NR"-0-, -CHZ-O-N= (including R5 when used as a linkage to a succeeding monomer), -CHz-O-NR"-, -CO-NR"-CHZ-, -CHZ-NR"-0-, -CHZ-NR"-CO-, -O-NR"-CHZ-, -O-NR"-, -O-CH2-S-, -S-CH2-O-, -CH2-CHZ-S-, -O-CHZ-CHz-S-, -S-CH2-CH=
(including RS when used as a linkage to a succeeding monomer), -S-CH2-CH2-, -S-, -S-CH2-CH2-S-, -CH2-S-CH2-, -CHz-SO-CHZ-, -CHZ-SO2-CH2-, -O-SO-O-, -O-S(O)z-O-, -0-S(O)2-CH2.-, -O-S(O)2-NR"-, -NR"-S(O)2-CHZ-, -O-S(O)2-CHZ-, -O-P(O)2-0-, -O-P(O,S)-0-, -0-P(S)2-O-, -S-P(O)Z-O-, -S-P(O,S)-0-, -S-P(S)2-0-, -O-P(O)Z-S-, -O-P(O,S)-S-, -O-P(S)2-S-, -S-P(O)Z-S-, -S-P(O,S)-S-, -S-P(S)z-S-, -O-PO(R")-0-, -O-PO(OCH3)-0-, -O-PO(OCHZCH3)-O-, -O-PO(OCHZCH2S-R)-0-, -O-PO(BH3)-0-, -O-PO(NHR")-0-, -O-P(O)Z-NR"-, -NR"-P(O)2-0-, -O-P(O,NR")-0-, -CH2-P(O)2-0-, -O-P(O)2-CH2-, and -O-Si(R")2-O-; among which -CHZ-CO-NR"-, -CH2-NR"-0-, -S-CHZ-O-, -O-P(O)2-0-, -O-P(O,S)-0-, -O-P(S)a-O-, -NR"-P(O)2-0-, -0-P(O,NR")-0-, -O-PO(R")-0-, -O-PO(CH3)-0-, and -O-PO(NHR")-0-, where R" is selected form hydrogen and Ci_4-alkyl, and R" is selected from C1_6-alkyl and phenyl, are especially desir-able. Further illustrative examples are given in Mesmaeker et. al., Current Opinion in Struc-tural Biology 1995, 5, 343-355 and Susan M. Freier and Karl-Heinz Altmann, Nucleic Acids Research, 1997, vol 25, pp 4429-4443. The left-hand side of the internucleoside linkage is bound to the 5-membered ring as substituent P* at the 3'-position, whereas the right-hand side is bound to the 5'-position of a preceding monomer.
By "LNA unit" is meant an individual LNA monomer (e.g., an LNA nucleoside or LNA nucleo-tide) or an oligomer (e.g., an oligonucleotide or nucleic acid) that includes at least one LNA
monomer. LNA units as disclosed in WO 99/14226 are in general particularly desirable modi-fied nucleic acids for incorporation into an oligonucleotide of the invention.
Additionally, the nucleic acids may be modified at either the 3' and/or 5' end by any type of modification known in the art. For example, either or both ends may be capped with a protecting group, attached to a flexible linking group, attached to a reactive group to aid in attachment to the substrate surface, etc. Desirable LNA units and their method of synthesis also are disclosed in WO 00/47599, US 6,043,060, US 6,268,490, PCT/JP98/00945, WO 0107455, WO
0100641, WO 9839352, WO 0056746, WO 0056748, WO 0066604, Morita et al., Bioorg.
Med. Chem. Lett. 12(1):73-76, 2002; Hakansson et al., Bioorg. Med. Chem. Lett.
11(7):935-938, 2001; Koshkin et al., J. Org. Chem. 66(25):8504-8512, 2001; Kvaerno et al., J. Org.
Chem. 66(16):5498-5503, 2001; Hakansson et al., J. Org. Chem. 65(17):5161-5166, 2000;
Kvaerno et al., J. Org. Chem. 65(17):5167-5176, 2000; Pfundheller et al., Nucleosides Nucleotides 18(9):2017-2030, 1999; and Kumar et a/., Bioorg. Med. Chem. Lett.
8(16):2219-2222, 1998.
Preferred LNA monomers, also referred to as "oxy-LNA" are LNA monomers which include bicyclic compounds as disclosed in PCT Publication WO 03/020739 wherein the bridge be-tween R" and R" as shown in formula (I) below together designate -CHZ-O-(methyloxy LNA) or -CHZ-CHZ-O- (ethyloxy LNA, also designated ENA).
Further preferred LNA monomers are designated "thio-LNA" or "amino-LNA"
including bicyclic structures as disclosed in WO 99/14226, wherein the heteroatom in the bridge between R4' and R 2'as shown in formula (I) below together designate -CH2-S-, -CH2-CH2-S-, -CH2-NH- or -CH2-CH2-NH-.
By "LNA modified oligonucleotide" is meant an oligonucleotide comprising at least one LNA
monomeric unit of formula (I), described infra, having the below described illustrative exam-ples of modifications:
Fig. 12 shows agarose gel electrophoresis of a fraction of the amplicons generated in the PCR reactions shown in the example of Fig. 11, demonstrating that the probes are specific for target sequences comprising the recognition sequence but do not hybridize to nucleic acid molecules which do not comprise the target sequence. Lane 1 contains the SSA4-469 amplicon (81 bp), lane 2 contains the POL5 amplicon (94 bp), lane 3 contains the APG9 amplicon (97 bp) and lane 4 contains the HSP82 amplicon (88 bp). Lane M contains a 50 bp ladder as size indicator. It is clear that a product was formed in all four cases; however, only amplificates containing the correct multi-probe target sequence (i.e. SSA4-467 and POL5) were detected by the dual labelled probe 467. That two different amplificates were indeed produced and detected is evident from the size difference in the detected fragments from lanes 1 and 2.
Fig. 13: Preferred target sequences.
Fig. 14: Further Preferred target sequences.
Fig. 15: Longmers (positive controls). The sequences are set forth in SEQ ID NOs. 32-46.
Fig. 16: Procedure for the selection of probes and the designing of primers for qPCR.
Fig. 17: Source code for the program used in the calculation of a multi-probe dataset.
Fig. 18: The result from performing real time PCR with a probe carrying the Q4 quencher together with the fluorescein dye.
Figure 19: The result from performing real time PCR with a dual labelled probe carrying a 3'-Nitroindole.
Figure 20: The result from performing real time PCR with a probe having perfect match or a single mismatch relative to the amplified target sequence. As control, a PCR without addition of template was included in the experiment.
DETAILED DESCRIPTION
The present invention relates to short oligonucleotide probes or multi-probes, chosen and designed to detect, classify or characterize, and/or quantify many different target nucleic acid molecules. These multi-probes comprise at least one non-natural modification (e.g. an LNA nucleotide) for increasing the binding affinity of the probes for a recognition sequence, which is a subsequence of the target nucleic acid molecules. The target nucleic acid molecules are otherwise different outside of the recognition sequence.
In one aspect, the multi-probes comprise at least one nucleotide modified with a chemical moiety for increasing binding affinity of the probes for a recognition sequence, which is a subsequence of the target nucleic acid sequence. In another aspect, the probes comprise both at least one non-natural nucleotide and at least one nucleotide modified with a chemical moiety. In a further aspect, the at least one non-natural nucleotide is modified by the chemical moiety. The invention also provides kits, libraries and other compositions comprising the probes.
The invention further provides i) methods for choosing and designing suitable oligonucleotide probes for a given mixture of target sequences, ii) individual probes with these abilities, and iii) libraries of such probes chosen and designed to be able to detect, classify, and/or quantify the largest number of target nucleotides with the smallest number of probe sequences. Each probe according to the invention is thus able to bind many different targets, but may be used to create a specific assay when combined with a set of specific primers in PCR assays.
Preferred oligonucleotides of the invention are comprised of about 8 to 9 nucleotide units, a substantial portion of which comprises stabilizing nucleotides, such as LNA nucleotides. A preferred library contains approximately 100 of these probes chosen and designed to characterize a specific pool of nucleic acids, such as mRNA, cDNA or genomic DNA. Such a library may be used in a wide variety of applications, e.g., gene expression analyses, SNP detection, and the like. (See, e.g., Fig. 1).
Definitions
The following definitions are provided for specific terms, which are used in the disclosure of the present invention:
As used herein, the singular forms "a", "an" and "the" include plural references unless the context clearly dictates otherwise. For example, the term "a cell" includes a plurality of cells, including mixtures thereof. The term "a nucleic acid molecule" includes a plurality of nucleic acid molecules.
As used herein, the term "transcriptome" refers to the complete collection of transcribed elements of the genome of any species.
In addition to mRNAs, it also represents non-coding RNAs which are used for structural and regulatory purposes.
As used herein, the term "amplicon" refers to small, replicating DNA fragments.
As used herein, a "sample" refers to a sample of tissue or fluid isolated from an organism or organisms, including but not limited to, for example, skin, plasma, serum, spinal fluid, lymph fluid, synovial fluid, urine, tears, blood cells, organs, tumours, and also to samples of in vitro cell culture constituents (including but not limited to conditioned medium resulting from the growth of cells in cell culture medium, recombinant cells and cell components).
As used herein, an "organism" refers to a living entity, including but not limited to, for example, human, mouse, rat, Drosophila (e.g. D. melanogaster), C. elegans, yeast, Arabidopsis (e.g. A. thaliana), zebra fish, primates (e.g. chimpanzees), domestic animals, etc.
By the term "SBC nucleobases" is meant "Selective Binding Complementary" nucleobases, i.e. modified nucleobases that can make stable hydrogen bonds to their complementary nucleobases, but are unable to make stable hydrogen bonds to other SBC nucleobases. As an example, the SBC nucleobase A' can make a stable hydrogen bonded pair with its complementary unmodified nucleobase, T. Likewise, the SBC nucleobase T' can make a stable hydrogen bonded pair with its complementary unmodified nucleobase, A. However, the SBC nucleobases A' and T' will form an unstable hydrogen bonded pair as compared to the basepairs A'-T and A-T'. Likewise, a SBC nucleobase of C is designated C' and can make a stable hydrogen bonded pair with its complementary unmodified nucleobase G, and a SBC nucleobase of G is designated G' and can make a stable hydrogen bonded pair with its complementary unmodified nucleobase C, yet C' and G' will form an unstable hydrogen bonded pair as compared to the basepairs C'-G and C-G'. A stable hydrogen bonded pair is obtained when 2 or more hydrogen bonds are formed, e.g. the pair between A' and T, A and T', C and G', and C' and G. An unstable hydrogen bonded pair is obtained when 1 or no hydrogen bond is formed, e.g. the pair between A' and T', and C' and G'.
Especially interesting SBC nucleobases are 2,6-diaminopurine (A', also called D) together with 2-thio-uracil (U', also called 2SU)(2-thio-4-oxo-pyrimidine) and 2-thio-thymine (T', also called 2ST)(2-thio-4-oxo-5-methyl-pyrimidine). Fig. 4 illustrates that the pairs A-2ST and D-T have 2 or more than 2 hydrogen bonds whereas the D-2ST pair forms a single (unstable) hydrogen bond. Likewise the SBC nucleobases pyrrolo-[2,3-d]pyrimidine-2(3H)-one (C', also called PyrroloPyr) and hypoxanthine (G', also called I)(6-oxo-purine) are shown in Fig. 9 where the pairs PyrroloPyr-G and C-I have 2 hydrogen bonds each whereas the PyrroloPyr-I pair forms a single hydrogen bond.
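The pairing rule for SBC nucleobases can be summarised as a small lookup. The sketch below encodes only the qualitative rule given in the definition: complementary pairs are stable unless both partners are SBC nucleobases. The primed notation follows the text, while the function names and the "non-complementary" label are assumptions made for illustration.

```python
# Watson-Crick partners of the unmodified nucleobases.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def parse(base):
    """Split a symbol such as "A'" into (unmodified letter, is_sbc flag)."""
    return base.rstrip("'"), base.endswith("'")


def pair_status(x, y):
    """Classify a base pair as 'stable', 'unstable' or 'non-complementary'."""
    (bx, sx), (by, sy) = parse(x), parse(y)
    if COMPLEMENT.get(bx) != by:
        return "non-complementary"
    # Two SBC nucleobases opposite each other form one or no hydrogen bond,
    # which the definition above treats as an unstable pair.
    return "unstable" if (sx and sy) else "stable"
```

For example, `pair_status("A'", "T")` gives "stable" while `pair_status("A'", "T'")` gives "unstable", matching the A'-T and A'-T' cases described above.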
By "SBC LNA oligomer" is meant a "LNA oligomer" containing at least one "LNA unit" where the nucleobase is a "SBC nucleobase". By "LNA unit with an SBC nucleobase" is meant a "SBC LNA monomer". Generally speaking SBC LNA oligomers include oligomers that besides the SBC LNA monomer(s) contain other modified or naturally-occurring nucleotides or nucleosides. By "SBC monomer" is meant a non-LNA monomer with a SBC nucleobase. By "isosequential oligonucleotide" is meant an oligonucleotide with the same sequence in a Watson-Crick sense as the corresponding modified oligonucleotide e.g. the sequences agTtcATg is equal to agTscD 2SUg where s is equal to the SBC DNA monomer 2-thio-t or 2-thio-u, D is equal to the SBC LNA monomer LNA-D and 2SU is equal to the SBC LNA monomer LNA
As used herein, the terms "nucleic acid", "polynucleotide" and "oligonucleotide" refer to primers, probes, oligomer fragments to be detected, oligomer controls and unlabelled blocking oligomers and shall be generic to polydeoxyribonucleotides (containing 2-deoxy-D-ribose), to polyribonucleotides (containing D-ribose), and to any other type of polynucleotide which is an N-glycoside of a purine or pyrimidine base, or modified purine or pyrimidine bases. There is no intended distinction in length between the terms "nucleic acid", "polynucleotide" and "oligonucleotide", and these terms will be used interchangeably. These terms refer only to the primary structure of the molecule. Thus, these terms include double- and single-stranded DNA, as well as double- and single-stranded RNA. The oligonucleotide is comprised of a sequence of approximately at least 3 nucleotides, preferably at least about 6 nucleotides, and more preferably at least about 8 - 30 nucleotides corresponding to a region of the designated nucleotide sequence. "Corresponding" means identical to or complementary to the designated sequence.
The oligonucleotide is not necessarily physically derived from any existing or natural sequence but may be generated in any manner, including chemical synthesis, DNA replication, reverse transcription or a combination thereof. The terms "oligonucleotide" or "nucleic acid" intend a polynucleotide of genomic DNA or RNA, cDNA, semi synthetic, or synthetic origin which, by virtue of its origin or manipulation: (1) is not associated with all or a portion of the polynucleotide with which it is associated in nature; and/or (2) is linked to a polynucleotide other than that to which it is linked in nature; and (3) is not found in nature.
Because mononucleotides are reacted to make oligonucleotides in a manner such that the 5' phosphate of one mononucleotide pentose ring is attached to the 3' oxygen of its neighbour in one direction via a phosphodiester linkage, an end of an oligonucleotide is referred to as the "5' end" if its 5' phosphate is not linked to the 3' oxygen of a mononucleotide pentose ring and as the "3' end" if its 3' oxygen is not linked to a 5' phosphate of a subsequent mononucleotide pentose ring. As used herein, a nucleic acid sequence, even if internal to a larger oligonucleotide, also may be said to have 5' and 3' ends.
When two different, non-overlapping oligonucleotides anneal to different regions of the same linear complementary nucleic acid sequence, the 3' end of one oligonucleotide points toward the 5' end of the other; the former may be called the "upstream" oligonucleotide and the latter the "downstream" oligonucleotide.
The term "primer" may refer to more than one primer and refers to an oligonucleotide, whether occurring naturally, as in a purified restriction digest, or produced synthetically, which is capable of acting as a point of initiation of synthesis along a complementary strand when placed under conditions in which synthesis of a primer extension product which is complementary to a nucleic acid strand is catalyzed. Such conditions include the presence of four different deoxyribonucleoside triphosphates and a polymerization-inducing agent such as DNA polymerase or reverse transcriptase, in a suitable buffer ("buffer" includes substituents which are cofactors, or which affect pH, ionic strength, etc.), and at a suitable temperature. The primer is preferably single-stranded for maximum efficiency in amplification.
As used herein, the terms "PCR reaction", "PCR amplification", "PCR" and "real-time PCR" are interchangeable terms used to signify use of a nucleic acid amplification system, which multiplies the target nucleic acids being detected. Examples of such systems include the polymerase chain reaction (PCR) system and the ligase chain reaction (LCR) system. Other methods recently described and known to the person of skill in the art are the nucleic acid sequence based amplification (NASBA™, Cangene, Mississauga, Ontario) and Q Beta Replicase systems. The products formed by said amplification reaction may be monitored in real time or only after the reaction as an end point measurement.
The complement of a nucleic acid sequence as used herein refers to an oligonucleotide which, when aligned with the nucleic acid sequence such that the 5' end of one sequence is paired with the 3' end of the other, is in "antiparallel association." Bases not commonly found in natural nucleic acids may be included in the nucleic acids of the present invention and include, for example, inosine and 7-deazaguanine. Complementarity need not be perfect; stable duplexes may contain mismatched base pairs or unmatched bases. Those skilled in the art of nucleic acid technology can determine duplex stability empirically considering a number of variables including, for example, the length of the oligonucleotide, percent concentration of cytosine and guanine bases in the oligonucleotide, ionic strength, and incidence of mismatched base pairs.
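The antiparallel pairing described above corresponds to what is commonly called the reverse complement. The sketch below is illustrative only: it handles the four unmodified DNA letters, ignores modified bases such as inosine, and its function names are assumptions.

```python
# Translation table for the four unmodified DNA bases (upper and lower case).
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the sequence that pairs with seq in antiparallel association,
    written 5' to 3'."""
    return seq.translate(_COMPLEMENT)[::-1]


def is_perfect_complement(a, b):
    """True if b, read 5' to 3', pairs with every base of a without mismatches."""
    return reverse_complement(a).upper() == b.upper()
```

For the example sequence agTtcATg used earlier in this section, `reverse_complement("AGTTCATG")` returns "CATGAACT". Duplexes with mismatches or unmatched bases, which the text allows, would fail `is_perfect_complement` and need a tolerance-based comparison instead.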
Stability of a nucleic acid duplex is measured by the melting temperature, or "Tm". The Tm of a particular nucleic acid duplex under specified conditions is the temperature at which half of the base pairs have dissociated.
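As a rough illustration of how length and G+C content influence Tm, the sketch below applies the Wallace rule, a textbook estimate for short unmodified DNA oligonucleotides (2 °C per A or T, 4 °C per G or C). This rule is not part of the present disclosure and does not account for LNA or other stabilizing modifications, which raise the Tm substantially. The function name is an assumption.

```python
def wallace_tm(seq):
    """Estimate the Tm in degrees Celsius of a short unmodified DNA oligo
    using the Wallace rule: 2 * (A + T) + 4 * (G + C)."""
    s = seq.upper()
    at = s.count("A") + s.count("T")
    gc = s.count("G") + s.count("C")
    return 2 * at + 4 * gc
```

For the unmodified 8-mer AGTTCATG the estimate is 2 x 5 + 4 x 3 = 22 °C, which suggests why such short probes would need stabilizing nucleotides to hybridize at typical PCR annealing temperatures.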
As used herein, the term "probe" refers to a labelled oligonucleotide, which forms a duplex structure with a sequence in the target nucleic acid, due to complementarity of at least one sequence in the probe with a sequence in the target region. The probe, preferably, does not contain a sequence complementary to sequence(s) used to prime the polymerase chain reaction. Generally the 3' terminus of the probe will be "blocked" to prohibit incorporation of the probe into a primer extension product. "Blocking" may be achieved by using non-complementary bases or by adding a chemical moiety such as biotin or even a phosphate group to the 3' hydroxyl of the last nucleotide, which may, depending upon the selected moiety, serve a dual purpose by also acting as a label.
The term "label" as used herein refers to any atom or molecule which can be used to provide a detectable (preferably quantifiable) signal, and which can be attached to a nucleic acid or protein. Labels may provide signals detectable by fluorescence, radioactivity, colorimetry, X-ray diffraction or absorption, magnetism, enzymatic activity, and the like.
As defined herein, "5'→3' nuclease activity" or "5' to 3' nuclease activity" refers to that activity of a template-specific nucleic acid polymerase including either a 5'→3' exonuclease activity traditionally associated with some DNA polymerases whereby nucleotides are removed from the 5' end of an oligonucleotide in a sequential manner (i.e., E. coli DNA polymerase I has this activity whereas the Klenow fragment does not), or a 5'→3' endonuclease activity wherein cleavage occurs more than one nucleotide from the 5' end, or both.
As used herein, the term "thermo stable nucleic acid polymerase" refers to an enzyme which is relatively stable to heat when compared, for example, to nucleotide polymerases from E. coli and which catalyzes the polymerization of nucleosides. Generally, the enzyme will initiate synthesis at the 3'-end of the primer annealed to the target sequence, and will proceed in the 5'-direction along the template, and if possessing a 5' to 3' nuclease activity, hydrolyzing or displacing intervening, annealed probe to release both labelled and unlabelled probe fragments or intact probe, until synthesis terminates. A representative thermo stable enzyme isolated from Thermus aquaticus (Taq) is described in U.S. Pat. No. 4,889,818 and a method for using it in conventional PCR is described in Saiki et al., (1988), Science 239:487.
The term "nucleobase" covers the naturally occurring nucleobases adenine (A), guanine (G), cytosine (C), thymine (T) and uracil (U) as well as non-naturally occurring nucleobases such as xanthine, diaminopurine, 8-oxo-N6-methyladenine, 7-deazaxanthine, 7-deazaguanine, N4,N4-ethanocytosin, N6,N6-ethano-2,6-diaminopurine, 5-methylcytosine, 5-(C3-C6)-alkynyl-cytosine, 5-fluorouracil, 5-bromouracil, pseudoisocytosine, 2-hydroxy-5-methyl-4-triazolopyridin, isocytosine, isoguanine, inosine and the "non-naturally occurring" nucleobases described in Benner et al., U.S. Patent No. 5,432,272 and Susan M. Freier and Karl-Heinz Altmann, Nucleic Acids Research, 25: 4429-4443, 1997. The term "nucleobase" thus includes not only the known purine and pyrimidine heterocycles, but also heterocyclic analogues and tautomers thereof. Further naturally and non-naturally occurring nucleobases include those disclosed in U.S. Patent No. 3,687,808; in chapter 15 by Sanghvi, in Antisense Research and Application, Ed. S. T. Crooke and B. Lebleu, CRC Press, 1993; in Englisch et al., Angewandte Chemie, International Edition, 30: 613-722, 1991 (see especially pages 622 and 623); in the Concise Encyclopedia of Polymer Science and Engineering, J. I. Kroschwitz Ed., John Wiley & Sons, pages 858-859, 1990; and in Cook, Anti-Cancer Drug Design 6: 585-607, 1991, each of which is hereby incorporated by reference in its entirety.
The term "nucleosidic base" or "nucleobase analogue" is further intended to include heterocyclic compounds that can serve as nucleosidic bases including certain "universal bases" that are not nucleosidic bases in the most classical sense but serve as nucleosidic bases. Especially mentioned as universal bases are 3-nitropyrrole and 5-nitroindole. Other preferred compounds include pyrene and pyridyloxazole derivatives, pyrenyl, pyrenylmethylglycerol derivatives and the like. Other preferred universal bases include pyrrole, diazole or triazole derivatives, including those universal bases known in the art.
By "universal base" is meant a naturally-occurring or desirably a non-naturally occurring compound or moiety that can pair with a natural base (e.g., adenine, guanine, cytosine, uracil, and/or thymine), and that has a Tm differential of 15, 12, 10, 8, 6, 4, or 2 °C or less as described herein.
By "oligonucleotide," "oligomer," or "oligo" is meant a successive chain of monomers (e.g., glycosides of heterocyclic bases) connected via internucleoside linkages. The linkage between two successive monomers in the oligo consists of 2 to 4, desirably 3, groups/atoms selected from -CH2-, -O-, -S-, -NR^H-, >C=O, >C=NR^H, >C=S, -Si(R")2-, -SO-, -S(O)2-, -P(O)2-, -PO(BH3)-, -P(O,S)-, -P(S)2-, -PO(R")-, -PO(OCH3)-, and -PO(NHR^H)-, where R^H is selected from hydrogen and C1-4-alkyl, and R" is selected from C1-6-alkyl and phenyl. Illustrative examples of such linkages are -CH2-CH2-CH2-, -CH2-CO-CH2-, -CH2-CHOH-CH2-, -O-CH2-O-, -O-CH2-CH2-, -O-CH2-CH= (including R5 when used as a linkage to a succeeding monomer), -CH2-CH2-O-, -NR^H-CH2-CH2-, -CH2-CH2-NR^H-, -CH2-NR^H-CH2-, -O-CH2-CH2-NR^H-, -NR^H-CO-O-, -NR^H-CO-NR^H-, -NR^H-CS-NR^H-, -NR^H-C(=NR^H)-NR^H-, -NR^H-CO-CH2-NR^H-, -O-, -O-CO-CH2-O-, -O-CH2-CO-O-, -CH2-CO-NR^H-, -O-CO-NR^H-, -NR^H-CO-CH2-, -O-CH2-CO-NR^H-, -O-CH2-CH2-NR^H-, -CH=N-O-, -CH2-NR^H-O-, -CH2-O-N= (including R5 when used as a linkage to a succeeding monomer), -CH2-O-NR^H-, -CO-NR^H-CH2-, -CH2-NR^H-O-, -CH2-NR^H-CO-, -O-NR^H-CH2-, -O-NR^H-, -O-CH2-S-, -S-CH2-O-, -CH2-CH2-S-, -O-CH2-CH2-S-, -S-CH2-CH= (including R5 when used as a linkage to a succeeding monomer), -S-CH2-CH2-, -S-, -S-CH2-CH2-S-, -CH2-S-CH2-, -CH2-SO-CH2-, -CH2-SO2-CH2-, -O-SO-O-, -O-S(O)2-O-, -O-S(O)2-CH2-, -O-S(O)2-NR^H-, -NR^H-S(O)2-CH2-, -O-S(O)2-CH2-, -O-P(O)2-O-, -O-P(O,S)-O-, -O-P(S)2-O-, -S-P(O)2-O-, -S-P(O,S)-O-, -S-P(S)2-O-, -O-P(O)2-S-, -O-P(O,S)-S-, -O-P(S)2-S-, -S-P(O)2-S-, -S-P(O,S)-S-, -S-P(S)2-S-, -O-PO(R")-O-, -O-PO(OCH3)-O-, -O-PO(OCH2CH3)-O-, -O-PO(OCH2CH2S-R)-O-, -O-PO(BH3)-O-, -O-PO(NHR^H)-O-, -O-P(O)2-NR^H-, -NR^H-P(O)2-O-, -O-P(O,NR^H)-O-, -CH2-P(O)2-O-, -O-P(O)2-CH2-, and -O-Si(R")2-O-; among which -CH2-CO-NR^H-, -CH2-NR^H-O-, -S-CH2-O-, -O-P(O)2-O-, -O-P(O,S)-O-, -O-P(S)2-O-, -NR^H-P(O)2-O-, -O-P(O,NR^H)-O-, -O-PO(R")-O-, -O-PO(CH3)-O-, and -O-PO(NHR^H)-O-, where R^H is selected from hydrogen and C1-4-alkyl, and R" is selected from C1-6-alkyl and phenyl, are especially desirable. Further illustrative examples are given in Mesmaeker et al., Current Opinion in Structural Biology 1995, 5, 343-355 and Susan M. Freier and Karl-Heinz Altmann, Nucleic Acids Research, 1997, vol 25, pp 4429-4443. The left-hand side of the internucleoside linkage is bound to the 5-membered ring as substituent P* at the 3'-position, whereas the right-hand side is bound to the 5'-position of a preceding monomer.
By "LNA unit" is meant an individual LNA monomer (e.g., an LNA nucleoside or LNA nucleotide) or an oligomer (e.g., an oligonucleotide or nucleic acid) that includes at least one LNA monomer. LNA units as disclosed in WO 99/14226 are in general particularly desirable modified nucleic acids for incorporation into an oligonucleotide of the invention.
Additionally, the nucleic acids may be modified at either the 3' and/or 5' end by any type of modification known in the art. For example, either or both ends may be capped with a protecting group, attached to a flexible linking group, attached to a reactive group to aid in attachment to the substrate surface, etc. Desirable LNA units and their method of synthesis also are disclosed in WO 00/47599, US 6,043,060, US 6,268,490, PCT/JP98/00945, WO 0107455, WO 0100641, WO 9839352, WO 0056746, WO 0056748, WO 0066604, Morita et al., Bioorg. Med. Chem. Lett. 12(1):73-76, 2002; Hakansson et al., Bioorg. Med. Chem. Lett. 11(7):935-938, 2001; Koshkin et al., J. Org. Chem. 66(25):8504-8512, 2001; Kvaerno et al., J. Org. Chem. 66(16):5498-5503, 2001; Hakansson et al., J. Org. Chem. 65(17):5161-5166, 2000; Kvaerno et al., J. Org. Chem. 65(17):5167-5176, 2000; Pfundheller et al., Nucleosides Nucleotides 18(9):2017-2030, 1999; and Kumar et al., Bioorg. Med. Chem. Lett. 8(16):2219-2222, 1998.
Preferred LNA monomers, also referred to as "oxy-LNA", are LNA monomers which include bicyclic compounds as disclosed in PCT Publication WO 03/020739 wherein the bridge between R4* and R2* as shown in formula (I) below together designate -CH2-O- (methyloxy LNA) or -CH2-CH2-O- (ethyloxy LNA, also designated ENA).
Further preferred LNA monomers are designated "thio-LNA" or "amino-LNA" including bicyclic structures as disclosed in WO 99/14226, wherein the heteroatom in the bridge between R4* and R2* as shown in formula (I) below together designate -CH2-S-, -CH2-CH2-S-, -CH2-NH- or -CH2-CH2-NH-.
By "LNA modified oligonucleotide" is meant an oligonucleotide comprising at least one LNA monomeric unit of formula (I), described infra, having the below described illustrative examples of modifications:
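[Formula (I): structure drawing not reproduced in this text; the surviving substituent labels are R4*, R3* and R2*.]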
wherein X is selected from -O-, -S-, -N(R^N*)-, -C(R6R6*)-, -O-C(R7R7*)-, -C(R6R6*)-O-, -S-C(R7R7*)-, -C(R6R6*)-S-, -N(R^N*)-C(R7R7*)-, -C(R6R6*)-N(R^N*)-, and -C(R6R6*)-C(R7R7*)-.
B is selected from a modified base as discussed above e.g. an optionally substituted carbocyclic aryl such as optionally substituted pyrene or optionally substituted pyrenylmethylglycerol, or an optionally substituted heteroalicyclic or optionally substituted heteroaromatic such as optionally substituted pyridyloxazole, optionally substituted pyrrole, optionally substituted diazole or optionally substituted triazole moieties; hydrogen, hydroxy, optionally substituted C1-4-alkoxy, optionally substituted C1-4-alkyl, optionally substituted C1-4-acyloxy, nucleobases, DNA intercalators, photochemically active groups, thermochemically active groups, chelating groups, reporter groups, and ligands.
P designates the radical position for an internucleoside linkage to a succeeding monomer, or a 5'-terminal group, such internucleoside linkage or 5'-terminal group optionally including the substituent R5. One of the substituents R2, R2*, R3, and R3* is a group P* which designates an internucleoside linkage to a preceding monomer, or a 2'/3'-terminal group. The substituents of R1*, R4*, R5, R5*, R6, R6*, R7, R7*, R^N*, and the ones of R2, R2*, R3, and R3* not designating P* each designates a biradical comprising about 1-8 groups/atoms selected from -C(RaRb)-, -C(Ra)=C(Ra)-, -C(Ra)=N-, -C(Ra)-O-, -O-, -Si(Ra)2-, -C(Ra)-S-, -S-, -SO2-, -C(Ra)-N(Rb)-, -N(Ra)-, and >C=Q, wherein Q is selected from -O-, -S-, and -N(Ra)-, and Ra and Rb each is independently selected from hydrogen, optionally substituted C1-12-alkyl, optionally substituted C2-12-alkenyl, optionally substituted C2-12-alkynyl, hydroxy, C1-12-alkoxy, C2-12-alkenyloxy, carboxy, C1-12-alkoxycarbonyl, C1-12-alkylcarbonyl, formyl, aryl, aryloxy-carbonyl, aryloxy, arylcarbonyl, heteroaryl, heteroaryloxy-carbonyl, heteroaryloxy, heteroarylcarbonyl, amino, mono- and di(C1-6-alkyl)amino, carbamoyl, mono- and di(C1-6-alkyl)-amino-carbonyl, amino-C1-6-alkyl-aminocarbonyl, mono- and di(C1-6-alkyl)amino-C1-6-alkyl-aminocarbonyl, C1-6-alkyl-carbonylamino, carbamido, C1-6-alkanoyloxy, sulphono, C1-6-alkylsulphonyloxy, nitro, azido, sulphanyl, C1-6-alkylthio, halogen, DNA intercalators, photochemically active groups, thermochemically active groups, chelating groups, reporter groups, and ligands, where aryl and heteroaryl may be optionally substituted, and where two geminal substituents Ra and Rb together may designate optionally substituted methylene (=CH2), and wherein two non-geminal or geminal substituents selected from Ra, Rb, and any of the substituents R1*, R2, R2*, R3, R3*, R4*, R5, R5*, R6 and R6*, R7, and R7* which are present and not involved in P, P* or the biradical(s) together may form an associated biradical selected from biradicals of the same kind as defined before; the pair(s) of non-geminal substituents thereby forming a mono- or bicyclic entity together with (i) the atoms to which said non-geminal substituents are bound and (ii) any intervening atoms.
Each of the substituents R1*, R2, R2*, R3, R4*, R5, R5*, R6 and R6*, R7, and R7* which are present and not involved in P, P* or the biradical(s), is independently selected from hydrogen, optionally substituted C1_12-alkyl, optionally substituted C2_12-alkenyl, optionally substituted C2_12-alkynyl, hydroxy, C1_12-alkoxy, C2_12-alkenyloxy, carboxy, C1_12-alkoxycarbonyl, C1_12-alkylcarbonyl, formyl, aryl, aryloxy-carbonyl, aryloxy, arylcarbonyl, heteroaryl, heteroaryloxy-carbonyl, heteroaryloxy, heteroarylcarbonyl, amino, mono- and di(C1_6-alkyl)amino, carbamoyl, mono- and di(C1_6-alkyl)-amino-carbonyl, amino-C1_6-alkyl-aminocarbonyl, mono- and di(C1_6-alkyl)amino-C1_6-alkyl-aminocarbonyl, C1_6-alkyl-carbonylamino, carbamido, C1_6-alkanoyloxy, sulphono, C1_6-alkylsulphonyloxy, nitro, azido, sulphanyl, C1_6-alkylthio, halogen, DNA intercalators, photochemically active groups, thermochemically active groups, chelating groups, reporter groups, and ligands, where aryl and heteroaryl may be optionally substituted, and where two geminal substituents together may designate oxo, thioxo, imino, or optionally substituted methylene, or together may form a spiro biradical consisting of a 1-5 carbon atom(s) alkylene chain which is optionally interrupted and/or terminated by one or more heteroatoms/groups selected from -O-, -S-, and -(NRN)- where RN is selected from hydrogen and C1_4-alkyl, and where two adjacent (non-geminal) substituents may designate an additional bond resulting in a double bond; and RN*, when present and not involved in a biradical, is selected from hydrogen and C1_4-alkyl; and basic salts and acid addition salts thereof.
Exemplary 5', 3', and/or 2' terminal groups include -H, -OH, halo (e.g., chloro, fluoro, iodo, or bromo), optionally substituted aryl (e.g., phenyl or benzyl), alkyl (e.g., methyl or ethyl), alkoxy (e.g., methoxy), acyl (e.g., acetyl or benzoyl), aroyl, aralkyl, hydroxy, hydroxyalkyl, alkoxy, aryloxy, aralkoxy, nitro, cyano, carboxy, alkoxycarbonyl, aryloxycarbonyl, aralkoxycarbonyl, acylamino, aroylamino, alkylsulfonyl, arylsulfonyl, heteroarylsulfonyl, alkylsulfinyl, arylsulfinyl, heteroarylsulfinyl, alkylthio, arylthio, heteroarylthio, aralkylthio, heteroaralkylthio, amidino, amino, carbamoyl, sulfamoyl, alkene, alkyne, protecting groups (e.g., silyl, 4,4'-dimethoxytrityl, monomethoxytrityl, or trityl (triphenylmethyl)), linkers (e.g., a linker containing an amine, ethylene glycol, quinone such as anthraquinone), detectable labels (e.g., radiolabels or fluorescent labels), and biotin.
It is understood that references herein to a nucleic acid unit, nucleic acid residue, LNA unit, or similar term are inclusive of both individual nucleoside units and nucleotide units and nucleoside units and nucleotide units within an oligonucleotide.
A "modified base" or other similar term refers to a composition (e.g., a non-naturally occurring nucleobase or nucleosidic base), which can pair with a natural base (e.g., adenine, guanine, cytosine, uracil, and/or thymine) and/or can pair with a non-naturally occurring nucleobase or nucleosidic base. Desirably, the modified base provides a Tm differential of 15, 12, 10, 8, 6, 4, or 2 °C or less as described herein. Exemplary modified bases are described in EP 1 072 679 and WO 97/12896.
The term "chemical moiety" refers to a part of a molecule. "Modified by a chemical moiety" thus refers to a modification of the standard molecular structure by inclusion of an unusual chemical structure. The attachment of said structure can be covalent or non-covalent.
The term "inclusion of a chemical moiety" in an oligonucleotide probe thus refers to attachment of a molecular structure. Such chemical moieties include but are not limited to covalently and/or non-covalently bound minor groove binders (MGB) and/or intercalating nucleic acids (INA) selected from a group consisting of asymmetric cyanine dyes, DAPI, SYBR Green I, SYBR Green II, SYBR Gold, PicoGreen, thiazole orange, Hoechst 33342, Ethidium Bromide, 1-O-(1-pyrenylmethyl)glycerol and Hoechst 33258. Other chemical moieties include the modified nucleobases, nucleosidic bases or LNA modified oligonucleotides.
The term "Dual labelled probe" refers to an oligonucleotide with two attached labels. In one aspect, one label is attached to the 5' end of the probe molecule, whereas the other label is attached to the 3' end of the molecule. A particular aspect of the invention contains a fluorescent molecule attached to one end and a molecule, which is attached to the other end and which is able to quench the fluorophore by Fluorescence Resonance Energy Transfer (FRET). 5' nuclease assay probes and some Molecular Beacons are examples of Dual labelled probes.
The term "5' nuclease assay probe" refers to a dual labelled probe which may be hydrolyzed by the 5'-3' exonuclease activity of a DNA polymerase. A 5' nuclease assay probe is not necessarily hydrolyzed by the 5'-3' exonuclease activity of a DNA polymerase under the conditions employed in the particular PCR assay. The name "5' nuclease assay" is used regardless of the degree of hydrolysis observed and does not indicate any expectation on behalf of the experimenter. The terms "5' nuclease assay probe" and "5' nuclease assay" merely refer to assays where no particular care has been taken to avoid hydrolysis of the involved probe. "5' nuclease assay probes" are often referred to as "TaqMan assay probes", and the "5' nuclease assay" as the "TaqMan assay". These names are used interchangeably in this application.
The term "oligonucleotide analogue" refers to a nucleic acid binding molecule capable of recognizing a particular target nucleotide sequence. A particular oligonucleotide analogue is peptide nucleic acid (PNA) in which the sugar phosphate backbone of an oligonucleotide is replaced by a protein-like backbone. In PNA, nucleobases are attached to the uncharged polyamide backbone yielding a chimeric pseudopeptide-nucleic acid structure, which is homomorphous to nucleic acid forms.
The term "Molecular Beacon" refers to a single or dual labelled probe which is not likely to be affected by the 5'-3' exonuclease activity of a DNA polymerase. Special modifications to the probe, polymerase or assay conditions have been made to avoid separation of the labels or constituent nucleotides by the 5'-3' exonuclease activity of a DNA polymerase. The detection principle thus relies on a detectable difference in label-elicited signal upon binding of the molecular beacon to its target sequence. In one aspect of the invention the oligonucleotide probe forms an intramolecular hairpin structure at the chosen assay temperature mediated by complementary sequences at the 5'- and the 3'-end of the oligonucleotide. The oligonucleotide may have a fluorescent molecule attached to one end and a molecule attached to the other, which is able to quench the fluorophore when brought into close proximity of each other in the hairpin structure. In another aspect of the invention, a hairpin structure is not formed based on complementary structure at the ends of the probe sequence; instead, the detected signal change upon binding may result from interaction between one or both of the labels with the formed duplex structure, from a general change of spatial conformation of the probe upon binding, or from a reduced interaction between the labels after binding. A particular aspect of the molecular beacon contains a number of LNA residues to inhibit hydrolysis by the 5'-3' exonuclease activity of a DNA polymerase.
The term "multi-probe" as used herein refers to a probe which comprises a recognition segment which is a probe sequence sufficiently complementary to a recognition sequence in a target nucleic acid molecule to bind to the sequence under moderately stringent conditions and/or under conditions suitable for PCR, 5' nuclease assay and/or Molecular Beacon analysis (or generally any FRET-based method). Such conditions are well known to those of skill in the art. Preferably, the recognition sequence is found in a plurality of sequences being evaluated, e.g., a transcriptome. A multi-probe according to the invention may comprise a non-natural nucleotide ("a stabilizing nucleotide") and may have a higher binding affinity for the recognition sequence than a probe comprising an identical sequence but without the stabilizing modification. Preferably, at least one nucleotide of a multi-probe is modified by a chemical moiety (e.g., covalently or otherwise stably associated with the probe during at least hybridization stages of a PCR reaction) for increasing the binding affinity of the recognition segment for the recognition sequence.
As used herein, a multi-probe with a higher "binding affinity" for a recognition sequence than a probe which comprises the same sequence but which does not comprise a stabilizing nucleotide refers to a probe for which the association constant (Ka) of the probe recognition segment is higher than the association constant of the complementary strands of a double-stranded molecule. In another preferred embodiment, the association constant of the probe recognition segment is higher than the dissociation constant (Kd) of the complementary strand of the recognition sequence in the target sequence in a double-stranded molecule.
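For orientation, the two constants named above can be written out for a simple two-state binding equilibrium between a probe P and its target T. This is the standard textbook relation, given here as an assumption about how the terms are meant; the symbols P, T, PT, R and T_abs are introduced for illustration only and do not appear in the text.

```latex
% Two-state hybridization equilibrium: P + T <=> PT
K_a = \frac{[PT]}{[P]\,[T]}, \qquad
K_d = \frac{[P]\,[T]}{[PT]} = \frac{1}{K_a}, \qquad
\Delta G^{\circ} = -R\,T_{\mathrm{abs}}\,\ln K_a
```

Under these definitions a higher Ka (equivalently, a lower Kd) corresponds to a more negative standard free energy of duplex formation, that is, a more stable probe-target duplex.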
A "multi-probe library" or "library of multi-probes" comprises a plurality of multi-probes, such that the probes in the library together are able to recognise a major proportion of a transcriptome, including the most abundant sequences, such that about 60%, about 70%, about 80%, about 85%, more preferably about 90%, and still more preferably 95%, of the target nucleic acids in the transcriptome are detected by the probes.
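The coverage figure in this definition can be illustrated with a minimal sketch. It assumes, for simplicity, that a transcript counts as "detected" when it contains the recognition sequence of at least one probe as an exact substring; the function name `library_coverage` and the toy sequences are hypothetical and are not taken from the application.

```python
def library_coverage(recognition_sequences, transcripts):
    """Fraction of transcripts that contain at least one recognition sequence.

    Assumption (illustrative only): detection is modelled as an exact
    substring match on the given strand.
    """
    if not transcripts:
        return 0.0
    detected = sum(
        1
        for transcript in transcripts
        if any(seq in transcript for seq in recognition_sequences)
    )
    return detected / len(transcripts)


# Hypothetical toy data: four "transcripts" and a two-probe library.
toy_transcripts = [
    "ATGCTGGAGGCCA",  # contains neither recognition sequence
    "TTCTGGAGGAAAC",  # contains CTGGAGGA
    "GGGCAGCCTCCAT",  # contains CAGCCTCC
    "ATATATATATATA",  # contains neither recognition sequence
]
toy_library = ["CTGGAGGA", "CAGCCTCC"]

coverage = library_coverage(toy_library, toy_transcripts)
print(f"coverage = {coverage:.0%}")
```

On this toy input two of the four transcripts are detected, so the library would fall short of the roughly 60% to 95% coverage described above.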
Monomers are referred to as being "complementary" if they contain nucleobases that can form hydrogen bonds according to Watson-Crick base-pairing rules (e.g. G with C, A with T or A with U) or other hydrogen bonding motifs such as for example diaminopurine with T, inosine with C, pseudoisocytosine with G, etc.
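As a minimal sketch of the Watson-Crick part of this definition, the check below tests whether two strands, both written 5' to 3', pair antiparallel over their full length. The names `WATSON_CRICK`, `monomers_complementary` and `strands_complementary` are hypothetical, and the other hydrogen bonding motifs mentioned above (diaminopurine, inosine, pseudoisocytosine) are deliberately left out.

```python
# Watson-Crick pairs named in the text: G with C, A with T, A with U.
WATSON_CRICK = {
    ("G", "C"), ("C", "G"),
    ("A", "T"), ("T", "A"),
    ("A", "U"), ("U", "A"),
}


def monomers_complementary(a, b):
    """True if the two nucleobases form a Watson-Crick pair."""
    return (a.upper(), b.upper()) in WATSON_CRICK


def strands_complementary(probe, target):
    """True if probe and target (both 5'->3') pair antiparallel at every position."""
    if len(probe) != len(target):
        return False
    return all(
        monomers_complementary(p, t)
        for p, t in zip(probe, reversed(target))
    )


print(strands_complementary("CTGGAGGA", "TCCTCCAG"))  # fully complementary pair
print(strands_complementary("CTGGAGGA", "TCCTCCAA"))  # one mismatch
```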
The term "succeeding monomer" relates to the neighbouring monomer in the 5'-terminal direction and the "preceding monomer" relates to the neighbouring monomer in the 3'-terminal direction.
As used herein, the term "target population" refers to a plurality of different sequences of nucleic acids, for example the genome or other nucleic acids from a particular species including the transcriptome of the genome, wherein the transcriptome refers to the complete collection of transcribed elements of the genome of any species. Normally, the number of different target sequences in a nucleic acid population is at least 100, but as will be clear the number is often much higher (more than 200, 500, 1000, and 10000 in the case where the target population is a eukaryotic transcriptome).
As used herein, the term "target nucleic acid" refers to any relevant nucleic acid of a single specific sequence, e.g., a biological nucleic acid, e.g., derived from a patient, an animal (a human or non-human animal), a plant, a bacterium, a fungus, an archaeon, a cell, a tissue, an organism, etc. For example, where the target nucleic acid is derived from a bacterium, archaeon, plant, non-human animal, cell, fungus, or non-human organism, the method optionally further comprises selecting the bacterium, archaeon, plant, non-human animal, cell, fungus, or non-human organism based upon detection of the target nucleic acid. In one embodiment, the target nucleic acid is derived from a patient, e.g., a human patient. In this embodiment, the invention optionally further includes selecting a treatment, diagnosing a disease, or diagnosing a genetic predisposition to a disease, based upon detection of the target nucleic acid.
As used herein, the term "target sequence" refers to a specific nucleic acid sequence within any target nucleic acid.
The term "stringent conditions", as used herein, refers to the "stringency" which occurs within a range from about Tm - 5 °C (5 °C below the melting temperature (Tm) of the probe) to about 20 °C to 25 °C below Tm. As will be understood by those skilled in the art, the stringency of hybridization may be altered in order to identify or detect identical or related polynucleotide sequences. Hybridization techniques are generally described in Nucleic Acid Hybridization, A Practical Approach, Ed. Hames, B. D. and Higgins, S. J., IRL Press, 1985; Gall and Pardue, Proc. Natl. Acad. Sci., USA 63: 378-383, 1969; and John, et al. Nature 223: 582-587, 1969.
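The temperature range in this definition is simple arithmetic on Tm, sketched below. The function name `stringent_window` and its `lower_offset` parameter are hypothetical; the 5 °C upper offset and the 20 °C to 25 °C lower offsets are the values stated above.

```python
def stringent_window(tm_celsius, lower_offset=25.0):
    """Return (lowest, highest) hybridization temperature in °C.

    The upper bound is Tm - 5 °C; the lower bound is Tm minus
    `lower_offset`, which the definition places at about 20 to 25 °C.
    """
    if not 20.0 <= lower_offset <= 25.0:
        raise ValueError("lower_offset is expected to be between 20 and 25 °C")
    return (tm_celsius - lower_offset, tm_celsius - 5.0)


# Example: a probe with an assumed Tm of 65 °C.
widest = stringent_window(65.0)          # uses the 25 °C lower offset
narrowest = stringent_window(65.0, 20.0)  # uses the 20 °C lower offset
print(widest, narrowest)
```

For a probe with a Tm of 65 °C this gives a window of 40 °C to 60 °C, or 45 °C to 60 °C if the 20 °C lower offset is used.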
Multi-probes
Referring now to Fig. 1B, a multi-probe according to the invention is preferably a short sequence probe which binds to a recognition sequence found in a plurality of different target nucleic acids, such that the multi-probe specifically hybridizes to the target nucleic acid but does not hybridize to any detectable level to nucleic acid molecules which do not comprise the recognition sequence. Preferably, a collection of multi-probes, or multi-probe library, is able to recognize a major proportion of a transcriptome, including the most abundant sequences, such that about 60%, about 70%, about 80%, about 85%, more preferably about 90%, and still more preferably 95%, of the target nucleic acids in the transcriptome are detected by the probes. A multi-probe according to the invention comprises a "stabilizing modification", e.g., a non-natural nucleotide ("a stabilizing nucleotide"), and has higher binding affinity for the recognition sequence than a probe comprising an identical sequence but without the stabilizing sequence. Preferably, at least one nucleotide of a multi-probe is modified by a chemical moiety (e.g., covalently or otherwise stably associated with the probe during at least hybridization stages of a PCR reaction) for increasing the binding affinity of the recognition segment for the recognition sequence.
In one aspect, a multi-probe of from 6 to 12 nucleotides comprises from 1 to 6 or even up to 12 stabilizing nucleotides, such as LNA nucleotides. An LNA enhanced probe library contains short probes that recognize a short recognition sequence (e.g., 8-9 nucleotides). LNA nucleobases can comprise α-LNA molecules (see, e.g., WO 00/66604) or xylo-LNA molecules (see, e.g., WO 00/56748).
In one aspect, it is preferred that the Tm of the multi-probe when bound to its recognition sequence is between about 55 °C and about 70 °C.
In another aspect, the multi-probes comprise one or more modified nucleobases.
Modified base units may comprise a cyclic unit (e.g., a carbocyclic unit such as pyrenyl) that is joined to a nucleic unit, such as a 1'-position of a furanosyl ring, through a linker, such as a straight or branched chain alkylene or alkenylene group. Alkylene groups suitably have from 1 (i.e., -CH2-) to about 12 carbon atoms, more typically 1 to about 8 carbon atoms, still more typically 1 to about 6 carbon atoms. Alkenylene groups suitably have one, two or three carbon-carbon double bonds and from 2 to about 12 carbon atoms, more typically 2 to about 8 carbon atoms, still more typically 2 to about 6 carbon atoms.
Multi-probes according to the invention are ideal for performing such assays as real-time PCR as the probes according to the invention are preferably less than about 25 nucleotides, less than about 15 nucleotides, less than about 10 nucleotides, e.g., 8 or 9 nucleotides. Preferably, a multi-probe can specifically hybridize with a recognition sequence within a target sequence under PCR conditions and preferably the recognition sequence is found in at least about 50, at least about 100, at least about 200, at least about 500 different target nucleic acid molecules. A library of multi-probes according to the invention will comprise multi-probes, which comprise non-identical recognition sequences, such that any two multi-probes hybridize to different sets of target nucleic acid molecules. In one aspect, the sets of target nucleic acid molecules comprise some identical target nucleic acid molecules, i.e., a target nucleic acid molecule comprising a gene sequence of interest may be bound by more than one multi-probe. Such a target nucleic acid molecule will contain at least two different recognition sequences which may overlap by one or more, but fewer than x, nucleotides of a recognition sequence comprising x nucleotides.
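The requirement that a recognition sequence occur in many different target molecules can be checked by indexing every n-mer against the targets that contain it. The sketch below is only an illustration under stated assumptions: the function name `targets_per_recognition_sequence`, the toy targets, and the choice n = 4 (instead of the 8 or 9 nucleotides mentioned above) are hypothetical, and matching is exact on the given strand.

```python
from collections import defaultdict


def targets_per_recognition_sequence(targets, n=9):
    """Map each n-mer to the set of target names in which it occurs."""
    hits = defaultdict(set)
    for name, sequence in targets.items():
        for start in range(len(sequence) - n + 1):
            hits[sequence[start:start + n]].add(name)
    return hits


# Hypothetical toy targets; n=4 keeps the example readable.
toy_targets = {"t1": "ACGTAC", "t2": "TACGTT", "t3": "GGGGGG"}
hits = targets_per_recognition_sequence(toy_targets, n=4)

# Candidate recognition sequences found in at least two different targets.
shared = sorted(seq for seq, names in hits.items() if len(names) >= 2)
print(shared, sorted(hits["ACGT"]))
```

Here only ACGT occurs in more than one target (t1 and t2); a real selection would apply the same counting with n = 8 or 9 and the thresholds of 50 to 500 targets given above.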
In one aspect, a multi-probe library comprises a plurality of different multi-probes, each different probe localized at a discrete location on a solid substrate. As used herein, "localize" refers to being limited or addressed at the location such that a hybridization event detected at the location can be traced to a probe of known sequence identity. A localized probe may or may not be stably associated with the substrate. For example, the probe could be in solution in the well of a microtiter plate and thus localized or addressed to the well. Alternatively, or additionally, the probe could be stably associated with the substrate such that it remains at a defined location on the substrate after one or more washes of the substrate with a buffer. For example, the probe may be chemically associated with the substrate, either directly or through a linker molecule, which may be a nucleic acid sequence, a peptide or other type of molecule, which has an affinity for molecules on the substrate.
Alternatively, the target nucleic acid molecules may be localized on a substrate (e.g., as a cell or cell lysate or nucleic acids dotted onto the substrate).
Once the appropriate sequences are determined, multi-LNA probes are preferably chemically synthesized using commercially available methods and equipment as described in the art (Tetrahedron 54: 3607-30, 1998). For example, the solid phase phosphoramidite method can be used to produce short LNA probes (Caruthers, et al., Cold Spring Harbor Symp. Quant. Biol. 47: 411-418, 1982; Adams, et al., J. Am. Chem. Soc. 105: 661, 1983).
The determination of the extent of hybridization of multi-probes from a multi-probe library to one or more target sequences (preferably to a plurality of target sequences) may be carried out by any of the methods well known in the art. If there is no detectable hybridization, the extent of hybridization is thus 0. Typically, labelled signal nucleic acids are used to detect hybridization. Complementary nucleic acids or signal nucleic acids may be labelled by any one of several methods typically used to detect the presence of hybridized polynucleotides.
The most common method of detection is the use of ligands, which bind to labelled antibodies, fluorophores or chemiluminescent agents. Other labels include antibodies, which can serve as specific binding pair members for a labelled ligand. The choice of label depends on sensitivity required, ease of conjugation with the probe, stability requirements, and available instrumentation.
LNA-containing probes are typically labelled during synthesis. The flexibility of the phosphoramidite synthesis approach furthermore facilitates the easy production of LNAs carrying all commercially available linkers, fluorophores and labelling-molecules available for this standard chemistry. LNA may also be labelled by enzymatic reactions, e.g. by kinasing.
Multi-probes according to the invention can comprise single labels or a plurality of labels. In one aspect, the plurality of labels comprise a pair of labels which interact with each other either to produce a signal or to produce a change in a signal when hybridization of the multi-probe to a target sequence occurs.
In another aspect, the multi-probe comprises a fluorophore moiety and a quencher moiety, positioned in such a way that the hybridized state of the probe can be distinguished from the unhybridized state of the probe by an increase in the fluorescent signal from the nucleotide.
In one aspect, the multi-probe comprises, in addition to the recognition element, first and second complementary sequences, which specifically hybridize to each other, when the probe is not hybridized to a recognition sequence in a target molecule, bringing the quencher molecule in sufficient proximity to said reporter molecule to quench fluorescence of the reporter molecule. Hybridization of the target molecule distances the quencher from the reporter molecule and results in a signal, which is proportional to the amount of hybridization.
In another aspect, polymerization of strands of nucleic acids can be detected using a polymerase with 5' nuclease activity. Fluorophore and quencher molecules are incorporated into the probe in sufficient proximity such that the quencher quenches the signal of the fluorophore molecule when the probe is hybridized to its recognition sequence.
Cleavage of the probe by the polymerase with 5' nuclease activity results in separation of the quencher and fluorophore molecule, and the presence in increasing amounts of signal as nucleic acid sequences
In the present context, the term "label" means a reporter group, which is detectable either by itself or as a part of a detection series. Examples of functional parts of reporter groups are biotin, digoxigenin, fluorescent groups (groups which are able to absorb electromagnetic radiation, e.g. light or X-rays, of a certain wavelength, and which subsequently re-emit the energy absorbed as radiation of longer wavelength; illustrative examples are DANSYL (5-dimethylamino-1-naphthalenesulfonyl), DOXYL (N-oxyl-4,4-dimethyloxazolidine), PROXYL (N-oxyl-2,2,5,5-tetramethylpyrrolidine), TEMPO (N-oxyl-2,2,6,6-tetramethylpiperidine), dinitrophenyl, acridines, coumarins, Cy3 and Cy5 (trademarks for Biological Detection Systems, Inc.), erythrosine, coumaric acid, umbelliferone, Texas red, rhodamine, tetramethyl rhodamine, Rox, 7-nitrobenzo-2-oxa-1-diazole (NBD), pyrene, fluorescein, Europium, Ruthenium, Samarium, and other rare earth metals), radio isotopic labels, chemiluminescence labels (labels that are detectable via the emission of light during a chemical reaction), spin labels (a free radical (e.g. substituted organic nitroxides) or other paramagnetic probes (e.g. Cu2+, Mg2+) bound to a biological molecule being detectable by the use of electron spin resonance spectroscopy). Especially interesting examples are biotin, fluorescein, Texas Red, rhodamine, dinitrophenyl, digoxigenin, Ruthenium, Europium, Cy5, Cy3, etc.
Suitable samples of target nucleic acid molecule may comprise a wide range of eukaryotic and prokaryotic cells, including protoplasts; or other biological materials, which may harbour target nucleic acids. The methods are thus applicable to tissue culture animal cells, animal cells (e.g., blood, serum, plasma, reticulocytes, lymphocytes, urine, bone marrow tissue, cerebrospinal fluid or any product prepared from blood or lymph) or any type of tissue biopsy (e.g. a muscle biopsy, a liver biopsy, a kidney biopsy, a bladder biopsy, a bone biopsy, a cartilage biopsy, a skin biopsy, a pancreas biopsy, a biopsy of the intestinal tract, a thymus biopsy, a mammae biopsy, a uterus biopsy, a testicular biopsy, an eye biopsy or a brain biopsy, e.g., homogenized in lysis buffer), archival tissue nucleic acids, plant cells or other cells sensitive to osmotic shock and cells of bacteria, yeasts, viruses, mycoplasmas, protozoa, rickettsia, fungi and other small microbial cells and the like.
Target nucleic acids which are recognized by a plurality of multi-probes can be assayed to detect sequences which are present in less than 10% in a population of target nucleic acid molecules, less than about 5%, less than about 1%, less than about 0.1%, and less than about 0.01% (e.g., such as specific gene sequences). The type of assay used to detect such sequences is a non-limiting feature of the invention and may comprise PCR or some other suitable assay as is known in the art or developed to detect recognition sequences which are found in less than 10% of a population of target nucleic acid molecules.
In one aspect, the assay to detect the less abundant recognition sequences comprises hybridizing at least one primer capable of specifically hybridizing to the recognition sequence but substantially incapable of hybridizing to more than about 50, more than about 25, more than about 10, more than about 5, more than about 2 target nucleic acid molecules (e.g., the probe recognizes both copies of a homozygous gene sequence), or more than one target nucleic acid in a population (e.g., such as an allele of a single copy heterozygous gene sequence present in a sample). In one preferred aspect a pair of such primers is provided and flank the recognition sequence identified by the multi-probe, i.e., are within an amplifiable distance of the recognition sequence such that amplicons of about 40-5000 bases can be produced, and preferably, 50-500 or more preferably 60-100 base amplicons are produced.
One or more of the primers may be labelled.
Various amplifying reactions are well known to one of ordinary skill in the art and include, but are not limited to PCR, RT-PCR, LCR, in vitro transcription, rolling circle PCR, OLA and the like. Multiple primers can also be used in multiplex PCR for detecting a set of specific target molecules.
The invention further provides a method for designing multi-probe sequences for use in methods and kits according to the invention. A flow chart outlining the steps of the method is shown in Fig. 2.
In one aspect, a plurality of n-mers of n nucleotides is generated in silico, containing all possible n-mers. A subset of n-mers is selected which have a Tm > 60 °C. In another aspect, a subset of these probes is selected which do not self-hybridize to provide a list or database of candidate n-mers. The sequence of each n-mer is used to query a database comprising a plurality of target sequences. Preferably, the target sequence database comprises expressed sequences, such as human mRNA sequences.
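The candidate-generation step can be illustrated with a minimal Python sketch. It is not the getcover program: `toy_tm` is a hypothetical placeholder (the simple Wallace rule for plain DNA) that does not model LNA-enhanced duplex stability, and the self-hybridization filter is reduced to rejecting perfectly self-complementary n-mers. Both would be replaced by a proper thermodynamic model in practice.

```python
from itertools import product

# Translation table for complementing lowercase DNA bases.
COMPLEMENT = str.maketrans("acgt", "tgca")


def reverse_complement(seq):
    """Return the reverse complement of a lowercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def all_nmers(n):
    """Enumerate all 4**n possible n-mers in alphabetical order."""
    return ["".join(p) for p in product("acgt", repeat=n)]


def toy_tm(seq):
    """Hypothetical stand-in for a Tm model: Wallace rule, 2*(A+T) + 4*(G+C).

    This does NOT describe LNA probes; it only makes the sketch runnable.
    """
    gc = seq.count("g") + seq.count("c")
    return 2 * (len(seq) - gc) + 4 * gc


def candidate_nmers(n, min_tm, tm_fn=toy_tm):
    """Keep n-mers that pass the Tm threshold and are not self-complementary."""
    return [
        s for s in all_nmers(n)
        if tm_fn(s) >= min_tm and s != reverse_complement(s)
    ]
```

A caller would pass its own `tm_fn` and threshold; the values used with `toy_tm` are illustrative only and are not comparable to the 60 °C criterion in the text.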
From the list of candidate n-mers used to query the database, n-mers are selected that identify a maximum number of target sequences (e.g., n-mers which comprise recognition segments which are complementary to subsequences of a maximal number of target sequences in the target database) to generate an n-mer/target sequence matrix. Sequences of n-mers, which bind to a maximum number of target sequences, are stored in a database of optimal probe sequences and these are subtracted from the candidate n-mer database.
Target sequences that are identified by the first set of optimal probes are removed from the target sequence database. The process is then repeated for the remaining candidate probes until a set of multi-probes is identified comprising n-mers which cover more than about 60%, more than about 80%, more than about 90% or more than about 95% of target sequences. The optimal sequences identified at each step may be used to generate a database of virtual multi-probe sequences. Multi-probes may then be synthesized which comprise sequences from the multi-probe database.
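The iterative select-and-subtract procedure described above resembles a greedy set cover. The following Python sketch shows that idea under simplifying assumptions: the n-mer/target matrix is given as a dictionary from n-mer to the set of target identifiers it matches, and the function name `greedy_cover` and its `goal` parameter are illustrative, not taken from the source.

```python
def greedy_cover(index, n_targets, goal=0.95):
    """Pick n-mers one at a time, each covering the most still-uncovered targets.

    index     -- dict mapping n-mer -> set of target ids containing it
    n_targets -- total number of targets in the database
    goal      -- stop once this fraction of targets is covered
    """
    remaining = {nmer: set(hits) for nmer, hits in index.items()}
    covered = set()
    chosen = []
    while remaining and len(covered) / n_targets < goal:
        # Sorting first makes tie-breaking deterministic (alphabetical).
        best = max(sorted(remaining), key=lambda m: len(remaining[m]))
        gained = remaining.pop(best)
        if not gained:
            break  # no candidate covers any new target
        chosen.append(best)
        covered |= gained
        # Remove the newly covered targets from every other candidate.
        for hits in remaining.values():
            hits -= gained
    return chosen, len(covered) / n_targets
```

The function returns both the ordered probe list and the achieved coverage, so the caller can compare the result against thresholds such as 60%, 80%, 90% or 95%.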
In another aspect, the method further comprises evaluating the general applicability of a given candidate probe recognition sequence for inclusion in the growing set of optimal probe candidates by both a query against the remaining target sequences as well as a query against the original set of target sequences. In one preferred aspect only probe recognition sequences that are frequently found in both the remaining target sequences and in the original target sequences are added to the growing set of optimal probe recognition sequences.
In a most preferred aspect this is accomplished by calculating the product of the scores from these queries and selecting the probe recognition sequence with the highest product that still is among the probe recognition sequences with the 20% best scores in the query against the current targets.
The invention also provides computer program products for facilitating the method described above (see, e.g., Fig. 2). In one aspect, the computer program product comprises program instructions, which can be executed by a computer or a user device connectable to a network in communication with a memory.
The invention further provides a system comprising a computer memory comprising a database of target sequences and an application system for executing instructions provided by the computer program product.
Kits Comprising Multi-Probes
A preferred embodiment of the invention is a kit for the characterisation or detection or quantification of target nucleic acids comprising samples of a library of multi-probes. In one aspect, the kit comprises in silico protocols for their use. In another aspect, the kit comprises information relating to suggestions for obtaining inexpensive DNA primers.
The probes contained within these kits may have any or all of the characteristics described above. In one preferred aspect, a plurality of probes comprises at least one stabilizing nucleobase, such as an LNA nucleobase.
In another aspect, the plurality of probes comprises a nucleotide coupled or stably associated with at least one chemical moiety for increasing the stability of binding of the probe. In a further preferred aspect, the kit comprises a number of different probes for covering at least 60% of a population of different target sequences such as a transcriptome. In one preferred aspect, the transcriptome is a human transcriptome.
In another aspect, the kit comprises at least one probe labelled with one or more labels. In still another aspect, one or more probes comprise labels capable of interacting with each other in a FRET-based assay, i.e., the probes may be designed to perform in 5' nuclease or Molecular Beacon-based assays.
The kits according to the invention allow a user to quickly and efficiently develop assays for many different nucleic acid targets. The kit may additionally comprise one or more reagents for performing an amplification reaction, such as PCR.
EXAMPLES
The invention will now be further illustrated with reference to the following examples. It will be appreciated that what follows is by way of example only and that modifications to detail may be made while still falling within the scope of the invention.
In the following Examples probe reference numbers designate the LNA-oligonucleotide sequences shown in the synthesis examples below.
Source of transcriptome data
The human transcriptome mRNA sequences were obtained from ENSEMBL. ENSEMBL is a joint project between EMBL - EBI and the Sanger Institute to develop a software system which produces and maintains automatic annotation on eukaryotic genomes (see, e.g., Butler, Nature 406 (6794): 333, 2000). ENSEMBL is primarily funded by the Wellcome Trust. It is noted that sequence data can be obtained from any type of database comprising expressed sequences; however, ENSEMBL is particularly attractive because it presents up-to-date sequence data and the best possible annotation for metazoan genomes. The file "Homo_sapiens.cdna.fa" was downloaded from the ENSEMBL ftp site:
ftp://ftp.ensembl.org/pub/current_human/data/ on May 14, 2003. The file contains all ENSEMBL transcript predictions (i.e., 37347 different sequences). From each sequence the region starting at 50 nucleotides upstream from the 3' end to 1050 nucleotides upstream of the 3' end was extracted. The chosen set of probe sequences (see best mode below) was further evaluated against the human mRNA sequences in the Reference Sequence (RefSeq) collection from NCBI. RefSeq standards serve as the basis for medical, functional, and diversity studies; they provide a stable reference for gene identification and characterization, mutation analysis, expression studies, polymorphism discovery, and comparative analyses. The RefSeq collection aims to provide a comprehensive, integrated, non-redundant set of sequences, including genomic DNA, transcript (RNA), and protein products, for major research organisms.
Similar coverage was found for both the 37347 sequences from ENSEMBL and the sequences in the RefSeq collection, i.e., demonstrating that the type of database is a non-limiting feature of the invention.
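The extraction of the region between 50 and 1050 nucleotides upstream of the 3' end can be sketched in a few lines of Python. The function name `three_prime_window` and the handling of transcripts shorter than the window (they are simply truncated, or yield an empty string) are assumptions made for illustration; the source does not state how short transcripts were treated.

```python
def three_prime_window(seq, near=50, far=1050):
    """Return the stretch from `far` to `near` nucleotides upstream of the 3' end.

    Sequences shorter than `far` are truncated at the 5' end; sequences
    no longer than `near` yield an empty string (an assumption of this sketch).
    """
    end = len(seq) - near
    if end <= 0:
        return ""
    start = max(0, len(seq) - far)
    return seq[start:end]


def extract_windows(transcripts, near=50, far=1050):
    """Apply the window to a dict of transcript id -> sequence."""
    return {tid: three_prime_window(s, near, far) for tid, s in transcripts.items()}
```

For a transcript of at least 1050 nucleotides this yields exactly 1000 nucleotides, matching the "1000 nt near the 3' end" referred to elsewhere in the text.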
Calculation of a multi-probe dataset (Alfa library)
Special software running on UNIX computers was designed to calculate the optimal set of probes in a library. The algorithm is illustrated in the flow chart shown in Fig. 2.
The optimal coverage of a transcriptome is found in two steps. In the first step a sparse matrix of n_mers and genes is determined, so that the number of genes that contain a given n_mer can be found easily. This is done by running the getcover program with the -p option and a sequence file in FASTA format as input.
The second step is to determine the optimal cover with an algorithm, based on the matrix determined in the first step. For this purpose a program such as the getcover program is run with the matrix as input. However, programs performing similar functions and for executing similar steps may be readily designed by those of skill in the art.
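The first of the two steps, building a sparse matrix of n-mers versus genes, can be approximated in Python with a dictionary of sets. This is a sketch only and does not reproduce the getcover data structures; the name `build_matrix` and the optional `allowed` filter are illustrative, and the sketch indexes the gene strand as given without handling the complementary strand.

```python
from collections import defaultdict


def build_matrix(genes, lengths=(8, 9), allowed=None):
    """Build a sparse n-mer/gene matrix as a dict: n-mer -> set of gene ids.

    genes   -- dict mapping gene id -> nucleotide sequence
    lengths -- n-mer lengths to index
    allowed -- optional set of acceptable n-mers (e.g. those passing the
               Tm and self-hybridization filters); None keeps everything
    """
    matrix = defaultdict(set)
    for gene_id, seq in genes.items():
        seq = seq.lower()
        for n in lengths:
            # Slide a window of width n across the sequence.
            for i in range(len(seq) - n + 1):
                nmer = seq[i:i + n]
                if allowed is None or nmer in allowed:
                    matrix[nmer].add(gene_id)
    return matrix
```

With this representation, the number of genes containing a given n-mer is simply `len(matrix[nmer])`, which is the quantity the second step needs to look up repeatedly.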
Obtaining good oligonucleotide cover of the transcriptome
1. All 4^n n-mers are generated and the expected melting temperature is calculated. n-mers with a melting temperature below 60 °C or with high self-hybridisation energy are removed from the set. This gives a list of n-mers that have acceptable physical properties.
2. A list of gene sequences representing the human transcriptome is extracted from the ENSEMBL database.
3. Start of the main loop: Given the n-mer and gene list a sparse matrix of n-mers versus genes is generated by identifying all n-mers in a given gene and storing the result in a matrix.
4. If this is the first iteration, a copy of the matrix is put aside, and named the "total n-mer/gene matrix".
5. The n-mer that covers most genes is identified and the number of genes it covers is stored as "max_gene".
6. The coverage of the remaining n-mers in the matrix is determined and n-mers with coverage of at least 80% of max_gene are stored in the "n-mer list with good coverage".
7. The optimal n-mer is the one where the product of its current coverage and the total coverage is maximal.
8. The optimal n-mer is deleted from the n-mer list (step 1).
9. The genes covered by this n-mer are deleted from the gene list (step 2).
10. The n-mer is added to the optimal n-mer list, the process is continued from step 3 until no more n-mers can be found.
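Steps 3 to 10 can be sketched in Python as follows. The sketch assumes that step 6 shortlists n-mers (rather than genes) whose current coverage is at least 80% of max_gene, and that the "total coverage" in step 7 is the count from the matrix saved in step 4. The function name `select_probes` and the `keep` parameter are hypothetical and are not taken from the getcover source.

```python
def select_probes(index, keep=0.8):
    """Iteratively select n-mers using the 80% rule and the product rule.

    index -- dict mapping n-mer -> set of gene ids containing it
    keep  -- fraction of max_gene an n-mer must reach to be shortlisted
    """
    # Step 4: remember the coverage in the original ("total") matrix.
    total = {nmer: len(genes) for nmer, genes in index.items()}
    current = {nmer: set(genes) for nmer, genes in index.items()}
    chosen = []
    while current:
        # Step 5: best coverage among the remaining genes.
        max_gene = max(len(genes) for genes in current.values())
        if max_gene == 0:
            break  # no remaining n-mer covers any remaining gene
        # Step 6: shortlist n-mers with good current coverage.
        good = [m for m, genes in current.items() if len(genes) >= keep * max_gene]
        # Step 7: product of current and total coverage (ties: alphabetical).
        best = max(sorted(good), key=lambda m: len(current[m]) * total[m])
        # Steps 8-10: remove the n-mer and the genes it covers, record it.
        covered = current.pop(best)
        chosen.append(best)
        for genes in current.values():
            genes -= covered
    return chosen
```

The product rule favours n-mers that are common in the whole transcriptome as well as in the remaining genes, so a lower `keep` value lets a generally frequent n-mer win over one that is only slightly better on the current remainder.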
The program code ("getcover" version 1.0 by Niels Tolstrup 2003) for calculation of a multi-probe dataset is listed in Fig. 17. It consists of three proprietary modules:
getcover.c, dyp.c, dyp.h
The program also incorporates four modules covered by the GNU Lesser General Public Licence:
getopt.c, getopt.h, getopt1.c, getopt_init.c
/* Copyright (C) 1987,88,89,90,91,92,93,94,95,96,98,99,2000,2001 Free Software Foundation, Inc.
These files are part of the GNU C Library. The GNU C Library is free software; you can redistribute it and/or modify it under the terms of the GNU Lesser General Public License as published by the Free Software Foundation */
The software was compiled with aap. The main.aap file used to make the program is likewise listed in Fig. 17.
To run the compiled program the following commands are used:
getcover -1 8,9 -b bad.lst -p -f < h_sap_cdna_50_1050.fasta > h_sap_cdna_50_1050_I9.stat
getcover -1 8,9 -b bad.lst -s < h_sap_cdna_50_1050_I9.stat > h_sap_cdna_50_1050_I9.cover
The computer program was used with instructions for implementing the algorithm described above to analyze the human transcriptome with the following parameter settings:
L89: probe length = 8 or 9 nucleotides
ii: inclusion fraction = 100%
d15: delta Tm required for target duplex against self duplex = 15 °C
t62: minimum Tm for target duplex = 62 °C
c: complementary target sequence used as well
m80: optimal probes selected among the most general probes addressing the remaining targets with the product rule and the 80% rule
n: LNA nucleotides were preferably included in the central part of the recognition segment
b: bad.lst is a list of oligos that are known experimentally to be bad and must be deselected;
and resulted in the identification of a database of multi-probe target sequences.
Target sequences in this database are exemplary optimal targets for a multi-probe library.
These optimal multi-probes are listed in TABLE 1 below and comprise 5' fluorescein fluorophores and 3' Eclipse or other quenchers (see below).
TABLE 1 Dual label oligonucleotide probes cagcctcc cagagcca agctgtga aggaggga aggaggag ctggaagc cagagagc tgtggaga cccaggag cagccaga tgaggaga ctggggaa ctccagcc cttctggg acagtgga ctcctgca ctcctcca ttctgcca acagccat tgaggtgg ctgctgcc aggagaga tttctcca aaggcagc ctccagca ttcctg ca cagtggtg ctgtggca ctgctggg tttgggga aaagggga agaagggc cttcctgg caggcaga tgtgggaa tggatgga acagcagc ctgtgcca actgggaa ttctggca cagctcca ttccctgg tcacagga cagaaggc ccccaccc aaccccat ttcctccc atcccaga tggtggtg ctgcccag aggtggaa caggtgct ttcctcca ctgaggca tgtggaca ctgtctcc ctgctcca ctgctggt tggaggcc tgctgtga tggagaga cagtgcca atggtgaa agctggat aaggcaga atggggaa ctggaagg tggagagc cagccagg agggagag caggcagc cttggtgg cagcagga ctctgcca tcaggagc caccttgg ctgtgctg ctgctgag acacacac cagccacc agaggaga ccctccca catcttca ctgtgacc ctgtggct aggaggca cacctgca agggggaa cagtggct cactgcca ccagggcc tgggacca ttctccca ctgtgtgg cagaggca acagggaa cctggagc ttcccagt ctgggact ctgggcaa cccagcag tccagtgt ctgcctgt ctggagga ttctcctg ctcctccc tggaaggc tccactgc cttcctgc cttcccca ctgtgcct ctgccacc ccacctcc ctctgcca ctgtgctc acagcctca ttcctctg cagcaggt ctgtgagc ctgtggtc tggtgatg ctccatcc tcctcctc cttcaggc tgtggctg tgctgtcc ctcagcca tctgggtc cttctccc tcctctcc ctcttccc cttggagc ctgcctcc ctctgcct ctgggcac ccaggctc ctccttcc ctggctgc tgggcatc tctctggt tcctgctc ccgccgcc ctctggct cttgggct catcctcc ctcctcct tgctgggc ctgccatc aggagctg cagcctgg ctgctctc cactggga tcctgctg cagcagcc ctggagtc tgccctga ctcctcca tgctggag cttcagcc ttggtggt ccagccag cttcctcc cttccagc ttgggact cagcccag ttcctggc tccaggtc ctgctgga ctccacca tcctcagc cagcatcc caggagct ctccagcc aggagcag cagaggct ctcagcct tggctctg ccaggagg ctgccttc ttctggct caggcagc cagcctcc ctgggaga ctgtctgc ctgcctct agctggag cccagccc ctgtccca cttctgcc ctgctgcc cagctccc tctgccca ctgctccc tggctgtg ccagccgc ctggacac tggtggaa cctggaga cctcagcc ttgccatc agctggga ccagggcc tcctcttct cttcccct ctgcttcc ccaccacc ctggctcc cttgggca cagcaggc tctgctgc ccagggca ttctggtc tctggagc cagccacc ctccacct ccgccgcc catccagc 
cagaggag ctgcccca cttcttctc atggctgc ctctcctc tgggcagc ttccctcc ctcctgcc caggagcc ctggtctc ttcctcaga tggtggcc tctggtcc ctggggcc tccaaggc ctggggct ctgtctcc cagtggca ttggggtc ttgccatc cttcccct cttgggca ttctggtc cttcttctc ttccctcc ttcctcaga tccaaggc ttggggtc
These hyper-abundant 9-mer and 8-mer sequences fulfil the selection criteria in Fig. 2, i.e.,
- each probe target occurs in at least 6% of the sequences in the human transcriptome (i.e., more than 2200 target sequences each, more than 800 sequences targeted within 1000 nt proximal to the 3' end of the transcript).
- they are not self-complementary (i.e., unlikely to form probe duplexes). Self score is at least 10 below Tm estimate for the duplex formed with the target.
- the duplex formed with their target sequence has a Tm at or above 60 °C.
- they cover > 98% of the mRNAs in the human transcriptome when combined.
Especially preferred versions of the multi-probes of Table 1 are presented in the following Table 1a:
TABLE la LNA substituted oligonucleotides cAgCCTCc cAGAGCCa aGCTGTGa aGGAGGGa aGGAGGAg cTGGAAGc cAGAGAGc tGTGGAGa ccCAGGAg cAGCCAGa tGAGGAGa ctGGGGAa cTCCAgCc cTTCTGGg aCAGTGGa cTCCtGCa cTCCTCCa tTCTGCCa aCAGCCAt tGAGGtGg cTgCTGCc aGGAGAGa tTTCTCCa aAGGCAGc cTCCAGCa tTCCTGCa cAGTGGTg ctGTGGCa cTGCTGgg tTTGGGGa aAAGGGGa aGAAGGGc cTTCCTGg cAGGCAGa tGTGGGAa tGGATGGa aCAGCAGc ctGTGCCa aCTGGGAa tTCTGGCa caGCTCCa tTCCCTGg tCACAGGa cAGAAGGc cCCCACCc aACCCCAt tTCCTCCc aTCCCAGa tGGTGGTg ctGCCCag aGGTGGAa cAGGtGCt tTCCTCCa cTGAGGCa tGTGGACa cTGTCTCc cTGCTCCa cTGCtGGt tGGAGgCc tGCTGTGa tGGAGAGa cAGtGCCa atGGTGAA aGCTGGAt aAGGCAGa aTGGGGAa cTGGAAGg tGGAGAGc cAGCcAGg aGGGAGAg cAGGcAGc cTTGGTGg cAGCAGGa cTCtGCCa tCAGGaGc cACCTTGg cTGTGCTg cTGCTGAg aCACACAC cAgCCACc aGAGGAGa cCCtCCCa cATCTTCA cTGTGACc ctGTGGCt aGGAGGca cACCtGCa aGGGGGAa caGTGGCt cACtGCCa cCAGgGcc tGgGACCa tTCTCCCa cTGTGTGg cAGAGGCa aCAGGGAa cTGgcTGC cAGCAGGC cAGCATCC tCTGCCCA
ccGCCgCC cTGCCTCT cAGAGGCT cTGGACAC
cTCCTCCT cTCCACCT cATCCTCC tCAgCAGC
cTGGAGGA cTCCTCCC cTCTGCCT tTCTTGGC
caGCcTGG cTTCCCCA cAGTGGCA cggCGGCA
cAGcAGCC cTTCAGCC cAGCACCC cTGGTGGT
cTTCCTCC cTCTGCCA cTCTCCTC cCTTCTCC
ccAGGAGG cTTCTGCC tCTGgTCC cCTCTTCC
cAGCcTCC cAGCAGGT cAGGAGCC tGTTGCCA
aGcTGGAG tcTGGAGC cTGTCTCC tGGaTGGC
cTGcTGcC cTGCCCCA cTGGGACT cCAGCATC
tGGcTGTG cATCCAGC cTGCCTGT tCTTCTTCT
cCTGGAGa aTGGcTGC tGGaAGGC tcgCCGCC
cCAGGGcC cTCCTGCC cTGTGCCT tGCTGTTC
cCACCACC cTGGGGcc cTGTGCTC tCAAGGGC
acAGCCTCA cTCCATCC cTGTGAGC tgCTGCTC
cAGAGGAG cTGGGCAA cTCTTCCC tcGCCGTC
tGcTGGAG cCAGCCGC cTGGGCAC tTGATGCC
aGGAGcAG tGGTGGcc tGGGCATC cCTTCAGC
aGGaGCTG cTGGGGCT tCCTCCTC aTTCCAGC
tCCTGCTG cTGCTCCC cTCTGGCT tTGATGGC
cCTGGAGC tGCTGTCC tgcTGGGC cCAGTTCC
cTCCTCCA tCCTCTCC cTCAGCCA tTGGCTTC
cCAGCCAG tGGTGGAA cTGCTCTC tTGCCTTC
cCCAGCAG aGCTGGGA cTGGAGTC aTGGCTTC
tTCTCCTG cTGGTCTC cTGTGGTC cACCCGCT
cAGCCCAG tTCCCAGT cTTCAGGC tCTTTGCC
cTTCCTGC tCCTCTTCT tCTGGGTC cTGGTTGC
cTCCACCA tCCAGTGT cTTGGAGC tGGACACC
cTTCCAGC tGGGcAGC cCAGGCTC tcGTCGCC
cCCAGCCC cCAGGGCA tCTCTGGT cCATCAGC
cTGCCTTC cTGGCTCC CTTGGGCT tGGTGGAT
cTCCAGCC tCTGcTGC cTGCCATC aTGGTGGT
cCACCTCC cAGCCACC cACTGGGA cCtGGTGC
tTCCTCTG tTCcTGGC tGCCCTGa tCCTCGTC
tGGCTCTG tCCTCAGC tTGGTGGT tTCTTGCC
tGGTGATG cTCCTTCC tTGGGACT tGGgCTTC
tGTGGcTG cTGGGAGA cTGCTGGA tGATGAGC
cTTCTCCC tCCTGCTC cAGGaGCT tCCTggCC
cTGCCTCC cAGGcAGC cTCAGCCT cCTCCTTC
cAGCTCCC tCCACTGC tTCTGGCT tGCTGGAG
cTGCTTCC cTGCCACC cTGTCTGC
ccTCAGCC tCcAGGTC cTGTCCCA
- wherein small letters designate deoxyribonucleotides and capital letters designate LNA nucleotides.
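The case convention of Table 1a (lowercase for deoxyribonucleotides, uppercase for LNA nucleotides) is straightforward to decode programmatically. The helper names below are illustrative only; the sketch shows how the LNA positions and the underlying base sequence could be recovered from a Table 1a entry.

```python
def lna_positions(probe):
    """Return the 0-based positions of LNA nucleotides (uppercase letters)."""
    return [i for i, base in enumerate(probe) if base.isupper()]


def lna_count(probe):
    """Number of LNA nucleotides in the probe."""
    return len(lna_positions(probe))


def bare_sequence(probe):
    """Base sequence with the LNA/DNA distinction removed (all lowercase)."""
    return probe.lower()
```

For example, the first entry of Table 1a, `cAgCCTCc`, reduces to the first entry of Table 1, `cagcctcc`, with LNA nucleotides at five of its eight positions.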
> 95.0% of the mRNA sequences are targeted within the 1000 nt near their 3' terminus (position 50 to 1050 from the 3' end) and > 95% of the mRNAs contain the target sequence for more than one probe in the library. More than 650,000 target sites for these 100 multi-probes were identified in the human transcriptome containing 37,347 nucleic acid sequences.
The average number of multi-probes addressing each transcript in the transcriptome is 17.4 and the median value is target sites for 14 different probes.
The sequences noted above are also an excellent choice of probes for other transcriptomes, though they were not selected to be optimized for the particular organisms. We have thus evaluated the coverage of the above listed library for the mouse and rat genome despite the fact that the above probes were designed to detect/characterize/quantify the transcripts in the human transcriptome only (see, e.g., Table 2).
TABLE 2
Human probe library
Transcriptome                      Human    Mouse    Rat
no. of mRNA sequences              37347    32911    28904
Coverage of full length mRNAs      96.7%    94.6%    93.5%
Coverage 1000 nt near the 3'-end   91.0%    -        -
At least covered by two probes     89.8%    80.2%    77.0%
nt - nucleotides.
Expected coverage of human transcriptome by frequently occurring 9-mer oligonucleotides
Experimental pilot data (similar to Fig. 6) indicated that it is possible to reduce the length of the recognition sequence of a dual-labelled probe for real-time PCR assays to 8 or 9 nucleotides depending on the sequence, if the probe is enhanced with LNA. The unique duplex stabilizing properties of LNA are necessary to ensure an adequate stability for such a short duplex (i.e. Tm > 60 °C). The functional real-time PCR probe will be almost pure LNA with 6 to 10 LNA nucleotides in the recognition sequence. However, the short recognition sequence makes it possible to use the same LNA probe to detect and quantify the abundance of many different genes. By proper selection of the best (i.e. most common) 8 or 9-mer recognition sequences according to the algorithm depicted in Fig. 2 it is possible to get a coverage of the human transcriptome containing about 37347 mRNAs (Fig. 3).
Fig. 3 shows the expected coverage as percentage of the total number of mRNA sequences in the human transcriptome that are detectable within a 1000 nt long stretch near the 3' end of the respective sequences (i.e. the sequence from 50 nt to 1050 nt from the 3' end) by optimized probes of different lengths. The probes are required to be sufficiently stable (Tm > 60 °C) and with a low propensity for forming self duplexes, which eliminates many 9-mers and even more 8-mer probe sequences.
If all probe sequences of a given length could be used as probes we would obviously get the best coverage of the transcriptome by the shortest possible probe sequences.
This is indeed the case when only a limited number of probes (< 55) are included in the library (Fig. 4).
However, because many short probes with a low GC content have an inadequate thermal stability, they were omitted from the library. The limited diversity of acceptable 8-mer probes is less efficient at detecting low GC content genes, and a library composed of 100 different 9-mer probes consequently has a better coverage of the transcriptome than a similar library of 8-mers. However, the best choice is a mixed library composed of sequences of different lengths such as the proposed best mode library listed above. The coverage of this library is not shown in Fig. 4.
The designed probe library containing 100 of the most commonly occurring 9-mer and 8-mers, i.e., the "Human mRNA probe library" can be handled in a convenient box or microtiter plate format.
The initial set of 100 probes for human mRNAs can be modified to generate similar library kits for transcriptomes from other organisms (mouse, rat, Drosophila, C. elegans, yeast, Arabidopsis, zebra fish, primates, domestic animals, etc.). Construction of these new probe libraries will require little effort, as most of the human mRNA probes may be re-used in the novel library kits (TABLE 2).
Number of probes in the library that target each gene
Not only does the limited number of probes in the proposed libraries target a large fraction (> 98%) of the human transcriptome, but there is also a large degree of redundancy in that most of the genes (almost 95%) may be detected by more than one probe. More than 650,000 target sites have been identified in the human transcriptome (37347 genes) for the 100 probes in the best mode library shown above. This gives an average number of target sites per probe of 6782 (i.e. 18% of the transcriptome) ranging from 2527 to 12066 sequences per probe. The average number of probes capable of detecting a particular gene is 17.4, and the median value is 14. Within the library of only 100 probes we thus have at least 14 probes for more than 50% of all human mRNA sequences.
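The redundancy figures quoted above are summary statistics over a probe-to-gene mapping. The Python sketch below shows one way such statistics could be computed; the function name `probes_per_gene` is hypothetical, and the two constants at the end only re-derive the percentages from the numbers stated in the text (6782 of 37347 sequences, and roughly 650,000 sites over 37347 sequences).

```python
from statistics import mean, median


def probes_per_gene(index, genes):
    """Return (mean, median) number of probes hitting each gene.

    index -- dict mapping probe -> set of gene ids it targets
    genes -- iterable of all gene ids, including genes hit by no probe
    """
    counts = {gene: 0 for gene in genes}
    for hits in index.values():
        for gene in hits:
            counts[gene] += 1
    values = list(counts.values())
    return mean(values), median(values)


# Consistency check using only figures quoted in the text.
FRACTION_PER_PROBE = 6782 / 37347        # about 0.18 of the transcriptome
AVERAGE_PROBES_PER_GENE = 650000 / 37347  # about 17.4 probes per sequence
```

The second constant uses the lower bound of 650,000 target sites, so it should be read as an approximate cross-check of the stated average rather than an exact recomputation.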
The number of genes that are targeted by a given number of probes in the library is depicted in Fig. 4.
Design of 9-mer probes to demonstrate feasibility
The SSA4 gene from yeast (Saccharomyces cerevisiae) was selected for the expression assays because the gene transcription level can be induced by heat shock and mutants are available where expression is knocked out. Three different 9-mer sequences were selected amongst commonly occurring 9-mer sequences within the human transcriptome (Table 3). The sequences were present near the 3' terminal end of 1.8 to 6.4% of all mRNA sequences within the human transcriptome. Further selection criteria were a moderate level of self-complementarity and a Tm of 60 °C or above. All three sequences were present within the terminal 1000 bases of the SSA4 ORF. Three 5' nuclease assay probes were constructed by synthesizing the three sequences with a FITCH fluorophore in the 5'-end and an Eclipse quencher (Epoch Biosciences) in the 3'-end. The probes were named according to their position within the ORF YER103W (SSA4) where position 1201 was set to be position 1. Three sets of primer pairs were designed to produce three non-overlapping amplicons, which each contained one of the three probe sequences. Amplicons were named according to the probe sequence they encompassed.
Table 3. Designed 5' nuclease assay probes and primers
Sequence    Name of probe       Forward primer sequence                    Reverse primer sequence                   Amplicon length
aaGGAGAAG   Dual-labelled-469   cgcgtttactttgaaaaattctg (SEQ ID NO: 1)     gcttccaatttcctggcatc (SEQ ID NO: 2)       81 bp
cAAGGAAAg   Dual-labelled-570   gcccaagatgctataaattggttag (SEQ ID NO: 3)   gggtttgcaacaccttctagttc (SEQ ID NO: 4)    95 bp
ctGGAGCaG   Dual-labelled-671   tacggagctgcaggtggt (SEQ ID NO: 5)          gttgggccgttgtctggt (SEQ ID NO: 6)         86 bp
bp - base pairs
Two Molecular Beacons were also designed to detect the SSA4 469 and the SSA4 570 sequences and named Beacon-469 and Beacon-570, respectively. The sequence of Beacon-469 was CAAGGAGAAGTTG (SEQ ID NO: 7, 10-mer recognition site), which should enable this oligonucleotide to form the intramolecular beacon structure with a stem formed by the LNA-LNA interactions between the 5'-CAA and the TTG-3'. The sequence of Beacon-570 was CAAGGAAAGttG (9-mer recognition site), where the intramolecular beacon structure may form between the 5'-CAA and the ttG-3'. Both sequences were synthesized with a fluorescein fluorophore in the 5'-end and a Dabcyl quencher in the 3'-end.
One SYBR Green labelled probe was also designed to detect the SSA4 570 sequence and named SYBR-Probe-570. The sequence of this probe was CAAGGAAaG. This probe was synthesized with an amino-C6 linker on the 5'-end on which the fluorophore SYBR Green 101 (Molecular Probes) was attached according to the manufacturer's instructions. Upon hybridization to the target sequence, the linker-attached fluorophore should intercalate in the generated LNA-DNA duplex region causing increased fluorescence from the SYBR Green 101.
TABLE 4: SEQUENCES
EQ Number   Name                Type                      Sequence                                                                 Position in gene
13992       Dual-labelled-469   5' nuclease assay probe   5'-Fluor-aaGGAGAAG-Eclipse-3'                                            469-477
13994       Dual-labelled-570   5' nuclease assay probe   5'-Fluor-cAAGGAAAg-Eclipse-3'                                            570-578
13996       Dual-labelled-671   5' nuclease assay probe   5'-Fluor-ctGGAGCaG-Eclipse-3'                                            671-679
13997       Beacon-469          Molecular Beacon          5'-Fluor-CAAGGAGAAGTTG-Dabcyl-3' (5'-Fluor-SEQ ID NO: 8-Dabcyl-3')
14148       Beacon-570          Molecular Beacon          5'-Fluor-CAAGGAAAGttG-Dabcyl-3' (5'-Fluor-SEQ ID NO: 9-Dabcyl-3')
14165       SYBR-Probe-570      SYBR-Probe                5'-SYBR101-NH2C6-cAAGGAAAg-3'
14012       SSA4-469-F          Primer                    cgcgtttactttgaaaaattctg (SEQ ID NO: 10)
14013       SSA4-469-R          Primer                    gcttccaatttcctggcatc (SEQ ID NO: 11)
14014       SSA4-570-F          Primer                    gcccaagatgctataaattggttag (SEQ ID NO: 12)
14015       SSA4-570-R          Primer                    gggtttgcaacaccttctagttc (SEQ ID NO: 13)
14016       SSA4-671-F          Primer                    tacggagctgcaggtggt (SEQ ID NO: 14)
14017       SSA4-671-R          Primer                    gttgggccgttgtctggt (SEQ ID NO: 15)
14115       POL5-469-F          Primer                    gcgagagaaaacaagcaagg (SEQ ID NO: 16)
14116       POL5-469-R          Primer                    attcgtcttcactggcatca (SEQ ID NO: 17)
14117       APG9-570-F          Primer                    cagctaaaaatgatgacaataatgg (SEQ ID NO: 18)
14118       APG9-570-R          Primer                    attacatcatgattagggaatgc (SEQ ID NO: 19)
14119       HSP82-671-F         Primer                    gggtttgaacattgatgagga (SEQ ID NO: 20)
14120       HSP82-671-R         Primer                    ggtgtcagctggaacctctt (SEQ ID NO: 21)
Synthesis, deprotection and purification of dual labelled oligonucleotides
The dual labelled oligonucleotides EQ13992 to EQ14148 (Table 4) were prepared on an automated DNA synthesizer (Expedite 8909 DNA synthesizer, PerSeptive Biosystems, 0.2 µmol scale) using the phosphoramidite approach (Beaucage and Caruthers, Tetrahedron Lett. 22: 1859-1862, 1981) with 2-cyanoethyl protected LNA and DNA phosphoramidites (Sinha, et al., Tetrahedron Lett. 24: 5843-5846, 1983). CPG solid supports were derivatized with either Eclipse quencher (EQ13992-EQ13996) or dabcyl (EQ13997-EQ14148) and 5'-fluorescein phosphoramidite (GLEN Research, Sterling, Virginia, USA). The synthesis cycle was modified for LNA phosphoramidites (250 s coupling time) compared to DNA phosphoramidites. 1H-tetrazole or 4,5-dicyanoimidazole (Proligo, Hamburg, Germany) was used as activator in the coupling step.
The oligonucleotides were deprotected using 32% aqueous ammonia (1 h at room temperature, then 2 hours at 60 °C) and purified by HPLC (Shimadzu-SpectraChrom series; Xterra™ RP18 column, 10 µm, 7.8 x 150 mm (Waters). Buffers: A: 0.05 M triethylammonium acetate, pH 7.4; B: 50% acetonitrile in water. Eluent: 0-25 min: 10-80% B; 25-30 min: 80% B). The composition and purity of the oligonucleotides were verified by MALDI-MS (PerSeptive Biosystem, Voyager DE-PRO) analysis, see Table 5. Fig. 5 is the MALDI-MS spectrum of EQ13992 showing [M-H]- = 4121,3 Da. This is a typical MALDI-MS spectrum for the 9-mer probes of the invention.
TABLE 5:
EQ#     Sequences                                                                MW (Calc.)   MW (Found)
13992   5'-Fitc-aaGGAGAAG-EQL-3'                                                 4091,8 Da    4091,6 Da
13994   5'-Fitc-cAAGGAAAg-EQL-3'                                                 4051,9 Da    4049,3 Da
13996   5'-Fitc-ctGGAGmCaG-EQL-3'                                                4020,8 Da    4021,6 Da
13997   5'-Fitc-mCAAGGAGAAGTTG-dabcyl-3' (5'-Fitc-SEQ ID NO: 22-dabcyl-3')       5426,3 Da    5421,2 Da
Capitals designate LNA monomers (A, G, mC, T), where mC is LNA methyl cytosine. Small letters designate DNA monomers (a, g, c, t). Fitc = Fluorescein; EQL = Eclipse quencher;
Dabcyl = Dabcyl quencher. MW = Molecular weight.
Production of cDNA standards of SSA4 for detection with 9-mer probes
The functionality of the constructed 9-mer probes was analysed in PCR assays in which the probes' ability to detect different SSA4 PCR amplicons was tested. Template for the PCR reaction was cDNA obtained from reverse transcription of cRNA produced from in vitro transcription of a downstream region of the SSA4 gene in the expression vector pTRIamp18 (Ambion). The downstream region of the SSA4 gene was cloned as follows:
PCR amplification
Amplification of the partial yeast gene was done by standard PCR using yeast genomic DNA as template. Genomic DNA was prepared from a wild type standard laboratory strain of Saccharomyces cerevisiae using the Nucleon MiY DNA extraction kit (Amersham Biosciences) according to supplier's instructions. In the first step of PCR amplification, a forward primer containing a restriction enzyme site and a reverse primer containing a universal linker sequence were used. In this step 20 bp was added to the 3'-end of the amplicon, next to the stop codon. In the second step of amplification, the reverse primer was exchanged with a nested primer containing a poly-T20 tail and a restriction enzyme site. The SSA4 amplicon contains 729 bp of the SSA4 ORF plus a 20 bp universal linker sequence and a poly-A20 tail.
The PCR primers used were:
YER103W-For-SacI: acgtgagctcattgaaactgcaggtggtattatga (SEQ ID NO: 23)
YER103W-Rev-Uni: gatccccgggaattgccatgctaatcaacctcttcaaccgttgg (SEQ ID NO: 24)
Uni-polyT-BamHI: acgtggatccttttttttttttttttttttgatccccgggaattgccatg (SEQ ID NO: 25).
Plasmid DNA constructs
The PCR amplicon was cut with the restriction enzymes EcoRI + BamHI. The DNA fragment was ligated into the pTRIamp18 vector (Ambion) using the Quick Ligation Kit (New England Biolabs) according to the supplier's instructions and transformed into E. coli DH-5 by standard methods.
DNA sequencing
To verify the cloning of the PCR amplicon, plasmid DNA was sequenced using M13 forward and M13 reverse primers and analysed on an ABI 377.
In vitro transcription
SSA4 cRNA was obtained by performing in vitro transcription with the Megascript T7 kit (Ambion) according to the supplier's instructions.
Reverse transcription
Reverse transcription was performed with 1 µg of cRNA and 0.2 U of the reverse transcriptase Superscript II RT (Invitrogen) according to the supplier's instructions except that 20 U Superase-In (RNAse inhibitor - Ambion) was added. The produced cDNA was purified on a QiaQuick PCR purification column (Qiagen) according to the supplier's instructions using the supplied EB-buffer for elution. The DNA concentration of the eluted cDNA was measured and diluted to a concentration of SSA4 cDNA copies corresponding to 2 x 10' copies per µL.
Protocol for dual label probe assays
Reagents for the dual label probe PCRs were mixed according to the following scheme (Table 6):
Table 6
Reagents                      Final Concentration
GeneAmp 10x PCR buffer II     1x
Mg2+                          5.5 mM
dNTP                          0.2 mM
Dual Label Probe              0.1 or 0.3 µM*
Template                      1 µL
Forward primer                0.2 µM
Reverse primer                0.2 µM
AmpliTaq Gold                 2.5 U
Total                         50 µL
*) Final concentration of 5' nuclease assay probe 0.1 µM and Beacon/SYBR-probe 0.3 µM.
In the present experiments 2 x 10' copies of the SSA4 cDNA were added as template. Assays were performed in a DNA Engine Opticon (MJ Research) using the following PCR cycle protocols:
Table 7
5' nuclease assays        Beacon & SYBR-probe Assays
95 °C for 7 minutes       95 °C for 7 minutes
&                         &
The term "stringent conditions", as used herein, refers to the "stringency" which occurs within a range from about Tm-5 °C (5 °C below the melting temperature (Tm) of the probe) to about 20 °C to 25 °C below Tm. As will be understood by those skilled in the art, the stringency of hybridization may be altered in order to identify or detect identical or related polynucleotide sequences. Hybridization techniques are generally described in Nucleic Acid Hybridization, A Practical Approach, Ed. Hames, B. D. and Higgins, S. J., IRL Press, 1985; Gall and Pardue, Proc. Natl. Acad. Sci., USA 63: 378-383, 1969; and John, et al. Nature 223: 582-587, 1969.
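The temperature range in this definition can be written out as a short calculation. The following is a minimal sketch only; the function names and the choice of 25 °C as the default lower offset are illustrative assumptions, not part of the specification.

```python
def stringent_window(tm_celsius, upper_offset=5.0, lower_offset=25.0):
    """Return (lowest, highest) hybridization temperature in deg C.

    Follows the definition in the text: from about Tm - 5 deg C down to
    about 20-25 deg C below Tm. The default lower_offset of 25 is an
    assumption; pass 20.0 for the narrower reading.
    """
    if lower_offset < upper_offset:
        raise ValueError("lower_offset must be >= upper_offset")
    return (tm_celsius - lower_offset, tm_celsius - upper_offset)


def is_stringent(temperature_celsius, tm_celsius, **offsets):
    """True if the temperature falls inside the stringent window for this Tm."""
    low, high = stringent_window(tm_celsius, **offsets)
    return low <= temperature_celsius <= high
```

For a probe with a Tm of 65 °C, this places stringent hybridization between 40 °C and 60 °C.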
Multi-probes
Referring now to Fig. 1B, a multi-probe according to the invention is preferably a short sequence probe which binds to a recognition sequence found in a plurality of different target nucleic acids, such that the multi-probe specifically hybridizes to the target nucleic acid but does not hybridize to any detectable level to nucleic acid molecules which do not comprise the recognition sequence. Preferably, a collection of multi-probes, or multi-probe library, is able to recognize a major proportion of a transcriptome, including the most abundant sequences, such that about 60%, about 70%, about 80%, about 85%, more preferably about 90%, and still more preferably 95%, of the target nucleic acids in the transcriptome are detected by the probes. A multi-probe according to the invention comprises a "stabilizing modification", e.g. a non-natural nucleotide ("a stabilizing nucleotide"), and has higher binding affinity for the recognition sequence than a probe comprising an identical sequence but without the stabilizing sequence. Preferably, at least one nucleotide of a multi-probe is modified by a chemical moiety (e.g., covalently or otherwise stably associated with the probe during at least hybridization stages of a PCR reaction) for increasing the binding affinity of the recognition segment for the recognition sequence.
In one aspect, a multi-probe of from 6 to 12 nucleotides comprises from 1 to 6 or even up to 12 stabilizing nucleotides, such as LNA nucleotides. An LNA enhanced probe library contains short probes that recognize a short recognition sequence (e.g., 8-9 nucleotides). LNA nucleobases can comprise a-LNA molecules (see, e.g., WO 00/66604) or xylo-LNA molecules (see, e.g., WO 00/56748).
In one aspect, it is preferred that the Tm of the multi-probe when bound to its recognition sequence is between about 55 °C and about 70 °C.
In another aspect, the multi-probes comprise one or more modified nucleobases. Modified base units may comprise a cyclic unit (e.g. a carbocyclic unit such as pyrenyl) that is joined to a nucleic unit, such as a 1'-position of a furanosyl ring, through a linker, such as a straight or branched chain alkylene or alkenylene group. Alkylene groups suitably have from 1 (i.e., -CH2-) to about 12 carbon atoms, more typically 1 to about 8 carbon atoms, still more typically 1 to about 6 carbon atoms. Alkenylene groups suitably have one, two or three carbon-carbon double bonds and from 2 to about 12 carbon atoms, more typically 2 to about 8 carbon atoms, still more typically 2 to about 6 carbon atoms.
Multi-probes according to the invention are ideal for performing such assays as real-time PCR, as the probes according to the invention are preferably less than about 25 nucleotides, less than about 15 nucleotides, less than about 10 nucleotides, e.g., 8 or 9 nucleotides. Preferably, a multi-probe can specifically hybridize with a recognition sequence within a target sequence under PCR conditions and preferably the recognition sequence is found in at least about 50, at least about 100, at least about 200, at least about 500 different target nucleic acid molecules. A library of multi-probes according to the invention will comprise multi-probes, which comprise non-identical recognition sequences, such that any two multi-probes hybridize to different sets of target nucleic acid molecules. In one aspect, the sets of target nucleic acid molecules comprise some identical target nucleic acid molecules, i.e., a target nucleic acid molecule comprising a gene sequence of interest may be bound by more than one multi-probe. Such a target nucleic acid molecule will contain at least two different recognition sequences which may overlap by one or more, but less than x nucleotides of a recognition sequence comprising x nucleotides.
In one aspect, a multi-probe library comprises a plurality of different multi-probes, each different probe localized at a discrete location on a solid substrate. As used herein, "localize" refers to being limited or addressed at the location such that a hybridization event detected at the location can be traced to a probe of known sequence identity. A localized probe may or may not be stably associated with the substrate. For example, the probe could be in solution in the well of a microtiter plate and thus localized or addressed to the well. Alternatively, or additionally, the probe could be stably associated with the substrate such that it remains at a defined location on the substrate after one or more washes of the substrate with a buffer. For example, the probe may be chemically associated with the substrate, either directly or through a linker molecule, which may be a nucleic acid sequence, a peptide or other type of molecule, which has an affinity for molecules on the substrate.
Alternatively, the target nucleic acid molecules may be localized on a substrate (e.g., as a cell or cell lysate or nucleic acids dotted onto the substrate).
Once the appropriate sequences are determined, multi-LNA probes are preferably chemically synthesized using commercially available methods and equipment as described in the art (Tetrahedron 54: 3607-30, 1998). For example, the solid phase phosphoramidite method can be used to produce short LNA probes (Caruthers, et al., Cold Spring Harbor Symp. Quant. Biol. 47: 411-418, 1982; Adams, et al., J. Am. Chem. Soc. 105: 661, 1983).
The determination of the extent of hybridization of multi-probes from a multi-probe library to one or more target sequences (preferably to a plurality of target sequences) may be carried out by any of the methods well known in the art. If there is no detectable hybridization, the extent of hybridization is thus 0. Typically, labelled signal nucleic acids are used to detect hybridization. Complementary nucleic acids or signal nucleic acids may be labelled by any one of several methods typically used to detect the presence of hybridized polynucleotides. The most common method of detection is the use of ligands, which bind to labelled antibodies, fluorophores or chemiluminescent agents. Other labels include antibodies, which can serve as specific binding pair members for a labelled ligand. The choice of label depends on sensitivity required, ease of conjugation with the probe, stability requirements, and available instrumentation.
LNA-containing probes are typically labelled during synthesis. The flexibility of the phosphoramidite synthesis approach furthermore facilitates the easy production of LNAs carrying all commercially available linkers, fluorophores and labelling-molecules available for this standard chemistry. LNA may also be labelled by enzymatic reactions, e.g. by kinasing.
Multi-probes according to the invention can comprise single labels or a plurality of labels. In one aspect, the plurality of labels comprise a pair of labels which interact with each other either to produce a signal or to produce a change in a signal when hybridization of the multi-probe to a target sequence occurs.
In another aspect, the multi-probe comprises a fluorophore moiety and a quencher moiety, positioned in such a way that the hybridized state of the probe can be distinguished from the unhybridized state of the probe by an increase in the fluorescent signal from the nucleotide.
In one aspect, the multi-probe comprises, in addition to the recognition element, first and second complementary sequences, which specifically hybridize to each other when the probe is not hybridized to a recognition sequence in a target molecule, bringing the quencher molecule in sufficient proximity to said reporter molecule to quench fluorescence of the reporter molecule. Hybridization of the target molecule distances the quencher from the reporter molecule and results in a signal, which is proportional to the amount of hybridization.
In another aspect, polymerization of strands of nucleic acids can be detected using a polymerase with 5' nuclease activity. Fluorophore and quencher molecules are incorporated into the probe in sufficient proximity such that the quencher quenches the signal of the fluorophore molecule when the probe is hybridized to its recognition sequence. Cleavage of the probe by the polymerase with 5' nuclease activity results in separation of the quencher and fluorophore molecule, and the presence in increasing amounts of signal as nucleic acid sequences
In the present context, the term "label" means a reporter group, which is detectable either by itself or as a part of a detection series. Examples of functional parts of reporter groups are biotin, digoxigenin, fluorescent groups (groups which are able to absorb electromagnetic radiation, e.g. light or X-rays, of a certain wavelength, and which subsequently reemit the energy absorbed as radiation of longer wavelength; illustrative examples are DANSYL (5-dimethylamino)-1-naphthalenesulfonyl), DOXYL (N-oxyl-4,4-dimethyloxazolidine), PROXYL (N-oxyl-2,2,5,5-tetramethylpyrrolidine), TEMPO (N-oxyl-2,2,6,6-tetramethylpiperidine), dinitrophenyl, acridines, coumarins, Cy3 and Cy5 (trademarks for Biological Detection Systems, Inc.), erythrosine, coumaric acid, umbelliferone, Texas red, rhodamine, tetramethyl rhodamine, Rox, 7-nitrobenzo-2-oxa-1-diazole (NBD), pyrene, fluorescein, Europium, Ruthenium, Samarium, and other rare earth metals), radio isotopic labels, chemiluminescence labels (labels that are detectable via the emission of light during a chemical reaction), spin labels (a free radical (e.g. substituted organic nitroxides) or other paramagnetic probes (e.g. Cu2+, Mg2+) bound to a biological molecule being detectable by the use of electron spin resonance spectroscopy). Especially interesting examples are biotin, fluorescein, Texas Red, rhodamine, dinitrophenyl, digoxigenin, Ruthenium, Europium, Cy5, Cy3, etc.
Suitable samples of target nucleic acid molecule may comprise a wide range of eukaryotic and prokaryotic cells, including protoplasts; or other biological materials, which may harbour target nucleic acids. The methods are thus applicable to tissue culture animal cells, animal cells (e.g., blood, serum, plasma, reticulocytes, lymphocytes, urine, bone marrow tissue, cerebrospinal fluid or any product prepared from blood or lymph) or any type of tissue biopsy (e.g. a muscle biopsy, a liver biopsy, a kidney biopsy, a bladder biopsy, a bone biopsy, a cartilage biopsy, a skin biopsy, a pancreas biopsy, a biopsy of the intestinal tract, a thymus biopsy, a mammae biopsy, a uterus biopsy, a testicular biopsy, an eye biopsy or a brain biopsy, e.g., homogenized in lysis buffer), archival tissue nucleic acids, plant cells or other cells sensitive to osmotic shock and cells of bacteria, yeasts, viruses, mycoplasmas, protozoa, rickettsia, fungi and other small microbial cells and the like.
Target nucleic acids which are recognized by a plurality of multi-probes can be assayed to detect sequences which are present in less than 10% in a population of target nucleic acid molecules, less than about 5%, less than about 1%, less than about 0.1%, and less than about 0.01% (e.g., such as specific gene sequences). The type of assay used to detect such sequences is a non-limiting feature of the invention and may comprise PCR or some other suitable assay as is known in the art or developed to detect recognition sequences which are found in less than 10% of a population of target nucleic acid molecules.
In one aspect, the assay to detect the less abundant recognition sequences comprises hybridizing at least one primer capable of specifically hybridizing to the recognition sequence but substantially incapable of hybridizing to more than about 50, more than about 25, more than about 10, more than about 5, more than about 2 target nucleic acid molecules (e.g., the probe recognizes both copies of a homozygous gene sequence), or more than one target nucleic acid in a population (e.g., such as an allele of a single copy heterozygous gene sequence present in a sample). In one preferred aspect a pair of such primers is provided, and the primers flank the recognition sequence identified by the multi-probe, i.e., are within an amplifiable distance of the recognition sequence such that amplicons of about 40-5000 bases can be produced, and preferably, 50-500 or more preferably 60-100 base amplicons are produced.
One or more of the primers may be labelled.
Various amplifying reactions are well known to one of ordinary skill in the art and include, but are not limited to PCR, RT-PCR, LCR, in vitro transcription, rolling circle PCR, OLA and the like. Multiple primers can also be used in multiplex PCR for detecting a set of specific target molecules.
The invention further provides a method for designing multi-probe sequences for use in methods and kits according to the invention. A flow chart outlining the steps of the method is shown in Fig. 2.
In one aspect, a plurality of n-mers of n nucleotides is generated in silico, containing all possible n-mers. A subset of n-mers is selected which have a Tm > 60 °C. In another aspect, a subset of these probes is selected which do not self-hybridize to provide a list or database of candidate n-mers. The sequence of each n-mer is used to query a database comprising a plurality of target sequences. Preferably, the target sequence database comprises expressed sequences, such as human mRNA sequences.
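The candidate-generation step can be sketched as follows. This is an illustration under stated assumptions, not the program used by the inventors: the Tm model for LNA-substituted probes is not given in this passage, so `tm_estimate` is a caller-supplied function, and the default self-hybridization test (an n-mer equal to its own reverse complement) is a deliberately crude stand-in for a self-duplex energy calculation.

```python
from itertools import product

COMPLEMENT = str.maketrans("acgt", "tgca")


def reverse_complement(seq):
    """Reverse complement of a lowercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def is_self_complementary(seq):
    """Crude self-hybridization test (assumption): perfect palindromes only."""
    return seq == reverse_complement(seq)


def candidate_nmers(n, tm_estimate, min_tm=60.0,
                    self_hybridizes=is_self_complementary):
    """Enumerate all 4**n n-mers and keep those passing both filters.

    tm_estimate: hypothetical callable mapping a sequence to a Tm in deg C.
    """
    kept = []
    for letters in product("acgt", repeat=n):
        nmer = "".join(letters)
        if tm_estimate(nmer) >= min_tm and not self_hybridizes(nmer):
            kept.append(nmer)
    return kept
```

With n = 8 this enumerates 65,536 sequences and with n = 9 it enumerates 262,144, so exhaustive generation is inexpensive at the probe lengths discussed here.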
From the list of candidate n-mers used to query the database, n-mers are selected that identify a maximum number of target sequences (e.g., n-mers which comprise recognition segments which are complementary to subsequences of a maximal number of target sequences in the target database) to generate an n-mer/target sequence matrix. Sequences of n-mers which bind to a maximum number of target sequences are stored in a database of optimal probe sequences, and these are subtracted from the candidate n-mer database. Target sequences that are identified by the first set of optimal probes are removed from the target sequence database. The process is then repeated for the remaining candidate probes until a set of multi-probes is identified comprising n-mers which cover more than about 60%, more than about 80%, more than about 90% or more than about 95% of target sequences. The optimal sequences identified at each step may be used to generate a database of virtual multi-probe sequences. Multi-probes may then be synthesized which comprise sequences from the multi-probe database.
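The select, subtract and repeat procedure described above is a greedy covering strategy. A minimal sketch is given below; the dictionary layout (n-mer mapped to the set of target identifiers it matches), the function name and the alphabetical tie-break are assumptions made for illustration.

```python
def greedy_cover(nmer_to_targets, n_targets, min_fraction=0.95):
    """Pick n-mers one at a time, each covering the most uncovered targets.

    Returns (chosen n-mers in order, fraction of targets covered).
    Stops when min_fraction of targets is covered or no n-mer adds coverage.
    """
    if n_targets <= 0:
        raise ValueError("n_targets must be positive")
    remaining = {nmer: set(t) for nmer, t in nmer_to_targets.items()}
    covered, chosen = set(), []
    while remaining and len(covered) < min_fraction * n_targets:
        # Ties are broken alphabetically (assumption) so results are repeatable.
        best = max(sorted(remaining), key=lambda k: len(remaining[k] - covered))
        gain = remaining.pop(best) - covered
        if not gain:
            break  # no remaining candidate identifies a new target
        chosen.append(best)
        covered |= gain
    return chosen, len(covered) / n_targets
```

Each pass removes the chosen n-mer from the candidate pool and treats its targets as covered, which mirrors the subtraction of optimal probes and identified targets described in the text.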
In another aspect, the method further comprises evaluating the general applicability of a given candidate probe recognition sequence for inclusion in the growing set of optimal probe candidates by both a query against the remaining target sequences as well as a query against the original set of target sequences. In one preferred aspect only probe recognition sequences that are frequently found in both the remaining target sequences and in the original target sequences are added to the growing set of optimal probe recognition sequences. In a most preferred aspect this is accomplished by calculating the product of the scores from these queries and selecting the probe recognition sequence with the highest product that still is among the probe recognition sequences with 20% best score in the query against the current targets.
The invention also provides computer program products for facilitating the method described above (see, e.g., Fig. 2). In one aspect, the computer program product comprises program instructions, which can be executed by a computer or a user device connectable to a network in communication with a memory.
The invention further provides a system comprising a computer memory comprising a database of target sequences and an application system for executing instructions provided by the computer program product.
Kits Comprising Multi-Probes
A preferred embodiment of the invention is a kit for the characterisation or detection or quantification of target nucleic acids comprising samples of a library of multi-probes. In one aspect, the kit comprises in silico protocols for their use. In another aspect, the kit comprises information relating to suggestions for obtaining inexpensive DNA primers.
The probes contained within these kits may have any or all of the characteristics described above. In one preferred aspect, a plurality of probes comprises at least one stabilizing nucleobase, such as an LNA nucleobase.
In another aspect, the plurality of probes comprises a nucleotide coupled or stably associated with at least one chemical moiety for increasing the stability of binding of the probe. In a further preferred aspect, the kit comprises a number of different probes for covering at least 60% of a population of different target sequences such as a transcriptome. In one preferred aspect, the transcriptome is a human transcriptome.
In another aspect, the kit comprises at least one probe labelled with one or more labels. In still another aspect, one or more probes comprise labels capable of interacting with each other in a FRET-based assay, i.e., the probes may be designed to perform in 5' nuclease or Molecular Beacon-based assays.
The kits according to the invention allow a user to quickly and efficiently develop assays for many different nucleic acid targets. The kit may additionally comprise one or more reagents for performing an amplification reaction, such as PCR.
EXAMPLES
The invention will now be further illustrated with reference to the following examples. It will be appreciated that what follows is by way of example only and that modifications to detail may be made while still falling within the scope of the invention.
In the following Examples probe reference numbers designate the LNA-oligonucleotide sequences shown in the synthesis examples below.
Source of transcriptome data
The human transcriptome mRNA sequences were obtained from ENSEMBL. ENSEMBL is a joint project between EMBL-EBI and the Sanger Institute to develop a software system which produces and maintains automatic annotation on eukaryotic genomes (see, e.g., Butler, Nature 406 (6794): 333, 2000). ENSEMBL is primarily funded by the Wellcome Trust. It is noted that sequence data can be obtained from any type of database comprising expressed sequences; however, ENSEMBL is particularly attractive because it presents up-to-date sequence data and the best possible annotation for metazoan genomes. The file "Homo_sapiens.cdna.fa" was downloaded from the ENSEMBL ftp site: ftp://ftp.ensembl.org/pub/current_human/data/ on May 14, 2003. The file contains all ENSEMBL transcript predictions (i.e., 37347 different sequences). From each sequence the region starting at 50 nucleotides upstream from the 3' end to 1050 nucleotides upstream of the 3' end was extracted. The chosen set of probe sequences (see best mode below) was further evaluated against the human mRNA sequences in the Reference Sequence (RefSeq) collection from NCBI. RefSeq standards serve as the basis for medical, functional, and diversity studies; they provide a stable reference for gene identification and characterization, mutation analysis, expression studies, polymorphism discovery, and comparative analyses. The RefSeq collection aims to provide a comprehensive, integrated, non-redundant set of sequences, including genomic DNA, transcript (RNA), and protein products, for major research organisms.
Similar coverage was found for both the 37347 sequences from ENSEMBL and the sequences in the RefSeq collection, demonstrating that the type of database is a non-limiting feature of the invention.
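The extraction of the region lying between 50 and 1050 nucleotides upstream of the 3' end can be illustrated with a short sketch. The function names and the handling of transcripts shorter than 1050 nucleotides (the available part of the window is kept) are assumptions; the text does not state how short transcripts were treated.

```python
def read_fasta(lines):
    """Yield (identifier, sequence) pairs from an iterable of FASTA lines."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            # Identifier is the first word of the header line.
            name, chunks = (line[1:].split() or [""])[0], []
        elif name is not None:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def three_prime_window(seq, skip=50, length=1000):
    """Return the region from skip to skip + length nt upstream of the 3' end.

    Short transcripts yield a shorter (possibly empty) window (assumption).
    """
    end = len(seq) - skip
    if end <= 0:
        return ""
    return seq[max(0, end - length):end]
```

A possible use is `{name: three_prime_window(seq) for name, seq in read_fasta(open(path))}`, where `path` would point to a local copy of the downloaded cDNA file.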
Calculation of a multi-probe dataset (Alfa library)
Special software running on UNIX computers was designed to calculate the optimal set of probes in a library. The algorithm is illustrated in the flow chart shown in Fig. 2.
The optimal coverage of a transcriptome is found in two steps. In the first step a sparse matrix of n_mers and genes is determined, so that the number of genes that contain a given n_mer can be found easily. This is done by running the getcover program with the -p option and a sequence file in FASTA format as input.
The second step is to determine the optimal cover with an algorithm, based on the matrix determined in the first step. For this purpose a program such as the getcover program is run with the matrix as input. However, programs performing similar functions and for executing similar steps may be readily designed by those of skill in the art.
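The first step, building a sparse matrix of n-mers against genes, can be sketched with a dictionary of sets. This is an illustrative stand-in and not the getcover implementation; the function name, the `allowed` filter argument and the decision to scan only the given strand are assumptions.

```python
from collections import defaultdict


def build_nmer_gene_matrix(genes, lengths=(8, 9), allowed=None):
    """Map each n-mer to the set of gene identifiers that contain it.

    genes:   dict of gene identifier -> sequence string
    lengths: n-mer lengths to index
    allowed: optional set of acceptable n-mers (e.g. those passing the
             Tm and self-hybridization filters); None keeps everything
    """
    matrix = defaultdict(set)
    for gene_id, seq in genes.items():
        seq = seq.lower()
        for n in lengths:
            for start in range(len(seq) - n + 1):
                nmer = seq[start:start + n]
                if allowed is None or nmer in allowed:
                    matrix[nmer].add(gene_id)
    return dict(matrix)
```

Only n-mers that actually occur are stored, so the number of genes containing a given n-mer is simply `len(matrix[nmer])`.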
Obtaining good oligonucleotide cover of the transcriptome
1. All 4^n n-mers are generated and the expected melting temperature is calculated. n-mers with a melting temperature below 60 °C or with high self-hybridisation energy are removed from the set. This gives a list of n-mers that have acceptable physical properties.
2. A list of gene sequences representing the human transcriptome is extracted from the ENSEMBL database.
3. Start of the main loop: Given the n-mer and gene list a sparse matrix of n-mers versus genes is generated by identifying all n-mers in a given gene and storing the result in a matrix.
4. If this is the first iteration, a copy of the matrix is put aside, and named the "total n-mer/gene matrix".
5. The n-mer that covers most genes is identified and the number of genes it covers is stored as "max_gene".
6. The coverage of the remaining n-mers in the matrix is determined and n-mers with coverage of at least 80% of max_gene are stored in the "n-mer list with good coverage".
7. The optimal n-mer is the one where the product of its current coverage and the total coverage is maximal.
8. The optimal n-mer is deleted from the n-mer list (step 1).
9. The genes covered by this n-mer are deleted from the gene list (step 2).
10. The n-mer is added to the optimal n-mer list, the process is continued from step 3 until no more n-mers can be found.
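Steps 3 to 10 above can be sketched as a single loop. The following is a hedged illustration rather than the getcover code: it assumes the n-mer/gene matrix is a dictionary of sets, reads step 6 as shortlisting n-mers whose current coverage is at least 80% of max_gene, and breaks ties alphabetically.

```python
def select_probes(total_matrix, good_fraction=0.8):
    """Greedy selection using the 80% shortlist and the product rule.

    total_matrix: dict of n-mer -> set of genes (the "total n-mer/gene matrix")
    Returns the optimal n-mer list in order of selection.
    """
    total_cov = {nmer: len(g) for nmer, g in total_matrix.items()}
    # Working copy that shrinks as genes are covered (steps 3 and 4).
    current = {nmer: set(g) for nmer, g in total_matrix.items() if g}
    optimal = []
    while current:
        # Step 5: largest number of still-uncovered genes hit by one n-mer.
        max_gene = max(len(g) for g in current.values())
        # Step 6: shortlist of n-mers with good current coverage.
        good = [k for k in sorted(current)
                if len(current[k]) >= good_fraction * max_gene]
        # Step 7: maximise current coverage times total coverage.
        best = max(good, key=lambda k: len(current[k]) * total_cov[k])
        covered = current.pop(best)          # step 8
        optimal.append(best)                 # step 10
        for nmer in list(current):           # step 9
            current[nmer] -= covered
            if not current[nmer]:
                del current[nmer]
    return optimal
```

In the first iteration the current and total coverage are identical, so the product rule only changes the outcome from the second iteration onwards, where it favours n-mers that are also common in the original gene set.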
The program code ("getcover" version 1.0 by Niels Tolstrup 2003) for calculation of a multi-probe dataset is listed in Fig. 17. It consists of three proprietary modules:
getcover.c, dyp.c, dyp.h
The program also incorporates four modules covered by the GNU Lesser General Public Licence:
getopt.c, getopt.h, getopt1.c, getopt_init.c
/* Copyright (C) 1987,88,89,90,91,92,93,94,95,96,98,99,2000,2001 Free Software Foundation, Inc.
These files are part of the GNU C Library. The GNU C Library is free software; you can redistribute it and/or modify it under the terms of the GNU Lesser General Public License as published by the Free Software Foundation */
The software was compiled with aap. The main.aap file used to make the program is likewise listed in Fig. 17.
To run the compiled program the following commands are used:
getcover -1 8,9 -b bad.lst -p -f < h_sap_cdna_50_1050.fasta > h_sap_cdna_50_1050_I9.stat
getcover -1 8,9 -b bad.lst -s < h_sap_cdna_50_1050_I9.stat > h_sap_cdna_50_1050_I9.cover
The computer program was used with instructions for implementing the algorithm described above to analyze the human transcriptome with the following parameter settings:
L89: probe length = 8 or 9 nucleotides
ii: inclusion fraction = 100%
d15: delta Tm required for target duplex against self duplex = 15 °C
t62: minimum Tm for target duplex = 62 °C
c: complementary target sequence used as well
m80: optimal probes selected among the most general probes addressing the remaining targets with the product rule and the 80% rule
n: LNA nucleotides were preferably included in the central part of the recognition segment
b: bad.lst is a list of oligos that are known experimentally to be bad and must be deselected;
and resulted in the identification of a database of multi-probe target sequences.
Target sequences in this database are exemplary optimal targets for a multi-probe library. These optimal multi-probes are listed in TABLE 1 below and comprise 5' fluorescein fluorophores and 3' Eclipse or other quenchers (see below).
TABLE 1 Dual label oligonucleotide probes cagcctcc cagagcca agctgtga aggaggga aggaggag ctggaagc cagagagc tgtggaga cccaggag cagccaga tgaggaga ctggggaa ctccagcc cttctggg acagtgga ctcctgca ctcctcca ttctgcca acagccat tgaggtgg ctgctgcc aggagaga tttctcca aaggcagc ctccagca ttcctg ca cagtggtg ctgtggca ctgctggg tttgggga aaagggga agaagggc cttcctgg caggcaga tgtgggaa tggatgga acagcagc ctgtgcca actgggaa ttctggca cagctcca ttccctgg tcacagga cagaaggc ccccaccc aaccccat ttcctccc atcccaga tggtggtg ctgcccag aggtggaa caggtgct ttcctcca ctgaggca tgtggaca ctgtctcc ctgctcca ctgctggt tggaggcc tgctgtga tggagaga cagtgcca atggtgaa agctggat aaggcaga atggggaa ctggaagg tggagagc cagccagg agggagag caggcagc cttggtgg cagcagga ctctgcca tcaggagc caccttgg ctgtgctg ctgctgag acacacac cagccacc agaggaga ccctccca catcttca ctgtgacc ctgtggct aggaggca cacctgca agggggaa cagtggct cactgcca ccagggcc tgggacca ttctccca ctgtgtgg cagaggca acagggaa cctggagc ttcccagt ctgggact ctgggcaa cccagcag tccagtgt ctgcctgt ctggagga ttctcctg ctcctccc tggaaggc tccactgc cttcctgc cttcccca ctgtgcct ctgccacc ccacctcc ctctgcca ctgtgctc acagcctca ttcctctg cagcaggt ctgtgagc ctgtggtc tggtgatg ctccatcc tcctcctc cttcaggc tgtggctg tgctgtcc ctcagcca tctgggtc cttctccc tcctctcc ctcttccc cttggagc ctgcctcc ctctgcct ctgggcac ccaggctc ctccttcc ctggctgc tgggcatc tctctggt tcctgctc ccgccgcc ctctggct cttgggct catcctcc ctcctcct tgctgggc ctgccatc aggagctg cagcctgg ctgctctc cactggga tcctgctg cagcagcc ctggagtc tgccctga ctcctcca tgctggag cttcagcc ttggtggt ccagccag cttcctcc cttccagc ttgggact cagcccag ttcctggc tccaggtc ctgctgga ctccacca tcctcagc cagcatcc caggagct ctccagcc aggagcag cagaggct ctcagcct tggctctg ccaggagg ctgccttc ttctggct caggcagc cagcctcc ctgggaga ctgtctgc ctgcctct agctggag cccagccc ctgtccca cttctgcc ctgctgcc cagctccc tctgccca ctgctccc tggctgtg ccagccgc ctggacac tggtggaa cctggaga cctcagcc ttgccatc agctggga ccagggcc tcctcttct cttcccct ctgcttcc ccaccacc ctggctcc cttgggca cagcaggc tctgctgc ccagggca ttctggtc tctggagc cagccacc ctccacct ccgccgcc catccagc 
cagaggag ctgcccca cttcttctc atggctgc ctctcctc tgggcagc ttccctcc ctcctgcc caggagcc ctggtctc ttcctcaga tggtggcc tctggtcc ctggggcc tccaaggc ctggggct ctgtctcc cagtggca ttggggtc ttgccatc cttcccct cttgggca ttctggtc cttcttctc ttccctcc ttcctcaga tccaaggc ttggggtc
These hyper-abundant 9-mer and 8-mer sequences fulfil the selection criteria in Fig. 2, i.e.:
- each probe target occurs in at least 6% of the sequences in the human transcriptome (i.e., more than 2200 target sequences each, more than 800 sequences targeted within 1000 nt proximal to the 3' end of the transcript);
- they are not self-complementary (i.e. unlikely to form probe duplexes): the self score is at least 10 below the Tm estimate for the duplex formed with the target;
- the duplex formed with their target sequence has a Tm at or above 60 °C.
They cover > 98 % of the mRNAs in the human transcriptome when combined.
Especially preferred versions of the multi-probes of Table 1 are presented in the following Table 1a:
TABLE 1a LNA substituted oligonucleotides
cAgCCTCc cAGAGCCa aGCTGTGa aGGAGGGa aGGAGGAg cTGGAAGc cAGAGAGc tGTGGAGa ccCAGGAg cAGCCAGa tGAGGAGa ctGGGGAa cTCCAgCc cTTCTGGg aCAGTGGa cTCCtGCa cTCCTCCa tTCTGCCa aCAGCCAt tGAGGtGg cTgCTGCc aGGAGAGa tTTCTCCa aAGGCAGc cTCCAGCa tTCCTGCa cAGTGGTg ctGTGGCa cTGCTGgg tTTGGGGa aAAGGGGa aGAAGGGc cTTCCTGg cAGGCAGa tGTGGGAa tGGATGGa aCAGCAGc ctGTGCCa aCTGGGAa tTCTGGCa caGCTCCa tTCCCTGg tCACAGGa cAGAAGGc cCCCACCc aACCCCAt tTCCTCCc aTCCCAGa tGGTGGTg ctGCCCag aGGTGGAa cAGGtGCt tTCCTCCa cTGAGGCa tGTGGACa cTGTCTCc cTGCTCCa cTGCtGGt tGGAGgCc tGCTGTGa tGGAGAGa cAGtGCCa atGGTGAA aGCTGGAt aAGGCAGa aTGGGGAa cTGGAAGg tGGAGAGc cAGCcAGg aGGGAGAg cAGGcAGc cTTGGTGg cAGCAGGa cTCtGCCa tCAGGaGc cACCTTGg cTGTGCTg cTGCTGAg aCACACAC cAgCCACc aGAGGAGa cCCtCCCa cATCTTCA cTGTGACc ctGTGGCt aGGAGGca cACCtGCa aGGGGGAa caGTGGCt cACtGCCa cCAGgGcc tGgGACCa tTCTCCCa cTGTGTGg cAGAGGCa aCAGGGAa cTGgcTGC cAGCAGGC cAGCATCC tCTGCCCA
ccGCCgCC cTGCCTCT cAGAGGCT cTGGACAC
cTCCTCCT cTCCACCT cATCCTCC tCAgCAGC
cTGGAGGA cTCCTCCC cTCTGCCT tTCTTGGC
caGCcTGG cTTCCCCA cAGTGGCA cggCGGCA
cAGcAGCC cTTCAGCC cAGCACCC cTGGTGGT
cTTCCTCC cTCTGCCA cTCTCCTC cCTTCTCC
ccAGGAGG cTTCTGCC tCTGgTCC cCTCTTCC
cAGCcTCC cAGCAGGT cAGGAGCC tGTTGCCA
aGcTGGAG tcTGGAGC cTGTCTCC tGGaTGGC
cTGcTGcC cTGCCCCA cTGGGACT cCAGCATC
tGGcTGTG cATCCAGC cTGCCTGT tCTTCTTCT
cCTGGAGa aTGGcTGC tGGaAGGC tcgCCGCC
cCAGGGcC cTCCTGCC cTGTGCCT tGCTGTTC
cCACCACC cTGGGGcc cTGTGCTC tCAAGGGC
acAGCCTCA cTCCATCC cTGTGAGC tgCTGCTC
cAGAGGAG cTGGGCAA cTCTTCCC tcGCCGTC
tGcTGGAG cCAGCCGC cTGGGCAC tTGATGCC
aGGAGcAG tGGTGGcc tGGGCATC cCTTCAGC
aGGaGCTG cTGGGGCT tCCTCCTC aTTCCAGC
tCCTGCTG cTGCTCCC cTCTGGCT tTGATGGC
cCTGGAGC tGCTGTCC tgcTGGGC cCAGTTCC
cTCCTCCA tCCTCTCC cTCAGCCA tTGGCTTC
cCAGCCAG tGGTGGAA cTGCTCTC tTGCCTTC
cCCAGCAG aGCTGGGA cTGGAGTC aTGGCTTC
tTCTCCTG cTGGTCTC cTGTGGTC cACCCGCT
cAGCCCAG tTCCCAGT cTTCAGGC tCTTTGCC
cTTCCTGC tCCTCTTCT tCTGGGTC cTGGTTGC
cTCCACCA tCCAGTGT cTTGGAGC tGGACACC
cTTCCAGC tGGGcAGC cCAGGCTC tcGTCGCC
cCCAGCCC cCAGGGCA tCTCTGGT cCATCAGC
cTGCCTTC cTGGCTCC CTTGGGCT tGGTGGAT
cTCCAGCC tCTGcTGC cTGCCATC aTGGTGGT
cCACCTCC cAGCCACC cACTGGGA cCtGGTGC
tTCCTCTG tTCcTGGC tGCCCTGa tCCTCGTC
tGGCTCTG tCCTCAGC tTGGTGGT tTCTTGCC
tGGTGATG cTCCTTCC tTGGGACT tGGgCTTC
tGTGGcTG cTGGGAGA cTGCTGGA tGATGAGC
cTTCTCCC tCCTGCTC cAGGaGCT tCCTggCC
cTGCCTCC cAGGcAGC cTCAGCCT cCTCCTTC
cAGCTCCC tCCACTGC tTCTGGCT tGCTGGAG
cTGCTTCC cTGCCACC cTGTCTGC
ccTCAGCC tCcAGGTC cTGTCCCA
- wherein small letters designate deoxyribonucleotides and capital letters designate LNA
nucleotides.
> 95.0 % of the mRNA sequences are targeted within the 1000 nt near their 3' terminus (position 50 to 1050 from the 3' end), and > 95% of the mRNAs contain the target sequence for more than one probe in the library. More than 650,000 target sites for these 100 multi-probes were identified in the human transcriptome containing 37,347 nucleic acid sequences.
The average number of multi-probes addressing each transcript in the transcriptome is 17.4, and the median transcript contains target sites for 14 different probes.
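The coverage figures quoted above can be illustrated with a minimal sketch. It assumes the transcriptome is available as a dictionary of sequences and that a probe "addresses" a transcript when its target sequence occurs as a plain substring of the chosen window; the function names `window_near_3prime` and `coverage_stats` are hypothetical and are not part of the software described in this document.

```python
from statistics import mean, median

def window_near_3prime(seq, start=50, end=1050):
    """Return the stretch lying between `start` and `end` nt from the 3' end."""
    n = len(seq)
    lo = max(0, n - end)
    hi = max(0, n - start)
    return seq[lo:hi]

def coverage_stats(transcripts, probes, start=50, end=1050):
    """Count, per transcript, how many probe targets occur in the 3' window.

    transcripts: dict mapping a name to an mRNA sequence (assumed input format)
    probes:      list of probe target sequences (8-mers or 9-mers)
    """
    counts = []
    for seq in transcripts.values():
        region = window_near_3prime(seq.lower(), start, end)
        counts.append(sum(1 for p in probes if p.lower() in region))
    total = len(counts)
    return {
        "covered_fraction": sum(1 for c in counts if c >= 1) / total,
        "multi_probe_fraction": sum(1 for c in counts if c >= 2) / total,
        "mean_probes": mean(counts),
        "median_probes": median(counts),
    }
```

Calling `coverage_stats` with `start=0` and a large `end` gives whole-transcript statistics instead of statistics restricted to the 3' window.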
The sequences noted above are also an excellent choice of probes for other transcriptomes, though they were not selected to be optimized for the particular organisms. We have thus evaluated the coverage of the above listed library for the mouse and rat transcriptomes, despite the fact that the above probes were designed to detect/characterize/quantify the transcripts in the human transcriptome only. See Table 2.
TABLE 2 Human probe library
Transcriptome                       Human    Mouse    Rat
no. of mRNA sequences               37347    32911    28904
Coverage of full length mRNAs       96.7%    94.6%    93.5%
Coverage 1000 nt near the 3'-end    91.0%    -        -
At least covered by two probes      89.8%    80.2%    77.0%
nt - nucleotides.
Expected coverage of human transcriptome by frequently occurring 9-mer oligonucleotides
Experimental pilot data (similar to Fig. 6) indicated that it is possible to reduce the length of the recognition sequence of a dual-labelled probe for real-time PCR assays to 8 or 9 nucleotides, depending on the sequence, if the probe is enhanced with LNA. The unique duplex-stabilizing properties of LNA are necessary to ensure adequate stability for such a short duplex (i.e. Tm > 60 °C). The functional real-time PCR probe will be almost pure LNA with 6 to 10 LNA nucleotides in the recognition sequence. However, the short recognition sequence makes it possible to use the same LNA probe to detect and quantify the abundance of many different genes. By proper selection of the best (i.e. most common) 8- or 9-mer recognition sequences according to the algorithm depicted in Fig. 2 it is possible to get a coverage of the human transcriptome containing about 37347 mRNAs (Fig. 3).
Fig. 3 shows the expected coverage as percentage of the total number of mRNA sequences in the human transcriptome that are detectable within a 1000 nt long stretch near the 3' end of the respective sequences (i.e. the sequence from 50 nt to 1050 nt from the 3' end) by optimized probes of different lengths. The probes are required to be sufficiently stable (Tm > 60 °C) and to have a low propensity for forming self duplexes, which eliminates many 9-mers and even more 8-mer probe sequences.
If all probe sequences of a given length could be used as probes we would obviously get the best coverage of the transcriptome by the shortest possible probe sequences. This is indeed the case when only a limited number of probes (< 55) are included in the library (Fig. 4).
However, because many short probes with a low GC content have an inadequate thermal stability, they were omitted from the library. The limited diversity of acceptable 8-mer probes makes them less efficient at detecting low GC content genes, and a library composed of 100 different 9-mer probes consequently has a better coverage of the transcriptome than a similar library of 8-mers. However, the best choice is a mixed library composed of sequences of different lengths, such as the proposed best mode library listed above. The coverage of this library is not shown in Fig. 4.
The designed probe library containing 100 of the most commonly occurring 9-mer and 8-mers, i.e., the "Human mRNA probe library" can be handled in a convenient box or microtiter plate format.
The initial set of 100 probes for human mRNAs can be modified to generate similar library kits for transcriptomes from other organisms (mouse, rat, Drosophila, C. elegans, yeast, Arabidopsis, zebra fish, primates, domestic animals, etc.). Construction of these new probe libraries will require little effort, as most of the human mRNA probes may be re-used in the novel library kits (TABLE 2).
Number of probes in the library that target each gene
Not only does the limited number of probes in the proposed libraries target a large fraction (> 98%) of the human transcriptome, but there is also a large degree of redundancy, in that most of the genes (almost 95%) may be detected by more than one probe. More than 650,000 target sites have been identified in the human transcriptome (37347 genes) for the 100 probes in the best mode library shown above. This gives an average number of target sites per probe of 6782 (i.e. 18 % of the transcriptome), ranging from 2527 to 12066 sequences per probe. The average number of probes capable of detecting a particular gene is 17.4, and the median value is 14. Within the library of only 100 probes we thus have at least 14 probes for more than 50% of all human mRNA sequences.
The number of genes that are targeted by a given number of probes in the library is depicted in Fig. 4.
Design of 9-mer probes to demonstrate feasibility
The SSA4 gene from yeast (Saccharomyces cerevisiae) was selected for the expression assays because the gene transcription level can be induced by heat shock and mutants are available where expression is knocked out. Three different 9-mer sequences were selected amongst commonly occurring 9-mer sequences within the human transcriptome (Table 3).
The sequences were present near the 3' terminal end of 1.8 to 6.4 % of all mRNA sequences within the human transcriptome. Further selection criteria were a moderate level of self-complementarity and a Tm of 60 °C or above. All three sequences were present within the terminal 1000 bases of the SSA4 ORF. Three 5' nuclease assay probes were constructed by synthesizing the three sequences with a FITC fluorophore at the 5'-end and an Eclipse quencher (Epoch Biosciences) at the 3'-end. The probes were named according to their position within the ORF YER103W (SSA4), where position 1201 was set to be position 1. Three sets of primer pairs were designed to produce three non-overlapping amplicons, each of which contained one of the three probe sequences. Amplicons were named according to the probe sequence they encompassed.
Table 3. Designed 5' nuclease assay probes and primers
Sequence    Name of probe       Forward primer sequence                      Reverse primer sequence                    Amplicon length
aaGGAGAAG   Dual-labelled-469   cgcgtttactttgaaaaattctg (SEQ ID NO: 1)       gcttccaatttcctggcatc (SEQ ID NO: 2)        81 bp
cAAGGAAAg   Dual-labelled-570   gcccaagatgctataaattggttag (SEQ ID NO: 3)     gggtttgcaacaccttctagttc (SEQ ID NO: 4)     95 bp
ctGGAGCaG   Dual-labelled-671   tacggagctgcaggtggt (SEQ ID NO: 5)            gttgggccgttgtctggt (SEQ ID NO: 6)          86 bp
bp - base pairs
Two Molecular Beacons were also designed to detect the SSA4 469 and the SSA4 570 sequences and named Beacon-469 and Beacon-570, respectively. The sequence of Beacon-469 was CAAGGAGAAGTTG (SEQ ID NO: 7, 10-mer recognition site), which should enable this oligonucleotide to form the intramolecular beacon structure with a stem formed by the LNA-LNA interactions between the 5'-CAA and the TTG-3'. The sequence of Beacon-570 was CAAGGAAAGttG (9-mer recognition site), where the intramolecular beacon structure may form between the 5'-CAA and the ttG-3'. Both sequences were synthesized with a fluorescein fluorophore at the 5'-end and a Dabcyl quencher at the 3'-end.
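The stem described for the two beacons (5'-CAA pairing with TTG-3') can be checked with a short sketch. It treats LNA and DNA monomers alike by ignoring letter case and assumes sequences written with plain A, C, G, T letters only; `has_stem` is a hypothetical helper name used for illustration.

```python
# Complement table for lower-case DNA letters.
COMPLEMENT = str.maketrans("acgt", "tgca")

def reverse_complement(seq):
    """Reverse complement of a sequence, ignoring LNA/DNA letter case."""
    return seq.lower().translate(COMPLEMENT)[::-1]

def has_stem(seq, stem_len=3):
    """True if the first `stem_len` bases can pair with the last `stem_len` bases."""
    s = seq.lower()
    return s[:stem_len] == reverse_complement(s[-stem_len:])

# Beacon sequences as quoted in the text above.
print(has_stem("CAAGGAGAAGTTG"))  # Beacon-469
print(has_stem("CAAGGAAAGttG"))   # Beacon-570
```

A linear 5' nuclease probe such as aaGGAGAAG is not expected to pass this check, since it carries no designed stem.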
One SYBR Green labelled probe was also designed to detect the SSA4 570 sequence and named SYBR-Probe-570. The sequence of this probe was CAAGGAAaG. This probe was synthesized with an amino-C6 linker on the 5'-end, to which the fluorophore SYBR Green 101 (Molecular Probes) was attached according to the manufacturer's instructions. Upon hybridization to the target sequence, the linker-attached fluorophore should intercalate in the generated LNA-DNA duplex region, causing increased fluorescence from the SYBR Green 101.
TABLE 4: SEQUENCES
EQ Number   Name                Type                      Sequence                                                                Position in gene
13992       Dual-labelled-469   5' nuclease assay probe   5'-Fluor-aaGGAGAAG-Eclipse-3'                                           469-477
13994       Dual-labelled-570   5' nuclease assay probe   5'-Fluor-cAAGGAAAg-Eclipse-3'                                           570-578
13996       Dual-labelled-671   5' nuclease assay probe   5'-Fluor-ctGGAGCaG-Eclipse-3'                                           671-679
13997       Beacon-469          Molecular Beacon          5'-Fluor-CAAGGAGAAGTTG-Dabcyl-3' (5'-Fluor-SEQ ID NO: 8-Dabcyl-3')
14148       Beacon-570          Molecular Beacon          5'-Fluor-CAAGGAAAGttG-Dabcyl-3' (5'-Fluor-SEQ ID NO: 9-Dabcyl-3')
14165       SYBR-Probe-570      SYBR-Probe                5'-SYBR101-NH2C6-cAAGGAAAg-3'
14012       SSA4-469-F          Primer                    cgcgtttactttgaaaaattctg (SEQ ID NO: 10)
14013       SSA4-469-R          Primer                    gcttccaatttcctggcatc (SEQ ID NO: 11)
14014       SSA4-570-F          Primer                    gcccaagatgctataaattggttag (SEQ ID NO: 12)
14015       SSA4-570-R          Primer                    gggtttgcaacaccttctagttc (SEQ ID NO: 13)
14016       SSA4-671-F          Primer                    tacggagctgcaggtggt (SEQ ID NO: 14)
14017       SSA4-671-R          Primer                    gttgggccgttgtctggt (SEQ ID NO: 15)
14115       POL5-469-F          Primer                    gcgagagaaaacaagcaagg (SEQ ID NO: 16)
14116       POL5-469-R          Primer                    attcgtcttcactggcatca (SEQ ID NO: 17)
14117       APG9-570-F          Primer                    cagctaaaaatgatgacaataatgg (SEQ ID NO: 18)
14118       APG9-570-R          Primer                    attacatcatgattagggaatgc (SEQ ID NO: 19)
14119       HSP82-671-F         Primer                    gggtttgaacattgatgagga (SEQ ID NO: 20)
14120       HSP82-671-R         Primer                    ggtgtcagctggaacctctt (SEQ ID NO: 21)
Synthesis, deprotection and purification of dual labelled oligonucleotides
The dual labelled oligonucleotides EQ13992 to EQ14148 (Table 4) were prepared on an automated DNA synthesizer (Expedite 8909 DNA synthesizer, PerSeptive Biosystems, 0.2 µmol scale) using the phosphoramidite approach (Beaucage and Caruthers, Tetrahedron Lett. 22: 1859-1862, 1981) with 2-cyanoethyl protected LNA and DNA phosphoramidites (Sinha, et al., Tetrahedron Lett. 24: 5843-5846, 1983). CPG solid supports were derivatized with either Eclipse quencher (EQ13992-EQ13996) or dabcyl (EQ13997-EQ14148) and 5'-fluorescein phosphoramidite (GLEN Research, Sterling, Virginia, USA). The synthesis cycle was modified for LNA phosphoramidites (250 s coupling time) compared to DNA phosphoramidites. 1H-tetrazole or 4,5-dicyanoimidazole (Proligo, Hamburg, Germany) was used as activator in the coupling step.
The oligonucleotides were deprotected using 32% aqueous ammonia (1 h at room temperature, then 2 hours at 60 °C) and purified by HPLC (Shimadzu-SpectraChrom series; Xterra™ RP18 column, 10 µm, 7.8 x 150 mm (Waters). Buffers: A: 0.05 M triethylammonium acetate pH 7.4; B: 50% acetonitrile in water. Eluent: 0-25 min: 10-80% B; 25-30 min: 80% B). The composition and purity of the oligonucleotides were verified by MALDI-MS (PerSeptive Biosystems, Voyager DE-PRO) analysis, see Table 5. Fig. 5 is the MALDI-MS spectrum of EQ13992 showing [M-H]- = 4121,3 Da. This is a typical MALDI-MS spectrum for the 9-mer probes of the invention.
TABLE 5:
EQ#     Sequences                                                            MW (Calc.)   MW (Found)
13992   5'-Fitc-aaGGAGAAG-EQL-3'                                             4091,8 Da    4091,6 Da
13994   5'-Fitc-cAAGGAAAg-EQL-3'                                             4051,9 Da    4049,3 Da
13996   5'-Fitc-ctGGAGmCaG-EQL-3'                                            4020,8 Da    4021,6 Da
13997   5'-Fitc-mCAAGGAGAAGTTG-dabcyl-3' (5'-Fitc-SEQ ID NO: 22-dabcyl-3')   5426,3 Da    5421,2 Da
Capitals designate LNA monomers (A, G, mC, T), where mC is LNA methyl cytosine. Small letters designate DNA monomers (a, g, c, t). Fitc = Fluorescein; EQL = Eclipse quencher; Dabcyl = Dabcyl quencher. MW = Molecular weight.
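As a small worked example, the deviation between calculated and found masses in Table 5 can be tabulated as follows. The parsing helper `parse_mass` is a hypothetical name; it only assumes that masses are written with a decimal comma and a trailing "Da", as in the table.

```python
def parse_mass(text):
    """Convert a string such as '4091,8 Da.' (decimal comma) to a float."""
    cleaned = text.replace("Da", "").strip().rstrip(".").strip()
    return float(cleaned.replace(",", "."))

# (EQ number, calculated MW, found MW) as listed in Table 5.
TABLE_5 = [
    ("13992", "4091,8 Da.", "4091,6 Da."),
    ("13994", "4051,9 Da.", "4049,3 Da."),
    ("13996", "4020,8 Da.", "4021,6 Da."),
    ("13997", "5426,3 Da.", "5421,2 Da."),
]

def mass_deviations(rows):
    """Return {EQ number: found - calculated}, rounded to 0.1 Da."""
    return {
        eq: round(parse_mass(found) - parse_mass(calc), 1)
        for eq, calc, found in rows
    }

for eq, delta in mass_deviations(TABLE_5).items():
    print(f"EQ{eq}: {delta:+.1f} Da")
```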
Production of cDNA standards of SSA4 for detection with 9-mer probes
The functionality of the constructed 9-mer probes was analysed in PCR assays in which the probes' ability to detect different SSA4 PCR amplicons was tested. Template for the PCR reaction was cDNA obtained from reverse transcription of cRNA produced by in vitro transcription of a downstream region of the SSA4 gene in the expression vector pTRIamp18 (Ambion). The downstream region of the SSA4 gene was cloned as follows:
PCR amplification
Amplification of the partial yeast gene was done by standard PCR using yeast genomic DNA as template. Genomic DNA was prepared from a wild type standard laboratory strain of Saccharomyces cerevisiae using the Nucleon MiY DNA extraction kit (Amersham Biosciences) according to the supplier's instructions. In the first step of PCR amplification, a forward primer containing a restriction enzyme site and a reverse primer containing a universal linker sequence were used. In this step 20 bp were added to the 3'-end of the amplicon, next to the stop codon. In the second step of amplification, the reverse primer was exchanged with a nested primer containing a poly-T20 tail and a restriction enzyme site. The SSA4 amplicon contains 729 bp of the SSA4 ORF plus a 20 bp universal linker sequence and a poly-A20 tail.
The PCR primers used were:
YER103W-For-SacI: acgtgagctcattgaaactgcaggtggtattatga (SEQ ID NO: 23)
YER103W-Rev-Uni: gatccccgggaattgccatgctaatcaacctcttcaaccgttgg (SEQ ID NO: 24)
Uni-polyT-BamHI: acgtggatccttttttttttttttttttttgatccccgggaattgccatg (SEQ ID NO: 25).
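The primer design described above (a restriction site in each outer primer and a shared 20 nt universal linker) can be inspected with a short sketch. The recognition sequences for SacI (GAGCTC) and BamHI (GGATCC) are the standard ones; the nested primer is rebuilt here under the assumption that its tail is exactly T20, as the text states, and `find_sites` is a hypothetical helper.

```python
# Recognition sequences of the two enzymes named in the primer names.
SITES = {"SacI": "gagctc", "BamHI": "ggatcc"}

FORWARD = "acgtgagctcattgaaactgcaggtggtattatga"               # YER103W-For-SacI
REVERSE_UNI = "gatccccgggaattgccatgctaatcaacctcttcaaccgttgg"  # YER103W-Rev-Uni

# The first 20 nt of the reverse primer form the universal linker.
LINKER = REVERSE_UNI[:20]

# Nested primer rebuilt from its parts: 5' clamp + BamHI site, T20, linker.
# The T20 length is an assumption taken from the "poly-T20 tail" in the text.
POLY_T_BAMHI = "acgtggatcc" + "t" * 20 + LINKER

def find_sites(seq):
    """Map each enzyme to the 0-based position of its site, if present."""
    seq = seq.lower()
    return {name: seq.find(site) for name, site in SITES.items() if site in seq}

print(find_sites(FORWARD))
print(find_sites(POLY_T_BAMHI))
print(POLY_T_BAMHI.endswith(LINKER))
```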
Plasmid DNA constructs
The PCR amplicon was cut with the restriction enzymes EcoRI + BamHI. The DNA fragment was ligated into the pTRIamp18 vector (Ambion) using the Quick Ligation Kit (New England Biolabs) according to the supplier's instructions and transformed into E. coli DH-5 by standard methods.
DNA sequencing
To verify the cloning of the PCR amplicon, plasmid DNA was sequenced using M13 forward and M13 reverse primers and analysed on an ABI 377.
In vitro transcription
SSA4 cRNA was obtained by performing in vitro transcription with the Megascript T7 kit (Ambion) according to the supplier's instructions.
Reverse transcription
Reverse transcription was performed with 1 µg of cRNA and 0.2 U of the reverse transcriptase Superscript II RT (Invitrogen) according to the supplier's instructions, except that 20 U Superase-In (RNAse inhibitor - Ambion) was added. The produced cDNA was purified on a QiaQuick PCR purification column (Qiagen) according to the supplier's instructions using the supplied EB-buffer for elution. The DNA concentration of the eluted cDNA was measured and diluted to a concentration of SSA4 cDNA copies corresponding to 2 x 10' copies per µL.
Protocol for dual label probe assays
Reagents for the dual label probe PCRs were mixed according to the following scheme (Table 6):
Table 6
Reagents                     Final Concentration
GeneAmp 10x PCR buffer II    1x
Mg2+                         5.5 mM
dNTP                         0.2 mM
Dual Label Probe             0.1 or 0.3 µM*
Template                     1 µL
Forward primer               0.2 µM
Reverse primer               0.2 µM
AmpliTaq Gold                2.5 U
Total                        50 µL
*) Final concentration of 5' nuclease assay probe 0.1 µM and Beacon/SYBR-probe 0.3 µM.
In the present experiments 2 x 10' copies of the SSA4 cDNA were added as template. Assays were performed in a DNA Engine Opticon (MJ Research) using the following PCR cycle protocols:
Table 7
5' nuclease assays        Beacon & SYBR-probe assays
95 °C for 7 minutes       95 °C for 7 minutes
& 40 cycles of:           & 40 cycles of:
94 °C for 20 seconds      94 °C for 30 seconds
60 °C for 1 minute        52 °C for 1 minute*
Fluorescence detection    Fluorescence detection
                          72 °C for 30 seconds
* For the Beacon-570 with 9-mer recognition site the annealing temperature was reduced to
The composition of the PCR reactions shown in Table 6 together with the PCR cycle protocols listed in Table 7 will be referred to as standard 5' nuclease assay or standard Beacon assay conditions.
Specificity of 9-mer 5' nuclease assay probes
The specificity of the 5' nuclease assay probes was demonstrated in assays where each of the probes was added to 3 different PCR reactions, each generating a different SSA4 PCR amplicon. As shown in Fig. 6, each probe only produces a fluorescent signal together with the amplicon it was designed to detect (see also Figs. 10, 11 and 12). Importantly, the different probes had very similar cycle threshold (Ct) values (from 23.2 to 23.7), showing that the assays and probes have very similar efficiencies. Furthermore, it indicates that the assays should detect similar expression levels when used in real expression assays. This is an important finding, because variability in performance of different probes is undesirable.
Specificity of 9 and 10-mer Molecular Beacon probes
The ability to detect newly generated PCR amplicons in real time was also demonstrated for the molecular beacon design concept. The Molecular Beacon designed against the 469 amplicon with a 10-mer recognition sequence produced a clear signal when the SSA4 cDNA template and primers for generating the 469 amplicon were present in the PCR, Fig. 7A. The observed Ct value was 24.0 and very similar to the ones obtained with the 5' nuclease assay probes, again indicating a very similar sensitivity of the different probes. No signal was produced when the SSA4 template was not added. A similar result was produced by the Molecular Beacon designed against the 570 amplicon with a 9-mer recognition sequence, Fig. 7B.
EXAMPLE 11.
Specificity of 9-mer SYBR-probes.
The ability to detect newly generated PCR amplicons was also demonstrated for the SYBR-probe design concept. The 9-mer SYBR-probe designed against the 570 amplicon of the SSA4 cDNA produced a clear signal when the SSA4 cDNA template and primers for generating the 570 amplicon were present in the PCR, Fig. 8. No signal was produced when the SSA4 template was not added.
Quantification of transcript copy number
The ability to detect different levels of gene transcripts is an essential requirement for a probe to perform in a true expression assay. The fulfilment of this requirement was shown by the three 5' nuclease assay probes in an assay where different levels of the expression vector derived SSA4 cDNA were added to different PCR reactions together with one of the 5' nuclease assay probes (Fig. 9). Composition and cycle conditions were according to standard 5' nuclease assay conditions.
The cDNA copy number in the PCR before start of cycling is reflected in the cycle threshold value Ct, i.e., the cycle number at which signal is first detected. A signal is here counted only if the fluorescence is five times above the standard deviation of the fluorescence detected in PCR cycles 3 to 10. The results show an overall good correlation between the logarithm of the initial cDNA copy number and the Ct value (Fig. 9). The correlation appears as a straight line with a slope between -3.456 and -3.499 depending on the probe, and correlation coefficients between 0.9981 and 0.9999. The slope of the curves reflects the efficiency of the PCRs, with a 100% efficiency corresponding to a slope of -3.322 assuming a doubling of amplicon in each PCR cycle. The slopes of the present PCRs indicate PCR efficiencies between 94% and 100%. The correlation coefficients and the PCR efficiencies are as high as or higher than the values obtained with DNA 5' nuclease assay probes 17 to 26 nucleotides long in detection assays of the same SSA4 cDNA levels (results not shown). Therefore these results show that the three 9-mer 5' nuclease assay probes meet the requirements for true expression probes, indicating that the probes should perform in expression profiling assays.
Detection of SSA4 transcription levels in yeast
Expression levels of the SSA4 transcript were detected in different yeast strains grown at different culture conditions (with or without heat shock). A standard laboratory strain of Saccharomyces cerevisiae was used as wild type yeast in the experiments described here. An SSA4 knockout mutant was obtained from EUROSCARF (accession number Y06101). This strain is here referred to as the SSA4 mutant. Both yeast strains were grown in YPD medium at 30 °C to an OD600 of 0.8 A. Yeast cultures that were to be heat shocked were transferred to 40 °C for 30 minutes, after which the cells were harvested by centrifugation and the pellet frozen at -80 °C. Non-heat shocked cells were in the meantime left growing at 30 °C for 30 minutes and then harvested as above.
RNA was isolated from the harvested yeast using the FastRNA Kit (Bio 101) and the FastPrep machine according to the supplier's instructions.
Reverse transcription was performed with 5 µg of anchored oligo(dT) primer to prime the reaction on 1 µg of total RNA, and 0.2 U of the reverse transcriptase Superscript II RT (Invitrogen) according to the supplier's instructions, except that 20 U Superase-In (RNAse inhibitor - Ambion) was added. After two hours of incubation, enzyme inactivation was performed at 70 °C for 5 minutes. The cDNA reactions were diluted 5 times in 10 mM Tris buffer pH 8.5, and oligonucleotides and enzymes were removed by purification on a MicroSpin™ S-column (Amersham Pharmacia Biotech). Prior to performing the expression assay the cDNA was diluted 20 times. The expression assay was performed with the Dual-labelled-570 probe using standard 5' nuclease assay conditions, except that 2 µL of template was added.
The template was a 100 times dilution of the original reverse transcription reactions. The four different cDNA templates used were derived from wild type or mutant with or without heat shock. The assay produced the expected results (Fig. 10), showing increased levels of the SSA4 transcript in heat shocked wild type yeast (Ct = 26.1) compared to the wild type yeast that was not submitted to elevated temperature (Ct = 30.3). No transcripts were detected in the mutant yeast irrespective of culture conditions. The difference in Ct values of 3.5 corresponds to a 17 fold induction in the expression level of the heat shocked versus the non-heat shocked wild type yeast, and this value is close to the values around 19 reported in the literature (Causton, et al. 2001). These values were obtained by using the standard curve obtained for the Dual-labelled-570 probe in the quantification experiments with known amounts of the SSA4 transcript (see Fig. 9). The experiments demonstrate that the 9-mer probes are capable of detecting expression levels that are in good accordance with published results.
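The relations between standard-curve slope, PCR efficiency and fold induction used in the two sections above can be sketched as follows. The sketch assumes the usual log-linear standard curve, Ct = slope x log10(copies) + intercept, so that efficiency is 10^(-1/slope) - 1 and a Ct difference translates to a fold change of 10^(delta Ct / -slope); the function names are hypothetical.

```python
import math

def efficiency_from_slope(slope):
    """PCR efficiency (1.0 = 100 %) from the slope of Ct versus log10(copies)."""
    return 10 ** (-1.0 / slope) - 1.0

def fold_change(delta_ct, slope):
    """Fold difference in starting copies for a Ct difference, given the slope."""
    return 10 ** (delta_ct / -slope)

# Slope for perfect doubling per cycle, about -3.322 as stated in the text.
IDEAL_SLOPE = -1.0 / math.log10(2)

for slope in (IDEAL_SLOPE, -3.456, -3.499):
    print(f"slope {slope:.3f}: efficiency {efficiency_from_slope(slope):.1%}")

# Ct values quoted for non-heat-shocked and heat-shocked wild type yeast.
delta_ct = 30.3 - 26.1
print(f"fold change at ideal slope: {fold_change(delta_ct, IDEAL_SLOPE):.1f}")
```

The fold change depends on which slope is used, so a value read from a probe-specific standard curve will differ somewhat from the ideal-doubling estimate.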
Multiple transcript detection with individual 9-mer probes
To demonstrate the ability of the three 5' nuclease assay probes to detect expression levels of other genes as well, three different yeast genes were selected in which one of the probe sequences was present. Primers were designed to amplify a 60-100 base pair region around the probe sequence. The three selected yeast genes and the corresponding primers are shown in Table 8.
Table 8. Design of alternative expression assays
Sequence/Name    Matching probe      Forward primer sequence                       Reverse primer sequence                     Amplicon length
YEL055C/POL5     Dual-labelled-469   gcgagagaaaacaagcaagg (SEQ ID NO: 26)          attcgtcttcactggcatca (SEQ ID NO: 27)        94 bp
YDL149W_APG9     Dual-labelled-570   cagctaaaaatgatgacaataatgg (SEQ ID NO: 28)     attacatcatgattagggaatgc (SEQ ID NO: 29)     97 bp
YPL240C_HSP82    Dual-labelled-671   gggtttgaacattgatgagga (SEQ ID NO: 30)         ggtgtcagctggaacctctt (SEQ ID NO: 31)        88 bp
Total cDNA derived from non-heat shocked wild type yeast was used as template for the expression assay, which was performed using standard 5' nuclease assay conditions, except that 2 µL of template was added. As shown in Fig. 11, all three probes could detect expression of the genes according to the assay design outlined in Table 8. Expression was not detected with any other combination of probe and primers than the ones outlined in Table 8. Expression data are available in the literature for SSA4, POL5, HSP82 and APG9 (Holstege, et al. 1998). For non-heat shocked yeast, these data describe similar expression levels for SSA4 (0.8 transcript copies per cell), POL5 (0.8 transcript copies per cell) and HSP82 (1.3 transcript copies per cell), whereas APG9 transcript levels are somewhat lower (0.1 transcript copies per cell).
These data are in good correspondence with the results obtained here, since all these genes showed similar Ct values except HSP82, which had a Ct value of 25.6. This suggests that the HSP82 transcript was more abundant in the strain used in these experiments than what is indicated by the literature. Agarose gel electrophoresis was performed with the PCRs shown in Fig. 11a for the Dual-labelled-469 probe. The agarose gel (Fig. 12) shows that PCR product was indeed generated in reactions where no signal was obtained, and therefore the lack of fluorescent signal from these reactions was not caused by failure of the PCR. Furthermore, the different lengths of the amplicons produced in expression assays for different genes indicate that the signals produced in these assays are indeed specific for the gene in question.
Selection of targets
Using the EnsMart software release 16.1 from http://www.ensembl.org/EnsMart, the 50 bases from each end of all exons from the Homo sapiens NCBI 33 dbSNP115 Ensembl Genes were extracted to form a Human Exon50 target set. Using the GetCover program (cf. Fig. 17), the occurrence of all probe target sequences was calculated, and probe target sequences not passing the selection criteria (excess self-complementarity, excessive GC content, etc.) were eliminated. Among the remaining sequences, the most abundant probe target sequence was selected (No. 1, covering 3200 targets), and subsequently all the probe targets having a prevalence above 0.8 times the prevalence of the most abundant (3200 x 0.8), i.e. above 2560 targets. From the remaining sample the number of new hits for each probe was computed, and the product of the number of new hits per probe target (compared to the existing selection) and the total prevalence of the same probe target was computed and used to select the next probe target sequence by selecting the highest product number. The probe target length (n) and sequence (nmer), the occurrence in the total target set (cover), the number of new hits per probe target selection (Newhit), the product of Newhit and cover (newhit x cover) and the number of accumulated hits in the target population from all accumulated probes (sum) are exemplified in the table below.
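The iterative "new hits x cover" selection described above can be sketched as a greedy loop. This is a simplified reading of the procedure and not the GetCover program itself: it assumes the pre-filtered candidates are given as a mapping from probe target to the set of target ids containing it, and it omits the initial step that admits all candidates above 0.8 times the top prevalence. The name `greedy_select` is hypothetical.

```python
def greedy_select(cover, n_probes):
    """Pick probe targets by the highest product of new hits and total cover.

    cover: dict mapping a probe target sequence to the set of target ids
           that contain it (assumed to be pre-filtered for Tm, self-score, etc.).
    Returns a list of (nmer, newhit, cover, sum) rows, like the table columns.
    """
    remaining = dict(cover)
    hit = set()          # targets already reached by the selected probes
    rows = []
    while remaining and len(rows) < n_probes:
        # Score every candidate against the current selection.
        best = max(
            remaining,
            key=lambda p: len(remaining[p] - hit) * len(remaining[p]),
        )
        newhit = len(remaining[best] - hit)
        if newhit == 0:
            break        # no candidate adds any new target
        hit |= remaining.pop(best)
        rows.append((best, newhit, len(cover[best]), len(hit)))
    return rows

# Toy example with made-up target ids (illustrative data only).
toy = {"aaaa": {1, 2, 3, 4}, "cccc": {3, 4, 5}, "gggg": {5, 6}, "tttt": {1, 2}}
for row in greedy_select(toy, 10):
    print(row)
```

In the toy run, "gggg" is preferred over "cccc" in the second round because it adds two new targets rather than one, even though its total cover is smaller.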
No  n  nmer       Newhit  Cover  newhit x cover  sum
 1  8  ctcctcct   3200    3200   10240000        3200
 2  8  ctggagga   2587    3056   7905872         5787
 3  8  aggagctg   2132    3074   6553768         7919
 4  8  cagcctgg   2062    2812   5798344         9981
 5  8  cagcagcc   1774    2809   4983166         11755
 6  8  tgctggag   1473    2864   4218672         13228
 7  8  agctggag   1293    2863   3701859         14521
 8  8  ctgctgcc   1277    2608   3330416         15798
 9  8  aggagcag   1179    2636   3107844         16977
10  8  ccaggagg   1044    2567   2679948         18021
11  8  tcctgctg   945     2538   2398410         18966
12  8  cttcctcc   894     2477   2214438         19860
13  8  ccgccgcc   1017    2003   2037051         20877
14  8  cctggagc   781     2439   1904859         21658
15  8  cagcctcc   794     2325   1846050         22452
16  8  tggctgtg   805     2122   1708210         23257
17  8  cctggaga   692     2306   1595752         23949
18  8  ccagccag   661     2205   1457505         24610
19  8  ccagggcc   578     2318   1339804         25188
20  8  cccagcag   544     2373   1290912         25732
21  8  ccaccacc   641     1916   1228156         26373
22  8  ctcctcca   459     3010   1381590         26832
23  8  ttctcctg   534     1894   1011396         27366
24  8  cagcccag   471     2033   957543          27837
25  8  ctggctgc   419     2173   910487          28256
26  8  ctccacca   426     2097   893322          28682
27  8  cttcctgc   437     1972   861764          29119
28  8  cttccagc   415     1883   781445          29534
29  8  ccacctcc   366     2018   738588          29900
30  8  ttcctctg   435     1666   724710          30335
31  8  cccagccc   354     1948   689592          30689
32  8  tggtgatg   398     1675   666650          31087
33  8  tggctctg   358     1767   632586          31445
34  8  ctgccttc   396     1557   616572          31841
35  8  ctccagcc   294     2378   699132          32135
36  8  tgtggctg   304     1930   586720          32439
37  8  cagaggag   302     1845   557190          32741
38  8  cagctccc   275     1914   526350          33016
39  8  ctgcctcc   262     1977   517974          33278
40  8  tctgctgc   267     1912   510504          33545
41  8  ctgcttcc   280     1777   497560          33825
42  8  cttctccc   291     1663   483933          34116
43  8  cctcagcc   232     1863   432216          34348
44  8  ctccttcc   236     1762   415832          34584
45  8  cagcaggc   217     1868   405356          34801
46  8  ctgcctct   251     1575   395325          35052
47  8  ctccacct   215     1706   366790          35267
48  8  ctcctccc   205     1701   348705          35472
49  8  cttcccca   224     1537   344288          35696
50  8  cttcagcc   203     1650   334950          35899
51  8  ctctgcca   201     1628   327228          36100
52  8  ctgggaga   192     1606   308352          36292
53  8  cttctgcc   195     1533   298935          36487
54  8  cagcaggt   170     1711   290870          36657
55  8  tctggagc   206     1328   273568          36863
56  8  tcctgctc   159     1864   296376          37022
57  8  ctggggcc   159     1659   263781          37181
58  8  ctcctgcc   155     1733   268615          37336
59  8  ctgggcaa   185     1374   254190          37521
60  8  ctggggct   149     1819   271031          37670
61  8  tggtggcc   145     1731   250995          37815
62  8  ccagggca   147     1613   237111          37962
63  8  ctgctccc   146     1582   230972          38108
64  8  tgggcagc   135     1821   245835          38243
65  8  ctccatcc   161     1389   223629          38404
66  8  ctgcccca   143     1498   214214          38547
67  8  ttcctggc   155     1351   209405          38702
68  8  atggctgc   157     1285   201745          38859
69  8  tggtggaa   155     1263   195765          39014
70  8  tgctgtcc   135     1424   192240          39149
71  8  ccagccgc   159     1203   191277          39308
72  8  catccagc   122     1590   193980          39430
73  8  tcctctcc   118     1545   182310          39548
74  8  agctggga   121     1398   169158          39669
75  8  ctggtctc   128     1151   147328          39797
76  8  ttcccagt   142     1023   145266          39939
77  8  caggcagc   108     1819   196452          40047
78  8  tcctcagc   105     1654   173670          40152
79  8  ctggctcc   103     1607   165521          40255
80  9  tcctcttct  127     1006   127762          40382
81  8  tccagtgt   123     968    119064          40505
qPCR for Human Genes
Use of the Probe library is coupled to the use of a real-time PCR design software which can:
- recognise an input sequence via a unique identifier or by registering a submitted nucleic acid sequence
- identify all probes which can target the nucleic acid
- sort probes according to target sequence selection criteria such as proximity to the 3' end or proximity to intron-exon boundaries
- if possible, design PCR primers that flank probes targeting the nucleic acid sequence according to PCR design rules
- suggest available real-time PCR assays based on above procedures.
The design of an efficient and reliable qPCR assay for a human gene is carried out via the software found on www.probelibrary.com. The ProbeFinder software designs optimal qPCR probes and primers quickly and reliably for a given human gene.
The design comprises the following steps:
1) Determination of the intron positions
Noise from chromosomal DNA is eliminated by selecting intron-spanning qPCRs. Introns are determined by a BLAST search against the human genome. Regions found in the genomic DNA but not in the transcript are considered to be introns.
2) Match of the Probe Library to the gene
Virtually all human transcripts are covered by at least one of the 90 probes. The high coverage is made possible by LNA modifications of the recognition sequence tags.
3) Design of primers and selection of optimal qPCR assay
Primers are designed with 'Primer3' (Whitehead Inst. for Biomedical Research, S. Rozen and H.J. Skaletsky). Finally, the probes are ranked according to selected rules ensuring the best possible qPCR. The rules favour intron-spanning amplicons to remove false signals from DNA contamination, amplicons that will not amplify off-target genomic sequence or other transcripts as found by an in silico PCR search, small amplicon size for reproducible and comparable assays, and a GC content optimized for PCR.
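Two of the ranking rules in step 3 (prefer intron-spanning amplicons, then prefer small amplicons) can be illustrated with a minimal Python sketch. The `Assay` class, the coordinate convention and the sort order are assumptions made for illustration only; they are not taken from ProbeFinder, which also applies in silico PCR and GC-content rules not modelled here.

```python
from dataclasses import dataclass


@dataclass
class Assay:
    """Hypothetical candidate assay in transcript coordinates (0-based, end exclusive)."""
    probe: str
    start: int
    end: int


def spans_intron(assay, junctions):
    """True if any exon-exon junction lies strictly inside the amplicon."""
    return any(assay.start < j < assay.end for j in junctions)


def rank_assays(assays, junctions):
    """Intron-spanning assays first, then shorter amplicons first (assumed ordering)."""
    return sorted(
        assays,
        key=lambda a: (not spans_intron(a, junctions), a.end - a.start),
    )


candidates = [
    Assay("probe_a", 0, 80),     # inside one exon
    Assay("probe_b", 90, 160),   # spans the junction at 100, 70 nt amplicon
    Assay("probe_c", 95, 200),   # spans the junction at 100, 105 nt amplicon
]
ranked = rank_assays(candidates, junctions=[100])
```

In this sketch a junction is the transcript position where one exon ends and the next begins, so an amplicon containing it would be interrupted by an intron on genomic DNA.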
Preparation of ENA monomers and oligomers
ENA-T monomers are prepared and used for the preparation of dual labelled probes of the invention.
In the following sequences the X denotes a 2'-O,4'-C-ethylene-5-methyluridine (ENA-T). The synthesis of this monomer is described in WO 00/47599. The reaction conditions for incorporation of a 5'-O-Dimethoxytrityl-2'-O,4'-C-ethylene-5-methyluridine-3'-O-(2-cyanoethyl-N,N-diisopropyl)phosphoramidite correspond to the reaction conditions for the preparation of LNA oligomers as described in EXAMPLE 6.
The following three dual labelled probes are prepared:
EQ#    Sequence                       MW (Calc.)  MW (Found)
16533  5'-Fitc-ctGmCXmCmCAg-EQL-3'    4002 Da     4001 Da
16534  5'-Fitc-cXGmCXmCmCA-EQL-3'     3715 Da     3716 Da
16535  5'-Fitc-tGGmCGAXXX-EQL-3'      4128 Da     4130 Da
X designates ENA-T monomer. Small letters designate DNA monomers (a, g, c, t). Fitc = Fluorescein; EQL = Eclipse quencher; Dabcyl = Dabcyl quencher; MW = Molecular weight. Capital letters other than 'X' designate methyloxy LNA nucleotides.
Protocol for dual label probe assays
Reagents for the real-time dual label probe PCRs were mixed according to the following scheme (Table 9):
Table 9
Reagents                     Final concentration
GeneAmp 10x PCR buffer II    1x
Mg2+                         5.5 mM
dATP, dGTP, dCTP             0.2 mM
dUTP                         0.6 mM
17302 Q4 Dual Label Probe    0.1 µM
15319 Oligo Template         4 pM
15321 Forward primer         0.2 µM
15322 Reverse primer         0.2 µM
Uracil DNA Glycosylase       0.5 U
AmpliTaq Gold                2.5 U
Total                        50 µL
The following primers, probes, and Oligo Templates in Table 10 were included in the above-mentioned PCR mix from Table 9:
Table 10
Name                       Sequence                                 Quencher
15321 Forward Primer       gactcacggtcgcacca (SEQ ID NO: 47)        -
15322 Reverse Primer       ccgcgttccacggtta (SEQ ID NO: 48)         -
17302 Q4 Dual Label Probe  5' 6-Fitc-tTmCmCTmCTG#Q4z 3'             Q4
15319 Oligo Template       attgactcacggtcgcaccaaattcctctgccttcctgctctgctgggagaaggaggtggtgatgtggctggaaggaggcagctccaggagaaaataaccgtggaacgcggtcat (SEQ ID NO: 49)  -
LNA nucleotides are in capital letters;
6-Fitc: Fluorescein 6-isothiocyanate;
#Q4: 1,4-Bis(2-hydroxyethylamino)-6-methylanthraquinone, cf. Example 21 which also shows preparation of a 2-cyanoethyl protected phosphoramidite version of this molecule for use in the general method in Example 6, i.e. of 1-(4-(2-(2-cyanoethoxy(diisopropylamino) phosphinoxy)ethyl)phenylamino)-4-(4-(2-(4,4'-dimethoxy-trityloxy)ethyl)phenylamino)-6(7)-methyl-anthraquinone;
z: 2'-deoxy-5-nitroindole-ribofuranosyl;
mC: 5-methylcytosine.
The 17302 Q4 dual label probe is prepared as generally described in Example 6.
Assays were performed in a DNA Engine Opticon (MJ Research) using the following PCR cycle protocol (Table 11):
Table 11
37 °C for 10 minutes
95 °C for 7 minutes
40 cycles of:
  94 °C for 20 seconds
  60 °C for 1 minute
  Fluorescence detection
Results from the real-time PCR are illustrated in Fig. 18, which shows that the dual labelled probe with the quencher Q4 is fully functional as a real-time PCR probe.
Dual labelled probe functionality in real time PCR
Protocol for dual label probe assays
Reagents for the real-time dual label probe PCRs were mixed according to the following scheme (Table 12):
Table 12
Reagents                     Final concentration
GeneAmp 10x PCR buffer II    1x
Mg2+                         5.5 mM
dATP, dGTP, dCTP             0.2 mM
dUTP                         0.6 mM
15305 Q1 Dual Label Probe    0.1 µM
15319 Oligo Template         4 pM
15321 Forward primer         0.2 µM
15322 Reverse primer         0.2 µM
Uracil DNA Glycosylase       0.5 U
AmpliTaq Gold                2.5 U
Total                        50 µL
The following primers, probes, and Oligo Templates in Table 13 were included in the above-mentioned PCR mix from Table 12.
Table 13
Name                       Sequence                                 Quencher
15321 Forward Primer       gactcacggtcgcacca (SEQ ID NO: 47)        -
15322 Reverse Primer       ccgcgttccacggtta (SEQ ID NO: 48)         -
15305 Q1 Dual Label Probe  5' 6-Fitc-tTmCmCTmCTG#Q1z 3'             Q1
15319 Oligo Template       attgactcacggtcgcaccaaattcctctgccttcctgctctgctgggagaaggaggtggtgatgtggctggaaggaggcagctccaggagaaaataaccgtggaacgcggtcat (SEQ ID NO: 49)  -
LNA nucleotides are in capital letters;
6-Fitc: Fluorescein 6-isothiocyanate;
#Q1: 1,4-Bis(3-hydroxypropylamino)-anthraquinone, cf. Example 20 which also shows preparation of a 2-cyanoethyl protected phosphoramidite version of this molecule (1-(3-(2-cyanoethoxy(diisopropylamino)phosphinoxy)propylamino)-4-(3-(4,4'-dimethoxy-trityloxy)propylamino)-anthraquinone) for use in the general method in Example 6;
z: 2'-deoxy-5-nitroindole-ribofuranosyl;
mC: 5-methylcytosine.
The 15305 Q1 dual label probe is prepared as described in Example 6.
Assays were performed in a DNA Engine Opticon (MJ Research) using the following PCR cycle protocol (Table 14):
Table 14
37 °C for 10 minutes
95 °C for 7 minutes
40 cycles of:
  94 °C for 20 seconds
  60 °C for 1 minute
  Fluorescence detection
Results from the real-time PCR are illustrated in Figure 19, which shows that the dual labelled probe with a 3'-nitroindole is fully functional as a real-time PCR probe.
Preparation of 1-(3-(2-cyanoethoxy(diisopropylamino)phosphinoxy)propylamino)-4-(3-(4,4'-dimethoxy-trityloxy)propylamino)-anthraquinone (3)
[Reaction scheme for compounds (1), (2) and (3)]
1,4-Bis(3-hydroxypropylamino)-anthraquinone (1)
Leucoquinizarin (9.9 g; 0.04 mol) is mixed with 3-amino-1-propanol (10 mL) and ethanol (200 mL) and heated to reflux for 6 hours. The mixture is cooled to room temperature and stirred overnight under atmospheric conditions. The mixture is poured into water (500 mL) and the precipitate is filtered off, washed with water (200 mL) and dried. The solid is boiled in ethyl acetate (300 mL), cooled to room temperature and the solid is collected by filtration.
Yield: 8.2 g (56%)
1-(3-(4,4'-dimethoxy-trityloxy)propylamino)-4-(3-hydroxypropylamino)-anthraquinone (2)
1,4-Bis(3-hydroxypropylamino)-anthraquinone (7.08 g; 0.02 mol) is dissolved in a mixture of dry N,N-dimethylformamide (150 mL) and dry pyridine (50 mL). Dimethoxytritylchloride (3.4 g; 0.01 mol) is added and the mixture is stirred for 2 hours. Additional dimethoxytritylchloride (3.4 g; 0.01 mol) is added and the mixture is stirred for 3 hours. The mixture is concentrated under vacuum and the residue is redissolved in dichloromethane (400 mL), washed with water (2 x 200 mL) and dried (Na2SO4). The solution is filtered through a silica gel pad (ø 10 cm; h 10 cm) and eluted with dichloromethane until the mono-DMT-anthraquinone product begins to elute, whereafter the solvent is changed to 2% methanol in dichloromethane. The pure fractions are combined and concentrated, resulting in a blue foam.
Yield: 7.1 g (54%)
1H-NMR (CDCl3): 10.8 (2H, 2xt, J= 5.3 Hz, NH), 8.31 (2H, m, AqH), 7.67 (2H, dt, J= 3.8 and 9.4, AqH), 7.4-7.1 (9H, m, ArH + AqH), 6.76 (4H, m, ArH), 3.86 (2H, q, J= 5.5 Hz, CH2OH), 3.71 (6H, s, CH3), 3.54 (4H, m, NCH2), 3.26 (2H, t, J= 5.7 Hz, CH2ODMT), 2.05 (4H, m, CCH2C), 1.74 (1H, t, J= 5 Hz, OH).
1-(3-(2-cyanoethoxy(diisopropylamino)phosphinoxy)propylamino)-4-(3-(4,4'-dimethoxy-trityloxy)propylamino)-anthraquinone (3)
1-(3-(4,4'-dimethoxy-trityloxy)propylamino)-4-(3-hydroxypropylamino)-anthraquinone (0.66 g; 1.0 mmol) is dissolved in dry dichloromethane (100 mL) and 3 Å molecular sieves are added. The mixture is stirred for 3 hours, and then 2-cyanoethyl-N,N,N',N'-tetraisopropylphosphordiamidite (335 mg; 1.1 mmol) and 4,5-dicyanoimidazole (105 mg; 0.9 mmol) are added. The mixture is stirred for 5 hours, then sat. NaHCO3 (50 mL) is added and the mixture is stirred for 10 minutes. The phases are separated and the organic phase is washed with sat. NaHCO3 (50 mL) and brine (50 mL) and dried (Na2SO4). After concentration the phosphoramidite is obtained as a blue foam and is used in oligonucleotide synthesis without further purification.
Yield: 705 mg (82%)
31P-NMR (CDCl3): 150.0
1H-NMR (CDCl3): 10.8 (2H, 2xt, J= 5.3 Hz, NH), 8.32 (2H, m, AqH), 7.67 (2H, m, AqH), 7.5-7.1 (9H, m, ArH + AqH), 6.77 (4H, m, ArH), 3.9-3.75 (4H, m), 3.71 (6H, s, OCH3), 3.64-3.52 (6H, m), 3.26 (2H, t, J= 5.8 Hz, CH2ODMT), 2.63 (2H, t, J= 6.4 Hz, CH2CN), 2.05 (4H, m, CCH2C), 1.18 (12H, dd, J= 3.1 Hz, CCH3).
Preparation of 1-(4-(2-(2-cyanoethoxy(diisopropylamino)phosphinoxy)ethyl)phenylamino)-4-(4-(2-(4,4'-dimethoxy-trityloxy)ethyl)phenylamino)-6(7)-methyl-anthraquinone (13)
[Reaction scheme for compounds (10), (11), (12) and (13)]
6-methyl-Quinizarin (10)
4-Methyl-phthalic anhydride (10 g, 62 mmol), p-chlorophenol (3.6 g, 28 mmol) and boric acid (1.6 g) were dissolved in concentrated H2SO4 (34 mL) and the mixture was stirred at 200 °C for 6 hours in a flask covered with a glass plate. After completion of the reaction, the mixture was allowed to cool and then poured into water (160 mL) and the precipitate collected by filtration. The solid was suspended in boiling water (320 mL) and boiled for 5 min, whereupon the solid was collected by filtration. The product was obtained as a dark red solid (5 g, 19.7 mmol) after drying. MALDI-MS: m/z 255.7 (M+H).
1,4-Bis(4-(2-hydroxyethyl)phenylamino)-6-methyl-anthraquinone (11)
6-Methyl-quinizarin (10, 2.5 g) is suspended in acetic acid (30 mL), Zn dust (2 g) is added and the mixture is stirred at 90 °C for 1 h. The mixture is then filtered through a pad of celite and cooled to room temperature, water (90 mL) is added, and the reduced anthraquinone derivative is collected by filtration. The solid is then mixed with boric acid (1.9 g; 0.03 mol) and ethanol (100 mL) and refluxed for 1 hour. The mixture is cooled to room temperature and 4-aminophenethyl alcohol (4.1 g; 0.03 mol) is added, whereafter the mixture is heated to reflux for 3 days. The mixture is concentrated, redissolved in dichloromethane (300 mL), washed with water (3 x 100 mL), dried (Na2SO4) and concentrated. The residue is purified on a silica gel column with MeOH/dichloromethane. Yield: 1.5 g (30%).
1-(4-(2-(4,4'-dimethoxy-trityloxy)ethyl)phenylamino)-4-(4-(2-hydroxyethyl)phenylamino)-6(7)-methyl-anthraquinone (12)
1,4-Bis(4-(2-hydroxyethyl)phenylamino)-6-methyl-anthraquinone (0.95 g; 1.9 mmol) is dissolved in dry pyridine (30 mL). Dimethoxytritylchloride (0.34 g; 1 mmol) is added and the mixture is stirred for 2 hours. Additional dimethoxytritylchloride (0.34 g; 1 mmol) is added and the mixture is stirred for 4 hours. The mixture is concentrated under vacuum and the residue is redissolved in dichloromethane (200 mL), washed with water (2 x 100 mL) and dried (Na2SO4). The product is purified by column chromatography (toluene/EtOAc). Yield: 0.81 g (54%).
1-(4-(2-(2-cyanoethoxy(diisopropylamino)phosphinoxy)ethyl)phenylamino)-4-(4-(2-(4,4'-dimethoxy-trityloxy)ethyl)phenylamino)-6(7)-methyl-anthraquinone (13)
1-(4-(2-(4,4'-dimethoxy-trityloxy)ethyl)phenylamino)-4-(4-(2-hydroxyethyl)phenylamino)-6(7)-methyl-anthraquinone (0.50 g; 0.63 mmol) is dissolved in dry dichloromethane (50 mL) and 3 Å molecular sieves are added. The mixture is stirred for 3 hours, and then 2-cyanoethyl-N,N,N',N'-tetraisopropylphosphordiamidite (215 mg; 0.72 mmol) and 4,5-dicyanoimidazole (64 mg; 0.55 mmol) are added. The mixture is stirred for 4 hours, then sat. NaHCO3 (25 mL) is added and the mixture is stirred for 10 minutes. The phases are separated and the organic phase is washed with sat. NaHCO3 (25 mL) and brine (25 mL) and dried (Na2SO4). The phosphoramidite is then evaporated to dryness and used in oligonucleotide synthesis without further purification. Yield: 0.59 g (94%).
SNP Detection Using a Library of Probes
Single nucleotide polymorphisms (SNPs) are the most common type of genetic variant in the human and other genomes. Detection of SNPs using dual labelled probes can be done by simultaneously using two differently labelled probes, each of which hybridizes specifically to one SNP allele. The result of the real-time PCR will hence indicate the presence of one, the other, or both alleles in the sample. Either genomic DNA or RNA can be used as sample.
SNPs occur almost randomly, and almost any sequence context is expected to exist in many permutations as a result of SNPs; currently over 2 million SNPs are known. Hence, to have all relevant probes in stock for supplying or generating SNP detection assays, millions of probes would be needed.
Relevant for the present invention, due to the short probes enabled by the use of LNA, this number can be reduced by using LNA-containing 8- or 9-mer probes.
Theoretically, 4^9 = 262144 possible 9-mers and 4^8 = 65536 possible 8-mers exist and would be necessary to cover any possible SNP sequence. A further advantage of LNA-containing oligos is an increased specificity, allowing the SNP to be placed at any position in the probe.
Hence, each probe can cover 9 different SNP positions, which would reduce the need for 8-mer sequences from 65536 to 65536/9 = 7281. Detection can also occur on both strands, hence only 7281/2 = 3640 probes are needed.
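The probe-count arithmetic above can be reproduced with a short Python sketch. The function name and its parameters are illustrative assumptions; the divisor of 9 SNP positions per probe and the factor of 2 for the two strands are taken from the text as stated, and integer division is used to match the rounded figures given.

```python
def probe_counts(k, snp_positions_per_probe, strands=2):
    """Return (all k-mers, k-mers after sharing SNP positions, after using both strands)."""
    all_kmers = 4 ** k                                # four bases at each of k positions
    per_position = all_kmers // snp_positions_per_probe
    per_strand = per_position // strands
    return all_kmers, per_position, per_strand


nine_mers = 4 ** 9
eight_mer_counts = probe_counts(8, snp_positions_per_probe=9)
```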
SNP discrimination example - demonstrating single mismatch discrimination by dual labelled probe in real time PCR.
Protocol for dual label probe assays
Reagents for the real-time dual label probe PCRs were mixed according to the following scheme (Table 15):
Table 15
Reagents                          Final concentration
GeneAmp 10x PCR buffer II         1x
Mg2+                              5.5 mM
dATP, dGTP, dCTP                  0.2 mM
dUTP                              0.6 mM
13996 Dual Label Probe            0.1 µM
Oligo Template (14229 or 14226)   40 fM
14117 Forward primer              0.2 µM
14118 Reverse primer              0.2 µM
Uracil DNA Glycosylase            0.5 U
AmpliTaq Gold                     2.5 U
Total                             50 µL
The following primers, probes, and Oligo Templates (Table 16) were included in the above-mentioned PCR mix (Table 15).
Table 16
Name                                  Sequence
14117 Forward Primer                  cagctaaaaatgatgacaataatgg
14118 Reverse Primer                  attacatcatgattagggaatgc
13996 Dual Label Probe                5' 6-Fitc-ctGGAGmCaG-EQL 3'
14229 Single Mismatch Oligo Template  cagctaaaaatgatgacaataatgggctaacggagaagcgggagcagatcggcattccctaatcatgatgtaat
14226 Perfect Match Oligo Template    cagctaaaaatgatgacaataatgggctaaaggagaagctggagcagatcggcattccctaatcatgatgtaat
LNA nucleotides are in capital letters; 6-Fitc: Fluorescein 6-isothiocyanate; EQL: Eclipse™ Dark Quencher (Epoch Biosciences); mC: 5-methylcytosine.
Assays were performed in a DNA Engine Opticon (MJ Research) using the following PCR cycle protocol (Table 17):
Table 17
37 °C for 10 minutes
95 °C for 7 minutes
40 cycles of:
  94 °C for 20 seconds
  60 °C for 1 minute
  Fluorescence detection
Results from the real-time PCR are illustrated in Figure 20, which shows that the dual labelled probe is able to discriminate between a perfectly matching target and a target having a single mismatch relative to the probe.
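The perfect-match and single-mismatch relationship between probe 13996 and the two oligo templates of Table 16 can be checked in silico. The following Python sketch is an illustration only: it strips the LNA and 5-methylcytosine annotations from the probe, compares it directly with the listed template strand, and counts mismatches at the best ungapped position. The function name is hypothetical, and the sketch says nothing about hybridization thermodynamics.

```python
def best_match(probe, template):
    """Return (mismatches, offset) for the best ungapped placement of probe on template."""
    probe, template = probe.lower(), template.lower()
    n = len(probe)
    return min(
        (sum(p != t for p, t in zip(probe, template[i:i + n])), i)
        for i in range(len(template) - n + 1)
    )


# Probe 13996, 5' 6-Fitc-ctGGAGmCaG-EQL 3', reduced to plain bases.
PROBE_13996 = "ctggagcag"

# Oligo templates copied from Table 16.
TEMPLATE_14226_PERFECT = (
    "cagctaaaaatgatgacaataatgggctaaaggagaa"
    "gctggagcagatcggcattccctaatcatgatgtaat"
)
TEMPLATE_14229_MISMATCH = (
    "cagctaaaaatgatgacaataatgggctaacggagaa"
    "gcgggagcagatcggcattccctaatcatgatgtaat"
)

perfect_mismatches, _ = best_match(PROBE_13996, TEMPLATE_14226_PERFECT)
single_mismatches, _ = best_match(PROBE_13996, TEMPLATE_14229_MISMATCH)
```

The probe site reads gctggagcag in template 14226 and gcgggagcag in template 14229, so the two templates differ by one base within the 9-nucleotide recognition sequence.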
REFERENCES AND NOTES
1. Helen C. Causton, Bing Ren, Sang Seok Koh, Christopher T. Harbison, Elenita Kanin, Ezra G. Jennings, Tong Ihn Lee, Heather L. True, Eric S. Lander, and Richard A. Young (2001). Remodelling of Yeast Genome Expression in Response to Environmental Changes. Mol. Biol. Cell 12: 323-337.
2. Frank C. P. Holstege, Ezra G. Jennings, John J. Wyrick, Tong Ihn Lee, Christoph J. Hengartner, Michael R. Green, Todd R. Golub, Eric S. Lander, and Richard A. Young (1998). Dissecting the Regulatory Circuitry of a Eukaryotic Genome. Cell 95: 717-728.
3. Anton Simeonov and Theo T. Nikiforov (2002). Single nucleotide polymorphism genotyping using short, fluorescently labelled locked nucleic acid (LNA) probes and fluorescence polarization detection. Nucleic Acids Research 30 (17): e91.
Variations, modifications, and other implementations of what is described herein will occur to those skilled in the art without departing from the spirit and scope of the invention as described and claimed herein, and such variations, modifications, and implementations are encompassed within the scope of the invention.
The references, patents, patent applications, and international applications disclosed above are incorporated by reference herein in their entireties.
94 °C for 20 seconds      94 °C for 30 seconds
60 °C for 1 minute        52 °C for 1 minute*
Fluorescence detection    Fluorescence detection
                          72 °C for 30 seconds
* For the Beacon-570 with 9-mer recognition site the annealing temperature was reduced to
The composition of the PCR reactions shown in Table 6 together with the PCR cycle protocols listed in Table 7 will be referred to as standard 5' nuclease assay or standard Beacon assay conditions.
Specificity of 9-mer 5' nuclease assay probes
The specificity of the 5' nuclease assay probes was demonstrated in assays where each of the probes was added to 3 different PCR reactions, each generating a different SSA4 PCR amplicon. As shown in Fig. 6, each probe only produces a fluorescent signal together with the amplicon it was designed to detect (see also Figs. 10, 11 and 12). Importantly, the different probes had very similar cycle threshold (Ct) values (from 23.2 to 23.7), showing that the assays and probes have very similar efficiencies. Furthermore, it indicates that the assays should detect similar expression levels when used in real expression assays. This is an important finding, because variability in performance of different probes is undesirable.
Specificity of 9 and 10-mer Molecular Beacon probes
The ability to detect newly generated PCR amplicons in real time was also demonstrated for the molecular beacon design concept. The Molecular Beacon designed against the 469 amplicon with a 10-mer recognition sequence produced a clear signal when the SSA4 cDNA template and primers for generating the 469 amplicon were present in the PCR, Fig. 7A. The observed Ct value was 24.0, very similar to the ones obtained with the 5' nuclease assay probes, again indicating a very similar sensitivity of the different probes. No signal was produced when the SSA4 template was not added. A similar result was produced by the Molecular Beacon designed against the 570 amplicon with a 9-mer recognition sequence, Fig. 7B.
EXAMPLE 11.
Specificity of 9-mer SYBR-probes.
The ability to detect newly generated PCR amplicons was also demonstrated for the SYBR-probe design concept. The 9-mer SYBR-probe designed against the 570 amplicon of the SSA4 cDNA produced a clear signal when the SSA4 cDNA template and primers for generating the 570 amplicon were present in the PCR, Fig. 8. No signal was produced when the SSA4 template was not added.
Quantification of transcript copy number
The ability to detect different levels of gene transcripts is an essential requirement for a probe to perform in a true expression assay. The fulfilment of this requirement was shown by the three 5' nuclease assay probes in an assay where different levels of the expression vector derived SSA4 cDNA were added to different PCR reactions together with one of the 5' nuclease assay probes (Fig. 9). Composition and cycle conditions were according to standard 5' nuclease assay conditions.
The cDNA copy number in the PCR before start of cycling is reflected in the cycle threshold value Ct, i.e., the cycle number at which signal is first detected. Signal is here only defined as signal if fluorescence is five times above the standard deviation of the fluorescence detected in PCR cycles 3 to 10. The results show an overall good correlation between the logarithm of the initial cDNA copy number and the Ct value (Fig. 9). The correlation appears as a straight line with slope between -3.456 and -3.499 depending on the probe, and correlation coefficients between 0.9981 and 0.9999. The slope of the curves reflects the efficiency of the PCRs, with 100% efficiency corresponding to a slope of -3.322, assuming a doubling of amplicon in each PCR cycle. The slopes of the present PCRs indicate PCR efficiencies between 94% and 100%. The correlation coefficients and the PCR efficiencies are as high as or higher than the values obtained with DNA 5' nuclease assay probes 17 to 26 nucleotides long in detection assays of the same SSA4 cDNA levels (results not shown). Therefore these results show that the three 9-mer 5' nuclease assay probes meet the requirements for true expression probes, indicating that the probes should perform in expression profiling assays.
Detection of SSA4 transcription levels in yeast
Expression levels of the SSA4 transcript were detected in different yeast strains grown at different culture conditions (with or without heat shock). A standard laboratory strain of Saccharomyces cerevisiae was used as wild type yeast in the experiments described here. An SSA4 knockout mutant was obtained from EUROSCARF (accession number Y06101). This strain is here referred to as the SSA4 mutant. Both yeast strains were grown in YPD medium at 30 °C to an OD600 of 0.8 A. Yeast cultures that were to be heat shocked were transferred to 40 °C for 30 minutes, after which the cells were harvested by centrifugation and the pellet frozen at -80 °C. Non-heat shocked cells were in the meantime left growing at 30 °C for 30 minutes and then harvested as above.
RNA was isolated from the harvested yeast using the FastRNA Kit (Bio 101) and the FastPrep machine according to the supplier's instructions.
Reverse transcription was performed with 5 µg of anchored oligo(dT) primer to prime the reaction on 1 µg of total RNA, and 0.2 U of the reverse transcriptase Superscript II RT (Invitrogen) according to the supplier's instructions, except that 20 U Superase-In (RNase inhibitor, Ambion) was added. After two hours of incubation, enzyme inactivation was performed at 70 °C for 5 minutes. The cDNA reactions were diluted 5 times in 10 mM Tris buffer pH 8.5, and oligonucleotides and enzymes were removed by purification on a MicroSpin™ S-column (Amersham Pharmacia Biotech). Prior to performing the expression assay the cDNA was diluted 20 times. The expression assay was performed with the Dual-labelled-570 probe using standard 5' nuclease assay conditions, except that 2 µL of template was added.
The template was a 100 times dilution of the original reverse transcription reactions. The four different cDNA templates used were derived from wild type or mutant with or without heat shock. The assay produced the expected results (Fig. 10), showing increased levels of the SSA4 transcript in heat shocked wild type yeast (Ct = 26.1) compared to the wild type yeast that was not submitted to elevated temperature (Ct = 30.3). No transcripts were detected in the mutant yeast irrespective of culture conditions. The difference in Ct values of 3.5 corresponds to a 17 fold induction in the expression level of the heat shocked versus the non-heat shocked wild type yeast, and this value is close to the values around 19 reported in the literature (Causton, et al. 2001). These values were obtained by using the standard curve obtained for the Dual-labelled-570 probe in the quantification experiments with known amounts of the SSA4 transcript (see Fig. 9). The experiments demonstrate that the 9-mer probes are capable of detecting expression levels that are in good accordance with published results.
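The way a standard-curve slope translates into PCR efficiency, and a Ct difference into a fold change, can be written out as a minimal Python sketch. It assumes the usual log-linear standard curve, Ct = slope x log10(copies) + intercept; the function names are illustrative, and the intercept of the actual standard curve in Fig. 9 is not needed because it cancels in a ratio.

```python
def pcr_efficiency(slope):
    """Fractional efficiency from a standard-curve slope (Ct versus log10 copies).

    A slope of about -3.322 gives about 1.0, i.e. doubling in every cycle.
    """
    return 10 ** (-1.0 / slope) - 1.0


def fold_change(ct_reference, ct_sample, slope):
    """Copy-number ratio sample/reference implied by two Ct values on one standard curve."""
    return 10 ** ((ct_sample - ct_reference) / slope)


# Slopes reported for the three probes span this range.
efficiencies = [pcr_efficiency(s) for s in (-3.456, -3.499)]

# Ct values quoted for heat shocked (26.1) versus non-heat shocked (30.3) wild type yeast.
induction = fold_change(ct_reference=30.3, ct_sample=26.1, slope=-3.456)
```

A lower Ct in the sample than in the reference yields a ratio above 1, which is the direction of the induction reported for the heat shocked wild type yeast.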
Multiple transcript detection with individual 9-mer probes
To demonstrate the ability of the three 5' nuclease assay probes to detect expression levels of other genes as well, three different yeast genes were selected in which one of the probe sequences was present. Primers were designed to amplify a 60-100 base pair region around the probe sequence. The three selected yeast genes and the corresponding primers are shown in Table 8.
Table 8. Design of alternative expression assays
Sequence/Name  Matching Probe     Forward primer sequence                     Reverse primer sequence                   Amplicon length
YEL055C/POL5   Dual-labelled-469  gcgagagaaaacaagcaagg (SEQ ID NO: 26)        attcgtcttcactggcatca (SEQ ID NO: 27)      94 bp
YDL149W_APG9   Dual-labelled-570  cagctaaaaatgatgacaataatgg (SEQ ID NO: 28)   attacatcatgattagggaatgc (SEQ ID NO: 29)   97 bp
YPL240C_HSP82  Dual-labelled-671  gggtttgaacattgatgagga (SEQ ID NO: 30)       ggtgtcagctggaacctctt (SEQ ID NO: 31)      88 bp
Total cDNA derived from non-heat shocked wild type yeast was used as template for the expression assay, which was performed using standard 5' nuclease assay conditions, except that 2 µL of template was added. As shown in Fig. 11, all three probes could detect expression of the genes according to the assay design outlined in Table 8. Expression was not detected with any other combination of probe and primers than the ones outlined in Table 8. Expression data are available in the literature for SSA4, POL5, HSP82, and APG9 (Holstege, et al. 1998). For non-heat shocked yeast, these data describe similar expression levels for SSA4 (0.8 transcript copies per cell), POL5 (0.8 transcript copies per cell) and HSP82 (1.3 transcript copies per cell), whereas APG9 transcript levels are somewhat lower (0.1 transcript copies per cell).
These data are in good correspondence with the results obtained here, since all these genes showed similar Ct values except HSP82, which had a Ct value of 25.6. This suggests that the HSP82 transcript was more abundant in the strain used in these experiments than what is indicated by the literature. Agarose gel electrophoresis was performed with the PCRs shown in Fig. 11a for the Dual-labelled-469 probe. The agarose gel (Fig. 12) shows that PCR product was indeed generated in reactions where no signal was obtained, and therefore the lack of fluorescent signal from these reactions was not caused by failure of the PCR.
Furthermore, the different lengths of amplicons produced in expression assays for different genes indicate that the signals produced in expression assays for different genes are indeed specific for the gene in question.
Selection of targets
Using the EnsMart software release 16.1 from http://www.ensembl.org/EnsMart, the 50 bases from each end of all exons from the Homo sapiens NCBI 33 dbSNP115 Ensembl Genes were extracted to form a Human Exon50 target set. Using the GetCover program (cf. Fig. 17), the occurrence of all probe target sequences was calculated, and probe target sequences not passing selection criteria such as excess self-complementarity, excessive GC content etc. were eliminated. Among the remaining sequences, the most abundant probe target sequence was selected (No. 1, covering 3200 targets), and subsequently all the probe targets having a prevalence above 0.8 times the prevalence of the most abundant (3200 x 0.8), i.e. above 2560 targets. From the remaining sample, the number of new hits for each probe was computed, and the product of the number of new hits per probe target compared to the existing selection and the total prevalence of the same probe target was computed and used to select the next most abundant probe target sequence by selecting the highest product number. The probe target length (n), sequence (nmer) and occurrence in the total target (cover), as well as the number of new hits per probe target selection (Newhit), the product of Newhit and cover (newhit x cover) and the number of accumulated hits in the target population from all accumulated probes (sum) are exemplified in the table below.
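The iterative step of this selection (pick the probe target with the highest product of new hits and total cover, then update the covered targets) can be sketched in Python as follows. This is an illustrative reimplementation and not the GetCover program: the function name, the input format and the toy data are assumptions, and the preliminary filtering and the 0.8 x prevalence pre-selection step are omitted.

```python
def greedy_select(occurrences, max_probes):
    """Greedy probe-target selection by (new hits x cover).

    occurrences maps each probe target sequence to the set of target ids containing it.
    Returns rows of (nmer, newhit, cover, accumulated_sum), in selection order.
    """
    covered = set()
    remaining = dict(occurrences)
    rows = []
    while remaining and len(rows) < max_probes:
        # Score every remaining candidate against the targets covered so far.
        best = max(
            remaining,
            key=lambda seq: len(remaining[seq] - covered) * len(remaining[seq]),
        )
        newhit = len(remaining[best] - covered)
        if newhit == 0:
            break  # no candidate adds coverage any more
        cover = len(remaining[best])
        covered |= remaining.pop(best)
        rows.append((best, newhit, cover, len(covered)))
    return rows


# Toy data: four probe targets and the ids of the exon-end targets they occur in.
toy = {
    "ctcctcct": {1, 2, 3, 4},
    "ctggagga": {3, 4, 5},
    "aggagctg": {5, 6},
    "cagcctgg": {1, 2},
}
selection = greedy_select(toy, max_probes=10)
```

In the toy data the second pick is aggagctg rather than ctggagga: it occurs in fewer targets overall, but it adds two new targets against one, giving the higher product.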
No n nmer Newhit Cover newhit x cover sum 1 8 ctcctcct 3200 3200 10240000 3200 2 8 ctggagga 2587 3056 7905872 5787 3 8 aggagctg 2132 3074 6553768 7919 4 8 cagcctgg 2062 2812 5798344 9981 8 cagcagcc 1774 2809 4983166 11755 6 8 tgctggag 1473 2864 4218672 13228 7 8 agctggag 1293 2863 3701859 14521 8 8 ctgctgcc 1277 2608 3330416 15798 9 8 aggagcag 1179 2636 3107844 16977 8 ccaggagg 1044 2567 2679948 18021 11 8 tcctgctg 945 2538 2398410 18966 12 8 cttcctcc 894 2477 2214438 19860 13 8 ccgccgcc 1017 2003 2037051 20877 14 8 cctggagc 781 2439 1904859 21658 8 cagcctcc 794 2325 1846050 22452 16 8 tggctgtg 805 2122 1708210 23257 17 8 cctggaga 692 2306 1595752 23949 18 8 ccagccag 661 2205 1457505 24610 19 8 ccagggcc 578 2318 1339804 25188 8 cccagcag 544 2373 1290912 25732 21 8 ccaccacc 641 1916 1228156 26373 22 8 ctcctcca 459 3010 1381590 26832 23 8 ttctcctg 534 1894 1011396 27366 24 8 cagcccag 471 2033 957543 27837 8 ctggctgc 419 2173 910487 28256 26 8 ctccacca 426 2097 893322 28682 27 8 cttcctgc 437 1972 861764 29119 28 8 cttccagc 415 1883 781445 29534 29 8 ccacctcc 366 2018 738588 29900 8 ttcctctg 435 1666 724710 30335 31 8 cccagccc 354 1948 689592 30689 32 8 tggtgatg 398 1675 666650 31087 33 8 tggctctg 358 1767 632586 31445 34 8 ctgccttc 396 1557 616572 31841 No n nmer Newhit Cover newhit x cover sum 35 8 ctccagcc 294 2378 699132 32135 36 8 tgtggctg 304 1930 586720 32439 37 8 cagaggag 302 1845 557190 32741 38 8 cagctccc 275 1914 526350 33016 39 8 ctgcctcc 262 1977 517974 33278 40 8 tctgctgc 267 1912 510504 33545 41 8 ctgcttcc 280 1777 497560 33825 42 8 cttctccc 291 1663 483933 34116 43 8 cctcagcc 232 1863 432216 34348 44 8 ctccttcc 236 1762 415832 34584 45 8 cagcaggc 217 1868 405356 34801 46 8 ctgcctct 251 1575 395325 35052 47 8 ctccacct 215 1706 366790 35267 48 8 ctcctccc 205 1701 348705 35472 49 8 cttcccca 224 1537 344288 35696 50 8 cttcagcc 203 1650 334950 35899 51 8 ctctgcca 201 1628 327228 36100 52 8 ctgggaga 192 1606 308352 36292 53 8 cttctgcc 195 1533 298935 36487 54 
8 cagcaggt 170 1711 290870 36657 55 8 tctggagc 206 1328 273568 36863 56 8 tcctgctc 159 1864 296376 37022 57 8 ctggggcc 159 1659 263781 37181 58 8 ctcctgcc 155 1733 268615 37336 59 8 ctgggcaa 185 1374 254190 37521 60 8 ctggggct 149 1819 271031 37670 61 8 tggtggcc 145 1731 250995 37815 62 8 ccagggca 147 1613 237111 37962 63 8 ctgctccc 146 1582 230972 38108 64 8 tgggcagc 135 1821 245835 38243 65 8 ctccatcc 161 1389 223629 38404 66 8 ctgcccca 143 1498 214214 38547 67 8 ttcctggc 155 1351 209405 38702 68 8 atggctgc 157 1285 201745 38859 69 8 tggtggaa 155 1263 195765 39014 70 8 tgctgtcc 135 1424 192240 39149 No n nmer Newhit Cover newhit x cover sum 71 8 ccagccgc 159 1203 191277 39308 72 8 catccagc 122 1590 193980 39430 73 8 tcctctcc 118 1545 182310 39548 74 8 agctggga 121 1398 169158 39669 75 8 ctggtctc 128 1151 147328 39797 76 8 ttcccagt 142 1023 145266 39939 77 8 caggcagc 108 1819 196452 40047 78 8 tcctcagc 105 1654 173670 40152 79 8 ctggctcc 103 1607 165521 40255 80 9 tcctcttct 127 1006 127762 40382 81 8 tccagtgt 123 968 119064 40505 qPCR for Human Genes Use of the Probe library is coupled to the use of a real-time PCR design software which can:
- recognise an input sequence via a unique identifier or by registering a submitted nucleic acid sequence
- identify all probes which can target the nucleic acid
- sort probes according to target sequence selection criteria such as proximity to the 3' end or proximity to intron-exon boundaries
- if possible, design PCR primers that flank probes targeting the nucleic acid sequence according to PCR design rules
- suggest available real-time PCR assays based on the above procedures.
The design of an efficient and reliable qPCR assay for a human gene is carried out via the software found on www.probelibrary.com. The ProbeFinder software designs optimal qPCR probes and primers quickly and reliably for a given human gene.
The design comprises the following steps:
1) Determination of the intron positions
Noise from chromosomal DNA is eliminated by selecting intron-spanning qPCRs. Introns are determined by a BLAST search against the human genome. Regions found on the DNA, but not in the transcript, are considered to be introns.
2) Match of the Probe Library to the gene
Virtually all human transcripts are covered by at least one of the 90 probes. The high coverage is made possible by LNA modifications of the recognition sequence tags.
3) Design of primers and selection of optimal qPCR assay
Primers are designed with 'Primer3' (Whitehead Inst. for Biomedical Research, S. Rozen and H.J. Skaletsky). Finally, the probes are ranked according to selected rules ensuring the best possible qPCR. The rules favour intron-spanning amplicons to remove false signals from DNA contamination, amplicons that will not amplify off-target genomic sequence or other transcripts as found by an in silico PCR search, small amplicon size for reproducible and comparable assays, and a GC content optimized for PCR.
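Step 2 above (matching the probe library to a gene) can be illustrated with a minimal Python sketch. It is not the ProbeFinder implementation: the function names, the toy transcript and the junction position are hypothetical, and ranking by distance to the nearest exon-exon junction is only a simple stand-in for the ranking rules described in step 3.

```python
# Illustrative sketch only; names and toy data are assumptions, not ProbeFinder.
COMPLEMENT = {"a": "t", "c": "g", "g": "c", "t": "a"}


def reverse_complement(seq):
    """Return the reverse complement of a lowercase DNA sequence."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.lower()))


def find_probe_sites(transcript, probes):
    """List (probe, position, strand) for every occurrence of each probe tag.

    Strand "+" means the tag occurs as written in the transcript;
    strand "-" means its reverse complement occurs there.
    """
    transcript = transcript.lower()
    hits = []
    for probe in probes:
        for strand, query in (("+", probe), ("-", reverse_complement(probe))):
            start = transcript.find(query)
            while start != -1:
                hits.append((probe, start, strand))
                start = transcript.find(query, start + 1)
    return hits


def rank_sites(hits, junctions):
    """Sort hits so that sites closest to an exon-exon junction come first."""
    return sorted(hits, key=lambda hit: min(abs(hit[1] - j) for j in junctions))


# Toy transcript (hypothetical) containing two 8-mers from the probe list.
toy_transcript = "aaattcctctgccaaatggtgatgccc"
sites = find_probe_sites(toy_transcript, ["ttcctctg", "tggtgatg"])
ranked = rank_sites(sites, junctions=[14])  # junction position is assumed
```

In a real design run the junction positions would come from the intron determination in step 1, and each ranked site would then be passed to primer design.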
Preparation of ENA monomers and oligomers
ENA-T monomers are prepared and used for the preparation of dual labelled probes of the invention.
In the following sequences the X denotes a 2'-O,4'-C-ethylene-5-methyluridine (ENA-T). The synthesis of this monomer is described in WO 00/47599. The reaction conditions for incorporation of a 5'-O-Dimethoxytrityl-2'-O,4'-C-ethylene-5-methyluridine-3'-O-(2-cyanoethyl-N,N-diisopropyl)phosphoramidite correspond to the reaction conditions for the preparation of LNA oligomers as described in EXAMPLE 6.
The following three dual labelled probes are prepared:
EQ#    Sequences                        MW (Calc.)  MW (Found)
16533  5'-Fitc-ctGmCXmCmCAg-EQL-3'      4002 Da     4001 Da
16534  5'-Fitc-cXGmCXmCmCA-EQL-3'       3715 Da     3716 Da
16535  5'-Fitc-tGGmCGAXXX-EQL-3'        4128 Da     4130 Da
X designates ENA-T monomer. Small letters designate DNA monomers (a, g, c, t). Fitc = Fluorescein; EQL = Eclipse quencher; Dabcyl = Dabcyl quencher. MW = Molecular weight. Capital letters other than 'X' designate methyloxy LNA nucleotides.
Protocol for dual label probe assays
Reagents for the Real Time dual label probe PCRs were mixed according to the following scheme (Table 9):
Table 9
Reagents                     Final Concentration
GeneAmp 10x PCR buffer II    1x
Mg2+                         5.5 mM
dATP, dGTP, dCTP             0.2 mM
dUTP                         0.6 mM
17302 Q4 Dual Label Probe    0.1 µM
15319 Oligo Template         4 pM
15321 Forward primer         0.2 µM
15322 Reverse primer         0.2 µM
Uracil DNA Glycosylase       0.5 U
AmpliTaq Gold                2.5 U
Total                        50 µL
The following primers, probes, and Oligo Templates in Table 10 were included in the above mentioned PCR mix from Table 9:
Table 10
Name                        Sequence                              Quencher
15321 Forward Primer        gactcacggtcgcacca (SEQ ID NO: 47)     -
15322 Reverse Primer        ccgcgttccacggtta (SEQ ID NO: 48)      -
17302 Q4 Dual Label Probe   5' 6-Fitc-tTmCmCTmCTG#Q4z 3'          Q4
15319 Oligo Template        attgactcacggtcgcaccaaattcctctgccttcctgctctgctgggagaaggaggtggtgatgtggctggaaggaggcagctccaggagaaaataaccgtggaacgcggtcat (SEQ ID NO: 49)     -
LNA nucleotides are in capital letters;
6-Fitc: Fluorescein 6-isothiocyanate;
#Q4: 1,4-Bis(2-hydroxyethylamino)-6-methylanthraquinone, cf. Example 21 which also shows preparation of a 2-cyanoethyl protected phosphoramidite version of this molecule for use in the general method in Example 6, i.e. of 1-(4-(2-(2-cyanoethoxy(diisopropylamino) phosphinoxy)ethyl)phenylamino)-4-(4-(2-(4,4'-dimethoxy-trityloxy)ethyl)phenylamino)-6(7)-methyl-anthraquinone;
z: 2'-deoxy-5-nitroindole-ribofuranosyl;
mC: 5-methylcytosine.
The 17302 Q4 dual label probe is prepared as generally described in Example 6.
Assays were performed in a DNA Engine Opticon (MJ Research) using the following PCR cycle protocol (Table 11):
Table 11
37 °C for 10 minutes
95 °C for 7 minutes
40 cycles of:
  94 °C for 20 seconds
  60 °C for 1 minute
  Fluorescence detection
Results from the Real Time PCR are illustrated in Fig. 18, which shows that the dual labelled probe with the quencher Q4 is fully functional as a real time PCR probe.
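As a plausibility check on the sequences in Table 10, the following Python sketch locates the forward primer, the probe recognition sequence and the reverse primer binding site on the oligo template and reports the resulting amplicon length. The function name `locate_assay` is hypothetical, and the probe is reduced to its plain base sequence (ttcctctg), ignoring the LNA, 5-methylcytosine, quencher and nitroindole modifications.

```python
# Sketch under stated assumptions; sequences are taken from Table 10,
# the helper names are illustrative only.
COMPLEMENT = {"a": "t", "c": "g", "g": "c", "t": "a"}


def reverse_complement(seq):
    """Return the reverse complement of a lowercase DNA sequence."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


FORWARD = "gactcacggtcgcacca"   # 15321
REVERSE = "ccgcgttccacggtta"    # 15322
PROBE = "ttcctctg"              # 17302, bases only, modifications ignored
TEMPLATE = (
    "attgactcacggtcgcaccaaattcctctgccttcctgctctgctgg"
    "gagaaggaggtggtgatgtggctggaaggaggcagctccagg"
    "agaaaataaccgtggaacgcggtcat"
)                               # 15319


def locate_assay(template, forward, reverse, probe):
    """Return primer/probe positions on the template, or None if any is absent."""
    fwd = template.find(forward)
    rev = template.find(reverse_complement(reverse))
    prb = template.find(probe)
    if min(fwd, rev, prb) < 0:
        return None
    return {
        "forward_start": fwd,
        "probe_start": prb,
        "reverse_start": rev,
        "amplicon_length": rev + len(reverse) - fwd,
        # True when the probe site lies strictly between the two primers.
        "probe_flanked": fwd + len(forward) <= prb and prb + len(probe) <= rev,
    }


assay = locate_assay(TEMPLATE, FORWARD, REVERSE, PROBE)
```

With these inputs the probe site lies between the two primer sites, which is the arrangement a dual labelled probe assay requires.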
Dual labelled probe functionality in real time PCR
Protocol for dual label probe assays
Reagents for the Real Time dual label probe PCRs were mixed according to the following scheme (Table 12):
Table 12
Reagents                     Final Concentration
GeneAmp 10x PCR buffer II    1x
Mg2+                         5.5 mM
dATP, dGTP, dCTP             0.2 mM
dUTP                         0.6 mM
15305 Q1 Dual Label Probe    0.1 µM
15319 Oligo Template         4 pM
15321 Forward primer         0.2 µM
15322 Reverse primer         0.2 µM
Uracil DNA Glycosylase       0.5 U
AmpliTaq Gold                2.5 U
Total                        50 µL
The following primers, probes, and Oligo Templates in Table 13 were included in the above mentioned PCR mix from Table 12.
Table 13
Name                        Sequence                              Quencher
15321 Forward Primer        gactcacggtcgcacca (SEQ ID NO: 47)     -
15322 Reverse Primer        ccgcgttccacggtta (SEQ ID NO: 48)      -
15305 Q1 Dual Label Probe   5' 6-Fitc-tTmCmCTmCTG#Q1z 3'          Q1
15319 Oligo Template        attgactcacggtcgcaccaaattcctctgccttcctgctctgctgggagaaggaggtggtgatgtggctggaaggaggcagctccaggagaaaataaccgtggaacgcggtcat (SEQ ID NO: 49)     -
LNA nucleotides are in capital letters; 6-Fitc: Fluorescein 6-isothiocyanate;
#Q1: 1,4-Bis(3-hydroxypropylamino)-anthraquinone, cf. Example 20 which also shows preparation of a 2-cyanoethyl protected phosphoramidite version of this molecule (1-(3-(2-cyanoethoxy(diisopropylamino)phosphinoxy)propylamino)-4-(3-(4,4'-dimethoxy-trityloxy)propylamino)-anthraquinone) for use in the general method in Example 6;
z: 2'-deoxy-5-nitroindole-ribofuranosyl;
mC: 5-methylcytosine.
The 15305 Q1 dual label probe is prepared as described in Example 6.
Assays were performed in a DNA Engine Opticon (MJ Research) using the following PCR cycle protocol (Table 14):
Table 14
37 °C for 10 minutes
95 °C for 7 minutes
40 cycles of:
  94 °C for 20 seconds
  60 °C for 1 minute
  Fluorescence detection
Results from the Real Time PCR are illustrated in Figure 19, which shows that the dual labelled probe with a 3'-Nitroindole is fully functional as a real time PCR probe.
Preparation of 1-(3-(2-cyanoethoxy(diisopropylamino)phosphinoxy)propylamino)-4-(3-(4,4'-dimethoxy-trityloxy)propylamino)-anthraquinone (3)
[Reaction scheme]
1,4-Bis(3-hydroxypropylamino)-anthraquinone (1)
Leucoquinizarin (9.9 g; 0.04 mol) is mixed with 3-amino-1-propanol (10 mL) and ethanol (200 mL) and heated to reflux for 6 hours. The mixture is cooled to room temperature and stirred overnight under atmospheric conditions. The mixture is poured into water (500 mL) and the precipitate is filtered off, washed with water (200 mL) and dried. The solid is boiled in ethyl acetate (300 mL), cooled to room temperature, and the solid is collected by filtration.
Yield: 8.2 g (56%)
1-(3-(4,4'-dimethoxy-trityloxy)propylamino)-4-(3-hydroxypropylamino)-anthraquinone (2)
1,4-Bis(3-hydroxypropylamino)-anthraquinone (7.08 g; 0.02 mol) is dissolved in a mixture of dry N,N-dimethylformamide (150 mL) and dry pyridine (50 mL). Dimethoxytritylchloride (3.4 g; 0.01 mol) is added and the mixture is stirred for 2 hours. Additional dimethoxytritylchloride (3.4 g; 0.01 mol) is added and the mixture is stirred for 3 hours. The mixture is concentrated under vacuum and the residue is re-dissolved in dichloromethane (400 mL), washed with water (2 x 200 mL) and dried (Na2SO4). The solution is filtered through a silica gel pad (ø 10 cm; h 10 cm) and eluted with dichloromethane until the mono-DMT-anthraquinone product begins to elute, whereafter the solvent is changed to 2% methanol in dichloromethane. The pure fractions are combined and concentrated, resulting in a blue foam.
Yield: 7.1 g (54%)
1H-NMR (CDCl3): 10.8 (2H, 2xt, J= 5.3 Hz, NH), 8.31 (2H, m, AqH), 7.67 (2H, dt, J= 3.8 and 9.4, AqH), 7.4-7.1 (9H, m, ArH + AqH), 6.76 (4H, m, ArH), 3.86 (2H, q, J= 5.5 Hz, CH2OH), 3.71 (6H, s, CH3), 3.54 (4H, m, NCH2), 3.26 (2H, t, J= 5.7 Hz, CH2ODMT), 2.05 (4H, m, CCH2C), 1.74 (1H, t, J= 5 Hz, OH).
1-(3-(2-cyanoethoxy(diisopropylamino)phosphinoxy)propylamino)-4-(3-(4,4'-dimethoxy-trityloxy)propylamino)-anthraquinone (3)
1-(3-(4,4'-dimethoxy-trityloxy)propylamino)-4-(3-hydroxypropylamino)-anthraquinone (0.66 g; 1.0 mmol) is dissolved in dry dichloromethane (100 mL) and 3Å molecular sieves are added. The mixture is stirred for 3 hours, and then 2-cyanoethyl-N,N,N',N'-tetraisopropylphosphordiamidite (335 mg; 1.1 mmol) and 4,5-dicyanoimidazole (105 mg; 0.9 mmol) are added. The mixture is stirred for 5 hours, then sat. NaHCO3 (50 mL) is added and the mixture is stirred for 10 minutes. The phases are separated and the organic phase is washed with sat. NaHCO3 (50 mL) and brine (50 mL) and dried (Na2SO4). After concentration the phosphoramidite is obtained as a blue foam and is used in oligonucleotide synthesis without further purification.
Yield: 705 mg (82%)
31P-NMR (CDCl3): 150.0
1H-NMR (CDCl3): 10.8 (2H, 2xt, J= 5.3 Hz, NH), 8.32 (2H, m, AqH), 7.67 (2H, m, AqH), 7.5-7.1 (9H, m, ArH + AqH), 6.77 (4H, m, ArH), 3.9-3.75 (4H, m), 3.71 (6H, s, OCH3), 3.64-3.52 (6H, m), 3.26 (2H, t, J= 5.8 Hz, CH2ODMT), 2.63 (2H, t, J= 6.4 Hz, CH2CN), 2.05 (4H, m, CCH2C), 1.18 (12H, dd, J= 3.1 Hz, CCH3).
Preparation of 1-(4-(2-(2-cyanoethoxy(diisopropylamino)phosphinoxy)ethyl)phenylamino)-4-(4-(2-(4,4'-dimethoxy-trityloxy)ethyl)phenylamino)-6(7)-methyl-anthraquinone (13)
[Reaction scheme]
6-methyl-Quinizarin (10)
4-methyl-phthalic anhydride (10 g, 62 mmol), p-chlorophenol (3.6 g, 28 mmol) and boric acid (1.6 g) were dissolved in concentrated H2SO4 (34 mL) and the mixture was stirred at 200 °C for 6 hours in a flask covered with a glass plate. After completion of the reaction, the mixture was allowed to cool and then poured into water (160 mL) and the precipitate collected by filtration. The solid was suspended in boiling water (320 mL) and boiled for 5 min, whereupon the solid was collected by filtration. The product was obtained as a dark red solid (5 g, 19.7 mmol) after drying. MALDI-MS: m/z 255.7 (M+H).
1,4-Bis(4-(2-hydroxyethyl)phenylamino)-6-methyl-anthraquinone (11)
6-methyl-quinizarin (10, 2.5 g) is suspended in acetic acid (30 mL), Zn dust (2 g) is added and the mixture is stirred at 90 °C for 1 h. The mixture is then filtered through a pad of celite and cooled to room temperature, water (90 mL) is added, and the reduced anthraquinone derivative is collected by filtration. The solid is then mixed with boric acid (1.9 g; 0.03 mol) and ethanol (100 mL) and refluxed for 1 hour. The mixture is cooled to room temperature and 4-aminophenethyl alcohol (4.1 g; 0.03 mol) is added, whereafter the mixture is heated to reflux for 3 days. The mixture is concentrated, redissolved in dichloromethane (300 mL), washed with water (3 x 100 mL), dried (Na2SO4) and concentrated. The residue is purified on a silica gel column with MeOH/dichloromethane. Yield: 1.5 g (30%).
1-(4-(2-(4,4'-dimethoxy-trityloxy)ethyl)phenylamino)-4-(4-(2-hydroxyethyl)phenylamino)-6(7)-methyl-anthraquinone (12)
1,4-Bis(4-(2-hydroxyethyl)phenylamino)-6-methyl-anthraquinone (0.95 g; 1.9 mmol) is dissolved in dry pyridine (30 mL). Dimethoxytritylchloride (0.34 g; 1 mmol) is added and the mixture is stirred for 2 hours. Additional dimethoxytritylchloride (0.34 g; 1 mmol) is added and the mixture is stirred for 4 hours. The mixture is concentrated under vacuum and the residue is redissolved in dichloromethane (200 mL), washed with water (2 x 100 mL) and dried (Na2SO4). The product is purified by column chromatography (toluene/EtOAc). Yield: 0.81 g (54%).
1-(4-(2-(2-cyanoethoxy(diisopropylamino)phosphinoxy)ethyl)phenylamino)-4-(4-(2-(4,4'-dimethoxy-trityloxy)ethyl)phenylamino)-6(7)-methyl-anthraquinone (13)
1-(4-(2-(4,4'-dimethoxy-trityloxy)ethyl)phenylamino)-4-(4-(2-hydroxyethyl)phenylamino)-6(7)-methyl-anthraquinone (0.50 g; 0.63 mmol) is dissolved in dry dichloromethane (50 mL) and 3Å molecular sieves are added. The mixture is stirred for 3 hours, and then 2-cyanoethyl-N,N,N',N'-tetraisopropylphosphordiamidite (215 mg; 0.72 mmol) and 4,5-dicyanoimidazole (64 mg; 0.55 mmol) are added. The mixture is stirred for 4 hours, then sat. NaHCO3 (25 mL) is added and the mixture is stirred for 10 minutes. The phases are separated and the organic phase is washed with sat. NaHCO3 (25 mL) and brine (25 mL) and dried (Na2SO4). The phosphoramidite is then evaporated to dryness and used in oligonucleotide synthesis without further purification. Yield: 0.59 g (94%).
SNP Detection Using a Library of Probes
Single nucleotide polymorphisms (SNPs) are the most common type of genetic variant in the human and other genomes. Detection of SNPs using dual labelled probes can be done by simultaneously using two differently labelled probes, each of which hybridizes specifically to one SNP allele. The result of the real time PCR will hence indicate the presence of one or the other or both alleles in the sample. Either genomic DNA or RNA can be used as the sample.
SNPs occur almost randomly, and it is expected that almost any sequence context can exist in many permutations as a result of SNPs; currently over 2 million SNPs are known. Hence, to have all relevant probes in stock for supplying or generating SNP detection assays, millions of probes would be needed.
Relevant for the present invention, due to the short probes enabled by the use of LNA, this number can be reduced by using LNA-containing 8 or 9-mer probes.
Theoretically, 4^9 = 262144 possible 9-mers and 4^8 = 65536 possible 8-mers can exist and would be necessary to cover any possible SNP sequence. A further advantage of LNA-containing oligos is an increased specificity, allowing the SNP position to be placed at any position in the probe.
Hence, each probe can cover 9 different SNP positions, which would reduce the need for 8-mer sequences from 65536 to 65536/9 = 7281. Detection can also occur on both strands, hence only 7281/2 = 3640 probes are needed.
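The probe counts quoted above can be reproduced with a few lines of Python. This is only a restatement of the arithmetic in the text; the variable names are illustrative, and the divisor 9 is the figure used in the text, with fractions rounded down as the text does.

```python
# Reproduces the arithmetic in the text; variable names are illustrative.
BASES = 4                              # a, c, g, t

nine_mers = BASES ** 9                 # all possible 9-mers
eight_mers = BASES ** 8                # all possible 8-mers

snp_positions_per_probe = 9            # figure used in the text
after_position_sharing = eight_mers // snp_positions_per_probe
after_both_strands = after_position_sharing // 2

print(nine_mers, eight_mers, after_position_sharing, after_both_strands)
```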
SNP discrimination example - demonstrating single mismatch discrimination by dual labelled probe in real time PCR.
Protocol for dual label probe assays
Reagents for the Real Time dual label probe PCRs were mixed according to the following scheme (Table 15):
Table 15
Reagents                          Final Concentration
GeneAmp 10x PCR buffer II         1x
Mg2+                              5.5 mM
dATP, dGTP, dCTP                  0.2 mM
dUTP                              0.6 mM
13996 Dual Label Probe            0.1 µM
Oligo Template (14229 or 14226)   40 fM
14117 Forward primer              0.2 µM
14118 Reverse primer              0.2 µM
Uracil DNA Glycosylase            0.5 U
AmpliTaq Gold                     2.5 U
Total                             50 µL
The following primers, probes, and Oligo Templates were included in the above mentioned PCR mix (Table 15).
Table 16
Name                                   Sequence
14117 Forward Primer                   cagctaaaaatgatgacaataatgg
14118 Reverse Primer                   attacatcatgattagggaatgc
13996 Dual Label Probe                 5' 6-Fitc-ctGGAGmCaG-EQL 3'
14229 Single Mismatch Oligo Template   cagctaaaaatgatgacaataatgggctaacggagaagcgggagcagatcggcattccctaatcatgatgtaat
14226 Perfect Match Oligo Template     cagctaaaaatgatgacaataatgggctaaaggagaagctggagcagatcggcattccctaatcatgatgtaat
LNAs in capital letters; 6-Fitc: Fluorescein 6-isothiocyanate; EQL: Eclipse™ Dark Quencher (Epoch Biosciences); mC: 5-methylcytosine.
Assays were performed in a DNA Engine Opticon (MJ Research) using the following PCR cycle protocol (Table 17):
Table 17
37 °C for 10 minutes
95 °C for 7 minutes
40 cycles of:
  94 °C for 20 seconds
  60 °C for 1 minute
  Fluorescence detection
Results from the Real Time PCR are illustrated in Figure 20, which shows that the dual labelled probe is able to discriminate between a perfectly matching target and a target having a single mismatch relative to the probe.
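The relationship between probe 13996 and the two templates of Table 16 can be checked in silico. The Python sketch below slides the plain base sequence of the probe (ctggagcag, modifications ignored) along each template and reports the smallest number of mismatches found. The function name `best_site` is hypothetical, and a mismatch count says nothing by itself about hybridization behaviour, which is what Figure 20 addresses.

```python
# Sketch under stated assumptions; sequences are taken from Table 16.
PROBE = "ctggagcag"   # 13996, bases only, modifications ignored
PERFECT = (           # 14226
    "cagctaaaaatgatgacaataatgggctaaaggagaa"
    "gctggagcagatcggcattccctaatcatgatgtaat"
)
MISMATCH = (          # 14229
    "cagctaaaaatgatgacaataatgggctaacggagaa"
    "gcgggagcagatcggcattccctaatcatgatgtaat"
)


def best_site(template, probe):
    """Return (fewest mismatches, position) over all windows of probe length."""
    windows = range(len(template) - len(probe) + 1)
    return min(
        (sum(a != b for a, b in zip(template[i:i + len(probe)], probe)), i)
        for i in windows
    )


perfect_hit = best_site(PERFECT, PROBE)
mismatch_hit = best_site(MISMATCH, PROBE)
```

The perfect match template contains the probe sequence exactly, while the best site in the other template differs from the probe at one base.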
REFERENCES AND NOTES
1. Helen C. Causton, Bing Ren, Sang Seok Koh, Christopher T. Harbison, Elenita Kanin, Ezra G. Jennings, Tong Ihn Lee, Heather L. True, Eric S. Lander, and Richard A. Young (2001). Remodelling of Yeast Genome Expression in Response to Environmental Changes. Mol. Biol. Cell 12:323-337.
2. Frank C. P. Holstege, Ezra G. Jennings, John J. Wyrick, Tong Ihn Lee, Christoph J. Hengartner, Michael R. Green, Todd R. Golub, Eric S. Lander, and Richard A. Young (1998). Dissecting the Regulatory Circuitry of a Eukaryotic Genome. Cell 95:717-728.
3. Simeonov, Anton and Theo T. Nikiforov (2002). Single nucleotide polymorphism genotyping using short, fluorescently labelled locked nucleic acid (LNA) probes and fluorescence polarization detection. Nucleic Acids Research 30(17):e91.
Variations, modifications, and other implementations of what is described herein will occur to those skilled in the art without departing from the spirit and scope of the invention as described and claimed herein and such variations, modifications, and implementations are encompassed within the scope of the invention.
The references, patents, patent applications, and international applications disclosed above are incorporated by reference herein in their entireties.
Claims (90)
1. A library of oligonucleotide probes wherein each probe in the library consists of a recognition sequence tag and a detection moiety wherein at least one monomer in each oligonucleotide probe is a modified monomer analogue, increasing the binding affinity for the complementary target sequence relative to the corresponding unmodified oligonucleotide, such that the library probes have sufficient stability for sequence-specific binding and detection of a substantial fraction of a target nucleic acid in any given target population and wherein the number of different recognition sequences comprises less than 10% of all possible sequence tags of a given length(s), and wherein each probe contains a fluorophore-quencher pair for detection where the quencher has formula (I) wherein one or two of R1, R4, R5 and R8 independently is/are a bond or selected from a substituted or non-substituted amino group, which constitute(s) the linker(s) to the remainder of the oligonucleotide probe, and wherein the remaining R1 to R8 groups are each, independently hydrogen or substituted or non-substituted hydroxy, amino, alkyl, aryl, arylalkyl or alkoxy, and/or wherein less than 20% of the oligonucleotide probes of said library have a guanidyl (G) residue in the 5' and/or 3' position.
2. The library according to claim 1, wherein the quencher is selected from 1,4-bis-(3-hydroxy-propylamino)-anthraquinone, 1-(3-(4,4'-dimethoxy-trityloxy)propylamino)-4-(3-hydroxypropylamino)-anthraquinone, 1,5-bis-(3-hydroxy-propylamino)-anthraquinone, 1-(3-hydroxypropylamino)-5-(3-(4,4'-dimethoxy-trityloxy)propylamino)-anthraquinone, 1,4-bis-(4-(2-hydroxyethyl)phenylamino)-anthraquinone, 1-(4-(2-(4,4'-dimethoxy-trityloxy)ethyl)phenylamino)-4-(4-(2-hydroxyethyl)phenylamino)-anthraquinone, 1,8-bis-(3-hydroxy-propylamino)-anthraquinone, 1,4-bis(3-hydroxypropylamino)-6-methylanthraquinone, 1-(3-(4,4'-dimethoxy-trityloxy)propylamino)-4-(3-hydroxypropylamino)-6(7)-methyl-anthraquinone, 1,4-bis(4-(2-hydroxyethyl)phenylamino)-6-methyl-anthraquinone, 1,4-bis(4-methyl-phenylamino)-6-carboxy-anthraquinone, 1,4-bis(4-methyl-phenylamino)-6-(N-(6,7-dihydroxy-4-oxo-heptane-1-yl))carboxamido-anthraquinone, 1,4-bis(4-methyl-phenylamino)-6-(N-(7-dimethoxytrityloxy-6-hydroxy-4-oxo-heptane-1-yl))carboxamido-anthraquinone, 1,4-bis(propylamino)-6-carboxy-anthraquinone, 1,4-bis(propylamino)-6-(N-(6,7-dihydroxy-4-oxo-heptane-1-yl))carboxamido-anthraquinone, 1,4-bis(propylamino)-6-(N-(7-dimethoxytrityloxy-6-hydroxy-4-oxo-heptane-1-yl))carboxamido-anthraquinone, 1,5-bis(4-(2-hydroxyethyl)phenylamino)-anthraquinone, 1-(4-(2-hydroxyethyl)phenylamino)-5-(4-(2-(4,4'-dimethoxy-trityloxy)ethyl)phenylamino)-anthraquinone, 1,8-bis(3-hydroxypropylamino)-anthraquinone, 1-(3-hydroxypropylamino)-8-(3-(4,4'-dimethoxy-trityloxy)propylamino)-anthraquinone, 1,8-bis(4-(2-hydroxyethyl)phenylamino)-anthraquinone, and 1-(4-(2-hydroxyethyl)phenylamino)-8-(4-(2-(4,4'-dimethoxy-trityloxy)ethyl)phenylamino)-anthraquinone.
3. The library according to claim 1, wherein the quencher is 1,4-Bis(2-hydroxy-ethylamino)-6-methylanthraquinone.
4. The library according to any of the preceding claims, wherein less than 10% of the oligonucleotide probes have a G in the 5' end, such as less than 5%.
5. The library according to claim 4, wherein none of the oligonucleotides in the library have a G in the 5' end.
6. A library of oligonucleotide probes according to any one of the preceding claims, wherein the recognition sequence tag segment of the probes in the library have been modified in at least one of the following ways:
i) substitution with at least one non-naturally occurring nucleotide
ii) substitution with at least one chemical moiety to increase the stability of the probe.
7. A library of oligonucleotide probes according to wherein the recognition sequence tag has a length of 6 to 12 nucleotides.
8. A library of oligonucleotide probes according to claim 7, wherein the recognition sequence tag has a length of 8 or 9 nucleotides.
9. A library of oligonucleotide probes according to claim 8, wherein the recognition sequence tags are substituted with LNA nucleotides.
10. A library of oligonucleotide probes according to any one of the preceding claims, wherein more than 90% of the oligonucleotide probes can bind and detect at least two target sequences in a nucleic acid population.
11. A library according to claim 10, wherein the recognition sequence tag is complementary to at least two target sequences in the nucleic acid population.
12. A library of oligonucleotide probes of 8 and 9 nucleotides in length comprising a mixture of subsets of oligonucleotide probes defined in any one of claims 1-11.
13. A library of oligonucleotide probes of any one of the preceding claims, wherein the number of different target sequences in a nucleic acid population is at least 100.
14. A library of oligonucleotide probes according to any one of the preceding claims, wherein at least one nucleotide in each oligonucleotide probe is substituted with a non-naturally occurring nucleotide analogue, a deoxyribose or ribose analogue, or an internucleotide linkage other than a phosphodiester linkage.
15. A library of oligonucleotide probes according to any one of the preceding claims, wherein the detection moiety is a covalently or non-covalently bound minor groove binder or an intercalator selected from the group comprising asymmetric cyanine dyes, DAPI, SYBR Green I, SYBR Green II, SYBR Gold, PicoGreen, thiazole orange, Hoechst 33342, Ethidium Bromide, 1-O-(1-pyrenylmethyl)glycerol, and Hoechst 33258.
16. The library of oligonucleotide probes according to claim 14 or 15, wherein the internucleotide linkage other than phosphodiester linkage is a non-phosphate internucleotide linkage.
17. The library of oligonucleotide probes according to claim 16, wherein the internucleotide linkage is selected from the group consisting of alkyl phosphonate, phosphoramidite, alkyl-phosphotriester, phosphorothioate, and phosphorodithioate linkages.
18. The library of oligonucleotide probes according to any one of the preceding claims, wherein said oligonucleotide probes contain non-naturally occurring nucleotides, such as 2'-O-methyl, diamine purine, 2-thio uracil, 5-nitroindole, universal or degenerate bases, intercalating nucleic acids or minor-groove-binders, to enhance their binding to a complementary nucleic acid sequence.
19. The library according to claim 18, wherein all oligonucleotide probes contain at least one 5-nitroindole residue.
20. The library of oligonucleotide probes according to any one of the preceding claims, wherein said different recognition sequences comprise less than 1% of all possible oligonucleotides of a given length.
21. The library of oligonucleotide probes according to any one of the preceding claims, wherein each probe can be detected using a dual label by the molecular beacon assay principle.
22. The library of oligonucleotide probes according to any one of claims 1-20, wherein each probe can be detected using a dual label by the 5' nuclease assay principle.
23. The library according to any one of the preceding claims, wherein each probe contains a single detection moiety that can be detected by the molecular beacon assay principle.
24. The library of oligonucleotide probes according to any one of the preceding claims, wherein the target nucleic acid population is an mRNA sample, a cDNA sample or a genomic DNA sample.
25. The library of oligonucleotide probes according to claim 24, wherein said target mRNA or target cDNA population originates from the transcriptomes of human, mouse, rat, Arabidopsis thaliana, Drosophila melanogaster, Chimpanzee or Caenorhabditis elegans.
26. The library of oligonucleotide probes according to any one of the preceding claims, wherein said probe target sequences occur at least once within more than 4% of different target nucleic acids in a target nucleic acid population.
27. The library of oligonucleotide probes according to any one of the preceding claims, wherein self-complementary probe sequences have been omitted from the said library.
28. The library of oligonucleotide probes according to claim 27, wherein said self-complementary sequences have been de-selected.
29. The library of oligonucleotide probes according to claim 27, wherein said self-complementary sequences have been eliminated by sequence-specific modifications, such as non-standard nucleotides, nucleotides with SBC nucleobases, 2'-O-methyl, diamine purine, 2-thio uracil, universal or degenerate bases or minor-groove-binders.
30. The library of oligonucleotide probes according to any one of the preceding claims, wherein the melting temperature (Tm) of each probe is adjusted to be suitable for PCR-based assays by substitution with non-occurring modifications, such as LNA, optionally modified with SBC nucleobases, 2'-O-methyl, diamine purine, 2-thio uracil, 5-nitroindole, universal or degenerate bases, intercalating nucleic acids or minor-groove-binders, to enhance their binding to a complementary nucleic acid sequence.
31. The library of oligonucleotide probes according to any one of the preceding claims, wherein the melting temperature (Tm) of each probe is at least 50°C.
32. The library of oligonucleotide probes according to any one of the preceding claims, wherein each probe has a DNA nucleotide at the 5'-end and/or has a DNA nucleotide at the 3'-end.
33. The library of oligonucleotide probes according to any one of the preceding claims, wherein each probe can be detected by the molecular beacon principle.
34. The library of oligonucleotide probes according to any one of the preceding claims, wherein the target population is the human transcriptome.
35. The library of oligonucleotide probes according to any one of the preceding claims, wherein each oligonucleotide probe detects the largest possible number of different target nucleic acids resulting in maximum coverage for a given target nucleic acid population by the said library.
36. The library of oligonucleotide probes according to any one of the preceding claims, wherein the oligonucleotide probes are selected to have as many target sequences or binding sites as possible within the target population of nucleic acids in order to obtain a maximum degree of detection.
37. The library of oligonucleotide probes according to any one of the preceding claims, wherein the oligonucleotide probes are selected to have at least one target sequence in as many target nucleic acids as possible within the target population of nucleic acids in order to obtain a maximum degree of detection.
38. The library of oligonucleotide probes in TABLE 1 or TABLE 1a or Fig. 13 or Fig. 14 capable of detecting the complementary sequences in any given nucleic acid population.
39. The library according to any one of the preceding claims, which comprises probes each having a recognition element listed in TABLE 1 or TABLE 1a in the specification and/or which comprises probes each having a recognition element complementary to the recognition elements listed in said TABLE 1.
40. An oligonucleotide probe comprising a quencher of formula I and a 5'-nitroindole residue.
41. The oligonucleotide probe of claim 40, which is free from a 5' guanidyl residue.
42. The oligonucleotide probe of claim 40 or 41, which is as defined in any one of claims 1-9, 14-18,21-23, and 31-1.
43. The oligonucleotide probe according to any one of claims 40-42, said probe being selected from probes complementary to or identical with the sequences set forth in Table 1, Table 1A, Fig. 13, or Fig 14.
44. The oligonucleotide probe according to any one of claims 40-43, which has an exact nucleotide sequence selected from Table 1 or Table 1A.
45. A method of selecting oligonucleotide sequences useful in the library according to any one of the preceding claims, comprising
a) providing a first list of all possible oligonucleotides of a predefined number of nucleotides, N, said oligonucleotides having a melting temperature, Tm, of at least 50°C,
b) providing a second list of target nucleic acid sequences,
c) identifying and storing for each member of said first list, the number of members from said second list, which include a sequence complementary to said each member,
d) selecting a member of said first list, which in the identification in step c matches the maximum number, identified in step c, of members from said second list,
e) adding the member selected in step d to a third list consisting of the selected oligonucleotides useful in the library according to any one of the preceding claims,
f) subtracting the member selected in step d from said first list to provide a revised first list,
m) repeating steps d through f until said third list consists of members which together will be complementary to at least 30% of the members on the list of target nucleic acid sequences from step b,
wherein said method has a bias against including a member in the third list that has a 5' guanidyl (G) and/or a bias against including members in the third list that have a 3' guanidyl (G).
46. The method according to claim 45, wherein guanidyl is avoided as the 5' residue in all oligonucleotide sequences in said third list.
47. The method according to claim 46, wherein the avoidance of guanidyl as the 5' residue is achieved by i) reducing the list of step a to include only those that do not include a 5' guanidyl residue, and/or ii) avoiding selection in step d of those sequences which include a 5' guanidyl residue, and/or iii) omitting step e for those sequences that include a 5' guanidyl residue.
48. The method according to any one of claims 45-47, wherein Tm is at least 60°C.
49. The method according to any one of claims 45-48, wherein the first list of oligonucleotides only includes oligonucleotides incapable of self-hybridization.
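The claim does not state how self-hybridization is assessed. As a hedged illustration only, the Python sketch below flags an oligonucleotide when it contains a short stretch whose reverse complement also occurs in the same oligonucleotide, which would allow two copies (or two parts of one copy) to pair. The function name `can_self_hybridize` and the `min_stem` threshold of 4 bases are assumptions, not values from the source.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string written 5'->3'."""
    return seq.translate(_COMPLEMENT)[::-1]


def can_self_hybridize(seq, min_stem=4):
    """Heuristic: True if any min_stem-long window can pair within seq."""
    for i in range(len(seq) - min_stem + 1):
        window = seq[i:i + min_stem]
        # A window pairs with seq if its reverse complement occurs in seq.
        if reverse_complement(window) in seq:
            return True
    return False
```

A first list restricted in the sense of claim 49 could then be built with `[s for s in candidates if not can_self_hybridize(s)]`. A thermodynamic model would be more faithful than this purely textual check.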
50. The method according to any one of claims 45-49, which after step f and before step m comprises the following steps:
g) subtracting all members from said second list which include a sequence complementary to the member selected in step d to obtain a revised second list,
h) identifying and storing, for each member of said revised first list, the number of members from said revised second list which include a sequence complementary to said each member,
i) selecting a member of said first list which, in the identification in step h, matches the maximum number, identified in step h, of members from said second list, or selecting a member of said first list that provides the maximum number obtained by multiplying the number identified in step h with the number identified in step c,
j) adding the member selected in step i to said third list,
k) subtracting the member selected in step i from said revised first list, and
l) subtracting all members from said revised second list which include a sequence complementary to the member selected in step i.
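Steps g through l turn the selection into a greedy covering procedure: once a probe is chosen, the targets it detects are removed before the next probe is scored. The Python sketch below shows one way this could look, assuming the per-probe hit sets from step c are already available as a dictionary. The name `greedy_cover` and the `weighted` flag (which switches to the product score mentioned in step i) are hypothetical.

```python
def greedy_cover(hits, n_targets, min_coverage=0.30, weighted=False):
    """Pick probes by how many still-uncovered targets each one detects.

    hits maps probe sequence -> set of target indices found in step c.
    """
    if n_targets == 0:
        return [], 0.0
    remaining = {probe: set(found) for probe, found in hits.items()}
    original = {probe: len(found) for probe, found in hits.items()}
    uncovered = set(range(n_targets))
    selected = []

    def score(probe):
        new = len(remaining[probe] & uncovered)        # step h count
        return new * original[probe] if weighted else new

    while remaining and n_targets - len(uncovered) < min_coverage * n_targets:
        # Step i: best-scoring probe; sorting first makes ties deterministic.
        best = max(sorted(remaining), key=score)
        if score(best) == 0:
            break  # no remaining probe detects a new target
        selected.append(best)                          # step j
        uncovered -= remaining.pop(best)               # steps k and l
    return selected, 1 - len(uncovered) / n_targets
```

With `weighted=True` the score is the product of the step h count and the step c count, which favours probes that are both broadly useful and still informative for the uncovered targets.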
51. The method according to claim 50 insofar as it depends on claim 46, wherein the avoidance of guanidyl as the 5' residue is achieved by avoiding selection in step i of those sequences which include a 5' guanidyl residue, and/or omitting step j for those sequences that include a 5' guanidyl residue.
52. The method according to any one of claims 45-51, wherein repetition in step m is continued until said third list consists of members which together will be complementary to at least 85% of the members on the list of target nucleic acid sequences from step b.
53. The method according to any one of claims 45-52, wherein, after selection of the first member of said third list, the selection in step d after step c is preceded by identification of those members of said first list which hybridize to more than a selected percentage of the maximum number of members from said second list so that only those members so identified are subjected to the selection in step d.
54. The method according to claim 53, wherein the selected percentage is 80%.
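The pre-selection of claims 53 and 54 amounts to keeping only candidates whose hit count exceeds a set percentage of the best hit count. A minimal Python sketch, assuming the step c counts are held in a dictionary and using the hypothetical name `shortlist`:

```python
def shortlist(counts, percent=80):
    """Keep probes detecting more than `percent` % of the best probe's targets.

    counts maps probe sequence -> number of matching targets (step c).
    """
    if not counts:
        return []
    best = max(counts.values())
    # Integer arithmetic avoids floating-point surprises at the boundary.
    return sorted(p for p, c in counts.items() if c * 100 > percent * best)
```

Only the probes returned by `shortlist` would then be passed on to the selection in step d.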
55. The method according to any one of claims 45-54, wherein it is ensured that members are not entered on the third list if such members have previously failed qualitatively as useful probes.
56. The method according to claim 55, wherein oligonucleotide sequences that have previously failed qualitatively are not included in the third list by i) reducing the list of step a to include only those that have not previously failed qualitatively, and/or ii) avoiding selection in step d or i of those sequences that have previously failed qualitatively, and/or iii) omitting step e or j for those sequences that have previously failed qualitatively.
57. The method according to any one of claims 45-56, wherein N is an integer selected from 6, 7, 8, 9, 10, 11, and 12.
58. The method according to claim 57, wherein N is 8 or 9.
59. The method according to any one of claims 45-58, wherein said second list of step b comprises target nucleic acid sequences as defined in claim 24 or 25.
60. The method according to any one of claims 45-59, essentially performed as set forth in Fig. 2.
61. The method according to any one of claims 45-60, wherein said first, second and third lists are stored in the memory of a computer system, preferably in a database.
62. A computer program product providing instructions for implementing the method according to any one of claims 45-61, embedded in a computer-readable medium.
63. A system comprising a database of target sequences and an application program for executing the computer program of claim 62.
64. A method for identifying a specific means for detection of a target nucleic acid, the method comprising
A) inputting, into a computer system, data that uniquely identifies the nucleic acid sequence of said target nucleic acid, wherein said computer system comprises a database holding information of the composition of at least one library of nucleic acid probes according to any one of claims 1-39, and wherein the computer system further comprises a database of target nucleic acid sequences for each probe of said at least one library and/or further comprises means for acquiring and comparing nucleic acid sequence data,
B) identifying, in the computer system, a probe from the at least one library, wherein the sequence of the probe exists in the target nucleic acid sequence or a sequence complementary to the target nucleic acid sequence,
C) identifying, in the computer system, primers that will amplify the target nucleic acid sequence, and
D) providing, as identification of the specific means for detection, an output that points out the probe identified in step B and the sequences of the primers identified in step C.
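Steps B through D can be sketched in Python as a lookup followed by a deliberately naive primer choice. This is an assumption-laden illustration, not the claimed system: `design_assay`, `primer_len` and `flank` are hypothetical names, the library is a plain list of probe strings, and primers are simply cut from fixed positions on either side of the probe site with no check of melting temperature, specificity or dimer formation.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string written 5'->3'."""
    return seq.translate(_COMPLEMENT)[::-1]


def design_assay(target, library, primer_len=20, flank=10):
    """Return a probe plus flanking primers for target, or None."""
    for probe in library:
        # Step B: the probe may match the target or its complementary strand.
        for site in (probe, reverse_complement(probe)):
            pos = target.find(site)
            if pos == -1:
                continue
            fwd_start = pos - flank - primer_len
            rev_end = pos + len(site) + flank + primer_len
            if fwd_start < 0 or rev_end > len(target):
                continue  # not enough sequence around the site for primers
            # Step C: fixed-offset primers on both sides of the probe site.
            forward = target[fwd_start:fwd_start + primer_len]
            reverse = reverse_complement(target[rev_end - primer_len:rev_end])
            # Step D: report the probe and both primer sequences.
            return {"probe": probe, "forward_primer": forward,
                    "reverse_primer": reverse}
    return None
```

Only the first occurrence of each probe site is examined, which keeps the sketch short; a fuller version would consider every occurrence and rank the resulting assays.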
65. The method according to claim 64, wherein step A also comprises inputting, into the computer system, data that identifies the at least one library of nucleic acids from which it is desired to select a member for use in the specific means for detection.
66. The method according to claim 65, wherein the data that identifies the composition of the at least one library is a product code.
67. The method according to any one of claims 64-66, wherein inputting in step A is performed via an internet web interface.
68. The method according to any one of claims 64-66, wherein the primers identified in step C are chosen so as to minimize the chance of amplifying genomic nucleic acids in a PCR reaction.
69. The method according to claim 68, wherein at least one of the primers is selected so as to include a nucleotide sequence which in genomic DNA is interrupted by an intron.
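One way to picture claim 69 is to place a primer across an exon-exon junction of the spliced transcript, so that the same sequence is interrupted by an intron in genomic DNA. The Python sketch below enumerates such candidate primers. It assumes the exon sequences are supplied in transcript order; `junction_spanning_primers` and `min_overhang` (the minimum number of primer bases on each side of the junction) are hypothetical.

```python
def junction_spanning_primers(exons, primer_len=20, min_overhang=5):
    """List (start, sequence) primers that straddle an exon-exon junction."""
    mrna = "".join(exons)
    # Junction positions in the spliced transcript (end of every exon but the last).
    junctions, pos = [], 0
    for exon in exons[:-1]:
        pos += len(exon)
        junctions.append(pos)
    primers = []
    for j in junctions:
        first = max(0, j + min_overhang - primer_len)
        last = j - min_overhang
        for start in range(first, last + 1):
            end = start + primer_len
            if end <= len(mrna):
                # At least min_overhang bases lie on each side of junction j.
                primers.append((start, mrna[start:end]))
    return primers
```

Each returned sequence is taken from the sense strand; for a reverse primer its reverse complement would be used.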
70. The method according to any one of claims 64-69, wherein the primers selected in step C are chosen so as to minimize length of amplicons obtained from PCR performed on the target nucleic acid sequence.
71. The method according to any one of claims 64-70, wherein the primers selected in step C are chosen so as to optimize the GC content for performing PCR.
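Claims 70 and 71 name two ranking criteria for primer pairs, amplicon length and GC content, without saying how they are weighed. The Python sketch below is one arbitrary choice, shown only as an illustration: shorter amplicons are preferred first, and ties are broken by how close each primer's GC fraction is to a target value. The 50% target and the names `gc_fraction` and `rank_primer_pairs` are assumptions.

```python
def gc_fraction(seq):
    """Fraction of G and C bases in a non-empty sequence."""
    return (seq.count("G") + seq.count("C")) / len(seq)


def rank_primer_pairs(pairs, target_gc=0.5):
    """Sort (forward, reverse, amplicon_length) tuples, best candidate first."""
    def cost(pair):
        forward, reverse, amplicon_length = pair
        gc_deviation = (abs(gc_fraction(forward) - target_gc)
                        + abs(gc_fraction(reverse) - target_gc))
        # Shorter amplicon wins; GC deviation only breaks ties.
        return (amplicon_length, gc_deviation)
    return sorted(pairs, key=cost)
```

A weighted sum of the two terms would be an equally defensible design; the lexicographic order here simply keeps the example easy to follow.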
72. A computer program product providing instructions for implementing the method according to any one of claims 64-71, embedded in a computer-readable medium.
73. A system comprising a database of nucleic acid probes as defined in any one of claims 1-39 and an application program for executing the computer program of claim 72.
74. A method for profiling a plurality of target sequences comprising contacting a sample of target sequences with a library according to any one of claims 1-39 and detecting, characterizing or quantifying the probe sequences which bind to the target sequences.
75. The method according to claim 74, providing detection of a nucleic acid sequence which is present in less than 10% of the plurality of sequences which are bound by the multi-probe sequences.
76. The method according to claim 75, wherein the target mRNA sequences or cDNA sequences comprise a transcriptome.
77. The method according to claim 76, wherein the transcriptome is a human transcriptome.
78. The method according to any one of claims 74-77, wherein the probes of the library are covalently coupled to a solid support.
79. The method according to claim 78, wherein the solid support comprises a microtiter plate and each well of the microtiter plate comprises a different library probe.
80. The method according to any one of claims 74-79, wherein the step of detecting is performed by amplifying a target nucleic acid sequence containing a recognition sequence complementary to a library probe.
81. The method of claim 80, wherein target nucleic acid amplification is carried out by using a pair of oligonucleotide primers flanking the recognition sequence complementary to a library probe.
82. The method of any one of claims 74-81, wherein the presence or expression level of one or more target nucleic acid sequences is correlated with a species' phenotype.
83. The method of claim 82, wherein the phenotype is a disease.
84. A method of analysing a mixture of nucleic acids using a library according to any one of claims 1-39 comprising the steps of
(a) contacting a target oligonucleotide with a library of labelled oligonucleotide probes, each of said oligonucleotide probes having a known sequence and being attached to a solid support at a known position, to hybridize said target oligonucleotide to at least one member of said library of probes, thereby forming a hybridized library;
(b) contacting said hybridized library with a nuclease capable of cleaving double-stranded oligonucleotides to release from said hybridized library a portion of said labelled oligonucleotide probes or fragments thereof; and
(c) identifying said positions of said hybridized library from which labelled probes or fragments thereof have been removed, to determine the sequence of said unlabelled target oligonucleotide.
85. A method of analysing a mixture of nucleic acids using a library of any one of claims 1-39 comprising the steps of
(a) contacting a target oligonucleotide with a library of labelled oligonucleotide probes, each of said oligonucleotide probes having a known sequence and being attached to a solid support at a known position, to hybridize said target oligonucleotide to at least one member of said library of probes, thereby forming a hybridized library;
(b) identifying said positions of said hybridized library at which labelled probes or fragments thereof have hybridized, to determine the sequence of said target oligonucleotide; and
(c) identifying said positions of said hybridized library from which labelled probes or fragments thereof have been removed, to determine the sequence of said unlabelled target oligonucleotide.
86. A method for quantitatively or qualitatively determining the presence of a target nucleic acid in a sample, the method comprising
i) identifying, by means of the method according to any one of claims 64-71, a specific means for detection of the target nucleic acid, where the specific means for detection comprises an oligonucleotide probe and a set of primers,
ii) obtaining the primers and the oligonucleotide probe identified in step i),
iii) subjecting the sample to a molecular amplification procedure in the presence of the primers and the oligonucleotide probe from step ii), and
iv) determining the presence of the target nucleic acid based on the outcome of step iii).
87. The method according to claim 86, wherein the primers obtained in step ii) are obtained by synthesis.
88. The method according to claim 86 or 87, wherein the oligonucleotide probe is obtained from a library according to any one of claims 1-39.
89. The method according to any one of claims 86-88, wherein the procedure in step iii) is a PCR or a NASBA procedure.
90. The method according to claim 89, wherein the PCR procedure is a qPCR.
Applications Claiming Priority (7)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US63785704P | 2004-12-22 | 2004-12-22 | |
DKPA200401987 | 2004-12-22 | | |
DKPA200401987 | 2004-12-22 | | |
US60/637,857 | 2004-12-22 | | |
DKPA200402012 | 2004-12-28 | | |
DKPA200402012 | 2004-12-28 | | |
PCT/DK2005/000815 WO2006066592A2 (en) | 2004-12-22 | 2005-12-21 | Probes, libraries and kits for analysis of mixtures of nucleic acids and methods for constructing the same |
Publications (1)
Publication Number | Publication Date |
---|---|
CA2593916A1 (en) | 2006-06-29 |
Family
ID=36292652
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CA002593916A Abandoned CA2593916A1 (en) | 2004-12-22 | 2005-12-21 | Probes, libraries and kits for analysis of mixtures of nucleic acids and methods for constructing the same |
Country Status (4)
Country | Link |
---|---|
EP (1) | EP1831394A2 (en) |
JP (1) | JP2008523828A (en) |
CA (1) | CA2593916A1 (en) |
WO (1) | WO2006066592A2 (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
EP2423324A4 (en) * | 2009-04-22 | 2013-02-13 | Vertex Pharma | Probe set for identification of nucleotide mutation, and method for identification of nucleotide mutation |
Family Cites Families (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2004024314A2 (en) * | 2002-09-11 | 2004-03-25 | Exiqon A/S | A population of nucleic acids including a subpopulation of lna oligomers |
AU2003275018B2 (en) * | 2002-09-20 | 2009-10-01 | Integrated Dna Technologies, Inc. | Anthraquinone quencher dyes, their methods of preparation and use |
JP4573833B2 (en) * | 2003-06-20 | 2010-11-04 | エクシコン・アクティーゼルスカブ | Probe, library and kit for analyzing nucleic acid mixture, and method for constructing the same |
2005
- 2005-12-21: EP application EP05804988A, published as EP1831394A2; status: withdrawn (not active)
- 2005-12-21: CA application CA002593916A, published as CA2593916A1; status: abandoned (not active)
- 2005-12-21: WO application PCT/DK2005/000815, published as WO2006066592A2; status: application filing (active)
- 2005-12-21: JP application JP2007547182A, published as JP2008523828A; status: withdrawn (not active)
Also Published As
Publication number | Publication date |
---|---|
EP1831394A2 (en) | 2007-09-12 |
JP2008523828A (en) | 2008-07-10 |
WO2006066592A3 (en) | 2006-08-24 |
WO2006066592A2 (en) | 2006-06-29 |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
US11111535B2 (en) | Probes, libraries and kits for analysis of mixtures of nucleic acids and methods for constructing the same | |
US8192937B2 (en) | Methods for quantification of microRNAs and small interfering RNAs | |
EP1735459B1 (en) | Methods for quantification of micrornas and small interfering rnas | |
US6344316B1 (en) | Nucleic acid analysis techniques | |
US20040219565A1 (en) | Oligonucleotides useful for detecting and analyzing nucleic acids of interest | |
CA2490466A1 (en) | Systems and methods for predicting oligonucleotide melting temperatures (tms) | |
EP1975256B1 (en) | Modified oligonucleotides for mismatch discrimination | |
WO2008040355A2 (en) | Novel methods for quantification of micrornas and small interfering rnas | |
EP1994182A2 (en) | Degenerate nucleobase analogs | |
WO2004024314A2 (en) | A population of nucleic acids including a subpopulation of lna oligomers | |
US20060166238A1 (en) | Probes, libraries and kits for analysis of mixtures of nucleic acids and methods for constructing the same | |
US20060014183A1 (en) | Extendable probes | |
CN101090979A (en) | Probes, libraries and kits for analysis of mixtures of nucleic acids and methods for constructing the same | |
CA2529793C (en) | Probes, libraries and kits for analysis of mixtures of nucleic acids and methods for constructing the same | |
US20160289761A1 (en) | Oligonucleotides comprising a secondary structure and uses thereof | |
CA2593916A1 (en) | Probes, libraries and kits for analysis of mixtures of nucleic acids and methods for constructing the same | |
WO2005121358A2 (en) | Extendable probes | |
EP1944310A2 (en) | Modified oligonucleotides for mismatch discrimination | |
EP1882748A2 (en) | A population of nucleic acids including a subpopulation of LNA oligomers |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
FZDE | Dead |