JP2005503148A

JP2005503148A - Molecular interaction sites of RNase P RNA and methods of modulating the same

Info

Publication number: JP2005503148A
Application number: JP2003521807A
Authority: JP
Inventors: David J. Ecker
Original assignee: Isis Pharmaceuticals Inc
Current assignee: Ionis Pharmaceuticals Inc
Priority date: 2001-08-21
Filing date: 2002-08-19
Publication date: 2005-02-03
Also published as: WO2003016498A3; WO2003016498A2; CA2457318A1; EP1425292A2; IL160401A0

Abstract

Polynucleotides comprising a molecular interaction site of RNase P RNA having a particular secondary structure are provided. Also provided are methods of using such polynucleotides for virtual or actual screening of combinatorial libraries of compounds that bind to the polynucleotides. Also provided are methods of modulating the activity of RNase P RNA, or of prokaryotic cells containing it, with compounds identified by such virtual or actual screening.

Description

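[Technical Field]
[0001]
Field of the Invention
The present invention relates to the identification of molecular interaction sites of RNase P RNA, to virtual or actual screening of compounds that bind to such sites, and to the modulation of RNase P RNA activity by the compounds identified in such virtual and actual screening.
[0002]
Background of the Invention
Ribonuclease P (RNase P) is the endoribonuclease responsible for removing the leader sequence from tRNA precursors during maturation of the 5' end of tRNA. Altman et al., FASEB J., 1993, 7, 7-14, and Pace et al., J. Bacteriol., 1995, 177, 1919-1928. RNase P is a ribonucleoprotein whose catalytic function, at least in bacteria, is carried out by its RNA component (RNase P RNA) rather than by protein. Guerrier-Takada et al., Cell, 1983, 35, 849-857. Another feature of RNase P RNA is its ability to recognize the tertiary structure of its pre-tRNA substrate. Kahle et al., EMBO J., 1990, 9, 1929-1937. The secondary structure of bacterial RNase P RNA was first inferred from comparative sequence analysis (James et al., Cell, 1988, 52, 19-26) and was refined as further sequences became available (Haas et al., Science, 1991, 254, 853-856; Haas et al., Proc. Natl. Acad. Sci. USA, 1994, 91, 2527-2531; Brown et al., Nucl. Acids Res., 1993, 21, 671-679). In addition, the derivation of the three-dimensional architecture of bacterial RNase P RNAs from Escherichia coli (E. coli) and Bacillus subtilis is described in Massire et al., J. Mol. Biol., 1998, 279, 773-793, which is incorporated herein by reference in its entirety.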
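[0003]
Recent advances in genomics, molecular biology and structural biology have highlighted how RNA molecules participate in or control many of the events required for protein expression in cells. Rather than functioning as simple intermediaries, RNA molecules actively regulate their own transcription from DNA, splice and edit mRNA and tRNA molecules, synthesize peptide bonds in the ribosome, catalyze the migration of nascent proteins to the cell membrane, and fine-tune the rate of message translation. RNA molecules can adopt a variety of unique structural motifs, which provide the framework required to perform these functions.
[0004]
"Small" molecule therapeutics, which bind specifically to structured RNA molecules, are organic chemical molecules that are not polymers. "Small" molecule therapeutics include, for example, the most powerful naturally occurring antibiotics. For example, the aminoglycoside and macrolide antibiotics are "small" molecules that are believed to act by binding to defined regions of ribosomal RNA (rRNA) structure and interfering with the conformational changes in the RNA that are required for protein synthesis. In addition, changes in the conformation of RNA molecules have been shown to regulate the rates of transcription and translation of mRNA. Small molecules are generally less than 10 kDa.
[0005]
Applicants believe that an RNA molecule, or a group of related RNA molecules, has regulatory regions that are used by the cell in protein synthesis. The cell is believed to exercise control over both the timing and the amount of protein that is synthesized through direct, specific interactions with RNA. This notion is at odds with the impression obtained by reading the scientific literature on gene regulation, which is highly focused on transcription. The processes of RNA maturation, transport, intracellular localization and translation are rich in RNA recognition sites that provide good opportunities for drug binding. Applicants' invention is directed, inter alia, to finding these sites in RNA molecules, particularly RNase P RNA, in the genomes of microorganisms. Applicants' invention further employs combinatorial chemistry to make and/or screen, actually or virtually, a large number of chemical entities for their ability to bind to and/or modulate these drug binding sites.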
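[0006]
Determining the possible three-dimensional structures of nucleic acids and their attendant structural motifs provides insight into areas such as the study of catalysis by RNA, RNA-RNA interactions, RNA-nucleic acid interactions, RNA-protein interactions, and the recognition of small molecules by nucleic acids. Four general approaches to generating model three-dimensional structures of RNA have been demonstrated in the literature. All of these employ sophisticated molecular modeling and computational algorithms to simulate folding and tertiary interactions within target nucleic acids such as RNA. Westhof and Altman (Proc. Natl. Acad. Sci., 1994, 91, 5133, which is incorporated herein by reference in its entirety) described the generation of a three-dimensional working model of M1 RNA, the catalytic RNA subunit of RNase P from E. coli, via an interactive computer modeling protocol. Mueller and Brimacombe (J. Mol. Biol., 1997, 271, 524) built a three-dimensional model of E. coli 16S ribosomal RNA using the substantial body of work on ribosomal RNA in the areas of cryo-electron microscopy (cryo-EM) and biochemical studies. A method for modeling nucleic acid hairpin motifs has been developed based on a set of reduced coordinates for describing nucleic acid structures and a sampling algorithm that equilibrates structures using Monte Carlo (MC) simulations (Tung, Biophysical J., 1997, 72, 876, which is incorporated herein by reference in its entirety). The MC-SYM program is yet another approach to predicting the three-dimensional structure of RNA, using a constraint-satisfaction method (Major et al., Proc. Natl. Acad. Sci., 1993, 90, 9408). MC-SYM is a constraint-satisfaction algorithm that searches conformational space for all models that satisfy the query input constraints, and is described in, for example, Cedergren et al., RNA Structure And Function, 1998, Cold Spring Harbor Lab. Press, p. 37-75. According to this method, three-dimensional structures of RNA are produced by the stepwise addition of nucleotides having one or several different conformations to a growing oligonucleotide model.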
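[0007]
Westhof and Altman (Proc. Natl. Acad. Sci., 1994, 91, 5133) described the generation of a three-dimensional working model of M1 RNA, the catalytic RNA subunit of RNase P from E. coli, via an interactive computer modeling protocol. The modeling protocol incorporated data from chemical and enzymatic protection experiments, phylogenetic analysis, studies of the activities of mutants, and the kinetics of reactions catalyzed by the binding of substrate to M1 RNA. Modeling was performed, for the most part, as described in the literature (Westhof et al., in "Theoretical Biochemistry and Molecular Biophysics", Beveridge and Lavery (Eds.), Adenine Press, NY, 1990, 399). In general, starting from the primary sequence of M1 RNA, the stem-loop structures and other elements of secondary structure were generated. Subsequent assembly of these elements into a three-dimensional structure using a computer graphics station and FRODO (Jones, J. Appl. Crystallogr., 1978, 11, 268), followed by refinement using NUCLIN-NUCLSQ, afforded an RNA model with correct geometry, an absence of bad contacts, and appropriate stereochemistry. The model so generated was found to be consistent with a large body of empirical data on M1 RNA and gave rise to hypotheses about the mechanism of action of RNase P. The models generated by this method, however, are less well resolved than structures determined by X-ray crystallography.
[0008]
Mueller and Brimacombe (J. Mol. Biol., 1997, 271, 524, which is incorporated herein by reference in its entirety) generated a three-dimensional model of E. coli 16S ribosomal RNA using a modeling program called ERNA-3D. This program generates three-dimensional structures, such as A-form RNA helices and single-stranded regions, via dynamic docking of single strands to fit electron density obtained from low-resolution diffraction data. After the helical elements have been determined and positioned in the model, the configurations of the single-stranded regions are adjusted to satisfy any known biochemical constraints, such as RNA-protein cross-linking and footprinting data.
[0009]
A method for modeling nucleic acid hairpin motifs has been developed based on a set of reduced coordinates for describing nucleic acid structures and a sampling algorithm that equilibrates structures using Monte Carlo (MC) simulations (Tung, Biophysical J., 1997, 72, 876, which is incorporated herein by reference in its entirety). The stem region of a nucleic acid can be adequately modeled by using canonical duplex formation. Using a set of reduced coordinates, an algorithm was created that is capable of generating structures of single-stranded loops with a pair of fixed ends. This allows efficient structural sampling of the loop in conformational space. Combining this algorithm with a modified Metropolis Monte Carlo algorithm yields a structure simulation package that simplifies the study of nucleic acid hairpin structures by computational means. Once RNA subdomains have been identified, they can be stabilized, if desired, by the methods disclosed in U.S. Patent No. 5,712,096.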
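[0010]
Although X-ray crystallography is a very powerful technique that allows determination of some secondary and tertiary structures of biopolymer targets (Erikson et al., Ann. Rep. in Med. Chem., 1992, 27, 271-289), it is an expensive procedure and can be very difficult to accomplish. Crystallization of biopolymers is extremely challenging, is difficult to perform at adequate resolution, and is often regarded as much an art as a science. Further confounding the utility of X-ray crystal structures in the drug discovery process is the inability of crystallography to provide insight into the solution-phase, and therefore biologically relevant, structure of the target of interest. Some analysis of the nature and strength of the interaction between a ligand (agonist, antagonist or inhibitor) and its target can be performed by ELISA (Kemeny and Challacombe, in ELISA and other Solid Phase Immunoassays: 1988), radioligand binding assays (Berson et al., Clin. 1968; Chard, in "An Introduction to Radioimmunoassay and Related Techniques", 1982), surface plasmon resonance (Karlsson et al., 1991; Jonsson et al., Biotechniques, 1991), or scintillation proximity assays (Udenfriend et al., Anal. Biochem., 1987), all cited previously. Radioligand binding assays are typically useful only for assessing the competitive binding of an unknown at the binding site against a radioligand, and they also require the use of radioactivity. The surface plasmon resonance technique is more straightforward to use but is also quite costly. Conventional biochemical assays of binding kinetics, and of dissociation and association constants, are also helpful in elucidating the nature of target-ligand interactions.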
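[0011]
Accordingly, one aspect of the present invention identifies molecular interaction sites in RNase P RNA. These molecular interaction sites, which comprise secondary structural elements, are likely to give rise to significant therapeutic, regulatory or other interactions with "small" molecules and the like. Another aspect of the invention is to compare molecular interaction sites of RNase P RNA with compounds proposed for interaction therewith.
[0012]
Yet another aspect of the invention is the establishment of databases of numerical representations of the three-dimensional structures of molecular interaction sites of RNase P RNA. Such database libraries are powerful tools for elucidating the structures of molecular interaction sites and their interactions with potential ligands, and for predicting them. Another aspect of the invention is to provide a general method for screening combinatorial libraries, comprising individual compounds or mixtures of compounds, against RNase P RNA so as to determine which components of the library bind to the target.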
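[0013]
Summary of the Invention
The present invention is directed to the identification of molecular interaction sites of RNase P RNA that comprise particular secondary structures.
The invention is further directed to nucleic acid molecules, polynucleotides or oligonucleotides comprising the molecular interaction sites, which can be used to screen, virtually or actually, combinatorial libraries of compounds that bind to the molecular interaction sites.
The invention is further directed to computer-readable media comprising three-dimensional representations of the structures of the molecular binding sites.
[0014]
The invention is further directed to modulating the activity of RNase P RNA by contacting RNase P RNA, or a prokaryotic cell comprising the same, with a compound identified in such virtual or actual screening.
The invention is also directed to modulating prokaryotic cell growth, comprising contacting a prokaryotic cell with a compound identified in such virtual or actual screening.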
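[0015]
Description of Preferred Embodiments of the Invention
The present invention is directed, inter alia, to the identification of molecular interaction sites of RNase P RNA. Such molecular interaction sites comprise secondary structure capable of interacting with cellular components, such as factors and proteins, that are required for translation and other cellular processes. Nucleic acid molecules or polynucleotides comprising the molecular interaction sites can be used to screen, virtually or actually, combinatorial libraries of compounds that bind thereto. Because compounds identified by such screening are used to modulate the activity of RNase P RNA, such compounds can be used to modulate, that is, to inhibit or stimulate, prokaryotic cell growth. Accordingly, novel drugs, agricultural chemicals, industrial chemicals and the like that act through modulation of RNase P RNA can be identified.
[0016]
It is preferred to integrate several methods and protocols in order to identify potent drugs and other biologically useful compounds. Pharmaceuticals, veterinary drugs, agricultural chemicals, pesticides, herbicides, fungicides, industrial chemicals, research chemicals, and many other beneficial compounds useful in pollution control, industrial biochemistry and biocatalytic systems can be identified in accordance with embodiments of the invention. The novel combination of several techniques gives the methods of the invention great power and versatility. While it is preferred in some embodiments to integrate several processes developed by the assignee of the present application, as described in further detail herein, it should be recognized that other methodologies can be integrated with the invention to good effect. Thus, while it is highly advantageous to determine molecular binding sites on RNase P RNA in accordance with the teachings of the invention, the interaction of ligands and ligand mixtures with other RNase P RNAs identified as important can also benefit greatly from other aspects of the invention. All such combinations are within the scope of the invention.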
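[0017]
One aspect of Applicants' invention is directed to the identification of secondary structures in RNase P RNA, termed "molecular interaction sites". As used herein, a "molecular interaction site" is a region of RNase P RNA that has secondary structure. Molecular interaction sites can be conserved among a plurality of different taxonomic species of RNase P RNA. A molecular interaction site is a small, independently folded functional subdomain contained within a larger RNA molecule, preferably less than 200 nucleotides, preferably less than 150 nucleotides, preferably less than 70 nucleotides, preferably less than 50 nucleotides, or alternatively less than 30 nucleotides. Molecular interaction sites can contain both single-stranded and double-stranded regions. Thus, molecular interaction sites are capable of undergoing interaction with "small" molecules and otherwise, and are expected to serve as sites for interaction with "small" molecules, oligomers such as oligonucleotides, and other compounds in therapeutic and other applications. Molecular interaction sites further include pockets for binding small molecules, drugs and the like.
[0018]
Molecular interaction sites are present at least within RNase P RNA. It is understood that, in accordance with some embodiments of the invention, the RNase P RNA having the molecular interaction site or sites can be derived from a number of sources. Thus, such RNase P RNA can be identified by any means, rendered into three-dimensional representations, and used to identify compounds that can interact with the RNase P RNA to effect its modulation. In some embodiments, the molecular interaction sites identified in RNase P RNA are absent from eukaryotes, particularly humans, and can therefore serve as "small" molecule binding sites for modulating prokaryotic RNase P RNA without concomitant toxicity to humans.
[0019]
Molecular interaction sites can be identified by any means known to those skilled in the art. In some embodiments of the invention, molecular interaction sites in RNase P RNA are identified according to the general methods described in International Publication WO 99/58719, which is incorporated herein by reference in its entirety. Briefly, a target RNase P RNA nucleotide sequence is chosen from among known sequences. Any RNase P RNA nucleotide sequence can be chosen. The nucleotide sequence of the target RNase P RNA is compared with the nucleotide sequences of a plurality of RNase P RNAs from different taxonomic species. At least one sequence region that is effectively conserved among the plurality of RNase P RNAs and the target RNase P RNA is identified. Such a conserved region is examined to determine whether any secondary structure is present and, for conserved regions having secondary structure, that secondary structure is identified.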
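[0020]
In accordance with some embodiments of the invention, the nucleotide sequence of the target RNase P RNA is compared with the nucleotide sequences of a plurality of corresponding RNase P RNAs from different taxonomic species. The initial selection of a particular target nucleic acid can be based on any functional criterion. For example, RNase P RNAs known to be involved in pathogenic genomes, such as those of bacteria and yeast, are specific targets. Pathogenic bacteria and yeast are well known to those skilled in the art. Other RNase P RNA targets can be determined independently or can be selected from publicly available prokaryotic genetic databases known to those skilled in the art. Databases include, for example, Online Mendelian Inheritance in Man (OMIM), the Cancer Genome Anatomy Project (CGAP), GenBank, EMBL, PIR, SWISS-PROT and the like. OMIM, a database of genetic mutations associated with disease, was developed in part for the National Center for Biotechnology Information (NCBI). OMIM can be accessed through the World Wide Web of the Internet at, for example, ncbi.nlm.nih.gov/Omim/. CGAP, an interdisciplinary program to establish the information and technological tools required to decipher the molecular anatomy of the cancer cell, can be accessed through the World Wide Web of the Internet at, for example, ncbi.nlm.nih.gov/ncicgap/. Some of these databases may contain complete or partial nucleotide sequences. In addition, RNase P RNA targets can be selected from private genetic databases. Alternatively, RNase P RNA targets can be selected from available publications or can be determined especially for use in connection with the present invention.
[0021]
After an RNase P RNA target has been selected or provided, the nucleotide sequence of the RNase P RNA target is determined and then compared with the nucleotide sequences of a plurality of RNase P RNAs from different taxonomic species. In one embodiment of the invention, the nucleotide sequence of the RNase P RNA target is determined by scanning at least one genetic database or is identified in available publications. Databases known and available to those skilled in the art include, for example, GenBank. These databases can be used together with searching programs such as Entrez, which is known and available to those skilled in the art. Entrez can be accessed through the World Wide Web of the Internet at, for example, ncbi.nlm.nih.gov/Entrez/. Preferably, the most complete nucleic acid sequence representation available from the various databases is used. The GenBank database, which is known and available to those skilled in the art, can also be used to obtain the most complete nucleotide sequence. GenBank is the NIH genetic sequence database, an annotated collection of all publicly available DNA sequences. GenBank is described in, for example, Nuc. Acids Res., 1998, 26, 1-7, which is incorporated herein by reference in its entirety, and can be accessed by those skilled in the art through the World Wide Web of the Internet at, for example, ncbi.nlm.nih.gov/Web/Genbank/index.html. Alternatively, a partial nucleotide sequence of the RNase P RNA target can be used when the complete nucleotide sequence is not available.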
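[0022]
The nucleotide sequence of the RNase P RNA target is compared with the nucleotide sequences of a plurality of RNase P RNAs from different taxonomic species. The plurality of RNase P RNAs from different taxonomic species, and their nucleotide sequences, can be found in genetic databases or in available publications, or can be determined especially for use in connection with the present invention. In one embodiment of the invention, the RNase P RNA target is compared with the nucleotide sequences of a plurality of RNase P RNAs from different taxonomic species by performing a sequence similarity search, an ortholog search, or both; such searches are known to those skilled in the art.
[0023]
The result of a sequence similarity search is a plurality of RNase P RNAs having at least a portion of their nucleotide sequences homologous to a region of at least 8 to 20 nucleotides of the target RNase P RNA, referred to as the window region. Preferably, the plurality of RNase P RNAs comprise at least one portion that is at least 60% homologous to any window region of the target RNase P RNA. More preferably, the homology is at least 70%. Even more preferably, the homology is at least 80%. Most preferably, the homology is at least 90% or 95%. For example, the window size, that is, the portion of the target RNase P RNA to which the plurality of sequences are compared, can be from about 8 to about 20 contiguous nucleotides, preferably from about 10 to about 15 contiguous nucleotides, and most preferably from about 11 to about 12 contiguous nucleotides. The window size can be adjusted accordingly. The plurality of RNase P RNAs from different taxonomic species are then compared with each possible window of the target RNase P RNA, preferably until all portions of the plurality of sequences have been compared with the windows of the target RNase P RNA. Sequences of the plurality of RNase P RNAs from different taxonomic species that have portions at least 60%, preferably at least 70%, more preferably at least 80%, or most preferably at least 90% homologous to any window sequence of the target RNase P RNA are considered likely homologous sequences.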
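The window comparison described in this paragraph can be sketched as follows. This is a minimal illustration and not the procedure of any particular search program: it assumes ungapped windows, treats "homology" as simple percent identity, and the function names `window_identity` and `find_window_hits` and the example sequences are hypothetical.

```python
def window_identity(a: str, b: str) -> float:
    """Fraction of positions at which two equal-length windows agree."""
    if len(a) != len(b):
        raise ValueError("windows must have equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def find_window_hits(target: str, query: str, window: int = 11,
                     min_identity: float = 0.60):
    """Compare every window of `target` with every window of `query`.

    Returns (target_start, query_start, identity) for each ungapped window
    pair whose identity is at least `min_identity`. Assumption: identity
    stands in for the "homology" percentage used in the text.
    """
    hits = []
    for i in range(len(target) - window + 1):
        t_win = target[i:i + window]
        for j in range(len(query) - window + 1):
            identity = window_identity(t_win, query[j:j + window])
            if identity >= min_identity:
                hits.append((i, j, identity))
    return hits


if __name__ == "__main__":
    # Hypothetical sequences, used only to exercise the function.
    target = "GGAAGCUGACCAGACAGUCGCC"
    query = "UUAAGCUGACCAGGCAGUAA"
    for t_start, q_start, identity in find_window_hits(target, query, 11, 0.90):
        print(t_start, q_start, f"{identity:.2f}")
```

The window size and identity cut-off are parameters, which mirrors the adjustable window of about 8 to about 20 nucleotides and the 60% to 95% thresholds given above.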
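[0024]
Sequence similarity searches can be performed manually or by using several available computer programs known to those skilled in the art. Preferably, Blast and Smith-Waterman algorithms, which are available and known to those skilled in the art, and the like can be used. Blast is NCBI's sequence similarity search tool, designed to support analysis of nucleotide and protein sequence databases. Blast can be accessed through the World Wide Web of the Internet at, for example, ncbi.nlm.nih.gov/BLAST/. The GCG Package provides a local version of Blast that can be used with either public domain databases or any locally available searchable database. GCG Package v. 9.0 is a commercially available software package that contains over 100 interrelated software programs and enables analysis of sequences by editing, mapping, comparing and aligning them. Other programs included in the GCG Package include, for example, programs that facilitate RNA secondary structure prediction, nucleic acid fragment assembly, and evolutionary analysis. In addition, the most prominent genetic databases (GenBank, EMBL, PIR and SWISS-PROT) are distributed with the GCG Package and are fully accessible with the database searching and manipulation programs. GCG can be accessed through the World Wide Web of the Internet at, for example, gcg.com/. Fetch is a tool available in GCG that retrieves annotated GenBank records based on accession numbers and is similar to Entrez. Other sequence similarity searches can be performed with GeneWorld and GeneThesaurus from Pangea. GeneWorld 2.5 is an automated, flexible, high-throughput application for the analysis of polynucleotide and protein sequences. GeneWorld allows automatic analysis and annotation of sequences. Like GCG, GeneWorld incorporates several tools for homology searching, gene finding, multiple sequence alignment, secondary structure prediction, and motif identification. GeneThesaurus 1.0™ is a sequence and annotation data subscription service that provides information from multiple sources and a relational data model for public and local data.
[0025]
Another alternative sequence similarity search can be performed, for example, by BlastParse. BlastParse is a PERL script running on a UNIX® platform that automates the strategy described above. BlastParse takes a list of target accession numbers of interest and parses all the GenBank fields into "tab-delimited" text, which can then be loaded into a "relational database" format for easier and more flexible search and analysis. The end result is a series of completely parsed GenBank records that can be easily sorted, filtered and queried, as well as an annotations-relational database.
[0026]
Another toolkit capable of sequence similarity searching and data manipulation is SEALS, also from NCBI. This tool set is written in Perl and C and can run on any computer platform that supports these languages. It is available for download through the World Wide Web of the Internet at, for example, ncbi.nlm.nih.gov/Walker/SEALS/. This toolkit provides access to Blast2 or gapped Blast. It also includes a tool called tax-collector which, in conjunction with a tool called tax-break, parses the output of Blast2 and returns the identifier of the sequence most homologous to the query sequence for each species present. Another useful tool is feature2fasta, which extracts sequence fragments from an input sequence based on its annotation.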
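[0027]
Preferably, the plurality of RNase P RNAs from different taxonomic species that have homology to the target nucleic acid, as described above for the sequence similarity search, are further delineated so as to find orthologs of the target RNase P RNA among them. An ortholog is a term defined in gene classification to refer to two genes in widely divergent organisms that have sequence similarity and perform similar functions within the context of the organism. In contrast, paralogs are genes within a species that arise through gene duplication but have evolved new functions, and are also referred to as isotypes. Optionally, paralog searches can also be performed. By performing an ortholog search, an exhaustive list of homologous sequences from diverse organisms is obtained. These sequences are then analyzed to select the best representative sequence that fits the criteria for being an ortholog. An ortholog search can be performed by programs available to those skilled in the art including, for example, Compare. Preferably, an ortholog search is performed with access to complete, parsed GenBank annotations for each of the sequences. Currently, the records obtained from GenBank are "flat-files" and are not ideally suited for automated analysis. Preferably, the ortholog search is performed using the Q-Compare program. The Blast Results-Relation database and the Annotations-Relational database are used in the Q-Compare protocol, which yields a list of ortholog sequences to be compared in the interspecies sequence comparison programs described below.
[0028]
The above similarity searches provide results based on a cut-off value, referred to as the e-score. The e-score represents the probability of a random sequence match within a given window of nucleotides. The lower the e-score, the better the match.
Those skilled in the art are familiar with e-scores. The user defines the e-value cut-off depending on the stringency, or the degree of homology desired, as described above. In some embodiments of the invention, it is preferred that none of the homologous RNase P RNA nucleotide sequences that are identified is present in the human genome.
[0029]
In another embodiment of the invention, the required sequences are obtained by searching ortholog databases. One such database is Hovergen, a curated database of vertebrate orthologs. Ortholog sets can be exported from this database and used as is, or used as seeds for further sequence similarity searches as described above. Further searches may be desired, for example, to find invertebrate orthologs. Hovergen can be downloaded by file transfer at, for example, pbil.univ-lyon1.fr/pub/hovergen/. A database of prokaryotic orthologs, COGS, is available and can be used interactively through the World Wide Web of the Internet at, for example, ncbi.nlm.nih.gov/COG/.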
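[0030]
After the orthologs or virtual transcripts have been obtained through either the sequence similarity search or the ortholog search, at least one sequence region that is conserved among the plurality of RNase P RNAs from different taxonomic species and the target RNase P RNA is identified. Interspecies sequence comparison can be performed using numerous computer programs that are available and known to those skilled in the art. Preferably, interspecies sequence comparison is performed using Compare, which is available and known to those skilled in the art. Compare is a GCG tool that allows pairwise comparisons of sequences using a window/stringency criterion. Compare produces an output file containing the points at which matches of the specified quality are found. These can be plotted with another GCG tool, DotPlot.
[0031]
Alternatively, the identification of conserved sequence regions is performed by interspecies sequence comparison using the ortholog sequences obtained from Q-Compare in combination with CompareOverWins. Preferably, the list of sequences to compare, that is, the ortholog sequences obtained from Q-Compare, is entered into the CompareOverWins algorithm. Preferably, interspecies sequence comparison is performed by a pairwise sequence comparison in which a query sequence is slid over a window on the master target sequence. Preferably, the window is from about 9 to about 99 contiguous nucleotides.
[0032]
Sequence homology between the target RNase P RNA and a query sequence from any of the plurality of RNase P RNAs obtained as described above is preferably at least 60%, more preferably at least 70%, even more preferably at least 80%, and most preferably at least 90% or 95%. The most preferred method of choosing the threshold is to have the computer automatically try all thresholds from 50% to 100% and choose a threshold based on a metric provided by the user. One such metric is to pick the threshold such that exactly n hits are returned, where n is usually set to 3. This process is repeated until every base on a query nucleic acid, which is a member of the plurality of RNase P RNAs, has been compared with every base on the master target sequence. The resulting scoring matrix can be plotted as a scatter plot. Depending on the match density at a given location, there may be no dots, isolated dots, or a set of dots so close together that they appear as a line. The presence of lines, however small, indicates primary sequence homology. Sequence conservation within RNase P RNA in divergent species is likely to be an indicator of conserved regulatory elements that are also likely to have secondary structure. The results of the interspecies sequence comparison can be analyzed using MS Excel and Visual Basic tools in a fully automated manner, as is known to those skilled in the art.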
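The automatic threshold selection described above (try every threshold from 50% to 100% and keep one that returns exactly n hits) might look like the sketch below. The text does not say which threshold to keep when several qualify, so returning the most stringent one is an assumption; `pick_threshold` and `count_hits` are hypothetical names, and the identities are given as percentages.

```python
def count_hits(identities, threshold):
    """Number of window comparisons whose percent identity meets the threshold."""
    return sum(1 for value in identities if value >= threshold)


def pick_threshold(identities, n_hits=3, low=50, high=100):
    """Try every integer threshold from `low` to `high` percent.

    Returns the highest threshold that yields exactly `n_hits` hits, or None
    if no threshold in the range does. Keeping the highest (most stringent)
    qualifying threshold is an assumption of this sketch.
    """
    chosen = None
    for pct in range(low, high + 1):
        if count_hits(identities, pct) == n_hits:
            chosen = pct
    return chosen


if __name__ == "__main__":
    # Hypothetical percent identities from a set of pairwise window comparisons.
    identities = [95.0, 88.0, 72.0, 64.0, 51.0]
    print(pick_threshold(identities, n_hits=3))
```

A user-supplied metric other than "exactly n hits" could be substituted by replacing the equality test inside the loop.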
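[0033]
After at least one region that is conserved between the nucleotide sequence of the RNase P RNA target and the nucleotide sequences of the plurality of RNase P RNAs from different taxonomic species, preferably orthologs, has been identified, the conserved region is analyzed to determine whether it contains secondary structure. Determining whether the identified conserved regions contain secondary structure can be performed by a number of procedures known to those skilled in the art. Determination of secondary structure is preferably performed by self-complementarity comparison, alignment and covariance analysis, secondary structure prediction, or a combination thereof.
[0034]
In one embodiment of the invention, secondary structure analysis is performed by alignment and covariance analysis. Numerous protocols for alignment and covariance analysis are known to those skilled in the art. Preferably, alignment is performed by ClustalW, which is available and known to those skilled in the art. ClustalW is a tool for multiple sequence alignment that, although not a part of GCG, can be added as an extension of the existing GCG tool set and used with local sequences. ClustalW can be accessed through the World Wide Web of the Internet at, for example, dot.imgen.bcm.tmc.edu:9331/multi-align/Options/clustalw.html. ClustalW is also described in Thompson et al., Nuc. Acids Res., 1994, 22, 4673-4680, which is incorporated herein by reference in its entirety. These processes can be scripted to automatically use the conserved UTR regions identified in earlier steps. Seqed, a UNIX® command line interface available and known to those skilled in the art, allows extraction of selected local regions from a larger sequence. Multiple sequences from many different species can be clustered and aligned for further analysis.
[0035]
In another embodiment of the invention, the output of all possible pairwise CompareOverWindows® comparisons is compiled and aligned to a reference sequence using a program called AlignHits, a program that can be reproduced by one skilled in the art. One purpose of this program is to map all hits made in the pairwise comparisons back to positions on the reference sequence. This method, combining CompareOverWindow and AlignHits, produces larger local alignments (spanning 20 to 100 bases) than any other algorithm. This local alignment is required for the structure finding routines described later, such as covariation or RevComp. The algorithm writes a fasta file of the aligned sequences. It is important to distinguish this from using ClustalW alone, without CompareOverWindows® and AlignHits.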
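[0036]
Covariation is a process that uses phylogenetic analysis of primary structure information for consensus secondary structure prediction. Covariation is described in the following references, each of which is incorporated herein by reference in its entirety: Gutell et al., "Comparative Sequence Analysis Of Experiments Performed During Evolution" In Ribosomal RNA Group I Introns, Green, Ed., Austin: Landes, 1996; Gautheret et al., Nuc. Acids Res., 1997, 25, 1559-1564; Gautheret et al., RNA, 1995, 1, 807-814; Lodmell et al., Proc. Natl. Acad. Sci. USA, 1995, 92, 10555-10559; Gautheret et al., J. Mol. Biol., 1995, 248, 27-43; Gutell, Nuc. Acids Res., 1994, 22, 3502-3517; Gutell, Nuc. Acids Res., 1993, 21, 3055-3074; Gutell, Nuc. Acids Res., 1993, 21, 3051-3054; Woese, Proc. Natl. Acad. Sci. USA, 1989, 86, 3119-3122; and Woese et al., Nuc. Acids Res., 1980, 8, 2275-2293. Preferably, covariance software is used for the covariance analysis. Preferably, Covariation, a set of programs for the comparative analysis of RNA structure from sequence alignments, is used. Covariation uses phylogenetic analysis of primary sequence information for consensus secondary structure prediction. Covariation can be obtained through the World Wide Web of the Internet at, for example, mbio.ncsu.edu/RNaseP/info/programs/programs.html. A complete description of a version of the program has been published (Brown, J. W. 1991, Phylogenetic analysis of RNA structure on the Macintosh computer. CABIOS 7:391-393). The current version is v4.1, which can perform various types of covariation analysis from RNA sequence alignments, including standard covariation analysis, identification of compensatory base-changes, and mutual information analysis. The program is well documented and comes with extensive example files. It is compiled as a stand-alone program; it does not require Hypercard (although a much smaller "stack" version is included). The program runs in any Macintosh environment running MacOS v7.1 or higher. Faster processor machines (68040 or PowerPC) are suggested for mutual information analysis or for the analysis of large sequence alignments.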
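To make the mutual information analysis mentioned above concrete, the following sketch scores every pair of alignment columns by mutual information (in bits) and reports the pairs that covary. It is not the Covariation program itself: the function names, the cut-off `min_mi`, and the toy alignment are hypothetical, and gap handling and small-sample corrections are deliberately omitted.

```python
import math
from collections import Counter


def mutual_information(col_i, col_j):
    """Mutual information (bits) between two alignment columns of equal length."""
    n = len(col_i)
    freq_i = Counter(col_i)
    freq_j = Counter(col_j)
    freq_ij = Counter(zip(col_i, col_j))
    mi = 0.0
    for (a, b), count in freq_ij.items():
        p_ab = count / n
        mi += p_ab * math.log2(p_ab / ((freq_i[a] / n) * (freq_j[b] / n)))
    return mi


def covarying_pairs(alignment, min_mi=1.0):
    """Return (i, j, MI) for column pairs with MI >= min_mi, highest first.

    `alignment` is a list of equal-length aligned sequences. Assumption:
    no gap characters are present.
    """
    columns = list(zip(*alignment))
    pairs = []
    for i in range(len(columns)):
        for j in range(i + 1, len(columns)):
            mi = mutual_information(columns[i], columns[j])
            if mi >= min_mi:
                pairs.append((i, j, mi))
    return sorted(pairs, key=lambda item: item[2], reverse=True)


if __name__ == "__main__":
    # Toy alignment: columns 0 and 3 change together (compensatory changes),
    # while columns 1 and 2 are invariant.
    alignment = ["GAAC", "CAAG", "AAAU", "UAAA"]
    for i, j, mi in covarying_pairs(alignment):
        print(i, j, round(mi, 3))
```

High mutual information between two columns suggests, but does not prove, that the positions are base paired; invariant columns score zero and so carry no covariation signal.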
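[0037]
In another embodiment of the invention, secondary structure analysis is performed by secondary structure prediction. There are a number of algorithms that predict RNA secondary structures based on thermodynamic parameters and energy calculations. Preferably, secondary structure prediction is performed using either M-fold or RNA Structure 2.52. M-fold can be accessed through the World Wide Web at, for example, ibc.wustl.edu/~zuker/rna/form2.cgi, or can be downloaded for local use on UNIX® platforms. M-fold is also available as a part of the GCG package. RNA Structure 2.52 is a Windows adaptation of the M-fold algorithm and can be accessed through the World Wide Web of the Internet at, for example, 128.151.176.70/RNAstructure.html.
[0038]
In another embodiment of the invention, secondary structure analysis is performed by self-complementarity comparison. Preferably, self-complementarity comparison is performed using Compare, described above. More preferably, Compare can be modified to expand the pairing matrix to account for G-U or U-G base pairs in addition to the conventional Watson-Crick G-C/C-G or A-U/U-A pairs. Such a modified Compare program (modified Compare) begins by predicting all possible base pairings within a given sequence. As described above, a small but conserved region is identified based on primary sequence comparison of a series of orthologs. In modified Compare, each of these sequences is compared with its own reverse complement. Allowable base pairings include Watson-Crick A-U and G-C pairing and non-canonical G-U pairing. An overlay of such self-complementarity plots of all available orthologs, and selection of the most repetitive pattern in each, results in a minimal number of possible folded configurations. These overlays can then be used in conjunction with additional constraints, including those imposed by the energy considerations described above, to deduce the most likely secondary structure.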
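A self-complementarity comparison with the expanded pairing matrix can be sketched as below. This is an illustration under stated assumptions rather than the modified Compare program: it enumerates contiguous antiparallel stems directly, which is equivalent to scanning a sequence against its own reverse complement, and the names `find_stems`, `min_len` and `min_loop` are hypothetical.

```python
# Allowed pairs: Watson-Crick A-U and G-C plus the non-canonical G-U wobble.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def can_pair(x: str, y: str) -> bool:
    return (x, y) in PAIRS


def find_stems(seq: str, min_len: int = 4, min_loop: int = 3):
    """Find contiguous stems in an RNA sequence.

    Returns (i, j, length) tuples meaning that seq[i + k] pairs with
    seq[j - k] for k in range(length). Only maximal runs are reported,
    and at least `min_loop` unpaired bases must separate the two strands.
    """
    n = len(seq)
    stems = []
    for i in range(n):
        for j in range(i + min_loop + 1, n):
            if not can_pair(seq[i], seq[j]):
                continue
            # Skip pairs that merely continue a run starting further out.
            if i > 0 and j < n - 1 and can_pair(seq[i - 1], seq[j + 1]):
                continue
            length = 0
            while (i + length < j - length - min_loop
                   and can_pair(seq[i + length], seq[j - length])):
                length += 1
            if length >= min_len:
                stems.append((i, j, length))
    return stems


if __name__ == "__main__":
    # Hypothetical hairpin whose third base pair is a G-U wobble.
    print(find_stems("GGGGAAAACUCC"))
```

In the example hairpin the stem is found only because the G-U pair is allowed; with a strictly Watson-Crick matrix the run would be broken after two pairs.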
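[0039]
In another embodiment of the invention, the output of AlignHits is read by a program called RevComp. This program can be reproduced by one skilled in the art. One purpose of this program is to use base pairing rules and ortholog evolution to predict RNA secondary structure. RNA secondary structures are composed of single-stranded regions and base-paired regions called stems. Because structures conserved by evolution are being searched for, the most probable stem for a given alignment of ortholog sequences is the one that can be formed by the most sequences. Possible stem formation, or base pairing rules, is determined by, for example, analyzing base pairing statistics of stems that have been determined by other techniques such as NMR. The output of RevComp is a sorted list of possible structures, ranked by the percentage of ortholog set member sequences that could form the structure. Because this approach uses a percentage threshold, it is insensitive to noise sequences. Noise sequences are sequences that either are not true orthologs or reached the output of AlignHits because of high sequence homology even though they do not represent an example of the structure being searched for. A very similar algorithm has been implemented using Visual Basic for Applications (VBA) and Microsoft Excel to run on PCs, generating a reverse complement matrix view for a given set of sequences.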
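The ranking idea behind RevComp, scoring each candidate stem by the fraction of aligned ortholog sequences that can form it, can be sketched as follows. RevComp itself is not publicly specified in this text, so this is only an illustration: the pairing rules are fixed to A-U, G-C and G-U rather than derived from base pairing statistics, and `forms_stem`, `rank_stems` and the toy alignment are hypothetical.

```python
# Assumed pairing rules: Watson-Crick pairs plus the G-U wobble.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def forms_stem(seq, i, j, length):
    """True if seq[i + k] can pair with seq[j - k] for every k in range(length)."""
    return all((seq[i + k], seq[j - k]) in PAIRS for k in range(length))


def rank_stems(aligned, candidates, min_fraction=0.75):
    """Rank candidate stems by the fraction of aligned sequences that form them.

    `aligned` is a list of equal-length sequences sharing one coordinate
    system; `candidates` is a list of (i, j, length) stems in those
    coordinates. Stems supported by fewer than `min_fraction` of the
    sequences are dropped, so a few noise sequences cannot veto a stem.
    """
    ranked = []
    for i, j, length in candidates:
        support = sum(forms_stem(s, i, j, length) for s in aligned) / len(aligned)
        if support >= min_fraction:
            ranked.append(((i, j, length), support))
    ranked.sort(key=lambda item: item[1], reverse=True)
    return ranked


if __name__ == "__main__":
    # Three hypothetical orthologs with compensatory changes, plus one noise sequence.
    aligned = ["GGGGAAAACUCC", "GGAGAAAACUCC", "CCGGAAAACUGG", "AAAAAAAAAAAA"]
    print(rank_stems(aligned, [(0, 11, 4), (0, 7, 2)], min_fraction=0.5))
```

The noise sequence lowers the support of the real stem but does not eliminate it, which is the insensitivity to noise that the percentage threshold is meant to provide.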
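[0040]
The result of the secondary structure analysis described above, whether performed by alignment and covariation, self-complementarity analysis, or secondary structure prediction such as with M-fold or otherwise, is the identification of secondary structure in the conserved regions among the target RNase P RNA and the plurality of RNase P RNAs from different taxonomic species. Exemplary secondary structures that may be identified include, but are not limited to, bulges, loops, stems, hairpins, knots, triple interacts, cloverleafs, or helices, or a combination thereof. Alternatively, new secondary structures may be identified.
[0041]
The present invention is also directed to nucleic acid molecules, such as polynucleotides and oligonucleotides, comprising a molecular interaction site present in RNase P RNA. Nucleic acid molecules include the physical compounds themselves as well as in silico representations of the same. Thus, the nucleic acid molecules are derived from RNase P RNA. The molecular interaction site serves as a binding site for at least one molecule which, when bound to the molecular interaction site, modulates the expression of RNase P RNA in a cell. The nucleotide sequence of the polynucleotide is selected to provide the secondary structure of the molecular interaction site described in further detail in the Examples. The nucleotide sequence of the polynucleotide is preferably the nucleotide sequence of the target RNase P RNA described above. Alternatively, the nucleotide sequence is preferably the nucleotide sequence of an RNase P RNA from a plurality of different taxonomic species that also contains the molecular interaction site.
[0042]
The polynucleotides of the invention comprise the molecular interaction sites of RNase P RNA. Thus, the polynucleotides of the invention comprise the nucleotide sequences of the molecular interaction sites. In addition, the polynucleotides can comprise up to 50, more preferably up to 40, more preferably up to 30, more preferably up to 20, and most preferably up to 10 additional nucleotides at the 5' or 3' end of each polynucleotide, or a combination thereof. Thus, for example, if a molecular interaction site comprises 25 nucleotides, the polynucleotide can comprise up to 75 nucleotides. The nucleotides other than those present in the molecular interaction site are chosen to preserve the secondary structure of the molecular interaction site. One skilled in the art can select such additional nucleotides so as to conserve the secondary structure. The polynucleotides can comprise either RNA or DNA, or can be chimeric RNA/DNA. The polynucleotides can comprise modified bases, sugars and backbones that are well known to those skilled in the art. In addition, a single polynucleotide can comprise a plurality of molecular interaction sites. Moreover, a plurality of polynucleotides can together comprise a single molecular interaction site. Alternatively, where a plurality of polynucleotides together comprise a molecular interaction site, one skilled in the art can link the polynucleotides to one another to form a single polynucleotide.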
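[0043]
The portion of the polynucleotide comprising the molecular interaction site can comprise one or more deletions, insertions and substitutions. Stems, terminal loops, bulges, internal loops and dangling regions can comprise one or more deletions, insertions and substitutions. Thus, for example, a terminal loop of a molecular interaction site consisting of 10 nucleotides can be modified to contain one or more insertions, deletions or substitutions, thereby shortening or lengthening the stem preceding the terminal loop. In addition, for example, unpaired dangling nucleotides adjacent to a double-stranded region can be deleted, or other nucleotides can be added to form base pairs and lengthen the stem. In addition, nucleotide base pairs within a stem can also be substituted, deleted or inserted. Thus, for example, an A-U base pair within a stem portion of a molecular interaction site can be replaced with a G-C base pair. In addition, non-canonical base pairing (e.g., G-A, C-T, G-U, etc.) can also be present within the polynucleotides. Thus, polynucleotides having at least 70%, more preferably 80%, more preferably 90%, more preferably 95%, and most preferably 99% homology with the molecular interaction sites, such as those described in the Examples below, are included within the scope of the invention. Percent homology can be determined by, for example, the Gap program (Wisconsin Sequence Analysis Package, Version 8 for Unix®, Genetics Computer Group, University Research Park, Madison WI), using default settings, which uses the algorithm of Smith and Waterman (Adv. Appl. Math., 1981, 2, 482-489, which is incorporated herein by reference in its entirety).
[0044]
The invention is further directed to the purified and isolated nucleic acids or polynucleotides described above that are present within RNase P RNA. The polynucleotides comprising the molecular interaction site mimic the portion of the RNase P RNA comprising the molecular interaction site.
[0045]
Polynucleotides, and modifications thereof, are well known to those skilled in the art. The polynucleotides of the invention can be used, for example, as research reagents to detect naturally occurring molecules that bind the molecular interaction sites. Alternatively, the polynucleotides of the invention can be used to screen, either actually or virtually, small molecules that bind the molecular interaction sites, as described in detail below. Virtual generation of compounds for binding to molecular interaction sites, and screening thereof, are described in, for example, International Publication WO 99/58947, which is incorporated herein by reference in its entirety. The polynucleotides of the invention can also be used as decoys to compete with naturally occurring molecular interaction sites within a cell for research, diagnostic and therapeutic applications. In particular, the polynucleotides can be used in therapeutic applications to inhibit bacterial growth. Molecules that bind to the molecular interaction site modulate, either by augmenting or diminishing, the function of the RNase P RNA in translation. The polynucleotides can also be used in agricultural, industrial and other applications.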
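[0046]
The invention is also directed to compositions comprising at least one of the polynucleotides described above. In some embodiments of the invention, two polynucleotides are included in one composition. The compositions of the invention can optionally comprise a carrier. A "carrier" is an acceptable solvent, diluent, suspending agent or any other inert vehicle for delivering one or more nucleic acids to an animal, and is well known to those skilled in the art. The carrier can be a pharmaceutically acceptable carrier. The carrier can be liquid or solid and is selected, with the planned manner of administration in mind, so as to provide the desired bulk, consistency and the like when combined with the other components of the composition. Typical pharmaceutical carriers include, but are not limited to, binding agents (pregelatinized maize starch, polyvinylpyrrolidone or hydroxypropyl methylcellulose, etc.); fillers (e.g., lactose and other sugars, microcrystalline cellulose, pectin, gelatin, calcium sulfate, ethyl cellulose, polyacrylates or calcium hydrogen phosphate, etc.); lubricants (e.g., magnesium stearate, talc, silica, colloidal silicon dioxide, stearic acid, metallic stearates, hydrogenated vegetable oils, corn starch, polyethylene glycols, sodium benzoate, sodium acetate, etc.); disintegrants (e.g., starch, sodium starch glycolate, etc.); or wetting agents (e.g., sodium lauryl sulfate, etc.).
[0047]
The invention is also directed to methods of identifying compounds that bind to a molecular interaction site of RNase P RNA, comprising providing a numerical representation of the three-dimensional structure of the molecular interaction site and providing a compound data set comprising numerical representations of the three-dimensional structures of a plurality of organic compounds. The numerical representation of the molecular interaction site is then compared with members of the compound data set to generate a hierarchy of organic compounds ranked in accordance with the ability of the organic compounds to form physical interactions with the molecular interaction site.
[0048]
The invention is also directed to methods of identifying compounds that bind to a molecular interaction site of RNase P RNA or of a polynucleotide comprising the same. In some embodiments of the invention, compounds that bind to a molecular interaction site of RNase P RNA, or of a polynucleotide comprising the same, are identified according to the general methods described in International Publication WO 99/58947, which is incorporated herein by reference in its entirety. Briefly, the methods comprise providing a numerical representation of the three-dimensional structure of the molecular interaction site or of a polynucleotide comprising the same, providing a compound data set comprising numerical representations of the three-dimensional structures of a plurality of organic compounds, and comparing the numerical representation of the molecular interaction site with members of the compound data set to generate a hierarchy of organic compounds ranked in accordance with the ability of the organic compounds to form physical interactions with the molecular interaction site.
[0049]
Although there are a number of ways to characterize binding between molecular interaction sites and ligands, such as organic compounds, methodologies are described in International Publications WO 99/58719, WO 99/59061, WO 99/58722, WO 99/45150, WO 99/58474 and WO 99/58947, each of which is assigned to the assignee of the present invention and each of which is incorporated herein by reference in its entirety.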
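[0050]
The invention is further directed to three-dimensional representations of the nucleic acid molecules, and compositions comprising the same, described above. The three-dimensional structure of a molecular interaction site of RNase P RNA can be manipulated as a numerical representation. The three-dimensional representations, that is, in silico (e.g., in computer-readable form) representations, can be generated by the methods disclosed in, for example, International Publication WO 99/58947, which is incorporated herein by reference in its entirety. Briefly, the three-dimensional structure of a molecular interaction site, preferably of an RNA, can be manipulated as a numerical representation. Computer software that gives one skilled in the art the ability to design molecules based on the chemistry being performed and on the available reaction building blocks is commercially available. Software packages such as, for example, Sybyl/Base (Tripos, St. Louis, MO), Insight II (Molecular Simulations, San Diego, CA), and Sculpt (MDL Information Systems, San Leandro, CA) provide means for the computational generation of structures. These software products also provide means for evaluating and comparing computationally generated molecules and their structures. In silico collections of molecular interaction sites can be generated using the software from any of the above vendors and other software that is available or may become available. The three-dimensional representations can be used, for example, to dock the molecule or molecules with potential therapeutic compounds. In this way, the three-dimensional representations can be used in drug searching procedures. Thus, the nucleic acid molecules of the invention, and compositions comprising the same, include three-dimensional representations of the same.
[0051]
A set of structural constraints for the molecular interaction site of RNase P RNA can be generated from biochemical analyses such as, for example, enzymatic mapping and chemical probes, and from genomics information such as, for example, covariation and sequence conservation. Information such as this can be used to pair bases in the stems and other regions of a particular secondary structure. Additional structural hypotheses can be generated for non-canonical base pairing schemes in loop and bulge regions. A Monte Carlo search procedure can sample the possible conformations of the RNase P RNA that are consistent with the program constraints and generate three-dimensional structures.
[0052]
Reports on the generation of three-dimensional in silico representations are available from the standpoint of library design, generation, and screening against protein targets. Likewise, some efforts in the area of RNA model building have been reported in the literature. However, there are no reports on the use of structure-based design approaches to query in silico representations of organic molecules, "small" molecules, polynucleotides or other nucleic acids against three-dimensional in silico representations of RNase P RNA structures. The present invention preferably employs computer software that allows the construction of three-dimensional models of RNase P RNA structure; the construction of three-dimensional in silico representations of a plurality of organic compounds, "small" molecules, polymeric compounds, polynucleotides and other nucleic acids; the screening of such in silico representations against RNase P RNA molecular interaction sites in silico; the scoring and identification of the best potential binders from the plurality of compounds; and, finally, the synthesis of such compounds in a combinatorial fashion and their experimental testing to identify new ligands for such RNase P RNA targets.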
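[0053]
Molecules that can be screened using the methods of the invention include, but are not limited to, individual organic or inorganic compounds of small to large molecular weight, and combinatorial mixtures or libraries of ligands, inhibitors, agonists, antagonists, substrates and biopolymers such as peptides or polynucleotides. Combinatorial mixtures include, but are not limited to, collections of compounds and libraries of compounds. These mixtures can be generated via combinatorial synthesis of mixtures or via admixture of individual compounds. Collections of compounds include, but are not limited to, sets of individual compounds, sets of mixtures, or pools of compounds. These combinatorial libraries can be obtained from synthetic or from natural sources, such as, for example, microbial, plant, marine, viral and animal materials. Combinatorial libraries include at least about twenty compounds, as many as several thousand individual compounds, and potentially even more. When combinatorial libraries are mixtures of compounds, these mixtures typically contain from 20 to 5000 compounds, preferably from 50 to 1000, more preferably from 50 to 100 compounds. Combinations of from 100 to 500 compounds are useful, as are mixtures having from 500 to 1000 individual species. Typically, members of combinatorial libraries have molecular weights of less than about 10,000 Da, more preferably less than 7,500 Da, and most preferably less than 5,000 Da.
[0054]
An important advance in the area of virtual screening was the development of a software program called DOCK, which allows structure-based database searches to find and identify the interactions of known molecules with a receptor of interest (Kuntz et al., Acc. Chem. Res., 1994, 27, 117; Gschwend and Kuntz, J. Comput.-Aided Mol. Des., 1996, 10, 123). DOCK allows the screening of molecules whose 3D structures have been generated in silico but for which no prior knowledge of interactions with the receptor is available. DOCK therefore provides a tool to assist in discovering new ligands for a receptor of interest. DOCK can thus be used to dock the compounds prepared according to the methods of the invention to a desired target. An implementation of DOCK is described in, for example, International Publication WO 99/58947, which is incorporated herein by reference in its entirety.
[0055]
In some embodiments of the invention, an automated computational search algorithm, such as those described above, is used to predict all of the allowed three-dimensional molecular interaction site structures from RNase P RNA that are consistent with the biochemical and genomic constraints specified by the user. These structures are clustered into different families based on, for example, their root-mean-square deviations. A representative member or members of each family can be subjected to further structural refinement via molecular dynamics with explicit solvent and cations.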
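Clustering candidate structures into families by root-mean-square deviation (RMSD) can be illustrated with the sketch below. It makes several simplifying assumptions that are not in the text: structures are lists of (x, y, z) coordinates with atoms in the same order, they are already superimposed (no optimal rotation is computed), and families are built with a simple greedy "leader" scheme using a hypothetical `cutoff` in ångströms.

```python
import math


def rmsd(a, b):
    """RMSD between two conformations given as equal-length lists of (x, y, z).

    Assumption: the conformations are already superimposed and list their
    atoms in the same order.
    """
    if len(a) != len(b):
        raise ValueError("conformations must have the same number of atoms")
    squared = sum((p - q) ** 2 for pa, pb in zip(a, b) for p, q in zip(pa, pb))
    return math.sqrt(squared / len(a))


def cluster_by_rmsd(structures, cutoff=2.0):
    """Greedy leader clustering.

    Each structure joins the first family whose representative (its first
    member) lies within `cutoff`; otherwise it founds a new family.
    Returns a list of families, each a list of structure indices.
    """
    families = []
    for index, structure in enumerate(structures):
        for family in families:
            if rmsd(structures[family[0]], structure) <= cutoff:
                family.append(index)
                break
        else:
            families.append([index])
    return families


if __name__ == "__main__":
    # Three hypothetical two-atom conformations.
    s0 = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
    s1 = [(0.0, 0.0, 0.5), (1.0, 0.0, 0.5)]
    s2 = [(0.0, 0.0, 10.0), (1.0, 0.0, 10.0)]
    print(cluster_by_rmsd([s0, s1, s2], cutoff=2.0))
```

The first member of each family serves as its representative here; in practice the representative chosen for further molecular dynamics refinement could be selected differently, for example as the member closest to the family centroid.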
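[0056]
Structural enumeration and representation by these software programs is typically done by drawing molecular scaffolds and substituents in two dimensions. Once drawn and stored in the computer, these molecules can be rendered into three-dimensional structures using algorithms present within the commercially available software. Preferably, MC-SYM is used to generate three-dimensional representations of the molecular interaction sites. Rendering the two-dimensional structures of molecular interaction sites into three-dimensional models typically generates a low-energy conformation, or a collection of low-energy conformers, of each molecule. The end result of these commercially available programs is the conversion of an RNase P RNA sequence containing a molecular interaction site into a family of similar numerical representations of the three-dimensional structure of the molecular interaction site. These numerical representations form an ensemble data set.
[0057]
The three-dimensional structures of a plurality of compounds, preferably "small" organic compounds, can be represented as a compound data set comprising numerical representations of the three-dimensional structures of the compounds. In this context, "small" molecules means non-oligomeric organic compounds. Two-dimensional structures of compounds can be converted to three-dimensional structures, as described above for the molecular interaction sites, and used for querying against three-dimensional structures of the molecular interaction sites. The two-dimensional structures of compounds can be generated rapidly using commercially available structure rendering algorithms. Three-dimensional representations of compounds that are polymeric in nature, such as polynucleotides or other nucleic acid structures, can be generated using the literature methods described above. Three-dimensional structures of "small" molecules or other compounds can be generated, and a low-energy conformation can be obtained from a short molecular dynamics minimization. These three-dimensional structures can be stored in a relational database. The compounds for which three-dimensional structures are constructed can be proprietary, commercially available, or virtual.
[0058]
In some embodiments of the invention, a compound data set comprising numerical representations of the three-dimensional structures of a plurality of organic compounds is provided by, for example, Converter (MSI, San Diego) from two-dimensional compound libraries generated by, for example, a computer program modified from a commercial program. Other suitable databases can be constructed by converting two-dimensional structures of chemical compounds into three-dimensional structures, as described above. The end result is the conversion of two-dimensional structures of organic compounds into numerical representations of the three-dimensional structures of a plurality of organic compounds. These numerical representations are presented as a compound data set.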
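[0059]
After both the numerical representation of the three-dimensional structure of the polynucleotide comprising the molecular interaction site and the compound data set comprising numerical representations of the three-dimensional structures of a plurality of organic compounds have been obtained, the numerical representation of the molecular interaction site is compared with members of the compound data set to generate a hierarchy of organic compounds. The hierarchy is ranked in accordance with the ability of the organic compounds to form physical interactions with the molecular interaction site. Preferably, the comparison is carried out seriatim upon the members of the compound data set. In accordance with some embodiments, the comparison can be carried out simultaneously with a plurality of polynucleotides comprising molecular interaction sites.
[0060]
A variety of theoretical and combinatorial methods for studying and optimizing the interactions of "small" molecules or organic compounds with biological targets such as nucleic acids are known to those skilled in the art. These structure-based drug design tools have been very useful in modeling the interactions of proteins with small molecule ligands and in optimizing these interactions. Typically, this type of study has been performed when the structure of the protein receptor was known, by querying individual small molecules, one at a time, against this receptor. Usually, these small molecules had either been co-crystallized with the receptor, were related to other molecules that had been co-crystallized, or were molecules for which some body of knowledge existed concerning their interactions with the receptor. DOCK, described above, can be used to find and identify molecules that are expected to bind to polynucleotides comprising the molecular interaction sites and, hence, to the RNase P RNA of interest. DOCK 4.0 is commercially available from the Regents of the University of California. Equivalent programs are also comprehended by the present invention.
[0061]
The DOCK program has been widely applied to protein targets and to the identification of ligands that bind to them. Typically, new classes of molecules that bind to known targets have been identified and later verified in in vitro experiments. The DOCK software program consists of several modules, including SPHGEN (Kuntz et al., J. Mol. Biol., 1982, 161, 269) and CHEMGRID (Meng et al., J. Comput. Chem., 1992, 13, 505), each of which is incorporated herein by reference in its entirety. SPHGEN generates clusters of overlapping spheres that describe the solvent-accessible surface of the binding pocket within the target receptor. Each cluster represents a possible binding site for small molecules. CHEMGRID precalculates, and stores in a grid file, the information necessary for force field scoring of the interactions between the binding molecule and the target RNase P RNA. The scoring function approximates molecular mechanics interaction energies and consists of van der Waals and electrostatic components. DOCK uses the selected cluster of spheres to orient ligand molecules in the targeted site on the RNase P RNA. Each molecule within a previously generated three-dimensional database is tested in thousands of orientations within the site, and each orientation is evaluated by the scoring function. Only the orientation with the best score for each compound so screened is stored in the output file. Finally, all compounds of the database are ranked in order of their scores, and a collection of the best candidates can then be screened experimentally.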
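The scoring and ranking loop described above can be illustrated with a toy force-field score. This sketch is not DOCK's implementation: it evaluates every atom pair directly instead of using a precomputed grid, uses a single 12-6 Lennard-Jones term with assumed parameters (`epsilon`, `r_min`) for all atom types, and uses a plain Coulomb term with an assumed constant dielectric. All function names and coordinates are hypothetical.

```python
import math

COULOMB_CONSTANT = 332.0  # approx. kcal*angstrom/(mol*e^2); assumed units


def interaction_energy(ligand_atoms, site_atoms, epsilon=0.1, r_min=3.5,
                       dielectric=4.0):
    """Sum of van der Waals (12-6) and electrostatic terms over atom pairs.

    Atoms are (x, y, z, charge) tuples. Assumption: no two atoms coincide,
    so the interatomic distance is never zero.
    """
    total = 0.0
    for lx, ly, lz, lq in ligand_atoms:
        for sx, sy, sz, sq in site_atoms:
            r = math.dist((lx, ly, lz), (sx, sy, sz))
            ratio6 = (r_min / r) ** 6
            vdw = epsilon * (ratio6 ** 2 - 2.0 * ratio6)
            elec = COULOMB_CONSTANT * lq * sq / (dielectric * r)
            total += vdw + elec
    return total


def best_orientation(orientations, site_atoms):
    """Return (energy, index) of the lowest-energy orientation of one compound."""
    return min((interaction_energy(o, site_atoms), i)
               for i, o in enumerate(orientations))


def rank_compounds(library, site_atoms):
    """Keep only the best orientation per compound and rank, best score first.

    `library` maps a compound name to a list of orientations, each a list
    of (x, y, z, charge) atoms.
    """
    scored = [(name, best_orientation(orientations, site_atoms)[0])
              for name, orientations in library.items()]
    return sorted(scored, key=lambda item: item[1])


if __name__ == "__main__":
    site = [(0.0, 0.0, 0.0, -1.0)]  # one negatively charged site atom
    library = {
        "A": [[(3.5, 0.0, 0.0, 1.0)], [(10.0, 0.0, 0.0, 1.0)]],
        "B": [[(3.5, 0.0, 0.0, 0.0)]],
    }
    for name, energy in rank_compounds(library, site):
        print(name, round(energy, 3))
```

Lower (more negative) energies rank first, and only the best orientation of each compound contributes to the ranking, as in the output-file behavior described in the text.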
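[0062]
Using DOCK, numerous ligands have been identified for a variety of protein targets. Recent efforts in this area have resulted in reports of the use of DOCK to identify and design small molecule ligands that exhibit binding specificity for nucleic acids such as RNA double helices. While RNA plays a significant role in many diseases, such as AIDS and viral and bacterial infections, very few studies have been made on small molecules capable of specific RNA binding. Compounds possessing specificity for the RNA double helix, based on the unique geometry of its deep major groove, were identified using the DOCK methodology. Chen et al., Biochemistry, 1997, 36, 11402; and Kuntz et al., Acc. Chem. Res., 1994, 27, 117. Recently, the application of DOCK to the problem of ligand recognition in DNA quadruplexes has been reported. Chen et al., Proc. Natl. Acad. Sci., 1996, 93, 2635.
Individual compounds are preferably represented as, for example, mol files and combined into a collection of in silico representations using an appropriate chemical structure program or equivalent software. These two-dimensional mol files are exported and converted into three-dimensional structures using commercial software such as Converter (Molecular Simulations Inc., San Diego), as described above, or equivalent software. Atom types suitable for use with a docking program such as DOCK or QXP are assigned to all atoms in the three-dimensional mol file using software such as, for example, Babel, or equivalent software.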
【００６３】
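Low-energy conformations of each molecule are generated by software such as Discover (MSI, San Diego). An orientation search is performed by bringing each compound of the plurality of compounds into proximity with the molecular interaction site in many orientations using DOCK or QXP. A contact score is determined for each orientation, and the optimum orientation of the compound is subsequently used. Alternatively, the conformation of the compound can be determined from a template conformation of a predetermined scaffold.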
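A contact score of the kind mentioned above can be sketched as a simple count of close ligand-site atom pairs with a penalty for clashes. The sketch below is only an assumption about what such a score might look like: the cutoffs (4.5 and 2.5 Angstrom), the penalty value, and the function names are hypothetical and are not taken from DOCK or QXP.

```python
import math

def contact_score(ligand, site, contact_cutoff=4.5, clash_cutoff=2.5, clash_penalty=10):
    """Count atom pairs within the contact cutoff; penalize pairs that clash."""
    score = 0
    for ligand_atom in ligand:
        for site_atom in site:
            d = math.dist(ligand_atom, site_atom)
            if d < clash_cutoff:
                score -= clash_penalty   # atoms overlap: unfavorable
            elif d <= contact_cutoff:
                score += 1               # atoms in contact: favorable
    return score

def best_pose(poses, site):
    """Return the orientation with the highest contact score."""
    return max(poses, key=lambda pose: contact_score(pose, site))

# Hypothetical two-atom site and three single-atom ligand orientations.
site = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0)]
poses = [
    [(0.0, 0.0, 3.5)],   # touches one site atom
    [(1.5, 0.0, 3.0)],   # touches both site atoms
    [(0.0, 0.0, 1.0)],   # clashes with the first site atom
]
chosen = best_pose(poses, site)
```

The orientation search then reduces to generating many poses and keeping the one returned by `best_pose`.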
【００６４】
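The interaction of the plurality of compounds with the molecular interaction site is examined by comparing the numerical representation of the molecular interaction site with members of the compound data set. Preferably, a plurality of compounds, generated for example by a computer program or otherwise, are compared with the molecular interaction site and undergo random “motions” about their dihedral bonds. Preferably, about 20,000 to 100,000 compounds are compared with at least one molecular interaction site. Typically, 20,000 compounds are compared with about five molecular interaction sites and scored. Individual conformations of the three-dimensional structures are placed at the target site in many orientations. Moreover, during execution of the DOCK program, the compounds and the molecular interaction site are allowed to be “flexible” so that optimum hydrogen bonding, electrostatic, and van der Waals contacts can be realized. The energy of interaction is calculated and stored for 10 to 15 possible orientations of the compound and the molecular interaction site. The QXP methodology is presently preferred because it allows true flexibility in both the ligand and the target.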
【００６５】
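The relative weights of each energy contribution are updated constantly to ensure that the binding scores calculated for all compounds reflect the experimental binding data. The binding energy for each orientation is scored on the basis of hydrogen bonding, van der Waals contacts, electrostatics, solvation/desolvation, and the quality of the fit. The low-energy van der Waals, dipolar, and hydrogen-bonding interactions between the compound and the molecular interaction site are determined and summed. In some embodiments, these parameters can be adjusted according to empirically obtained results. The binding energy of each molecule against the target is output to a relational database. The relational database contains a hierarchy of the compounds ranked according to their ability to form physical interactions with the molecular interaction site. Higher-ranked compounds are better able to form physical interactions with the molecular interaction site.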
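As a rough illustration of a weighted binding score written to a relational database and read back as a ranked hierarchy, the following Python sketch uses the standard-library `sqlite3` module. The weights, the term names, the table schema, and the convention that a lower (more negative) energy ranks higher are all assumptions made for this example; the text does not specify them.

```python
import sqlite3

# Hypothetical relative weights for each energy contribution.
WEIGHTS = {"hbond": 1.0, "vdw": 1.0, "electrostatic": 0.5, "desolvation": 0.3}

def binding_score(terms, weights=WEIGHTS):
    """Weighted sum of the individual energy terms for one compound."""
    return sum(weights[name] * terms[name] for name in weights)

def store_and_rank(compounds, db_path=":memory:"):
    """Write each compound's score to a table, then read back the ranked hierarchy."""
    conn = sqlite3.connect(db_path)
    conn.execute("CREATE TABLE scores (compound TEXT PRIMARY KEY, energy REAL)")
    conn.executemany(
        "INSERT INTO scores VALUES (?, ?)",
        [(name, binding_score(terms)) for name, terms in compounds.items()],
    )
    rows = conn.execute(
        "SELECT compound, energy FROM scores ORDER BY energy ASC"
    ).fetchall()
    conn.close()
    return rows

# Hypothetical energy terms (arbitrary units) for two compounds.
compounds = {
    "c1": {"hbond": -2.0, "vdw": -3.0, "electrostatic": -4.0, "desolvation": 1.0},
    "c2": {"hbond": -1.0, "vdw": -1.0, "electrostatic": 0.0, "desolvation": 0.0},
}
hierarchy = store_and_rank(compounds)
```

Re-fitting `WEIGHTS` against measured binding data and re-running the query is one simple way to picture the "constantly updated" weighting the paragraph describes.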
【００６６】
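In other embodiments, the highest-ranking, i.e., best-fitting, compounds are selected for synthesis. In some embodiments of the invention, those compounds that are likely to have the desired binding characteristics based on binding data are selected for synthesis. Preferably, the highest-ranking 5% are selected for synthesis. More preferably, the highest-ranking 10% are selected for synthesis. Even more preferably, the highest-ranking 20% are selected for synthesis. Synthesis of the selected compounds can be automated using a parallel array synthesizer, or the compounds can be prepared using solution-phase or solid-phase methods and instruments. In addition, the interaction of the highly ranked compounds with the nucleic acid containing the molecular interaction site is assessed as described below.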
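Selecting the highest-ranking 5%, 10%, or 20% of a ranked list is a small calculation, sketched below in Python. The function name and the choice to round up (so that at least one compound is always selected) are assumptions for illustration.

```python
def select_top_percent(ranked, percent):
    """Return the top `percent` (1-100) of an already ranked list, rounding up."""
    if not 1 <= percent <= 100:
        raise ValueError("percent must be between 1 and 100")
    count = max(1, (len(ranked) * percent + 99) // 100)  # integer ceiling
    return ranked[:count]

# Hypothetical ranked hierarchy of 40 compounds, best first.
ranked = [f"compound_{i:02d}" for i in range(1, 41)]
top5 = select_top_percent(ranked, 5)
top10 = select_top_percent(ranked, 10)
top20 = select_top_percent(ranked, 20)
```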
【００６７】
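The interaction of the highly ranked organic compounds with the polynucleotide comprising the RNase P RNA molecular interaction site can be assessed by numerous methods known to those skilled in the art. For example, the highest-ranking compounds can be tested for activity in high-throughput (HTS) functional and cellular screens. HTS assays can be read out by scintillation proximity, precipitation, fluorescence-based formats, filtration-based assays, colorimetric assays, and the like. Lead compounds can then be scaled up and tested in animal models for activity and toxicity. The assessment preferably comprises mass spectrometry of a mixture of the RNase P RNA polynucleotide and at least one compound, or a functional bioassay.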
【００６８】
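Certain assessment methods using mass spectrometry are disclosed in International Publication WO 99/45150, which is incorporated herein in its entirety, as examples of mass spectrometric methods useful for the present invention. It should be clearly understood, however, that it is not essential to use these particular mass spectrometric methods to practice the present invention. Rather, any assessment method may be undertaken so long as the objectives of the invention are maintained.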
【００６９】
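In some embodiments of the invention, the highest-ranking 20% of the compounds from the hierarchy generated using the DOCK program or QXP are used to generate a further data set of three-dimensional representations of organic compounds comprising compounds that are chemically related to the compounds ranking high in the hierarchy. Although the best-fitting compounds are likely to be in the highest-ranking 1%, additional compounds, up to about 20%, are selected for the second comparison in order to provide diversity (ring size, chain length, functional groups). This process ensures that small errors in the molecular interaction site do not propagate into the compound identification process. The structure/score data obtained from, for example, the highest-ranking 20% are studied mathematically (clustered) to find trends or features within the compounds that enhance binding. The compounds are clustered into several distinct groups. Chemical synthesis and screening of the compounds, as described above, allows the computed DOCK or QXP scores to be correlated with actual binding data. After the compounds have been prepared and screened, the predicted binding energy and the observed Kd value are correlated for each compound.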
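One simple way to picture the correlation of predicted binding energies with observed Kd values is to convert each Kd to a free energy using the standard relation ΔG = RT ln Kd and compute a Pearson correlation coefficient. The Python sketch below does this; the choice of Pearson correlation, the temperature, and the sample numbers are assumptions for illustration and are not prescribed by the text.

```python
import math

R_KCAL = 0.0019872  # gas constant in kcal/(mol*K)

def kd_to_delta_g(kd_molar, temperature=298.15):
    """Binding free energy from a dissociation constant: dG = RT ln(Kd)."""
    return R_KCAL * temperature * math.log(kd_molar)

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)

# Hypothetical predicted scores and hypothetical measured Kd values (molar).
predicted = [-10.0, -8.0, -6.0]
observed_kd = [1e-8, 1e-6, 1e-4]
observed_dg = [kd_to_delta_g(kd) for kd in observed_kd]
r = pearson_r(predicted, observed_dg)
```

A coefficient near 1 would suggest the scoring scheme tracks experiment; a poor coefficient would argue for re-weighting the energy terms.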
【００７０】
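The results are used to develop a predictive scoring scheme that weighs the various factors (steric, electrostatic) appropriately. This strategy allows rapid evaluation of many scaffolds bearing different functional groups of varying sizes and shapes for the highly ranked compounds. In this manner, a further data set of representations of organic compounds, comprising compounds chemically related to the organic compounds that rank high in the hierarchy, can be compared with the numerical representation of the molecular interaction site to determine a further hierarchy ranked according to the ability of the organic compounds to form physical interactions with the molecular interaction site. In this way, a further data set of representations of the three-dimensional structures of compounds related to the compounds ranked high in the hierarchy is obtained and, in effect, optimized by correlating actual binding with virtual binding. The entire cycle can be repeated as needed until the desired number of compounds highest in the hierarchy is obtained.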
【００７１】
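Compounds found to have affinity and specificity for a target biomolecule, in particular for a target RNase P RNA, or found to be capable of binding to and modulating a target RNase P RNA, can be tagged or labeled in a detectable fashion according to some embodiments of the invention. Such labeling can include all of the labeling formats known to those skilled in the art, such as fluorophores, radiolabels, enzymatic labels, and many other forms. Such labeling or tagging facilitates detection of molecular interaction sites and permits facile mapping of chromosomes and other useful processes.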
【００７２】
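In order that the invention disclosed herein may be more fully understood, examples are provided below. It should be understood that these examples are for illustrative purposes only and are not to be construed as limiting the invention in any manner. Various modifications of the invention, in addition to those described herein, will be apparent to those skilled in the art from the foregoing description. Such modifications are also intended to fall within the scope of the claims. Further, the disclosure of each patent, patent application, and publication cited or described herein is incorporated herein in its entirety.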
【００７３】
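Examples
Example 1: Selection of RNase P RNA
RNase P RNA was used to illustrate the strategy for identifying molecular interaction sites for small molecules. The structure of this RNase P RNA is disclosed in Massire et al., J. Mol. Biol., 1998, 279, 773-793. The RNase P RNA is an RNA of approximately 375 to 400 nucleotides that folds into several domains.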
【００７４】
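Example 2: Molecular interaction sites in RNase P RNA
Numerous molecular interaction sites have been discovered within RNase P RNA. Site 1 comprises a region of RNA comprising first and second polynucleotides. The first polynucleotide comprises about 24 to about 69 nucleotides, a portion of which forms a double-stranded RNA having the following features (5' to 3'): a dangling region comprising about 1 to about 3 nucleotides; a first side of a first stem comprising about 3 to about 8 nucleotides; a first side of a second stem comprising about 3 to about 8 nucleotides; a first terminal loop comprising about 3 to about 8 nucleotides; a second side of the second stem comprising about 3 to about 8 nucleotides; a first side of a third stem comprising about 2 to about 6 nucleotides; a second terminal loop comprising about 2 to about 6 nucleotides; a second side of the third stem comprising about 2 to about 6 nucleotides, optionally containing a bulge comprising about 1 to about 3 nucleotides; a first side of a fourth stem comprising about 2 to about 6 nucleotides, optionally containing a bulge comprising about 1 to about 5 nucleotides; and a dangling region comprising about 1 to about 5 nucleotides. The second polynucleotide comprises about 8 to about 22 nucleotides, a portion of which forms a double-stranded RNA having the following features (5' to 3'): a dangling region comprising about 3 to about 8 nucleotides; a second side of the fourth stem comprising about 2 to about 6 nucleotides; and a second side of the first stem comprising about 3 to about 8 nucleotides.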
【００７５】
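For Site 1, the first polynucleotide preferably comprises 45 nucleotides, a portion of which forms a double-stranded RNA having the following features (5' to 3'): a dangling region comprising 2 nucleotides; a first side of a first stem comprising 5 nucleotides; a first side of a second stem comprising 5 nucleotides; a first terminal loop comprising 5 nucleotides; a second side of the second stem comprising 5 nucleotides; a first side of a third stem comprising 4 nucleotides; a second terminal loop comprising 4 nucleotides; a second side of the third stem comprising 4 nucleotides, with a bulge comprising 1 nucleotide between its third and fourth nucleotides; a first side of a fourth stem comprising 4 nucleotides, with a bulge comprising 3 nucleotides between its second and third nucleotides; and a dangling region comprising 3 nucleotides. Preferably, the first polynucleotide comprises the sequence: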
【００７６】
【化１】

【００７７】
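(SEQ ID NO: 1) (bold nucleotides indicate preferred base pairing). The second polynucleotide preferably comprises 14 nucleotides, a portion of which forms a double-stranded RNA having the following features (5' to 3'): a dangling region comprising 5 nucleotides; a second side of the fourth stem comprising 4 nucleotides; and a second side of the first stem comprising 5 nucleotides. Preferably, the second polynucleotide comprises the sequence: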
【００７８】
【化２】

【００７９】
(SEQ ID NO: 2) (bold nucleotides indicate preferred base pairing). Site 1 is present in E. coli, as shown in Figure 1.
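As a consistency check on the preferred Site 1 description, the lengths of the individual structural features can be summed and compared with the stated totals of 45 and 14 nucleotides. The short Python sketch below does this; the feature labels and variable names are illustrative only, and the numbers are taken from the preferred embodiment described above.

```python
# (feature label, length in nucleotides), listed 5' to 3'
SITE1_FIRST = [
    ("dangling region", 2),
    ("stem 1, side 1", 5),
    ("stem 2, side 1", 5),
    ("terminal loop 1", 5),
    ("stem 2, side 2", 5),
    ("stem 3, side 1", 4),
    ("terminal loop 2", 4),
    ("stem 3, side 2", 4),
    ("bulge in stem 3, side 2", 1),
    ("stem 4, side 1", 4),
    ("bulge in stem 4, side 1", 3),
    ("dangling region", 3),
]
SITE1_SECOND = [
    ("dangling region", 5),
    ("stem 4, side 2", 4),
    ("stem 1, side 2", 5),
]

def total_length(features):
    """Total nucleotide count of a polynucleotide from its feature lengths."""
    return sum(length for _, length in features)

first_total = total_length(SITE1_FIRST)    # stated total: 45
second_total = total_length(SITE1_SECOND)  # stated total: 14
```

The same bookkeeping can be applied to the preferred embodiments of the other sites.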
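Site 2 comprises a region of RNA comprising first, second, and third polynucleotides. The first polynucleotide comprises about 6 to about 16 nucleotides, a portion of which forms a double-stranded RNA having the following features (5' to 3'): a dangling region comprising about 1 to about 3 nucleotides; and a first side of a first stem comprising about 4 to about 10 nucleotides, optionally containing a bulge comprising about 1 to about 3 nucleotides. The second polynucleotide comprises about 13 to about 34 nucleotides, a portion of which forms a double-stranded RNA having the following features (5' to 3'): a second side of the first stem comprising about 4 to about 10 nucleotides, optionally containing a bulge comprising about 1 to about 3 nucleotides; a bulge comprising about 4 to about 10 nucleotides; a first side of a second stem comprising about 3 to about 9 nucleotides; and a dangling region comprising about 1 to about 2 nucleotides. The third polynucleotide comprises about 5 to about 13 nucleotides, a portion of which forms a double-stranded RNA having the following features (5' to 3'): a dangling region comprising about 1 to about 2 nucleotides; a second side of the second stem comprising about 3 to about 9 nucleotides; and a dangling region comprising about 1 to about 2 nucleotides.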
【００８０】
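For Site 2, the first polynucleotide preferably comprises 11 nucleotides, a portion of which forms a double-stranded RNA having the following features (5' to 3'): a dangling region comprising 2 nucleotides; and a first side of a first stem comprising 7 nucleotides, with a bulge comprising 2 nucleotides between its fifth and sixth nucleotides. Preferably, the first polynucleotide comprises the sequence: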
【００８１】
【化３】

【００８２】
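(SEQ ID NO: 3) (bold nucleotides indicate preferred base pairing). The second polynucleotide preferably comprises 23 nucleotides, a portion of which forms a double-stranded RNA having the following features (5' to 3'): a second side of the first stem comprising 7 nucleotides, with a bulge comprising 2 nucleotides between its fifth and sixth nucleotides; a bulge comprising 7 nucleotides; a first side of a second stem comprising 6 nucleotides; and a dangling region comprising 1 nucleotide. Preferably, the second polynucleotide comprises the sequence: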
【００８３】
【化４】

【００８４】
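(SEQ ID NO: 4) (bold nucleotides indicate preferred base pairing). The third polynucleotide preferably comprises 8 nucleotides, a portion of which forms a double-stranded RNA having the following features (5' to 3'): a dangling region comprising 1 nucleotide; a second side of the second stem comprising 6 nucleotides; and a dangling region comprising 1 nucleotide. Preferably, the third polynucleotide comprises the sequence: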
【００８５】
【化５】

【００８６】
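(SEQ ID NO: 5) (bold nucleotides indicate preferred base pairing). Site 2 is present in E. coli, as shown in Figure 1.
Site 3 comprises a region of RNA comprising first and second polynucleotides. The first polynucleotide comprises about 10 to about 26 nucleotides, a portion of which forms a double-stranded RNA having the following features (5' to 3'): a dangling region comprising about 1 to about 3 nucleotides; a first side of a first stem comprising about 2 to about 6 nucleotides; a first side of an internal loop comprising about 3 to about 9 nucleotides; a first side of a second stem comprising about 3 to about 6 nucleotides; and a dangling region comprising about 1 to about 2 nucleotides. The second polynucleotide comprises about 10 to about 27 nucleotides, a portion of which forms a double-stranded RNA having the following features (5' to 3'): a second side of the second stem comprising about 3 to about 9 nucleotides; a second side of the internal loop comprising about 3 to about 7 nucleotides; a second side of the first stem comprising about 2 to about 6 nucleotides; and a dangling region comprising about 2 to about 5 nucleotides.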
【００８７】
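For Site 3, the first polynucleotide preferably comprises 19 nucleotides, a portion of which forms a double-stranded RNA having the following features (5' to 3'): a dangling region comprising 2 nucleotides; a first side of a first stem comprising 4 nucleotides; a first side of an internal loop comprising 6 nucleotides; a first side of a second stem comprising 6 nucleotides; and a dangling region comprising 1 nucleotide. Preferably, the first polynucleotide comprises the sequence: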
【００８８】
【化６】

【００８９】
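(SEQ ID NO: 6) (bold nucleotides indicate preferred base pairing). The second polynucleotide preferably comprises 18 nucleotides, a portion of which forms a double-stranded RNA having the following features (5' to 3'): a second side of the second stem comprising 6 nucleotides; a second side of the internal loop comprising 5 nucleotides; a second side of the first stem comprising 4 nucleotides; and a dangling region comprising 3 nucleotides. Preferably, the second polynucleotide comprises the sequence: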
【００９０】
【化７】

【００９１】
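(SEQ ID NO: 7) (bold nucleotides indicate preferred base pairing). Site 3 is present in E. coli, as shown in Figure 1.
Site 4 comprises a region of RNA comprising a polynucleotide of about 12 to about 34 nucleotides, a portion of which forms a double-stranded RNA having the following features (5' to 3'): a first side of a stem comprising about 3 to about 9 nucleotides, within which lies a first side of an internal loop comprising about 2 to about 5 nucleotides; a terminal loop comprising about 2 to about 6 nucleotides; a second side of the stem comprising about 3 to about 9 nucleotides, within which lies a second side of the internal loop comprising about 1 to about 3 nucleotides; and a dangling region comprising about 1 to about 2 nucleotides.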
【００９２】
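For Site 4, the region of RNA preferably comprises a polynucleotide of 22 nucleotides, a portion of which forms a double-stranded RNA having the following features (5' to 3'): a first side of a stem comprising 6 nucleotides, with a first side of an internal loop comprising 3 nucleotides between its third and fourth nucleotides; a terminal loop comprising 4 nucleotides; a second side of the stem comprising 6 nucleotides, with a second side of the internal loop comprising 2 nucleotides between its third and fourth nucleotides; and a dangling region comprising 1 nucleotide. Preferably, the polynucleotide comprises the sequence: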
【００９３】
【化８】

【００９４】
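(SEQ ID NO: 8) (bold nucleotides indicate preferred base pairing). Site 4 is present in B. subtilis, as shown in Figure 2.
Site 5 comprises a region of RNA comprising first, second, third, fourth, and fifth polynucleotides. The first polynucleotide comprises about 3 to about 9 nucleotides, a portion of which forms a double-stranded RNA having the following features (5' to 3'): a first side of a first stem comprising about 2 to about 6 nucleotides; and a first side of a second stem comprising about 1 to about 3 nucleotides. The second polynucleotide comprises about 3 to about 8 nucleotides, a portion of which forms a double-stranded RNA having the following features (5' to 3'): a second side of the second stem comprising about 1 to about 3 nucleotides; and a first side of a third stem comprising about 2 to about 5 nucleotides. The third polynucleotide comprises about 7 to about 18 nucleotides, a portion of which forms a double-stranded RNA having the following features (5' to 3'): a second side of the third stem comprising about 2 to about 5 nucleotides, optionally containing a bulge comprising about 1 to about 2 nucleotides; a first side of a fourth stem comprising about 1 to about 3 nucleotides; a bulge comprising about 1 to about 3 nucleotides; and a first side of a fifth stem comprising about 2 to about 5 nucleotides. The fourth polynucleotide comprises about 8 to about 20 nucleotides, a portion of which forms a double-stranded RNA having the following features (5' to 3'): a second side of the fifth stem comprising about 2 to about 5 nucleotides; a bulge comprising about 3 to about 7 nucleotides; a first side of a sixth stem comprising about 1 to about 3 nucleotides; and a dangling region comprising about 2 to about 5 nucleotides. The fifth polynucleotide comprises about 5 to about 15 nucleotides, a portion of which forms a double-stranded RNA having the following features (5' to 3'): a dangling region comprising about 1 to about 3 nucleotides; a second side of the sixth stem comprising about 1 to about 3 nucleotides; a second side of the fourth stem comprising about 1 to about 3 nucleotides; and a second side of the first stem comprising about 2 to about 6 nucleotides.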
【００９５】
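For Site 5, the first polynucleotide preferably comprises 6 nucleotides, a portion of which forms a double-stranded RNA having the following features (5' to 3'): a first side of a first stem comprising 4 nucleotides; and a first side of a second stem comprising 2 nucleotides. Preferably, the first polynucleotide comprises the sequence: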
【００９６】
【化９】

【００９７】
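(bold nucleotides indicate preferred base pairing). The second polynucleotide preferably comprises 5 nucleotides, a portion of which forms a double-stranded RNA having the following features (5' to 3'): a second side of the second stem comprising 2 nucleotides; and a first side of a third stem comprising 3 nucleotides. Preferably, the second polynucleotide comprises the sequence: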
【００９８】
【化１０】

【００９９】
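(bold nucleotides indicate preferred base pairing). The third polynucleotide preferably comprises 10 nucleotides, a portion of which forms a double-stranded RNA having the following features (5' to 3'): a second side of the third stem comprising 3 nucleotides, with a bulge comprising 1 nucleotide between its second and third nucleotides; a first side of a fourth stem comprising 2 nucleotides; a bulge comprising 1 nucleotide; and a first side of a fifth stem comprising 3 nucleotides. Preferably, the third polynucleotide comprises the sequence: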
【０１００】
【化１１】

【０１０１】
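(SEQ ID NO: 9) (bold nucleotides indicate preferred base pairing). The fourth polynucleotide preferably comprises 13 nucleotides, a portion of which forms a double-stranded RNA having the following features (5' to 3'): a second side of the fifth stem comprising 3 nucleotides; a bulge comprising 5 nucleotides; a first side of a sixth stem comprising 2 nucleotides; and a dangling region comprising 3 nucleotides. Preferably, the fourth polynucleotide comprises the sequence: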
【０１０２】
【化１２】

【０１０３】
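(SEQ ID NO: 10) (bold nucleotides indicate preferred base pairing). The fifth polynucleotide preferably comprises 10 nucleotides, a portion of which forms a double-stranded RNA having the following features (5' to 3'): a dangling region comprising 2 nucleotides; a second side of the sixth stem comprising 2 nucleotides; a second side of the fourth stem comprising 2 nucleotides; and a second side of the first stem comprising 4 nucleotides. Preferably, the fifth polynucleotide comprises the sequence: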
【０１０４】
【化１３】

【０１０５】
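(SEQ ID NO: 11) (bold nucleotides indicate preferred base pairing). Site 5 is present in B. subtilis, as shown in Figure 2.
Site 6 comprises a region of RNA comprising a polynucleotide of about 13 to about 34 nucleotides, a portion of which forms a double-stranded RNA having the following features (5' to 3'): a dangling region comprising about 2 to about 5 nucleotides; a first side of a stem comprising about 2 to about 5 nucleotides; a terminal loop comprising about 6 to about 16 nucleotides; a second side of the stem comprising about 2 to about 5 nucleotides; and a dangling region comprising about 1 to about 3 nucleotides.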
【０１０６】
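For Site 6, the region of RNA preferably comprises a polynucleotide of 22 nucleotides, a portion of which forms a double-stranded RNA having the following features (5' to 3'): a dangling region comprising 3 nucleotides; a first side of a stem comprising 3 nucleotides; a terminal loop comprising 11 nucleotides; a second side of the stem comprising 3 nucleotides; and a dangling region comprising 2 nucleotides. Preferably, the polynucleotide comprises the sequence: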
【０１０７】
【化１４】

【０１０８】
(SEQ ID NO: 12) (bold nucleotides indicate preferred base pairing). Site 6 is present in B. subtilis, as shown in Figure 2.
【Brief description of the drawings】
【０１０９】
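【FIG. 1】 FIG. 1 shows a representative structure of E. coli RNase P RNA, indicating Sites 1, 2, and 3.
【FIG. 1A】 FIG. 1A shows a representative structure of E. coli RNase P RNA, indicating Sites 1, 2, and 3.
【FIG. 1B】 FIG. 1B shows a representative structure of E. coli RNase P RNA, indicating Sites 1, 2, and 3.
【FIG. 1C】 FIG. 1C shows a representative structure of E. coli RNase P RNA, indicating Sites 1, 2, and 3.
【FIG. 2】 FIG. 2 shows a representative structure of B. subtilis RNase P RNA, indicating Sites 4, 5, and 6.
【FIG. 2A】 FIG. 2A shows a representative structure of B. subtilis RNase P RNA, indicating Sites 4, 5, and 6.
【FIG. 2B】 FIG. 2B shows a representative structure of B. subtilis RNase P RNA, indicating Sites 4, 5, and 6.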
【FIG. 2C】 FIG. 2C shows a representative structure of B. subtilis RNase P RNA, indicating Sites 4, 5, and 6.
【Technical field】
[0001]
Field of Invention
The present invention relates to the identification of molecular interaction sites of RNase P RNA, to virtual or actual screening of compounds that bind to those sites, and to the modulation of RNase P RNA activity by the compounds identified in such virtual and actual screening.
[0002]
Background of the Invention
Ribonuclease P (RNase P) is an endoribonuclease responsible for removing the leader sequence from tRNA precursors during maturation of the 5' end of tRNA. Altman et al., FASEB J., 1993, 7, 7-14, and Pace et al., J. Bacteriol., 1995, 177, 1919-1928. RNase P is a ribonucleoprotein whose catalytic function, at least in bacteria, is carried out not by its protein but by its RNA component (RNase P RNA). Guerrier-Takada et al., Cell, 1983, 35, 849-857. Another feature of RNase P RNA is its ability to recognize the tertiary structure of its pre-tRNA substrate. Kahle et al., EMBO J., 1990, 9, 1929-1937. The secondary structure of bacterial RNase P RNA was first inferred from comparative sequence analysis (James et al., Cell, 1988, 52, 19-26) and was refined as additional sequences became available (Haas et al., Science, 1991, 254, 853-856; Haas et al., Proc. Natl. Acad. Sci. USA, 1994, 91, 2527-2531; Brown et al., Nucl. Acids Res., 1993, 21, 671-679). Furthermore, the derivation of the three-dimensional architecture of bacterial RNase P RNA from E. coli and Bacillus subtilis is described in Massire et al., J. Mol. Biol., 1998, 279, 773-793, which is incorporated herein by reference in its entirety.
[0003]
Recent advances in genomics, molecular biology, and structural biology have highlighted how RNA molecules participate in or control many of the events required to express proteins in cells. Rather than functioning simply as intermediates, RNA molecules actively regulate their own transcription from DNA, splice and edit mRNA and tRNA molecules, synthesize peptide bonds in the ribosome, catalyze the migration of nascent proteins to the cell membrane, and fine-tune the rate at which messages are translated. RNA molecules can adopt a variety of unique structural motifs that provide the framework required to perform these functions.
[0004]
“Small” molecule therapeutics, which bind specifically to structured RNA molecules, are organic chemical molecules that are not polymers. “Small” molecule therapeutics include, for example, the most potent naturally occurring antibiotics. For example, the aminoglycoside and macrolide antibiotics are “small” molecules that act by binding to defined regions of ribosomal RNA (rRNA) structure and preventing the conformational changes in the RNA that are required for protein synthesis. Furthermore, changes in the conformation of RNA molecules have been shown to regulate the rates of transcription and translation of mRNA. Small molecules are generally less than 10 kDa.
[0005]
Applicants believe that an RNA molecule, or a group of related RNA molecules, has regulatory regions that the cell uses to control protein synthesis. Cells are thought to exert control over both the timing and the amount of protein synthesized through direct, specific interactions with RNA. This view contrasts with the impression gained from the scientific literature on gene regulation, which concentrates heavily on transcription. The processes of RNA maturation, transport, subcellular localization, and translation are rich in RNA recognition sites that provide good opportunities for drug binding. Applicants' invention relates, inter alia, to finding these sites in RNA molecules, particularly RNase P RNA, in microbial genomes. Applicants' invention further uses combinatorial chemistry to produce large numbers of chemical entities, actually or virtually, and/or to screen them for their ability to bind to and/or modulate these drug-binding sites.
[0006]
Determining the possible conformations of nucleic acids and their associated structural motifs provides insight into fields such as the study of RNA catalysis, RNA-RNA interactions, RNA-nucleic acid interactions, RNA-protein interactions, and the recognition of small molecules by nucleic acids. Four general approaches to generating model conformations of RNA have been demonstrated in the literature. All of them use highly specialized molecular modeling and computational algorithms to simulate folding and tertiary interactions within a target nucleic acid such as RNA. Westhof and Altman (Proc. Natl. Acad. Sci., 1994, 91, 5133, which is incorporated herein in its entirety) describe the construction of a three-dimensional model of M1 RNA, the catalytic RNA subunit of RNase P from E. coli, by an interactive computer modeling protocol. Mueller and Brimacombe (J. Mol. Biol., 1997, 271, 524) have drawn on a series of important cryo-electron microscopy (cryo-EM) studies and biochemical studies of ribosomal RNA to construct a three-dimensional model of E. coli 16S ribosomal RNA. A method for modeling nucleic acid hairpin motifs has been developed based on a set of reduced coordinates describing nucleic acid structure and a sampling algorithm that equilibrates the structure using Monte Carlo (MC) simulation (Tung, Biophysical J., 1997, 72, 876, which is incorporated herein in its entirety). The MC-SYM program is yet another approach, which predicts RNA conformation using constraint-satisfaction methods (Major et al., Proc. Natl. Acad. Sci., 1993, 90, 9408). The MC-SYM program is a constraint-satisfaction algorithm that searches conformational space for all models that satisfy the constraints supplied as input and is described, for example, in Cedergren et al., RNA Structure And Function, 1998, Cold Spring Harbor Lab. Press, p. 37-75. According to this method, RNA conformations are built by the stepwise addition of nucleotides, each having one or several different conformations, to a growing oligonucleotide model.
[0007]
Westhof and Altman (Proc. Natl. Acad. Sci., 1994, 91, 5133) have described the creation of a three-dimensional model of M1 RNA, the catalytic RNA subunit of RNase P from E. coli, by an interactive computer modeling protocol. This modeling protocol incorporated data from chemical and enzymatic protection experiments, phylogenetic analysis, studies of the activities of mutants, and the kinetics of reactions catalyzed by the binding of substrate to M1 RNA. Modeling was performed for the most part as described in the literature (Westhof et al., in “Theoretical Biochemistry and Molecular Biophysics”, Beveridge and Lavery (Eds.), Adenine Press, NY, 1990, 399). In general, starting from the primary sequence of M1 RNA, secondary stem-loop structures and other elements were created. Subsequent assembly of these elements into a three-dimensional structure using a computer graphics station and FRODO (Jones, J. Appl. Crystallogr., 1978, 11, 268), followed by refinement using NUCLIN-NUCLSQ, provided an RNA model with correct geometry, no bad contacts, and appropriate stereochemistry. The model created in this way was found to be consistent with most of the empirical data on M1 RNA and provides a hypothesis for the mechanism of action of RNase P. However, models created by this method are less well resolved than structures determined by X-ray crystallography.
[0008]
Mueller and Brimacombe (J. Mol. Biol., 1997, 271, 524, which is incorporated herein in its entirety) have used a modeling program called ERNA-3D to construct a three-dimensional model of E. coli 16S ribosomal RNA. This program generates three-dimensional structures, such as A-form RNA helices and single-stranded regions, through dynamic docking of single strands so as to fit the electron density obtained from low-resolution diffraction data. After the helical elements have been defined and positioned in the model, the placement of the single-stranded regions is adjusted to satisfy any known biochemical constraints, such as RNA-protein cross-linking and footprinting data.
[0009]
A method for modeling nucleic acid hairpin motifs has been developed based on a set of reduced coordinates describing nucleic acid structure and a sampling algorithm that equilibrates the structure using Monte Carlo (MC) simulation (Tung, Biophysical J., 1997, 72, 876, which is incorporated herein in its entirety). The stem region of a nucleic acid can be modeled adequately using a standard canonical duplex. Using the set of reduced coordinates, an algorithm was created that can generate the structure of a single-stranded loop with a pair of fixed ends. This allows efficient structural sampling of the loop in conformational space. Combining this algorithm with a modified Metropolis Monte Carlo algorithm yields a structure simulation package that simplifies the computational study of nucleic acid hairpin structures. Once RNA subdomains have been identified, they can be stabilized as needed by the methods disclosed in US Pat. No. 5,712,096.
[0010]
Although X-ray crystallography is a very powerful technique that can determine some of the secondary and tertiary structure of biopolymer targets (Erikson et al., Ann. Rep. in Med. Chem., 1992, 27, 271-289), it is expensive and can be very difficult to carry out. Crystallization of biopolymers is extremely challenging, is difficult to perform at adequate resolution, and is often considered to be as much an art as a science. Further limiting the usefulness of X-ray crystal structures in the drug discovery process is the inability of crystallography to reveal insights into the solution-phase, and therefore the biologically relevant, structure of the target of interest. Some analysis of the nature and strength of the interaction between a ligand (agonist, antagonist, or inhibitor) and its target can be performed by ELISA (Kemeny and Challacombe, in ELISA and other Solid Phase Immunoassays: 1988), radioligand binding assays (Berson et al., Clin. 1968; Chard, in “An Introduction to Radioimmunoassay and Related Techniques”, 1982), surface plasmon resonance (Karlsson et al., 1991; Jonsson et al., Biotechniques, 1991), or scintillation proximity assays (Udenfriend et al., Anal. Biochem., 1987), all cited previously. Radioligand binding assays are typically useful only for assessing the competitive binding of an unknown at the binding site for the radioligand, and they also require the use of radioactivity. Surface plasmon resonance is more direct but is also quite expensive. Conventional biochemical assays of binding kinetics and of dissociation and association constants are also helpful in elucidating the nature of target-ligand interactions.
[0011]
Thus, one aspect of the invention is the identification of molecular interaction sites in RNase P RNA. These molecular interaction sites, which comprise secondary structural elements, are likely to give rise to significant therapeutic, regulatory, or other interactions with “small” molecules and the like. Another aspect of the invention is the comparison of the molecular interaction sites of RNase P RNA with compounds proposed for interaction with them.
[0012]
Yet another aspect of the present invention is the establishment of a database of numerical representations of the three-dimensional structures of the molecular interaction sites of RNase P RNA. Such a database library is a powerful tool for elucidating and predicting the structure of molecular interaction sites and the interactions between molecular interaction sites and possible ligands. Another aspect of the invention is to provide a general method for screening combinatorial libraries, containing individual compounds or mixtures of compounds, against RNase P RNA to determine which members of the library bind to the target.
[0013]
Summary of the Invention
The present invention relates to the identification of molecular interaction sites of RNase P RNA that contain specific secondary structures.
The present invention further relates to nucleic acid molecules, polynucleotides, or oligonucleotides containing molecular interaction sites that can be used to virtually or actually screen combinatorial libraries of compounds that bind to the molecular interaction sites.
The invention further relates to a computer readable medium comprising a three-dimensional representation of the structure of the molecular binding site.
[0014]
The invention further relates to the modulation of RNase P RNA activity by contacting RNase P RNA, or prokaryotic cells containing it, with a compound identified in such virtual or actual screening.
The present invention also relates to the modulation of prokaryotic cell proliferation by contacting the prokaryotic cell with a compound identified in such virtual or actual screening.
[0015]
DESCRIPTION OF PREFERRED EMBODIMENTS OF THE INVENTION
The present invention relates, inter alia, to the identification of molecular interaction sites of RNase P RNA. Such molecular interaction sites include secondary structures that can interact with cellular components, such as factors and proteins, required for translation and other cellular processes. Nucleic acid molecules or polynucleotides containing molecular interaction sites can be used to virtually or actually screen combinatorial libraries of compounds that bind to them. Because compounds identified by such screening are used to modulate the activity of RNase P RNA, they can be used to modulate, i.e., inhibit or stimulate, prokaryotic cell proliferation. Novel drugs, agricultural chemicals, industrial chemicals, and the like that act through modulation of RNase P RNA can therefore be identified.
[0016]
It is preferable to integrate several methods and protocols in order to identify potent drugs and other biologically useful compounds. Drugs, veterinary drugs, agricultural chemicals, pesticides, herbicides, fungicides, industrial chemicals, laboratory chemicals, and many other active compounds useful in pollution prevention, industrial biochemistry, and biocatalytic systems can be identified by embodiments of the present invention. The novel combination of several techniques gives the method of the present invention great power and versatility. As described in more detail herein, it is preferred in some embodiments to integrate several processes developed by the assignee of the present application, but it should be recognized that other methodologies can also be integrated into the present invention to good effect. Thus, although it is highly advantageous to determine molecular binding sites on RNase P RNA in accordance with the teachings of the present invention, the interaction of ligands and ligand mixtures with other RNase P RNAs identified as important can also benefit greatly from other aspects of the invention. All such combinations are within the scope of the present invention.
[0017]
One aspect of Applicants' invention relates to the identification of secondary structures in RNase P RNA, termed “molecular interaction sites.” As used herein, a “molecular interaction site” is a region of RNase P RNA that has secondary structure. Molecular interaction sites can be conserved among RNase P RNAs of a plurality of different taxonomic species. Molecular interaction sites are small, independently folded functional subdomains contained within a larger RNA molecule, preferably of fewer than 200 nucleotides, fewer than 150 nucleotides, fewer than 70 nucleotides, fewer than 50 nucleotides, or fewer than 30 nucleotides. Molecular interaction sites can contain both single-stranded and double-stranded regions. Molecular interaction sites are thus capable of interacting with “small” molecules and the like, and are expected to serve as sites for interaction with “small” molecules and with oligomers such as oligonucleotides and other compounds in therapeutic and other applications. Molecular interaction sites also include pockets for binding small molecules, drugs, and the like.
[0018]
The molecular interaction sites are present at least in RNase P RNA. It will be appreciated that, according to some embodiments of the invention, RNase P RNA having one or more molecular interaction sites can be derived from a number of sources. Accordingly, such RNase P RNA can be identified by any means, rendered as a three-dimensional representation, and used to identify compounds capable of interacting with and modulating the RNase P RNA. In some embodiments, the molecular interaction site identified in the RNase P RNA is not present in eukaryotes, particularly humans, and can therefore serve as a binding site for “small” molecules that modulate prokaryotic RNase P RNA without being toxic to humans.
[0019]
Molecular interaction sites can be identified by any means known to those skilled in the art. In some embodiments of the invention, the molecular interaction sites in RNase P RNA are identified by the general methods described in International Publication No. WO 99/58719, which is incorporated herein in its entirety. Briefly, a target RNase P RNA nucleotide sequence is selected from known sequences. Any RNase P RNA nucleotide sequence can be selected. The nucleotide sequence of the target RNase P RNA is compared with the nucleotide sequences of a plurality of RNase P RNAs from different taxonomic species. At least one sequence region is identified that is effectively conserved between the plurality of RNase P RNAs and the target RNase P RNA. Such conserved regions are examined to determine whether any secondary structure is present and, for conserved regions that have secondary structure, that secondary structure is identified.
[0020]
According to some embodiments of the invention, the nucleotide sequence of the target RNase P RNA is compared with the nucleotide sequences of a plurality of corresponding RNase P RNAs from different taxonomic species. The initial selection of a particular target nucleic acid can be based on any functional criterion. For example, RNase P RNAs known to be involved in the genomes of pathogens, such as bacteria and yeast, are particular targets. Pathogenic bacteria and yeast are well known to those skilled in the art. Other RNase P RNA targets can be determined independently or can be selected from publicly available prokaryotic gene databases known to those skilled in the art. Such databases include, for example, Online Mendelian Inheritance in Man (OMIM), the Cancer Genome Anatomy Project (CGAP), GenBank, EMBL, PIR, SWISS-PROT, and the like. OMIM, a database of disease-related gene mutations, was developed in part for the National Center for Biotechnology Information (NCBI). OMIM can be accessed via the World Wide Web at, for example, ncbi.nlm.nih.gov/Omim/. CGAP, an interdisciplinary program to establish the information and technology tools needed to decipher the molecular anatomy of cancer cells, can be accessed via the World Wide Web at, for example, ncbi.nlm.nih.gov/ncicgap/. Some of these databases may contain complete or partial nucleotide sequences. In addition, RNase P RNA targets can be selected from private gene databases. Alternatively, RNase P RNA targets can be selected from available publications or can be determined specifically for use in connection with the present invention.
[0021]
After an RNase P RNA target has been selected or prepared, the nucleotide sequence of the RNase P RNA target is determined and then compared with the nucleotide sequences of a plurality of RNase P RNAs from different taxonomic species. In one embodiment of the invention, the nucleotide sequence of the RNase P RNA target is determined by scanning at least one gene database or is identified in available publications. Databases known and available to those skilled in the art include, for example, GenBank and others. These databases can be used with a search program such as Entrez, which is known and available to those skilled in the art. Entrez can be accessed via the World Wide Web at, for example, ncbi.nlm.nih.gov/Entrez/. Preferably, the most complete nucleic acid sequence representation available from the various databases is used. The GenBank database, known and available to those skilled in the art, can also be used to obtain the most complete nucleotide sequence. GenBank is the NIH gene sequence database, an annotated collection of all publicly available DNA sequences. GenBank is described, for example, in Nuc. Acids Res., 1998, 26, 1-7, which is hereby incorporated by reference in its entirety, and can be accessed via the World Wide Web at, for example, ncbi.nlm.nih.gov/Web/Genbank/index.html. Alternatively, a partial nucleotide sequence of the RNase P RNA target can be used when the complete nucleotide sequence is not available.
[0022]
The nucleotide sequence of the RNase P RNA target is compared to the nucleotide sequences of multiple RNase P RNAs from different taxonomic species. The multiple RNase P RNAs from different taxonomic species, and their nucleotide sequences, can be found in gene databases or in available publications, or can be determined specifically for use in connection with the present invention. In one embodiment of the invention, the RNase P RNA target is compared with the nucleotide sequences of multiple RNase P RNAs from different taxonomic species by performing a sequence similarity search, an ortholog search, or both. Such searches are known to those skilled in the art.
[0023]
The result of the sequence similarity search is a plurality of RNase P RNAs having at least a portion of their nucleotide sequence that is homologous to a region of at least 8 to 20 nucleotides of the target RNase P RNA, referred to as the window region. Preferably, the plurality of RNase P RNAs comprise at least one portion that is at least 60% homologous to any window region of the target RNase P RNA. More preferably, this homology is at least 70%. More preferably, this homology is at least 80%. Most preferably, this homology is at least 90% or 95%. For example, the window size, i.e., the portion of the target RNase P RNA to which the plurality of sequences are compared, can be about 8 to about 20 contiguous nucleotides, preferably about 10 to about 15 contiguous nucleotides, most preferably about 11 to about 12 contiguous nucleotides. Accordingly, the window size can be adjusted. Next, the multiple RNase P RNAs from different taxonomic species are compared to each possible window of the target RNase P RNA, preferably until all portions of the plurality of sequences have been compared to the windows of the target RNase P RNA. RNase P RNA sequences from different taxonomic species having portions that are at least 60%, preferably at least 70%, more preferably at least 80%, or most preferably at least 90% homologous to any window sequence of the target RNase P RNA are considered possible homologous sequences.
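By way of illustration only, the window comparison described above can be sketched in Python as follows. This is a minimal sketch and not the implementation of Blast or of any other program named herein. The function names `percent_identity`, `window_hits` and `is_possible_homolog` are hypothetical, and ungapped percent identity over a fixed window is assumed as the measure of homology.

```python
def percent_identity(a: str, b: str) -> float:
    """Percentage of positions at which two equal-length sequences agree."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


def window_hits(target: str, candidate: str, window: int = 11,
                threshold: float = 60.0):
    """Compare every window of the target with every stretch of the candidate.

    Returns a list of (target_start, candidate_start, identity) tuples for
    each pair of windows whose identity meets the threshold.
    """
    hits = []
    for i in range(len(target) - window + 1):
        t_win = target[i:i + window]
        for j in range(len(candidate) - window + 1):
            identity = percent_identity(t_win, candidate[j:j + window])
            if identity >= threshold:
                hits.append((i, j, identity))
    return hits


def is_possible_homolog(target: str, candidate: str, window: int = 11,
                        threshold: float = 60.0) -> bool:
    """A candidate is a possible homolog if any window meets the threshold."""
    return bool(window_hits(target, candidate, window, threshold))
```

The `window` and `threshold` arguments correspond to the adjustable window size (about 8 to about 20 nucleotides) and the homology levels (60% to 95%) recited above.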
[0024]
The sequence similarity search can be performed manually or using several available computer programs known to those skilled in the art. Preferably, the Blast and Smith-Waterman algorithms, which are available and known to those skilled in the art, can be used. Blast is NCBI's sequence similarity search tool designed to support analysis of nucleotide and protein sequence databases. Blast can be accessed via the World Wide Web at, for example, ncbi.nlm.nih.gov/BLAST/. The GCG Package provides a local version of Blast that can be used with either public domain databases or any locally available searchable database. GCG Package v. 9.0 is a commercially available software package that contains over 100 interrelated software programs and allows analysis of sequences by editing, mapping, comparing and aligning them. Other programs included in the GCG Package include, for example, programs that facilitate RNA secondary structure prediction, nucleic acid fragment assembly, and evolutionary analysis. In addition, the major gene databases (GenBank, EMBL, PIR and SWISS-PROT) are distributed with the GCG Package and are fully accessible with its database searching and manipulation programs. GCG can be accessed via the World Wide Web at, for example, gcg.com. Fetch is a tool available in GCG that, like Entrez, can retrieve an annotated GenBank record based on its accession number. Other sequence similarity searches can be performed with GeneWorld and GeneThesaurus from Pangea. GeneWorld 2.5 is an automated, flexible, high-throughput application for the analysis of polynucleotide and protein sequences. GeneWorld allows for automatic analysis and annotation of sequences. Similar to GCG, GeneWorld incorporates several tools for homology searching, gene detection, multiple sequence alignment, secondary structure prediction, and motif identification. GeneThesaurus 1.0™ is a sequence and annotation data subscription service that provides information from multiple sources and provides a relational data model for public and local data.
[0025]
Alternatively, sequence similarity searches can be performed, for example, with BlastParse. BlastParse is a PERL script that runs on the UNIX platform and automates the strategy described above. BlastParse takes a list of target accession numbers of interest and parses all of the GenBank fields into “tab-delimited” text, which can then be stored in a “relational database” format for more flexible and easier searching and analysis. The end result is a series of fully parsed GenBank records as well as an annotation-relational database that can be easily filtered and queried.
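The following Python sketch illustrates the kind of parsing described for BlastParse. It is not the BlastParse script itself, which is a PERL program whose code is not reproduced here. The function names are hypothetical, and the sketch assumes the standard GenBank flat-file layout in which a keyword occupies the first 12 columns and continuation lines are indented.

```python
def parse_genbank_header(record: str) -> dict:
    """Collect the header fields of one GenBank flat-file record.

    Assumes keywords sit in columns 1-12 and continuation lines leave
    those columns blank. Parsing stops at the FEATURES or ORIGIN section.
    """
    fields = {}
    key = None
    for line in record.splitlines():
        if line.startswith("FEATURES") or line.startswith("ORIGIN"):
            break
        keyword, value = line[:12].strip(), line[12:].strip()
        if keyword:
            key = keyword
            fields[key] = value
        elif key is not None and value:
            # continuation line: append to the field currently being read
            fields[key] += " " + value
    return fields


def to_tab_delimited(records, columns=("ACCESSION", "DEFINITION", "ORGANISM")):
    """Turn parsed records into tab-delimited text with one row per record."""
    rows = ["\t".join(columns)]
    for record in records:
        fields = parse_genbank_header(record)
        rows.append("\t".join(fields.get(c, "") for c in columns))
    return "\n".join(rows)
```

The tab-delimited output can be loaded into any relational database. In this simplified sketch, the taxonomic lineage lines that follow ORGANISM are appended to the ORGANISM field.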
[0026]
Another toolkit that can perform sequence similarity searching and data manipulation is SEALS, also from NCBI. This tool set is written in Perl and C and can run on any computer platform that supports these languages. It is available for download via the World Wide Web at, for example, ncbi.nlm.nih.gov/Walker/SEALS/. This toolkit provides access to Blast2 or gapped Blast. It also includes a tool called tax-collector which, together with a tool called tax-break, parses the output of Blast2 and returns, for the various species present, the identifier of the sequence that is most homologous to the query sequence. Another useful tool is feature2fasta, which extracts sequence fragments from an input sequence based on its annotations.
[0027]
Preferably, the multiple RNase P RNAs from different taxonomic species that have homology to the target nucleic acid, found in the sequence similarity search described above, are further delineated to find orthologs of the target RNase P RNA among them. An ortholog is a term defined in gene classification to mean two genes in widely divergent organisms that have sequence similarity and perform similar functions within the context of the organism. In contrast, paralogs are genes within a species that arise through gene duplication but have evolved new functions, and are also called isotypes. Optionally, a paralog search can also be performed. By performing an ortholog search, an exhaustive list of homologous sequences from various organisms can be obtained. These sequences are then analyzed to select the best representative sequence that fits the criteria for being an ortholog. The ortholog search can be performed by programs available to those skilled in the art including, for example, Compare. Preferably, an ortholog search is performed with access to the complete, parsed GenBank annotation for each of the sequences. Currently, the records obtained from GenBank are “flat-files” and are not ideally suited for automated analysis. Preferably, an ortholog search is performed using the Q-Compare program. The Blast Results-Relation and Annotations-Relational databases are used in the Q-Compare protocol, whereby a list of ortholog sequences is obtained and then compared using the interspecies sequence comparison programs described below.
[0028]
The similarity search gives results based on a cut-off value called e-score. The e-score represents the probability of a random sequence match within a fixed window of nucleotides. The lower the e-score, the better the match.
Those skilled in the art are familiar with e-scores. The user defines an e-value cutoff depending on stringency or the desired degree of homology, as described above. In some embodiments of the invention, it is preferred that none of the identified RNase P RNA homologous nucleotide sequences are present in the human genome.
[0029]
In another embodiment of the invention, the required sequences are obtained by searching an ortholog database. One such database is Hovergen, which is a curated database of vertebrate orthologs. An ortholog set can be exported from this database and used as is, or as a seed for further sequence similarity searches, as described above. For example, further searches may be desirable to find invertebrate orthologs. Hovergen can be downloaded by file transfer protocol at, for example, pbil.univ-lyon1.fr/pub/hovergen/. A database of prokaryotic orthologs, COGS, is also available and can be used interactively via the World Wide Web at, for example, ncbi.nlm.nih.gov/COG/.
[0030]
After the orthologs or virtual transcripts have been obtained by either a sequence similarity search or an ortholog search, at least one sequence region that is conserved among the multiple RNase P RNAs from different taxonomic species and the target RNase P RNA is identified. Interspecies sequence comparisons can be performed using a number of computer programs available and known to those skilled in the art. Preferably, the interspecies sequence comparison is performed using Compare, which is available and known to those skilled in the art. Compare is a GCG tool that allows pair-wise comparison of sequences using window/stringency criteria. Compare produces an output file containing the points at which a match of the specified quality is found. These can be plotted with another GCG tool, DotPlot.
[0031]
Alternatively, a conserved sequence region is identified by interspecies sequence comparison using an ortholog sequence obtained from Q-Compare in combination with CompareOverWins. Preferably, a list of sequences for comparison, ie, an ortholog sequence obtained from Q-Compare, is entered into the CompareOverWins algorithm. Preferably, interspecies sequence comparison is performed by pair-wise sequence comparison by sliding the query sequence over the master target sequence window. Preferably, the window is from about 9 to about 99 contiguous nucleotides.
[0032]
The sequence homology between the target RNase P RNA and the query sequence of any of the plurality of RNase P RNAs obtained as described above is preferably at least 60%, more preferably at least 70%, even more preferably at least 80%, and most preferably at least 90% or 95%. The most preferred method of selecting the threshold is to have the computer automatically try all thresholds between 50% and 100% and select the threshold based on a user-provided metric. One such metric is to select the threshold so that exactly n hits are returned, where n is usually set to 3. This process is repeated until every base on the query nucleic acid, which is a member of the plurality of RNase P RNAs, has been compared to every base on the master target sequence. The resulting scoring matrix can be plotted as a scatter plot. Depending on the match density at a given position, there may be no dots, isolated dots, or a set of dots so close together that they appear as a line. The presence of a line, however short, suggests primary sequence homology. Sequence conservation within RNase P RNA across divergent species is likely to be an indicator of conserved regulatory elements that may also have secondary structure. The results of the interspecies sequence comparison can be analyzed in a fully automated manner using MS Excel and Visual Basic tools known to those skilled in the art.
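The automatic threshold selection described above can be sketched in Python as follows. This is an illustrative sketch only and does not reproduce CompareOverWins, whose code is not given here. The function names are hypothetical, integer thresholds are assumed, and the choice to return the highest threshold that yields exactly n hits is an assumption, since no tie-breaking rule is stated.

```python
def pairwise_window_scores(master: str, query: str, window: int = 9) -> dict:
    """Slide the query over the master sequence one window at a time.

    Returns {(master_start, query_start): percent identity of the windows}.
    """
    scores = {}
    for i in range(len(master) - window + 1):
        for j in range(len(query) - window + 1):
            matches = sum(a == b for a, b in
                          zip(master[i:i + window], query[j:j + window]))
            scores[(i, j)] = 100.0 * matches / window
    return scores


def pick_threshold(scores: dict, n_hits: int = 3, low: int = 50, high: int = 100):
    """Try every integer threshold from high down to low.

    Returns (threshold, sorted hit positions) for the first threshold that
    yields exactly n_hits hits, or None if no threshold in the range does.
    """
    for threshold in range(high, low - 1, -1):
        hits = [pos for pos, score in scores.items() if score >= threshold]
        if len(hits) == n_hits:
            return threshold, sorted(hits)
    return None
```

The positions returned by `pick_threshold` are the points that would appear as dots in the scatter plot of the scoring matrix.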
[0033]
After at least one region conserved between the nucleotide sequence of the RNase P RNA target and the nucleotide sequences of multiple RNase P RNAs from different taxonomic species, preferably orthologs, has been identified, the conserved region is analyzed to determine whether it contains secondary structure. The determination of whether an identified conserved region contains secondary structure can be made by a number of means known to those skilled in the art. Secondary structure determination is preferably done by self-complementarity comparison, alignment and covariance analysis, secondary structure prediction, or a combination thereof.
[0034]
In one embodiment of the invention, secondary structure analysis is performed by alignment and covariance analysis. Many protocols for alignment and covariance analysis are known to those skilled in the art. Preferably, the alignment is performed with ClustalW, which is available and known to those skilled in the art. ClustalW is a tool for multiple sequence alignment that is not part of GCG but can be added as an extension of the existing GCG tool set and used with local sequences. ClustalW can be accessed via the World Wide Web at, for example, dot.imgen.bcm.tmc.edu:9331/multi-align/Options/clustalw.html. ClustalW is further described in Thompson, et al., Nuc. Acids Res., 1994, 22, 4673-4680, which is incorporated herein in its entirety. These processes can be scripted to automatically use the conserved UTR regions identified in the earlier steps. Sequed, a UNIX command line interface available and known to those skilled in the art, allows the extraction of selected local regions from a larger sequence. Multiple sequences from many different species can be clustered and aligned for further analysis.
[0035]
In another embodiment of the invention, the output of all possible pair-wise CompareOverWins comparisons is compiled and aligned to a reference sequence using a program called AlignHits, a program that can be reproduced by those skilled in the art. One purpose of this program is to map all hits from the pair-wise comparisons to positions on the reference sequence. This method, combining CompareOverWins and AlignHits, produces larger local alignments (over 20-100 bases) than any other algorithm. Such local alignments are required for the structure-finding routines described below, such as covariation or RevComp. The algorithm writes a fasta file of the aligned sequences. It is important to distinguish this approach from the use of ClustalW alone, without CompareOverWins and AlignHits.
[0036]
Covariation is a process that uses phylogenetic analysis of primary sequence information for consensus secondary structure prediction. Covariation is described in the following references, each of which is incorporated herein in its entirety: Gutell et al., “Comparative Sequence Analysis Of Experiments Performed During Evolution” in Ribosomal RNA Group I Introns, Green, Ed., Austin: Landes, 1996; Gautheret et al., Nuc. Acids Res., 1997, 25, 1559-1564; Gautheret et al., RNA, 1995, 1, 807-814; Lodmell et al., Proc. Natl. Acad. Sci. USA, 1995, 92, 10555-10559; Gautheret et al., J. Mol. Biol., 1995, 248, 27-43; Gutell, Nuc. Acids Res., 1994, 22, 3502-3517; Gutell, Nuc. Acids Res., 1993, 21, 3055-3074; Gutell, Nuc. Acids Res., 1993, 21, 3051-3054; Woese, Proc. Natl. Acad. Sci. USA, 1989, 86, 3119-3122; and Woese et al., Nuc. Acids Res., 1980, 8, 2275-2293. Preferably, covariation software is used for the covariation analysis. Preferably, Covariation, a set of programs for the comparative analysis of RNA structure from sequence alignments, is used. Covariation can be obtained via the World Wide Web at, for example, ncsu.edu/RNaseP/info/programs/programs.html. A complete description of a version of the program has been published (Brown, J.W., 1991, Phylogenetic analysis of RNA structure on the Macintosh computer, CABIOS 7: 391-393). The current version is v4.1, which can perform various types of covariation analysis on RNA sequence alignments, including standard covariation analysis, identification of compensatory base changes, and mutual information analysis. The program is well documented and comes with extensive example files. It is compiled as a stand-alone program; it does not require Hypercard (although a much smaller “stack” version is included). This program runs on any Macintosh environment running MacOS v7.1 or higher. Faster processor machines (68040 or PowerPC) are recommended for mutual information analysis or the analysis of large sequence alignments.
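As a non-limiting illustration of the mutual information analysis mentioned above, the following Python sketch computes the mutual information between two columns of a sequence alignment. It is not taken from the Covariation program. The function name and the handling of gap characters are assumptions made for the example. The quantity computed is the standard mutual information M(i,j) = Σ f(a,b) log2[ f(a,b) / (f_i(a) f_j(b)) ], where the frequencies are observed over the aligned sequences.

```python
from collections import Counter
from math import log2


def mutual_information(alignment, i: int, j: int) -> float:
    """Mutual information, in bits, between alignment columns i and j.

    `alignment` is a list of equal-length aligned sequences. Sequences
    with a gap ("-") in either column are skipped (an assumption).
    """
    pairs = [(s[i], s[j]) for s in alignment if s[i] != "-" and s[j] != "-"]
    n = len(pairs)
    if n == 0:
        return 0.0
    freq_i = Counter(a for a, _ in pairs)   # base counts in column i
    freq_j = Counter(b for _, b in pairs)   # base counts in column j
    freq_ij = Counter(pairs)                # joint counts of base pairs
    return sum((c / n) * log2(c * n / (freq_i[a] * freq_j[b]))
               for (a, b), c in freq_ij.items())
```

Two columns that vary together, as compensatory base changes in a stem do, give a high value (up to 2 bits for four bases), whereas an invariant column gives zero against every other column.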
[0037]
In another embodiment of the present invention, secondary structure analysis is performed by secondary structure prediction. There are many algorithms for predicting RNA secondary structure based on thermodynamic parameters and energy calculations. It is preferred to perform secondary structure prediction using either M-fold or RNA Structure 2.52. M-fold can be accessed via the World Wide Web at, for example, ibc.wustl.edu/~zuker/rna/form2.cgi, or can be downloaded for local use on UNIX platforms. M-fold is also available as part of the GCG package. RNA Structure 2.52 is a Windows adaptation of the M-fold algorithm and can be accessed via the World Wide Web at, for example, 128.151.176.70/RNAstructure.html.
[0038]
In another embodiment of the present invention, secondary structure analysis is performed by self-complementarity comparison. Preferably, the self-complementarity comparison is performed using Compare, described above. More preferably, Compare is modified to expand the pairing matrix so that, in addition to the conventional Watson-Crick G-C/C-G and A-U/U-A pairs, it can account for G-U and U-G base pairs. Such a modified Compare program (modified Compare) starts by predicting all possible base pairs within a given sequence. As described above, small but conserved regions are identified based on a primary sequence comparison of a series of orthologs. In the modified Compare, each of these sequences is compared to its own reverse complement. Acceptable base pairings include the Watson-Crick A-U and G-C pairings and the non-canonical G-U pairing. Overlaying such self-complementarity plots for all available orthologs and selecting the most frequently recurring patterns in each yields a minimal number of possible folded configurations. These overlays can then be used together with additional constraints, including those imposed by the energy considerations described above, to deduce the most probable secondary structure.
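The expanded pairing matrix described above can be illustrated by the following Python sketch. It is not the modified Compare program, whose code is not given here. The function names and the `min_len` and `min_loop` parameters are hypothetical choices made for the example.

```python
# Watson-Crick pairs plus the G-U wobble pair, in both orientations.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def pairing_matrix(seq: str):
    """matrix[i][j] is True when bases i and j of the sequence can pair."""
    n = len(seq)
    return [[(seq[i], seq[j]) in PAIRS for j in range(n)] for i in range(n)]


def find_stems(seq: str, min_len: int = 3, min_loop: int = 3):
    """List candidate stems as (i, j, length) tuples.

    Base i+k pairs with base j-k for k = 0..length-1. At least `min_loop`
    unpaired bases are required between the two strands of a stem.
    """
    m = pairing_matrix(seq)
    n = len(seq)
    stems = []
    for i in range(n):
        for j in range(i + min_loop + 1, n):
            if not m[i][j]:
                continue
            # report a run of pairs only once, from its outermost pair
            if i > 0 and j + 1 < n and m[i - 1][j + 1]:
                continue
            length = 0
            while i + length < j - length - min_loop and m[i + length][j - length]:
                length += 1
            if length >= min_len:
                stems.append((i, j, length))
    return stems
```

For the sequence GGGAAAAUCC, for example, the three-pair stem formed by positions 0-2 and 7-9 is found only because the G-U pair between positions 2 and 7 is accepted.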
[0039]
In another embodiment of the invention, the output of AlignHits is read by a program called RevComp. This program can be reproduced by those skilled in the art. One purpose of this program is to predict RNA secondary structure using base pairing rules and ortholog evolution. RNA secondary structure is composed of single-stranded regions and base-paired regions called stems. Since we search for structures conserved by evolution, the most promising stem for a given alignment of ortholog sequences is the stem that can be formed by the most sequences. The rules for possible stem formation, or base pairing, are determined by analyzing the base pairing statistics of stems determined by other techniques such as, for example, NMR. The output of RevComp is a sorted list of possible structures, ranked by the percentage of the sequences in the ortholog set that could form each structure. Because this approach uses a percentage threshold, it is insensitive to noise sequences. A noise sequence is a sequence that either is not a true ortholog or has found its way into the output of AlignHits because of a high degree of sequence homology, even though it does not represent an example of the structure being searched for. A very similar algorithm was implemented using Visual Basic for Applications (VBA) and Microsoft Excel to run on PCs, generating a reverse complement matrix view for a given set of sequences.
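The ranking step described for RevComp can be sketched in Python as follows. This is an illustrative reconstruction under stated assumptions and not the RevComp program itself. The function names are hypothetical, the candidate stems are assumed to be given as alignment coordinates, and the pairing rules are assumed to be the Watson-Crick and G-U pairs rather than rules derived from NMR statistics.

```python
# Assumed pairing rules: Watson-Crick pairs plus the G-U wobble pair.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def can_form(seq: str, i: int, j: int, length: int) -> bool:
    """True if bases i+k and j-k of the sequence pair for k = 0..length-1."""
    return all((seq[i + k], seq[j - k]) in PAIRS for k in range(length))


def rank_stems(alignment, candidates):
    """Rank candidate stems by the percentage of sequences able to form them.

    `alignment` is a list of equal-length aligned ortholog sequences and
    `candidates` is a list of (i, j, length) stems in alignment coordinates.
    Returns (percentage, stem) tuples sorted with the best supported first.
    """
    ranked = []
    for i, j, length in candidates:
        support = sum(can_form(seq, i, j, length) for seq in alignment)
        ranked.append((100.0 * support / len(alignment), (i, j, length)))
    ranked.sort(key=lambda item: item[0], reverse=True)
    return ranked
```

Because each stem is scored by the percentage of sequences that support it, a single noise sequence in the alignment lowers the percentage slightly but does not remove a well-supported stem from the top of the list.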
[0040]
The result of the secondary structure analysis, whether performed by alignment and covariation, by self-complementarity analysis, by secondary structure prediction using, e.g., M-fold, or otherwise, is the identification of secondary structure in the conserved regions among the multiple RNase P RNAs from different taxonomic species. Specific secondary structures that can be identified include, but are not limited to, bulges, loops, stems, hairpins, knots, triple interactions, cloverleaves, or helices, or combinations thereof. Alternatively, new secondary structures may be identified.
[0041]
The invention also relates to nucleic acid molecules, such as polynucleotides and oligonucleotides, that contain molecular interaction sites present in RNase P RNA. A nucleic acid molecule encompasses the physical compound itself as well as an in silico representation of the compound. Thus, the nucleic acid molecule is derived from RNase P RNA. The molecular interaction site serves as a binding site for at least one molecule that, when bound to the molecular interaction site, modulates the expression of RNase P RNA in the cell. The nucleotide sequence of the polynucleotide is selected to provide the secondary structure of the molecular interaction site as further detailed in the examples. The nucleotide sequence of the polynucleotide is preferably the nucleotide sequence of the target RNase P RNA described above. Alternatively, the nucleotide sequence is preferably the nucleotide sequence of RNase P RNA from multiple different taxonomic species that also contains the molecular interaction site.
[0042]
The polynucleotide of the present invention comprises a molecular interaction site of RNase P RNA. Thus, the polynucleotide of the present invention comprises the nucleotide sequence of the molecular interaction site. Further, the polynucleotide can include up to 50, more preferably up to 40, more preferably up to 30, even more preferably up to 20, and most preferably up to 10 additional nucleotides at the 5′ end, the 3′ end, or a combination thereof. Thus, for example, a polynucleotide can contain up to 75 nucleotides when the molecular interaction site contains 25 nucleotides. Nucleotides other than those present at the molecular interaction site are selected to maintain the secondary structure of the molecular interaction site. One skilled in the art can select such additional nucleotides to preserve secondary structure. The polynucleotide can comprise either RNA or DNA, or can be chimeric RNA/DNA. The polynucleotide can include modified bases, sugars and backbones well known to those skilled in the art. In addition, a single polynucleotide can include multiple molecular interaction sites. In addition, multiple polynucleotides can together include a single molecular interaction site. Alternatively, if a plurality of polynucleotides together contain a molecular interaction site, those skilled in the art can combine these polynucleotides to form a single polynucleotide.
[0043]
The portion of the polynucleotide that includes the molecular interaction site can include one or more deletions, insertions and substitutions. The stem, terminal loop, bulge, inner loop and dangling regions can each contain one or more deletions, insertions and substitutions. Thus, for example, the terminal loop of a molecular interaction site consisting of 10 nucleotides can be modified to contain one or more insertions, deletions or substitutions so as to shorten or extend the stem preceding the terminal loop. Further, for example, unpaired dangling nucleotides adjacent to the double-stranded region can be deleted, or other nucleotides can be added to form base pairs and extend the stem. In addition, base pairs within the stem can also be substituted, deleted or inserted. Thus, for example, an A-U base pair in the stem portion of the molecular interaction site can be replaced with a G-C base pair. In addition, non-standard base pairings (e.g., G-A, C-T, G-U, etc.) can also be present in the polynucleotide. Thus, for example, polynucleotides having at least 70%, more preferably 80%, more preferably 90%, more preferably 95%, and most preferably 99% homology with the molecular interaction sites described in the examples below are encompassed within the scope of the present invention. The percent homology can be determined, for example, by the Gap program (Wisconsin Sequence Analysis Package, Version 8 for Unix, Genetics Computer Group, University Research Park, Madison WI), which uses the algorithm of Smith and Waterman (Adv. Appl. Math., 1981, 2, 482-489, which is incorporated herein in its entirety).
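For illustration, a local alignment in the manner of Smith and Waterman, followed by a percent identity calculation over the aligned region, can be sketched in Python as follows. This sketch is not the Gap program. The function names are hypothetical, and the scoring values (match +2, mismatch -1, linear gap penalty -2) are assumptions chosen for the example rather than the parameters of any named package.

```python
def smith_waterman(a: str, b: str, match: int = 2, mismatch: int = -1, gap: int = -2):
    """Smith-Waterman local alignment with a linear gap penalty.

    Returns (best score, aligned part of a, aligned part of b).
    """
    rows, cols = len(a) + 1, len(b) + 1
    H = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            H[i][j] = max(0, diag, H[i - 1][j] + gap, H[i][j - 1] + gap)
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)
    # trace back from the best-scoring cell until a zero cell is reached
    i, j = best_pos
    out_a, out_b = [], []
    while i > 0 and j > 0 and H[i][j] > 0:
        step = match if a[i - 1] == b[j - 1] else mismatch
        if H[i][j] == H[i - 1][j - 1] + step:
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif H[i][j] == H[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))


def aligned_identity(x: str, y: str) -> float:
    """Percent of aligned columns holding identical, non-gap bases."""
    if not x:
        return 0.0
    same = sum(p == q and p != "-" for p, q in zip(x, y))
    return 100.0 * same / len(x)
```

A percent homology threshold such as the 70% to 99% values recited above would then be applied to the value returned by `aligned_identity`.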
[0044]
The invention further relates to the above-described purified and isolated nucleic acids or polynucleotides present in RNase P RNA. A polynucleotide containing a molecular interaction site mimics the portion of RNase P RNA that contains the molecular interaction site.
[0045]
Polynucleotides and their modifications are well known to those skilled in the art. The polynucleotide of the present invention can be used, for example, as a research reagent for detecting a naturally occurring molecule that binds to a molecular interaction site. Alternatively, as detailed below, the polynucleotides of the invention can be used in actual or virtual screening for small molecules that bind to a molecular interaction site. The virtual generation of compounds for binding to molecular interaction sites, and their screening, are described, for example, in International Application Publication No. WO 99/58947, which is hereby incorporated by reference in its entirety. The polynucleotides of the invention can also be used as decoys that compete with naturally occurring molecular interaction sites in cells for research, diagnostic and therapeutic applications. In particular, the polynucleotide can be used for therapeutic applications to inhibit bacterial growth. Molecules that bind to molecular interaction sites modulate the function of the RNase P RNA during translation, either by enhancement or reduction. The polynucleotide can also be used for agricultural, industrial and other uses.
[0046]
The present invention also relates to a composition comprising at least one of the above polynucleotides. In some embodiments of the invention, two polynucleotides are included in one composition. The composition of the present invention can optionally include a carrier. A “carrier” is an acceptable solvent, diluent, suspending agent or any other inert vehicle for delivering one or more nucleic acids to an animal and is well known to those skilled in the art. The carrier can be a pharmaceutically acceptable carrier. The carrier can be liquid or solid and is selected with the intended method of administration in mind so as to give the desired bulk, consistency, etc. when combined with the other components of the composition. Typical pharmaceutical carriers include, but are not limited to, binders (e.g., pregelatinized corn starch, polyvinyl pyrrolidone or hydroxypropyl methylcellulose); fillers (e.g., lactose and other sugars, microcrystalline cellulose, pectin, gelatin, calcium sulfate, ethyl cellulose, polyacrylate or calcium hydrogen phosphate); lubricants (e.g., magnesium stearate, talc, silica, colloidal silicon dioxide, stearic acid, metal stearates, hydrogenated vegetable oil, corn starch, polyethylene glycol, sodium benzoate, sodium acetate, etc.); disintegrating agents (e.g., starch, sodium starch glycolate, etc.); or wetting agents (e.g., sodium lauryl sulfate, etc.).
[0047]
The present invention also relates to a method for identifying a compound that binds to a molecular interaction site of RNase P RNA, comprising providing a numerical representation of the three-dimensional structure of the molecular interaction site and providing a compound data set comprising numerical representations of the three-dimensional structures of a plurality of organic compounds. The numerical representation of the molecular interaction site is then compared with the members of the compound data set to form a hierarchy of the organic compounds, ranked according to their ability to form a physical interaction with the molecular interaction site.
[0048]
The invention also relates to a method of identifying a compound that binds to a molecular interaction site of RNase P RNA or to a polynucleotide comprising the same. In some embodiments of the invention, a compound that binds to a molecular interaction site of RNase P RNA, or to a polynucleotide comprising the same, is identified according to the general method described in International Application Publication No. WO 99/58947 (which is incorporated herein in its entirety). Briefly, the method comprises providing a numerical representation of the three-dimensional structure of the molecular interaction site or of a polynucleotide comprising it; providing a compound data set comprising numerical representations of the three-dimensional structures of a plurality of organic compounds; comparing the numerical representation of the molecular interaction site with the members of the compound data set; and forming a hierarchy of the organic compounds ranked according to their ability to form a physical interaction with the molecular interaction site.
[0049]
There are many ways to characterize the binding of a ligand, such as an organic compound, to a molecular interaction site. Such methodologies are described, for example, in International Application Publications WO 99/58819, WO 99/59061, WO 99/58722, WO 99/45150, WO 99/58474 and WO 99/58947, each of which is assigned to the assignee of the present invention and each of which is incorporated herein in its entirety.
[0050]
Furthermore, the present invention also relates to three-dimensional representations of the nucleic acid molecules, and of the compositions comprising them, described above. The three-dimensional structure of the molecular interaction site of RNase P RNA can be manipulated as a numerical representation. Three-dimensional representations, i.e., in silico (e.g., computer readable) representations, can be generated, for example, in the manner disclosed in International Application Publication No. WO 99/58947, which is hereby incorporated by reference in its entirety. Computer software is commercially available that gives those skilled in the art the ability to design molecules based on the chemistry to be performed and the available reaction building blocks. For example, software packages such as Sybyl/Base (Tripos, St. Louis, MO), Insight II (Molecular Simulations, San Diego, Calif.), and Sculpt (MDL Information Systems, San Leandro, Calif.) provide means for the computer generation of such molecules. These software products also provide methods for evaluating and comparing computer-generated molecules and their structures. In silico collections of molecular interaction sites can be generated using software from any of the above suppliers and other software that is or may become available. The three-dimensional representation can be used, for example, to dock the molecule(s) to a possible therapeutic compound. Thus, the three-dimensional representation can be used as a means of drug discovery. Accordingly, the nucleic acid molecules of the present invention, and compositions comprising the same, include three-dimensional representations of the nucleic acid molecules.
[0051]
A set of structural constraints on the molecular interaction sites of RNase P RNA can be derived from, for example, biochemical analyses such as enzymatic mapping and chemical probing, and from genomic information such as covariation and sequence conservation. Such information can be used, for example, to assign base pairings in the stems and other regions of a specific secondary structure. Additional structural hypotheses can be generated for non-standard base pairing schemes in the loop and bulge regions. A Monte Carlo search method can sample the possible conformations of RNase P RNA that are consistent with the programmed constraints to generate a three-dimensional structure.
[0052]
Reports on the generation of three-dimensional in silico representations are available from the perspective of library design, development, and screening for protein targets. Similarly, several attempts in the field of RNA modeling have been reported in the literature. However, there is no report on the use of a structure-based design approach to query in silico representations of organic molecules, “small” molecules, polynucleotides, or other nucleic acids with a three-dimensional in silico representation of RNase P RNA structure. The present invention preferably uses computer software that allows the construction of a three-dimensional model of RNase P RNA structure; the construction of three-dimensional in silico representations of multiple organic compounds, “small” molecules, polymer compounds, polynucleotides and other nucleic acids; the in silico screening of such representations against the molecular interaction sites of RNase P RNA; the scoring and identification of the best possible binders from the multiple compounds; and finally the synthesis of such compounds in a combinatorial format and the experimental testing of such compounds to identify new ligands for such RNase P RNA targets.
[0053]
Molecules that can be screened using the methods of the present invention include, but are not limited to, organic or inorganic individual compounds of small to large molecular weight, and combinatorial mixtures or libraries of ligands, inhibitors, agonists, antagonists, substrates, and biopolymers such as peptides or polynucleotides. Combinatorial mixtures include, but are not limited to, collections of compounds and libraries of compounds. These mixtures can be produced by combinatorial synthesis or by mixing individual compounds. Collections of compounds include, but are not limited to, sets of individual compounds and sets of mixtures or pools of compounds. These combinatorial libraries can be obtained by synthesis or from natural sources, such as microbial, plant, marine, viral, and animal materials. Combinatorial libraries contain at least about 20 compounds and can contain thousands of individual compounds or even more. When the combinatorial library is a mixture of compounds, the mixture typically contains 20-5000 compounds, preferably 50-1000, more preferably 50-100 compounds. Combinations of 100-500 compounds are also useful, as are mixtures having 500-1000 individual species. Typically, members of a combinatorial library have a molecular weight of less than about 10,000 Da, more preferably less than 7,500 Da, and most preferably less than 5,000 Da.
[0054]
An important advance in the field of virtual screening was the development of a software program called DOCK, which allows structure-based database searches to find and identify interactions of known molecules with a receptor of interest (Kuntz et al., Acc. Chem. Res., 1994, 27, 117; Gschwend and Kuntz, J. Comput.-Aided Mol. Des., 1996, 10, 123). DOCK allows the screening of molecules whose three-dimensional structures have been generated in silico but for which no prior knowledge of their interaction with the receptor is available. DOCK therefore provides a tool to assist in the discovery of new ligands for the receptor in question. Thus, DOCK can be used to dock a compound prepared by the method of the invention to a desired target. The implementation of DOCK is described, for example, in International Application Publication No. WO 99/58947, which is hereby incorporated by reference in its entirety.
[0055]
In some embodiments of the invention, an automated computer-based search algorithm, as described above, is used to predict all acceptable three-dimensional molecular interaction site structures from RNase P RNA that satisfy the biochemical and genomic constraints specified by the user. These structures are clustered into different families based on, for example, their root mean square deviation (RMSD) values. Further structural refinement of the representative member(s) of each family can be performed by molecular dynamics with explicit solvent and cations.
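Clustering predicted structures into families by RMSD can be sketched as follows. This is a minimal illustration under stated assumptions: the structures are assumed to be already superimposed, coordinates are plain lists of (x, y, z) tuples, and a simple greedy "leader" scheme stands in for whatever clustering method is actually used. The function names and the cutoff value are hypothetical.

```python
import math


def rmsd(a, b):
    """Root mean square deviation between two pre-aligned coordinate lists."""
    if len(a) != len(b):
        raise ValueError("structures must contain the same number of atoms")
    squared = sum((p - q) ** 2 for u, v in zip(a, b) for p, q in zip(u, v))
    return math.sqrt(squared / len(a))


def cluster_by_rmsd(structures, cutoff):
    """Greedy leader clustering.

    Each structure joins the first family whose representative (its first
    member) lies within `cutoff`; otherwise it founds a new family.
    Returns a list of families, each a list of structure indices.
    """
    families = []
    for index, structure in enumerate(structures):
        for family in families:
            if rmsd(structures[family[0]], structure) <= cutoff:
                family.append(index)
                break
        else:
            families.append([index])
    return families
```

The first index of each family can serve as the representative member passed on to further refinement.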
[0056]
Structural enumeration and representation by these software programs is typically done by drawing molecular skeletons and substituents in two dimensions. Once drawn and stored in a computer, these molecules can be converted into three-dimensional structures using algorithms present in commercially available software. Preferably, MC-SYM is used to create a three-dimensional representation of the molecular interaction site. Converting the two-dimensional structure of the molecular interaction site into a three-dimensional model typically yields a low-energy conformation or a collection of low-energy conformers for each molecule. The end result of these commercially available programs is the conversion of the RNase P RNA sequence containing the molecular interaction site into a family of similar numerical representations of the three-dimensional structure of the molecular interaction site. These numerical representations form an ensemble data set.
[0057]
The three-dimensional structure of a plurality of compounds, preferably “small” organic compounds, can be displayed as a compound data set containing a numerical representation of the three-dimensional structure of these compounds. In this context, “small” molecules refer to non-oligomeric organic compounds. The two-dimensional structure of a compound can be converted to a three-dimensional structure and used for querying the three-dimensional structure of the molecular interaction site, as described above with respect to the molecular interaction site. By using commercially available structure rendering algorithms, two-dimensional structures of compounds can be rapidly generated. Three-dimensional representations of compounds that are polymeric in nature, such as polynucleotides or other nucleic acid structures, can be generated using the literature methods described above. Three-dimensional structures of “small” molecules or other compounds can be generated and low energy conformations can be obtained from short molecular dynamics minimization. These three-dimensional structures can be stored in a relational database. The compound in which the three-dimensional structure is constructed can be proprietary, commercially available, or virtual.
[0058]
In some embodiments of the invention, a compound data set comprising numerical representations of the three-dimensional structures of a plurality of organic compounds is generated, for example, with Converter (MSI, San Diego) from two-dimensional compound libraries produced, for example, by a computer program modified from a commercial program. Other suitable databases can be constructed by converting the two-dimensional structures of chemical compounds into three-dimensional structures as described above. The end result is the conversion of the two-dimensional structures of organic compounds into numerical representations of their three-dimensional structures. These numerical representations are presented as a compound data set.
[0059]
Once both the numerical representation of the three-dimensional structure of the polynucleotide containing the molecular interaction site and the compound data set containing numerical representations of the three-dimensional structures of multiple organic compounds have been obtained, the numerical representation of the molecular interaction site is compared with members of the compound data set to produce a ranked order of organic compounds. The order is ranked according to the ability of each organic compound to form physical interactions with the molecular interaction site. Preferably, the comparison is performed sequentially on members of the compound data set. According to some embodiments, the comparison can be performed simultaneously against a plurality of polynucleotides comprising molecular interaction sites.
[0060]
A variety of theoretical and combinatorial methods for studying and optimizing the interaction of “small” molecules or organic compounds with biological targets such as nucleic acids are known to those skilled in the art. These structure-based drug design tools have been very useful in modeling the interactions between proteins and small-molecule ligands and in optimizing these interactions. Typically, this type of study has been performed when the structure of the protein receptor is known, by querying individual small molecules against the receptor one at a time. Usually, these small molecules either had been co-crystallized with the receptor, were related to other molecules that had been co-crystallized, or were molecules for which some body of knowledge about their interaction with the receptor already existed. Using DOCK as described above, one can find and identify molecules that are expected to bind to a polynucleotide containing a molecular interaction site, and thus to the RNase P RNA in question. DOCK 4.0 is commercially available from the Regents of the University of California. Equivalent programs are also encompassed by the present invention.
[0061]
The DOCK program has been widely applied to protein targets and to the identification of ligands that bind to them. Typically, new classes of molecules that bind to known targets have been identified and subsequently confirmed by in vitro experiments. The DOCK software suite includes the programs SPHGEN (Kuntz et al., J. Mol. Biol., 1982, 161, 269) and CHEMGRID (Meng et al., J. Comput. Chem., 1992, 13, 505), each of which is incorporated herein by reference in its entirety. SPHGEN generates clusters of overlapping spheres that describe the solvent-accessible surface of the binding pocket within the target receptor. Each cluster represents a possible binding site for a small molecule. CHEMGRID precalculates the information required for force-field scoring of the interaction between the binding molecule and the target RNase P RNA and stores it in a grid file. The scoring function approximates the molecular mechanics interaction energy and consists of a van der Waals component and an electrostatic component. DOCK orients the ligand molecule in the target site on the RNase P RNA using the selected cluster of spheres. Each molecule in the pre-generated three-dimensional database is tested in thousands of orientations within the site, and each orientation is evaluated by the scoring function. Only the best-scoring orientation for each compound screened in this way is stored in the output file. Finally, all compounds in the database are ranked by score, and a collection of the best candidates can then be screened experimentally.
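The scoring and ranking steps described above can be sketched in simplified form. The following is not the DOCK implementation: it evaluates a pairwise sum directly rather than using a precomputed grid, uses a single hypothetical Lennard-Jones parameter pair (`epsilon`, `sigma`) for all atoms, and assumes a distance-dependent dielectric for the electrostatic term. Atoms are represented as `(x, y, z, charge)` tuples, and all function names are illustrative.

```python
import math

COULOMB = 332.0  # approximate conversion constant, kcal*Angstrom/(mol*e^2)


def interaction_energy(ligand, target, epsilon=0.1, sigma=3.5, dielectric=4.0):
    """Approximate energy: Lennard-Jones 6-12 term plus Coulomb term.

    The electrostatic term uses a distance-dependent dielectric
    (dielectric * r), which is an assumption of this sketch.
    """
    energy = 0.0
    for lx, ly, lz, lq in ligand:
        for tx, ty, tz, tq in target:
            r = math.dist((lx, ly, lz), (tx, ty, tz))
            sr6 = (sigma / r) ** 6
            energy += 4.0 * epsilon * (sr6 * sr6 - sr6)       # van der Waals
            energy += COULOMB * lq * tq / (dielectric * r * r)  # electrostatic
    return energy


def best_orientation(orientations, target):
    """Return (index, energy) of the lowest-energy orientation of one compound."""
    scores = [interaction_energy(o, target) for o in orientations]
    index = min(range(len(scores)), key=scores.__getitem__)
    return index, scores[index]


def rank_compounds(library, target):
    """Keep only the best orientation per compound, then sort by energy."""
    ranked = [(name, best_orientation(orients, target)[1])
              for name, orients in library.items()]
    return sorted(ranked, key=lambda item: item[1])
```

Lower (more negative) energies rank first, mirroring the practice of storing only the best-scoring orientation for each compound before ranking the database.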
[0062]
A large number of ligands for a variety of protein targets have been identified using DOCK. Recent efforts in this field have produced reports on the use of DOCK to identify and design small-molecule ligands that exhibit binding specificity for nucleic acids such as RNA duplexes. Although RNA plays an important role in many diseases, such as AIDS, viral infections, and bacterial infections, little research has been done on small molecules capable of specific RNA binding. For RNA duplexes, compounds with specificity were identified using the DOCK methodology on the basis of the unique geometry of the deep major groove. Chen et al., Biochemistry, 1997, 36, 11402; and Kuntz et al., Acc. Chem. Res., 1994, 27, 117. The application of DOCK to the problem of ligand recognition in DNA quadruplexes has also recently been reported. Chen et al., Proc. Natl. Acad. Sci., 1996, 93, 2635.
Individual compounds are preferably represented, for example, as MOL files and combined into a collection of in silico representations using an appropriate chemical structure program or equivalent software. These two-dimensional MOL files are exported and converted to three-dimensional structures using commercial software such as Converter (Molecular Simulations Inc., San Diego) or equivalent software, as described above. Atom types suitable for use by docking programs such as DOCK or QXP can be assigned to all atoms in the three-dimensional MOL files using, for example, software such as Babel or equivalent software.
[0063]
The low energy conformation of each molecule is generated by software such as Discover (MSI, San Diego). An orientation search is performed by bringing each of a plurality of compounds close to the molecular interaction site in many orientations using DOCK or QXP. A contact score for each orientation is determined and then the optimal orientation of the compound is used. Alternatively, the conformation of the compound can be determined from a pre-determined scaffold template conformation.
[0064]
The interaction between a plurality of compounds and a molecular interaction site is examined by comparing the numerical representation of the molecular interaction site with members of the compound data set. Preferably, the compounds, whether created by a computer program or otherwise, are compared to the molecular interaction site while being subjected to random “moves” about their dihedral bonds. Preferably, about 20,000-100,000 compounds are compared to at least one molecular interaction site. Typically, 20,000 compounds are compared and scored against about 5 molecular interaction sites. Individual conformations of the three-dimensional structures are placed in the target site in many orientations. Furthermore, during execution of the DOCK program, the compound and the molecular interaction site are allowed to be “flexible” so that optimal hydrogen-bonding, electrostatic, and van der Waals contacts can be achieved. The interaction energy is calculated and stored for 10-15 possible orientations of the compound and molecular interaction site. The QXP methodology is currently preferred because it allows true flexibility in both the ligand and the target.
[0065]
The relative weights of the individual energy contributions are continually updated to ensure that the binding scores calculated for all compounds reflect experimental binding data. The binding energy for each orientation is scored on the basis of hydrogen bonding, van der Waals contacts, electrostatics, solvation/desolvation, and quality of fit. The low-energy van der Waals, dipolar, and hydrogen-bonding interactions between the compound and the molecular interaction site are determined and summed. In some embodiments, these parameters can be adjusted according to empirically obtained results. The binding energy of each molecule to the target is output to a relational database. The relational database contains a ranked order of compounds, ranked according to their ability to form physical interactions with the molecular interaction site. Higher-ranked compounds are better able to form physical interactions with the molecular interaction site.
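One way to picture the weighted scoring and the updating of weights against experimental data is the sketch below. It is an assumption-laden illustration rather than the scheme used by DOCK or QXP: the score is taken to be a linear weighted sum of named energy terms, and the update is a single gradient-descent step on the mean squared error between calculated scores and measured binding values. The term names, `binding_score`, `update_weights`, and the learning rate are all hypothetical.

```python
def binding_score(terms, weights):
    """Weighted sum of energy contributions, e.g. hydrogen bonding and van der Waals."""
    return sum(weights[name] * value for name, value in terms.items())


def update_weights(weights, observations, learning_rate=0.01):
    """One gradient-descent step on the mean squared error.

    `observations` is a list of (terms, measured_binding_energy) pairs;
    every term name must also be a key of `weights`.
    """
    gradient = {name: 0.0 for name in weights}
    for terms, measured in observations:
        error = binding_score(terms, weights) - measured
        for name, value in terms.items():
            gradient[name] += 2.0 * error * value / len(observations)
    return {name: w - learning_rate * gradient[name] for name, w in weights.items()}
```

Repeating the update as new binding data arrive moves the calculated scores toward the measured values, which is the intent of adjusting the parameters according to empirically obtained results.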
[0066]
In other embodiments, the highest-ranking, i.e., best-fitting, compounds are selected for synthesis. In some embodiments of the invention, compounds believed to have the desired binding characteristics on the basis of the binding data are selected for synthesis. Preferably, the highest-ranking 5% are selected for synthesis. More preferably, the highest-ranking 10% are selected for synthesis. Even more preferably, the highest-ranking 20% are selected for synthesis. The selected compounds can be synthesized in automated fashion using a parallel array synthesizer, or can be prepared using solution-phase or solid-phase methods and equipment. Furthermore, the interaction between the highly ranked compounds and the nucleic acid containing the molecular interaction site is evaluated as described below.
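Selecting the highest-ranking 5%, 10%, or 20% of a ranked list is simple arithmetic, sketched below. The function name `select_top_percent` is hypothetical, and rounding the cutoff up to a whole compound is an assumption of this sketch, not a requirement stated in this description.

```python
def select_top_percent(ranked, percent):
    """Return the first `percent` percent of an already ranked list.

    The count is rounded up, using integer arithmetic to avoid
    floating-point surprises.
    """
    if not 0 < percent <= 100:
        raise ValueError("percent must be in the range (0, 100]")
    count = (len(ranked) * percent + 99) // 100
    return ranked[:count]
```

For a ranked library of 20,000 compounds, the 5%, 10%, and 20% cutoffs correspond to 1,000, 2,000, and 4,000 compounds, respectively.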
[0067]
The interaction of highly ranked organic compounds with polynucleotides containing RNase P RNA molecular interaction sites can be assessed by numerous methods known to those skilled in the art. For example, the highest-ranking compounds can be tested for activity in high-throughput screening (HTS) functional assays and cell-based screens. HTS assay formats include scintillation proximity, sedimentation, fluorescence-based formats, filtration-based assays, colorimetric assays, and the like. Lead compounds can then be scaled up and tested for activity and toxicity in animal models. The assessment preferably includes mass spectrometry or a functional bioassay of a mixture of the RNase P RNA polynucleotide and at least one compound.
[0068]
Certain evaluation methods using mass spectrometry are described in International Application Publication No. WO 99/45150, which is hereby incorporated by reference in its entirety as an illustration of certain useful mass spectrometry methods for use in accordance with the present invention. However, it should be clearly understood that it is not essential to use these specific mass spectrometry methods to practice the present invention. Rather, any evaluation method can be undertaken as long as the purpose of the present invention is maintained.
[0069]
In some embodiments of the invention, the highest-ranking 20% of compounds from the ranked order generated using DOCK or QXP are used to form a further data set of three-dimensional representations of organic compounds, comprising compounds that are chemically related to the highest-ranking compounds. Although the best-fitting compounds are likely to be within the highest-ranking 1%, additional compounds, up to about the top 20%, are selected for the second comparison to provide diversity (ring size, chain length, functional groups). This process ensures that small errors in the molecular interaction site do not propagate into the compound identification process. For example, the structure/score data obtained from the highest-ranking 20% are studied mathematically (clustered) to find trends or features within the compounds that enhance binding. The compounds are clustered into different groups. The chemical synthesis and screening of compounds described above make it possible to correlate the calculated DOCK or QXP scores with actual binding data. After the compounds have been prepared and screened, the predicted binding energy and the observed Kd value are correlated for each compound.
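Correlating predicted binding energies with observed Kd values can be sketched as follows. The sketch assumes that each Kd is first converted to a binding free energy with the standard relation ΔG = RT ln Kd (Kd in mol/L) and that a Pearson correlation coefficient is an adequate measure of agreement; neither choice is prescribed by this description, and the function names are hypothetical.

```python
import math

GAS_CONSTANT = 0.0019872  # kcal/(mol*K)


def kd_to_free_energy(kd_molar, temperature=298.15):
    """Binding free energy in kcal/mol from a dissociation constant in mol/L."""
    return GAS_CONSTANT * temperature * math.log(kd_molar)


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def score_vs_binding(predicted_energies, observed_kds):
    """Correlation between predicted energies and free energies derived from Kd."""
    observed = [kd_to_free_energy(kd) for kd in observed_kds]
    return pearson(predicted_energies, observed)
```

A coefficient near 1 would indicate that the calculated scores track the measured affinities; a low coefficient would suggest that the scoring weights need adjustment.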
[0070]
The results are used to develop a predictive scoring scheme that appropriately weights the various factors (steric, electrostatic). The above strategy allows rapid evaluation of many scaffolds of different sizes and shapes, bearing different functional groups, for the highly ranked compounds. In this way, a further data set of representations of organic compounds, comprising compounds chemically related to the organic compounds ranked highly in the ranked order, can be compared with the numerical representation of the molecular interaction site to determine a further ranked order, ranked according to the ability of the compounds to form physical interactions with the molecular interaction site. Thus, a further data set of three-dimensional structural representations of compounds related to the highly ranked compounds is obtained and is optimized by correlating actual binding with virtual binding. The entire cycle can be repeated as necessary until the desired number of highest-ranking compounds is obtained.
[0071]
Compounds that are known to have affinity and specificity for a target biomolecule, particularly a target RNase P RNA, or that are known to be able to bind to and modulate a target RNase P RNA, can be tagged or labeled in a detectable manner according to some embodiments of the invention. Such labeling can include any of the labeling formats known to those skilled in the art, for example, fluorophores, radioactive labels, enzyme labels, and many other formats. Such labeling or tagging facilitates the detection of molecular interaction sites and permits easy chromosome mapping and other useful processes.
[0072]
In order that the invention disclosed herein may be more fully understood, the following examples are provided. It should be understood that these examples are for illustrative purposes only and should not be construed as limiting the invention in any way. In addition to those described herein, various modifications of the present invention will be apparent to those skilled in the art from the foregoing description. Such modifications are also intended to fall within the scope of the claims. Furthermore, the disclosures of each patent, patent application, and publication cited and described herein are hereby incorporated by reference in their entirety.
[0073]
Example
Example 1: Selection of RNase P RNA
To explain the strategy for identifying molecular interaction sites for small molecules, RNase P RNA was used. The structure of the RNase P RNA is disclosed in Massire et al., J. Mol. Biol., 1998, 279, 773-793. The RNase P RNA is an RNA of about 375 to 400 nucleotides that folds into several domains.
[0074]
Example 2: Molecular interaction sites in RNase P RNA
Numerous molecular interaction sites have been discovered within RNase P RNA. Site 1 includes a region of RNA comprising first and second polynucleotides. The first polynucleotide comprises from about 24 nucleotides to about 69 nucleotides, a portion of which forms double-stranded RNA, and has the following features (5′ to 3′): a dangling region comprising from about 1 nucleotide to about 3 nucleotides; the first side of a first stem comprising from about 3 nucleotides to about 8 nucleotides; the first side of a second stem comprising from about 3 nucleotides to about 8 nucleotides; a first terminal loop comprising from about 3 nucleotides to about 8 nucleotides; the second side of the second stem comprising from about 3 nucleotides to about 8 nucleotides; the first side of a third stem comprising from about 2 nucleotides to about 6 nucleotides; a second terminal loop comprising from about 2 nucleotides to about 6 nucleotides; the second side of the third stem comprising from about 2 nucleotides to about 6 nucleotides, optionally with a bulge comprising from about 1 nucleotide to about 3 nucleotides on the second side of the third stem; the first side of a fourth stem comprising from about 2 nucleotides to about 6 nucleotides, optionally with a bulge comprising from about 1 nucleotide to about 5 nucleotides on the first side of the fourth stem; and a dangling region comprising from about 1 nucleotide to about 5 nucleotides. The second polynucleotide comprises from about 8 nucleotides to about 22 nucleotides, a portion of which forms double-stranded RNA, and has the following features (5′ to 3′): a dangling region comprising from about 3 nucleotides to about 8 nucleotides; the second side of the fourth stem comprising from about 2 nucleotides to about 6 nucleotides; and the second side of the first stem comprising from about 3 nucleotides to about 8 nucleotides.
[0075]
With respect to Site 1, the first polynucleotide preferably comprises 45 nucleotides, a portion of which forms double-stranded RNA, and has the following features (5′ to 3′): a dangling region comprising 2 nucleotides; the first side of the first stem comprising 5 nucleotides; the first side of the second stem comprising 5 nucleotides; a first terminal loop comprising 5 nucleotides; the second side of the second stem comprising 5 nucleotides; the first side of the third stem comprising 4 nucleotides; a second terminal loop comprising 4 nucleotides; the second side of the third stem comprising 4 nucleotides, with a bulge comprising 1 nucleotide between the third and fourth nucleotides of the second side of the third stem; the first side of the fourth stem comprising 4 nucleotides, with a bulge comprising 3 nucleotides between the second and third nucleotides of the first side of the fourth stem; and a dangling region comprising 3 nucleotides. Preferably, the first polynucleotide has the sequence:
[0076]
[Chemical 1]

[0077]
(SEQ ID NO: 1) (bold nucleotides indicate preferred base pairing). The second polynucleotide preferably comprises 14 nucleotides, a portion of which forms double-stranded RNA, and has the following features (5′ to 3′): a dangling region comprising 5 nucleotides; the second side of the fourth stem comprising 4 nucleotides; and the second side of the first stem comprising 5 nucleotides. Preferably, the second polynucleotide has the sequence:
[0078]
[Chemical formula 2]

[0079]
(SEQ ID NO: 2) (bold nucleotides indicate preferred base pairing). Site 1 is present in E. coli as shown in FIG.
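As a consistency check on the preferred Site 1 description, the feature lengths can be summed and compared with the stated polynucleotide lengths of 45 and 14 nucleotides. The sketch below assumes that the 1-nucleotide and 3-nucleotide bulges are counted within the first polynucleotide; the list layout and the names `SITE1_FIRST`, `SITE1_SECOND`, and `total_length` are illustrative only.

```python
# Preferred Site 1 features, listed 5' to 3' as (feature, length in nucleotides).
SITE1_FIRST = [
    ("dangling region", 2),
    ("stem 1, first side", 5),
    ("stem 2, first side", 5),
    ("terminal loop 1", 5),
    ("stem 2, second side", 5),
    ("stem 3, first side", 4),
    ("terminal loop 2", 4),
    ("stem 3, second side", 4),
    ("bulge in stem 3, second side", 1),
    ("stem 4, first side", 4),
    ("bulge in stem 4, first side", 3),
    ("dangling region", 3),
]

SITE1_SECOND = [
    ("dangling region", 5),
    ("stem 4, second side", 4),
    ("stem 1, second side", 5),
]


def total_length(features):
    """Total number of nucleotides across all listed features."""
    return sum(length for _, length in features)


def paired_sides_match(first, second, stem):
    """Check that both sides of a stem split across two strands have equal length."""
    side_a = dict(first)[f"{stem}, first side"]
    side_b = dict(second)[f"{stem}, second side"]
    return side_a == side_b
```

Under these assumptions the totals come to 45 and 14 nucleotides, and the two sides of stems 1 and 4, which pair across the two polynucleotides, have matching lengths.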
Site 2 includes a region of RNA comprising first, second, and third polynucleotides. The first polynucleotide comprises from about 6 nucleotides to about 16 nucleotides, a portion of which forms double-stranded RNA, and has the following features (5′ to 3′): a dangling region comprising from about 1 nucleotide to about 3 nucleotides; and the first side of a first stem comprising from about 4 nucleotides to about 10 nucleotides, optionally with a bulge comprising from about 1 nucleotide to about 3 nucleotides on the first side of the first stem. The second polynucleotide comprises from about 13 nucleotides to about 34 nucleotides, a portion of which forms double-stranded RNA, and has the following features (5′ to 3′): the second side of the first stem comprising from about 4 nucleotides to about 10 nucleotides, optionally with a bulge comprising from about 1 nucleotide to about 3 nucleotides on the second side of the first stem; a bulge comprising from about 4 nucleotides to about 10 nucleotides; the first side of a second stem comprising from about 3 nucleotides to about 9 nucleotides; and a dangling region comprising from about 1 nucleotide to about 2 nucleotides. The third polynucleotide comprises from about 5 nucleotides to about 13 nucleotides, a portion of which forms double-stranded RNA, and has the following features (5′ to 3′): a dangling region comprising from about 1 nucleotide to about 2 nucleotides; the second side of the second stem comprising from about 3 nucleotides to about 9 nucleotides; and a dangling region comprising from about 1 nucleotide to about 2 nucleotides.
[0080]
With respect to Site 2, the first polynucleotide preferably comprises 11 nucleotides, a portion of which forms double-stranded RNA, and has the following features (5′ to 3′): a dangling region comprising 2 nucleotides; and the first side of the first stem comprising 7 nucleotides, with a bulge comprising 2 nucleotides between the fifth and sixth nucleotides of the first side of the first stem. Preferably, the first polynucleotide has the sequence:
[0081]
[Chemical Formula 3]

[0082]
(SEQ ID NO: 3) (bold nucleotides indicate preferred base pairing). The second polynucleotide preferably comprises 23 nucleotides, a portion of which forms double-stranded RNA, and has the following features (5′ to 3′): the second side of the first stem comprising 7 nucleotides, with a bulge comprising 2 nucleotides between the fifth and sixth nucleotides of the second side of the first stem; a bulge comprising 7 nucleotides; the first side of the second stem comprising 6 nucleotides; and a dangling region comprising 1 nucleotide. Preferably, the second polynucleotide has the sequence:
[0083]
[Formula 4]

[0084]
(SEQ ID NO: 4) (bold nucleotides indicate preferred base pairing). The third polynucleotide preferably comprises 8 nucleotides, a portion of which forms double-stranded RNA, and has the following features (5′ to 3′): a dangling region comprising 1 nucleotide; the second side of the second stem comprising 6 nucleotides; and a dangling region comprising 1 nucleotide. Preferably, the third polynucleotide has the sequence:
[0085]
[Chemical formula 5]

[0086]
(SEQ ID NO: 5) (bold nucleotides indicate preferred base pairing). Site 2 is present in E. coli as shown in FIG.
Site 3 includes a region of RNA comprising first and second polynucleotides. The first polynucleotide comprises from about 10 nucleotides to about 26 nucleotides, a portion of which forms double-stranded RNA, and has the following features (5′ to 3′): a dangling region comprising from about 1 nucleotide to about 3 nucleotides; the first side of a first stem comprising from about 2 nucleotides to about 6 nucleotides; the first side of an internal loop comprising from about 3 nucleotides to about 9 nucleotides; the first side of a second stem comprising from about 3 nucleotides to about 6 nucleotides; and a dangling region comprising from about 1 nucleotide to about 2 nucleotides. The second polynucleotide comprises from about 10 nucleotides to about 27 nucleotides, a portion of which forms double-stranded RNA, and has the following features (5′ to 3′): the second side of the second stem comprising from about 3 nucleotides to about 9 nucleotides; the second side of the internal loop comprising from about 3 nucleotides to about 7 nucleotides; the second side of the first stem comprising from about 2 nucleotides to about 6 nucleotides; and a dangling region comprising from about 2 nucleotides to about 5 nucleotides.
[0087]
With respect to Site 3, the first polynucleotide preferably comprises 19 nucleotides, a portion of which forms double-stranded RNA, and has the following features (5′ to 3′): a dangling region comprising 2 nucleotides; the first side of the first stem comprising 4 nucleotides; the first side of the internal loop comprising 6 nucleotides; the first side of the second stem comprising 6 nucleotides; and a dangling region comprising 1 nucleotide. Preferably, the first polynucleotide has the sequence:
[0088]
[Chemical 6]

[0089]
(SEQ ID NO: 6) (bold nucleotides indicate preferred base pairing). The second polynucleotide preferably comprises 18 nucleotides, a portion of which forms double-stranded RNA, and has the following features (5′ to 3′): the second side of the second stem comprising 6 nucleotides; the second side of the internal loop comprising 5 nucleotides; the second side of the first stem comprising 4 nucleotides; and a dangling region comprising 3 nucleotides. Preferably, the second polynucleotide has the sequence:
[0090]
[Chemical 7]

[0091]
(SEQ ID NO: 7) (bold nucleotides indicate preferred base pairing). Site 3 is present in E. coli as shown in FIG.
Site 4 includes a region of RNA comprising a polynucleotide of from about 12 nucleotides to about 34 nucleotides, a portion of which forms double-stranded RNA, and which has the following features (5′ to 3′): the first side of a stem comprising from about 3 nucleotides to about 9 nucleotides, with the first side of an internal loop comprising from about 2 nucleotides to about 5 nucleotides present on the first side of the stem; a terminal loop comprising from about 2 nucleotides to about 6 nucleotides; the second side of the stem comprising from about 3 nucleotides to about 9 nucleotides, with the second side of the internal loop comprising from about 1 nucleotide to about 3 nucleotides present on the second side of the stem; and a dangling region comprising from about 1 nucleotide to about 2 nucleotides.
[0092]
With respect to Site 4, the region of RNA preferably comprises a polynucleotide of 22 nucleotides, a portion of which forms double-stranded RNA, and which has the following features (5′ to 3′): the first side of the stem comprising 6 nucleotides, with the first side of the internal loop comprising 3 nucleotides between the third and fourth nucleotides of the first side of the stem; a terminal loop comprising 4 nucleotides; the second side of the stem comprising 6 nucleotides, with the second side of the internal loop comprising 2 nucleotides between the third and fourth nucleotides of the second side of the stem; and a dangling region comprising 1 nucleotide. Preferably, the polynucleotide has the sequence:
[0093]
[Chemical 8]

[0094]
(SEQ ID NO: 8) (bold nucleotides indicate preferred base pairing). Site 4 is present in B. subtilis as shown in FIG.
Site 5 comprises a region of RNA comprising first, second, third, fourth and fifth polynucleotides. The first polynucleotide comprises from about 3 to about 9 nucleotides, portions of which form double-stranded RNA, and has the following features (5′ to 3′): a first side of a first stem comprising from about 2 to about 6 nucleotides, and a first side of a second stem comprising from about 1 to about 3 nucleotides. The second polynucleotide comprises from about 3 to about 8 nucleotides, portions of which form double-stranded RNA, and has the following features (5′ to 3′): a second side of the second stem comprising from about 1 to about 3 nucleotides, and a first side of a third stem comprising from about 2 to about 5 nucleotides. The third polynucleotide comprises from about 7 to about 18 nucleotides, portions of which form double-stranded RNA, and has the following features (5′ to 3′): a second side of the third stem comprising from about 2 to about 5 nucleotides, the second side of the third stem optionally having a bulge comprising from about 1 to about 2 nucleotides; a first side of a fourth stem comprising from about 1 to about 3 nucleotides; a bulge comprising from about 1 to about 3 nucleotides; and a first side of a fifth stem comprising from about 2 to about 5 nucleotides. The fourth polynucleotide comprises from about 8 to about 20 nucleotides, portions of which form double-stranded RNA, and has the following features (5′ to 3′): a second side of the fifth stem comprising from about 2 to about 5 nucleotides, a bulge comprising from about 3 to about 7 nucleotides, a first side of a sixth stem comprising from about 1 to about 3 nucleotides, and a dangling region comprising from about 2 to about 5 nucleotides. The fifth polynucleotide comprises from about 5 to about 15 nucleotides, portions of which form double-stranded RNA, and has the following features (5′ to 3′): a dangling region comprising from about 1 to about 3 nucleotides, a second side of the sixth stem comprising from about 1 to about 3 nucleotides, a second side of the fourth stem comprising from about 1 to about 3 nucleotides, and a second side of the first stem comprising from about 2 to about 6 nucleotides.
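The length ranges recited for site 5 can be cross-checked arithmetically: for each polynucleotide, the minimum and maximum feature lengths should add up to the stated overall range. The following Python sketch is an illustration only and is not part of the disclosure. The dictionary layout and the names `SITE5`, `summed_range` and `check` are assumptions made for the example, and the optional bulge in the third stem is counted with its stated range of 1 to 2 nucleotides.

```python
# Feature length ranges (min, max), in nucleotides, for each site 5
# polynucleotide, transcribed from the description above.
# The dictionary layout and all names here are illustrative assumptions.
SITE5 = {
    "first": {"total": (3, 9), "features": [(2, 6), (1, 3)]},
    "second": {"total": (3, 8), "features": [(1, 3), (2, 5)]},
    # The bulge in the second side of the third stem is optional in the
    # description; it is counted here with its stated range (1, 2).
    "third": {
        "total": (7, 18),
        "features": [(2, 5), (1, 2), (1, 3), (1, 3), (2, 5)],
    },
    "fourth": {"total": (8, 20), "features": [(2, 5), (3, 7), (1, 3), (2, 5)]},
    "fifth": {"total": (5, 15), "features": [(1, 3), (1, 3), (1, 3), (2, 6)]},
}


def summed_range(features):
    """Return (sum of minima, sum of maxima) over a list of (min, max) pairs."""
    return (sum(lo for lo, _ in features), sum(hi for _, hi in features))


def check(site):
    """Map each polynucleotide name to True if its features sum to its total."""
    return {
        name: summed_range(spec["features"]) == spec["total"]
        for name, spec in site.items()
    }


if __name__ == "__main__":
    for name, ok in check(SITE5).items():
        print(name, "consistent" if ok else "inconsistent")
```

Under these assumptions, all five polynucleotides come out consistent, which supports reading the stated totals as the sums of the recited feature lengths.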
[0095]
With respect to site 5, the first polynucleotide preferably comprises 6 nucleotides, portions of which form double-stranded RNA, and has the following features (5′ to 3′): a first side of the first stem comprising 4 nucleotides, and a first side of the second stem comprising 2 nucleotides. Preferably, the first polynucleotide has the sequence:
[0096]
[Chemical 9]

[0097]
(Bold nucleotides indicate preferred base pairing). The second polynucleotide preferably comprises 5 nucleotides, portions of which form double-stranded RNA, and has the following features (5′ to 3′): a second side of the second stem comprising 2 nucleotides, and a first side of the third stem comprising 3 nucleotides. Preferably, the second polynucleotide has the sequence:
[0098]
[Chemical Formula 10]

[0099]
(Bold nucleotides indicate preferred base pairing). The third polynucleotide preferably comprises 10 nucleotides, portions of which form double-stranded RNA, and has the following features (5′ to 3′): a second side of the third stem comprising 3 nucleotides, with a bulge comprising 1 nucleotide between the second and third nucleotides of the second side of the third stem; a first side of the fourth stem comprising 2 nucleotides; a bulge comprising 1 nucleotide; and a first side of the fifth stem comprising 3 nucleotides. Preferably, the third polynucleotide has the sequence:
[0100]
Embedded image

[0101]
(SEQ ID NO: 9) (bold nucleotides indicate preferred base pairing). The fourth polynucleotide preferably comprises 13 nucleotides, portions of which form double-stranded RNA, and has the following features (5′ to 3′): a second side of the fifth stem comprising 3 nucleotides, a bulge comprising 5 nucleotides, a first side of the sixth stem comprising 2 nucleotides, and a dangling region comprising 3 nucleotides. Preferably, the fourth polynucleotide has the sequence:
[0102]
Embedded image

[0103]
(SEQ ID NO: 10) (bold nucleotides indicate preferred base pairing). The fifth polynucleotide preferably comprises 10 nucleotides, portions of which form double-stranded RNA, and has the following features (5′ to 3′): a dangling region comprising 2 nucleotides, a second side of the sixth stem comprising 2 nucleotides, a second side of the fourth stem comprising 2 nucleotides, and a second side of the first stem comprising 4 nucleotides. Preferably, the fifth polynucleotide has the sequence:
[0104]
Embedded image

[0105]
(SEQ ID NO: 11) (bold nucleotides indicate preferred base pairing). Site 5 is present in B. subtilis as shown in FIG.
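The preferred site 5 layout can be checked in two ways: each polynucleotide's features should add up to its stated length, and the two sides of each of the six stems should have the same number of nucleotides, since they pair with each other. The Python sketch below is illustrative only and is not part of the disclosure. The tuple encoding `(kind, stem number, length)` and all names are assumptions made for the example, and the feature list for the third polynucleotide follows the corresponding claim language (3-nucleotide stem side with a 1-nucleotide bulge, 2-nucleotide stem side, 1-nucleotide bulge, 3-nucleotide stem side).

```python
from collections import defaultdict

# Preferred site 5 features, 5' to 3', as (kind, stem number or None, length).
# This encoding and all names are illustrative assumptions.
PREFERRED_SITE5 = {
    "first": [("stem", 1, 4), ("stem", 2, 2)],
    "second": [("stem", 2, 2), ("stem", 3, 3)],
    "third": [
        ("stem", 3, 3),
        ("bulge", None, 1),  # bulge within the second side of the third stem
        ("stem", 4, 2),
        ("bulge", None, 1),
        ("stem", 5, 3),
    ],
    "fourth": [("stem", 5, 3), ("bulge", None, 5), ("stem", 6, 2), ("dangling", None, 3)],
    "fifth": [("dangling", None, 2), ("stem", 6, 2), ("stem", 4, 2), ("stem", 1, 4)],
}

# Preferred polynucleotide lengths stated in the description.
STATED_LENGTHS = {"first": 6, "second": 5, "third": 10, "fourth": 13, "fifth": 10}


def total_lengths(site):
    """Sum the feature lengths of each polynucleotide."""
    return {name: sum(n for _, _, n in feats) for name, feats in site.items()}


def stem_sides(site):
    """Collect the lengths of every stem side, grouped by stem number."""
    sides = defaultdict(list)
    for feats in site.values():
        for kind, stem_id, n in feats:
            if kind == "stem":
                sides[stem_id].append(n)
    return dict(sides)


def stems_pair(site):
    """True if every stem has exactly two sides of equal length."""
    return all(len(v) == 2 and v[0] == v[1] for v in stem_sides(site).values())


if __name__ == "__main__":
    print(total_lengths(PREFERRED_SITE5) == STATED_LENGTHS)
    print(stems_pair(PREFERRED_SITE5))
```

With the values above, both checks pass: the five lengths match the stated 6, 5, 10, 13 and 10 nucleotides, and stems one through six each have two sides of equal length.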
Site 6 comprises a region of RNA comprising a polynucleotide comprising from about 13 to about 34 nucleotides, portions of which form double-stranded RNA, and having the following features (5′ to 3′): a dangling region comprising from about 2 to about 5 nucleotides, a first side of a stem comprising from about 2 to about 5 nucleotides, a terminal loop comprising from about 6 to about 16 nucleotides, a second side of the stem comprising from about 2 to about 5 nucleotides, and a dangling region comprising from about 1 to about 3 nucleotides.
[0106]
With respect to site 6, the region of RNA preferably comprises a polynucleotide comprising 22 nucleotides, portions of which form double-stranded RNA, and having the following features (5′ to 3′): a dangling region comprising 3 nucleotides, a first side of the stem comprising 3 nucleotides, a terminal loop comprising 11 nucleotides, a second side of the stem comprising 3 nucleotides, and a dangling region comprising 2 nucleotides. Preferably, the polynucleotide has the sequence:
[0107]
Embedded image

[0108]
(SEQ ID NO: 12) (bold nucleotides indicate preferred base pairing). Site 6 is present in B. subtilis as shown in FIG.
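The site 6 hairpin (dangling region, stem, terminal loop, stem, dangling region) can be pictured in dot-bracket notation, a common convention in which paired nucleotides are written as matching parentheses and unpaired nucleotides as dots. The Python sketch below is illustrative only. Dot-bracket notation is not used in the disclosure, the function name `hairpin_dot_bracket` is hypothetical, and the sketch encodes only the feature lengths, not the nucleotide sequence of SEQ ID NO: 12.

```python
def hairpin_dot_bracket(dangle_5p, stem, loop, dangle_3p):
    """Build a dot-bracket string for a simple hairpin.

    dangle_5p / dangle_3p: unpaired nucleotides at the 5' and 3' ends
    stem: number of base pairs (length of each stem side)
    loop: number of unpaired nucleotides in the terminal loop
    """
    return (
        "." * dangle_5p
        + "(" * stem
        + "." * loop
        + ")" * stem
        + "." * dangle_3p
    )


# Preferred site 6 layout from the description: 3 + 3 + 11 + 3 + 2 nucleotides.
preferred = hairpin_dot_bracket(3, 3, 11, 2)

# Shortest and longest layouts allowed by the stated "about" ranges.
shortest = hairpin_dot_bracket(2, 2, 6, 1)
longest = hairpin_dot_bracket(5, 5, 16, 3)

if __name__ == "__main__":
    print(preferred, len(preferred))
    print(len(shortest), len(longest))
```

The preferred layout is 22 nucleotides long, and the shortest and longest layouts are 13 and 34 nucleotides, matching the overall ranges given for site 6.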
[Brief description of the drawings]
[0109]
FIG. 1 shows a representative structure of E. coli RNase P RNA showing sites 1, 2 and 3.
FIG. 1A shows a representative structure of E. coli RNase P RNA showing sites 1, 2 and 3.
FIG. 1B shows a representative structure of E. coli RNase P RNA showing sites 1, 2 and 3.
FIG. 1C shows a representative structure of E. coli RNase P RNA showing sites 1, 2 and 3.
FIG. 2 shows a representative structure of B. subtilis RNase P RNA showing sites 4, 5, and 6.
FIG. 2A shows a representative structure of B. subtilis RNase P RNA showing sites 4, 5, and 6.
FIG. 2B shows a representative structure of B. subtilis RNase P RNA showing sites 4, 5, and 6.
FIG. 2C shows a representative structure of B. subtilis RNase P RNA showing sites 4, 5, and 6.

Claims

A composition comprising a first polynucleotide and a second polynucleotide, wherein:
the first polynucleotide comprises at least 26 and no more than 117 nucleotides and comprises a secondary structure defined by a dangling region comprising from about 1 to about 3 nucleotides; a first side of a first stem comprising from about 3 to about 7 nucleotides; a first side of a second stem comprising from about 3 to about 7 nucleotides; a first terminal loop comprising from about 3 to about 7 nucleotides; a second side of the second stem comprising from about 3 to about 7 nucleotides; a first side of a third stem comprising from about 2 to about 6 nucleotides; a second terminal loop comprising from about 2 to about 6 nucleotides; a second side of the third stem comprising from about 2 to about 6 nucleotides, wherein a bulge comprising from about 1 to about 2 nucleotides is present on the second side of the third stem; a first side of a fourth stem comprising from about 2 to about 6 nucleotides, wherein a bulge comprising from about 2 to about 5 nucleotides is present on the first side of the fourth stem; and a dangling region comprising from about 2 to about 5 nucleotides; and
the second polynucleotide comprises a secondary structure defined by a dangling region comprising from about 3 to about 7 nucleotides, a second side of the fourth stem comprising from about 2 to about 6 nucleotides, and a second side of the first stem comprising from about 3 to about 7 nucleotides.

The composition of claim 1, wherein the first polynucleotide comprises at least 45 and no more than 95 nucleotides and comprises a secondary structure defined by a dangling region comprising 2 nucleotides; a first side of the first stem comprising 5 nucleotides; a first side of the second stem comprising 5 nucleotides; a first terminal loop comprising 5 nucleotides; a second side of the second stem comprising 5 nucleotides; a first side of the third stem comprising 4 nucleotides; a second terminal loop comprising 4 nucleotides; a second side of the third stem comprising 4 nucleotides, wherein a bulge comprising 1 nucleotide is present between the third and fourth nucleotides of the second side of the third stem; a first side of the fourth stem comprising 4 nucleotides, wherein a bulge comprising 3 nucleotides is present between the second and third nucleotides of the first side of the fourth stem; and a dangling region; and wherein the second polynucleotide comprises at least 14 and no more than 64 nucleotides and comprises a secondary structure defined by a dangling region comprising 5 nucleotides, a second side of the fourth stem comprising 4 nucleotides, and a second side of the first stem comprising 5 nucleotides.

The composition of claim 2, wherein the first polynucleotide comprises SEQ ID NO: 1.

The composition of claim 2, wherein the second polynucleotide comprises SEQ ID NO: 2.

A composition comprising a first polynucleotide, a second polynucleotide and a third polynucleotide, wherein:
the first polynucleotide comprises at least 6 and no more than 56 nucleotides and comprises a secondary structure defined by a dangling region comprising from about 1 to about 3 nucleotides, and a first side of a first stem comprising from about 4 to about 10 nucleotides, wherein a bulge comprising from about 1 to about 3 nucleotides is present on the first side of the first stem;
the second polynucleotide comprises at least 13 and no more than 84 nucleotides and comprises a secondary structure defined by a second side of the first stem comprising from about 4 to about 10 nucleotides, wherein a bulge comprising from about 1 to about 3 nucleotides is present on the second side of the first stem; a bulge comprising from about 4 to about 10 nucleotides; a first side of a second stem comprising from about 3 to about 9 nucleotides; and a dangling region comprising from about 1 to about 2 nucleotides; and
the third polynucleotide comprises at least 5 and no more than 63 nucleotides and comprises a secondary structure defined by a dangling region comprising from about 1 to about 2 nucleotides, a second side of the second stem comprising from about 3 to about 9 nucleotides, and a dangling region comprising from about 1 to about 2 nucleotides.

The composition of claim 5, wherein:
the first polynucleotide comprises at least 11 and no more than 61 nucleotides and comprises a secondary structure defined by a dangling region comprising 2 nucleotides, and a first side of the first stem comprising 7 nucleotides, wherein a bulge comprising 2 nucleotides is present between the fifth and sixth nucleotides of the first side of the first stem;
the second polynucleotide comprises at least 23 and no more than 73 nucleotides and comprises a secondary structure defined by a second side of the first stem comprising 7 nucleotides, wherein a bulge comprising 2 nucleotides is present between the fifth and sixth nucleotides of the second side of the first stem; a bulge comprising 7 nucleotides; a first side of the second stem comprising 6 nucleotides; and a dangling region comprising 1 nucleotide; and
the third polynucleotide comprises at least 8 and no more than 58 nucleotides and comprises a secondary structure defined by a dangling region comprising 1 nucleotide, a second side of the second stem comprising 6 nucleotides, and a dangling region comprising 1 nucleotide.

The composition of claim 6, wherein the first polynucleotide comprises SEQ ID NO: 3.

The composition of claim 6, wherein the second polynucleotide comprises SEQ ID NO: 4.

The composition of claim 6, wherein the third polynucleotide comprises SEQ ID NO: 5.

A composition comprising a first polynucleotide and a second polynucleotide, wherein:
the first polynucleotide comprises at least 10 and no more than 79 nucleotides and comprises a secondary structure defined by a dangling region comprising from about 1 to about 3 nucleotides, a first side of a first stem comprising from about 2 to about 6 nucleotides, a first side of an internal loop comprising from about 3 to about 9 nucleotides, a first side of a second stem comprising from about 3 to about 9 nucleotides, and a dangling region comprising from about 1 to about 2 nucleotides; and
the second polynucleotide comprises at least 10 and no more than 77 nucleotides and comprises a secondary structure defined by a second side of the second stem comprising from about 3 to about 9 nucleotides, a second side of the internal loop comprising from about 3 to about 7 nucleotides, a second side of the first stem comprising from about 2 to about 6 nucleotides, and a dangling region comprising from about 2 to about 5 nucleotides.

The composition of claim 10, wherein the first polynucleotide comprises at least 19 and no more than 69 nucleotides and comprises a secondary structure defined by a dangling region comprising 2 nucleotides, a first side of the first stem comprising 4 nucleotides, a first side of the internal loop comprising 6 nucleotides, a first side of the second stem comprising 6 nucleotides, and a dangling region comprising 1 nucleotide; and the second polynucleotide comprises at least 18 and no more than 68 nucleotides and comprises a secondary structure defined by a second side of the second stem comprising 6 nucleotides, a second side of the internal loop comprising 5 nucleotides, a second side of the first stem comprising 4 nucleotides, and a dangling region comprising 3 nucleotides.

12. The composition of claim 11, wherein the first polynucleotide comprises SEQ ID NO: 6.

The composition of claim 11, wherein the second polynucleotide comprises SEQ ID NO: 7.

A polynucleotide comprising at least 12 and no more than 84 nucleotides and comprising a secondary structure defined by a first side of a stem comprising from about 3 to about 9 nucleotides, wherein a first side of an internal loop comprising from about 2 to about 5 nucleotides is present on the first side of the stem; a terminal loop comprising from about 2 to about 6 nucleotides; a second side of the stem comprising from about 3 to about 9 nucleotides, wherein a second side of the internal loop comprising from about 1 to about 3 nucleotides is present on the second side of the stem; and a dangling region comprising from about 1 to about 2 nucleotides.

15. The polynucleotide of claim 14, comprising at least 22 and no more than 72 nucleotides and comprising a secondary structure defined by a first side of the stem comprising 6 nucleotides, wherein a first side of the internal loop comprising 3 nucleotides is present between the third and fourth nucleotides of the first side of the stem; a terminal loop comprising 4 nucleotides; a second side of the stem comprising 6 nucleotides, wherein a second side of the internal loop comprising 2 nucleotides is present between the third and fourth nucleotides of the second side of the stem; and a dangling region comprising 1 nucleotide.

The polynucleotide of claim 15 comprising SEQ ID NO: 8.

A composition comprising a first polynucleotide, a second polynucleotide, a third polynucleotide, a fourth polynucleotide and a fifth polynucleotide, wherein:
the first polynucleotide comprises at least 3 and no more than 59 nucleotides and comprises a secondary structure defined by a first side of a first stem comprising from about 2 to about 6 nucleotides, and a first side of a second stem comprising from about 1 to about 3 nucleotides;
the second polynucleotide comprises at least 3 and no more than 58 nucleotides and comprises a secondary structure defined by a second side of the second stem comprising from about 1 to about 3 nucleotides, and a first side of a third stem comprising from about 2 to about 5 nucleotides;
the third polynucleotide comprises at least 7 and no more than 68 nucleotides and comprises a secondary structure defined by a second side of the third stem comprising from about 2 to about 5 nucleotides, wherein a bulge comprising from about 1 to about 2 nucleotides is present on the second side of the third stem; a first side of a fourth stem comprising from about 1 to about 3 nucleotides; a bulge comprising from about 1 to about 3 nucleotides; and a first side of a fifth stem comprising from about 2 to about 5 nucleotides;
the fourth polynucleotide comprises at least 8 and no more than 70 nucleotides and comprises a secondary structure defined by a second side of the fifth stem comprising from about 2 to about 5 nucleotides, a bulge comprising from about 3 to about 7 nucleotides, a first side of a sixth stem comprising from about 1 to about 3 nucleotides, and a dangling region comprising from about 2 to about 5 nucleotides; and
the fifth polynucleotide comprises at least 5 and no more than 65 nucleotides and comprises a secondary structure defined by a dangling region comprising from about 1 to about 3 nucleotides, a second side of the sixth stem comprising from about 1 to about 3 nucleotides, a second side of the fourth stem comprising from about 1 to about 3 nucleotides, and a second side of the first stem comprising from about 2 to about 6 nucleotides.

18. The composition of claim 17, wherein:
the first polynucleotide comprises at least 6 and no more than 56 nucleotides and comprises a secondary structure defined by a first side of the first stem comprising 4 nucleotides and a first side of the second stem comprising 2 nucleotides;
the second polynucleotide comprises at least 5 and no more than 55 nucleotides and comprises a secondary structure defined by a second side of the second stem comprising 2 nucleotides and a first side of the third stem comprising 3 nucleotides;
the third polynucleotide comprises at least 10 and no more than 60 nucleotides and comprises a secondary structure defined by a second side of the third stem comprising 3 nucleotides, wherein a bulge comprising 1 nucleotide is present between the second and third nucleotides of the second side of the third stem; a first side of the fourth stem comprising 2 nucleotides; a bulge comprising 1 nucleotide; and a first side of the fifth stem comprising 3 nucleotides;
the fourth polynucleotide comprises at least 13 and no more than 63 nucleotides and comprises a secondary structure defined by a second side of the fifth stem comprising 3 nucleotides, a bulge comprising 5 nucleotides, a first side of the sixth stem comprising 2 nucleotides, and a dangling region comprising 3 nucleotides; and
the fifth polynucleotide comprises at least 10 and no more than 50 nucleotides and comprises a secondary structure defined by a dangling region comprising 2 nucleotides, a second side of the sixth stem comprising 2 nucleotides, a second side of the fourth stem comprising 2 nucleotides, and a second side of the first stem comprising 4 nucleotides.

19. The composition of claim 18, wherein the first polynucleotide comprises 5'-cgucccc-3'.

The composition of claim 18, wherein the second polynucleotide comprises 5'-ggggca-3'.

The composition of claim 18, wherein the third polynucleotide comprises SEQ ID NO: 9.

The composition of claim 18, wherein the fourth polynucleotide comprises SEQ ID NO: 10.

The composition of claim 18, wherein the fifth polynucleotide comprises SEQ ID NO: 11.

A polynucleotide comprising at least 13 and no more than 84 nucleotides and comprising a secondary structure defined by a dangling region comprising from about 2 to about 5 nucleotides, a first side of a stem comprising from about 2 to about 5 nucleotides, a terminal loop comprising from about 6 to about 16 nucleotides, a second side of the stem comprising from about 2 to about 5 nucleotides, and a dangling region comprising from about 1 to about 3 nucleotides.

25. The polynucleotide of claim 24, comprising at least 22 and no more than 72 nucleotides and comprising a secondary structure defined by a dangling region comprising 3 nucleotides, a first side of the stem comprising 3 nucleotides, a terminal loop comprising 11 nucleotides, a second side of the stem comprising 3 nucleotides, and a dangling region comprising 2 nucleotides.

26. The polynucleotide of claim 25 comprising SEQ ID NO: 12.