JP2003527649A

JP2003527649A - System and method for database similarity join

Info

Publication number: JP2003527649A
Application number: JP2000614159A
Authority: JP
Inventors: ハースト，ジヨン・アール; ハツトン，スコツト・ジー
Original assignee: アリーナ・フアーマシユーチカルズ・インコーポレーテツド
Priority date: 1999-04-28
Filing date: 2000-04-26
Publication date: 2003-09-16
Also published as: EP1323068A2; CA2370064A1; MXPA01010906A; WO2000065484A3; US20040153250A1; WO2000065484A2; US6721754B1; AU4500000A

Abstract

(57) [Abstract] Disclosed are systems and methods for organizing information whereby characteristics of an entity are inferred from the characteristics of similar entities. This is referred to herein as a "fuzzy similarity join" and is exemplified using a chemical similarity join. The disclosed systems and methods are particularly useful for compound analysis in the field of drug development.

Description

Detailed Description of the Invention

【０００１】[0001]

BACKGROUND OF THE INVENTION

【０００２】[0002]

FIELD OF THE INVENTION

The present invention relates generally to information databases, more particularly to database similarity joins, and still more particularly to systems and methods for organizing information so that characteristics of an entity can be inferred from the characteristics of similar entities. This is referred to herein as a "fuzzy" similarity join and is exemplified using a chemical similarity join.

【０００３】[0003]

[Background information]

【０００４】[0004]

[Development of drug candidates]

Chemists, biologists, and other users routinely make and test series of compounds to investigate and substantiate hypotheses. In the process, users often search for compounds that exhibit certain characteristics or behave according to certain metrics, and may then seek to synthesize compounds with similar characteristics or behavior patterns.

【０００５】The search for a compound of commercial value usually starts with broad selection and testing. An example is the general screening typically used in the first stage of drug discovery. Drug discovery is used here as an example, but the same type of process is used in agrochemical discovery, materials science research, and other related fields.

【０００６】In high-throughput screening (HTS), the number of compounds assayed and tested for a promising biological response often ranges from 50,000 to 500,000 or more. The ultimate goal is to find a small subset of this large set of compounds that is active in the biological screen, and to treat those compounds as "leads" that can be further developed into final drug candidates. The initial library of compounds tested represents many different classes of chemicals.

【０００７】The chemicals in the initial library can be obtained from several sources, including in-house development by conventional synthesis, commercial purchase, combinatorial chemistry, and natural product extraction. These compounds are typically placed in microtiter plates. Typical plate formats include 96- and 384-well plates, but the trend is toward higher-density plates such as 1536- and 3456-well plates. These plates are commonly handled by robots to perform the biological screening.

【０００８】The screen itself is usually based on a biological receptor. Either the receptor is isolated so that binding to the receptor can be measured more or less directly, or a cell line is engineered to give a detectable response when the receptor is modulated by a potential chemical lead.

【０００９】Most initial libraries contain thousands of compounds, yet even the most extensive libraries represent a mere subset of the countless potential chemical structures that might have "drug-like" properties. The total number of commercially available compounds is currently estimated to be limited to about one million.

【００１０】The list of compounds to be screened can be chosen at random from those available or, more often, is chosen with some intuitive "bias" on the part of a chemist or biologist working on a particular project. This bias is often beneficial to the project, since chemists frequently have unique insight into the types of chemicals that can lead to useful drug candidates. With any bias, however, intuitive methods can sometimes overlook promising novel chemicals.

【００１１】In recent years, the trend has been to select compounds based on their dissimilarity to one another within the final selected set. This process is intended to ensure that many broad classes of compounds are tested. Measures of dissimilarity (diversity metrics) and methods of diversity selection have been much discussed, but they always depend on a measure of "similarity" between two compounds. The general tendency is to choose compounds that are as different from each other as possible, but this often leads to selection of the most chemically "unique" compounds in the set. This approach can therefore cause promising active lead compounds to be missed.
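The diversity selection described above can be sketched as a greedy "max-min" procedure. This is a minimal illustration only, not a method taken from this document: the compound identifiers, the pairwise similarity values, and the function name `pick_diverse` are all hypothetical.

```python
# Hypothetical pairwise similarities (1.0 = identical, 0.0 = nothing in common).
similarity = {
    frozenset({"X1", "X2"}): 0.50,
    frozenset({"X1", "X3"}): 0.00,
    frozenset({"X1", "X4"}): 0.75,
    frozenset({"X2", "X3"}): 0.00,
    frozenset({"X2", "X4"}): 0.75,
    frozenset({"X3", "X4"}): 0.00,
}


def pick_diverse(ids, k, seed):
    """Greedy max-min selection (an assumed strategy, for illustration).

    Starting from `seed`, repeatedly add the compound whose highest
    similarity to any already-picked compound is lowest.
    """
    picked = [seed]
    while len(picked) < min(k, len(ids)):
        remaining = [i for i in ids if i not in picked]
        nxt = min(
            remaining,
            key=lambda i: max(similarity[frozenset({i, p})] for p in picked),
        )
        picked.append(nxt)
    return picked


subset = pick_diverse(["X1", "X2", "X3", "X4"], k=2, seed="X1")
```

With these made-up values the second pick is X3, the compound that resembles nothing else in the set, which illustrates how purely dissimilarity-driven selection tends to favor chemically "unique" compounds.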

【００１２】In guiding these studies, researchers rarely want a selection method that finds large clusters of structurally similar compounds in the library (for example, 5000 benzodiazepam derivatives would be undesirable). Singletons, that is, compounds with no similar structures in the data set, are also generally considered undesirable, because they offer no opportunity to develop structure-activity relationship information. Rather, selection methods that lead to small sets of 10-15 structures are considered preferable. Such small sets of similar compounds permit some analysis of the effect of small structural variations on compound activity (called structure-activity relationship, or SAR, studies). In addition, small clusters help validate screening results. If 5 of the 10 compounds in a small cluster demonstrate biological activity, then, because the cluster allows comparison among chemically related structures, the activity is more likely to be reproducible and open to "optimization".

【００１３】The initial biological screen generally yields compounds called "chemical hits" or simply "hits". A hit is a compound that has been screened in an assay and has demonstrated biological activity above a desired threshold. These hits only rarely include the final drug candidate that goes on to further analysis through animal toxicity studies and, ultimately, human clinical trials. In fact, the hits generally represent leads that are optimized by making small changes in their chemical structure. These changes are generally intended to improve or enhance the biological activity of the lead until additional screening identifies a product candidate. The compounds pursued can be called analogs of the initial hit. This process of hit optimization is commonly referred to as "lead follow-up".

【００１４】Lead follow-up has generally been accomplished by medicinal chemists who make small sets of analogs of several lead compounds. The analogs are then tested for biological potency in the same way as in the initial screen that led to the original hits. Structural changes that increase activity are retained, those that decrease activity are usually discarded, and new modifications to the analog compounds are often made and tested. The medicinal chemist follows up the lead until a compound (or small set of compounds) is confirmed to have adequate potency as a drug candidate.

【００１５】In recent years, medicinal chemists have often been assisted by computer-based design techniques such as Quantitative Structure-Activity Relationship (QSAR) programs. These programs use potency data for previously tested compounds to predict the potency of compounds not yet tested. The goal of a QSAR program is to give an accurate prediction of activity before a compound is tested. QSAR programs have generally been successful, not only in predicting the activity of the final drug candidate but also in allowing a more efficient selection of analogs to synthesize at each stage. Compounds predicted to be active by QSAR methods do not always have the predicted activity, but in general they have a greater chance of being active than compounds in general.

【００１６】Drug development is generally highly competitive. In most cases, therefore, once a drug candidate has been selected, an extensive patent search is conducted to make sure that the candidate itself, or use of the candidate, is not restricted by other patents. Animal toxicity studies generally follow the patent search. If the animal toxicity studies are acceptable, human clinical trials of the drug candidate are conducted.

【００１７】The process of screening, analog preparation, and validation of a potential drug candidate can be very time consuming and expensive. Patent searches, particularly in the field of chemical compounds, are likewise time consuming and expensive. Animal toxicity studies of a potential drug candidate can easily cost in the thousands of dollars. Human clinical trials designed to establish the safety and efficacy of a potential drug candidate in humans exceed ten million dollars. It is therefore imperative to learn as much as possible about a potential drug candidate as early in the process as possible, so that substantial time, effort, and money are not invested in a potential drug candidate that, for example, falls within the claims of a third party's patent, or that is chemically related to another compound whose safety has been demonstrated in human clinical studies.

【００１８】[0018]

[Relational database system]

Relational database systems (RDS) are widely used throughout industry and academia to store and retrieve information on a multitude of subjects. An RDS uses a table structure to store information about the various instances of each entity. These tables define columns, which are the attributes of each data item (row). The data in each column can be of several types, including text, numeric, date/time, binary, and so on. The data in certain columns are indexed so that they can be retrieved more quickly.

【００１９】In the relational model of database design, data that are repeated across several rows are usually split out into a new table/entity definition. This process is called "normalization" and is generally done to protect data consistency and to save space. The relationships between the data in the tables are nevertheless maintained.

【００２０】Data in an RDS are generally queried by a user, or by an application program, by composing a specific query in a query-oriented language. The Oracle™ system is a preferred example of an RDS. In Oracle, as in many other RDSs, queries are submitted using Structured Query Language (SQL). This language facilitates retrieval of information stored in the various tables and allows related data in different tables to be combined. The SQL query construct that performs such a combination of data is called a "join". The term "join" is a term of art in this text and is a noun, not a verb. A join links rows of one table with rows of another table based on some common or related column (attribute). A join can be performed "on the fly" (that is, the join itself is added to each query as it arises), or it can be predefined to provide a pseudo-table commonly referred to as a "view". A view has the appearance of a new table, but generally a view is not stored as such.
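The difference between an on-the-fly join and a predefined view can be sketched as follows. This is a minimal illustration that assumes SQLite, through Python's standard `sqlite3` module, as a stand-in for the Oracle system named in the text; the table names, column names, and data are hypothetical.

```python
import sqlite3

# In-memory database with two normalized tables sharing a compound_id column.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE compound (compound_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE assay (compound_id INTEGER, activity REAL);
    INSERT INTO compound VALUES (1, 'cmpd-A'), (2, 'cmpd-B');
    INSERT INTO assay VALUES (1, 0.82), (2, 0.15);

    -- Predefined join: a view looks like a table but stores no rows itself.
    CREATE VIEW compound_activity AS
        SELECT c.name, a.activity
        FROM compound AS c
        JOIN assay AS a ON a.compound_id = c.compound_id;
""")

# On-the-fly join: the join is written into the query each time it is needed.
on_the_fly = conn.execute(
    "SELECT c.name, a.activity FROM compound AS c "
    "JOIN assay AS a ON a.compound_id = c.compound_id ORDER BY c.name"
).fetchall()

# The same result obtained through the view.
via_view = conn.execute(
    "SELECT name, activity FROM compound_activity ORDER BY name"
).fetchall()
```

Both queries return the same rows; the view simply saves the user from restating the join condition.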

【００２１】[0021]

[The Internet]

The Internet is a useful vehicle for making information located in various places available to a variety of individuals. Indeed, in the vast Internet environment, individuals can access information tools from remote locations. The Internet is a preferred way of accessing information stored in relational databases such as those described above.

【００２２】The Internet, which began in the late 1960s, is a computer network made up of many smaller networks spanning the entire world. Host computers or networks of computers on the Internet allow public access to databases containing information in many fields of expertise. Hosts are sponsored by a wide range of entities including, for example, universities, government organizations, businesses, and individuals.

【００２３】Internet information is made available to the public through servers running on Internet hosts. A server makes documents or files available to anyone accessing the host site. Such files can be stored in a database, preferably located on the host, or on a storage medium such as an optical or magnetic storage device.

【００２４】Networking protocols can be used to facilitate communication between the host and the requesting client. TCP/IP (Transmission Control Protocol/Internet Protocol) is one such networking protocol. Computers on a TCP/IP network use unique identification (ID) codes that uniquely identify each computer or host on the Internet. Such codes can include an IP (Internet Protocol) number or address and a corresponding network or computer name.

【００２５】The World-Wide-Web (Web or www), created in 1991, provides access to information on the Internet while allowing users to navigate Internet resources intuitively, without IP addresses or other specialized knowledge. The Web comprises a vast number of interconnected "pages", or documents, which can be displayed on the monitor of a user's computer. Web pages are served by hosts running special servers. The software that runs these Web servers is relatively simple and is available on a wide range of computer platforms, including PCs. Web browser software is equally available for displaying Web pages as well as traditional non-Web files on the user's system.

【００２６】The Web is based on the concept of hypertext and a transfer method known as HTTP (Hypertext Transfer Protocol). HTTP is designed to run primarily over TCP/IP and uses the standard Internet setup, in which a server issues the data and a client displays or processes it. One format for information transfer is to create documents using Hypertext Markup Language (HTML). HTML pages are made up of standard text together with codes that indicate how the page should be displayed. The browser reads these codes in order to display the page.

【００２７】Each Web page can contain pictures and sounds in addition to text. Associated with certain text, pictures, or sounds are connections to other pages, either on the same server or on other computers on the Internet. These are known as hypertext links. For example, links can be presented as underlined or highlighted words or phrases. Each link points to a Web page by means of a special name called a URL (Uniform Resource Locator). A URL can lead directly to the linked file, even if it resides on another Web server.

【００２８】In addition to the Internet, which allows general retrieval of information by the public, other means of accessing such information exist and are in common use. For example, direct modem connections between two computers and private Internet networks within large institutions and organizations are equally available and useful means for accessing categorized information stored in databases.

【００２９】[0029]

[Outline of the Invention]

The present invention is directed to systems and methods for organizing information in which characteristics of an entity can be inferred from the characteristics of similar entities. According to one aspect of the invention, a database similarity join can be used to infer property- or parameter-related information contained in a first database from property- or parameter-related information contained in a second database. According to this aspect, the invention can provide retrieval of information that is not organized in the manner required or desired by a particular user. It allows retrieval based on common characteristics or similarities between entities organized in unrelated databases. The information can be retrieved and organized in ways that make it more useful to the user.

【００３０】One method that permits such retrieval and organization is referred to as a fuzzy similarity join. With this method, the relationship between the retrieved pieces of information need not be intuitively or systematically related to the way in which they are retrieved. Instead, retrieval of functional information can be based on similarity between entities in one or more databases.

【００３１】These and other aspects of the invention, which can be practiced individually or collectively, are perhaps best explained in terms of an example application. Consider, for example, a chemical-search application in which scientists wish to obtain some information about one or more compounds of interest. With conventional chemical database strategies, the information of interest to the scientist about the subject compound may not be readily available in the database, or may not be available at all. According to one aspect of the invention, the scientist can perform a database join to obtain information about the compound of interest from another database.

【００３２】According to another aspect of the invention in this application, the scientist can perform a chemical similarity join (or fuzzy similarity join) to infer information about the compound of interest based on the properties or other parameters of "similar" compounds. According to this aspect, the chemical similarity join allows the scientist to search one or more databases for information about "similar" compounds. The scientist can use this information to infer the behavior or other properties or parameters of the compound of interest.

【００３３】For example, in one embodiment, a chemical space is defined such that there is a neighborhood effect for the property in question (for example, toxicity), and the property of the compound of interest in one database can then be inferred from the property data of similar compounds in another database. This aspect of the invention in this application therefore allows two tables to be joined through a similarity comparison of two structures. An exact match of the two structures is not required to perform the join operation.
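The neighborhood-based inference described in this paragraph can be sketched as follows. This is an illustration under stated assumptions rather than the disclosed implementation: the chemical space is taken to be two-dimensional, nearness is measured by Euclidean distance, and the inferred property is the mean over neighbors. The identifiers, coordinates, toxicity values, and the function `infer_property` are all hypothetical.

```python
from math import dist

# Hypothetical table 1: compounds of interest, located in a 2-D "chemical space".
query_compounds = {"Q1": (0.20, 0.30), "Q2": (0.90, 0.90)}

# Hypothetical table 2: reference compounds with a measured property (toxicity).
reference_compounds = {
    "R1": ((0.25, 0.30), 0.80),
    "R2": ((0.15, 0.35), 0.60),
    "R3": ((0.70, 0.10), 0.10),
}


def infer_property(query, reference, radius):
    """Estimate each query compound's property from reference neighbors.

    A reference compound counts as a neighbor if it lies within `radius`
    of the query compound; no exact structural match is required.
    Returns None for a query compound that has no neighbors.
    """
    inferred = {}
    for qid, qpos in query.items():
        neighbors = [
            value for pos, value in reference.values() if dist(qpos, pos) <= radius
        ]
        inferred[qid] = sum(neighbors) / len(neighbors) if neighbors else None
    return inferred


estimates = infer_property(query_compounds, reference_compounds, radius=0.1)
```

Here Q1 has two neighbors (R1 and R2) and receives the mean of their toxicity values, while Q2 has no neighbor within the radius, so nothing is inferred for it.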

【００３４】According to another aspect of the invention, a search tool can be used that combines actual compound data with virtual data sets to facilitate a neighborhood search around a preferred set of property metrics. The neighborhood relationship can serve as the basis of a similarity join.

【００３５】According to another aspect of the invention, a data set can be screened to exclude records having particular or identified properties or characteristics. In addition, the data set can be combined with other data to allow further screening that eliminates undesirable classes of records.

【００３６】According to yet another aspect of the invention, the search tool can be linked to an ordering system that allows the user to purchase the identified items.

【００３７】These and other features, advantages, and aspects of the invention are described in further detail below.

【００３８】[0038]

Detailed Description of the Preferred Embodiments

【００３９】[0039]

[Introduction and overview]

The present invention relates generally to information databases, more particularly to database similarity joins, and still more particularly to systems and methods for organizing information so that characteristics of an entity can be inferred from the characteristics of similar entities. The invention disclosed in this patent document is applicable to, and useful for, retrieving information that a particular user needs or desires but that is typically not organized in a way that serves that user. Retrieval is based on common characteristics between entities organized in unrelated databases, which can make the information more useful than the data would otherwise be.

【００４０】A method that permits such retrieval and organization is referred to herein as a "fuzzy similarity join". In it, the relationship between the retrieved pieces of information is not intuitively or systematically related to the way in which they are retrieved; rather, the relationship is based on the needs of a user who would otherwise have to search laboriously (if it were possible at all) through unrelated or diffuse data sources for the required data. Indeed, unlike a "fixed" catalog (whether print or electronic) that forces the user to obtain information in a particular, limited way according to the goals of the catalog's creator, the present invention allows information to be retrieved "fluidly" based on the user's needs and objectives.

【００４１】A fuzzy similarity join does not include traditional joins, which generally require organizing and retrieving information from data sets based on, for example, (1) identifying at least one attribute common to the databases being compared, or (2) satisfying a set of criteria defined by standard relational database operators (for example, equal to, greater than, less than, includes, excludes, and so on). For example, the joins of FIGS. 4A, 4B, and 4C constitute traditional joins and are not fuzzy similarity joins. A fuzzy similarity join is based on non-traditional database operators. In this context, a non-traditional operator evaluates the similarity of two elements. "Similarity" here is a composite index of some or all of the attributes of the elements being compared, and the composite index can arbitrarily be assigned a value greater than 0 and less than 1 (neither exactly 0 nor exactly 1). For a traditional join, the composite index for the two elements would have to be exactly 1 (or the elements would have to fully satisfy the conditions set by the standard relational operators). For a fuzzy similarity join, the composite index must be greater than some value predetermined by the user. For example, in the preferred embodiment of the fuzzy similarity join disclosed herein (the chemical similarity join), the Tanimoto coefficient serves as the composite index, and to establish a chemical similarity join the user sets a minimum value for the Tanimoto coefficient based on the user's particular needs.
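The threshold rule in this paragraph can be sketched with the Tanimoto coefficient computed over feature sets, that is, the number of shared features divided by the number of distinct features in either set. The representation of each compound as a set of integer feature identifiers, the library contents, and the function names are assumptions made for illustration; the document does not specify them at this point.

```python
def tanimoto(a, b):
    """Tanimoto coefficient of two feature sets: |A & B| / |A | B|."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def fuzzy_similarity_join(left, right, min_similarity):
    """Pair entries whose Tanimoto coefficient exceeds the user-set minimum."""
    return [
        (lid, rid, round(tanimoto(lfp, rfp), 3))
        for lid, lfp in left.items()
        for rid, rfp in right.items()
        if tanimoto(lfp, rfp) > min_similarity
    ]


# Hypothetical libraries: compound id -> set of structural feature ids.
library_a = {"A1": {1, 2, 3, 4}, "A2": {7, 8}}
library_b = {"B1": {1, 2, 3, 5}, "B2": {1, 2, 3, 4}, "B3": {9}}

# Fuzzy similarity join with a user-chosen minimum of 0.5.
fuzzy = fuzzy_similarity_join(library_a, library_b, min_similarity=0.5)

# Traditional equality join on the same attribute, for comparison.
exact = [
    (lid, rid)
    for lid, lfp in library_a.items()
    for rid, rfp in library_b.items()
    if lfp == rfp
]
```

With this data the equality join finds only the identical pair (A1, B2), whereas the fuzzy join also pairs A1 with B1, whose coefficient is 3/5 = 0.6. Raising or lowering `min_similarity` tightens or loosens the join according to the user's needs.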

【００４２】Accordingly, although this disclosure focuses on a particular embodiment directed to one specific type of information, namely chemical compound information, this disclosure, when viewed by one of ordinary skill in the art, will provide the opportunity to apply the invention to other areas such as, without limitation, biological compounds, metallurgical compounds, genetic information, health trends, population studies, political and public-opinion trends, and the like. For efficiency of presentation, however, and without limitation, this disclosure focuses on one particular fuzzy similarity join, the chemical similarity join.

【００４３】According to one aspect of the invention, the search tool combines actual compound data with virtual data sets to facilitate searching the neighborhood around a preferred set of property metrics.

【００４４】According to another aspect of the invention, the data set can be screened to eliminate compounds having particular or identified properties. In addition, the data set can be combined with other data to allow further filtering that excludes undesirable classes of compounds. For example, the data set can be combined with information relating to patent coverage, toxicity information, binding data, and the like. According to another aspect of the invention, the search tool can be linked to an ordering system that allows the user to purchase the identified compounds.

【００４５】[0045] The invention and its various aspects, which may be implemented individually or collectively, are described in terms of the Internet as the preferred interface tool. Description in these terms is provided for ease of explanation only. After reading this disclosure, it will be apparent to one of ordinary skill in the art that the invention may be implemented in any of a number of search environments.

【００４６】[0046]

[Library information integration]

The most preferred embodiment of library information integration, namely the integration of chemical libraries, is now described. As noted above, it is cast in terms of an Internet environment.

【００４７】[0047] FIG. 1 is a block diagram generally illustrating one application of a library integration information system according to one embodiment of the invention. The application shown in FIG. 1 includes a library integration server 104 having access to libraries 106, 108 and to other data sources 110. In the preferred embodiment, and in the context of the chemical similarity join, library 106 comprises one or more chemical compound libraries that library integration server 104 can access and search. Library 106 contains information about known and existing chemical compounds and can further include information such as property metrics for the compounds. In one embodiment, in which the library integration information system can also be used to purchase compounds, library 106 can include price, availability, and delivery information for the purchase of available compounds.

【００４８】[0048] Depending on the application of the chemical library information system and on the library 106 used with it, library 106 can be located at server 104 or remote from it. A local library 106 can be used to provide direct access to the chemical data it contains and can be maintained in association with chemical navigation server 104. However, a compound library 106 can also exist and be maintained by a third party, and can be accessible to server 104 through a telecommunications network such as a dial-up link, a network, the Internet, or another communication medium. Under this scenario, the ability of server 104 to use additional datasets during operation can be extended to external datasets.

【００４９】[0049] One or more virtual compound libraries 108 (only one is shown) preferably include an identification of a dataset of virtual compounds that can supplement a dataset of known or available compounds. "Virtual" is a term of art that generally refers to chemical compounds that may not physically exist. A virtual compound can be defined by a known synthetic process by which it could be synthesized. Preferably, the virtual compound library contains a plurality of virtual or hypothetical chemical compounds having a particular set of properties defined by one or more property metrics. As described in detail below, the mapping or combination of a virtual dataset with a known dataset can be used to enhance compound search techniques. Like compound library 106, virtual compound library 108 can be maintained locally or at a remote location and can be accessed by server 104 through a variety of connection technologies.

【００５０】[0050] A user who wishes to access the chemical library information system can connect to server 104 through the user's computer or workstation 102. User workstation 102 can connect to server 104 using any of a variety of connection techniques, including, for example, a direct connection or a network connection. In the preferred approach, users can access server 104 from various remote locations over the Internet.

【００５１】[0051] The third category of library shown in FIG. 1 is a library containing other data 110. The other data can include data relevant to chemical compounds that is of interest to a user evaluating compounds through server 104, such as patent data, toxicity data, compound-to-biological-target binding data, or other data. As with libraries 106, 108, the data contained in library 110 can reside in a common database with the other data, or in a separate database that server 104 can access locally or remotely.

【００５２】[0052] Although library 106, virtual compound library 108, and other data sources 110 are illustrated in FIG. 1 as separate databases, it will be apparent to one of ordinary skill in the art that the data they contain can be provided in one or more physical or logical databases. Further, although a single server 104 is shown in FIG. 1, the system can be implemented using one or more servers 104.

【００５３】[0053] The example application shown in FIG. 1 comprises at least two classes of library: compound library 106 and virtual compound library 108. As noted above, library 106 contains information about the known or existing chemicals or compounds listed in the library. Known chemical compounds can be cataloged according to the values of a plurality of property metrics associated with each compound. According to one aspect of the invention, this cataloging can be accomplished by mapping the compounds into chemical space and combining known and virtual compounds in that space.

【００５４】[0054]

[Metrics and chemical space]

Before describing the mapping of compounds in chemical space, a mathematical introduction to "space" is provided. As used herein, a "space" is defined by a set of parameters and a distance function. The domain of each parameter can be the set of real or complex numbers or any subset thereof (e.g., integers, positive integers). Parameters based on a non-continuous set of numbers are called "discrete parameters". A specific value for each parameter of the space defines a point in that space. The distance function yields a non-negative real value for any two points in the space.

【００５５】[0055] Most people are familiar with the spaces associated with physically accessible everyday experience: two-dimensional (2D) and three-dimensional (3D) Euclidean space. Euclidean space has properties that do not hold for all spaces; in particular, the properties of Euclidean space do not hold for the space containing all chemical compounds (chemical space).

【００５６】[0056] Chemical space, unlike Euclidean space, is understood as a space into which some or all possible compounds can be mapped. This space contains a very large number of compounds: the number of organic chemical compounds with a molecular weight of 800 or less has been estimated at about 10²⁰⁴.

【００５７】[0057] Chemical space is generally defined by parameters that can be calculated from the chemical structures of compounds, together with a dissimilarity distance function based on those parameters. Parameters that can be used to define a chemical space include, but are not limited to, the following.

【００５８】[0058] cLogP, the calculated (estimated) partition coefficient of a compound between octanol and water; molecular weight (MW); Sterimol parameters, which relate to size; Kier and Hall parameters, which relate to the electronic properties of the structure; BCUT metrics as used in the DiverseSolutions™ software package (U. Texas); the various fingerprint metrics used in software packages commercially available from MDL, Tripos Inc., and Daylight Chemical Information Systems; and the molecular hologram metrics defined in the HQSAR™ software commercially available from Tripos. These are examples and are not meant to be exhaustive.

【００５９】[0059] Unity fingerprint space and HQSAR hologram space (both from Tripos, Inc., St. Louis, MO) are examples of the set of chemical space concepts described above; the Unity fingerprint, for instance, is an example of a composite chemical space metric. The invention, however, does not depend on these spaces and is not limited to these spaces or metrics.

【００６０】[0060] In Unity fingerprint space, the parameters are the individual bits of the Unity fingerprint bitmap. The bitmap is built for each chemical structure by decomposing the structure into all of its substructure fragments. A unique representation of each fragment is then mapped to a position in the bitmap. Fingerprint bitmaps are typically 200 to 1000 bits long. When a fragment is found in the structure, the bit to which it maps is set to 1. The fingerprint is therefore a series of bits indicating the presence of all of the fragments in the structure.

【００６１】[0061] The same fragment, however it is encountered, should always set the same bit in the fingerprint. Two structures that share many of the same fragments will therefore have many of the same bits set in their corresponding fingerprints.
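The bitmap construction described above can be sketched in Python as follows. This is only an illustration under stated assumptions, not the Unity algorithm: the fragment strings are hypothetical placeholders (a real system would derive them from the structure), and the CRC32-based mapping of a fragment to a bit position is an assumption chosen because it is deterministic across runs.

```python
import zlib

def fingerprint(fragments, n_bits=1000):
    """Return the set of bit positions switched on by the given fragments."""
    bits = set()
    for fragment in fragments:
        # crc32 is deterministic, so a given fragment always maps to the same bit,
        # no matter where or how often it is encountered.
        position = zlib.crc32(fragment.encode("utf-8")) % n_bits
        bits.add(position)
    return bits

# Hypothetical fragment strings standing in for substructure fragments.
smaller_structure = ["C", "O", "CC", "CO", "CCO"]
larger_structure = ["C", "O", "CC", "CO", "CCO", "CCC", "CCCO"]

fp_small = fingerprint(smaller_structure)
fp_large = fingerprint(larger_structure)
shared_bits = fp_small & fp_large  # bits set in both fingerprints
```

Because every fragment of the smaller structure also occurs in the larger one, every bit set in `fp_small` is also set in `fp_large`, which is the sense in which structures sharing fragments share bits.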

【００６２】[0062] For example, if a 1000-bit fingerprint is used, the chemical space so defined is a 1000-dimensional discrete space in which each parameter can be either 0 or 1. This space is a 1000-dimensional "hypercube", and each possible chemical maps onto one of the vertices of the hypercube.

【００６３】[0063] The distance function for this space is based on one of the similarity indices, which range from 0 to 100%. The distance is 1 minus the value of the similarity index (d = 1 − sim). Common similarity indices include the Tanimoto index, the cosine index, and several others.

【００６４】[0064] The Tanimoto index is defined as follows.

【００６５】[0065] sim = C / (A + B − C)

where C is the number of bits set (= 1) in both fingerprints, A is the number of bits set in the first fingerprint, and B is the number of bits set in the second fingerprint. The Tanimoto coefficient is used almost universally, and in applications the coefficient generally correlates with a chemist's intuition about compound similarity.
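As a worked illustration of the formula sim = C / (A + B − C) and of the distance d = 1 − sim, the following Python sketch represents each fingerprint as the set of its set-bit positions. That representation, the function names, and the handling of two empty fingerprints are assumptions made for the example.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto index C / (A + B - C) for fingerprints given as sets of set-bit positions."""
    a = len(fp_a)          # A: bits set in the first fingerprint
    b = len(fp_b)          # B: bits set in the second fingerprint
    c = len(fp_a & fp_b)   # C: bits set in both fingerprints
    if a + b - c == 0:
        return 1.0  # assumption: two empty fingerprints are treated as identical
    return c / (a + b - c)

def distance(fp_a, fp_b):
    """Distance derived from the similarity index, d = 1 - sim."""
    return 1.0 - tanimoto(fp_a, fp_b)

first = {0, 1, 2, 3}
second = {2, 3, 4}
similarity = tanimoto(first, second)  # C = 2, A = 4, B = 3, so 2 / (4 + 3 - 2)
separation = distance(first, second)
```

Identical fingerprints give a similarity of 1 and a distance of 0, and fingerprints with no bits in common give a similarity of 0 and a distance of 1.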

【００６６】[0066] For simplicity of explanation, this document depicts chemical space as a 2D Euclidean space. This is done for convenience only. The chemical space defined by Unity fingerprints and the Tanimoto index is not a Euclidean space: it does not obey the triangle inequality, and it is discrete.

【００６７】[0067] In chemical space, all compounds within a certain distance of a given compound (that is, having at least a certain similarity index to it) define the "neighborhood" of that compound. These compounds are said to lie within the similarity radius of the compound, or within the compound's neighborhood sphere. In a non-Euclidean space, a sphere is defined as all points within a certain distance of another point. A non-Euclidean sphere does not necessarily have the properties of a sphere in 3D Euclidean space: it has no defined surface or surface area, and no volume. For convenience of representation, neighborhood spheres in the figures are depicted here as 2D projections of spheres (circles).
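A neighborhood in this sense can be sketched as a simple filter over a library. The Python example below is a minimal illustration, assuming fingerprints stored as sets of set-bit positions and hypothetical compound names; it uses the Tanimoto index as the similarity measure and expresses the radius as a minimum similarity.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto index; the union size equals A + B - C."""
    union = len(fp_a | fp_b)
    return len(fp_a & fp_b) / union if union else 1.0

def neighborhood(center, library, min_similarity):
    """Return names of compounds whose similarity to `center` is at least `min_similarity`."""
    return sorted(
        name for name, fp in library.items()
        if tanimoto(center, fp) >= min_similarity
    )

# Hypothetical library: compound name -> fingerprint (set-bit positions).
library = {
    "cmpd_1": {1, 2, 3, 4},
    "cmpd_2": {1, 2, 3, 5},
    "cmpd_3": {7, 8, 9},
}
query = {1, 2, 3, 4}
neighbors = neighborhood(query, library, min_similarity=0.5)
```

Here `cmpd_2` shares three of five distinct bits with the query (similarity 0.6) and falls inside the neighborhood, while `cmpd_3` shares none and falls outside it.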

【００６８】[0068]

[Two-dimensional representation of multidimensional chemical space]

Having introduced the concept of chemical space, an example of mapping compounds into a multidimensional chemical space is now given.

【００６９】[0069] FIG. 2 is a diagram showing the mapping of an example subset of known chemical compounds in a hypothetical 2D chemical space. An actual chemical space is multidimensional and involves many metrics rather than two (metric 1 and metric 2). In FIG. 2, the compounds are mapped into this space based on their property values for the two property metrics.

【００７０】[0070] Chemical database systems exist that map compounds into chemical space according to property metric values and further define neighborhoods for compounds according to those values. Examples of such databases include Tripos's Unity database system, MDL's Isis™ database system, and others. After reading this disclosure, it will be apparent to one of ordinary skill in the art that these commercially available databases can be used as one or more of the libraries 106, or that custom libraries can be created for a given application.

【００７１】[0071] Further, although FIG. 2 shows a combination of two libraries, one library or any number of libraries can be combined in a given chemical space for navigation. Because the population of available compounds is dynamic and constantly changing, it is contemplated that library 106 can be updated and can change to reflect changes in the chemical universe. Further, as in this illustrative drawing, many different chemical spaces can be defined, and all possible chemicals can be mapped into each of these spaces.

【００７２】[0072] In FIG. 2, compounds are represented by points 122 and circles 124. The use of points 122 and circles 124 indicates that the known compounds mapped into the chemical space can come from two different databases: one database of compounds represented by points 122 and another represented by circles 124. As will be apparent to one of ordinary skill in the art after reading this specification, compounds from additional databases can also be mapped into this chemical space. Likewise, compounds from one or more databases can be mapped into a multidimensional space representing up to N property metrics for the compounds.

【００７３】[0073] FIG. 2 also shows various chemical neighborhoods 126 that make up the 2D chemical space. The "cell-based" neighborhoods 126 are shown in FIG. 2 bounded by dashed lines. Cell-based neighborhoods are defined by partitioning each metric into a number of "boxes". This approach can produce cell-based neighborhoods that include boxes containing no compounds at all. Compounds 122, 124 that fall within a given cell-based neighborhood can be considered similar compounds.
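The cell-based partitioning described above can be sketched as follows. This Python example is only an illustration: the two metric values per compound, the compound names, and the uniform box width are assumptions, and a real system could use any number of metrics and unequal boxes.

```python
from collections import defaultdict
import math

def cell_of(values, cell_widths):
    """Map a tuple of metric values to the index of the box that contains it."""
    return tuple(math.floor(v / w) for v, w in zip(values, cell_widths))

def cell_neighborhoods(compounds, cell_widths):
    """Group compound names by the cell into which their metric values fall."""
    cells = defaultdict(list)
    for name, values in compounds.items():
        cells[cell_of(values, cell_widths)].append(name)
    return dict(cells)

# Hypothetical compounds with (metric 1, metric 2) values.
compounds = {
    "a": (0.5, 1.5),
    "b": (0.9, 1.1),
    "c": (3.2, 0.4),
}
cells = cell_neighborhoods(compounds, cell_widths=(1.0, 1.0))
```

Compounds `a` and `b` land in the same box and would be treated as similar; boxes that hold no compound simply never appear in the result, which corresponds to the empty boxes mentioned in the text.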

【００７４】[0074] An alternative, preferred neighborhood is based on the points within a region of, or within a certain distance of, a given compound or virtual compound, as will be described with reference to FIG. 3 and large circle 132. Compounds within circle 132, which include actual compounds 122A and 124A, are said to be in the neighborhood of virtual compound 130A. Such "distance-based" neighborhoods can be defined by actual or virtual compounds, with each compound defining its own neighborhood.

【００７５】[0075] FIG. 3 shows an example of a virtual compound library in chemical space combined with libraries of known chemicals such as those shown in FIG. 2. In one embodiment of the invention, virtual library 108 contains a large number of virtual compounds compared with the number of compounds in library 106, and the virtual compounds are preferably distributed relatively uniformly across the chemical space. As a result of this implementation, a virtual compound can be found that closely matches a given set of property metric values, regardless of whether a compound having those property metric values has previously been synthesized.

【００７６】[0076] As shown in FIG. 3, there is a relatively large number of virtual compounds 130 compared with known compounds 122, 124. Because the virtual compounds are numerous, they can occupy many positions within the defined chemical space. From a probabilistic standpoint, it is therefore more likely that a user will find a virtual compound that corresponds to, or closely matches, a given set of values for the desired property metrics specified by the user. By combining the virtual dataset with the known dataset, virtual compounds can be grouped into neighborhoods with known compounds.

【００７７】[0077]

[Neighborhood effect]

A chemical space can be considered useful if it exhibits a "neighborhood effect" for some relevant property. A neighborhood effect occurs when compounds that are similar to a particular compound having a particular desired property value are more likely to have a property value similar to that of the particular compound than is the population of all compounds. As an example of a neighborhood effect, using Unity fingerprints and the Tanimoto similarity index, it has been found that a compound that is 85% similar to an active compound is more than 30 times as likely to be biologically active as a randomly selected compound.

【００７８】[0078] Chemical spaces that have been found to produce a neighborhood effect for biological activity and toxicity include the b-cut space of DiverseSolutions (U. Texas), fingerprint spaces (Tripos, MDL, Daylight), and the SAR space of Cerius2 (MSI). It should be noted, however, that certain properties and metrics do not produce a neighborhood effect. For example, in a chemical space based on molecular weight and cLogP, the neighborhood of an active compound is not expected to show an enhanced likelihood of containing other biologically active compounds. The space defined by molecular weight and cLogP therefore does not produce a neighborhood effect for biological activity. Those of ordinary skill in the art are credited with the ability to establish neighborhood-effect criteria suited to their particular needs.

【００７９】[0079]

[Fuzzy similarity join, chemical similarity join]

The chemical similarity join can be used to help identify compounds that may be of interest to a user. Before describing the chemical similarity join, however, it is useful first to explain the concept of a join operation in a relational database system.

【００８０】[0080] In a relational database system, tables are used to store records of related information. Each table represents occurrences of an entity type, and the attributes of the table therefore define the entity. For example, suppose a table is used to store information about the various departments of a company. The attributes of the table might include items such as department ID number, department name, budget code, manager name, and building location. An example of such a table is shown in FIG. 4A.

【００８１】[0081] Another table can contain information about the company's employees, including employee ID number, social security number, name, department ID number, office location, and job title. An example of such a table is shown in FIG. 4B.

【００８２】[0082] The information in these tables can be combined using a database "join" to produce information of interest. A join combines the rows of one table with those of another based on a relationship between the values in certain columns of the records. For example, the employee table can be combined with the department table to give a new pseudo-table (called a view) that provides the department name for each employee. An example of a Structured Query Language (SQL) query that would create such a join is: SELECT name, office, employee ID, department name FROM department table, employee table WHERE (department ID of department table = department ID of employee table). A view table created using such an SQL query is shown in FIG. 4C.
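The traditional join just described can be tried out with Python's built-in `sqlite3` module. The sketch below is illustrative only: the table names, column names, and rows are hypothetical and are not taken from FIGS. 4A-4C, and SQLite stands in for whatever relational database is actually used.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE department (dept_id INTEGER PRIMARY KEY, dept_name TEXT);
    CREATE TABLE employee (emp_id INTEGER PRIMARY KEY, name TEXT,
                           office TEXT, dept_id INTEGER);
    INSERT INTO department VALUES (10, 'Research'), (20, 'Finance');
    INSERT INTO employee VALUES
        (1, 'Ada', 'B-101', 10),
        (2, 'Ben', 'C-210', 20),
        (3, 'Cy',  'B-105', 10);
""")

# Equality join: a row pair is kept only when the department IDs match exactly.
rows = conn.execute("""
    SELECT employee.name, employee.office, employee.emp_id, department.dept_name
    FROM department, employee
    WHERE department.dept_id = employee.dept_id
    ORDER BY employee.emp_id
""").fetchall()
conn.close()
```

Each employee row is paired with the one department row whose ID is identical, which is the exact-match behavior that the similarity-based join relaxes.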

【００８３】[0083] Relational databases can be used in a similar way to store information about various compound libraries. In the context of chemical compounds and their associated information, different libraries may have different attributes, including chemical structure, vendor, price, location, toxicity data, biological screening data, and so on. The information contained in the various libraries can be joined to increase its value. For example, one table may contain the compounds available from vendor A, and a second table may contain Rat LD₅₀ data for a certain set of compounds. These tables can be joined to provide information about compounds that are both available and have favorable toxicity results. In one embodiment, this join can be accomplished using the chemical structure.

【００８４】[0084] FIG. 5A is a diagram showing an example of a vendor table for compounds. FIG. 5B is a diagram showing an example of a toxicity table for a subset of the compounds listed in the table of FIG. 5A. FIG. 5C is the resulting joined table, showing the availability of the compounds in the table of FIG. 5A together with the Rat LD₅₀ data for the set of compounds in the table of FIG. 5B.

【００８５】[0085] The join illustrated in FIGS. 5A-5C is useful in situations where the structures in an available catalog have been tested for the attribute of interest. For the toxicity example above, the join is useful when the toxicological endpoint and its data are present in the "toxicity" table. Unfortunately, to date very few compounds are available from vendors with known toxicology data, or from chemists who are generating (or are linked to) large amounts of data. Thus, unless a user searching a traditional chemical catalog is looking for a specific compound that has already been tested for toxicity, the user is unlikely to obtain toxicity information about the compound of interest. This is a primary reason why the chemical similarity join is an essential feature of the invention.
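To make the contrast with an exact structure match concrete, the following Python sketch joins a vendor table to a toxicity table whenever the Tanimoto index of two fingerprints reaches a user-chosen minimum. It is a minimal illustration under stated assumptions, not the disclosed implementation: the row layout, compound names, prices, LD50 values, and fingerprints are all hypothetical.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto index for fingerprints given as sets of set-bit positions."""
    union = len(fp_a | fp_b)
    return len(fp_a & fp_b) / union if union else 1.0

def similarity_join(vendor_rows, toxicity_rows, min_similarity):
    """Pair each vendor compound with toxicity records of sufficiently similar compounds."""
    joined = []
    for v in vendor_rows:
        for t in toxicity_rows:
            sim = tanimoto(v["fp"], t["fp"])
            if sim >= min_similarity:  # threshold test replaces the equality test
                joined.append((v["name"], v["price"], t["name"], t["ld50"], round(sim, 2)))
    return joined

# Hypothetical tables; no vendor compound appears verbatim in the toxicity table.
vendor_rows = [
    {"name": "V-1", "price": 25.0, "fp": {1, 2, 3, 4, 5}},
    {"name": "V-2", "price": 40.0, "fp": {10, 11, 12}},
]
toxicity_rows = [
    {"name": "T-7", "ld50": 320.0, "fp": {1, 2, 3, 4, 6}},
    {"name": "T-9", "ld50": 15.0, "fp": {20, 21}},
]
result = similarity_join(vendor_rows, toxicity_rows, min_similarity=0.6)
```

An equality join on structure would return nothing for these tables, whereas the threshold join attaches the toxicity record of `T-7` to `V-1` because the two share four of six distinct bits. The nested loop compares every pair and is meant only to show the join condition, not an efficient search.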

【００８６】[0086] In this context, this disclosure of the chemical similarity join is an example of our disclosure of the "fuzzy" similarity join. In contexts broader than the illustrated one, a fuzzy similarity join follows the same logic: information that would not ordinarily be combined or organized together is organized or combined on the basis of similar properties among the compounds within the library information integration.

【００８７】[0087]

[Genomics]

A similarity index is defined for two sequences based on the number of conserved bases, or on a more sophisticated analysis of the sequences as exemplified by the FastA or Blast search systems. A join based on this similarity measure is used to link the sequence of an unknown gene, or of a gene of unknown function, with a database of well-characterized genes. This helps place the new gene within a family or superfamily, or helps identify its function.

【００８８】[0088]

[Group clinical trial]

Candidates for clinical trials can be described by a wide variety of metrics, including standard demographic data, physical characteristics, personal profiling tools, driving records, and the like. The results of genetic profiling can also be used as descriptors in "people space". A distance measure can be defined in several ways, and characterizes the magnitude of the distance between any two people mapped into this people space. One example of a distance function is as follows: 1) normalize each people-space metric to the range 0-1 based on its minimum and maximum values over the entire set; 2) calculate the distance by the following formula.

【００８９】[0089] d(A, B) = Σᵢ min(Aᵢ, Bᵢ) / Σᵢ max(Aᵢ, Bᵢ)

This is the general form of the Tanimoto coefficient for spaces that are not restricted to parameter values of 0 and 1. Using this similarity measure, a database of potential subjects can be linked to a database of previous trial results in which certain people experienced undesirable side effects. The susceptibility of a new subject can then be estimated from the previous results on the basis of the subject's similarity to previous trial subjects.
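The two steps above (min-max normalization, then the generalized Tanimoto ratio) can be sketched in Python as follows. The sketch assumes the ratio is a sum of element-wise minima over a sum of element-wise maxima, which reduces to C / (A + B − C) for 0/1 values; the people, their three metrics, and the treatment of a constant metric are hypothetical choices made for the example.

```python
def normalize(population):
    """Min-max scale each metric to the range 0-1 across the whole population."""
    columns = list(zip(*population.values()))
    lows = [min(col) for col in columns]
    highs = [max(col) for col in columns]
    scaled = {}
    for name, values in population.items():
        scaled[name] = [
            (v - lo) / (hi - lo) if hi > lo else 0.0  # assumption: constant metric maps to 0
            for v, lo, hi in zip(values, lows, highs)
        ]
    return scaled

def generalized_tanimoto(a, b):
    """Sum of element-wise minima divided by sum of element-wise maxima."""
    numerator = sum(min(x, y) for x, y in zip(a, b))
    denominator = sum(max(x, y) for x, y in zip(a, b))
    return numerator / denominator if denominator else 1.0

# Hypothetical people described by (age, weight in kg, some ordinal score).
people = {
    "p1": (30, 60.0, 1),
    "p2": (50, 80.0, 3),
    "p3": (40, 70.0, 2),
}
scaled = normalize(people)
score = generalized_tanimoto(scaled["p2"], scaled["p3"])
```

With these values `p3` sits midway between the extremes on every metric, so its score against `p2` is one half; identical profiles score 1.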

【００９０】Fuzzy similarity joins, including the chemical similarity join example disclosed herein, can be implemented in a variety of ways, including but not limited to relational database systems, non-relational database systems, file-based information systems, spreadsheet-type systems, or even non-computer-based systems involving index cards or stacks of reports. A preferred embodiment would use the Oracle database system. In Oracle, the fuzzy database join can be implemented as an Oracle cartridge that defines chemical structure as a new entity and similarity comparison as a new operator acting on chemical structure entities. The join can be implemented in Oracle as an external procedure call. In this embodiment, the SQL language is extended for specifying queries, so that a query can specify a fuzzy similarity join along with other query criteria.
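The idea of exposing similarity comparison as an operator usable inside an SQL join condition can be sketched with Python's built-in `sqlite3` module. SQLite's user-defined functions stand in here for the Oracle cartridge mechanism, which is a substitution made purely for illustration. The table names, column names, integer-bitmap fingerprints, and the 0.8 threshold are all hypothetical.

```python
import sqlite3


def tanimoto(fp_a: int, fp_b: int) -> float:
    """Tanimoto coefficient of two fingerprints stored as integer bitmaps."""
    union = bin(fp_a | fp_b).count("1")
    return bin(fp_a & fp_b).count("1") / union if union else 1.0


conn = sqlite3.connect(":memory:")
# Register the Python function so that SQL statements can call tanimoto(a, b).
conn.create_function("tanimoto", 2, tanimoto)

conn.executescript("""
    CREATE TABLE vendor (compound TEXT, fp INTEGER, price REAL);
    CREATE TABLE tox    (compound TEXT, fp INTEGER, rat_ld50 REAL);
    INSERT INTO vendor VALUES ('V1', 31, 12.0), ('V2', 224, 40.0);
    INSERT INTO tox    VALUES ('T1', 63, 250.0), ('T2', 192, 900.0);
""")

# The join condition is a similarity test, combined with an ordinary criterion.
rows = conn.execute("""
    SELECT v.compound, t.compound, t.rat_ld50
    FROM vendor AS v
    JOIN tox AS t
      ON tanimoto(v.fp, t.fp) >= 0.8
     AND v.price < 20.0
    ORDER BY v.compound, t.compound
""").fetchall()
```

Only the pair whose fingerprints share five of six set bits passes the join condition; an exact-match join on `compound` would have returned nothing for these tables.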

【００９１】Another implementation can involve precomputing fingerprint bitmaps or other structure-based metrics and storing the precomputed metrics either in Oracle or in files on a system outside Oracle. Yet another implementation would involve precomputing the similarity of some or all pairs of structures and storing the similarity values in Oracle or in an external file or system.
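The second variant, precomputing pairwise similarities, might look like the following sketch. Fingerprints are modeled as sets of "on" bit positions and the results are kept in an in-memory dictionary; both choices, along with the compound identifiers and the `min_similarity` parameter, are assumptions made for illustration. A real system would persist the table in a database or file.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Tuple


def tanimoto(fp_a: FrozenSet[int], fp_b: FrozenSet[int]) -> float:
    """Shared bits divided by total distinct bits set in either fingerprint."""
    union = len(fp_a | fp_b)
    return len(fp_a & fp_b) / union if union else 1.0


def precompute_pairs(
    fingerprints: Dict[str, FrozenSet[int]], min_similarity: float = 0.0
) -> Dict[Tuple[str, str], float]:
    """Return {(id_a, id_b): similarity} for pairs at or above min_similarity."""
    table = {}
    for id_a, id_b in combinations(sorted(fingerprints), 2):
        s = tanimoto(fingerprints[id_a], fingerprints[id_b])
        if s >= min_similarity:
            table[(id_a, id_b)] = s
    return table


# Hypothetical precomputed fingerprints.
fingerprints = {
    "C1": frozenset({1, 2, 3, 4}),
    "C2": frozenset({1, 2, 3, 4, 5}),
    "C3": frozenset({7, 8}),
}
pair_table = precompute_pairs(fingerprints, min_similarity=0.5)
```

Storing only pairs above a floor corresponds to the "some pairs" case in the text; setting `min_similarity=0.0` stores all pairs, at a cost that grows quadratically with the number of structures.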

【００９２】The invention relates to the "conceptual" joining of records based on fuzzy similarity, regardless of whether the implementation uses a relational database system such as Oracle (that is, the records need not be joined in the preferred manner), and it does not depend on integrating the query specification of the fuzzy similarity join with other query criteria.

【００９３】Returning to the chemical similarity join, note that databases that would enrich the information value of a primary table are often unavailable for exactly the same chemical structures, but are often available for similar structures. If chemical space is defined in such a way that there are neighborhood effects for the property of interest (for example, toxicity), then the properties of compounds in the primary table can be estimated from property data for similar compounds in a secondary table. This is the basis of the chemical similarity join: for example, two tables are joined not by an exact match of structures but by a similarity comparison of two structures.

【００９４】In a chemical similarity join according to one embodiment, rows from the two tables are combined into one row of the resulting table if the similarity index of the two compounds is greater than a specified threshold (for example, 80%). Example properties for which neighborhood effects exist, and for which chemical similarity joins are therefore useful, include but are not limited to the following.

【００９５】Biological effects: a table made from the hits of a primary screen (all active) can be joined with a table of available chemicals to find SAR neighbors to test in secondary screening.

【００９６】Toxicity data: a table of acquirable compounds can be joined with a table of toxicity data. The resulting table can contain, for each compound in the primary table, the average toxicity data for its neighbors in the toxicity table. This value is an estimate of the toxicity of the compound in the primary table, and can be used to eliminate potentially toxic compounds or to prioritize testing schedules.

【００９７】Patent coverage: some patents contain specific structures as well as a generic description of the chemicals they cover. The specific structures are often available in databases of patented compounds. Determining whether a compound falls within a generic description is more difficult, but can be done one compound at a time. A chemical similarity join of a table of acquirable chemicals with a patent table can be used to produce the closely related structures that fall directly within patent coverage. This can serve as a "patent warning flag": for example, if 20 neighbors of a compound fall within the scope of one or more patents, this can signal the user to check whether the compound itself is also covered.

【００９８】Expandability: if a set of acquirable compounds is joined with a large virtual database (a set of compounds that could be made but have not been made), information about the expandability around each compound can be estimated. This is useful for selecting the members of a primary screening library, or for selecting from a hit table the structures to follow up. In this case, compounds that have at least some neighbors in "synthesizable" chemical space are more valuable than those that have none. The number of virtual neighbors is a measure of expandability.

【００９９】SAR primary selection: if a set of acquirable compounds is joined with itself (a recursive chemical similarity join), each compound in the set can be attributed with the number of neighbors it has. Compound selection can then form a primary screening set that contains small SAR-type clusters.
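Two of the uses listed above, averaging the toxicity of neighbors and counting neighbors in a self-join, can be sketched on top of one threshold join. This is an illustrative sketch under stated assumptions: fingerprints are modeled as sets of bit positions, similarity is the Tanimoto coefficient, and every identifier and value below is hypothetical.

```python
from typing import Dict, FrozenSet, Iterator, Tuple

Fingerprints = Dict[str, FrozenSet[int]]


def tanimoto(a: FrozenSet[int], b: FrozenSet[int]) -> float:
    union = len(a | b)
    return len(a & b) / union if union else 1.0


def similarity_join(
    primary: Fingerprints, secondary: Fingerprints, threshold: float = 0.8
) -> Iterator[Tuple[str, str]]:
    """Yield (primary_id, secondary_id) for every pair at or above threshold."""
    for pid, pfp in primary.items():
        for sid, sfp in secondary.items():
            if tanimoto(pfp, sfp) >= threshold:
                yield pid, sid


def estimate_toxicity(primary, tox_fps, tox_values, threshold=0.8):
    """Average the toxicity values of each primary compound's neighbors."""
    joined = {}
    for pid, tid in similarity_join(primary, tox_fps, threshold):
        joined.setdefault(pid, []).append(tox_values[tid])
    return {pid: sum(v) / len(v) for pid, v in joined.items()}


def neighbor_counts(compounds, threshold=0.8):
    """Self-join: number of other compounds within the threshold of each one."""
    counts = {cid: 0 for cid in compounds}
    for a, b in similarity_join(compounds, compounds, threshold):
        if a != b:
            counts[a] += 1
    return counts


available = {
    "A1": frozenset(range(0, 10)),
    "A2": frozenset(range(1, 10)),
    "A3": frozenset(range(20, 25)),
}
tox_fps = {"T1": frozenset(range(0, 10)), "T2": frozenset(range(0, 11))}
tox_ld50 = {"T1": 200.0, "T2": 400.0}

tox_estimates = estimate_toxicity(available, tox_fps, tox_ld50)
counts = neighbor_counts(available)
```

Compound `A3` has no neighbor in the toxicity table, so it receives no estimate at all; how to treat such compounds is a design decision the text leaves open.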

【０１００】Having laid out the basis of the chemical similarity join, reference is now made to the functional flow diagram of FIG. 6, which illustrates a process by which a user employing a chemical similarity join identifies potential compounds. In step 252 of FIG. 6, a user looking for a compound that meets his or her needs identifies the target compound being sought. The user will generally identify the particular compound of interest by its chemical structure (although other means, such as chemical name, can also be used).

【０１０１】Because the user wishes to find compounds similar to the identified target compound, in step 256 the user defines an acceptable neighborhood, or range of values, for the property metrics of the target compound identified in step 252. The neighborhood can be thought of as the region of chemical space surrounding the target compound within which a search is acceptable. Alternatively, as described above with respect to FIGS. 2 and 3, this step is not required, since the user may simply choose to search within a predefined neighborhood (for example, the cell-based neighborhood in which the compound resides).

【０１０２】In step 260, the user provides the target compound and the neighborhood range to server 104. These data are submitted to server 104 via the Internet. In this embodiment, forms and other suitable interfaces can be provided to make it easy for the user to supply this information to server 104.

【０１０３】In step 264, the chemical similarity join is performed. Various techniques can be used to form the basis of the chemical similarity join. For example, the Tanimoto coefficient can be used to determine the similarity between compounds and thereby whether a compound lies within the selected neighborhood. Molecular holograms can also be used to compare two or more molecular structures and determine whether a compound lies within the defined neighborhood. Molecular holograms and their use in QSAR are described in U.S. Patent No. 5,751,605 to Hurst et al., which is incorporated herein by reference.
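As background, the Tanimoto coefficient for two binary fingerprints is commonly written as below. This is the standard textbook definition, given here as an aid to the reader rather than quoted from the source; the symbols $a$, $b$, $c$ and the worked numbers are illustrative.

```latex
% a = number of bits set in fingerprint A
% b = number of bits set in fingerprint B
% c = number of bits set in both A and B
T(A, B) = \frac{c}{a + b - c}

% Worked example: a = 5, b = 6, c = 5
T(A, B) = \frac{5}{5 + 6 - 5} = \frac{5}{6} \approx 0.83
```

With a similarity radius of 0.8, the compound in the worked example would fall inside the target compound's neighborhood, since 0.83 is above the threshold.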

【０１０４】In step 272, the joined records are averaged, counted, or otherwise processed by statistical techniques to estimate properties of the compounds in the primary data set. For example, if the secondary data set contains toxicology data, the average of the toxicity values of the related compounds can be used as an estimate of toxicity for the primary compound. Alternatively, the detailed information on the related structures and their property values can be returned in full.

【０１０５】In step 276, server 104 provides the search results to the user at user workstation 102. As described above, the user can evaluate the compounds returned as search results and decide whether to purchase them for research purposes. In one embodiment, server 104 can coordinate, or even handle, the sale of compounds to the user. For example, in an Internet embodiment, server 104 can have the ability to complete the sale of a compound as a conventional "e-commerce" site. In addition, the user can proceed to another server or another site to make and complete the purchase.

【０１０６】To illustrate various features and aspects of the invention, several example user scenarios are described. These scenarios describe uses of one or more embodiments of the invention that may arise in a research setting. After reading these scenarios, it will be apparent to one of ordinary skill in the art how to implement the invention for these and many other scenarios.

【０１０７】FIG. 7 is an operational flow diagram illustrating one example scenario in which a user uses the chemical library integration tool to make a primary selection of compounds from a large library or database of available compounds. First, in this setting, the user may wish to remove any undesirable compounds from the set under consideration. This is illustrated by step 422. In this step, compounds in the set that meet the user-defined criteria for undesirable compounds are eliminated. For example, the user may decide to eliminate compounds based on molecular size, cLogP range, reactive or toxic functional groups, and so on. Such exclusions depend on the user's particular needs.

【０１０８】In step 424, the user selects a neighborhood radius to define the range of similarity to be searched. In one embodiment, the user selects a radius for SAR clusters. This radius is typically chosen to be about 0.8, or 80%, although other radii can be chosen.

【０１０９】In one scenario, the user may be searching for small SAR clusters in various regions of chemical space. For example, the user may want to find clusters of 10 compounds. Accordingly, in step 426, the user selects the desired neighborhood occupancy and also indicates the total number of compounds to be selected in this search process.

【０１１０】In step 428, the library or database is joined to itself using, for example, a chemical similarity join. This join produces a number of near neighbors for each compound. To restate, the join produces, for each compound, a number of compounds that lie within that compound's defined neighborhood. The chemical similarity join lets the user find compounds that lie within the selected similarity radius.

【０１１１】The compounds in the various neighborhoods are counted, and a compound having at least X neighbors is selected together with X or so of its neighbors and removed from the data set. This selection is made from the set of compounds that have not previously been eliminated or selected. The remaining neighbors are eliminated from the data set. This is illustrated by steps 430, 431, and 432.

【０１１２】This neighbor-elimination process continues until the desired number of compounds has been selected for evaluation. This is illustrated by step 434. Once the desired number of compounds has been selected, or no compounds remain to be selected, the user can order samples of the selected compounds, for example for biological screening.
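The select-and-eliminate loop of this primary-selection scenario can be sketched as a greedy procedure. The sketch makes several assumptions that the text does not fix: the seed is the first qualifying compound in identifier order, exactly `x` of its neighbors are kept with it, whole clusters are always kept (so the result can overshoot `wanted` slightly), and fingerprints are sets of bit positions compared with the Tanimoto coefficient. All names and data are hypothetical.

```python
from typing import Dict, FrozenSet, List


def tanimoto(a: FrozenSet[int], b: FrozenSet[int]) -> float:
    union = len(a | b)
    return len(a & b) / union if union else 1.0


def select_clusters(
    compounds: Dict[str, FrozenSet[int]],
    radius: float = 0.8,
    x: int = 2,
    wanted: int = 6,
) -> List[str]:
    """Greedily pick seeds with at least x neighbors, plus x neighbors each."""
    remaining = dict(compounds)
    selected: List[str] = []
    while len(selected) < wanted:
        # Self-join of what is left: each compound's neighbors within radius.
        neighbors = {
            cid: [other for other in remaining
                  if other != cid
                  and tanimoto(remaining[cid], remaining[other]) >= radius]
            for cid in remaining
        }
        seeds = [cid for cid in sorted(neighbors) if len(neighbors[cid]) >= x]
        if not seeds:
            break  # no compound left with enough neighbors
        seed = seeds[0]
        selected.extend([seed] + neighbors[seed][:x])
        # Remove the chosen cluster and eliminate the seed's other neighbors.
        for cid in [seed] + neighbors[seed]:
            del remaining[cid]
    return selected


library = {
    "A1": frozenset(range(0, 10)),
    "A2": frozenset(range(0, 11)),
    "A3": frozenset(range(0, 12)),
    "A4": frozenset(range(1, 11)),
    "B1": frozenset(range(50, 60)),
    "B2": frozenset(range(50, 61)),
    "B3": frozenset(range(50, 62)),
    "Z1": frozenset(range(90, 95)),
}
picked = select_clusters(library, radius=0.8, x=2, wanted=6)
```

In this toy library, `A4` is eliminated as a leftover neighbor of the first cluster and the singleton `Z1` is never selected, which matches the intent of covering chemical space with small clusters rather than isolated compounds.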

【０１１３】Another common activity with libraries is lead follow-up. This approach is generally used when a user follows a lead that falls outside a broader search, to decide whether the lead is important enough to merit further consideration. FIG. 8 is a diagram illustrating an example method of using the invention to perform lead follow-up, according to one embodiment of the invention.

【０１１４】Referring now to FIG. 8, a table of hits from high-throughput screening is presented, described, or loaded into a table. This is illustrated by step 462.

【０１１５】In step 464, the user selects the desired similarity radius. As in the example scenario above, the similarity radius used in this process is typically 0.8, or 80%, although other radii can be chosen.

【０１１６】In step 466, the hit table is joined with a table of toxicity data using a chemical similarity join. Because a chemical similarity join is used instead of a standard join, this join includes compounds in the toxicity table that lie within the similarity radius of the subject compound. To restate, if the identified compound itself is found in the toxicity table, no subsequent join operation is needed. The similarity join is broader and captures compounds within the defined similarity radius. In this join, the toxicity data for the joined rows are averaged to establish a toxicity prediction for each structure in the primary table.

【０１１７】In step 468, the user selects a toxicity cutoff. For example, the user may specify that compounds whose toxic dose falls below a defined minimum are not wanted. In step 470, hits with a predicted toxicity above the cutoff can be eliminated. That is, in one embodiment, a compound is eliminated if the chemical similarity join of that compound with the toxicity table yields a small predicted toxic dose.

【０１１８】In this scenario, the user also considers information on the patent coverage of potential compounds. In step 472, the resulting table is joined with a table of patented compounds to gauge the likelihood of patent coverage in the area. That is, the compounds remaining after step 470 are joined with the table of patented compounds using a chemical similarity join. In this join, the number of joined records is counted. In step 474, the user enters the number of patent records that defines the acceptable limit. The resulting table is examined to determine how many compounds within the similarity radius of the subject compound are covered or affected by patents. In step 476, hits above the cutoff selected by the user can be eliminated.
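The toxicity and patent filters of this lead follow-up scenario can be sketched as one pass over the hit table. Several choices below are assumptions rather than details from the text: toxicity is expressed as an LD50-style dose (a smaller dose means a more toxic compound), hits with no toxicity neighbors are kept because no prediction exists for them, and fingerprints are sets of bit positions compared with the Tanimoto coefficient. All names, parameters, and values are hypothetical.

```python
from typing import Dict, FrozenSet, List

Fingerprints = Dict[str, FrozenSet[int]]


def tanimoto(a: FrozenSet[int], b: FrozenSet[int]) -> float:
    union = len(a | b)
    return len(a & b) / union if union else 1.0


def neighbors(fp: FrozenSet[int], table: Fingerprints, radius: float) -> List[str]:
    """Identifiers in table whose fingerprint lies within radius of fp."""
    return [cid for cid, other in table.items() if tanimoto(fp, other) >= radius]


def follow_up(
    hits: Fingerprints,
    tox_fps: Fingerprints,
    tox_ld50: Dict[str, float],
    patent_fps: Fingerprints,
    radius: float = 0.8,
    min_ld50: float = 100.0,
    max_patent_neighbors: int = 0,
) -> List[str]:
    kept = []
    for hit_id, fp in hits.items():
        # Similarity join with the toxicity table, then average the neighbors.
        tox_nb = neighbors(fp, tox_fps, radius)
        if tox_nb:
            predicted = sum(tox_ld50[t] for t in tox_nb) / len(tox_nb)
            if predicted < min_ld50:
                continue  # predicted toxic dose too small: treated as too toxic
        # Similarity join with the patent table, then count the neighbors.
        if len(neighbors(fp, patent_fps, radius)) > max_patent_neighbors:
            continue
        kept.append(hit_id)
    return kept


hits = {
    "H1": frozenset(range(0, 10)),
    "H2": frozenset(range(20, 30)),
    "H3": frozenset(range(40, 50)),
}
tox_fps = {"T1": frozenset(range(0, 11)), "T2": frozenset(range(20, 31))}
tox_ld50 = {"T1": 50.0, "T2": 500.0}
patent_fps = {"P1": frozenset(range(40, 51))}

survivors = follow_up(hits, tox_fps, tox_ld50, patent_fps)
```

Here `H1` is dropped on predicted toxicity, `H3` is dropped because one patented compound lies within its similarity radius, and only `H2` survives for selection into a follow-up set.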

【０１１９】In step 478, the user selects a follow-up set. For example, the user may select about 10 compounds, from those remaining after the similarity joins described above, for more extensive testing. In step 480, the selected compounds are joined to a library of acquirable chemicals. Here too, the compounds are joined using a chemical similarity join. In one embodiment, this join can be accomplished as an inner join without averaging or counting. In step 482, the joined structures are ordered, for example for secondary screening.

【０１２０】As these scenarios illustrate, one advantage of using the chemical similarity join is that data on chemicals or compounds within the similarity radius can be combined with the subject compound of the join and used to predict or infer behaviors, attributes, or other parameters. For example, a user may want information about toxicity and patent coverage for a given compound or virtual compound. However, that data may not exist for the particular compound or virtual compound, so a standard relational database join cannot give the user information about these parameters. Using a chemical similarity join, which performs the join within a similarity radius, the user can instead retrieve and examine potentially relevant information associated with compounds within the defined similarity radius. The user can then use this information to estimate or predict whether these parameters apply to the selected compound or virtual compound.

【０１２１】The various embodiments, aspects, and features of the invention described above can be implemented using hardware, software, or a combination thereof, and can be implemented using a computing system having one or more processors. In fact, in one embodiment, these elements are implemented using a processor-based system capable of carrying out the functions described above.

【０１２２】While various embodiments of the invention have been described above, it should be understood that they have been presented by way of example and not limitation. Thus, the spirit and scope of the invention should not be limited by any of the exemplary embodiments described above, but should be defined only by the claims and their equivalents.

[Brief description of drawings]

【図１】FIG. 1 is a block diagram generally illustrating one example application of a library information system according to one embodiment of the invention.

【図２】FIG. 2 is a diagram illustrating a "cell-based" neighborhood mapping of an example subset of known compounds in a two-dimensional representation of multi-dimensional chemical space.

【図３】FIG. 3 is a diagram illustrating a "distance-based neighborhood" mapping of an example data set of known compounds onto an example data set of virtual compounds, according to one embodiment of the invention.

【図４Ａ】FIG. 4A is a diagram illustrating an example table that can be used to store information on the various departments within a company.

【図４Ｂ】FIG. 4B is a diagram illustrating an example table that can contain information about company employees, including employee ID number, social security number, name, department ID number, office location, job title, and so on.

【図４Ｃ】FIG. 4C is a diagram illustrating an example table that can result from an example SQL query performing a database join.

【図５Ａ】FIG. 5A is a diagram illustrating an example vendor table for compounds.

【図５Ｂ】FIG. 5B is a diagram illustrating an example toxicity table for a subset of the compounds listed in the table of FIG. 5A.

【図５Ｃ】FIG. 5C is a composite joined table showing the availability of the compounds in the table of FIG. 5A together with the rat LD50 data for the set of compounds in the table shown in FIG. 5B.

【図６】FIG. 6 is an operational flow diagram illustrating a process by which a user identifies potential compounds, according to one embodiment of the invention.

【図７】FIG. 7 is an operational flow diagram illustrating an example scenario for primary selection, according to one embodiment of the invention.

【図８】FIG. 8 is an operational flow diagram illustrating an example scenario for lead follow-up, according to one embodiment of the invention.

【手続補正書】[Procedure amendment]

【提出日】平成１４年３月２０日（２００２．３．２０）[Submission date] March 20, 2002 (2002.3.20)

【手続補正１】[Procedure Amendment 1]

【補正対象書類名】明細書[Document name to be amended] Specification

【補正対象項目名】特許請求の範囲[Name of item to be amended] Claims

【補正方法】変更[Correction method] Change

【補正の内容】[Contents of correction]

【特許請求の範囲】[Claims]

Continuation of front page

(81) Designated states: EP (AT, BE, CH, CY, DE, DK, ES, FI, FR, GB, GR, IE, IT, LU, MC, NL, PT, SE), OA (BF, BJ, CF, CG, CI, CM, GA, GN, GW, ML, MR, NE, SN, TD, TG), AP (GH, GM, KE, LS, MW, SD, SL, SZ, TZ, UG, ZW), EA (AM, AZ, BY, KG, KZ, MD, RU, TJ, TM), AE, AG, AL, AM, AT, AU, AZ, BA, BB, BG, BR, BY, CA, CH, CN, CR, CU, CZ, DE, DK, DM, DZ, EE, ES, FI, GB, GD, GE, GH, GM, HR, HU, ID, IL, IN, IS, JP, KE, KG, KP, KR, KZ, LC, LK, LR, LS, LT, LU, LV, MA, MD, MG, MK, MN, MW, MX, NO, NZ, PL, PT, RO, RU, SD, SE, SG, SI, SK, SL, TJ, TM, TR, TT, TZ, UA, UG, US, UZ, VN, YU, ZA, ZW

Claims

[Claims]

1. A computer-based method for identifying one or more compounds having one or more similar properties, the method comprising identifying a property of a target compound, and using a computer to perform a similarity join to identify one or more database compounds having properties similar to those of the target compound.

2. A computer-based method for identifying, from a database containing items, at least one item having at least one characteristic similar to a target item, the method comprising identifying the target item, and using a computer to perform a fuzzy similarity join on the database to identify at least one database item having characteristics similar to those of the target item.

3. The method according to claim 10, wherein the fuzzy similarity join is a chemical similarity join and the item identified as the target item is a drug compound.

4. The method according to claim 10, wherein a computer user is informed of the identification of one or more identified items having characteristics similar to those of the target item.

5. The method according to claim 11, wherein the characteristic comprises a chemical structure of the target item.

6. The method according to claim 10, wherein a plurality of characteristics of the target item are identified.

7. The method according to claim 11, wherein a plurality of characteristics of the target item are identified.

8. The method according to claim 15, wherein one or more of the plurality of characteristics are selected from the group consisting of chemical structure, synthetic pathway, binding data, biological activity, structure-activity relationship information, molecular weight, partition coefficient, charge, size, efficiency, toxicity, manufacturer, price, and availability.

9. The method according to claim 12, wherein the user receives the information via a telecommunications network.

10. The method according to claim 17, wherein the telecommunication network is the Internet.

11. The method according to claim 11, further comprising excluding test items from the database by selecting user-defined criteria for undesired items.

12. The method according to claim 10, wherein the target item is a biological compound.

13. The method according to claim 20, wherein the biological compound is a protein.

14. The method according to claim 21, wherein the target item is a gene.

15. A computer-based method for identifying, from a database containing drug compounds, at least one drug compound having at least one property similar to a target drug compound, the method comprising identifying a property of the target drug compound, and using a computer to perform a chemical similarity join on the database to identify at least one database drug compound having properties similar to those of the target drug compound.

16. The method according to claim 23, wherein the user is informed of the identification of at least one database drug compound having properties similar to those of the target drug compound.

17. The method according to claim 23, wherein the properties of the target drug compound include the chemical structure of the target drug compound.

18. The method according to claim 23, wherein a plurality of properties of the target drug compound are identified.

19. The method according to claim 23, wherein the characteristic identified is a neighborhood effect.

20. The method according to claim 27, wherein the neighborhood effect comprises a range of values of the characterization metric for the target drug compound.

21. The method according to claim 23, wherein the similarity between the properties of the target drug compound and the database drug compound is determined using at least one parameter selected from the group consisting of Tanimoto coefficients and molecular holograms.

22. The method wherein the user receives the information via a telecommunications network.

23. The method according to claim 30, wherein the telecommunication network is the Internet.

24. The method according to claim 23, further comprising excluding at least one database drug compound from the chemical similarity join by selecting user-defined exclusion criteria for undesired compound features.

25. The method according to claim 23, wherein the chemical properties are selected from the group consisting of chemical structure, synthetic pathway, binding data, biological activity, structure-activity correlation information, molecular weight, partition coefficient, charge, size, efficiency, toxicity, manufacturer, price, and availability.

26. A computer-based method for identifying, from a database containing biological compounds, at least one biological compound having at least one property similar to a target biological compound, the method comprising identifying a property of the target biological compound, and using a computer to perform a fuzzy similarity join on the database to identify at least one database biological compound having properties similar to those of the target biological compound.

27. The method according to claim 34, wherein the biological compound is a protein.

28. The method according to claim 34, wherein the biological compound is a gene.