JP2003519201A

JP2003519201A - Highly efficient mapping of molecular variants to functional properties

Info

Publication number: JP2003519201A
Application number: JP2001550023A
Authority: JP
Inventors: Herschel Rabitz (ラビッツ、ハーシェル)
Original assignee: Princeton University
Current assignee: Princeton University
Priority date: 2000-01-03
Filing date: 2001-01-03
Publication date: 2003-06-17
Also published as: WO2001050124A1; AU2756801A; EP1244909A1

Abstract

(57) [Abstract] A method for selectively varying a multivariable molecular synthesis to optimize a functional property of the reaction product, comprising the steps of: constructing a primary library of functional property output data obtained by reacting all primary library input variable combinations for the multivariable synthesis, and measuring the functional property for all reaction products, wherein the primary library input variable combinations include all selected values for the variables taken one at a time; ordering the values for each input variable according to the effect of the input variable on functional property optimization; constructing a secondary library of functional property output data obtained by reacting a set of secondary library input variable combinations sampled at low frequency, taking two variables at a time from the ordered input variables, and measuring the functional property for all reaction products; and interpolating between the functional property output data to optimize the functional property.

Description

Detailed Description of the Invention

[0001] (Cross-Reference to Related Applications) This application claims the benefit of priority under 35 U.S.C. §119(e) of U.S. Provisional Patent Application No. 60/174,225, filed January 3, 2000. The disclosure of that application is incorporated herein by reference.

[0002] (Background of the Invention) In chemical technology, many laboratory experiments, environmental and industrial processes, and modeling exercises are characterized by a large number of input variables. The general objective in such cases is to explore, as completely as possible, the effect of the high-dimensional input variable space on the observable system behavior, often with optimization in mind or simply to understand the phenomena involved better. An important issue in such explorations is the number of experiments or model runs needed to learn the input/output behavior of the system efficiently.

[0003] In experiments on, or modeling of, chemical/physical systems in which a large number of input variables are available for alteration, the input variables may be varied with a definite design strategy in mind, or the variation may arise randomly from natural changes in uncontrolled inputs. In either situation, the common goal is to perform as many runs as possible in an attempt to explore the effect of the input variable space on one or more system observables of interest. Such exercises may be carried out to understand the role of the input variables physically or, often ultimately, to optimize the system so that a particular choice of input variables achieves one or more desired physical objectives.

[0004] Molecular materials (i.e., samples consisting of a single type of molecule) encompass many applications, such as mutated proteins and drugs. The i-th variable of a molecular material may be the i-th site for chemical functionalization on a reference molecular structure. For proteins subject to amino acid mutation, the total number of variables (i.e., mutation sites) can be very large, and the variable x_i associated with backbone site i can take up to 20 values over the natural amino acids. In contrast, drug molecules are usually of modest size, produced by functionalizing a small number of sites on a reference chemical scaffold. For drugs, the i-th variable can take many values, because a rather arbitrary set of chemical moieties may be considered for substitution on a suitable molecular scaffold.

[0005] Molecular materials differ inherently from mixture formulations: the input variables for molecular moieties (e.g., methyl, ethyl, chloro) are discrete, whereas the component mole fractions that serve as input variables for a mixture can take continuous values. Mixed materials drawn from a large set of possible molecular species have both discrete and continuous variables. Because all of these materials are characterized by either a large number of input variables or a large number of variable values, there is great interest in high-throughput synthesis/screening techniques that attempt to cope with the explosive number of material samples that could be produced.

[0006] A characteristic common to all of the above examples, and to many others, is the large number of variables that naturally arise in describing the input. The notion of "large" in this context depends on the individual application and, in particular, on the difficulty of adequately observing or calculating the system output for any single specification of all the input variables. Similarly, a drug search involves only a few variables (i.e., sites for functionalization on the molecular scaffold), but the number of moiety values for each of these variables can be very large, 10² or more. In this case it is easy to make any one candidate drug molecule but difficult to realize all of the relevant possibilities. In other problems the number of relevant variables is inherently large; one example arises where the input is a function and good resolution is required, so that the discretized input variables number in the hundreds or more.

[0007] Consequently, because of the exponential relationship between the number of molecular variants in the system to be explored and the number of possible variable combinations in the system, discovering molecular materials with desired properties remains a heavy burden. Combinatorial chemistry has been tried as a way of addressing this problem through random or quasi-random synthesis. Alternatively, modeling procedures have been tried in order to provide guidance by design. Neither approach has reached its full promise and capability. There therefore remains a need for methods of organizing and interpreting data so that explorations of multivariable reaction systems can be exploited more fully.

[0008] (Summary of the Invention) This need is met by the present invention. It has been found that when a modest initial number of experiments is used, by the techniques employed in the present invention, to guide further experiments in detail, a multivariable reaction system can be explored in a highly efficient manner without reacting every possible variable combination in the system. The invention decomposes the effect of the variables into a hierarchy of cooperative terms and recognizes that the contribution of any particular variable can be assessed directly from laboratory observations based on a rational ordering of the molecular variables. It thereby incorporates the discovery that the number of molecular synthesis experiments required for a complete study of a multivariable reaction system grows at most polynomially with the number of sites and, under certain conditions, is in fact invariant to the number of sites.
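For orientation, the hierarchy of cooperative terms mentioned in the preceding paragraph is commonly written in the HDMR literature as the expansion below. The notation is assumed here for illustration and does not appear in this text: f is the measured functional property, x_i is the value chosen at site i, and n is the number of sites.

```latex
f(x_1,\dots,x_n) = f_0
  + \sum_{i=1}^{n} f_i(x_i)
  + \sum_{1 \le i < j \le n} f_{ij}(x_i, x_j)
  + \cdots
  + f_{12\cdots n}(x_1,\dots,x_n)
```

In this notation f_0 is a constant, f_i is the effect of site i acting alone, and f_{ij} is the pairwise cooperative effect of sites i and j. If the expansion is truncated after the pairwise terms, it contains 1 + n + n(n-1)/2 component functions, a count that is polynomial in n, which is consistent with the scaling stated above.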

[0009] Thus, according to one aspect of the present invention, there is provided a method for selectively varying a multivariable molecular reaction to optimize a functional property of the reaction product, comprising the steps of: constructing a primary library of functional property output data obtained by reacting all primary library input variable combinations for the multivariable synthesis, and measuring the functional property for all reaction products, wherein the primary library input variable combinations include all selected values for the variables taken one at a time, while the other variable values are held constant or randomized; ordering the values for each input variable according to the effect of the input variable on functional property optimization, based on the primary library of functional property output data; constructing a secondary library of functional property output data obtained by reacting a set of secondary library input variable combinations sampled at low frequency from the ordered input variables, and measuring the functional property for all reaction products, wherein the secondary library input variable combinations are two variables combined at a time from the low-frequency sampling of the ordered input variables, while the other variable values, if any, are held constant or randomized; and interpolating between the functional property output data to optimize the functional property.

[0010] The secondary library may be constructed by reacting an ordered low-frequency sampling of the ordered variable values, a completely random low-frequency sampling of the ordered variable values, a combination of the two sampling techniques, or another low-frequency sampling technique. Which sampling technique is adopted depends on the number of variables and on the laboratory synthesis technique.

[0011] The method of the present invention is a modification and improvement of the conventional High Dimensional Model Representation (HDMR) algorithm. HDMR directly guides laboratory synthesis studies by adopting a limited set of organizing principles, so that compounds with the desired functional properties are identified rapidly. The invention improves the HDMR technique by using ordered sampling of the synthesis reaction variables to suggest a second round of modest synthesis studies, leading ultimately to a means of quantitatively assessing the functional properties of molecules over the entire space of possibilities, including molecules that have not been synthesized.
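As a rough illustration of how measurements on a few molecules can yield an estimate for a molecule that was never synthesized, the sketch below applies a first-order (independent-effects) truncation in the style of cut-HDMR. It is a simplification chosen for illustration, not the method of this text: it ignores all cooperative terms, and the site names, moiety labels, and output values are hypothetical.

```python
def first_order_estimate(reference_output, single_site_outputs, molecule):
    """Estimate a property as the reference output plus each site's solo shift.

    single_site_outputs maps (site, value) to the output measured with that
    value at that site and every other site left at its reference value.
    """
    estimate = reference_output
    for site, value in molecule.items():
        estimate += single_site_outputs[(site, value)] - reference_output
    return estimate


# Hypothetical data: the reference molecule has Cl at both sites b and c.
reference_output = 1.0
single_site_outputs = {
    ("b", "Cl"): 1.0,
    ("b", "F"): 1.4,
    ("c", "Cl"): 1.0,
    ("c", "CH3"): 0.7,
}

# This combination was never measured; its estimate comes out close to 1.1.
candidate = {"b": "F", "c": "CH3"}
print(first_order_estimate(reference_output, single_site_outputs, candidate))
```

Adding pairwise terms measured on a secondary library would correct this estimate wherever two sites interact.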

[0012] Thus the method of the invention first performs an initial set of syntheses that includes all of the values selected for each reaction variable, and observes the functional effect of each reaction variable. It then performs an algorithmic analysis of those results, in which the values of each variable are ordered according to their effect on optimizing the functional property of interest for the reaction product; this analysis suggests a further, selective set of ordered or randomly sampled molecules to make. The overall information obtained from these syntheses and their functional effects is then reassembled into a high dimensional model representation in order to interpolate the functional property over the entire space of molecular possibilities. As an example of the possible savings, a case involving millions or more molecular possibilities may be studied quantitatively with as few as about 200 judiciously synthesized molecules.
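The scale of the saving can be checked with simple arithmetic. The sketch below compares the size of the full combinatorial space with a one-variable-at-a-time primary library and a pairwise secondary library sampled at a fixed stride. The scenario (4 sites, 40 moieties per site, stride 10) and the function names are assumptions made for illustration; they are not figures from this text.

```python
from math import ceil


def full_space_size(values_per_site):
    """Number of distinct molecules if every combination were synthesized."""
    total = 1
    for n_values in values_per_site:
        total *= n_values
    return total


def primary_library_size(values_per_site):
    """One reference molecule plus each remaining value of each site in turn."""
    return 1 + sum(n_values - 1 for n_values in values_per_site)


def secondary_library_size(values_per_site, stride):
    """All pairs of sites, with each site sampled at every stride-th value."""
    sampled = [ceil(n_values / stride) for n_values in values_per_site]
    total = 0
    for i in range(len(sampled)):
        for j in range(i + 1, len(sampled)):
            total += sampled[i] * sampled[j]
    return total


sites = [40, 40, 40, 40]  # hypothetical: 4 sites, 40 candidate moieties each
print(full_space_size(sites))                    # 2560000
print(primary_library_size(sites))               # 157
print(secondary_library_size(sites, stride=10))  # 96
```

Under these assumed numbers, roughly 250 syntheses stand in for about 2.6 million possibilities, the same order of magnitude as the example in the text.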

[0013] Other features of the invention are pointed out in the following description and claims, which disclose the principles of the invention and the best mode presently contemplated for carrying it out.

[0014] A more complete appreciation of the invention and many of its intended advantages can be readily obtained by reference to the detailed description of the invention considered together with the accompanying drawings.

[0015] (Detailed Description of the Preferred Embodiments) FIG. 1 shows an application program embodying the method of the invention for discovering molecular materials, for example a reference molecular structure under study for pharmaceutical activity. In step 10, a primary library of functional property output data is constructed by means essentially conventional for HDMR. For a multivariable reaction with variables a, b, c, ... z, values are selected for each variable, such as a₁, a₂, a₃, ... a_N; b₁, b₂, b₃, ... b_N; and so on. Input variable combinations are prepared by taking one variable at a time while the other variables of the reaction are held constant or randomized. Thus combinations are prepared for the values a₁, a₂, a₃, ... a_N while variables b through z are held constant or randomized, and so on for all the variables of the reaction. One preferred embodiment of the invention, which represents an improvement over conventional HDMR, forms the input variable combinations by sampling the whole space completely at random, the only requirement being that all of the selected variable values be included among the sampled combinations. However the input variable combinations are formed, they substantially reduce the total number of combinations that must be reacted compared with exploring all possible combinations.
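The one-variable-at-a-time construction of step 10 can be sketched as follows. This shows only the variant in which the other variables are held constant at a reference choice; the randomized variant is not covered. The function name, the use of the first listed value as the reference, and the scaffold and moiety labels are assumptions for illustration.

```python
def primary_combinations(variables, reference=None):
    """Build one-variable-at-a-time input combinations.

    variables maps each variable name to its list of selected values.
    The other variables stay at the reference values (by default, the
    first listed value of each variable).
    """
    if reference is None:
        reference = {name: values[0] for name, values in variables.items()}
    combinations = [dict(reference)]
    for name, values in variables.items():
        for value in values:
            if value == reference[name]:
                continue  # already covered by the reference combination
            combination = dict(reference)
            combination[name] = value
            combinations.append(combination)
    return combinations


# Hypothetical variable set: a scaffold and two functionalization sites.
variables = {
    "a": ["scaffold-1", "scaffold-2"],
    "b": ["Cl", "F", "CH3", "C2H5"],
    "c": ["Cl", "F", "CH3"],
}
library_inputs = primary_combinations(variables)
print(len(library_inputs))  # 7, versus 2 * 4 * 3 = 24 for the full space
```

Every selected value of every variable appears in at least one combination, which is the requirement the text places on the primary library.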

[0016] The primary library is constructed by reacting each combination and then measuring the functional property or properties of interest for all of the reaction products. The objective is first to react all of the selected values for each variable, taken one at a time while the other variable values are held constant or randomized, so as to obtain for each variable a one-dimensional sampling across the multidimensional space that provides a measure of the effect the selected values of that variable have on the functional property of interest.

[0017] For molecules under study for pharmaceutical activity, the variable set may include one or more chemical scaffolds, that is, basic molecular structures or skeletons having one or more sites for chemical functionalization, together with the various chemical moieties (e.g., methyl, ethyl, chloro) selected for functionalizing each site. The variables may also include structural modifications and spatial features of the scaffold. Functional properties to be optimized include pharmaceutical activity, minimization of side effects, bioavailability, improved product yield, and other properties essential to discovering the target drug.

[0018] For purposes of the present invention, therefore, the "value" of a variable does not necessarily refer to an empirical quantity but to the identity of the various alternatives under consideration. If variable a represents the chemical scaffold, with moieties to be varied at each chemical functionalization site on the scaffold, then a₁, a₂, a₃, ... represent the scaffold structures to be studied. If variable b represents the moiety to be varied at the first chemical functionalization site on the scaffold, then b₁, b₂, b₃, ... b_N represent the various moieties selected for testing at that site, such as chloro, fluoro, methyl, and ethyl. Other variables and values are assigned to the remaining chemical functionalization sites.

[0019] The invention is not limited to the study of molecules for pharmaceutical activity but is applicable to molecular materials in general. Besides molecules used as drugs, non-limiting examples of molecular applications of interest include molecules with optical, magnetic, or electrical activity. As with molecules studied for pharmaceutical activity, the variable set includes one or more chemical scaffolds, that is, basic molecular structures or skeletons having one or more sites for chemical functionalization, together with the various chemical moieties selected for functionalizing each site. The invention is also generally applicable to polymeric materials made up of individual monomer units, each of which has a chemical scaffold with one or more chemical functionalization sites that can be varied for the purposes of the invention. The functional property to be evaluated is a functional property of the polymer.

[0020] Another example of a multivariable system for use with the method of the invention is a candidate protein for site-directed mutagenesis. Each amino acid in the protein sequence represents a potential independent variable whose values are selected from the amino acids employed in protein synthesis. The functional properties to be optimized are properties of the protein such as folding stability and ligand binding affinity.

[0021] The variables may be derived from an initial combinatorial chemistry screening of possible molecular structures or from a pharmaceutical lead compound. For example, a natural compound found to have therapeutic properties provides the chemical scaffold to be optimized, as well as related scaffolds to be studied, and the chemical moieties to be varied at the chemical functionalization sites of each scaffold are selected with the aim of optimizing the therapeutic effect with respect to potency, efficacy, safety, bioavailability, and the like. Another example of a pharmaceutical lead compound is a protein found to have a ligand binding affinity that produces a therapeutic effect; it provides an amino acid sequence for site-directed mutagenesis at one or more sequence positions, with the aim of improving the ligand binding ability and thereby enhancing the therapeutic effect.

[0022] Molecular materials are an example of systems in which the input variables have discrete values. Other systems to which the invention is applicable may have variables with continuous values, such as the component mole fractions of a chemical mixture. Reaction conditions such as temperature, pressure, and reaction time may also introduce continuous-valued input variables into variable combinations that would otherwise be limited to variables with finite, discrete values. Continuous-valued variables, however, require the selection of data points that provide a complete sampling of the whole variable continuum. Whether each variable in the multivariable system is continuous or discrete, the objective is first to react all of the selected values for each variable, taken one at a time, while the other variable values are held constant or randomized.

[0023] In step 20, the primary library output data are used to order the values for each variable so that the property under consideration varies monotonically with the variable value. In other words, the functional property dictates the ordering of the selected values: the selected values for each variable are ranked with respect to the observed functional property, so that the value producing the most optimal functional property is given the greatest significance, and so on down to the value producing the least optimal functional property. The goal is to identify a "natural" order of the values for each variable, where "natural" is defined as a rational ordering of the variable values based on actual experiments on the functional property.
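The ordering of step 20 amounts to sorting each variable's values by the property measured for them in the primary library. The sketch below assumes that several readings may exist per value (for example, when the other variables were randomized) and averages them before sorting; the averaging rule, the function name, and the numbers are illustrative assumptions.

```python
from statistics import mean


def order_values(readings, maximize=True):
    """Return one variable's values sorted from most to least optimal.

    readings maps each value to the list of property measurements
    obtained for it in the primary library.
    """
    averaged = {value: mean(samples) for value, samples in readings.items()}
    return sorted(averaged, key=averaged.get, reverse=maximize)


# Hypothetical measurements for the moieties tested at one site.
site_b_readings = {
    "Cl": [0.40, 0.44],
    "F": [0.77],
    "CH3": [0.15],
    "C2H5": [0.30],
}
print(order_values(site_b_readings))  # ['F', 'Cl', 'C2H5', 'CH3']
```

Along this ordering the averaged property decreases monotonically, which is the "natural" order the text describes. Passing `maximize=False` covers properties such as side effects, where smaller is better.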

[0024] Steps 10 and 20 may be applied to one or more functional properties, each functional property having its own natural ordering of the variables. One objective may be to find a reaction product with an optimal combination of two or more functional properties, in which case step 10 measures two or more properties and step 20 essentially scores each variable value on the basis of the results achieved toward optimizing each property. For the purpose of scoring the variable values, each functional property is weighted differently depending on its importance to the ultimate objective sought for the optimized reaction product.
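One possible way to realize the weighted scoring described above is sketched below. The text does not specify a scoring formula, so the min-max normalization of each property and the weighted sum are assumptions, as are the property names, weights, and measurements.

```python
def weighted_scores(property_tables, weights):
    """Combine several properties into one score per variable value.

    property_tables maps a property name to {value: measurement}, with
    higher measurements taken to be better. Each property is rescaled to
    the range 0..1 and then multiplied by its weight.
    """
    scores = {}
    for prop, table in property_tables.items():
        lowest, highest = min(table.values()), max(table.values())
        span = (highest - lowest) or 1.0  # avoid dividing by zero
        for value, measured in table.items():
            normalized = (measured - lowest) / span
            scores[value] = scores.get(value, 0.0) + weights[prop] * normalized
    return scores


# Hypothetical measurements for three moieties at one site.
property_tables = {
    "potency": {"Cl": 10.0, "F": 30.0, "CH3": 20.0},
    "bioavailability": {"Cl": 0.9, "F": 0.1, "CH3": 0.5},
}
weights = {"potency": 0.7, "bioavailability": 0.3}

scores = weighted_scores(property_tables, weights)
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)  # ['F', 'CH3', 'Cl']
```

Changing the weights changes the ordering: with bioavailability weighted more heavily than potency, Cl would move to the top.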

[0025] In step 25, the ordered variables are evaluated to confirm that the output data for each variable behave more or less regularly over the range of sampled values. For purposes of the invention, "regular" is defined as the condition in which the differences in the functional property between nearest-neighbor variable values are "smooth", that is, as small as possible. If this is not the case, the method proceeds to step 27, where the primary library is refined. The refinement can be accomplished in several ways, but the aim is to enlarge the set of primary library input variable combinations. One way is to repeat step 10, again taking the variables one at a time while holding the other variable values constant or randomizing them, to form new input variable combinations and thereby obtain another set of output data. Another way is to increase the number of variable values. In the example of a molecular structure under study for pharmaceutical activity, this would involve exploring additional chemical scaffolds or evaluating additional moieties at the chemical functionalization sites. The method then repeats step 10 (reacting the combinations and measuring the functional property), step 20 (ordering the variable values from their respective functional properties), and step 25 (evaluating the ordered variable values for smoothness and regularity), followed again by step 27 if necessary.
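The regularity test of step 25 could be approximated by looking at the largest jump between neighboring values in the ordered output data. The text gives no numerical criterion, so the measure used here (largest adjacent difference as a fraction of the total range) and the tolerance of 0.5 are purely illustrative assumptions.

```python
def largest_adjacent_jump(ordered_outputs):
    """Largest difference between neighbors, as a fraction of the total range."""
    if len(ordered_outputs) < 2:
        return 0.0
    span = max(ordered_outputs) - min(ordered_outputs)
    if span == 0:
        return 0.0
    jumps = [abs(b - a) for a, b in zip(ordered_outputs, ordered_outputs[1:])]
    return max(jumps) / span


def is_regular(ordered_outputs, tolerance=0.5):
    """True if no single step accounts for more than `tolerance` of the range."""
    return largest_adjacent_jump(ordered_outputs) <= tolerance


# Hypothetical property values, listed in the order produced by step 20.
smooth_outputs = [0.90, 0.80, 0.65, 0.50, 0.40]
gappy_outputs = [0.90, 0.85, 0.20, 0.15]

print(is_regular(smooth_outputs))  # True: proceed to the secondary library
print(is_regular(gappy_outputs))   # False: refine the primary library (step 27)
```

In the second case the gap between 0.85 and 0.20 suggests that additional values should be tested to fill in the ordering before sampling at low frequency.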

[0026] Another way to enlarge the set of primary library input variable combinations is to obtain complete or partial secondary output data and introduce them into a reordering of the variable values. In other words, some or all of the primary library input variable combinations are rebuilt by taking two variables at a time while the other variable values are held constant or randomized, which gives a more complete sampling of the possible variable combinations and a better assessment of the cooperation between variables. Steps 10, 20, and 25 are repeated, followed again by step 27 if necessary. With either option, repeating steps 10 and 20 reorders the values of at least one of the variables, because the additional data provide new insight into the rational hierarchy of the variable values in terms of the functional property, and this results in a "smoother", more regular ordering of the data.

[0027] Once step 25 confirms that the ordered variable values are as regular as possible, the method proceeds to step 30, in which a secondary library of functional property output data is obtained by reacting a set of secondary library input variable combinations sampled at low frequency from the variable values ordered in step 20. For purposes of the invention, "low-frequency" sampling is defined as a modest partial sampling, suggested by observing the effect that the values of each variable have on functional property optimization, that is nevertheless effective in permitting a quantitative assessment of the functional property of the reaction products over the entire space of possibilities, including reaction products that are not synthesized. Low-frequency sampling is an economical measure compared with sampling all possible variable combinations to arrive at a reaction product with optimal functional properties.

[0028] The sparse sampling may be carried out in several ways, but its purpose is to obtain a reasonable estimate of functional property performance over the entire space of possibilities. For example, ordered sampling may be performed, in which the ordered variable values are arranged on multidimensional axes or in a multidimensional array and sampled periodically, so that every 5th, 10th, 20th, or 100th variable combination is sampled. Increasing the number of combinations sampled therefore increases the resolution obtained over the variable space. Alternatively, completely random sampling of the variable space may be performed. Either ordered or random sampling may be performed uniformly or non-uniformly over the entire space of possibilities. Non-uniform sampling is performed iteratively until a region of optimal functional property performance is identified.
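As a hedged illustration of the two sampling styles described in this paragraph, the following minimal Python sketch enumerates the grid of ordered variable values and takes either every k-th combination or a random subset. The function names, the value labels, and the stride of 7 are assumptions chosen for the example; nothing here is taken from an actual HDMR software package.

```python
import itertools
import random

def ordered_sample(ordered_values, stride):
    """Take every `stride`-th combination from the grid of ordered variable values."""
    grid = itertools.product(*ordered_values)
    return list(itertools.islice(grid, 0, None, stride))

def random_sample(ordered_values, count, seed=0):
    """Draw `count` distinct combinations uniformly at random from the same grid."""
    grid = list(itertools.product(*ordered_values))
    return random.Random(seed).sample(grid, count)

# Hypothetical example: three variables whose values have already been ordered.
values = [
    ["a1", "a2", "a3", "a4"],
    ["b1", "b2", "b3"],
    ["c1", "c2", "c3", "c4", "c5"],
]
every_seventh = ordered_sample(values, 7)   # periodic (ordered) sparse sampling
randomly_chosen = random_sample(values, 12) # random sparse sampling
```

A smaller stride, or a larger `count`, samples more combinations and so gives finer resolution over the variable space, at the cost of more syntheses and measurements.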

[0029] Sparse sampling techniques for secondary library construction are otherwise essentially conventional for HDMR mapping and are readily adopted by those skilled in the art, so a detailed description is unnecessary. The principal contribution of the present invention lies not in the sparse sampling technique but in rationally ordering the values of each variable in a manner that is natural from the standpoint of the functional property, which is what makes sparse sampling possible in the first place. Once this concept is understood from the present specification, it will be apparent that those skilled in the art can modify existing HDMR software algorithms without undue effort to achieve the objectives described herein.

[0030] Conventional sparse sampling techniques, including those known as cut-HDMR and RS-HDMR, are disclosed in Rabitz et al., "General Foundations of High Dimensional Model Representation," J. Math. Chem., 25, 197-233 (1999). Cut-HDMR employs regular sparse sampling around a reference point called the cut center, whereas RS-HDMR determines the expansion functions by sparse Monte Carlo sampling over the entire multidimensional space.
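For orientation, the HDMR expansion and the way the two techniques define its component functions can be written as follows. The notation is an assumption introduced here, not taken from this specification: f is the functional property, x_i are the input variables, c is the cut center, and f(x_i, c^i) means that every variable except x_i is held at its cut-center value. The RS-HDMR integrals assume variables rescaled to the unit hypercube and are estimated in practice from random samples.

```latex
f(x_1,\dots,x_n) \approx f_0 + \sum_{i} f_i(x_i) + \sum_{i<j} f_{ij}(x_i,x_j) + \cdots

% cut-HDMR: components from measurements on lines and planes through the cut center c
f_0 = f(c), \qquad
f_i(x_i) = f(x_i, c^{i}) - f_0, \qquad
f_{ij}(x_i,x_j) = f(x_i, x_j, c^{ij}) - f_i(x_i) - f_j(x_j) - f_0

% RS-HDMR: components as averages over the whole space (Monte Carlo estimates)
f_0 = \int f(x)\,dx, \qquad
f_i(x_i) = \int f(x)\,dx^{i} - f_0
```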

[0031] The secondary library construction is otherwise essentially conventional for HDMR. Because it is a secondary library, the input variable combinations are constructed by taking the variable values two at a time, with the choices for the other variables made according to the sparse sampling technique.
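The two-at-a-time construction can be sketched as follows. This is a minimal example under stated assumptions: the remaining variables are held constant at a reference point (the specification also allows them to be randomized), each pair is sampled periodically with a fixed stride, and the function name, the integer value labels, and the stride are hypothetical.

```python
import itertools

def secondary_library(ordered_values, reference, stride):
    """Sparse two-at-a-time input combinations; all other variables stay at `reference`."""
    combos = []
    for i, j in itertools.combinations(range(len(ordered_values)), 2):
        pairs = itertools.product(ordered_values[i], ordered_values[j])
        # Keep every `stride`-th value pair for this pair of variables.
        for vi, vj in itertools.islice(pairs, 0, None, stride):
            combo = list(reference)
            combo[i], combo[j] = vi, vj
            combos.append(tuple(combo))
    # Remove repeats (for example the reference point itself), keeping first-seen order.
    return list(dict.fromkeys(combos))

# Hypothetical example: values are indices into already ordered value lists.
values = [[0, 1, 2, 3], [0, 1, 2], [0, 1, 2, 3, 4]]
reference = (0, 0, 0)
library = secondary_library(values, reference, stride=2)
```

Every combination in `library` differs from the reference point in at most two variables, which is what distinguishes a secondary library from a sampling of the full space.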

[0032] Step 40 represents the functional property measurement step. The functional property is measured for the reaction products of all secondary library input variable combinations, yielding a complete set of output data for the secondary library.

[0033] The method proceeds to step 50, where the output data from the primary and secondary libraries are interpolated to identify the combination of input variable values that yields the reaction product with the optimal functional property. Again, this is an essentially conventional step for HDMR. In all likelihood the identified input variable combination will not have been reacted before, in which case the method proceeds to step 60, where the identified variable combination is reacted to produce the reaction product identified as having the optimal functional property. Step 60 verifies the result to determine whether the result based on interpolation of the functional property output data is acceptable in its current form. If the result is acceptable, the method is complete, because the reaction product with the optimal functional property in question has been identified. If the result is not acceptable, several options exist.
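The idea of predicting an unsynthesized combination from one- and two-variable data can be illustrated with a second-order cut-HDMR estimate. The sketch below is a simplification under stated assumptions: `measure` stands in for a lookup of laboratory data, every one- and two-variable measurement it needs is assumed to be available (the method itself interpolates between sparsely sampled points), and `toy_property` is an invented function with only pairwise coupling, so the estimate happens to be exact.

```python
import itertools

def cut_hdmr_predict(measure, reference, point):
    """Second-order cut-HDMR estimate of measure(point) from 1- and 2-variable data."""
    n = len(reference)
    f0 = measure(reference)

    def cut(indices):
        # Measurement with only the listed variables moved off the reference point.
        x = list(reference)
        for k in indices:
            x[k] = point[k]
        return measure(tuple(x))

    first = [cut([i]) - f0 for i in range(n)]
    total = f0 + sum(first)
    for i, j in itertools.combinations(range(n), 2):
        total += cut([i, j]) - first[i] - first[j] - f0
    return total

def toy_property(x):
    # Hypothetical stand-in for a measured functional property.
    return x[0] + 2 * x[1] - x[2] + 3 * x[0] * x[1] - x[1] * x[2]

reference = (0, 0, 0)
grid = list(itertools.product(range(4), repeat=3))
# Rank every combination in the full space by its predicted property value.
best = max(grid, key=lambda p: cut_hdmr_predict(toy_property, reference, p))
```

The combination `best` is the one that would then be synthesized and checked in step 60; with real data the prediction is only an estimate, which is why that verification step exists.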

[0034] The first option is to proceed to step 70 to refine the secondary library. In essence, additional sparse sampling is performed to increase the resolution of the secondary library output data and provide more data points for interpolation. For example, if ordered sampling of every 12th combination has been performed, every 10th combination may be sampled to obtain additional data points. If the input variable combinations were sampled randomly, additional random sampling may be performed to obtain additional data points. If the random sampling was non-uniform, other regions of the space of possibilities may be sampled randomly and non-uniformly until a region with optimal functional property performance is identified.
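The stride example in this paragraph can be made concrete with a short sketch that lists which combinations are new when the sampling period changes from 12 to 10. The total of 120 combinations and the function name are assumptions for illustration only.

```python
def refinement_points(total, old_stride, new_stride):
    """Indices sampled at `new_stride` that were not already sampled at `old_stride`."""
    already = set(range(0, total, old_stride))
    return [k for k in range(0, total, new_stride) if k not in already]

# Hypothetical space of 120 ordered combinations.
new_points = refinement_points(total=120, old_stride=12, new_stride=10)
```

Only the indices in `new_points` require new syntheses and measurements; combinations shared by both strides are already in the secondary library.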

[0035] The secondary library may also be refined by making correlated input variable combinations based on the information in hand. If each variable is represented as a coordinate axis in a multidimensional space, with the selected variable values positioned along the axis in the natural order identified from the functional property, then correlated combinations essentially represent a rotation of the coordinate axes into a region expected to coincide with optimal functional property performance, obtained by aligning the identified natural variable values of two or more variables as closely as possible. In essence, this is a multi-dimensional cut through the space of possibilities that collects additional data points by probing a region of potentially optimal functional property performance.

[0036] The present invention contemplates iterative repetition of steps 30, 40, 50, 60, and 70 until sufficient output data are obtained for the interpolation step to identify the reaction product with the optimal functional property. At any point in this process, however, whether steps 30 through 70 have been performed once or many times, it may be desirable to repeat step 20. That is, the secondary library output data may reveal an ordering of the variable values that is more rational and natural from the standpoint of the functional property, one that could not have been identified from the information in hand when the initial ordering was performed. The method may return directly from steps 30 through 70 to repeat step 20, or it may first return to step 10 and repeat that step to build another set of primary library output data, using the information in hand to obtain additional data points, before step 20 is repeated.

[0037] Steps 30 through 70 may then be repeated with additional sparse sampling based on the variable value hierarchy, if any, generated by repeating step 20. Iterative repetition of steps 30 through 70 may continue, including returns to the step that re-evaluates the rational, natural ordering of the variable values, until the reaction product with the optimal functional property is identified.

[0038] One possible outcome is that, even though the reaction product with the optimal functional property has been identified over the entire space of possibilities defined by the values selected for each variable, that optimum is still inadequate relative to the target set at the outset of the study. Taking molecular structures being studied for pharmaceutical activity as an example, the compound identified as having the highest potency among all possible combinations may still have a potency too low to merit further study as a drug candidate. Most likely this is a consequence of the values selected for one or more of the variables, in which case the remedy is to proceed to step 90 of the present invention, expand the sets of variable values, and repeat the method from the beginning.

[0039] That is, step 90 provides the option of expanding all or some of the sets of variable values and then repeating the method of the present invention. For example, additional chemical scaffolds may be selected for study, or additional chemical moieties may be selected for one or more chemical functionalization sites. The objective is to expand the space of possibilities in the search for a region of functional property optimization that meets the target contemplated for the functional property.

[0040] Another possible outcome is simply that accurate interpolation over the entire space of possibilities cannot be performed on the basis of primary and secondary library output data alone, because of third-order and possibly fourth-order cooperativity. Because chemical systems are usually governed by low-order multivariable cooperativity, in most cases the primary and secondary library output data provide enough information to permit interpolation over the space of possibilities and thereby identify the reaction product with the optimal functional property. However, the likelihood of interdependence among variables increases, particularly for systems containing many variables, so that it can become necessary to examine third-order and possibly fourth-order combinations of variable values and the output data derived from them.

[0041] This is shown as step 100 in FIG. 1, which acknowledges the construction of tertiary and higher-order libraries of functional property output data derived from reacting input variable combinations sparsely sampled from the preceding libraries of output data. That is, the tertiary library is based on sparse sampling that takes three variables at a time from the output data of the secondary library, the quaternary library is based on sparse sampling that takes four variables at a time from the output data of the tertiary library, and so on. At any point, the output data may be interpolated according to the method of the present invention to identify the reaction product with the optimal functional property.
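To give a sense of why higher-order libraries are built only when needed, the following sketch counts how many combinations differ from a reference point in exactly k variables, which is the size a fully sampled k-th order library around that point would have before any sparse sampling. The example of six variables with twenty values each is hypothetical.

```python
from math import comb

def cut_points_at_order(n_vars, n_values, order):
    """Combinations that differ from the reference point in exactly `order` variables."""
    return comb(n_vars, order) * (n_values - 1) ** order

# Hypothetical system: 6 variables with 20 selectable values each.
n_vars, n_values = 6, 20
sizes = {k: cut_points_at_order(n_vars, n_values, k) for k in range(1, 4)}
full_space = n_values ** n_vars
```

Under these assumptions the first-, second-, and third-order counts grow steeply but remain far smaller than `full_space`, and sparse sampling within each order reduces the number of syntheses further.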

[0042] Certain aspects of the method of the present invention are illustrated with reference to the following description of its application to site-specific protein mutagenesis, which should not be construed as limiting the scope of the invention as defined by the claims.

HDMR Reordering Applied to Site-Specific Protein Mutations

[0043] Most protein mutagenesis studies to date have been highly selective, including many low-order mutations, and have been guided by intuition about protein chemistry and the physical and chemical properties of the amino acids. Such studies have therefore rarely analyzed systematically the large sets of single and multiple mutants needed to evaluate the effectiveness of HDMR reordering. Some systematic data are available for the gene V protein of bacteriophage f1, a homodimeric single-stranded DNA binding protein (Sandberg et al., Proc. Natl. Acad. Sci. USA, 90, 8367-8371 (1993); Sandberg et al., Biochem., 34, 11970-11978 (1995)). Two contacting positions in the hydrophobic core of the protein, Val35 and Ile47, were partially randomized, giving eight substitutions at each position along with many of the corresponding double mutants. The activity of the mutants was evaluated semi-quantitatively from the temperature sensitivity of their phage growth phenotype, and all of the single mutants and 18 of the double mutants were purified and characterized for both folding stability and DNA binding affinity.

[0044] Using the stability and binding data of Sandberg et al. (1995), a relatively sparse but still informative matrix can be plotted, as shown in FIGS. 2A, 2B, 3A, and 3B. With the wild-type protein as the cut center, reordering each variable on the basis of the laboratory data demonstrates that a regular pattern can be identified. FIGS. 2A and 3A use the original ordering of the residue substitutions given by Sandberg et al. Even though the effects of the single-site substitutions were relatively mild and most of the pairwise substitutions had additive effects, no pattern is apparent in the plots of FIGS. 2A and 3A. In contrast, FIGS. 2B and 3B show that reordering the response data to produce monotonic behavior along each axis yields a relatively smooth response over the entire space. In the general case the full-space response surface is not necessarily monotonic, although it largely is here; non-monotonic behavior is most likely where significant non-additive effects are present. The response surface need only be reasonably regular, however, for interpolation and HDMR analysis to be usable over the entire space. An important consideration in such studies, especially for mutations at many sites, is the accuracy of the data. HDMR provides a systematic means of sparsely sampling and interpolating over the entire space of possible mutations, but the accuracy of the observed data is the limiting factor. One conclusion from HDMR is that, provided attention is paid to the quality of the observations, a small number of mutations should provide an estimate of protein functional properties over the entire space.
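The reordering itself is simple to express. The sketch below sorts the values of each variable by the property measured for the corresponding single substitution, so that the response is monotonic along each axis. The residue names follow the two sites discussed above, but the numbers are invented for illustration and are not data from Sandberg et al.; the function name is likewise hypothetical.

```python
def reorder_by_effect(single_effects):
    """Order the values of one variable so the measured property increases monotonically."""
    return sorted(single_effects, key=single_effects.get)

# Made-up property changes relative to wild type (wild type = 0.0), for illustration only.
site_35 = {"Val": 0.0, "Ala": -2.1, "Ile": 0.4, "Leu": -0.6, "Phe": -1.5}
site_47 = {"Ile": 0.0, "Val": -0.3, "Met": -1.2, "Leu": 0.2}

order_35 = reorder_by_effect(site_35)
order_47 = reorder_by_effect(site_47)
```

The double-mutant matrix would then be plotted with rows and columns arranged in `order_35` and `order_47`. Because the ordering depends on the measured property, a different property (for example binding instead of stability) can give a different ordering for the same site.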

[0045] As is evident in the gene V protein example, each functional property may have a different monotonic reordering of the variables, and therefore its own distinct pattern for the relative significance of the terms in the HDMR expansion. Likewise, the resulting regularity of the response surface after each variable is reordered implies that a pattern exists in the properties (side-chain types) of each variable. Interpreting the data as this method suggests should lead to a more comprehensive understanding of which properties of individual residues are relevant to the various functions or properties of the molecule.

[0046] The variable reordering method described above, followed by HDMR analysis, is not limited to the study of protein mutations and may be extended to other types of molecules (for example, drugs and genomes) and their associated observable properties. Moreover, the reordering method requires no a priori understanding of the relationship between the constituents and the observed property; instead, the observed property itself can define that relationship. As an example, FIGS. 4A and 4B show the semi-quantitative in vivo temperature sensitivity of the mutants of Sandberg et al. Just as with the binding and stability data in FIGS. 2A, 2B, 3A, and 3B, reordering brings regularity to the response surface despite the lack of a molecular understanding of the temperature-sensitive phenotype.

[0047] These results suggest the generality of the reordering approach and its applicability to a wide range of other types of multidimensional data analysis. As will be readily appreciated, many variations and combinations of the features set forth above may be used without departing from the invention as set forth in the claims. Such variations are not to be regarded as departures from the spirit and scope of the invention, and all such variations are intended to be included within the scope of the claims.

[Brief description of drawings]

[FIG. 1] A flowchart showing the method according to the present invention.

[FIG. 2A] A histogram plot of stability data from mutations of the gene V protein at sites I47 and V35, arranged as originally presented.

[FIG. 2B] A rearrangement of the data of FIG. 2A carried out by the method according to the present invention.

[FIG. 3A] Similar to FIG. 2A, but for DNA binding affinity.

[FIG. 3B] A similar rearrangement of the data of FIG. 3A.

[FIG. 4A] A histogram plot of phenotype data from mutations of the gene V protein at sites I47 and V35, arranged as originally presented.

[FIG. 4B] A similar rearrangement of the data of FIG. 4A.

Continuation of front page

(81) Designated states: EP (AT, BE, CH, CY, DE, DK, ES, FI, FR, GB, GR, IE, IT, LU, MC, NL, PT, SE, TR), OA (BF, BJ, CF, CG, CI, CM, GA, GN, GW, ML, MR, NE, SN, TD, TG), AP (GH, GM, KE, LS, MW, MZ, SD, SL, SZ, TZ, UG, ZW), EA (AM, AZ, BY, KG, KZ, MD, RU, TJ, TM), AE, AG, AL, AM, AT, AU, AZ, BA, BB, BG, BR, BY, BZ, CA, CH, CN, CR, CU, CZ, DE, DK, DM, DZ, EE, ES, FI, GB, GD, GE, GH, GM, HR, HU, ID, IL, IN, IS, JP, KE, KG, KP, KR, KZ, LC, LK, LR, LS, LT, LU, LV, MA, MD, MG, MK, MN, MW, MX, MZ, NO, NZ, PL, PT, RO, RU, SD, SE, SG, SI, SK, SL, TJ, TM, TR, TT, TZ, UA, UG, US, UZ, VN, YU, ZA, ZW

Claims

[Claims]

1. A method for selectively varying a multivariable molecular synthesis to optimize a functional property of reaction products, comprising: constructing a primary library of functional property output data obtained by reacting all primary library input variable combinations for the multivariable synthesis and measuring the functional property for all reaction products, wherein the primary library input variable combinations include all selected values for the variables taken one at a time, while the other variable values are held constant or randomized; ordering the values for each input variable according to the effect of the input variable on functional property optimization, based on the primary library of functional property output data; constructing a secondary library of functional property output data obtained by reacting a set of sparsely sampled secondary library input variable combinations from the ordered input variables and measuring the functional property for all reaction products, wherein the secondary library input variable combinations are for variables combined two at a time from sparse sampling of the ordered input variables, while the other variable values, if any, are held constant or randomized; and interpolating between the functional property output data to optimize the functional property.

2. The method of claim 1, further comprising the step of selecting a reaction product having the optimal functional property from the results of the interpolating step.

3. The method of claim 1, wherein at least one input variable among the combinations is randomized when the primary library is constructed.

4. The method of claim 3, wherein all of the input variables among the combinations are randomized when the primary library is constructed.

5. The method of claim 1, wherein the multivariable synthesis reaction is the synthesis of a molecule to be studied for drug action, and the input variables include one or more chemical scaffolds and sets of moieties to be varied at the chemical functionalization sites of the scaffolds.

6. The method described, wherein the multivariable synthesis reaction is the synthesis of a protein for the study of site-directed mutagenesis, and the input variables comprise the set of amino acid residues to be incorporated at each mutation site.

7. The method of claim 1, wherein at least one input variable is a reaction condition.

8. The method of claim 1, further comprising, after the step of ordering the values for each input variable according to the effect of each input variable on functional property optimization: selecting additional input variable combinations by interpolating the functional property optimization data for at least one input variable; adding the additional input variable combinations to the primary library; and repeating the steps of constructing the primary library of functional property output data and ordering the values of each input variable before constructing the secondary library.

9. The method of claim 8, wherein the additional input variable combinations comprise secondary combinations.

10. The method of claim 8, wherein the additional input variable combinations comprise new variable selections.

11. The method of claim 1, wherein the sparse sampling is ordered sparse sampling.

12. The method of claim 11, wherein the ordered sparse sampling is non-uniform.

13. The method of claim 1, wherein the sparse sampling is random sparse sampling.

14. The method described, wherein the random sparse sampling is non-uniform.

15. The method of claim 1, further comprising: identifying an input variable combination that yields a reaction product having the optimal functional property; and reacting the optimal input variable combination to synthesize the reaction product.

16. The method of claim 1, further comprising, after the interpolating step: (1) expanding the secondary library of functional property output data by (a) sparsely sampling additional secondary input variable combinations from the ordered input variables, (b) reacting the additional secondary input variable combinations, and (c) measuring the functional property for all reaction products; and (2) repeating the interpolating step.

17. The method of claim 16, wherein the step of expanding the secondary library of functional property output data and the step of repeating the interpolating step are performed iteratively.

18. The method of claim 17, further comprising repeating the step of ordering the values for each input variable according to the effect of the input variables on functional property optimization, based on both the primary and secondary libraries of functional property output data.

19. The method of claim 18, further comprising, before repeating the step of ordering the values for each input variable, repeating the step of constructing the primary library to obtain an additional primary library of functional property output data.

20. The method of claim 1, further comprising, after the interpolating step: (1) creating correlated secondary input variable combinations based on the secondary library of functional property output data; and (2) repeating the steps of constructing the secondary library of functional property output data and interpolating between the functional property output data.

21. The method of claim 1, wherein the functional property output data comprise output data for more than one type of functional property, and the variable values are ordered independently for each functional property.

22. The method of claim 1, further comprising, after the interpolating step, increasing the number of values selected for at least one variable and repeating the method from the construction of the primary library.

23. The method of claim 1, further comprising, after the interpolating step: constructing a tertiary library of functional property output data obtained by reacting a set of sparsely sampled tertiary library input variable combinations from the ordered input variables and measuring the functional property for all reaction products, wherein the tertiary library input variable combinations are for variables combined three at a time from sparse sampling of the ordered input variables, while the other variable values, if any, are held constant or randomized; and interpolating between the functional property output data to optimize the functional property.