JP2015119718A

JP2015119718A - Method of selecting aptamers

Info

Publication number: JP2015119718A
Application number: JP2015018164A
Authority: JP
Inventors: Brown, Clive Gavin
Original assignee: Caris Life Sciences Switzerland Holdings GmbH
Current assignee: Caris Life Sciences Switzerland Holdings GmbH
Priority date: 2007-10-22
Filing date: 2015-02-02
Publication date: 2015-07-02
Also published as: CN102277353B; EP2209914B2; EP2209914B1; EP2209914A1; CN102277353A; WO2009053691A1; CN101835904A; JP2011500076A; US20100304991A1; HK1164364A1; US9315804B2

Abstract

PROBLEM TO BE SOLVED: To provide a method for identifying one or more aptamers to at least one target molecule. SOLUTION: A method comprising: selecting candidate aptamer sequences that bind to a target molecule; assigning to the bound sequences a measure of each sequence's aptameric potential (a fitness function); allowing some or all of the sequences to evolve to create a new mixture of candidate sequences; and repeating the method with the newly created candidate aptamer pool until the aggregate aptameric potential of the candidate pool reaches a plateau, wherein the sequences present in the final pool are optimal aptamers for the target molecule.

Description

The present invention relates to the field of aptamers. In particular, it relates to methods for generating aptamer libraries and protein-specific aptamers for use in proteomics, such as protein biomarker identification.

Aptamers are short polymers, usually nucleic acids (DNA, RNA, PNA), that form well-defined three-dimensional shapes. These shapes allow them to bind target molecules in a manner conceptually similar to antibodies. Aptamers combine the best characteristics of small molecules and antibodies: high specificity and affinity, chemical stability, low immunogenicity, and the ability to target protein-protein interactions. In addition to high specificity, aptamers have very high affinity for their targets; aptamers generated against proteins typically have affinities in the picomolar to low nanomolar range. Unlike monoclonal antibodies, aptamers are chemically synthesized rather than biologically expressed, which gives them a significant cost advantage (8, 9).

Aptamers are typically produced through an in vitro evolution process called "Systematic Evolution of Ligands by Exponential enrichment" (SELEX), described in U.S. Patent Application Ser. No. 07/536,428 and U.S. Patent Nos. 5,475,096 and 5,270,163. SELEX selects from a mixture of candidate oligonucleotides through stepwise iterations of binding, partitioning and amplification. The same general selection scheme can achieve virtually any desired criterion of binding affinity and selectivity. The process starts from a mixture of nucleic acids, preferably comprising a segment of randomized sequence. It comprises the following steps:
- contacting the mixture with the target under conditions favorable for binding;
- partitioning unbound nucleic acids from those that have bound specifically to target molecules;
- dissociating the nucleic acid-target complexes;
- amplifying the dissociated nucleic acids to yield a ligand-enriched mixture of nucleic acids;
- repeating the binding, partitioning, dissociation and amplification steps for as many cycles as desired, to obtain only the sequences with the highest binding affinity for the target molecule.

Candidate oligonucleotides can contain fixed or known motifs within their sequences. If purely random sequences are used, finding selective aptamers in the candidate population is entirely a matter of chance. Indeed, the more random the oligonucleotide sequences, the more the selection of a sequence for the target under study must depend on chance (11, 12, 13).

In its most basic form, the SELEX process can be defined by the following series of steps.
1) Prepare a candidate mixture of nucleic acids of differing sequence. The candidate mixture generally includes regions of fixed sequence and regions of randomized sequence. In a fixed region, each member of the candidate mixture contains the same sequence at the same position. The fixed regions are selected to do one of the following:
(a) assist in the amplification steps described below;
(b) mimic a sequence known to bind to the target;
(c) enhance the concentration of a given structural arrangement of the nucleic acids in the candidate mixture.
The randomized sequences can be completely randomized, so that the probability of finding a given base at any position is one in four. They can also be only partially randomized, so that the probability of finding a given base at any position can be set at any level between 0 and 100 percent.
2) Contact the candidate mixture with the selected target under conditions favorable for binding between the target and members of the candidate mixture. Under these conditions, the interaction between the target and the nucleic acids of the candidate mixture can be viewed as forming nucleic acid-target pairs between the target and the nucleic acids with the strongest affinity for it.
3) Partition the nucleic acids with the highest affinity for the target from those with lesser affinity. Only an extremely small number of sequences in the candidate mixture correspond to the highest-affinity nucleic acids, possibly only one molecule. It is therefore generally desirable to set the partitioning criteria so that a significant fraction of the nucleic acids in the candidate mixture (approximately 5 to 50%) is retained during partitioning.
4) Amplify the nucleic acids selected during partitioning as having relatively higher affinity for the target. This creates a new candidate mixture enriched in nucleic acids with relatively higher affinity for the target.
5) Repeat the partitioning and amplification steps. With each repetition, the newly formed candidate mixture contains fewer weakly binding sequences, and the average affinity of the nucleic acids for the target generally increases. Taken to its extreme, SELEX yields a candidate mixture containing one or a small number of unique nucleic acids. These are the nucleic acids from the original candidate mixture with the highest affinity for the target molecule (11, 13).
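The five steps above can be illustrated with a minimal Python sketch of an idealized SELEX run. The sketch makes three simplifying assumptions:
- Affinity is a hypothetical score: the number of positions matching an assumed ideal 8-mer, `IDEAL`.
- Partitioning keeps the top `keep_fraction` of molecules by that score.
- Amplification resamples the retained molecules back up to the original pool size.

Real partitioning and PCR are stochastic and biased, so this shows only the logic of the loop, not a laboratory protocol.

```python
import random

ALPHABET = "ACGT"
IDEAL = "GGTTGGTG"  # hypothetical best binder, used only to define a toy affinity

def affinity(seq):
    """Toy affinity: number of positions matching the assumed ideal binder."""
    return sum(a == b for a, b in zip(seq, IDEAL))

def random_library(n, length, rng):
    """Fully randomized library: each position is A, C, G or T with probability 1/4."""
    return ["".join(rng.choice(ALPHABET) for _ in range(length)) for _ in range(n)]

def selex(pool, affinity, rounds=5, keep_fraction=0.2, seed=0):
    """Idealized SELEX: repeated partitioning and amplification of a fixed pool."""
    rng = random.Random(seed)
    size = len(pool)
    for _ in range(rounds):
        # Partition: retain the highest-affinity fraction of molecules (5-50% in the text).
        ranked = sorted(pool, key=affinity, reverse=True)
        retained = ranked[:max(1, int(size * keep_fraction))]
        # Amplify: copy retained molecules back up to the original pool size.
        pool = [rng.choice(retained) for _ in range(size)]
    return pool

library = random_library(2000, len(IDEAL), random.Random(1))
final = selex(library, affinity)
print(len(set(library)), "unique sequences before,", len(set(final)), "after")
```

The loop only partitions and amplifies, so every sequence in `final` was already present in `library`. The pool narrows and mean affinity rises, but no new sequence can appear.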

One of the main problems in generating an aptamer library is the sheer size of the possible search space. For example, there may be hundreds or thousands of oligonucleotide sequences that could be selective and specific for a single protein. Where does one start in order to find the best sequence?

The term "search space" covers all possible or permissible variations of the polymer units (such as nucleotides) that can occur in an aptamer molecule of a given length. Only a limited number of sequences can be interrogated at a time, so a candidate library can sample only a fraction of the possible sequences of a given length. For example, with four basic units (nucleotides, for DNA/RNA) there are $4^{40} \approx 1.2 \times 10^{24}$ possible 40-mers.

Consider the typical protein space of a biological sample, assumed to contain tens of thousands (about $10^{4}$) of different proteins. For 40-mer nucleic acids, the aptamer search space then corresponds to roughly $10^{20}:1$ candidate aptamer polymers per protein, give or take an order of magnitude. For 60-mers, the ratio is $10^{32}:1$. Suppose only one in $10^{10}$ 60-mer sequences were strongly "aptameric". That would still leave $10^{22}:1$ candidates per protein to interrogate in order to find a single strong aptamer.
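The arithmetic above can be checked with a few lines of Python. The protein count of $10^{4}$ is an assumption standing in for "tens of thousands". The `hit_rate` parameter models the fraction of strongly aptameric sequences ("one in $10^{10}$").

```python
import math

def n_sequences(length: int, alphabet_size: int = 4) -> int:
    """Number of distinct linear sequences of the given length."""
    return alphabet_size ** length

def log10_ratio(length: int, proteins: int = 10**4, hit_rate: float = 1.0) -> int:
    """Order of magnitude of candidate sequences (times hit_rate) per protein."""
    return round(math.log10(n_sequences(length) * hit_rate / proteins))

print(f"{n_sequences(40):.1e}")           # 1.2e+24
print(log10_ratio(40), log10_ratio(60))    # 20 32
print(log10_ratio(60, hit_rate=1e-10))     # 22
```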

There may, however, be many aptamers of varying quality for any given protein. SELEX therefore relies on suitable sequences being present by chance in the initial candidate mixture under test. The method then converges exponentially on the best solution available in that mixture. Some prior knowledge of potentially suitable sequences can be used when designing the candidate library, so that the process is not left entirely to chance. An example is sequences known to confer a particular secondary or tertiary structure. Even so, SELEX is extremely unlikely to return the best solution to the problem, that is, the best aptamer for a particular protein.

The SELEX process therefore favors a single expressed and isolated protein target amenable to binding selection. It proceeds at random through rounds of amplification toward convergence on a small set of aptamers or a single aptamer. Through exponential, iterative selection, the process finds the "best" aptamer to an isolated protein from a given set of sequences. Recently published schemes often sample only $1 \times 10^{-20}$ of the possible sequence search space (8), although certain basic sequence motifs are understood to have far greater potential as aptamers than others.

Current aptamer searches are therefore unlikely to find the overall fittest aptamer, and their results depend heavily on the diversity of the initial pool. In most current SELEX schemes, an aptamer must already be present in the initial pool of candidate aptamers. Given how small the searches to date have been, each protein very likely has a broad spectrum of possible aptamers within the complete aptamer space. SELEX is exponential, and a putative aptamer generally must be present in the initial library. As a result, two tasks are difficult with SELEX:
- selecting aptamers for several protein targets simultaneously;
- selecting aptamers for one protein target masked in the background of others.

Nor can SELEX fully screen the $10^{15}$ candidate aptamer sequences recently claimed to be typical in the literature. Several factors prevent it:
- Library synthesis on commercial DNA/RNA synthesizers typically cannot be controlled to the level needed to achieve that diversity.
- Some aptamers in the library are very likely to anneal (stick) to others.
- Some sequences are stoichiometrically underrepresented in the population because of biases in synthesis.
- Many aptamers fold into a distribution of tertiary structures, some active and others inactive.
- Many simply never get the chance to bind the target protein because of their relative dilution.

It is therefore unlikely that $10^{15}$ candidate aptamers are actually screened against the protein. The search is even more limited than supposed, and its outcome depends on the means used to generate the random sequences. Several variations of the SELEX scheme have been described that try to circumvent these limitations by introducing random variation into the evolving aptamers (7).

Viewed another way, SELEX is essentially a physical, in vitro embodiment of a computational heuristic. A heuristic is an approximate method that helps solve a problem in good time with a "reasonably good" answer. Here the problem is to find the best solution in a finite search space too large to search exhaustively. Heuristics are computational methods that use trial and error to approximate solutions to problems that are computationally hard. A heuristic procedure begins with only an approximate approach to the problem, framed by some goal. It then uses feedback from the effects of its solutions to improve its own performance, moving toward better solutions.

SELEX works in the same way. It performs a basic search of randomly generated candidate aptamer sequences against the target. It scores the success of each candidate by its survival in the assay, then amplifies the survivors before the next round of reselection. Eventually, the strongest candidates from the initial library (those that bind the target protein with the highest affinity) become progressively more enriched in later rounds. SELEX is therefore a laboratory execution of a computational procedure for finding the polymer (or nucleic acid) with the highest binding affinity for the protein under study.

If all the physicochemical properties of aptamers and proteins, and the causes of their behavior, were known, SELEX could be modeled entirely deterministically in silico. The process can be approximated using less complete information (17). Further examples of biomolecules being used to solve computational problems more explicitly have been published (1, 2, 3, 4, 5).

U.S. Patent Application Ser. No. 07/536,428
U.S. Patent No. 5,475,096
U.S. Patent No. 5,270,163

To solve this problem and improve on the search capability and solutions offered by SELEX, the inventor recognized the need for an intelligent method that explores the aptamer search space and finds optimal solutions. Such a method must cope with the scale of the problem: the many proteins present in the proteome, and the very large number of possible candidate aptamers for each protein.

Evolving a population of candidate solutions is a well-described computational way to find good answers to very complex problems with very many possible solutions (14, 15, 16). It is best embodied in so-called genetic algorithms, a particular class of evolutionary search heuristics. Genetic algorithms are search algorithms based on the mechanisms of natural selection and natural genetics: reproduction, mutation, recombination, natural selection and survival of the fittest. They combine survival of the fittest among coded representations of candidate solutions (e.g., string structures) with a structured yet randomized exchange of information. This gives the search algorithm some of the innovative flair of human search.

Candidate solutions to the optimization problem under study play the role of artificial creatures (individuals) in a population. The population evolves through repeated application of the operators above (mutation, crossover, reproduction and selection). In every generation, a new set of artificial creatures (string- or binary-coded) is created from pieces of the fittest members of the previous population. Occasionally a new part is tried for good measure. Genetic algorithms efficiently exploit historical information to speculate on new search points with expected improved performance (14).

In these schemes, each possible solution s is encoded as a chain, sequence or string of numbers or letters. This encoding exists only for convenience in executing the heuristic. By analogy, these strings are often described as a "genome", so each artificial individual solution has its own genome. A large initial population of candidate solutions (genomes) is generated. It nevertheless represents only a very small fraction of all the candidate solutions that could be encoded. These heuristics are typically deployed when an approximately correct answer is sought in a search space too large for exhaustive search.

A "fitness function" or "objective function" f is applied to each possible solution s, i.e. to each member of the candidate population. This function measures how close an individual solution is to the characteristics of the optimal solution. A high f(s) suggests that s is a good solution.
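As a concrete, hypothetical illustration of the encoding and the fitness function f(s), the sketch below represents candidates as nucleotide strings. Each is scored by the fraction of positions matching an assumed optimum, `TARGET`. In a real search the optimum is unknown and f(s) must be measured or estimated. The target string exists only so the example runs.

```python
TARGET = "ACGTTGCA"  # hypothetical optimum, used only to make f(s) concrete

def fitness(s: str) -> float:
    """f(s): fraction of positions at which candidate s matches the assumed optimum."""
    return sum(a == b for a, b in zip(s, TARGET)) / len(TARGET)

population = ["ACGTAAAA", "TTTTTTTT", "ACGTTGCA"]  # each string is one "genome"
scores = {s: fitness(s) for s in population}
best = max(population, key=fitness)
print(best, fitness(best))  # ACGTTGCA 1.0
```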

Typically, an initial population of randomly generated candidate solutions s forms the first generation. The fitness function f is applied to these candidates and to all subsequent offspring. During selection, parents for the next generation are chosen with a bias toward higher fitness. Only "fit" individuals proceed, as judged by applying f together with a cutoff such as the upper quartile. These individuals go through a second round of mutation (random changes to their genomes). They also cross parts of their "genomes" with other high-scoring individuals (recombination, or reproduction). Doing this involves two operations:
- algorithmically changing units of a "genome" from one possibility to another;
- combining the prefix of one fit genome with the suffix of another, split at a random point.

A second generation of children is thus produced in addition to the original parent solutions. In more detail, parents reproduce by copying, with recombination and/or mutation. Reproduction is the process by which individual candidates or strings are copied according to the fitness function f. Candidates with higher fitness are therefore more likely to contribute one or more offspring to the next generation, passing on some of the features encoded within them. This reproduction operator is an artificial version of natural selection: Darwinian survival of the fittest among candidates. Once a candidate has been selected for reproduction, an exact replica of it is made. The replica then enters a mating pool (a tentative new population) for further operator action.
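A minimal sketch of the two selection mechanisms just described:
- a hard cutoff that keeps the upper quartile;
- fitness-proportionate ("roulette-wheel") reproduction that fills a mating pool with exact copies.

GC content serves as a stand-in fitness purely for illustration. It is not the document's fitness function.

```python
import random

def upper_quartile(population, fitness):
    """Cutoff step: keep only candidates ranked in the top quarter by fitness."""
    ranked = sorted(population, key=fitness, reverse=True)  # stable for ties
    return ranked[:max(1, len(ranked) // 4)]

def reproduce(parents, fitness, n, rng):
    """Roulette-wheel reproduction: each mating-pool slot gets an exact copy of a
    parent drawn with probability proportional to its fitness. The tiny offset
    keeps the weights valid even if every fitness is zero."""
    weights = [fitness(p) + 1e-9 for p in parents]
    return rng.choices(parents, weights=weights, k=n)

def gc_content(seq):
    """Hypothetical fitness used only for this example."""
    return (seq.count("G") + seq.count("C")) / len(seq)

population = ["GGCC", "GCAT", "ATAT", "GGGA", "TTTA", "CCCG", "ACGT", "AAAA"]
parents = upper_quartile(population, gc_content)  # ['GGCC', 'CCCG']
mating_pool = reproduce(parents, gc_content, n=6, rng=random.Random(0))
```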

Recombination operates on two selected parents (candidates) and produces one or two children (new candidates). After reproduction, simple crossover can proceed in two steps. First, members of the mating pool (the new candidates, or children) are paired at random. Second, each pair undergoes crossover to produce two further new candidates.
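The two-step simple crossover above (random mating, then crossover of each pair) can be sketched as follows. The sketch assumes single-point crossover, which joins the prefix of one parent to the suffix of the other. Other crossover schemes, such as two-point or uniform, are also common.

```python
import random

def single_point_crossover(parent_a, parent_b, rng):
    """Cut two equal-length parents at one random interior point and swap suffixes,
    giving two children (prefix of one parent joined to the suffix of the other)."""
    if len(parent_a) != len(parent_b):
        raise ValueError("parents must have equal length")
    cut = rng.randrange(1, len(parent_a))  # 1..len-1, so each child mixes both parents
    return parent_a[:cut] + parent_b[cut:], parent_b[:cut] + parent_a[cut:]

def crossover_pool(mating_pool, rng):
    """Step 1: pair members of the pool at random. Step 2: cross each pair.
    With an odd pool size, the unpaired member is left out in this sketch."""
    shuffled = list(mating_pool)
    rng.shuffle(shuffled)
    children = []
    for a, b in zip(shuffled[0::2], shuffled[1::2]):
        children.extend(single_point_crossover(a, b, rng))
    return children

children = crossover_pool(["AAAA", "CCCC", "GGGG", "TTTT"], random.Random(0))
```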

Mutation operates on a single candidate and produces a new candidate. Reproduction and crossover search and recombine existing material effectively. However, they can occasionally become overzealous and lose potentially useful genetic material, which is why mutation is needed. In artificial systems, the mutation operator protects against such irrecoverable loss. Mutation is therefore insurance against the premature loss of important material.
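Point mutation on a nucleotide string is one simple way to realize the mutation operator. The per-position `rate` parameter is an assumption of this sketch.

```python
import random

ALPHABET = "ACGT"

def point_mutate(seq, rate, rng):
    """Return a copy of seq in which each base, with probability `rate`,
    is replaced by one of the three other nucleotides."""
    out = []
    for base in seq:
        if rng.random() < rate:
            base = rng.choice([b for b in ALPHABET if b != base])
        out.append(base)
    return "".join(out)

child = point_mutate("GGTTGGTG", rate=0.1, rng=random.Random(0))
```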

In summary, these operators generate offspring (a set of new candidates). The result is a new set of diverse but similar candidate solutions (genomes), of some arbitrary population size. This new set is assessed for fitness, and the best solutions survive to reproduce and mutate. The new candidates compete with the old ones for places in the next generation (survival of the fittest). This continues until the aggregate fitness of the population reaches a stable plateau. At that point there are often several individuals that are reasonable solutions, frequently sharing some features. A population of candidate solutions, in their encoded form, can thus converge toward a set of solutions meeting a given "fitness" criterion. The resulting population may have little in common compositionally with the initial one. Reaching a stable, highly fit population takes relatively few iterations and few candidates considered, far fewer than an exhaustive search.
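The stopping rule "continue until aggregate fitness reaches a stable plateau" needs a concrete test. The sketch below shows one assumed choice. It takes mean population fitness as the aggregate, and declares a plateau when that mean improves by less than `tolerance` over the last `window` generations.

```python
def aggregate_fitness(scores):
    """Aggregate fitness of a population: here, the mean of individual scores."""
    return sum(scores) / len(scores)

def has_plateaued(history, window=3, tolerance=1e-3):
    """True when aggregate fitness improved by less than `tolerance`
    over the last `window` generations."""
    if len(history) <= window:
        return False
    return history[-1] - history[-1 - window] < tolerance
```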

Genetic algorithms therefore make it possible to search iteratively for a solution, or set of solutions, in a search space too large to explore all at once. The procedure runs as follows:
1. Encode the problem in a set of candidate solutions.
2. Apply the fitness function to each candidate and identify the best ones.
3. Generate a diverse but similar population of "children" that carry their parents' features.
4. Apply the fitness query again, and repeat.

Starting from a random approximation, the algorithm thus moves the population toward the optimal set of solutions. How good the final solutions are depends on the robustness of the fitness function. A given problem may have several equally good solutions. A strength of genetic algorithms is that, under the right conditions, they tend to find some or all of them.
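Putting the operators together, the sketch below is a minimal, self-contained genetic algorithm over nucleotide strings. It combines the following pieces:
- a random initial population;
- fitness-proportionate selection;
- single-point crossover;
- point mutation;
- truncation to the fittest, with parents and children competing for places;
- a plateau stopping rule.

All parameter values and the toy fitness function are assumptions for illustration. In the invention, fitness would come from measured or computed properties of real sequences.

```python
import random

ALPHABET = "ACGT"

def genetic_search(fitness, length, pop_size=60, mutation_rate=0.02,
                   window=5, tolerance=1e-3, max_generations=200, seed=0):
    """Minimal genetic algorithm over nucleotide strings. Stops when mean fitness
    plateaus or after max_generations; returns the population, best first."""
    rng = random.Random(seed)
    pop = ["".join(rng.choice(ALPHABET) for _ in range(length)) for _ in range(pop_size)]
    rank = lambda s: (fitness(s), s)  # tie-break on the string for reproducibility
    history = []
    for _ in range(max_generations):
        scores = [fitness(s) for s in pop]
        history.append(sum(scores) / len(scores))
        if len(history) > window and history[-1] - history[-1 - window] < tolerance:
            break  # aggregate fitness has plateaued
        # Selection: fitness-proportionate sampling (offset avoids all-zero weights).
        parents = rng.choices(pop, weights=[f + 1e-9 for f in scores], k=pop_size)
        children = []
        for a, b in zip(parents[0::2], parents[1::2]):
            cut = rng.randrange(1, length)  # single-point crossover
            for child in (a[:cut] + b[cut:], b[:cut] + a[cut:]):
                # Mutation: each base is resampled uniformly with probability mutation_rate.
                child = "".join(rng.choice(ALPHABET) if rng.random() < mutation_rate else c
                                for c in child)
                children.append(child)
        # Survival of the fittest: parents and children compete for pop_size places.
        pop = sorted(set(pop) | set(children), key=rank, reverse=True)[:pop_size]
    return sorted(pop, key=rank, reverse=True)

TARGET = "GGTTGGTG"  # hypothetical optimum for the toy fitness below
def toy_fitness(s):
    return sum(a == b for a, b in zip(s, TARGET)) / len(TARGET)

best = genetic_search(toy_fitness, len(TARGET))[0]
```

Survivors are chosen from parents and children together, so the best fitness in the population never decreases. Because crossover and mutation create new strings, the final population can contain sequences that were absent from the initial one.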

With this in mind, the present invention encompasses improvements to SELEX. Put another way, it applies a computational heuristic to improve SELEX, and to that end is a physical embodiment of a genetic algorithm. In one aspect, the invention is the use of a genetic algorithm paradigm to direct the evolution of polymer sequences in the design of candidate aptamer sequences.

Viewed another way, the invention is a method for identifying one or more aptamers to at least one target molecule. The method comprises SELEX, and is characterized in that it further uses the fitness of candidate aptamer sequences to direct their evolution toward the fittest sequence for each target molecule.

Specifically, the present invention is a method for identifying one or more aptamers to at least one target molecule, comprising:
a) selecting candidate aptamer sequences that bind to the target molecule;
b) assigning to the bound sequences a measure of each sequence's aptameric potential (fitness function);
c) allowing evolution, through random or directed changes to some or all of the sequences, to generate a new mixture of candidate sequences;
d) repeating steps a) to c) with the newly generated candidate aptamer pool until the aggregate aptameric potential of the candidate pool reaches a plateau,
wherein the sequences present in the final pool are optimal aptamers for the target molecule.
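Steps a) to d) can be read as an outer loop. The skeleton below is a sketch in which `bind`, `fitness` and `evolve` are hypothetical callables. They stand in for the wet-lab binding selection, the fitness measurement and the evolution step. The plateau test on mean fitness is an assumed choice. In the toy usage, one A is replaced by G per round as a stand-in for a directed change.

```python
def identify_aptamers(pool, bind, fitness, evolve, window=3, tolerance=1e-3, max_rounds=50):
    """Skeleton of steps a) to d). bind, fitness and evolve are hypothetical callables."""
    history = []
    bound = []
    for _ in range(max_rounds):
        bound = bind(pool)                                  # a) sequences that bind the target
        scores = {s: fitness(s) for s in bound}             # b) fitness per distinct sequence
        history.append(sum(scores.values()) / max(1, len(scores)))  # aggregate potential
        if len(history) > window and history[-1] - history[-1 - window] < tolerance:
            break                                           # d) plateau reached
        pool = evolve(scores)                               # c) new candidate mixture
    return bound                                            # final pool

# Toy stand-ins (hypothetical): "binding" = contains G; fitness = fraction of G;
# evolution = a directed change turning the first A into G.
result = identify_aptamers(
    ["AAAG", "AAGA", "CCCC"],
    bind=lambda pool: [s for s in pool if "G" in s],
    fitness=lambda s: s.count("G") / len(s),
    evolve=lambda scores: [s.replace("A", "G", 1) for s in scores],
)
print(result)  # ['GGGG']
```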

More specifically, the method comprises:
a) contacting a pool of candidate polymer sequences with at least one target molecule;
b) partitioning the sequences that have bound specifically to the target molecule from unbound sequences;
c) dissociating the sequence-target complexes to obtain a ligand-enriched mixture of sequences;
d) assigning to each sequence obtained in step c) a measure of its aptameric potential (fitness function);
e) using the measures from step d) to determine the aptameric potential of the ligand-enriched mixture;
f) using the information obtained in step e) to allow some or all of the sequences obtained in step c) to evolve, generating a new mixture of sequences;
g) repeating steps a) to f) with the newly generated candidate aptamer pool until the aggregate aptameric potential of the candidate pool reaches a plateau,
wherein the sequences present in the final pool are optimal aptamers for the at least one target molecule.
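Steps d) to f) depend on turning sequencing counts into fitness values. The sketch below assumes one simple option:
- per-sequence fitness is the fraction of input copies recovered after partitioning and dissociation;
- pool potential is the abundance-weighted mean of those values;
- the top quarter of sequences is passed on to seed evolution.

The function names, sequences and counts are hypothetical.

```python
from collections import Counter

def recovery_fitness(input_counts: Counter, eluted_counts: Counter) -> dict:
    """Step d) sketch: fitness = fraction of a sequence's input copies recovered
    after partitioning and dissociation (an assumed, simple measure)."""
    return {s: eluted_counts.get(s, 0) / n for s, n in input_counts.items() if n > 0}

def pool_potential(fitness: dict, eluted_counts: Counter) -> float:
    """Step e) sketch: abundance-weighted mean fitness of the ligand-enriched mixture."""
    total = sum(eluted_counts.values())
    if total == 0:
        return 0.0
    return sum(fitness.get(s, 0.0) * c for s, c in eluted_counts.items()) / total

def parents_for_evolution(fitness: dict, top_fraction=0.25):
    """Step f) sketch: the fittest sequences seed the next mixture."""
    ranked = sorted(fitness, key=fitness.get, reverse=True)
    return ranked[:max(1, int(len(ranked) * top_fraction))]

input_counts = Counter({"GGTTGG": 100, "ACGTAC": 100, "TTTTTT": 100, "CCCCCC": 100})
eluted_counts = Counter({"GGTTGG": 50, "ACGTAC": 10, "TTTTTT": 5})
fit = recovery_fitness(input_counts, eluted_counts)
potential = pool_potential(fit, eluted_counts)
seeds = parents_for_evolution(fit)  # ['GGTTGG']
```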

The method of the invention thus allows the identification of aptamers that are selective for the target, in practice the best aptamers. The candidate polynucleotide sequences themselves represent possible solutions within the search space. The search space is the set of all possible aptamers and, implicitly, their sequences. Sequence evolution allows the population of possible solutions to move toward the optimal solution. The sequences in the final pool therefore need not have been present in the original search pool, and in practice most probably were not. SELEX, by contrast, interrogates a fixed population of possible solutions. Allowing the candidate sequences in the pool to evolve therefore gives a better chance of obtaining the best solution (aptamer) than SELEX does.

The term evolution encompasses the propagation, recombination, crossover and mutation of sequences between iterations of the selection process. Evolution can be random, designed, or a combination of the two.

It should be noted that, because the only thing that distinguishes one polynucleotide aptamer from another is its sequence, the properties of an aptamer must be encoded in that sequence.

Aptamer potential is assigned using one or more measured or calculated properties specific to each sequence. For example, aptamer potential can be based on the abundance of a sequence in the ligand-enriched mixture, relative to its abundance in a previous iteration or in a control. This quantification can be combined with other measures of aptamer potential. Such measures are typically derived from the sequence itself and from properties conferred by the sequence, such as predicted secondary or tertiary structure, hydrophobicity, and similarity to known aptamers. Hybrid measures that combine or aggregate these measures mathematically may also be appropriate. The likelihood associated with certain generic sequence motifs can also be used to infer the aptamer potential of a sequence. The method of determining the relative or absolute aptamer potential of a candidate sequence from its measured statistical or sequence features (or other features) is called the “fitness function”, following the standard terminology of those skilled in evolutionary search heuristics (14). Other terms, such as “objective function”, exist for the same concept. Put non-technically, it is an objective measure (usually a numerical score) of a candidate aptamer's potential.
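To illustrate a hybrid fitness function of the kind described above, the sketch below combines an abundance term (log2 enrichment of a sequence's relative frequency versus the previous round or a control) with a simple sequence-derived term (presence of a motif). The function name, weights, pseudocount and motif are all illustrative assumptions; structure prediction, hydrophobicity or similarity scores could be added as further weighted terms in the same way.

```python
import math
from typing import Mapping


def hybrid_fitness(
    seq: str,
    counts_now: Mapping[str, int],
    counts_before: Mapping[str, int],
    motif: str,
    w_enrich: float = 1.0,
    w_motif: float = 0.5,
    pseudocount: float = 1.0,
) -> float:
    """Score one candidate by combining abundance-based and sequence-based measures.

    - Enrichment: log2 ratio of the sequence's relative abundance in the
      ligand-enriched mixture versus the previous round (or a control).
      The pseudocount keeps sequences unseen in either round from giving
      a zero frequency.
    - Motif term: 1 if the sequence contains `motif`, else 0.
    Weights, pseudocount and motif are hypothetical placeholders.
    """
    total_now = sum(counts_now.values())
    total_before = sum(counts_before.values())
    freq_now = (counts_now.get(seq, 0) + pseudocount) / (total_now + pseudocount)
    freq_before = (counts_before.get(seq, 0) + pseudocount) / (total_before + pseudocount)
    enrichment = math.log2(freq_now / freq_before)
    motif_term = 1.0 if motif in seq else 0.0
    return w_enrich * enrichment + w_motif * motif_term


before = {"AAGG": 10, "CCTT": 10}
after = {"AAGG": 30, "CCTT": 10}
for s in before:
    print(s, round(hybrid_fitness(s, after, before, motif="GG"), 3))
```

Here “AAGG” scores higher than “CCTT” both because its share of the bound fraction grew and because it carries the (hypothetical) motif.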

A feature of the invention is that it allows highly aptameric individuals to undergo small random compositional changes, such as a single change in sequence, and cross-recombination of sequences, thereby generating a new pool of stronger candidate aptamer sequences. A significant advantage of allowing sequences to evolve is that it permits the generation of sequences and motifs that may be absent from the initial candidate pool, which increases the likelihood of identifying the optimal aptamer for the target protein. Another significant advantage of the method is that only the “fittest” candidate sequences are allowed to recombine with one another and mutate, so that the child population of new aptamer sequences they produce is presumptively better adapted.
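The selection, single-base mutation and cross-recombination steps described above can be sketched in silico as follows. This is a hypothetical illustration, assuming equal-length DNA sequences, single-point crossover, truncation selection of the fittest fraction and a caller-supplied fitness function; in the method itself these operations act on the physical molecules, and the parameter values here are placeholders.

```python
import random
from typing import Callable, List

BASES = "ACGT"


def point_mutation(seq: str, rng: random.Random) -> str:
    """Replace one randomly chosen base with a different base (a single change in sequence)."""
    i = rng.randrange(len(seq))
    new_base = rng.choice([b for b in BASES if b != seq[i]])
    return seq[:i] + new_base + seq[i + 1:]


def crossover(a: str, b: str, rng: random.Random) -> str:
    """Single-point crossover: a prefix of `a` joined to the matching suffix of `b`."""
    cut = rng.randrange(1, len(a))  # assumes len(a) == len(b) >= 2
    return a[:cut] + b[cut:]


def next_generation(
    pool: List[str],
    fitness: Callable[[str], float],
    n_children: int,
    keep_fraction: float = 0.2,
    mutation_rate: float = 0.1,
    seed: int = 0,
) -> List[str]:
    """Breed a new candidate pool from only the fittest sequences."""
    rng = random.Random(seed)
    ranked = sorted(pool, key=fitness, reverse=True)
    parents = ranked[: max(2, int(len(ranked) * keep_fraction))]
    children = []
    for _ in range(n_children):
        a, b = rng.sample(parents, 2)  # two distinct fit parents
        child = crossover(a, b, rng)
        if rng.random() < mutation_rate:
            child = point_mutation(child, rng)
        children.append(child)
    return children


pool = ["AAAA", "AAAC", "CCCC", "GGGG", "TTTT"]
print(next_generation(pool, fitness=lambda s: s.count("A"), n_children=6))
```

Only the top fraction of the ranked pool ever becomes a parent, which mirrors the point that only the fittest candidates recombine and mutate.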

The fundamental difference between the scheme of the invention and SELEX is that SELEX does not readily allow mutations or changes to be introduced into the sequences under study. In fact, the only way mutations can arise is by chance during the amplification step, when the bound sequences are amplified by PCR. Such mutations are so rare, however, that they may be unable to affect the process (that is, the overall composition of the population of sequences under study). Nor does SELEX effect or permit recombination or crossover of sequences between aptamer species (otherwise called “recombination” or “breeding”), or rational intervention to alter aptamer sequences. Furthermore, the SELEX scheme does not, for example, allow sequences to be assessed for their “fitness” as aptamers and rationally manipulated between rounds of selection.

One example of rational intervention is to ensure that all aptamer sequences in every iteration contain a particular sequence motif, which might not otherwise be the case after mutation and recombination. Rational intervention thus acts as a filter on, or constraint of, randomness. Importantly, prior knowledge (for example, known protein-binding motifs) can be incorporated into and maintained in the population of sequences under study, which can improve the efficiency of the selection process. This does not completely determine the outcome, but it allows a human to direct and constrain evolution.
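A rational intervention that acts as a filter can be as simple as discarding offspring that no longer carry a required motif after mutation and recombination. The sketch below assumes the motif is matched exactly as a substring; `enforce_motif` is a hypothetical name.

```python
from typing import Iterable, List


def enforce_motif(candidates: Iterable[str], required_motif: str) -> List[str]:
    """Keep only candidates that still contain the required motif.

    Intended to run after mutation/recombination, so that random changes
    cannot remove prior knowledge such as a known protein-binding motif.
    """
    return [s for s in candidates if required_motif in s]


offspring = ["ACGGTA", "ACCTA", "GGTA"]
print(enforce_motif(offspring, "GGT"))  # ['ACGGTA', 'GGTA']
```

A filter leaves the random search untouched except at the constrained feature, which matches the idea of directing rather than fully determining evolution.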

In a further example, monitoring the diversity of the sequence pool at each iteration can encourage the population to find multiple solutions (aptamers) to multiple proteins in parallel. This can be done by identifying, monitoring and directing the evolution of multiple subpopulations of distinct sequences within the overall population. This is not possible with SELEX because, unlike the proposed invention, SELEX neither quantifies the bound aptamers at each iteration of the process nor uses that information, between iterations of the aptamer pool, to make inferences and decisions about each sequence in a way that positively affects the overall result.
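Monitoring pool diversity at each iteration needs a diversity statistic. A minimal sketch, assuming each round's sequencing reads are available as strings, reports the number of distinct sequences and the Shannon entropy (in bits) of their frequency distribution. Shannon entropy is an assumed choice here, not one the method prescribes.

```python
import math
from collections import Counter
from typing import Iterable, Tuple


def pool_diversity(reads: Iterable[str]) -> Tuple[int, float]:
    """Return (number of distinct sequences, Shannon entropy in bits) for one round."""
    counts = Counter(reads)
    total = sum(counts.values())
    entropy = -sum((c / total) * math.log2(c / total) for c in counts.values())
    return len(counts), entropy


print(pool_diversity(["AC", "AC", "GT", "GT"]))  # (2, 1.0)
```

Tracking this per round shows whether the pool is collapsing onto a single sequence (entropy falling towards zero) or sustaining several subpopulations.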

So-called “next-generation” polynucleotide sequencing technologies separate individual molecules of DNA, RNA or other polynucleotide sequences onto a surface such as a bead or chip, producing a single-molecule array of sequences. The array has a surface density at which each molecule can be resolved individually, for example by optical microscopy. Sequencing the polynucleotide molecules on the array yields a “digital” (that is, absolute) count of sequences, and hence direct quantification of the sequences present on the array. In some technologies, the sequences, once arrayed, can be clonally amplified to enhance and/or sharpen the signal from each sequence. Even so, quantification is obtained by counting the occurrences of each sequence on the array, not from the signal produced by each amplicon. Examples of suitable sequencing and quantification technologies can be found in publications such as WO 00/006770 and Branton et al. (21).
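Because each read corresponds to one arrayed molecule (or one clonal cluster), quantification reduces to counting identical reads. A minimal sketch, assuming the reads have already been trimmed to the candidate region; `digital_counts` is a hypothetical name.

```python
from collections import Counter
from typing import Dict, Iterable


def digital_counts(reads: Iterable[str]) -> Dict[str, float]:
    """Turn one read per arrayed molecule into relative abundances.

    The count of identical reads is the abundance measure; no signal
    intensity from individual amplicons is used.
    """
    counts = Counter(reads)
    total = sum(counts.values())
    return {seq: n / total for seq, n in counts.items()}


print(digital_counts(["AC", "AC", "GT", "AC"]))  # {'AC': 0.75, 'GT': 0.25}
```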

In the proposed invention, candidate aptamer sequences are manipulated by means of massively parallel DNA sequencing, such as the technologies embodied in “second-generation”, “next-generation” or “third-generation” DNA sequencers. Massively parallel sequencing allows the ligand-enriched candidate aptamers to be quantified and sequenced in parallel, providing information both for fitness measures derived from their abundance and, at the same time, for other fitness measures derived from their sequences. Given the performance of massively parallel sequencers, such a scheme should significantly reduce the time and cost of the experiments needed to find aptamers under the proposed scheme.

Another example of rational intervention (or human direction) therefore arises when examining and counting the aptamer sequences under study reveals shortcomings in the efficiency of the sequencer itself that cause particular sequences to be under-represented. This can be compensated for by biasing the composition of the subsequent population.
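Compensating for sequencer bias can be sketched as reweighting counts by an estimated per-sequence detection efficiency and renormalizing. The efficiency model is an assumption supplied by the caller (for example, one estimated from spike-in controls of known abundance); the homopolymer penalty in the usage line is purely hypothetical.

```python
from typing import Callable, Dict


def correct_for_sequencer_bias(
    counts: Dict[str, int],
    efficiency: Callable[[str], float],
) -> Dict[str, float]:
    """Reweight counts by 1/efficiency and renormalize to relative abundance.

    `efficiency(seq)` is the assumed probability (0 < e <= 1) that a molecule
    of `seq` is actually read by the sequencer.
    """
    corrected = {}
    for seq, n in counts.items():
        e = efficiency(seq)
        if not 0 < e <= 1:
            raise ValueError(f"efficiency for {seq!r} must be in (0, 1], got {e}")
        corrected[seq] = n / e
    total = sum(corrected.values())
    return {seq: v / total for seq, v in corrected.items()}


# Hypothetical model: runs of four or more A's are read at half efficiency.
print(correct_for_sequencer_bias({"AAAAAA": 10, "ACGTAC": 20},
                                 lambda s: 0.5 if "AAAA" in s else 1.0))
```

The corrected abundances could then be used to bias the composition of the next pool, for example by over-seeding the under-reported sequences.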

In a preferred embodiment, therefore, aptamer potential is measured by quantifying each sequence obtained in step c). Ideally, quantification is performed using a single-molecule array of the sequences, or a similar device that can sequence and count individual molecules in a massively parallel fashion. In particular, sequencing, or partial sequencing sufficient to identify each sequence on the array, is combined with counting of each sequence to achieve quantification. Alternatively, each sequence can be amplified after being arrayed on the surface, and the sequences quantified on the resulting array of clones. In this way, the primary fitness measure can be the count on the array, which represents the frequency of a given candidate aptamer sequence in the population under study. Once the primary fitness has been obtained, bioinformatic data (such as shared motifs or secondary structure) derived from the nucleotide sequence or molecular composition of each candidate aptamer, and obtained during quantification, can be used, in addition to drilling down into the ligand-enriched sequences, to obtain further fitness criteria. These calculated but composition-dependent properties can then be exploited in the evolutionary generation of the candidate aptamer library for the next round, by incorporating them, biasing towards them, or eliminating them.
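One way to drill down into the ligand-enriched sequences for additional fitness criteria is to look for short subsequences (k-mers) that are over-represented among the highest-count sequences relative to a background set. The sketch below is a simple enrichment ratio with +1 smoothing, not a specific published motif-finding algorithm; `k`, the threshold and the function names are illustrative assumptions.

```python
from collections import Counter
from typing import List, Tuple


def kmer_counts(seqs: List[str], k: int) -> Counter:
    """Count in how many sequences each k-mer occurs (presence, not multiplicity)."""
    c: Counter = Counter()
    for s in seqs:
        c.update({s[i:i + k] for i in range(len(s) - k + 1)})
    return c


def enriched_kmers(
    top: List[str], background: List[str], k: int = 4, min_ratio: float = 2.0
) -> List[Tuple[str, float]]:
    """Return (k-mer, ratio) pairs over-represented in `top` versus `background`.

    ratio = fraction of `top` sequences containing the k-mer divided by the
    (+1 smoothed) fraction of `background` sequences containing it.
    """
    top_c, bg_c = kmer_counts(top, k), kmer_counts(background, k)
    hits = []
    for kmer, n in top_c.items():
        ratio = (n / len(top)) / ((bg_c[kmer] + 1) / (len(background) + 1))
        if ratio >= min_ratio:
            hits.append((kmer, ratio))
    return sorted(hits, key=lambda kv: kv[1], reverse=True)


top = ["AGGAAT", "CGGAAC", "TGGAAG"]
background = ["ACTACT", "CATCAT", "TCATCA", "GGAACC"]
print(enriched_kmers(top, background))
```

A k-mer found this way could feed back into the next round as a motif to incorporate or bias towards.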

The invention exploits the ability of currently available technologies to separate and quantify individual sequences, down to single-molecule sensitivity, in a massively parallel fashion. This also makes it possible to study complex and diverse populations of sequences. When screening against a single protein, candidate aptamer sequences found in greater abundance in the bound fraction have higher fitness than sequences of which only one copy is bound, because the method searches for candidate aptamer sequences that stick to (bind) the protein under study. This is the primary property that defines an aptamer. When screening against a complex mixture of proteins, other statistical properties derived from the quantification procedure are needed to identify aptamers. The advantage of massively parallel sequencing is that each and every sequence can be counted and therefore quantified. This contrasts with, for example, antibody-based analysis, whose resolution cannot be as high.

Aptamer counts, together with other measured, calculated or historical statistics generated from their sequences, can therefore be used and combined in a “fitness function”. This enables the physical implementation, at the molecular level, of evolutionary search heuristics that are more sophisticated, and more successful, than the one embodied in SELEX. As with such sophisticated heuristics in the computational domain, what matters is less the detail of any given “fitness” measure than the ability to derive such a “fitness function” and apply it to improve subsequent iterations of the population of molecules under study. Moreover, the invention allows many such investigations, making it possible to learn the optimal general “fitness” properties of aptamers or to derive context-specific “fitness” measures, for example for a particular class of proteins.

As information about candidate sequences and motifs accumulates, knowledge of particular features can be incorporated into later candidate pools, or into new pools for identifying aptamers to related or similar proteins. Likewise, sequences and motifs found to be ineffective can be actively excluded from the candidate pool.

Also, if an aptamer selective for a different protein is required, the sequences of aptamers already known for known proteins can be actively excluded from the candidate pool. This helps to ensure that the aptamers found are likely to be specific for the protein in question (a key requirement for a useful aptamer). Alternatively, if sequences and/or motifs conferring selectivity and specificity for a particular protein family have been identified, they can be incorporated into the candidate aptamer pools used to identify aptamers selective for subtypes of those proteins.
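Excluding the sequences of aptamers already known for other proteins can be sketched as a distance filter. This assumes Hamming distance between equal-length sequences and an arbitrary threshold; in this simplified version, sequences of different length are excluded only if identical. The names are hypothetical.

```python
from typing import Iterable, List


def hamming(a: str, b: str) -> int:
    """Number of mismatched positions between two equal-length sequences."""
    return sum(x != y for x, y in zip(a, b))


def exclude_known(
    candidates: Iterable[str], known_aptamers: Iterable[str], min_distance: int = 3
) -> List[str]:
    """Drop candidates identical to, or within `min_distance` of, a known aptamer."""
    known = list(known_aptamers)
    kept = []
    for s in candidates:
        too_close = any(
            s == k or (len(s) == len(k) and hamming(s, k) < min_distance) for k in known
        )
        if not too_close:
            kept.append(s)
    return kept


print(exclude_known(["ACGTACGT", "ACGTACGA", "TTTTCCCC"], ["ACGTACGT"]))  # ['TTTTCCCC']
```

Near-copies are removed as well as exact copies, since a one-base variant of an aptamer for another protein is likely to share its binding behavior.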

All of these advantages are driven by knowing the aptamer sequences and their relative “fitness”, together with other computable features derived from their sequences or from their quantitative behavior under the selection scheme.

The encoding scheme used by genetic algorithms mirrors the actual compositional nature of the invention. Genetic algorithms encode a search problem as a “genome”, an analogy built from properties or numbers (or bits) that undergo iterative changes such as cross-recombination and selection/amplification, albeit within a computer's memory. In the invention, these processes are carried out using the actual molecules under study. The molecules themselves (here, polynucleotide sequences) encode the possible solutions to the search, and it is the evaluation of their properties, such as their stickiness to proteins (as represented by quantification), which are also encoded in their sequences, that drives the method (15). The invention is therefore a physical embodiment of a genetic algorithm. This is made possible by the use of massively parallel DNA/RNA sequencing and by the fact that aptamers are individual polymer molecules whose aptameric properties are encoded in their sequences.
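The genetic-algorithm analogy can be made concrete with a purely in-silico toy in which “binding” is replaced by the number of positions matching a hidden ideal sequence. This scoring is an assumption made only so the loop can run without a laboratory; in the invention, scores come from sequencing counts of physically selected molecules. The sketch keeps the two best sequences each round (elitism), so the best score never decreases. All names and parameters are hypothetical.

```python
import random
from typing import List

BASES = "ACGT"


def toy_affinity(seq: str, ideal: str) -> int:
    """Toy stand-in for measured binding: positions matching a hidden 'ideal' binder."""
    return sum(a == b for a, b in zip(seq, ideal))


def evolve(ideal: str, pool_size: int = 200, max_rounds: int = 100, seed: int = 0) -> List[int]:
    """Select -> recombine -> mutate, in silico; returns the best toy score per round."""
    rng = random.Random(seed)
    length = len(ideal)
    pool = ["".join(rng.choice(BASES) for _ in range(length)) for _ in range(pool_size)]
    history: List[int] = []
    for _ in range(max_rounds):
        # Selection and scoring (steps a-e): rank by the toy binding score.
        pool.sort(key=lambda s: toy_affinity(s, ideal), reverse=True)
        history.append(toy_affinity(pool[0], ideal))
        if history[-1] == length:
            break  # toy optimum reached
        parents = pool[: pool_size // 5]
        # Evolution (step f): keep the two best unchanged, breed the rest.
        children = parents[:2]
        while len(children) < pool_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, length)
            child = list(a[:cut] + b[cut:])  # single-point crossover
            if rng.random() < 0.3:
                # Re-draw one base at random (it may come out unchanged).
                child[rng.randrange(length)] = rng.choice(BASES)
            children.append("".join(child))
        pool = children
    return history


print(evolve("GGTTAACCGGTA"))  # best score per round, non-decreasing
```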

More specifically, massively parallel sequencing makes it possible to run an optimized semi-rational/semi-random evolutionary search strategy to identify specific aptamers against a protein target or multiple protein targets. For example, known sequence motifs previously shown to be effective against proteins similar to the one under study can be preferentially selected, both during initial library generation and during fitness selection. This inherently and conveniently biases the process towards high-quality, or “fit”, solutions that contain these motifs (which are also a prerequisite for any solution). However, sufficient randomness in library generation, and subsequent mutation of the motifs, can be used to ensure that new but similar sequences that bind the protein under study are found. The fitness function can be designed to select sequences that are similar to, but sufficiently different from, those already found for previous proteins, ensuring both success and specificity in selecting suitable aptamers. This also promotes the accumulation, over time, of a catalog of sensitive and specific aptamers against many proteins. Such “non-overlapping” subpopulations of distinct aptamers, each acting on one protein but not on others, can then be used simultaneously to probe and measure multiple proteins in parallel.
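A fitness function that favors a known motif while penalizing sequences too close to aptamers already found for other proteins might look like the sketch below. The `binding_score` input (for example an enrichment value), the bonus, the penalty and the distance threshold are hypothetical parameters chosen for illustration.

```python
from typing import Iterable


def hamming(a: str, b: str) -> int:
    """Number of mismatched positions between two equal-length sequences."""
    return sum(x != y for x, y in zip(a, b))


def specificity_aware_fitness(
    seq: str,
    binding_score: float,
    preferred_motif: str,
    known_for_other_targets: Iterable[str],
    min_distance: int = 4,
    motif_bonus: float = 1.0,
    clash_penalty: float = 5.0,
) -> float:
    """Reward a preferred motif; penalize near-copies of aptamers for other proteins."""
    score = binding_score
    if preferred_motif in seq:
        score += motif_bonus
    nearest = min(
        (hamming(seq, k) for k in known_for_other_targets if len(k) == len(seq)),
        default=None,
    )
    if nearest is not None and nearest < min_distance:
        score -= clash_penalty  # too similar: likely to cross-react
    return score


known = ["GGAATTCC"]
print(specificity_aware_fitness("GGAATTCA", 2.0, "GGAA", known))  # -2.0
print(specificity_aware_fitness("GGAACGTA", 2.0, "GGAA", known))  # 3.0
```

Both candidates carry the motif, but only the second is far enough from the existing aptamer to keep its bonus, which is the “similar but sufficiently different” behavior described above.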

Because massively parallel sequencing allows multiplexing, aptamer libraries for more than one protein can be generated and screened simultaneously. Likewise, the invention allows many high-quality aptamers that differ at the sequence level to be selected in parallel against single or multiple protein targets.

Furthermore, because massively parallel sequencing allows each sequence species to be quantified, and each species can represent a single protein, the proposed method uses genomics to extend the power and dynamic range of proteomics.

In a specific embodiment of the invention, a diverse library of designed, random or semi-designed sequences is screened and selected against an in vitro set of folded proteins derived from biological material (e.g. serum, plasma). The library of sequences may be generated by any method.

Sequences that do not bind protein are removed or eluted. Various known options are available for this. For example, a single protein or a complex protein sample can be immobilized on a solid support, and stringent washing can be performed to remove unbound and weakly bound sequences. Another option is reversible cross-linking of the sequences to their protein targets (photoaptamers).

The remaining bound sequences are removed from their protein hosts and passed through a massively parallel sequencer, where they are sequenced and counted.

As a specific example of massively parallel sequencing, the bound sequences are randomly arrayed on a surface such as a bead or chip. The sequences are optionally amplified cyclically, producing groups of clonal single-stranded molecules at distinct x and y coordinates on the surface or on individual beads. A suitable DNA sequencer then carries out stepwise chemistry, using reagents that allow one base per cycle to be determined on each complementary strand, and monitors base incorporation in real time. An illumination and imaging system allows this process to be recorded so that the original candidate sequences can be read out. Whatever the details of the sequencing technology, a complementary sequence reflecting each bound candidate sequence is built. Typically, such technologies can sequence 40 million to 300 million DNA fragments, up to 75 base pairs long, and these numbers are increasing rapidly as the technology improves. The process currently takes less than 1 to 3 days from sample preparation to sequencing output, and these timescales are shrinking.

Other massively parallel methods falling into the category of “next-generation polynucleotide sequencing” can also be used, and the invention is not limited to the specific example described. “Next-generation polynucleotide sequencing” is a term generally used to describe the DNA/RNA sequencing platforms that emerged in 2004. Since 2008, a further generation of sequencing platforms with improved features has come to be described as “third generation”. Their common feature is that they use sequencing chemistries different from those of the older technology based on “Sanger” sequencing. The new platforms use new chemistries and generally offer very high throughput at much lower cost, achieved through the ability to parallelize sequencing reactions to a very high degree. Unlike the Sanger method, which works by cutting off (degrading) bases, the new platforms typically, though not exclusively, proceed by synthesis or chain extension (building). Furthermore, the new platforms operate on small numbers of single molecules or clonal molecules, whereas Sanger DNA sequencers use a very large ratio of molecules to bases measured.

Although DNA sequencing is referred to, DNA sequencing technologies can also be used for RNA or PNA (peptide nucleic acid) sequences. References to next-generation sequencers or sequencing therefore encompass DNA, RNA and PNA, as well as all other chemical variants of nucleic acid-based polymers, and their analogs, suitable for use in the method of the invention. All of the new sequencing technologies, whether “next generation” or “third generation”, can sequence individual molecules, or clonal copies of them, in a massively parallel and highly efficient manner.

The use of “sequence counting” (that is, quantifying the absolute or relative abundance of sequences, such as miRNAs, in complex biological samples) is already well established and has been described for such next-generation or massively parallel platforms (18). On an effective massively parallel sequencing platform, the derived sequences complementary to the original candidate sequences are highly accurate and should have little, if any, significant systematic sequence-context bias, such as an inability to resolve homopolymers, palindromic sequences or other motifs that are strong structural elements. On some platforms this has already been established by resequencing large, complex genomes that contain such motifs.

The method of the invention can be performed on a single isolated protein. Alternatively, it can be performed on a single protein known to be present within a mixture of proteins.

However, one of the strengths of genetic algorithms is that subpopulations can evolve from a wider population, by sub-selecting against different criteria. In a further alternative, therefore, the method also allows many proteins within a mixture to be interrogated simultaneously. This is made possible by the use of next-generation DNA sequencing technology, which allows multiplexing.

When searching for candidate aptamers against multiple proteins, the emergence of distinct populations of candidate sequences can be monitored. These sequence populations can then be classified and developed in parallel under the general scheme of the invention. For example, distinct aptamers to different regions of a single protein can be identified. The number of populations that can be interrogated at any one time is limited by the sequencing array as well as by the dynamic range of the target molecules.
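Distinct candidate populations can be separated by clustering sequences around their most abundant members. The greedy scheme below is an illustrative assumption, not a method specified here: sequences are visited from most to least abundant, and each joins the first cluster whose seed lies within a distance threshold, otherwise starting a new cluster. The distance used is mismatches plus any length difference.

```python
from typing import Dict, List


def simple_distance(a: str, b: str) -> int:
    """Mismatched positions over the shared length, plus the length difference."""
    return sum(x != y for x, y in zip(a, b)) + abs(len(a) - len(b))


def greedy_clusters(counts: Dict[str, int], max_distance: int = 2) -> Dict[str, List[str]]:
    """Group sequences into subpopulations seeded by their most abundant members."""
    clusters: Dict[str, List[str]] = {}
    for seq in sorted(counts, key=counts.get, reverse=True):
        for seed in clusters:
            if simple_distance(seq, seed) <= max_distance:
                clusters[seed].append(seq)
                break
        else:
            clusters[seq] = [seq]  # no seed close enough: start a new subpopulation
    return clusters


counts = {"AAAAAA": 50, "AAAAAT": 20, "CCCCCC": 40, "CCCCCG": 5}
print(greedy_clusters(counts))
```

Each resulting cluster can then be scored and evolved as its own subpopulation, for example one per target protein or per binding region.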

Although the method of the invention can be used to interrogate a single target or protein (for example, a protein excised from a gel), it is particularly suited to analyzing mixtures of targets, including complex protein mixtures. The term “mixture of proteins” or “protein mixture” generally refers to a mixture of two or more different proteins (for example, a composition comprising two or more different proteins or isoforms thereof).

In preferred embodiments, the protein mixture analyzed herein may comprise more than about 10, preferably more than about 50, even more preferably more than about 100, and still more preferably more than about 500 different proteins, for example more than about 1000 or more than about 5000 different proteins. Exemplary complex protein mixtures include, but are not limited to, all or a fraction of the proteins present in a biological sample or a part thereof.

The term “biological sample” or “sample”, as generally used herein, refers to material obtained from a biological source, in unpurified or purified form. By way of example and not limitation, a sample may be obtained from: a virus (e.g. a virus of a prokaryotic or eukaryotic host); a prokaryotic cell (e.g. a bacterium or archaeon, such as a free-living or planktonic prokaryote, or a colony or biofilm containing prokaryotes); a eukaryotic cell or an organelle thereof, including eukaryotic cells obtained in vivo or in situ or cultured in vitro; or a eukaryotic tissue or organism (e.g. a cell-containing or cell-free sample from a eukaryotic tissue or organism). Eukaryotes may include protists (e.g. protozoa or algae), fungi (e.g. yeasts or molds), plants and animals (e.g. mammals, including humans and non-human mammals). A biological sample may thus encompass, for example, cells, tissues, organisms, or extracts thereof. A biological sample may preferably be removed from its biological source (e.g. from an animal such as a human or non-human mammal) by any suitable method, including but not limited to collecting urine, saliva, sputum, semen, milk, mucus, sweat, feces and the like; drawing blood, cerebrospinal fluid, interstitial fluid, intraocular fluid (vitreous humor) or synovial fluid; or by tissue biopsy, resection and the like. A biological sample may be further subdivided to isolate or enrich the portion used to obtain the proteins analyzed according to the invention. By way of example and not limitation, different tissue types may be separated from one another; specific cell types or cell phenotypes may be isolated from a sample using, for example, FACS sorting, antibody panning or laser capture microdissection; cells may be separated from interstitial fluid, for example blood cells from plasma or serum; and so on. Samples may be applied directly to the method of the invention, or may be processed, extracted or purified to varying degrees before use.

The sample can come from a healthy subject or from a subject suffering from a condition, disorder, disease, or infection. For example, the subject can be, without limitation, a healthy animal (e.g., a human or non-human mammal), or an animal (e.g., a human or non-human mammal) with cancer, an inflammatory disease, an autoimmune disease, a metabolic disease, a central nervous system disease, an eye disease, a heart disease, a lung disease, a liver disease, a gastrointestinal disease, a neurodegenerative disease, a genetic disease, an infectious disease or viral infection, or one or more other diseases.

Preferably, to increase the sensitivity and performance of proteome analysis, a protein mixture derived from a biological sample can be treated to deplete it of highly abundant proteins. For example, mammalian samples such as human serum or plasma contain highly abundant proteins, notably albumin, IgG, antitrypsin, IgA, transferrin, haptoglobin, and fibrinogen, which are preferably depleted from the sample in this way. Methods and systems for removing abundant proteins, such as immunoaffinity depletion, are known. For example, the Multiple Affinity Removal System (MARS-7, MARS-14) is commercially available from Agilent Technologies (Santa Clara, Calif.).

Although the present invention has particular application to identifying specific aptamers for proteins, it should be recognized that the invention also applies to interrogating other molecules, such as metabolites, potential small molecules, and biotherapeutic agents.

The method described above focuses on quantifying aptamer sequences by sequencing. When the goal is to generate a library of aptamers specific for a single protein, however, the aptamer sequences need not be individually determined until the library is complete. The method can therefore further comprise obtaining the sequences of the final pool of aptamer sequences only.

A significant advantage of this method is that only a small amount of material is required (picomoles, or even femtomoles). If 40-mer DNA aptamer sequences are prepared from the four bases A, C, G, and T, there are 4⁴⁰ ≈ 1.2 × 10²⁴ possible combinations. One mole (6.02 × 10²³ molecules) could therefore contain, at best, only about 50% of all possible sequences, and such a preparation would weigh more than 1 kg. One picomole contains approximately 6 × 10¹¹ molecules, so 1 ml of a picomolar solution of unique aptamers contains approximately 6 × 10⁸ molecules. If 17 positions of the 40-mer are fixed as a “motif”, leaving 23 variable and/or random positions, one picomole contains at most about 70 copies of each possible sequence, so a 1-10 ml picomolar solution should contain between 1 and approximately 10 copies of each sequence. The quantifying device (i.e., a massively parallel sequencer) can and should therefore exceed, in sensitivity, both the affinity of the aptamers for the proteins and the native concentrations of most proteins in an unmodified biological sample.

Next-generation sequencing instruments, such as those sold by Illumina®, typically have 330 × 8 imaging areas on their surface, each of which can cleanly yield around 25,000 sequences. It should therefore currently be possible to interrogate on the order of 10⁸ 40-mer sequences per run and per chip (330 × 8 × 25,000 ≈ 6.6 × 10⁷). The technique should thus allow 1:1 observation of picomolar aptamers in solution, giving it a dynamic range that spans the dynamic range of proteins in native samples. The performance of such sequencing technologies continues to improve over time.
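The arithmetic in the two paragraphs above can be reproduced in a few lines. This is a minimal Python sketch. Avogadro's constant is the only value not taken from the text, and the helper names (`possible_sequences`, `molecules_in`) are illustrative, not part of the described method.

```python
# Back-of-envelope figures behind the library-size and sequencing-capacity
# arguments. Avogadro's constant is the only physical input; the instrument
# figures (330 x 8 imaging areas, ~25,000 reads each) are those quoted in the text.
AVOGADRO = 6.02214076e23  # molecules per mole


def possible_sequences(variable_positions, alphabet_size=4):
    """Distinct sequences when `variable_positions` bases are free to vary."""
    return alphabet_size ** variable_positions


def molecules_in(volume_ml, molar):
    """Number of molecules in `volume_ml` millilitres of a `molar` (mol/L) solution."""
    return molar * (volume_ml / 1000.0) * AVOGADRO


all_40mers = possible_sequences(40)               # about 1.2e24
best_coverage_one_mole = AVOGADRO / all_40mers    # about 0.5
molecules_per_picomole = 1e-12 * AVOGADRO         # about 6.0e11
molecules_1ml_1pM = molecules_in(1.0, 1e-12)      # about 6.0e8
reads_per_run = 330 * 8 * 25_000                  # 6.6e7, i.e. order 1e8

for label, value in [
    ("possible 40-mers (4^40)", all_40mers),
    ("best coverage by one mole", best_coverage_one_mole),
    ("molecules per picomole", molecules_per_picomole),
    ("molecules in 1 ml at 1 pM", molecules_1ml_1pM),
    ("reads per run", reads_per_run),
]:
    print(f"{label:28s} {value:.3g}")
```

The coverage of about 0.5 shows why even a mole of random 40-mers cannot represent the whole sequence space. The 6.6 × 10⁷ reads per run is the basis for the order-of-magnitude figure of 10⁸.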

Once a library or population of aptamers specific for a particular protein has been developed, the aptamer sequences can be used to build rational knowledge of the structural, functional, and binding characteristics of the aptamer population for that protein, and vice versa. This knowledge can in turn be used to investigate the protein further and to improve other parameters and/or features of the aptamer library. These refinements can be applied during initial library generation and/or during rounds of iterative selection under the proposed scheme. In practice, sequence/structure relationships between identified aptamers and proteins accumulate over time. They can then be used as input to the design of initial aptamer libraries, so that different motifs explore different parts of aptamer space.

Such information can also be used to evaluate proteins whose structures resemble that of the initial protein of interest. In this way, once aptamers have been found for a particular class of proteins, similar proteins can be investigated and interrogated using a starting aptamer library rationally selected on the basis of prior knowledge of proteins in the same class. Knowledge of the library sequences for one protein, or for a particular domain, can also help in designing aptamer pools for related proteins and other members of a putative protein family. It can also be used to demonstrate and measure how similar aptamer sequences bind to the surfaces of similar proteins.

Discarded aptamer sequences can also be mined for information. For example, their statistical and computed properties (e.g., predicted structure) can be used to characterize the general properties of weak aptamers. When an initial aptamer library is being designed, such information provides useful knowledge and a shortcut to constraining the randomness of the initial library at the sequence level.

In an alternative embodiment, an intermediate step examines the aptamer tags (sequences) bioinformatically and rationally improves them. This ensures the evolutionary progression of the population toward diverse but highly specific tags. The bioinformatics data include sequences that give rise to particular secondary and tertiary structures, and sequence complementarity between candidate sequences and proteins.

As outlined above, features can be computed, aggregated, correlated, and stored for a given aptamer sequence and for other biological data such as protein families and folding domains. These bioinformatics features can be used to improve initial library generation. They can also be used as part of a “fitness function” between selection rounds of candidate aptamer sequences, to shape and improve the characteristics of the population at each stage.

Blood proteins are particular targets both for identifying markers of disease states and for drug treatment. It is widely assumed that the amount and/or conformation of blood proteins should be statistically associated with such states beyond endogenous natural variation. Blood and other body fluids are particular targets because they bathe affected tissues and transport proteins essential to life. They can also be obtained for testing during a medical consultation using relatively inexpensive and simple procedures.

However, blood proteins span a very wide range of concentrations. A small number of proteins account for more than 99.9% of all protein, and the remainder are distributed over concentrations ranging from picograms to milligrams per milliliter (19). A small population of highly abundant proteins therefore masks the rare but essential proteins that are also present in the mixture.

The method of the present invention enables the identification of aptamers to proteins present at low or very low abundance in biological samples. Thus, in a further embodiment, the method further comprises removing the very highly abundant candidate sequences found in the early (first and/or second) rounds and/or other rounds of iteration. A sequencing array has a finite number of “slots”, so highly abundant candidate sequences would occupy most, if not all, of them. This would mask any less abundant candidate sequences and give a distorted view of the dynamic range of the underlying proteins. Once very highly abundant candidate sequences are removed from the sequence pool, in either an absolute or a relative sense, the candidate sequences specific for highly abundant proteins can no longer be interrogated. That population of proteins within the complex mixture is then effectively ignored, even though those proteins are still present in the mixture.
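One way to picture the removal step in software is sketched below. It assumes each round yields a table of read counts per candidate sequence, and it drops sequences above an absolute share of reads and/or the top N by count. The function name and both thresholds are hypothetical. Physical depletion of the pool (for example by hybridization) is a separate, wet-lab step.

```python
from collections import Counter
from typing import Optional


def deplete_abundant(counts: Counter, max_fraction: Optional[float] = None,
                     top_n: int = 0) -> Counter:
    """Remove very highly abundant candidate sequences from a read-count table.

    max_fraction: absolute cut, drops any sequence holding more than this share of reads.
    top_n: relative cut, drops the `top_n` most abundant sequences.
    """
    total = sum(counts.values())
    removed = set()
    if max_fraction is not None and total:
        removed |= {seq for seq, n in counts.items() if n / total > max_fraction}
    if top_n:
        removed |= {seq for seq, _ in counts.most_common(top_n)}
    return Counter({seq: n for seq, n in counts.items() if seq not in removed})


reads = Counter({"GCTGAA": 9000, "GCTGTT": 600, "ACGTAC": 250, "TTAGCA": 150})
kept = deplete_abundant(reads, max_fraction=0.5)
print(sorted(kept))  # ['ACGTAC', 'GCTGTT', 'TTAGCA']
```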

Removing the sequences that fit one highly abundant protein will reveal either another highly abundant candidate sequence or candidate sequences for less abundant targets.

This is another example of a rational intervention that biases the search toward aptamer sequences directed to less abundant proteins. Other balancing criteria will also be required in the fitness function, such as low variance between replicates of the same iteration for putative low-abundance aptamers, and/or a preference for certain known sequence motifs. These criteria help ensure that low-abundance but “fit” sequences are selected from the background noise of non-specific or random sequences.
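Here is a minimal sketch of such a balancing term. It assumes that read counts from replicates of the same iteration are available for each candidate. The score rewards agreement between replicates (a low coefficient of variation) and the presence of preferred motifs, rather than raw abundance. The function name, the weights, and the additive form are illustrative assumptions, not the fitness function of the invention.

```python
from statistics import mean, pstdev


def balanced_fitness(seq, replicate_counts, preferred_motifs=(),
                     cv_weight=1.0, motif_bonus=0.5):
    """Illustrative fitness term favouring reproducible, motif-bearing sequences.

    replicate_counts: read counts for `seq` in replicates of the same iteration.
    """
    m = mean(replicate_counts)
    if m == 0:
        return 0.0  # never observed: no evidence of binding
    cv = pstdev(replicate_counts) / m             # coefficient of variation
    consistency = 1.0 / (1.0 + cv_weight * cv)    # 1.0 when replicates agree exactly
    motif_hits = sum(1 for motif in preferred_motifs if motif in seq)
    return consistency + motif_bonus * motif_hits


rare_but_steady = balanced_fitness("ATGCTGCA", [12, 11, 13], preferred_motifs=["GCT"])
rare_and_noisy = balanced_fitness("ATACCACA", [30, 0, 3], preferred_motifs=["GCT"])
print(rare_but_steady > rare_and_noisy)  # True
```

In this example, a rare sequence with consistent counts and a GCT motif outscores one whose counts fluctuate between replicates, even though the noisy one has a comparable mean count.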

An advantage of this additional step is that it avoids having to manipulate the protein mixture to remove very highly abundant, common proteins. The protein mixture therefore more faithfully reflects the natural sample. By inference, any information obtained about proteins from the mixture is more likely to reflect the native state of each protein accurately. Using proteins in their native state has the further advantage of allowing selection of aptamers specific for proteins that carry chemical modifications or other post-translational modifications common in nature.

Highly abundant candidate aptamers can simply be ignored, or their population can be subtracted or removed from the pool. Removal can be achieved, for example, by hybridization to a solid support carrying probes complementary to the abundant sequences. Removal can also be a rational intervention that excludes particular candidate sequences from subsequent iterations. In that case, a DNA synthesizer is programmed to generate a new pool of sequences with all the desirable features that imply “fitness”, except for the sequences that bind highly abundant proteins. This new pool is then advanced to the next round of selection under the general scheme of the present invention.
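When the next pool is designed in silico before synthesis, the exclusion can be expressed as a filter over the designed sequences. The sketch below assumes equal-length designs. It treats any design within a chosen number of mismatches (Hamming distance) of a known abundant-protein binder as excluded. The function names and the mismatch threshold are hypothetical.

```python
def hamming(a, b):
    """Number of mismatched positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def exclude_binders(designed_pool, abundant_binders, max_mismatches=1):
    """Keep designs farther than `max_mismatches` from every abundant-protein binder."""
    return [seq for seq in designed_pool
            if all(hamming(seq, b) > max_mismatches for b in abundant_binders)]


designed = ["GCTGAA", "GCTGAT", "GCTCCA", "ATTGCA"]
binders = ["GCTGAA"]  # e.g. sequences that dominated the previous round
print(exclude_binders(designed, binders))  # ['GCTCCA', 'ATTGCA']
```

Excluding near neighbours, and not only exact matches, reflects the point that crossover and mutation can regenerate binders of abundant proteins in later rounds.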

Only the panel of least abundant candidate sequences from the first step is then retained for subsequent use. The quantitative power of the sequencer can thus be focused on very low-abundance candidate sequences and, by proxy, on proteins present at low abundance in the protein mixture. This is not currently possible with either conventional MS (mass spectrometry)-based techniques or SELEX.

Iterative subtraction of highly abundant candidate sequences should allow lower-abundance candidate sequences to be interrogated. Known amounts of known aptamer sequences are added to the candidate sequences to ensure that any binding is selective and does not simply occur by chance. The sequences are then run against a protein mixture that contains, at known amounts, the proteins bound by those known aptamers. Alternatively or additionally, low variance across multiple replicates should indicate that the candidate sequence under interrogation is likely to bind (stick to) a protein strongly.
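Here is a sketch of how the spiked-in control aptamers might be checked. It assumes each control is added at a known relative amount, and that its read count should scale proportionally if binding and readout are quantitative. Reads per unit of input are normalized to their median, and the run passes only if every control falls within a fold tolerance. The function name and the two-fold default are assumptions.

```python
from statistics import median


def spike_in_check(observed, spiked, max_fold=2.0):
    """Check that spiked-in control aptamers are recovered in proportion to input.

    observed: {control sequence: read count}; spiked: {control sequence: amount added}.
    Returns (passed, reads-per-unit-input normalised to the median control).
    """
    ratios = {seq: observed.get(seq, 0) / amount for seq, amount in spiked.items()}
    ref = median(ratios.values())
    if ref == 0:
        return False, ratios
    normalised = {seq: r / ref for seq, r in ratios.items()}
    passed = all(1 / max_fold <= v <= max_fold for v in normalised.values())
    return passed, normalised


spiked = {"CTRL1": 10.0, "CTRL2": 2.0, "CTRL3": 1.0}      # known relative input
observed = {"CTRL1": 980, "CTRL2": 210, "CTRL3": 95}     # reads after selection
passed, normalised = spike_in_check(observed, spiked)
print(passed)  # True
```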

The method of the present invention therefore addresses one of the major problems of proteomics, namely how to deal with highly abundant proteins. It does so by rationally excluding the candidate sequences that bind these proteins, which are then ignored in subsequent rounds of measurement and selection. The mutation/crossover process can cause some candidate sequences that bind high-abundance proteins to reappear. These too can be rationally eliminated at each iteration of the process, as described above. Careful selection of low-abundance candidate sequences should allow convergence onto a population (or subset) of aptamers likely to bind low-abundance proteins specifically and sensitively. The selected sequences should have strong sequence characteristics and low variance in their measurements between replicates.

In a further embodiment, the abundance range of the candidate sequences at each iteration can also be monitored by counting them and comparing them with a control population. This can be used to further improve fitness-based selection. For example, during the first round of selection, the population of selected sequences is compared with the population present in the initial, unscreened library. Significant changes in sequence composition and in the statistical prevalence of particular sequences (perhaps consistent with some model) can indicate relative success. If the first few rounds of selection do not produce a candidate sequence pool whose composition differs significantly from that of the initial population, the pool and the experiment can be judged to have failed.
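One illustrative measure of a shift in composition is the total variation distance between the read-share distributions of the selected pool and the initial library, sketched below. The choice of this distance and the 0.1 threshold are assumptions. A formal significance test on the counts (for example a chi-square test) could be used instead.

```python
from collections import Counter


def frequencies(counts):
    """Convert a read-count table into read shares."""
    total = sum(counts.values())
    return {seq: n / total for seq, n in counts.items()}


def total_variation(selected, initial):
    """Total variation distance between two read-count tables: 0 if the two
    pools have identical composition, 1 if they share no sequences."""
    p, q = frequencies(selected), frequencies(initial)
    return 0.5 * sum(abs(p.get(s, 0.0) - q.get(s, 0.0)) for s in set(p) | set(q))


def round_shifted_composition(selected, initial, min_distance=0.1):
    """Hypothetical pass/fail rule for an early selection round."""
    return total_variation(selected, initial) >= min_distance


initial = Counter({"AAGT": 25, "GCTG": 25, "TTCA": 25, "CGAT": 25})
selected = Counter({"AAGT": 5, "GCTG": 70, "TTCA": 20, "CGAT": 5})
print(round(total_variation(selected, initial), 3))  # 0.45
```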

Note that once a candidate aptamer sequence, or a library of candidate aptamer sequences, has been identified by the method of the present invention, the sequences must be validated as aptamers. This can be done by replicating the binding and quantification of the candidate sequences against multiple copies of the same original sample, or across several different samples. If replication between samples is poor or variability is high, the candidate sequence is probably not specific for a single target. A suitable validation method is described in co-pending European Patent Application No. 07020049.8.

The invention will now be described further by way of non-limiting examples.

(Example 1)
In this example, for comparison with SELEX, a set of aptamers is sought against a single protein, present either in solution or immobilized.

A population of polynucleotide sequences, such as DNA or RNA, can be represented as a motif or pattern. For example, the sequences GGCT and GGCA can be represented by the single pattern GGC(A/T), where “/” means “or”. An extended set of IUPAC single-letter codes exists for representing sequence patterns, so that many sequences can be represented at once with their commonality emphasized (20). In that code, GGC(A/T) is written GGCW.
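The pattern notation can be made concrete with the standard IUPAC nucleotide codes. The sketch below collapses aligned, equal-length sequences into a single IUPAC pattern and checks whether a sequence fits a pattern. The code table is the standard one; the function names are illustrative.

```python
# Standard IUPAC nucleotide codes, keyed by the set of bases each code allows.
IUPAC = {
    frozenset("A"): "A", frozenset("C"): "C", frozenset("G"): "G", frozenset("T"): "T",
    frozenset("AG"): "R", frozenset("CT"): "Y", frozenset("CG"): "S", frozenset("AT"): "W",
    frozenset("GT"): "K", frozenset("AC"): "M",
    frozenset("CGT"): "B", frozenset("AGT"): "D", frozenset("ACT"): "H", frozenset("ACG"): "V",
    frozenset("ACGT"): "N",
}
ALLOWED = {code: bases for bases, code in IUPAC.items()}


def iupac_pattern(sequences):
    """Collapse aligned, equal-length uppercase DNA sequences into one IUPAC pattern."""
    if len({len(s) for s in sequences}) != 1:
        raise ValueError("sequences must be aligned to the same length")
    return "".join(IUPAC[frozenset(column)] for column in zip(*sequences))


def matches(pattern, sequence):
    """True if every base of `sequence` is allowed by the IUPAC `pattern`."""
    return len(pattern) == len(sequence) and all(
        base in ALLOWED[code] for code, base in zip(pattern, sequence))


print(iupac_pattern(["GGCT", "GGCA"]))  # GGCW
print(matches("GGCW", "GGCC"))          # False
```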

In this example, a DNA aptamer motif is selected and a semi-random library is synthesized using known techniques (11). This is called the “candidate library”. The diversity of the library can lie anywhere within the range that synthesis permits. Here, rational prior knowledge is used to favor DNA sequences that adopt particular secondary structures through self-similarity and annealing. A control library of purely random sequences of the same length is also constructed. This is called the “random library”.

Next, as in the conventional SELEX scheme, both libraries are screened against the single protein under selected conditions, and unbound aptamer sequences are discarded.

The surviving aptamer DNA sequences (i.e., those bound to the protein) are released from the protein. They are then arrayed on a next-generation DNA sequencer of suitable resolution and dynamic range. This procedure is replicated to improve the statistical significance of the resulting measurements.

All molecules on the array are sequenced and counted, so that the aptamers present in the surviving library are identified and counted. Following normal statistical sampling principles, the number of times each sequence appears on the array should reflect its proportion in the population from which it came. The accuracy of the measurement, and any statistical sampling problems (especially for rarer aptamers), can be checked by measuring the variance across replicates of each library.
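The counting and variance estimates described above could be sketched as follows, assuming that each replicate array yields a list of reads. A sequence missing from a replicate is given a share of 0 there, and this is what inflates the variance of rare sequences. The function name is hypothetical.

```python
from collections import Counter
from statistics import mean, pvariance


def replicate_table(replicate_reads):
    """Per-sequence mean and variance of read share across replicate arrays.

    replicate_reads: one non-empty list of sequencing reads per replicate.
    Returns {sequence: (mean share, population variance of share)}.
    """
    shares = []
    for reads in replicate_reads:
        counts = Counter(reads)
        shares.append({seq: n / len(reads) for seq, n in counts.items()})
    table = {}
    for seq in set().union(*shares):
        values = [s.get(seq, 0.0) for s in shares]  # absent in a replicate = share 0
        table[seq] = (mean(values), pvariance(values))
    return table


replicates = [
    ["GCTG", "GCTG", "ACGT", "TTTT"],
    ["GCTG", "GCTG", "GCTG", "ACGT"],
]
table = replicate_table(replicates)
for seq, (m, v) in sorted(table.items()):
    print(f"{seq}  mean share={m:.3f}  variance={v:.5f}")
```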

The aptamers selected in the first iteration are compared differentially against the initial, unselected library. This also allows the success and the efficiency/diversity of library synthesis to be assessed. Likewise, a differential analysis is performed between the purely random aptamer library and its unselected counterpart. Suppose the selected/exposed semi-random library does not differ significantly from the unexposed/random library. It can then be concluded that aptamer selection was not adequate and that the library is unpromising. At that point, trials with this library can be stopped, and another library with different sequence-level characteristics can be synthesized.

If, after the first iteration, the candidate library differs sufficiently from the unexposed candidate library and/or the random library, it is used and developed further.
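The differential comparison can be sketched as a per-sequence log2 enrichment of the selected library over its unselected counterpart. A pseudocount keeps sequences missing from one library finite. The go/no-go rule (at least `min_hits` sequences enriched two-fold or more) is a hypothetical stand-in for “significantly different”, not a criterion stated in the text.

```python
import math
from collections import Counter


def log2_enrichment(selected, unselected, pseudocount=1.0):
    """Per-sequence log2(frequency after selection / frequency before selection)."""
    keys = set(selected) | set(unselected)
    n_sel = sum(selected.values()) + pseudocount * len(keys)
    n_uns = sum(unselected.values()) + pseudocount * len(keys)
    return {
        seq: math.log2(((selected.get(seq, 0) + pseudocount) / n_sel)
                       / ((unselected.get(seq, 0) + pseudocount) / n_uns))
        for seq in keys
    }


def library_is_promising(enrichment, min_log2=1.0, min_hits=1):
    """Hypothetical rule: at least `min_hits` sequences enriched >= 2**min_log2-fold."""
    return sum(value >= min_log2 for value in enrichment.values()) >= min_hits


unselected = Counter({"GCTG": 10, "ACGT": 10, "TTTT": 10})
selected = Counter({"GCTG": 50, "ACGT": 10, "TTTT": 10})
enrichment = log2_enrichment(selected, unselected)
print(library_is_promising(enrichment))                               # True
print(library_is_promising(log2_enrichment(unselected, unselected)))  # False
```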

Whatever the size of the initial aptamer pool, a sample of 10⁷ to 10⁹ aptamers will be sequenced on the array. Future generations of sequencers will allow a higher dynamic range. Given the initial diversity of the candidate library and the sequencing and counting results (including controls) from the previous steps, the quality of the aptamer motif can be assessed after the first iteration. As described, this aptamer population can be advanced if it is judged promising; otherwise a new initial candidate motif library can be generated. For example, a library is favored for advancement if the distribution and variance of the surviving aptamers differ greatly from those of the initial library, as measured against the controls.

The first iteration provides data from the experimental conditions and from the controls.

Table 1 illustrates what can be calculated for each aptamer sequence: its abundance and variance, bioinformatic properties derived from the sequence such as secondary structure, and an overall measure of sequence fitness obtained through the “fitness function” described above. In use, the table would have thousands or tens of thousands of entries, one row per unique aptamer sequence. The columns hold the aggregated measurements for that sequence. The table shown is for illustrative purposes only.
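Because Table 1 is illustrative, the sketch below shows one possible shape for its rows. It assumes replicate counts for the current round and mean counts from the control are available. GC content stands in for the bioinformatic columns; a secondary-structure prediction would need a folding tool, which is not shown. `example_fitness` is an arbitrary illustrative scoring rule.

```python
from dataclasses import dataclass
from statistics import mean, pvariance


@dataclass
class Row:
    sequence: str
    mean_count: float
    variance: float
    control_mean: float
    gc_fraction: float
    fitness: float


def gc_fraction(seq):
    return (seq.count("G") + seq.count("C")) / len(seq)


def build_table(replicates, control, fitness):
    """replicates: {sequence: [read counts per replicate]};
    control: {sequence: mean count in the control library};
    fitness: callable(mean, variance, control_mean, gc) -> float.
    Returns rows sorted from fittest to least fit."""
    rows = []
    for seq, counts in replicates.items():
        m, v = mean(counts), pvariance(counts)
        c = control.get(seq, 0.0)
        gc = gc_fraction(seq)
        rows.append(Row(seq, m, v, c, gc, fitness(m, v, c, gc)))
    return sorted(rows, key=lambda row: row.fitness, reverse=True)


def example_fitness(m, v, c, gc):
    """Arbitrary example: enrichment over control, damped by replicate spread."""
    return (m + 1) / (c + 1) / (1 + v ** 0.5 / (m + 1))


table = build_table(
    {"GCTGAT": [120, 110, 130], "GCTAAT": [40, 38, 45], "ATATAT": [15, 2, 30]},
    control={"GCTGAT": 10, "GCTAAT": 10, "ATATAT": 10},
    fitness=example_fitness,
)
for row in table:
    print(f"{row.sequence}  mean={row.mean_count:.1f}  var={row.variance:.1f}  "
          f"ctrl={row.control_mean}  GC={row.gc_fraction:.2f}  fitness={row.fitness:.2f}")
```

With these made-up counts, the consistently enriched GCTGAT ranks first and the noisy ATATAT ranks last.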

Computing fitness from multiple scores is well described in the literature on genetic algorithms (14, 15). Purely by way of example, it can involve a Bayesian combination of multiple probabilities. Any suitably computable fitness measure can be used. It can be based either on the measured abundance of a sequence or on the sequence itself (and any properties that can be inferred from it). In the simplified example given in Table 1, the motif GCT is clearly favored, and GCTG in particular is heavily weighted because of its abundance relative to the control. In this way, each candidate aptamer sequence can be scored with the fitness measure and assessed for its potential as an aptamer. The resulting fitness score is used to decide whether a given candidate sequence survives to the next iteration. It is also used to decide whether the sequence may reproduce (crossover) or undergo a mutation process that changes its sequence composition. After these processes are applied, new aptamer sequences can be derived in addition to the originals.
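As an example of the Bayesian combination of probabilities mentioned above, the sketch below combines per-sequence evidence with the naive-Bayes formula, assuming independence and a flat prior. It then keeps the top-scoring fraction as survivors (truncation selection). The evidence values and function names are illustrative assumptions.

```python
import math


def combine_probabilities(probs):
    """Naive-Bayes combination of independent probabilities with a flat prior:
    prod(p) / (prod(p) + prod(1 - p)), evaluated via log-odds for stability.
    Each p must lie strictly between 0 and 1."""
    d = sum(math.log(p) - math.log1p(-p) for p in probs)  # combined log-odds
    if d >= 0:
        return 1.0 / (1.0 + math.exp(-d))
    e = math.exp(d)
    return e / (1.0 + e)


def survivors(scores, keep_fraction=0.5):
    """Truncation selection: keep the best-scoring fraction (at least one)."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:max(1, round(len(ranked) * keep_fraction))]


evidence = {  # illustrative per-sequence probabilities from independent criteria
    "GCTGAT": [0.9, 0.8, 0.7],
    "GCTAAT": [0.7, 0.6, 0.7],
    "ATATAT": [0.4, 0.3, 0.5],
}
scores = {seq: combine_probabilities(p) for seq, p in evidence.items()}
print(survivors(scores, keep_fraction=0.67))  # ['GCTGAT', 'GCTAAT']
```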

Low variance between replicates is another example of a statistic that can demonstrate the specificity of a given motif or sequence for a protein. Replicate experimental measurements can therefore be used to improve the derived statistical measures illustrated in Table 1. An important aspect of the present invention is that appropriate statistics can be derived through experimentation and learning.

On the basis of the information gathered at this stage, a new synthetic aptamer population can be generated from the measured statistics, bioinformatics, and rational analysis of the first-generation results. In this example, purely for illustration, the library is resynthesized on a DNA synthesizer rather than amplified conventionally as in SELEX. The resynthesis can include programmed crossover combinations of motifs taken from the scored fit sequences, as shown in Table 1. It can also include some random mutation at certain positions but not others. In this way, sequence diversity within the aptamer population is kept at an appropriately high level in the early iterations. As in conventional genetic processes, a new population derived from the fittest individuals of the previous population (Table 1) should itself be “fit” or “fitter”. As the fitness of successive generations converges, interventions to maintain population diversity are reduced. Rational design, together with mutation and crossover procedures, allows candidate aptamer diversity to be maintained in the early populations. This is intended to keep the final solution from becoming trapped in a premature “local minimum” or from hitting other search-termination conditions described for other established implementations of such algorithms (15).
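The resynthesis step can be sketched as generating a synthesis order from the fittest parents. It uses single-point crossover and mutation restricted to chosen positions, so that a fixed motif is never mutated. The parent sequences, the mutable positions, and the rate are hypothetical. A seeded random generator keeps the output reproducible.

```python
import random


def crossover(parent_a, parent_b, rng):
    """Single-point crossover of two equal-length parent sequences."""
    point = rng.randrange(1, len(parent_a))
    return parent_a[:point] + parent_b[point:]


def mutate(seq, mutable_positions, rate, rng):
    """With probability `rate`, substitute a different base at each allowed position."""
    bases = list(seq)
    for i in mutable_positions:
        if rng.random() < rate:
            bases[i] = rng.choice([b for b in "ACGT" if b != bases[i]])
    return "".join(bases)


def next_generation(parents, size, mutable_positions, rate, seed=0):
    """Hypothetical synthesis order of `size` children bred from the fittest parents."""
    rng = random.Random(seed)
    children = []
    for _ in range(size):
        a, b = rng.sample(parents, 2)
        children.append(mutate(crossover(a, b, rng), mutable_positions, rate, rng))
    return children


fittest = ["GCTGATCA", "GCTGTTGA", "GCTAATCC"]  # e.g. top rows of a fitness table
order = next_generation(fittest, size=5, mutable_positions=range(3, 8), rate=0.1)
print(order)
```

Positions 0-2 are excluded from mutation and all parents share GCT there, so every child keeps the GCT motif.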

The new (nth) candidate library is now exposed to the protein under study. The surviving bound aptamers are eluted and re-arrayed on the sequencer, and the general evaluation procedure described above is repeated.

The previous sequencing iteration (n-1) of the candidate library can serve as a rolling control as the new population is sequenced and counted. This produces a table for the current iteration (n) that replaces the values in Table 1. Particular sequences may rapidly become dominant. This shows up in their abundance and variance relative to the previous iteration (n-1), as well as across replicates of the current iteration. Equally, this trend may not appear. In that case, the number of “mutations” (random changes introduced into the aptamers during synthesis) can be increased to promote evolution and diversity in the population of sequences under study. Either way, the asymptotic convergence of the population toward a stable set of solutions (i.e., aptamers) can be tracked and, where necessary, optimized. The same principle also helps the population climb as quickly as possible toward a stable plateau of fitness and composition.

The candidate aptamer population is iterated until a stable population of candidate solutions (aptamers) is reached. In later iterations, once the dominance of particular sequence motifs is clear, it may be necessary or desirable to amplify the sequences between iterations rather than resynthesize them. Alternatively, synthesis rounds and amplification rounds can be alternated.
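Tracking convergence and adjusting mutation pressure might be sketched as below. The sketch assumes that the mean fitness of each round and the read share of the most abundant sequence are recorded. The window, tolerance, step factor, and rate bounds are illustrative assumptions, not values from the text.

```python
def has_plateaued(history, window=3, tolerance=0.01):
    """True if mean population fitness moved by less than `tolerance` (relative to
    the start of the window) over the last `window` rounds."""
    if len(history) < window + 1:
        return False
    recent = history[-(window + 1):]
    base = abs(recent[0]) or 1.0
    return (max(recent) - min(recent)) / base < tolerance


def adjust_mutation_rate(rate, top_share_now, top_share_before,
                         step=1.5, min_rate=0.001, max_rate=0.2):
    """Raise mutation pressure when no sequence is gaining dominance, else ease off."""
    if top_share_now <= top_share_before:
        return min(max_rate, rate * step)
    return max(min_rate, rate / step)


fitness_history = [0.20, 0.45, 0.62, 0.70, 0.702, 0.703, 0.703]
print(has_plateaued(fitness_history))      # True
print(has_plateaued(fitness_history[:4]))  # False
```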

The resulting sequences are strong aptamers for the protein under study, and multiple, similarly viable aptamers are expected to be found. The viable aptamers may or may not differ structurally from one another at the sequence level.

Once one or more suitable aptamers have been identified, they can be chemically modified, for example using nucleotide analogues, to increase stability and confer other desirable properties. The population can then be rescreened using the same or other procedures to confirm the specificity and other properties of each aptamer.

Thus, for a given protein, a library of sequences shown to act as aptamers for that protein can be constructed. Most importantly, because changes to sequence composition were permitted between iterations, both randomly and by rational design, a population of strong solutions that were not necessarily present in the initial candidate aptamer library can be found.
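The way new solutions can arise between iterations is easiest to see with the two classic genetic-algorithm operators, point mutation and recombination. The sketch below is a hypothetical illustration only: it uses simple single-point crossover between equal-length parents, which is a standard genetic-algorithm operator and not necessarily the recombination chemistry used in practice (compare the nonhomologous random recombination of reference 7); the function names and rates are assumptions.

```python
import random

BASES = "ACGT"


def mutate(seq, rate, rng):
    """Replace each base, with probability `rate`, by a different random base."""
    out = []
    for base in seq:
        if rng.random() < rate:
            out.append(rng.choice([b for b in BASES if b != base]))
        else:
            out.append(base)
    return "".join(out)


def recombine(a, b, rng):
    """Single-point crossover: prefix of parent a joined to suffix of parent b."""
    if len(a) != len(b):
        raise ValueError("parents must have equal length")
    cut = rng.randrange(1, len(a))
    return a[:cut] + b[cut:]


def next_generation(parents, size, rate, rng):
    """Breed `size` children from randomly paired parents, then mutate them."""
    children = []
    while len(children) < size:
        a, b = rng.sample(parents, 2)
        children.append(mutate(recombine(a, b, rng), rate, rng))
    return children


rng = random.Random(0)  # fixed seed for reproducibility
pool = next_generation(["ACGTACGTAC", "TTGGCCAATT", "GGGGCCCCAA"], 5, 0.05, rng)
```

Because children combine parts of different parents and carry fresh mutations, the pool can contain sequences that were never present in the starting library.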

(Example 2)
While the above example relates to generating an aptamer library for a single known protein, the process can also be applied to a mixture of known proteins in vitro. This is possible because the number of possible aptamers is very high relative to the number of proteins. In this case, motif diversity at the sequence (and, implicitly, structural) level is important. An initial population of diverse but functional sequences is therefore an important feature of the present invention.
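One simple way to put a number on sequence-level diversity is the Shannon entropy of the pooled k-mer distribution across the population. This is an assumed, illustrative metric, not one prescribed by the method; the helper name `kmer_entropy` and the default k = 3 are hypothetical.

```python
from collections import Counter
from math import log2


def kmer_entropy(sequences, k=3):
    """Shannon entropy (bits) of the pooled k-mer distribution of a population.

    0 means every k-mer is identical; the maximum possible value is 2*k bits
    for DNA/RNA, reached when all 4**k k-mers are equally frequent.
    """
    counts = Counter(
        seq[i:i + k] for seq in sequences for i in range(len(seq) - k + 1)
    )
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * log2(c / total) for c in counts.values())
```

A population whose entropy collapses early would signal a loss of the motif diversity that this example relies on.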

In this example, multiple known proteins are isolated for aptamer selection. The method proceeds as in Example 1, except that multiple candidate aptamer libraries are selected and managed in parallel. This is accomplished either by grouping the aptamer libraries within one candidate library, or by starting from a single, sufficiently diverse candidate library that encompasses aptamers for all proteins under study. The method follows that described in Example 1, except that, after the initial iteration, the aptamers are additionally grouped or clustered in the manner already described for genetic algorithms (16), generating a plurality of conceptual tables of the kind shown in Table 1. The grouping can be based on "clustering" or on other methods that use the measured features embodied in Table 1. In this case, intervention between selections serves to maintain conceptually distinct tables (each representing a subpopulation of aptamers) that abstractly represent the mixture of proteins under study. In its simplest embodiment, this is a form of multiplexing.
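As an illustration of how a pool might be split into conceptually distinct tables, the sketch below uses greedy leader clustering on sequence distance. The choice of leader clustering, the Hamming-style distance (with any length difference counted as mismatches), the `max_distance` cut-off and the function names are all assumptions; reference 16 describes other ways of maintaining subpopulations.

```python
def hamming(a, b):
    """Mismatched positions, plus any length difference counted as mismatches."""
    return sum(x != y for x, y in zip(a, b)) + abs(len(a) - len(b))


def leader_cluster(fitness, max_distance):
    """Greedy leader clustering of {sequence: fitness}.

    Sequences are visited from fittest to least fit; each joins the first cluster
    whose leader lies within `max_distance`, otherwise it founds a new cluster.
    Each cluster's member dict plays the role of one conceptual table.
    """
    clusters = []  # list of (leader_sequence, {member_sequence: fitness})
    for seq in sorted(fitness, key=fitness.get, reverse=True):
        for leader, members in clusters:
            if hamming(seq, leader) <= max_distance:
                members[seq] = fitness[seq]
                break
        else:
            clusters.append((seq, {seq: fitness[seq]}))
    return clusters
```

Selection can then be applied within each cluster separately, so that a subpopulation binding one protein is not swamped by a fitter subpopulation binding another.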

At the end of the selection process, aptamers specific to the proteins in the mixture under study will have been obtained. In this case, it may not be known which aptamer subset binds to which protein. This can be resolved later using other experimental techniques, or it may not be necessary. In the latter case, "anonymous" aptamers for known proteins can be generated and used, provided the aptamers and proteins are to be deployed and used together.

Alternatively, prior knowledge of the specificity of particular aptamer sequence motifs for particular protein structures can be used to suggest which aptamers match which proteins.

In both Examples 1 and 2, the specificity of the aptamers identified for a given protein towards the protein against which they were selected should be demonstrated as an additional step. This step can also be carried out using next-generation sequencing, by exposing the aptamers to a complex biological sample containing a known amount of the protein against which they were selected. The aptamer measurements should not be confounded by non-specific binding to other proteins present in the sample. This can be checked using the counting statistics generated by arraying and sequencing the exposed aptamers.
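One hypothetical way to use those counting statistics is a dose-response check: if samples are prepared with several known amounts of the target protein, a specific aptamer's counts should rise roughly linearly with the amount, with little signal at zero amount. The sketch below fits an ordinary least-squares line; the thresholds (r² of at least 0.9, background at most 10% of the top signal) and function names are illustrative assumptions, not values from the method.

```python
def dose_response(amounts, counts):
    """Ordinary least-squares fit of counts against known protein amounts.
    Returns (slope, intercept, r_squared)."""
    n = len(amounts)
    mx = sum(amounts) / n
    my = sum(counts) / n
    sxx = sum((x - mx) ** 2 for x in amounts)
    if sxx == 0:
        raise ValueError("need at least two distinct protein amounts")
    sxy = sum((x - mx) * (y - my) for x, y in zip(amounts, counts))
    syy = sum((y - my) ** 2 for y in counts)
    slope = sxy / sxx
    intercept = my - slope * mx
    r_squared = (sxy * sxy) / (sxx * syy) if syy else 0.0
    return slope, intercept, r_squared


def looks_specific(amounts, counts, min_r2=0.9, max_background_fraction=0.1):
    """Flag an aptamer as plausibly specific: positive, near-linear response and
    a low fitted signal (intercept) at zero target, i.e. little non-specific binding."""
    slope, intercept, r2 = dose_response(amounts, counts)
    top = slope * max(amounts) + intercept
    return slope > 0 and r2 >= min_r2 and intercept <= max_background_fraction * top
```

An aptamer whose counts stay high regardless of the spiked amount would fail this check, which is the signature of binding to something other than the intended protein.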

(Example 3)
Ideally, the specificity of each aptamer for its protein is built into the selection/generation process. This is not possible with SELEX schemes, or with schemes directed at a single protein or a small subset of proteins.

In this example, the method proceeds as described in Example 1. Here, however, the aptamer library is exposed to a native, complex biological sample. The experiment is performed in many replicates.

First, some aptamers will be seen to be very highly represented on the array. Some or all of these highly represented aptamers may also show a high degree of variance between replicates. In the former case, it can logically be concluded that the aptamers are binding highly abundant proteins. If their variance is high, it is reasonable to conclude that the aptamers lack specificity.

In the next round of synthesis, these highly abundant or highly variable aptamers are excluded. The method of the invention is then carried out as outlined in Example 1 until the sequencing array contains a diverse set of sequences with low relative representation (perhaps against some controls) and very low variance between replicates. Aptamers that bind highly abundant proteins in the sample and/or aptamers with low specificity have thereby been effectively excluded.
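The exclusion step can be sketched as a filter on replicate counts: aptamers with a very high mean count (likely binding abundant proteins) or a high coefficient of variation across replicates (likely non-specific) are dropped before the next round. The thresholds and the function name are hypothetical; in practice the mean threshold might instead be expressed relative to controls, as the text suggests.

```python
from statistics import mean, stdev


def filter_replicates(replicate_counts, max_mean, max_cv):
    """Split {aptamer: [count per replicate]} into (kept, excluded) lists.

    Excluded: mean count above `max_mean` (over-represented) or coefficient of
    variation (stdev / mean) above `max_cv` (inconsistent across replicates).
    Each aptamer needs at least two replicate counts for stdev().
    """
    kept, excluded = [], []
    for seq, counts in replicate_counts.items():
        m = mean(counts)
        cv = stdev(counts) / m if m > 0 else float("inf")
        if m > max_mean or cv > max_cv:
            excluded.append(seq)
        else:
            kept.append(seq)
    return kept, excluded


reps = {"abundant": [1000, 1010, 990], "noisy": [10, 60, 2], "good": [20, 22, 21]}
kept, excluded = filter_replicates(reps, max_mean=500, max_cv=0.3)
```

With these hypothetical counts, only "good" survives: "abundant" exceeds the mean threshold and "noisy" varies too much between replicates.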

The resulting aptamers can then be used, individually or in groups, to repeat the method set out in Example 2 with a mixture of known proteins. The identified aptamer sequences can provide "motifs" for generating the semi-random libraries required in the simpler examples. In this way, the library will have been pre-filtered for sensitivity and specificity (a common cause of failure in antibody-based methods).
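A motif-seeded semi-random library could be generated along the following lines. This sketch assumes the motif is embedded intact at a random offset within otherwise random sequence; the 40-mer length echoes claim 14, but the motif string "GGTTGG", the function name and the fixed seed are purely hypothetical.

```python
import random


def seeded_library(motif, length, size, rng):
    """Return `size` sequences of `length` bases, each containing `motif` at a
    random offset, with the flanking positions filled by random bases."""
    if len(motif) > length:
        raise ValueError("motif longer than sequence")
    library = []
    for _ in range(size):
        offset = rng.randrange(length - len(motif) + 1)
        left = "".join(rng.choice("ACGT") for _ in range(offset))
        right = "".join(rng.choice("ACGT") for _ in range(length - offset - len(motif)))
        library.append(left + motif + right)
    return library


library = seeded_library("GGTTGG", 40, 100, random.Random(1))
```

Varying the motif's position and flanks keeps the pre-filtered motif while still letting selection explore the surrounding sequence space.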

In a further step, the population as a whole can be taken forward, using the general scheme of Example 2, to identify multiple aptamers against unknown low-abundance proteins. In this case, the proteins bound by the aptamers are unknown, and their identification will be elucidated in a further step using other techniques. Nevertheless, it is conceptually possible to develop a set of "anonymous" aptamer probes for unknown low-abundance proteins.

This does not limit their usefulness, because they can then be deployed to measure these proteins across many samples. The aptamers now act as surrogates for these proteins, so aptamers (i.e. proteins) whose levels change in a statistically significant way between samples can be measured (i.e. biomarkers can be found). Once this has been achieved and the relevant aptamers are in hand, those aptamers can be extracted and used to identify the proteins for which they are surrogates. In this way, it is possible to find aptamers against a complex mixture of anonymous low-abundance proteins, deploy them in biomarker discovery, find the significant aptamers, and identify the proteins as the final step rather than the first, as in the current state of the art.
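To illustrate how surrogate aptamers might be screened for statistically significant change, the sketch below applies a two-sided permutation test on mean counts to each aptamer across two sample groups. The choice of a permutation test, the 10,000 permutations, the 0.05 cut-off and the function names are assumptions for illustration; a real study would also need to correct for testing many aptamers at once.

```python
import random


def _mean(xs):
    return sum(xs) / len(xs)


def permutation_p_value(group_a, group_b, n_permutations=10000, seed=0):
    """Two-sided permutation test on the difference in mean counts.

    The group labels are shuffled repeatedly; the p-value is the fraction of
    shuffles whose absolute mean difference is at least the observed one
    (with +1 smoothing so the estimate is never exactly zero).
    """
    rng = random.Random(seed)
    observed = abs(_mean(group_a) - _mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        if abs(_mean(pooled[:n_a]) - _mean(pooled[n_a:])) >= observed:
            extreme += 1
    return (extreme + 1) / (n_permutations + 1)


def candidate_biomarkers(counts_a, counts_b, alpha=0.05):
    """Aptamers (keys present in both dicts) whose counts differ between groups."""
    return sorted(
        seq for seq in counts_a
        if permutation_p_value(counts_a[seq], counts_b[seq]) < alpha
    )
```

Only the aptamers flagged here would need to be carried forward to protein identification, which is what allows identification to become the final step.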

(References)
1. Kossoy E, Lavid N, Soreni-Harari M, Shoham Y, Keinan E. “A programmable biomolecular computing machine with bacterial phenotype output.” Chembiochem. 2007 Jul 23;8(11):1255-60.
2. Benenson Y, Adar R, Paz-Elizur T, Livneh Z, Shapiro E. “DNA molecule provides a computing machine with both data and fuel.” Proc Natl Acad Sci USA. 2003 Mar 4;100(5):2191-6.
3. Lin CH, Cheng HP, Yang CB, Yang CN. “Solving satisfiability problems using a novel microarray-based DNA computer.” Biosystems. 2007 Jul-Aug;90(1):242-52.
4. Schmidt KA, Henkel CV, Rozenberg G, Spaink HP. “DNA computing using single-molecule hybridization detection.” Nucleic Acids Res. 2004 Sep 23;32(17):4962-8.
5. Chang WL, Guo M, Ho MS. “Fast parallel molecular algorithms for DNA-based computation: factoring integers.” IEEE Trans Nanobioscience. 2005 Jun;4(2):149-63.
6. Collett JR, Cho EJ, Lee JF, Levy M, Hood AJ, Wan C, Ellington AD. “Functional RNA microarrays for high-throughput screening of antiprotein aptamers.” Anal Biochem. 2005 Mar 1;338(1):113-23.
7. Bittker JA, Le BV, Liu DR. “Nucleic acid evolution and minimization by nonhomologous random recombination.” Nat Biotechnol. 2002 Oct;20(10):1024-9.
8. Bunka DH, Stockley PG. “Aptamers come of age - at last.” Nat Rev Microbiol. 2006 Aug;4(8):588-96.
9. Mairal T, Cengiz Ozalp V, Lozano Sanchez P, Mir M, Katakis I, O'Sullivan CK. “Aptamers: molecular tools for analytical applications.” Anal Bioanal Chem. 2007 Jun 21.
10. Blank M, Blind M. “Aptamers as tools for target validation.” Curr Opin Chem Biol. 2005 Aug;9(4):336-42.
11. Gopinath SC. “Methods developed for SELEX.” Anal Bioanal Chem. 2007 Jan;387(1):171-82.
12. Tombelli S, Minunni M, Mascini M. “Aptamers-based assays for diagnostics, environmental and food analysis.” Biomol Eng. 2007 Jun;24(2):191-200.
13. Stoltenburg R, Reinemann C, Strehlitz B. “SELEX - A (r)evolutionary method to generate high-affinity nucleic acid ligands.” Biomol Eng. 2007 Oct;24(4):381-403.
14. Goldberg DE. “Genetic Algorithms in Search, Optimization, and Machine Learning.” Addison-Wesley, USA, 1989.
15. Michalewicz Z. “Genetic Algorithms + Data Structures = Evolution Programs.” 3rd ed. Springer, 1996.
16. Muhlenbein H, Gorges-Schleuter M, Kramer O. “Evolution Algorithms in Combinatorial Optimization.” Parallel Computing 7 (1988), 65-88.
17. Chen CK. “Complex SELEX against target mixture: stochastic computer model, simulation, and analysis.” Comput Methods Programs Biomed. 2007 Sep;87(3):189-200.
18. Lu et al. “Elucidation of the Small RNA Component of the Transcriptome.” Science. 2005 Sep 2:1567-1569.
19. Qian WJ et al. Molecular and Cellular Proteomics (2006), 5(10), 1727-1744.
20. http://www.chem.qmul.ac.uk/iubmb/misc/naseq.html
21. Branton et al. “The potential and challenges of nanopore sequencing.” Nature Biotechnology (2008) 26, 1146-1153.

Claims

1. A method for the identification of one or more aptamers against at least one target molecule, comprising:
a) contacting a pool of candidate polymer sequences with the at least one target molecule;
b) separating unbound sequences from sequences specifically bound to the target molecule;
c) dissociating the sequence-target complexes to obtain a ligand-enriched mixture of sequences;
d) assigning to each sequence obtained in step c) a measure of its aptameric potential (fitness function);
e) using the measures of step d) to determine the aptameric potential of the ligand-enriched mixture;
f) using the information obtained in step e) to allow the evolution of some or all of the sequences obtained in step c) to generate a new mixture of sequences; and
g) repeating steps a) to f) with the newly generated candidate aptamer pool until the aggregate aptameric potential of the candidate pool reaches a plateau,
wherein the sequences present in the final pool are optimal aptamers to the at least one target molecule.

2. The method of claim 1, wherein the at least one target molecule is a protein.

3. The method of claim 2, wherein the at least one target molecule is a single isolated protein.

4. A method according to claim 1, 2 or 3, wherein the identity of the or each target molecule is known.

5. A method according to claim 2 or 4, wherein a plurality of proteins is interrogated in a mixture of proteins.

6. The method of claim 5, wherein the at least one protein is a known protein present in a mixture of proteins.

7. The method according to claim 5 or 6, wherein the protein mixture is derived from a biological sample.

8. The method of claim 7, wherein the biological sample is a body fluid.

9. The method of claim 8, wherein the body fluid is blood or derived from blood.

10. The method according to claim 9, wherein the body fluid is serum or plasma.

11. The method according to claim 1, wherein the polymer sequences are polynucleotides.

12. The method of claim 11, wherein the polynucleotide sequence is DNA, RNA, PNA (peptide nucleic acid), or variants or combinations thereof.

13. A method according to any one of claims 1 to 12, wherein the polymer is between a 30-mer and a 60-mer.

14. The method of claim 13, wherein the polymer is a 40-mer.

15. A method according to any one of the preceding claims, wherein aptameric potential is measured by quantifying the candidate sequences in the ligand-enriched mixture.

16. The method of claim 15, wherein quantification is performed by sequencing at least a portion of each candidate sequence.

17. The method of claim 16, wherein sequencing is performed on a single-molecule array or a clonal single-molecule array.

18. A method according to any one of claims 1 to 17, further comprising arraying the sequences of the ligand-enriched mixture onto a surface prior to step d).

19. The method of claim 18, further comprising amplifying the arrayed sequences.

20. The method of any one of claims 1 to 19, wherein the measure of aptameric potential further comprises one or more measured properties, calculated properties, or bioinformatic properties.

21. The method of claim 20, wherein the bioinformatic properties include secondary structure prediction, tertiary structure prediction, self-similarity, information complexity, similarity to known aptamer sequences, sequence motifs, or combinations thereof.

22. The method according to any one of claims 1 to 21, wherein ligand-enriched sequences having statistically significant aptameric potential, compared with the population of candidate sequences under study, are advanced from steps d) and e) to step f).

23. The method according to claim …, wherein ligand-enriched sequences whose aptameric potential ranks in the average or upper percentile range are advanced from steps d) and e) to step f).

24. A method according to any one of claims 1 to 23, wherein ligand-enriched sequences with statistically insignificant aptameric potential are removed from the candidate pool.

25. A method according to any one of claims 1 to 24, wherein unbound candidate sequences are eluted from the candidate pool and discarded.

26. The method of any one of claims 1-25, further comprising removing a numerically dominant sequence from the candidate pool.

27. The method of any one of claims 1-26, further comprising obtaining the complete sequence of candidate aptamers present in the final pool.

28. A method according to any one of claims 1 to 27, wherein a new candidate aptamer pool is designed using sequences with high aptameric potential or motifs derived from such sequences.

29. The method according to any one of claims …, wherein sequences and/or motifs having high aptameric potential are used to effect random changes and/or recombination of the sequences showing high aptameric potential.

30. The method of any one of claims 1-29, further comprising modifying the candidate sequence to increase stability and / or binding ability.