JP2024518336A

JP2024518336A - Protein-protein interaction detection

Info

Publication number: JP2024518336A
Application number: JP2023566455A
Authority: JP
Inventors: ユ・チウ; ヤンフェン・チョウ; ユリー・アイオゾ
Original assignee: Sanofi SA
Current assignee: Sanofi SA
Priority date: 2021-04-28
Filing date: 2022-04-26
Publication date: 2024-05-01
Also published as: WO2022232145A1; EP4330967A1

Abstract

A composite image is accessed that includes a plurality of complex images of a sample of a protein-protein complex comprising a first protein and a second protein. The composite image is masked to generate a masked portion and an unmasked portion. A first three-dimensional (3D) shape of the first protein and a second 3D shape of the second protein are accessed. A plurality of docking models, each of which defines a candidate pose pair, are accessed. For each docking model, the first 3D shape, the second 3D shape, and the candidate pose pair are applied to generate a corresponding fit score that describes a goodness of fit between the pose pair and the docking model. Based on the fit scores, one of the docking models is selected as a detected model for the protein-protein complex. [Selected Figure] FIG. 1

Description

This document describes techniques for characterizing protein-protein binding using sensor data.

Protein-protein interactions (PPIs) are highly specific physical contacts established between two or more protein molecules as a result of interactions that include electrostatic forces, hydrogen bonding, and the hydrophobic effect. Many are physical contacts with molecular associations between chains that occur within a cell or living organism in a specific biomolecular context. Proteins rarely act alone, because their functions tend to be regulated. Many molecular processes within a cell are carried out by molecular machines built from numerous protein components organized by their PPIs.

In immunology, an antigen (Ag) is a molecule or molecular structure, such as may be present on the outside of a pathogen, that can be bound by an antigen-specific antibody or a B-cell antigen receptor. The presence of an antigen in the body usually triggers an immune response.

Techniques for characterizing protein-protein interactions are described herein. For example, when developing molecules for clinical or biological use (e.g., drug development), the techniques can help in understanding how an antibody interacts with an antigen. Cryo-EM imaging of a protein complex can be performed, and the data from the imaging can be processed using a computer system to select a docking model that describes the relative location, orientation, and binding of the two proteins.

A system of one or more computers can be configured to perform particular operations or actions by virtue of having software, firmware, hardware, or a combination of them installed on the system that in operation causes the system to perform the actions. One or more computer programs can be configured to perform particular operations or actions by virtue of including instructions that, when executed by a data processing apparatus, cause the apparatus to perform the actions. One general aspect includes a method for detecting protein-protein complex interactions. The method can include accessing a composite image that includes a plurality of complex images of a sample of a protein-protein complex comprising a first protein and a second protein. The method also includes accessing a first three-dimensional (3D) shape of the first protein and a second 3D shape of the second protein. The method also includes accessing a plurality of docking models, each of which defines a candidate pose pair. The method also includes applying, for each docking model, the first 3D shape, the second 3D shape, and the candidate pose pair to generate a corresponding fit score that describes a goodness of fit between the pose pair and the docking model. The method also includes selecting, based on the fit scores, one of the docking models as a detected model for the protein-protein complex. Other embodiments of this aspect include corresponding computer systems, apparatus, and computer programs recorded on one or more computer storage devices, each configured to perform the actions of the method.

Implementations can include one or more of the following features. The method can include generating the plurality of complex images. Generating the composite image can include extracting sub-images of the protein-protein complex from the complex images, and orienting and classifying the sub-images. The complex images can be cryo-electron microscopy (cryo-EM) images. Each complex image can include a plurality of pixels, each having an address and holding a color value that represents a corresponding portion of the sample of the protein-protein complex. The composite image can include a plurality of pixels, each having an address and holding a color value that is an aggregate of the color values of the pixels having the same address in each of the plurality of complex images. Masking of the composite image can be performed without specific user input. Masking of the composite image can include receiving a first user input that specifies the unmasked portion. Receiving the first user input that specifies the unmasked portion can include: generating a bounding box by connecting locations specified by the first user input; and recording the portion of the composite image within the bounding box as the unmasked portion. The first 3D shape can be indexed as the first protein and the second 3D shape can be indexed as the second protein. The first 3D shape can be indexed as a first homolog of the first protein. The second 3D shape can be indexed as a second homolog of the second protein. The candidate pose pair can include a candidate location, a candidate orientation, and a candidate docking area. The fit score can be a cross-correlation score between the composite image and an image generated by projecting the docking model onto a 2D space. Selecting one of the docking models as the detected model for the protein-protein complex based on the fit scores can include identifying a subset of the docking models based on their corresponding fit scores by one of the group consisting of: i) selecting the docking models having the top n fit scores; and ii) selecting all docking models having a fit score above a threshold m. Selecting one of the docking models as the detected model for the protein-protein complex can include receiving a second user input that selects one of the subset of docking models as the detected model. Implementations of the described techniques can include hardware, a method or process, or computer software on a computer-accessible medium.
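By way of a non-limiting illustration, the two subset-selection alternatives i) and ii) described above might be sketched in Python as follows. The function name `select_candidate_models`, the model identifiers, and the score values are hypothetical and are used only for illustration; the source does not specify any particular implementation.

```python
from typing import Dict, List, Optional


def select_candidate_models(
    fit_scores: Dict[str, float],
    top_n: Optional[int] = None,
    threshold_m: Optional[float] = None,
) -> List[str]:
    """Return docking-model identifiers ordered from best to worst fit score.

    Exactly one of top_n or threshold_m must be given, mirroring the two
    alternatives: i) top n scores, or ii) all scores above a threshold m.
    """
    if (top_n is None) == (threshold_m is None):
        raise ValueError("provide exactly one of top_n or threshold_m")
    # Rank identifiers by descending fit score.
    ranked = sorted(fit_scores, key=fit_scores.get, reverse=True)
    if top_n is not None:
        return ranked[:top_n]
    return [name for name in ranked if fit_scores[name] > threshold_m]


# Hypothetical fit scores for four docking models (illustrative values only).
example_scores = {"model_a": 0.81, "model_b": 0.42, "model_c": 0.77, "model_d": 0.15}
top_two = select_candidate_models(example_scores, top_n=2)
above_half = select_candidate_models(example_scores, threshold_m=0.5)
```

The identified subset could then be presented to a user, whose second user input selects the detected model.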

Implementations can include any, all, or none of the following advantages. The technology can advantageously use cryo-EM imaging results that are easier to obtain. For example, other techniques may require cryo-EM images of protein complex particles from a wide variety of angles. Unlike those techniques, this document describes techniques that can work with particle images from fewer angles. This is particularly advantageous because some protein complexes, by their nature, tend to adopt particular spatial orientations during sample preparation for cryo-EM. This is sometimes referred to as the preferred orientation problem. Other techniques may require extensive experimentation to find a way to overcome this problem for a particular protein complex, and may still fail. The present technology, by contrast, can avoid the problem altogether. This can result in a process that can be completed in a matter of hours or days, where some other processes take weeks or months to complete. In extreme cases, this technology may be the only recourse because, unlike other techniques, it can work whenever protein particle images are obtained during cryo-EM imaging. In addition, the technology can advantageously be configured to incorporate the domain expertise of a human user in a process that is very rapid and requires little of the user's time. However, the technology can also advantageously be configured to proceed without requiring any specific human input at various stages, so that less human time and attention is needed to complete the process.

Other features, aspects, and potential advantages will be apparent from the accompanying description and figures.

FIG. 1 shows an example system that can detect protein-protein interactions within a protein complex.

FIG. 2 shows example data that can be used in detecting protein-protein complexes.

FIG. 3 shows an example process for detecting protein-protein complexes.

FIG. 4 shows an example process for creating a composite image.

FIG. 5 shows an example process for masking a composite image.

FIG. 6 shows an example process for selecting a detected model from a group of candidate models.

FIG. 7 shows a schematic diagram of examples of a computing device and a mobile computing device.

Like reference symbols in the various drawings indicate like elements.

Protein-protein interactions can be visualized from, for example, cryo-EM images. A group of cryo-EM image fragments is aggregated into a single composite image, which is then masked to isolate the area where the two proteins bind. The composite image is then submitted to a group of docking models, each of which is scored to identify how well the model describes the docking shown in the composite image. The masking, docking, and scoring can be applied several times (at least as many times as the number of proteins forming the PPI) to different parts of the protein complex. Based on this scoring, the best model is identified as the result.

For example, protein-protein interactions can be modeled using cryo-EM images or other types of images without the need for single-particle three-dimensional reconstruction. Particles of the protein complex are extracted from a group of cryo-EM images, followed by alignment, classification, and averaging. The averaged image of the protein complex is then masked to isolate the area of an individual protein or of a part of the protein complex. The masked image is then searched against a series of two-dimensional images projected successively from the 3D structures or models of the individual protein components, such as an antigen and an antibody fragment antigen-binding (Fab) region. Cross-correlation is performed between the masked (or unmasked) image and the 2D projection images to identify the orientation of the antigen and/or the Fab in the averaged 2D cryo-EM image. Protein-protein docking (antigen-Fab docking in this example) is then performed using the interface identified by the cross-correlation as a constraint on the binding site. For Fab-antigen docking, dozens of similar models of the Fab with diverse conformations of the CDRs (a structural ensemble) are generated. Each output pose of the docking is converted into a series of 2D images by projection. Next, the original averaged 2D cryo-EM image, and/or a masked-out area of the complex, is used as a search template and cross-correlated against the 2D images from the docking results. The docking result that gives the highest cross-correlation score against the masked complex is marked as the best model of the protein-protein complex. For a complex with two components, such as Fab-antigen, these steps complete the work. If there are further components to be identified, the masking and cross-correlation are performed iteratively.
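As a simplified, non-limiting sketch of the projection and cross-correlation steps described above, the following Python code projects 3D point sets onto a 2D grid and scores each against a template over the unmasked pixels. Several assumptions are made that are not in the source: the projection is a single orthographic view along the z axis, the correlation is a zero-shift normalized cross-correlation, and the names `project_to_2d` and `masked_cross_correlation` as well as the toy coordinates are hypothetical. A practical search would also scan over shifts, in-plane rotations, and many projection directions.

```python
import math
from typing import List, Sequence, Tuple

Image = List[List[float]]


def project_to_2d(points: Sequence[Tuple[float, float, float]], size: int) -> Image:
    """Orthographic projection along z: accumulate points into a size x size grid."""
    image = [[0.0] * size for _ in range(size)]
    for x, y, _z in points:
        col, row = int(round(x)), int(round(y))
        if 0 <= row < size and 0 <= col < size:
            image[row][col] += 1.0
    return image


def masked_cross_correlation(a: Image, b: Image, keep: List[List[bool]]) -> float:
    """Normalized cross-correlation over the pixels where keep is True."""
    pa = [a[r][c] for r in range(len(a)) for c in range(len(a[0])) if keep[r][c]]
    pb = [b[r][c] for r in range(len(b)) for c in range(len(b[0])) if keep[r][c]]
    if not pa:
        return 0.0  # everything is masked out
    mean_a, mean_b = sum(pa) / len(pa), sum(pb) / len(pb)
    num = sum((x - mean_a) * (y - mean_b) for x, y in zip(pa, pb))
    den = math.sqrt(
        sum((x - mean_a) ** 2 for x in pa) * sum((y - mean_b) ** 2 for y in pb)
    )
    return num / den if den else 0.0


# Toy example: two hypothetical docking poses given as 3D point sets.
pose_a = [(1.0, 1.0, 0.0), (2.0, 1.0, 5.0), (2.0, 2.0, 1.0)]
pose_b = [(0.0, 3.0, 0.0), (3.0, 0.0, 0.0), (3.0, 3.0, 0.0)]
template = project_to_2d(pose_a, size=4)   # stands in for the averaged cryo-EM image
keep_all = [[True] * 4 for _ in range(4)]  # nothing masked out in this toy case
scores = {
    "pose_a": masked_cross_correlation(template, project_to_2d(pose_a, 4), keep_all),
    "pose_b": masked_cross_correlation(template, project_to_2d(pose_b, 4), keep_all),
}
best_pose = max(scores, key=scores.get)
```

In this toy case the template is built from `pose_a` itself, so `pose_a` receives the highest score; with real data the template would be the averaged 2D cryo-EM image.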

FIG. 1 shows an example system 100 that can detect protein-protein complex interactions. In the system 100, an imager 102 images a sample of a protein-protein complex 104. This can allow the protein-protein complex 104 to be sensed, so that the imager 102 serves as a sensor that detects the docking features of the protein complex 104.

The imager 102 is an imager that is capable of sensing physical phenomena of the complex 104 and of generating data (e.g., digital information) that reflects those physical phenomena. For example, the imager 102 can be a cryo-electron microscope that measures the complex 104 by passing a beam of accelerated electrons through the complex 104 to a sensor. Perturbations in the beam are recorded and measured to capture information about shape, cross-sectional density, and so on. To aid in this sensing, the complex 104 can be held in an ultra-thin layer of ice at low (e.g., cryogenic) temperatures. As will be appreciated, the complex 104 can include additional proteins and other components.

The imager 102 can generate complex images 106. For example, the imager 102 can generate one set of complex images 106 for each complex 104. In some cases, some of the complex images 106 can be excluded, for example because they capture a complex 104 at an angle different from that of the other complexes 104. As will be appreciated, some cryo-EM processes involve complexes that are biased toward a particular orientation on the grid during sensing, so that many, though possibly not all, of the complexes 104 have the same or a similar orientation. In addition, some of the complexes 104 may not be captured in a corresponding image 106. For example, the imaging process may fail to capture an entire complex 104 (e.g., because the bias of the complex toward a particular orientation prevents complete imaging of the complex from a variety of orientations).

The complex images 106 can be aggregated into a composite image 108. The composite image 108 can represent the aggregate of the complex images 106 in a format that allows a single data object to be influenced by each complex image, and thus by each sensed complex 104.

The process of aggregating the complex images 106 into the composite image can include operations that extract, reorient, and classify elements of the complex images 106 so that the captured images of the complexes 104 can be combined. For example, each of the complex images 106 can be examined to identify portions (e.g., collections of pixels) that show background values and portions that show a complex 104. The portions that show a complex 104 can be extracted into new data files and analyzed by computer vision to identify features such as unique clusters of values, the longest axis of the complex 104, and so on. These extracted data can then be rotated to align the features, and thus the extracted data as a whole. For example, the data can be rotated by an angle such that the clusters of features have minimal error or difference relative to a template image, or such that the longest axis matches a particular angle (e.g., 0 or 90 degrees). The extracted data for each complex 104 can then be combined, with the expectation that this combines images of the complexes 104 that all have the same orientation and location in the working file.
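One possible way to estimate the longest axis mentioned above is sketched below, offered only as an illustration under stated assumptions. The background test is a simple threshold, the axis is taken from unweighted second-order central moments of the foreground pixel addresses, and the names `foreground_pixels` and `longest_axis_angle` are hypothetical. The actual rotation and resampling of the extracted data is omitted.

```python
import math
from typing import List, Tuple


def foreground_pixels(image: List[List[float]], background_max: float) -> List[Tuple[int, int]]:
    """Return (row, col) addresses whose value exceeds the assumed background level."""
    return [
        (r, c)
        for r, row in enumerate(image)
        for c, value in enumerate(row)
        if value > background_max
    ]


def longest_axis_angle(pixels: List[Tuple[int, int]]) -> float:
    """Angle in degrees of the principal axis, measured from the column axis
    toward the row axis, computed from second-order central moments."""
    n = len(pixels)
    mean_r = sum(r for r, _ in pixels) / n
    mean_c = sum(c for _, c in pixels) / n
    mu_cc = sum((c - mean_c) ** 2 for _, c in pixels) / n
    mu_rr = sum((r - mean_r) ** 2 for r, _ in pixels) / n
    mu_rc = sum((r - mean_r) * (c - mean_c) for r, c in pixels) / n
    return math.degrees(0.5 * math.atan2(2.0 * mu_rc, mu_cc - mu_rr))


# Toy extracted sub-image whose bright pixels lie along the diagonal.
sub_image = [
    [0.9, 0.0, 0.0, 0.0],
    [0.0, 0.8, 0.0, 0.0],
    [0.0, 0.0, 0.9, 0.0],
    [0.0, 0.0, 0.0, 0.7],
]
angle = longest_axis_angle(foreground_pixels(sub_image, background_max=0.1))
rotation_to_apply = -angle  # rotation that would bring the longest axis to 0 degrees
```

Matching clusters of features against a template image, the other alignment option described above, would instead search for the rotation that minimizes the difference from the template.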

Each docking model in a group of docking models 110 can describe one possible model of the protein-protein interaction. To characterize the protein-protein interaction of the complex 104, each docking model can be provided with i) the composite image 108, ii) an image mask 112 that masks out areas of the composite image 108 that are not expected to contain the binding site, iii) a 3D shape 114 that describes, in data, one of the proteins in the complex 104, and iv) a 3D shape 116 that describes, in data, another of the proteins in the complex 104. If three or more proteins are involved in the PPI, additional masks and 3D shapes can be used. As will be appreciated, there may be cases in which no mask is used, and cases in which the operations are performed iteratively using a sequence of masks.

For each docking model, a score 118 is generated that describes a measure of fit between the docking model and the data provided to it. In other words, the score 118 records how similar or different the model 110 is from the data 108 and 112-116.

The highest scores 118 can be examined to identify a selected docking model 120. For example, the two highest scores can be identified and the corresponding docking models rendered in a user interface of a computer 122. One of these docking models can then be selected, for example, by user input and/or by a computer vision process.

In this example, the interaction of two proteins is described. However, it will be understood that interactions among three or more proteins are possible. For example, the processes described here can be repeated for each pair of proteins, or for each pair of proteins that contact each other. These repeated processes can be performed, for example, in series or in parallel.

FIG. 2 shows example data 106-114 that can be used in detecting protein-protein complexes. For example, the data 106-114 can include binary digital information that is stored in computer memory, transmitted between computing devices over a data network, and so on. The data can be stored on disk in a binary format and rendered on a display screen with colors and shapes defined by the binary data.

The complex images 106 can include cryo-electron microscopy (cryo-EM) images. Each image can include a bitmap of pixels, that is, cells arranged in a regular two-dimensional grid and addressed by [x][y] so that each cell is uniquely identified. Each cell can include one or more values expressed in, for example, an intensity format that may use values between 0 and 1, a red-green-blue (RGB) format, a six-digit hex format, and so on. The value of each pixel represents a corresponding portion of the sample of the protein-protein complex. For example, a sensor map of the imager 102 can receive electrons that pass through a portion of the protein-protein complex 104, convert the detection into a numerical value, and store that value in the similarly addressed pixel of the complex image 106. As will be appreciated, the complex images 106 are so named because they record information about the complexes 104.

The composite image 108 can include an aggregate (e.g., a class average) of a group of complex images 106, for example, a variety of different complex images 106 of different instances of the same type of protein-protein complex 104 sensed by the imager 102. The composite image can include a bitmap of pixels, that is, cells arranged in a regular two-dimensional grid and addressed by [x][y] so that each cell is uniquely identified. Each cell can include one or more values that represent a color in, for example, a red-green-blue (RGB) format, a six-digit hex format, and so on. The color value of each pixel is an aggregate of the color values of the pixels having the same address in each of the multiple complex images. For example, for pixel [133][217] of the composite image 108, the color values of pixel [133][217] in each of the group of sub-images extracted from the complex images 106 can be aggregated. This aggregation can be a simple average, a sum, or another aggregation measure suited to the data format of the pixel values and to other technical factors.
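The per-address aggregation described above can be illustrated with a minimal Python sketch that uses a simple average, one of the aggregation options mentioned. It assumes the sub-images have already been aligned, share the same dimensions, and hold single intensity values; the name `aggregate_sub_images` and the pixel values are hypothetical.

```python
from typing import List

Image = List[List[float]]


def aggregate_sub_images(sub_images: List[Image]) -> Image:
    """Pixel-wise mean across aligned sub-images that share the same dimensions."""
    if not sub_images:
        raise ValueError("at least one sub-image is required")
    rows, cols = len(sub_images[0]), len(sub_images[0][0])
    for image in sub_images:
        if len(image) != rows or any(len(row) != cols for row in image):
            raise ValueError("all sub-images must have the same dimensions")
    count = len(sub_images)
    # Each output pixel aggregates the pixels that have the same address.
    return [
        [sum(image[r][c] for image in sub_images) / count for c in range(cols)]
        for r in range(rows)
    ]


# Two toy 2x2 sub-images with illustrative intensity values.
aligned = [
    [[0.0, 0.25], [0.5, 1.0]],
    [[0.0, 0.75], [1.0, 1.0]],
]
composite = aggregate_sub_images(aligned)
```

Replacing the division by `count` with a plain sum would give the summation alternative.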

The image mask 112 can include information that specifies the masked and unmasked portions of another image, such as the composite image 108. The image mask can include a bitmap of pixels, that is, cells arranged in a regular two-dimensional grid and addressed by [x][y] so that each cell is uniquely identified. Each cell can include one or more values that represent a color in, for example, a red-green-blue (RGB) format, a six-digit hex format, and so on. The color value of each pixel is a color value reserved for a masked status, an unmasked status, and so on. For example, black and white can be used. Image 112' shows the image mask 112 superimposed on the composite image 108. Here, the masked sections are rendered in black and the unmasked sections are rendered using the pixel values of the unmasked pixels of the composite image 108. In some configurations, the image mask 112 can include or use a bounding box that describes the edges of the masked or unmasked sections. For example, a process (e.g., a user input selection or an automated script) can identify a group of vertices and create edges between the vertices to form a polygon that serves as the bounding box.
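As a non-limiting sketch of how a polygon formed from a group of vertices could be turned into an image mask and applied, the Python code below uses an even-odd ray-casting test. The choice of ray casting, the boolean mask representation (True for unmasked), and the names `point_in_polygon`, `build_mask`, and `apply_mask` are assumptions made for illustration and are not taken from the source.

```python
from typing import List, Tuple

Image = List[List[float]]
Vertex = Tuple[float, float]  # (x, y), i.e. (column, row)


def point_in_polygon(x: float, y: float, vertices: List[Vertex]) -> bool:
    """Even-odd ray-casting test for a point against a closed polygon."""
    inside = False
    n = len(vertices)
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]  # edge connecting consecutive vertices
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def build_mask(rows: int, cols: int, vertices: List[Vertex]) -> List[List[bool]]:
    """True marks an unmasked pixel (inside the polygon); False marks a masked pixel."""
    return [[point_in_polygon(c, r, vertices) for c in range(cols)] for r in range(rows)]


def apply_mask(image: Image, mask: List[List[bool]], masked_value: float = 0.0) -> Image:
    """Render masked pixels as black (masked_value) and keep unmasked pixel values."""
    return [
        [value if keep else masked_value for value, keep in zip(image_row, mask_row)]
        for image_row, mask_row in zip(image, mask)
    ]


# Toy 4x4 composite image and four user-specified locations (illustrative values).
toy_composite = [[1.0] * 4 for _ in range(4)]
clicked = [(0.5, 0.5), (2.5, 0.5), (2.5, 2.5), (0.5, 2.5)]
mask = build_mask(4, 4, clicked)
masked_image = apply_mask(toy_composite, mask)
```

Here only the four central pixels fall inside the polygon and keep their values; all other pixels are rendered as black.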

The 3D shape 114 can include information that specifies the shape of a single protein or other molecular structure. For example, the 3D shape 114 can include a Protein Data Bank (.pdb) file that records HEADER, TITLE, and AUTHOR records; REMARK records; SEQRES records; ATOM records; and HETATM records. However, other file types and other data models may be used. For example, the 3D shape 114 can include a macromolecular Crystallographic Information File (.mmCIF) that records data in a tag-value format for representing macromolecular structure data.
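For illustration only, atomic coordinates could be read from the ATOM and HETATM records of a .pdb file along the following lines. The sketch relies on the fixed-column layout of the PDB format (atom name in columns 13-16, residue name in columns 18-20, chain identifier in column 22, and x, y, z in columns 31-54). The `Atom` type, the function names, and the sample atoms are hypothetical, and `format_atom_line` exists only to build a correctly aligned sample record.

```python
from typing import List, NamedTuple


class Atom(NamedTuple):
    name: str
    residue: str
    chain: str
    x: float
    y: float
    z: float


def parse_pdb_atoms(pdb_text: str) -> List[Atom]:
    """Extract coordinates from ATOM and HETATM records using fixed columns."""
    atoms = []
    for line in pdb_text.splitlines():
        if line.startswith(("ATOM  ", "HETATM")):
            atoms.append(Atom(
                name=line[12:16].strip(),     # columns 13-16
                residue=line[17:20].strip(),  # columns 18-20
                chain=line[21],               # column 22
                x=float(line[30:38]),         # columns 31-38
                y=float(line[38:46]),         # columns 39-46
                z=float(line[46:54]),         # columns 47-54
            ))
    return atoms


def format_atom_line(serial: int, name: str, residue: str, chain: str,
                     res_seq: int, x: float, y: float, z: float) -> str:
    """Hypothetical helper that writes a minimal, column-aligned ATOM record."""
    return (f"ATOM  {serial:>5} {name:<4} {residue:>3} {chain}{res_seq:>4}    "
            f"{x:8.3f}{y:8.3f}{z:8.3f}")


# Two made-up atoms, preceded by a non-coordinate record that is skipped.
sample = "\n".join([
    "REMARK illustrative sample, not real structure data",
    format_atom_line(1, "N", "ALA", "A", 1, 11.104, 13.207, 2.100),
    format_atom_line(2, "CA", "ALA", "A", 1, 12.560, 13.329, 2.261),
])
atoms = parse_pdb_atoms(sample)
```

An .mmCIF file stores the same kind of information as tag-value pairs and would need a different parser.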

The 3D shapes 114 and 116 can be selected for use based on a match with the two proteins in the protein-protein complex 104. For example, if the first protein is known and has a fully described 3D shape 114, and the second protein is also known and has a fully described 3D shape 116, these 3D shapes 114 and 116 can be indexed by the names of the proteins and used in these processes. However, in some cases, the 3D shape of a homolog of one or both of the proteins may be used. In such cases, a structurally similar protein can be identified as the homolog, and a 3D shape indexed by the name of the homolog can be accessed.

The docking model 110 includes structured data that defines a possible pose pair of the two proteins in the protein-protein complex. For example, the pose pair can include a relative location, a relative orientation, and a docking area. The data can be organized to assume that a point in one protein is at point [0][0][0] in 3D space. The pose can then specify a translation (e.g., a movement) in terms of [x][y][z] that defines the translation from the origin needed to locate the second protein. The pose can also specify a rotation (e.g., a spin) in terms of [x][y][z] that defines the rotation from the orientation of the first protein needed to obtain the orientation of the second protein. The docking area can specify one or more surfaces of the proteins that the model designates as the docking surfaces where the two proteins dock or contact. The docking models 110 can be generated computationally, according to expected rules that are believed to represent physical protein contact areas. The docking models 110 can be generated experimentally, according to experiments that measure real-world samples of actual protein-protein complexes.
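The structured data of a pose pair might, purely as an illustrative sketch, be represented and applied as shown below. The source does not specify a rotation convention, so the code assumes rotations in degrees applied about the x axis, then the y axis, then the z axis; the `PosePair` class, its field names, and the surface labels are hypothetical.

```python
import math
from dataclasses import dataclass, field
from typing import List, Tuple

Point = Tuple[float, float, float]


@dataclass
class PosePair:
    translation: Point   # [x][y][z] movement of the second protein from the origin
    rotation_deg: Point  # assumed: rotations about the x, y and z axes, in degrees
    docking_area: List[str] = field(default_factory=list)  # hypothetical surface labels


def rotate(point: Point, rotation_deg: Point) -> Point:
    """Rotate about x, then y, then z (an assumed convention)."""
    x, y, z = point
    ax, ay, az = (math.radians(a) for a in rotation_deg)
    y, z = y * math.cos(ax) - z * math.sin(ax), y * math.sin(ax) + z * math.cos(ax)
    x, z = x * math.cos(ay) + z * math.sin(ay), -x * math.sin(ay) + z * math.cos(ay)
    x, y = x * math.cos(az) - y * math.sin(az), x * math.sin(az) + y * math.cos(az)
    return (x, y, z)


def place_second_protein(coords: List[Point], pose: PosePair) -> List[Point]:
    """Rotate, then translate, the second protein relative to the first at the origin."""
    tx, ty, tz = pose.translation
    placed = []
    for point in coords:
        x, y, z = rotate(point, pose.rotation_deg)
        placed.append((x + tx, y + ty, z + tz))
    return placed


# Illustrative pose: spin 90 degrees about z, then move 10 units along x.
pose = PosePair(translation=(10.0, 0.0, 0.0), rotation_deg=(0.0, 0.0, 90.0),
                docking_area=["surface_1"])
placed = place_second_protein([(1.0, 0.0, 0.0)], pose)
```

A group of docking models could then be held as a list of such pose pairs, each applied to the 3D shape of the second protein before projection and scoring.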

図３は、タンパク質－タンパク質複合体を検知するための例示のプロセス３００を示す。例えば、プロセス３００は、システム１００の要素と共に実行することができ、明確にするために、ここでの例は、システム１００の要素の観点で説明される。しかしながら、他のシステムを使用して、プロセス３００および他の類似のプロセスを実行する場合がある。 FIG. 3 illustrates an example process 300 for detecting protein-protein complexes. For example, process 300 can be performed with elements of system 100, and for clarity, the examples herein are described in terms of elements of system 100. However, other systems may be used to perform process 300 and other similar processes.

第１のタンパク質および第２のタンパク質を含むタンパク質－タンパク質複合体のサンプルの複数の複合体画像を含む合成画像がアクセスされる（３０２）。例えば、コンピュータ１２２は、内部メモリから、またはリモート（例えば、クラウドによりホスティングされた）メモリサービスから合成画像１０８にアクセスすることができる。これは例えば、イメージャ１０２によってイメージングされたタンパク質複合体１０４の分析を要求するユーザ入力を受ける結果として生じる場合がある。 A composite image is accessed (302), the composite image including multiple complex images of a sample of a protein-protein complex including a first protein and a second protein. For example, the computer 122 may access the composite image 108 from an internal memory or from a remote (e.g., cloud-hosted) memory service. This may occur, for example, as a result of receiving a user input requesting analysis of a protein complex 104 imaged by the imager 102.

合成画像はマスキングされ（３０４）、マスキングされた部分および非マスキング部分が生成される。例えば、画像マスク１１２を合成画像に適用して、画像マスク１１２に記憶されたピクセル値に基づいて、マスキング部分および非マスキング部分を指定することができる。いくつかの構成において、画像マスク１１２は、自動化されたスクリプトによって、またはそうでない場合、特定のユーザ入力なしで生成される。いくつかの場合、コンピュータビジョン技法を適用して、合成画像１０８における特徴を識別することができ、自動化されたコンピュータビジョンプロセスによってマスクが生成される。いくつかの構成において、画像マスク１１２は、ユーザからの入力を使用して生成される。いくつかの場合、合成画像のマスキングは行われない。１つのそのようなプロセスの例が本書において後に説明される。 The composite image is masked (304) to generate masked and unmasked portions. For example, an image mask 112 may be applied to the composite image to designate masked and unmasked portions based on pixel values stored in the image mask 112. In some configurations, the image mask 112 is generated by an automated script or otherwise without specific user input. In some cases, computer vision techniques may be applied to identify features in the composite image 108, and the mask is generated by an automated computer vision process. In some configurations, the image mask 112 is generated using input from a user. In some cases, no masking of the composite image is performed. An example of one such process is described later in this document.

第１のタンパク質の第１の３次元（３Ｄ）形状および第２のタンパク質の第２の３Ｄ形状がアクセスされる（３０６）。例えば、コンピュータ１２２は、内部メモリから、またはリモート（例えば、クラウドによりホスティングされた）メモリサービスから３Ｄ形状１１４および１１６にアクセスすることができる。いくつかの場合、コンピュータ１２２は、タンパク質－タンパク質複合体１０４において特定のタンパク質を検索することによって、またはタンパク質の一方もしくは双方の１つもしくはそれ以上の同族体を検索することによって、またはそのような同族体の異なる部分を組み合わせて新たな同族体を作成することによって、３Ｄ形状のライブラリから３Ｄ形状１１４および１１６を探すことができる。 A first three-dimensional (3D) shape of the first protein and a second 3D shape of the second protein are accessed (306). For example, the computer 122 can access the 3D shapes 114 and 116 from an internal memory or from a remote (e.g., cloud-hosted) memory service. In some cases, the computer 122 can look for the 3D shapes 114 and 116 in a library of 3D shapes by searching for a particular protein in the protein-protein complex 104, or by searching for one or more homologs of one or both of the proteins, or by combining different portions of such homologs to create new homologs.

A plurality of docking models, each of which defines a candidate pose pair, is accessed (308). For example, the computer 122 can access the models 110 from an internal memory or from a remote (e.g., cloud-hosted) memory service. In some cases, the computer 122 can retrieve all available models 110. In some cases, the computer 122 can retrieve a subset of all possible models 110 by querying only for models with certain parameters specified based on the technical requirements of the process 300.

ドッキングモデルごとに、第１の３Ｄ形状、第２の３Ｄ形状および候補ポーズペアが適用され（３１０）、ドッキングモデルについて、ポーズペアとドッキングモデルとの適合の良好度を記述する対応する適合スコアが生成される。例えば、コンピュータ１２２は、合成画像１０８、画像マスク１１２、３Ｄ形状１１４および１１６、ならびに単一のモデル１１０を適合関数に供給することによって単一のモデル１１０の適合スコアを計算することができ、適合関数は、この入力に対し計算を行い、モデルが他の入力データの特定の状態をどの程度良好に記述するかを示す数値を返す。コンピュータ１２２は、モデル１１０ごとにこれを繰り返すことができる。 For each docking model, the first 3D shape, the second 3D shape, and the candidate pose pair are applied (310), and a corresponding fit score is generated for the docking model that describes how well the pose pair fits the docking model. For example, computer 122 can calculate the fit score for the single model 110 by feeding the composite image 108, the image mask 112, the 3D shapes 114 and 116, and the single model 110 to a fit function, which performs calculations on the inputs and returns a numerical value indicating how well the model describes a particular state of the other input data. Computer 122 can repeat this for each model 110.

適合スコアに基づいて、タンパク質－タンパク質複合体のための検知されたモデルとしてドッキングモデルのうちの１つが選択される（３１２）。例えば、コンピュータ１２２は、各モデル１１０の適合スコア、および以下に説明するように、場合により他のデータに基づいて最良のモデル１１０を選択することができる。 Based on the fit scores, one of the docking models is selected as the detected model for the protein-protein complex (312). For example, the computer 122 can select the best model 110 based on the fit scores of each model 110 and possibly other data, as described below.
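The scoring and selection steps (310, 312) can be sketched as a simple loop. The sketch below is illustrative only: the function name `select_detected_model` is hypothetical, the fit function is passed in as a parameter because its internals are not specified here, and a higher score is assumed to mean a better fit.

```python
def select_detected_model(models, fit_function, composite_image, image_mask,
                          shape_first, shape_second):
    """Score every candidate model and return (best_name, scores).

    `models` maps a model name to a model object. `fit_function` is a
    caller-supplied placeholder that returns a number for one model.
    Assumption: a higher score means a better fit.
    """
    if not models:
        raise ValueError("at least one docking model is required")
    scores = {}
    for name, model in models.items():
        # Each model is scored against the same image, mask, and 3D shapes.
        scores[name] = fit_function(
            composite_image, image_mask, shape_first, shape_second, model
        )
    best_name = max(scores, key=scores.get)
    return best_name, scores
```

Returning the full score table alongside the best model keeps the other data available for any later selection step that considers more than the single top score.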

図４は、例えば合成画像にアクセスする（３０２）前に実行される前処理の一部として、合成画像を生成するための例示のプロセス４００を示す。例えば、プロセス４００は、システム１００の要素と共に実行することができ、明確にするために、ここでの例は、システム１００の要素の観点で説明される。しかしながら、他のシステムを使用して、プロセス４００および他の類似のプロセスを実行する場合がある。 FIG. 4 illustrates an example process 400 for generating a composite image, e.g., as part of pre-processing performed prior to accessing 302 the composite image. For example, process 400 can be performed in conjunction with elements of system 100, and for clarity, the examples herein are described in terms of elements of system 100. However, other systems may be used to perform process 400 and other similar processes.

タンパク質－タンパク質複合体サンプルは、低温電子顕微鏡に装填される（４０２）。例えば、人間のオペレータおよび／または自動化サービスマシン（例えば、材料取り扱いロボット）が、タンパク質－タンパク質複合体１０４を低温冷却し、これらをガラス状水（ｖｉｔｒｅｏｕｓｗａｔｅｒ）等の媒体に埋め込むことができる。溶液をグリッドメッシュに施し、液体エタン等の冷却媒体において凍結させることができる。次に、メッシュをイメージャ１０２内に装填することができる。 The protein-protein complex sample is loaded into a cryo-electron microscope (402). For example, a human operator and/or an automated service machine (e.g., a material handling robot) can cryo-cool the protein-protein complexes 104 and embed them in a medium such as vitreous water. The solution can be applied to a grid mesh and frozen in a cooling medium such as liquid ethane. The mesh can then be loaded into the imager 102.

A plurality of complex images is generated (404). For example, a human operator, an automated service machine, and/or the computer 122 can instruct the imager 102 to perform electron microscopy on the complexes 104 to generate the complex images 106. Once generated, the complex images 106 can be stored in computer memory (e.g., internal to the computer 122 or at an external location).

A composite image is generated (406) from the plurality of complex images. For example, for each pixel location, the computer 122 can aggregate the pixel values found at that location across the multiple images 106, such as by computing their average, to create a single aggregate pixel value, and can store that value at the same pixel location in the composite image 108. In some cases, this aggregation can be a weighted average, can exclude outliers, can use a median or mode, and so on.
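The per-pixel aggregation can be sketched as follows. This is a minimal example that assumes the complex images are already aligned, have identical dimensions, and are stored as nested lists of numbers; the function name `aggregate_images` is hypothetical, and only the mean and median variants are shown.

```python
from statistics import mean, median


def aggregate_images(images, method="mean"):
    """Combine equally sized 2D images into one image, pixel by pixel."""
    if not images:
        raise ValueError("at least one image is required")
    rows, cols = len(images[0]), len(images[0][0])
    for image in images:
        if len(image) != rows or any(len(row) != cols for row in image):
            raise ValueError("all images must have the same dimensions")
    reducer = {"mean": mean, "median": median}[method]
    # For each pixel location, reduce the values found there across all images.
    return [
        [reducer([image[r][c] for image in images]) for c in range(cols)]
        for r in range(rows)
    ]
```

A median is less sensitive than a mean to a single image with an outlying value at a given location, which is one reason an implementation might prefer it.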

In some cases, generating the composite image includes extracting sub-images of the protein-protein complex from the complex images, and classifying and orienting the sub-images. For example, the computer 122 can examine each of the images 106, find pixel areas that show a complex, and copy those pixel values into separate sub-image files. In another example, the computer 122 can do this without using separate files, but for clarity, separate files are described. Then, for each separate file, the computer can modify the sub-image so that every sub-image shows the proteins with the same orientation, scale, intensity, etc. As will be appreciated, this can include one or more image manipulation processes.
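Two of the simpler manipulations mentioned above, copying out a pixel area and bringing sub-images to a common intensity range, can be sketched as follows. The helper names `crop` and `normalize_intensity` are hypothetical, the pixel area is assumed to be already known as a rectangle, and min-max rescaling is only one possible way to equalize intensity; locating the complex and aligning orientation and scale are not shown.

```python
def crop(image, top, left, height, width):
    """Copy a rectangular pixel area out of a 2D image (nested lists)."""
    return [row[left:left + width] for row in image[top:top + height]]


def normalize_intensity(image):
    """Rescale pixel values linearly so the minimum maps to 0.0 and the maximum to 1.0."""
    flat = [value for row in image for value in row]
    lo, hi = min(flat), max(flat)
    if hi == lo:
        # A flat image carries no contrast; map every pixel to 0.0.
        return [[0.0 for _ in row] for row in image]
    return [[(value - lo) / (hi - lo) for value in row] for row in image]
```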

FIG. 5 illustrates an example process 500 for masking a composite image, e.g., as part of masking 304 the composite image. For example, process 500 can be performed in conjunction with elements of system 100, and for clarity, the examples herein are described in terms of elements of system 100. However, other systems may be used to perform process 500 and other similar processes.

A masking graphical user interface (GUI) is presented to the user (502). For example, the computer 122 can load a GUI, such as an application interface or a web page, on a screen. The screen can render the composite image 108 along with interface elements (e.g., buttons, scroll bars) that receive user input. User input can be provided by a human operator pressing physical buttons, moving a pointing device, tapping a touch screen, etc.

非マスキング部分を指定する第１のユーザ入力が受けられる（５０４）。例えば、ユーザはインタフェース要素を使用して、レンダリングされた合成画像１０８上の複数の（例えば、３、４、６または９個の）ポイントを指定することができる。例えば、ユーザは、分野の知識を使用して、抗原によって指定されたタンパク質－タンパク質複合体１０４のドッキングエリアを示す可能性が高い合成画像１０８のエリアを視覚的に識別することができる。次に、ユーザは、マウス等のポインティングデバイスを使用して、識別するエリアの周りに描かれるバウンティングボックスの４つの頂点を識別することができる。 A first user input is received (504) specifying the unmasked portion. For example, the user may use an interface element to specify multiple (e.g., 3, 4, 6, or 9) points on the rendered composite image 108. For example, the user may use domain knowledge to visually identify areas of the composite image 108 that are likely to represent docking areas of the antigen-specified protein-protein complex 104. The user may then use a pointing device, such as a mouse, to identify four vertices of a bounding box that is drawn around the identified area.

The bounding box is generated by connecting the locations specified by the first user input (506). For example, the computer 122 can computationally generate line segments that terminate at successive points identified by the user, including a segment that joins the last location back to the first location. This can produce a fully connected polygon.

ボックスの外側部分は、マスキング部分として記録され（５０８）、ボックスの内側部分は非マスキング部分として記録される（５１０）。例えば、多角形の完全にまたは部分的に内側の各ピクセルには、画像マスク１１２において色値（例えば、黒、白）を与えることができ、多角形の完全にまたは部分的に外側の各ピクセルには、異なる色値（例えば、白、黒）を与えることができる。 The portion outside the box is marked as the masked portion (508) and the portion inside the box is marked as the unmasked portion (510). For example, each pixel that is fully or partially inside the polygon can be given a color value in the image mask 112 (e.g., black, white) and each pixel that is fully or partially outside the polygon can be given a different color value (e.g., white, black).
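The mask generation in steps 506 to 510 can be sketched with a standard ray-casting point-in-polygon test. This is an illustrative simplification: each pixel is classified by its center only, whereas the description above also contemplates pixels that are partially inside or outside, and the names `point_in_polygon` and `build_mask` as well as the values 1 (unmasked) and 0 (masked) are assumptions.

```python
def point_in_polygon(x, y, vertices):
    """Ray-casting test: True if (x, y) lies inside the closed polygon."""
    inside = False
    count = len(vertices)
    for i in range(count):
        x1, y1 = vertices[i]
        # The modulo joins the last vertex back to the first, closing the polygon.
        x2, y2 = vertices[(i + 1) % count]
        # Only edges that straddle the horizontal line through y can be crossed.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def build_mask(width, height, vertices, unmasked=1, masked=0):
    """Return a 2D mask; pixels whose centers fall inside the polygon are unmasked."""
    return [
        [
            unmasked if point_in_polygon(col + 0.5, row + 0.5, vertices) else masked
            for col in range(width)
        ]
        for row in range(height)
    ]
```

Because the test works for any simple polygon, the same sketch covers three, four, six, or nine user-specified points without modification.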

FIG. 6 illustrates an example process 600 for selecting a detected model from a group of candidate models, e.g., as part of selecting 312 the detected model. For example, process 600 can be performed in conjunction with elements of system 100, and for clarity, the examples herein are described in terms of elements of system 100. However, other systems may be used to perform process 600 and other similar processes.

The candidate docking model with the best fit score is selected (602). For example, when the data is applied 310 to the models, a fit score is calculated for each model. The fit score can be viewed as a measure of how well the model 110 predicts the arrangement of colors in the composite image 108, given the image mask 112 and the 3D shapes 114 and 116. In some cases, the fit score is generated by projecting various orientations of the docking model onto 2D images and comparing the projected images to the detected complex images 106. The projection that produces the smallest difference from the detected complex receives the best cross-correlation score.
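One way to compare a projected image with an observed image is a normalized cross-correlation restricted to the unmasked pixels. The text above refers only to a "cross-correlation score", so the specific normalization used here is an assumption, as are the function names and the convention that truthy mask values mark unmasked pixels. The projection step itself is not shown; the sketch takes already projected 2D images as input.

```python
import math


def normalized_cross_correlation(image_a, image_b, mask=None):
    """Correlation in [-1, 1] between two equally sized 2D images.

    If `mask` is given, only pixels where the mask is truthy are compared.
    """
    a_values, b_values = [], []
    for r, (row_a, row_b) in enumerate(zip(image_a, image_b)):
        for c, (a, b) in enumerate(zip(row_a, row_b)):
            if mask is None or mask[r][c]:
                a_values.append(a)
                b_values.append(b)
    count = len(a_values)
    if count == 0:
        raise ValueError("no pixels to compare")
    mean_a = sum(a_values) / count
    mean_b = sum(b_values) / count
    numerator = sum((a - mean_a) * (b - mean_b) for a, b in zip(a_values, b_values))
    spread_a = sum((a - mean_a) ** 2 for a in a_values)
    spread_b = sum((b - mean_b) ** 2 for b in b_values)
    denominator = math.sqrt(spread_a * spread_b)
    # A constant image has no variation to correlate; report 0.0 in that case.
    return numerator / denominator if denominator else 0.0


def best_projection_score(projections, observed, mask=None):
    """Score a model by its best-matching projection (higher is better)."""
    return max(
        normalized_cross_correlation(projection, observed, mask)
        for projection in projections
    )
```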

モデル１１０ごとの適合スコアを使用して、最良適合スコアを有するモデル１１０のサブセットが識別される。いくつかの場合、これらは上位スコアのモデル１１０である。これらは、コンピュータ１２２が、Ｎ個（例えば、５、１０、２０、１００個）の最高適合スコアを有するドッキングモデル１１０を選択することによって発見することができる。いくつかの場合、これらは十分予測的な任意のモデルである。これらは、コンピュータ１２２が、閾値Ｍ（例えば、０～１のスケールで０．８、０．９、０．０９５、０．０９９９）を上回る適合スコアを有する全てのドッキングモデル１１０を選択することによって発見することができる。 The fit scores for each model 110 are used to identify a subset of models 110 with the best fit scores. In some cases, these are the top scoring models 110. These can be found by the computer 122 selecting the docking models 110 with the N (e.g., 5, 10, 20, 100) highest fit scores. In some cases, these are any models that are sufficiently predictive. These can be found by the computer 122 selecting all docking models 110 with fit scores above a threshold M (e.g., 0.8, 0.9, 0.095, 0.0999 on a scale of 0 to 1).
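The two ways of forming the subset, the N highest scores or every score above a threshold M, can be sketched as follows. The function names are hypothetical, scores are assumed to be held in a dictionary keyed by model name, and a higher score is assumed to be better.

```python
def top_n_models(scores, n):
    """Return the names of the n models with the highest fit scores, best first."""
    return sorted(scores, key=scores.get, reverse=True)[:n]


def models_above_threshold(scores, threshold):
    """Return the names of all models scoring above the threshold, best first."""
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return [name for name, score in ranked if score > threshold]
```

The top-N form always yields a fixed number of candidates to review, while the threshold form can return none or many depending on how well the candidates fit.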

ユーザインタフェースにおいて候補モデルが提示され（６０４）、ドッキングモデルのサブセットのうちの１つを検知されたモデルとして選択するユーザ選択入力が受けられる（６０６）。例えば、コンピュータ１２２は、各ドックモデルをレンダリングし、関連付けられたスコア１１８を、合成画像１０８のレンダリングと共に示すことによって、モデル１１０のサブセットを表示することができる。ユーザは入力デバイスを使用して１つを選択することができる。いくつかの場合、全ての候補ドッキングモデル１１０が同時に示され、ユーザが全てのオプションを同時にレビューすることを可能にし、より好便で正確な検討を可能にする。 The candidate models are presented in a user interface (604) and a user selection input is received (606) to select one of the subset of docking models as the detected model. For example, the computer 122 can display the subset of models 110 by rendering each docking model and showing the associated score 118 along with a rendering of the composite image 108. The user can select one using an input device. In some cases, all candidate docking models 110 are shown simultaneously, allowing the user to review all options simultaneously, allowing for more convenient and accurate consideration.

FIG. 7 illustrates an example of a computing device 700 and an example of a mobile computing device that can be used to implement the techniques described herein. The computing device 700 is intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other suitable computers. The mobile computing device is intended to represent various forms of mobile devices, such as personal digital assistants, mobile phones, smartphones, and other similar computing devices. The components shown, their connections and relationships, and their functions are intended to be illustrative only and are not intended to limit the implementations of the invention described and/or claimed herein.

The computing device 700 includes a processor 702, a memory 704, a storage device 706, a high-speed interface 708 that connects to the memory 704 and to multiple high-speed expansion ports 710, and a low-speed interface 712 that connects to a low-speed expansion port 714 and the storage device 706. Each of the processor 702, the memory 704, the storage device 706, the high-speed interface 708, the high-speed expansion ports 710, and the low-speed interface 712 is interconnected using various buses and may be mounted on a common motherboard or in other manners as appropriate. The processor 702 can process instructions for execution within the computing device 700, including instructions stored in the memory 704 or on the storage device 706, to display graphical information for a GUI on an external input/output device, such as a display 716 coupled to the high-speed interface 708. In other implementations, multiple processors and/or multiple buses may be used, as appropriate, along with multiple memories and types of memory. Also, multiple computing devices may be connected, with each device providing a portion of the necessary operations (e.g., as a server bank, a group of blade servers, or a multiprocessor system).

メモリ７０４は、コンピューティングデバイス７００内に情報を記憶する。いくつかの実施態様では、メモリ７０４は揮発性メモリユニットである。いくつかの実施態様では、メモリ７０４は不揮発性メモリユニットである。メモリ７０４は、磁気ディスクまたは光ディスク等のコンピュータ可読媒体の別の形態をとることもできる。 Memory 704 stores information within computing device 700. In some implementations, memory 704 is a volatile memory unit. In some implementations, memory 704 is a non-volatile memory unit. Memory 704 may also take another form of computer-readable medium, such as a magnetic disk or optical disk.

記憶デバイス７０６は、コンピューティングデバイス７００のための大容量記憶を提供することが可能である。いくつかの実施態様において、記憶デバイス７０６は、フロッピーディスクデバイス、ハードディスクデバイス、光ディスクデバイス、またはテープデバイス、フラッシュメモリまたは他の類似の固体メモリデバイス等のコンピュータ可読媒体、または記憶エリアネットワークもしくは他の構成におけるデバイスを含むデバイスのアレイであるか、またはこれらを含むことができる。コンピュータプログラム製品は情報担体内に有形的に具現化することができる。コンピュータプログラム製品は、実行されると上記の方法等の１つまたはそれ以上の方法を実行する命令も含むことができる。コンピュータプログラム製品は、メモリ７０４、記憶デバイス７０６またはプロセッサ７０２上のメモリ等のコンピュータまたは機械可読媒体上に有形に具現化することもできる。 The storage device 706 can provide mass storage for the computing device 700. In some embodiments, the storage device 706 can be or include a floppy disk device, a hard disk device, an optical disk device, or a computer-readable medium such as a tape device, a flash memory or other similar solid-state memory device, or an array of devices including devices in a storage area network or other configuration. The computer program product can be tangibly embodied in an information carrier. The computer program product can also include instructions that, when executed, perform one or more methods, such as the methods described above. The computer program product can also be tangibly embodied on a computer or machine-readable medium, such as memory 704, the storage device 706, or memory on the processor 702.

高速インタフェース７０８は、コンピューティングデバイス７００のための帯域幅消費型の動作を管理するのに対し、低速インタフェース７１２は、より少ない帯域幅消費型の動作を管理する。機能のそのような割り当ては例示にすぎない。いくつかの実施態様において、高速インタフェース７０８はメモリ７０４、（例えば、グラフィックプロセッサまたはアクセラレータを通じて）ディスプレイ７１６、および様々な拡張カード（図示せず）を受けることができる高速拡張ポート７１０とに結合される。実施態様において、低速インタフェース７１２は、記憶デバイス７０６および低速拡張ポート７１４に結合される。様々な通信ポート（例えば、ＵＳＢ、Ｂｌｕｅｔｏｏｔｈ、イーサネット、無線イーサネット）を含むことができる低速拡張ポート７１４は、キーボード、ポインティングデバイス、スキャナ、またはスイッチもしくはルータ等のネットワーキングデバイスなどの１つまたはそれ以上の入力／出力デバイスに、例えばネットワークアダプタを通じて結合することができる。 The high-speed interface 708 manages bandwidth-intensive operations for the computing device 700, while the low-speed interface 712 manages less bandwidth-intensive operations. Such allocation of functions is merely exemplary. In some embodiments, the high-speed interface 708 is coupled to the memory 704, the display 716 (e.g., through a graphics processor or accelerator), and a high-speed expansion port 710 that can receive various expansion cards (not shown). In an embodiment, the low-speed interface 712 is coupled to the storage device 706 and the low-speed expansion port 714. The low-speed expansion port 714, which can include various communication ports (e.g., USB, Bluetooth, Ethernet, wireless Ethernet), can be coupled to one or more input/output devices, such as a keyboard, a pointing device, a scanner, or a networking device such as a switch or router, for example, through a network adapter.

As shown in the figure, the computing device 700 can be implemented in a number of different forms. For example, the computing device 700 can be implemented as a standard server 720, or multiple times in a group of such servers. In addition, the computing device 700 can be implemented in a personal computer, such as a laptop computer 722. The computing device 700 can also be implemented as part of a rack server system 724. Alternatively, components from the computing device 700 can be combined with other components in a mobile device (not shown), such as the mobile computing device 750. Each such device can include one or more of the computing device 700 and the mobile computing device 750, and an entire system can be made up of multiple computing devices in communication with each other.

モバイルコンピューティングデバイス７５０は、他の構成要素の中でも、プロセッサ７５２、メモリ７６４、ディスプレイ７５４等の入力／出力デバイス、通信インタフェース７６６および送受信機７６８を含む。モバイルコンピューティングデバイス７５０には、更なる記憶を提供するために、マイクロドライブまたは他のデバイス等の記憶デバイスを提供することもできる。プロセッサ７５２、メモリ７６４、ディスプレイ７５４、通信インタフェース７６６および送受信機７６８の各々は、様々なバスを使用して相互接続され、構成要素のうちのいくつかは、共通のマザーボードに、または適宜他の方式で搭載することができる。 The mobile computing device 750 includes, among other components, a processor 752, a memory 764, an input/output device such as a display 754, a communication interface 766, and a transceiver 768. The mobile computing device 750 may also be provided with a storage device such as a microdrive or other device to provide additional storage. Each of the processor 752, memory 764, display 754, communication interface 766, and transceiver 768 are interconnected using various buses, and some of the components may be mounted on a common motherboard or in other manners as appropriate.

プロセッサ７５２は、メモリ７６４に記憶された命令を含む、モバイルコンピューティングデバイス７５０内の命令を実行することができる。プロセッサ７５２は、別個のおよび複数のアナログおよびデジタルプロセッサを含むチップのチップセットとして実施することができる。プロセッサ７５２は、例えば、ユーザインタフェースの制御等の、モバイルコンピューティングデバイス７５０の他の構成要素の協調のために、モバイルコンピューティングデバイス７５０によって実行されるアプリケーション、およびモバイルコンピューティングデバイス７５０による無線通信を提供することができる。 The processor 752 can execute instructions within the mobile computing device 750, including instructions stored in the memory 764. The processor 752 can be implemented as a chipset of chips including separate and multiple analog and digital processors. The processor 752 can provide for the coordination of other components of the mobile computing device 750, such as control of a user interface, applications executed by the mobile computing device 750, and wireless communication by the mobile computing device 750.

プロセッサ７５２は、ディスプレイ７５４に結合された制御インタフェース７５８およびディスプレイインタフェース７５６を通じてユーザと通信することができる。ディスプレイ７５４は、例えば、ＴＦＴ（薄膜トランジスタ液晶ディスプレイ）ディスプレイもしくはＯＬＥＤ（有機発光ダイオード）ディスプレイ、またはその他の適切な表示技術とすることができる。ディスプレイインタフェース７５６は、ユーザに対してグラフィカルな情報および他の情報を提示するようにディスプレイ７５４を駆動するための適切な回路を含むことができる。制御インタフェース７５８は、ユーザからコマンドを受け、それらのコマンドを、プロセッサ７５２に送るために変換することができる。加えて、外部インタフェース７６２が、他のデバイスとのモバイルコンピューティングデバイス７５０の近い地域の通信を可能にするために、プロセッサ７５２との通信を提供することができる。外部インタフェース７６２は、例えば、いくつかの実施態様においては有線通信を、または他の実施態様においては無線通信を提供することができ、複数のインタフェースも使用することができる。 The processor 752 can communicate with a user through a control interface 758 and a display interface 756 coupled to a display 754. The display 754 can be, for example, a TFT (thin film transistor liquid crystal display) display or an OLED (organic light emitting diode) display, or other suitable display technology. The display interface 756 can include suitable circuitry for driving the display 754 to present graphical and other information to the user. The control interface 758 can receive commands from the user and translate those commands for sending to the processor 752. In addition, an external interface 762 can provide communication with the processor 752 to enable near area communication of the mobile computing device 750 with other devices. The external interface 762 can provide, for example, wired communication in some implementations or wireless communication in other implementations, and multiple interfaces can also be used.

メモリ７６４は、モバイルコンピューティングデバイス７５０内に情報を記憶する。メモリ７６４は、コンピュータ可読媒体、揮発性メモリユニット、または不揮発性メモリユニットのうちの１つまたはそれ以上として実施することができる。また、拡張メモリ７７４を設け、例えば、ＳＩＭＭ（シングルインラインメモリモジュール）カードインタフェースを含むことができる拡張インタフェース７７２を通じてモバイルコンピューティングデバイス７５０に接続することができる。拡張メモリ７７４は、モバイルコンピューティングデバイス７５０に追加的な記憶空間を提供することができるか、またはモバイルコンピューティングデバイス７５０に関するアプリケーションまたは他の情報を記憶することもできる。特に、拡張メモリ７７４は、上述のプロセスを実行または補足する命令を含むことができ、セキュア情報を含むこともできる。したがって、例えば、拡張メモリ７７４は、モバイルコンピューティングデバイス７５０のセキュリティモジュールとして設けることができ、モバイルコンピューティングデバイス７５０のセキュアな使用を可能にする命令を使用してプログラムすることができる。加えて、ハッキングすることができない方式でＳＩＭＭカードに識別情報を置く等、追加の情報と共に、セキュアなアプリケーションをＳＩＭＭカードにより提供することができる。 The memory 764 stores information within the mobile computing device 750. The memory 764 may be embodied as one or more of a computer readable medium, a volatile memory unit, or a non-volatile memory unit. An expansion memory 774 may also be provided and connected to the mobile computing device 750 through an expansion interface 772, which may include, for example, a SIMM (single in-line memory module) card interface. The expansion memory 774 may provide additional storage space for the mobile computing device 750 or may store applications or other information related to the mobile computing device 750. In particular, the expansion memory 774 may include instructions that perform or supplement the processes described above, and may also include secure information. Thus, for example, the expansion memory 774 may be provided as a security module of the mobile computing device 750 and may be programmed with instructions that enable secure use of the mobile computing device 750. In addition, secure applications may be provided by the SIMM card along with additional information, such as placing identifying information on the SIMM card in a manner that cannot be hacked.

メモリは、例えば、以下で論じられるように、フラッシュメモリおよび／またはＮＶＲＡＭメモリ（不揮発性ランダムアクセスメモリ）を含むことができる。いくつかの実施態様において、コンピュータプログラム製品は情報担体内に有形的に具現化される。コンピュータプログラム製品は、実行されると上記の方法等の１つまたはそれ以上の方法を実行する命令を含む。コンピュータプログラム製品は、メモリ７６４、拡張メモリ７７４またはプロセッサ７５２上のメモリ等のコンピュータまたは機械可読媒体とすることができる。いくつかの実施態様において、コンピュータプログラム製品は、例えば、送受信機７６８または外部インタフェース７６２を介して伝播信号において受けることができる。 The memory may include, for example, flash memory and/or NVRAM memory (non-volatile random access memory), as discussed below. In some embodiments, the computer program product is tangibly embodied in an information carrier. The computer program product includes instructions that, when executed, perform one or more methods, such as the methods described above. The computer program product may be a computer or machine readable medium, such as memory 764, expansion memory 774, or memory on processor 752. In some embodiments, the computer program product may be received in a propagated signal, for example, via transceiver 768 or external interface 762.

モバイルコンピューティングデバイス７５０は、必要に応じてデジタル信号処理回路を含むことができる通信インタフェース７６６を通じて無線で通信することができる。通信インタフェース７６６は、数ある中でも、ＧＳＭ音声電話（移動体通信用グローバルシステム）、ＳＭＳ（ショートメッセージサービス）、ＥＭＳ（拡張メッセージングサービス）、またはＭＭＳメッセージング（マルチメディアメッセージングサービス）、ＣＤＭＡ（符号分割多元接続）、ＴＤＭＡ（時分割多元接続）、ＰＤＣ（パーソナルデジタルセルラー）、ＷＣＤＭＡ（広帯域符号分割多元接続）、ＣＤＭＡ２０００、またはＧＰＲＳ（汎用パケット無線サービス）等の様々なモードまたはプロトコルの下で通信を提供することができる。そのような通信は、例えば、無線周波数を使用する送受信機７６８を通じて行うことができる。加えて、Ｂｌｕｅｔｏｏｔｈ、ＷｉＦｉ、または他のそのような送受信機（図示せず）を使用するなどして近距離通信を行うことができる。加えて、ＧＰＳ（全地球測位システム）受信機モジュール７７０が、モバイルコンピューティングデバイス７５０において実行されるアプリケーションによって適宜使用することができる更なるナビゲーションおよび位置に関連する無線データをモバイルコンピューティングデバイス７５０に提供することができる。 The mobile computing device 750 may communicate wirelessly through a communication interface 766, which may include digital signal processing circuitry as required. The communication interface 766 may provide communications under various modes or protocols, such as GSM voice telephony (Global System for Mobile Communications), SMS (Short Message Service), EMS (Enhanced Messaging Service), or MMS messaging (Multimedia Messaging Service), CDMA (Code Division Multiple Access), TDMA (Time Division Multiple Access), PDC (Personal Digital Cellular), WCDMA (Wideband Code Division Multiple Access), CDMA2000, or GPRS (General Packet Radio Service), among others. Such communications may be through a transceiver 768 using radio frequencies, for example. In addition, short-range communications may be performed, such as using Bluetooth, WiFi, or other such transceivers (not shown). In addition, a GPS (Global Positioning System) receiver module 770 may provide the mobile computing device 750 with additional navigation- and location-related radio data that may be used by applications executing on the mobile computing device 750, as appropriate.

モバイルコンピューティングデバイス７５０は、ユーザから発話された情報を受け、その情報を使用可能なデジタル情報に変換することができる音声コーデック７６０を使用して音声通信することもできる。同じく、音声コーデック７６０は、例えば、モバイルコンピューティングデバイス７５０におけるハンドセットのスピーカーを介するなどして、ユーザのための可聴音声を生成することができる。そのような音声は、音声電話呼からの音声を含むことができ、記録された音声（例えば、音声メッセージ、音楽ファイルなど）を含むことができ、モバイルコンピューティングデバイス７５０上で動作するアプリケーションによって生成された音声も含むことができる。 The mobile computing device 750 can also provide voice communication using a voice codec 760 that can receive spoken information from a user and convert that information into usable digital information. Similarly, the voice codec 760 can generate audible sound for the user, such as through a speaker in a handset on the mobile computing device 750. Such sound can include sound from a voice telephone call, can include recorded sound (e.g., voice messages, music files, etc.), and can also include sound generated by applications running on the mobile computing device 750.

図に示すように、モバイルコンピューティングデバイス７５０は、複数の異なる形態で実施することができる。例えば、モバイルコンピューティングデバイス７５０は、携帯電話７８０として実施することができる。また、モバイルコンピューティングデバイス７５０は、スマートフォン７８２、携帯情報端末、または他の同様のモバイルデバイスの一部として実施することもできる。 As shown, the mobile computing device 750 may be implemented in a number of different forms. For example, the mobile computing device 750 may be implemented as a mobile phone 780. The mobile computing device 750 may also be implemented as part of a smartphone 782, a personal digital assistant, or other similar mobile device.

ここに記載のシステムおよび技術の様々な実施態様は、デジタル電子回路、集積回路、特別に設計されたＡＳＩＣ（特定用途向け集積回路）、コンピュータハードウェア、ファームウェア、ソフトウェア、および／またはこれらの組合せにおいて実現することができる。これらの様々な実施態様は、記憶システム、少なくとも１つの入力デバイス、および少なくとも１つの出力デバイスからデータおよび命令を受け、それらにデータおよび命令を送信するために結合された、専用または汎用である可能性がある少なくとも１つのプログラム可能なプロセッサを含むプログラム可能なシステム上の、実行可能および／または解釈可能な１つまたはそれ以上のコンピュータプログラムにおける実施態様を含むことができる。 Various implementations of the systems and techniques described herein may be realized in digital electronic circuitry, integrated circuits, specially designed ASICs (application specific integrated circuits), computer hardware, firmware, software, and/or combinations thereof. These various implementations may include implementation in one or more computer programs executable and/or interpretable on a programmable system including at least one programmable processor, which may be special purpose or general purpose, coupled to receive data and instructions from and transmit data and instructions to a storage system, at least one input device, and at least one output device.

These computer programs (also known as programs, software, software applications, or code) include machine instructions for a programmable processor and may be implemented in a high-level procedural and/or object-oriented programming language, and/or in assembly/machine language. As used herein, the terms machine-readable medium and computer-readable medium refer to any computer program product, apparatus, and/or device (e.g., magnetic disk, optical disk, memory, programmable logic device (PLD)) used to provide machine instructions and/or data to a programmable processor, including a machine-readable medium that receives machine instructions as a machine-readable signal. The term machine-readable signal refers to any signal used to provide machine instructions and/or data to a programmable processor.

To provide for interaction with a user, the systems and techniques described herein can be implemented on a computer having a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to the user, and a keyboard and a pointing device (e.g., a mouse or a trackball) by which the user can provide input to the computer. Other kinds of devices can also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback), and input from the user can be received in any form, including acoustic, speech, or tactile input.

The systems and techniques described herein may be implemented in a computing system that includes a back-end component (e.g., a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a client computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described herein), or any combination of such back-end, middleware, or front-end components. The components of the system may be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include a local area network (LAN), a wide area network (WAN), and the Internet.

The computing system may include clients and servers. A client and a server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other.

Claims

1. A method for detecting a protein-protein complex interaction, the method comprising:
accessing a composite image comprising a plurality of complex images of a sample of a protein-protein complex comprising a first protein and a second protein;
masking the composite image to generate a masked portion and an unmasked portion;
accessing a first three-dimensional (3D) shape of the first protein and a second 3D shape of the second protein;
accessing a plurality of docking models, each of which defines a candidate pose pair;
applying, for each docking model, the first 3D shape, the second 3D shape, and the candidate pose pair to generate, for the docking model, a corresponding fit score describing how well the pose pair fits the docking model; and
selecting, based on the fit scores, one of the docking models as a detected model for the protein-protein complex.

2. The method of claim 1, further comprising generating a plurality of composite images.

3. The method of claim 2, further comprising generating a composite image from a plurality of complex images, wherein generating the composite image comprises extracting protein-protein-complex sub-images from the complex images, and orienting and classifying the sub-images.

4. The method of any one of claims 1 to 3, wherein the composite image is a cryo-electron microscopy (cryo-EM) image.

5. The method of claim 1, wherein each complex image includes a plurality of pixels, each pixel having an address and carrying a color value that represents a corresponding portion of the protein-protein complex sample.

6. The method of any one of claims 1 to 3, wherein the composite image includes a plurality of pixels, each pixel having an address and holding a color value that is a collection of color values of pixels having the same address in each of the plurality of composite images.
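
By way of non-limiting illustration only, the per-address combination of pixel values recited in the preceding claim might be sketched as follows. The claim does not state how the color values are combined; the arithmetic mean, the grayscale nested-list image representation, and the function name `aggregate_images` are all assumptions made for this sketch.

```python
from statistics import fmean


def aggregate_images(images):
    """Combine equally sized grayscale images pixel by pixel.

    Each image is a list of rows; each row is a list of numeric pixel values.
    The value at address (row, col) of the result is the mean of the values
    at the same address in every input image (the mean is an assumption).
    """
    if not images:
        raise ValueError("at least one image is required")
    rows, cols = len(images[0]), len(images[0][0])
    for img in images:
        if len(img) != rows or any(len(r) != cols for r in img):
            raise ValueError("all images must share the same dimensions")
    return [
        [fmean(img[r][c] for img in images) for c in range(cols)]
        for r in range(rows)
    ]
```

Averaging is one common way to raise signal relative to noise when many aligned images of the same object are available; a median or a weighted sum could be substituted without changing the structure of the sketch.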

7. The method of any one of claims 1 to 3, wherein the composite image is masked without specific user input.

8. The method of claim 1, wherein masking the composite image includes receiving a first user input specifying the unmasked portion.

9. The method of claim 8, wherein receiving the first user input specifying the unmasked portion comprises:
generating a bounding box by connecting locations specified by the first user input; and
recording a portion of the composite image within the bounding box as the unmasked portion.
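
By way of non-limiting illustration only, the bounding-box masking recited in the preceding claim might be sketched as follows. The sketch assumes that the user-specified locations are (row, column) pixel addresses, that the bounding box is the axis-aligned rectangle enclosing them, and that masked pixels are replaced with a fill value; the function names `bounding_box` and `apply_mask` are hypothetical.

```python
def bounding_box(points):
    """Return (row_min, col_min, row_max, col_max) enclosing all (row, col) points."""
    if not points:
        raise ValueError("at least one location is required")
    rows = [r for r, _ in points]
    cols = [c for _, c in points]
    return min(rows), min(cols), max(rows), max(cols)


def apply_mask(image, box, fill=0):
    """Keep pixels inside the box (the unmasked portion); replace the rest with fill."""
    r0, c0, r1, c1 = box
    return [
        [v if r0 <= r <= r1 and c0 <= c <= c1 else fill for c, v in enumerate(row)]
        for r, row in enumerate(image)
    ]
```

For example, `apply_mask(image, bounding_box([(0, 1), (1, 2)]))` keeps only the pixels in rows 0 to 1 and columns 1 to 2 of `image`.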

10. The method of any one of claims 1 to 3, wherein the first 3D shape is indexed as the first protein and the second 3D shape is indexed as the second protein.

11. The method of claim 1, wherein the first 3D shape is indexed as a first homolog of the first protein.

12. The method of claim 11, wherein the second 3D shape is indexed as a second homolog of the second protein.

13. The method of any one of claims 1 to 3, wherein each candidate pose pair includes a candidate location, a candidate orientation, and a candidate docking area.

14. The method of any one of claims 1 to 3, wherein the fit score is a cross-correlation score generated by projecting the docking model onto a 2D image.
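
By way of non-limiting illustration only, a cross-correlation score between a projected docking model and a 2D image might be computed as sketched below. The sketch assumes the docking model has already been rendered as a 3D density grid indexed `[z][y][x]`, projects it by summing along z, and uses a zero-shift normalized cross-correlation; these choices and the function names are assumptions, not details stated in the claims.

```python
from math import sqrt


def project_along_z(volume):
    """Sum a 3D density grid indexed [z][y][x] along z to obtain a 2D projection."""
    depth, height, width = len(volume), len(volume[0]), len(volume[0][0])
    return [
        [sum(volume[z][y][x] for z in range(depth)) for x in range(width)]
        for y in range(height)
    ]


def normalized_cross_correlation(a, b):
    """Return a score in [-1, 1] comparing two equally sized 2D grids."""
    flat_a = [v for row in a for v in row]
    flat_b = [v for row in b for v in row]
    if len(flat_a) != len(flat_b):
        raise ValueError("grids must contain the same number of pixels")
    mean_a = sum(flat_a) / len(flat_a)
    mean_b = sum(flat_b) / len(flat_b)
    num = sum((x - mean_a) * (y - mean_b) for x, y in zip(flat_a, flat_b))
    den = sqrt(
        sum((x - mean_a) ** 2 for x in flat_a)
        * sum((y - mean_b) ** 2 for y in flat_b)
    )
    # A constant grid has no variance; report zero correlation in that case.
    return num / den if den else 0.0
```

A score near 1 indicates that the projection and the image vary together pixel by pixel; a practical implementation would typically also search over in-plane shifts and rotations, which this sketch omits.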

15. The method of claim 1, wherein selecting one of the docking models as the detected model for the protein-protein complex based on the fit score comprises:
identifying a subset of the docking models based on their corresponding fit scores by one of the group consisting of:
i) selecting the docking models with the top N fit scores; and
ii) selecting all docking models having a fit score above a threshold M.
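
By way of non-limiting illustration only, the two subset-selection options (top N scores, or all scores above a threshold M) might be sketched as follows. The sketch assumes the scores are held in a dictionary keyed by a docking-model identifier; the function names are hypothetical.

```python
def select_top_n(scores, n):
    """Return the identifiers of the n highest-scoring docking models, best first."""
    return sorted(scores, key=scores.get, reverse=True)[:n]


def select_above_threshold(scores, m):
    """Return the identifiers of all docking models whose score exceeds m."""
    return [model for model, score in scores.items() if score > m]
```

The top-N option always returns a fixed number of candidates for review, whereas the threshold option may return none or many, depending on how the scores are distributed.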

16. The method of claim 15, wherein selecting one of the docking models as the detected model for the protein-protein complex includes receiving a second user input that selects one docking model of the subset as the detected model.

17. A system for detecting protein-protein complex interactions, the system comprising:
one or more processors; and
computer memory storing instructions that, when executed by the one or more processors, cause the one or more processors to perform operations comprising:
accessing a composite image comprising a plurality of complex images of a sample of a protein-protein complex comprising a first protein and a second protein;
accessing a first three-dimensional (3D) shape of the first protein and a second 3D shape of the second protein;
accessing a plurality of docking models, each of which defines a candidate pose pair;
applying, for each docking model, the first 3D shape, the second 3D shape, and the candidate pose pair to generate, for the docking model, a corresponding fit score describing how well the pose pair fits the docking model; and
selecting, based on the fit scores, one of the docking models as a detected model for the protein-protein complex.

18. The system of claim 17, wherein the operations further include generating a plurality of composite images.

19. The system of claim 18, wherein the operations further include generating a composite image from a plurality of complex images, wherein generating the composite image includes extracting protein-protein-complex sub-images from the complex images, and orienting and classifying the sub-images.

20. The system of any one of claims 17 to 19, wherein the composite image is a cryo-electron microscopy (cryo-EM) image.

21. The system of claim 17, wherein each complex image includes a plurality of pixels, each pixel having an address and carrying a color value that represents a corresponding portion of the protein-protein complex sample.

22. The system of claim 21, wherein the composite image includes a plurality of pixels, each pixel having an address and holding a color value that is a collection of color values of pixels having the same address in each of the plurality of composite images.

23. The system of any one of claims 17 to 19, wherein the composite image is masked without specific user input.

24. The system of claim 17, wherein the composite image is masked, and the masking includes receiving a first user input specifying an unmasked portion.

25. The system of claim 24, wherein receiving the first user input specifying the unmasked portion comprises:
generating a bounding box by connecting locations specified by the first user input; and
recording a portion of the composite image within the bounding box as the unmasked portion.

26. The system of any one of claims 17 to 19, wherein the first 3D shape is indexed as the first protein and the second 3D shape is indexed as the second protein.

27. The system of any one of claims 17 to 19, wherein the first 3D shape is indexed as a first homolog of the first protein.

28. The system of any one of claims 17 to 19, wherein the second 3D shape is indexed as a second homolog of the second protein.

29. The system of any one of claims 17 to 19, wherein each candidate pose pair includes a candidate location, a candidate orientation, and a candidate docking area.

30. The system of any one of claims 17 to 19, wherein the fit score is a cross-correlation score generated by projecting the docking model onto a 2D image.

31. The system of claim 17, wherein selecting one of the docking models as the detected model for the protein-protein complex based on the fit score comprises:
identifying a subset of the docking models based on their corresponding fit scores by one of the group consisting of:
i) selecting the docking models with the top N fit scores; and
ii) selecting all docking models having a fit score above a threshold M.

32. The system of claim 31, wherein selecting one of the docking models as the detected model for the protein-protein complex includes receiving a second user input that selects one docking model of the subset as the detected model.