JPWO2020028912A5


Info

Publication number: JPWO2020028912A5
Application number: JP2021505820A
Authority: JP
Publication date: 2022-08-09

Description

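Examples of changes, substitutions, and alterations can be ascertained by those skilled in the art and made without departing from the scope of the information disclosed herein. All references cited herein are hereby incorporated by reference in their entireties and made part of this application.
The present invention provides, for example, the following items.
(Item 1)
A method for reading information stored in nucleic acid sequences, the method comprising:
obtaining a pool of identifier nucleic acid molecules storing digital information from a string of symbols of length L, wherein each individual identifier nucleic acid molecule comprises a plurality of component nucleic acid molecules and corresponds to a symbol value and a symbol position in the string of symbols, and wherein the pool of identifier nucleic acid molecules corresponds to a subset of identifier nucleic acid sequences in an identifier library capable of encoding any string of symbols having length L;
reading an identifier nucleic acid molecule of the obtained identifier nucleic acid molecules to identify a read data sequence corresponding to a portion of the identifier nucleic acid molecule;
identifying, based on the read data sequence, a set of candidate identifier nucleic acid sequences, each of which (i) corresponds to an entry in the identifier library and (ii) comprises component nucleic acid molecules having sequences that approximately or exactly match the read data sequence;
assigning to each candidate identifier nucleic acid sequence a score representing how similar the respective candidate identifier nucleic acid sequence is to the identifier nucleic acid molecule; and
selecting, based on the scores, one of the candidate identifier nucleic acid sequences as a selected sequence.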
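The candidate-scoring and selection steps of item 1 can be illustrated with a minimal Python sketch. It makes the following assumptions:

- The identifier library is small and hypothetical, held as a dictionary of name-to-sequence entries.
- Levenshtein edit distance serves as the score. This is one possible distance metric in the sense of item 29.

A practical decoder would more likely narrow the candidates component by component rather than scanning the whole library.

```python
from typing import Dict, List, Tuple

def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance (insertions, deletions, substitutions) between two base strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # delete ca
                            curr[j - 1] + 1,            # insert cb
                            prev[j - 1] + (ca != cb)))  # substitute, or match at no cost
        prev = curr
    return prev[-1]

def rank_candidates(read: str, library: Dict[str, str]) -> List[Tuple[int, str]]:
    """Score every library entry against the read; a lower score means more similar."""
    return sorted((edit_distance(read, seq), name) for name, seq in library.items())

def select_identifier(read: str, library: Dict[str, str]) -> str:
    """Return the name of the best-scoring candidate as the selected sequence."""
    return rank_candidates(read, library)[0][1]

# Hypothetical three-entry library and a read carrying one substitution.
library = {"id0": "ACGTACGT", "id1": "TTGCAAGC", "id2": "ACGTTTGC"}
print(select_identifier("ACGTACCT", library))
```

`rank_candidates` returns the whole ordered list, so the next entry is a natural fallback when a later check rejects the first choice, as in item 23.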
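(Item 2)
The method of item 1, further comprising mapping, using the identifier library, the selected sequence to one of the symbol positions and one of the symbol values within the string of symbols.
(Item 3)
The method of item 2, further comprising determining additional symbol positions and symbol values within the string of symbols by mapping, using the identifier library, additional selected sequences corresponding to likely sequences of identifier nucleic acid molecules in the pool.
(Item 4)
The method of any of items 1 to 3, wherein reading the identifier nucleic acid molecule comprises sequencing at least a portion of the identifier nucleic acid molecule by at least one of chemical sequencing, chain-termination sequencing, shotgun sequencing, bridge PCR sequencing, single-molecule real-time sequencing, ion semiconductor sequencing, pyrosequencing, sequencing by synthesis, combinatorial probe-anchor synthesis sequencing, sequencing by ligation, nanopore sequencing, nanochannel sequencing, massively parallel signature sequencing, polony sequencing, DNA nanoball sequencing, single-molecule fluorescence sequencing, tunneling current sequencing, sequencing by hybridization, mass spectrometry sequencing, microfluidic sequencing, transmission electron microscopy sequencing, RNA polymerase sequencing, or in vitro virus sequencing.
(Item 5)
The method of item 4, wherein the sequencing comprises:
applying an electric field to an electrolyte and at least one nanopore channel;
translocating the identifier nucleic acid molecule through the at least one nanopore channel; and
measuring an impedance in the at least one nanopore channel, wherein each of the component nucleic acid sequences has a corresponding unique impedance signature along the length of the sequence.
(Item 6)
The method of item 5, wherein sequencing at least one component nucleic acid sequence in the identifier nucleic acid molecule comprises comparing measured impedance values with the unique impedance signatures.
(Item 7)
The method of item 5 or 6, wherein the at least one nanopore channel is formed from alpha-hemolysin (αHL) or Mycobacterium smegmatis porin A (MspA).
(Item 8)
The method of item 5 or 6, wherein the at least one nanopore channel is formed in a solid-state membrane.
(Item 9)
The method of any of items 5 to 8, further comprising ligating the at least one identifier nucleic acid molecule to a second identifier nucleic acid molecule prior to reading the identifier nucleic acid molecule.
(Item 10)
The method of any of items 5 to 9, further comprising degrading one strand of the at least one identifier nucleic acid molecule prior to reading the identifier nucleic acid molecule.
(Item 11)
The method of item 10, wherein a strand-specific exonuclease is used to selectively degrade one strand of the at least one identifier nucleic acid molecule.
(Item 12)
The method of any of items 5 to 11, wherein the electric field produces a differential potential greater than 100 mV across the at least one nanopore channel, and translocation of the at least one identifier nucleic acid molecule occurs at a rate greater than 1,000 bases per second.
(Item 13)
The method of any of items 5 to 12, further comprising binding an agent to the at least one identifier nucleic acid molecule prior to translocation, wherein the agent is associated with an agent signature in the measured impedance.
(Item 14)
The method of any of items 5 to 13, wherein at least one unique impedance signature comprises an agent signature, and determining at least one component nucleic acid sequence in the identifier nucleic acid molecule comprises comparing measured impedance values with the at least one unique impedance signature.
(Item 15)
The method of item 13 or 14, wherein the presence of the agent on the at least one nucleic acid molecule allows a first maximum translocation speed that achieves a desired level of accuracy, the first maximum translocation speed being faster than a second maximum translocation speed that achieves the desired level of accuracy in the absence of the agent on the at least one nucleic acid molecule.
(Item 16)
The method of any of items 13 to 15, wherein binding the agent to the at least one identifier nucleic acid molecule comprises using an enzyme, and the binding occurs at a known location in a component nucleic acid molecule such that the agent signature at the known location produces a known shift in impedance values during translocation.
(Item 17)
The method of item 16, wherein the agent is a base analog, the enzyme is a polymerase, and the polymerase incorporates the base analog into the at least one identifier nucleic acid molecule during replication.
(Item 18)
The method of any of items 13 to 17, further comprising binding each agent of a plurality of agents at a known location in the plurality of component nucleic acid molecules, wherein the plurality of agents and the known location of each agent comprise an agent signature.
(Item 19)
The method of item 16 or 18, wherein the enzyme is a methyltransferase.
(Item 20)
The method of any of items 1 to 19, wherein the obtained identifier nucleic acid molecules are encoded with a minimum number of base differences from one another such that each identifier nucleic acid molecule is associated with a read-error tolerance.
(Item 21)
The method of item 20, wherein the read-error tolerance allows faster reading of the identifier nucleic acid molecules.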
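Items 20 and 21 tie a minimum number of base differences between identifiers to a read-error tolerance. The following sketch makes that relationship concrete under stated assumptions:

- Identifiers have equal length.
- Read errors are substitutions only. Nanopore reads also contain insertions and deletions, which this measure ignores.

Under these assumptions, the base difference is the Hamming distance. With a minimum pairwise distance d, nearest-neighbour decoding can correct up to (d - 1) // 2 substitutions.

```python
from itertools import combinations
from typing import Sequence

def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs equal-length sequences")
    return sum(x != y for x, y in zip(a, b))

def min_distance(codes: Sequence[str]) -> int:
    """Smallest pairwise Hamming distance within a set of identifier sequences."""
    return min(hamming(a, b) for a, b in combinations(codes, 2))

def correctable_substitutions(codes: Sequence[str]) -> int:
    """Substitutions that nearest-neighbour decoding is guaranteed to correct."""
    return (min_distance(codes) - 1) // 2

# Hypothetical identifier set whose pairs differ in at least 3 bases.
codes = ["AAAAAA", "CCCAAA", "AAACCC", "CCCCCC"]
print(min_distance(codes), correctable_substitutions(codes))
```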
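(Item 22)
The method of any of items 1 to 21, further comprising:
determining a decoded string of symbols based on the mapping of the selected sequence to one of the symbol positions and one of the symbol values within the string of symbols;
computing a hash of a portion of the decoded string of symbols;
comparing the computed hash with an original hash associated with a corresponding portion of the string of symbols; and
verifying, based on the comparison, whether the portion of the decoded string of symbols matches the portion of the string of symbols.
(Item 23)
The method of item 22, further comprising:
determining that the portion of the decoded string of symbols does not match the portion of the string of symbols;
selecting, based on the scores, a second candidate identifier nucleic acid sequence as the selected sequence; and
mapping, using the identifier library, the selected sequence to one of the symbol positions and one of the symbol values within the string of symbols.
(Item 24)
The method of any of items 20 to 23, wherein the hash of the portion of the decoded string of symbols is computed using at least one of MD5, SHA-224, SHA-256, SHA-384, SHA-512, SHA-512/224, or SHA-512/256.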
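The hash check of items 22 to 24, including the fallback to the next-best candidate in item 23, can be sketched with Python's standard `hashlib` module. Several details are assumptions rather than anything the items specify:

- Decoded portions are represented as `bytes`.
- Alternative decodings are supplied in score order.
- SHA-256 is the default algorithm.

`hashlib.new` accepts the names `md5`, `sha224`, `sha256`, `sha384`, and `sha512`, which are listed in `hashlib.algorithms_guaranteed`. Whether `sha512_224` and `sha512_256` are available depends on the OpenSSL build.

```python
import hashlib
import hmac
from typing import Iterable, Optional

def portion_hash(symbols: bytes, algorithm: str = "sha256") -> bytes:
    """Hash a portion of the symbol string with any algorithm name hashlib.new accepts."""
    return hashlib.new(algorithm, symbols).digest()

def verify_portion(decoded: bytes, original_hash: bytes, algorithm: str = "sha256") -> bool:
    """Recompute the hash of a decoded portion and compare it with the stored original hash."""
    return hmac.compare_digest(portion_hash(decoded, algorithm), original_hash)

def first_verified(decodings: Iterable[bytes], original_hash: bytes,
                   algorithm: str = "sha256") -> Optional[bytes]:
    """Try alternative decodings (best score first) until one passes the hash check."""
    for decoded in decodings:
        if verify_portion(decoded, original_hash, algorithm):
            return decoded
    return None  # no candidate decoding reproduces the original hash
```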
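(Item 25)
The method of any of items 1 to 24, further comprising:
computing a sample size estimate for the pool of identifier nucleic acid molecules; and
sampling the pool of identifier nucleic acid molecules based on the sample size estimate to obtain the identifier nucleic acid molecules.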
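Item 25 does not say how the sample size estimate is computed. One common back-of-envelope approach, assumed here rather than taken from the source, treats reads as uniform draws with replacement from n distinct identifiers at equal concentrations. This is the coupon-collector setting:

- The expected number of reads needed to see every identifier is n·H_n, where H_n is the n-th harmonic number.
- A union bound gives a read count that misses any identifier with probability at most δ.

```python
import math

def expected_draws_to_see_all(n: int) -> float:
    """Coupon-collector expectation n * H_n: mean reads needed to observe all n identifiers."""
    return n * sum(1.0 / k for k in range(1, n + 1))

def draws_for_confidence(n: int, delta: float) -> int:
    """Smallest k with n * (1 - 1/n) ** k <= delta.

    By the union bound, k uniform reads then miss some identifier with probability <= delta.
    """
    if n == 1:
        return 1
    return math.ceil(math.log(delta / n) / math.log(1.0 - 1.0 / n))

# Hypothetical pool of 1,000 distinct identifiers, 5% tolerated chance of missing one.
print(round(expected_draws_to_see_all(1000)), draws_for_confidence(1000, 0.05))
```

In practice the number of distinct identifiers in the pool may only be known approximately. The identifier library size is then one conservative upper bound for n.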
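(Item 26)
The method of any of items 1 to 25, wherein each identifier nucleic acid molecule in the pool of nucleic acid molecules comprises M component nucleic acid molecules corresponding to M layers.
(Item 27)
The method of item 26, wherein reading the identifier nucleic acid molecule comprises reading N of the M layers.
(Item 28)
The method of any of items 1 to 27, further comprising replicating the identifier nucleic acid molecule such that the identifier nucleic acid molecule comprises modified bases.
(Item 29)
The method of any of items 1 to 28, wherein the score is a distance metric representing the degree of similarity between the respective candidate identifier nucleic acid sequence and the identifier nucleic acid molecule.
(Item 30)
A method for storing digital information in nucleic acid molecules, the method comprising:
receiving the digital information as a string of symbols, wherein each symbol in the string of symbols has a symbol value and a symbol position within the string of symbols, and the string of symbols has a length L;
determining a partitioning scheme for encoding the string of symbols using a set of C distinct component nucleic acid sequences, wherein the partitioning scheme defines (i) a number M of layers in which the C distinct component nucleic acid sequences are arranged and (ii) component numbers c_i defining the number of components in each i-th layer, such that the product of the component numbers c_i is greater than or equal to the length L of the string of symbols and the sum of the component numbers c_i is less than or equal to the number C of distinct component nucleic acid sequences;
forming a first identifier nucleic acid molecule by:
(1) selecting, from each of the M layers, one component nucleic acid molecule having a component nucleic acid sequence;
(2) placing the M selected component nucleic acid molecules in a compartment; and
(3) physically assembling the M selected component nucleic acid molecules of (2) to form the first identifier nucleic acid molecule;
forming a plurality of additional identifier nucleic acid molecules, each corresponding to a respective symbol position; and
collecting at least a portion of the identifier nucleic acid molecules in a pool.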
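Item 30 fixes only two constraints on the partitioning scheme: the product of the layer sizes c_i must be at least L, and their sum must not exceed C. The sketch below searches for such a scheme using a simple balancing heuristic assumed for illustration:

1. For each layer count M, repeatedly enlarge the smallest layer until the product reaches L.
2. Keep the M whose total is smallest.

This is not the source's procedure. Item 31 notes that non-uniform layer sizes are also allowed.

```python
import math
from typing import List, Optional

def balanced_layers(L: int, M: int) -> List[int]:
    """Grow M layer sizes evenly (always bumping the smallest) until their product reaches L."""
    sizes = [1] * M
    while math.prod(sizes) < L:
        i = sizes.index(min(sizes))
        sizes[i] += 1
    return sizes

def partition_scheme(L: int, C: int, max_layers: int = 8) -> Optional[List[int]]:
    """Return layer sizes c_i with prod(c_i) >= L and sum(c_i) <= C, preferring the smallest sum.

    Returns None when no layer count up to max_layers fits within C components.
    """
    best: Optional[List[int]] = None
    for M in range(1, max_layers + 1):
        sizes = balanced_layers(L, M)
        if sum(sizes) <= C and (best is None or sum(sizes) < sum(best)):
            best = sizes
    return best

# Hypothetical: 1,000 symbol positions with 40 distinct components available.
print(partition_scheme(1000, 40))
```

For L = 1,000 and C = 40, the heuristic settles on five layers of four components each. This gives 4^5 = 1,024 possible identifiers from 20 components.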
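(Item 31)
The method of item 30, wherein the distribution of the numbers c_i of component nucleic acid sequences across the M layers is non-uniform.
(Item 32)
The method of item 30 or 31, wherein the partitioning scheme is designed such that any string of symbols having length L can be represented by the identifier nucleic acid molecules formed from molecules having any combination of the C distinct component nucleic acid sequences, one component nucleic acid sequence being selected from each of the M layers.
(Item 33)
The method of any of items 30 to 32, wherein each layer in the identifier nucleic acid molecule represents a layer in a trie data structure.
(Item 34)
The method of any of items 30 to 33, wherein the component nucleic acid molecules in each layer are structured with first and second end regions, and the first end region of each component nucleic acid molecule from one of the M layers is structured to bind to the second end region of any component nucleic acid molecule from another of the M layers.
(Item 35)
The method of any of items 30 to 34, wherein each symbol position within the string of symbols has a corresponding, distinct identifier nucleic acid sequence.
(Item 36)
The method of any of items 30 to 35, wherein the identifier nucleic acid molecules represent a subset of a combinatorial space of possible identifier nucleic acid sequences, each comprising one component nucleic acid sequence from each of the M layers.
(Item 37)
The method of item 36, wherein the presence or absence of an identifier nucleic acid molecule in the pool represents the symbol value of the corresponding symbol position within the string of symbols.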
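Items 35 to 37 describe the encoding as follows:

- Each symbol position has its own identifier.
- Each identifier takes one component from each layer.
- The presence or absence of that identifier in the pool carries the symbol value.

The sketch below assumes identifiers are numbered in mixed-radix (lexicographic) order, with layer 1 as the most significant digit. Each digit then also corresponds to one level of the trie mentioned in item 33. This ordering convention is an assumption; the items do not specify it.

```python
from typing import Sequence, Set, Tuple

def position_to_components(pos: int, sizes: Sequence[int]) -> Tuple[int, ...]:
    """Mixed-radix digits of pos: digit i selects one of sizes[i] components in layer i."""
    digits = []
    for c in reversed(sizes):
        pos, d = divmod(pos, c)
        digits.append(d)
    if pos:
        raise ValueError("position exceeds the combinatorial space")
    return tuple(reversed(digits))

def components_to_position(combo: Sequence[int], sizes: Sequence[int]) -> int:
    """Inverse mapping: component choices per layer back to a symbol position."""
    pos = 0
    for d, c in zip(combo, sizes):
        pos = pos * c + d
    return pos

def encode_bits(bits: str, sizes: Sequence[int]) -> Set[Tuple[int, ...]]:
    """Presence/absence encoding: include the identifier of every position whose bit is '1'."""
    return {position_to_components(i, sizes) for i, b in enumerate(bits) if b == "1"}

def decode_bits(pool: Set[Tuple[int, ...]], length: int, sizes: Sequence[int]) -> str:
    """Read back the bit string from the set of identifiers found in the pool."""
    present = {components_to_position(c, sizes) for c in pool}
    return "".join("1" if i in present else "0" for i in range(length))

# Hypothetical three-layer library with 2 x 3 x 2 = 12 identifiers.
sizes = [2, 3, 2]
pool = encode_bits("101100000011", sizes)
print(sorted(pool), decode_bits(pool, 12, sizes))
```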
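(Item 38)
The method of any of items 30 to 37, wherein the product of the component numbers c_i is greater than or equal to the length of the string of symbols in bits.
(Item 39)
The method of any of items 30 to 38, wherein the partitioning scheme is further based on a configuration of a printer system capable of placing any set of M selected component nucleic acid molecules in a compartment.
(Item 40)
The method of item 39, wherein C is equal to the number of available inks in the printer system, each available ink comprising one component nucleic acid sequence.
(Item 41)
A method for storing digital information in nucleic acid molecules, the method comprising:
receiving the digital information as a first string of symbols having a length L1, wherein each symbol in the first string of symbols has a symbol value and a symbol position within the first string of symbols;
dividing the string of symbols into a plurality of blocks, each block having a length B;
computing, for each block, a hash of length H and adding the hash to the block to obtain a hashed block;
concatenating the hashed blocks to form a second string of symbols having a length L2;
dividing the second string of symbols into a plurality of slices, each slice having a length S;
computing, for each slice, error protection symbols of length P and appending the error protection symbols to the slice to obtain an error-protected slice;
concatenating the error-protected slices to form a third string of symbols having a length L3;
dividing the third string of symbols into a plurality of words, each word having a length W;
determining, for each word, a codeword using one or more codebooks;
concatenating the codewords to form a fourth string of symbols having a length L4;
mapping the fourth string of symbols to a plurality of identifier nucleic acid molecules, wherein individual identifier nucleic acid molecules of the plurality of identifier nucleic acid molecules correspond to individual symbols in the fourth string of symbols and comprise a corresponding plurality of component nucleic acid sequences, each component nucleic acid sequence in the plurality of component nucleic acid sequences comprising a distinct nucleic acid sequence; and
constructing the individual identifier nucleic acid molecules of the plurality of identifier nucleic acid molecules by placing the corresponding plurality of component nucleic acid sequences in compartments and assembling the plurality of component nucleic acid sequences together.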
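The length bookkeeping of item 41 (L1 to L2 to L3 to L4) can be checked with a short calculation. The sketch makes two assumptions that the items do not address:

- Each stage zero-pads its input to a whole number of blocks, slices, or words. The items do not say how a short final block is handled.
- Each W-symbol word maps to a codeword of length K. K is a hypothetical parameter; the items do not name the codeword length.

```python
import math
from dataclasses import dataclass

@dataclass
class PipelineLengths:
    L2: int  # after appending an H-symbol hash to every B-symbol block
    L3: int  # after appending P protection symbols to every S-symbol slice
    L4: int  # after replacing every W-symbol word with a K-symbol codeword

def pipeline_lengths(L1: int, B: int, H: int, S: int, P: int, W: int, K: int) -> PipelineLengths:
    """String lengths produced by item 41's stages, assuming zero-padding at each stage."""
    n_blocks = math.ceil(L1 / B)
    L2 = n_blocks * (B + H)
    n_slices = math.ceil(L2 / S)
    L3 = n_slices * (S + P)
    n_words = math.ceil(L3 / W)
    L4 = n_words * K
    return PipelineLengths(L2, L3, L4)

print(pipeline_lengths(8000, B=1000, H=32, S=200, P=16, W=6, K=8))
```

With L1 = 8,000, B = 1,000, H = 32, S = 200, P = 16, W = 6, and K = 8, the string grows to L2 = 8,256, L3 = 9,072, and L4 = 12,096 symbols. Because item 41 maps each symbol of the fourth string to an identifier, L4 is also the number of identifier positions the library must cover.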
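(Item 42)
The method of item 41, further comprising collecting the plurality of identifiers in a pool.
(Item 43)
The method of item 41 or 42, wherein the presence or absence of an identifier nucleic acid molecule in the pool represents the symbol value of the corresponding symbol position within the string of symbols.
(Item 44)
The method of any of items 41 to 43, wherein each codeword is an exact match of the respective word of the plurality of words.
(Item 45)
The method of any of items 41 to 44, wherein the codewords are optimized for chemical conditions during encoding or decoding.
(Item 46)
The method of any of items 41 to 45, wherein the codewords have a fixed weight such that each codeword contains a fixed number of symbols of one or more types.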
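Item 46 describes fixed-weight codewords. One standard way to build such a codebook works in two steps:

1. Enumerate every K-bit string with exactly `weight` ones.
2. Assign those strings to the 2^W possible words.

With K = 8 and weight 4 there are C(8, 4) = 70 codewords, which is enough for W = 6. The parameters and the lexicographic assignment below are illustrative assumptions.

```python
from itertools import combinations
from math import comb
from typing import Dict, List

def constant_weight_codebook(K: int, weight: int, W: int) -> Dict[str, str]:
    """Map every W-bit word to a distinct K-bit codeword containing exactly `weight` ones."""
    if comb(K, weight) < 2 ** W:
        raise ValueError("not enough fixed-weight codewords for all W-bit words")
    codewords: List[str] = []
    for ones in combinations(range(K), weight):  # positions of the 1s, in lexicographic order
        codewords.append("".join("1" if i in ones else "0" for i in range(K)))
        if len(codewords) == 2 ** W:
            break
    return {format(v, f"0{W}b"): cw for v, cw in enumerate(codewords)}

book = constant_weight_codebook(K=8, weight=4, W=6)
print(len(book), book["000000"], book["000001"])
```

If a 1 means "identifier present", every codeword contributes the same number of identifiers. This is one way to keep compartment contents uniform in the spirit of items 47 and 49.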
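(Item 47)
The method of any of items 41 to 46, wherein each compartment contains a fixed number of identifier nucleic acid sequences, and the concentrations of identifier nucleic acid molecules within each compartment and across compartments are approximately equal.
(Item 48)
The method of any of items 41 to 47, wherein placing the corresponding plurality of components in compartments and assembling the plurality of components together comprises:
dispensing, using a plurality of printheads, a plurality of solutions containing the plurality of components to coordinates on a substrate; and
dispensing a reaction mix to the coordinates on the substrate to physically link the plurality of components, to provide the conditions necessary to physically link the plurality of components, or both.
(Item 49)
The method of any of items 41 to 48, wherein mapping the fourth string of symbols to a plurality of identifier nucleic acid molecules comprises distributing the identifier nucleic acid molecules such that each compartment of the plurality of compartments contains the same number of identifier nucleic acid molecules.
(Item 50)
The method of any of items 41 to 49, wherein the error protection symbols are determined using a Reed-Solomon code.
(Item 51)
The method of item 41, wherein the error protection symbols provide an error tolerance of P/2 symbols and an erasure tolerance of P erased symbols in a protected slice.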
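Item 51 states the usual Reed-Solomon guarantee:

- A code with P parity symbols has minimum distance P + 1.
- It can therefore correct any combination of e symbol errors and f erasures with 2e + f ≤ P.
- P/2 errors alone and P erasures alone are the two extremes.

The helpers below encode that bound. They also encode the length limit for a standard code over GF(2^m), assuming byte-sized symbols by default. They do not implement encoding or decoding.

```python
def rs_correctable(errors: int, erasures: int, P: int) -> bool:
    """True if an RS code with P parity symbols (minimum distance P + 1) can correct this mix."""
    return 2 * errors + erasures <= P

def rs_slice_fits(S: int, P: int, symbol_bits: int = 8) -> bool:
    """A standard (non-extended) RS code over GF(2^m) has codewords of at most 2^m - 1 symbols."""
    return S + P <= 2 ** symbol_bits - 1

# With P = 16: up to 8 errors, or 16 erasures, or mixes such as 4 errors plus 8 erasures.
print(rs_correctable(8, 0, 16), rs_correctable(0, 16, 16), rs_correctable(5, 7, 16))
```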
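(Item 52)
The method of any of items 41 to 51, wherein the plurality of compartments is located on a substrate, the method further comprising reordering, interleaving, or programming the step of placing the corresponding plurality of components in the compartments such that identifier nucleic acid molecules representing adjacent symbols in the string of symbols are not constructed in adjacent compartments.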
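Item 52 requires that adjacent symbols not be built in adjacent compartments. The sketch makes the following assumptions for illustration:

- The plate is rectangular and addressed in row-major order.
- "Adjacent" means sharing an edge.

Under these assumptions, a simple stride permutation is one way to meet the requirement. Symbol i goes to compartment i·k mod n, with k coprime to n so that no compartment is reused. The sketch searches for the smallest workable stride.

```python
from math import gcd
from typing import List, Optional

def neighbours(a: int, b: int, cols: int) -> bool:
    """True if compartments a and b share an edge on a row-major grid with `cols` columns."""
    ra, ca = divmod(a, cols)
    rb, cb = divmod(b, cols)
    return abs(ra - rb) + abs(ca - cb) == 1

def stride_layout(n_symbols: int, rows: int, cols: int) -> Optional[List[int]]:
    """Compartment index for each symbol, using i -> (i * k) % n for the first workable stride k."""
    n = rows * cols
    if n_symbols > n:
        return None
    for k in range(2, n):
        if gcd(k, n) != 1:  # a non-coprime stride would send two symbols to one compartment
            continue
        layout = [(i * k) % n for i in range(n_symbols)]
        if not any(neighbours(layout[i], layout[i + 1], cols) for i in range(n_symbols - 1)):
            return layout
    return None

layout = stride_layout(96, rows=8, cols=12)  # e.g. a 96-compartment, 8 x 12 plate
print(layout[:6])
```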
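(Item 53)
The method of any of items 41 to 52, further comprising:
developing a set of printer instructions based on the mapping of the fourth string of symbols to the plurality of identifier nucleic acid molecules; and
sending the set of printer instructions to a printer-finisher system.
(Item 54)
The method of any of items 41 to 53, wherein the hashing, error protection, or codeword determination is performed on individual blocks of the plurality of blocks.
(Item 55)
The method of item 54, wherein the hashing, error protection, or codeword determination is performed in parallel on the individual block and additional blocks.
(Item 56)
The method of any of items 41 to 55, wherein H is equal to zero.
(Item 57)
The method of any of items 41 to 56, wherein P is equal to zero.
(Item 58)
The method of any of items 41 to 57, wherein additional error protection symbols or hash symbols are stored in a magnetic storage device, an optical storage device, a flash memory device, or cloud storage.
(Item 59)
A method for storing digital information in nucleic acid molecules, the method comprising:
receiving the digital information as a first string of symbols, wherein each symbol in the first string of symbols has a symbol value and a symbol position within the first string of symbols, and the first string of symbols has a length L1;
dividing the string of symbols into a plurality of blocks, each block having a length B;
computing, for each block, a hash of length H;
storing the hash for each block;
assembling the hashed blocks to form a second string of symbols having a length L2;
dividing the second string of symbols into a plurality of slices, each slice having a length S;
computing, for each slice, error protection symbols of length P and adding the error protection symbols to the end of the slice to obtain an error-protected slice;
assembling the error-protected slices to form a third string of symbols having a length L3;
dividing the third string of symbols into a plurality of words, each word having a length W;
computing, for each word, a codeword using one or more codebooks;
concatenating the codewords to form a fourth string of symbols having a length L4;
mapping the fourth string of symbols to a plurality of identifier nucleic acid molecules, wherein individual identifiers of the plurality of identifier nucleic acid molecules comprise a corresponding plurality of components, each component in the plurality of components comprises a distinct nucleic acid sequence, and each individual identifier of the plurality of identifier nucleic acid molecules corresponds to an individual symbol in the fourth string of symbols; and
constructing the individual identifiers of the plurality of identifiers by placing the corresponding plurality of components in compartments and assembling the plurality of components together.
(Item 60)
The method of item 59, further comprising constructing an identifier pool comprising the plurality of identifiers.
(Item 61)
The method of item 59 or 60, wherein the hash for each block is stored in nucleic acid molecules, a magnetic storage device, an optical storage device, a flash memory device, or cloud storage.
(Item 62)
A method for storing digital information in nucleic acid molecules, the method comprising:
obtaining a plurality of blocks, each block comprising a string of symbols and being associated with a block ID;
assigning a block of the plurality of blocks to a container;
mapping the block to a plurality of identifier nucleic acid sequences to be associated with the container, wherein individual identifier nucleic acid sequences of the plurality of identifier nucleic acid sequences correspond to individual symbols in the string of symbols and comprise a corresponding plurality of component nucleic acid sequences, each component nucleic acid sequence in the plurality of component nucleic acid sequences comprising a distinct nucleic acid sequence;
constructing individual identifier nucleic acid molecules of the plurality of identifier nucleic acid sequences; and
storing the individual identifier nucleic acid molecules in the assigned container, wherein a physical address, comprising the identities of the container and of the plurality of identifier nucleic acid sequences associated with it, is configured to be determined using the associated block ID.
(Item 63)
The method of item 62, wherein the block ID is an integer, a string, a triple, a list of attributes, or a semantic annotation.
(Item 64)
The method of item 62 or 63, wherein the physical address is stored in a data structure designed to facilitate access to the physical address using the associated block ID.
(Item 65)
The method of any of items 62 to 64, wherein the data structure is one of a B-tree, a trie, or an array.
(Item 66)
The method of item 64 or 65, wherein at least a portion of the data structure is stored in an index together with the digital information.
(Item 67)
The method of item 66, wherein the index comprises a second plurality of identifier nucleic acid sequences associated with a second container.
(Item 68)
The method of item 67, wherein the index comprises a B-tree data structure, and each node of the B-tree comprises a distinct plurality of identifier nucleic acid molecules of the second plurality of identifier nucleic acid sequences.
(Item 69)
The method of item 68, wherein searching for the block ID in the B-tree comprises:
(i) selecting the distinct plurality of identifier nucleic acid molecules comprising a first node;
(ii) reading the value of the first node; and
(iii) repeating the process of steps (i) and (ii) with a subsequent node, wherein the identity of the distinct plurality of identifier nucleic acid molecules comprising the subsequent node is determined by the block ID relative to the value of the first node.
(Item 70)
The method of item 69, wherein the first node is the root node of the B-tree, the process of steps (i) and (ii) continues until the value of a leaf node of the B-tree is read, and the value of the leaf node is configured to communicate whether a block exists for the block ID and, if the block ID exists, to communicate the physical address of the block.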
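The node-by-node search of items 69 and 70 can be simulated in software. In this sketch, assumed for illustration:

- Each entry of the hypothetical `index` dictionary stands for one node, that is, one distinct set of identifier molecules read as a unit.
- Internal nodes hold separator keys and child node IDs.
- Only leaf nodes hold physical addresses, as in a B+-tree variant.
- The address strings are made up.

```python
from bisect import bisect_right
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

@dataclass
class Node:
    keys: List[int]
    children: List[str] = field(default_factory=list)   # child node IDs; empty for a leaf
    addresses: List[str] = field(default_factory=list)  # leaf payloads, aligned with keys

def lookup(block_id: int, index: Dict[str, Node], root: str) -> Tuple[Optional[str], int]:
    """Walk from the root to a leaf, counting node reads; return (physical address or None, reads)."""
    node_id, reads = root, 0
    while True:
        node = index[node_id]  # steps (i) and (ii): select the node's identifiers and read its value
        reads += 1
        if not node.children:  # leaf: report whether a block exists for this ID
            if block_id in node.keys:
                return node.addresses[node.keys.index(block_id)], reads
            return None, reads
        # step (iii): the block ID relative to the separator keys picks the next node
        node_id = node.children[bisect_right(node.keys, block_id)]

index = {
    "root": Node(keys=[100], children=["leafA", "leafB"]),
    "leafA": Node(keys=[7, 42], addresses=["container-3/ids 0-511", "container-3/ids 512-1023"]),
    "leafB": Node(keys=[100, 250], addresses=["container-9/ids 0-511", "container-9/ids 512-1023"]),
}
print(lookup(42, index, "root"), lookup(99, index, "root"))
```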
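(Item 71)
The method of item 67, wherein the index comprises a trie data structure, and each node of the trie comprises a distinct plurality of identifier nucleic acid molecules of the second plurality of identifier nucleic acid sequences.
(Item 72)
The method of item 71, wherein the block ID is a string of symbols, and each node in the trie corresponds to a possible prefix of the string of symbols.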
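Items 72 and 86 describe a trie whose nodes are prefixes of the block ID and whose leaves store physical addresses. A minimal sketch follows, under these assumptions:

- Block IDs are strings.
- Each node (one distinct set of identifier molecules) is represented as a dictionary entry keyed by its prefix.
- The IDs and addresses in the usage example are hypothetical.

```python
from typing import Dict, Optional, Tuple

def build_trie(entries: Dict[str, str]) -> Dict[str, Optional[str]]:
    """Create a node for every prefix of every block ID; full IDs (leaves) carry the address."""
    trie: Dict[str, Optional[str]] = {"": None}  # root node
    for block_id, address in entries.items():
        for i in range(1, len(block_id)):
            trie.setdefault(block_id[:i], None)  # interior prefix nodes hold no address
        trie[block_id] = address
    return trie

def trie_lookup(block_id: str, trie: Dict[str, Optional[str]]) -> Tuple[Optional[str], int]:
    """Read nodes prefix by prefix; stop early when a prefix node is missing."""
    reads = 0
    for i in range(len(block_id) + 1):
        reads += 1
        if block_id[:i] not in trie:
            return None, reads
    return trie[block_id], reads

trie = build_trie({"0110": "container-1", "0101": "container-2"})
print(trie_lookup("0110", trie), trie_lookup("1", trie))
```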
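(Item 73)
The method of item 67, wherein the data structure is an array, and each element of the array comprises a distinct plurality of identifier nucleic acid molecules of the second plurality of identifier nucleic acid sequences.
(Item 74)
The method of item 73, wherein each element in the array corresponds to a block ID.
(Item 75)
The method of item 66, wherein the index is stored in a magnetic storage device, an optical storage device, a flash memory device, or cloud storage.
(Item 76)
The method of item 67, wherein the location of the physical address in the index is natively configured in the block ID.
(Item 77)
The method of item 76, wherein the block ID maps directly to a plurality of nucleic acid components.
(Item 78)
The method of item 77, wherein the plurality of identifier nucleic acid molecules in the index that store the physical address consist of individual identifier nucleic acid molecules each comprising the plurality of components.
(Item 79)
The method of item 77 or 78, wherein the block ID is an entity of a triple that annotates the associated block, and the entity maps to a plurality of nucleic acid components.
(Item 80)
The method of item 79, wherein the plurality of identifier nucleic acid molecules in the index, comprising individual identifier nucleic acid molecules that comprise the plurality of nucleic acid components, store the physical addresses of all blocks annotated by the entity.
(Item 81)
The method of item 67, wherein the physical address is natively configured in the block ID.
(Item 82)
The method of item 76, wherein the block ID maps directly to the physical address.
(Item 83)
The method of item 82, wherein the plurality of identifier nucleic acid molecules storing the associated block consist of individual identifier nucleic acid molecules each comprising the plurality of components.
(Item 84)
The method of item 82 or 83, wherein the block ID is an entity of a triple that annotates the associated block, and the entity maps to a plurality of nucleic acid components.
(Item 85)
The method of item 82, wherein the plurality of identifier nucleic acid molecules in the container, comprising individual identifier nucleic acid molecules that comprise the plurality of nucleic acid components, store all blocks annotated by the entity.
(Item 86)
The method of item 72, wherein a leaf node of the trie is configured to store the physical address associated with the block ID that matches the string of symbols specified by the trie at the leaf node.
(Item 87)
The method of item 74, wherein each element of the array stores the physical address of the associated block ID.
(Item 88)
The method of any of items 62 to 87, wherein the plurality of identifier nucleic acid molecules consists entirely of consecutively ordered identifier nucleic acid sequences, such that it can be specified by an identifier range comprising the identities of the first and last identifiers in the range.
(Item 89)
The method of item 88, wherein the first and last identifiers in the identifier range are represented by integers.
(Item 90)
The method of any of items 62 to 89, further comprising storing boot and ontology information together with the digital information.
(Item 91)
A system for storing digital information in nucleic acid molecules according to any of the methods of items 30 to 90, the system comprising a sample management system for storing a plurality of containers of nucleic acids.
(Item 92)
The system of item 91, further comprising an automated machine for retrieving a designated container from the sample management system.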
The present invention provides, for example, the following items.
(Item 1)
A method for reading information stored in a nucleic acid sequence, comprising:
Obtaining a pool of identifier nucleic acid molecules that store digital information from a string of symbols of length L, each individual identifier nucleic acid molecule comprising a plurality of component nucleic acid molecules and symbolic values in said string of symbols and symbol positions, wherein said pool of identifier nucleic acid molecules corresponds to a subset of identifier nucleic acid sequences in an identifier library that can encode any string of symbols having length L;
reading an identifier nucleic acid molecule of said obtained identifier nucleic acid molecule, identifying a read data sequence corresponding to a portion of said identifier nucleic acid molecule;
candidate identifier nucleic acid sequences, each comprising component nucleic acid molecules that (i) correspond to entries in the identifier library and (ii) have sequences that closely or exactly match the read data sequences, based on the read data sequences; identifying a set;
assigning each candidate identifier nucleic acid sequence a score representative of how similar said respective candidate identifier nucleic acid sequence is to said identifier nucleic acid molecule;
selecting one of the candidate identifier nucleic acid sequences as the selected sequence based on the score;
method including.
(Item 2)
2. The method of item 1, further comprising mapping the selected sequence to one of the symbol positions and one of the symbol values in the string of symbols using the identifier library.
(Item 3)
additional symbol positions and symbols within said string of symbols by mapping additional selected sequences corresponding to potential sequences of identifier nucleic acid molecules in said pool using said identifier library; 3. The method of item 2, further comprising determining a value.
(Item 4)
The step of reading the identifier nucleic acid molecule is chemical sequencing, chain termination sequencing, shotgun sequencing, bridge PCR sequencing, single molecule real-time sequencing, ion-semiconductor sequencing, pyrosequencing, sequencing-by-synthesis. , combinatorial probe anchor synthesis sequencing, sequencing by ligation, nanopore sequencing, nanochannel sequencing, massively parallel signature sequencing, polony sequencing, DNA nanoball sequencing, single molecule fluorescence sequencing, tunneling current sequencing, high sequencing at least a portion of said identifier nucleic acid molecule by at least one of sequencing by hybridization, mass spectrometry sequencing, microfluidic sequencing, transmission electron microscope sequencing, RNA polymerase sequencing or in vitro viral sequencing 4. The method of any of items 1-3, comprising:
(Item 5)
The sequencing comprises:
applying an electric field to the electrolyte and the at least one nanopore channel;
translocating the identifier nucleic acid molecule through the at least one nanopore channel;
5. The method of item 4, comprising measuring impedance in said at least one nanopore channel, wherein said component nucleic acid sequences each have a corresponding unique impedance signature along said length of said sequence. Method.
(Item 6)
6. The method of item 5, wherein sequencing at least one component nucleic acid sequence in said identifier nucleic acid molecule comprises comparing measured impedance values to said unique impedance signature.
(Item 7)
7. The method of items 5 or 6, wherein said at least one nanopore channel is formed from alpha-hemolysin (αHL) or mycobacterium smegmatis porin A (MspA).
(Item 8)
7. The method of items 5 or 6, wherein the at least one nanopore channel is formed in a solid state membrane.
(Item 9)
9. The method of any of items 5-8, further comprising ligating said at least one identifier nucleic acid molecule to a second identifier nucleic acid molecule prior to reading the identifier nucleic acid molecule.
(Item 10)
10. The method of any of items 5-9, further comprising degrading one strand of said at least one identifier nucleic acid molecule prior to reading said identifier nucleic acid molecule.
(Item 11)
11. The method of item 10, wherein a strand-specific exonuclease is used to selectively degrade one strand of said at least one identifier nucleic acid molecule.
(Item 12)
from item 5, wherein said electric field produces a differential potential greater than 100 mV across said at least one nanopore channel, and translocation of said at least one identifier nucleic acid molecule occurs at a rate greater than 1,000 bases per second 12. The method according to any one of 11.
(Item 13)
13. The method of any of items 5-12, further comprising binding an agent to said at least one identifier nucleic acid molecule prior to translocation, wherein said agent is associated with a drug signature in measured impedance.
(Item 14)
at least one unique impedance signature comprises a drug signature, and determining at least one component nucleic acid sequence in said identifier nucleic acid molecule is comparing measured impedance values to said at least one unique impedance signature. 14. The method of any of items 5-13, comprising:
(Item 15)
the presence of the agent on the at least one nucleic acid molecule is faster than a second maximum translocation rate that achieves a desired level of precision in the absence of the agent on the at least one nucleic acid molecule; 15. Method according to item 13 or 14, allowing a first maximum transition speed to achieve a desired level of accuracy.
(Item 16)
Binding the agent to the at least one identifier nucleic acid molecule comprises using an enzyme, wherein the agent signature at a known location results in a known shift in impedance values during translocation, such that the at least one 16. The method of any of items 13-15, wherein the step of binding the agent to the identifier nucleic acid molecules occurs at known locations on the component nucleic acid molecules.
(Item 17)
17. The method of item 16, wherein said agent is a base analogue, said enzyme is a polymerase, and said polymerase incorporates said base analogue into said at least one identifier nucleic acid molecule during replication.
(Item 18)
18. Any of items 13-17, further comprising binding each agent of the plurality of agents at a known location in the plurality of component nucleic acid molecules, wherein the plurality of agents and the known location of each agent comprises a drug signature. The method described in .
(Item 19)
19. The method of any one of items 16 and 18, wherein said enzyme is a methyltransferase.
(Item 20)
20. A method according to any of items 1 to 19, wherein the obtained identifier nucleic acid molecules are encoded with a minimum number of base differences from each other such that each identifier nucleic acid molecule is associated with read error tolerance.
(Item 21)
21. The method of item 20, wherein said read error tolerance allows faster reading of said identifier nucleic acid molecule.
(Item 22)
determining a decoded string of symbols based on a mapping of the selected array to one of the symbol positions and one of the symbol values within the string of symbols;
computing a hash of the portion of the string of decoded symbols;
comparing the calculated hash to the original hash associated with the corresponding portion of the string of symbols;
verifying whether the portion of the decoded string of symbols matches the portion of the string of symbols based on the comparison;
22. The method of any of items 1-21, further comprising:
(Item 23)
determining that the portion of the decoded string of symbols does not match the portion of the string of symbols;
selecting a second candidate identifier nucleic acid sequence as the selected sequence based on the score;
mapping the selected sequence to one of the symbol positions and one of the symbol values within the string of symbols using the identifier library;
23. The method of item 22, further comprising:
(Item 24)
the hash of the portion of the string of decoded symbols is at least one of MD5, SHA-224, SHA-256, SHA-384, SHA-512, SHA-512/224 or SHA-512/256; 24. The method of any of items 20-23, calculated using:
(Item 25)
computing a sample size estimate for said pool of identifier nucleic acid molecules;
sampling said pool of identifier nucleic acid molecules to obtain said identifier nucleic acid molecules based on said sample size estimate;
The method of any of items 1-A4, further comprising:
(Item 26)
26. The method of any of items 1-25, wherein each identifier nucleic acid molecule in said pool of nucleic acid molecules comprises M component nucleic acid molecules corresponding to M layers.
(Item 27)
27. The method of item 26, wherein reading the identifier nucleic acid molecule comprises reading N of the M layers.
(Item 28)
28. The method of any of items 1-27, further comprising replicating the identifier nucleic acid molecule such that the identifier nucleic acid molecule comprises modified bases.
(Item 29)
29. The method of any of items 1-28, wherein said score is a distance metric representative of the degree of similarity between said respective candidate identifier nucleic acid sequence and said identifier nucleic acid molecule.
(Item 30)
A method for storing digital information in a nucleic acid molecule comprising:
receiving the digital information as a string of symbols, each symbol in the string of symbols having a symbol value and a symbol position within the string of symbols, the string of symbols having a length L; When,
determining a partitioning scheme for encoding the string of symbols using a set of C distinct component nucleic acid sequences, wherein the partitioning scheme is such that the product of the component numbers c _i is the number of components of the symbol greater than or equal to the length L of the string and such that the sum of said number of components c _i is less than or equal to said number C of distinct component nucleic acid sequences, (i) within defining a number M of layers for arranging said C distinct component nucleic acid sequences, and (ii) said component number c _i defining the number of components in each ith layer;
(1) selecting one component nucleic acid molecule having a component nucleic acid sequence from each of said M layers;
(2) placing the M selected component nucleic acid molecules into compartments;
(3) physically assembling the M selected component nucleic acid molecules in (2) to form a first identifier nucleic acid molecule;
forming a first identifier nucleic acid molecule by
forming a plurality of additional identifier nucleic acid molecules, each corresponding to a respective symbolic position;
collecting at least a portion of said identifier nucleic acid molecules in a pool.
(Item 31)
31. The method of item 30, wherein the distribution of the number c _i of component nucleic acid sequences in each of the M layers is non-uniform.
(Item 32)
said partitioning scheme is designed such that any string of symbols having a length L can be represented by said identifier nucleic acid molecule formed from molecules having any combination of said C distinct component nucleic acid sequences; and one component nucleic acid sequence is selected from each of said M layers.
(Item 33)
33. The method of any of items 30-32, wherein each layer in the identifier nucleic acid molecule represents a layer in the trie data structure.
(Item 34)
wherein said component nucleic acid molecules in each layer are structured by first and second terminal regions, wherein said first terminal region of each component nucleic acid molecule from one of said M layers comprises said M layers; 34. A method according to any of items 30-33, wherein any component nucleic acid molecule derived from another of is structured to bind to said second terminal region.
(Item 35)
35. The method of any of items 30-34, wherein each symbol position within the string of symbols has a corresponding different identifier nucleic acid sequence.
(Item 36)
36. Any of items 30 to 35, wherein said identifier nucleic acid molecule represents a subset of the combinatorial space of possible identifier nucleic acid sequences, each comprising one component nucleic acid sequence from each of said M layers. Method.
(Item 37)
37. The method of item 36, wherein the presence or absence of an identifier nucleic acid molecule in said pool represents said symbolic value of said respective corresponding symbolic position within said string of symbols.
(Item 38)
_38. A method according to any of items 30 to 37, wherein said product of said component numbers ci exceeds or equals the length of said string of symbols in bits.
(Item 39)
39. The method of any of items 30-38, wherein said partitioning scheme is further based on the configuration of a printer system capable of placing at least any set of M selected component nucleic acid molecules into compartments.
(Item 40)
40. The method of item 39, wherein C equals the number of available inks in said printer system, each available ink comprising one component nucleic acid sequence.
(Item 41)
A method for storing digital information in a nucleic acid molecule comprising:
receiving digital information as a first string of symbols having a length L1, each symbol in said first string of symbols having a symbol value and a symbol position within said first string of symbols; a step;
dividing the string of symbols into a plurality of blocks, each block having a length B;
computing a hash of length H for each block and adding said hash to said block to obtain a hashed block;
concatenating the hashed blocks to form a second string of symbols having a length L2;
dividing the second string of symbols into a plurality of slices, each slice having a length S;
computing, for each slice, a number of error protection symbols of length P and appending said error protection symbols to said slice to obtain an error protected slice;
forming a third string of symbols having a length L3 by concatenating the error protected slices;
dividing the third string of symbols into a plurality of words, each word having a length W;
determining a codeword, word by word, using one or more codebooks;
concatenating said codewords to form a fourth string of symbols having a length L4;
mapping the fourth string of symbols to a plurality of identifier nucleic acid molecules, wherein individual identifier nucleic acid molecules of the plurality of identifier nucleic acid molecules correspond to individual symbols in the fourth string of symbols; comprising a corresponding plurality of component nucleic acid sequences, each component nucleic acid sequence in said plurality of component nucleic acid sequences comprising a separate nucleic acid sequence;
constructing individual identifier nucleic acid molecules of the plurality of identifier nucleic acid molecules by placing the corresponding plurality of component nucleic acid sequences in compartments and assembling together the plurality of component nucleic acid sequences;
method including.
(Item 42)
42. The method of item 41, further comprising collecting the plurality of identifiers in a pool.
(Item 43)
43. The method of any of items 41-42, wherein the presence or absence of an identifier nucleic acid molecule in said pool is representative of said symbolic value at each corresponding symbolic position within a string of symbols.
(Item 44)
44. A method according to any of items 41 to 43, wherein each codeword is an exact match of said respective word of said plurality of words.
(Item 45)
45. A method according to any of items 41 to 44, wherein said codeword is optimized for chemical conditions during coding or decoding.
(Item 46)
46. A method according to any of items 41 to 45, wherein the codewords have fixed weights such that there is a fixed number of one or more types of symbols in each codeword.
(Item 47)
47. A method according to any of items 41 to 46, wherein each compartment contains a fixed number of identifier nucleic acid sequences and the concentration of identifier nucleic acid molecules within and across each compartment is approximately equal.
(Item 48)
placing the corresponding plurality of components in a compartment and assembling the plurality of components together;
dispensing multiple solutions containing multiple components to coordinates on a substrate using multiple printheads;
Distributing a reaction mix to the coordinates on the substrate to physically link the plurality of components, provide the conditions necessary to physically link the plurality of components, or both. When
48. The method of any of items 41-47, comprising
(Item 49)
wherein mapping said fourth string of symbols to a plurality of identifier nucleic acid molecules comprises distributing said identifier nucleic acid molecules such that each compartment in said plurality of compartments contains the same number of identifier nucleic acid molecules. 49. The method of any one of 41-48.
(Item 50)
50. A method according to any of items 41 to 49, wherein said error protection symbols are determined using a Reed-Solomon code.
(Item 51)
42. The method of item 41, wherein the error protection symbols provide an error tolerance of P divided by two symbols, and an erasure tolerance of P erasures in a protected slice.
(Item 52)
said plurality of compartments being placed on a substrate and placing said corresponding plurality of components in said compartments such that identifier nucleic acid molecules representing adjacent symbols in said string of symbols are not built up in adjacent compartments. 52. The method of any of items 41-51, further comprising reordering, interleaving or programming.
(Item 53)
developing a set of printer instructions based on the mapping of the fourth string of symbols to the plurality of identifier nucleic acid molecules;
sending said set of print instructions to a printer finisher system;
53. The method of any of items 41-52, further comprising:
(Item 54)
54. A method according to any of items 41 to 53, wherein said hashing, error protection or codeword determination is performed on individual blocks in said plurality of blocks.
(Item 55)
55. The method of item 54, wherein the hashing, error protection or codeword determination is performed on the individual blocks and additional blocks in parallel.
(Item 56)
56. The method of any of items 41-55, wherein H is equal to zero.
(Item 57)
57. The method of any of items 41-56, wherein P is equal to zero.
(Item 58)
58. The method of any of items 41-57, wherein the additional error protection or hash symbols are stored in a magnetic storage device, optical storage device, flash memory device or cloud storage.
(Item 59)
A method for storing digital information in a nucleic acid molecule comprising:
receiving digital information as a first string of symbols, each symbol in said first string of symbols having a symbol value and a symbol position within said first string of symbols; a step in which the string of 1's has a length L1;
dividing the string of symbols into a plurality of blocks, each block having a length B;
computing a hash of length H for each block;
storing the hash block by block;
assembling the hashed blocks to form a second string of symbols having a length L2;
dividing the second string of symbols into a plurality of slices, each slice having a length S;
computing, for each slice, P error protection symbols and appending said error protection symbols to the end of said slice to obtain error protected slices;
assembling the error protected slices to form a third string of symbols having a length L3;
dividing the third string of symbols into a plurality of words, each word having a length W;
computing a codeword for each word using one or more codebooks;
concatenating the codewords to form a fourth string of symbols having a length L4;
mapping the fourth string of symbols to a plurality of identifier nucleic acid molecules, wherein each identifier of the plurality of identifier nucleic acid molecules comprises a corresponding plurality of components, each component in the plurality of components comprising a distinct nucleic acid sequence, and each individual identifier of said plurality of identifier nucleic acid molecules corresponds to an individual symbol in said fourth string of symbols; and
constructing individual identifiers of the plurality of identifiers by placing the corresponding plurality of components in compartments and assembling the plurality of components together.
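The mapping step of item 59 needs a rule that turns each (symbol position, symbol value) pair into a unique identifier assembled from components. The sketch below shows one hypothetical rule that is not taken from the source: components are grouped into layers, an identifier takes exactly one component from each layer, and the pair is first flattened to the index position * A + value (A being the alphabet size) and then written in mixed radix over the layer sizes. The layer sizes, alphabet, and function names are invented for illustration.

```python
from math import prod

# Hypothetical component library: 3 layers, one component chosen per layer.
LAYER_SIZES = (4, 5, 6)  # 4 * 5 * 6 = 120 possible identifiers
ALPHABET = 4             # symbol values 0..3


def symbol_to_identifier(position: int, value: int) -> tuple[int, ...]:
    """Map (position, value) to a tuple of component indices, one per layer."""
    if not 0 <= value < ALPHABET:
        raise ValueError("symbol value outside the alphabet")
    index = position * ALPHABET + value
    if index >= prod(LAYER_SIZES):
        raise ValueError("not enough component combinations for this position")
    components = []
    for size in reversed(LAYER_SIZES):  # mixed-radix digits; last layer varies fastest
        index, digit = divmod(index, size)
        components.append(digit)
    return tuple(reversed(components))


def identifier_to_symbol(components: tuple[int, ...]) -> tuple[int, int]:
    """Inverse mapping: recover (position, value) from the component indices."""
    index = 0
    for size, digit in zip(LAYER_SIZES, components):
        index = index * size + digit
    return divmod(index, ALPHABET)


fourth_string = [3, 0, 2, 1]  # illustrative symbols of the fourth string
ids = [symbol_to_identifier(pos, val) for pos, val in enumerate(fourth_string)]
print(ids)  # [(0, 0, 3), (0, 0, 4), (0, 1, 4), (0, 2, 1)]
```

Because the mapping is a bijection on the index range, decoding a read identifier back to its symbol position and value needs no lookup table.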
(Item 60)
The method of item 59, further comprising building an identifier pool comprising said plurality of identifiers.
(Item 61)
The method of item 59 or 60, wherein said hash for each block is stored in a nucleic acid molecule, a magnetic storage device, an optical storage device, a flash memory device, or cloud storage.
(Item 62)
A method for storing digital information in nucleic acid molecules, comprising:
obtaining a plurality of blocks, each block containing a string of symbols and associated with a block ID;
assigning blocks of the plurality of blocks to containers;
mapping each block to a plurality of identifier nucleic acid sequences to be associated with its assigned container, wherein individual identifier nucleic acid sequences of said plurality of identifier nucleic acid sequences correspond to individual symbols in said string of symbols and each comprise a plurality of component nucleic acid sequences, and wherein each component nucleic acid sequence in said plurality of component nucleic acid sequences comprises a distinct nucleic acid sequence;
constructing individual identifier nucleic acid molecules of the plurality of identifier nucleic acid sequences;
storing said individual identifier nucleic acid molecules in said assigned container, wherein the method is configured such that a physical address, comprising the identity of said container and said plurality of identifier nucleic acid sequences associated therewith, is determined using said associated block ID.
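Item 62 requires only that a block's physical address (its container plus its identifier sequences) be determinable from the block ID. A minimal in-memory sketch follows. It assumes a hypothetical round-robin assignment of blocks to containers and identifiers numbered contiguously within each container; `PhysicalAddress` and `store_blocks` are illustrative names, and a real system could use any assignment policy.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PhysicalAddress:
    container: str          # identity of the container holding the block's molecules
    first_identifier: int   # the block's identifiers, as a contiguous range
    last_identifier: int


def store_blocks(blocks: dict[str, list[int]], containers: list[str]) -> dict[str, PhysicalAddress]:
    """Assign blocks to containers round-robin and index each physical address by block ID."""
    next_free = {c: 0 for c in containers}  # next unused identifier number per container
    index = {}
    for n, (block_id, symbols) in enumerate(blocks.items()):
        container = containers[n % len(containers)]
        first = next_free[container]
        last = first + len(symbols) - 1     # one identifier per symbol
        next_free[container] = last + 1
        index[block_id] = PhysicalAddress(container, first, last)
    return index


index = store_blocks({"blk-a": [1, 0, 1], "blk-b": [0, 1], "blk-c": [1, 1, 1, 0]}, ["C1", "C2"])
print(index["blk-c"])  # PhysicalAddress(container='C1', first_identifier=3, last_identifier=6)
```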
(Item 63)
The method of item 62, wherein the block ID is an integer, a string, a triple, a list of attributes, or a semantic annotation.
(Item 64)
The method of item 62 or 63, wherein said physical address is stored in a data structure designed to facilitate access to said physical address using said associated block ID.
(Item 65)
The method of any of items 62-64, wherein the data structure is one of a B-tree, a trie, or an array.
(Item 66)
The method of any of items 64-65, wherein at least part of said data structure is stored with said digital information in an index.
(Item 67)
The method of item 66, wherein the index comprises a second plurality of identifier nucleic acid sequences associated with a second container.
(Item 68)
The method of item 67, wherein the index comprises a B-tree data structure, each node of the B-tree comprising a distinct plurality of identifier nucleic acid molecules of the second plurality of identifier nucleic acid sequences.
(Item 69)
The method of item 68, comprising searching for the block ID in the B-tree by:
(i) selecting said distinct plurality of identifier nucleic acid molecules comprising a first node;
(ii) reading the value of said first node; and
(iii) repeating the process of steps (i) and (ii) with subsequent nodes, wherein the identity of said distinct plurality of identifier nucleic acid molecules comprising said subsequent nodes is determined by the block ID relative to the value of said first node.
(Item 70)
The method of item 69, wherein the first node is the root node of the B-tree, the process of steps (i) and (ii) continues until the value of a leaf node of the B-tree is read, and the value of the leaf node is configured to communicate whether a block exists for said block ID and, if said block ID exists, said physical address of said block.
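Items 69 and 70 describe an ordinary B-tree descent in which "reading a node" means selecting and reading that node's identifier molecules. The sketch below models this in memory. The hypothetical `read_node` callback stands in for the wet-lab step of selecting a node's molecules and sequencing them; the node layout (sorted keys with child pointers, and physical addresses kept only in leaves, as item 70 describes, in the style of a B+-tree) is an assumption for illustration.

```python
from bisect import bisect_right
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Node:
    keys: list                                    # sorted separator keys, or block IDs in a leaf
    children: list = field(default_factory=list)  # child node IDs; empty for a leaf
    values: list = field(default_factory=list)   # physical addresses, parallel to keys in a leaf


def btree_lookup(root_id, block_id, read_node: Callable[[object], Node]) -> Optional[object]:
    """Return the physical address stored for block_id, or None if no such block exists."""
    node = read_node(root_id)      # steps (i) and (ii) on the root node
    while node.children:           # step (iii): keep descending until a leaf is read
        # Comparing the block ID with this node's keys selects the next node to read.
        node = read_node(node.children[bisect_right(node.keys, block_id)])
    if block_id in node.keys:
        return node.values[node.keys.index(block_id)]
    return None


# Toy tree: keys below 20 go left, keys of 20 or more go right.
tree = {
    "root": Node(keys=[20], children=["L", "R"]),
    "L": Node(keys=[5, 12], values=["C1:0-9", "C1:10-19"]),
    "R": Node(keys=[20, 31], values=["C2:0-9", "C2:10-19"]),
}
print(btree_lookup("root", 31, tree.__getitem__), btree_lookup("root", 7, tree.__getitem__))
# C2:10-19 None
```

Each loop iteration corresponds to one physical read, so the number of reads grows with the tree height rather than with the number of stored blocks.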
(Item 71)
The method of item 67, wherein the index comprises a trie data structure, each node of the trie comprising a distinct plurality of identifier nucleic acid molecules of the second plurality of identifier nucleic acid sequences.
(Item 72)
The method of item 71, wherein the block ID is a string of symbols, and each node in the trie corresponds to a possible prefix of the string of symbols.
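In items 71, 72 and 86, each trie node corresponds to a prefix of the block ID and a terminal node stores the physical address. As an in-memory sketch, a nested dict stands in for the nodes, each of which would in practice be its own set of identifier molecules. The `"$"` end marker (assumed never to occur inside a block ID) and the function names are hypothetical. Insertion and lookup both walk the block ID one symbol at a time:

```python
END = "$"  # hypothetical key under which a terminal node keeps the physical address


def trie_insert(root: dict, block_id: str, address: str) -> None:
    """Insert block_id; each dict visited corresponds to one prefix of the ID."""
    node = root
    for symbol in block_id:
        node = node.setdefault(symbol, {})
    node[END] = address


def trie_lookup(root: dict, block_id: str):
    """Follow block_id symbol by symbol; return the stored address or None."""
    node = root
    for symbol in block_id:
        if symbol not in node:
            return None  # no stored block ID has this prefix
        node = node[symbol]
    return node.get(END)


trie_root: dict = {}
trie_insert(trie_root, "ACG", "C1:0-99")
trie_insert(trie_root, "ACT", "C2:0-49")
print(trie_lookup(trie_root, "ACT"), trie_lookup(trie_root, "AC"))  # C2:0-49 None
```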
(Item 73)
The method of item 67, wherein said data structure is an array and each element of said array comprises a distinct plurality of identifier nucleic acid molecules of said second plurality of identifier nucleic acid sequences.
(Item 74)
The method of item 73, wherein each element in the array corresponds to a block ID.
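When block IDs are small consecutive integers (items 73, 74 and 87), the index can be a plain array whose k-th element holds the physical address of block k, so a lookup is a single read with no tree traversal. A sketch, assuming integer block IDs starting at 0 and `None` for unused slots; the function names are hypothetical:

```python
from typing import Optional


def build_array_index(addresses: dict[int, str]) -> list[Optional[str]]:
    """Element k holds the physical address of block ID k (None if absent)."""
    size = max(addresses, default=-1) + 1
    array: list[Optional[str]] = [None] * size
    for block_id, address in addresses.items():
        array[block_id] = address
    return array


def array_lookup(array: list[Optional[str]], block_id: int) -> Optional[str]:
    """Direct positional lookup; out-of-range IDs have no block."""
    return array[block_id] if 0 <= block_id < len(array) else None


arr = build_array_index({0: "C1:0-99", 2: "C1:100-199", 3: "C2:0-99"})
print(arr)                   # ['C1:0-99', None, 'C1:100-199', 'C2:0-99']
print(array_lookup(arr, 1))  # None
```

The trade-off against a B-tree or trie is space: sparse block IDs leave many empty elements.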
(Item 75)
The method of item 66, wherein the index is stored in a magnetic storage device, an optical storage device, a flash memory device, or cloud storage.
(Item 76)
The method of item 67, wherein the location in the index of the physical address is configured natively to the block ID.
(Item 77)
The method of item 76, wherein said block ID is directly mapped to a plurality of nucleic acid components.
(Item 78)
The method of item 77, wherein said plurality of identifier nucleic acid molecules in said index storing said physical addresses is composed of individual identifier nucleic acid molecules each comprising said plurality of components.
(Item 79)
The method of item 77 or 78, wherein said block ID is a triple entity annotating said associated block, said triple entity mapping to a plurality of nucleic acid components.
(Item 80)
The method of item 79, wherein said plurality of identifier nucleic acid molecules in said index, composed of individual identifier nucleic acid molecules comprising said plurality of nucleic acid components, stores said physical addresses of all blocks annotated by said triple entity.
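Items 79 and 80 map a triple annotation directly to nucleic acid components, so the identifiers holding a block's address can be computed from the annotation itself rather than looked up. In the hypothetical sketch below, each position of a (subject, predicate, object) triple draws its component from its own layer through a per-layer codebook, and an index keyed by that component combination collects the addresses of every block carrying the annotation. The codebooks, component names, and triples are invented for illustration.

```python
from collections import defaultdict

# Hypothetical per-layer codebooks: entity -> component name.
SUBJECTS = {"sample42": "S07"}
PREDICATES = {"measuredBy": "P02", "locatedIn": "P05"}
OBJECTS = {"sequencerX": "O11", "lab3": "O03"}


def triple_to_components(triple: tuple[str, str, str]) -> tuple[str, str, str]:
    """Map (subject, predicate, object) to one component per layer."""
    subject, predicate, obj = triple
    return SUBJECTS[subject], PREDICATES[predicate], OBJECTS[obj]


# Component combination -> physical addresses of all blocks annotated by that triple.
annotation_index: defaultdict = defaultdict(list)


def annotate(block_address: str, triple: tuple[str, str, str]) -> None:
    annotation_index[triple_to_components(triple)].append(block_address)


annotate("C1:0-99", ("sample42", "locatedIn", "lab3"))
annotate("C4:0-49", ("sample42", "locatedIn", "lab3"))
print(annotation_index[triple_to_components(("sample42", "locatedIn", "lab3"))])
# ['C1:0-99', 'C4:0-49']
```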
(Item 81)
The method of item 67, wherein the physical address is configured natively to the block ID.
(Item 82)
The method of item 76, wherein said block ID is directly mapped to said physical address.
(Item 83)
The method of item 82, wherein said plurality of identifier nucleic acid molecules storing said associated block is composed of individual identifier nucleic acid molecules each comprising said plurality of components.
(Item 84)
The method of item 82 or 83, wherein said block ID is a triple entity annotating said associated block, said triple entity mapping to a plurality of nucleic acid components.
(Item 85)
The method of item 82, wherein the plurality of identifier nucleic acid molecules in the container, composed of individual identifier nucleic acid molecules comprising the plurality of nucleic acid components, stores all blocks annotated by the entity.
(Item 86)
The method of item 72, wherein a leaf node of the trie is configured to store the physical address associated with the block ID that matches the string of symbols specified by the trie at the leaf node.
(Item 87)
The method of item 74, wherein each element of the array stores the physical address of the associated block ID.
(Item 88)
The method of any of items 62-87, wherein the method is configured such that a plurality of identifier nucleic acid molecules having a complete, contiguously ordered set of identifier nucleic acid sequences is specified by an identifier range comprising the identities of the first and last identifiers in said identifier range.
(Item 89)
The method of item 88, wherein the first and last identifiers in the range of identifiers are represented by integers.
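Items 88 and 89 allow a contiguous run of identifiers to be specified by its first and last members alone. Assuming identifiers are numbered by integers, as item 89 states, a set of identifiers can be compressed into such ranges and expanded back; `to_ranges` and `expand` are hypothetical names.

```python
def to_ranges(identifiers):
    """Compress integer identifiers into (first, last) ranges of consecutive values."""
    ranges = []
    for ident in sorted(set(identifiers)):
        if ranges and ident == ranges[-1][1] + 1:
            ranges[-1] = (ranges[-1][0], ident)  # extend the current run
        else:
            ranges.append((ident, ident))        # start a new run
    return ranges


def expand(ranges):
    """Recover every identifier from its (first, last) ranges."""
    return [i for first, last in ranges for i in range(first, last + 1)]


ids = [7, 3, 4, 5, 10, 11, 12, 13]
print(to_ranges(ids))  # [(3, 5), (7, 7), (10, 13)]
```

Storing two integers per run instead of every identifier keeps an index compact when blocks occupy long contiguous runs.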
(Item 90)
The method of any of items 62-89, further comprising storing activation and ontology information with said digital information.
(Item 91)
A system for storing digital information in nucleic acid molecules according to the method of any of items 30-90, comprising a sample management system for storing a plurality of containers of nucleic acids.
(Item 92)
The system of item 91, further comprising an automated machine for retrieving designated containers from said sample management system.

Claims

A method for storing digital information in nucleic acid molecules, comprising:
obtaining a plurality of blocks, each block containing a string of symbols and associated with a block ID;
assigning blocks of the plurality of blocks to containers;
mapping each block to a plurality of identifier nucleic acid sequences to be associated with its assigned container, wherein individual identifier nucleic acid sequences of said plurality of identifier nucleic acid sequences correspond to individual symbols in said string of symbols and each comprise a plurality of component nucleic acid sequences, and wherein each component nucleic acid sequence in said plurality of component nucleic acid sequences comprises a distinct nucleic acid sequence;
constructing individual identifier nucleic acid molecules of the plurality of identifier nucleic acid sequences;
storing said individual identifier nucleic acid molecules in said assigned container, wherein the method is configured such that a physical address, comprising the identity of said container and said plurality of identifier nucleic acid sequences associated therewith, is determined using said associated block ID.

The method of claim 1, wherein the block ID is an integer, a string, a triple, a list of attributes, or a semantic annotation.

The method of claim 1, wherein said physical address is stored in a data structure designed to facilitate access to said physical address using said associated block ID.

The method of claim 3, wherein said data structure is one of a B-tree, a trie, or an array.

The method of claim 3, wherein at least a portion of said data structure is stored with said digital information in an index.

The method of claim 5, wherein the index is stored in a magnetic storage device, an optical storage device, a flash memory device, or cloud storage.

The method of claim 5, wherein the index comprises a second plurality of identifier nucleic acid sequences associated with a second container.

The method of claim 7, wherein said index comprises a B-tree data structure, each node of said B-tree comprising a distinct plurality of identifier nucleic acid molecules of said second plurality of identifier nucleic acid sequences.

The method of claim 8, comprising searching for the block ID in the B-tree by:
(i) selecting said distinct plurality of identifier nucleic acid molecules comprising a first node;
(ii) reading the value of the first node; and
(iii) repeating the process of steps (i) and (ii) with subsequent nodes, wherein the identity of said distinct plurality of identifier nucleic acid molecules comprising said subsequent nodes is determined by the block ID relative to the value of said first node.

The method of claim 9, wherein the first node is the root node of the B-tree, the process of steps (i) and (ii) continues until the value of a leaf node of the B-tree is read, and the value of the leaf node is configured to communicate whether a block exists for said block ID and, if said block ID exists, said physical address of said block.

The method of claim 7, wherein said index comprises a trie data structure, each node of said trie comprising a distinct plurality of identifier nucleic acid molecules of said second plurality of identifier nucleic acid sequences.

The method of claim 11, wherein the block ID is a string of symbols and each node in the trie corresponds to a possible prefix of the string of symbols.

The method of claim 11, wherein a leaf node of said trie is configured to store said physical address associated with said block ID that matches said string of symbols specified by said trie at said leaf node.

The method of claim 7, wherein said data structure is an array, each element of said array comprising a distinct plurality of identifier nucleic acid molecules of said second plurality of identifier nucleic acid sequences.

The method of claim 14, wherein each element in said array corresponds to a block ID.

The method of claim 15, wherein each element of said array stores said physical address of said associated block ID.

The method of claim 7, wherein the physical address is configured natively to the block ID.

The method of claim 17, wherein said block ID is directly mapped to said physical address.

The method of claim 18, wherein said plurality of identifier nucleic acid molecules storing said associated blocks is composed of individual identifier nucleic acid molecules each comprising said plurality of components.

The method of claim 18, wherein said block ID is a triple entity that annotates said associated block, said triple entity mapping to multiple nucleic acid components.

The method of claim 18, wherein said plurality of identifier nucleic acid molecules in said container, composed of individual identifier nucleic acid molecules comprising said plurality of nucleic acid components, stores all blocks annotated by said entity.

The method of claim 1, wherein the method is configured such that a plurality of identifier nucleic acid molecules having a complete, sequentially ordered set of identifier nucleic acid sequences is specified by an identifier range comprising the identities of the first and last identifiers in said identifier range.

The method of claim 23, wherein said first and last identifiers in said range of identifiers are represented by integers.

The method of claim 1, further comprising storing activation and ontology information with said digital information.